The Tsunami of Data Growth

Explosive data growth has been a defining feature of storage infrastructure over the past 10 years. The amount of data captured and stored has climbed steeply. In its annual study of the largest database implementations, Winter Corporation found that the workload of the largest transactional database nearly doubled from 2001 to 2003. Last year, the most heavily used database system (according to Winter)--the database management system (DBMS) at US Customs & Border Protection (CBP)--supported 51,448 transactions per second (tps), up from 26,655tps in 2001.
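To put "nearly doubled" in concrete terms, the short sketch below works out the growth from the two Winter figures quoted above. The variable names are illustrative, and the two-year interval is an assumption based on the 2001 and 2003 dates in the text.

```python
# Throughput figures quoted from the Winter Corporation study above.
tps_2001 = 26_655
tps_2003 = 51_448
years = 2  # assumption: the two measurements are two years apart

# Overall growth factor across the whole period.
growth_ratio = tps_2003 / tps_2001

# Compound annual growth rate implied by that factor.
annual_rate = growth_ratio ** (1 / years) - 1

print(f"Total growth: {growth_ratio:.2f}x")
print(f"Implied annual growth: {annual_rate:.1%}")
```

Under that assumption, the workload grew by a factor of roughly 1.93, which implies compound growth of about 39 percent a year.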

Although the CBP database runs on an IBM eSeries mainframe, Winter identified similar growth at every level of database technology. For the first time, Winter named the top 10 databases running on the Windows platform; the largest of those recorded 3,634tps.

Increased workload reflects increased data volumes. The database that contains the largest volume of normalized data (94.3TB) is at AT&T Labs-Research. (The normalized-data metric measures how much information a database manages, excluding indexes and other management-oriented data.) The AT&T implementation uses AT&T's Daytona database management software, Sun Microsystems SunFire E10000 servers, and Sun StorEdge storage systems. In Winter's new category of hybrid databases, in which most information is stored on tape rather than on disk, the Stanford Linear Accelerator Center (SLAC) was the largest. The 828TB SLAC database also uses SunFire servers and Sun StorEdge storage arrays.

Where is all this data coming from? Looking back, the answer is fourfold: improved instrumentation, automated enterprise business processes, individual productivity software, and analytics. Improved instrumentation that captures digital rather than analog data has driven the growth of scientific, engineering, and production data. In business, data growth has come from the implementation of IT systems that automate enterprise-level business processes such as enterprise resource planning (ERP) and customer relationship management (CRM) and from individual productivity applications such as email and word processing. The final piece of the puzzle has been analytics. After companies capture data, they want to use it to improve their business processes and outcomes. Transforming transactional data into a format suitable for analytics generates even more data and is a major source of data growth.

The future promises new sources of significant data growth. These sources include the proliferation of new data types, such as audio and video information; the development of handheld devices that have data management capabilities; and the adoption of radio frequency identification (RFID).

Digital audio and video might not seem new--their development stretches back nearly a quarter of a century. But as network and storage technology has improved, the use of digital audio and video has grown. For example, although the press has concentrated on the impact of Apple Computer's iPod music player on the music business, the iPod is a storage story as well. Suddenly, people were able to buy a thousand songs for a very reasonable price and carry those songs around with them. From a wider perspective, as recording, transmitting, and receiving audio files over computer networks becomes easier, new uses for audio in the enterprise will emerge.

The iPod is just one of an emerging new generation of devices that let users manage data locally and periodically synchronize the data with a storage infrastructure. A generation of specialized handheld devices is being developed for applications ranging from field service to medical data management. Increased use of handheld devices that have significant local-data-management capabilities will result in people generating more data that will eventually make its way to centralized corporate storage devices.

But RFID is the heavyweight in the future-data-growth arena. Experts view RFID, which facilitates the tracking of goods through the supply chain, as the most important new information technology since the development of universal product codes (UPCs). Over time, virtually all goods will be tagged with a small radio transmitter. Tracking is the first application, but the tags will undoubtedly evolve to produce additional data for other applications, and that data will also have to be stored. Business executives inevitably will want to analyze the data, and transforming it into a format suitable for analysis will result in yet more data that must be stored.

The amount of stored data has grown exponentially over the past decade. But emerging technologies will cause a veritable tsunami of data in the years to come.
