Data Warehouses: Are They Getting Too Big?

Data-warehouse proponents have long dreamed of providing one view of the enterprise to the entire organization. For example, by integrating multiple operational data sources, you could give the sales team, the credit department, and the customer service group the same view of a given customer. The sales representative would then know whether to offer a discount to clinch a deal because that customer required very little customer service. Or, if the customer required too much hand-holding and took too long to pay its bills, the sales representative might conclude that the customer was not worth doing business with.

But the dream of an enterprise data warehouse that can provide the foundation of business intelligence (BI) applications has too often turned into a nightmare for IT departments. Through the mid-1990s, the technology to build large-scale data warehouses was immature and required a lot of custom coding. Many projects failed technically—they simply didn't work.

Today, data-warehouse supporters argue that existing packaged applications are sophisticated enough to get most data-warehouse projects up and running. But data warehouses still might not improve a company's operational performance unless the organization also changes its corporate culture and business processes.

One of the most complicated problems facing data-warehouse professionals is the volume of data that companies want to include. Attempts to build warehouses encompassing data from an entire enterprise carried significant risks of failure, so many companies scaled back their projects and began building data marts, which often involved data from only one functional area. But an exclusive analysis of survey data reveals that companies, once again, are intrigued by the prospect of very large data warehouses. Graph 1 shows the amount of data currently contained in operational data warehouses.

Currently, fewer than 25 percent of the survey respondents have more than 1TB of data in their data warehouses. One terabyte represents a lot of information, but most data warehouses exist only in the world's largest companies. Some observers believe that companies with less than $500 million in annual revenue can't afford to build large-scale data warehouses. Within the large-enterprise community, then, a terabyte of data in a data warehouse is hardly outlandish. At the same time, a data warehouse with less than 50GB probably represents a pilot project.

If data warehouses are already big, companies expect them to get bigger over the next 3 years. Graph 2 shows the capacity that companies anticipate their data warehouses will have in 3 years.

Companies believe that far fewer small data warehouses will exist 3 years from now. At the same time, they expect the number of data warehouses holding 1TB or more to grow by more than 50 percent.

Graph 3 shows that many companies anticipate the most significant growth will come in data warehouses holding 2TB to 5TB of data and in those holding more than 5TB.

Increasing the size of a data warehouse increases the complexity of the implementation. This complexity is evident when you look at the hardware platforms for most data-warehouse projects. Although about 70 percent of all corporate data resides on mainframe computers, the source data for most data warehouses resides on enterprise servers, as Graph 4 shows.

Most data warehouses draw source data from several different hardware platforms, as Graph 5 indicates. But it is not clear that the mother lode of information, the historical data residing on legacy mainframe systems, is being (or even can be) effectively tapped.

The grand vision of data-warehouse technology, the ability to provide one view of an enterprise, has not yet been realized. In the face of failures in the mid-1990s, data-warehouse proponents reined in their ambitions. Over the next 3 years, however, companies seem poised to once again build very large, enterprise-strength data warehouses. Whether they can overcome the barriers ahead this time around remains an open question.
