Spread the Love: The Challenge of Corralling Scattered Data

I took my first database class at Penn State back in 1987. I still remember one of the first lectures, in which we tried to debate and define what a database is. We agreed that a well-organized metal file cabinet had characteristics of a database, and I remember the professor quoting statistics showing that the vast majority of corporate data was stored in loosely structured electronic formats such as spreadsheets, rather than in the “real” databases we would be studying. The more things change, the more they stay the same. I suspect that wide swaths of the global business community still store huge amounts of data outside of “real” databases.

The Double-Tongued Word Wrester Dictionary (http://www.doubletongued.org), a fun site that records undocumented or under-documented words from the fringes of the English language, provides a quotation from the Info-Tech Research Group article “Spreadmarts Bad for Business.” The article explains a spreadmart as follows:

"When spreadsheets containing valuable corporate data are duplicated uncontrollably, and then modified differently by different users, each file becomes a separate version of the “truth.” Each one of these fractured versions of the truth is called a “spreadmart.” Coined by Wayne Eckerson in 2002, spreadmart is a word meaning both spreadsheet and data mart."

I plan to explore the spreadmart idea in multiple upcoming columns. I’m also going to take the liberty of expanding the term spreadmart to include sources of data that might not traditionally be viewed as business intelligence (BI) data. For example, some colleagues and I recently discussed real-time BI with a member of the SQL Server development team, and we wondered whether tools built on top of RSS feeds would count as real-time BI. This idea might not traditionally fit into the BI arena, but it’s arguably an interesting way to use BI tools.

I know that the spreadmart phenomenon is a data-management problem that various groups at Microsoft have been trying to address, and numerous tools in the current Microsoft suite (such as Microsoft Excel, Microsoft SharePoint Portal Server, and Microsoft Office InfoPath) can address spreadmarts in different ways. Microsoft’s pending Office 2007 release includes several interesting server-based technologies, and improvements to existing products will further the goal of dealing with spreadmarts. At TechEd this week, Microsoft is making several BI-related announcements that I will cover in upcoming articles. And just last Tuesday, the company announced Microsoft Office PerformancePoint Server 2007, an integrated BI platform that will tie together all the company’s BI tools.

What do you think about spreadmarts? Is this a problem we can ever really solve? I think it will be an interesting topic for on-and-off debate over the next few months, and I look forward to hearing from you on the subject. Send your thoughts to me at [email protected]
