In “What’s So Great About Longhorn?” (InstantDoc ID 45479) and “Requiem for WinFS” (InstantDoc ID 45630), I wrote about why Longhorn without WinFS is a bad idea. However, in those articles, I discussed only the merely attractive components of WinFS. Now, I’d like to look at the change-the-world part of WinFS: non-file items. Since the days of DOS, Microsoft OSs have treated hard disks the same way. They partition each disk into volumes named after letters of the alphabet (e.g., C) and store just two kinds of items on those volumes: files and folders. But let’s take a closer look at what files are and what we use them for.
In the early days of PCs, most files were text files (e.g., readme.txt, program documentation, and BASIC source code, which was one of the most common forms of programs at the time), executable files (e.g., .com, .exe, .dll), word-processing files (typically just ASCII text or something similar), or spreadsheet files. But as far as the OS was concerned, only two kinds of files existed. First, there were “text” files, whose contents were restricted to readable ASCII (or Unicode) characters, carriage returns, and line feeds. The OS understood that these files contained lines of text, each separated from the next by a carriage return and a line feed. Second, there were “binary” files, which the OS essentially didn’t understand. All the OS knew about a binary file was where on the disk it started and where it ended. Its records, if it was record-oriented at all, were demarcated in a way the OS didn’t grasp, so it was up to the developer who created the file to understand how to read, write, modify, and query those records. (I’m simplifying the scenario, because the OS does understand a few kinds of binary files; .exe and .dll files are good examples.)
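To make the text/binary distinction concrete, here’s a minimal Python sketch (my own illustration, not code from DOS or Windows). Reading a text file requires nothing but the CR LF line convention. Reading the binary file requires a record layout, here a hypothetical 20-byte name field plus a 4-byte integer, that exists only in the code of the application that wrote it; the OS has no way to discover it.

```python
import struct


def read_text_lines(data: bytes) -> list[str]:
    """Interpret bytes the way a DOS-era OS saw a text file: lines split on CR LF."""
    lines = data.decode("ascii").split("\r\n")
    # A file that ends with CR LF produces one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# Hypothetical record layout, known only to the application that wrote the file:
# a 20-byte, NUL-padded ASCII name followed by a 4-byte little-endian signed integer.
RECORD = struct.Struct("<20si")


def write_records(records: list[tuple[str, int]]) -> bytes:
    """Pack (name, cents) pairs into what, to the OS, is an opaque run of bytes."""
    return b"".join(RECORD.pack(name.encode("ascii"), cents) for name, cents in records)


def read_records(blob: bytes) -> list[tuple[str, int]]:
    """Recover the records; this only works if you already know RECORD's layout."""
    return [
        (name.rstrip(b"\x00").decode("ascii"), cents)
        for name, cents in RECORD.iter_unpack(blob)
    ]


if __name__ == "__main__":
    print(read_text_lines(b"README\r\nRun SETUP.EXE to install.\r\n"))
    blob = write_records([("Checking", 125000), ("Savings", 980000)])
    print(blob)  # just bytes, as far as the OS is concerned
    print(read_records(blob))
```

Any program can call something like read_text_lines() on any text file, but only a program that shares the writer’s private layout can call read_records().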
But look at the kinds of data files that we use today—files such as Windows Address Book (WAB) files, Quicken files, Microsoft Outlook personal folder store (PST) files, Outlook Express files, and Internet Favorites. Consider files that contain more than one phone-book entry, or more than one checking account entry, or more than one calendar entry, or more than one email message. In other words, these are files that contain databases.
They’re simple, informal databases, yes, but they’re databases nonetheless. Each has a schema, and each has records. The schema is the file’s structure: a list of the fields, attributes, or columns that the database contains. The records are, of course, the entries inside it: the phone numbers, the checks written, the calendar appointments, and so on. Schemas vary from application to application. For example, Quicken’s data files include your checking-account balance, but your WAB files don’t. These files also resemble databases in what we want to do with them: We need to add records (e.g., write a check, create a new contact), delete and modify records, and query the entire database to ask questions such as “How much money did I spend on electricity last year?” or “What’s Harry’s email address?”
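Here’s what schema, records, and queries look like once they’re made explicit, sketched with Python’s built-in sqlite3 module. The table name, its columns, and the sample checks are all hypothetical and invented for illustration; this isn’t Quicken’s actual file format.

```python
import sqlite3


def build_checkbook() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The schema: the list of fields (columns) that every record has.
    conn.execute(
        "CREATE TABLE checks (num INTEGER PRIMARY KEY, date TEXT, payee TEXT, "
        "category TEXT, amount_cents INTEGER)"
    )
    # The records: one row per check written (made-up sample data).
    rows = [
        (101, "2004-01-15", "City Power", "Electricity", 8450),
        (102, "2004-02-03", "Grocer", "Food", 6210),
        (103, "2004-07-14", "City Power", "Electricity", 11275),
        (104, "2005-01-12", "City Power", "Electricity", 9020),
    ]
    conn.executemany("INSERT INTO checks VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def spent_on(conn: sqlite3.Connection, category: str, year: int) -> int:
    """Answer 'How much did I spend on <category> in <year>?' in cents."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM checks "
        "WHERE category = ? AND substr(date, 1, 4) = ?",
        (category, str(year)),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = build_checkbook()
    print(f"Electricity, 2004: ${spent_on(conn, 'Electricity', 2004) / 100:.2f}")
    # Modifying and deleting records use the same standard language.
    conn.execute("UPDATE checks SET amount_cents = ? WHERE num = ?", (8500, 101))
    conn.execute("DELETE FROM checks WHERE num = 102")
```

Once the schema is explicit, the “electricity last year” question becomes a single query that any SQL-aware tool could run, rather than something only the originating application can answer.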
These informal databases differ from more formal databases in at least two ways. First, they aren’t hosted on Microsoft SQL Server, Oracle, MySQL, or another database engine, so I can’t issue SQL queries against WAB or Quicken data. Second, in most cases their file formats aren’t documented, so I can’t easily write my own code to read them, either.
Now, that’s a shame: I enter valuable data into an application, which stores that data on my hard disk but then chains me to the application whenever I want to use it. If I wanted to grab a contact from WAB and use it in some other application, I’d find it either difficult or impossible, depending on the target application.
Don’t misunderstand me: I’m not slamming developers. The problem, as I see it, is that a large percentage of modern PC applications need to store record-oriented data, but the OS doesn’t know what a record is. So developers end up creating ad hoc databases that only their own applications understand. The result is a hard disk that contains little “islands of data” that can’t communicate with other applications’ islands. Microsoft has tried to address this problem over the years with COM, DCOM, Dynamic Data Exchange (DDE), OLE, and other programming and data-interchange frameworks, but all those frameworks have simply been patches over the basic “island” nature of the data. They’re nothing more than a fleet of boats, you might say.
WinFS, in contrast, drains the oceans between the islands, removing the need for the boats. Whereas Windows currently understands only files and folders, WinFS understands files, folders, and records. No, “records” isn’t the Microsoft term (the company calls them “non-file items”), but that’s essentially what they are. Put very simply, WinFS makes the file system smarter. In fact, WinFS makes the phrase “file system” outdated; a better phrase might be “data system.” You can either save data on your hard disk encapsulated as files, as we’ve done until now, or save it as records. Because the OS understands records, hard disks look less like dumping grounds for files and more like collections of databases, all of which you can view, query, and modify with a standard set of tools.
Sure, I know. I hear you. We’ve had the benefit of standard, unifying tools for data storage and manipulation for years in the form of SQL Server databases. But I’d bet that the sum total of all the informal databases (contact lists, stores of email messages, folders full of digital pictures, personal-finance data) would far exceed the size of all the formal SQL Server databases in every organization in the world. A file system (or rather, a data system) that makes it simple to bring personal data of all kinds under the SQL Server umbrella could make our PCs more useful, and perhaps keep them useful once their hard disks grow to terabytes in size.