I was on a train platform the other morning, and I eyed a now-familiar sight on a flat-screen monitor that had been set up to alert commuters about arriving trains: A wide Windows dialog box was blocking the center of the screen, indicating some kind of system error. The PC behind the display needed to be rebooted. All it was waiting for was a nonexistent user to come along and click OK.
So I chuckle knowingly at these now infamous occurrences--the Windows NT 4.0 blue screen on the airport screen will of course go down in history as a scar on the soul of Windows--but I also acknowledge that things wouldn't be any better if the IT market suddenly reversed itself. That is, if Linux or Apple Computer's Mac OS X were the dominant platforms, I believe we'd be seeing dialog boxes and crashed screens for those systems in public places instead. Windows is often put in a position to be ridiculed because of its ubiquity, not because it's less well engineered.
I do believe that Microsoft has made dubious technical decisions and does create buggy software. Obviously, the company could do things better. But I don't subscribe to the "grass is always greener" theory that many open source software (OSS) and OS X proponents do. Switching to Linux, for example, will likely introduce more problems than benefits, and it's a long-term commitment.
Regardless, OSS proponents have been pushing the supposed security, reliability, and durability advantages of Linux over Windows for years now. My gut feeling has always been that were Linux installed in as many production environments as Windows, it would fall apart as much or more, albeit in different ways. What's lacking, of course, is evidence. Whereas Microsoft has sponsored study after study to examine the competitive advantages of Windows and Linux, the cozy relationships between the software giant and the companies making these studies always made the results less than believable.
Last week, however, I think we reached a turning point in understanding how Linux and Windows differ in the real world. Yes, yet another study is involved, and yes, Microsoft commissioned this one as well. However, the company that performed the study, Security Innovation, is highly regarded for its independence and methodology. In this study, "Reliability: Analyzing Solution Uptime as Business Needs Change," Security Innovation examines the real-world reliability of Windows and Linux, not abstract and often pointless statistics such as uptime. One caveat: The study very specifically examines an imaginary e-commerce application running over a 1-year period. During that time period, the OSs were upgraded to new versions with various product updates, and the application was upgraded with new functionality.
Ryan Gavin, the director of platform strategy at Microsoft, told me in a briefing last week that the study marks the first time Linux and Windows have been compared in such a realistic setting. "This is not about deploying Linux for specific workloads," he told me. "It's about what happens when these systems are managed over time, while new features are being added and the tech stack is being upgraded. This is where we see customers run into issues."
As part of the study, sets of experienced Windows and Linux systems administrators were given control of e-commerce environments based on their respective systems. The Windows environments were based on Windows 2000, then upgraded to Windows Server 2003 and any applicable hotfixes and security patches during the simulated year of the study. The Linux environments began life with Novell SuSE Linux Enterprise Server 8 and were upgraded to SuSE 9 and any applicable updates. Over the course of that year, both groups of administrators had to configure and maintain the systems, introduce new functionality to the e-commerce application (including personalization, dynamic search, and list-targeting features), and perform the major OS version upgrades. Security Innovation examined the performance of the administrators, noting how long they took to execute each task.
At a high level, the Windows systems were dramatically more reliable than the Linux systems. On average, patching Linux took six times longer than patching Windows, and there were almost five times as many patches to apply on Linux (187) as there were on Windows (39). More important, perhaps, the Linux systems suffered from 14 "critical breakages," software dependency failures in which software simply stopped working on those systems. Windows had no dependency failures.
In my own experiences with Linux over the past decade, I've seen this sort of dependency problem arise numerous times. A certain software component requires a certain version of another software component, so you upgrade. But that upgrade breaks another component that depends on the upgraded component. "On two of the Linux environments, the administrators grabbed [the updated component they needed], dropped it in, and then had huge dependency failures," Gavin said. "One of them went into an immediate downward spiral that affected the RPM installer and created an unbootable system. Another Linux administrator actually did get it all to work OK using custom code he created. So he worked around the issues. But when he finished, he had no confidence the system could be managed by anyone else. If he got hit by a bus, it was over."
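For readers who want to see the shape of the problem, here is a minimal sketch of that kind of dependency chain in Python. It is my own illustration, not anything taken from the study or from how RPM actually resolves dependencies, and every package name and version number in it (`shop_app`, `search_tool`, `libcore`) is hypothetical.

```python
def broken_packages(installed, requirements):
    """Return sorted names of installed packages with an unmet dependency."""
    broken = []
    for pkg, needs in requirements.items():
        if pkg not in installed:
            continue
        for dep, accepted_versions in needs.items():
            # A package is broken if a dependency is missing or is installed
            # at a version the package does not accept.
            if installed.get(dep) not in accepted_versions:
                broken.append(pkg)
                break
    return sorted(broken)


# Hypothetical system state: package name -> installed version.
installed = {"shop_app": 2, "search_tool": 1, "libcore": 1}

# Hypothetical requirements: package -> {dependency: accepted versions}.
requirements = {
    "shop_app": {"libcore": {2}},     # the upgraded application needs libcore 2
    "search_tool": {"libcore": {1}},  # an older tool only works with libcore 1
}

before = broken_packages(installed, requirements)

# "So you upgrade": install the newer shared component the application needs.
installed["libcore"] = 2

after = broken_packages(installed, requirements)

print("broken before upgrade:", before)
print("broken after upgrade:", after)
```

Upgrading the shared component repairs the application that needed it, but the older tool that relied on the previous version is now the one that fails, which is the pattern described above.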
These dependency problems can be even more insidious than might be immediately obvious. According to Gavin, Novell (and Red Hat, another major Linux vendor) won't support enterprise systems that have had upgrades to core system components. So after you perform certain upgrades, he said, those environments are no longer supported.
Another pertinent bit of information is the way in which patches and system updates are applied in each environment. On the Windows environments, each of the administrators followed the same path for installing updates. But on Linux, each of the administrators took entirely different approaches. "From a Linux standpoint, there are a million ways of doing things," Gavin noted. "But there are also a half a million ways to do them wrong. You start to hit walls because of the fragility of the [Linux] architecture, and the benefit of componentization [also] has some detriments, including complexity, manageability, and time to market."
As you might expect, Novell isn't particularly amused by these assertions because the company has essentially set itself up as a major player in the Linux market. In a blog posting last week, Novell Senior Manager of Public Relations Kevan Barney wrote that Microsoft was trying to confuse the market. However, I find his two basic points to be somewhat spurious. First, he denied that Linux had interoperability problems, then suggested that Windows' security problems would cause more long-term financial problems regardless. Second, he used that old chestnut that OSS backers often pull out in a bid to change the argument. He claimed that SuSE, as a Linux "distribution," can't be fairly compared to an OS such as Windows because SuSE (and other Linux distributions) contains so many software packages and services. The point here is that the number of patches required by a Linux distribution is artificially high. If that's the case, perhaps Novell and other Linux vendors could ship more concise distributions. Until they do, we can compare Windows only to the Linux distributions customers can actually buy.
What Barney doesn't address is the fundamental truth of the study, which I find to be fairly ironclad: When compared with Windows systems in the real world, Linux distributions are more complex to manage and can thus be less reliable. This doesn't mean Linux is a lost cause. But after years of posturing by the Linux community, I'm surprised that's the best they can offer when some serious concerns are raised.
Reliability: Analyzing Solution Uptime as Business Needs Change (PDF version)