Performance is the Holy Grail of the computer business. Most vendors tout their products as providing better performance than the competition. Big database companies spend tens of thousands of dollars running benchmarks on machine configurations that cost hundreds of thousands of dollars to prove that their database is fastest. And all this effort exists to convince you, the systems administrator, that vendor metrics for measuring performance are valid indicators of the way a product will behave in your environment. Wouldn't it be nice if that were true?
The Ideal and the Real
When I think about performance, I'm often reminded of the aftermarket automotive parts business. Hundreds of vendors sell products that give your car more horsepower. The only problem is that if product A promises to add 25 horsepower and product B promises to add 15 horsepower, the gains aren't cumulative when you use both products; in other words, you don't get an extra 40 horsepower. In fact, because your car isn't the perfect test environment that the vendor used to substantiate its claims, you'll be lucky if your total performance increase equals either product's individual claim. Just as automotive parts vendors do, vendors in the computer industry design performance evaluations to show their products in the best possible light.
But real networks aren't picture-perfect environments in which troublesome elements can easily be eliminated to enhance performance. Gestalt is a better descriptor: The whole is immeasurably greater than the sum of its parts. The performance problems that a systems administrator faces are often symptomatic; the network environment is experiencing effects of an unknown origin that manifest as a performance problem. Perhaps users are complaining that the network is "slow" (a symptom). How do you determine what's really causing the problem?
A few years ago, I had a conversation with the Chief Information Officer (CIO) of a Fortune 50 company. His office, which was in the corporate headquarters, also served as the IT department for the relatively small number of employees at the corporate level (about 300 spread over four floors of an office tower). The IT department was getting ready to rip out the existing network infrastructure and replace it with the latest and greatest in high-speed networks. From the CIO's description of the company's network problems, it didn't sound to me as if replacing all the hardware would provide a solution, so I drilled down for more information. As it happened, all 300 corporate-level users were on a single network segment. I suggested segmenting the network by floor and adding a faster backbone between the floors. Two weeks later, I received an email message thanking me for saving the company $250,000. This CIO's problem was a simple one to solve from my perspective, but his IT staffers' perspective was limited by the fact that each had a very narrow area of responsibility and was focused exclusively on the symptoms manifesting in his or her area. My advantage was that I wasn't too close to the problem: I was able to take a holistic view and consider systemic issues that would cause the problems the CIO had described.
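A rough back-of-the-envelope calculation shows why segmenting helped. The sketch below is only an illustration, not a record of that company's network: the 10Mbps segment capacity is an assumption (the actual speed isn't stated here), the users are assumed to be spread evenly across the four floors, and the model ignores collisions and traffic that crosses between floors.

```python
def per_user_share_mbps(segment_capacity_mbps: float, users_on_segment: int) -> float:
    """Average share of a shared segment's capacity if every user transmits at once."""
    if users_on_segment <= 0:
        raise ValueError("users_on_segment must be positive")
    return segment_capacity_mbps / users_on_segment


SEGMENT_CAPACITY_MBPS = 10.0  # assumption: the real segment speed is not given
TOTAL_USERS = 300             # from the anecdote
FLOORS = 4                    # from the anecdote; an even split per floor is assumed

# Everyone on one shared segment versus one segment per floor.
single_segment = per_user_share_mbps(SEGMENT_CAPACITY_MBPS, TOTAL_USERS)
per_floor = per_user_share_mbps(SEGMENT_CAPACITY_MBPS, TOTAL_USERS // FLOORS)

print(f"Single segment: {single_segment:.3f} Mbps per user")
print(f"One segment per floor: {per_floor:.3f} Mbps per user")
```

Under these assumptions, each user's worst-case share grows fourfold without replacing a single workstation, which is the kind of systemic change a component-by-component view tends to miss.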
Analyze Twice, Implement Once
When you evaluate the performance of your network environment, focusing solely on components doesn't go far enough. You must consider how the entire network works together, looking beyond individual hardware and software concerns to gain an understanding of how you need to integrate all the network's components. Make sure you factor in the way your network configuration fits into your business process. For example, in the past few years, I've come across several companies with a corporate mandate to move to a server-based computing model to cut costs. (Let me emphasize that absolutely nothing is wrong with a well-designed server-based computing system in the proper environment.) The situations I've witnessed seem to go pretty much the same way: A successful pilot project is followed by a rollout that's a total failure. In each case, the failure results from one of two common mistakes.
The first mistake is misunderstanding the corporate business process and how a small but crucial part of the company does its job: "I know we're rolling out our server-based initiative today, but what do I do with the 25 users in the art department and their Macintoshes?" The second mistake is misunderstanding the existing network infrastructure: "Why didn't somebody tell me that all our MAN links are only fractional T1? We've been trying to move hundreds of users' data over some very skinny pipes."
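The "skinny pipes" complaint is easy to check with arithmetic before a rollout. A full T1 carries 24 channels of 64Kbps each, and a fractional T1 is some subset of those channels. The sketch below is a minimal illustration; the 4-channel link and the 200 concurrent users are hypothetical figures chosen for the example, not numbers from any of the rollouts described above.

```python
DS0_KBPS = 64     # capacity of one T1 channel (DS0)
T1_CHANNELS = 24  # a full T1 carries 24 DS0 channels (1,536 Kbps of payload)


def fractional_t1_kbps(channels: int) -> int:
    """Payload capacity of a fractional T1 built from the given number of channels."""
    if not 1 <= channels <= T1_CHANNELS:
        raise ValueError("a T1 has between 1 and 24 usable channels")
    return channels * DS0_KBPS


def per_user_kbps(link_kbps: float, active_users: int) -> float:
    """Average share of the link if all active users send data at the same time."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return link_kbps / active_users


link = fractional_t1_kbps(4)      # hypothetical: a 4-channel fractional T1
share = per_user_kbps(link, 200)  # hypothetical: 200 concurrent users

print(f"Link capacity: {link} Kbps")
print(f"Per-user share: {share:.2f} Kbps")
```

Even with generous assumptions about how many users are active at once, a per-user share this small is a warning sign worth discovering during analysis rather than on rollout day.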
Make sure that you have in place a good analytical process for identifying the performance problems in your environment. Performance problems that are easy to solve (e.g., replacing hardware that's broken or no longer performing up to specification) will always exist. Performance problems that crop up after software upgrades might be remedied simply by adding more memory to users' machines, rather than by replacing the hardware. But you need to ensure that you understand the true source of a particular problem and aren't merely treating its symptoms. Always ask yourself, Will fixing this first problem simply expose a second problem? Even if you're certain that a user needs a newer and faster machine, don't leap immediately into an upgrade. What impact will changing a department from 233MHz Pentium II machines to 2GHz Pentium 4 machines have on the rest of your environment? If the Pentium II machines weren't putting a heavy load on the network because of the time they took to process data, the Pentium 4 machines will remove that network performance cushion. Resist the temptation to throw a bigger/faster/better/new/improved solution at a problem without analyzing the problem first. Treating the cause can prevent a recurrence of the symptoms; treating the symptoms will only mask the problem for a while, and when the problem resurfaces, it will likely be worse than before.
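The "network performance cushion" can be illustrated with a simple model. Suppose each workstation repeatedly fetches a chunk of data over the network, processes it, and then fetches the next chunk. The sketch below is a deliberately crude approximation: the machine count, the data size per job, and both processing times are hypothetical, and the model ignores transfer time and every other source of delay.

```python
def offered_load_mbps(machines: int, megabits_per_job: float,
                      processing_seconds_per_job: float) -> float:
    """Average network demand when each machine fetches one job's data,
    processes it, and immediately repeats (transfer time is ignored)."""
    if processing_seconds_per_job <= 0:
        raise ValueError("processing_seconds_per_job must be positive")
    return machines * megabits_per_job / processing_seconds_per_job


MACHINES = 20          # hypothetical department size
MEGABITS_PER_JOB = 8   # hypothetical: roughly 1MB of data per job

# Hypothetical processing times; real speedups depend on the workload,
# not just on clock speed.
old_load = offered_load_mbps(MACHINES, MEGABITS_PER_JOB, 40.0)
new_load = offered_load_mbps(MACHINES, MEGABITS_PER_JOB, 5.0)

print(f"Older machines: {old_load:.1f} Mbps offered load")
print(f"Newer machines: {new_load:.1f} Mbps offered load")
```

In this model, the department's demand on the network rises in direct proportion to how much faster the new machines finish each job, so a segment that was comfortable before the upgrade might be saturated after it.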
A well-running systems environment is more than the sum of its parts. It's an enabling technology for your company's business; you should get far more out of it than you need to put into it. You can't measure your computing environment's performance in megahertz, bytes per second, SPECint, or any other artificial test metric. The only metric that matters is how well your computing environment improves your business performance.