Web benchmarks are useful for gathering a lot of information about your sites and servers. For example, you can use Web benchmarks to load test hardware before you deploy it to determine its stability under load, find out your existing hardware's capabilities, find bottlenecks in your applications, or even determine which Web server software or platform will best meet your needs. (Many different Web benchmarks are available on the Internet. See Table 1 for a list of common tests.)
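To make the idea of load testing concrete, here's a minimal sketch of what a benchmark tool does at its core: issue many requests against a server and measure throughput. This is strictly illustrative (it's none of the tools in Table 1), and the local throwaway server and request count are assumptions chosen so the example is self-contained.

```python
import threading
import time
import urllib.request
from http.server import HTTPServer, SimpleHTTPRequestHandler

class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):
        # Silence per-request log lines so timing output stays readable.
        pass

def run_load_test(url, num_requests):
    """Issue num_requests sequential GETs; return (successes, requests/sec)."""
    successes = 0
    start = time.perf_counter()
    for _ in range(num_requests):
        with urllib.request.urlopen(url) as resp:
            if resp.status == 200:
                successes += 1
    elapsed = time.perf_counter() - start
    return successes, num_requests / elapsed

# Start a throwaway local HTTP server (ephemeral port) to test against.
server = HTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

ok, rps = run_load_test(f"http://127.0.0.1:{server.server_port}/", 50)
print(f"{ok} successful requests, {rps:.1f} requests/sec")
server.shutdown()
```

A real benchmark adds concurrency, warm-up periods, and a realistic request mix, but the measurement loop is the same in spirit.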
You might never have to test with a specific benchmark for that benchmark to be useful to you. With all the competition in the hardware and software markets today, many vendors are publishing their own results to demonstrate their equipment's capabilities.
For example, many vendors use the SPECweb99 benchmarking software to test their products. SPECweb99, written by the Standard Performance Evaluation Corporation (SPEC), is the most recent release of this benchmark; the first release, in 1996, was SPECweb96. SPEC offers many other benchmarks, and I recommend that you check out its Web site (http://www.spec.org) for further information. Many top hardware vendors use the SPECweb99 benchmark to demonstrate their hardware's power in Web serving environments that use different Web server software and OSs.
I'm going to teach you how to make sense of Web benchmarks. I give you a brief explanation of how to read a benchmark, followed by a comparison of specific sets of results from different benchmarks. Later in this article, I take a more analytical look at some of these results and show you how to interpret them for useful information.
Here are a few items to keep in mind when you're looking at benchmarks so that you can interpret them without jumping to conclusions or being misled by the information that's presented. (The information in this section applies to any benchmark, whether it's Web, CPU, or I/O related.) Begin by reading about the benchmark. Information to look for when you're reading includes methods used during testing, the hardware and software tested, and any special changes the testers made to the operating environment during testing. By paying attention to these details, you can better understand how the testers achieved their results.
Start by looking at the benchmark itself. Is it an industry-standard benchmark with no affiliation to a particular hardware or software manufacturer? If not, who came up with the benchmark? For example, it wouldn't be fair for Microsoft to create a benchmark for Windows NT, then test NT against Sun Microsystems' Solaris. In such a case, Microsoft might have conducted the test to produce favorable results. After all, Microsoft is trying to sell NT, not Solaris. You need to look at the results objectively; benchmarks can be deceiving.
Next, does the benchmark compare similar items (e.g., comparing two servers or two OSs)? If the compared items aren't similar (e.g., comparing an Alpha processor to an Intel processor), the results are more difficult to interpret. Did the testers take special steps to prepare the items they were testing before they executed the tests? For example, in the case of OSs, if you're reading results of similar OSs, were they tweaked at all? If so, were they tweaked in the same way? I've seen tests that pit two Web servers on different platforms against each other with OSs tuned in different ways.
When looking at any kind of benchmark, you should view it as you would any sort of sporting event. No team or player should have an unfair advantage. As consumers and users, we want to see a good clean match in which each contestant has a fair chance to prove that it deserves our business.
Comparing Benchmark Results
Before you read on, look at the results that Table 2, page 6, shows. I chose to compare the SPECweb96 and SPECweb99 benchmarks because they're good standards-based benchmarks that a third party has created. Many hardware manufacturers use these benchmarks to performance test Web server software on their servers. SPEC has benchmarks that test everything from Java Virtual Machine (JVM) performance to CPU performance.
Table 2 compares the SPECweb96 results with and without Microsoft Scalable Web Cache (SWC) software. Looking at the results, you can draw several quick observations. In the SPECweb96 testing procedure, IIS served only static pages. Because IIS served no dynamic pages during this test, you can see why using a Web caching engine provides better results.
Next, notice that the results vary among the one-, two-, and four-processor tests. The results for the four-processor IIS 5.0 with SWC 2.0 test are much better than the results for the four-processor IIS 4.0 SWC 1.1 test. (However, remember that you should compare only similar items, so you can exclude the results from the IIS 5.0 machine with SWC 2.0.) Notice that the IIS 4.0 results indicate that IIS doesn't scale linearly between one and two processors, but it scales well between one and four processors.
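You can quantify observations like these with a simple scaling-efficiency calculation: measured speedup divided by the ideal (linear) speedup from adding processors. The scores below are hypothetical stand-ins, not Table 2's actual numbers, so only the method is meant to carry over.

```python
def scaling_efficiency(score_small, score_large, cpus_small, cpus_large):
    """Fraction of ideal linear scaling achieved; 1.0 means perfectly linear."""
    speedup = score_large / score_small
    ideal = cpus_large / cpus_small
    return speedup / ideal

# Hypothetical SPECweb-style scores for illustration only.
one_cpu, two_cpu, four_cpu = 1000, 1600, 3500

print(f"1->2 CPUs: {scaling_efficiency(one_cpu, two_cpu, 1, 2):.0%} of linear")
print(f"1->4 CPUs: {scaling_efficiency(one_cpu, four_cpu, 1, 4):.0%} of linear")
```

With these made-up numbers, doubling the processors yields only 80 percent of linear scaling, while the four-processor configuration fares better at 87.5 percent; the same arithmetic applies to any published per-processor results.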
Now, looking at the non-SWC results for IIS 4.0, note that the scaling pattern is almost the same as in the results with SWC. However, also note the much smaller scores. The smaller scores result from the fact that the test doesn't compare similar items: in one instance, the testers are caching the data; in the other, they aren't. If you don't take the hardware into consideration and you're serving only static content, you can see the advantage of installing the SWC software.
I included the SPECweb99 results strictly for the purpose of demonstrating how you might see results presented in the wild. Knowing that you can't fairly compare dissimilar environments, if you were to compare the SPECweb96 and the SPECweb99 results, you would first investigate the testing methods. After your investigation, you'd realize that with the new revision, the testers have changed how they test. Instead of strictly static content, the testers have added some dynamic content to the workload to better represent what's on the Internet today.
Finally, you must consider any tweaking or tuning the testers performed and whether the tests ran in a special environment. In the case of the SPECweb test examples, many of the tests share common modifications and special environments that you might not see in the field unless money is no object. Without going into great detail, here are the changes that the SPECweb testers made. The testers took great care to tweak the main subsystems, including disk I/O, network I/O, and processor load, tuning the OS, IIS, and the networking stack through many nonstandard registry changes. In addition, the testers created a special hardware environment that included gigabit network cards and multiple hard drives, which separated the OS and page files. Some of these changes might not be practical in the field.
If you're performance testing Web software with SPECweb9X testing software, make sure that your testing environment closely mimics your production environment. If you don't want to spend a lot of money to buy the SPECweb9X testing software, you can use any tool that Table 1 shows to create your own tests. Just remember that whenever you're testing, no matter how small the change is, always retest to see what kind of improvements (if any) you made.
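The retest-after-every-change advice boils down to comparing a baseline run against a post-change run. A tiny helper makes the habit explicit; the throughput figures here are hypothetical examples, not measurements from any real system.

```python
def percent_change(baseline, after):
    """Percentage improvement (positive) or regression (negative) vs. baseline."""
    return (after - baseline) / baseline * 100

# Hypothetical requests/sec measured before and after one tuning change.
baseline_rps = 420.0
tuned_rps = 460.0

change = percent_change(baseline_rps, tuned_rps)
print(f"Throughput change after tuning: {change:+.1f}%")
```

Recording one such comparison per change, rather than batching several changes into a single retest, tells you which modification actually produced the improvement (or the regression).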