Benchmarks Gone Bad

Benchmarks are serious business in the database industry. World-record benchmark scores mean publicity and product recognition. That's why Microsoft, IBM, and Oracle spend hundreds of thousands of dollars putting together Transaction Processing Performance Council (TPC) benchmark assaults in hopes of becoming the database King of the Hill. However, DBAs and developers often question the value of these benchmark scores, citing the fundamental differences between the benchmarked environments and their real-world production environments. Benchmarked systems typically use above-average database server hardware. The disk subsystem is configured for performance with RAID 0 rather than for recoverability with RAID 1 or 5. The benchmark application, although modeled after a real order-entry and shipping scenario, isn't actually in use anywhere, and its database schema and data are nothing like what's in your own systems.

But these differences don't mean that benchmarks have no value. The TPC-C benchmark tests, in particular, demonstrate the high-end scalability of a database system, and they tie performance scores to the total cost of the tested system. Although the tests don't represent the transaction rates you'll see for your application, they do show how each database system reacts under a comparable load. These tests succeed because the TPC solicits input from all its members, creating an environment that's fair to all tested products, and because the TPC serves as an independent auditor of the results, focused solely on the benchmarking process. In addition, the vendors themselves conduct the tests, so there's no doubt about the expertise available to install the system or fine-tune the database.

However, not all database benchmarks are created equal. The same qualities that make the TPC tests successful are precisely what magazines and other interested parties, such as the big consulting firms, lack when they run database benchmarks of their own. Recent eWeek tests provide a high-profile example of a database benchmark gone wrong, at least from the SQL Server perspective. First, eWeek ran the test suite as a Web application, which doesn't seem unusual until you consider that the testing staff didn't use Microsoft IIS; instead, eWeek opted for BEA's WebLogic application server. Second, the application was written in Java, which is certainly not representative of any production SQL Server Web application that I've ever seen. Furthermore, eWeek performed the tests with a beta version of the recently released Microsoft JDBC driver.

In light of such a skewed starting point, it's hard to have any confidence in the testing staff's ability to perform more sophisticated database installation and tuning. To its credit, eWeek acknowledged that its tests had a middleware dependency, then ran an additional round of tests that showed vastly better SQL Server results from a more typical SQL Server installation. But the magazine still published the Java-based results. Unfortunately, one poorly conducted benchmark casts doubt on other benchmarks, such as the TPC's, whose sponsoring organization works hard to set up a fair test.
