Benchmarks: How Useful Are They in Real-World Deployments?

One of the most common questions I get relates to the published benchmark results for Exchange Server. The Messaging API (MAPI) Messaging Benchmark (MMB) has become the de facto standard for measuring Exchange Server scalability. But does this benchmark really reflect scalability or deployable configurations?

To start, you need to understand the vendor benchmarking game. I've watched the game play out over the last few years, since Microsoft first shipped Exchange Server. The hardware vendors fall into two camps: those who do the work and those who don’t. The vendors who do the work pride themselves on publishing results that push Exchange Server MMB scores to the highest level. The no-work vendors simply try to outperform the previous record result by getting 5 or 10 more MMBs out of a server configuration, or they rely on Intel to do the work for them. The game is a leapfrog competition in which each vendor aims to beat the most recently published, highest MMB result. This approach is similar to what occurs in the Transaction Processing Performance Council (TPC) world with benchmarks such as TPC-C and TPC-D. The latest MMB results are at stratospheric levels in the low 30,000 MMB range. The most irritating thing about the game is that results have become unusable for Exchange Server deployment planners trying to understand how many users per server they can deploy.

So, is this scenario going to get better in the future? Fortunately, relief is in sight: Microsoft has recognized that MMB results do not reflect actual deployment scenarios. Again, this situation is similar to what happens in the TPC world. When the current benchmark standard has outlived its usefulness, you define a new one. For Exchange 2000, I anticipate Microsoft will reposition Exchange Server MAPI benchmarking and attempt to create new standards for Internet messaging protocols such as IMAP and POP3. Working with hardware partners, Microsoft will define more realistic benchmarking workloads that are closer representations of real-world deployments. For example, work is already underway on a new benchmark tentatively known as MMB2. MMB2 will continue to use LoadSim but will take the standard medium canonical profile to a new load level. Benchmark developers will accomplish this task by adjusting load parameters such as message and attachment sizes, distribution list usage, send/receive ratios, calendaring and public folder usage, and other MAPI tasks that dramatically increase server load. The focus is on disk I/O and CPU utilization, performance areas that the current MMB workload grossly underestimates. Although no benchmark can perfectly represent your environment, the hope is that new benchmark definitions for Exchange 2000 Server will give planners a more realistic comparison that represents real users.
