Reading the Exchange Solution Reviewed Program (ESRP) page for Exchange 2013, I was taken aback by the claims advanced by IBM and Hitachi in documents that purport to describe storage solutions capable of supporting the predicted I/O workload generated by 120,000 Exchange 2013 mailboxes. My jaw dropped as I read the documents because both solutions are radically inadequate.
Let me explain why such strong words are warranted. The IBM solution uses 24 servers in two Database Availability Groups. Nothing wrong here until you read:
“This solution utilizes Microsoft® Exchange 2013 Mailbox Resiliency with database availability groups (DAG). A two DAG solution comprised of 24 mailbox servers was created that supported a total of 120,000 mailboxes with a mailbox size of 2GB. Two XIV Gen3.2 frames were used, and the databases and copies were equally distributed across them. Each server hosted 5,000 users, and had 12 active databases, with 833 users per database. Within the DAG, there were two copies of every database; one local, and one on another server connected to a second XIV Gen3.2 storage array. This configuration can provide for both high-availability, and disaster-recovery scenarios.”
The problem stares us in the face: each database has just two copies. Why, I ask myself, would I want to deploy a solution that supports so many users in a configuration that exploits Exchange’s built-in high availability features without supplying sufficient capacity to protect the databases against failure? After all, a 12-node DAG has tons of potential to support more than two copies per database. In fact, you could have up to 12 copies (a tad excessive) of each database if you wanted to deploy sufficient hardware. Three (or even better, four) copies of each database would be much more reasonable for anyone seeking true high availability.
The core problem in IBM’s configuration is that the servers are deficient in memory and storage. In fact, they have just enough to be able to run the JetStress tests to validate that the configuration can support the theoretical I/O load that might be generated by 120,000 simulated users. Not enough storage capacity or RAM is available to support sufficient database copies to “provide for both high availability and disaster-recovery”.
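The capacity gap is easy to see with some back-of-the-envelope arithmetic. Here's a minimal sketch (in Python, purely for illustration) using the figures from the ESRP document — 120,000 mailboxes at 2 GB each — to show how raw mailbox data grows with the number of database copies; the helper name and the round TB conversion are my own, and the totals ignore transaction logs, content indexes, and free-space overhead, which would push the real requirement higher still:

```python
def raw_mailbox_tb(mailboxes: int, mailbox_gb: int, copies: int) -> float:
    """Raw mailbox data in TB for a given number of database copies.

    Deliberately simplistic: excludes logs, indexes, and overhead.
    """
    return mailboxes * mailbox_gb * copies / 1024

# Figures from the vendor document: 120,000 mailboxes, 2 GB each.
for copies in (2, 3, 4):
    tb = raw_mailbox_tb(120_000, 2, copies)
    print(f"{copies} copies per database: ~{tb:,.0f} TB of raw mailbox data")
```

Two copies already demand roughly 469 TB of raw mailbox data; stepping up to the three or four copies a real high-availability design calls for adds hundreds of terabytes more, which is exactly the capacity these JetStress-sized configurations don't have.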
Bad as the IBM configuration is, I think the Hitachi configuration is much worse. The introduction proudly proclaims:
“This solution includes Exchange 2013 Mailbox Resiliency by using the database availability group (DAG) feature. This tested configuration uses twelve DAGs, each containing twenty four database copies and two servers (one simulated). The test configuration was capable of supporting 120,000 users with a 0.12 IOPS per user profile and a user mailbox size of 2 GB. “
Once again we see an obvious problem. The IBM solution at least follows the principle that a large DAG is usually better than a small DAG because a larger number of DAG members allows for greater flexibility in database copies and placement. The Hitachi solution is ridiculous and demonstrates a total lack of knowledge about how DAGs work. Why anyone would think that deploying twelve two-member DAGs would be anything close to a good configuration for 120,000 mailboxes is beyond me. In fact, it’s a joke configuration that is designed with one purpose in mind and that’s to pass the theoretical challenge posed by JetStress.
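It's worth noting just how modest the I/O target actually is. A quick sketch (in Python, for illustration only) using the figures quoted from the Hitachi document — 120,000 users at 0.12 IOPS per user — shows the aggregate load JetStress validates against, and how thinly it spreads across twelve DAGs; the function name and the per-DAG split are my own framing:

```python
def aggregate_iops(users: int, iops_per_user: float) -> float:
    """Aggregate steady-state IOPS implied by a per-user I/O profile."""
    return users * iops_per_user

# Figures from the vendor document: 120,000 users at 0.12 IOPS each.
total = aggregate_iops(120_000, 0.12)
per_dag = total / 12  # the Hitachi design splits the load across 12 DAGs
print(f"Total target: {total:,.0f} IOPS (~{per_dag:,.0f} IOPS per DAG)")
```

Hitting a 14,400 IOPS target on enterprise storage says nothing about whether twelve two-member DAGs can actually protect those mailboxes, which is the point: passing JetStress is a floor, not a design.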
Neither configuration is suited for a production system. Both are designed solely to test storage and lack the resources necessary to handle the full workload of multirole DAG member servers deployed into real-life environments. And both use storage-based protection (RAID) to justify minimal DAG-based protection against failure, which is fine if you want to depend on just two database copies... Even with the undoubted goodness that RAID and enterprise-class disks can provide over JBOD, two database copies are insufficient.
It’s a great idea for Microsoft to provide a single location where vendors can post their results for different Exchange 2013 storage configurations. All the flaws and benefits of the various approaches that can be taken in hardware planning for different scenarios are exposed in one place. But hardware vendors do themselves no good by posting results that are laughable, obviously deficient, and totally impractical in the real world. All of which goes to prove that you should never accept documents that describe vendor-supplied configurations at face value. Always validate the described configurations and results using your own knowledge and experience of how software works in production environments, or by running your own tests. After all, anyone can download and run the JetStress tool.
If IBM and Hitachi really understood Exchange they wouldn’t have come up with these configurations. A few minutes playing with the Exchange 2013 server role calculator would have resulted in much better configurations. But maybe the IBM and Hitachi hardware wouldn’t have done so well using those configurations…
Follow Tony @12Knocksinna