Understanding the overhead in an Exchange mailbox database

It’s taken me a while to get around to mentioning the rather useful “Database Growth Reporting” script for Exchange 2010 and Exchange 2013 that was described on the EHLO blog in January 2014. My apologies for this lapse in service. All I can plead is that other stuff (like Exchange 2013 SP1 and MEC and then Exchange 2016) got in the way between then and now and that I never really had a chance to test the code thoroughly, which is a prerequisite before commenting on software. Or at least, it should be.

But back to ExchangeDatabaseGrowthReporting.ps1. There are two reasons why I like the script very much. The first is that its authors work for Microsoft’s support team (at least, their job titles as listed are “Senior Support Escalation Engineers”). This isn’t to say that every piece of software generated by support personnel is brilliant and awe-inspiring, but the undeniable fact is that these folk see a different face of software to most of us. Instead of the bright and wonderful picture painted by the marketing and sales people, support personnel have to deal with the dank underbelly of software, the place that no one really wants to go unless absolutely necessary.

And because they see all the problems that afflict software and have the chance to discuss these issues with their colleagues, support personnel often come up with really helpful solutions for the rest of us. The Remote Connectivity Analyzer (ExRCA) and the Office Configuration Analyzer Tool (OFFCAT) are other examples of support tools that add enormous value to Exchange administrators.

The database reporting script might not be the most elegant piece of software ever created but it does a job by focusing on a fact that many forget when they work with Exchange. The heart of the product is a set of one or more databases. Sure, despite the loud calls of many, the database is not serviced by a “real” engine like SQL. Exchange has to “make do” with Jet. I say this with tongue firmly in cheek because Jet has done a superb job for Exchange since 1996…

Because databases are involved, some overhead has to be expected. No database is perfectly efficient when it comes to internal storage. In the case of Exchange, a heck of a lot of transactions flow in and out of its databases and those transactions are multidimensional in nature. No one mail message is similar to another. They all seem to have different body parts, number of recipients, properties, and so on. Jet gobbles everything up and keeps on working, which is exactly what you want it to do.

Helping administrators to understand how much overhead is incurred by normal database operations is the second reason I like the reporting script. Why is this important? Well, we have to size a database to understand how many mailboxes will “fit” into it. In the old days, this was simple enough because we’d multiply the mailbox quota by the number of mailboxes to get an approximate size and then add 20% for contingency and growth. Because mailbox quotas were so small (storage was expensive), most users tended to use the majority of the quota (thus leading to many happy hours spent deleting messages to free space so that new mail could be delivered), and the sizing was pretty accurate. Of course, Jet was a lot less efficient in those days, but even so, the sizing process was much simpler.
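The old rule of thumb is simple enough to write down as a few lines of Python. This is only a sketch of the arithmetic described above: the function name and the sample figures (500 mailboxes with a 2 GB quota) are made up for illustration and don't come from any Microsoft sizing tool.

```python
def estimate_database_size_gb(quota_gb, mailbox_count, contingency=0.20):
    """Old-style sizing: quota x mailboxes, plus a contingency allowance.

    The 20% default mirrors the contingency-and-growth figure in the text.
    """
    if quota_gb < 0 or mailbox_count < 0 or contingency < 0:
        raise ValueError("inputs must not be negative")
    base_gb = quota_gb * mailbox_count
    return base_gb * (1 + contingency)


# Hypothetical example: 500 mailboxes, each with a 2 GB quota.
print(f"{estimate_database_size_gb(2, 500):.0f} GB")
```

The calculation assumes that every user fills their quota, which is exactly the assumption that held when quotas were small.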

Now we have much larger mailbox quotas (which people don’t fill) so the old mailbox quota * number of mailboxes calculation is far too simplistic to work, especially when you take other factors into account such as mailbox holds (litigation, retention, and in-place), single item recovery, calendar logging, and so on. The fact that each Exchange mailbox has a separate quota for the recoverable items folder (30GB by default) is an indication of how much data can be retained in a mailbox without a user being aware of anything. All of this means that it’s hard to know exactly what Exchange stores in a mailbox database and it’s hard to figure out what size a database is likely to be on disk.

Which brings us back to the reporting script. If you run the script against any production database (sure, try it against a test database for a start, but a production database will report very different information because of the usage pattern that it goes through), you’ll get a real insight into the amount of information stored in mailboxes and folders and how efficient the database is. The authors expect an overhead of 20%, but some databases that I ran the script against reported figures much higher, provoking more questions as to why this might be. And once you understand what's really happening inside a database, you'll be better able to explain why that database is the size that it is and why it might be expected to grow over time.
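To make the idea of overhead concrete, here is a rough Python sketch of one way to express it. I'm assuming that overhead means the part of the database file that is neither mailbox content nor available whitespace, taken as a percentage of the mailbox content. The script may well calculate its figure differently, and the function name and numbers below are hypothetical.

```python
def overhead_percent(database_file_gb, mailbox_data_gb, whitespace_gb=0.0):
    """Share of the file not explained by mailbox data or free whitespace.

    Assumption: overhead is measured relative to the mailbox data, so a
    120 GB file holding 100 GB of mailbox data has a 20% overhead.
    """
    if mailbox_data_gb <= 0:
        raise ValueError("mailbox_data_gb must be greater than zero")
    unexplained_gb = database_file_gb - whitespace_gb - mailbox_data_gb
    return 100 * unexplained_gb / mailbox_data_gb


# Hypothetical figures: 150 GB on disk, 10 GB whitespace, 100 GB of mailbox data.
print(f"{overhead_percent(150, 100, whitespace_gb=10):.1f}% overhead")
```

A result well above the 20% that the authors expect is the cue to start asking what else the database is holding on to.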

In any case, I like ExchangeDatabaseGrowthReporting.ps1 and recommend it to you as a tool that might fit in your Exchange toolbox. Or at least, something to run when you’re not busy and have come to wonder just what’s going on inside your databases… as we all do from time to time.

Follow Tony @12Knocksinna
