Based on some of the Twitter traffic that I have received recently, my post “NFS and Exchange – not a good combination” appeared to have hit a nerve with some NFS vendors. People working at Nutanix and EMC used Twitter to tell me how wrong I was in saying that NFS might corrupt Exchange databases. The debate was largely constructive and I had no problem with anything that was said. There’s also an interesting discussion going on about the wisdom of Exchange support for NFS in the Ideas Forum for Exchange, if you’d like to check that out. In addition, a group of NFS advocates have stated their views on the topic in the Exchange Server Development forum on TechNet (see “Support for Exchange Databases running within VMDKs on NFS datastores” - read the comments as well as the original text).
Getting back to the original post, my perspective is that I was simply reporting the official support stance as set down by the Microsoft Exchange development group. I do not work for Microsoft (and never have) and do not have access to their source code. Nor do I understand what precise evidence they have to justify holding so robustly to their stance that block-level storage is the only supported platform for Exchange. However, I consider the allegations of incompetence levelled at Microsoft that I have seen in some tweets to be both unfair and inaccurate. Sure, I have met some people who worked for Microsoft over the years who I thought were incapable, but I have a high respect for those who stand over the statement about block-mode storage.
When it comes to storage, following a discussion with Jeff Mealiffe, a well-known authority on performance and storage in the Exchange development group, my understanding is that Microsoft has many of the same concerns for SQL databases as it has for Exchange. The SQL team has put together the requirements for storage vendors in a document that is reasonably understandable, even for people who are not storage professionals. Specifically for Exchange, Jeff emphasized that the problem areas for NFS are:
- Forced Unit Access (FUA) and Write-Through, including statements such as “All components in a solution must honor the write-to-stable media intent. This includes, but is not limited to, caching components.”
- Write Ordering. This is required to preserve the integrity of transactions going to the database (you clearly don’t want data written in the wrong order).
- Torn I/O Protection. The document says that “a solution must provide sector alignment and sizing in a way that prevents torn I/O including splitting I/Os across various I/O entities in the I/O path.” In other words, storage must ensure that all of the data for a transaction is written and must never report success when only a partial write has occurred. (This presentation is a quick read for those who want more information on data corruption.)
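To make the first and third requirements a little more concrete, here is a minimal Python sketch of what "write-through intent" and "torn I/O" look like from an application's point of view. It is purely an illustration and not how Exchange or SQL Server is implemented: the 4 KB page size, the CRC32 header layout, and every function name are assumptions made up for the example. The call to `os.fsync` only expresses the intent to reach stable media; whether the data really gets there depends on every layer underneath honoring the request, which is exactly what the quoted requirement is about.

```python
import os
import struct
import zlib

PAGE_SIZE = 4096               # hypothetical page size, chosen for illustration
HEADER = struct.Struct("<I")   # 4-byte CRC32 stored at the start of each page


def build_page(payload: bytes) -> bytes:
    """Pad the payload to a full page and prefix it with a checksum."""
    body_size = PAGE_SIZE - HEADER.size
    if len(payload) > body_size:
        raise ValueError("payload too large for one page")
    body = payload.ljust(body_size, b"\x00")
    return HEADER.pack(zlib.crc32(body)) + body


def page_is_intact(page: bytes) -> bool:
    """Return True only if the stored checksum matches the page body."""
    if len(page) != PAGE_SIZE:
        return False
    (stored,) = HEADER.unpack_from(page)
    return stored == zlib.crc32(page[HEADER.size:])


def write_page_durably(path: str, page_no: int, page: bytes) -> None:
    """Write one page and ask the OS to flush it to stable media."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
        os.write(fd, page)
        # Expresses write-through intent; a cache that ignores this
        # request further down the stack defeats the guarantee.
        os.fsync(fd)
    finally:
        os.close(fd)


def read_page(path: str, page_no: int) -> bytes:
    """Read one page back from the file."""
    with open(path, "rb") as f:
        f.seek(page_no * PAGE_SIZE)
        return f.read(PAGE_SIZE)


def simulate_torn_write(old_page: bytes, new_page: bytes) -> bytes:
    """Model a torn I/O: only the first half of the new page reached disk."""
    half = PAGE_SIZE // 2
    return new_page[:half] + old_page[half:]
```

In this sketch, a page produced by `simulate_torn_write` fails `page_is_intact` because the checksum in the header belongs to the new page while half of the body still belongs to the old one. A checksum like this can only detect the damage after the fact. The storage requirements exist so that the damage does not happen in the first place.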
Storage is a complex area and solutions that want to support applications like SQL or Exchange have to comply with all of the requirements set down by the developers. After all, the documented requirements form the basis for testing the application and if a solution doesn’t comply with one or more of the requirements then the application could be presented with invalid, corrupt, or incomplete data. None of these are good scenarios.
Major players like VMware certainly do the work to validate their parts of the equation. For instance, this VMware knowledge base article “provides information about NFS datastore deployment in VMware ESX, and confirms that NFS in an ESX environment maintains write ordering and write-through integrity for such applications.”
I am not saying that NFS solutions are guilty of failing to do the right thing when it comes to interacting with Exchange. I am saying that getting everything right is complex, especially when virtual platforms are thrown into the mix. Now you might have a combination of a hypervisor from one vendor working with storage from another dealing with an operating system and application from Microsoft. Given the speed of software and hardware development today, the mix that you deal with in February 2014 might not be (and probably will not be) the same in June.
The complexity and cost of the support matrix that would result if NFS were embraced lie at the heart of Microsoft’s objections. As an illustration, it is possible to use tools like JetStress to test one combination of Exchange (a specific build) against VMware (again a specific build) with the databases running on NFS storage from a particular vendor, let’s say NetApp. No doubt NetApp will specify a certain version of firmware and hardware for the test. And after the test period is over and the results collated and verified, you have a single approved NFS solution. Now repeat for each change in the combination: new cumulative updates for Exchange (now arriving at a quarterly cadence), updates for the hypervisor, and product updates and new solutions from NFS vendors that you care to approve for deployment. Even if every vendor takes responsibility for performing the tests and publishing results through a process similar to the existing Exchange Solution Reviewed Program (ESRP), you still have to grapple with a) potentially unworkable solutions being tested (something we have already seen in some ESRP configurations), b) the issue of updating results as elements of the solution change, and c) customer support and the interaction between Microsoft and the other companies who own part of a solution. The result could be an unworkable quagmire.
Those working on NFS-based solutions are as competent as their Microsoft counterparts. I can easily imagine that they are frustrated by Microsoft’s hard-line attitude. I am sure that they have tons of examples of how well NFS works with Exchange that they have gathered and documented in their development labs and in real-life customer engagements. Indeed, Josh Odgers of Nutanix reported in the ideas forum that he had:
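To give a rough feel for how quickly that matrix grows, here is a small Python sketch. Every name and count in it is a made-up placeholder rather than a real build, product, or figure from Microsoft or any vendor; the only point is the multiplication.

```python
from itertools import product

# Hypothetical placeholders: none of these are real builds or products.
exchange_builds = ["exchange-build-1", "exchange-build-2", "exchange-build-3"]
hypervisor_builds = ["hypervisor-build-1", "hypervisor-build-2"]
nfs_platforms = ["nfs-vendor-a-fw1", "nfs-vendor-a-fw2",
                 "nfs-vendor-b-fw1", "nfs-vendor-c-fw1"]


def support_matrix(exchange, hypervisors, storage):
    """Every combination that would need its own validated test run."""
    return list(product(exchange, hypervisors, storage))


matrix = support_matrix(exchange_builds, hypervisor_builds, nfs_platforms)
print(len(matrix))  # 3 * 2 * 4 = 24 combinations

# One more quarterly Exchange update adds a test run for every
# hypervisor/storage pairing, not just one extra run.
grown = support_matrix(exchange_builds + ["exchange-build-4"],
                       hypervisor_builds, nfs_platforms)
print(len(grown) - len(matrix))  # 2 * 4 = 8 additional combinations
```

Even with these tiny placeholder lists, a single new release on any one axis multiplies through the other two, and each resulting combination is a separate multi-hour test that someone has to run, verify, and keep current.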
“Just ran an Exchange ESRP 24 hour test on a Nutanix NX-3451 with Exchange 2013 in a Windows 2008 VM w/ 8 VMDKs on a single NFS datastore and passed! (I was not at all surprised!)”
It’s only natural that they want to gain from their work by being able to sell fully-supported NFS-based storage solutions for Exchange, which I guess is why three questions are outlined in the forum contribution referred to earlier:
- “Can you clarify by providing some form of documentation what the issue is with Exchange on NFS natively. The goal (is) to ensure if there is an issue, it’s understood by the community
- Can you clarify by providing some form of documentation what the issue is with Exchange storage in a VMDK on an NFS datastore (where the storage is abstracted by the hypervisor). The goal again is to ensure if there is an issue, it’s understood by the community and can possibly be addressed in future hypervisors.
- If the support statement is simply outdated and needs updating, let’s work together to make it happen for the good of all Microsoft’s customers, especially those who have NFS storage from one of the many vendors in the market.”
Put more succinctly, the NFS community want the Exchange development group to clearly explain their issues with NFS so that efforts can be made to put NFS into a supportable state. It’s a reasonable ask.
On the Microsoft side, it’s fair to say that no development group wants to take on an extra support load. Support consumes a huge amount of effort and expense, especially when the product is popular. NFS-based solutions are attractive from a cost perspective and it is therefore logical that if Microsoft supported NFS for Exchange these solutions would soon take a reasonable slice of Exchange deployments, especially in the small-to-medium segments. The size of that opportunity is what attracts NFS vendors and drives them in their desire for NFS to be a supported platform.
But when things go wrong with an Exchange database running on NFS storage, the customer’s first call is always going to be to Microsoft. After all, they want to get their database up and clients connected again as quickly as possible and Microsoft support is a natural first port of call. It’s true that databases configured with sufficient copies in a DAG should be well protected and a copy will be activated to restore service, but it is conceivable that a hardware-induced corruption could overwhelm all copies (this is the reason why lagged copies and backups exist) and render a database inoperative. As the support case evolves, it might turn out that Exchange is at fault, but perhaps the root of Microsoft’s concern is how to resolve problems if the hardware is at fault.
Companies with solid engineering and strong track records of successful products plus the capacity to help customers deploy and use NFS solutions in the right way would do the right thing and prosper. On the other hand, you can imagine how other players might seek to enter the market using low-priced solutions that don’t work as well as you’d hope. And customers who use those solutions in tandem with Exchange will seek support from Microsoft when things go wrong, something that could create a support tsunami.
For all of the perceived risks that Microsoft might see in supporting NFS solutions for Exchange, I imagine that a substantial upside exists too. Microsoft has spent the last decade beating the “I/O reduction” drum in a crusade that has transformed Exchange from a point (Exchange 2003) where expensive SAN-based deployments were the norm to now (Exchange 2013) where low-cost DAS solutions are the preferred storage choice for many, including Microsoft’s own Exchange Online service within Office 365. It seems to make good commercial sense to give customers even more choice in storage platforms, especially if more cost can be taken out. The caveat is that Microsoft must be satisfied that it can support NFS-based deployments as efficiently and effectively as it can for DAS today. That will take an enormous amount of effort from both the NFS vendors and Microsoft.
Age lends a certain wisdom. Over my career I have learned that there are usually two sides to every technical debate. I think sufficient evidence exists to warrant Microsoft supporting some form of NFS for Exchange. I’ve encouraged the NFS vendors who have contacted me to use the opportunity presented by the upcoming Microsoft Exchange Conference in Austin to sit down with Microsoft and thrash things out. I'd even be willing to host a debate between the two sides at MEC. That might be quite fun!
Update (September 29, 2014): Nutanix global director of solutions and performance engineering (some title that!) Lukas Lundell has published a blog post acknowledging that full support from Microsoft is only obtainable when customers deploy Exchange using supported storage (which doesn't include NFS) like SMB 3.0. On the other hand, he also says "we see no technical issues with NFS-based datastores with VMware, so we have agreed to support customers who would prefer this configuration for the benefits it provides." I guess the debate continues...
Follow Tony @12Knocksinna