The demise of the infamous FSW warning

Among the detail buried in all the fuss and bother surrounding the release of the first cumulative update for Exchange 2013 were some changes to the High Availability system. Fortunately, Scott Schnoll, who must have clocked up more air miles than anyone else to educate people about the mysteries of the DAG, has documented the CU1 HA changes in his blog.

It’s easy to overlook detail, especially when the changes are subtle and will possibly not be used all that often. For instance, Scott points out that although the Set-DatabaseAvailabilityGroup cmdlet now boasts a SkipDagValidation switch, “it won’t be of much use to on-premises environments”. On the other hand, I’m sure that we are all relieved that the new parameter “has some usefulness for us in Exchange Online.” This raises the question of what conditions might make SkipDagValidation useful. Never mind… it’s just detail.

The changes made to the Update-MailboxDatabaseCopy cmdlet are more interesting in a practical sense because I think they will be used more often. Discovering that a database copy needs to be reseeded is always guaranteed to improve the mood of an administrator. A database reseed is not particularly efficient in Exchange 2010, and Microsoft improved matters in Exchange 2013 by allowing reseeding to occur in parallel from multiple copies. Further improvements are made in CU1 to enable better automation, probably driven by experience in “the service”, where I imagine that reseeds occur frequently due to disk failure. After all, the economics of cloud platforms are built around using commodity hardware, and commodity disks tend to fail more often than their enterprise counterparts. This does not matter so much when software can recover and repair itself automatically, hence the improvements in automation that we see across different aspects of Exchange high availability. Lagged database copies playing down transaction logs when disk space runs low is another example of such a change introduced in Exchange 2013.

But then we get to the most important change of the lot: the eradication of the infamous error posted when Exchange protests that the server hosting the file share witness for a DAG is not manageable by Exchange (see the screen shot). As Scott notes in his post, the problem has been around for a while and exists in both Exchange 2010 (fixed in Exchange 2010 SP2 RU5) and Exchange 2013 RTM (full details are available in another of Scott's HA blog posts). Basically, the problem happens when a server that does not run Exchange hosts the file share witness and Exchange gets it wrong when it checks whether it can manage that server. Experienced Exchange administrators know that adding the Exchange Trusted Subsystem group to the local administrators group on the server in question grants Exchange sufficient authority over it, so they ignore the warning message, probably because they also know that the DAG runs even when the warning is present.

However, if you have not seen the error before, it can be quite worrying to see Exchange protest that insufficient permission exists and that “the database availability group may be more vulnerable to failure”. This is exactly the kind of reassurance that people want when attempting to configure high availability. In fact, the warning is the result of a bug where “the code is actually checking to see if the witness server is a member of the Exchange Trusted Subsystem USG.” There’s absolutely no reason why the witness server needs to be a member of Exchange Trusted Subsystem, but there you are.
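
To make the mix-up concrete, here is a toy sketch in Python of the two checks as I read Scott's description. It is only a model: the function names, the server names (EXCH01, EXCH02, FS01) and the dictionaries are all invented for illustration and have nothing to do with how Exchange actually implements the test.

```python
# Hypothetical model of the witness-server check. None of these names are
# real Exchange APIs; the servers and group contents are made up.
TRUSTED_SUBSYSTEM = "Exchange Trusted Subsystem"


def buggy_check(witness_server, group_members):
    """The bug as described: is the witness server itself a member of the USG?"""
    return witness_server in group_members[TRUSTED_SUBSYSTEM]


def intended_check(witness_server, local_admins):
    """What grants authority: is the USG in the server's local Administrators group?"""
    return TRUSTED_SUBSYSTEM in local_admins[witness_server]


# Members of the Exchange Trusted Subsystem group (Exchange servers only).
group_members = {TRUSTED_SUBSYSTEM: {"EXCH01", "EXCH02"}}
# Local Administrators group on the non-Exchange witness server FS01.
local_admins = {"FS01": {"Domain Admins", TRUSTED_SUBSYSTEM}}

print(buggy_check("FS01", group_members))    # False, so the warning appears
print(intended_check("FS01", local_admins))  # True, the server is manageable
```

The point of the sketch is simply that the two questions are different: a correctly configured witness server passes the second check while always failing the first.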

The error is now fixed and the antacid tablets can be stowed safely to await the next bug. The nature of software is that one will be coming along any time now. Just stay tuned and you will see...

Follow Tony @12Knocksinna
