The influence Managed Availability has over DAGs that you might not realize

By now it is well known that the Managed Availability framework is one of the major developments in Exchange 2013. The heritage of Managed Availability lies firmly within the Office 365 service, where it was developed to help the Exchange developers rest easy at night, safe in the knowledge that they would not be disturbed and dragged out of bed to resolve problems that occurred within “the service”. Managed Availability is intended to detect, assess, and resolve common problems without human intervention. In short, it gives Exchange 2013 a self-healing capability that makes the product a very different beast in the eyes of monitoring products such as SCOM.

The influence of Managed Availability is pervasive across the whole of Exchange 2013. Recently we have seen just how pervasive that influence is, and how problems can occur when Managed Availability has a hiccup. Software is never perfect, so it is not surprising that issues like this surface as administrators get used to working with Managed Availability. Overall, I think that Managed Availability is a very good thing and that it will improve over time.

Getting back to the topic of this post, Managed Availability works closely with Active Manager to ensure that active database copies are positioned on the healthiest available server within a Database Availability Group (DAG). In other words, if Managed Availability detects a problem with the server that currently hosts a database, it will ask Active Manager whether a healthier server exists. If such a server is available, Active Manager will perform a switchover and activate the database copy on that server.
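
To make the idea concrete, here is a deliberately simplified model of that decision in Python. It is not Active Manager’s actual selection logic, which weighs many more factors; the Server class, the choose_target function, and the notion of counting unhealthy health sets as a health score are all hypothetical, chosen only to illustrate “move the database if, and only if, a healthier server holds a copy”.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Server:
    name: str
    # Hypothetical: health sets that Managed Availability currently reports as unhealthy
    unhealthy_sets: Set[str] = field(default_factory=set)


def choose_target(active: Server, copy_holders: List[Server]) -> Optional[Server]:
    """Toy model: return a healthier server that holds a copy of the database,
    or None if the database should stay where it is."""
    if not active.unhealthy_sets:
        return None  # current host looks healthy, so no switchover
    candidates = [s for s in copy_holders if s is not active]
    if not candidates:
        return None  # no other copy available to activate
    # Hypothetical scoring: fewer unhealthy health sets means a healthier server
    best = min(candidates, key=lambda s: len(s.unhealthy_sets))
    if len(best.unhealthy_sets) < len(active.unhealthy_sets):
        return best
    return None  # no candidate is healthier, so moving would not help


if __name__ == "__main__":
    mbx1 = Server("MBX1", {"AppPoolProbe"})  # hypothetical failing probe
    mbx2 = Server("MBX2")
    mbx3 = Server("MBX3", {"StoreProbe"})
    target = choose_target(mbx1, [mbx1, mbx2, mbx3])
    print(target.name if target else "stay")  # prints MBX2
```

The final comparison is the important design choice: a switchover only happens when the target is strictly healthier than the current host, which stops databases bouncing between equally troubled servers.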

Failure to put a DAG member into maintenance mode before applying an upgrade is another reason why Managed Availability will ask Active Manager to activate a database copy on a different server. It’s better when an administrator controls this process and selects the server to absorb the load of the databases on the server that is to be upgraded.

Remember that Managed Availability is designed to function automatically, so when it detects what it considers to be a problem on a mailbox server and prompts Active Manager to take action, Exchange 2013 is working exactly as intended. Being human, an administrator might well miss the signs that Managed Availability picks up through its array of probes. For instance, who checks that all of the application pools are functioning correctly on a server? And who does this kind of thing every five minutes? Such a pedantic approach to server monitoring suits a computer; humans rapidly tire of the repetition.

The upshot is that active database copies might be far more mobile within a DAG than you expect. Again, this is by design: it makes no sense to keep an active database copy on a server that is exhibiting signs of potential failure. Active Manager running within an Exchange 2010 DAG also performs automatic database switchovers. The difference is that its Exchange 2013 counterpart is under the influence of Managed Availability and therefore has a lot more data available to decide when a database switchover should be performed. As a result, over time database copies might well be activated on servers in ways that surprise you, at least if you don’t check.

Of course, I know that I am communicating with computer professionals here and that you carefully scan your DAG every morning, perhaps even before the full effect of your chosen early-morning beverage is felt, to ensure that all is well and that databases are active on their preferred servers. By doing this you make sure that the workload is distributed evenly across the DAG and that the database layout you designed through careful planning, using Microsoft’s tools and your own experience, is actually in use.
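
If you want to automate that morning check, the comparison itself is simple. The Python sketch below assumes you have already exported, for each database, the server currently hosting its active copy and its list of servers in activation-preference order (preference 1 first); that input format and the misplaced_databases function are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: database -> (server hosting the active copy,
#                                  servers in activation-preference order, 1 first)
Layout = Dict[str, Tuple[str, List[str]]]


def misplaced_databases(layout: Layout) -> List[Tuple[str, str, str]]:
    """Return (database, active server, preferred server) for every database
    whose active copy is not on its activation-preference-1 server."""
    report = []
    for db, (active, preference) in sorted(layout.items()):
        preferred = preference[0]
        if active != preferred:
            report.append((db, active, preferred))
    return report


if __name__ == "__main__":
    layout = {
        "DB01": ("MBX1", ["MBX1", "MBX2"]),
        "DB02": ("MBX1", ["MBX2", "MBX1"]),  # activated away from its preferred server
    }
    for db, active, preferred in misplaced_databases(layout):
        print(f"{db}: active on {active}, preferred server is {preferred}")
```

Run against the sample data, only DB02 is reported, because its active copy sits on MBX1 while MBX2 holds activation preference 1.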

But life is imperfect, perhaps more often in IT than elsewhere. Other tasks get in the way of scanning a DAG for inconsistencies. In short, there’s always something else to do, and so a DAG can degrade subtly over time from a position of maximum effectiveness (a state sometimes only ever attained just after installation) to a point where it is functional but not pretty. A DAG is designed to be highly available and it will do its level best to maintain service by shuffling databases between available member servers. Users won’t be aware that any problems exist until the last database copy holding their mailbox fails. At that point, your reputation will suffer.

You can, of course, keep a very close eye on all aspects of your DAG, including careful perusal of the Crimson Channel (perhaps using the PowerShell techniques explained in this EHLO post) to locate events logged when Active Manager activates database copies. However, as pointed out above, careful monitoring is tiresome and boring. For this reason I suggest that you consider introducing some automation of your own to complement the wonders of Managed Availability. Using Paul Cunningham’s Database Availability Group Health Check script (Get-DAGHealth.ps1) is a good starting point as it can be run as frequently as you like (daily is probably enough) and its output can be delivered as a nicely-formatted HTML message to your mailbox where the parts shaded in red deserve your attention.

After servers are restored to full health, you might need to redistribute active database copies around the DAG to reflect your intended layout. Microsoft provides the RedistributeActiveDatabases.ps1 script for this purpose so this shouldn’t be too hard.
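
The core idea behind rebalancing by activation preference can be sketched in a few lines. The following Python is not the logic of RedistributeActiveDatabases.ps1 itself, only an illustration under stated assumptions: it takes a hypothetical snapshot of each database’s active server and preference-ordered server list, plus a hypothetical set of servers you consider healthy, and plans (but does not perform) a move of each active copy to its most preferred healthy server.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: database -> (server hosting the active copy,
#                                  servers in activation-preference order, 1 first)
Layout = Dict[str, Tuple[str, List[str]]]


def plan_moves(layout: Layout, healthy: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (database, from server, to server) moves that would put each active
    copy on its most preferred healthy server. Nothing is actually moved."""
    moves = []
    for db, (active, preference) in sorted(layout.items()):
        # First server in preference order that is currently healthy, if any
        target = next((s for s in preference if s in healthy), None)
        if target is not None and target != active:
            moves.append((db, active, target))
    return moves


if __name__ == "__main__":
    layout = {
        "DB01": ("MBX2", ["MBX1", "MBX2"]),
        "DB02": ("MBX2", ["MBX2", "MBX1"]),
        "DB03": ("MBX2", ["MBX3", "MBX1", "MBX2"]),
    }
    for db, src, dst in plan_moves(layout, healthy={"MBX1", "MBX2"}):
        print(f"Move {db} from {src} to {dst}")
```

Skipping unhealthy servers matters here: DB03 goes to MBX1 rather than its first preference, MBX3, because moving a database back to a server that is still sick would simply invite another switchover.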

Managed Availability clearly exerts a lot of influence over the way Exchange 2013 works. Given the need to automate as much as possible within the massive multi-tenant Office 365 datacenters, I can only imagine that the trend toward automated management will continue. We simply have to learn how to harness this new capability to maximum advantage.

Follow Tony @12Knocksinna
