How Exchange 2013 measures and monitors server health

My recent posts about the influence that the Managed Availability framework exerts over Exchange 2013 Database Availability Groups prompted a question about how many probes Exchange 2013 deploys to assess how well its components are working on a server, whether a standalone server or a member of a DAG.

As you might recall from Ross Smith IV’s EHLO post covering Managed Availability, a set of probes is installed on every Exchange 2013 server to provide data that Managed Availability can then assess using its monitors. Monitors decide whether any intervention is necessary and if so, invoke a “responder” to take some action to make the component healthy again. The action can range from resetting an application pool right up to forcing a server reboot.

The immediate question is just how many probes Exchange 2013 uses. Some insight can be gained by running the command:

Get-ServerHealth -Identity Server1 > C:\Temp\ServerProbes.txt

This command generates a server health report for the nominated server and redirects the output to a text file. It’s much easier to browse the set of probes in a text file than to follow the output in a PowerShell window. If you open the text file, you’ll see a list of the probes, the component (or resource) that each one monitors, the health set into which Exchange groups it, and its alert state.
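
If you’d rather not scroll through a text file at all, PowerShell can do the counting for you. The function below is a minimal sketch rather than anything Exchange provides: Measure-HealthSetEntry is a hypothetical name, and the function assumes only that each entry emitted by Get-ServerHealth carries HealthSetName and AlertValue properties. It reports how many entries each health set contains and how many of them are in any state other than Healthy.

```powershell
# Hypothetical helper (not an Exchange cmdlet). Assumes each input object has
# HealthSetName and AlertValue properties, as Get-ServerHealth entries do.
function Measure-HealthSetEntry {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]] $InputObject
    )
    begin {
        # Collect everything from the pipeline before grouping.
        $entries = New-Object System.Collections.ArrayList
    }
    process {
        foreach ($entry in $InputObject) { [void]$entries.Add($entry) }
    }
    end {
        $entries |
            Group-Object -Property HealthSetName |
            ForEach-Object {
                $group = $_
                # Cast to string so enum values and deserialized strings compare alike.
                $notHealthy = @($group.Group | Where-Object { [string]$_.AlertValue -ne 'Healthy' })
                [pscustomobject]@{
                    HealthSetName = $group.Name
                    EntryCount    = $group.Count
                    NotHealthy    = $notHealthy.Count
                }
            } |
            Sort-Object -Property EntryCount -Descending
    }
}
```

Assuming the function has been loaded into the Exchange Management Shell, you’d run something like:

Get-ServerHealth -Identity Server1 | Measure-HealthSetEntry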

You’ll find hundreds of probes in the set, but it’s more convenient to group them by component, such as ActiveSync, which is what Exchange does with “health sets”. To see the information in a more digestible format, run the command:

Get-HealthReport -Identity ServerName

The health report reveals the major component areas that Managed Availability concerns itself with, including the protocol stacks that clients use to access Exchange (OWA, EWS, ECP, and so on) and other major areas of functionality (Autodiscover, Transport, Assistants, and so on). In the screenshot you can see the current health status of each area (Autodiscover is shown as unhealthy for some reason that deserves investigation) and the MonitorCount, which I take to be the number of probes associated with each component. “HubTransport” (not shown here) has 138 probes and “Search” has 69, making them the most heavily instrumented components. CU1 includes a special version of the server health report that covers only the components that affect high availability:
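
When a health set such as Autodiscover shows up as unhealthy, the first job is to pick it out from all the healthy ones. Here’s a minimal sketch of a hypothetical Select-UnhealthyHealthSet function (my name, not something Exchange provides) that keeps only the health report rows whose AlertValue isn’t Healthy and lists the most heavily instrumented sets first. It assumes the Server, HealthSet, AlertValue, and MonitorCount properties that appear in the report output.

```powershell
# Hypothetical helper (not part of Exchange). Assumes input objects expose
# Server, HealthSet, AlertValue and MonitorCount, as Get-HealthReport rows do.
function Select-UnhealthyHealthSet {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]] $InputObject
    )
    begin {
        $rows = New-Object System.Collections.ArrayList
    }
    process {
        foreach ($row in $InputObject) { [void]$rows.Add($row) }
    }
    end {
        $rows |
            # Anything other than Healthy (Unhealthy, Degraded, ...) is kept.
            Where-Object { [string]$_.AlertValue -ne 'Healthy' } |
            # Highest MonitorCount first, so the busiest health sets lead the list.
            Sort-Object -Property { [int]$_.MonitorCount } -Descending |
            Select-Object -Property Server, HealthSet, AlertValue, MonitorCount
    }
}
```

With the function loaded, a command like Get-HealthReport -Identity ServerName | Select-UnhealthyHealthSet should narrow the list, after which Get-ServerHealth -Identity ServerName -HealthSet Autodiscover shows the individual entries behind the unhealthy set.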

Get-ServerHealth -Identity ExServer1 -HaImpactingOnly

Exchange 2013 (from CU1 onward) supports the concept of rollup groups for health reports. In other words, you can assess a set of servers together to derive an overall health picture for all of them. For instance, to see the health status for ActiveSync across a DAG, you’d use the command:

(Get-DatabaseAvailabilityGroup -Identity DAG1).Servers | Get-HealthReport -RollupGroup -HealthSet ActiveSync
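
To make the rollup idea concrete, the sketch below condenses per-server health report rows into one line per health set, showing how many servers reported it and which of them weren’t Healthy. It’s only an illustration under my own assumptions: Get-HealthSetRollupSummary is a hypothetical function, it relies on the Server, HealthSet, and AlertValue properties, and it does not reproduce the logic that -RollupGroup uses to decide a group’s overall state.

```powershell
# Hypothetical helper: summarizes per-server health report rows into one line
# per health set. Illustrates the rollup idea only; it is NOT how -RollupGroup
# computes its result.
function Get-HealthSetRollupSummary {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]] $InputObject
    )
    begin {
        $rows = New-Object System.Collections.ArrayList
    }
    process {
        foreach ($row in $InputObject) { [void]$rows.Add($row) }
    }
    end {
        $rows | Group-Object -Property HealthSet | ForEach-Object {
            $group = $_
            # Distinct servers that reported this health set.
            $servers = @($group.Group | Select-Object -ExpandProperty Server -Unique)
            # Distinct servers where this health set is anything but Healthy.
            $notHealthy = @($group.Group |
                Where-Object { [string]$_.AlertValue -ne 'Healthy' } |
                Select-Object -ExpandProperty Server -Unique)
            [pscustomobject]@{
                HealthSet         = $group.Name
                Servers           = $servers.Count
                ServersNotHealthy = $notHealthy.Count
                NotHealthyOn      = ($notHealthy -join ', ')
            }
        } | Sort-Object -Property HealthSet
    }
}
```

Feeding it the unrolled report for every DAG member would look something like:

(Get-DatabaseAvailabilityGroup -Identity DAG1).Servers | Get-HealthReport | Get-HealthSetRollupSummary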

For now, the Exchange 2013 Management Pack for Microsoft’s System Center Operations Manager (SCOM) is the major consumer of this health information. Other third-party monitoring products might take an interest in this data over time; after all, the data generated by Managed Availability hasn’t seen much use outside Microsoft yet. But in time, who knows?

Follow Tony @12Knocksinna
