One of Active Directory's strengths is its distributed nature. Because its architecture is spread across multiple domain controllers (DCs), AD scales extremely well and is highly tolerant of domain controller failures. However, the sophisticated replication engine that keeps AD data consistent across all these DCs depends on many other systems, such as basic network topology, firewall configurations, and DNS. Active Directory topics generate the highest call volume for Microsoft's Customer Support Services, and replication failures generate among the highest call volumes within the AD support area.
AD replication is a complicated subject. First, the actual step-by-step replication process between two replication partner DCs is not trivial to understand, and comes with a heavy dose of new concepts that have names like "up-to-dateness vector", "invocation ID", and "high water mark". Next, the process actually works between more than two DCs; very large AD implementations have thousands of DCs connected in a byzantine mesh of replication connection objects. Finally, replication isn't a single-headed entity; it's a hydra that works independently on the schema, configuration, and at least one domain partition in every forest.
As if basic replication concepts aren't difficult enough, the tool to diagnose and manipulate replication is also challenging. REPADMIN is the kitchen sink of replication management utilities. It's a command line utility with 69 different commands, grouped in three tiers of complexity from old command syntax to shoot-your-AD-in-the-foot power. Even figuring out the detailed help syntax (REPADMIN /?:
Active Directory Replication Status Tool
Enter the ADRAP team. A part of the Premier Field Engineering team (PFE - a division of Microsoft Technical Support), the Active Directory Rapid Assessment Program is a service offered to companies that purchase a Microsoft Premier Support agreement. In an ADRAP engagement Microsoft engineers run a number of sophisticated tests against your AD environment, essentially holding it up to a microscope, and make a series of recommendations to improve the health of your Active Directory forest. The ADRAP process is generally held in very high regard in the AD community; I've participated in one for Intel.
Rich Doyle, a Principal Architect for Microsoft PFE and the owner of the ADRAP program, took it upon himself to use the team's experience from conducting literally thousands of assessments to build a utility to make AD replication troubleshooting far easier. He created the Active Directory Replication Status Tool (ADREPLSTATUS) to check the replication status of all domain controllers in a forest, isolate and prioritize the errors in a graphical manner, and provide clear guidance on how to correct them. Doyle based the utility on one of the thirty ADRAP modules to build a simpler tool, so you could think of using ADREPLSTATUS as a miniature ADRAP focused solely on replication.
The driving need behind the creation of this tool is that unsolved, long-term replication errors can lead to larger issues in your AD forest, especially lingering objects. As anyone who's had to chase down and stamp out lingering objects can attest, they can be a real problem for your environment, and they're a pain to clean up. The more DCs or domains you have, the harder they are to stamp out. Replication errors that have persisted longer than the tombstone lifetime (generally 180 days, though older AD installations may be set to as little as 60 days) are especially likely to generate lingering objects.
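The tombstone lifetime comparison is simple date arithmetic, and a minimal sketch may make the risk window concrete. The 180-day and 60-day values come from the paragraph above; the function names and the sample dates are hypothetical, and this is an illustration rather than anything ADREPLSTATUS itself runs.

```python
from datetime import datetime, timedelta

# Tombstone lifetime (TSL) values mentioned in the article, in days.
DEFAULT_TSL_DAYS = 180   # typical value
LEGACY_TSL_DAYS = 60     # some older AD installations


def exceeds_tombstone_lifetime(last_success, now, tsl_days=DEFAULT_TSL_DAYS):
    """Return True if the last successful replication is older than the TSL."""
    return (now - last_success) > timedelta(days=tsl_days)


def days_until_tsl(last_success, now, tsl_days=DEFAULT_TSL_DAYS):
    """Whole days left before the TSL is exceeded (negative once it has passed)."""
    return tsl_days - (now - last_success).days


if __name__ == "__main__":
    # Hypothetical dates: a partner that last replicated successfully 92 days ago.
    last_ok = datetime(2012, 6, 1)
    today = datetime(2012, 9, 1)
    for tsl in (DEFAULT_TSL_DAYS, LEGACY_TSL_DAYS):
        print(tsl, exceeds_tombstone_lifetime(last_ok, today, tsl),
              days_until_tsl(last_ok, today, tsl))
```

The same failure is well inside the window at 180 days but already past it at 60, which is why it is worth knowing what your forest's tombstone lifetime actually is before judging how urgent an old error is.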
ADREPLSTATUS has a simple list of requirements. It can be installed and run on any full, supported version of Windows Server or client as long as the platform has .NET Framework 4.0 installed. The utility can analyze replication only in the forest or domain to which the client is joined. It can, however, be run from a regular domain user account; no special privileges are required.
When you first launch the tool, you can have ADREPLSTATUS check replication status on the entire forest, a single domain in the forest, or a specific set of DCs in a domain (including a single DC). For this example I'm running ADREPLSTATUS in my home domain, which has three DCs. You can see a discovery error in Figure 1, which turned out to be caused by an incomplete metadata cleanup of an RODC I had previously done a /FORCEREMOVAL on.
Replication Status Viewer
If discovery completes with no errors, you'll be taken directly to the Replication Status Viewer (Figure 2). You can also launch the viewer via a button in the Replication Status section of the ribbon. It's worth noting that in this tool it's not uncommon to see buttons in two places that do the same task; this was done to make the most likely task easier to find.
The Replication Status Viewer automatically groups by destination DC (since replication is always inbound), then by source DC. You can see this grouping just above the column headers. You can change the grouping by simply dragging a column header to this section and dropping it on top of the existing group order; if the view gets too confusing, you can easily reset it back to its default by clicking on Options, Load Default Grid View Settings in the ribbon. The viewer provides a wealth of detail in its columns, and you can add or remove columns (Figure 3).
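The default grouping (destination DC first, then source DC) and the effect of reordering the group headers can be sketched as a nested grouping over status rows. The row fields and the DC names other than RENSHI-DC are invented for illustration; this is an assumption-laden sketch of the idea, not a description of how the viewer is implemented.

```python
from collections import defaultdict

# Hypothetical sample rows: field names and the DC names other than RENSHI-DC
# are invented for illustration.
SAMPLE_ROWS = [
    {"destination": "DC-A", "source": "RENSHI-DC", "partition": "Schema"},
    {"destination": "DC-A", "source": "RENSHI-DC", "partition": "Configuration"},
    {"destination": "DC-A", "source": "DC-B", "partition": "Schema"},
    {"destination": "DC-B", "source": "RENSHI-DC", "partition": "Schema"},
]


def group_rows(rows, keys=("destination", "source")):
    """Nest rows by each key in order; the innermost values are lists of rows."""
    if not keys:
        return list(rows)
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[keys[0]]].append(row)
    return {value: group_rows(bucket, keys[1:]) for value, bucket in buckets.items()}


if __name__ == "__main__":
    # Default view: destination DC, then source DC.
    for dest, sources in group_rows(SAMPLE_ROWS).items():
        for src, rows in sources.items():
            print(dest, "<-", src, [r["partition"] for r in rows])
```

Passing `keys=("source", "destination")` mimics dragging the source column ahead of the destination column, which is handy when you want to see everything a single misbehaving DC is failing to send.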
Reviewing and resolving errors
This tool is of course more interesting to use when replication errors exist, so let's induce a few. When I disable outbound replication from RENSHI-DC (a good command to know, and one I'll let you look up for future reference), the status viewer lights up with red errors on the DCs that should be receiving replication from RENSHI-DC (Figure 4). An error legend at the bottom of the window color-codes replication errors by age; the darkest are the oldest and the most urgent to resolve. You can also toggle the view with the Errors Only button in the Data section so that, logically enough, only the errors show.
The Last Sync Message field gives an indication of what the error is, but note that there's also a button to the left of each directory partition row. It shows the specific error number, and when you click it, the tool goes to TechNet and shows you details on the error and how to resolve it (Figure 5). There are over 70 replication error messages, and ADREPLSTATUS is designed to make this part of the replication error hunt much easier.
Why did I call this the tool we've been waiting for – almost? In version 1.0, ADREPLSTATUS is a view-only tool. You can't trigger or otherwise affect replication from it, so you still must use REPADMIN or Active Directory Sites and Services to correct the errors ADREPLSTATUS finds. It's inconvenient, but Doyle's reasoning was that in a first version it was safer to simply report replication than to impact it. I look forward to a later version with replication commands built in.
The Active Directory Replication Status Tool is a welcome addition to any AD administrator's toolkit. It's not the comprehensive resolution tool it will ultimately be, but even as a version 1.0 product, everyone who maintains more than two or three DCs should install and use it to ensure their AD replication environment is clean. Later in the year we'll post a comprehensive article about the most common AD error scenarios Microsoft Customer Support Services encounters and how ADREPLSTATUS can help you solve them.