In “Troubleshooting Active Directory Replication,” Sean Deuby presented several strategies for solving Active Directory (AD) replication problems. To troubleshoot AD replication at a deeper level, it helps to have an in-depth understanding of how replication works when changes occur in the directory. AD was one of the first LDAP directories to introduce multi-master replication, whereby changes can originate in any instance of the directory (i.e., on any domain controller—DC). Previously, such as with Windows NT 4.0, changes could originate on only one DC, the Primary Domain Controller (PDC). Multi-master replication has numerous inherent benefits, but it presents a complex problem: how to coalesce and replicate changes to the directory. Throughout this article, I’ll refer back to the simple three-DC domain that Figure 1 shows.
Keeping Track of Replication
AD uses several counters and tables to ensure that every DC has the most current information for each attribute and object and to prevent endless replication loops. AD uses naming contexts (NCs), also known as directory partitions, to segment replication. Every forest has a minimum of three NCs: the domain NC, the configuration NC, and the schema NC. AD also supports special NCs, often known as application partitions or non-domain naming contexts (NDNCs). DNS uses NDNCs (e.g., DomainDnsZones, ForestDnsZones). Each NC or NDNC replicates independently of the others.
Every DC maintains a special counter known as an update sequence number (USN) counter. The USN is a 64-bit number that you can think of like a clock. The USN counter is never decremented, and a USN can never be reused. Each DC maintains a separate USN counter that starts during the Dcpromo process and is incremented over the lifetime of a DC. It’s improbable that any two DCs in a forest will ever have the exact same USN at the same time. The USN counter is incremented each time a transaction occurs on a DC. Transactions are typically create, update, or delete operations against an object. An update transaction might include updates to a single attribute, or it might include updates to many attributes. In the event that a transaction fails and is rolled back, the USN assigned to the transaction isn’t reused. When an object is modified (or created), the usnChanged attribute of that object is stamped with the USN of the transaction that caused the change. You can therefore keep track of changes to AD by asking a DC for all the objects for which the usnChanged attribute is greater than the highest USN the last time you checked.
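The change-tracking idea at the end of the preceding paragraph can be sketched with a toy model. This is a minimal illustration, not how AD is implemented: the `ToyDC` class, its method names, and the starting USN of 1000 are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDC:
    """Hypothetical stand-in for one DC: a USN counter plus usnChanged per object."""
    name: str
    usn: int = 0                                      # highest USN committed so far
    usn_changed: dict = field(default_factory=dict)   # DN -> usnChanged

    def write(self, dn: str) -> int:
        """Commit one transaction against an object and stamp its usnChanged."""
        self.usn += 1                 # the counter only ever moves forward
        self.usn_changed[dn] = self.usn
        return self.usn

    def changed_since(self, last_usn: int) -> list:
        """Return DNs whose usnChanged is greater than last_usn, in USN order."""
        hits = [(u, dn) for dn, u in self.usn_changed.items() if u > last_usn]
        return [dn for _, dn in sorted(hits)]


dc = ToyDC("DC-A", usn=1000)          # starting value chosen for illustration
dc.write("CN=Group1")
dc.write("CN=Group2")
checkpoint = dc.usn                   # remember the highest USN seen so far
dc.write("CN=Group1")                 # a later update to an existing object
dc.write("CN=Group3")                 # a new object
changes = dc.changed_since(checkpoint)
print(changes)
```

Only the objects touched after the checkpoint are returned. Against a real DC, the equivalent would be a search filtered on usnChanged, with the checkpoint kept separately for each DC because USNs aren't comparable across DCs.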
Table 1 illustrates a simple example of the changes to the USNs of two DCs over time. Consider a scenario in which you create five new groups on DC-A. This action will increment DC-A’s USN counter by five. DC-A then replicates these groups to DC-B, whose USN counter is incremented by five. (Note that the initial USN values in Table 1 were chosen for illustration purposes.) Subsequently, you edit the name of one of the groups on DC-B. DC-B’s USN counter is incremented by one. When the change in name replicates to DC-A, DC-A’s USN counter is incremented by one.
From the perspective of DC-A, the creation of the five groups is an originating write. From the perspective of DC-B, it's a replicated write. Conversely, when a group's name is updated on DC-B, DC-B considers this action an originating write and DC-A considers it a replicated write.
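The group scenario above can be replayed with a small sketch that shows the local USN counter advancing for originating and replicated writes alike. The class and the starting USNs (1000 and 2000) are assumptions for illustration, and the sketch assumes one transaction per replicated object, as the narrative does.

```python
class ToyDC:
    """Hypothetical DC that records each committed transaction with its local USN."""

    def __init__(self, name, start_usn):
        self.name = name
        self.usn = start_usn
        self.log = []                      # (local USN, kind, description)

    def commit(self, kind, description):
        """Commit one transaction; kind is 'originating' or 'replicated'."""
        self.usn += 1
        self.log.append((self.usn, kind, description))


dc_a = ToyDC("DC-A", 1000)                 # illustrative starting values
dc_b = ToyDC("DC-B", 2000)

groups = [f"Group{i}" for i in range(1, 6)]
for g in groups:                           # five groups created on DC-A
    dc_a.commit("originating", f"create {g}")
for g in groups:                           # the same five groups arrive on DC-B
    dc_b.commit("replicated", f"create {g}")

dc_b.commit("originating", "rename Group1")    # the edit is made on DC-B
dc_a.commit("replicated", "rename Group1")     # and replicates back to DC-A

print(dc_a.usn, dc_b.usn)
```

Both counters end up six higher than where they started, even though each DC originated only some of the writes.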
The AD replication process identifies all the DCs participating in the replication process using two globally unique identifiers (GUIDs). The first GUID, the Directory Service Agent (DSA) GUID, is established during Dcpromo and doesn’t change for the lifetime of the DC. The DSA GUID is stored in the objectGuid attribute of the NTDS Settings object under the DC as shown in Active Directory Sites and Services. The second GUID, the Invocation ID, is the identifier for the DC during the replication process; it might change during the DC’s lifetime.
The Invocation ID is stored in the invocationId attribute of the NTDS Settings object. Any time a restore is performed on a DC by a supported restoration process, such as Windows Server Backup or NTBACKUP, that DC’s Invocation ID is reset. By resetting the Invocation ID, AD is able to ensure that the DC receives a copy of any changes that occurred on that DC between when the backup was taken and when the restore was performed. Because the Invocation ID is the unique identifier for the DC during the replication process, the reset of the Invocation ID effectively ensures that the DC enters the replication process as a new DC and there are no assumptions about data that might already exist on that DC. Improper restoration of a virtualized DC, such as restoring or reverting back to a saved snapshot, won’t reset the DC’s invocation ID. This leads to a situation known as USN rollback, which can cause severe replication problems.
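To see why resetting the Invocation ID matters, consider the following simplified sketch. It assumes that a partner tracks, per Invocation ID, the highest originating USN it has already received and ignores anything at or below that mark. The class, function names, and USN values are hypothetical, and real AD behavior is considerably more involved.

```python
import uuid


class ToyDC:
    """Hypothetical DC identified to its partners by an Invocation ID."""

    def __init__(self, usn=0):
        self.invocation_id = uuid.uuid4()
        self.usn = usn

    def originate(self):
        """Originate one change and return its (Invocation ID, USN) stamp."""
        self.usn += 1
        return (self.invocation_id, self.usn)

    def roll_back(self, saved_usn, reset_invocation_id):
        """Return the database to an earlier USN, with or without a new ID."""
        self.usn = saved_usn
        if reset_invocation_id:
            self.invocation_id = uuid.uuid4()


def partner_accepts(seen, stamp):
    """Accept a change only if its USN is above what was seen for that ID."""
    inv_id, usn = stamp
    if usn <= seen.get(inv_id, 0):
        return False
    seen[inv_id] = usn
    return True


def run(reset_invocation_id):
    dc, seen = ToyDC(usn=100), {}
    saved = dc.usn                          # state captured by backup or snapshot
    for _ in range(3):                      # three changes replicate normally
        partner_accepts(seen, dc.originate())
    dc.roll_back(saved, reset_invocation_id)
    # Two new changes originated after the database went back in time.
    return [partner_accepts(seen, dc.originate()) for _ in range(2)]


snapshot_revert = run(reset_invocation_id=False)
supported_restore = run(reset_invocation_id=True)
print(snapshot_revert, supported_restore)
```

In the snapshot-revert case, the new changes reuse USNs the partner believes it has already seen, so they're silently skipped. With a fresh Invocation ID, the partner treats the restored DC as a new source and accepts them.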
Now that we’ve discussed how replication keeps track of DCs and changes, we can take a look at how replication determines what changes need to be replicated to a given DC and how replication ensures that changes aren’t unnecessarily replicated. Two tables are used for this process: the High-Watermark Vector (HWMV) and the Up-To-Dateness Vector (UTDV). The HWMV is maintained independently by each DC to keep track of where it left off (in terms of the last USN) replicating an NC with a given partner. The UTDV is used by DCs to ensure that they don’t create needless replication or a loop. When DC-A sends DC-B a request for replication, it includes its UTDV so that DC-B sends only changes that DC-A hasn’t received (e.g., in the case of changes made on DC-B that were replicated to DC-C and in turn to DC-A).
Table 2: DC-A’s High Watermark Vector (HWMV)
Table 3: DC-B’s High Watermark Vector (HWMV)
The UTDV stores the highest originating update USN that the DC has received from every other DC replicating a given NC. By storing this information, DCs will never be sent changes that they've already received via another path (e.g., if a change occurs on DC-A, but DC-C receives it via DC-B). This behavior is often referred to as propagation dampening. Using the requester's UTDV, the sending DC can identify the changes the requester still needs and leave out the changes the requester has already received from other DCs. This behavior prevents an endless loop of changes being replicated between DCs.
To summarize this process, each DC maintains an independent, forward-moving counter known as a USN counter. The USN counter on a DC is incremented each time that DC commits a transaction to the directory, whether it's an originating write (such as a create, delete, or update) or a replicated write received from a partner. When DCs replicate, they ask a partner for all the changes since the last USN they replicated from that partner. This USN is stored in the HWMV so that DCs don't ask for changes they've already received. Inside each replication request, DCs also include their UTDV. Each DC maintains a UTDV for each NC replicated, and inside the UTDV the DC tracks the highest originating update USN for which it has received changes, for every DC replicating the NC. This prevents endless replication loops and leads to the behavior known as propagation dampening, which ensures that updates aren't needlessly replicated.
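The interplay between the HWMV and the UTDV can be sketched for the three-DC domain as follows. This is a toy model under several assumptions: DCs are keyed by name rather than by Invocation ID, a single NC is replicated, the UTDV is updated as each change is applied, and the class, method names, and starting USNs are hypothetical.

```python
class ToyDC:
    """Hypothetical DC that pulls changes from partners using an HWMV and a UTDV."""

    def __init__(self, name, start_usn):
        self.name = name
        self.usn = start_usn
        self.changes = {}   # local USN -> (originating DC, originating USN, payload)
        self.hwmv = {}      # partner name -> highest partner-local USN already pulled
        self.utdv = {}      # originating DC -> highest originating USN applied

    def originate(self, payload):
        self.usn += 1
        self.changes[self.usn] = (self.name, self.usn, payload)
        self.utdv[self.name] = self.usn

    def serve(self, after_usn, requester_utdv):
        """Return the changes the requester lacks, plus this DC's current USN."""
        out = []
        for local_usn in sorted(self.changes):
            if local_usn <= after_usn:
                continue                    # the HWMV says this was already pulled
            orig_dc, orig_usn, payload = self.changes[local_usn]
            if orig_usn <= requester_utdv.get(orig_dc, 0):
                continue                    # propagation dampening via the UTDV
            out.append((orig_dc, orig_usn, payload))
        return out, self.usn

    def pull_from(self, partner):
        """Request changes from a partner and apply them as replicated writes."""
        after = self.hwmv.get(partner.name, 0)
        sent, partner_usn = partner.serve(after, dict(self.utdv))
        for orig_dc, orig_usn, payload in sent:
            self.usn += 1
            self.changes[self.usn] = (orig_dc, orig_usn, payload)
            self.utdv[orig_dc] = max(self.utdv.get(orig_dc, 0), orig_usn)
        self.hwmv[partner.name] = partner_usn
        return len(sent)


a, b, c = ToyDC("DC-A", 1000), ToyDC("DC-B", 2000), ToyDC("DC-C", 3000)
a.originate("create Group1")
results = [
    b.pull_from(a),   # DC-B receives the change directly from DC-A
    c.pull_from(b),   # DC-C receives it by way of DC-B
    c.pull_from(a),   # DC-A has nothing DC-C lacks
    a.pull_from(b),   # DC-A's own change isn't sent back to it
]
print(results, c.usn)
```

The last two pulls transfer nothing. DC-C's HWMV for DC-A was still empty, so it was the UTDV that told DC-A the change had already arrived by another path.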
Tracking Object Updates
The key to AD’s replication model is replication metadata (i.e., information about the data that has replicated). Replication metadata is associated with each object in the directory, and this is what AD uses to determine the relative state of objects across multiple DCs. Every object has a number of fields that it stores on a per-attribute basis inside a table that constitutes that object’s replication metadata:
- Attribute name
- Change timestamp
- Attribute version number
- Originating DC ID (the Invocation ID of the originating DC)
- Originating DC USN
If we consider a simplified version of the scenario outlined earlier, in which we complete the following tasks, we can illustrate how each of these fields in the replication metadata changes:
- Create a user on DC-A.
- Replicate that user to DC-B.
- Modify an attribute of the user on DC-B.
- Replicate the change back to DC-A.
When we create a user on DC-A (as in Step 1), the new user’s replication metadata will look similar to Table 4. In the interest of simplification, I’ve included only three attributes (first name, last name, and password); however, many more attributes are set when a user is created.
The usnCreated attribute of the user is also permanently set on DC-A to 5,001. The usnChanged attribute is also set to 5,001; however, this attribute will be modified each time an update is made (originated) or received (replicated) for the user. After the new user replicates to DC-B (Step 2), the replication metadata on DC-B will match Table 5.
When a change to the user’s password (Step 3) occurs on DC-B, the metadata for the password attribute (unicodePwd) is updated, as Table 6 shows. The change is subsequently replicated back to DC-A, which updates the local metadata for the user, as Table 7 shows.
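The four steps above can be traced in a short sketch of how the per-attribute metadata evolves on each DC. The USN of 5,001 on DC-A comes from the example. DC-B's starting USN, the timestamps, the class names, and the use of DC names in place of GUIDs are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrMeta:
    attribute: str
    timestamp: str        # time of the originating change
    version: int
    orig_dc: str          # DC name stands in for the originating DC's GUID
    orig_usn: int


class ToyDC:
    """Hypothetical DC holding one user's replication metadata."""

    def __init__(self, name, usn):
        self.name = name
        self.usn = usn
        self.meta = {}             # attribute name -> AttrMeta
        self.usn_created = None
        self.usn_changed = None

    def originate(self, attributes, timestamp):
        """One local transaction that sets the given attributes."""
        self.usn += 1
        for attr in attributes:
            old = self.meta.get(attr)
            version = old.version + 1 if old else 1
            self.meta[attr] = AttrMeta(attr, timestamp, version, self.name, self.usn)
        self._stamp()

    def apply_replicated(self, incoming):
        """Apply metadata from a partner; originating fields are kept as sent."""
        self.usn += 1
        for m in incoming:
            self.meta[m.attribute] = m
        self._stamp()

    def _stamp(self):
        if self.usn_created is None:
            self.usn_created = self.usn    # set once, never changed afterward
        self.usn_changed = self.usn        # updated by every local or replicated write


dc_a = ToyDC("DC-A", 5000)
dc_b = ToyDC("DC-B", 8000)                 # assumed starting USN

dc_a.originate(["givenName", "sn", "unicodePwd"], "T1")    # Step 1
dc_b.apply_replicated(list(dc_a.meta.values()))            # Step 2
dc_b.originate(["unicodePwd"], "T2")                       # Step 3
dc_a.apply_replicated([dc_b.meta["unicodePwd"]])           # Step 4

print(dc_a.usn_created, dc_a.usn_changed, dc_a.meta["unicodePwd"])
```

After Step 4, DC-A's usnCreated is still 5,001 while its usnChanged has moved on, and the password attribute carries version 2 with DC-B as the originating DC.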
Decoding Object History
One of the nice things about replication metadata is that it lives with an object for its entire life. If you’ve ever been asked where or when an attribute was changed, or when an object was created, you can use replication metadata to find out. The data is accessible in a couple of difficult-to-understand formats as an attribute of the object, but fortunately Repadmin, which is included in Windows Server 2008 and later (or in the Support Tools for previous versions of Windows), makes it easy to decode the data.
To review the replication metadata of an object, you must provide the DC to request the metadata from and the distinguished name (DN) of the object in question. You can use the following command to review the metadata for 'User A' on DC-A:
repadmin /showobjmeta "DC-A" "CN=User A,CN=Users,DC=contoso,DC=com"
You’ll see output similar to that in Figure 2.
From this output, we can see that the user was created on DC-A on March 24, 2008. This is evident based on the originating timestamp and originating DSA for the objectClass attribute, which is version 1. The objectClass attribute can change in some scenarios, in which case you’d need to look elsewhere (such as the metadata for the objectGuid attribute). On March 22, 2010, the user’s givenName (first name) was modified on DC-B, as evidenced by the same originating DSA and originating timestamp columns. You can determine the number of changes that have been made to an attribute based on the version number.
When it’s possible to make changes to the same object on multiple DCs at the same time, conflicts can occasionally occur. The most common types of conflicts are objects that are created in the same place with the same name (e.g., two “John Doe” accounts in the same organizational unit—OU) and changes to the same attribute in between replication cycles.
In the case of whole object conflicts, the problem is that it isn’t permissible to have two objects with the same relative distinguished name (RDN) within a given container. If you created the user “John Doe” in AD, by default that user’s RDN would be 'CN=John Doe'. If another administrator created an account for John Doe on a different DC in the same container before your user replicated, the change would be permitted—however, during replication AD would need to handle the duplicate RDNs. AD does this by keeping the RDN of the object with the most recent timestamp and renaming the older object(s) such that their GUID is appended (e.g., the older John Doe would have an RDN of 'CN=John Doe\0ACNF:
When changes occur to the same attribute within a replication cycle (e.g., perhaps a user’s description is changed on two DCs by two administrators at about the same time), AD must decide which update to keep. AD first looks at the version number of the attribute in the replication metadata. The change with the highest version number wins. If the version numbers are equal, AD then looks at the timestamps and picks the last write. In the unlikely event that the timestamps are identical, AD looks at the originating server GUIDs and picks the change from the mathematically largest originating server GUID.
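The three-stage tiebreak described above maps naturally onto a tuple comparison. The sketch below is only an illustration: the `Stamp` type, the integer timestamps, and the choice to compare GUIDs by their integer value are assumptions, and AD's actual GUID ordering may differ.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class Stamp:
    """Hypothetical subset of one attribute's replication metadata."""
    version: int
    timestamp: int        # seconds, as a plain integer for illustration
    orig_guid: UUID

    def key(self):
        # Compared left to right: version, then timestamp, then GUID.
        return (self.version, self.timestamp, self.orig_guid.int)


def winner(a, b):
    """Return the stamp that wins the conflict under the rules above."""
    return a if a.key() >= b.key() else b


g1, g2 = UUID(int=1), UUID(int=2)

# A higher version wins even though its timestamp is older.
by_version = winner(Stamp(3, 100, g1), Stamp(2, 200, g2))
# With equal versions, the later timestamp wins.
by_time = winner(Stamp(2, 100, g2), Stamp(2, 200, g1))
# With equal versions and timestamps, the larger GUID wins.
by_guid = winner(Stamp(2, 200, g1), Stamp(2, 200, g2))

print(by_version.version, by_time.timestamp, by_guid.orig_guid)
```

Because every DC evaluates the same tuple for the same pair of changes, each one reaches the same decision independently, which is what lets the directory converge without any coordination between DCs.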
The final scenario is the case in which an OU or container is deleted, but before that deletion replicates to other DCs, another administrator creates a child object inside that OU or container. A simple example is when you’re closing an office, perhaps the Chicago office, so you delete the OU for Chicago. Meanwhile, an administrator in San Francisco moves a new user to the Chicago OU. When the deletion of the Chicago OU replicates to San Francisco, AD won’t delete the user that was moved. Instead, AD moves the user to the LostAndFound container at the root of the domain (or in the case of the configuration NC, the LostAndFoundConfig container).
AD implemented one of the first multi-master LDAP directory replication models. Multi-master replication introduces some complex challenges, such as how to ensure that replication doesn’t create loops or endless network traffic and how to resolve conflicts. Every DC stores several tables to keep track of replication state and to ensure consistency. In addition, each object stores replication metadata, which serves as a history of that object.