Most of us, when troubleshooting a replication problem, have been in this head-scratching situation: DC1 can’t seem to find DC2, but when you log on to DC1 and ping DC2, you instantly get a good reply. So why can’t DC1 find DC2? Because name resolution between DCs for replication is a little more complicated than you may think, though for a good reason.
One of the most common errors we see when replication isn’t working is some kind of name resolution error, for example “RPC server is unavailable” or “DNS lookup failure”. (Don’t get me started on the lack of clarity on that first error.) Once you realize it’s a name resolution problem, you of course start looking at DNS. But where?
Because we humans and most computer services locate other computers on the network using the DNS “A” record (e.g. mycomputer.deuby.net), it’s natural to assume that’s also how DCs find each other for replication. They do, eventually, but only indirectly. For replication, each DC’s directory service registers a CNAME (alias) record in DNS whose name is a GUID that is unique in the forest. This CNAME resolves to the DC’s A record.
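To make the naming concrete, here is a minimal Python sketch that assembles the CNAME owner name a DC registers for replication. It assumes the layout <DSA GUID>._msdcs.<forest root domain>, the same form as the “_msdcs.deuby.net” suffix used in this post; the function name dsa_cname_fqdn and the GUID in the example are hypothetical.

```python
import uuid


def dsa_cname_fqdn(dsa_guid: str, forest_root: str) -> str:
    """Build the replication alias <DSA GUID>._msdcs.<forest root>.

    uuid.UUID() raises ValueError for anything that is not a 128-bit GUID
    (braces and hyphens are tolerated), and str() normalizes it to the
    lowercase, hyphenated form.
    """
    guid = str(uuid.UUID(dsa_guid.strip()))
    root = forest_root.strip().rstrip(".").lower()
    return f"{guid}._msdcs.{root}"


# Hypothetical GUID; substitute the DSA object GUID of your own DC.
print(dsa_cname_fqdn("{6D1F2C3A-0000-4000-8000-0123456789AB}", "deuby.net"))
```

Validating the GUID first catches copy-and-paste mistakes (a missing character, a stray space) before you go hunting for a DNS problem that isn't there.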
The GUID in this CNAME record is known as the DSA object GUID (it is the objectGUID of the DC’s NTDS Settings object). In the screenshot at right from the DNS Management snap-in for my domain, you can see the GUID of 2008rodc.deuby.net, the CNAME’s fully qualified domain name (FQDN), and the FQDN of its target host. When a directory service is trying to locate its replication partners, it uses the FQDN of the CNAME, which is the second text field in the screenshot.
There are several ways to find the DSA GUID of the DC that’s causing the error. First, you can look it up in the DNS Management snap-in, under the _msdcs container of the domain’s zone (in a multi-domain forest, that’s the forest root domain’s _msdcs zone). That is, you can look it up if it’s been registered correctly in DNS. If you aren’t sure, try one of the next methods.
You can also dig it up in Active Directory Sites and Services, but the simplest way is to run REPADMIN /SHOWREPL, which lists the DSA object GUID near the top of its output.
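If you want to pull the GUIDs out of REPADMIN /SHOWREPL programmatically, here is a small Python sketch. It assumes the output contains lines of the form “DSA object GUID: <guid>”, once for the local DC and once under each inbound replication partner, which is what you would want when checking a partner the DC can’t find. The helper names dsa_guids and run_showrepl are hypothetical, and the sample text is made up and abbreviated.

```python
import re
import subprocess

# Assumed output format: one "DSA object GUID: <guid>" line for the local DC,
# then one under each inbound neighbor. Only those lines are matched, so the
# "DSA invocationID" GUID is ignored.
_GUID_LINE = re.compile(
    r"DSA object GUID:\s*([0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
)


def dsa_guids(showrepl_output: str) -> list[str]:
    """Return every DSA object GUID in the text, in order, without duplicates."""
    seen: list[str] = []
    for match in _GUID_LINE.finditer(showrepl_output):
        guid = match.group(1).lower()
        if guid not in seen:
            seen.append(guid)
    return seen


def run_showrepl() -> str:
    """Capture REPADMIN /SHOWREPL output (only works where repadmin is installed)."""
    result = subprocess.run(
        ["repadmin", "/showrepl"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Made-up, abbreviated output used only for illustration.
SAMPLE = r"""
Default-First-Site-Name\DC1
DSA Options: IS_GC
Site Options: (none)
DSA object GUID: 6d1f2c3a-0000-4000-8000-0123456789ab
DSA invocationID: 11111111-2222-4333-8444-555555555555

==== INBOUND NEIGHBORS ======================================

DC=deuby,DC=net
    Default-First-Site-Name\DC2 via RPC
        DSA object GUID: 2a3b4c5d-0000-4000-8000-abcdefabcdef
"""

local, *partners = dsa_guids(SAMPLE)
print("local DC:", local)
print("partners:", partners)
```

On a DC you would call dsa_guids(run_showrepl()) instead of using SAMPLE; parsing text keeps the sketch testable on machines without REPADMIN.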
Once you have the DSA GUID, ping it from a DC that’s receiving the errors. (You could also do this from your own client, but that would probably introduce another variable into the problem, since you may be using a different DNS server than the DC, and you must Keep It Simple, Stupid.) If you got the DSA GUID from Sites and Services or REPADMIN, be sure to append the rest of the DNS suffix (in this case, “_msdcs.deuby.net”) so that you ping the full FQDN. If the ping gets no response or returns a “could not find host” error, the replication problem is most likely caused by a CNAME or A record that isn’t registered correctly.
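The name-resolution half of that ping test can be sketched in Python as well. This is only a sketch, assuming you run it on the DC that logs the errors (for the same keep-it-simple reason given above); it checks DNS resolution only, not ICMP reachability. The helper names to_fqdn and check_dsa_alias are hypothetical. socket.gethostbyname_ex typically follows the CNAME and reports the canonical (A record) name along with its IPv4 addresses.

```python
import socket


def to_fqdn(name: str, msdcs_suffix: str) -> str:
    """Append the _msdcs suffix to a bare DSA GUID unless it is already there."""
    name = name.strip().rstrip(".")
    suffix = msdcs_suffix.strip().strip(".")
    if name.lower().endswith("." + suffix.lower()):
        return name
    return f"{name}.{suffix}"


def check_dsa_alias(fqdn: str) -> str:
    """Resolve the alias using this machine's resolver and describe the result."""
    try:
        # Returns (canonical name, alias list, IPv4 address list).
        canonical, _aliases, addresses = socket.gethostbyname_ex(fqdn)
    except (socket.gaierror, socket.herror) as exc:
        return f"{fqdn}: lookup failed ({exc}); check the CNAME and A records"
    return f"{fqdn} -> {canonical} {addresses}"
```

For example, print(check_dsa_alias(to_fqdn(guid, "_msdcs.deuby.net"))), where guid is the value you copied from Sites and Services or REPADMIN. A failed lookup points at the same CNAME or A record registration problem as the “could not find host” ping error.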
Next time I’ll give a few different tips on how to re-register a DC’s DNS records.
Follow Sean on Twitter at @shorinsean.