I have a technology consulting business in the San Francisco Bay area. One foggy afternoon, I received a call from a client who was desperate for help. He was having occasional problems locating local resources. He thought it was a network problem, but he couldn't be sure, and he was panicky: At the law firm where he worked, things typically got ugly when attorneys found themselves unable to print their documents. This wasn't an ordinary law firm either—it was a high-tech firm boasting more attorneys than many companies have employees. I agreed to assist at my standard double rate for attorneys.
Setting the Scene
The company was using BIND 9.2.3 on Linux as its primary DNS server. The firm had recently deployed Active Directory (AD) on Windows Server 2003 and was slowly adding users and printers. I was told that some users suddenly couldn't print. Other users simply complained that the network was slow.
When I arrived on site, the IT staff greeted me at the door and ushered me quickly into a conference room, where my interrogation of the IT staff began.
"Has anything changed?" I asked.
"No settings have changed?"
"Any new hardware added to the network?"
The only nugget of information I could coax from the IT staff was that in the process of migrating users and printers to AD, they had added some security. They assured me, of course, that this addition should have no effect on users' ability to print.
Diagnosing the Problem
I began by pinging some of the printers in question by IP address. I detected no problems. Next, using Fully Qualified Domain Names (FQDNs), I tried pinging the printers through Linux DNS, then through DNS/AD on Windows 2003. The results were quite interesting. Depending on which DNS server was specified, some printers' IP addresses couldn't be resolved. A quick check of the Linux machines' DNS log files revealed hundreds of daily errors, which dropped off on weekends. Aha! A clue!
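A weekday pattern like that is easy to confirm with a short script. The following Python sketch counts matching log lines by day of the week. It assumes each line begins with a BIND-style "dd-Mon-yyyy HH:MM:SS" timestamp and that the interesting errors contain the word "denied". Both the line format and the keyword are assumptions to check against your own logs. The sample lines and the zone name corp.example are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape: "<dd-Mon-yyyy> <HH:MM:SS[.mmm]> <message>".
# Adjust this pattern (and the keyword below) to match your actual log files.
LINE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4}) \d{2}:\d{2}:\d{2}(?:\.\d+)? (.*)$")

def errors_by_weekday(lines, keyword="denied"):
    """Count log lines whose message contains `keyword`, grouped by weekday."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or keyword not in match.group(2):
            continue  # skip unparseable lines and non-error messages
        day = datetime.strptime(match.group(1), "%d-%b-%Y")
        counts[day.strftime("%A")] += 1
    return counts

# Hypothetical sample lines (12-Jan-2004 was a Monday, 17-Jan-2004 a Saturday).
sample = [
    "12-Jan-2004 10:15:32.123 client 10.0.0.5#1045: update 'corp.example/IN' denied",
    "12-Jan-2004 10:16:02.456 client 10.0.0.9#1101: update 'corp.example/IN' denied",
    "17-Jan-2004 09:00:00.000 client 10.0.0.5#1045: update 'corp.example/IN' denied",
    "17-Jan-2004 09:05:00.000 zone corp.example/IN: loaded serial 42",
]
print(dict(errors_by_weekday(sample)))  # {'Monday': 2, 'Saturday': 1}
```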
I could have run several other tests with other utilities, but what I really needed was a tool that would let me get down to a deeper layer and peer inside the packets themselves. Most diagnostic programs wouldn't have provided sufficiently detailed information and might have led me down a dead-end path. If I wanted to crack this case quickly, I needed a network protocol analyzer, which would let me decode the contents of packets crossing the wire and get to the root of the problem.
Investigating the Network
I considered using Microsoft's Network Monitor (Netmon), but the version included with Windows 2003 captures only packets to and from the server itself. The version that ships with Microsoft Systems Management Server (SMS) captures all traffic, but SMS wasn't available to me. Instead, I decided to use a commercial analyzer that would capture all traffic. (Free network protocol analyzers are available on the Web. Check out Ethereal at http://www.ethereal.com.)
I set up the analyzer to capture packets between the two DNS servers designated with the Start of Authority (SOA): the Linux host and the Windows 2003 system. I wanted to see the results of two tests: a zone transfer from the Linux box to the Windows system and a zone transfer from the Windows system to the Linux host. I performed the cross-examination by initiating a zone transfer on the Linux box, pausing, and then initiating a zone transfer on the Windows system. Then I analyzed the results.
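To see what the analyzer was watching, it helps to know that a zone transfer request is an ordinary DNS query with QTYPE AXFR (252). It is sent over TCP port 53 with a 2-byte length prefix in front of the message (RFC 1035, section 4.2.2). The Python sketch below builds such a request and reads back only the first response message. The zone name corp.example, the server address, and the helper names are hypothetical. A complete client would keep reading messages until the closing SOA record arrives.

```python
import socket
import struct

AXFR = 252      # QTYPE for a full zone transfer (RFC 1035)
CLASS_IN = 1    # Internet class

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"

def axfr_query(zone, query_id=0x1234):
    """Build an AXFR request framed for TCP: 2-byte length, then the message."""
    # Header: ID, flags=0 (standard query), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    question = encode_name(zone) + struct.pack("!HH", AXFR, CLASS_IN)
    message = header + question
    return struct.pack("!H", len(message)) + message

def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf

def first_axfr_reply(server, zone, timeout=5.0):
    """Send an AXFR request and return the first response message (unframed)."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(axfr_query(zone))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)

# Hypothetical usage against a lab server (192.0.2.53 is a documentation address):
# reply = first_axfr_reply("192.0.2.53", "corp.example")
```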
On close examination of the trace file, I saw that the Linux host successfully established a connection with the Windows system, failed to perform the zone transfer, then gracefully severed the connection. Looking at the packets from the zone transfer from the Windows system to the Linux host, I saw the same result.
The servers successfully established a TCP connection on port 53 for the zone transfer, but they didn't transfer any DNS information and finally disconnected. When I inspected the packet payload, I discovered an authentication problem: Something was amiss in the response. But DNS zone transfers don't require authentication, right?
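When a server turns a request down, the first place to look in the payload is the response header. This Python sketch decodes the fixed 12-byte DNS header from RFC 1035 and maps the 4-bit RCODE to a name. It assumes the 2-byte TCP length prefix has already been stripped. The sample bytes are a hypothetical REFUSED reply, not data from the firm's capture. The RCODE table covers only common values (NOTAUTH, 9, comes from RFC 2136).

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED", 9: "NOTAUTH"}
OPCODES = {0: "QUERY", 4: "NOTIFY", 5: "UPDATE"}

def decode_header(message):
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    if len(message) < 12:
        raise ValueError("message shorter than a DNS header")
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    opcode = (flags >> 11) & 0xF   # bits 11-14
    rcode = flags & 0xF            # low 4 bits
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),     # QR bit
        "opcode": OPCODES.get(opcode, str(opcode)),
        "authoritative": bool(flags & 0x0400),   # AA bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "counts": (qd, an, ns, ar),
    }

# Hypothetical reply: ID 0x1234, QR=1, opcode QUERY, RCODE 5, one question.
reply = bytes.fromhex("123480050001000000000000")
print(decode_header(reply)["rcode"])  # REFUSED
```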
As I mentioned previously, the IT staff had changed some security settings. Eventually, I got the full story: To prevent forged zone transfers, the Linux administrator had enabled the DNS Security Extensions (DNSSEC) on the Linux DNS servers. DNSSEC, defined in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2535, uses public and private keys. Windows 2003's DNS server doesn't fully support DNSSEC, so we removed the security changes and tried again, and the problem was solved. Because the company was already migrating to Windows 2003 and AD, we decided to reverse the roles and make the Windows 2003 server the primary DNS server.
Laying Out the Solution
By default, Windows workstations are set to register their addresses in DNS. (The "Register this connection's addresses in DNS" check box is on the DNS tab of the Advanced TCP/IP Settings dialog box, which you reach from TCP/IP Properties.) The hundreds of errors in the Linux machine's DNS log file occurred because the workstations were trying to register themselves as dynamic DNS (DDNS) entries. Because the Linux box was set to use DNSSEC, it blocked the workstations' requests, polluting the log file and generating unnecessary network traffic.
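A dynamic registration is a DNS UPDATE message (opcode 5, RFC 2136) rather than a normal query. The following simplified Python sketch builds one that asks the server to add a single A record. The zone, host name, address, and TTL are hypothetical illustration values. A real Windows client also locates the zone's primary server first and may include prerequisite checks, which this sketch omits.

```python
import ipaddress
import struct

TYPE_A = 1
TYPE_SOA = 6
CLASS_IN = 1
OPCODE_UPDATE = 5

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"

def ddns_add_a(zone, host, address, ttl=1200, query_id=0x0001):
    """Build a minimal DNS UPDATE request that adds one A record (RFC 2136)."""
    flags = OPCODE_UPDATE << 11  # QR=0 (request) with opcode UPDATE -> 0x2800
    # In an UPDATE message the four counts are ZOCOUNT, PRCOUNT, UPCOUNT, ADCOUNT.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 1, 0)
    # Zone section: the zone being updated, with type SOA and class IN.
    zone_section = encode_name(zone) + struct.pack("!HH", TYPE_SOA, CLASS_IN)
    # Update section: one record to add (name, type, class, TTL, RDLENGTH, RDATA).
    rdata = ipaddress.IPv4Address(address).packed
    update = (encode_name(host)
              + struct.pack("!HHIH", TYPE_A, CLASS_IN, ttl, len(rdata))
              + rdata)
    return header + zone_section + update

# Hypothetical workstation registration.
msg = ddns_add_a("corp.example", "ws01.corp.example", "10.0.0.5")
```

A server configured to accept only signed updates would be expected to reject an unsigned message like this one.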
My client was pleased that I was able to solve his problem in half a day. At this high-profile law firm, after all, time is money: The firm bills approximately $25,000 per hour, and the product it produces is words on paper. I might have saved the client upwards of $100,000 with my solution. And you can bet the firm's IT staff has gained a deeper understanding of Windows DNS security intricacies—particularly how DNSSEC works.