Troubleshooting DNS in the New Decade

Got an Active Directory (AD) problem? Chances are, it’s really a DNS problem. Can’t get to your email, Twitter, or Facebook account? Chances are, it’s really a DNS problem. Smoking out DNS-related problems is Step Two in troubleshooting almost any network problem, but knowing how to troubleshoot DNS is something of a moving target, because it just keeps changing. We’ve covered the basics of DNS troubleshooting in the past (see the Learning Path). In this article, we move beyond the basics to take a new look at an old idea. We explore how to simplify name resolution on your network, we try out a couple of DNS troubleshooting tools that far surpass Nslookup, and we examine in depth the unjustly accused cause of a DNS trouble scenario that’s new in Windows Server 2008 R2: Extension mechanisms for DNS (EDNS). (By the way, Step One is “Check that it’s plugged in.” But you already knew that, right?)

Turn Off WINS

When people say, “Check DNS,” it’s sort of shorthand for “Check the entire name resolution infrastructure,” which includes local NetBIOS broadcasts, WINS, and a newly styled Network Neighborhood replacement called Network Discovery that arrived with Windows Vista—not to mention the HOSTS and LMHOSTS files. No wonder name resolution troubleshooting is so complicated! It’s as if some of your electricity came from the phone company and ran at 80 volts, some came from the cable company and provided direct current, and the rest came from the traditional power company, but you never knew exactly which kind of power went to your blender, your computer, or Grandma’s iron lung—which would make troubleshooting nonworking appliances very difficult and time consuming. Simplify the name resolution, and the troubleshooting gets easier.

Before you turn off WINS, you should of course test your network configuration without it (including the NetBIOS over TCP/IP setting in your TCP/IP properties). I think you’ll be surprised at how few things, if any, still need WINS, although it clearly depends on how modern your client and server OSs are. Disabling WINS is a terrible idea in Windows 2000 Server, workable but occasionally annoying in Windows Server 2003 and Windows XP, and fairly trouble-free in Vista and later OSs. Understand, however, that just because your OS is fine in a NetBIOS-free environment doesn’t mean that your apps will be. I’ve heard that some anti-malware apps need NetBIOS, although I haven’t run into this situation. If you do need WINS for an occasional app or two, look into using Server 2008’s GlobalNames zone; it can help DNS do part of WINS’s job in some cases. (For more information about Server 2008’s GlobalNames zone, see “What is GlobalNames in Windows Server 2008?”)

Use Network Monitor

Everyone knows about Network Monitor, but most folks are still scared to try it, and they shouldn’t be. Network Monitor captures and displays every network packet that enters or leaves your system, laying bare for your inspection every single bit that zips through your NIC. Netmon initially seems like a tool for black belts, but in some ways it’s even better suited for the Jack-of-all-trades DNS troubleshooter because it lets administrators employ an old repair adage: It’s hard to recognize “sick” if you don’t know what “healthy” looks like. So run Netmon on your system when it’s working, keep that capture handy, and then run another capture when things aren’t working. Compare the two captures and play a bit of “What’s different?” and you’ll soon start gleaning clues.

I know that simply mentioning Netmon causes some folks to run in fear, but don’t. Here’s a quick primer on Netmon and DNS. In this example, I set up a Server 2008 R2 DNS server, which I configured to look to itself to resolve DNS names. I told the server to ping www.bigfirm.com and captured the resulting network traffic with Netmon. The result is a complete mess of largely irrelevant network chatter: a ton of ore from which we want to pluck just a few golden nuggets.

The first step is to install Network Monitor. (The tool is available for free download.) When you start Netmon, be sure to right-click its icon and select Run as administrator. The first time you start Netmon, the program asks whether you want to use Microsoft Update. After you dismiss this dialog box, you’ll see a welcome screen such as the one in Figure 1. The lower left window controls which traffic you want captured. We don’t care about the isatap and Teredo traffic, so clear those check boxes. Leave the Local Area Connection check box selected. Ignore the P-Mode option; enabling it would only add to the clutter. Click New capture tab in the upper left window.

Figure 1: Network Monitor’s welcome screen

The Capture screen that opens, which Figure 2 shows, is one of the reasons administrators shy away from Netmon. To simplify this screen, close the Network Conversation pane on the far left side and the Hex Details pane in the lower right corner, leaving just Display Filter, Frame Summary, and Frame Details.

Figure 2: Network Monitor’s Capture screen

To put Netmon to work, open a command prompt and click the green triangle next to Start in the Netmon window. Give Netmon a second to get rolling, then return to the command prompt and run the command

ping -n 1 www.bigfirm.com

After the ping command runs, return to Netmon and click the blue square next to Stop. Congratulations—you’ve made your first capture! Look at the status information at the very bottom of the Netmon window frame to see how many network packets you captured. Depending on how quickly you worked and how quiet your network is, you might have anywhere from about a dozen frames to a hundred or so. No matter how many you have, it’ll look like a mess. To separate the wheat from the chaff, you need to create a display filter.

To view only the DNS-specific traffic, enter DNS in the Display Filter text field and click Apply. You’ll see a screen such as the one in Figure 3. This example screenshot shows just the six packets I’m interested in. (I removed the Process, Time Offset, and TimeDateLocalAdjusted columns because they weren’t relevant to this capture.)

Figure 3: Viewing DNS-specific traffic

These six packets show how a DNS server finds basically any host in any .com domain. First, your local DNS server queries a root server for the host address of www.bigfirm.com (packet 6, notice “Query for www.bigfirm.com…”). The root server responds, “I have no idea, but if you ask one of the .com DNS servers, it can help you” (packet 7, notice “Response – Success”). Then your DNS server asks a .com DNS server for www.bigfirm.com’s host address (packet 8) and is told that no, the .com DNS server can’t help, but that your DNS server might try asking bigfirm.com’s DNS server (packet 9). Your DNS server then asks bigfirm.com’s DNS server for www.bigfirm.com’s host address (packet 10) and finally gets its desired response (packet 11). Note that Netmon makes keeping track of who’s doing the talking easier with its Source and Destination columns: My local DNS server is 192.168.1.125, the root server is 192.203.230.10, the .com DNS server is d.gtld-servers, and bigfirm.com’s DNS server is web2.minasi.com. This is, of course, a very high-level overview.
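The referral chain in those six packets can be sketched as a small loop. The Python below is only a toy simulation under stated assumptions: the delegation table is hard-coded, the server labels and the final address (192.0.2.10, a documentation address) are hypothetical, and no network traffic is sent.

```python
# Toy model of the referral chain: each "server" either answers the query
# or refers the resolver to a server closer to the name. All data is made up.
REFERRALS = {
    "root": {"com.": "com-server"},
    "com-server": {"bigfirm.com.": "bigfirm-server"},
}
ANSWERS = {
    "bigfirm-server": {"www.bigfirm.com.": "192.0.2.10"},  # hypothetical address
}

def resolve(name, start="root"):
    """Follow referrals from the starting server until some server answers."""
    server, path = start, []
    while True:
        path.append(server)
        answer = ANSWERS.get(server, {}).get(name)
        if answer is not None:
            return answer, path
        # Collect delegations whose zone is the name itself or a parent of it.
        zones = [z for z in REFERRALS.get(server, {})
                 if name == z or name.endswith("." + z)]
        if not zones:
            raise LookupError(f"{server} cannot help with {name}")
        # Prefer the most specific (longest) matching zone.
        server = REFERRALS[server][max(zones, key=len)]

address, path = resolve("www.bigfirm.com.")
print(address, path)
```

Each pass through the loop corresponds to one query/response pair in the capture: a referral moves the resolver one level down, and the loop ends when a server holds the record.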

Figure 4: Viewing a frame’s details

To view a packet’s hierarchy, click the first of the filtered frames and look in the Frame Details pane. Figure 4 shows the packet hierarchy, which Netmon calls a frame. This frame contains the Ethernet packet, which in turn contains the IPv4 packet. Within IPv4 is the UDP packet (it could be TCP, but most DNS traffic runs over UDP), and inside that is the actual DNS query. Each level’s summary includes relevant addresses or ports and lengths for a nice overview. To dig deeper into the DNS query, click the plus sign to expand the DNS frame, as Figure 5 shows. Netmon’s level of detail in this frame is fairly self-explanatory. You can see that it’s a query (rather than a response), the question (host record for www.bigfirm.com), and an additional record that I’ll cover later in the article.
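To make the innermost layer of that hierarchy concrete, here is a minimal sketch of the bytes that make up a DNS query: a 12-byte header followed by the question. The function names and the transaction ID 0x1234 are illustrative assumptions; the code only builds the payload and does not send it or add the UDP, IPv4, and Ethernet wrappers.

```python
import struct

def encode_name(name):
    """Encode a host name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(name, txid=0x1234, recursion_desired=True):
    """Build a DNS query for an A record (type 1) in class IN (1)."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD flag
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)
    return header + question

packet = build_query("www.bigfirm.com")
print(len(packet), packet.hex())
```

The fields packed here are the same ones Netmon labels when you expand the DNS level of a frame, so the hex output can be compared against the Hex Details pane of a real capture.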

Figure 5: Expanded DNS frame

Examine the next frame’s DNS details and you’ll see that the root server responds by telling your DNS server, “Go talk to a .com DNS server,” by simply returning the list of 13 .com DNS servers. Your DNS server then makes the same query to one of the .com DNS servers, working its way down the DNS hierarchy one referral at a time until it finally gets its answer.

So how does this information help you troubleshoot DNS problems? Well, just recently, one of my DNS servers simply stopped responding to DNS queries. To make matters worse, a look at the detailed logging that Windows DNS servers can offer showed that the DNS server hadn’t received any queries. Could the firewall have somehow started blocking DNS traffic? The fastest way to find out was to run Netmon, which showed me that yes, indeed, the NIC was receiving the DNS queries from other systems—I could see the frames, and I could see that my DNS server had responded to none of them. I was pretty confident that it wasn’t a DNS problem, but rather something in the routing and IP itself. Sure enough, disabling RRAS solved the problem. (The ultimate answer seems to be that a patch broke my RRAS-based VPN servers and leaked over to the IP stack somehow.) Without the clarification of a Netmon trace, I’d have spent hours trying to eliminate possible culprits.
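The check described above, queries arriving with no responses going back, amounts to pairing frames by DNS transaction ID. A rough sketch follows; the Frame record is a hypothetical simplification and not Netmon's export format, and the addresses other than 192.168.1.125 are invented for the demonstration.

```python
from collections import namedtuple

# Hypothetical, simplified frame summary; not Netmon's export format.
Frame = namedtuple("Frame", "txid source destination is_response")

def unanswered_queries(frames):
    """Return queries for which no response with the same transaction ID
    travels back from the queried server to the requester."""
    answered = {(f.txid, f.destination, f.source)
                for f in frames if f.is_response}
    return [f for f in frames
            if not f.is_response
            and (f.txid, f.source, f.destination) not in answered]

capture = [
    Frame(0x1A2B, "192.168.1.50", "192.168.1.125", False),
    Frame(0x1A2B, "192.168.1.125", "192.168.1.50", True),
    Frame(0x3C4D, "192.168.1.51", "192.168.1.125", False),  # never answered
]
for f in unanswered_queries(capture):
    print(f"no response to query {f.txid:#06x} from {f.source}")
```

If every query in a capture shows up in this list, the requests are reaching the NIC and the fault lies somewhere between the network stack and the DNS service, which is the conclusion the trace pointed to.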

Dump Nslookup, Get DIG

The basic DNS troubleshooting tool shipped with Windows is of course Nslookup. But did you know that UNIX folks have had a much better alternative for years—Domain Information Groper? DIG isn’t built into Windows, but it’s easy to find and is a great addition to your DNS toolbelt.

To get DIG, go to the Internet Systems Consortium’s Downloads site, and download the latest version of BIND (currently BIND 9.7.2-P3). BIND is a free program that’s the Internet’s most popular DNS server.

After you download the latest version of BIND, create a folder on your system’s hard drive, add the folder to your system’s PATH environment variable, and copy the files from the BIND zip file to the folder. (If you prefer, you can delete everything in the folder except the DLL files, dig.exe, and dig.html, which is DIG’s Help file. We’re not running BIND, so we don’t need all the extra files.)

DIG’s basic syntax looks like

dig record-to-query-for [@dnsserver] [querytype] [+option1 +option2…]

So a query such as

dig bigfirm.com ns @192.168.1.125

would instruct DIG to ask the DNS server at 192.168.1.125 to find all the name server records for bigfirm.com. Figure 6 shows this query’s output; notice the greater level of detail in DIG’s output than in Nslookup’s output.

Skip down a few lines to the two lines that begin with ->>HEADER<<-; you can see the DNS header information basically taken from the frame and reformatted a bit. The status info tells you whether the query failed because the record doesn’t exist (NXDOMAIN), there was some sort of configuration error on the server (SERVFAIL), there was no error at all (NOERROR), or the server couldn’t interpret the query because it was malformed (FORMERR).
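The status word that DIG prints comes from the low four bits of the 16-bit flags field in the DNS header. A minimal sketch of that lookup, with a function name of my own choosing and only the first six standard response codes listed:

```python
# Response codes carried in the low 4 bits of the DNS header flags word.
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}

def status_from_flags(flags):
    """Return the status name for a 16-bit DNS header flags value."""
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}")

for flags in (0x8180, 0x8183, 0x8182):
    print(hex(flags), status_from_flags(flags))
```

The same value appears in Netmon's expanded DNS frame, so a status that looks odd in DIG can be cross-checked against the raw header in a capture.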

You then see the flags in the DNS header. A nice feature of DIG is that it lets you force the DNS header flags to particular values when you make your query. So, for example, if you were to add +norecurse, DIG would clear the Recursion Desired (RD) flag, telling the DNS server in question to perform only the first step in resolving bigfirm.com, which in this case would return only the root servers.
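The flag names that DIG lists are single bits in the same 16-bit header field. The sketch below decodes them and shows the one-bit change that a no-recursion query makes; the helper names are assumptions for illustration, and the sample values are typical rather than taken from the article's capture.

```python
# Single-bit flags in the 16-bit DNS header flags word.
FLAG_BITS = {
    "qr": 0x8000,  # this message is a response
    "aa": 0x0400,  # authoritative answer
    "tc": 0x0200,  # truncated
    "rd": 0x0100,  # recursion desired
    "ra": 0x0080,  # recursion available
    "ad": 0x0020,  # authentic data
    "cd": 0x0010,  # checking disabled
}

def decode_flags(flags):
    """List the names of the flags that are set, in header order."""
    return [name for name, bit in FLAG_BITS.items() if flags & bit]

def clear_rd(flags):
    """Return the flags word with Recursion Desired switched off."""
    return flags & ~FLAG_BITS["rd"] & 0xFFFF

print(decode_flags(0x8180))            # a typical recursive response
print(decode_flags(clear_rd(0x0100)))  # a query with recursion turned off
```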

The option +trace takes things a step further and causes DIG to print out exactly what the DNS server is doing as it finds bigfirm.com’s DNS server. This tool is very useful for anyone running AD, because security forces us to run our AD-serving DNS hierarchies outside of the public hierarchy, which can lead to a number of configuration errors that +trace can help smoke out. Put DIG on a USB stick and run it on a troubled system with +trace to find the domain controller’s (DC’s) host record; this action will often shed some light on the problem. Once you give DIG a try, I bet you’ll never go back to Nslookup.

Check EDNS

Let’s finish with a topic that I keep running across and am constantly asked about. Many people seem to think that a DNS feature called EDNS makes Server 2008 R2–based DNS servers incapable of resolving common Internet domain names and that the solution is to disable EDNS. In truth, EDNS isn’t new, nor is it at fault. As with many bedrock Internet protocols, our requirements for DNS have outgrown its original 1983 (RFC 882) specifications, forcing the Internet authorities to figure out how to shoehorn new capabilities into a small and inflexibly laid-out space.

I say that DNS is inflexibly laid out because it relies heavily upon a small number of 1-bit flags that indicate such things as whether or not a query needs recursion and whether or not a given DNS server is capable of doing a recursive search. The original RFCs allow only enough space for seven such flags, of which only one remains unused. Got a great idea to solve some DNS problem with a useful new flag? Too bad, unless something changes—there's no room at the inn.

The small-space issue stems from the fact that whenever DNS communicates via UDP—which is preferable, because UDP is so fast and the Internet contains so much DNS traffic—it’s constrained to a maximum packet size of 512 bytes. That 512-byte maximum was mandated by RFCs 883 (1983) and 1035 (1987) and is based on 1980s network realities that no longer apply. For example, have you ever noticed that very few domains on the Internet have more than 13 DNS servers? Even the massively overworked root domain advertises 13 DNS servers, even though it actually hosts 236 servers and uses clustering to make so many appear to be so few. This is because advertising more than 13 servers would create a packet that’s larger than 512 bytes, thus forcing a fallback to the much-slower TCP.

So we needed more flags and bigger packets, but we didn’t want to expand DNS in a way that would create a worldwide DNS compatibility nightmare in the process. The answer? 1999’s RFC 2671 and EDNS (again, Extension mechanisms for DNS). EDNS provides a clever way for an ever-growing population of EDNS-aware DNS servers to detect whether they’re talking to fellow EDNS-aware servers (and thus enjoy the benefits of more flags and more space), or instead are speaking to EDNS-deaf servers (in which case they remain within the pre–RFC 2671 realities, thus avoiding compatibility issues).

When an EDNS-aware requestor queries a responder for a DNS record, it formats the request in standard DNS format but then adds an extra record to its request: a new-to-EDNS kind of DNS record called an OPT record. OPT isn’t like the more familiar DNS records, such as A, MX, NS, SOA, CNAME, etc.; you’ll never see an OPT record in a zone file. It’s more like a secret handshake that only EDNS-aware servers know—a bit of data added to a query.

For example, suppose an EDNS-aware requestor wants to retrieve the A record for a system named PC1 in the bigfirm.com zone from bigfirm’s DNS server. A pre-EDNS requestor would just request the A record of the responder. In contrast, an EDNS-aware requestor would say, “I’ve got two requests for you: First, I need the A record for ‘PC1’ in bigfirm.com, and second, here’s an OPT query with the value ‘4000.’” The 4000 value is the requestor’s way of saying, “Hey, I understand EDNS and if you want to send me a packet that’s larger than 512 bytes, I can handle any UDP packet up to 4,000 bytes.” If the responder isn’t EDNS-aware, it won’t recognize the OPT record—and in that case just ignores it, responding only to the familiar A record query type and emitting no error messages (and thus preserving backward compatibility). But if the responder is EDNS-aware, it responds to both the A record request and the OPT request; in the OPT response, it also includes a number declaring how large a UDP packet it can handle. Thus, if a requestor were to send an OPT=4096 query at the end of another query, and if the responder came back with an OPT=1280 response, then the requestor would know first that the other side understood EDNS and second that it can use oversized UDP packets, but no larger than 1,280 bytes.
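On the wire, the OPT "handshake" is an 11-byte pseudo-record appended to the query, with the advertised UDP size carried in the field that ordinary records use for the class. The sketch below builds one and models the size negotiation as the paragraph above describes it; the function names are hypothetical, and the negotiation rule is a simplification of that description rather than a full EDNS implementation.

```python
import struct

OPT_TYPE = 41  # record type number assigned to OPT

def build_opt(udp_payload_size):
    """Build an empty OPT pseudo-record advertising a UDP payload size.
    Layout: root name (one zero byte), TYPE, CLASS (reused to carry the
    payload size), TTL (extended RCODE and flags, zero here), RDLENGTH=0."""
    return b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload_size, 0, 0)

def advertised_size(opt_record):
    """Read the UDP payload size back out of an OPT record built above."""
    _type, size, _ttl, _rdlen = struct.unpack("!HHIH", opt_record[1:])
    return size

def usable_size(requestor_size, responder_size=None):
    """Largest UDP packet both sides can handle; 512 without EDNS."""
    if responder_size is None:  # responder ignored the OPT record
        return 512
    return max(512, min(requestor_size, responder_size))

opt = build_opt(4000)
print(opt.hex(), advertised_size(opt))
print(usable_size(4096, 1280), usable_size(4000))
```

Because the record is simply appended to an otherwise ordinary query, it shows up in a capture as the single additional record in the query frame.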

So, how does this process go wrong? Well, imagine that your EDNS-aware Server 2008 R2 server queries another EDNS-aware server, and they decide to use a UDP packet that’s larger than 512 bytes. This size of packet can become fragmented (whereas 512-byte packets almost never do), and many cheap Network Address Translation (NAT) routers discard fragmented packets. Worse yet, some firewalls have security rules that say, “If it’s DNS and UDP and it’s bigger than 512 bytes, it must be evil—discard it.” In either case, the result is the same: a failed resolution.
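Some quick arithmetic shows why the larger packets fragment. The sketch assumes a standard Ethernet MTU of 1,500 bytes and an IPv4 header without options, neither of which is stated in the article; real paths (VPN tunnels, PPPoE links) often have smaller MTUs, which makes fragmentation start even earlier.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumed)
UDP_HEADER = 8   # bytes

def fragments_needed(dns_bytes, mtu=1500):
    """Number of IPv4 packets needed to carry one DNS-over-UDP message."""
    datagram = UDP_HEADER + dns_bytes      # what IP has to carry
    if datagram <= mtu - IP_HEADER:
        return 1                           # fits in a single packet
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)

for size in (512, 1280, 1472, 1473, 4000):
    print(size, fragments_needed(size))
```

Under these assumptions, a response has to stay at or below 1,472 bytes of DNS data to travel in one piece, so a responder that fills a 4,000-byte allowance will always produce fragments for a fragment-dropping router to discard.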

The best solution is to figure out what hop on the journey caused the problem, but that’s not always possible. Another approach is to simply tell your Server 2008 R2 server to no longer send out OPT queries, and thus to never initiate EDNS transactions. You can do that from an elevated command prompt with the following command:

dnscmd /config /enableednsprobes 0

No reboot is necessary after you run this command, and replacing the 0 with a 1 undoes the effect. Understand, however, that the only effect of this command is to prevent Server 2008 R2 from starting an EDNS conversation. If a DNS server queries the Server 2008 R2 server with an OPT packet, the Server 2008 R2 server will happily respond in EDNS fashion—so make sure that your routers and firewalls don’t have an outgoing filter that kills big DNS UDP packets!

EDNS isn’t new to Windows Server; Windows DNS has supported it since Server 2003. Server 2008 R2’s only change was to enable the probes. Nor is EDNS some sort of exotic futuristic protocol; rather, some sniffing of my Internet traffic shows that at least 85 percent of DNS servers of all stripes understand EDNS and are using it to good advantage. It would be a shame to make your Server 2008 R2 DNS server miss out on EDNS’s advantages, so you should do a bit of router reconfiguration before deprobing your DNS server.

Plan Ahead

DNS failures are big problems, but they’re easily conquered with a bit of housekeeping, keeping up on the best tools, and staying abreast of what’s new in DNS. Try out some of what you’ve learned here, and you’ll be better prepared to fix the really bad stuff!
