[Author's Note: Each article in this troubleshooting series builds on the previous month's article. In the July issue, I showed you how to use WinDbg to gather information about your IIS servers. This month, I show you how to put into practice all the tips and techniques I've given you so far to troubleshoot an IIS crash.]
Over the past few months, I've introduced you to the Windows 32-bit debugging tools (available for download from http://www.microsoft.com/ddk/debugging) and showed you how to gather important information from your IIS machines. This month, I show you how to put into practice everything you've been reading about. I've created a simple Internet Server API (ISAPI) application that you can download and use to make IIS crash. (I've also included the source code and the symbols for the application.) So let's get started.
Setting Up WinDbg
To begin, copy debug.dll and debug.pdb to your scripts directory (usually C:\inetpub\scripts). Note that to execute a .dll file, you must mark this directory for Execute and set the appropriate NTFS permissions. Make sure that you mark the Default Web Site (or whichever site you want to run this application under) to run in process. To mark the site, open the Microsoft Management Console (MMC) Internet Information Services snap-in (the Internet Information Server snap-in in Windows NT 4.0), right-click the Web site that contains your \scripts virtual directory, then select Properties. Click the Home Directory tab to access the Application Isolation options (the Run in separate memory space options in NT). Set the process isolation to Low (or clear the Run in separate memory space check box in NT), then restart the site.
Now, attach the debugger to the inetinfo.exe process. Start Windows Debugger (WinDbg) by choosing Start, Programs, Debugging Tools For Windows, then select WinDbg. From WinDbg, choose File, Attach to a process. When the list of processes appears, select inetinfo.exe, then click OK. After WinDbg attaches to the process and loads some information, the window in Figure 1, page 2, appears. Notice that the next-to-last line says ntdll!DbgBreakPoint. This line indicates that the process to which WinDbg is attached is currently stopped and that the debugger is waiting for input.
IIS doesn't yet have a problem, so start a log file and tell WinDbg to let IIS run again. The two commands to accomplish these tasks are
.logopen c:\debuglog.txt
g
(You enter these commands in the prompt window—i.e., the field preceded by 0:000>—in WinDbg.) The first command sets up a log file that writes all output from WinDbg to a text file. (Note that the period before this command is necessary to the command.) The second command tells WinDbg to let the process (in this case, inetinfo.exe) start running again. WinDbg then goes into a monitoring mode. After you enter the second command, notice that the prompt window is no longer active. When a fault occurs that trips WinDbg, the prompt window will reappear.
Now that IIS is running with WinDbg attached, you can crash it. To crash IIS, open a browser and load the ISAPI .dll file that you put in the \scripts directory (e.g., http://localhost/scripts/debug.dll). If you call debug.dll without URL parameters, you'll see the page that Figure 2 shows. Follow the instructions on this page to make debug.dll crash IIS. When you run debug.dll in Crash mode, the browser eventually times out and you see the WinDbg output window that Figure 3 shows.
In this output window, you can see several items. First, you see a list of modules that were loaded when you made your request to run debug.dll (i.e., the ModLoad lines). Next, you see that a problem occurred: in this case, a first-chance access violation. (I explain the difference between first- and second-chance exceptions later.) A warning message about not being able to verify a symbol checksum appears after the access violation. (Ignore this message for now.) Following the warning message, three lines of register/value pairs appear. (Registers are the storage locations built into a processor. The processor uses these locations to control how it runs.) These pairs represent the state of the Intel processor's registers at the time of the crash. Next, you see a line that tells you what code was running (i.e., debug!HandleRequest+1d). Finally, you see a line that shows the assembler instruction that tripped WinDbg. Now, you can start the Stack Backtrace.
What's a Stack Backtrace?
A Stack Backtrace shows you what a thread was doing up to the point at which you made the trace. This list includes only the current activities, not finished activities. Let's say I'm writing an article for IIS Administrator. The thread represents the task of writing, publishing, and distributing the article. I start out by making a commitment to write the article. This commitment becomes the bottom frame, or line, on the stack. Next, I make a rough draft—the second frame on the stack. When I've finished the draft, I send the article to my editor—the third frame on the stack. This frame is also my first call into another module in the memory space. When my editor has finished her edit, she sends the copy back to me for review—the return value from my call into the editor's module. At this point, frame 3 on the stack disappears because it completed its work and sent the results back to the calling function.
You're now back to a stack with two frames. After I've reviewed the edited article, the editor sends it to the publishing department—add a new frame 3 to the stack. The publisher lays the article out in its proper position in the newsletter. This full layout goes to the printer—frame 4 on the stack. The shipping department puts labels on the printed newsletters and calls the Postal Service to pick them up—frames 5 and 6. The postal carrier reads the address label on your copy of the newsletter. (For you programmers, consider this label a pointer. The label points to where you live so the postal carrier can reach you.) The postal carrier drives to the location that the label indicates but for some reason can't deliver the newsletter. This road block is a first-chance exception. The program crashes. Your stack would look like Table 1.
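The newsletter analogy can be approximated in a few lines of Python. This is only an illustrative sketch, not part of the downloadable sample: every function name and the address table are made up, and the workflow is simplified to fewer frames than the analogy describes. It shows that the editing step, which finished before the failure, leaves no frame behind, while every call still in progress does.

```python
import traceback

# Hypothetical address book: the only label the carrier can resolve.
KNOWN_LABELS = {"reader-001": "12 Main St"}

def deliver(label):
    # The carrier follows the label (the "pointer"); a bad label fails here.
    return KNOWN_LABELS[label]  # raises KeyError for an unknown label

def ship(label):
    return deliver(label)

def print_issue(label):
    return ship(label)

def publish(label):
    return print_issue(label)

def edit(draft):
    # This call returns before the failure, so it leaves no frame behind.
    return draft.strip()

def write_article(label):
    edit("  rough draft  ")  # finished work: pushed, then popped
    return publish(label)    # still in progress when the failure occurs

def active_frames(label):
    """Return the names of the functions still on the stack at the failure,
    top of the stack first."""
    try:
        write_article(label)
    except KeyError as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        # Skip the first entry, which is this helper itself.
        return [frame.name for frame in reversed(frames[1:])]
    return []

print(active_frames("smudged-label"))
```

The list that comes back starts at the failing call and ends at the call that began the work, which is the same top-to-bottom order a debugger uses when it prints a backtrace.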
Running the Stack Backtrace
So, let's get back to the real example. Type
Kb
in WinDbg's prompt window. Figure 4 shows the output from the Kb command. By typing the Kb command, I asked the debugger to give me specific information in a way that's easy to read. The Stack Backtrace contains six columns of information:
- ChildEBP—The Child Extended Base Pointer column points to the beginning of the local information for the current frame.
- RetAddr—The Return Address column tells each call where to return to when it's done.
- Args to Child—The next three columns represent the arguments (i.e., data) that are passed to the called child, or function.
- Function information—This column contains information about the called function. This information relies on the proper symbols being installed. (See "Starting the Troubleshooting Process," June 2001, for more information about symbols.) The format is
module!function+offset [codelocation]
where module is the name of the .dll or .exe file, function is the name of the routine inside the module, and offset is the number of bytes after the beginning of the function. If source-code information about the module is available, codelocation is the drive and folder of the source code file for the frame.
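If you save the debugger output to a log file, you can split each backtrace line into the columns described above with a short script. The following Python sketch assumes a 32-bit layout of five 8-digit hexadecimal fields followed by a module!function+offset symbol. The sample line uses made-up addresses and arguments (only the debug!HandleRequest+1d symbol comes from this article), and real output can differ between debugger versions.

```python
import re
from typing import NamedTuple, Optional, Tuple

class KbFrame(NamedTuple):
    child_ebp: int
    ret_addr: int
    args: Tuple[int, int, int]
    module: str
    function: str
    offset: int

# Five 8-digit hex fields, then module!function with an optional +offset.
_KB_LINE = re.compile(
    r"^(?P<ebp>[0-9a-fA-F]{8})\s+(?P<ret>[0-9a-fA-F]{8})\s+"
    r"(?P<a1>[0-9a-fA-F]{8})\s+(?P<a2>[0-9a-fA-F]{8})\s+(?P<a3>[0-9a-fA-F]{8})\s+"
    r"(?P<module>[^!\s]+)!(?P<function>[^+\s]+)"
    r"(?:\+(?:0x)?(?P<offset>[0-9a-fA-F]+))?"
)

def parse_kb_line(line: str) -> Optional[KbFrame]:
    """Parse one backtrace line; return None for headers and other text."""
    match = _KB_LINE.match(line.strip())
    if match is None:
        return None
    return KbFrame(
        child_ebp=int(match.group("ebp"), 16),
        ret_addr=int(match.group("ret"), 16),
        args=tuple(int(match.group(name), 16) for name in ("a1", "a2", "a3")),
        module=match.group("module"),
        function=match.group("function"),
        offset=int(match.group("offset") or "0", 16),
    )

# Hypothetical line: the numeric values are placeholders, not real output.
sample = "0012fd3c 10001234 00000001 00000002 00000003 debug!HandleRequest+1d"
frame = parse_kb_line(sample)
print(frame.module, frame.function, hex(frame.offset))
```

Returning None for lines that don't match lets you feed a whole log file through the function and keep only the frames.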
Looking at the output from the Stack Backtrace, note that the access violation occurred in a function called HandleRequest inside module debug.dll. However, this line alone doesn't tell you whether HandleRequest was the culprit in the crash. I said earlier that the postal carrier couldn't deliver the newsletter. Why not? To perform their jobs, postal carriers read the address on a piece of mail and go to that address. If the postal carrier in my scenario read the newsletter address correctly, the address on the newsletter must have been incorrect. As a result, the postal carrier walked into the wall. To tell whether such is the case in the real scenario, look at what the HandleRequest function did. The function attempted to pass deaddead as a pointer to a memory address. (Note that deaddead is a hexadecimal value, not a variable.) However, the memory address referred to is invalid. Now you know that the postal carrier received a bad address.
This exercise is the premise of walking a stack to see what really happened. You could continue to walk the stack frames until you find a frame that received good information but sent bad information to the function it called. Then, you can investigate that function for bad code. (Note that in the case of third-party software, you might need to work with the author or company to resolve the problem.) If you wanted to take a simpler approach, you could look at the stack and see what programs were running, regardless of source code or parameters passed. If you see a module or program listed that you suspect might be misbehaving, move that application into its own memory space. On a busy production server, this method is a quick way to identify a problem.
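The idea of walking down the stack until you reach a frame that received good information but passed on bad information can be sketched in Python. Everything here is an assumption made for illustration: the frame names are borrowed from the newsletter analogy, the argument values are invented, and the validity test simply treats deaddead as the one known-bad value. In a real debugging session, you judge whether an argument is valid by inspecting memory, not by running a script.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (function name, arguments the function received), top of the stack first.
Frame = Tuple[str, Sequence[int]]

def find_suspect(frames: List[Frame],
                 looks_valid: Callable[[int], bool]) -> Optional[str]:
    """Return the first caller, walking down from the top of the stack, that
    received only valid-looking arguments but handed a bad one to its callee."""
    for callee, caller in zip(frames, frames[1:]):
        callee_got_bad = any(not looks_valid(arg) for arg in callee[1])
        caller_got_good = all(looks_valid(arg) for arg in caller[1])
        if callee_got_bad and caller_got_good:
            return caller[0]
    return None

BAD_POINTER = 0xDEADDEAD  # the invalid value from the article's example

def looks_valid(value: int) -> bool:
    # Hypothetical check: anything except the known-bad value passes.
    return value != BAD_POINTER

# Made-up stack: the bad label first appears in what "printer" passed on.
example_stack = [
    ("carrier", [0xDEADDEAD]),
    ("shipping", [0xDEADDEAD]),
    ("printer", [0x00401000]),
    ("publisher", [0x00401000]),
]
print(find_suspect(example_stack, looks_valid))
```

The function returns None when no frame on the stack fits the pattern, which corresponds to the case in which the code that damaged the data has already returned.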
A Few Considerations
I've used a simple and straightforward access violation to demonstrate how you walk a stack. However, the problem is rarely this simple. Usually, several frames exist on the stack above the faulting frame—assuming that the faulting frame still exists. Here's a list of items to consider when you're looking at access-violation stacks:
- In most access-violation cases, you see several frames of code from kernel32.dll or ntdll.dll (or other NT modules) on the top of the stack. Such frames aren't unusual because most bugs in which bad pointers cause a problem (such as in this case) don't blow up until they're deep inside NT. I'm not saying that the NT files on the top are never the cause of problems, but they rarely are.
- You might or might not see the entire backtrace in a stack. To tell whether the entire backtrace is showing, look at the bottom frame. If this frame is a call to BaseThreadStart or a similar thread-creation routine, you're most likely looking at the whole backtrace. Not seeing the entire stack isn't necessarily a problem. Often, even partial stacks are enough to help. Debuggers can't always walk a stack all the way back because of bad symbols or a problem deciphering all the information on the stack.
- Don't forget about the functions that might already have run and finished. In my example, you could easily see where I passed a bad pointer from the second frame to the first frame. However, what if (in the newsletter scenario) I originally passed the addresses to the editor? Perhaps the editor spilled coffee on one of the labels and smudged it. Then, the label was forwarded until finally the postal carrier hit the road block. Finding this problem would be difficult because no evidence of the editor exists on the stack.
- If you don't have access to source code, you can make educated guesses as to what might be going on. If you see a call to something like Kernel32!RaiseException, you can assume that the program had some error-handling routine (or relied on IIS's error handling) and did something that caused the error-handling routine to be called (which is the sole purpose of the RaiseException call).
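Two of the rules of thumb in this list lend themselves to a quick script: skipping the NT frames at the top of the stack to find the first frame from another module, and checking the bottom frame to judge whether the backtrace is complete. The Python sketch below is a rough aid under stated assumptions. The module set contains only the two modules named above, the thread-start set contains only BaseThreadStart, and the example stack and its offsets are invented.

```python
from typing import List, Optional, Tuple

# Assumed lists; extend them for the modules and routines on your system.
SYSTEM_MODULES = {"kernel32", "ntdll"}
THREAD_START_ROUTINES = {"BaseThreadStart"}

def split_symbol(symbol: str) -> Tuple[str, str]:
    """Split 'module!function+offset' into (module, function)."""
    module, _, rest = symbol.partition("!")
    function = rest.split("+", 1)[0]
    return module.lower(), function

def first_non_system_frame(symbols: List[str]) -> Optional[str]:
    """Return the topmost frame that isn't in a system module, if any."""
    for symbol in symbols:
        module, _ = split_symbol(symbol)
        if module not in SYSTEM_MODULES:
            return symbol
    return None

def looks_complete(symbols: List[str]) -> bool:
    """Guess whether the backtrace reaches a thread-creation routine."""
    if not symbols:
        return False
    _, function = split_symbol(symbols[-1])
    return function in THREAD_START_ROUTINES

# Hypothetical stack, top frame first; the offsets are placeholders.
stack = [
    "ntdll!RtlAllocateHeap+1a",
    "debug!HandleRequest+1d",
    "kernel32!BaseThreadStart+37",
]
print(first_non_system_frame(stack))
print(looks_complete(stack))
```

Treat the result as a starting point for investigation rather than a verdict, because the NT modules at the top of the stack are rarely the cause of a problem but occasionally are.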
Exceptions and Error Handling
The final point I want to bring up here is the difference between first- and second-chance exceptions and how error-handling routines fit into the debugging picture. Going back to the postal carrier scenario, let's say that you anticipated the possibility that one of the addresses was wrong, so you bought protection in the form of a second carrier (i.e., an exception-handling routine). The first carrier hitting the road block raised a first-chance exception, which made the other carrier kick in. The second carrier's instructions were to go pick up the newsletter, call the office, and get a corrected address. The office returns another address, then the second carrier delivers the newsletter. The error-handling routine worked!
However, if the second postal carrier received the same address or another invalid address, this carrier would hit the same road block as the first carrier, and a second-chance exception would occur. At this point, the program can't recover. In the real scenario, I had no error-handling routine, so the first-chance exception immediately became a second-chance exception.
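The two-carrier scenario maps loosely onto ordinary exception handling. The Python sketch below is an analogy only: first- and second-chance exceptions are a Windows debugging concept, and Python has no direct equivalent. The exception class, the function names, and the addresses are all hypothetical.

```python
class BadAddress(Exception):
    """Raised when a carrier can't find the address on the label."""

def deliver(address, known_addresses):
    # A carrier can deliver only to an address that actually exists.
    if address not in known_addresses:
        raise BadAddress(address)
    return f"delivered to {address}"

def deliver_with_backup(address, corrected_address, known_addresses):
    try:
        # First carrier: a failure here is like a first-chance exception.
        return deliver(address, known_addresses)
    except BadAddress:
        # Second carrier (the error-handling routine) retries with the
        # address the office supplied. If this attempt also fails, nothing
        # here catches the error, which is like a second-chance exception.
        return deliver(corrected_address, known_addresses)

known = {"12 Main St"}
print(deliver_with_backup("21 Main St", "12 Main St", known))
```

Calling deliver_with_backup with two addresses that are both missing from known lets BadAddress escape to the caller, the counterpart of a program that can't recover.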
Debuggers will trip (i.e., generate a break) on first-chance exceptions. So, when a program trips the debugger, you might be looking at a problem that can fix itself. If a program trips while performing a live debug and you want to know whether the program can recover, simply type
g
in WinDbg's prompt window and see whether the debugger immediately trips again.
You've made your first foray into debugging IIS. For more in-depth information about stacks, frames, and how to read them, see the Microsoft article "Analyzing Logs from Exception Monitor" (http://msdn.microsoft.com/workshop/server/iis/readlogs.asp). Next month, I'll dive into a common scenario in which IIS hangs and the debugger doesn't trip or catch anything automatically.