Last month, I gave you a quick look at IIS architecture. This month, I show you how to start applying that knowledge to isolating and fixing problems you might encounter. The information here is second nature to most administrators, but many in the industry, including some support personnel, tend to overlook it. Combined with a good understanding of how IIS works, this information can quickly narrow your search for the cause of a problem.
The first and most important thing to do when trying to resolve an IIS problem is to review all the facts you already have. Ask yourself these key questions.
- What symptoms are you seeing? In this case, you're looking for the problem type. You could have a Dr. Watson error or an Access Violation error. The server might appear to hang and not serve pages. It could hard lock to the point of not running applications. The client might be receiving the ASP 0115 "A Trappable error has occurred" error message, but the server still appears to run.
- If you're experiencing a hang in which Active Server Pages (ASP) requests appear to stop responding, can you access an .htm page? This question is a crucial part of isolating problems. ASP requests have their own pool of worker threads (provided by COM+ in IIS 5.0 or by Microsoft Transaction Server, MTS, in IIS 4.0). If ASP is hung but the server can still serve .htm or other static pages, concentrate your efforts specifically on ASP. If everything is hung, you need to investigate IIS's core thread pool, known as the Asynchronous Thread Queue (ATQ).
- Was the server working before this problem occurred, or is this system new? If the server was working before, what changed? Did you add code, change a setting, or move the site to a new machine? Don't overlook anything here: if everything was fine last week and it isn't now, something changed, and knowing what changed gives you a starting point for your investigation. Code additions or revisions are often the culprits in problems with Web servers.
- What does the problem site or server do? In other words, what kind of code is on the site and does it talk to databases, back-end servers, and so on? This question can sometimes be difficult to answer. If the machine is hosting one site that uses VBScript and ADO to retrieve data from a back-end Microsoft SQL Server, you have a pretty good idea of the components involved. If, however, the server is an ISP machine that hosts 50 different sites, each written by a different client, the problem becomes more difficult. I usually recommend that ISPs and administrators hosting multiple sites keep a log with the general functionality of the different sites they're hosting. Your list will never be perfect because you can't guarantee that your customers will always tell you everything, but you'll have a good starting point.
- Is the problem reproducible on demand? Can you perform a series of steps that cause the problem to occur every time? If so, can you perform those steps on a different machine with the same application and have the problem occur on the other machine? This process can tell you whether the problem is in the code or setup or in the settings of a specific machine. Also, if you can't reproduce the problem on another machine, start looking at the differences between the two machines.
- Do relevant entries appear in Event Viewer? This information is quick and easy to retrieve. Often, you can find a problem simply by looking at the messages in Event Viewer. For example, if you're having trouble getting an ASP page to return data from a SQL Server machine and you see an error in Event Viewer stating that an ADO error has occurred (along with an error code), you can go to Microsoft's Support site and search the Knowledge Base for the error. Chances are that you'll find one or more articles that address the error.
- What's required to get the server running again? Discover this information systematically, escalating one step at a time. For example, if the Web sites are hung, first stop and restart the Web service: choose Start, Run, and type net stop w3svc, then choose Start, Run, and type net start w3svc. If that process doesn't work, stop the entire Web server by typing net stop iisadmin /y, then restart it by typing net start w3svc (the Web service depends on the IIS Admin service, so starting w3svc starts iisadmin as well). Keep in mind that you also have to restart any other services that were running in inetinfo, such as FTP and SMTP. If this process still doesn't work, you need to restart the server on which IIS runs.
- What does the client see? If a client reports trouble accessing the site, ask the user for specifics. Remember that some newer browsers have a setting for Friendly HTTP errors. This setting masks the real error and doesn't let you get enough information. Ask the client to disable this setting, then try to reproduce the error.
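The static-page-versus-ASP question above lends itself to a small probe. The following Python sketch is only an illustration of that triage logic, not a tool from the debugging kit: the function names and the short result codes are invented here, and you would supply your own URLs for one static page and one ASP page on the server in question.

```python
from http.client import HTTPException
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def responds(url, timeout=10.0):
    """Return True if the server answers the request at all within the timeout."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except HTTPError:
        # The server sent back an error status, so it answered and is not hung.
        return True
    except (URLError, OSError, HTTPException):
        # Timeouts, refused connections, and broken responses all count as no answer.
        return False


def classify_hang(static_ok, asp_ok):
    """Map the two probe results onto the next area to investigate.

    The result codes are hypothetical labels used only in this sketch.
    """
    if static_ok and asp_ok:
        return "ok"            # both page types are served; no hang observed
    if static_ok:
        return "asp"           # static pages work, ASP does not: concentrate on ASP
    if not asp_ok:
        return "atq"           # nothing is served: look at the core thread pool (ATQ)
    return "inconclusive"      # ASP answers but static pages do not; probe again
```

To use it, call responds once with the URL of a static .htm page and once with the URL of an .asp page, then pass both results to classify_hang. A single probe is only a snapshot, so repeat it a few times before drawing conclusions.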
Using the Information
After you've gathered the information, you can start troubleshooting. Over the next few months, I'll cover many scenarios that show you how to use the information you've gathered. I'll also show you how these different pieces of information relate to and build on one another. This month, however, to give you a quick example of what I'm talking about, consider the following situation.
A Web server is serving pages that are incomplete (the bottom part of the page is missing). The problem happens only on the production server. You discover that some code has been changed in an Internet Server API (ISAPI) filter that regulates page lengths. So, you take the new code and run it on a staging server, but you can't reproduce the problem. You assume that something else must also have changed. When you compare the settings on the production server with those on the staging server, you see that the setting for Keep-Alives is different on the staging server. When you change the setting on the staging server to match that of the production server, you reproduce the problem. Now you know what the cause of the problem is, and you can decide whether to change Keep-Alives on the production server or rewrite the code to accommodate the new setting.
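Comparing two servers by eye is error prone, so it can help to write the settings down and compare them mechanically. The following Python sketch assumes you have already collected each server's settings into a dictionary by whatever means you prefer; the function name and the setting names in the usage example are made up for illustration and are not real IIS property names.

```python
NOT_SET = "<not set>"  # placeholder for a setting that exists on only one server


def diff_settings(production, staging):
    """Return {name: (production_value, staging_value)} for every setting that differs."""
    differences = {}
    for name in sorted(set(production) | set(staging)):
        production_value = production.get(name, NOT_SET)
        staging_value = staging.get(name, NOT_SET)
        if production_value != staging_value:
            differences[name] = (production_value, staging_value)
    return differences


if __name__ == "__main__":
    # Hypothetical settings, loosely modeled on the scenario above.
    production = {"keep_alives": False, "connection_timeout": 900}
    staging = {"keep_alives": True, "connection_timeout": 900}
    for name, (prod, stage) in diff_settings(production, staging).items():
        print(f"{name}: production={prod} staging={stage}")
```

In the scenario above, the only entry in the result would be the Keep-Alives setting, which is exactly the difference you need to find.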
Setting Up Your Servers
To avoid some common problems, let's look at how you should set up your production, staging, and development servers. I recommend that you install a small set of debugging tools on your production and staging servers and all the debugging tools on your development servers.
Before you can set up your server, you must download the latest Windows 2000 or Windows NT debugging tools from http://www.microsoft.com/ddk/debugging. This download includes some basic, commonly used tools. When you've downloaded the debugging tools, run the .exe program to begin the installation. You can accept all the defaults, but you'll probably want to change the installation path to something short, such as C:\dbg, because you'll use some of the tools in a command window. To see a list of the debugging tools included in the download, go to the directory in which you installed the tools, then open relnotes.txt.
Next, you need to install symbols. (To understand what symbols are and where to get them, see the sidebar "Understanding and Using Symbols," page 2.) I recommend that you keep the appropriate symbols on each server. Obtain all the proper symbol files you need for your servers, and run the setup programs for them. Keep the following information in mind.
Always install IIS and OS symbols in the same order in which you installed software on your system. If the system prompts you to overwrite older symbol files, do so. The symbol file for each file must match the file version exactly. When you obtain hotfixes or security patches, make sure you add the symbols for them as well.
Symbol files usually reside on the system drive under \%winnt%\symbols. You can move the symbol files anywhere you want, but you need to make sure you set the system environment variable SYMPATH to the directory in which the symbols reside. Make sure you set the System variable and not the logged-on user's variable, because several tools run under the context of the System account.
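A quick sanity check can confirm that the variable points somewhere useful. This Python sketch is an illustration only: the function name and the messages are invented here, and it inspects whatever environment mapping you pass in. It cannot tell whether the value came from the System variable or from the logged-on user's variable, so you still need to verify that in the system settings.

```python
import os
from pathlib import Path


def check_symbol_path(env, variable="SYMPATH"):
    """Return "ok" or a short description of what is wrong with the symbol path."""
    value = env.get(variable)
    if not value:
        return f"{variable} is not set"
    path = Path(value)
    if not path.is_dir():
        return f"{variable} points to a missing directory: {value}"
    if not any(path.iterdir()):
        return f"{variable} directory is empty: {value}"
    return "ok"


if __name__ == "__main__":
    # Checks the environment of the current process only.
    print(check_symbol_path(os.environ))
```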
Running Microsoft Windows NT 4.0 Option Pack. If you're installing symbols on an NT 4.0 server that's running the NT 4.0 Option Pack, you must perform an extra step to set up symbols. After you run the Option Pack Symbol setup, several directories will appear under the Symbol directory (e.g., DLL, EXE, OCX). A directory called IIS4 will also appear, which will have similar directories under it. Copy these directories up one level so that their contents are added to the files in the core directories. (The system will prompt you to overwrite files, which you should do.)
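If you have several servers to prepare, you can script the copy step. The following Python sketch shows one way to do it, assuming Python 3.8 or later is available on the machine where you stage the symbols; the function name is made up, and you should try it on a copy of the symbol tree first.

```python
import shutil
from pathlib import Path


def merge_nested_symbols(symbol_root, nested="IIS4"):
    """Copy each subdirectory of symbol_root/nested up one level, overwriting files.

    Returns the sorted names of the subdirectories that were merged.
    Raises FileNotFoundError if the nested directory does not exist.
    """
    root = Path(symbol_root)
    merged = []
    for source in sorted((root / nested).iterdir()):
        if source.is_dir():
            # dirs_exist_ok=True merges into an existing DLL, EXE, or OCX
            # directory; files with the same name are overwritten.
            shutil.copytree(source, root / source.name, dirs_exist_ok=True)
            merged.append(source.name)
    return merged
```

The function leaves the IIS4 directory in place, so you can compare the two trees afterward if a symbol fails to load.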
Set up your development servers. For development servers, install all the tools and symbols that you installed on your production and staging servers. Also, install any debugging tools that are part of your development environment. Remember that your development tools should never be on a production server. See your development product documentation for installation procedures.
Crash Cart Servers
A crash cart server is a support-only machine that you can leave in your production server room for remote debugging. Many companies set up production servers in a demilitarized zone (DMZ) that doesn't allow remote access or restricts the tools and utilities that can go on the servers. To deal with this restriction, load all the necessary tools on a small server (usually on a rolling cart) and make the server ready to move over to whatever machine is experiencing problems. The crash cart server should contain all these components (at a bare minimum):
- Win2K Server with Performance Monitor and Component Services installed (Performance Monitor can read NT 4.0 log files, so you cover both platforms this way.)
- the debugging tools I mentioned previously for production servers
- a full symbol tree
- a full copy of Network Monitor (the copy that's part of the NT installation is crippled and won't work here) or another network-analysis package
- Dial-Up Networking (DUN), both server and client
- an Ethernet adapter and cable
- a serial (null modem) cable (for kernel-mode debugging)
By having this cart available, you'll be ready to work on most problems right in the server room with a minimal effect on uptime.
Now that you've set up your machines and are ready to gather all the relevant information about any problems you encounter, you're ready to start troubleshooting. The next few articles in this series will lead you through symptoms, steps to find the cause, and how to fix the problems. I'll work through the problems based on sample scenarios. Each scenario will start by answering the questions in the Problem Analysis section of this month's article and will include steps that you can use to solve the problems.
Next month, I'll start covering most of the debugging tools I mentioned this month. If you want a jump-start on this information, you can read the debugger.chm Help file. This file is a great introductory reference to several troubleshooting techniques.
Note: I recommend installing the debugging tools so that you're prepared to troubleshoot a problem if it occurs. In many scenarios, companies have strict guidelines about modifying production or staging servers. If the debugging tools are part of a core installation, then you don't need to wait for approval to install the tools.