This month, I show you how to troubleshoot situations in which your server is sluggish or unresponsive because of high CPU utilization. When a server's CPU or CPUs are working at or above 80 percent to 90 percent utilization, applications on the server can become sluggish or stop responding completely. When this situation occurs, you need to determine which process is monopolizing the CPUs.
To find the problem process, open Task Manager and click the Processes tab to see a list of running processes, like the one that Figure 1 shows. (You can click the CPU column header to sort the list of processes by their CPU utilization.) Make a note of which process is monopolizing the CPU time. If IIS is causing the problem, the offending process will be inetinfo.exe or the out-of-process (OOP) executable dllhost.exe (Windows 2000) or mtx.exe (Windows NT 4.0). If you use Microsoft Indexing Service, cidaemon.exe might explain the high CPU utilization. This process builds and updates the Index catalog and typically uses a lot of CPU time. However, cidaemon .exe is also set to run at a low priority, so it usually doesn't degrade the performance of the system even if it's at a high CPU level. If any other process is monopolizing the CPU, the problem is most likely not related to IIS.
When you know which process is causing the problem, you can gather more information about that process. The following steps should help you obtain the information you need. These steps assume you've already installed the debugging tools I've mentioned in previous articles in this series. For debugger installation instructions, see "Starting the Troubleshooting Process," June 2001, InstantDoc ID 20594. The steps also assume that you've downloaded two additional files—debug.dll and debug.pdb—from the Code Library on the Windows Web Solutions Web site (http://www.windowswebsolutions.com), installed them, and copied them to your default scripts directory. You need debug.dll and debug.pdb if you want to peg the CPU (i.e., cause the CPU to perform at 100 percent utilization) and follow along with the steps in this article. You don't need the files to troubleshoot an actual high-CPU-utilization problem. Make sure that the scripts virtual directory is running in process.
Collecting Performance Information
Open Performance Monitor on the server, and create a Performance Monitor log file. To do so on a Win2K server, follow these steps:
- Expand the Performance Logs and Alerts node.
- Right-click Counter Logs, then select New Log Settings.
- Enter a descriptive name for the log file.
- Note the log file location—you'll need to know it later. (If you want to change the file's location, you can do that after you add the performance objects below.)
- Select All Counters and All Instances.
- Select the following items from the Performance object drop-down list, being sure to click Add after each selection: Active Server Pages, Memory, Process, Processor, Thread, Web Service, Internet Information Services Global.
- Click Close.
- Change the interval to 10, then click OK.
To create a Performance Monitor log file on an NT 4.0 server, follow these steps:
- Select Log from the Options menu.
- Set the location and name of the log file.
- Change the interval to 1 (for 1 second).
- Click Save.
- Click the plus sign (+) on the toolbar.
- Add the following objects: Active Server Pages, Memory, Process, Processor, Thread, Web Service, and Internet Information Server Global.
- Click Done.
- Select Log from the Options menu.
- Click Start Log.
After you've created the log file, let the server run for 1 to 2 minutes. Then, open a browser and type
to begin pegging the CPU. Your system will run very sluggishly during the pegging process.
Run Autodump+ (AD+—one of the debugging tools I've mentioned in previous articles) using the syntax
cscript adplus.vbs -hang -iis -o <logfilepath>
For more information about AD+, see the Microsoft article "HOWTO: Use Autodump+ to Troubleshoot 'Hangs' and 'Crashes'" (http://support.micro soft.com/default.aspx?scid=kb;en-us;q286350).
On a Win2K server, right-click the appropriate log listed in Performance Monitor under Counter Logs, then select Stop Log. On an NT 4.0 server, select Log from Performance Monitor's Options menu and click Stop Log. After you've stopped the logs, use Task Manager to end the inetinfo.exe process. At this point, you should have a log file from Performance Monitor and a dump file of inetinfo.exe and any OOP IIS applications.
Examining Performance Information
Now you can examine the log and dump files. In earlier articles, I've talked about NT Synchronization Objects, which are used heavily by IIS and are therefore important to understand in high-CPU-utilization cases. When a thread is waiting for something else to finish working or for data from a disk or network connection, the thread isn't really doing anything, so the program running in the thread should use an NT Synchronization Object to "put the thread to sleep." Then, when the other thread completes its work or the disk returns the necessary data, NT can "wake up" the first thread and it can continue working. While the first thread is asleep, it doesn't use any CPU time.
Because IIS spends a lot of time waiting for incoming requests, responses from databases, and data from disks, most of its threads are in a wait state (and therefore not using much CPU time) at any given point in time. So, in a case in which IIS is using a lot of CPU time, one or two threads are typically the culprits. You can look at the Performance Monitor log to determine which threads are utilizing the CPU.
In Win2K, open Performance Monitor, then click the View Log File Data icon on the taskbar and select the log file, as Figure 2 shows. Next, you need to add counters for thread CPU utilization. Click the Add Counters icon on the taskbar to open the Add Counters dialog box, which Figure 3 shows. From the Performance object drop-down list, select Thread. Select the Select counters from list option, then choose % Processor Time. Then, select the Select instances from list option and choose all the inetinfo threads. Click Add to load the counters. In the Performance Monitor console, ensure that the Highlight icon on the taskbar is enabled, as Figure 4 shows. The steps are similar on an NT server.
Now you can navigate through the list of counters at the bottom of the window until you find the one that's using most or all of the CPU time. (When highlighting is enabled, the white line on the graph represents the counter selected at the bottom of the window. The white line of the problem thread will consistently have the highest position on the graph.) For example, in Figure 4, the problem thread has instance number 7. Make a note of which thread was high in your test.
Note that if you see a thread that's using a lot of CPU time but that dies down to 0 percent for several seconds at the end of the log, you probably won't get any useful information by examining that thread in the dump file. The dump file records one instant toward the end of the Performance Monitor log, and a thread that was back at 0 percent was probably already out of the routine that was causing high CPU utilization in the first place. The exception to this generalization is threads whose CPU utilization drops right at the end of the log. This behavior is typically caused by the debugger freezing the process while creating the dump file.
Now that you know which thread you want to examine, you can load the dump file in Windows Debugger (WinDbg). Go to the desired thread by typing
(e.g., ~7s). Then, type
to get the stack trace for that thread. You should see a dump similar to the one at the bottom of Figure 5. (Note that an overlying window shows source code for the section of code being executed in the thread. The code showed up because I copied debug.pdb—i.e., the debug symbols—to the same directory as debug.dll. You might not see such a window.) At the top of the stack, you can see a call to debug.dll, so you can conclude that this DLL is using all the CPU time. If debug.dll were a real application running on your system, you could now ask the developer to review the code for problems.
Troubleshooting Without a Log
If you don't have a Performance Monitor log, you can simply look at what all the threads in the inetinfo.exe dump file are doing (the command to dump the stacks for all threads is ~*kb) to determine which thread is the culprit. Remember that most threads will be waiting for another process or for data; you can eliminate those threads immediately. One of the following synchronization calls on the top frame of a thread's stack indicates a thread that you can ignore:
(This list isn't all-inclusive; other calls also can indicate no CPU utilization.) Another thing to remember is that the top frame of the stack for thread 0 in inetinfo.exe always reports the Zw-ReadFile state. However, thread 0 (and this statement applies only to thread 0) in the ZwReadFile state isn't really doing any work. You can assume that it isn't using any CPU cycles.
When you examine dump files that you've captured during a time of high CPU utilization, you can fairly easily see which process or processes are using all the processor time. Eliminate all the threads that are in one of the states listed above, then look at the threads that are left. If you don't have the Performance Monitor log to accompany the dump, you must do some guesswork, but you can almost always narrow the list of threads until you find the ones that are actually utilizing the CPU.