When you think of a computer system's performance, imagine a chain: The slowest component (or weakest link) affects the performance of the overall system. This weak link in the performance chain is also called a bottleneck. The best indicator that a bottleneck exists is the end user's perception of a lag in a system's or application's response time. To tune a system's performance, you need to determine where—CPU, memory, disk, network, applications, clients, or Windows NT resources—a bottleneck exists. If you add resources to an area that isn't choking your system's performance, your efforts are in vain.
You can use NT Server's native tools (or those of third-party vendors) to optimize the performance of your system and identify potential bottlenecks. NT Server's primary performance tools are Task Manager, which Figure 1, page 42, shows, and Performance Monitor, which Figure 2, page 42, shows. Task Manager can give you a quick look at what's happening in your system. Although it doesn't provide a logging mechanism, Task Manager displays specific information about your system's programs and processes. Task Manager also lets you manage the processes that might be adversely affecting your system. You can use Performance Monitor to obtain more detailed performance information (in the form of charts, alerts, and reports that specify both current activity and ongoing logging) based on system events. The Microsoft Windows NT Server 4.0 Resource Kit also contains tools that you can use for troubleshooting. (For a sampling of NT performance-monitoring tools, see Table 1, page 43.)
Before you start performance tuning, you must understand your system. You should know what server hardware you have, how NT operates, what applications you're running, who uses the system, what kind of workload the system handles, and how your system fits into the network infrastructure. You also need to establish a performance baseline that tells you how your system uses its resources during periods of typical activity. (You can use Performance Monitor to establish your baseline.) Until you know how your system performs over time, you won't be able to recognize slowdowns or improvements in your NT server's performance. Include as many objects in your baseline measurements as possible (e.g., memory, processor, system, paging file, logical disk, physical disk, server, cache, network interface). At a minimum, include all four major resources (i.e., memory, processor, disk, and network interface) when taking a server's baseline measurements—regardless of server function (e.g., file server, print server, application server, domain server).
Because all four of a server's major resources are interrelated, locating a bottleneck can be difficult. Resolving one problem can cause another. When possible, make one change at a time, then compare your results with your baseline to determine whether the change was helpful. If you make several changes before performing a comparison, you won't know precisely what works and what doesn't work. Always test your new configuration, then retest it to be sure changes haven't adversely affected your server. Additionally, always document your processes and the effects of your modifications.
Insufficient memory is a common cause of bottlenecks in NT Server. A memory deficiency can disguise itself as other problems, such as an overloaded CPU or slow disk I/O. The best first indicator of a memory bottleneck is a sustained high rate of hard page faults (e.g., more than five per second). Hard page faults occur when a program can't find the data it needs in physical memory and therefore must retrieve the data from disk. You can use Performance Monitor to determine whether your system is suffering from a RAM shortage. The following counters are valuable for viewing the status of a system's memory:
- Memory: Pages/sec—Shows the number of requested pages that aren't immediately available in RAM and that must be read from the disk or that had to be written to the disk to make room in RAM for other pages. If this number is high while your system is under a usual load, consider increasing your RAM. If Memory: Pages/sec is increasing but the Memory: Available Bytes counter is decreasing toward the minimum NT Server limit of 4MB, and the disks that contain the pagefile.sys files are busy (marked by an increase in %Disk Time, Disk Bytes/sec, and Average Disk Queue Length), you've identified a memory bottleneck. If Memory: Available Bytes isn't decreasing, you might not have a memory bottleneck. In this case, check for an application that's performing a large number of disk reads or writes (and make sure that the data isn't in cache). To do so, use Performance Monitor to monitor the Physical Disk and Cache objects. The Cache object counters can tell you whether a small cache is affecting system performance.
- Memory: Available Bytes—Shows the amount of physical memory available to programs. This figure is typically low because NT's Disk Cache Manager uses extra memory for caching, then returns the extra memory when requests for memory occur. However, if this value is consistently below 4MB on a server, excessive paging is occurring.
- Memory: Committed Bytes—Indicates the amount of virtual memory that the system has committed to either physical RAM for storage or to pagefile space. If the number of committed bytes is larger than the amount of physical memory, more RAM is probably necessary.
- Memory: Pool Nonpaged Bytes—Indicates the amount of RAM in the nonpaged pool system memory area, in which OS components acquire space to accomplish tasks. If the Memory: Pool Nonpaged Bytes value shows a steady increase but you don't see a corresponding increase in server activity, a running process might have a memory leak. A memory leak occurs when a bug prevents a program from freeing up memory that it no longer needs. Over time, memory leaks can cause a system crash because all available memory (i.e., physical memory and pagefile space) has been allocated.
- Paging File: %Usage—Shows the percentage of the maximum pagefile size that the system has used. If this value hits 80 percent or higher, consider increasing the pagefile size.
You can instruct NT Server to tune the memory that you have in your system. In the Control Panel Network applet, go to the Services tab and select Server. When you click Properties, a dialog box presents four optimization choices, as Figure 3 shows: Minimize Memory Used, Balance, Maximize Throughput for File Sharing, and Maximize Throughput for Network Applications. Another parameter that you can tune—on the Performance tab of the System Properties dialog box—is the virtual memory subsystem (aka the pagefile).
If you have a multiuser server environment, you'll be particularly interested in two of these memory-optimization strategies: Maximize Throughput for File Sharing and Maximize Throughput for Network Applications. When you select Maximize Throughput for File Sharing, NT Server allocates the maximum amount of memory for the file-system cache. (This process is called dynamic disk buffer allocation.) This option is especially useful if you're using an NT Server machine as a file server. Allocating all memory for file-system buffers generally enhances disk and network I/O performance. By providing more RAM for disk buffers, you increase the likelihood that NT Server will complete I/O requests in the faster RAM cache instead of in the slower file system on the physical disk.
When you select Maximize Throughput for Network Applications, NT Server allocates less memory for the file-system cache so that applications have access to more RAM. This option optimizes server memory for distributed applications that perform memory caching. You can tune applications (e.g., Microsoft SQL Server, Exchange Server) so that they use specific amounts of RAM for buffers for disk I/O and database cache.
However, if you allocate too much memory to each application in a multiapplication environment, excessive paging can turn into thrashing. Thrashing occurs when all active processes and file-system cache requests become so large that they overwhelm the system's memory resources. When thrashing occurs, requests for RAM create hard page faults at an alarming rate, and the OS devotes most of its time to moving data in and out of virtual memory (i.e., swapping pages) rather than executing programs. Thrashing quickly consumes system resources and typically increases response times. If an application you're working with stops responding but the disk drive LED keeps blinking, your computer is probably thrashing.
To ease a memory bottleneck, you can increase the size of the pagefile or spread the pagefile across multiple disks or controllers. An NT server can contain as many as 16 pagefiles at one time and can read and write to multiple pagefiles simultaneously. If disk space on your boot volume is limited, you can move the pagefile to another volume to achieve better performance. However, for the sake of recoverability, you might want to place a small pagefile on the boot volume and maintain a larger file on a different volume that offers more capacity. Alternatively, you might want to place the pagefile on a hard disk (or on multiple hard disks) that doesn't contain the NT system files or on a dedicated non-RAID FAT partition.
I also recommend that you schedule memory-intensive applications across several machines. Through registry editing, you can enable an NT server to use more than 256KB of Level 2 cache. Start regedit.exe, go to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management subkey, and double-click SecondLevelDataCache. Click the decimal base, and enter the amount of Level 2 cache that you have (e.g., 512 if you have 512KB). Then, click OK, close the registry editor, and reboot. I also recommend disabling or uninstalling unnecessary services, device drivers, and network protocols.
To determine whether an NT Server machine has a CPU bottleneck, remember to first ensure that the system doesn't have a memory bottleneck. CPU bottlenecks occur only when the processor is so busy that it can't respond to requests. Symptoms of this situation include high rates of processor activity, sustained long queues, and poor application response. CPU-bound applications and drivers and extreme interrupts (which badly designed disk or network-subsystem components create) are common causes of CPU bottlenecks.
You can use the following counters to view the status of your system's CPU utilization:
- Processor: % Processor Time—Measures the amount of time a processor spends executing nonidle threads. (If your system has multiple processors, you need to monitor the System: % Total Processor Time counter.) If a processor consistently runs at more than 80 percent capacity, the processor might be experiencing a bottleneck. To determine the cause of a processor's activity, you can monitor individual processes through Performance Monitor. However, a high Processor: % Processor Time doesn't always mean the system has a CPU bottleneck. If the CPU is servicing all the NT Server scheduler requests without building up the Server Work Queues or the Processor Queue Length, the CPU is servicing the processes as fast as it can handle them. A processor bottleneck occurs when the System: Processor Queue Length value is growing; Processor: % Processor Time is high; and the system's memory, network interface, and disks don't exhibit any bottlenecks. When a CPU bottleneck occurs, the processor can't handle the workload that NT requires. The CPU is running as fast as it can, but requests are queued and waiting for CPU resources.
- Processor: % Privileged Time—Measures the amount of time the CPU spends performing OS services.
- Processor: % User Time—Measures the amount of time the processor spends running application and subsystem code (e.g., word processor, spreadsheet). A healthy percentage for this value is 75 percent or less.
- Processor: Interrupts/sec—Measures the number of application and hardware device interrupts that the processor is servicing. The interrupt rate depends on the rate of disk I/O, the number of operations per second, and the number of network packets per second. Faster processors can tolerate higher interrupt rates. For most current CPUs, 1500 interrupts per second is typical.
- Process: % Processor Time—Measures the amount of a processor's time that a process is occupying. Helps you determine which process is using up most of a CPU's time.
- System: Processor Queue Length—Shows the number of tasks waiting for processor time. If you run numerous tasks, you'll occasionally see this counter go above 0. If this counter regularly shows a value of 2 or higher, your processor is definitely experiencing a bottleneck. Too many processes are waiting for the CPU. To determine what's causing the congestion, you need to use Performance Monitor to monitor the process object and further analyze the individual processes making requests on the processor.
One way to resolve processor bottlenecks is to upgrade to a faster CPU (if your system board supports it). If you have a multiuser system that's running multithreaded applications, you can obtain more processor power by adding CPUs. (If a process is multithreaded, adding a processor improves performance. If a process is single-threaded, a faster processor improves performance.) However, if you're running the single-processor NT kernel, you might need to update the kernel to the multiprocessor version. To do so, reinstall the OS or use the resource kit's uptomp.exe utility.
Another way to tune CPU performance is to use Task Manager to identify processes that are consuming the most CPU time, then adjust the priority of those processes. A process starts with a base priority level, and its threads can deviate two levels higher or lower than the base. If you have a busy CPU, you can boost a process's priority level to improve CPU performance for that process. To do so, press Ctrl+Alt+Del to access Task Manager, then go to the Processes tab. Right-click the process, choose Set Priority, and select a value of High, Normal, or Low, as Figure 4 shows. The priority change takes effect immediately, but this fix is temporary: After you reboot the system or stop and start the application, you'll lose the priority properties that you set. To ensure that an application always starts at a specified priority level, you can use the Start command from the command line or within a batch script. To review the Start command's options, enter
at the command prompt.
To determine whether your system is experiencing disk bottlenecks, first ensure that the problem isn't occurring because of insufficient memory. A disk bottleneck is easy to confuse with pagefile activity resulting from a memory shortage. To help you distinguish between disk activity related to Virtual Memory Manager's paging to disk and disk activity related to applications, keep pagefiles on separate, dedicated disks.
Before you use Performance Monitor to examine your disks, you must understand the difference between its two disk counters. LogicalDisk counters measure the performance of high-level items (e.g., stripe sets, volume sets). These counters are useful for determining which partition is causing the disk activity, possibly identifying the application or service that's generating the requests. PhysicalDisk counters show information about individual disks, regardless of how you're using the disks. LogicalDisk counters measure activity on a disk's logical partitions, whereas PhysicalDisk counters measure activity on the entire physical disk.
NT doesn't enable Performance Monitor disk counters by default; you must enable them manually. Enabling these counters will result in a 2 to 5 percent performance hit on your disk subsystem. To activate Performance Monitor disk counters on the local computer, type
at a command prompt. (If you're monitoring RAID, use the -ye switch.) Restart the computer.
To analyze disk-subsystem performance and capacity, monitor the Performance Monitor's disk-subsystem counters. The following counters are available under both LogicalDisk and PhysicalDisk:
- % Disk Time—Measures the amount of time the disk spends servicing read and write requests. If this value is consistently close to 100 percent, the system is using the disk heavily. If the disk is consistently busy and a large queue has developed, the disk might be experiencing a bottleneck. Under typical conditions, the value should be 50 percent or less.
- Avg. Disk Queue Length—Shows the average number of pending disk I/O requests. If this value is consistently higher than 2, the disk is experiencing congestion.
- Avg. Disk Bytes/Transfer—Measures throughput (i.e., the average number of bytes transferred to or from the disk during write or read operations). The larger the transfer size, the more efficiently the system is running.
- Disk Bytes/sec—Measures the rate at which the system transfers bytes to or from the disk during write or read operations. The higher the average, the more efficiently the system is running.
- Current Disk Queue Length—Shows how many disk requests are waiting for processing. During heavy disk access, queued requests are common; however, if you see requests consistently backed up, your disk isn't keeping up.
If you determine that your disk subsystem is experiencing a bottleneck, you can implement several solutions. You can add a faster disk controller, add more disk drives in a RAID environment (spreading the data across multiple physical disks improves performance, especially during reads), or add more memory (to increase file cache size). You also might try defragmenting the disk, changing to a different I/O bus architecture, placing multiple partitions on separate I/O buses (particularly if a disk has an I/O-intensive workload), or choosing a new disk with a low seek time (i.e., the time necessary to move the disk drive's heads from one data track to another). If your file system is FAT, remember that NTFS is best for volumes larger than 400MB.
You can also provide more disk spindles to the application. How you organize your data depends on your data-integrity requirements. Use striped volumes to process I/O requests concurrently across multiple disks, to facilitate fast reading and writing, and to improve storage capacity. When you use striped volumes, disk utilization per disk decreases and overall throughput increases because the system distributes work across the volumes.
Consider matching the file system's allocation unit size to the application block size to improve the efficiency of disk transfers. However, increasing the cluster size doesn't always improve disk performance. If the partition contains many small files, a smaller cluster size might be more efficient. You can change the cluster size in two ways. At the command line, enter
format <disk>:/FS:NTFS /A:<cluster size>
or use Disk Administrator. Select Tools, Format, and change the allocation unit size. NTFS supports a cluster size of 512 bytes, 1024 bytes, 2048 bytes, 4096 bytes, 8192 bytes, 16KB, 32KB, or 64KB. FAT supports a cluster size of 8192 bytes, 16KB, 32KB, 64KB, 128KB, or 256KB.
After you consider a system's memory, CPU, and disk metrics, your next step is to examine the network subsystem. Client machines and other systems must be able to connect quickly to the NT Server system's network I/O subsystem so that they provide acceptable response times to end users. To determine where network bottlenecks reside and how to fix them, you must understand what type of workload your client systems generate, which key network architecture components are in use, and what type of network protocol (e.g., Ethernet, NetBEUI) and physical network you're on. Performance Monitor collects data for each physical network adapter. To determine how busy your adapters are, use the following counters:
- Network Interface: Output Queue Length—Measures the length of an adapter's output packet queue. A value of 1 or 2 is acceptable. However, if this measurement is frequently at or higher than 3 or 4, your network I/O adapter can't keep up with the server's requests to move data onto the network.
- Network Interface: Bytes Total/sec—Measures all network traffic (number of bytes sent and received) that moves through a network adapter, including all overhead that the network protocol (e.g., TCP/IP, NetBEUI) and physical protocol (e.g., Ethernet) incur. If, for example, in a 10Base-T network, the value is close to 1Mbps and the output queue length continues to increase, you might have a network bottleneck.
- Network Interface: Bytes Sent/sec—Shows the number of bytes sent through a specific network adapter card.
- Server: Bytes Total/sec—Shows the number of bytes that the server has sent and received over the network through all of its network adapter cards.
- Server: Logon/sec—Shows the number of logon attempts per second for local authentication, over-the-network authentication, and service-account authentication. This counter is useful on domain controllers (DCs) to determine how much logon validation is occurring.
- Server: Logon Total—Shows the number of logon attempts for local authentication, over-the-network authentication, and service-account authentication since the computer was last started.
If you determine that the network subsystem is experiencing a bottleneck, you can implement numerous measures to alleviate the problem. You can bind your network adapter to only those protocols that are currently in use, upgrade your network adapters to the latest drivers, upgrade to better adapters, or add adapters to segment the network (so that you can isolate traffic to appropriate segments). Check overall network throughput, and improve physical-layer components (e.g., switches, hubs) to confirm that the constraint is in the network plumbing. You might also try distributing the processing workload to additional servers.
In a TCP/IP network, you can adjust the TCP window size for a potential improvement in performance. The TCP/IP receive window size shows the amount of receive data (in bytes) that the system can buffer at one time on a connection. In NT, the window size is fixed and defaults to 8760 bytes for Ethernet, but you can adjust the window size in the registry. You can either modify the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\ Parameters\TcpWindowSize subkey to globally change the setting on the computer or use the setsockopt() Windows Sockets call to change the setting on a per-socket basis. The optimal size depends on your network architecture. In TCP, the maximum achievable throughput equals window size divided by round-trip delay or network latency.
Finally, don't use Autosense mode in your network adapters. Set your NICs to the precise speed that you want. To change the setting, use the configuration program that came with your network adapter.
Understand Your Environment
Performance analysis requires logical thinking, testing, and patience. NT's primary monitoring and tuning tools can help you manage the performance of your company's systems.
The key to achieving your objective is understanding what you have, how your applications work, and how your users use your network. To resolve any performance problems and plan for future requirements, combine the knowledge you gain from these tools' output with an understanding of your applications and your environment.