I've mentioned in previous Microsoft Proxy Server articles that a key measurement of whether you're supporting Proxy Server successfully is performance. This month, I end this series by discussing specifically what you can do to improve the performance and availability of your proxy server. I show you how to use Microsoft Performance Monitor to quantify your observations and look at third-party products to do your day-to-day monitoring.
Proxy Server and Performance Monitor
Not long ago, I supported a pair of proxy servers whose performance suffered every time the user count on one server exceeded 700. These machines were the latest thing when my group bought them: 333MHz dual-Pentium II processors and a lot of memory and disk storage. You could almost predict when a proxy server was about to crash: When it hit the 700-user threshold, performance for all users quickly degraded to an unacceptable level. In one case, a proxy server peaked at more than 1000 users before we found it belly-up.
My group grappled with how to tackle the problem. We discussed whether we had hit the theoretical limits for Proxy Server, but conversations with Proxy Server technical support personnel assured us that that wasn't the case. They said that one proxy server should support a four-digit user count, though they remained tight-lipped about just what the actual ceilings were. We thought that we might need to upgrade the server hardware, so we talked to Microsoft about where to spend the upgrade money and about the relative return on money spent upgrading. From those discussions, we determined that the money was best spent in the areas of overall disk performance, network speed, processor speed, and memory, in that order.
In the end, we chose to purchase new servers and not to upgrade for two reasons. First, our user base was growing at an astronomical rate. Second, we could have spent $4000 or $5000 on each server and still experienced similar user-count ceilings. Rather than spend the money on new proxy servers that would be peers to the existing proxy servers, however, we chose to chain the proxy servers, and we installed our new, blazing 733MHz Pentium III processor proxy servers as downstream servers. (Figure 1 shows a diagram of this chain.)
The reasoning for our decision to place faster proxy servers behind slower proxy servers is rooted in the user experience. Users judge the quality of their browsing experience by how fast content appears on their screens. Users are far more tolerant of a Web site that appears to be down than of an error message coming back from a proxy server or, even worse, of a proxy server that returns nothing at all. (For information about capacity planning, see the sidebar "IIS Capacity Planning," page 8.)
We'll eventually replace these proxy servers. By the time we replace them, processor speeds will be more than 1GHz, so we'll do the switcheroo again and for the same reasons. By the way, we're now successfully supporting those four-digit user levels that Microsoft promised across just two servers. We often monitor our user counts to see how well utilized those proxy servers are. We can also monitor user counts to determine when the peak loads for a proxy server occur. For us, peak times always seem to be at the start of the workday and at lunchtime. Obviously, you'll want to avoid those peak periods when planning maintenance for your proxy server.
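If you log the Current Users counter over a few days, a short script can rank the busiest hours for you. The following Python sketch assumes you have already exported the log to (hour of day, user count) pairs; the export step, the function name, and the sample figures are all hypothetical.

```python
from collections import defaultdict


def peak_hours(samples, top=2):
    """Return the `top` hours of day with the highest average user count.

    `samples` is an iterable of (hour_of_day, current_users) pairs.
    Ties are broken in favor of the earlier hour.
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of users, sample count]
    for hour, users in samples:
        totals[hour][0] += users
        totals[hour][1] += 1
    averages = {hour: total / count for hour, (total, count) in totals.items()}
    return sorted(averages, key=lambda hour: (-averages[hour], hour))[:top]


# Hypothetical samples: start of the workday and lunchtime are busiest.
samples = [(8, 650), (8, 700), (10, 400), (12, 620), (12, 600), (15, 380), (2, 15)]
print(peak_hours(samples))  # [8, 12]
```

The hours that fall at the bottom of the same ranking are the natural candidates for a maintenance window.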
When you install Proxy Server, you'll get several new objects and counters for monitoring your proxy server. Open Performance Monitor on any proxy server, and let's look at some counters.
Web Proxy Server Service: Current Users. This counter lets you measure how many users are currently connected to your proxy server's Web Proxy Service, the service to which nearly every Web browser connects to surf Web pages. You can get the same value from the Microsoft Management Console's (MMC's) Shared Properties page. Note that this value doesn't always mean that users are actively using that many browsers: Users might have the browser in the background on their desktop. This counter is popular, especially if you use a chart or log to see Web surfing trends in your environment.
Web Proxy Server Service: Maximum Users. This counter shows the peak value that the Current Users counter has reached. This value is important not only for performance but also for licensing purposes. For statistical purposes, when I use Performance Monitor to track the number of users on my proxy server, I always graph this value alongside the Current Users value to give a better picture.
Web Proxy Server Service: Sites Denied and Web Proxy Server Service: Sites Granted. These two counters are important if you use domain filtering at the proxy server level. (A third-party filtering plugin might not update these counters.) The counters track the number of sites that the domain filter denies and grants as clients request them. You can look at these two values simultaneously to ensure that your domain filtering policy is working. If one counter is incrementing and the other counter is zero or not incrementing over a period of time, you might have a problem in your policy.
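You can automate the granted-versus-denied comparison once you have logged both counters over the same period. This Python sketch is only an illustration: the function name and the sample readings are hypothetical, and it assumes each list holds successive readings of one counter, oldest first.

```python
def filter_policy_suspect(granted_samples, denied_samples):
    """Return True when exactly one of the two counters grew over the window.

    Each argument is a list of successive counter readings, oldest first.
    If both counters grew, or neither did (for example, no traffic at all),
    the function returns False.
    """
    granted_grew = granted_samples[-1] > granted_samples[0]
    denied_grew = denied_samples[-1] > denied_samples[0]
    return granted_grew != denied_grew


# Hypothetical readings taken a few minutes apart.
print(filter_policy_suspect([100, 180, 260], [5, 5, 5]))   # True
print(filter_policy_suspect([100, 180, 260], [5, 9, 14]))  # False
```

A True result is only a prompt to review the filter list. On a quiet server, or with a very short window, a flat counter can be perfectly legitimate.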
Web Proxy Server Service: Cache Hit Ratio (%). This counter is arguably the most important for monitoring the effectiveness and demonstrating the value of a proxy server. The cache-hit ratio represents the percentage of client requests that the proxy server's local cache satisfied. The higher the number, the better overall performance the user sees. In addition, the more requests the local cache satisfies, the less bandwidth you need. After a proxy server first starts, this ratio takes several hours (possibly several days) to stabilize. For the average set of users and the average proxy server, any percentage greater than 30 is a good number. Often, you can achieve percentages in the 40s. You can get minor improvements through cache management, including increasing the cache size or modifying the caching policy. For information about Proxy Server caching, see "Proxy Server Caching," May 2000. That article also discusses several other worthwhile Performance Monitor counters.
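The arithmetic behind the ratio is simple, and working it through once helps when you explain the number to management. In this Python sketch, the function names and the request figures are hypothetical; the 30 percent mark is the rule of thumb given above, not a value that Proxy Server enforces.

```python
def cache_hit_ratio(cache_hits, total_requests):
    """Percentage of client requests that the local cache satisfied."""
    if total_requests == 0:
        return 0.0  # no traffic yet, so no meaningful ratio
    return 100.0 * cache_hits / total_requests


def is_good_ratio(ratio_percent, threshold=30.0):
    """Apply the rule of thumb: anything greater than 30 percent is good."""
    return ratio_percent > threshold


# Hypothetical day: 10,000 requests, 3,500 of them answered from cache.
ratio = cache_hit_ratio(3500, 10000)
print(ratio)                 # 35.0
print(is_good_ratio(ratio))  # True
```

In this example, 3,500 requests never left the building, which is the bandwidth saving that the ratio stands for.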
Web Proxy Server Service: Requests/Sec. This counter value represents the number of requests per second the proxy server is evaluating on behalf of users. This counter demonstrates work being done on the proxy server in real time, and values can swing widely based on the types of content that users request.
Web Proxy Server Service: Current Average Milliseconds/Request. This counter shows the average amount of time necessary to complete a client's request. This value varies from proxy server to proxy server. User loads, machine hardware, and available bandwidth all play a part in determining it. You'll simply have to get a feel for what is average for your installation and take action if the value rises sharply and stays high. Note that momentary spikes in this value are common. When users enter an erroneous URL or a URL that no longer exists, the DNS request or the URL request times out and artificially inflates this value. The counter averages this longer time period with the successful requests, creating a spike in the results.
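To see why one bad URL can produce a visible spike, work through the average by hand. The numbers in this Python sketch are invented for illustration: nine requests that each complete in 200 milliseconds and one request that waits 30 seconds before timing out. The actual timeout on your network will differ.

```python
def average_ms(request_times_ms):
    """Plain arithmetic mean of per-request completion times in milliseconds."""
    return sum(request_times_ms) / len(request_times_ms)


normal_requests = [200] * 9                  # nine successful requests
with_timeout = normal_requests + [30000]     # plus one hypothetical 30-second timeout

print(average_ms(normal_requests))  # 200.0
print(average_ms(with_timeout))     # 3180.0
```

A single timeout out of ten requests multiplies the average almost sixteenfold, which is why a momentary spike by itself isn't cause for alarm.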
Windows NT 4.0/IIS Performance-Related Counters
If you suspect a performance-related problem with your proxy server, you can do more than look at only the Proxy Server-specific Performance Monitor counters. A host of other counters exist to help you look for bottlenecks. Many publications describe in detail the process for running down general bottlenecks. You can also run a search on the Windows 2000 Magazine Web site (http://www.win2000mag.com/) and find articles such as John Saville, "Troubleshooting NT Performance Monitoring," April 1998.
Memory: Pages/sec. Any value on this counter up to 20 is acceptable. Twenty to 60 isn't good, and anything over 60 means trouble.
Processor: % Processor time. Microsoft says the maximum acceptable value is 75 percent in most cases. A consistent reading below that ceiling is evidence that the machine is properly scaled and matched to its intended function. A processor running consistently near 100 percent means trouble.
LogicalDisk: Queue Length and PhysicalDisk: Queue Length. Both counters should show values less than 2, on average. If you have results greater than 2, suspect the disk or the controller card: The disk or card might have a problem or require an upgrade. Moving from 7200rpm to 10,000rpm disks makes a big difference in proxy server return times.
Process: % Processor Time (Instances: inetinfo, wspsrv) and Process: Virtual Bytes (Instances: inetinfo, wspsrv). I like to monitor these two counters for both instances to understand what resources the two processes are consuming. If a proxy server is about to fail, Performance Monitor's charting function will often show you the trend here first.
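The rules of thumb for the general counters above are easy to encode if you want to check logged averages in bulk. This Python sketch uses only the thresholds quoted in this section (20 and 60 for Pages/sec, 75 percent for processor time, and 2 for disk queue length). The function names and the sample readings are hypothetical, and the values are assumed to be averages you have already pulled out of a Performance Monitor log.

```python
def classify_pages_per_sec(pages_per_sec):
    """Memory: Pages/sec bands: up to 20, 20 to 60, and over 60."""
    if pages_per_sec <= 20:
        return "acceptable"
    if pages_per_sec <= 60:
        return "not good"
    return "trouble"


def check_counters(pages_per_sec, processor_pct, disk_queue):
    """Return a list of warnings for one set of averaged readings."""
    warnings = []
    band = classify_pages_per_sec(pages_per_sec)
    if band != "acceptable":
        warnings.append(f"Memory: Pages/sec is {band} ({pages_per_sec})")
    if processor_pct > 75:
        warnings.append(f"Processor time above 75 percent ({processor_pct})")
    if disk_queue > 2:
        warnings.append(f"Disk queue length above 2 ({disk_queue})")
    return warnings


print(check_counters(10, 50, 1.0))  # []
for warning in check_counters(70, 95, 3.5):
    print(warning)
```

Treat the output as a list of places to look, not as a diagnosis. One busy sample doesn't make a bottleneck.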
You'll develop a similar list of counters with which to monitor performance. Don't forget that you need to get a baseline from your proxy server for later comparison. If you change hardware, OS, or proxy server parameters, don't forget to recalculate your baseline. The slightest change can throw your baseline completely off. Eventually, you'll get to know what the thresholds are for your proxy server environment and be able to plan accordingly.
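One simple way to put a baseline to work is to express current readings as a percentage change from it. The following Python sketch assumes you have saved a set of baseline samples for a counter and have a fresh set for comparison; the function name and the figures are hypothetical, and how much deviation deserves attention is something only your own environment can tell you.

```python
from statistics import mean


def deviation_from_baseline(baseline_samples, current_samples):
    """Percent change of the current mean relative to the baseline mean."""
    baseline_mean = mean(baseline_samples)
    return 100.0 * (mean(current_samples) - baseline_mean) / baseline_mean


# Hypothetical Current Average Milliseconds/Request samples.
baseline = [200, 220, 180]   # mean 200
current = [300, 310, 290]    # mean 300
print(deviation_from_baseline(baseline, current))  # 50.0
```

After any hardware, OS, or Proxy Server change, replace the baseline samples before you compare again.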
Setting administrative alerts to point to serious Proxy Server events that demand immediate attention isn't an easy thing to do. In large shops supporting hundreds or thousands of proxy server users, you'll want to monitor multiple proxy servers and be alerted when specific conditions exist.
Performance Monitor works well for watching specific counters for specific conditions or for values that exceed certain thresholds, but it isn't meant to be an enterprisewide monitoring tool. To obtain this comprehensive functionality, you must look to a third party. The crucial feature of a monitoring product is its ability to monitor down to the service level. That means that you're looking not just for server failure, but also for service failure. Any product that provides this feature might also be able to provide other services, such as failure notifications by email or pager. The product might also attempt to restart the service automatically when a certain threshold is crossed. In many products, a number of options exist for reacting to a service-level failure.
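To make the idea of service-level monitoring concrete, here is a rough Python sketch of the decision a monitoring product makes: try a restart first, then notify someone if restarts don't help. It doesn't reflect how any particular product works. The service labels, the restart limit, and the function name are all hypothetical, and the sketch only plans actions; it doesn't query or restart real services.

```python
def plan_actions(service_status, restart_attempts, max_restarts=2):
    """Decide what to do about each stopped service.

    service_status:   dict of service label -> True if the service is running
    restart_attempts: dict of service label -> restarts already attempted
    Returns a list of ("restart" | "notify", label) tuples, sorted by label.
    """
    actions = []
    for label, running in sorted(service_status.items()):
        if running:
            continue  # healthy service, nothing to do
        if restart_attempts.get(label, 0) < max_restarts:
            actions.append(("restart", label))
        else:
            actions.append(("notify", label))  # for example, email or pager
    return actions


# Hypothetical labels: the server is up, but one service has stopped.
status = {"web-proxy": False, "winsock-proxy": True}
print(plan_actions(status, {}))                # [('restart', 'web-proxy')]
print(plan_actions(status, {"web-proxy": 2}))  # [('notify', 'web-proxy')]
```

Note that the server itself never fails in this example, which is exactly the case that a server-level monitor would miss.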
I've used NetIQ AppManager (http://www.netiq.com/) in the past to monitor for service failures. Although AppManager is a pricey enterprise solution, it compensates for all of Performance Monitor's shortcomings and can monitor a slew of different servers and platforms. Other products that provide this functionality include Heroix's RoboMon (http://www.heroix.com/product_info.htm) and Argent Software's Guardian (http://www.argentsoftware.com/prod/product/guardian.htm).
The Future of Proxy Server
Over the past 7 months, I've explained how you can configure and optimize Proxy Server in your Web environment. As Proxy Server evolves into its next generation, I'll revisit this topic from time to time. Meanwhile, be sure to check out the beta version of Microsoft's newest Internet product, Internet Security and Acceleration Server 2000 (http://www.microsoft.com/isaserver/default.asp). Internet Security and Acceleration Server 2000 has many roots in Proxy Server 2.0, but it requires that you run Win2K Server, Win2K Advanced Server, or Win2K Datacenter Server.