\[Editors' Note: The performance-monitoring methods and diagnostics in this article rely on HTTP requests and on ping.exe and tracert.exe commands. Firewalls and numerous other IT-architecture security measures might not permit these protocols. You'll need to work with your security team to plan the location of your Web-monitoring modules.\]
Is your Web site running well? Is your site always available, and does it provide content at an acceptable performance level? The best way to answer these questions is to monitor your Web services' performance and availability from an end user's perspective. You can leverage Windows 2000, the Microsoft Windows 2000 Resource Kit, a scripting language (e.g., Perl), and Internet standards (e.g., HTTP, Internet Control Message Protocol—ICMP) to gain valuable insight into your Web servers' response times and availability. After you begin tracking your Web site's performance, you can use a combination of automated and manual processes to isolate performance bottlenecks and troubleshoot your Web site.
Customer Perspective: Site Monitoring Methodology
The strategy for determining your Web site's performance (i.e., how long selected Web pages take to download to your customers) and availability centers around the customers to whom you provide Web content or services. What do your customers experience when they download your home page? To determine the answer, you need to run the same application and follow the same network paths as your customers, then measure performance. You can deploy special agents on every desktop that you administer; this approach lets you directly measure customers' application performance. However, cost and manageability constraints make this solution unrealistic for most environments.
Another way to statistically sample your crucial Web servers' performance and availability is to place a few well-located Web-monitoring modules on key systems throughout your enterprise. These systems use the same network paths that your customers use to place HTTP requests to your Web sites. You must track the data path from the client, through the network, to the Web servers; tracking the entire path gives you insight into your customers' experience and helps you troubleshoot all facets of your infrastructure. Figure 1 illustrates this system-management approach.
Although this monitoring strategy isn't radical, it differs from most commercial system- and network-management packages, which use ICMP echoes (e.g., basic ping responses) to determine whether a server or network device is available. (For more information about commercial management packages, see the sidebar "Commercial Tools for System and Network Management.") Basic ping tracking is helpful when you want to monitor availability, but it works only while the server's OS is running, and it measures only a generic data packet's round-trip time over the network from a central location to the server, as Zone B in Figure 1 shows. This type of tracking can't tell you whether a server is performing its intended application services or how the network is affecting delivery of specific application services to your customers, which Zone A and Zone D in Figure 1 represent.
Conversely, you can use the system-specific data that you can collect with the Web-monitoring method to find out how much time your customers must wait when they download specific URLs. You can use Win2K (or Windows NT) event logs to track the URLs' availability over time and automatically alert you (e.g., send an email, page a systems administrator, send SNMP traps) when performance falls below an acceptable level. To automate diagnostic programs and alerts, you can integrate these monitoring tools into commercial system- and network-management tools that you might already have in place.
Implementing and Testing the Monitoring Modules
How can you put this Web-monitoring methodology into motion and make it work for you? To start, you need to determine which URLs you want to monitor, which customer locations you want to track (i.e., where you need to place your monitoring modules), which tracking method (e.g., event logs, database) you want to use, and who you want the system to alert if a problem arises.
The best way to describe the steps that you need to follow is to give you an example. For our example, we'll monitor insideweb1.mycompany.com, insideweb2.mycompany.com, www.mycompany.com, and www.partnerwebsite.com. (This combination of sites represents internal, external, and business-partner sites.) We'll monitor these URLs from two internal network locations: the network switch to which MyCompany's CEO connects and a switch to which the highest percentage of our customers connect. If our monitoring modules detect a problem, the modules will send an event to the Win2K event logs, write the diagnostic data to a flat file, and send an email alert to our systems administrator (aka SuperSA). Our example also includes an option to use an ODBC call to store the information in a Microsoft SQL Server 7.0 database, which we can use to track Web site performance over time and which provides a more robust mechanism for further analyses.
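The example's choices (which URLs, what baseline, whom to alert) amount to a small configuration set. The following sketch collects them into one structure; the article's scripts are written in Perl, so this Python rendering, and all its names, are illustrative rather than taken from the published listings.

```python
# Hypothetical configuration mirroring the example's monitoring choices.
# Key names and the log-file name are assumptions, not from the listings.
MONITOR_CONFIG = {
    "urls": [
        "http://insideweb1.mycompany.com",
        "http://insideweb2.mycompany.com",
        "http://www.mycompany.com",
        "http://www.partnerwebsite.com",
    ],
    "baseline_seconds": 5,              # maximum acceptable retrieval time
    "output_file": "webmonitor.log",    # flat file for diagnostic data
    "alert_email": "supersa@mycompany.com",
}
```

Keeping these values in one place mirrors the listings' customization sections: changing the baseline or adding a URL later means editing a single spot.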
To set up a monitoring module on your Win2K system, you need to configure TCP/IP, the resource kit (the monitoring module uses the resource kit's Timethis utility to time requested Web transactions), and your chosen scripting language. For our example, we use ActivePerl 5.6, but you can use any scripting language that lets you make an HTTP request from the command line and that lets you call Win2K command-line tools. (You can obtain a freeware version of ActivePerl from http://www.activestate.com. For more information about Perl, see Bob Wells and Toby Everett, "Modify the Registry with Perl," January 1998.)
Next, install and customize webmonitoringmodulescript.pl and its associated modules: poll.pl, event.pl, and optmod.pl (Listings 1 through 4, respectively, show these scripts). Download these listings and place them in one directory. You don't need a dedicated system to run these scripts. In our tests on a 450MHz dual Pentium III CPU-based system, monitoring 20 servers at 10-minute intervals generated a CPU load of less than 5 percent every 10 minutes. If you can run an action from the command line, you can start the action from within the Web-monitoring module scripts.
Each script includes explanatory comments and customization sections that you can edit to complete numerous actions (e.g., Listing 4, Customization Section A lets you send an email alert to more than one person) and to suit your environment. (We customized the scripts for our example.) In the script that Listing 1 shows, you'll need to review and update the following variables: baseline, which denotes the maximum acceptable time in seconds for a URL request to complete; outputfile, which defines the file in which the module keeps results; and array, which defines the URLs that you want to monitor. Set the baseline value to match your acceptable performance threshold. (In our script, we set the baseline to 5 seconds. One of the best ways to determine the threshold is to bring up some Web pages and gauge your reaction. How long is too long for you to wait? Set your baseline accordingly. If you want to adjust the baseline later, you'll need to change only one number.) If you use the Win2K Network Load Balancing (NLB) service in a cluster environment, you need to modify certain aspects of the script (see the sidebar "Monitoring Modules and Windows 2000 Network Load Balancing Service" for more information). In the script that Listing 4 shows, you'll need to edit Customization Section A to tell the module whom to alert in case of trouble.
The module emulates a customer browsing a Web site: The script makes an HTTP request to the target URLs and grabs the Web pages one at a time. And thanks to Internet standards, you don't need a Web browser to read the Web pages. The example script that Listing 2 shows uses a Perl for Win32 API call: Request(Get, $url). After the module fetches a Web page, the script evaluates whether the Web server is responding within your performance baseline and whether the URL is available (see Listing 1). The module then logs the Web page's performance. This process can help you ensure that the Web server is responding as planned. If the server returns an error, doesn't respond at the proper performance level, or doesn't respond at all, the monitoring system sends an event to the event logs, writes the data to a flat file, and sends an email alert to our SuperSA's pager.
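The fetch-and-evaluate loop described above is straightforward to sketch. The article's module does this in Perl with the resource kit's Timethis utility; the following is a hedged Python equivalent of the same idea, with function names of my own choosing, not the listings'.

```python
import time
import urllib.request

BASELINE_SECONDS = 5  # maximum acceptable retrieval time, as in Listing 1

def classify(elapsed, ok, baseline=BASELINE_SECONDS):
    """Map a fetch result to a status, mirroring the module's logic:
    unavailable beats slow beats OK."""
    if not ok:
        return "UNAVAILABLE"
    if elapsed > baseline:
        return "SLOW"
    return "OK"

def timed_fetch(url, timeout=30):
    """Fetch a URL like a browsing customer would; return (seconds, success)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # pull the whole page, not just the headers
            ok = 200 <= resp.status < 300
    except Exception:
        ok = False  # DNS failure, timeout, connection refused, HTTP error
    return time.monotonic() - start, ok
```

A run over the monitored URLs would then call `timed_fetch` on each one and hand the result to `classify`; anything other than "OK" triggers the event-log write and the alert.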
After you install the scripts on your Web-monitoring module system, you need to verify that the scripts work. We suggest that when you run your scripts the first time, you use the perl.exe –d command-line option, which lets you step through the script line by line. To test the scripts' availability functions, enter a bogus address into your list of URLs (i.e., the script's array variable). This address will trigger the system to notify you of at least one unavailable URL. To test the scripts' performance functions, you can lower the performance baseline to zero. This baseline will trigger the system to notify you of unacceptable performance for all pages.
After you've confirmed that the monitoring modules are working, you can schedule the monitoring system's operation from Win2K. To schedule the scripts from the Win2K Scheduler, go to Start, Settings, Control Panel, Scheduled Tasks, Add Scheduled Tasks. To decide how often to run the module, ask yourself how often your Web site operates. Typically, you can schedule the module to run every 10 minutes. For business-critical Web sites, you might want to run the module every 5 minutes. After you collect performance and availability data for at least 1 hour, you can move forward to system performance and trend analysis.
System Performance and Trend Analysis
What about situations in which you need to analyze a trend in your performance and availability data (e.g., the CEO complains that your Web site was down last week, and you want to verify that it was)? To accomplish this task, you must store the data. Listing 1 includes code to write data to a SQL Server 7.0 database. (You can use any ODBC-compliant database, such as Access, Lotus Notes, or Oracle.) In the database, we store the URL of the monitored Web site, the site's performance time, and a page-retrieval timestamp. We can then mine this data to create reports or correlate the data and generate graphs in a program such as Microsoft Excel or Seagate's Crystal Reports. Figure 2 graphs correlated data to show overall performance. From this graph, we can determine when the Web server response time began to lag. Figure 3 shows the Web services' availability—an important metric from a management point of view. You can use graphs such as these to keep your IT team, company president, or the general public up-to-date on the success of your Web implementation.
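The three stored fields (URL, performance time, retrieval timestamp) map to a simple table. The article writes to SQL Server 7.0 over ODBC from Perl; this sketch uses Python's built-in sqlite3 as a stand-in so it runs anywhere, and the table and column names are assumptions, not the article's schema.

```python
import sqlite3
from datetime import datetime, timezone

# sqlite3 stands in for the article's ODBC connection to SQL Server 7.0.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE webperf (
           url       TEXT NOT NULL,   -- monitored Web site
           seconds   REAL NOT NULL,   -- performance time
           retrieved TEXT NOT NULL    -- page-retrieval timestamp
       )"""
)

def record(conn, url, seconds):
    """Store one monitoring result for later trend analysis."""
    conn.execute(
        "INSERT INTO webperf (url, seconds, retrieved) VALUES (?, ?, ?)",
        (url, seconds, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

record(conn, "http://www.mycompany.com", 2.3)
rows = conn.execute("SELECT url, seconds FROM webperf").fetchall()
```

Once the rows accumulate, a `GROUP BY` over the timestamp column gives exactly the per-hour or per-day aggregates that Figures 2 and 3 graph.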
Leveraging Win2K Troubleshooting Tools
When your Web performance is poor, you immediately need the appropriate data to track down the problem and determine a solution. Win2K and NT 4.0 include two troubleshooting tools: Ping and Tracert.
If the server doesn't respond adequately to a Web request, you can use Ping to determine general network latency. Ping provides a generic round-trip packet response time to the problematic Web server; this information can help you determine whether the server or network is completely down or whether overall network latency is slowing down Web-page delivery. If a network problem exists, you can use Tracert to determine which network device is unavailable or running slowly. Tracert checks the generic packet-response time from the perspective of each network route (i.e., hop). Figure 4 shows an example Tracert output. You can use this information to determine the slowest path between your monitoring module (which represents your customers) and your Web site. You can also use this information to locate a failed network link.
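Finding the slowest hop in captured Tracert output is easy to automate. The sketch below is a Python illustration (the article's modules are Perl), and the sample output text is made up for demonstration, not taken from Figure 4.

```python
import re

# Matches a tracert hop line: hop number, three probe times, then address.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*?)\s+(\S+)\s*$")

def slowest_hop(tracert_text):
    """Return (hop_number, average_ms) for the slowest responding hop."""
    worst = (None, -1.0)
    for line in tracert_text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header lines, blank lines
        hop = int(m.group(1))
        # "<1 ms" is treated as 1 ms; "*" (timed-out probes) yields no match.
        times = re.findall(r"(?:<\s*)?(\d+)\s*ms", m.group(2))
        if not times:
            continue
        avg = sum(int(t) for t in times) / len(times)
        if avg > worst[1]:
            worst = (hop, avg)
    return worst

sample = """
  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    12 ms    11 ms    13 ms  10.0.0.1
  3    85 ms    90 ms    88 ms  172.16.0.1
"""
```

Against this sample, hop 3 stands out as the slow link, which is the same judgment you would make reading the output by eye.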
In Listing 1, directly after the comment #Generate an error if retrieval time exceeds the baseline, we call another Perl module named optmod.pl, which Listing 3 illustrates. Optmod.pl runs ping.exe and tracert.exe, writes the results to a log file, and sends the results to the SuperSA. Optmod.pl doesn't include Pathping, which is a new tool in Win2K. Pathping gives you finer-grained control over the number of hops that the module uses to locate your Web site, the time between attempts, the number of tries per network hop, and the available functionality of network services such as layer-2 priority and Resource Reservation Protocol (RSVP) support. To learn about this new network-troubleshooting tool, from the command line type pathping /?
If you're running the Win2K Performance Monitor on your Web server, you can review the Performance Monitor's data from the time that your monitoring system experienced Web server problems; you can thus isolate server-side problems. (For more information about Win2K Performance Monitor, see "Windows 2000 Performance Tools," April 2000.)
Proactive System Management
What do you do if the system shows that your Web pages' performance is lacking? Don't wait for customers to complain or go to a competing Web site: Be proactive. The Web-monitoring module is flexible and lets you customize the script for various options. You can use the module script to track the problem and to alert you to other performance or availability failings (i.e., write an event to the Win2K event logs). You can then integrate the monitoring data into a commercial management program (for information about commercial programs' capabilities, see the sidebar "Commercial Tools for System and Network Management"), or you can use the Win2K Event Viewer to review the event logs. You might want to run a command-line program that can call a series of commands. For our example, from within the module we make an ActivePerl system call that runs optmod.pl, which in turn calls a freeware program called Blat (http://www.interlog.com/~tcharron/blat.html), which sends an email alert to our SuperSA's pager and home email account (see Listing 4).
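Shelling out to Blat amounts to assembling a command line. The sketch below (Python rather than the article's Perl) builds one; the `-t`, `-s`, and `-server` flags match common Blat usage, but treat them as assumptions to check against the documentation for the Blat version you install.

```python
# Build the Blat command line the module would shell out to. The flag
# names are assumptions based on common Blat usage (-t recipient,
# -s subject, -server SMTP host); verify against your Blat version.
def blat_command(body_file, recipient, subject, smtp_server):
    return [
        "blat", body_file,       # file containing the alert text
        "-t", recipient,         # whom to alert (SuperSA's pager address)
        "-s", subject,
        "-server", smtp_server,
    ]

cmd = blat_command("alert.txt", "supersa@mycompany.com",
                   "Web monitor alert", "smtp.mycompany.com")
# Inside the monitoring module, this list would be handed to the OS,
# e.g. subprocess.run(cmd) in Python.
```

Building the command as a list of arguments, rather than one string, avoids quoting problems when a subject line contains spaces.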
Your customers value your Web services. The Web-monitoring module methodology lets you proactively monitor and track the performance and availability of your key Web sites. Monitoring your Web services is only one portion of an overall system and network management solution, but it is surprisingly easy to implement and provides valuable insight into the delivery of critical Web services. The combination of customer-perspective data, network-performance diagnostic information, and server data can give you a formidable arsenal to track, troubleshoot, and resolve Web-service performance weaknesses before they become overwhelming problems. Be creative with the example scripts—you can easily customize these tools for any enterprise—and keep ahead of your competition.