For all the effort you put into your Web server and Web site content, the ultimate payoff is hidden in the growing files that document the visitors to your Web site. Log files, when you break them down and analyze them, tell many tales. Most of us can tell where our users come from and what they come to our sites to see: That's the relatively simple stuff. Depending on your logging method and depth, you can discover which browser your users are using.
This month, I describe how you can look at your logs for additional information. I introduce you to different logging methods and some commercial analysis packages. Finally, I show you a simple command that you can use to retrieve additional information.
IIS 5.0 and IIS 4.0 each have four ways to log visits to your Web site. You can choose from among these formats:
- World Wide Web Consortium (W3C) Extended Log File
- IIS Log File
- National Center for Supercomputing Applications (NCSA) Common Log File
- ODBC Logging
Each format provides various sets of information; thus, some formats are more verbose than others.
Although you can determine where the logs will reside, IIS creates a new subdirectory (usually called W3SVCx, where x represents the sequentially created site) beneath your chosen location for each Web site you create. You can house the logs for each Web site you host on your IIS server in a different directory, which lets you maintain a separate set of logs for each site. You can use the Internet Service Manager (ISM) to control the logging options for each site. Open the Microsoft Management Console (MMC) Internet Information Services snap-in, and expand your Web server in the left pane. Right-click the Web site for which you want to change the logging, and select Properties. On the Default Web Site Properties dialog box, which Figure 1, page 8, shows, note the Active log format list box at the bottom of the dialog box. This box tells you which logging format the Web site is using. You can change the logging format here by clicking the drop-down list box and selecting a new format.
Click Properties. On the General Properties tab, which Figure 2, page 8, shows, you can set how often the system closes the current log and creates a new one, which keeps you from accumulating huge log files that are tedious to process. You can also set the location of the log files. By default, the logs reside beneath the \system32 directory (in \system32\LogFiles), which isn't the best choice. Windows 2000 and Windows NT don't behave well with full system disks, so I recommend moving these logs to a drive that doesn't contain the OS. To move the logs, click Browse and select a new location on the Local Computer. After you've selected the new location, click OK three times to close the three dialog boxes. You also need to stop and restart your Web server before the changes take effect. (Note that stopping and starting a Web site causes the logging function to insert additional headers into your log files, which can create problems for automated Web log analyzers.)
Now you know how to find the logs and choose which logging format you want. Let's examine the different formats and their respective advantages.
W3C Extended Log File Format
The W3C format is the most customizable of the four formats. Because this format lets you choose which fields go into your logs, you don't have to waste disk space on fields you'll never use. Assuming you've selected the W3C format in the Default Web Site Properties, click Properties to open the Extended Logging Properties dialog box. The Extended Properties tab presents a long list of field choices, as Figure 3 shows. (This tab looks a little different if you're using IIS 4.0, but the field choices are the same.) The names in parentheses tell you the field name that appears in the log file's header. Selecting or clearing these check boxes changes which fields you will log. If you make changes, save them, then stop and restart the Web server to make those changes effective.
Figure 4 shows a few sample entries in the IIS log made using the W3C Extended Log File format. A number sign (#) precedes each line of the log file header. Notice that the fields that Figure 3 shows appear in the header at the beginning of the log. (The format always writes a dash in the log when there is no value to log, as in the cs-username field.) The system successfully retrieved two files: iisstart.asp and pagerror.gif. The system also logged the browser's user agent string, a configurable value that identifies the type of browser the client is running.
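To make the header-and-fields layout concrete, here's a minimal Python sketch that reads a W3C-format log and tallies the user agent strings. The sample lines are hypothetical (they aren't copied from Figure 4), and the function name parse_w3c_log is my own. The sketch assumes the fields are separated by spaces and that the #Fields line names every column.

```python
from collections import Counter

def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields header."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines start with '#'; only #Fields affects parsing.
            # A restart mid-file writes a fresh header, so re-read it each time.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated lines
        # A lone dash means the field had no value to log.
        yield {k: (None if v == "-" else v) for k, v in zip(fields, values)}

# Hypothetical sample data for illustration only.
SAMPLE = """\
#Software: Microsoft Internet Information Services 5.0
#Version: 1.0
#Date: 2000-03-12 18:04:11
#Fields: time c-ip cs-username cs-method cs-uri-stem sc-status cs(User-Agent)
18:04:11 10.0.0.5 - GET /iisstart.asp 200 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
18:04:12 10.0.0.5 - GET /pagerror.gif 200 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
""".splitlines()

entries = list(parse_w3c_log(SAMPLE))
agents = Counter(e["cs(User-Agent)"] for e in entries)
for agent, hits in agents.most_common():
    print(hits, agent)
```

Because the parser rereads #Fields whenever it sees one, it tolerates the extra headers that a site restart inserts in the middle of a log file.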
The W3C format usually records time in Greenwich Mean Time (GMT) instead of local time, although in IIS 5.0 you can change this setting on the Properties dialog box. Keep this difference in mind when you decide how frequently your logs roll over. For example, in the Central time zone, the logs flip at midnight GMT, which is 6:00 p.m. or 7:00 p.m. local time, depending on whether daylight saving time is in effect.
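The arithmetic behind that rollover time is easy to check. The following Python sketch converts a GMT log timestamp to Central time. It uses fixed offsets of 6 and 5 hours behind GMT rather than a full time zone database, and the helper name to_local and the sample dates are my own assumptions.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed offsets for the US Central zone (assumed for illustration).
CST = timezone(timedelta(hours=-6), "CST")  # standard time
CDT = timezone(timedelta(hours=-5), "CDT")  # daylight saving time

def to_local(date_str, time_str, local_zone):
    """Convert a W3C log date and time (recorded in GMT) to a local zone."""
    stamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return stamp.replace(tzinfo=UTC).astimezone(local_zone)

# Midnight GMT is the moment a daily log rolls over.
winter = to_local("2000-01-15", "00:00:00", CST)
summer = to_local("2000-07-15", "00:00:00", CDT)
print(winter.isoformat())
print(summer.isoformat())
```

Note that the local date is the previous calendar day in both cases, which matters when you match a log file name to the traffic you expect to find in it.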
Another advantage to using the W3C format is that if you suspect that an out-of-process application is a candidate for throttling (i.e., forcibly reducing the amount of time a process can occupy the processor), you can set up process accounting in Extended Properties, as Figure 5 shows. This additional logged information can help you determine whether you need to employ process throttling on the Web site.
Microsoft IIS Log Format
The Microsoft format is the default for newly installed IIS servers. This format is neither as customizable as the W3C format nor as verbose as W3C can be when you configure all fields. Figure 6 shows a typical IIS log entry. An IIS 5.0 log entry has no header, but Figure 7 shows how you decode the fields.
The Client IP address, User Name, Date, and Time are self-explanatory. Service# is the instance of the Web server serving the request. This entry is almost always W3SVC for a Web server and MSFTPSVC for an FTP server. The number that follows the instance denotes the specific instance of a Web site or FTP server. The numbers are sequential beginning with W3SVC1, which is the Default Web Site. New Web sites that you create receive the designations W3SVC2, W3SVC3, and so on. Server Computer Name is the computer name of the Web server; its IP address follows. Elapsed Time shows how long it took to satisfy the request, followed by the number of bytes sent or received. The HTTP Service Code shows the result code that the server sent to the client (e.g., 200, 302, 401). The Request Type and the Target path and filename close out the entry into the log file.
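If you want to work with these entries in a script, a small Python sketch like the following can split a comma-delimited entry into named fields. The field order here is an assumption based on the description above plus a Windows status code field that I believe follows the HTTP code in this format. Verify the order against Figure 7 and your own logs. The sample entry and the field names are hypothetical.

```python
import csv

# Assumed field order; verify against Figure 7 and your own logs.
IIS_FIELDS = [
    "client_ip", "user_name", "date", "time", "service_instance",
    "server_name", "server_ip", "elapsed_time", "bytes_received",
    "bytes_sent", "http_status", "win32_status", "request_type",
    "target", "parameters",
]

def parse_iis_line(line):
    """Split one comma-delimited IIS-format entry into a dict of named fields."""
    values = [v.strip() for v in next(csv.reader([line]))]
    # Entries end with a trailing comma, which leaves an empty final value.
    if values and values[-1] == "":
        values.pop()
    return dict(zip(IIS_FIELDS, values))

# Hypothetical entry for illustration only.
SAMPLE = ("10.0.0.5, -, 3/12/00, 18:04:11, W3SVC1, WEB01, 10.0.0.1, "
          "120, 280, 3223, 200, 0, GET, /iisstart.asp, -,")

entry = parse_iis_line(SAMPLE)
print(entry["service_instance"], entry["http_status"], entry["target"])
```

Keeping the field list in one place means you only have to correct IIS_FIELDS if your server's entries turn out to be laid out differently.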
NCSA Log File Format
The NCSA format is probably the oldest of the formats. The NCSA developed it during a time when HTML was young, so the format doesn't support the flashy logging features the other formats do. Because the NCSA format has been around the longest and many Web server platforms support it, a lot of free log analyzer software is available. You can't customize the NCSA format as you can the W3C format, but you can do a lot with it. Figure 8 shows a sample NCSA log entry. As you can see, the log entry is small. Figure 9 shows you how to decode the fields.
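Part of the reason so many analyzers support this format is that one regular expression can pick an entry apart. The Python sketch below assumes the standard common log layout (host, remote log name, user name, bracketed timestamp, quoted request, status code, and byte count). The sample entry is hypothetical, not the one in Figure 8, and the names NCSA_PATTERN and parse_ncsa_line are my own.

```python
import re

# One named group per field of the common log format.
NCSA_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_ncsa_line(line):
    """Return a dict of fields, or None if the line doesn't fit the format."""
    match = NCSA_PATTERN.match(line)
    return match.groupdict() if match else None

# Hypothetical entry for illustration only.
SAMPLE = ('10.0.0.5 - - [12/Mar/2000:18:04:11 -0600] '
          '"GET /iisstart.asp HTTP/1.0" 200 3223')

entry = parse_ncsa_line(SAMPLE)
print(entry["host"], entry["status"], entry["request"])
```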
The CONVLOG utility lets you convert other log formats to the NCSA format. In addition, the utility can resolve individual IP addresses to Fully Qualified Domain Names (FQDNs) during the conversion process. Here's a typical CONVLOG command:
CONVLOG -ie ex000312.log -d
This command converts a W3C-formatted log to the NCSA format and resolves any IP addresses. The syntax of the CONVLOG command is too lengthy to print here, but you can easily see it from the command line of your IIS server by entering the command CONVLOG from the DOS command shell in Win2K. (Note that if you convert W3C logs to the NCSA format, you'll lose any process accounting information.)
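If you'd rather resolve addresses in your own script, the idea looks roughly like the following Python sketch. It isn't how CONVLOG works internally, just an illustration of the same lookup. The function resolve_ips and the stand-in fake_lookup (used here so that the example runs without touching the network) are hypothetical; in real use you would leave the default, socket.gethostbyaddr.

```python
import socket

def resolve_ips(ips, lookup=socket.gethostbyaddr):
    """Map each distinct IP address to a host name, or to itself on failure."""
    names = {}
    for ip in ips:
        if ip in names:
            continue  # look up each address only once
        try:
            names[ip] = lookup(ip)[0]  # first element is the primary host name
        except OSError:
            names[ip] = ip  # no reverse DNS entry; keep the raw address
    return names

def fake_lookup(ip):
    """Hypothetical stand-in for a DNS server, for illustration only."""
    table = {"10.0.0.5": ("client5.example.com", [], ["10.0.0.5"])}
    if ip not in table:
        raise socket.herror(1, "Unknown host")
    return table[ip]

names = resolve_ips(["10.0.0.5", "10.0.0.9", "10.0.0.5"], lookup=fake_lookup)
print(names)
```

Caching each address matters on a busy log because the same visitors appear thousands of times, and reverse lookups are slow.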
ODBC Logging
ODBC logging is best described as logging directly to a Microsoft SQL Server or ODBC-ready database. Microsoft Access can also speak ODBC, so you could also log directly to that platform. ODBC logging isn't a popular option because it requires additional overhead and network connections to a remote database server.
The benefit of this logging format is that you have all your Web server logs centralized in one place. You might also have other logging going to the same database. When you need to print reports, you'll have an easier time pulling the logs from an offline database rather than sacrificing your Web server's resources. Another benefit is that you can use SQL-type queries against your logs, which makes ad hoc queries a snap.
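To give you a feel for an ad hoc query, here's a small Python sketch that uses an in-memory SQLite database as a stand-in for the real logging database. The table name inetlog, the column names, and the rows are assumptions for illustration; check them against the table your own server logs to.

```python
import sqlite3

# In-memory SQLite database standing in for the ODBC logging database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inetlog (
        ClientHost TEXT, LogTime TEXT, operation TEXT,
        target TEXT, servicestatus INTEGER
    )
""")

# Hypothetical rows for illustration only.
rows = [
    ("10.0.0.9", "2000-03-12 18:04:11", "POST", "/scripts/unknown.dll", 404),
    ("10.0.0.9", "2000-03-12 18:04:15", "POST", "/scripts/unknown.dll", 404),
    ("10.0.0.5", "2000-03-12 18:05:02", "POST", "/goodapp.dll", 200),
    ("10.0.0.5", "2000-03-12 18:05:09", "GET", "/iisstart.asp", 200),
]
conn.executemany("INSERT INTO inetlog VALUES (?, ?, ?, ?, ?)", rows)

# Ad hoc question: which clients sent POST requests, and how many?
query = """
    SELECT ClientHost, COUNT(*) AS hits
    FROM inetlog
    WHERE operation = 'POST'
    GROUP BY ClientHost
    ORDER BY hits DESC
"""
post_counts = conn.execute(query).fetchall()
for host, hits in post_counts:
    print(host, hits)
```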
Making the Log File Information Useful
If you're staring at piles of log files and you're not quite sure what to do with them, you're probably not alone. Some of the Web servers I'm responsible for log more than 100MB per day in the W3C format. That number might not be impressive depending on the Web site traffic and the number of graphics and application calls in your content. Nevertheless, it's a lot of information to chew through.
Web server administrators evaluate logs for two things: performance and problems. Showing performance is easy with commercial packages such as WebTrends' Log Analyzer and Mach5 Enterprises' FastStats Analyzer. However, even if you have a commercial package that automates everything, you still need to check your logs for potential problems the logging package can't catch. Malicious users wanting to test your resolve toward Web server security are commonplace in this business. These users check whether your patches and security settings are up-to-date. For example, you might see requests in your Web logs that target the Remote Data Factory object.
Unless you're running an application that uses the Remote Data Factory object, such requests shouldn't appear in your logs. Malicious users also attempt to exploit other applications and content on your system to see whether you've overlooked security precautions during your installation process. So, how do you keep an eye out for this information?
I use the FINDSTR command frequently. You can use it on live open log files and on closed log files. I'm fortunate in that I don't have any applications that require POST on my Web servers, so I can search for any line that matches the word POST to determine who is snooping around my Web server. For example, the command
FINDSTR /i /c:"POST" ex000312.log
displays to the console anything that matches POST in the file ex000312.log. From there, I can pipe the output into a file or another FINDSTR command, such as
FINDSTR /i /c:"POST" ex000312.log | FINDSTR /I /v /c:"goodapp.dll"
This command displays all POST entries that aren't involved with goodapp.dll. This command might not display someone trying to test your application by flooding it with input (a buffer overrun condition), but you get the picture. With some careful DOS commands and scripting, you can make a great screening tool to help you parse your logs. You might be able to parse for all the exceptions and have the server mail the file to you regularly.
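If you outgrow one-line commands, the same two-stage filter translates directly into a script. This Python sketch mirrors the piped FINDSTR commands: it keeps lines that contain POST (ignoring case) and drops those that mention an allowed application. The function name suspicious_posts and the sample log lines are hypothetical.

```python
def suspicious_posts(lines, allowed=("goodapp.dll",)):
    """Return lines that mention POST but none of the allowed targets."""
    hits = []
    for line in lines:
        lowered = line.lower()  # case-insensitive, like FINDSTR /i
        if "post" not in lowered:
            continue
        if any(name.lower() in lowered for name in allowed):
            continue  # like FINDSTR /v: drop lines for known-good apps
        hits.append(line.rstrip("\n"))
    return hits

# Hypothetical log lines for illustration only.
SAMPLE = [
    "18:04:11 10.0.0.9 POST /scripts/unknown.dll 404",
    "18:05:02 10.0.0.5 POST /goodapp.dll 200",
    "18:05:09 10.0.0.5 GET /iisstart.asp 200",
    "18:06:40 10.0.0.7 post /cgi-bin/test.exe 404",
]

flagged = suspicious_posts(SAMPLE)
for line in flagged:
    print(line)
```

To screen a real log, you would pass an open file instead of SAMPLE. Like the FINDSTR version, this is a plain substring match, so a GET request for a file whose name contains the letters "post" would also be flagged.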
Here's a word of caution: As you parse logs, you might run across something that doesn't look right. Be cautious when you react. Don't overreact just because someone is rattling the doors around your Web server. It happens all the time to a lot of us. However, if someone has definitely accessed your Web server, investigate the event calmly but thoroughly until you have peace of mind.