Somewhere, right now, there's a person on the Internet looking for a solution, an answer to a question, or a means to fill a need. The words that the person enters into the search engine will dictate the results that are returned and the links that the person will follow. With the right words, your Web site might be in that list of results. With the wrong words, it won't. Do you know which keywords people have used to find your Web site in the past? Do you know which keywords most frequently lead them to your Web site?
Web analysis tools aren't new, but I'm still amazed by them. The processing and data-mining power available in a comprehensive Web analysis package is remarkable. Need to know how your clients are finding you? Simple: just look in the Web log analysis to see which search phrases they used to get to your site. Need to know whether most of your visitors are running Windows 2000, Windows 98, Apple Computer's Mac OS X, or Linux? Again, Web analysis software can quickly deliver this type of information. But what if your company doesn't have the budget for this kind of software? Fortunately, there's an exceptionally robust Web analysis package available in the open source community, and it runs on Windows.
AWStats was first created in May 2000 by Laurent Destailleur, a computer engineer in Paris. Laurent had to regularly report his company's Web statistics. He needed a better application than what was available in the open-source world at that time, so he created his own. Laurent first posted his Perl source code on SourceForge.net in October 2000, and the project has grown ever since. As of this writing, AWStats stands at version 6.2 and is an exceptionally reliable and trusted log analysis tool. AWStats can run as a command-line application or as an interactive component of a Microsoft IIS Web server. It can give you a thorough view of who is visiting your Web site.
For the purposes of walking through a sample AWStats setup, I'll be using Win2K Server configured with default IIS settings. Thus, the path to the Web site I'll analyze will be C:\Inetpub\wwwroot and the log files will be in the standard C:\WINNT\System32\LogFiles directory.
Step 1. Customize IIS
The only item you need to customize in IIS is the logging format. The default IIS logging format doesn't provide enough detail for AWStats, so you need to configure IIS to store its logs in the World Wide Web Consortium (W3C) extended log format. To do so, open the Microsoft Management Console (MMC) Internet Information Services snap-in. Right-click the Web site for which you want to change the logging, and select Properties. In the Default Web Site Properties dialog box, select W3C Extended Log File Format from the Active log format drop-down list. Next, click the Properties button to open the Extended Logging Properties dialog box. In this dialog box, make sure the following properties are selected:
No other properties should be selected. AWStats has a particular format it looks for, so any variance might produce unpredictable results.
To make sure that all your logs will be in the correct format, you should delete or archive any previous logs that might have been generated by IIS. If IIS is currently running, you might not be able to remove the current day's log because it'll probably be in use. In that case, simply stop the World Wide Web Publishing service on your system, remove the open file, then restart the service.
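One way to confirm that a log file really is in W3C extended format is to read its #Fields: directive, which W3C extended logs carry in their header. The following is a minimal Python sketch, not part of AWStats or IIS; the function names and the sample field list are illustrative assumptions, so substitute the properties you selected in the Extended Logging Properties dialog box.

```python
def read_fields(log_lines):
    """Return the field names from the last '#Fields:' directive, or [] if none."""
    fields = []
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
    return fields


def fields_match(log_lines, expected):
    """True when the logged fields are exactly the expected ones, in order."""
    return read_fields(log_lines) == list(expected)


# Illustrative sample only; real logs list whichever properties you selected.
sample = [
    "#Software: Microsoft Internet Information Services 5.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2005-06-01 00:00:12 10.0.0.5 GET /default.htm 200",
]
wanted = ["date", "time", "c-ip", "cs-method", "cs-uri-stem", "sc-status"]
print(fields_match(sample, wanted))
```

Running a check like this against an old log before you archive it shows quickly whether that log was written in the earlier format.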
Step 2. Install ActivePerl and AWStats
To use AWStats, you need to install ActiveState's ActivePerl and AWStats on your server. You can download ActivePerl from ActiveState's Web site at http://www.activestate.com/products/activeperl. As of this writing, ActiveState offers two releases: ActivePerl 5.8.6 and ActivePerl 5.6.1. I've found that ActivePerl 5.6.1 works fine with AWStats and provides better compatibility with some other Perl open-source tools I like to use. So, if you think you might want to try some other open-source Perl applications, I recommend that you install ActivePerl 5.6.1. If you prefer to have the latest version, AWStats also works with ActivePerl 5.8.6. Feel free to download whichever version suits your needs.
To install ActivePerl, you need Windows Installer. The ActivePerl installation is straightforward. There are no real options from which to choose, other than the installation path. I recommend that you install ActivePerl in its default installation path, which is C:\Perl.
After you've installed ActivePerl, you need to download the main AWStats distribution from the SourceForge.net project site at http://awstats.sourceforge.net. There's a self-installing executable available for easy installation. After you download the AWStats executable, simply launch the .exe file and accept the default installation path. When the installation is complete, the application will be in C:\Program Files\AWStats.
Step 3. Configure AWStats
As a part of the setup process, AWStats launches its own configuration program, a Perl application that executes in a command window. The configuration application first asks for the path to your Web server configuration file, as Figure 1 shows. The configuration application assumes that you're using a Web server, such as Apache, that's configurable through a text file. Because IIS isn't configured in this manner, simply enter none.
After attempting to configure your Web server, the configuration program walks you through building a basic configuration template for your Web site. As Figure 2 shows, it first asks whether you want to create a new AWStats configuration file. Answer yes. Next, the configuration program asks you to provide a name for the Web site you'll be analyzing. I've selected the name www.toombspartners.com—the name of the domain for the fictitious organization Toombs Partners—as the Web site name. The program uses the name you provide to create the filename for the configuration file. In this case, the program will name the file awstats.www.toombspartners.com.conf.
Finally, the configuration application informs you that it can't automatically create scheduled tasks for you because you're installing AWStats on a Windows server. We'll take care of that task in Step 7, so for now just acknowledge the message, and the configuration program will be complete. The configuration parameters configured throughout this process are stored in C:\Program Files\AWStats\wwwroot\cgi-bin\awstats.www.toombspartners.com.conf.
There are still a few more settings you need to change, so open your configuration file with Notepad or another text editor. The first parameter you need to change is the path to your log files. By default, AWStats uses the path /var/log/httpd/mylog.log, which is a UNIX-centric file path, so it contains forward slashes (/) rather than backslashes (\). Look for the LogFile parameter in the configuration file, which will look like
LogFile="/var/log/httpd/mylog.log"
Replace "/var/log/httpd/mylog.log" with the default installation location for your log files. If you've followed a default installation of IIS, specify "C:/WINNT/System32/LogFiles/W3SVC1/ex%YY-24%MM-24%DD-24.log". This value will instruct AWStats to look in the C:\WINNT\System32\LogFiles\W3SVC1 directory for filenames that begin with ex, end with .log, and have a two-digit year, two-digit month, and two-digit day in the middle. This is the default naming convention for IIS log files. If you're using another naming convention or if the log files are stored in a directory other than W3SVC1, adjust the LogFile parameter value accordingly.
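To see what the date tags in that value resolve to, here is a small Python sketch that mimics the substitution for the three tags used above. It assumes that the -24 suffix means "24 hours before the current time," which is consistent with the previous-day behavior described in Step 7; the function name is hypothetical, and this isn't AWStats code.

```python
from datetime import datetime, timedelta


def expand_logfile(pattern, now):
    """Expand %YY-24, %MM-24 and %DD-24 using the date 24 hours before `now`."""
    earlier = now - timedelta(hours=24)
    return (pattern
            .replace("%YY-24", earlier.strftime("%y"))
            .replace("%MM-24", earlier.strftime("%m"))
            .replace("%DD-24", earlier.strftime("%d")))


pattern = "C:/WINNT/System32/LogFiles/W3SVC1/ex%YY-24%MM-24%DD-24.log"
# Half an hour past midnight on June 4, 2005
print(expand_logfile(pattern, datetime(2005, 6, 4, 0, 30)))
# C:/WINNT/System32/LogFiles/W3SVC1/ex050603.log
```

The same arithmetic handles month boundaries: a run early on June 1 resolves to the May 31 log.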
Another parameter you need to change is the LogFormat parameter, which tells AWStats the type of logging. LogFormat is set to 1 by default. The proper value for an IIS Web site is 2.
You must also change the SiteDomain and HostAliases parameters, which define how the outside world references your Web site and how the Web site might be referenced internally, respectively. Change the SiteDomain parameter's value to the main domain name for your server or the main intranet site name. Modify the HostAliases parameter's value so that it specifies other possible domain names, addresses, or virtual host names that might also be used to reference your site internally.
Save the configuration file, then exit the text editor. You're now ready to run AWStats for the first time.
Step 4. Run AWStats
Assuming you configured everything correctly, you're ready to give AWStats its first run-through. However, to do so, you need to understand how AWStats works.
There are two primary components you'll be using in AWStats: the analysis application and the report-generation application. The analysis application does all the raw number crunching. The report-generation application converts the analyzed data into human-readable HTML reports.
Given the sheer volume of log information created by many Web sites, AWStats writes all its statistical-analysis results to its own database file. That way, when you execute AWStats again, it won't need to reanalyze the same data. Imagine how long AWStats would take to execute if you launched it in December and it had to go back and process the previous 11 months' worth of log files every time.
To create the AWStats database and import an IIS log file into the analysis application, change to the C:\Program Files\AWStats\wwwroot\cgi-bin directory and execute the command
awstats.pl -config=WebSiteName -update -logfile=IISLogName
where WebSiteName is your Web site's name and IISLogName is the name of the IIS log file you want to import into the database for analysis. (Although this command appears on several lines here, you would enter it on one line in the command window. The same holds true for the other multiline commands in this article.)
For example, suppose we want to analyze the IIS log files for the Toombs Partners Web site, starting with the month of June. It's June 4, so we have three complete IIS log files (i.e., the log files for June 1, June 2, and June 3) and one incomplete log file (i.e., the log file for June 4). To import the June 1 log file, run the command
awstats.pl -config=www.toombspartners.com -update -logfile="C:/WINNT/system32/logfiles/w3svc1/ex050601.log"
After running that command, we'd then process the ex050602.log and ex050603.log files in the same way.
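If you have more than a few days to catch up on, typing each command by hand gets tedious. The Python sketch below only builds the command strings for a range of days, oldest first, in the same order used above; the helper name is hypothetical, and the year 2005 is taken from the ex050601.log filename in the example.

```python
from datetime import date, timedelta


def update_commands(config, log_dir, first_day, last_day):
    """Build one awstats.pl -update command per daily log, oldest first."""
    commands = []
    day = first_day
    while day <= last_day:
        log_name = "ex" + day.strftime("%y%m%d") + ".log"
        commands.append(
            'awstats.pl -config={} -update -logfile="{}/{}"'.format(
                config, log_dir, log_name)
        )
        day += timedelta(days=1)
    return commands


# The three complete June logs from the Toombs Partners example
for cmd in update_commands("www.toombspartners.com",
                           "C:/WINNT/system32/logfiles/w3svc1",
                           date(2005, 6, 1), date(2005, 6, 3)):
    print(cmd)
```

You could paste the printed lines into a command window opened in the cgi-bin directory, or save them to a batch file.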
AWStats uses the Web site's name in the database file's name. For example, in the case of Toombs Partners, the name of the database file is awstats%MM%YYYY.www.toombspartners.com.txt, where %MM is the two-digit month and %YYYY is the four-digit year.
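As a quick illustration of that naming pattern, this Python sketch builds the database filename for a given month and year. The function name is hypothetical; only the pattern itself comes from the description above.

```python
def database_filename(config, month, year):
    """Return awstatsMMYYYY.<config>.txt for the given month and year."""
    return "awstats{:02d}{:04d}.{}.txt".format(month, year, config)


print(database_filename("www.toombspartners.com", 6, 2005))
# awstats062005.www.toombspartners.com.txt
```

Knowing the exact name is handy later, when you need to find the database file for a particular month.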
When the analysis application starts analyzing log files, you should see the program display several statistics about what it's processing. When it finishes, you should see a statistic that specifies how many new qualified records were found, as Figure 3 shows. The number of new qualified records should roughly be in line with the number of lines of data in your log files. It's not uncommon to see a few lines dropped due to errors; as long as you don't see a significant number of errors, you should be ready to create your first report. If the number of new qualified records is 0, there's a problem with your configuration and you need to figure out what the problem is.
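To compare the qualified-record count against your log, you need the number of data lines in the file, not counting the # header directives. The Python sketch below shows one rough way to get that number and the share of lines that were dropped. The function names are hypothetical, and AWStats might exclude records for reasons other than errors, so treat the ratio as a sanity check rather than an exact measure.

```python
def count_data_lines(lines):
    """Count log lines that carry a request, skipping '#' directives and blanks."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def dropped_ratio(data_lines, qualified_records):
    """Fraction of data lines that did not become qualified records."""
    if data_lines == 0:
        return 0.0
    return (data_lines - qualified_records) / data_lines


# Illustrative sample only
sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2005-06-01 00:00:12 10.0.0.5 GET /default.htm 200",
    "2005-06-01 00:00:15 10.0.0.7 GET /about.htm 200",
    "",
]
print(count_data_lines(sample))
```

A ratio close to 0 suggests the configuration matches the log format; a ratio of 1 corresponds to the zero-records case described above.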
Step 5. Create a Web Server for AWStats
To make it easy for you and others in your organization to browse the reports for your Web site, you can build another Web site within IIS and point it to a directory that stores the AWStats output. To make a new directory for your AWStats reports, first create a directory called AWStats in the Inetpub directory (i.e., C:\Inetpub\AWStats). Copy the icon directory in C:\Program Files\AWStats\wwwroot and paste it into the C:\Inetpub\AWStats directory. Next, define a new Web site in IIS and point it to this directory, as Figure 4 shows. Assign a unique TCP port, IP address, or host header to this Web site if another site on that server is already listening on port 80.
When you use the report-generation application, AWStats creates a custom Web page. The default filename for that Web page is awstats.WebSiteName.html, where WebSiteName is the Web site name that you defined in Step 3. You need to set that Web page as the default home page in the Documents tab of the AWStats Site Properties dialog box.
Finally, if you want to add any security to your statistics site, you can do so by modifying the appropriate properties on the Directory Security tab of the AWStats Site Properties dialog box. For example, you can restrict users based on IP addresses or require authentication to access the site. If you don't add any security, anyone who can connect to the server will be able to browse the site.
Step 6. Build the Reports
After you define your IIS Web site, it's time to get to the fun stuff—building the reports. Change to the C:\Program Files\AWStats\tools directory and execute the command
awstats_buildstaticpages.pl -config=WebSiteName -update -lang=en -dir="C:\Inetpub\awstats" -awstatsprog="C:/Program Files/AWStats/wwwroot/cgi-bin/awstats.pl"
where WebSiteName is your Web site's name (e.g., www.toombspartners.com for the Toombs Partners example). The report-generation component (i.e., awstats_buildstaticpages.pl) builds the main statistics report page, along with all related subpages, for your site and writes them to the directory specified by the -dir switch.
If everything is working properly, AWStats should create one report after another, until the processing cycle is complete. After the reporting cycle finishes, you can open a Web browser on your system and navigate to your newly created AWStats Web site. The main page of the AWStats report should be your default page. Figure 5 shows a portion of a sample main output page.
The main output page contains a number of high-level statistics and clickable links that let you drill down into more detail. For example, if you click the Countries link under the Who section in Figure 5, you'll be sent to the Countries (Top 10) table that's further down in the main output page. Clicking the Full list link in the table's title bar takes you to a table that lists all the visitors' countries. Alternatively, you can go directly to the full list of countries by clicking the Full list link that appears after the Countries link in the Who section.
But wait—you don't have country statistics in your report? Not to worry, this is the default behavior for AWStats. Let's look at how to fix that.
One of the primary mechanisms used to determine where a visitor is coming from is a reverse lookup on the visitor's IP address to see whether there's a Fully Qualified Domain Name (FQDN) associated with that address. If there's a proper FQDN associated with the IP address, the name should end in an extension such as .com, .ca, or .jp. AWStats uses this extension to determine where the visitor is coming from.
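The idea of reading the extension off a resolved host name can be sketched in a few lines of Python. This isn't how AWStats itself is written (it's a Perl application, and its internal mapping isn't covered here); the function name is hypothetical, and the sketch assumes the reverse lookup has already been done.

```python
def top_level_label(hostname):
    """Return the last dot-separated label of a host name, lowercased, or None."""
    name = hostname.strip().rstrip(".")
    if "." not in name:
        return None
    label = name.rsplit(".", 1)[1].lower()
    # A purely numeric last label means this is still an IP address, not a name.
    return None if label.isdigit() else label


print(top_level_label("host.example.jp"))   # jp
print(top_level_label("192.168.1.10"))      # None
```

An address that never resolved to a name yields no extension, which is why the country statistics stay empty until reverse lookups are enabled.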
In its default configuration, AWStats doesn't perform reverse DNS lookups because it takes a significant amount of time to perform a reverse DNS lookup for every IP address that comes to a site. AWStats is highly efficient in its own processing, but when it must submit a query to an external DNS server and wait for the response, the process slows down dramatically. For example, the three test log files that I'm using for this article contain about 350,000 addresses. Without performing reverse DNS lookups, AWStats analyzed this much data in only a few minutes. With reverse lookups enabled, the process took more than 6 hours to complete. So, there's a significant performance hit involved with performing reverse DNS lookups.
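A quick back-of-the-envelope calculation puts those figures in perspective. The Python sketch below assumes one lookup per address and that the lookups account for essentially all of the 6 hours, which is a simplification; because the run took more than 6 hours, the result is a lower bound on the average.

```python
def average_lookup_ms(lookups, hours):
    """Average time per lookup in milliseconds for a run of the given length."""
    return hours * 3600 * 1000 / lookups


# About 350,000 addresses processed in a little over 6 hours
print(round(average_lookup_ms(350000, 6)))  # 62
```

Roughly 62 milliseconds per lookup doesn't sound like much, but multiplied by hundreds of thousands of addresses it turns a job of a few minutes into one that runs for hours.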
If you want AWStats to perform reverse DNS lookups, you need to find the DNSLookup parameter in your site's configuration file and change the parameter's value from 2 to 1. After you set that parameter value, AWStats will attempt to perform a reverse DNS lookup for every IP address in any new log files you analyze. It won't go back over data that's already in its database, however, so you need to manually reanalyze any existing log files. To do so, delete the AWStats database file for your site, then start the analysis process again. This time, AWStats will perform a reverse DNS lookup for all your records, but be prepared for it to take a long time. After AWStats has analyzed your log files with reverse lookups, simply create the reports again and you should have the statistics about which countries your visitors are from.
Step 7. Schedule AWStats to Run Nightly
You'll probably want to have AWStats process your log files every night so that you don't have to manually execute the analysis and report-generation applications. To do this, you can use the NightRun.bat file, which Listing 1 shows. This batch file first calls the analysis application, which analyzes the IIS log files. After the analysis process completes, the batch file then calls the report-generation application, which creates the reports and places them in the IIS directory for AWStats.
To use NightRun.bat, first download the file from the Windows IT Pro Web site. Go to http://www.windowsitpro.com, enter 45264 in the InstantDoc ID text box, then click the 45264.zip hotlink. Next, open the file in Notepad or another text editor, and replace each occurrence of www.toombspartners.com with your Web site's name. Finally, open the Control Panel Scheduled Tasks applet and schedule the batch file to run every night at some point after midnight. Why after midnight? When I showed you how to execute AWStats in Step 4, I used the -logfile switch to supply a specific log-file name. If that switch is missing, as it is in NightRun.bat, AWStats falls back on the configuration file's LogFile parameter, and the -24 date wildcards in that parameter's value resolve to the previous day. So, if this batch file executes shortly after midnight on June 4, AWStats analyzes the log file for June 3.
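The nightly sequence boils down to two commands run in order. The Python sketch below builds them from the commands shown in Steps 4 and 6, with the -logfile switch left out; it's an illustration of the sequence only, not the contents of NightRun.bat, and the function name is hypothetical.

```python
def nightly_commands(config, report_dir, awstats_prog):
    """Return the analysis and report commands in the order they should run."""
    # Step 4 command without -logfile, so the LogFile parameter picks the log
    analyze = "awstats.pl -config={} -update".format(config)
    # Step 6 command, which writes the static report pages to report_dir
    report = (
        'awstats_buildstaticpages.pl -config={} -update -lang=en '
        '-dir="{}" -awstatsprog="{}"'.format(config, report_dir, awstats_prog)
    )
    return [analyze, report]


for cmd in nightly_commands(
        "www.toombspartners.com",
        "C:\\Inetpub\\awstats",
        "C:/Program Files/AWStats/wwwroot/cgi-bin/awstats.pl"):
    print(cmd)
```

Remember that the two scripts live in different directories (cgi-bin and tools), so a real batch file has to change directories or use full paths between the two calls.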
Web Statistics That Are Just a Click Away
It's nice to have a fresh set of Web statistics ready and waiting for you every morning. If you're not mining your company's Web site data, you should be, even if you aren't selling products on your site. The statistics can give you a better idea as to what people are looking for when they find your Web site—and in today's age of customer service, that's always a good thing to know.
Project Snapshot: How to
PROBLEM: If your company is put off by the high cost of Web analysis software, you can use AWStats.
WHAT YOU NEED: AWStats, ActivePerl, and NightRun.bat
DIFFICULTY: 3.5 out of 5