A chief success factor in any proxy server deployment is how long it takes the server to respond to a Microsoft Proxy Server client workstation query. Whether you're using the Web Proxy, Winsock Proxy, or Socks Proxy service, a consistently fast response is the measure of success. Proxy Server's caching feature can help you improve the responsiveness of your browser queries. This month, I discuss the caching feature and how to create a caching architecture for your proxy server environment.
The Content-Caching Process
Proxy server administrators know that employees visit many of the same Web sites every day: CNN, eBay, CBS Marketwatch, Yahoo, and sites that are important to the operation of the business. Without a proxy server that supports caching, every browser request for content from one of these popular sites means retrieving a fresh copy from the Web site. Caching stores a copy of that content locally on your proxy server, which can then serve the content to later requests rather than retrieve it again from the network.
On sites where the content is always changing (e.g., news and financial Web sites), you might wonder how to keep your proxy server from caching stale, outdated information. Web site administrators and content creators can use two methods to prevent this problem:
- Set a default expiration on the Microsoft IIS server.
- Set an expiration on the content itself. (This specification was originally part of the HTTP 1.0 specification, but it was updated in HTTP 1.1.)
(For more information about this specification, see Internet Engineering Task Force (IETF) Requests for Comments (RFCs) 1945 and 2068 at http://www.freenic.net/rfcs.) A content creator can set the expiration on the content with an HTTP-EQUIV="expires" <META> tag. Here's an example:
<META HTTP-EQUIV="expires" CONTENT="Sat, 22 Jan 2000 21:00:02 GMT">
Both methods depend on the Web server administrator or the content creator applying an expiration value to the object. The idea is that as long as the object hasn't expired, the client (in this case, the proxy server) doesn't need to retrieve the object from the source on behalf of the real client hiding behind the proxy server; responding with the copy cached on the local proxy server is sufficient.
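To make the expiration check concrete, here is a minimal Python sketch of the freshness test a cache could apply to an expiration value like the one in the <META> example. The function name is_fresh is hypothetical, and the sketch assumes the value is a standard HTTP date string; it relies only on the standard library's email.utils.parsedate_to_datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_value: str, now: datetime) -> bool:
    """Hypothetical helper: is a cached object still fresh under its Expires value?

    expires_value is an HTTP date such as "Sat, 22 Jan 2000 21:00:02 GMT",
    taken from an Expires header or a <META HTTP-EQUIV="expires"> tag.
    now must be a timezone-aware datetime.
    """
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        # Unparsable date: treat the object as already expired (conservative).
        return False
    if expires.tzinfo is None:
        # A "-0000" zone parses as naive; HTTP dates are always GMT.
        expires = expires.replace(tzinfo=timezone.utc)
    # Fresh means the expiration time is still in the future.
    return now < expires


now = datetime(2000, 1, 22, 20, 0, 0, tzinfo=timezone.utc)
print(is_fresh("Sat, 22 Jan 2000 21:00:02 GMT", now))  # True: about an hour left
```

Treating a malformed date as expired is a design choice for the sketch: it costs an extra request to the source rather than risking serving stale content.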
Passive and Active Caching
Proxy Server performs two types of caching—passive caching and active caching. The difference between the two types lies in when Proxy Server caches content.
Passive caching. Passive caching occurs on behalf of every Web Proxy service request for content (i.e., objects). As browsers request content from the Web Proxy service, the service checks the cache for a current copy of the object. If none exists, the service downloads a fresh copy from the Web server, serves it to the client, and then caches the object on the proxy server's local drives, where it is ready to serve when other browsers request the same object.
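The request-driven pattern above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Proxy Server's implementation: the class name PassiveCache and the fetch callback are hypothetical, every object gets one fixed TTL, and a stale copy is simply re-downloaded (Proxy Server instead revalidates stale copies with a conditional GET, as described under Expiring Cache Objects).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedObject:
    body: bytes
    expires_at: float  # absolute time, in seconds, when the copy goes stale


class PassiveCache:
    """Sketch of passive caching: the cache fills only as clients request objects."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float) -> None:
        self._fetch = fetch        # hypothetical function that downloads from the source
        self._ttl = ttl_seconds    # simplification: one fixed TTL for every object
        self._store: Dict[str, CachedObject] = {}

    def get(self, url: str, now: float) -> bytes:
        entry = self._store.get(url)
        if entry is not None and now < entry.expires_at:
            return entry.body      # hit: serve the cached copy, no trip to the source
        # Miss (or stale copy): download a fresh copy from the source...
        body = self._fetch(url)
        # ...then keep it so the next request for the same URL is a hit.
        self._store[url] = CachedObject(body, now + self._ttl)
        return body
```

Nothing is downloaded until some client asks for it, which is exactly what distinguishes passive caching from the active caching described next.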
Serving cached copies of Web pages benefits the local user; however, a Web site that tracks page hits loses the hit, and lost hits can translate into lost revenue. In addition, not every type of content is cacheable. (Examples of noncacheable content include Active Server Pages (ASP) and Common Gateway Interface (CGI) objects.) As passive caching moves copies of objects into the cache, Proxy Server assigns each object a Time to Live (TTL) value. If the content has no set expiration date, Proxy Server calculates the TTL as 20 percent of the object's age, which it determines from the HTTP Last-Modified header. The Web server supplies the Last-Modified value, usually taking it from the date and timestamp in its file system. Screen 1 shows the passive caching policy for Proxy Server objects using the default values. For example, if an object is 10 hours old, its TTL is 2 hours. (Note that the minimum and maximum TTLs for objects are 15 minutes and 1440 minutes, or 24 hours, respectively.)
If the content provider assigned an expiration date and time with an Expires header (or the equivalent HTTP-EQUIV="expires" <META> tag), Proxy Server uses this value. Figure 1 shows the HTTP headers that I extracted from an image linked to the home page of a popular Web portal. The headers say the object is about 6 years old and won't expire for another 10 years. However, the content isn't really that old: The content creator wanted you to cache the image for a long time, so he or she gave the object a distant expiration date. Now, let's figure the TTL for this object after Proxy Server has cached it. Figure 2 shows how to calculate 20 percent of the object's age (assuming that today is May 15, 2000). Proxy Server would keep this object in cache for 14 months and 18 days before it expired. Because of this long caching period, your proxy server doesn't have to fetch the image again and again for each user, and the content provider doesn't have to keep serving it, which cuts down on his or her bandwidth requirements. It's a good deal for both the content provider and the proxy server administrator; however, you need enough cache space to hold these objects.
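The TTL rule for content without an explicit expiration is simple arithmetic, sketched here in Python. The 20 percent figure and the 15- and 1440-minute bounds come from the defaults described above; the function names object_age and heuristic_ttl are hypothetical, and the sketch assumes the bounds are applied as a plain clamp. It deliberately ignores explicit expiration dates, which Proxy Server honors first.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MIN_TTL = timedelta(minutes=15)
MAX_TTL = timedelta(minutes=1440)  # 24 hours


def object_age(last_modified: str, now: datetime) -> timedelta:
    """Age of an object, measured from its HTTP Last-Modified header."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    return now - modified


def heuristic_ttl(age: timedelta, percent: int = 20,
                  min_ttl: timedelta = MIN_TTL,
                  max_ttl: timedelta = MAX_TTL) -> timedelta:
    """TTL for an object with no explicit expiration: a percentage of its age,
    clamped to the [min_ttl, max_ttl] range."""
    ttl = age * percent / 100
    return max(min_ttl, min(ttl, max_ttl))


now = datetime(2000, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
age = object_age("Mon, 15 May 2000 02:00:00 GMT", now)
print(age, heuristic_ttl(age))  # 10:00:00 2:00:00
```

The clamp matters at both ends: a page modified 30 minutes ago would otherwise get a 6-minute TTL, and a month-old object would otherwise sit in cache for 6 days without revalidation.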
Active caching. Unlike passive caching, active caching is something the proxy server does on its own during idle periods (for example, overnight). It's called active because the proxy server proactively downloads the pages that, according to what it has learned from its cache, clients request most often. If an entertainment Web site is one of the most requested Web sites on your proxy server, active caching will have a fresh copy on hand in anticipation of browser requests. You can disable this feature on proxy servers that have time or bandwidth restrictions.
Proxy Server considers more than just how often a site is visited when deciding what to cache actively. It gives priority to objects with longer TTLs over those with shorter TTLs, and it also takes into account objects that are about to expire and the proxy server's peak usage periods.
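Microsoft doesn't publish the exact formula Proxy Server uses, so the following Python sketch is purely hypothetical: it only shows how the factors listed above (popularity, TTL length, imminent expiry, and peak versus idle periods) could combine into a refresh plan. The Candidate fields, the 60-minute expiry window, and the ordering are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    url: str
    hits_per_day: int          # how often clients requested the object
    ttl_minutes: float         # TTL the object would get after a refresh
    minutes_to_expiry: float   # time left on the currently cached copy


def active_cache_plan(candidates: Sequence[Candidate], is_idle: bool,
                      limit: int, expiry_window: float = 60.0) -> List[str]:
    """HYPOTHETICAL policy: choose up to `limit` URLs to refresh ahead of demand."""
    if not is_idle:
        return []  # stay out of the way during peak periods
    # Only objects that are about to expire need a proactive refresh.
    due = [c for c in candidates if c.minutes_to_expiry <= expiry_window]
    # Most popular first; among equally popular objects, prefer longer TTLs,
    # because each refresh then stays useful for longer.
    due.sort(key=lambda c: (c.hits_per_day, c.ttl_minutes), reverse=True)
    return [c.url for c in due[:limit]]
```

Preferring longer TTLs is one plausible reading of the priority rule: a refresh that stays valid for hours earns more future cache hits than one that expires in minutes.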
Expiring Cache Objects
I've already shown that returning a nonexpired cached object is preferable to retrieving a fresh copy from the source. But what happens when a client requests an object whose cached copy has expired? The proxy server doesn't need to automatically go back and get a fresh copy, especially if the object's author hasn't modified it.
Fortunately, when Proxy Server realizes that a client is requesting an object that is in its cache but has expired, it performs a conditional GET. Newer browsers issue this kind of request frequently to take advantage of their own caches; Microsoft Internet Explorer (IE), for example, calls its cache Temporary Internet Files. The only difference between an ordinary GET and a conditional GET is the If-Modified-Since header added to the request sent to the source Web server. Figure 3 shows an example of such a request.
When a Web server such as IIS receives a conditional GET, it compares the file's timestamp with the If-Modified-Since value it received from the client. If the Web server has a newer version of the object, it returns the new object. If the object hasn't been modified, the Web server simply returns status code 304, which RFC 2068 defines as Not Modified. The proxy server, in turn, serves the object from its cache and extends that object's TTL.
This procedure saves significant bandwidth for both the proxy server and the source Web server. Even though the object has expired, Proxy Server doesn't need to retrieve it again if it hasn't been modified; it simply gives the cached object a new TTL. Note that Proxy Server 2.0 has a known problem with expired content. For more information about this problem, see the Microsoft article "Proxy Server May Receive Old Data from a Web Site" (http://support.microsoft.com/support/kb/articles/q239/4/95.asp).
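Here is a minimal Python sketch of what such a request looks like on the wire. The function name conditional_get, the host, and the path are hypothetical; the If-Modified-Since value is assumed to be the Last-Modified time recorded with the cached copy. The date is formatted with the standard library's email.utils.format_datetime, which produces the GMT form HTTP expects.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_get(path: str, host: str, cached_last_modified: datetime) -> str:
    """Build the raw text of a conditional GET request (hypothetical helper).

    cached_last_modified must be a UTC-aware datetime.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        # The only addition to an ordinary GET:
        f"If-Modified-Since: {format_datetime(cached_last_modified, usegmt=True)}",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


request = conditional_get("/index.html", "www.example.com",
                          datetime(2000, 5, 15, 12, 0, 0, tzinfo=timezone.utc))
print(request)
```

Dropping the If-Modified-Since line turns this back into an ordinary GET, which is the point the paragraph above makes.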
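The server-side comparison can be sketched as follows. This is a simplified illustration, not IIS code: the function name handle_conditional_get is hypothetical, the file's modification time is assumed to be a UTC-aware datetime, and a malformed If-Modified-Since header is simply ignored so the full object is sent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def handle_conditional_get(file_mtime: datetime, body: bytes,
                           if_modified_since: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body) for a GET that may carry If-Modified-Since.

    file_mtime must be timezone-aware (UTC).
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: ignore it, send the full object
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop sub-second detail.
            if file_mtime.replace(microsecond=0) <= since:
                return 304, b""  # Not Modified: no body on the wire
    return 200, body  # newer object (or unconditional GET): send it all
```

Truncating to whole seconds avoids a subtle bug: a file timestamp with a fractional second would otherwise always look "newer" than the second-resolution date the client echoed back.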
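On the proxy side, handling the reply to a conditional GET comes down to two cases, sketched here in Python. The names CacheEntry and revalidate are hypothetical, times are plain seconds, and the new TTL is passed in rather than computed; the point is that a 304 reply keeps the cached bytes and only refreshes the expiration.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float  # absolute time, in seconds


def revalidate(entry: CacheEntry, status: int, new_body: bytes,
               now: float, ttl: float) -> CacheEntry:
    """Update a stale cache entry from the source's reply to a conditional GET."""
    if status == 304:
        # Unchanged at the source: keep the bytes we already have, new TTL only.
        return CacheEntry(entry.body, now + ttl)
    if status == 200:
        # Modified: replace the cached copy with the new object.
        return CacheEntry(new_body, now + ttl)
    raise ValueError(f"unexpected status {status}")
```

In the 304 case, the only traffic between the proxy and the source was a request header and a short status line, which is where the bandwidth savings come from.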
Cache Construction and Logging
The Proxy Server installation process allocates space for the Proxy Server cache. By default, the installation program looks for the largest partition and suggests placing the cache there. According to Proxy Server's installation documentation, the guideline for cache sizing is 100MB plus 500KB per user; for 100 users, for example, the cache should be 150MB. Disk space is far cheaper today than it was just a few years ago, so tens of gigabytes of cache for a few hundred users is practical. To enjoy the benefits of caching, you need the space to store all this content in anticipation of its next use.
Proxy Server creates a directory for every 500MB of cache space you allocate, with a minimum of five folders (even on small drives) and a maximum of 200. The allocated cache space divided by the number of folders is the amount of storage available in each folder. In addition, Proxy Server won't store any object larger than one-eighth of the folder size. (For more information about folder storage, see the Microsoft article "Proxy: Large Files May Not Be Cached" at http://support.microsoft.com/support/kb/articles/q201/1/84.asp.)
When the cache folders run out of space, Proxy Server evicts expired objects and objects that are about to expire first. Among the objects marked for deletion, it removes the least frequently used before the more frequently used. This process makes room for new objects entering the cache.
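The sizing guideline is easy to express as a one-line formula. In this Python sketch the function name recommended_cache_mb is hypothetical, and it assumes 1MB = 1000KB, which is the conversion that reproduces the 150MB figure for 100 users.

```python
def recommended_cache_mb(users: int) -> float:
    """Installation guideline: 100MB plus 500KB per user.

    Assumes 1MB = 1000KB, which matches the worked example (100 users -> 150MB).
    """
    base_mb = 100
    per_user_kb = 500
    return base_mb + users * per_user_kb / 1000


print(recommended_cache_mb(100))  # 150.0
```

Even 1,000 users come out at only 600MB by this formula, so treat the result as a floor rather than a target when disks are cheap.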
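The folder rules combine into a short calculation, sketched below in Python. The function name cache_layout is hypothetical, and rounding the folder count up when the cache isn't an exact multiple of 500MB is an assumption; the text doesn't say how partial blocks are handled.

```python
import math
from typing import Tuple


def cache_layout(cache_mb: float) -> Tuple[int, float, float]:
    """Return (folder count, MB per folder, largest cacheable object in MB)."""
    # One folder per 500MB; rounding up is an ASSUMPTION, the text doesn't say.
    folders = math.ceil(cache_mb / 500)
    folders = max(5, min(folders, 200))  # at least 5, at most 200 folders
    folder_mb = cache_mb / folders
    max_object_mb = folder_mb / 8        # objects over 1/8 of a folder aren't cached
    return folders, folder_mb, max_object_mb


print(cache_layout(150))     # (5, 30.0, 3.75)
print(cache_layout(10_000))  # (20, 500.0, 62.5)
```

The first result shows why a small cache quietly excludes large downloads: a 150MB cache, the guideline size for 100 users, can't hold any object larger than 3.75MB.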
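The eviction order can be illustrated with a Python sketch. The text gives only the ordering principles, so the details here are assumptions: the names Entry and choose_evictions are hypothetical, "about to expire" means within a one-hour window, and ties are broken by the soonest expiration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Entry:
    url: str
    size_mb: float
    expires_at: float  # absolute time, in seconds
    hit_count: int     # how often clients have requested the object


def choose_evictions(entries: Sequence[Entry], now: float, needed_mb: float,
                     soon: float = 3600.0) -> List[str]:
    """Pick URLs to evict until at least needed_mb is freed (illustrative order)."""
    def order(e: Entry):
        expiring = e.expires_at <= now + soon  # expired or about to expire
        # False sorts before True, so expiring objects go first; then the
        # least frequently used; then whichever expires soonest.
        return (not expiring, e.hit_count, e.expires_at)

    victims: List[str] = []
    freed = 0.0
    for e in sorted(entries, key=order):
        if freed >= needed_mb:
            break
        victims.append(e.url)
        freed += e.size_mb
    return victims
```

A popular object can still be evicted early if its copy has expired, because it must be revalidated before it is served again anyway.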
Arrays and Caching
To spread out the load of caching, you can distribute the task across several peer proxy servers. A group of such proxy servers is called an array, and the peer proxy servers that join it are members. The members of an array jointly participate in the caching process, which the Cache Array Routing Protocol (CARP) manages. CARP determines which proxy server in the array maintains the cache for a particular object.
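The core idea behind CARP can be sketched with highest-random-weight hashing: every member name is combined with the requested URL, each combination gets a deterministic score, and the highest score decides which member owns the object. This Python sketch is a simplified stand-in, not the protocol itself. CARP defines its own hash functions and per-member load factors; here SHA-256 and equal weights are assumptions, and route is a hypothetical name.

```python
import hashlib
from typing import Sequence


def _score(member: str, url: str) -> int:
    """Deterministic pseudo-random score for a (member, URL) pair."""
    digest = hashlib.sha256((member + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(url: str, members: Sequence[str]) -> str:
    """Return the array member that should cache `url`: the highest score wins."""
    if not members:
        raise ValueError("the array has no members")
    return max(members, key=lambda m: _score(m, url))


members = ["proxy1", "proxy2", "proxy3"]
print(route("http://www.example.com/logo.gif", members))
```

Two properties make this approach attractive for an array: any machine that knows the member list computes the same owner without asking the other members, and removing one member reassigns only the URLs that member owned.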
You initialize and configure arrays from the shared services dialog box in the Microsoft Management Console (MMC). Click Array, then click Join to create an array. Enter the name of another proxy server with which you want to form an array, as Screen 2 shows. You must provide either the name of a computer in the existing array or the name of the computer with which you want to form the array. If the computer you named isn't already part of an array, Proxy Server creates an array.
Most proxy server administrators use arrays in hierarchical proxy server environments, in which a downstream proxy server at a remote office talks to an array of upstream proxy servers housed at a central location. Client browsers gain little from connecting directly to an upstream array member: because array members share caching responsibilities, the member that receives a request often has to act as a client of another member to serve it. To avoid needlessly tying up proxy server resources, use a CARP-aware downstream proxy server, which determines which array member can resolve the request most efficiently and sends the request there.
Monitor Caching Through Performance Monitor
You can use several Windows NT Performance Monitor counters to assess your proxy server's caching function. The most important Performance Monitor object is Web Proxy Server Cache, which contains counters that help you assess how well your proxy server's caching function is performing. Table 1 provides a list of counters that you can use to evaluate the caching function.
Caching in Proxy Server Logs
From time to time, you might also want to know where the proxy server got a page it served (e.g., from the cache or from the source Web server). A few fields in the Proxy Server logs can tell you where the object came from. (For a full listing of fields in the Web Proxy service log files, see the Microsoft article "Web Proxy Server 2.0 Log File Format" at http://support.microsoft.com/support/kb/articles/q234/1/47.asp.)
Each log file line contains 23 fields, so a complete line is too long to reproduce here. Instead, assume that you have a log file open and can see the entries. For caching, you're concerned only with field 1 (ClientIP) and field 21 (Object Source). The Object Source field shows where the proxy server retrieved the object from. Table 2 shows the possible codes you might see in this field.
Field 1, ClientIP, tells you more than who requested something from your proxy server. When active caching takes place, there is no client IP address to record, so Proxy Server logs a single dash (-) in this field. Using this information, you can find which URLs Proxy Server is actively caching.
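A short Python sketch shows how you might pull these two fields out of a log file programmatically. It assumes each line's fields are separated by commas (a field that itself contains a comma would throw the naive split off), and the function names are hypothetical. It doesn't interpret the Object Source codes; compare the tallies against Table 2.

```python
from collections import Counter
from typing import Iterable, List, Tuple

CLIENT_IP_FIELD = 1       # field numbers are 1-based, as in the text
OBJECT_SOURCE_FIELD = 21


def cache_fields(line: str) -> Tuple[str, str]:
    """Return (ClientIP, Object Source) from one comma-separated log line."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < OBJECT_SOURCE_FIELD:
        raise ValueError(f"expected at least {OBJECT_SOURCE_FIELD} fields, got {len(fields)}")
    return fields[CLIENT_IP_FIELD - 1], fields[OBJECT_SOURCE_FIELD - 1]


def object_source_counts(lines: Iterable[str]) -> Counter:
    """Tally how many requests were served from each Object Source code."""
    return Counter(cache_fields(line)[1] for line in lines)


def active_caching_lines(lines: Iterable[str]) -> List[str]:
    """Lines whose ClientIP is a single dash, i.e., active-caching requests."""
    return [line for line in lines if cache_fields(line)[0] == "-"]
```

Feeding a day's log through object_source_counts gives a rough cache hit profile, and active_caching_lines isolates the entries to read for the actively cached URLs.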
In addition, if there are particular URLs that you want Proxy Server to cache or not to cache, you can specify them in the MMC. From the MMC shared services dialog box, click the Caching tab, click Advanced, and click Cache Filters. Click Add to add a URL that Proxy Server should always or never cache. This process gives you partial control over what Proxy Server does and doesn't cache.
Proxy Server's caching feature largely determines the performance of your proxy server, and an improperly sized cache will impair that performance. Next month, I'll talk about the common problems proxy server administrators face, including end-user trouble calls and what you can do to avoid them.