Grouping Web Site Hits into Sessions

Suppose you own a piece of Web-monitoring software that has produced a table like the one at callout A in Listing A. The table shows Web site hits by site URL and caller IP address. Now suppose you want to group this data into sessions. In this case, let's define a session as a set of hits on the same URL from the same IP address, where each hit falls within 30 minutes of another hit in the set. Under this definition, the data in Listing A contains two sessions, one starting at 7:30 a.m., the other at 2:01 p.m., because a gap of more than 30 minutes separates the 8:02 a.m. hit from the 2:01 p.m. hit. The time between 7:33 a.m. and 8:02 a.m. is only 29 minutes, so the 8:02 a.m. hit is part of the first session.
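The session definition above can be sketched in a few lines of Python. This is only an illustration of the rule, not the article's T-SQL solution. The function name, the URL, the IP address, and the minute-precision hit times are assumptions made for the example; only the 30-minute rule and the approximate times of day come from the text.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_GAP = timedelta(minutes=30)  # session rule from the text


def group_into_sessions(hits, gap=SESSION_GAP):
    """Group (site_url, caller_ip, hit_time) tuples into sessions.

    Returns (site_url, caller_ip, start_session, end_session, hit_count) tuples.
    """
    sessions = []
    ordered = sorted(hits)  # by URL, then IP, then hit time
    for (url, ip), group in groupby(ordered, key=lambda h: (h[0], h[1])):
        start = end = None
        count = 0
        for _, _, hit_time in group:
            if start is not None and hit_time - end <= gap:
                # Within 30 minutes of the previous hit: same session.
                end = hit_time
                count += 1
            else:
                # Gap too large (or first hit): close the old session, open a new one.
                if start is not None:
                    sessions.append((url, ip, start, end, count))
                start = end = hit_time
                count = 1
        sessions.append((url, ip, start, end, count))
    return sessions


# Hypothetical sample data: one URL, one IP, five hits on one day.
day = datetime(2001, 7, 4)
hits = [
    ("/index.html", "192.0.2.1", day.replace(hour=h, minute=m))
    for h, m in [(7, 30), (7, 33), (8, 2), (14, 1), (14, 25)]
]
sessions = group_into_sessions(hits)
for session in sessions:
    print(session)
```

With this sample data, the 8:02 a.m. hit joins the first session because it comes 29 minutes after the 7:33 a.m. hit, and the 2:01 p.m. hit opens a second session.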

At first glance, this problem doesn't resemble the magazine-subscription problem: it has only one column of dates, not two, and it appears to be about grouping points in time rather than overlapping intervals. However, the two problems are almost identical. The key is to construct, for each hit, the date and time when the session would time out if no more hits happened (here, 30 minutes after the hit). Each hit then defines an interval that runs from the hit time to its timeout, and hits whose intervals overlap belong to the same session. Listing B shows the web_hits table with this hypothetical timeout date added.
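To see why the timeout column turns the problem into one of overlapping intervals, consider the following Python sketch. It is not the article's Listing B or Listing C; it assumes the hits all belong to one URL and one IP address, and the function name and sample times are hypothetical.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)  # a session times out 30 minutes after its last hit


def sessions_from_timeouts(hit_times, timeout=TIMEOUT):
    """Return (start_session, end_session) pairs for one URL/IP pair."""
    # Step 1: give every hit a hypothetical timeout, forming an interval.
    intervals = [(t, t + timeout) for t in sorted(hit_times)]

    # Step 2: merge intervals that overlap, as in any interval-grouping problem.
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Step 3: the session ends at its last hit, i.e. the merged end minus the timeout.
    return [(start, end - timeout) for start, end in merged]


# Hypothetical hit times for a single URL and IP address.
day = datetime(2001, 7, 4)
hit_times = [day.replace(hour=h, minute=m)
             for h, m in [(7, 30), (7, 33), (8, 2), (14, 1), (14, 25)]]
timeout_sessions = sessions_from_timeouts(hit_times)
for start, end in timeout_sessions:
    print(start, end)
```

Subtracting the timeout in the last step recovers the time of the final hit, which matches the way the article's result set reports end_session as a hit time rather than a timeout time.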

You can relatively easily modify Listing 5 from the main article to use the new table and show the start and end time of each session, as Listing C shows. The results of running Listing C are

site_url          caller_ip      start_session           end_session
----------------- -------------- ----------------------- -----------------------
                                 2001-07-04 07:30:05.323 2001-07-04 08:02:14.330
                                 2001-07-04 14:01:09.220 2001-07-04 14:25:21.787

This result set lets you calculate session length, number of hits per session, average length of session, and so on. These measurements can be valuable in evaluating the effectiveness of your Web site.
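As a rough illustration of such measurements, the Python sketch below computes the length of each session and the average session length from the two start and end times in the result set above. The variable names are assumptions made for the example. Hits per session are omitted because that count requires the underlying hit rows, which the result set doesn't include.

```python
from datetime import datetime, timedelta

# (start_session, end_session) pairs taken from the result set shown above.
result_set = [
    (datetime(2001, 7, 4, 7, 30, 5, 323000), datetime(2001, 7, 4, 8, 2, 14, 330000)),
    (datetime(2001, 7, 4, 14, 1, 9, 220000), datetime(2001, 7, 4, 14, 25, 21, 787000)),
]

# Session length is simply end minus start.
lengths = [end - start for start, end in result_set]

# Average length: sum the timedeltas and divide by the number of sessions.
average_length = sum(lengths, timedelta()) / len(lengths)

for (start, _), length in zip(result_set, lengths):
    print(f"session starting {start}: {length}")
print(f"average session length: {average_length}")
```

For these two sessions, the lengths come to 32 minutes 9.007 seconds and 24 minutes 12.567 seconds, for an average of 28 minutes 10.787 seconds.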
