Counting the Costs

Downloads
27592.zip

T-SQL programmers frequently face tricky date- and time-related challenges. And solving these temporal problems often requires combining several different techniques that you've learned from experience. Let's look at an interesting temporal challenge that I call the call-costs problem. To solve the problem, you might find several techniques that I discussed in past columns useful, such as using auxiliary tables and a step-by-step modular approach that uses views.

The Call-Costs Problem

Internet service providers (ISPs) often charge different rates for user dial-up calls during different periods of the day. ISPs usually charge higher rates during periods of high traffic than during periods of low traffic. For example, suppose a fictitious ISP, LessWire, charges 2 cents per minute from 22:00 through 07:59 and 5 cents per minute from 08:00 through 21:59. LessWire stores call information in a table named Calls. Running the script in Listing 1 creates the Calls table and populates it with sample data.

Each row in Calls contains the caller's user name in the username column, the call start date and time in the calldate column, and the call length in minutes in the duration column. Note that the calldate column is defined as smalldatetime, meaning that it's accurate to the minute. That accuracy is especially convenient in the solution to this problem because you need to perform calculations involving whole minutes. Your task is to calculate the total duration and cost of calls for each user. For the given sample data, the output should look like Figure 1. Try to make the solution flexible in the sense that you could easily change rates and add or change the start and end times of rate periods.

The Solution

First, you need to decide where and how to store the information about rates and rate periods. Hard-coding them into your queries is a bad choice because this approach requires tedious, awkward code maintenance with every rate change. A better idea is to create a Rates table in which you can store all rate-related information and have your queries refer to that table. Run the script that Listing 2 shows to create a Rates table and populate it with the periods and rates I gave earlier.

The fromtime and totime columns store (as a smalldatetime data type) the minute of day the rate period starts and ends, and the rate column stores the period's rate. Note that I keep three rate periods in the Rates table even though only two exist. I did this because one of the rate periods (22:00 through 07:59) spans 2 days, and I wanted to express all rate periods in one day's boundaries to avoid unnecessary complications in the solution. Another point to keep in mind is that the smalldatetime data type stores both a date and a time, and when you enter only the time portion, as I did in the fromtime and totime columns, SQL Server stores the base date January 1, 1900. I didn't explicitly specify the date to emphasize the fact that the time periods aren't specific to a certain date.

After you have the rate periods organized in a table, you can start working on the queries. You have to split each call into periods according to the day's rate periods. A view containing each call's start and end times can help. I created a view called VStartEndCalls that has columns called firstminute and lastminute, representing the call's inclusive start and end times to the minute. Because the date portion of the rate periods' start and end times is an old base date and I knew I'd have to represent these time periods in current terms (date of call), I also calculated the difference in days between the base date and the call date. I named the result column diff. You can run the script that Listing 3 shows to create the VStartEndCalls view. Figure 2 shows the result of a SELECT * query against this view.

If you're not sure about the usefulness of the diff value, bear with me while I explain the next step, which is to generate all potential or possible call periods for each day a call spanned. In a later step, you can discard or revise the potential call periods to reflect the actual ones. For example, user2 started a call on February 12, 2003, at 23:00, and the call took 1440 minutes (24 hours), spanning 2 days. Each day has three possible rate periods, so this call has three potential periods on February 12 and three potential periods on February 13. I created a view called VPotentialCallPeriods that returns, for each call, all potential call periods for each day the call spanned. To generate for each call as many rows as the number of days the call spanned, I used an auxiliary table called Nums that contains a sequence of consecutive integers. I populated the Nums table with values in the range 1 to 1000, which is much more than the maximum number of anticipated days a call might span. The script in Listing 4 creates and populates the Nums auxiliary table, and the script in Listing 5 creates the VPotentialCallPeriods view.

The query in the VPotentialCallPeriods view definition performs a join between the VStartEndCalls view and the Nums auxiliary table. The purpose of the JOIN condition

n 

is to duplicate each call in VStartEndCalls as many times as the days the call spanned. For example, this join would generate two rows for the call by user2 that spanned 2 days--one row where n=1 and one where n=2. The query in the view cross-joins the result with the Rates table to match all possible rate periods with each day of each call. This means that user2's call appears in the result of the three-way join in six rows because it was matched with two rows from Nums (n = 1, 2), each of which was matched with three rows from Rates: 1 x 2 x 3 = 6.

Now, you just have to write expressions in the query's SELECT list that calculate the start and end times of the potential call periods. The following expressions calculate the start and end times of a potential call period (the potentialfrom and potentialto result columns); see if you can figure out how the calculations work:
fromtime + diff + n - 1 AS
   potentialfrom
totime + diff + n - 1 AS
   potentialto

Recall that diff is the difference in days between the call date and the base date (January 1, 1900) and that rate period start and end times (fromtime and totime) are expressed as time values with the same base date. By adding diff to fromtime and totime, you get those times in terms of the call's start date. By adding n and subtracting 1, you get those times in the correct day during the call. The whole process is probably not simple to grasp without looking at the results of a query against the VPotentialCallPeriods view. Run the following query, look at the results in Figure 3, page 16, and reread the explanation of this step:
SELECT username, potentialfrom,
  potentialto, rate
FROM VPotentialCallPeriods

Notice, for example, that the code split user2's call, which I discussed earlier, into six potential call periods--three in each day the call spanned (00:00 to 07:59, 08:00 to 21:59, and 22:00 to 23:59).

The next step is to calculate the actual call periods. The idea is to compare the start and end times of the potential call periods with the start and end times of the complete calls, then determine the start and end times of the actual call periods accordingly. Run the script that Listing 6 shows to create the VActualCallPeriods view, which uses CASE expressions in the SELECT list to perform these calculations.

To calculate the start time of the actual call period (actualfrom), the first CASE expression checks whether the call started during the potential call period, and if so, returns the call start time:
WHEN
  firstminute 
  BETWEEN
  potentialfrom 
  AND
  potentialto
  THEN
  firstminute

For example, user2's call start time (firstminute) on January 12, 2003, was 23:00. For the potential call period January 12, 2003, 22:00-23:59, 23:00 falls between 22:00 and 23:59, so 23:00 is returned as the start time of  the actual call period (actualfrom). If the actual start time doesn't fall within the potential period, the second WHEN clause checks whether the potential period's start time occurs during the actual call. In that case, the code returns the potential call period's start time as the actual call period's start time:
WHEN potentialfrom BETWEEN
  firstminute AND lastminute THEN
  potentialfrom


Still discussing the same call, consider the potential period January 13, 2003, 00:00-7:59. The start time of this potential call (00:00) occurs during the complete call, so the code uses it as the actual call period's start time. Note that when none of the WHEN clauses in a CASE expression return true and no ELSE clause exists, the CASE expression returns NULL. In this case, the expression returns NULL when the potential call period doesn't overlap the complete call at all, such as with the potential call period January 12, 2003, 00:00-7:59.

In a similar manner, the second CASE expression calculates the actual call period's end time:
  CASE
    WHEN lastminute BETWEEN
      potentialfrom AND 
      potentialto THEN lastminute
    WHEN potentialto BETWEEN
      firstminute AND lastminute
      THEN potentialto
  END AS actualto

I'll let you figure out the logic yourself this time. Run the following query against the VActualCallPeriods view, then examine the results, which Figure 4 shows:
SELECT username, actualfrom,
  actualto, rate
FROM VActualCallPeriods

Rows with NULLs represent potential call periods that didn't overlap with the actual call. Other rows contain the start and end times of the actual call periods. For example, user2's call was split into four different actual call periods. For simplicity's sake, I inserted one call for each user as sample data, but the code would work correctly with many calls per user. To return the cost per user and call, run the following simple query, which sums the cost of each call:
SELECT username, firstminute,
  lastminute,
  SUM(DATEDIFF(minute,
    actualfrom, actualto) + 1) AS
    duration,
  SUM((DATEDIFF(minute,
    actualfrom, actualto) + 1) *
    rate) AS cost
FROM VActualCallPeriods
GROUP BY username, firstminute,
lastminute

The code calculates the cost of each period as the duration in minutes multiplied by the period's rate. Note that the total duration is calculated here as the sum of all durations of actual call periods, even though it's a known value. You can verify the result by cross-checking the total durations in the result (which Figure 5 shows) against the durations inserted into the Calls table. You can also manually calculate one session's total cost by examining the output that Figure 4 shows and comparing it to the result that Figure 5 shows.

To calculate total duration and cost per user, simply remove all irrelevant columns from the GROUP BY clause and from the SELECT list and leave only the username column:

SELECT username,
  SUM(DATEDIFF(minute,
    actualfrom, actualto) + 1) AS
    duration,
  SUM((DATEDIFF(minute,
    actualfrom, actualto) + 1) *
    rate) AS cost
FROM VActualCallPeriods
GROUP BY username

Figure 1 shows the result, which is the final solution you're looking for.

A Matter of Practice 


One secret to solving tough challenges such as this one is to master the basics by constant practice. At some point, you realize that solving most complex problems is merely a matter of choosing the right basic techniques and combining them wisely. The call-costs problem is a challenge, but the "ingredients" of the solution are simple: views, a simple auxiliary table, basic date-manipulation functions, basic aggregate functions, and a CASE expression. Constantly practice the basics, and you'll be able to solve virtually any problem.

Comments

Plain text