Weighted Selection in T-SQL

It’s 4 p.m. on Friday when your manager rushes in and assigns you an “urgent” project that must launch Saturday morning. Senior management needs to test a series of different signup pages for the company Web site to quickly determine which marketing message pulls in the most subscribers. On Monday morning, only the page that pulls the most subscribers will survive. You feel a sudden empathy with the beleaguered comic-strip character Dilbert.

A few obstacles stand in the way of a quick solution. At any one time, there might be 1000 or more concurrent users on your Web site. The site consists of more than 500 Web pages; you can’t deploy the solution quickly because the site must be compiled, QA tested, and launched to 20 Web servers with almost no downtime for users.

Management wants to use the existing signup page as a landing page and rotate the different redirect versions based on a percentage or weighting system. Management also wants to be able to change the weights at a moment's notice or completely remove a page version immediately. These requirements could apply in several situations:

single page redirecting to one of multiple versions
single page serving up different content in a weighted rotation
ads in rotation on one page or multiple pages

The previous hypothetical example is similar to a challenge I actually faced on the job. The development environment I worked in wasn't conducive to writing the solution for a weighted, multipage solution in compiled code. The Web site was too large and had too many concurrent users, and the entire site had to be recompiled each time with a typical QA time of more than eight hours—but management required no downtime during a new-code launch. I needed a solution that could respond to quick changes in a zero-downtime environment without recompiling the Web site. Here's how I used T-SQL to solve the problem.

T-SQL to the Rescue

I based my method on the solution to another problem I'd encountered months earlier. For a different project, I wanted to update a sequential number field in a table by using a single UPDATE statement.

Suppose I have a table with a field that defines the order in which each row should display regardless of the record's physical position in the table. In this example, I’m posting an online quiz with x number of questions, which Table 1 shows. The table needs a display-order column since I might have to display questions in a different order or insert new questions between existing ones.

If this table has more than a couple dozen questions, initially updating the display-order column by hand becomes quite tedious. I wanted a T-SQL–based solution that let me update a sequentially numbered column by using a single UPDATE statement run only once. After that, as I inserted new questions, removed old ones, or rearranged the order of some, I could do so manually without much trouble. Although I work with SQL Server every day, I’m primarily a Web developer and couldn't figure out how to create a single UPDATE to accomplish my task.

Google Me This

I did a Google search and found the article “Creating a Sequential Record Number field” on the SQL Team.com Web site (http://www.sqlteam.com/item.asp?itemid=765). In a nutshell, the article pointed out that the Microsoft SQL Server 2000 Books Online (BOL) documentation for the UPDATE statement states:

“SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.”

According to the information in that article, the following UPDATE statement updates a column with a sequential set of numbers for all rows in increments of 10:

DECLARE @myVar int
SET @myVar = 0

UPDATE
  myTable
SET
  @myvar = myField = @myVar + 10

The myField column for the first row starts at 10, and myField in each subsequent row will be 10 greater than the previous myField row value. I used increments of 10 to allow room between questions for new questions, but you could use any number. This technique doesn't guarantee which row will end up getting the 10 value, which will get the 20 value, and so on. It simply increments each subsequent row by a value of 10.

Let's take a closer look at how the UPDATE works.

@myvar = (myField = (@myVar + 10))

(This SET statement is enclosed in parentheses to show the order in which the statement is executed.) myField is first set to the value of @myVar + 10. (In this case, @myVar + 10 = 10 because @myVar was initialized to 0.) At this point, @myVar is still set to 0 since it hasn't yet been assigned a value in the UPDATE. @myVar is then assigned the value of myField (10). This solution is handy for updating all fields of a table with sequential values, although its practical uses are limited.

More Complex Requirements

Months later, I worked on a project in which I needed to randomly redirect visitors to different versions of a Web page—but based on a weighted value when a visitor hit a single, specific landing page. For example, there might be four different variations of the redirect page with two needing to be hit 20 percent of the time each, one needing to be hit 10 percent of the time, and the remaining 50 percent hit all the time. The landing page that each visitor hit had to remain the same because of purchased ads that were hard-coded to a specific URL. The hard-coded URL became the landing page, which then needed to redirect to one of several pages that had different content. Part of the project's specifications required the solution to be data-driven and change the compiled code as little as possible.

I recalled the UPDATE statement I'd used previously and realized I could use it as the basis for the solution to my weighted-pages problem. I created a table containing columns for the landing-page name, the page and location to redirect to, and finally a weight value. From a good programming standpoint, the data-access layer should return the contents of this table to the business layer, and the business layer would then determine which page to redirect to based on the weight of each page for a given landing page.

However, in a lot of commercial Web applications, if you want to make quick changes and test-market different versions of a page, it’s sometimes better to put some of your business logic within a stored procedure for ease and speed of deployment. In my case, it meant that I could make minor changes to the affected “base” code page and run one compile deploy. Adding or removing a page or changing the weight then becomes a simple and quick database change that immediately takes effect on all your Web servers.

I wanted my solution to meet the following requirements so it would accommodate future enhancements:

Individual weights could be any positive integer.
The sum of the weights must be any positive integer.
The order of the weights doesn't matter.
You could exclude a row from the selection process.
The solution could handle one or more landing pages, each with as many redirect pages as needed.
SQL Server would return the result row via a stored procedure or user-defined function (UDF).

I required the second item simply because I could and would not depend on someone to ensure that the sum of all weights always equaled 100 for a simple percentage.

Table 2 shows a basic table structure containing sample rows for a set of records tied to a single landing page. You can use any number of landing pages as long as they’re unique. As you can see in Table 2, the weights can be any positive integer, and the sum doesn't have to equal 100.

The weighting system is automatically normalized under this solution. That is, through a range system, each weight can be equated with a percentage. That’s why the individual weights can be any value. Each row’s weight creates a percentage that's based on the sum of all weights for a given page_name. If a weight is 0, the T-SQL code ignores it. This is a simple and intuitive method to exclude a row from selection for a given page_name.

To sum up, the redirect page is selected in this way:

Select all rows, for a given page_name, where the weight is greater than 0, into a table variable containing a lower and upper range column.
Generate a random number between 1 and the sum of all weights to determine where it falls within the weighting for the selected pages.
Assign a range for each page for every selected row that’s as wide as that page's weight. For example, Landing_page1.aspx has a weight of 63. This means that this page will be selected 63 out of 259 times. So a range of 1 to 63 might be assigned or 64 to 125, for example. Any range that has 63 segments, in this example, will work.
Return the redirect_page value for the row in which the random number falls within the page’s calculated weight range.

Implementing the Solution

Using the method described in the SQL Team.com article, I was able to use one UPDATE statement to generate the range for each row’s weight on the fly, regardless of order. This method lets you store only a weight in the table instead of having to calculate the weight range by hand.

The heart of this method uses the UPDATE statement in Listing 1, which assumes that a table variable @versions has been created and populated with the primary key from the weighted table and the weight for each row selected based on the passed page_name. (You can download the complete code containing the UPDATE; to do so, click the Download the Code Here button at the beginning of the article. The complete code creates a sample weighted table and a stored procedure which, when passed a page name, selects the appropriate redirect page based on the weighted values stored in the weighted table. The table variable also contains the Lower_range and Upper_range columns, initially null.

@start is initialized to 1, the lowest possible value for a range. When the UPDATE is executed, the Lower_Range field is set to @lowerRangeCounter + @start. This value is then assigned to the @lowerRangeCounter variable. @start is now assigned the value from the weight column for the current row. Next, the Upper_Range field is set to the sum of the @upperRangeCounter, initialized to 0, plus the page weight for the current row. Because variables are used in the UPDATE statement, the values set for each previously updated row carry over to the next row when it’s updated. This allows the Lower_Range and Upper_Range fields to be updated correctly as each row is updated. Table 3 shows the results of the UPDATE on the @versions table. Finally, a random number between 1 and 259 is generated, and the row is returned wherever it falls between a row’s Lower_Range and Upper_Range.

Why Not Weight?

You can use this method whenever you need to be able to choose one option from many different options based on a weighting system. The weighting system isn't tied to a maximum value, nor need you concern yourself with the order of the rows. The solution is as simple as adding a record and assigning it a weight.

Comments

Plain text