Simplifying the Sequence Generator

Downloads
23306.zip

Simplifying the Sequence Generator

I enjoyed Alexander Netrebchenko's article "Generating Sequences in T-SQL" (November 2001, InstantDoc ID 22442), which explained a technique for populating tables and creating queries that generate number, date, and time sequences without loops. The article really made me think. As I reviewed the article's fn_sequence() user-defined function (UDF), I found that the function's code isn't easily maintainable. The original definition of fn_sequence() required the hard-coding of another join in the function to increase the ceiling of the sequence being generated. The code also seemed bulky. To improve maintainability, I revised the code as Listing 1 shows. I removed the exponent logic from fn_sequence() and placed the bulk of the work into fn_p(), which calls itself recursively and generates one-column rowsets. Simplifying and minimizing the code makes management and maintenance easier, especially if someone other than the code's creator is working on it.

When I was making these changes, I first addressed the fn_p() function, which cross-joins with itself. Fn_p() now calls itself recursively (i.e., calls itself within the function), each time reducing the exponent being passed by one. This solution is more flexible and maintainable than hard-coding additional joins because recursion lets you reuse code. Each time fn_p() performs the cross join, it subtracts 1 from @exp until the value of @exp is less than 0, at which point fn_p() returns 0—and you have a safe exit point.

Second, in the fn_sequence() function, I needed to find out what power of 10 to use to generate the correct sequence. So I converted the original function's integer parameter to a varchar and subtracted 1 from the parameter's length. The conversion to a varchar data type lets the LEN() function generate the length of the character representation of the number. I chose this approach to avoid adding multiple IF...ELSE statements to determine what power to initially pass into the fn_p() function. I thought of this approach when I remembered an old trick: Any power of 10 is 1 followed by a number of zeros equal to the power (e.g., 10²=100). This approach results in the initial exponent of 10 that fn_sequence() passes to fn_p().

These two functions create sequence results as needed, which could place an undue burden on server resources if the sequence is large. If users work with the data, you might want to limit the size of the sequence. One way to do this is to test the size of the exponent (@exp) in fn_sequence() after it's generated and before it's passed to fn_p(). You can set up the function to return a descriptive error describing fn_sequence() limitations if the exponent exceeds 5 (i.e., generating a sequence of a million or more records).

—Ben Craigo
[email protected]

Thank you for your interest in my article. The code in Listing 2 shows an optimized version of the sequence-generator function, which I wrote after I submitted my November 2001 article. The key improvements in the sequence generator are in the fn_p() and fn_sequence() functions, as you suggest.

First, the fn_p() function in the optimized sequence generator uses recursive calls. I used an approach similar to the one you proposed: The fn_p() cross-joins with itself, decreasing the @exp parameter's value each time. This more elegant implementation significantly reduces the sequence generator's code size and improves readability. I added a check (IF@exp > 9 RETURN) to avoid having sequence values overflow the int data-type range (i.e., the fn_p() function returns an empty rowset if @exp > 9). So the sequence range is now 0 to 10⁹-1. The practical limit, however, is about 10⁶ because bigger @exp values result in much longer query execution times on my system.

Second, I optimized the fn_sequence() function. The optimization improves the function because it lets fn_sequence() calculate the actual size of the sequence before calling fn_p(). For example, if you want to generate numbers from 100,000 to 102,000, you don't need to generate 10⁶ numbers and use BETWEEN AND to filter them. Instead, you generate numbers from 0 to 2000, then add them to the sequence start value. This change significantly decreases query execution time. Another benefit of this approach is that you can pass a negative value as the @start parameter, which lets the function generate sequences that start with a negative value and end with a positive value (e.g., -1000 to 1000) in one call.

One comment about your code: The IF statement that ends the recursion in the fn_p() function, which the code at callout A in Listing 1 shows, adds one extra CROSS JOIN operation. So if the @exp parameter is 2, fn_p() will be called four times instead of three times. The fourth call will cross-join a rowset (0, 1, 2, ... 9) with 0, which is unnecessary. I appreciate your comments.

—Alexander Netrebchenko

Comments

Plain text