Figure It Out

Use mathematics to optimize T-SQL code

Editor's Note. Send your experts-only T-SQL tips to Itzik Ben-Gan at [email protected]. If we use your tip in the magazine, you'll receive $100 and an exclusive T-SQL Black Belt shirt.

In "Adding Performance," May 2001, I showed how a biological testing facility approached a certain business problem by using a mathematical equation that converts summation to multiplication. That example had specific requirements, which the lab personnel fulfilled by employing the T-SQL LOG10() function. You can approach other business problems through T-SQL by finding a mathematical solution. Eitan Farchi, Ph.D., of the IBM research laboratory in Haifa, Israel, provided a proof that the mathematical solution to this article's problem applies to all possible input values.

Transmitting data from one system to another can corrupt that data, so how can you find out whether a particular piece of data, say the number n, is now corrupt? At the source system, you can apply an algorithm to the number n that produces one digit, then you can append that digit to the original number. This type of calculated digit is called a check digit. At the target system, you apply the same algorithm to the number that system receives. If the resulting check digit is different from the check digit that the target system received with the number, you know that the number is now corrupt, and the target system can request a retransmission. For an extensive discussion about check digits, see Joe Celko's book Data & Databases: Concepts in Practice (Morgan Kaufmann, 1999).

The Cast9 algorithm, which is commonly used to calculate check digits, sums all the digits of the source number. If the result has more than one digit, the Cast9 algorithm sums the digits of the result number, continuing to sum until only one digit remains. This digit, which becomes the check digit, is then appended to the source number. For example, suppose you want to use the Cast9 algorithm to calculate a check digit for the number 369. You start by summing all the digits: 3 + 6 + 9 = 18. The result has more than one digit, so you sum those digits: 1 + 8 = 9. Now you have just one digit, the check digit, which you append to the original number, making 3699.

Suppose you want to append check digits to the product IDs in a Products table in your SQL Server database. The script in Listing 1 creates and populates a Products table. If you're working with SQL Server 2000, you can implement the Cast9 algorithm in a user-defined function (UDF). One way to implement such a function is to use two loops, as Listing 2 shows. The outer loop continues to run as long as the source number contains more than one digit. The inner loop sums all the digits of the number. Finally, the function appends the check digit to the original value you provided as an argument to the function and returns the entire number.
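
If you don't have Listing 2 at hand, the following is a minimal sketch of what a two-loop implementation could look like. It isn't the published listing: the function name dbo.fn_cast9_loops and the variable names are hypothetical, and the sketch assumes a positive int argument small enough that appending one digit doesn't overflow the int data type.

```sql
-- Hypothetical sketch of a two-loop Cast9 function (not the published Listing 2).
-- Assumes @n is a positive int and that @n * 10 + 9 still fits in an int.
CREATE FUNCTION dbo.fn_cast9_loops(@n AS int) RETURNS int
AS
BEGIN
  DECLARE @work AS int, @sum AS int
  SET @work = @n

  -- Outer loop: keep going while more than one digit remains
  WHILE @work > 9
  BEGIN
    SET @sum = 0

    -- Inner loop: peel off the last digit and add it to the running sum
    WHILE @work > 0
    BEGIN
      SET @sum = @sum + @work % 10
      SET @work = @work / 10   -- integer division drops the last digit
    END

    SET @work = @sum
  END

  -- Append the single remaining digit to the original number
  RETURN @n * 10 + @work
END
```

With this sketch, SELECT dbo.fn_cast9_loops(369) walks through the same steps as the example above (3 + 6 + 9 = 18, then 1 + 8 = 9) and returns 3699.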

To retrieve the products and the product IDs with a check digit attached, you can issue the following query:

SELECT
	productid,
	dbo.fn_cast9(productid) AS pid_cast9,
	productname
FROM Products

Table 1 shows the result.

Listing 2's implementation of the function looks efficient enough, but if, like me, you try to optimize every piece of code that comes your way, you're probably already wondering how you can implement the Cast9 algorithm more efficiently. As I considered how to optimize the implementation of this algorithm, the process of summing all the digits of a number sounded familiar. Eventually, I remembered something I learned in elementary school: How do you know whether a number divides by 9 with no remainder? You sum all the digits of the number; if the result divides by 9 with no remainder, the number itself divides by 9 with no remainder. At this point, I scribbled a few computations and discovered that when I started with positive numbers that divide by 9 with no remainder and recursively summed all the digits until one digit remained, I always got 9. When dividing by 9 gave a remainder of 1, the sum of the digits was also 1; when the remainder was 2, the sum of the digits was also 2; and so on.
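
To see the pattern for yourself, you can run a quick query such as the following sketch. The sample values are arbitrary ones I picked for illustration; the comments show the repeated digit sum worked out by hand next to the remainder that the query returns.

```sql
-- Compare the remainder of division by 9 with the repeated digit sum.
-- The sample numbers are arbitrary illustrations.
SELECT n, n % 9 AS remainder
FROM (SELECT 18 AS n        -- 1 + 8 = 9               remainder 0
      UNION ALL SELECT 19   -- 1 + 9 = 10, 1 + 0 = 1   remainder 1
      UNION ALL SELECT 20   -- 2 + 0 = 2               remainder 2
      UNION ALL SELECT 369  -- 3 + 6 + 9 = 18, 1 + 8 = 9   remainder 0
      UNION ALL SELECT 370  -- 3 + 7 + 0 = 10, 1 + 0 = 1   remainder 1
     ) AS T
```

The only mismatch is the case in which the remainder is 0: the repeated digit sum of a positive number is never 0, so multiples of 9 end up at 9 instead.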

Eventually, I figured out how to optimize the UDF; you can write the solution as the T-SQL expression (@n - 1) % 9 + 1, where @n is a variable that represents the source number. In T-SQL, the percent sign (%) is the modulo operator, which returns the remainder of integer division. This computation worked for all the numbers I tested, but I didn't know whether it would work for all numbers. I discussed the issue with Eitan Farchi, who provided a mathematical proof demonstrating that this computation works for any positive integer. I revised and simplified Farchi's mathematical proof; this revision is in the sidebar "Calculating Cast9." You can now revise and optimize the dbo.fn_cast9 function with the code that Listing 3 shows.
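
As a rough idea of how small the revised function can be, here is a minimal sketch built around the expression (@n - 1) % 9 + 1. It isn't the published Listing 3: the name dbo.fn_cast9_math is hypothetical, and the sketch assumes a positive int argument for which @n * 10 + 9 still fits in an int.

```sql
-- Hypothetical sketch of a loop-free Cast9 function (not the published Listing 3).
-- Assumes @n is a positive int and that @n * 10 + 9 still fits in an int.
CREATE FUNCTION dbo.fn_cast9_math(@n AS int) RETURNS int
AS
BEGIN
  -- (@n - 1) % 9 yields 0 through 8; adding 1 shifts the result to 1 through 9,
  -- so multiples of 9 get the check digit 9 rather than 0.
  RETURN @n * 10 + (@n - 1) % 9 + 1
END
```

For 369, the check digit is (369 - 1) % 9 + 1 = 8 + 1 = 9, so SELECT dbo.fn_cast9_math(369) returns 3699, the same value that the digit-summing approach produces.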

An added bonus of implementing the Cast9 algorithm with this simple mathematical computation is that you can embed the computation in the query, instead of implementing the computation as a function. This capability lets you use Cast9 in SQL Server 7.0 and earlier releases, which don't support UDFs. To retrieve a list of products and the product IDs with a check digit attached, you can now issue the following query:

SELECT
	productid,
	productid * 10 + (productid - 1) % 9 + 1
	AS pid_cast9,
	productname
FROM Products
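
The same expression can serve the target system when it checks a number it has received. The following is only a sketch under a few assumptions: the variable names are hypothetical, the received value is a positive int, and its last digit is the check digit that the source system appended.

```sql
-- Hypothetical sketch: validate a received number whose last digit is a Cast9 check digit.
DECLARE @received AS int, @num AS int, @chk AS int

SET @received = 3699         -- sample value: 369 followed by check digit 9
SET @num = @received / 10    -- integer division strips the check digit
SET @chk = @received % 10    -- the last digit is the received check digit

SELECT
	@received AS received,
	CASE
		WHEN (@num - 1) % 9 + 1 = @chk THEN 'ok'
		ELSE 'corrupt'
	END AS status
```

For the sample value, the query recomputes (369 - 1) % 9 + 1 = 9, which matches the received check digit, so the status is ok. If the digits don't match, the target system knows to request a retransmission.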

When you aim for short and efficient code in T-SQL, mathematics can be a great tool. Using a mathematical approach, you can efficiently solve even problems that don't seem to have any relationship to mathematics at first glance. Next time you face a T-SQL problem, consider using a mathematical solution.
