Performing Base Reconversions

Downloads
46365.zip

People are accustomed to using a decimal base to work with numbers. However, in computerized systems, it sometimes makes more sense to use other bases, which can let you employ fewer digits to represent numbers. In my June column, "Performing Base Conversions," InstantDoc ID 46016, I discussed base conversions and demonstrated some techniques to convert a value expressed in any given base to a decimal value. I explained how you can perform base-to-decimal conversions efficiently by using set-based techniques. The solutions I described require a lot of logical manipulation. Likewise, by using mainly logical deduction, you can figure out how to convert values back from a decimal base to another base. By using the techniques I showed last month and the ones I'll describe this month, you can convert numbers from any source base to any target base. Don't forget to check the "Logical Puzzle" sidebar, page XX, for the solution to last month's Burning Ropes puzzle and to try your luck with a new puzzle—calculating a maximum value by using a mathematical expression.

Converting to Decimal

If applications need to compress numbers into fewer digits than decimal values can store, they can feed nondecimal-base values to SQL Server for storage. For example, serial numbers in base 36 require significantly fewer digits to represent values than in decimal base. The decimal value 3792919053113386084 has 19 digits, but the same value expressed in base 36 requires only 12 digits: STELLAARTOIS. Such nondecimal numbers might represent serial numbers printed or burned onto products in a limited amount of space.

SQL Server lacks built-in support for representing values in nondecimal bases, so you usually store nondecimal-base values in your database as character strings. If you need to perform arithmetic manipulation between nondecimal-base values, you can convert them to decimal values first, perform the arithmetic manipulation, then convert the result back to the original base.

In June, I explained how to convert a value in a given base to a decimal value. Now, let's look at how to do the opposite. Given a decimal value v and a target base b, generate a character string representing v in base b. To obtain test data, run the script in Listing 1, which creates the T1 table and populates it with three decimal values: 3792919053113386084, 31, and 11. You need to develop T-SQL code that returns all decimal values from T1 in a requested target base (@base).

To validate your solutions, run them for bases 36, 16, and 2. Table 1 shows the desired results for base 36, Table 2 for base 16, and Table 3 for base 2. I'll present three solutions, but since this problem is an exercise in logic, try devising your own solutions before looking at mine.

Iterative Solution

I'll start with the iterative solution because it's the simplest and most straightforward of the three. I created a function called fn_dectobase(), which accepts two inputs: @val, the input decimal value, and @base, the target base. The function returns a character string that represents the input value in the target base.

The function implements the following (pseudo code) algorithm:

declare and init @r with empty string;
while @val > 0
begin
  @r = (new_base_digit 
   representing @val mod 
   @base) + @r;
       -- + used to concatenate
  @val = @val div @base;
end
return @r;

Essentially, the function iteratively extracts the rightmost digit from @val by performing @val mod @base, converts the decimal value to the new base digit, and concatenates it with the result string. The function then removes the digit from @val by performing @val div @base.

For example, suppose you want to convert the decimal value 31 to base 16. The aforementioned algorithm applies the following series of steps (pseudo code):

@val = 31; @base = 16; @r = ';

@r = base16digit
(31 mod 16) + @r;
  -- base16digit(15) + ' → 'F'    
    + ' → 'F'
@val = 31 div 16; -- 1

@r = base16digit(1 mod 16) + @r;
  -- base16digit(1) + 'F' → '1' 
    + 'F' → '1F'
@val = 1 div 16; -- 0

The result is 1F. The base16 digit(<decimal_value>) in the pseudo code stands for the calculation of a single digit in the new base (16 in this example). To find the new base digit that represents a certain decimal value, use the following T-SQL expression:

SUBSTRING('0123456789ABCDEFGHIJKL
  MNOPQRSTUVWXYZ',
  <decimal_value> + 1, 1)

The first argument to the SUBSTRING() function is a string containing a sequence of 36 digits supporting any base up to 36. Given a decimal value n, where n < target_base, the corresponding target_base digit is the digit in the n + 1 position within the string. If you wonder why this is true, simply imagine that you had 36 fingers instead of 10 and used those to count. For example, the decimal value n is represented by the base 16 digit F, which appears in the sixteenth position (15 + 1) within the string.

Running the code in Listing 2 creates the fn_dectobase() function, which implements the algorithm I described earlier. To test the function, run the following code:

SELECT dbo.fn_dectobase(31, 16);

You get the string "1F" as a result. To get the results shown in Tables 1, 2, and 3, run the queries that Listing 3 shows.

Set-Based Solution

My first set-based solution has four main steps. First, it calculates the number of digits that the target base value will use. Next, for each target digit, it calculates the digit value (in decimal base).Third, it converts each target digit to the target base digit. And finally, it performs string concatenation to concatenate all digits to one target string.

Step 1 would be easy if T-SQL provided the necessary log functions. Mathematically, given the decimal value n, the number of digits required to express n in base b is FLOOR(LOGb(n)) + 1. T-SQL doesn't provide a log function with any base except natural log and log 10. To calculate the log in any given base (b) of any given value (n), you can use the following equation: LOGb(n) = LOGx(n)/LOG x(b), where x can be any value. Therefore, you can express LOGb(n) in T-SQL as LOG10(n)/LOG10(b). Although the mathematical expression is correct, the T-SQL expression might return incorrect values because the LOG10() function returns a float value, which is imprecise.

My tests produced incorrect results, especially when working with large values, so I had to find a different way to calculate the target number of digits. For this purpose, I created an auxiliary table called BasePowers and populated it with all supported powers of all bases up to base 36. Running the code in Listing 4 creates the table and populates it with data. You should expect overflow errors when running this code. The code in Listing 4 loops from base 2 through base 36, then in an inner loop raises each base by the powers 1, 2, 3, and so on until the script generates an overflow error. This error occurs once for each base, so you should expect 35 such errors. Such an overflow means that the result is outside the range of supported bigint values and therefore of no interest to us, since our solution supports conversions of values that fit in a bigint data type.

As an example of the information you can get from the BasePowers table, try running the following query:

SELECT pos, val FROM BasePowers
   WHERE base = 16;

Table 4 shows the results of this query, which returns all powers of 16 that are less than or equal to the largest supported value in a big integer. Given a decimal value @n and a target base @b, the following code returns a row for each target base digit from the BasePowers table:

SELECT pos, val FROM BasePowers
   WHERE base = @b AND val <= @n;

For example, for @n = 31 and @b = 16, this query returns two rows from BasePowers, one for each target base digit (pos = 1, val = 1) and (pos = 2, val = 16).

As a reminder, step 2 calculates the decimal value of each target digit in the new base. In other words, it breaks the whole decimal value into pieces corresponding to the individual target digits. For example, the solution code will convert the input decimal value 31 after applying all four steps to the base 16 value 1F. So, step 2 should first produce the decimal equivalent of each target digit—that is, first digit = 15 (decimal value of F hexadecimal) and second digit = 1 (decimal value of 1 hex). To calculate the decimal value representing a digit in position p, you use the following formula: dec_val = @n ÷ val % @b, where val (taken from the BasePowers table) is the power of @b for a digit in position pos. Note that I'm using integer division and modulo here. As an example, the decimal value representing the target base digit in position 1 is 15 (31 ÷ 1 % 16). The decimal value representing the target base digit in position 2 is 1 (31 ÷ 16 % 16). Here's the query that produces the decimal values representing all of the target base value's digits for an input decimal value @n and target base @b:

DECLARE @n AS bigint, @b AS int;
SET @n = 31; SET @b = 16;
SELECT pos, @n / val % @b FROM BasePowers
  WHERE base = @b AND val <= @n;

Step 3 is supposed to convert the decimal representation of each target digit to the correct target base digit. You use the same technique I demonstrated in the iterative solution using the SUBSTRING() function and a string of 36 digits. Listing 5 shows how you merge the conversion technique and the preceding query to return the desired results for step 3—namely, the digit F in the postion and the digit 1 in the second. Now apply the same calculations to the values in T1 to get the breakdown of digits—for example, in base 36, as Listing 6 shows.

Now that you have all target digits and their positions, you just apply the fourth and final step, which is to concatenate the digits into one result string. You can apply string concatenation by using a pivoting technique. For information about performing string concatenation using pivoting, see my T-SQL 2005 Web columns from June 2004, "Pivot (or Unpivot) Your Data" (InstantDoc ID 42901), and July 2004, "Dynamic Pivoting" (InstantDoc ID 43140). Because you might need 63 digits to express the largest supported bigint value in the smallest base (2), you'll need 63 MAX(CASE...) expressions in the query's SELECT list. Listing 7 shows the (abbreviated) final solution. In Listing 7, I used base 36 as an example of generating the output that Table 1 shows.

Set-Based Solution with Optimized String Concatenation

Typically, a set-based solution is faster than the iterative alternative. In this case, the set-based solution is slower and significantly longer and more complicated than the iterative solution, mainly because of the string-concatenation technique that contains 63 MAX(CASE...) expressions. To find a fast solution that uses at least partially set-based techniques, I tried to optimize the string-concatenation part of the solution.

My third solution starts with the same query I used in the previous solution, which generates a result set that contains all target digits and their positions in separate rows before applying the pivot technique. You already know how to calculate the target number of digits. Now generate a string that contains as many zeros as the target number of digits:

DECLARE @r AS varchar(63);
SET @r = REPLICATE('0',
(SELECT MAX(pos) FROM 
  BasePowers
     WHERE base = @b AND 
       val <= @n));

Next, use the STUFF() function as follows to "plant" the different digits in @r while scanning the rows that the previous solution's query returned:

SELECT @r = STUFF(@r, LEN(@r) - 
  pos + 1, 1, digit)
FROM (query_returning_digit
  _breakdown) AS D
WHERE digit <> '0';
SELECT @r;

The use of STUFF() here is a bit tricky to follow. Try to visualize the breakdown of the base 36 digits for the input decimal value 3792919053113386084, as Table 5 shows. The result value has 12 target digits, so the code initializes @r with 12 zeros.

Now imagine the main piece of code containing the STUFF() function scanning the 12 rows that hold the digit position (pos column), and value (digit column). Regardless of the order in which SQL Server accesses these 12 rows, the STUFF() function replaces the zero in the correct position from the right (LEN(@r) – pos + 1) with the actual digit, and by the end of the scan, your code plants all digits.

Listing 8 shows the full implementation of this solution, creating the fn_dectobase() function. You use the function the same way you did with the iterative implementation I described earlier.

Set-Based Versus Iterative Conversions

My first idea when looking for solutions that convert a decimal value to a desired base was the iterative implementation I demonstrated. But because set-based solutions prove faster than iterative ones in most cases, I then tried to create a set-based solution. However, the first set-based solution I tried was slower, longer, and more complex, mainly because of the pivoting string-concatenation technique. In an effort to optimize the string concatenation, I used the STUFF() function. But even after I optimized the string concatenation, the iterative solution ran twice as fast as the optimized set-based solution.

Base conversions involve data formatting more than data manipulation, and set-based operations weren't designed to format data efficiently. This task needs an iterative approach, but T-SQL isn't strong in iterative operations. If you're not happy with the fastest (iterative) option I showed, you might want to perform the base conversions in the client application by using a programming language (e.g., Visual Basic—VB, C++, C#) designed for such activities.

All the solutions require a lot of complex logical manipulation. Whether or not you use the T-SQL implementations for base conversions, you might find that the logical manipulation and techniques I demonstrated here will help you with other tasks.

Comments

Plain text