More Bitwise Wisdom

Use bitwise XOR and NOT and helper UDFs to encrypt, store, and manipulate data As a developer, you might want to perform tasks that involve bitwise operations. For example, you might want to manipulate bit strings to determine the state of a particular bit, set a bit's value, or retrieve information from bit strings. In "Get Wise About Bits," October 2002, I explained how to perform bitwise operations by using the bitwise AND (&) and bitwise OR (|) operators. Let's take a look at the other two bitwise operators—XOR (^) and NOT (~) and learn how to extend the bitwise manipulation capabilities in T-SQL by writing your own user-defined functions (UDFs).

Bitwise Exclusive Or (^)

The bitwise XOR operator is less common in T-SQL than other bitwise operators. A bitwise XOR operation between two bits returns 0 if both bits have the same value (e.g., 0^0, 1^1) and 1 if the bits have opposite values (e.g., 0^1, 1^0). The acronym XOR stands for eXclusive OR. The XOR operator is similar to the bitwise OR operator that I discussed in "Get Wise About Bits" in that for the OR operation's result to be 1, one of the arguments has to be 1. However XOR is different from the bitwise OR operator in that if both bits are the same, a bitwise XOR always returns 0, whereas a bitwise OR returns 1 if both arguments are 1. SQL Server determines the results of XOR operations on a bit-by-bit basis according to the truth table that Table 1 shows. A truth table contains all possible combinations of the input bits and the result of the bitwise operation for each combination.

The bitwise XOR operator has two common uses in computing environments: simple key encryption and storing parity information. Both applications of the XOR operator are interesting, but before you can implement them, you need to have a firm understanding of the basics of XOR operations. Bear with me while I walk you through the basics before we get to the more juicy parts.

Both uses of XOR (encryption and parity) rely on the fact that the result of a bitwise XOR operation is reversible; you can't reverse the result of a bitwise AND or a bitwise OR operation. For example, if A ^ B = C, you can always deduce that A ^ C = B and also B ^ C = A. In English, you perform a bitwise XOR operation between two arguments and set the result aside for safekeeping. If one of the arguments is missing or corrupted, you can always calculate it by XORing the result with the other argument. For example, 3 ^ 5 = 6, 3 ^ 6 = 5, and 6 ^ 5 = 3. As I mentioned earlier, bitwise operations operate on the binary representation of the integers, on a bit by bit basis. This principle is also true for multiple bitwise XOR operations; if A ^ B ^ C = D, you can always deduce that A ^ B ^ D = C, and so on. With bitwise AND and bitwise OR, you can't regenerate an argument if you have only the result and the other argument. For example, if A & B = C, you can neither deduce that A & C = B, nor that B & C = A. For example, if 3 & 5 = 1, you can't assume that 3 & 1 = 5 nor that 1 & 5 = 3. Consider the two integer arguments 3 and 5 and the result of the bitwise operations between them. Table 2 shows the arguments as integer values and shows the bit representation of the last byte of the integers (all the rest of the bits are 0). You can use Table 2 to help you quickly translate the input arguments to their binary representation and to easily perform the bitwise operations between the input arguments on a bit-by-bit basis.

Table 3 shows the result of the bitwise operations AND, OR, and XOR between the two input arguments values 3 and 5; this table shows the result as an integer and in its binary representation. For example, the first row in Table 3 shows the result of arg1 & arg2, which is 1. To calculate the result yourself, you can write down the binary representation of the two input arguments as they appear in Table 2, and perform a bitwise AND on a bit-by-bit basis:

00000011 -- arg1 - 3
&
00000101 -- arg2 - 5
-------------
00000001 -- result  - 1

Similarly, the second and third rows in Table 3 show the result of arg1 | arg2 and arg1 ^ arg2 respectively.

Suppose you've lost arg2 (5) and you want to reconstruct it from the existing arg1 (3) and the result that Table 3 shows. Only a bitwise XOR operation between the available argument and the result will render the missing argument. Take a look at Table 4, which shows the result of the bitwise operations AND, OR, and XOR between arg1 (3) and the result shown in Table 3. For example, the first row in Table 3 shows the result of 3 & 5, which is 1. The first row in Table 4 shows the result of 3 & 1, which is 1:

00000011 -- arg1 - 3
&
00000001 -- previous result - 1
-------------
00000001 -- result  - 1

As you can see, you can't regenerate the lost arg2 by performing a bitwise AND between arg1 and the result of arg1 & arg2. However, the third row in Table 4 shows that you can regenerate the lost arg2 by performing a bitwise XOR between arg1 and the result of arg1 ^ arg2:

00000011 -- arg1 - 3
^
00000110 -- previous result - 6
-------------
00000101 -- result  - 5

The same principle applies to trying to reconstruct arg1 (3) from the result and the available arg2 (5), as Table 5 shows. Once you understand the basic principles of how the bitwise XOR works, you can use it to encrypt data and store parity information.

Encrypting Data

Using XOR with an encryption key is one of the most basic ways to encrypt data. Suppose you have a piece of data that you want to encrypt—for this example, call the data A. You generate an encryption key called B. At this point, you and your client store the key in a safe place. You encrypt your data by applying the XOR operator to the data and the encryption key: A ^ B = C. You deliver the encrypted data C to your client, who in turn performs the same operation between the piece of data received (C) and the encryption key (B). The result is the original source data (A).

Using this encryption method is simple, but it's useless if two of the arguments are available to an unauthorized third party. If you have access to the source data (A) and the encrypted data (C), you can always calculate the encryption key (B) by applying the XOR operation to A and C. Suppose you don't expose the encryption key, but you do supply the function f()—where f(A) = C—which accepts an argument and returns its encrypted form by applying the XOR operator to the argument and the encryption key. Those who have access to the function can deduce the encryption key by supplying their own source data (A)—f(A) = C, A ^ C = B. So, if you have any plans to encrypt your data by using XOR with an encryption key, make sure that the key is stored in a safe place and that only authorized applications can use the encryption function.

Storing Parity Information

You can use the bitwise XOR operator to store parity information (or "safety check" information) with your data and later use the parity information to see whether your data has been corrupted. You can also use parity information to regenerate corrupted data. To understand how to use parity information as a safety check, suppose you have a few pieces of data—A = 9, B = 5, C = 4, and D = 7—and you want to deliver them from point 1 to point 2. You generate the parity information—call it E—by performing a bitwise XOR operation between all the pieces of data you want to deliver—A ^ B ^ C ^ D = E, or in numbers, 9 ^ 5 ^ 4 ^ 7 = 15. You can check this calculation by running the following in Query Analyzer:

SELECT 9 ^ 5 ^ 4 ^ 7
SELECT 9 | 5 | 4 | 7

You deliver the parity information along with the data to the target application. The target application, in turn, performs the same bitwise XOR operation on the data pieces it receives. If the result is equal to the parity information it received (E = 15), the target application is assured that the data isn't corrupted.

This method of storing parity information isn't completely reliable because data corruption can occur in more than one piece of data and still result in the same parity. For example, if both A and D are corrupted in such a way that the target application receives the values A = 8 and D = 6 and all the other values are transferred with no corruptions, the result of a bitwise XOR operation between all four pieces of data—8 ^ 5 ^ 4 ^ 6—is also 15. However, if the parity information calculated from the received data is different from the parity information delivered with the data, the target application can be sure that data corruption occurred and can request that the source application resubmit the data. For example, if the target application received A as 8 and no other pieces of data were corrupted, the parity calculation at the target—8 ^ 5 ^ 4 ^ 7—would yield 14.

To understand how you can use parity information to reconstruct corrupt data, consider the previous example using A, B, C, and D as the data and E as the parity information. If one of the pieces of data was corrupted—let's say B—and you know which piece of data is corrupt, you can always reconstruct the correct missing data from the available pieces of correct data and the parity bits—A ^ C ^ D ^ E = B or 9 ^ 4 ^ 7 ^ 15 = 5.

Bitwise NOT (~)

Just as cards and coins have two faces, bits have two possible values (except the unknown value, which is represented by NULL in SQL Server). The bitwise NOT operator simply reverses bits, turning 1 to 0 and 0 to 1. Table 6 shows the truth table for bitwise NOT.

For example, if you want to update a bit column called bit_col in a table called T1 by turning all 0s to 1s and all 1s to 0s, you can use the following code:

UPDATE T1 SET bit_col = ~bit_col

You might want to run such a query if you decide to change the meaning of the 0s and 1s. Say you have a bit column called gender, in which 0 represents female and 1 represents male. You merge data with another table where 0 represents male and 1 represents female. After you run a query similar to the one above, both tables are consistent.

You can also use the bitwise NOT operator to simplify the generation of bitmasks. Suppose you want to turn off the third bit of an integer column called status in a table called Customers for customer A. You need to apply the AND operator to the status column and a bitmask in which all bits are turned on except the third bit. A simple way to generate such a bitmask is to perform a bitwise NOT operation, as the following statement shows:

UPDATE Customers SET status = status
& (~4) WHERE customerid = 'A'

The NOT operator is applied to a value in which all bits are turned off except the third: ~4. (Remember that the value of the third bit is 4.)

Helper UDFs

You might need to perform any of several bit-manipulation­related activities for which T-SQL provides no built-in support or which are simply tedious. Let's look at several cases in which you can use UDFs to make such tasks possible in some cases and easier in others.

Performing bitwise operations with two binary arguments. As I mentioned in "Get Wise About Bits", the bitwise operators AND, OR, and XOR accept two integer values, or at most one binary value, and the operator NOT accepts only one integer value. Many programmers find this limit restrictive when they want to perform bitwise computations between two binary values—especially if the binary values are larger than 8 bytes, which is the largest integer data type (bigint) supplied in T-SQL. In SQL Server 2000, you can create bitwise functions that accept two binary values as large as the binary or varbinary data types allow—8000 bytes—by performing the following steps:

1. Loop through all the bytes in both variables, extracting one byte at a time from each of the variables.
2. Cast one of the extracted bytes to a tinyint data type.
3. Perform the original bitwise operation between the two bytes and concatenate the result to a variable that you'll use as the function's return value.

The function that the code in Listing 1 creates implements this logic for a bitwise AND operation between two binary values. Notice that the function bitwise_and() uses the function linear_min(), which returns the minimum of its two input values. This value is the number of iterations of your loop. Similarly, you can create functions that implement bitwise OR and bitwise XOR between two binary values, as Listing 2 shows.

Implementing a bitwise NOT is much simpler. You loop through all bytes, extracting one byte in each iteration of the loop, performing the original bitwise NOT on that byte, and concatenating the result to the returned variable, as Listing 3 shows. To test the bitwise functions you've just created, run the following code:

SELECT dbo.bitwise_and
(0x00000001000000010000000100000001,
0xffffffffffffffffffffffffffffffff)
SELECT dbo.bitwise_or (0x00000001000000010000000100000001,
0xffffffffffffffffffffffffffffffff)
SELECT dbo.bitwise_xor
(0x00000001000000010000000100000001,
0xffffffffffffffffffffffffffffffff)
SELECT dbo.bitwise_not
(0x00000001000000010000000100000001)

Conversion UDFs. If you've had to handle bit manipulation with bitmasks, you've probably found the task of generating bitmasks confusing and annoying. In "Get Wise About Bits," I showed some examples of how to use bitmasks to turn on or off some flags in bit strings. I showed an example in which you use an UPDATE statement to turn on the third and seventeenth bits in a column called flags in a table called T1:

UPDATE T1
SET flags = flags | 0x00010004
WHERE key_col =

The bitmask 0x00010004 is expressed in hexadecimal notation (65540 decimal), and it has all bits turned off except the third (bit value 4) and seventeenth (bit value 65536) bits. To help you generate such a bitmask, you can create the function dbo.fn_2to16() that Listing 4 shows. This function accepts as an argument (@base2) a bit string expressed as a varchar value made up of 1s and 0s and returns a varchar in base16 (hex) notation. The function first adds leading 0s to the beginning of the input string to make its number of digits a whole multiple of four because every group of four base2 digits translates to one hex digit. The function then enters a loop, which runs as many times as the expected number of hex digits (base2 digits / 4). In each iteration of the loop, the function extracts a different group of four base2 digits, starting with the rightmost group and advancing to the left. The function then translates the four base2 digits to a hex digit by using a CASE expression and concatenates the hex digit to the result string, which is held in the variable @base16. Eventually, the function returns the result string that holds the hex translation of the input base2 string. To test the function, run the following code:

SELECT dbo.fn_2to16('10000000000000100')

You should get 10004 as the result. Similarly, the function dbo.fn_16to2(), which Listing 5 shows, converts a base16 string to a base2 string. To test the function, run the following code:

SELECT dbo.fn_16to2(10004)

You should get 00010000000000000100 as the result. Now that you have the conversion functions, generating bitmasks is easier. The only problem is that the bitmask is returned as a varchar value, so you can't use it directly in your query. With our UPDATE statement example, you'd need to first manually invoke dbo.fn_2to16() to get the hex string, then copy the string to your query.

You can make your life even easier by creating a function that accepts a character-based base16 string and converts it to a binary data type, then invoking this function directly in your UPDATE statement. Web Listing 1 shows the function dbo.fn_chartobin(), which performs the conversion. The function forms a loop that runs as many times as there are bytes in the result value—in other words, the number of digits in the source value: 2 (every two hex digits make a byte). You translate a character holding the hex digit to the binary value by using a CASE expression similar to the way you used it in the previous functions to perform the conversion. However, translating a pair of digits expressed as characters to a binary byte is tricky. To understand why, let's walk through an example. Suppose you had to convert the character string F3 to the binary value 0xF3. You'd probably try to separate the characters—you'd translate them to the binary values 0xF and 0x3 and concatenate them to get 0xF + 0x3. The problem is that T-SQL works only with full-byte binary values, so when you write 0xF + 0x3, T-SQL translates it to 0x0F + 0x03, and the result is 0x0F03. To demonstrate what I've just explained, try this:

SELECT 0xF + 0x3

To make the function in Web Listing 1 work, I translated all odd digits to 0x0 and all even digits to 0x0, then performed a bitwise OR (|) between them. For example, F3 would translate to 0xF0 | 0x03 = 0xF3. Also notice that one of the arguments in the function converts to a tinyint data type because you can't perform a bitwise operation between two binary values. Because the result of the bitwise operation is a tinyint, the function converts the result to a binary(1) value. To test the function, issue the following code:

SELECT dbo.fn_chartobin('10004')

You should get 0x010004 as the result.

Now, to turn on the third and seventeenth bits in the column flags in table T1, issue the following statement:

UPDATE T1
SET flags = flags |
dbo.fn_chartobin(dbo.fn_2to16
('10000000000000100'))
WHERE key_col =

Now that you're familiar with T-SQL's bitwise manipulation capabilities, you can use what you've learned to handle bit strings. And when you need certain bit-manipulation capabilities that aren't available in T-SQL, you can always write your own UDFs. 