In many of my articles, I mention that SQL involves a lot of logic and that you can improve your SQL querying capabilities by solving pure logic puzzles. You've probably already heard of the logic puzzle called Sudoku, which is popular these days.
Sudoku is a Japanese term that refers to a number that's singular or unique. However, the concept behind Sudoku puzzles is by no means unique. In 1783, Swiss mathematician Leonhard Euler invented Latin squares as a variation of magic squares. In Latin squares, you populate a square grid with numbers or symbols that can't repeat in the same row or column. In the 1970s, a form of the puzzle called number place surfaced in America. In the late 1980s, the Sudoku puzzle surfaced in Japan.
The Sudoku puzzle has more restrictions than Latin squares. As Figure 1 shows, in a classic Sudoku puzzle, you get a 9×9 squared grid that's subdivided into 3×3 squared boxes. Some of the cells are already populated with numbers that range from 1 through 9. The goal is to populate all cells with numbers in the range 1 through 9 so that no number will repeat in the same row, column, or 3×3 box. Note that in a correctly formed Sudoku puzzle, there can be only one correct solution. Figure 2 shows the solution to the puzzle in Figure 1.
Sudoku puzzles have varying levels of difficulty, but all require you to apply logical deduction to solve them. The variations in complexity have to do with varying levels of logical rules and deductions that you need to apply to solve the puzzle. I thought that it would be interesting to handle Sudoku puzzles with T-SQL, using some of the new features in SQL Server 2005. That way, you'll have a chance to practice both your logic skills and learn how to use the T-SQL enhancements. The main T-SQL enhancements that I'll use are the PIVOT and UNPIVOT operators, and Common Table Expressions (CTEs). You can find details about the PIVOT and UNPIVOT operators in the Web-exclusive T-SQL 2005 articles "UNPIVOT" (InstantDoc ID 43589), "Dynamic Pivoting" (InstantDoc ID 43140), and "Pivot (or Unpivot) Your Data" (InstantDoc ID 42901). You can learn about CTEs in the T-SQL Black Belt articles "Cycling with CTEs (June 2004, InstantDoc ID 42452) and "Get in the Loop with CTEs" (May 2004, InstantDoc ID 42072).
Preparing the Sudoku Puzzle's Input
I like to use a normalized table to store input for a Sudoku puzzle. The code in Listing 1 creates a normalized table named SudokuInput and populates it with the sample data in Figure 1. This code loads the table with only those rows that represent populated cells.
Some programmers prefer to store the input in a denormalized form, where each row in the table represents a row in the input puzzle. The code in Listing 2 creates a denormalized table named SudokuInputD and populates it with the sample data in Figure 1.
Note that with the new PIVOT and UNPIVOT operators, you can easily generate one form from the other. The code at callout A in Listing 3 shows how to generate a denormalized form of the input from the normalized SudokuInput table. The code at callout B shows how to generate a normalized form of the input from the denormalized SudokuInputD table. In my code, I like to use the normalized form because I find it much more convenient for set manipulation.
Solving Sudoku Puzzles with T-SQL
There are two main approaches that you can take in programming a solution to Sudoku puzzles. One is a brute-force approach in which you try different permutations and check for validity or invalidity. Such a solution can be very slow and doesn't involve much sophisticated logic, so I won't discuss that approach here. The other approach is to identify logical rules that you use when solving a Sudoku puzzle and program those rules, which is my goal in this article. Some Sudoku puzzles can be hard to solve and require you to apply several interesting logical rules that have set-based applications.
I recommend that you create a table-valued function that returns the Sudoku solution as a normalized table. You need to accommodate cases in which the rules that you apply aren't sufficient to come up with the final solution, unless you're sure that your solution applies to any given Sudoku puzzle. If you suspect that your rule set isn't guaranteed to generate the final solution for any given Sudoku puzzle, you can take one of two approaches in terms of the content of the nonfinal cells in the returned table:
- You can return only final cells (one unique value was identified).
- You can return final cells and return nonfinal cells with all the possible values that remain in those cells after elimination. In your output, you can return the nonfinal cells as arrays or formatted as comma separated lists of values.
I prefer the latter approach in which I use comma separated lists of values for the output.
In my solution, I use the Nums auxiliary table, which I've used many times in past T-SQL Black Belt articles. The code in Listing 3 creates and populates the Nums table, which simply contains a sequence of integers from 1 and on. The code in Listing 4 creates a table-valued function named fn_sudoku, which at this point, doesn't yet apply any rules.
As Listing 4 shows, the fn_sudoku function declares the @rc variable and the @Sudoku Unique and @SudokuNonUnique table variables. The @rc variable keeps track of the number of rows affected by applying the different rules. The @SudokuUnique variable holds cells for which a final unique value is identified. The @SudokuNoUnique variable holds nonfinal cells (i.e., cells with multiple values).
The code at callout A in Listing 4 loads all values from the input Sudoku puzzle into @SudokuUnique. Because the rules might refer to the box in which a cell resides, I included a box attribute besides the row and col attributes. The box number is a function of the given row and column. The formula to calculate the box number is
(row-1)/3*3 + (col-1)/3 + 1
For example, a cell in row 5, col 4 resides in box 5. When you try to figure out this example in your head, keep in mind that T-SQL applies integer division when dividing two integers, which means that you get an integer quotient (fraction truncated).
At callout B in Listing 4, the fn_sudoku function loads into @SudokuNonUnique nine rows with the values 1 through 9 for each of the unoccupied cells in the input Sudoku puzzle. The code achieves this by cross joining three instances of Nums, where the n column in each is in the range 1 through 9. One instance represents rows (R), one instance represents columns (C), and one instance represents values (V). The code filters only cells (row/col combination) that don't exist in SudokuInput.
Most of the fn_sudoku function's logic is introduced in the loop at callout C in Listing 4. You basically write two types of rules. First, you write rules that delete from @SudokuNon-Unique those values that can't be correct. Then, you write rules that insert the final correct values into @SudokuUnique. (I give examples of delete and insert rules in the next section.)
At the beginning of the loop, the @rc variable is initialized with zero. Then, each delete and insert rule increments the @rc variable with the number of rows affected by the rule. The loop continues as long as rules affect rows in the previous round. Finally, the fn_sudoku function loads into the @Sudoku table variable (which is the function's output parameter) all final values from @SudokuUnique and nonfinal lists from @SudokuNonUnique.
To test the function after adding each rule, it's convenient to examine the output in a pivoted form (one row for each Sudoku row), where each cell contains a concatenated list of the remaining values. The code in Listing 5 pivots and concatenates the output from the fn_sudoku function.
In Listing 5, the SudokuStrings CTE converts the val column values returned from the function into character strings because you need to concatenate multiple values. The SudokuPivotVals CTE pivots multiple values of the same cell (row/col) and concatenates them into a comma-separated list of values. The SudokuPivotVals2 CTE removes the trailing comma. The SudokuPivotCols CTE pivots columns of the same row, so that you get nine different columns in the same result row.
To test the code, you can run it using the function's implementation, which doesn't apply any rules yet. Table 1 shows the abbreviated output. All occupied cells from SudokuInput show up as scalar values. Unoccupied cells from SudokuInput currently show up as lists—that is, 1,2,3,4,5,6,7,8,9. However, for brevity, I replaced the string 1,2,3,4,5, 6,7,8,9 with the string 1-9 in Table 1.
Now it's time to implement the brains of the fn_sudoku function by identifying and writing its delete and insert rules. I'll give a few sample rules, then leave it up to you to add others.
Listing 6 shows sample delete rules that remove values from @SudokuNonUnique. If a cell is occupied in @SudokuUnique, its value is final. So, the first rule removes all potential values from the corresponding cell in @SudokuNonUnique. The second rule removes values that appear in the same row, column, or box in @SudokuUnique.
To test these delete rules, you can add the code in Listing 6 to the function's loop body at callout C in Listing 4, then run the formatting code in Listing 5. Table 2 shows the results. As you can see, the lists got smaller.
Listing 7 shows sample insert rules that load final values into @SudokuUnique. This code works by isolating values in @SudokuNonUnique from cells that aren't populated yet in @SudokuUnique (same row and col; doesn't exist in @SudokuUnique). The first rule isolates values from @SudokuNonUnique that became unique in their cell after applying the delete rules. A value is unique in its cell if there's no other value in the same cell (same row and col; different value). The second rule isolates values that can't appear in another cell in the same row. The logic of this rule is a bit tricky, but if you've practiced relational division logic, this logic should be familiar to you. The T-SQL expression that applies the second rule basically translates to the following pseudocode:
Isolate the value if you can't find (the same value in another unoccupied cell in the same row for which you can't find (a final occupied cell with the same value and (the same row, column, or cube)))
The third rule isolates values that can't appear in another cell in the same column. The fourth rule isolates values that can't appear in another cell in the same box.
To test these insert rules, you can add the code in Listing 7 to the function's loop at callout C in Listing 4, then run the code in Listing 5 to generate formatted output. As Figure 2 shows, the delete and insert rules that I provided are sufficient to solve this Sudoku puzzle as well as solve other Sudoku puzzles.
Putting It All Together
The solution that I've provided contains the framework for solving Sudoku puzzles and some sample fundamental rules. You can continue improving the solution by identifying and implementing more rules. In this article's 47752.zip file, which you can download at InstantDoc ID 47752, you'll find sudoku.sql, which contains the code in the seven listings.
Sudoku.sql also contains the input for another Sudoku puzzle in case you want to try to solve one on your own. The sample rules in Sudoku.sql don't solve this puzzle completely, so you need to do some logical thinking to come up with the solution. In the 47752.zip file, you'll find the input for this puzzle (Figure A) as well as the solved puzzle (Figure B).
Keep On Thinking
Handling Sudoku puzzles with T-SQL is a great way to practice logic and logic programming at the same time. I also recommend that you try to solve Sudoku puzzles manually. Not only will you hone your logic skills, but you'll probably also have some fun in the process.