Dynamic Pivoting

The PIVOT() function, a new T-SQL construct in SQL Server 2005, lets you rotate data from rows to columns. Last month in "Pivot (or Unpivot) Your Data," I covered the basics of PIVOT() by showing you how to rotate attributes from rows to columns and how to aggregate data. In all my examples, you know the result column list beforehand, so you could write static T-SQL queries to provide the desired result. This month, I discuss a scenario in which you don't know the result column list ahead of time and demonstrate some dynamic T-SQL code that returns the desired result. I also show you how to use the PIVOT() operator to concatenate strings.

Dynamic PIVOT()

To demonstrate dynamic pivoting, let's use the Orders table that Listing 1 creates and populates. In "Pivot (or Unpivot) Your Data," I presented the following problem: return the yearly customer order quantities, then generate a result set that has a row for each customer and a column for each year (2002, 2003, 2004). Because you know the order years ahead of time, the solution query was static:

SELECT *
FROM (SELECT custid, YEAR(orderdate) AS orderyear, qty
  FROM dbo.Orders) AS D
  PIVOT(SUM(qty) FOR orderyear IN([2002],[2003],[2004])) AS P

Now, assume that you don't already know the years in which orders were placed. You have no idea which result columns you need to generate or how many. The dynamic part of the preceding query is the list of years you provided as input to the IN predicate. SQL Server 2005 doesn't let you specify a subquery as an argument to the IN predicate, so you have to generate the query string dynamically and use the sp_executesql stored procedure to execute it. Listing 2 dynamically constructs and executes the T-SQL string in three main steps, which callouts A, B, and C show. Run Listing 2 all in one shot—all three callouts—because there are variables defined in the first callout that are used in callouts B and C, so you have to run the whole thing as one batch. Remember, strip off the BEGIN CALLOUT and END CALLOUT lines before running the listing. Callout A's code declares a table variable @T and populates the variable with the distinct list of order years from the Orders table. The code at callout B constructs in the @cols variable the list of order years by using a simple loop that iterates through the rows in @T. The IN predicate generates the column list in the expected form: ',[first_year],[second_year],...,[last_year]'. The extra comma isn't required, but the code that constructs the string is simpler when you don't worry about it at this stage. Callout B's last code line strips off the preceding comma. And finally, the code at callout C constructs the whole query string in the @sql variable by concatenating the following elements: the known part of the earlier query up to the IN predicate's opening parenthesis, the years list in the variable @cols, and the known part of the earlier query starting with the closing parenthesis of the IN predicate. After Listing 2 constructs the query string, callout C uses the sp_executesql procedure to print and execute the string for debugging purposes.

Figure 1 shows Listing 2's results. The query string generates the desired output and the results contain the three order years in which orders were made (2002, 2003, and 2004). You can test the code for any number of years by adding orders from other years to the Orders table, then running Listing 2 again. The code should also return the other order years as result columns.

String Concatenation

Another useful application for pivoting data is concatenating strings. Imagine you're asked to return a comma-separated list of customer IDs that each employee deals with, as Figure 2 shows. For this task, you can assume that each employee deals with a maximum of 10 customers. After you have a solution to the problem, you can apply the dynamic pivoting technique I described earlier to handle an unknown number of customers.

You can use a pivoting technique to solve the problem by generating a result column for each employee's customers (first customer, second customer, etc.), then concatenate the different customer IDs instead of returning them in separate columns. However, the first customer for each employee will be different. For example, the employee 1's first customer (according to alphabetical order) is A, whereas employee 2's first customer is B. To return all employees' first customer in the first result column, and the second customer in the second column, and so on, you need to calculate a value that will position a customer ID in the right order. There's an easy, efficient way to achieve this positioning by using the ROW_NUMBER() function that I covered in May and June. The following query returns the distinct customer IDs for each employee and each customer's position among the customer IDs that the employee handles in the rownum result column:

SELECT empid, custid,
  ROW_NUMBER() OVER(
    PARTITION BY empid ORDER BY custid) AS rownum
FROM (SELECT DISTINCT empid, custid
  FROM dbo.Orders) AS D1

This query is an intermediate step that's part of the whole solution that I'll describe shortly. Figure 3 shows this query's result. After you have the rownum column to position the customers, it's easy to generate a result column for each customer position with the customer IDs as column values. Create a derived table, D2, from the above query, and pivot the data by requesting the MAX(custid) for each row number in a list of 10 possible positions, as Listing 3 shows.

Figure 4 shows Listing 3's query's result. The code generates a row for each employee, and the MAX() aggregate function returns the customer ID for each intersection of employee and customer position. It returns NULLs when there's no customer in the intersection (e.g., employee 1, position 4). The rest is simple. In the SELECT list, you need to concatenate all 10 result columns with the customer IDs into one result string. Because some of the values might be NULLs, wrap each column with the ISNULL() function to return an empty string instead of a NULL, as Listing 3 shows. You're done. Figure 3 shows this query's desired result.

Turnabout

When you need to pivot data and you already know what result columns you want to generate, you can use static queries with the new PIVOT() operator. When you don't know which attributes you're going to use to generate the result columns or how many attributes there are, you have to use dynamic execution. SQL Server 2005 also offers the UNPIVOT() operator, which is essentially the PIVOT() function's opposite--but my discussion of the UNPIVOT() operator will have to wait until next month.

Comments

Plain text