Skip navigation
Enhancing PIVOT for Crosstab Queries

Enhancing PIVOT for Crosstab Queries

Learn 6 ways to work around PIVOT's limitations

Downloads
93907.zip

Server's pivoting capabilities (also known as crosstab queries) have many practical applications. Given an input table expression, pivoting—which is essentially rotation of data—turns one column's values from multiple rows into multiple columns in a single result row. You can use pivoting in different ways, such as in reporting, custom aggregates, and relational division. Although SQL Server 2005's pivoting feature is useful, the PIVOT operator has some limitations. Here I'll show you some workarounds for these limitations, and next month I'll provide a flexible and more complete solution to dynamic-pivoting needs in SQL Server 2005. Before continuing, though, if you aren't familiar with the new PIVOT operator in SQL Server 2005, I suggest you read the Web-exclusive articles "Dynamic Pivoting," July 2004 and "Pivot (or Unpivot) Your Data," June 2004, which explain PIVOT; and "Dynamic Crosstab Queries," November 2000, which discusses dynamic pivoting in SQL Server 2000.

Implicit Grouping

Pivoting involves three phases: grouping, rotation, and aggregation.To understand these phases, let's consider an example. Suppose you're asked to pivot data from the Orders table in the Northwind database. You need to return the sum of freight for each employee and shipper, such that employee IDs (grouping column) would appear on rows, shipper IDs (rotation column) on columns, and the sum of freight (aggregation column) in the intersection of each employee and shipper, as Table 1 shows.

The grouping phase involves grouping the data by the grouping column EmployeeID. A row will be generated in the result for each employee. The rotation phase involves generating a result column for each element of the rotation column ShipVia. The aggregation phase involves aggregating the Freight column for each combination of grouping element (employee) and rotation element (shipper).

The PIVOT operator has a confusing aspect. Although you specify the rotation and aggregation columns explicitly in the PIVOT clause, SQL Server determines the grouping columns implicitly. The grouping columns are all columns from the table provided as input to the PIVOT operator, which aren't mentioned in the PIVOT clause as either aggregation or rotation columns. Thus, the following PIVOT query doesn't return the result you'd expect:

SELECT EmployeeID, \[1\], \[2\], \[3\] FROM dbo.Orders
  PIVOT(SUM(Freight)
  FOR ShipVia IN(\[1\], \[2\],
  \[3\])) AS P; SQL 

(Some code in this article wraps to multiple lines because of page constraints.) Instead of getting nine rows in the result (one row for each employee) as Table 1 shows,you get 830 rows in the result: a row for each order instead of a row for each employee. Since the columns Freight and ShipVia were specified in the PIVOT clause as aggregation and rotation columns, respectively, SQL Server implicitly used all other columns in the table for grouping.

The workaround for the implicit-grouping limitation is simple. Instead of querying the base table (e.g., Orders) directly, provide the PIVOT operator a table expression (derived table or common table expression—CTE) as the input table, as Listing 1 shows. The table expression contains only the columns required for pivoting: grouping, rotation, and aggregation.

PIVOT Isn't Dynamic

You must specify a static list of rotation elements in the parentheses following the rotation column and the IN keyword. For example, the previous PIVOT query specifies the static list of shipper IDs \[1\],\[2\],\[3\]. You can't specify a subquery or ask SQL Server to determine the distinct rotation elements. This means that with a static query, you must know in advance which elements you want to rotate.

The only workaround for this limitation is to use dynamic SQL.You query the distinct elements that you need to rotate, construct the PIVOT query string, and execute the query string by using sp_executesql or EXEC.The code in Listing 2 demonstrates how to construct the list of shipper IDs with a FOR XML PATH query. This code returns the string \[1\],\[2\],\[3\].

There are other techniques you can use to concatenate strings (e.g., by using a cursor), but the technique that uses the FOR XML PATH option is by far simpler and faster than the alternatives. (I describe this technique in "Custom Aggregations: Specialized Solutions," July 2006.)

If you don't have a table like Shippers with a row for each distinct rotation element, you can query the distinct rotation elements from the table used as the input to the PIVOT operator (Orders, in our case), as the code in Listing 3 shows. You can then construct the full query string and execute it as Listing 4 shows.

PIVOT Columns Must Be Base Columns

None of PIVOT's columns (grouping, rotation, aggregation) can be directly an expression; rather, they must all be base columns in the table provided to PIVOT as input. It's obvious that the grouping columns are base columns since the grouping columns are derived implicitly. The aggregation column can't be an expression—for example, SUM (Quantity * UnitPrice). You can't even specify COUNT ( *); instead, if you need to use the COUNT aggregate, you must specify a base column name—for example, COUNT (Order-ID). Similarly, you can't specify an expression as the rotation column—for example, YEAR (OrderDate) IN (\[1996\], \[1997\], \[1998\]).

The workaround for all these limitations is to use a table expression similar to using one to circumvent the implicit-grouping problem I described earlier. To demonstrate the technique, suppose that you need to pivot data from the result of a join between the Orders and Order Details tables. You want months on rows, years on columns, and the sum of Quantity * Unit-Price in the intersections of months and years, as Table 2 shows. The code in Listing 5 uses a derived table to provide a solution.

Another simple workaround is to use COUNT(some_column). The column can be either a real column from the input table or a pseudo column in the table expression derived from a constant. For example, the query in Listing 6 returns the output in Table 3 with the count of orders for each month and year.

Can't Rotate More than One Column

The PIVOT operator doesn't let you specify multiple IN clauses to rotate more than one column. For example, you can't request to rotate both shippers and years: FOR ShipVia IN(\[1\], \[2\], \[3\]) AND OrderYear IN (\[1996\],\[1997\], \[1998\]).

As a workaround, in the table expression that you're preparing as input to the PIVOT operator, you can concatenate the elements that you want to rotate. For example, to rotate both shippers and years, you can use the expression at callout A in Listing 7 to concatenate the two. In the parentheses following the PIVOT operator (callout B), specify FOR RotationCol IN(\[1_1996\], \[1_1997\], \[1_1998\], \[2_1996\], \[2_1997\], \[2_1998\],\[3_1996\],\[3_1997\],\[3_1998\]).The query in Listing 7 calculates the sum of freight for each employee and shipper/year combination, with employees on rows and shipper/year combinations on columns, and generates the output that Table 4 shows.

Can't Specify Multiple Aggregations

PIVOT lets you specify only one aggregation. Suppose you need to get both the sum of freight and count of orders for each employee and shipper, as Web Table 1 shows. You can't do this directly by using a single PIVOT operation, but as a workaround you can join the result sets of two PIVOT operations, as Listing 8 shows.

The query in Listing 8 defines two CTEs: The SumFreight CTE contains the sum of freight for each employee and shipper, and the CountOrders CTE contains the count of orders for each employee and shipper. The outer query joins SumFreight and Count-Orders based on a match between the employee IDs and returns the columns with the sum of freight and count of orders of each shipper.

Next Time: A More Complete Solution

I've described various limitations of the PIVOT operator and provided workarounds for each limitation. Next time, I'll show you how to write a stored procedure that encapsulates the logic and circumvents most limitations, thereby providing a more complete and flexible solution to dynamic-pivoting needs.

TAGS: SQL
Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish