Unpivoting Data

Pivoting and unpivoting data in SQL Server are very common tasks involving data rotation. In the case of pivoting, you need to rotate data from a state of rows to a state of columns. In the case of unpivoting, you need to rotate data from a state of columns to a state of rows. Last month I covered pivoting techniques; this month I focus on unpivoting techniques.

Related: Pivot (or Unpivot) Your Data

Related: UNPIVOT

Sample Data and Desired Results

The task of unpivoting data involves working with input data that appears in pivoted form; the purpose of unpivoting the data is to spread, or split, multiple column values from the same source row into multiple target rows. For each set of columns that you’re unpivoting, you typically want to generate two result columns: one to hold the source column names (call it the names column), and one to hold the source column values (call it the values column). If this explanation doesn’t seem to make much sense for now, that’s understandable—the process of unpivoting is best understood through an example, which is coming shortly. First I’ll describe the sample data, then the desired output, and then I’ll cover three techniques to achieve the desired results.

For sample data you’ll use a database called InsideTSQL2008. You can download the source code to create the sample data. After running the script to create the database InsideTSQL2008, run the code in Listing 1 to create and populate the table PvtSample, which you’ll use as the input data for the unpivoting tasks described in the article.

USE InsideTSQL2008;
IF OBJECT_ID('dbo.PvtSample', 'U') IS NOT NULL DROP TABLE dbo.PvtSample;
WITH PivotInput AS
(
  SELECT O.orderid, shipperid, shipcity, freight, SUM(qty) as qty
  FROM Sales.Orders AS O
    JOIN Sales.OrderDetails AS OD
  ON O.orderid = OD.orderid
  WHERE shipcountry = N'Spain'
  GROUP BY O.orderid, shipperid, shipcity, freight
)
SELECT shipperid,
  SUM(CASE WHEN shipcity = 'Barcelona' THEN freight END) AS frtBarcelona,
  SUM(CASE WHEN shipcity = 'Madrid'    THEN freight END) AS frtMadrid,
  SUM(CASE WHEN shipcity = 'Sevilla'   THEN freight END) AS frtSevilla,
  SUM(CASE WHEN shipcity = 'Barcelona' THEN qty     END) AS qtyBarcelona,
  SUM(CASE WHEN shipcity = 'Madrid'    THEN qty     END) AS qtyMadrid,
  SUM(CASE WHEN shipcity = 'Sevilla'   THEN qty     END) AS qtySevilla
INTO dbo.PvtSample
FROM PivotInput
GROUP BY shipperid;

ALTER TABLE dbo.PvtSample
  ADD CONSTRAINT PK_PvtSample PRIMARY KEY(shipperid);

UPDATE dbo.PvtSample
  SET frtBarcelona = NULL, qtyBarcelona = NULL
WHERE shipperid = 3;
GO
SELECT * FROM dbo.PvtSample;

Table 1 contains the output of the query at the end of Listing 1, showing the contents of the PvtSample table. As you can see, the table has a row for each shipper, with the shipper ID, a set of three columns with total freight values for each shipping city in Spain (frtBarcelona, frtMadrid, frtSevilla), and a set of three columns with total quantities for each shipping city in Spain (qtyBarcelona, qtyMadrid, qtySevilla).

In this article I discuss unpivoting a single set of columns, as well as unpivoting multiple sets of columns. To unpivot a single set of columns, you need to focus on the source columns shipperid, frtBarcelona, frtMadrid, and frtSevilla and ignore all the qty columns (qtyBarcelona, qtyMadrid, and qtySevilla). The task is to unpivot the data such that each source row will be unpivotted into three target rows—one for each shipping city. Each target row should hold the shipper ID, shipping city, and freight. Inapplicable cases shouldn’t be returned in the result; that is, cases in which the source value was NULL, as with shipper 3 in Barcelona. Table 2 contains the desired output for the first unpivoting task.

The second task involves unpivoting multiple sets of columns. Like in the first unpivoting task, you need to unpivot each source row into three target rows—one for each shipping city. But in the second unpivoting task, you should produce two target value columns—one for freights (from the source columns frtBarcelona, frtMadrid, and frtSevilla), and another for quantities (from the source columns qtyBarcelona, qtyMadrid, and qtySevilla). Table 3 contains the desired output for the second unpivoting task.

UNPIVOT Operator

SQL Server 2005 introduced the UNPIVOT operator to address unpivoting needs. This operator was implemented as a table operator, much like the join table operator was. This means that you use the UNPIVOT operator in the FROM clause of your query. It operates on the table specified to the left of the UNPIVOT keyword and returns a table result that you have to assign with an alias. The general syntax of a query using the UNPIVOT operator is:

SELECT 
FROM  UNPIVOT() AS ;

Just like with other table operators, the input to the UNPIVOT operator can be the result of preceding table operators, and the output of the UNPIVOT operator can be used as input by a subsequent table operator or another logical query processing phase (e.g., the WHERE filtering phase).

Some database administrators and developers have a hard time getting used to the syntax of the UNPIVOT operator. I’ll try to explain its syntax such that it will be straightforward and easy to remember, and I’ll do so using the first pivoting task described earlier (unpivoting freights for each shipping city). If you think about it, unpivoting involves rotating a set of source columns into two target columns—those that I referred to earlier as the values column and the names column. So in every unpivoting problem you need to identify three things:

1. Set of source columns to be unpivotted (in our case frtBarcelona, frtMadrid, and frtSevilla)

2. Name you want to assign to the target values column (freight in our case)

3. Name you want to assign to the target names column (shipcity in our case)

After you identify these three elements, it’s just a matter of sticking them in the right places as part of the UNPIVOT operator’s specification. The specification of the UNPIVOT operator is:

UNPIVOT(  FOR  IN () ) AS

So in our case, to address the first task you’d use the following specification:

UNPIVOT( freight FOR shipcity IN (frtBarcelona, frtMadrid, frtSevilla) ) AS U

The values of the source columns will be stored in the values column (freight), and the names of the source columns will be stored in the names column (shipcity) as character strings. Listing 2 contains the complete solution query to address the first task, producing the desired output shown earlier in Table 2.

SELECT shipperid, SUBSTRING(shipcity, 4, 8000) AS shipcity, freight
FROM dbo.PvtSample
  UNPIVOT( freight FOR shipcity IN
    (frtBarcelona, frtMadrid, frtSevilla) ) AS U;

Because the names of the source columns that hold the freights for the different shipping cities all start with the prefix frt followed by the city name, the code applies string manipulation in the SELECT list to the target names column to remove this prefix and keep only the city names.

All in all, the syntax of the UNPIVOT operator is very short and elegant, as long as you have fairly basic needs—that is, assuming that you need to unpivot only one set of source columns, all unpivotted columns have exactly the same type (including size), and you want to remove all rows with NULLs in the values column from the output. Also, when unpivoting one set of source columns, SQL Server processes the UNPIVOT operator quite efficiently, scanning the input table only once.

But things get trickier when the unpivoting task goes beyond the basic needs. For example, if the source columns to be unpivotted aren’t exactly the same type—even if they differ only in size, precision, scale, etc.—your code will fail. The workaround is to define a table expression in which you first cast all source column types to a common type, then provide that table expression as input to the UNPIVOT operator.

You’ll face another obstacle if you don’t want to remove rows with NULLs in the values column. The UNPIVOT operator doesn’t really leave this as an option but instead makes the removal of the rows with the NULLs as a mandatory phase. The workaround if you want to keep those rows is a bit awkward—you need to define a table expression in which you use the COALESCE function to substitute all NULLs with a non-NULL value that can’t originally appear in the data, and provide that table expression as input to the UNPIVOT operator. Then in the outer query you need to use the NULLIF function to substitute the non-NULL values back with NULLs.

You also need to straggle a bit in case you need to unpivot multiple sets of columns, as is the case with the second task. As a reminder, you need to unpivot both freights and quantities for each shipping city. The two sets of columns you need to unpivot are \\{frtBarcelona, frtMadrid, frtSevilla\\} for freights and \\{qtyBarcelona, qtyMadrid, qtySevilla\\} for quantities. The UNPIVOT operator doesn’t support such a scenario directly. A workaround is to write two UNPIVOT queries, each defining a different table expression, then have an outer query join the two, as Listing 3 shows, producing the output in Table 3.

WITH F AS
(
  SELECT shipperid, SUBSTRING(shipcity, 4, 8000) AS shipcity, freight
  FROM dbo.PvtSample
    UNPIVOT( freight FOR shipcity IN
  (frtBarcelona, frtMadrid, frtSevilla) ) AS U
),
Q AS
(
  SELECT shipperid, SUBSTRING(shipcity, 4, 8000) AS shipcity, qty
  FROM dbo.PvtSample
    UNPIVOT( qty FOR shipcity IN
  (qtyBarcelona, qtyMadrid, qtySevilla) ) AS U
)
SELECT
  COALESCE(F.shipperid, Q.shipperid) AS shipperid,
  COALESCE(F.shipcity, Q.shipcity) AS shipcity,
  F.freight, Q.qty
FROM F FULL OUTER JOIN Q
  ON F.shipperid = Q.shipperid
 AND F.shipcity = Q.shipcity;

If you’re wondering why the outer query uses a full outer join and the COALESCE function, this has to do with the possibility of having NULLs in the value columns, as well as the fact that each of the individual UNPIVOT queries will eliminate those rows with NULLs.

As you can see, in the case of multiple sets of source columns to be unpivotted, the query gets complicated. Also, if you examine the execution plan for the code in Listing 3, you’ll see that the unpivoting work is pretty much repeated. The input data is scanned twice, and there’s also the cost of the join. This plan is quite inefficient.

Other techniques exist for achieving unpivoting without the use of the UNPIVOT operator and without repeating the work. You’ll probably find them more convenient to work with, especially when the task is beyond the basics. First I’ll present a technique that uses a cross join, then a technique that uses the APPLY operator.

Cross Join

The cross join technique to achieve unpivoting involves three steps:

1. Generating copies

2. Extracting value

3. Removing NULLs

The first step generates a copy of each source row for each column that needs to be unpivotted. Our first task involves unpivoting three columns holding freight values for three shipping cities. To achieve this step you need to perform a cross join between the PvtSample table and a table that has a row for each shipping city. You can define the table of cities as a derived table using a table-value constructor introduced in SQL Server 2008, like so:

(VALUES('Barcelona'),('Madrid'),('Sevilla')) AS SC(shipcity)

Prior to SQL Server 2008, you can define the table of cities by using single row SELECT statements separated by UNION ALL set operations, like so:

(           SELECT 'Barcelona'
UNION ALL SELECT 'Madrid'
   UNION ALL SELECT 'Sevilla'   ) AS SC(shipcity)

To generate a copy of each source row for each shipping city, you need to perform a cross join between PvtSample and the table of cities. Using the table-value constructor introduced in SQL Server 2008, you would achieve this like so:

FROM dbo.PvtSample 
CROSS JOIN (VALUES('Barcelona'),('Madrid'),('Sevilla')) AS SC(shipcity)

To implement the second step—extracting value—you need to extract the value from the column corresponding to the city that the current copy represents. This can be achieved using the following CASE expression, which you should invoke in the SELECT list:

CASE shipcity
WHEN 'Barcelona' THEN frtBarcelona
   WHEN 'Madrid'    THEN frtMadrid
   WHEN 'Sevilla'   THEN frtSevilla
 END AS freight

Here’s the complete query implementing both steps:

SELECT shipperid, shipcity,
CASE shipcity
     WHEN 'Barcelona' THEN frtBarcelona
     WHEN 'Madrid'    THEN frtMadrid
     WHEN 'Sevilla'   THEN frtSevilla
   END AS freight
 FROM dbo.PvtSample
   CROSS JOIN (VALUES('Barcelona'),('Madrid'),('Sevilla')) AS SC(shipcity)

Note that this query doesn’t attempt to remove rows with NULLs in the result values column, unlike the solution that uses the UNPIVOT operator, which doesn’t leave this as an option. If you want to remove the rows with the NULLs, you need to define a table expression based on the last query, and perform the filtering in the WHERE clause of the outer query, as Listing 4 shows.

WITH C AS
(
  SELECT shipperid, shipcity,
    CASE shipcity
  WHEN 'Barcelona' THEN frtBarcelona
  WHEN 'Madrid'    THEN frtMadrid
  WHEN 'Sevilla'   THEN frtSevilla
    END AS freight
  FROM dbo.PvtSample
    CROSS JOIN (VALUES('Barcelona'),('Madrid'),('Sevilla')) AS SC(shipcity)
)
SELECT *
FROM C
WHERE freight IS NOT NULL;

Besides giving you control over whether to remove rows with NULLs, this technique has another advantage over the technique using the UNPIVOT operator; if you need to unpivot multiple sets of columns (like in our case, both the freights and the quantities), simply add another CASE expression for each set. For example, to address the second task in the article, you’d use the code in Listing 5.


WITH C AS
(
  SELECT shipperid, shipcity,
    CASE shipcity
  WHEN 'Barcelona' THEN frtBarcelona
  WHEN 'Madrid'    THEN frtMadrid
  WHEN 'Sevilla'   THEN frtSevilla
    END AS freight,
    CASE shipcity
  WHEN 'Barcelona' THEN qtyBarcelona
  WHEN 'Madrid'    THEN qtyMadrid
  WHEN 'Sevilla'   THEN qtySevilla
    END AS qty
  FROM dbo.PvtSample
    CROSS JOIN ( VALUES('Barcelona'),('Madrid'),('Sevilla') ) AS SC(shipcity)
)
SELECT *
FROM C
WHERE freight IS NOT NULL AND qty IS NOT NULL;

With one set of columns to be unpivotted, the solution using the cross join performs very similar to the one using the UNPIVOT operator. But with multiple sets of columns, the solution using the cross join doesn’t repeat the work and access to the input data for each set, like the solution using the UNPIVOT operator does. If you examine the plan for the query in Listing 5, you’ll see that the input data is scanned only once.

The main downside of this solution is that with multiple sets of columns to be unpivotted and the need to remove NULLs, the code tends to get verbose because of the repetitions in the CASE expressions and the table expression used for filtering. But all hope is not lost—there’s a third solution that’s both concise and efficient.

APPLY Operator

The APPLY table operator was added in SQL Server 2005, and it keeps surprising me with its cool and efficient applications. In essence, the APPLY operator applies the table expression that appears to its right to each row from the table that appears to its left. The interesting thing is that the query in the right table expression can refer to elements from the left row. Using a table-value constructor to define a derived table to the right of the APPLY operator, you can construct multiple rows in the right from each source row in the left, and thus achieve in one shot both the phase that generates copies and the one that extracts a different value from each row. As the old adage says, a picture is worth a thousand words; the picture in our case is that of the query in Listing 6, addressing the first task in the article. As I mentioned, the APPLY operator achieves both steps described earlier—generating copies and extracting values. The third step—removing NULLs—is optional with this solution. Because the APPLY operator returns the columns shipcity and freight, you can refer to the freight column directly in the query’s WHERE clause, as Listing 6 shows, without the need to define a table expression like you did in the solution using the cross join.

SELECT shipperid, A.*
FROM dbo.PvtSample
    CROSS APPLY( VALUES( 'Barcelona', frtBarcelona ),
           ( 'Madrid'   , frtMadrid    ),
           ( 'Sevilla'  , frtSevilla   )
       ) AS A(shipcity, freight)
WHERE freight IS NOT NULL;

As for dealing with multiple sets of columns, for each set of columns you simply add the corresponding column to each row in the VALUES clause. For example, to unpivot the set of quantity columns in addition to the set of freight columns, you’d use the code in Listing 7.

SELECT shipperid, A.*
FROM dbo.PvtSample
    CROSS APPLY( VALUES( 'Barcelona', frtBarcelona, qtyBarcelona ),
           ( 'Madrid'   , frtMadrid   , qtyMadrid    ),
           ( 'Sevilla'  , frtSevilla  , qtySevilla   )
       ) AS A(shipcity, freight, qty)
WHERE freight IS NOT NULL AND qty IS NOT NULL;

Third Solution’s a Charm

In this article I covered three techniques to unpivot data: The first was based on the UNPIVOT operator, the second was based on a cross join, and the third was based on the APPLY operator. All three solutions perform similarly as long as you unpivot only one set of columns—but if you unpivot multiple sets of columns, the second and third solutions perform better than the first because they still scan the input data only once and don’t involve any join overhead. The second solution is more verbose than the third because of lengthy CASE expressions (one for each set of unpivotted columns), plus it required defining a table expression in order to remove the NULL rows.

I find the third solution, which uses the APPLY operator, to be ideal. It’s elegant, concise, and efficient. It scans the input data only once, there’s no need for a join when faced with multiple sets of columns to be unpivoted, you can control whether to filter NULL rows, and if you do need to filter rows, you don’t need to put the original query in a table expression. Many thanks to Mohammad Salimabadi, who pointed me toward this solution, and to Plamen Ratchev and Brad Shultz, who covered it in the past.

Comments

Plain text