Logical Query Processing Part 5: The FROM Clause and UNPIVOT

Logical Query Processing Part 5: The FROM Clause and UNPIVOT

This is Part 5 of a series. Part 1 provided a high-level overview. Part 2, Part 3 and Part 4 covered the FROM clause table operators JOIN, APPLY and PIVOT, respectively. In this article I cover the fourth and last FROM clause table operator—UNPIVOT.

This article continues the series covering logical query processing, which describes the conceptual interpretation of a query. Part 1 provided a high level overview. Part 2, Part 3 and Part 4 covered the FROM clause table operators JOIN, APPLY and PIVOT, respectively. In this article I cover the fourth and last FROM clause table operator—UNPIVOT.

For sample data I’ll use the same sample database TSQLV4 that I used in the previous parts. If you don’t have it installed already you can download the code that creates and populates it from here. Make sure to set your database context to this database before running the code samples in this article:

USE TSQLV4;

Besides describing the logical processing aspects of the UNPIVOT operator, I’ll also provide an alternative to unpivoting data using the APPLY operator. I’ll also provide an example for combining the UNPIVOT and PIVOT operator, and finally, and explain how you can dynamically unpivot data.

Unoivoting data

While pivoting data involves rotating data from a state of rows to columns, unpivoting rotates data from a state of columns to rows. As an example, run the following code to create a table called MyPivotedOrders and fill it with sample data:


IF OBJECT_ID(N'Sales.MyPivotedOrders', N'U') IS NOT NULL DROP TABLE Sales.MyPivotedOrders;
-- In SQL Server 2016 use: DROP TABLE IF EXISTS Sales.MyPivotedOrders;
GO

WITH C AS
(
  SELECT empid, YEAR(orderdate) AS orderyear, val
  FROM Sales.OrderValues
  WHERE custid = 5
)  
SELECT *
INTO Sales.MyPivotedOrders
FROM C
  PIVOT( SUM(val)
    FOR orderyear IN ([2014], [2015], [2016]) ) AS P;

ALTER TABLE Sales.MyPivotedOrders
  ADD CONSTRAINT PK_MyPivotedOrders PRIMARY KEY(empid);

SELECT *
FROM Sales.MyPivotedOrders;

Table 1 shows the contents of MyPivotedOrders.

Table 1: Contents of Sales.MyPivotedOrders table

empid  2014     2015     2016
------ -------- -------- --------
1      NULL     6627.75  NULL
2      613.20   NULL     NULL
3      2222.40  2804.19  1835.70
4      NULL     NULL     709.55
5      NULL     601.83   1096.20
8      1488.80  NULL     2048.21
9      NULL     3815.25  1064.50

Unpivoting involves taking any number of source columns that is greater than one, and transforming them into two target columns—one that will hold the source column values (the values column) and another that will hold the source column names (the names column).

In this particular example the source has a row per employee and a column per order year, and each intersection of row and column holds the total order values for the current employee and year. It’s not convenient to perform data manipulation against the current state of the data. You want to unpivot it into a state that has a separate row per employee and year. Table 2 shows the desired results.

Table 2: Desired result for unpivoting task

empid  orderyear  val     
------ ---------- --------
1      2015       6627.75 
2      2014       613.20  
3      2014       2222.40 
3      2015       2804.19 
3      2016       1835.70 
4      2016       709.55  
5      2015       601.83  
5      2016       1096.20 
8      2014       1488.80 
8      2016       2048.21 
9      2015       3815.25 
9      2016       1064.50 

Similar to the PIVOT operator, T-SQL supports an operator called UNPIVOT, which you also use in the FROM clause of the query. Just like PIVOT, the UNPIVOT operator applies a number of logical steps. I’ll number the steps 1-UN, where 1 represents the fact that the FROM clause is the first major clause that is logically processed, U stands for UNPIVOT and N is the step number within the UNPIVOT operator.

The UNPIVOT operator applies the following three logical steps:

1-U1. Generate copies

1-U2. Extract element

1-U3. Remove NULLs

The following sections describe these steps and demonstrate them for our sample task.

1-U1. Generate copies

The first step in unpivoting data is to generate a separate copy of each source row for each unpivoted column. Our unpivoted columns represent the order years: 2014, 2015 and 2016, therefore this step generates a copy for each year. Table 3 shows the result of this step for our sample task.

Table 3: Result of step 1-U1

empid  orderyear  2014     2015     2016    
------ ---------- -------- -------- --------
1      2014       NULL     6627.75  NULL    
1      2015       NULL     6627.75  NULL    
1      2016       NULL     6627.75  NULL    
2      2014       613.20   NULL     NULL    
2      2015       613.20   NULL     NULL    
2      2016       613.20   NULL     NULL    
3      2014       2222.40  2804.19  1835.70 
3      2015       2222.40  2804.19  1835.70 
3      2016       2222.40  2804.19  1835.70 
4      2014       NULL     NULL     709.55  
4      2015       NULL     NULL     709.55  
4      2016       NULL     NULL     709.55  
5      2014       NULL     601.83   1096.20 
5      2015       NULL     601.83   1096.20 
5      2016       NULL     601.83   1096.20 
8      2014       1488.80  NULL     2048.21 
8      2015       1488.80  NULL     2048.21 
8      2016       1488.80  NULL     2048.21 
9      2014       NULL     3815.25  1064.50 
9      2015       NULL     3815.25  1064.50 
9      2016       NULL     3815.25  1064.50 

As mentioned earlier, unpivoting transform or source columns into two target columns—the values column and the names column. As you can see, the first step already creates the names column (orderyear in our example).

1-U2. Extract element

The second step essentially creates the values column (which we will name val in our example). It extracts the element from the source columns that corresponds to the name that the current copy represents. In our example, it extracts the value from the source year column that corresponds to the current orderyear value. When orderyear is 2014, val returns the value from the column [2014], when orderyear is 2015, val returns the value from [2015], and when orderyear is 2016, val returns the value from [2016]. Table 4 shows the desired result of this step for our sample task.

Table 4: Result of step 1-U2

empid  orderyear  val     
------ ---------- --------
1      2014       NULL    
1      2015       6627.75 
1      2016       NULL    
2      2014       613.20  
2      2015       NULL    
2      2016       NULL    
3      2014       2222.40 
3      2015       2804.19 
3      2016       1835.70 
4      2014       NULL    
4      2015       NULL    
4      2016       709.55  
5      2014       NULL    
5      2015       601.83  
5      2016       1096.20 
8      2014       1488.80 
8      2015       NULL    
8      2016       2048.21 
9      2014       NULL    
9      2015       3815.25 
9      2016       1064.50 

1-U3. Remove NULLs

The third and last step in unpivoting is to remove the rows with the NULLs in the result values column. You typically don’t want to keep those rows since they represent inapplicable cases. In our example those are cases where that employee didn’t have any activity in that year.

The output of this step in our example was shown earlier in Table 2 as the desired output of the unpivoting task.

The UNPIVOT Operator

As mentioned, the UNPIVOT operator is designed very similar to the PIVOT operator as a step in the FROM clause. Its syntax requires you to identify the three elements mentioned earlier that are involved in the unpivoting task:

1. Target name for column that will hold source column values (val in our case)

2. Target name for column that will hold source column names (orderyear in our case)

3. Names of source columns ([2014],[2015],[2016] in our case)

Once identified, you specify those elements in the UNPIVOT operator using the following syntax:

SELECT *
FROM 
  UNPIVOT( 1 FOR 2 IN (3) ) AS ;

Applied to our example, here’s the complete solution query with the UNPIVOT operator:

SELECT empid, orderyear, val

FROM Sales.MyPivotedOrders

  UNPIVOT( val FOR orderyear IN ([2014],[2015],[2016]) ) AS U;

Figure 1 shows how the elements that are involved in the UNPIVOT operator correspond to both the operator’s input and output.

Figure 1: UNPIVOT operator

 

Note that the names column (orderyear) is typed as NVARCHAR(128). If you need a different type like INT, cast it.

Figure 2 provides the flow chart describing the logical query processing of the FROM clause, with all four table operators, including the UNPIVOT operator.

 

Figure 2: Logical query processing flow chart - the FROM clause

 

Alternative solution using APPLY

The UNPIVOT operator is designed very elegantly and is very concise, but it has its disadvantages. For one, it doesn’t support (at least in a simple way) unpivoting multiple sets of source columns into multiple target values columns. Imagine for example that you have one set of columns representing order values and another representing order quantities. For another, what if you’re not interested in the removal of the rows with the NULLs in the values column? Following is an alternative solution to unpivoting using the APPLY operator, handling our example task:

 

SELECT empid, orderyear, val
FROM Sales.MyPivotedOrders
  CROSS APPLY ( VALUES(2014, [2014]),
          (2015, [2015]),
          (2016, [2016]) ) AS A(orderyear, val)
WHERE val IS NOT NULL;

The solution uses the CROSS APPLY operator with a table value constructor to define three rows for the three order years, to achieve the effect of generating copies (step 1).

The second column in each of the three rows extracts the element from the source year column that corresponds to the current order year (step 2). You could achieve the first step with a CROSS JOIN, but not the second. Recall that a join perceives its two inputs as a set, and therefore you cannot refer from one side to the elements of the other. Conversely, CROSS APPLY evaluates the inputs from left to right and therefore allows the right input to refer to elements from the left input. Here, to support multiple measures (e.g., val and qty), simply specify more columns in each row.

Finally, since APPLY generated the values column alias (val in our case) in the FROM clause, you can refer to that alias in the WHERE clause if you wish to remove the rows with the NULLs. Here, clearly, the removal of the NULLs is optional.

To demonstrate using this technique to unpivot multiple sets of columns, I’ll use a table called MyPivotedOrders2, which you create and populate by running the following code:


IF OBJECT_ID(N'Sales.MyPivotedOrders2', N'U') IS NOT NULL

  DROP TABLE Sales.MyPivotedOrders2;

-- In SQL Server 2016 use: DROP TABLE IF EXISTS Sales.MyPivotedOrders2;

GO


SELECT empid,

  SUM(CASE WHEN orderyear = 2014 THEN val END) AS val2014,

  SUM(CASE WHEN orderyear = 2015 THEN val END) AS val2015,

  SUM(CASE WHEN orderyear = 2016 THEN val END) AS val2016,

  SUM(CASE WHEN orderyear = 2014 THEN qty END) AS qty2014,

  SUM(CASE WHEN orderyear = 2015 THEN qty END) AS qty2015,

  SUM(CASE WHEN orderyear = 2016 THEN qty END) AS qty2016

INTO Sales.MyPivotedOrders2

FROM Sales.OrderValues

  CROSS APPLY ( VALUES(YEAR(orderdate)) ) AS A(orderyear)

WHERE custid = 5

GROUP BY empid;


ALTER TABLE Sales.MyPivotedOrders2

  ADD CONSTRAINT PK_MyPivotedOrders2 PRIMARY KEY(empid);


SELECT *

FROM Sales.MyPivotedOrders2;


Table 5 shows the contents of MyPivotedOrders2.

Table 5: Contents of Sales.MyPivotedOrders2 table

empid  val2014  val2015  val2016  qty2014  qty2015  qty2016
------ -------- -------- -------- -------- -------- --------
1      NULL     6627.75  NULL     NULL     235      NULL
2      613.20   NULL     NULL     62       NULL     NULL
3      2222.40  2804.19  1835.70  43       181      56
4      NULL     NULL     709.55   NULL     NULL     61
5      NULL     601.83   1096.20  NULL     38       67
8      1488.80  NULL     2048.21  64       NULL     75
9      NULL     3815.25  1064.50  NULL     27       92

You want to create two values columns—from val2014, val2015, val2016 you want to create a target column called val, and from qty2014, qty2015, qty2016 a target column called qty column. As mentioned, for each target measure, simply add in each row in the table value constructor, the column that corresponds to that measure, like so:

SELECT empid, orderyear, val, qty
FROM Sales.MyPivotedOrders2
  CROSS APPLY ( VALUES(2014, [val2014], [qty2014]),
          (2015, [val2015], [qty2015]),
          (2016, [val2016], [qty2016]) ) AS A(orderyear, val, qty)
WHERE val IS NOT NULL OR qty IS NOT NULL;

This query generates the output shown in Table 6.

Table 6: Result of unpivoting with multiple measures

empid  orderyear  val      qty
------ ---------- -------- ----
1      2015       6627.75  235
2      2014       613.20   62
3      2014       2222.40  43
3      2015       2804.19  181
3      2016       1835.70  56
4      2016       709.55   61
5      2015       601.83   38
5      2016       1096.20  67
8      2014       1488.80  64
8      2016       2048.21  75
9      2015       3815.25  27
9      2016       1064.50  92

Combining operators

Remember that you can combine table operators within the FROM clause. From a logical query processing standpoint, they’re evaluated from left to right in written order. This means that one operator’s result because the input to the next. For example, suppose that you wanted to transpose the data from the Sales.MyPivotedOrders table, and instead of showing employees on rows and order years on columns, you want to return order years on rows and employee IDs on columns. Table 7 shows the desired output.

Table 7: Desired result for transpose task

orderyear  1    2    3    4    5    8    9
---------- -------- -------- -------- -------- -------- -------- --------
2014       NULL     613.20   2222.40  NULL     NULL     1488.80  NULL
2015       6627.75  NULL     2804.19  NULL     601.83   NULL     3815.25
2016       NULL     NULL     1835.70  709.55   1096.20  2048.21  1064.50

To achieve this, you first unpivot the data to create a result with a row per employee and order year, and then pivot the result to create a row per order year and a column per employee, like so:

SELECT *
FROM Sales.MyPivotedOrders
  UNPIVOT( val FOR orderyear IN ([2014],[2015],[2016]) ) AS U
  PIVOT( MAX(val) FOR empid IN([1], [2], [3], [4], [5], [8], [9]) ) AS P;

Dynamic UNPIVOT

Last month I showed how to use a technique based on the FOR XML option to construct a PIVOT query and execute it dynamically to avoid needing to hard code the spreading values. In a very similar way you can handle a dynamic unpivoting task. In our example with the N'Sales.MyPivotedOrders table, you can obtain the names of the columns that need to be unpivoted by querying the sys.columns view. You can select all columns besides the empid column. The rest is pretty much the same as what I showed in the dynamic pivoting task. You use the FOR XML technique to create the comma separated list of column names that should appear in the UNPIVOT operator’s IN clause, and concatenate the static parts of the query before and after the list of columns. Finally, you use sp_executesql to execute the code. Here’s the complete solution code:

DECLARE
  @cols AS NVARCHAR(1000),
  @sql  AS NVARCHAR(4000);

-- Construct the column list for the IN clause
SET @cols = STUFF(
  (SELECT N','+ QUOTENAME(name) AS [text()]
   FROM sys.columns
   WHERE object_id = OBJECT_ID(N'Sales.MyPivotedOrders', N'U')
     AND name NOT IN(N'empid')
   ORDER BY name
   FOR XML PATH(''), TYPE).value('.[1]','NVARCHAR(MAX)'),
  1, 1, N'');

-- Construct the full T-SQL statement
-- and execute dynamically
SET @sql = N'SELECT empid, orderyear, val
FROM Sales.MyPivotedOrders
  UNPIVOT(val FOR orderyear IN(' + @cols + N')) AS U;';

EXEC sys.sp_executesql @stmt = @sql;

What’s next?

In this article I focused on the logical query processing of the UNPIVOT table operator. I described the three steps involved in the operator: generate copies, extract element and remove NULLs. I provided the syntax of the operator and tips about how to remember the elements involved in that syntax. I then explained the disadvantages of the operator and provided an alternative solution using the APPLY operator. I also demonstrated combining PIVOT and UNPIVOT operators, and finally showed how to handle unpivoting dynamically. This article concludes the coverage of the logical query processing aspects of the first major querying clause—the FROM clause. Next month I’ll move on to the second major clause—the WHERE clause.

 

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish