SQL Server 2005's Apply, Part 1

When a new version of SQL Server is released, you can typically see the benefits of any newly introduced T-SQL features fairly quickly, as well as how you can use them. But some features take longer to discover, and it can be quite a while before people understand their true value and what you can do with them. Such is the APPLY operator that was added in SQL Server 2005. Initially, this feature seemed like just an ordinary language element with a couple of uses—but as time went by, people found very creative and interesting uses for it. This article is the first in a two-part series. In Part 1, I describe the fundamentals of APPLY, using classic examples for demonstration purposes. In Part 2, I’ll cover a few more creative uses that go beyond the fundamentals.

I’d like to thank David Long, a SQL Server Magazine reader who suggested that I write an article about APPLY. Although I’ve touched on specific aspects and uses of APPLY in the past in my blog, this operator certainly deserves more extensive, focused coverage.

CROSS, OUTER, and Implicit APPLY

As I mentioned, I’ll start by describing the fundamentals of the APPLY operator. The examples in this section are against a sample database called InsideTSQL2008 that you can download. After you create the sample database, run the following code to add a shipper to the Shippers table and an index to the Orders table:

SET NOCOUNT ON;
USE InsideTSQL2008;

SET IDENTITY_INSERT Sales.Shippers ON;
INSERT INTO Sales.Shippers(shipperid, companyname, phone)
  VALUES(4, N'Shipper 00004', N'(243) 112-6091')
SET IDENTITY_INSERT Sales.Shippers OFF;

CREATE UNIQUE INDEX idx_sid_od_oid_i_eid_cid
  ON Sales.Orders(shipperid, orderdate, orderid)
  INCLUDE(empid, custid);

APPLY is a table operator. It applies a table expression you provide as its right input to each row from the table you provide as its left input. If this doesn’t make sense yet, don’t worry; often with such language elements, things make more sense when you see an example that demonstrates the use of the feature. But before I show an example, I want to mention that standard SQL supports a similar feature—however, this feature is called LATERAL correlation rather than APPLY. Back to APPLY and an example; suppose that you need to write a query that returns the two most recent orders for shipper 1. You write the following TOP query:

SELECT TOP (2) orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipperid = 1
ORDER BY orderdate DESC, orderid DESC;

But what if you need to return the two most recent orders for each of the shippers in the Shippers table? More generally, this problem is known as the TOP N per Group problem—and in our case, N is equal to 2, and the group, or partitioning, element is the shipper. The TOP filter doesn’t support a partitioning option. What you basically want to do is somehow invoke the TOP query that you wrote for a specific shipper against each of the shippers in the Shippers table. Of course, you could define a cursor against Shippers, iterating through the shippers one at a time, but that’s far from being an ideal solution. That’s exactly where the APPLY operator can help you. You provide the Shippers table as the left input to APPLY and a correlated derived table based on the previous TOP query as the right input, and instead of referring to a constant shipper ID in the query filter, you correlate the order’s shipper ID by the shipper’s shipper ID, like so:

SELECT S.shipperid, S.companyname, A.*
FROM Sales.Shippers AS S
  CROSS APPLY
    (SELECT TOP (2) orderid, orderdate, empid, custid
     FROM Sales.Orders AS O
     WHERE O.shipperid = S.shipperid
     ORDER BY orderdate DESC, orderid DESC) AS A;

Figure 1 shows the output of this query.

shipperid   companyname	    orderid	orderdate		 empid	 custid
---------   -------------   --------	-----------------------	 ------	 -------
1	    Shipper GVSUA   11071	2008-05-05 00:00:00.000	 1	 46
1	    Shipper GVSUA   11070	2008-05-05 00:00:00.000	 2	 44
2	    Shipper ETYNR   11077	2008-05-06 00:00:00.000	 1	 65
2	    Shipper ETYNR   11076	2008-05-06 00:00:00.000	 4	 9
3	    Shipper ZHISN   11061	2008-04-30 00:00:00.000	 4	 32
3	    Shipper ZHISN   11058	2008-04-29 00:00:00.000	 9	 6

The left table (Shippers) is aliased as S and the inner table (Orders) is aliased as O. Therefore, to correlate orders to the current shipper, the inner query that’s used to define the correlated derived table A uses the predicate O.shipperid = S.shipperid. Observe in the query that in the SELECT list you’re allowed to refer to attributes from both the left side (identified by S), and the right side (identified by A). Also observe that this query uses the CROSS type of the APPLY operator. With CROSS APPLY, if a row from the left side doesn’t find a match in the right side, the left row isn’t returned—much like in an inner join. There are four shippers at the moment in the Shippers table, with IDs 1, 2, 3, and 4, but only the first three have related orders. Hence, the output of the query that Figure 1 shows doesn’t include shipper 4.

In terms of optimization, APPLY excels when the partitioning element (shipperid in our case) is dense (small number of shippers, each with a fairly large number of orders) and when you arrange a good index to support the APPLY activity. The indexing guidelines follow a rule that you can memorize as POC—for Partitioning, Ordering, and Coverage. You want the index key list to start with the partitioning column (shipperid in our case), followed by the ordering columns (orderdate, orderid in our case), and include the covered columns (empid, custid in our case). If the index is clustered, you should care only about the PO parts in the key list because the index leaf rows contain all attributes anyway. You created an index called idx_sid_od_oid_i_eid_cid, following our POC guidelines earlier. With this index in place, Figure 2 shows the plan that you get for our CROSS APPLY query.

Figure 2: Plan for Top N per Group query

Observe that the plan first scans the clustered index of the Shippers table to obtain the shipper IDs. Then in a loop that iterates once per shipper, the plan performs an Index Seek operation in our POC index. The seek goes to the end of the current shipper segment in the index leaf and scans backwards two rows. Those two rows hold the information about the current shipper’s two most recent orders. This plan is very efficient when the partitioning element is dense.

By the way, SQL Server Denali introduces a standard alternative to TOP, called OFFSET/FETCH. With the new option, the code returning the two most recent orders for each shipper looks like this:

SELECT S.shipperid, S.companyname, A.*
FROM Sales.Shippers AS S
  CROSS APPLY
    (SELECT orderid, orderdate, empid, custid
     FROM Sales.Orders AS O
     WHERE O.shipperid = S.shipperid
     ORDER BY orderdate DESC, orderid DESC
     OFFSET 0 ROWS FETCH FIRST 2 ROWS ONLY) AS A;

For more information about OFFSET/FETCH, see “OFFSET/FETCH, Part 1.”

Back to our CROSS APPLY query with the TOP option, remember that the right input to the APPLY operator is a table expression. In all examples so far, I used a correlated derived table—but you can also use a table function. In fact, the official documentation about APPLY discusses its use with table functions and not with other table expressions. The fact that you can use functions is very convenient, because you can encapsulate the derived table query in an inline table function and thus simplify the solution by improving its readability and maintainability. As an example, run the code in Listing 1 to encapsulate the TOP query from the previous examples in a function called TopShipperOrders.

IF OBJECT_ID('Sales.TopShipperOrders', 'IF') IS NOT NULL
  DROP FUNCTION Sales.TopShipperOrders;
GO
CREATE FUNCTION Sales.TopShipperOrders(@sid AS INT, @n AS BIGINT) RETURNS TABLE
AS
RETURN
  SELECT TOP (@n) orderid, orderdate, empid, custid
  FROM Sales.Orders
  WHERE shipperid = @sid
  ORDER BY orderdate DESC, orderid DESC;
GO

The function accepts a shipper ID and a number of rows as two inputs and returns that many of the requested shipper’s most recent orders. Then, to return the two most recent orders for each shipper, instead of using a correlated derived table as the right input, you can use the new function, like so:

SELECT S.shipperid, S.companyname, A.*
FROM Sales.Shippers AS S
  CROSS APPLY Sales.TopShipperOrders(S.shipperid, 2) AS A;

Remember that CROSS APPLY won’t return left rows for which the right table expression returns an empty set. So in our case, as you can see in the output of the CROSS APPLY query in Figure 1, shipper 4 isn’t returned. If you want to return left rows that have no matches in the right side, you need to use the OUTER version of APPLY instead of the CROSS version, like so:

SELECT S.shipperid, S.companyname, A.*
FROM Sales.Shippers AS S
  OUTER APPLY Sales.TopShipperOrders(S.shipperid, 2) AS A;

Figure 3 shows the output of this query. This time you can see a row for shipper 4, with NULLs used as placeholders for the attributes from the right side.

shipperid   companyname     orderid	orderdate		empid	custid
---------   -------------   --------	-----------------------	------	-------
1	    Shipper GVSUA   11071	2008-05-05 00:00:00.000	1	46
1	    Shipper GVSUA   11070	2008-05-05 00:00:00.000	2	44
2	    Shipper ETYNR   11077	2008-05-06 00:00:00.000	1	65
2	    Shipper ETYNR   11076	2008-05-06 00:00:00.000	4	9
3	    Shipper ZHISN   11061	2008-04-30 00:00:00.000	4	32
3	    Shipper ZHISN   11058	2008-04-29 00:00:00.000	9	6
4	    Shipper 00004   NULL	NULL			NULL	NULL

With the introduction of the APPLY operator in SQL Server 2005, you can now refer to a table function in a subquery and pass a column from the outer table as input. Even though you’re not using the APPLY operator explicitly, you’re applying the table function to each row from the outer table implicitly. Therefore, I like to think of this capability as implicit APPLY. As an example, the following query returns, for each shipper, the order ID of its second most recent order:

SELECT S.shipperid, S.companyname,
  (SELECT TOP (1) orderid
   FROM Sales.TopShipperOrders(S.shipperid, 2) AS A
   ORDER BY orderdate, orderid) AS secondkey
FROM Sales.Shippers AS S;

Figure 4 shows the output of this query.

shipperid	companyname	secondkey
----------	--------------	-----------
1		Shipper GVSUA	11070
2		Shipper ETYNR	11076
3		Shipper ZHISN	11058
4		Shipper 00004	NULL

This implicit APPLY capability is new to SQL Server 2005. When you’re done, run the following code for cleanup:

DELETE FROM Sales.Shippers WHERE shipperid > 3;
DROP INDEX idx_sid_od_oid_i_eid_cid ON Sales.Orders;

First Fundamentals, Then Beyond

In this article, I covered the fundamentals of the APPLY operator. I explained that this operator accepts two table expressions as input and applies the right table expression to each row from the left. This operator allows correlations from the right table expression to refer to attributes from the left side. I described three forms of APPLY: CROSS APPLY, OUTER APPLY, and implicit APPLY. I explained that with CROSS APPLY, if a left table gets an empty set back from the right table expression, the left row isn’t returned either. If you want to preserve all rows from the left side, you need to use the OUTER APPLY operator. Finally, implicit APPLY allows querying a table function in a subquery and passing attributes from the outer table as input. I also explained (and demonstrated) that even though the official documentation for APPLY says that you can use table functions only as the right input, you can actually also use derived tables.

So much for the fundamentals of APPLY—as I mentioned, since the introduction of APPLY in SQL Server 2005, people have found very interesting and creative uses for the operator that go well beyond the fundamentals. I’ll cover several such uses in Part 2.

Comments

Plain text