Last month I started a two-part series about the OFFSET/FETCH filter that’s introduced in the next version of SQL Server 2012 (formerly code-named Denali). I described the fundamentals of the feature, including its syntax and optimization. This month I cover additional aspects of OFFSET/FETCH, such as the use of the filter in table expressions, logical query processing, modifications, determinism, and randomization. You can use the code in Listing 1 below to create the sample data used in this article. This is the same sample data used last month, so if you already created it, you don’t need to run the code again.
SET NOCOUNT ON; USE tempdb; IF OBJECT_ID('dbo.Orders' , 'U' ) IS NOT NULL DROP TABLE dbo.Orders; IF OBJECT_ID('dbo.Customers' , 'U' ) IS NOT NULL DROP TABLE dbo.Customers; IF OBJECT_ID('dbo.SeqOrderIDs', 'SO') IS NOT NULL DROP SEQUENCE dbo.SeqOrderIDs; IF OBJECT_ID('dbo.GetNums' , 'IF') IS NOT NULL DROP FUNCTION dbo.GetNums; GO CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE AS RETURN WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5) SELECT n FROM Nums ORDER BY n OFFSET 0 ROWS FETCH FIRST @n ROWS ONLY; GO CREATE TABLE dbo.Customers ( custid INT NOT NULL, CONSTRAINT PK_Customers PRIMARY KEY(custid) ); INSERT INTO dbo.Customers WITH (TABLOCK) (custid) SELECT n AS custid FROM dbo.GetNums(50000) AS Nums; CREATE SEQUENCE dbo.SeqOrderIDs AS INT START WITH 1 INCREMENT BY 1; CREATE TABLE dbo.Orders ( orderid INT NOT NULL DEFAULT (NEXT VALUE FOR dbo.SeqOrderIDs), custid INT NOT NULL, orderdate DATE NOT NULL, filler BINARY(200) NOT NULL DEFAULT (0x43112609), CONSTRAINT PK_Orders PRIMARY KEY(orderid), CONSTRAINT FK_Orders_Customers FOREIGN KEY(custid) REFERENCES dbo.Customers(custid) ); CREATE UNIQUE INDEX idx_unc_od_oid ON dbo.Orders(orderdate, orderid); INSERT INTO dbo.Orders WITH (TABLOCK) (custid, orderdate) SELECT CAST(RAND(n*101) * 50000 AS INT) + 1 AS custid, DATEADD(day, RAND(n*211)*365*3, '20090101') AS orderdate FROM dbo.GetNums(1000000) AS Nums;
OFFSET/FETCH and Table Expressions
I mentioned last month (“OFFSET/FETCH, Part 1”) that there are some similarities between the conceptual designs of the OFFSET/FETCH filter and the TOP filter. One of those similarities is that both filters are connected to the ORDER BY clause that traditionally served a presentation ordering meaning. The upside regarding this similarity is that if you worked with TOP, you will very quickly get the hang of working with OFFSET/FETCH. The downside is that the confusing aspects of the TOP design that are related to ordering are also applicable to OFFSET/FETCH.
You might have stumbled into cases in the past in which an attempt to define a view (or any other type of table expression for that matter) was rejected by SQL Server when an ORDER BY clause was specified in the query. For example, the following isn’t allowed:
CREATE VIEW dbo.MyView AS SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC, orderid DESC; GO
The reason for this restriction stems from the relational model that’s the foundation of SQL, which in turn is the foundation of T-SQL. A view in SQL is supposed to represent a table; a table is an attempt (never mind how successful) at representing a relation from the relational model. A relation is supposed to represent a set from mathematical set theory. A set has no order. So as you can see, an ordered view is by definition a concept that contradicts the relational model, and therefore as a rule, both standard SQL and SQL Server disallow ORDER BY in a query that defines a table expression.
To allow TOP queries in a table expression definition, SQL Server makes an exception to this rule when TOP is also specified. But here’s the tricky part that many people fail to realize: The ordering is guaranteed only for the immediate level that contains the ORDER BY clause—not for an outer level. In other words, TOP will perform the correct filtering based on the specified ORDER BY clause, but the outer query against the table expression gives no presentation ordering guarantees unless you also specify an ORDER BY clause there.
The same situation is applicable to the OFFSET/FETCH clause. That is, SQL Server lets you define a table expression based on a query with an ORDER BY clause as long as OFFSET/FETCH is also specified, like so:
CREATE VIEW dbo.MyView AS SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC, orderid DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY; GO
However, you should remember that a query against the table expression gets no presentation ordering guarantees unless you also specify an ORDER BY clause in the outer query. This is fully supported by standard SQL, as the ISO SQL:2008 standard points out:
“A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of an expression, such as a <query expression> or <table expression>. A <query expression> can contain an optional <order by clause>. The ordering of the rows of the table specified by the <query expression> is guaranteed only for the <query expression> that immediately contains the <order by clause>.”
Note that the OFFSET/FETCH filter isn’t supported in indexed views or views with CHECK OPTION.
The ability to specify OFFSET/FETCH in a query that defines a table expression becomes handy when using set operations such as UNION, EXCEPT, and INTERSECT. Set operations allow an ORDER BY clause that applies to the result of the set operation—but not directly in the input queries to the set operation, because by definition it’s a set operation. Because the OFFSET/FETCH filter is tied to the ORDER BY clause, this means that you can’t specify this filter directly in the input queries. However, you can use table expressions as a workaround, like so:
WITH Set1 AS ( SELECT orderid, orderdate, custid, filler FROM dbo.Orders WHERE custid = 1 ORDER BY orderdate DESC, orderid DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY ), Set2 AS ( SELECT orderid, orderdate, custid, filler FROM dbo.Orders WHERE custid = 2 ORDER BY orderdate DESC, orderid DESC OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY ) SELECT * FROM Set1 UNION ALL SELECT * FROM Set2;
Logical Query Processing
In terms of evaluation order of query clauses, the OFFSET/FETCH clause is processed right after the ORDER BY clause—or, as I mentioned earlier, you can consider OFFSET/FETCH as actually being part of the ORDER BY specification. Generally, the various query clauses are evaluated in the following order:
- GROUP BY
- ORDER BY
Note that the ORDER BY clause is evaluated last, after SELECT. This means that the expressions in the SELECT list are evaluated before the OFFSET/FETCH filtering and not the other way around. To see why it’s important to understand this fact, consider the following query:
SELECT orderid, orderdate, custid, filler, ROW_NUMBER() OVER(ORDER BY orderdate DESC, orderid DESC) AS rownum FROM dbo.Orders ORDER BY orderdate DESC, orderid DESC OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;
Because the row numbers are assigned to the result rows before the filtering, the result set will contain row numbers 21 through 30, not 1 through 10. If you need the result rows to be assigned with row numbers 1 through 10, you need to use a table expression, like so:
WITH C AS ( SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC, orderid DESC OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY ) SELECT *, ROW_NUMBER() OVER(ORDER BY orderdate DESC, orderid DESC) AS rownum FROM C;
OFFSET/FETCH and Modifications
As I mentioned earlier, although the TOP filter is allowed with modification statements, the OFFSET/FETCH filter isn’t—at least not directly. The reason for the lack of support for the OFFSET/FETCH filter in modification statements is that this filter is tied to the ORDER BY clause, and modification statements don’t have an ORDER BY clause. By the way, even though TOP is allowed with modification statements, this capability has limited usability because of the lack of support for ORDER BY. At any rate, there’s a simple workaround: Define a table expression based on a SELECT query with ORDER BY and OFFSET/FETCH, and issue your modification statement through the table expression, like so (don’t run this code):
WITH C AS ( SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate, orderid OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY ) DELETE FROM C;
This code deletes the 10 oldest orders in the system.
Nondeterminism and Arbitrary and Random Order
When using the OFFSET/FETCH filter with an ORDER BY list that uniquely identifies a row, the query is said to be deterministic—meaning that there’s only one correct result. SQL Server lets you use a nonunique ORDER BY list, but then you need to realize that the query is nondeterministic. For example, consider the following query:
DECLARE @pagenumber AS INT = 6, @pagesize AS INT = 10; SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC OFFSET (@pagenumber - 1) * @pagesize ROWS FETCH NEXT @pagesize ROWS ONLY;
The orderdate column isn’t unique; therefore, the query is nondeterministic. Technically, if you run the query again, there’s the possibility that it will return a different result set even without any changes in the underlying data (although this is unlikely because of optimization). Moreover, suppose you issue a subsequent query asking for the next page of 10 rows:
DECLARE @pagenumber AS INT = 7, @pagesize AS INT = 10; SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC OFFSET (@pagenumber - 1) * @pagesize ROWS FETCH NEXT @pagesize ROWS ONLY;
Technically, there’s the possibility that the same row will appear in both result sets even without any changes in the underlying data. To avoid such a problem, you can guarantee determinism by adding a tiebreaker to the ORDER BY list, making it unique. If you have no special preference for which tiebreaker to use, you can always use the primary key, like so:
SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY orderdate DESC, orderid DESC OFFSET (@pagenumber - 1) * @pagesize ROWS FETCH NEXT @pagesize ROWS ONLY
Note that the OFFSET/FETCH filter doesn’t support a logical equivalent to TOP’s WITH TIES option, which ensures determinism by returning all ties.
What’s also interesting about OFFSET/FETCH is that although the ORDER BY clause doesn’t have to be unique, in which case the query becomes nondeterministic, this filter does require an ORDER BY clause to exist. So if you’re looking for completely arbitrary order, you can’t omit the ORDER BY clause. Try running the following code, and you’ll get an error:
SELECT orderid, orderdate, custid, filler FROM dbo.Orders OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;
There’s a workaround, though. Specify an ORDER BY clause, but use a subquery that returns a constant as the only expression in the ORDER BY list, like so:
SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY (SELECT NULL) OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;
This way you don’t pay any performance penalties associated with sorting, or force an ordered scan of an index. Note, though, that this ordering shouldn’t be considered random, but rather arbitrary. That is, there’s some likelihood (although no assurance) because of optimization that repeated invocations of this code will keep obtaining the same rows if the underlying data doesn’t change between invocations.
Random order is a totally different thing. One way to achieve random order is to order by the expression CHECKSUM(NEWID()), like so:
SELECT orderid, orderdate, custid, filler FROM dbo.Orders ORDER BY CHECKSUM(NEWID()) OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;
The NEWID function returns a different GUID in each row. It has fairly random distribution, but with CHECKSUM applied to it, the result has better random distribution. The downside of this approach is that the entire input set (a million rows in our case) needs to be scanned and sorted. With large input tables, a better performing workaround is to use the TABLESAMPLE option. This option, with its default (and currently only) mode SYSTEM, uses the same algorithm that SQL Server uses to sample data to create distribution statistics. It physically scans only those pages that were chosen by the randomization algorithm. For example, here’s how you’d pick 10 random rows using OFFSET/FETCH out of roughly 100 that the TABLESAMPLE option would return:
SELECT orderid, orderdate, custid, filler FROM dbo.Orders TABLESAMPLE(100 ROWS) ORDER BY CHECKSUM(NEWID()) OFFSET 0 ROWS FETCH FIRST 10 ROWS ONLY;
I specified a larger number of rows as input to TABLESAMPLE than the number of rows that I eventually need to return. That’s because TABLESAMPLE will calculate a randomization factor based on the number of requested rows divided by the table’s cardinality, then apply that randomization factor to each page irrespective of the other pages. So there’s a certain possibility that fewer rows than the requested ones will be returned. The more rows you ask for by TABLESAMPLE, the probability that you’ll get at minimum the requested number of rows in the fetch_value naturally becomes greater and greater.
Work Around the Limitations
In this article I explained how to use OFFSET/FETCH filter with table expressions, how to work around the fact that you can’t use this filter with modifications statements, and the implications of the logical query processing phase in which this filter is evaluated. I also explained nondeterminism and arbitrary and random order.
The OFFSET/FETCH feature is very user friendly in the sense that its syntax is straightforward and intuitive. But as I explained, this feature has several restrictions and limitations, mainly because its design connects it to the ORDER BY clause. As long as you’re aware of the background for the limitations (having to do with the mathematical foundations of SQL), you can understand the reasoning. I provided workarounds to these limitations, which as you saw are quite straightforward.