Avoid Unnecessary Lookups when Using ROW_NUMBER for Paging

A common way to implement paging solutions is to use the ROW_NUMBER function. However, the typical solution has an inherent inefficiency. I'll describe this inefficiency and provide a tip to solve it. I'll use the Orders table in the PerformanceV3 database in my examples. You can download the source code to create the sample database PerformanceV3.

Suppose that you need to return to the user information about orders based on orderid ordering. The user provides as inputs the page number (@pagenum) and the page size (@pagesize). You compute row numbers based on orderid ordering and then return the range of row numbers based on the inputs: rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize. Here's the complete solution code, using a large page number on purpose to emphasize and exaggerate the performance problem:

USE PerformanceV3; -- http://tsql.solidq.com/books/source_code/PerformanceV3.zip
GO

DECLARE @pagenum AS INT = 1000, @pagesize AS INT = 25;

WITH C AS
(
  SELECT ROW_NUMBER() OVER(ORDER BY orderid) AS rownum,
    orderid, orderdate, custid, empid, filler
  FROM dbo.Orders
)
SELECT orderid, orderdate, custid, empid, filler
FROM C
WHERE rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize;

Figure 1 shows the query plan for this solution.

You have a nonclustered, noncovering index called PK_Orders defined on the Orders table, with orderid as the key. Fortunately, the optimizer is smart enough to choose a plan that scans only as many rows in the index as the upper row number in the desired range. So, with a page number of 1,000 and a page size of 25, the plan scans the first 25,000 rows in the index and stops. Then it filters only the last 25 rows. The unfortunate thing is that the plan also performs 25,000 lookups instead of just 25, which results in many unnecessary lookups. With a request for page number 1,000, you get 76,978 reads. Granted, users don't usually get that far, but the farther they do get, the more unnecessary lookups are performed.

A simple trick that prevents the unnecessary lookups is to have the inner query return only the key that appears in the index. Then, have the outer query perform a join between the CTE and the table to obtain the rest of the information. Here's the complete solution code implementing this trick:

DECLARE @pagenum AS INT = 1000, @pagesize AS INT = 25;

WITH C AS
(
  SELECT ROW_NUMBER() OVER(ORDER BY orderid) AS rownum, orderid
  FROM dbo.Orders
)
SELECT C.orderid, O.orderdate, O.custid, O.empid, O.filler
FROM C
  INNER JOIN dbo.Orders AS O
    ON C.orderid = O.orderid
WHERE rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize;

Figure 2 shows the plan for this solution.

Observe that this time you get only 25 seeks in the index on orderid, followed by 25 lookups. This plan performs only 226 reads, thanks to the fact that the filter is applied in the plan before the seeks and the lookups.

A very similar problem exists when using the OFFSET-FETCH filter for paging. The solution is also based on the same trick.

Comments

Plain text