Paging is the process of splitting a result set of a query that you're supposed to return to the client into chunks and letting the user navigate through the different pages. You usually use paging when the result set of the query doesn't fit in a single screen. I discuss paging in SQL Server 2000 in detail in "Implementing Paging," December 2004. Here I focus on several important enhancements in SQL Server 2005 Beta 2 that give you a lot of flexibility and power in developing efficient paging solutions. I'll discuss four key paging enhancements: the ability to guarantee the order of assigning IDENTITY values, row numbers, TOP-option enhancements, and the new APPLY operator. Be aware that the features I describe here aren't final, nor does Microsoft guarantee that they'll appear in the final release of SQL Server 2005 in the same form or even at all.
Paging Limitations in SQL Server 2000
SQL Server 2005 includes new features that promise to overcome two SQL Server 2000 limitations that significantly affect paging solutions. The first limitation is that the TOP option in SQL Server 2000 can accept only a constant as an argument. If you're writing routines (stored procedures or user-defined functions-UDFs) that include static T-SQL queries that use the TOP option, you can't accept the page size as an argument to the routine, so you must determine a fixed page size ahead of time and stick to it.
The second limitation is more important. Many programmers implement paging by populating a temporary table with the source table rows, along with row numbers that are generated by an IDENTITY property or function. Those programmers trust that IDENTITY values will be assigned according to the order specified in the ORDER BY clause of the statement that populates the temporary table. However, SQL Server 2000 never guarantees that IDENTITY values are assigned in the order specified in the query's ORDER BY clause. First, the optimizer can create an execution plan in which the IDENTITY values are assigned before sorting occurs. Second, there's no guarantee that the identity-assignment process will be serialized. In a multi-CPU machine, IDENTITY values might be assigned separately to different chunks of the data. (For more information about this limitation, see "IDENTITY() Function Isn't Reliable for Imposing Order on a Result Set," September 2004, and the Microsoft article "INF: How the IDENTITY Function Behaves When It Is Used in SELECT INTO Queries That Have an ORDER BY Clause and a TOP Operator or a SET ROWCOUNT Statement."
Guaranteeing the Order of Assigning IDENTITY Values in SQL Server 2005
As I mentioned, one important enhancement to paging in SQL Server 2005 is that the order of assignment of IDENTITY values is guaranteed. If you specify an ORDER BY clause in an INSERT SELECT or SELECT INTO statement that populates a temporary table, SQL Server 2005 serializes the operation and always generates IDENTITY values after sorting the rows. For example, suppose that you want to page through the rows from the Orders table in the Northwind database, sorted by OrderDate and OrderID. To do so, you first use this code to create and populate the temporary table:
USE Northwind; SELECT IDENTITY(INT, 1, 1) AS RowNum, OrderID+0 AS OrderID, OrderDate, CustomerID, EmployeeID, ShipVia INTO #OrdersRN FROM Orders ORDER BY OrderDate, OrderID; CREATE UNIQUE CLUSTERED INDEX idx_uc_rownum ON #OrdersRN(RowNum);
Then, you use the following code to respond to a request for a certain page (stored in the @pagenum variable) that has a given page size (stored in the @pagesize variable):
DECLARE @pagenum AS INT, @pagesize AS INT; SET @pagenum = 3; SET @pagesize = 10; SELECT * FROM #OrdersRN WHERE RowNum BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize ORDER BY RowNum;
This technique isn't guaranteed to produce correct row numbers in SQL Server 2000 but is guaranteed to produce the correct ones in SQL Server 2005.
The new SQL Server 2005 ROW_NUMBER() function makes the change in IDENTITY-assignment behavior moot and also provides a much more elegant way to calculate row numbers. (I discuss the ROW_NUMBER() function in detail in the Web-exclusive article "Calculating Row Numbers in SQL Server 2005," April 2004, InstantDoc ID 42302.) You can avoid using the IDENTITY property entirely if you substitute the following code for the previous SELECT INTO statement:
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum, OrderID, OrderDate, CustomerID, EmployeeID, ShipVia INTO #OrdersRN FROM Orders;
Typically, applications that provide paging capabilities also let a user dynamically choose filters and sort order for the data. You can easily revise the previous example to support dynamic filters and sorting, which you provide as arguments. To do so, first, create the temporary table that will hold the result set:
CREATE TABLE #OrdersRN ( RowNum INT, OrderID INT, OrderDate DATETIME, CustomerID CHAR(5), EmployeeID INT, ShipVia INT );
Next, construct a dynamic T-SQL statement similar to the following code, which contains the filters and sort arguments, use it to populate the temporary table, and create a clustered index on the column that holds the row numbers:
-- @filter and @sort would typically be arguments to a stored procedure DECLARE @filter AS NVARCHAR(MAX), @sort AS NVARCHAR(MAX); SET @filter = N'ShipVia = 1 AND ShipCountry = N'USA'' SET @sort = N'OrderDate, OrderID' DECLARE @sql NVARCHAR(MAX); SET @sql = N'SELECT ROW_NUMBER() OVER(ORDER BY ' + COALESCE(@sort, N'NULL') + N') AS RowNum,' + N' OrderID, OrderDate, CustomerID, EmployeeID, ShipVia' + N' FROM Orders' + COALESCE(N' WHERE ' + @filter, N'); INSERT INTO #OrdersRN EXEC sp_executesql @sql; CREATE UNIQUE CLUSTERED INDEX idx_uc_rownum ON #OrdersRN(RowNum);
Finally, return the desired page requests by querying the temporary table, calculating the target row numbers according to the @pagenum and @pagesize arguments:
-- @pagenum and @pagesize would typically be arguments to a stored procedure DECLARE @pagenum AS INT, @pagesize AS INT; SET @pagenum = 3; SET @pagesize = 10; SELECT * FROM #OrdersRN WHERE RowNum BETWEEN (@pagenum-1)*@pagesize+1 AND @pagenum*@pagesize ORDER BY RowNum;
Using the temporary table technique, which contains all source rows along with row numbers, allows flexible, scrollable, dynamic paging and also returns pages quickly. However, the temporary table consumes a lot of tempdb resources, especially when the source table contains many rows, and also is overkill if a user requests only a small number of pages. SQL Server 2005 includes other enhancements that let you develop solutions that consume far fewer tempdb resources.
Using TOP, APPLY, and Page Numbers
Another way to handle paging in SQL Server 2005 is to write a function that accepts as arguments the sort-column values of the first row of a desired page (@orderdate, @orderid) and the desired page size (@pagesize). I've written a sample function, which returns the result of a SELECT statement that returns the @pagesize number of rows, starting with the row that contains the given sort-column values. This function also uses the SQL Server 2005 TOP enhancement, which lets you provide a variable as an argument instead of a constant. To see how the sample function works, run the following code in the Northwind database to create the function fn_getorderspage:
CREATE FUNCTION fn_getorderspage (@orderid AS INT, @orderdate AS DATETIME, @pagesize AS INT) RETURNS TABLE AS RETURN SELECT TOP(@pagesize) OrderID, OrderDate, CustomerID, EmployeeID, ShipVia FROM Orders WHERE OrderDate > @OrderDate OR (OrderDate = @OrderDate AND OrderID >= @OrderID) ORDER BY OrderDate, OrderID;
The function returns the TOP @pagesize rows of orders that have an OrderDate value greater than the given OrderDate, or an equal OrderDate and a greater than or equal to OrderID value, sorted by OrderDate and OrderID. In other words, it returns the TOP @pagesize orders, starting with a row that has the given OrderDate and OrderID values, according to the OrderDate and OrderID sort. To test the function, invoke it with the order ID 10248, order date '19960704', and a page size of 10 rows:
SELECT * FROM fn_getorderspage(10248, '19960704', 10) AS P;
Table 1 shows the result of this query. Next, run the following code, which creates a function called fn_orderspages that accepts a page size as an argument and returns the page numbers, order dates, and order IDs of the first row of each page:
CREATE FUNCTION fn_orderspages(@pagesize AS INT) RETURNS TABLE AS RETURN SELECT (RowNum - 1) / @pagesize + 1 AS PageNum, OrderID, OrderDate FROM (SELECT OrderID, OrderDate, ROW_NUMBER() OVER(ORDER BY OrderDate, OrderID) AS RowNum FROM Orders) AS D WHERE RowNum % @pagesize = 1;
The query in the function calculates row numbers for all orders in the Orders table sorted by order date and order ID. The query calculates page numbers by using the formula PageNum = (RowNum-1) / @pagesize + 1 and filters only the first row in each page by checking that the remainder of dividing the row number by the page size equals 1.
To test the function, run the following code and examine the abbreviated result, which Table 2 shows:
SELECT * FROM fn_orderspages(10) AS P;
The Orders table contains 830 rows, and a page size of 10 was specified as the function's argument. Thus, the result of running the above code contains 83 rows, each of which represents the first row of each page.
To start scrolling through pages, the client application should first submit the following code, which populates a temporary table with the function's result, then indexes it by page number:
SELECT * INTO #OrdersPages FROM fn_orderspages(10) AS F; CREATE UNIQUE CLUSTERED INDEX idx_uc_pagenum ON #OrdersPages(PageNum);
To satisfy a page request (say, page 3), the client application submits this code:
SELECT O.* FROM #OrdersPages AS P CROSS APPLY fn_getorderspage(P.OrderID, P.OrderDate, 10) AS O WHERE P.PageNum = 3;
The new CROSS APPLY operator invokes the specified table-valued function for each row of the outer table after it applies the filter in the WHERE clause and provides the outer table's columns as arguments. Because we filtered only one row from the #OrdersPages table-the one holding the row of the third page-the table UDF was invoked only once. Table 3 shows the result of the previous query.
If you want to return more than one page, say the first, fifth, and seventh pages, simply specify the list of page numbers in the query's filter, like this:
SELECT P.PageNum, O.* FROM #OrdersPages AS P CROSS APPLY fn_getorderspage(P.OrderID, P.OrderDate, 10) AS O WHERE P.PageNum IN(1, 5, 7);
Table 4 shows the result-the three requested pages.
By using the technique I just described, I've demonstrated several important enhancements in SQL Server 2005. However, you should be aware that although the technique is more efficient than the technique I discussed in the previous section for handling large volumes of data when you need only a small number of pages, it's much less flexible. This technique relies on static queries and thus applies to cases in which the filters and sorting are static (i.e., known ahead), which aren't typical. If you need to support dynamic filtering and sorting, you'll have to use dynamic T-SQL both to create the functions and to construct the queries that use the functions. Also, this technique assumes that no additions or deletions of rows are made within the data you're querying (e.g., when querying static historic data), which also isn't typical.
Faster, More Flexible, and Guaranteed to Work
SQL Server 2000 imposes many limitations on developers who want to implement paging solutions. The most serious limitation is probably that paging solutions aren't guaranteed to work correctly because of the IDENTITY problem I discussed earlier. Besides assuring the order of assignment of IDENTITY values, SQL Server 2005 gives you several T-SQL enhancements that make your life much easier and let you develop faster, more flexible solutions that are guaranteed to work correctly.