Implementing Paging

When you write applications that return query results to the client and display the results on the screen, you need to take into account the end user's screen-size limitations. If a query's result set contains too many rows to fit on the screen, you have to introduce logic into the application to split the original result set into chunks or pages. You then provide buttons or other graphical elements that let the user navigate between the result pages. The process of splitting the data into chunks, called paging, is very common, especially with Web applications. Let's look at some efficient ways to achieve paging, using two navigation options and paging techniques, then consider a commonly used technique that isn't guaranteed to work as you might expect.

Orders Scenario

When you're planning to implement paging in your application, you need to determine three things. First, what sort order do you need for the rows you'll return? Second, how many rows do you want to return per page? And third, what navigation technique will you provide for users to page through the results?

The sort order and the page size (i.e., the number of rows in a page) you choose determine which rows appear on which page. My examples use the Northwind database's Orders table. For this article's examples, I use a sort order based on OrderDate and OrderID and a page size of 10 rows. Note that sorting on unique columns is important. Without a unique sort, the relationship of the source row to the target page isn't deterministic, meaning that you can't determine for sure which page a row belongs to. Let's say you want to return the orders sorted chronologically. Because the values in the OrderDate column don't have to be unique, you need to choose a "tiebreaker" column to sort on secondarily that will determine the order of rows that have the same order date. I chose the OrderID column as the tiebreaker because it's unique.

How you choose to set up navigation between pages depends on the application users' needs. If the users need to navigate only from the current page to either the next or the previous page, you can implement a simple, efficient solution. But if users need more dynamic capabilities, such as navigating to a specific page number, you need to implement a solution that lets you filter the rows of a desired page based only on two inputs: page size and page number. Let's start by looking at navigation between adjacent pages.

Navigating to Adjacent Pages

To allow navigation to adjacent pages, you need to implement code that performs three tasks: get the first page, get the next page, and get the previous page. I recommend that you use stored procedures, one for each task. Using stored procedures gives you the flexibility to change your implementation if you find a more efficient paging method later without needing to change the client application.

Run Listing 1's code in the Northwind database to create the stored procedure that returns the first page of the Orders table. The stored procedure's code is simple. The query retrieves the top 10 rows, based on an ascending sort by order date and order ID. Run the following code to test the stored procedure:

EXEC dbo.usp_GetFirstPage;

As Figure 1 shows, this code returns the first page of Orders.

The stored procedure that returns the next page after the current one accepts two input arguments: the order date and order ID of the last row on the previously returned page. In the client application, you need to store those values locally while processing the call to the current page.

Run the code that Listing 2, shows to create the usp_GetNextPage stored procedure, which returns the page after the current one. The query filters to find the rows that have either a later order date than the order date of the last row on the previous page or the same order date and a higher order ID. The query returns the top 10 rows based on an ascending sort by order date and order ID.

Assuming the user has already requested the first page, the client application locally stored the values of order date and order ID from the last row on the first page returned. When the user requests the second page, the client application submits the following code to invoke usp_GetNextPage:

EXEC dbo.usp_GetNextPage @last_orderdate = '19960716', @last_orderid   = 10257;

The procedure returns the second page of Orders, as the result in Figure 2 shows. The client application now locally stores the order date and order ID of the last row on this page in anticipation of sending those values as input arguments if the user requests the next page. Similarly, the client application also locally stores the order date and order ID of the first row returned on this page, which it will use if the user requests the previous page.

The implementation of the stored procedure that returns the previous page is similar to the implementation of usp_GetNextPage, with a few changes. First, you need to reverse the direction of the operators—instead of greater than (>), use less than (

Run the code that Listing 3 shows to create the usp_GetPrevPage stored procedure. To test the stored procedure, execute it as follows, providing the order date and the order ID of the first row returned on the second page:

EXEC dbo.usp_GetPrevPage @first_orderdate = '19960717', @first_orderid = 10258;

You get the first page (Figure 1) back again.

Navigating to Non-Adjacent Pages

If you need to provide users with more flexible navigation capabilities that let them request specific pages regardless of the last page they requested (e.g., page 3, page 10, then page 15), you need to implement a different solution—one that can filter the rows of a requested page when given only the page number and page size as inputs. You can implement a client-side solution or a server-side solution. In a client-side solution, you retrieve all table rows sorted and calculate row numbers by incrementing a counter. You cache all table rows, then respond to a page request by calculating the range of row numbers that you need to return. The first row of a page is ((page number-1) x page size) + 1, and the last row is page number x page size. For example, page 3 with a page size of 10 rows contains the rows 21 through 30. The drawbacks of a client-side solution are that it requires a lot of memory resources at the client and it causes a lot of network traffic and wasted resources when the user doesn't need to access more than a few pages.

In a server-side solution, as in a client-side solution, you calculate row numbers (with sequential integers representing the desired sort) and store the row numbers in a temporary table along with the primary keys of the rows. The client application calculates the range of row numbers of the page the user requests, then submits a query that joins the temporary table and the base table to get all the requested attributes (e.g., order date, customer ID). In the temporary table, you can alternatively store all the attributes of interest to the user, not just the row numbers and the primary keys, thereby removing the need for a join. However, storing all attributes in the temporary table requires a lot of tempdb resources, and you need to determine whether you have those to spare.

Let's start finding a solution to a server-side paging need by calculating row numbers for the different orders and storing them along with the order IDs in a temporary table. Many programmers use a technique to calculate row numbers that isn't guaranteed to always work as you expect. The following code (don't run it!) represents a typical attempt to calculate row numbers for each order ID, based on order date and order ID order:

SELECT IDENTITY(int, 1, 1) AS 
	RowNum, OrderID+0 AS OrderID
INTO dbo.#OrdersRN
FROM dbo.Orders
ORDER BY OrderDate, OrderID;

This technique won't necessarily work as you'd expect because Microsoft doesn't guarantee that SQL Server will calculate IDENTITY values after the sort takes place. Without such a guarantee, the IDENTITY values that SQL Server assigns don't necessarily represent the requested sort. So, you have to use a different technique to calculate row numbers. In my April 2004 T-SQL 2005 Web column, "Calculating Row Numbers in SQL Server 2005," I mention that a set-based technique to calculate row numbers in SQL Server 2000 is very slow. Until you use SQL Server 2005, you have to use a cursor if you want to calculate row numbers with reasonable performance. Listing 4 shows the cursor-based code you can use to iterate through the sorted orders, calculating row numbers as you go, and storing the row numbers along with the order IDs in a temporary table. After the code creates the temporary table, your application responds to a specific page request, as Listing 5 shows. You need to run Listing 5's code to get a specific page of orders (e.g., page 3) by joining the Orders table and the temporary table that the code in Listing 4 created, filtering the range of rows that appear in the requested page.

Paging Correctly

Paging is necessary when you want to return a query's result set back to the client application in chunks. You can now provide solutions for two navigation techniques: adjacent pages and non-adjacent pages. And using stored procedures provides encapsulation that lets you alter your solutions' implementation without affecting the client application. Also, remember that IDENTITY values won't necessarily correspond to the sort order you specify in the query's ORDER BY clause, so avoid that technique for calculating row numbers based on a certain order.

Comments

Plain text