Descending Indexes

Certain aspects of SQL Server index B-trees and their use cases are common knowledge, but some aspects are less widely known because they fall into special cases. In this article I focus on special cases related to backward index ordering, and I provide guidelines and recommendations regarding when to use descending indexes. All my examples use a table called Orders that resides in a database called Performance. Run the code in Listing 1 to create the sample database and table and populate it with sample data. Note that the code in Listing 1 is a subset of the source code I prepared for my book Inside Microsoft SQL Server 2008: T-SQL Querying (Microsoft Press, 2009), Chapter 4, Query Tuning. If you have the book and already created the Performance database in your system, you don’t need to run the code in Listing 1.

SET NOCOUNT ON; 
USE master; 
IF DB_ID('Performance') IS NULL   
CREATE DATABASE Performance; 
GO USE Performance; 
GO 
-- Creating and Populating the Nums Auxiliary Table SET NOCOUNT ON; IF OBJECT_ID('dbo.Nums', 'U') IS NOT NULL   

DROP TABLE dbo.Nums; 
CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY); 
DECLARE @max AS INT, @rc AS INT; 
SET @max = 1000000; 
SET @rc = 1; 
INSERT INTO dbo.Nums(n) VALUES(1); 
WHILE @rc * 2 <= @max BEGIN   
INSERT INTO dbo.Nums(n) SELECT n + @rc FROM dbo.Nums;   
SET @rc = @rc * 2; END INSERT INTO dbo.Nums(n)   
SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max; 
GO IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL   
DROP TABLE dbo.Orders; 
GO -- Data Distribution Settings DECLARE   @numorders   
AS INT,   @numcusts    
AS INT,   @numemps     
AS INT,   @numshippers 
AS INT,   @numyears    
AS INT,   @startdate   
AS DATETIME; 
SELECT   
@numorders   =   1000000,   
@numcusts    =     20000,   
@numemps     =       500,   
@numshippers =     5,   
@numyears    =     4,   
@startdate   = '20050101'; 
-- Creating and Populating the Orders Table 
CREATE TABLE dbo.Orders 
(   
orderid   INT    
NOT NULL,   
custid    CHAR(11)   
NOT NULL,   empid     INT    
NOT NULL,   shipperid VARCHAR(5) 
NOT NULL,   orderdate DATETIME   
NOT NULL,   filler    CHAR(155)  
NOT NULL DEFAULT('a') ); 
CREATE CLUSTERED INDEX idx_cl_od ON dbo.Orders(orderdate); 
ALTER TABLE dbo.Orders ADD   CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED(orderid); 
INSERT INTO dbo.Orders WITH (TABLOCK) (orderid, custid, empid, shipperid, orderdate)   
SELECT n AS orderid,     
'C' + RIGHT('000000000'         
+ CAST(         
1 + ABS(CHECKSUM(NEWID())) % @numcusts         
AS VARCHAR(10)), 10) AS custid,     
1 + ABS(CHECKSUM(NEWID())) % @numemps AS empid,     CHAR(ASCII('A') - 2 
       + 2 * (1 + ABS(CHECKSUM(NEWID())) % @numshippers)) AS shipperid,       DATEADD(day, n / (@numorders / (@numyears * 365.25)), @startdate)     
-- late arrival with earlier date     
- CASE WHEN n % 10 = 0        
THEN 1 + ABS(CHECKSUM(NEWID())) % 30         
ELSE 0       
END AS orderdate   
FROM dbo.Nums   
WHERE n <= @numorders   
ORDER BY CHECKSUM(NEWID()); GO

One of the widely understood aspects of SQL Server's indexes is that the leaf level of an index enforces bidirectional ordering through a doubly-linked list. This means that in operations that can potentially rely on index ordering—for example, filtering (seek plus partial ordered scan), grouping (stream aggregate), presentation ordering (ORDER BY)—the index can be scanned either in an ordered forward or ordered backward fashion. So, for example, if you have a query with ORDER BY col1 DESC, col2 DESC, SQL Server can rely on index ordering both when you create the index on a key list with ascending ordering (col1, col2) and with the exact reverse ordering (col1 DESC, col2 DESC).

So when do you need to use the DESC index key option? Ask SQL Server practitioners this question, and most of them will tell you that the use case is when there are at least two columns with opposite ordering requirements. For example, to support ORDER BY col1, col2 DESC, there’s no escape from defining one of the keys in descending order—either (col1, col2 DESC), or the exact reverse order (col1 DESC, col2). Although this is true, there’s more to the use of descending index than what’s commonly known.

Index Ordering and Parallelism

As it turns out, SQL Server’s storage engine isn’t coded to handle parallel backward index scans (as of SQL Server 2008 SP1 Cumulative Update 6)—not because there’s a technical problem or engineering difficulty with supporting the option, but simply because it hasn’t yet floated as a customer request. My guess is that most DBAs just aren’t aware of this behavior and therefore haven’t thought to ask for it. Although performing a backward scan gives you the benefit of relying on index ordering and therefore avoiding expensive sorting or hashing, the query plan can’t benefit from parallelism. If you find a case in which parallelism is important, you need to arrange an index that allows an ordered forward scan.

Consider the following query as an example:

USE Performance;

SELECT *
FROM dbo.Orders
WHERE orderid <= 100000
ORDER BY orderdate;

There’s a clustered index defined on the table with orderdate ascending as the key. The table has 1,000,000 rows, and the number of qualifying rows in the query is 100,000. My laptop has eight logical CPUs. Figure 1 shows the graphical query plan for this query. Here’s the textual plan:

|--Parallelism(Gather Streams, ORDER BY:(\\[orderdate\\] ASC))

|--Clustered Index Scan(OBJECT:(\\[idx_cl_od\\]),  WHERE:(\\[orderid\\]<=(100000)) ORDERED FORWARD)

As you can see, a parallel query plan was used. Now try the same query with descending ordering:

SELECT *
FROM dbo.Orders
WHERE orderid <= 100000
ORDER BY orderdate DESC;

Figure 2 shows the graphical query plan for this query. Here’s the textual plan:

|--Clustered Index Scan(OBJECT:(\\[idx_cl_od\\]),  
WHERE:(\\[orderid\\]<=(100000)) ORDERED BACKWARD)

Note that although an ordered scan of the index was used, the plan is serial because the scan ordering is backward. If you want to allow parallelism, the index must be scanned in an ordered forward fashion. So in this case, the column orderdate must be defined with DESC ordering in the index key list.

As an anecdote, this reminds me that when descending indexes were introduced in SQL Server 2000 RTM, my friend Wayne Snyder discovered an interesting bug. Suppose you had a descending clustered index on the Orders table and issued the following query:

DELETE FROM dbo.Orders WHERE orderdate < '20050101';

Instead of deleting the rows before 2005, SQL Server deleted all rows after January 1, 2005! Fortunately, Wayne discovered this bug and reported it, and it was fixed in SP1.

Index Ordering and Ranking Calculations

Back to cases in which descending indexes are relevant, it appears that ranking calculations—particularly ones that have a PARTITION BY clause—need to perform an ordered forward scan of the index in order to avoid the need to sort the data. Again, this is only the case when the calculation is partitioned. When the calculation isn’t partitioned, both a forward and backward scan can be utilized. Consider the following example:

SELECT
ROW_NUMBER() OVER(ORDER BY orderdate DESC,
                 orderid   DESC) AS RowNum,
  orderid, orderdate, custid, filler
FROM dbo.Orders;

Assuming that there’s currently no index to support the ranking calculation, SQL Server must actually sort the data, as the query execution plan in Figure 3 shows. Here’s the textual form of the plan:

|--Sequence Project(DEFINE:(\\[Expr1004\\]=row_number))
 |--Segment
  |--Parallelism(Gather Streams, ORDER BY:(\\[orderdate\\] DESC, \\[orderid\\] DESC))
   |--Sort(ORDER BY:(\\[orderdate\\] DESC, \\[orderid\\] DESC))
    |--Clustered Index Scan(OBJECT:(\\[idx_cl_od\\]))

Indexing guidelines for queries with nonpartitioned ranking calculations are to have the ranking ordering columns in the index key list, either in specified order, or exactly reversed, plus include the rest of the columns from the query in the INCLUDE clause for coverage purposes. With this in mind, to support the previous query you can define the index with all the keys in ascending order, like so:

CREATE UNIQUE INDEX idx_od_oid_i_cid_filler
  ON dbo.Orders(orderdate, orderid)
  INCLUDE(custid, filler);

Rerun the query, and observe in the query execution plan in Figure 4 that the index was scanned in an ordered backward fashion. Here’s the textual form of the plan:

|--Sequence Project(DEFINE:(\\[Expr1004\\]=row_number))
 |--Segment
  |--Index Scan(OBJECT:(\\[idx_od_oid_i_cid_filler\\]), ORDERED BACKWARD)

However, when partitioning is involved in the ranking calculation, it appears that SQL Server is strict about the ordering requirement—it must match the ordering in the expression. For example, consider the following query:

SELECT
  ROW_NUMBER() OVER(PARTITION BY custid
            ORDER BY orderdate DESC,
                 orderid   DESC) AS RowNum,
  orderid, orderdate, custid, filler
FROM dbo.Orders;

When partitioning is involved, the indexing guidelines are to put the partitioning columns first in the key list, and the rest is the same as the guidelines for nonpartitioned calculations. Now try to create an index following these guidelines, but have the ordering columns appear in ascending order in the key list:

CREATE UNIQUE INDEX idx_cid_od_oid_i_filler
  ON dbo.Orders(custid, orderdate, orderid)
  INCLUDE(filler);

Observe in the query execution plan in Figure 5 that the optimizer didn’t rely on index ordering but instead sorted the data. Here’s the textual form of the plan:

|--Parallelism(Gather Streams)
 |--Index Insert(OBJECT:(\\[idx_cid_od_oid_i_filler\\]))
  |--Sort(ORDER BY:(\\[custid\\] ASC, \\[orderdate\\] ASC, \\[orderid\\] ASC) PARTITION ID:(\\[custid\\]))
   |--Index Scan(OBJECT:(\\[idx_od_oid_i_cid_filler\\]))

If you want to avoid sorting, you need to arrange an index that matches the ordering in the ranking calculation exactly, like so:

CREATE UNIQUE INDEX idx_cid_odD_oidD_i_filler
  ON dbo.Orders(custid, orderdate DESC, orderid DESC)
  INCLUDE(filler);

Examine the query execution plan in Figure 6, and observe that the index was scanned in an ordered forward fashion and a sort was avoided. Here’s the textual plan:

|--Sequence Project(DEFINE:(\\[Expr1004\\]=row_number))
 |--Segment
  |--Index Scan(OBJECT:(\\[idx_cid_odD_oidD_i_filler\\]), ORDERED FORWARD)

When you’re done, run the following code for cleanup:

DROP INDEX dbo.Orders.idx_od_oid_i_cid_filler;
DROP INDEX dbo.Orders.idx_cid_od_oid_i_filler;
DROP INDEX dbo.Orders.idx_cid_odD_oidD_i_filler;

One More Time

In this article I covered the usefulness of descending indexes. I described the cases in which index ordering can be relied on in both forward and backward linked list order, as opposed to cases that support only forward direction. I explained that partitioned ranking calculations can benefit from index ordering only when an ordered forward scan is used, and therefore to benefit from index ordering you need to create an index in which the key column ordering matches that of the ORDER BY elements in the ranking calculation. I also explained that even when backward scans in an index are supported, this prevents parallelism; so even in those cases there might be benefit in arranging an index that matches the ordering requirements exactly rather than in reverse.

Comments

Plain text