The Big Cover-Up

Using nonclustered indexes in a certain way can make them almost as useful as clustered indexes. As I mentioned in "Time for a Tune-Up," August 2001, if the keys of a nonclustered index include all the columns referenced in a query, SQL Server doesn't have to access the table's data pages at all. Because the index's leaf level contains all the necessary information, which is logically stored in order of the first index key, that leaf level is almost like a table with a clustered index. A nonclustered index that contains all the data a query needs is called a covering index; a query that such an index supports is called a covered query.

Covering Indexes

In "Time for a Tune-Up," I looked at a couple of examples of when a covering index on a single column can be useful. One such situation is when you're interested only in the values in one column. Let's say I want to know the ID values for all customers in the Northwind database whose ID value starts with the letter C. The only information I need is in the customerID column, so a nonclustered index on customerID would cover this query:

SELECT  DISTINCT customerID FROM  Orders
WHERE  customerID LIKE 'C%'

Covering indexes are also often useful when you're dealing with aggregates. Remember that a nonclustered index contains a row with the index key value for every row in the base table, even if duplicates exist. Suppose you want to know how many rows in the Orders table have the customerID value ERNSH. A nonclustered index on customerID will contain an index row for every instance of ERNSH, so SQL Server can just count the entries in the index's leaf level:

SELECT  count(*) FROM  Orders
WHERE  customerID = 'ERNSH'

The previous two examples show queries that are covered by nonclustered indexes on a single column. Even more useful as covering indexes are composite indexes. Let's build a composite index on the Northwind database's Customers table on the country and city columns:

CREATE INDEX location_index ON Customers(country, city)

SQL Server can then use this composite index in queries such as

SELECT  city FROM  Customers
WHERE  country = 'Brazil'

If you run the above query with STATISTICS PROFILE and STATISTICS IO on, the execution plan shows that SQL Server used an index seek on the index location_index:

SELECT \[city\]=\[city\] FROM \[Customers\] WHERE \[country\]=@1

 |--Index Seek(OBJECT:(\[Northwind\].\[dbo\]
 .\[Customers\].\[location_index\]), SEEK:
 (\[Customers\].\[Country\]=Convert(\[@1\])) ORDERED FORWARD)

SQL Server used the index to traverse the index levels and find the first index row containing Brazil. Then, SQL Server could move horizontally through the index, retrieving the city values from the index rows until it encountered a country value that wasn't equal to Brazil.

Nothing in the query's plan specifically tells you that SQL Server is using a covering index, but you can compare this plan with the plan of the following similar noncovered query, which needs data from a column that isn't in the index:

SELECT CompanyName, city FROM Customers
WHERE  country = 'Brazil'

This query's plan shows a bookmark lookup operation as its middle step. The covered query, which was selecting just the city column, didn't show a bookmark-lookup, which means that SQL Server never accessed the data. In a bookmark-lookup operation, SQL Server takes the pointer from the nonclustered index row and uses it to access data in the table.

Another difference between the covered and noncovered queries is the number of logical reads they require. The covered query requires only one logical read, whereas the noncovered query requires 19 page accesses.

You can use covering indexes in two ways. For the previous covered query, SQL Server traverses the index from the root to the leaf. It stops looking when it gets to the leaf because it has found in the index all the data values it needs; it never needs to access the actual data pages. I refer to the situation in which SQL Server starts at the root and traverses each level to the leaf as a vertical use of a covering index.

SQL Server can also use a covering index by simply scanning the leaf level. The sysindexes table contains the page address of the first page at the leaf level; each page contains pointers to both the next page and the previous page. When scanning the leaf level, SQL Server doesn't traverse the index in the usual manner (i.e., starting at the root) but instead accesses the index more like it does during a table scan. The case when SQL Server just scans the leaf level is a horizontal use of a covering index.

Here's an example of a query that uses a covering index horizontally:

SELECT DISTINCT customerID FROM Orders

This query asks to see every distinct customerID value in the Orders table. One method SQL Server uses to find distinct values is to first sort the data. Because the key values in a nonclustered index's leaf level are already sorted, scanning the leaf level and returning only one row for each distinct customerID value in the leaf is a very straightforward operation for SQL Server.

SQL Server also horizontally scans a nonclustered index for queries that perform an aggregate on every data value in the index. If you had a nonclustered index on the Quantity column in the Order Details table, SQL Server could use that index to easily find the average quantity value. To test this idea, build the index, then run the aggregate query as follows:

CREATE INDEX QtyIndx ON \[Order Details\](Quantity)
GO
SELECT  AVG(Quantity) FROM  \[Order Details\]

The execution plan shows a nonclustered index scan of the QtyIndx index, then a stream aggregation that computes the average, but no access of the data itself.

Create a Cover-Up

Covering indexes provide the fastest response time possible in many cases, so if you have a crucial query that won't use your previously created indexes, consider creating a covering index. You might be able to create such an index by adding another column to an existing index, thereby creating a composite index. You can use a composite covering index for many kinds of queries. In the Northwind database, the Orders table has an index on the customerID column, which lets you execute this query quickly:

SELECT  employeeID  FROM Orders
WHERE customerID  = 'ALFKI'

However, adding the employeeID column to that index as follows could make the query a covered query because all the referenced columns would then be in the index:

CREATE INDEX customer_employee ON Orders(customerID, employeeID)

The same index will cover the query below because SQL Server will scan the leaf level:

SELECT customerID FROM Orders
WHERE employeeID= 5

Because row locators in the leaf level of a nonclustered index are actually the clustering keys if the table is clustered, you might have more covering indexes than you think. The following query is similar to a noncovered one I used earlier because it needs data from the customerID column, which wasn't among the declared index keys:

SELECT customerID, city FROM Customers
WHERE country = 'Brazil'

However, the plan for the above query shows that SQL Server performs a nonclustered index seek on the location_index index with no bookmark lookup:

|—Index Seek(OBJECT:(\[Northwind\].\[dbo\]
.\[Customers\].\[location_index\]), SEEK:(\[Customers\].\[Country\]=Convert(\[@1\])) ORDERED FORWARD)

The query accessing customerID in this case is covered because SQL Server uses the clustered index key as the bookmark for all nonclustered indexes if the table has a clustered index. So this index really has three keys: Every index row contains the two defined keys of country and city, plus the clustered key (customerID) value that corresponds to that country and city combination. So any query that involves only the country, city, and customerID columns is a covered query.

Composites

Composite indexes can be useful structures even when they don't cover a query because they can help SQL Server pinpoint the location of the rows you've requested more precisely than can a single-column index. A composite index is most useful to a query when the search arguments (SARGs) in the query match the leftmost columns in the index key. For example, if an index has a composite key of last_name, first_name, that index is useful for queries such as WHERE last_name = 'Smith' or WHERE last_name = 'Smith' AND first_name = 'Bridgette'. But it isn't useful for a query such as WHERE first_name = 'Bridgette'.

Using an index is somewhat like using a phone book: You use a phone book as an index on last name to find the corresponding phone number. But the standard phone book is useless if you know only a person's first name—even if the first name is practically unique—because a first name might be located on any page. The phone book might have only one entry with that first name, but without an index sorted on first name, you'll have to scan the entire book to locate that name. Remember that the Northwind database's Customers table has a nonclustered index on country, city. The following query is looking for rows with the city value 'Seattle':

SELECT CompanyName, City FROM Customers
WHERE City = 'Seattle'

Although only one row has the desired value, SQL Server will perform a clustered index scan (table scan) unless a separate index on the city column also exists.

When you're building composite indexes, put the most selective columns leftmost in the key of nonclustered indexes. For example, an index on emp_name, emp_sex is useful for a SARG such as WHERE emp_name = 'Smith' AND emp_sex = 'M'. But if the index is defined as emp_sex, emp_name, it can't help with most retrievals. As the leftmost key, emp_sex can't narrow the search enough to make the index useful.

Be especially aware of this limitation when you're building indexes to enforce a primary key or unique constraint that's defined on multiple columns. SQL Server builds the index in the order that you define the constraint columns. So you need to adjust the column order in the constraint to make the index most useful to queries and list the most selective columns first; doing so won't affect the constraint's role in enforcing uniqueness.

Best Indexes

To create the best possible indexes for a table, you not only have to know your data so that you know which data is selective and which isn't, but you need to know what kinds of queries your users will be submitting. For example, will they always request exact values or will they sometimes be interested in ranges? Suppose you know that your users will be submitting queries involving both the product ID and the quantity values from the Northwind database's Order Details table. You might decide to build a composite index on those two columns. But which column should come first? ProductID has 77 values and is thus a bit more selective than Quantity, which has only 55 values. That difference isn't too great, however, so you'll need to consider other factors.

Various criteria can make a difference in which column you should list first. For example, suppose a user submits the following query:

SELECT * FROM \[Order Details\]
WHERE ProductID = 59
AND Quantity < 10

If a query contains both parts of a composite index as SARGs, which column you define first doesn't really matter as long as the SARGs are based on equality. For the above query, imagine that I created an index on Quantity first and then on ProductID. All the rows for Quantity 1 would be together for all the different product IDs, followed by all the rows for Quantity 2, for all the different product IDs. The 11 rows that satisfy my WHERE clause wouldn't all be together in the index's leaf level. But if, when I created the index, I specified the ProductID column first, all the rows for ProductID 59 would be together. And within all the 59s, the quantity values would be sorted, so SQL Server would store together all the rows that met my conditions.

The Last Piece

Here are a couple of basic rules about creating composite indexes. First, the most selective column should come first; second, columns that will be searched based on an equality comparison should come before columns that will be searched by range. In this and the past two articles, I've examined how indexes are organized and which indexes can be useful for which kinds of queries. But a missing piece that will help you grasp index and query tuning is an understanding of how the SQL Server query optimizer knows whether an index will be useful. In my next column, I look at how the optimizer decides whether to use any of the available indexes when choosing the best execution plan for a query.

Comments

Plain text