Skip navigation

Tip: Index on Computed Column

Consider cases of columns referenced in a query based on which the optimizer can theoretically use an access method that relies on index ordering. Classic examples are columns that you filter by (index seek), group by (ordered scan and stream aggregate), order by (ordered scan), etc. In most cases when you apply manipulation to the column of interest as opposed to referring only to the base column, it will prevent the optimizer from relying on index ordering. There are only a very small number of exceptions where the optimizer was coded to realize that index ordering is preserved albeit the manipulation.  As a couple of examples, assuming there is an index on orderdate, the following queries have the potential to rely on index ordering:

USE InsideTSQL2008;

SELECT orderid, orderdate, custid, empid

FROM Sales.Orders

WHERE orderdate = '20070212';

 

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]))

       |--Index Seek(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_nc_orderdate\]), SEEK:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderdate\]='2007-02-12 00:00:00.000') ORDERED FORWARD)

       |--Clustered Index Seek(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[PK_Orders\]), SEEK:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]=\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]) LOOKUP ORDERED FORWARD)

 

SELECT orderdate, COUNT(*) AS num_orders

FROM Sales.Orders

GROUP BY orderdate;

 

  |--Compute Scalar(DEFINE:(\[Expr1003\]=CONVERT_IMPLICIT(int,\[Expr1006\],0)))

       |--Stream Aggregate(GROUP BY:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderdate\]) DEFINE:(\[Expr1006\]=Count(*)))

            |--Index Scan(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_nc_orderdate\]), ORDERED FORWARD)

 

The plan for the first query uses an index seek. The plan for the second query uses an ordered scan of the index  and a stream aggregate.

However, in the following cases that apply manipulation to the column, the optimizer wasn’t coded to realize that index ordering is preserved:

SELECT orderid, orderdate, custid, empid

FROM Sales.Orders

WHERE CONVERT(CHAR(8), orderdate, 112) = '20070212';

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]))

       |--Index Scan(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_nc_orderdate\]),  WHERE:(CONVERT(char(8),\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderdate\],112)='20070212'))

       |--Clustered Index Seek(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[PK_Orders\]), SEEK:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]=\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]) LOOKUP ORDERED FORWARD)

 

SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS num_orders

FROM Sales.Orders

GROUP BY YEAR(orderdate);

 

  |--Compute Scalar(DEFINE:(\[Expr1004\]=CONVERT_IMPLICIT(int,\[Expr1007\],0)))

       |--Hash Match(Aggregate, HASH:(\[Expr1003\]), RESIDUAL:(\[Expr1003\] = \[Expr1003\]) DEFINE:(\[Expr1007\]=COUNT(*)))

            |--Compute Scalar(DEFINE:(\[Expr1003\]=datepart(year,\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderdate\])))

                 |--Index Scan(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_nc_orderdate\]))

 

In the first case the plan shows a full scan of the index (as opposed to the desired index seek), and in the second case the plan shows an unordered index scan followed by a hash aggregate (as opposed to an ordered index scan followed by a stream aggregate).

In some cases there are alternatives that provide the same functionality without applying manipulation to the column of interest. For example, when filtering a date and time period like a whole month, you should use a range filter as opposed to manipulating the filtered column. BTW, in SQL Server 2008 the optimizer was coded to realize that the expression CAST(orderdate AS DATE) preserves index ordering, therefore the predicate CAST(orderdate AS DATE) = '20070212' can potentially be handled with an index seek. But the main point I wanted to make in this entry is different…

You can create a computed column based on the expression of interest; index the computed column; and the optimizer will consider using such an index even when the query doesn’t refer to the computed column name, rather keeps the original expression. As an example, the following code creates two computed columns and indexes on them:

ALTER TABLE Sales.Orders ADD

  orderdatestr AS CONVERT(CHAR(8), orderdate, 112),

  orderyear    AS YEAR(orderdate);

 

CREATE INDEX idx_orderdatestr ON Sales.Orders(orderdatestr);

CREATE INDEX idx_orderyear    ON Sales.Orders(orderyear);

 

This time the plans for the previous queries with the manipulated columns do rely on index ordering:

SELECT orderid, orderdate, custid, empid

FROM Sales.Orders

WHERE CONVERT(CHAR(8), orderdate, 112) = '20070212';

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]))

       |--Index Seek(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_orderdatestr\]), SEEK:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderdatestr\]='20070212') ORDERED FORWARD)

       |--Clustered Index Seek(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[PK_Orders\]), SEEK:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]=\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderid\]) LOOKUP ORDERED FORWARD)

 

SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS num_orders

FROM Sales.Orders

GROUP BY YEAR(orderdate);

 

  |--Compute Scalar(DEFINE:(\[Expr1004\]=CONVERT_IMPLICIT(int,\[Expr1007\],0)))

       |--Stream Aggregate(GROUP BY:(\[Expr1003\]) DEFINE:(\[Expr1007\]=Count(*)))

            |--Compute Scalar(DEFINE:(\[Expr1003\]=\[InsideTSQL2008\].\[Sales\].\[Orders\].\[orderyear\]))

                 |--Index Scan(OBJECT:(\[InsideTSQL2008\].\[Sales\].\[Orders\].\[idx_orderyear\]), ORDERED FORWARD)

 

Observe that in the first case the plan shows an index seek in the new index, and in the second case it shows an ordered index scan and a stream aggregate.

Here’s some cleanup code:

DROP INDEX Sales.Orders.idx_orderdatestr, Sales.Orders.idx_orderyear;

ALTER TABLE Sales.Orders DROP COLUMN orderdatestr, orderyear;

 

Cheers,

BG

 

Hide comments

Comments

  • Allowed HTML tags: <em> <strong> <blockquote> <br> <p>

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Publish