Now that you've seen how this approach to creating and maintaining hierarchical data works, let's explore its practical implementation, including how indexes can improve query performance and how to control the order in which hierarchical queries return data.
Indexing for Better Performance
In the main article's simple example, all the table's rows fit in a single page, so indexes don't provide much performance benefit. In the real world, however, the tables would be much larger, and indexes could provide considerable performance improvements for certain queries.
A nonclustered index on the hierarchy column is one of the most important indexes for improving the performance of queries that return only a few rows:
CREATE UNIQUE NONCLUSTERED INDEX idx_nc_hierarchy ON Employees(hierarchy)
Note, however, that you can place an index only on a column that is 900 or fewer bytes. This restriction limits the maximum number of hierarchy levels the solution can support. But you can still support more than 100 levels, which is more than enough for most hierarchical structures.
Another problem with the nonclustered index on the hierarchy column is that the query processor doesn't always use the index. For example, imagine you have many rows in the Employees table and you issue the following query, which should return only a few rows:
SELECT * FROM Employees WHERE hierarchy LIKE (SELECT hierarchy FROM Employees WHERE empid = 7) + '%' ORDER BY hierarchy
The query processor will probably decide not to use the index on the hierarchy column. When you use the LIKE operator to filter rows, the query processor will most likely use the index on hierarchy when you compare a particular column to a constant in the form of 'constant%'. So you could rephrase the query as
SELECT * FROM Employees WHERE hierarchy LIKE '.1.3.7.%' ORDER BY hierarchy
Still, you don't want to have to check for the hierarchy value manually whenever you need to query the Employees table. One solution is to encapsulate the query inside a stored procedure. You can simply build the SQL statement inside a character-string variable, then execute the statement dynamically:
DECLARE @cmd AS varchar(8000)
SET @cmd = 'SELECT * FROM Employees WHERE hierarchy LIKE ''' + (SELECT hierarchy + '%' FROM Employees WHERE empid = 7) + '''ORDER BY hierarchy' EXECUTE(@cmd)
This technique has the same effect as running the query with a constant, and the query processor will probably use the index on the hierarchy column to improve query performance.
Why not simply write a stored procedure that accepts the employee ID as a parameter, defines a variable, and fetches the hierarchy value into that variable based on the given employee ID? Later, you could use the variable in the query. The problem with this technique is that the query optimizer optimizes a batch or a stored procedure—including the variable declaration and assignment in its body and the query that uses the variable—all in one unit. Thus, the variable's value is unknown when the query optimizer optimizes the query.
Alternatively, SQL Server MVP Umachandar Jayachandran shares the following join solution, which often uses the index on hierarchy:
SELECT * FROM Employees AS E JOIN (SELECT hierarchy FROM Employees WHERE empid = 7) as M ON E.hierarchy LIKE M.Hierarchy + '%'
Still, the best advice for wringing the best performance out of your queries is—as always—test, test, and test some more.
Hierarchical Data in Order
Note that in these hierarchy queries' results, the order among siblings might be surprising. Because the hierarchy column value is a character string, if you have two siblings with employee IDs of 100 and 20, for example, the employee with an ID of 20 sorts after the employee with an ID of 100. Usually, the order among siblings isn't important as long as the solution maintains the hierarchy. But if sibling order is important, you can change the hierarchy column to hold the chain of employee IDs in fixed-length format, in which the length is the maximum number of digits possible. For example, you could represent the employee ID of 20 as '0000020' and the employee ID of 100 as '0000100'. However, this technique uses a lot of disk space, so if you don't care about sibling order, stick with the hierarchy column's original format.