Maintaining Hierarchies

A hot Internet startup lures Andrew away to a new job, so Steven and Michael—who reported to Andrew—need a new manager. Superstar Janet takes Steven and Michael under her management wing, even though she already directly manages Robert, Laura, and Ann. Robert, in turn, manages his own team of employees. But recently, Robert has been looking for better hours and more pay. If Robert leaves, who'll oversee David, Ron, and Dan—not to mention David's assistant, James? And, more important, how will IT keep track of all the employee-manager changes in the company's employment hierarchy?

Hierarchical structures, also called trees, have hierarchical dependencies between their members. A common hierarchical structure is an organizational chart that describes the relationships between a company's employees. A manager is in charge of certain employees, those employees might be in charge of other employees, and so on.

The SQL language doesn't have built-in support for such hierarchical structures—nor does SQL Server. So how do you handle hierarchies with relational database management systems (RDBMSs) such as SQL Server? Consider Figure 1, which shows a simple organizational chart. Notice that each employee has a manager except Nancy, who is the top-level manager. The most common way to represent such a structure in a relational database is to use pairs of columns: One column holds the employees' (the children's) IDs; the other holds the IDs of their managers (the parents). The problem with this solution is that Nancy doesn't have a manager but you still need to store a value in her manager ID column. A common way to handle this problem is to store a NULL in the manager ID column. Another solution would be to store Nancy's employee ID in the manager ID column, making Nancy her own manager.

To see how you can maintain hierarchies with SQL Server, let's create a simple table that holds information about the employees in Figure 1's organizational chart. You can then use triggers, T-SQL queries, and stored procedures to track employee ID, employee name, manager ID, and employee and manager salaries as employees join the company, change jobs within the company, and leave the company. A script that creates the Employees table and populates it with data is available in the "Download the Code" file at the top of the page. For our example, let's use NULL as Nancy's manager ID value.

Querying the Employees Table

Without adding any information to the Employees table, you can use T-SQL statements to answer certain questions about the hierarchy of employees. For example, to find out who is the most senior manager in the organization, you can run the query

SELECT * FROM Employees
WHERE mgrid IS NULL

To find the names of all employees and their managers, you can run

SELECT E.empname AS EmployeeName, M.empname AS ManagerName
FROM Employees AS E LEFT OUTER JOIN Employees AS M
ON E.mgrid = M.empid

This query uses an outer join because an inner join would have excluded Nancy, the senior manager, from the report. The left outer join includes all the rows from the left table (the Employees table), whether or not those rows have matches in the right table (which represents managers).

The following query shows you Robert's immediate subordinates:

SELECT * FROM Employees
WHERE mgrid = 7

And to see all leaf-level employees—employees with no subordinates—you can run

SELECT * FROM Employees AS M
WHERE NOT EXISTS (SELECT empid
          FROM Employees AS E
		WHERE E.mgrid = M.empid)

This query is a correlated subquery that returns only employees who aren't managers.

Maintaining New Data

Although you can answer some questions by using the existing Employees table, you can't easily meet all of your needs for employee and manager information. Consider the following informational requests:

Show me all employees so that I can see their hierarchical dependencies.
Show me details about Robert and all his subordinates at all levels.
What is the total salary of Robert and all his subordinates at all levels?
Show me details about all leaf-level employees who report to Janet.
Show me details about all employees who are two levels under Janet.
Show me the chain of management leading to James.

To answer such requests within the existing Employees table structure, you have to use cursors or temporary tables. You can use stored procedures or a client application to implement cursors or temporary tables. (For examples of such solutions, see the SQL Server Books Online—BOL—topic "Expanding Hierarchies.")

Alternatively, you can add to the Employees table information about the hierarchical structure, then use T-SQL statements to answer these questions. The problem with modifying the table structure—for example, adding a new member or updating an existing member—is that the additional information gets out of synch and becomes of no value. To avoid this pitfall, you usually must require users to modify data only through stored procedures.

This article presents a solution that requires adding information to the Employees table. Listing 1 shows the script that creates a new Employees table with two additional columns. The lvl column holds each employee's level in the hierarchy, starting at zero for the highest level. James' lvl, for example, is 14. The hierarchy column holds the chain of command for each employee, beginning with the ID of the top manager and ending with the employee's ID. Dots separate the employee IDs in the chain. For example, James' hierarchy is '.1.3.7.11.14.'

An important aspect of this solution is that it's self maintaining: It uses triggers to automatically update the lvl and hierarchy values whenever a database user updates employee information. The solution uses triggers instead of special stored procedures to update the hierarchical data because triggers allow programmatic control of the updates and fire automatically after a modification.

Listing 2 shows the insert trigger that maintains the lvl and hierarchy columns. Note that after you create the trigger, you need to reinsert each employee's row into the new Employees table to fill it with data. Also note that this trigger lets you insert only one row at a time. You can add logic to support inserts of whole subtrees (subhierarchies within the complete hierarchy), but to keep this solution from getting too complex, I didn't include that logic.

Let's look at what's going on inside the insert trigger. Suppose you want to add an employee named Sean, who reports to Ron (employee ID 12). You would run the following insert statement:

INSERT INTO employees(empid, mgrid, empname, 
  salary) VALUES(15, 12, 'Sean', $1500.00)

Figure 2 shows what happens logically inside the trigger. First, the trigger joins the Employees table and the inserted table (which holds only Sean's new row) and returns Sean's row. This join is possible because the trigger fires only after the insert adds Sean's information to the Employees table. However, Sean's lvl and hierarchy columns still contain NULL values. The trigger then takes the result from Step 1 (Sean's row) and performs a left outer join with the Employees table to find Sean's manager's row. The trigger uses a left outer join here because the new employee might not have a manager. If the employee has NULL in the mgrid column, a match on manager wouldn't exist and the row you were trying to modify would disappear from the join result.

Now that all the required employee and manager information is available, the trigger can calculate and update the lvl and hierarchy columns. The following statement calculates the lvl column:

lvl = CASE
    WHEN E.mgrid IS NULL THEN 0
    ELSE M.lvl + 1
  END

If E.mgrid is NULL, you've added an employee that has no manager, and therefore, the employee's level is zero. In all other cases, the employee's level is one level under his or her manager.

The following statement then calculates the hierarchy column:

hierarchy = CASE
         WHEN E.mgrid IS NULL THEN '.' 
     ELSE M.hierarchy
        END + CAST(E.empid AS varchar(10)) + '.'

If the employee has no manager, the hierarchy column is '.empid.' In all other cases, the column value is the employee's manager's hierarchy value concatenated with the employee's employee ID.

Answering the Tough Questions

Now that you have the insert trigger to automatically generate lvl and hierarchy column values whenever a database user adds an employee, the Employees table should have all the information you need to answer those tough and not-so-tough questions presented earlier. For example, you can now use the following query to find out who is the most senior manager in the organization:

SELECT *
FROM Employees
WHERE lvl = 0

You can use this type of query to find all the employees at any level and even exploit an index on the lvl column to optimize query performance.

To see all employees and their hierarchical dependencies, run

SELECT *
FROM Employees
ORDER BY hierarchy

Now, let's move on to some tougher requests. To see details about Robert (employee ID 7) and his subordinates at all levels, run

SELECT *
FROM Employees
WHERE hierarchy LIKE (SELECT hierarchy
FROM Employees
WHERE empid = 7) + '%'
ORDER BY hierarchy

To see details about Robert's subordinates—excluding Robert—at all levels, you can run

SELECT *
FROM Employees
WHERE hierarchy LIKE (SELECT hierarchy
FROM Employees
WHERE empid = 7) + '_%'
ORDER BY hierarchy

On first glance, you might think the previous two queries are identical, but they aren't. Notice that the first query filters on rows that match the pattern emp_hierarchy LIKE mgr_hierarchy + '%'. Because the percent sign wildcard can stand for zero or more characters, Robert's hierarchy matches the pattern and is part of the result. In the second query, however, the pattern is slightly different: emp_hierarchy LIKE mgr_hierarchy + '_%'. The underscore wildcard (_) replaces a single unknown character, while the percent sign (%) replaces any number of unknown characters, including zero. Combining these symbols requires the existence of at least one additional character in the employee's hierarchy besides the manager's hierarchy. Thus, the query results don't include Robert.

The following query returns the total salary of Robert and his subordinates at all levels:

SELECT sum(salary) AS total_salary
FROM Employees
WHERE hierarchy LIKE (SELECT hierarchy
FROM Employees
WHERE empid = 7) + '%'

To see details about all leaf-level employees who report to Janet (employee ID 3), you can run

SELECT *
FROM Employees AS M
WHERE hierarchy LIKE (SELECT hierarchy
FROM Employees
WHERE empid = 3) + '%'
  AND NOT EXISTS(SELECT mgrid FROM Employees AS E
WHERE M.empid = E.mgrid)

And to see details about all employees two levels under Janet, run

SELECT *
FROM Employees AS M
WHERE hierarchy LIKE (SELECT hierarchy
FROM Employees
WHERE empid = 3) + '%'
AND lvl - (SELECT lvl
FROM Employees
WHERE empid = 3) = 2

SELECT *
FROM Employees AS E JOIN Employees AS M
ON E.hierarchy LIKE M.hierarchy + '%'
WHERE M.empid = 13
AND E.lvl - M.lvl = 2

Finally, to see the chain of management leading to James (employee ID 14), you can run

SELECT *
FROM Employees
WHERE (SELECT hierarchy
   FROM Employees
   WHERE empid = 14) LIKE hierarchy + '%'
ORDER BY hierarchy

Maintaining Data Modifications

This example solution for managing hierarchies with SQL Server uses the insert trigger to maintain the lvl and hierarchy columns when a database user adds an employee to the Employees table. Now, the solution needs to include an update trigger that modifies lvl and hierarchy information when employees' places within the organization change. Listing 3 shows the update trigger that maintains lvl and hierarchy.

Many computing environments don't allow updates to the primary key, so the trigger first checks whether anyone has modified the empid column. If so, the trigger rolls back the update operation and terminates. Next, the trigger checks the employee's mgrid column to see whether the employee now reports to a different manager. The employee's change in status might change his or her salary, but the trigger needs to modify lvl and hierarchy only if the employee changes managers. After all the relevant safety checks, the trigger moves to the bulk of its logic: the update statement.

Suppose Andrew is now reporting to Janet instead of Nancy:

UPDATE employees
SET mgrid = 3
WHERE empid = 2

Andrew's new position also affects his subordinates at all levels—in this case, Steven and Michael. Steven's and Michael's levels in the hierarchy change along with their hierarchy chain, as Table 1 shows.

You can generalize the effects of an employee's movement on the employee and on all of his or her subordinates and devise a formula that calculates new lvl and hierarchy values—for example,

Let old_emp_lvl = the employee's pre-update level
Let new_mgr_lvl = the level of the employee's new manager
Let new_mgr_hier = the hierarchy of the employee's new manager
Let mov_empid = the employee's employee ID
Let right_hier = the right part of the hierarchy of each affected employee, starting with his or her employee ID

Thus, the formula for calculating the new lvl and hierarchy values looks like

lvl = lvl - old_emp_lvl + \{ new_mgr_lvl + 1 | 0 \} hierarchy = \{ new_mgr_hier | '.' \} + mov_empid + '.' + right_hier

Using the formula for lvl, you can calculate Michael's new level as

Michael's current level Andrew's old level + Janet's level + 1
= 2 1 + 1 + 1 = 3

Note that if the employee moves to the highest level in the hierarchy, you don't need to retrieve his or her manager's level because that employee has no manager; his or her level is therefore 0.

According to the hierarchy formula, Michael's new hierarchy is

Janet's hierarchy + Andrew's employee ID + '.' + Michael's right part of the hierarchy starting at his employee ID = '.1.3.' + '2' + '.' + '6.' = '.1.3.2.6.'

And, again, if the employee moves to the highest level in the hierarchy, you don't need to retrieve his or her manager's hierarchy because that employee has no manager.

The update statement inside the trigger updates the Employees table. But to obtain the employee and manager information needed to filter the rows that need updating and then perform the update, the statement has to join two tables. Figure 3 shows what happens logically inside the update trigger.

The update statement first joins the Employees table and the inserted table, which holds only the updated row for Andrew. The join is tricky because you base it on the LIKE operator instead of on equality:

FROM Employees AS E JOIN INSERTED AS I
  ON E.hierarchy LIKE I.hierarchy + '%'

The join selects rows from the Employees table based on whether they start with the same hierarchy as the hierarchy in the inserted table (Andrew's old hierarchy). The result of this join is Andrew's row and the rows of his subordinates at all levels. The update statement then performs a left outer join of Step 1's results and the Employees table—which joins Andrew's and his subordinates' rows with Andrew's manager's row. The update trigger uses the left outer join here for the same reason the insert trigger used the left outer join earlier: to guard against a row dropping out in case you updated the employee's manager ID to NULL.

Removing Employees

Member removal is an operation that you might prefer to handle with stored procedures instead of an automatic mechanism such as a trigger because in different situations, you might want to implement different removal logic. Here are a few removal scenarios you can implement with stored procedures. Note that if you always want to use one of these or another removal operation for a particular scenario, you can implement the operation as a trigger instead of a stored procedure and have a totally self-maintaining solution.

Scenario 1: Remove the whole subtree, including the specified employee and all of his or her subordinates. Listing 4 shows the stored procedure that removes a subtree. The following statement removes Robert and his subtree:

EXEC RemoveSubtree @empid = 7

Scenario 2: Remove the specified employee and have his or her direct subordinates report to that employee's manager. Listing 5 shows the stored procedure that handles this scenario. Notice that the stored procedure doesn't take any action on the lvl and hierarchy columns of employees who move to the higher level; the update trigger takes care of modifying those values. Now, let's remove Andrew from the Employees table and assign subordinates Steven and Michael to Janet:

EXEC RemoveEmployeeUpgradeSubs @empid = 2

Scenario 3: Remove the specified employee and assign his or her direct subordinates to a new specified manager. Listing 6 shows the stored procedure that handles this scenario. Now, let's remove Janet and assign her subordinates to Margaret:

EXEC RemoveEmployeeMoveSubs @empid = 3, @newmgr = 4

Now that you've seen how this solution works, you can explore its practical implications. The sidebar "Practical Implementation," available online, discusses how indexes can improve query performance and how to control the order in which hierarchical queries return data. Although SQL Server doesn't offer built-in support for hierarchical structures, many businesses that rely on SQL Server to handle their business information need to manage hierarchical data. With triggers and T-SQL, you can create a functional—and self-maintaining—solution for creating and managing hierarchical dependencies.

This article is adapted from Itzik Ben-Gan's upcoming book Advanced Transact-SQL for SQL Server 2000 (Apress), co-authored by Tom Moreau.

Comments

Plain text