In previous articles, I’ve discussed the treatment of specialized structures in SQL Server 2005 called graphs and trees. Graphs are data structures that represent relationships between pairs of nodes. A tree is a graph that has a single root node, and you can reach each node in the tree structure by following only one path. An example of a tree is an employee organizational chart, which has many employees but only one CEO, and each employee has only one management chain. T-SQL in SQL Server 2005 lets you solve challenging problems related to the treatment of trees. For example, in "Get in the Loop with CTEs" (May 2004, InstantDoc ID 42072), I use the recursive common table expressions (CTEs) introduced in SQL Server 2005 to analyze employee hierarchies by manipulating data in a tree structure.
In this article, I’ll solve another challenge related to a tree structure—posted by SQL Server MVP Erland Sommarskog in a private Microsoft Most Valuable Professional (MVP) forum—to calculate for each employee the sum of the salaries of all subordinates in all levels, including the employee at the top of a subtree.
Calculating the Sum of Salaries in a Tree
To solve the challenge, first run the code in Listing 1 to create and populate the Employees table that I'll use in my examples throughout this article. The Employees table has a row for each employee that lists the employee's ID (empid), manager ID (mgrid), name, and salary. In this table, David is the CEO of the organization. Because David is the root of the tree, his mgrid column is NULL. You'll be able to reach each employee by following only one path of managers starting with the root (i.e., David). Table 1 shows the result of calculating the sum of David's subordinates' salaries.
Now let's calculate the salaries in Aaron's tree. Aaron’s salary is $5000. Aaron’s direct and indirect subordinates are Rita (empid = 9, salary = $3000); Gabriel (empid = 11, salary = $3000); Emilia (empid = 12, salary = $2000); Michael (empid = 13, salary = $2000); and Didi (empid = 14, salary = $1500). The sum of the salaries of Aaron and his subordinates in all levels is $5000 + $3000 + $3000 + $2000 + $2000 + $1500 = $16,500. Now, try to calculate the sum for employee ID 2 (Eitan). Table 2 shows the results you should see.
Remember, the challenge at hand is to calculate for each employee the sum of the salaries of all subordinates in all levels, including the employee at the top of a subtree. Try to solve the challenges yourself before looking at my solutions.
Solution Based on Recursive CTEs
My first solution, which Listing 2 shows, uses a recursive CTE called Explode to generate the results that Table 1 shows. The Explode CTE generates results similar to a transitive closure of the tree. This means that the Explode CTE generates all pairs of employees who share a management path between them. The major column represents a manager ID, the minor column represents a subordinate (direct or indirect) employee ID, and salary represents the salary of the subordinate(s). The anchor member of the CTE (the first query in the CTE) returns the "self" pairs for all employees—that is, both the major and minor columns hold the same employee ID. Remember that the accumulated salary values should include the employee at the top of the subtree. The anchor member returns a row for each employee, and each row represents the root of a different subtree. The purpose of the recursive member—the second query in the CTE—is to explode each root to its subtree to return all subordinates of the subtree in all levels along with their salaries. In each invocation, the recursive member returns the next level of subordinates for each employee in the previous level. Each row consists of the subtree’s root employee ID (major), the subordinate’s ID, and the salary.
The third query in the CTE aggregates each subtree represented by the major column, groups the rows from Explode by major (employee ID), and returns the total salary for each group. To apply this solution to a subtree of a given node (say Eitan, empid = 2), you can add a preliminary CTE called Subtree, which uses a traditional recursive CTE that returns the subtree of a given node. Then have the Explode CTE query the Subtree CTE instead of querying the Employees table directly. The solution that Listing 3 shows generates the results in Table 2.
Solution Based on Loops
You can also solve the challenge and generate the results that Table 1 shows by using loops. Listing 4 shows you how to calculate the salaries for all employees. First, the code calculates the level of each employee (the distance from the root) by assigning a level counter (@lvl) for each subordinate and a value of 0 to the root. As the code moves its way down the tree—a level at a time—it increments the level counter by 1 for each level. Then the code stores a row for each employee in a temporary table called #Tree. The row for each employee consists of the columns level (lvl), child employee ID (cid), parent employee ID (pid), and salary. The code then scans the employees a level at a time, starting with the bottom level and moving upward. For each employee, the code calculates the cumulative subtree sum of salaries as the current employee’s salary plus the cumulative salaries of the subordinates.
The code calculates the current employee’s salary plus the cumulative salaries of the subordinates by using an UPDATE CTE. The CTE’s inner query joins the level of employees above the current level (#Tree AS Target—the target for the update) with a derived table called Source—the table with the input data for the UPDATE statement—that represents the current level of employees. Source contains for each manager in the current level the sum of cumulative salaries of all subordinates and generates a column called src_salary. The outer UPDATE statement updates the target manager’s salary (tgt_salary) with his or her current salary plus the sum of his or her subordinate's cumulative salaries (src_salary).
To apply this solution to a particular subtree whose node ID you get as input, simply replace the commented filter in the first INSERT... SELECT statement. Replace WHERE mgrid IS NULL with WHERE empid = @root. For example, enter WHERE empid = 2 to receive the results for Eitan's subtree, as Table 2 shows.
Shorter or Faster?
The main benefit of the solutions that Listings 2 and 3 show are that the solutions that use recursive CTEs are short and elegant and therefore easy to maintain. However, these solutions are slow if you run them against large trees. Think about it. The Explode recursive CTE explodes each employee and his or her subordinates in all levels, then aggregates the data. This means that each employee is returned once for each direct or indirect manager.
The solution I show in Listing 4 that's based on loops requires more code and is harder to maintain. However, this solution doesn't explode each employee to all of his or her subordinates. Instead, the solution reuses cumulative values calculated in the current level to calculate the next level. The loop solution ends up scanning less data and therefore is substantially faster.