SQL Server 2008’s new data type, called HIERARCHYID, provides powerful new capabilities. For example, you can use HIERARCHYID to represent nodes in a hierarchy and to query the hierarchy, as I demonstrated in last month’s column. (See “HierarchyID,” July 2008.) Another capability you can use is to move a subtree from one parent location in the hierarchy to another by using the GetReparentedValue method. In addition, you can convert a parent-child representation of the hierarchy to a representation that uses the new HIERARCHYID data type. I’ll walk through these capabilities in this month’s column, using the same Employees table that I used in last month’s column. If you’re following along, you’ll want to run the code in Listing 1 to create the Employees table and populate it with sample data.
Moving a Subtree
When I showed how to insert new nodes into the hierarchy last month, I explained how to use the GetDescendant method to calculate a HIERARCHYID value for a new node under an existing node. You might need to handle other types of changes within hierarchies as well. For example, you might need to move a whole subtree of employees from one manager to another (i.e., move an employee and all of his or her subordinates).
To achieve this task you can use the GetReparented- Value method of the HIERARCHYID data type. You invoke this method on the HIERARCHYID value of the node you want to reparent (call it @existing_ node_hid), and provide as inputs the value of the old parent (call it @old_parent_hid) and the value of the new parent (@new_mgr_hid). Here’s the syntax of the GetReparentedValue method:
Note that this method doesn’t overwrite or change the value in @existing_node_hid. Instead, it returns a new HIERARCHYID value. You must update the target node’s HIERARCHYID value with the new one. Logically, the GetReparentedValue method does something very simple—it substitutes the part of the existing node’s path that represents the old parent’s path with the new parent’s path. If the path of the existing node is /x/a/, the path of the old parent is /x/, and the path of the new parent is /y/, the GetReparentedValue method would return /y/a/. For example, if the path of the existing node is /1/2/1/, the path of the old parent is /1/2/, and the path of the new parent is /2/1/1/4/, the GetReparentedValue method would return /2/1/1/4/1/.
If the target parent already has children, the Get- ReparentedValue method won’t necessarily produce a unique value. If you reparent node /x/a/ from old parent /x/ to new parent /y/, and /y/ already has a child /y/a/, you will get a duplicate value. The trick is that you don’t use the GetReparentedValue method to move a single node from one parent to another; for this task you can simply use the GetDescendant method, and produce a completely new value. You use the GetReparentedValue method when you need to move a whole subtree in conjunction with the GetDescendant method.
Suppose you need to move the whole subtree of employee /x/a/ from old parent /x/ to new parent /y/ that already has children. You achieve this with the following steps:
1. Use the GetDescendant method to produce a completely new HIERARCHYID value under /y/ (call it /y/b/).
2. Update the HIERARCHYID value of all nodes in the subtree of /x/a/ (including self), to the return value of node.GetReparentedValue(/x/a/, /y/b/).
Because /y/b/ is a completely new HIERARCHYID value under the target parent, /y/b/ has no existing children; hence, moving the subtree doesn’t cause conflicting HIERARCHYID values.
Run the code in Listing 2 to create the usp_Reparent stored procedure that implements this logic to move a subtree. The code in the procedure first opens a transaction. Within the transaction, the code retrieves the hid value of the new manager (@new_mgr_hid) and specifies the UPDLOCK hint to obtain an UPDATE lock on the row. Only one transaction can hold an UPDATE lock at a time; once obtained, this lock is kept until the end of the transaction. This way, if simultaneous invocations of usp_Reparent and usp_AddEmp make requests to create a new HIERARCHYID value under the same target manager, those requests won’t produce duplicate values, but instead will be queued.
The code retrieves the hid value of the old manager (@old_root). Next, the code uses the GetDescendant method to produce a new HIERARCHYID value under the new manager (@new_root). Then, the GetReparentedValue method updates the HIERARCHYID values of the employee that is moved and of all descendants of that employee. Finally, the code commits the transaction, releasing all locks.
To demonstrate using the usp_Reparent procedure, run the code in Listing 3. The code first queries the Employees table, producing the output shown in Table 1. The code then invokes the usp_Reparent procedure to reparent the subtree of employee 5 (Jiru) under manager 9 (Rita). Finally, the code queries the Employees table again to present the hierarchy after the subtree has been moved. Table 2 shows the output of this query.
Converting Parent-Child to HIERARCHYID
Suppose you have an existing hierarchy represented as an adjacency list with parent-child relationships, and you need to convert it to a representation based on the new HIERARCHYID data type. To achieve this task, first run the code in Listing 4 and Listing 5. The code in Listing 4 creates and populates the source EmployeesOld table, where the parent-child relationships are reflected by the mgrid-empid attributes. The code in Listing 5 creates the target EmployeesNew table, where the converted hierarchy of employees will be stored. The hid attribute will hold the HIERARCHYID value for each employee.
To perform the conversion, I take the following steps:
1. Define a regular common table expression (CTE— call it EmpsRN) that calculates a row number for each employee; the row number is partitioned by the manager (mgrid) and ordered by the attributes that you want to determine order among siblings (say, empid).
2. Define a recursive CTE (call it EmpPaths) that queries EmpsRN and builds a character path for each employee. The anchor member will query the root employee (CEO), and assign ‘/’ as the path. The recursive member will query the direct subordinates of the previous level of managers in each iteration, then concatenate to the manager’s path the current employee’s row number and a slash.
3. Query the EmpPaths table and convert the path that was produced for each employee to the HIERARCHYID data type, and insert the result rows to the EmployeesNew table.
The complete solution that converts the old hierarchy to the new one is shown in Listing 6. After running the code in Listing 6, query the new hierarchy as follows:
SELECT REPLICATE(‘ | ‘, lvl) + empname AS empname,hid.ToString() AS path FROM dbo.EmployeesNew ORDER BY hid;
You’ll get the output shown in Table 3, where you can see the logical paths that were originally constructed from the row numbers of the employees in the management chain leading to each employee.
Practice, Practice, Practice
The new HIERARCHYID data type provides a native way to handle hierarchies, with behavior and functionality encapsulated in the type and exposed in the form of methods. The purpose of having a native type is to simplify maintaining and querying hierarchies, but as is typical with new tools, the HIERARCHYID data type requires some getting used to. Requests for a subgraph, path, presentation, reparenting, and so on simply look different when the hierarchy is represented with the HIERARCHYID data type. Only after you spend enough time practicing with a new tool can you truly appreciate whether it improves your experience.