| |||||
|
Hierarchy traversal - BOMs in Sybase More DBA job interview questions and answers at
http://dba.fyicenter.com/Interview-Questions/
(Continued from previous question...) Hierarchy traversal - BOMs in Sybase Alright, so you wanna know more about representing hierarchies in a relational database? Before I get in to the nitty gritty I should at least give all of the credit for this algorithm to: "_Hierarical_Structures:_The_Relational_Taboo!_, _(Can_ Transitive_Closure_Queries_be_Efficient?)_", by Michael J. Kamfonas as published in 1992 "Relational Journal" (I don't know which volume or issue). The basic algorithm goes like this, given a tree (hierarchy) that looks roughly like this (forgive the ASCII art--I hope you are using a fixed font to view this): a / \ / \ / \ b c / \ /|\ / \ / | \ / \ / | \ d e f | g
The next step assigned two numbers to each node in the tree, called left and right numbers, such that the left and right numbers of each node contain the left and right numbers of the ancestors of that node (I'll get into the algorithm for assigning these left and right numbers later, but, hint: use a depth-first search): 1a16 / \ / \ / \ 2b7 8c15 / \ /|\ / \ / | \ / \ / | \ 3d4 5e6 9f10 11g12 13h14
So, you will notice that all of the children of node 'a' have left and right numbers between 1 and 16, and likewise all of the children of 'c' have left and right numbers between 8 and 15. In a slightly more relational format this table would look like: Table: hier node parent left_nbr right_nbr ----- ------ -------- --------- a NULL 1 16 b a 2 7 c a 8 15 d b 3 4 e b 5 6 f c 9 10 g c 11 12 h c 13 14 So, given a node name, say @node (in Sybase variable format), and you want to know all of the children of the node you can do: SELECT h2.node FROM hier h1, hier h2 WHERE h1.node = @node AND h2.left_nbr > h1.left_nbr AND h2.left_nbr < h1.right_nbr If you had a table that contained, say, the salary for each node in your hierarchy (assuming a node is actually a individual in a company) you could then figure out the total salary for all of the people working underneath of @node by doing: SELECT sum(s.salary) FROM hier h1, hier h2, salary s WHERE h1.node = @node AND h2.left_nbr > h1.left_nbr AND h2.right_nbr > h1.right_nbr AND s.node = h2.node Pretty cool, eh? And, conversely, if you wanted to know how much it cost to manage @node (i.e. the combined salary of all of the boss's of @node), you can do: SELECT sum(s.salary) FROM hier h1, hier h2, salary s WHERE h1.node = @node AND h2.left_nbr < h1.left_nbr AND h2.left_nbr > h1.right_nbr AND s.node = h2.node Now that you can see the algorithm in action everything looks peachy, however the sticky point is the method in which left and right numbers get assigned. And, unfortunately, there is no easy method to do this relationally (it can be done, it just ain't that easy). For an real- world application that I have worked on, we had an external program used to build and maintain the hierarchies, and it was this program's responsibility to assign the left and right numbers. But, in brief, here is the algorithm to assign left and right numbers to every node in a hierarchy. Note while reading this that this algorithm uses an array as a stack, however since arrays are not available in Sybase, they are (questionably) emulated using a temp table. DECLARE @skip int, @counter int, @idx int, @left_nbr int, @node varchar(10) /*-- Initialize variables --*/ SELECT @skip = 1000, /* Leave gaps in left & right numbers */ @counter = 0, /* Counter of next available left number */ @idx = 0 /* Index into array */ /* * The following table is used to emulate an array for Sybase, * for Oracle this wouldn't be a problem. :( */ CREATE TABLE #a ( idx int NOT NULL, node varchar(10) NOT NULL, left_nbr int NOT NULL ) /* * I know that I always preach about not using cursors, and there * are ways to get around it, but in this case I am more worried * about readability over performance. */ DECLARE root_cur CURSOR FOR SELECT h.node FROM hier h WHERE h.parent IS NULL FOR READ ONLY /* * Here we are populating our "stack" with all of the root * nodes of the hierarchy. We are using the cursor in order * to assign an increasing index into the "stack"...this could * be done using an identity column and a little trickery. */ OPEN root_cur FETCH root_cur INTO @node WHILE (@@sqlstatus = 0) BEGIN SELECT @idx = @idx + 1 INSERT INTO #a VALUES (@idx, @node, 0) FETCH root_cur INTO @node END CLOSE root_cur DEALLOCATE CURSOR root_cur /* * The following cursor will be employed to retrieve all of * the children of a given parent. */ DECLARE child_cur CURSOR FOR SELECT h.node FROM hier h WHERE h.parent = @node FOR READ ONLY /* * While our stack is not empty. */ WHILE (@idx > 0) BEGIN /* * Look at the element on the top of the stack. */ SELECT @node = node, @left_nbr = left_nbr FROM #a WHERE idx = @idx /* * If the element at the top of the stack has not been assigned * a left number yet, then we assign it one and copy its children * on the stack as "nodes to be looked at". */ IF (@left_nbr = 0) BEGIN /* * Set the left number of the current node to be @counter + @skip. * Note, we are doing a depth-first traversal, assigning left * numbers as we go. */ SELECT @counter = @counter + @skip UPDATE #a SET left_nbr = @counter WHERE idx = @idx /* * Append the children of the current node to the "stack". */ OPEN child_cur FETCH child_cur INTO @node WHILE (@@sqlstatus = 0) BEGIN SELECT @idx = @idx + 1 INSERT INTO #a VALUES (@idx, @node, 0) FETCH child_cur INTO @node END CLOSE child_cur END ELSE BEGIN /* * It turns out that the current node already has a left * number assigned to it, so we just need to assign the * right number and update the node in the actual * hierarchy. */ SELECT @counter = @counter + @skip UPDATE h SET left_nbr = @left_nbr, right_nbr = @counter WHERE h.node = @node /* * "Pop" the current node off our "stack". */ DELETE #a WHERE idx = @idx SELECT @idx = @idx - 1 END END /* WHILE (@idx > 0) */ DEALLOCATE CURSOR child_cur While reading through this, you should notice that assigning the left and right numbers to the entire hierarchy is very costly, especially as the size of the hierarchy grows. If you put the above code in an insert trigger on the hier table, the overhead for inserting each node would be phenomenal. However, it is possible to reduce the overall cost of an insertion into the hierarchy.
Deletes on this table should never cause the left and right numbers to be re-assigned (you could even have a trigger automagically re-parent orphaned hierarchy nodes). All-in-all, this algorithm is very effective as long as the structure of the hierarchy does not change very often, and even then, as you can see, there are ways of getting around a lot of its inefficiencies. (Continued on next question...)
Other Job Interview Questions
|
||||