Hierarchy traversal - BOMs in Sybase

DBA > Job Interview Questions > Sybase Interview Questions and Answers

More DBA job interview questions and answers at http://dba.fyicenter.com/Interview-Questions/

(Continued from previous question...)

Hierarchy traversal - BOMs in Sybase

Alright, so you wanna know more about representing hierarchies in a relational database? Before I get in to the nitty gritty I should at least give all of the credit for this algorithm to: "_Hierarical_Structures:_The_Relational_Taboo!_, _(Can_ Transitive_Closure_Queries_be_Efficient?)_", by Michael J. Kamfonas as published in 1992 "Relational Journal" (I don't know which volume or issue).

The basic algorithm goes like this, given a tree (hierarchy) that looks roughly like this (forgive the ASCII art--I hope you are using a fixed font to view this):

                                    a
                                   / \
                                 /     \
                               /         \
                             b             c
                            / \           /|\
                           /   \        /  |  \
                          /     \     /    |   \
                         d       e   f     |    g

Note, that the tree need not be balanced for this algorithm to work.

The next step assigned two numbers to each node in the tree, called left and right numbers, such that the left and right numbers of each node contain the left and right numbers of the ancestors of that node (I'll get into the algorithm for assigning these left and right numbers later, but, hint: use a depth-first search):

                                   1a16
                                   / \
                                 /     \
                               /         \
                            2b7           8c15
                            / \           /|\
                           /   \        /  |  \
                          /     \     /    |   \
                        3d4     5e6 9f10 11g12 13h14

Side Note: The careful observer will notice that these left and right numbers look an awful lot like a B-Tree index.

So, you will notice that all of the children of node 'a' have left and right numbers between 1 and 16, and likewise all of the children of 'c' have left and right numbers between 8 and 15. In a slightly more relational format this table would look like:

              Table: hier
   node   parent left_nbr  right_nbr
   -----  ------ --------  ---------
   a        NULL        1         16
   b           a        2          7
   c           a        8         15
   d           b        3          4
   e           b        5          6
   f           c        9         10
   g           c       11         12
   h           c       13         14

So, given a node name, say @node (in Sybase variable format), and you want to know all of the children of the node you can do:

    SELECT h2.node
      FROM hier   h1,
           hier   h2
     WHERE h1.node      =   @node
       AND h2.left_nbr  >   h1.left_nbr
       AND h2.left_nbr  <   h1.right_nbr

If you had a table that contained, say, the salary for each node in your hierarchy (assuming a node is actually a individual in a company) you could then figure out the total salary for all of the people working underneath of @node by doing:

    SELECT sum(s.salary)
      FROM hier   h1,
           hier   h2,
           salary s
     WHERE h1.node      =   @node
       AND h2.left_nbr  >   h1.left_nbr
       AND h2.right_nbr >   h1.right_nbr
       AND s.node       =   h2.node

Pretty cool, eh? And, conversely, if you wanted to know how much it cost to manage @node (i.e. the combined salary of all of the boss's of @node), you can do:

    SELECT sum(s.salary)
      FROM hier   h1,
           hier   h2,
           salary s
     WHERE h1.node      =   @node
       AND h2.left_nbr  <   h1.left_nbr
       AND h2.left_nbr  >   h1.right_nbr
       AND s.node       =   h2.node

Now that you can see the algorithm in action everything looks peachy, however the sticky point is the method in which left and right numbers get assigned. And, unfortunately, there is no easy method to do this relationally (it can be done, it just ain't that easy). For an real- world application that I have worked on, we had an external program used to build and maintain the hierarchies, and it was this program's responsibility to assign the left and right numbers.

But, in brief, here is the algorithm to assign left and right numbers to every node in a hierarchy. Note while reading this that this algorithm uses an array as a stack, however since arrays are not available in Sybase, they are (questionably) emulated using a temp table.

    DECLARE @skip            int,
            @counter         int,
            @idx             int,
            @left_nbr        int,
            @node            varchar(10)

    /*-- Initialize variables --*/
    SELECT @skip    = 1000,   /* Leave gaps in left & right numbers */
           @counter = 0,      /* Counter of next available left number */
           @idx     = 0       /* Index into array */

    /*
     * The following table is used to emulate an array for Sybase,
     * for Oracle this wouldn't be a problem. :(
     */
    CREATE TABLE #a (
        idx          int           NOT NULL,
        node         varchar(10)   NOT NULL,
        left_nbr     int           NOT NULL
    )

    /*
     * I know that I always preach about not using cursors, and there
     * are ways to get around it, but in this case I am more worried
     * about readability over performance.
     */
    DECLARE root_cur CURSOR FOR
      SELECT h.node
        FROM hier h
       WHERE h.parent IS NULL
    FOR READ ONLY

    /*
     * Here we are populating our "stack" with all of the root
     * nodes of the hierarchy.  We are using the cursor in order
     * to assign an increasing index into the "stack"...this could
     * be done using an identity column and a little trickery.
     */
    OPEN root_cur
    FETCH root_cur INTO @node
    WHILE (@@sqlstatus = 0)
    BEGIN
      SELECT @idx = @idx + 1
      INSERT INTO #a VALUES (@idx, @node, 0)
      FETCH root_cur INTO @node
    END
    CLOSE root_cur
    DEALLOCATE CURSOR root_cur

    /*
     * The following cursor will be employed to retrieve all of
     * the children of a given parent.
     */
    DECLARE child_cur CURSOR FOR
      SELECT h.node
        FROM hier h
       WHERE h.parent = @node
    FOR READ ONLY

    /*
     * While our stack is not empty.
     */
    WHILE (@idx > 0)
    BEGIN
      /*
       * Look at the element on the top of the stack.
       */
      SELECT @node      = node,
             @left_nbr  = left_nbr
        FROM #a
       WHERE idx = @idx

      /*
       * If the element at the top of the stack has not been assigned
       * a left number yet, then we assign it one and copy its children
       * on the stack as "nodes to be looked at".
       */
      IF (@left_nbr = 0)
      BEGIN
         /*
          * Set the left number of the current node to be @counter + @skip.
          * Note, we are doing a depth-first traversal, assigning left
          * numbers as we go.
          */
         SELECT @counter  = @counter + @skip
         UPDATE #a
            SET left_nbr  = @counter
          WHERE idx = @idx

         /*
          * Append the children of the current node to the "stack".
          */
         OPEN child_cur
         FETCH child_cur INTO @node
         WHILE (@@sqlstatus = 0)
         BEGIN
            SELECT @idx = @idx + 1
            INSERT INTO #a VALUES (@idx, @node, 0)
            FETCH child_cur INTO @node
         END
         CLOSE child_cur

      END
      ELSE
      BEGIN
         /*
          * It turns out that the current node already has a left
          * number assigned to it, so we just need to assign the
          * right number and update the node in the actual
          * hierarchy.
          */
         SELECT @counter = @counter + @skip

         UPDATE h
            SET left_nbr  = @left_nbr,
                right_nbr = @counter
          WHERE h.node    = @node

         /*
          * "Pop" the current node off our "stack".
          */
         DELETE #a WHERE idx = @idx
         SELECT @idx = @idx - 1
      END
    END /* WHILE (@idx > 0) */
    DEALLOCATE CURSOR child_cur

While reading through this, you should notice that assigning the left and right numbers to the entire hierarchy is very costly, especially as the size of the hierarchy grows. If you put the above code in an insert trigger on the hier table, the overhead for inserting each node would be phenomenal. However, it is possible to reduce the overall cost of an insertion into the hierarchy.

By leaving huge gaps in the left & right numbers (using the @skip variable), you can reduce the circumstances in which the numbers need to be reassigned for a given insert. Thus, as long as you can squeeze a new node between an existing pair of left and right numbers you don't need to do the re-assignment (which could affect all of the node in the hierarchy).
By keeping an extra flag around in the hier table to indicate which nodes are leaf nodes (this could be maintained with a trigger as well), you avoid placing leaf nodes in the array and thus reduce the number of updates.

Deletes on this table should never cause the left and right numbers to be re-assigned (you could even have a trigger automagically re-parent orphaned hierarchy nodes).

All-in-all, this algorithm is very effective as long as the structure of the hierarchy does not change very often, and even then, as you can see, there are ways of getting around a lot of its inefficiencies.

(Continued on next question...)