DBA > Articles

Binary Trees in SQL

By: Joe Celko
To read more DBA articles, visit http://dba.fyicenter.com/article/

The Adjacency List Tree

So as to illustrate the point he'd reached, I've cleaned up the code just a little bit (changed to ISO-11179 data element names, added constraints, shorten a 100-character name, removed an audit timestamp, remove an IDENTITY, made the flag CHAR(1) instead of INTEGER, etc) he posted this table. Oh, I changed the table name for this article. Hey, I just want to show the skeleton.

CREATE TABLE Binary_Tree

(node_id INTEGER NOT NULL PRIMARY KEY,
parent_node_id INTEGER NOT NULL,


CHECK (parent_node_id <> node_id),
node_flag CHAR(1) DEFAULT 'L' NOT NULL -- L= left child, R= right child
CHECK (node_flag IN ('L', 'R'))


user_first_name VARCHAR(25) NOT NULL);

There are problems with this design of table such as the use of IDENTITY and flags. And root is neither left nor right, I guess it has to be a NULL.

The adjacency list is a way of "faking" pointer chains, the traditional programming method in procedural languages for handling trees. This is where we get terms like “parent”, “child”, “link” and so forth; these are classic assembly language terms.

There are major problems with the simple Adjacency List Model and this version is even more complicated.

You need to prevent orphans in any tree. That is, to be a tree there must be a path from every node back to the root node. This schema fails:

INSERT INTO Binary_Tree
VALUES (1, 2, 'L', 'Sam'),
(3, 4, 'R', 'Fred');

You need to prevent cycles. That is, nobody is his own boss. The simplest constraint for this is CHECK (parent_node_id <> node_id) and I am surprised that almost nobody bothers with it. But after that, it requires a trigger that follows the paths from the root. This schema fails as written.

INSERT INTO Binary_Tree
VALUES (1, 2, 'L', 'Sam'),
(1, 3, 'R', 'Fred'),
(3, 4, 'L', 'John'),
(4, 1, 'L', 'Loop');

The first thought is that you can constraint this model with a simple count of the children:

NOT EXISTS
(SELECT node_id
FROM Binary_Tree
GROUP BY node_id
HAVING COUNT(*) > 2)

This is not enough. We also need to require that we have at most one left child and at most one right child. Here is one way:

NOT EXISTS
(SELECT node_id
FROM Binary_Tree
GROUP BY node_id
HAVING SUM(CASE WHEN node_flag = 'L' THEN 1 ELSE 0 END) > 1
OR SUM(CASE WHEN node_flag = 'R' THEN 1 ELSE 0 END) > 1)


There are other constraints that make a graph a tree. We know that the number of edges in a tree is the number of nodes minus one so this is a connected graph. That constraint looks like this:

CHECK ((SELECT COUNT(*) FROM Binary_Tree) -1 -- edges

= (SELECT COUNT(parent_node_id) FROM Binary_Tree)) – nodes

I am showing this constraint and others as CHECK() clauses, but you will have to use TRIGGERs in SQL Server; I am being lazy about keeping the code as short as I can.

The Nested Set Solution
At this point, my regular readers are expecting me to pull out the Nested Set Model since I have written so much on it over the years. If you don't know how it works, you can Google it or get a copy of my TREES & HIERARCHIES book (ISBN: 978-1-55860-920-4)

CREATE TABLE NestTree

(node_name CHAR(2) NOT NULL PRIMARY KEY,

lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft > rgt));

NestTree

node_name lft rgt
===================

'A' 1 9

'B' 2 3

'C' 4 11

'D' 5 6

'E' 7 8


A nice thing is that the name of each node appears once and only once in the table. It avoids cycles easily and orphans are not possible. Now here are the kickers! How do you write a constraint to assure that this is a binary tree? Here is one way:

CHECK

(2 <= ALL

(SELECT COUNT (C.node_name)

FROM NestTree AS C

LEFT OUTER JOIN

NestTree AS P

ON P.lft

= (SELECT MAX(lft)

FROM NestTree AS S

WHERE C.lft > S.lft

AND C.lft < S.rgt)))


Yes, that is ugly code. And most SQL programmers do not know about the ALL() predicates. See if you can write a simpler constraint. The Binary Heap

Binary trees have lots of known mathematical properties. For example, the number of distinct binary trees with (n) nodes is called a Catalan number and it is give by the formula ((2n)!/((n+1)!n!)). The literature is full of various kinds of binary trees:

Perfect binary tree:

a binary tree in which each node has exactly zero or two children and all leaf nodes are at the same level. A perfect binary tree has exactly ((2^h)-1) nodes, where (h) is the height. Every perfect binary tree is a full binary tree and a complete binary tree.

Balanced binary tree:
a binary tree where no leaf is more than a certain amount farther from the root than any other leaf. See also AVL tree, red-black tree, height-balanced tree, weight-balanced tree, and B-tree.

AVL tree:
A balanced binary tree where the heights of the two subtrees rooted at a node differ from each other by at most one. The structure is named for the inventors, Adelson-Velskii and Landis (1962).

Height-balanced tree:
a tree whose subtrees differ in height by no more than one and the subtrees are height-balanced, too. An empty tree is height-balanced. A binary tree can be skewed to one side or the other. As an extreme example, imagine a binary tree with only left children, all in a straight line. The ideal situation is to have a balanced binary tree -- one that is as shallow as possible because at each subtree the left and right children are the same size or no more than one node different. This will give us a worst search time of LOG2(n) tries for a set of(n) nodes. Fibonacci tree:

A variant of a binary tree where a tree of order(n) where(n > 1) has a left subtree of order n-1 and a right subtree of order(n-2). An order 0 Fibonacci tree has no nodes, and an order 1 tree has 1 node. A Fibonacci tree of order(n) has(F(n + 2) - 1) nodes, where F(n) is the n-th Fibonacci number. A Fibonacci tree is the most unbalanced AVL tree possible.

The binary tree that we are interested in is called a heap. It is used for the HeapSort, which first appeared in J. W. J. Williams' article published in the June 1964 issue of Communications of the ACM, titled "Algorithm number 232 – Heapsort. Currently, there is an article on using a version of a heap for data access in a virtual machine environment , 'Think you've mastered the art of server performance? Think again'. by Poul-Henning Kamp

Start with a simple one dimensional array and put the root value in A[1], then the left child does into A[2] and the right child goes into A[3]. In general, the left child of the subtree root in location (n) of the array is A[2*n] and the right child is A[2*n +1].

Damjan S. Vujnovic (damjan@galeb.etf.bg.ac.yu) worked out the details of the following queries against a binary tree. Let's construct a binary tree and load it with some sample data.

Full article...


Other Related Articles

... to read more DBA articles, visit http://dba.fyicenter.com/article/