CMPS 144
Notes on binary trees.

Defn: An undirected graph is a set of nodes (also called vertices) and edges, where each edge is a set of two nodes (indicating that those two nodes are "connected" by the edge). In drawing a graph, we use circles (or boxes) to depict nodes and (not necessarily straight) lines (from one node to another) to depict edges. Note that no two edges can connect the same pair of nodes. (If we allow two or more edges to connect the same pair of nodes, we call the resulting structure a "multi-graph".)

Defn: A path in a graph is a sequence of distinct edges, in which each edge in the sequence "begins" in the node at which the previous edge "ends". The length of a path is the number of edges in it. For example, in a graph whose nodes are named by upper case letters, the path (of length three) described by the sequence of nodes <B, D, A, E> is composed of the edges {B,D}, {D,A}, and {A,E}.

Defn: A graph G is connected if, for every pair of nodes u and v in G, there is a path connecting u and v.

Defn: A cycle in a graph is a path of length three or more that begins and ends at the same node. (A "trivial" path containing no edges (and thus beginning and ending at the same node) is not considered to be a cycle. Nor can a cycle have length two, because then it would have to use the same edge twice.)

Defn: A graph G is acyclic if it contains no cycles.
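
The definitions of "connected" and "acyclic" can be checked mechanically. Below is a minimal sketch, assuming that nodes are numbered 0..n-1 and that the graph is stored as adjacency lists. The class SimpleGraph and its method names are hypothetical and do not appear elsewhere in these notes. The acyclicity test relies on the fact that a graph with n nodes and c connected components contains no cycle exactly when it has n - c edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper class (not part of the notes): an undirected graph
// whose nodes are numbered 0..n-1, stored as adjacency lists.
class SimpleGraph {
    private final List<List<Integer>> adj = new ArrayList<>();
    private int edgeCount = 0;

    SimpleGraph(int n) {
        for (int i = 0; i < n; i++) { adj.add(new ArrayList<>()); }
    }

    // Adds the edge {u,v}.  Assumes u != v and that {u,v} is not already
    // present (otherwise we would have a multi-graph).
    void addEdge(int u, int v) {
        adj.get(u).add(v);
        adj.get(v).add(u);
        edgeCount++;
    }

    // Counts connected components by starting a breadth-first search
    // from every node that has not yet been reached.
    int componentCount() {
        boolean[] seen = new boolean[adj.size()];
        int components = 0;
        for (int start = 0; start < adj.size(); start++) {
            if (seen[start]) { continue; }
            components++;
            Deque<Integer> queue = new ArrayDeque<>();
            seen[start] = true;
            queue.add(start);
            while (!queue.isEmpty()) {
                int u = queue.remove();
                for (int v : adj.get(u)) {
                    if (!seen[v]) { seen[v] = true; queue.add(v); }
                }
            }
        }
        return components;
    }

    // Connected: there is a path between every pair of nodes.
    boolean isConnected() { return componentCount() <= 1; }

    // Acyclic: with n nodes and c components, exactly n - c edges.
    boolean isAcyclic() { return edgeCount == adj.size() - componentCount(); }

    public static void main(String[] args) {
        SimpleGraph g = new SimpleGraph(4);
        g.addEdge(0, 1); g.addEdge(1, 2); g.addEdge(2, 3);
        System.out.println(g.isConnected() + " " + g.isAcyclic());
    }
}
```

Adding the edge {3,1} to the graph built in main would create the cycle <1, 2, 3, 1>, after which isAcyclic() would yield false.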

Defn: A (free) tree is a connected, acyclic, undirected graph. Equivalently, a tree is an undirected graph in which, for every pair of nodes u and v, there is a unique path from u to v.

In computing, we are usually interested in a variation of this concept called a rooted tree. To obtain a rooted tree from a free tree, you choose one of its nodes to be the root and consider all the edges to be directed "away" from that node. (For some applications, it is better to think of all edges as being directed toward the root.) The nodes directly connected to the root are called its children. Each such node has as its children all those nodes to which it is connected by an edge, with the exception of its parent.

In effect, this kind of tree describes a hierarchical structure and corresponds to an outline. Below is an outline, formatted in the traditional way, followed by a rooted tree depicting the same information (except that some details have been omitted).

U of S
I. CAS
   A. English
   B. CS
      1. Beidler
         (a) CMPS 240
         (b) SE 512
      2. Bi
         (a) CMPS 352
         (b) SE 516
   C. Math
      1. Monks
      2. Dougherty
   D. History
II. KSOM
   A. Accounting
   B. Marketing
   C. Finance
III. Panuska School (PAN)
   A. Nursing
   B. PT
   C. EDU

                        U of S
                       /   |   \
                     /     |     \
                   /       |       \
                 /         |         \
               /           |           \
             /             |             \
           CAS           KSOM            PAN
          / | \          / | \          / | \ 
        /   |   \       /  |  \        /  |  \ 
       CS  HIST MATH  ACC MAR FIN    NURS PT  EDU
     /   \
    /     \
Beidler   Bi 
         /  \
        /    \
   CMPS352  SE516

Notice the resemblance to a tree in nature (but upside down).

In order to converse about rooted trees, we introduce several terms:

We could also describe a rooted tree recursively, by saying that a (rooted) tree is either

  1. empty (meaning it has no nodes or edges), or
  2. nonempty, consisting of a root node r having zero or more children, each of which is the root node of a (sub)tree; no two of these subtrees have any nodes in common, and none of them contains r.
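
The recursive description translates almost directly into a data type. What follows is a minimal sketch, assuming that the empty tree is represented by null and that each node carries a String label. The class RootedTree and its methods are hypothetical names chosen for illustration. Both size and outline follow the two cases of the recursive description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class (not part of the notes): a nonempty rooted tree is a
// root label plus a list of child subtrees; null stands for the empty tree.
class RootedTree {
    final String label;
    final List<RootedTree> children = new ArrayList<>();

    RootedTree(String label) { this.label = label; }

    // Makes child the last child of this tree's root; returns this tree
    // so that calls can be chained.
    RootedTree add(RootedTree child) { children.add(child); return this; }

    // Number of nodes: 0 for the empty tree, otherwise 1 (for the root)
    // plus the sizes of the subtrees rooted at the children.
    static int size(RootedTree t) {
        if (t == null) { return 0; }
        int n = 1;
        for (RootedTree c : t.children) { n += size(c); }
        return n;
    }

    // Renders the tree as an indented outline, one node per line.
    static String outline(RootedTree t) {
        StringBuilder out = new StringBuilder();
        outline(t, "", out);
        return out.toString();
    }

    private static void outline(RootedTree t, String indent, StringBuilder out) {
        if (t == null) { return; }
        out.append(indent).append(t.label).append('\n');
        for (RootedTree c : t.children) { outline(c, indent + "   ", out); }
    }

    public static void main(String[] args) {
        RootedTree t = new RootedTree("U of S")
            .add(new RootedTree("CAS")
                .add(new RootedTree("CS"))
                .add(new RootedTree("MATH")))
            .add(new RootedTree("KSOM"));
        System.out.print(outline(t));
    }
}
```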

Here we are interested specifically in binary trees:

Defn: A binary tree is a rooted tree in which each node has at most two children; furthermore, each child is distinguished as being either the left child or the right child of its parent. With respect to a node x, we refer to the subtree rooted at x's left (respectively, right) child as the left (respectively, right) subtree of x.

Now we give two very different specifications of a Binary Tree ADT. One is referred to as the recursive paradigm and the other is the positional paradigm. Which is to be preferred depends upon the application.

/* Recursive paradigm for Binary Trees */

public interface RecBinTree<T> {

   // observers

   boolean isEmpty();            // is this tree empty?
   boolean isLeaf();             // is this tree a single node?
   T rootOf();                   // returns object in root node of this tree
   RecBinTree<T> leftSubtree();  // returns left subtree of this tree
   RecBinTree<T> rightSubtree(); // returns right subtree of this tree


   // mutators

   void setRoot(T obj);       // replace object at root of this tree by obj

   // replaces this tree by t; makes t into an empty tree
   RecBinTree<T> graft(RecBinTree<T> t);

   // replaces this tree by a one-node tree containing obj
   RecBinTree<T> graft(T obj);

   // makes this tree empty; returns its old value
   RecBinTree<T> prune();

}  // end RecBinTree
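
To make the recursive paradigm concrete, here is a minimal sketch of one possible linked implementation. It is an illustration under stated assumptions, not the course's implementation: the class name LinkedRecBinTree is hypothetical, the interface is repeated (without comments) only so that the sketch compiles on its own, and the two graft methods are assumed to return the tree on which they were invoked, since the interface does not say what they return. The key design choice is that leftSubtree() and rightSubtree() return live tree objects, so that grafting onto an empty subtree changes the enclosing tree.

```java
// Repeated from the notes so that this sketch is self-contained.
interface RecBinTree<T> {
    boolean isEmpty();
    boolean isLeaf();
    T rootOf();
    RecBinTree<T> leftSubtree();
    RecBinTree<T> rightSubtree();
    void setRoot(T obj);
    RecBinTree<T> graft(RecBinTree<T> t);
    RecBinTree<T> graft(T obj);
    RecBinTree<T> prune();
}

// Hypothetical implementation.  An empty tree is an object whose 'empty'
// flag is set; a nonempty tree holds an item and two subtree objects,
// each of which may itself be empty.
class LinkedRecBinTree<T> implements RecBinTree<T> {
    private boolean empty = true;
    private T item;
    private LinkedRecBinTree<T> left, right;

    public boolean isEmpty() { return empty; }
    public boolean isLeaf()  { return !empty && left.empty && right.empty; }
    public T rootOf()                    { requireNonEmpty(); return item; }
    public RecBinTree<T> leftSubtree()   { requireNonEmpty(); return left; }
    public RecBinTree<T> rightSubtree()  { requireNonEmpty(); return right; }
    public void setRoot(T obj)           { requireNonEmpty(); item = obj; }

    // Replaces this tree by a one-node tree containing obj.
    // Assumption: returns this tree.
    public RecBinTree<T> graft(T obj) {
        empty = false;
        item = obj;
        left = new LinkedRecBinTree<>();
        right = new LinkedRecBinTree<>();
        return this;
    }

    // Replaces this tree by t and makes t empty.  Assumes that t is also
    // a LinkedRecBinTree.  Assumption: returns this tree.
    public RecBinTree<T> graft(RecBinTree<T> t) {
        LinkedRecBinTree<T> src = (LinkedRecBinTree<T>) t;
        if (src != this) { copyFrom(src); src.clear(); }
        return this;
    }

    // Makes this tree empty; returns its old value as a separate tree.
    public RecBinTree<T> prune() {
        LinkedRecBinTree<T> old = new LinkedRecBinTree<>();
        old.copyFrom(this);
        clear();
        return old;
    }

    private void copyFrom(LinkedRecBinTree<T> s) {
        empty = s.empty; item = s.item; left = s.left; right = s.right;
    }
    private void clear() { empty = true; item = null; left = null; right = null; }
    private void requireNonEmpty() {
        if (empty) { throw new IllegalStateException("tree is empty"); }
    }

    public static void main(String[] args) {
        RecBinTree<String> t = new LinkedRecBinTree<>();
        t.graft("root");
        t.leftSubtree().graft("left child");
        System.out.println(t.isLeaf() + " " + t.leftSubtree().rootOf());
    }
}
```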

/* Positional paradigm for Binary Trees, in which we imagine that
** there is a single cursor that can move around within the tree.  At
** any given moment, the cursor is at a "position" within the tree, which
** can be either a node or a "phantom", the latter corresponding to a
** place where a node does not exist but could be inserted.  (If, say, a
** particular node has no left child, the position where such a child could
** be inserted is a phantom.)  In other words, a phantom position corresponds
** to an empty subtree.  A phantom position in a tree plays a role similar
** to the rear position in a positional list.  In a binary tree, the number
** of phantom positions is exactly one more than the number of nodes.
*/
public interface PosBinTree<T> {

   // observers

   boolean isEmpty();       // is the tree empty?
   boolean atRoot();        // is cursor at the root?
   boolean atLeaf();        // is cursor at a leaf?
   boolean atPhantom();     // is cursor at a "phantom"?
   boolean atLeftChild();   // is cursor at a left child (possibly phantom)?
   boolean atRightChild();  // is cursor at a right child (possibly phantom)?
   boolean hasLeftChild();  // is cursor at a node having a left child?
   boolean hasRightChild(); // is cursor at a node having a right child?
   T getObj();              // yields item in current node (pre: !atPhantom)

   // navigator mutators

   void toRoot();       // move cursor to the root 
   void toLeftChild();  // move cursor to left child of its current position
   void toRightChild(); // move cursor to right child of its current position
   void toParent();     // move cursor to parent of its current position

   // preconditions:
   //   toLeftChild() and toRightChild() require !atPhantom()
   //   toParent() requires !atRoot() 


   // content mutators

   // pre: atPhantom()
   // post: node containing obj has been placed at cursor's position
   void graft(T obj);

   // pre: atPhantom()
   // post: s has been placed at cursor's position; s has been made empty
   void graft(PosBinTree<T> s);

   // post: subtree rooted at cursor's position has been chopped off from
   //       tree and returned as a separate tree
   PosBinTree<T> prune();

   void replace(T newObj);  // replaces object in node at cursor's position

}  // end PosBinTree 
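
The claim that a binary tree has exactly one more phantom position than it has nodes can be checked with a short recursive sketch. The class BinNode below is hypothetical; it assumes that null stands for an empty subtree, that is, for a phantom position. The reasoning mirrors the code: an empty tree has 0 nodes and 1 phantom, and a nonempty tree whose subtrees have a and b nodes has 1 + a + b nodes and (a + 1) + (b + 1) phantoms.

```java
// Hypothetical node class (not part of the notes); null represents an
// empty subtree, i.e., a phantom position.
class BinNode {
    final BinNode left, right;

    BinNode(BinNode left, BinNode right) {
        this.left = left;
        this.right = right;
    }

    // Number of nodes in the tree rooted at t.
    static int nodes(BinNode t) {
        return (t == null) ? 0 : 1 + nodes(t.left) + nodes(t.right);
    }

    // Number of phantom positions (empty subtrees) in the tree rooted at t.
    static int phantoms(BinNode t) {
        return (t == null) ? 1 : phantoms(t.left) + phantoms(t.right);
    }

    public static void main(String[] args) {
        BinNode leaf = new BinNode(null, null);
        BinNode t = new BinNode(new BinNode(null, leaf), new BinNode(null, null));
        System.out.println(nodes(t) + " nodes, " + phantoms(t) + " phantoms");
    }
}
```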


Among the applications of binary trees are

  1. representing arithmetic/boolean expressions
  2. lookup table (update and retrieval within a collection of data)
  3. representing priority queues (by means of a structure called a "heap")
  4. digital search trees

As an example of (1), consider the arithmetic expression

(2 + (5 - 17)) * 6 + 9 / 7

          +
        /   \
       /     \
      *      divide 
     / \      / \
    /   \    /   \
   +     6  9     7
  / \
 /   \
2     -
     / \
    /   \
   5     17  

Above is this expression depicted as a rooted tree. Notice that non-leaves are labeled by operators and leaves are labeled by integer literals. How would a tree such as this be processed in order to evaluate the corresponding expression? Recursively! If a tree is a single node (necessarily containing a number), it must correspond to the primitive expression consisting only of that number. Otherwise, the tree must have a root containing an operator and two subtrees corresponding to the expressions that are the left and right operands of that operator. In this case, it suffices to evaluate the two subtrees recursively and then to apply the operator in the root to the two values thereby obtained.

Making use of the recursive version of the binary tree (as well as a class Operator that we don't bother to specify), we could write it as follows:

   // pre: each leaf of t holds an Integer; each interior node holds an Operator
   int evaluate( RecBinTree<Object> t )  {

       if (t.isLeaf()) {  // the lone node of t contains a number
          return (Integer) t.rootOf();
       }
       else {  // root of t is an interior node containing an operator
          int y = evaluate( t.leftSubtree() );
          int z = evaluate( t.rightSubtree() );
          Operator op = (Operator) t.rootOf();
          return op.apply(y,z);  // applies op to y and z
       }
    }

Augmenting the method to allow for unary operators such as unary + and − (or ! (negation) in the case of boolean expressions) is conceptually easy but complicates the code so much that, for the sake of simplicity, we assume that all operators are binary.
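
For readers who want something they can run, here is a self-contained sketch of the same recursive evaluation. It does not use the RecBinTree ADT or the unspecified Operator class. Instead it assumes a hypothetical class ExprNode in which an interior node stores its operator as a char, and it assumes that division is Java's integer division (so that 9 / 7 yields 1).

```java
// Hypothetical expression-tree node (not part of the notes).  A leaf holds
// a number; an interior node holds an operator and two operand subtrees.
class ExprNode {
    final char op;              // '+', '-', '*', or '/' (unused in a leaf)
    final int value;            // meaningful only in a leaf
    final ExprNode left, right; // both null in a leaf

    ExprNode(int value) {
        this.op = ' '; this.value = value; this.left = null; this.right = null;
    }

    ExprNode(char op, ExprNode left, ExprNode right) {
        this.op = op; this.value = 0; this.left = left; this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }

    // Evaluates the expression represented by the tree rooted at t.
    static int evaluate(ExprNode t) {
        if (t.isLeaf()) { return t.value; }
        int y = evaluate(t.left);
        int z = evaluate(t.right);
        switch (t.op) {
            case '+': return y + z;
            case '-': return y - z;
            case '*': return y * z;
            case '/': return y / z;   // integer division (an assumption)
            default:  throw new IllegalArgumentException("unknown operator: " + t.op);
        }
    }

    // Builds the tree for (2 + (5 - 17)) * 6 + 9 / 7.
    static ExprNode sample() {
        ExprNode diff = new ExprNode('-', new ExprNode(5), new ExprNode(17));
        ExprNode sum  = new ExprNode('+', new ExprNode(2), diff);
        ExprNode prod = new ExprNode('*', sum, new ExprNode(6));
        ExprNode quot = new ExprNode('/', new ExprNode(9), new ExprNode(7));
        return new ExprNode('+', prod, quot);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(sample()));
    }
}
```

Under the integer-division assumption the sample tree evaluates to -59, because (2 + (5 - 17)) * 6 is -60 and 9 / 7 is 1.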


As for application (2), there is the binary search tree (BST).

A binary search tree (BST) is a binary tree in which, among the information being stored at each node there is a key field, and each node satisfies the condition that its key is greater than all those in the nodes in its left subtree but less than all those in nodes in its right subtree. (It is possible to handle duplicate keys in a BST nicely, but, for the sake of simplicity, we will not deal with this possibility.) For example, here is a BST (in which only the keys are shown):

                  47
                 /  \
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          /                \
        18                 77
       /  \               /  \
      /    \             /    \
     /      \           /      \
    /        \         /        \
   5         24      57          85
  /         /  \                /  \
-1        21   33             80    90
  \      /    /              /  \     \
   2   19   30              79   82    95
        \                      /
         20                   81 

To search for a key in a BST, we begin at the root and repeat the following step until the current position either is a "phantom" or holds the sought key: proceed to the left subtree if the sought key is less than the current node's key, or to the right subtree if it is greater.

As an example, suppose that we seek the key 82 in the BST above. We begin at the root, whose key is 47. Because 82 is larger, we move to the right child of the root, which has key 77. But 82 is larger than 77, so we go to that node's right child, where we find 85. As 82 is smaller than 85, we move to the left child, which contains 80. As 82 is larger than 80, we move to the right child, where we find 82.

Now suppose we seek 60. Using the same reasoning as above, from the root we move first to the right, then to the left, and then to the right again. At this point, we are at a phantom position, which tells us that the sought key is nowhere in the tree.

We could write it (using the positional paradigm) in Java as follows:

/* pre: t is a binary search tree with respect to the ordering
**      defined by comp (of type Comparator<T>).
** post: If itemSought is equal to the object in some node in t, the cursor's
**   position will be at such a node and the value true will be returned; 
**   otherwise, the cursor will end up at the phantom position where 
**   itemSought should be inserted (if desired) and false will be returned.
*/
boolean searchBST(PosBinTree<T> t, T itemSought) {

   boolean found = false;
   t.toRoot();
   while (!t.atPhantom()  &&  !found) {
      T crrntObj = t.getObj();
      int compareResult = comp.compare(itemSought, crrntObj);
      if (compareResult < 0)
         { t.toLeftChild(); }
      else if (compareResult > 0)
         { t.toRightChild(); }
      else
         { found = true; }
   }
   return found;
}

Employing the recursive paradigm, we could write it in Java as follows:

/*  pre: t is a binary search tree with respect to the ordering
**       defined by comp (of type Comparator<T>)
**  post: If itemSought is equal to an object in some node of t, the
**     subtree rooted at such a node will be returned; otherwise an
**     empty subtree of t at which itemSought belongs will be returned
*/
RecBinTree<T> searchBST(RecBinTree<T> t, T itemSought) {

   if (t.isEmpty())
      { return t; }
   else {
      int compareResult = comp.compare(itemSought,t.rootOf());
      if (compareResult < 0)
         { return searchBST( t.leftSubtree(), itemSought ); }
      else if (compareResult > 0)
         { return searchBST( t.rightSubtree(), itemSought ); }
      else
         { return t; }
   }
}

Once we know how to search in a BST, performing an insertion is easy, because (assuming that the search led to a phantom position in the case of the positional paradigm or, in the case of the recursive paradigm, to an empty subtree) all that is necessary is to insert a new node where the search ended. Translating this into Java, we get (in the positional paradigm)

/* pre: t is a binary search tree with respect to the ordering
**      defined by comp (of type Comparator<T>).
**  post: If some node in t contains an object that is equal to newObj,
**        t is not changed; otherwise, a new node containing newObj is
**        inserted into t.
*/
void insertBST(PosBinTree<T> t, T newObj) {

   if (searchBST(t, newObj))
      { }  // an object equal to newObj already exists in t
   else // the current position in t is where the new item should be inserted
      { t.graft(newObj); }
}

For the recursive paradigm, we get

/* pre: t is a binary search tree with respect to the ordering
**      defined by comp (of type Comparator<T>)
**  post: if some node in t contains an object that is equal to newObj,
**        t is not changed; otherwise, a new node containing newObj is
**        inserted into t.
*/
void insertBST(RecBinTree<T> t, T newObj) {

   RecBinTree<T> s = searchBST(t, newObj);
   if (s.isEmpty())
      { s.graft(newObj); }
   else
      { }   // an object equal to newObj already exists in t
}
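
The search and insertion algorithms can also be tried out without either ADT. The following is a minimal sketch under simplifying assumptions: keys are ints, nodes are linked directly, and null plays the role of a phantom position. The class IntBst and its methods are hypothetical. The method example() rebuilds the BST pictured earlier by inserting its keys level by level, and searchPath reports the keys visited during a search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical BST of int keys (not part of the notes); null stands for
// a phantom position.
class IntBst {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Returns the keys of the nodes visited while searching for key.
    // The search succeeded exactly when the last key visited equals key.
    List<Integer> searchPath(int key) {
        List<Integer> path = new ArrayList<>();
        Node cur = root;
        while (cur != null) {
            path.add(cur.key);
            if (key < cur.key)      { cur = cur.left; }
            else if (key > cur.key) { cur = cur.right; }
            else                    { break; }
        }
        return path;
    }

    boolean contains(int key) {
        List<Integer> path = searchPath(key);
        return !path.isEmpty() && path.get(path.size() - 1) == key;
    }

    // Inserts key where an unsuccessful search for it ends.  Returns
    // false (leaving the tree unchanged) if key is already present.
    boolean insert(int key) {
        if (root == null) { root = new Node(key); return true; }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return true; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return true; }
                cur = cur.right;
            } else {
                return false;
            }
        }
    }

    // Rebuilds the example BST by inserting its keys level by level.
    static IntBst example() {
        int[] keys = { 47, 18, 77, 5, 24, 57, 85, -1, 21, 33, 80, 90,
                       2, 19, 30, 79, 82, 95, 20, 81 };
        IntBst t = new IntBst();
        for (int k : keys) { t.insert(k); }
        return t;
    }

    public static void main(String[] args) {
        IntBst t = example();
        System.out.println(t.searchPath(82));
        System.out.println(t.searchPath(60));
    }
}
```

Searching for 82 visits 47, 77, 85, 80, and 82, while searching for 60 visits 47, 77, and 57 before reaching a phantom, in agreement with the two walkthroughs given earlier.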

                  47
                 /  \
                /    \
               /      \
              /        \
             /          \
            /            \
           /              \
          /                \
        18                 77
       /  \               /  \
      /    \             /    \
     /      \           /      \
    /        \         /        \
   5         24      57          85
  /         /  \                /  \
-1        21   33             80    90
  \      /    /              /  \     \
   2   19   30              79   82    95
        \                      /
         20                   81 

Deleting a specified item from a BST is somewhat more complicated than inserting one. Consider the BST pictured above (which is repeated from earlier). Suppose that we wish to delete 30 from the collection of keys found in the tree. Simply pruning the (leaf) node containing that key does the job! Indeed, this approach works for any key occupying a leaf node.

On the other hand, such a simple solution does not work in the case that the key to be deleted occupies an interior node. Going back to the picture, suppose that 21 is to be deleted. If we simply prune the subtree rooted at the node containing that key, not only will 21 be deleted from the tree, but also the keys 19 and 20 that happen to be in that subtree.

A few minutes' thought should convince you, however, that to delete a key found in a node v having only one child w, it suffices to connect w directly to v's parent u, thereby making u the parent of w, rather than the grandparent. (If v had been the root of the tree, then w would become the root.) Another way to describe it is this: the subtree rooted at w replaces the one rooted at v, as illustrated here:

Removing a key in a node having one child
          |                                      |
          |                                      |
          u                                      |
         / \                                     u
        /   \                                   / \
       /     \                                 /   \
      v       .     is transformed into       /     \ 
     /       / \                             w       .
    /       /   \                           / \     / \
   w       +-----+                         /   \   /   \
  / \                                     +-----+ +-----+
 /   \  
+-----+ 

Although the picture shows v as being the left child of u and w as being the left child of v, either or both of them could have been right children. That replacing v's subtree by w's subtree results in a tree that retains the properties of a BST follows from the fact that, except for the key in node v (which no longer appears in the tree), all descendants of u remain in the same subtree (left vs. right) of u (and all of u's ancestors) as they had been.

Having figured out how to delete a key occupying a node having at most one child, it remains to tackle the somewhat more difficult case of deleting a key that occupies a node having two children. Going back to our example BST, suppose that we wished to delete the key 18. The solution that works for keys found in nodes having only one child won't work here, for obvious reasons. Hence, rather than thinking in terms of removing the node v in which 18 occurs, let's think about replacing 18 in v with a key k found deeper in the tree (i.e., in one of v's subtrees), which leaves us with the problem of deleting key k from whichever subtree we found it in. Suppose, arbitrarily, that we look in v's right subtree for such a k. Then our tentative solution looks like this:

Removing a key in a node having two children
          |                                      |
          |                                      |
        v (18)                                 v (k)
         / \                                    / \
        /   \                                  /   \ 
       /     \                                /     \ 
      /       \      BST T on the left       /       \ 
     .         .    is transformed into     .         .
    / \       / \   BST T' on the right    / \       / \
   /   \     /   \                        /   \     /   \
  /     \   /     \                      /     \   /     \
 +-------+ +-------+                    +-------+ +-------+
                                           ^          ^
                                        same as     with k
                                         before     deleted 

After a few minutes' thought, it should be clear that, in order for T' (on the right) to be a BST, our only choice for k is the smallest key occurring in v's right subtree (in T). (If we were to choose k to be some key greater than the smallest key in v's right subtree, T' would have that smaller key in the right subtree of the node containing k, violating the conditions defining a BST.)

But how do we find the smallest key in a BST? That's easy! Starting at the root, just keep going to the left child until there is none. Conveniently, such a node has at most one child (namely, on the right), and so deleting the key in it is easy.

In our example (of deleting 18), the subtree rooted at the node containing 18 is modified as shown below. Specifically, 19 replaces 18 and the node that had contained 19 was deleted by connecting 20 directly to 21.

original:

        18
       /  \
      /    \
     /      \
    /        \
   5         24
  /         /  \
-1        21   33
  \      /    /
   2   19   30
        \
         20

updated:

        19
       /  \
      /    \
     /      \
    /        \
   5         24
  /         /  \
-1        21   33
  \      /    /
   2   20   30

Using the recursive paradigm, we could write the following methods to support the delete operation:

/* pre: t is a binary search tree with respect to the ordering
**      defined by comp (of type Comparator<T>)
** post: an object in t equal to obj has been deleted from t, assuming
**       that there is one.  Otherwise, t is unchanged.
*/
void deleteBST(RecBinTree<T> t, T obj) {
   RecBinTree<T> s = searchBST(t, obj);
   if (s.isEmpty())
      { }
   else 
      { deleteRootBST(s); }
}

/* pre: !t.isEmpty()
*  post: the object in t's root node has been deleted from t
*/
void deleteRootBST(RecBinTree<T> t) {

   RecBinTree<T> left = t.leftSubtree();
   RecBinTree<T> right = t.rightSubtree();

   if (left.isEmpty()  &&  right.isEmpty())  // t is a leaf node
      { t.prune(); }
   else if (left.isEmpty()) {  // left is empty, right is not
      t.prune();      // make t empty
      t.graft(right); // graft right in place of t
   }
   else if (right.isEmpty()) {  // right is empty, left is not
      t.prune();      // make t empty
      t.graft(left);  // graft left in place of t
   } 
   else { // t has two nonempty subtrees
      RecBinTree<T> s = leftMostSubtree(right);
      t.setRoot(s.rootOf());
      deleteRootBST(s);
   }
}

/* pre: !t.isEmpty()
*  post: tree returned is the "leftmost" subtree within t 
*/
RecBinTree<T> leftMostSubtree(RecBinTree<T> t) {
   RecBinTree<T> left = t.leftSubtree();
   if (left.isEmpty())
      { return t; }
   else
      { return leftMostSubtree(left); }
}
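
The three cases of deletion can be exercised with a self-contained sketch that, for simplicity, assumes int keys and directly linked nodes rather than the RecBinTree ADT. The class BstDelete and its methods are hypothetical. Unlike the methods above, delete returns the root of the modified (sub)tree, which is a common alternative when null represents an empty subtree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class (not part of the notes) illustrating BST deletion
// on directly linked nodes with int keys; null is an empty subtree.
class BstDelete {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node t, int key) {
        if (t == null) { return new Node(key); }
        if (key < t.key)      { t.left = insert(t.left, key); }
        else if (key > t.key) { t.right = insert(t.right, key); }
        return t;
    }

    // Deletes key from the tree rooted at t (if present) and returns
    // the root of the resulting tree.
    static Node delete(Node t, int key) {
        if (t == null) { return null; }              // key is not present
        if (key < t.key)      { t.left = delete(t.left, key); }
        else if (key > t.key) { t.right = delete(t.right, key); }
        else if (t.left == null)  { return t.right; } // leaf, or only a right child
        else if (t.right == null) { return t.left; }  // only a left child
        else {                                        // two children
            Node s = t.right;
            while (s.left != null) { s = s.left; }    // smallest key in right subtree
            t.key = s.key;                            // it replaces the deleted key
            t.right = delete(t.right, s.key);         // and is deleted from below
        }
        return t;
    }

    // Appends the keys of t to out in increasing (inorder) order.
    static void inorder(Node t, List<Integer> out) {
        if (t == null) { return; }
        inorder(t.left, out);
        out.add(t.key);
        inorder(t.right, out);
    }

    // Builds the subtree rooted at 18 in the example BST.
    static Node sample() {
        Node t = null;
        for (int k : new int[] { 18, 5, 24, -1, 21, 33, 2, 19, 30, 20 }) {
            t = insert(t, k);
        }
        return t;
    }

    public static void main(String[] args) {
        Node t = delete(sample(), 18);
        List<Integer> keys = new ArrayList<>();
        inorder(t, keys);
        System.out.println("root " + t.key + ", keys " + keys);
    }
}
```

Deleting 18 from the sample tree leaves 19 at the root and makes 20 the left child of 21, matching the "updated" picture above.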

Run-time Analysis of Search/Insert in BST's

What is the running time of searching or inserting in a BST, as a function of the number of nodes in the tree? That is not an easy question to answer, at first. An easier question is "What is its running time, as a function of the height of the tree?". It is clear that, at worst, the number of iterations (or recursive calls, in the case of the recursive algorithm) is equal to the number of "levels" in the tree, which corresponds to one more than the tree's height. If we assume that each iteration takes constant time (which, with a good choice of data structure to represent a tree, is achievable), we conclude that the algorithm's running time is proportional to the tree's height. To answer the original question, we might attempt to determine the relationship between a binary tree's height and its number of nodes. To get a handle on this, we consider the question "Among all binary trees of height h, where h is some natural number, what is the smallest number of nodes and what is the largest number?"

A tree of a given height with the smallest possible number of nodes is constructed by making each node have only a single child (except for the lone leaf, which has none). Such a tree has only h+1 nodes! Since our tree search algorithm takes time proportional to h, we conclude that, in the worst case (when n is smallest in relation to h), it takes time proportional to n, too. Thus, we have a linear time search algorithm! This is no better than searching in a list!
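
A tree of this degenerate shape arises, for instance, when keys are inserted into a BST in increasing order. The sketch below illustrates this under the assumptions that keys are ints and that the height of a tree is the number of edges on a longest path from the root to a leaf (so that a one-node tree has height 0 and, by convention here, the empty tree has height -1). The class ChainDemo is hypothetical.

```java
// Hypothetical class (not part of the notes) comparing the heights that
// result from two different orders of insertion into a BST.
class ChainDemo {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node t, int key) {
        if (t == null) { return new Node(key); }
        if (key < t.key)      { t.left = insert(t.left, key); }
        else if (key > t.key) { t.right = insert(t.right, key); }
        return t;
    }

    // Height in edges; the empty tree is given height -1 by convention.
    static int height(Node t) {
        return (t == null) ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Height of the BST obtained by inserting the keys in the order given.
    static int heightAfterInserting(int... keys) {
        Node t = null;
        for (int k : keys) { t = insert(t, k); }
        return height(t);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7));
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7));
    }
}
```

Inserting 1 through 7 in increasing order produces a chain of height 6, whereas the order 4, 2, 6, 1, 3, 5, 7 produces a tree of height 2 containing the same keys.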

Luckily, it turns out that "most" binary trees are such that their heights are much smaller than their numbers of nodes. The most extreme case arises by packing as many nodes as possible into each of the h+1 levels of the tree.

Numbering levels starting at 0, we assert that, for each i, the maximum number of nodes that can occur on level i is 2^i. But the sum

2^0 + 2^1 + 2^2 + ... + 2^h

is equal to 2^(h+1) − 1. That is, letting h be the height of a binary tree and n the number of nodes in it, we get

n ≤ 2^(h+1) − 1
n+1 ≤ 2^(h+1)          (add 1 to both sides)
lg(n+1) ≤ h+1          (take lg of both sides)
lg(n+1) − 1 ≤ h        (subtract 1 from both sides)

What we have established is that, for a binary tree of n nodes having height h,

lg (n+1) − 1 ≤ h < n
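
To get a feel for how far apart these two bounds are, the following sketch computes them with integer arithmetic. The class HeightBounds and its method names are hypothetical. The method minHeight finds the smallest h for which a tree of height h has room for n nodes, which is the smallest integer satisfying the lower bound above.

```java
// Hypothetical class (not part of the notes) computing the bounds that
// relate the height h of a binary tree to its number of nodes n.
class HeightBounds {

    // Largest number of nodes in a binary tree of height h,
    // namely 2^(h+1) - 1.  Assumes 0 <= h <= 61.
    static long maxNodes(int h) {
        return (1L << (h + 1)) - 1;
    }

    // Smallest possible height of a binary tree having n >= 1 nodes.
    static int minHeight(long n) {
        int h = 0;
        while (maxNodes(h) < n) { h++; }
        return h;
    }

    // Largest possible height of a binary tree having n >= 1 nodes
    // (every node except the lone leaf has exactly one child).
    static long maxHeight(long n) {
        return n - 1;
    }

    public static void main(String[] args) {
        for (long n : new long[] { 1, 15, 16, 1000, 1000000 }) {
            System.out.println(n + " nodes: height between "
                + minHeight(n) + " and " + maxHeight(n));
        }
    }
}
```

For n = 1000, for example, the height must lie between 9 and 999.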

The closer that h falls to the lower bound, the better the performance of the methods above. Luckily, it turns out that, on average, BSTs built by inserting keys in random order tend to have heights close to the lower bound. (To state it more precisely, there exist constants c and n0 such that, for every n ≥ n0, the average height of a BST built by inserting n distinct keys, taken over all possible orders of insertion, is less than or equal to c·lg n. In other words, the function

f(n) = "average height of a BST built by inserting n distinct keys in random order"

is in the class O(lg n).)
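
One way to get an empirical feel for average heights is to build BSTs by inserting keys in random order and to measure the results. The sketch below does so under several assumptions: keys are the ints 1..n, insertion orders are produced by java.util.Collections.shuffle with a seeded java.util.Random (so that runs are repeatable), and height is measured in edges. The class RandomBstHeight is hypothetical, and an experiment of this kind only suggests, and does not prove, anything about asymptotic behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical class (not part of the notes): measures the average height
// of BSTs built by inserting the keys 1..n in random order.
class RandomBstHeight {
    static class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative insertion; returns the root of the resulting tree.
    static Node insert(Node root, int key) {
        if (root == null) { return new Node(key); }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return root; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return root; }
                cur = cur.right;
            } else {
                return root;   // duplicate keys are ignored
            }
        }
    }

    // Height in edges; the empty tree is given height -1 by convention.
    static int height(Node t) {
        return (t == null) ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Average height over the given number of random insertion orders.
    static double averageHeight(int n, int trials, long seed) {
        Random rng = new Random(seed);
        List<Integer> keys = new ArrayList<>();
        for (int k = 1; k <= n; k++) { keys.add(k); }
        long total = 0;
        for (int trial = 0; trial < trials; trial++) {
            Collections.shuffle(keys, rng);
            Node root = null;
            for (int k : keys) { root = insert(root, k); }
            total += height(root);
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 100, 1000, 10000 }) {
            System.out.println(n + " keys: average height "
                + averageHeight(n, 20, 42L));
        }
    }
}
```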

What is even better is that there are techniques, one (called AVL Trees) having been discovered by two Russian mathematicians (Adelson-Velski and Landis) in about 1960, to guarantee that insertions and deletions never allow the height of a BST to grow beyond (approximately) 1.44 · lg n. (The algorithms for insertion and deletion in AVL trees are somewhat more complicated, but their running times are still proportional to the height of the tree.) These techniques are beyond the scope of the course, but we shall, in discussing BST's, assume that their heights are in O(lg n).

It is interesting to compare searching in a BST to searching in an ordered array. A balanced BST is a nice compromise between ordered arrays and (ordered) lists, in that its running times for search, insert, and delete are all good (O(lg n)), while each of the other structures suffers in at least one operation.

Data Structure      Search      Insert      Delete
ordered array       O(lg n)     O(n)        O(n)
unordered array     O(n)        O(1)        O(1)
list                O(n)        O(1)        O(1)
(balanced) BST      O(lg n)     O(lg n)     O(lg n)

Note that the asymptotic running times given above for Insert and Delete do not include the time required in searching to find the proper location at which to perform the insertion or deletion.
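
The first row of the table can be illustrated with a short sketch. It assumes an ordered array of ints and uses the standard library method java.util.Arrays.binarySearch, which runs in O(lg n) time and, when the key is absent, returns -(insertion point) - 1. The class OrderedArrayDemo is hypothetical. Insertion copies every element, which reflects the O(n) entry in the table.

```java
import java.util.Arrays;

// Hypothetical class (not part of the notes): searching and inserting
// in an ordered array of ints.
class OrderedArrayDemo {

    // O(lg n): binary search in an array sorted in increasing order.
    static boolean contains(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    // O(n): returns a new ordered array that also contains key.
    // If key is already present, the original array is returned.
    static int[] insert(int[] sorted, int key) {
        int pos = Arrays.binarySearch(sorted, key);
        if (pos >= 0) { return sorted; }
        int at = -(pos + 1);   // index at which key belongs
        int[] result = new int[sorted.length + 1];
        System.arraycopy(sorted, 0, result, 0, at);
        result[at] = key;
        // elements from index 'at' onward shift one place to the right
        System.arraycopy(sorted, at, result, at + 1, sorted.length - at);
        return result;
    }

    public static void main(String[] args) {
        int[] a = { 5, 18, 47, 77 };
        int[] b = insert(a, 24);
        System.out.println(Arrays.toString(b) + " " + contains(b, 24));
    }
}
```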