COS 126 Lecture 10: Trees

slides 1 - 4 : Motivation for trees

There are many applications in which we need to store and retrieve information, and keeping that information in sorted order is often essential. So far, we've seen arrays and linked lists as structures for storing data. With a sorted array, we can use binary search to find an item quickly. We can't do a binary search on a linked list (there is no way to jump to its middle element), so searching is slow: we have to walk the list node by node. On the other hand, once we find an insertion point in a linked list, the actual insertion is easy - we just change a couple of pointer values. With an array implementation, we need to 'shift' values over to make room for a new item, which can be a very expensive operation. A binary search tree (BST) is a data structure that provides both quick searches and quick insertions/deletions.

(See section 5.4 of Sedgewick for an overview of trees.)


slide 5 - Binary Search Tree

The definition provided is:

typedef struct STnode *link;                          /* a link is a pointer to a node             */
struct STnode { Item_Type item; link l; link r; };    /* an item plus left and right subtree links */

Notice that this definition is similar to that of a basic linked list; instead of a 'link next,' there is a 'link l' and a 'link r.' That difference is just variable naming, though - a doubly-linked list has the same structure (with 'link prev' and 'link next'). What makes it a tree is how the functions associated with the data structure link the nodes together. The link structure is not linear: instead of a pointer to the 'head' of the list, you will usually have a pointer to the 'root' of the tree. This pointer holds the address of only one node - the root node - but you can think of it as pointing to the entire tree, just as you can think of a 'head' pointer as pointing to the entire linked list. The left (l) and right (r) pointers of each node point to just one other node each, but you can similarly think of them as pointing to entire 'subtrees.'
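
As a concrete sketch - not from the slides; the NEW helper, the use of int items, and the example keys are assumptions for illustration - here is how an empty tree and a tiny two-level tree might be set up by hand:

#include <stdlib.h>

typedef int Item_Type;                      /* assumed: integer items, for illustration */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

/* assumed helper: allocate a node with the given item and subtrees */
link NEW(Item_Type item, link l, link r) {
    link x = malloc(sizeof *x);
    x->item = item; x->l = l; x->r = r;
    return x;
}

int main(void) {
    link root = NULL;                       /* NULL represents the empty tree */
    root = NEW(33, NEW(25, NULL, NULL),     /* 25 is the root of the left subtree  */
                   NEW(72, NULL, NULL));    /* 72 is the root of the right subtree */
    return 0;
}

Here root holds the address of the node containing 33, root->l points to the subtree rooted at 25, and root->r points to the subtree rooted at 72.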


slide 6 - Search in Binary Search Trees

The fundamental property of a binary search tree is that all the nodes 'to the left' of any node have keys less than that node's key, while all the nodes 'to the right' have keys greater than (or equal to) it. (Duplicate keys can be handled in multiple ways - they can be put to the left or to the right of a node with an equal key, or they can be disregarded: a no-duplicates policy.) Ideally, we would like a roughly equal number of nodes in each half of the tree, in which case we call the tree balanced.

A recursive search function is given. It works only because we can assume that the binary search tree property holds for the tree - at each node, the comparison tells us which single subtree could contain the key. You should try tracing through the code for the tree shown on slide #8. Try searching for 43, for example. What is the search cost? (You should relate the number of recursive calls to the total number of keys in the tree.) What would the search cost be if the keys were stored in an array? In a linked list?
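
The recursive search code itself is on the slide rather than in these notes; a minimal sketch of what such a function typically looks like (assuming plain int items and the node definition from slide 5, repeated so the sketch stands alone) is:

#include <stddef.h>   /* for NULL */

typedef int Item_Type;                      /* assumed: integer keys */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

/* Return the node whose item equals v, or NULL if v is not in the tree.
   This is correct only because the BST property holds: keys less than
   h->item are in the left subtree, keys greater than it are in the right. */
link search(link h, Item_Type v) {
    if (h == NULL)   return NULL;            /* ran off the bottom: not found */
    if (v < h->item) return search(h->l, v); /* v can only be in the left subtree  */
    if (v > h->item) return search(h->r, v); /* v can only be in the right subtree */
    return h;                                /* v == h->item: found it */
}

Each call either stops or descends into exactly one subtree, so the number of recursive calls is proportional to the height of the tree - roughly log2 of the number of keys when the tree is balanced - rather than to the total number of keys, as a linked-list scan would be.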


slide 7 - Insertion in binary search trees

Trees must be built according to the binary search tree property (less than to the left, greater than to the right) so that the search function will work properly. The algorithm for the insert function is similar to that of the search function, because you want to put the new node in the location where you would find it if you searched for it.

Note that the first key you insert will always be the root of your tree.

You should trace through the given insertion code, too. What is the sequence of function calls that results if 60 is inserted into the tree in slide #8? Again, look at the number of function calls to determine the search cost.
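
The slide's insertion code isn't reproduced here either; a sketch in the same spirit (again with int keys and an assumed NEW helper that allocates a node) is:

#include <stdlib.h>

typedef int Item_Type;                      /* assumed: integer keys */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

/* assumed helper: allocate a node with the given item and subtrees */
link NEW(Item_Type item, link l, link r) {
    link x = malloc(sizeof *x);
    x->item = item; x->l = l; x->r = r;
    return x;
}

/* Insert v into the subtree rooted at h and return the root of the
   updated subtree.  The recursion follows exactly the path a search
   for v would take, so a later search will find the new node. */
link insert(link h, Item_Type v) {
    if (h == NULL)   return NEW(v, NULL, NULL); /* fell off the tree: v goes here */
    if (v < h->item) h->l = insert(h->l, v);
    else             h->r = insert(h->r, v);    /* equal keys go right in this sketch */
    return h;
}

The caller updates the tree with root = insert(root, v); the very first insertion returns a single leaf, which is why the first key inserted always becomes the root.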

Try tracing through the code for other insertion orders, too: 1,2,3,4,5,6,7 or 7,6,5,4,3,2,1 vs 4,6,3,2,7,5,1. The first two orders produce completely linear trees (every node has just one child, so the 'tree' is really a list of height 7), while the last produces a shorter, bushier tree - the shape of a BST, and therefore its search cost, depends on the order in which the keys are inserted.


slide 8 - BST construction example

As long as you keep the binary search tree property in mind, you don't need to trace through the code to determine what the tree will look like for a given insertion order: each new key simply goes where a search for it would fall off the tree.

For this example, the keys are inserted in the following order: 33, 25, 72, 51, 06, 13, 43, 99
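
Applying that rule by hand (each new key goes left of larger keys and right of smaller ones, starting at the root) gives a tree shaped like the one below; compare it with the picture on the slide:

                33
               /  \
             25    72
            /     /  \
          06    51    99
            \  /
            13 43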


slide 9 - Other types of trees

The family tree example provides a basis for the 'family' terminology - parent, child, grandparent, sibling, ancestor, descendant, etc. You can read more about this on page 119 of Sedgewick.

An arithmetic parse tree naturally groups the terms of an expression. Each subtree corresponds to a pair of parentheses in the expression (a * (b + c)) - (d + e). Operators are always the roots of subtrees, and the operands (the constants or variables) are the leaves of the tree. You can interpret (or evaluate) the tree by recursively evaluating subtrees: the value of each subtree (including the whole tree) is the value of its operator applied to the values of its left and right subtrees.

Here's another example, with numbers:

               +
            /     \
           /       \
          -         *
        /   \     /   \
       5     2   +     2
                / \
               2   1
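
Evaluating this example bottom-up: the left subtree is 5 - 2 = 3, the right subtree is (2 + 1) * 2 = 6, and the root applies + to those two values, so the whole tree represents (5 - 2) + ((2 + 1) * 2) = 9.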

The last example relates to the Unix file system. It demonstrates yet another application of trees; in particular, it is a tree that is not binary, since a directory can have any number of children.


slide 10 - Traversing Binary Trees

The sample code listed is an implementation of inorder traversal. If it is executed on a binary search tree, it will print out all the keys stored in the tree in sorted order.

Inorder traversal prints out the nodes "in order."

There are two other major types of traversal - preorder and postorder. The difference between the three is when the root of a subtree is visited relative to its left and right subtrees. (A 'visit' is whatever work the traversal does at each node - typically printing it, reading a value stored at the node, or updating a value stored there.)

Preorder: Visit the root of the tree (or subtree), then (recursively) traverse the left subtree, then traverse the right subtree

Inorder: Traverse the left subtree, then visit the root, then traverse the right subtree

Postorder: Traverse the left subtree, then traverse the right subtree, then visit the root
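
The traversal code on the slide isn't copied here, but a sketch of the three recursive versions (assuming int items, the node definition from slide 5, and printing as the 'visit') shows that the only thing that changes is where the printf goes:

#include <stdio.h>

typedef int Item_Type;                      /* assumed: integer keys */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

/* Inorder: left subtree, then this node, then right subtree.
   On a binary search tree this prints the keys in sorted order. */
void inorder(link h) {
    if (h == NULL) return;
    inorder(h->l);
    printf("%d ", h->item);    /* the 'visit' */
    inorder(h->r);
}

/* Preorder visits the node before either subtree... */
void preorder(link h) {
    if (h == NULL) return;
    printf("%d ", h->item);
    preorder(h->l);
    preorder(h->r);
}

/* ...and postorder visits it after both subtrees. */
void postorder(link h) {
    if (h == NULL) return;
    postorder(h->l);
    postorder(h->r);
    printf("%d ", h->item);
}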

For practice, try writing out the order in which the keys are visited by each of these three traversals for the tree on slide #8.


slide 11 - Preorder traversal with a stack

A stack can be used to implement preorder traversal without recursion. The other types of traversal are more difficult to implement (without recursion.)

Again, the tree on slide #8 is a good practice example. Trace through the code to see that the preorder traversal is correctly implemented.
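
Here is a sketch of the idea (the slide's code may differ in detail; the fixed-size array stack and int items are assumptions made for this sketch): push the root, then repeatedly pop a node, visit it, and push its right child before its left child, so that the left subtree comes off the stack - and is traversed - first.

#include <stdio.h>

typedef int Item_Type;                      /* assumed: integer keys */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

#define MAXSTACK 100                        /* assumed: enough for these small examples */

void preorder_stack(link root) {
    link stack[MAXSTACK];                   /* simple array-based stack of links */
    int top = 0;                            /* number of links currently stacked */

    if (root == NULL) return;
    stack[top++] = root;                    /* push the root */
    while (top > 0) {
        link h = stack[--top];              /* pop the next node */
        printf("%d ", h->item);             /* visit it before its subtrees */
        if (h->r != NULL) stack[top++] = h->r;  /* push right first...        */
        if (h->l != NULL) stack[top++] = h->l;  /* ...so left is popped first */
    }
}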


slide 12 - Level order traversal

Level order traversal visits the root first, then the nodes one level down (left to right), then the next level, and so on. This type of traversal is actually simpler to implement without recursion; a queue is the natural data structure to use.

Try this code for the tree on slide #8, too. (You should be able to do all these traversals without the code, too.)
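
For reference, a sketch of the queue-based version (again with int items and a simple array standing in for the queue - assumptions for this sketch, not the slide's code): put the root on the queue, then repeatedly take a node off the front, visit it, and put its children on the back.

#include <stdio.h>

typedef int Item_Type;                      /* assumed: integer keys */
typedef struct STnode *link;
struct STnode { Item_Type item; link l; link r; };

#define MAXQUEUE 100                        /* assumed: enough for these small examples */

void levelorder(link root) {
    link queue[MAXQUEUE];                   /* simple array-based queue of links */
    int head = 0, tail = 0;                 /* remove at head, add at tail       */

    if (root == NULL) return;
    queue[tail++] = root;                   /* the root goes on first */
    while (head < tail) {
        link h = queue[head++];             /* take the next node off the front */
        printf("%d ", h->item);             /* visit it */
        if (h->l != NULL) queue[tail++] = h->l;   /* its children join the back, */
        if (h->r != NULL) queue[tail++] = h->r;   /* left before right           */
    }
}

Because children always join the queue after everything already on it, the nodes come off level by level, left to right within each level.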