Comp 210 Lab 5: Trees

Index: Binary Search Trees, Arithmetic Expressions

In class, we've been using various forms of family trees as examples. This week in lab, we'll use more example forms of trees. Trees are a very common form of data. In fact, lists are simply trees with at most one branch at each node.

Binary Search Trees

We often want to be able to build a data structure that allows us to look up information based on some key value. One simple idea would be to just create a list of all the data items. The disadvantage is that it can then take a long time to find the items at the end of the list, since we have to look at all the prior items first. Many data structures can be used for this problem, one of which is a binary search tree.

For simplicity, let's assume we just want to keep track of a bunch of numbers, remembering which ones we've seen. From this, we could easily generalize our results to, for example, keeping track of a sports team's player information indexed by player number, or to indexing on names rather than numbers.

The following are some binary search trees for the numbers 1,3,4,6,9:

Each node holds an index (here, a number), data for that index (here, nothing), and two possibly-empty subtrees. For each node, all indices in the left branch are less than this node's index, while all indices in the right branch are greater than this node's index. Thus, each of the above three trees is a correct binary search tree. Which is "best" is a question we'll delay until Comp 212.

To do:

Make a data definition for these binary search trees.
Make example data representing the above three trees.
Develop a function search which consumes a binary search tree and a number and returns the boolean indicating whether the number is in the tree. (Follow the design recipe, of course.)
Develop a function insert which consumes a binary search tree and a number and returns a new binary search tree. The resulting tree should contain all the numbers from the first tree, plus the new number. However, if the number was already in the given tree, the resulting tree shouldn't have the number duplicated.

This function gives you the freedom to make the input and output trees structured very differently. Hint: The easiest option is to do no rearrangements, and add the new node at the bottom of the appropriate branch. For the curious... In Comp 212, you'll see why rearranging the tree in certain ways can make a "better" tree.

For examples of this function, use insert to generate the above three trees.
Develop a function in-order which consumes a binary search tree and returns a list of the numbers in the tree in sorted order. You'll want to use the built-in function append to append lists together. (Later in the course, we'll see how to implement append.) Hint: Note that the leaves of the tree are sorted left-to-right.
For the curious... Describe how long search and insert take in terms of the shape of the input tree.

Many operations on binary search trees would be faster if the trees were also balanced, i.e., the branches were all more-or-less equally deep. Implementing balanced binary trees is much more difficult, as you'll see in Comp 212.
For the curious... Change the definition of a node to also store more information than just the index. E.g., consider each number to be a team player's number, and add the player's stats. Adapt search and insert to look up and add this information to the tree.

Arithmetic Expressions

Arithmetic expressions can also be modeled as trees. For example, the Scheme expression

     (+ 5 (* (- 1 8) (+ 7 1)))

is pictorially

        +
      /   \
     5     *
         /   \
        -     +
       / \   / \
      1   8 7   1

We would like to treat this expression as a piece of data. This is just like how DrScheme works -- one of your programs is really just a big piece of data for DrScheme.

An A-Expression is one of the following:

n, where n is a number,
(make-add m n), where m, n are A-Expressions,
(make-sub m n), where m, n are A-Expressions,
(make-mul m n), where m, n are A-Expressions, or
(make-div m n), where m, n are A-Expressions.

I.e., the A-Expression (make-add ...) will represent the appropriate arithmetic expression involving addition. We use the name "A-Expression" to distinguish it from its meaning of an arithmetic expression you'd write on a piece of paper or type into DrScheme.

Thus the above Scheme expression is represented by

     (make-add 5
               (make-mul (make-sub 1 8)
                         (make-add 7 1)))

As always, we distinguish between the information (the Scheme expression (+ 5 (* (- 1 8) (+ 7 1)))) and its data representation (this A-Expression).

To do:

Make more example data. Make a template for functions using A-Expressions.
Develop the function evaluate which consumes an A-Expression and returns a number. Your function should be like a mini-DrScheme, computing the number that the A-Expression represents.

DrScheme, like any language interpreter, is just a generalization of evaluate. A Scheme expression is like an arithmetic expression, but with more cases.
For the curious... Let's say we had many basic operations, not just these four. We would want to have one structure for any binary operation, using a separate data definition enumerating all of our operations. Rewrite the data definitions, examples, and evaluate for this.
For the very curious... Adapt the previous idea to also allow unary operations.