Comp 210 Lab 5: local, big-O notation

Index: local, big-O notation

`local`

Class introduced the syntax

     (local definitions
        body-expression)

Today we will get more experience on what this is good for. We will write several versions of a program to find the maximum number in a non-empty list of numbers.

We use a non-empty list of numbers to avoid defining the maximum for an empty list, which is a completely separate issue from today's use of local. Thus our template is

     (define (f a-nelon)
        (cond
           [(empty? (rest a-nelon))
            ...]
           [else
            ...(first a-nelon)...(f (rest a-nelon))...]))

Version 1

First, let's look at the version that follows straight from the template for a non-empty list of numbers:

     (define (max-list1 a-nelon)
        (cond
           [(empty? (rest a-nelon))
            (first a-nelon)]
           [else
            (cond
                [(> (first a-nelon) (max-list1 (rest a-nelon)))
                 (first a-nelon)]
                [else
                 (max-list1 (rest a-nelon))]))

To do: Use this code on some examples, including some longer lists.

Q: Observe that each call to max-list1 makes at most two natural recursive calls. Can you describe at most how many times max-list1 is called in total in relation to the length of a-nelon?

Version 2

You should have noticed that the first version can be rather slow. The basic problem is that each call to the function can make at most two natural recursive calls. Using local, we can reduce this to one natural recursive call.

Stylistically, the first version violates our rule against duplicating code. Using local will give us a nice way to avoid this, improving code legibility.

     (define (max-list2 a-nelon)
        (cond
           [(empty? (rest a-nelon))
            (first a-nelon)]
           [else
            (local [(define max-rest-list (max-list2 (rest a-nelon)))]
               (cond
                   [(> (first a-nelon) max-rest-list)
                    (first a-nelon)]
                   [else
                    max-rest-list]))

Here, we compute the maximum of the rest of the list once before the cond and remember the result by giving it a name. In the cond, we just use this value instead of recomputing it. This is an improvement in both style and efficiency.

To do: Use this code on the same examples as before. Compare your results.

Q: Observe that each call to max-list2 makes at most one natural recursive calls. Can you describe at most how many times max-list2 is called in total in relation to the length of a-nelon? Why is it more than twice as fast?

Versions 3 and 4

The outer cond's whole else branch is just computing the maximum of two values. That's a common operation that may occur elsewhere in other functions.

To do: Write a function max : number number -> boolean that returns the maximum of the two given numbers. Rewrite both max-list1 and max-list2 to use max, calling the new functions max-list3 and max-list4. (max is predefined in DrScheme, but we'll ignore that.)

To do: Compare the efficiency and style of these new versions. What is the advantage, if any, of using local in this context?

Version 5

Maybe we don't want anyone else to use max after all. For example, what if several people are writing pieces of a large program. While you have written your function max to compute the maximum of two numbers, a coworker has written a function max to find the employee record with the largest salary.

We would like to avoid such confusion by ensuring that only you can use your max. We can do this using local -- making this helper function local to the main function. For example, modifying version 3, we can obtain the following:

     (define (max-list5 a-nelon)
        (local [(define (max m n)
                   (cond [(> m n) m]
                         [else       n]))]
           (cond
              [(empty? (rest a-nelon))
               (first a-nelon)]
              [else
               (max (first a-nelon) (max-list5 (rest a-nelon)))])))

To do: Compare the style and efficiency to previous versions.

We could, of course, modify the fourth version in the same way. But we don't bother, because as you should have discovered, there's no point in using a local in the cond branch when using a helper function to do all the work.

Version 6

Let's put local aside for a minute. In class, you saw the following function (here using its usual name):

     (define (foldr combinef base l)
        (cond
           [(empty? l) base]
           [else       (combinef (first l)
                                 (foldr combinef base (rest l)))]))

To do: Write a version of max-list using this. This should be very straightforward after the class examples.

In summary

We've seen many ways of writing this one function. The most straightforward way, strictly following the template, is inefficient and has a stylistic problem. Both can be improved with the use of either local or a helper function in several different ways. Which solution is best depends on the context, and deciding which to use is a skill to be developed by a computer scientist.

Big-O notation

When describing the efficiency of the above functions, you might have said something like

"max-list1 can call itself recursively at most 2ⁿ times if n is the length of a-nelon."
"max-list2 can call itself recursively at most n times if n is the length of a-nelon."

(At least, these are the correct answers.)

This kind of statement is very common, and we have a shorter and more precise notation for it:

max-list1 is in O(2ⁿ)
max-list1 is in O(n)

There are some advantages of this notation, aside from its brevity.

The big-O stands for "order of magnitude", and lets us ignore unimportant details like constants:
O(n) = O(n + 10) = O(3n - 9)

or, more generally, smaller factors:

O(n²) = O(n² + 3n + 4)
It can describe not just how many recursive function calls there are, but how much time is used by the whole computation. Here, this would also count how many times conditions are tested, selectors are used, etc. However, in this example, all of those are already counted, since we ignore constant factors.

(Note: big-O notation is useful for more than just describing time of execution.)