Comp 210 Lab 6: Mutual Recursion, Accumulators and Locals

Index: list, Directory Example, Accumulators, Local & Scope (reading only)

`list`

First, we have a brief aside from our main topic.

The following is a convenient abbreviation shown in class, and already introduced in some lab sections:

     (list 1 3 5 7)
     =
     (cons 1 (cons 3 (cons 5 (cons 7 empty))))

Using list introduces the exact same cons structure, but it is easier for humans to read in big or nested lists, as in some data structures we're now using. When writing programs on lists, you should still think in terms of the cons structure.

Q: What's the difference between (cons 1 empty) and (list 1 empty)?
A: The former creates a list containing one element, 1. The latter creates a list containing two elements, 1 and empty, i.e., the list (cons 1 (cons empty empty)).

Q: What's the difference between (cons 1 2) and (list 1 2)?
A1: The former is an error -- cons's second argument must be a list. The latter creates a list containing two elements, 1 and 2, i.e., the list (cons 1 (cons 2 empty)).
A2: For the curious... There is another answer if we consider all of Scheme. However, we won't be talking about that in this course.

Q: What's the difference between (cons 1) and (list 1)?
A: The former is an error -- cons requires two arguments. The latter creates a list containing one element, 1.

Q: What's the difference between (cons 1 2 empty) and (list 1 2 empty)?
A: The former is an error -- cons requires two arguments. The latter creates a list containing three elements.

Q: What is (list)?
A: empty.

In short, cons and empty are the constructors of lists. list is another function which makes lists, and is convenient shorthand for combinations of the constructors.

Directory Example

The following are data definitions for for a simplified representation of files and directories, as seen in class:

     A file is a symbol (its name).

     A directory is a structure
       (make-dir name contents)
     where name is a symbol, and contents is a l-o-f-d.

     A list-of-files-and-directories (l-o-f-d) is one of
       - empty
       - (cons f lofd)
         where f is a file, and lofd is a l-o-f-d
       - (cons d lofd)
         where d is a directory, and lofd is a l-o-f-d.

Observe the mutual recursion between directories and list-of-files-and-directories.

This is just another example of a tree data structure. We warned you trees were common!

Also, as seen in class, the following are the corresponding templates:

     ;(define (dir-fn a-dir)
     ;  ...(dir-name a-dir)...
     ;  ...(lofd-fn (dir-contents a-dir))...)

     ;(define (lofd-fn a-lofd)
     ;  (cond
     ;    [(empty? a-lofd)
     ;     ...]
     ;    [(file? (first a-lofd))
     ;     ...(first a-lofd)...(lofd-fn (rest a-lofd))...]
     ;    [(dir? (first a-lofd))
     ;     ...(dir-fn (first a-lofd))...(lofd-fn (rest a-lofd))...]))

To do:

Develop a function
```
     ; find? : directory file -> boolean
     ; Returns whether the given file is in the directory or one
     ; of its subdirectories.
     
```
Note that this is a vast simplification of find, the mother-of-all everything-but-the-kitchen-sink UNIX directory traversing command.

(optional -- go do the accumulator exercises first please) Develop a function

     ; flatten-dir-once : directory symbol -> (directory or l-o-f-d)
     ; Returns a structure like the original directory, except that the
     ; named directory is removed, with its contents moved up one level.

Here are two pictorial examples, in both cases removing the directory named to-remove.

	Input	Output
Example 1:	foo / \| \ bar baz to-remove / \ one two	foo / / \ \ bar baz one two
Example 2:	to-remove / \| \ foo bar baz	foo bar baz

Follow the templates and think about a single case at a time. If you do that, it shouldn't be too difficult. If you don't, you'll probably have real trouble.

If you still have time, develop a function
```
     ; any-duplicate-names? : directory -> boolean
     ; Returns whether any directory directly or indirectly contains
     ; another directory or file of the same name.  It does NOT check
     ; for duplicated names in separate branches of the tree.
     
```
There's a straightforward way that just follows the template. There's a more efficient way that follows a template we'll introduce later in the course.

For the very curious... Develop a program to check for duplicated names among all directories and files in the given tree. Here's a hint.
For the curious... Give more accurate data definitions for modeling UNIX directories and files. Files have not only names, but contents and other properties, such as an owner and last-modified date. Furthermore, a directory is considered to be a kind of file.

Here are solutions.

Accumulators

Accumulators are an algorithmic technique where the result is stored in a parameter that is passed to the recursive call. The final result is thus "accumulated" in that parameter, which is usually simply returned by the base case. Note that accumulator-style functions usually need helper functions, which are the actual workhorse accumulating & recurring functions. The main function's job is often just to initialize the accumulator and set up the helper.

To do:

To illustrate the differences between the structural recursion process we've been using (I like to call it "reverse accumulation" because it accumulates the return value on the way out of the recursion) and the accumulator style (which I like calling "forward accumulation" because it accumulates as the process goes into the recursion), we will write the same function two ways. So, try writing a function to sum all the elements of a list of integers in both accumulator and non-accumulator styles. What are the advantages/disadvantages of each technique?
Write a function that returns the last element of a list. Try it both with and without an accumulator. What are the advantages/disadvantages of each technique?
Using an accumulator, write a function that takes in a list and returns the a list with all the elements of the original list reversed. That is, given (1 'a 5 7 'b) the result is ('b 7 4 'a 1).
Write a function that shows the ancesteral heritage for a given person in a descendant tree. The output should be a list of the names from the top of the tree directly down to and including the given person.
For those who can't get enough fun:

Consider the Fibonacci sequence: 0 1 1 2 3 5 8 13 .... note that given the initial (0 1), the next element is always the sum of the previous two.

Wrtie a function that takes in an ascending Fibonacci sequence (assume you start with (0 1) ) and returns a Fibonacci sequence with the next value at the end of the list (i.e. next to the empty list).
That is (fib (list 0 1)) --> (list 0 1 1) or (fib (fib (fib (list 0 1)))) -> (list 0 1 1 2 3)

And for major kicks, try writing a function that takes a descending Fibonacci sequence (assume you start with (1 0) )and returns the a Fibonacci sequence with the next value at the beginning of the list (i.e. (first list)).
That is (fib2 (list 1 0)) -> (list 1 1 0) or (fib2 (fib2 (fib2 (list 1 0)))) -> (list 3 2 1 1 0) (Actually, I think this one is easier than the first one.)

Local & Scope

This section is not required for this lab. You should read and understand the material however.

Index: What is local?, Scope, When to use local?, When not to use local?, local & the Design Methodology

First, to use local, change DrScheme's language level to "Intermediate Student".

What is `local`? (Recap)

Class introduced local. Let's start with a quick review. Its syntax is

     (local [definitions] expression)

for example,

     (local [(define x 3)
             (define y (+ x 1))]
        (+ x y))

How do we evaluate it? Basically, we

evaluate the definitions' expressions, in sequence,
make the resulting definitions global, and
evaluate the local's expression body.

There is one problem with this: the locally defined names may conflict with previous definitions. How do we disambiguate which definitions are being referred to? One way to explain this is that we'll rename the local variables consistently, something like x to x', so that the resulting names have not been used before and will not be used again. So, we amend the previous evaluation strategy to

rename the locally defined variables consistently in both the definitions' expressions and the local's expression body,
evaluate the definitions' expressions, in sequence,
make the resulting definitions global, and
evaluate the local's expression body.

local does not allow you to solve any problems that you couldn't solve before, but it does allow you to solve them with better programs.

To do: Hand-evaluate all steps of the following program:

     (define x 3)
     (define z 6)
     (local [(define x 7)
             (define y (+ x 4))]
        (+ x y z))

Scope

Previously, in programs like

     (define x 3)
     (define (square x)
        (* x x))
     (square 4)

we knew that the variable x inside square was somehow different from the one outside. We didn't discuss this much, since it was fairly intuitive. Our terminology is that the x inside the function is local, while the other is global. We also say that the local variable shadows (or hides or masks) the identically-named global variable.

This distinction is one of scope. I.e., we can have multiple distinct variables with the same name. Each variable has a scope -- the part of the code where that variable can be referred to. Previously, we variables either had local scope or global scope. Now, local allows us to make further distinctions. I.e., some variables will be "more local" than others.

Thinking about scope is basically a shortcut to the whole renaming step of local evaluation. Once you understand scope, you simply say that a use of variable x refers to the most local definition of x.

The scoping rules are rather simple:

Global definitions: The scope of x in (define x expression) or (define (x ...) expression) is the entire program. It is NOT just the remainder of the program following this definition (as erroneously stated in some old versions of the Comp 210 lab materials). If this claim were true, mutual recursion would be impossible! The variable x does not have a value until its define statement has been executed, but the variable still exists and can be referenced inside a function definition that precedes the defintion of x.

For the curious... As we'll soon see, (define (x ...) expression) is shorthand for (define x some-similar-expression), so this case could be described more simply.
Local to a function: The scope of x in (define (foo ... x ...) expression) is the expression.
Local to local: The scope of a define'd variable in a local's definitions is both all of the definitions in the enclosing local (including those that precede it!) and the local expression body.

That's a lot of words for an idea that most people find reasonably intuitive. Really the main difference now is that we have a mechanism for nesting definitions, so we can have more than just the two levels of "global" and "local" variables.

If you use DrScheme's "Check Syntax" button, it will show you the scope of variables:

If you put the cursor on a variable use, it draws an arrow back to the corresponding definition.
If you put the cursor on a variable definition, it draws arrows to all of its uses.

To do: Use DrScheme's "Check Syntax" with the previous examples.

To do: The following are stylistically ugly toy examples for scoping. Draw arrows between definition and uses in the following examples, and figure out what are the results of the expressions. Check your answers with DrScheme's "Check Syntax" button and its evaluation.

(define x 1)
(define y 2)
(define z 3)

(define (fee x y)
   (local [(define z 7)
           (define y 4)]
      (+ x y z)))
(fee x z)

(define (fie x y)
   (local [(define z 7)
           (define y (+ y 3))]
      (+ x y z)))
(fie x z)

(define (foe x y)
   (local [(define z 7)
           (define y (+ x z))
           (define x (local ((define y 10)) (+ y z)))]
      (+ x y z)))
(foe x z)

(define (fum x y)
   (local [(define fum 7)
           (define (x z) (+ y z))]
      (x fum)))
(fum x z)

(define (foo x y)
   (local [(define (z y)
               (cond [(zero? y) 1]
                     [(positive? y) (* x (z (sub1 y)))]))]
      (local [(define x (+ y 1))]
         (+ x (z y)))))
(foo x z)

Fortunately, real-world examples as convoluted as these are uncommon. However, the last example's use of x in z is a common technique we'll see more of later.

When to use `local`?

There are several overlapping reasons for using local:

To hide a helper function or variable from other code.

E.g., hiding bigger in our first class example:
```
     (define (bigger num1 num2) ...)
     (define (best-score a-lon)
         ...(bigger (first a-lon) (best-score (rest a-lon)))...)
     
```
This is a very important rule stylistically. On large projects with multiple programmers, for example, it is easy for multiple programmers to define helper functions with the same names. Hiding these prevents others from interfering with your programs' correctness.

In the end, your programs tend to look like
```
     (define (my-main-function ...)
         (local [(define (helper1 ...) ...)
                 (define (helper2 ...) ...)
                 (define (helper3 ...) ...)
                 ...]
            code-using-helpers))
     
```
Reuse code and avoid duplicating code.

Again, referring to the class best-score example, the reason the first version was inefficient was because we recomputed recursive calls. By only making the recursive call once, we improved the algorithm. We could have done that without a helper function, as in
```
     (define (best-score a-lon)
         (cond
             [(empty? a-lon) 0]    ; the minimum score
             [(cons? a-lon)
              (local [(define best-score-rest (best-score (rest a-lon)))]
                 (cond
                     [(< (first a-lon) best-score-rest) best-score-rest]
                     [else (first a-lon)]))]))
     
```
Avoiding duplication also helps stylistically, since it is a special case of our general rule to reuse code when possible. It also helps avoid errors.

A special case of this is eliding invariant function arguments. A variant of our in-class example:
```
     (define (is-in-los? a-los a-sym)
        (local [(define (is-in-alos? a-los)
                   (cond
                       [(empty? a-los) false]
                       [(cons? a-los) (or (symbol=? (first a-los) a-sym)
                                          (is-in-alos? (rest a-los)))]))]
           (is-in-alos? a-los)))
     
```
The argument a-sym is invariant in the search, and this technique allows us to not pass it around everywhere. For the curious... At the lowest levels of implementation, this idea can improve efficiency, but whether it does at the high-level of Scheme is unclear, because Scheme has lots of leeway in implementing your code.

To name interesting subcomputations.

Even when not accomplishing either above goal, it can be useful to name the results of subexpressions to improve the readability of your code.

As a small example, compare the readability of

     (define (rectangle-area upper-left-point lower-right-point)
         (* (abs (- (point-x upper-left-point)
                    (point-x lower-right-point)))
            (abs (- (point-y upper-left-point)
                    (point-y lower-right-point)))))

     (define (rectangle-area upper-left-point lower-right-point)
         (local [(define top-length (abs (- (point-x upper-left-point)
                                            (point-x lower-right-point))))
                 (define side-length (abs (- (point-y upper-left-point)
                                             (point-y lower-right-point))))]
            (* top-length side-length)))

To do: Go back and review your last two assignments. Use local where appropriate.

You are expected to use local on the current and future assignments where appropriate.

For the curious... You can also use define-struct in a local. Q: When would this be appropriate?

When not to use `local`?

Usually you shouldn't consider identical recursive calls in different conditional cases to be repeated code.

Consider the following code:

     (define (count-symbols a-lons)
        (cond
            [(empty? a-lons) 0]
            [(number? (first a-lons)) (count-symbols (rest a-lons))]
            [(symbol? (first a-lons)) (add1 (count-symbols (rest a-lons)))]))

Q: What is wrong in the following uses of local?

     (define (count-symbols a-lons)
        (local [(define rest-count (count-symbols (rest a-lons)))]
           (cond
               [(empty? a-lons) 0]
               [(number? (first a-lons)) rest-count]
               [(symbol? (first a-lons)) (add1 rest-count)]))

     (define (count-symbols a-lons)
        (cond
            [(empty? a-lons) 0]
            [else
               (local [(define rest-count (lons-length (rest a-lons)))]
                   (cond
                      [(number? (first a-lons)) rest-count]
                      [(symbol? (first a-lons)) (add1 rest-count)]))]))

Don't bother naming uninteresting subcomputations.

`local` & the Design Methodology

You should still use the design methodology for developing local functions. In particular, local functions still need a contract and purpose.

However, there are complications. You can't test a local function independently, because it is hidden. A standard technique is to define the function globally for testing, then move it into a local. This techniques doesn't work when the function uses variables that aren't local to the function, as when eliding invariant arguments, e.g.,

(define (expt x y)
   (local [(define (expt-of-x y)
               (cond
                   [(zero? y)     1]
                   [(positive? y) (* x (expt-of-x (sub1 y)))]))]
      (expt-of-x y)))

Comp 210 Lab 6: Mutual Recursion, Accumulators and Locals

list