Comp210 Lecture

A New Way to Add

We've already written a fine way to add up all the numbers in a list of numbers:

;; A lon is a list of numbers

(define myList1 (list 5 4 3 2 1))
(define myList2 (list -2 -1 0 1 2 ))

;; fine_add: lon -> num
;; adds up all the numbers in a lon

(define (fine_add a-lon)
    (cond
        [(empty? a-lon) 0]
        [(cons? a-lon) (+ (first a-lon) (fine_add (rest a-lon)))]))

"fine_add tests:"
(= 0 (fine_add empty))
(= 15 (fine_add myList1))
(= 0 (fine_add myList2))

But is there any other way to do this?

Well, there's always a bad way...

;; bad_add: lon -> num
;; adds up all the numbers in a lon

(define (bad_add a-lon)
    (cond
        [(empty? a-lon) 0]
        [(cons? a-lon) 
            (cond
                [(empty? (rest a-lon)) (first a-lon)]
                [(cons? (rest a-lon)) (+  (first a-lon) (bad_add (rest a-lon)))])]))

"bad_add tests:"
(= 0 (bad_add empty))
(= 15 (bad_add myList1))
(= 0 (bad_add myList2))

Why is this so bad?

Answer: It violates the encapsulation of rest.

Let's think about what we are trying to do... We're trying to find another way to add up the numbers in a lon. So far, we've used "natural recursion" to add up the values on the way out of the recursion. I'll call this "reverse accumulation" (note that this is not an industry standard term). Reverse accumulation works great for many processes, but it doesn't work for everything.

Another way we can add up the numbers is to add them up on the way into the recursion. I'll call this "forward accumulation". Let's see how we might do this:

"The sum of all the numbers in a list is first plus the 'running total', which is the sum of all the previous numbers."

The problem is that the first number is treated slightly differently since there is no running total yet. Let's deal with this case first, since we are always fundamentally dealing with the beginning of a list. Since the processing of the rest of the list is different than that of the first, we simply delegate that responsibility to another function:

;; good_add: lon -> num
;; adds up all the numbers in a lon

(define (good_add a-lon)
    (cond
        [(empty? a-lon) 0]    ;; The sum of an empty list is still 0!
        [(cons? a-lon) (good_add_help (rest a-lon) (first a-lon))]))  ;; first is the running total so far.

Notice that the function used in the cons? clause is not the original function, good_add, because the function that is needed requires an additional input parameter, namely the running total. good_add is not a recursive function!

So let's see what good_add_help, the "helper function", does:

;; good_add_help: lon num -> num
;; adds all the numbers in a lon to the supplied accumulator
;; and returns the result.

(define (good_add_help a-lon acc)
    (cond
        [(empty? a-lon) acc]   ;; the sum is the running total 
        [(cons? a-lon) (good_add_help (rest a-lon) (+ (first a-lon) acc))]))   ;; add first to accumulator and recur.

"good_add tests:"
(= 0 (good_add empty))
(= 15 (good_add myList1))
(= 0 (good_add myList2))

The running total is called the "accumulator". Functions of this style are called "accumulator style algorithms".

So why use accumulator style algorithms? From the simple example above, it looks as if it only means more code.

Possible reasons for using accumulator style algorithms:

It preserves the encapsulation.
It follows the data definition-derived template.
It creates "tail recursive" algorithms -- algorithms where the unprocessed recursive result is returned, which can be optimized for space and speed.
The computation may not be possible using reverse accumulation -- at least not without violating various precepts of good programming style.
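
To make the tail-recursion point concrete, here is a hand evaluation of fine_add and good_add on (list 5 4 3). This is only a sketch, using a shorter list than the test lists above; the steps are written as comments, so there is nothing to run.

;; Reverse accumulation: the additions wait until the recursion returns.
;; (fine_add (list 5 4 3))
;; = (+ 5 (fine_add (list 4 3)))
;; = (+ 5 (+ 4 (fine_add (list 3))))
;; = (+ 5 (+ 4 (+ 3 (fine_add empty))))
;; = (+ 5 (+ 4 (+ 3 0)))
;; = 12

;; Forward accumulation: each addition is done before the next call.
;; (good_add (list 5 4 3))
;; = (good_add_help (list 4 3) 5)
;; = (good_add_help (list 3) 9)
;; = (good_add_help empty 12)
;; = 12

In the first trace the pending additions pile up until the recursion reaches empty. In the second, the result of each recursive call is returned untouched, so nothing is left waiting.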

There's a second way to write the main function to sum the list of numbers:

;; good_add2: lon -> num
;; adds up all the numbers in a lon

(define (good_add2 a-lon)
     (good_add_help a-lon 0))

"good_add2 tests:"
(= 0 (good_add2 empty))
(= 15 (good_add2 myList1))
(= 0 (good_add2 myList2))

Pros:

Shorter code
Treats the first element the same as all the others.
Easier to understand for some people.

Cons:

Doesn't follow our original template exactly (it fits a new view on the problem--see below).
Harder to understand for some people.
Requires that the helper function have a well defined initial value that is not dependent on any other input.

Often this style of accumulator algorithm comes about because one initially conceived the processing of the list in terms of what is now the helper function. The main function is really just a "wrapping" -- an encapsulation -- of the helper function that hides the initialization of the accumulator from the user. A good way to think of this (thanks to Dr. Greiner for pointing this out) is that the good_add2 function is not a recursive function but rather simply a function on an encapsulated type, list-of-numbers. good_add2 just processes the list without breaking its encapsulation at all, just like a function on a number would process the number.

Here's a third type of implementation that is halfway between the first two:

;; good_add3: lon -> num
;; adds up all the numbers in a lon

(define (good_add3 a-lon)
    (cond
        [(empty? a-lon) 0]
        [(cons? a-lon) (good_add_help a-lon 0)]))

"good_add3 tests:"
(= 0 (good_add3 empty))
(= 15 (good_add3 myList1))
(= 0 (good_add3 myList2))

This technique preserves the invariant template, yet allows a clear initialization of the accumulator.

One requirement for using the above two methods is that the accumulator must have a well-defined initial value that can be used. This is not always true, forcing you to use the longer first implementation.
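
For example, consider finding the largest number in a non-empty list of numbers. No fixed number is a safe starting value for the running maximum, since every element of the list might be smaller than it, so the accumulator has to be seeded with first, just as in good_add. The following is only a sketch and is not part of the lecture's code: the names good_max and good_max_help are made up here, and max is the built-in function.

;; good_max: non-empty lon -> num   (hypothetical name)
;; returns the largest number in a non-empty lon

(define (good_max a-nelon)
    (cond
        [(cons? a-nelon) (good_max_help (rest a-nelon) (first a-nelon))]))  ;; first is the largest seen so far.

;; good_max_help: lon num -> num   (hypothetical name)
;; returns the larger of the accumulator and the largest number in the lon

(define (good_max_help a-lon acc)
    (cond
        [(empty? a-lon) acc]   ;; nothing left, so the accumulator is the answer
        [(cons? a-lon) (good_max_help (rest a-lon) (max (first a-lon) acc))]))

"good_max tests:"
(= 5 (good_max (list 5 4 3 2 1)))
(= 2 (good_max (list -2 -1 0 1 2)))
(= -7 (good_max (list -7)))

Writing (good_max_help a-lon 0) instead would give the wrong answer for (list -7), which is exactly the situation described above.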

I find that I use all three styles in my own work. Because it structurally enforces the invariants, and for more subtle theoretical reasons, the third method here is generally preferred over the first or second.

The first technique shown above will always work because it strictly follows the invariant template.

My advice: Apply the First Law of Programming.

All 3 types of implementations will be accepted, but the first and third methods are recommended.

Let's try another algorithm: Let's try to find the last element in a list.

First, the data definition:

;; a NElon is a non-empty list of numbers
;; note that a NElon is a lon.
;; (cons first rest)
;; where first is a number and rest is a lon

Now some bad code:

#| Template:
(define (f-NElon a-nelon...)
	(cond
		[(cons? a-nelon) (... (first a-nelon) ... (rest a-nelon) ...)]))
|#
;; note that the empty? clause is missing in the template.

;; bad_last: NElon -> num 
;; returns the last element of a NElon

(define (bad_last a-nelon)
    (cond
        [(cons? a-nelon) 
            (cond
                [(empty? (rest a-nelon)) (first a-nelon)]
                [(cons? (rest a-nelon)) (bad_last (rest a-nelon))])]))
  
"bad_last tests:"
(= 1 (bad_last myList1))
(= 2 (bad_last myList2))

That code should make your skin crawl...eeee!

Let's try that again:

;; good_last: NElon -> num
;; returns the last element of a NElon

(define (good_last a-nelon)
    (cond
        [(cons? a-nelon) (good_last_help (rest a-nelon) (first a-nelon))]))


;; good_last_help: lon, num -> num
;; returns the last element of a lon 
;; or the accumulator if the lon is empty.

(define (good_last_help a-lon acc)
    (cond
        [(empty? a-lon) acc]
        [(cons? a-lon) (good_last_help (rest a-lon) (first a-lon))]))


"good_last tests:"
(= 1 (good_last myList1))
(= 2 (good_last myList2))

I often call this sort of algorithm a "pass-ahead" algorithm because of how information is passed forward in the recursion.

Notice that the helper takes a lon, not a NElon. Why?

If a-nelon is a NElon, then what can you say about (rest a-nelon)?
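
A hand evaluation shows what the helper actually receives. This is only a sketch, using short lists rather than the test lists; the steps are written as comments.

;; (good_last (list 5 4 3))
;; = (good_last_help (list 4 3) 5)
;; = (good_last_help (list 3) 4)
;; = (good_last_help empty 3)
;; = 3

;; (good_last (list 7))
;; = (good_last_help empty 7)
;; = 7

In the one-element case, the very first call to the helper is handed empty.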

Class exercise: Write a function, reverse_list, that will take a list and return a list with the elements in reverse order:

"reverse_list test cases:"
  (equal? (list 5 4 3 2 1) (reverse_list (list 1 2 3 4 5)))
  (equal? (list 'a  'b  'c 'd  'e) (reverse_list (list 'e 'd 'c 'b 'a)))
  (equal? empty (reverse_list empty))

Note that there are three ways (at least) to write this function.
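
One of those ways, sketched here in accumulator style, follows the same pattern as good_add2. The names reverse_list_acc and reverse_help are made up for this sketch so that it does not stand in for your own reverse_list; treat it as one possible approach, not the official solution.

;; reverse_list_acc: list -> list   (hypothetical name)
;; returns a list with the elements in reverse order

(define (reverse_list_acc a-list)
    (reverse_help a-list empty))   ;; nothing has been reversed yet

;; reverse_help: list list -> list   (hypothetical name)
;; puts the elements of a-list, in reverse order, in front of acc

(define (reverse_help a-list acc)
    (cond
        [(empty? a-list) acc]
        [(cons? a-list) (reverse_help (rest a-list) (cons (first a-list) acc))]))   ;; cons first onto the accumulator and recur.

"reverse_list_acc tests:"
(equal? (list 5 4 3 2 1) (reverse_list_acc (list 1 2 3 4 5)))
(equal? (list 'a 'b 'c 'd 'e) (reverse_list_acc (list 'e 'd 'c 'b 'a)))
(equal? empty (reverse_list_acc empty))

Here the accumulator has a well-defined initial value, empty, so the short wrapper style works.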

Here's a classic example of when you absolutely need an accumulator algorithm:

Consider the way that we tend to print lists so they are easily readable ("pretty print"):

empty --> "()"

(cons 'a empty) --> "(a)"

(cons 'a (cons 'b empty)) --> "(a b)"

(cons 'a (cons 'b (cons 'c empty))) --> "(a b c)"

The problem here is that while the lists of symbols above are recursive, the pretty-printed list is not recursive. Notice that the first element does not have a space in front of it while the second element does.

Thus we are forced, on a cons list, to treat the first element differently than the second. From the second element on, not including the ending parenthesis, the output is recursive, so we can use a recursive function.

That is, consider that

"(a b c)" is really "(a[the rest of the list])"

and [the rest of the list] is " b c" (note the spaces).
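
As a small check of this decomposition (it is not the pretty-printing function itself, only a sketch of how the pieces fit), the parts can be glued back together with the built-in string-append and symbol->string functions:

"decomposition check:"
;; the whole string is "(" + first + [the rest of the list] + ")"
(string=? "(a b c)" (string-append "(" (symbol->string 'a) " b c" ")"))
;; every element after the first contributes a space followed by its name
(string=? " b c" (string-append " " (symbol->string 'b) " " (symbol->string 'c)))

Both expressions evaluate to true.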

What is [the rest of the list] when we start with "(a)"?

This function can only be written using the first technique above! -- Why?

Try writing the pretty-printing function without peeking at the solution. Advice: use the built-in string-append and symbol->string functions.

The code from today, including the class exercise, can be downloaded here.

Comp210 Lecture # 11 Spring 2003
