Comp 210 Lab 4: Arithmetic Imprecision

(Computers Can be Wrong)

In the previous lab, we demonstrated a function (strongest-coalition-1) which could be unusably slow when given reasonable data. Though slow, at least the function always gave the correct answer. Today's lab demonstrates how sometimes computers give answers that are just flat-out wrong.

Table of Contents

  • Tail Recursion
  • Scheme's exact numbers
  • fixed-width integers, and overflow
  • floating-point representation
  • underflow
  • other imprecision
  • time/precision tradeoff
  • The UNIX Corner

  • Tail Recursion

    Before getting into the main part of today's lab, just an observation about some previous examples we've seen.

    Recall the hand-evaluation of a function like product that we did previously:

    (product (list 4 5 3))
    =
    (* 4 (product (list 5 3)))
    =
    (* 4 (* 5 (product (list 3))))
    =
    ...
    
    The answer to product is reduced to a multiplication and a recursive call. We noted at the time how the pending multiplications pile up in front. If we were calling product on a list with a thousand elements, there would at some point literally be a thousand pending operations inside the computer, taking up scratch space.

    Compare this with a function like contains-mary from the class notes. The hand-evaluations for that were qualitatively different:

    (contains-mary (list 'jim-bob 'athena 'shiva 'mary 'dweezil))
    =
    (contains-mary (list 'athena 'shiva 'mary 'dweezil))
    =
    (contains-mary (list 'shiva 'mary 'dweezil))
    =
    ...
    
    The point is that in this case there is no pending computation piling up at each step; the answer to contains-mary is reduced to just a recursive call, with no additional work. Running contains-mary on a list with a thousand names wouldn't take up much additional scratch room.
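
    For reference, here is roughly what contains-mary looks like (a sketch of the class-notes function, not necessarily its exact definition). Notice that in the recursive case, the recursive call is the entire answer:

    (define contains-mary
      (lambda (a-list)
        (cond [(null? a-list) #f]
              [(eq? (car a-list) 'mary) #t]
              [else (contains-mary (cdr a-list))])))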

    There is a name for this difference between the two approaches: we say the second case is tail recursive, because the last ("tail") part of the function is nothing but a recursive call (as opposed to product, where the last part was * applied to a recursive call).

    We tend to like tail-recursive functions, since they need not take up so much extra scratch space to evaluate.
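
    For instance, product itself can be rewritten in a tail-recursive style by carrying the running product along in an extra argument. (This is only a sketch, with made-up names product-helper and product-tail; it is not the version from class.)

    (define product-helper
      (lambda (a-list result-so-far)
        (cond [(null? a-list) result-so-far]
              [else (product-helper (cdr a-list)
                                    (* (car a-list) result-so-far))])))

    (define product-tail
      (lambda (a-list)
        (product-helper a-list 1)))

    Hand-evaluating (product-tail (list 4 5 3)) now keeps the partial result in the second argument instead of piling up pending multiplications in front.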


    Scheme's Exact Number Representation

    While there is no bound on how big or tiny a mathematical number can be, unfortunately every computer does have a bound on how big or tiny a number it can represent internally.

    You have probably already noticed that Scheme uses "arbitrary precision arithmetic": It can represent very large integers accurately. For instance, ten to the eightieth power plus seventeen can be typed directly into Scheme, 100000000000000000000000000000000000000000000000000000000000000000000000000000017, and the value stored exactly. The only limit to such numbers in Scheme is the actual amount of memory of the computer.

    In order to represent these large, exact-precision integers, Scheme is (essentially) keeping a list of the digits. When it adds or multiplies two exact-precision numbers together, Scheme performs the grade-school algorithms.
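
    For instance, you can type the following directly into Scheme and get every digit back exactly (the first answer is the 81-digit number shown above):

    (+ (expt 10 80) 17)
    (* (expt 2 100) (expt 2 100))    ; the exact product of two 31-digit numbers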

    Overflow

    Most programming languages don't have exact arithmetic built in. They commonly make the simplifying assumption that the largest integer has, say, nine digits. We call these fixed-width numbers.

    Using fixed-width numbers up to (say) a billion might be fine for most everyday calculations, but what happens when you try to add a couple of 9-digit numbers and get a 10-digit number? The result can no longer be stored as an exact integer; this is called overflow. What happens next depends on the particular system. The program might halt; it might proceed with the result tagged as a special, anomalous value perhaps written +Infinity; or (most frightful yet) it might proceed with some totally spurious number, without even informing anybody that all subsequent calculations are garbage.
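
    Scheme's own integers never behave this way, but we can sketch that last (and scariest) possibility by artificially throwing digits away. The helper wrap32 below is purely hypothetical: it simulates a machine that keeps only 32 bits (enough for numbers up to about two billion) and silently wraps anything larger around:

    ;; wrap32: keep only the low 32 bits of n, reading the top bit as a
    ;; sign bit, the way a fixed-width machine integer would.
    (define wrap32
      (lambda (n)
        (let ((m (modulo n 4294967296)))     ; n mod 2^32
          (if (>= m 2147483648)              ; top bit set? then it "goes negative"
              (- m 4294967296)
              m))))

    (wrap32 (+ 2147483647 1))    ; => -2147483648, a totally spurious answer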

    Floating-point Representation

    The idea of finite representations is not peculiar to integers; it applies to trying to represent real numbers as well.

    Scheme represents rational numbers (fractions) exactly, as one arbitrary-precision integer divided by another.

    However, in most languages, "real numbers" are approximated by using (say) 15 significant digits. To express fractional amounts, scientific notation is used. For instance, 6.022e23 is 6.022 with the decimal place shifted 23 places to the right (that is, 6.022 × 10^23). Just as the mantissa (the "6.022" part) is restricted to 15 digits, the exponent (the "23" part) is also restricted in size; we'll see how big it can be in a moment. This method of representing numbers is called floating-point representation, since the decimal point "floats around", relative to the 15 significant digits, depending on the exponent. Notice that inside the computer, each floating-point number needs the same, fixed number of digits to be stored. This convenience sometimes makes up for the fact that such numbers might only be approximations to the numbers we're actually interested in.

    Just as with integers, it is possible to think of numbers whose exponent is larger than can be stored in a computer's floating-point number; a calculation whose answer exceeds the largest expressible number again results in overflow.

    In addition to arbitrary-precision rational numbers, Scheme also uses floating-point numbers, whenever a number is typed with a decimal point. Evaluate (/ 2 37) and compare it to (/ 2.0 37.0). (If you really need them, there are also functions exact->inexact and inexact->exact to convert between the two formats.)
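
    A quick comparison (the printed digits of the inexact results may vary slightly from system to system):

    (/ 2 37)                    ; => 2/37, an exact rational
    (/ 2.0 37.0)                ; => an inexact approximation, roughly 0.0540540540...
    (exact->inexact (/ 2 37))   ; => the same floating-point approximation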

    Looking for Large Numbers

    For any number x, if we double x and then divide by two, hopefully we get x back. But here's where overflow can get in the way: If x is large enough that doubling it causes an overflow, then doubling x followed by halving the result will likely not give us x back. Let's take a number, and keep doubling it until (using this approach) we find an overflow:
    ;; find-huge: keep doubling x until doubling-then-halving no longer
    ;; gives x back -- i.e., until (* x 2.0) overflows.
    (define find-huge
      (lambda (x)
        (if (not (= x (/ (* x 2.0) 2.0)))
            x
            (find-huge (* x 2.0)))))
    
    (define huge (find-huge 1.0))
    (+ huge huge)
    

    Underflow

    Underflow occurs when a number is too tiny to be expressed as a floating-point number; i.e., when you get a non-zero number which is very, very close to zero, such as 10^-400.

    There is an IEEE floating-point standard (it spells out the nitty-gritty bit-layout details) which also incorporates gradual underflow, a scheme for underflowing "softly", passing from a non-zero value to zero by degrees (in some situations this difference is a qualitative change, as well as a quantitative one).

    Beware that some people misuse the term underflow to mean "overflow in the negative direction", that is, taking a very negative number and multiplying it by two. The technical term for that situation is "overflow in the negative direction".

    Self-exercise: Write a function find-tiny which finds a number within a factor of two of the smallest (positive) representable floating-point number.
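
    (After you've tried it yourself, compare with the following sketch, which simply mirrors find-huge, halving instead of doubling:)

    (define find-tiny
      (lambda (x)
        (if (not (= x (* (/ x 2.0) 2.0)))
            x
            (find-tiny (/ x 2.0)))))

    (define tiny (find-tiny 1.0))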

    Other Sources of Error

    Besides underflow and overflow, there are two other ways errors can be introduced in floating point numbers.

    The first is round-off error. Sometimes, when dividing one number with 10 decimal places by another, the result requires 11 or more decimal places to store exactly. No problem: you round the last digit, and you have an answer that's pretty darn close to exact.

    The problem is, round-off error can accumulate. For a simple example, write a function which adds up 185 copies of (/ 1.0 185)--what is the result? What if you have a function which starts at 1 and subtracts 1.0/185 with each recursive call, quitting when its argument is precisely equal to zero?
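
    A minimal sketch of that first experiment (the name add-copies is just for illustration):

    ;; add-copies: add together n copies of the number x.
    (define add-copies
      (lambda (n x)
        (cond [(= n 0) 0]
              [else (+ x (add-copies (- n 1) x))])))

    (add-copies 185 (/ 1.0 185))    ; exactly 1.0, or merely close?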

    The second form of error can be more insidious. If we are using only (say) 15 significant digits, then we run into problems when adding numbers which vary by a factor of 10^16 or more: 1.0e16 + 1 should be 1.0000000000000001e16, but if you only have 15 digits the closest answer is 1.0e16, which is what you started with! You might think this error is reasonable; being wrong by one part in 10^16 (ten million billion) is admittedly close enough for government work.

    However, small errors can accumulate. Copy the file /home/comp210/Labs/lab04/sum.ss to your directory, and fill in the bit of code it indicates. It illustrates a rule of thumb for minimizing this error: always handle the small-magnitude numbers first.

    Does that approach always solve the problem? If a small error can occur even once, are there cases where a large error can? Consider the following list, named janus after the two-faced Roman god.

    :- janus
    (list 31
          2e+34
          -1.2345678901235e+80
          2749
          -2939234
          -2e+33
          3.2e+270
          17
          -2.4e+270
          4.2344294738446e+170
          1
          -8e+269
          0
          99)
    
    (This list, and a function add-list, can be found in Projects/lab4-library.ss.) What are the results of (add-list janus) and (add-list (reverse janus))? That's quite a difference for two answers that should be the same! What's going on? Do you still trust your computer to give you reasonably correct answers as much as you did 30 seconds ago?
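
    (In case you're curious, add-list itself is presumably just the straightforward structural recursion over a list of numbers; a sketch of what it probably looks like:)

    (define add-list
      (lambda (a-list)
        (cond [(null? a-list) 0]
              [else (+ (car a-list) (add-list (cdr a-list)))])))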

    Time vs. Precision Tradeoff

    So if Scheme has arbitrary-precision integers and rationals, why do we ever bother with floating-point arithmetic? Like any interesting choice, there is a trade-off between the two alternatives (else it wouldn't be interesting). Using arbitrary-precision arithmetic can sometimes slow us down, and often the added accuracy isn't enlightening.

    Consider writing our own version of expt to raise one number to the power of another, where the second number is a nice, positive integer. You should be able to whip this out in no time:

    (define my-expt
      (lambda (r n)
        (cond [(= n 0) 1]
              [else (* r (my-expt r (- n 1)))])))
    
    Okay, now define two versions of the same number, once as a floating-point number and once as an exact-precision number:
    (define approx (+ 1 1e-12))
    (define exact  (+ 1 (expt 10 -12)))
    
    (Evaluate each of these variables, just to see what their values look like.)

    Now, we try our function on these numbers: What is (my-expt approx 30)? How about (my-expt exact 30)? Which answer is more useful to you? (Check out the interesting symmetry of the numerator in this second version.)

    (You may also want to try raising approx to a huge number: try (my-expt approx 1e12). Is the result familiar?)


    The UNIX corner

    The UNIX command-o'-the day is wc.

    No no no, "word count". It's a simple program which, given a file, will count how many lines, words, and characters are in it. For instance,

         owl% wc 4soln.ss
         269    1186    8231 4soln.ss
    
    This means that my solution was 269 lines long.

    Also, by reading the manual page (man wc) you glean that if you don't name an input file, wc reads from "standard input"--that is, from the keyboard, until you type ^D. Try it:

        owl% wc
        oh boy isn't
        this a barrel
        of fun
        ^D
         3       8      34
    
    Sure enough, three lines and eight words. Fascinating!

    Okay, I'll admit, wc by itself isn't terribly exciting. However, it's a great way to introduce a general UNIX concept: pipes. Sure, my homework four solution was 269 lines long, but how many of those were comments? We can use grep (you remember grep, right?) to select only the lines containing ;;. But grep ";;" 4soln.ss just shows the lines on the screen. Really, I'd like to feed those lines to wc. Use the pipe symbol, |, to do this:

        owl% grep ";;" 4soln.ss | wc
            119     735    4510
    
    How many lines of comments did your solution have?

    Why is this called a "pipe"? We have taken the standard output pouring out of the first command (grep), and used the pipe to connect it to the intake (er, standard input) of the second program (wc).

    Using pipes to connect programs can be a very handy tool; UNIX has many little utilities (like wc) which are uninspiring by themselves, but when combined with other programs are very useful.

    As you might expect, you can pipe several things in a row together. My homework file defined a lot of test cases; here's a way to find how many lines were in my solution, after filtering out all lines containing "define" and all lines containing ";;":

          owl% grep -v ";;" 4soln.ss | grep -v define | wc -l
              119 
    
