Comp 210 Lab 2: Simple data definitions: `cond`, `struct`s

In class, we've seen the design recipe, introduced boolean data (and some functions like cond, as well as and, or, not. Further, we've just barely been introduced to another type of data -- symbols -- and finally a way to create new types of data: structures. We'll review those here, and see how they influence the design recipe.

built-in data types: booleans, symbols,
Data with cases (cond),
Data with fields (define-struct),
The revised Design recipe,
Further Exercises

new primitive data types

First, let's quickly recall the design recipe (so far).

4. Contract, purpose, header for the function.
5. Develop test cases for the function.
6. Write the body of the function.
7. test your function. (using your previously-written test cases)

(We'll fill in some new steps during this lab.)

We've seen that DrScheme has, in addition to numbers, two boolean values, and (an infinite number of) symbols.

  17    true   'central-daylight-time   'galatea2.2

Three quick exercises:

Type in a couple more examples of symbols. Try calling some the built-in functions <=, =, and, or, not, symbol=?. Is 17 times 18 bigger than 256?
What is the contract of these built-in functions? (Choose any two, and write down the contract.)
(You can verify your guess by asking the labbie, or by going to DrScheme's Help Desk, and looking at the manual for the Beginning Student language.)
Practice writing a function which returns a boolean: Write within-two? which takes in two numbers m and n, and returns true if m-n is less than 2 and conversely n-m is less than 2. (For learning purposes only, use and, not the absolute value.)

Your test cases can be look like:
```
(within-two? 99.8 101)
true

(within-two? 5 -5)
false
```
(if successful, answers to come in pairs), or of the form
```
(boolean=? (within-two? 99.8 101)
           true)

(boolean=? (within-two? 5 -5)
           false)
```
(if succesful, all answers are true).
solution

Cases

We've already seen examples in class, of a particular pattern: When our input data neatly falls into one of several cases, use cond ("conditional").

Example: Last year, Rice had Bill Cosby as a commencement speaker; his honorarium was not disclosed. This year, the search committee might be looking at another comedian. However they don't don't know what their final budget will be: only that they'll get at least $500. If budgeted $8000 or more, they can get Ellen Degeneres to speak. If budgeted less, but still at least $2000, they can get Rodney Dangerfield. If they don't even get $2000, then for $500 or more they can get Carrottop.

Let's write a function speaker-affordable (or, "spkr-affrd") so that by plugging in a (constantly modified estimate of a) budget, the function returns the best speaker affordable.

What is the contract for this function? The header? The description?
What are the pertinent test cases?
Following the design recipe, now we have a good idea what the function will do, so we can start to right the function body.
Finally, check that your previous test cases really do work.

As a group, discuss any ambiguities, issues, thoughts. solution
Note that one of the first things you had to decide, was how to represent the budget (clearly a number is appropriate), and how to represent the comedian (a bit of thought reveals that symbols works fine).

A biologist has the following model for lion population in the savannah: If the population is less than 800, then next year the population will grow by 10% (food is plentiful). If the population is more than 800 but less than 1100, the next year's population will grow by 2% (food is available, but scarce). If the population is 1100 or more, then disease is likely, and the next year's population will be down 30%.
(Comp210 does claim this is a reasonable model, especially given this function's discontinuity at 800!)

Exercise: With your partner, write a function which, given this year's lion-population, returns next year's lion-population (according to this model). Following the design recipe, of course! (Halfway through, the TAs will stop you to discuss any issues.) solution

Is there anything in common about these examples? Really, we find there are often several categories of input. For example, we could write

; A lion-pop is one of:
;  - an integer in [0,800)
;  - an integer in [800,1100)
;  - an integer in [1100,+inf.0)

(Note the "half-open inteveral" notation, standard for mathematicians: the square-bracket means include the number in the interval, and the round-bracket means exclude the number from the interval. These are not meant to be scheme-parentheses. We'll use "+inf.0" to mean positive-infinity for now (note that we exclude infinity from being a lion-pop.)

Aside: A nuance:
Once we've given this name to this type of data, what would the contract for your function be? What is gained by doing this? First, this data better reflects our thinking of the program. (If we later write a function involving food-supply, those might also be integers, but we'd never refer to is as a lion-pop.

Aside:
Note that this more specific contract might also spur us to realize that our output should be an integer. Rounded? Truncated? You'd have to go back and ask the biologist! We won't worry about this issue for this problem, but again you're free to look at the manual in Help Desk, if you're curious about what rounding functions are provided by DrScheme.

This "x, y, or z" structure is a very common way to define data, and you see that the code reflects this way of thinking about the data: There is one cond branch for each branch of data.

Structured data

You've just re-watched the Powerpuff Girl movie yet again, and you decide you want to write a program involving Townsville's heroes. Before we even start to write any code at all, our first question is: how will we represent any individual powerpuff girl ("ppg"), in our program? We decide that the aspects we want our program to represent will be their name, color, ferocity-level (a number) and whether or not they're conflict-avoidant. (Note that we think about this before writing any code!)

Well, let's think about what types of data we could use.

Numbers? Naw; I don't think of them as girls number 1,2,3, so my program should not encode them as integers.
Booleans? Right out: (if nothing else, it's impossible by a counting argument: you can't represent three heroes with only two possible values!)
Symbols? That's certainly possible, to have 'blossom, 'bubbles, 'buttercup.

It would mean we'd hand-craft functions such as
```
;; ppg->ferocity: symbol -> number
;; Given one of {'blossom,'buttercup,'bubbles},
;; return their ferocity index, a number.
;;
(define (ppg->ferocity a-name)
  ...)
```
Similarly, you'd have functions to translate a name to a color, etc.

What drawback does this approach have? Upon reflection, it doesn't really capture how think about our heroes: Bubbles is one person, which captures many different attributes.

By not reflecting our thinking, it turns out this approach is also hard to maintain: If later a new series, "the powerpuff girls kids", introduces the new hero "burbles", you'd have to encode the new info by changing several functions in several different places. Bugs are introduced if you forget any of those places. Keep in mind that in large programs, these functions might be spread over multiple files and maintained by different programmers. Very easy to miss a bug!
Structures! We saw in class, we can make a new type of data, composed out of several smaller data. This reflects how we think about them; it also means that making a new hero will be done all at once, rather than changing a bunch of functions.

Defining the data:

(define-struct ppg (name color ferocity ca?))
;
; A Powerpuff Girl ("ppg") is:
;   (make-ppg symbol symbol number boolean)
; where name is the ppg's name,
; color is their color 
;   (presumably one of {'green,'blue,'red}, but perhaps different
;    if new ppgs are ever created)
; ferocity is their ferocity index, a number.
; and ca? is whether or not they're conflict avoidant.

This is a bit strange: we're expanding the language by introducing a new data type. In addition to numbers, booleans, and symbols, DrScheme now knows about ppgs! We have not yet created any instances of this type -- that is, we haven't yet defined blossom, etc.

Examples of the data

Fortunately, the comments above, which explained the meaning of the different fields, also shows us how to create particular instances of ppgs:

(make-ppg 'buttercup 'green 90 false)

Go ahead and create the value for blossom (red, ferocity 70, and not conflict avoidant) and bubbles (blue, ferocity 77, and conflict avoidant, though The Professor notes she is actually much more empathic, suggesting that would be a better name for the data definition.)

What's going on?

The result of make-ppg is a particular structure. Think of it as a chest of drawers ("fields"): they are labeled name, ferocity, etc. Within each drawer is the value of that field ('buttercup, 90, etc).

Note that make-ppg is a function which was created automatically, after scheme evaluated your define-struct.
Exercise: What is the contract for make-ppg? Discuss.
make-ppg is called a "constructor".

In addition to make-ppg, there are a few other functions which drscheme has automatically created, upon seeing your (define-struct ppg (name ...)): these functions are called "selectors":

ppg-name : ppg -> symbol
ppg-ca? : ppg -> boolean.
(Note the convention of a trailing "?" indicating a boolean function (or value); this falls out our happy choice of including "?" in our define-struct.)
...

Aside:
A mathematician notes that the selectors are the inverse-function of the constructor. In arithmetic there is an axiom that "For any value of x, x + 0 = x". In scheme they'd note the axiom "For any symbols a, b, and any number c, and any boolean d, it is true that

(symbol=? (ppg-color (make-ppg a b c d))
          b)

"Do I always have to type in (make-ppg 'buttercup 'green 90 false) every time I want to refer to Buttercup? In a word, Yes. (Until next week.) The whole thing, (make-ppg ...), is just a value, in exactly the way 17 is just a particular value. (Though ppgs are much more intricate things than numbers, so representing them takes more keystrokes.)

Compare an ancient roman taking this class, bemoaning "Do i really have to type in four characters, '2000', every time i want to call a function on the number MM?" There is difference between a number, and its representation (arabic, roman, etc). The representation of ppgs is admittedly cumbersome. (But we'll see later how to attach a name to any value -- be it a number like 3.1415926535897932384626433, or -- exactly analagously -- a ppg like (make-ppg 'buttercup 'green 90 false).)

TAs: you'll probably make it to about here.

Functions using structures

Note how much we've accomplished, and we haven't yet even thought about what functions to write!. Fortunately, once we have a clear data definition, writing the functions is often the easy part.

Exercise: Write a function negotiates-well? (or, "nw?") which takes in (our representation of) a powerpuff girl, and returns whether or not she is a good negotiator. It turns out, good negotiators are those whose ferocity is less than 80.

Challenge version:
If you've done this, make it a bit more complicated: Actually, good negotiators are those who are either conflict-avoidant, or have ferocity bigger than 20 and less than 80. You may either use and, or for this, or you may use a cond; either way is fine. (To mull over: which more closely resembles the phrasing of the problem?)

Don't forget the design recipe! Discuss. solution

Optional Exercise: With your partner, write a function two-can-beat-up?, which takes in two powerpuff girls and a villain's toughness (a number). Our heroes can whoop some heinie if their combined ferocity meets or exceeds the villain's toughness. solution

Stepping back: Recipe redux

What was involved in the preceding examples? We realized that when approaching a problem, the first thing we do, is decide how our program will represent data, in the problem we're modeling. Moreover, we've seen two common situations:

The data is of the form "x, y, or z". Any particular function dealing with that data will use a cond, with one branch for each type of data.
The data is of the form "x, y, and z". We use structures to encapsulate these three data into one conceptual datum. Any particular function dealing with that data will use the structure's selectors.

Our code mirrors our data, which in turn mirrors how we conceive of the problem. This is no coincidence. More than half the work of writing a good program lies in choosing good data definitions. We'll see how the shape of the data often guides you to a bug-free solution.

We'll close by updating our design recipe. (Note to labbies: Students haven't seen all these steps in class.)

1. data definition
How, exactly, do you represent the problem in terms of Scheme data?
2. Example data.
You'll also use these data for testing specific functions later, but this is an important step in its own right.
4. Contract, purpose, header for the function.
5. Develop test cases for the function.
6. Write the body of the function.
1. If your data has various cases, include a cond -- one branch per case of data.
  (Note: You can write all the cond questions first, to distinguish the data, and come back later to worry about the answer for a particular function.)
2. If your data is a structure, your code will be using the various selectors for that structure. You can write these down before thinking about this particular function, like a good craftsperson inspecting what tools are available to complete the task with, before launching in blindly.
7. test your function. (using your previously-written test cases)

Aside:
What is step 3? We'll talk about that in lecture: it will just be a case of saying that defining your data alone gives you enough information to write the skeleton of any function which processes that data! (viz. take the header, and include steps 6a or 6b). This is handy because you can write the skeleton just once, and then copy/paste it for every later function involving that data.

Aside:
Programming is not tinker&pray.
(I.e., copying a similar-looking program from a book, then repeatedly making small changes, running some tests, and hoping it will work, and tinkering some more. This "procedure" leaves people wandering, and only stops by hoping that passing some test cases mean it works on all other inputs too.) There is Process in creating a program: the form of your data guides the shape of your program; the design recipe elucidates this.

If you finish early..

Optional Exercise: Your powerpuff girl program would be boring without villains. With your partner, discuss what features of villains your program will model, and create an appropriate data definition. Include a toughness, and whether or not they are always hungry. Be sure to make examples of the data!

Optional Exercise: Write can-handle?, which takes in one ppg and one villain, and returns whether the ppg could handle the villain one-on-one. (You can determine for yourself, whether peaceful negotiation counts as successfully handling a villain. :-)

Here's a new data definition to think about:

; A Townsvillite is:
;  - a ppg, or
;  - a villain.

This is a data definition of the "or" type, which suggests functions which handle Townsvillites will have a cond with two branches. (This is the entire data definition -- don't create any new structures.)

Optional Exercise: We already have examples of both ppgs and villains, so after pasting this definition, write is-rash?, which takes in any Townsvillite, and determines whether they're rash. (ppgs are rash if they're not conflict avoidant; villains are rash if .. they are very tough, or they are always hungry. (Or, use your own criterion.)
Remember that contract/purpose/header comes before test cases comes before the body of your function.
Hint You'll also need to use the functions ppg? and villain? -- functions also created by a define-struct ("type predicates"). They take in any type of value, and return a boolean; compare with the built-ins number?, symbol?, boolean?.

Optional Exercise: Go back an extend your definition of a Townsvillite to include regular citizens (like the mayor or ). Update is-rash?, so that it still accepts any Townsvillite and gives a reasonable asnwer.

solution

Comp 210 Lab 2: Simple data definitions: cond, structs