Simple data definitions: `cond`, `struct`s

In class, we've reviewed the design recipe and introduced Booleans and conditionals. Further, we've just barely introduced other kinds of data, symbols and structures. We'll review those here, and see how they influence the design recipe.

Note: There are more exercises here than will be covered during lab. We encourage you to try them on your own.

Index: Built-in data: Booleans, symbols, Data with cases (cond), Data with fields (define-struct), The revised design recipe, Further exercises

Built-in data: Booleans, symbols

First, let's quickly recall the design recipe (so far). We'll fill in some new steps during this lab, so we'll start with some funny numbering.

Function contract, purpose, header.
Function test cases.
Function body.
Function testing: Use your previously-written test cases.

We've seen that DrScheme has

numbers: 17, -42.89, 3/17, etc.
Booleans: true, false
symbols: 'central-daylight-time, 'galatea2.2, 'mary, etc.

Quick exercises on built-in data
Try calling some the built-in functions like `<=`, `=`, `and`, `or`, `not`, `symbol=?` on various data. E.g., is 17 times 18 bigger than 256? What are the contracts of these built-in functions? (Choose any two, and write down the contract.) (You can verify your guess by asking the labby or by going to DrScheme's Help Desk and looking at the manual for the Beginning Student language.) Practice writing a function which returns a Boolean: Write `within-two?` which takes in two numbers `m` and `n`, and returns true if `m`-`n` is less than 2, and conversely `n`-`m` is less than 2. (For learning purposes only, don't use `abs`, which computes the absolute value.) Your test cases can look like (boolean=? (within-two? 99.8 101) true) (boolean=? (within-two? 5 -5) false)
Sample solution

Cases

We've already seen examples in class of a particular pattern: when our input data neatly falls into one of several cases, use cond ("conditional").

Example with cases
We'd like to keep track of who the fastest runner is for the Rice track team. Of course, who is fastest depends entirely on what distance you're interested in. Let's assume we're interested in all distances, not just those used in track meets. Marianne is the team's best sprinter, consistently the best for the standard 100m and 200m distances, and generally for anything under 300m. Becca is the outstanding middle-distance runner for the standard 400m and 800m distances, but also from 300m to just under 1000m. Suzy is best for the long distances, from 1000m to just under 10000m. Finally, marathon specialist Yvonne outlasts everyone for races 10000m and up. Let's write a function `fastest-runner` so that by plugging in a distance, the function returns the fastest runner. What is the contract for this function? The header? The purpose? What are the pertinent test cases? Following the design recipe, now we have a good idea what the function will do, so we can start to right the function body. Finally, check that your previous test cases really do work.
Note that one of the first things you had to decide, was how to represent the distance (clearly a number is appropriate), and how to represent the runner (a bit of thought reveals that symbols works fine). As a group, discuss any ambiguities, issues, thoughts. Sample solution. We could have shortened the sample solution by eliminating the first part of the `and` in each of the 2nd and 3rd cases. Why is this a bad idea?

Another example with cases
A biologist has the following model for lion population in the savannah: If the population is less than 800, then next year the population will grow by 10% (food is plentiful). If the population is more than 800 but less than 1100, the next year's population will grow by 2% (food is available, but scarce). If the population is 1100 or more, then disease is likely, and the next year's population will be down 30%. (COMP210 doesn't claim this is a reasonable model, especially given this function's discontinuity at 800!) With your partner, write a function which, given this year's lion-population, returns next year's lion-population (according to this model). Following the design recipe, of course! (Halfway through, the TAs will stop you to discuss any issues.)
Sample solution.

Is there anything in common about these examples? Really, we find there are often several categories of input. For example, we could write

; A lion-pop is one of:
;  - an integer in [0,800)
;  - an integer in [800,1100)
;  - an integer in [1100,+inf.0)

(Note the "half-open inteveral" notation, standard for mathematicians: the square-bracket means include the number in the interval, and the round-bracket means exclude the number from the interval. These are not meant to be scheme-parentheses. We'll use "+inf.0" to mean positive-infinity for now (note that we exclude infinity from being a lion-pop.)

A couple questions for discussion
Once we've given this name to this type of data, what would the contract for your function be? What is gained by doing this?
Answers: To be discussed in lab. This data definition better reflects our thinking of the program. (If we later write a function involving food-supply, those might also be integers, but we'd never refer to is as a `lion-pop`.

An aside
We might realize that our output should be an integer. How should we obtain an integer: round? truncate? You'd have to go back and ask the biologist! We won't worry about this issue for this problem, but again you're free to look at the manual in Help Desk, if you're curious about what rounding functions are provided by DrScheme.

In summary, this "x, y, or z" structure is a very common way to define data, and you see that the code reflects this way of thinking about the data: There is one cond branch for each branch of data.

Structured data

You've just re-watched the PowerPuff Girl movie yet again, and you decide you want to write a program involving Townsville's heroes. Before we even start to write any code at all, our first question is: how will we represent any individual PowerPuff girl (ppg), in our program? We decide that the aspects we want our program to represent will be their name, color, ferocity-level (a number) and whether or not they're conflict-avoidant. (Note that we think about this before writing any code!)

Well, let's think about what types of data we could use.

Numbers? Naw; I don't think of them as girls number 1,2,3, so my program should not encode them as integers.
Booleans? Right out: (If nothing else, it's impossible by a counting argument: you can't represent three heroes with only two possible values!)
Symbols? That's certainly possible, to have 'Blossom, 'Bubbles, 'Buttercup.

It would mean we'd hand-craft functions such as
```
       ;; ppg->ferocity: symbol -> number
       ;; Given one of {'blossom,'buttercup,'bubbles},
       ;; return their ferocity index, a number.
       ;;
       (define (ppg->ferocity a-name)
         ...)
       
```
Similarly, you'd have functions to map from a name to a color, etc.

What drawback does this approach have? Upon reflection, it doesn't really capture how think about our heroes: Bubbles is one person, which captures many different attributes.

By not reflecting our thinking, it turns out this approach is also hard to maintain: If later a new series, "The PowerPuff Girls Kids", introduces the new hero "Burbles", you'd have to encode the new info by changing several functions in several different places. Bugs are introduced if you forget any of those places. Keep in mind that in large programs, these functions might be spread over multiple files and maintained by different programmers. Very easy to miss a bug!
Structures! We saw in class, we can make a new type of data, composed out of several smaller data. This reflects how we think about them; it also means that making a new hero will be done all at once, rather than changing a bunch of functions.

Defining the data:

     (define-struct ppg (name color ferocity ca?))
     ;
     ; A PowerPuff Girl ("ppg") is:
     ;   (make-ppg symbol symbol number Boolean)
     ; where name is the ppg's name,
     ; color is their color 
     ;   (presumably one of {'green,'blue,'red}, but perhaps different
     ;    if new ppgs are ever created)
     ; ferocity is their ferocity index, a number.
     ; and ca? is whether or not they're conflict-avoidant.

This is a bit strange: we're expanding the language by introducing a new data type. In addition to numbers, Booleans, and symbols, DrScheme now knows about ppgs! We have not yet created any instances of this type -- that is, we haven't yet defined Blossom, etc.

Examples of the data

Fortunately, the comments above, which explained the meaning of the different fields (or attributes), also shows us how to create particular instances of ppgs:

     (make-ppg 'Buttercup 'green 90 false)

Creating structured data
Go ahead and create the value for Blossom (red, ferocity 70, and not conflict-avoidant) and Bubbles (blue, ferocity 77, and conflict-avoidant, though The Professor notes she is actually much more empathic, suggesting that would be a better name for the data definition.)

What's going on?

The result of make-ppg is a particular structure. Think of it as a chest of drawers (the fields), with drawers labeled name, ferocity, etc. Within each drawer is the value of that field ('Buttercup, 90, etc).

Note that make-ppg is a function which was created automatically, after DrScheme evaluated your define-struct. make-ppg is called a constructor.

PowerPuff question
What is the contract for `make-ppg`? Discuss.

In addition to the constructor make-ppg, DrScheme automatically creates a few other functions upon seeing your (define-struct ppg (name ...)). It creates four selectors (or accessors), one for each of the specified fields. Two are

ppg-name : ppg -> symbol
ppg-ca? : ppg -> Boolean.
(Note the convention of a trailing "?" indicating a Boolean function (or value). This falls out our happy choice of including "?" in our define-struct.)

Selectors question
What are the other two selectors for `ppg`s? What are their contracts?

An aside
A mathematician notes that the selectors are the inverse-function of the constructor. In arithmetic there is an axiom that "For any value of `x`, `x` + 0 = `x`". In Scheme they'd note the axiom "For any symbols `a`, `b`, and any number `c`, and any Boolean `d`, it is true that (symbol=? (ppg-color (make-ppg `a` `b` `c` `d`)) `b`)

"Do I always have to type in (make-ppg 'Buttercup 'green 90 false) every time I want to refer to Buttercup? In a word, Yes. (Until next week.) The whole thing, (make-ppg ...), is just a value, in exactly the way 17.938 is just a particular value. (Though ppgs are much more intricate things than numbers, so representing them takes more keystrokes.)

Compare an ancient Roman taking this class, bemoaning "Do i really have to type in four characters, '2000', every time i want to call a function on the number MM?" There is difference between a number, and its representation (Arabic, Roman, etc.). The representation of ppgs is admittedly cumbersome. (But we'll see later how to attach a name to any value -- be it a number like 3.1415926535897932384626433, or -- exactly analagously -- a ppg like (make-ppg 'Buttercup 'green 90 false).)

Functions using structures

Note how much we've accomplished, and we haven't yet even thought about what functions to write!. Fortunately, once we have a clear data definition, writing the functions is often the easy part.

PowerPuff exercise
Using the design recipe, develop a function `negotiates-well?` (or, "`nw?`") which takes in (our representation of) a PowerPuff girl, and returns whether or not she is a good negotiator. It turns out, good negotiators are those whose ferocity is less than 80. Challenge version: If you've done this, make it a bit more complicated: Actually, good negotiators are those who are either conflict-avoidant, or have ferocity bigger than 20 and less than 80. You may either use `and`, `or` for this, or you may use a `cond`; either way is fine. (To mull over: which more closely resembles the phrasing of the problem?)
Sample solution.

Optional PowerPuff exercise
With your partner, develop a function `two-can-beat-up?`, which takes in two PowerPuff girls and a villain's toughness (a number). Our heroes can whoop some heinie if their combined ferocity meets or exceeds the villain's toughness.
Sample solution.

Stepping back: Recipe redux

What was involved in the preceding examples? We realized that when approaching a problem, the first thing we do, is decide how our program will represent data, in the problem we're modeling. Moreover, we've seen two common situations:

The data is of the form "x, y, or z". Any particular function dealing with that data will use a cond, with one branch for each type of data.
The data is of the form "x, y, and z". We use structures to encapsulate these three data into one conceptual datum. Any particular function dealing with that data will use the structure's selectors.

Our code mirrors our data, which in turn mirrors how we conceive of the problem. This is no coincidence. More than half the work of writing a good program lies in choosing good data definitions. We'll see how the shape of the data often guides you to a bug-free solution.

We'll close by updating our design recipe. (Note to labbies: Students haven't seen all these steps in class.)

1. new Data definition: How, exactly, do you represent the problem in terms of Scheme data?
2. new Data examples.
4. Function contract, purpose, header.
5. Function test cases: Use your previously-written data examples.
6. Function body.
1. new If your data has various cases, include a cond -- one branch per case of data. (Note: You can write all the cond questions first, to distinguish the data, and come back later to worry about the answer for a particular function.)
2. new If your data is a structure, your code will be using the various selectors for that structure. You can write these down before thinking about this particular function, like a good craftsperson inspecting what tools are available to complete the task with, before launching in blindly.
7. Function testing: Use your previously-written test cases.

An aside
What is step 3? We'll talk about that in lecture: it will just be a case of saying that defining your data alone gives you enough information to write the skeleton of any function which processes that data! (viz. take the header, and include steps 6a or 6b). This is handy because you can write the skeleton just once, and then copy/paste it for every later function involving that data.

Programming is not tinker & pray. (I.e., copying a similar-looking program from a book, then repeatedly making small changes, running some tests, and hoping it will work, and tinkering some more. That approach leaves people wandering, and only stops by hoping that passing some test cases mean it works on all other inputs too.) There is Process in creating a program: the form of your data guides the shape of your program; the design recipe elucidates this.

If you finish early...

Optional PowerPuff exercises
Your PowerPuff girl program would be boring without villains. With your partner, discuss what features of villains your program will model, and create an appropriate data definition. Include a toughness, and whether or not they are always hungry. Be sure to make examples of the data! Develop `can-handle?`, which takes in one `ppg` and one villain, and returns whether the `ppg` could handle the villain one-on-one. (You can determine for yourself, whether peaceful negotiation counts as successfully handling a villain. `:-)` Here's a new data definition to think about: ; A Townsvillite is: ; - a ppg, or ; - a villain. This is a data definition of the "or" type, which suggests functions which handle Townsvillites will have a `cond` with two branches. (This is the entire data definition -- don't create any new structures.) We already have examples of both `ppg`s and villains, so after pasting this definition, develop `is-rash?`, which takes in any Townsvillite, and determines whether they're rash. (`ppg`s are rash if they're not conflict-avoidant; Villains are rash if they are very tough, or they are always hungry. (Or, use your own criterion.)) Hint: You'll also need to use the functions `ppg?` and `villain?` -- functions also created by a `define-struct` (type predicates). They take in any type of value, and return a Boolean; compare with the built-ins `number?`, `symbol?`, `boolean?`. Go back an extend your definition of a Townsvillite to include regular citizens. Update `is-rash?`, so that it still accepts any Townsvillite and gives a reasonable asnwer.
Sample solutions.

COMP 210: Principles of Computing and Programming

Lab # 2 Fall 2003

Simple data definitions: `cond`, `struct`s

Built-in data: Booleans, symbols

Cases

Structured data

Defining the data:

Examples of the data

What's going on?

Functions using structures

Stepping back: Recipe redux

If you finish early...

COMP 210: Principles of Computing and Programming

Lab # 2 Fall 2003

Simple data definitions: cond, structs

Built-in data: Booleans, symbols

Cases

Structured data

Defining the data:

Examples of the data

What's going on?

Functions using structures

Stepping back: Recipe redux

If you finish early...

Simple data definitions: `cond`, `struct`s