Lab 13
Lists in C

Comp 210

Spring 1996

Contents

At this point, you should have read the entire C Primer. If you haven't, you are likely to have trouble understanding this material; it's a waste of your time and ours for you to ask lots of questions that are answered in the handout.

Equivalence of arrays and pointers

In lecture (and in the C Primer) it is noted that arrays and pointers are equivalent; thus, the expressions a[7] and *(a+7) mean the same thing in C. While this is true, you should never use the latter expression. The former is clearer, easier to read and write, and no faster (though old C compilers sometimes generated more efficient code for the pointer version, leading a generation of C programmers to use the obfuscated version simply in order to get a little more performance). You should understand that C implements arrays in terms of pointers, but don't use that knowledge when you write programs.

Similarly, the & address-of operator was introduced in class. Don't use it; it is very dangerous and its only real use is to produce blackboard examples before you know about new (which you have read about in the C Primer by now).

The null pointer

When we create a new pointer, it doesn't point anywhere in particular; it could have any value, which could refer to any address in the machine (including ones that don't belong to us), so it's dangerous and wrong to dereference it. However, C can't distinguish between such nonsense values and pointers to real objects, so you have to keep track of which is which yourself.

Sometimes you want a pointer not to point anywhere in particular; use the pointer value NULL (sometimes called the null pointer) for that purpose. You may never dereference NULL, but its advantage is that it can be distinguished from real pointers by means of the test (ptr == NULL) which is true for the null pointer and false otherwise. If you have a pointer that might or might not be NULL, then you should test it before using it; if it's NULL, then there is nothing to do.

Structures

Like Scheme, C can create new types by combining primitive data types. A collection of one or more variables, possibly of different types, is a structure. For instance, the following defines a new type called point which is suitable for representing points in the plane.

struct point
{
  int x;
  int y;
};
Now point is a type just like int, float, etc., and we can define variables of that type. Be sure to fill in the parts of any points you create. Structure slots (sometimes called members) are referred to by the syntax structure.slotname.
point origin;
origin.x = 0;
origin.y = 0;
It is also possible to initialize an instance (only at the time of its definition) by enclosing the slot values in curly braces:
point pt = {100, 200};
point pt2;
pt2 = {100,200};  // ILLEGAL
Now pt.x and pt.y have values 100 and 200 respectively.

Recursive structures

Now we will define lists in C. Recall the data definition for a list of integers:

intlist :=   null
           | (cons int intlist)
Each cons cell has two parts: its car is an integer and its cdr is a list. We might think of defining a structure to represent cons cells as follows:
struct conscell
{
  int car;
  conscell cdr;     // WRONG
};

This definition has the problem that it never stops: the cdr field of every conscell is another conscell. A conscell is as large as an integer plus another conscell: in other words, a conscell is infinitely large. The difficulty is that we have used only one alternative from the data definition:

intlist := (cons int intlist)

We can use pointers to solve our problems: a pointer has a definite size, and a pointer can either point to more data or to no more data (if it is NULL). Thus, the following definition:

struct conscell
{
  int car;
  conscell * cdr;    // RIGHT
};

We represent a list not as a conscell, but as a pointer to a conscell. (That's important; read it again.) Thus, the second element of conscell is a list (i.e., its type is conscell *, just as the data definition states. The reason we don't represent a list as a conscell is that such a representation doesn't admit a reasonable way to specify the empty list; but when pointers are used, the value NULL can be used as a conscell *, in which case it stands for the empty list.

Given a conscell k, we can extract k's elements via k.car and k.cdr; their types are int and conscell *. We can get the element after k.car via (*(k.cdr)).car. We can't do just k.cdr.car because k.car is not a conscell but a pointer to a conscell, so we can't use the . operator--which is applicable only to structures--on it. Since the action of dereferencing then extracting a structure field is common, C provides a shorthand for it: a->b is equivalent to (*a).b. Thus, (k.cdr->car is equivalent to (*(k.cdr)).car.

Dynamic memory allocation

In Scheme, in order to build a 22-element list, we must call cons 22 times. Likewise, in C we must create 22 conscell objects in order to make such a list. (Scheme's list function provides a convenient shorthand, but cons is still being called that many times. C doesn't have such a shorthand, and can't be expected to since you just defined the conscell type.)

We have two ways to define a new conscell. The first is to define a variable of that type:

{
  conscell c;
  ...
}
The problem with this approach is that when the scope of the variable c is exited, then variable c goes away, as does its value. Our new conscell doesn't survive the function call or block in which it was created.

To construct objects that stick around forever, use the new operator. Followed by a type, it forms an expression whose result type is a pointer to that type. For instance, new int is an expression which allocates space for an integer and returns a pointer to that storage. Similarly, new conscell creates a new conscell and returns a pointer to it. You should not use new as a statement:

  new conscell;      // WRONG, pointer is discarded
because while this creates a new conscell, the pointer is thrown away. It's as if you had turned some other expression into a statement:
  7 + 15;            // WRONG, result is discarded
You should use the value of the expression; the most typical thing to do is assign it to a variable (of the right type)
  conscell * ccp = new conscell;
and then refer to the memory through the pointer. When the scope of ccp is exited, the variable and its value are eliminated. That value is just a pointer and it was stored elsewhere, you can still manipulate the new memory through it. For instance, here is one way to write cons in C:
conscell * cons(int newcar, conscell * newcdr)
{
  conscell * result = new conscell;
  result->car = newcar;
  result->cdr = newcdr;
  return result;
}
Notice that this function takes a list as its second argument and returns a list, as expected.

Exercises

  1. Build a four-element list of integers. Either use cons or call new directly.
  2. Write a function that computes the length of a list.
  3. Write a function that computes the sum of the elements of a list.
  4. Write a function that prints each element of a list, all enclosed in parentheses.
  5. Write a function read_list that repeatedly prompts the user for a positive integer and adds it to the front of a list using cons, terminating (and returning the list) when the user enters 0. Or, your function could read a length, then that many integers.