Lab 12
Arrays

Spring 1996

Simulating variable-length arrays
- Strings
Exercises
- String manipulation
- Sorting via generative recursion

Today's lab doesn't introduce any deep new concepts, but does provide more practice with arrays (including non-integer arrays), C++, and recursion on non-recursively-defined data structures (such recursion is called generative recursion). By now you should have read all of the C Primer except for the sections on pointers and dynamic memory allocation (and it wouldn't hurt to read those, too, so you're prepared for the next week of lecture).

Simulating variable-length arrays

It is very easy to modify the length of a list; more precisely, we have simple, efficient operations for extending a list with one additional element at the front (cons) and for specifying the list consisting of a list sans its first element (cdr). Scheme vectors and C arrays, on the other hand, are not so convenient: they have one particular size, specified when they are created and set for all time. (Scheme lets us supply an arbitrary expression to make-vector, so we can create vectors of whatever size we like, even if we don't know the exact size (only an expression computing it) when writing a program; but so far we don't know how to do that in C, so we must specify a particular constant size for C arrays when we write a program.)

This inflexibility is inconvenient; we don't want to be forced to allocate an entire new vector or array (and copy the contents of the old one to the new one) in order to add or remove a single element. Therefore, we frequently simulate variable-length arrays by allocating a large array and using only part of it. At different times different amounts of the array are used.

There are two basic techniques for specifying how much of an array is being used: storing the length and using a sentinel value. So far we have stored the array length; when we pass an array to a function, we have also pass in its length, so that the function knows how much of the array to use. Sometimes the length of the array is stored in the array, as its first element. This works in Scheme, and in C when the array is an array of integers, but not when the array contains some other data type (in C, every element of an array must have the same type).

A second technique is to use a sentinel value, which is a value that we know never appears in the data; when we encounter that value while processing the array, we know we have already examined every interesting value. For instance, the following code processes every element of an array of integers which uses the value -22 as a sentinel.

int index = 0;
while (array[index] != -22)
{
  // ... process array[index] ...
  index = index + 1;
}

A disadvantage of this technique is that it the array cannot hold arbitrary values--the sentinel value must not appear in the data, lest it be misinterpreted as indicating the end of the array. When we know something about the values (for instance, they are always positive, so -22 definitely won't appear), this technique has the advantage of not forcing extra parameters to be passed around and (more problematically) returned.

Strings

C represents strings as character arrays delimited by the '\0' character. Strings are not just character arrays--they must contain the null character ('\0', not to be confused with the null pointer, NULL) somewhere, indicating the end of the data. An array of 22 characters can only contain a 21-character string, because the last (21st) position is occupied by '\0'. Leaving off the terminating character is a serious error, because C will continue processing memory looking for that character; it will eventually run off the end of the array and begin processing other data as if they were part of the string. This can lead to mysterious and very difficult to debug errors.

Exercises

String manipulation

Write a C program which reads a word of 99 or fewer characters and outputs its length. For instance, if you called the program lab12, an execution of it would look like this (where % is the Unix prompt):

    % lab12
    Type a word: sequoia
    The word sequoia has 7 letters.
    %

Do not use the built-in function strlen; write your own version so that you get practice manipulating strings. You can read in a word (note spaces are prohibited) by doing this:

    char word[100];
    cin >> word;

Note the duality of << with >>. This form reads only a single word, stopping at the first whitespace. (The cin.getline function reads an entire line of text, but we won't bother to use it here.) If you finish quickly, modify your program to also output the number of vowels in the word.

Write your own version of the strcat function, which concatenates two strings. When you make the call, be sure that one of the strings is long enough to hold the contents of both!

Sorting via generative recursion

Write a C program to sort the characters of a word. (You may wish to write your program in Scheme and then translate it to C.) Use the insertion sort algorithm, which can be recursively defined:

If the array has length 1 or less, it is already sorted. Otherwise, let n be the length of the array. Sort the first n-1 elements, then insert the nth element in its proper place, and you are done.

To insert in an m-element sorted (sub)array an element e which follows the end of the array (i.e., it appears in array element m+1), compare e to the last element of the sorted range (the mth element). If m=0 or e is greater than the last sorted element, then all m+1 elements are already in sorted order. Otherwise, swap e with the mth element, because that last sorted element certainly follows e in sorted order. Now recursively insert e in the sorted array of m-1 elements that precedes it.

You may wish to split the problem into two functions (vectorinsert and sort), each containing a single while loop or other iteration construct.

We already know that for inductively-defined data such as lists and trees, recursion is the technique of choice. The function schema mirrors the data definition and it is easy to write functions that are correct and don't omit important cases. We call such recursion natural recursion. Here we see that recursion can be applied to non-recursively defined data; then we call it generative recursion. Recursion (and wishful thinking, which helps you write recursive procedures) is a powerful technique that you will encounter again and again, even when it does not at first appear to be directly applicable to your problems.