Comp 210 Lab 11: Bucket Size: The Whole Truth

We glossed over one detail when we said that C, when looking up an array element, blindly goes to the bucket where the vector starts and then moves over as many buckets as the array index.

Let's consider the following, where T stands for some type (int, char, float, whatever).

  T vec[100];
  int i;
  ... 
  vec[ i ];

So vec[0], ..., vec[99] are stored sequentially in memory. And, as we just saw, C actually just creates a placeholder vec which is the address where the vector elements are stored. (Note that vec is not of type T; in most expressions it instead behaves as a "pointer to T", written T*.)

Now here's the lie: Sure enough, vec[0] is stored in the bucket pointed to by vec. Let's again suppose this happens to be bucket 3000. But where is vec[1]? Bucket 3001 sounds reasonable, except that whatever type T is, it may not fit into a single bucket! In particular, while a char happens to fit into a single bucket, an int needs four buckets! (Well, see below for nothing but the truth.) So the vector vec doesn't need 100 buckets, but instead needs 100 * sizeof(T), where sizeof(T) is the number of buckets needed to hold a T. So where is (the first bucket of) vec[1] located in memory? Yes, at 3000 + 1*sizeof(T). Similarly, vec[17] begins at address 3000 + 17*sizeof(T), that is, 17*sizeof(T) buckets past the start of the vector.

C does all this for you when you use []. Even if you were to write *(vec + 17), C would include the factor of sizeof(T), which it knows how to do since you declared vec to hold elements of type T.

Do not do pointer arithmetic yourself. If you want to refer to vec[17], then write that, even though yes yes yes C lets you get away with writing raw pointer arithmetic as shown above. Doing the pointer arithmetic yourself opens the door to trouble and makes the code less readable. Nor are you likely to gain any significant run-time efficiency: g++ already calculates addresses efficiently.

One final note, on the size of int: saying that a single int occupies four buckets is true on the Sparc machines in Ryon. But it might vary on different machines, and might change over time. Ten years ago, int fit in two bytes--er, buckets. The exact size of int, etc., is not fixed by the official C language; the standard only sets minimums.

