Comp 210 Lab 12: C Pitfalls

Table of Contents

  • Cotchas (C Gotchas)
  • Segmentation Faults and Bus Errors
  • Other Selected Error Messages

    This lab discusses some common C pitfalls. You can also browse the C FAQ from comp.lang.c. Disclaimer: some of the constructs advised against here are occasionally justifiable. Our warnings are only guidelines, but don't ignore them unless you know what you're doing.

    Cotchas (C Gotchas)

  • C is int-ocentric.
    C considers all sorts of different things as interchangeable with int.
    Most importantly, it conflates int with bool. If it sees an integer in a boolean context, it considers 0 as false and 1 (or anything else) as true. Don't partake of this confusion -- use int or bool depending on what you mean. Just because C lets you do something doesn't mean that you should. We'll see some examples below.
    (Deep down, C also considers char to be the same as int, considering 'a' the same as 97 (on ASCII-based machines). This is non-portable; you should avoid exploiting C's confusion if avoidable.)

  • = vs ==
    x == 14 is a boolean expression returning true or false. x = 14, pronounced "x gets 14", does two distinct things: as an operator, it assigns 14 to the placeholder x. Moreover, as an expression, this whole thing has the value 14. Usually this value is thrown away, as in x = 14;. However it is admittedly convenient to be able to sometimes initialize several variables at once: x = (y = (z = 0));. The parentheses are optional.

    However, in if (x=0) y=z;, the value of x=0 (which is always 0) is now mis-interpreted as a boolean value, as mentioned above. So the body of the if statement will never ever be executed. When (not if) you do this, you'll hopefully get a compiler warning, something about assignment in parenthesized context.

    Some people prefer the habit of writing if (0 == x) ... when doing comparisons, so that mistyping 'gets' instead of 'equals' will cause a compile-time error, and the mistake discovered sooner rather than later.

  • x==y==7; 0 <= x <= 10
    if (x == y == 7) ... is tempting to write, but wrong. It is never true. Why? (Hint: it is treated as if ((x == y) == 7) ....) Similarly, if (0 <= x <= 10) ... is always true, since 0 <= x is either true or false, that is either 1 or 0, which is always less than 10. Arrgh!

  • 1/2 vs 1.0/2.0
    When doing division, if both arguments are int then the result is an int, using truncation.
    int a = 2, b = 3;
    float x = 2.0, y = 3.0;
    
    Thus a/b is zero. On the other hand, if the arguments are floats, then x/y is 0.666666. What about a/y? As a general rule, if C encounters a mixed-type expression, it converts everything to the most general (biggest) type. Since all ints can be (closely approximated by) floats but not vice-versa, float is more general, and a/y (or 2/3.0) becomes ((float)a)/y (or 2.0/3.0) and is 0.666666.

  • Direction of rounding negative numbers is implementation-specific.
    The direction of truncation wasn't specified for negative numbers in older (C89) compilers: -2/3 might result in 0 or -1 on different implementations. Similarly, the results of the mod operator % were not specified for negative operands, and -2 % 3 may vary from system to system. (C99 and later require truncation toward zero, so there -2/3 is 0 and -2 % 3 is -2.) (However, it is required that (a/b)*b + a%b == a always holds when b != 0.) If your code depends on the direction of truncation for negative values, look up and use ceil(), floor(), and fmod() from math.h, and div() from stdlib.h.

  • Precedence
    We all know that 2+3*4 entails doing the multiplication before the addition. But in (! a || b), does the negation refer to only a, or to all of a || b? (Recall that || is C-ese for "or".) The answer is: if you don't know, don't bother looking it up, instead just use parentheses to disambiguate. Unlike Scheme, extra parentheses don't hurt, and can avoid the introduction of hard-to-detect bugs. (If you do happen to know the precedence of ! versus ||, don't be ashamed to use the knowledge, but I can't recommend cluttering your brain with such details. A glance at page 181 of Harbison and Steele reveals 50 types of tokens occupying 29 levels of precedence.)

  • Short-Circuits
    This is an un-pitfall: As in Scheme, boolean expressions &&, || are evaluated left-to-right only until the answer is determined. Thus in the code if ((n != 0) && (sum/n > previousAvg)) ... , the division by n is safe because it won't be zero.

    (The two inside sets of parens are probably not needed, but what unexpected snags could arise if you leave them off and the precedence of the operators isn't what you expect?)

  • Comma
    The comma operator has a specific meaning in C; it's like Scheme's begin:
        expr1,expr2;
    
    means evaluate expr1, then expr2 (in order), and the whole thing has the value of expr2.

    The only problem is that sometimes the comma is used where it shouldn't be: in a for-loop header (where the three parts must be separated by semicolons), or as an attempt at two-dimensional arrays. Indexing with board[i,j] happily compiles, but it means board[j]. Depending on the compiler, the declaration int board[19,25] is either rejected or quietly makes a one-dimensional array of size 25. (Presumably board[i][j] and int board[19][25] were intended.)

  • char *s = "hello"; s[2] = 'o';
    This may well cause a segmentation fault, since the address where "hello" is stored may not be writable. If you want to use s as an array of characters, char s[] = "hello"; will work because this is array initialization. Or, allocate space for s with malloc() and use strcpy(); see a C book for details of malloc().

  • Use %s to print strings with printf
    char buff[BIG_ENOUGH_CONSTANT];
    ... // buff gets some message to be printed
    printf(buff);
    
    This is fine and dandy until you (or the user, via a scanf) do something like:
    buff[i] = '%';
    buff[i+1] = 'd';
    
    Now the call to printf will have the wrong number of arguments and bad things will happen. Therefore it's recommended to use the control codes uniformly, to print strings:
    printf("%s", buff);
    

  • Declare-Before-Use
    In C, you must always declare a placeholder before using it. If f() calls g() and vice-versa, this puts you in a bit of a pickle. But fortunately, you can declare a function without giving the code for it.
    int f( int n );   // this is a declaration of f()
    
    int g( int m ) {  // this is both the declaration and definition of g()
      return f(m-3);  // It's now legal to mention f().
      }
    
    int f( int n ) {  // f() was previously declared; now it's defined
      if (n < 0)
        return n;
      else
       return g(n);
      }
    

  • Only Declare Things Once
    In extended C, it happens that you can declare new local variables not just at the beginning of the function, but also at the start of any block (curly-braces). g++ won't complain if you define a new variable which shadows an existing variable:
      int x = 10;
      while (x >= 0) {
        int x = x - 1;     // this declares a different x!
        cout << x << " ";
      }
    
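    This code is an infinite loop: the while test always looks at the outer x, which stays 10. Inside the braces, the x on the right-hand side is already the new x, which has no value yet, so the numbers printed depend on whatever garbage it starts with (interestingly enough, 3,2,1,0,-1,-2,-3,... is one possibility). The problem is that beginners are tempted to put the word "int" before every line that assigns to a variable; the compiler reads this as a declaration of a new variable.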

  • Array Indexing
    I've said it before, I'll say it again: If an array is declared to be size big, it's up to you to make sure you only access elements 0..size-1. This becomes more subtle when you have a function which takes 'an array of unspecified size', like void foo( int arr[] ). You should either pass the size of the array as an additional parameter, or otherwise take precautions.

  • C Ignores Indentation
    You should always indent to show your intention: line B should be indented beneath the preceding line A if and only if B is logically inside of the statement of A. You clarify your intention not only to yourself, but others reading your code. (Fortunately, emacs indents things for you automatically if you are in its C or C++ or Scheme or Java mode.)

    That said, the C compiler itself doesn't give one whit about indentation. Recall the syntax for the if statement:

      if (condition)
        action
    
    where action is either a single statement followed by a semicolon, or a block statement with curly braces (not followed by a semicolon).
    int main() {
      //...
      if (x == y)
        y = y + x;
        printf( "I just increased the value of y.\n" );
      printf( "Continuing onward...\n" );
      //...
      }
    
    Here, because there are no braces around the body of the if, the first printf is not inside the if, whatever its indentation suggests: it runs whether or not x == y. This type of logic error can be very difficult to spot. If emacs tries to indent a line in a funny way, stop and figure out why.

    Some people swear by always including braces around the body of an if, for, or while, even when it is only a single statement (and the braces are thus not strictly necessary). This avoids the above error, and also lets you add or remove statements from bodies without worrying about updating the status of the braces. (This also makes it easier to yank-and-paste code, when necessary.)

  • Empty for Bodies
    Related to the previous problem, a superfluous semicolon can be disastrous. Why is the following an infinite loop?
      int i = 1;
      while (i < 100);
        i = 2*i;
      printf( "The first power of two bigger than 100 is %d.\n", i );
    
    In the situation where you really do want to have an empty loop body, indent the null statement properly:
      for ( int i = 1;  i < 100; i = 2*i )   // start at 1: doubling 0 would never reach 100
        ;  // Look at that null statement there!
      printf( "blahblahblah\n" );
    

  • Dangling elses
    In C, if-then and if-then-else statements are different, but their similarity can cause confusion when they are intermingled.
    char x;
    if (condition1)
        if (condition1a)
            x = 'a';
      else             // where to indent
       x = '?';        //   this?
    
    The question is, does x = '?' happen when (condition1 && !condition1a), or does it happen whenever (! condition1)? As it happens, the rule is "match an else with the nearest preceding unmatched if". So the answer is ... .
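
Whichever way the compiler resolves it, braces let you state the meaning you want. A minimal sketch of the two readings; the function names and the '-' placeholder value are invented for illustration.

```c
#include <stdio.h>

/* Reading 1: the else belongs to the inner if. */
char else_for_inner(int condition1, int condition1a) {
    char x = '-';              /* placeholder meaning "x was never assigned" */
    if (condition1) {
        if (condition1a) {
            x = 'a';
        } else {
            x = '?';
        }
    }
    return x;
}

/* Reading 2: the else belongs to the outer if. */
char else_for_outer(int condition1, int condition1a) {
    char x = '-';
    if (condition1) {
        if (condition1a) {
            x = 'a';
        }
    } else {
        x = '?';
    }
    return x;
}

int main(void) {
    printf("%c %c\n", else_for_inner(1, 0), else_for_outer(1, 0));   /* prints "? -" */
    printf("%c %c\n", else_for_inner(0, 0), else_for_outer(0, 0));   /* prints "- ?" */
    return 0;
}
```
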
  • Use & with scanf.
    C's function scanf can be used similarly to Scheme's read. It is analogous to printf, except that you must put an ampersand & in front of the variables you want to read values into.
      int n;
      printf( "Enter a value for n: " );
      scanf( "%d", &n );
    
    If you think about it, scanf( "%d", n ) would definitely be funny: we don't want to pass the value of n to scanf; why, it doesn't even have a value yet! If you really want to know, &n means the address-of-n, which scanf can use so it knows what bucket to put its answer into. You don't have to know this, or how to use scanf, but that's the story if you ever see somebody else use it.
  • Exercise
    /*
     * Copy this program, which can be found in
     * ~comp210/Labs/lab12/wrongo.c,
     * and compile it to see how well the compiler likes it.
     * The file has a number of (hopefully) fairly obvious mistakes,
     * as well as some hideous indentation.
     * Modify it, so that it shines.
     */
    
    #include <stddef.h>     // get declaration of NULL
    #include <stdio.h>      // get declaration of printf
    int main() {
      double x;
        printf( "Enter a positive x, and I'll calculate how many doublings\n" );
        printf( "of 1 it takes until x is met or exceeded: " );
        scanf( "%lf", &x );
      
    int n = 1,doublings=0;
      for ( bool notDoneYet = x > 0; notDoneYet = true; notDoneYet == n >= x );
        {
          n == n * 2;
          doublings++;
        }
    
    printf( "ceil(log_2(%g)) = %d.\n", x, doublings );
    return 0;
      }
    
    (The program correctly uses scanf to read a double.) When you're done polishing, you may want to compare with righto.c.

    Pointers and Seg Faults and Bus Errors, Oh My!

    When you compile and run your program, the most dreaded error is "Segmentation fault". (At least that's what gets printed in UNIX; under MacOS and Windows it is pronounced "application unexpectedly quit" or "bomb icon" or "system frozen".) This usually indicates that your program is trying to read or write to an illegal memory location (either because your program doesn't have permission to access that part of memory (UNIX), or that location doesn't actually exist (UNIX or PC-OS)).

    When I see a segmentation fault, my first suspicions are that my code is

  • trying to access beyond array bounds, or is
  • trying to access through an uninitialized (or null) pointer.

    The first, we've warned you about enough times. Here are some examples of the second, uninitialized/null pointers:
      char str1[200] = "hi";         // This is fine.
      char *str2;                    // str2 has no value yet; okay but be cautious.
      strcpy( str2, "Katasrophe" );  // WRONG: we're copying into str2, which
                                     //   could be pointing at anywhere in memory.
                                     //   An earlier "str2 = str1" would have saved
                                     //   us, by having str2 and str1 both point
                                     //   at a (valid) block of 200*sizeof(char) bytes.
    
    Here's another example, which uses a structure as briefly talked about in one lecture:
      #include <stddef.h>            // Get declaration of NULL.
      #include <stdio.h>             // Get declaration of printf.
    
      struct intCons {             // Declare a type "struct intCons"
        int car;                   //   with a car and a cdr.
        struct intCons *cdr;
        };                         // The semicolon after a struct declaration is required.
        
      intCons* l2 = new intCons;
      l2->car = 5;
      l2->cdr = NULL;
      intCons* l1;
    
      printf( "%d", l2->car );        // Fine, prints 5.
      l1 = l2->cdr;                   // Fine, l1 gets NULL.
      if (l2->cdr->cdr == NULL)       // WRONG: l2->cdr is NULL, so 'the thing
                                      // pointed to by l2->cdr' makes no sense.
        printf( "l2 has only one item.\n" );
                                      // Compare with also-incorrect Scheme:
                                      //   (if (null? (cdr (cdr l2))) ...)
    
    A Bus Error is (approximately) as frightening as a Segmentation Fault. If you see "Bus Error", look for the same two types of problems as might cause a seg fault.

    (Advanced:) Here is one way a bus error can occur, for those who care: On some machines, when the women in the controller ask for the contents of a memory bucket/byte, they actually get several (perhaps 4) buckets, a "word". All good and fine, except that when requesting a memory location, the address requested must be a multiple of 4 ("on a word boundary"). That is, not every memory location (bucket) can be fetched individually (the memory isn't "byte-addressable"). A bus error occurs if the address requested isn't a multiple of 4.

    Now usually the C compiler takes care of this for you (after all, this word size, 4, varies from machine to machine, being 1 in the trivial case). However, if you're accessing through a garbage pointer, then it may not be pointing at a word boundary, and the memory doesn't know what values to ship to the women via the data bus.

    Common C Error Messages

    Although you are using (extended) C, this list of common C++ errors also includes errors which you'll encounter from the C compiler. (Just ignore the C++-specific items, which talk about "class", "constructor", "object", "instance", or "method".)