COMP 210 Lab 9: Big-Oh

For the curious... A program P runs in time (or uses space) O(f(n)) for some function f, if the running time (space usage) on inputs of size n, for sufficiently large n, is bounded by C*f(n) for some constant C. The "sufficiently large n" qualification provides exceptions for values of n below some threshold N. For example, say that we define the size of a list as the length of the list and we have written a linear version of the reverse function. The reverse function on the empty list runs very fast, but it does not run in zero time, which is what our definition would require if we did not allow a finite number of exceptions.
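
For concreteness, here is one way such a linear-time reverse might be written in Scheme; the names my-reverse and rev-acc are just for illustration, not necessarily the version you wrote in class.

    ;; my-reverse : list -> list
    ;; Reverses lst in time O(n), where n is the length of the list:
    ;; each element is examined once and consed onto the accumulator once.
    (define (my-reverse lst)
      (define (rev-acc lst acc)
        (if (null? lst)
            acc                                       ; empty list: only a constant amount of work remains
            (rev-acc (cdr lst) (cons (car lst) acc))))
      (rev-acc lst '()))

For example, (my-reverse '(1 2 3)) evaluates to (3 2 1) after three recursive steps plus a final check on the empty list.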

Obviously any program that runs in time O(n) also technically runs in time O(n^2) and O(2^n). By convention, we use the best (smallest) possible function f when stating asymptotic complexity. If a program runs in linear time (O(n)), it is misleading to say that it runs in quadratic time (O(n^2)).

Why Ignore Constant Factors?

We are concerned with general behavior (running time, in this case) of an algorithm. We'd like to say things about the algorithm itself, as separate from the algorithm when run on a Sparc5 with 8MB of cache blahblahblah, and separate from the algorithm when run on an original MacPlus. Making such statements would be too specific and too short-lived to be of lasting use or interest.

In our analysis, running time is measured in "number of steps", which is only proportional to the observable time taken. (How long a particular step takes depends on the speed of a computer, its instruction set, etc.) If we're not talking about running the algorithm on one particular machine and one particular configuration, then the best we can do is to calculate running time to within a constant factor.
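
To make this concrete, consider the linear reverse sketched above. If we count each call to the helper and each cons as one step, reversing a list of length n takes about 2n + 1 steps; a different way of counting steps would give a different constant, which is exactly why we only claim the result up to a constant factor. One machine might spend 50 nanoseconds per step and another 500, but on both the time grows in proportion to n.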

This lets us get away with ignoring the base of the logarithm, since switching bases just changes the result by a constant factor.
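
For example, by the change-of-base rule, log_b n = (log_a n) / (log_a b); in particular log_2 n = (1/ln 2) * ln n, about 1.44 * ln n. So O(log_2 n), O(log_10 n), and O(ln n) all describe the same set of functions.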

It is certainly arguable how useful it is to ignore constant factors; but in any case, knowing that one algorithm runs in time O(n log n) and another in time O(n^2) is good evidence that for inputs of any reasonable size, you'll probably want the faster algorithm rather than the faster computer.
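
To put some numbers on it: for n = 1,000,000, n log_2 n is about 2 * 10^7 while n^2 is 10^12. Even if the quadratic algorithm were given a computer 1,000 times faster, it would still need the equivalent of 10^9 steps on the slower machine, roughly 50 times more than the O(n log n) algorithm needs there (treating the hidden constant factors as equal, which is of course a simplification).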

Abuse of Notation

Technically, O(g(n)) is a set: the set of all functions which grow no faster than g (up to a constant factor, ignoring small values of n). Thus, proper usage is "f(n) is an element of O(g(n))".
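
Written out using the definition at the top of this page: O(g(n)) is the set of all functions f for which there exist constants C and N such that f(n) <= C*g(n) for all n >= N.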

However, in common usage, people usually talk as if O(g(n)) were a single function, saying "f(n) is big-Oh of g(n)", and writing "f(n) = O(g(n))". It is okay to do this, as long as you are aware that you are abusing notation.

While speaking of ambiguous notation, notice that calling n^2 a function is a bit insidious. Really, the function is understood to be lambda (n) n^2. The latter you can apply to any number you like (even if you have an unrelated placeholder n floating around); the former apparently depends on the one particular value n has at the time you use the function. Similarly, instead of saying f(n) is a function, really f is the function (or, long-windedly, lambda (n) f(n)); that's just a long name for f.
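
In Scheme terms the distinction is easy to see (square is just an illustrative name):

    (define square (lambda (n) (* n n)))   ; square is the function itself
    (square 7)                             ; => 49: the function evaluated at the point 7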

Always remember, there is a difference between a function and a function evaluated at a particular point.

All this will be (re)explained in more detail in COMP 212 and COMP 280.