A couple of motivating code examples: insertion sort, and merge sort. (We'll re-consider quicksort later.)
;;;;;;;;;;;;;;;;;;;;;;;;;;;; Insertion Sort ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; iSort: list-of-numbers --> list-of-numbers
;; Return a list with all elements of nums, in ascending order.
;; Uses an insertion-sort.
;;
(define (iSort nums)
  (cond [(empty? nums) empty]
        [(cons? nums)  (insert (first nums) (iSort (rest nums)))]))

;; insert: number, list-of-numbers --> list-of-numbers
;; Return an ascending list with the elements of already-sorted
;; and also new, inserted into the correct (ascending) place.
;;
;; Pre-condition: already-sorted must be in ascending order.
;;
(define (insert new already-sorted)
  (cond [(empty? already-sorted) (list new)]
        [(cons? already-sorted)
         (cond [(< new (first already-sorted)) (cons new already-sorted)]
               [else (cons (first already-sorted)
                           (insert new (rest already-sorted)))])]))


;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Merge Sort ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; length<=1?: list-of-number --> boolean
;; Is lon of length 0 or 1?  (Helper used by mSort.)
;;
(define (length<=1? lon)
  (or (empty? lon) (empty? (rest lon))))

;; mSort: list-of-number --> list-of-number
;; Return a list with the same elements as lon, in ascending order.
;;
(define (mSort lon)
  (cond [(length<=1? lon) lon]
        [else (local {(define two-parts  (unzip lon))
                      (define one-part   (first two-parts))
                      (define other-part (second two-parts))}
                (merge-two (mSort one-part) (mSort other-part)))]))

;; merge-two: list-of-number, list-of-number --> list-of-number
;; l1, l2 are ascending lists of numbers.
;; Return a single ascending list of the numbers of l1, l2.
;;
;; Example:
;; (merge-two (list 3 8 11) (list 1 3 4 9 29))
;; = (list 1 3 3 4 8 9 11 29)
;;
(define (merge-two l1 l2)
  (cond [(and (empty? l1) (empty? l2)) empty]
        [(and (empty? l1) (cons? l2))  l2]
        [(and (cons? l1)  (empty? l2)) l1]
        [(and (cons? l1)  (cons? l2))
         (cond [(> (first l1) (first l2))
                (cons (first l2) (merge-two l1 (rest l2)))]
               [else
                (cons (first l1) (merge-two (rest l1) l2))])]))

;; unzip: list-of-x --> (list list-of-x list-of-x)
;; Return two lists, each containing every-other element of lst
;; (in unspecified order).
;;
(define (unzip lst)
  (unzip-help lst empty empty))

;; unzip-help: list-of-x, list-of-x, list-of-x --> (list list-of-x list-of-x)
;; Return two lists, each containing every-other element of lst
;; and the elements of so-far1 (so-far2) respectively.
;;
(define (unzip-help lst so-far1 so-far2)
  (cond [(empty? lst) (list so-far1 so-far2)]
        [(cons? lst)  (unzip-help (rest lst)
                                  so-far2
                                  (cons (first lst) so-far1))]))
                       ; Fancy footwork: swap order of so-far1, so-far2.
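For readers more comfortable outside of Scheme, here is a direct Python transcription of the same two algorithms (the names `i_sort`, `m_sort`, etc. are ours, not the course code; a sketch):

```python
def insert(new, already_sorted):
    """Insert new into an ascending list, keeping it ascending."""
    if not already_sorted:
        return [new]
    if new < already_sorted[0]:
        return [new] + already_sorted
    return [already_sorted[0]] + insert(new, already_sorted[1:])

def i_sort(nums):
    """Insertion sort: insert each element into the sorted rest."""
    if not nums:
        return []
    return insert(nums[0], i_sort(nums[1:]))

def unzip(lst):
    """Split lst into two lists of every-other element."""
    return [lst[0::2], lst[1::2]]

def merge_two(l1, l2):
    """Merge two ascending lists into one ascending list."""
    if not l1:
        return l2
    if not l2:
        return l1
    if l1[0] > l2[0]:
        return [l2[0]] + merge_two(l1, l2[1:])
    return [l1[0]] + merge_two(l1[1:], l2)

def m_sort(lon):
    """Merge sort: unzip, sort each half, merge."""
    if len(lon) <= 1:
        return lon
    one_part, other_part = unzip(lon)
    return merge_two(m_sort(one_part), m_sort(other_part))
```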
In a Comp210 lab we saw empirical tests showing that these two sorts behave differently, and discussed the reasons there. Today we'll look at the tools needed to formalize these concepts.
To talk about how fast iSort is, we'll consider its running time -- a function tiSort, which takes a list and returns how long iSort runs on that input … on a 1GHz sparc20 with 512MB RAM, no other applications running besides the OS …
Suppose:
tiSort( (list 50 23) )       =  7ns
tiSort( (list 11 50 23) )    = 19ns
tiSort( (list 33 11 50 23) ) = 31ns
tiSort( (list 11 23 33 50) ) =  8ns
tiSort( (list 50 33 23 11) ) = 60ns

We can compare this to mergesort:
tmSort( (list 50 23) )       = 11ns
tmSort( (list 11 50 23) )    = 15ns
tmSort( (list 33 11 50 23) ) = 19ns
tmSort( (list 11 23 33 50) ) = 20ns
tmSort( (list 50 33 23 11) ) = 19ns

(These are fictitious but representative numbers.)
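Numbers like these can be gathered with a simple timing harness. A minimal Python sketch (the nanosecond figures above are fictitious; this only shows the shape of such an experiment, and actual results depend entirely on the machine):

```python
import time

def time_sort(sort_fn, lst, repeats=100):
    """Return the best observed wall-clock time (in seconds) of
    running sort_fn on a fresh copy of lst, over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        copy = list(lst)                  # don't let one run pre-sort the next
        start = time.perf_counter()
        sort_fn(copy)
        best = min(best, time.perf_counter() - start)
    return best

# e.g. time_sort(sorted, [50, 33, 23, 11])
```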
We want to be able to compare these two sorts, to arrive at a general conclusion. There are several glitches:
We answer each of these concerns in turn. We want general answers, so we can also analyze (say) the repeated-squaring algorithms from the homeworks, in their several versions, as compared to other algorithms for exponentiation.
Some of these approximations are definite trade-offs of accuracy vs keeping your model simple.
The exact function is very complicated [when input is a list]
Solution: Rather than look at individual lists,
look at lists of length n.
Take the …best-case? average-case? worst-case?
We'll take worst-case, with the theory that when we show the worst-case isn't so bad, we have an iron-clad guarantee. (Also, average-case is much more difficult in general.)
We extend tiSort over ℕ:

tiSort(n) = max { tiSort(l) : l ∈ ℜⁿ }
The max corresponds to worst-case.
(What would best-case be? average-case?)
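For small n we can even compute these by brute force: use every permutation of n distinct numbers as a stand-in for ℜⁿ, and measure a cost for each (here, the number of comparisons insertion sort makes; a sketch with our own helper names):

```python
from itertools import permutations

def insert_comparisons(lst):
    """Count the comparisons insertion sort makes sorting lst."""
    sorted_part = []
    count = 0
    for new in lst:
        i = 0
        while i < len(sorted_part):
            count += 1                     # one comparison per probe
            if new < sorted_part[i]:
                break
            i += 1
        sorted_part.insert(i, new)
    return count

def worst_case(n):
    """Max cost over all orderings of n distinct elements."""
    return max(insert_comparisons(list(p)) for p in permutations(range(n)))

def best_case(n):
    """Min cost over all orderings of n distinct elements."""
    return min(insert_comparisons(list(p)) for p in permutations(range(n)))
```

For insertion sort the worst case is n(n-1)/2 comparisons (already-ascending input, the way this recursion is set up) and the best case is n-1.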
Solution: We might count just the number of atomic operations made. This abstracts away OS, memory size, etc. It does require some consensus about what is an atomic operation; what takes 3 steps on one processor might take 7 on another.
Besides, this count of operations is talking about machine code (technology-dependent), not high-level source code, where we'd prefer to keep the discussion. (Consider Java, where counting JVM instructions is machine-independent; of course different Java compilers can still compile to different byte code.)
However, these two processors probably have a constant-factor conversion between them: E.g. whenever you see those 3 steps, you can always convert them into (no more than) 7 steps on the other.
Better solution: count the number of atomic operations, up to a constant factor. This constant factor could correspond to running a 30% less-efficient compiler on a 57% faster machine, or adding more cache (making all memory accesses nearly 3 times faster), etc.
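To see why counting "up to a constant factor" is robust, note that two reasonable step-counting conventions for the same algorithm track each other within a constant factor. A sketch (our own instrumented insertion sort; writing the key into place counts as a move):

```python
def insertion_sort_counts(lst):
    """Run insertion sort, counting two different notions of 'step':
    (comparisons only, comparisons + element moves)."""
    a = list(lst)
    comparisons = moves = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]                # shift one element right
            moves += 1
            j -= 1
        a[j + 1] = key                     # writing key counts as a move
        moves += 1
    return comparisons, comparisons + moves
```

Whichever convention you pick, the two counts differ only by a small constant factor, so the growth rate of the running time is the same.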
For these latter two reasons,
we'll develop a formal notion of comparing (the growth of) two functions.
[We'll be talking about functions for a while now, having
shifted the topic away from algorithms
to their running times (functions)
to their worst-case running-times on inputs of a certain size.]
Intuition: f = O(g) (pronounced ``f is big-Oh of g'') means "f ≤ g, up to a constant, ignoring small inputs"
Definition:
f ∈ O(g)
iff:
∃ c, n₀ such that ∀ n > n₀,
|f(n)| ≤ c⋅|g(n)|.
(The book uses "k" instead of "n₀"; we are usually interested in functions in ℕ⁺→ℕ⁺ rather than ℜ→ℜ. We'll often ignore the absolute values, for the same reason. And there is a fair amount of play in the details: we can replace strict inequalities with non-strict ones and vice versa, without actually changing anything.)
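The definition is all about exhibiting witnesses c and n₀. For instance, to argue 10n + 50 ∈ O(n²) we can propose c = 1, n₀ = 15 and spot-check the inequality (a sketch; checking finitely many n is evidence for a chosen witness, not a proof of the ∀):

```python
def witnesses_big_oh(f, g, c, n0, check_up_to=10_000):
    """Spot-check |f(n)| <= c*|g(n)| for all n0 < n <= check_up_to."""
    return all(abs(f(n)) <= c * abs(g(n))
               for n in range(n0 + 1, check_up_to + 1))

f = lambda n: 10 * n + 50
g = lambda n: n * n

print(witnesses_big_oh(f, g, c=1, n0=15))   # these witnesses work
print(witnesses_big_oh(f, g, c=1, n0=2))    # n0 too small: fails, e.g. at n=3
```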
Abuse alert: Although people write "f = O(g)", really O(g) is a set of functions, and we are saying ``f ∈ O(g)''. O(g) is the set of all functions which g is no-less-than … up to constant, ignoring small inputs.
We often use n to mean "the size of the input". Be careful:
Consider the running time of isPrime(n). One implementation may do √n divisions. Is this a sub-linear algorithm? In general, it's considered a very slow-running algorithm — I can easily write down an input that isn't computationally feasible: say, a 30-digit number. While exact integers on the order of 10³⁰ don't arise in counting problems, they do arise routinely in cryptography (e.g. secure web transactions).
The upshot is, sometimes we want to say that, for numeric problems that take an input n, the size of the input is not n itself, but the number of digits (bits) needed to represent n. So the size of the input might be log(n), and a running time of

    √n = √(2^log₂(n)) = 2^(log₂(n)/2)

is exponential in the size of the input!
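Concretely: for trial-division primality testing, the ~√n divisions roughly double every time the input gains two bits. A sketch (the step-counting function is ours, modeling one division per candidate divisor d = 2, …, ⌊√n⌋):

```python
import math

def trial_division_steps(n):
    """Count the divisions a naive isPrime(n) performs:
    one per candidate divisor d = 2, 3, ..., floor(sqrt(n))."""
    return max(0, math.isqrt(n) - 1)

for bits in (10, 20, 30, 40):
    n = 2 ** bits
    print(bits, trial_division_steps(n))
# the step count grows like 2**(bits/2): exponential in the number of bits
```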
After centuries of work, only in 2002 was a true polynomial-time algorithm (the AKS test) found for determining whether a number is prime.
≤ is to O(•)
as
≥ is to Ω(•)
as
= is to Θ(•)
That is,
these are (resp.) useful for expressing
upper bound,
lower bound,
and
tight bounds
(with the usual caveats:
up to a constant factor, ignoring small inputs).
Th'm:
the running time of any
comparison-based¹
sort is Ω(n log n).
This is a very strong result — you can't be substantially
cleverer than mergesort!
We will show this later, after covering a bit of
counting and the pigeon-hole principle.
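Meanwhile, we can at least see the n·log(n) behaviour empirically, by counting the comparisons merge sort actually makes against n·log₂(n). A sketch (our own instrumented version of mSort, tallying comparisons in a one-element list so the recursion can update it):

```python
import math
import random

def m_sort_counted(lon, counter):
    """Merge sort that tallies comparisons in counter[0]."""
    if len(lon) <= 1:
        return lon
    left = m_sort_counted(lon[0::2], counter)    # unzip: every-other element
    right = m_sort_counted(lon[1::2], counter)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counter[0] += 1                          # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

random.seed(0)
for n in (16, 256, 4096):
    counter = [0]
    m_sort_counted([random.random() for _ in range(n)], counter)
    print(n, counter[0], round(n * math.log2(n)))
# the comparison counts stay within a small constant factor of n*log2(n)
```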
Less common are little-oh (omicron, ο) and little-omega; these are the companions of strictly < and >, resp.
Note that ``f ∈ O(g)'' is close to saying that ∃ c such that the limit as n→∞ of f(n)/g(n) is < c. It's not exactly equivalent, for two technicalities:
Big-Oh and Big-Theta are also good for expressing error estimates,
in a formal way:
For example, Stirling's approximation says that

    n! ≈ √(2πn)⋅(n/e)ⁿ

But what if we are working with factorials and want to be sure we have an upper bound? A more useful statement of Stirling's approximation is

    n! = √(2πn)⋅(n/e)ⁿ ⋅ (1 + Θ(1/n))

or even

    n! = √(2πn)⋅(n/e)ⁿ ⋅ (1 + 1/(12n) + Θ(1/n²))
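A quick numeric spot-check of those error terms, using Python's exact factorials (a sketch; the function names are ours):

```python
import math

def stirling(n):
    """The basic Stirling approximation to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

def stirling_refined(n):
    """Stirling with the first correction term, 1 + 1/(12n)."""
    return stirling(n) * (1 + 1 / (12 * n))

for n in (5, 10, 20):
    exact = math.factorial(n)
    print(n,
          round(exact / stirling(n), 6),           # about 1 + 1/(12n)
          round(exact / stirling_refined(n), 8))   # much closer to 1
```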
Of note:
¹
At least, not without doing more than just comparisons between the
objects you're sorting.
But often there is a further bit of information:
Suppose you are sorting N exams by score.
If scores are always integers in [0,100],
then you can set up 101 bins, and sort all N exams in a single pass.
(``Bin sort'').
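A sketch of that bin sort for scores in [0, 100] (one pass to drop each exam into its bin, one pass to read the bins out in order; the pair representation is our own choice):

```python
def bin_sort_scores(exams):
    """Sort (name, score) pairs by score, assuming 0 <= score <= 100.
    Stable: exams with equal scores keep their original order."""
    bins = [[] for _ in range(101)]        # one bin per possible score
    for exam in exams:
        name, score = exam
        bins[score].append(exam)
    return [exam for b in bins for exam in b]

exams = [("ann", 93), ("bob", 71), ("carl", 93), ("dee", 55)]
# dee comes first, then bob, then ann and carl (stable within a bin)
```

Note this makes essential use of the extra information (scores are small integers), which is exactly what the Ω(n log n) lower bound for comparison-based sorts does not allow.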