[an error occurred while processing this directive]

Quantifying algorithms

Comparing insertion sort and mergesort

A couple of motivating code examples: insertion sort, and merge sort. (We'll re-consider quicksort later.)

;;;;;;;;;;;;;;;;;;;;;;;;;;;; Insertion Sort ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; iSort: list-of-numbers --> list-of-numbers
;; Return a list with all elements of nums, in ascending order.
;; Uses an insertion-sort.
;;
(define (iSort nums)
  (cond [(empty? nums) empty]
        [(cons? nums)  (insert (first nums) (iSort (rest nums)))]))

;; insert: number, list-of-numbers --> list-of-numbers
;; Return an ascending list with the elements of already-sorted
;;   and also new, inserted into the correct (ascending) place.
;;
;; Pre-condition: Already-sorted must be in ascending order.
;; 
(define (insert new already-sorted)
  (cond [(empty? already-sorted) (list new)]
        [(cons? already-sorted) 
         (cond [(< new (first already-sorted)) (cons new already-sorted)]
               [else (cons (first already-sorted)
                           (insert new (rest already-sorted)))])]))



                                           
;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Merge Sort ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; mSort: list-of-number --> list-of-number
;; Return a list with the same elements of lon, in ascending order.
;;
(define (mSort lon)
    (cond [(length<=1? lon) lon]
          [else (local {(define two-parts (unzip lon))
                        (define one-part (first two-parts))
                        (define other-part (second two-parts))}
                  (merge-two (mSort one-part)
                             (mSort other-part)))]))


;; (merge-two l1 l2)
;; l1, l2 are ascending lists of numbers
;; return a single ascending list of the numbers of of l1,l2.
;;
;; Example:
;;    (merge-two (list 3 8 11) (list 1 3 4 9 29))
;;  = (list 1 3 3 4 8 9 11 29)
;;
(define (merge-two l1 l2)
  (cond [(and (empty? l1) (empty? l2)) empty]
        [(and (empty? l1) (cons?  l2)) l2]
        [(and (cons?  l1) (empty? l2)) l1]
        [(and (cons?  l1) (cons?  l2))
         (cond [(> (first l1) (first l2))
                (cons (first l2) (merge-two l1 (rest l2)))]
               [else (cons (first l1) (merge-two (rest l1) l2))])]))

                                                  
;; unzip: list-of-x --> (list list-of-x list-of-x))
;; Return two lists, each containing every-other element of lst
;; (in unspecified order).
;;
(define (unzip lst) (unzip-help lst empty empty))


;; unzip-help: list-of-x list-of-x list-of-x --> (list list-of-x list-of-x)
;; Return two lists, each containing every-other element of lst
;; and the elements of so-far1 (so-far2) respectively.
;;
(define (unzip-help lst so-far1 so-far2)
  (cond [(empty? lst) (list so-far1 so-far2)]
        [(cons?  lst) (unzip-help (rest lst) so-far2 (cons (first lst) so-far1))]))
                      ; Fancy footwork: swap order of so-far1, so-far2.

We saw in a Comp210 lab empirical tests showing that these two types of sorting have different behaviours, and discussed the reasons there. Today we'll look at the tools needed to formalize these concepts.

For a function "insert", we'll consider the running time of iSort -- a function t_iSort, which takes a list, and returns how long iSort runs on that input … on a 1GHz sparc20 with 512MB RAM, no other applications running besides the OS …

Suppose:

t_iSort( (list 50 23) )       =   7ns
t_iSort( (list 11 50 23) )    =  19ns
t_iSort( (list 33 11 50 23) ) =  31ns

t_iSort( (list 11 23 33 50) ) =   8ns
t_iSort( (list 50 33 23 11) ) =  60ns

We can compare this to mergesort:

t_mSort( (list 50 23) )       =  11ns
t_mSort( (list 11 50 23) )    =  15ns
t_mSort( (list 33 11 50 23) ) =  19ns

t_mSort( (list 11 23 33 50) ) =  20ns
t_mSort( (list 50 33 23 11) ) =  19ns

(These are fictitious but representative numbers)

We want to be able to compare these two sorts, to arrive at a general conclusion. There are several glitches:

The exact function is very complicated, since lists of different length. Is there a more useful function, to capture the essence of the running time?
This is tied to one particular technology. Isn't there something more fundamental about these algorithms, independent of technology? How to capture that?
One algorithm might be much more efficient than another in general, but (for small inputs) suffer from startup overhead (initializing variables, bringing code into memory, etc). In fact, those small inputs we care about least, since they're not the inputs causing us to wait. How can we have a notion which overlooks startup overhead?

We answer each of these concerns in turn. We want general answers, so we can also analyze the repeated-squaring algorithms (multiple versions) from hws, as compared to other algorithms for exponentiation.

Some of these approximations are definite trade-offs of accuracy vs keeping your model simple.

The exact function is very complicated [when input is a list]
Solution: Rather than look at individual lists, look at lists of length n. Take the …best-case? average-case? worst-case?

We'll take worst-case, with the theory that when we show the worst-case isn't so bad, we have an iron-clad guarantee. (Also, average-case is much more difficult in general.)

We extend t_iSort over N:
t_iSort(n) = max_{l ∈ ℜⁿ} t_iSort(l)
The max corresponds to worst-case. (What would best-case be? average-case?)
This is tied to one particular technology.

Solution: We might count just the number of atomic operations made. This abstracts away OS, memory size, etc. It does require some consensus about what is an atomic operation; what takes 3 steps on one processor might take 7 on another.

Besides, this #operations is talking about machine code (technology dependent), not high-level source code, where we'd prefer to keep the discussion. (Consider Java, where counting JVM instructions is machine-independent; of course different Java compilers can still compile to different byte code.)

However, these two processors probably have a constant-factor conversion between them: E.g. whenever you see those 3 steps, you can always convert them into (no more than) 7 steps on the other.

Better solution: count the number of atomic operations, up to a constant factor. This constant factor could correspond to running a 30% less-efficient compiler on a 57% faster machine, or adding more cache (making all memory accesses nearly 3 times faster), etc.
Ignore startup overhead for small inputs. When comparing two algorithms, We'll simply talk about comparing algorithms' asymptotic (marginal) time complexity.

For these latter two reasons, we'll develop a formal notion of comparing (the growth of) two functions.
[We'll be talking about functions for a while now, having shifted the topic away from algorithms to their running times (functions) to their worst-case running-times on inputs of a certain size.]

Defining Big-Oh

Intuition: f = O(g) (pronounced ``f is big-Oh of g'') means "f ≤ g, up to a constant, ignoring small inputs"

Definition: f ∈ O(g) iff:
∃ c,n₀ ∀ n>n₀, |f(n)| ≤ c⋅|g(n)|.

Examples

Consider f(n)=10n, g(n)=n².
Is f big-oh of g? g big-oh of f?
Polynomials in general: ∑_i=n⁰ a_i⋅xⁱ is O(xⁿ) (Which c, k?)
Show that for any bases b1,b2 > 1, f ∈ O( log_b1 ) iff f ∈ O( log_b2 ).
[Details left to the reader.]
Thus, when writing big-Oh it's fine to drop the base of the log, since any base will do. In fact, it's arguably better to drop the base of the log, just as it's clearer to write ``O(x²)'' than ``O(17x²+23)'', even though these are both the same set of functions.
Prove ∑ i² = O(n³).
Prove ∑ i = O(n²).
Consider a loop, where the body takes O(1) (constant) time. This body is executed up to n times. What is the overall running time?
As before, except the body takes time O(k), and the loop is for k ← 1 to n { body }?
As before, except before running the loop there is also an initialization code which takes time…
- 15?
- 15n?
- 15n²?
- 15n³?
As before, but now the loop takes time O(k²).
What is the running time of insert? Of iSort?
The same, except there are n³ passes through the loop?
From Rosen: how to prove that f ∈ O(F), g ∈ O(G) ⇒ f⋅g ∈ O(F⋅G), and other similar statements?

Comments

(Book uses "k" instead of "n₀"; we are usually interested in functions in N+→N+ rather than ℜ→ℜ. We'll often ignore the absolute values, for the same reason. And there is a fair amount of play in the details &emdash; we can have replace strict-inequalities with non-strict ones and vice-versa, without actually changing anything.)

Abuse alert: Although people write "f = O(g)", really O(g) is a set of functions, and we are saying ``f ∈ O(g)''. O(g) is the set of all functions which g is no-less-than … up to constant, ignoring small inputs.

We often use n to mean "the size of the input". Be careful:

When adding a list of N numbers, this is clearly an O(N) task … though what if each number has 3000 digits? N digits?
Is taking the square root really a one-step operation?
(This is the one advantage of fixed-point arithmetic — even though it can overflow and have round-off error and introduce bugs if programmers don't realize their language's arithmetic may not actually correspond to real arithemetic, we do have the fact that arithmetic operations are constant-time.)
Consider the running time of isPrime(n). One implementation may do √n divisions. Is this a sub-linear algorithm? In general, it's considered a very slow-running algorithm — I can easily write down an input that isn't computatinally feasible: say, a 30-digit number. While exact integers on the order of 10³⁰ don't arise in counting problems, they do arise routinely in cryptography (e.g. secure web transactions).

The upshot is, sometimes we want to say that, for numeric problems that take an input n, the size of the input is not n itself, but the number of digits (bits) needed to represent n. So the size of the input might be log(n), and a running time of
√n
= √2^log₂(n)
= 2^log₂(n)/2
which is exponential in the size of the input!

After centuries of work, only in 2002 has a true polynomial-time algorithm been found for determining whether a number is prime.

≤ is to O(•) as ≥ is to Ω(&bull) as = is to Θ(&bull) That is, these are (resp.) useful for expressing upper bound, lower bound, and tight bounds
(with the usual caveats: up to a constant factor, ignoring small inputs).

Th'm: the running time of any comparison-based ¹ sort is Ω(n log n).
This is a very strong result — you can't be substantially cleverer than mergesort! We will show this later, after covering a bit of counting and the pigeon-hole principle.

Less common are little-oh (omicron, ο) and little-omega; these are the companions of strictly < and >, resp.

Note that ``f ∈ O(g)'' is close to saying that ∃ c such that the limit as n→∞ of f(n)/g(n) < c. It's not exactly, equivalent for two technicalities:

the limit may not exist, yet one function is still big-Oh of the other.
(Exercise: find such a pair of functions!)
the functions we want to work with &emdash; often stemming from the run-time of algorithms &emdash; are often defined only on integer inputs. This one can be remedied easily enough by taking the piecewise linear extension to ℜ while not differentiable everywhere, the ratio-limit may well exist.

Big-Oh and Big-Theta are also good for expressing error estimates, in a formal way:
For example, Stirling's Approximation says that

n! ≈ &radic(2πn)⋅(n/e)ⁿ

But what if we are working with factorials and want to be sure we have an upper bound? A more useful statement of Stirling's approximation is

n! ≈ ( &radic(2πn)⋅(n/e)ⁿ ) ⋅ (1 + Θ(1/n))

or even

n! ≈ ( &radic(2πn)⋅(n/e)ⁿ ) ⋅ (1 + 1/(12n) + Θ(1/n²))

Of note:

O(1): bounded by some constant
Θ(1): some constant
ο(1): smaller than any constant (limiting to zero).

¹ At least, not without doing more than just comparisons between the objects you're sorting. But often there is a further bit of information: Suppose you are sorting N exams by score. If scores are always integers in [0,100], then you can set up 101 bins, and sort all N exams in a single pass. (``Bin sort'').
(back)

[an error occurred while processing this directive] [an error occurred while processing this directive]