
Aside: the codeword error on the hw: twiddling one letter can result in a loss of information: the received word "5able" could have come from either "table" or "fable".
In the space Σn, the Hamming distance between two words a, b:
let dist(a,b) be the number of indices at which a and b differ.
Is dist(·,·) a metric?
Definition of metric: a function d such that, for all a, b, c:
d(a,b) ≥ 0, with d(a,b) = 0 iff a = b;
d(a,b) = d(b,a) (symmetry);
d(a,c) ≤ d(a,b) + d(b,c) (triangle inequality).

Yes, our function meets these criteria.
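A minimal Python sketch of the definition, with a spot-check of the metric axioms on a few made-up words (this is just an illustration, not part of the notes' homework):

```python
def dist(a, b):
    """Hamming distance: number of positions at which a and b differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(dist("table", "fable"))  # 1

# Spot-check the metric axioms on a few words:
for a, b, c in [("table", "fable", "cable")]:
    assert dist(a, a) == 0                         # d(a,a) = 0
    assert dist(a, b) == dist(b, a)                # symmetry
    assert dist(a, c) <= dist(a, b) + dist(b, c)   # triangle inequality
```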

Application to coding: codewords are a subset of Σn such that all points are at least distance (say) 3 apart.
This means that if there are one or two errors, you can detect that an error occurred. If only one error, you can recover the original. (If there were two errors, you might inadvertently "recover" the wrong original.)

Define: the ball of radius r around w, Br(w): … . What is our codeword condition, in terms of these balls? Note that the space is a discrete graph.
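A small brute-force check in Python (the tiny code and the binary alphabet below are invented for illustration): with minimum distance 3, the radius-1 neighborhoods around distinct codewords don't overlap, which is exactly what makes single-error correction possible.

```python
from itertools import combinations, product

def dist(a, b):
    """Hamming distance between equal-length words."""
    return sum(x != y for x, y in zip(a, b))

def ball(w, r, alphabet):
    """All words of len(w) within Hamming distance r of w."""
    return {x for x in map("".join, product(alphabet, repeat=len(w)))
            if dist(w, x) <= r}

# A toy code in {0,1}^5 with minimum distance 3:
code = ["00000", "11100", "00111"]
assert all(dist(a, b) >= 3 for a, b in combinations(code, 2))

# Radius-1 balls around distinct codewords are pairwise disjoint:
balls = [ball(w, 1, "01") for w in code]
assert all(b1.isdisjoint(b2) for b1, b2 in combinations(balls, 2))
print(len(balls[0]))  # 6: the word itself plus 5 one-bit flips
```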


Fact: log(n!) = Θ(n log n). [Show.]
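One way to show this fact, as a sketch: the upper bound over-counts each factor as n, and the lower bound keeps only the top half of the factors.

```latex
\log(n!) = \sum_{i=1}^{n} \log i \;\le\; \sum_{i=1}^{n} \log n = n\log n = O(n \log n),
\qquad
\log(n!) \;\ge\; \sum_{i=\lceil n/2\rceil}^{n} \log i \;\ge\; \frac{n}{2}\log\frac{n}{2} = \Omega(n \log n).
```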

A lower bound on sorting: (Rosen 9.2) a computation tree based on comparisons has n! leaves; thus it must have height ≥ lg(n!) = Ω(n log n). (Recall the tree's size-vs-height bound from the structural-induction lecture.)

We have a very complete picture for sorting: we know a lower bound for any algorithm, and an algorithm which attains it. Usually we're not so lucky. For instance, as mentioned, we'd like to know that there's no quick way to factor numbers (a tight Θ bound on its complexity, not just an O upper bound).

One counting argument: recall that the reals in [0,1] are uncountable. (Consider writing such numbers in binary.) How many functions are there of type ℕ → {F,T}? That set is isomorphic to the binary expansions, hence uncountable. How many programs exist? Only countably many. Therefore: there are functions which cannot be computed(!) Recall from Comp 210: the Halting Problem.

Note how we're distinguishing between a Problem, and various algorithms to solve (instances of) the problem.

Def'ns:

Note that P is contained in NP, which is contained in EXPTIME. Are these containments proper? Unknown(!) (Despite the assertion in Homer³.)

Classify these problems as best possible:

Note that this last problem is artificial, but it has the property of being "NP-complete": if you could solve it quickly (in polynomial time), then you could solve any problem in NP quickly.

For reductions, we require that a reduction be computable in P, so that we get the desired "can solve in polynomial time" relationship.

Th'm (Cook, 1971): SAT is NP-complete.
Proof sketch: reduce any NP problem (given its polynomial-time verifier) to SAT:
Given a verifier and an input problem:

  1. We take the verifier, and model it with a huge propositional formula: for every bit of memory it might potentially use during a run, we'll have one proposition per bit for each of the nk time steps.
  2. Now make a bunch of formulas which relate one time-step to the next: for example, if instruction #95 were ``j ← i'', then we would have a formula ``(PC17 = 95) → (j18 ↔ i17)''; this talks about time 17; we'd have another such formula talking about time 18, and 19, etc. (each involving separate propositions).
  3. Call this mega-formula φ. It's huge, but it's still polynomial in the input length. Note that the values of the input bits at time 0 are entirely unconstrained; after that, formulas constrain the values of memory at all later times (at least, at times up to nk).
  4. Now just pass the question to the SAT solver, asking about φ ∧ Outputnk. This determines whether or not there's some input that makes the output bit true.
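To make "pass the question to the SAT solver" concrete, here is a deliberately naive SAT solver in Python: it just tries all 2^n assignments, which is exactly the exponential search we don't know how to avoid in general. The formula and variable names below are invented for illustration, not drawn from the Cook construction above.

```python
from itertools import product

def brute_force_sat(formula, variables):
    """Try every truth assignment; return a satisfying one, or None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None

# Example formula: (p ∨ q) ∧ (¬p ∨ q) ∧ (¬q ∨ r)
f = lambda a: ((a["p"] or a["q"]) and
               (not a["p"] or a["q"]) and
               (not a["q"] or a["r"]))

print(brute_force_sat(f, ["p", "q", "r"]))
# → {'p': False, 'q': True, 'r': True}
```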

[Realize, this is very much a sketch.]

See Comp 481 for more info.

Many problems of practical interest have been shown NP-complete. (The 10-coloring problem: assigning students to ten colleges, without putting people from the same high school in the same college.) For instance, SAT can be reduced to 3-coloring: via a clever way of taking a formula, creating an equivalent formula with exactly 3 variables per clause (adding extra dummy variables), then taking these 3-clauses and constructing a weird graph that can be 3-colored iff the original formula was satisfiable. We know that solving any one of these problems is as difficult as solving all of them, and probably not worth your effort to try. (Though there is a $1,000,000 cash prize for resolving P vs. NP either way.) Some NP-complete problems allow approximate solutions, which might be good enough; for other NP-complete problems, finding an approximate solution is as difficult as finding the exact one!
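The 3-coloring problem itself is easy to state in code; the hard part is that the only general solution methods we know are exponential. A brute-force decider, as a sketch (the example graphs are chosen for illustration):

```python
from itertools import product

def is_3colorable(n, edges):
    """Brute force: try all 3^n colorings of vertices 0..n-1."""
    return any(all(c[u] != c[v] for u, v in edges)
               for c in product(range(3), repeat=n))

# K4 (every pair of 4 vertices adjacent) needs 4 colors;
# a 4-cycle needs only 2.
print(is_3colorable(4, [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3)]))  # False
print(is_3colorable(4, [(0,1),(1,2),(2,3),(3,0)]))              # True
```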

Note that Minesweeper (deciding whether a given board configuration is consistent) is NP-complete.

Reached here, 2004.Apr.20

Randomized Algorithms

Now, a different tack: Flipping coins to gain computational power(?!)

Towards randomized algs and their running time:
Running time of quicksort?:
Hmm, look at avg-case time.
Hmm, need to define "average".
[Can set up recurrence, and show that any sol'n is bounded by O(n log n)]
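A sketch of the randomized version in Python: the pivot is chosen uniformly at random, and (as assumed below) the pivot's occurrences are kept out of both halves.

```python
import random

def quicksort(xs):
    """Randomized quicksort: pivot chosen uniformly at random."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]  # pivot kept out of both halves
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```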
For random partition, what is the size of the larger half?:
(We'll assume we keep the pivot itself, or at least one occurrence of it, out of both halves.)

    1/n⋅max(0,n-1) + 1/n⋅max(1,n-2) + 1/n⋅max(2,n-3) + 1/n⋅max(3,n-4) + … + 1/n⋅max(⌊n/2⌋,⌈n/2⌉)
  + 1/n⋅max(⌈n/2⌉,⌊n/2⌋) + … + 1/n⋅max(n-4,3) + 1/n⋅max(n-3,2) + 1/n⋅max(n-2,1) + 1/n⋅max(n-1,0)
= 1/n ⋅ [   (n-1) + (n-2) + … + ⌈n/2⌉
          + ⌈n/2⌉ + … + (n-2) + (n-1) ]
= 2/n ⋅ [   (n-1) + (n-2) + … + ⌈n/2⌉ ]
= 2/n ⋅ [ (n-1 + n/2) ⋅ (n-1)/4 ]
= (3n-2)(n-1)/4n = (3n²-5n+2)/4n = 3n/4 + O(1)
(We're being a bit sloppy here about whether n is even or odd; with a bit of care, this can be patched over w/o problem.)
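The derivation above can be verified exactly, parity and all, with a few lines of Python using exact rational arithmetic (a sanity check, not part of the original notes):

```python
from fractions import Fraction

def expected_larger_half(n):
    """Exact E[size of larger half] when the pivot's rank is uniform on 1..n."""
    return sum(Fraction(max(i - 1, n - i), n) for i in range(1, n + 1))

# The expectation stays within a constant of 3n/4, for even and odd n alike:
for n in range(2, 50):
    assert abs(expected_larger_half(n) - Fraction(3 * n, 4)) <= 1

print(float(expected_larger_half(100)))  # 74.5, i.e. 3·100/4 − 1/2
```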

Note how randomization helped our quicksort alg. Consider too:

These are all trade-offs:

Even more intriguingly: recently (2002), primality has been shown to be in PTIME, with no randomness required.
Deep question: does randomization ever really buy us computational power? That is, for every problem where randomization helps, is there some clever PTIME algorithm we just don't know about yet?
