Lecture 17 sketch — induction


Recursive definitions and structural induction:

A binary tree is:

(make-empty-tree)
(make-branch [any] [tree] [tree])

(define-struct empty-tree ())
(define-struct branch (datum left right))

Prove: for any binary tree, there is one more leaf than branch.
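As a sanity check before proving the claim, here is a small sketch in Python (not the Scheme used in these notes). It assumes that "leaf" means an empty tree and that the classes `EmptyTree` and `Branch` stand in for the two structs above; the helper names are illustrative only. A check like this is evidence, not a proof.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EmptyTree:
    """Stands in for (make-empty-tree); assumed to be what the notes call a leaf."""


@dataclass(frozen=True)
class Branch:
    """Stands in for (make-branch datum left right)."""
    datum: Any
    left: Any
    right: Any


def count_leaves(t):
    # An empty tree is one leaf; a branch has the leaves of its two subtrees.
    if isinstance(t, EmptyTree):
        return 1
    return count_leaves(t.left) + count_leaves(t.right)


def count_branches(t):
    # An empty tree has no branches; a branch counts itself plus its subtrees.
    if isinstance(t, EmptyTree):
        return 0
    return 1 + count_branches(t.left) + count_branches(t.right)


def all_trees(n):
    """Yield every tree shape with exactly n branches (datum fixed to None)."""
    if n == 0:
        yield EmptyTree()
        return
    for k in range(n):
        for left in all_trees(k):
            for right in all_trees(n - 1 - k):
                yield Branch(None, left, right)


for n in range(5):
    for t in all_trees(n):
        assert count_leaves(t) == count_branches(t) + 1
```

Note that both counting functions follow the two-case shape of the data definition, which is the same shape the structural-induction proof will have.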

NOT the way to argue the inductive step:

"Take any binary tree of size k; I'll splice out a leaf and add a branch with two leaves. This gives me a binary tree of size k+1, which has one more branch and (net) one more leaf."

Problem: this gives you some binary tree of size k+1, but can you guarantee that all such binary trees were created through this method?

To be clear, start with an arbitrary binary tree of size k+1, and show that it has the property (knowing that any way you trim it down gives you a binary tree of size k, for which the inductive hypothesis will hold).

We'll define the size of a binary tree as the number of branches:

For an empty binary tree, 0.
For a branch, 1+size(left)+size(right).

We'll define the height of a binary tree as:

For an empty binary tree, 0.
For a branch, it's 1+max(height(left),height(right)).

Prove: For any binary tree, size(t)+1 ≤ 2^height(t)
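The two definitions translate directly into code. The following Python sketch (again not the notes' Scheme) assumes illustrative classes `EmptyTree` and `Branch` for the two structs, and the tree builders `full_tree` and `left_spine` are hypothetical helpers added only to exercise the bound at its two extremes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EmptyTree:
    """Stands in for (make-empty-tree)."""


@dataclass(frozen=True)
class Branch:
    """Stands in for (make-branch datum left right)."""
    datum: Any
    left: Any
    right: Any


def size(t):
    # Number of branches: 0 for an empty tree, else 1 + size(left) + size(right).
    if isinstance(t, EmptyTree):
        return 0
    return 1 + size(t.left) + size(t.right)


def height(t):
    # 0 for an empty tree, else 1 + max(height(left), height(right)).
    if isinstance(t, EmptyTree):
        return 0
    return 1 + max(height(t.left), height(t.right))


def full_tree(h):
    """Hypothetical helper: a tree of height h with every level filled."""
    if h == 0:
        return EmptyTree()
    sub = full_tree(h - 1)
    return Branch(None, sub, sub)


def left_spine(h):
    """Hypothetical helper: a tree of height h with only left children."""
    t = EmptyTree()
    for i in range(h):
        t = Branch(i, t, EmptyTree())
    return t


for h in range(6):
    assert size(full_tree(h)) + 1 == 2 ** height(full_tree(h))   # bound is tight
    assert size(left_spine(h)) + 1 <= 2 ** height(left_spine(h))  # bound is slack
```

The full tree shows the inequality cannot be improved, while the spine shows it can be far from tight.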

Similarly, structural induction on WFFs of first-order logic: A WFF is:

T or F,
a proposition,
¬φ,
(φ ∧ ψ),
(φ ∨ ψ),
(φ → ψ)
∀ x.φ
∃ x.φ

This is a recursive definition of structure; a proof by structural induction on WFFs will have eight cases (two base, six recursive).

Def'n: An "atom" is T, F, or a proposition.

Prove: for any WFF built from atoms using only the binary connectives (∧, ∨, →), the number of connectives is equal to the number of atoms minus 1.

(More generally, we can include not: #conns ≥ #atoms-1.)
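To make the counting concrete, here is a Python sketch under several assumptions that are not in the notes: formulas are encoded as nested tuples, atoms are plain strings, only ¬, ∧, ∨, → are counted as connectives, and quantifiers are passed through without being counted. The tag names (`"not"`, `"and"`, and so on) are made up for this illustration.

```python
BINARY = {"and", "or", "implies"}
QUANTIFIERS = {"forall", "exists"}


def atoms(f):
    """Count atoms (T, F, or a proposition), one case per WFF form."""
    if isinstance(f, str):            # base cases: "T", "F", or a proposition
        return 1
    op = f[0]
    if op == "not":                   # ("not", phi)
        return atoms(f[1])
    if op in BINARY:                  # (op, phi, psi)
        return atoms(f[1]) + atoms(f[2])
    if op in QUANTIFIERS:             # (quantifier, variable, phi)
        return atoms(f[2])
    raise ValueError(f"not a WFF: {f!r}")


def connectives(f):
    """Count occurrences of not/and/or/implies (quantifiers are not counted here)."""
    if isinstance(f, str):
        return 0
    op = f[0]
    if op == "not":
        return 1 + connectives(f[1])
    if op in BINARY:
        return 1 + connectives(f[1]) + connectives(f[2])
    if op in QUANTIFIERS:
        return connectives(f[2])
    raise ValueError(f"not a WFF: {f!r}")


# Binary connectives only: equality holds.
f = ("implies", ("and", "p", "q"), ("or", "T", "r"))
assert connectives(f) == atoms(f) - 1

# With negation (and a quantifier): only the inequality holds.
g = ("not", ("forall", "x", ("and", "p", ("not", "q"))))
assert connectives(g) >= atoms(g) - 1
```

Each function has one branch per form in the definition of WFF, which is how the cases of the induction proof line up as well.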

Here is a def'n of string: For a set Σ (the alphabet), the set Σ^* ("strings over Σ") contains:

λ (empty string)
wa, where a ∈ Σ, and w ∈ Σ^*.

Once we have this, how to define concatenation?
For v,w ∈ Σ^*, we define vw ∈ Σ^* as follows:

If w=λ, vw = v.
If w=ua (for a letter a and a string u), vw = (vu)a.

Th'm (book): length(vw) = length(v) + length(w).

Define length following the def'n of string: length(λ) = 0, and length(wa) = length(w) + 1.
Then, induct on w only!
That is, P(w) = "length(vw) = …" (for arbitrary string v).
Then just look at the two cases for w being empty or not. (This is an example in Rosen.)
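The recursive definitions of string, concatenation, and length can be mirrored in code. This Python sketch assumes an encoding chosen only for illustration: `None` plays the role of λ, and the pair `(w, a)` plays the role of wa. The helper `from_str` is hypothetical and just builds such values from ordinary Python strings.

```python
EMPTY = None  # stands in for the empty string λ


def from_str(s):
    """Hypothetical helper: build the nested-pair form, one letter at a time."""
    w = EMPTY
    for ch in s:
        w = (w, ch)  # the string "wa"
    return w


def length(w):
    # length(λ) = 0; length(wa) = length(w) + 1.
    if w is EMPTY:
        return 0
    u, _a = w
    return length(u) + 1


def concat(v, w):
    # Recursion is on w only, matching the definition of vw.
    if w is EMPTY:
        return v                  # vλ = v
    u, a = w
    return (concat(v, u), a)      # v(ua) = (vu)a


words = ["", "a", "ab", "ba", "abb"]
for x in words:
    for y in words:
        v, w = from_str(x), from_str(y)
        assert length(concat(v, w)) == length(v) + length(w)
```

Since `concat` only ever takes apart its second argument, inducting on w (with v arbitrary) is the natural choice for the proof.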

Some exercises:

Give a def'n of "string of even length".
Give a def'n of "stuttering string", each letter duplicated — e.g. aabbzzzzyy.
Give a def'n of w^R:
- if w=λ, λ^R = λ.
- if w=ua, w^R = au^R.

Book (and a hw problem) talks about the reversal w^R of a string w. What is the skeleton of a proof showing

(vw)^R = w^Rv^R ?

Again, make P() about w only, with a "forall v ∈ Σ^* …". Use the Def'n of concatenation!
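Before writing the proof, the identity can be spot-checked in code. This Python sketch assumes the same illustrative encoding as the definition of Σ^* suggests: `None` for λ and the pair `(w, a)` for wa. The helpers `from_str` and `to_str` are hypothetical conveniences, and "au^R" is read as the one-letter string a concatenated with u^R.

```python
EMPTY = None  # stands in for the empty string λ


def from_str(s):
    """Hypothetical helper: build the nested-pair form of a Python string."""
    w = EMPTY
    for ch in s:
        w = (w, ch)
    return w


def to_str(w):
    """Hypothetical helper: flatten the nested-pair form back to a Python string."""
    letters = []
    while w is not EMPTY:
        w, a = w
        letters.append(a)
    return "".join(reversed(letters))


def concat(v, w):
    # vλ = v; v(ua) = (vu)a.
    if w is EMPTY:
        return v
    u, a = w
    return (concat(v, u), a)


def reverse(w):
    # λ^R = λ; (ua)^R = a u^R, where "a" is treated as a one-letter string.
    if w is EMPTY:
        return EMPTY
    u, a = w
    return concat((EMPTY, a), reverse(u))


words = ["", "a", "ab", "abc", "bba"]
for x in words:
    for y in words:
        v, w = from_str(x), from_str(y)
        assert reverse(concat(v, w)) == concat(reverse(w), reverse(v))
```

A check over a handful of strings does not replace the induction, but it does catch a misstated definition early.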

Section 3.4, #30: Prove that a string contains at most one more occurrence of 01 than of 10. Notation, to help: for a string s, #01(s) is the number of 01's, and #10(s) is the number of 10's.

Let P(s) be "#01(s) ≤ #10(s)+1". Proof by structural induction:

base case: if s = λ, #01(λ) = 0 ≤ 0+1 = #10(λ)+1, Check.
Inductive step, s=wa:
- case a=0: inductive hypothesis: #01(w) ≤ #10(w) + 1.
  Thus #01(s) = #01(w0) = #01(w) ≤ #10(w)+1 ≤ #10(w0) +1 = #10(s)+1.
- case a=1:
  #01(s) = #01(w1) ≤ #01(w)+1 ≤ (#10(w)+1)+1 = #10(w1)+2 = #10(s)+2 ?! Uh-oh, we can't go through with this step!
Stepping back, the trouble is clear: if all we know is that the smaller string w has at most one more 01 than 10, then appending a 1 might create another 01 and push us past the bound.

To get the inductive step, we wish our inductive hypothesis had been a bit stronger — something like ``#01(s) ≤ #10(s)+1, and equality holds only if the string's last letter is 1''. This actually gives enough information for the proof to now go through -- ironically, by making our proof goal tougher, our problem will become easier! This is called "loading the induction hypothesis":

Let P(s) be ``#01(s) ≤ #10(s)+1. Furthermore, if s doesn't end in 1, then this inequality is strict: #01(s) < #10(s)+1.''
Proof by structural induction:

Base case: if s = λ, #01(λ) = 0 < 0+1 = #10(λ)+1, Check. Since λ doesn't end in 1, we have shown the stricter version.
Inductive step, s=wa:
- case I: s=w0. We will show that in this case, #01(s)<#10(s)+1 (the stricter version).
  - Case Ia: w ends in 1.
    #01(s) = #01(w0) by case I
               = #01(w) since appending a 0 can't increase #01s.
               ≤ #10(w)+1 by inductive hypothesis
               = (#10(w0)-1)+1 since w ends in 1, w0 has one more #10 than w did.
               = #10(w0)
               < #10(w0)+1.
    Note that this inequality is strict, so we don't need to worry about P(s)'s ``if equality holds'' clause, and we have fully satisfied P(s).
  - case Ib: w doesn't end in 1 (it might be λ).
    #01(s) = #01(w0) by case I
               = #01(w) since w0 ends in 0
               < #10(w)+1 by inductive hypothesis; equality can't hold since w doesn't end in 1.
               = #10(w0)+1 since w doesn't end in 1, so appending a 0 creates no new 10
               = #10(s)+1.
- case II: s=w1:
  Here, since s ends in a 1, we need only show the non-strict inequality, #01(s) ≤ #10(s)+1.
  - Case IIa: w ends in 1.
    #01(s) = #01(w1) by case II
               = #01(w) by case IIa
               ≤ #10(w)+1 by inductive hypothesis
               = #10(w1)+1
               = #10(s)+1.
  - Case IIb: w doesn't end in 1 (possibly, w = λ!).
    Note that this is where our stronger inductive hypothesis pays off: #01(w) < #10(w)+1. Since these are integers, a number being <(#10(w)+1) means being ≤#10(w). Thus we have:
    #01(s) = #01(w1) by case II
               ≤ #01(w) + 1 Appending a 1 can't boost #01 by more than 1
               ≤ #10(w)+1 by improved inductive hypothesis
               = #10(w1)+1
               = #10(s)+1.
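The loaded statement P(s) can also be checked by brute force over short bit strings. The Python sketch below assumes the alphabet is {0,1} and uses ordinary Python strings; the function names `count01`, `count10`, `loaded_P`, and `binary_strings` are made up for this illustration. Passing the check is only evidence that the strengthened hypothesis is true, not a substitute for the proof.

```python
from itertools import product


def count01(s):
    """#01(s): number of positions where '0' is immediately followed by '1'."""
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "01")


def count10(s):
    """#10(s): number of positions where '1' is immediately followed by '0'."""
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "10")


def loaded_P(s):
    """#01(s) <= #10(s)+1, and strictly less whenever s does not end in 1."""
    weak = count01(s) <= count10(s) + 1
    strict_when_needed = s.endswith("1") or count01(s) < count10(s) + 1
    return weak and strict_when_needed


def binary_strings(max_len):
    """Yield every string over {0,1} of length 0 through max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)


assert all(loaded_P(s) for s in binary_strings(10))
```

For instance, "01" attains equality (one 01, zero 10s) and ends in 1, while "010" has one of each and so satisfies the strict version.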

[The first time I wrote out this proof, I didn't use this notation "#01()". As I was repeating the same words over and over, I went back and re-wrote it with notation.

By the way, deciding what things to give special shorthand/names to is part of the art of writing a good proof… just as deciding precisely what tasks to include as separate functions is part of the art of writing a good program.

Beware adding too much new notation — this can result in a more-confusing proof. Using "standard" names like x,y for real numbers, m,n for natnums, i,j for indices can also make your notation more accessible. ]

Note

In structural induction (and in general for the inductive step(s)), start with an arbitrary structure, then name the sub-parts it's made out of, and then invoke the inductive hypothesis.

Example:

Let P(t) be ``2^height(t) ≥ size(t)''. We prove P(t) holds for all trees t by structural induction:

More clear:
- Case 1, t = (make-empty-tree): …
- Case 2, t = (make-branch datum t1 t2), where t1 and t2 are binary trees:
  By our inductive hypothesis, we know P(t1) and P(t2), that is 2^height(t1) …
Less clear:
- Case 1: P((make-empty-tree)): …
- Case 2: Assume P(t1) and P(t2) hold, for some trees t1,t2. Then we show that P((make-branch datum t1 t2)) holds: …

The reason the first situation is more clear is that it's nicer to say "hey, you give me any tree t. I'll reason with it."

The second approach is round-about, saying "well, if something holds for some smaller trees t1 and t2, then it holds for a tree made out of them." Left unstated is: "Oh, and this tree I made out of the t1,t2 happens to be the particular tree you are interested in, even though I never really told you which t1 and t2 I was choosing."

Why all these different forms of induction?

For you to ponder: If I restricted you to only use a single base case, would this suffice to still solve the problem? That is, is strong-induction-with-multiple-premises a truly more powerful inference rule than strong-induction-with-single-premise?

For that matter, we introduced mathematical (non-strong) induction as a new rule of inference, then sneakily slipped in strong induction as well. Are we justified in throwing in stronger and stronger rules of inference whenever we feel like it? (Rosen even has a couple of exercises in this vein: sect 3.3, #55,56.)

Loading the induction hypothesis

Suppose you are given some code for append, and you try to prove

For any list a, (append a (append a a)) = (append (append a a) a)

Seems like this shouldn't be so bad. But you try doing it by induction, and you get stuck — your inductive hypothesis doesn't quite give you enough information to get things to go through.

;; append: list, list --> list
;;
(define (append x y)
   (cond [(empty? x) y]
         [(cons?  x) (cons (first x)
                           (append (rest x) y))]))

Let P(a) be ``(append a (append a a)) = (append (append a a) a)''. We will (try unsuccessfully to) prove by structural induction that this property holds for our code as written.

Base case: a = empty: In this case, (append empty empty) = empty (by the code), so indeed

  (append empty (append empty empty))
= (append empty empty) 
= (append (append empty empty) empty)

Inductive step: a = (cons a0 a*).
```
  (append a (append a a))
= (append (cons a0 a*) (append a a))    ; by this case for a
= (cons a0 (append a* (append a a)))    ; by code for append
```
(So far so good. We can't apply our inductive hypothesis, since we're no longer triply-appending the same list, though we can pursue the inner append using what we know about the structure of a:)
```
= (cons a0 (append a* (append (cons a0 a*) a)))   ; by this case for a
= (cons a0 (append a* (cons a0 (append  a* a))))  ; by code for append
= ???
```
Unfortunately, we are stuck — we have a statement where the second argument to the (first) append is in terms of cons, but that doesn't help us, since our code never disassembles our second argument.

Ironically, by proving a stronger statement, your life actually becomes easier! It's not too hard to show that

For any lists a,b,c (append a (append b c)) = (append (append a b) c).

We will still induct on a (alone): let P(a) be ``For all lists b,c, (append a (append b c)) = (append (append a b) c)''. This is called ``loading the inductive hypothesis''.

We will (successfully, this time) prove by structural induction that P(a) holds for all lists a.

Base case: a = empty: In this case, (append empty y) = y for any list y (by the code), so for all lists b,c,

  (append empty (append b c))
= (append b c)                          ; by code for append
= (append (append empty b) c)           ; by code (in reverse)

Inductive step: a = (cons a0 a*).

  (append a (append b c))
= (append (cons a0 a*) (append b c))    ; by this case for a
= (cons a0 (append a* (append b c)))    ; by code for append
= (cons a0 (append (append a* b) c))    ; by our inductive hypothesis!
= (append (cons a0 (append a* b)) c)    ; By code (in reverse)
= (append (append (cons a0 a*) b) c)    ; By code (in reverse)
= (append (append a b) c)               ; By this case for a
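The loaded statement is easy to spot-check by transcribing append. This Python sketch is a translation made for illustration, not the notes' Scheme: it assumes `None` for empty and a pair `(first, rest)` for a cons cell, and `from_list` is a hypothetical helper for building such lists.

```python
EMPTY = None  # stands in for the empty list


def cons(first, rest):
    """Stands in for Scheme's cons: a pair of an element and the rest of the list."""
    return (first, rest)


def from_list(items):
    """Hypothetical helper: build a cons-list from a Python list."""
    result = EMPTY
    for item in reversed(items):
        result = cons(item, result)
    return result


def append(x, y):
    # Same two cases as the Scheme code: recursion takes apart x only, never y.
    if x is EMPTY:
        return y
    first, rest = x
    return cons(first, append(rest, y))


samples = [from_list(xs) for xs in ([], [1], [1, 2], [3, 4, 5])]
for a in samples:
    for b in samples:
        for c in samples:
            assert append(a, append(b, c)) == append(append(a, b), c)
```

Because `append` never disassembles its second argument, the proof has to recur on a alone, and leaving b and c arbitrary is exactly what lets the inductive hypothesis apply to a*.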

(Hmm, in this case the trouble may be that the first statement, while it can be (correctly) viewed as an instance of "append associates", can also be (incorrectly) viewed as an instance of "append commutes"!)
