Lecture 23 sketch — mod applications


Public Keys and Amazon

Encryption:

Motivation: When you type your credit card number into Amazon.com, just what does the lock in the lower corner of the browser mean?

First, realize that web transmissions (and emails) get sent through intermediate computers, whose owners are free to snoop on everything they pass along. So you want your browser to encrypt your credit card number before it sends off the web form. This wouldn't be too hard if you'd previously talked with Amazon and they'd assigned you some sort of secret password. (Presumably over the phone, since their email to you might be snooped by others!) BUT you haven't talked with Amazon before, and besides, maybe your phone is tapped and even your mail is being intercepted. No pre-arranged Captain-Mod's-Secret-Decoder-Ring allowed! Is it still possible for you to give secret info to Amazon without eavesdroppers being able to figure it out?

A weird idea: Public-key system:
a box with two keys; if it's locked with one key, only the other can unlock it (and vice-versa).
[Not clear you can actually build such a box!]
By calling one key private and the other public (and placing many copies of the public key and lock in mailrooms or on your webpage), anybody can send you a message; even if somebody intercepts the message and knows your public key, they can't break the cipher. (Even the sender can't decrypt the message, if they somehow forget what they sent.)

a naive public-key system

Still, this is a nice pie-in-the-sky idea, but can you actually make such a two-lock box? Here's one attempt:

To send Amazon a one-letter message, convert that letter to a number (A=1,B=2,..,Z=26). Then, multiply that number by 73 ("the public key"), but keep only the last two digits. That's the encrypted message — that's all there is to the sender's work! (math-speak: for a message m, the encoding is 73m mod 100.)
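A minimal sketch of the sender's work, written in Scheme to match the gcd code later in these notes. The names `public-key`, `modulus`, `letter->number`, and `naive-encrypt` are made up for illustration, and the letter conversion assumes a single uppercase ASCII letter.

```scheme
;; Toy public key from the lecture: multiply by 73, keep the last two digits.
(define public-key 73)
(define modulus 100)

;; Letter -> number, A=1, B=2, ..., Z=26 (assumes an uppercase ASCII letter).
(define (letter->number ch)
  (+ 1 (- (char->integer ch) (char->integer #\A))))

;; The sender's whole job: 73m mod 100.
(define (naive-encrypt m)
  (modulo (* public-key m) modulus))

(display (naive-encrypt (letter->number #\S)))   ; 87  (S = 19, 73*19 = 1387)
(newline)
```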

To decrypt, Amazon will do the same thing on their end, except instead of 73 they'll use some "private key" which magically undoes the multiplying by 73. (We don't know what that is.)

Demo: Have a student secretly select a two-digit "credit card number", and the teacher will play the part of Amazon. E.g., if they secretly choose 19, then the dialogue goes as follows:

 Browser: Hi, I'm Jane Doe's web browser; she wants to order 
          the Back Sink Boys' latest CD.

 Amazon:  Sure — give me your (one-letter) credit-card, but
          encrypt as follows: multiply by 73, and keep only last 2 digits.

 Browser: [takes the 'S' Jane types in, converts it to 19, and computes
           73⋅19 =1387; keeps only 87.  Draws a lock on Jane's screen.]

          Okay Amazon — the encrypted letter is 87.

 Amazon:  [thinking to itself:  Ah, multiply 87 by [my-private-key],
           keeping only the last two digits, that gives me, um … 19;
           ah, 19 means "S".]

          Thank you; their album _Unclogged_ is on its way!

Class exercise: Having heard the encryption scheme and the encrypted message 49 (but not knowing the private key), try to decrypt it! (As a homework?)

What is the private key that Amazon uses? It's known only to Amazon; for this example it turns out to be 37. Thus 37⋅87 = 3219, which (mod 100) is 19. (It's a coincidence that the two keys are 73 and 37; we could instead have used, say, 71 and 31 as the keys. More below.)
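Here is Amazon's side as a sketch, under the same assumptions as before (hypothetical names such as `naive-decrypt` and `number->letter`, one uppercase letter per message). It decrypts the demo's 87 and then checks that multiplying by 37 undoes multiplying by 73 for all 26 letters.

```scheme
;; Toy key pair from the lecture: public 73, private 37, modulus 100.
(define public-key 73)
(define private-key 37)
(define modulus 100)

(define (naive-encrypt m) (modulo (* public-key m) modulus))
(define (naive-decrypt c) (modulo (* private-key c) modulus))

;; Number -> letter, 1=A, ..., 26=Z.
(define (number->letter n)
  (integer->char (+ (char->integer #\A) (- n 1))))

;; Amazon's side of the demo: 37*87 = 3219, keep 19, which is "S".
(display (number->letter (naive-decrypt 87)))   ; S
(newline)

;; Decryption undoes encryption for every letter m, m+1, ..., 26.
(define (round-trips-from? m)
  (or (> m 26)
      (and (= (naive-decrypt (naive-encrypt m)) m)
           (round-trips-from? (+ m 1)))))

(display (round-trips-from? 1))                 ; #t
(newline)
```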

Note that there is no secret pre-arrangement between the sender and the recipient (you and Amazon); both the algorithm and the public key are on (say) a web page for the world to see. Only Amazon's private key, 37, is hidden. Not even the sender knows it.

Weaknesses of this system

Discussion: How did people go about trying to break the code?

[One possible attempt: try each of 100 decryption keys.]
[Another possible way: there are only 26 possible messages, so an eavesdropper can encrypt each of them and see which one matches the encoded message.]
[Extra-credit to any student who organized the class to work on the problem in parallel. :-]
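Both attacks from the discussion fit in a few lines. A sketch, assuming the eavesdropper knows the public key 73 and the modulus 100 (both are public); `dictionary-attack` and `key-search` are hypothetical names. The example attacks the demo's 87 rather than spoil the class exercise.

```scheme
(define public-key 73)
(define modulus 100)
(define (naive-encrypt m) (modulo (* public-key m) modulus))

;; Attack 1: encrypt each of the 26 possible messages and compare.
;; Returns the matching message number, or #f if nothing matches.
(define (dictionary-attack c)
  (let loop ((m 1))
    (cond ((> m 26) #f)
          ((= (naive-encrypt m) c) m)
          (else (loop (+ m 1))))))

;; Attack 2: try every candidate key k in 1..99 and keep the one that
;; decrypts the encryption of 1 back to 1, i.e. k*73 = 1 (mod 100).
;; Such a k decrypts every message, since k*(73m) = (k*73)m.
(define (key-search)
  (let loop ((k 1))
    (cond ((>= k modulus) #f)
          ((= (modulo (* k (naive-encrypt 1)) modulus) 1) k)
          (else (loop (+ k 1))))))

(display (dictionary-attack 87))   ; 19, i.e. "S"
(newline)
(display (key-search))             ; 37, Amazon's private key
(newline)
```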

My key

Why it works: The numbers 37, 73, and 100 were chosen to work together:

37⋅73 = 2101 ≡ 1 (mod 100).

Thus somebody starts with m and broadcasts 73m (mod 100). You compute 37⋅(73m) = (37⋅73)m = 2101m = 2100m + m ≡ m (mod 100). We say "37 and 73 are multiplicative inverses of each other, mod 100". While there is no division mod 100 (we're not using any fractions), multiplying by one of the numbers undoes multiplying by the other, so it's the "mod" equivalent of division.
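A quick numerical check of the identity, as a sketch with hypothetical helper names: it confirms that 37 and 73 (and the alternative pair 71 and 31 mentioned earlier) are inverses mod 100, and that multiplying by one undoes the other for every residue 0..99.

```scheme
;; Is a*b = 1 (mod n)?
(define (inverses-mod? a b n)
  (= (modulo (* a b) n) 1))

;; Does multiplying by b undo multiplying by a, for every m in 0..n-1?
(define (undoes-for-all? a b n)
  (let loop ((m 0))
    (or (= m n)
        (and (= (modulo (* b (modulo (* a m) n)) n) m)
             (loop (+ m 1))))))

(display (list (inverses-mod? 37 73 100)      ; 37*73 = 2101
               (inverses-mod? 71 31 100)      ; 71*31 = 2201
               (undoes-for-all? 73 37 100)))
(newline)                                     ; prints (#t #t #t)
```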

Other uses: Signing with public keys

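We can also authenticate messages this way, so you can be sure that an email from Leebron claiming school is canceled for a snow day really is from him (or at least, from somebody with his private key): he locks the message with his private key, and anyone can unlock it with his public key.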

If we can put a box inside another box, then we can do both — I can send somebody a secure message which they can authenticate really is from me.
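The box-in-a-box idea can be tried out with the toy multiplicative keys. This is only an illustration, since the naive system is broken below; giving the recipient the key pair (public 71, private 31) is an assumption borrowed from the alternative pair mentioned earlier, and `apply-key` is a hypothetical helper.

```scheme
;; Toy key pairs (illustrative and insecure):
;;   sender, e.g. Leebron: public 73, private 37
;;   recipient:            public 71, private 31   (assumed for this example)
(define modulus 100)
(define (apply-key k x) (modulo (* k x) modulus))

;; Inner box: lock with the sender's OWN private key (the signature).
;; Outer box: lock that with the RECIPIENT's public key (the secrecy).
(define (sign-then-encrypt m)
  (apply-key 71 (apply-key 37 m)))

;; Recipient opens the outer box with their own private key,
;; then the inner box with the sender's public key.
(define (decrypt-then-verify c)
  (apply-key 73 (apply-key 31 c)))

(define sent (sign-then-encrypt 19))
(display sent)                          ; 13  (37*19 -> 3, then 71*3 -> 13)
(newline)
(display (decrypt-then-verify sent))    ; 19  (31*13 -> 3, then 73*3 -> 19)
(newline)
```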

Euclid Breaks the naive system

Alas, the public-key system we just demonstrated is terribly insecure: Euclid figured out how to break it around 300 B.C.E.

At least, he came up with an algorithm for determining the gcd (greatest common divisor) of two numbers. His algorithm can be modified to figure out the inverse of a number mod m.

Claim: gcd(a,b) = gcd(a-b,b) (if a > b).
This isn't too hard to see if you think of a and b as lengths, and the gcd as the longest ruler which evenly measures both a and b.

---------------------------------- a=34
--------------------------         b=26
                          -------- a-b=8

Note that if a ruler evenly measures b, and can also go on to evenly measure a, then it must evenly measure the interval between b and a (which has length a-b). Conversely, a ruler that evenly measures both b and the difference a-b also evenly measures a = b + (a-b). (How would you prove this carefully?) So the common rulers of a and b are exactly the common rulers of a-b and b, and in particular the biggest ones agree: gcd(a,b) = gcd(a-b,b).
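The claim gcd(a,b) = gcd(a-b,b) already gives an algorithm that uses nothing but subtraction. A sketch, assuming positive integer inputs; the name `gcd-by-subtraction` is hypothetical, and it stops when the two lengths are equal, since gcd(n,n) = n.

```scheme
;; Subtraction-only gcd, following the ruler argument:
;; replace the larger length by the difference until both are equal.
;; Assumes a and b are positive integers.
(define (gcd-by-subtraction a b)
  (cond ((= a b) a)
        ((> a b) (gcd-by-subtraction (- a b) b))
        (else (gcd-by-subtraction a (- b a)))))

;; 34,26 -> 8,26 -> 8,18 -> 8,10 -> 8,2 -> 6,2 -> 4,2 -> 2,2
(display (gcd-by-subtraction 34 26))   ; 2
(newline)
```

Each step only removes one copy of the smaller length, which is why the remainder version below is faster: one remainder does all the repeated subtractions at once.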

Further claim: gcd(a,b) = gcd( remainder(a,b), b) (if a > b).
This is the same idea as before:

--------------------------         a=26
--------                           b= 8
------------------------          3b=24
                        --        remainder(26,8) = 2

If some ruler evenly divides 8, then it must also divide 3⋅8 = 24; if the same ruler also divides 26, then it must divide 26 - 24 = 2 by the previous case.

In fact, gcd(26,34)=2.

As a base case, notice that for any number n, gcd(n,0)=n.

Euclid's algorithm

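;; Euclid's algorithm: gcd(a,b) = gcd(b, remainder(a,b)), with gcd(n,0) = n.
;; (Named euclid-gcd so it doesn't shadow Scheme's built-in gcd.)
(define (euclid-gcd a b)
  (if (zero? b)
      a                                 ; base case: gcd(a,0) = a
      (euclid-gcd b (remainder a b))))  ; reduce to a smaller pair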

Let's use the algorithm to compute gcd(37,100); we'll keep track of both the remainder and the quotient of a,b:

100 = 37⋅2 + 26       so   gcd(100,37) = gcd(37,26)
 37 = 26⋅1 + 11       so   gcd( 37,26) = gcd(26,11)
 26 = 11⋅2 +  4       so   gcd( 26,11) = gcd(11, 4)
 11 =  4⋅2 +  3       so   gcd( 11, 4) = gcd( 4, 3)
  4 =  3⋅1 +  1       so   gcd(  4, 3) = gcd( 3, 1)
  3 =  1⋅3 +  0       so   gcd(  3, 1) = gcd( 1, 0) = 1
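The table above can be generated mechanically. A sketch of a hypothetical `euclid-trace` helper that prints each division step a = b*q + r and returns the gcd:

```scheme
;; Print each division step a = b*q + r of Euclid's algorithm,
;; then return the gcd.
(define (euclid-trace a b)
  (if (zero? b)
      a
      (let ((q (quotient a b))
            (r (remainder a b)))
        (display a) (display " = ") (display b) (display "*")
        (display q) (display " + ") (display r)
        (newline)
        (euclid-trace b r))))

(euclid-trace 100 37)
```

Calling `(euclid-trace 100 37)` prints the same six division steps as the table, from `100 = 37*2 + 26` down to `3 = 1*3 + 0`, and returns 1.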

The interesting part is using this same information to figure out that 73 is the inverse of 37 (mod 100), which lets Euclid break our public-key attempt. First, we rewrite the equations above in reverse order, moving some terms from the right over to the left:

  1 =   4 -  1⋅ 3
  3 =  11 -  2⋅ 4
  4 =  26 -  2⋅11
 11 =  37 -  1⋅26
 26 = 100 -  2⋅37

Note that the first line gives the gcd (1) in terms of 3 and 4, the last remainders from Euclid's algorithm. We will combine that with the second line to express 1 in terms of 4 and 11. After another step we'll have 1 in terms of 11 and 26; by repeating this procedure we'll get 1 in terms of 37 and 100, which gives us the answer we want:

  1 =   4 - 1⋅ 3       1 = 4 - 1⋅3
  3 =  11 - 2⋅ 4         = 4 - 1⋅(11 - 2⋅4) = -11 + 3⋅4
  4 =  26 - 2⋅11         = -11 + 3⋅(26 - 2⋅11) = 3⋅26 - 7⋅11
 11 =  37 - 1⋅26         = 3⋅26 - 7⋅(37 - 1⋅26) = -7⋅37 + 10⋅26
 26 = 100 - 2⋅37         = -7⋅37 +10⋅(100-2⋅37) = 10⋅100 -27⋅37

This last equation is the one we want: -27⋅37 = 1 + (-10)⋅100, which is another way of saying -27⋅37 ≡ 1 (mod 100). Thus, -27 ≡ 73 is the inverse of 37, mod 100!
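The back-substitution can be folded into the recursion itself; this is usually called the extended Euclidean algorithm. A sketch (the names `extended-gcd` and `mod-inverse` are hypothetical) that returns the coefficients directly:

```scheme
;; Extended Euclid: returns (g x y) with a*x + b*y = g = gcd(a,b).
;; It is the back-substitution above, done recursively: if
;;   b*x' + (a mod b)*y' = g   and   a mod b = a - q*b,
;; then a*y' + b*(x' - q*y') = g.
(define (extended-gcd a b)
  (if (zero? b)
      (list a 1 0)
      (let* ((r (extended-gcd b (remainder a b)))
             (g (list-ref r 0))
             (x (list-ref r 1))
             (y (list-ref r 2)))
        (list g y (- x (* (quotient a b) y))))))

;; Inverse of a mod m, assuming gcd(a,m) = 1 (returns #f otherwise).
(define (mod-inverse a m)
  (let ((r (extended-gcd a m)))
    (if (= (list-ref r 0) 1)
        (modulo (list-ref r 1) m)
        #f)))

(display (extended-gcd 100 37))   ; (1 10 -27), i.e. 10*100 - 27*37 = 1
(newline)
(display (mod-inverse 73 100))    ; 37: the private key, from the public one
(newline)
```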

Now Amazon could try to thwart Euclid by using a very long public key and modulus (say, a hundred digits instead of just two). However, this doesn't help much: Euclid's algorithm is very efficient, taking O(log(n)) division steps, proportional to the number of digits of n. That is on a par with how long it takes Amazon to decode the messages sent to them anyway.
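To see how few steps Euclid needs, here is a sketch that counts division steps (helper names are hypothetical). Consecutive Fibonacci numbers are the worst case for Euclid's algorithm, and Lamé's theorem bounds the step count by five times the number of digits of the smaller input.

```scheme
;; Count the division steps Euclid's algorithm takes on (a, b).
(define (euclid-steps a b)
  (if (zero? b)
      0
      (+ 1 (euclid-steps b (remainder a b)))))

;; Fibonacci numbers with F(1) = F(2) = 1, using exact integer arithmetic.
(define (fib n)
  (let loop ((i 0) (a 0) (b 1))
    (if (= i n) a (loop (+ i 1) b (+ a b)))))

;; Number of decimal digits of a positive integer.
(define (digits n)
  (string-length (number->string n)))

(display (euclid-steps 100 37))   ; 6, matching the table above
(newline)
;; Worst case: F(500) has 105 digits, yet Euclid needs only 499 steps.
(display (list (digits (fib 500)) (euclid-steps (fib 501) (fib 500))))
(newline)                         ; prints (105 499)
```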

RSA — a better system

Named after Rivest, Shamir, and Adleman. An idea similar to the above, but there is no (known) "back door" to quickly compute the private key from the public key.

To explain how RSA works, we need a couple of results first:

Th'm: inverses mod m

If gcd(a,m) = 1, then a^{-1} (mod m) exists and can be efficiently computed.

We only gave one example of using Euclid's algorithm to compute an inverse (above), but the same back-substitution works for any a and m with gcd(a,m) = 1, which gives the theorem.

Th'm: Fermat's Little

Let p be prime. Then for every integer a not divisible by p, a^{p-1} ≡ 1 (mod p).

Proof: Consider the set of numbers {1,2,3,…,p-1}; also consider {a,2a,3a,…,(p-1)a} (mod p). These two sets are actually the same numbers (in a different order). (Why? Hint: prove that the numbers in the second set are all different and non-zero mod p; this is where we use that p does not divide a.) So the products of all these numbers (mod p) are also equal: (p-1)! ≡ a^{p-1}(p-1)! (mod p). Since (p-1)! is relatively prime to p, it has an inverse (mod p); multiplying both sides by it, we conclude 1 ≡ a^{p-1} (mod p).
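A numerical check of the theorem, as a sketch. `expmod` (a hypothetical name) computes b^e mod m by repeated squaring, the standard trick for keeping the numbers small; the same idea is what makes RSA below practical.

```scheme
;; b^e mod m by repeated squaring; intermediate values stay below m^2
;; (assuming 0 <= b < m).
(define (expmod b e m)
  (cond ((= e 0) (modulo 1 m))
        ((even? e)
         (let ((half (expmod b (quotient e 2) m)))
           (modulo (* half half) m)))
        (else (modulo (* b (expmod b (- e 1) m)) m))))

;; Check a^(p-1) = 1 (mod p) for every a in 1..p-1.
(define (fermat-holds-for-all? p)
  (let loop ((a 1))
    (or (= a p)
        (and (= (expmod a (- p 1) p) 1)
             (loop (+ a 1))))))

(display (fermat-holds-for-all? 101))   ; #t: 101 is prime
(newline)
(display (expmod 2 14 15))              ; 4, not 1: 15 is not prime
(newline)
```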

Th'm: Chinese Remainder Corollary

Sun-Tzu posed the question, "Suppose we have an unknown number of objects. When counted in threes, 2 are left over; when counted in fives, 3 are left over; and when counted in sevens, 2 are left over. How many objects are there?" He then proceeds to answer the question, in the process sketching the proof that doing arithmetic mod 3⋅5⋅7 is the same as doing three simpler arithmetic problems in parallel (one mod 3, another mod 5, and another mod 7). (More precisely: if p,q are relatively prime, Z/pq is isomorphic to the product Z/p × Z/q.)
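Sun-Tzu's puzzle is small enough to brute-force. A sketch (the name `sun-tzu` is hypothetical) that searches 0..104, relying on the Chinese Remainder Theorem's promise that the answer is unique mod 3⋅5⋅7 = 105:

```scheme
;; Smallest x >= 0 with x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7).
;; The answer is unique mod 105, so searching 0..104 is enough.
(define (sun-tzu)
  (let loop ((x 0))
    (cond ((= x 105) #f)
          ((and (= (modulo x 3) 2)
                (= (modulo x 5) 3)
                (= (modulo x 7) 2))
           x)
          (else (loop (+ x 1))))))

(display (sun-tzu))   ; 23
(newline)
```

Any other answer differs from 23 by a multiple of 105 (128, 233, ...).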

We won't prove the Chinese Remainder Theorem here (see book if interested), but we need a corollary:

If gcd(m,n)=1, and
x ≡ a (mod m)
x ≡ a (mod n)
then x ≡ a (mod m⋅n)

Setting up RSA

Find large primes p and q (200 digits each, say). [There is a fast way to do this that we haven't shown; note that primes being common helps us.] Let n = pq. Find an 'exponent' e that is relatively prime to (p-1)(q-1). Finally, find d = e^{-1} mod (p-1)(q-1). (Rosen shows how to use Euclid's algorithm for this in general; it's the same back-substitution we did for 37 and 100.) Public key: n, e. Private key: d.
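A toy run of the setup, with tiny primes so the numbers stay readable. The values p = 61, q = 53, e = 17 are illustrative choices, not from the lecture; `mod-inverse` is the extended-Euclid computation, repeated here so the block stands alone.

```scheme
;; Extended Euclid: returns (g x y) with a*x + b*y = g = gcd(a,b).
(define (extended-gcd a b)
  (if (zero? b)
      (list a 1 0)
      (let* ((r (extended-gcd b (remainder a b)))
             (x (list-ref r 1))
             (y (list-ref r 2)))
        (list (list-ref r 0) y (- x (* (quotient a b) y))))))

;; a^-1 mod m, or #f if gcd(a,m) is not 1.
(define (mod-inverse a m)
  (let ((r (extended-gcd a m)))
    (if (= (list-ref r 0) 1) (modulo (list-ref r 1) m) #f)))

;; Illustrative toy parameters; real keys use ~200-digit primes.
(define p 61)
(define q 53)
(define n (* p q))                     ; 3233: published
(define phi (* (- p 1) (- q 1)))       ; 3120 = (p-1)(q-1): kept secret
(define e 17)                          ; gcd(17, 3120) = 1
(define d (mod-inverse e phi))         ; private exponent

(display (list 'public n e 'private d))   ; (public 3233 17 private 2753)
(newline)
```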

En- and De-crypting RSA

To encrypt a message, break it into packets and encode each packet as a number less than n. Then for the number M, the encrypted message C is

C ≡ M^e (mod n)

where ⟨n,e⟩ is the recipient's public key.
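A sketch of encryption with the toy public key ⟨n,e⟩ = ⟨3233,17⟩ (from p = 61, q = 53, an illustrative choice). Computing M^e first and reducing afterwards would be hopeless at real key sizes, so `expmod` (a hypothetical name) reduces mod n after every squaring and multiplication.

```scheme
;; b^ex mod m by repeated squaring (assumes 0 <= b < m).
(define (expmod b ex m)
  (cond ((= ex 0) (modulo 1 m))
        ((even? ex)
         (let ((half (expmod b (quotient ex 2) m)))
           (modulo (* half half) m)))
        (else (modulo (* b (expmod b (- ex 1) m)) m))))

;; Toy public key <n, e> (illustrative: n = 61*53).
(define n 3233)
(define e 17)

;; Encrypt one packet, assumed already encoded as a number 0 <= msg < n.
(define (rsa-encrypt msg) (expmod msg e n))

(display (rsa-encrypt 65))   ; 2790
(newline)
```

The recipient decrypts with the same `expmod`, using the private exponent d in place of e; with d = 2753 for this key, `(expmod 2790 2753 3233)` gives back 65.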

To decrypt, using the recipient's private key d:

C^d ≡ M^{ed} ≡ M^{1+k(p-1)(q-1)} (mod n)

Here k is an integer: since d is the inverse of e mod (p-1)(q-1), we have ed = 1 + k(p-1)(q-1). A bit of math shows that this is just M¹ (mod n). If we look at this (mod p) alone and p does not divide M, then Fermat's Little Th'm gives C^d ≡ M^{1+k(p-1)(q-1)} ≡ M⋅(M^{p-1})^{k(q-1)} ≡ M⋅1^{k(q-1)} ≡ M (mod p); if p does divide M, both sides are ≡ 0 (mod p). Similarly, considering mod q alone, we get C^d ≡ M (mod q). Thus by the Chinese Remainder Th'm corollary, we really do have C^d ≡ M (mod n).
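The argument can be checked exhaustively on a tiny key. A sketch assuming p = 5, q = 11, n = 55, e = 3, d = 27 (illustrative values; (p-1)(q-1) = 40 and 3⋅27 = 81 = 1 + 2⋅40), testing every M from 0 to 54, including the multiples of p or q that need the separate case:

```scheme
;; b^ex mod m by repeated squaring (assumes 0 <= b < m).
(define (expmod b ex m)
  (cond ((= ex 0) (modulo 1 m))
        ((even? ex)
         (let ((half (expmod b (quotient ex 2) m)))
           (modulo (* half half) m)))
        (else (modulo (* b (expmod b (- ex 1) m)) m))))

;; Tiny illustrative key.
(define p 5)
(define q 11)
(define n (* p q))
(define e 3)
(define d 27)

;; Encrypt then decrypt msg; check the result mod p, mod q, and mod n.
(define (round-trip-ok? msg)
  (let ((back (expmod (expmod msg e n) d n)))
    (and (= (modulo back p) (modulo msg p))
         (= (modulo back q) (modulo msg q))
         (= back msg))))

(define (all-messages-ok?)
  (let loop ((msg 0))
    (or (= msg n)
        (and (round-trip-ok? msg) (loop (+ msg 1))))))

(display (all-messages-ok?))   ; #t
(newline)
```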

RSA attacks

If only an eavesdropper could determine p,q then they could break the code. However, they only know the product n; how difficult is it to factor n?

The security of this method hinges upon the inability to efficiently factor a 400-digit number. (Trial division only needs to test divisors up to √n ≈ 10²⁰⁰, but that is still about 10²⁰⁰ candidates; even if you only use primes, there are still about 10²⁰⁰/(200 ln 10) ≈ 2⋅10¹⁹⁷ divisors to test. How long would that take, at 10¹⁰ divisions/sec?)
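For a toy modulus the attack is easy, which shows exactly where the security lives. A sketch assuming the illustrative public key ⟨3233,17⟩ (helper names hypothetical): trial division up to √n recovers p and q, and then the eavesdropper computes d exactly as the key's owner did.

```scheme
;; Smallest factor of n by trial division up to sqrt(n); returns n if prime.
(define (smallest-factor n)
  (let loop ((k 2))
    (cond ((> (* k k) n) n)
          ((zero? (remainder n k)) k)
          (else (loop (+ k 1))))))

;; Extended Euclid: (g x y) with a*x + b*y = g = gcd(a,b).
(define (extended-gcd a b)
  (if (zero? b)
      (list a 1 0)
      (let* ((r (extended-gcd b (remainder a b)))
             (x (list-ref r 1))
             (y (list-ref r 2)))
        (list (list-ref r 0) y (- x (* (quotient a b) y))))))

(define (mod-inverse a m)
  (let ((r (extended-gcd a m)))
    (if (= (list-ref r 0) 1) (modulo (list-ref r 1) m) #f)))

;; The eavesdropper knows only the public key <n, e>.
(define n 3233)
(define e 17)
(define p (smallest-factor n))          ; 53
(define q (quotient n p))               ; 61
(define d (mod-inverse e (* (- p 1) (- q 1))))

(display (list p q d))                  ; (53 61 2753)
(newline)
```

Here the loop needs about 50 divisions; for a 400-digit n the same loop would run on the order of 10²⁰⁰ times.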

Th'm: Prime Number Th'm

How common are primes? Certainly, their distribution is a bit sporadic, but as numbers get bigger they seem to become more sparse. Let π(N) denote the number of primes less than N. A crown jewel of Number Theory is

π(N) ~ N/ln(N)

This is a statement about the limit as N approaches ∞ (the ratio π(N)/(N/ln(N)) tends to 1); several error bounds are known.
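A quick experiment with the theorem, as a sketch: count primes below N with the sieve of Eratosthenes and compare with N/ln(N). The helper names `prime-count` and `compare` are hypothetical.

```scheme
;; Number of primes below limit, by the sieve of Eratosthenes.
(define (prime-count limit)
  (let ((composite (make-vector limit #f)))
    (let loop ((k 2) (count 0))
      (cond ((>= k limit) count)
            ((vector-ref composite k) (loop (+ k 1) count))
            (else
             ;; k is prime: mark k*k, k*k + k, ... as composite.
             (let mark ((j (* k k)))
               (cond ((< j limit)
                      (vector-set! composite j #t)
                      (mark (+ j k)))))
             (loop (+ k 1) (+ count 1)))))))

;; (N  pi(N)  pi(N) / (N / ln N))
(define (compare limit)
  (let ((pi-n (prime-count limit)))
    (list limit pi-n (/ pi-n (/ limit (log limit))))))

;; Prime counts come out as 168, 1229, 9592, 78498.
(for-each (lambda (limit)
            (display (compare limit))
            (newline))
          '(1000 10000 100000 1000000))
```

The ratio in the last column shrinks slowly toward 1 as N grows, which is the "~" in the theorem; the convergence is slow, which is why the error bounds matter in practice.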

So: How many 100-digit numbers are prime? About one in 100.
(Well, we're off by a factor of ln(10) ≈ 2.3, since ln(10¹⁰⁰) ≈ 230: it's actually about 1 in 230.)
(By comparison, how many 100-digit numbers are perfect squares? Out of the first 10¹⁰⁰ numbers, only 10⁵⁰ are, which is a minuscule fraction!)
Primes are very common! By choosing 100-digit numbers at random, a computer is almost sure to find one within a few hundred tries.

However, this brings up a different question: if a computer guesses a number, can it (quickly) tell whether the number is prime? Alas, doing the roughly 10⁵⁰ trial divisions up to √n doesn't count as "quick". Interestingly (and happily), it seems that determining whether a number is prime is easier than factoring; see "Miller's Test" in the book for one (probabilistic) algorithm. (People have also recently found a non-randomized polynomial-time algorithm for testing primality.)
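A sketch of a probabilistic test in the spirit of the book's Miller's Test (we assume that refers to the single-base strong-pseudoprime test; repeating it over several bases is usually called Miller-Rabin). The fixed list of bases is an illustrative choice: for n below 2⁶⁴ these twelve bases are known to give a definite answer, while for 100-digit numbers a pass only means "probably prime". The last lines search upward from 10⁹⁹ for the first probable prime with 100 digits; all helper names are hypothetical.

```scheme
;; b^ex mod m by repeated squaring.
(define (expmod b ex m)
  (cond ((= ex 0) (modulo 1 m))
        ((even? ex)
         (let ((half (expmod b (quotient ex 2) m)))
           (modulo (* half half) m)))
        (else (modulo (* b (expmod b (- ex 1) m)) m))))

;; One round of the strong-pseudoprime test to base a, for odd n > 2:
;; write n-1 = 2^s * t with t odd; n passes if a^t = 1 (mod n) or
;; a^(2^j * t) = n-1 (mod n) for some j with 0 <= j < s.
(define (passes-base? n a)
  (let split ((t (- n 1)) (s 0))
    (if (even? t)
        (split (quotient t 2) (+ s 1))
        (let ((x (expmod a t n)))
          (if (or (= x 1) (= x (- n 1)))
              #t
              (let square ((x x) (j 1))
                (if (>= j s)
                    #f
                    (let ((y (modulo (* x x) n)))
                      (if (= y (- n 1))
                          #t
                          (square y (+ j 1)))))))))))

(define bases '(2 3 5 7 11 13 17 19 23 29 31 37))

;; Trial-divide by the bases, then run one round per base.
(define (probable-prime? n)
  (cond ((< n 2) #f)
        ((memv n bases) #t)
        (else
         (let loop ((bs bases))
           (cond ((null? bs) #t)
                 ((zero? (remainder n (car bs))) #f)
                 ((passes-base? n (car bs)) (loop (cdr bs)))
                 (else #f))))))

;; First probable prime >= start; returns (candidate tries).
(define (first-probable-prime-from start)
  (let loop ((n start) (tries 1))
    (if (probable-prime? n)
        (list n tries)
        (loop (+ n 1) (+ tries 1)))))

(define found (first-probable-prime-from (expt 10 99)))
(display (list-ref found 1))   ; how many candidates were tried
(newline)
```

A single round is cheap (one `expmod` plus a few squarings), which is why testing hundreds of 100-digit candidates is fast even though factoring any one of them is not.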

Punch line: For years, number theory was thought to be the 'most pure', least applied branch of mathematics, dealing with 200-digit numbers that can't possibly correspond to counting anything meaningful. But with the advent of public-key crypto, it suddenly became one of the most economically and militarily important branches of math.

Related topics: clipper chip; messages as pictures; messages hidden in pictures; digital watermarks
