Counting
Choosing items
How many ways to choose a password that is…
-
a single letter/lowercase-or-digit? (an ``alphanumeric'' character)
-
a single letter/lowercase-or-hexdigit? (hexdigits can be either case)
-
a letter/lowercase-and-digit (one of each, in that order)?
-
4 digits in a row? (``digit and digit and digit and digit'')
— a bike/luggage lock. How long to crack such a lock, by brute force?
-
8 alphanumeric letters-or-numbers?
(How long to crack, assuming the passwd program takes 10^-7 s to
see if the password actually matches? How to foil brute-force attacks?)
-
8-alphanumeric characters, where there is at least one number?
Have to be sneaky: total number, minus those with no numbers.
-
An English word? Somewhere from 25,000 to 200,000 choices.
AOL uses pairs of words — how much more secure is that?
-
My magazine acct# has 30 digits. How many accts might they have?
Well, people can have multiple accounts; how many accts per person
do they have?
[Okay actually they are encoding information; it's not just a number.]
["encode" vs "encrypt"]
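A quick sketch of the counts above, assuming "alphanumeric" means the 26 lowercase letters plus the 10 digits (as in the first question) and taking the 10^-7 s per check from the notes. The variable names are just for illustration.

```python
import string

LOWER = set(string.ascii_lowercase)   # 26 lowercase letters
DIGITS = set(string.digits)           # 10 digits
HEX = set(string.hexdigits)           # 0-9, a-f, A-F: 22 characters

# "or" adds (watch for overlap); "and" multiplies.
one_alnum = len(LOWER | DIGITS)                 # 26 + 10 = 36
one_lower_or_hex = len(LOWER | HEX)             # a-f counted only once: 42
letter_then_digit = len(LOWER) * len(DIGITS)    # 26 * 10 = 260
four_digits = len(DIGITS) ** 4                  # 10,000
eight_alnum = one_alnum ** 8                    # 36^8

# The sneaky one: all 8-character strings, minus those with no digit at all.
with_a_digit = eight_alnum - len(LOWER) ** 8

# Worst-case brute-force time, assuming 1e-7 seconds per attempt.
SECONDS_PER_TRY = 1e-7
worst_case_days = eight_alnum * SECONDS_PER_TRY / 86400

print(one_alnum, one_lower_or_hex, letter_then_digit, four_digits)
print(eight_alnum, with_a_digit, round(worst_case_days, 2))
```

Under these assumptions the 8-character search space is exhausted in a few days, which is why rate-limiting matters more than the raw count.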
[Long optional digression/rumination on brute-force crypto-attacks.]
Note that this is a bit like the movies (The Matrix, Terminator, …)
where you see the heroic hacker, with their display whirring through
all possible input keys, seeing if entering that key lets them into the
system.
-
Often, you see the length of the key being tried get shorter — from
20 letters/digits down to length 19, length 18, etc., finally homing in
on the solution.
This only works if you can somehow determine something like "aha, after
trying some inputs, I know the first letter of the password must be 'K'".
Compare with the purported bike-lock-attack algorithm in class, where
you can determine which dial is causing the problem, and then within
10 steps figure out which digit that particular dial needs.
Not a very effective password system, if it can be broken like that!
-
- The system might be able to detect repeated login failures:
the code that does decrypting might be augmented to detect whether
there are 100,000 false logins within one second, all from
the same ATM/terminal/whatever.
This doesn't work so well in a big centralized system that doesn't
know where the attempt is coming from: E.g. a program trying to
break into paypal.com, where it makes up a random user account
and a random password and makes each attempt from a different IP address.
It's hard to distinguish this attack from regular (valid) failed logins
from across the country/web.
-
For this latter point, you might consider that if you get a failed
attempt, the password-authenticator program waits half a second
before reporting "access denied".
…of course, a smart break-in program might realize "hey, if an
attempt takes more than 20ms for an answer, it probably means that
I'm about to get an access-denied answer; I might as well proceed
to the next input".
Thus, password authenticators should consider designing a system where
either response — ``access permitted'' or ``access denied'' —
takes (say) a tenth of a second.
This still doesn't negate the brute force attack where the attacker
is setting up many different IP connections in parallel.
Active research is trying to distinguish such attacks from normal
large-scale usage. Even if you're not concerned with an attacker
actually guessing a password, you might care about denial-of-service
attacks (where the password verifier is busy processing requests,
and 99.99% of them are being generated by a bogus attacker,
meaning that valid requests take forever to process.)
[If interested, consider classes like Computer Security,
and/or Networking.]
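To see why a lock that reveals which dial is wrong is so weak, here is a minimal sketch of the bike-lock attack mentioned above. The function `first_wrong_dial` is a hypothetical leaky lock invented for illustration (a real lock or password checker should reveal nothing beyond pass/fail).

```python
def first_wrong_dial(secret, guess):
    """Hypothetical leaky lock: index of the first wrong dial, or None if it opens."""
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s != g:
            return i
    return None


def crack_with_leak(secret):
    """Fix the dials one at a time, using the leak to know when a dial is right."""
    guess = [0] * len(secret)
    attempts = 0
    for i in range(len(secret)):
        for digit in range(10):
            guess[i] = digit
            attempts += 1
            wrong = first_wrong_dial(secret, guess)
            if wrong is None or wrong > i:
                break  # dial i is now correct; move on to the next dial
    return guess, attempts


combo, tries = crack_with_leak([9, 9, 9, 9])
print(combo, tries)   # worst case: 10 tries per dial
```

With the leak, the work is at most 10 tries per dial (a sum), instead of 10^4 tries for the whole lock (a product).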
Pigeon-hole principle:
-
With 600 freshmen, is there a shared birthday?
-
With 2400 students total, multiple sharings?
-
How many kids have the same initials and birthdays?
[population of TX: 21million.]
-
- Houston population 4million; #hairs on head is bounded by 150,000
(triple that if you include beards); to be safe call it 500,000.
Different sites give slightly different estimates, e.g.
60-150k; 450k w/ neck;
100k;
100k-150k.
pigeon-hole principle: No matter how N pigeons are distributed among k
pigeon-holes, there must be at least ⌈N/k⌉ in some hole.
Proof by contradiction: if every hole held at most ⌈N/k⌉-1 pigeons, then
N ≤ k⋅(⌈N/k⌉-1) < k⋅(N/k) = N (since ⌈N/k⌉ < N/k + 1), so N < N.
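The bound ⌈N/k⌉ is easy to evaluate for the questions above. A small sketch, assuming 366 possible birthdays and hair counts ranging over 0 through 500,000 (so 500,001 holes); `pigeonhole_bound` is just an illustrative name.

```python
def pigeonhole_bound(pigeons, holes):
    """Some hole must contain at least ceil(pigeons / holes) pigeons."""
    return -(-pigeons // holes)   # integer ceiling division


print(pigeonhole_bound(600, 366))            # freshmen sharing a birthday
print(pigeonhole_bound(2400, 366))           # all students
print(pigeonhole_bound(4_000_000, 500_001))  # Houstonians with the same hair count
```

Note this only guarantees that some group of that size exists; it says nothing about which birthday or which hair count.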
Example:
The file system
HFS uses 16-bit addresses
to access disk sectors.
If a disk is 256GB (2^38 bytes), what is the minimum size of a sector?
(This is how much space must be used to store even a 1-byte file.
How many 1-byte files can be stored on the disk?)
The revised HFS+ uses 32-bit addresses.
Now what is the minimum file size?
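The arithmetic for this example can be sketched as follows, taking the 2^38-byte disk from the notes and assuming every sector has the same size and each address names exactly one sector. The helper name is made up for illustration.

```python
DISK_BYTES = 2 ** 38   # 256 GB, as stated in the example


def min_sector_bytes(address_bits, disk_bytes=DISK_BYTES):
    """Pigeonhole: disk bytes spread over 2^address_bits addressable sectors."""
    sectors = 2 ** address_bits
    return -(-disk_bytes // sectors)   # ceiling division


hfs = min_sector_bytes(16)        # 16-bit addresses
hfs_plus = min_sector_bytes(32)   # 32-bit addresses

print(hfs, hfs // 2 ** 20, "MB")  # bytes per sector, and the same in MB
print(2 ** 16)                    # how many 1-byte files fit with 16-bit addresses
print(hfs_plus)                   # bytes per sector with 32-bit addresses
```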
Permutations
Choosing k elements out of n, where order matters: "Permutations":
P(n,k) = n!/(n-k)!.
We have n choices for the first, and then regardless of this
we have n-1 choices for the second item, … through the kth item.
We say the choices are "independent".
(We'll later define x,y independent as meaning the ways to choose
x and ways to choose y can be multiplied together.)
Note that P(n,n)=n!,
and 0! = 1
(how many days for a class of 0 students to
arrange themselves, before a seating pattern is duplicated?)
If people say "a permutation of n things", they mean a (re)arrangement.
Thus, if a 4-letter password is using the letters a,b,c,d
there are P(4,4)=4! permutations of those letters.
Example:
In a LAN with n nodes, how many sender/receiver possibilities
(if you can't send to yourself)? I.e. how many connections.
If each message is also sent to one other member, a verifier (to ensure no personal mail?),
how many sender/receiver/verifier arrangements?
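A short check of P(n,k) against brute-force enumeration. The LAN size n = 5 is an arbitrary choice for illustration, and the sketch assumes sender, receiver, and verifier are three different nodes.

```python
import math
from itertools import permutations


def P(n, k):
    """Ordered choices of k out of n items: n!/(n-k)!."""
    return math.perm(n, k)


# The 4-letter password using a, b, c, d: all 4! rearrangements.
print(P(4, 4), len(list(permutations("abcd"))))

# LAN with n nodes (n = 5 here, chosen arbitrarily).
n = 5
nodes = range(n)
pairs = list(permutations(nodes, 2))     # (sender, receiver), no self-sends
triples = list(permutations(nodes, 3))   # (sender, receiver, verifier)
print(len(pairs), P(n, 2))
print(len(triples), P(n, 3))
```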
Combinations
Choosing k elements out of n, where order doesn't matter:
A subset of size k (instead of an ordered sequence).
C(n,k) = n!/(k!(n-k)!)
Note that
P(n,r) = C(n,r)⋅P(r,r)
Th'm : C(n,k) = C(n,n-k).
Example:
Paths on grid from (0,0) to (m,n).
(An isomorphic problem: bit strings of length m+n containing exactly m 1's.)
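A sketch of the grid-path count, assuming each step moves one unit right or one unit up. Every path is then a string of m rights and n ups, so choosing which of the m+n steps go right gives C(m+n, m); the second function double-checks that by adding up paths cell by cell.

```python
import math


def grid_paths(m, n):
    """Choose which m of the m+n steps go right."""
    return math.comb(m + n, m)


def grid_paths_dp(m, n):
    """Same count by addition: paths into a cell come from the left or from below."""
    ways = [[1] * (n + 1) for _ in range(m + 1)]
    for x in range(1, m + 1):
        for y in range(1, n + 1):
            ways[x][y] = ways[x - 1][y] + ways[x][y - 1]
    return ways[m][n]


print(grid_paths(2, 3), grid_paths_dp(2, 3))
```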
Example:
Binomial coeffs:
-
Consider (a+b)^3 = (a+b)(a+b)(a+b)
-
Consider (a+b)^18 = (a+b)(a+b)(a+b)…(a+b);
with n=18 factors, each term takes k a's and n-k b's,
and there are C(n,k) ways to choose which k factors supply the a's.
Pascal's Identity:
C(n+1,k)=C(n,k)+C(n,k-1)
[See book for Pascal's triangle — how to compute
C(,) via addition only.]
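Pascal's identity is enough to build every C(n,k) with additions alone. A minimal sketch (the function name is illustrative), checked against the factorial formula for n = 18:

```python
import math


def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle, built using only addition."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        middle = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows


rows = pascal_rows(18)
print(rows[3])       # coefficients of (a+b)^3
print(rows[18][:4])  # first few coefficients of (a+b)^18
```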
Permutations with repetition (identical elements)
The Contemporary Art Museum (CAM, corner of Bissonnet & Main, free, worthwhile)
is pushing their new CAM diet, where you get 3 mangos(M), 2 apples(A),
and a chocolate bar (C); each day you can have one of these for dessert.
Of course, each artist wants their own individual diet plan;
how many possible diets are there?
[Note: this is the same as the number of strings which
can be made out of the letters 'McMama']
Sol'n: Imagine all letters are distinguished
(not just two "A"s, but "A1" and "A2");
there are clearly 6! possibilities.
(Imagine the big long list of 720 strings.)
But for every entry we can match it with the swapped-A version:
each pair is the same, but our 6! is counting it twice.
Divide by 2, to compensate for over-counting the A's.
Even after this, we see that for every word, we can re-arrange the three M's
in any of 3! = 6 ways; thus every entry is part of a group of 6 identical
strings, and we count all 6 instead of just one.
Thus our final answer is
6!/(2!3!) = 60 artistically meaningful diets.
In general:
Out of n items, if you want to choose
k1 of Type 1,
k2 of Type 2,
…
ki of Type i
(where k1 + k2 + … + ki = n),
then there are
n!/(k1!⋅k2!⋅…⋅ki!)
ways of doing so.
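The formula can be checked against the "big long list" by brute force. A sketch, treating the letters of 'McMama' case-insensitively (three m's, two a's, one c); `multinomial` is an illustrative helper, not a library function.

```python
import math
from collections import Counter
from itertools import permutations


def multinomial(counts):
    """n!/(k1! * k2! * ... * ki!) where n is the sum of the counts."""
    total = math.factorial(sum(counts))
    for k in counts:
        total //= math.factorial(k)   # each division is exact
    return total


letters = "mcmama"
by_formula = multinomial(Counter(letters).values())

# Brute force: list all 6! orderings, then let a set collapse the duplicates.
by_listing = len(set(permutations(letters)))

print(by_formula, by_listing)
```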
Combinations with repetitions
A very different sort of problem:
You want to choose n items of r different types.
Example:
Leebron decrees that each college will
have its own lab, containing exactly 5 computers.
Each computer can be {sun,ibm,apple}.
Of course, each college wants to be unique and to
have a different lab-setup than all other colleges.
How many colleges can all have a different setup?
(we don't care where in the lab they're placed)
[start to enumerate; it gets complicated soon,
with no obvious way to figure out the over-counts.]
In general, the problem is "choose n items of r types"
(where all items of the same type are indistinguishable —
ie we only care about how many apples, and don't distinguish
them further by where they're located in the room.)
C(n+r-1,n) ways:
n+r-1 slots, where each slot will be either an item or a divider.
[equivalently: choose n of them to be non-dividers.]
C(n+r-1,r-1) = C(n+r-1,n) = ways to choose n items from r categories.
This isn't an obvious trick — very sneaky!
But happily, it's now in your bag of tools.
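For the lab example (n = 5 computers, r = 3 brands), the divider trick can be checked by letting Python enumerate the unordered selections. A sketch; `multichoose` is an illustrative name.

```python
import math
from itertools import combinations_with_replacement


def multichoose(n, r):
    """Ways to choose n items from r types: n items plus r-1 dividers in a row."""
    return math.comb(n + r - 1, n)


brands = ["sun", "ibm", "apple"]
labs = list(combinations_with_replacement(brands, 5))

print(multichoose(5, 3), len(labs))
print(labs[0], labs[-1])   # one all-sun lab, one all-apple lab
```

Each enumerated tuple is one lab setup where only the number of machines per brand matters.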
By the way —
What if we do care(count,distinguish) where in the lab they're placed,
how many possible labs are there?
Partitioning
(Not covered in lecture):
There are four types of partitioning problems (not all covered above):
How many ways to put
5 students into 3 classes, where:
-
each student and class distinguishable
[ie, Amy in Comp210 and Ben in Huma101 is different from vice-versa.]
[easy]
-
classes distinguished, students not
[ie Of 5 foreign-exchange students, how to divvy them
up between 3 classes]
[covered already in lecture]
-
students distinguished, classes not:
[ie how many ways to assign
Amy, Ben, Cat, Deng, and Eustace
into three classes; AB/CD/E is the same as CD/AB/E.]
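[combine ideas from above:
start from the count where both are distinguishable,
and note that we've overcounted by 3!, except that the
arrangements putting all five students in one class
are overcounted only by 3.]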
-
neither distinguished
[5 balls in 3 boxes, those boxes can move around
so 4/1/0 is same as 4/0/1, and 2/2/1 is the same as 2/2/1 :-]
This is how many multisets of at most 3 positive integers sum to 5;
Look up "Partitioning an integer" in Rosen.
(Hint: generalize to two variables:
consider partitions of m which use no elements larger than n.)
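With only 3^5 = 243 raw assignments, all four cases can be counted by brute force: list every assignment, then "forget" whichever labels are not distinguished. A sketch, assuming classes are allowed to be empty; the helper names are made up for illustration.

```python
from itertools import product

students = "ABCDE"
classes = range(3)

# Every way to give each of the 5 students one of the 3 class labels.
assignments = list(product(classes, repeat=len(students)))


def sizes(a):
    """Forget who the students are: just the head-count of each class."""
    return tuple(a.count(c) for c in classes)


def groups(a):
    """Forget the class labels: just the set of non-empty student groups."""
    blocks = frozenset(
        frozenset(s for s, c in zip(students, a) if c == k) for k in classes
    )
    return blocks - {frozenset()}


both = len(set(assignments))                                # students and classes distinguished
classes_only = len({sizes(a) for a in assignments})         # classes distinguished, students not
students_only = len({groups(a) for a in assignments})       # students distinguished, classes not
neither = len({tuple(sorted(sizes(a))) for a in assignments})  # neither distinguished

print(both, classes_only, students_only, neither)
```

Brute force like this does not scale, but it is a handy check on a formula before trusting it.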
See also
another
example to work on.
In actuality, Comp Sci uses C(n,r) all the time,
but the other versions (and tricks) are only useful occasionally.
So memorize combinations, and for other problems you'll retreat
to the textbook when the problem arises.
Inclusion-Exclusion
Quickly, Inclusion-Exclusion:
Let A be the set of Freshmen,
B the set of Jones students,
and C the set of econ majors.
What is the size of A ∪ B ∪ C?
Well, |A|+|B|+|C| is an over-count, since anybody in
multiple sets is being counted repeatedly.
[Draw Venn Diagram]
In particular, the people in B∩C are counted twice,
so subtract them out; similar for A∩B and A∩C.
|A|+|B|+|C| (note: C(3,1) choices)
- |A∩B| - |A∩C| - |B∩C| (note: C(3,2) choices).
This now correctly counts people in just one set,
and correctly counts people in exactly two,
but the people in all three were added three times
and then subtracted three times.
We must add them in once more:
Correct answer:
|A|+|B|+|C| (note: C(3,1) choices)
- |A∩B| - |A∩C| - |B∩C| (note: C(3,2) choices).
+ |A∩B∩C|
This generalizes to more than three:
To calculate the size of the union of four sets, it's
the size of all C(4,1) original sets,
minus the size of all C(4,2) 2-way intersections,
plus the size of all C(4,3) 3-way intersections,
minus the size of all C(4,4) 4-way intersections.
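The alternating pattern can be sketched for any number of sets. The four small sets below are made-up data purely for illustration, and the result is compared with taking the union directly.

```python
from itertools import combinations


def union_size(sets):
    """Inclusion-exclusion: add odd-way intersections, subtract even-way ones."""
    total = 0
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(sets, r):   # all C(len(sets), r) choices
            total += sign * len(set.intersection(*group))
    return total


A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6, 7}
D = {1, 7, 8}

print(union_size([A, B, C, D]), len(A | B | C | D))
```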
A note, for probability:
|A∪B∪C| ≤ |A|+|B|+|C|.
This can be a handy approximation, esp. when A, B, C are each small
compared to the entire domain, and/or when there is little-to-no overlap.
Note use of rhetoric:
On expressing doubts of the qualifications of a Supreme Court nominee,
a pundit is quoted in the N.Y.Times, saying
If 100 legal experts had each recommended 100 top candidates for the
Supreme Court, Mr. Will added, "Miers's name probably would not have
appeared in any of the 10,000 places on those lists."
It is saying something that 100 wouldn't include her, but that could
also mean she's universally agreed-upon as the 101st most qualified person
in the country. The actual rank being suggested is almost certainly
between 100 and 10,000.
http://www.nytimes.com/2005/10/06/politics/politicsspecial1/06nominate.html
(Attributed to "Mr. Will, a conservative essayist", about Bush's nominee.)