15-451 Algorithms * Number-theoretic algorithms II - Recap - primality testing (+ dealing with Carmichael #s) - complexity classes RP, co-RP, BPP - RSA ============================================================================== We ended class last time with Fermat's little theorem: if N is prime and A is between 1 and N-1, then A^{N-1} = 1 mod N. So, if we pick some A (like A=2) and compute A^{N-1} mod N (which we now know how to do efficiently), and find the result is not equal to 1, this is a PROOF that N is not prime, even though it gives us no information about how to factor N. Today we'll extend this further to get a randomized polynomial-time algorithm for testing if a number is prime or composite. RECAP ===== DEFN: Z_N^* = {A in 1..N such that GCD(A,N)=1} E.g., Z_15^* = {1,2,4,7,8,11,13,14} Recall, Z_N^* is a group under multiplication mod N. That means it's closed under the group operation (e.g., 7*8=11 mod 15), and also every A in Z_N^* has an inverse B in Z_N^* (AB=1 mod N). E.g., 2^{-1} = 8 mod 15. A REALLY IMPORTANT PROPERTY of groups: say G is a group and H is a subgroup of G (if A,B are in H then A*B is in H; if A is in H then A^{-1} is in H). Then THE SIZE OF H DIVIDES THE SIZE OF G. E.g., G = Z_15^*, H = {1,2,4,8}. (H is the powers of 2) (we proved this last time) THEOREM: for all A in Z_N^*, order(A) divides phi(N). [Recall, order(A) = smallest t>0 s.t. A^t = 1 (mod N). phi(N) = |Z_N^*|. ] PROOF: {1, A, A^2, ..., A^{t-1}} is a subgroup of Z_N^*: it's closed under multiplication mod N and taking inverses. So we just use our Really Important Property. COROLLARY 1: Euler's Theorem: for any A in Z_N^*, A^{phi(N)} = 1 (mod N). COROLLARY 2: Fermat's little theorem: If N is prime, then for any A in Z_N^*, A^{N-1} = 1 mod N. DEFN: N is a Carmichael number if N is composite but A^{N-1}=1 for all A in Z_N^*. PRIMALITY TESTING ================= We're now going to give a fast randomized algorithm for testing if a number is prime, with the following properties: if N is prime, then it outputs YES if N is composite, then it outputs NO with probability at least 1 - 1/2^100. So, if it says YES, you don't have a 100% proof that the number is prime, but you can be pretty confident. [Note, it was only very recently that a *deterministic* poly-time algorithm for this problem was developed] THEOREM: if N is composite but not a Carmichael number, then A^{N-1}!=1 for *at least half* of the A in Z_N^*. Proof: Let H = {A in Z_N^* : A^{N-1} = 1 mod N}. Then H is a subgroup of Z_N^* because it's closed under multiplication (if A^{N-1} = 1 and B^{N-1} = 1 then (AB)^{N-1}=1) and inverses (if AB = 1 then (AB)^{N-1} = 1, so if A^{N-1} = 1 then B^{N-1}=1). This means that if there is even a single element of G that is not in H (namely if N is not Carmichael) then |H| is at most |G|/2. So, right away we get that if there is even a single witness to N being composite, there must be a *lot* of witnesses. (QED So, if we ignore the Carmichael numbers, then the algorithm we want is just this: - pick 100 random values A between 1 and N. - If all have A^{N-1} = 1 mod N then output YES (Probably Prime) - Else output NO (Definitely Composite) [don't even need to test GCD since if GCD is not 1 then for sure A^{N-1} != 1 mod N] The trick for Carmichael numbers is it turns out they are easy to factor. (They are also pretty rare. Smallest is 561.) Combining these two gives us the Miller-Rabin primality test. More on Carmichael numbers: We're going to be able to factor Carmichael numbers using the following idea. Suppose we have some number x that's not 1 or -1 mod N, such that x^2 = 1 mod N. E.g., 11^2 = 1 mod 15. This means that (x-1)*(x+1) is a multiple of N, even though neither x-1 nor x+1 is. So, only some of the factors of N must go into x-1 and the others into x+1 (like 10,12 for N=15). This means GCD(x-1,N) gives us a factor of N (as does GCD(x+1,N)), and recall that GCD is something WE CAN COMPUTE EFFICIENTLY! The way we will find such an x is via the following key lemma: KEY LEMMA: Suppose N is odd, not a prime power or perfect square, and let t < N. If N is composite and there exists x in Z_N^* such that x^t != 1 (mod N), then at least half of x in Z_N^* have x^t != {-1,+1} mod N. Furthermore, if t is ODD, then such an x exists. Proof of KEY LEMMA: Don't have time to go through details but one step is you show the set of x such that x^t *is* in {-1,1} is a subgroup, and then you also use something called the Chinese Remainder Theorem. Proof of factoring Carmichael given our key lemma: First, we can trivially handle Ns that are even, prime-powers, or perfect squares, so we can assume the lemma above holds. Now, take N-1 and pull out all powers of 2, so that we have N-1 = B * 2^k, where B is an odd number. Now, consider exponents t = B, 2B, 4B, 8B, ..., N-1. The lemma says that we can put them into two categories: (1) all A in Z_N^* have A^t = 1 mod N (2) at least half of A have A^t != 1 or -1 (mod N) The lemma tells us that t=B is in category (2), and the fact that N is Carmichael tells us that N-1 is in category 1. So the world looks something like this: t = B 2B 4B 8B ... N-1 --------------------------------------------------- category: 2 2 1 1 ... 1 ^ call this point t_critical Now, pick random A and compute A^B, A^{2B},..., A^{N-1}. Define t_critical as largest exponent in category (2). By definition, there is at least 1/2 chance that A^{t_critical} != {1,-1}. Call this x. So, x is not 1 or -1, but x^2 = 1 mod N. So, we use this to factor! A little more complexity theory =============================== Turns out there are a number of interesting complexity classes that you can define in terms of a polynomial-time algorithm V that takes in two inputs: an instance I, and an auxiliary string w, where we assume that the length of w is polynomial in the size of the instance. NP: A is in NP if there is a polynomial-time algorithm V(I,w) such that: If "I" is in YES(A), then there exists w such that V(I,w) = YES. If "I" is in NO(A), then for all w, V(I,w) = NO. Co-NP: other way around from NP: swap YES and NO. RP (randomized polynomial time): A is in RP if exists poly-time V s.t.: If "I" is in YES(A), then for at least half of w, V(I,w) = YES. If "I" is in NO(A), then for all w, V(I,w) = NO. Co-RP: the other way around. BPP: two-sided error If "I" is in YES(A), then for at least 2/3 of w, V(I,w) = YES. If "I" is in NO(A), then for at least 2/3 of w, V(I,w) = NO. Can boost up the 1/2, 2/3 by repetition. We showed primality in Co-RP. RSA PUBLIC-KEY CRYPTOGRAPHY =========================== RSA is an answer to the question of "how can two people communicate securely if they have never met to exchange secret keys before?". Answer is to somehow separate encryption and decryption. Each person has a public encryption key that's published in the phone book, and then their own private decryption key that's kept secret. To send a msg to A, you look them up in the phone book and use their public key. The point about public-key crypto is that just because you can encrypt a message to A doesn't mean you can decrypt anyone else's msgs sent to A. RSA: Person A (Alice) subscribes by (1) picking two large primes p and q (say 100 decimal digits long) and computing N = p*q. (2) Picking a small odd number e relatively prime to (p-1)*(q-1). E.g., e=3. (3) Computing d = e^{-1} mod phi(N), where phi(N) = (p-1)*(q-1). (4) Publishing the pair (e,N) as her PUBLIC KEY in a global phone book, and keeping the pair (d,N) secret as her SECRET KEY. The primes p and q can now be thrown away. Person B(Bob) sends msg M to Alice by computing x = M^e mod N, and sending x. (5) To decrypt, Alice computes x^d mod N, which is M. [Ignoring various issues like: might want to prepend garbage onto M so that eavesdropper can't guess-and-check] Let's now look at details of how/why all this works: STEP 1: a reasonable proportion of random numbers are prime. So can just pick random numbers and test, until we get two primes. STEP 3: just requires computing inverse which we know how to do. STEP (5): Why do we get back M? Answer is that this comes from Euler's theorem. x^d = M^{de} mod N. By definition, de = 1 + k*phi(N). So, M^{1 + k*phi(N)} = M*M^{phi(N)^k} = M*1^k = M (mod N). Also: use fast exponentiation here since d might be a large number. ====================================================================== Why might we expect RSA to be secure? Here is one fact: given N and e, finding the decryption key d is as hard as factoring. (Though this doesn't say there might not be some other way of decrypting a message that people haven't thought of).