15-451 Algorithms                 * Approximation Algorithms   4/10/2008
Author: Avrim Blum                    - vertex cover
Instructor: Manuel Blum               - set cover
                                      - MAX-3-SAT
======================================================================

NP-completeness summary:
========================
- Picture of P, NP, co-NP, PSPACE, Turing-computable functions.
  (PSPACE = problems solvable with a polynomial amount of memory.
  Anything in NP is also in PSPACE, since with polynomial space you
  can just keep a counter that tries all possible proof strings,
  running each one through the verifier to see if any of them work.)

- NP-complete problems: in NP *and* capture the essence of the entire
  class, in that a polynomial-time algorithm for any one of them would
  let you solve anything in NP.

- To prove that problem A is NP-complete, after showing A is in NP
  (usually the easy part), it's sufficient to take some problem B that
  you already know to be NP-complete (like 3-SAT) and show how you
  could use an algorithm for problem A to solve problem B.  I.e., to
  reduce B to A.

APPROXIMATION ALGORITHMS
========================
General question: Maybe we can't hope for a fast algorithm that always
gets the best solution, but can we at least guarantee to get a "pretty
good" solution?  E.g., can we guarantee to find a solution that's
within 10% of optimal?  Or within a factor of 2 of optimal?  Or
anything non-trivial?

Interesting thing: even though all NP-complete problems are equivalent
in terms of the difficulty of finding an optimal solution, the
difficulty of getting a good approximation varies all over the map.

VERTEX COVER
============
- GIVEN: a graph G.  GOAL: find the smallest set of vertices such that
  every edge is incident to (touches) at least one vertex in the set.

- Example:    +----+----+
              |    |    |
              +----+----+

- Can think of it as: what is the fewest number of guards we need to
  place in a museum to cover all the corridors?

- This problem is NP-hard.  But it turns out that for any graph G we
  can get within a factor of 2.  Let's start with some strategies that
  *don't* work.

Strawman #1: Pick an arbitrary vertex with at least one uncovered edge
  incident to it, put it into the cover, and repeat.  What would be a
  bad example?  [A: how about a star graph]

Strawman #2: How about picking the vertex that covers the *most*
  uncovered edges?  Turns out this doesn't work either.  [Make a
  bipartite graph where opt is size t, but this alg finds one of size
  t + floor(t/2) + floor(t/3) + floor(t/4) + ... + 1.  This is
  Theta(t log t).  Best examples are with t=6 or t=12.]

How to get a factor of 2.  Two algorithms:

Alg1: Pick an arbitrary edge.  We know any VC must contain at least
  one of its endpoints, so let's take both.  Then throw out all edges
  covered and repeat.  Keep going until no uncovered edges are left.

  What we've found in the end is a matching (a set of edges, no two of
  which share an endpoint) that's "maximal" (meaning you can't add any
  more edges to it and keep it a matching).  This means that if we
  take all the endpoints, we have a VC.  So, if we picked k edges, our
  VC has size 2k.  But any VC must have size at least k, since it
  needs at least one endpoint of each of these edges (and since these
  edges don't touch, those are k *different* vertices).
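To make Alg1 concrete, here is a minimal Python sketch (not part of
the original notes; the edge-list input format and the function name
are just illustrative assumptions):

    def vertex_cover_2approx(edges):
        # Take both endpoints of each still-uncovered edge.  The chosen
        # edges form a maximal matching, so if we pick k of them the
        # cover has size 2k, while any VC needs at least k vertices.
        cover = set()
        for (u, v) in edges:
            if u not in cover and v not in cover:   # edge not yet covered
                cover.add(u)
                cover.add(v)
        return cover

    # Star graph: the optimal cover is just the center {0}, size 1;
    # this algorithm returns 2 vertices -- within the factor-of-2 bound.
    print(vertex_cover_2approx([(0, 1), (0, 2), (0, 3), (0, 4)]))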
Here's another 2-approximation algorithm for Vertex Cover:

Alg2: Step 1: Solve a *fractional* version of the problem.  Have a
  variable x_i for each vertex, with the constraint 0 <= x_i <= 1.
  Think of x_i = 1 as picking the vertex, x_i = 0 as not picking it,
  and in-between as "partially picking it".  Each edge should be
  covered, in that for each edge (i,j) we want x_i + x_j >= 1.  Our
  goal is then to minimize sum_i x_i.  We can solve this using linear
  programming.

  This is called an "LP relaxation" because any true vertex cover is a
  feasible solution, but we've made the problem easier by allowing
  fractional solutions too.  So, the value of the optimal solution now
  will be at least as good as the smallest vertex cover, maybe even
  better, but it just might not be legal any more.  E.g., the triangle
  graph (setting each x_i = 1/2 gives value 3/2, while the smallest VC
  has size 2).  E.g., the star graph.

  Step 2: Now pick each vertex i such that x_i >= 1/2.

Claim 1: This is a VC.  Why?  [Since x_i + x_j >= 1 for each edge
  (i,j), at least one endpoint has value >= 1/2, so we get at least
  one endpoint of each edge.]

Claim 2: The size of this VC is at most twice the size of the optimal
  VC.  Why?  Let OPT_frac be the value of the optimal fractional
  solution, and OPT_VC be the size of the smallest vertex cover.
  First, as we noted above, OPT_frac <= OPT_VC.  Second, our solution
  has size at most 2*OPT_frac, since it's no worse than doubling every
  x_i and rounding down.  Putting these together, our solution has
  size at most 2*OPT_VC.

Interesting fact: nobody knows any algorithm with approximation ratio
1.9.  The best known is 2 - O(1/sqrt(log n)), which is 2 - o(1).  The
current best hardness result: Hastad showed that beating 7/6 is
NP-hard, improved to 1.361 by Dinur and Safra.  Beating 2 - epsilon
has been related to some other open problems, but is not known to be
NP-hard.

SET-COVER
---------
Set-cover: Given a domain X of n points, and m subsets S_1, S_2, ...,
S_m of these points.  Goal: find the fewest number of these subsets
needed to cover all the points.

Set-cover is NP-hard.  However, there's a simple algorithm that gets a
ratio of ln(n):

Greedy Algorithm: Pick the set that covers the most points.  Throw out
  all the points covered.  Repeat.

What's an example where this *doesn't* find the best solution?

Theorem: If the optimal solution uses k sets, the greedy algorithm
  finds a solution with at most k*ln(n) sets.

Proof: Since the optimal solution uses k sets, there must be some set
  that covers at least a 1/k fraction of the points.  Therefore, after
  the first iteration of the algorithm, at most n(1 - 1/k) points are
  left.  After the second, at most n(1 - 1/k)^2 points are left, and
  so on.  After k rounds, at most n(1 - 1/k)^k < n*(1/e) points are
  left.  After k*ln(n) rounds, fewer than n*(1/e)^{ln n} = 1 points
  are left, which means we must be done.

In fact, it's been proven that unless everything in NP can be solved
in time n^{O(loglog n)}, you can't do better than (1-epsilon)*ln(n)
[Feige].

MAX-SAT: Given a CNF formula (like in SAT), try to maximize the number
of clauses satisfied.  To make things cleaner, let's assume we have
reduced each clause [so, (x or x or y) would become just (x or y), and
(x or not(x)) would be removed].

Claim: If every clause has size exactly 3 (this is sometimes called
  the MAX-exactly-3-SAT problem), then there is a simple randomized
  algorithm that can satisfy at least a 7/8 fraction of the clauses.
  So, this is for sure at least a 7/8-approximation.

Proof: Just try a random assignment to the variables.  Each clause has
  a 7/8 chance of being satisfied, so if there are m clauses total,
  the expected number satisfied is (7/8)m.  If the assignment
  satisfies fewer than this, just repeat.  Since the number of clauses
  satisfied is bounded (it's an integer between 0 and m), with high
  probability it won't take too many tries before you do at least as
  well as the expectation.
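Here is a small Python sketch of this randomized procedure (again not
from the notes; it assumes clauses are encoded as lists of nonzero
integers, where a negative integer means a negated variable):

    import random

    def count_satisfied(clauses, assignment):
        # A clause is satisfied if at least one of its literals is true.
        return sum(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    def random_max_3sat(clauses, n):
        # Draw uniform random assignments until at least (7/8)m clauses
        # are satisfied; a single draw satisfies (7/8)m in expectation.
        target = 7 * len(clauses) / 8
        while True:
            a = {i: random.random() < 0.5 for i in range(1, n + 1)}
            if count_satisfied(clauses, a) >= target:
                return a

    # Variables x1..x3; e.g. the clause (x1 or not-x2 or x3) is [1, -2, 3].
    print(random_max_3sat([[1, -2, 3], [-1, 2, -3], [1, 2, 3]], n=3))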
How about a deterministic algorithm?  Here's a nice way we can do
that.  First, let's generalize the above statement to talk about
general CNF formulas.

Claim: Suppose we have a CNF formula of m clauses, with m_1 clauses of
  size 1, m_2 of size 2, etc. (m = m_1 + m_2 + ...).  Then a random
  assignment satisfies sum_k m_k(1 - 1/2^k) clauses in expectation.

Proof: linearity of expectation.

Now, here is a deterministic algorithm: Look at x_1.  For each of its
two possible settings (0 or 1), we can calculate the expected number
of clauses satisfied if we were to go with that setting and then set
the rest of the variables randomly.  (It is just the number of clauses
already satisfied plus sum_k m_k(1 - 1/2^k), where m_k is the number
of clauses of size k in the "formula to go".)  Fix x_1 to the setting
that gives us the larger expectation.  Now go on to x_2 and do the
same thing, setting it to the value with the higher expectation-to-go,
and then x_3, and so on.  The point is that since we always pick the
setting whose expectation-to-go is larger, this expectation-to-go
never decreases (our current expectation is the average of the two we
get by setting the next variable to 0 or 1).  This is called the
"conditional expectation" method; see the sketch at the end of these
notes.  The algorithm itself is completely deterministic --- in fact,
we could rewrite it to get rid of any hint of randomization by just
viewing sum_k m_k(1 - 1/2^k) as a way of weighting the clauses to
favor the small ones, but our motivation was based on the randomized
method.

Interesting fact: getting a 7/8 + epsilon approximation for any
constant epsilon (like .001) for MAX-exactly-3-SAT is NP-hard.

In general, the area of approximation algorithms and approximation
hardness is a major area of algorithms research, and it occupies a
good fraction of the major algorithms conferences.
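For concreteness, here is a minimal Python sketch of the
conditional-expectations method described above, using the same clause
encoding as the earlier sketch (the function names and encoding are
illustrative assumptions, not the notes' own code):

    def expected_satisfied(clauses, partial):
        # Expected number of satisfied clauses when the variables in
        # `partial` are fixed and every remaining variable is a fair coin.
        total = 0.0
        for clause in clauses:
            unset = 0
            already_satisfied = False
            for lit in clause:
                v = abs(lit)
                if v in partial:
                    if partial[v] == (lit > 0):
                        already_satisfied = True
                        break
                else:
                    unset += 1
            if already_satisfied:
                total += 1
            elif unset > 0:
                total += 1 - 0.5 ** unset   # clause of "size-to-go" unset
        return total

    def derandomized_maxsat(clauses, n):
        # Fix x_1, x_2, ... in turn to whichever value keeps the
        # expectation-to-go at least as large; it never decreases, so the
        # final assignment satisfies >= sum_k m_k (1 - 1/2^k) clauses.
        partial = {}
        for i in range(1, n + 1):
            if expected_satisfied(clauses, {**partial, i: True}) >= \
               expected_satisfied(clauses, {**partial, i: False}):
                partial[i] = True
            else:
                partial[i] = False
        return partial

    print(derandomized_maxsat([[1, -2, 3], [-1, 2], [2, 3]], n=3))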