(It-Ebooks-2017) It-Ebooks - Design and Analysis of Algorithms Lecture Notes (MIT 6.046J) - Ibooker It-Ebooks (2017) PDF

Lecture 1 Introduction 6.
046J Spring 2015
Lecture 1: Introduction
6.006 pre-requisite:
• Data structures such as heaps, trees, graphs
• Algorithms for sorting, shortest paths, graph search, dynamic programming
Course Overview
This course covers several modules:
1. Divide and Conquer - FFT, Randomized algorithms
2. Optimization - greedy and dynamic programming
3. Network Flow
4. Intractibility (and dealing with it)
5. Linear programming
6. Sublinear algorithms, approximation algorithms
7. Advanced topics
Theme of today’s lecture

Very similar problems can have very different complexity. Recall:
• P: class of problems solvable in polynomial time. O(nk ) for some constant k.
Shortest paths in a graph can be found in O(V 2 ) for example.
• NP: class of problems verifiable in polynomial time.

Hamiltonian cycle in a directed graph G(V, E) is a simple cycle that contains
each vertex in V .
Determining whether a graph has a hamiltonian cycle is NP-complete but ver
ifying that a cycle is hamiltonian is easy.
• NP-complete: problem is in NP and is as hard as any problem in NP.

If any NPC problem can be solved in polynomial time, then every problem in
NP has a polynomial time solution.
1
Lecture 1 Introduction 6.046J Spring 2015
Interval Scheduling
Requests 1, 2, . . . , n, single resource
s(i) start time, f (i) finish time, s(i) < f (i) (start time must be less than finish
time for a request)
Two requests i and j are compatible if they don’t overlap, i.e., f (i) ≤ s(j) or
f (j) ≤ s(i).
In the figure below, requests 2 and 3 are compatible, and requests 4, 5 and 6 are
compatible as well, but requests 2 and 4 are not compatible.
2 3
4 5 6
Goal: Select a compatible subset of requests of maximum size.

Claim: We can solve this using a greedy algorithm.
A greedy algorithm is a myopic algorithm that processes the input one piece at a
time with no apparent look ahead.
Greedy Interval Scheduling

1. Use a simple rule to select a request i.
2. Reject all requests incompatible with i.
3. Repeat until all requests are processed.
2
Possible rules?
1. Select request that starts earliest, i.e., minimum s(i).
Long one is earliest. Bad :(
2. Select request that is smallest, i.e., minimum f (i) − s(i).
Smallest. Bad :(
3. For each request, find number of incompatibles, and select request with mini
mum such number.
Least # of incompatibles. Bad :(
4. Select request with earliest finish time, i.e., minimum f (i).
Claim 1. Greedy algorithm outputs a list of intervals
< s(i1 ), f (i1 ) >, < s(i2 ), f (i2 ) >, . . . , < s(ik ), f (ik ) >
such that
s(i1 ) < f (i1 ) ≤ s(i2 ) < f (i2 ) ≤ . . . ≤ s(ik ) < f (ik )
Proof. Simple proof by contradiction – if f (ij ) > s(ij+1 ), interval j and j +1 intersect,
which is a contradiction of Step 2 of the algorithm!
Claim 2. Given list of intervals L, greedy algorithm with earliest finish time produces
k ∗ intervals, where k ∗ is optimal.
Proof. Induction on k ∗ .
Base case: k ∗ = 1 – this case is easy, any interval works.
Inductive step: Suppose claim holds for k ∗ and we are given a list of intervals
whose optimal schedule has k ∗ + 1 intervals, namely
S ∗ [1, 2, . . . , k ∗ + 1] =< s(j1 ), f (j1 ) >, . . . , < s(jk∗ +1 ), f (jk∗ +1 ) >
3
Say for some generic k, the greedy algorithm gives a list of intervals
S[1, 2, . . . , k] =< s(i1 ), f (i1 ) >, . . . , < s(ik ), f (ik ) >
By construction, we know that f (i1 ) ≤ f (j1 ), since the greedy algorithm picks the
earliest finish time.
Now we can create a schedule
S ∗∗ =< s(i1 ), f (i1 ) >, < s(j2 ), f (j2 ) >, . . . , < s(jk∗ +1 ), f (jk∗ +1 ) >
since the interval < s(i1 ), f (i1 ) > does not overlap with the interval < s(j2 ), f (j2 ) >
and all intervals that come after that. Note that since the length of S ∗∗ is k ∗ + 1, this
schedule is also optimal.
Now we proceed to define L' as the set of intervals with s(i) ≥ f (i1 ).
Since S ∗∗ is optimal for L, S ∗∗ [2, 3, . . . , k ∗ + 1] is optimal for L' , which implies
that the optimal schedule for L' has k ∗ size.
We now see by our initial inductive hypothesis that running the greedy algorithm
on L' should produce a schedule of size k ∗ . Hence, by our construction, running the
greedy algorithm on L' gives us S[2, . . . , k].
This means k − 1 = k ∗ or k = k ∗ + 1, which implies that S[1, . . . , k] is indeed
optimal, and we are done.
Weighted Interval Scheduling

Each request i has weight w(i). Schedule subset of requests that are non-overlapping
with maximum weight.
A key observation here is that the greedy algorithm no longer works.
Dynamic Programming
We can define our sub-problems as
Rx = {j ∈ R|s(j) ≥ x}
Here, R is the set of all requests.

If we set x = f (i), then Rx is the set of requests later than request i.
Total number of sub-problems = n (one for each request)
Only need to solve each subproblem once and memoize.
We try each request i as a possible first. If we pick a request as the first, then the
remaining requests are Rf (i) .
4
Note that even though there may be requests compatible with i that are not in
f (i)
R , we are picking i as the first request, i.e., we are going in order.
opt(R) = max (w(i) + opt(Rf (i) ))

1≤i≤n
2
Total running time is O(n ) since we need O(n) time to solve each sub-problem.
Turns out that we can actually reduce the overall complexity to O(n log n). We
leave this as an exercise.
Non-identical machines
As before, we have n requests {1, 2, . . . , n}. Each request i is associated with a start
time s(i) and finish time f (i), m different machine types as well τ = {T1 , . . . , Tm }.
Each request i is associated with a set Q(i) ⊆ τ that represents the set of machines
that request i can be serviced on.
Each request has a weight of 1. We want to maximize the number of jobs that
can be scheduled on the m machines.
This problem is in NP, since we can clearly check that a given subset of jobs with
machine assignments is legal.
Can k ≤ n requests be scheduled? This problem is NP-complete.
Maximum number of requests that should be scheduled? This problem is NP-hard.
Dealing with intractability

1. Approximation algorithms: Guarantee within some factor of optimal in poly
nomial time.
2. Pruning heuristics to reduce (possible exponential) runtime on “real-world”

examples.
3. Greedy or other sub-optimal heuristics that work well in practice but provide
no guarantees.
5
MIT OpenCourseWare
http://ocw.mit.edu
6.046J / 18.410J Design and Analysis of Algorithms

Spring 2015
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Lecture 2 Divide and Conquer 6.046J Spring 2015
Lecture 2: Divide and Conquer
• Paradigm
• Convex Hull
• Median finding
Paradigm
Given a problem of size n divide it into subproblems of size nb , a ≥ 1, b > 1. Solve each
subproblem recursively. Combine solutions of subproblems to get overall solution.
n
T (n) = aT ( ) + [work for merge]
b
Convex Hull
Given n points in plane
S = {(xi , yi )|i = 1, 2, . . . , n}
assume no two have same x coordinate, no two have same y coordinate, and no
three in a line for convenience.
Convex Hull ( CH(S) ): smallest polygon containing all points in S.
v r
p u
t s
CH(S) represented by the sequence of points on the boundary in order clockwise

as doubly linked list.
p q r s t
Brute force for Convex Hull

Test each line segment to see if it makes up an edge of the convex hull
• If the rest of the points are on one side of the segment, the segment is on the
convex hull.
• else the segment is not.
O(n2 ) edges, O(n) tests ⇒ O(n3 ) complexity
Can we do better?
Divide and Conquer Convex Hull

Sort points by x coord (once and for all, O(n log n))
For input set S of points:
• Divide into left half A and right half B by x coords
• Compute CH(A) and CH(B)
• Combine CH’s of two halves (merge step)
How to Merge?
A B
a4 b2
a5
a1
a2 b1
b3
a3
L
• Find upper tangent (ai , bj ). In example, (a4 , b2 ) is U.T.
• Find lower tangent (ak , bm ). In example, (a3 , b3 ) is L.T.
• Cut and paste in time Θ(n).
First link ai to bj , go down b ilst till you see bm and link bm to ak , continue along
the a list until you return to ai . In the example, this gives (a4 , b2 , b3 , a3 ).
Finding Tangents
Assume ai maximizes x within CH(A) (a1 , a2 , . . . , ap ). b1 minimizes x within CH(B)
(b1 , b2 , . . . , bq )
L is the vertical line separating A and B. Define y(i, j) as y-coordinate of inter
section between L and segment (ai , bj ).
Claim: (ai , bj ) is uppertangent iff it maximizes y(i, j).
If y(i, j) is not maximum, there will be points on both sides of (ai , bj ) and it
cannot be a tangent.
Algorithm: Obvious O(n2 ) algorithm looks at all ai , bj pairs. T (n) = 2T (n/2) +
Θ(n2 ) = Θ(n2 ).
1 i=1
2 j=1
3 while (y(i, j + 1) > y(i, j) or y(i − 1, j) > y(i, j))
4 if (y(i, j + 1) > y(i, j)) [ move right finger clockwise
5 j = j + 1( mod q)
6 else
7 i = i − 1( mod p) [ move left finger anti-clockwise
8 return (ai , bj ) as upper tangent
Similarly for lower tangent.

n
T (n) = 2T ( ) + Θ(n) = Θ(n log n)
2
Intuition for why Merge works
ap-1 b3 b4
ap b2
b1
a1 bq bq-1
a2
a1 , b1 are right most and left most points. We move anti clockwise from a1 ,
clockwise from b1 . a1 , a2 , . . . , aq is a convex hull, as is b1 , b2 , . . . , bq . If ai , bj is such
that moving from either ai or bj decreases y(i, j) there are no points above the (ai , bj )
line.
The formal proof is quite involved and won’t be covered.
Median Finding
Given set of n numbers, define rank(x) as number of numbers in the set that are ≤ x.
Find element of rank l n+1
2
J (lower median) and I n+1
2
l (upper median).
Clearly, sorting works in time Θ(n log n).
Can we do better?
B x C
k - 1 elements h - k elements
Select(S, i)
1 Pick x ∈ S [ cleverly
2 Compute k = rank(x)
3 B = {y ∈ S|y < x}
4 C = {y ∈ S|y > x}
5 if k = i
6 return x
7 else if k > i
8 return Select(B, i)
9 else if k < i
10 return Select(C, i − k)
Picking x Cleverly
Need to pick x so rank(x) is not extreme.
• Arrange S into columns of size 5 (I n5 l cols)
• Sort each column (bigger elements on top) (linear time)
• Find “median of medians” as x
>x
larger
medians x
smaller
<x
How many elements are guaranteed to be > x?

Half of the I n5 l groups contribute at least 3 elements > x except for 1 group with
less than 5 elements and 1 group that contains x.
n n
At lease 3(I 10 l − 2) elements are > x, and at least 3(I 10 l − 2) elements are < x
Recurrence:
O(1), for n ≤ 140

T (n) = (1)
T (I n5 l) + T ( 710n + 6), Θ(n), for n > 140
Solving the Recurrence

Master theorem does not apply. Intuition n5 + 710n < n.
Prove T (n) ≤ cn by induction, for some large enough c.
True for n ≤ 140 by choosing large c
n 7n
T (n) ≤ cI l + c( + 6) + an (2)
5 10
cn 7nc
≤ +c+ + 6c + an (3)
5 10
cn
= cn + (− + 7c + an) (4)
10
70c
If c ≥ n
+ 10a, we are done. This is true for n ≥ 140 and c ≥ 20a.
Appendix 1
Example
b1 b2
a4 b4 b3
a3
a2 a1 L
a3 , b1 is upper tangent. a4 > a3 , b2 > b1 in terms of Y coordinates.
a1 , b3 is lower tangent, a2 < a1 , b4 < b3 in terms of Y coordinates.
ai , bj is an upper tangent. Does not mean that ai or bj is the highest point.

Similarly, for lower tangent.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 3 Fast Fourier Transform 6.046J Spring 2015
Lecture 3: Divide and Conquer:

Fast Fourier Transform
• Polynomial Operations vs. Representations
• Divide and Conquer Algorithm
• Collapsing Samples / Roots of Unity
• FFT, IFFT, and Polynomial Multiplication
Polynomial operations and representation

A polynomial A(x) can be written in the following forms:
A(x) = a0 + a1 x + a2 x2 + · · · + an−1 xn−1

n1
= ak xk
k=0
= (a0 , a1 , a2 , . . . , an−1 ) (coefficient vector)
The degree of A is n − 1.
Operations on polynomials
There are three primary operations for polynomials.
1. Evaluation: Given a polynomial A(x) and a number x0 , compute A(x0 ). This

can be done in O(n) time using O(n) arithmetic operations via Horner’s rule.
• Horner’s Rule: A(x) = a0 + x(a1 + x(a2 + · · · x(an−1 ) · · · )). At each step,

a sum is evaluated, then multiplied by x, before beginning the next step.
Thus O(n) multiplications and O(n) additions are required.
2. Addition: Given two polynomials A(x) and B(x), compute C(x) = A(x) +
B(x) (∀x). This takes O(n) time using basic arithmetic, because ck = ak + bk .
1
3. Multiplication: Given two polynomials A(x) and B(x), compute C(x) =

A(x)·B(x) (∀x). Then ck = kj=0 aj bk−j for 0 ≤ k ≤ 2(n−1), because the degree
of the resulting polynomial is twice that of A or B. This multiplication is then
equivalent to a convolution of the vectors A and reverse(B). The convolution
is the inner product of all relative shifts, an operation also useful for smoothing
etc. in digital signal processing.
• Naive polynomial multiplication takes O(n2 ).

• O(nlg 3 ) or even O(n1+ε ) (∀ε > 0) is possible via Strassen-like divide-and
conquer tricks.
• Today, we will compute the product in O(n lg n) time via Fast Fourier
Transform!
Representations of polynomials
First, consider the different representations of polynomials, and the time necessary
to complete operations based on the representation.
There are 3 main representations to consider.
1. Coefficient vector with a monomial basis
2. Roots and a scale term
• A(x) = (x − r0 ) · (x − r1 ) · · · · · (x − rn−1 ) · c
• However, it is impossible to find exact roots with only basic arithmetic
operations and kth root operations. Furthermore, addition is extremely
hard with this representation, or even impossible. Multiplication simply
requires roots to be concatenated, and evaluation can be completed in
O(n).
3. Samples: (x0 , y0 ), (x1 , y1 ), . . . , (xn−1 , yn−1 ) with A(xi ) = yi (∀i) and each xi
is distinct. These samples uniquely determine a degree n − 1 polynomial A,
according to the Lagrange and Fundamental Theorem of Algebra. Addition
and multiplication can be computed by adding and multiplying the yi terms,
assuming that the xi ’s match. However, evaluation requires interpolation.
The runtimes for the representations and the operations is described in the table
below, with algorithms for the operations versus the representations.
2
Algorithms vs. Representations

Coefficients Roots Samples
Evaluation O(n) O(n) O(n2 )
Addition O(n) ∞ O(n)
Multiplication O(n )
2
O(n) O(n)
We combine the best of each representation by converting between coefficients and

samples in O(n lg n) time.
How? Consider the polynomial in matrix form.
⎡ ⎤⎡ ⎤ ⎡ ⎤
1 x0 x20 · · · xn0 −1 a0 y0
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢1 x1 x21 · · · x1n−1 ⎥ ⎢ a1 ⎥ ⎢ y1 ⎥
⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ ⎥⎢ a ⎥ ⎢ y ⎥
V · A = ⎢ 1 x2 x22 · · · xn−1
2 ⎥⎢ 2 ⎥ = ⎢ 2 ⎥
⎢ . ⎥ ⎢ ⎥ ⎢ ⎥
⎢ .. .
. .. ... . ⎥ ⎢ .. ⎥ ⎢ .. ⎥
⎣ . . . . ⎦⎣ . ⎦ ⎣ . ⎦
−1
1 xn−1 xn−1
2
··· xnn−1 an−1 yn−1
where V is the Vandermonde matrix with entries vjk = xkj .

Then we can convert between coefficients and samples using the matrix vector
product V · A, which is equivalent to evaluation. This takes O(n2 ).
Similarly, we can samples to coefficients by solving V \Y (in MATLAB® notation).
This takes O(n3 ) via Gaussian elimination, or O(n2 ) to compute A = V −1 · Y , if V −1
is precomputed.
To do better than Θ(n2 ) when converting between coefficients and samples, and
vice versa, we need to choose special values for x0 , x1 , . . . , xn−1 . Thus far, we have
only made the assumption that the xi values are distinct.
Divide and Conquer Algorithm

We can formulate polynomial multiplication as a divide and conquer algorithm with
the following steps for a polynomial A(x) ∀ x ∈ X.
1. Divide the polynomial A into its even and odd coefficients:

1n
2
−11
X
Aeven (x) = a2k xk = (a0 , a2 , a4 , . . .)
k=0
ln
2
−1J
X
Aodd (x) = a2k+1 xk = (a1 , a3 , a5 , . . .)
k=0
3
2. Recursively conquer Aeven (y) for y ∈ X 2 and Aodd (y) for y ∈ X 2 , where X 2 =
{x2 | x ∈ X}.
3. Combine the terms. A(x) = Aeven (x2 ) + x · Aodd (x2 ) for x ∈ X.
However, the recurrences for this algorithm is

(n )
T (n, |X|) = 2 · T , |X| + O(n + |X|)
2
= O(n2 )
which is no better than before.

|X|
We can do better if X is collapsing: either |X| = 1 (base case), or |X 2 | = 2
and
X 2 is (recursively) collapsing. Then the recurrence is of the form
(n)
T (n) = 2 · T + O(n) = O(n lg n).
2
Roots of Unity
Collapsing sets can be constructed via square roots. Each of the following collapsing
sets is computing by taking all square roots of the previous set.
1. {1}
2. {1, −1}
3. {1, −1, i, −i}

√ √
4. {1, −1, ± 2
2
(1 + i), ± 2
2
(−1 + i)}, which lie on a unit circle
We can repeat this process and make our set larger and larger by finding more and
more points on this circle. These points are called the nth roots of unity. Formally,
the nth roots of unity are n x’s such that xn = 1. These points are uniformly spaced
around the unit circle in the complex plane (including 1). These points are of the
form (cos θ, sin θ) = cos θ +i sin θ = eiθ by Euler’s Formula, for θ = 0, n1 τ, n2 τ, . . . , n−1
n
τ
(where τ = 2π).
The nth roots of unity where n = 2£ form a collapsing set, because (eiθ )2 = ei(2θ) =
ei(2θ mod τ ) . Therefore the even nth roots of unity are equivalent to the n2 nd roots of
unity.
4
FFT, IFFT, and Polynomial Multiplication

We can take advantage of the nth roots of unity to improve the runtime of our
polynomial multiplication algorithm. The basis for the algorithm is called the Discrete
Fourier Transform (DFT).
The DFT allows the transformation between coefficients and samples, computing
A → A∗ = V · A for xk = eiτ k/n where n = 2£ , where A is the set of coefficients and
A∗ is the resulting samples. The individual terms aj∗ = j=0
Pn−1 iτ jk/n
e · aj .
Fast Fourier Transform (FFT)

The FFT algorithm is an O(n lg n) divide and conquer algorithm for DFT, used by
Gauss circa 1805, and popularized by Cooley and Turkey and 1965. Gauss used the
algorithm to determine periodic asteroid orbits, while Cooley and Turkey used it to
detect Soviet nuclear tests from offshore readings.
A practical implementation of FFT is FFTW, which was described by Frigo and
Johnson at MIT. The algorithm is often implemented directly in hardware, for fixed
n.
Inverse Discrete Fourier Transform

The Inverse Discrete Fourier Transform is an algorithm to return the coefficients
of a polynomial from the multiplied samples. The transformation is of the form
A∗ → V −1 · A∗ = A.
In order to compute this, we need to find V −1 , which in fact has a very nice
structure.
Claim 1. V −1 = n1 V¯ , where V¯ is the complex conjugate of V .1

1
Recall the complex conjugate of p + qi is p − qi.
5
Proof. We claim that P = V · V̄ = nI:
pjk = (row j of V ) · (col. k of V¯ )

n−1
X
= eijτ m/n eikτ m/n
m=0
n−1
eijτ m/n e−ikτ m/n
X
=
m=0
n−1
X
= ei(j−k)τ m/n
m=0
Pn−1
Now if j = k, pjk = m=0 = n. Otherwise it forms a geometric series.
n−1
X
pjk = = (ei(j−k)τ /n )m
m=0
(e ) −1
iτ (j−k)/n n
=
e iτ (j−k)/n −1
=0
because eiτ = 1. Thus V −1 = n1 V̄ , because V · V̄ = nI.
This claim says that the Inverse Discrete Fourier Transform is equivalent to the
Discrete Fourier Transform, but changing xk from eikτ /n to its complex conjugate
e−ikτ /n , and dividing the resulting vector by n. The algorithm for IFFT is analogous
to that for FFT, and the result is an O(n lg n) algorithm for IDFT.
Fast Polynomial Multiplication

In order to compute the product of two polynomials A and B, we can perform the
following steps.
1. Compute A∗ = F F T (A) and B ∗ = F F T (B), which converts both A and B

from coefficient vectors to a sample representation.
2. Compute C ∗ = A∗ · B ∗ in sample representation in linear time by calculating

Ck∗ = Ak∗ · Bk∗ (∀k).
3. Compute C = IFFT(C ∗ ), which is a vector representation of our final solution.
6
Applications
Fourier (frequency) space many applications. The polynomial A∗ = F F T (A) is com
plex, and the amplitude |a∗k | represents the amplitude of the frequency-k signal, while
arg(a∗k ) (the angle of the 2D vector) represents the phase shift of that signal. For ex
ample, this perspective is particularly useful for audio processing, as used by Adobe
Audition, Audacity, etc.:
• High-pass filters zero out high frequencies
• Low-pass filters zero out low frequencies
• Pitch shifts shift the frequency vector
• Used in MP3 compression, etc.
7
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 4 van Emde Boas 6.046J Spring 2015
Lecture 4: Divide and Conquer:
van Emde Boas Trees
• Series of Improved Data Structures
• Insert, Successor
• Delete
• Space
This lecture is based on personal communication with Michael Bender, 2001.
Goal
We want to maintain n elements in the range {0, 1, 2, . . . , u − 1} and perform Insert,
Delete and Successor operations in O(log log u) time.
O(1)
• If n = nO(1) or n(log n) , then we have O(log log n) time operations
– Exponentially faster than Balanced Binary Search Trees

– Cooler queries than hashing
• Application: Network Routing Tables
– u = Range of IP Addresses → port to send (u = 232 in IPv4)
Where might the O(log log u) bound arise ?
• Binary search over O(log u) elements
• Recurrences
log u
– T (log u) = T 2 + O(1)
√
– T (u) = T ( u) + O(1)
Improvements
We will develop the van Emde Boas data structure by a series of improvements on
a very simple data structure.
Bit Vector
We maintain a vector V of size u such that V [ x ] = 1 if and only if x is in the set.
Now, inserts and deletes can be performed by just flipping the corresponding bit
in the vector. However, successor/predecessor requires us to traverse through the
vector to find the next 1-bit.
• Insert/Delete: O(1)
• Successor/Predecessor: O(u)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1
Figure 1: Bit vector for u = 16. THe current set is {1, 9, 10, 15}.
Split Universe into Clusters

√
We can improve performance by splitting up the range {0, 1, 2, . . . , u − 1} into u
√ √
clusters of size u. If x = i u + j, then V [ x ] = V.Cluster [i ][ j].
√
low( x ) = x mod u = j
x
high( x ) = √ =i
u
√
index (i, j) = i u + j
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1
V.Cluster [0] V.Cluster [1] V.Cluster [2] V.Cluster [3]
√
Figure 2: Bit vector (u = 16) split into 16 = 4 clusters of size 4.
• Insert:
– Set V.cluster [ high( x )][low( x )] = 1 O(1)
– Mark cluster high( x ) as non-empty O(1)
• Successor:
√
– Look within cluster high( x ) O( u)
√
– Else, find next non-empty cluster i O( u)
√
– Find minimum entry j in that cluster O( u)
√
– Return index (i, j) Total = O( u)
Recurse
√
The three operations in Successor are also Successor calls to vectors of size u. We
can use recursion to speed things up.
√ √
• V.cluster [i ] is a size- u van Emde Boas structure (∀ 0 ≤ i < u)
• V.summary is a size- u van Emde Boas structure
• V.summary[i ] indicates whether V.cluster [i ] is nonempty
I NSERT (V, x )
1 Insert(V.cluster [ high( x )], low[ x ])
2 Insert(V.summary, high[ x ])
So, we get the recurrence:

√
T (u) = 2T ( u) + O(1)
log u
T ' (log u) = 2T ' + O(1)
2
=⇒ T (u) = T ' (log u) = O(log u)
S UCCESSOR (V, x )
1 i = high( x )
2 j = Successor (V.cluster [i ], j)
3 if j == ∞
4 i = Successor (V.summary, i )
5 j = Successor (V.cluster [i ], −∞)
6 return index (i, j)
√
T (u) = 3T ( u) + O(1)

' ' log u
T (log u) = 3T + O(1)
2
=⇒ T (u) = T ' (log u) = O((log u)log 3 ) ≈ O((log u)1.585 )
To obtain the O(log log u) running time, we need to reduce the number of re
cursions to one.
Maintain Min and Max

We store the minimum and maximum entry in each structure. This gives an O(1)
time overhead for each Insert operation.
S UCCESSOR (V, x )
1 i = high( x )
2 if low( x ) < V.cluster [i ].max
3 j = Successor (V.cluster [i ], low( x ))
4 else i = Successor (V.summary, high( x ))
5 j = V.cluster [i ].min
6 return index (i, j)
√
T (u) = T ( u) + O(1)
=⇒ T (u) = O(log log u)
Don’t store Min recursively

The Successor call now needs to check for the min separately.
if x < V.min : return V.min (1)
I NSERT (V, x )
1 if V.min == None
2 V.min = V.max = x I O(1) time
3 return
4 if x < V.min
5 swap( x ↔ V.min)
6 if x > V.max
7 V.max = x)
8 if V.cluster [ high( x ) == None
9 Insert(V.summary, high( x )) I First Call
10 Insert(V.cluster [ high( x )], low( x )) I Second Call
If the first call is executed, the second call only takes O(1) time. So
√
T (u) = T ( u) + O(1)
D ELETE (V, x )
1 if x == V.min I Find new min
2 i = V.summary.min
3 if i = None
4 V.min = V.max = None I O(1) time
5 return
6 V.min = index (i, V.cluster [i ].min) I Unstore new min
7 Delete(V.cluster [ high( x )], low( x )) I First Call
8 if V.cluster [ high( x )].min == None
9 Delete(V.summary, high( x )) I Second Call
10 I Now we update V.max
11 if x == V.max
12 if V.summary.max = None
13 else
14 i = V.summary.max
15 V.max = index (i, V.cluster [i ].max )
If the second call is executed, the first call only takes O(1) time. So
√
T (u) = T ( u) + O(1)
Lower Bound [Patrascu & Thorup 2007]

Even for static queries (no Insert/Delete)
O(1)
• Ω(log log u) time per query for u = n(log n)
• O(n · poly(log n)) space
Space Improvements
We can improve from Θ(u) to O(n log log u).
• Only create nonempty clusters
– If V.min becomes None, deallocate V
• Store V.cluster as a hashtable of nonempty clusters
• Each insert may create a new structure Θ(log log u) times (each empty insert)
ˇ at]
– Can actually happen [Vladimir Cun´
• Charge pointer to structure (and associated hash table entry) to the structure
This gives us O(n log log u) space (but randomized).
Indirection
We can further reduce to O(n) space.
• Store vEB structure with n = O(log log u) using BST or even an array
=⇒ O(log log n) time once in base case
• We use O(n/ log log u) such structures (disjoint)
=⇒ O( log log
n
u · log log u ) = O( n ) space for small
• Larger structures “store” pointers to them
=⇒ O( log log
n
u · log log u ) = O( n ) space for large
• Details: Split/Merge small structures
6
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 5 Amortization 6.046J Spring 2015
Lecture 5: Amortization
Amortized analysis is a powerful technique for data structure analysis, involving

the total runtime of a sequence of operations, which is often what we really care
about. This lecture covers:
• Different techniques of amortized analysis
– aggregate method
– accounting method
– charging method
– potential method
• Examples of amortized analysis
– table doubling
– binary counter
– 2-3 tree and 2-5 tree
Table doubling
(Recall from 6.006) We want to store n elements in a table of size m = Θ(n). One
idea is to double m whenever n becomes larger than m (due to insertions). The cost
to double a table of size m is clearly Θ(m) = Θ(n), which is also the worse case cost
of an insertion.
But what is the total cost of n insertions? It is at most
20 + 21 + 22 + · · · + 2�lg n� = Θ(n).
In this case, we say each insertion has Θ(n)/n = Θ(1) amortized cost.
Aggregate Method
The method we used in the above analysis is the aggregate method: just add up the
cost of all the operations and then divide by the number of operations.
total cost of k operations

amortized cost per operation =
k
Aggregate method is the simplest method. Because it’s simple, it may not be able
to analyze more complicated algorithms.
Amortized Bound Definition

Amortized cost can be, but does not have to be, average cost. We can assign any
amortized cost to each operation, as long as they “preserve the total cost”, i.e., for
any sequence of operations,
� �
amortized cost ≥ actual cost
where the sum is taken over all operations.
For example, we can say a 2-3 tree achieves O(1) amortized cost per create, O(lg n∗ )
amortized cost per insert, and 0 amortized cost per delete, where n∗ is the maximum
size of the 2-3 tree during the entire sequence of operations. The reason we can claim
this is that for any sequence of operations, suppose there are c creations, i insertions
and d ≤ i deletions (cannot delete from an empty tree), the total amortized cost is
asymptotically the same as the total actual cost:
O(c + i lg n∗ + 0d) = O(c + i lg n∗ + d lg n∗ )
Later, we will tighten the amortized cost per insert to O(lg n) where n is the
current size.
Accounting Method
This method allows an operation to store credit into a bank for future use, if its
assigned amortized cost > its actual cost; it also allows an operation to pay for its
extra actual cost using existing credit, if its assigned amortized cost < its actual cost.
Table doubling
For example, in table doubling:
– if an insertion does not trigger table doubling, store a coin represnting c = O(1)
work for future use.
– if an insertion does trigger table doubling, there must be n/2 elements that are
inserted after the previous table doubling, whose coins have not been consumed.
Use up these n/2 coins to pay for the O(n) table doubling. See figure below.
– amortized cost for table doubling: O(n) − c · n/2 = 0 for large enough c.
– amortized cost per insertion: 1 + c = O(1).
an element a unused coin
table doubling due to the next insert
2-3 trees
Now let’s try the accounting method on 2-3 trees. Our goal is to show that insert has
O(lg n) amortized cost and delete has 0 amortized cost. Let’s try a natural approach:
save a O(lg n) coin for inserting an element, and use this coin when we delete this
element later. However, we will run into a problem: by the time we delete the element,
the size of the tree may have got bigger n' > n, and the coin we saved is not enough
to pay for the lg n' actual cost of that delete operation! This problem can be solved
using the charging method in the next section.
Charging Method
The charging method allows operations to charge cost retroactively to past operations.
amortized cost of an operation = actual cost of this operation

− total cost charged to past operations
+ total cost charged by future operations
Table doubling and halving

For example, in table doubling, when the table doubles from m to 2m, we can charge
Θ(m) cost to the m/2 insert operations since the last doubling. Each insert is charged
by Θ(1), and will not be charged again. So the amortized cost per insert is Θ(1).
Now let’s extend the above example with table halving. The motivation is to save
space when with deletes. If the table is down to 1/4 full, n = m/4, we shrink the
table size from m to m/2 at Θ(m) cost. This way, the table is half full again after
any resize (doubling or shrinking). Now each table doubling still has ≥ m/2 insert
operations to charge to, and each table halving has ≥ m/4 delete operations to charge
to. So the amortized cost per insert or delete is still Θ(1).
Free deletion in 2-3 trees

For another example, let’s consider insertion and deletion in 2-3 trees. Again, our
goal is to show that insert has O(lg n) amortized cost, where n is the size of the tree
when that insert happens, and delete has 0 amortized cost.
Insert does not need to charge anything.
Delete will charge an insert operation. But we will not charge the insert of the
element to be deleted, because we will run into the same problem as the accounting
method. Instead, each delete operation will charge the insert operation that brought
the tree to its current size n. Each insert is still charged at most once, because for
the tree size to reach n again, another insert must happen.
Potential Method
This method defines a potential function Φ that maps a data structure (DS) configu
ration to a value. This function Φ is equivalent to the total unused credits stored up
by all past operations (the bank account balance). Now
amortized cost of an operation = actual cost of this operation + ΔΦ
and
� �
amortized cost = actual cost + Φ(final DS) − Φ(initial DS).
In order for the amortized bound to hold, Φ should never go below Φ(initial DS)
at any point. If Φ(initial DS) = 0, which is usually the case, then Φ should never go
negative (intuitively, we cannot ”owe the bank”).
Relation to accounting method

In accounting method, we specify ΔΦ, while in potential method, we specify Φ. One
determines the other, so the two methods are equivalent. But sometimes one is more
intuitive than the other.
Binary counter
Our first example of potential method is incrementing a binary counter. E.g.,
0011010111
increment ↓
0011011000
Cost of increment is Θ(1 + #1), where #1 represents the number of trailing 1 bits.
So the intuition is that 1 bits are bad.
Define Φ = c · #1. Then for large enough c,
amortized cost = actual cost + ΔΦ

= Θ(1 + #1) + c(−#1 + 1)
= Θ(1)
Φ(initial DS) = 0 if the counter starts at 000 · · · 0. This is necessary for the above
amortized analysis. Otherwise, Φ may become smaller than Φ(initial DS).
Insert in 2-3 trees

Insert can cause O(lg n) splits in the worst case, but we can show it causes only O(1)
amortized splits. First consider what causes a split: insertion into a 3-node (a node
with 3 children). In that case, the 3-node needs to split into two 2-nodes.
So 3-nodes are bad. We define Φ = the number of 3-nodes. Then ΔΦ ≤ 1 −
the number of splits. Amortized number of splits = actual number of splits + ΔΦ =
1. Φ(initial DS) = 0 if the tree is empty initially.
The above analysis holds for any (a, b)-tree, if we define Φ to be the number of
b-nodes.
If we consider both insertion and deletion in 2-3 trees, can we claim both O(1) splits
for insert, and O(1) merges for delete? The answer is no, because a split creates two
2-nodes, which are bad for merge. In the worse case, they may be merged by the next
delete, and then need split again on the next insert, and so on.
What do we solve this problem? We need to prevent split and merge from creating
‘bad’ nodes.
Insert and delete in 2-5 trees

We can claim O(1) splits for insert, and O(1) merges for delete in 2-5 trees.
In 2-5 trees, insertion into a 5-node (a node with 5 children) causes it to split into
two 3-nodes.
promote to parent
e
5 k e y s 5 k y s
6 children 3 children 3 children
Deletion from a 2-node causes it to merge with another 2-node to form a 3-node.
demote from parent

x
1
x 1
key demoted 2 children 3 children
1 child left
5-nodes and 2-nodes are bad. We define Φ = # of 5-nodes + # of 2-nodes.

Amortized splits and merges = 1. Φ(initial DS) = 0 if the tree is empty initially.
The above analysis holds for any (a, b)-tree where b > 2a, because splits and
merges do not produce bad nodes. We define Φ to be the number of b-nodes plus the
number of a nodes.
Note: The potential examples could also be done with the accounting method by
placing coins on 1s (binary counter) or 2/5-nodes ((2, 5)-trees).
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 6 Randomized Algorithms 6.046J Spring 2015
Lecture 6: Randomized Algorithms
• Check matrix multiplication
• Quicksort
Randomized or Probablistic Algorithms

What is a randomized algorithm?
• Algorithm that generates a random number r ∈ {1, ..., R} and makes decisions
based on r’s value.
• On the same input on different executions, a randomized algorithm may
– Run a different number of steps

– Produce a different output
Randomized algorithms can be broadly classified into two types- Monte Carlo and
Las Vegas.
Monte Carlo Las Vegas
runs in polynomial time always runs in expected polynomial time
output is correct with high probability output always correct
Matrix Product
C =A×B
Simple algorithm: O(n3 ) multiplications.
Strassen: multiply two 2 × 2 matrices in 7 multiplications: O(n
log2 7 ) = O(n2.81 )
Coppersmith-Winograd: O(n2.376 )
Matrix Product Checker

Given n × n matrices A, B, C, the goal is to check if A × B = C or not.
Question. Can we do better than carrying out the full multiplication?
We will see an O(n2 ) algorithm that:
• if A × B = C, then P r[output=YES] = 1.
• if A × B = C, then P r[output=YES] ≤ 12 .
We will assume entries in matrices ∈ {0, 1} and also that the arithmetic is mod 2.
Frievald’s Algorithm
Choose a random binary vector r[1...n] such that P r[ri = 1] = 1/2 independently for
r = 1, ..., n. The algorithm will output ’YES’ if A(Br) = Cr and ’NO’ otherwise.
Observation
The algorithm will take O(n2 ) time, since there are 3 matrix multiplications Br,
A(Br) and Cr of a n × n matrix by a n × 1 matrix.
Analysis of Correctness if AB = C
Claim. If AB = C, then P r[ABr = Cr] ≥ 1/2.
Let D = AB − C. Our hypothesis is thus that D = 0. Clearly, there exists r
such that Dr = 0. Our goal is to show that there are many r such that Dr = 0.
Specifically, P r[Dr = 0] ≥ 1/2 for randomly chosen r.
D = AB −C = 0 =⇒ ∃ i, j s.t. dij =
0. Fix vector v which is 0 in all coordinates
except for vj = 1. (Dv)i = dij = 0 implying Dv = 0. Take any r that can be chosen
by our algorithm. We are looking at the case where Dr = 0. Let
r' = r + v
Since v is 0 everywhere except vj , r' is the same as r exept rj' = (rj + vj ) mod 2.
Thus, Dr' = D(r + v) = 0 + Dv = 0. We see that there is a 1 to 1 correspondence
between r and r' , as if r' = r + V = r'' + V then r = r'' . This implies that
number of r' for which Dr' = 0 ≥ number of r for which Dr = 0
From this we conclude that P r[Dr = 0] ≥ 1/2
Quicksort
Divide and conquer algorithm but work mostly in the divide step rather than combine.
Sorts “in place” like insertion sort and unlike mergesort (which requires O(n) auxiliary
space).
Different variants:
• Basic: good in average case
• Median-based pivoting: uses median finding
• Random: good for all inputs in expectation (Las Vegas algorithm)
Steps of quicksort:
• Divide: pick a pivot element x in A, partition the array into sub-arrays L,

consisting of all elements < x, G consisting of all elements > x and E consisting
of all elements = x.
• Conquer: recursively sort subarrays L and G
• Combine: trivial
Basic Quicksort
Pivot around x = A[1] or A[n] (first or last element)
• Remove, in turn, each element y from A
• Insert y into L, E or G depending on the comparison with pivot x
• Each insertion and removal takes O(1) time
• Partition step takes O(n) time
• To do this in place: see CLRS p. 171
Basic Quicksort Analysis

If input is sorted or reverse sorted, we are partitioning around the min or max element
each time. This means one of L or G has n − 1 elements, and the other 0. This gives:
T (n) = T (0) + T (n − 1) + Θ(n)

= Θ(1) + T (n − 1) + Θ(n)
= Θ(n2 )
However, this algorithm does well on random inputs in practice.
Pivot Selection Using Median Finding

Can guarantee balanced L and G using rank/median selection algorithm that runs
in Θ(n) time. The first Θ(n) below is for the pivot selection and the second for the
partition step.
(n)
T (n) = 2T + Θ(n) + Θ(n)
2
T (n) = Θ(n log n)
This algorithm is slow in practice and loses to mergesort.
Randomized Quicksort
x is chosen at random from array A (at each recursion, a random choice is made).
Expected time is O(n log n) for all input arrays A. See CLRS p.181-184 for the
analysis of this algorithm; we will analyze a variant of this.
“Paranoid” Quicksort
Repeat
choose pivot to be random element of A
perform Partition
Until
resulting partition is such that
|L| ≤ 34 |A| and |G| ≤ 34 |A|
Recurse on L and G
“Paranoid” Quicksort Analysis

Let’s define a ”good pivot” and a ”bad pivot”
Good pivot: sizes of L and G ≤ 34 n each
Bad pivot: one of L and G is ≤ 34 n each
bad pivots good pivots bad pivots
n n n
4 2 4
We see that a pivot is good with probability > 1/2.

Let T (n) be an upper bound on the expected running time on any array of n size.
T(n) comprises:
• time needed to sort left subarray
• time needed to sort right subarray
• the number of iterations to get a good call. Denote as c · n the cost of the
partition step
Expectations
2cn
(2cn)/4 3(2cn)/4
(2cn)/16 3(2cn)/16 9(2cn)/16
O(1) O(1)
T (n) ≤ maxn/4≤i≤3n/4 (T (i) + T (n − i)) + E(#iterations) · cn
Now, since probability of good pivot > 12 ,
E(#iterations) ≤ 2
(n) 3n
T (n) ≤ T +T + 2cn
4 4
We see in the figure that the height of the tree can be at most log
4
(2cn) no matter
what branch we follow to the bottom. At each level, we do a total of 2cn work. Thus,
expected runtime is T (n) = Θ(n log n)
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 8 Hashing 6.046J Spring 2015
Lecture 8: Hashing
Course Overview
This course covers several modules:
1. Review: dictionaries, chaining, simple uniform
2. Universal hashing
3. Perfect hashing
Review
Dictionary Problem
A dictionary is an Abstract Data Type (ADT) that maintains a set of items. Each
item has a key. The dictionary supports the following operations:
• insert(item): add item to set
• delete(item): remove item from set
• search(key): return item with key if it exists
We assume that items have distinct keys (or that inserting new ones clobbers old
ones).
This problem is easier than predecessor/successor problems solved in previous
lecture (by van Emde Boas trees, or by AVL/2-3 trees/skip lists).
Hashing from 6.006

Goal: O(1) time per operation and O(n) space complexity.
Definitions:
• u = number of keys over all possible items
• n = number of keys/items currently in the table
• m = number of slots in the table
Solution: hashing with chaining

Assuming simple uniform hashing,
1
Pr {h(k1 ) = h(k2 )} =
k1 =k2 m
n
we achieve Θ(1 + α) time per operation, where α = m is called load factor. The
downside of the algorithm is that it requires assuming input keys are random, and it
only works in average case, like basic quicksort. Today we are going to remove the
unreasonable simple uniform hashing assumption.
Etymology
The English ‘hash’ (1650s) means “cut into small pieces”, which comes from the
French ‘hacher‘ which means “chop up”, which comes from the Old French ‘hache’
which means “axe” (cf. English ‘hatchet’). Alternatively, perhaps they come from
Vulcan ‘la’ash’, which means “axe”. (R.I.P. Leonard Nimoy.)
Universal Hashing
The idea of universal hashing is listed as following:
• choose a random hash function h from H
• require H to be a universal hashing family such that

1
Pr {h(k) = h(k' )} ≤ for all k = k '
h∈H m
.
• now we just assume h is random, and make no assumption about input keys.
(like Randomized Quicksort)
Theorem: For n arbitrary distinct keys and random h ∈ H, where H is a universal

hashing family,
n
E[ number of keys colliding in a slot ] ≤ 1 + α where α =
m

1 if h(ki ) = h(kj )
Proof : Consider keys k1 , k2 , . . . , kn . Let Ii,j = . Then we have
0 otherwise
E[Ii,j ] = Pr{Ii,j = 1}
= Pr{h(ki ) = h(kj )} (1)
1
≤ for any j 6= i
m
� �
L
n
E[# keys hashing to the same slot as ki ] = E Ii,j
j=1
L
n
=
E[Ii,j ] (linearity of expectation)
j=1
L
=
E[Ii,j ] + E[Ii,i ]
6
j=i
n
≤ +1
m
(2)
D
From the above theorem, we know that Insert, Delete, and Search all take O(1+α)
expected time. Here we give some examples of universal hash functions.
All hash functions: H = {all hash functions h : {0, 1, . . . , u − 1} → {0, 1, . . . , m −

1}}. Apparently, H is universal, but it is useless. On one hand, storing a single
hashing function h takes log(mu ) = u log(m) bits » n bits. On the other hand, we
would need to precompute u values, which takes Ω(u) time.
Dot-product hash family:

Assumptions
• m is a prime
• u = mr where r is an integer
In real cases, we can always round up m and u to satisfy the above assumptions. Now
let’s view keys in base m: k = hk0 , k1 , . . . , kr−1 i. For key a = ha0 , a1 , a2 , . . . , ar−1 i,
define
ha (k) = a · k mod m (dot product)
r−1
X (3)
= ai ki mod m
i=0
Then our hash family is H = {ha | a ∈ {0, 1, . . . , u − 1}}

Storing ha ∈ H requires just storing one key, which is a. In the word RAM
model, manipulating O(1) machine words takes O(1) time and “objects of interest”
(here, keys) fit into a machine word. Thus computing ha (k) takes O(1) time.
Theorem: Dot-product hash family H is universal.
Proof : Take any two keys k = 6 k 0 . They must differ in some digits. Say kd =
6 kd0 .
Define not d = {0, 1, . . . , r − 1} \ {d}. Now we have
( r−1 r−1
)
X X
Pr{ha (k) = ha (k 0 )} = Pr ai ki = ai ki0 (mod m)
a a
( i=0 i=0
)
X X
= Pr ai ki + ad kd = ai ki0 + ad kd0 (mod m)
a
6
i=d 6
i=d
( )
X
= Pr ai (ki − ki0 ) + ad (kd − kd0 ) = 0 (mod m)
a
6
i=d
( )
X
= Pr ad = −(kd − kd0 )−1 ai (ki − ki0 ) (mod m)
a (4)
6
i=d
(m is prime ⇒ Zm has multiplicative inverses)

= E [Pr{ad = f (k, k 0 , anot d )}]
anot d ad
X
(= Pr{anot d = x}Pr{ad = f (k, k 0 , x)})
ad
x

1
= E
anot d m
1
=
m

4
Another universal hash family from CLRS: Choose prime p ≥ u (once). Define
hab (k) = [(ak + b) mod p)] mod m. Let H = {hab | a, b ∈ {0, 1, . . . , u − 1}}.
Perfect Hashing
Static dictionary problem: Given n keys to store in table, only need to support
search(k). No insertion or deletion will happen.
Perfect hashing: [Fredman, Komlós, Szemerédi 1984]
• polynomial build time with high probability (w.h.p.)
• O(1) time for search in worst case
• O(n) space in worst case
Idea: 2-level hashing
The algorithm contains the following two major steps:
Step 1: Pick h1 : {0, 1, . . . , u − 1} → {0, 1, . . . , m − 1} from a universal hash family

for m = Θ(n) (e.g., nearby prime). Hash all items with chaining using h1 .
Step 2: For each slot j ∈ {0, 1, . . . , m − 1}, let lj be the number of items in slot j.
lj = |{i | h(ki ) = j}|. Pick h2,j : {0, 1, . . . , u − 1} → {0, 1, . . . , mj } from a universal
hash family for lj2 ≤ mj ≤ O(lj2 ) (e.g., nearby prime). Replace chain in slot j with
hashing-with-chaining using h2,j .
Lm−1
The space complexity is O(n + j=0 lj2 ). In order to reduce it to O(n), we need
to add two more steps:
Lm−1
Step 1.5: If j=0 lj2 > cn where c is a chose constant, then redo Step 1.
Step 2.5: While h2,j (ki ) = h2,j (ki' ) for any i 6= i' , j, repick h2,j and rehash those lj .
The above two steps guarantee that there are no collisions at second level, and
the space complexity is O(n). As a result, search time is O(1). Now let’s look at the
build time of the algorithm. Both Step 1 and Step 2 are O(n). How about Step 1.5
and Step 2.5?
For Step 2.5,
L
6 i' } ≤
Pr{h2,j (ki ) = h2,j (ki' ) for some i = Pr{h2,j (ki ) = h2,j (ki' )} (union bound)
h2,j h2,j
6 ,
i=i
� �
lj 1
≤ · 2
2 lj
1
<
2
(5)
As a result, each trial is like a coin flip. If the outcome is “tail”, we move to the
next step. By Lecture 7, we have E[#trials] ≤ 2 and #trials = O(log n) w.h.p. By a
Chernoff bound, lj = O(log n) w.h.p., so each trial takes O(log n) time. Because we
have to do this for each j, the total time complexity is O(log n) · O(log n) · O(n) =
O(n log2 n) w.h.p.
(
1 if h(ki ) = h(ki0 )
For Step 1.5, we define Ii,i0 = . Then we have
0 otherwise
"m−1 # " n n #
X XX
E lj2 = E Ii,i0
j=0 i=1 i0 =1
n
XX n
= E[Ii,i0 ] (linearity of expectation)
(6)
i=1 i0 =1

n 1
≤n+2 ·
2 m
= O(n) because m = Θ(n)
By Markov inequality, we have

(m−1 ) Pm−1
2
j=0 lj 1
X
Pr lj2 ≤ cn ≤ ≤
h1
j=0
cn 2
for a sufficiently large constant c. By Lecture 7, we have E[#trials] ≤ 2 and #trials =

O(log n) w.h.p. As a result, Step 1 and Step 1.5 combined takes O(n log n) time w.h.p.
7
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 9 Augmentation 6.046J Spring 2015
Lecture 9: Augmentation
This lecture covers augmentation of data structures, including

• easy tree augmentation
• order-statistics trees
• finger search trees, and
• range trees
The main idea is to modify “off-the-shelf” common data structures to store (and
update) additional information.
Easy Tree Augmentation

The goal here is to store x.f at each node x, which is a function of the node, namely
f (subtree rooted at x). Suppose x.f can be computed (updated) in O(1) time from x,
children and children.f . Then, modification a set S of nodes costs O(# of ancestors
of S) to update x.f , because we need to walk up the tree to the root. Two examples
of O(lg n) updates are
• AVL trees: after rotating two nodes, first update the new bottom node and
then update the new top node
• 2-3 trees: after splitting a node, update the two new nodes.
• In both cases, then update up the tree.
Order-Statistics Trees (from 6.006)

The goal of order-statistics trees is to design an Abstract Data Type (ADT) interface
that supports the following operations
• insert(x), delete(x), successor(x),
• rank(x): find x’s index in the sorted order, i.e., # of elements < x,
• select(i): find the element with rank i.
We can implement the above ADT using easy tree augmentation on AVL trees
(or 2-3 trees) to store subtree size: f (subtree) = # of nodes in it. Then we also have
x.size = 1 + c.size for c in x.children.
As a comparison, we cannot store the rank for each node. In that case, insert(−∞)
will change the ranks for all nodes.
rank(x) can be computed as follows:

1
• initialize rank = x.left.size + 1
• walk up from x to root, whenever taking a left move (x → x' ), rank +=

x' .left.size + 1
x'
select(i) can be implemented as follows:
• x = root
2
• rank = x.left.size + 1
• if i = rank: return x
• if i < rank: x = x.left
• if i > rank: x = x.right, i −= rank
Finger Search Trees

The goal of finger search trees [Brown and Tarjan, 1980] is that, if we already have
node y, we want to search x from y in O (lg |rank(y) − rank(x)|) time. Intuitively, we
would like the search of x to be fast if we already have a node y that is close to x.
One idea is to use level-linked 2-3 trees, where each node has pointers to its next
and previous node on the same level.
1
omit the +1 term if indices start at 0.
2
same as above.
The level links can be maintained during split and merge.
split new
merge delete
We store all keys in the leaves. Non-leaf nodes do not store keys; instead, they
store the min and max key of the subtree (via easy tree augmentation).
Then the original top-down search(x) (without being given y) can be implemented
as follows:
• start from the root, look at min & max of each child ci
• if ci . min ≤ x ≤ ci . max, go down to ci
• if ci . max ≤ x ≤ ci+1 . min, return ci . max (as predecessor) or ci+1 . min (as
successor)
search(x) from y can be implemented as follows. Initialize v to the leaf node

containing y (given), and then in a loop do
• if v. min ≤ x ≤ v. max (this means x is in the subtree rooted at v), do top-down

search for x from v and return
• elif x < v. min: v = v.prev (the previous node in this level)
• elif x > v. max: v = v.next (the next node in this level)
• v = v.parent
Analysis. We start at the leaf level, and go up by 1 level in each iteration. At step
i, level link at height i skips roughly ci keys (ranks), where c ∈ [2, 3]. Therefore, if
|rank(y) − rank(x)| = k, we will reach the subtree containing x in O(lg k) steps, and
the top-down search that follows is also O(lg k).
Orthogonal Range Searching and Range Trees

Suppose we have n points in a d-dimension space. We would like a data structure that
supports range query on these points: find all the points in a give axis-aligned
box. An axis-aligned box is simply an interval in 1D, a rectangle in 2D, and a cube
in 3D.
To be more precise, each point xi (for i from 1 to n) is a d-dimension vector
xi = (xi1 , xi2 , . . . , xid ). Range-query(a, b) takes two points a = (a1 , a2 , . . . , ad ) and
b = (b1 , b2 , . . . , bd ) as input, and should return a set of indices {i | ∀j, aj ≤ xij ≤ bj }.
1D case
We start with the simple case of 1D points, i.e., all xi ’s and a and b are scalars.
Then, we can simply use a sorted array. To do range-query(a, b), we simply
perform two binary searches for a and b, respectively, and then return all the points
in between (say there are k of them). The complexity is O(lg n + k).
Sorted arrays are inefficient for insertion and deletion. For a dynamic data struc
ture that supports range queries, we can use finger search tree from the previous
section. Finger search trees support efficient insertion and deletion. To do range-
query(a, b), we first search for a, and then keep doing finger search to the right by 1
until we exceed b. Each finger search by 1 takes O(1), so the total complexity is also
O(lg n + k).
However, neither of the above approaches generalizes to high dimensions. That’s
why we now introduce range trees.
1D range trees
A 1D range tree is a complete binary search tree (for dynamic, use an AVL tree).
Range-query(a, b) can be implemented as follows:
• search(a)
• search(b)
• find the least common ancestor (LCA) of a and b, vsplit
• return the nodes and subtrees “in between”. There are O(lg n) nodes and
O(lg n) subtrees “in between”.
Analysis. O(lg n) to implicitly represent the answer. O(lg n + k) to output all k

answers. O(lg n) to report k via subtree size augmentation.
find(a) find(b)
Figure 1: 1D range tree. Range-query(a, b) returns all hollow nodes and shaded
subtrees. Image from Wikipedia http://en.wikipedia.org/wiki/Range_tree
2D range trees
A 2D range tree consists of a primary 1D range tree and many secondary 1D range
trees. The primary range stores all points, keyed on the first coordinate. Every node
v in the primary range tree stores all points in v’s subtree in a secondary range tree,
keyed on the second coordinate.
Range-query(a, b) can be implemented as follows:
• use the primary range tree to find all points with the correct range on the first
coordinate. Only implicitly represent the answer, so this takes O(lg n).
• for the O(lg n) nodes, manually check whether their second coordinate lie in the
correct range.
• for the O(lg n) subtrees, use their secondary range tree to find all points with
the correct range on the second coordinate.
Analysis. O(lg2 n) to implicitly represent the answer, because we will find O(lg2 n)
nodes and subtrees in secondary range trees. O(lg2 n + k) to output all k answers.
O(lg2 n) to report k via subtree size augmentation.
Space complexity is O(n lg n). The primary subtree is O(n). Each point is dupli
cated up to O(lg n) times in secondary subtrees, one per ancestor.
d-D range trees

Just recurse: primary 1D range tree → secondary 1D range trees → tertiary 1D range
trees → · · ·
Range-query complexity: O(lgd n + k).
Space complexity: O(n lgd−1 n).
See 6.851 for Chazelle’s improved results: O(lgd−1 n + k) range-query complexity

( )d−1
and O n lglglgn
n space complexity.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 10: Dynamic Programming

• Longest palindromic sequence
• Optimal binary search tree
• Alternating coin game
DP notions
1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution based on optimal solutions

of subproblems
3. Compute the value of an optimal solution in bottom-up fashion (recursion &

memoization)
4. Construct an optimal solution from the computed information
Longest Palindromic Sequence

Definition: A palindrome is a string that is unchanged when reversed.
Examples: radar, civic, t, bb, redder
Given: A string X[1 · · · n], n ≥ 1
To find: Longest palindrome that is a subsequence
Example: Given “c h a r a c t e r”
output “c a r a c”
Answer will be ≥ 1 in length
Strategy
L(i, j): length of longest palindromic subsequence of X[i · · · j] for i ≤ j.
1
1 def L(i, j) :
2 if i == j: return 1
3 if X[i] == X[j]:
4 if i + 1 == j: return 2
5 else : return 2 + L(i + 1, j - 1)
6 else :
7 return max(L(i + 1, j), L(i, j - 1))
Exercise: compute the actual solution
Analysis
As written, program can run in exponential time: suppose all symbols X[i] are dis
tinct.
T (n) = running time on input of length n
1 n=1
T (n) =
2T (n − 1) n > 1
n−1
=2
Subproblems
n
But there are ) distinct subproblems: each is an (i, j) pair with
2
i < j. By solving each subproblem only once. running time reduces to
θ(n2 ) · θ(1) = θ(n2 )

where θ(n2 ) is the number of subproblems and θ(1) is the time to solve each
subproblem, given that smaller ones are solved.
Memoize L(i, j), hash inputs to get output value, and lookup hash table to see if
the subproblem is already solved, else recurse.
Memoizing Vs. Iterating

1. Memoizing uses a dictionary for L(i, j) where value of L is looked up by using
i, j as a key. Could just use a 2-D array here where null entries signify that the
problem has not yet been solved.
2. Can solve subproblems in order of increasing j − i so smaller ones are solved

first.
2
Optimal Binary Search Trees: CLRS 15.5

Given: keys K1 , K2 , · · · , Kn , K1 < K2 < · · · < Kn , WLOG Ki = i
weights W1 , W2 , · · · , Wn
Find: BST T that minimizes:
L
n
Wi · (depthT (Ki ) + 1)
i=1
Example: Wi = pi = probability of searching for Ki

Then, we are minimizing expected search cost.
(say we are representing an English → French dictionary and common words
should have greater weight)
Enumeration
Exponentially many trees
1 2
n = 2
2 1
W1 + 2W2 2W1 + W2
3 3 2 1 1
n = 3
2 1 1 3 3 2
1 2 2 3
3W1 + 2W2 + W3 2W1 + 3W2 + W3 2W1 + W2 + 2W3 W1 + 3W2 + 2W3 W1 + 2W2 + 3W3
Strategy
W (i, j) = Wi + Wi+1 + · · · + Wj
e(i, j) = cost of optimal BST on Ki , Ki+1 , · · · , Kj
Want e(1, n)
Greedy solution?
Pick Kr in some greedy fashion, e.g., Wr is maximum.
greedy doesn’t work, see example at the end of the notes.
3
Kr
keys Ki,…, Kr-1 keys Kr+1,…, Kj

e(i, r-1) e(r+1, j)
DP Strategy: Guess all roots

⎧
⎨ Wi if i = j
e(i, j) =
⎩ min (e(i, r − 1) + e(r + 1, j) + W (i, j)) else
i≤r≤j
+W (i, j) accounts for Wr of root Kr as well as the increase in depth by 1 of all

the other keys in the subtrees of Kr (DP tries all ways of making local choices and
takes advantage of overlapping subproblems)
Complexity: θ(n2 ) · θ(n) = θ(n3 )
where θ(n2 ) is the number of subproblems and θ(n) is the time per subproblem.
Alternating Coin Game

Row of n coins of values V1 , · · · , Vn , n is even. In each turn, a player selects either
the first or last coin from the row, removes it permanently, and receives the value of
the coin.
Question
Can the first player always win?
Try: 4 42 39 17 25 6
Strategy
V1 , V2 , · · · , Vn−1 , Vn
1. Compare V1 + V3 + · · · + Vn−1 against V2 + V4 + · · · + Vn and pick whichever is

greater.
2. During the game only pick from the chosen subset (you will always be able to!)
How to maximize the amount of money won assuming you move first?
4
Optimal Strategy
V (i, j): max value we can definitely win if it is our turn and only coins Vi , · · · , Vj
remain.
V (i, i) : just pick i.
V (i, i + 1): pick the maximum of the two.
V (i, i + 2), V (i, i + 3), · · ·
V (i, j) = max{(range becomes (i + 1, j)) + Vi , (range becomes (i, j − 1)) + Vj }
Solution
V (i + 1, j) subproblem with opponent picking
we are guaranteed min{V (i + 1, j − 1), V (i + 2, j)}
Where V (i+1, j−1) corresponds to opponent picking Vj and V (i+2, j) corresponds
to opponent picking Vi+1
We have
V (i + 1, j − 1), V (i, j − 2),

V (i, j) = max min + Vi , min + Vj
V (i + 2, j) V (i + 1, j − 1)
Complexity?
Θ(n2 ) · Θ(1) = Θ(n2 )
Example of Greedy Failing for Optimal BST prob

lem
Thanks to Nick Davis!
5
10
2
1 8
1 3
9
4
Figure 1: cost = 1 × 2 + 10 × 1 + 8 × 2 + 9 × 3 = 55
8
3
10 9
2 4
1
1
Figure 2: cost = 1 × 3 + 10 × 2 + 8 × 1 + 9 × 2 = 49
6
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 11 All-Pairs Shortest Paths 6.046J Spring 2015
Lecture 11: All-Pairs Shortest Paths
Introduction
Different types of algorithms can be used to solve the all-pairs shortest paths problem:
• Dynamic programming
• Matrix multiplication
• Floyd-Warshall algorithm
• Johnson’s algorithm
• Difference constraints
Single-source shortest paths

• given directed graph G = (V, E), vertex s ∈ V and edge weights w : E → R
• find δ(s, v), equal to the shortest-path weight s− > v, ∀v ∈ V (or −∞ if negative
weight cycle along the way, or ∞ if no path)
Situtation Algorithm Time

unweighted (w = 1) BFS O(V + E)
non-negative edge weights Dijkstra O(E + V lg V )
general Bellman-Ford O(V E)
acyclic graph (DAG) Topological sort + one pass of B-F O(V + E)
All of the above results are the best known. We achieve a O(E + V lg V ) bound
on Dijkstra’s algorithm using Fibonacci heaps.
All-pairs shortest paths

• given edge-weighted graph, G = (V, E, w)
• find δ(u, v) for all u, v ∈ V
A simple way of solving All-Pairs Shortest Paths (APSP) problems is by running

a single-source shortest path algorithm from each of the V vertices in the graph.
Situtation Algorithm Time E = Θ(V 2 )

unweighted (w = 1) |V |× BFS O(V E) O(V 3 )
non-negative edge weights |V |× Dijkstra O(V E + V 2 lg V ) O(V 3 )
general |V |× Bellman-Ford O(V 2 E) O(V 4 )
general Johnson’s O(V E + V 2 lg V ) O(V 3 )
These results (apart from the third) are also best known — don’t know how to
beat |V |× Dijkstra
Algorithms to solve APSP

Note that for all the algorithms described below, we assume that w(u, v) = ∞ if
(u, v) ∈ E.
Dynamic Programming, attempt 1

(m)
1. Sub-problems: duv = weight of shortest path u → v using ≤ m edges
2. Guessing: What’s the last edge (x, v)?
3. Recurrence:
d(m) (m−1)
uv = min(dux + w(x, v) for x ∈ V )
0 if u = v
d(0)
uv =
∞ otherwise
4. Topological ordering: for m = 0, 1, 2, . . . , n − 1: for u and v in V :
5. Original problem:
If graph contains no negative-weight cycles (by Bellman-Ford analysis), then
(n−1) (n)
shortest path is simple ⇒ δ(u, v) = duv = duv = · · ·
Time complexity
In this Dynamic Program, we have O(V 3 ) total sub-problems.
Each sub-problem takes O(V ) time to solve, since we need to consider V possible
choices. This gives a total runtime complexity of O(V 4 ).
Note that this is no better than |V |× Bellman-Ford
Bottom-up via relaxation steps
1 for m = 1 to n by 1
2 for u in V
3 for v in V
4 for x in V
5 if duv > dux + dxv
6 duv = dux + dxv
In the above pseudocode, we omit superscripts because more relaxation can never
hurt.
(m) Lm/21 Lm/21
Note that we can change our relaxation step to duv = min(dux +dxv for x ∈ V ).
3
This change would produce an overall running time of O(n lg n) time. (student sug
gestion)
Matrix multiplication
Recall the task of standard matrix multiplication,
n
Given n × n matrices A and B, compute C = A · B, such that cij = k=1 aik · bkj .
• O(n3 ) using standard algorithm
• O(n2.807 ) using Strassen’s algorithm
• O(n2.376 ) using Coppersmith-Winograd algorithm
• O(n2.3728 ) using Vassilevska Williams algorithm
Connection to shortest paths

• Define ⊕ = min and 8 = +
• Then, C = A 8 B produces cij = mink (aik + bkj )

(m)
• Define D(m) = (dij ), W = (w(i, j)), V = {1, 2, . . . , n}
With the above definitions, we see that D(m) can be expressed as D(m−1) 8 W .
In other words, D(m) can be expressed as the circle-multiplication of W with itself m
times.
Matrix multiplication algorithm

• n − 2 multiplications ⇒ O(n4 ) time (stil no better)
lg n
• Repeated squaring: ((W 2 )2 )2··· = W 2 = W n−1 = (δ(i, j)) if no negative-
weight cycles. Time complexity of this algorithm is now O(n3 lg n).
We can’t use Strassen, etc. since our new multiplication and addition operations
don’t support negation.
Floyd-Warshall: Dynamic Programming, attempt 2

(k)
1. Sub-problems: cuv = weight of shortest path u → v whose intermediate
vertices ∈ {1, 2, . . . , k}
2. Guessing: Does shortest path use vertex k?
3. Recurrence:
(k−1) (k−1)
c(k) (k−1)
uv = min(cuv , cuk + ckv )
(0)
cuv = w(u, v)
4. Topological ordering: for k: for u and v in V :

(n) (n)
5. Original problem: δ(u, v) = cuv . Negative weight cycle ⇔ negative cuu
Time complexity
This Dynamic Program contains O(V 3 ) problems as well. However, in this case, it
takes only O(1) time to solve each sub-problem, which means that the total runtime
of this algorithm is O(V 3 ).
Bottom up via relaxation

1 C = (w(u, v))
2 for k = 1 to n by 1
3 for u in V
4 for v in V
5 if cuv > cuk + ckv
6 cuv = cuk + ckv
As before, we choose to ignore subscripts.
Johnson’s algorithm
1. Find function h : V → R such that wh (u, v) = w(u, v) + h(u) − h(v) ≥ 0 for all
u, v ∈ V or determine that a negative-weight cycle exists.
2. Run Dijkstra’s algorithm on (V, E, wh ) from every source vertex s ∈ V ⇒ get

δh (u, v) for all u, v ∈ V
3. Given δh (u, v), it is easy to compute δ(u, v)
Claim. δ(u, v) = δh (u, v) − h(u) + h(v)
Proof. Look at any u → v path p in the graph G
• Say p is v0 → v1 → v2 → · · · → vk , where v0 = u and vk = v.

k
wh (p) = wh (vi−1 , vi )
i=1
= [w(vi−1 , vi ) + h(vi−1 ) − h(vi )]

i=1
k
= w(vi−1 , vi ) + h(v0 ) − h(vk )
i=1
= w(p) + h(u) − h(v)
• Hence all u → v paths change in weight by the same offset h(u) − h(v),
which implies that the shortest path is preserved (but offset).
How to find h?
We know that
wh (u, v) = w(u, v) + h(u) − h(v) ≥ 0
This is equivalent to,
h(v) − h(u) ≤ w(u, v)
for all (u, v) ∈ V . This is called a system of difference constraints.
Theorem. If (V, E, w) has a negative-weight cycle, then there exists no solution to

the above system of difference constraints.
Proof. Say v0 → v1 → · · · → vk → v0 is a negative weight cycle.

Let us assume to the contrary that the system of difference constraints has a
solution; let’s call it h.
This gives us the following system of equations,
h(v1 ) − h(v0 ) ≤ w(v0 , v1 )

h(v2 ) − h(v1 ) ≤ w(v1 , v2 )
..
.
h(vk ) − h(vk−1 ) ≤ w(vk−1 , vk )
h(v0 ) − h(vk ) ≤ w(vk , v0 )
Summing all these equations gives us
0 ≤ w(cycle) < 0
which is obviously not possible.

From this, we can conclude that no solution to the above system of difference
constraints exists if the graph (V, E, w) has a negative weight cycle.
Theorem. If (V, E, w) has no negative-weight cycle, then we can find a solution to

the difference constraints.
Proof. Add a new vertex s to G, and add edges (s, v) of weight 0 for all v ∈ V .
• Clearly, these new edges do not introduce any new negative weight cycles to the
graph
• Adding these new edges ensures that there now exists at least one path from s
to v. This implies that δ(s, v) is finite for all v ∈ V
• We now claim that h(v) = δ(s, v). This is obvious from the triangle inequality:
δ(s, u) + w(u, v) ≥ δ(s, v) ⇔ δ(s, v) − δ(s, u) ≤ w(u, v) ⇔ h(v) − h(u) ≤ w(u, v)
Time complexity
1. The first step involves running Bellman-Ford from s, which takes O(V E) time.
We also pay a pre-processing cost to reweight all the edges (O(E))
2. We then run Dijkstra’s algorithm from each of the V vertices in the graph; the
total time complexity of this step is O(V E + V 2 lg V )
3. We then need to reweight the shortest paths for each pair; this takes O(V 2 )
time.
The total running time of this algorithm is O(V E + V 2 lg V ).
Applications
Bellman-Ford consult any system of difference constraints (or report that it is un
solvable) in O(V E) time where V = variables and E = constraints.
An exercise is to prove the Bellman-Ford minimizes maxi xi − mini xi .
This has applications to
• Real-time programming
• Multimedia scheduling
• Temporal reasoning
For example, you can bound the duration of an event via difference constraint
LB ≤ tend − tstart ≤ U B, or bound a gap between events via 0 ≤ tstart2 − tend1 ≤ ε,
or synchronize events via |tstart1 − tstart2 | ≤ ε or 0.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 12 Minimum Spanning Tree 6.046J Spring 2015
Lecture 12: Greedy Algorithms and
Minimum Spanning Tree
Introduction
• Optimal Substructure
• Greedy Choice Property
• Prim’s algorithm
• Kruskal’s algorithm
Definitions
Recall that a greedy algorithm repeatedly makes a locally best choice or decision, but
ignores the effects of the future.
A tree is a connected, acyclic graph.
A spanning tree of a graph G is a subset of the edges of G that form a tree and
include all vertices of G.
Finally, the Minimum Spanning Tree problem: Given an undirected graph G =
(V, E) and edge weights W : E → R, find a spanning tree T of minimum weight
�
e∈T w(e).
A naive algorithm
The obvious MST algorithm is to compute the weight of every tree, and return the
tree of minimum weight. Unfortunately, this can take exponential time in the worst
case. Consider the following example:
If we take the top two edges of the graph, the minimum spanning tree can consist
of any combination of the left and right edges that connect the middle vertices to the
left and right vertices. Thus in the worst case, there can be an exponential number
of spanning trees.
Instead, we consider greedy algorithms and dynamic programming algorithms to
solve MST. We will see that greedy algorithms can solve MST in nearly linear time.
Properties of Greedy Algorithms

Problems that can be solved by greedy algorithms have two main properties:
• Optimal Substructure: the optimal solution to a problem incorporates the op

timal solution to subproblem(s)
• Greedy choice property: locally optimal choices lead to a globally optimal so
lution
We can see how these properties can be applied to the MST problem
Optimal substructure for MST

Consider an edge e = {u, v}, which is an edge of some MST. Then we can contract e
by merging the vertices u and v to create a new vertex. Then any edge adjacent to
vertex u or v is adjacent to the newly created vertex, and the process could result in
a multiedge if u and v have a mutual neighbor. We resolve the multiedge problem by
creating a single edge with the minimum weight between the two edges.
This leads us to the following lemma:
Lemma 1. If T ' is a minimum spanning tree of G/e, then T ' ∪ {e} is an MST of G.
Proof. Let T ∗ be an MST of G containing the edge e. Then T ∗ /e is a spanning tree

of G' = G/{e}. By definition, T ' is an MST of G' . Thus the total weight of T ' is less
than or equal to that of T ∗ /e, or w(T ' ) ≤ w(T ∗ /e). Then
w(T ) = w(T ' ) + w(e) ≤ w(T ∗ /e) + w(e) = w(T ' )
The statement can be used as the basis for a dynamic programming algorithm, in
which we guess an edge that belongs to the MST, retract the edge, and recurse. At
the end, we decontract the edge and add e to the MST.
The lemma guarantees that this algorithm is correct. However, this algorithm is
requires exponential time, because there are an exponential number of edges that we
can guess to form our MST.
We make the algorithm polynomial time by removing the guessing process.
Greedy Choice Property

The MST problem can be solved by a greedy algorithm because the the locally optimal
solution is also the globally optimal solution. This fact is described by the Greedy-
Choice Property for MSTs, and its proof of correctness is given via a “cut and paste”
argument common for greedy proofs.
Lemma 2 (Greedy-Choice Property for MST). For any cut (S, V \ S) in a graph
G = (V, E, w), any least-weight crossing edge e = {u, v} with u ∈ S and v ∈
/ S is in
some MST of G.
Proof. First, consider an MST T of G. Then T has a path from u to v. Because

u ∈ S and v ∈ / S, the path has some edge e' = {u' , v ' } which also crosses the cut.
Then T ' = T \ {e' } ∪ {e} is a spanning tree of G and w(T ' ) = w(T ) − w(e' ) + w(e),
but e is a least-weight edge crossing (S, V \ S). Thus w(e) ≤ w(e' ), so w(T ' ) ≤ w(T ).
Therefore T ' is an MST too.
Prim’s Algorithm
Now, we can apply the insights from the optimal structure and greedy choice property
to build a polynomial-time, greedy algorithm to solve the minimum spanning tree
problem.
Prim’s Algorithm Psuedocode

1 Maintain priority queue Q on V \ S, where v.key = min{w(u, v) | u ∈ S}
2 Q = V
3 Choose arbitrary start vertex s ∈ V , s.key = ∅
4 for v in V \ {s}
5 v.key = ∞
6 while Q is not empty
7 u = Extract-Min(Q), add u to S
8 for v ∈ Adj[u]
9 if v ∈ Q and v ∈
/ S and w(u, v) < v.key:
10 v.key = w(u, v) (via a Decrease-Key operation)
11 v.parent = u
12 return {{v, v.parent} | v ∈ V \ {s}}
In the above pseudocode, we choose an arbitrary start vertex, and attempt to

sequentially reduce the distance to all vertices. After attempting to find the lowest
weight edge to connect all vertices, we return our MST
Correctness
We prove the correctness of Prim’s Algorithm with the following invariants.
1. v ∈
/ S =⇒ v.key = min{w(u, v) | u ∈ S}
2. Tree TS within S ⊆ MST of G.
The first invariant is follows from Step 8 of the algorithm above. A proof of the
second invariant follows:
Proof. Assume by induction that TS ⊆ MST T ∗ . Then S → S ' ∪ {e}, where e is a

least-weight edge crossing the cut (S, V \ S). Then we can greedily cut and paste
e, which implies that we can modify T ∗ to include e without removing TS , since the
edges of TS do not cross the cut. Therefore TS ∪ {e} = TS' ⊆ T ∗ .
Thus Prim’s Algorithm always adds edges that have the lowest weight and gradu
ally builds a tree that is always a subset of some MST, and returns a correct answer.
Runtime
Prim’s algorithm runs in
O(V ) · TExtract-Min + O(E) · TDecrease-Key
The O(E) term results from the fact that Step 8 is repeated a number of times equal
to the sum of the number of adjacent vertices in the graph, which is equal to 2|E|,
by the handshaking lemma.
Then the effective runtime of the algorithm varies with the data structures used
to implement the algorithm. The table below describes the runtime with the different
implementations of the priority queue.
Priority Queue
TExtract-Min TDecrease-Key Total
Array O(V ) O(1) O(V 2 )
Binary heap O(lg V ) O(lg V ) O(E lg V )
Fibonacci heap O(lg V ) (amortized) O(1) (amortized) O(E + V lg V )
[CLRS ch. 19]
Kruskal’s Algorithm
Kruskal’s Algorithm is another algorithm to solve the MST problem. It constructs
an MST by taking the globally lowest-weight edge and contracting it.
Kruskal’s Algorithm Pseudocode

1 Maintain connected components that have been added to the
MST so far T , in a Union-Find structure
2 Initialize T = ∅
3 for v in V
4 Make-Set(v)
5 Sort E by weight
6 For e = (u, v) ∈ E (in increasing-weight order):
7 if Find-Set(u) = Find-Set(v):
8 Add e to T
9 Union(u, v)
Correctness
We use the following invariant to prove the correctness of Kruskal’s Algorithm.
Claim 3. The tree-so-far T ⊆ MST T ∗ .
Proof. We give an induction proof. We begin by assuming that the tree-so-far T ⊆ T ∗ ,

via the inductive hypothesis. When we add an edge e between some components C1
and C2 , we use the greedy-choice property on the cut (C1 , V \ C2 ). Thus we have
added the edge without removing T , and our new tree-so-far remains a subset of the
MST T ∗ .
Runtime
Kruskal’s algorithm has an overall runtime of
Tsort (E) + O(V ) · TMake-Set + O(E)(TFind + TUnion ) = O(E lg E + Eα(V ))
We note that TMake-Set is O(1) and Tfind + TUnion is amortized O(α(V )) for Union-Find
data structures.
Furthermore, if all weights are integer weights, or all weights are in the range
[0, E O(1) ], then the runtime of the sorting step is O(E), using Counting Sort or a
similar algorithm, and the runtime of Kruskal’s Algorithm will be better than that
of Prim’s Algorithm.
Other MST Algorithms

Currently, the fastest MST algorithm is a randomized algorithm with an expected
runtime of O(V + E). The algorithm was proposed by Karger, Klein, and Tarjan in
1993.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 14 Baseball Elimination 6.046J Spring 2015
Lecture 14: Baseball Elimination

In this lecture, we will be studying how to use max-flow algorithms to compute
elimination of sports teams in a league. Concretely, let us consider the standings of
the AL Eastern Division of the Major League Baseball on August 30, 19961 , without
Detroit’s record.
Table 1: Standings of AL East on August 30, 1996.

Team Wins (wi ) Losses (fi ) To Play (ri ) Games against each other
NY 75 59 28 – 5 7 4 3
Baltimore 71 63 28 5 – 2 4 4
Boston 69 65 28 7 2 – 4 0
Toronto 63 71 28 4 4 4 – 0
Detroit ? ? 28 3 4 0 0 –
NY Ba Bo T D
From this chart, how can we figure out if a team is eliminated? A naive sports
writer can only compute that Team i is eliminated if wi + ri < wj for some other
j. For example, if Detroit’s record was w5 = 46, then Detroit is certainly eliminated
since w5 + r5 = 46 + 28 < 75 = w1 .
This condition, however, is sufficient, but not necessary. For instance, consider
w5 = 47. Though w5 + r5 = 75, either NY or Baltimore will reach 76 wins since they
have 5 games left against each other. How can we determine if Detroit is eliminated
for arbitrary values of w5 ?
To answer this question, we can use max-flow. Consider the Figure 1, where
capacity between s and node i − j is the number of games left to be played between
team i and j, between node i − j and node k = 1, 2, 3, 4 is infinity, and node k and t
is w5 + r5 − wk . The intuition for the construction of the graph is that we will assume
Detroit win all r5 games, and try to keep the number of wins per team to be less than
or equal to the total possible wins of Detroit (≤ w5 + r5 ).
Theorem 1. Team 5 (Detroit) is eliminated if and only if max-flow does not saturate
all edges leaving the source, i.e., max flow value < 26.
Proof. Saturation of the edge capacity corresponds to playing all the remaining games.
If all the games cannot be played, while keeping the total number of wins of a team
to be less than or equal to w5 + r5 , then Team 5 is eliminated.
1
roughly speaking!
1
Lecture 14 Baseball Elimination 6.046J Spring 2015
Figure 1: Flow network to determine if Team 5 is eliminated
In Figure 1, the min-cut (S, T ) is S = {s, 1 − 2, 1 − 3, 2 − 3, 1, 2, 3} and T =

{1 − 4, 2 − 4, 3 − 4, 4, t}. The capacity of the min-cut c(S, T ) = 4 + 4 + 4 + 1 + 5 + 7
= 25 < 26. Therefore, Team 5 Detroit is eliminated.
Alternate explanation: Note that the max-flow will find the subset of teams
that eliminates Team 5. In this example, consider subset of teams 1,2, and 3. The
total number of wins among the 3 teams is 215 wins, and they have 14 games left
to play with each other. Then there will be 229 total wins at the end of the regular
season. This implies that there exists at least one team that wins 1 2293
l = 77 games.
Therefore, if Detroit only has 48 wins, then it is certainly eliminated (48 + 28 = 76 <
77). We can find such set using max-flow and max-flow min-cut theorem.
2
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 15 Linear Programming 6.046J Spring 2015
Lecture 15: Linear Programming
Linear programming (LP) is a method to achieve the optimum outcome under

some requirements represented by linear relationships. More precisely, LP can solve
the problem of maximizing or minimizing a linear objective function subject to some
linear constraints.
In general, the standard form of LP consists of
• Variables: x = (x1 , x2 , . . . , xd )T
• Objective function: c · x
• Inequalities (constraints): Ax ≤ b, where A is a n × d matrix
and we maximize the objective function subject to the constraints and x ≥ 0.

LP has many different applications, such as flow, shortest paths, and even politics.
In this lecture, we will be covering different examples of LP, and present an algorithm
for solving them. We will also learn how to convert any LP to the standard form in
this lecture.
1 Examples of Linear Programming: Politics

In this example, we will be studying how to campaign to win an election. In general,
there are n demographics, each with pi people, and m issues that the voters are
interested in. Given the information on how many votes can be obtained per dollar
spent advertising in support of an issue, how can we guarantee victory by ensuring a
majority vote in all demographics?1
In particular, consider the example with 3 demographics and 4 issues shown in
Table 1. What is the minimum amount of money we can spend to guarantee majority
in all demographics?
Let x1 , x2 , x3 , x4 denote the dollars spent per issue. We can now formulate this
problem as an LP problem:
Minimize x1 + x2 + x3 + x4
Subject to − 2x1 + 8x2 + 0x3 + 10x4 ≥ 50, 000 (Urban Majority) (1)
5x1 + 2x2 + 0x3 + 0x4 ≥ 100, 000 (Suburban Majority) (2)
3x1 − 5x2 + 10x3 − 2x4 ≥ 25, 000 (Rural Majority) (3)
x1 , x2 , x3 , x4 ≥ 0 (Can’t unadvertise) (4)
1
We will assume that the votes obtained by advertising different issues are disjoint.
Policy Demographic
Urban Suburban Rural
Building roads -2 5 3
Gun Control 8 2 -5
Farm Subsidies 0 0 10
Gasoline Tax 10 0 2
Population 100,000 200,000 50,000
Table 1: Votes per dollar spent on advertising, and population.
1.1 Certificate of Optimality

Though we have not presented an algorithm for solving this problem, given a solution,
we can verify that the solution is optimal with a proper certificate. For example,
2050000
x1 =
111
425000
x2 =
111
x3 =0
625000
x4 =
111
3100000
x1 + x2 + x3 + x4 =
111
is a solution to this problem. Now consider the following equation (certificate):
25 46 14 140
· (1) + · (2) + · (3) = x1 + x2 + x3 + x4
222 222 222 222
25 46 14
≥ · 50000 + · 100000 + · 25000

222 222 222
3100000
=
111
140 3100000
⇒ x1 + x2 + x3 + x4 ≥
222 111
140
We also know that x1 + x2 + x3 + x4 ≥ x1 + x2 + x
222 3
+ x4 . Therefore, the given
solution is an optimal solution to the problem.
2 Linear Programming Duality

The short certificate provided in the last section is not a coincidence, but a conse
quence of duality of LP problems. For every primal LP problem in the form of
Maximize c · x
Subject to Ax ≤ b, x ≥ 0,
there exists an equivalent dual LP problem
Minimize b · y
Subject to AT y ≥ c, y ≥ 0.
This property of LP can be used show many important theorems. For instance, the
max-flow min-cut theorem can be proven by formulating the max-flow problem as the
primal LP problem.
3 Converting to Standard Form

The natural LP formulation of a problem may not result in the standard LP form. In
these cases, we can convert the problem to standard LP form without affecting the
answers by using the following rules.
• Minimize an objective function: Negate the coefficients and maximize.
• Variable xj does not have a non-negativity constraint: Replace xj with

x "j − x""j , and xj" , x""j ≥ 0.
• Equality constraints: Split into two different constraints; x = b ⇒ x ≤ b, x ≥

b.
• Greater than or equal to constraints: Negate the coefficients, and translate

to less than or equal to constraint.
4 Formulating LP Problems
In this section, we will give brief descriptions of how to formulate some problems seen
previously in this class as LP problems. Once we have a LP formulation, we can
convert the problem into the standard form as described in Section 3.
4.1 Maximum Flow

We can model the max flow problem as maximization of sum of flows, under some
constraints which will model different properties of the flow. Given G(V, E), the
capacity c(e) for each e ∈ E, the source s, and the sink t,
L
Maximize f (s, v)
v∈V
Subject to f (u, v) = −f (v, u) ∀u, v ∈ V skew symmetry

L
f (u, v) = 0 ∀u ∈ V − {s, t} conservation
v∈V
f (u, v) ≤ c(u, v) ∀u, v ∈ V capacity.
4.2 Shortest Paths

We can model the shortest paths problem as minimization of the sum of all distances
from a node. Note that this sum is minimized only when all distances are minimized.
Given G(V, E), the weight w(e) for each e ∈ E, and the source s,
L
Maximize d(v)
v∈V
Subject to d(v) − d(u) ≤ w(u, v)∀u, v ∈ V triangular inequality

L
d(s) = 0.
v∈V
Note the maximization above, so all distances don’t end up being zero. There is no
solution to this LP if and only if there exists a negative weight cycle reachable from
s.
5 Algorithms for LP
There are many algorithms for solving LP problems:
• Simplex algorithm: In the feasible region, x moves from vertex to vertex in

direction of c. The algorithm is simple, but runs in exponential time in the
worst case.
• Ellipsoid algorithm: It starts with an ellipsoid that includes the optimal

solution, and keeps shrinking the ellipsoid until the optimal solution is found.
This was the first poly-time algorithm, and was a theoretical breakthrough.
However, the algorithm is impractical in practice.
• Interior Point Method: x moves inside the polytope following c. This algo
rithm runs in poly-time, and is practical.
In this lecture, we will study only the simplex algorithm.
5.1 Simplex Algorithm

As mentioned before, the simplex algorithm works well in practice, but runs in ex
ponential time in the worst case. At a high level, the algorithm works as Gaussian
elimination on the inequalities or constraints. The simplex algorithm works as follows:
• Represent LP in “slack” form.
• Convert one slack form into an equivalent slack form, while likely increasing the
value of the objective function, and ensuring that the value does not decrease.
• Repeat until the optimal solution becomes apparent.
5.1.1 Simplex Example

Consider the following example:
Minimize 3x1 + x2 + x3
Subject to x1 + x2 + 3x2 ≤ 30
2x1 + 2x2 + 5x3 ≤ 24
4x1 + x2 + 2x3 ≤ 36
x1 , x2 , x3 ≥ 0
Change the given LP problem to slack form, consisting of the original variables
called nonbasic variables, and new variables representing slack called basic variables.
z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
We start with a basic solution: we set all nonbasic variables on the right hand side
to some feasible value, and compute the values of the basic variables. For instance,
we can set x1 = x2 = x3 = 0. Note that the all 0 solution satisfies all constraints in
this problem, but may not do so in the general case.
We now perform the pivoting step:
• Select a nonbasic variable xe whose coefficient in the objective function is posi

tive.
• Increase the value of xe as much as possible without violating any constraints.
• Set xe to be basic, while some other basic variable becomes nonbasic.
In this example, we can increase the value of x1 . The third constraint will limit
the value of x1 to 9. We then get
x2 x 3 x 6
x1 = 9 − − − .
4 2 4
Now rewrite the other constraints with x6 on the right hand side.
x2 x3 3x6
z = 27 + + −
4 2 4
x2 x3 x6
x1 = 9 − − −
4 2 4
3x2 5x3 x6
x4 = 21 − − +
4 2 4
3x2 x6
x5 = 6 − − 4x3 +
2 2
We note the equivalence of the solutions. That is, the original basic solution
(0, 0, 0, 30, 24, 36) satisfies the rewritten constraints, and has the objective value of 0.
The second basic solution (9, 0, 0, 21, 6, 0) has the objective value of 27.
At this point, pivoting on x6 will actually cause the objective value to decrease
(though the computation is not shown here). Thus let us pick x3 as the next pivot
to get
111 x2 x5 11x6
z= + − −
4 16 8 16
33 x2 x5 5x6
x1 = − + −
4 16 8 16
3 3x2 x5 x6
x2 = − − +−
2 8 4 8
69 3x2 5x5 x6
x4 = + + −
4 16 8 16
which results in basic solution ( 33
4
, 0, 23 , 69
4
, 0, 0) with objective value of 111
4
.
Finally, pivoting on x2 yields

x3 x5 2x6
z = 28 − − −
6 6 3
x 3 x5 x6
x1 = 8 + + −
6 6 3
8x3 2x5 x6
x2 = 4 − − +
3 3 3
x 3 x5
x4 = 18 − +
2 2
Though we will not prove the correctness of this algorithm in this lecture, when all
coefficients of all nonbasic variables are negative, the simplex algorithm has found the
( )
optimal solution. In general, simplex algorithm is guaranteed to converge in n+m n
iterations where n is the number of variables, and n + m is the number of constraints.
This for general n and m can be exponential.
6 More Topics of LP
There are several important questions regarding LP that were not discussed in this
lecture:
• How do we determine if LP is feasible?
• What if LP is feasible, but the initial basic solution is infeasible?
• How do we determine if the LP is unbounded?
• How do we choose the pivot?
These questions are answered in the textbook and other LP literature.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture NP-Completeness 6.046J Spring 2015
Lecture 16: NP-Completeness
Introduction
• NP-hardness and NP-completeness
• 3SAT
4
Super Mario Brothers

4
3 Dimensional Matching
4
Subset Sum (weak)
Partition (weak)
Rectangle Packing (weak)
4-Partition (strong)
Rectangle Packing (strong)
Jigsaw Puzzles
NP-Hard and NP-Complete problems

Today, we discuss NP-Completeness.
Recall from 6.006:
• P = the set of problems that are solvable in polynomial time. If the problem
has size n, the problem should be solved in nO(1) .
• NP = the set of decision problems solvable in nondeterministic polynomial time.

The output of these problems is a YES or NO answer. Nondeterministic refers
to the fact that a solution can be guessed out of polynomially many options in
O(1) time. If any guess is a YES instance, then the nondeterministic algorithm
will make that guess.
In this model of nondeterminism, we can assume that all guessing is done first.
This is equivalent to finding a polynomial-time verifier of polynomial-size certificates
for YES answers.
Note that there is an asymmetry between YES and NO inputs
• A problem X is NP-complete if X ∈NP and X is NP-hard.
• A problem X is NP-hard if every problem Y ∈NP reduces to X.
– If P = NP, then X ∈
/ P.
• A reduction from problem A to problem B is a polynomial-time algorithm that

converts inputs to problem A into equivalent inputs to problem B. Equivalent
means that both problem A and problem B must output the same YES or NO
answer for the input and converted input.
– If B ∈ P, then A ∈ NP
– If B ∈ NP, then A ∈ NP
– If A is NP-hard, then B is NP-hard.
We can show that problems are NP-complete via the following steps.
1. Show X ∈ NP. Show that X ∈ NP by finding a nondeterministic algorithm,

or giving a valid verifier for a certificate.
2. Show X is NP-hard. Reduce from a known NP-complete problem Y to X.

This is sufficient because all problems Z ∈ NP can be reduced to Y , and the
reduction demonstrates inputs to Y can be modified to become inputs to X,
which implies X is NP-hard. We must demonstrate the following properties for
a complete reduction.
(a) Give an polynomial-time conversion from Y inputs to X inputs.

(b) If Y ’s answer is YES, then X’s answer is YES.
(c) If X’s answer is YES, then Y ’s answer is YES.
Finally, a gadget transforms features in an input problem to a feature in an

output problem.
3SAT
The 3SAT was discovered to be NP-complete by Cook in 1971.
Definition 1. 3SAT: Given a boolean formula of the form:
(x1 ∨ x3 ∨ x¯6 ) ∧ (x¯2 ∨ x3 ∨ x¯7 ) ∧ . . .
is there an assignment of variables to True and False, such that the entire formula
evaluates to True?
We note that a literal is of the form {xi , x̄i }, and both forms of the literal corre
spond to the variable xi . A clause is made up of the OR of 3 literals, and a formal is
the AND of clauses.
3SAT ∈ NP because we can create a verifier for a certificate. For a given instance
of 3SAT, a certificate corresponds to a list of assignments for each variable, and a
verifier can compute whether the instances is satisfied, or can be evaluated to true.
Thus the verifier is polynomial time, and the certificate has polynomial length.
It is important to note that this verifier only guarantees that a 3SAT instance is
verifiable. To ensure that a 3SAT instance is not verifiable, the algorithm would have
to check every variable assignment, which cannot be done in polynomial time.
3SAT is also NP-hard. We give some intuition for this result. Consider any prob
lem in NP. Because it belongs in NP, a nondeterministic polynomial time algorithm
exists to solve this problem, or a verifier to check a solution. The verifier is an al
gorithm that can be implemented as a circuit. Now, the circuit consists of AND,
OR and NOT gates, which can be represented as a formula. This formula can be
converted to 3SAT form, where each clause has 3 literals, which is equivalent to the
original formula. Thus all problems in NP can be converted to 3SAT, and the inputs
to the original problem are equivalent to the converted inputs to 3SAT, thus 3SAT is
NP-complete.
Super Mario Brothers [Aloupis, Demaine, Guo, Viglietta 2014]
We show that Super Mario Brothers is NP-hard by giving a reduction from 3SAT.
This version of Super Mario Brothers is generalized to an arbitrary screen size of
n × n, so we remove limits on the number of items on screen. We have the following
problem definition
Definition 2. Super Mario Brothers: Given a level of Super Mario Brothers, can
we advance to the next level?
Because we reduce from 3SAT, we are given a 3SAT instance, and we must gen
erate a level of Super Mario Brothers that corresponds to that 3SAT instance.
We construct the level by constructing gadgets for each of the variables in the
3SAT formula, as pictured in Figure 1. Mario jumps down from the ledge, and
cannot jump back up. He can fall to the left or right, corresponding to assigning the
variable to True or False. The remainder of the level is set up such that this choice
cannot be reversed.
We also create the following gadget for clauses. After choosing the assignment
for a given variable, Mario visits all of the clause gadgets with the same literal value,
(b) Clause gadget

(a) Variable gadget
(c) Crossover gadget
Figure 1: Gadgets for Super Mario Brothers.

© Nintendo. All rights reserved. This content is excluded from our Creative
Commons license. For more information, see http://ocw.mit.edu/fairuse.
then moves to the next variable gadget. By visiting a clause, Mario can release a star.
Finally, after visiting all of the variable gadgets, Mario must re-traverse the clause
gadgets. If the clause gadget was previously visited, a star is available, and he can
pass through the fire. Otherwise, he will not be able to traverse the clause gadget
and die.
Thus, winning the level is equivalent to passing through all the clause gadgets on
the second pass through. Mario can only pass all the clause gadgets if they have all
been satisfied by a variable assignment. The actions throughout the variable gadgets
correspond to the solution to the 3SAT formula, so if Mario can pass the level, we
have a solution to the 3SAT problem.
The final gadget needed is the crossover gadget. It ensures that Mario does not
switch between variable and clause gadgets when it is not allowed. The total size of
all these gadgets within the polynomial size required by the reduction.
Thus, Mario can win the level if and only if the original 3SAT formula could
be satisfied. Therefore have a reduction from 3SAT, and Super Mario Brothers is
NP-hard.
3 Dimensional Matching (3DM)

Definition 3. 3DM: Given disjoint sets X, Y , and Z, each of n elements and triples
T ⊆ X × Y × Z is there a subset S ⊆ T such that each element ∈ X ∪ Y ∪ Z is in
exactly one s ∈ S?
3DM is NP. Given a certificate, which lists a candidate list of triples, a verifier can
check that each triple belongs to T and every element of X ∪ Y ∪ Z is in one triple.
3DM is also NP-complete, via a reduction from 3SAT. We build gadgets for the
variables and clauses.
The variable gadget for variable xi is displayed in the picture. Only red or blue
triangles can be chosen, where the red triangles correspond to the true literal, while
the blue triangles correspond to the false literal. Otherwise, there will be overlap, or
some inner elements will not be covered. There is an 2nxi “wheel” for each variable
gadget, where nxi corresponds to the number of occurrences of xi in the formula.
The clause gadget for the clause xi ∧ x¯j ∧ xk is displayed in the picture. The dot
that is unshared in each variable’s triangle within the clause gadget is also the single
dot within the variable gadget.
Then, if we set xi to true, we take all of the red false triangles in the variable
gadget, leaving a blue true triangle available to cover the clause gadget. However,
this still potentially leaves x¯j and xk uncovered, so we need a garbage collection
�
gadget, pictured below. There are x nx clauses of these gadgets, because there are
nx unnecessary elements of the variable gadget that won’t be covered. However, of
the remaining elements, one of these per clause will be covered, so the remaining need
to be covered by the garbage collection clause.
Thus, if a solution exists to the 3SAT formula, we can find the solution to the
3DM problem by choosing the points in 3DM corresponding to the variable values.
If we have a 3DM solution, we can map this back to the satisfying assignment for
3SAT. Thus, our reduction is complete and 3DM is NP-hard.
Subset Sum
Definition 4. Subset Sum Given n integers A = {a1 , a2 , . . . , an } and a target sum
t, is there a subset S ⊆ A stuch that
� �
S= ai = t
ai ∈S
The Subset Sum problem is NP-complete. It is in NP, because a verifier can

simply check that the given subset is a subset of A and that its sum is equivalent to
the target in polynomial time.
It is NP-hard via a reduction from 3DM. View the numbers in base b = 1 +
maxi nxi . Then the triple (xi , xj , xk ) can be written in base b in the form 000100100001000 =
bi + bj + bk , where the first 1 corresponds to i, the second to j, and the third to k.
�
The target sum t = 111111111111111 = i bi .
This prevents any 1s from colliding, which ensures that each element is used
exactly once, because multiple 1s correspond to reusing an element, and the base is
sufficiently large so that no smaller numbers can be summed to create the next power
of b. This completes the reduction.
In fact, the Subset Sum problem is only weakly NP-hard. The number of digits
in t is O(n). This means that the values of the numbers used in the reduction are
exponential in input, making this problem weakly NP-hard. Problems which are
strongly NP-hard must only use number values that are polynomial in the size of
the input.
This also implies that the problem can be solved by a pseudopolynomial algorithm.
Partition
Definition 5. Partition: Given A = {a1 , a2 , . . . , an }, is there a subset S ⊆ A such
that � � 1�
S= A\S = A?
2
Partition is also weakly NP-complete. It is a special case of the Subset Sum
�
problem, where we set t = 12 A. In fact, we can reduce Partition to Subset Sum,
though this is not the direction we want for the reduction.
�
We can reduce from Subset Sum to Partition as follows. Let σ = A. Add
elements an+1 = σ + t and an+2 = 2σ − t to A. Then an+1 and an+2 must be on
different sides of the partition. In order to balance the two sides, σ + t must be added
to an+1 and t must be added to an+1 , so each subset has sum 2σ. Thus if we can
solve Partition, we also have the subset of elements that sum to t, the target for the
Subset Sum problem, completing our reduction.
Rectangle Packing
Definition 6. Rectangle Packing: Given a set of rectangles Ri and a target rect
angle T , can we pack the rectangles in T such that there is no overlap? Note that
sum of the area of the rectangles Ri is equivalent to the area of the target rectangle
�
or i Ri = T .
Rectangle packing is weakly NP-hard via a reduction from Partition. For every
element ai in Partition, we create a rectangle Ri with height 1 and width 3ai . The
�
target rectangle has height 2 and width 3t = 32 A. Because each rectangle has width
at least 3, all rectangles must be packed horizontally. Thus to solve the Rectangle
packing problem, we must separate the blocks into two groups with total width 3t,
which will correspond to two subsets with total sum t in the Partition problem.
Jigsaw Puzzles [Demaine & Demaine 2007]
Definition 7. Given square tiles with no patterns, can these tiles be arranged to
fit a target rectangular shape? Note that the tiles can have a side tab, pocket, or
boundary, but tabs and pockets must have matching shapes.
The most obvious reduction is from Partition. For every number, create a set
of square rectangles with a unique tab and pocket, where the number of tiles is
equivalent to the value of ai , and the two end pieces of the rectangle have boundaries.
However, this reduction cannot be completed because the inputs to Partition can be
exponentially large.
Instead, the reduction comes from 4-Partition.
Definition 8. 4-Partition: Given n integers A = {a1 , a2 , . . . , an } ∈ ( 5t , 3t ), is there

� n
a partition into n4 subsets of 4 elements, each with the same sum t = A/ 4 ?
4-Partition is a strongly NP-complete problem. We reduce from 4-Partition to

Jigsaw Puzzles to show that Jigsaw Puzzles are NP-hard.
For each ai , we create the following set of pieces. This ensures that we cannot
mix pieces from different ai .
These pieces are set into the following target board. There are n4 rows, each
of which will hold pieces corresponding to 4 ai terms in the 4-Partition problem.
The width of the target board is t, because each row must hold t pieces, so that
the corresponding ai form a group of sum t in the 4-Partition problem. Thus, the
reduction is complete.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 17: Approximation Algorithms
• Definitions
• Vertex Cover
• Set Cover
• Partition
Approximation Algorithms and Schemes

Let Copt be the cost of the optimal algorithm for a problem of size n. An approxi
mation algorithm for this problem has an approximation ratio Q(n) if, for any input,
the algorithm produces a solution of cost C such that:
C Copt
max( , ) ≤ Q(n)
Copt C
Such an algorithm is called a Q(n)-approximation algorithm.
An approximation scheme that takes as input c > 0 and produces a solution such
that C = (1 + c)Copt for any fixed c, is a (1 + c)-approximation algorithm.
A Polynomial Time Approximation Scheme (PTAS) is an approximation algorithm

that runs in time polynomial in the size of the input, n. A Fully Polynomial Time
Approximation Scheme (FPTAS) is an approximation algorithm that runs in time
polynomial in both n and c. For example, a O(n2/E ) approximation algorithm is a
PTAS but not a FPTAS. A O(n/c2 ) approximation algorithm is a FPTAS.
Vertex Cover
Given an undirected graph G(V, E), find a subset V ' ⊆ V such that, for every edge
(u, v) ∈ E, either u ∈ V ' or v ∈ V ' (or both). Furthermore, find a V ' such that |V ' |
is minimum. This is an NP-Complete problem.
Approximation Algorithm For Vertex Cover

Here we define algorithm Approx Vertex Cover , an approximation algorithm for Ver
tex Cover. Start with an empty set V ' . While there are still edges in E, pick an edge
(u, v) arbitrarily. Add both u and v into V ' . Remove all edges incident on u or v.
Repeat until there are no more edges left in E. Approx Vertex Cover runs in poly
nomial time.
Take for example the following graph G:
b c d
a e f g
Approx Vertex Cover could pick edges (b, c), (e, f ) and (d, g), such that V ' =
{b, c, e, f, d, g} and |V ' | = 6. Hence, the cost is C = |V ' | = 6. The optimal solution
for this example is {b, d, e}, hence Copt = 3.
Claim: Approx Vertex Cover is a 2-approximation algorithm.
Proof: Let U ⊆ V be the set of all the edges that are picked by Approx Vertex Cover .
The optimal vertex cover must include at least one endpoint of each edge in U (and
other edges). Furthermore, no two edges in U share an endpoint. Therefore, |U |
is a lower bound for Copt . i.e. Copt ≥ |U |. The number of vertices in V ' returned
by Approx Vertex Cover is 2 · |U |. Therefore, C = |V ' | = 2 · |U | ≤ 2Copt . Hence
C ≤ 2 · Copt . D
Set Cover
Given a set X and a family of (possibly overlapping) subsets S1 , S2 , · · · , Sm ⊆ X such
that ∪m
i=1 Si = X, find a set P ⊆ {1, 2, 3, · · · , m} such that ∪i∈P Si = X. Furthermore
find a P such that |P | is minimum.
Set Cover is an NP-Complete problem.
Approximation Algorithm for Set Cover

Here we define algorithm Approx Set Cover , an approximation algorithm for Set
Cover. Start by initializing the set P to the empty set. While there are still ele
ments in X, pick the largest set Si and add i to P . Then remove all elements in
Si from X and all other subsets Sj . Repeat until there are no more elements in X.
Approx Set Cover runs in polynomial time.
In the following example, each dot is an element in X and each Si are subsets of
X.
S3 S4 S5
S1
S2
S6
Approx Set Cover selects sets S1 , S4 , S5 , S3 in that order. Therefore it returns P =

{1, 4, 5, 3} and its cost C = |P | = 4. The optimal solution is Popt = {S3 , S4 , S5 } and
Copt = |Popt | = 3.
Claim: Approx Set Cover is a (ln(n)+1)-approximation algorithm (where n = |X|).
Proof: Let the optimal cover be Popt such that Copt = |Popt | = t. Let Xk be the
set of elements remaining in iteration k of Approx Set Cover . Hence, X0 = X.
Then:
• for all k, Xk can be covered by t sets (from the optimal solution)
|Xk |
• one of them covers at least t
elements
|Xk |
• Approx Set Cover picks a set of (current) size ≥ t
• for all k, |Xk+1 | ≤ (1 − 1t )|Xk | (More careful analysis (see CLRS, Ch. 35) relates
Q(n) to harmonic numbers. t should shrink.)
• for all k, |Xk+1 | ≤ (1 − 1t )k · n ≤ e−k/t · n (n = |X0 |)
Algorithm terminates when |Xk | < 1, i.e., |Xk | = 0 and will have cost C = k.
e−k/t · n < 1
ek/t > n
Hence algorithm terminates when kt > ln(n). Therefore kt = CCopt ≤ ln(n) + 1. Hence
Approx Set Cover is a (ln(n) + 1)-approximation algorithm for Set Cover. D
Notice that the approximation ratio gets worse for larger problems as it changes
with n.
Partition
The input is a set S = {1, 2, · · · , n} of n items with weights s1 , s2 , · · · , sn . Assume,
without loss of generality, that the items are ordered such that s1 ≥ s2 ≥ · · · ≥ sn .
�
Partition S into sets A and B to minimize max(w(A), w(B)), where w(A) = Si
� i∈A
and w(B) = Sj .
j∈B
�
n
Define 2L = si = w(S). Then optimal solution will have cost Copt ≥ L by

i=1
definition.
Partition is an NP-Complete problem. Want to find a PTAS (1 + c)-approximation.

(Note that 2-approximation in this case is trivial). Also, an FPTAS also exists for
this problem.
Approximation Algorithm for Partition

Here we define Approx Partition. Define m = I 1E l − 1. (c ≈ 1
m+1
) The algorithm
proceeds in two phases.
First Phase: Find an optimal partition A' , B ' of s1 , · · · , sm . This takes O(2m ) time.
Second Phase: Initialize sets A and B to A' and B ' respectively. Hence they
already contain a partition of elements s1 , · · · , sm . Then, for each i, where i goes
from m + 1 to n, if w(A) ≤ w(B), add i to A, otherwise add i to B.
Claim: Approx Partition is a PTAS for Partition.
Proof: Without loss of generality, assume w(A) ≥ w(B). Then the approxima
tion ratio is CCopt = w(A)
L
. Let k be the last item added to A. There are two cases,
either k was added in the first phase, or in the second phase.
Case 1: k is added to A in the first phase. This means that A = A' . We have
an optimal partition since we can’t do better than w(A' ) when we have n ≥ m items,
and we know that w(A' ) is optimal for the m items.
Case 2: k is added to A in the second phase. Here we know w(A) − sk ≤ w(B)

since this is why k was added to A and not to B. (Note that w(B) may have in
creased after this last addition to A). Now, because w(A) + w(B) = 2L, w(A) − sk ≤
w(B) = 2L − w(A). Therefore w(A) ≤ L + s2k . Since s1 ≥ s2 ≥ · · · ≥ sn , we can say
that s1 , s2 , · · · , sm ≥ sk . Now since k > m, 2L ≥ (m + 1)sk .
w(A) L + s2k sk sk 1
Now, L
≤ = 1+ 2L ≤ 1+ (m+1)·s = 1+ m+1 = 1+c. Hence Approx Partition
L k
is a (1 + c)-approximation for Partition. D
Natural Vertex Cover Approximation

Here we describe Approx Vertex Cover Natural , a different approximation algorithm
for Vertex Cover. Start with an empty set V ' . While there are still edges left in E,
pick the vertex v ∈ V that has maximum degree and add it to V ' . Then remove v and
all incident edges from E. Repeat until no more edges left in E. In the end, return V ' .
The following example shows a bad-case example for Approx Vertex Cover Natural .
In the example, the optimal cover will pick the k! vertices at the top.
k! vertices of degree k
...
k!/k vertices k!/(k-1) vertices k! vertices

of degree k of degree k-1 of degree 1
Approx Vertex Cover Natural could possibly pick all the bottom vertices from left
to right in order. Hence the cost could be k! · ( k1 + k−1
1
+ · · · + 1) ≈ k! log k. Which is
a factor of log k worse than optimal.
Claim: Approx Vertex Cover Natural is a (log n)-approximation.
Proof: Let Gk be the graph after iteration k of the algorithm. And let n be the
number of edges in the graph, i.e. |G| = n = |E|. With each iteration, the algorithm
selects a vertex and deletes it along with all incident edges. Let m = Copt be the
number of vertices in the optimal vertex cover for G. Then let’s look at the first m
iterations of the algorithm: G0 → G1 → G2 → · · · → Gm .
Let di be the degree of the maximum degree vertex of Gi−1 . Then the algorithm
deletes all edges incident on that vertex to get Gi . Therefore:
�
m
|Gm | = |G0 | − di
i=1
Also:
m �
m
|Gi−1 |
di ≥
i=1 i=1
m
This is true because given |Gi−1 | edges that can be covered by m vertices, we know
that there is a vertex with degree at least |Gm
i−1 |
. Then:
m
|Gi−1 | �
m
|Gm |
≥
= |Gm |
i=1
m
i=1
m
This is true since |Gi | ≤ |Gi−1 | for all i. Then, it follows:
|G0 | − |Gm | ≥ |Gm |
�
m
Because |Gm | ≤ di . Hence after m iterations, the algorithm will have deleted half
i=1
or more edges from G0 . And generally, since every m iterations it will halve the
number of edges in the graph, in m · log |G0 | iterations, it will have deleted all the
edges. And since with each iteration it addes 1 vertex to the cover, it will end up with
a vertex cover of size m · log |G0 | = m · log n. Since we assumed that m was the size of
the optimal vertex cover, CCopt = m log
m
n
= log n. Hence Approx Vertex Cover Natural
is a (log n)-approximation. D
Note that since n ≈ k! log k in the example of Figure , the worst-case example is
log k ≈ log log n worse, but we have only shown an O(log n) approximation.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 18 Fixed-Parameter Algorithms 6.046J Spring 2015
Lecture 18: Fixed-Parameter Algorithms
• Vertex Cover
• Fixed-Parameter Tractability
• Kernelization
• Connection to Approximation
Fixed Parameter Algorithms

Fixed Parameter Algorithms are an alternative way to deal with NP-hard problems
instead of approximation algorithms. There are three general desired features of an
algorithm:
1. Solve (NP-)hard problems
2. Run in polynomial time (fast)
3. Get exact solutions

In general, unless P = NP, an algorithm can have two of these three features, but not
all three. An algorithm that has Features 2 and 3 is an algorithm in P (poly-time
exact). An approximation algorithm has Features 1 and 2. It solves hard problems,
and it runs fast, but it does not give exact solutions. Fixed-parameter algorithms will
have Features 1 and 3. They will solve hard problems and give exact solutions, but
they will not run very fast.
Idea: The idea is to aim for an exact algorithm but isolate exponential terms to a
specific parameter. When the value of this parameter is small, the algorithm gets fast
instances. Hopefully, this parameter will be small in practice.
Parameter: A parameter is a nonnegative integer k(x) where x is the problem

input. Typically, the parameter is a natural property of the problem (some k in
input). It may not necessarily be efficiently computable (e.g., OPT).
Parameterized Problem: A parameterized problem is simply the problem plus

the parameter or the problem as seen with respect to the parameter. There are
potentially many interesting parameterizations for any given problem.
Goal: The goal of fixed-parameter algorithms is to have an algorithm that is poly

nomial in the problem size n but possibly exponential in the parameter k, and still
get an exact solution.
k-Vertex Cover
Given a graph G = (V, E) and a nonnegative integer k, is there a set S ⊆ V of
vertices of size at most k, |S| ≤ k, that covers all edges. This is a decision problem
for Vertex Cover and is also NP-hard. We will use k as the parameter to develop a
fixed-parameter algorithm for k-Vertex Cover. Note that we can have k << |V | as
the figure below shows:
Brute-force solution (bad)

() ( v ) () ()
Try all kv + k−1 + · · · + v0 sets of ≤ k vertices. Can skip all terms smaller than kv
because bigger sets have more coverage. Testing coverage takes O(m) time where m
is the number of edges. Therefore, the total runtime is O(V k |E|). It is polynomial for
fixed k but not the same polynomial for all k’s. It is inefficient in most cases. Hence
we define nf (k) to be bad, where n = |V | + |E| is the input size.
Bounded search-tree algorithm (good)

This is a general technique used to improve brute force searches. It works as follows:
• pick arbitrary edge e = (u, v)
• we know that either u ∈ S or v ∈ S (or both) but don’t know which
• guess which one: try both possibilities
1. add u to S, delete u and incident edges from G, and recurse with k ' = k−1.
2. do the same but with v instead of u
3. return the OR of the two outcomes
This is like guessing in dynamic programming but memoization doesn’t help here.
The recursion tree looks like the following:
u,v
u v
u',v' u'',v''
u' v' u'' v''
At a leaf (k = 0), return YES if |E| = 0 (all edges covered). It takes O(V ) time
to delete u or v. Therefore this has a total runtime of (2k |V |).
• O(V ) for fixed k
• degree of polynomial is independent of k
• also polynomial for k = O(lg |V |)
• practical for e.g. k ≤ 32
• Hence we define f (k) · nO(1) to be good
Fixed Parameter Tractability

A parameterized problem is fixed-parameter tractable (FPT) if there is an algorithm
with running time ≤ f (k) · nO(1) , such that f : N → N (non negative) and k is the
parameter, and the O(1) degree of the polynomial is independent of k and n.
Question: Why f (k) · nO(1) and not f (k) + nO(1) ?
t
Theorem: ∃f (k) · nc algorithm ⇐⇒ ∃f ' (k) + nc
Proof:
(⇐)
t
Trivial (assuming f ' (k) and nc are ≥ 1)
(⇒)
if n ≤ f (k), then f (k) · nc ≤ f (k)c+1
if f (k) ≤ n then f (k) · nc ≤ n

c+1
t
Therefore f (k) · nc ≤ max(f (k)c+1 , nc+1
) ≤ f (k)c+1 + nc+1 = f ' (k) + nc
D
Alternatively, since xy ≤ x2 + y 2 , can just make f ' (k) = (f (k))2 and c' = 2c.
Example: O(2k · n) ≤ O(4k + n2 )
Kernelization
Kernelization is a simplifying self-reduction. It is a polynomial time algorithm that
converts an input (x, k) into a small and equivalent input (x' , k ' ). Here, small means
|x' | ≤ f (k) and equivalent means the answer to x is the same as the answer to x' .
Theorem: a problem is FPT ⇐⇒ ∃ a kernelization
Proof:
(⇐)
Kernelize ⇒ n' ≤ f (k)
Run any finite g(n' ) algorithm
Totals to nO(1) + g(f (k)) time
(⇒)
let A be an f (k) · nc algorithm, then assuming k is known:
if n ≤ f (k), it’s already kernelized.
if f (k) ≤ n, then
1. run A → f (k) · nc ≤ nc+1 time
2. output O(1)-sized YES/NO instance as appropriate (to kernelize)

if k is unknown: run A for nc+1 time and if it is still not done, we know it is already
kernelized.
So we know (exponential) kernel exists. Recent work aims to find polynomial (even
linear) kernels when possible.
Polynomial kernel for k-Vertex Cover

To create a kernel for k-Vertex Cover, the algorithm follows the following steps:
• Make graph simple by removing all self loops and multi-edges
• Any vertex of degree > k must be in the cover (else would need to add > k
vertices to cover incident edges)
• Remove such vertices (and incident edges) one at a time, decreasing k accord
ingly
• Remaining graph has maximum degree ≤ k
• Each remaining vertex covers ≤ k edges
• If the number of remaining edges is > k, answer NO and output canonical NO

instance.
• Else, |E ' | ≤ k 2
• Remove all isolated vertices (degree 0 vertices)
• Now |V ' | ≤ 2k 2
• The input has been reduced to instance (V ' , E ' ) of size O(k 2 )
The runtime of the kernelization algorithm is naively O(V E). (O(V + E) with more
work.) After this, we can apply either a brute-force algorithm on the kernel, which
yields an overall runtime O(V + E + (2k2 )k k 2 ) = O(V + E + 2k k 2k+2 ). Or we can
apply a bounded search-tree solution, which yields a runtime of O(V + E + 2k k 2 ).
The best algorithm to date: O(kV + 1.274k ) by [Chen, Kanj, Xia - TCS 2010].
Connection to Approximation Algorithms

Take an optimization problem, integral OPT and consider its associated decision
problem: “OPT ≤ k ?” and parameterize by k.
Theorem: optimization problem has EPTAS
(EPTAS: efficient PTAS, f ( 1 ) · nO(1) e.g. ApproxP artition[L17])
⇒ decision problem is FPT
Proof: (like FPTAS, pseudopolynomial algorithm)
• Say maximization problem (and ≤ k decision)

1
• run EPTAS with t = 2k
in f (2k) · nO(1) time.
1 1
• relative error ≤ 2k
< k
• ⇒ absolute error < 1 if OPT ≤ k
• So if we find a solution with value ≤ k, then OPT ≤ (1 +

2
1
k ) · k ≤ k +
12
Integral ⇒ OPT ≤ k ⇒ YES
• else OPT > k
Also: =, ≤, ≥ decision problems are equivalent with respect to FPT.
(Can use this relation to prove that EPTASs don’t exists in some cases)
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 21: Cryptography: Hashing
In this lecture, we will be studying some basics of cryptography. Specifically, we

will be covering
• Hash functions
• Random oracle model
• Desirable Properties
• Applications to security
1 Hash Functions
A hash function h maps arbitrary strings of data to fixed length output. The function
is deterministic and public, but the mapping should look “random”. In other words,
h : {0, 1}∗ → {0, 1}d
for a fixed d. Hash functions do not have a secret key. Since there are no secrets and
the function itself is public, anyone can evaluate the function. To list some examples,
Hash function MD4 MD5 SHA-1 SHA-256 SHA-512

d 128 128 160 256 512
In practice, hash functions are used for “digesting” large data. For example, if
you want to check the validity of a large file (potentially much larger than a few
megabytes), you can check the hash value of that file with the expected hash. There
fore, it is desirable (especially for cryptographic hash functions covered here) that
the function is collision resistant. That is, it should be “hard” to find two inputs m1
and m2 for hash function h such that h(m1 ) = h(m2 ). Most modern hash functions
hope to achieve security level of 264 or better, which means that the attacker needs
to test more than 264 different inputs to find a collision. Unfortunately, MD4 and
MD5 aimed to provide 264 security, but has been shown to be broken using 26 and
237 inputs respectively. SHA-1 aimed to provide 280 security, but has been shown (at
least theoretically) to be no more than 261 security.
1.1 Random Oracle

The Random Oracle model is an ideal model of the hash function that is not achievable
in practice. In this model, we assume there exists an oracle h such that on input
x ∈ {0, 1}∗ , if h has not seen x before, then it outputs a random value as h(x).
Otherwise, it returns h(x) it previously output. The random oracle gives a random
value for all new inputs, and gives deterministic answers to all inputs it has seen
before. Unfortunately, a random oracle does not exist since it requires infinite storage,
so in practice we use pseudo-random functions.
1.2 Desirable Properties

There are many desirable properties of a hash function.
1. One-way (pre-image resistance): Given y ∈ {0, 1}d , it is hard to find an x such

that h(x) = y.
2. Strong collision-resistance: It is hard to find any pair of inputs x, x' such that
h(x) = h(x' ).
3. Weak collision-resistance (target collision resistance, 2nd pre-image resistance):

Given x, it is hard to find x' such that h(x) = h(x' ).
4. Pseudo-random: The function behaves indistinguishable from a random oracle.
5. Non-malleability: Given h(x), it is hard to generate h(f (x)) for any function f .
Some of the properties imply others, and some others do not. For example,
• 2⇒3
• 1 ⇒ 2, 3.
Furthermore, collision can be found in O(2d/2 ) (using birthday paradox), and inversion
can be found in O(2d ).
To give more insight as to why some properties do not imply others, we provide
examples here. Consider h that satisfies 1 and 2. We can construct a new h' such that
h' takes in one extra bit of input, and XORs the first two bits together to generate
an input for h. That is, h' (a, b, x2 , . . . , xn ) = h((a ⊕ b), x2 , . . . , xn ). h' is still one-way,
but is not weak collision resistant. Now consider a different h'' that evaluates to 0||x
if |x| ≤ n, and 1||h(x) otherwise. h'' is weak collision resistant, but is not one-way.
1.3 Applications
There are many applications of hash functions.
1. Password Storage: We can store hash h(p) for password p instead of p directly,
and check h(p) to authenticate a user. If it satisfies the property 1, adversary
comprising h(p) will not learn p.
2. File Authenticity: For each file F , we can store h(F ) in a secure location. To
check authenticity of a file, we can recompute h(F ). This requires property 3.
3. Digital Signature: We can use hash functions to generate a signature that guar
antees that the message came from a said source. For further explanation, refer
to Recitation 11.
4. Commitments: In a secure bidding, Alice wants to bid value x, but does not
want to reveal the bid until the auction is over. Alice then computes h(x), and
publicize it, which serves as her commitment. When bidding is over, then she
can reveal x, and x can be verified using h(x).
It should be “binding” so that Alice cannot generate a new x that has the same
commitment, and it should “hide” x so no ones learns anything before x is re
vealed. Furthermore, it should be non-malleable. To guarantee secrecy, we need
more than the properties from the previous section, as h' (x) = h(x)||M SB(x)
actually satisfies 1, 2, and 5. In practice, this problem is bypassed by adding
randomness in the commitment, C(x) = h(r||x) for r ∈R {0, 1}256 , and reveal
both randomness and x at the end.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 22 6.046J Spring 2015
Lecture 22: Cryptography: Encryption
• Symmetric key encryption
• Key exchange
• Asymmetric key encryption
• RSA
• NP-complete problems and cryptography
– graph coloring
– knapsack
Symmetric key encryption

c = ek (m)
m = dk (c)
Here c is the ciphertext, m is the plaintext, e is the encryption function, d is the
decryption function and k is the secret key. e, d permute and reverse-permute the
space of all messages.
Reversible operations: ⊕, +/−, shift left/right.
Symmetric algorithms: AES, RC5, DES
Key Management Question

How does secret key k get exchanged/shared?
Alice wants to send a message to Bob. There are pirates between Alice and Bob,
that will take any keys or messages in unlocked box(es), but won’t touch locked boxes.
How can Alice send a message or a key to Bob (without pirates knowing what was
sent)?
Solution:
• Alice puts m in box, locks it with kA
• Box sent to Bob
• Bob locks box with kB
• Box sent to Alice
• Alice unlocks kA
• Box sent to Bob
• Bob unlocks kB , reads m
Notice that this method relied on the commutativity of the locks. This means
that the order of the lock and unlock operations doesn’t matter.
Diffie-Hellman Key Exchange

G = Fp∗
Here Fp∗ is a finite field (mod p, a prime). ∗ means invertible elements only
({1, 2, ..., p − 1})
g public
Alice Bob 2≤g ≤p−2
p public
Select a
Compute g a ga 1 ≤ a, b ≤ p − 2
Select b
gb Compute g b
( ta
Alice can compute g b mod p = k.
a b
Bob can compute (g ) mod p = k.
Assumes the Discrete Log Problem is hard (given g a , compute a) and Diffie Hell-
man Problem is hard (given g a , g b compute g ab ).
Can we attack this? Man-in-the-middle:
• Alice doesn’t know she is communicating with Bob.
• Alice agrees to a key exchange with Eve (thinking she is Bob).
• Bob agrees to a key exchagne with Eve (thinking she is Alice).
• Eve can see all communications.
Public Key Encryption

message + public key = ciphertext
ciphertext + private key = message
The two keys need to be linked in a mathematical way. Knowing the public key
should tell you nothing about the private key.
RSA
• Alice picks two large secret primes p and q.
• Alice computes N = p · q.
• Chooses an encryption exponent e which satisfies gcd(e, (p − 1)(q − 1)) = 1,

e = 3, 17, 65537.
• Alice’s public key= (N, e).
• Decryption exponent obtained using Extended Euclidean Algorithm by Alice

such that e · d ≡ 1 mod (p − 1)(q − 1).
• Alice private key=(d, p, q) (storing p and q is not absolutely necessary, but we

do it for efficiency).
Encryption and Decryption with RSA

c = me mod N encryption
m = cd mod N decryption
Why it works
φ = (p − 1)(q − 1)
Since ed ≡ 1 mod φ there exists an integer k such that ed = 1 + kφ.
Two cases:
Case 1 gcd(m, p) = 1. By Fermat’s theorem,
mp−1 ≡ 1 mod p
( p−1 tk(q−1)
m ·m≡m mod p
m1+k(p−1)(q−1) = med ≡ m mod p
Case 2 gcd(m, p) = p. This means that m mod p = 0 and so m

ed ≡ m
Thus, in both cases, med ≡ m mod p. Similarly, med ≡ m
mod q. Since p, q are
distinct primes, med ≡ m mod N . So cd = (me )d ≡ m mod N
Hardness of RSA
• Factoring: given N , hard to factor into p, q.
• RSA Problem: given e such that gcd(e, (p − 1)(q − 1)) = 1 and c, find m such
that me ≡ c mod N .
NP-completeness
Is N composite with a factor within a range? unknown if NP-complete
Is a graph k-colorable? In other words: can you assign one of k colors to each
vertex such that no two vertices connected by an edge share the same color? NP-
complete
Given a pile of n items, each with different weights wi , is it possible to put items
in a knapsack such that we get a specific total weight S? NP-complete
NP-completeness and Cryptography

• NP-completeness: about worst-case complexity
• Cryptography: want a problem instance, with suitably chosen parameters that

is hard on average
Most knapsack cryptosystems have failed.

Determining if a graph is 3-colorable is NP-complete, but very easy on average.
This is because an average graph, beyond a certain size, is not 3-colorable!
Consider a standard backtracking search to determine 3-colorability.
• Order vertices v1 , ..., vt . Colors = {1, 2, 3}
• Traverse graph in order of vertices.
• On visiting a vertex, choose smallest possible color that “works”.
• If you get stuck, backtrack to previous choice, and try next choice.
• Run out of colors for 1st vertex → output ’NO’
• Successfully color last vertex → output ’YES’

On a random graph of t vertices, average number of vertices traveled < 197,
regardless of t!
Knapsack Cryptography
General knapsack problem: NP-complete
Super-increasing knapsack: linear time solvable. In this problem, the weights are
constrained as follows:
L
j−1
wj ≥ wi
i=1
Merkle Hellman Cryptosystem

Private transform
Private key → super-increasing knapsack problem −−−−−−−−−−→ “hard” general knap
sack problem → public key.
Transform: two private integers N, M s.t. gcd(N, M ) = 1.
Multiply all values in the sequence by N and then take mod M .
Example: N = 31, M = 105, private key={2, 3, 6, 14, 27, 52},
public key={62, 93, 81, 88, 102, 37}
Merkle Hellman Example
Message = 011000 110101 101110

Ciphertext:011000 93 + 81 = 174
110101 62 + 93 + 88 + 37 = 280
101110 62 + 81 + 88 + 102 = 333
= 174, 280, 333
Recipient knows N = 31, M = 105, {2, 3, 6, 14, 27, 52}. Multiplies each ciphertext
block by N −1 mod M . In this case, N −1 = 61 mod 105.
174 · 61 = 9 = 3 + 6 = 011000
280 · 61 = 70 = 2 + 3 + 13 + 52 = 110101
333 · 61 = 48 = 2 + 6 + 13 + 27 = 101110
Beautiful but broken

Lattice based techniques break this scheme.
Density of knapsack d = max{log nwi :1≤i≤n}
2
Lattice basis reduction can solve knapsacks of low density. Unfortunately, M-H
scheme always produces knapsacks of low density.
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 23 Cache-Oblivious I 6.046J Spring 2015
Lecture 23: Cache-oblivious Algorithms I
This lecture introduces cache-oblivious algorithms. Topics include

• memory hierarchy
• external memory vs. cache-oblivious model
• cache-oblivious scanning
• cache-oblivious divide & conquer algorithms: median finding & matrix multi
plication
1 Modern Memory Hierarchy

So far in this class, we have viewed all operations and memory accesses as equal cost.
However, modern computers have memory hierarchy.
CPU L1 L2 L3 memory disk
size ∼ 10KB 100KB MB GB TB
latency ∼ 1ns 10ns 10ns 100ns 10ms
Each hierarchy on the right is bigger, but has longer latency due to the longer
distance data has to travel. Yet, bandwidth between different hierarchies is usually
matched.
A common technique to mitigate latency is blocking: when fetching a word of
data, get the entire block containing it. Using algorithmic terminology, the idea is
to amortize latency over a whole block. For this idea to work, we additional require
the algorithm to use all words in a block (spatial locality) and reuse blocks in cache
(temporal locality).
2 Memory Model of Algorithms

2.1 External Memory Model
cache memory/disk
fast slow & blocked

CPU
... ...
O(1) size ...

...
Total size M Infinite size

divided into M/B blocks divided into blocks
Each block has B words Each block has B words
In this model, cache acecsses are free. The algorithm explicitly reads and writes
memory in blocks. We will count the number of memory transfers between the cache
and the memory, as an important metric of the algorithm’s performance.
2.2 Cache-oblivious Model

The only change from the external memory model is that the algorithm no longer
knows B and M . Accessing a word in the memory automatically fetches an entire
block into the cache, and evicts the least recently used (LRU) block from the cache
if the cache is full.
Every algorithm is a cache-oblivious algorithm, but we would like to find the
algorithm that minimizes our new metirc — the number of memory transfers.
Why do we like cache-olivious algorithms (as opposed to letting the algorithm
know B and M )? Because this way, an algorithm can auto-tune itself and run effi
ciently on different computers (possibly with different B and M ). Besides, it is cool
research!
3 Scanning
Example program:
for i in range(N ): sum += A[i]
Assume A is stored contiguously in memory. External memory model can align A

with a block boundary, so it needs IN/Bl memory transfers.
Cache-oblivious algorithms cannot control alignment (because it does not know

B), so it needs IN/Bl + 1 = N/B + O(1) memory transfers. O(1) parallel scans still
need O(N/B + 1) memory transfers.
4 Divide & Conquer

Divide & Conquer algorithms divide problems down to O(1) size. The base case of
the recursion is either when
• problem fits in cache i.e., ≤ M , or
• problem fits in O(1) blocks, i.e., O(B).
Below we will see one example for each.
4.1 Median Finding / Order Statistics

Recall the steps of the algorithm
1. view array as partitioned into columns of 5 (each column is O(1) size).
2. sort each column
3. recursively find the median of the column medians
4. partition array by x
5. recurse on one side
We will now analyze the number of memory transfers in each step. Let M T (N )
be the total number of memory transfers.
1. free
2. a scan, O(N/B + 1)
3. M T (N/5), this involves a pre-processing step that coallesces the N/5 elements
in a consecutive array
4. 3 parallel scans, O(N/B + 1)
5. M T (7N/10)
Therefore, we get the recursion
M T (N ) = M T (N/5) + M T (7N/10) + O(N/B + 1).
Solving the recursion requires setting a base case. An obvious base case is
M T (O(1)) = O(1).
But we can get a stronger base case: M T (O(B)) = O(1). Uing this base case, the
recursion solves to M T (N ) = O(N/B + 1). (Intuition: cost at level of the recursion
decreases geometrically, so the cost at root dominates.)
4.2 Matrix Multiplication

Problem: compute Z = X · Y where X, Y, Z are all N × N matrices. Also suppose
X is stored in row-major order, and Y is stored in column-major order to improve
locality.
If we use the basic algorithm, computing one element in Z requires one or two
parallel scans, because it either requires scanning either a new row from X, or a new
column from Y . Computing each element takes O(N/B + 1) memory transfers, so
computing the entire Z costs O(N 3 /B + N 2 ) memory transfers.
Instead we will use the blocked matrix multiplcation algorithm (not Strassen).
Note that key block should be stored consecutively. We get recursion
M T (N ) = 8M T (N/2) + O(N 2 /B + 1)
The first term is recursive sub-matrix multiplication, and the second term is matrix
addition which requires scanning the matrices.
Again, we can have different base cases
• weak: M T (O(1)) = O(1)
• better: M T (O(B)) = O(1)

J
• even better: M T ( M/3) = O(M/B)
The third case represents the case that the three involved matrices can fit in the
cache together. Therefore, to multiply them, we only need one scan to load all of
them into cache.
If we draw the recursion tree, the cost at each level is geometrically increasing this
time, N 2 /B, 8( N2 )2 /B, 82 ( N4 )2 /B, . . . . Therefore, the cost at the leaves dominate, and
the total cost = cost per leave · number of leaves,
√ √ √
M T (N ) = O(M/B) · 8O(lg N/ M ) = O(M/B) · O((N/ M )3 ) = O(N 3 /B M ).
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015
Lecture 24 Cache-Oblivious II 6.046J Spring 2015
Lecture 24: Cache-oblivious algorithms II
• Search
– binary
– B-ary
– cache-oblivious
• Sorting
– mergesorts
– cache-oblivious
Why LRU block replacement strategy?

LRUM ≤ 2 · OP TM/2 [Sleater and Tarjan 1985]
Proof.
• partition block access sequence into maximal phases of M/B distinct blocks
• LRU spends ≤ M/B memory transfers/phase
• OPT must spend ≥ M2 /B memory transfers per phase: at best, starts phase
with entire M/2 cache with needed items. But there are M/B blocks during
phase. So ≤ half free
Search
Preprocess n elements in comparison model to support predecessor search for x.
B-trees
They support predecessor (and insert and delete) in O(logB+1 N ) memory transfers.
• each node occupies Θ(1) blocks
• height= Θ(logB N )
• need to know B
Binary search
Approximately, every iteration visits a different block until we are in x’s block. Thus,
M T (N ) = Θ(log N − log B) = Θ(log(N/B)). SLOW
van Emde Boas layout

[Prokop 1999]
(1/2)lg N
\sqrt{N}
lg N
middle level
• store N elements in complete BST
• carve BST at middle level of edges
• recursively layout the pieces and concatenate
• like block matrix multiplication, order of pieces doesn’t matter; just need each
piece to be stored consecutively
Analysis of BST search in vEB layout:
• consider recursive level of refinement at which structure has ≤ B nodes

√
• the height of the vEB tree is between 12 lg B and lg B =⇒ size is between B
and B
=⇒ any root-to-node path (search path) visits ≤ 1lglgNB = 2 logB N trees that
2
have size ≤ B
• each tree of size ≤ B occupies ≤ 2 memory blocks
=⇒ ≤ 4 logB N = O(logB N ) memory transfers
• this generalizes to heights that are not powers of 2, B-trees of constant branch
ing factor and dynamic B-trees: O(logB N ) insert/delete. [Bender, Demaine,
Farach-Colton 2000]
Sorting
B-trees
N inserts into (cache-oblivious) B-tree =⇒ M T (N ) = Θ(N logB N ) NOT OPTI
MAL. By contrast, BST sort is optimal O(N lg N )
Binary mergesort
• binary mergesort is cache-oblivious.
• the merge is 3 parallel scans
=⇒ M T (N ) = 2M T (N/2) + O(N/B + 1)
M T (M ) = O(M/B)
• the recursion tree has lg(N/M ) levels, and each level contributes O(N/B)
=⇒ M T (N ) = N B
N
lg M . ← lgBB faster than the B-tree version discussed earlier!
M/B-way mergesort
• split array into M/B equal subarrays
• recursively sort each
• merge via M/B parallel scans (keeping one “current” block per list)
� �
M N
=⇒ M T (N ) = MT + O(N/B + 1)
B M/B
M T (M ) = O(M/B)
N
=⇒ height becomes logM/B +1
M
N B
= logM/B · +1
B M
N M
= logM/B − logM/B +1
B B
N
= logM/B
B
� �
N N

=⇒ M T (N ) = O logM/B
B B
This is asymptotically optimal, in the comparison model.
Cache-oblivious Sorting
This requires the tall-cache assumption: M = Ω(B 1+E ) for some fixed f > 0, e.g.,
M = Ω(B 2 ) or M/B = Ω(B).
Then, ≈ N E -way mergesort with recursive (“funnel”) merge works.
Priority Queues
• O( B1 logM/B N
B
) per insert or delete-min
• generalizes sorting
• external memory and cache-oblivious
• see 6.851
MIT OpenCourseWare
http://ocw.mit.edu

Spring 2015

(It-Ebooks-2017) It-Ebooks - Design and Analysis of Algorithms Lecture Notes (MIT 6.046J) - Ibooker It-Ebooks (2017) PDF

Uploaded by

Copyright:

Available Formats

(It-Ebooks-2017) It-Ebooks - Design and Analysis of Algorithms Lecture Notes (MIT 6.046J) - Ibooker It-Ebooks (2017) PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

(It-Ebooks-2017) It-Ebooks - Design and Analysis of Algorithms Lecture Notes (MIT 6.046J) - Ibooker It-Ebooks (2017) PDF

Uploaded by

Copyright:

Available Formats

Lecture 1 Introduction 6.

046J Spring 2015

• Algorithms for sorting, shortest paths, graph search, dynamic programming

2. Optimization - greedy and dynamic programming

4. Intractibility (and dealing with it)

6. Sublinear algorithms, approximation algorithms

Theme of today’s lecture

• NP: class of problems veriﬁable in polynomial time.

• NP-complete: problem is in NP and is as hard as any problem in NP.

Goal: Select a compatible subset of requests of maximum size.

Greedy Interval Scheduling

2. Reject all requests incompatible with i.

3. Repeat until all requests are processed.

1. Select request that starts earliest, i.e., minimum s(i).

Long one is earliest. Bad :(

2. Select request that is smallest, i.e., minimum f (i) − s(i).

Least # of incompatibles. Bad :(

4. Select request with earliest ﬁnish time, i.e., minimum f (i).

Claim 1. Greedy algorithm outputs a list of intervals

S ∗ [1, 2, . . . , k ∗ + 1] =< s(j1 ), f (j1 ) >, . . . , < s(jk∗ +1 ), f (jk∗ +1 ) >

S[1, 2, . . . , k] =< s(i1 ), f (i1 ) >, . . . , < s(ik ), f (ik ) >

Weighted Interval Scheduling

Here, R is the set of all requests.

opt(R) = max (w(i) + opt(Rf (i) ))

Dealing with intractability

2. Pruning heuristics to reduce (possible exponential) runtime on “real-world”

6.046J / 18.410J Design and Analysis of Algorithms

Lecture 2: Divide and Conquer

CH(S) represented by the sequence of points on the boundary in order clockwise

Lecture 2 Divide and Conquer 6.046J Spring 2015

Brute force for Convex Hull

• else the segment is not.

O(n2 ) edges, O(n) tests ⇒ O(n3 ) complexity

Divide and Conquer Convex Hull

• Divide into left half A and right half B by x coords

• Compute CH(A) and CH(B)

• Combine CH’s of two halves (merge step)

• Find upper tangent (ai , bj ). In example, (a4 , b2 ) is U.T.

• Find lower tangent (ak , bm ). In example, (a3 , b3 ) is L.T.

Lecture 2 Divide and Conquer 6.046J Spring 2015

• Cut and paste in time Θ(n).

Similarly for lower tangent.

Intuition for why Merge works

Lecture 2 Divide and Conquer 6.046J Spring 2015

• Arrange S into columns of size 5 (I n5 l cols)

• Sort each column (bigger elements on top) (linear time)

• Find “median of medians” as x

Lecture 2 Divide and Conquer 6.046J Spring 2015

How many elements are guaranteed to be > x?

O(1), for n ≤ 140

Solving the Recurrence

Lecture 2 Divide and Conquer 6.046J Spring 2015

a3 , b1 is upper tangent. a4 > a3 , b2 > b1 in terms of Y coordinates.

a1 , b3 is lower tangent, a2 < a1 , b4 < b3 in terms of Y coordinates.

ai , bj is an upper tangent. Does not mean that ai or bj is the highest point.

6.046J / 18.410J Design and Analysis of Algorithms

Lecture 3: Divide and Conquer:

• Divide and Conquer Algorithm

• Collapsing Samples / Roots of Unity