Lecture 1: Introduction
6.006 pre-requisite:
• Data structures such as heaps, trees, graphs
Course Overview
This course covers several modules:
1. Divide and Conquer - FFT, Randomized algorithms
3. Network Flow
5. Linear programming
7. Advanced topics
Interval Scheduling
Requests 1, 2, . . . , n, single resource
s(i) start time, f (i) finish time, s(i) < f (i) (start time must be less than finish
time for a request)
Two requests i and j are compatible if they don’t overlap, i.e., f (i) ≤ s(j) or
f (j) ≤ s(i).
In the figure below, requests 2 and 3 are compatible, and requests 4, 5 and 6 are
compatible as well, but requests 2 and 4 are not compatible.
[Figure: intervals 2–6 laid out on a timeline.]
Possible rules?
Smallest request first? Bad :(
3. For each request, find the number of incompatible requests, and select the request with the minimum such number.
Claim 1. The greedy algorithm outputs a list of intervals

< s(i1), f(i1) >, < s(i2), f(i2) >, . . . , < s(ik), f(ik) >

such that

s(i1) < f(i1) ≤ s(i2) < f(i2) ≤ . . . ≤ s(ik) < f(ik)
Proof. Simple proof by contradiction: if f(ij) > s(ij+1), intervals j and j + 1 intersect, which contradicts Step 2 of the algorithm!
Claim 2. Given a list of intervals L, the greedy algorithm with earliest finish time produces k∗ intervals, where k∗ is optimal.
Proof. Induction on k ∗ .
Base case: k ∗ = 1 – this case is easy, any interval works.
Inductive step: Suppose the claim holds for k∗, and we are given a list of intervals whose optimal schedule has k∗ + 1 intervals, namely S∗ = < s(j1), f(j1) >, . . . , < s(jk∗+1), f(jk∗+1) >.
Say for some generic k, the greedy algorithm gives a list of intervals S = < s(i1), f(i1) >, . . . , < s(ik), f(ik) >.
By construction, we know that f (i1 ) ≤ f (j1 ), since the greedy algorithm picks the
earliest finish time.
Now we can create a schedule
S∗∗ = < s(i1), f(i1) >, < s(j2), f(j2) >, . . . , < s(jk∗+1), f(jk∗+1) >
since the interval < s(i1 ), f (i1 ) > does not overlap with the interval < s(j2 ), f (j2 ) >
and all intervals that come after that. Note that since the length of S ∗∗ is k ∗ + 1, this
schedule is also optimal.
Now we proceed to define L' as the set of intervals with s(i) ≥ f (i1 ).
Since S∗∗ is optimal for L, S∗∗[2, 3, . . . , k∗ + 1] is optimal for L', which implies that the optimal schedule for L' has size k∗.
We now see by our initial inductive hypothesis that running the greedy algorithm
on L' should produce a schedule of size k ∗ . Hence, by our construction, running the
greedy algorithm on L' gives us S[2, . . . , k].
This means k − 1 = k ∗ or k = k ∗ + 1, which implies that S[1, . . . , k] is indeed
optimal, and we are done.
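A short Python sketch of the earliest-finish-time greedy described above (the function name and data layout are illustrative, not from the notes):

def schedule(requests):
    """Greedy interval scheduling: requests is a list of (s, f) pairs.
    Repeatedly pick the compatible request with the earliest finish time."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(requests, key=lambda r: r[1]):  # sort by finish time
        if s >= last_finish:          # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen

print(schedule([(0, 3), (2, 5), (3, 7), (6, 8), (7, 9)]))

Sorting dominates, so this runs in O(n log n).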
Dynamic Programming
We can define our sub-problems as
Rx = {j ∈ R|s(j) ≥ x}
Note that even though there may be requests compatible with i that are not in R^{f(i)}, we are picking i as the first request, i.e., we are going in order.
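The notes leave the recurrence implicit; a hedged memoized sketch of one natural reading (maximize the number of scheduled requests over the subproblems R_x, picking the first request i and recursing on R_{f(i)}; names and the lru_cache memoization are illustrative):

from functools import lru_cache

def dp_schedule(requests):
    """Max number of compatible requests via subproblems R_x = {j : s(j) >= x}."""
    reqs = sorted(requests)                  # go in order of start time

    @lru_cache(maxsize=None)
    def opt(x):
        best = 0
        for s, f in reqs:
            if s >= x:                       # request lies in R_x
                best = max(best, 1 + opt(f)) # recurse on R_{f(i)}
        return best

    return opt(float("-inf"))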
Non-identical machines
As before, we have n requests {1, 2, . . . , n}. Each request i is associated with a start time s(i) and finish time f(i). There are also m different machine types τ = {T1, . . . , Tm}.
Each request i is associated with a set Q(i) ⊆ τ that represents the set of machines
that request i can be serviced on.
Each request has a weight of 1. We want to maximize the number of jobs that
can be scheduled on the m machines.
This problem is in NP, since we can clearly check that a given subset of jobs with
machine assignments is legal.
Can k ≤ n requests be scheduled? This problem is NP-complete.
What is the maximum number of requests that can be scheduled? This problem is NP-hard.
3. Greedy or other sub-optimal heuristics that work well in practice but provide
no guarantees.
Lecture 2: Divide and Conquer
• Paradigm
• Convex Hull
• Median finding
Paradigm
Given a problem of size n, divide it into a subproblems, each of size n/b, where a ≥ 1 and b > 1. Solve each subproblem recursively. Combine the solutions of the subproblems to get the overall solution.

T(n) = aT(n/b) + [work for merge]
Convex Hull
Given n points in the plane
S = {(xi , yi )|i = 1, 2, . . . , n}
assume no two have same x coordinate, no two have same y coordinate, and no
three in a line for convenience.
Convex Hull ( CH(S) ): smallest polygon containing all points in S.
[Figure: an example point set p, q, r, s, t, u, v and its convex hull.]
• If the rest of the points are on one side of the segment, the segment is on the
convex hull.
Can we do better?
How to Merge?
[Figure: hulls CH(A) = (a1, . . . , a5) and CH(B) = (b1, b2, b3), separated by the vertical line L.]
First link ai to bj, go down the b list until you see bm and link bm to ak, then continue along the a list until you return to ai. In the example, this gives (a4, b2, b3, a3).
Finding Tangents
Assume a1 maximizes x within CH(A) = (a1, a2, . . . , ap) and b1 minimizes x within CH(B) = (b1, b2, . . . , bq).
L is the vertical line separating A and B. Define y(i, j) as the y-coordinate of the intersection between L and segment (ai, bj).
Claim: (ai, bj) is the upper tangent iff it maximizes y(i, j).
If y(i, j) is not maximum, there will be points on both sides of (ai, bj) and it cannot be a tangent.
Algorithm: Obvious O(n2 ) algorithm looks at all ai , bj pairs. T (n) = 2T (n/2) +
Θ(n2 ) = Θ(n2 ).
1 i = 1
2 j = 1
3 while (y(i, j + 1) > y(i, j) or y(i − 1, j) > y(i, j))
4     if (y(i, j + 1) > y(i, j))    ▷ move right finger clockwise
5         j = j + 1 (mod q)
6     else
7         i = i − 1 (mod p)    ▷ move left finger anticlockwise
8 return (ai, bj) as upper tangent
[Figure: CH(A) = (a1, . . . , ap) and CH(B) = (b1, . . . , bq) on either side of L, with the two fingers walking toward the upper tangent.]
a1, b1 are the rightmost and leftmost points of A and B respectively. We move anticlockwise from a1 and clockwise from b1. a1, a2, . . . , ap is a convex hull, as is b1, b2, . . . , bq. If (ai, bj) is such that moving from either ai or bj decreases y(i, j), there are no points above the (ai, bj) line.
The formal proof is quite involved and won’t be covered.
Median Finding
Given set of n numbers, define rank(x) as number of numbers in the set that are ≤ x.
Find the element of rank ⌊(n + 1)/2⌋ (lower median) and ⌈(n + 1)/2⌉ (upper median).
Clearly, sorting works in time Θ(n log n).
Can we do better?
[Figure: S partitioned around x into B (the k − 1 elements less than x) and C (the n − k elements greater than x).]
Select(S, i)
1  Pick x ∈ S    ▷ cleverly
2  Compute k = rank(x)
3  B = {y ∈ S | y < x}
4  C = {y ∈ S | y > x}
5  if k = i
6      return x
7  else if k > i
8      return Select(B, i)
9  else if k < i
10     return Select(C, i − k)
Picking x Cleverly
Need to pick x so rank(x) is not extreme.
[Figure: elements arranged in columns of 5, each column sorted; x is the median of the column medians, so a constant fraction of elements are larger than x and a constant fraction are smaller.]
Substituting the inductive hypothesis T(n) ≤ cn into the recurrence T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + an:

T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an    (2)
     ≤ cn/5 + c + 7nc/10 + 6c + an    (3)
     = cn + (−cn/10 + 7c + an)    (4)

If c ≥ 70c/n + 10a, we are done. This is true for n ≥ 140 and c ≥ 20a.
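A sketch of Select with the median-of-medians pivot (assumes distinct elements, as in the notes; names are illustrative):

def select(S, i):
    """Return the element of rank i (1-indexed) in list S, in O(n) time."""
    if len(S) <= 5:
        return sorted(S)[i - 1]
    # median of each column of 5
    medians = [sorted(S[j:j + 5])[len(S[j:j + 5]) // 2]
               for j in range(0, len(S), 5)]
    x = select(medians, (len(medians) + 1) // 2)   # median of medians
    B = [y for y in S if y < x]
    C = [y for y in S if y > x]
    k = len(B) + 1                                 # rank(x)
    if i == k:
        return x
    elif i < k:
        return select(B, i)
    else:
        return select(C, i - k)

print(select(list(range(100, 0, -1)), 50))   # 50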
Appendix 1
Example
[Figure: an example of the merge step with CH(A) = (a1, . . . , a4) and CH(B) = (b1, . . . , b4) separated by L.]
Lecture 3: Fast Fourier Transform
Given a polynomial A(x) = a0 + a1 x + · · · + a_{n−1} x^{n−1} with n coefficients, the degree of A is n − 1.
Operations on polynomials
There are three primary operations for polynomials.
2. Addition: Given two polynomials A(x) and B(x), compute C(x) = A(x) +
B(x) (∀x). This takes O(n) time using basic arithmetic, because ck = ak + bk .
Representations of polynomials
First, consider the different representations of polynomials, and the time necessary
to complete operations based on the representation.
There are 3 main representations to consider.
• A(x) = (x − r0 ) · (x − r1 ) · · · · · (x − rn−1 ) · c
• However, it is impossible to find exact roots with only basic arithmetic
operations and kth root operations. Furthermore, addition is extremely
hard with this representation, or even impossible. Multiplication simply
requires roots to be concatenated, and evaluation can be completed in
O(n).
3. Samples: (x0, y0), (x1, y1), . . . , (xn−1, yn−1) with A(xi) = yi (∀i) and each xi distinct. These samples uniquely determine a degree n − 1 polynomial A, by Lagrange interpolation and the Fundamental Theorem of Algebra. Addition and multiplication can be computed by adding and multiplying the yi terms, assuming that the xi's match. However, evaluation requires interpolation.
The runtimes for the operations under each representation are described in the table below (∞ marks an operation that is essentially impossible in that representation):

                  Coefficients   Roots   Samples
Evaluation        O(n)           O(n)    O(n²)
Addition          O(n)           ∞       O(n)
Multiplication    O(n²)          O(n)    O(n)
1. Divide A into its even- and odd-indexed coefficients, so that A(x) = Aeven(x²) + x · Aodd(x²).
2. Recursively conquer Aeven(y) for y ∈ X² and Aodd(y) for y ∈ X², where X² = {x² | x ∈ X}.
3. Combine: A(x) = Aeven(x²) + x · Aodd(x²) for each x ∈ X.
Roots of Unity
Collapsing sets can be constructed via square roots. Each of the following collapsing sets is computed by taking all square roots of the previous set.
1. {1}
2. {1, −1}
We can repeat this process and make our set larger and larger by finding more and more points on this circle. These points are called the nth roots of unity. Formally, the nth roots of unity are the n x's such that xⁿ = 1. These points are uniformly spaced around the unit circle in the complex plane (including 1). These points are of the form (cos θ, sin θ) = cos θ + i sin θ = e^{iθ} by Euler's Formula, for θ = 0, (1/n)τ, (2/n)τ, . . . , ((n−1)/n)τ (where τ = 2π).
The nth roots of unity, where n = 2^ℓ, form a collapsing set, because (e^{iθ})² = e^{i(2θ)} = e^{i(2θ mod τ)}. Therefore the nth roots of unity, when squared, collapse to the (n/2)nd roots of unity.
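Putting the divide step and the collapsing set together gives the FFT. A minimal recursive sketch (assumes n = len(a) is a power of 2; names are illustrative):

import cmath

def fft(a):
    """Evaluate the polynomial with coefficients a at the n-th roots of unity."""
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2])                         # A_even on the (n/2)-nd roots
    odd = fft(a[1::2])                          # A_odd  on the (n/2)-nd roots
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)    # e^{i k tau / n}
        out[k] = even[k] + w * odd[k]           # A(x) = A_even(x^2) + x A_odd(x^2)
        out[k + n // 2] = even[k] - w * odd[k]  # x_{k + n/2} = -x_k
    return out

print(fft([1, 0, 0, 0]))   # constant polynomial: all samples equal 1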
Now if j = k, pjk = Σ_{m=0}^{n−1} 1 = n. Otherwise it forms a geometric series:

pjk = Σ_{m=0}^{n−1} (e^{i(j−k)τ/n})^m = ((e^{iτ(j−k)/n})^n − 1) / (e^{iτ(j−k)/n} − 1) = 0

since the numerator is e^{iτ(j−k)} − 1 = 0.
This claim says that the Inverse Discrete Fourier Transform is equivalent to the
Discrete Fourier Transform, but changing xk from eikτ /n to its complex conjugate
e−ikτ /n , and dividing the resulting vector by n. The algorithm for IFFT is analogous
to that for FFT, and the result is an O(n lg n) algorithm for IDFT.
Applications
Fourier (frequency) space has many applications. The polynomial A∗ = FFT(A) is complex, and the magnitude |a∗k| represents the amplitude of the frequency-k signal, while arg(a∗k) (the angle of the 2D vector) represents the phase shift of that signal. For example, this perspective is particularly useful for audio processing, as used by Adobe Audition, Audacity, etc.
Lecture 4: van Emde Boas Trees
• Insert, Successor
• Delete
• Space
Goal
We want to maintain n elements in the range {0, 1, 2, . . . , u − 1} and perform Insert,
Delete and Successor operations in O(log log u) time.
• If u = n^{O(1)} or u = n^{(log n)^{O(1)}}, then we have O(log log n) time operations
• Recurrences:
  – T(log u) = T((log u)/2) + O(1)
  – T(u) = T(√u) + O(1)
Improvements
We will develop the van Emde Boas data structure by a series of improvements on
a very simple data structure.
Bit Vector
We maintain a vector V of size u such that V [ x ] = 1 if and only if x is in the set.
Now, inserts and deletes can be performed by just flipping the corresponding bit
in the vector. However, successor/predecessor requires us to traverse through the
vector to find the next 1-bit.
• Insert/Delete: O(1)
• Successor/Predecessor: O(u)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1
Figure 1: Bit vector for u = 16. The current set is {1, 9, 10, 15}.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1
Figure 2: Bit vector (u = 16) split into √16 = 4 clusters of size 4.
• Insert:
  – Set the bit for low(x) in cluster high(x)    O(1)
  – Mark cluster high(x) as non-empty in the summary vector    O(1)
• Successor:
  – Look within cluster high(x)    O(√u)
  – Else, find next non-empty cluster i in the summary    O(√u)
  – Find minimum entry j in that cluster    O(√u)
  – Return index(i, j)    Total = O(√u)
Recurse
The three operations in Successor are themselves Successor calls on vectors of size √u. We can use recursion to speed things up:
• V.summary is a size-√u van Emde Boas structure
• V.cluster[i] is a size-√u van Emde Boas structure (∀ 0 ≤ i < √u)
INSERT(V, x)
1 Insert(V.cluster[high(x)], low(x))
2 Insert(V.summary, high(x))
SUCCESSOR(V, x)
1 i = high(x)
2 j = Successor(V.cluster[i], low(x))
3 if j == ∞
4     i = Successor(V.summary, i)
5     j = Successor(V.cluster[i], −∞)
6 return index(i, j)
T(u) = 3T(√u) + O(1)

Writing T'(log u) = T(u), this becomes

T'(log u) = 3T'((log u)/2) + O(1)
⟹ T(u) = T'(log u) = O((log u)^{log 3}) ≈ O((log u)^{1.585})

To obtain the O(log log u) running time, we need to reduce the number of recursions to one.
By also storing the minimum and maximum of each structure, Successor needs only one recursive call:

SUCCESSOR(V, x)
1 i = high(x)
2 if low(x) < V.cluster[i].max
3     j = Successor(V.cluster[i], low(x))
4 else
5     i = Successor(V.summary, high(x))
6     j = V.cluster[i].min
7 return index(i, j)

T(u) = T(√u) + O(1)
⟹ T(u) = O(log log u)
INSERT(V, x)
1 if V.min == None
2     V.min = V.max = x    ▷ O(1) time
3     return
4 if x < V.min
5     swap(x ↔ V.min)
6 if x > V.max
7     V.max = x
8 if V.cluster[high(x)].min == None
9     Insert(V.summary, high(x))    ▷ First Call
10 Insert(V.cluster[high(x)], low(x))    ▷ Second Call
If the first call is executed, the second call only takes O(1) time. So

T(u) = T(√u) + O(1)
⟹ T(u) = O(log log u)
DELETE(V, x)
1 if x == V.min    ▷ Find new min
2     i = V.summary.min
3     if i == None
4         V.min = V.max = None    ▷ O(1) time
5         return
6     V.min = x = index(i, V.cluster[i].min)    ▷ Unstore new min
7 Delete(V.cluster[high(x)], low(x))    ▷ First Call
8 if V.cluster[high(x)].min == None
9     Delete(V.summary, high(x))    ▷ Second Call
10 ▷ Now we update V.max
11 if x == V.max
12     if V.summary.max == None
13         V.max = V.min
14     else
15         i = V.summary.max
16         V.max = index(i, V.cluster[i].max)
If the second call is executed, the first call only takes O(1) time. So

T(u) = T(√u) + O(1)
⟹ T(u) = O(log log u)
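A compact Python sketch of the structure (clusters allocated lazily in a dict, as the space discussion below suggests; Delete omitted for brevity; all names are illustrative):

class VEB:
    """van Emde Boas sketch over {0, ..., u-1}, u a power of two.
    min/max are stored locally, so insert and successor each make
    only one recursive call: O(log log u) time."""
    def __init__(self, u):
        self.u = u
        self.min = self.max = None
        if u > 2:
            self.lo_bits = (u.bit_length() - 1) // 2   # floor(lg u / 2)
            self.summary = None
            self.cluster = {}                          # lazy clusters

    def _high(self, x): return x >> self.lo_bits
    def _low(self, x):  return x & ((1 << self.lo_bits) - 1)
    def _index(self, i, j): return (i << self.lo_bits) | j

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x                    # O(1), no recursion
            return
        if x < self.min:
            self.min, x = x, self.min                  # swap(x, V.min)
        if x > self.max:
            self.max = x
        if self.u > 2:
            h, l = self._high(x), self._low(x)
            if h not in self.cluster:
                self.cluster[h] = VEB(1 << self.lo_bits)
            if self.cluster[h].min is None:            # cluster empty: update summary
                if self.summary is None:
                    self.summary = VEB(self.u >> self.lo_bits)
                self.summary.insert(h)                 # this call recurses...
            self.cluster[h].insert(l)                  # ...then this one is O(1)

    def successor(self, x):
        if self.min is not None and x < self.min:
            return self.min
        if self.u <= 2:
            return 1 if x < 1 and self.max == 1 else None
        h = self._high(x)
        c = self.cluster.get(h)
        if c is not None and c.max is not None and self._low(x) < c.max:
            return self._index(h, c.successor(self._low(x)))
        i = self.summary.successor(h) if self.summary else None
        if i is None:
            return None
        return self._index(i, self.cluster[i].min)

v = VEB(16)
for x in (1, 9, 10, 15):
    v.insert(x)
print(v.successor(2))   # 9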
Space Improvements
We can improve from Θ(u) to O(n log log u).
• Each insert may create a new structure Θ(log log u) times (each empty insert)
  – Can actually happen [Vladimir Čunát]
• Charge the pointer to a structure (and the associated hash table entry) to the structure itself
Indirection
We can further reduce to O(n) space.
• Store vEB structures with n = O(log log u) elements using a BST or even an array
⟹ O(n / log log u · log log u) = O(n) space for large structures
Lecture 5: Amortization
– aggregate method
– accounting method
– charging method
– potential method
– table doubling
– binary counter
– 2-3 tree and 2-5 tree
Table doubling
(Recall from 6.006.) We want to store n elements in a table of size m = Θ(n). One idea is to double m whenever n becomes larger than m (due to insertions). The cost to double a table of size m is clearly Θ(m) = Θ(n), which is also the worst-case cost of an insertion.
But what is the total cost of n insertions? It is at most

2⁰ + 2¹ + 2² + · · · + 2^{⌈lg n⌉} = Θ(n).

In this case, we say each insertion has Θ(n)/n = Θ(1) amortized cost.
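A tiny experiment illustrating this bound, under the cost model above (1 per insert, m to double a table of size m; illustrative):

def insert_cost_total(n):
    """Total cost of n insertions with table doubling."""
    size, used, total = 1, 0, 0
    for _ in range(n):
        if used == size:        # table full: double it
            total += size       # pay Theta(m) to copy the m items
            size *= 2
        total += 1              # the insertion itself
        used += 1
    return total

for n in (10, 1000, 10**6):
    print(n, insert_cost_total(n) / n)   # per-insert cost stays bounded (~3)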
Aggregate Method
The method we used in the above analysis is the aggregate method: just add up the
cost of all the operations and then divide by the number of operations.
For example, we can say a 2-3 tree achieves O(1) amortized cost per create, O(lg n∗ )
amortized cost per insert, and 0 amortized cost per delete, where n∗ is the maximum
size of the 2-3 tree during the entire sequence of operations. The reason we can claim
this is that for any sequence of operations, suppose there are c creations, i insertions
and d ≤ i deletions (cannot delete from an empty tree), the total amortized cost is asymptotically the same as the total actual cost.
Later, we will tighten the amortized cost per insert to O(lg n) where n is the
current size.
Accounting Method
This method allows an operation to store credit into a bank for future use, if its
assigned amortized cost > its actual cost; it also allows an operation to pay for its
extra actual cost using existing credit, if its assigned amortized cost < its actual cost.
Table doubling
For example, in table doubling:
– if an insertion does not trigger table doubling, store a coin representing c = O(1) work for future use.
– if an insertion does trigger table doubling, there must be n/2 elements that are
inserted after the previous table doubling, whose coins have not been consumed.
Use up these n/2 coins to pay for the O(n) table doubling. See figure below.
– amortized cost for table doubling: O(n) − c · n/2 ≤ 0 for large enough c.
2-3 trees
Now let’s try the accounting method on 2-3 trees. Our goal is to show that insert has
O(lg n) amortized cost and delete has 0 amortized cost. Let’s try a natural approach:
save a O(lg n) coin for inserting an element, and use this coin when we delete this
element later. However, we will run into a problem: by the time we delete the element, the size of the tree may have grown to n' > n, and the coin we saved is not enough to pay for the lg n' actual cost of that delete operation! This problem can be solved using the charging method in the next section.
Charging Method
The charging method allows operations to charge cost retroactively to past operations.
Potential Method
This method defines a potential function Φ that maps a data structure (DS) configu
ration to a value. This function Φ is equivalent to the total unused credits stored up
by all past operations (the bank account balance). Now

amortized cost = actual cost + ΔΦ

and, summing over the whole sequence of operations,

total amortized cost = total actual cost + Φ(final DS) − Φ(initial DS).
In order for the amortized bound to hold, Φ should never go below Φ(initial DS)
at any point. If Φ(initial DS) = 0, which is usually the case, then Φ should never go
negative (intuitively, we cannot ”owe the bank”).
Binary counter
Our first example of potential method is incrementing a binary counter. E.g.,
0011010111
increment ↓
0011011000
Cost of increment is Θ(1 + #1), where #1 represents the number of trailing 1 bits.
So the intuition is that 1 bits are bad.
Define Φ = c · #1. An increment turns t trailing 1 bits into 0s and one 0 into a 1, so ΔΦ = c(1 − t), and the amortized cost is Θ(1 + t) + c(1 − t) = O(1) for large enough c.
Φ(initial DS) = 0 if the counter starts at 000 · · · 0. This is necessary for the above
amortized analysis. Otherwise, Φ may become smaller than Φ(initial DS).
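A sketch of the counter with cost tracking, matching the Θ(1 + #trailing 1s) cost model (illustrative):

def increment(bits, cost):
    """Increment a little-endian binary counter; cost[0] accumulates actual cost."""
    i = 0
    while i < len(bits) and bits[i] == 1:   # clear the trailing 1s
        bits[i] = 0
        i += 1
        cost[0] += 1
    if i == len(bits):
        bits.append(0)
    bits[i] = 1                             # set the lowest 0 bit
    cost[0] += 1

bits, cost = [], [0]
for _ in range(1000):
    increment(bits, cost)
print(cost[0] / 1000)    # average actual cost < 2, matching O(1) amortized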
If we consider both insertion and deletion in 2-3 trees, can we claim both O(1) splits
for insert, and O(1) merges for delete? The answer is no, because a split creates two
2-nodes, which are bad for merge. In the worst case, they may be merged by the next delete, and then need to be split again on the next insert, and so on.
How do we solve this problem? We need to prevent split and merge from creating 'bad' nodes.
[Figure: an overfull node with 5 keys and 6 children splits into two nodes with 3 children each; the middle key e is promoted to the parent.]
Deletion from a 2-node causes it to merge with another 2-node to form a 3-node.
[Figure: a key is demoted from the parent, and the underfull node with 1 child left merges with a 2-child sibling to form a node with 3 children.]
Lecture 6: Randomized Algorithms
• Quicksort
• Algorithm that generates a random number r ∈ {1, ..., R} and makes decisions
based on r’s value.
Randomized algorithms can be broadly classified into two types: Monte Carlo and Las Vegas.
Monte Carlo                                Las Vegas
runs in polynomial time                    always runs in expected polynomial time
output is correct with high probability    output always correct
Matrix Product
C =A×B
Simple algorithm: O(n3 ) multiplications.
Strassen: multiply two 2 × 2 matrices in 7 multiplications: O(n^{log2 7}) = O(n^{2.81})
Coppersmith-Winograd: O(n^{2.376})
We want a fast randomized checker for A × B = C such that:
• if A × B = C, then Pr[output = YES] = 1.
• if A × B ≠ C, then Pr[output = YES] ≤ 1/2.
We will assume entries in the matrices ∈ {0, 1} and also that the arithmetic is mod 2.
Freivalds' Algorithm
Choose a random binary vector r[1 . . . n] such that Pr[ri = 1] = 1/2 independently for i = 1, . . . , n. The algorithm will output 'YES' if A(Br) = Cr and 'NO' otherwise.
Observation
The algorithm will take O(n2 ) time, since there are 3 matrix multiplications Br,
A(Br) and Cr of a n × n matrix by a n × 1 matrix.
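A one-round sketch of the check mod 2 (pure-Python lists; names are illustrative):

import random

def freivalds(A, B, C, n):
    """One round of Freivalds' check over GF(2): compare A(Br) with Cr in O(n^2)."""
    r = [random.randint(0, 1) for _ in range(n)]
    Br = [sum(B[i][j] * r[j] for j in range(n)) % 2 for i in range(n)]
    ABr = [sum(A[i][j] * Br[j] for j in range(n)) % 2 for i in range(n)]
    Cr = [sum(C[i][j] * r[j] for j in range(n)) % 2 for i in range(n)]
    return ABr == Cr        # output 'YES' iff equal

Repeating k independent rounds drops the error probability to at most 2^{-k}.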
Analysis of Correctness if AB ≠ C
Claim. If AB ≠ C, then Pr[ABr ≠ Cr] ≥ 1/2.
Let D = AB − C. Our hypothesis is thus that D ≠ 0. Our goal is to show that there are many r such that Dr ≠ 0; specifically, Pr[Dr ≠ 0] ≥ 1/2 for randomly chosen r.
D = AB − C ≠ 0 ⟹ ∃ i, j s.t. dij ≠ 0. Fix the vector v which is 0 in all coordinates except for vj = 1. Then (Dv)i = dij ≠ 0, implying Dv ≠ 0. Take any r that can be chosen by our algorithm, and look at the case where Dr = 0. Let

r' = r + v

Since v is 0 everywhere except vj, r' is the same as r except r'j = (rj + vj) mod 2. Thus, Dr' = D(r + v) = 0 + Dv ≠ 0. We see that the map r ↦ r' is a 1-to-1 correspondence (if r' = r + v = r'' + v, then r = r''), so every r with Dr = 0 is paired with a distinct r' with Dr' ≠ 0. This implies that Pr[Dr ≠ 0] ≥ 1/2.
Quicksort
A divide and conquer algorithm, but the work is mostly in the divide step (partition) rather than the combine step. Sorts "in place" like insertion sort and unlike mergesort (which requires O(n) auxiliary space).
Different variants:
Steps of quicksort:
• Combine: trivial
Basic Quicksort
Pivot around x = A[1] or A[n] (first or last element)
Randomized Quicksort
x is chosen at random from array A (at each recursion, a random choice is made).
Expected time is O(n log n) for all input arrays A. See CLRS p.181-184 for the
analysis of this algorithm; we will analyze a variant of this.
"Paranoid" Quicksort

Repeat:
    choose pivot to be a random element of A
    perform Partition
Until the resulting partition is such that |L| ≤ (3/4)|A| and |G| ≤ (3/4)|A|
Recurse on L and G
[Figure: pivots lying in the middle half of the sorted order (between ranks n/4 and 3n/4) yield a good partition.]
• the number of iterations to get a good call. Denote as c · n the cost of the
partition step
Expectations
Since a random pivot is good with probability ≥ 1/2,

E(#iterations) ≤ 2

so the expected cost of the pivoting loop at size n is at most 2cn, and

T(n) ≤ T(n/4) + T(3n/4) + 2cn

[Figure: recursion tree with cost 2cn at the root, children (2cn)/4 and 3(2cn)/4, down to O(1) leaves.]
We see in the figure that the height of the tree can be at most log_{4/3}(2cn) no matter what branch we follow to the bottom. At each level, we do a total of 2cn work. Thus, the expected runtime is T(n) = Θ(n log n).
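A sketch of "paranoid" quicksort (not in-place, for clarity; names are illustrative):

import random

def paranoid_quicksort(A):
    """Quicksort that re-picks a random pivot until the partition is balanced."""
    if len(A) <= 1:
        return A
    while True:
        x = random.choice(A)
        L = [y for y in A if y < x]
        G = [y for y in A if y > x]
        if len(L) <= 3 * len(A) // 4 and len(G) <= 3 * len(A) // 4:
            break                      # good call: both sides <= 3/4 |A|
    E = [y for y in A if y == x]
    return paranoid_quicksort(L) + E + paranoid_quicksort(G)

print(paranoid_quicksort([5, 3, 8, 1, 9, 2, 7]))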
Lecture 8: Hashing
Lecture Overview
1. Review: dictionaries and hashing with chaining
2. Universal hashing
3. Perfect hashing
Review
Dictionary Problem
A dictionary is an Abstract Data Type (ADT) that maintains a set of items. Each
item has a key. The dictionary supports the following operations:
We assume that items have distinct keys (or that inserting new ones clobbers old
ones).
This problem is easier than predecessor/successor problems solved in previous
lecture (by van Emde Boas trees, or by AVL/2-3 trees/skip lists).
Etymology
The English 'hash' (1650s) means "cut into small pieces", which comes from the French 'hacher' which means "chop up", which comes from the Old French 'hache' which means "axe" (cf. English 'hatchet'). Alternatively, perhaps they come from Vulcan 'la'ash', which means "axe". (R.I.P. Leonard Nimoy.)
Universal Hashing
The idea of universal hashing is as follows:
• now we just assume h is random, and make no assumption about input keys.
(like Randomized Quicksort)
Proof: Consider keys k1, k2, . . . , kn. Let Ii,j = 1 if h(ki) = h(kj), and 0 otherwise. Then we have

E[Ii,j] = Pr{Ii,j = 1} = Pr{h(ki) = h(kj)} ≤ 1/m for any j ≠ i    (1)

E[# keys hashing to the same slot as ki] = E[Σ_{j=1}^{n} Ii,j]
    = Σ_{j=1}^{n} E[Ii,j]    (linearity of expectation)
    = Σ_{j≠i} E[Ii,j] + E[Ii,i]
    ≤ n/m + 1    (2)    □
From the above theorem, we know that Insert, Delete, and Search all take O(1+α)
expected time. Here we give some examples of universal hash functions.
• m is a prime
• u = mr where r is an integer
In real cases, we can always round up m and u to satisfy the above assumptions. Now let's view keys in base m: k = ⟨k0, k1, . . . , kr−1⟩. For a key a = ⟨a0, a1, a2, . . . , ar−1⟩, define

ha(k) = a · k mod m    (dot product)
      = Σ_{i=0}^{r−1} ai ki mod m    (3)
Proof: Take any two keys k ≠ k'. They must differ in some digit; say kd ≠ k'd. Define not d = {0, 1, . . . , r − 1} \ {d}. Now we have

Pr_a{ha(k) = ha(k')} = Pr_a{Σ_{i=0}^{r−1} ai ki = Σ_{i=0}^{r−1} ai k'i (mod m)}
    = Pr_a{Σ_{i≠d} ai ki + ad kd = Σ_{i≠d} ai k'i + ad k'd (mod m)}
    = Pr_a{Σ_{i≠d} ai (ki − k'i) + ad (kd − k'd) = 0 (mod m)}
    = Pr_a{ad = −(kd − k'd)^{−1} Σ_{i≠d} ai (ki − k'i) (mod m)}    (4)
    = 1/m

since ad is uniform over {0, 1, . . . , m − 1} and independent of the other ai's, and kd − k'd is invertible because m is prime.
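A sketch of the dot-product family (assumes m prime and keys presented as r digits base m; names are illustrative):

import random

def make_universal_hash(m, r):
    """Dot-product universal hash h_a(k) = a . k mod m, with a fixed random key a."""
    a = [random.randrange(m) for _ in range(r)]
    def h(key_digits):
        return sum(ai * ki for ai, ki in zip(a, key_digits)) % m
    return h

m, r = 7, 3                       # universe u = m^r = 343
h = make_universal_hash(m, r)
def digits(k):                    # write k in base m with r digits
    return [(k // m**i) % m for i in range(r)]
print(h(digits(42)), h(digits(300)))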
Another universal hash family from CLRS: Choose a prime p ≥ u (once). Define hab(k) = [(ak + b) mod p] mod m. Let H = {hab | a ∈ {1, . . . , p − 1}, b ∈ {0, . . . , p − 1}}.
Perfect Hashing
Static dictionary problem: Given n keys to store in table, only need to support
search(k). No insertion or deletion will happen.
Step 1: Pick h1 : {0, 1, . . . , u − 1} → {0, 1, . . . , m − 1} from a universal hash family, for m = Θ(n) (e.g., a nearby prime), and hash all items with chaining.
Step 2: For each slot j ∈ {0, 1, . . . , m − 1}, let lj be the number of items in slot j: lj = |{i | h1(ki) = j}|. Pick h2,j : {0, 1, . . . , u − 1} → {0, 1, . . . , mj − 1} from a universal hash family, for lj² ≤ mj ≤ O(lj²) (e.g., a nearby prime). Replace the chain in slot j with hashing-with-chaining using h2,j.
The space complexity is O(n + Σ_{j=0}^{m−1} lj²). In order to reduce it to O(n), we need to add two more steps:
Step 1.5: If Σ_{j=0}^{m−1} lj² > cn, where c is a chosen constant, then redo Step 1.
Step 2.5: While h2,j(ki) = h2,j(ki') for any i ≠ i', repick h2,j and rehash those lj items.
The above two steps guarantee that there are no collisions at second level, and
the space complexity is O(n). As a result, search time is O(1). Now let’s look at the
build time of the algorithm. Both Step 1 and Step 2 are O(n). How about Step 1.5
and Step 2.5?
For Step 2.5,

Pr_{h2,j}{h2,j(ki) = h2,j(ki') for some i ≠ i'} ≤ Σ_{i≠i'} Pr_{h2,j}{h2,j(ki) = h2,j(ki')}    (union bound)
    ≤ (lj choose 2) · 1/lj²
    < 1/2    (5)
As a result, each trial is like a coin flip. If the outcome is “tail”, we move to the
next step. By Lecture 7, we have E[#trials] ≤ 2 and #trials = O(log n) w.h.p. By a
Chernoff bound, lj = O(log n) w.h.p., so each trial takes O(log n) time. Because we
have to do this for each j, the total time complexity is O(log n) · O(log n) · O(n) =
O(n log2 n) w.h.p.
For Step 1.5, we define Ii,i' = 1 if h(ki) = h(ki'), and 0 otherwise. Then we have

E[Σ_{j=0}^{m−1} lj²] = E[Σ_{i=1}^{n} Σ_{i'=1}^{n} Ii,i']
    = Σ_{i=1}^{n} Σ_{i'=1}^{n} E[Ii,i']    (linearity of expectation)    (6)
    ≤ n + 2 (n choose 2) · 1/m
    = O(n)    because m = Θ(n)
Lecture 9: Augmentation
• order-statistics trees
• range trees
The main idea is to modify “off-the-shelf” common data structures to store (and
update) additional information.
• AVL trees: after rotating two nodes, first update the new bottom node and
then update the new top node
• 2-3 trees: after splitting a node, update the two new nodes.
• rank(x): find x's index in the sorted order, i.e., # of elements < x
• select(i): find the element of rank i
We can implement the above ADT using easy tree augmentation on AVL trees (or 2-3 trees) to store subtree size: f(subtree) = # of nodes in it. Then we also have x.size = 1 + Σ c.size over c in x.children.
As a comparison, we cannot store the rank for each node. In that case, insert(−∞)
will change the ranks for all nodes.
select(i):
• x = root
• rank = x.left.size + 1
• if i = rank: return x
• if i < rank: recurse on x.left with the same i
• if i > rank: recurse on x.right with i − rank
rank(x) is implemented similarly, accumulating the left-subtree sizes (plus 1) along the search path; both take O(lg n).
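A sketch of select(i) on a size-augmented BST (static tree for brevity; names are illustrative):

class Node:
    """BST node augmented with subtree size (easy tree augmentation)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(x, i):
    """Return the node of rank i (1-indexed) in x's subtree."""
    rank = (x.left.size if x.left else 0) + 1
    if i == rank:
        return x
    if i < rank:
        return select(x.left, i)
    return select(x.right, i - rank)   # ranks in the right subtree shift by rank

t = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(select(t, 5).key)   # 5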
We store all keys in the leaves. Non-leaf nodes do not store keys; instead, they
store the min and max key of the subtree (via easy tree augmentation).
Then the original top-down search(x) (without being given y) can be implemented
as follows:
• start from the root, look at min & max of each child ci
• if ci . max ≤ x ≤ ci+1 . min, return ci . max (as predecessor) or ci+1 . min (as
successor)
• v = v.parent
Analysis. We start at the leaf level, and go up by 1 level in each iteration. At step
i, level link at height i skips roughly ci keys (ranks), where c ∈ [2, 3]. Therefore, if
|rank(y) − rank(x)| = k, we will reach the subtree containing x in O(lg k) steps, and
the top-down search that follows is also O(lg k).
1D case
We start with the simple case of 1D points, i.e., all xi ’s and a and b are scalars.
Then, we can simply use a sorted array. To do range-query(a, b), we simply
perform two binary searches for a and b, respectively, and then return all the points
in between (say there are k of them). The complexity is O(lg n + k).
Sorted arrays are inefficient for insertion and deletion. For a dynamic data struc
ture that supports range queries, we can use finger search tree from the previous
section. Finger search trees support efficient insertion and deletion. To do range-
query(a, b), we first search for a, and then keep doing finger search to the right by 1
until we exceed b. Each finger search by 1 takes O(1), so the total complexity is also
O(lg n + k).
However, neither of the above approaches generalizes to high dimensions. That’s
why we now introduce range trees.
1D range trees
A 1D range tree is a complete binary search tree (for dynamic, use an AVL tree).
Range-query(a, b) can be implemented as follows:
• search(a)
• search(b)
• return the nodes and subtrees “in between”. There are O(lg n) nodes and
O(lg n) subtrees “in between”.
Figure 1: 1D range tree. Range-query(a, b) returns all hollow nodes and shaded
subtrees. Image from Wikipedia http://en.wikipedia.org/wiki/Range_tree
2D range trees
A 2D range tree consists of a primary 1D range tree and many secondary 1D range
trees. The primary range stores all points, keyed on the first coordinate. Every node
v in the primary range tree stores all points in v’s subtree in a secondary range tree,
keyed on the second coordinate.
Range-query(a, b) can be implemented as follows:
• use the primary range tree to find all points with the correct range on the first
coordinate. Only implicitly represent the answer, so this takes O(lg n).
• for the O(lg n) nodes, manually check whether their second coordinates lie in the correct range.
• for the O(lg n) subtrees, use their secondary range tree to find all points with
the correct range on the second coordinate.
Analysis. O(lg2 n) to implicitly represent the answer, because we will find O(lg2 n)
nodes and subtrees in secondary range trees. O(lg2 n + k) to output all k answers.
O(lg2 n) to report k via subtree size augmentation.
Space complexity is O(n lg n). The primary subtree is O(n). Each point is dupli
cated up to O(lg n) times in secondary subtrees, one per ancestor.
More sophisticated variants achieve O(n lg n / lg lg n) space complexity.
Lecture 10: Dynamic Programming
DP notions
1. Characterize the structure of an optimal solution
Strategy
L(i, j): length of longest palindromic subsequence of X[i · · · j] for i ≤ j.
1 def L(i, j) :
2 if i == j: return 1
3 if X[i] == X[j]:
4 if i + 1 == j: return 2
5 else : return 2 + L(i + 1, j - 1)
6 else :
7 return max(L(i + 1, j), L(i, j - 1))
Analysis
As written, this program can run in exponential time: suppose all symbols X[i] are distinct.
T(n) = running time on input of length n

T(n) = 1 if n = 1; T(n) = 2T(n − 1) if n > 1

⟹ T(n) = 2^{n−1}
Subproblems
But there are (n choose 2) distinct subproblems: each is an (i, j) pair with i < j. By solving each subproblem only once, the running time reduces to O(n²).
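A memoized sketch of the same recursion; functools.lru_cache does the solve-each-subproblem-once bookkeeping (illustrative):

from functools import lru_cache

def lps_length(X):
    n = len(X)

    @lru_cache(maxsize=None)
    def L(i, j):                       # same recurrence as above
        if i == j:
            return 1
        if X[i] == X[j]:
            return 2 if i + 1 == j else 2 + L(i + 1, j - 1)
        return max(L(i + 1, j), L(i, j - 1))

    return L(0, n - 1) if n else 0

print(lps_length("character"))         # "carac" -> 5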
Optimal Binary Search Trees
Given keys K1 < K2 < · · · < Kn with weights W1, . . . , Wn, find the BST T that minimizes the cost

Σ_{i=1}^{n} Wi · (depth_T(Ki) + 1)
Enumeration
Exponentially many trees
[Figure: for n = 2 there are two BSTs, with costs W1 + 2W2 and 2W1 + W2; for n = 3 there are five BSTs, with costs 3W1 + 2W2 + W3, 2W1 + 3W2 + W3, 2W1 + W2 + 2W3, W1 + 3W2 + 2W3, and W1 + 2W2 + 3W3.]
Strategy
W (i, j) = Wi + Wi+1 + · · · + Wj
e(i, j) = cost of optimal BST on Ki , Ki+1 , · · · , Kj
Want e(1, n)
Greedy solution?
Pick Kr in some greedy fashion, e.g., Wr is maximum. Greedy doesn't work; see the example at the end of the notes.
[Figure: a BST rooted at Kr: the left subtree is an optimal BST on Ki, . . . , Kr−1 and the right subtree an optimal BST on Kr+1, . . . , Kj.] Trying all possible roots r gives the recurrence

e(i, j) = min_{i ≤ r ≤ j} ( e(i, r − 1) + e(r + 1, j) + W(i, j) )
Question
Can the first player always win?
Try: 4 42 39 17 25 6
Strategy
V1, V2, · · · , Vn−1, Vn
1. Before the game (n even), compare the total value of the odd-indexed coins with that of the even-indexed coins, and choose the larger subset.
2. During the game only pick from the chosen subset (you will always be able to!)
How to maximize the amount of money won assuming you move first?
Optimal Strategy
V (i, j): max value we can definitely win if it is our turn and only coins Vi , · · · , Vj
remain.
V (i, i) : just pick i.
V (i, i + 1): pick the maximum of the two.
V (i, i + 2), V (i, i + 3), · · ·
Solution
Suppose we pick Vi (the case for Vj is symmetric). The opponent then moves in the V(i + 1, j) subproblem, picking so as to hurt us, so we are guaranteed only min{V(i + 1, j − 1), V(i + 2, j)}, where V(i + 1, j − 1) corresponds to the opponent picking Vj and V(i + 2, j) corresponds to the opponent picking Vi+1.
We have

V(i, j) = max{ Vi + min{V(i + 1, j − 1), V(i + 2, j)},  Vj + min{V(i + 1, j − 1), V(i, j − 2)} }
Complexity?
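There are O(n²) subproblems, each solved in O(1), so O(n²) total. A memoized sketch of the recurrence (names are illustrative):

from functools import lru_cache

def coin_game_value(V):
    """V(i, j): max total we can guarantee when coins V[i..j] remain and
    it is our turn; the opponent also plays optimally."""
    @lru_cache(maxsize=None)
    def val(i, j):
        if i > j:
            return 0
        if i == j:
            return V[i]                          # just pick it
        pick_i = V[i] + min(val(i + 2, j), val(i + 1, j - 1))
        pick_j = V[j] + min(val(i + 1, j - 1), val(i, j - 2))
        return max(pick_i, pick_j)
    return val(0, len(V) - 1)

coins = [4, 42, 39, 17, 25, 6]
print(coin_game_value(coins), sum(coins))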
[Figure 1: a BST on keys with weights 1, 10, 8, 9; cost = 1 × 2 + 10 × 1 + 8 × 2 + 9 × 3 = 55.]

[Figure 2: another BST on the same keys; cost = 1 × 3 + 10 × 2 + 8 × 1 + 9 × 2 = 49.]
Lecture 11: All-Pairs Shortest Paths
Introduction
Different types of algorithms can be used to solve the all-pairs shortest paths problem:
• Dynamic programming
• Matrix multiplication
• Floyd-Warshall algorithm
• Johnson’s algorithm
• Difference constraints
• find δ(s, v), equal to the shortest-path weight s → v, ∀v ∈ V (or −∞ if there is a negative-weight cycle along the way, or ∞ if there is no path)
All of the above results are the best known. We achieve a O(E + V lg V ) bound
on Dijkstra’s algorithm using Fibonacci heaps.
These results (apart from the third) are also best known — don’t know how to
beat |V |× Dijkstra
3. Recurrence:

d_uv^(m) = min(d_ux^(m−1) + w(x, v) for x ∈ V)

d_uv^(0) = 0 if u = v, ∞ otherwise
5. Original problem:
If the graph contains no negative-weight cycles (by Bellman-Ford analysis), then the shortest path is simple ⟹ δ(u, v) = d_uv^(n−1) = d_uv^(n) = · · ·
Time complexity
In this Dynamic Program, we have O(V 3 ) total sub-problems.
Each sub-problem takes O(V ) time to solve, since we need to consider V possible
choices. This gives a total runtime complexity of O(V 4 ).
Note that this is no better than |V |× Bellman-Ford
1 for m = 1 to n by 1
2 for u in V
3 for v in V
4 for x in V
5 if duv > dux + dxv
6 duv = dux + dxv
In the above pseudocode, we omit superscripts because more relaxation can never
hurt.
Note that we can change our relaxation step to d_uv^(m) = min(d_ux^(⌈m/2⌉) + d_xv^(⌈m/2⌉) for x ∈ V). This change would produce an overall running time of O(n³ lg n). (Student suggestion.)
Matrix multiplication
Recall the task of standard matrix multiplication:
Given n × n matrices A and B, compute C = A · B, such that cij = Σ_{k=1}^{n} aik · bkj.
With the above definitions, we see that D^(m) can be expressed as D^(m−1) ⊛ W, where ⊛ is matrix multiplication with (+, ·) replaced by (min, +). In other words, D^(m) is the ⊛-product of W with itself m times.
We can’t use Strassen, etc. since our new multiplication and addition operations
don’t support negation.
3. Recurrence:

c_uv^(k) = min(c_uv^(k−1), c_uk^(k−1) + c_kv^(k−1))

c_uv^(0) = w(u, v)
Time complexity
This Dynamic Program contains O(V 3 ) problems as well. However, in this case, it
takes only O(1) time to solve each sub-problem, which means that the total runtime
of this algorithm is O(V 3 ).
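A direct sketch of this dynamic program (Floyd-Warshall; adjacency-matrix input; names are illustrative):

def floyd_warshall(w):
    """w is an n x n matrix of edge weights (float('inf') if absent,
    0 on the diagonal). Returns the matrix of shortest-path distances.
    O(V^3): one O(1) relaxation per (k, u, v) triple."""
    n = len(w)
    d = [row[:] for row in w]              # c^(0)_uv = w(u, v)
    for k in range(n):
        for u in range(n):
            for v in range(n):
                if d[u][k] + d[k][v] < d[u][v]:
                    d[u][v] = d[u][k] + d[k][v]
    return d

INF = float("inf")
W = [[0, 3, INF], [INF, 0, -1], [2, INF, 0]]
print(floyd_warshall(W))   # [[0, 3, 2], [1, 0, -1], [2, 5, 0]]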
Johnson’s algorithm
1. Find a function h : V → R such that wh(u, v) = w(u, v) + h(u) − h(v) ≥ 0 for all u, v ∈ V, or determine that a negative-weight cycle exists.
2. Run Dijkstra's algorithm from every vertex on the reweighted graph (V, E, wh).
3. Recover the original distances: δ(u, v) = δh(u, v) − h(u) + h(v).
• For any path p from u = v0 to v = vk, wh(p) = Σ_{i=1}^{k} (w(vi−1, vi) + h(vi−1) − h(vi)) = w(p) + h(u) − h(v), since the h terms telescope.
• Hence all u → v paths change in weight by the same offset h(u) − h(v), which implies that the shortest path is preserved (but offset).
How to find h?
We know that
wh (u, v) = w(u, v) + h(u) − h(v) ≥ 0
This is equivalent to,
h(v) − h(u) ≤ w(u, v)
for all (u, v) ∈ E. This is called a system of difference constraints. Note that if the graph has a negative-weight cycle, no such h exists: summing the constraints around that cycle would give 0 ≤ w(cycle) < 0, a contradiction.
Proof. Add a new vertex s to G, and add edges (s, v) of weight 0 for all v ∈ V .
• Clearly, these new edges do not introduce any new negative weight cycles to the
graph
• Adding these new edges ensures that there now exists at least one path from s
to v. This implies that δ(s, v) is finite for all v ∈ V
• We now claim that h(v) = δ(s, v). This is obvious from the triangle inequality:
δ(s, u) + w(u, v) ≥ δ(s, v) ⇔ δ(s, v) − δ(s, u) ≤ w(u, v) ⇔ h(v) − h(u) ≤ w(u, v)
Time complexity
1. The first step involves running Bellman-Ford from s, which takes O(V E) time.
We also pay a pre-processing cost to reweight all the edges (O(E))
2. We then run Dijkstra’s algorithm from each of the V vertices in the graph; the
total time complexity of this step is O(V E + V 2 lg V )
3. We then need to reweight the shortest paths for each pair; this takes O(V 2 )
time.
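A sketch of the full pipeline under the description above (Bellman-Ford for h, then binary-heap Dijkstra, giving O(V E lg V) here rather than the O(V E + V² lg V) Fibonacci-heap bound; names are illustrative):

import heapq

def johnson(n, edges):
    """All-pairs shortest paths; edges = [(u, v, w), ...], vertices 0..n-1."""
    # Step 1: Bellman-Ford from a virtual source s with 0-weight edges to all v,
    # so h[v] = delta(s, v); initializing h to all 0 bakes in the s-edges.
    h = [0] * n
    for _ in range(n - 1):
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    for u, v, w in edges:
        if h[u] + w < h[v]:
            raise ValueError("negative-weight cycle")
    # Step 2: reweight so all edge weights are nonnegative
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))
    # Step 3: Dijkstra from each vertex, then undo the offset
    D = [[float("inf")] * n for _ in range(n)]
    for s in range(n):
        dist = [float("inf")] * n
        dist[s] = 0
        pq = [(0, s)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        for v in range(n):
            D[s][v] = dist[v] - h[s] + h[v]   # delta(u,v) = delta_h(u,v) - h(u) + h(v)
    return D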
Applications
Bellman-Ford can solve any system of difference constraints (or report that it is unsolvable) in O(V E) time, where V = # variables and E = # constraints.
An exercise is to prove that Bellman-Ford minimizes maxi xi − mini xi.
This has applications to
• Real-time programming
• Multimedia scheduling
• Temporal reasoning
For example, you can bound the duration of an event via difference constraint
LB ≤ tend − tstart ≤ U B, or bound a gap between events via 0 ≤ tstart2 − tend1 ≤ ε,
or synchronize events via |tstart1 − tstart2 | ≤ ε or 0.
Lecture 12: Minimum Spanning Tree
Introduction
• Optimal Substructure
• Prim’s algorithm
• Kruskal’s algorithm
Definitions
Recall that a greedy algorithm repeatedly makes a locally best choice or decision, but
ignores the effects of the future.
A tree is a connected, acyclic graph.
A spanning tree of a graph G is a subset of the edges of G that form a tree and
include all vertices of G.
Finally, the Minimum Spanning Tree problem: Given an undirected graph G = (V, E) and edge weights W : E → R, find a spanning tree T of minimum weight Σ_{e∈T} w(e).
A naive algorithm
The obvious MST algorithm is to compute the weight of every tree, and return the
tree of minimum weight. Unfortunately, this can take exponential time in the worst
case. Consider the following example:
If we take the top two edges of the graph, the minimum spanning tree can consist
of any combination of the left and right edges that connect the middle vertices to the
left and right vertices. Thus in the worst case, there can be an exponential number
of spanning trees.
Instead, we consider greedy algorithms and dynamic programming algorithms to
solve MST. We will see that greedy algorithms can solve MST in nearly linear time.
• Greedy choice property: locally optimal choices lead to a globally optimal so
lution
We can see how these properties can be applied to the MST problem
Lemma 1. If T ' is a minimum spanning tree of G/e, then T ' ∪ {e} is an MST of G.
The statement can be used as the basis for a dynamic programming algorithm, in which we guess an edge e that belongs to the MST, contract the edge, and recurse. At the end, we decontract the edge and add e to the MST.
The lemma guarantees that this algorithm is correct. However, this algorithm requires exponential time, because there are exponentially many choices of edges that we can guess to form our MST.
We make the algorithm polynomial time by removing the guessing process.
Lemma 2 (Greedy-Choice Property for MST). For any cut (S, V \ S) in a graph G = (V, E, w), any least-weight crossing edge e = {u, v} with u ∈ S and v ∉ S is in some MST of G.
Prim’s Algorithm
Now, we can apply the insights from the optimal structure and greedy choice property
to build a polynomial-time, greedy algorithm to solve the minimum spanning tree
problem.
1  S = ∅
2  Q = V    ▷ priority queue keyed on v.key
3  s.key = 0
4  for v in V \ {s}
5      v.key = ∞
6  while Q is not empty
7      u = Extract-Min(Q), add u to S
8      for v ∈ Adj[u]
9          if v ∈ Q and v ∉ S and w(u, v) < v.key:
10             v.key = w(u, v)
11             v.parent = u
Correctness
We prove the correctness of Prim’s Algorithm with the following invariants.
1. v ∉ S ⟹ v.key = min{w(u, v) | u ∈ S}
2. The tree built so far within S is a subtree of some MST of G.
The first invariant follows from Steps 8–10 of the algorithm above. A proof of the second invariant follows:
Thus Prim’s Algorithm always adds edges that have the lowest weight and gradu
ally builds a tree that is always a subset of some MST, and returns a correct answer.
Runtime
Prim's algorithm runs in O(V) · TExtract-Min + O(E) · TDecrease-Key time.
The O(E) term results from the fact that Step 8 is repeated a number of times equal
to the sum of the number of adjacent vertices in the graph, which is equal to 2|E|,
by the handshaking lemma.
Then the effective runtime of the algorithm varies with the data structures used
to implement the algorithm. The table below describes the runtime with the different
implementations of the priority queue.
Priority Queue    TExtract-Min           TDecrease-Key        Total
Array             O(V)                   O(1)                 O(V²)
Binary heap       O(lg V)                O(lg V)              O(E lg V)
Fibonacci heap    O(lg V) (amortized)    O(1) (amortized)     O(E + V lg V)
[CLRS ch. 19]
Kruskal’s Algorithm
Kruskal’s Algorithm is another algorithm to solve the MST problem. It constructs
an MST by taking the globally lowest-weight edge and contracting it.
Correctness
We use the following invariant to prove the correctness of Kruskal’s Algorithm.
Runtime
Kruskal’s algorithm has an overall runtime of
Tsort (E) + O(V ) · TMake-Set + O(E)(TFind + TUnion ) = O(E lg E + Eα(V ))
We note that TMake-Set is O(1) and Tfind + TUnion is amortized O(α(V )) for Union-Find
data structures.
Furthermore, if all weights are integer weights, or all weights are in the range
[0, E O(1) ], then the runtime of the sorting step is O(E), using Counting Sort or a
similar algorithm, and the runtime of Kruskal’s Algorithm will be better than that
of Prim’s Algorithm.
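A sketch of Kruskal's Algorithm with a union-find (path compression only, for brevity; names are illustrative):

def kruskal(n, edges):
    """MST via Kruskal: sort edges by weight; add an edge iff its endpoints
    lie in different union-find components (i.e., it creates no cycle).
    edges = [(w, u, v), ...]; vertices 0..n-1."""
    parent = list(range(n))
    def find(x):                        # with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):       # T_sort(E)
        ru, rv = find(u), find(v)
        if ru != rv:                    # merge the two components
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return total, mst

print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
# (7, [(0, 1), (1, 2), (2, 3)])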
Lecture 14: Baseball Elimination
From this chart, how can we figure out if a team is eliminated? A naive sports
writer can only compute that Team i is eliminated if wi + ri < wj for some other
j. For example, if Detroit’s record was w5 = 46, then Detroit is certainly eliminated
since w5 + r5 = 46 + 28 < 75 = w1 .
This condition, however, is sufficient, but not necessary. For instance, consider
w5 = 47. Though w5 + r5 = 75, either NY or Baltimore will reach 76 wins since they
have 5 games left against each other. How can we determine if Detroit is eliminated
for arbitrary values of w5 ?
To answer this question, we can use max-flow. Consider Figure 1, where the capacity between s and node i–j is the number of games left to be played between teams i and j, the capacity between node i–j and each team node k = 1, 2, 3, 4 is infinity, and the capacity between node k and t is w5 + r5 − wk. The intuition for the construction of the graph is that we assume Detroit wins all r5 of its remaining games, and try to keep the number of wins of every other team less than or equal to the total possible wins of Detroit (≤ w5 + r5).
Theorem 1. Team 5 (Detroit) is eliminated if and only if max-flow does not saturate
all edges leaving the source, i.e., max flow value < 26.
Proof. Saturation of the edge capacity corresponds to playing all the remaining games.
If all the games cannot be played, while keeping the total number of wins of a team
to be less than or equal to w5 + r5 , then Team 5 is eliminated.
¹ Roughly speaking!
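A sketch of the construction using Edmonds-Karp max-flow. The win totals below follow the w5 = 47 scenario discussed above, but the games-left numbers are made up for illustration and are not the chart from the notes:

from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a capacity dict cap[u][v]; returns the max flow value."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:                 # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                              # reconstruct the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)          # bottleneck capacity
        for u, v in path:
            cap[u][v] -= b
            cap[v][u] += b                           # residual edge
        flow += b

def eliminated(w, r, team, games):
    """Is `team` eliminated? w[k] = current wins, r = games left for `team`,
    games[(i, j)] = remaining games between the other teams i and j."""
    if any(w[k] > w[team] + r for k in w if k != team):
        return True                                  # the naive check
    cap = defaultdict(lambda: defaultdict(int))
    total = 0
    for (i, j), g in games.items():
        cap["s"][(i, j)] = g                         # s -> game node
        cap[(i, j)][i] = cap[(i, j)][j] = float("inf")
        total += g
    for k in set(x for ij in games for x in ij):
        cap[k]["t"] = w[team] + r - w[k]             # cap team k's wins
    return max_flow(cap, "s", "t") < total           # some game unplayable

w = {1: 75, 2: 71, 3: 69, 4: 63, 5: 47}              # hypothetical standings
games = {(1, 2): 5, (3, 4): 2}                       # hypothetical games left
print(eliminated(w, 28, 5, games))                   # True: 1-2 games can't all fit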
Lecture 15: Linear Programming
• Objective function: c · x
Minimize x1 + x2 + x3 + x4
Subject to − 2x1 + 8x2 + 0x3 + 10x4 ≥ 50, 000 (Urban Majority) (1)
5x1 + 2x2 + 0x3 + 0x4 ≥ 100, 000 (Suburban Majority) (2)
3x1 − 5x2 + 10x3 − 2x4 ≥ 25, 000 (Rural Majority) (3)
x1 , x2 , x3 , x4 ≥ 0 (Can’t unadvertise) (4)
We will assume that the votes obtained by advertising different issues are disjoint.
Policy            Urban     Suburban   Rural
Building roads       -2         5         3
Gun control           8         2        -5
Farm subsidies        0         0        10
Gasoline tax         10         0         2
Population      100,000   200,000    50,000
Duality: the primal LP

Maximize c · x
Subject to Ax ≤ b, x ≥ 0

has the dual

Minimize b · y
Subject to A^T y ≥ c, y ≥ 0

and their optimal values coincide (strong duality).
This property of LP can be used to show many important theorems. For instance, the max-flow min-cut theorem can be proven by formulating the max-flow problem as the primal LP problem.
4 Formulating LP Problems
In this section, we will give brief descriptions of how to formulate some problems seen
previously in this class as LP problems. Once we have a LP formulation, we can
convert the problem into the standard form as described in Section 3.
Shortest paths: introduce a variable d(v) per vertex and maximize Σv d(v) subject to d(v) − d(u) ≤ w(u, v) for every edge (u, v) and d(s) = 0. Note the maximization, so all distances don't end up being zero. There is no solution to this LP if and only if there exists a negative-weight cycle reachable from s.
5 Algorithms for LP
There are many algorithms for solving LP problems:
• Interior Point Method: x moves inside the polytope following c. This algo
rithm runs in poly-time, and is practical.
In this lecture, we will study only the simplex algorithm.
• Convert one slack form into an equivalent slack form, while likely increasing the
value of the objective function, and ensuring that the value does not decrease.
Maximize 3x1 + x2 + 2x3
Subject to x1 + x2 + 3x3 ≤ 30
2x1 + 2x2 + 5x3 ≤ 24
4x1 + x2 + 2x3 ≤ 36
x1, x2, x3 ≥ 0
Change the given LP problem to slack form, consisting of the original variables
called nonbasic variables, and new variables representing slack called basic variables.
z = 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3
We start with a basic solution: we set all nonbasic variables on the right hand side
to some feasible value, and compute the values of the basic variables. For instance,
we can set x1 = x2 = x3 = 0. Note that the all 0 solution satisfies all constraints in
this problem, but may not do so in the general case.
We now perform the pivoting step:
In this example, we can increase the value of x1 . The third constraint will limit
the value of x1 to 9. We then get
x1 = 9 − x2/4 − x3/2 − x6/4.
Now rewrite the other constraints with x6 on the right hand side.
z  = 27 + x2/4 + x3/2 − 3x6/4
x1 = 9 − x2/4 − x3/2 − x6/4
x4 = 21 − 3x2/4 − 5x3/2 + x6/4
x5 = 6 − 3x2/2 − 4x3 + x6/2
We note the equivalence of the solutions. That is, the original basic solution
(0, 0, 0, 30, 24, 36) satisfies the rewritten constraints, and has the objective value of 0.
The second basic solution (9, 0, 0, 21, 6, 0) has the objective value of 27.
At this point, pivoting on x6 will actually cause the objective value to decrease (though the computation is not shown here). Thus let us pick x3 as the next pivot to get

z  = 111/4 + x2/16 − x5/8 − 11x6/16
x1 = 33/4 − x2/16 + x5/8 − 5x6/16
x3 = 3/2 − 3x2/8 − x5/4 + x6/8
x4 = 69/4 + 3x2/16 + 5x5/8 − x6/16

which results in basic solution (33/4, 0, 3/2, 69/4, 0, 0) with objective value of 111/4.
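As a numerical sanity check, the same LP can be handed to an off-the-shelf solver (a sketch assuming SciPy is available; linprog minimizes, so we negate c):

from scipy.optimize import linprog

# maximize 3x1 + x2 + 2x3  ==  minimize -(3x1 + x2 + 2x3)
res = linprog(c=[-3, -1, -2],
              A_ub=[[1, 1, 3],
                    [2, 2, 5],
                    [4, 1, 2]],
              b_ub=[30, 24, 36],
              bounds=[(0, None)] * 3)
print(res.x, -res.fun)   # x = (8.25, 0, 1.5), objective 27.75 = 111/4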
6 More Topics of LP
There are several important questions regarding LP that were not discussed in this
lecture:
Lecture 16: NP-Completeness
Introduction
• NP-hardness and NP-completeness
• 3SAT
• 3-Dimensional Matching
• Subset Sum and Partition (weak); 4-Partition (strong)
• Jigsaw Puzzles
• P = the set of problems that are solvable in polynomial time. If the problem
has size n, the problem should be solved in nO(1) .
In this model of nondeterminism, we can assume that all guessing is done first.
This is equivalent to finding a polynomial-time verifier of polynomial-size certificates
for YES answers.
Note that there is an asymmetry between YES and NO inputs.
For an NP-hard problem X:
– If P ≠ NP, then X ∉ P.
If A reduces to B:
– If B ∈ P, then A ∈ P
– If B ∈ NP, then A ∈ NP
– If A is NP-hard, then B is NP-hard.
We can show that problems are NP-complete via the following steps.
3SAT
3SAT was proved NP-complete by Cook in 1971.
is there an assignment of variables to True and False, such that the entire formula
evaluates to True?
We note that a literal is of the form xi or x̄i, and both forms of the literal correspond to the variable xi. A clause is made up of the OR of 3 literals, and a formula is the AND of clauses.
3SAT ∈ NP because we can create a verifier for a certificate. For a given instance
of 3SAT, a certificate corresponds to a list of assignments for each variable, and a
verifier can compute whether the instances is satisfied, or can be evaluated to true.
Thus the verifier is polynomial time, and the certificate has polynomial length.
It is important to note that this verifier only certifies satisfiable (YES) instances. To certify that a 3SAT instance is unsatisfiable, an algorithm would seemingly have to check every variable assignment, which cannot be done in polynomial time.
3SAT is also NP-hard. We give some intuition for this result. Consider any prob
lem in NP. Because it belongs in NP, a nondeterministic polynomial time algorithm
exists to solve this problem, or a verifier to check a solution. The verifier is an al
gorithm that can be implemented as a circuit. Now, the circuit consists of AND,
OR and NOT gates, which can be represented as a formula. This formula can be
converted to 3SAT form, where each clause has 3 literals, which is equivalent to the
original formula. Thus all problems in NP can be converted to 3SAT, and the inputs
to the original problem are equivalent to the converted inputs to 3SAT, thus 3SAT is
NP-complete.
We show that Super Mario Brothers is NP-hard by giving a reduction from 3SAT.
This version of Super Mario Brothers is generalized to an arbitrary screen size of
n × n, so we remove limits on the number of items on screen. We have the following
problem definition
Definition 2. Super Mario Brothers: Given a level of Super Mario Brothers, can
we advance to the next level?
Because we reduce from 3SAT, we are given a 3SAT instance, and we must gen
erate a level of Super Mario Brothers that corresponds to that 3SAT instance.
We construct the level by constructing gadgets for each of the variables in the
3SAT formula, as pictured in Figure 1. Mario jumps down from the ledge, and
cannot jump back up. He can fall to the left or right, corresponding to assigning the
variable to True or False. The remainder of the level is set up such that this choice
cannot be reversed.
We also create the following gadget for clauses. After choosing the assignment
for a given variable, Mario visits all of the clause gadgets with the same literal value,
then moves to the next variable gadget. By visiting a clause, Mario can release a star.
Finally, after visiting all of the variable gadgets, Mario must re-traverse the clause
gadgets. If the clause gadget was previously visited, a star is available, and he can
pass through the fire. Otherwise, he will not be able to traverse the clause gadget
and die.
Thus, winning the level is equivalent to passing through all the clause gadgets on
the second pass through. Mario can only pass all the clause gadgets if they have all
been satisfied by a variable assignment. The actions throughout the variable gadgets
correspond to the solution to the 3SAT formula, so if Mario can pass the level, we
have a solution to the 3SAT problem.
The final gadget needed is the crossover gadget. It ensures that Mario does not
switch between variable and clause gadgets when it is not allowed. The total size of all these gadgets is within the polynomial size required by the reduction.
Thus, Mario can win the level if and only if the original 3SAT formula can be satisfied. Therefore we have a reduction from 3SAT, and Super Mario Brothers is NP-hard.
3DM is in NP. Given a certificate, which lists a candidate set of triples, a verifier can check that each triple belongs to T and every element of X ∪ Y ∪ Z is in exactly one triple.
3DM is also NP-hard, via a reduction from 3SAT (hence NP-complete). We build gadgets for the variables and clauses.
The variable gadget for variable xi is displayed in the picture. Only red or blue
triangles can be chosen, where the red triangles correspond to the true literal, while
the blue triangles correspond to the false literal. Otherwise, there will be overlap, or
some inner elements will not be covered. There is a 2n_{xi} "wheel" for each variable gadget, where n_{xi} corresponds to the number of occurrences of xi in the formula.
The clause gadget for the clause (xi ∨ x̄j ∨ xk) is displayed in the picture. The dot that is unshared in each variable's triangle within the clause gadget is also the single dot within the variable gadget.
Then, if we set xi to true, we take all of the red false triangles in the variable gadget, leaving a blue true triangle available to cover the clause gadget. However, this still potentially leaves x̄j and xk uncovered, so we need a garbage collection gadget, pictured below. There are Σx nx of these gadgets, because each variable gadget has nx unnecessary elements that won't otherwise be covered. However, one of the remaining elements per clause will be covered, so the rest need to be covered by the garbage collection gadgets.
Thus, if a solution exists to the 3SAT formula, we can find the solution to the
3DM problem by choosing the points in 3DM corresponding to the variable values.
If we have a 3DM solution, we can map this back to the satisfying assignment for
3SAT. Thus, our reduction is complete and 3DM is NP-hard.
Subset Sum
Definition 4. Subset Sum: Given n integers A = {a1, a2, . . . , an} and a target sum t, is there a subset S ⊆ A such that

Σ_{ai∈S} ai = t?
Partition
Definition 5. Partition: Given A = {a1, a2, . . . , an}, is there a subset S ⊆ A such that

Σ S = Σ (A \ S) = (1/2) Σ A?
Partition is also weakly NP-complete. It is a special case of the Subset Sum problem, where we set t = (1/2) Σ A. In fact, we can reduce Partition to Subset Sum, though this is not the direction we want for the reduction.
We can reduce from Subset Sum to Partition as follows. Let σ = Σ A. Add elements an+1 = σ + t and an+2 = 2σ − t to A. The new total is 4σ, so each side of a partition must sum to 2σ; since an+1 + an+2 = 3σ > 2σ, the two new elements must be on different sides. To balance the two sides, original elements summing to σ − t must join an+1 and original elements summing to t must join an+2, so each side has sum 2σ. Thus if we can solve Partition, the elements placed with an+2 form the subset that sums to t, the target for the Subset Sum problem, completing our reduction.
Rectangle Packing
Definition 6. Rectangle Packing: Given a set of rectangles Ri and a target rectangle T, can we pack the rectangles in T such that there is no overlap? Note that the sum of the areas of the rectangles Ri is required to equal the area of the target rectangle, i.e., Σi area(Ri) = area(T).
Rectangle packing is weakly NP-hard via a reduction from Partition. For every element ai in Partition, we create a rectangle Ri with height 1 and width 3ai. The target rectangle has height 2 and width 3t = (3/2) Σ A. Because each rectangle has width at least 3, which exceeds the height 2, all rectangles must be packed horizontally. Thus to solve the Rectangle Packing problem, we must separate the rectangles into two rows, each of total width 3t, which corresponds to two subsets with total sum t in the Partition problem.
Definition 7. Given square tiles with no patterns, can these tiles be arranged to
fit a target rectangular shape? Note that the tiles can have a side tab, pocket, or
boundary, but tabs and pockets must have matching shapes.
The most obvious reduction is from Partition. For every number ai, create a run of square tiles with unique tabs and pockets, where the number of tiles equals the value of ai, and the two end pieces of the run have plain boundaries. However, this reduction cannot be completed, because the inputs to Partition can be exponentially large.
Instead, the reduction comes from 4-Partition.
Lecture 17: Approximation Algorithms
• Definitions
• Vertex Cover
• Set Cover
• Partition
Let C be the cost of the solution produced on an input of size n, and Copt the cost of an optimal solution. We require

max(C/Copt, Copt/C) ≤ Q(n)

Such an algorithm is called a Q(n)-approximation algorithm.
An approximation scheme takes as input ε > 0 and produces a solution such that C ≤ (1 + ε)Copt for any fixed ε; it is a (1 + ε)-approximation algorithm.
Vertex Cover
Given an undirected graph G(V, E), find a subset V ' ⊆ V such that, for every edge
(u, v) ∈ E, either u ∈ V ' or v ∈ V ' (or both). Furthermore, find a V ' such that |V ' |
is minimum. This is an NP-Complete problem.
[Figure: example graph with vertices a, b, c, d, e, f, g.]
Approx Vertex Cover could pick edges (b, c), (e, f ) and (d, g), such that V ' =
{b, c, e, f, d, g} and |V ' | = 6. Hence, the cost is C = |V ' | = 6. The optimal solution
for this example is {b, d, e}, hence Copt = 3.
Proof: Let U ⊆ E be the set of all the edges that are picked by Approx Vertex Cover. The optimal vertex cover must include at least one endpoint of each edge in U (and of other edges). Furthermore, no two edges in U share an endpoint. Therefore, |U| is a lower bound for Copt, i.e., Copt ≥ |U|. The number of vertices in V' returned by Approx Vertex Cover is 2 · |U|. Therefore, C = |V'| = 2 · |U| ≤ 2Copt. Hence C ≤ 2 · Copt. □
Set Cover
Given a set X and a family of (possibly overlapping) subsets S1, S2, · · · , Sm ⊆ X such that ∪_{i=1}^{m} Si = X, find a set P ⊆ {1, 2, 3, · · · , m} such that ∪_{i∈P} Si = X. Furthermore, find a P such that |P| is minimum.
In the following example, each dot is an element of X and each Si is a subset of X.
[Figure: dots (the elements of X) covered by six overlapping regions S1, . . . , S6.]
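Approx Set Cover is the natural greedy rule analyzed below: repeatedly take the set that covers the most still-uncovered elements. A minimal sketch, assuming the family is given as a list of Python sets:

def approx_set_cover(X, sets):
    # Greedy: while elements remain, pick the set covering most of them.
    remaining = set(X)
    cover = []
    while remaining:
        i = max(range(len(sets)), key=lambda j: len(sets[j] & remaining))
        if not sets[i] & remaining:
            raise ValueError("the sets do not cover X")
        cover.append(i)
        remaining -= sets[i]
    return cover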
Proof: Let the optimal cover be Popt, with Copt = |Popt| = t. Let Xk be the set of elements still uncovered at the start of iteration k of Approx Set Cover; hence X0 = X. Then:
• for all k, Xk can be covered by t sets (the sets from the optimal solution)
• so one of those sets covers at least |Xk|/t of the remaining elements
• Approx Set Cover picks a set of (current) size ≥ |Xk|/t
• hence for all k, |Xk+1| ≤ (1 − 1/t)|Xk|. (A more careful analysis (see CLRS, Ch. 35) relates Q(n) to harmonic numbers, since t should shrink as the instance is covered.)
Iterating, |Xk| ≤ (1 − 1/t)^k · n ≤ e^{−k/t} · n. The algorithm terminates when |Xk| < 1, i.e., |Xk| = 0, and will have cost C = k. This is guaranteed once
e^{−k/t} · n < 1, i.e., e^{k/t} > n.
Hence the algorithm terminates by the time k/t > ln(n). Therefore k/t = C/Copt ≤ ln(n) + 1, so Approx Set Cover is a (ln(n) + 1)-approximation. □
Notice that the approximation ratio gets worse for larger problems, as it changes with n.
Partition
The input is a set S = {1, 2, · · · , n} of n items with weights s1, s2, · · · , sn. Assume, without loss of generality, that the items are ordered such that s1 ≥ s2 ≥ · · · ≥ sn. Partition S into sets A and B to minimize max(w(A), w(B)), where w(A) = ∑_{i∈A} si and w(B) = ∑_{j∈B} sj. Let L = (1/2) ∑_{i=1}^{n} si; any partition has max(w(A), w(B)) ≥ L, so Copt ≥ L.
Approx Partition runs in two phases. Given ε > 0, choose m so that 1/(m + 1) ≤ ε.
First Phase: Find an optimal partition A', B' of s1, · · · , sm. This takes O(2^m) time.
Second Phase: Initialize sets A and B to A' and B' respectively, so they already contain a partition of the elements s1, · · · , sm. Then, for each i going from m + 1 to n, add si to whichever of A and B currently has the smaller weight.
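A sketch of both phases in Python (brute force via itertools, our representational choices; for illustration only):

from itertools import combinations

def approx_partition(s, m):
    # s is sorted in decreasing order; choose m with 1/(m+1) <= epsilon.
    # First phase: optimal partition of the m largest items, O(2^m) time.
    first, rest = s[:m], s[m:]
    best = None
    for r in range(len(first) + 1):
        for picked in combinations(range(len(first)), r):
            A = [first[i] for i in picked]
            B = [first[i] for i in range(len(first)) if i not in picked]
            if best is None or max(sum(A), sum(B)) < best[0]:
                best = (max(sum(A), sum(B)), A, B)
    _, A, B = best
    # Second phase: add each remaining item to the currently lighter side.
    for x in rest:
        (A if sum(A) <= sum(B) else B).append(x)
    return A, B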
Proof: Without loss of generality, assume w(A) ≥ w(B). Then the approximation ratio satisfies C/Copt ≤ w(A)/L, since Copt ≥ L. Let k be the last item added to A. There are two cases: either k was added in the first phase, or in the second phase.
Case 1: k is added to A in the first phase. This means that A = A'. We have an optimal partition, since we can't do better than w(A') when we have n ≥ m items, and we know that w(A') is optimal for the m items.
Case 2: k is added to A in the second phase. When k was added, A was the lighter set, so w(A) − sk ≤ w(B) = 2L − w(A), which gives w(A) ≤ L + sk/2. Moreover, the items are sorted and k ≥ m + 1, so 2L ≥ s1 + · · · + s_{m+1} ≥ (m + 1) · sk.
Now, w(A)/L ≤ (L + sk/2)/L = 1 + sk/(2L) ≤ 1 + sk/((m + 1) · sk) = 1 + 1/(m + 1) ≤ 1 + ε. Hence Approx Partition is a (1 + ε)-approximation. □
The following example shows a bad case for Approx Vertex Cover Natural, the greedy algorithm that repeatedly picks a maximum-degree vertex. In the example, the optimal cover picks the k! vertices at the top.
[Figure: k! top vertices of degree k; below them, rows of bottom vertices of degrees k, k − 1, . . . , 1.]
Approx Vertex Cover Natural could possibly pick all the bottom vertices from left to right in order. Hence the cost could be k! · (1/k + 1/(k − 1) + · · · + 1) ≈ k! log k, which is a factor of log k worse than optimal.
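For comparison, a minimal sketch of Approx Vertex Cover Natural itself, assuming the graph is given as an adjacency dict of sets (consumed destructively; the function name is ours):

def approx_vertex_cover_natural(adj):
    # Repeatedly take a maximum-degree vertex and delete it with its edges.
    cover = []
    while any(adj[v] for v in adj):
        v = max(adj, key=lambda u: len(adj[u]))
        cover.append(v)
        for u in adj.pop(v):
            adj[u].discard(v)
    return cover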
Proof: Let Gk be the graph after iteration k of the algorithm. And let n be the
number of edges in the graph, i.e. |G| = n = |E|. With each iteration, the algorithm
selects a vertex and deletes it along with all incident edges. Let m = Copt be the
number of vertices in the optimal vertex cover for G. Then let’s look at the first m
iterations of the algorithm: G0 → G1 → G2 → · · · → Gm .
Let di be the degree of the maximum degree vertex of Gi−1 . Then the algorithm
deletes all edges incident on that vertex to get Gi . Therefore:
|Gm| = |G0| − ∑_{i=1}^{m} di
Also:
∑_{i=1}^{m} di ≥ ∑_{i=1}^{m} |Gi−1|/m
This is true because given |Gi−1| edges that can be covered by m vertices, we know that there is a vertex with degree at least |Gi−1|/m. Then:
∑_{i=1}^{m} |Gi−1|/m ≥ ∑_{i=1}^{m} |Gm|/m = |Gm|
since |Gi−1| ≥ |Gm| for every i ≤ m.
Because |Gm| = |G0| − ∑_{i=1}^{m} di and |Gm| ≤ ∑_{i=1}^{m} di, we get ∑_{i=1}^{m} di ≥ |G0|/2. Hence after m iterations, the algorithm will have deleted half or more of the edges of G0. And generally, since every m iterations it halves the number of edges in the graph, in m · log |G0| iterations it will have deleted all the edges. And since with each iteration it adds 1 vertex to the cover, it ends up with a vertex cover of size m · log |G0| = m · log n. Since we assumed that m was the size of the optimal vertex cover, C/Copt = (m log n)/m = log n. Hence Approx Vertex Cover Natural is a (log n)-approximation. □
Note that since n ≈ k! log k in the example above, that worst-case example is only a factor of log k ≈ log log n worse than optimal, whereas we have only shown an O(log n) upper bound.
Lecture 18 Fixed-Parameter Algorithms 6.046J Spring 2015
• Vertex Cover
• Fixed-Parameter Tractability
• Kernelization
• Connection to Approximation
Idea: The idea is to aim for an exact algorithm, but to confine the exponential dependence to a specific parameter. When the value of this parameter is small, the algorithm is fast even on large instances. Hopefully, this parameter will be small in practice.
k-Vertex Cover
Given a graph G = (V, E) and a nonnegative integer k, is there a set S ⊆ V of vertices of size at most k, |S| ≤ k, that covers all edges? This is a decision version of Vertex Cover and is also NP-hard. We will use k as the parameter to develop a fixed-parameter algorithm for k-Vertex Cover. Note that we can have k ≪ |V|, as the figure below shows:
[Figure: a large graph in which a few vertices cover every edge.]
Bounded search tree algorithm: pick an arbitrary edge (u, v) ∈ E. Any vertex cover must contain u or v, so try both:
1. add u to S, delete u and incident edges from G, and recurse with k ' = k−1.
2. do the same but with v instead of u
3. return the OR of the two outcomes
This is like guessing in dynamic programming but memoization doesn’t help here.
The recursion tree looks like the following:
[Figure: recursion tree of depth k; the root branches on u vs. v for the first picked edge, each child branches on u' vs. v' or u'' vs. v'' for the next picked edge, and so on.]
At a leaf (k = 0), return YES if |E| = 0 (all edges covered). It takes O(V) time to delete u or v. Therefore this has a total runtime of O(2^k · |V|).
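A direct transcription of this search tree in Python (the edge-list representation is our choice; the O(2^k) branching is visible in the two recursive calls):

def k_vertex_cover(edges, k):
    # Returns True iff some set of <= k vertices covers every edge.
    if not edges:
        return True                      # leaf: all edges covered
    if k == 0:
        return False                     # edges remain, budget exhausted
    u, v = edges[0]                      # one endpoint must be in the cover
    rest_u = [e for e in edges if u not in e]   # branch 1: take u
    rest_v = [e for e in edges if v not in e]   # branch 2: take v
    return k_vertex_cover(rest_u, k - 1) or k_vertex_cover(rest_v, k - 1)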
Theorem: ∃ f(k) · n^c algorithm ⇐⇒ ∃ f'(k) + n^{c'} algorithm.
Proof:
(⇐) Trivial (assuming f'(k) and n^{c'} are ≥ 1).
(⇒) f(k) · n^c ≤ f(k)^{c+1} + n^{c+1}: if n ≤ f(k), then f(k) · n^c ≤ f(k)^{c+1}; otherwise f(k) · n^c ≤ n^{c+1}.
Alternatively, since xy ≤ x² + y², we can just take f'(k) = (f(k))² and c' = 2c.
Example: O(2^k · n) ≤ O(4^k + n²)
Kernelization
Kernelization is a simplifying self-reduction. It is a polynomial time algorithm that
converts an input (x, k) into a small and equivalent input (x' , k ' ). Here, small means
|x' | ≤ f (k) and equivalent means the answer to x is the same as the answer to x' .
Theorem: a problem has an f(k) · n^{O(1)} (fixed-parameter) algorithm if and only if it has a kernelization.
Proof:
(⇐) Kernelize, so n' ≤ f(k); then run any finite g(n')-time algorithm on the kernel. This totals n^{O(1)} + g(f(k)) time.
(⇒) Let A be an f(k) · n^c algorithm. Then, assuming k is known:
if n ≤ f(k), the input is already kernelized;
if f(k) ≤ n, then
1. run A → f(k) · n^c ≤ n^{c+1} time
2. output a trivial constant-size instance with the same YES/NO answer
So we know an (exponential) kernel exists. Recent work aims to find polynomial (even linear) kernels when possible.
Kernelization for k-Vertex Cover:
• Any vertex of degree > k must be in the cover (else we would need > k vertices just to cover its incident edges)
• Remove such vertices (and their incident edges) one at a time, decreasing k accordingly
• Afterwards every vertex has degree ≤ k, so k vertices cover at most k² edges; if more than k² edges remain, answer NO
• Else, |E'| ≤ k²
• Dropping isolated vertices, |V'| ≤ 2k²
• The input has been reduced to an instance (V', E') of size O(k²)
The runtime of the kernelization algorithm is naively O(V E) (O(V + E) with more work). After this, we can apply either a brute-force algorithm on the kernel, which yields an overall runtime of O(V + E + (2k²)^k · k²) = O(V + E + 2^k k^{2k+2}), or the bounded search-tree solution, which yields a runtime of O(V + E + 2^k k²).
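A sketch of the kernelization in Python, assuming an adjacency-dict input; it returns a reduced instance, or None when the answer is already NO (names are ours):

def kernelize_vertex_cover(adj, k):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    changed = True
    while changed and k >= 0:
        changed = False
        for v in list(adj):
            if len(adj[v]) > k:          # degree > k: v is forced into the cover
                for u in adj.pop(v):
                    adj[u].discard(v)
                k -= 1
                changed = True
                break
    edges = sum(len(ns) for ns in adj.values()) // 2
    if k < 0 or edges > k * k:           # k low-degree vertices cover <= k^2 edges
        return None
    kernel = {v: ns for v, ns in adj.items() if ns}   # drop isolated vertices
    return kernel, k                     # O(k^2) edges, O(k^2) vertices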
The best algorithm to date: O(kV + 1.274^k) by [Chen, Kanj, Xia - TCS 2010].
• Connection to approximation: when the optimum is at most k, any approximate solution with relative error < 1/k must in fact be exactly optimal.
(Can use this relation to prove that EPTASs don't exist in some cases)
Lecture 21 Hashing 6.046J Spring 2015
• Desirable Properties
• Applications to security
1 Hash Functions
A hash function h maps arbitrary strings of data to a fixed-length output, h : {0, 1}* → {0, 1}^d for a fixed d. The function is deterministic and public, but the mapping should look “random”. Hash functions do not have a secret key; since there are no secrets and the function itself is public, anyone can evaluate the function. Examples include MD4, MD5, and SHA-1, discussed below.
In practice, hash functions are used for “digesting” large data. For example, if you want to check the validity of a large file (potentially much larger than a few megabytes), you can compare the hash value of that file with the expected hash. It is therefore desirable (especially for the cryptographic hash functions covered here) that the function be collision resistant: it should be “hard” to find two inputs m1 and m2 such that h(m1) = h(m2). Most modern hash functions hope to achieve a security level of 2^64 or better, meaning that an attacker needs to test more than 2^64 different inputs to find a collision. Unfortunately, MD4 and MD5 aimed to provide 2^64 security but have been shown to be broken using 2^6 and 2^37 inputs respectively. SHA-1 aimed to provide 2^80 security, but has been shown (at least theoretically) to provide no more than 2^61 security.
1. One-way (pre-image resistance): given y, it is hard to find an x such that h(x) = y.
2. Strong collision-resistance: it is hard to find any pair of inputs x, x' such that h(x) = h(x').
3. Weak collision-resistance (second pre-image resistance): given x, it is hard to find x' ≠ x such that h(x) = h(x').
5. Non-malleability: given h(x), it is hard to generate h(f(x)) for any function f.
Some of the properties imply others, and some do not. For example:
• 2 ⇒ 3
• 1 ⇏ 2, 3.
Furthermore, a collision can be found in O(2^{d/2}) evaluations (using the birthday paradox), and an inversion in O(2^d).
To give more insight into why some properties do not imply others, we provide examples. Consider h that satisfies 1 and 2. We can construct a new h' that takes one extra bit of input and XORs the first two bits together to generate an input for h. That is, h'(a, b, x2, . . . , xn) = h((a ⊕ b), x2, . . . , xn). h' is still one-way, but it is not weakly collision resistant: flipping both a and b leaves the output unchanged. Now consider a different h'' that evaluates to 0 ∥ x if |x| ≤ n, and to 1 ∥ h(x) otherwise. h'' is weakly collision resistant, but it is not one-way: short outputs reveal x directly.
1.3 Applications
There are many applications of hash functions.
1. Password Storage: We can store the hash h(p) of a password p instead of p directly, and check h(p) to authenticate a user. If h satisfies property 1, an adversary compromising h(p) will not learn p.
2. File Authenticity: For each file F, we can store h(F) in a secure location. To check the authenticity of a file, we recompute h(F) and compare. This requires property 3.
3. Digital Signatures: We can use hash functions to generate a signature that guarantees the message came from the claimed source. For further explanation, refer to Recitation 11.
4. Commitments: In a secure bidding, Alice wants to bid a value x, but does not want to reveal the bid until the auction is over. Alice computes h(x) and publishes it, which serves as her commitment. When bidding is over, she reveals x, and x can be verified against h(x).
The commitment should be “binding”, so that Alice cannot produce a different x with the same commitment, and it should “hide” x, so that no one learns anything before x is revealed. Furthermore, it should be non-malleable. To guarantee secrecy, we need more than the properties from the previous section, since h'(x) = h(x) ∥ MSB(x) satisfies 1, 2, and 5 yet leaks a bit of x. In practice, this problem is bypassed by adding randomness to the commitment, C(x) = h(r ∥ x) for r ∈R {0, 1}^256, and revealing both the randomness and x at the end.
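A sketch of such a randomized commitment using Python's hashlib (SHA-256 standing in for h; the byte encodings and function names are our choices):

import hashlib, os

def commit(x: bytes):
    r = os.urandom(32)                   # r chosen uniformly from {0,1}^256
    c = hashlib.sha256(r + x).hexdigest()
    return c, r                          # publish c; keep (r, x) until reveal

def verify(c, r, x: bytes):
    return hashlib.sha256(r + x).hexdigest() == c

c, r = commit(b"bid: 100")               # Alice publishes c
assert verify(c, r, b"bid: 100")         # later, anyone checks the reveal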
Lecture 22 6.046J Spring 2015
• Key exchange
• RSA
– graph coloring
– knapsack
A physical analogy: Alice locks the box containing her message with her lock kA and ships it to Bob; Bob adds his own lock kB and ships the box back; Alice unlocks kA and ships it once more; finally Bob unlocks kB and reads the message.
Notice that this method relies on the commutativity of the locks: the order of the lock and unlock operations doesn't matter.
Diffie-Hellman key exchange. A prime p and a base g, 2 ≤ g ≤ p − 2, are public.
Alice selects a secret a, 1 ≤ a ≤ p − 2, computes g^a mod p, and sends it to Bob.
Bob selects a secret b, 1 ≤ b ≤ p − 2, computes g^b mod p, and sends it to Alice.
Alice can compute (g^b)^a mod p = k.
Bob can compute (g^a)^b mod p = k.
This assumes the Discrete Log Problem is hard (given g^a, compute a) and the Diffie-Hellman Problem is hard (given g^a, g^b, compute g^{ab}).
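A toy run of the exchange in Python (tiny parameters for illustration only; real deployments use large, carefully chosen primes):

import random

def diffie_hellman(p, g):
    a = random.randint(1, p - 2)         # Alice's secret
    b = random.randint(1, p - 2)         # Bob's secret
    A = pow(g, a, p)                     # Alice sends g^a mod p
    B = pow(g, b, p)                     # Bob sends g^b mod p
    k_alice = pow(B, a, p)               # (g^b)^a = g^(ab) mod p
    k_bob = pow(A, b, p)                 # (g^a)^b = g^(ab) mod p
    assert k_alice == k_bob              # both sides derive the same key
    return k_alice

print(diffie_hellman(p=23, g=5))         # toy parameters only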
Can we attack this? Yes, with a man-in-the-middle: an adversary who intercepts both directions can run one exchange with Alice and another with Bob, since neither party is authenticated.
The two keys need to be linked in a mathematical way. Knowing the public key
should tell you nothing about the private key.
RSA
• Alice picks two large secret primes p and q.
• Alice computes N = p · q and φ = (p − 1)(q − 1).
• Alice picks e with gcd(e, φ) = 1 and computes d = e^{−1} mod φ.
• (N, e) is the public key; d is the secret key. Encryption: c = m^e mod N. Decryption: m = c^d mod N.
Why it works
φ = (p − 1)(q − 1)
Since ed ≡ 1 (mod φ), there exists an integer k such that ed = 1 + kφ.
Two cases. If p | m, then m^{ed} ≡ 0 ≡ m (mod p) trivially. Otherwise, by Fermat's little theorem,
m^{p−1} ≡ 1 (mod p)
(m^{p−1})^{k(q−1)} · m ≡ m (mod p)
m^{1+k(p−1)(q−1)} = m^{ed} ≡ m (mod p)
The same argument modulo q gives m^{ed} ≡ m (mod q), so by the Chinese Remainder Theorem m^{ed} ≡ m (mod N).
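A toy numeric check of this in Python (tiny primes, for illustration only; pow(e, -1, phi) needs Python 3.8+):

p, q = 61, 53
N = p * q                                # 3233
phi = (p - 1) * (q - 1)                  # 3120
e = 17                                   # gcd(e, phi) = 1
d = pow(e, -1, phi)                      # d = e^{-1} mod phi
m = 65
c = pow(m, e, N)                         # encrypt: c = m^e mod N
assert pow(c, d, N) == m                 # decrypt: c^d = m^{ed} = m mod N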
Hardness of RSA
• Factoring: given N , hard to factor into p, q.
• RSA Problem: given e such that gcd(e, (p − 1)(q − 1)) = 1 and c, find m such that m^e ≡ c (mod N).
NP-completeness
Is N composite with a factor within a range? unknown if NP-complete
Is a graph k-colorable? In other words: can you assign one of k colors to each
vertex such that no two vertices connected by an edge share the same color? NP-
complete
Given a pile of n items, each with different weights wi , is it possible to put items
in a knapsack such that we get a specific total weight S? NP-complete
• If you get stuck, backtrack to previous choice, and try next choice.
Knapsack Cryptography
General knapsack problem: NP-complete
Super-increasing knapsack: linear time solvable. In this problem, the weights are constrained as follows:
w_j ≥ ∑_{i=1}^{j−1} w_i
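Super-increasing instances are solved greedily from the largest weight down; a minimal sketch (the function name and example weights are ours):

def solve_superincreasing(w, S):
    # w super-increasing: w[j] >= sum(w[:j]). Scan from the largest weight:
    # if w[j] <= S it must be taken, since all smaller weights together
    # cannot exceed w[j] and so cannot reach S on their own.
    chosen = []
    for j in range(len(w) - 1, -1, -1):
        if w[j] <= S:
            chosen.append(j)
            S -= w[j]
    return sorted(chosen) if S == 0 else None

assert solve_superincreasing([1, 2, 4, 9, 20, 38], 35) == [1, 2, 3, 4]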
Lecture 23 Cache-Oblivious I 6.046J Spring 2015
• cache-oblivious scanning
• cache-oblivious divide & conquer algorithms: median finding & matrix multiplication
Each successive level of the memory hierarchy is bigger, but has longer latency due to the longer distance data has to travel. Yet the bandwidth between adjacent levels is usually matched.
A common technique to mitigate latency is blocking: when fetching a word of data, get the entire block containing it. In algorithmic terms, the idea is to amortize the latency over a whole block. For this idea to work, we additionally require the algorithm to use all words in a block (spatial locality) and to reuse blocks in cache (temporal locality).
In this model, cache accesses are free. The algorithm explicitly reads and writes memory in blocks. We will count the number of memory transfers between the cache and the memory as the key metric of the algorithm's performance.
3 Scanning
Example program:
for i in range(N): sum += A[i]
A scan touches at most ⌈N/B⌉ + 1 blocks, so MT(N) = O(N/B + 1); the +1 matters when N < B.
4 Median Finding
Recall the worst-case linear-time median-finding algorithm, whose steps are:
1. view the array as partitioned into columns of 5 (conceptual)
2. compute the median of each column
3. recursively find the median x of the column medians
4. partition the array by x
5. recurse on one side (at most 7N/10 elements)
We will now analyze the number of memory transfers in each step. Let M T (N )
be the total number of memory transfers.
1. free
2. a scan, O(N/B + 1)
3. MT(N/5), plus a pre-processing step that coalesces the N/5 column medians into a consecutive array, another O(N/B + 1) scan
4. a scan, O(N/B + 1)
5. MT(7N/10)
In total, MT(N) = MT(N/5) + MT(7N/10) + O(N/B + 1).
Solving the recursion requires setting a base case. An obvious base case is MT(O(1)) = O(1), but we can use a stronger one: MT(O(B)) = O(1). Using this base case, the recursion solves to MT(N) = O(N/B + 1). (Intuition: the cost per level of the recursion decreases geometrically, so the cost at the root dominates.)
5 Matrix Multiplication
For N × N matrices, divide and conquer on 2 × 2 blocks of submatrices gives
MT(N) = 8 MT(N/2) + O(N²/B + 1)
The first term is the 8 recursive sub-matrix multiplications, and the second term is the matrix additions, which require scanning the matrices.
Again, we can use different base cases: MT(O(1)) = O(1), MT(O(B)) = O(1), or MT(c√M) = O(M/B) for a suitable constant c. The third case represents the situation where the three involved matrices fit in the cache together; to multiply them, we only need one scan to load them all into cache.
If we draw the recursion tree, the cost per level is geometrically increasing this time: N²/B, 8(N/2)²/B, 8²(N/4)²/B, . . . . Therefore the cost at the leaves dominates, and the total cost = (cost per leaf) · (number of leaves):
MT(N) = O(M/B) · 8^{lg(N/√M)} = O(M/B) · O((N/√M)³) = O(N³/(B√M)).
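A sketch of the 8-way divide & conquer in Python (lists of lists, N a power of 2; real cache-oblivious code would stop recursing at a larger block and use a recursive matrix layout):

def mat_mult(A, B, C=None):
    # C += A * B by splitting each N x N matrix into four N/2 x N/2 quadrants:
    # 8 recursive multiplications plus additions (the O(N^2/B) scans).
    n = len(A)
    if C is None:
        C = [[0] * n for _ in range(n)]
    if n == 1:
        C[0][0] += A[0][0] * B[0][0]
        return C
    h = n // 2
    quad = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):             # C_ij += A_ik * B_kj
                P = mat_mult(quad(A, i * h, k * h), quad(B, k * h, j * h))
                for r in range(h):
                    for c in range(h):
                        C[i * h + r][j * h + c] += P[r][c]
    return C

assert mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]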
Lecture 24 Cache-Oblivious II 6.046J Spring 2015
• Search
– binary
– B-ary
– cache-oblivious
• Sorting
– mergesorts
– cache-oblivious
• partition the block access sequence into maximal phases of M/B distinct blocks
• OPT (with a cache of size M/2, for comparison) must spend ≥ M/(2B) memory transfers per phase: at best, it starts the phase with its entire M/2 cache filled with needed items, but the phase touches M/B distinct blocks, so at most half of them come for free
Search
Preprocess n elements in comparison model to support predecessor search for x.
B-trees
They support predecessor (and insert and delete) in O(log_{B+1} N) memory transfers.
• height = Θ(log_B N)
• need to know B
Binary search
Approximately, every iteration visits a different block until we are in x’s block. Thus,
M T (N ) = Θ(log N − log B) = Θ(log(N/B)). SLOW
[Figure: van Emde Boas layout — a tree of height lg N is cut at the middle level into a top tree of height (1/2) lg N and √N bottom trees, each of height (1/2) lg N and each stored consecutively.]
• like block matrix multiplication, order of pieces doesn’t matter; just need each
piece to be stored consecutively
• this generalizes to heights that are not powers of 2, to B-trees of constant branching factor, and to dynamic B-trees: O(log_B N) insert/delete [Bender, Demaine, Farach-Colton 2000]
Sorting
B-trees
N inserts into a (cache-oblivious) B-tree =⇒ MT(N) = Θ(N log_B N). NOT OPTIMAL. (By contrast, BST sort's O(N lg N) comparisons are optimal in the comparison model.)
Binary mergesort
• binary mergesort is cache-oblivious.
=⇒ MT(N) = 2 MT(N/2) + O(N/B + 1), with MT(M) = O(M/B)
• the recursion tree has lg(N/M) levels, and each level contributes O(N/B)
=⇒ MT(N) = (N/B) lg(N/M). ← a factor of about B/lg B faster than the B-tree version discussed earlier!
M/B-way mergesort
• split array into M/B equal subarrays
• merge via M/B parallel scans (keeping one “current” block per list)
=⇒ MT(N) = (M/B) · MT(N/(M/B)) + O(N/B + 1)
MT(M) = O(M/B)
=⇒ the height of the recursion becomes
log_{M/B}(N/M) + 1
= log_{M/B}((N/B) · (B/M)) + 1
= log_{M/B}(N/B) − log_{M/B}(M/B) + 1
= log_{M/B}(N/B)
=⇒ MT(N) = O((N/B) log_{M/B}(N/B))
This is asymptotically optimal, in the comparison model.
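A compact simulation of K-way mergesort in Python (K plays the role of M/B; heapq.merge stands in for the K parallel scans, keeping one "current" element per run):

import heapq

def multiway_mergesort(A, K):
    # Assumes K >= 2 so the recursion shrinks.
    if len(A) <= 1:
        return list(A)
    step = -(-len(A) // K)               # ceil(len(A) / K)
    runs = [multiway_mergesort(A[i:i + step], K)
            for i in range(0, len(A), step)]
    return list(heapq.merge(*runs))      # merge all runs simultaneously

assert multiway_mergesort([5, 3, 8, 1, 9, 2], K=3) == [1, 2, 3, 5, 8, 9]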
Cache-oblivious Sorting
This requires the tall-cache assumption: M = Ω(B^{1+ε}) for some fixed ε > 0, e.g., M = Ω(B²) or M/B = Ω(B).
Then ≈ N^ε-way mergesort with recursive (“funnel”) merge works.
Priority Queues
• O((1/B) log_{M/B}(N/B)) per insert or delete-min
• generalizes sorting
• see 6.851
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.