Cormen 2nd Edition Solutions
wax@alum.mit.edu
© Text copyright 2014
John L. Weatherwax
All Rights Reserved
Please Do Not Redistribute Without Permission from the Author
To my family.
Acknowledgments
Special thanks to Avi Cohenal for finding typos in these notes. All comments were (and are)
much appreciated.
Exercise 1.2-2
For this exercise we want to determine the smallest value of n such that
T_merge sort(n) = 65 n lg(n) < 8 n^2 = T_insertion sort(n) .
We can find such a value of n by plotting both sides of this inequality and observing where
the two curves cross. We do that in the following R code

ns = 5:80
t_merge = 65 * ns * log( ns, base=2 )
t_insert = 8 * ns^2
plot( ns, t_merge, type="l", col="green", xlab="n", ylab="time",
      ylim=c(min(c(t_merge,t_insert)), max(c(t_merge,t_insert))) )
Figure 1: A comparison between the time to run merge sort and insertion sort on an
input of size n.
lines( ns, t_insert, col="red" )
grid()
legend( 10, 50000, c("merge", "insertion"), col=c("green","red"), lty=c(1,1) )
Running the above code we get Figure 1. From that figure it looks like the two curves cross
when n ≈ 45. Thus insertion sort is actually faster than merge sort when n is smaller than
this value. For larger values of n we see that merge sort runs in less time than insertion sort.
Exercise 1.2-3
For this we want to determine the smallest value of n such that
100 n^2 < 2^n .
We can do a plot like in the previous problem to determine this in the following R code

ns = 1:17
t_left = 100 * ns^2
t_right = 2^ns
plot( ns, t_left, type="l", col="green", xlab="n", ylab="time",
      ylim=c(min(c(t_left,t_right)), max(c(t_left,t_right))) )
lines( ns, t_right, col="red" )
grid()
legend( 1, 120000, c("quadratic growth", "exponential growth"),
        col=c("green","red"), lty=c(1,1) )
When we run the above code we get Figure 2. In that plot we see that after an initial section
(where n is small) the value of 2^n becomes much larger than the quadratic competitor.
Figure 2: A comparison between the quadratic growth of 100 n^2 and the exponential growth of 2^n.
Function   1 second   1 minute   1 hour    1 day     1 month   1 year    1 century
lg(n)      ∞          ∞          ∞         ∞         ∞         ∞         ∞
√n         O(10^12)   O(10^15)   O(10^19)  O(10^21)  O(10^24)  O(10^29)  O(10^33)
n          O(10^6)    O(10^7)    O(10^9)   O(10^10)  O(10^12)  O(10^14)  O(10^16)
n lg(n)    O(10^4)    O(10^6)    O(10^8)   O(10^9)   O(10^11)  O(10^13)  O(10^15)
n^2        O(10^3)    O(10^3)    O(10^4)   O(10^5)   O(10^6)   O(10^7)   O(10^8)
n^3        100        400        1500      4000      10000     100000    500000
2^n        20         26         32        36        41        49        56
n!         9          11         13        14        15        17        19

Table 1: The largest problem size n (approximate) that is solvable by the time in the column
heading when the time to solve a problem of size n takes f(n) microseconds. Here the value
of ∞ represents numerical overflow when trying to invert the given function.
Using these functions we can compute the largest value of n for which a problem can be solved
in the given time using the following R code

# the time we can spend solving (in seconds)
solve_time = c( 1, 60, 60*60, 24*3600, 30*24*3600, 365*30*24*3600, 100*365*30*24*3600 )
# convert to microseconds
solve_time_ms = solve_time / 10^(-6)
# Here we invert each function (invert_nlogn and invert_nfactorial are helper
# routines that numerically invert n*lg(n) and n!):
data = c( exp( solve_time_ms ),
          solve_time_ms^2,
          solve_time_ms,
          sapply( solve_time_ms, invert_nlogn ),
          sqrt( solve_time_ms ),
          solve_time_ms^(1/3),
          log( solve_time_ms, base=2 ),
          sapply( solve_time_ms, invert_nfactorial ) )
If we put the numbers we get from the above R code into a table we get Table 1. Notice
that if we have a fixed time budget (i.e. we are looking at a particular column) then as we
move downwards through the rows, the size of the problem we can solve in that time drops
drastically.
Exercise 2.1-2
To change the INSERTION-SORT routine to sort the numbers in decreasing order we
change the line that says
while i > 0 and A[i] > key
to
while i > 0 and A[i] < key
then we will be searching linearly through the sorted list to find the position where A[i] >= key
and that inequality signals that the value of key should go at position A[i+1].
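As a concrete check, here is a small Python sketch of insertion sort with the flipped comparison (0-based indices; the function name is mine, not from the text):

```python
def insertion_sort_decreasing(A):
    """Insertion sort into decreasing order: the usual algorithm with
    the comparison A[i] > key flipped to A[i] < key."""
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        # scan left while entries are *smaller* than key, shifting them right
        while i >= 0 and A[i] < key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
    return A

print(insertion_sort_decreasing([5, 2, 4, 6, 1, 3]))  # [6, 5, 4, 3, 2, 1]
```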
A      0   0   1   1
B      0   1   0   1
carry  0   0   0   1
sum    0   1   1   0

Table 2: The sum of the binary digits A and B (with a possible carry bit)
Exercise 2.1-3 (linear search)
Pseudocode for this procedure is given as follows.
Linear-Search(A, v)
1  for i ← 1 to length[A]
2      do if A[i] == v
3            then return i
4  return nil
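A direct Python translation of this pseudocode (0-based, returning None in place of nil) might be:

```python
def linear_search(A, v):
    """Return the (0-based) index of the first element equal to v, or None."""
    for i in range(len(A)):
        if A[i] == v:
            return i
    return None

print(linear_search([31, 41, 59, 26], 59))  # 2
print(linear_search([31, 41, 59, 26], 7))   # None
```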
Binary-Add(A, B, C)
 1  carry ← 0
 2  for i ← 1 to length[A]
 3      do ▷ perform the binary addition A[i] + B[i] + carry
 4         ▷ C[i] gets the least significant bit of this sum
 5         C[i] ← (A[i] + B[i] + carry) mod 2
 6         ▷ now update the carry for the next position
 7         if carry == 0
 8            then if A[i] == 1 and B[i] == 1
 9                    then carry ← 1
10            else if A[i] == 0 and B[i] == 0
11                    then carry ← 0
12  ▷ update the last element of C
13  if carry == 1
14     then C[length[A] + 1] ← carry
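The same bit-by-bit addition can be sketched in Python; here I compute the sum bit and the new carry arithmetically, which is equivalent to the case analysis in the pseudocode above:

```python
def binary_add(A, B):
    """Add two n-bit binary numbers stored least-significant-bit first,
    returning an (n+1)-bit result."""
    n = len(A)
    C = [0] * (n + 1)
    carry = 0
    for i in range(n):
        s = A[i] + B[i] + carry   # 0, 1, 2, or 3
        C[i] = s % 2              # least significant bit of the sum
        carry = s // 2            # carry into the next position
    C[n] = carry
    return C

# 6 + 3 = 9: 6 = [0,1,1] (LSB first), 3 = [1,1,0]
print(binary_add([0, 1, 1], [1, 1, 0]))  # [1, 0, 0, 1], i.e. 9
```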
Exercise 2.2-1
The given function is Θ(n^3).
Exercise 2.2-2 (selection sort)

Replacing that 41 with x and finding the next smallest number to be the other 41 we get
the algorithmic step

x, x, 59, x, 41, 58 → 26, 31, 41, 41

Next we find 58 to have

x, x, 59, x, x, 58 → 26, 31, 41, 41, 58

Finally we find 59 to get

x, x, 59, x, x, x → 26, 31, 41, 41, 58, 59
On the right-hand-side we now have a sorted list obtained from the original list A.
Having done an example of how this algorithm works we now present pseudocode for its
implementation
Selection-Sort(A)
 1  for i ← 1 to length[A] − 1
 2      do ▷ find the smallest element in the array from A[i] to A[length[A]]
 3         sindex ← i
 4         for j ← i + 1 to length[A]
 5             do if A[j] < A[sindex]
 6                   then sindex ← j
 7         ▷ swap elements A[i] and A[sindex]
 8         tmp ← A[i]
 9         A[i] ← A[sindex]
10         A[sindex] ← tmp
The invariant of this code is that at each iteration of the outer loop all elements in the left
part of the array A[1 . . i − 1] will be the sorted i − 1 smallest elements from A. We only
need to loop over the first n − 1 elements since if we sort this many elements the last (and
largest) element must be found at the last location of the array A[n] and the full array A is
sorted.
The best-case and worst-case performance for this algorithm are the same since the above
loops are performed regardless of what elements are needed to be exchanged. Another way
to say this is to note that we do not know a priori that one does not need to search all
elements of the subarray A[i + 1 . . . n] and thus have to do the work regardless of the input
array.
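A Python sketch of the selection sort pseudocode (0-based indices) is:

```python
def selection_sort(A):
    """Selection sort: repeatedly find the smallest element of A[i:]
    and swap it into position i."""
    n = len(A)
    for i in range(n - 1):            # the last element falls into place
        sindex = i
        for j in range(i + 1, n):
            if A[j] < A[sindex]:
                sindex = j
        A[i], A[sindex] = A[sindex], A[i]
    return A

print(selection_sort([26, 41, 59, 31, 41, 58]))  # [26, 31, 41, 41, 58, 59]
```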
Note that in the inner minimization loop we must search over a list of length n − i and we do
this for i starting from 1 to n − 1. Thus the time complexity of this procedure is dominated by

C sum_{i=1}^{n-1} (n − i) = C n(n − 1)/2 + lower order terms = Θ(n^2) .
Exercise 2.2-3
Without any special assumptions on the input array A I would say that on average n/2
elements would need to be searched to find a given element. This can be argued by assuming
that on average 1/2 of the elements in A will be larger than v and 1/2 of the elements in
A will be smaller than v. Thus on average n/2 elements would have to be searched. The
best case for this algorithm is when the search key v is the first element of the array. The
worst case for this algorithm is when v is the last element of the array. In all cases we have
T (n) = (n).
Exercise 2.2-4
We could modify our algorithm to check if the input is equal to a specially chosen input
and then have the algorithm return a precomputed result. This would really require no
algorithmic work and thus the best case performance of this algorithm (on these special
cases) is extremely fast.
Exercise 2.3-1
The given array starts as
< 3 , 41 , 52 , 26 , 38 , 57 , 9 , 49 >
We first break this array into two parts
< 3 , 41 , 52 , 26 > and < 38 , 57 , 9 , 49 >
Each of these arrays is then further broken up into two parts (each with only two elements)
< 3 , 41 > and < 52 , 26 > and < 38 , 57 > and < 9 , 49 >
We would next break each pair of elements into eight single-element lists which we would
then have to combine with the Merge command. This results in each two-element list being
sorted and we get
< 3 , 41 > and < 26 , 52 > and < 38 , 57 > and < 9 , 49 >
Next we merge these groups of two elements into two groups each with four elements to get
< 3 , 26 , 41 , 52 > and < 9 , 38 , 49 , 57 >
Finally we merge these two sorted groups of four elements into one fully sorted array to get
< 3 , 9 , 26 , 38 , 41 , 49 , 52 , 57 >
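The whole recursion can be checked with a short Python merge sort sketch (helper names are mine) run on the array from this exercise:

```python
def merge_sort(A):
    """Top-down merge sort; returns a new sorted list."""
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left, right = merge_sort(A[:mid]), merge_sort(A[mid:])
    # merge the two sorted halves
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([3, 41, 52, 26, 38, 57, 9, 49]))
# [3, 9, 26, 38, 41, 49, 52, 57]
```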
Exercise 2.3-2
Recall that the procedure Merge(A, p, q, r) merges two sorted subarrays of A namely A(p, q)
and A(q + 1, r) into a single sorted list. To start the elements from A(p, q) and A(q + 1, r)
are copied into two working arrays L (for left) and R (for right). This part of the algorithm
is the same as that given in the text. Once we have our data in these two working arrays we
can proceed to copy it without using sentinels. The pseudocode for this entire procedure is
then given by
Merge(A, p, q, r)
 1  n1 ← q − p + 1   ▷ the number of elements in the left array
 2  n2 ← r − q       ▷ the number of elements in the right array
 3  create arrays L[1 . . n1] and R[1 . . n2]
 4  for i ← 1 to n1
 5      do L[i] ← A[p + i − 1]
 6  for j ← 1 to n2
 7      do R[j] ← A[q + j]
 8  i ← 1   ▷ holds the index into L
 9  j ← 1   ▷ holds the index into R
10  k ← p   ▷ holds the index into A
11  while i ≤ n1 and j ≤ n2
12      do if L[i] ≤ R[j]
13            then A[k] ← L[i]
14                 i ← i + 1
15            else A[k] ← R[j]
16                 j ← j + 1
17         k ← k + 1
18  ▷ at this point one of L or R is empty
19  while i ≤ n1   ▷ never executed if the left array runs out first
20      do A[k] ← L[i]
21         i ← i + 1
22         k ← k + 1
23  while j ≤ n2   ▷ never executed if the right array runs out first
24      do A[k] ← R[j]
25         j ← j + 1
26         k ← k + 1
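A Python sketch of this sentinel-free merge (0-based, inclusive indices) might be:

```python
def merge(A, p, q, r):
    """Merge the sorted ranges A[p..q] and A[q+1..r] (inclusive, 0-based)
    in place, without sentinels."""
    L = A[p:q + 1]       # left working array
    R = A[q + 1:r + 1]   # right working array
    i = j = 0
    k = p
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1
        k += 1
    # exactly one of these loops runs: copy the leftover elements
    while i < len(L):
        A[k] = L[i]; i += 1; k += 1
    while j < len(R):
        A[k] = R[j]; j += 1; k += 1

A = [2, 4, 5, 7, 1, 2, 3, 6]
merge(A, 0, 3, 7)
print(A)  # [1, 2, 2, 3, 4, 5, 6, 7]
```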
Exercise 2.3-3
We want to show that for the given recurrence relationship the solution is

T(n) = n lg(n) ,
when n is a power of two. Let's check that the proposed solution is true for the first power
of two, i.e. when n = 2. In this case we have T(2) = 2 lg(2) = 2 which is correct.
Next, as needed for mathematical induction, assume that T(n) = n lg(n) for n a power of
two up to some point, that is for n ∈ {2^1, 2^2, 2^3, . . . , 2^{k−1}, 2^k}. Then for the next
power of two, n = 2^{k+1}, the recurrence T(n) = 2 T(n/2) + n gives

T(2^{k+1}) = 2 T(2^k) + 2^{k+1} = 2 (2^k lg(2^k)) + 2^{k+1} = 2^{k+1} (k + 1) = 2^{k+1} lg(2^{k+1}) ,

so the proposed solution holds for all powers of two by induction.
Exercise 2.3-4
Some very simple pseudocode for the suggested procedure might be

Recursive-Insertion-Sort(A, n)
1  if n > 1
2     then Recursive-Insertion-Sort(A, n − 1)
3  insert A[n] into the sorted list A[1 . . n − 1]
This is enough to see that if T (n) represents the amount of time it takes to run this code on
an array of length n we have that
T(n) = T(n − 1) + Θ(n − 1) ,

since to do the insertion of A(n) into A[1 . . n − 1] we must process at most n elements.
Note we could use a bisection based search method to find the location where A(n) should
be placed in the array A[1 . . n − 1] which would require Θ(lg(n)) work. Once this location is
found we still need to move the elements of A to the right to insert it. This insertion
process will require Θ(n) work.
The above difference equation is one that we can solve using methods presented later in the
text. However, since this equation is so simple there are other methods one can use to solve
for T(n). Using methods from [1] we can write it as

∆T(n) = Θ(n) ,

so that summing (the reverse of differencing in this case) both sides we get T(n) = Θ(n^2). This
book will present a general method for solving recurrences of this form in later chapters.
As an aside, pseudocode of the procedure Recursive-Insertion-Sort might be
Recursive-Insertion-Sort(A, n)
1  if n > 1
2     then Recursive-Insertion-Sort(A, n − 1)
3  key ← A[n]
4  i ← n − 1
5  ▷ shift elements of A to the right to make space for A[n]
6  while i > 0 and A[i] > key
7      do A[i + 1] ← A[i]
8         i ← i − 1
9  A[i + 1] ← key
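In Python (0-based, with the base case made explicit) this might look like:

```python
def recursive_insertion_sort(A, n):
    """Sort A[0..n-1] in place by recursively sorting A[0..n-2] and then
    inserting A[n-1]; call with n = len(A)."""
    if n <= 1:
        return                      # base case: one element is already sorted
    recursive_insertion_sort(A, n - 1)
    key = A[n - 1]
    i = n - 2
    while i >= 0 and A[i] > key:    # shift larger elements right
        A[i + 1] = A[i]
        i -= 1
    A[i + 1] = key

A = [5, 2, 4, 6, 1, 3]
recursive_insertion_sort(A, len(A))
print(A)  # [1, 2, 3, 4, 5, 6]
```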
Exercise 2.3-5
To start this problem we will write pseudocode for iterative binary search. In that procedure
we assume that our input array A is sorted and we are given an element v to search for. In
the pseudocode below, we will take p and q to be indexes into A such that we want to search
for v from all elements between A[p] and A[q] inclusive. Our pseudocode for iterative binary
search is then given by
Iterative-Binary-Search(A, p, q, v)
1  while q − p ≥ 0
2      do m ← ⌊(p + q)/2⌋
3         if A[m] == v
4            then return m
5         if A[m] > v
6            then q ← m − 1
7            else p ← m + 1
8  return nil
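A Python sketch of the iterative procedure (0-based, with None for nil):

```python
def iterative_binary_search(A, v):
    """Binary search in a sorted list; returns an index of v or None."""
    p, q = 0, len(A) - 1
    while q - p >= 0:               # i.e. while the range [p, q] is nonempty
        m = (p + q) // 2
        if A[m] == v:
            return m
        if A[m] > v:
            q = m - 1               # search the left half
        else:
            p = m + 1               # search the right half
    return None

A = [3, 9, 26, 38, 41, 49, 52, 57]
print(iterative_binary_search(A, 41))  # 4
print(iterative_binary_search(A, 40))  # None
```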
Note that the worst case running time of this algorithm happens when we have to halve the
array the largest possible number of times, that is lg(n) times; thus we have T(n) = Θ(lg(n)).
We now present pseudocode for recursive binary search.
Recursive-Binary-Search(A, p, q, v)
1  if q − p < 0
2     then return nil
3  m ← ⌊(p + q)/2⌋
4  if A[m] == v
5     then return m
6  if A[m] > v
7     then return Recursive-Binary-Search(A, p, m − 1, v)
8     else return Recursive-Binary-Search(A, m + 1, q, v)
From the recursive search pseudocode we have that

T(n) = T(n/2) + Θ(1) .

The solution to this recurrence relationship (to be demonstrated later) is T(n) = Θ(lg(n)),
the same as we got in the earlier argument.
Note that we would initially call each procedure with the arguments (A, 1, n, v).
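The recursive version translates similarly (again 0-based with None for nil):

```python
def recursive_binary_search(A, p, q, v):
    """Recursive binary search over A[p..q] (inclusive); initial call is
    recursive_binary_search(A, 0, len(A) - 1, v)."""
    if q - p < 0:                   # empty range: v is not present
        return None
    m = (p + q) // 2
    if A[m] == v:
        return m
    if A[m] > v:
        return recursive_binary_search(A, p, m - 1, v)
    return recursive_binary_search(A, m + 1, q, v)

A = [3, 9, 26, 38, 41, 49, 52, 57]
print(recursive_binary_search(A, 0, len(A) - 1, 9))   # 1
print(recursive_binary_search(A, 0, len(A) - 1, 10))  # None
```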
Exercise 2.3-6
The answer to this question is no. Even if we could obtain the location of the index i in the
array A where we would need to place the element A[j] at no computational cost, we still
must move all the elements of A to insert A[j] into the array A[1 . . j]. This shuffling of
the elements requires at most Θ(j) work. Thus the worst case complexity of insertion sort is
not changed from Θ(n^2) by using binary search. While the above is a worst case complexity
argument note that in practice this change would improve the performance of insertion sort.
As there are better algorithms for sorting than insertion sort this observation is not that
useful however.
Exercise 2.3-7
To start this problem, note that exhaustive search (over all pairs of elements of S) would
require

(n choose 2) = n! / ((n − 2)! 2!) = n(n − 1)/2 = Θ(n^2)

amount of work and thus is much more costly than the procedure we describe next.
To start, we will sort the set S dropping any duplicate elements. Using merge sort the
complexity of this action is Θ(n lg(n)). To derive the remainder of our algorithm we will take
note of the following two facts. If we have two numbers a and b such that a < x/2 and
b < x/2, then we will have that

a + b < x .

In the same way, if a > x/2 and b > x/2, then

a + b > x .

Thus the point x/2 determines the location in our sorted list such that if there are two numbers
that sum to x then one of them must be smaller than x/2 and the other one must be larger
than x/2. The argument above shows that if that was not true then the two numbers would
sum to a value that was less than x or to a value that was greater than x. Find the location
where x/2 would be placed in the sorted list of original numbers. This location splits the
sorted list into two smaller sorted lists A and B where all elements in A are less than x/2 and
all elements of B are greater than x/2. If two elements sum to x one must come from A and
another from B. In fact if we take each value of a ∈ A and compute a target of b = x − a
then if we find that b ∈ B we are done. In the same way if we take each value of b ∈ B and
compute a′ = x − b then if a′ ∈ A we are done.
To turn these observations into an algorithm we do the following. For each value of a ∈ A
we compute the value x − a (there are |A| of these numbers) and for each value of b ∈ B
we compute the value x − b (there are |B| of these numbers). All of this work then costs
|A| + |B| = Θ(n).
As argued above, if we have two values a and b that sum to x then one of these numbers
(formed from x − a or x − b) that we just computed must also be in the original set S. To
see if this is true we form a new list of length 2n consisting of the original set S and all of
the numbers we just computed. We then sort this list which costs

Θ(2n lg(2n)) = Θ(2n(lg(n) + lg(2))) = Θ(n lg(n))

work. We can then walk this sorted list looking for any adjacent duplicate elements. If we
find any, we have that there must be two elements that sum to x. The total complexity of
this procedure is Θ(n lg(n) + n) = Θ(n lg(n)).
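A Python sketch of the overall Θ(n lg(n)) idea; here a binary search for each complement x − a stands in for the concatenate-sort-and-scan duplicate test described above, with the same asymptotic cost (names are mine):

```python
from bisect import bisect_left

def has_pair_summing_to(S, x):
    """Theta(n lg n) test for two distinct elements of S summing to x:
    sort S (dropping duplicates), then binary-search for each complement."""
    A = sorted(set(S))                  # sort, dropping duplicate elements
    for idx, s in enumerate(A):
        t = x - s
        j = bisect_left(A, t)           # binary search for the complement
        if j < len(A) and A[j] == t and j != idx:
            return True                 # j != idx rules out using s twice
    return False

print(has_pair_summing_to([8, 1, 5, 11, 7], 12))  # True (1 + 11 or 5 + 7)
print(has_pair_summing_to([8, 1, 5, 11, 7], 20))  # False
```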
Problem 2-1 (insertion sort on small arrays in merge sort)

Part (a): We start by sorting each of the n/k lists using insertion sort. As insertion sort
requires Θ(k^2) work and we have n/k lists to sort, the total amount of work required is

(n/k) Θ(k^2) = Θ(nk) .
Part (b): One way (not optimal) to merge these n/k smaller lists into one complete list would
be the following. As each of the n/k lists is sorted (say in ascending order) we first select the
single smallest value from the set of the first values in each smaller list. This can be done in
n/k work (as we scan a single value in each of the smaller arrays) and then copy the smallest
of these to our output array. We then compare another n/k values, find the smallest, and copy
it to the output. As we must do this once for each of the n output elements, we require

Θ(n · n/k) = Θ(n^2/k)

work under this algorithm.
A better algorithm would be to perform merges of adjacent pairs of smaller sublists. The
first step of this algorithm will merge pairs of lists (each of size k) and produce sorted lists
of size 2k. Each merge requires Θ(2k) work and there are n/(2k) pairs to merge giving a total
amount of work in the first step of

Θ(2k) · n/(2k) = Θ(n) .

Merging these smaller lists of size 2k with n/(4k) pairs gives a total amount of work of

Θ(2(2k)) · n/(4k) = Θ(n) ,

again. Continuing this argument a third time we again get Θ(n) amount of work. Thus at
each step we need to perform Θ(n) amount of work. We have to perform lg(n/k) of these steps
to get the final sorted list of length n. Thus the total work under this algorithm is

Θ(n lg(n/k)) .
Part (c): The combined algorithm then must do an amount of work given by

Θ(nk + n lg(n/k)) .

Recall that directly applying merge sort on a list of length n takes Θ(n lg(n)). Thus we want
to pick k such that

nk + n lg(n/k) = Θ(n lg(n)) .

To get an understanding of what k might need to be we can divide by n on both sides to get

k + lg(n/k) = Θ(lg(n)) .

In the above expression, to leading order, we must take

k = Θ(lg(n)) .
Part (d): In practice we could run timing tests to see how large k could be to have insertion
sort be faster than merge sort.
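A Python sketch of the combined algorithm from parts (a) and (b) (all names are mine; k = 8 is an arbitrary illustrative cutoff):

```python
def insertion_sort_range(A, lo, hi):
    """Insertion sort A[lo..hi] (inclusive) in place."""
    for j in range(lo + 1, hi + 1):
        key, i = A[j], j - 1
        while i >= lo and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key

def hybrid_sort(A, k=8):
    """Sort A in place: insertion sort on blocks of length k (part (a)),
    then bottom-up merging of adjacent runs (part (b));
    Theta(nk + n lg(n/k)) overall."""
    n = len(A)
    for lo in range(0, n, k):
        insertion_sort_range(A, lo, min(lo + k, n) - 1)
    w = k
    while w < n:
        for lo in range(0, n, 2 * w):
            mid, hi = min(lo + w, n), min(lo + 2 * w, n)
            left, right = A[lo:mid], A[mid:hi]
            i = j = 0
            for t in range(lo, hi):
                # take from left when right is exhausted or left is smaller
                if j >= len(right) or (i < len(left) and left[i] <= right[j]):
                    A[t] = left[i]; i += 1
                else:
                    A[t] = right[j]; j += 1
        w *= 2
    return A

data = [38, 27, 43, 3, 9, 82, 10, 5, 1, 77, 64, 2]
print(hybrid_sort(list(data), k=4))
```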
Problem 2-2 (correctness of bubblesort)

Part (b): Let the loop invariant for the inner loop be the following. At the beginning and
end of the inner loop we will have A[j] to be the smallest element from all those that come
after it. That is A[j] = min{A[i] : j ≤ i ≤ n}. We need to show that this loop invariant is
satisfied at the start of the loop and is true at the end of the loop.
At the start of the loop j = n = length[A] and so A[j] is trivially the minimum of the set
with only one element {A[n]}.
For a general index value j, if there were an element A[i] smaller than A[j] for j < i ≤ n
then it would have been exchanged at some point in the loop and would have come before
A[j]. At the completion of this loop we will have the smallest value at location j and the
loop invariant holds.
Part (c): Let the loop invariant for the outer loop be the following. At index i the elements
in A[1 . . . i 1] are the smallest elements from A and are in sorted order.
Before the first iteration we have i = 1 and the loop invariant trivially holds.
Consider iteration i of the outer loop. Then from the loop invariant on the previous outer
loop iteration we have that A[1 . . . i 1] holds the sorted smallest i 1 elements from A.
From the loop invariant for the inner loop when it ends we have that A[i] is the smallest
value from the remaining elements of A. It stands to reason that A[1 . . . i] is the sorted
smallest i elements from A and the loop invariant is true at the end of the outer loop.
Part (d): To determine the worst-case running time for bubblesort note that the code that
is executed inside the two nested loops takes Θ(1) time. The inner loop has this code
executed n − (i + 1) + 1 = n − i times for each value of i in the outer loop. The
outer loop runs for i = 1 to i = n. Thus the total work done is

sum_{i=1}^{n} (n − i) = sum_{i=1}^{n−1} i = Θ(n^2) .
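A Python sketch of bubblesort with the loop structure analyzed above (0-based):

```python
def bubblesort(A):
    """Bubblesort: the inner loop sweeps from the end of the array toward
    position i, bubbling the smallest remaining element down into A[i]."""
    n = len(A)
    for i in range(n - 1):
        for j in range(n - 1, i, -1):
            if A[j] < A[j - 1]:
                A[j], A[j - 1] = A[j - 1], A[j]
    return A

print(bubblesort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```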
Problem 2-3 (correctness of Horner's rule)

Part (b): Here we want to evaluate the polynomial

sum_{k=0}^{n} a_k x^k

directly. This means that as a subroutine we need to compute the terms a_k x^k. We do this
in the following pseudocode
Naive-Polynomial-Evaluation(a, x)
 1  ▷ the polynomial coefficients are held in the array a such that a_0 = a[1], a_1 = a[2], etc.
 2  y ← 0
 3  k ← 0
 4  while k ≤ n
 5      do ▷ compute the term a_k x^k
 6         i ← 1
 7         term ← a[k + 1]
 8         while i ≤ k
 9             do term ← term · x   ▷ multiply by x
10                i ← i + 1
11         y ← y + term
12         k ← k + 1
13  return y
The innermost while loop takes Θ(1) work per pass and is executed k times. The outermost
loop then takes

T(n) = sum_{k=0}^{n} Θ(k) = Θ(n(n + 1)/2) = Θ(n^2) .
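A Python sketch of the naive evaluation (0-based coefficient array, so a[k] is the coefficient of x^k):

```python
def naive_poly_eval(a, x):
    """Evaluate sum_k a[k] * x**k by computing each term with its own
    multiplication loop: Theta(n^2) multiplications in total."""
    y = 0
    for k in range(len(a)):
        term = a[k]
        for _ in range(k):      # multiply by x a total of k times
            term = term * x
        y = y + term
    return y

# 2 + 3x + x^2 at x = 4 -> 2 + 12 + 16 = 30
print(naive_poly_eval([2, 3, 1], 4))  # 30
```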
Part (c-d): To show that the given expression for y is a loop invariant we must show that
it is true before the loop starts and after the loop ends. At the start of the loop i = n and
the value of the proposed loop invariant y is zero by the convention of a summation with no
terms (since the upper limit of the sum is −1). Assuming the loop invariant is true at the
top of the loop we can write it as

y = sum_{k=0}^{n−(i+1)} a_{k+i+1} x^k .

Using this we see that the value of y at the end of the loop body is given by

y ← a_i + x y = a_i + sum_{k=0}^{n−(i+1)} a_{k+i+1} x^{k+1} = sum_{k=0}^{n−i} a_{k+i} x^k ,
which shows that the loop invariant holds at the end of the loop. Thus at the top of the last
iteration of the loop we have i = 0 where y is given by

y = sum_{k=0}^{n−1} a_{k+1} x^k ,
so after this last iteration (which computes y ← a_0 + x y) we have

y = a_0 + sum_{k=0}^{n−1} a_{k+1} x^{k+1} = sum_{k=0}^{n} a_k x^k ,

which is the desired polynomial.
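For comparison, a Python sketch of Horner's rule itself, whose single loop performs the update y ← a_i + x·y analyzed above:

```python
def horner_eval(a, x):
    """Horner's rule: y = a_0 + x(a_1 + x(a_2 + ...)); one multiply and one
    add per coefficient, Theta(n). a[k] is the coefficient of x**k."""
    y = 0
    for i in range(len(a) - 1, -1, -1):   # i = n, n-1, ..., 0
        y = a[i] + x * y                  # the update analyzed above
    return y

print(horner_eval([2, 3, 1], 4))  # 30, matching the naive evaluation
```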
Problem 2-4 (inversions)

Part (b): The array with the most inversions is one sorted in decreasing order, in which case
every pair of indices is an inversion and the count is

sum_{k=1}^{n−1} k = sum_{k=1}^{n} k − n = n(n + 1)/2 − n = (n + 1 − 2) n/2 = n(n − 1)/2 = Θ(n^2) .
Part (c): Every inversion must be undone by a step in the insertion sort inner while loop
and thus we expect the running time of insertion sort to be proportional to the number
of inversions.
Part (d): Following the hint given in the book we will attempt to implement a divide-and-conquer
merge sort type algorithm to count the number of inversions in an input array. If we
take the merge sort algorithm as a starting point, then after splitting the array A into
two parts we see that the number of inversions in the original array A will be the number of
inversions found in the left half plus the number of inversions found in the right half plus
the number of inversions involving elements in the left and right halves.
In each of the recursive calls let's assume that our algorithm returns the number of inversions
in the left and right subarrays. We then have to implement a merge like procedure that
counts the number of inversions between the two left and right arrays. The big leap to
understanding this problem is to notice that if the left and right subarrays are returned
sorted (along with the number of inversions each subarray contained) we can easily count
the number of cross inversions when we merge the two arrays.
Assume that we have two sorted arrays L (for left) and R (for right) and we will merge
these two into an output sorted array as in merge sort. In that procedure, if we are placing
an element of L into the output array then this element is no larger than all of the remaining
elements in L and in R. Thus this operation does not reveal
an inversion between elements in L and R. On the other hand, if we place an element from
R into the output array then this element must be smaller than all the remaining unplaced
elements in the L array.
If there are m remaining elements in L we have found m cross inversions. As we continually
place elements from L and R onto the output array each time we place one from R we add
m to the number of cross inversions found. At this point the pseudocode for this procedure
can be written as
Count-Inversions-N-Sort(A, p, r)
1  ▷ returns the number of inversions in (and sorts) A between the two indices p and r
2  if p ≥ r
3     then return 0
4  q ← ⌊(p + r)/2⌋
5  inversions ← Count-Inversions-N-Sort(A, p, q)
6  inversions ← inversions + Count-Inversions-N-Sort(A, q + 1, r)
7  inversions ← inversions + Count-Inversions-N-Merge(A, p, q, r)
8  return inversions
Next we need the following pseudocode (borrowing from the procedure Merge above).
Count-Inversions-N-Merge(A, p, q, r)
 1  n1 ← q − p + 1   ▷ the number of elements in the left array
 2  n2 ← r − q       ▷ the number of elements in the right array
 3  create arrays L[1 . . n1] and R[1 . . n2]
 4  for i ← 1 to n1
 5      do L[i] ← A[p + i − 1]
 6  for j ← 1 to n2
 7      do R[j] ← A[q + j]
 8  i ← 1   ▷ holds the index into L
 9  j ← 1   ▷ holds the index into R
10  k ← p   ▷ holds the index into A
11  inversions ← 0
12  while i ≤ n1 and j ≤ n2
13      do if L[i] ≤ R[j]
14            then A[k] ← L[i]
15                 i ← i + 1
16            else A[k] ← R[j]
17                 j ← j + 1
18                 ▷ add in the cross inversion count
19                 inversions ← inversions + (n1 − i + 1)
20         k ← k + 1
21  ▷ at this point one of L or R is empty
22  while i ≤ n1   ▷ never executed if the left array runs out first
23      do A[k] ← L[i]
24         i ← i + 1
25         k ← k + 1
26  while j ≤ n2   ▷ never executed if the right array runs out first
27      do A[k] ← R[j]
28         j ← j + 1
29         k ← k + 1
30  return inversions
From the above code we see that the running time of this algorithm on an input of size n
can be written as

T(n) = 2 T(n/2) + Θ(n) .

This has the solution T(n) = Θ(n lg(n)) as we were to show.
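A Python sketch of the divide-and-conquer inversion counter (this returns a sorted copy rather than sorting in place, which keeps the sketch short):

```python
def count_inversions(A):
    """Count pairs i < j with A[i] > A[j] by piggybacking on merge sort;
    returns (inversion_count, sorted_copy_of_A) in Theta(n lg n) time."""
    if len(A) <= 1:
        return 0, list(A)
    mid = len(A) // 2
    inv_l, left = count_inversions(A[:mid])
    inv_r, right = count_inversions(A[mid:])
    inv, out, i, j = inv_l + inv_r, [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
            inv += len(left) - i        # cross inversions revealed here
    out += left[i:] + right[j:]
    return inv, out

print(count_inversions([2, 3, 8, 6, 1]))  # (5, [1, 2, 3, 6, 8])
```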
I should also mention that this algorithm, to report the number of inversions in an array with
this complexity, is given as an example of using the divide-and-conquer algorithm design
paradigm, and was presented in two lectures in Tim Roughgarden's Coursera algorithms
class (https://www.coursera.org/course/algo). One of the requirements of that class was to
implement this algorithm. You can find source code for a python implementation of that
algorithm linked to the web site for these notes.
Exercise 3.1-1

For this problem we want to show that max(f(n), g(n)) = Θ(f(n) + g(n)) when f and g are
nonnegative functions. For the lower bound we can take c1 = 1/2, since

(1/2)(f(n) + g(n)) ≤ max(f(n), g(n)) .

To show that this is true assume that f(n) < g(n) for a given n. Then

(1/2)(f(n) + g(n)) ≤ (1/2)(g(n) + g(n)) = g(n) = max(f(n), g(n)) .

The case where f(n) > g(n) is symmetric and gives the same conclusion.
For the value for c2 we can take c2 = 1. Then the statement above is

max(f(n), g(n)) ≤ f(n) + g(n) ,

which is certainly true when f and g are nonnegative functions. Combining these two results
we have the desired expression.
Exercise 3.1-2
We want to show that for any real constants a and b > 0 we have

(n + a)^b = Θ(n^b) .

To show this note that the left-hand-side can be bounded above as

(n + a)^b = n^b (1 + a/n)^b ≤ (1 + a)^b n^b ,

since a/n ≤ a for all n ≥ 1. Thus in the required bound for the Θ notation of

c1 n^b ≤ (n + a)^b ≤ c2 n^b ,    (1)

we see that the constant value (1 + a)^b works for c2. Also from the simple inequality of

1 ≤ 1 + a/n ,

we have

n^b ≤ n^b (1 + a/n)^b = (n + a)^b .

From this we see that in Equation 1 we can take the value of c1 = 1. In summary we have
shown

n^b ≤ (n + a)^b ≤ (1 + a)^b n^b ,

for all n ≥ 1 which is the definition of (n + a)^b = Θ(n^b).
Exercise 3.1-3
The statement that the running time T(n) is O(n^2) means that there exist constants c and
n0 such that T(n) ≤ c n^2 for all n ≥ n0, an upper bound. The statement "at least" implies
that T(n) ≥ g(n) for some function g(n) (here it looks like it should be n^2), a lower bound.
An upper bound of this form says nothing about lower bounds, so the combined statement
carries no meaning.
Exercise 3.1-4
To answer the first question we ask if there are constants c and n0 such that

2^{n+1} ≤ c 2^n ,

for all n ≥ n0. If we take c ≥ 2, say c = 4, then the above becomes

2^{n+1} ≤ 2^{n+2} ,

which is true for all n ≥ 1 and we have that 2^{n+1} = O(2^n).
To answer the second question we ask if there are constants c and n0 such that

2^{2n} ≤ c 2^n ,

for all n ≥ n0. If we divide this expression by 2^n we get

2^n ≤ c ,

which cannot be true for all large n with any fixed constant c, and so 2^{2n} ≠ O(2^n).
Exercise 3.1-5
This is really just a restatement of the Θ() notation in terms of the O() and Ω() notation (and
vice versa) since the expression f(n) = Θ(g(n)) means that

c1 g(n) ≤ f(n) ≤ c2 g(n) .

From this we see that the left inequality is the statement that f(n) = Ω(g(n)) and the right
inequality is the statement that f(n) = O(g(n)).
Exercise 3.1-6
For this problem we want to show that

T(n) = Θ(g(n)) ,

if and only if

Tworst(n) = O(g(n)) and Tbest(n) = Ω(g(n)) .

To show that T(n) = Θ(g(n)) implies the other two expressions, first recall that T(n) = Θ(g(n))
means that

c1 g(n) ≤ T(n) ≤ c2 g(n) ,

for some c1 and c2 and all n ≥ n0, and the above inequality is valid no matter what inputs
are given to the algorithm. Specifically we can take worst and best case inputs, i.e. T(n) =
Tworst(n) and T(n) = Tbest(n), to get

Tworst(n) ≤ c2 g(n)    (2)
Tbest(n) ≥ c1 g(n) .    (3)

The first inequality is the statement that Tworst(n) = O(g(n)) and the second inequality is
the statement that Tbest(n) = Ω(g(n)).
To prove this in the other direction we start with Equations 2 and 3. Then for a general
input which takes time T(n) we have

T(n) ≤ Tworst(n) ≤ c2 g(n) ,

and

T(n) ≥ Tbest(n) ≥ c1 g(n) ,

so by combining these two we have

c1 g(n) ≤ T(n) ≤ c2 g(n) ,

or that T(n) = Θ(g(n)) as we were to show.
Exercise 3.1-7
For this problem we want to show that

o(g(n)) ∩ ω(g(n)) = ∅ .

To show this assume this set is not empty and let f(n) be a function in it. Then as f(n) ∈
o(g(n)) we have that for any c > 0 there exists an n0 > 0 such that f(n) < c g(n) for all
n > n0.
As we also have that f(n) ∈ ω(g(n)), for any c > 0 there exists an n0 > 0 such that
f(n) > c g(n) for all n > n0.
If we choose a value of c that is the same between these two statements and a value of n0 that
is large enough to satisfy each statement we would have that f(n) < c g(n) and f(n) > c g(n)
for n > n0, which is a logical contradiction. Thus no such f(n) can exist and the intersection
of the two sets above must be empty.
Exercise 3.1-8
Following the notation for one-dimensional functions we can define Ω(g(n, m)) to be the set
of all functions f(n, m) such that there exist positive constants c1, n0, and m0 such that

0 ≤ c1 g(n, m) ≤ f(n, m) ,

for all n ≥ n0 and m ≥ m0.
In the same way we will define Θ(g(n, m)) to be the set of all functions f(n, m) such that
there exist positive constants c1, c2, n0, and m0 such that

0 ≤ c1 g(n, m) ≤ f(n, m) ≤ c2 g(n, m) ,

for all n ≥ n0 and m ≥ m0.
Notes on common functions
As this section of the book is a review of mathematical concepts used in the rest of the text,
in this part of the notes I present some of the equations given there in a bit more detail
(providing proofs of some of the statements made), with equation number references that can
be used later in this manual.
We can bound any number x by the nearest two integers using floor and ceiling functions as

x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1 .    (4)
Note that repeated division and then ceiling (flooring) the result is equivalent to a single
ceiling (flooring), as when n is any real number and a and b are positive integers we have

⌈⌈n/a⌉ / b⌉ = ⌈n/(ab)⌉ ,    (5)

and

⌊⌊n/a⌋ / b⌋ = ⌊n/(ab)⌋ .    (6)

Another useful identity is

log_b(a) = 1 / log_a(b) .    (7)

To prove this, note that by converting the left-hand-side to natural logs we have

log_b(a) = ln(a)/ln(b) = 1 / (ln(b)/ln(a)) = 1 / log_a(b) .    (8)
(8)
Next, any polylogarithm grows more slowly than any polynomial. To see this note that for
any a > 1 and real b we have that

lim_{n→+∞} n^b / a^n = 0 .    (9)

If we replace n with lg(n) and a with 2^a in this expression we end up with

lim_{n→+∞} lg^b(n) / (2^a)^{lg(n)} = lim_{n→+∞} lg^b(n) / 2^{a lg(n)} = lim_{n→+∞} lg^b(n) / n^a = 0 .    (10)

Another useful identity is

a^{log_b(n)} = n^{log_b(a)} .    (11)

To prove this we let y = a^{log_b(n)} and take the logarithm to the base b of both sides of
this expression. Since exponentiation in an argument of a logarithm can be turned into
multiplication (and back again) we have

log_b(y) = log_b(n) log_b(a) = log_b(n^{log_b(a)}) .

Solving the above for y we have that

y = n^{log_b(a)} = a^{log_b(n)} .
Stirling's approximation is

n! = √(2πn) (n/e)^n (1 + Θ(1/n)) .    (13)

From it we can conclude that n! = o(n^n), since

lim_{n→+∞} n!/n^n = lim_{n→+∞} √(2πn) (1/e)^n (1 + Θ(1/n)) = 0 .    (14)
We also have n! = ω(2^n). Thus to show this is true we will simply compute the limit of the
ratio and show that it diverges. For f(n) = n! and g(n) = 2^n, by using Stirling's formula we
find

lim_{n→+∞} n!/2^n = lim_{n→+∞} √(2πn) (n/e)^n (1 + Θ(1/n)) / 2^n
                  = lim_{n→+∞} √(2πn) (n/(2e))^n (1 + Θ(1/n)) = +∞ .    (15)
To show that lg(n!) = Θ(n lg(n)) we will take the logarithm of Stirling's approximation given
in Equation 13 where we get

lg(n!) = lg( √(2πn) (n/e)^n (1 + Θ(1/n)) )
       = (1/2) lg(2π) + (1/2) lg(n) + lg(1 + Θ(1/n)) − n lg(e) + n lg(n) .

If we divide both sides by n lg(n) and take the limit as n → ∞ the first four terms have a
zero limit and the last term has a limit of one (it is, in fact, the constant one). The fact that

lim_{n→∞} lg(n!) / (n lg(n)) = 1 ,
means that given any ε > 0 there exists an n0 such that for all n ≥ n0 we have

| lg(n!) / (n lg(n)) − 1 | < ε ,

or

(1 − ε) n lg(n) ≤ lg(n!) ≤ (1 + ε) n lg(n) ,

for all n ≥ n0. In the above let's take ε = 1/2 to get

(1/2) n lg(n) ≤ lg(n!) ≤ (3/2) n lg(n) ,

which is the statement that lg(n!) = Θ(n lg(n)).
Exercise 3.2-1

If f and g are monotonically increasing functions and m ≤ n then by definition

f(m) ≤ f(n) ,    (16)

and

g(m) ≤ g(n) ,    (17)

so adding these two inequalities shows that f(n) + g(n) is also monotonically increasing.
If we have m ≤ n then we know that g(m) ≤ g(n), so applying f to the two numbers g(m)
and g(n) and using the fact that f is monotonically increasing we have that

f(g(m)) ≤ f(g(n)) ,

so the function f(g(·)) is monotonically increasing.
Finally, using Equations 16 and 17, when both f and g are nonnegative, if we multiply these
two inequalities we get

f(m) g(m) ≤ f(n) g(n) ,

showing that the function f(n) g(n) is monotonically increasing.
Exercise 3.2-2
See the notes on Page 32 where these identities are proved.
Exercise 3.2-3
See the notes on Page 33 where these identities are proved.
If lg(n)! were polynomially bounded we would have lg(n)! ≤ C n^k for some positive constants C and k. Setting m = lg(n), this becomes m! ≤ C 2^{km}. Dividing both sides by 2^{km} and using Stirling's approximation we compute

    lim_{m→+∞} m!/2^{km} = lim_{m→+∞} √(2πm) (m/e)^m (1 + Θ(1/m)) / 2^{km}
                         = lim_{m→+∞} √(2πm) (m/(2^k e))^m (1 + Θ(1/m)) = +∞ ,

as once m > 2^k e the fraction m/(2^k e) is larger than one, and powers of a number larger than one tend to positive infinity. This means that no such C can exist and so lg(n)! is not polynomially bounded.
Next we want to see if lg(lg(n))! is polynomially bounded. This means that

    lg(lg(n))! ≤ C n^k ,

for some positive constants C and k. If we set m = lg(lg(n)) (so that n = 2^{2^m}) for m = 1, 2, 3, . . . then the above becomes

    m! ≤ C 2^{k 2^m} ,

for increasing m. Now let us divide both sides by 2^{k 2^m}, use Stirling's approximation, and take the limit of the left-hand side as m → ∞:

    lim_{m→+∞} m!/2^{k 2^m} = lim_{m→+∞} √(2πm) (m/e)^m (1 + Θ(1/m)) / 2^{k 2^m}
                            = lim_{m→+∞} √(2πm) ( m/(e 2^{k 2^m/m}) )^m (1 + Θ(1/m)) = 0 ,

as in this limit the fraction m/(e 2^{k 2^m/m}) tends to zero (because 2^m/m → +∞), and increasing powers of this fraction are small enough to make the product with √(2πm) limit to zero. One could prove these statements rigorously using L'Hospital's rule if desired. This means that lg(lg(n))! is polynomially bounded.
Exercise 3.2-5
For this problem we want to know which of the two functions lg(lg (n)) or lg (lg(n)) is
asymptotically smaller. We can get a feeling for the answer if we take n = 265536 then
lg (n) = 5 so lg(lg (n)) = 2.32. For the second function with this n we have lg(n) = 65536
and so lg (lg(n)) = 5 thus for this simple case by inspection it would appear that the function
lg (lg(n)) is larger and we would have
lg(lg (n)) = O(lg (lg(n))) .
This can be seen with the following argument. Note that we have

    lg∗(n) = lg∗(lg(n)) + 1 ,   (18)

since the inner lg(n) on the right-hand side takes away one of the applications of lg needed in computing the function lg∗(·) on the left-hand side, and so we add one (to the right-hand side). If we take the lg(·) of both sides of this expression we have

    lg(lg∗(n)) = lg(lg∗(lg(n)) + 1) .   (19)

Since

    lg(x + 1) = ln(1 + x)/ln(2) < x/ln(2) ,

for x sufficiently large, we can bound the right-hand side of Equation 19 by C lg∗(lg(n)) (here C = 1/ln(2)) and we get

    lg(lg∗(n)) < C lg∗(lg(n)) ,
    F_0 = (φ^0 − φ̂^0)/√5 = 0   and
    F_1 = (φ − φ̂)/√5 = ((1+√5)/2 − (1−√5)/2)/√5 = √5/√5 = 1 ,

which are valid base cases for the Fibonacci numbers.

Next we assume that F_i = (φ^i − φ̂^i)/√5 for all i ≤ n. Then we want to prove this formula is true for F_{n+1}. To do that we will use the recursion relationship for F_n. We have

    F_{n+1} = F_n + F_{n−1}
            = (φ^n − φ̂^n)/√5 + (φ^{n−1} − φ̂^{n−1})/√5
            = (φ^n + φ^{n−1})/√5 − (φ̂^n + φ̂^{n−1})/√5
            = φ^{n+1}(φ^{−1} + φ^{−2})/√5 − φ̂^{n+1}(φ̂^{−1} + φ̂^{−2})/√5 .

Now the coefficient of φ^{n+1} can be manipulated as

    φ^{−1} + φ^{−2} = 2/(1+√5) + 4/(1+√5)²
                    = (2(1+√5) + 4)/(1+√5)²
                    = (6 + 2√5)/(1+√5)²
                    = (6 + 2√5)(1−√5)²/((1+√5)²(1−√5)²)
                    = (6 + 2√5)(1 − 2√5 + 5)/((1+√5)(1−√5))²
                    = (6 + 2√5)(6 − 2√5)/16
                    = (36 − 20)/16 = 1 .

The same thing can be done to the coefficient of φ̂^{n+1}. Taken together we have shown that

    F_{n+1} = (φ^{n+1} − φ̂^{n+1})/√5 ,   (20)

completing the induction.
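The closed-form formula just proved (and the bound of the next exercise) can be checked numerically; a small sketch in python (the names are my own, and floating point requires rounding to the nearest integer):

```python
import math

def fib(n):
    """Iterative Fibonacci numbers F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

PHI = (1 + math.sqrt(5)) / 2       # the golden ratio phi
PHI_HAT = (1 - math.sqrt(5)) / 2   # its conjugate phi-hat

def fib_closed(n):
    """Closed form F_n = (phi^n - phi_hat^n)/sqrt(5), rounded."""
    return round((PHI**n - PHI_HAT**n) / math.sqrt(5))
```

Double precision keeps the rounded closed form exact for n well beyond the range checked here.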
Exercise 3.2-7

For this exercise we want to prove that F_{i+2} ≥ φ^i. To start we will consider Equation 20 evaluated at i + 2 where we have

    F_{i+2} = (φ^{i+2} − φ̂^{i+2})/√5
            = φ^i (φ² − φ̂² (φ̂/φ)^i)/√5 .

Now the product φφ̂ simplifies as

    φφ̂ = ((1+√5)/2)((1−√5)/2) = (1 − 5)/4 = −1 ,

so that φ̂/φ = −1/φ² and thus we have

    F_{i+2} = φ^i (φ² − (−1)^i φ̂² φ^{−2i})/√5 .   (21)

The expression in parentheses with the positive sign (ignoring the √5 for a moment, and using the fact that φ^{−2i} ≤ 1) is bounded above by φ² + φ̂², where

    φ² + φ̂² = ((1+√5)/2)² + ((1−√5)/2)² = ((1+2√5+5) + (1−2√5+5))/4 = 12/4 = 3 ,

and the expression with the negative sign is bounded below by φ² − φ̂², where

    φ² − φ̂² = ((1+√5)/2)² − ((1−√5)/2)² = ((1+2√5+5) − (1−2√5+5))/4 = 4√5/4 = √5 = 2.236068 < 3 .

As the expression with the negative sign is the smaller of the two from Equation 21, we see that F_{i+2} must be larger than the value obtained when we take the negative sign, or

    F_{i+2} ≥ φ^i (φ² − φ̂²)/√5 = φ^i (√5/√5) = φ^i ,

which is what we wanted to prove.
    p(n) = Σ_{i=0}^{d} a_i n^i .

Part (a): If p(n) = O(n^k) is true this means that there exist positive constants C and n_0 > 0 such that

    p(n) ≤ C n^k ,

for all n ≥ n_0. From how p(n) is defined this means that

    Σ_{i=0}^{d} a_i n^i ≤ C n^k ,

or, dividing by n^k,

    a_0 n^{−k} + a_1 n^{1−k} + ⋯ + a_d n^{d−k} ≤ C ,

for all n ≥ n_0. When k > d we have that

    lim_{n→+∞} (a_0 n^{−k} + a_1 n^{1−k} + ⋯ + a_d n^{d−k}) = 0 ,

so the above is true in that case, as the left-hand side will eventually be smaller than any positive constant. If k = d then the left-hand side is

    a_0 n^{−d} + a_1 n^{1−d} + ⋯ + a_d ,

which limits to a_d as n → ∞, and our expression on the left-hand side can again be made smaller than a constant.
The (i, j)th element of the product AB is

    (AB)_{i,j} = Σ_{k=1}^{n} A_{i,k} B_{k,j} ,

so the (i, j)th element of the transpose of this product is

    ((AB)^T)_{i,j} = (AB)_{j,i} = Σ_{k=1}^{n} A_{j,k} B_{k,i} .

Note that the components of A in the above can be expressed in terms of A's transpose since A_{j,k} = (A^T)_{k,j}. The same can be said for B, and when this substitution is done we have

    ((AB)^T)_{i,j} = (AB)_{j,i} = Σ_{k=1}^{n} (A^T)_{k,j} (B^T)_{i,k} = Σ_{k=1}^{n} (B^T)_{i,k} (A^T)_{k,j} .

Where in the first summation we have replaced A_{j,k} with (A^T)_{k,j} (similarly for B), and in the second summation we just placed the term B^T before the term involving A^T. The above can be recognized as the (i, j)th element of the product B^T A^T as expected.
Using the fact that (AB)^T = B^T A^T (proven above), and that (A^T)^T = A, for the product A^T A we have that

    (A^T A)^T = A^T (A^T)^T = A^T A .

Thus since A^T A when transposed equals itself, it is a symmetric matrix as requested.
Assume that B and C are both inverses of A, so that AB = I and BA = I, and also AC = I and CA = I. Multiplying the equation AB = I on the left by C gives CAB = C. Using the fact that CA = I, the above simplifies to B = C, showing that the inverse of a matrix must be unique.
    g_{n+1,n+1} Π_{i=1}^{n} g_{ii} = Π_{i=1}^{n+1} g_{ii} ,

proving by induction that the determinant of a triangular matrix is equal to the product of its diagonal elements.
We will prove that the inverse of a lower triangular matrix L (if it exists) is lower triangular by induction. We assume that we are given an L that is non-singular and lower triangular. We want to prove that L^{−1} is lower triangular. We will do this by using induction on n, the dimension of L. For n = 1, L is a scalar and L^{−1} is also a scalar. Trivially both are lower triangular. Now assume that if L is non-singular and lower triangular of size at most n × n, then L^{−1} has the same property. Let L be a matrix of size (n + 1) × (n + 1) and partition L as follows

    L = [ L11  0
          L21  L22 ] .

Where L11 and L22 are both lower triangular matrices of size at most n × n, so that we can apply the induction hypothesis. Let M = L^{−1} and partition M conformally, i.e.

    M = [ M11  M12
          M21  M22 ] .

We want to show that M12 must be zero. Now since LM = I, by multiplying the matrices above out we obtain

    LM = [ L11  0     [ M11  M12
           L21  L22 ]   M21  M22 ]
       = [ L11 M11            L11 M12
           L21 M11 + L22 M21  L21 M12 + L22 M22 ]
       = [ I  0
           0  I ] .

Equating block components gives

    L11 M11 = I
    L11 M12 = 0
    L21 M11 + L22 M21 = 0
    L21 M12 + L22 M22 = I .

By the induction hypothesis both L11 and L22 are invertible. Thus the equation L11 M11 = I gives M11 = L11^{−1}, and the equation L11 M12 = 0 gives M12 = 0. With these two conditions the equation L21 M12 + L22 M22 = I becomes L22 M22 = I. Since L22 is invertible we compute that M22 = L22^{−1}. As both L11 and L22 are lower triangular of size at most n × n, by the induction hypothesis their inverses are lower triangular, and we see that M itself is then lower triangular since

    M = [ L11^{−1}  0
          M21       L22^{−1} ] .

Thus by the principle of induction we have shown that the inverse of a lower triangular matrix is lower triangular.
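This fact is easy to check numerically. A minimal sketch (function names are my own) that inverts a lower triangular matrix column by column with forward substitution, solving L x = e_j for each standard basis vector e_j:

```python
def lower_tri_inverse(L):
    """Invert a nonsingular lower triangular matrix given as a list of
    rows, by forward substitution on each column of the identity."""
    n = len(L)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            rhs = (1.0 if i == j else 0.0) - sum(L[i][k] * M[k][j] for k in range(i))
            M[i][j] = rhs / L[i][i]
    return M
```

Because L x = e_j forces x_i = 0 for all i < j, every computed column has zeros above the diagonal, in agreement with the proof above.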
that each row of P when multiplied into A will select a single row of A and thus produces a
permutation of the rows of A. In the same way the product AP will produce a permutation
of the columns of A. If we have two permutation matrices P1 and P2 , then P1 acting on P2
will result in a permutation of the rows of P2 . This result is a further permutation matrix,
since it is the combined result of two permutations, that from P2 and then that from P1 . If
P is a permutation matrix, then it represents a permutation of the rows of the identity matrix. The matrix representing the inverse permutation, which reorders the rows of P back into the identity matrix, would then be the inverse of P, and from this argument we see that P is invertible and has an inverse that is another permutation matrix. The fact that P^{−1} = P^T can be recognized by considering the product of P with P^T. When row i of P is multiplied by any column of P^T not equal to i the result will be zero, since the locations of the ones in the two vectors do not agree. However, when row i of P is multiplied by column i the result will be one. Thus we see by looking at the components of P P^T that P P^T = I and P^T is the inverse of P.
    M = I + e_j e_i^T ,

i.e. the identity matrix with an additional one at location (j, i) (here e_k denotes the kth standard basis column vector). In this case the inverse of M is then easy to compute; it is the identity matrix with a minus one at position (j, i), specifically we have

    M^{−1} = I − e_j e_i^T .

Now multiplying B on the right by this matrix will operate on the columns of B; thus B′ = B M^{−1} is the same matrix as B except that the jth column has been subtracted from the ith column, with the result placed in column i.
    A^{−1} = (1/det(A)) C^T ,

where C is the matrix of cofactors of A. This cofactor matrix C is the matrix such that its (i, j) element is given by the cofactor of the element a_{ij}, or

    C_{i,j} = (−1)^{i+j} det(A_{[ij]}) .
Where A[ij] is the ijth minor of the matrix A, i.e. the submatrix that is produced from
A by deleting its ith row and its jth column. Because the determinants of the minors of
A only involve additions and multiplications, if A has only real elements then all cofactors
and determinants of A must be real. By the adjoint theorem above, A1 can have only real
elements. The other direction, where we argue that if A1 has only real elements then A
must have only real elements can be seen by applying the above arguments to the matrix
A1 .
    Σ_{i=1}^{n} x_i v_i = 0 .
Since A is full column rank its columns are linearly independent (this is the definition of full
column rank). Because of this fact, the only way these columns can sum to zero is if x = 0
and we have proven one direction. Now lets assume that Ax = 0 implies that x = 0. This
statement is equivalent to the fact that the columns of A are linearly independent which
again implies that A is of full column rank.
    det(V(x_0, x_1, x_2, ⋯, x_{n−2}, x_{n−1})) =
        | 1  x_0      x_0²      ⋯  x_0^{n−2}      x_0^{n−1}     |
        | 1  x_1      x_1²      ⋯  x_1^{n−2}      x_1^{n−1}     |
        | 1  x_2      x_2²      ⋯  x_2^{n−2}      x_2^{n−1}     |
        | ⋮  ⋮        ⋮             ⋮              ⋮             |
        | 1  x_{n−1}  x_{n−1}²  ⋯  x_{n−1}^{n−2}  x_{n−1}^{n−1} | .

To evaluate this determinant, multiply column n − 1 by x_0 and subtract it from column n (an operation that does not change the value of the determinant). This gives

        | 1  x_0      x_0²  ⋯  x_0^{n−2}      0                             |
        | 1  x_1      x_1²  ⋯  x_1^{n−2}      (x_1 − x_0) x_1^{n−2}         |
        | 1  x_2      x_2²  ⋯  x_2^{n−2}      (x_2 − x_0) x_2^{n−2}         |
        | ⋮  ⋮        ⋮         ⋮              ⋮                            |
        | 1  x_{n−1}  ⋯        x_{n−1}^{n−2}  (x_{n−1} − x_0) x_{n−1}^{n−2} | .

Where we see that this action has introduced a zero in the (1, n) position. Continuing, we now multiply column n − 2 by x_0 and subtract it from column n − 1. We find that the above determinant now becomes

        | 1  x_0      x_0²  ⋯  0                              0                             |
        | 1  x_1      x_1²  ⋯  (x_1 − x_0) x_1^{n−3}          (x_1 − x_0) x_1^{n−2}         |
        | 1  x_2      x_2²  ⋯  (x_2 − x_0) x_2^{n−3}          (x_2 − x_0) x_2^{n−2}         |
        | ⋮  ⋮        ⋮         ⋮                              ⋮                            |
        | 1  x_{n−1}  ⋯        (x_{n−1} − x_0) x_{n−1}^{n−3}  (x_{n−1} − x_0) x_{n−1}^{n−2} | .

Where we see that this action has introduced another zero, this time at the (1, n−1) position. Continuing in this manner we will obtain the following determinant, where the first row has only a single nonzero element:

        | 1  0              0                ⋯  0                          |
        | 1  x_1 − x_0      (x_1 − x_0) x_1  ⋯  (x_1 − x_0) x_1^{n−2}      |
        | 1  x_2 − x_0      (x_2 − x_0) x_2  ⋯  (x_2 − x_0) x_2^{n−2}      |
        | ⋮  ⋮              ⋮                    ⋮                         |
        | 1  x_{n−1} − x_0  ⋯                   (x_{n−1} − x_0) x_{n−1}^{n−2} | .

We can expand this determinant about the first row easily since there is only a single nonzero element in this row. After this reduction we see that we can factor x_1 − x_0 from the first row of our reduced matrix, factor x_2 − x_0 from the second row, x_3 − x_0 from the third row, and so on until we get to the (n−1)th row of our reduced matrix, where we factor out x_{n−1} − x_0. This gives us

    det(V(x_0, x_1, x_2, ⋯, x_{n−2}, x_{n−1})) = Π_{i=1}^{n−1} (x_i − x_0) ·
        | 1  x_1      x_1²      ⋯  x_1^{n−3}      x_1^{n−2}     |
        | 1  x_2      x_2²      ⋯  x_2^{n−3}      x_2^{n−2}     |
        | ⋮  ⋮        ⋮             ⋮              ⋮             |
        | 1  x_{n−1}  x_{n−1}²  ⋯  x_{n−1}^{n−3}  x_{n−1}^{n−2} | .

Note that the remaining determinant is of size (n − 1) × (n − 1) and is of exactly the same form as the previous determinant. Thus using the same manipulations performed earlier we see that

    det(V) = Π_{i=1}^{n−1} (x_i − x_0) · det(V(x_1, x_2, ⋯, x_{n−1}))
           = Π_{i=1}^{n−1} (x_i − x_0) · Π_{j=2}^{n−1} (x_j − x_1) · det(V(x_2, ⋯, x_{n−1}))
           = Π_{i=1}^{n−1} (x_i − x_0) · Π_{j=2}^{n−1} (x_j − x_1) · Π_{k=3}^{n−1} (x_k − x_2) ⋯ (x_{n−1} − x_{n−2})
           = Π_{0 ≤ j < k ≤ n−1} (x_k − x_j) .
using only three real multiplications. To perform this computation define some intermediate variables t_1, t_2, and t_3 in terms of our inputs as

    t_1 = ad
    t_2 = bc
    t_3 = (a + b)(c − d) ,

which involve only three real multiplications. Note that the product obtained by computing t_3 is algebraically equivalent to

    t_3 = (a + b)(c − d) = ac − ad + bc − bd = (ac − bd) − (ad − bc) ,

where we have grouped specific terms to help direct the manipulations we will perform below. Note also that the sums t_1 + t_2 and t_3 + t_1 − t_2 are given by

    t_1 + t_2 = ad + bc
    t_3 + t_1 − t_2 = ac − bd ,

so that our full complex product can be written in terms of these sums as

    (a + ib)(c + id) = (ac − bd) + i(ad + bc) = (t_3 + t_1 − t_2) + i(t_1 + t_2) .
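The three-multiplication scheme above is easy to spot-check numerically; a short sketch (the function name is my own):

```python
def complex_mult_3(a, b, c, d):
    """Compute (a + ib)(c + id) using only three real multiplications.
    Returns the pair (real part, imaginary part)."""
    t1 = a * d
    t2 = b * c
    t3 = (a + b) * (c - d)
    return (t3 + t1 - t2, t1 + t2)
```

For example (1 + 2i)(3 + 4i) = −5 + 10i, and the routine reproduces exactly that pair.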
    [ 1   0  0     [ x_1     [  3
      4   1  0   ·   x_2   =   14
     −6   5  1 ]     x_3 ]     −7 ] .

Performing forward substitution we find

    x_1 = 3
    x_2 = 14 − 4 x_1 = 14 − 12 = 2
    x_3 = −7 + 6 x_1 − 5 x_2 = −7 + 18 − 10 = 1 .
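The forward substitution used above can be sketched in a few lines of python (names are my own; the diagonal is assumed nonzero):

```python
def forward_substitution(L, b):
    """Solve L x = b for a lower triangular matrix L (list of rows)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # subtract the already-known terms, then divide by the diagonal
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x
```

On the system above it recovers the solution (3, 2, 1).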
Exercise 31.1-2
If a|b then we have that there exists an integer k1 such that b = k1 a. Since b|c we have that
there exists an integer k2 such that c = k2 b. Substituting the first of these two relationships into the second we have
c = k2 b = k2 (k1 a) = k1 k2 a ,
showing that a|c.
Exercise 31.1-3
Assume that gcd(k, p) = d ≠ 1. Then d|k and d|p. Since p is prime, if d|p then d = 1 or d = p. As we assumed that d ≠ 1 we must have d = p. The fact that d|k means that |d| ≤ |k|, or p ≤ k. This last fact is a contradiction to the problem assumption that k < p. Thus the assumption that d ≠ 1 must be false.
Exercise 31.1-4
Since gcd(a, n) = 1, by Theorem 31.2 there must exist integers x and y such that 1 = ax + ny. If we multiply this equation by b we get

    b = abx + nby .

Note that the right-hand side of the above is divisible by n since it is a linear combination of two things (the terms abx and nby), both of which are divisible by n: abx is divisible by n since n|ab (the problem's assumption), and nby is divisible by n since n is a factor of it. Since the right-hand side equals b, this shows that b is divisible by n.
Exercise 31.1-5
Using Equation 26 we can write

    C(p, k) = (p/k) C(p−1, k−1) ,

so that k C(p, k) = p C(p−1, k−1). Since gcd(k, p) = 1 for 0 < k < p, this shows that p divides C(p, k), or p | C(p, k). Next note that using the binomial theorem we can write (a + b)^p as

    (a + b)^p = a^p + b^p + Σ_{k=1}^{p−1} C(p, k) a^k b^{p−k} .

Since p divides C(p, k) for 0 < k < p, it divides the summation on the right-hand side of the above equation. Thus the remainder of (a + b)^p and a^p + b^p when dividing by p is the same, or

    (a + b)^p ≡ a^p + b^p (mod p) .
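This "freshman's dream" congruence is easy to check exhaustively for small primes (a sketch; the function name is my own):

```python
def freshmans_dream_holds(a, b, p):
    """Check that (a + b)^p == a^p + b^p modulo p (p assumed prime)."""
    return pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
```

Using Python's three-argument `pow` keeps the intermediate numbers small.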
Exercise 31.1-6
Now the definition of the mod operators is that

    x mod a = x − ⌊x/a⌋ a   (22)
    x mod b = x − ⌊x/b⌋ b .  (23)

Since a|b there exists an integer k such that b = ka. Thus Equation 23 becomes

    x mod b = x − ⌊x/b⌋ ka ,

which shows that x mod b differs from x by a multiple of a; taking both sides modulo a therefore gives

    (x mod b) mod a = x mod a ,

as we were to show. Next if we assume that x ≡ y (mod b), then taking the mod a operator of both sides of this expression and using the just derived result we get

    (x mod b) mod a = (y mod b) mod a or x mod a = y mod a ,

as we were to show.
Exercise 31.1-8
Recall that Theorem 31.2 states that gcd(a, b) is the smallest positive element of the set {ax + by : x, y ∈ Z}. Since this set definition of the greatest common divisor is symmetric in a and b we see that

    gcd(a, b) = gcd(b, a) .   (24)
From the above we see that now all recursive calls to EUCLID will have the first argument larger than the second argument. In Python, the greatest common divisor is available in the fractions module, i.e.

import fractions
print fractions.gcd(30,21) # the example from the book gives 3

At the time of writing, the source code of this function shows that it is in fact using Euclid's algorithm for its implementation.
Exercise 31.2-1
Let d = gcd(a, b); then d must have a prime factorization in terms of the same primes as a and b, namely

    d = p_1^{g_1} p_2^{g_2} ⋯ p_r^{g_r} .

Since d is a divisor of a and b we must have d|a and d|b, which means that

    p_i^{g_i}|a and p_i^{g_i}|b ,

for i = 1, 2, ⋯, r. This in turn means that

    p_i^{g_i}|p_i^{e_i} and p_i^{g_i}|p_i^{f_i} ,

so

    g_i ≤ e_i and g_i ≤ f_i .
Exercise 31.2-2
See the output in Table 3, where we follow the output format given in this section of the notes: as we move down the table we are moving to increasing stack depth. From that output we see that gcd(899, 493) = 29 and that 29 = 899(−6) + 493(11).
Exercise 31.2-3
One way to see that these two expressions are equal is to use one step of EUCLIDs algorithm
to evaluate both of them and see that after the first step both of these algorithms are
    stack level |  a  |  b  | ⌊a/b⌋ |  d | x  | y
         0      | 899 | 493 |   1   | 29 | −6 | 11
         1      | 493 | 406 |   1   | 29 |  5 | −6
         2      | 406 |  87 |   4   | 29 | −1 |  5
         3      |  87 |  58 |   1   | 29 |  1 | −1
         4      |  58 |  29 |   2   | 29 |  0 |  1
         5      |  29 |   0 |   −   | 29 |  1 |  0

Table 3: The recursion of EXTENDED-EUCLID(899, 493); each row is one level of the call stack.
Exercise 31.2-4
One way to do this would be the following in python
def iterative_euclid(a,b):
while b != 0 :
a, b = b, a % b # swap in place
return a
Here we have assigned b to a and a mod b to b in one step using Python's tuple assignment.
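The recursive EXTENDED-EUCLID from the book can likewise be written in a few lines of python; a sketch returning (d, x, y) with d = gcd(a, b) = ax + by:

```python
def extended_euclid(a, b):
    """Return (d, x, y) such that d = gcd(a, b) = a*x + b*y."""
    if b == 0:
        return (a, 1, 0)
    d, x, y = extended_euclid(b, a % b)
    # back-substitute: gcd(a, b) = gcd(b, a mod b)
    return (d, y, x - (a // b) * y)
```

For the book's example this gives extended_euclid(899, 493) = (29, −6, 11), matching the hand computation.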
Exercise 31.2-5
For the Fibonacci sequence defined as F_0 = 0, F_1 = 1, and

    F_k = F_{k−1} + F_{k−2}   for k ≥ 2 ,

it can be shown that F_{k+1} ≈ φ^{k+1}/√5, where

    φ = (1+√5)/2 ≈ 1.61803 .

From Lamé's theorem in the book we have fewer than k recursive calls in EUCLID(a, b) when F_{k+1} > b, or using the approximation above for F_{k+1}, when

    φ^{k+1}/√5 > b .

Solving the above for k gives k + 1 > log_φ(√5 b) = log_φ(√5) + log_φ(b), and since

    log_φ(√5) = ln(√5)/ln(φ) ≈ 1.67 ,

we conclude that k > log_φ(b) + 0.67. Thus we have shown that the invocation EUCLID(a, b) makes at most 1 + log_φ(b) recursive calls.
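We can count the actual number of iterations and compare against the 1 + log_φ(b) bound (a sketch; the call-counting convention is my own — each loop iteration stands for one recursive call):

```python
import math

def euclid_calls(a, b):
    """Number of recursive calls EUCLID(a, b) would make."""
    calls = 0
    while b != 0:
        a, b = b, a % b
        calls += 1
    return calls

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def lame_bound(b):
    """The 1 + log_phi(b) bound derived above."""
    return 1 + math.log(b, PHI)
```

The worst cases are consecutive Fibonacci numbers, where the bound is nearly tight.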
Exercise 31.2-7
One can show that the function returns the same answer independent of the order of the
arguments by induction. It is easiest to see how to write
    gcd(a_0, a_1, a_2, ⋯, a_n) = a_0 x_0 + a_1 x_1 + a_2 x_2 + ⋯ + a_n x_n ,
by looking at an example. Consider
gcd(a0 , a1 , a2 , a3 ) = gcd(a0 , gcd(a1 , a2 , a3 )) = gcd(a0 , gcd(a1 , gcd(a2 , a3 ))) .
We can now use Theorem 31.2 to write gcd(a2 , a3 ) as a2 x2 + a3 x3 for two integers x2 and x3 .
This would give
gcd(a0 , a1 , a2 , a3 ) = gcd(a0 , gcd(a1 , a2 x2 + a3 x3 )) .
We can again use Theorem 31.2 to write gcd(a1 , a2 x2 + a3 x3 ) as
x1 a1 + y1 (a2 x2 + a3 x3 ) = x1 a1 + y1 x2 a2 + y1 x3 a3 .
for two integers x1 and y1 . Thus we have shown that
gcd(a0 , a1 , a2 , a3 ) = gcd(a0 , x1 a1 + y1 x2 a2 + y1 x3 a3 ) .
One more application of Theorem 31.2 gives
gcd(a0 , a1 , a2 , a3 ) = x0 a0 + y2 (x1 a1 + y1 x2 a2 + y1 x3 a3 )
= x0 a0 + y2 x1 a1 + y1 y2 x2 a2 + y1 y2 x3 a3 .
As each coefficient in front of ai is the product of integers it itself is an integer. This gives the
desired representation when n = 3. In general, we use the definition of the greatest common
divisor for n > 1 to nest gcd function calls until we get a function call with only two
arguments gcd(an1 , an ). We can then use Theorem 31.2 to write this as an1 xn1 + an xn
for two integers xn1 and xn . We can then unnest the gcd function calls each time using
Theorem 31.2 to get the desired result.
    +_4 | 0 1 2 3
    ----+--------
     0  | 0 1 2 3
     1  | 1 2 3 0
     2  | 2 3 0 1
     3  | 3 0 1 2

Table 4: The group operation table for (Z_4, +_4).

    ·_5 | 1 2 3 4
    ----+--------
     1  | 1 2 3 4
     2  | 2 4 1 3
     3  | 3 1 4 2
     4  | 4 3 2 1

Table 5: The group operation table for (Z_5*, ·_5).
    lcm(a_1, a_2, . . . , a_n) = d (a_1/d) (a_2/d) ⋯ (a_n/d) .

In the right-hand side of the above expression, by grouping d with a given term in parentheses we have that d (a_i/d) = a_i, and thus the above number is a multiple of a_i for each i.
Exercise 31.3-1
The group (Z4 , +4 ) has the elements {0, 1, 2, 3} with the addition operation modulo four.
The group operation table for this group is given in Table 4. The group (Z_5*, ·_5) is the multiplicative group modulo five. The elements of that group are the elements of Z_5 that are relatively prime to five, i.e.

    Z_5* = {[a]_5 ∈ Z_5 : gcd(a, 5) = 1} .

The elements in Z_5* are then {1, 2, 3, 4}. The group operation table is given in Table 5. To show that these two groups are isomorphic to each other we are looking for a mapping α(·) from Z_4 = {0, 1, 2, 3} to the set Z_5* = {1, 2, 3, 4} such that α(a)α(b) mod 5 = α(a +_4 b). It is clear that α(0) = 1 (i.e. the addition identity should map to the multiplication identity). To get
        | 1 2 4 3
    ----+--------
     1  | 1 2 4 3
     2  | 2 4 3 1
     4  | 4 3 1 2
     3  | 3 1 2 4

Table 6: The table that results when we take the group operation table for (Z_4, +_4) and transform each element under the α(·) mapping.
the rest of the mapping, let us try α(1) = 2 and see what we can conclude if that is true. We have

    α(0) = 1
    α(1) = 2
    α(2) = α(1 + 1) = α(1)² mod 5 = 2² mod 5 = 4
    α(3) = α(1 + 1 + 1) = α(1)³ mod 5 = 2³ mod 5 = 8 mod 5 = 3
    α(4 mod 4) = α(0) = 1 .

Thus we have found a one-to-one mapping that takes the elements and operation from (Z_4, +_4) to (Z_5*, ·_5). If we then take the elements of Table 4 and use the above mapping on each element we get Table 6. Notice that this table is really just Table 5 but with some rows and columns interchanged.
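The isomorphism property can be verified by checking every pair of group elements (a sketch; names are my own):

```python
# The mapping alpha from (Z_4, +_4) to (Z_5^*, x_5) found above.
ALPHA = {0: 1, 1: 2, 2: 4, 3: 3}

def is_isomorphism(alpha):
    """Check alpha(a +_4 b) == alpha(a) * alpha(b) mod 5 for all pairs."""
    return all(alpha[(a + b) % 4] == (alpha[a] * alpha[b]) % 5
               for a in range(4) for b in range(4))
```

A mapping that does not respect the operations fails this check immediately.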
Exercise 31.3-2
For this problem we want to prove that a nonempty closed subset of a finite group is a
subgroup. To do that we need to show the following
Closure: We are told that the subset S′ is closed, therefore we have the closure property needed for a group immediately.

Identity: As the subset S′ is nonempty it must have at least one element. Denote this element as a. If a = e then S′ has the identity element and we are done. Even if a ≠ e, then as a^{|S′|} = e (by Corollary 31.19 from the book) and S′ is closed, the subset must have the identity element as a member in this case as well.

Associativity: The associativity of the elements of the subset S′ follows from the associativity of the elements in the original group S.

Inverses: Let a ∈ S′. Then as we know that e ∈ S′, we claim that the inverse of a must also be in S′. We can see this using the fact that a^{|S′|} = e, or a^{|S′|−1} a = e. This latter expression means that a^{|S′|−1} is an inverse of the element a. As S′ is closed, the element a^{|S′|−1} must be a member of S′, so the set S′ contains all of its elements' inverses.
Exercise 31.3-3
If p is prime and e a positive integer, then from the definition of the φ function we have

    φ(p^e) = p^e Π_{p′ | p^e} (1 − 1/p′) ,

where the product is over all primes p′ that divide p^e. The only prime that does divide p^e is p itself, so the above becomes

    φ(p^e) = p^e (1 − 1/p) = p^{e−1}(p − 1) .
Exercise 31.3-4
To show this we will use Theorem 31.6 from the text: if both gcd(a, p) = 1 and gcd(b, p) = 1 then gcd(ab, p) = 1. To use this, note that since a ∈ Z_n* and x ∈ Z_n* they are both relatively prime to n, and thus gcd(a, n) = 1 and gcd(x, n) = 1. Then using Theorem 31.6 we have that the product ax is relatively prime to n and thus is an element of Z_n*. Thus the given function f maps each element of Z_n* to another element of Z_n* and results in a permutation of Z_n*.
Exercise 31.3-5
Since the group Z9 has the nine elements {0, 1, 2, 3, 4, 5, 6, 7, 8} then by Lagranges theorem
the number of elements in any subgroup of it must be a divisor of nine. That means that
the subgroups must have 1, 3, or 9 elements. The subgroup with only one element is the
subgroup with only the identity or {0}. The subgroup with nine elements is the original
group again. The subgroup with three elements can be obtained by using 3 as a generator
and is the subgroup {0, 3, 6}.
    Z_13* = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} .

Since this set has twelve elements, by Lagrange's theorem any subgroup of it must have 1, 2, 3, 4, 6, or 12 elements itself. The subgroup with only one element is the set {1} and the subgroup
with twelve elements is the original group again. The remaining generated subgroups are computed in the python code chap 31 prob 3 5.py. Running that code we find that the cyclic subgroups of Z_13* are

    ⟨2⟩ = ⟨6⟩ = ⟨7⟩ = ⟨11⟩ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} = Z_13*
    ⟨3⟩ = ⟨9⟩ = {1, 3, 9}
    ⟨4⟩ = ⟨10⟩ = {1, 3, 4, 9, 10, 12}
    ⟨5⟩ = ⟨8⟩ = {1, 5, 8, 12}
    ⟨12⟩ = {1, 12} .
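The subgroup list above can be recomputed with a few lines of python (a sketch of the kind of computation the referenced script presumably performs; the function name is my own):

```python
def generated_subgroup(g, n):
    """Return the cyclic subgroup <g> of Z_n^* as a sorted list."""
    elems, x = set(), 1
    while True:
        x = (x * g) % n
        if x in elems:
            break
        elems.add(x)
    return sorted(elems)
```

For example, generated_subgroup(3, 13) yields the three-element subgroup {1, 3, 9}.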
Exercise 31.4-1
For this exercise we will follow the steps in the pseudo-code Modular-Linear-Equation-Solver by hand, to find the solutions (if they exist). To start we note that we have

    a = 35, b = 10, n = 50 .

Then we let

    d = gcd(a, n) = gcd(35, 50) = 5 .

Now we know that we can write d as

    d = gcd(35, 50) = 5 = 35x + 50y .

If x = −7 then y = 5. Next we ask if d|b, or 5|10, which is yes. Thus we compute

    x_0 = x (b/d) = −7 (10/5) = −14 .

Then for i = 0, 1, 2, 3, 4 we compute

    x_i = x_0 + i (n/d) = −14 + i (50/5) = −14 + 10i ,

which modulo 50 gives the solutions {36, 46, 6, 16, 26}.
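These hand computations can be checked against a direct transcription of the book's pseudocode (a sketch; a different Bezout pair for gcd(35, 50) may be found internally, but the solution set is the same):

```python
def modular_linear_equation_solver(a, b, n):
    """Return all solutions x of a*x = b (mod n), or [] if none exist."""
    def ext_gcd(a, b):
        if b == 0:
            return (a, 1, 0)
        d, x, y = ext_gcd(b, a % b)
        return (d, y, x - (a // b) * y)

    d, x, _ = ext_gcd(a, n)
    if b % d != 0:
        return []                      # no solutions
    x0 = (x * (b // d)) % n            # one particular solution
    return [(x0 + i * (n // d)) % n for i in range(d)]
```

Running it on a = 35, b = 10, n = 50 reproduces the five solutions found above.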
Exercise 31.4-2
Assume by way of contradiction that x ≢ y (mod n) but that

    ax ≡ ay (mod n) .

Since x ≢ y (mod n) we can write x = y + z with z mod n ≠ 0. If we multiply x = y + z by a we get

    ax = ay + az ,

and considering this modulo n we get

    ax mod n = (ay + az) mod n .

Now as ax ≡ ay (mod n), we have az ≡ 0 (mod n). But a ≠ 0 and gcd(a, n) = 1, therefore we must have z ≡ 0 (mod n), which is a contradiction to the statement above. Thus

    ax ≡ ay (mod n) ⟹ x ≡ y (mod n) ,

when gcd(a, n) = 1.

For the second part we want an example with ax ≡ ay (mod n) and x ≢ y (mod n) when gcd(a, n) > 1. Let a = 2 and n = 4; then with x = 2 and y = 0 we have 2·2 ≡ 2·0 (mod 4) while x ≢ y (mod 4).
Problem 31.1 (The binary gcd algorithm)
Part (a): The statement

    gcd(a, b) = 2 gcd(a/2, b/2)

(when a and b are both even) follows from Corollary 31.4 from the book, which states that for all integers a′ and b′ and any nonnegative integer n,

    gcd(a′ n, b′ n) = n gcd(a′, b′) .

When we take n = 2, a′ = a/2, and b′ = b/2 we get the desired result.
Part (b): For this part we assume that a is odd and b is even. Then from Theorem 31.2 in the book we have that gcd(a, b) is the smallest positive element of the set {ax + by} over all x and y from Z. Writing by = (b/2)y′ with y′ = 2y, this is the smallest positive element of the set

    {ax + (b/2)y′} ,

which (since a is odd) equals gcd(a, b/2).
Part (c): We will first show that if a > b then

    gcd(a, b) = gcd(a − b, b) .

To show this, recall that the left-hand side is the smallest positive integer in the set {ax + by}, while the right-hand side is the smallest positive integer in the set {(a − b)x′ + by′} = {ax′ + b(y′ − x′)}; as (x′, y′) ranges over all integer pairs these are the same sets of numbers, and so they have the same smallest positive element.
For this part we assume that a and b are both odd and that a > b. In that case a − b is the difference of two odd numbers and so is an even number. From Part (b) we then have that

    gcd(a, b) = gcd(a − b, b) = gcd((a − b)/2, b) .

If b > a the situation is the same:

    gcd(a, b) = gcd(b − a, a) = gcd((b − a)/2, a) .

We can then use the fact that gcd(a′, −b′) = gcd(a′, b′) to reverse the sign of the (b − a)/2 expression if needed.
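Parts (a)–(c) combine into the binary gcd algorithm; a sketch (the structure is my own, following the three parts above):

```python
def binary_gcd(a, b):
    """gcd of nonnegative integers via the binary gcd algorithm."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while a % 2 == 0 and b % 2 == 0:   # part (a): gcd = 2 gcd(a/2, b/2)
        a //= 2
        b //= 2
        shift += 1
    while b != 0:
        while a % 2 == 0:              # part (b): drop factors of two
            a //= 2                    # from the even member of a pair
        while b % 2 == 0:
            b //= 2
        if a > b:                      # part (c): replace the larger of
            a, b = b, a - b            # two odds by their difference
        else:
            b = b - a
    return a << shift
```

Only subtractions, halvings, and parity tests are used, which is the point of the exercise.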
    n/k ≤ (n−1)/(k−1) ⟺ n(k−1) ≤ k(n−1) ⟺ nk − n ≤ nk − k ⟺ k ≤ n ,

which is known to be true. Using the same manipulations we can show that

    (n−1)/(k−1) ≤ (n−2)/(k−2) .

This inductive process can continue, subtracting one each time from the numerator and denominator. As a check we can verify the correctness of the final inequality, which is given by

    n/k ≤ (n−k+1)/1 .

Using the same approach as above, this is equivalent to

    n ≤ kn − k² + k
    0 ≤ n(k−1) − k(k−1)
    0 ≤ (k−1)(n−k) ,

and the last inequality is true since 1 ≤ k ≤ n. Given this sequence of results we can derive the lower bound on the binomial coefficients as

    C(n, k) = n(n−1)⋯(n−k+1) / (k(k−1)⋯1)
            = (n/k)((n−1)/(k−1))⋯((n−k+1)/1)
            ≥ (n/k)(n/k)⋯(n/k)
            = (n/k)^k .
An upper bound on the binomial coefficients (book notes page 1097)

We have from the definition of the binomial coefficients that

    C(n, k) = n(n−1)⋯(n−k+1) / (k(k−1)⋯2·1)
            ≤ n^k/k!
            ≤ e^k n^k/k^k
            = (en/k)^k .

Where in the above we have used the bound k! ≥ (k/e)^k (obtained from Stirling's approximation), in the form

    1/k! ≤ (e/k)^k ,

to simplify the expression n^k/k! in the above.
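Both the lower bound (n/k)^k ≤ C(n, k) and the upper bound C(n, k) ≤ (en/k)^k can be spot-checked numerically (a sketch; the function name is my own):

```python
import math

def check_binomial_bounds(n, k):
    """Verify (n/k)^k <= C(n, k) <= (e*n/k)^k for 1 <= k <= n."""
    c = math.comb(n, k)
    return (n / k) ** k <= c <= (math.e * n / k) ** k
```

`math.comb` requires Python 3.8 or newer.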
the binomial coefficients and the entropy function (book notes page 1097)
If we assign k = λn, for 0 < λ < 1, then we have the following

    C(n, λn) ≤ n^n / ((λn)^{λn} (n − λn)^{n−λn})
             = n^n / ((λn)^{λn} ((1−λ)n)^{(1−λ)n})
             = n^n / (λ^{λn} (1−λ)^{(1−λ)n} n^n)
             = ( (1/λ)^λ (1/(1−λ))^{1−λ} )^n
             ≡ 2^{n H(λ)} ,

where

    H(λ) = −λ lg(λ) − (1−λ) lg(1−λ) .

This is the (binary) entropy function. We note before continuing that since λ = k/n the above bound can also be written as

    C(n, k) ≤ 2^{n H(k/n)} .
Exercise C.1-1 (counting substrings)
Assuming k < n, then the first substring occupies the locations 1, 2, . . . , k, the second substring occupies the locations 2, 3, . . . , k + 1, and so on. The last possible substring occupies the locations n − k + 1, n − k + 2, . . . , n. Thus we have in total

    n − k + 1

substrings of size k in a string of size n.

To calculate how many substrings (of any size) a string of size n has, we have from the above formula that there are n size one substrings (k = 1), n − 1 size two substrings (k = 2), n − 2 size three substrings, etc. Continuing this pattern, the total number of substrings is the sum of all these numbers, given by

    Σ_{k=1}^{n} (n − k + 1) = Σ_{l=1}^{n} l = n(n + 1)/2 .
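A brute-force enumeration confirms both counts (a sketch; names are my own):

```python
def count_substrings(s, k):
    """Number of (positions of) substrings of length k in s."""
    return len([s[i:i + k] for i in range(len(s) - k + 1)])

def count_all_substrings(s):
    """Total number of nonempty substring positions in s."""
    n = len(s)
    return sum(count_substrings(s, k) for k in range(1, n + 1))
```

For a string of length 6 this gives 6 + 5 + 4 + 3 + 2 + 1 = 21 = 6·7/2, as the formula predicts.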
Each of the 2^n possible input combinations can be assigned one of two output values, so there are

    2^{2^n}

possible Boolean functions with n input variables and 1 output variable.

For an n-input, m-output function we have 2^m possible output choices for each possible input, so using the same logic as before we have

    (2^m)^{2^n} = 2^{m 2^n}

possible n-input m-output Boolean functions.
    n!/n = (n − 1)! .
Exercise C.1-4 (an even sum from distinct subsets of size three)
Our set is given by S = {1, 2, . . . , 100} and we want three distinct numbers that sum to an even number. The total number of distinct size 3 subsets (independent of the order of the elements in the subset) drawn from S is given by C(100, 3). Half of these will sum to an even number and the other half will sum to an odd number. Thus the total number is given by

    (1/2) C(100, 3) = 80850 .

Another way to obtain this same number is to recognize that to have an even sum we must pick either three even numbers, or one even number and two odd numbers. Since these two outcomes are mutually exclusive, using the rule of sum the total number of ways to pick an even-summable subset is the sum of the number of ways to select each of the previous sets. This number is

    C(50, 3) + C(50, 1) C(50, 2) ,

where the first term is the number of ways to select three even numbers, and the second term is the number of ways to select a single even number and then two odd numbers. One can check that the sum above reduces to 80850 also.
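Both counting arguments can be verified by brute force (a sketch; the function name is my own):

```python
from itertools import combinations
from math import comb

def even_sum_triples(n=100):
    """Count 3-element subsets of {1, ..., n} with an even sum."""
    return sum(1 for t in combinations(range(1, n + 1), 3) if sum(t) % 2 == 0)
```

Enumerating all C(100, 3) = 161700 triples takes well under a second.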
    1
    1 1
    1 2 1
    1 3 3 1
    1 4 6 4 1
    1 5 10 10 5 1
    1 6 15 20 15 6 1
We can evaluate both sides directly:

    Σ_{i=1}^{n} i = n(n + 1)/2   and   C(n+1, 2) = (n+1)n/2 ;

since the two sides are equal we have proven the identity.
(C(n, k) as a function of k, for 0 ≤ k ≤ n)

All one has to do is show that C(n, k) is an increasing function of k, which means proving the following chain of reasoning is reversible

    C(n, k+1) > C(n, k)
    n!/((k+1)!(n−k−1)!) > n!/((n−k)!k!)
    (n−k)!/(n−k−1)! > (k+1)!/k!
    n − k > k + 1
    n > 2k + 1
    k < (n−1)/2 ,

which says that C(n, k) is increasing as a function of k as long as k < (n−1)/2, which equals the floor and ceiling expressions above if n is odd or even respectively.
We want to show that

    C(n, j+k) ≤ C(n, j) C(n−j, k) .

To show this using an algebraic proof, we will evaluate the left-hand side of this expression, convert it into the right-hand side multiplied by a correction factor, and then show that the correction factor is less than one. To begin we have

    C(n, j+k) = n!/((j+k)!(n−j−k)!)
              = [n!/(j!(n−j)!)] · [j!(n−j)!/((j+k)!(n−j−k)!)]
              = C(n, j) · [(n−j)!/(k!(n−j−k)!)] · [j!k!/(j+k)!]
              = C(n, j) C(n−j, k) · [ (j(j−1)(j−2)⋯2·1)(k(k−1)⋯2·1) / ((j+k)(j+k−1)(j+k−2)⋯2·1) ] .

Note that the numerator and denominator of the factor multiplying the terms C(n, j) C(n−j, k) both have j + k factors in their product. Without loss of generality we will assume that j ≤ k. The product above can be written as

    (j(j−1)(j−2)⋯2·1)(k(k−1)⋯2·1) / ((j+k)(j+k−1)(j+k−2)⋯2·1)
      = j(j−1)(j−2)⋯2·1 / ((j+k)(j+k−1)(j+k−2)⋯(k+2)(k+1)) < 1 ,

since for every factor in the numerator, we can find a term in the denominator that is larger than it; for example the ratio of the first factors in the numerator and denominator above satisfies

    j/(j+k) < 1 .

As the combined correction factor (with all these ratios) is less than one, we have shown the inequality we started with.
As a combinatorial proof, the number of ways to select j + k items from a set of n will be smaller than if we first select j items from n, and for each of those sets select k items from the remaining n − j. That the latter is the larger number can be seen by the following example where equality does not hold. Consider the case n = 4, k = 3, and j = 1; then

    C(4, 4) = 1 < C(4, 1) C(3, 3) = 4 .

Here the number of ways to select 4 items from 4 is only one, which is smaller than if we are allowed to select 1 of the 4 items first (of which there are four ways) and then combine this with the number of ways to select the three remaining elements (of which there is only one).
nn
= n
(n 1)n1
nn1
(n 1)n1
n1
n
= n
n1
n1
1
= n
1 n1
1
<1
n
1
n
is bounded by
1
1
1
n
Taking positive powers of this expression (i.e. to the n 1 power) can only increase its value
and we have that
n1
1
n<n
1 n1
which provides the anchoring point from which to begin mathematical induction. Having
shown that this inequality is true for k = 1 we next proceed to assume it is true for k < k
and show that it is true for k < k + 1. To do this consider the following
n!
n
=
k+1
(n k 1)!(k + 1)!
n!
nk
=
(n k)!k! k + 1
nk
n
=
k
k+1
n
nn
nk
n
=
.
k
k (n k)nk k + 1
k k (k + 1)(n k)nk1
Here on the last step we have used the induction hypothesis. Our induction proof will be
finished if we can show that the last expression above is less than the following
nn
.
(k + 1)k+1(n k 1)nk1
Towards this end we will show this by assuming that it is true and then through reversible
transformations reduce this expression to one that is known to be true. Since all transformation are reversible the original inequality must be true. So to begin we assume that
\[
\frac{n^n}{k^k (k+1)(n-k)^{n-k-1}} < \frac{n^n}{(k+1)^{k+1}(n-k-1)^{n-k-1}} ,
\]
which is equivalent to
\[
\frac{(k+1)^k}{k^k} < \frac{(n-k)^{n-k-1}}{(n-k-1)^{n-k-1}} ,
\]
or
\[
\left(1 + \frac{1}{k}\right)^{k} < \left(1 + \frac{1}{n-k-1}\right)^{n-k-1} .
\]
From the above if we define the function f (x) as
\[
f(x) = \left(1 + \frac{1}{x}\right)^{x} ,
\]
the above expression becomes the following functional relationship, which if we can prove is true will complete the inductive proof:
\[
f(k) \le f(n-k-1) .
\]
Figure 3: Graphical plot showing the monotonicity of f (x) required for the inductive proof
in Exercise C.1-12.
Further assuming that f has a monotone inverse (to be discussed below) then we can apply the inverse to this expression to obtain the region of validity for this inequality as k \le n - k - 1, or k \le \frac{n-1}{2}. For all the statements above to be valid (and our induction proof to be complete) all one must do now is to show that f(x) has an inverse, or equivalently that f(x) is monotonic over its domain. This could be done with derivatives but rather than that Figure 3 shows a plot of f(x) demonstrating its monotonic behavior.
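As a quick numerical sketch (added here; the original relies on the plot, and the grid of sample points below is an arbitrary choice), we can tabulate f(x) = (1 + 1/x)^x and confirm it is increasing:

```python
# Numerical check of the monotonicity of f(x) = (1 + 1/x)**x used in the
# inductive proof; the integer grid 1..100 is an arbitrary choice.
def f(x):
    return (1.0 + 1.0 / x) ** x

values = [f(x) for x in range(1, 101)]
# f increases from f(1) = 2 toward its limit e as x grows
assert all(a < b for a, b in zip(values, values[1:]))
print(values[0], values[-1])
```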
Note also from the symmetry relationship possessed by the binomial coefficients, i.e.
\[
\binom{n}{k} = \binom{n}{n-k} ,
\]
we can extend our result above to all 0 \le k \le n.
Exercise C.1-13 (Stirling's approximation for some binomial coefficients)
We desire to show that
\[
\binom{2n}{n} = \frac{2^{2n}}{\sqrt{\pi n}} \left(1 + O\!\left(\frac{1}{n}\right)\right)
\]
using Stirling's approximation. From the definition of the binomial coefficients we know that
\[
\binom{2n}{n} = \frac{(2n)!}{n!\, n!} .
\]
Now Stirling's approximation gives an estimate of n!. Its expression is
\[
n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^{n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right) ,
\]
so that in the same way
\[
(2n)! = \sqrt{4\pi n} \left(\frac{2n}{e}\right)^{2n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right) .
\]
We also need an expression for (n!)^2, which is given by
\begin{align*}
(n!)^2 &= 2\pi n \left(\frac{n}{e}\right)^{2n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right)^{2} \\
&= 2\pi n \left(\frac{n}{e}\right)^{2n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right) .
\end{align*}
Thus we have that
\begin{align*}
\binom{2n}{n} &= \frac{(2n)!}{(n!)^2}
= \frac{\sqrt{4\pi n} \left(\frac{2n}{e}\right)^{2n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right)}{2\pi n \left(\frac{n}{e}\right)^{2n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right)} \\
&= \frac{2^{2n}}{\sqrt{\pi n}} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right) \left(1 + \Theta\!\left(\frac{1}{n}\right)\right)
= \frac{2^{2n}}{\sqrt{\pi n}} \left(1 + O\!\left(\frac{1}{n}\right)\right) .
\end{align*}
As was to be shown. In deriving this result we have made repeated use of the algebra of
order symbols.
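This asymptotic is easy to test numerically; the following added sketch (the sample values of n are arbitrary) compares the exact central binomial coefficient against 2^{2n}/sqrt(pi n):

```python
import math

# Compare C(2n, n) with its Stirling approximation 2^(2n)/sqrt(pi*n);
# the relative error should shrink like 1/n (roughly 1/(8n)).
errors = []
for n in [10, 50, 100]:
    exact = math.comb(2 * n, n)
    approx = 4 ** n / math.sqrt(math.pi * n)
    errors.append(abs(exact - approx) / exact)
    print(n, errors[-1])
```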
Exercise C.1-14 (maximizing the entropy function)

The entropy function is given by H(\lambda) = -\lambda \lg(\lambda) - (1-\lambda)\lg(1-\lambda). Taking the derivative of H with respect to \lambda gives
\[
H'(\lambda) = -\lg(\lambda) - \frac{1}{\ln(2)} + \lg(1-\lambda) + \frac{1}{\ln(2)} = -\lg(\lambda) + \lg(1-\lambda) .
\]
When we set this equal to zero looking for an extremum we have that
\[
-\lg(\lambda) + \lg(1-\lambda) = 0 ,
\]
which has as its solution \lambda = \frac{1}{2}. We can check that this point is truly a maximum and not a minimum by computing the second derivative of H(\lambda). We have that
\[
H''(\lambda) = -\frac{1}{\lambda \ln(2)} - \frac{1}{(1-\lambda)\ln(2)} ,
\]
which is negative when \lambda = \frac{1}{2}.
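A small numerical sketch (added; the grid resolution is arbitrary) confirms that the binary entropy is maximized at lambda = 1/2 with H(1/2) = 1:

```python
import math

# Evaluate the binary entropy H on a grid and locate its maximum.
def H(lam):
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)

grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=H)
print(best, H(best))  # maximum at 0.5 where H equals 1.0
```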
Taking the derivative of this expression with respect to x gives the following
\[
n(x+y)^{n-1} = \sum_{k=0}^{n} \binom{n}{k} k x^{k-1} y^{n-k} .
\]
Evaluating this at x = y = 1 then gives
\[
n 2^{n-1} = \sum_{k=0}^{n} \binom{n}{k} k ,
\]
which is the desired result.
To do this, from the events A_j define the sets C_j as
\begin{align*}
C_1 &= A_1 \\
C_2 &= A_2 \setminus A_1 \\
C_3 &= A_3 \setminus (A_1 \cup A_2) \\
C_4 &= A_4 \setminus (A_1 \cup A_2 \cup A_3) \\
&\;\;\vdots \\
C_j &= A_j \setminus (A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_{j-1}) .
\end{align*}
Then by construction
\[
A_1 \cup A_2 \cup A_3 \cup \cdots = C_1 \cup C_2 \cup C_3 \cup \cdots ,
\]
and the C_j's are disjoint, so that we have
\[
\Pr(A_1 \cup A_2 \cup A_3 \cup \cdots) = \Pr(C_1 \cup C_2 \cup C_3 \cup \cdots) = \sum_j \Pr(C_j) .
\]
After we draw the first card an important subcalculation is to compute the number of ways to draw two sorted cards from a list. Assuming we have n ordered items in our list, then we have n ways to draw the first element and n - 1 ways to draw the second element, giving n(n-1) ways to draw two elements. In each such pair only one of the two orderings results in a correctly sorted list. Thus, discarding the half that are incorrectly ordered, what remains is
\[
N_2(n) = \frac{n(n-1)}{2} .
\]
The number of total ways to draw three sorted cards can now be found. If we first draw an
eight we have two cards from which to draw the two remaining ordered cards in N2 (2) ways.
If we instead first drew a seven we have three cards from which to draw the two remaining
ordered cards in N2 (3) ways. If instead we first drew a six then we have four cards from
which to draw two ordered cards in N2 (4) ways. Continuing, if we first draw a one, we have
nine cards from which to draw two ordered cards in N2 (9) ways. Since each of these options
is mutually exclusive the total number of ways to draw three ordered cards is given by
\[
N_2(2) + N_2(3) + \cdots + N_2(9) = \sum_{n=2}^{9} N_2(n) = \sum_{n=2}^{9} \frac{n(n-1)}{2} = 120 .
\]
Finally with this number we can calculate the probability of drawing three ordered cards as
\[
\frac{120}{720} = \frac{1}{6} .
\]
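This count can be verified by brute force; the following added sketch (assuming ten cards labeled 1 through 10, consistent with the discussion above) enumerates all ordered draws of three cards:

```python
from itertools import permutations

# Count ordered three-card draws from ten cards that come out sorted.
draws = list(permutations(range(1, 11), 3))
increasing = [d for d in draws if d[0] < d[1] < d[2]]
print(len(increasing), len(draws))  # 120 of 720, i.e. probability 1/6
```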
unbiased coin flip does not match the digit in the binary expansion of a/b. Then for each flip a success can occur with probability p = 1/2 and a failure with probability q = 1 - p = 1/2, so the number of flips N needed before a success is a geometric random variable with parameter p = 1/2. That is
\[
P\{N = n\} = q^{n-1} p .
\]
So the expected number of flips required for a success is the expectation of the geometric random variable and is given by
\[
E[N] = \frac{1}{p} = 2 .
\]
Using the definition of conditional probability we have
\[
P\{A|B\} + P\{\bar{A}|B\} = \frac{P\{A \cap B\}}{P\{B\}} + \frac{P\{\bar{A} \cap B\}}{P\{B\}} .
\]
Now since the events A \cap B and \bar{A} \cap B are mutually exclusive the above equals
\[
\frac{P\{(A \cap B) \cup (\bar{A} \cap B)\}}{P\{B\}} = \frac{P\{(A \cup \bar{A}) \cap B\}}{P\{B\}} = \frac{P\{B\}}{P\{B\}} = 1 .
\]
Continuing to peel off terms from the back we eventually obtain the requested expression, i.e.
\[
P\{A_1 \cap A_2 \cap \cdots \cap A_{n-1} \cap A_n\} = P\{A_1\}\, P\{A_2|A_1\}\, P\{A_3|A_1 \cap A_2\} \cdots P\{A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1}\} .
\]
\[
P(A_{i,j}) = \frac{1}{365} ,
\]
since given the specification of either one person's birthday the probability that the other person will have that birthday is 1/365. Now we have that
\[
P(A_{i,j} \cap A_{r,s}) = P(A_{i,j}|A_{r,s})\, P(A_{r,s}) = \frac{1}{365} \cdot \frac{1}{365} = \frac{1}{365^2} .
\]
This is because P (Ai,j |Ar,s ) = P (Ai,j ) i.e. the fact that persons r and s have the same
birthday has no effect on whether the persons i and j have the same birthday. This is true
even if one of the people in the pairs (i, j) and (r, s) is the same. When we consider the intersection of all the sets A_{i,j}, the situation changes. This is because the event \cap_{(i,j)} A_{i,j} (where the intersection is over all pairs (i, j)) is the event that every pair of people have the same birthday, i.e. that everyone considered has the same birthday. This will happen with probability
\[
\left(\frac{1}{365}\right)^{n-1} ,
\]
while if the events A_{i,j} were independent the required probability would be
\[
\prod_{(i,j)} P(A_{i,j}) = \left(\frac{1}{365}\right)^{\binom{n}{2}} = \left(\frac{1}{365}\right)^{\frac{n(n-1)}{2}} .
\]
Since \binom{n}{2} \ne n - 1, these two results are not equal and the totality of events A_{i,j} are not independent.
\[
\left( \frac{1}{2}\left(p + \frac{1}{2}\right) \right)^{2} \ne \frac{1}{2}\left(p^2 + \frac{1}{4}\right) ;
\]
simplifying this result, equality only holds if p = 1/2, which we are assuming is not true.
Now let event E denote the event that we are told that the coin selected and flipped by both parties is the fair one, i.e. that event C2 happens. Then we have
\[
P(H_1|E) = \frac{1}{2} \quad \text{and} \quad P(H_2|E) = \frac{1}{2} ,
\]
while
\[
P(H_1, H_2|E) = \frac{1}{4} .
\]
Since in this case we do have P (H1 , H2 |E) = P (H1|E)P (H2|E) we have the required conditional independence. Intuitively, this result makes sense because once we know that the coin
flipped is fair we expect the result of each flip to be independent.
\[
\frac{2}{3} \cdot 1 = \frac{2}{3} ;
\]
since this is greater than the probability we would win under the strategy where we don't switch, switching is the action we should take.
\[
P(X|Y) = \frac{P(Y|X)\, P(X)}{P(Y)} .
\]
\[
\frac{1}{3} + 0 + \frac{1}{3} = \frac{2}{3} .
\]
\[
\frac{1 \cdot (1/3)}{2/3} = \frac{1}{2} > \frac{1}{3} .
\]
Thus the probability that prisoner X will be set free has increased and prisoner X has
learned from his question.
Proof of the cumulative summation formula for E[X] (book notes page 1109)
We have from the definition of the expectation that
\[
E[X] = \sum_{i=0}^{\infty} i\, P\{X = i\} .
\]
Expressing this in terms of the complement of the cumulative distribution function P\{X \ge i\} gives E[X] equal to
\[
\sum_{i=0}^{\infty} i \left( P\{X \ge i\} - P\{X \ge i+1\} \right) .
\]
Using the theory of difference equations we write the difference in probabilities above in terms of the operator \Delta defined on a discrete function g(i) as
\[
\Delta g(i) \equiv g(i+1) - g(i) ,
\]
giving
\[
E[X] = -\sum_{i=0}^{\infty} i\, \Delta_i P\{X \ge i\} .
\]
Now using the discrete version of integration by parts (demonstrated here for two discrete functions f and g) we have
\[
\sum_{i=0}^{\infty} f(i)\, \Delta g(i) = f(i) g(i) \Big|_{i=0}^{\infty} - \sum_{i=1}^{\infty} \Delta f(i-1)\, g(i) ,
\]
so that with f(i) = i and g(i) = P\{X \ge i\} (and noting that the boundary term vanishes when E[X] < \infty) we get
\[
E[X] = -\, i\, P\{X \ge i\} \Big|_{i=0}^{\infty} + \sum_{i=1}^{\infty} P\{X \ge i\} = \sum_{i=1}^{\infty} P\{X \ge i\} .
\]
         1        2        3        4        5        6
  1    (2,1)    (3,2)    (4,3)    (5,4)    (6,5)    (7,6)
  2    (3,2)    (4,2)    (5,3)    (6,4)    (7,5)    (8,6)
  3    (4,3)    (5,3)    (6,3)    (7,4)    (8,5)    (9,6)
  4    (5,4)    (6,4)    (7,4)    (8,4)    (9,5)   (10,6)
  5    (6,5)    (7,5)    (8,5)    (9,5)   (10,5)   (11,6)
  6    (7,6)    (8,6)    (9,6)   (10,6)   (11,6)   (12,6)

Table 7: The possible values for the sum (the first number in each pair) and the maximum (the second number) observed when two dice are rolled; rows index the first die and columns the second.
which is the desired sum. To see this another way we can explicitly write out the summation given above and cancel terms as in
\begin{align*}
E[X] &= \sum_{i=0}^{\infty} i \left( P\{X \ge i\} - P\{X \ge i+1\} \right) \\
&= 0 + \left( P\{X \ge 1\} - P\{X \ge 2\} \right) \\
&\quad + 2 \left( P\{X \ge 2\} - P\{X \ge 3\} \right) \\
&\quad + 3 \left( P\{X \ge 3\} - P\{X \ge 4\} \right) + \cdots \\
&= P\{X \ge 1\} + P\{X \ge 2\} + P\{X \ge 3\} + \cdots ,
\end{align*}
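The cumulative summation formula can be sanity-checked on a concrete distribution; here is an added sketch using a fair six-sided die (for which E[X] = 3.5):

```python
# Check E[X] = sum_{i>=1} P(X >= i) for a fair die.
pmf = {i: 1 / 6 for i in range(1, 7)}
expectation = sum(i * p for i, p in pmf.items())
tail_sum = sum(
    sum(p for j, p in pmf.items() if j >= i) for i in range(1, 7)
)
print(expectation, tail_sum)  # both equal 3.5 up to floating point
```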
Exercise C.3-2 (expectation of the index of the max and min of a random array)
Since the array elements are assumed random the maximum can be at any of the n indices
with probability 1/n. Thus if we define X to be the random variable representing the location
of the maximum of our array we see that the expectation of X is given by
\[
E[X] = 1 \cdot \frac{1}{n} + 2 \cdot \frac{1}{n} + \cdots + n \cdot \frac{1}{n}
= \frac{1}{n} \sum_{k=1}^{n} k = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2} .
\]
The same argument applies to the index of the minimum, so its expectation is also \frac{n+1}{2}.
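A short Monte-Carlo sketch (added; the array size, seed, and trial count are arbitrary choices) illustrates this expected value:

```python
import random

# Estimate the expected 1-based index of the maximum of a random array.
random.seed(0)
n, trials = 10, 100_000
total = 0
for _ in range(trials):
    a = [random.random() for _ in range(n)]
    total += a.index(max(a)) + 1  # 1-based index of the maximum
print(total / trials)  # close to (n + 1) / 2 = 5.5
```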
\[
\frac{17}{216} = 0.0787 .
\]
To verify that these numbers are correct, in the Matlab file exercise_C_3_3.m a Monte-Carlo simulation is developed verifying these values. This function can be run with the command exercise_C_3_3(100000).
We have
\[
E[\max(X,Y)] = \sum_x \sum_y \max(x,y)\, P\{X = x, Y = y\} \le \sum_x \sum_y (x+y)\, P\{X = x, Y = y\} ,
\]
since X and Y are non-negative. Then using linearity of the above we have
\begin{align*}
E[\max(X,Y)] &\le \sum_x \sum_y x\, P\{X = x, Y = y\} + \sum_x \sum_y y\, P\{X = x, Y = y\} \\
&= \sum_x x\, P\{X = x\} + \sum_y y\, P\{Y = y\} \\
&= E[X] + E[Y] ,
\end{align*}
which is the desired result.
We want to show that
\[
P\{X \ge t\} \le \frac{E[X]}{t} .
\]
To do this, split the expectation as
\[
E[X] = \sum_{x < t} x\, P\{X = x\} + \sum_{x \ge t} x\, P\{X = x\} .
\]
But since X is non-negative we can drop the expression \sum_{x < t} x P\{X = x\} from the above and obtain a lower bound. Specifically we have
\[
E[X] \ge \sum_{x \ge t} x\, P\{X = x\} \ge t \sum_{x \ge t} P\{X = x\} = t\, P\{X \ge t\} ,
\]
or the desired result.
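Markov's inequality can be illustrated exactly on a small example; this added sketch takes X to be the roll of a fair die (an arbitrary choice of non-negative random variable):

```python
# Check Markov's inequality P(X >= t) <= E[X]/t for a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
ex = sum(outcomes) / len(outcomes)  # E[X] = 3.5
for t in [2, 3, 4, 5, 6]:
    tail = sum(1 for x in outcomes if x >= t) / len(outcomes)
    assert tail <= ex / t
    print(t, tail, ex / t)
```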
Since X is a Bernoulli random variable we have
\[
P\{X = 0\} = 1 - p \quad \text{and} \quad P\{X = 1\} = p .
\]
Thus
\begin{align*}
\mathrm{Var}(X) &= E[X^2] - E[X]^2 \\
&= p - p^2 = p(1-p) \\
&= E[X](1 - E[X]) \\
&= E[X]\, E[1 - X] .
\end{align*}
Where the transformation 1 - E[X] = E[1 - X] is possible because of the linearity of the expectation.
Similarly, for the variance of aX we find
\begin{align*}
\mathrm{Var}(aX) &= E[(aX)^2] - E[aX]^2 \\
&= E[a^2 X^2] - a^2 E[X]^2 \\
&= a^2 \left( E[X^2] - E[X]^2 \right) \\
&= a^2\, \mathrm{Var}(X) .
\end{align*}
while the sum of 11 occurs for the elements (6,5), (5,6). Then the probability we roll a 7 or an 11 is given by
\[
p = \frac{6}{36} + \frac{2}{36} = \frac{8}{36} = \frac{2}{9} .
\]
If we take this as the probability of success then the remaining results from the book follow.
We have
\begin{align*}
\sum_{k=1}^{\infty} p (1-p)^{k-1} &= \frac{p}{q} \sum_{k=1}^{\infty} q^{k} \\
&= \frac{p}{q} \left( \sum_{k=0}^{\infty} q^{k} - 1 \right) \\
&= \frac{p}{q} \left( \frac{1}{1-q} - 1 \right) \\
&= \frac{p}{q} \left( \frac{1}{p} - 1 \right) \\
&= \frac{p}{q} \cdot \frac{1-p}{p} = 1 ,
\end{align*}
as we were to show.
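The identity can also be observed numerically; this added sketch sums the geometric pmf for an arbitrary choice of p:

```python
# Partial sums of the geometric pmf p*(1-p)**(k-1) approach 1,
# illustrating the identity just derived (p = 0.3 is arbitrary).
p = 0.3
partial = sum(p * (1 - p) ** (k - 1) for k in range(1, 200))
print(partial)  # very close to 1
```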
Exercise C.4-2 (average number of times to obtain three heads and tails)
Let one trial consist of flipping six coins. Then we consider a success when we have obtained three heads and three tails. Since there are 2^6 possible outcomes of six ordered flips, of which \binom{6}{3} contain exactly three heads, the probability of success p for this experiment is given by
\[
p = \frac{\binom{6}{3}}{2^6} = \frac{20}{64} = 0.3125 .
\]
Since the number of trials needed to obtain a success is a geometric random variable, the average number of times we must perform this experiment is given by the mean of a geometric distribution with probability of success p, or
\[
\frac{1}{p} = \frac{1}{0.3125} = 3.2 .
\]
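These two numbers are easy to reproduce; the following added sketch computes them directly:

```python
import math

# Probability of exactly three heads in six fair flips, and the expected
# number of six-flip trials until that event first occurs.
p = math.comb(6, 3) / 2 ** 6
print(p, 1 / p)  # 0.3125 and 3.2
```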
\[
n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^{n} \left(1 + \Theta\!\left(\frac{1}{n}\right)\right) ,
\]
we can simplify the factorial expression above (dropping the order symbols) as
\begin{align*}
\frac{n!}{(nq)!\,(np)!}
&\approx \frac{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}}{\sqrt{2\pi nq}\left(\frac{nq}{e}\right)^{nq} \sqrt{2\pi np}\left(\frac{np}{e}\right)^{np}} \\
&= \left( \frac{n}{2\pi (nq)(np)} \right)^{1/2} \frac{n^n}{(nq)^{nq}(np)^{np}} \\
&= \left( \frac{1}{2\pi n q p} \right)^{1/2} \frac{1}{q^{nq}\, p^{np}} .
\end{align*}
Thus multiplying this by p^{np} q^{nq} we find
\[
b(np; n, p) = \binom{n}{np} p^{np} q^{nq} \approx \left( \frac{1}{2\pi npq} \right)^{1/2} ,
\]
as we were to show.
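This approximation is easy to test numerically; in the following added sketch the values p = 0.4 and n = 1000 are arbitrary choices for which np is an integer:

```python
import math

# Compare the central binomial term b(np; n, p) with 1/sqrt(2*pi*n*p*q).
n, p = 1000, 0.4
q = 1 - p
k = int(n * p)
exact = math.comb(n, k) * p ** k * q ** (n - k)
approx = 1 / math.sqrt(2 * math.pi * n * p * q)
print(exact, approx)  # agreement to within O(1/n)
```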
we see that for large n the probability b(0; n, 1/n) is approximately e^{-1}. The probability of exactly one success is given by b(1; n, 1/n). This is equal to
\begin{align*}
b(1; n, 1/n) &= \binom{n}{1} \left(\frac{1}{n}\right)^{1} \left(1 - \frac{1}{n}\right)^{n-1} \\
&= n \cdot \frac{1}{n} \left(1 - \frac{1}{n}\right)^{n-1} \\
&= \left(1 - \frac{1}{n}\right)^{n-1} \\
&= \frac{\left(1 - \frac{1}{n}\right)^{n}}{1 - \frac{1}{n}} \to e^{-1} \quad \text{as } n \to +\infty .
\end{align*}
Using the limiting argument above we have this probability equal to e^{-1} as n goes to infinity.
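The convergence of both b(0; n, 1/n) and b(1; n, 1/n) to 1/e can be observed directly (an added sketch; the sample values of n are arbitrary):

```python
import math

# Both the zero-success and one-success probabilities approach 1/e.
for n in [10, 100, 10000]:
    b0 = (1 - 1 / n) ** n
    b1 = (1 - 1 / n) ** (n - 1)  # n * (1/n) * (1 - 1/n)**(n-1)
    print(n, b0, b1, 1 / math.e)
```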
Guildenstern when his flip lands tails. Then both professors will obtain the same number of heads if, say, professor Rosencrantz has k successes while professor Guildenstern has n - k successes. Thus counting the number of sequences (from our 4^n) that have the same number of heads for both professors is an equivalent problem to selecting k + (n - k) = n total locations from among 2n possible and declaring these to be the locations of the successes for both professors. This means that if a success is placed before the (n+1)st flip it is considered to be a success for professor Rosencrantz and denotes a head (the remaining flips for Rosencrantz are then all tails). If a success falls after the nth flip it is considered to be a success for professor Guildenstern and is considered to be a tail (the remaining flips for Guildenstern
are considered to be all heads). The number of sequences with n total successes is given by \binom{2n}{n}, and so the probability of obtaining the same number of heads is given by
\[
\frac{\binom{2n}{n}}{4^n} ,
\]
as claimed. Using the result from Exercise C.1-13 which derives the approximation
\[
\binom{2n}{n} = \frac{4^n}{\sqrt{\pi n}} \left(1 + O\!\left(\frac{1}{n}\right)\right) ,
\]
we can simplify the probability of obtaining the same number of heads and find that it is approximately equal to
\[
\frac{1}{\sqrt{\pi n}} \left(1 + O\!\left(\frac{1}{n}\right)\right) .
\]
Another way to approach this problem is to explicitly sum the probability that professor Rosencrantz has k successes while professor Guildenstern has n - k successes. The probability that both these events happen (since they are independent) is given by the product of the appropriate binomial distributions, i.e.
\[
b(k; n, 1/2)\, b(n-k; n, 1/2) .
\]
So the total probability that the two professors get the same number of heads is given by the sum of this probability over all possible k's (k = 0 to k = n), or
\begin{align*}
\sum_{k=0}^{n} b(k; n, 1/2)\, b(n-k; n, 1/2) &= \sum_{k=0}^{n} b(k; n, 1/2)^2 \\
&= \sum_{k=0}^{n} \left( \binom{n}{k} \left(\frac{1}{2}\right)^{n} \right)^{2} \\
&= \frac{1}{4^n} \sum_{k=0}^{n} \binom{n}{k}^{2} .
\end{align*}
In the above we have used the symmetry of the binomial distribution with respect to k and n - k, i.e. b(k; n, 1/2) = b(n-k; n, 1/2), which is a special case of the result shown in Exercise C.4-3 above. Equating these two results we have a nice combinatorial identity
\[
\sum_{k=0}^{n} \binom{n}{k}^{2} = \binom{2n}{n} ,
\]
as claimed.
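This identity is easy to verify exhaustively for small n; the following added sketch does so with exact integer arithmetic:

```python
import math

# Check the combinatorial identity sum_k C(n,k)^2 = C(2n,n) for small n.
for n in range(0, 12):
    lhs = sum(math.comb(n, k) ** 2 for k in range(n + 1))
    assert lhs == math.comb(2 * n, n)
print("identity holds for n = 0..11")
```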
so that b(k; n, 1/2) = \binom{n}{k} 2^{-n} \le 2^{nH(k/n)}\, 2^{-n} = 2^{nH(k/n) - n} .
\[
X = \sum_{i=1}^{n} I_i .
\]
Here I_i is one if the ith Bernoulli trial is a success and zero otherwise. Since P\{X < k\} is the sum of the probabilities that X takes the values 0, 1, \ldots, k-1, i.e.
\[
P\{X < k\} = \sum_{i=0}^{k-1} P\{X = i\} ,
\]
the probability we are attempting to bound is given in terms of equality probabilities i.e.
expressions like P {X = i}. Now for the random variable X to be exactly i means that only
i of the Bernoulli trials were a success (no more and no less) and the remaining trials were a
failure. Thus if we select a subset of size i from the n total Bernoulli random variables, the
above probability is expressed as the sum over all such subsets of size i i.e.
\[
P\{X = i\} = \sum_{\ell} p_{\ell_1} p_{\ell_2} \cdots p_{\ell_i} (1 - p_{\ell_{i+1}})(1 - p_{\ell_{i+2}}) \cdots (1 - p_{\ell_n}) . \qquad (26)
\]
Here the sum is over all possible subsets of size i from n, and the indices \ell_j select which of the i Bernoulli indicator variables from the n were successful. Now since p_i \le p it follows that 1 - p_i \ge 1 - p, and thus using these for each factor in the product we have that the term in the sum above is bounded as
\[
p_{\ell_1} p_{\ell_2} \cdots p_{\ell_i} (1 - p_{\ell_{i+1}})(1 - p_{\ell_{i+2}}) \cdots (1 - p_{\ell_n}) \le p^{i} (1-p)^{n-i} .
\]
Since there are \binom{n}{i} total terms in the sum above we see that
\[
P\{X = i\} \le \binom{n}{i} p^{i} (1-p)^{n-i} = b(i; n, p) . \qquad (27)
\]
Thus summing over i we have
\[
P\{X < k\} = \sum_{i=0}^{k-1} P\{X = i\} \le \sum_{i=0}^{k-1} b(i; n, p) ,
\]
which is the desired bound on the left tail.
Similarly for the right tail we have
\[
P\{X \ge k\} = \sum_{i=k}^{n} P\{X = i\} \le \sum_{i=k}^{n} b(i; n, p) = P\{X' \ge k\} ,
\]
where X' is a binomial random variable with parameters (n, p).
From Corollary C.5 we have (with X the number of successes obtained in our 4n flips) that
\[
P\{X < n\} = \sum_{i=0}^{n-1} b(i; 4n, 1/2) < b(n; 4n, 1/2) .
\]
Now
\begin{align*}
b(n; 4n, 1/2) &= \binom{4n}{n} \left(\frac{1}{2}\right)^{n} \left(\frac{1}{2}\right)^{4n-n} \\
&= \frac{(4n)!}{(3n)!\, n!} \left(\frac{1}{2}\right)^{4n} .
\end{align*}
Since the numerator in the above has n factors we can break this factor up into n products (each of which is less than 4) like the following
\[
\left(\frac{4n}{n}\right)\left(\frac{4n-1}{n-1}\right)\left(\frac{4n-2}{n-2}\right)\left(\frac{4n-3}{n-3}\right) \cdots \left(\frac{3n+2}{2}\right)\left(\frac{3n+1}{1}\right) .
\]
Therefore since each term in the above is bounded by 4 and there are n of them we can further bound b(n; 4n, 1/2) as follows
\[
b(n; 4n, 1/2) < 4^{n} \left(\frac{1}{2}\right)^{4n} = \left(\frac{1}{4}\right)^{n} .
\]
Thus since
\[
\sum_{i=0}^{n-1} b(i; 4n, 1/2) < \left(\frac{1}{4}\right)^{n} < \left(\frac{1}{2}\right)^{n} ,
\]
we see that obtaining fewer than n heads in 4n flips of a fair coin is less likely than obtaining no heads when flipping a fair coin n times.
We want to show that
\[
\sum_{i=k+1}^{n} b(i; n, p) < \frac{(n-k)p}{k - np}\, b(k; n, p) .
\]
To prove this let's first consider the ratio b(i+1; n, p)/b(i; n, p), which is given by
\[
\frac{b(i+1; n, p)}{b(i; n, p)} = \frac{\binom{n}{i+1} p^{i+1} (1-p)^{n-i-1}}{\binom{n}{i} p^{i} (1-p)^{n-i}} = \frac{n-i}{i+1} \cdot \frac{p}{1-p} .
\]
Thus (since we assume that i is taken between k and n, i.e. that k < i < n) we have
\[
\frac{b(i+1; n, p)}{b(i; n, p)} \le \frac{n-k}{k+1} \cdot \frac{p}{1-p} .
\]
Defining x to be the above expression we see that x < 1 through the following arguments (using k > np):
\[
x < \left(\frac{n - np}{np + 1}\right) \frac{p}{1-p} = \frac{n(1-p)p}{(np+1)(1-p)} = \frac{np}{np+1} < \frac{np}{np} = 1 .
\]
Now using the ratio above and the definition of x we have that b(i+1; n, p) is bounded by b(i; n, p) as
\[
b(i+1; n, p) < x\, b(i; n, p) .
\]
In the same way, b(i+2; n, p) is bounded by b(i; n, p) as
\[
b(i+2; n, p) < x\, b(i+1; n, p) < x^{2}\, b(i; n, p) .
\]
Continuing this iteration scheme for i+3, i+4, \ldots we see that the right tail probability is bounded above by
\[
\sum_{i=k+1}^{n} b(i; n, p) < x\, b(k; n, p) + x^{2} b(k; n, p) + \cdots + x^{n-k} b(k; n, p)
< b(k; n, p) \sum_{i=1}^{\infty} x^{i} = b(k; n, p)\, \frac{x}{1-x} .
\]
Since x/(1-x) is monotonically increasing in x, using the bound on x above we have
\[
\frac{x}{1-x} \le \frac{\frac{(n-k)p}{(k+1)(1-p)}}{1 - \frac{(n-k)p}{(k+1)(1-p)}} = \frac{(n-k)p}{1 - p + k - np} < \frac{(n-k)p}{k - np} ,
\]
so we can finally conclude that
\[
\sum_{i=k+1}^{n} b(i; n, p) < \frac{(n-k)p}{k - np}\, b(k; n, p) ,
\]
as we were to show.
We next prove Corollary C.7, which is the statement that given a sequence of n Bernoulli trials each with probability p of being a success and a specific number of successes k in the right tail of our distribution (\frac{np+n}{2} < k < n), the probability of obtaining more than k successes is less than one half the probability of obtaining more than k - 1 successes. This is really a statement that as we require more and more successes from our n Bernoulli trials, the probabilities decrease geometrically (with rate 1/2). We will begin by showing that the coefficient of b(k; n, p) in Corollary C.6 is less than one. Since \frac{np+n}{2} < k < n we have
\[
\frac{(n-k)p}{k - np} < \frac{\left(n - \frac{np+n}{2}\right) p}{\frac{np+n}{2} - np} = \frac{\frac{n}{2}(1-p)\, p}{\frac{n}{2}(1-p)} = p < 1 .
\]
Thus from Corollary C.6 we have that
\[
P\{X > k\} = \sum_{i=k+1}^{n} b(i; n, p) < b(k; n, p) ,
\]
or equivalently that
\[
\frac{b(k; n, p)}{\sum_{i=k+1}^{n} b(i; n, p)} > 1 .
\]
Using this we can write the ratio of the two tail probabilities as
\[
\frac{P\{X > k\}}{P\{X > k-1\}}
= \frac{\sum_{i=k+1}^{n} b(i; n, p)}{b(k; n, p) + \sum_{i=k+1}^{n} b(i; n, p)}
= \frac{1}{\frac{b(k; n, p)}{\sum_{i=k+1}^{n} b(i; n, p)} + 1} < \frac{1}{1+1} = \frac{1}{2} ,
\]
which is the desired result.
Define p = \frac{a}{a+1} < 1; then
\[
q = 1 - p = 1 - \frac{a}{a+1} = \frac{1}{a+1} .
\]
With these definitions note that
\[
p^{i} q^{n-i} = \left(\frac{a}{a+1}\right)^{i} \left(\frac{1}{a+1}\right)^{n-i} = \frac{a^{i}}{(a+1)^{n}} ,
\]
i.e. a^{i} = (a+1)^{n} p^{i} q^{n-i}, and thus our desired sum is given by
\[
\sum_{i=0}^{k-1} \binom{n}{i} a^{i} = (a+1)^{n} \sum_{i=0}^{k-1} \binom{n}{i} p^{i} q^{n-i} .
\]
This can be seen as the left tail of the binomial distribution, for which we have developed bounds in the text. Specifically, if X is a binomial random variable with parameters (n, p), then using the bound
\[
P\{X < k\} = \sum_{i=0}^{k-1} b(i; n, p) < \frac{kq}{np - k}\, b(k; n, p) ,
\]
we have
\begin{align*}
\sum_{i=0}^{k-1} \binom{n}{i} a^{i}
&< (a+1)^{n} \cdot \frac{k \left(\frac{1}{a+1}\right)}{n \left(\frac{a}{a+1}\right) - k}\, b\!\left(k; n, \frac{a}{a+1}\right) \\
&= (a+1)^{n} \cdot \frac{k}{na - k(a+1)}\, b\!\left(k; n, \frac{a}{a+1}\right) ,
\end{align*}
as we were requested to show.
For the first identity we can apply Corollary C.9 directly to the binomial random variable X, since E[X] = np, to obtain
\[
P\{X - np \ge r\} \le \left(\frac{npe}{r}\right)^{r} .
\]
To show the second identity, we will apply Corollary C.9 to the random variable Y, but in this case the probability of success for each Bernoulli trial is a constant p. Thus the probability of failure in each Bernoulli trial is also a constant q = 1 - p. We thus obtain (since E[Y] = nq)
\[
P\{Y - nq \ge r\} \le \left(\frac{nqe}{r}\right)^{r} .
\]
Again using the fact that Y = n - X to write the above expression in terms of X we find
\[
P\{n - X - nq \ge r\} \le \left(\frac{nqe}{r}\right)^{r} ,
\]
or
\[
P\{n(1-q) - X \ge r\} \le \left(\frac{nqe}{r}\right)^{r} ,
\]
or
\[
P\{np - X \ge r\} \le \left(\frac{nqe}{r}\right)^{r} ,
\]
the desired result.
I was unable to derive the inequality
\[
p_i e^{\alpha q_i} + q_i e^{-\alpha p_i} \le e^{\alpha^2 / 2} ,
\]
as suggested in the text. If anyone has such a proof please contact me.
Using the notation in the text we have (by using Markov's inequality)
\[
P\{X - \mu \ge r\} \le E[e^{\alpha(X-\mu)}]\, e^{-\alpha r} .
\]
Since X is the number of successes in n independent Bernoulli trials, X = \sum_{i=1}^{n} X_i, the expectation of the exponential can be written as the product over the individual indicator random variables X_i as
\[
E[e^{\alpha(X-\mu)}] = \prod_{i=1}^{n} E[e^{\alpha(X_i - p_i)}] .
\]
The explicit expectation of each term in the product is easily computed using the definition of expectation and gives
\[
E[e^{\alpha(X_i - p_i)}] = p_i e^{\alpha q_i} + q_i e^{-\alpha p_i} .
\]
Thus using the bound p_i e^{\alpha q_i} + q_i e^{-\alpha p_i} \le e^{\alpha^2/2} we have
\[
E[e^{\alpha(X-\mu)}] = \prod_{i=1}^{n} E[e^{\alpha(X_i - p_i)}] \le \prod_{i=1}^{n} e^{\alpha^2/2} = e^{n\alpha^2/2} ,
\]
so that
\[
P\{X - \mu \ge r\} \le e^{n\alpha^2/2}\, e^{-\alpha r} = e^{-(\alpha r - n\alpha^2/2)} .
\]
To compute the tightest possible upper bound on this probability we can minimize the right hand side of this expression with respect to \alpha. This is equivalent to maximizing the expression \alpha r - n\alpha^2/2 with respect to \alpha. Taking the derivative of this expression and setting it equal to zero gives
\[
r - n\alpha = 0 ,
\]
which has \alpha = r/n as its solution. The second derivative of \alpha r - n\alpha^2/2 with respect to \alpha being negative implies that this value of \alpha corresponds to a maximum. The value of the quadratic at this maximal \alpha is given by
\[
\frac{r}{n}\, r - \frac{n}{2} \cdot \frac{r^2}{n^2} = \frac{r^2}{2n} .
\]
This result in turn implies that we have an upper bound on P\{X - \mu \ge r\} given by
\[
P\{X - \mu \ge r\} \le e^{-\frac{r^2}{2n}} ,
\]
as we were required to show.
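The derived bound can be illustrated by simulation; in the following added sketch the parameters n, r, the seed, and the trial count are arbitrary choices, with fair coins so that mu = n/2:

```python
import math
import random

# Monte-Carlo sketch of the bound P(X - mu >= r) <= exp(-r^2/(2n)) for the
# number of successes X in n fair Bernoulli trials.
random.seed(1)
n, r, trials = 100, 15, 20_000
mu = n / 2
hits = sum(
    1
    for _ in range(trials)
    if sum(random.randint(0, 1) for _ in range(n)) - mu >= r
)
bound = math.exp(-r ** 2 / (2 * n))
print(hits / trials, bound)  # the empirical frequency sits below the bound
```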
We can check that this point is indeed a minimum of our right-hand side by computing the sign of the second derivative at that point. The second derivative is given by
\begin{align*}
\frac{d^2}{d\alpha^2} \exp(\lambda e^{\alpha} - r\alpha)
&= \exp(\lambda e^{\alpha} - r\alpha)(\lambda e^{\alpha} - r)^2 + \exp(\lambda e^{\alpha} - r\alpha)(\lambda e^{\alpha}) \\
&= \exp(\lambda e^{\alpha} - r\alpha)\left(\lambda^2 e^{2\alpha} - 2r\lambda e^{\alpha} + r^2 + \lambda e^{\alpha}\right) \\
&= \exp(\lambda e^{\alpha} - r\alpha)\left(\lambda^2 e^{2\alpha} - (2r-1)\lambda e^{\alpha} + r^2\right) .
\end{align*}
Evaluating this expression at \alpha = \ln(r/\lambda) (and ignoring the exponential factor, which does not affect the sign of the second derivative) gives
\[
r^2 - (2r-1)r + r^2 = r > 0 ,
\]
so this value of \alpha is indeed a minimum.
placements of identical balls where no more than one ball can go in any given bin.
Part (e): We assume in this part that we have more balls than bins, i.e. n > b. Consider all of our n indistinguishable balls lined up in a row and note that our problem is equivalent to that of counting the number of ways we can select b - 1 spaces (which will represent the b - 1 internal bin boundaries) from all n - 1 possible spaces in our linear ordering of the balls. The number of ways to select b - 1 things from a set of n - 1 is given by
\[
\binom{n-1}{b-1} ,
\]
or the claimed number.
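This stars-and-bars count can be checked by brute force; the following added sketch enumerates all placements directly for a few small (n, b) pairs (arbitrary choices):

```python
import math
from itertools import product

# Count placements of n identical balls into b bins, every bin non-empty,
# by enumerating all b-tuples of positive counts summing to n.
def compositions(n, b):
    return sum(
        1 for counts in product(range(1, n + 1), repeat=b) if sum(counts) == n
    )

for n, b in [(5, 2), (6, 3), (7, 4)]:
    assert compositions(n, b) == math.comb(n - 1, b - 1)
print("counts match C(n-1, b-1)")
```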
References
[1] W. G. Kelley and A. C. Peterson. Difference Equations. An Introduction with Applications. Academic Press, New York, 1991.