Lectures On Number Theory - Lindahl
Lectures On Number Theory - Lindahl
Lectures On Number Theory - Lindahl
by
Lars-ke Lindahl
Contents
Preface
1. Divisibility 1
2. Prime Numbers 7
3. The Linear Diophantine Equation ax + by = c 11
4. Congruences 14
5. Linear Congruences 18
6. The Chinese Remainder Theorem 20
7. Public-Key Cryptography 25
8. Pseudoprimes 26
9. Polynomial Congruences with Prime Moduli 28
10. Polynomial Congruences with Prime Power Moduli 32
11. The Congruence x
2
a (mod m) 35
12. General Quadratic Congruences 39
13. The Legendre Symbol and the Lemma of Gauss 40
14. Quadratic Reciprocity 42
15. Primitive Roots 44
16. Arithmetic Functions 49
17. Sums of Squares 52
18. Pythagorean Triples 55
19. Fermats Last Theorem 57
20. Continued Fractions 58
21. Simple Continued Fractions 63
22. Rational Approximations to Irrational Numbers 66
23. Periodic Continued Fractions 71
24. Continued Fraction Expansion of
d 77
25. Pells Equation 79
Preface
The present lecture notes contain material for a 5 credit points course in Elementary
Number Theory. The formal prerequisites for the material are minimal; in particular no
previous course in abstract algebra is required. High school mathematics, familiarity with
proofs by mathematical induction and with the basic properties of limits of sequences of
real numbers (in particular the fact that a bounded monotone sequence of real numbers is
convergent) are all that is needed. (The discussion of the prime number counting function
(x) in section 2 requires more calculus skills, but this part could be skipped without any
loss of continuity.)
A preliminary version of these notes has been carefully reviewed by my student
JoakimElgh, andI wouldlike to thank himfor some very useful suggestions andimprove-
ments.
Uppsala, 2002 Lars-ke Lindahl
Lectures on Number Theory
1. Divisibility
Denition 1.1 An integer b is divisible by an integer a, written a[b, if there is an integer
x such that b = ax. We also say that b is a multiple of a, and that a is a divisor of b.
Any integer a has 1 and a as divisors. These divisors are called trivial.
The proof of the following simple properties are left to the reader.
Proposition 1.2 Let a, b and c be integers.
(i) If a[b and b ,= 0, then [a[ [b[ .
(ii) If a[b, then a[bc.
(iii) If a[b and b[c, then a[c.
(iv) If c[a and c[b, then c[(ax + by) for all integers x and y.
(v) If a[b and b[a, then a = b.
(vi) Assume c ,= 0. Then a[b if and only if ac[bc.
Denition 1.3 Every nonzero integer a has nitely many divisors. Consequently, any
two integers a and b, not both = 0 have nitely many common divisors. The greatest of
these is called the greatest common divisor and it is denoted by (a, b) .
In order not to have to avoid the special case a = b = 0, we also dene (0, 0) as the
number 0. (One good reason for this choice will appear in Theorem 1.9.)
By denition, if at least one of the numbers a and b is nonzero, then
d = (a, b) d[a d[b (x[a x[b x d).
Obviously, (b, a) = (a, b) = (a, b) = (a, b) = (a, b) , so when calculating the
greatest common divisor of two numbers we may replace them by their absolute values.
Example 1. The number 102 has the positive divisors 1, 2, 3, 6, 17, 34, 51, 102, and
the number 170 has the positive divisors 1, 2, 5, 10, 17, 34, 85, and 170. The common
positive divisors are 1, 2, 17, and 34. Hence (102, 170) = 34. To determine the greatest
common divisor by nding all common divisors is obviously not a feasible method if the
given numbers are large. .
2 Lectures on Number Theory
Proposition 1.4 For all integers n, (a, b) = (a nb, b) .
Proof. Write r = a nb; then a = r + nb. Assuming c[b we now see from Proposition
1.2 (iv) that c[a if and only if c[r . Consequently, the pairs a, b and a, r have the same
common divisors. In particular, they have the same greatest common divisor. .
We can extend the denition of greatest common divisor in a straightforward way.
Given n integers a
1
, a
2
, . . . , a
n
not all zero, we dene their greatest common divisor
(a
1
, a
2
, . . . , a
n
) to be the greatest integer which divides all the given numbers. Finally, we
dene (0, 0, . . . , 0) = 0.
If (a, b) = 1 we say that a and b are relatively prime. More generally, the integers a
1
,
a
2
, . . . , a
n
are called relatively prime if (a
1
, a
2
, . . . , a
n
) = 1, and they are called pairwise
relatively prime if any two of them are relatively prime.
Example 2. The numbers 4, 6, and 9 are relatively prime but not pairwise relatively
prime. .
Theorem 1.5 (The Division Algorithm) Given integers a and b with a > 0 there exist two
unique integers q and r such that b = aq + r and 0 r < a.
The number q is calledthe quotient and r is calledthe (principal) remainder. Obviously,
q = [b/a] (= the greatest integer b/a).
Proof. Consider the arithmetic progression
. . . , b 3a, b 2a, b a, b, b + a, b + 2a, b + 3a, . . .
This sequence contains a smallest non-negative number r . By denition, r = b qa for
some integer q, and clearly 0 r < a. This proves the existence.
To prove uniqueness, suppose we also have b = aq
/
+ r
/
with 0 r
/
< a. Then
r r
/
= a(q
/
q) and a < r r
/
< a.
Thus a[(r r
/
) , and it follows that r r
/
= 0 or [a[ [r r
/
[ . Since the latter case is
excluded, we conclude that r r
/
= 0, that is r = r
/
. Therefore a(q q
/
) = 0, which implies
q q
/
= 0, i.e. q = q
/
. .
More generally, we say that r
/
is a remainder when b is divided by a whenever
there is an integer q
/
such that b = aq
/
+r
/
without any further restriction on r
/
. If r
/
is an
arbitrary remainder and r is the principal remainder then obviously r
/
r = na for some
integer n, and conversely. For the principal remainder r we either have 0 r a/2
or a/2 < r < a, and in the latter case the remainder r
/
= r a satises the inequality
a/2 < r
/
< 0. Hence, there is always a uniqe remainder r satisfying the inequality
a/2 < r a/2. This is the remainder of least absolute value. We thus have the following
division algorithm, which for some purposes is more efcient than the ordinary one.
Theorem 1.5
/
(Modied Division Algorithm) Given integers a and b with a > 0 there
exist two unique integers q and r such that b = aq + r and a/2 < r a/2.
Example 3. 37 = 2 13 + 11 = 3 13 2. 11 is the principal remainder and 2 is the
least absolute remainder. .
We now turn to an important class of subsets of Z.
1. Divisibility 3
Denition 1.6 A non-empty set A of integers is called an ideal if it is closed under
subtraction and under multiplication by arbitrary integers, that is if it has the following
two properties:
(i) x, y A x y A
(ii) x A, n Z nx A.
Example 4. The sets 0, Z, and 0, 3, 6, 9, . . . are ideals. More generally, given
any integer g, the set A = ng [ n Z consisting of all multiples of g is an ideal. This
ideal is said to be generated by the number g, and it will be denoted by gZ. Thus, using
this notation, 3Z = 0, 3, 6, 9, . . ..
Note that the trivial ideal 0 is generated by 0 and that the whole set Z is generated
by 1. .
To show that a subset A of Z is an ideal it sufces to verify that (i) holds, because
we have the following result.
Proposition 1.7 A non-empty subset A of Z is an ideal if x, y A x y A.
Proof. Suppose A is a non-empty subset with property (i) of Denition 1.6, and let x
0
be an element of A. Since 0 = x
0
x
0
we rst note that 0 A. Then we see that
x A x = 0 x A and that x, y A x, y A x + y A, i.e. the set
A is closed under addition. Next assume that the implication x A nx A holds
for a certain nonnegative integer n (this is certainly true for n = 0). Then we also have
x A (n + 1)x = nx + x A. Hence, it follows by induction that x A nx A
holds for each nonnegative integer n. Finally, if x A and n is a negative integer, then
n is positive, so it follows rst that (n)x A and then that nx = (n)x A. This
shows that property (ii) of Denition 1.6 holds for A. .
Remark. The ideal concept is a ring concept. A ring is a set with two operations, addition and
multiplication, satisfying certain natural axioms. The integers Z forma ring, andanother important
example is given by the set of polynomials with ordinary polynomial addition and multiplication
as operations. For ideals in general rings, property (ii) does not follow from property (i). Thus the
ring Z is special in that respect.
The ideals that are listed in Example 4 are all generated by a single number g. We
next show that all ideals of Z have this property.
Theorem 1.8 Every ideal A is generated by a unique nonnegative number g, that is A = gZ =
ng [ n Z. If A is not equal to the zero ideal 0, then the generator g is the smallest positive
integer belonging to A.
Proof. The zero ideal is generated by 0, so assume that A contains some nonzero integer
x
0
. Since by (ii), A also contains the number x
0
( = (1)x
0
), A certainly contains a
positive integer. Let g be the least positive integer belonging to A.
We will prove that A is generated by the number g. That ng belongs to A for every
integer n follows immediately from (ii), so we only have to prove that there are no other
numbers in A. Therefore, let b A and divide b by g. By the division algorithm, there
exist integers q and r with 0 r < g such that b qg = r . Since qg A it follows
from (i) that r A, and since g is the least positive integer in A, we conclude that r = 0.
Hence b = qg as claimed. .
We will now use Theorem 1.8 to characterize the greatest common divisor. Let a
and b be two integers and consider the set
A = ax + by [ x, y Z.
4 Lectures on Number Theory
The set A is clearly closed under subtraction, i.e. A is an ideal, and by the previous
theorem, A is generated by a unique nonnegative number g. This number has the
following two properties
(i) There exist integers x
0
, y
0
such that ax
0
+ by
0
= g
(ii) For all integers x and y there exists an integer n such that ax + by = ng.
Taking x = 1 and y = 0 in (ii) we see that a = ng for some integer n and hence
g[a. Similarly, g[b, so g is a common divisor of a and b. Using (i), we see that every
common divisor of a and b is a divisor of g. In particular, the greatest common divisor
d = (a, b) divides g and hence d g. It follows that g is the greatest common divisor, i.e.
g = (a, b) .
This is also true in the trivial case a = b = 0, for then g = 0 and we have dened
(0, 0) to be the number 0.
Our discussion is summarized in the following theorem.
Theorem 1.9 The ideal ax+by [ x, y Z is generated by the greatest common divisor (a, b) ,
i.e.
(i) There exist integers x
0
and y
0
such that ax
0
+ by
0
= (a, b) .
(ii) ax + by is a multiple of (a, b) for all integers x and y.
The proof of Theorem 1.9 is easily extended to cover the case of n integers a
1
, a
2
,
. . . , a
n
instead of two integers a and b. The general result reads as follows.
Theorem 1.9
/
Let a
1
, a
2
, . . . , a
n
be any integers. The ideal
a
1
x
1
+ a
2
x
2
+ + a
n
x
n
[ x
1
, x
2
, . . . , x
n
Z
is generated by the greatest common divisor d = (a
1
, a
2
, . . . , a
n
) , i.e.
(i) There exist integers y
1
, y
2
, . . . , y
n
such that a
1
y
1
+ a
2
y
2
+ + a
n
y
n
= d.
(ii) a
1
x
1
+ a
2
x
2
+ + a
n
x
n
is a multiple of d for all integers x
1
, x
2
, . . . , x
n
.
Corollary 1.10 If c[a and c[b, then c [(a, b) , i.e. every common divisor of a and b is a divisor
of the greatest common divisor (a, b) .
Proof. By Theorem 1.9 (i) we have ax
0
+ by
0
= (a, b) , and the conclusion of the corollary
now follows from Proposition 1.2 (iv). .
Corollary 1.11 (i) (ca, cb) = c(a, b) for every nonnegative integer c.
(ii) If d = (a, b) ,= 0, then
_
a
d
,
b
d
_
= 1.
Proof. (i) Write d = (a, b) . By Theorem1.9, the ideal ax+by [ x, y Z is generatedby d.
Now cax + cby = c(ax + by) , so it follows that the ideal cax + cby [ x, y Z is generated
by cd. But the latter ideal is according to Theorem 1.9 also generated by the number
(ca, cb) . Since the nonnegative generator is unique, we conclude that (ca, cb) = cd.
(ii) By (i), d
_
a
d
,
b
d
_
= (a, b) = d. The result now follows upon division by d.
.
Theorem 1.12 If (a, b) = 1 and a[bc, then a[c.
Proof. Assume (a, b) = 1 and a[bc. Since clearly a[ac, it follows that a is a common
divisor of ac and bc. By Corollary 1.11, (ac, bc) = c(a, b) = c, and the conclusion a[c now
follows from Corollary 1.10. .
1. Divisibility 5
Theorem 1.13 If a[c, b[c and (a, b) = 1, then ab[c.
Proof. By assumption, c = am for some integer m. Since b[am and (b, a) = 1, we
conclude from Theorem 1.12 that b[m, that is m = bn for some integer n. Hence, c = abn,
i.e. ab[c. .
Theorem 1.14 If (a, b) = (a, c) = 1, then (a, bc) = 1.
Proof. By Theorem 1.9 there are integers x, y and z, w such that ax + by = 1 and
az + cw = 1. Then by cw = (1 ax)(1 az) = 1 an, where n = x + z axz is an integer.
Hence, an + bcyw = 1, and we conclude from Theorem 1.9 that (a, bc) = 1. .
We now turn to the problem of efciently calculating the greatest common divisor
of two integers a and b. We can of course assume that both are non-negative and that
a b.
If b = 0 then (a, b) = (a, 0) = a and there is nothing more to do. Otherwise, we use
Proposition 1.4 to see that (a, b) = (a nb, b) for all integers n. In particular, using the
ordinary division algoritm a = qb + r with 0 r < b we obtain
(1) (a, b) = (a qb, b) = (r, b) = (b, r).
If r = 0, then we are nished, because (a, b) = (b, 0) = b. Otherwise, (1) allows us to
replace the pair (a, b) with the smaller pair (b, r) , where r < b < a, and we can repeat
the whole procedure. Since at each step we get a new pair with smaller integers, we must
nally reach a stage where one of the numbers is 0.
The whole procedure may be summarized as follows.
Theorem 1.15 (The Euclidean Algorithm) Let a and b be integers with a b 0. Put
a
0
= a and b
0
= b.
(i) If b
0
= 0, then (a, b) = a
0
.
(ii) Otherwise, using the division algorithm calculate q and r such that a
0
= qb
0
+ r with
0 r < b
0
.
(iii) Put a
0
= b
0
and b
0
= r and go to (i).
The algorithm must terminate, because the successive b
0
:s form a decreasing sequence of
non-negative integers.
Instead of using the principal remainder, we could also use the remainder of least
absolute value at each step. In general, this procedure will require fewer iterations. This
modied algorithm runs as follows:
Theorem 1.15
/
(The Euclidean Algorithm with least absolute remainder) Let a and b
be integers with a b 0. Put a
0
= a and b
0
= b.
(i) If b
0
= 0, then (a, b) = a
0
.
(ii) Otherwise, using the division algorithm calculate q and r such that a
0
= qb
0
+ r with
[r[ b
0
/2.
(iii) Put a
0
= b
0
and b
0
= [r[ and go to (i).
(In (iii) we use the fact that (a
0
, b
0
) = (a
0
, b
0
) so it does not matter that we use [r[ in
order to get a nonnegative number b
0
.) Again, the algorithm must terminate because at
each step the new b
0
is at most half of the old one.
Example 5. Let us calculate (247, 91) . The ordinary division algorithm gives
247 = 2 91 + 65
91 = 1 65 + 26
65 = 2 26 + 13
26 = 2 13.
6 Lectures on Number Theory
Hence (247, 91) = (91, 65) = (65, 26) = (26, 13) = (13, 0) = 13.
By instead using least absolute remainders, we obtain the following sequence as a
result of the division algorithm:
247 = 3 91 26
91 = 3 26 + 13
26 = 2 13.
Hence (247, 91) = (91, 26) = (26, 13) = (13, 0) = 13. .
By Theorem 1.9, we know that the linear equation
ax + by = (a, b)
has at least one integer solution x
0
and y
0
. (We will see later that there are in fact
innitely many integer solutions.) As a by-product of the Euclidean Algorithm we have
an algorithm for nding such a solution. Denoting the successive pairs (a
0
, b
0
) obtained
during the process by (a
0
, b
0
) , (a
1
, b
1
) , (a
2
, b
2
) , . . . , (a
n
, b
n
) , with b
n
= 0, we have
a
0
= a, b
0
= b
a
i
= b
i1
, b
i
= a
i1
q
i
b
i1
for suitable integers q
i
, i = 1, 2, . . . , n
a
n
= (a, b)
It follows that each of the numbers a
i
and b
i
is a linear combination of the previous ones
a
i1
and b
i1
and hence ultimately a linear combination of a and b, that is a
i
= x
i
a + y
i
b
for suitable integers x
i
, y
i
, which can be found by calculating backwards, and similarly
for b
i
. In particular, this holds for (a, b) = a
n
.
Example 6. Going backwards in the calculations in Example 5, using the absolute
remainder variant, we nd that
13 = 91 3 26 = 91 3 (3 91 247) = 3 247 8 91.
Hence, the equation 247x+91y = (247, 91) has x = 3, y = 8 as one of its integer solutions.
.
The union I J of two ideals I = aZ and J = bZ in Z need not be an ideal. In
fact, the union is an ideal if and only if one of the two ideals I and J is a subset of
the other, i.e. if and only if one of the two generators a and b is divisible by the other.
However, there is always a smallest ideal which contains the union I J , namely the ideal
(a, b)Z = ax + by [ x, y Z. Thus, the greatest common divisor (a, b) is (uniquely
determined as) the non-negative generator of the smallest ideal containing the union
aZ bZ.
On the other hand, it is completely obvious that the intersection I J of two ideals
I = aZ and J = bZ is an ideal. (Indeed, the intersection of any number of ideals is an
ideal.) By denition, an integer x belongs to this intersection if and only if a[x and b[x,
i.e. if and only if x is a common multiple of a and b.
Thus, the ideal aZbZ coincides withthe set of all commonmultiples of the numbers
a and b. This observation leads us to the following concept, which is dual to the concept
of greatest common divisor.
2. Prime Numbers 7
Denition 1.16 Let a and b be two integers. The nonnegative generator of the ideal
aZ bZ is called the least common multiple of the two numbers, and it is denoted by
[a, b] . More generally, given any sequence a
1
, a
2
, . . . , a
n
of integers, we dene their least
common multiple [a
1
, a
2
, . . . , a
n
] to be the uniquely determined nonnegative generator
of the ideal a
1
Z a
2
Z . . . a
n
Z.
Note that [a, b] = 0 if a = 0 or b = 0, because the intersection aZ bZ is then equal
to the trivial ideal 0. If a and b are both nonzero, then aZ bZ is a nontrivial ideal
since it certainly contains the number ab. Thus, nontrivial common multiples exist, and
the least common multiple [a, b] is a positive integer in that case.
Example 7. [30, 42] =210, because in the sequence 30, 60, 90, 120, 150, 180, 210, . . . of
multiples of 30, the number 210 is the rst one that is also a multiple of 42. .
Proposition 1.17 [ca, cb] = c[a, b] if c is a nonnegative number.
Proof. [ca, cb]Z = caZ cbZ = c(aZ bZ) = c[a, b]Z. .
Proposition 1.18 Let a and b be nonnegative integers. Then [a, b] (a, b) = ab.
Proof. If one of the two numbers equals zero, then [a, b] = ab = 0, wo we may assume
that a and b are both positive. Let d = (a, b) . If d = 1, then any common multiple of a
and b must also by a multiple of ab, by Theorem 1.13, and it follows that ab must be the
least common multiple of a and b, i.e. ab = [a, b] = [a, b] (a, b) .
If d > 1, then
_
a
d
,
b
d
_
= 1. According to the case just proved,
_
a
d
,
b
d
=
a
d
b
d
. Now
multiply this equality by d
2
and apply Propostion 1.17 to obtain ab = d
2
_
a
d
,
b
d
= d [a, b] =
(a, b) [a, b] . .
2. Prime Numbers
Denition 2.1 An integer > 1 is called a prime number or a prime if it has only trivial
divisors. An integer > 1 which is not a prime is called composite.
Thus, p > 1 is a prime number if and only if 1 < x < p x,[ p.
Theorem 2.2 Let p be a prime number. If p[bc, then p[b or p[c.
Proof. Assume that p[bc but p,[ b. Since p has only trivial divisors, it follows that
(p, b) = 1. Hence p[c by Theorem 1.12. .
Theorem 2.2 is easily extended to
Theorem 2.2
/
Let p be a prime number. If p[b
1
b
2
b
n
, then p[b
i
for some i .
Proof. By Theorem 2.2, p[b
1
b
2
b
n
p[b
1
p[b
2
. . . b
n
. The result now follows by
induction. .
8 Lectures on Number Theory
Theorem 2.3 (The Fundamental Theorem of Arithmetic) Every integer n > 1 can be
expressed as a product of primes in a unique way apart from the order of the prime factors.
Proof. The existence of such a factorization is proved by induction. Assume that every
integer less than n can be written as a product of primes. If n is a prime, then we have
a factorization of n consisting of one prime factor. If n is composite, than n = n
1
n
2
with
1 < n
1
< n and 1 < n
2
< n, and it follows from the induction hypothesis that each of n
1
and n
2
is a product of primes. Therefore, n is also a product of primes.
Now suppose that there is an integer with to different factorizations. Then there
is a least such number n. Let n = p
1
p
2
p
r
= q
1
q
2
q
s
, where each p
i
and q
j
is
a prime and where the two factorizations are different. Since p
1
divides the product
q
1
q
2
q
s
, it follows from Theorem 2.2
/
that p
1
divides one of the prime numbers q
1
,
. . . , q
s
. Renumbering these numbers, we may assume that p
1
[q
1
, which of course means
that p
1
= q
1
. Dividing n by p
1
we get a smaller number
n
p
1
= p
2
p
3
p
r
= q
2
q
3
q
s
with two different prime factorizations, but this would contradict the assumption that n
is the smallest number with different factorizations. .
If the prime factorizations of two given numbers are known, then we can easily
determine their greatest common divisor and least common multiple.
Proposition 2.4 Let a and b be two positive integers and write a = p
m
1
1
p
m
2
2
p
m
k
k
and
b = p
n
1
1
p
n
2
2
p
n
k
k
, where p
1
, p
2
, . . . , p
k
are different primes and m
1
, m
2
, . . . , m
k
and
n
1
, n
2
, . . . , n
k
are nonnegative integers. Put d
j
= min(m
j
, n
j
) and D
j
= max(m
j
, n
j
) ; then
(a, b) = p
d
1
1
p
d
2
2
p
d
k
k
and [a, b] = p
D
1
1
p
D
2
2
p
D
k
k
.
Proof. Obvious. .
Theorem 2.5 There exist innitely many primes.
Proof. We will show that given any nite collection of primes p
1
, p
2
, . . . , p
n
there is a
prime q which does not belong to the collection. Let N = p
1
p
2
p
n
+1. By Theorem 2.3,
N has a prime factor q (which could be N itself). Since (N, p
j
) = (1, p
j
) = 1 for each j
whereas (N, q) = q, it follows that q ,= p
j
for each j . .
On the other hand, there are arbitrarily large gaps in the sequence of primes:
Proposition 2.6 For any natural number k there exist k consecutive composite numbers.
Proof. Consider the numbers (k + 1)! + 2, (k + 1)! + 3, . . . , (k + 1)! + (k + 1); they are
composite, because they are divisible by 2, 3, . . . , k + 1, respectively. .
Let (x) denote the number of primes that are less than or equal to the real number
x. Thus
(x) =
_
_
0 if x < 2
1 if 2 x < 3
2 if 3 x < 5
.
.
.
n if p
n
x < p
n+1
where p
n
denotes the nth prime number.
We will give a crude estimate for (x) . To this end, we will need the following
inequality.
2. Prime Numbers 9
Lemma 2.7 Let x be a real number > 2. Then
px
1
p
> lnln x 1.
Here, the sum is over all primes p satisfying p x.
Since lnln x tends to with x it follows from the inequality above that the sum
1/p over all primes is innite. This, of course, implies that there are innitely many
primes. Thus, by proving Lemma 2.7 we will obtain a new proof of Theorem 2.5.
Proof. Let p
1
, p
2
, . . . , p
n
denote all primes x, and put
^ = p
k
1
1
p
k
2
2
p
k
n
n
[ k
1
0, k
2
0, . . . , k
n
0,
i.e. ^ consists of 1 and all positive integers whose prime factorization only uses the
primes p
1
, p
2
, . . . , p
n
.
Since the factorization of any number x only uses primes that are x, the set ^
contains all of the numbers 1, 2, 3, . . . , [x] (= the greatest integer x). Consequently,
n^
1
n
[x]
n=1
1
n
_
[x]+1
1
dt
t
= ln([x] + 1) > ln x.
Now observe that
px
_
1
1
p
_
1
=
px
_
1 +
1
p
+
1
p
2
+ +
1
p
k
+
_
=
n^
1
n
.
Combining this with the previous inequality we obtain the inequality
px
_
1
1
p
_
1
> ln x,
and, by taking the logarithm of both sides, the inequality
(1)
px
ln
_
1
1
p
_
1
> lnln x.
Now use the Maclaurin expansion of ln(1 + x) to get
ln(1 x) = x +
x
2
2
+
x
3
3
+ . . . x +
x
2
2
(1 + x + x
2
+ . . .) = x +
x
2
2
_
1
1 x
_
for 0 x < 1. Since 1/(1 x) 2 when x
1
2
, we conclude that the inequality
ln(1 x)
1
= ln(1 x) x + x
2
holds for x
1
2
. In particular, if p is a prime, then
1
p
1
2
, and consequently,
ln(1
1
p
)
1
1
p
+
1
p
2
.
By summing these inequalities for all primes p x and comparing with (1), we obtain
10 Lectures on Number Theory
(2)
px
1
p
+
px
1
p
2
> lnln x.
Here the sum 1/p
2
over all primes x can be estimated as follows
px
1
p
2
n=2
1
n
2
n=2
1
n(n 1)
=
n=2
_
1
n 1
1
n
_
= 1,
and by combining this inequality with (2) we obtain the desired result
px
1
p
> lnln x 1. .
Lemma 2.8
px
1
p
=
(x)
x
+
_
x
2
(u)
u
2
du.
Proof. Let p
1
< p
2
< . . . < p
n
denote the primes x. Then
_
x
2
(u)
u
2
du =
n1
k=1
_
p
k+1
p
k
(u)
u
2
du +
_
x
p
n
(u)
u
2
du =
n1
k=1
_
p
k+1
p
k
k
u
2
du +
_
x
p
n
n
u
2
du
=
n1
k=1
k
_
1
p
k
1
p
k+1
_
+ n
_
1
p
n
1
x
_
=
n1
k=1
k
p
k
k=2
k 1
p
k
+
n
p
n
n
x
=
n
k=1
1
p
k
(x)
x
.
.
Theorem 2.9 For any > 0 and any real number , there exists a number x > such that
(x) > (1 )
x
ln x
.
Remark. For those who know the denition of limsup we can state Theorem 2.9 as
follows: limsup
x
(x)
x/ ln x
1.
Proof. Assume the theorem to be false. Then there is an > 0 and a real number such
that (x) (1 )
x
ln x
for all x > . But then
_
x
2
(u)
u
2
du =
_
2
(u)
u
2
du +
_
x
(u)
u
2
du C + (1 )
_
x
1
u ln u
du
= C + (1 )(lnln x lnln ) = D + (1 )(lnln x),
where C and D are constants (depending on ). Since obviously (x) < x, it now
follows from Lemma 2.8, that
px
1
p
(1 ) lnln x + Constant.
This contradicts Lemma 2.7. .
Theorem 2.9 can be sharpened considerably. The following result was conjectured
by Gauss and proven by J. Hadamard and Ch. de la Vall ee Poussin in 1896 using
advanced methods from the theory of functions of a complex variable.
3. The Linear Diophantine Equation ax+by=c 11
Theorem 2.10 (The Prime Number Theorem)
lim
x
(x)
x/ ln x
= 1.
The proof is too complicated to be given here.
We will now derive heuristically some conclusions from the Prime Number Theo-
rem. Firstly, it follows that (x)/x < C/ ln x for some constant C, and hence the ratio
(x)/x approaches 0 and the ratio (x (x))/x approaches 1 as x tends to innity. Since
n(n) is the number of composite numbers less than or equal to n, the ratio (n(n))/n
represents the proportion of composite numbers among the rst n integers. That this
ratio tends to 1 means in a certain sense that almost all positive integers are composite.
Onthe other hand, primes are not particularlyscarce, because the logarithmfunction
grows veryslowly. Bythe Prime Number Theoremwe canuse x/ ln x as anapproximation
of (x) . If x is a large number and y is small compared to x then ln(x + y) ln x, and
hence
(x + y) (x)
x + y
ln(x + y)
x
ln x
y
ln x
.
This means that in a relatively small interval of length y around the large number x there
are approximately y/ ln x primes, and we can expect to nd a prime in the interval if the
length is about ln x. If the primes were randomly distributed the probability of a large
number x being prime would be approximately 1/ ln x. Taking for example x = 10
100
we
have ln x 230. Thus, if we choose an integer N at random" in the neigborhood of 10
100
the probability that N is prime is roughly 1/230. Of course, we can raise this probability
t o 1/115 by avoiding the even numbers, and if we make sure that N is not divisible by
2, 3, or 5, the probability that N is prime grows to about 1/60. Thus, provided we use
an efcient primality test, we can produce a very large prime by rst choosing a number
N at random not divisible by 2, 3, or 5 (and some other small primes) and testing it for
primality. If N turns out to be a prime, then we are happy, otherwise we consider the
next integer in the sequence N + 2, N + 4, N + 6, . . . that is not divisible by 3 and 5 (and
the other selected small primes) and test this for primality. Because of the Prime Number
Theorem we feel condent that we will nd a prime after not too many tries.
3. The Linear Diophantine Equation ax+by=c
Let a, b and c be integers and consider the equation
(1) ax + by = c.
We are interested in integer solutions x and y, only.
From section 1 we already know a lot about the equation. By Theorem 1.9, the
set ax + by [ x, y Z coincides with the set of all multiples n(a, b) of the greatest
common divisor of a and b. It follows that equation (1) is solvable if and only if (a, b)[c.
Moreover, the Euclidean algorithm provides us with a method for nding a solution x
0
,
y
0
of the equation ax +by = (a, b) , and by multiplying this solution by c/(a, b) we will get
a solution of the original equation (1). What remains is to nd the general solution given
one particular solution. The complete story is summarized in the following theorem.
12 Lectures on Number Theory
Theorem 3.1 The equation ax + by = c has integer solutions if and only if (a, b)[c. If x
0
, y
0
is a solution, then all integer solutions are given by
x = x
0
+
b
(a, b)
n, y = y
0
a
(a, b)
n, n Z.
Proof. The numbers x and y dened above are integers, and one immediately veries
that they satisfy the equation. To see that these are all solutions, assume that x, y is an
arbitrary integer solution. Then ax + by = ax
0
+ by
0
. It follows that a(x x
0
) = b(y
0
y) ,
and that
(2)
a
d
(x x
0
) =
b
d
(y
0
y),
where we have written d = (a, b) for short. Since
_
a
d
,
b
d
_
= 1, we conclude from Theorem
1.12 that
b
d
is a divisor of x x
0
, i.e. there exists an integer n such that x x
0
=
b
d
n. By
inserting this into (2) and simplifying, we also obtain y y
0
=
a
d
n. .
The case (a, b) = 1 is so important that it is worth stating separately.
Corollary 3.2 Suppose that (a, b) = 1. Then the linear equation ax + by = c has integer
solutions for all integers c. If x
0
, y
0
is a solution, then all solutions are given by
x = x
0
+ bn, y = y
0
an, n Z.
According to Theorem 3.1, the distance between two consecutive x-solutions is b/d
and the distance between two consecutive y-solutions is a/d, where d = (a, b) . It follows
that, provided the equation is solvable, there is a solution (x, y) with 0 x b/d 1.
We can nd this solution by successively trying x = 0, x = 1, . . . , solving the equation
for y until an integer value for y is found. Of course, we can also solve the equation by
looking for a solution y in the interval 0 y a/d 1. Hence, we can easily solve the
equation ax +by = c by trial and error whenever at least one of the numbers a/d and b/d
is small.
Example 1. Solve the equation
247x + 91y = 39.
Solution 1: The equation is solvable, because (247, 91) = 13 and 13[39. Since
91
13
= 7
the equation has an integer solution with 0 x 6. Trying x = 0, 1, 2, we nd that
x = 2 gives the integer value y = 5. Therefore, the general solution of the equation is
x = 2 + 7n, y = 5 19n.
Solution 2: In Example 6, section 1, we found that x = 3, y = 8 solves the equation
247x + 91y = 13. By multiplying this solution by 3, we get the particular solution x
0
= 9,
y
0
= 24 to our given equation, and the general solution is x = 9 + 7n, y = 24 19n.
This parametrization of the solutions is different from that above, but the set of solutions
is of course the same as in solution no. 1.
Solution 3: The solution above uses the Euclidean algorithm. We will now give another
method, which is more or less equivalent to the Euclidean algorithm, but the presentation
is different. To solve
(3) 247x + 91y = 39
3. The Linear Diophantine Equation ax+by=c 13
we start by writing 247 = 2 91+65, 247x = 91 2x +65x and 247x +91y = 65x +91(2x +y) .
Introducing new integer variables x
1
= x, y
1
= 2x + y, we now rewrite equation (3) as
(4) 65x
1
+ 91y
1
= 39
This equation has smaller coefcients. Note that if x
1
and y
1
are integers, then x = x
1
and y = y
1
2x are integers, too. Hence, solving (4) for integer values is equivalent to
solving (3) for integer values.
The same procedure can now be repeated. Write 91 = 65 + 26 and 65x
1
+ 91y
1
=
65(x
1
+ y
1
) + 26y
1
in order to replace equation (4) with the equivalent equation
(5) 65x
2
+ 26y
2
= 39, with x
2
= x
1
+ y
1
, y
2
= y
1
We continue, noting that 65 = 2 26 + 13, and obtain
(6) 13x
3
+ 26y
3
= 39, with x
3
= x
2
, y
3
= 2x
2
+ y
2
Now 26 = 2 13, so
(7) 13x
4
+ 0y
4
= 39, with x
4
= x
3
+ 2y
3
, y
4
= y
3
From (7) we conclude that x
4
= 39/13 = 3 whereas y
4
is an arbitrary integer, n say. Going
backwards, we nd
y
3
= y
4
= n, x
3
= x
4
2y
3
= 3 2n
x
2
= x
3
= 3 2n, y
2
= y
3
2x
2
= n 2(3 2n) = 6 + 5n
y
1
= y
2
= 6 + 5n, x
1
= x
2
y
1
= 3 2n + 6 5n = 9 7n
x = x
1
= 9 7n, y = y
1
2x = 6 + 5n 2(9 7n) = 24 + 19n .
For linear equations with more than two variables we have the following result,
which follows immediately from Theorem 1.9
/
.
Theorem 3.3 The linear equation a
1
x
1
+ a
2
x
2
+ + a
n
x
n
= c has integer solutions if and only
if (a
1
, a
2
, . . . , a
n
)[c.
The third solution method in Example 1 can easily be adopted to take care of
equations with more than two variables.
Example 2. Solve the equation
6x + 10y + 15z = 5
for integer solutions.
Solution: The equation is solvable, because (6, 10, 15) = 1. Consider the least coefcient
6 and write 10 = 6+4 and 15 = 2 6+3. Introducing new variables x
1
= x +y +2z, y
1
= y,
and z
1
= z we can rewrite our linear equation as
6x
1
+ 4y
1
+ 3z
1
= 5.
Since 6 = 2 3 and 4 = 3 + 1, we put x
2
= x
1
, y
2
= y
1
, and z
2
= 2x
1
+ y
1
+ z
1
. This change
of variables transforms our equation into
0x
2
+ y
2
+ 3z
2
= 5.
14 Lectures on Number Theory
Now 1 is the least non-zero coefcient, and we put x
3
= x
2
, y
3
= y
2
+ 3z
2
, and z
3
= z
2
.
Our equation now reads
0x
3
+ y
3
+ 0z
3
= 5
with the obvious solution x
3
= m, y
3
= 5, z
3
= n, m and n being arbitrary integers.
Going backwards we get after some easy calculations:
x = 5 + 5m 5n, y = 5 3n, z = 5 2m + 4n, m, n Z. .
4. Congruences
Denition 4.1 Let m be a positive integer. If m[(a b) then we say that a is congruent to
b modulo m and write a b (mod m) . If m ,[(a b) then we say that a is not congruent
to b modulo m and write a , b (mod m) .
Obviously, a b (mod m) is equivalent to a = b + mq for some integer q.
We now list some useful properties, which follow easily from the denition.
Proposition 4.2 Congruence modulo m is an equivalence relation, i.e.
(i) a a (mod m) for all a.
(ii) If a b (mod m) , then b a (mod m) .
(iii) If a b (mod m) and b c (mod m) , then a c (mod m) .
Proof. We leave the simple proof to the reader. .
Our next proposition shows that congruences can be added, multiplied and raised
to powers.
Proposition 4.3 Let a, b, c and d be integers.
(i) If a b (mod m) and c d (mod m) , then a + c b + d (mod m) .
(ii) If a b (mod m) and c d (mod m) , then ac bd (mod m) .
(iii) If a b (mod m) , then a
k
b
k
(mod m) for all non-negative integers k.
(iv) Let f (x) be a polynomial with integral coefcients. If a b (mod m) then f (a) f (b)
(mod m) .
Proof. (i) is left to the reader.
(ii) If a b (mod m) and c d (mod m) , then a = b + mq and c = d + mr for
suitable integers q and r . It follows that ac = bd + m(br + dq + mqr) . Hence ac bd
(mod m) .
(iii) Taking c = a and d = b in (ii) we see that a b (mod m) implies a
2
b
2
(mod m) . Applying (ii) again, we get a
3
b
3
(mod m) , and the general case follows by
induction.
(iv) Suppose f (x) =
n
j=0
c
j
x
j
. Using (iii) we rst obtain a
j
b
j
(mod m) for
each j , and then c
j
a
j
c
j
b
j
(mod m) by (ii). Finally, repeated application of (i) gives
f (a) =
n
j=0
c
j
a
j
n
j=0
c
j
b
j
= f (b) (mod m) . .
Remark on the computation of powers. In many applications we need to compute
powers a
k
modulo m. The naive approach would invoke k 1 multiplications. This is
ne if k is small, but for large numbers k such as in the RSA-algorithm, to be discussed
4. Congruences 15
in section 7, this is prohibitively time consuming. Instead, one should compute a
k
recursively using the formula
a
k
=
_
(a
k/2
)
2
= (a
[k/2]
)
2
if k is even,
a (a
(k1)/2
)
2
= a (a
[k/2]
)
2
if k is odd.
Thus, a
k
is obtained from a
[k/2]
by using one multiplication (squaring) if k is even, and
two multiplications (squaring followed by multiplication by a) if k is odd. Depending
on the value of k, the innermost computation of the recursion will be a
2
or a
3
= a a
2
.
The total number of multiplications required to compute a
k
from a using recursion
is of the order of magnitude log k, which is small compared to k. Indeed, if k has the
binary expansion k =
r
r1
. . .
1
0
=
r
j=0
j
2
j
, (
r
= 1), then [k/2] =
r
r1
. . .
1
, and
k is odd if
0
= 1 and even if = 0. It now easily follows that the number of squarings
needed equals r , and that the number of extra multiplications by a equals the number of
nonzero digits
j
minus 1. Thus, at most 2r multiplications are needed.
Example 1. The computation of 3
1304
(mod 121) by recursion can be summarized in
the following table:
k 1304 652 326 163 162 81 80 40 20 10 5 4 2 1
3
k
(mod 121) 81 9 3 27 9 3 1 1 1 1 1 81 9 3
The numbers in the top row are computed from left to right. If a number is even, the next
number is obtained by dividing it by 2, and if a number is odd the next one is obtained
by subtracting 1. The numbers in the bottom row are computed from right to left. For
instance, 3
4
= (3
2
)
2
9
2
81, 3
5
= 3 3
4
3 81 243 1, 3
326
= (3
163
)
2
27
2
3. .
We next investigate what happens when the modulus is multiplied or divided by a
number. The simple proof of the following proposition is left to the reader.
Proposition 4.4 Let c be an arbitrary positive integer, and let d be a positive divisor of m.
(i) If a b (mod m) , then ac bc (mod mc) .
(ii) If a b (mod m) , then a b (mod d) .
In general, congruences may not be divided without changing the modulus. We
have the following result.
Proposition 4.5 Let c be a non-zero integer.
(i) If ca cb (mod m) , then a b (mod m/(c, m))
(ii) If ca cb (mod m) and (c, m) = 1, then a b (mod m) .
Proof. (i) Let d = (c, m) . If ca cb (mod m) , then m[c(a b) and
m
d
c
d
(a b) . Since
_
m
d
,
c
d
_
= 1, it follows that
m
d
of integers obtained by choosing one integer from each of the residue classes that are
relatively prime to m, is called a reduced residue system modulo m.
The following two observations are immediate consequences of the denitions: The
number (m) equals the number of integers in the interval [0, m 1] that are relatively
prime to m. y
1
, y
2
, . . . , y
(m)
is a reduced residue system modulo m if and only if the
numbers are pairwise incongruent modulo m and (y
i
, m) = 1 for all i .
Example 3. The positive integers less than 8 that are relatively prime to 8 are 1, 3, 5,
and 7. It follows that (8) = 4 and that 1, 3, 5, 7 is a reduced residue system modulo 8.
.
Example 4. If p is a prime, then the numbers 1, 2, . . . , p 1 are all relatively prime
to p. It follows that (p) = p 1 and that 1, 2, . . . , p 1 is a reduced residue system
modulo p. .
Example 5. Let p
k
be a prime power. An integer is relatively prime to p
k
if and only
if it is not divisible by p. Hence, in the interval [0, p
k
1] there are p
k1
integers that are
4. Congruences 17
not relatively prime to p, viz. the integers np, where n = 0, 1, 2, . . . , p
k1
1, whereas
the remaining p
k
p
k1
integers in the interval are relatively prime to p. Consequently,
(p
k
) = p
k
p
k1
= p
k
_
1
1
p
_
. .
Theorem 4.13 Let (a, m) = 1. Let r
1
, r
2
, . . . , r
m
be a complete residue system, and let
s
1
, s
2
, . . . , s
(m)
be a reduced residue system modulo m. Then ar
1
, ar
2
, . . . , ar
m
is a complete
and as
1
, as
2
, . . . , as
(m)
is a reduced residue system modulo m.
Proof. In order to show that the set ar
1
, ar
2
, . . . , ar
m
is a complete residue system, we
just have to check that the elements are chosen from distinct residue classes, i.e. that
i ,= j ar
i
, ar
j
(mod m) . But by Proposition 4.5 (ii), ar
i
ar
j
(mod m) implies r
i
r
j
(mod m) and hence i = j .
Since (s
i
, m) = 1 and (a, m) = 1, we have (as
i
, m) = 1 for i = 1, 2, . . . , (m) by
Theorem 1.14. Hence as
1
, as
2
, . . . , as
(m)
are (m) numbers belonging to residue classes
that are relatively prime to m, and by the same argument as above they are chosen from
distinct residue classes. It follows that they form a reduced residue system. .
Theorem 4.14 (Eulers theorem) If (a, m) = 1, then
a
(m)
1 (mod m).
Proof. Let s
1
, s
2
, . . . , s
(m)
be a reduced residue system modulo m. By Theorem 4.13,
the set as
1
, as
2
, . . . , as
(m)
is also a reduced residue system. Consequently, to each s
i
there corresponds one and only one as
j
such that s
i
as
j
(mod m) . By multiplying
together and using Proposition 4.3 (ii), we thus get
(m)
j=1
(as
j
)
(m)
i=1
s
i
(mod m),
and hence
a
(m)
(m)
j=1
s
j
(m)
i=1
s
i
(mod m).
Since (s
i
, m) = 1, we can use Proposition 4.5 (ii) repeatedly to cancel the s
i
, and we obtain
a
(m)
1 (mod m) . .
The following theorem is an immediate corollary.
Theorem 4.15 (Fermats theorem) If p is a prime and p,[ a, then
a
p1
1 (mod p).
For every integer a, a
p
a (mod p) .
Proof. If p,[ a, then (a, p) =1. Since (p) = p 1 by Example 4, the rst part now follows
immediately from Eulers theorem. By multiplying the congruence by a, we note that
a
p
a (mod p) , and this obvioulsy holds also in the case a 0 (mod p) . .
Example 6. Modulo 7 we get 3
1
3, 3
2
2, 3
3
6, 3
4
4, 3
5
5, andnally 3
6
1
inaccordance withFermats theorem. Similarly, 2
1
2, 2
2
4, 2
3
1, andhence 2
6
1.
.
18 Lectures on Number Theory
5. Linear Congruences
The congruence
(1) ax b (mod m)
is equivalent to the equation
(2) ax my = b
where we of course only consider integral solutions x and y. We know from Theorem 3.1
that this equation is solvable if and only if d = (a, m) divides b, and if x
0
, y
0
is a solution
then the complete set of solution is given by
x = x
0
+
m
d
n, y = y
0
+
a
d
n.
We get d pairwise incongruent x-values modulo m by taking n = 0, 1, . . . , d 1, and
any solution x is congruent to one of these. This proves the following theorem.
Theorem 5.1 The congruence
ax b (mod m)
is solvable if and only if (a, m)[b. If the congruence is solvable, then it has exactly (a, m) pairwise
incongruent solutions modulo m.
We have the following immediate corollaries.
Corollary 5.2 The congruene ax 1 (mod m) is solvable if and only if (a, m) = 1, and in
this case any two solutions are congruent modulo m.
Corollary 5.3 If (a, m) = 1, then the congruence ax b (mod m) is solvable for any b and
any two solutions are congruent modulo m.
Note that the existence of a solution in Corollories 5.2 and 5.3 can also be deduced
from Eulers theorem. By taking x
0
= a
(m)1
and x
1
= bx
0
we obtain ax
0
= a
(m)
1
(mod m) and ax
1
= bax
0
b (mod m) .
However, in order to solve the congruence (1) it is usually more efcient to solve
the equivalent equation (2) using the methods from section 3. Another possibility is to
replace the congruence (1) by a congruence with a smaller modulus and then reduce the
coefcients in the following way:
In (1) we can replace the numbers a and b with congruent numbers in the interval
[0, m 1] , or still better in the interval [m/2, m/2] . Assuming this done, we can now
write equation (2) as
(3) my b (mod a)
with a module a that is less than the module m in (1). If y = y
0
solves (3), then
x =
my
0
+ b
a
5. Linear Congruences 19
is a solution to (1). Of course, the whole procedure can be iterated again and again until
nally a congruence of the form z c (mod n) is obtained.
Example 1. Solve the congruence
(4) 296x 176 (mod 114).
Solution: Since 2 divides the numbers 296, 176, and 114, we start by replacing (4) with
the following equivalent congruence:
(5) 148x 88 (mod 57).
Now, reduce 148 and 88 modulo 57. Since 148 23 and 88 26, we can replace (5)
with
(6) 23x 26 (mod 57).
Now we consider instead the congruence
57y 26 (mod 23),
which of course is quivalent to
(7) 11y 3 (mod 23).
Again, replace this with the congruence
23z 3 (mod 11)
which is at once reduced to
z 3 (mod 11).
Using this solution, we see that
y =
23 3 3
11
= 6
is a solution to (7) and that all solutions have the form y 6 (mod 23) . It now follows
that
x =
57 6 + 26
23
= 16
solves (6) and the equivalent congruence (4), and that all solutions are of the form x 16
(mod 57) , which can of course also be written as x 16, 73 (mod 114). .
Concluding remarks. These remarks are intended for readers who are familiar with elementary
group theory.
Let Z
m
denote the set of all residue classes modulo m. We can equip Z
m
with a multiplication
operation by dening the product of two residue classes as follows:
a b = ab.
For this denition to be well behaved it is of course necessary that the residue class ab be dependent
on the residue classes a and b only, and not on the particular numbers a and b chosen to represent
them, and that ab belong to Z
m
. However, this follows from Proposition 4.3 (ii) and Theorem 1.14.
The multiplication on Z
m
is obviously associative and commutative, and there is an identity
element, namely the class 1. Moreover, it follows from Corollary 5.2 that the equation a x = 1 has
a unique solution x Z
m
for each a Z
m
. Thus, each element in Z
m
has a unique multiplicative
inverse.
This shows that Z
m
is a nite abelian (commutative) group. The order of the group (i.e. the
number of elements in the group) equals (m) , by denition of the Euler -function.
One of the rst theorems encountered when studying groups reads: If n is the order of a
nite group with identity element e , then a
n
= e for every element a in the group. Applying this
result to the group Z
m
, we recover Eulers theorem, since the statement
a
(m)
= 1
is just another way of saying that
a
(m)
1 (mod m)
holds for every number a that is relatively prime to m.
20 Lectures on Number Theory
6. The Chinese Remainder Theorem
Let us start by considering a system of two congruences
_
x a
1
(mod m
1
)
x a
2
(mod m
2
)
where (m
1
, m
2
) = 1. The rst congruence has the solutions x = a
1
+ m
1
y, y Z, and by
substituting this into the second congruence, we obtain a
1
+ m
1
y a
2
(mod m
2
) , that is
m
1
y a
2
a
1
(mod m
2
) . Now, since (m
1
, m
2
) = 1, this congruence has solutions of the
form y = y
0
+ m
2
n and hence x = a
1
+ m
1
y
0
+ m
1
m
2
n. This shows that the system has a
unique solution x x
0
(mod m
1
m
2
) .
Consider now a system of three congruences
(1)
_
_
x a
1
(mod m
1
)
x a
2
(mod m
2
)
x a
3
(mod m
3
)
where the moduli m
1
, m
2
and m
3
are pairwise relatively prime. As shown above, we
can replace the rst two congruences with an equivalent congruence of the form x x
0
(mod m
1
m
2
) , and hence the whole system (1) is equivalent to a system of the form
(2)
_
x x
0
(mod m
1
m
2
)
x a
3
(mod m
3
)
Now, by assumption (m
1
m
2
, m
3
) = 1, and hence (2) has a unique solution x x
1
(mod m
1
m
2
m
3
) .
By induction, it is now very easy to prove the following general result.
Theorem 6.1 (The Chinese Remainder Theorem) The system
(3)
_
_
x a
1
(mod m
1
)
x a
2
(mod m
2
)
.
.
.
x a
r
(mod m
r
)
where m
1
, m
2
, . . . , m
r
are pairwise relatively prime, has a unique solution modulo m
1
m
2
m
r
.
Proof. We will give a second proof of the theorem and also derive a formula for the
solution.
Let for each j = 1, 2, . . . , r ,
j
be an integer satisfying
j
_
1 (mod m
j
)
0 (mod m
i
), if i ,= j.
6. The Chinese Remainder Theorem 21
Then obviously
(4) x =
r
j=1
j
a
j
satises the system (3).
It remains to prove that the numbers
j
exist. Put m = m
1
m
2
m
r
. By assumption
_
m
m
j
, m
j
_
= 1 and hence, by Corollary 5.2, there is a number b
j
such that
m
m
j
b
j
1 (mod m
j
).
The numbers
j
=
m
m
j
b
j
will now clearly have the desired properties.
This proves the existence of a solution x to (3). To prove that the solution is unique
modulo m, suppose x
/
is another solution. Then x x
/
(mod m
j
) holds for j = 1, 2, . . . ,
r , and it follows from Proposition 4.6 that x x
/
(mod m
1
m
2
m
r
) . .
Formula (4) is particularly useful when we are to solve several systems (3) with the
same moduli but with different right hand members a
1
, a
2
, . . . , a
r
.
Example 1. Let us solve the system
_
_
x 1 (mod 3)
x 2 (mod 4)
x 3 (mod 5)
Solution 1: Using the method in our rst proof of the Chinese Remainder Theorem, we
replace the rst congruence by x = 1 + 3y. Substituting this into the second congruence
we obtain 3y + 1 2 (mod 4) or 3y 1 (mod 4) . This congruence has the solutions
y 1 (mod 4) , i.e. y = 1 +4z. Hence, x = 2 +12z, and substituting this into the last
congruence we end up in the congruence 12z 2 3 (mod 5) or 12z 5 0 (mod 5) .
This congruence has the unique solution z 0 (mod 5) , that is z = 5t and x = 2 + 60t .
Hence, the system has the unique solution x 2 (mod 60) .
Solution 2: Let us instead use the method of the second proof. Then we have rst to nd
numbers b
1
, b
2
, and b
3
such that
20b
1
1 (mod 3), 15b
2
1 (mod 4), 12b
3
1 (mod 5).
One easily obtains b
1
= 2, b
2
= 3, and b
3
= 3. Next, we compute
1
= 20b
1
= 40,
2
= 15b
2
= 45, and
3
= 12b
3
= 36. Finally,
x =
1
+ 2
2
+ 3
3
= 40 + 90 + 108 = 238 58 (mod 60). .
The condition that the moduli m
1
, m
2
, . . . , m
p
be pairwise relatively prime is abso-
lutely essential for the conclusion of Theorem 6.1. Without that condition the system (3) is
either unsolvable or there are more than one incongruent solution modulo m
1
m
2
m
r
.
Necessary and sufcient for the system to be solvable is that (m
i
, m
j
)[(a
i
a
j
) for all i ,= j .
A given system can be solved or proved unsolvable by reasoning as in the rst solution
of Example 1.
We will now derive some important consequences of Theorem 6.1. Given a positive
integer n we let ((n) denote a xed complete residue system modulo n. The subset of all
22 Lectures on Number Theory
numbers in ((n) that are relatively prime to n forms a reduced residue system which we
denote by 1(n) . The set 1(n) contains (n) numbers. To be concrete, we could choose
((n) = 0, 1, 2, . . . , n 1; then 1(n) = j [ 0 j n 1 and (j, n) = 1.
Let now m
1
and m
2
be two relatively prime numbers and put m = m
1
m
2
. Then
((m) and the Cartesian product ((m
1
) ((m
2
) contain the same number of elements, viz.
m. We will construct a bijection between these two sets.
Given x ((m) and j = 1 or 2, we denote by x
j
the unique number in ((m
j
) that
satises x
j
x (mod m
j
) . We then dene (x) = (x
1
, x
2
) .
A map between two sets with the same number of elements is a bijection if and
only if it is surjective. But surjectivity of the map follows immediately from the
Chinese Remainder Theorem, because given (x
1
, x
2
) ((m
1
) ((m
2
) , there is a (unique)
x ((m) such that x x
1
(mod m
1
) and x x
2
(mod m
2
) , which amounts to saying
that (x) = (x
1
, x
2
) .
We will next identify the image (1(m)) of the reduced residue system 1(m) under
the map . Since
(x, m) = 1 (x, m
1
) = (x, m
2
) = 1
and
x x
j
(mod m
j
) ((x, m
j
) = 1 (x
j
, m
j
) = 1)
it follows that x 1(m) (x) 1(m
1
) 1(m
2
) . Thus, maps the set 1(m)
bijectively onto the Cartesian product 1(m
1
) 1(m
2
) . The former set has (m) elements
and the latter has (m
1
)(m
2
) elements. Since the two sets must have the same number
of elements, we have proved the following important theorem about Eulers -function.
Theorem 6.2 If m = m
1
m
2
, where the integers m
1
and m
2
are relatively prime, then
(m) = (m
1
)(m
2
).
Corollary 6.3 If n = p
k
1
1
p
k
2
2
p
k
r
r
, where p
1
, p
2
, . . . , p
r
are different primes, then
(n) = n
_
1
1
p
1
__
1
1
p
2
_
_
1
1
p
r
_
.
Proof. By repeated application of Theorem 6.2, we obtain
(m
1
m
2
m
r
) = (m
1
)(m
2
) (m
r
)
if the integers m
1
, m
2
, . . . , m
r
are pairwise relatively prime. In particular, this holds
when the numbers m
i
are powers of distinct primes. By Example 5 in section 4, (p
k
) =
p
k1
(p 1) = p
k
(1 1/p) if p is prime. .
Apolynomial f (x) =
n
i=0
a
i
x
i
with coefcients a
i
Z is called an integral polynomial,
and the congruence
f (x) 0 (mod m),
is calleda polynomial congruence. Aninteger a is calleda solutionor a root of the polynomial
congruence if f (a) 0 (mod m) .
If a is a root of the polynomial congruence and if b a (mod m) , then b is also a
root. Therefore, in order to solve the polynomial cogruence it is enough to nd all roots
that belong to a given complete residue system ((m) modulo m, e.g. to nd all solutions
among the numbers 0, 1, 2, . . . , m1. By the number of roots of a polynomial congruence
we will mean the number of such incongruent roots.
6. The Chinese Remainder Theorem 23
Next, consider a system
_
_
f
1
(x) 0 (mod m
1
)
f
2
(x) 0 (mod m
2
)
.
.
.
f
r
(x) 0 (mod m
r
)
of polynomial congruences, where the moduli m
1
, m
2
, . . . , m
r
are assumedto be pairwise
relatively prime. By a solution of such a system we mean, of course, an integer which
solves simultaneously all the congruences of the system. If a is a solution of the system,
and if b a (mod m
1
m
2
m
r
) , then b is also a solution of the system, since for each j
we have b a (mod m
j
) . Hence, to nd all solutions of the system it sufces to consider
solutions belonging to a complete residue system modulo m
1
m
2
m
r
; by the number of
solutions of the system we will mean the number of such incongruent solutions.
Theorem 6.4 Let
_
_
f
1
(x) 0 (mod m
1
)
f
2
(x) 0 (mod m
2
)
.
.
.
f
r
(x) 0 (mod m
r
)
be a system of polynomial congruences, and assume that the the moduli m
1
, m
2
, . . . , m
r
are
pairwise relatively prime. Let X
j
be a complete set of incongruent solutions modulo m
j
of the
j th congruence, and let n
j
denote the number of solutions. The number of solutions of the system
then equals n
1
n
2
n
r
, and each solution of the system is obtained as the solution of the system
_
_
x a
1
(mod m
1
)
x a
2
(mod m
2
)
.
.
.
x a
r
(mod m
r
)
with (a
1
, a
2
, . . . , a
r
) ranging over the set X
1
X
2
X
r
.
Of course, a set X
j
might be empty in which case n
j
= 0
Proof. Write m = m
1
m
2
m
r
, and let X ((m) be the set of solutions of the system.
For each choice of (a
1
, a
2
, . . . , a
r
) X
1
X
2
X
r
, there is by the Chinese Remainder
Theorem a unique a ((m) such that a a
j
(mod m
j
) holds for each j , and it follows
from Proposition 4.3 that a is a solution of the system, i.e. a X. Conversely, if a X,
then a is a solution of each individual congruence, and hence there exists, for each j ,
a unique a
j
X
j
such that and a a
j
(mod m
j
) . Thus, we have a surjective map
a (a
1
, a
2
, . . . , a
r
) from X to X
1
X
2
X
r
. This map is also injective, because
a , b (mod m) implies a , b (mod m
j
) for at least on j . Thus, the map is bijective, and
we conclude that the number of elements in X equals n
1
n
2
n
r
. .
Example 2. Consider the system
_
x
2
+ x + 1 0 (mod 7)
2x 4 0 (mod 6)
24 Lectures on Number Theory
By trying x = 0, 1, 2, 3, we nd that x 2 (mod 7) and x 3 (mod 7) are
the solutions of the rst congruence. Similarly, we nd that x 1 (mod 6) and x 2
(mod 6) solve the second congruence. We conclude that the system has 4 incongruent
solutions modulo 42. To nd these, we have to solve each of the following four systems:
_
x 2 (mod 7)
x 1 (mod 6)
_
x 2 (mod 7)
x 2 (mod 6)
_
x 3 (mod 7)
x 1 (mod 6)
_
x 3 (mod 7)
x 2 (mod 6)
We use the solution formula (4) obtained in the proof of the Chinese Remainder Theorem.
Thus, we determine b
1
and b
2
such that
42
7
b
1
1 (mod 7) and
42
6
b
2
1 (mod 6).
We easily nd that b
1
= 1 and b
2
= 1 solve these congruences, and hence we can take
1
= 6 and
2
= 7. We conclude that four different solutions modulo 42 of our original
system are
x
1
= 6 2 + 7 (1) = 19 23
x
2
= 6 2 + 7 2 = 2
x
3
= 6 (3) + 7 (1) = 11
x
4
= 6 (3) + 7 2 = 32 .
We now turn to an important special case of Theorem 6.4.
Theorem 6.5 Let f (x) be an integral polynomial. For each positive integer m, let X(m) denote
a complete set of roots modulo m of the polynomial congruence
f (x) 0 (mod m),
and let N(m) denote the number of roots.
Assume m = m
1
m
2
m
r
, where the numbers m
1
, m
2
, . . . , m
r
are pairwise relatively
prime; then
N(m) = N(m
1
)N(m
2
) N(m
r
).
Moreover, to each r -tuple (a
1
, a
2
, . . . , a
r
) X(m
1
) X(m
2
) X(m
r
) there corresponds a
unique solution a X(m) such that a a
j
(mod m
j
) for each j .
Proof. By Proposition 4.6, the congruence f (x) 0 (mod m) is equivalent to the system
_
_
f (x) 0 (mod m
1
)
f (x) 0 (mod m
2
)
.
.
.
f (x) 0 (mod m
r
)
Hence, Theorem 6.4 applies. .
It follows that in order to solve a polynomial congruence modulo m it is sufcient
to know how to solve congruences with prime power moduli.
Example 3. Let f (x) = x
2
+ x + 1. Prove that the congruence f (x) 0 (mod 15) has
no solutions.
7. Public-Key Cryptography 25
Solution: By trying the values x = 0, 1, 2 we nd that the congruence f (x) 0
(mod 5) has no solutions. Therefore, the given congruence modulo 15 ( = 5 3) has no
solutions. .
Example 4. Let f (x) = x
2
+ x +9. Find the roots of the congruence f (x) 0 (mod 63) .
Solution: Since 63 = 3
2
7, we start by solving the two congruences f (x) 0 (mod 7)
and f (x) 0 (mod 9) . The rst congruence has the sole root 3 (mod 7), and the second
congruence has the roots 0 and 1 (mod 9). It follows that the given congruence has two
roots modulo 63, and they are obtained by solving the congruences
_
x 3 (mod 7)
x 0 (mod 9)
and
_
x 3 (mod 7)
x 1 (mod 9)
Using the Chinese remainder theorem, we nd that the roots are 45 and 17 modulo 63. .
7. Public-Key Cryptography
In 1977 R.L. Rivest, A. Shamir and L.M. Adleman invented an asymmetric encryption
scheme which has been called the RSA algorithm and which uses congruence arithmetic.
The method uses two keys, one public encryption key and one secret private decryption key.
The security of the algorithm depends on the hardness of factoring a large composite
number and computing e
th
roots modulo a composite number for a specied integer e.
The RSA algorithm is based on the following theorem.
Theorem 7.1 Suppose that m is a positive square-free integer, i. e. that the canonical prime
factorization m = p
1
p
2
p
r
consists of distinct primes, and let e and d be positive integers such
that ed 1 (mod (m)) . Then a
ed
a (mod m) holds for each integer a.
Proof. By Proposition 4.6 it sufces to prove that a
ed
a (mod p) holds for each prime
p = p
i
dividing the modulus m. This is trivially true if a 0 (mod p) , because then
a
ed
a 0 (mod p) . Hence, we may assume that a , 0 (mod p) .
By assumption, ed = 1 + n(m) for some nonnegative integer n, and
(m) = (p m/p) = (p)(m/p) = (p 1)(m/p) .
Hence, ed = 1 + n(p 1)(m/p) = 1 + (p 1)N, where N is a nonnegative integer.
Therefore, by Fermats theorem
a
ed
= a
1+(p1)N
= a (a
p1
)
N
a 1
N
= a (mod p). .
An RSA public key consists of a pair (m, e) of integers. The number m will be used
as the modulus, and e is the public exponent. The modulus m is the product of two
distinct large primes p and q (the current recommendation is that each prime should
have a size of at least 2
512
). The exponent e must be relatively prime to (m) , that is to
p 1 and q 1, and it is normally chosen to be a small prime such as 3 ( = 2 + 1), 17
( = 2
4
+ 1), or 65537 ( = 2
16
+ 1), since powers a
e
(mod m) can be computed very fast for
these particular choices of e.
In actual implementations of the RSA algorithm, the exponent e is rst xed. Then
the primes p and q satisfying (p 1, e) = (q 1, e) = 1 are generated at random in such
a way that each prime of the desired size (say 2
512
) has the same probability of being
chosen. Finally, m = pq.
26 Lectures on Number Theory
The private key consists of the pair (m, d) , where d is the unique positive number
less than (m) satisfying ed 1 (mod (m)) . The number e, as well as the primes p and
q, and the number (m) , are kept secrete by the owner of the private key.
Suppose now that somebody, say Alice, wants to send a secret message to Bob,
the owner of the private key. The rst thing to do is to convert the message into an
integer a in the range [0, m 1] in some standard way. (We could for example use the
standard ASCII code. Since the ASCII code encodes H as 072, e as 101, l as 108,
o as 111, and !, the message Hello! would by concatenation become the number
a = 072101108108111033.) If the message is too long, the message representative a will
be out of range, but the message could then be divided into a number of blocks which
could be encoded separately.
The sender Alice nowuses the public encryption key to compute the unique number
b satisfying b a
e
(mod m) and 0 b m 1. This number b, the ciphertext
representative of a, is transmitted to Bob.
When b is received, Bob uses his private exponent d to nd the unique number c
satisfying 0 c < m and c b
d
(mod m) . By Theorem 7.1, c = a and hence the secret
number a is recovered.
Suppose that some third party gains access to the number b. In order to recover the
number a he has to extract the e
th
root of b. There seems to be no other feasible method
for this than to nd the number d, and for that he need to know (m) , and for that he has
to factor the number m. But factorization of integers with 1000 binary digits seems to be
beyond the reach of todays algorithms and fastest computers. Therefore, it is believed
that the RSA method is very secure.
It is important that the message number a is not too small relative to m, because if
a
e
< m then we can nd a from the ciphertext representative b = a
e
by just extracting
the ordinary e
th
root of the integer b. Therefore, one has to use padding techniques
which extend numbers with few non-zero digits in order to obtain a secure algorithm. A
detailed description of how this is done is beyond the scope of this presentation.
8. Pseudoprimes
If a number n is composite, then it has a prime factor p which is less than or equal to
j=1
_
ja
p
_
.
42 Lectures on Number Theory
Proof. We have to prove that n has the same parity as the number N in Gausss lemma,
i.e. that n N (mod 2) . We use the same notation as in the proof of the lemma. By
summing over j in equation (1), we obtain
(2)
(p1)/p
j=1
ja = p
(p1)/2
j=1
_
ja
p
_
+
N
i=1
r
i
+
M
k=1
s
k
.
Since the numbers (p r
1
) , (p r
2
) , . . . , (p r
N
) , s
1
, s
2
, . . . , s
M
are the numbers 1, 2,
. . . , (p 1)/2 in some order, we also have
(p1)/2
j=1
j =
N
i=1
(p r
i
) +
M
k=1
s
k
.
Subtracting this from equation (2), we obtain
(a 1)
(p1)/2
j=1
j = 2
N
i=1
r
j
+ p
_(p1)/2
j=1
_
ja
p
_
N
_
= 2
N
i=1
r
j
+ p(n N).
Since a 1 is an even number, it follows that p(n N) is even, that is n N is even. .
Example 2. Let us use Theorem 13.5 to compute
_
3
p
_
for primes p 5. Since
_
3j
p
_
=
_
0 if 1 j [p/3],
1 if [p/3] + 1 j (p 1)/2.
it follows that
_
3
p
_
= (1)
n
, where n = (p 1)/2 [p/3] . By considering the cases
p = 12k 1 and p = 12k 5 separately, we see that n is even if and only if p 1
(mod 12) . Hence,
_
3
p
_
= 1 if and only if p 1 (mod 12) .
.
Gausss lemma and Theorem 13.5 are too cumbersome for numerical calculations of
_
a
p
_
. Instead, one uses Gausss law of quadratic reciprocity, which will be the theme of
next section.
14. Quadratic Reciprocity
Theorem 14.1 (The Gaussian Reciprocity Law) Let p and q be two distinct odd primes.
Then
_
p
q
__
q
p
_
= (1)
p1
2
q1
2
,
that is
_
q
p
_
=
_
_
_
p
q
_
if p 1 (mod 4) or q 1 (mod 4),
_
p
q
_
if p 3 (mod 4) and q 3 (mod 4).
14. Quadratic Reciprocity 43
Proof. By Theorem 13.5,
_
p
q
__
q
p
_
= (1)
M
(1)
N
= (1)
M+N
, where
M =
(q1)/2
k=1
_
kp
q
_
and N =
(p1)/2
j=1
_
jq
p
_
.
We will prove that M + N = (p 1)/2(q 1)/2.
To this end, consider the set
A = (j, k) [ j = 1, 2, . . . , (p 1)/2 and k = 1, 2, . . . , (q 1)/2.
We can represent A as a rectangular set of lattice points, i.e. points with integral coordi-
nates, in a rectilinear coordinate system. Since (p, q) = 1, no number qj/p is an integer
when j = 1, 2, . . . , p 1. Hence, there is no point from A on the line y =
q
p
x. Let B
be all points in A that lie below this line, that is (j, k) B if and only if k < jq/p. For a
given j this condition is satised for k = 1, 2, . . . , [jq/p] . Hence there are exactly [jq/p]
points in B whose rst coordinate equals j . Since this holds for j = 1, 2, . . . , (p 1)/2,
we conclude that the total number of points in B equals N.
Similarly, M is the number of points in the set C = (j, k) A [ j < kp/q = (j, k)
A [ k > jq/p, which can be represented as the set of all points in A above the line
y =
q
p
x. Since A is the disjoint union of B and C, it follows that M + N equals the
number of points in A, which is (p 1)/2(q 1)/2.
This number is odd if and only if both (p 1)/2 and (q 1)/2 are odd, that is if
and only if p q 3 (mod 4) . Hence
_
p
q
_
and
_
q
p
_
are of opposite sign if and only
if p q 3 (mod 4) . .
Example 1. The number 991 is a prime, and we will compute
_
402
991
_
. We start by
factoring 402; 402 = 2 3 67. Next we compute
_
2
991
_
= 1, since 991 1 (mod 8).
_
3
991
_
=
_
991
3
_
, since 991 3 (mod 4)
=
_
1
3
_
, since 991 1 (mod 3)
= 1.
_
67
991
_
=
_
991
67
_
, since 991 67 3 (mod 4)
=
_
14
67
_
, since 991 14 (mod 67)
=
_
1
67
__
2
67
__
7
67
_
, 14 = (1) 2 7,
= (1) (1)
_
_
67
7
__
, since 67 3 (mod 8) and 7 3 (mod 4)
=
_
4
7
_
, since 67 4 (mod 7),
= 1, since 4 is a square.
44 Lectures on Number Theory
Finally,
_
402
991
_
=
_
2
991
__
3
991
__
67
991
_
= 1 (1) 1 = 1.
.
Example 2. The number 2137 is a prime which is congruent to 1 modulo 8, and
666 = 2 3
2
37. It follows that
_
666
2137
_
=
_
2
2137
__
37
2137
_
= 1
_
2137
37
_
.
Now 2137 9 (mod 37) and 37 1 (mod 4) . Hence
_
2137
37
_
=
_
9
37
_
=
_
1
37
_
= 1, that is
_
666
2137
_
= 1. .
15. Primitive Roots
Let us start by computing the powers 3
i
modulo 7 for 0 i < (7) = 6. We obtain 3
0
= 1,
3
1
= 3, 3
2
2, 3
3
6, 3
4
4, 3
5
5. Hence, the set 3
i
[ 0 i < (7) is a reduced
residue system modulo 7, that is every integer a not divisible by 7 is congruent to 3
i
for
a unique integer i modulo (7) . This fact allows us to replace calculations using only
multiplication and exponentiation modulo 7 by calculations using addition modulo (7)
instead.
Example 1. Solve the equation x
5
6 (mod 7) .
Solution: Let x 3
y
(mod 7) . Since 6 3
3
(mod 7) , the given equation can now be
written 3
5y
3
3
(mod 7) , which is equivalent to the congruence 5y 3 (mod 6) . The
latter congruence has the unique solution y 3 (mod 6) , andhence our original equation
has the unique solution x 6 (mod 7) . .
Motivated by Example 1, we will investigate numbers m with the property that
there exists a number g such that g
i
[ 0 i < (m) is a reduced residue system. That
not all numbers m have this property follows from the following example.
Example 2. Since 1
2
3
2
5
2
7
2
1 (mod 8) and (8) = 4, it follows that
a
i
[ 0 i < 4 is never equal to a reduced residue system modulo 8. .
Proposition 15.1 Let m be a positive integer and a any integer such that (a, m) = 1. Dene
A = k Z [ a
[k[
1 (mod m).
Then A is an ideal in Z.
Proof. We have to prove that the set A is closed under subtraction, i.e. that j, k A
j k A. To prove this we may assume j k, because j k belongs to A if and only if
k j belongs to A.
Suppose j, k A. If j k 0, then a
j
a
k
1 (mod m) , and hence a
jk
a
jk
a
k
= a
j
1 (mod m) . If j 0 > k, then a
j
a
k
1 (mod m) , and we obtain
a
jk
= a
j
a
k
1 1 = 1 (mod m) . Finally, if 0 > j k, then a
j
a
k
1 (mod m) , and
we conclude that a
jk
a
j
a
jk
= a
k
1 (mod m) . Thus, in each case j k A. .
Note that A contains nonzero integers, because (m) belongs to A by Eulers
theorem. By Theorem 1.8, the ideal A is generated by a unique positive integer h, which
is the smallest positive integer belonging to A, that is a
h
1 (mod m) while a
j
, 1
(mod m) for 1 j < h.
15. Primitive Roots 45
Denition 15.2 The positive generator h of A, i.e. the smallest postive integer such that
a
h
1 (mod m) , is called the order of a modulo m and is denoted by ord a.
The order ord a of course depends on the modulus m, but since the modulus will
always be xed during a calculation, this ambiguity in the notation causes no difculties.
For any modulus m, ord1 = 1.
Example 3. Modulo 8 we have ord3 = ord5 = ord7 = 2. .
Example 4. Let us compute the order of the numbers 2, 3 and 6 modulo 7. Our
calculations before Example 1 show that ord3 = 6. Since 2
2
4 (mod 7) and 2
3
1
(mod 7) , ord2 = 3, and since 6
2
1 (mod 7) , ord6 = 2. .
The following theorem is an immediate consequence of the fact that the ideal A is
generated by h = ord a.
Theorem 15.3 Assume (a, m) = 1 and write h = ord a modulo m. Then
(i) a
n
1 (mod m) if and only if h[n;
(ii) h[(m) ;
(iii) a
j
a
k
(mod m) if and only if j k (mod h) ;
(iv) the numbers 1, a, a
2
, . . . , a
h1
are incongruent modulo m, and each power a
n
is congruent
to one of these modulo m;
(v) ord a
k
= h/(h, k) .
Proof. (i) follows from the denition of a generator of an ideal.
(ii) follows from (i) and Eulers theorem.
(iii) Assume k j 0; then a
k
a
j
(mod m) holds if andonlyif a
kj
1 (mod m) ,
because we may divide the former congruence by a
j
since (a, m) = 1. The conclusion
now follows from (i).
(iv) is of course a consequence of (iii).
(v) By (i), (a
k
)
n
1 (mod m) kn 0 (mod h) . We can divide the right hand
congruence by k provided we change the modulus to h/(h, k) . Thus,
(a
k
)
n
1 (mod m) n 0 (mod h/(h, k)).
The smallest positive number n satisfying the last congruence is n = h/(h, k) ; by deni-
tion, this is the order of a
k
modulo m. .
Theorem 15.3 (ii) implies that ord a (m) for every number a which is relatively
prime to m. An obvious question now arises: For which m does there exist an integer
whose order is as large as possible, namely (m) ? This question motivates the following
denition.
Denition 15.4 Assume that (g, m) = 1. If the order of g modulo m equals (m) , then
g is called a primitive root modulo m, or a primitive root of m.
Example 5. In Example 4 we calculated the order of 3 modulo 7 and found that
ord3 = 6 = (7) . Consequently, 3 is a primitive root modulo 7. .
Example 6. Not every integer has a primitive root. If m = 8, then a
2
1 for every
odd integer and hence ord a 2 < 4 = (8) for every a relatively prime to 8, that is 8 has
no primitive roots. .
46 Lectures on Number Theory
Theorem 15.5 Suppose g is a primitive root modulo m. Then
(i) 1, g, g
2
, . . . , g
(m)1
is a reduced residue system modulo m;
(ii) g
j
g
k
(mod m) if and only if j k (mod (m)) ;
(iii) g
k
is a primitive root modulo m if and only if (k, (m)) = 1.
In particular, if there exists a primitive root modulo m, then there are precisely ((m))
primitive roots.
Proof. Theorem 15.5 is just a special case of Theorem 15.3. .
Example 7. We have foundthat 3 is a primitive root modulo 7. Since ((7)) = (6) = 2,
there are 2 primitive roots. The other primitive root is 3
5
( 5 (mod 7) ). .
We will show that the only positive integers having primitive roots are 1, 2, 4, p
k
and 2p
k
, where p is an odd prime and k an arbitrary positive integer. We start by proving
that each prime has primitive roots; for this we will need the following two lemmas.
Lemma 15.6 If a has order h and b has order k modulo m, and if (h, k) = 1, then ab has order
hk modulo m.
Proof. Let r be the order of ab. Since (ab)
hk
= (a
h
)
k
(b
k
)
h
1
k
1
h
= 1 (mod m) , we
conclude that r[hk. To complete the proof, we have to show that hk[r . Note that
b
rh
(a
h
)
r
b
rh
= (ab)
rh
1 (mod m) , and hence k[rh. Since (h, k) = 1 it follows that
k[r . In a similar way, we show that h[r . Since (h, k) = 1 it now follows that hk[r . .
Example 8. Working modulo 7 we have ord2 = 3 and ord6 = 2. Consequently, since
2 6 5 (mod 7) , ord5 = ord(2 6) = 2 3 = 6. .
Lemma 15.7 Let p and q be primes, and suppose that q
k
[(p 1) . Then there exists a number
a of order q
k
modulo p.
Proof. By Corollary 9.9, the congruence x
q
k
1 (mod p) has exactly q
k
roots. By Theo-
rem 15.3 (i), the order of such a root is a divisor of q
k
. If a is a root of order less than q
k
,
then a is the root of the congruence x
q
k1
1 (mod p) , but this congruence has exactly
q
k1
roots. Hence, there are exactly q
k
q
k1
incongruent numbers of order precisely q
k
.
.
Theorem 15.8 If p is a prime, then there exist exactly (p 1) primitive roots modulo p.
Proof. By the last statement of Theorem 15.5, it is enough to show that there exists at
least one primitive root modulo p. Let p 1 = q
k
1
1
q
k
2
2
q
k
r
r
be the factorization of p 1
into distinct primes. By Lemma 15.7 there are integers a
i
of order q
k
i
i
for i = 1, 2, . . . , r .
The numbers q
k
i
i
are pairwise relatively prime, so by repeated use of Lemma 15.6 we see
that g = a
1
a
2
a
r
has order p 1, that is g is a primitive root modulo p. .
Suppose that g is a primitive root modulo m. If (a, m) = 1, then Theorem 15.5
implies that there is a unique integer i , with 0 i (m) 1 such that g
i
a (mod m) .
This fact allows us to make the following denition.
Denition 15.9 Let g be a primitive root of m, and suppose (a, m) = 1. The smallest
nonnegative integer i such that g
i
a (mod m) is called the index of a (to the base g) and
is denoted by ind a.
The index depends on both the modulus m and the root g, but since m and g are
usually xed, the notation should cause no confusion.
There is a strong similarity between logarithms and indices, and the following
theorem states the most important properties. The proof is simple and is left to the reader.
15. Primitive Roots 47
Theorem 15.10 Suppose g is a primitive root modulo m, and let ind a denote the index of a
to the base g.
(i) ind1 = 0 and ind g = 1.
(ii) a b (mod m) if and only if ind a = indb.
(iii) ind ab ind a + indb (mod (m)) .
(iv) ind a
k
k ind a (mod (m)) , for all nonnegative integers k.
Theorem 15.11 Let m be a positive integer having a primitive root, and suppose (a, m) = 1.
Then the congruence x
n
a (mod m) has a solution if and only if
(1) a
(m)/(n,(m))
1 (mod m).
If the congruence x
n
a (mod m) is solvable, then it has exactly (n, (m)) incongruent
solutions.
Proof. Let g be a primitive root modulo m, and let d = (n, (m)) . Taking indices, we see
that the congruence x
n
a (mod m) holds if and only if n ind x ind a (mod (m)) .
By Theorem 5.1, this congruence is solvable if and only if d[ ind a, and if solutions exist,
then there are exactly d incongruent solutions.
To complete the proof, we show that (1) holds if and only if d [ ind a. Taking indices,
we see that (1) is equivalent to ((m)/d) ind a 0 (mod (m)) , which holds if and only if
d [ ind a. .
If m has a primitive root, then the solutions of a solvable the congruence x
n
a
(mod m) can be found using indices, provided we compute (or have available) a table of
indices for the given modulus m. See Example 1.
Since every prime modulus has a primitive root, we have the following corollary of
Theorem 15.11.
Corollary 15.12 Suppose p is prime and (a, p) = 1. Then the congruence x
n
a (mod p) is
solvable if and only if
a
(p1)/(n, p1)
1 (mod p).
Note. The corollary gives an efcient procedure for determining whether the congruence
x
n
a (mod p) is solvable, but to actually nd a solution is more difcult. However, if
(n, p 1) = 1, this is relatively easy. Use the Euclidean Algorithm to nd positive integers
s and t such that sn = t(p 1) +1; then a
sn
= a
t(p1)
a a (mod p) , that is a
s
is a solution
of the congruence x
n
a (mod p) .
The following corollary is a generalization of Corollary 9.9.
Corollary 15.13 Suppose that m has a primitive root and that n[(m) . Then the congruence
x
n
1 0 (mod m) has exactly n roots.
Proof. The congruence x
n
1 (mod m) is obviously solvable. Hence, by Theorem 15.11
it has (n, (m)) = n incongruent solutions. .
We next show that if p is an odd prime, then p
k
has primitive roots for each k.
Theorem 15.14 Suppose that p is an odd prime.
(i) If g is a primitive root modulo p, then g + np is a primitive root modulo p
2
for exactly
p 1 values of n modulo p.
(ii) If g is a primitive root modulo p
2
, then g is a primitive root modulo p
k
for all k 2.
Proof. Let h denote the order of g+np modulo p
2
. ( h may depend on n.) Then h[(p
2
) ,
that is h[p(p 1) .
48 Lectures on Number Theory
But (g + np)
h
1 (mod p
2
) implies (g + np)
h
1 (mod p) , and by the Binomial
theorem, (g + np)
h
= g
h
+
h
j=1
_
h
j
_
(np)
j
g
hj
g
h
(mod p) , and hence g
h
1 (mod p) .
Since g has order p 1, it follows that (p 1)[h.
Thus h = p 1 or h = p(p 1) . In the latter case, g + np is a primitive root of p
2
,
and in the former case it is not. We will prove that the former case arises only for one of
the p possible values of n.
Let f (x) = x
p1
1; then g is a root of the congruence f (x) 0 (mod p) and
f
/
(g) = (p 1)g
p2
, 0 (mod p) , since (g
p2
, p) = 1. Hence, by Theorem 10.1 there is a
unique root of the form g + np of the congruence f (x) 0 (mod p
2
) . This proves our
claim.
(ii) It sufces to prove that if g is a primitive root modulo p
k
, k 2, then g is also
a primitive root modulo p
k+1
. Let h be the order of g modulo p
k+1
; then h[(p
k+1
) , that
is h[p
k
(p 1) . Because g
h
1 (mod p
k+1
) implies g
h
1 (mod p
k
) and g is a primitive
root modulo p
k
, (p
k
) must divide h, that is p
k1
(p 1)[h.
Thus either h = p
k1
(p 1) or h = p
k
(p 1) = (p
k+1
) . In the latter case, g is a
primitive root modulo p
k+1
as claimed. We must show that the former case is excluded.
Let t = (p
k1
) ; then g
t
1 (mod p
k1
) by Eulers theorem, and therefore g
t
=
1 + np
k1
for some integer n. If p[n then we would have g
t
1 (mod p
k
) , which
contradicts the fact that g is primitive root modulo p
k
. Thus, p,[ n.
By the Binomial Theorem
g
pt
= (g
t
)
p
= (1 + np
k1
)
p
= 1 + np
k
+
p(p 1)
2
n
2
p
2k2
+ . . . 1 + np
k
(mod p
k+1
).
Here, we have used that fact that the integer
p(p 1)
2
n
2
p
2k2
=
p 1
2
n
2
p
2k1
is divisible
by p
k+1
, because 2k 1 k + 1 when k 2, and the remaining omitted terms in the
expansion contain even higher powers of p.
Since p,[ n, we now conclude that
g
pt
, 1 (mod p
k+1
).
Therefore, h ,= pt = p(p
k1
) = p
k1
(p 1) , and the proof is complete. .
Example 9. Since 2
2
1 , 1 (mod 5) , we conclude that the order of 2 modulo 5
must be 4, that is 2 is a primitive root of 5. By Theorem 15.14, 2 + 5n is a primitive root of
25 for exactly four values of n, 0 n 4. Since (25) = 20, the primitive roots of 25 have
order 20. The order h modulo 25 of an arbitrary number a is a divisor of 20. If h < 20,
then either h[4 or h[10, so it follows that a
4
1 (mod 25) or a
10
1 (mod 25) . Hence,
to nd whether a number a has order 20 it is enough to compute a
4
and a
10
modulo
25; the order is 20 if and only if none of these two powers are congruent to 1. For a = 2
we obtain 2
2
4, 2
4
16, 2
8
6 and 2
10
24. Hence, the order of 2 is 20, i.e. 2 is a
primitive root of 25.
For a = 7 we obtain 7
2
1 and 7
4
1 (mod 25) , that is the order of 7 is 4, and
7 is not a primitive root of 25. Of course, it now follows that 12, 17 and 22 are primitive
roots of 25.
By Theorem 15.14 (ii), 2 is a primitive root of 5
k
for all k. .
Theorem 15.15 Suppose that p is an odd prime, and let g be a primitive root modulo p
k
. If g
is odd, then g is also a primitive root modulo 2p
k
, and if g is even, then g + p
k
is a primitive root
modulo 2p
k
.
Proof. If g is odd, then g
j
1 (mod 2) for every j 1. Thus g
j
1 (mod 2p
k
) if and
only if g
j
1 (mod p
k
) , and hence the order of g modulo 2p
k
is equal to the order of g
modulo p
k
, namely (p
k
) . Since (2p
k
) = (p
k
) , g is a primitive root of 2p
k
.
16. Arithmetic Functions 49
If g is even, then g cannot be a primitive root of 2p
k
, for a primitive root is always
relatively prime to the modulus. But g + p
k
is odd and, since it is congruent to g modulo
p
k
, it is also a primitive root modulo p
k
. Hence, g + p
k
is a primitive root of 2p
k
by the
preceding argument. .
Example 10. By Example 9, 2 is a primitive root of 5
k
for each k. Hence, 2 + 5
k
is a
primitive root of 2 5
k
for each k. In particular 7 is a primitive root of 10, and 27 is a
primitive root of 50. By the same example, 17 is also a primitive root of 5
k
for each k.
Since 17 is odd, it follows that 17 is a primitive root of 2 5
k
for each k. .
Theorem 15.16 There exists a primitive root modulo m if and only if m = 1, 2, 4, p
k
, or 2p
k
,
where p is an odd prime and k is an arbitrary positive integer.
Proof. First note that 1, 2, and 4 have primitive roots (1, 1, and 3, respectively). Theorems
15.8, 15.14, and 15.15 imply that p
k
and 2p
k
have primitive roots whenever p is an odd
prime and k is an arbitrary positive integer.
Conversely, to prove that these are the only positive integers having primitive roots,
assume m > 2 has a primitive root. By Corollary 15.13, the congruence x
2
1 (mod m)
has exactly 2 incongruent solutions (because 2[(m) for all m 3). Theorem 11.5 now
implies that m must be either 4, p
k
, or 2p
k
, where p is an odd prime. .
Concluding remarks. Readers with a basic knowledge of group theory may have noticed that most
of the notions in this section are special cases of general group notions.
If G is a general nite group with identity element e , then the order ord a of an element a is
dened to be the smallest positive integer n satisfying a
n
= e , while the order of the group, ord G,
is dened to be the number of elements in G. If h = ord a, then h[ ord G and e, a, a
2
, . . . , a
h1
is
a subgroup of G. This subgroup coincides with G if ord a = ord G, and G is then called a cyclic
group with a as a generator.
Applying these general notions to the specic case when G is the group Z
m
of all residue
classes modulo m that are relatively prime to m, we see that the order h of an integer a modulo m
coincides with the order of the residue class a in Z
m
, that h[(m) , that a number g is a primitive
root modulo m if and only if the residue class g generates Z
m
, and that there exists a primitive root
modulo m if and only if the group Z
m
is a cyclic group. Using the language of groups we can state
Theorem 15.16 as follows: The group Z
m
is cyclic if and only if m = 1, 2, 4, p
k
or 2p
k
, where p is
an odd prime and k is an arbitrary positive integer.
16. Arithmetic Functions
Functions that are dened for all positive integers and whose range is a subset of R (or
more generally C) are called arithmetic functions.
We have already considered one very important arithmetic functions the Euler
-function. Other important arithmetic functions to be considered in this section are
(n) , the number of positive divisors of n;
(n) , the sum of the positive divisors of n;
k
(n) , the sum of the kth powers of the positive divisors of n.
We will use the following sum and product conventions.
d[n
f (d) and
d[n
f (d)
denote the sum and product, respectively, of f (d) over all positive divisors d of n. For
example,
d[12
f (d) = f (1) + f (2) + f (3) + f (4) + f (6) + f (12) .
50 Lectures on Number Theory
Using this notation, we have
(n) =
d[n
1, (n) =
d[n
d,
k
(n) =
d[n
d
k
.
Note that the divisor function (n) and (n) are special cases, since (n) =
0
(n) and
(n) =
1
(n) .
Denition 16.1 An arithmetic function f (n) is called multiplicative if it is not identically
zero and satises f (mn) = f (m) f (n) for every pair of relatively prime positive integers m
and n. If f (mn) = f (m) f (n) for each pair m and n, relatively prime or not, then f (n) is
said to be completely multiplicative.
If f is a multiplicative function, then f (n) = f (n) f (1) for every positive integer n,
and since there is an n for which f (n) ,= 0, it follows that f (1) = 1. Using mathematical
induction, it is easy to prove that if m
1
, m
2
, . . . , m
r
are pairwise relatively prime positive
integers, then
f (m
1
m
2
m
r
) = f (m
1
) f (m
2
) f (m
r
).
In particular, this holds whenever the integers m
1
, m
2
, . . . , m
r
are powers of distinct
primes. Thus, if n = p
k
1
1
p
k
2
2
p
k
r
r
is the canonical factorization of the integer n > 1 as
a product of powers of distinct primes, then f (n) = f (p
k
1
1
) f (p
k
2
2
) f (p
k
r
r
) . Therefore, the
value of f (n) for every n is completely determined by the values f (p
k
) for all prime
powers.
We already know that (n) is multiplicative (Theorem 6.2), and we have used this
fact to obtain a formula for (n) .
Our next theorem yields a general method for constructing multiplicative functions.
Theorem 16.2 Let f (n) be a multiplicative function, and let F(n) =
d[n
f (d) . Then F(n) is
multiplicative.
Proof. Let (m, n) = 1. If d[mn, then d = d
1
d
2
, where d
1
[m and d
2
[n. Moreover, d
1
=
(m, d) , d
2
= (n, d) and (d
1
, d
2
) = 1, and the factorization is unique. Consequently,
F(mn) =
d[mn
f (d) =
d
1
[m
d
2
[n
f (d
1
d
2
) =
d
1
[m
d
2
[n
f (d
1
) f (d
2
)
=
d
1
[m
f (d
1
)
d
2
[n
f (d
2
) = F(m)F(n).
.
Corollary 16.3 (i) The functions (n) , (n) , and more generally,
k
(n) are multiplicative.
(ii) If n = p
k
1
1
p
k
2
2
p
k
r
r
, then
(n) =
r
j=1
(k
j
+ 1) and (n) =
r
j=1
_
p
k
j
+1
1
p
j
1
_
.
Proof. (i) Since
k
(n) =
d[n
d
k
, and the function f (n) = n
k
is (completely) multiplicative,
it follows fromthe previous theoremthat
k
(n) is multiplicative. (n) and (n) are special
cases.
(ii) The positive divisors of p
k
are 1, p, p
2
, . . . , p
k
, and hence (p
k
) = k + 1 and
(p
k
) =
k
j=0
p
j
= (p
k+1
1)/(p 1) . The formulas for (n) and (n) follow from this. .
16. Arithmetic Functions 51
Theorem 16.4 For every positive integer n,
d[n
(d) = n.
Proof. Write F(n) =
d[n
(d) ; then F(n) is multiplicative by Theorem 16.2. Since the
function G(n) = n is also multiplicative, it sufces to verify that F(p
k
) = p
k
for all prime
powers p
k
in order to prove that F(n) = n for all n. But (p
j
) = p
j
p
j1
for j 1, and
hence
F(p
k
) =
d[p
k
(d) =
k
j=0
(p
j
) = 1 +
k
j=1
(p
j
p
j1
) = p
k
. .
Let f (n) be an arithmetic function, and dene F(n) =
d[n
f (n) . Is the function f
uniquely determined by the function F? We have
_
_
F(1) = f (1)
F(2) = f (1) + f (2)
F(3) = f (1) + f (3)
F(4) = f (1) + f (2) + f (4)
F(5) = f (1) + f (5)
.
.
.
F(n) = f (1) + . . . + f (n)
This can be viewed as a triangular system of linear equations with f (1) , f (2) , . . . ,
f (n) as unknowns. It is now obvious that f (n) is a linear combination of F(1) , F(2) , . . . ,
F(n) with integral coefcients. In particular, the function f is uniquely determined by
the function F. Our next objective is to derive a formula for f (n) , and for this we will
need the following function.
Denition 16.5 Dene
(n) =
_
_
_
1 if n = 1,
0 if n is divisible by p
2
for some prime p
(1)
r
if n = p
1
p
2
p
r
, where p
1
, p
2
, . . . , p
r
are distinct primes.
The function is called Mbius -function.
Theorem 16.6 The function (n) is multiplicative and
d[n
(d) =
_
1 if n = 1
0 if n > 1.
Proof. Multiplicativity is obvious. Dene F(n) =
d[n
(d) ; then F(n) is multiplicative
by Theorem 16.2. Since (p) = 1 and (p
j
) = 0 for j 2, we have F(p
k
) =
k
j=0
(p
j
) =
(1) + (p) = 1 1 = 0, for all primes p and all k 1. Hence, F(n) = 0 for all n > 1, and
F(1) = (1) = 1. .
Theorem 16.7 (Mbius inversion formula) Let f be an arbitrary arithmetic function. If
F(n) =
d[n
f (d) for every positive integer n, then f (n) =
d[n
(d)F(n/d) .
Proof. Using the denition of F we obtain
d[n
(d)F(n/d) =
d[n
(d)
k[(n/d)
f (k) =
all d , k with dk[n
(d) f (k).
52 Lectures on Number Theory
Now we can reverse the order of summation and write the last sum in the form
k[n
f (k)
d[(n/k)
(d).
By Theorem 16.6,
d[(n/k)
(d) = 0 except for k = n, when the value is 1. Hence,
d[n
(d)F(n/d) =
k[n
f (k)
d[(n/k)
(d) = f (n) . .
The following converse is also true.
Theorem 16.8 If f (n) =
d[n
(d)F(n/d) for every positive integer n, then F(n) =
d[n
f (d) .
Proof. Dene G(n) =
d[n
f (d) ; then f (n) =
d[n
(d)G(n/d) by Theorem 16.7. Thus,
(1)
d[n
(d)F(n/d) =
d[n
(d)G(n/d)
holds for all n. We will use induction to show that this implies that F(n) = G(n) for all
positive integers n.
First of all, taking n = 1 in (1) we get (1)F(1/1) = (1)G(1/1) , that is F(1) = G(1) .
Suppose that we have F(m) = G(m) for all m < n. Since n/d < n for all positive divisors
d of n except for d = 1, (1) now simplies to (1)F(n/1) = (1)G(n/1) , and we conclude
that F(n) = G(n) . This completes the induction. .
17. Sums of Squares
In this section we will treat the problem of representing a positive integer as a sum of
squares. In particular, we will determine which numbers can be written as the sum of
two squares, and we will prove that every positive number is a sum of four squares.
By denition, a positive integer n is the sumof two squares if the equation x
2
+y
2
= n
has an integral solution x, y. Since x
2
0 or 1 (mod 4) for all integers x, it is clear that
the sum x
2
+ y
2
of two squares is never congruent to 3 modulo 4. Hence no integer of the
form 4m + 3 is the sum of two squares. For primes we have the following necessary and
sufcient condition
Theorem 17.1 Let p be a prime. Then p is the sum of two squares if and only if p = 2 or
p 1 (mod 4) .
Proof. By the preceding comments, no prime 3 (mod 4) is the sum of two squares,
and clearly 2 = 1
2
+ 1
2
. It remains to prove that every prime congruent to 1 modulo 4 is
the sum of two squares.
Assume p 1 (mod 4) and write N = [
p < N + 1. The
number 1 is a quadratic residue of p by Theorem 11.9, and hence there is an integer
i such that i
2
1 (mod p) . Let A be the set of all pairs (j, k) , where j and k are
integers in the interval [0, N] , let B denote the set of all residue classes modulo p, i.e.
B = 0, 1, . . . , p 1, and dene a function f : A B by f (x, y) = x + iy. Since B has
p elements and A has (N + 1)
2
> p elements, it is impossible for the function f to be
injective. Hence there must be two distinct pairs (x
1
, y
1
) and (x
2
, y
2
) in A that are mapped
onto the same equivalence class, that is x
1
+ iy
1
x
2
+ iy
2
(mod p) . Let a = x
1
x
2
and
b = y
1
y
2
; then a and b are not both zero and a ib (mod p) , and by squaring we
obtain a
2
i
2
b
2
b
2
(mod p) , that is a
2
+ b
2
is a multiple of p. But [a[ N and
[b[ N, and therefore 0 < a
2
+ b
2
2N
2
< 2p. It follows that a
2
+ b
2
= p. .
17. Sums of Squares 53
Lemma 17.2 Suppose that n = a
2
+ b
2
and that the prime factorization of n contains the prime
factor q, where q 3 (mod 4) . Then
(i) q[a and q[b;
(ii) q must appear to an even power in the prime factorization of n.
Proof. (i) Assume q,[ a. Then there is a number s such that sa 1 (mod q) , and by
multiplying the congruence a
2
+ b
2
0 (mod q) by s
2
we obtain (sb)
2
= s
2
b
2
s
2
a
2
j=1
p
j
j
s
j=1
q
j
j
,
where the p
j
s and q
j
s are distinct primes, p
j
1 (mod 4) and q
j
3 (mod 4) . Then n can
be expressed as a sum of two squares if and only if all the exponents
j
are even.
Proof. The sufciency of the condition follows by repeated application of Lemma 17.3,
because 2 and the primes p
j
can be represented as a sum of two squares by Theorem 17.1,
and the square q
2
(of any number q) is obviously the sum of two squares ( = q
2
+0
2
). The
necessity follows immediately from Lemma 17.2. .
Having decided which numbers are sums of two squares, the next natural task is to
determine which numbers are representable as sums of three squares. Since the quadratic
residues of 8 are 0, 1, and 4, the sum a
2
+ b
2
+ c
2
of three squares can never be congruent
to 7 modulo 8. Hence, no integer of the form 8m + 7 is representable as a sum of three
squares. It is not difcult to extend the argument to show that no number of the form
4
k
(8m + 7) is the sum of three squares. Conversely, any other number can be written as a
sum of three squares. The proof, which is due to Gauss, is complicated and will not be
given here. Thus the complete characterization is as follows.
Theorem 17.5 A positive integer can be expressed as a sum of three squares if and only if it is
not of the form 4
k
(8m + 7) .
When it comes to the question of representing numbers as sums of four squares, we
have the following simple answer.
Theorem 17.6 Every positive integer is the sum of four squares.
The rst complete proof of this result was given by Lagrange in 1770. We will base
our proof on the following two lemmas.
54 Lectures on Number Theory
Lemma 17.7 Suppose 3m is a sum of four squares. Then m is also a sum of four squares.
Proof. Let 3m = a
2
+ b
2
+ c
2
+ d
2
. Since every square is congruent to 0 or to 1 modulo 3,
there are only two possibilities: either all four of the squares a
2
, b
2
, c
2
, and d
2
are
congruent to 0 modulo 3, or one of them, a
2
say, is congruent to 0 whereas the other three
are congruent to 1. In the rst case, a b c d 0 (mod 3) , and in the second case
a 0, b 1, c 1, and d 1 (mod 3) . In any case we have, by changing signs
of b, c, and d if necessary, b c d (mod 3) . It follows that the four numbers b + c + d,
a +b c, a +c d, and a b +d are all divisible by 3, and by expanding and simplifying,
we nd that
_
b + c + d
3
_
2
+
_
a + b c
3
_
2
+
_
a + c d
3
_
2
+
_
a b + d
3
_
2
=
3a
2
+ 3b
2
+ 3c
2
+ 3d
2
9
=
9m
9
= m,
that is m is the sum of four squares. .
Lemma 17.8 Let n be a square-free integer, i.e. n is not divisible by p
2
for any prime p. Then
there exist integers a and b such that a
2
+ b
2
1 (mod n) .
Proof. We will rst prove that the result holds when n is a prime p. For p = 2 and
p 1 (mod 4) , it follows from Theorem 11.9 that there is a number a such that a
2
1
(mod p) ; thus we can take b = 0. The case p 3 (mod 4) remains, and we will give a
proof for this case that works for all odd primes.
Let m = (p 1)/2, and dene
A = 0
2
, 1
2
, 2
2
, . . . , m
2
and B = 1 0
2
, 1 1
2
, 1 2
2
, . . . , 1 m
2
.
Any two elements of A are incongruent modulo p. For let 0 i m. The congruence
x
2
i
2
(mod p) has exactly two roots i modulo p, and the only number x in the
interval [0, m] that is congruent to i is x = i , since p i > m. Similarly, any two
elements of B are incongruent modulo p. Thus, each of the sets A and B contains
m+1 = (p+1)/2 incongruent integers. Since their union contains p+1 integers, it follows
that there is an element a
2
of A and an element 1 b
2
of B such that a
2
1 b
2
(mod p) , that is a
2
+ b
2
1 (mod p) .
Suppose now that n = p
1
p
2
p
r
is a product of distinct primes. For each prime
p
j
, choose a
j
, b
j
such that a
2
j
+ b
2
j
1 (mod p
j
) . By the Chinese Remainder Theorem,
there are numbers a and b such that a a
j
(mod p
j
) and b b
j
(mod p
j
) for all j . It
follows that a
2
+ b
2
1 (mod p
j
) holds for all j , and this implies that a
2
+ b
2
1
(mod n) . .
Proof of Theorem 17.6. Let n be a positive integer and write n = k
2
m, where m is square-
free. If m is the sumof four squares, say m = a
2
+b
2
+c
2
+d
2
, then n = (ak)
2
+(bk)
2
+(ck)
2
+(dk)
2
is a sum of four squares, too.
Hence, we may as well assume that n is square-free. Then, by Lemma 17.8, there
are two integers a and b such that a
2
+ b
2
1 (mod n) . Consider all ordered pairs
(ax + by z, bx ay w) , where x, y, z, and w range over all integers from 0 to [
n ] .
There are (1 + [
n ])
4
> n
2
choices for the quadruple (x, y, z, w) but only n
2
distinct
orderedpairs modulo n. Hence there exist distinct orderedquadruples (x
1
, y
1
, z
1
, w
1
) and
(x
2
, y
2
, z
2
, w
2
) , withall entries lyinginthe interval from0 to [
n ] , suchthat ax
1
+by
1
z
1
ax
2
+ by
2
z
2
(mod n) and bx
1
ay
1
w
1
bx
2
ay
2
w
2
(mod n) .
18. Pythagorean Triples 55
Let x = x
1
x
2
, y = y
1
y
2
, z = z
1
z
2
, and w = w
1
w
2
. Then ax + by z
(mod n) and bx ay w (mod n) . Therefore, (ax +by)
2
+(bx ay)
2
z
2
+w
2
(mod n) .
But (ax + by)
2
+ (bx ay)
2
= (a
2
+ b
2
)(x
2
+ y
2
) (x
2
+ y
2
) (mod n) . It follows that
x
2
+ y
2
+ z
2
+ w
2
0 (mod n) , that is x
2
+ y
2
+ z
2
+ w
2
= kn for some integer k. Clearly,
[x[ , [y[ , [z[ , and [w[ are all less than or equal to [
n ]
2
< 4n, and thus k = 1, 2, or 3.
If k = 1, we are done, while if k = 3, then 3n is a sum of four squares, and n itself is
a sum of four squares by Lemma 17.7. Now suppose k = 2; since 2n is even, either zero,
two, or four of the numbers x, y, z, and w are even. If exactly two are even, we may
suppose that they are x and y. In each case, the numbers x y and z w are even, and
hence (x y)/2 and (z w)/2 are integers. But
_
x + y
2
_
2
+
_
x y
2
_
2
+
_
z + w
2
_
2
+
_
z w
2
_
2
=
2x
2
+ 2y
2
+ 2z
2
+ 2w
2
4
=
4n
4
= n,
and hence n is a sum of four squares. This completes the proof. .
The same year as Lagrange proved the four squares theorem, Waring made the
following broader conjecture: Every number is the sum of 4 squares, 9 cubes, 19 fourth
powers, and in general, a sum of a xed number of kth powers. This conjecture was
nally settled in 1909 when David Hilbert proved the following theorem.
Theorem 17.9 For any k 2, there is a smallest positive integer s(k) such that every positive
integer can be expressed as a sum of s(k) nonnegative kth powers.
Hilberts proof was a pure existence proof and gave no method of determining s(k) .
A natural problem is to determine s(k) explicitely. Lagranges theorem and the fact that
7 is not the sum of three squares shows that s(2) = 4. To determine s(k) for k 3 turned
out to be very difcult. It is now known that s(3) = 9, s(4) = 19, and s(5) = 37.
It is rather easy to show that the number n = 2
k
[(3/2)
k
] 1 can not be expressed
as a sum with fewer than 2
k
+ [(3/2)
k
] 2 kth powers. Hence, s(k) 2
k
+ [(3/2)
k
] 2
for every k 2, and it has been conjectured that in fact s(k) = 2
k
+ [(3/2)
k
] 2 for every
value of k. This equality is now known to hold for every k 471 600 000 and also for all
sufciently large k, so the conjecture is very likely true.
18. Pythagorean Triples
In this section we will consider the problem of nding right triangles having sides of
integral length, that is of nding integer solutions of the equation
x
2
+ y
2
= z
2
.
This problem was considered in Egypt long before Pythagoras, but he is credited for
having found a formula generating innitely many solutions.
Denition 18.1 If x, y and z are positive integers satisfying x
2
+ y
2
= z
2
, then (x, y, z)
is called a Pythagorean triple. If, in addition, the three numbers are pairwise relatively
prime, then (x, y, z) is called a primitive triple.
56 Lectures on Number Theory
Proposition 18.2 If (x, y, z) is a Pythagorean triple, then (x, y) = (x, z) = (y, z) .
Proof. Suppose d[x. If d[y, then d
2
[(x
2
+ y
2
) , and hence d[z. Similarly, if d[z, then
d
2
[(z
2
x
2
) , so d[y. It follows that x and y have the same common divisors as x and z,
and in particular the two pairs have the same greatest common divisor, i. e. (x, y) = (x, z) .
Similarly, (x, y) = (y, z) . .
In particular, if (x, y, z) is a Pythagorean triple and two of the three numbers are
relatively prime, then the three numbers are pairwise relatively prime, that is (x, y, z) is a
primitive triple.
Theorem 18.3 Every Pythagorean triple is a multiple of a primitive Pythagorean triple. Con-
versely, every multiple of a Pythagorean triple is a Pythagorean triple.
Proof. If (x, y, z) is a Pythagorean triple and d = (x, y) is the greatest common divisor
of x and y, then clearly (x/d, y/d, z/d) is a primitive Pythagorean triple by Proposition
18.2, and hence (x, y, z) is a multiple of a primitive triple. The converse is obvious. .
Proposition 18.4 Suppose (x, y, z) is a primitive Pythagorean triple. Then x and y are of
opposite parity, i.e. one of the numbers is odd and the other is even.
Proof. Since (x, y) = 1, both numbers cannot be even. Suppose x and y are both odd.
Then x
2
y
2
1 (mod 4) and hence z
2
2 (mod 4) , which is impossible. Therefore, x
and y are of opposite parity. .
In order to nd all Pythagorean triples it sufces, by Theorem 18.3, to nd all
primitive triples (x, y, z) , and when we are looking for primitive triples, it is no restriction
to assume that x is odd and y is even, because (x, y, z) is a Pythagorean triple if and only
if (y, x, z) is so.
All primitive Pythagorean triples (x, y, z) with even y are generated as follows.
Theorem 18.5 A triple (x, y, z) , with y even, is a primitive Pythagorean triple if and only if it
is of the form
x = a
2
b
2
, y = 2ab, z = a
2
+ b
2
,
where a and b are relatively prime positive integers of opposite parity with a > b.
Proof. If x, y, and z are dened as above, then x
2
+y
2
= (a
2
b
2
)
2
+4(ab)
2
= (a
2
+b
2
)
2
= z
2
,
that is (x, y, z) is a Pythagorean triple. To see that it is primitive, assume that (x, z) > 1.
Then, x and z have a common prime divisor p which must be odd, since x and z are
both odd. Note that z + x = 2a
2
and z x = 2b
2
, and hence p[2a
2
and p[2b
2
. Since p is
odd, it follows that p[a and p[b, which contradicts the assumption (a, b) = 1. Therefore,
(x, z) = 1 and it follows from Proposition 18.2 that (x, y, z) is a primitive triple.
Conversely, assume (x, y, z) is a primitive Pythagorean triple with y even. Then x
and z must be odd. Thus z + x and z x are even. Let r = (z + x)/2 and s = (z x)/2.
Then z = r + s and x = r s, so any common divisor of r and s is a common divisor of x
and z. Since (x, z) = 1 it follows that (r, s) = 1. Since y
2
= z
2
x
2
= (z + x)(z x) = 4rs,
we have (y/2)
2
= rs; but r and s are relatively prime, so r and s must each be perfect
squares. Write r = a
2
and s = b
2
, where a and b are positive integers; then a > b and
x = r s = a
2
b
2
, z = r + s = a
2
+ b
2
, and y = 2
2 + 1 = 2 + (
2 1) = 2 +
1
2 + 1
= 2 +
1
2 +
1
2 + 1
= 2 +
1
2 +
1
2 +
1
2 + 1
= 2 +
1
2 +
1
2 +
1
2 +
1
.
.
.
We start by giving a formal denition of nite continued fractions. Though we are
mainly interested in continued fractions whose terms are integers, it is convenient with a
more general denition.
Denition 20.1 Let a
0
, a
1
, . . . , a
n
be real numbers, all positive except possibly a
0
. The
expression
a
0
+
1
a
1
+
1
a
2
+
1
.
.
.
+
1
a
n1
+
1
a
n
is called a nite continued fraction and is denoted by a
0
, a
1
, . . . , a
n
). The numbers a
k
are
called the terms or the partial quotients of the continued fraction.
If the reader does not like the dots in the above denition, the following recursive
denition should satisfy her completely:
a
0
) = a
0
a
0
, a
1
) = a
0
+ 1/a
1
)
a
0
, a
1
, . . . , a
n
) = a
0
, a
1
, . . . , a
n2
, a
n1
+
1
a
n
), if n 2.
20. Continued Fractions 59
The reason for assuming a
k
> 0 for k 1 in the above denition is that this
guarantees that no division by zero will occur.
A continued fraction with n+1 terms can be compressed by viewing it as composed
of two shorter continued fractions as follows, which is very useful in induction proofs.
Proposition 20.2 Let 1 k n. Then
(i) a
0
, a
1
, . . . , a
n
) = a
0
, a
1
, . . . , a
k1
, a
k
, a
k+1
, . . . , a
n
)), and
(ii) a
0
, a
1
, . . . , a
n
) = a
0
+
1
a
1
, . . . , a
n
)
.
Proof. The formulas should be obvious from the very denition of continued fractions.
For a formal proof of (i), use induction on the number m of terms in the innermost
continued fraction a
k
, a
k+1
, . . . , a
n
). If m = 1, that is k = n, then a
n
) = a
n
, and there is
nothing to prove. If m = 2, then a
n1
, a
n
) = a
n1
+ 1/a
n
, and the identity (i) coincides
with the recursive denition of a
0
, a
1
, . . . , a
n
).
Now suppose inductively that the identity (i) holds whenever the innermost contin-
ued fraction has m terms, and consider the case when a
k
, a
k+1
, . . . , a
n
) has m + 1 terms.
By the induction hypothesis applied twice and the case m = 2 applied once, we obtain
a
0
, a
1
, . . . , a
n
) = a
0
, a
1
, . . . , a
k1
, a
k
, a
k+1
, . . . , a
n
))
= a
0
, a
1
, . . . , a
k1
, a
k
, a
k+1
, . . . , a
n
)))
= a
0
, a
1
, . . . , a
k1
, a
k
, a
k+1
, . . . , a
n
)).
This completes the induction argument.
(ii) is a special case of (i), obtained by taking k = 1. .
Innite continued fractions are dened as limits of nite continued fractions in a
straightforward way.
Denition 20.3 Let (a
n
)
n=0
be a sequence of real numbers, all positive exept possibly a
0
.
The sequence (a
0
, a
1
, . . . , a
n
))
n=0
is called an innite continued fraction and is denoted by
a
0
, a
1
, a
2
, . . .). The innite continued fraction is said converge if the limit
lim
n
a
0
, a
1
, . . . , a
n
)
exits, and in that case the limit is also denoted by a
0
, a
1
, a
2
, . . .).
In order to determine the convergence of a given innite continued fraction we
need to consider the nite continued fractions a
0
, a
1
, . . . , a
n
) for increasing values of n.
Suppose now that we have computed the value of a
0
, a
1
, . . . , a
n
) and want to compute
the value of a
0
, a
1
, . . . , a
n
, a
n+1
) without having to repeat the whole computation from
scratch. The recursion formula (ii) in Proposition 20.2 will then be of no use, since
it denes a
0
, a
1
, . . . , a
n
, a
n+1
) in terms of a
0
and a
1
, . . . , a
n
, a
n+1
) and not in terms of
a
0
, a
1
, . . . , a
n
) and a
n+1
. Fortunately, there is an easy way to compute the continued
fractions a
0
, a
1
, . . . , a
n
) in succession, and we will now describe this method.
Denition 20.4 Let (a
n
)
N
n=0
be a nite ( N N) or innite ( N = ) sequence of real
numbers, all positive except possibly a
0
, and dene two sequences (p
n
)
N
n=2
and (q
n
)
N
n=2
recursively as follows:
p
2
= 0, p
1
= 1, p
n
= a
n
p
n1
+ p
n2
if n 0,
q
2
= 1, q
1
= 0, q
n
= a
n
q
n1
+ q
n2
if n 0.
The pair (p
n
, q
n
) , as well as the quotient p
n
/q
n
(where n 0), is called the nth convergent
of the given sequence (a
n
)
N
n=0
or, equivalently, of the corresponding continued fraction.
Obviously, q
0
= 1, and q
n
> 0 for all n 0. Thus, (q
n
)
N
n=0
is a positive sequence.
The connection between continued fractions and convergents is given by the next
theorem, which also contains some crucial identities.
60 Lectures on Number Theory
Theorem 20.5 Let (a
n
)
N
n=0
be a sequence of real numbers, all positive except possibly a
0
, with
corresponding convergents (p
n
, q
n
) , and write c
n
= p
n
/q
n
. Then
(i) a
0
, a
1
, . . . , a
n
) = c
n
, for all n 0;
(ii) p
n
q
n1
p
n1
q
n
= (1)
n1
, if n 1;
(iii) c
n
c
n1
= (1)
n1
/q
n1
q
n
, if n 1;
(iv) p
n
q
n2
p
n2
q
n
= (1)
n
a
n
, if n 0;
(v) c
n
c
n2
= (1)
n
a
n
/q
n2
q
n
, if n 2.
Proof. (i): The case n = 0 is trivial, because c
0
= p
0
/q
0
= a
0
/1 = a
0
.
Suppose inductively that (i) holds for all continued fractions with n terms, and let
a
0
, a
1
, . . . , a
n
) be a continued fraction with n + 1 terms. Since
a
0
, a
1
, . . . , a
n
) = a
0
, a
1
, . . . , a
n2
, a
n1
+ 1/a
n
),
and since the latter continued fraction has n terms and its (n 1) st convergent equals
((a
n1
+ 1/a
n
)p
n2
+ p
n3
), (a
n1
+ 1/a
n
)q
n2
+ q
n3
) , we conclude that
a
0
, a
1
, . . . , a
n
) =
(a
n1
+ 1/a
n
)p
n2
+ p
n3
(a
n1
+ 1/a
n
)q
n2
+ q
n3
=
a
n
(a
n1
p
n2
+ p
n3
) + p
n2
a
n
(a
n1
q
n2
+ q
n3
) + q
n2
=
a
n
p
n1
+ p
n2
a
n
q
n1
+ q
n2
=
p
n
q
n
.
This completes the induction step.
(ii) Write z
n
= p
n
q
n1
p
n1
q
n
. Using the recursive denitions, we obtain
z
n
= p
n
q
n1
p
n1
q
n
= (a
n
p
n1
+ p
n2
)q
n1
p
n1
(a
n
q
n1
+ q
n2
)
= p
n2
q
n1
p
n1
q
n2
= z
n1
,
for n 0, and it follows at once that z
n
= (1)
n1
z
1
. But z
1
= 1, since p
1
= q
2
= 1
and p
2
= q
1
= 0. Hence, z
n
= (1)
n1
, as required.
(iii) follows from (ii) upon division by q
n1
q
n
, which is nonzero for n 1.
(iv) Using the recursive denition of p
n
and q
n
and equality (ii), we obtain
p
n
q
n2
p
n2
q
n
= (a
n
p
n1
+ p
n2
)q
n2
p
n2
(a
n
q
n1
+ q
n2
)
= a
n
(p
n1
q
n2
p
n2
q
n1
) = a
n
(1)
n2
= (1)
n
a
n
.
(v) follows from (iv) upon division by q
n2
q
n
. .
Example 1. We use Theorem 20.5 to evaluate the continued fraction 2, 5, 4, 3, 2, 1).
The computations are easily caried out by using the following table:
n 2 1 0 1 2 3 4 5
a
n
2 5 4 3 2 1
p
n
0 1 2 9 38 123 284 407
q
n
1 0 1 5 21 68 157 225
The entries are computed according to the recursive formulas given in Denition 20.4.
For example, to nd p
4
= a
4
p
3
+ p
2
, multiply a
4
= 2 by the last computed p-value p
3
( = 123) and add the preceding term p
2
( = 38) to obtain p
4
= 2(123) +(38) = 284.
Finally, note that 2, 5, 4, 3, 2, 1) = p
5
/q
5
= 407/225. The successive convergents are
2, 9/5, 38/21, 123/68, 284/157, and 407/225. .
20. Continued Fractions 61
Corollary 20.6 Let (a
n
)
N
n=0
be a nite or innite sequence of real numbers, all positive except
possibly a
0
, with convergents c
n
= p
n
/q
n
. The convergents c
2i
with even indices form a strictly
increasing sequence and the convergents c
2j+1
with odd indices forma strictly decreasing sequence,
and c
2i
< c
2j+1
, that is
c
0
< c
2
< . . . < c
2i
< . . . < c
2j+1
< . . . < c
3
< c
1
.
Proof. We have c
n
c
n2
= (1)
n
a
n
/q
n
q
n2
, by Theorem20.5 (v). Hence, if n 2 is even,
then c
n
c
n2
> 0 and if n 3 is odd, then c
n
c
n2
< 0. Finally, by Theorem 20.5 (iii),
c
2k+1
c
2k
= 1/q
2k
q
2k+1
> 0. Thus, if i j , then c
2j
< c
2i
< c
2i+1
and c
2i
< c
2i+1
< c
2j+1
. .
Example 2. In Example 1 we computed the continued fraction 2, 5, 4, 3, 2, 1) and its
successive convergents. It is easily veried that
2 <
38
21
<
284
157
<
407
225
<
123
68
<
9
5
in accordance with Corollary 20.6. .
Let (a
n
)
n=0
be a sequence of real numbers, all positive except possibly a
0
, with
convergents c
n
= p
n
/q
n
. ByTheorem20.5, c
n
= a
0
, a
1
, . . . , a
n
). Corollary20.6 implies that
the sequence (c
2k
)
k=0
of convergents with even indices is strictly increasing and bounded
above by c
1
. Hence, the limit c
/
= lim
k
c
2k
exists. Similarly, the sequence (c
2k+1
)
k=0
is strictly decreasing and bounded below by c
0
. Therefore, the limit c
//
= lim
k
c
2k+1
exists, too, and obviously c
2k
< c
/
c
//
< c
2k+1
for all k.
The limit
c = lim
n
c
n
= lim
n
a
0
, a
1
, . . . , a
n
)
exists if and only if c
/
= c
//
, that is if and only if c
2k+1
c
2k
0 as k . By Theorem
20.5, 0 < c
2k+1
c
2k
< 1/q
2k
q
2k+1
. Therefore, lim
n
q
n
= is a sufcient condition
for the existence of the limit c, i.e. for the convergence of the innite continued fraction
a
0
, a
1
, a
2
, . . .). Our next proposition gives a condition on the sequence (a
n
)
n=0
which will
guarantee that q
n
.
Proposition 20.7 Let (a
n
)
n=0
be a sequence with convergents (p
n
, q
n
) and assume that there is
a constant > 0 such that a
n
for all n 1. Then q
n
as n . More precisely,
there is a constant r > 1 and a positive constant C such that q
n
Cr
n
for all n 0.
The sequence (q
n
)
n=1
is strictly increasing if a
n
1 for all n 1 .
Proof. By assumption, q
n
= a
n
q
n1
+ q
n2
q
n1
+ q
n2
for all n 1. Let r denote the
positive root of the quadratic equation x
2
= x + 1, that is r = /2 +
_
1 +
2
/4, and let C
denote the smallest of the two numbers 1 and a
1
/r . Then q
0
= 1 Cr
0
and q
1
= a
1
Cr
1
.
We claim that q
n
Cr
n
for all n 0. This follows by induction, because if q
k
Cr
k
for
0 k n 1, then q
n
Cr
n1
+ Cr
n2
= Cr
n2
(r + 1) = Cr
n2
r
2
= Cr
n
. Obviously,
r > 1, so it follows that q
n
as n .
If a
n
1 for all n 1, then q
n
= a
n
q
n1
+ q
n2
q
n1
+ q
n2
> q
n1
for all n 2,
which means that the sequence (q
n
)
n=1
is strictly increasing. .
Denition 20.8 A sequence (a
n
)
n=0
of real numbers will be called admissible if there is a
positive constant such that a
n
for all n 1.
Asequence (a
n
)
n=0
consistingof integers, all positive except possibly a
0
, is obviously
admissible with = 1. In particular, for such sequences the corresponding sequence
(q
n
)
n=1
is strictly increasing and unbounded.
The discussion preceeding Proposition 20.7 may now be summarized as follows:
62 Lectures on Number Theory
Theorem 20.9 Let (a
n
)
n=0
be an admissible sequence with convergents c
n
= p
n
/q
n
. The innite
continued fraction = a
0
, a
1
, a
2
, . . .) is then convergent, and it satises
c
2n
< < c
2n+1
and (1)
a
n+2
q
n
q
n+2
< [ c
n
[ <
1
q
n
q
n+1
(2)
for all n 0.
Proof. It only remains to prove (2). By (1), for each n 0, the number belongs to the
interval with endpoints c
n
and c
n+1
, and hence
[ c
n
[ < [c
n+1
c
n
[ =
1
q
n
q
n+1
,
where the last equality follows from Theorem 20.5 (iii).
Moreover, the number c
n+2
lies strictly between the numbers c
n
and . Conse-
quently,
[ c
n
[ > [c
n+2
c
n
[ =
a
n+2
q
n
q
n+2
,
where the last equality is a consequence of Theorem 20.5 (v). This completes the proof of
the theorem. .
Is is often useful to regard an innite continued fractions as a nite continued
fraction with an innite continued fraction as its last term (cf. Proposition 20.2).
Theorem 20.10 Let (a
n
)
n=0
be an admissible sequence of real numbers, let k be a positive integer,
and write
k
= a
k
, a
k+1
, a
k+2
, . . .). Then
a
0
, a
1
, a
2
, . . .) = a
0
, a
1
, . . . , a
k1
,
k
).
Proof. Write = a
0
, a
1
, a
2
, . . .) = lim
n
a
0
, a
1
, . . . , a
n
). By letting n in the
relation
a
0
, a
1
, . . . , a
n
) = a
0
+
1
a
1
, a
2
, . . . , a
n
)
,
we obtain
= a
0
+ 1/
1
= a
0
,
1
).
(Note that
1
> a
1
> 0.) This proves the case k = 1. In particular, we have
k
= a
k
,
k+1
)
for each k.
The general case now follows by induction. Assume that the theorem holds for a
certain k 1; then
= a
0
, a
1
, . . . , a
k1
,
k
) = a
0
, a
1
, . . . , a
k1
, a
k
,
k+1
)) = a
0
, a
1
, . . . , a
k1
, a
k
,
k+1
),
where the last equality follows from Proposition 20.2. This completes the induction step.
.
Example 3. Let us use Theorem 20.10 to compute the periodic innite continued frac-
tion = 1, 2, 3, 1, 2, 3, . . .) = 1, 2, 3 ), where the bar over 1, 2, 3 indicates that this block
of integers is repeated indenitely. By periodicity, = 1, 2, 3,
3
) with
3
= , that is
21. Simple Continued Fractions 63
= 1, 2, 3, ). To compute the value of this nite continued fraction we use convergents,
which are computed in the following table:
n 2 1 0 1 2 3
a
n
1 2 3
p
n
0 1 1 3 10 10 + 3
q
n
1 0 1 2 7 7 + 2
It follows that
= 1, 2, 3, ) =
p
3
q
3
=
10 + 3
7 + 2
.
Solving for we obtain the quadratic equation 7
2
8 3 = 0 with the roots (4
37)/7.
Since > 0, we conclude that = (4 +
37)/7. .
Example 4. To compute the innite periodic continued fraction = 0, 1, 1, 2, 3 ), we
start by writing = 0, 1, ), where = 1, 2, 3 ), and = 0 + 1/(1 + 1/) = /( + 1) . The
value of was computed in the previous example. Inserting = (4 +
37)/12. .
Example 5. = 1, 1, 1, . . .) = 1 ) is the simplest possible innite continued fraction.
We will see later that this number plays a special role when it comes to approximation
of irrational numbers by rational numbers. Since = 1, ), satises the equation
= 1 +1/ , that is
2
= +1. This quadratic equation has the roots (1
5)/2. .
21. Simple Continued Fractions
Denition 21.1 A nite or innite continued fraction is called simple, if all its terms are
integers.
We recall that all terms of a continued fraction, except possibly the rst term a
0
,
are by default supposed to be positive. In particular, all terms of a simple continued
fraction, except possibly the rst one, are positive integers. This means that the terms of
an innite simple continued fraction form an admissible sequence (with = 1), so there
are no convergence problems: The innite simple continued fractions are automatically
convergent.
The value of a nite simple continued fraction is a rational number. Of course, this
follows easily from the recursive denition of nite continued fractions, but we can also
deduce it from the following theorem.
Theorem 21.2 Let (p
n
, q
n
) be the nth convergent of a nite or innite simple continued fraction.
The numbers p
n
and q
n
are then relatively prime integers for each n. Thus, the fractions
c
n
= p
n
/q
n
, n 0, are rational numbers in reduced form.
Proof. It follows at once from their dening recursive relations that p
n
and q
n
are
integers when the terms a
n
of the continued fraction are integers. Relative primeness is
a consequence of the identity p
n
q
n1
p
n1
q
n
= (1)
n1
. .
64 Lectures on Number Theory
Corollary 21.3 Every nite simple continued fraction a
0
, a
1
, . . . , a
n
) is a rational number.
Proof. Because a
0
, a
1
, . . . , a
n
) = p
n
/q
n
. .
Theorem 21.4 The value of an innite simple continued fraction is irrational.
Proof. Assume = a
0
, a
1
, a
2
, . . .) is rational and write = a/b with integers a and b.
By Theorem 20.9, 0 < [a/b p
n
/q
n
[ < 1/q
n
q
n+1
. Multiplying by bq
n
we obtain
0 < [aq
n
bp
n
[ <
b
q
n+1
.
By choosing n so large that b/q
n+1
< 1, which is possible since q
n+1
, we obtain the
inequality 0 < [aq
n
bp
n
[ < 1. Since aq
n
bp
n
is an integer, this is a contradiction. .
Theorem 21.5 Every real number can be expressed as a simple continued fraction. The fraction
is nite if and only if the real number is rational.
Proof. Let be a real number, and dene a
0
= [] . We use the following recursive
algorithm to dene a (possibly empty) nite or innite sequence a
1
, a
2
, . . . of positive
integers.
Step 0: If = a
0
, then = a
0
), and the algorithm stops. Otherwise, 0 < a
0
< 1, and
we dene
1
= 1/( a
0
) , noting that
1
> 1 and that = a
0
,
1
). We then proceed to
step 1.
Step k for k = 1, 2, . . . : Suppose the positive integers a
1
, a
2
, . . . , a
k1
and the real
number
k
> 1 have been dened and that = a
0
, a
1
, . . . , a
k1
,
k
). Dene a
k
= [
k
] .
If
k
= a
k
, then = a
0
, a
1
, . . . , a
k
) and the algorithm stops. Otherwise, dene
k+1
= 1/(
k
a
k
) , which is then a real number > 1, note that
k
= a
k
,
k+1
), and
= a
0
, a
1
, . . . , a
k
,
k+1
), and proceed to step k + 1.
If the algorithm stops, then is a nite simple continued fraction. Otherwise it
denes an innite sequence (a
n
)
n=0
. Dene = a
0
, a
1
, a
2
, . . .), and let c
n
= p
n
/q
n
denote
the nth convergent of the innite continued fraction . Since = a
0
, a
1
, . . . , a
n
,
n+1
),
the numbers c
n1
and c
n
are also convergents of . It therefore follows from Theorem
20.9 and Corollary 20.6 that and both lie between the numbers c
n1
and c
n
. Hence,
[ [ < [c
n
c
n1
[ =
1
q
n1
q
n
.
Since q
n
as n , we conclude that = = a
0
, a
1
, a
2
, . . .). .
Example 1. Using the algorithm of Theorem 21.5 we compute the continued fraction
expansion of
2 as follows:
a
0
= [
2] = 1,
1
= 1/( a
0
) = 1/(
2 1) =
2 + 1;
a
1
= [
1
] = 2,
2
= 1/(
1
a
1
) = 1/(
2 1) =
2 + 1 =
1
.
Since
2
=
1
, we conclude that a
2
= a
1
and
3
=
2
, etc. Hence, a
n
= a
1
= 2 for all n 1.
Therefore,
2 = 1, 2, 2, 2, . . .) = 1, 2 ). .
Since k = k 1+1/1, any integer k can be written in two ways as a simple continued
fraction: k = k) = k 1, 1). It follows that every rational number has at least two
different representations as nite simple continued fractions, because if a
0
, a
1
, . . . , a
n
)
is a representation with a
n
> 1, then a
0
, a
1
, . . . , a
n
1, 1) is a different representation
ending in 1. Conversely, if a
0
, a
1
, . . . , a
n
, 1) is a continued fraction ending in 1, then
a
0
, a
1
, . . . , a
n
, 1) = a
0
, a
1
, . . . , a
n
+ 1). However, these are the only different ways to
represent a rational number as a simple continued fraction. For the proof of this fact we
shall need the following lemma.
21. Simple Continued Fractions 65
Lemma 21.6 Let a
0
, b
0
be integers, let a
1
, a
2
, . . . , a
n
be positive integers, and let x, y be two
real numbers 1. Then
b
0
= a
0
, x) x = 1 and a
0
= b
0
1 (1)
a
0
,= b
0
a
0
, x) ,= b
0
, y) (2)
a
0
, a
1
, . . . , a
n
, x) = a
0
, a
1
, . . . , a
n
, y) x = y (3)
Proof. (1): Suppose b
0
= a
0
, x) and x > 1. Then a
0
< a
0
, x) = b
0
= a
0
+ 1/x < a
0
+ 1,
which is a contradiction, since b
0
is an integer. Hence, x = 1, and b
0
= a
0
+ 1.
(2): Suppose a
0
< b
0
; then a
0
, x) = a
0
+ 1/x a
0
+ 1 b
0
< b
0
, y).
(3): If a
0
, x) = a
0
, y), then obviously x = y, so the assertion holds when n = 0.
Now suppose that the implication is true with n replaced by n 1, and assume that
a
0
, a
1
, . . . , a
n
, x) = a
0
, a
1
, . . . , a
n
, y). Since a
0
, a
1
, . . . , a
n
, x) = a
0
, a
1
, . . . , a
n1
, a
n
, x)),
and the other continued fraction may be shortened analogously, it follows from our
induction hypothesis that rst a
n
, x) = a
n
, y), and then x = y. .
Theorem 21.7 Each integer k has exactly two representations as simple continued fractions,
viz. k) and k 1, 1). Each nonintegral rational number has exactly two representations as
simple continued fractions, and they are of the form a
0
, a
1
, . . . , a
n
) and a
0
, a
1
, . . . , a
n
1, 1),
where n 1 and a
n
> 1. Each irrational number has a unique representation as an innite
simple continued fraction.
Proof. We have alreadynotedthat eachrational number has twodifferent representations
as nite simple continued fractions, and that each irrational has one representation as
innite simple continued fraction, so it sufces to prove that these representations are the
only one.
First assume that k is an integer and k = a
0
, a
1
, . . . , a
n
) = a
0
, a
1
, . . . , a
n
)), with
n 1. It then follows from Lemma 21.6 (1) that a
0
= k 1 and x = a
1
, . . . , a
n
) = 1. If
n 2, then x > a
1
1, which is impossible. Hence n = 1 and a
1
= 1, that is k = k) and
k = k 1, 1) are the only representations of k as a simple continued fraction.
Let now a
0
, a
1
, . . . , a
n
) = b
0
, b
1
, . . . , b
m
) be two representations of a nonintegral
rational number, and assume that m n. Suppose there is an index k < n such
that a
k
,= b
k
, and let k denote the least such index. Writing the continued fraction
a
0
, a
1
, . . . , a
n
) as a
0
, . . . , a
k1
, a
k
, . . . , a
n
)) and similarly for b
0
, b
1
, . . . , b
m
), we then
conclude, using Lemma 21.6 (3), that a
k
, . . . , a
n
) = b
k
, . . . , b
m
), or equivalently that
a
k
, a
k+1
, . . . , a
n
)) = b
k
, b
k+1
, . . . , b
m
)). However, this is impossible because of (2). Thus,
a
k
= b
k
for all k < n and we conclude using (3) that a
n
= b
n
, . . . , b
m
). But a
n
is an
integer, and we already know that there are only two possible representations of integers
as simple continued fractions; either m = n and a
n
= b
n
, or m = n + 1, b
n
= a
n
1 and
b
n+1
= 1.
Let nally be anirrational number, andsuppose = a
0
, a
1
, a
2
, . . .) = b
0
, b
1
, b
2
, . . .)
are two different representations of . Then there is a rst index k such that a
k
,= b
k
,
and we conclude from (3) that a
k
, a
k+1
, a
k+2
, . . .) = b
k
, b
k+1
, b
k+2
, . . .). However, this
contradicts (2). .
66 Lectures on Number Theory
22. Rational Approximations to Irrational Numbers
Let be an irrational number. Given a positive integer b, we let a denote the integer
that is nearest to b , that is a is either equal to [b] or [b] + 1. Then [b a[ < 1/2, and
dividing by b we obtain
a
b
<
1
2b
.
Since b can be taken arbitrarily large, it follows that can be approximated arbitrarily
well by rational numbers a/b. This is sometimes expressed by saying that the rational
numbers are dense in the set of real numbers.
The inequality above gives a bound on [ a/b[ in terms of the denominator b. A
natural question now arises: How well can we approximate with rational numbers a/b
given that there is a prescribed upper bound on the size of b?
It follows from Theorem 20.9, that c
n
= p
n
/q
n
, the nth convergent of the expansion
of as an innite simple continued fraction, satises
p
n
q
n
<
1
q
2
n
.
Thus, the approximation error for convergents p
n
/q
n
is considerably smaller than what
can be expected for general rational numbers a/b.
We will prove that we can do a bit better; there are innitely many rational numbers
a/b such that [ a/b[ < 1/
5 b
2
(Theorem 22.7). This result is sharp in the sense that
the constant
5 can not be replaced by any bigger constant (Theorem 22.8). The rational
numbers a/b satisfying this inequality have to be convergents, because we will also prove
that if [ a/b[ < 1/2b
2
, then necessarily a/b is a convergent (Theorem 22.4).
Thus, continued fractions and convergents play a very important role in the theory
of rational approximation.
In the sequel, we will use both [ a/b[ and [b a[ as measures of how well a/b
approximates . For convenience, we rst reformulate inequality (2) of Theorem 20.9
slightly:
Let p
n
/q
n
be the nth convergent of the irrational number = a
0
, a
1
, a
2
, . . .). Then
(1)
a
n+2
q
n+2
< [q
n
p
n
[ <
1
q
n+1
.
Inequality (1) is of course obtained from Theorem 20.9 (2) by multiplying through by q
n
.
Theorem 22.1 Let (p
n
, q
n
) denote the nth convergent of the simple continued fraction expan-
sion of the irrational number . Then
p
n
q
n
<
p
n1
q
n1
and (2)
[q
n
p
n
[ < [q
n1
p
n1
[ (3)
for every n 1.
Proof. We start by proving the second stronger inequality. Suppose = a
0
, a
1
, a
2
, . . .).
Using inequality (1) twice, we obtain [q
n1
p
n1
[ > a
n+1
/q
n+1
1/q
n+1
> [q
n
p
n
[ ,
which proves (3).
22. Rational Approximations to Irrational Numbers 67
Inequality (2) is an easy consequence of (3) and the fact that q
n
q
n1
for n 1:
p
n
q
n
=
1
q
n
q
n
p
n
<
1
q
n
q
n1
p
n1
1
q
n1
q
n1
p
n1
p
n1
q
n1
. .
Convergents have some interesting extremal properties which we are now going to
study. The proofs of these will rely on the following simple observation:
If r
1
= a
1
/b
1
and r
2
= a
2
/b
2
are two rational numbers with positive denominators
and r
1
,= r
2
, then
[r
1
r
2
[
1
b
1
b
2
.
This is obvious because r
1
r
2
= (a
1
b
2
a
2
b
1
)/b
1
b
2
and the numerator a
1
b
2
a
2
b
1
is a
nonzero integer and its absolute value is thus at least one.
On the other hand, if c
n
= p
n
/q
n
and c
n+1
= p
n+1
/q
n+1
are two consecutive conver-
gents of a number , then [c
n+1
c
n
[ = 1/q
n+1
q
n
, according to Theorem 20.5.
In the formulation of the following theorems we will use the notation
min
1tB
f (s, t)
to denote the minimum of f (s, t) when s ranges over all integers and t ranges over all
integers in the interval [1, B] .
Theorem 22.2 Let be an irrational number, let B be a positive integer, and consider the value
of [t s[ for all integers t in the interval [1, B] and all integers s. Suppose that the minimum
is achieved for s = a and t = b, that is
[b a[ = min
1tB
[t s[.
Then, a and b are relatively prime, and (a, b) is a convergent of the simple continued fraction
expansion of .
Proof. Let d be a common divisor of a and b, and suppose d > 1. Write a
/
= a/d and
b
/
= b/d; then 1 b
/
B and [b
/
a
/
[ = [b a[/d < [b a[ , which contradicts the
choice of a and b. Thus d = 1, and a and b are relatively prime.
Let now c
n
= p
n
/q
n
denote the nth convergent of and write r = a/b. We shall
prove that r = c
n
for some n; since both fractions a/b and p
n
/q
n
are in reduced form, it
will then follow that a = p
n
and b = q
n
.
First suppose r < c
0
. Since c
0
< , [ r[ > [c
0
r[ 1/bq
0
. Multiplying through
by b, we obtain [b a[ = b[ r[ > 1/q
0
1/q
1
> [q
0
p
0
[ . Since q
0
= 1 b, this
contradict the minimality assumption on a and b. Thus, r c
0
.
Next suppose that r > c
1
. Since c
1
> , [ r[ > [c
1
r[ 1/bq
1
. Multiplying
through by b, we again obtain [b a[ > 1/q
1
> [q
0
p
0
[ , which is impossible.
Hence, c
0
r c
1
. Since (c
2k
) is an increasing sequence and (c
2k+1
) is a decreasing
sequence, both with limit , the rational number r lies between c
n1
and c
n+1
for some
integer n. If r is either c
n1
or c
n+1
, we are nished. Otherwise, note that these two
convergents are on the same side of , and that c
n
lies on the other side. It follows that
[r c
n1
[ < [c
n
c
n1
[ . Since the left hand side of this inequality is 1/bq
n1
and the
right hand side equals 1/q
n
q
n1
, we conclude that 1/bq
n1
< 1/q
n
q
n1
, that is q
n
< b.
We also have [ r[ > [c
n+1
r[ 1/bq
n+1
, and multiplying both sides by b we
obtain [b a[ > 1/q
n+1
> [q
n
p
n
[ . Since q
n
< b, this contradicts the assumption that
[t s[ is minimized when t = b and s = a. The proof is now complete. .
Theorem 22.2 tells us that if a/b is the best approximation to in the sense that
[b a[ cannot be made smaller by replacing a/b with any other rational number s/t
with 1 t b, then a/b is necessarily a convergent of . By combining this result with
Theorem 22.1 we obtain the following more precise information:
68 Lectures on Number Theory
Theorem 22.3 Let be irrational with convergents (p
n
, q
n
) . Then
[q
n
p
n
[ = min
1t<q
n+1
[t s[ (4)
p
n
q
n
= min
1tq
n
s
t
. (5)
Proof. By Theorem 22.2, there is a convergent (p
m
, q
m
) of such that [q
m
p
m
[ =
min
1t<q
n+1
[t s[ . Since q
k
q
n+1
for all k n + 1 and since [q
k
p
k
[ decreases when
k increases, it follows that m = n. This proves (4).
To prove (5), assume that 1 t q
n
and let s be arbitrary. Using (4), we obtain
p
n
q
n
=
1
q
n
q
n
p
n
1
q
n
t s
=
t
q
n
s
t
q
n
q
n
s
t
s
t
.
Hence,
p
n
q
n
= min
1tq
n
s
t
. .
Remark. If q
n+1
> 2q
n
, the following stronger result holds:
p
n
q
n
= min
1tq
n+1
/2
s
t
.
Proof. Assume [ s/t[ < [ p
n
/q
n
[ . Using the triangle inequality and Theorem 20.9,
we then obtain
1/tq
n
[s/t p
n
/q
n
[ [s/t [ + [ p
n
/q
n
[ < 2[ p
n
/q
n
[ < 2/q
n
q
n+1
.
It follows that t > q
n+1
/2. .
Example 1. Using the algorithm in Theorem 21.5 and the decimal expansion of , one
easily nds that = 3, 7, 15, 1, 292, 1, 1, 1, 2, . . .). To compute the rst ve convergents
we use the following table:
n 2 1 0 1 2 3 4
a
n
3 7 15 1 292
p
n
0 1 3 22 333 355 103 993
q
n
1 0 1 7 106 113 33 102
The convergent p
1
/q
1
is the familiar approximation 22/7, rst given by Archimedes; it
is the best approximation among all rationals with denominator not exceeding 7. The
approximation 355/113 is remarkably accurate; by Theorem 20.9,
355
113
<
1
113 33102
< 3 10
7
.
Using the remark following Theorem 22.3 we see that in order to obtain a better approx-
imation we need a rational a/b with b > 33102/2 = 16 551. In fact, 355/113 is the best
approximation to among all rationals with denominators not exceeding 16 603. .
In Theorems 22.2 and 22.3, the convergents appear as solutions to certain minimum
problems. Therefore, it should not come as a surprise that best rational approximations
to irrationals have to be convergents. The following theorem gives a precise meaning to
this statement.
22. Rational Approximations to Irrational Numbers 69
Theorem 22.4 Let be irrational, and let a and b be integers with b positive. If
a
b
<
1
2b
2
,
then a/b equals one of the convergents of the simple continued fraction expansion of .
Remark. The fraction a/b is not necessarily reduced.
Proof. If the fraction a/b is not reduced, then the reduced fraction a
/
/b
/
obviously
satises the same inequality. Therefore, we may as well assume from the beginning that
a/b is reduced, i.e. that a and b are relatively prime. Under this assumption we will
prove that the inequality
(6) [b a[ [t s[
holds for all integers s and t with 1 t b. It then follows from Theorem 22.2 that a/b
is a convergent.
Assume (6) is false for some integers s and t with 1 t b. Then
(7) [t s[ < [b a[,
and it follows that
s
t
<
1
t
[b a[ =
b
t
a
b
<
b
t
1
2b
2
=
1
2bt
.
Using the triangle inequality, we thus obtain
a
b
s
t
a
b
s
t
<
1
2b
2
+
1
2bt
=
1
2bt
_
1 +
t
b
_
1
bt
.
Multiply through by bt ; this results in [at bs[ < 1, and since at bs is an integer we
conclude that at bs = 0, that is a/b = s/t . Since the fraction a/b is reduced, t b. But
t b, and hence t = b and s = a. This is a contradiction because of (7). .
It remains to prove that there are fractions a/b that satisfy the inequality in Theo-
rem 22.4. By Theorem 20.9, the convergents p/q of an irrational number satisfy the
inequality [ p/q[ < 1/q
2
. The following theorem shows that of any two successive
convergents at least one will satisfy the stronger inequality in Theorem 22.4.
Theorem 22.5 Of any two successive convergents of the simple continued fraction expansion of
the irrational number , at least one convergent p/q will satisfy the inequality
p
q
<
1
2q
2
.
Proof. Assume the theorem is false. Then there are two successive convergents c
n
=
p
n
/q
n
and c
n+1
= p
n+1
/q
n+1
such that [ c
n
[ > 1/2q
2
n
and [ c
n+1
[ > 1/2q
2
n+1
. (The
inequalities are strict since is irrational.) Since the two convergents are on opposite
sides of , it follows that
1
q
n
q
n+1
= [c
n+1
c
n
[ = [c
n+1
[ + [ c
n
[ >
1
2q
2
n+1
+
1
2q
2
n
.
Multiplying through by q
n+1
q
n
, we obtain
(8) 1 >
1
2
_
q
n
q
n+1
+
q
n+1
q
n
_
.
To conclude the proof we note that x + 1/x = 2 + (
x 1/
x)
2
2 for all positive
numbers x. Therefore, the right hand side of inequality (8) is certainly 1, and this is a
contradiction. This proves the theorem. .
The result of Theorem 22.5 can be improved, as Theorem 22.7 will show. We shall
need the following simple lemma.
70 Lectures on Number Theory
Lemma 22.6 Let x be a positive real number, and suppose x +
1
x
<
5. Then
x <
5 + 1
2
and
1
x
>
5 1
2
.
Proof. x + 1/x <
5 x
2
5x + 1 < 0
_
x (
5 + 1)/2
__
x (
5 1)/2
_
< 0
(
5 1)/2. .
Theorem 22.7 If is irrational, then there exist innitely many rational numbers a/b such
that
a
b
<
1
5 b
2
.
Indeed, out of three successive convergents of , at least one will satisfy the inequality.
Proof. Suppose on the contrary that none of the convergents c
k
= p
k
/q
k
, k = n 1, n,
n + 1, satises the inequality. Then [ c
k
[ 1/
5 q
2
k
for k = n 1, n, and n + 1. The
successive convergents c
n1
and c
n
lie on opposite sides of ; hence
1
q
n
q
n1
= [c
n
c
n1
[ = [c
n
[ + [ c
n1
[
1
5
_
1
q
2
n
+
1
q
2
n1
_
.
Multiplying through by q
n
q
n1
, we obtain q
n
/q
n1
+q
n1
/q
n
<
5 + 1)/2 and q
n1
/q
n
> (
5 1)/2.
Analogously, we must have q
n+1
/q
n
< (
5 + 1
2
>
q
n+1
q
n
=
a
n
q
n
+ q
n1
q
n
q
n
+ q
n1
q
n
= 1 +
q
n1
q
n
> 1 +
5 1
2
=
5 + 1
2
.
This contradiction proves the theorem. .
Theorem 22.8 The constant
5 in the preceding theorem is best possible, because if C >
5
and = 1, 1, 1, . . .) = (
a
b
<
1
Cb
2
holds for only nitely many integers a and b.
Proof. By Theorem 22.4, any fraction a/b satisfying the inequality must be a convergent,
so it sufces to prove that only nitely many convergents p
n
/q
n
of satisfy the inequality.
First note that
1
= (
5 1)/2, +
1
=
5, and
1
= 1.
Since p
n
= p
n1
+ p
n2
and q
n
= q
n1
+ q
n2
, where p
2
= 0 and p
1
= 1, whereas
q
1
= 0 and q
0
= 1, we conclude that q
n
= p
n1
for all n 1, and that (q
n
)
0
is the
ordinary Fibonacci sequence. Moreover, using induction it is easy to show that
p
n
= A
n
+ B()
n
,
where the constants A and B are determined by the condition
_
p
1
= A
1
B = 1
p
0
= A + B = 1
23. Periodic Continued Fractions 71
Solving for A and B, we nd
A =
1 +
+
1
, B =
1
1
+
1
, and AB( +
1
) =
1
+
1
=
1
5
.
Hence,
[q
n
p
n
[ = [p
n1
p
n
[ = [A
n
B()
2n
A
n
B()
n
[
= [B(
2
+ 1)
n
[ = B(
2
+ 1)
n
.
It follows that
q
2
n
p
n
q
n
= [q
n
p
n
[ q
n
= B(
2
+ 1)
n
(A
n1
+ B()
1n
)
= AB( +
1
) + (1)
n
B
2
(
2
+ 1)
12n
=
1
5
+ B
2
(1)
n
(
2
+ 1)
12n
.
Let n tend to ; then
12n
0 since > 1, and we conclude that
lim
n
q
2
n
p
n
q
n
=
1
5
.
Since 1/
p
n
q
n
>
1
C
,
for all but nitely many n. Thus, the inequality in Theorem 22.8 holds for only nitely
many convergents p
n
/q
n
. .
23. Periodic Continued Fractions
In section 20 we computed some periodic simple continued fractions and found that they
were roots of quadratic equations with integer coefcients. The goal of this section is to
prove that this property characterizes the periodic simple continued fractions, that is an
irrational number has a periodic continued fraction expansion if and only if it saties a
quadratic equation with integer coefcients.
Denition 23.1 An innite sequence (a
n
)
n=0
is called periodic if there is a nonzero integer
p and an integer m such that
a
n
= a
n+p
for all n m.
The integer p is called a period of the sequence.
If p and q are two different periods for the sequence, then p q is a period, too,
because a
n+pq
= a
n+pq+q
= a
n+p
= a
n
for all sufciently large integers n. Thus, the set
of all periods together with the number 0 is an ideal in Z. It follows that there exists a
smallest positive integer r such that all periods of the sequence are multiples of r . This
uniquely determined number is called the period and the period length of the sequence.
A periodic sequence with period p > 0 can be written in the form
b
0
, b
1
, . . . , b
m1
, c
0
, c
1
, . . . , c
p1
, c
0
, c
1
, . . . , c
p1
, . . . = b
0
, b
1
, . . . , b
m1
, c
0
, c
1
, . . . , c
p1
where the bar over the c
0
, c
1
, . . . , c
p1
indicates that this block of numbers is repeated
indenitely.
A periodic sequence (a
n
)
n=0
with period p > 0 is called purely periodic if a
n
= a
n+p
holds for all n 0. Purely periodic sequences are of the form a
0
, a
1
, . . . , a
p1
.
72 Lectures on Number Theory
Denition 23.2 An innite continued fraction a
0
, a
1
, a
2
, . . .) is called (purely) periodic if
the corresponding sequence (a
n
)
n=0
of terms is (purely) periodic. Of course, the period of
a periodic continued fraction is by denition the period of the sequence of terms.
Let = a
0
, a
1
, a
2
, . . .) be a continued fraction and write
k
= a
k
, a
k+1
, a
k+2
, . . .). If
is a periodic continued fraction with period p, then obviously there is an integer m
such that
n
=
n+p
holds for all n m. Conversely, if
n+p
=
n
holds for some number
n, then is a periodic continued fraction with p as a period (and the period r is some
divisor of p).
Denition 23.3 An irrational number is called a quadratic irrational (or algebraic of
degree two) if it is the root of a quadratic polynomial with integer coefcients, that is if
a
2
+ b + c = 0 for suitable integer coefcients a, b, and c with a ,= 0.
Proposition 23.4 A real number is a quadratic irrational if and only if it has the form
= r + s
d, where d is a positive integer that is not a perfect square, r and s are rational
numbers and s ,= 0.
Proof. Any real irrational solution of a quadratic equation ax
2
+bx +c = 0 obviously has
this form. Conversely, a real number of this form is irrational and satises the quadratic
equation (x r)
2
= s
2
d, which can be turned into a quadratic equation with integer
coefcients upon multiplication by the squares of the denominators of r and s. .
Denition 23.5 Let d be a positive integer that is not a perfect square. We dene Q[
d ]
to be the set of all real numbers of the form = r + s
d ] , then
their sum + , difference , product , and quotient / also belong to Q[
d ] , the
quotient of course provided ,= 0.
Proposition 23.7 Suppose , Q[
d ] . Then ( + )
/
=
/
+
/
, ( )
/
=
/
/
,
()
/
=
/
/
, and (/)
/
=
/
/
/
.
Proposition 23.8 If the number has a periodic simple continued fraction expansion, then
is a quadratic irrational.
Proof. Being an innite continued fraction, is irrational. We will prove that Q[
d ]
for a suitable positive integer d that is not a perfect square.
Assume = b
0
, b
1
, . . . , b
m1
, c
0
, c
1
, . . . , c
r1
), and let = c
0
, c
1
, . . . , c
r1
). Then
= c
0
, c
1
, . . . , c
r1
, ).
In terms of the convergents (p
k
, q
k
) of the nite continued fraction c
0
, c
1
, . . . , c
r1
),
= c
0
, c
1
, . . . , c
r1
, ) =
p
r1
+ p
r2
q
r1
+ q
r2
,
and solving for we see that satises a quadratic equation with integer coefcients.
Hence, is a quadratic irrational, that is Q[
d ] . .
The converse of Proposition 23.8 is true, that is every quadratic irrational has a
periodic simple continued fraction expansion. The proof of this needs some preparatory
work.
Lemma 23.9 If is a quadratic irrational, then can be written in the form
=
u +
d
v
,
where d is an integer that is not a perfect square, u and v are integers, and v[(d u
2
) .
Proof. By Proposition 23.4, = r + s
D
c
=
a[c[ +
b
2
c
2
D
c[c[
=
u +
d
v
,
and the integers u = a[c[ , v = c[c[ and d = b
2
c
2
D satisfy the requirement v[(d u
2
) . .
Suppose
0
is a quadratic irrational. Using Lemma 23.9, we rst write
0
= (u
0
+
d)/v
0
,
where d is an integer that is not a perfect square, and u
0
and v
0
are integers, and
v
0
[(d u
2
0
) .
We then recall the recursive algorithm in Theorem 21.5 for obtaining the continued
fraction expansion of a
0
, a
1
, a
2
, . . .) of
0
. The terms a
n
are given by
a
0
= [
0
],
n+1
=
1
n
a
n
, and a
n+1
= [
n+1
] for n = 0, 1, 2, . . . ,
and we have
0
= a
0
, a
1
, . . . , a
n
,
n+1
) for all n.
Now suppose inductively that
n
= (u
n
+
d)/v
n
, with integers u
n
and v
n
that
satisfy v
n
[(d u
2
n
) . Then
n+1
=
1
n
a
n
=
1
d (a
n
v
n
u
n
)
v
n
=
d + (a
n
v
n
u
n
)
d (a
n
v
n
u
n
)
2
v
n
=
u
n+1
+
d
v
n+1
,
where u
n+1
= a
n
v
n
u
n
and v
n+1
= (d u
2
n+1
)/v
n
.
Clearly, u
n+1
is an integer and u
n+1
u
n
(mod v
n
) . Hence by the induction
assumption, d u
2
n+1
d u
2
n
0 (mod v
n
) , that is v
n
divides d u
2
n+1
. Therefore, v
n+1
is also an integer, and v
n+1
[(d u
2
n+1
) , because v
n
v
n+1
= d u
2
n+1
.
By induction, we have thus proved the validity of the following algorithm:
Theorem 23.10 Suppose
0
= (u
0
+
d)/v
0
, where d is a positive integer that is not a perfect
square, u
0
and v
0
are integers and v
0
[(d u
2
0
) . Dene recursively the sequences (u
n
)
0
, (v
n
)
0
,
(a
n
)
0
and (
n
)
0
as follows:
n
=
u
n
+
d
v
n
, a
n
= [
n
]
u
n+1
= a
n
v
n
u
n
, v
n+1
=
d u
2
n+1
v
n
, for n 0.
74 Lectures on Number Theory
Then u
n
and v
n
are integers, v
n
[(d u
2
n
) , and
0
= a
0
, a
1
, . . . , a
n
,
n+1
) for all n, and
0
= a
0
, a
1
, a
2
, . . .). .
Example 1. Let us compute the continuedfraction expansion of the number (1
5)/3
using the algorithm of Theorem 23.10. Since 3,[ (5 1
2
) , we rst have to put the number
in the form of Lemma 23.9. Multiplying numerator and denominator by 3, we obtain
0
=
3 +
45
9
, that is u
0
= 3, v
0
= 9, and d = 45.
Now v
0
[(d u
2
0
) , so we can start the algorithm. The result of the computations is shown
in the following table:
n 0 1 2 3 4 5 6 7 8 9
u
n
3 12 1 5 5 3 6 6 3 5
v
n
9 11 4 5 4 9 1 9 4 5
a
n
1 1 1 2 2 1 12 1 2 2
Since (u
9
, v
9
) = (u
3
, v
3
) , we conclude that
9
=
3
. Thus,
1
5
3
= 1, 1, 1, 2, 2, 1, 12, 1, 2 ). .
Lemma 23.11 Let be a quadratic irrational and dene
n
as in Theorem23.10. If the conjugate
/
k
< 0 for some index k, then 1 <
/
n
< 0 for all n > k.
Proof. By induction, it sufces to prove that
/
n
< 0 implies 1 <
/
n+1
< 0. So
assume
/
n
< 0. Using the relation
n+1
= 1/(
n
a
n
) and taking conjugates, we obtain
/
n+1
= 1/(
/
n
a
n
) . Since a
n
1, the denominator
/
n
a
n
is strictly less than 1, so it
follows that 1 <
/
n+1
< 0. .
Lemma 23.12 Let be a quadratic irrational, and dene
n
and a
n
= [
n
] as above. If
1 <
/
n
< 0, then a
n
= [1/
/
n+1
] .
Proof. We have
/
n+1
= 1/(
/
n
a
n
) , whence 1/
/
n+1
= a
n
/
n
. Since 0 <
/
n
< 1, it
follows that [1/
/
n+1
] = [a
n
/
n
] = a
n
. .
Lemma 23.13 If is a quadratic irrational, then there exists an index k such that
/
k
< 0.
Proof. Let (p
k
, q
k
) denote the kth convergent of . Since = a
0
, a
1
, . . . , a
n1
,
n
), we
have
=
p
n1
n
+ p
n2
q
n1
n
+ q
n2
,
and solving for
n
we obtain
n
=
q
n2
p
n2
p
n1
q
n1
=
q
n2
q
n1
_
p
n2
/q
n2
p
n1
/q
n1
_
.
By taking conjugates, we get
/
n
=
q
n2
q
n1
_
/
p
n2
/q
n2
/
p
n1
/q
n1
_
.
We now use the fact that the convergents p
n
/q
n
converge to as n tends to innity and
that
/
,= . It follows that the expression within parenthesis converges to (
/
)/(
/
) ,
that is to 1, as n tends to innity. Consequently, the expression within parenthesis is
certainly > 0 when n is big enough, that is
/
n
has the same sign as q
n2
/q
n1
, which
is negative since q
n
is positive for all n 0. .
23. Periodic Continued Fractions 75
Theorem 23.14 A real number has a periodic simple continued fraction expansion if and only
if it is a quadratic irrational.
Proof. We have already proved that a periodic continued fraction is a quadratic irrational
(Proposition 23.8). To prove the converse, let =
0
be a quadratic irrational and write
n
=
u
n
+
d
v
n
as in Theorem 23.10. By Lemma 23.13, there is an index k such that
/
k
< 0, and by
Lemma 23.11, 1 <
/
n
< 0 for all n > k. Since
n
> 1 for all n 1, we conclude that
1 <
n
/
n
=
2
d
v
n
and 0 <
n
+
/
n
=
2u
n
v
n
for all n > k. Hence 0 < v
n
< 2
d and u
n
> 0 if n > k. Moreover, using the relation
du
2
n+1
= v
n
v
n+1
> 0, we obtain u
2
n+1
< d, that is u
n+1
<
d and 0 < v
n
< 2
d satises 1 <
/
< 0.
Theorem 23.16 The simple continued fraction expansion of the real quadratic irrational number
is purely periodic if and only if is reduced. Also, if = a
0
, a
1
, . . . , a
r1
), then 1/
/
=
a
r1
, a
r2
, . . . , a
1
, a
0
).
Proof. Suppose =
0
is a reduced quadratic irrational, and use Theorem 23.10 to write
n
= (u
n
+
d)/v
n
. Since 1 <
/
0
< 0 by assumption, we have 1 <
/
n
< 0 and
a
n
= [1/
/
n+1
] for all n 0 by Lemma 23.11 and Lemma 23.12.
We know from Theorem 23.14 that has a simple periodic continued fraction
expansion. Let r be the period length; then there is a smallest number m 0 such that
n+r
=
n
for all n m.
We must prove that m = 0.
Assume m 1. Starting from
m
=
m+r
we rst obtain
/
m
=
/
m+r
by taking
conjugates, and hence a
m1
= [1/
/
m
] = [1/
/
m+r
] = a
m+r1
. Since
1
m1
a
m1
=
m
=
m+r
=
1
m+r1
a
m+r1
,
we then conclude that
m1+r
=
m1
, which violates the denition of m. Thus m = 0,
and is purely periodic.
Conversely, suppose that is purely periodic, say = a
0
, a
1
, . . . , a
r1
), where
a
0
, a
1
, . . . , a
r1
are positive integers. Then > a
0
1. Let (p
n
, q
n
) denote the nth
convergent of ; then
= a
0
, a
1
, . . . , a
r1
, ) =
p
r1
+ p
r2
q
r1
+ q
r2
.
76 Lectures on Number Theory
Thus satises the quadratic equation
f (x) = q
r1
x
2
+ (q
r2
p
r1
)x p
r2
= 0.
This equation has two roots, and its conjugate
/
. Since > 1, we need only prove
that f (x) has a root between 1 and 0 to establish that 1 <
/
< 0. We will do this by
showing that f (0) < 0 and f (1) > 0.
Note that p
n
is positive for all n 1 (since a
0
> 0). Hence, f (0) = p
r2
< 0.
Next we see that
f (1) = q
r1
q
r2
+ p
r1
p
r2
= (a
r1
1)(q
r2
+ p
r2
) + q
r3
+ p
r3
q
r3
+ p
r3
> 0.
Thus, is reduced.
Finally, to prove that 1/
/
has the statedcontinuedfraction expansion, we suppose
that = a
0
, a
1
, . . . , a
r1
). Taking conjugates inthe relation
n
= 1/(
n1
a
n1
) we obtain
/
n
= 1/(
/
n1
a
n1
) , which can be rewritten as
1/
/
n
= a
n1
+
1
1/
/
n1
for all n 1.
Since 1/
/
n
> 1 for all n, the above equation can be expressed as a continued fraction
expansion
1/
/
n
= a
n1
, 1/
/
n1
).
Starting with 1/
/
r
, iterating and using the fact that =
0
=
r
, we thus obtain
1/
/
= 1/
/
0
= 1/
/
r
= a
r1
, 1/
/
r1
) = a
r1
, a
r2
, 1/
/
r2
) = . . .
= a
r1
, a
r2
, . . . , a
1
, a
0
, 1/
/
0
).
Hence, 1/
/
= a
r1
, a
r2
, . . . , a
1
, a
0
). .
Example 2. The quadratic irrational (2 +
10)/3 is reduced. Its continued fraction
expansion is easily computed with the aid of Theorem 23.10. Since 3[(10 2
2
) , we can
start with u
0
= 2, v
0
= 3 and d = 10. The computations are summarized in the following
table:
n 0 1 2 3
u
n
2 1 2 2
v
n
3 3 2 3
a
n
1 1 2 1
Since (u
3
, v
3
) = (u
1
, v
1
) , we conclude that the period is 3 and that (2 +
10)/3 = 1, 1, 2 ).
.
24. Continued Fraction Expansion of
d 77
24. Continued Fraction Expansion of
_
d
Theorem 24.1 Let d be a positive integer that is not a perfect square. The simple continued
fraction expansion of
d is of the form
a
0
, a
1
, a
2
, . . . , a
r1
, 2a
0
),
where a
0
= [
d ] and a
j
= a
rj
for j = 1, 2, . . . , r 1.
Proof. Let a
0
= [
d ] and = a
0
+
d
satises 1 <
/
< 0. By Theorem 23.16, has a purely periodic continued fraction
expansion starting with [] = 2a
0
, say
(1) = a
0
+
d = 2a
0
, a
1
, a
2
, . . . , a
r1
) = 2a
0
, a
1
, a
2
, . . . , a
r1
, 2a
0
).
If we subtract a
0
from each side, we get
d = a
0
, a
1
, a
2
, . . . , a
r1
, 2a
0
).
To prove that the sequence a
1
, a
2
, . . . , a
r1
is symmetric, we note that
= a
0
+
d = 2a
0
+
d a
0
= 2a
0
/
= 2a
0
+
1
1/
/
= 2a
0
, 1/
/
).
By Theorem 23.16,
1/
/
= a
r1
, a
r2
, . . . , a
1
, 2a
0
),
and hence
= 2a
0
, a
r1
, a
r2
, . . . , a
1
, 2a
0
).
A comparison with (1) gives a
j
= a
rj
for 1 j r 1. .
Example 1. Tocompute the continuedfractionexpansionof
19 we use Theorem23.10
with u
0
= 0, v
0
= 1 and d = 19. We get the following table:
n 0 1 2 3 4 5 6 7
u
n
0 4 2 3 3 2 4 4
v
n
1 3 5 2 5 3 1 3
a
n
4 2 1 3 1 2 8 2
It follows that the expansion has period length 6, and that
19 = 4, 2, 1, 3, 1, 2, 8 ). .
78 Lectures on Number Theory
Theorem 24.2 Let (p
n
, q
n
) denote the nth convergent of
d, let the integers u
n
and v
n
be
dened for the number =
d as in Theorem 23.10, that is
n
= (u
n
+
d)/v
n
with v
n
[(d u
2
n
) ,
and let r be the period length of the continued fraction expansion of
d. Then
(i) p
2
n
dq
2
n
= (1)
n1
v
n+1
for every n 1;
(ii) v
n
> 0 for every n 0;
(iii) v
n
= 1 if and only if r[n.
Proof. Write
d = a
0
, a
1
, a
2
, . . .) = a
0
, a
1
, . . . , a
n
,
n+1
).
(i) We have
d =
n+1
p
n
+ p
n1
n+1
q
n
+ q
n1
=
(u
n+1
+
d)p
n
+ v
n+1
p
n1
(u
n+1
+
d)q
n
+ v
n+1
q
n1
,
which can also be written as
u
n+1
p
n
+ v
n+1
p
n1
dq
n
(u
n+1
q
n
+ v
n+1
q
n1
p
n
)
d = 0.
Since
d is irrational, it follows that
_
u
n+1
p
n
+ v
n+1
p
n1
dq
n
= 0
u
n+1
q
n
+ v
n+1
q
n1
p
n
= 0
Eliminating u
n+1
from this system, we obtain
p
2
n
dq
2
n
= v
n+1
(p
n
q
n1
q
n
p
n1
) = (1)
n1
v
n+1
,
where we used Theorem 20.5 to get the last equality.
(ii) The convergents p
n
/q
n
are >
d if n is even. Therefore,
p
2
n
dq
2
n
has the same sign as (1)
n1
, so it follows from (i) that v
n+1
is positive for every
n 1.
(iii) Since =
d has period length r ,
kr+1
=
1
for all positive integers k. It
follows that
kr
a
kr
=
1
kr+1
=
1
1
=
0
a
0
= a
0
+
d,
that is
kr
= a
kr
a
0
+
d. Hence, v
kr
= 1 (and u
kr
= a
kr
a
0
).
Conversely, assume v
n
= 1; then
n
= u
n
+
d, so a
n
= [
n
] = u
n
+ [
d ] = u
n
+ a
0
and
n
a
n
=
d a
0
=
0
a
0
, that is
n+1
= 1/(
n
a
n
) = 1/(
0
a
0
) =
1
. It follows
from this that n is a multiple of the period length r . .
The reader may have noted in the few examples of continued fraction expansion of
d)/v
n
< 0, because
/
0
=
d < 0, that is
u
n
<
d and hence
n
< 2
d/v
n
d. Finally, a
n
= [
n
] [
d ] = a
0
. .
25. Pells Equation 79
25. Pells Equation
The equation x
2
dy
2
= N, with given nonzero integers d and N, is called Pells equation.
If d is negative, Pells equation can have only a nite number of solutions in integers,
since x
2
N and y
2
N/d.
If d = a
2
is a perfect square, then we have (x + ay)(x ay) = N, and again there
is only a nite number of integral solutions to Pells equation, since there is only a nite
number of ways to factor N.
We will therefore suppose that d is a positive integer that is not a perfect square. We
will show that in that case there is either no solution at all or innitely many solutions in
integers. When N = 1, we will give a complete description of the set of solutions.
If (u, v) is an integral solution of Pells equation x
2
dy
2
= N, then (u, v) is also
a solution for every combination of the signs. Thus, in order to nd all integral solutions
it sufces to nd all positive solutions, that is all solutions (u, v) with u > 0 and v > 0. If
N is a perfect square, there will of course be two additional trivial solutions (
N, 0) ,
and if N/d happens to be an integer that is a perfect square, (0,
_
N/d) are two
trivial solutions of Pells equation.
If (x
1
, y
1
) and (x
2
, y
2
) are two positive solutions of x
2
dy
2
= N, then x
2
1
x
2
2
=
d(y
2
1
y
2
2
) , and hence x
1
< x
2
if and only if y
1
< y
2
. Thus, if we order the positive
solutions according to increasing x-value or according to increasing y-value we will get
the same result.
If there is a positive solution in integers of Pells equation, then there is obviously a
positive solution (x
1
, y
1
) with a least positive x-value. This solution has also the least y-
value among all positive solutions. Since it plays a special role we introduce the following
denition.
Denition 25.1 Suppose Pells equation x
2
dy
2
= N has positive integral solutions.
The fundamental solution, or least positive solution, is the positive solution (x
1
, y
1
) such that
x
1
< u and y
1
< v for every other positive solution (u, v) .
The following theorem gives a connection between Pells equation and continued
fractions.
Theorem 25.2 Let d be a positive integer that is not a perfect square, and suppose [N[ <
d.
If (u, v) is a positive solution in integers of x
2
dy
2
= N, then there is a convergent (p
n
, q
n
) of
the simple continued fraction expansion of
d such that u/v = p
n
/q
n
.
Remark. The numbers u and v need not be relatively prime, but if c is their greatest
common divisor, then obviously c
2
[N. Hence, if N is square-free, and in particular if
N = 1, then u and v are necessarily relatively prime. That means that there is an index
n such that u = p
n
and v = q
n
.
Proof. We will consider a more general situation. Let d and N be any positive real
numbers, not necessarily integers, such that
d is irrational and N <
d, and assume
that u and v are positive integers satisfying u
2
dv
2
= N.
Since
_
u
v
d
__
u
v
+
d
_
=
u
2
dv
2
v
2
=
N
v
2
80 Lectures on Number Theory
and the second factor of the left hand side is positive, we rst conclude that u/v
d > 0,
and consequently u/v +
d > 2
d. Hence
0 <
u
v
d =
N
v
2
(u/v +
d)
<
d
2v
2
d
=
1
2v
2
.
By Theorem 22.4, u/v is a convergent of
d.
Let now d and N be as in the statement of the theorem. The case N > 0 is a special
case of what we have just proved.
If N < 0, we rewrite the equation as y
2
(1/d)x
2
= N/d. Since 0 < N/d <
d/d =
_
1/d, we can apply the general case above, and we conclude that v/u is a
convergent of 1/
d. Suppose
d has the continued fraction expansion a
0
, a
1
, a
2
, . . .).
Then 1/
d = 0,
d) = 0, a
0
, a
1
, a
2
, . . .). Hence, there is an n such that
v
u
= 0, a
0
, a
1
, . . . , a
n
) =
1
a
0
, a
1
, . . . , a
n
)
,
that is u/v = a
0
, a
1
, . . . , a
n
) is a convergent of
d. .
By combining the theorem above with Theorem 24.2, we get a complete description
of the solution set of Pells equation in the case N = 1.
Theorem 25.3 Suppose d is a positive integer that is not a perfect square, let r be the period
length of the simple continued fraction expansion of
d, and let (p
n
, q
n
)
n=0
be the corresponding
sequence of convergents.
(i) Suppose r is even. Then
(a) x
2
dy
2
= 1 has no solutions in integers;
(b) all positive integral solutions of x
2
dy
2
= 1 are given by x = p
kr1
, y = q
kr1
for
k = 1, 2, 3, . . . , with x = p
r1
and y = q
r1
as the fundamental solution.
(ii) Suppose r is odd Then
(a) all positive integral solutions of x
2
dy
2
= 1 are given by x = p
kr1
, y = q
kr1
for
k = 1, 3, 5, . . . , with x = p
r1
and y = q
r1
as the fundamental solution;
(b) all positive integral solutions of x
2
dy
2
= 1 are given by x = p
kr1
, y = q
kr1
for
k = 2, 4, 6, . . . , with x = p
2r1
and y = q
2r1
as the fundamental solution.
Proof. By the previous theorem, the positive integral solutions of x
2
dy
2
= 1 are to
be found among the convergents (p
n
, q
n
) . Furthermore, a
0
= [
d ] 1, so the sequence
(p
n
)
n=0
is strictly increasing. Therefore, the rst solution that appears in the sequence
(p
n
, q
n
) will be the fundamental solution.
According to Theorem 24.2, p
2
n
dq
2
n
= (1)
n1
v
n+1
, where v
n
1 for all n and
v
n
= 1 if and only if r[n. Thus, [p
2
n
dq
2
n
[ 2 except when n = kr 1 for some
nonnegative integer k, in which case
p
2
n
dq
2
n
= (1)
kr
.
If r is even, then (1)
kr
= 1 for all k, and hence (p
kr1
, q
kr1
) is a solution of x
2
dy
2
= 1
for all k, whereas the equation x
2
dy
2
= 1 has no positive solution, and of course
no solution at all in integers. This proves part (i). If the period length r is odd, then
(1)
kr
= 1 for k even, and = 1 for k odd, and this proves part (ii). .
Example 1. We shall use Theorem25.3 to ndthe fundamental solution of the equation
x
2
19y
2
= 1.
25. Pells Equation 81
The continuedfractionexpansion
19 = 4, 2, 1, 3, 1, 2, 8 ) was foundinthe previous
section. Since the period length is 6, the fundamental solution is (x, y) = (p
5
, q
5
) . The
convergents are computed in the following table:
n 2 1 0 1 2 3 4 5
a
n
4 2 1 3 1 2
p
n
0 1 4 9 13 48 61 170
q
n
1 0 1 2 3 11 14 39
Thus, the fundamental solution is (x, y) = (170, 39) . .
Theorem 25.3 gives a method for computing the successive solutions of Pells equa-
tion but it is tedious to compute convergents (p
n
, q
n
) . Having found the fundamental
solution, we can nd the remaining positive solutions by a simpler method, which will
be described in Theorem 25.6 below.
Lemma 25.4 Let (x
1
, y
1
) be an arbitrary integral solution of x
2
dy
2
= M and (x
2
, y
2
) an
arbitrary integral solution of x
2
dy
2
= N, and dene the integers u and v by
(x
1
+ y
1
d)(x
2
+ y
2
d) = u + v
d,
that is u = x
1
x
2
+ y
1
y
2
d, v = x
1
y
2
+ x
2
y
1
. Then (u, v) is a solution of x
2
dy
2
= MN. If
(x
1
, y
1
) and (x
2
, y
2
) are positive solutions, then (u, v) is also positive.
Proof. By taking conjugates we have (x
1
y
1
d)(x
2
y
2
d) = u v
d, and hence
u
2
dv
2
= (u + v
d)(u v
d) = (x
1
+ y
1
d)(x
2
+ y
2
d)(x
1
y
1
d)(x
2
y
2
d)
= (x
2
1
dy
2
1
)(x
2
2
dy
2
2
) = MN.
The solution (u, v) will obviously be positive if the original ones are positive. .
Corollary 25.5 If the equation x
2
dy
2
= N has an integral solution, then it has innitely
many integral solutions.
Proof. Suppose the equation x
2
dy
2
= N has at least one integral solution. This solution
multiplied by any solution of x
2
dy
2
= 1 yields another solution of x
2
dy
2
= N. Since
the equation x
2
dy
2
= 1 has innitely many integral solutions, there will also be innitely
many integral solutions of x
2
dy
2
= N. .
Theorem 25.6 Let (x
1
, y
1
) be the fundamental solution of x
2
dy
2
= 1. Then all positive
integral solutions are given by (x
n
, y
n
) , n 1, where the integers x
n
and y
n
are recursively
dened by
x
n+1
= x
1
x
n
+ y
1
y
n
d, y
n+1
= x
1
y
n
+ y
1
x
n
.
Proof. Note that x
n+1
+y
n+1
d = (x
1
+y
1
d)(x
n
+y
n
d) = (x
1
+y
1
d)
n+1
. Hence byLemma
25.4 with M = N = 1, if (x
n
, y
n
) is a positive solution of Pells equation x
2
dy
2
= 1, then
(x
n+1
, y
n+1
) will also be a positive solution. It therefore follows by induction, that (x
n
, y
n
)
is a solution for all n.
It remains to show that every positive integral solution is obtained in this way.
Suppose there is a positive solution (u, v) that is not of the form (x
n
, y
n
) . Since x
n
forms
an increasing sequence, there must be some integer m such that x
m
u < x
m+1
. It
follows that y
m
v < y
m+1
, because we get the same result if positive solutions are
ordered according to their x-value or y-value. We cannot have equality, because u = x
m
82 Lectures on Number Theory
would imply v = y
m
. Now (x
m
, y
m
) is of course also a (non-positive) solution of
x
2
dy
2
= 1, so by Lemma 25.4 we will obtain another solution (s, t) by dening
s + t
d = (u + v
d)(x
m
y
m
d) =
u + v
d
x
m
+ y
m
d
.
Since x
m
+ y
m
d < u + v
d < x
m+1
+ y
m+1
d, we have
1 < s + t
d <
x
m+1
+ y
m+1
d
x
m
+ y
m
d
= x
1
+ y
1
d.
But s t
d = 1/(s + t
d) +
1
2
(s t
d) >
1
2
+ 0 > 0
t
d =
1
2
(s + t
d)
1
2
(s t
d) >
1
2
1
2
= 0,
so (s, t) is a positive solution. Therefore, s > x
1
and t > y
1
, but this contradicts
s +t
d < x
1
+y
1
d) = (x
1
+ y
1
d)
n
. Then all positive integral solutions of x
2
dy
2
= 1
are given by (x
n
, y
n
) with n odd, and all positive integral solutions of x
2
dy
2
= 1 are given by
(x
n
, y
n
) with n even. In particular, (x
2
, y
2
) is the fundamental solution of x
2
dy
2
= 1.