What’s new - 2007:
Open questions, expository articles, and lecture
series from a mathematical blog
Terence Tao
1 The author is supported by NSF grant CCF-0649473 and a grant from the MacArthur foundation.
To my advisor, Eli Stein, for showing me the importance of good exposition;
To my friends, for supporting this experiment;
And to the readers of my blog, for their feedback and contributions.
Contents
1 Open problems
1.1 Best bounds for capsets
1.2 Noncommutative Freiman theorem
1.3 Mahler’s conjecture for convex bodies
1.4 Why global regularity for Navier-Stokes is hard
1.5 Scarring for the Bunimovich stadium
1.6 Triangle and diamond densities
1.7 What is a quantum honeycomb?
1.8 Boundedness of the trilinear Hilbert transform
1.9 Effective Skolem-Mahler-Lech theorem
1.10 The parity problem in sieve theory
1.11 Deterministic RIP matrices
1.12 The nonlinear Carleson conjecture
2 Expository articles
2.1 Quantum mechanics and Tomb Raider
2.2 Compressed sensing and single-pixel cameras
2.3 Finite convergence principle
2.4 Lebesgue differentiation theorem
2.5 Ultrafilters and nonstandard analysis
2.6 Dyadic models
2.7 Math doesn’t suck
2.8 Nonfirstorderisability
2.9 Amplification and arbitrage
2.10 The crossing number inequality
2.11 Ratner’s theorems
2.12 Lorentz group and conic sections
2.13 Jordan normal form
2.14 John’s blowup theorem
2.15 Hilbert’s nullstellensatz
2.16 Hahn-Banach, Menger, Helly
2.17 Einstein’s derivation of E = mc^2
3 Lectures
3.1 Simons Lecture Series: Structure and randomness
3.2 Ostrowski lecture
3.3 Milliman lectures
Preface
Almost nine years ago, in 1999, I began a “What’s new?” page on my UCLA home
page in order to keep track of various new additions to that page (e.g. papers, slides,
lecture notes, expository “short stories”, etc.). At first, these additions were simply
listed without any commentary, but after a while I realised that this page was a good
place to put a brief description and commentary on each of the mathematical articles
that I was uploading to the page. (In short, I had begun blogging on my research,
though I did not know this term at the time.)
Every now and then, I received an email from someone who had just read the most
recent entry on my “What’s new?” page and wanted to make some mathematical or
bibliographic comment; this type of valuable feedback was one of the main reasons
why I kept maintaining the page. But I did not think to try to encourage more of this
feedback until late in 2006, when I posed a question on my “What’s new?” page and
got a complete solution to that problem within a matter of days. It was then that I began
thinking about modernising my web page to a blog format (which a few other math-
ematicians had already begun doing). On 22 February 2007, I started a blog with the
unimaginative name of “What’s new” at terrytao.wordpress.com; I chose wordpress
for a number of reasons, but perhaps the most decisive one was its recent decision to
support LaTeX in its blog posts.
It soon became clear that the potential of this blog went beyond my original aim
of merely continuing to announce my own papers and research. For instance, by far
the most widely read and commented article in my blog in the first month was a non-
technical article, “Quantum Mechanics and Tomb Raider” (Section 2.1), which had
absolutely nothing to do with my own mathematical work. Encouraged by this, I began
to experiment with other types of mathematical content on the blog: discussions of my
favourite open problems, informal discussions of mathematical phenomena, principles,
or tricks, guest posts by some of my colleagues, and presentations of various lectures
and talks, both by myself and by others; and various bits and pieces of advice on
pursuing a mathematical career and on mathematical writing. This year, I also have
begun placing lecture notes for my graduate classes on my blog.
After a year of mathematical blogging, I can say that the experience has been pos-
itive, both for the readers of the blog and for myself. Firstly, the very act of writing
a blog article helped me organise and clarify my thoughts on a given mathematical
topic, to practice my mathematical writing and exposition skills, and also to inspect the
references and other details more carefully. From insightful comments by experts in
other fields of mathematics, I have learned of unexpected connections between different
fields; in one or two cases, these even led to new research projects and collaborations.
From the feedback from readers I obtained a substantial amount of free proofreading,
while also discovering what parts of my exposition were unclear or otherwise poorly
worded, helping me improve my writing style in the future. It is a truism that one of
the best ways to learn a subject is to teach it; and it seems that blogging about a subject
comes in as a close second.
In the last year (2007) alone, at least a dozen new active blogs in research math-
ematics have sprung up. I believe this is an exciting new development in mathemat-
ical exposition; research blogs seem to fill an important niche that neither traditional
print media (textbooks, research articles, surveys, etc.) nor informal communications
(lectures, seminars, conversations at a blackboard, etc.) adequately cover at present.
Indeed, the research blog medium is in some ways the “best of both worlds”; informal,
dynamic, and interactive, as with talks and lectures, but also coming with a permanent
record, a well defined author, and links to further references, as with the print media.
There are bits and pieces of folklore in mathematics, such as the difference between
hard and soft analysis (Section 2.3) or the use of dyadic models for non-dyadic situa-
tions (Section 2.6) which are passed down from advisor to student, or from collaborator
to collaborator, but are too fuzzy and non-rigorous to be discussed in the formal liter-
ature; but they can be communicated effectively and efficiently via the semi-formal
medium of research blogging.
On the other hand, blog articles still lack the permanence that print articles have,
which becomes an issue when one wants to use them in citations. For this and other
reasons, I have decided to convert some of my blog articles from 2007 into the book
that you are currently reading. Not all of the 93 articles that I wrote in 2007 appear
here; some were mostly administrative or otherwise non-mathematical in nature, some
were primarily announcements of research papers or other articles which will appear
elsewhere, some were contributed guest articles, and some were writeups of lectures
by other mathematicians, which it seemed inappropriate to reproduce in a book such
as this. Nevertheless, this still left me with 32 articles, which I have converted into
print form (replacing hyperlinks with more traditional citations and footnotes, etc.). As
a result, this book is not a perfect replica of the blog, but the mathematical content is
largely the same. I have paraphrased some of the feedback from comments to the blog
in the endnotes to each article, though for various reasons, ranging from lack of space
to copyright concerns, not all comments are reproduced here.
The articles here are rather diverse in subject matter, to put it mildly, but I have
nevertheless organised them into three categories. The first category concerns vari-
ous open problems in mathematics that I am fond of; some are of course more difficult
than others (see e.g. the article on Navier-Stokes regularity, Section 1.4), and others are
rather vague and open-ended, but I find each of them interesting, not only in their own
right, but because progress on them is likely to yield insights and techniques that will
be useful elsewhere. The second category are the expository articles, which vary from
discussions of various well-known results in maths and science (e.g. the nullstellensatz
in Section 2.15, or Einstein’s equation $E = mc^2$ in Section 2.17), to more philosophical
explorations of mathematical ideas, tricks, tools, or principles (e.g. ultrafilters in Sec-
tion 2.5, or amplification in Section 2.9), to non-technical expositions of various topics
in maths and science, from quantum mechanics (Section 2.1) to single-pixel cameras
(Section 2.2). Finally, I am including writeups of three lecture series I gave in 2007:
my Simons lecture series at MIT on structure and randomness, my Ostrowski lecture
at the University of Leiden on compressed sensing, and my Milliman lectures at the
University of Washington on additive combinatorics.
In closing, I believe that this experiment with mathematical blogging has been gen-
erally successful, and I plan to continue it in the future, perhaps generating several
more books such as this one as a result. I am grateful to all the readers of my blog for
supporting this experiment, for supplying invaluable feedback and corrections, and for
encouraging projects such as this book conversion.
A remark on notation
One advantage of the blog format is that one can often define a technical term simply
by linking to an external web page that contains the definition (e.g. a Wikipedia page).
This is unfortunately not so easy to reproduce in print form, and so many standard
mathematical technical terms will be used without definition in this book; this is not
going to be a self-contained textbook in mathematics, but is instead a loosely connected
collection of articles at various levels of technical difficulty. Of course, in the age of
the internet, it is not terribly difficult to look up these definitions whenever necessary.
I will however mention a few notational conventions that I will use throughout. The
cardinality of a finite set E will be denoted |E|. We will use the asymptotic notation
$X = O(Y)$, $X \lesssim Y$, or $Y \gtrsim X$ to denote the estimate $|X| \leq CY$ for some absolute constant
$C > 0$. In some cases we will need this constant C to depend on a parameter (e.g. d), in
which case we shall indicate this dependence by subscripts, e.g. $X = O_d(Y)$ or $X \lesssim_d Y$.
We also sometimes use $X \sim Y$ as a synonym for $X \lesssim Y \lesssim X$.
In many situations there will be a large parameter n that goes off to infinity. When
that occurs, we also use the notation $o_{n\to\infty}(X)$ or simply $o(X)$ to denote any quantity
bounded in magnitude by $c(n)X$, where $c(n)$ is a function depending only on n that
goes to zero as n goes to infinity. If we need $c(n)$ to depend on another parameter, e.g.
d, we indicate this by further subscripts, e.g. $o_{n\to\infty;d}(X)$.
We will occasionally use the averaging notation $\mathbb{E}_{x\in X} f(x) := \frac{1}{|X|}\sum_{x\in X} f(x)$ to denote the average value of a function $f : X \to \mathbb{C}$ on a non-empty finite set X.
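For instance, one has $\sum_{i=1}^n i = \frac{n(n+1)}{2} = O(n^2)$ (and indeed $\sum_{i=1}^n i \sim n^2$), $n\log n = o_{n\to\infty}(n^2)$, and $\mathbb{E}_{x\in\{1,\ldots,n\}} x = \frac{n+1}{2}$.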
0.0.1 Acknowledgments
Many people have contributed corrections or comments to individual blog articles, and
are acknowledged in the end notes to those articles here. Thanks also to harrison, Gil
Kalai, Greg Kuperberg, Phu, Jozsef Solymosi, Tom, and Y J for general corrections,
reference updates, and formatting suggestions.
Chapter 1
Open problems
1.1 Best bounds for capsets
Perhaps my favourite open question is the problem on the maximal size of a cap set -
a subset of $\mathbb{F}_3^n$ ($\mathbb{F}_3$ being the finite field of three elements) which contains no lines, or
equivalently no non-trivial arithmetic progressions of length three. As an upper bound,
one can easily modify the proof of Roth’s theorem[Ro1953] to show that cap sets must
have size $O(3^n/n)$; see [Me1995]. This of course is better than the trivial bound of $3^n$
once n is large. In the converse direction, the trivial example $\{0,1\}^n$ shows that cap sets
can be as large as $2^n$; the current world record is $(2.2174\ldots)^n$, held by Edel[Ed2004].
The gap between these two bounds is rather enormous; I would be very interested in
either an improvement of the upper bound to $o(3^n/n)$, or an improvement of the lower
bound to $(3 - o(1))^n$. (I believe both improvements are true, though a good friend of
mine disagrees about the improvement to the lower bound.)
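To get a quantitative feel for this gap, here is a minimal Python sketch (not from the original article; it simply evaluates the bounds just quoted on a $\log_3$ scale):

```python
# Sketch: compare log_3 of the Edel lower bound (2.2174...)^n with log_3 of
# the Roth-Meshulam upper bound 3^n / n for the maximal size of a cap set.
from math import log

for n in (10, 100, 1000):
    lower = n * log(2.2174, 3)  # log_3 of (2.2174)^n, roughly 0.725 n
    upper = n - log(n, 3)       # log_3 of 3^n / n
    print(n, round(lower, 1), round(upper, 1))
```

The gap between the two exponents grows linearly in n.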
One reason why I find this question important is that it serves as an excellent model
for the analogous question of finding large sets without progressions of length three
in the interval $\{1, \ldots, N\}$. Here, the best upper bound of $O(N\sqrt{\log\log N/\log N})$ is due to
Bourgain[Bo1999] (with a recent improvement to $O(N(\log\log N)^2/\log^{2/3} N)$ [Bo2008]), while the
best lower bound of $Ne^{-C\sqrt{\log N}}$ is an ancient result of Behrend[Be1946]. Using the
finite field heuristic (see Section 2.6) that $\mathbb{F}_3^n$ “behaves like” $\{1, \ldots, 3^n\}$, we see that
the Bourgain bound should be improvable to $O(\frac{N}{\log N})$, whereas the Edel bound should
be improvable to something like $3^n e^{-C\sqrt{n}}$. However, neither argument extends easily
to the other setting. Note that a conjecture of Erdős asserts that any set of positive
integers whose sum of reciprocals diverges contains arbitrarily long arithmetic progressions;
even for progressions of length three, this conjecture is open, and is essentially
equivalent (up to log log factors) to the problem of improving the Bourgain bound to
$o(\frac{N}{\log N})$.
The Roth bound of $O(3^n/n)$ appears to be the natural limit of the purely Fourier-
analytic approach of Roth, and so any breakthrough would be extremely interesting,
as it almost certainly would need a radically new idea. The lower bound might be
improvable by some sort of algebraic geometry construction, though it is not clear at
all how to achieve this.
One can interpret this problem in terms of the wonderful game “Set”, in which case
the problem is to find the largest number of cards one can put on the table for which
nobody has a valid move (for the standard 81-card deck, corresponding to n = 4, the
answer is known to be 20). As far as I know, the best bounds on the cap set problem in
small dimensions are the ones cited in [Ed2004].
There is a variant formulation of the problem which may be a little bit more tractable.
Given any $0 < \delta \leq 1$, the fewest number of lines $N(\delta, n)$ in a subset of $\mathbb{F}_3^n$ of density at
least δ is easily shown to be $(c(\delta) + o(1))3^{2n}$ for some $0 < c(\delta) \leq 1$. (The analogue
in $\mathbb{Z}/N\mathbb{Z}$ is trickier; see [Cr2008], [GrSi2008].) The reformulated question is
then to get as strong a bound on $c(\delta)$ as one can. For instance, the counterexample
$\{0,1\}^m \times \mathbb{F}_3^{n-m}$ shows that $c(\delta) \lesssim \delta^{\log_{3/2} 9/2}$, while the Roth-Meshulam argument gives
$c(\delta) \gtrsim e^{-C/\delta}$.
1.1.1 Notes
This article was originally posted on Feb 23, 2007 at
terrytao.wordpress.com/2007/02/23
Thanks to Jordan Ellenberg for suggesting the density formulation of the problem.
Olaf Sisask points out that the result $N(\delta, n) = (c(\delta) + o(1))3^{2n}$ has an elementary
proof: by considering sets in $\mathbb{F}_3^n$ of the form $A \times \mathbb{F}_3$ for some $A \subset \mathbb{F}_3^{n-1}$, one can obtain
the inequality $N(\delta, n) \leq 3^2 N(\delta, n-1)$, so that the ratio $N(\delta, n)/3^{2n}$ is non-increasing
in n and hence converges, from which the claim easily follows.
Thanks to Ben Green for corrections.
1.2 Noncommutative Freiman theorem
for some ε = ε(δ ) > 0. This particular theorem was first proven in [BoGlKo2006]
with an earlier partial result in [BoKaTa2004]; more recent and elementary proofs with
civilised bounds can be found in [TaVu2006], [GlKo2008], [Ga2008], [KaSh2008].
See Section 3.3.3 for further discussion.
In contrast, inverse theorems in this subject start with a hypothesis that, say, the sum
set A + A of an unknown set A is small, and try to deduce structural information about
A. A typical goal is to completely classify all sets A for which A + A has comparable
size with A. In the case of finite subsets of integers, this is Freiman’s theorem[Fr1973],
which roughly speaking asserts that |A + A| = O(|A|) if and only if A is a dense
subset of a generalised arithmetic progression P of rank O(1), where we say that A
is a dense subset of B if A ⊂ B and |B| = O(|A|). (The “if and only if” has to be in-
terpreted properly; in either the “if” or the “only if” direction, the implicit constants
in the conclusion depend on the implicit constants in the hypothesis, but these depen-
dencies are not inverses of each other.) In the case of finite subsets A of an arbitrary
abelian group, we have the Freiman-Green-Ruzsa theorem [GrRu2007], which asserts
that |A + A| = O(|A|) if and only if A is a dense subset of a sum P + H of a finite
subgroup H and a generalised arithmetic progression P of rank O(1).
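As a quick illustration of the dichotomy that Freiman's theorem formalises, the following Python sketch (an illustration of mine, not taken from the text) contrasts the doubling constant |A+A|/|A| of an arithmetic progression with that of a random set of the same size:

```python
# Doubling constant |A+A|/|A| for a structured set versus a random set.
import random

def doubling(A):
    return len({a + b for a in A for b in A}) / len(A)

n = 200
prog = set(range(0, 5 * n, 5))                 # arithmetic progression: |A+A| ~ 2|A|
rand = set(random.sample(range(10 ** 7), n))   # generic set: |A+A| ~ |A|^2 / 2
print(doubling(prog), doubling(rand))
```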
One can view these theorems as a “robust” or “rigid” analogue of the classification
of finite abelian groups. It is well known that finite abelian groups are direct sums of
cyclic groups; the above results basically assert that finite sets that are “nearly groups”
in that their sum set is not much larger than the set itself, are (dense subsets of) the
direct sums of cyclic groups and a handful of arithmetic progressions.
The open question is to formulate an analogous conjectural classification in the
non-abelian setting, thus to conjecture a reasonable classification of finite sets A in a
multiplicative group G for which |A · A| = O(|A|). Actually for technical reasons it may
be better to use |A · A · A| = O(|A|); I refer to this condition by saying that A has small
tripling. (Note for instance that if H is a subgroup and x is not in the normaliser of
H, then H ∪ {x} has small doubling but not small tripling. On the other hand, small
tripling is known to imply small quadrupling, etc., see e.g. [TaVu2006].) Note that I
am not asking for a theorem here - just even stating the right conjecture would be major
progress! An if and only if statement might be too ambitious initially: a first step would
be to obtain a slightly looser equivalence, creating for each group G and fixed ε > 0
a class $\mathcal{P}$ of sets (depending on some implied constants) for which the following two
statements are true:
(i) If A is a finite subset of G with small tripling, then A is a dense subset of $O(|A|^\varepsilon)$
left- or right-translates of a set $P \in \mathcal{P}$.
(ii) If $P \in \mathcal{P}$, then there exists a dense subset A of P with small tripling (possibly
with a loss of $O(|A|^\varepsilon)$ in the tripling constant).
• For abelian groups G, from the Freiman-Green-Ruzsa theorem, we know that the
standard candidate suffices.
• For $G = SL_2(\mathbb{C})$, we know from work of Elekes and Király[ElKi2001] and
Chang[Ch2008] that the standard candidate suffices.
• For $G = SL_3(\mathbb{Z})$, a result of Chang[Ch2008] shows that if A has small tripling,
then it is contained in a nilpotent subgroup of G.
• For a free non-abelian group, we know (since the free group embeds into $SL_2(\mathbb{C})$)
that the standard candidate suffices; a much stronger estimate in this direction
was recently obtained by Razborov [Ra2008].
These examples do not seem to conclusively suggest what the full classification
should be. Based on analogy with the classification of finite simple groups, one might
expect the full classification to be complicated, and enormously difficult to prove; on
the other hand, the fact that we are in a setting where we are allowed to lose factors of
O(1) may mean that the problem is in fact significantly less difficult than that classi-
fication. (For instance, all the sporadic simple groups have size O(1) and so even the
monster group is “negligible”.) Nevertheless, it seems possible to make progress on
explicit groups, in particular refining the partial results already obtained for the spe-
cific groups mentioned above. An even closer analogy may be with Gromov’s theorem
[Gr1981] on groups of polynomial growth; in particular, the recent effective proof of
this theorem by Kleiner [Kl2008] may prove to be relevant for this problem.
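To make the small-doubling-versus-small-tripling distinction above concrete, here is a toy computation (a sketch of my own; the choice of $S_5$ and the 5-cycle is purely illustrative) with A = H ∪ {x}:

```python
# A minimal numerical illustration: H ∪ {x} has modest doubling but noticeably
# worse tripling when x does not normalise the subgroup H.  We work in the
# symmetric group S_5, with permutations stored as tuples.
from itertools import permutations

def compose(p, q):
    # composition (p ∘ q)(i) = p[q[i]]
    return tuple(p[i] for i in q)

def products(s, t):
    return {compose(a, b) for a in s for b in t}

# H: a copy of S_3 inside S_5, fixing the last two points.
H = {p + (3, 4) for p in permutations(range(3))}
x = (1, 2, 3, 4, 0)  # a 5-cycle; it does not normalise H
A = H | {x}

AA = products(A, A)
AAA = products(AA, A)
print(len(AA) / len(A), len(AAA) / len(A))  # tripling constant exceeds doubling
```

With |H| large (instead of 6) the gap becomes dramatic: the doubling constant stays O(1) while the tripling constant grows with |H|, since A · A · A contains the large double coset HxH.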
1.2.1 Notes
This article was originally posted on Mar 2, 2007 at
terrytao.wordpress.com/2007/03/02
Thanks to Akshay Venkatesh and Elon Lindenstrauss for pointing out the analogy
with Gromov’s theorem, and to Harald Helfgott for informative comments.
1.3 Mahler’s conjecture for convex bodies
converging towards a ball. One quickly verifies that each application of Steiner
symmetrisation does not decrease the Mahler volume, and the result easily follows. As
a corollary one can show that the ellipsoids are the only bodies which actually attain
the maximal Mahler volume. (Several other proofs of this result, now known as the
Blaschke-Santaló inequality, exist in the literature. It plays an important role in affine
geometry, being a model example of an affine isoperimetric inequality.)
Somewhat amusingly, one can use Plancherel’s theorem to quickly obtain a crude
version of this inequality, losing a factor of $O(d)^d$; indeed, as pointed out to me by
Bo’az Klartag, one can view the Mahler conjecture as a kind of “exact uncertainty
principle”. Unfortunately it seems that Fourier-analytic techniques are unable to solve
these sorts of “sharp constant” problems (for which one cannot afford to lose unspeci-
fied absolute constants).
The lower inequality remains open. In my opinion, the main reason why this con-
jecture is so difficult is that unlike the upper bound, in which there is essentially only
one extremiser up to affine transformations (namely the ball), there are many distinct
extremisers for the lower bound - not only the cube and the octahedron (and affine
images thereof), but also products of cubes and octahedra, polar bodies of products of
cubes and octahedra, products of polar bodies of... well, you get the idea. (As pointed
out to me by Gil Kalai, these polytopes are known as Hanner polytopes.) It is really dif-
ficult to conceive of any sort of flow or optimisation procedure which would converge
to exactly these bodies and no others; a radically different type of argument might be
needed.
The conjecture was solved for two dimensions by Mahler [Ma1939] but remains
open even in three dimensions. If one is willing to lose some factors in the inequality,
though, then some partial results are known. Firstly, from John’s theorem[Jo1948] one
trivially gets a bound of the form $v(B) \geq d^{-d/2} v(B^d)$. A significantly deeper argument
of Bourgain and Milman[BoMi1987] gives a bound of the form $v(B) \geq C^{-d} v(B^d)$
for some absolute constant C; this bound is now known as the reverse Santaló inequality.
A slightly weaker “low-tech” bound of $v(B) \geq (\log_2 d)^{-d} v(B^d)$ was given by
Kuperberg[Ku1992], using only elementary methods. The best result currently known
is again by Kuperberg[Ku2008], who showed that
$$v(B) \geq \frac{2^d}{\binom{2d}{d}}\, v(B^d) \geq \Big(\frac{\pi}{4}\Big)^{d-1} v(Q^d).$$
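As a sanity check on these inequalities in low dimensions, one can evaluate the Mahler volumes of the ball and the cube from the standard closed-form volume formulas; the following sketch (an addition of mine, using the facts that the Euclidean unit ball is self-polar and that the polar of the cube is the cross-polytope) does this:

```python
# Compare the Mahler volume v(K) = vol(K) * vol(K°) of the unit ball and cube.
from math import gamma, pi, factorial

def mahler_ball(d):
    vol = pi ** (d / 2) / gamma(d / 2 + 1)  # Euclidean unit ball; self-polar
    return vol * vol

def mahler_cube(d):
    # cube [-1,1]^d has volume 2^d; its polar is the cross-polytope, volume 2^d/d!
    return (2 ** d) * (2 ** d / factorial(d))

for d in range(2, 7):
    print(d, mahler_ball(d), mahler_cube(d))
```

In each dimension the ball's Mahler volume exceeds the cube's, consistent with the Blaschke-Santaló inequality on the one side and Mahler's conjecture on the other.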
1.3.1 Notes
This article was originally posted on Mar 8, 2007 at
terrytao.wordpress.com/2007/03/08
The article generated a fair amount of discussion, some of which I summarise be-
low.
Bo’az Klartag points out that the analogous conjecture for non-symmetric bodies
- namely, that the minimal Mahler volume is attained when the body is a simplex -
may be easier, due to the fact that there is now only one extremiser up to affine trans-
formations. Greg Kuperberg noted that by combining his inequalities from [Ku2008]
with the Rogers-Shephard inequality[RoSh1957], this conjecture is known up to a
factor of $(\frac{\pi}{4e})^d$.
Klartag also pointed out that this asymmetric analogue has an equivalent formulation
in terms of the Legendre transformation
$$L(f)(\xi) := \sup_{x \in \mathbb{R}^d} \big( x \cdot \xi - f(x) \big).$$

1.4 Why global regularity for Navier-Stokes is hard
(I) Exact and explicit solutions (or at least an exact, explicit transformation to a
significantly simpler PDE or ODE);
(II) Perturbative hypotheses (e.g. small data, data close to a special solution, or more
generally a hypothesis which involves an ε somewhere); or
(III) One or more globally controlled quantities (such as the total energy) which are
both coercive and either critical or subcritical.
Note that the presence of (I), (II), or (III) is currently a necessary condition for a
global regularity result, but far from sufficient; otherwise, papers on the global regularity
problem for various nonlinear PDE would be substantially shorter. In particular,
there have been many good, deep, and highly non-trivial papers recently on global
regularity for Navier-Stokes, but they all assume either (I), (II) or (III) via additional
hypotheses on the data or solution. For instance, in recent years we have seen good
results on global regularity assuming (II) (e.g. [KoTa2001]), as well as good results
on global regularity assuming (III) (e.g. [EsSeSv2003]); a complete bibliography of
recent results is unfortunately too lengthy to be given here.
The Navier-Stokes global regularity problem for arbitrary large smooth data lacks
all of these three ingredients. Reinstating (II) is impossible without changing the state-
ment of the problem, or adding some additional hypotheses; also, in perturbative situa-
tions the Navier-Stokes equation evolves almost linearly, while in the non-perturbative
setting it behaves very nonlinearly, so there is basically no chance of a reduction of the
non-perturbative case to the perturbative one unless one comes up with a highly non-
linear transform to achieve this (e.g. a naive scaling argument cannot possibly work).
Thus, one is left with only three possible strategies if one wants to solve the full prob-
lem:
• Solve the Navier-Stokes equation exactly and explicitly (or at least transform this
equation exactly and explicitly to a simpler equation);
• Discover a new globally controlled quantity which is both coercive and either
critical or subcritical; or
• Discover a new method which yields global smooth solutions even in the absence
of the ingredients (I), (II), and (III) above.
For the rest of this article I refer to these strategies as “Strategy 1”, “Strategy 2”,
and “Strategy 3” respectively.
Much effort has been expended here, especially on Strategy 3, but the supercritical-
ity of the equation presents a truly significant obstacle which already defeats all known
methods. Strategy 1 is probably hopeless; the last century of experience has shown
that (with the very notable exception of completely integrable systems, of which the
Navier-Stokes equations is not an example) most nonlinear PDE, even those arising
from physics, do not enjoy explicit formulae for solutions from arbitrary data (al-
though it may well be the case that there are interesting exact solutions from special
(e.g. symmetric) data). Strategy 2 may have a little more hope; after all, the Poincaré
conjecture became solvable (though still very far from trivial) after Perelman[Pe2002]
introduced a new globally controlled quantity for Ricci flow (the Perelman entropy)
which turned out to be both coercive and critical. (See also my exposition of this topic
at [Ta2006c].) But we are still not very good at discovering new globally controlled
quantities; to quote Klainerman[Kl2000], “the discovery of any new bound, stronger
than that provided by the energy, for general solutions of any of our basic physical
equations would have the significance of a major event” (emphasis mine).
I will return to Strategy 2 later, but let us now discuss Strategy 3. The first basic
observation is that the Navier-Stokes equation, like many of our other basic model
equations, obeys a scale invariance: specifically, given any scaling parameter λ > 0,
and any smooth velocity field $u : [0, T) \times \mathbb{R}^3 \to \mathbb{R}^3$ solving the Navier-Stokes equations
for some time T, one can form a new velocity field $u^{(\lambda)} : [0, \lambda^2 T) \times \mathbb{R}^3 \to \mathbb{R}^3$ solving the
Navier-Stokes equations up to time $\lambda^2 T$, by the formula
$$u^{(\lambda)}(t, x) := \frac{1}{\lambda}\, u\Big(\frac{t}{\lambda^2}, \frac{x}{\lambda}\Big).$$
(Strictly speaking, this scaling invariance is only present as stated in the absence of an
external force, and with the non-periodic domain $\mathbb{R}^3$ rather than the periodic domain
$\mathbb{T}^3$. One can adapt the arguments here to these other settings with some minor effort,
the key point being that an approximate scale invariance can play the role of a perfect
scale invariance in the considerations below. The pressure field $p(t, x)$ gets rescaled
too, to $p^{(\lambda)}(t, x) := \frac{1}{\lambda^2}\, p\big(\frac{t}{\lambda^2}, \frac{x}{\lambda}\big)$, but we will not need to study the pressure here. The
viscosity ν remains unchanged.)
We shall think of the rescaling parameter λ as being large (e.g. λ > 1). One should
then think of the transformation from u to u(λ ) as a kind of “magnifying glass”, taking
fine-scale behaviour of u and matching it with an identical (but rescaled, and slowed
down) coarse-scale behaviour of u(λ ) . The point of this magnifying glass is that it
allows us to treat both fine-scale and coarse-scale behaviour on an equal footing, by
identifying both types of behaviour with something that goes on at a fixed scale (e.g.
the unit scale). Observe that the scaling suggests that fine-scale behaviour should play
out on much smaller time scales than coarse-scale behaviour (T versus λ 2 T ). Thus, for
instance, if a unit-scale solution does something funny at time 1, then the rescaled fine-
scale solution will exhibit something similarly funny at spatial scales 1/λ and at time
1/λ 2 . Blowup can occur when the solution shifts its energy into increasingly finer and
finer scales, thus evolving more and more rapidly and eventually reaching a singularity
in which the scale in both space and time on which the bulk of the evolution is occurring
has shrunk to zero. In order to prevent blowup, therefore, we must arrest this motion
of energy from coarse scales (or low frequencies) to fine scales (or high frequencies).
(There are many ways in which to make these statements rigorous, for instance using
Littlewood-Paley theory, which we will not discuss here, preferring instead to leave
terms such as “coarse-scale” and “fine-scale” undefined.)
Now, let us take an arbitrary large-data smooth solution to Navier-Stokes, and let
it evolve over a very long period of time [0, T ), assuming that it stays smooth except
possibly at time T . At very late times of the evolution, such as those near to the final
time T , there is no reason to expect the solution to resemble the initial data any more
(except in perturbative regimes, but these are not available in the arbitrary large-data
case). Indeed, the only control we are likely to have on the late-time stages of the
solution are those provided by globally controlled quantities of the evolution. Barring
a breakthrough in Strategy 2, we only have two really useful globally controlled (i.e.
bounded even for very large T ) quantities:
• The maximum kinetic energy $\sup_{0 \leq t < T} \frac{1}{2}\int_{\mathbb{R}^3} |u(t, x)|^2\, dx$; and
• The cumulative energy dissipation $\frac{1}{2}\int_0^T\!\int_{\mathbb{R}^3} |\nabla u(t, x)|^2\, dx\, dt$.
Indeed, the energy conservation law implies that these quantities are both bounded by
the initial kinetic energy E, which could be large (we are assuming our data could be
large) but is at least finite by hypothesis.
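For concreteness, the energy conservation law in question is the standard identity (here with the viscosity normalised to one; it is obtained by integrating the equation against u)
$$\frac{1}{2}\int_{\mathbb{R}^3} |u(T', x)|^2\, dx + \int_0^{T'}\!\!\int_{\mathbb{R}^3} |\nabla u(t, x)|^2\, dx\, dt = \frac{1}{2}\int_{\mathbb{R}^3} |u(0, x)|^2\, dx = E$$
for all $0 \leq T' < T$, which in particular bounds both of the above quantities by E.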
The above two quantities are coercive, in the sense that control of these quantities
implies that the solution, even at very late times, stays in a bounded region of some
function space. However, this is basically the only thing we know about the solution at
late times (other than that it is smooth until time T , but this is a qualitative assumption
and gives no bounds). So, unless there is a breakthrough in Strategy 2, we cannot rule
out the worst-case scenario that the solution near time T is essentially an arbitrary
smooth divergence-free vector field which is bounded both in kinetic energy and in
cumulative energy dissipation by E. In particular, near time T the solution could be
concentrating the bulk of its energy into fine-scale behaviour, say at some spatial scale
1/λ . (Of course, cumulative energy dissipation is not a function of a single time, but is
an integral over all time; let me suppress this fact for the sake of the current discussion.)
Now, let us take our magnifying glass and blow up this fine-scale behaviour by λ to
create a coarse-scale solution to Navier-Stokes. Given that the fine-scale solution could
(in the worst-case scenario) be as bad as an arbitrary smooth vector field with kinetic
energy and cumulative energy dissipation at most E, the rescaled unit-scale solution can
be as bad as an arbitrary smooth vector field with kinetic energy and cumulative energy
dissipation at most Eλ , as a simple change-of-variables shows. Note that the control
given by our two key quantities has worsened by a factor of λ ; because of this wors-
ening, we say that these quantities are supercritical - they become increasingly useless
for controlling the solution as one moves to finer and finer scales. This should be con-
trasted with critical quantities (such as the energy for two-dimensional Navier-Stokes),
which are invariant under scaling and thus control all scales equally well (or equally
poorly), and subcritical quantities, control of which becomes increasingly powerful at
fine scales (and increasingly useless at very coarse scales).
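For the record, here is the change of variables behind this worsening: substituting $x = \lambda y$,
$$\frac{1}{2}\int_{\mathbb{R}^3} |u^{(\lambda)}(t, x)|^2\, dx = \frac{\lambda^3}{2\lambda^2} \int_{\mathbb{R}^3} |u(t/\lambda^2, y)|^2\, dy \leq \lambda E,$$
while since $\nabla u^{(\lambda)}(t, x) = \frac{1}{\lambda^2}(\nabla u)(t/\lambda^2, x/\lambda)$, substituting also $t = \lambda^2 s$ gives
$$\frac{1}{2}\int_0^{\lambda^2 T}\!\!\int_{\mathbb{R}^3} |\nabla u^{(\lambda)}|^2\, dx\, dt = \frac{\lambda^5}{2\lambda^4} \int_0^{T}\!\!\int_{\mathbb{R}^3} |\nabla u|^2\, dy\, ds \leq \lambda E.$$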
Now, suppose we know of examples of unit-scale solutions whose kinetic energy
and cumulative energy dissipation are as large as Eλ , but which can shift their energy
to the next finer scale, e.g. a half-unit scale, in a bounded amount O(1) of time. Given
the previous discussion, we cannot rule out the possibility that our rescaled solution
behaves like this example. Undoing the scaling, this means that we cannot rule out the
possibility that the original solution will shift its energy from spatial scale 1/λ to spatial
scale 1/2λ in time O(1/λ 2 ). If this bad scenario repeats over and over again, then
convergence of geometric series shows that the solution may in fact blow up in finite
time. Note that the bad scenarios do not have to happen immediately after each other
(the self-similar blowup scenario); the solution could shift from scale 1/λ to 1/2λ ,
wait for a little bit (in rescaled time) to “mix up” the system and return to an “arbitrary”
(and thus potentially “worst-case”) state, and then shift to 1/4λ , and so forth. While
the cumulative energy dissipation bound can provide a little bit of a bound on how
long the system can “wait” in such a “holding pattern”, it is far too weak to prevent
blowup in finite time. To put it another way, we have no rigorous, deterministic way of
preventing “Maxwell’s demon” from plaguing the solution at increasingly frequent (in
absolute time) intervals, invoking various rescalings of the above scenario to nudge the
energy of the solution into increasingly finer scales, until blowup is attained.
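To spell out the geometric series computation: if the shift of energy from spatial scale $1/(2^k\lambda)$ to $1/(2^{k+1}\lambda)$ consumes time $O(1/(2^k\lambda)^2)$, then the total time needed to pass through all the infinitely many remaining scales is
$$\sum_{k=0}^{\infty} O\Big(\frac{1}{4^k \lambda^2}\Big) = O\Big(\frac{1}{\lambda^2}\Big) < \infty,$$
so every scale can be traversed before some finite time, i.e. blowup.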
Thus, in order for Strategy 3 to be successful, we basically need to rule out the sce-
nario in which unit-scale solutions with arbitrarily large kinetic energy and cumulative
energy dissipation shift their energy to the next highest scale. But every single analytic
technique we are aware of (except for those involving exact solutions, i.e. Strategy 1)
requires at least one bound on the size of solution in order to have any chance at all.
Basically, one needs at least one bound in order to control all nonlinear errors - and any
strategy we know of which does not proceed via exact solutions will have at least one
nonlinear error that needs to be controlled. The only thing we have here is a bound on
the scale of the solution, which is not a bound in the sense that a norm of the solution
is bounded; and so we are stuck.
To summarise, any argument which claims to yield global regularity for Navier-
Stokes via Strategy 3 must inevitably (via the scale invariance) provide a radically new
method for providing non-trivial control of nonlinear unit-scale solutions of arbitrary
large size for unit time, which looks impossible without new breakthroughs on Strategy
1 or Strategy 2. (There are a couple of loopholes that one might try to exploit: one can
instead try to refine the control on the “waiting time” or “amount of mixing” between
each shift to the next finer scale, or try to exploit the fact that each such shift requires
a certain amount of energy dissipation, but one can use similar scaling arguments to
the preceding to show that these types of loopholes cannot be exploited without a new
bound along the lines of Strategy 2, or some sort of argument which works for arbitrar-
ily large data at unit scales.)
To rephrase in an even more jargon-heavy manner: the “energy surface” on which
the dynamics is known to live can be quotiented by the scale invariance. After this
quotienting, the solution can stray arbitrarily far from the origin even at unit scales, and
so we lose all control of the solution unless we have exact control (Strategy 1) or can
significantly shrink the energy surface (Strategy 2).
The above was a general critique of Strategy 3. Now I’ll turn to some known
specific attempts to implement Strategy 3, and discuss where the difficulty lies with
these:
ment one tries to establish actual control on the regularity of this generalised
solution one will encounter all the supercriticality difficulties mentioned earlier.
3. Exploiting blowup criteria. Perturbative theory can yield some highly non-trivial
blowup criteria - that certain norms of the solution must diverge if the solution is
to blow up. For instance, a celebrated result of Beale-Kato-Majda[BeKaMa1984]
shows that the maximal vorticity must have a divergent time integral at the
blowup point. However, all such blowup criteria are subcritical or critical in
nature, and thus, barring a breakthrough in Strategy 2, the known globally con-
trolled quantities cannot be used to reach a contradiction. Scaling arguments
similar to those given above show that perturbative methods cannot achieve a
supercritical blowup criterion.
out by the scaling symmetry) that one can begin to use subcritical and supercriti-
cal conservation laws and monotonicity formulae as well (see my survey on this
topic [Ta2006f]). Unfortunately, as the strategy is currently understood, it does
not seem to be directly applicable to a supercritical situation (unless one simply
assumes that some critical norm is globally bounded) because it is impossible, in
view of the scale invariance, to minimise a non-scale-invariant quantity.
If we abandon Strategy 1 and Strategy 3, we are thus left with Strategy 2 - dis-
covering new bounds, stronger than those provided by the (supercritical) energy. This
is not a priori impossible, but there is a huge gap between simply wishing for a new
bound and actually discovering and then rigorously establishing one. Simply sticking
the existing energy bounds into the Navier-Stokes equation and seeing what comes out
will provide a few more bounds, but they will all be supercritical, as a scaling argument
quickly reveals. The only other way we know of to create global non-perturbative de-
terministic bounds is to discover a new conserved or monotone quantity. In the past,
when such quantities have been discovered, they have always been connected either
to geometry (symplectic, Riemannian, complex, etc.), to physics, or to some consis-
tently favourable (defocusing) sign in the nonlinearity (or in various “curvatures” in
the system). There appears to be very little usable geometry in the equation; on the
one hand, the Euclidean structure enters the equation via the diffusive term ∆ and by
the divergence-free nature of the vector field, but the nonlinearity is instead describ-
ing transport by the velocity vector field, which is basically just an arbitrary volume-
preserving infinitesimal diffeomorphism (and in particular does not respect the Eu-
clidean structure at all). One can try to quotient out by this diffeomorphism (i.e. work in
material coordinates) but there are very few geometric invariants left to play with when
one does so. (In the case of the Euler equations, the vorticity vector field is preserved
modulo this diffeomorphism, as observed for instance in [Li2003], but this invariant is
very far from coercive, being almost purely topological in nature.) The Navier-Stokes
equation, being a system rather than a scalar equation, also appears to have almost no
favourable sign properties, in particular ruling out the type of bounds which the maxi-
mum principle or similar comparison principles can give. This leaves physics, but apart
from the energy, it is not clear if there are any physical quantities of fluids which are de-
terministically monotone. (Things look better on the stochastic level, in which the laws
of thermodynamics might play a role, but the Navier-Stokes problem, as defined by the
Clay institute, is deterministic, and so we have Maxwell’s demon to contend with.)
It would of course be fantastic to obtain a fourth source of non-perturbative controlled
quantities, not arising from geometry, physics, or favourable signs, but this looks some-
what of a long shot at present. Indeed given the turbulent, unstable, and chaotic nature
of Navier-Stokes, it is quite conceivable that in fact no reasonable globally controlled
quantities exist beyond that which arise from the energy.
Of course, given how hard it is to show global regularity, one might try instead
to establish finite time blowup (this also is acceptable for the Millennium
prize[Fe2006]). Unfortunately, even though the Navier-Stokes equation is known to
be very unstable, it is not clear at all how to pass from this to a rigorous demonstration
of a blowup solution. All the rigorous finite time blowup results (as opposed to mere
instability results) that I am aware of rely on one or more of the following ingredients:
(a) Exact and explicit blowup solutions (or at least an exact and explicit transformation
to a significantly simpler equation for which blowup can be established);
(b) An ansatz for a blowup solution (or approximate solution), combined with some
nonlinear stability theory for that ansatz;
(c) A comparison principle argument, comparing the solution against an explicitly
blowing-up supersolution or subsolution; or
(d) An indirect argument, constructing a functional of the solution which must at-
tain an impossible value in finite time (e.g. a quantity which is manifestly non-
negative for smooth solutions, but must become negative in finite time).
It may well be that there is some exotic symmetry reduction which gives (a), but
no-one has located any good exactly solvable special case of Navier-Stokes (in fact,
those which have been found are known to have global smooth solutions). Method
(b) is problematic for two reasons: firstly, we do not have a good ansatz for a blowup
solution, but perhaps more importantly it seems hopeless to establish a stability theory
for any such ansatz thus created, as this problem is essentially a more difficult version
of the global regularity problem, and in particular subject to the main difficulty, namely
controlling the highly nonlinear behaviour at fine scales. (One of the ironies in pursuing
method (b) is that in order to establish rigorous blowup in some sense, one must first
establish rigorous stability in some other (renormalised) sense.) Method (c) would
require a comparison principle, which as noted before appears to be absent for the non-
scalar Navier-Stokes equations. Method (d) suffers from the same problem, ultimately
coming back to the “Strategy 2” problem that we have virtually no globally monotone
quantities in this system to play with (other than energy monotonicity, which clearly
looks insufficient by itself). Obtaining a new type of mechanism to force blowup other
than (a)-(d) above would be quite revolutionary, not just for Navier-Stokes; but I am
unaware of even any proposals in these directions, though perhaps topological methods
might have some effectiveness.
So, after all this negativity, do I have any positive suggestions for how to solve this
problem? My opinion is that Strategy 1 is impossible, and Strategy 2 would require
either some exceptionally good intuition from physics, or else an incredible stroke of
luck. Which leaves Strategy 3 (and indeed, I think one of the main reasons why the
Navier-Stokes problem is interesting is that it forces us to create a Strategy 3 technique).
Given how difficult this strategy seems to be, as discussed above, I only have some
extremely tentative and speculative thoughts in this direction, all of which I would
classify as “blue-sky” long shots:
1. Work with ensembles of data, rather than a single initial datum. All of our cur-
rent theory for deterministic evolution equations deals only with a single solution
from a single initial datum. It may be more effective to work with parameterised
familes of data and solutions, or perhaps probability measures (e.g. Gibbs mea-
sures or other invariant measures). One obvious partial result to shoot for is to
try to establish global regularity for generic large data rather than all large data;
in other words, acknowledge that Maxwell’s demon might exist, but show that
the probability of it actually intervening is very small. The problem is that we
have virtually no tools for dealing with generic (average-case) data other than
by treating all (worst-case) data; the enemy is that the Navier-Stokes flow itself
might have some perverse entropy-reducing property which somehow makes the
average case drift towards (or at least recur near) the worst case over long peri-
ods of time. This is incredibly unlikely to be the truth, but we have no tools to
prevent it from happening at present.
2. Work with a much simpler (but still supercritical) toy model. The Navier-Stokes
model is parabolic, which is nice, but is complicated in many other ways, being
relatively high-dimensional and also non-scalar in nature. It may make sense to
work with other, simplified models which still contain the key difficulty that the
only globally controlled quantities are supercritical. Examples include the Katz-
Pavlović dyadic model[KaPa2005] for the Euler equations (for which blowup
can be demonstrated by a monotonicity argument; see [FrPa2008]), or the spherically
symmetric defocusing supercritical nonlinear wave equation $-u_{tt} + \Delta u = u^7$
in three spatial dimensions. (A toy numerical sketch of a dyadic cascade of this
flavour appears after this list.)
3. Develop non-perturbative tools to control deterministic non-integrable dynam-
ical systems. Throughout this post we have been discussing PDE, but actu-
ally there are similar issues arising in the nominally simpler context of finite-
dimensional dynamical systems (ODE). Except in perturbative contexts (such as
the neighbourhood of a fixed point or invariant torus), the long-time evolution
of a dynamical system for deterministic data is still largely only controllable by
the classical tools of exact solutions, conservation laws and monotonicity formu-
lae; a discovery of a new and effective tool for this purpose would be a major
breakthrough.
5. Try a topological method. This is a special case of (1). It may well be that
a primarily topological argument may be used either to construct solutions, or
to establish blowup; there are some precedents for this type of construction in
elliptic theory. Such methods are very global by nature, and thus not restricted
to perturbative or nearly-linear regimes. However, there is no obvious topology
here (except possibly for that generated by the vortex filaments) and as far as I
know, there is not even a “proof-of-concept” version of this idea for any evolution
equation. So this is really more of a wish than any sort of concrete strategy.
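As promised in item 2 above, here is a toy numerical sketch of a dyadic cascade (a minimal model of my own in the spirit of the Katz-Pavlović system, with illustrative coefficients; with the viscosity set to zero the cascade term conserves $\sum_k a_k^2$ exactly):

```python
# Toy dyadic "shell" model: amplitude a_k lives at frequency scale 2^k.
#   da_k/dt = 2^k a_{k-1}^2  -  2^(k+1) a_k a_{k+1}  -  nu 4^k a_k
# The quadratic terms pump energy toward finer scales; the dissipation rate
# nu * 4^k is supercritically weak compared to the cascade at high k.
import numpy as np

K, nu, dt, T = 12, 1e-4, 1e-5, 2.0
k = np.arange(K)
a = np.zeros(K)
a[0] = 1.0  # all energy starts at the coarsest shell

for _ in range(int(T / dt)):
    up = np.zeros(K)
    up[1:] = 2.0 ** k[1:] * a[:-1] ** 2        # energy arriving from the coarser shell
    down = np.zeros(K)
    down[:-1] = 2.0 ** k[1:] * a[:-1] * a[1:]  # energy handed on to the finer shell
    a = a + dt * (up - down - nu * 4.0 ** k * a)

print(np.round(a ** 2, 3))  # energy per shell: it has drifted toward finer scales
```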
1.4.1 Notes
This article was originally posted on Mar 18, 2007 at
terrytao.wordpress.com/2007/03/18
Nets Katz points out that a significantly simpler (but still slightly supercritical)
problem would be to improve the double-exponential bound of Beale-Kato-Majda
[BeKaMa1984] for the growth of vorticity for periodic solutions to the Euler equations
in two dimensions.
Sarada Rajeev points out an old observation of Arnold that the Euler equations are
in fact the geodesic flow in the group of volume-preserving diffeomorphisms (using
the Euclidean L2 norm of the velocity field to determine the Riemannian metric struc-
ture); such structure may well be decisive in improving our understanding of the Euler
equation, and thus (indirectly) for Navier-Stokes as well.
Stephen Montgomery-Smith points out that any new conserved or monotone quan-
tities (of the type needed to make a “Strategy 2” approach work) might distort the
famous Kolmogorov 5/3 power law for the energy spectrum. Since this law has been
confirmed by many numerical experiments, this could be construed as evidence against
a Strategy 2 approach working. On the other hand, Montgomery-Smith also pointed
out that for two-dimensional Navier-Stokes, one has $L^p$ bounds on vorticity which do
not affect the Kraichnan $k^{-3}$ power law coming from the enstrophy.
After the initial posting of this article, I managed to show[Ta2008b] that the peri-
odic global regularity problem for Navier-Stokes was equivalent to the task of obtain-
ing a local or global $H^1$ bound on classical solutions, thus showing that the regularity
problem is in some sense “equivalent” to that of making Strategy 2 work.
1.5 Scarring for the Bunimovich stadium
The problem of scarring for the Bunimovich stadium is well known in the area of
quantum chaos or quantum ergodicity (see e.g. [BuZw2004]); I am attracted to it both
for its simplicity of statement, and also because it focuses on one of the key weaknesses
in our current understanding of the Laplacian, namely is that it is difficult with the tools
we know to distinguish between eigenfunctions (exact solutions to −∆uk = λk uk ) and
quasimodes (approximate solutions to the same equation), unless one is willing to work
with generic energy levels rather than specific energy levels.
The Bunimovich stadium Ω is the name given to any planar domain consisting of
a rectangle bounded at both ends by semicircles. Thus the stadium has two flat edges
(which are traditionally drawn horizontally) and two round edges: see Figure 1.1.
Despite the simple nature of this domain, the stadium enjoys some interesting clas-
sical and quantum dynamics. It was shown by Bunimovich[Bu1974] that the classical
billiard ball dynamics on this stadium is ergodic, which means that a billiard ball with
randomly chosen initial position and velocity (as depicted above) will, over time, be
uniformly distributed across the billiard (as well as in the energy surface of the phase
space of the billiard). On the other hand, the dynamics is not uniquely ergodic because
there do exist some exceptional choices of initial position and velocity for which one
does not have uniform distribution, namely the vertical trajectories in which the billiard
reflects orthogonally off of the two flat edges indefinitely.
Rather than working with (classical) individual trajectories, one can also work with
(classical) invariant ensembles - probability distributions in phase space which are in-
variant under the billiard dynamics. Ergodicity then says that (at a fixed energy) there
are no invariant absolutely continuous ensembles other than the obvious one, namely
the probability distribution with uniformly distributed position and velocity direction.
On the other hand, unique ergodicity would say the same thing but dropping the “ab-
solutely continuous” - but each vertical bouncing ball mode creates a singular invariant
ensemble along that mode, so the stadium is not uniquely ergodic.
Now from physical considerations we expect the quantum dynamics of a system to
have similar qualitative properties as the classical dynamics; this can be made precise
in many cases by the mathematical theories of semi-classical analysis and microlocal
analysis. The quantum analogue of the dynamics of classical ensembles is the dynamics
of the Schrödinger equation
$$i\hbar\, \partial_t \psi + \frac{\hbar^2}{2m}\, \Delta \psi = 0,$$
where we impose Dirichlet boundary conditions ψ|∂ Ω = 0 (one can also impose Neu-
mann conditions if desired, the problems seem to be roughly the same). The quantum
analogue of an invariant ensemble is that of a single eigenfunction (i.e. a solution $u_k$
to the equation $-\Delta u_k = \lambda_k u_k$), which we normalise in the usual $L^2$ manner, so that
$\int_\Omega |u_k|^2 = 1$. (Due to the compactness of the domain Ω, the set of eigenvalues $\lambda_k$ of
the Laplacian −∆ is discrete and goes to infinity, though there is some multiplicity
arising from the symmetries of the stadium. These eigenvalues are the same eigen-
values that show up in the famous “can you hear the shape of a drum?” problem
[Ka1966].) Roughly speaking, quantum ergodicity is then the statement that almost
all eigenfunctions are uniformly distributed in physical space (as well as in the energy
surface of phase space), whereas quantum unique ergodicity (QUE) is the statement
that all eigenfunctions are uniformly distributed. A little more precisely:
1. If quantum ergodicity holds, then for any open subset $A \subset \Omega$ we have $\int_A |u_k|^2 \to |A|/|\Omega|$ as $k \to \infty$, after excluding a sequence of eigenfunctions of density zero.
2. If quantum unique ergodicity holds, then $\int_A |u_k|^2 \to |A|/|\Omega|$ as $k \to \infty$ for the full sequence of eigenfunctions.
In fact, quantum ergodicity and quantum unique ergodicity say somewhat stronger
things than the above two statements, but I would need tools such as pseudodiffer-
ential operators to describe these more technical statements, and so I will not do so
here.
Now it turns out that for the stadium, quantum ergodicity is known to be true; this
specific result was first obtained by Gérard and Leichtnam[GeLi1993], although “classical
ergodicity implies quantum ergodicity” results of this type go back to Schnirelman[Sn1974]
(see also [Ze1990], [CdV1985]). These results are established by microlocal analysis
methods, which basically proceed by aggregating all the eigenfunctions together into
a single object (e.g. a heat kernel, or some other function of the Laplacian) and then
analysing the resulting aggregate semiclassically. It is because of this aggregation that
one only gets to control almost all eigenfunctions, rather than all eigenfunctions.
In analogy to the above theory, one generally expects classical unique ergodicity
should correspond to QUE. For instance, there is the famous (and very difficult) quan-
tum unique ergodicity conjecture of Rudnick and Sarnak[RuSa1994], which asserts
that QUE holds for all compact manifolds without boundary with negative sectional
curvature. This conjecture will not be discussed here (it would warrant an entire arti-
cle in itself, and I would not be the best placed to write it). Instead, we focus on the
Bunimovich stadium. The stadium is clearly not classically uniquely ergodic due to the
vertical bouncing ball modes, and so one would conjecture that it is not QUE either. In
fact one conjectures the slightly stronger statement:
Conjecture 1.1 (Scarring conjecture). There exists a subset $A \subset \Omega$ and a sequence $u_{k_j}$
of eigenfunctions with $\lambda_{k_j} \to \infty$, such that $\int_A |u_{k_j}|^2$ does not converge to $|A|/|\Omega|$. Informally,
some sequence of eigenfunctions “scars”, refusing to equidistribute across the stadium.
On the other hand, the stadium is a very simple object - it is one of the simplest
and most symmetric domains for which we cannot actually compute eigenfunctions or
eigenvalues explicitly. It is tempting to just discard all the microlocal analysis and just
try to construct eigenfunctions by brute force. But this has proven to be surprisingly
difficult; indeed, despite decades of sustained study into the eigenfunctions of Lapla-
cians (given their many applications to PDE, to number theory, to geometry, etc.) we
still do not know very much about the shape and size of any specific eigenfunction for
a general manifold, although we know plenty about the average-case behaviour (via
microlocal analysis) and also know the worst-case behaviour (by Sobolev embedding
or restriction theorem type tools). This conjecture is one of the simplest conjectures
which would force us to develop a new tool for understanding eigenfunctions, which
could then conceivably have a major impact on many areas of analysis.
One might consider modifying the stadium in order to make scarring easier to show,
for instance by selecting the dimensions of the stadium appropriately (e.g. obeying a
Diophantine condition), or adding a potential or magnetic term to the equation, or per-
haps even changing the metric or topology. To have even a single rigorous example
of a reasonable geometric operator for which scarring occurs despite the presence of
quantum ergodicity would be quite remarkable, as any such result would have to in-
volve a method that can deal with a very rare set of special eigenfunctions in a manner
quite different from the generic eigenfunction.
Actually, it is already interesting to see if one can find better quasimodes than the
ones listed above which exhibit scarring, i.e. to improve the O(1) error in the spectral
bandwidth; this specific problem has been proposed in [BuZw2004] as a possible toy
version of the main problem.
1.5.1 Notes
This article was originally posted on Mar 28, 2007 at
terrytao.wordpress.com/2007/03/28
Greg Kuperberg and I discussed whether one could hope to obtain this conjecture
by deforming continuously from the rectangle (for which the eigenfunctions are ex-
plicitly known) to the stadium. Unfortunately, since eigenvalues generically do not
intersect each other under continuous deformations, the ordering of the eigenvalues
does not change, and so by Weyl’s law one does not expect the scarred states of the
stadium to correspond to any particularly interesting states of the rectangle.
Many pictures of stadium eigenfunctions can be found online, for instance at Dou-
glas Stone’s page
www.eng.yale.edu/stonegroup/
or Arnd Bäcker’s page
www.physik.tu-dresden.de/~baecker/
A small number of these eigenfunctions seem to exhibit scarring, thus providing
some numerical support for the above conjectures, though of course these conjectures
concern the asymptotic regime in which the eigenvalue goes to infinity, and so cannot
be proved or disproved solely through numerics.
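For readers who wish to experiment numerically themselves, here is a minimal Python sketch which computes a few low Dirichlet eigenfunctions of a discretized stadium (the grid resolution, stadium dimensions, and number of eigenvalues computed are all arbitrary choices; genuine scarring is only expected far up the spectrum, so this merely illustrates the setup):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Grid over the bounding box of a stadium: a square of side 1 with two
# half-disc caps of radius 1/2 (dimensions are arbitrary choices).
h = 0.02
x = np.arange(-1.0, 1.0 + h, h)   # long axis
y = np.arange(-0.5, 0.5 + h, h)   # short axis
X, Y = np.meshgrid(x, y)

def in_stadium(px, py):
    # central square plus the two semicircular caps
    return ((np.abs(px) <= 0.5) & (np.abs(py) <= 0.5)) | \
           ((np.abs(px) - 0.5) ** 2 + py ** 2 <= 0.25)

mask = in_stadium(X, Y)
idx = -np.ones(mask.shape, dtype=int)
idx[mask] = np.arange(mask.sum())

# 5-point finite-difference Dirichlet Laplacian on the interior points;
# boundary conditions are imposed by simply omitting exterior neighbours.
rows, cols, vals = [], [], []
ny, nx = mask.shape
for i in range(ny):
    for j in range(nx):
        if not mask[i, j]:
            continue
        k = idx[i, j]
        rows.append(k); cols.append(k); vals.append(4.0 / h ** 2)
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < ny and 0 <= jj < nx and mask[ii, jj]:
                rows.append(k); cols.append(idx[ii, jj]); vals.append(-1.0 / h ** 2)
L = sp.csr_matrix((vals, (rows, cols)))

# A few of the lowest eigenvalues/eigenfunctions, by shift-invert.
lam, U = spla.eigsh(L, k=6, sigma=0, which='LM')
print(lam)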
1.6 Triangle and diamond densities
Up to insignificant errors of o(1) (i.e. anything which goes to zero as the number
of vertices goes to infinity), these densities can also be interpreted probabilistically as
follows: if x, y, z, w are randomly selected vertices in V, then we have
P({x, y} ∈ E) = α + o(1);
P({x, y}, {y, z}, {z, x} ∈ E) = β + o(1); and
P({x, y}, {y, z}, {z, x}, {y, w}, {w, x} ∈ E) = γ + o(1).
(The errors of o(1) arise because the vertices x, y, z, w may occasionally collide with
each other, though this probability becomes very small when n is large.) Thus we see
that these densities are “local” qualities of the graph, as we only need to statistically
sample the graph at a small number of randomly chosen vertices in order to estimate
them.
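As a concrete illustration of this local sampling, here is a short Python sketch (the helper name and trial count are ad hoc choices) that estimates α, β, γ for a graph given as a vertex list and edge set; the occasional collisions between sampled vertices contribute precisely the o(1) errors described above:

import random

def sample_densities(V, E, trials=200000):
    """Monte Carlo estimates of the edge, triangle, and diamond densities
    alpha, beta, gamma by sampling random vertex tuples, as in the text."""
    V = list(V)
    adj = set(frozenset(e) for e in E)
    def edge(a, b):
        return frozenset((a, b)) in adj
    a = b = c = 0
    for _ in range(trials):
        x, y, z, w = (random.choice(V) for _ in range(4))
        if edge(x, y):
            a += 1
        if edge(x, y) and edge(y, z) and edge(z, x):
            b += 1                                   # triangle on x, y, z
            if edge(y, w) and edge(w, x):
                c += 1                               # diamond: second triangle on x, y, w
    return a / trials, b / trials, c / trials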
A general question is to determine all the constraints relating α, β , γ in the limit n →
∞. (It is known from the work of Lovász and Szegedy[LoSz2006] that the relationships
between local graph densities such as these stabilise in this limit; indeed, given any
error tolerance ε > 0 and any large graph G with densities α, β, γ, there exists a graph with “only” O_ε(1) vertices whose densities α′, β′, γ′ differ from those of G by at most ε, although the best known bounds for O_ε(1) are far too poor at present to be able to get any useful information on the asymptotic constraint set by direct exhaustion by computer of small graphs.)
Let us forget about diamonds for now and only look at the edge and triangle densi-
ties α, β . Then the story is already rather non-trivial. The main concern is to figure out,
for each fixed α, what the best possible upper and lower bounds on β are (up to o(1)
errors); since the collection of graphs with a given edge density is “path-connected”
in some sense, it is not hard to see that every value of β between the upper and lower
bounds is feasible modulo o(1) errors.
The best possible upper bound is easy: β ≤ α^{3/2} + o(1). This can be established by either the Kruskal-Katona theorem[Kr1963, Ka1968], the Loomis-Whitney inequality[LoWh1949] (or the closely related box theorem[BoTh1995]), or just two applications of Hölder’s inequality; we leave this as an exercise. The bound is sharp, as can be seen by looking at a complete subgraph on (α^{1/2} + o(1))n vertices. (We thank
Tim Austin and Imre Leader for these observations and references, as well as those in
the paragraph below.) There is some literature on refining the o(1) factor; see [Ni2008]
for a survey.
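As an aside, besides the three routes just mentioned, the bound can also be seen quickly through the adjacency spectrum. If A is the adjacency matrix with eigenvalues λ_1, ..., λ_n, then (up to o(1) errors in the densities)
$$\beta n^3 \approx \operatorname{tr}(A^3) = \sum_i \lambda_i^3 \le \Big(\max_i |\lambda_i|\Big) \sum_i \lambda_i^2 \le \Big(\sum_i \lambda_i^2\Big)^{3/2} = \big(\operatorname{tr}(A^2)\big)^{3/2} \approx (\alpha n^2)^{3/2},$$
since tr(A³) counts ordered triangles and tr(A²) counts ordered edges; dividing by n³ gives β ≤ α^{3/2} + o(1).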
The lower bound is trickier. The complete bipartite graph example shows that the
trivial lower bound β ≥ 0 is attainable when α ≤ 1/2−o(1), and Turán’s theorem[Tu1941]
shows that this is sharp. For α ≥ 1/2, a classical theorem of Goodman[Go1959] (see
also [NoSt1963]) shows that β ≥ α(2α − 1) − o(1). When α = 1 − 1/k for some in-
teger k, this inequality is sharp, as can be seen by looking at the complete k-partite
graph.
Goodman’s result is thus sharp at infinitely many values of α, but it turns out that it
is not quite the best bound. After several partial results, the optimal bound was obtained
recently by Razborov[Ra2008b], who established for 1 − 1/k < α < 1 − 1/(k + 1) that
$$\beta \ge \frac{(k-1)\big(k - 2\sqrt{k(k-\alpha(k+1))}\big)\big(k + \sqrt{k(k-\alpha(k+1))}\big)^2}{k^2(k+1)^2} - o(1)$$
and that this is sharp (!) except for the o(1) error (see [Fi1989] for some additional
work on this error term).
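As a sanity check on the displayed bound, one can tabulate it numerically against Goodman's bound; the following Python snippet (a direct transcription of the two formulas, with ad hoc function names) confirms that they agree at the endpoints α = 1 − 1/k, as claimed:

import math

def goodman(alpha):
    return alpha * (2 * alpha - 1)

def razborov(alpha):
    # Razborov's lower bound for the triangle density, transcribed from
    # the formula displayed above, valid for 1 - 1/k <= alpha <= 1 - 1/(k+1).
    k = math.floor(1 / (1 - alpha))
    s = math.sqrt(k * (k - alpha * (k + 1)))
    return (k - 1) * (k - 2 * s) * (k + s) ** 2 / (k ** 2 * (k + 1) ** 2)

# At the endpoints alpha = 1 - 1/k the two bounds coincide:
for k in (2, 3, 4):
    a = 1 - 1 / k
    print(k, goodman(a), razborov(a))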
Now we consider the full problem of relating edge densities, triangle densities, and
diamond densities. Given that the relationships between α, β were already so complex,
a full characterisation of the constraints connecting α, β, γ is probably impossible at this time (though it might be possible to prove that they are decidable via some (impractical) computer algorithm, and it also looks feasible to determine the exact
constraints between just α and γ). The question of Trevisan however focuses on a
specific regime in the configuration space, in which β is exceptionally small. From the
Cauchy-Schwarz inequality and the observation that a diamond is nothing more than a
pair of triangles with a common edge, we obtain the inequality
$$\gamma \ge \frac{\beta^2}{\alpha} - o(1). \qquad (1.1)$$
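To spell out the Cauchy-Schwarz step in the probabilistic notation used earlier: writing t(x, y) for the probability over a random z that {y, z} and {z, x} both lie in E, we have (up to o(1) errors)
$$\beta = \mathbf{E}_{x,y}\, 1_{\{x,y\}\in E}\, t(x,y), \qquad \gamma = \mathbf{E}_{x,y}\, 1_{\{x,y\}\in E}\, t(x,y)^2,$$
since z and w range independently, and so by Cauchy-Schwarz
$$\beta^2 = \big(\mathbf{E}_{x,y}\, 1_{\{x,y\}\in E}\, t(x,y)\big)^2 \le \big(\mathbf{E}_{x,y}\, 1_{\{x,y\}\in E}\big)\big(\mathbf{E}_{x,y}\, 1_{\{x,y\}\in E}\, t(x,y)^2\big) = \alpha\gamma,$$
which rearranges to (1.1).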
Because we understand very well when equality holds in the Cauchy-Schwarz inequality, we know that (1.1) would only be sharp when the triangles are distributed "evenly" among the edges, so that almost every edge is incident to the roughly expected number of triangles (which is roughly βn/α). However, it is a remarkable fact that this type of
equidistribution is known to be impossible when β is very small. Indeed, the triangle
removal lemma of Ruzsa and Szemerédi[RuSz1978] asserts that if β is small, then one
can in fact make β vanish (i.e. delete all triangles) by removing at most c(β)n² edges, where c(β) → 0 in the limit β → 0. This shows that among all the roughly αn²/2 edges in the graph, at most c(β)n² of them will already be incident to all the triangles
in the graph. This, and Cauchy-Schwarz, gives a bound of the form
$$\gamma \ge \frac{\beta^2}{c(\beta)} - o(1), \qquad (1.2)$$
Unfortunately, the best known bounds on c(β) decay to zero extremely slowly as β → 0. To quantify this, define the tower-exponential function 2 ↑↑ n recursively by
$$2 \uparrow\uparrow 1 := 2; \quad 2 \uparrow\uparrow (n+1) := 2^{2 \uparrow\uparrow n};$$
this is a very rapidly growing function, faster than exponential, double exponential, or any other finite iterated exponential. We invert this function and define the inverse tower function log∗ n by
$$\log^* n := \min\{ m \ge 1 : 2 \uparrow\uparrow m \ge n \}.$$
This function goes to infinity as n → ∞, but very slowly - slower than log n or even log log n (which, as famously stated by Carl Pomerance, “is proven to go to infinity, but has never been observed to do so”). The best bound on c(β) known is of the form
$$c(\beta) \ll \big(\log^*(1/\beta)\big)^{-\varepsilon}$$
for some absolute constant ε > 0 (e.g. 1/10 would work here). This bound is so poor
because the proof goes via the Szemerédi regularity lemma[Sz1978], which is known
by the work of Gowers[Go1997] to necessarily have tower-type dependencies in the
constants.
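These definitions are easy to play with directly; the following short Python sketch implements both functions and illustrates just how slowly log∗ grows:

def tower(n):
    # 2 ^^ n in the notation above: 2 ^^ 1 = 2, 2 ^^ (n+1) = 2^(2 ^^ n)
    t = 2
    for _ in range(n - 1):
        t = 2 ** t
    return t

def log_star(n):
    # inverse tower function: least m with 2 ^^ m >= n
    m, t = 1, 2
    while t < n:
        m, t = m + 1, 2 ** t
    return m

print([tower(k) for k in range(1, 5)])   # 2, 4, 16, 65536
print(log_star(10 ** 100))               # still just 5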
The open question is whether one can obtain a bound of the form (1.2) in which
1/c(β ) is replaced by a quantity which grows better in β , e.g. one which grows log-
arithmically or double logarithmically rather than inverse-tower-exponential. Such a
bound would perhaps lead the way to improving the bounds on the triangle removal
lemma; we now have many proofs of this lemma, but they all rely on one form or
another of the regularity lemma and so inevitably have the tower-exponential type
bounds present. The triangle removal lemma is also connected to many other prob-
lems, including property testing for graphs and Szemerédi’s theorem on arithmetic
progressions[Sz1975] (indeed, the triangle removal lemma implies the length three
special case of Szemerédi’s theorem, i.e. Roth’s theorem[Ro1953]), so progress on im-
proving (1.2) may well lead to much better bounds in many other problems, as well
as furnishing another tool beyond the regularity lemma with which to attack these
problems. Curiously, the work of Lovász and Szegedy[LoSz2006] implies that the
question can be rephrased in a purely analytic fashion, without recourse to graphs. Let
W : [0, 1]² → [0, 1] be a measurable symmetric function on the unit square, and consider the quantities
$$\beta := \int_0^1\!\int_0^1\!\int_0^1 W(x,y)\,W(y,z)\,W(z,x)\; dx\, dy\, dz$$
and
$$\gamma := \int_0^1\!\int_0^1\!\int_0^1\!\int_0^1 W(x,y)\,W(y,z)\,W(z,x)\,W(y,w)\,W(w,x)\; dx\, dy\, dz\, dw.$$
Any bound connecting β and γ here is known to imply the same bound for triangle and diamond densities (with an error of o(1)), and vice versa. Thus, the question is now to establish the inequality γ ≥ β²/c′(β) for some civilised value of c′(β), which at present is only known to decay to zero as β → 0 like an inverse tower-exponential function.
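This analytic formulation is also convenient for numerical experimentation; here is a minimal Python sketch (Monte Carlo, with an ad hoc helper name) approximating the two functionals for a given kernel W:

import random

def beta_gamma(W, samples=200000):
    """Monte Carlo approximation of the triangle and diamond functionals
    of a symmetric kernel W : [0,1]^2 -> [0,1], as defined above."""
    b = g = 0.0
    for _ in range(samples):
        x, y, z, w = (random.random() for _ in range(4))
        t = W(x, y) * W(y, z) * W(z, x)
        b += t
        g += t * W(y, w) * W(w, x)
    return b / samples, g / samples

# Example: the constant kernel W = 1/2 has beta = 1/8 and gamma = 1/32.
print(beta_gamma(lambda x, y: 0.5))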
1.6.1 Notes
This article was originally posted on Apr 1, 2007 at
terrytao.wordpress.com/2007/04/01
Thanks to Vlado Nikiforov for pointing out some additional references and related
questions.
Yuval Peres pointed out some similarity between this problem and a conjecture of Sidorenko[Si1994], which asserts that the number of copies of a bipartite graph (V_H, W_H, E_H) inside a larger graph (V_G, W_G, E_G) should always be at least
$$|V_G|^{|V_H|}\, |W_G|^{|W_H|}\, \Big(\frac{|E_G|}{|V_G||W_G|}\Big)^{|E_H|},$$
which is asymptotically what one expects for a random graph; this conjecture is known for some simple examples of graphs H, such as cycles, paths, or stars, but is open in general.
1.7 What is a quantum honeycomb?
λ1 ≥ λ2 ≥ . . . ≥ λn
the eigenvalues of B as
µ1 ≥ µ2 ≥ . . . ≥ µn
and the eigenvalues of C as
ν1 ≥ ν2 ≥ . . . ≥ νn .
Thus for instance µ2 is the second largest eigenvalue of B, etc.
An old question (essentially due to Sylvester, though this particular formulation is
due to Weyl) was to determine the complete set of relationships between the λi , the µ j ,
and the νk . There are a number of reasonably obvious equalities and inequalities that
one can obtain here. For instance, from the obvious identity tr(A) + tr(B) = tr(C) we
conclude the trace identity
λ1 + . . . + λn + µ1 + . . . + µn = ν1 + . . . + νn ,
and from the variational characterisation of the top eigenvalue (the largest eigenvalue of C = A + B is at most the sum of the largest eigenvalues of A and B) we obtain the inequality
ν1 ≤ λ1 + µ1 .
And so on and so forth. It turns out that the set of all possible λi , µ j , νk form a convex
cone, determined by a finite number of linear inequalities; this can be derived from
symplectic geometry considerations (the Atiyah/Guillemin-Sternberg convexity theo-
rem [At1982], [GuSt1982], or more precisely a refinement due to Kirwan[Ki1984]).
A complete (in fact, overcomplete) list of such inequalities, generated by a beautifully
recursive formula, was conjectured by Alfred Horn[Ho1962]. The Horn conjecture
was finally settled in a combination of two papers: one by Klyachko[Kl1998], which
used geometric invariant theory to reduce the problem to a simpler problem known as
the saturation conjecture, and one by Allen Knutson and myself [KnTa1999], which
established the saturation conjecture by a combinatorial argument using honeycombs.
Note that the lengths of the edges in the honeycomb are variable, but there are only three possible orientations, 120 degree angles apart. [Figure: an example honeycomb]
There is also a multiplicative version of the problem, in which one considers unitary matrices U, V, W with UV = W, and writes their eigenvalues (which lie on the unit circle) as
e(λ1 ), . . . , e(λn ),
e(µ1 ), . . . , e(µn ),
e(ν1 ), . . . , e(νn ),
respectively, where e(θ) := e^{2πiθ}. Under a suitable normalisation of the phases λi, µj, νk one again has, for instance, the
inequality
ν1 ≤ λ1 + µ1 .
One can continue creating inequalities of this type, and there will be a strong resemblance between those inequalities and those in the additive problem. This is not so surpris-
ing, since the additive problem emerges as a limiting case of the multiplicative one
(if U = exp(εA),V = exp(εB),W = exp(εC) and UV = W , then A + B = C + O(ε)
when ε is small, by the Baker-Campbell-Hausdorff formula). What is more surprising is that when the λi , µ j , νk are sufficiently small, the inequalities which describe the multiplicative problem are exactly those that describe the additive problem! In fact,
it is known that the space of all possible λi , µ j , νk for the multiplicative problem is a
convex polytope contained within the convex cone for the additive problem, and in fact
a quantum version of the Horn conjecture (i.e. an explicit recursive description of the
faces of this polytope) was proven by Belkale [Be2008] (building upon earlier work
in [AgWo1998], [Be2001]). For instance, while for the additive problem there is the
constraint
ν_{i+j−1} ≤ λ_i + µ_j
whenever i + j − 1 ≤ n (the Weyl inequalities), in the multiplicative problem one also
has the additional constraint
λ_i + µ_j ≤ ν_{i+j} + 1.
As with the additive problem, the complete set of all inequalities of this form turns out
to be rather messy to describe and I will not do so here.
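The Weyl inequalities for the additive problem are easy to test numerically; the following Python sketch (using random Gaussian Hermitian matrices, an arbitrary choice) verifies them on many samples:

import numpy as np

def check_weyl(n=5, trials=1000, seed=0):
    """Numerically test the additive Weyl inequalities
    nu_{i+j-1} <= lambda_i + mu_j on random Hermitian A, B with C = A + B.
    Indices below are 0-based, so i+j-1 <= n becomes i+j <= n-1."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Y = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        A, B = (X + X.conj().T) / 2, (Y + Y.conj().T) / 2
        lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # decreasing order
        mu = np.sort(np.linalg.eigvalsh(B))[::-1]
        nu = np.sort(np.linalg.eigvalsh(A + B))[::-1]
        for i in range(n):
            for j in range(n - i):
                assert nu[i + j] <= lam[i] + mu[j] + 1e-9
    print("all Weyl inequalities verified")

check_weyl()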
Just as the additive Weyl problem turned out to be linked to Schubert calculus (the
intersection numbers of Schubert classes), the multiplicative problem turned out to be
linked to quantum Schubert calculus (the Gromov-Witten numbers of the same classes),
and making this link precise turned out to be the key to the proof of the quantum Horn
conjecture.
This solves the “qualitative” version of the multiplicative Weyl problem, namely
whether there exists any triple UV = W with the specified eigenvalues. However, one
can still ask “quantitative” versions, namely to compute the volume of the space of all
such triples. There is also the discretised quantitative version, which concerns either
the Gromov-Witten numbers for Schubert classes, or else the multiplicities of fusion
products in the Verlinde algebra of SU(n); these are rather technical and we refer to
[AgWo1998] for details. There should exist some concept of “quantum honeycomb”
which computes all of these numbers, in much the same way that the usual honeycombs
compute the volume of the space of solutions to A + B = C with specified eigenval-
ues, intersection numbers for Schubert classes, or multiplicities of tensor products of
SU(n) irreducible representations. Vaguely speaking it seems that one wants to con-
struct an analogue of the planar honeycomb which lives instead on something like a
two-dimensional torus, but it is not entirely clear (even when n = 2) what the precise
definition of this object should be.
It may seem like one needs to learn a fearsome amount of machinery to attack
this problem, but actually I think one can at least guess what the quantum honeycomb
should be just by experimentation with small cases n = 1, 2, 3 and by using various san-
ity checks (this is how Allen and I discovered the additive honeycombs). For instance,
the equation UV = W has the cyclic symmetry (U,V,W) ↦ (V, W^{−1}, U^{−1}) and so the quantum honeycomb should enjoy a similar symmetry. There is also the translation symmetry (U,V,W) ↦ (e(α)U, e(β)V, e(γ)W) whenever α + β + γ = 0, so quantum
honeycombs should be translation invariant. When the honeycomb is small (all vertices
close to the origin) there should be a bijective correspondence between the quantum
honeycomb and the regular honeycomb. The constraints between all the boundary val-
ues are already known due to the resolution of the quantum Horn conjecture. There are
some other extreme cases which are also understood quite well, for instance when one
of the matrices is very close to the identity but the other two are not.
My guess is that once a reasonable candidate for a quantum honeycomb is found
which passes all the obvious sanity checks, actually verifying that it computes every-
thing that it should will be a relatively routine matter (we have many different com-
binatorial ways of establishing things like this). This will give a combinatorial tool
for computing a number of interesting quantities, and will probably shed some light
also as to why these honeycombs appear in the subject in the first place. (It seems to
be somehow related to the Dynkin diagram An for the underlying group SU(n); it has
proven a little tricky to try to find analogues of these objects for the other Dynkin dia-
grams.) Certainly they seem to be computing something rather non-trivial; for instance
the Littlewood-Richardson numbers that are computed by additive honeycombs have
even been proposed to play a role in lower bounds in complexity theory, and specifi-
cally the P 6= NP problem[Mu2007]!
1.7.1 Notes
This article was originally posted on Apr 19, 2007 at
terrytao.wordpress.com/2007/04/19
A java applet demonstrating honeycombs in action can be found at [KnTa2001b].
Thanks to Allen Knutson for suggestions and encouragement.
1.8 Boundedness of the trilinear Hilbert transform
wavelets can be used to reconstitute functions, one can establish the desired bounded-
ness. The use of wavelets to mediate the action of the Hilbert transform fits well with
the two symmetries of the Hilbert transform (translation and scaling), because the col-
lection of wavelets also obeys (discrete versions of) these symmetries. One can view
the theory of such wavelets as a dyadic framework for Calderón-Zygmund theory.
Just as the Hilbert transform behaves like the identity, it was conjectured by Calderón
(motivated by the study of the Cauchy integral on Lipschitz curves) that the bilinear
Hilbert transform
$$B(f,g)(x) := \mathrm{p.v.} \int_{\mathbb{R}} f(x+t)\, g(x+2t)\, \frac{dt}{t}$$
would behave like the pointwise product operator f, g ↦ fg (exhibiting again the analogy between p.v. 1/t and δ(t)); in particular one should have the Hölder-type inequality
$$\|B(f,g)\|_{L^r(\mathbb{R})} \le C_{p,q}\, \|f\|_{L^p(\mathbb{R})}\, \|g\|_{L^q(\mathbb{R})} \qquad (1.3)$$
whenever 1 < p, q < ∞ and 1/r = 1/p + 1/q. (There is nothing special about the “2” in the definition of the bilinear Hilbert transform; one can replace this constant by any other constant α except for 0, 1, or ∞, though it is a delicate issue to maintain good control on the constant C_{p,q} when α approaches one of these exceptional values. Note that by setting g = 1 and looking at the limiting case q = ∞ we recover the linear Hilbert transform theory from the bilinear one; thus we expect the bilinear theory to be harder.)
Again, this claim is trivial when localising to a single scale |t| ∼ 2^n, as it can then be quickly deduced from Hölder’s inequality. The difficulty is then to combine all the scales together.
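Indeed, in the case r ≥ 1 (say), a single scale is handled by Minkowski's integral inequality followed by Hölder:
$$\Big\| \int_{|t|\sim 2^n} f(x+t)\, g(x+2t)\, \frac{dt}{t} \Big\|_{L^r_x} \le \int_{|t|\sim 2^n} \big\| f(\cdot+t)\, g(\cdot+2t) \big\|_{L^r_x}\, \frac{dt}{|t|} \le \|f\|_{L^p} \|g\|_{L^q} \int_{|t|\sim 2^n} \frac{dt}{|t|} \lesssim \|f\|_{L^p} \|g\|_{L^q},$$
using the translation invariance of the norms; the point is that the measure dt/|t| assigns mass O(1) to each dyadic block, uniformly in n.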
It took some time to realise that Calderón-Zygmund theory, despite being incredi-
bly effective in the linear setting, was not quite the right tool for the bilinear problem.
One way to see the problem is to observe that the bilinear Hilbert transform B (or
more precisely, the estimate (1.3)) enjoys one additional symmetry beyond the scaling
and translation symmetries that the Hilbert transform H obeyed. Namely, one has the
modulation invariance
B(e_{−2ξ} f, e_ξ g) = e_{−ξ} B(f, g)
for any frequency ξ, where e_ξ(x) := e^{2πiξx} is the linear plane wave of frequency ξ,
which leads to a modulation symmetry for the estimate (1.3). This symmetry - which
has no non-trivial analogue in the linear Hilbert transform - is a consequence of the
algebraic identity
ξ x − 2ξ (x + t) + ξ (x + 2t) = 0
which can in turn be viewed as an assertion that linear functions have a vanishing
second derivative.
It is a general principle that if one wants to establish a delicate estimate which is
invariant under some non-compact group of symmetries, then the proof of that estimate
should also be largely invariant under that symmetry (or, if it does eventually decide to
break the symmetry (e.g. by performing a normalisation), it should do so in a way that
will yield some tangible profit). Calderón-Zygmund theory gives the frequency origin
ξ = 0 a preferred role (for instance, all wavelets have mean zero, i.e. their Fourier
transforms vanish at the frequency origin), and so is not the appropriate tool for any
modulation-invariant problem.
The conjecture of Calderón was finally verified in a breakthrough pair of papers by
Lacey and Thiele [LaTh1997, LaTh1999], first in the “easy” region 2 < p, q, r′ < ∞ (in
which all functions are locally in L2 and so local Fourier analytic methods are partic-
ularly tractable) and then in the significantly larger region where r > 2/3. (Extending
the latter result to r = 2/3 or beyond remains open, and can be viewed as a toy version
of the trilinear Hilbert transform question discussed below.) The key idea (dating back
to [Fe1973]) was to replace the wavelet decomposition by a more general wave packet
decomposition - wave packets being functions which are well localised in position,
scale, and frequency, but are more general than wavelets in that their frequencies do
not need to hover near the origin; in particular, the wave packet framework enjoys the
same symmetries as the estimate that one is seeking to prove. (As such, wave packets
are a highly overdetermined basis, in contrast to the exact bases that wavelets offer,
but this turns out to not be a problem, provided that one focuses more on decompos-
ing the operator B rather than the individual functions f , g.) Once the wave packets
are used to mediate the action of the bilinear Hilbert transform B, Lacey and Thiele
then used a carefully chosen combinatorial algorithm to organise these packets into
“trees” concentrated in mostly disjoint regions of phase space, applying (modulated)
Calderón-Zygmund theory to each tree, and then using orthogonality methods to sum
the contributions of the trees together. (The same method also leads to the simplest
proof known[LaTh2000] of Carleson’s celebrated theorem[Ca1966] on convergence of
Fourier series.)
Since the Lacey-Thiele breakthrough, there has been a flurry of other papers (in-
cluding some that I was involved in) extending the time-frequency method to many
other types of operators; all of these had the characteristic that these operators were
invariant (or “morally” invariant) under translation, dilation, and some sort of modula-
tion; this includes a number of operators of interest to ergodic theory and to nonlinear
scattering theory. However, in this post I want to instead discuss an operator which
does not lie in this class, namely the trilinear Hilbert transform
$$T(f,g,h)(x) := \mathrm{p.v.} \int_{\mathbb{R}} f(x+t)\, g(x+2t)\, h(x+3t)\, \frac{dt}{t}.$$
Again, since we expect p.v.1/t to behave like δ (t), we expect the trilinear Hilbert
transform to obey a Hölder-type inequality
$$\|T(f,g,h)\|_{L^r(\mathbb{R})} \le C_{p_1,p_2,p_3}\, \|f\|_{L^{p_1}(\mathbb{R})}\, \|g\|_{L^{p_2}(\mathbb{R})}\, \|h\|_{L^{p_3}(\mathbb{R})} \qquad (1.4)$$
for suitable exponents with 1/r = 1/p_1 + 1/p_2 + 1/p_3. However, in addition to the translation, dilation, and linear modulation symmetries enjoyed by the bilinear transform, this estimate now also enjoys a quadratic modulation invariance of the shape
$$T(q_{3\xi} f,\, q_{-3\xi} g,\, q_{\xi} h) = q_{\xi}\, T(f,g,h)$$
for any “quadratic frequency” ξ, where q_ξ(x) := e^{2πiξx²} is the quadratic plane wave of frequency ξ, which leads to a quadratic modulation symmetry for the estimate (1.4). This symmetry is a consequence of the algebraic identity
$$3\xi (x+t)^2 - 3\xi (x+2t)^2 + \xi (x+3t)^2 = \xi x^2,$$
which can in turn be viewed as an assertion that quadratic functions have a vanishing third derivative.
It is because of this symmetry that time-frequency methods based on Fefferman-
Lacey-Thiele style wave packets seem to be ineffective (though the failure is very
slight; one can control entire “forests” of trees of wave packets, but when summing
up all the relevant forests in the problem one unfortunately encounters a logarithmic
divergence; also, it is known that if one ignores the sign of the wave packet coeffi-
cients and only concentrates on the magnitude - which one can get away with for the
bilinear Hilbert transform - then the associated trilinear expression is in fact divergent).
Indeed, wave packets are certainly not invariant under quadratic modulations. One can
then hope to work with the next obvious generalisation of wave packets, namely the
“chirps” - quadratically modulated wave packets - but the combinatorics of organising
these chirps into anything resembling trees or forests seems to be very difficult. Also,
recent work in the additive combinatorial approach to Szemerédi’s theorem[Sz1975]
(as well as in the ergodic theory approaches) suggests that these quadratic modulations
might not be the only obstruction, that other “2-step nilpotent” modulations may also
need to be somehow catered for. Indeed I suspect that some of the modern theory of
Szemerédi’s theorem for progressions of length 4 will have to be invoked in order to
solve the trilinear problem. (Again based on analogy with the literature on Szemerédi’s
theorem, the problem of quartilinear and higher Hilbert transforms is likely to be sig-
nificantly more difficult still, and thus not worth studying at this stage.)
This problem may be too difficult to attack directly, and one might look at some
easier model problems first. One that was already briefly mentioned above was to return
to the bilinear Hilbert transform and try to establish an endpoint result at r = 2/3. At
this point there is again a logarithmic failure of the time-frequency method, and so one
is forced to hunt for a different approach. Another is to look at the bilinear maximal
operator
$$M(f,g)(x) := \sup_{r>0} \frac{1}{2r} \int_{-r}^{r} f(x+t)\, g(x+2t)\, dt,$$
which is a bilinear variant of the Hardy-Littlewood maximal operator, in much the
same way that the bilinear Hilbert transform is a variant of the linear Hilbert trans-
form. It was shown by Lacey[La2000] that this operator obeys most of the bounds
that the bilinear Hilbert transform does, but the argument is rather complicated, com-
bining the time-frequency analysis with some Fourier-analytic maximal inequalities of
Bourgain[Bo1990]. In particular, despite the “positive” (non-oscillatory) nature of the
maximal operator, the only known proof of the boundedness of this operator is oscil-
latory. It is thus natural to seek a “positive” proof that does not require as much use
of oscillatory tools such as the Fourier transform, in particular it is tempting to try
an additive combinatorial approach. Such an approach has had some success with a
slightly easier operator in a similar spirit, in an unpublished paper of Demeter, Thiele,
1.8.1 Notes
This article was originally posted on May 10, 2007 at
terrytao.wordpress.com/2007/05/10
Thanks to Gil Kalai for helpful comments.
1.9 Effective Skolem-Mahler-Lech theorem
sequence. (Hint: this is related to the fact that the sum or product of algebraic integers
is again an algebraic integer.)
The Skolem-Mahler-Lech theorem concerns the set of zeroes Z := {n ∈ N : xn = 0}
of a given integer linear recurrence sequence. In the case of the Fibonacci sequence,
the set of zeroes is pretty boring; it is just {0}. To give a slightly less trivial example,
the linear recurrence sequence
xn = xn−2 ; x0 = 0, x1 = 1
has a zero set which is the even numbers {0, 2, 4, . . .}. Similarly, the linear recurrence
sequence
xn = xn−4 + xn−2 ; x0 = x1 = x3 = 0, x2 = 1
has a zero set {0, 1, 3, 5, . . .}, i.e. the odd numbers together with 0. One can ask
whether more interesting zero sets are possible; for instance, can one design a lin-
ear recurrence system which only vanishes at the square numbers {0, 1, 4, 9, . . .}? The
Skolem-Mahler-Lech theorem says no:
Theorem 1.2 (Skolem-Mahler-Lech theorem). The zero set of a linear recurrence sequence
is eventually periodic, i.e. it agrees with a periodic set for sufficiently large n. In fact,
a slightly stronger statement is true: the zero set is the union of a finite set and a finite
number of residue classes {n ∈ N : n = r mod m}.
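The examples above are easy to reproduce by direct computation; here is a small Python sketch (with ad hoc conventions for encoding the recurrence) whose output exhibits the eventually periodic zero sets:

def zero_set(coeffs, init, N=50):
    """Zeros n < N of the integer recurrence
    x_n = coeffs[0]*x_{n-1} + ... + coeffs[d-1]*x_{n-d}, with x_0..x_{d-1} = init."""
    x = list(init)
    d = len(init)
    for n in range(d, N):
        x.append(sum(c * x[n - 1 - i] for i, c in enumerate(coeffs)))
    return [n for n, v in enumerate(x) if v == 0]

# The two examples from the text: zero sets {0,2,4,...} and {0,1,3,5,...}.
print(zero_set([0, 1], [0, 1]))                # x_n = x_{n-2}
print(zero_set([0, 1, 0, 1], [0, 0, 1, 0]))    # x_n = x_{n-2} + x_{n-4}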
Interestingly, all known proofs of this theorem require that one introduce the p-
adic integers Z p (or a thinly disguised version thereof). Let me quickly sketch a proof
as follows (loosely based on the proof of Hansel[Ha1985]). Firstly it is not hard to
reduce to the case where the final coefficient ad is non-zero. Then, by elementary
linear algebra, one can get a closed form for the linear recurrence sequence as
x_n = ⟨A^n v, w⟩
⟨A^{mn} A^r v, w⟩ = 0; denote the left-hand side of this identity by P(n).
This identity makes sense in the (rational) integers Z, and hence also in the larger ring
of p-adic integers Z p . On the other hand, observe from binomial expansion that P(n)
can be expressed as a formal power series in p with coefficients polynomial in n:
$$P(n) = \sum_{j=0}^{\infty} p^j P_j(n).$$
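To sketch where this expansion comes from (under the assumption, arrangeable whenever p does not divide det A, that m has been chosen so that A^m = I + pB for some integer matrix B; this is possible because A mod p then has finite order in the finite group GL_d(Z/pZ)):
$$P(n) = \langle A^{mn} A^r v, w \rangle = \big\langle (I + pB)^n A^r v, w \big\rangle = \sum_{j \ge 0} p^j \binom{n}{j} \langle B^j A^r v, w \rangle,$$
so that one can take $P_j(n) = \binom{n}{j} \langle B^j A^r v, w\rangle$, which is a polynomial in n of degree at most j.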
Problem 1.3. Given an integer linear recurrence sequence (i.e. given the data d, a_1, . . . , a_d, x_0, . . . , x_{d−1} as integers), is the truth of the statement “x_n ≠ 0 for all n” decidable in finite time?
(Note that I am only asking here for decidability, and not even asking for effective
bounds.) It is faintly outrageous that this problem is still open; it is saying that we do
not know how to decide the halting problem even for “linear” automata!
The basic problem seems to boil down to one of determining whether an “almost
polynomial” P : Z p → Z p (i.e. a uniform limit of polynomials) has an integer zero
or not. It is not too hard to find the p-adic zeroes of P to any specified accuracy
(by using the p-adic version of Newton’s method, i.e. Hensel’s lemma), but it seems
that one needs to know the zeroes to infinite accuracy in order to decide whether they
are integers or not. It may be that some techniques from Diophantine approximation
(e.g. some sort of p-adic analogue of the Thue-Siegel-Roth theorem[Ro1955]) are
relevant. Alternatively, one might want to find a completely different proof of the
Skolem-Mahler-Lech theorem, which does not use p-adics at all.
1.9.1 Notes
This article was originally posted on May 25, 2007 at
terrytao.wordpress.com/2007/05/25
I thank Kousha Etessami and Tom Lenagan for drawing this problem to my atten-
tion. Thanks to Johan Richter and Maurizio for corrections.
Akshay Venkatesh, Jordan Ellenberg, and Felipe Voloch observed that there were
several deeper theorems in arithmetic geometry than the Skolem-Mahler-Lech theorem
which were similarly ineffective, such as Chabauty’s theorem and Faltings’ theorem;
indeed one can view these three theorems as counting rational points on linear tori,
abelian varieties, and higher genus varieties respectively.
Kousha Etessami also pointed out the work of Blondel and Portier[BlPo2002]
showing the NP-hardness of determining whether an integer linear recurrence con-
tained a zero, as well as the survey [HaHaHiKa2005].
1.10 The parity problem in sieve theory
Well, we know that the total number of integers in [N, 2N] is N + O(1). Of this set, we know that N/2 + O(1) of the elements are not coprime to 2 (i.e. they are divisible by 2), and that N/3 + O(1) are not coprime to 3. So we should subtract those two sets from the original set, leaving N/6 + O(1). But the numbers which are divisible by both 2 and 3 (i.e. divisible by 6) have been subtracted twice, so we have to put them back in; this adds in another N/6 + O(1), giving a final count of N/3 + O(1) for the quantity (1.5); this is of course a simple instance of the inclusion-exclusion principle in action.
An alternative way to estimate (1.5) is to use the Chinese remainder theorem to rewrite
(1.5) as
|{n ∈ [N, 2N] : n = 1, 5 mod 6}|
and use our ability to count residue classes modulo 6 to get the same final count of N/3 + O(1) (though the precise bound on the error term will be slightly different). For
very small moduli such as 2 and 3, the Chinese remainder theorem is quite efficient, but
it is somewhat rigid, and for higher moduli (e.g. for moduli much larger than log N) it
turns out that the more flexible inclusion-exclusion principle gives much better results
(after applying some tricks to optimise the efficiency of that principle).
We can of course continue the example of (1.5), counting the numbers in [N, 2N]
which are coprime to 2, 3, 5, 7, etc., which by the sieve of Eratosthenes will eventually
give us a count for the primes in [N, 2N], but let us pause for a moment to look at the
larger picture. We have seen that some sets in [N, 2N] are fairly easy to count accurately
(e.g. residue classes with small modulus), and others are not (e.g. primes, twin primes).
What is the defining characteristic of the former types of sets? One reasonable answer
is that the sets that are easy to count are low-complexity, but this is a rather vaguely
defined term. I would like to propose instead that sets (or more generally, weight
functions - see below) are easy to count (or at least estimate) whenever they are smooth
in a certain sense to be made more precise shortly. This terminology comes from
harmonic analysis rather than from number theory (though number theory does have
the related concept of a smooth number), so I will now digress a little bit to talk about
smoothness, as it seems to me that this concept implicitly underlies the basic strategy
of sieve theory.
Instead of talking about the problem of (approximately) counting a given set in
[N, 2N], let us consider instead the analogous problem of (approximately) computing
the area of a given region E (e.g. a solid ellipse) in the unit square [0, 1]2 . As we
are taught in high school, one way to do this is to subdivide the square into smaller
squares, e.g. squares of length 10^{−n} for some n, and count how many of these small
squares lie completely or partially in the set E, and multiply by the area of each square;
this is of course the prelude to the Riemann integral. It works well as long as the set
E is “smooth” in the sense that most of the small squares are either completely inside
or completely outside the set E, with few borderline cases; this notion of smoothness
can be viewed as a quantitative version of Riemann integrability. Another way of saying
this is that if one wants to determine whether a given point (x, y) lies in E, it is usually
enough just to compute x and y to the first n significant digits in the decimal expansion.
Now we return to counting sets in [N, 2N]. One can also define the notion of a
“smooth set” here by again using the most significant digits of the numbers n in the
interval [N, 2N]; for instance, the set [1.1N, 1.2N] would be quite smooth, as one would
be fairly confident whether n would lie in this set or not after looking at just the top two
or three significant digits. However, with this “Euclidean” or “Archimedean” notion
of smoothness, sets such as the primes or the odd numbers are certainly not smooth.
However, things look a lot better if we change the metric, or (more informally) if we
redefine what “most significant digit” is. For instance, if we view the last digit in the
base 10 expansion of a number n (i.e. the value of n mod 10) as the most significant
one, rather than the first - or more precisely, if we use the 10-adic metric instead of
the Euclidean one, thus embedding the integers into Z10 rather than into R - then the
odd numbers become quite smooth (the most significant digit completely determines
membership in this set). The primes in [N, 2N] are not fully smooth, but they do exhibit
some partial smoothness; indeed, if the most significant digit is 0, 2, 4, 5, 6, or 8, this
fully determines membership in the set, though if the most significant digit is 1, 3, 7,
or 9 then one only has partial information on membership in the set.
Now, the 10-adic metric is not fully satisfactory for characterising the elusive con-
cept of number-theoretic “smoothness”. For instance, the multiples of 3 should be a
smooth set, but this is not the case in the 10-adic metric (one really needs all the dig-
its before one can be sure whether a number is a multiple of 3!). Also, we have the
problem that the set [N/2, N] itself is now no longer smooth. This can be fixed by
working not with just the Euclidean metric or a single n-adic metric, but with the prod-
uct of all the n-adic metrics and the Euclidean metric at once. Actually, thanks to the
Chinese remainder theorem, it is enough to work with the product of the p-adic met-
rics for primes p and the Euclidean metric, thus embedding the integers in the integer
adele ring R × ∏ p Z p . For some strange reason, this adele ring is not explicitly used in
most treatments of sieve theory, despite its obvious relevance (and despite the amply
demonstrated usefulness of this ring in algebraic number theory or in the theory of L-
functions, as exhibited for instance by Tate’s thesis[Ta1950]). At any rate, we are only
using the notion of “smoothness” in a very informal sense, and so we will not need
the full formalism of the adeles here. Suffice to say that a set of integers in [N, 2N] is
“smooth” if membership in that set can be largely determined by its most significant
digits in the Euclidean sense, and also in the p-adic senses for all small p; roughly
speaking, this means that this set is approximately the pullback of some “low com-
plexity” set in the adele ring - a set which can be efficiently fashioned out of a few of
the basic sets which generate the topology and σ-algebra of that ring. (Actually, in many
applications of sieve theory, we only need to deal with moduli q which are square-free,
which means that we can replace the p-adics Z p with the cyclic group Z/pZ, and so it
is now just the residues mod p for small p, together with the Euclidean most significant
digits, which should control what smooth sets are; thus the adele ring has been replaced
by the product R × ∏ p (Z/pZ).)
[A little bit of trivia: the idea of using R × ∏ p (Z/pZ) as a proxy for the integers
seems to go all the way back to Sun Tzu, who introduced the Chinese Remainder The-
orem in order to efficiently count the number of soldiers in an army, by making them
line up in columns of (say) 7, 11, and 13 and count the three remainders, thus deter-
mining this number up to a multiple of 7 × 11 × 13 = 1001; doing a crude calculation
to compute the most significant digits in R of the size of the army would then finish the
job.]
Let us now return back to sieve theory, and the task of counting “rough” sets such
as the primes in [N, 2N]. Since we know how to accurately count “smooth” sets such
as {n ∈ [N, 2N] : n = a mod q} with q small, one can try to describe the rough set of
primes as some sort of combination of smooth sets. The most direct implementation
of this idea is the sieve of Eratosthenes; if one then tries to compute the number of
primes using the inclusion-exclusion principle, one obtains the Legendre sieve (we
implicitly used this idea previously when counting the quantity (1.5)). However, the
number of terms in the inclusion-exclusion formula is very large; if one runs the sieve of Eratosthenes for k steps (i.e. sieving out multiples of the first k primes), there are basically 2^k terms in the inclusion-exclusion formula, leading to an error term which in the worst case could be of size O(2^k). A related issue is that the modulus q in many of the terms in the Legendre sieve become quite large - as large as the product of the first k primes (which turns out to be roughly e^k in size). Since the set one is trying to count is
only of size N, we thus see that the Legendre sieve becomes useless after just log N or
so steps of the Eratosthenes sieve, which is well short of what one needs to accurately
count primes (which requires that one uses N^{1/2}/ log N or so steps). More generally,
“exact” sieves such as the Legendre sieve are useful for any situation involving only
a logarithmically small number of moduli, but are unsuitable for sieving with much
larger numbers of moduli.
One can view the early development of sieve theory as a concerted effort to rectify
the drawbacks of the Legendre sieve. The first main idea here is to not try to compute
the size of the rough set exactly - as this is too “expensive” in terms of the number of
smooth sets required to fully describe the rough set - but instead to just settle for upper
or lower bounds on the size of this set, which use fewer smooth sets. There is thus
a tradeoff between how well the bounds approximate the original set, and how well
one can compute the bounds themselves; by selecting various parameters appropriately
one can optimise this tradeoff and obtain a final bound which is non-trivial but not
completely exact. For instance, in using the Legendre sieve to try to count primes
between N and 2N, one can instead use that sieve to count the much larger set of
numbers between N and 2N which are coprime to the first k primes, thus giving an
upper bound for the primes between N and 2N. It turns out that the optimal value of k
here is roughly log N or so (after this, the error terms in the Legendre sieve get out of
hand), and give an upper bound of O(N/ log log N) for the number of primes between
N and 2N - somewhat far from the truth (which is ∼ N/ log N), but still non-trivial.
In a similar spirit, one can work with various truncated and approximate versions of
the inclusion-exclusion formula which involve fewer terms. For instance, to estimate
the cardinality $|\bigcup_{j=1}^k A_j|$ of the union of k sets, one can replace the inclusion-exclusion formula
$$\Big|\bigcup_{j=1}^k A_j\Big| = \sum_{j=1}^k |A_j| - \sum_{1\le j_1 < j_2 \le k} |A_{j_1} \cap A_{j_2}| + \sum_{1\le j_1 < j_2 < j_3 \le k} |A_{j_1} \cap A_{j_2} \cap A_{j_3}| - \ldots \qquad (1.6)$$
by the obvious upper bound
$$\Big|\bigcup_{j=1}^k A_j\Big| \le \sum_{j=1}^k |A_j|$$
(also known as the union bound), or by the slightly less obvious lower bound
$$\Big|\bigcup_{j=1}^k A_j\Big| \ge \sum_{j=1}^k |A_j| - \sum_{1\le j_1 < j_2 \le k} |A_{j_1} \cap A_{j_2}|.$$
More generally, if one takes the first n terms on the right-hand side of (1.6), this will
be an upper bound for the left-hand side for odd n and a lower bound for even n. These
inequalities, known as the Bonferroni inequalities, are a nice exercise to prove: they
are equivalent to the observation that in the binomial identity
$$0 = (1-1)^m = \binom{m}{0} - \binom{m}{1} + \binom{m}{2} - \binom{m}{3} + \ldots + (-1)^m \binom{m}{m}$$
for any m ≥ 1, the partial sums on the right-hand side alternate in sign between non-
negative and non-positive. If one inserts these inequalities into the Legendre sieve and
optimises the parameter, one can improve the upper bound for the number of primes
in [N, 2N] to O(N log log N/ log N), which is significantly closer to the truth. Unfortu-
nately, this method does not provide any lower bound other than the trivial bound of
0; either the main term is negative, or the error term swamps the main term. A simi-
lar argument was used by Brun[HaRi1974] to show that the number of twin primes in [N, 2N] was O(N(log log N/ log N)²) (again, the truth is conjectured to be ∼ N/ log² N),
48 CHAPTER 1. OPEN PROBLEMS
which implied his famous theorem that the sum of reciprocals of the twin primes is
convergent.
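The alternating upper/lower bound behaviour is also easy to confirm empirically; here is a short Python sketch (random subsets of a small universe, with all parameters arbitrary) that checks every truncation of (1.6) on many samples:

import random
from itertools import combinations

def bonferroni_check(k=6, universe=30, trials=100):
    """Empirically verify that truncating inclusion-exclusion after n terms
    over- or under-counts |A_1 u ... u A_k| according to the parity of n."""
    for _ in range(trials):
        sets = [set(random.sample(range(universe), random.randint(1, universe)))
                for _ in range(k)]
        exact = len(set().union(*sets))
        partial = 0
        for n in range(1, k + 1):
            term = sum(len(set.intersection(*c)) for c in combinations(sets, n))
            partial += (-1) ** (n + 1) * term
            if n % 2 == 1:
                assert partial >= exact   # odd truncations are upper bounds
            else:
                assert partial <= exact   # even truncations are lower bounds
    print("Bonferroni inequalities hold on all samples")

bonferroni_check()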
The full inclusion-exclusion expansion is a sum over 2^k terms, which one can view as binary strings of 0s and 1s of length k. In the Bonferroni inequalities, one only sums over a smaller collection of strings, namely the Hamming ball of strings which only involve n or fewer 1s. There are other collections of strings one can use which
lead to upper or lower bounds; one can imagine revealing such a string one digit at a
time and then deciding whether to keep or toss out this string once some threshold rule
is reached. There are various ways to select these thresholding rules, leading to the
family of combinatorial sieves. One particularly efficient such rule is similar to that
given by the Bonferroni inequalities, but instead of using the number of 1s in a string
to determine membership in the summation, one uses a weighted number of 1s (giving
large primes more weight than small primes, because they tend to increase the modulus
too quickly and thus should be removed from the sum sooner than the small primes).
This leads to the beta sieve, which for instance gives the correct order of magnitude of O(N/ log N) for the number of primes in [N, 2N] or O(N/ log² N) for the number of twin primes in [N, 2N]. This sieve is also powerful enough to give lower bounds, but
only if one stops the sieve somewhat early, thus enlarging the set of primes to a set of
almost primes (numbers which are coprime to all numbers less than a certain threshold,
and thus have a bounded number of prime factors). For instance, this sieve can show
that there are an infinite number of twins n, n + 2, each of which has at most nine prime
factors (the number nine is not optimal, but to get better results requires much more
work).
There seems however to be a limit as to what can be accomplished by purely com-
binatorial sieves. The problem stems from the “binary” viewpoint of such sieves: any
given term in the inclusion-exclusion expansion is either included or excluded from
the sieve upper or lower bound, and there is no middle ground. This leads to the next
main idea in modern sieve theory, which is to work not with the cardinalities of sets in
[N, 2N], but rather with the more flexible notion of sums of weight functions (real-valued functions on [N, 2N]). The starting point is the obvious formula
$$|A| = \sum_{n\in[N,2N]} 1_A(n)$$
for the cardinality of a set A in [N, 2N], where 1A is the indicator function of the set A.
Applying this to smooth sets such as {n ∈ [N, 2N] : n = a mod q}, we obtain
$$\sum_{n\in[N,2N]} 1_{n = a \bmod q}(n) = \frac{N}{q} + O(1);$$
in particular, specialising to the residue class 0 mod d (which is the residue class of importance for counting primes) we have
$$\sum_{n\in[N,2N]} 1_{d|n}(n) = \frac{N}{d} + O(1)$$
for any d. Thus if we can obtain a pointwise upper bound on 1_A by a divisor sum
$$1_A(n) \le \sum_d c_d 1_{d|n}(n) \qquad (1.7)$$
for all n and some real constants c_d (which could be positive or negative), then on summing we obtain the upper bound
$$|A| \le N \sum_d \frac{c_d}{d} + O\Big(\sum_d |c_d|\Big). \qquad (1.8)$$
One can also hope to obtain lower bounds on |A| by a similar procedure (though in
practice, lower bounds for primes have proven to be much more difficult to obtain, due
to the parity problem which we will discuss below). These strategies are suited for
the task of bounding the number of primes in [N, 2N]; if one wants to do something
fancier such as counting twin primes n, n + 2, one has to either involve more residue
classes (e.g. the class −2 mod q will play a role in the twin prime problem) or else
insert additional weights in the summation (e.g. weighting all summations in n by an
additional factor of Λ(n + 2), where Λ is the von Mangoldt function). To simplify the
exposition, though, we shall just stick with the plainer problem of counting primes.
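To make (1.8) concrete in the simplest case, here is a Python sketch of the Legendre sieve choice c_d = µ(d), with d ranging over products of a fixed list of small primes (function name and parameters are illustrative only):

from math import prod
from itertools import combinations

def legendre_upper_bound(N, primes):
    """Evaluate the divisor-sum bound (1.8) with c_d = mu(d) for d a
    product of distinct primes from the given list (a Legendre sieve):
    this estimates the n in [N, 2N] coprime to those primes, an upper
    bound for the primes in [N, 2N] once every listed prime is below N."""
    main, error_terms = 0.0, 0
    for r in range(len(primes) + 1):
        for c in combinations(primes, r):
            d = prod(c)
            main += (-1) ** r * N / d   # c_d = mu(d) = (-1)^r
            error_terms += 1            # each term contributes O(1)
    return main, error_terms

print(legendre_upper_bound(10 ** 6, [2, 3, 5, 7, 11, 13]))

The main term is N ∏_p (1 − 1/p), while the number of terms - and hence the accumulated O(1) errors - grows as 2^k, illustrating why the Legendre sieve becomes useless once k is much larger than log N.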
The above strategies generalise the combinatorial sieve strategy, which is a special
case in which the constants c_d are restricted to be +1, 0, or −1. In practice, the sum $\sum_d \frac{c_d}{d}$ in (1.8) is relatively easy to sum by multiplicative number theory techniques; the coefficients c_d, in applications, usually involve the Möbius function µ(d) (which is unsurprising, since they are encoding some sort of inclusion-exclusion principle), and are often related to the coefficients of a Hasse-Weil zeta function, as they basically count solutions modulo d to some set of algebraic equations. The main task is thus
to ensure that the error term in (1.8) does not swamp the main term. To do this, one
basically needs the weights cd to be concentrated on those d which are relatively small
compared with N, for instance they might be restricted to some range d ≤ R where
the sieve level R = N^θ is some small power of N. Thus for instance, starting with the
identity
$$\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d} = -\sum_{d|n} \mu(d) \log d, \qquad (1.9)$$
P2 almost primes - products of at most two (large) primes (e.g. primes larger than N^ε for some fixed ε > 0). Indeed, if one introduces the second von Mangoldt function
$$\Lambda_2(n) := \sum_{d|n} \mu(d) \log^2 \frac{n}{d} = \Lambda(n)\log n + \sum_{d|n} \Lambda(d)\, \Lambda\Big(\frac{n}{d}\Big) \qquad (1.10)$$
which is mostly supported on P2 almost primes (indeed, Λ2(p) = log² p and Λ2(pq) = 2 log p log q for distinct primes p, q, and Λ2 is mostly zero otherwise), and uses the
elementary asymptotic
$$\sum_{d\le 2N} \frac{\mu(d)}{d} \log^2 \frac{2N}{d} = 2\log N + O(1),$$
one soon arrives at the Selberg symmetry formula
$$\sum_{n\le N} \Lambda_2(n) = 2N \log N + O(N).$$
This formula (together with the weak prime number theorem mentioned earlier) easily implies a “P2 almost prime number theorem”, namely that the number of P2 almost primes less than N is (2 + o(1))N/ log N. [This fact is much easier to prove than the prime
number theorem itself. In terms of zeta functions, the reason why the prime number theorem is difficult is that the simple pole of ζ′(s)/ζ(s) at s = 1 could conceivably be counteracted by other simple poles on the line Re(s) = 1. On the other hand, the P2 almost prime number theorem is much easier because the effect of the double pole of ζ″(s)/ζ(s) at s = 1 cannot be counteracted by the other poles on the line Re(s) = 1, which are at most simple.]
The P2 almost prime number theorem establishes the prime number theorem “up to
a factor of 2”. It is surprisingly difficult to improve upon this factor of 2 by elementary
methods, though once one can replace 2 by 2−ε for some ε > 0 (a fact which is roughly
equivalent to the absence of zeroes of ζ (s) on the line Re(s) = 1), one can iterate the
Selberg symmetry formula (together with the tautological fact that a P2 almost prime
is either a prime or the product of two primes) to get the prime number theorem; this is
essentially the Erdős-Selberg [Er1949, Se1949] elementary proof of that theorem.
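The support properties of Λ₂ described above are easy to verify numerically from the definition (1.10); a small Python sketch:

from math import log

def mu(n):
    # Moebius function by trial factorisation (fine for small n)
    res, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            res = -res
        d += 1
    return -res if n > 1 else res

def Lambda2(n):
    # second von Mangoldt function via (1.10)
    return sum(mu(d) * log(n // d) ** 2 for d in range(1, n + 1) if n % d == 0)

# Lambda2 concentrates on numbers with at most two prime factors:
for n in (7, 9, 15, 30):
    print(n, round(Lambda2(n), 6))

One sees sizeable values at the prime 7 and at the almost primes 9 and 15, but essentially zero at 30 = 2 · 3 · 5, which has three prime factors.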
One can obtain other divisor bounds of the form (1.7) by various tricks, for instance
by modifying the weights in the above formulae (1.9), (1.10). A surprisingly useful
upper bound for the primes between N and 2N is obtained by the simple observation
that
$$1_A(n) \le \Big( \sum_{d<N} \lambda_d 1_{d|n}(n) \Big)^2$$
whenever λd are arbitrary real numbers with λ1 = 1, basically because the square of
any real number is non-negative. This leads to the Selberg sieve, which suffices for
many applications; for instance, it can prove the Brun-Titchmarsh inequality[Ti1930],
which asserts that the number of primes between N and N +M is at most (2+o(1))M/ log M,
which is again off by a factor of 2 from the truth when N and M are reasonably
comparable. (The o(1) error can even be essentially deleted by working harder; see
[MoVa1973].) There are also some useful lower bounds for the indicator function of
the almost primes of divisor sum type, which can be used for instance to derive Chen’s
theorem[Ch1973] that there are infinitely many primes p such that p + 2 is a P2 almost
prime, or the theorem that there are infinitely many P2 almost primes of the form n2 +1.
Assuming the generalised Riemann hypothesis, we have a similar claim for residue classes:
$$\sum_{n \le N} 1_{n = a \bmod q}(n)\, \lambda(n) = O_\varepsilon(N^{1/2+\varepsilon}) \quad \text{for all } \varepsilon > 0.$$
What this basically means is that the Liouville function is essentially orthogonal to all
smooth sets, or all smooth functions. Since sieve theory attempts to estimate everything
in terms of smooth sets and functions, it thus cannot eliminate an inherent ambiguity
coming from the Liouville function. More concretely, let A be a set where λ is constant
(e.g. λ is identically −1, which would be the case if A consisted of primes) and suppose
we attempt to establish a lower bound for the size of a set A in, say, [N, 2N] by setting
up a divisor sum lower bound
$$1_A(n) \ge \sum_d c_d 1_{d|n}(n) \qquad (1.11)$$
where the divisors d are concentrated in d ≤ R for some reasonably small sieve level R. If we sum in n we obtain a lower bound of the form
$$|A| \ge \sum_d c_d \frac{N}{d} + \ldots \qquad (1.12)$$
and we can hope that the main term $\sum_d c_d \frac{N}{d}$ will be strictly positive and the error term is of lesser order, thus giving a non-trivial lower bound on |A|. Unfortunately, if we multiply both sides of (1.11) by the non-negative weight 1 + λ(n), we obtain
$$0 \ge \sum_d c_d 1_{d|n}(n) (1 + \lambda(n))$$
since we are assuming λ to equal −1 on A. If we sum this in n, and use the fact that λ is essentially orthogonal to divisor sums, we obtain
$$0 \ge \sum_d c_d \frac{N}{d} + \ldots$$
which basically means that the bound (1.12) cannot improve upon the trivial bound
|A| ≥ 0. A similar argument using the weight 1 − λ (n) also shows that any upper
bound on |A| obtained via sieve theory has to essentially be at least as large as 2|A|.
Despite this parity problem, there are a few results in which sieve theory, in con-
junction with other methods, can be used to count primes. The first of these is the ele-
mentary proof of the prime number theorem alluded to earlier, using the multiplicative
structure of the primes inside the almost primes. This method unfortunately does not
seem to generalise well to non-multiplicative prime counting problems; for instance,
the product of twin primes is not a twin almost prime, and so these methods do not seem
to have much hope of resolving the twin prime conjecture. Other examples arise if one
starts counting certain special two-parameter families of primes; for instance, Fried-
lander and Iwaniec[FrIw1998] showed that there are infinitely many primes of the form a² + b⁴ by a lengthy argument which started with Vaughan’s identity, which is sort of
like an exact sieve, but with a (non-smooth) error term which has the form of a bilin-
ear sum, which captures correlation with the Liouville function. The main difficulty
is to control this bilinear error term, which after a number of (non-trivial) arithmetic
manipulations (in particular, factorising a2 + b4 over the Gaussian integers) reduces to
understanding some correlations between the Möbius function and the Jacobi symbol,
which is then achieved by a variety of number-theoretic tools. The method was then
modified by Heath-Brown[HB2001] to also show infinitely many primes of the form
a³ + 2b³. Related results for other cubic forms using similar methods have since been
obtained in [HBMo2002], [He2006] (analogous claims for quadratic forms date back
to [Iw1974]). These methods all seem to require that the form be representable as a
norm over some number field, and so they do not seem as yet to yield a general procedure
to resolve the parity problem.
The parity problem can also sometimes be overcome when there is an exceptional Siegel zero, which basically means that there is a quadratic character $\chi(n) = \left(\frac{n}{q}\right)$
which correlates very strongly with the primes. Morally speaking, this means that the
primes can be largely recovered from the P2 almost primes as being those almost primes
which are quadratic non-residues modulo the conductor q of χ, and this additional in-
formation seems (in principle, at least) to overcome the parity problem obstacle (related
to this is the fact that Siegel zeroes, if they exist, disprove the generalised Riemann hy-
pothesis, and so the Liouville function is no longer as uniformly distributed on smooth
sets as Selberg’s analysis assumed). For instance, Heath-Brown[HB1983] showed that
if a Siegel zero existed, then there are infinitely many prime twins. Of course, assuming
GRH then there are no Siegel zeroes, in which case these results would be technically
vacuous; however, they do suggest that to break the parity barrier, we may assume
without loss of generality that there are no Siegel zeroes.
Another known way to partially get around the parity problem is to combine precise
asymptotics on almost primes (or of weight functions concentrated near the almost
primes) with a lower bound on the number of primes, and then use combinatorial tools
to parlay the lower bound on primes into lower bounds on prime patterns. For instance,
suppose one could count the set
A := {n ∈ [N, 2N] : n, n + 2, n + 6 ∈ P2 }
accurately (where P2 is the set of P2 -almost primes), and also obtain sufficiently good
lower bounds on the sets
A1 := {n ∈ A : n prime}
A2 := {n ∈ A : n + 2 prime}
A3 := {n ∈ A : n + 6 prime},
and more precisely that one obtains
$$|A_1| + |A_2| + |A_3| > |A|.$$
(For comparison, the parity problem predicts that one cannot hope to do any better than showing that |A1|, |A2|, |A3| ≥ |A|/2, so the above inequality is not ruled out by the parity problem obstruction.)
Then, just from the pigeonhole principle, one deduces the existence of n ∈ [N, 2N]
such that at least two of n, n + 2, n + 6 are prime, thus yielding a pair of primes whose
gap is at most 6. This naive approach does not quite work directly, but by carefully
optimising the argument (for instance, replacing the condition n, n + 2, n + 6 ∈ P2 with
something more like n(n + 2)(n + 6) ∈ P6 ), Goldston, Yildirim, and Pintz[GoYiPi2008]
were recently able to show unconditionally that prime gaps in [N, 2N] could be as small
as o(log N), and could in fact be as small as 16 infinitely often if one assumes the Elliott-Halberstam conjecture[ElHa1969].
In a somewhat similar spirit, my result with Ben Green[GrTa2008] establishing
that the primes contain arbitrarily long progressions proceeds by first using sieve theory
methods to show that the almost primes (or more precisely, a suitable weight function ν
concentrated near the almost primes) are very pseudorandomly distributed, in the sense
that several self-correlations of ν can be computed and agree closely with what one
would have predicted if the almost primes were distributed randomly (after accounting
for some irregularities caused by small moduli). Because of the parity problem, the
primes themselves are not known to be as pseudorandomly distributed as the almost
primes; however, the prime number theorem does at least tell us that the primes have a
positive relative density in the almost primes. The main task is then to show that any
set of positive relative density in a sufficiently pseudorandom set contains arithmetic
progressions of any specified length; this combinatorial result (a “relative Szemerédi
theorem”) plays roughly the same role that the pigeonhole principle did in the work
of Goldston-Yildirim-Pintz. (On the other hand, the relative Szemerédi theorem works
even for arbitrarily low density, whereas the pigeonhole principle does not; because of
this, our sieve theory analysis is far less delicate than that in Goldston-Yildirim-Pintz.)
It is probably premature, with our current understanding, to try to find a systematic
way to get around the parity problem in general, but it seems likely that we will be
able to find some further ways to get around the parity problem in special cases, and
perhaps once we have assembled enough of these special cases, it will become clearer
what to do in general.
1.10.3 Notes
This article was originally posted on Jun 5, 2007 at
terrytao.wordpress.com/2007/06/05
Emmanuel Kowalski pointed out the need to distinguish between almost primes $n$
which have only $O(1)$ prime factors, and almost primes $n$ which are coprime to all numbers
between $2$ and $n^c$ for some $0 < c < 1$. The latter type of almost prime (which is sparser)
is the one of importance in sieve theory; its density is similar to that of
the primes (i.e. comparable to $1/\log n$), whereas the former type of almost prime has
density $(\log \log n)^{O(1)}/\log n$ instead.
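The distinction is easy to see numerically. The following is a rough experiment (a sketch only; the cutoff $n^{1/3}$ below, i.e. the choice $c = 1/3$, is an arbitrary illustrative choice):

    from sympy import factorint, primerange

    N = 10**4
    # "Weak" almost primes: at most 2 prime factors, counted with multiplicity.
    weak = sum(1 for n in range(2, N) if sum(factorint(n).values()) <= 2)
    # Sieve-theoretic almost primes: no prime factor below n^(1/3).
    rough = sum(1 for n in range(2, N)
                if all(n % p for p in primerange(2, int(n ** (1 / 3)) + 1)))
    primes = sum(1 for n in range(2, N) if sum(factorint(n).values()) == 1)
    print(weak, rough, primes)  # rough is comparable to primes; weak is denser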
Felipe Voloch noted that by using Galois theory techniques one can sometimes
convert upper bounds in prime number estimates to lower bounds, though this method
does not seem to combine well with sieve theory methods.
Emmanuel Kowalski, Jordan Ellenberg, and Keith Conrad had some interesting
discussions on the role (or lack thereof) of adeles in sieve theory, and on how to define
the correct analogue of a “box” to sieve over in other number fields.
Ben Green pointed out the relationship between elementary sieving methods and
the “W -trick” used in our papers on arithmetic progressions of primes.
1.11 Deterministic RIP matrices

Suppose one has a collection $v_1, \ldots, v_n$ of vectors in the complex vector space $\mathbf{C}^m$. If these vectors are orthonormal, then we have Pythagoras' theorem

$$\Big\| \sum_{j=1}^n a_j v_j \Big\|^2 = \sum_{j=1}^n |a_j|^2 \qquad (1.13)$$
valid for all complex numbers $a_1, \ldots, a_n$. In other words, the linear encoding $(a_1, \ldots, a_n) \mapsto \sum_{j=1}^n a_j v_j$ is an isometry. This implies that such an encoding can be inverted in a stable
manner: given the encoded vector $w = \sum_{j=1}^n a_j v_j$ one can uniquely recover the original
coefficients $a_1, \ldots, a_n$, and furthermore small changes in $w$ will not cause large
fluctuations in $a_1, \ldots, a_n$. Indeed, one can reconstruct the coefficients $a_j$ quickly and
explicitly by the formula

$$a_j = \langle w, v_j \rangle. \qquad (1.14)$$
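As a quick sanity check of (1.13) and (1.14), here is a minimal numerical sketch (illustrative only; the random orthonormal family below is just one convenient choice):

    import numpy as np

    rng = np.random.default_rng(0)
    m = n = 8
    # Random orthonormal v_1, ..., v_n in C^m via a QR factorisation.
    V, _ = np.linalg.qr(rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)))
    a = rng.normal(size=n) + 1j * rng.normal(size=n)   # coefficients a_1..a_n
    w = V @ a                                          # encoded vector
    a_rec = V.conj().T @ w                             # a_j = <w, v_j>
    assert np.allclose(a_rec, a)                       # stable, exact inversion
    assert np.isclose(np.linalg.norm(w) ** 2, np.sum(np.abs(a) ** 2))  # (1.13)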
One would like to make $n$ as large as possible, and $m$ as small as possible, so that one
can transform as high-dimensional a vector as possible using only as low-dimensional a
space as possible to store the transformed vector. There is however a basic obstruction
to this, which is that such an orthonormal system can only exist when $n \le m$; for if $n$ is larger
than $m$, then there are too many vectors $v_1, \ldots, v_n$ to remain linearly independent in $\mathbf{C}^m$,
and one must have a non-trivial linear dependence

$$a_1 v_1 + \ldots + a_n v_n = 0$$

for some complex numbers $a_1, \ldots, a_n$, not all zero, which is inconsistent with (1.13). One can however relax the exact isometry condition (1.13) to an approximate one, requiring only that

$$0.9 \sum_{j=1}^n |a_j|^2 \le \Big\| \sum_{j=1}^n a_j v_j \Big\|^2 \le 1.1 \sum_{j=1}^n |a_j|^2 \qquad (1.15)$$

for all complex numbers $a_1, \ldots, a_n$. (The constants 0.9 and 1.1 are not terribly important
for this discussion.) Thus we only require that Pythagoras' theorem hold approximately rather than exactly; this is equivalent to requiring that the transpose of
the matrix with columns $v_1, \ldots, v_n$ forms a frame. (In harmonic analysis, one would say that the vectors $v_1, \ldots, v_n$
are almost orthogonal rather than perfectly orthogonal.) This enlarges the class of matrices that one can consider, but unfortunately does not remove the condition $n \le m$,
since the linear dependence argument which showed that $n > m$ is incompatible with
(1.13) also shows that $n > m$ is incompatible with (1.15).
It turns out, though, that one can pack more than m vectors into Cm if one localises
the almost orthogonality condition (1.15) so that it only holds for sparse sets of co-
efficients a1 , . . . , an . Specifically, we fix a parameter S (less than m), and say that the
matrix (v1 , . . . , vn ) obeys the RIP with sparsity S if one has the almost orthogonality
condition (1.15) for any set of coefficients (a1 , . . . , an ), such that at most S of the a j are
non-zero. [The RIP is also known as the Uniform Uncertainty Principle (UUP) in the
literature, particularly with regard to Fourier-type vectors; see Section 3.2.] In other
words, we only assume that any S of the n vectors v1 , . . . , vn are almost orthogonal at
one time. (It is important here that we require almost orthogonality rather than per-
fect orthogonality, since as soon as a set of vectors are pairwise perfectly orthogonal,
they are of course jointly perfectly orthogonal. In contrast, the constants 0.9 and 1.1
in the RIP condition will deteriorate as S increases, so that local almost orthogonal-
ity does not imply global almost orthogonality.) The RIP property is more powerful
(and hence more useful) when S is large; in particular one would like to approach the
“information-theoretic limit” when S is comparable in magnitude to m.
Roughly speaking, a set of vectors (v1 , . . . , vn ) which obey the RIP are “just as
good” as an orthonormal set of vectors, so long as one doesn’t look at more than S of
these vectors at a time. For instance, one can easily show that the map $(a_1, \ldots, a_n) \mapsto \sum_j a_j v_j$ is still injective as long as one restricts attention to input vectors which are $S/2$-sparse or better (i.e. at most $S/2$ of the coefficients are allowed to be non-zero). This
still leaves the question of how to efficiently recover the sparse coefficients $(a_1, \ldots, a_n)$
from the transformed vector $w = \sum_j a_j v_j$. The algorithm (1.14) is no longer accurate;
however if the coefficients are just a little bit sparser than S/2 (e.g. S/3 will do) then
one can instead use the algorithm of basis pursuit to recover the coefficients (a1 , . . . , an )
perfectly. Namely, it turns out[CaTa2005] that among all the possible representations
$w = \sum_j b_j v_j$ of $w$, the one which minimises the $\ell^1$ norm $\sum_j |b_j|$ will be the one which
matches the $S/3$-sparse representation $\sum_j a_j v_j$ exactly. (This has an interesting geometric interpretation: if we normalise all the vectors $v_j$ to have unit length, then this result
says that the simplest (sparsest) way to get from 0 to w by moving in the directions
v1 , . . . , vn is also the shortest way to get there.) There are also some related results re-
garding coefficients (a1 , . . . , an ) which are merely compressible instead of sparse, but
these are a bit more technical; see my paper with Emmanuel Candes[CaTa2007] for
details.
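Here is a small numerical sketch of this recovery-by-$\ell^1$ phenomenon (a generic linear-programming reformulation, not the algorithm of [CaTa2005] itself): splitting $b = b^+ - b^-$ with $b^+, b^- \ge 0$ turns the $\ell^1$ minimisation into a linear program.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    m, n, s = 60, 200, 5                        # measurements, vectors, sparsity
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)  # random normalised Gaussian v_j
    a = np.zeros(n)
    a[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)  # sparse coeffs
    w = Phi @ a                                 # the transformed vector

    # minimise sum_j |b_j| subject to Phi b = w, via b = bp - bm, bp, bm >= 0.
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([Phi, -Phi]), b_eq=w,
                  bounds=[(0, None)] * (2 * n))
    b = res.x[:n] - res.x[n:]
    print(np.max(np.abs(b - a)))                # ~0: exact recovery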
It turns out that RIP matrices can have many more columns than rows; indeed, as
shown in [Do2006], [CaTa2006], $n$ can be as large as $m \exp(cm/S)$ for some absolute
constant $c > 0$. (Subsequent proofs also appeared in [CaRuTaVe2005], [BaDadVWa2008].)
The construction is in fact very easy; one simply selects the vectors v1 , . . . , vn randomly,
either as random unit vectors or as random normalised Gaussian vectors (so all coef-
ficients of each vi are independent Gaussians with mean zero and variance 1/m). The
point is that in a high-dimensional space such as Cm , any two randomly selected vec-
tors are very likely to be almost orthogonal to each other; for instance, it is an easy
computation that the dot product between two random normalised Gaussian vectors
has a variance of only O(1/m), even though the vectors themselves have a magnitude
very close to 1. Note though that control of these dot products is really only enough
to obtain the RIP for relatively small $S$, e.g. $S = O(\sqrt{m})$. For large $S$, one needs
slightly more advanced tools, such as large deviation bounds on the singular values of
rectangular Gaussian matrices (which are closely related to the Johnson-Lindenstrauss
lemma[JoLi1984]).
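As a quick numerical illustration (a sampling experiment under the Gaussian model described above, not a certification), one can inspect the extreme squared singular values of random $m \times S$ column submatrices, which are exactly the best RIP constants for those column sets:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, S, trials = 2048, 4096, 5, 200
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # normalised Gaussian columns

    lo, hi = 1.0, 1.0
    for _ in range(trials):
        cols = rng.choice(n, size=S, replace=False)   # one random S-column set
        sv = np.linalg.svd(Phi[:, cols], compute_uv=False)
        lo, hi = min(lo, sv[-1] ** 2), max(hi, sv[0] ** 2)

    # With S << m the sampled constants land near the 0.9-1.1 range; but this
    # only samples 200 of the astronomically many S-column sets, so it is
    # evidence for the RIP rather than a proof of it.
    print(lo, hi)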
The results for small sparsity S are relatively easy to duplicate by deterministic
means. In particular, the paper of de Vore[dV2008] mentioned earlier uses a polynomial construction to obtain RIP matrices with $S$ close to $\sqrt{m}$, and $n$ equal to an
arbitrarily large power of $m$, essentially by ensuring that all the column vectors have a
low inner product with each other (of magnitude roughly $1/\sqrt{m}$ or so, matching what
the random construction gives, and almost certainly best possible). But to get to larger
values of S (and in particular, to situations in which S is comparable to m) may require
a more refined calculation (possibly involving higher moments of the Gramian matrix,
as was done in [CaRoTa2006] in the random case). Alternatively, one may rely on
conjecture rather than rigorous results; for instance, it could well be that the matrices
of de Vore satisfy the RIP for far larger sparsities S than are rigorously proven in that
paper.
An alternate approach, and one of interest in its own right, is to work on improving
the time it takes to verify that a given matrix (possibly one of a special form) obeys the
RIP. The brute-force approach of checking the singular values of every set of $S$ column
vectors requires a run time comparable to $\binom{n}{S}$ or worse, which is quite poor. (A variant
approach has recently been proposed by Sharon, Wright, and Ma[ShWrMa2008] but
has similar run time costs.) But perhaps there exist specially structured matrices for
which the RIP is easier to verify, and for which it is still likely that the RIP holds. This
would give a probabilistic algorithm for producing rigorously certified RIP matrices
with a reasonable average-case run time.
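For concreteness, here is what the brute-force certification looks like (a sketch of the naive approach only; certify_rip is a hypothetical helper name). The cost is one singular value decomposition per $S$-element column set, i.e. on the order of $\binom{n}{S}$ decompositions:

    import numpy as np
    from itertools import combinations
    from math import comb

    def certify_rip(Phi, S, lo=0.9, hi=1.1):
        """Naive RIP certification: check every set of S columns."""
        m, n = Phi.shape
        print(f"checking {comb(n, S)} column sets")  # the combinatorial blow-up
        for cols in combinations(range(n), S):
            sv = np.linalg.svd(Phi[:, list(cols)], compute_uv=False)
            if sv[-1] ** 2 < lo or sv[0] ** 2 > hi:
                return False                          # witness of RIP failure
        return True

    # Feasible only for toy sizes: n = 30, S = 3 gives 4060 sets, while
    # n = 1000, S = 20 would already require roughly 10^41 decompositions.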
1.11.1 Notes
This article was originally posted on Jul 2, 2007 at
terrytao.wordpress.com/2007/07/02
Thanks to Ajay Bangla for corrections.
Igor Carron asked what happened if one relaxed the RIP condition so that one had
restricted isometry for most collections of sparse columns rather than all. It may be
easier to construct matrices with this weaker property, though these matrices seem to
be somewhat less useful for applications and for rigorous theoretical results.
1.12 The nonlinear Carleson conjecture

Consider the one-dimensional Schrödinger eigenfunction equation

$$-u_{xx}(k, x) + V(x) u(k, x) = k^2 u(k, x) \qquad (1.16)$$

for a given potential $V$. When the potential $V$ vanishes identically, the solutions at a fixed frequency $k$ are simply the plane wave combinations

$$u(k, x) = \alpha(k) e^{ikx} + \beta(k) e^{-ikx}, \qquad (1.17)$$

where $\alpha(k)$ and $\beta(k)$ are arbitrary complex numbers; physically, these numbers represent the amplitudes of the rightward and leftward propagating components of the
solution respectively.
Now suppose that $V$ is non-zero, but is still compactly supported on an interval
$[-R, +R]$. Then for a fixed frequency $k$, a solution to (1.16) will still behave like (1.17)
in the regions $x > R$ and $x < -R$, where the potential vanishes; however, the amplitudes
on either side of the potential may be different. Thus we would have

$$u(k, x) = \alpha_-(k) e^{ikx} + \beta_-(k) e^{-ikx} \hbox{ for } x < -R; \qquad u(k, x) = \alpha_+(k) e^{ikx} + \beta_+(k) e^{-ikx} \hbox{ for } x > R,$$

and the linear map sending the amplitudes $(\alpha_-(k), \beta_-(k))$ on one side of the potential to the amplitudes $(\alpha_+(k), \beta_+(k))$ on the other is given by a $2 \times 2$ matrix, the scattering matrix $\overparen{V}(k)$.
What can we say about the matrix $\overparen{V}(k)$? By using the Wronskian of two solutions to (1.16) (or by viewing (1.16) as a Hamiltonian flow in phase space) we can
show that $\overparen{V}(k)$ must have determinant 1. Also, by using the observation that the solution space to (1.16) is closed under complex conjugation $u(k, x) \mapsto \overline{u(k, x)}$, one sees
that each coefficient of the matrix $\overparen{V}(k)$ is the complex conjugate of the diagonally
opposite coefficient. Combining the two, we see that $\overparen{V}(k)$ takes values in the Lie
group

$$SU(1,1) := \left\{ \begin{pmatrix} a & b \\ \overline{b} & \overline{a} \end{pmatrix} : a, b \in \mathbf{C}, \ |a|^2 - |b|^2 = 1 \right\}$$

(which, incidentally, is isomorphic to $SL_2(\mathbf{R})$), thus we have

$$\overparen{V}(k) = \begin{pmatrix} a(k) & b(k) \\ \overline{b(k)} & \overline{a(k)} \end{pmatrix}.$$
(One can avoid the additional technicalities caused by the WKB phase correction by
working with the Dirac equation instead of the Schrödinger; this formulation is in
fact cleaner in many respects, but we shall stick with the more traditional Schrödinger
formulation here. More generally, one can consider analogous scattering transforms
for AKNS systems.) One can in fact expand $a(k)$ and $b(k)$ as a formal power series of
multilinear integrals in $V$ (distorted slightly by the WKB phase correction $e^{\frac{i}{k} \int_{-\infty}^{x} V}$). It
is relatively easy to show that this multilinear series is absolutely convergent for every
$k$ when the potential $V$ is absolutely integrable (this is the nonlinear analogue of the
obvious fact that the Fourier integral $\hat{V}(k) = \int_{-\infty}^{\infty} V(x) e^{-2ikx}\, dx$ is absolutely convergent
when $V$ is absolutely integrable; it can also be deduced without recourse to multilinear
series by using Levinson's theorem.) If $V$ is not absolutely integrable, but instead lies in
L p (R) for some p > 1, then the series can diverge for some k; this fact is closely related
to a classic result of Wigner and von Neumann that the Schrödinger operator can contain
embedded pure point spectrum. However, Christ and Kiselev[ChKi2001] showed that
the series is absolutely convergent for almost every k in the case 1 < p < 2 (this is a
non-linear version of the Hausdorff-Young inequality). In fact they proved a stronger
statement, namely that for almost every $k$, the eigenfunctions $x \mapsto u(k, x)$ are bounded
(and converge asymptotically to plane waves $\alpha_\pm(k) e^{ikx} + \beta_\pm(k) e^{-ikx}$ as $x \to \pm\infty$). There
is an analogue of the Born and WKB approximations for these eigenfunctions, which
shows that the Christ-Kiselev result is the nonlinear analogue of a classical result of
Menshov, Paley and Zygmund showing the conditional convergence of the Fourier
integral $\int_{-\infty}^{\infty} V(x) e^{-2ikx}\, dx$ for almost every $k$ when $V \in L^p(\mathbf{R})$ for some $1 < p < 2$.
The analogue of the Menshov-Paley-Zygmund theorem at the endpoint p = 2 is the
celebrated theorem of Carleson[Ca1966] on almost everywhere convergence of Fourier
series of L2 functions. (The claim fails for p > 2, as can be seen by investigating ran-
dom Fourier series, though I don’t recall the reference for this fact.) The nonlinear
version of this would assert that for square-integrable potentials $V$, the eigenfunctions
$x \mapsto u(k, x)$ are bounded for almost every $k$. This is the nonlinear Carleson
conjecture. Unfortunately, it cannot be established by multilinear series, because of
a divergence in the trilinear term of the expansion[MuTaTh2003]; but other methods
may succeed instead. For instance, the weaker statement that the coefficients a(k) and
b(k) (defined by density) are well defined and finite almost everywhere for square-
integrable V (which is a nonlinear analogue of Plancherel’s theorem that the Fourier
transform can be defined by density on L2 (R)) was essentially established by Deift
and Killip [DeKi1999], using a trace formula (a nonlinear analogue to Plancherel’s
formula). Also, the “dyadic” or “function field” model (cf. Section 2.6) of the con-
jecture is known[MuTaTh2003b], by a modification of Carleson’s original argument.
But the general case still seems to require more tools; for instance, we still do not have
a good nonlinear Littlewood-Paley theory (except in the dyadic case), which is pre-
venting time-frequency type arguments from being extended directly to the nonlinear
setting.
1.12.1 Notes
This article was originally posted on Dec 17, 2007 at
terrytao.wordpress.com/2007/12/17
Chapter 2
Expository articles
2.1 Quantum mechanics and Tomb Raider

Quantum mechanics has a number of weird consequences; consider for instance the following three:
• Objects can behave both like particles (with definite position and a continuum of
states) and waves (with indefinite position and (in confined situations) quantised
states);
• The equations that govern quantum mechanics are deterministic, but the standard
interpretation of the solutions (the Copenhagen interpretation) of these equations
is probabilistic; and
• If instead one applies the laws of quantum mechanics literally at the macroscopic
scale (via the relative state interpretation, more popularly known as the many
worlds interpretation), then the universe itself must split into the superposition of
many distinct "worlds".
What I will attempt to do here is to use the familiar concept of a computer game
as a classical conceptual model with which to capture these non-classical phenomena.
The exact choice of game is not terribly important, but let us pick Tomb Raider - a
popular game from about ten years ago, in which the heroine, Lara Croft, explores
various tombs and dungeons, solving puzzles and dodging traps, in order to achieve
some objective. It is quite common for Lara to die in the game, for instance by failing
to evade one of the traps. (I should warn that this analogy will be rather violent on
certain computer-generated characters.)
The thing about such games is that there is an “internal universe”, in which Lara in-
teracts with other game elements, and occasionally is killed by them, and an “external
universe”, where the computer or console running the game, together with the human
who is playing the game, resides. While the game is running, these two universes run
more or less in parallel; but there are certain operations, notably the “save game” and
“restore game” features, which disrupt this relationship. These operations are utterly
mundane to people like us who reside in the external universe, but it is an interest-
ing thought experiment to view them from the perspective of someone like Lara, in
the internal universe. (I will eventually try to connect this with quantum mechanics,
but please be patient for now.) Of course, for this we will need to presume that the
Tomb Raider game is so advanced that Lara has levels of self-awareness and artificial
intelligence which are comparable to our own. In particular, we will imagine that Lara
is independent enough to play the game without direct intervention from the player,
whose role shall be largely confined to that of saving, loading, and observing the game.
Imagine first that Lara is about to navigate a tricky rolling boulder puzzle, when
she hears a distant rumbling sound - the sound of her player saving her game to disk.
From the perspective of the player, we suppose that what happens next is the following:
Lara navigates the boulder puzzle but fails, being killed in the process; then the player
restores the game from the save point and then Lara successfully makes it through the
boulder puzzle.
Now, how does the situation look from Lara’s point of view? At the save point,
Lara’s reality diverges into a superposition of two non-interacting paths, one in which
she dies in the boulder puzzle, and one in which she lives. (Yes, just like that cat.) Her
future becomes indeterministic. If she had consulted with an infinitely prescient oracle
before reaching the save point as to whether she would survive the boulder puzzle, the
only truthful answer this oracle could give is “50% yes, and 50% no”.
This simple example shows that the internal game universe can become indeter-
ministic, even though the external one might be utterly deterministic. However, this
example does not fully capture the weirdness of quantum mechanics, because in each
one of the two alternate states Lara could find herself in (surviving the puzzle or being
killed by it), she does not experience any effects from the other state at all, and could
reasonably assume that she lives in a classical, deterministic universe.
So, let’s make the game a bit more interesting. Let us assume that every time
Lara dies, she leaves behind a corpse in that location for future incarnations of Lara
to encounter. Then Lara will start noticing the following phenomenon (assuming she
survives at all): whenever she navigates any particularly tricky puzzle, she usually en-
counters a number of corpses which look uncannily like herself. This disturbing phe-
nomenon is difficult to explain to Lara using a purely classical deterministic model of
reality; the simplest (and truest) explanation that one can give her is a “many-worlds”
interpretation of reality, and that the various possible states of Lara’s existence have
some partial interaction with each other. Another valid (and largely equivalent) expla-
nation would be that every time Lara passes a save point to navigate some tricky puzzle,
Lara’s “particle-like” existence splits into a “wave-like” superposition of Lara-states,
which then evolves in a complicated way until the puzzle is resolved one way or the
other, at which point Lara’s wave function “collapses” in a non-deterministic fashion
back to a particle-like state (which is either entirely alive or entirely dead).
Now, in the real world, it is only microscopic objects such as electrons which seem
to exhibit this quantum behaviour; macroscopic objects, such as you and I, do not
directly experience the kind of phenomena that Lara does, and we cannot interview
individual electrons to find out their stories either. Nevertheless, by studying the sta-
tistical behaviour of large numbers of microscopic objects we can indirectly infer their
quantum nature via experiment and theoretical reasoning. Let us again use the Tomb
Raider analogy to illustrate this. Suppose now that Tomb Raider does not only have
Lara as the main heroine, but in fact has a large number of playable characters, who explore a large number of deadly tombs, often with fatal effect (and thus leading to multiple
game restores). Let us suppose that inside this game universe there is also a scientist
(let’s call her Jacqueline) who studies the behaviour of these adventurers going through
the tombs. However, Jacqueline does not experience the tombs directly, nor does she
actually communicate with any of these adventurers. Each tomb is explored by only
one adventurer; regardless of whether she lives or dies, the tomb is considered “used
up”.
Jacqueline observes several types of trapped tombs in her world, and gathers data
as to how likely an adventurer is to survive any given type of tomb. She learns that
each type of tomb has a fixed survival rate - e.g. she may observe that a tomb of type
A has a 20% survival rate, whilst a tomb of type B has a 50% survival rate - but that
it seems impossible to predict with any certainty whether any given adventurer will
survive any given type of tomb. So far, this is something which could be explained
classically; each tomb may have a certain number of lethal traps in them, and whether
an adventurer survives these traps or not may entirely be due to random chance or other
“hidden variables”.
But then Jacqueline encounters a mysterious quantisation phenomenon: the survival rates for the various tombs are always one of the numbers 100%, 50%, 33.3...%, 25%, 20%, ...;
in other words, the "frequency" of success for a tomb is always of the form 1/n for
some integer n. This phenomenon would be difficult to explain in a classical universe,
since the effects of random chance should be able to produce a continuum of survival
probabilities.
Here's what is going on. In order for Lara (or any other adventurer) to survive a
tomb of a given type, she needs to stack a certain number of corpses together
to reach a certain switch; if she cannot attain that level of "constructive interference"
to reach that switch, she dies. The type of tomb determines exactly how many corpses
are needed; for instance, a tomb of type A might require four corpses to be stacked
together. Then the player who is playing Lara will have to let her die four times before
she can successfully get through the tomb; and so from her perspective, Lara’s chances
of survival are only 20%. In each possible state of the game universe, there is only one
Lara which goes into the tomb, who either lives or dies; but her survival rate here is
what it is because of her interaction with other states of Lara (which Jacqueline cannot
see directly, as she does not actually enter the tomb).
In our own reality, a familiar example of this type of quantum effect is the fact that
each atom (e.g. sodium or neon) can only emit certain wavelengths of light (which
end up being quantised somewhat analogously to the survival probabilities above); for
instance, sodium emits mostly yellow light, neon mostly reddish-orange, and so forth. The electrons
in such atoms, in order to emit such light, are in some sense clambering over skeletons
of themselves to do so; the more commonly given explanation is that the electron is be-
having like a wave within the confines of an atom, and thus can only oscillate at certain
frequencies (similarly to how a plucked string of a musical instrument can only exhibit
a certain set of wavelengths, which coincidentally are also proportional to 1/n for in-
teger n). Mathematically, this “quantisation” of frequency can be computed using the
bound states of a Schrödinger operator with potential. [I will not attempt to stretch the
Tomb Raider analogy so far as to try to model the Schrödinger equation! In particular,
the complex phase of the wave function - which is a fundamental feature of quantum
mechanics - is not easy at all to motivate in a classical setting.]
Now let’s use the Tomb Raider analogy to explain why microscopic objects (such
as electrons) experience quantum effects, but macroscopic ones (or even mesoscopic
ones, such as large molecules) seemingly do not. Let's assume that Tomb Raider is now
a two-player co-operative game, with two players playing two characters (let’s call
them Lara and Indiana) as they simultaneously explore different parts of their world.
The players can choose to save the entire game, and then restore back to that point; this
resets both Lara and Indiana back to the state they were in at that save point.
Now, this game still has the strange feature of corpses of Lara and Indiana from
previous games appearing in later ones. However, we assume that Lara and Indiana
are entangled in the following way: if Lara is in tomb A and Indiana is in tomb B,
then Lara and Indiana can each encounter corpses of their respective former selves, but
only if both Lara and Indiana died in tombs A and B respectively in a single previous
game. If in a previous game, Lara died in tomb A and Indiana died in tomb C, then this
time round, Lara will not see any corpse (and of course, neither will Indiana). (This
entanglement can be described a bit better by using tensor products; rather than saying
that Lara died in A and Indiana died in B, one should instead think of Lara ⊗ Indiana
dying in |A⟩ ⊗ |B⟩, which is a state which is orthogonal to |A⟩ ⊗ |C⟩.) With this type of
entanglement, one can see that there is going to be significantly less “quantum weird-
ness” going on; Lara and Indiana, adventuring separately but simultaneously, are going
to encounter far fewer corpses of themselves than Lara adventuring alone would. And
if there were many many adventurers entangled together exploring simultaneously, the
quantum effects drop to virtually nothing, and things now look classical unless the ad-
venturers are somehow organised to “resonate” in a special way (much as Bose-Einstein
condensates operate in our own world).
The Tomb Raider analogy is admittedly not a perfect model for quantum mechan-
ics. In the latter, the various possible basis states of a system interfere with each other
via linear superposition of their complex phases, whereas in the former, the basis states
interfere in an ordered nonlinear fashion, with the states associated to earlier games
influencing the states of later games, but not vice versa. Another very important fea-
ture of quantum mechanics - namely, the ability to change the set of basis states used
to decompose the full state of the system - does not have a counterpart in the Tomb
Raider model. Nevertheless, this model is still sufficiently non-classical (when viewed
from the internal universe) to construct some partial analogues of well-known quantum
phenomena. We illustrate this with two more examples.

In the first example, Lara encounters a trapped chamber with two one-way doors, side A and side B; the trap can only be defused by the weight of two bodies, one balanced against each door, so that Lara, entering alone, will die on her first attempt. Suppose she first enters on side A and dies there. Then, once the
game is restored, she can go in on side B and balance herself against the corpse from
the previous game to defuse the trap. So she in fact has up to a 50% chance of survival
here. (Actually, if she chooses a door randomly each time, and the player restores the
game until she makes it through, the net chance of survival is only 2 ln 2 − 1 = 38.6 . . . %
- why?) On the other hand, if either of the doors is locked in advance, then her survival
rate drops to 0%.
This does not have an easy classical explanation within the game universe, even
with hidden variables, at least if you make the locality assumption that Lara can only
go through one of the two one-way doors, and if you assume that the locks have no
effect other than to stop Lara from choosing one of the doors.
The second example is a variant of the celebrated Bell inequality arguments. We first recall a trivial inequality from classical probability: for any two events A and B, we have

P(A ∧ B) ≥ P(A) + P(B) − P(A ∨ B),

where A ∨ B is the event that at least one of A and B occur, and A ∧ B is the event that A
and B both occur. Since P(A ∨ B) clearly cannot exceed 1, we conclude that

P(A ∧ B) ≥ P(A) + P(B) − 1. (2.1)

Note that this inequality holds regardless of whether A and B are independent or not.
Iterating (2.1), we conclude that for any three events A, B, C, we have

P(A ∧ B ∧ C) ≥ P(A) + P(B) + P(C) − 2. (2.3)

To see how this inequality can fail in a non-classical universe, consider the following layout of the game world:
Start
↓
Gate L ← Tomb A → Gate I
Gate L and Gate I both have two up-down switches which either character can ma-
nipulate into any of the four positions before trying to open the gate: up-up, up-down,
down-up, or down-down. However, the gates are trapped: only two of the positions
allow the gate to be opened safely; the other two positions will ensure that the gate
electrocutes whoever is trying to open it. Lara and Indiana know that the gates are
anti-symmetric: if one flips both switches then that toggles whether the gate is safe or
not (e.g. if down-up is safe, then up-down electrocutes). But they do not know exactly
which combinations are safe.
Lara and Indiana (starting in the position “Start”) desperately need to open both
gates before a certain time limit, but do not know which of the combinations are safe.
They have just enough time for Lara to go to Gate L through Tomb A, and for Indiana
to go to Gate I through Tomb A, but there is not enough time for Lara to communicate
to Indiana what she sees at Gate L, or conversely.
They believe (inaccurately, as it turns out) that inside Tomb A, there is inscribed
a combination (of one of the four positions) which will safely open both gates. Their
plan is to jointly go to Tomb A, find the combination, write that combination down on
two pieces of paper (one for Lara, one for Indiana), and then Lara and Indiana will
travel separately to Gate L and Gate I to try that combination to unlock both gates. At
this point, the player saves the game and play continues repeatedly from this restore
point. We re-emphasise that the player actually has no control over Lara and Indiana's
actions; they are independent AIs, following the plan described above.
Unfortunately for Lara and Indiana, the combination in Tomb A is simply a random
combination - up-up, up-down, down-up, and down-down are each 25% likely to be
found in Tomb A. In truth, the combinations to Gate L and Gate I have been set by
Jacqueline. Jacqueline has set Gate L to one of the following two settings:
• Setting L1 : Gate L will open safely if the switches are up-up or up-down, but
electrocutes if the switches are down-up or down-down
• Setting L2 : Gate L will open safely if the switches are up-up or down-up, but
electrocutes if the switches are up-down or down-down.
Similarly, Jacqueline has set Gate I to one of the following two settings:
• Setting I1 : Gate I will open safely if the switches are up-up or up-down, but
electrocutes if the switches are down-up or down-down.
• Setting I2 : Gate I will open safely if the switches are up-down or down-down,
but electrocutes if the switches are down-up or up-up.
Note that these settings obey the anti-symmetry property mentioned earlier.
Jacqueline sets Gate L to setting La for some a = 1, 2, and Gate I to setting Ib for
some b = 1, 2, and measures the probability pab of the event that Lara and Indiana both
survive, or both die, thus computing four numbers p11 , p12 , p21 , p22 . (To do this, one
would have to assume that the experiment can be repeated a large number of times, for
instance by assuming that a large number of copies of these tombs and gates exist across
the game universe, with a different pair of adventurers exploring each such copy.)
Jacqueline does not know the contents (or “hidden variables”) of Tomb A, and does
not know what Lara and Indiana’s strategy is to open the gates (in particular, the strat-
egy could be randomly chosen rather than deterministic). However, if she assumes that
communication between Lara and Indiana is local (thus Lara cannot transmit informa-
tion about Gate L to Indiana at Gate I, or vice versa), and that the universe is classical
(in particular, that no multiple copies of the universe exist), then she can deduce a cer-
tain theoretical inequality connecting the four numbers p11, p12, p21, p22. Indeed, she
can write p_{ab} = P(l_a = i_b), where l_a is the random variable that equals 1 when Lara sets
the switches of Gate L to a position which is safe for setting L_a and 0 otherwise, and similarly
i_b is the random variable that equals 1 when Indiana sets the switches of Gate I to a
position which is safe for setting I_b and 0 otherwise. Applying (2.3) (together with the observation that if l_1 = i_1, l_1 = i_2, and l_2 = i_1 all hold, then l_2 = i_2 also holds), we conclude that

p22 ≥ p11 + p12 + p21 − 2, (2.4)

regardless of what goes on in Tomb A, and regardless of what strategy Indiana and
Lara execute.
We now show that in the actual Tomb Raider universe, the inequality (2.4) is vio-
lated - which proves to Jacqueline that her universe must either be non-local (with in-
stantaneous information transmission) or non-classical (with the true state of the game
universe being described as a superposition of more than one classical state).
First suppose that Gate L and Gate I are both set to setting 1, thus they open on
up-* settings (i.e. up-up or up-down) and electrocute on down-*. If Lara and Indiana
find an up-* pattern in Tomb A then they both survive. In some cases they may both
be electrocuted, but only if they both hold down-* codes. If Lara and Indiana later
encounter corpses of themselves clutching a down-* code, they are intelligent enough
to apply the opposite of that code (overriding whatever false clue they got from Tomb
A) and pass through safely. As the situation is totally symmetric, we see in this case
that p11 = 1.
Now suppose that Gate L and Gate I are both set to setting 2, thus Gate L is only safe
for *-up and gate I is only safe for *-down. Then what happens every time the game is
played is that exactly one of Lara or Indiana dies. Note that due to the entangled nature
of the corpse mechanic, this means that Lara and Indiana never see any useful corpses
which could save their lives. So in this case p22 = 0.
Now suppose that Gate L is in setting 1 and Gate I is in setting 2, or vice versa. Then
what happens, if Indiana and Lara see no corpses, is that they have an independent 50%
chance of survival, and thus a 50% chance of meeting the same fate. On the other hand,
if Indiana and Lara see corpses (and the way the mechanic works, if one of them sees
a corpse, the other does also), then they will use the more intelligent negation strategy
to open both gates. Thus p12 and p21 are both strictly greater than 1/2.
Putting all these estimates together, we violate the inequality (2.4).
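One can sanity-check this numerically against the inequality (2.4) as reconstructed above (the value 0.51 below is an arbitrary stand-in for "strictly greater than 1/2"):

    # Sanity check: the survival statistics deduced above violate (2.4),
    # i.e. p22 >= p11 + p12 + p21 - 2 fails for these values.
    p11, p22 = 1.0, 0.0        # both gates on setting 1 / both on setting 2
    p12 = p21 = 0.51           # mixed settings: anything strictly above 1/2
    assert p22 < p11 + p12 + p21 - 2   # 0 < 0.02: the classical bound fails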
2.1.3 Notes
This article was originally posted on Feb 26, 2007 at
terrytao.wordpress.com/2007/02/26
It was derived from an interesting conversation I had several years ago with my
friend Jason Newquist, on trying to find some intuitive analogies for the non-classical
nature of quantum mechanics.
2.2 Compressed sensing and single-pixel cameras

Why are images compressible? Suppose for instance that an image contains a large featureless region - e.g. a 100 × 100 square of
pixels, which are all exactly the same colour - e.g. all white. Without compression,
this square would take 10,000 bytes to store (using 8-bit grayscale); however, instead,
one can simply record the dimensions and location of the square, and note a single
colour with which to paint the entire square; this will require only four or five bytes in
all to record, leading to a massive space saving. Now in practice, we don’t get such
an impressive gain in compression, because even apparently featureless regions have
some small colour variation between them. So, given a featureless square, what one
can do is record the average colour of that square, and then subtract that average off
from the image, leaving a small residual error. One can then locate more squares where
the average colour is significant, and subtract those off as well. If one does this a couple
times, eventually the only stuff left will be very small in magnitude (intensity), and not
noticeable to the human eye. So we can throw away the rest of the image and record
only the size, location, and intensity of the “significant” squares of the image. We can
then reverse this process later and reconstruct a slightly lower-quality replica of the
original image, which uses much less space.
Now, the above algorithm is not all that effective in practice, as it does not cope well
with sharp transitions from one colour to another. It turns out to be better to work not
with average colours in squares, but rather with average colour imbalances in squares -
the extent to which the intensity on (say) the right half of the square is higher on average
than the intensity on the left. One can formalise this by using the (two-dimensional)
Haar wavelet system. It then turns out that one can work with “smoother” wavelet
systems which are less susceptible to artefacts, but this is a technicality which we will
not discuss here. But all of these systems lead to similar schemes: one represents the
original image as a linear superposition of various “wavelets” (the analogues of the
coloured squares in the preceding paragraph), stores all the significant (large magni-
tude) wavelet coefficients, and throws away (or “thresholds”) all the rest. This type of
“hard wavelet coefficient thresholding” compression algorithm is not nearly as sophis-
ticated as the ones actually used in practice (for instance in the JPEG 2000 standard)
but it is somewhat illustrative of the general principles in compression.
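As a concrete (and deliberately crude) illustration of hard wavelet coefficient thresholding, here is a minimal sketch of one level of the 2D Haar transform followed by thresholding; the helper names are hypothetical and the normalisation is one arbitrary choice among several:

    import numpy as np

    def haar2d_level(img):
        """One level of the 2D Haar transform: block averages plus the
        three colour-imbalance coefficients of each 2x2 block."""
        p, q = img[0::2, 0::2], img[0::2, 1::2]
        r, s = img[1::2, 0::2], img[1::2, 1::2]
        a = (p + q + r + s) / 4      # average colour of each 2x2 block
        h = (p - q + r - s) / 4      # left-right imbalance
        v = (p + q - r - s) / 4      # top-bottom imbalance
        d = (p - q - r + s) / 4      # diagonal imbalance
        return a, h, v, d

    def haar2d_level_inverse(a, h, v, d):
        out = np.empty((2 * a.shape[0], 2 * a.shape[1]))
        out[0::2, 0::2] = a + h + v + d
        out[0::2, 1::2] = a - h + v - d
        out[1::2, 0::2] = a + h - v - d
        out[1::2, 1::2] = a - h - v + d
        return out

    rng = np.random.default_rng(0)
    img = rng.random((256, 256))                       # stand-in for an image
    a, h, v, d = haar2d_level(img)
    t = 0.05                                           # hard threshold
    h, v, d = (np.where(abs(c) > t, c, 0.0) for c in (h, v, d))
    approx = haar2d_level_inverse(a, h, v, d)          # lower-quality replica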
To summarise (and to oversimplify somewhat), the original 1024 × 2048 image may
have two million degrees of freedom, and in particular if one wants to express this image in terms of wavelets then one would thus need two million different wavelets
in order to reconstruct all images perfectly. However, the typical interesting image is
very sparse or compressible in the wavelet basis: perhaps only a hundred thousand of
the wavelets already capture all the notable features of the image, with the remaining
1.9 million wavelets only contributing a very small amount of “random noise” which
is largely invisible to most observers. (This is not always the case: heavily textured
images - e.g. images containing hair, fur, etc. - are not particularly compressible in
the wavelet basis, and pose a challenge for image compression algorithms. But that is
another story.)
Now, if we (or the camera) knew in advance which hundred thousand of the 2
million wavelet coefficients are going to be the important ones, then the camera could
just measure those coefficients and not even bother trying to measure the rest. (It is
possible to measure a single coefficient by applying a suitable “filter” or “mask” to
the image, and making a single intensity measurement to what comes out.) However,
the camera does not know which of the coefficients are going to be the key ones, so it
must instead measure all 2 million pixels, convert the image to a wavelet basis, locate
the hundred thousand dominant wavelet coefficients to keep, and throw away the rest.
(This is of course only a caricature of how the image compression algorithm really
works, but we will use it for sake of discussion.)
Now, of course, modern digital cameras work pretty well, and why should we try
to improve on something which isn’t obviously broken? Indeed, the above algorithm,
in which one collects an enormous amount of data but only saves a fraction of it, works
just fine for consumer photography. Furthermore, with data storage becoming quite
cheap, it is now often feasible to use modern cameras to take many images with no
compression whatsoever. Also, the computing power required to perform the com-
pression is manageable, even if it does contribute to the notoriously battery-draining
energy consumption level of these cameras. However, there are non-consumer imaging
applications in which this type of data collection paradigm is infeasible, most notably
in sensor networks. If one wants to collect data using thousands of sensors, which each
need to stay in situ for long periods of time such as months, then it becomes necessary
to make the sensors as cheap and as low-power as possible - which in particular rules
out the use of devices which require heavy computer processing power at the sensor
end (although - and this is important - we are still allowed the luxury of all the com-
puter power that modern technology affords us at the receiver end, where all the data
is collected and processed). For these types of applications, one needs a data collection
paradigm which is as “dumb” as possible (and which is also robust with respect to,
say, the loss of 10% of the sensors, or with respect to various types of noise or data
corruption).
This is where compressed sensing comes in. The guiding philosophy is this: if
one only needs 100,000 components to recover most of the image, why not just take
100,000 measurements instead of 2 million? (In practice, we would allow a safety
margin, e.g. taking 300,000 measurements, to allow for all sorts of issues, ranging from
noise to aliasing to breakdown of the recovery algorithm.) In principle, this could lead
to a power consumption saving of up to an order of magnitude, which may not mean
much for consumer photography but can be of real importance in sensor networks.
But, as I said before, the camera does not know in advance which hundred thousand
of the two million wavelet coefficients are the important ones that one needs to save.
What if the camera selects a completely different set of 100,000 (or 300,000) wavelets,
and thus loses all the interesting information in the image?
The solution to this problem is both simple and unintuitive. It is to make 300,000
measurements which are totally unrelated to the wavelet basis - despite all that I have
said above regarding how this is the best basis in which to view and compress images.
In fact, the best types of measurements to make are (pseudo-)random measurements
- generating, say, 300,000 random "mask" images and measuring the extent to which
the actual image resembles each of the masks. Now, these measurements (or "correlations") between the image and the masks are likely to be all very small, and very
random. But - and this is the key point - each one of the 2 million possible wavelets which
comprise the image will generate their own distinctive “signature” inside these random
measurements, as they will correlate positively against some of the masks, negatively
against others, and be uncorrelated with yet more masks. But (with overwhelming
probability) each of the 2 million signatures will be distinct; furthermore, it turns out
that one can use this fact to disentangle the contributions of the individual wavelets and recover the significant coefficients, for instance by l1 minimisation (basis pursuit) or by greedy "matching pursuit" algorithms.
Note that these image recovery algorithms do require a non-trivial (though not ridicu-
lous) amount of computer processing power, but this is not a problem for applications
such as sensor networks since this recovery is done on the receiver end (which has
access to powerful computers) rather than the sensor end (which does not).
There are now rigorous results [CaRoTa2006, GiTr2008, CaTa2006, Do2006, RuVe2006]
which show that these approaches can reconstruct the original signals perfectly or
almost-perfectly, given a sufficient number of random measurements. Among the potential applications of this measurement paradigm:
• Linear coding. Compressed sensing also gives a simple way for multiple trans-
mitters to combine their output in an error-correcting way, so that even if a sig-
nificant fraction of the output is lost or corrupted, the original transmission can
still be recovered. For instance, one can transmit 1000 bits of information by
encoding them using a random linear code into a stream of 3000 bits; and then
it will turn out that even if, say, 300 of the bits (chosen adversarially) are then
corrupted, the original message can be reconstructed perfectly with essentially
no chance of error. The relationship with compressed sensing arises by viewing
the corruption itself as the sparse signal (it is only concentrated on 300 of the
3000 bits).
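One way to see this connection concretely is through a real-valued toy model (a sketch only; the matrices $G$ and $H$ below are assumptions of this sketch, not part of any specific coding standard). The receiver sees
\[
y = Gx + e,
\]
where $G$ is a (random) $3000 \times 1000$ coding matrix, $x$ holds the 1000 transmitted values, and the corruption $e$ is supported on at most 300 coordinates. Choosing a parity-check matrix $H$ with $HG = 0$, the receiver computes the syndrome
\[
Hy = HGx + He = He,
\]
which depends only on the sparse vector $e$; recovering $e$ from $He$ is precisely a compressed sensing problem, and once $e$ is in hand, $x$ can be read off from $y - e = Gx$.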
Many of these applications are still only theoretical, but nevertheless the potential of
these algorithms to impact so many types of measurement and signal processing is
rather exciting. From a personal viewpoint, it is particularly satisfying to see work
arising from pure mathematics (e.g. estimates on the determinant or singular values of
Fourier minors) end up having potential application to the real world.
2.2.2 Notes
This article was originally posted on April 13, 2007 at
terrytao.wordpress.com/2007/04/13
For some explicit examples of how compressed sensing works on test images, see
www.acm.caltech.edu/l1magic/examples.html
2.3 Finite convergence principle

1 One can distinguish several flavours of hard analysis, according to how much quantitative precision is
used. There is "exact hard analysis" where one really uses ≤; "quasi-exact hard analysis" in which one is
willing to lose absolute constants (and so one sees notation such as O(·), ≲, or ∼); "logarithmically coarse
hard analysis" in which one is willing to lose quantities such as log^{O(1)} N which are "logarithmic" in some
key parameter N; and "polynomially coarse hard analysis" in which one is willing to lose quantities such
as N^{O(1)} which are polynomial in key parameters. Finally, there is coarse analysis in which one is willing
to lose arbitrary functions of key parameters. The relationships between these flavours of hard analysis are
interesting, but will have to wait to be discussed elsewhere.
2 One can use these axioms to make finer distinctions, for instance “strongly finitary” analysis, in which
one is not even willing to use real numbers, but instead only works with finite complexity numbers (e.g.
rationals), and “strongly infinitary” analysis, in which one freely uses the axiom of choice (or related concepts
such as ultrafilters, see Section 2.5). There are also hybrids between finitary and infinitary analysis, such as
“pre-infinitary” analysis, in which one takes sequences of increasingly large or complex objects, and uses
phrases such as “passing to a subsequence if necessary” frequently, but does not actually “jump to the limit”;
we also have “pseudo-finitary” analysis, of which non-standard analysis is the most prominent example, in
which infinitary methods are re-expressed using infinitesimals or other pseudo-finitary objects. See Section
2.5 for further discussion.
3 Partial differential equations (PDE) is an interesting intermediate case in which both types of analysis
are popular and useful, though many practitioners of PDE still prefer to primarily use just one of the two
types. Another interesting transition occurs on the interface between point-set topology, which largely uses
soft analysis, and metric geometry, which largely uses hard analysis. Also, the ineffective bounds which crop
up from time to time in analytic number theory are a sort of hybrid of hard and soft analysis. Finally, there
are examples of evolution of a field from soft analysis to hard (e.g. Banach space geometry) or vice versa
(e.g. recent developments in extremal combinatorics, particularly in relation to the regularity lemma).
It is fairly well known that the results obtained by hard and soft analysis respectively
can be connected to each other by various “correspondence principles” or “compact-
ness principles”. It is however my belief that the relationship between the two types of
analysis is in fact much closer4 than just this; in many cases, qualitative analysis can
be viewed as a convenient abstraction of quantitative analysis, in which the precise de-
pendencies between various finite quantities has been efficiently concealed from view
by use of infinitary notation. Conversely, quantitative analysis can often be viewed as
a more precise and detailed refinement of qualitative analysis. Furthermore, a method
from hard analysis often has some analogue in soft analysis and vice versa, though the
language and notation of the analogue may look completely different from that of the
original. I therefore feel that it is often profitable for a practitioner of one type of anal-
ysis to learn about the other, as they both offer their own strengths, weaknesses, and
intuition, and knowledge of one gives more insight5 into the workings of the other. I
wish to illustrate this point here using a simple but not terribly well known result, which
I shall call the “finite convergence principle”6 . It is the finitary analogue of an utterly
trivial infinitary result - namely, that every bounded monotone sequence converges -
but sometimes, a careful analysis of a trivial result can be surprisingly revealing, as I
hope to demonstrate here.
Before I discuss this principle, let me first present an informal, incomplete, and
inaccurate “dictionary” between soft and hard analysis, to try to give a rough idea of
the (partial) correspondences between the two:

[Table: a dictionary matching soft analysis concepts with their hard analysis counterparts.]
4 There are rigorous results from proof theory, such as Herbrand’s theorem[He1930], which can allow
one to automatically convert certain types of qualitative arguments into quantitative ones. There has recently
been some activity in applying the ideas from this and other proof mining results to various basic theorems
in analysis; see [Ko2008].
5 For instance, in my result with Ben Green[GrTa2008] establishing arbitrarily long arithmetic progres-
sions of primes, the argument was (necessarily) finitary in nature, but it was absolutely essential for us to be
aware of the infinitary arguments and intuition that had been developed in ergodic theory, as we had to adapt
such arguments to the finitary setting in order to conclude our proof, and it would have been far less evident how
to discover such arguments if we were always restricted to looking at finitary settings. In general, it seems
that infinitary methods are good for “long-range” mathematics, as by ignoring all quantitative issues one
can move more rapidly to uncover qualitatively new kinds of results, whereas finitary methods are good for
“short-range” mathematics, in which existing ”soft” results are refined and understood much better via the
process of making them increasingly sharp, precise, and quantitative. I feel therefore that these two methods
are complementary, and are both important to deepening our understanding of mathematics as a whole.
6 Thanks to Ben Green for suggesting this name; Jennifer Chayes has also suggested the “metastability
principle”.
• Soft analysis statements can often be stated both succinctly and rigorously, by
using precisely defined and useful concepts (e.g. compactness, measurability,
etc.). In hard analysis, one usually has to sacrifice one or the other: either one is
rigorous but verbose (using lots of parameters such as ε, N, etc.), or succinct but
“fuzzy” (using intuitive but vaguely defined concepts such as “size”, “complex-
ity”, “nearby”, etc.).
• A single concept in soft analysis can have multiple hard analysis counterparts. In
particular, a “naive” translation of a statement in soft analysis into hard analysis
may be incorrect. (In particular, one should not use the above table blindly to
convert from one to the other.)
To begin with a simple example, consider the infinite convergence principle: every bounded monotone sequence 0 ≤ x_1 ≤ x_2 ≤ x_3 ≤ … ≤ 1 of real numbers is convergent. There are quite a lot of quantifiers hidden in this statement. One can cut down the complexity a little
bit by replacing the notion of a convergent sequence with that of a Cauchy sequence.
This lets us eliminate the need for a limit x, which does not have an obvious finitary
counterpart. This leaves us with the following formulation: for every ε > 0 there exists an N such that |x_n − x_m| ≤ ε for all n, m ≥ N.
Note now that one does not need the real number system to make this principle
both meaningful and non-trivial; the principle already works quite well when restricted
to the rationals. (Exercise: prove this principle for the rationals without constructing
the real number system.) Informally speaking, this principle asserts that every bounded
monotone sequence is eventually stable up to error ε.
Now let's try to find the finitary (quantitative) equivalent of this principle. The most
naive thing to do is simply to replace the infinite sequence by a finite sequence, thus: if ε > 0 and 0 ≤ x_1 ≤ … ≤ x_M ≤ 1 is a monotone sequence, then there exists 1 ≤ N ≤ M such that |x_n − x_m| ≤ ε for all N ≤ n, m ≤ M.

But this proposition is trivially true; one can simply set N equal to M (or any number larger than M). So one needs to strengthen the claim. What about making N be
independent of M, and only dependent on ε? In other words: if ε > 0 and 0 ≤ x_1 ≤ … ≤ x_M ≤ 1 is a monotone sequence, then there exists 1 ≤ N ≤ N(ε), depending only on ε, such that |x_n − x_m| ≤ ε for all N ≤ n, m ≤ M.
But this is trivially false; consider for instance a sequence xi which equals zero
except at i = M, at which point we jump up to xM = 1. We are not going to get the
Cauchy property unless we set N to be as large as M... but we can’t do that if we only
want N to depend on ε.
So, is there anything non-trivial that one can say at all about finite bounded monotone sequences? Well, we have the pigeonhole principle: if ε > 0 and M ≥ 1 + 1/ε, then any monotone sequence 0 ≤ x_1 ≤ … ≤ x_M ≤ 1 contains an index 1 ≤ N < M such that |x_{N+1} − x_N| ≤ ε.
Indeed, if the gaps between each element xN of the sequence and the next xN+1 were
always larger than ε, then xM − x1 would exceed (M − 1)ε ≥ 1, a contradiction. This
principle is true, but it is too weak to be considered a true finitary version of the infinite
convergence principle; indeed, we see that the pigeonhole principle easily implies the infinitary statement that lim inf_{n→∞} |x_{n+1} − x_n| = 0 for any bounded monotone sequence,
but does not obviously imply the full infinite convergence principle.
The problem is that the pigeonhole principle only establishes instantaneous stabil-
ity of the sequence at some point n, whereas the infinite convergence principle con-
cludes the permanent stability of the sequence after some point N. To get a better
finitary match to the infinite convergence principle, we need to extend the region of
stability that the pigeonhole principle offers. Now, one can do some trivial extensions
such as the following: if ε > 0, k ≥ 1, and M is sufficiently large depending on ε and k, then any monotone sequence 0 ≤ x_1 ≤ … ≤ x_M ≤ 1 contains an index 1 ≤ N ≤ M − k such that |x_{N+k} − x_N| ≤ ε. This one can quickly deduce from the first pigeonhole principle by considering
the sparsified sequence x_k, x_{2k}, x_{3k}, …. But this is only a little bit better, as it now gives
the infinitary statement that lim inf_{n→∞} |x_{n+k} − x_n| = 0 for each fixed k,
but is still not strong enough to imply the infinite convergence principle in its full
strength. Nevertheless, it shows that we can extend the realm of stability offered by the
pigeonhole principle. One can for instance sparsify further, replacing n + k with 2n: if ε > 0 and M is sufficiently large depending on ε, then there exists N with 1 ≤ N ≤ M/2 such that |x_{2N} − x_N| ≤ ε. This can be proven by applying the first version of the pigeonhole principle to
the sparsified sequence x_1, x_2, x_4, x_8, …. This corresponds to an infinite convergence
principle in which the conclusion is that lim inf_{n→∞} |x_{2n} − x_n| = 0.
One can of course keep doing this, achieving various sparsified versions of the
pigeonhole principle which each capture part of the infinite convergence principle. To
get the full infinite convergence principle, one cannot use any single such sparsified
version of the pigeonhole principle, but instead must take all of them at once. This is
the full strength of the finite convergence principle: if ε > 0 and F : Z⁺ → Z⁺ is any function, and M is sufficiently large depending on ε and F, then any monotone sequence 0 ≤ x_1 ≤ … ≤ x_M ≤ 1 contains an index N with 1 ≤ N and N + F(N) ≤ M such that |x_n − x_m| ≤ ε for all N ≤ n, m ≤ N + F(N).
This principle is easily proven by appealing to the first pigeonhole principle with the
sparsified sequence x_{i_1}, x_{i_2}, x_{i_3}, …, where the indices are defined recursively by i_1 := 1
and i_{j+1} := i_j + F(i_j). This gives an explicit bound on M as M := i_{⌊1/ε⌋+1}. Note that
the first pigeonhole principle corresponds to the case F(N) ≡ 1, the second pigeonhole
principle to the case F(N) ≡ k, and the third to the case F(N) ≡ N. A particularly
useful case for applications is when F grows exponentially in N, in which case M
grows tower-exponentially in 1/ε.
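The recursion in this proof is concrete enough to run; the following sketch (metastable_window is a hypothetical helper, not from the text) searches for the metastable plateau by exactly the iteration i_{j+1} := i_j + F(i_j):

    def metastable_window(x, F, eps):
        """Return an index N (1-indexed) such that the monotone list x is
        eps-stable on [N, N + F(N)], following the sparsified pigeonhole."""
        i = 1
        while i + F(i) <= len(x):
            j = i + F(i)
            # By monotonicity it suffices to compare the two endpoints.
            if x[j - 1] - x[i - 1] <= eps:
                return i          # metastable on the window [i, i + F(i)]
            i = j                 # jump ahead: i_{j+1} = i_j + F(i_j)
        return None               # x was shorter than i_{floor(1/eps)+1}

    # Example: x_n = 1 - 1/n with F(N) = 2N and eps = 0.1 stabilises at N = 9.
    x = [1 - 1 / n for n in range(1, 10**6)]
    print(metastable_window(x, lambda N: 2 * N, 0.1))   # prints 9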
Informally, the above principle asserts that any sufficiently long (but finite) bounded
monotone sequence will experience arbitrarily high-quality amounts of metastability
with a specified error tolerance ε, in which the duration F(N) of the metastability
exceeds the time N of onset of the metastability by an arbitrary function F which is
specified in advance.
Let us now convince ourselves that this is the true finitary version of the infinite
convergence principle, by deducing them from each other:
The finite convergence principle implies the infinite convergence principle. Suppose for
contradiction that the infinite convergence principle failed. Untangling the quantifiers,
this asserts that there is an infinite sequence 0 ≤ x_1 ≤ x_2 ≤ … ≤ 1 and an ε > 0 with
the property that, given any positive integer N, there exists a larger integer N + F(N)
such that x_{N+F(N)} − x_N > ε. Applying the finite convergence principle to a sufficiently long initial segment of this sequence (with this ε and F), we obtain a contradiction.
The infinite convergence principle implies the finite convergence principle. Suppose for
contradiction that the finite convergence principle failed. Untangling the quantifiers,
this asserts that there exists ε > 0 and a function F, together with a collection 0 ≤ x_1^{(i)} ≤ … ≤ x_{M_i}^{(i)} ≤ 1 of bounded monotone sequences whose length M_i goes to infinity, such
that for each one of these sequences, there does not exist 1 ≤ N < N + F(N) ≤ M_i such that |x_n^{(i)} − x_m^{(i)}| ≤ ε for all N ≤ n, m ≤ N + F(N). Let us extend each of the
finite bounded sequences to infinite bounded sequences in some arbitrary manner, e.g.
defining x_n^{(i)} := 1 whenever n > M_i. The space of all bounded sequences is well-known⁷
to be sequentially compact in the product topology, thus after refining the i labels to a
subsequence if necessary, we can assume that the sequences (x_n^{(i)})_{n=1}^∞ converge in the
product topology (i.e. pointwise) to a new limit sequence (x_n)_{n=1}^∞. Since each of the
original sequences was bounded in the interval [0, 1] and monotone, we see that the
limit sequence is also. Furthermore, we claim that there does not exist any N ≥ 1 for
which |x_n − x_m| < ε for all N ≤ n, m ≤ N + F(N). Indeed, if this were the case, then by
pointwise convergence we would also have |x_n^{(i)} − x_m^{(i)}| < ε for all N ≤ n, m ≤ N + F(N)
and all sufficiently large i, but this contradicts the construction of the x_n^{(i)}. But now we
see that this infinite bounded monotone sequence (x_n)_{n=1}^∞ contradicts the infinite convergence principle.
⁷ Since we are only using sequential compactness here rather than topological compactness, the result here is in fact much closer in spirit to the Arzelà-Ascoli theorem. In particular, the axiom of choice is not actually used here; instead one can repeatedly use the Bolzano-Weierstrass theorem for the interval $[0,1]$, followed by a diagonalisation argument, to establish sequential compactness. The astute reader will observe that the Bolzano-Weierstrass theorem is essentially equivalent to the infinite convergence principle! Fortunately, there is no circularity here, because we are only using this theorem in order to deduce the finite convergence principle from the infinite one, and not the other way around.
The above deductions illustrate several morals:
• The “naive” finitisation of an infinitary statement is often not the correct one.
• While the finitary version of an infinitary statement is indeed quantitative, the
bounds obtained can be quite poor (e.g. tower-exponential or worse).
• The deduction of the infinitary statement from the finitary one is quite short, as
long as one is willing to work indirectly (arguing by contradiction).
• The deduction of the finitary statement from the infinitary one is a bit more com-
plicated, but still straightforward, and relies primarily on compactness.
• In particular, the equivalence of the finitary and infinitary formulations requires
a non-trivial amount of infinitary mathematics (though in this particular case, we
can at least leave the ultrafilters out of it).
These morals apply not only to the finite and infinite convergence principle, but
to many other pairs of finitary and infinitary statements, for instance Szemerédi’s the-
orem[Sz1975] on one hand and the Furstenberg recurrence theorem[Fu1977] on the
other; see Section 3.1.2 for more discussion. In these contexts, the correspondence
between the finitary and infinitary statements is known as the Furstenberg correspon-
dence principle.
2.3.3 Applications
So, we’ve now extracted a quantitative finitary equivalent of the infinitary principle that
every bounded monotone sequence converges. But can we actually use this finite con-
vergence principle for some non-trivial finitary application? The answer is a definite
yes: the finite convergence principle (implicitly) underlies the famous Szemerédi regu-
larity lemma[Sz1975], which is a major tool in graph theory, and also underlies several
other regularity lemmas, such as the arithmetic regularity lemma of Green[Gr2005]
and the “strong” regularity lemma in [AlFiKrSz2000]. More generally, this principle
seems to often arise in any finitary application in which tower-exponential bounds are
inevitably involved.
Before plunging into these applications, let us first establish a Hilbert space ver-
sion8 of the convergence principle. Given a (closed) subspace X of a Hilbert space
$H$, and a vector $v \in H$, let $\pi_X v$ be the orthogonal projection of $v$ onto $X$. If $X$ is
finite dimensional, then this projection can be defined in a finitary way, for instance by
applying the Gram-Schmidt orthogonalisation procedure to X. If X is infinite dimen-
sional, then even the existence of the orthogonal projection is not completely trivial,
and in fact relies ultimately on the infinite convergence principle. Closely related to the
existence of this projection is the following monotone continuity property:
Proposition 2.12 (Hilbert space infinite convergence principle). Let $\{0\} \subset X_1 \subset X_2 \subset \ldots \subset H$ be a nested sequence of subspaces of a Hilbert space $H$, and let $X := \overline{\bigcup_{n=1}^{\infty} X_n}$ be the monotone closed limit of the $X_n$. Then for any vector $v$, $\pi_{X_n} v$ converges strongly in $H$ to $\pi_X v$.
8 One could also view this as a “noncommutative” or “quantum” version of the convergence principle,
but this is somewhat of an abuse of terminology, despite the presence of the Hilbert space, since we don’t
actually have any noncommutativity or any other quantum weirdness going on.
As with the infinite convergence principle in $[0,1]$, there is a Cauchy sequence version which already captures the bulk of the content: for every $\varepsilon > 0$ there exists an $N$ such that $\|\pi_{X_n} v - \pi_{X_m} v\|_H \le \varepsilon$ for all $n, m \ge N$. One can deduce this principle from the analogous principle in $[0,1]$ by first normalising $\|v\|_H = 1$, and then observing from Pythagoras' theorem that $\|\pi_{X_n} v\|_H^2$ (which one should view as the energy of $X_n$ as measured relative to $v$) is a bounded monotone sequence from $0$ to $1$. Applying the infinite convergence principle, followed by Pythagoras' theorem yet again, we obtain the claim. Once one sees this, one immediately concludes that there is also a finitary equivalent:

Proposition (Hilbert space finite convergence principle). Let $\varepsilon > 0$, let $F: \mathbf{Z}^+ \to \mathbf{Z}^+$ be a function, and let $v$ be a vector in a Hilbert space $H$ with $\|v\|_H \le 1$. If $X_1 \subset X_2 \subset \ldots \subset X_M$ is a nested sequence of subspaces of $H$ with $M$ sufficiently large depending on $\varepsilon$ and $F$, then there exists $1 \le N \le N + F(N) \le M$ such that $\|\pi_{X_n} v - \pi_{X_m} v\|_H \le \varepsilon$ for all $N \le n, m \le N + F(N)$.
Informally, given a long enough sequence of nested subspaces, and a given bounded
vector v, one can find an arbitrarily good region of metastability in the orthogonal
projections of v into these subspaces.
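The Pythagoras computation underlying both deductions is worth displaying. For nested subspaces one has $\pi_{X_m} \pi_{X_n} = \pi_{X_m}$ when $m \le n$, whence

\[
\|\pi_{X_n} v - \pi_{X_m} v\|_H^2 = \|\pi_{X_n} v\|_H^2 - \|\pi_{X_m} v\|_H^2 \qquad (m \le n),
\]

so metastability of the scalar energies $\|\pi_{X_n} v\|_H^2$ is literally the same statement as metastability of the projections $\pi_{X_n} v$ themselves.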
From this principle one can then quickly deduce the Szemerédi regularity lemma[Sz1975]
as follows. Let $G = (V, E)$ be a graph. One can think of the adjacency matrix $1_E$ of this graph as an element of the (finite-dimensional) Hilbert space $L^2(V \times V)$, where the product space $V \times V$ is given normalised counting measure (and the discrete σ-algebra $2^V \times 2^V$). We can construct a nested sequence $\mathcal{B}_0 \subset \mathcal{B}_1 \subset \mathcal{B}_2 \subset \ldots$ of σ-algebras on $V$ (which one can think of as a sequence of increasingly fine partitions of $V$), together with the attendant sequence $L^2(\mathcal{B}_0 \times \mathcal{B}_0) \subset L^2(\mathcal{B}_1 \times \mathcal{B}_1) \subset \ldots$ of subspaces (this corresponds to functions on $V \times V$ which are constant on any product of a pair of cells in the partition), by the following greedy algorithm⁹: one initialises $\mathcal{B}_0$ to be the trivial σ-algebra, and once $\mathcal{B}_n$ has been constructed, one sets $f_n := \pi_{L^2(\mathcal{B}_n \times \mathcal{B}_n)} 1_E$ and forms $\mathcal{B}_{n+1}$ by adjoining to $\mathcal{B}_n$ a set $A_n \subset V$ chosen to (essentially) maximise the discrepancy $|\int_{A_n \times A_n} (1_E - f_n)|$.
Let $\varepsilon > 0$ and $F: \mathbf{Z}^+ \to \mathbf{Z}^+$ be a function. Applying the Hilbert space finite convergence principle to the above sequence of vector spaces (with $v = 1_E$), we obtain some $N$ of bounded size (depending only on $\varepsilon$ and $F$) such that
\[
\| f_n - f_m \|_{L^2(V \times V)} \le \varepsilon^2 \quad \text{for all } N \le n, m \le N + F(N). \tag{2.5}
\]
What this basically means is that the partition $\mathcal{B}_N$ is very regular, in that even the greediest way to refine this partition does not significantly capture any more of the fluctuations of the graph $G$. By choosing $F$ to be a suitable exponentially growing function, one can make the regularity of this partition exceed the number of cells (which is basically $2^{2N}$) in the partition $\mathcal{B}_N$, which is “within epsilon” of the partition $\mathcal{B}_n$ in the sense
of (2.5). Putting all this together, one can get a strong version of the Szemerédi regular-
ity lemma, which implies the usual formulation by a simple argument; see [Ta2006h]
for further discussion. The choice of F being exponential is what results in the noto-
rious tower-exponential bounds in this regularity lemma (which are necessary, thanks
to a result of Gowers[Go1997]). But one can reduce F to, say, a polynomial, resulting
in more civilised bounds but with a weaker regularity conclusion. Such a “weak regu-
larity lemma” was for instance established by Frieze and Kannan[FrKa1999], and also
underlies the “generalised Koopman von Neumann theorem” which is a key compo-
nent of my result with Ben Green[GrTa2008] establishing long arithmetic progressions
in the primes. In the opposite direction, various flavours of “strong regularity lemma”
have appeared in the literature [AlFiKrSz2000], [RoSc2007], [Ta2006h], and also turn
out to be convenient ways to formulate hypergraph versions of the regularity lemma of
adequate strength to imply non-trivial theorems (such as Szemerédi's theorem).
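For readers who like to experiment, here is a small numpy sketch of the energy-increment iteration; it is entirely my own illustration, and it samples random candidate sets rather than truly maximising the discrepancy (which would be computationally expensive):

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(adj, cells):
    """Energy ||f_n||^2 of the block-averaged adjacency matrix, where the
    vertex partition is encoded by an integer cell label per vertex."""
    E, n = 0.0, len(cells)
    for a in np.unique(cells):
        for b in np.unique(cells):
            block = adj[np.ix_(cells == a, cells == b)]
            E += block.mean() ** 2 * block.size / n ** 2
    return E

def greedy_refine(adj, cells, candidates=64):
    """One greedy step: split every cell along whichever candidate vertex
    set increases the energy the most (random candidates, not maximisers)."""
    best, best_cells = energy(adj, cells), cells
    for _ in range(candidates):
        A = rng.random(len(cells)) < 0.5   # a random vertex set
        refined = 2 * cells + A            # split each cell by A
        e = energy(adj, refined)
        if e > best:
            best, best_cells = e, refined
    return best_cells

# a random "half-dense" graph: the energies increase but stay below 1,
# so by the finite convergence principle they must eventually stall
adj = (rng.random((128, 128)) < 0.5).astype(float)
cells = np.zeros(128, dtype=int)
for step in range(4):
    cells = greedy_refine(adj, cells)
    print(step, energy(adj, cells))
```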
Rather than using sets which maximise discrepancy, one can also use sublevel sets
of the eigenvectors of the adjacency matrix corresponding to the largest eigenvalues
of the matrix to generate the partition; see [FrKa1999] for details of a closely related
construction.
The appearance of spectral theory (eigenvalues and eigenvectors) in this topic
brings one in contact with Fourier analysis, especially if one considers circulant ma-
trices (which correspond in graph-theoretic terms to Cayley graphs on a cyclic group).
This leads us towards the arithmetic regularity lemma of Green[Gr2005], which regu-
larises a bounded function f on a finite abelian group G in terms of a partition generated
by the sublevel sets (Bohr sets) of a bounded number of characters; the precise formula-
tion is a bit lengthy to state properly, although it simplifies substantially in the “dyadic
model” case (see Section 2.6) when G is a vector space over a small finite field (e.g.
F2 ). This arithmetic regularity lemma can also be established using the finite conver-
gence principle (in either the numerical form or the Hilbert space form). Indeed, if we
let H = L2 (G) and let Vn be the vector space generated by the characters associated to
the n largest Fourier coefficients of f , then by applying the finite convergence principle
(with v = f ) we can locate a metastable region, where there is not much going on (in
an L2 sense) between VN and VN+F(N) for some (exponentially growing) function F,
thus there is a “spectral gap” of sorts between the N largest Fourier coefficients and the
coefficients ranked N + F(N) and beyond. The sublevel sets of characters associated
to the N largest coefficients can then be used to regularise the original function f . Sim-
ilar ideas also appear in [Bo1986], [GrKo2006]. See also my survey [Ta2007f] for a
general discussion of structural theorems of this type.
¹⁰ Informally, an asymptotically stable set function is a function $F$ defined on sets of natural numbers (both finite and infinite) whose value $F(A)$ can always be “computed” in “finite time”. But one should take this informal
definition with a large grain of salt: while there is indeed an algorithm for computing F(A) for any given set
A which will eventually give the right answer, you might not be able to tell when the algorithm has finished!
A good example is the asymptotically stable function F(A) := inf(A): you can “compute” this function for
any set A by initialising the answer to 0, running a counter n from 0 to infinity, and resetting the answer
permanently to n the first time n lies in A. As long as A is non-empty, this algorithm terminates in finite time
with the correct answer; if A is empty, the algorithm gives the right answer from the beginning, but you can
never be sure of this fact! In contrast, the cardinality |A| of a possibly infinite set A cannot be computed even
in this rather unsatisfactory sense of having a running “provisional answer” which is guaranteed to eventually
be correct.
$F(A) = f(\inf(A))$ for some fixed function $f$ is already very interesting: it is the 1-uniform case of a “strong Ramsey theorem” and is barely provable by finitary means¹¹, although the general case of that theorem is not finitarily provable, even though it is an immediate consequence of Proposition 2.16; this assertion is essentially the celebrated
Paris-Harrington theorem. The assumption of asymptotic stability of F is necessary,
as one can see by considering the counterexample F(A) := |A|.
I am enclosing “finitary” in quotes in Proposition 2.16, because while most of the
assertion of this principle is finitary, one part still is not, which is the notion of “asymp-
totically stable”. This is a notion which cannot be precisely formulated in a purely
finitary manner, even though the notion of a set function is basically a finitary con-
cept (ignoring for now a subtle issue about what “function” means). If one insists on
working in a finitary setting, then one can recast the infinite pigeonhole principle as a
schema of finitary principles, one for each asymptotically stable set function F, but in
order to work out exactly which set functions are asymptotically stable or not requires
infinitary mathematics. (And for some (constructible, well-defined) set functions, the
asymptotic stability is undecidable; this fact is closely related to the undecidability of
the halting problem and is left as an exercise to the reader.)
The topic of exactly which statements in infinitary mathematics are “truly infini-
tary” is a fascinating one, and is basically a question in reverse mathematics, but we
will not be able to discuss it here.
2.3.5 Notes
This article was originally posted on May 23, 2007 at
terrytao.wordpress.com/2007/05/23
I am indebted to Harvey Friedman for discussions on the Paris-Harrington theorem
and the infinite pigeonhole principle, and to Henry Towsner, Ulrich Kohlenbach, and
Steven Simpson for pointing out the connections to proof theory and reverse mathe-
matics.
Richard Borcherds pointed out that the distinction between hard and soft analysis
was analogous to the distinction between first-order and second-order logic.
JL pointed out the paper of Freedman[Fr1998], in which a limiting process is pro-
posed to convert problems in complexity theory to some infinitary counterpart in de-
cidability theory.
Thanks to Liu Xiao Chuan for corrections.
11 Try it, say for k = 10 and F(A) := inf(A) + 10. What quantitative bound for N do you get?
Here we use the oriented definite integral, thus $\int_x^y = -\int_y^x$.
In other words, almost all the points x of A are points of density of A, which roughly
speaking means that as one passes to finer and finer scales, the immediate vicinity of
x becomes increasingly saturated with A. (Points of density are like robust versions of
interior points, thus the Lebesgue density theorem is an assertion that measurable sets
are almost like open sets. This is Littlewood’s first principle.) One can also deduce
the Lebesgue differentiation theorem back from the Lebesgue density theorem by ap-
proximating f by a finite linear combination of indicator functions; we leave this as an
exercise.
The Lebesgue differentiation and density theorems are qualitative in nature: they assert that the averages $\frac{1}{r}\int_x^{x+r} f(y)\,dy$ eventually get close to $f(x)$ for almost every $x$, or that the local density of $A$ eventually gets close to $1$ around almost every point of $A$, but they do not say how quickly this convergence takes place. To see that no uniform rate is possible, consider for instance the sets $A_n \subset [0,1]$ formed by taking every second dyadic interval of length $2^{-n}$, i.e. $A_n := \bigcup_{0 \le j < 2^{n-1}} [2j \cdot 2^{-n}, (2j+1) \cdot 2^{-n}]$. One then sees that if $x$ is any element of $A_n$ which is not on the boundary, then it is indeed true that the local density $\frac{|A_n \cap [x-r,\,x+r]|}{2r}$ of $A_n$ will eventually converge to $1$, but one has to wait until $r$ is of size $1/2^n$ or smaller before one sees this; for scales much larger than this, the local density will remain stubbornly close to $1/2$. A similar phenomenon holds for the indicator functions $f_n := 1_{A_n}$: the local average $\frac{1}{r}\int_x^{x+r} f_n(y)\,dy$ will eventually get close to $f_n(x)$, which is either $0$ or $1$, but when $|r| \gg 1/2^n$, these averages will instead stay close to $1/2$. (Closely related to this is the fact that the functions $f_n$ converge weakly to $1/2$, despite only taking values in $\{0,1\}$.)
Intuitively, what is going on here is that while each set An is certainly Lebesgue
measurable, these sets are getting increasingly “less measurable” as n gets large, and
the rate of convergence in the Lebesgue differentiation and density theorems depends
on how measurable the sets An are. One can illustrate this by considering (non-rigorously)
the limiting case n = ∞ as follows. Suppose we select a random subset A∞ of [0, 1] by
requiring each real number x in [0, 1] to lie in A∞ with an independent probability of 1/2
(thus we are flipping an uncountable number of coins to determine this set!). The law
of large numbers (applied very non-rigorously!) then suggests that with probability 1,
$A_\infty$ should have density $1/2$ in every single interval $I$ in $[0,1]$, thus $|A_\infty \cap I| = \frac{1}{2}|I|$. This
would seem to violate the Lebesgue density theorem; but what is going on here is that
the set A∞ is in fact almost surely non-measurable (indeed, the Lebesgue density theo-
rem provides a proof of this fact, modulo the issues of justifying several non-rigorous
claims in this paragraph).
So, it seems that to proceed further we need to quantify the notion of measurability,
in order to decide which sets or functions are “more measurable” than others. There
are several ways to make such a quantification. Here are some typical proposals:
Definition 2.19. A set $A \subset [0,1]$ is $(\varepsilon, n)$-measurable if there exists a set $B$ which is the union of dyadic intervals $[j/2^n, (j+1)/2^n]$ at scale $2^{-n}$, such that $A$ and $B$ only differ on a set of Lebesgue measure (or outer measure) at most $\varepsilon$.
Definition 2.20. A function $f: [0,1] \to [0,1]$ is $(\varepsilon, n)$-measurable if there exists a function $g$ which is constant on the dyadic intervals $[j/2^n, (j+1)/2^n]$, and which differs from $f$ in $L^1$-norm by at most $\varepsilon$, thus $\int_0^1 |f(x) - g(x)|\,dx \le \varepsilon$.
Remark 2.21. One can phrase these definitions using the σ -algebra generated by the
dyadic intervals of length 2−n ; we will not do so here, but these σ -algebras are certainly
underlying our discussion. Their presence is particularly prominent in the “ergodic
theory” approach to this circle of ideas, which we are not focusing on here.
One can now obtain the following quantitative results:
Theorem 2.24 (Lebesgue approximation theorem, first version). Let A ⊂ [0, 1] be mea-
surable. Then for every ε > 0 there exists n such that A is (ε, n)-measurable.
Theorem 2.25 (Lebesgue approximation theorem, second version). Let $f: [0,1] \to [0,1]$ be measurable. Then for every $\varepsilon > 0$ there exists $n$ such that $f$ is $(\varepsilon, n)$-measurable.
These two results are easily seen to be equivalent. Let us quickly recall the proof
of the first version:
Proof of Theorem 2.24. The claim is easily verified when A is the finite union of dyadic
intervals, and then by monotone convergence one also verifies the claim when A is com-
pact (or open). One then verifies that the class of sets obeying the claim is closed under countable unions, intersections, and complements, which then gives the claim for all Borel-measurable sets.
The claim is also obviously true for null sets, and thus true for Lebesgue-measurable
sets.
So, we’re done, right? Well, there is still an unsatisfactory issue: the Lebesgue
approximation theorems guarantee, for any given ε, that a measurable set A or a mea-
surable function f will eventually be (ε, n)-measurable by taking n large enough, but
don’t give any bound as to what this n will be. In a sense, this is unavoidable, even if
we consider “nice” objects such as compact sets A or piecewise constant functions f ;
the examples of the sets $A_n$ and the functions $f_n$ discussed previously show that for fixed $\varepsilon$, one can be forced to take $n$ to be arbitrarily large.
However, we can start looking for substitutes for these theorems which do have
quantitative bounds. Let’s focus on the first version of the Lebesgue approximation
theorem, and in particular in the case when $A$ is compact. Then we can write $A = \bigcap_{n=1}^{\infty} A^{(n)}$, where $A^{(n)}$ is the union of all the (closed) dyadic intervals of length $2^{-n}$ which intersect $A$. The measures $|A^{(n)}|$ are a monotone decreasing sequence of numbers between $0$ and $1$, and thus (by Proposition 2.1!) they have a limit, which (by continuity from above of Lebesgue measure) is just $|A|$. Thus, for every $\varepsilon > 0$ we have $|A^{(n)}| - |A| < \varepsilon$ for all sufficiently large $n$, which explains why $A$ is $(\varepsilon, n)$-measurable for all large $n$.
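As a quick numerical illustration (my own, not from the original text), one can watch the outer approximations $|A^{(n)}|$ decrease to $|A|$ for a simple compact set such as $A = [0, 1/3]$:

```python
from fractions import Fraction

def dyadic_outer_measure(a, b, n):
    """|A^(n)| for A = [a, b]: the total length of the dyadic intervals
    [j/2^n, (j+1)/2^n] that intersect [a, b]."""
    scale = Fraction(1, 2 ** n)
    count = sum(1 for j in range(2 ** n)
                if j * scale <= b and (j + 1) * scale >= a)
    return count * scale

A = (Fraction(0), Fraction(1, 3))   # |A| = 1/3
for n in range(1, 8):
    print(n, float(dyadic_outer_measure(*A, n)))
# the printed values decrease towards 1/3, but with no a priori rate:
# exactly the ineffectivity that the finite convergence principle addresses
```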
So now we see where the lack of a bound on $n$ is coming from: it is the fact that the infinite convergence principle also does not provide an effective bound on the rate of convergence. But in Section 2.3, we saw how the finite convergence theorem (Proposition 2.11) did provide an effective substitute for the infinite convergence principle. If we apply it directly to this sequence $|A^{(n)}|$, this is what we get:
Theorem (Quantitative Lebesgue approximation theorem). Let $A \subset [0,1]$ be compact, let $\varepsilon > 0$, and let $F: \mathbf{Z}^+ \to \mathbf{Z}^+$ be a function. Then there exists $n$, bounded by a quantity depending only on $\varepsilon$ and $F$, such that $|A^{(n)}| - |A^{(n+F(n))}| \le \varepsilon$; in particular, the set $A^{(n+F(n))}$ is $(\varepsilon, n)$-measurable.
This theorem does give a specific upper bound on the scale n one has to reach in
order to get quantitative measurability. The catch, though, is that the measurability is
not attained for the original set A, but instead on some discretisation A(n+F(n)) of A.
However, we can make the scale at which we are forced to discretise to be arbitrarily
finer than the scale at which we have the measurability.
Nevertheless, this theorem is still a little unsatisfying, because it did not directly
say too much about the original set A. There is an alternate approach which gives a
more interesting result. In the previous results, the goal was to try to approximate an
arbitrary object (a set or a function) by a “structured” or “low-complexity” one (a finite
union of intervals, or a piecewise constant function), thus trying to move away from
“pseudorandom” or “high-complexity” objects (such as the sets An and functions fn
discussed earlier). Of course, the fact that these pseudorandom objects actually exist
is what is making this goal difficult to achieve satisfactorily. However, one can adopt
a different philosophy, namely to embrace both the structured and the pseudorandom
aspects of these objects, and focus instead on creating an efficient decomposition of
arbitrary objects into the structured and pseudorandom components.
To do this, we need to understand what “pseudorandom” means. One clue is to
look at the examples An and fn discussed earlier. Observe that if one averages fn on
any reasonable sized interval J, one gets something very close to the global average of
fn , i.e. 1/2. In other words, the integral of fn on an interval J is close to the global
average of fn times |J|. (This is also true when J is a small interval, since in this case
both expressions are small.) This motivates the following definition:

Definition (ε-regularity). Let $f: [0,1] \to [0,1]$ be measurable, and let $I \subset [0,1]$ be an interval. We say that $f$ is $\varepsilon$-regular on $I$ if one has $|\int_J f(y)\,dy - \frac{|J|}{|I|}\int_I f(y)\,dy| \le \varepsilon|I|$ for all intervals $J \subset I$.
Thus, for instance, fn is 2−n -regular on [0, 1]. We then have an analogue of the
Szemerédi regularity lemma for subsets of the interval, which I will dub the “Lebesgue
regularity lemma”:
Lemma 2.28 (Lebesgue regularity lemma). If $\varepsilon > 0$ and $f: [0,1] \to [0,1]$ is measurable, then there exists a positive integer $n = O_\varepsilon(1)$ (i.e. $n$ is bounded by a quantity depending only on $\varepsilon$), such that $f$ is $\varepsilon$-regular on all but at most $\varepsilon 2^n$ of the $2^n$ dyadic intervals of length $2^{-n}$.
Proof. As with the proof of many other regularity lemmas, we shall rely primarily
on the energy increment argument (the energy is also known as the index in some
literature). For minor notational reasons we will take ε to be a negative power of 2.
For each integer $n$, let $f^{(n)}: [0,1] \to [0,1]$ be the conditional expectation of $f$ to the dyadic intervals of length $2^{-n}$; thus on each such interval $I$, $f^{(n)}$ is equal to the constant value $\frac{1}{|I|}\int_I f$ (again, we are ignoring sets of measure zero). An easy application of Pythagoras' theorem (for $L^2([0,1])$) shows that the energies $E_n := \int_0^1 |f^{(n)}(x)|^2\,dx$ are
an increasing sequence in n, and bounded between 0 and 1. Applying (a special case of)
the finite convergence principle, we can find $n = O_\varepsilon(1)$ such that we have the energy metastability
\[
E_{n + \log_2(1/\varepsilon)} - E_n \le \varepsilon^3.
\]
By Pythagoras' theorem, this implies that $\int_0^1 |f^{(n+\log_2(1/\varepsilon))}(x) - f^{(n)}(x)|^2\,dx \le \varepsilon^3$, and hence by Markov's inequality we have $\frac{1}{|I|}\int_I |f^{(n+\log_2(1/\varepsilon))}(x) - f^{(n)}(x)|^2\,dx \le \varepsilon^2$ for all but $\varepsilon 2^n$ of the $2^n$ dyadic intervals $I$ of length $2^{-n}$. The Cauchy-Schwarz inequality then quickly shows that $f$ is $\varepsilon$-regular on all of these intervals.
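To see these energies in action, here is a tiny numpy sketch (mine, purely for illustration) computing the dyadic conditional expectations $f^{(n)}$ and the energies $E_n$ for a sample function:

```python
import numpy as np

def energies(samples, max_n):
    """E_n = int_0^1 |f^(n)|^2 dx for n = 0..max_n, where f is represented by
    its values on a uniform grid whose length is a power of two."""
    out = []
    for n in range(max_n + 1):
        blocks = samples.reshape(2 ** n, -1)   # one row per dyadic interval
        f_n = blocks.mean(axis=1)              # conditional expectation f^(n)
        out.append(float(np.mean(f_n ** 2)))   # int |f^(n)|^2
    return out

x = np.linspace(0, 1, 2 ** 12, endpoint=False)
f = (np.sin(20 * x) + 1) / 2                   # a sample f: [0,1] -> [0,1]
print(energies(f, 8))   # increasing and bounded by 1, as in the proof
```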
Say that $f$ is strongly $(\varepsilon, m)$-regular on an interval $I$ if one has $|\int_J f(y)\,dy - \frac{|J|}{|I|}\int_I f(y)\,dy| \le \varepsilon|J| + 2^{-m}|I|$ for all intervals $J \subset I$. Running the above argument with the full strength of the finite convergence principle (i.e. with an arbitrary growth function $F$) yields a strong Lebesgue regularity lemma: for every $\varepsilon > 0$ and every $F: \mathbf{Z}^+ \to \mathbf{Z}^+$, there exists $n = O_{\varepsilon,F}(1)$ such that $f$ is strongly $(\varepsilon, n+F(n))$-regular on all but $\varepsilon 2^n$ of the dyadic intervals of length $2^{-n}$.

Remark 2.30. Note that strong $(\varepsilon, m)$-regularity implies $2\varepsilon$-regularity as long as $2^{-m} \le \varepsilon$. Strong $(\varepsilon, m)$-regularity offers much better control on the fluctuation of $f$ at finer scales, as long as the scale is not too fine (this is where the parameter $m$ comes in). The bound on $n$ is rather poor; it is basically a $1/\varepsilon^3$-fold iteration of the function $n \mapsto n + F(n) + \log_2(1/\varepsilon)$ applied to $1$, so for instance if one wanted $F$ to be exponential in nature then $n$ might be as large as a tower of exponentials of height $1/\varepsilon^3$ or so. (A
very similar situation occurs for the Szemerédi regularity lemma, which has a variety
of such strong versions [AlFiKrSz2000], [RoSc2007], [Ta2006h].)
We can now return to the Lebesgue differentiation theorem, and use the strong
regularity lemma to obtain a more satisfactory quantitative version of that theorem:

Theorem (Quantitative Lebesgue differentiation theorem). Let $f: [0,1] \to [0,1]$ be measurable, let $\varepsilon > 0$, and let $F: \mathbf{Z}^+ \to \mathbf{Z}^+$ be a function. Then there exists $n = O_{\varepsilon,F}(1)$ such that, for all $x$ outside of a set of measure at most $\varepsilon$, one has $|\frac{1}{r}\int_x^{x+r} f(y)\,dy - \frac{1}{s}\int_x^{x+s} f(y)\,dy| \le \varepsilon$ whenever $2^{-n-F(n)} \le r, s \le 2^{-n}$.
This theorem can be deduced fairly easily by combining the strong regularity lemma with the Hardy-Littlewood maximal inequality (to deal with the error terms), and by covering (most of) the non-dyadic intervals $[x, x+r]$ or $[x, x+s]$ by dyadic intervals and using the boundedness of $f$ to deal with the remainder. We leave the details as an exercise.
One sign that this is a true finitary analogue of the infinitary differentiation theorem
is that this finitary theorem implies most of the infinitary theorem; namely, it shows
that for any measurable $f$, and almost every $x$, the averages $\frac{1}{r}\int_x^{x+r} f(y)\,dy$ form a Cauchy sequence as $r \to 0$, although it does not show that the limit is equal to $f(x)$. (Finitary statements can handle Cauchy sequences, which make sense even in the rationals, but have some trouble actually evaluating the limits of such sequences, which need the (infinite precision) real numbers and thus are not truly finitary.) Conversely, using weak
compactness methods one can deduce the quantitative differentiation theorem from the
infinitary one, in much the same way that the finite and infinite convergence principles
can be deduced from each other.
The strong Lebesgue regularity lemma can also be used to deduce the (one-dimensional
case of the) Rademacher differentiation theorem, namely that a Lipschitz continuous
function from [0, 1] to the reals is almost everywhere differentiable. To see this, sup-
pose for contradiction that we could find a function $g$ which was Lipschitz continuous but failed to be differentiable on a set of positive measure; thus for every $x$ in this set, the (continuous) sequence $\frac{g(x+r)-g(x)}{r}$ is not a Cauchy sequence as $r$ goes to zero. We can normalise the Lipschitz constant of $g$ to equal $1$. Then by standard arguments we can find $\varepsilon > 0$ and a function $F: \mathbf{N} \to \mathbf{N}$ such that for every $x$ in a set of measure greater than $\varepsilon$, and every $n$, the quotient $\frac{g(x+r)-g(x)}{r}$ fluctuates by at least $\varepsilon$ in the range $2^{-n-F(n)} < r < 2^{-n}$. Now let $M$ be a very large integer (depending on $\varepsilon$ and $F$) and discretise $g$ at scale $2^{-M}$ to create a piecewise linear approximant $g_M$, which is the antiderivative of a bounded function $f$ which is constant on dyadic intervals of length $2^{-M}$. We apply the strong Lebesgue regularity lemma to $f$ and find a scale $n = O_{F,\varepsilon}(1)$ for which the conclusion of that lemma holds; by choosing $M$ large enough we can ensure that $M \ge n + F(n) \ge n$. It is then not hard to see that the lemma contradicts the previous assertion that $\frac{g(x+r)-g(x)}{r}$ fluctuates for certain ranges of $x$ and $r$.
I used several of the above ideas in [Ta2008c] to establish a quantitative version of
the Besicovitch projection theorem.
2.4.1 Notes
This article was originally posted on Jun 18, 2007 at
terrytao.wordpress.com/2007/06/18
Jeremy Avigad mentioned some connections with Steinhaus’s classic theorem that
if A, B are subsets of R with positive measure, then the set A + B contains an interval,
for which effective versions have been recently established.
precise and meaningful, by using non-standard analysis, which is the most well-known
of the “pseudo-finitary” approaches to analysis, in which one adjoins additional num-
bers to the standard number system. Similarly for “bounded” replaced by “small”, “polynomial size”, etc. Now, in order to set up non-standard analysis one needs a
(non-principal) ultrafilter (or an equivalent gadget), which tends to deter people from
wanting to hear more about the subject. Because of this, many treatments of non-
standard analysis tend to gloss over the actual construction of non-standard number
systems, and instead emphasise the various benefits that these systems offer, such as
a rigorous supply of infinitesimals, and a general transfer principle that allows one to
convert statements in standard analysis into equivalent ones in non-standard analysis.
This transfer principle (whose proof requires the ultrafilter) is usually recommended to be applied only at the very beginning and at the very end of an argument, so that the
bulk of the argument is carried out purely in the non-standard universe.
I feel that one of the reasons that non-standard analysis is not embraced more
widely is because the transfer principle, and the ultrafilter that powers it, is often re-
garded as some sort of “black box” which mysteriously bestows some certificate of
rigour on non-standard arguments used to prove standard theorems, while conveying
no information whatsoever on what the quantitative bounds for such theorems should
be. Without a proper understanding of this black box, a mathematician may then feel
uncomfortable with any non-standard argument, no matter how impressive and power-
ful the result.
The purpose of this article is to try to explain this black box from a “hard analysis”
perspective, so that one can comfortably and productively transfer into the non-standard
universe whenever it becomes convenient to do so (in particular, it can become cost-
effective to do this whenever the burden of epsilon management becomes excessive,
and one is willing to not make certain implied constants explicit).
If each $x_n$ takes the value $0$ or $1$, then the p-limit $p\text{-}\lim x_n$ must also be either $0$ or $1$. From a voting perspective, the p-limit is a voting system: a mechanism for extracting a yes-no answer out of the yes-no preferences of an infinite number of voters.
Let p denote the collection of all subsets A of the natural numbers such that the
indicator sequence of A (i.e. the boolean sequence xn which equals 1 when n lies in
A and equals 0 otherwise) has a p-limit of 1; in the voting theory language, p is the
collection of all voting blocs who can decide the outcome of an election by voting in
unison, while in the probability theory language, p is the collection of all sets of natural
numbers of measure 1. It is easy to verify that p has four properties:
1. (Monotonicity) If A lies in p, and B contains A, then B lies in p.
2. (Closure under intersection) If A and B lie in p, then A ∩ B also lies in p.
3. (Dichotomy) If A is any set of natural numbers, either A or its complement lies
in p, but not both.
4. (Non-principality) If one adds (or deletes) a finite number of elements to (or
from) a set A, this does not affect whether the set A lies in p.
A collection $p$ obeying properties 1 and 2 is called a filter; a collection obeying 1, 2, and 3 is called an ultrafilter, and a collection obeying 1, 2, 3, and 4 is a non-principal ultrafilter.¹⁴

¹⁴ In contrast, a principal ultrafilter is one which is controlled by a single index $n_0$, in the sense that $p = \{A : n_0 \in A\}$. In voting theory language, this is a scenario in which $n_0$ is a dictator; in probability language, the random variable $n$ is now a deterministic variable taking the value $n_0$.
A property A(n) pertaining to a natural number n can be said to be p-true if the
set {n : A(n) true} lies in p, and p-false otherwise; for instance any tautologically true
statement is also p-true. Using the probabilistic interpretation, these notions are anal-
ogous to those of “almost surely true” and “almost surely false” in probability the-
ory. (Indeed, one can view p as being a probability measure on the natural numbers
which always obeys a zero-one law, though one should caution that this measure is
only finitely additive rather than countably additive, and so one should take some care
in applying measure-theoretic technology directly to an ultrafilter.)
Properties 1-3 assert that this notion of “p-truth” obeys the usual laws of proposi-
tional logic; for instance property 2 asserts that if A is p-true and B is p-true, then so
is “A and B”, while property 3 is the familiar law of the excluded middle and property
1 is modus ponens. This is actually rather remarkable: it asserts that ultrafilter voting
systems cannot create voting paradoxes, such as those guaranteed by Arrow’s theorem.
There is no contradiction here, because Arrow’s theorem only applies to finite (hence
compact) electorates of voters, which do not support any non-principal ultrafilters. At
any rate, we now get a hint of why ultrafilters are such a useful concept in logic and
model theory.
We have seen how the notion of a p-limit creates a non-principal ultrafilter p. Con-
versely, once one has a non-principal ultrafilter p, one can uniquely recover the p-limit
operation. This is easiest to explain using the voting theory perspective. With the ultra-
filter p, one can ask yes-no questions of an electorate, by getting each voter to answer
yes or no and then seeing whether the resulting set of “yes” voters lies in p. To take
a p-limit of a bounded sequence xn , say in [0, 1], what is going on is that each voter n
has his or her own favourite candidate number xn between 0 and 1, and one has to elect
a real number x from all these preferences. One can do this by an infinite electoral
version of “Twenty Questions”: one asks all the voters whether x should be greater
than 1/2 or not, and uses p to determine what the answer should be; then, if x is to
be greater than 1/2, one asks whether x should be greater than 3/4, and so on and so
forth. This eventually determines $x$ uniquely; the properties 1-4 of the ultrafilter can be used to derive properties 1-3 of the p-limit.
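The “Twenty Questions” procedure is concrete enough to code. A non-principal ultrafilter cannot actually be implemented, of course, so the following sketch (mine, purely illustrative) uses a principal ultrafilter $p = \{A : n_0 \in A\}$, for which the bisection dutifully returns the dictator's value $x_{n_0}$:

```python
def principal_ultrafilter(n0):
    """p = {A : n0 in A}; a set is represented by a membership predicate."""
    return lambda A: A(n0)

def p_limit(in_p, x, iterations=50):
    """Locate p-lim x_n in [0,1] by 'Twenty Questions' bisection: at each
    step, ask the electorate whether the limit should exceed the midpoint."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if in_p(lambda n: x(n) > mid):   # does {n : x_n > mid} lie in p?
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = principal_ultrafilter(17)
print(p_limit(p, lambda n: n / (n + 1)))                  # ~ 17/18
print(p_limit(p, lambda n: 1.0 if n % 2 == 0 else 0.0))   # ~ 0.0: voter 17 is odd
```

With a genuinely non-principal $p$ the same loop would compute honest p-limits; deciding membership in such a $p$ is exactly the non-constructive step that the axiom of choice supplies.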
A modification of the above argument also lets us take p-limits of any sequence
in a compact metric space (or slightly more generally, in any compact Hausdorff first-
countable topological space12 ). These p-limits then behave in the expected manner
with respect to operations in those categories, such as composition with continuous
functions or with direct sum. As for unbounded real-valued sequences, one can still
extract a p-limit as long as one works in a suitable compactification of the reals, such
as the extended real line.
The reconstruction of p-limits from the ultrafilter $p$ is also analogous to how, in probability theory, the concept of the expected value of a (say) non-negative random variable $X$ can be reconstructed from the concept of probability via the integration formula $\mathbf{E}(X) = \int_0^\infty \mathbf{P}(X \ge \lambda)\,d\lambda$. Indeed, one can define $p\text{-}\lim x_n$ to be the supremum of all numbers $x$ such that the assertion $x_n > x$ is p-true, or the infimum of all numbers $y$ such that the assertion $x_n < y$ is p-true.
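For a concrete instance, consider the sequence $x_n = (-1)^n$, which has no ordinary limit:

\[
p\text{-}\lim_{n \to \infty} (-1)^n =
\begin{cases}
+1, & \{n : n \text{ even}\} \in p,\\
-1, & \{n : n \text{ odd}\} \in p,
\end{cases}
\]

and by the dichotomy property exactly one of these two cases occurs: the ultrafilter breaks the tie, once and for all.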
We have said all these wonderful things about non-principal ultrafilters, but we
haven’t actually shown that these amazing objects actually exist. There is a good reason
for this - the existence of non-principal ultrafilters requires the axiom of choice (or
some slightly weaker versions of this axiom, such as the boolean prime ideal theorem).
Let’s give two quick proofs of the existence of a non-principal ultrafilter:
Proof 1. Let $q$ be the set of all cofinite subsets of the natural numbers (i.e. sets whose complement is finite). This is clearly a filter which is proper (i.e. it does not contain the empty set $\emptyset$). Since the union of any chain of proper filters is again a proper filter, we see from Zorn's lemma that $q$ is contained in a maximal proper filter $p$. It is not hard to see that $p$ must then be a non-principal ultrafilter.
Proof 2. Consider the Stone-Čech compactification $\beta\mathbf{N}$ of the natural numbers. Since
N is not already compact, there exists an element p of this compactification which
does not lie in N. Now note that any bounded sequence xn on the natural numbers is
a bounded continuous function on N (since N is discrete) and thus, by definition of
β N, extends uniquely to a bounded continuous function on β N, in particular one can
evaluate this function at p to obtain a real number x p . If one then defines p−lim xn := x p
one easily verifies the properties 1-4 of a p-limit, which by the above discussion creates
a non-principal ultrafilter (which by abuse of notation is also referred to as p; indeed,
β N is canonically identifiable with the space of all ultrafilters).
These proofs are short, but not particularly illuminating. A more informal, but per-
haps more instructive, explanation of why non-principal ultrafilters exist can be given
12 Note however that Urysohn’s metrisation theorem implies that any compact Hausdorff first-countable
space is metrisable.
as follows. In the voting theory language, our task is to design a complete and con-
sistent voting system for an infinite number of voters. In the cases where there is
near-consensus, in the sense that all but finitely many of the voters vote one way or an-
other, the decision is clear - go with the option which is preferred by the infinite voting
bloc. But what if an issue splits the electorate with an infinite number of voters on each
side? Then what one has to do is make an arbitrary choice - pick one side to go with
and completely disenfranchise all the voters on the other side, so that they will have
no further say in any subsequent votes. By performing this disenfranchisement, we in-
crease the total number of issues for which our electoral system can reach a consistent
decision; basically, any issue which has the consensus of all but finitely many of those
voters not yet disenfranchised can now be decided upon in a consistent (though highly
unfair) manner. We now continue voting until we reach another issue which splits the
remaining pool of voters into two infinite groups, at which point we have to make an-
other arbitrary choice, and disenfranchise another infinite set of voters. Very roughly
speaking, if one continues this process of making arbitrary choices “ad infinitum”, then
at the end of this transfinite process we eventually exhaust the (uncountable) number of
issues one has to decide, and one ends up13 with the non-principal ultrafilter. (If at any
stage of the process one decided to disenfranchise all but finitely many of the voters,
then one would quickly end up with a principal ultrafilter, i.e. a dictatorship.)
With this informal discussion, it is now rather clear why the axiom of choice (or
something very much like that axiom) needs to play a role in constructing non-principal
ultrafilters. However, one may wonder whether one really needs the full strength of an
ultrafilter in applications; to return once again to the voting analogy, one usually does
not need to vote on every single conceivable issue (of which there are uncountably
many) in order to settle some problem; in practice, there are often only a countable or
even finite number of tricky issues which one needs to put to the ultrafilter to decide
upon. Because of this, many of the results in soft analysis which are proven using
ultrafilters can instead be established using a “poor man’s non-standard analysis” (or
“pre-infinitary analysis”) in which one simply does the “voter disenfranchisement” step
mentioned above by hand. This step is more commonly referred to as the trick of
“passing to a subsequence whenever necessary”, and is particularly popular in the soft
analysis approach to PDE and calculus of variations. For instance, to minimise some
functional, one might begin with a minimising sequence. This sequence might not
converge in any reasonable topology, but it often lies in a sequentially compact set
in some weak topology (e.g. by using the sequential version of the Banach-Alaoglu
theorem), and so by passing to a subsequence one can force the sequence to converge
in this topology. One can continue passing to a subsequence whenever necessary to
force more and more types of convergence, and can even diagonalise using the Arzelà-Ascoli argument to achieve a countable number of convergences at once (this is of course the sequential Banach-Alaoglu theorem in disguise); in many cases, one gets
such a strong convergence that one can then pass to the limit. Most of these types of
13 One should take this informal argument with a grain of salt; it turns out that after one has made an infinite
number of choices, the infinite number of disenfranchised groups, while individually having no further power
to influence elections, can begin having some collective power, basically because property 2 of a filter only
guarantees closure under finite intersections and not infinite intersections, and things begin to get rather
complicated. At this point, I recommend abandoning the informal picture and returning to Zorn’s lemma.
Definition 2.33. Let $X$ be any set. The ultrapower ${}^*X$ of $X$ is defined to be the collection of all sequences $(x_n)$ with entries in $X$, modulo the equivalence that two sequences $(x_n), (y_n)$ are considered equal if they agree p-almost surely (i.e. the statement $x_n = y_n$ is p-true).
formulation of the twin prime conjecture: for every integer N, there exists a prime p > N such that p + 2 is
also prime.
some mathematical manipulations in the complex domain, and then verifying that the
complex-valued answer one gets is in fact real-valued.
Let’s give an example of a non-standard number. Let ω be the non-standard natural
number (n), i.e. the sequence 0, 1, 2, 3, . . . (up to p-almost sure equivalence, of course).
This number is larger than any standard number; for instance, the standard number
5 corresponds to the sequence 5, 5, 5, . . .; since n exceeds 5 for all but finitely many
values of n, we see that n > 5 is p-true and hence ω > 5. More generally, let us say that
a non-standard number is limited if its magnitude is bounded by a standard number,
and unlimited otherwise; thus ω is unlimited. The notion of “limited” is analogous to
the notion of being O(1) discussed earlier, but unlike the O() notation, there are no
implicit quantifiers that require care to manipulate (though as we shall see shortly, the
difficulty has not gone away completely).
One also sees, for instance, that $2\omega$ is larger than the sum of $\omega$ and any limited number, that $\omega^2$ is larger than the product of $\omega$ with any limited number, and so forth.
It is also clear that the sum or product of any two limited numbers is limited. The
number 1/ω has magnitude smaller than any positive standard real number and is thus
considered to be an infinitesimal. Using p-limits, we quickly verify that every limited
number x can be uniquely expressed as the sum of a standard15 number st(x) and an
infinitesimal number x − st(x). The set of standard numbers, the set of limited num-
bers, and the set of infinitesimal numbers are all subrings of the set of all non-standard
numbers. A non-zero number is infinitesimal if and only if its reciprocal is unlimited.
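As a concrete instance of this decomposition, the limited non-standard number $x = (n/(n+1))$ satisfies

\[
x = \Big(\frac{n}{n+1}\Big) = \underbrace{1}_{\mathrm{st}(x)} - \underbrace{\Big(\frac{1}{n+1}\Big)}_{\text{infinitesimal}},
\]

since $(1/(n+1))$ is smaller than any standard $\delta > 0$ for all but finitely many $n$, and hence p-almost surely.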
Now at this point one might be suspicious that one is beginning to violate some of
the axioms of the natural numbers or real numbers, in contradiction to the transfer prin-
ciple alluded to earlier. For instance, the existence of unlimited non-standard natural
numbers seems to contradict the well-ordering property: if one defines S ⊂ ∗ N to be the
set of all unlimited non-standard natural numbers, then this set is non-empty, and so the
well-ordering property should then provide a minimal unlimited non-standard number
inf(S) ∈ ∗ N. But then inf(S) − 1 must be unlimited also, a contradiction. What’s the
problem here?
The problem here is rather subtle: a set of non-standard natural numbers is not quite the same thing as a non-standard set of natural numbers. In symbols: if $2^X := \{A : A \subset X\}$ denotes the power set of $X$, then $2^{{}^*\mathbf{N}} \not\equiv {}^*(2^{\mathbf{N}})$. Let's look more carefully. What is a non-standard set $A \in {}^*(2^{\mathbf{N}})$ of natural numbers? This is basically a sequence $(A_n)$ of sets of natural numbers, one for each voter. Any given non-standard natural number $m = (m_n)$ may belong to $A$ or not, depending on whether the statement $m_n \in A_n$ is p-true or not. We can collect all the non-standard numbers $m$ which do belong to $A$, and call this set $\tilde{A}$; this is thus an element of $2^{{}^*\mathbf{N}}$. The map $A \mapsto \tilde{A}$ from ${}^*(2^{\mathbf{N}})$ to $2^{{}^*\mathbf{N}}$ turns out to be injective (why? this is the transferred axiom of extensionality), but it is not surjective; there are some sets of non-standard natural numbers which are not non-standard sets of natural numbers, and as such the well-ordering principle, when transferred over from standard mathematics, does not apply to them. This subtlety is all rather confusing at first, but a good rule of thumb is that as long as your set (or function, or whatever) is not defined using p-dependent terminology such as “standard”
¹⁵ The map $x \mapsto \mathrm{st}(\log_\omega x)$, by the way, is a homomorphism from the semiring of non-standard positive reals to the tropical semiring $(\mathbf{R}, \min, +)$, and thus encodes the correspondence principle between ordinary rings and tropical rings.
Lemma 2.34 (Non-standard characterisation of differentiability). Let $f: \mathbf{R} \to \mathbf{R}$ be a standard function, and let $x$ and $L$ be standard reals. Then the following are equivalent: (1) $f$ is differentiable at $x$ with derivative $L$; (2) for every non-zero infinitesimal $h$, one has $f(x+h) = f(x) + Lh + o(|h|)$, where $o(|h|)$ denotes a quantity of the form $\varepsilon|h|$ with $\varepsilon$ infinitesimal.
This lemma looks very similar to linear Taylor expansion, but note that there are
no limits involved (despite the suggestive o() notation); instead, we have the concept
of an infinitesimal. The implication of (2) from (1) follows easily from the definition
of derivative, the transfer principle, and the fact that infinitesimals are smaller in mag-
nitude than any positive standard real number. The implication of (1) from (2) can be
seen by contradiction; if f is not differentiable at x with derivative L, then (by the ax-
iom of choice) there exists a sequence hn of standard real numbers going to zero, such
that the Newton quotient ( f (x + hn ) − f (x))/hn is bounded away from L by a standard
positive number. One now forms the non-standard infinitesimal h = (hn ) and obtains a
contradiction to (2).
Using this equivalence, one can now readily deduce the usual laws of differential
calculus, e.g. the chain rule, product rule, and mean value theorem; the proofs are alge-
braically almost identical to the usual proofs (especially if one rewrites those proofs in
o() notation), but one does not need to deal explicitly with epsilons, deltas, and limits
(the ultrafilter has in some sense already done all that for you). The epsilon manage-
ment is done invisibly and automatically; one does not need to keep track of whether
one has to choose epsilon first before selecting delta, or vice versa. In particular, most
16 The situation here is similar to that with the adjective “constructive”; not every function from the con-
structive numbers to the constructive numbers is itself a constructive function, and so forth.
of the existential quantifiers (“... there exists $\varepsilon$ such that ...”) have been eliminated, leaving only the more pleasant universal quantifiers (“for every infinitesimal $h$ ...”).
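For instance, here is a sketch of the product rule in this language; if $f, g$ are standard functions differentiable at a standard $x$, and $h$ is a non-zero infinitesimal, then by Lemma 2.34

\[
f(x+h)g(x+h) = \big(f(x) + f'(x)h + o(|h|)\big)\big(g(x) + g'(x)h + o(|h|)\big) = f(x)g(x) + \big(f'(x)g(x) + f(x)g'(x)\big)h + o(|h|),
\]

since products of limited numbers with infinitesimals are infinitesimal; applying Lemma 2.34 in the reverse direction then gives $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$.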
There is one caveat though: Lemma 2.34 only works when x is standard. For in-
stance, consider the standard function $f(x) := x^2 \sin(1/x^3)$, with the convention $f(0) = 0$. This function is everywhere differentiable, and thus extending to non-standard numbers we have $f(x+h) = f(x) + h f'(x) + o(|h|)$ for all standard $x$ and infinitesimal $h$.
However, the same claim is not true for arbitrary non-standard x; consider for instance
what happens if one sets x = −h.
One can also obtain an analogous characterisation of the Riemann integral: a standard function $f$ is Riemann integrable on an interval $[a,b]$ with integral $A$ if and only if one has
\[
A = \sum_{1 \le i < n} f(x_i^*)(x_{i+1} - x_i) + o(1)
\]
for every non-standard partition $a = x_1 \le x_2 \le \ldots \le x_n = b$ with $\sup_{1 \le i < n}(x_{i+1} - x_i)$ infinitesimal, and every choice of sample points $x_i^* \in [x_i, x_{i+1}]$. One can then reprove the usual basic results,
such as the fundamental theorem of calculus, in this manner; basically, the proofs are
the same, but the limits have disappeared, being replaced by infinitesimals.
Lemma 2.35. Let $f$ and $g$ be standard functions from $\mathbf{N}$ to the positive reals. Then the following two statements are equivalent:

1. $f(m) = O(g(m))$ in the standard sense, i.e. there exists a standard positive real constant $C$ such that $|f(m)| \le C g(m)$ for all standard natural numbers $m$.

2. $|f(m)|/g(m)$ is limited for every non-standard natural number $m$ (i.e. $f(m) = O(g(m))$ in the non-standard sense that we define below).
This lemma is proven similarly to Lemma 2.34; the implication of (2) from (1) is obvious from the transfer principle, while the implication of (1) from (2) is again by contradiction, converting a sequence of increasingly bad counterexamples to (1) into a counterexample to (2). Lemma 2.35 is also a special case of the “overspill principle”
counterexample to (2). Lemma 2.35 is also a special case of the “overspill principle”
in non-standard analysis, which asserts that a non-standard set of numbers which con-
tains arbitrarily large standard numbers, must also contain an unlimited non-standard
number (thus the large standard numbers “spill over” to contain some non-standard
numbers). The proof of the overspill principle is related to the (specious) argument
discussed above in which one tried to derive a contradiction from the set of unlimited
natural numbers, and is left as an exercise.
¹⁷ In some texts, the notation $f = O(g)$ only requires that $|f(m)| \le C g(m)$ for all sufficiently large $m$. The non-standard counterpart to this is the claim that $|f(m)|/g(m)$ is limited for every unlimited non-standard $m$.
Because of the above lemma, it is now natural to define the non-standard coun-
terpart of the O() notation: if x, y are non-standard numbers with y positive, we say
that x = O(y) if |x|/y is limited. Then the above lemma says that the standard and
non-standard $O()$ notations agree for standard functions of one variable. Note how the non-standard version of the $O()$ notation does not have the existential quantifier (“... there exists $C$ such that ...”), and so the epsilon management is lessened. If we let $\mathcal{L}$ denote the subring of ${}^*\mathbf{R}$ consisting of all limited numbers, then the claim $x = y + O(z)$ can be rewritten as $x = y \bmod z\mathcal{L}$; thus we see how the $O()$ notation can be viewed algebraically as the operation of quotienting the (non-standard) real numbers by various dilates of the subring $\mathcal{L}$.
One can convert many other order-of-magnitude notions to non-standard notation.
For instance, suppose one is performing some standard hard analysis involving some
large parameter N > 1, e.g. one might be studying a set of N points in some group
or Euclidean space. One often wants to distinguish between quantities which are of
polynomial size in N and those which are super-polynomial in size; for instance, these
N points might lie in a finite group G, where G has size much larger than N, and one’s
application is such that any bound which depends on the size of G will be worthless.
Intuitively, the set of quantities which are of polynomial size in N should be closed
under addition and multiplication and thus form a sort of subring of the real numbers,
though in the standard universe this is difficult to formalise rigorously. But in non-
standard analysis, it is not difficult: we make $N$ non-standard (and $G$ too, in the above example), and declare any non-standard quantity $x$ to be of polynomial size if we have $x = O(N^{O(1)})$, or equivalently if $\log(1+|x|)/\log N$ is limited. We can then legitimately
form the set P of all non-standard numbers of polynomial size, and this is in fact a
subring of the non-standard real numbers; as before, though, we caution that P is not
a non-standard set of reals, and in particular is not a non-standard subring of the reals.
But since P is a ring, one can then legitimately apply whatever results from ring the-
ory one pleases to P, bearing in mind though that any sets of non-standard objects one
generates using that theory may not necessarily be non-standard objects themselves. At
the end of the day, we then use the transfer principle to go back to the original problem
in which N is standard.
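For instance, the closure of $P$ under multiplication is a one-line computation in this notation:

\[
x = O(N^{C_1}),\ y = O(N^{C_2}) \;\Longrightarrow\; xy = O(N^{C_1 + C_2}) = O(N^{O(1)})
\]

for limited $C_1, C_2$; no quantifiers over standard constants need to be managed explicitly.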
As a specific example of this type of thing from my own experience, in [TaVu2007],
we had a large parameter n, and had at some point to introduce the somewhat fuzzy no-
tion of a “highly rational number”, by which we meant a rational number $a/b$ whose numerator and denominator were both at most $n^{o(n)}$ in magnitude. Such numbers looked
like they were forming a field, since the sum, difference, product, or quotient of two
highly rational numbers was again highly rational (but with a slightly different rate of
decay in the o() notation). Intuitively, one should be able to do any algebraic manipula-
tion on highly rational numbers which is legitimate for true fields (e.g. using Cramer’s
rule to invert a non-singular matrix) and obtain an output which is also highly rational,
as long as the number of algebraic operations one uses is O(1) rather than, say, O(n).
We did not actually formalise this rigorously in our standard notation, and instead
resorted to informal English sentences to describe this; but one can do everything per-
fectly rigorously in the non-standard setting by letting n be non-standard, and defining
the field $F$ of non-standard rationals $a/b$ where $a, b = O(n^{o(n)})$; $F$ is genuinely a field of
non-standard rationals (but not a non-standard field of rationals), and so using Cramer’s
rule here (but only for matrices of standard size) would be perfectly legitimate. (We
did not actually write our argument in this non-standard manner, keeping everything in
the usual standard hard analysis setting, but it would not have been difficult to rewrite
the argument non-standardly, and there would be some modest simplifications.)
an even larger system ${}^{**}\mathbf{R}$, whose elements can be identified (modulo ${}^*p$-almost sure equivalence) with non-standard sequences $(x_n)$ of non-standard numbers in ${}^*\mathbf{R}$ (where $n$ now ranges over the non-standard natural numbers ${}^*\mathbf{N}$); one could view these as “doubly non-standard numbers”. This gives us some “even smaller” infinitesimals,
such as the “doubly infinitesimal” number $\eta_1$ given by the non-standard sequence $1, \eta_0, \eta_0^2, \eta_0^3, \ldots$. This quantity is smaller than any standard or (singly) non-standard number, in particular infinitesimally smaller than any positive quantity depending (via standard or singly non-standard operations) on standard or singly non-standard constants such as $1$ or $\eta_0$. For instance, it is smaller than $1/A(\lfloor 1/\eta_0 \rfloor)$, where $A$ is the Ackermann function, since the sequence that defines $\eta_1$ is indexed over the non-standard natural numbers and $\eta_0^n$ will drop below $1/A(\lfloor 1/\eta_0 \rfloor)$ for sufficiently large non-standard $n$.
One can continue in this manner, creating a triply infinitesimal quantity η2 which
is infinitesimally smaller than anything depending on 1, η0 , or η1 , and so forth. Indeed
one can iterate this construction an absurdly large number of times, though in most ap-
plications one only needs an explicitly finite number of elements from this hierarchy.
Having this hierarchy of infinitesimals, each one of which is guaranteed to be infinites-
imally small compared to any quantity formed from the preceding ones, is quite useful:
it lets one avoid having to explicitly write a lot of epsilon-management phrases such as
“Let η2 be a small number (depending on η0 and η1 ) to be chosen later” and “... assum-
ing η2 was chosen sufficiently small depending on η0 and η1 ”, which are very frequent
in hard analysis literature, particularly for complex arguments which involve more than
one very small or very large quantity. (The paper [CoKeStTaTa2008] referred to earlier
is of this type.)
2.5.6 Conclusion
I hope I have shown that non-standard analysis is not a totally “alien” piece of mathe-
matics, and that it is basically only “one ultrafilter away” from standard analysis. Once
one selects an ultrafilter, it is actually relatively easy to swap back and forth between the standard universe and the non-standard one (or to doubly non-standard universes, etc.).
This allows one to rigorously manipulate things such as “the set of all small numbers”,
or to rigorously say things like “η1 is smaller than anything that involves η0 ”, while
greatly reducing epsilon management issues by automatically concealing many of the
quantifiers in one’s argument. One has to take care as to which objects are standard,
non-standard, sets of non-standard objects, etc., especially when transferring results
between the standard and non-standard worlds, but as long as one is clearly aware of
the underlying mechanism used to construct the non-standard universe and transfer
back and forth (i.e. as long as one understands what an ultrafilter is), one can avoid dif-
ficulty. The main drawbacks to the use of non-standard notation (apart from the fact that it tends to scare away some of your audience) are that a certain amount of notational setup is required at the beginning, and that the bounds one obtains at the end are rather inef-
fective (though, of course, one can always, after painful effort, translate a non-standard
argument back into a messy but quantitative standard argument if one desires).
2.5.7 Notes
This article was originally posted on Jun 25, 2007 at
terrytao.wordpress.com/2007/06/25
Theo Johnson-Freyd noted that the use of ultrafilters was not completely identical
to the trick of passing to subsequences whenever necessary; for instance, there exist
ultrafilters with the property that not every sequence in a compact set (e.g. [0, 1]) admits
a large convergent subsequence. (Theo learned about this observation from Ken Ross.)
Eric Wofsey, answering a question of Theo, pointed out that, thanks to a cardinality argument, there exists a pair of non-principal ultrafilters which are not permutations of
each other, despite the fact that one non-principal ultrafilter tends to be just as good
as any other for most applications. On the other hand, if one assumes the continuum
hypothesis, then any two ultrapowers (chosen using different ultrafilters of N) of a
structure with at most the cardinality of the continuum and with a countable language
are isomorphic (meaning that there is a bijection between the ultrapowers that preserves
all interpretations of the language symbols), although in many applications in non-
standard analysis one needs to take extensions of uncountably many objects and so this
equivalence does not always apply.
Alejandro Rivero pointed out Connes’ non-commutative variant of non-standard
analysis, which used compact operators for infinitesimals and avoided the use of ultra-
filters (or the axiom of choice), though as a consequence the transfer principle was not
fully present.
Michael Greinecker pointed out that an explicit link between Arrow's theorem
and ultrafilters appears in the papers [KiSo1972], [Ha1976].
Thanks to Liu Xiao Chuan for corrections.
2. Algebraic structures, such as group, ring, or field structures, and everything else
that comes from those categories (e.g. subgroups, homomorphisms, involutions,
etc.); and
Now, polynomial rings such as Z[t] or R[t] are a bit “too big” to serve as models
for Z or R (unless one adjoins some infinitesimals, as in Section 2.5, but that’s an-
other story), as they have one more dimension. One can get a more accurate model by
considering the decimal representation again, which identifies natural numbers with polynomials over the space of digits {0, 1, . . . , 9}. This space is not closed under addition
(which is what causes spillover in the first place); but we can remedy this by replacing
this space of digits with the cyclic group Z/10Z. This gives us the model (Z/10Z)[t]
for the integers; this is the decimal representation without the operation of carrying.
If we follow the usual decimal notation and identify polynomials in (Z/10Z)[t] with
strings of digits in the usual manner (e.g. identifying 3t + 2 with 32) then we obtain a
number system which is similar, but not quite identical, to the integers. For instance,
66 + 77 now equals 33 rather than 143; 25 ∗ 4 now equals 80 rather than 100; and so
forth. Note that unlike the natural numbers, the space of polynomials is already closed
under negation and so there is no need to introduce negative numbers; for instance, in
this system we have −12 = 98. I’ll refer to (Z/10Z)[t] as the “base 10 dyadic” model
for the integers (somewhat annoyingly, the term “10-adic” is already taken to mean
something slightly different).
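To make the model concrete, here is a minimal Python sketch of "base 10 dyadic" arithmetic (my own illustration; the function names are not from the text), representing a polynomial in (Z/10Z)[t] by its list of digits, least significant first:

```python
def add(a, b):
    """Digit-wise addition mod 10 -- decimal addition without carrying."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % 10 for x, y in zip(a, b)]

def mul(a, b):
    """Polynomial multiplication with coefficients mod 10 (no carrying)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % 10
    return out

def neg(a):
    """Negation digit by digit; no negative numbers are needed."""
    return [(10 - x) % 10 for x in a]

def digits(n):   # 66 -> [6, 6]
    return [int(c) for c in str(n)][::-1]

def show(p):     # [3, 3] -> "33"
    return "".join(str(c) for c in p[::-1])

print(show(add(digits(66), digits(77))))   # 33, not 143
print(show(mul(digits(25), digits(4))))    # 80, not 100
print(show(neg(digits(12))))               # 98, i.e. -12 = 98 in this model
```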
There is also a base 10 dyadic model for the real numbers, in which we allow
infinitely many negative powers of t but only finitely many positive powers in t; in
other words, the model is (Z/10Z)((1/t)), the ring of formal Laurent series in 1/t.
This ring again differs slightly from the reals; for instance, 0.999 . . . is now no longer
equal to 1.000 . . . (in fact, they differ by 1.111 . . .). So the decimal notation maps (Z/10Z)((1/t)) onto the positive real axis R+, but this map has a small amount of non-injectivity.
The base 10 dyadic models for the reals and integers are not particularly accurate,
due to the presence of zero divisors in the underlying base ring Z/10Z. For instance,
we have 2 × 5 = 0 in this model. One can do a lot better by working over a finite field
F, such as the field F2 of two elements. This gives us dyadic models F[t] and F((1/t))
for the integers and reals respectively which turn out to be much closer analogues than
the base 10 model. For instance, F[t], like the integers, is a Euclidean domain, and
F((1/t)) is a field. (In the binary case F = F2 , the addition operation is just bitwise
XOR, and multiplication is bitwise convolution.) We can also model many other non-
dyadic objects, as the following table illustrates:
Non-dyadic                            Dyadic
Integers Z                            Polynomials F[t]
Rationals Q                           Rational functions F(t)
Reals R                               Laurent series F((1/t))
Unit circle R/Z                       F((1/t))/F[t] ≡ (1/t)F[1/t]
|F|^d · Z                             t^d · F[t]
Cyclic group Z/|F|^d · Z              Vector space F^d
Finite field Z/p · Z                  Finite field F[t]/p(t) · F[t]
Absolute value                        (Exponential of) degree
Plane wave                            Walsh function
Wavelet                               Haar wavelet
Gaussian                              Step function
Ball                                  Dyadic interval
Heat operators                        Martingale conditional expectations
Band-limited                          Locally constant
Interval / arithmetic progression     Subspace / subgroup
Bohr set                              Hyperplane
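In the binary case the first few rows of this table can be explored directly on machine integers: a polynomial in F2[t] is a bitmask (bit i holding the coefficient of t^i), addition is XOR, and multiplication is carry-less convolution. A small sketch of my own:

```python
# F_2[t] on machine integers: bit i of n is the coefficient of t^i.

def f2_add(a, b):
    return a ^ b              # coefficient-wise addition mod 2 is XOR

def f2_mul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a          # add a * t^i whenever bit i of b is set
        a <<= 1
        b >>= 1
    return out

# (t^3 + t + 1)(t + 1) = t^4 + t^3 + t^2 + 1:
print(bin(f2_mul(0b1011, 0b11)))   # 0b11101
```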
Recall that we can define the absolute value (or norm) of an integer n as the index of the subgroup ⟨n⟩ of the integers. Exactly the same definition can be applied to the dyadic model F[t] of the integers; the absolute value of an element n ∈ F[t] can then be seen to equal |n| = |F|^deg(n) ∈ Z+, where deg(n) is the degree of t in n (with the convention that 0 has a degree of −∞ and thus an absolute value of 0). For instance, in the binary case, t^3 + t + 1 (or 1011) has a norm of 8. Like the absolute value on the integers, the absolute value on the dyadic model F[t] of the integers is multiplicative and obeys the triangle inequality, giving rise to a metric on F[t] by the usual formula d(n, m) := |n − m|. In fact, we have something better than a metric, namely an ultrametric; in the dyadic world, the triangle inequality improves to the ultrametric inequality d(n, m) ≤ max(d(n, k), d(k, m)). One can then uniquely extend this absolute value multiplicatively to the dyadic model F((1/t)) of the reals, where it is given by the same formula |n| = |F|^deg(n) ∈ R+, with deg(n) now understood to be the highest exponent of t which appears in the expansion of n (or −∞ if no such exponent appears). Thus for instance in the binary case 1/t + 1/t^2 + 1/t^3 + . . . (or 0.111 . . .) has a norm of 1/2. Just as with the real line, this absolute value turns the dyadic real line F((1/t)) into a complete metric space. The metric then generates balls B(x, r) := {y ∈ F((1/t)) : |y − x| < r}, which in the binary case are identifiable with dyadic intervals. The fact that we have an ultrametric instead of a metric means that the balls enjoy a very useful nesting property, which is unavailable in the non-dyadic setting: if two balls intersect, then the larger one must necessarily contain the smaller one.
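Continuing the bitmask sketch above, the norm |n| = |F|^deg(n) and the ultrametric inequality are easy to check mechanically (again my own illustration):

```python
import random

def f2_norm(n):
    """|n| = 2^deg(n) for n in F_2[t] stored as a bitmask; |0| = 0."""
    return 0 if n == 0 else 1 << (n.bit_length() - 1)

print(f2_norm(0b1011))    # t^3 + t + 1 has norm 8

# In F_2[t], subtraction is the same as addition, i.e. XOR; check the
# ultrametric inequality |n - m| <= max(|n - k|, |k - m|) on random triples.
for _ in range(1000):
    n, m, k = (random.randrange(1 << 8) for _ in range(3))
    assert f2_norm(n ^ m) <= max(f2_norm(n ^ k), f2_norm(k ^ m))
```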
On the other hand, most of the “one-dimensional” structure of the real line is lost
when one passes to the dyadic model. For instance, the dyadic real line is still locally
compact, but not locally connected; the topology is instead locally that of a Cantor
space. There is no natural notion of order on the dyadic integers or real line, and the
metric is non-Archimedean. Related to this, mathematical induction no longer applies
to the dyadic integers. Nevertheless, and somewhat counter-intuitively, one can go
remarkably far in mimicking many features of the integers and real numbers without
using any one-dimensional structure. I’ll try to illustrate this in a number of contexts.
For instance, one can define a character e_p : F((1/t)) → C (taking F = F_p = Z/pZ, say, and writing e(θ) := e^{2πiθ}) by the formula

e_p( ∑_{j=−∞}^{d} a_j t^j ) := e(a_{−1}/p)

(which would be a square wave in the binary case), and then define the Fourier transform using almost exactly the same formula as in the non-dyadic case, namely

f̂(ξ) := ∫_{F((1/t))} f(x) e_p(−xξ) dx
for all well-behaved f : F((1/t)) → C. One can then show that this dyadic Fourier transform (known as the Walsh-Fourier transform in the binary case) enjoys all the usual algebraic properties that the non-dyadic Fourier transform does - for instance, it interacts with convolution, translation, modulation, and dilation in exactly the same
way as its non-dyadic counterpart, and also enjoys a perfect analogue of Plancherel’s
theorem. (It also has a more pleasant fast Fourier transform algorithm than its non-
dyadic counterpart, as one no longer needs the additional step of taking care of the
spillover from one scale to the next.) In fact, the dyadic structure makes the harmonic
analysis on F((1/t)) somewhat simpler than that on R, because of the ability to have
perfect phase space localisation. In the real line, it is well-known that a function and
its Fourier transform cannot simultaneously be compactly supported without vanishing
completely (because if a function was compactly supported, then its Fourier transform
would be a real analytic function, which cannot be compactly supported without van-
ishing completely, due to analytic continuation). However, analytic continuation is
a highly “one-dimensional” property (among other things, it exploits connectedness).
Furthermore, it is not a robust property, and it is possible to have functions f on the real
line such that f and its Fourier transform are “almost compactly supported”, or more
precisely rapidly decreasing; the Gaussian function f (x) = exp(−π|x|2 ), which is its
own Fourier transform, is a particularly good example. In the dyadic world, the ana-
logue of the Gaussian function is the step function 1B(0,1) , which is also its own Fourier
transform, and thus demonstrates that it is possible for a function and its Fourier trans-
form to both be compactly supported. More generally, it is possible for a function
f : F((1/t)) → C to be supported on a dyadic interval I, and for its Fourier trans-
form to be supported on another dyadic interval J, as long as the uncertainty principle
|I||J| ≥ 1 is respected. One can use these "Walsh wave packets" (which include the Haar wavelets and Rademacher functions as special cases) to elegantly and efficiently perform time-frequency analysis in the dyadic setting. This has proven to be an invaluable model to work with before tackling the more interesting time-frequency problems in the non-dyadic setting (such as those relating to Carleson's theorem[Ca1966], or to various multilinear singular integrals), as many technical headaches (such as those involving "Schwartz tails") are absent in the dyadic setting, while the time-frequency combinatorics (which is really the heart of the matter) stays largely intact19. See [Pe2001], [Ta2001].
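In the binary case the transform can be computed by the same butterfly recursion as the fast Fourier transform, but with no twiddle factors and hence no spillover between scales. The following sketch (my own, with the 1/N normalisation placed on the forward transform) verifies Plancherel and illustrates the perfect phase space localisation: the indicator of a subgroup transforms to a multiple of the indicator of the orthogonal subgroup.

```python
import numpy as np

def walsh(f):
    """Walsh-Fourier transform on F_2^n: hat f(xi) = (1/N) sum_x f(x)(-1)^(x.xi)."""
    f = np.asarray(f, dtype=float).copy()
    n = len(f)                 # must be a power of two
    h = 1
    while h < n:               # standard in-place butterfly, no twiddle factors
        for i in range(0, n, 2 * h):
            a = f[i:i + h].copy()
            b = f[i + h:i + 2 * h].copy()
            f[i:i + h], f[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return f / n

f = np.random.rand(16)
F = walsh(f)
# Plancherel: sum_xi |hat f(xi)|^2 = (1/N) sum_x |f(x)|^2
assert np.allclose((F ** 2).sum(), (f ** 2).mean())

# Indicator of the subgroup {0,1,2,3} of F_2^4 transforms to a multiple of
# the indicator of the orthogonal subgroup {0,4,8,12}: perfect localisation.
g = np.array([1.0] * 4 + [0.0] * 12)
print(walsh(g))                # 0.25 at indices 0, 4, 8, 12, zero elsewhere
```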
In some cases one can in fact deduce a non-dyadic harmonic analysis result di-
rectly from a dyadic one via some sort of averaging argument (or the 1/3 translation trick of Michael Christ[Ch1988], which is the observation that every non-dyadic interval (in, say, [0, 1]) is contained either in a dyadic interval of comparable size, or in the 1/3 translation of a dyadic interval of comparable size). In particular the "Bellman func-
tion” approach to harmonic analysis often proceeds via this averaging, as the Bellman
function method requires a recursive dyadic structure (or a continuous heat kernel-type
structure) in order to work properly. In general, though, the dyadic argument only
serves as a model “road map” for the non-dyadic argument, rather than a formal com-
ponent. There are only a few cases known where a dyadic result in harmonic analysis
has not shown the way towards proving the non-dyadic analogue; one of these excep-
tions is the problem of establishing a nonlinear analogue of Carleson’s theorem, which
was achieved in the dyadic setting[MuTaTh2003b] but remains open in the non-dyadic
setting.
19 To give just one example, the Shannon sampling theorem collapses in the dyadic setting to the trivial statement that a function which is locally constant on dyadic intervals of length 2^−n can be reconstructed exactly by sampling that function at intervals of 2^−n.
{0} = F^0 ⊂ F^1 ⊂ . . . ⊂ F^n
to convert the dyadic arguments to the non-dyadic setting (although the converse step
of converting non-dyadic arguments to dyadic ones is usually rather straightforward).
One notable exception here is the parity problem (Section 1.10), which has resisted progress in both dyadic and non-dyadic settings.
Let’s now turn to the Riemann hypothesis. Classically, number theory has focused
on the multiplicative structure of the ring of integers Z. After factoring out the group of
units {−1, +1}, we usually restrict attention to the positive integers Z+ . In the dyadic
model, we study the multiplicative structure of the ring F[t] of polynomials for some
finite field F. After factoring out the group F × of units, we can restrict attention to
the monic polynomials F[t]m . As the ring of polynomials is a Euclidean domain, it
has unique factorisation, and in particular every monic polynomial can be expressed
uniquely (up to permutation) as the product of irreducible monic polynomials, which
we shall call prime polynomials. We can analyse the problem of counting primes in F[t]
by using zeta functions, in complete analogy with the integer case. The Riemann zeta
function is of course given by
ζ(s) := ∑_{n∈Z+} 1/n^s

(for Re(s) > 1), and we introduce the analogous zeta function

ζ_{F[t]}(s) := ∑_{n∈F[t]_m} 1/|n|^s.
To count primes, we recall the classical identity

log n = ∑_{d|n} Λ(d),
where Λ(d) is the von Mangoldt function, defined to equal log p when d is the power
of a prime p and 0 otherwise. Taking the Mellin transform of this identity, we conclude
that
−ζ′(s)/ζ(s) = ∑_{n∈Z+} Λ(n)/n^s,
which is the fundamental identity linking the zeroes of the zeta function to the dis-
tribution of the primes. We can do the same thing in the dyadic case, obtaining the
identity
−ζ′_{F[t]}(s)/ζ_{F[t]}(s) = ∑_{n∈F[t]_m} Λ_{F[t]}(n)/|n|^s,    (2.6)
where the von Mangoldt function ΛF[t] (n) for F[t] is defined as log |p| when n is the
power of a prime polynomial p, and 0 otherwise.
So far, the dyadic and non-dyadic situations are very closely analogous. But now
we can do something special in the dyadic world: we can compute the zeta function
explicitly by summing by degree. Indeed, we have
ζ_{F[t]}(s) = ∑_{d=0}^{∞} ∑_{n∈F[t]_m : deg(n)=d} 1/|F|^{ds}.

The number of monic polynomials of degree d is |F|^d. Summing the geometric series, we obtain an exact formula for the zeta function:

ζ_{F[t]}(s) = 1/(1 − |F|^{1−s})    (for Re(s) > 1).

In particular, the Riemann hypothesis for F[t] is a triviality - there are clearly no zeroes
whatsoever! Inserting this back into (2.6) and comparing coefficients, one soon ends
up with an exact prime number theorem for F[t]:

∑_{e|d} e · π(e) = |F|^d    for all d ≥ 1,

where π(e) denotes the number of prime polynomials of degree e. This quickly implies that the number of prime polynomials of degree d is (1/d)|F|^d + O(|F|^{d/2}). (One can generalise the above analysis to other varieties over finite fields,
leading ultimately to the (now-proven) Weil conjectures, which include the “Riemann
hypothesis for function fields”.)
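The exact prime number theorem for F[t] is easy to exercise numerically; the following short sketch (my own) uses the identity ∑_{e|d} e · π(e) = |F|^d to tabulate the number π(d) of prime polynomials of each degree, and compares it with the main term |F|^d/d:

```python
# Count prime (monic irreducible) polynomials over F by degree, using the
# exact prime number theorem  sum_{e | d} e * pi(e) = |F|^d  recursively.

def prime_poly_counts(F, dmax):
    pi = {}
    for d in range(1, dmax + 1):
        pi[d] = (F ** d - sum(e * pi[e] for e in range(1, d) if d % e == 0)) // d
    return pi

pi = prime_poly_counts(2, 10)
for d, c in pi.items():
    print(d, c, 2 ** d / d)   # c is within O(|F|^{d/2}) of |F|^d / d
# e.g. pi(1) = 2, pi(2) = 1, pi(3) = 2, pi(4) = 3 over F_2.
```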
Another example of a problem which is hard in non-dyadic number theory but triv-
ial in dyadic number theory is factorisation. In the integers, it is not known whether a number which is n digits long can be factored (probabilistically) in time polynomial in n (the best known algorithm for large n, the number field sieve, takes a little longer than exp(O(n^{1/3})) time, according to standard heuristics); indeed, the presumed hardness of factoring underlies many popular cryptographic protocols such as RSA.
However, in F[t] with F fixed, a polynomial f of degree n can be factored (probabilis-
tically) in time polynomial in n by the following three-stage algorithm:
1. Compute the gcd of f and its derivative f′ using the Euclidean algorithm (which is polynomial time in the degree). This locates all the repeated factors of f, and lets one quickly reduce to the case when f is squarefree. (This trick is unavailable in the integer case, due to the lack of a good notion of derivative.)
2. Observe (from Cauchy’s theorem) that for any prime polynomial g of degree d,
d d
we have t |F| = t mod g. Thus the polynomial t |F| − t contains the product of
all the primes of this degree (and of all primes of degree dividing d); indeed, by
the exact prime number theorem and a degree count, these are the only possible
d d
factors of t |F| − t. It is easy to compute the remainder of t |F| − t modulo f in
d
polynomial time, and then one can compute gcd of f with t |F| − t in polyno-
mial time also. This essentially isolates the prime factors of a fixed degree, and
quickly lets one reduce to the case when f is the product of distinct primes of the
same degree d. (Here we have exploited the fact that there are many primes with
exactly the same norm - which is of course highly false in the integers. Similarly
in Step 3 below.)
3. Now we apply the Cantor-Zassenhaus algorithm. Let us assume that |F| is odd (the case |F| = 2 can be treated by a modification of this method). By computing g^{(|F|^d−1)/2} mod f for randomly selected g, we can generate some random square roots a of 1 modulo f (thanks to Fermat's little theorem and the Chinese remainder theorem; there is also a small chance that we generate a non-invertible element, but this is easily dealt with). These square roots a will be either +1 or −1 modulo each of the prime factors of f. If we take the gcd of f with a + 1 or a − 1 we have a high probability of splitting up the prime factors of f; doing this a few times one soon isolates all the prime factors separately. (A toy implementation of stages 2 and 3 is sketched below.)
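Here is the promised toy implementation (my own sketch, written for clarity rather than efficiency, and not code from the text): it factors a monic squarefree polynomial over F_p for an odd prime p, using stage 2 (gcds with t^{p^d} − t) to split by degree and stage 3 (Cantor-Zassenhaus) to split within a degree.

```python
# Polynomials over F_p are tuples of coefficients, constant term first.
import random

P = 7  # an odd prime

def trim(a):
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return tuple(a)

def add(a, b):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % P
                 for i in range(n)])

def mul(a, b):
    if not a or not b:
        return ()
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return trim(out)

def pdivmod(a, m):                      # long division: a = q*m + r
    a, q = list(a), [0] * max(1, len(a) - len(m) + 1)
    inv = pow(m[-1], P - 2, P)          # Fermat inverse of leading coefficient
    while len(a) >= len(m) and trim(a):
        c, s = a[-1] * inv % P, len(a) - len(m)
        q[s] = c
        for i, y in enumerate(m):
            a[s + i] = (a[s + i] - c * y) % P
        a = list(trim(a))
    return trim(q), trim(a)

def gcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    inv = pow(a[-1], P - 2, P)
    return trim([x * inv % P for x in a])   # normalise to be monic

def powmod(g, e, m):                    # g^e mod m by squaring
    r, g = (1,), pdivmod(g, m)[1]
    while e:
        if e & 1:
            r = pdivmod(mul(r, g), m)[1]
        g, e = pdivmod(mul(g, g), m)[1], e >> 1
    return r

def split_equal_degree(f, d):
    """Cantor-Zassenhaus: factor monic squarefree f, all of whose prime
    factors have degree d, using gcds with random square roots of 1."""
    if len(f) - 1 == d:
        return [f]
    while True:
        g = tuple(random.randrange(P) for _ in range(len(f) - 1))
        a = powmod(g, (P ** d - 1) // 2, f)    # = +-1 mod each prime factor
        h = gcd(f, add(a, (P - 1,))) if add(a, (P - 1,)) else f   # gcd(f, a-1)
        if 0 < len(h) - 1 < len(f) - 1:
            rest = pdivmod(f, h)[0]
            return split_equal_degree(h, d) + split_equal_degree(rest, d)

def factor_squarefree(f):
    """Stage 2 (distinct-degree splitting) followed by stage 3."""
    out, d, x = [], 1, (0, 1)                  # x is the polynomial t
    while len(f) - 1 >= 2 * d:
        b = add(powmod(x, P ** d, f), (0, P - 1))   # t^(p^d) - t mod f
        h = gcd(f, b) if b else f
        if len(h) > 1:
            out += split_equal_degree(h, d)
            f = pdivmod(f, h)[0]
        d += 1
    return out + ([f] if len(f) > 1 else [])

# (t^2 + 1)(t + 2)(t + 3) over F_7:
f = mul(mul((1, 0, 1), (2, 1)), (3, 1))
print(sorted(factor_squarefree(f)))    # [(1, 0, 1), (2, 1), (3, 1)]
```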
2.6.7 Conclusion
As the above whirlwind tour hopefully demonstrates, dyadic models for the integers,
reals, and other “linear” objects show up in many different areas of mathematics. In
some areas they are an oversimplified and overly easy toy model; in other areas they
get at the heart of the matter by providing a model in which all irrelevant technicalities
are stripped away; and in yet other areas they are a crucial component in the analysis
of the non-dyadic case. In all of these cases, though, it seems that the contribution that
dyadic models provide in helping us understand the non-dyadic world is immense.
2.6.8 Notes
This article was originally posted on July 27, 2007 at
terrytao.wordpress.com/2007/07/27
John Armstrong noted some analogies between the cocycles which make the non-
dyadic world more complicated than the dyadic world, and the commutators which
make the non-commutative world more complicated than the commutative world, though
the former seem to be more “nilpotent” than the latter.
Anyway, Danica’s book has already been reviewed in several places, and there’s
not much more I can add to what has been said elsewhere. I thought however that I
could talk about another of Danica’s contributions to mathematics, namely her paper
“Percolation and Gibbs states multiplicity for ferromagnetic Ashkin-Teller models on
Z2 ”[ChMcKWi1998], joint with Brandy Winn and my colleague Lincoln Chayes. This
paper is noted from time to time in the above-mentioned publicity, and its main result
is sometimes referred to there as the “Chayes-McKellar-Winn theorem”, but as far as I
know, no serious effort has been made to explain exactly what this theorem is, or the wider context in which the result is placed. So I'll give it a shot; this gives me an oppor-
tunity to talk about some beautiful topics in mathematical physics, namely statistical
mechanics, spontaneous magnetisation, and percolation.
set {x′ : H(x′) = E − H(x)}. Now as the outside system S′ is very large, this set will be enormous, and presumably very complicated as well; however, the key point is that it only depends on E and x through the quantity E − H(x). Indeed, we conclude that the canonical ensemble distribution of microstates at x is proportional to Ω(E − H(x)), where Ω(E′) is the number of microstates of the outside system S′ with energy E′.
Now it seems that it is hopeless to compute Ω(E′) without knowing exactly how the system S′ works. But, in general, the number of microstates in a system tends to grow exponentially in the energy in some fairly smooth manner, thus we have Ω(E′) = exp(F(E′)) for some smooth increasing function F of E′ (although in some rare cases involving population inversion, F may be decreasing). Now, we are assuming S′ is much larger than S, so E should be very large compared with H(x). In such a regime, we expect Taylor expansion to be reasonably accurate, thus Ω(E − H(x)) ≈ exp(F(E) − βH(x)), where β := F′(E) is the derivative of F at E (or equivalently, the log-derivative of Ω at E); note that β is positive by assumption. The quantity exp(F(E)) doesn't depend on x, and so we conclude that the canonical ensemble is proportional to counting measure, multiplied by the function exp(−βH(x)). Since probability distributions have total mass 1, we can in fact describe the probability P(x) of the canonical ensemble being at x exactly as
P(x) = (1/Z) e^{−βH(x)},    where    Z := ∑_x e^{−βH(x)}.
The canonical ensemble is thus specified completely except for a single parameter β > 0, which depends on the external system S′ and on the total energy E. But if we take for granted the laws of thermodynamics (particularly the zeroth law), and compare S′ with an ideal gas, we can obtain the relationship β = 1/kT, where T is the temperature of S′ and k is Boltzmann's constant. Thus the canonical ensemble of a system S is completely determined by the temperature and by the energy functional H.
The underlying transition graph and transition probabilities, while necessary to ensure
that one eventually attains this ensemble, do not actually need to be known in order to
compute what this ensemble is, and can now (amazingly enough) be discarded. (More
generally, the microscopic laws of physics, whether they be classical or quantum, can
similarly be discarded almost completely at this point in the theory of statistical me-
chanics; the only thing one needs those laws of physics to provide is a description of
all the microstates and their energy, though in some situations one also needs to be able
to compute other conserved quantities, such as particle number.)
At the temperature extreme T → 0, the canonical ensemble becomes concentrated
at the minimum possible energy Emin for the system (this fact, incidentally, inspires the
numerical strategy of simulated annealing); whereas at the other temperature extreme
T → ∞, all microstates become equally likely, regardless of energy.
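As a toy illustration of these two limits (my own example, not from the text), one can tabulate the canonical ensemble of a three-state system:

```python
import math

def canonical_ensemble(energies, T, k=1.0):
    """P(x) = exp(-H(x)/kT) / Z for a finite list of microstate energies."""
    beta = 1.0 / (k * T)
    weights = [math.exp(-beta * H) for H in energies]
    Z = sum(weights)                    # the partition function
    return [w / Z for w in weights]

H = [0.0, 1.0, 2.0]                     # three microstates, E_min = 0
for T in (0.01, 1.0, 100.0):
    print(T, [round(p, 3) for p in canonical_ensemble(H, T)])
# T -> 0 concentrates on the minimum-energy state; T -> infinity is uniform.
```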
phenomena such as spontaneous magnetism. A slight generalisation of the Ising model is the Potts model; the Ashkin-Teller model, which is studied in [ChMcKWi1998], is an interpolant between a certain Ising model and a certain Potts model.
1. In the classical Ising model, there are two magnetisation states (+1 and −1); the
energy of a bond between two particles is −1/2 if they are in the same state, and
+1/2 if they are in the opposite state (thus one expects the states to align at low
temperatures and become non-aligned at high temperatures);
2. In the four-state Ising model, there are four magnetisation states (+1, +1), (+1, −1),
(−1, +1), and (−1, −1) (which can be viewed as four equally spaced vectors in
the plane), and the energy of a bond between two particles is the sum of the clas-
sical Ising bond energy between the first component of the two particle states,
and the classical Ising bond energy between the second component. Thus for
instance the bond energy between particles in the same state is −1, particles
in opposing states is +1, and particles in orthogonal states (e.g. (+1, +1) and
(+1, −1)) is 0. This system is equivalent to two non-interacting classical Ising
models, and so the four-state theory can be easily deduced from the two-state
theory.
3. In the degenerate Ising model, we have the same four magnetisation states, but now the bond energy between particles is −1 if they are in the same state or opposing state, and 0 if they are in an orthogonal state. This model essentially collapses to the two-state model after identifying (+1, +1) and (−1, −1) as a single state, and identifying (+1, −1) and (−1, +1) as a single state.
4. In the four-state Potts model, we have the same four magnetisation states, but
now the energy of a bond between two particles is −1 if they are in the same
state and 0 otherwise.
5. In the Ashkin-Teller model, we have the same four magnetisation states; the en-
ergy of a bond between two particles is −1 if they are in the same state, 0 if
they are orthogonal, and ε if they are in opposing states. The case ε = +1 is the
four-state Ising model, the case ε = 0 is the Potts model, and the cases 0 < ε < 1
are intermediate between the two, while the case ε = −1 is the degenerate Ising
model.
For the classical Ising model, there are two minimal-energy states: the state where
all particles are magnetised at +1, and the state where all particles are magnetised at
−1. (One can of course also take a probabilistic combination of these two states, but
we may as well restrict attention to pure states here.) Since one expects the system to
have near-minimal energy at low temperatures, we thus expect to have non-uniqueness
of Gibbs states at low temperatures for the Ising model. Conversely, at sufficiently high temperatures the differences in bond energy should become increasingly irrelevant, and so one expects to have uniqueness of Gibbs states at high temperatures. (Nevertheless, there is an important duality relationship between the Ising model at low and high temperatures.)
Similar heuristic arguments apply for the other models discussed above, though for
the degenerate Ising model there are many more minimal-energy states and so even
at very low temperatures one only expects to obtain partial ordering rather than total
ordering in the magnetisations.
For the Ashkin-Teller models with 0 < ε < 1, it was known for some time that
there is a unique critical temperature Tc (which has a physical interpretation as the
Curie temperature), below which one has non-unique and magnetised Gibbs states
(thus the expected magnetisation of any given particle is non-zero), and above which
one has unique (non-magnetised) Gibbs states. (For ε close to −1 there are two critical
temperatures, describing the transition from totally ordered magnetisation to partially
ordered, and from partially ordered to unordered.) The problem of computing this temperature Tc exactly, and of describing the nature of this transition, appears to be rather difficult, although there are a large number of partial results. What was shown in
[ChMcKWi1998], though, is that this critical temperature Tc is also the critical tem-
perature Tp for a somewhat simpler phenomenon, namely that of site percolation. Let
us denote one of the magnetised states, say (+1, +1), as “blue”. We then consider
the Gibbs state for a bounded region (e.g. an N × N square), subject to the boundary
condition that the entire boundary is blue. In the zero temperature limit T → 0 the
entire square would then be blue; in the high temperature limit T → +∞ each particle
would have an independent random state. Consider the probability pN that a particle
at the center of this square is part of the blue “boundary cluster”; in other words, the
particle is not only blue, but there is a path of bond edges connecting this particle to the
boundary which only goes through blue vertices. Thus we expect this probability to be
close to 1 at very low temperatures, and close to 0 at very high temperatures. And indeed, standard percolation theory arguments show that there is a critical temperature Tp below which lim_{N→∞} pN is positive (or equivalently, the boundary cluster has density bounded away from zero), and above which lim_{N→∞} pN = 0 (thus the boundary cluster has asymptotic density zero). The "Chayes-McKellar-Winn theorem" is then the claim
that Tc = Tp .
This result is part of a very successful program, initiated by Fortuin and Kasteleyn
[FoKa1972], to analyse the statistical mechanics of site models such as the Ising, Potts,
and Ashkin-Teller models via the random clusters generated by the bonds between
these sites. (One of the fruits of this program, by the way, was the FKG correlation in-
equality, which asserts that any two monotone properties on a lattice are positively cor-
related. This inequality has since proven to be incredibly useful in probability, combi-
natorics and computer science.) The claims Tc ≤ Tp and Tp ≤ Tc are proven separately.
To prove Tc ≤ Tp (i.e. multiple Gibbs states implies percolation), the main tool is a
theorem of Chayes and Machta[ChMa1995] that relates the non-uniqueness of Gibbs
states to positive magnetisation (the existence of states in which the expected magneti-
sation of a particle is non-zero). To prove Tp ≤ Tc (i.e. percolation implies multiple
Gibbs states), the main tool is a theorem of Gandolfi, Keane, and Russo[GaKeRu1998], who studied percolation on the infinite lattice and showed that under certain conditions (in particular, that a version of the FKG inequality is satisfied), there can be at
most one infinite cluster; basically, one can use the colour of this cluster (which will
exist if percolation occurs) to distinguish between different Gibbs states. (The fractal
structure of this infinite cluster, especially near the critical temperature, is quite inter-
esting, but that’s a whole other story.) One of the main tasks in [ChMcKWi1998] paper
is to verify the FKG inequality for the Ashkin-Teller model; this is done by viewing
that model as a perturbation of the Ising model, and expanding the former using the
random clusters of the latter.
When one heats an iron bar magnet above a certain special temperature - the Curie
temperature - the iron bar will cease to be magnetised; when one cools the bar again
below this temperature, the bar can once again spontaneously magnetise in the pres-
ence of an external magnetic field. This phenomenon is still not perfectly understood;
for instance, it is difficult to predict the Curie temperature precisely from the funda-
mental laws of physics, although one can at least prove that this temperature exists.
However, Chayes, McKellar, and Winn have shown that for a certain simplified model
for magnetism (known as the Ashkin-Teller model), the Curie temperature is equal to
the critical temperature below which percolation can occur; this means that even when
the bar is unmagnetised, enough of the iron atoms in the bar spin in the same direction
that they can create a connected path from one end of the bar to another. Percolation in
the Ashkin-Teller model is not fully understood either, but it is a simpler phenomenon
to deal with than spontaneous magnetisation, and so this result represents an advance
in our understanding of how the latter phenomenon works.
2.7.5 Notes
This article was originally posted on Aug 20, 2007 at
terrytao.wordpress.com/2007/08/20
See also an explanation by John Baez at
golem.ph.utexas.edu/category/2007/08/gerbes_in_the_guardian.html#c011515
2.8 Nonfirstorderisability
I recently came across the phenomenon of nonfirstorderisability in mathematical logic:
there are perfectly meaningful and useful statements in mathematics which cannot be
phrased within the confines of first order logic (combined with the language of set the-
ory, or any other standard mathematical theory). In order to phrase such statements
rigorously, one must use a more powerful language such as second order logic instead.
This phenomenon is very well known among logicians, but I hadn’t learned about it un-
til very recently, and had naively assumed that first order logic sufficed for “everyday”
usage of mathematics.
Let’s begin with some simple examples of statements which can be expressed in
first-order logic. If B(x, y) is a binary relation on two objects x, y, then we can express
the statement
Theorem 2.36. To every finitely generated real vector space V one can associate a
unique non-negative integer dim(V ) such that
4. dim(R) = 1; and
which is part of the fundamental theorem of linear algebra, does not seem to be
expressible as stated in first order set theory (though of course the concept of dimension
can be explicitly constructed within this language), even if we drop the uniqueness and restrict ourselves to just the assertion that dim() obeys, say, property 1, so that we get an assertion of the form (2.7). Note that the category of all finite-dimensional vector
spaces is not a set (for reasons relating to Russell’s paradox) and so we cannot view
dim as a function. More generally, many statements in category theory dealing with
large categories seem to not be expressible in first order logic.
I can’t quite show that (2.7) is not expressible in first-order logic, but I can come
very close, using non-standard analysis (see Section 2.5). The statement
Theorem 2.37. For every pair of real numbers x and x′ there exist real numbers st(x) and st(x′), depending only on x and x′ respectively, such that st(x + x′) = st(x) + st(x′), st(xx′) = st(x)st(x′), st(1) = 1, and st(x) is non-negative whenever x is non-negative, and also such that st(x) is not always equal to x.
is true in the non-standard model of the real numbers, but false in the standard
model (this is the classic algebra homework problem that the only order-preserving
field homomorphism on the reals is the identity). Since the transfer principle ensures
that all first-order statements that are true in the standard reals are also true in the non-
standard reals, this means that the above statement cannot be expressed in first-order
logic. If it weren’t for the “st(x) is not always equal to x” part, this would basically be
of the form (2.7).
It seems to me that first order logic is limited by the linear (and thus totally ordered)
nature of its sentences; every new variable that is introduced must be allowed to depend
on all the previous variables introduced to the left of that variable. This does not fully
capture all of the dependency trees of variables which one deals with in mathematics.
In analysis, we tend to get around this by using informal English phrasings to describe such dependencies, or by using the tremendously convenient O() and o() notation of Landau. One then
takes for granted that one can eventually unwind all these phrasings to get back to a
sentence in formal, first-order logic. As far as analysis is concerned, this is a fairly
safe assumption, since one usually deals with objects in very concrete sets such as the
real numbers, and one can easily model all of these dependencies using functions from
concrete sets to other concrete sets if necessary. (Also, the hierarchy of magnitudes in
analysis does often tend to be rather linearly ordered.) But some subtleties may appear
when one deals with large categories, such as the category of sets, groups, or vector
spaces (though in most applications, one can cap the cardinality of these objects and
then one can represent these categories up to equivalence by an actual set). It may be
that a more diagrammatic language (perhaps analogous to the commutative diagrams
in category theory, or one based on trees or partially ordered sets rather than linearly
ordered ones) may be a closer fit to expressing the way one actually thinks about how
variables interact with each other. Second-order logic is, of course, an obvious candidate for such a language, but it may be overqualified for the task.
2.8.1 Notes
This article was originally posted on Aug 27, 2007 at
terrytao.wordpress.com/2007/08/27
Ori Gurel-Gurevich pointed out that if one used a first-order set theory such as
NBG, which incorporates classes as well as sets, then statements such as Theorem 2.36
can be stated in first-order logic.
Andy D. gave the example of the quaternary relation Q(x, x′, y, y′) defined as

(y ≠ x) ∧ ((x = x′) ⟹ (y = y′)) ∧ ((x ≠ x′) ⟹ (y ≠ y′)),
for which (2.7) holds if and only if there is a perfect matching on the elements of the
universe, or in other words if the universe is either infinite or finite of even order. But
the parity of a finite universe is known to not be definable in first-order logic, thus
establishing the claim in the article.
Suresh Venkat commented on connections between first and second-order logic and
complexity theory, while Emmanuel Kowalski commented on connections between
first-order definability and the structure of sets in arithmetic geometry. David Corfield
pointed out the work of Hintikka on branching quantifiers, which can capture state-
ments such as (2.7), and the work of Abramsky connecting these quantifiers to game
theory.
Thanks to tom for corrections.
‖v − w‖^2 ≥ 0    (2.9)
but after expanding everything out, one only gets the weaker inequality
Re⟨v, w⟩ ≤ (1/2)‖v‖^2 + (1/2)‖w‖^2.    (2.10)
Now (2.10) is weaker than (2.8) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (2.10) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry v ↦ e^{iθ}v preserves the RHS of (2.10) but not the LHS. We exploit this by replacing v by e^{iθ}v in (2.10) for some phase θ to be chosen later, to obtain
Re(e^{iθ}⟨v, w⟩) ≤ (1/2)‖v‖^2 + (1/2)‖w‖^2.
Now we are free to choose θ at will (as long as it is real, of course), so it is natural to choose θ to optimise the inequality, which in this case means making the left-hand side as large as possible. This is achieved by choosing e^{iθ} to cancel the phase of ⟨v, w⟩, and we obtain

|⟨v, w⟩| ≤ (1/2)‖v‖^2 + (1/2)‖w‖^2.    (2.11)
This is closer to (2.8); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry (v, w) ↦ (λv, (1/λ)w) for a scalar λ > 0, which preserves the left-hand side but not the right. Inserting this transform into (2.11) we conclude that

|⟨v, w⟩| ≤ (λ^2/2)‖v‖^2 + (1/(2λ^2))‖w‖^2

where λ > 0 is at our disposal to choose. We can optimise in λ by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of v and w vanishes) is ‖v‖‖w‖ (which is achieved when λ = (‖w‖/‖v‖)^{1/2} when v, w are non-zero, or in an asymptotic limit λ → 0 or λ → ∞ in the degenerate cases), and so we have amplified our way to the Cauchy-Schwarz inequality (2.8).
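A quick numerical sanity check of this optimisation (my own sketch): sweeping λ over a grid, the best right-hand side of the amplified (2.11) hugs ‖v‖‖w‖ from above, which in turn dominates |⟨v, w⟩|.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lam = np.linspace(0.01, 10, 100000)       # grid approximation to the infimum
rhs = (lam**2 * np.linalg.norm(v)**2 + np.linalg.norm(w)**2 / lam**2) / 2
print(rhs.min())                          # approximately ||v|| ||w||
print(np.linalg.norm(v) * np.linalg.norm(w))
print(abs(np.vdot(v, w)))                 # Cauchy-Schwarz: smaller still
```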
As another example, to prove Hölder's inequality

∫_X f(x)g(x) dμ(x) ≤ ‖f‖_{L^p(X,dμ)} ‖g‖_{L^q(X,dμ)}    (2.13)

for non-negative measurable f, g and dual exponents 1 ≤ p, q ≤ ∞, one can begin with the elementary (weighted) arithmetic mean-geometric mean inequality

ab ≤ (1/p)a^p + (1/q)b^q    (2.14)
for non-negative a, b (which follows from the convexity of the function θ ↦ a^θ b^{1−θ}, which in turn follows from the convexity of the exponential function) to obtain the inequality
∫_X f(x)g(x) dμ(x) ≤ (1/p)‖f‖^p_{L^p(X,dμ)} + (1/q)‖g‖^q_{L^q(X,dμ)}.
This inequality is weaker than (2.13) (because of (2.14)); but if one amplifies by arbitraging the imbalance in the homogenisation symmetry (f, g) ↦ (λf, (1/λ)g) one obtains (2.13). As a third example, the Sobolev embedding inequality
‖f‖_{L^q(R^d)} ≤ C_{p,q,d} (‖f‖_{L^p(R^d)} + ‖∇f‖_{L^p(R^d)}),    (2.15)

which is valid for 1 < p < q < ∞ and 1/q > 1/p − 1/d (and also valid in some endpoint cases)
and all test functions (say) f on Rd , can be amplified to obtain the Gagliardo-Nirenberg
inequality
‖f‖_{L^q(R^d)} ≤ C_{p,q,d} ‖f‖^{1−θ}_{L^p(R^d)} ‖∇f‖^θ_{L^p(R^d)},    (2.16)

where 0 < θ < 1 is the number such that 1/q = 1/p − θ/d, by arbitraging the action of the dilation group f(x) ↦ f(λx). (In this case, the dilation action does not leave either the LHS or RHS of (2.15) invariant, but it affects the LHS in a well controlled manner, which can be normalised out by dividing by a suitable power of λ.) The same trick, incidentally, reveals why the Sobolev embedding inequality fails when q < p or when 1/q < 1/p − 1/d, because in these cases it leads to an absurd version of the Gagliardo-
Nirenberg inequality. Observe also that the Gagliardo-Nirenberg inequality (2.16) is
dimensionally consistent; the dilation action affects both sides of the inequality in the
same way. (The weight of the representation of the dilation action on an expression
is the same thing as the exponent of the length unit that one assigns to the dimen-
sion of that expression.) More generally, arbitraging a dilation symmetry allows a
dimensionally consistent inequality to emerge from a dimensionally inconsistent (or
dimensionally inefficient) one.
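To spell out the dilation arbitrage (a routine verification, not spelled out in the original): applying (2.15) to f_λ(x) := f(λx) and using the scaling identities

‖f_λ‖_{L^q(R^d)} = λ^{−d/q}‖f‖_{L^q(R^d)},    ‖f_λ‖_{L^p(R^d)} = λ^{−d/p}‖f‖_{L^p(R^d)},    ‖∇f_λ‖_{L^p(R^d)} = λ^{1−d/p}‖∇f‖_{L^p(R^d)},

we obtain, after multiplying through by λ^{d/q} and using 1/q = 1/p − θ/d,

‖f‖_{L^q(R^d)} ≤ C_{p,q,d} (λ^{−θ}‖f‖_{L^p(R^d)} + λ^{1−θ}‖∇f‖_{L^p(R^d)}).

Choosing λ := ‖f‖_{L^p(R^d)}/‖∇f‖_{L^p(R^d)} makes the two terms on the right equal and yields (2.16) (with constant 2C_{p,q,d}); sending λ → 0 or λ → ∞ instead shows how an inequality with inconsistent exponents would self-destruct.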
Applying the Sobolev inequality (2.15) to each localised function ψ(· − n)f and then summing up, one obtains the localised Sobolev inequality

‖f‖_{L^q(R^d)} ≤ C′_{p,q,d} (∑_{n∈Z^d} (‖f‖_{L^p(Q_n)} + ‖∇f‖_{L^p(Q_n)})^q)^{1/q},

where Q_n is the cube of sidelength 1 centred at n. This estimate is a
little stronger than (2.15), because the ℓ^q summation norm is smaller than the ℓ^p summation norm.
20 This particular amplification was first observed by Marcinkiewicz and Zygmund[MaZy1939].
to be true. Indeed, if such an estimate were true, then by using the translation invariance we can amplify the above estimate to the corresponding estimate for the translates f(· − x0), for any x0 ∈ R^d. But if one fixes f and lets x0 go to infinity, we see that the right-hand side grows like |x0|^β while the left-hand side grows like |x0|^α (unless T f vanishes entirely), leading to a contradiction.
One can obtain particularly powerful amplifications by combining translation-invariance with linearity, because one can now consider not just translates f(x − x0) of a single function f, but also superpositions ∑_{n=1}^{N} c_n f(x − x_n) of such functions. For instance, we have the principle (which I believe was first articulated by Littlewood) that a non-trivial translation-invariant linear operator T can only map L^p(R^d) to L^q(R^d) when q ≥ p. (Littlewood summarised this principle as "the higher exponents are always on the left".) To see this, suppose that we had an estimate of the form
21 a function space norm with a low number of derivatives (i.e. a low-regularity norm) cannot control a norm with a high number of derivatives. Here, the underlying symmetry that drives this principle is modulation invariance rather than translation invariance.
m({x ∈ G : M f(x) ≥ λ}) ≤ (C/λ^p) ‖f‖^p_{L^p(G)}.    (2.19)
To see this, suppose for contradiction that (2.19) failed for any C; by homogeneity, it would also fail even when restricted to the case λ = 1. What this means (thanks to the axiom of choice) is that for any δ > 0, there exists f_δ such that
On the other hand, another application of Khintchine's inequality using (2.20) shows that F_δ has an L^p norm of O(δ^{1/p}) on the average. Thus we have constructed functions f of arbitrarily small L^p norm whose maximal function M f is bounded away from zero on a set of measure bounded away from zero. From this and some minor additional tricks it is not difficult to then construct a function f in L^p whose maximal function is infinite on a set of positive measure, leading to the desired contradiction.
If one has an estimate for which only one of the sides behaves nicely under tensor powers, then there can be some opportunity for arbitrage. For instance, suppose we wanted to prove the Hausdorff-Young inequality

‖f̂‖_{ℓ^{p′}(Ĝ)} ≤ ‖f‖_{L^p(G)}    (2.21)

for 1 ≤ p ≤ 2, but could only establish the weaker estimate ‖f̂‖_{ℓ^{p′}(Ĝ)} ≤ C_p ‖f‖_{L^p(G)} for some constant C_p > 1. However, we can exploit the fact that the Fourier transform commutes with tensor powers. Indeed, by applying the above inequality with f replaced by f^{⊗M} (and G replaced by G^M) we see that
‖f̂‖^M_{ℓ^{p′}(Ĝ)} ≤ C_p ‖f‖^M_{L^p(G)}
for every M ≥ 1; taking Mth roots and then letting M go to infinity we obtain (2.21); the
tensor power trick has “magically” deleted the constant C p from the inequality. More
generally, one can use the tensor power trick to deduce the Riesz-Thorin interpolation theorem from the Marcinkiewicz interpolation theorem (the key point being that the (L^p, L^q) operator norm of a tensor power T^{⊗M} of a linear operator T is just the Mth power of the operator norm of the original operator T). This gives a proof of the Riesz-
Thorin theorem that does not require complex analysis.
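The commuting of the Fourier transform with tensor powers is easy to see numerically; here is a small sketch of my own using the discrete Fourier transform on Z/8 (so the tensor square lives on (Z/8)^2 and is computed by a 2-dimensional transform):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(8) + 1j * rng.standard_normal(8)
g = np.outer(f, f)                        # the tensor square of f on (Z/8)^2

# The Fourier transform of a tensor product is the tensor product of the
# Fourier transforms, so norms of hat(g) are squares of norms of hat(f).
G = np.fft.fft2(g)
assert np.allclose(G, np.outer(np.fft.fft(f), np.fft.fft(f)))
p = 4 / 3                                 # any exponent will do
print((np.abs(G) ** p).sum() ** (1 / p))  # equals the square of the line below
print(((np.abs(np.fft.fft(f)) ** p).sum() ** (1 / p)) ** 2)
```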
Actually, the tensor power trick does not just make constants disappear; it can also
get rid of logarithms. Because of this, we can make the above argument even more
elementary by using a very crude form of the Marcinkiewicz interpolation argument.
Indeed, suppose that f is a quasi-step function, or more precisely that it is supported on some set E in G and takes values between A and 2A for some A > 0. Then from (2.22), (2.23) we see that ‖f̂‖_{ℓ^∞(Ĝ)} = O(A|E|/|G|) and ‖f̂‖_{ℓ^2(Ĝ)} = O(A(|E|/|G|)^{1/2}), and hence ‖f̂‖_{ℓ^{p′}(Ĝ)} = O(A(|E|/|G|)^{1/p}). Now if f is not a quasi-step function, one can decompose it into O(1 + log |G|) such functions by the "wedding cake" decomposition (dividing the range of |f| into dyadic intervals from ‖f‖_{L^∞} to ‖f‖_{L^∞}/|G|^{100}; the portion of |f| which is less than ‖f‖_{L^∞}/|G|^{100} can be easily dealt with by crude methods). From
the triangle inequality we then conclude the weak Hausdorff-Young inequality

‖f̂‖_{ℓ^{p′}(Ĝ)} ≤ C_p (1 + log |G|) ‖f‖_{L^p(G)}.

If one runs the tensor power trick again, one can eliminate both the constant factor C_p and the logarithmic factor 1 + log |G| and recover (2.21) (basically because M^{1/M} converges to 1 as M goes to infinity). More generally, the tensor power trick can convert
restricted or weak-type estimates into strong-type estimates whenever a tensor power
symmetry is available.
The deletion of the constant C p may seem minor, but there are some things one
can do with a sharp22 estimate that one cannot with a non-sharp one. For instance,
by differentiating (2.21) at p = 2 (where equality holds) one can obtain the entropy
uncertainty principle
∑_{ξ∈Ĝ} |f̂(ξ)|^2 log(1/|f̂(ξ)|^2) + (1/|G|) ∑_{x∈G} |f(x)|^2 log(1/|f(x)|^2) ≥ 0
whenever we have the normalisation ‖f‖_{L^2(G)} = 1. (More generally, estimates involving Shannon entropy tend to be rather amenable to the tensor power trick.)
The tensor power trick also allows one to disprove certain estimates. Observe that if two functions f, g on a finite additive group G are such that |f(x)| ≤ g(x) for all x (i.e. g majorises f), then from Plancherel's identity we have

‖f̂‖_{ℓ^2(Ĝ)} ≤ ‖ĝ‖_{ℓ^2(Ĝ)},

and more generally (by using the fact that the Fourier transform intertwines convolution and multiplication) that

‖f̂‖_{ℓ^p(Ĝ)} ≤ ‖ĝ‖_{ℓ^p(Ĝ)}
for all even integers p = 2, 4, 6, . . .. Hardy and Littlewood conjectured that a similar bound held for all 2 ≤ p < ∞, thus

‖f̂‖_{ℓ^p(Ĝ)} ≤ C_p ‖ĝ‖_{ℓ^p(Ĝ)}.

But if such a bound held, then by the tensor power trick one could delete the constant C_p. But then a direct computation (for instance, inspecting what happens when f is infinitesimally close to g) shows that this amplified estimate fails, and so the Hardy-Littlewood majorant conjecture is false. (With a little more work, one can then transfer
this failure from finite abelian groups G to other groups, such as the unit circle R/Z
or cyclic groups Z/NZ, which do not obviously admit tensor product structure; this
was first done in [Ba1973], and with stronger quantitative estimates in [MoSh2002],
[GrRu2004].)
The tensor product trick is also widely used in additive combinatorics (I myself
learnt this trick from [Ru1996]). Here, one deals with sets A rather than functions f,
but the idea is still the same: replace A by the Cartesian power AM , see what estimate
one gets, and let M → ∞. There are many instances of this trick in the literature, but I’ll
just describe one representative one, due to Ruzsa[Ru1996]. An important inequality
of Plünnecke[Pl1969] asserts, among other things, that for finite non-empty sets A, B
of an additive group G, and any positive integer k, the iterated sumset kB = B + . . . + B
obeys the bound
|kB| ≤ |A + B|^k / |A|^{k−1}.    (2.24)
22 I should remark that in Euclidean space, the constant in Hausdorff-Young can be improved to below 1,
but this requires some particularly Euclidean devices, such as the use of Gaussians, although this is not too dissimilar, as there are certainly many connections between Gaussians and tensor products (cf. the central
limit theorem). All of the above discussion also has an analogue for Young’s inequality. See [Be1975] for
more details.
(This inequality, incidentally, is itself proven using a version of the tensor power trick,
in conjunction with Hall’s marriage theorem, but never mind that here.) This inequality
can be amplified to the more general inequality
|B1 + . . . + Bk| ≤ |A + B1| · · · |A + Bk| / |A|^{k−1}
via the tensor power trick as follows. Applying (2.24) with B := B1 ∪. . .∪Bk , we obtain
|B1 + . . . + Bk| ≤ (|A + B1| + . . . + |A + Bk|)^k / |A|^{k−1}.
The right-hand side looks a bit too big, but this is the same problem we encountered
with the Cauchy-Schwarz or Holder inequalities, and we can resolve it in a similar
way (i.e. by arbitraging homogeneity). If we replace G with the larger group G × Zk
and replace each set Bi with the larger set Bi × {ei , 2ei , . . . , Ni ei }, where e1 , . . . , ek is
the standard basis for Zk and Ni are arbitrary positive integers (and replacing A with
A × {0}), we obtain
N1 · · · Nk |B1 + . . . + Bk| ≤ (N1|A + B1| + . . . + Nk|A + Bk|)^k / |A|^{k−1}.

Optimising in the Ni (choosing each Ni so that the k terms Ni|A + Bi| are comparable in size), we obtain

|B1 + . . . + Bk| ≤ Ck |A + B1| · · · |A + Bk| / |A|^{k−1}
for some constant Ck; but then if one replaces A, B1, . . . , Bk with their Cartesian powers A^M, B1^M, . . . , Bk^M, takes Mth roots, and then sends M to infinity, we can delete the constant Ck and recover the inequality.
2.9.5 Notes
This article was originally posted on Sep 5, 2007 at
terrytao.wordpress.com/2007/09/05
A rather different perspective on the Cauchy-Schwarz inequality can be found at
www.dpmms.cam.ac.uk/~wtg10/csineq.html
Emmanuel Kowalski pointed out that Deligne’s proof [De1974] of the Weil conjec-
tures also relies on the tensor power trick. Mike Steele pointed out that Landau's proof of the maximum principle |f(z)| ≤ sup_{w∈γ} |f(w)| for holomorphic functions f in an open domain, closed curves γ in that domain, and points z in the interior of that curve, also exploited the tensor power trick, by first using the Cauchy integral formula to establish a crude bound |f(z)| ≤ C_{z,γ} sup_{w∈γ} |f(w)| and then deleting the constant C_{z,γ} using the tensor power symmetry f ↦ f^n.
Thanks to furia kucha, Van Vu, and Andy Cotton-Clay for corrections.
Now one of the great features of graphs, as opposed to some other abstract maths
concepts, is that they are easy to draw: the abstract vertices become dots on a plane,
while the edges become line segments or curves connecting these dots23 . Let us infor-
mally refer to such a concrete representation D of a graph G as a drawing of that graph.
Clearly, any non-trivial graph is going to have an infinite number of possible drawings.
In some of these drawings, a pair of edges might cross each other; in other drawings, all
edges might be disjoint (except of course at the vertices, where edges with a common
endpoint are obliged to meet). If G has a drawing D of the latter type, we say that the
graph G is planar.
Given an abstract graph G, or a drawing thereof, it is not always obvious as to
whether that graph is planar; just because the drawing that you currently possess of G
contains crossings, does not necessarily mean that all drawings of G do. The wonder-
ful little web game “Planarity” at www.planarity.net illustrates this point excel-
lently. Nevertheless, there are definitely graphs which are not planar; in particular the
complete graph K5 on five vertices, and the complete bipartite graph K3,3 on two sets
of three vertices, are non-planar.
There is in fact a famous theorem of Kuratowski[Ku1930] that says that these two graphs are the only "sources" of non-planarity, in the sense that any non-planar graph contains (a subdivision of) one of these graphs as a subgraph. (There is of course the even more famous four-colour theorem that asserts that every planar graph is four-colourable, but this is not the topic of my article today.)
Intuitively, if we fix the number of vertices |V |, and increase the number of edges
|E|, then the graph should become “increasingly non-planar”; conversely, if we keep
the same number of edges |E| but spread them amongst a greater number of vertices
|V |, then the graph should become “increasingly planar”. Is there a quantitative way to
23 To avoid some technicalities we do not allow these curves to pass through the dots, except if the curve is terminating at that dot.
[Figure: the complete graph K5 and the complete bipartite graph K3,3]
measure the “non-planarity” of a graph, and to formalise the above intuition as some
sort of inequality?
It turns out that there is an elegant inequality that does precisely this, known as the crossing number inequality [AjChNeSz1982]. Nowadays it can be proven by two elegant amplifications of Euler's formula, as we shall see.
If D is a drawing of a graph G, we define cr(D) to be the total number of crossings
- where pairs of edges intersect at a point, for a reason other than sharing a common
vertex. (If multiple edges intersect at the same point, each pair of edges counts once.)
We then define the crossing number cr(G) of G to be the minimal value of cr(D) as D
ranges over the drawings of G. Thus for instance cr(G) = 0 if and only if G is planar.
One can also verify that the two graphs K5 and K3,3 have a crossing number of 1. This
quantity cr(G) will be the measure of how non-planar our graph G is. The problem is to relate this quantity to the number of vertices |V| and the number of edges |E|. We of course do not expect an exact identity relating these three quantities
(two graphs with the same number of vertices and edges may have different crossing numbers), so we will settle for good upper and lower bounds on cr(G) in terms
of |V | and |E|.
How big can the crossing number of a graph G = (V, E) be? A trivial upper bound
is cr(G) = O(|E|2 ), because if we place the vertices in general position (or on a circle)
and draw the edges as line segments, then every pair of edges crosses at most once. But
this bound does not seem very tight; we expect to be able to find drawings in which
most pairs of edges in fact do not intersect.
Let’s turn our attention instead to lower bounds. We of course have the trivial lower
bound cr(G) ≥ 0; can we do better? Let’s first be extremely unambitious and see when
one can get the minimal possible improvement on this bound, namely cr(G) > 0. In
other words, we want to find some conditions on |V | and |E| which will force G to be
non-planar. We can turn this around by taking contrapositives: if G is planar, what does
this tell us about |V | and |E|?
Here, the natural tool is Euler’s formula24 |V | − |E| + |F| = 2, valid for any planar
drawing, where |F| is the number of faces (including the unbounded face). What do we
know about |F|? Well, every face is adjacent to at least three edges, whereas every edge
24 This is the one place where we shall really use the topological structure of the plane; the rest of the
argument is combinatorial. There are some minor issues if the graph is disconnected, or if there are vertices
of degree one or zero, but these are easily dealt with.
is adjacent to exactly two faces. By double counting the edge-face incidences, we con-
clude that 3|F| ≤ 2|E|. Eliminating |F|, we conclude that |E| ≤ 3|V| − 6 for all planar graphs (and this bound is attained when every face is a triangle). Taking contrapositives,
we conclude
cr(G) > 0 whenever |E| > 3|V | − 6. (2.25)
Now, let us amplify this inequality by exploiting the freedom to delete edges. Indeed,
observe that if a graph G can be drawn with only cr(G) crossings, then we can delete
one of the crossings by removing an edge associated to that crossing, and so we can
remove all the crossings by deleting at most cr(G) edges, leaving at least |E| − cr(G)
edges (and |V | vertices). Combining this with (2.25) we see that regardless of the
number of crossings, we have
|E| − cr(G) ≤ 3|V | − 6
leading to the following amplification of (2.25):
cr(G) ≥ |E| − 3|V | + 6 (2.26)
This is not the best bound, though, as one can already suspect by comparing (2.26) with
the crude upper bound cr(G) = O(|E|2 ). We can amplify (2.26) further by exploiting
a second freedom, namely the ability to delete vertices. One could try the same sort
of trick as before, deleting vertices which are associated to a crossing, but this turns
out to be very inefficient (because deleting vertices also deletes an unknown number
of edges, many of which had nothing to do with the crossing). Indeed, it would seem
that one would have to be fiendishly clever to find an efficient way to delete a lot of
crossings by deleting only very few vertices.
However, there is an amazing (and unintuitive) principle in combinatorics which
states that when there is no obvious “best” choice for some combinatorial object (such
as a set of vertices to delete), then often trying a random choice will give a reason-
able answer, if the notion of “random” is chosen carefully. (See [Go2000] for some
further discussion of this principle.) The application of this principle is known as the
probabilistic method, first introduced by Erdős [Er1947].
Here is how it works in this current setting. Let 0 < p ≤ 1 be a parameter to be
chosen later. We will randomly delete all but a fraction p of the vertices, by letting
each vertex be deleted with an independent probability of 1 − p (and thus surviving
with a probability of p). Let V′ be the set of vertices that remain. Once one deletes vertices, one also has to delete the edges attached to these vertices; let E′ denote the surviving edges (i.e. the edges connecting vertices in V′). Let G′ = (V′, E′) be the surviving graph (known as the subgraph of G induced by V′). Then from (2.26) we have
cr(G′) ≥ |E′| − 3|V′| + 6.
Now, how do we get from this back to the original graph G = (V, E)? The quantities |V′|, |E′|, and cr(G′) all fluctuate randomly, and are difficult to compute. However, their expectations are much easier to deal with. Accordingly, we take expectations of both sides (this is an example of the first moment method). Using linearity of expectation, we have
E(cr(G′)) ≥ E(|E′|) − 3E(|V′|) + 6.
These quantities are all relatively easy to compute. The easiest is E(|V′|). Each vertex in V has a probability p of ending up in V′, and thus contributing 1 to |V′|. Summing up (using linearity of expectation again), we obtain E(|V′|) = p|V|.
The quantity E(|E′|) is almost as easy to compute. Each edge e in E will have a probability p² of ending up in E′, since both vertices have an independent probability of p of surviving. Summing up, we obtain E(|E′|) = p²|E|. (The events that each edge ends up in E′ are not quite independent, but the great thing about linearity of expectation is that it works even without assuming any independence.)
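Both expectation formulas are easy to sanity-check numerically. Here is a minimal Monte Carlo sketch in Python; the choice of the graph K10 and of the parameters below is mine and entirely arbitrary.

```python
import random

# Sanity check of the two expectations just computed: delete each vertex of a
# graph independently with probability 1 - p; the surviving vertex and edge
# counts should average to p|V| and p^2|E| respectively.
V = list(range(10))
E = [(i, j) for i in V for j in V if i < j]   # the complete graph K_10
p, trials = 0.5, 20000
total_v = total_e = 0
for _ in range(trials):
    kept = {v for v in V if random.random() < p}
    total_v += len(kept)
    total_e += sum(1 for (i, j) in E if i in kept and j in kept)
print(total_v / trials, "vs p|V| =", p * len(V))        # ~ 5.0
print(total_e / trials, "vs p^2|E| =", p**2 * len(E))   # ~ 11.25
```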
Finally, we turn to E(cr(G′)). Let us draw G in the optimal way, with exactly cr(G) crossings. Observe that each crossing involves two edges and four vertices. (If the two edges involved in a crossing share a common vertex as well, thus forming an α shape, one can reduce the number of crossings by 1 by swapping the two halves of the loop in the α shape. So with the optimal drawing, the edges in a crossing do not share any vertices in common.) Passing to G′, we see that the probability that the crossing survives in this drawing is only p⁴. By one last application of linearity of expectation, the expected number of crossings of this diagram that survive for G′ is p⁴cr(G). This particular diagram may not be the optimal one for G′, so we end up with an inequality E(cr(G′)) ≤ p⁴cr(G). Fortunately for us, this inequality goes in the right direction. Combining it with the previous computations, we conclude that p⁴cr(G) ≥ p²|E| − 3p|V| + 6, and hence cr(G) ≥ p⁻²|E| − 3p⁻³|V|. Choosing p := 4|V|/|E| (which lies in (0, 1] precisely when |E| ≥ 4|V|), we get a useful inequality:
cr(G) ≥ |E|³/(64|V|²) whenever |E| ≥ 4|V|. (2.27)
This is quite a strong amplification of (2.25) or (2.26) (except in the transition region in which |E| is comparable to |V|). Is it sharp? We can compare it against the trivial bound cr(G) = O(|E|²), and we observe that the two bounds match up to constants when |E| is comparable to |V|². (Clearly, |E| cannot be larger than |V|².) So the crossing number inequality is sharp (up to constants) for dense graphs, such as the complete graph Kn on n vertices.
Are there any other cases where it is sharp? We can answer this by appealing to the symmetries of (2.27). By the nature of its proof, the inequality is basically symmetric under passage to random induced subgraphs, but this symmetry does not give any further examples, because random induced subgraphs of dense graphs again tend to be dense graphs (cf. the computation of E|V′| and E|E′| above). But there is a second symmetry of (2.27) available, namely that of replication. If one takes k disjoint copies of a graph G = (V, E), one gets a new graph with k|V| vertices and k|E| edges, and a moment's thought will reveal that the new graph has a crossing number of k cr(G). Thus replication is a symmetry of (2.27). Thus, (2.27) is also sharp up to constants for replicated dense graphs. It is not hard to see that these examples basically cover all possibilities of |V| and |E| for which |E| ≥ 4|V|. Thus the crossing number inequality cannot be improved except for the constants. (The best constants known currently can be found in [PaRaTaTo2006].)
Remark 2.38. A general principle, by the way, is that one can roughly gauge the
“strength” of an inequality by the number of independent symmetries (or approximate
symmetries) it has. If for instance there is a three-parameter family of symmetries, then any example that demonstrates the sharpness of that inequality is immediately amplified to a three-parameter family of such examples (unless of course the example is fixed by a significant portion of these symmetries). The more examples that show an inequality is sharp, the more efficient it is - and the harder it is to prove, since one
cannot afford to lose anything (other than perhaps some constants) in every one of the
sharp example cases. This principle is of course consistent with the points in my pre-
vious article (Section 2.9) on arbitraging a weak asymmetric inequality into a strong
symmetric one.
Dually, using the axiom that two lines intersect in at most one point, we obtain the bound
I(P, L) = O(|L||P|^(1/2) + |P|)
on the number of incidences I(P, L) between a finite set of points P and a finite set of lines L. (One can also deduce one inequality from the other by projective duality.)
Can one do better? The answer is yes, if we observe that a configuration of points
and lines naturally determines a drawing of a graph, to which the crossing number can
be applied. To see this, assume temporarily that every line in L is incident to at least two points in P. A line l in L which is incident to k points in P will thus contain k − 1 line segments connecting consecutive points of P; since k ≥ 2, the quantity k − 1 is comparable to k. Since the sum of all the k is I(P, L) by definition, we see that there are roughly I(P, L) line segments of lines in L connecting adjacent points in P; this is a diagram with |P| vertices and roughly I(P, L) edges. On the other hand, a crossing in this diagram can only occur when two lines in L intersect. Since two lines intersect in at most one point, the total number of crossings is O(|L|²). Applying
the crossing number inequality (2.27), we obtain
I(P, L) = O(|P|^(2/3)|L|^(2/3) + |P|)
(the second term coming from the case in which (2.27) does not apply, i.e. when the number of edges is less than 4|P|).
We can then remove our temporary assumption that lines in L are incident to at least two points, by observing that lines that are incident to at most one point will only contribute O(|L|) incidences, leading to the Szemerédi-Trotter theorem
I(P, L) = O(|P|^(2/3)|L|^(2/3) + |P| + |L|).
This bound is somewhat stronger than the previous bounds, and is in fact surprisingly sharp; a typical example that demonstrates this is when P is the lattice {1, . . . , N} × {1, . . . , N²} and L is the set of lines {(x, y) : y = mx + b} with slope m ∈ {1, . . . , N} and intercept b ∈ {1, . . . , N²}; here |P| = |L| = N³ and the number of incidences is roughly N⁴.
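This example can be checked numerically by brute force for small N; the following quick sketch is only an illustration (the constant in front of N⁴ depends on the exact range conventions and is unimportant).

```python
# Brute-force incidence count for the lattice example just described.
# Here |P| = |L| = N^3, and the count grows like N^4 (about 0.71 * N^4 with
# these exact ranges), matching the |P|^(2/3)|L|^(2/3) = N^4 term of the
# Szemeredi-Trotter bound.
N = 12
incidences = sum(1
                 for m in range(1, N + 1)        # slopes
                 for b in range(1, N**2 + 1)     # intercepts
                 for x in range(1, N + 1)        # x-coordinates of points
                 if 1 <= m * x + b <= N**2)
print(incidences, "vs N^4 =", N**4)
```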
The original proof of this theorem, by the way, proceeded by amplifying (2.28) using the method of cell decomposition; it is thus somewhat similar in spirit to Székely's proof, but was a bit more complicated technically. In [Wo1999], Wolff conjectured a
continuous version of this theorem for fractal sets, sometimes called the Furstenberg
set conjecture, and related to the Kakeya conjecture; a small amount of progress be-
yond the analogue of (2.28) is known [KaTa2001], [Bo2003], but we are still far from
the best possible result here.
Let A be a finite non-empty set of non-zero real numbers. We can form the sum set
A + A := {a + b : a, b ∈ A}
and the product set
A · A = {ab : a, b ∈ A}.
If A is in “general position”, it is not hard to see that A + A and A · A both have cardinality comparable to |A|². However, in certain cases one can make one or the other sets significantly smaller. For instance, if A is an arithmetic progression {a, a + r, . . . , a + (k − 1)r}, then the sum set A + A has cardinality comparable to just |A|. Similarly, if A is a geometric progression {a, ar, . . . , ar^(k−1)}, then the product set A · A has cardinality
comparable to |A|. But clearly A cannot be an arithmetic progression and a geometric
progression at the same time (unless it is very short). So one might conjecture that at
least one of the sum set and product set should be significantly larger than A. Infor-
mally, this is saying that no finite set of reals can behave much like a subring of R. This
intuition was made precise by Erdős and Szemerédi [ErSz1983], who established the
lower bound
max(|A + A|, |A · A|) ≫ |A|^(1+c)
for some small c > 0 which they did not make explicit. They then conjectured that in
fact c should be made arbitrarily close to the optimal value of 1, and more precisely that
max(|A + A|, |A · A|) ≫ |A|² exp(−δ log |A|/ log log |A|)
for large |A| and some absolute constant δ > 0. (The exponential factor is sharp, as can be seen from setting A = {1, . . . , N}, and using some analytic number theory to control the size of A · A.)
The Erdős-Szemerédi conjecture remains open; however, the value of c has been improved. Currently, the best bound is due to Solymosi [So2005], who showed that c can be taken arbitrarily close to 3/11. Solymosi's argument is based on an earlier argument of Elekes [El1997], who obtained c = 1/4 by a short and elegant argument based on the Szemerédi-Trotter theorem, which we will now present. The basic connection between the two problems stems from the familiar formula y = mx + b for a line, which clearly encodes a multiplicative and additive structure. We already used this connection implicitly in the example that demonstrated that the Szemerédi-Trotter theorem was sharp.
For Elekes' argument, the challenge is to show that if A + A and A · A are both small, then a suitable family of lines y = mx + b associated to A will have a high number of incidences with some set of points associated to A, so that the Szemerédi-Trotter theorem may then be profitably applied. It is not immediately obvious exactly how to do this, but Elekes settled upon the choice of letting P := (A + A) × (A · A), and letting L be the space of lines y = mx + b with slope in A⁻¹ and intercept in A, thus |P| = |A + A||A · A| and |L| = |A|². One observes that each line in L is incident to |A| points in P, leading to |A|³ incidences. Applying the Szemerédi-Trotter theorem and doing the algebra one eventually concludes that max(|A + A|, |A · A|) ≫ |A|^(5/4). (A more elementary proof of this inequality, not relying on the Szemerédi-Trotter theorem or crossing number bounds, and thus having the advantage of working in other archimedean fields such as C, was subsequently found by Solymosi [So2008], but the best bounds on the sum-product problem in R still rely very much on the Szemerédi-Trotter inequality.)
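For the record, here is one way the algebra can run, sketched using the Szemerédi-Trotter theorem in the form displayed earlier (so this is my arrangement of the computation rather than necessarily Elekes' exact one):

```latex
|A|^3 \le I(P,L) = O\big( |P|^{2/3}|L|^{2/3} + |P| + |L| \big)
                 = O\big( (|A+A|\,|A\cdot A|)^{2/3}\,|A|^{4/3} + |A+A|\,|A\cdot A| + |A|^2 \big).
```

The |A|² term can never dominate the left-hand side. If the middle term dominates, then |A + A||A · A| ≫ |A|³, which is even stronger than what we need; and if the first term dominates, then |A + A||A · A| ≫ |A|^(5/2). In either case max(|A + A|, |A · A|) ≫ |A|^(5/4).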
where A is a matrix with some designated logarithm log(A), and A^t := exp(t log(A)). Generally, we expect the coefficients of A^t to contain exponentials (as is the case in (2.30)), or sines and cosines (which are basically just a complex version of exponentials). However, if A is a unipotent matrix (i.e. the only eigenvalue is 1, or equivalently A = 1 + N for some nilpotent matrix N), then A^t is a polynomial in t, rather than an exponential or sinusoidal function of t. More generally, we say that an element g of a Lie group G is unipotent if its adjoint action x ↦ gxg⁻¹ on the Lie algebra g is unipotent. Thus for instance any element in the centre of G is unipotent, and every element of a nilpotent Lie group is unipotent.
We can now state one of Ratner’s theorems.
Theorem 2.39 (Ratner’s orbit closure theorem). Let X = G/Γ be a homogeneous space
of finite volume with a connected finite-dimensional Lie group G as symmetry group,
and let U be a connected subgroup of G generated by unipotent elements. Let Ux be
an orbit of U in X. Then the closure of Ux is itself a homogeneous space of finite volume; in particular, there exists a closed subgroup U ≤ H ≤ G such that the closure of Ux equals Hx.
This theorem (first conjectured by Raghunathan, I believe) asserts that the orbit
of any unipotent flow is dense in some homogeneous space of finite volume. In the
case of algebraic groups, it has a nice corollary: any unipotent orbit in an algebraic
homogeneous space which is Zariski dense, is topologically dense as well.
In some applications, density is not enough; we also want equidistribution. Hap-
pily, we have this also:
Theorem 2.40 (Ratner’s equidistribution theorem). Let X, G, U, x, H be as in the orbit
closure theorem. Assume also that U is a one-parameter group, thus U = {g_t : t ∈ R} for some homomorphism t ↦ g_t. Then Ux is equidistributed in Hx; thus for any continuous function F : Hx → R we have
lim_{T→∞} (1/T) ∫_0^T F(g_t x) dt = ∫_{Hx} F
where ∫_{Hx} represents integration with respect to the normalised Haar measure on Hx.
One can also formulate this theorem (first conjectured by Dani [Da1986], I believe)
for groups U that have more than one parameter, but it is a bit technical to do so and we
shall omit it. My paper [GrTa2008f] with Ben Green concerns a quantitative version
of this theorem in the special case when X is a nilmanifold, and where the continuous
orbit Ux is replaced by a discrete polynomial sequence. (There is an extensive literature
on generalising Ratner’s theorems from continuous U to discrete U, which I will not
discuss here.)
From the equidistribution theorem and a little bit of ergodic theory one has a
measure-theoretic corollary, which describes ergodic measures of a group generated
by unipotent elements:
Theorem 2.41 (Ratner’s measure classification theorem). Let X be a finite volume
homogeneous space for a connected Lie group G, and let U be a connected subgroup
of G generated by unipotent elements. Let µ be a probability measure on X which
is ergodic under the action of U. Then µ is the Haar measure of some closed finite
volume orbit Hx for some U ≤ H ≤ G.
cannot get arbitrarily close to 0, basically because the golden ratio is very hard to
approximate by a rational a/b (the best approximants being given, of course, by the
Fibonacci numbers).
However, for indefinite quadratic forms Q in m ≥ 3 variables with incommensurate coefficients, Oppenheim [Op1929] conjectured that there was no discreteness whatsoever, and that the set Q(Z^m) was dense in R. There was much partial progress on this problem in the case of many variables (in large part due to the power of the Hardy-Littlewood circle method in this setting), but the hardest case of just three variables was only solved by Margulis [Ma1989] in 1989.
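Though of course it proves nothing, the conjectured density is fun to watch numerically. Here is a small brute-force search; the form x₁² + x₂² − √2·x₃² is just one sample irrational indefinite form.

```python
import itertools, math

# A numerical illustration (not a proof!) of the Oppenheim conjecture for the
# irrational indefinite form Q(x) = x1^2 + x2^2 - sqrt(2)*x3^2: as the search
# box [-N, N]^3 grows, the smallest nonzero value of |Q| creeps towards 0.
def min_abs_Q(N):
    return min(abs(x1**2 + x2**2 - math.sqrt(2) * x3**2)
               for x1, x2, x3 in itertools.product(range(-N, N + 1), repeat=3)
               if (x1, x2, x3) != (0, 0, 0))

for N in (5, 15, 20):
    print(N, min_abs_Q(N))   # the minima decrease: roughly 0.27, 0.19, 0.04
```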
Nowadays, we can obtain Margulis' result as a quick consequence of Ratner's theorem as follows. It is not difficult to reduce to the most difficult case m = 3. We need to show that the image of Z³ under the quadratic form Q : R³ → R is dense in R. Now, every quadratic form comes with a special orthogonal group SO(Q), defined as the orientation-preserving linear transformations that preserve Q; for instance, the Euclidean form x₁² + x₂² + x₃² in R³ has the rotation group SO(3), the Minkowski form x₁² + x₂² + x₃² − x₄² has the Lorentz group SO(3, 1), and so forth. The image of Z³ under Q is the same as that of the larger set SO(Q)Z³. [We may as well make our domain as large as possible, as this can only make our job easier, in principle at least.] Since Q is indefinite, Q(R³) = R, and so it will suffice to show that SO(Q)Z³ is dense in R³. Actually, for minor technical reasons it is convenient to just work with the identity component SO(Q)+ of SO(Q) (which has two connected components).
[An analogy with the Euclidean case Q(x₁, x₂, x₃) = x₁² + x₂² + x₃² might be enlightening here. If one spins the lattice Z³ around by the Euclidean orthogonal group SO(Q) = SO(3), one traces out a union of spheres around the origin, where the radii of the spheres are precisely those numbers whose square can be expressed as the sum of three squares. In this case, SO(Q)Z³ is not dense, and this is reflected in the fact that not every number is the sum of three perfect squares. The Oppenheim conjecture asserts instead that if you spin a lattice by an irrational Lorentz group, one traces out a dense set.]
In order to apply Ratner's theorem, we will view SO(Q)+Z³ as an orbit Ux in a homogeneous space G/Γ. Clearly, U should be the group SO(Q)+, but what to do about the set Z³? We have to turn it somehow into a point in a homogeneous space. The obvious thing to do is to view Z³ as the zero coset (i.e. the origin) in the torus R³/Z³, but this doesn't work, because SO(Q)+ does not act on this torus (it is not a subgroup of the translation group R³). So we need to lift up to a larger homogeneous space G/Γ, with a symmetry group G which is large enough to accommodate SO(Q)+.
The problem is that the torus is the moduli space for translations of the lattice Z³, but SO(Q)+ is not a group of translations; it is instead a group of unimodular linear transformations, i.e. a subgroup of the special linear group SL(3, R). This group acts on lattices, and the stabiliser of Z³ is SL(3, Z). Thus the right homogeneous space to use here is X := SL(3, R)/SL(3, Z), which has a geometric interpretation as the moduli space of unimodular lattices in R³ (i.e. a higher-dimensional version of the modular curve); X is not compact, but one can verify that X has finite volume, which is good enough for Ratner's theorem to apply. Since the group G = SL(3, R) contains U = SO(Q)+, U acts on X. Let x = SL(3, Z) be the origin in X (under the moduli space interpretation, x is just the standard lattice Z³). If Ux is dense in X, this implies that the set of matrices SO(Q)+SL(3, Z) is dense in SL(3, R); applying this to, say, the unit vector (1, 0, 0), we conclude that SO(Q)+Z³ is dense in R³ as required. (These reductions are due to Raghunathan. Note that the new claim is actually a bit stronger than the original Oppenheim conjecture; not only are we asserting now that SO(Q)+ applied to the standard lattice Z³ sweeps out a dense subset of Euclidean space, we are saying the stronger statement that one can use SO(Q)+ to bring the standard lattice “arbitrarily close” to any given unimodular lattice one pleases, using the topology induced from SL(3, R).)
How do we show that Ux is dense in X? We use Ratner's orbit closure theorem! This theorem tells us that if Ux is not dense in X, it must be much smaller - it must be contained in a closed finite volume orbit Hx for some proper closed connected subgroup H of SL(3, R) which still contains SO(Q)+. [To apply this theorem, we need to check that U is generated by unipotent elements, which can be done by hand; here is where we need to assume m ≥ 3.] An inspection of the Lie algebras of SL(3, R) and SO(Q)+ shows in fact that the only such candidate for H is SO(Q)+ itself (here is where we really use the hypothesis m = 3!). Thus SO(Q)+x is closed and of finite volume in X, which implies that SO(Q)+ ∩ SL(3, Z) is a lattice in SO(Q)+. Some algebraic group theory (specifically, the Borel density theorem) then shows that SO(Q)+ lies in the Zariski closure of SO(Q)+ ∩ SL(3, Z), and in particular is definable over Q. It is then not difficult to see that the only way this can happen is if Q has rational coefficients (up to scalar multiplication), and the Oppenheim conjecture follows.
2.11.2 Notes
This article was originally posted on September 29, 2007 at
terrytao.wordpress.com/2007/09/29
Thanks to Matheus for providing the reference [KaUg2007]. Thanks also to Elon
Lindenstrauss for some corrections.
both the unstable and stable modes of A). Then the orbit xₙ = (λⁿ y, λ⁻ⁿ z) will expand exponentially in the unstable mode and contract exponentially in the stable mode, and the orbit will lie along the rectangular hyperbola {(a, b) : ab = yz}.
As the above examples show, orbits of linear transformations can exhibit a variety
of behaviours, from exponential growth to exponential decay to oscillation to some
combination of all three. But there is one special case in which the behaviour is much
simpler, namely that the orbit remains polynomial. This occurs when A is a unipotent matrix, i.e. A = I + N where N is nilpotent (i.e. Nᵐ = 0 for some finite m). A typical example of a unipotent matrix is
A = ( 1 1 0 )
    ( 0 1 1 )        (2.31)
    ( 0 0 1 )
(and indeed, by the Jordan normal form (see Section 2.13), all unipotent matrices
are similar to direct sums of matrices of this type). For unipotent matrices, the binomial
formula terminates after m terms to obtain a polynomial expansion for Aⁿ:
Aⁿ = (I + N)ⁿ = I + nN + (n(n−1)/2)N² + . . . + (n(n−1)···(n−m+2)/(m−1)!)N^(m−1).
From this we easily see that, regardless of the choice of initial vector x, the coeffi-
cients of xn are polynomial in n. (Conversely, if the coefficients of xn are polynomial
in n for every x, it is not hard to show that A is unipotent; I’ll leave this as an exer-
cise.) It is instructive to see what is going on at the coefficient level, using the matrix (2.31) as an example. If we express the orbit xₙ in coordinates as xₙ = (aₙ, bₙ, cₙ), then the recurrence xₙ₊₁ = Axₙ becomes
aₙ₊₁ = aₙ + bₙ
bₙ₊₁ = bₙ + cₙ
cₙ₊₁ = cₙ.
We thus see that the sequence cₙ is constant, the sequence bₙ grows linearly, and aₙ grows quadratically, so the whole orbit xₙ has polynomial coefficients. If one views the recurrence xₙ₊₁ = Axₙ as a dynamical system, the polynomial nature of the dynamics is caused by the absence of (both positive and negative) feedback loops: c affects b, and b affects a, but there is no loop in which a component ultimately affects itself, which is the source of exponential growth, exponential decay, and oscillation. Indeed, one can view this absence of feedback loops as a definition of unipotence.
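This polynomial behaviour is easy to observe numerically; here is a minimal sketch iterating the matrix (2.31).

```python
import numpy as np

# Iterating x_{n+1} = A x_n for the unipotent matrix (2.31): the coefficient
# c_n stays constant, b_n grows linearly and a_n quadratically, so finite
# differences of the appropriate order vanish.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
x = np.array([0, 0, 1])
orbit = [x]
for _ in range(6):
    x = A @ x
    orbit.append(x)
a_n = [int(v[0]) for v in orbit]
print(a_n)                 # [0, 0, 1, 3, 6, 10, 15] -- quadratic in n
print(np.diff(a_n, n=2))   # constant second differences: [1, 1, 1, 1, 1]
print(np.diff(a_n, n=3))   # vanishing third differences: [0, 0, 0, 0]
```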
For the purposes of proving a dynamical theorem such as Ratner’s theorem, unipo-
tence is important for several reasons. The lack of exponential growing modes means
that the dynamics is not exponentially unstable going forward in time; similarly, the
lack of exponentially decaying modes means that the dynamics is not exponentially
unstable going backward in time. The lack of oscillation does not improve the stability
further, but it does have an important effect on the smoothness of the dynamics. In-
deed, because of this lack of oscillation, orbits which are polynomial in nature obey
an important dichotomy: either they go to infinity, or they are constant. There is a
quantitative version of this statement, known as Bernstein’s inequality: if a polynomial
remains bounded over a long interval, then its derivative is necessarily small. (From a
Fourier-analytic perspective, being polynomial with low degree is analogous to being
“low frequency”; the Fourier-analytic counterpart of Bernstein’s inequality is closely
related to the Sobolev inequality, and is extremely useful in PDE. But I digress.) These
facts seem to play a fundamental role in all arguments that yield Ratner-type theorems.
If g is unipotent, we thus see that the two orbits (gⁿx)_{n∈Z} and (gⁿx_ε)_{n∈Z} only diverge polynomially in n, without any oscillation. In particular, we have the dichotomy
that two orbits either diverge, or are translates of each other, together with Bernstein-
like quantitative formulations of this dichotomy. This dichotomy is a crucial compo-
nent in the proof of Ratner’s theorem, and explains why we need the group action to be
generated by unipotent elements.
When A − I is invertible, one can eliminate the shift b by translating the orbit xₙ, or more specifically making the substitution
yₙ := xₙ + (A − I)⁻¹b,
which simplifies (2.32) to
yₙ₊₁ = Ayₙ; y₀ = x₀ + (A − I)⁻¹b,
which allows us to solve for the orbit xₙ explicitly as
xₙ = Aⁿ(x₀ + (A − I)⁻¹b) − (A − I)⁻¹b.
Of course, we have to analyse things a little differently in the degenerate case that A − I
is not invertible, in particular the lower order term b plays a more significant role in
this case. Leaving that case aside for the moment, we see from the above formula that
the behaviour of the orbit xₙ is going to be largely controlled by the spectrum of A. In this case, A will have two (generalised) eigenvalues λ, 1/λ whose product is 1 (since det(A) = 1) and whose sum is real (since A clearly has real trace). This gives three possibilities:
1. Elliptic case. Here λ = e^(iθ) is a non-trivial unit phase. Then A is similar (after a real linear transformation) to the rotation matrix R_θ described earlier, and so the orbit xₙ lies along a linear transform of a circle, i.e. the orbit lies along an ellipse.
2. Hyperbolic case. Here λ is real with |λ| > 1 or 0 < |λ| < 1. In this case A is similar to the diagonal matrix ( λ 0 ; 0 1/λ ), and so by the previous discussion we see that the orbit xₙ lies along a linear transform of a rectangular hyperbola, i.e. the orbit lies along a general hyperbola.
3. Parabolic case. This is the boundary case between the elliptic and hyperbolic cases, in which λ = 1. Then either A is the identity (in which case xₙ travels along a line, or is constant), or else (by the Jordan normal form) A is similar to the matrix ( 1 1 ; 0 1 ). Applying a linear change of coordinates, we thus see that the affine recurrence xₙ₊₁ = Axₙ + b is equivalent to the 2 × 2 system
yₙ₊₁ = yₙ + zₙ + c
zₙ₊₁ = zₙ + d
for some real constants c, d and some real sequences yₙ, zₙ. If c, d are non-zero, we see that zₙ varies linearly in n and yₙ varies quadratically in n, and so (yₙ, zₙ) lives on a parabola. Undoing the linear change of coordinates, we thus see in this case that the original orbit xₙ also lies along a parabola. (If c or d vanish, the orbit lies instead on a line.)
Thus we see that all elements of SL(2, R) preserve some sort of conic section. The
elliptic elements trap their orbits along ellipses, the hyperbolic elements trap their or-
bits along hyperbolae, and the parabolic elements trap their orbits along parabolae (or
along lines, in some degenerate cases). The elliptic elements thus generate oscillation,
the hyperbolic elements generate exponential growth and decay, and the parabolic el-
ements are unipotent and generate polynomial growth. (If one interprets elements of
SL(2, R) as area-preserving linear or affine transformations, then elliptic elements are
rotations around some origin (and in some coordinate system), hyperbolic elements are
compressions along one axis and dilations along another, and parabolic elements are
shear transformations and translations.)
Remark 2.42. It is curious that every element of SL(2, R) preserves at least one non-
trivial quadratic form; this statement is highly false in higher dimensions (consider for
instance what happens to diagonal matrices). I don’t have a “natural” explanation of
this fact - some sort of fixed point theorem at work, perhaps? I can cobble together a
proof using the observations that (a) every matrix in SL(2, R) is similar to its inverse,
(b) the space of quadratic forms on R2 is odd-dimensional, (c) any linear transforma-
tion on an odd-dimensional vector space which is similar to its inverse has at least one
eigenvalue equal to ±1, (d) the action of a non-degenerate linear transformation on
quadratic forms preserves positive definiteness, and thus cannot have negative eigen-
values, but this argument seems rather ad hoc to me.
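That said, the invariance claim itself is easy to verify symbolically once one guesses the form cx² + (d − a)xy − by² (non-trivial unless M = ±1), a guess suggested by the fixed point equation of the associated Möbius transformation. A quick sympy check of this guess:

```python
import sympy as sp

# Symbolic verification for SL(2, R): the matrix M = [[a, b], [c, d]] with
# ad - bc = 1 preserves the quadratic form q(x, y) = c x^2 + (d - a) x y - b y^2.
a, b, c, d = sp.symbols('a b c d')
M = sp.Matrix([[a, b], [c, d]])
Q = sp.Matrix([[c, (d - a) / 2],
               [(d - a) / 2, -b]])        # Gram matrix of q
# Impose the determinant condition by eliminating d, then check M^T Q M = Q:
residual = (M.T * Q * M - Q).subs(d, (1 + b * c) / a)
print(sp.simplify(residual))              # the zero matrix
```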
One can view the parabolic elements of SL(2, R) as the limit of elliptic or hyperbolic ones in a number of ways. For instance, the matrix ( 1 1 ; ε 1 ) is hyperbolic when ε > 0, parabolic when ε = 0, and elliptic when ε < 0. This is related to how the hyperbola, parabola, and ellipse emerge as sections of the light cone. Another way to obtain the parabola as a limit is to view that parabola as an infinitely large ellipse (or hyperbola), with centre infinitely far away. For instance, the ellipse of vertical radius R and horizontal radius √R centred at (0, R) is given by the equation x²/R + (y − R)²/R² = 1, which can be rearranged as y = (1/2)x² + (1/2R)y². In the limit R → ∞, this ellipse becomes the parabola y = (1/2)x², and rotations associated with those ellipses can converge to parabolic
affine maps of the type described above. A similar construction allows one to view the
parabola as a limit of hyperbolae; incidentally, one can use (the Fourier transform of)
this limit to show (formally, at least) that the Schrödinger equation emerges as the non-relativistic limit of the Klein-Gordon equation.
Q(x₁, . . . , x_d) = x₁² + . . . + x_r² − x_(r+1)² − . . . − x_d²
for some 0 ≤ r ≤ d. The pair (r, d − r) is the signature of Q, and SO(Q) is isomorphic to the group SO(r, d − r). The signature is an invariant of Q; this is Sylvester's law of inertia.
In the Euclidean (i.e. definite) case r = d (or r = 0), the level sets of Q are spheres (in diagonalised form) or ellipsoids (in general), and so the orbits of elements in SO(Q) ≅ SO(d) stay trapped on spheres or ellipsoids. Thus their orbits cannot exhibit exponential growth or decay, or polynomial behaviour; they must instead oscillate, much like the elliptic elements of SL(2, R). In particular, SO(Q) does not contain any non-trivial unipotent elements.
In the indefinite case d = 2, r = 1, the level sets of Q are hyperbolae (as well as the light cone {(x₁, x₂) : x₁² − x₂² = 0}, which in two dimensions is just a pair of intersecting lines). It is then geometrically clear that most elements of SO(Q) ≅ SO(1, 1) are going to be hyperbolic, as their orbits will typically escape to infinity along hyperbolae. (The only exceptions are the identity and the negative identity.) Elements of SO(1, 1) are also known as Lorentz boosts. (More generally, SO(d, 1) (or SO(1, d)) is the structure group for special relativity in d space and 1 time dimensions.)
Now we turn to the case of interest, namely d = 3 and Q indefinite, thus r = 1 or
r = 2. By changing the sign of Q if necessary we may take r = 1, and after diagonalising
we can write
Q(x₁, x₂, x₃) = x₁² + x₂² − x₃².
The level sets of Q are mostly hyperboloids, together with the light cone {(x₁, x₂, x₃) : x₁² + x₂² − x₃² = 0}. So a typical element of SO(Q) ≅ SO(2, 1) will have orbits that are trapped inside light cones or on hyperboloids.
In general, these orbits will wander in some complicated fashion over such a cone
or hyperboloid. But for some special elements of SO(Q), the orbit is contained in
a smaller variety. For instance, consider a Euclidean rotation around the x3 axis by
some angle θ . This clearly preserves Q, and the orbits of this rotation lie on horizontal
circles, which are of course each contained in a hyperboloid or light cone. So we see
that SO(Q) contains elliptical elements, and this is “because” we can get ellipses as
sections of hyperboloids and cones, by slicing them with spacelike planes.
Similarly, if one considers a Lorentz boost along the x₁ or x₂ direction (mixing that direction with the x₃ direction), we also preserve Q, and the orbits of this boost lie on vertical hyperbolae (or on a one-dimensional light cone). So we see that SO(Q) contains hyperbolic elements, which is “because” we can get hyperbolae as sections of hyperboloids and cones, by slicing them with timelike planes.
So, to get unipotent elements of SO(Q), it is clear what we should do: we should
exploit the fact that parabolae are also sections of hyperboloids and cones, obtained
by slicing these surfaces along null planes. For instance, if we slice the hyperboloid
{(x₁, x₂, x₃) : x₁² + x₂² − x₃² = 1} with the null plane {(x₁, x₂, x₃) : x₃ = x₂ + 1} we obtain the parabola {(x₁, x₃ − 1, x₃) : 2x₃ = x₁²}. A small amount of calculation then lets us find a linear transformation which preserves both the hyperboloid and the null plane (and thus preserves Q and preserves the parabola); indeed, if we introduce null coordinates (y₁, y₂, y₃) := (x₁, x₃ − x₂, x₃ + x₂), then the hyperboloid and null plane are given by the equations y₁² = y₂y₃ + 1 and y₂ = 1 respectively; a little bit of algebra shows that the linear transformations (y₁, y₂, y₃) ↦ (y₁ + ay₂, y₂, y₃ + 2ay₁ + a²y₂) will preserve both
surfaces for any constant a. This provides a one-parameter family (a parabolic sub-
group, in fact) of unipotent elements (known as null rotations) in SO(Q). By rotating
the null plane around we can get many such one-parameter families, whose orbits trace
out all sorts of parabolae, and it is not too hard at this point to show that the unipotent
elements can in fact be used to generate all of SO(Q) (or SO(Q)+ ).
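The computations in this paragraph can be confirmed symbolically; here is a short sympy sketch of the check.

```python
import sympy as sp

# Check that the null rotation, written in the null coordinates
# (y1, y2, y3) = (x1, x3 - x2, x3 + x2), is unipotent and preserves the
# quadratic form, which in these coordinates reads y1^2 - y2*y3.
a = sp.symbols('a')
U = sp.Matrix([[1,     a,    0],
               [0,     1,    0],
               [2 * a, a**2, 1]])   # (y1,y2,y3) -> (y1 + a y2, y2, y3 + 2a y1 + a^2 y2)
G = sp.Matrix([[1, 0, 0],
               [0, 0, -sp.Rational(1, 2)],
               [0, -sp.Rational(1, 2), 0]])   # Gram matrix of y1^2 - y2*y3
assert (U.T * G * U - G).expand() == sp.zeros(3, 3)   # preserves the form
assert (U - sp.eye(3))**3 == sp.zeros(3, 3)           # unipotent: (U - I)^3 = 0
print("null rotation checks out for every a")
```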
Remark 2.43. Incidentally, the fact that the parabola is a section of a cone or hy-
perboloid of one higher dimension allows one (via the Fourier transform) to embed
solutions to the free Schrödinger equation as solutions to the wave or Klein-Gordon
equations of one higher dimension; this trick allows one, for instance, to derive the
conservation laws of the former from those of the latter. See for instance Exercises
2.11, 3.2, and 3.30 of my book [Ta2006d].
2.12.5 Notes
This article was originally posted on Oct 5, 2007 at
terrytao.wordpress.com/2007/10/05
Thanks to Emmanuel Kowalski and Attila Smith for corrections.
Theorem 2.45 (Nilpotent Jordan normal form). Every nilpotent linear transformation
T : V → V on a finite dimensional vector space is similar to a direct sum of right shifts.
We will prove this theorem later, but for now let us see how we can quickly deduce
Theorem 2.44 from Theorem 2.45. The idea here is, of course, to split up the minimal
polynomial, but it turns out that we don’t actually need the minimal polynomial per se;
any polynomial that annihilates the transformation will do.
More precisely, let T : V → V be a linear transformation on a finite-dimensional
complex vector space V . Then the powers I, T, T 2 , T 3 , . . . are all linear transforma-
tions on V . On the other hand, the space of all linear transformations on V is a finite-
dimensional vector space. Thus there must be a non-trivial linear dependence between
these powers. In other words, we have P(T ) = 0 (or equivalently, V = ker(P(T ))) for
some polynomial P with complex coefficients.
Now suppose that we can factor this polynomial P into two coprime factors of lower
degree, P = QR. Using the extended Euclidean algorithm (or more precisely, Bézout’s
identity), we can find more polynomials A, B such that AQ + BR = 1. In particular,
A(T )Q(T ) + B(T )R(T ) = I. (2.33)
The formula (2.33) has two important consequences. Firstly, it shows that ker(Q(T )) ∩
ker(R(T )) = {0}, since if a vector v was in the kernel of both Q(T ) and R(T ), then by
applying (2.33) to v we obtain v = 0. Secondly, it shows that ker(Q(T )) + ker(R(T )) =
V . Indeed, given any v ∈ V , we see from (2.33) that v = R(T )B(T )v + Q(T )A(T )v;
since Q(T)R(T) = R(T)Q(T) = P(T) = 0 on V, we see that R(T)B(T)v and Q(T)A(T)v lie in ker(Q(T)) and ker(R(T)) respectively. Finally, since all polynomials in T com-
mute with each other, the spaces ker(Q(T )) and ker(R(T )) are T -invariant.
Putting all this together, we see that the linear transformation T on ker(P(T )) is
similar to the direct sum of the restrictions of T to ker(Q(T )) and ker(R(T )) respec-
tively. We can iterate this observation, reducing the degree of the polynomial P which
annihilates T , until we reduce to the case in which this polynomial P cannot be split
into coprime factors of lesser degree. But by the fundamental theorem of algebra, this can only occur if P takes the form P(t) = (t − λ)ᵐ for some λ ∈ C and m ≥ 0. In other words, we can reduce to the case when (T − λI)ᵐ = 0, or in other words T is equal to λI plus a nilpotent transformation. If we then subtract off the λI term, the claim now easily follows from Theorem 2.45.
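To make the splitting concrete, here is a small symbolic illustration of (2.33) in action; the matrix T and the factorisation P = QR below are of course just sample choices for the demonstration.

```python
import sympy as sp

# A concrete instance of the splitting (2.33): T is annihilated by
# P(t) = (t - 2)^2 (t - 5) = Q(t) R(t), and the Bezout identity
# A*Q + B*R = 1 produces two complementary projections.
t = sp.symbols('t')
T = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 5]])
Q = (t - 2)**2
R = t - 5
A, B, g = sp.gcdex(Q, R)        # extended Euclidean algorithm: A*Q + B*R = g
assert sp.simplify(g) == 1

def eval_poly(p, M):
    # Evaluate the polynomial p(t) at the matrix M by Horner's scheme.
    out = sp.zeros(*M.shape)
    for coeff in sp.Poly(p, t).all_coeffs():
        out = out * M + coeff * sp.eye(M.rows)
    return out

P1 = eval_poly(B, T) * eval_poly(R, T)   # projection onto ker Q(T)
P2 = eval_poly(A, T) * eval_poly(Q, T)   # projection onto ker R(T)
assert P1 + P2 == sp.eye(3)                      # this is (2.33) applied to T
assert eval_poly(Q, T) * P1 == sp.zeros(3, 3)    # range(P1) lies in ker Q(T)
assert eval_poly(R, T) * P2 == sp.zeros(3, 3)    # range(P2) lies in ker R(T)
print("T splits into a 2x2 block with eigenvalue 2 and a 1x1 block with eigenvalue 5")
```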
Remark 2.46. From a modern algebraic geometry perspective, all we have done here
is split the spectrum of T (or of the ring generated by T ) into connected components.
It is interesting to see what happens when two eigenvalues get very close together.
If one carefully inspects how the Euclidean algorithm works, one concludes that the
coefficients of the polynomials A(T ) and B(T ) above become very large (one is trying
to separate two polynomials Q(T ) and R(T ) that are only barely coprime to each other).
Because of this, the Jordan decomposition becomes very unstable when eigenvalues
begin to collide.
Because the fundamental theorem of algebra is used, it was necessary25 to work
in an algebraically closed field such as the complex numbers C. Over the reals, one
25 Indeed, one can in fact deduce the fundamental theorem of algebra from the Jordan normal form theo-
rem.
picks up other “elliptic” components, such as 2 × 2 rotation matrices, which are not
decomposable into translates of shift operators.
Thus far, the decompositions have been canonical - the spaces one is decomposing
into can be defined uniquely in terms of T (they are the kernels of the primary factors
of the minimal polynomial). However, the further splitting of the nilpotent (or shifted nilpotent) operators into smaller components will be non-canonical²⁶, depending on an arbitrary choice of basis.
26 However, the number and sizes of these components are canonical; this is easiest to see by inspecting the dimensions of the kernels of (T − λI)ᵐ for various λ, m using a Jordan normal form.
T^(m_xᵢ − 1)xᵢ of our orbits. For instance, in the above example we can apply T once to obtain the non-trivial linear relation
3Tx + 5T²y + 7T³z = 0,
i.e. T(3x + 5Ty + 7T²z) = 0.
2.13.3 Notes
This article was originally posted on Oct 12, 2007 at
terrytao.wordpress.com/2007/10/12
Greg Kuperberg observed that the above proof also yields the classification of finitely generated modules over a principal ideal domain, in the case of modules with non-trivial annihilator. In particular, replacing the polynomial ring C[X] by the integers Z, one can obtain the classification of finite abelian groups. Greg also pointed out that the Hahn-Hellinger theorem can be viewed as an infinite-dimensional analogue of the Jordan normal form for self-adjoint operators.
∂tt u − ∆u = 0 (2.35)
More precisely, solutions to (2.35) tend to decay in time as t → +∞, as can be seen from the presence of the 1/t term in the explicit formula
u(t, x) = (1/4πt) ∫_{|y−x|=t} ∂t u(0, y) dS(y) + ∂t [ (1/4πt) ∫_{|y−x|=t} u(0, y) dS(y) ], (2.37)
for such solutions in terms of the initial position u(0, y) and initial velocity ∂t u(0, y), where t > 0, x ∈ R³, and dS is the area element of the sphere {y ∈ R³ : |y − x| = t}.
(For this article I will ignore the technical issues regarding how smooth the solution has to be in order for the above formula to be valid.) On the other hand, solutions to (2.36) tend to blow up in finite time from data with positive initial position and initial velocity, even if this data is very small, as can be seen by the family of solutions
u(t, x) := c(T − t)^(−2/(p−1))
for T > 0, 0 < t < T, and x ∈ R³, where c is the positive constant c := (2(p + 1)/(p − 1)²)^(1/(p−1)). For T large, this gives a family of solutions which starts out very small at time zero, but still manages to go to infinity in finite time.
The equation (2.34) can be viewed as a combination of equations (2.35) and (2.36)
and should thus inherit a mix of the behaviours of both its “parents”. As a general rule, when the initial data u(0, ·), ∂t u(0, ·) of the solution is small, one expects the dispersion to “win” and send the solution to zero as t → ∞, because the nonlinear effects are weak;
conversely, when the initial data is large, one expects the nonlinear effects to “win”
and cause blowup, or at least large amounts of instability. This division is particularly
pronounced when p is large (since then the nonlinearity is very strong for large data
and very weak for small data), but not so much for p small (for instance, when p = 1,
the equation becomes essentially linear, and one can easily show that blowup does not
occur from reasonable data.)
The theorem of John formalises this intuition, with a remarkable threshold value
for p:
∂t u = F
can be rewritten via the fundamental theorem of calculus in the integral form
u(t) = u(0) + ∫_0^t F(s) ds,
∂tt u − ∆u = F
can be rewritten via the fundamental solution (2.37) of the homogeneous equation (to-
gether with Duhamel’s principle) in the integral form
u(t, x) = ulin(t, x) + (1/4π) ∫_0^t ∫_{|y−x|=|t−s|} F(s, y)/(t − s) dS(y) ds
where ulin is the solution to the homogeneous wave equation (2.35) with initial posi-
tion u(0, x) and initial velocity ∂t u(0, x) (and is given using (2.37)). [I plan to write
more about this formula in a later article, but today I will just treat it as a mirac-
ulous identity. I will note however that the formula generalises Newton's formula u(x) = (1/4π) ∫_{R³} F(y)/|x − y| dy for the standard solution to Poisson's equation −∆u = F.]
Using the fundamental solution, the nonlinear wave equation (2.34) can be rewrit-
ten in integral form as
u(t, x) = ulin(t, x) + (1/4π) ∫_0^t ∫_{|y−x|=|t−s|} |u(s, y)|^p/(t − s) dS(y) ds. (2.38)
Remark 2.50. Strictly speaking, one needs to first show that the solution exists and is
sufficiently smooth before Duhamel’s principle can be rigorously applied, but this turns
out to be a routine technical detail and I will not discuss it here.
John’s argument now exploits a remarkable feature of the fundamental solution of
the three-dimensional wave equation, namely that it is non-negative; combining this
with the non-negativity of the forcing term |u|^p, we see that the integral in (2.38), which represents the cumulative effect of the nonlinearity, is always non-negative. Thus we have the pointwise inequality
u(t, x) ≥ ulin(t, x), (2.39)
but also we see that any lower bound for u of the form u(t, x) ≥ v(t, x) can be immediately bootstrapped via (2.38) to a new lower bound
u(t, x) ≥ ulin(t, x) + (1/4π) ∫_0^t ∫_{|y−x|=|t−s|} |v(s, y)|^p/(t − s) dS(y) ds. (2.40)
This gives a way to iteratively give lower bounds on a solution u, by starting with the lower bound (2.39) (and computing ulin(t, x) explicitly using (2.37)) and then feeding this bound repeatedly into (2.40) to see what one gets. (This iteration procedure is closely related to the method of Picard iteration for constructing solutions to nonlinear ODE or PDE, which is still widely used today in the modern theory.)
What will transpire is that this iterative process will yield successively larger lower bounds when p < 1 + √2, but will yield successively smaller lower bounds when p > 1 + √2; this is the main driving force behind John's theorem. (To actually establish blowup in finite time when p < 1 + √2, there is an auxiliary step that uses energy inequalities to show that once the solution gets sufficiently large, it will be guaranteed to develop singularities within a finite amount of additional time. To establish global solutions when p > 1 + √2, one needs to show that the lower bounds constructed by this scheme in fact converge to the actual solution, and establish uniform control on all of these lower bounds.)
The remaining task is a computational one, to evaluate the various lower bounds for
u arising from (2.39) and (2.40) from some given initial data. In principle, this is just
an application of undergraduate several variable calculus, but if one sets about work-
ing out the relevant integrals exactly (using polar coordinates, etc.), the computations
quickly become tediously complicated. But we don’t actually need exact, closed-form
expressions for these integrals; just knowing the order of magnitude of these integrals is
enough. For that task, much faster (and looser) computational techniques are available.
Let’s see how. We begin with the computation of the linear solution ulin (t, x). This
is given in terms of the initial data u(0, x), ∂t u(0, x) via the formula (2.37). Now, for the
purpose of establishing John’s theorem in the form stated above, we have the freedom
to pick the initial data as we please, as long as it is smooth, small, and compactly
supported. To make our life easier, we pick initial data with vanishing initial position
and non-negative initial velocity, thus u(0, x) = 0 and ∂t u(0, x) ≥ 0; this eliminates the
pesky partial derivative in (2.37) and makes ulin non-negative. More concretely, let us
take
∂t u(0, x) := εψ(x/ε)
for some fixed non-negative bump function ψ (the exact form is not relevant) and some
small ε > 0, thus the initial velocity has very small amplitude and width. To simplify the notation we shall work with macroscopic values of ε, thus ε ∼ 1, but it will not be hard to see that the arguments below also work for very small ε (though of course the smaller ε is, the longer it will take for blowup to occur).
As I said before, we only need an order of magnitude computation. Let us reflect this by describing the initial velocity ∂t u(0, x) in fuzzier notation:
∂t u(0, x) ∼ 1 when x = O(1).
Note that the factor 1/(4π) in (2.37) can be discarded for the purposes of order of magnitude computation. Geometrically, the remaining integral is measuring the area of the portion of the sphere {|y − x| = t} which intersects the ball {y = O(1)}. A little bit of geometric visualisation will reveal that for large times t ≫ 1, this portion of the sphere will vanish unless |x| = t + O(1), in which case it is a spherical cap of diameter O(1), and thus area O(1).
Thus we are led to the back-of-the-envelope computation
ulin(t, x) ∼ 1/t when |x| = t + O(1) and t ≫ 1,
with ulin(t, x) zero when |x| ≠ t + O(1). (This vanishing outside of a neighbourhood of the light cone {|x| = t} is a manifestation of the sharp Huygens principle.) In particular, from (2.39) we obtain the initial lower bound
u(t, x) ≳ 1/t when |x| = t + O(1) and t ≫ 1.
If we then insert this bound into (2.40) and discard the linear term ulin (which we already know to be positive, and which we have already “used up” in some sense) we obtain the lower bound
u(t, x) ≳ ∫_0^t ∫_{|y−x|=|t−s|; |y|=s+O(1); s≫1} (1/(t − s)) (1/s^p) dS(y) ds.
This is a moderately scary looking integral. But we can get a handle on it by first looking at it geometrically. For a fixed point (t, x) in spacetime, the region of integration is the intersection of a backwards light cone {(s, y) : 0 ≤ s ≤ t; |y − x| = |t − s|} with a thickened forwards light cone {(s, y) : |y| = s + O(1); s ≫ 1}. If |x| is much larger than t, then these cones will not intersect. If |x| is close to t, the intersection looks complicated, so let us consider the spacelike case when |x| is much less than t, say |x| ≤ t/2; we also continue working in the asymptotic regime t ≫ 1. In this case, a bit of geometry or algebra shows that the intersection of the two light cones is a two-dimensional ellipsoid in spacetime of radii ∼ t (in particular, its surface area is ∼ t²), and living at times s in the interior of [0, t], thus s and t − s are both comparable to t. Thickening the forward cone, it is then geometrically intuitive that the intersection of the backwards light cone with the thickened forwards light cone is an angled strip around that ellipse of thickness ∼ 1; thus the total measure of this strip is roughly ∼ t². Meanwhile, since s and t − s are both comparable to t, the integrand is of magnitude ∼ (1/t)(1/t^p). Putting all of this together, we conclude that
u(t, x) ≳ (1/t)(1/t^p) t² = t^(1−p)
whenever we are in the interior cone region {(t, x) : t ≫ 1; |x| ≤ t/2}.
To summarise so far, the linear evolution filled out the light cone {(t, x) : t ≫ 1; |x| = t + O(1)} with a decay t⁻¹, and then the nonlinearity caused a secondary wave that filled out the interior region {(t, x) : t ≫ 1; |x| < t/2} with a decay t^(1−p). We now compute the tertiary wave by inserting the secondary wave bound back into (2.40), to get
u(t, x) ≳ ∫_0^t ∫_{|y−x|=|t−s|; |y|<s/2; s≫1} (1/(t − s)) s^(p(1−p)) dS(y) ds.
Let us continue working in an interior region, say {(t, x) : t ≫ 1; |x| < t/4}. The region of integration is the intersection of the backwards light cone {(s, y) : 0 ≤ s ≤ t; |y − x| = t − s} with an interior region {(s, y) : s ≫ 1; |y| < s/2}. A brief sketch of the situation reveals that this intersection basically consists of the portion of the backwards light cone in which s is comparable in size to t. In particular, this intersection has a three-dimensional measure of ∼ t³, and on the bulk of this intersection, s and t − s are both comparable to t. So we obtain a lower bound
u(t, x) ≳ (1/t) t^(p(1−p)) t³ = t^(1−p) t^(2−(p−1)²)
whenever t ≫ 1 and |x| < t/4.
Now we finally see where the condition p < 1 + √2 will come in; if this condition is true, then 2 − (p − 1)² is positive, and so the tertiary wave is stronger than the secondary wave, and also situated in essentially the same location of spacetime. This is the beginning of a positive feedback loop; the quaternary wave will be even stronger still, and so on and so forth. Indeed, it is not hard to show that if p < 1 + √2, then for any constant A, one will have a lower bound of the form u(t, x) ≳ t^A in the interior of the light cone. This does not quite demonstrate blowup per se - merely superpolynomial growth instead - but actually one can amplify this growth into blowup with a little bit more effort (e.g. integrating (2.34) in space to eliminate the Laplacian term and investigating the dynamics of the spatial integral ∫_{R³} u(t, x) dx, taking advantage of finite speed of propagation for this equation, which limits the support of u to the cone {|x| ≤ t + O(1)}). A refinement of these arguments, taking into account more of the components of the various waves in the iteration, also gives blowup for the endpoint p = 1 + √2.
In the other direction, if p > 1 + √2, the tertiary wave appears to be smaller than
the secondary wave (though to fully check this, one has to compute a number of other
components of these waves which we have discarded in the above computations). This
sets up a negative feedback loop, with each new wave in the iteration scheme decaying
faster than the previous, and thus suggests global existence of the solution, at least
when the size of the initial data (which was represented by ε) was sufficiently small.
This heuristic prediction can be made rigorous by controlling these iterates in various
function space norms that capture these sorts of decay, but I will not detail them here.
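In fact, the feedback in both directions is already visible in the exponent bookkeeping alone: each insertion into the Duhamel integral replaced a wave of size t^α by a wave of size roughly (1/t)·t^(pα)·t³ = t^(pα+2) in the interior region. Here is a minimal sketch of this recursion (assuming, as a simplification, that every subsequent wave is governed by the same interior geometry as the tertiary computation above):

```python
import math

# Exponent bookkeeping for the wave iteration: the exponents evolve by
# alpha -> p*alpha + 2, starting from the secondary wave exponent 1 - p.
def wave_exponents(p, waves=8):
    alpha, out = 1.0 - p, []
    for _ in range(waves):
        out.append(alpha)
        alpha = p * alpha + 2
    return out

p_crit = 1 + math.sqrt(2)
for p in (p_crit - 0.2, p_crit + 0.2):   # arbitrary offsets from the threshold
    print("p = %.3f:" % p, ["%.1f" % a for a in wave_exponents(p)])
# Below the critical exponent 1 + sqrt(2) the exponents increase without
# bound (positive feedback); above it they decrease without bound, so each
# new wave decays faster than the last (negative feedback).
```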
Remark 2.51. More generally, any analysis of a semilinear equation that requires one to
compute the tertiary wave tends to give conditions on the exponents which are quadratic
in nature; if the quaternary wave was involved also, then cubic constraints might be
involved, and so forth. In this particular case, an analysis of the primary and secondary waves alone (which would lead just to linear constraints on p) is not enough, because these waves live in very different regions of spacetime and so do not fully capture the feedback mechanism.
2.14.1 Notes
This article was originally posted on Oct 26, 2007 at
terrytao.wordpress.com/2007/10/26
Note that the hypothesis that F is algebraically closed is crucial; for instance, if F is
the real line R, then the equation x2 + 1 = 0 has no solution, but there is no polynomial
Q(x) such that (x2 + 1)Q(x) = 1.
Like many results of the “The only obstructions are the obvious obstructions” type,
the power of the nullstellensatz lies in the ability to take a hypothesis about non-
existence (in this case, non-existence of solutions to P1 (x) = . . . = Pm (x) = 0) and
deduce a conclusion about existence (in this case, existence of Q1 , . . . , Qm such that
P1 Q1 + . . . + Pm Qm = 1). The ability to get “something from nothing” is clearly going
to be both non-trivial and useful. In particular, the nullstellensatz offers an important
duality between algebraic geometry (Conclusion I is an assertion that a certain alge-
braic variety is empty) and commutative algebra (Conclusion II is an assertion that a
certain ideal is non-proper).
Now suppose one is trying to solve the more complicated system P₁(x) = . . . = Pₘ(x) = 0; R(x) ≠ 0 for some polynomials P₁, . . . , Pₘ, R. Again, any identity of the form P₁Q₁ + . . . + PₘQₘ = 1 will be an obstruction to solvability, but now more obstructions are possible: any identity of the form P₁Q₁ + . . . + PₘQₘ = R^r for some non-negative integer r will also obstruct solvability. The strong nullstellensatz asserts that this is the only obstruction:
II. There exist polynomials Q₁, . . . , Qₘ ∈ F[x] and a non-negative integer r such that P₁Q₁ + . . . + PₘQₘ = R^r.
secretly apply the fundamental theorem of algebra throughout the proof which follows,
to clarify what is going on.
Let us say that a collection (P1 , . . . , Pm ; R) of polynomials obeys the nullstellensatz
if at least one of Conclusions I and II is true. It is clear that Conclusions I and II cannot
both be true, so to prove the nullstellensatz it suffices to show that every collection
(P1 , . . . , Pm ; R) obeys the nullstellensatz.
We can of course throw away any of the Pi that are identically zero, as this does
not affect whether (P1 , . . . , Pm ; R) obeys the nullstellensatz. If none of the Pi remain,
then we have Conclusion I, because the polynomial R has at most finitely many ze-
roes, and because an algebraically closed field must be infinite. So suppose that we
have some non-zero Pi . We then repeatedly use the extended Euclidean algorithm to
locate the greatest common divisor D(x) of the remaining Pi . Note that this algorithm
automatically supplies for us some polynomials Q₁(x), . . . , Qₘ(x) such that
P₁Q₁ + . . . + PₘQₘ = D.
Because of this, we see that (P₁, . . . , Pₘ; R) obeys the nullstellensatz if and only if (D; R)
obeys the nullstellensatz. So we have effectively reduced to the case m = 1.
Now we apply the extended Euclidean algorithm again, this time to D and R, to express the gcd D′ of D and R as a combination D′ = DA + RB, and also to factor D = D′S and R = D′T for some polynomials A, B, S, T with AS + BT = 1. A little algebra then shows that one has a solution to the problem
D(x) = 0; R(x) ≠ 0
if and only if one has a solution to the problem
S(x) = 0; D′(x) ≠ 0.
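Here is a quick sympy illustration of this reduction; the polynomials D and R below are sample choices for the example only.

```python
import sympy as sp

# The m = 1 reduction on a concrete example: solving D(x) = 0, R(x) != 0
# is equivalent to solving S(x) = 0, D'(x) != 0, where D' = gcd(D, R)
# and D = D'*S.
x = sp.symbols('x')
D = sp.expand((x - 1)**2 * (x - 2))
R = sp.expand((x - 1) * (x - 3))
A, B, Dp = sp.gcdex(D, R)     # extended Euclidean algorithm: A*D + B*R = D'
S = sp.quo(D, Dp)             # D = D' * S
T = sp.quo(R, Dp)             # R = D' * T
print("D' =", Dp, "  S =", sp.factor(S))   # D' = x - 1, S = (x - 1)(x - 2)
# Original problem: the roots of D are {1, 2}, but R(1) = 0, so only x = 2
# works.  Reduced problem: S(x) = 0 gives x in {1, 2}, and D'(x) != 0 rules
# out x = 1, leaving the same solution x = 2.
print([r for r in sp.solve(sp.Eq(S, 0), x) if Dp.subs(x, r) != 0])   # [2]
```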
P₁S₁ + P₂S₂ = Res(P₁, P₂)
or show that
R^r = 0 mod I for some r. (2.42)
We assume that no solution to (2.41) exists, and use this to synthesise a relation of the form (2.42). Let y ∈ F^(d−1) be arbitrary. We can view the polynomials P₁(y, t), . . . , Pₘ(y, t), R(y, t) as polynomials in F[t], whose coefficients lie in F but happen to depend in a polynomial fashion on y. To emphasise this, we write Pj,y(t) for Pj(y, t) and Ry(t) for R(y, t). Then by hypothesis, there is no t for which
P1,y(t) = . . . = Pm,y(t) = 0; Ry(t) ≠ 0.
To motivate the strategy, let us consider the easy case when R = 1, m = 2, and P1 ,
P2 are monic polynomials in t. Then by our previous discussion, the above system is
solvable for any fixed y precisely when Res(P1,y , P2,y ) is zero. So either the equation
Res(P1,y , P2,y ) = 0 has a solution, in which case we have (2.41), or it does not. But
in the latter case, by applying the nullstellensatz at one lower dimension we see that
Res(P1,y , P2,y ) must be constant in y. But recall that the resultant is a linear combi-
nation P1,y S1,y + P2,y S2,y of P1,y and P2,y , where the polynomials S1,y and S2,y depend
polynomially on P1,y and P2,y and thus on y itself. Thus we end up with (2.42), and the
induction closes in this case.
Now we turn to the general case. Applying the d = 1 analysis, we conclude that
there exist polynomials Q1,y, . . . , Qm,y ∈ F[t], and an exponent r = ry ≥ 0, such that
P1,y(t)Q1,y(t) + . . . + Pm,y(t)Qm,y(t) = Ry(t)^(ry). (2.43)
Now, if the exponent ry was constant in y, and the coefficients of Q1,y , . . . , Qm,y de-
pended polynomially on y, we would be in case (2.42) and therefore done.
It is not difficult to make ry constant in y. Indeed, we observe that the degrees
of P1,y (t), . . . , Pm,y (t) are bounded uniformly in y. Inspecting the d = 1 analysis, we
conclude that the exponent ry returned by that algorithm is then also bounded uniformly
in y. We can always raise the value of ry by multiplying both sides of (2.43) by Ry, and so we can make r = ry independent of y, thus
P1,y(t)Q1,y(t) + . . . + Pm,y(t)Qm,y(t) = Ry(t)^r. (2.44)
Now we need to work on the Q's. Unfortunately, the coefficients of the Q's are not polynomial in y; instead, they are piecewise rational in y. Indeed, by inspecting the algorithm used to prove the d = 1 case, we see that the algorithm makes a finite number of branches, depending on whether certain polynomial expressions T(y) of y are zero or non-zero. At the end of each branching path, the algorithm returns polynomials Q1,y, . . . , Qm,y whose coefficients were rational combinations of the coefficients of P1,y, . . . , Pm,y and are thus rational functions of y. Furthermore, all the division operations are by polynomials T(y) which were guaranteed to be non-zero by some stage of the branching process, and so the net denominator of any of these coefficients is some product of the T(y) that are guaranteed non-zero.
An example might help illustrate what's going on here. Suppose that m = 2 and R = 1, and that P1(y, t), P2(y, t) are linear in t, thus

P1(y, t) = a(y) + b(y)t; P2(y, t) = c(y) + d(y)t

for some polynomials a, b, c, d ∈ F[y]. To find the gcd of P1,y and P2,y for a given y, which determines the solvability of the system P1,y(t) = P2,y(t) = 0, the Euclidean algorithm branches as follows:
1. If b(y) is zero, then

(a) If a(y) is zero, then

i. If d(y) is non-zero, then 0P1,y + (1/d(y))P2,y is the gcd (and the system is solvable).
ii. Otherwise, if d(y) is zero and c(y) is non-zero, then 0P1,y + (1/c(y))P2,y = 1 is the gcd (and the system is unsolvable).
iii. Otherwise, if d(y) and c(y) are both zero, then 0P1,y + 0P2,y is the gcd (and the system is solvable).

(b) Otherwise, if a(y) is non-zero, then (1/a(y))P1,y + 0P2,y = 1 is the gcd (and the system is unsolvable).

2. Otherwise, if b(y) is non-zero, then

(a) If a(y)d(y) − b(y)c(y) is non-zero, then (d(y)/(a(y)d(y) − b(y)c(y)))P1,y − (b(y)/(a(y)d(y) − b(y)c(y)))P2,y = 1 is the gcd (and the system is unsolvable).
(b) Otherwise, if a(y)d(y) − b(y)c(y) is zero, then (1/b(y))P1,y + 0P2,y is the gcd (and the system is solvable).
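For concreteness, the branching tree above is easy to transcribe into code; the following toy function (mine, not from the text) takes field values a, b, c, d and reports the gcd and solvability, mirroring the cases above:

```python
# Toy version of the branching tree for P1 = a + b*t, P2 = c + d*t:
# returns a description of the gcd and whether P1 = P2 = 0 is solvable.
def branch(a, b, c, d):
    if b == 0:
        if a == 0:                       # P1 is identically zero
            if d != 0:
                return "t + c/d", True   # case 1(a)i
            if c != 0:
                return "1", False        # case 1(a)ii
            return "0", True             # case 1(a)iii
        return "1", False                # case 1(b): P1 a non-zero constant
    if a * d - b * c != 0:
        return "1", False                # case 2(a)
    return "t + a/b", True               # case 2(b): P2 proportional to P1

# Example: a = 1, b = 1, c = 2, d = 2 falls into case 2(b).
print(branch(1, 1, 2, 2))                # ('t + a/b', True)
```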
So we see that even in the rather simple case of solving two linear equations in one unknown, there is a moderately complicated branching tree involved. Nevertheless, there are only finitely many branching paths. Some of these paths may be infeasible, in the sense that there do not exist any y ∈ F^{d−1} which can follow these paths. But given any feasible path, say one in which the polynomials S1(y), . . . , Sa(y) are observed to be zero, and T1(y), . . . , Tb(y) are observed to be non-zero, we know (since we are assuming no solution to (2.41)) that the algorithm creates an identity of the form (2.44) in which the coefficients of Q1,y, . . . , Qm,y are rational functions of y, whose denominators are products of T1, . . . , Tb. We may thus clear denominators (enlarging r if necessary) and obtain an identity of the form
P1,y(t)U1(y, t) + . . . + Pm,y(t)Um(y, t) = (T1(y) · · · Tb(y))^r Ry(t)^r (2.45)

for some polynomials U1, . . . , Um ∈ F[y, t]. This identity holds whenever y is such that S1(y), . . . , Sa(y) are zero and T1(y), . . . , Tb(y) are non-zero. But an inspection of the algorithm shows that the only reason we needed T1(y), . . . , Tb(y) to be non-zero was in order to divide by these quantities; if we clear denominators throughout, we thus see that we can remove these constraints and deduce that (2.45) holds whenever S1(y), . . . , Sa(y) are zero. Further inspection of the algorithm then shows that even if S1(y), . . . , Sa(y) are non-zero, this only introduces additional terms to (2.45) which are combinations (over F[y, t]) of S1, . . . , Sa. Thus, for any feasible path, we obtain an identity in F[y, t] of the form

P1,y(t)U1(y, t) + . . . + Pm,y(t)Um(y, t) + S1(y)V1(y, t) + . . . + Sa(y)Va(y, t) = (T1(y) · · · Tb(y))^r Ry(t)^r (2.46)

for some polynomials U1, . . . , Um, V1, . . . , Va ∈ F[y, t]. In other words, along each feasible path, some product of powers of the Tj times a power of R lies in the ideal generated by P1, . . . , Pm, S1, . . . , Sa. We claim that, more generally, an identity of the form (2.46) holds for every partial feasible path, with S1, . . . , Sa and T1, . . . , Tb being the polynomials observed to be zero and non-zero respectively along that path; specialising to the empty path (for which a = b = 0) will then give (2.42) as desired.
To prove this claim, we induct backwards on the length of the partial path. So
suppose we have some partial feasible path, which required S1 (y), . . . , Sa (y) to be zero
and T1 (y), . . . , Tb (y) to be non-zero in order to get here. If this path is complete, then we
are already done, so suppose there is a further branching, say on a polynomial W (y).
At least one of the cases W(y) = 0 and W(y) ≠ 0 must be feasible; and so we now
divide into three cases.
Case 1: W(y) = 0 is feasible and W(y) ≠ 0 is infeasible. If we follow the W(y) = 0 path and use the inductive hypothesis, we obtain a constraint of the form (2.46), with W adjoined to the list S1, . . . , Sa of vanishing polynomials, for some r. On the other hand, since W(y) ≠ 0 is infeasible, the nullstellensatz in dimension d − 1 lets us express some power of T1 · · · TbW as a combination of S1, . . . , Sa, and this allows us to eliminate W from the constraint and obtain (2.46) as desired. The remaining cases, in which the W(y) ≠ 0 path is feasible, are similar: following that path gives a constraint (2.48) of the form (2.46) with W adjoined to the list T1, . . . , Tb, for some exponent r′′. If the W(y) = 0 path is infeasible, then there is no solution to

S1(y) = . . . = Sa(y) = W(y) = 0; T1(y) · · · Tb(y) ≠ 0,

so the nullstellensatz in dimension d − 1 expresses some power of T1 · · · Tb as a combination of S1, . . . , Sa and of WZ for some Z; one can then multiply (2.48) by a suitable power of Z to eliminate W and obtain (2.46) as desired (with exponent r + r′′). If instead the W(y) = 0 path is also feasible, then it supplies a second constraint with W adjoined to S1, . . . , Sa, and multiplying suitable powers of the two constraints together again eliminates W.
This inductively establishes (2.46) for all partial branching paths, leading eventu-
ally to (2.42) as desired.
2.15.3 Notes
This article was originally posted on Nov 26, 2007 at
terrytao.wordpress.com/2007/11/26
An anonymous reader pointed out that a simpler version of the above proof was
obtained by Arrondo[Ar2006] (and independently by Manetti). The main new idea
is to first apply a generic linear change of variables to ensure some additional non-
degeneracy in the coefficients of the polynomials, which reduces the number of possi-
bilities when one then turns to the induction on dimension.
2.16 Hahn-Banach, Menger, Helly
The key lemma here is Farkas’ lemma, which asserts that given affine-linear functionals P1, . . . , Pm : R^d → R, exactly one of the following holds: either the system of inequalities

P1(x), . . . , Pm(x) ≥ 0

has a solution x ∈ R^d (Conclusion I), or one can fashion −1 as a non-negative linear combination of the Pj (Conclusion II). We prove this by induction on the dimension d. In the base case d = 1, each Pj can be rescaled into one of the forms x − ak, bj − x, or a constant cj; if some constant cj is negative we are immediately in Conclusion II, while if all the cj are non-negative and maxk ak ≤ minj bj then the system is solvable and we are in Conclusion I. If neither of these is the case, then we have bj < ak for some j, k, which allows us to fashion −1 as a non-negative linear combination of (x − ak) and (bj − x), and the claim follows.
Now suppose that d ≥ 2 and the claim has already been proven for d − 1. As in the previous section, we now split x = (x′, t) for x′ ∈ R^{d−1} and t ∈ R. Each linear inequality Pj(x′, t) ≥ 0 can now be rescaled into one of three forms: t − aj(x′) ≥ 0, bj(x′) − t ≥ 0, and cj(x′) ≥ 0.

We fix x′ and ask what properties x′ must obey in order for the above system to be solvable in t. By the one-dimensional analysis, we know that the necessary and sufficient conditions are that cj(x′) ≥ 0 for all cj, and that aj(x′) ≤ bk(x′) for all aj and bk. If we can find an x′ obeying these inequalities, then we are in Conclusion I and we are done. Otherwise, we apply the induction hypothesis and conclude that we can fashion −1 as a non-negative linear combination of the cj(x′) and of the bk(x′) − aj(x′). But each bk(x′) − aj(x′) can in turn be expressed as a non-negative linear combination of t − aj(x′) and bk(x′) − t, and so we are in Conclusion II as desired.
Exercise 2.1. Use Farkas’ lemma to derive the duality theorem in linear programming.
2.16.1 Applications
Now we connect the above lemma to results which are closer to the Hahn-Banach
theorem in its traditional form. We begin with
Theorem 2.55 (Separation theorem). Let A, B be disjoint convex polytopes in Rd . Then
there exists an affine-linear functional P : Rd → R such that P(x) ≥ 1 for x ∈ A and
P(x) ≤ −1 for x ∈ B.
Proof. We can view the system of inequalities P(x) − 1 ≥ 0 for x ∈ A and −1 − P(x) ≥ 0 for x ∈ B as a system of linear inequalities on P (or, if you wish, on the coefficients of P); note that it suffices to impose these inequalities at the finitely many vertices of A and B, so that Farkas’ lemma applies. If this system is solvable then we are done, so suppose the system is not solvable. Applying Farkas’ lemma, we conclude that there exist x1, . . . , xm ∈ A and y1, . . . , yn ∈ B and non-negative constants q1, . . . , qm, r1, . . . , rn such that

∑_{i=1}^{m} qi (P(xi) − 1) + ∑_{j=1}^{n} rj (−1 − P(yj)) = −1

identically in P. Comparing coefficients, we conclude that ∑i qi = ∑j rj = 1/2, and that

2 ∑_{i=1}^{m} qi xi = 2 ∑_{j=1}^{n} rj yj.

But by convexity the left-hand side is in A and the right-hand side in B, a contradiction.
The above theorem asserts that any two disjoint convex polytopes can be separated
by a hyperplane. One can establish more generally that any two disjoint convex bodies
can be separated by a hyperplane; in particular, this implies that if a convex function
always exceeds a concave function, then there is an affine linear function separating
the two. From this it is a short step to the Hahn-Banach theorem, at least in the setting
of finite-dimensional spaces; if one wants to find a linear functional λ : Rn → R which
has prescribed values on some subspace W, and lies between some convex and concave functions (e.g. −‖x‖ ≤ λ(x) ≤ ‖x‖ for some semi-norm ‖·‖), then by quotienting out W we can reduce to the previous problem.
We turn now to the minimax theorem. Consider a zero-sum game between two
players, Alice and Bob. Alice can pick any one of n strategies using a probability
distribution p = (p1 , . . . , pn ) of her choosing; simultaneously, Bob can pick any one of
m strategies using a probability distribution q = (q1, . . . , qm) of his choosing. Alice’s expected payoff F then takes the form F(p, q) := ∑_{i=1}^n ∑_{j=1}^m ci,j pi qj for some fixed real coefficients ci,j; Bob’s expected payoff in this zero-sum game is then −F(p, q).
Theorem 2.56 (Minimax theorem). Given any coefficients ci, j , there exists a unique
optimal payoff α such that
I. (Alice can expect to win at least α) There exists an optimal strategy p∗ for Alice
such that F(p∗ , q) ≥ α for all q;
II. (Bob can expect to lose at most α) There exists an optimal strategy q∗ for Bob
such that F(p, q∗ ) ≤ α for all p.
Proof. By playing Alice’s optimal strategy off against Bob’s, we see that the supremum
of the set of α which obey conclusion I is clearly finite, and less than or equal to the
infimum of the set of α which obey conclusion II, which is also finite. To finish the
proof, it suffices to show that these two numbers are equal. If they were not equal, then
we could find an α for which neither of conclusions I and II were true.
If conclusion I failed for this α, this means that the system of constraints

p1, . . . , pn ≥ 0; p1 + . . . + pn = 1; F(p, q) ≥ α for all q

has no solution. From the convexity of F(p, q) in q, we can replace this system with

p1, . . . , pn ≥ 0; p1 + . . . + pn ≤ 1; ∑_{i=1}^n ci,j pi ≥ α for all j.
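As an aside, the displayed system is a linear program, so the optimal payoff α and an optimal strategy p∗ can be computed with any LP solver; the following minimal sketch (mine, not from the text; it uses the probability normalisation p1 + . . . + pn = 1) uses scipy, assuming it is available:

```python
# Compute the value alpha and an optimal strategy p* of a zero-sum game:
# maximise alpha subject to sum_i C[i,j] p_i >= alpha for every column j.
import numpy as np
from scipy.optimize import linprog

def game_value(C):
    n, m = C.shape
    c = np.zeros(n + 1); c[-1] = -1.0          # variables (p, alpha); minimise -alpha
    A_ub = np.hstack([-C.T, np.ones((m, 1))])  # alpha - sum_i C[i,j] p_i <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Example: matching pennies has value 0 and optimal strategy (1/2, 1/2).
p_star, alpha = game_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```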
The minimax theorem can be used to give game-theoretic proofs of various theo-
rems of Hahn-Banach type. Here is one example:
Theorem 2.57 (Menger’s theorem). Let G be a directed graph, and let v and w be non-adjacent vertices in G. Then the max-flow from v to w in G (the largest number of internally vertex-disjoint paths one can find in G from v to w) is equal to the min-cut (the least number of vertices (other than v or w) one needs to delete from G to disconnect v from w).
The proof we give here is definitely not the shortest proof of Menger’s theorem,
but it does illustrate how game-theoretic techniques can be used to prove combinatorial
theorems.
Proof. Consider the following zero-sum game. Bob picks a path from v to w, and
Alice picks a vertex (other than v or w). If Bob’s path hits Alice’s vertex, then Alice
wins 1 (and Bob wins −1); otherwise Alice wins 0 (and Bob wins 0 as well). Let α
be Alice’s optimal payoff. Observe that we can prove α ≤ 1/maxflow by letting Bob pick one of some maximal collection of disjoint paths from v to w at random as his strategy; conversely, we can prove α ≥ 1/mincut by letting Alice pick a vertex from some minimal cut set at random as her strategy. To finish the proof we need to show that in fact 1/α ≥ mincut and 1/α ≤ maxflow.

Let’s first show that 1/α ≥ mincut. Let us assume that Alice is playing an optimal strategy, attaining the optimal payoff α. Then it is not hard to use the optimality
to show that every path that Bob might play must be hit by Alice with probability
exactly α (and any other path will be hit by Alice with probability at least α), and
conversely every vertex that Alice might pick will be hit by Bob with probability α
(and any other vertex will be hit by Bob with probability at most α).
Now suppose that two of Bob’s paths intersect at some intermediate vertex u. One
can show that the two resulting sub-paths from v to u must have an equal chance of
being hit by Alice, otherwise by swapping those two sub-paths one can create a path
which Alice hits with probability strictly less than α, a contradiction. Similarly, the
two sub-paths from u to w must have equal chance of being hit by Alice.
Now consider all the vertices u that Alice can pick for which there exists a path
of Bob which hits u before it hits any other vertex of Alice. Let U be the set of such
u. Every path from v to w must hit U, because if it were possible to avoid U and instead hit another vertex of Alice, it would again be possible to create a path that Alice hits with probability strictly less than α, by the above discussion. Thus, U is a cut set. Any
given path of Bob hits exactly one vertex in U (again by the above discussion). Since
each u in U has a probability α of being hit by Bob, we thus see that this cut set has
size exactly 1/α. Thus 1/α ≥ mincut as desired.
Now we show that 1/α ≤ maxflow. Define an α-flow to be a collection of non-
negative weights on the directed edges of G such that
1. the net flow at v (the total inflow minus total outflow) is −1, and the net flow at
w is +1;
2. for any other vertex u, the net flow at u is zero, and total inflow or outflow at u is
at most α.
We first observe that at least one α-flow exists. Indeed, if we pick one of Bob’s optimal
strategies, and weight each edge by the probability that Bob’s path passes through that
edge, one easily verifies that this gives an α-flow.
Given an α-flow, consider the undirected graph consisting of undirected versions of the directed edges on which the weight of the α-flow is positive. If this undirected graph contains a cycle, then one can modify the α-flow on this cycle by an epsilon, increasing the flow weight by ε on edges of the cycle that go with the flow, and reducing it by the same ε on edges that go against the flow; note that this preserves the property of being an α-flow. Increasing ε, we eventually reduce one of the weights to zero, thus reducing the number of edges on which the flow is supported. We can repeat this procedure until one arrives at an α-flow whose undirected graph contains no cycles. Now, as v has outflow at least +1 and every vertex adjacent to v can have inflow at most α (recall that v is not adjacent to w), the flow must propagate from v to at least 1/α other vertices. Each of these vertices must eventually flow to w (which is the only vertex with positive net flow) by at least one path; by the above discussion, these paths need to be disjoint. Thus we have 1/α ≤ maxflow as desired.
Remark 2.58. The above argument can also be generalised to prove the max-flow min-
cut theorem, but we will not do so here.
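As a computational aside (not from the text; the reduction and library use are my own choices), the max-flow quantity in Menger’s theorem can be computed with an off-the-shelf max-flow routine after the standard node-splitting trick, which converts the vertex capacities implicit in the α-flow definition into edge capacities:

```python
# Max number of internally vertex-disjoint v->w paths in a digraph G
# (with v, w non-adjacent), via node splitting and networkx's max flow.
import networkx as nx

def menger(G, v, w):
    big = G.number_of_nodes()           # effectively infinite capacity
    H = nx.DiGraph()
    for u in G.nodes:
        # split u into u_in -> u_out; intermediate vertices get capacity 1
        H.add_edge((u, "in"), (u, "out"),
                   capacity=big if u in (v, w) else 1)
    for a, b in G.edges:
        H.add_edge((a, "out"), (b, "in"), capacity=big)
    value, _ = nx.maximum_flow(H, (v, "in"), (w, "out"))
    return value
```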
Now we turn to Helly’s theorem. One formulation of this theorem is the following:

Theorem 2.59 (Helly’s theorem). Let B1, . . . , Bm be convex bodies in R^d with m ≥ d + 1. If every d + 1 of these bodies have a common point, then all of the bodies have a common point.

Remark 2.60. The reader is invited to verify Helly’s theorem in the d = 1 case, to get a flavour as to what is going on.

For simplicity we shall just prove Helly’s theorem in the model case when each of the B1, . . . , Bm are convex polytopes.
Proof. A convex polytope is the intersection of finitely many half-spaces. From this, one quickly sees that to prove Helly’s theorem for convex polytopes, it suffices to do so for half-spaces. By translating things a bit, we may assume that none of the half-spaces passes through the origin. Then each half-space can be expressed in the form {x : P(x) ≥ 1} for some linear functional P : R^d → R. Note that by duality, one can view P as living in a d-dimensional vector space (R^d)∗.

Let us say that there are m half-spaces involved, and let P1, . . . , Pm be the corresponding linear functionals. It suffices to show that if the system P1(x), . . . , Pm(x) ≥ 1 has no solution, then there is some sub-collection Pi1, . . . , Pij with j ≤ d + 1 such that Pi1(x), . . . , Pij(x) ≥ 1 also has no solution.

By Farkas’ lemma, we know that the system P1(x), . . . , Pm(x) ≥ 1 has no solution if and only if 0 is a convex combination of the P1, . . . , Pm. So everything reduces to establishing
Theorem 2.61 (Dual Helly theorem). Suppose that 0 can be expressed as a convex
combination of a collection of vectors v1 , . . . , vm in a d-dimensional vector space. Then
0 is also a convex combination of at most d + 1 vectors from that collection.
To prove this theorem, we use an argument a little reminiscent of that used to prove 1/α ≤ maxflow in the proof of Menger’s theorem. Suppose we can express 0 as a
convex combination of some of the vi . If at most d + 1 of the vectors have non-zero
coefficients attached to them then we are done. Now suppose instead that at least d + 2
vectors have non-zero coefficients, say v1 , . . . , vd+2 have non-zero coefficients. Then
there are at least two linear dependencies among these vectors, which allows us to
find coefficients c1 , . . . , cd+2 summing to zero, but not all zero, such that c1 v1 + . . . +
cd+2 vd+2 = 0. We can then perturb our preceding convex combination by an ε multiple
of this equation to obtain a new representation of 0 as a convex combination of vectors.
If we increase ε, we must eventually send one of the coefficients to zero, decreasing
the total number of vectors with non-zero coefficients. Iterating this procedure we
eventually obtain the dual Helly theorem and hence the original version of Helly’s
theorem.
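The coefficient-perturbation argument is entirely constructive; here is a numerical sketch (mine, not from the text) that carries it out with floating-point linear algebra:

```python
# Given 0 as a convex combination lam of the rows of V (shape (k, d)),
# prune the support down to at most d + 1 vectors, as in the dual Helly theorem.
import numpy as np

def caratheodory_prune(V, lam, tol=1e-12):
    V, lam = np.asarray(V, float), np.asarray(lam, float)
    while np.count_nonzero(lam > tol) > V.shape[1] + 1:
        support = np.where(lam > tol)[0]
        # find c with sum_i c_i v_i = 0 and sum_i c_i = 0 (a kernel vector)
        A = np.vstack([V[support].T, np.ones(len(support))])
        c = np.linalg.svd(A)[2][-1]
        if np.max(c) <= 0:
            c = -c
        # largest step keeping all coefficients non-negative
        t = np.min(lam[support][c > tol] / c[c > tol])
        lam[support] -= t * c
        lam[lam < tol] = 0.0
    return lam

rng = np.random.default_rng(0)
V = rng.normal(size=(10, 3))
lam = np.full(10, 0.1)
V -= lam @ V                      # recentre so that lam @ V = 0
assert np.count_nonzero(caratheodory_prune(V, lam)) <= 4
```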
2.16.2 Notes
This article was originally posted on Nov 30, 2007 at
terrytao.wordpress.com/2007/11/30
Francois and Matthias Aschenbrenner pointed out that a variant of Farkas’ lemma, in which positivity is replaced by integrality, was obtained by Kronecker in 1884, and generalised to certain ordered rings in [Sc2006].
2.17 Einstein’s derivation of E = mc²
Assuming that bodies at rest with zero mass necessarily have zero energy, this implies the famous formula E = mc² - but only for bodies which are at rest. For moving bodies, there is a similar formula, but one has to first decide what the correct definition of mass is for moving bodies; I will not discuss this issue here, though it can be found in any textbook on relativity.
Broadly speaking, the derivation of the above proposition proceeds via the follow-
ing five steps:
1. Using the postulates of special relativity, determine how space and time coordi-
nates transform under changes of reference frame (i.e. derive the Lorentz trans-
formations).
2. Using 1., determine how the temporal frequency ν (and wave number k) of pho-
tons transform under changes of reference frame (i.e. derive the formulae for
relativistic Doppler shift).
3. Using Planck’s law E = hν (and de Broglie’s law p = h̄k) and 2., determine
how the energy E (and momentum p) of photons transform under changes of
reference frame.
4. Using the law of conservation of energy (and momentum) and 3., determine how the energy (and momentum) of bodies transform under changes of reference frame.

5. Using 4. and the law of conservation of energy, compare the energy of a body at rest before and after it emits radiation, and deduce the mass-energy relationship E = mc².
Actually, as it turns out, Einstein’s analysis for bodies at rest only needs to understand changes of reference frame at infinitesimally low velocity, |v| ≪ c. However, in order to see enough relativistic effects to deduce the mass-energy equivalence, one needs to obtain formulae which are accurate to second order in v (or more precisely, v/c), as opposed to those in Newtonian physics which are accurate to first order in v (or v/c). Also, to understand the relationship between mass, velocity, energy, and momentum for moving bodies rather than bodies at rest, one needs to consider non-infinitesimal changes of reference frame.
Remark 2.63. Einstein’s argument is, of course, a physical argument rather than a
mathematical one. While I will use the language and formalism of pure mathematics
here, it should be emphasised that I am not exactly giving a formal proof of the above
Proposition in the sense of modern mathematics; these arguments are instead more like
the classical proofs of Euclid, in that numerous “self evident” assumptions about space,
time, velocity, etc. will be made along the way. (Indeed, there is a very strong analogy
between Euclidean geometry and the Minkowskian geometry of special relativity.) One
can of course make these assumptions more explicit, and this has been done in many
other places, but I will avoid doing so here in order not to overly obscure Einstein’s
original argument.
One way to create new inertial reference frames from a given frame F : S → R × R is to translate the time and space coordinates by constants, creating a new frame F′ with F′(E) = F(E) − (t0, x0) for any spacetime event E. Another way is by replacing the observer which is stationary in F with an observer which is moving at a constant velocity v in F, to create a new inertial reference frame Fv : S → R × R with the same orientation as F. In our analysis, we will only need to understand infinitesimally small velocities v; there will be no need to consider observers traveling at speeds close to the speed of light.
The new frame Fv : S → R × R and the original frame F : S → R × R must be related by some transformation law
Fv = Lv ◦ F (2.51)
for some bijection Lv : R × R → R × R. A priori, this bijection Lv could depend on
the original frame F as well as on the velocity v, but the principle of relativity implies
that Lv is in fact the same in all reference frames F, and so only depends on v.
188 CHAPTER 2. EXPOSITORY ARTICLES
If we normalise the frames to agree at some origin event, we also have Lv(0, 0) = (0, 0) (2.52); but this is of course not enough information to fully specify Lv. To proceed further, we
recall Newton’s first law, which states that an object with no external forces applied to
it moves at constant velocity, and thus traverses a straight line in spacetime as measured
in any inertial reference frame. (We are assuming here that the property of “having no
external forces applied to it” is not affected by changes of inertial reference frame. For
non-inertial reference frames, the situation is more complicated due to the appearance
of fictitious forces.) This implies that Lv transforms straight lines to straight lines. (To
be pedantic, we have only shown this for straight lines corresponding to velocities that
are physically attainable, but let us ignore this minor technicality here.) Combining
this with (2.52), we conclude that Lv is a linear transformation. (It is a cute exercise to
verify this claim formally, under reasonable assumptions such as smoothness of Lv . )
Thus we can view Lv now as a 2 × 2 matrix.
When v = 0, it is clear that Lv should be the identity matrix I. Making the plausible assumption that Lv varies smoothly with v, we thus have the Taylor expansion

Lv = I + L′0 v + O(v²) (2.53)

for some matrix L′0 and for infinitesimally small velocities v. (Mathematically, what we are doing here is analysing the Lie group of transformations Lv via its Lie algebra.) Expanding everything out in coordinates, we obtain

Lv(t, x) = ((1 + αv + O(v²))t + (βv + O(v²))x, (γv + O(v²))t + (1 + δv + O(v²))x) (2.54)

for some absolute constants α, β, γ, δ ∈ R (not depending on t, x, or v).
The next step, of course, is to pin down what these four constants are. We can use the reflection symmetry (2.50) to eliminate two of these constants. Indeed, if an observer is moving at velocity v in frame F, it is moving at velocity −v in the reflected frame, and combining this observation with (2.50), (2.51), (2.54) one eventually obtains

α = 0 and δ = 0. (2.55)

Next, we use the requirement that the frame Fv moves at velocity v relative to F: an observer stationary in Fv traverses the worldline {(t, vt) : t ∈ R} in the coordinates of F, and so

Lv(t, vt) ∈ {(t′, 0) : t′ ∈ R} for all t. (2.56)

Inserting this into (2.54) (and using (2.55)) we conclude that γ = −1. We have thus pinned down Lv to first order almost completely:

Lv(t, x) = (t + βvx, x − vt) + O(v²(|t| + |x|)). (2.57)
Thus, rather remarkably, using nothing more than the principle of relativity and
Newton’s first law, we have almost entirely determined the reference frame transfor-
mation laws, save for the question of determining the real number β . [In mathematical
terms, what we have done is classify the one-dimensional Lie subalgebras of gl2 (R)
which are invariant under spatial reflection, and coordinatised using (2.56).] If this
number vanished, we would eventually recover classical Galilean relativity. If this
number was positive, we would eventually end up with the (rather unphysical) situa-
tion of Euclidean relativity, in which spacetime had a geometry isomorphic to that of
the Euclidean plane. As it turns out, though, in special relativity this number is nega-
tive. This follows from the second postulate of special relativity, which asserts that the
speed of light c is the same in all inertial reference frames. In equations (and because
Fv has the same orientation as F), this is asserting that
Lv(t, ct) ∈ {(t′, ct′) : t′ ∈ R} for all t (2.58)

and

Lv(t, −ct) ∈ {(t′, −ct′) : t′ ∈ R} for all t. (2.59)

Inserting either of (2.58), (2.59) into (2.57) we conclude that β = −1/c², and thus we have obtained a full description of Lv to first order:

Lv(t, x) = (t − vx/c², x − vt) + O(v²(|t| + |x|)). (2.60)
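To spell out this last step (a routine verification, not an addition to the argument): applying (2.57) to the rightward light ray x = ct and demanding, as in (2.58), that the image lie on the ray x′ = ct′ gives, to first order in v,

\[
L_v(t, ct) = \big(t + \beta v c\, t,\; ct - vt\big) \in \{(t', ct')\}
\;\Longrightarrow\; c - v = c\,(1 + \beta v c)
\;\Longrightarrow\; \beta = -\frac{1}{c^2}.
\]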
for left-ward moving radiation. (As before, one can give an exact formula here, but the
above asymptotic will suffice for us.)
By Planck’s law and de Broglie’s law, a photon of temporal frequency ν has energy E = hν and momentum p = ±hν/c, where h is Planck’s constant and the sign depends on whether the photon is moving rightward or leftward. In particular, from (2.63) we have the pleasant relationship

E = |p|c (2.66)

for photons. [More generally, it turns out that for arbitrary bodies, momentum, velocity, and energy are related by the formula p = (1/c²)Ev, though we will not derive this fact here.] Applying (2.64), (2.65), we see that if we view a photon in a new reference frame Fv, then the observed energy E′ and momentum p′ now become

E′ = (1 ∓ v/c + v²/2c² + O(v³))E; p′ = (1 ∓ v/c + v²/2c² + O(v³))p (2.67)

for any photon (moving either leftward or rightward) with energy E and momentum p as measured in frame F, and energy E′ and momentum p′ as measured in frame Fv. Combining this with (2.66), we may rewrite these transformations in the unified form

E′ = E − pv + Ev²/2c² + O(v³); p′ = p − Ev/c² + pv²/2c² + O(v³). (2.69)
Remark 2.64. Actually, the error term O(v3 ) can be deleted entirely by working a little
harder. From the linearity of Lv and the conservation of energy and momentum, it is
then natural to conclude that (2.69) should also be valid not only for photons, but for
any object that can exchange energy and momentum with photons. This can be used to
derive the formula E = mc2 fairly quickly, but let us instead give the original argument
of Einstein, which is only slightly different.
We return to frame F, and assume that our body emits two photons of equal energy
∆E/2, one moving left-ward and one moving right-ward. By (2.66) and conservation
of momentum, we see that the body remains at rest after this emission. By conservation
of energy, the remaining energy in the body is E − ∆E. Let’s say that the new mass in
the body is m − ∆m. Our task is to show that ∆E = ∆mc2 .
To do this, we return to frame Fv. By (2.67), the rightward moving photon has energy

(ΔE/2)(1 − v/c + v²/2c² + O(v³)) (2.71)

in this frame; similarly, the leftward moving photon has energy

(ΔE/2)(1 + v/c + v²/2c² + O(v³)). (2.72)

What about the body? By repeating the derivation of (2.69), it must have energy

(E − ΔE) + (1/2)(m − Δm)v² + O(v³). (2.73)
By the principle of relativity, the law of conservation of energy has to hold in the frame Fv as well as in the frame F. Thus, the energy (2.71) + (2.72) + (2.73) in frame Fv after the emission must equal the energy E′ given by (2.70) in frame Fv before the emission. Adding everything together and comparing coefficients we obtain the desired relationship ΔE = Δmc².
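To spell out the bookkeeping in this last step (a verification sketch; here (2.70) is taken to be the pre-emission energy E′ = E + ½mv² + O(v³) of the body in frame Fv, consistent with (2.73)):

\[
\underbrace{\tfrac{\Delta E}{2}\big(1 - \tfrac{v}{c} + \tfrac{v^2}{2c^2}\big) + \tfrac{\Delta E}{2}\big(1 + \tfrac{v}{c} + \tfrac{v^2}{2c^2}\big)}_{(2.71)+(2.72)}
+ \underbrace{(E - \Delta E) + \tfrac{1}{2}(m - \Delta m)v^2}_{(2.73)}
= E + \tfrac{1}{2}mv^2 + \Big(\tfrac{\Delta E}{2c^2} - \tfrac{\Delta m}{2}\Big)v^2 + O(v^3).
\]

Comparing the v² coefficient with that of (2.70) forces ΔE/(2c²) = Δm/2, i.e. ΔE = Δmc².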
Remark 2.66. One might quibble that Einstein’s argument only applies to emissions of
energy that consist of equal and opposite pairs of photons. But one can easily generalise
the argument to handle arbitrary photon emissions, especially if one takes advantage of
(2.69); for instance, another well-known (and somewhat simpler) variant of the argu-
ment works by considering a photon emitted from one side of a box and absorbed on
the other. More generally, any other energy emission which could potentially in the fu-
ture decompose entirely into photons would also be handled by this argument, thanks
to conservation of energy. Now, it is possible that other conservation laws prevent
decomposition into photons; for instance, the law of conservation of charge prevents
an electron (say) from decomposing entirely into photons, thus leaving open the pos-
sibility of having to add a linearly charge-dependent correction term to the formula
E = mc². But then one can renormalise away this term by redefining the energy to
subtract such a term; note that this does not affect conservation of energy, thanks to
conservation of charge.
2.17.6 Notes
This article was originally posted on Dec 28, 2007 at
terrytao.wordpress.com/2007/12/28
Laurens Gunnarsen pointed out that Einstein’s argument required the use of quan-
tum mechanics to derive the equation E = mc2 , but that this equation can also be
derived within the framework of classical mechanics by relying more heavily on the
representation theory of the Lorentz group.
Thanks to Blake Stacey for corrections.
Chapter 3
Lectures
3.1 Simons Lecture Series: Structure and randomness
• (Twin prime conjecture) There are infinitely many positive integers n such that
n, n + 2 are both prime.
• (Sophie Germain prime conjecture) There are infinitely many positive integers n
such that n, 2n + 1 are both prime.
• (Even Goldbach conjecture) For every even number N ≥ 4, there is a natural
number n such that n, N − n are both prime.
As a general rule, it appears that it is feasible (after non-trivial effort) to find patterns in the primes involving two or more degrees of freedom (as described by the parameters n, n′ in the above examples), but we still do not have the proper technology for finding patterns in the primes involving only one degree of freedom n. (This is of course an oversimplification; for instance, the pattern n, n + 2, n′, n′ + 2 has two degrees of freedom, but finding infinitely many of these patterns in the primes is equivalent to the twin prime conjecture, and thus presumably beyond current technology. If however one makes a non-degeneracy assumption, one can make the above claim more precise; see [GrTa2008b].)
One useful tool for establishing some (but not all) of the above positive results is
Fourier analysis (which in this context is also known as the Hardy-Littlewood circle
method). Rather than give the textbook presentation of that method here, let us try to
motivate why Fourier analysis is an essential feature of many of these problems from
the perspective of the dichotomy between structure and randomness, and in particular
viewing structure as an obstruction to computing statistics which needs to be under-
stood before the statistic can be accurately computed.
To treat many of the above questions concerning the primes in a unified manner, let us consider the following general setting. We consider k affine-linear forms ψ1, . . . , ψk : Z^r → Z on r integer unknowns, and ask

Question 3.1. Do there exist infinitely many r-tuples ~n = (n1, . . . , nr) ∈ Z^r_+ of positive integers such that ψ1(~n), . . . , ψk(~n) are simultaneously prime?

For instance, the twin prime conjecture is the case when k = 2, r = 1, ψ1(n) = n, and ψ2(n) = n + 2; van der Corput’s theorem is the case when k = 3, r = 2, and ψj(n, n′) = n + (j − 1)n′ for j = 1, 2, 3; and so forth.
Because of the “obvious” structures in the primes, the answer to the above question
can be “no”. For instance, since all but one of the primes are odd, we know that
there are not infinitely many patterns of the form n, n + 1 in the primes, because it
is not possible for n, n + 1 to both be odd. More generally, given any prime q, we know that all but one of the primes are coprime to q. Hence, if it is not possible for ψ1(~n), . . . , ψk(~n) to all be coprime to q, the answer to the above question is basically no (modulo some technicalities which I wish to gloss over), and we say that there is an obstruction at q. For instance, the pattern n, n + 1 has an obstruction at 2. The pattern n, n + 2, n + 4 has no obstruction at 2, but has an obstruction at 3, because it is not possible for n, n + 2, n + 4 to all be coprime to 3. And so forth.
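The local obstructions at the primes q are easy to test mechanically. Here is a small checker (mine, not from the text) for shift patterns n + h1, . . . , n + hk, using the standard fact that only primes q ≤ k can be obstructed:

```python
# A pattern of shifts (h_1, ..., h_k) has an obstruction at the prime q
# exactly when the shifts cover every residue class mod q.
from sympy import primerange

def obstruction_at(q, shifts):
    return len({h % q for h in shifts}) == q

def admissible(shifts):
    # k shifts can only cover all residues mod q when q <= k
    return all(not obstruction_at(q, shifts)
               for q in primerange(2, len(shifts) + 1))

print(admissible([0, 2]))        # True: no local obstruction (twin primes)
print(admissible([0, 2, 4]))     # False: obstruction at 3
print(admissible([0, 1]))        # False: obstruction at 2
```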
Another obstruction comes from the trivial observation that the primes are all pos-
itive. Hence, if it is not possible for ψ1 (~n), . . . , ψk (~n) to all be positive for infinitely
many values of ~n, then we say that there is an obstruction at infinity, and the answer to
the question is again “no” in this case. For instance, for any fixed N, the pattern n, N −n
can only occur finitely often in the primes, because there are only finitely many n for
which n, N − n are both positive.
It is conjectured that these “local” obstructions are the only obstructions to solv-
ability of the above question. More precisely, we have
Conjecture 3.2. (Dickson’s conjecture)[Di1904] If there are no obstructions at any
prime q, and there are no obstructions at infinity, then the answer to the above question
is “yes”.
This conjecture would imply the twin prime and Sophie Germain conjectures, as
well as the Green-Tao theorem; it also implies the Hardy-Littlewood prime tuples
conjecture[HaLi1923] as a special case. There is a quantitative version of this con-
jecture which predicts a more precise count as to how many solutions there are in a
given range, and which would then also imply Vinogradov’s theorem, as well as Gold-
bach’s conjecture (for sufficiently large N); see [GrTa2008b] for further discussion. As one can imagine, this conjecture is still largely unsolved; however, there are many important special cases that have now been established - several of which were achieved via the Hardy-Littlewood circle method.
One can view Dickson’s conjecture as an impossibility statement: that it is impos-
sible to find any other obstructions to solvability for linear patterns in the primes than
the obvious local obstructions at primes q and at infinity. (It is also a good example of a
local-to-global principle, that local solvability implies global solvability.) Impossibil-
ity statements have always been very difficult to prove - one has to locate all possible
obstructions to solvability, and eliminate each one of them in turn. In particular, one
has to exclude various exotic “conspiracies” between the primes to behave in an un-
usually structured manner that somehow manages to always avoid all the patterns that
one is seeking within the primes. How can one disprove a conspiracy?
To give an example of what such a “conspiracy” might look like, consider the twin
prime conjecture, that of finding infinitely many pairs n, n + 2 which are both prime.
This pattern encounters no obstructions at primes q or at infinity and so Dickson’s
conjecture predicts that there should be infinitely many such patterns. In particular,
there are no obstructions at 3 because prime numbers can equal 1 or 2 mod 3, and
it is possible to find pairs n, n + 2 which also have this property. But suppose that
it transpired that all but finitely many of the primes ended up being 2 mod 3. From
looking at tables of primes this seems to be unlikely, but it is not immediately obvious how to disprove it; it could well be that once one reaches, say, 10^100, there are no more primes equal to 1 mod 3. If this unlikely "conspiracy" in the primes was true, then there would be only finitely many twin primes. Fortunately, we have Dirichlet’s
theorem, which guarantees infinitely many primes equal to a mod q whenever a, q are
coprime, and so we can rule out this particular type of conspiracy. (This does strongly
suggest, though, that knowledge of Dirichlet’s theorem is a necessary but not sufficient
condition in order to solve the twin prime conjecture.) But perhaps there are other
conspiracies that one needs to rule out also?
To look for other conspiracies that one needs to eliminate, let us rewrite the conspiracy "all but finitely many of the primes are 2 mod 3" in the more convoluted format

0.6 < {p/3} < 0.7 for all but finitely many primes p,
where {x} is the fractional part of x. This type of conspiracy can now be generalised;
for instance consider the statement
0 < {√2 p} < 0.01 for all but finitely many primes p. (3.1)
Again, such a conspiracy seems very unlikely - one would expect these fractional
parts to be uniformly distributed between 0 and 1, rather than concentrate all in the in-
terval [0, 0.01] - but it is hard to rule this conspiracy out a priori. And if this conspiracy
(3.1) was in fact true, then the twin prime conjecture would be false, as can be quickly
seen by considering the identity
{√2(n + 2)} − {√2 n} = 2√2 mod 1,

which forbids the two fractional parts on the left-hand side to simultaneously fall in the interval [0, 0.01]. Thus, in order to solve the twin prime conjecture, one must rule out (3.1). Fortunately, it has been known since the work of Vinogradov[Vi1937] that {√2 p} is in fact uniformly distributed in the interval [0, 1], and more generally that
{α p} is uniformly distributed in [0,1] whenever α is irrational. Indeed, by Weyl’s
famous equidistribution theorem (see e.g. [KuNe1974]), this uniform distribution is equivalent to the exponential sum estimate

∑_{p<N} e^{2πiαp} = o(∑_{p<N} 1) as N → ∞.
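Numerically, this decay is easy to observe (an illustrative computation of mine, not from the text); the normalised sums for α = √2 are already small for modest N:

```python
# The sums sum_{p<N} e^{2 pi i alpha p}, normalised by pi(N), tend to 0
# for irrational alpha, reflecting Vinogradov's equidistribution theorem.
import cmath, math

def primes_below(N):
    is_p = bytearray([1]) * N
    is_p[:2] = b"\x00\x00"
    for i in range(2, int(N ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = bytearray(len(is_p[i * i::i]))
    return [i for i in range(N) if is_p[i]]

alpha = math.sqrt(2)
for N in (10**3, 10**4, 10**5):
    ps = primes_below(N)
    S = sum(cmath.exp(2j * math.pi * alpha * p) for p in ps)
    print(N, abs(S) / len(ps))     # decreases as N grows
```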
(There is, however, a special feature of arithmetic progressions, which most other patterns do not have, namely that arithmetic progressions tend to exist both in structured objects and in pseudorandom objects (and also in hybrids of the two). This is why results about arithmetic progressions have tended to
be easier to establish than those about more general patterns, as one does not need to
know as much about the structured and random components of the set in which one is
looking for progressions.)
More generally, we can see that if the primes correlate in some unusual way with a linear character e^{2πiαp}, then this is likely to bias or distort the number of patterns {n, n + n′, n + 2n′ + 2} in a significant manner. However, thanks to Fourier analysis, we
can show that these “Fourier conspiracies” are in fact the only obstructions to counting
this type of pattern. Very roughly, one can sketch the reason for this as follows. Firstly,
it is helpful to create a counting function for the primes, namely the von Mangoldt
function Λ(n), defined as log p whenever n is a power of a prime p, and 0 otherwise.
This rather strange-looking function is actually rather natural, because of the identity
∑_{d|n} Λ(d) = log n

for all positive integers n, where the sum is over all positive integers d which divide n; this identity is a restatement of the fundamental theorem of arithmetic, and in fact defines the von Mangoldt function uniquely. The problem of counting patterns {n, n + n′, n + 2n′ + 2} is then roughly equivalent to the task of computing sums such as

∑_n ∑_{n′} Λ(n)Λ(n + n′)Λ(n + 2n′ + 2), (3.2)

where we ignore for now the precise ranges over which n and n′ are to be summed. One can expand Λ in terms of the Fourier coefficients

Λ̂(θ) := ∑_n Λ(n)e^{−2πinθ},

which is a sum very similar in nature to the sums ∑_{p<N} e^{2πipα} mentioned earlier. Substituting this formula into (3.2), we essentially get an expression of the form

∫₀¹ Λ̂(θ)² Λ̂(−2θ)e^{4πiθ} dθ
(again ignoring issues related to the ranges that n, n′ are being summed over). Thus, if
one gets good enough control on the Fourier coefficients Λ̂(θ ), which can be viewed
as a measure of how much the primes “conspire” with a linear phase oscillation with
frequency θ , then one can (in principle, at least) count the solutions to the pattern
{n, n + n′, n + 2n′ + 2} in the primes. This is the Hardy-Littlewood circle method in a nutshell, and this is for instance how van der Corput’s theorem and Vinogradov’s theorem were first proven.
I have glossed over the question of how one actually computes the Fourier coeffi-
cients Λ̂(θ). It turns out that there are two cases. In the "major arc" case, when θ is rational or close to rational (with a reasonably small denominator), the problem turns out to be essentially equivalent to counting primes in arithmetic progressions, and
so one uses tools related to Dirichlet’s theorem (i.e. L-functions, the Siegel-Walfisz the-
orem [Wa1936], etc.). In the “minor arc” case when θ is far from rational, one instead
uses identities such as

Λ(n) = ∑_{d|n} µ(d) log(n/d),

where µ is the Möbius function (i.e. µ(n) := (−1)^k when n is the product of k distinct prime factors for some k ≥ 0, and µ(n) = 0 otherwise), to split the Fourier coefficient as

Λ̂(α) = ∑_d ∑_m µ(d) log(m)e^{−2πiαdm},

and then one uses the irrationality of α to exhibit some significant oscillation in the phase e^{−2πiαdm}, which cannot be fully canceled out by the oscillation in the µ(d) factor.
(In practice, the above strategy does not work directly, and one has to work with various
truncated or smoothed out versions of the above identities; this is technical and will not
be discussed here.)
Now suppose we look at progressions of length 4: n, n + n′, n + 2n′, n + 3n′. As with progressions of length 3, "linear" or "Fourier" conspiracies such as (3.1) will bias or distort the total count of such progressions in the primes less than a given number N. But, in contrast to the length 3 case where these are the only conspiracies that actually influence things, for length 4 progressions there are now "quadratic" conspiracies which can cause trouble. Consider for instance the conspiracy

0 < {√2 p²} < 0.01 for all but finitely many primes p. (3.3)
This conspiracy, which can exist even when all linear conspiracies are eliminated, will significantly bias the number of progressions of length 4, due to the identity

{√2 n²} − 3{√2(n + n′)²} + 3{√2(n + 2n′)²} − {√2(n + 3n′)²} = 0 mod 1,

which is related to the fact that the function √2 n² has a vanishing third derivative. In
this case, the conspiracy works in one’s favour, increasing the total number of progres-
sions of length 4 beyond what one would have naively expected; as mentioned before,
this is related to a remarkable "indestructibility" property of progressions, which can
be used to establish things like the Green-Tao theorem without having to deal directly
with these obstructions. Thus, in order to count progressions of length 4 in the primes
accurately (and not just to establish the qualitative result that there are infinitely many
of them), one needs to eliminate conspiracies such as (3.3), which necessitates understanding exponential sums such as ∑_{p<N} e^{2πiαp²} for various rational or irrational numbers α. What’s worse, there are several further "generalised quadratic" conspiracies
which can also bias this count, for instance the conspiracy
0 < {⌊√2 p⌋√3 p} < 0.01 for all but finitely many primes p,
where x ↦ ⌊x⌋ is the greatest integer function. The point here is that the function ⌊√2 x⌋√3 x has a third divided difference which does not entirely vanish (in contrast to the genuine quadratic √2 x²), but does vanish a significant portion of the time (because the greatest integer function obeys the linearity property ⌊x + y⌋ = ⌊x⌋ + ⌊y⌋ a significant fraction of the time), which does lead ultimately to a non-trivial bias effect. Because of this, one is also faced with estimating exponential sums such as ∑_{p<N} e^{2πi⌊√2 p⌋√3 p}.
It turns out that the correct way to phrase all of these obstructions is via the machinery
of 2-step nilsequences: details can be found in [GrTa2008b, GrTa2008c, GrTa2008d].
As a consequence, we can in fact give a precise count of the number of arithmetic progressions of primes of length 4 with all primes less than N; it turns out to be

((3/4) ∏_{p≥5} (1 − (3p − 1)/(p − 1)³) + o(1)) N²/log⁴ N ≈ 0.4764 N²/log⁴ N.
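As a quick plausibility check on the constant (a numerical aside of mine, not from the text), the Euler product converges rapidly and can be evaluated by truncation:

```python
# Evaluate (3/4) * prod_{p >= 5} (1 - (3p-1)/(p-1)^3); should be ~0.4764.
from sympy import primerange

C = 0.75
for p in primerange(5, 10**6):
    C *= 1 - (3 * p - 1) / (p - 1) ** 3
print(C)
```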
The method also works for other linear patterns of comparable “complexity” to pro-
gressions of length 4. We are currently working on the problem of longer progressions,
in which cubic and higher order obstructions appear (which should be modeled by
3-step and higher nilsequences); some work related to this should appear here shortly.
• Ergodic theory (or more specifically, multiple recurrence theory), which seeks
to find patterns in positive-measure sets under the action of a discrete dynamical
system on probability spaces (or more specifically, measure-preserving actions
of the integers Z);
• Graph theory, or more specifically the portion of this theory concerned with
finding patterns in large unstructured dense graphs; and
• Ergodic graph theory, which is a very new and undeveloped subject, which
roughly speaking seems to be concerned with the patterns within a measure-
preserving action of the infinite permutation group S∞ , which is one of several
models we have available to study infinite ”limits” of graphs.
On the other hand, we have some very rigorous connections between combinatorial
number theory and ergodic theory, and also (more recently) between graph theory and
ergodic graph theory, basically by the procedure of viewing the infinitary continuous
setting as a limit of the finitary discrete setting. These two connections go by the names
of the Furstenberg correspondence principle and the graph correspondence principle
respectively. These principles allow one to tap the power of the infinitary world (for
instance, the ability to take limits and perform completions or closures of objects) in
order to establish results in the finitary world, or at least to take the intuition gained in
the infinitary world and transfer it to a finitary setting. Conversely, the finitary world
provides an excellent model setting to refine one’s understanding of infinitary objects,
for instance by establishing quantitative analogues of “soft” results obtained in an in-
finitary manner. I will remark here that this best-of-both-worlds approach, borrowing
from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and me in order to establish our result on long arithmetic progressions
in the primes. In particular, the infinitary setting is excellent for being able to rigorously
define and study concepts (such as structure or randomness) which are much ”fuzzier”
and harder to pin down exactly in the finitary world.
Let me first discuss the connection between combinatorial number theory and graph
theory. We can illustrate this connection with two classical results from the former and
latter field respectively:
• Schur’s theorem[Sc1916]: If the positive integers are coloured using finitely
many colours, then one can find positive integers x, y such that x, y, x + y all have
the same colour.
• Ramsey’s theorem[Ra1930]: If an infinite complete graph is edge-coloured us-
ing finitely many colours, then one can find a triangle all of whose edges have
the same colour.
(In fact, both of these theorems can be generalised to say much stronger statements,
but we will content ourselves with just these special cases). It is in fact easy to see that
Schur’s theorem is deducible from Ramsey’s theorem. Indeed, given a colouring of the
positive integers, one can create an infinite coloured complete graph (the Cayley graph
associated to that colouring) whose vertex set is the integers Z, and such that an edge
{a, b} with a < b is coloured using the colour assigned to b − a. Applying Ramsey’s
theorem, together with the elementary identity (c − a) = (b − a) + (c − b), we then
quickly deduce Schur’s theorem.
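Since the numbers involved are tiny, the finitary content of this deduction can even be verified by brute force (a toy computation of mine, not from the text): every 2-colouring of {1, . . . , 5} already contains a monochromatic triple x, y, x + y:

```python
# Brute-force check: every 2-colouring of 1..5 has a monochromatic
# Schur triple (x, y, x + y).
from itertools import product

def schur_triple(colouring):
    N = len(colouring)
    for x in range(1, N + 1):
        for y in range(x, N + 1 - x):
            if colouring[x] == colouring[y] == colouring[x + y]:
                return x, y, x + y
    return None

assert all(schur_triple(dict(zip(range(1, 6), c)))
           for c in product(range(2), repeat=5))
```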
Let us now turn to ergodic theory. The basic object of study here is a measure-
preserving system (or probability-preserving system), which is a probability space (X, B, µ)
(i.e. a set X equipped with a sigma-algebra B of measurable sets and a probability
measure µ on that sigma-algebra), together with a shift map T : X → X, which for
simplicity we shall take to be invertible and bi-measurable (so its inverse is also mea-
surable); in particular we have iterated shift maps T n : X → X for any integer n, giving
rise to an action of the integers Z. The important property we need is that the shift map
is measure-preserving, thus µ(T (E)) = µ(E) for all measurable sets E.
In the previous lecture we saw that sets of integers could be divided (rather infor-
mally) into structured sets, pseudorandom sets, and hybrids between the two. The same
is true in ergodic theory - and this time, one can in fact make these notions extremely
precise. Let us first start with some examples:
• The circle shift, in which X := R/Z is the standard unit circle with normalised
Haar measure, and T (x) := x + α for some fixed real number α. If we identify X
with the unit circle in the complex plane via the standard identification x ↦ e^{2πix}, then the shift corresponds to an anti-clockwise rotation by α. This is a very
structured system, and corresponds in combinatorial number theory to Bohr sets
such as {n ∈ Z : 0 < {nα} < 0.01}, which implicitly made an appearance in the
previous lecture.
• The two-point shift, in which X := {0, 1} with uniform probability measure, and
T simply interchanges 0 and 1. This very structured system corresponds to the
set A of odd numbers (or of even numbers) mentioned in the previous lecture.
More generally, any permutation on a finite set gives rise to a simple measure-
preserving system.
• The skew shift, in which X := (R/Z)² is the 2-torus with normalised Haar measure, and T(x, y) := (x + α, y + x) for some fixed real number α. If we just
look at the behaviour of the x-component of this torus we see that the skew shift
contains the circle shift as a factor, or equivalently that the skew shift is an ex-
tension of the circle shift (in this particular case, since the fibres are circles and
the action on the fibres is rotation, we call this a circle extension of the circle
shift). This system is also structured (but in a more complicated way than the
previous two shifts), and corresponds to quadratically structured sets such as the quadratic Bohr set {n ∈ Z : 0 < {√2 n²} < 0.01}, which made an appearance in
the previous lecture.
• Hybrid systems, e.g. products of a circle shift and a Bernoulli shift, or extensions
of a circle shift by a Bernoulli system, a doubly skew shift (a circle extension of
a circle extension of a circle shift), etc.
One can classify these systems in precise terms according to how the shift action T n
moves sets E around. On the one hand, we have some well-defined notions which
represent structure:
• Periodic systems are such that for every E, there exists a positive n such that T^n E = E. The two-point shift is an example, as is the circle shift when α is rational.
• Almost periodic or compact systems are such that for every E and every ε > 0, there exists a positive n such that T^n E and E differ by a set of measure at most ε. The circle shift is a good example of this (thanks to Weyl’s equidistribution theorem). The term "compact" is used because there is an equivalent characterisation of compact systems, namely that the orbits of the shift in L²(X) are always precompact in the strong topology.
On the other hand, we have some well-defined terms which represent pseudorandom-
ness:
• Strongly mixing systems are such that for every E, F, we have µ(T^n E ∩ F) → µ(E)µ(F) as n tends to infinity; the Bernoulli shift is a good example. Informally, this is saying that shifted sets become asymptotically independent of unshifted sets.
• Weakly mixing systems are such that for every E, F, we have µ(T^n E ∩ F) → µ(E)µ(F) as n tends to infinity after excluding a set of exceptional values of n
of asymptotic density zero. For technical reasons, weak mixing is a better notion
to use in the structure-randomness dichotomy than strong mixing (for much the
same reason that one always wants to allow negligible sets of measure zero in
measure theory).
There are also more complicated (but well-defined) hybrid notions of structure and ran-
domness which we will not give here. We will however briefly discuss the situation for
the skew shift. This shift is not almost periodic: most sets A will become increasingly "skewed" as they get shifted, and will never return to resemble themselves again. However, if
one restricts attention to the underlying circle shift factor (i.e. restricting attention only
to those sets which are unions of vertical fibres), then one recovers almost periodicity.
Furthermore, the skew shift is almost periodic relative to the underlying circle shift, in
the sense that while the shifts T^n A of a given set A do not return to resemble A glob-
ally, they do return to resemble A when restricted to any fixed vertical fibre (this can
be shown using the method of Weyl sums from Fourier analysis and analytic number
theory). Because of this, we say that the skew shift is a compact extension of a compact
system.
As discussed in the above examples, every dynamical system is capable of gener-
ating some interesting sets of integers, specifically recurrence sets {n ∈ Z : T^n x0 ∈ E} where E is a set in X and x0 is a point in X. This set actually captures much of the dynamics of E in the system (especially if X is ergodic and x0 is generic). The Furstenberg
correspondence principle reverses this procedure, starting with a set of integers A and
using that to generate a dynamical system which “models” that set in a certain way.
Modulo some minor technicalities, it works as follows.
1. As with the Bernoulli shift, we work in the space X := {0, 1}^Z ≡ 2^Z, with the product sigma-algebra and the left shift; but we leave the probability measure µ (which can be interpreted as the distribution of a certain random subset of the integers) undefined for now. The original set A can now be interpreted as a single point inside X.
2. Now pick a large number N, and shift A backwards and forwards up to N times, giving rise to 2N + 1 sets T^{−N} A, . . . , T^N A, which can be thought of as 2N + 1 points inside X. We consider the uniform distribution on these points, i.e. we shift A by a random amount between −N and N. This gives rise to a discrete probability measure µN on X (which is only supported on 2N + 1 points inside X). Each of these measures is approximately invariant under the shift T.
3. We now let N go to infinity. We apply the (sequential form of the) Banach-
Alaoglu theorem, which among other things shows that the space of Borel prob-
ability measures on a compact Hausdorff space (which X is) is sequentially com-
pact in the weak-* topology. (This particular version of Banach-Alaoglu can in
fact be established by a diagonalisation argument which completely avoids the
axiom of choice.) Thus we can find a subsequence of the measures µN which
converge in the weak-* topology to a limit µ (this subsequence and limit may
not be unique, but this will not concern us). Since the µN are approximately
invariant under T , with the degree of approximation improving with N, one can
easily show that the limit measure µ is shift-invariant.
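To see what the measures µN are measuring, here is a small finitary illustration (a sketch of mine, not from the text): for the cylinder event E = {x ∈ X : x(0) = 1}, the quantity µN(T^{−n1}E ∩ . . . ∩ T^{−nj}E) is just the density of shifts m ∈ [−N, N] for which m + n1, . . . , m + nj all lie in A:

```python
# Empirical measure mu_N of the event that all shifts by n in ns land in A.
def mu_N(A, N, ns):
    window = range(-N, N + 1)
    hits = sum(1 for m in window if all((m + n) in A for n in ns))
    return hits / (2 * N + 1)

A = {n * n for n in range(200)}          # e.g. the squares
print(mu_N(A, 10**4, [0]))               # ~ density of A in the window
print(mu_N(A, 10**4, [0, 1]))            # ~ density of pairs n, n+1 in A
```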
By using this recipe to construct a measure-preserving system from a set of integers, it
is possible to deduce theorems in combinatorial number theory from those in ergodic
theory (similarly to how the Cayley graph construction allowed one to deduce theorems
in combinatorial number theory from those in graph theory). The most famous example
of this concerns the following two deep theorems:
• Szemerédi’s theorem[Sz1975]: If A is a set of integers of positive upper density,
and k is a positive integer, then A contains infinitely many arithmetic progres-
sions x, x + n, . . . , x + (k − 1)n of length k. (Note that the case k = 2 is trivial.)
• Furstenberg’s recurrence theorem[Fu1977]: If A is a set of positive measure in a measure-preserving system, and k is a positive integer, then there are infinitely many integers n for which µ(A ∩ T^n A ∩ . . . ∩ T^{(k−1)n} A) > 0. (Note that the case k = 2 is the more classical Poincaré recurrence theorem.)
Using the above correspondence principle (or a slight variation thereof), it is not diffi-
cult to show that the two theorems are in fact equivalent; see for instance Furstenberg’s
book[Fu1981]. The power of these two theorems derives from the fact that the former
works for arbitrary sets of positive density, and the latter works for arbitrary measure-
preserving systems - there are essentially no structural assumptions on the basic object
of study in either, and it is therefore quite remarkable that one can still conclude such
a non-trivial result.
The story of Szemerédi’s theorem is quite a long one, which I have discussed in
many other places [TaVu2006], [Ta2006e], [Ta2007b], [Ta2007c], though I will note
here that all the proofs of this theorem exploit the dichotomy between structure and
randomness (and there are some good reasons for this - the underlying cause of arith-
metic progressions is totally different in the structured and pseudorandom cases). I
will however briefly describe how Furstenberg’s recurrence theorem is proven (follow-
ing the approach of Furstenberg, Katznelson, and Ornstein[FuKaOr1982]; there are a couple of other ergodic-theoretic proofs, including of course Furstenberg’s original proof).
The first major step is to establish the Furstenberg structure theorem, which takes an
arbitrary measure-preserving system and describes it as a suitable hybrid of a compact
system and a weakly mixing system (or more precisely, a weakly mixing extension
of a transfinite tower of compact extensions). This theorem relies on Zorn’s lemma,
although it is possible to give a proof of the recurrence theorem without recourse to
the axiom of choice. The proof requires various tools from infinitary analysis (e.g. the
compactness of integral operators) but is relatively straightforward. Next, one makes
the rather simple observation that the Furstenberg recurrence theorem is easy to show
both for compact systems and for weakly mixing systems. In the former case, the almost periodicity shows that there are lots of integers n for which T^n A is almost identical with A (in the sense that they differ by a set of small measure) - which, after shifting by n again, implies that T^{2n} A is almost identical with T^n A, and so forth - which soon makes it easy to arrange matters so that A ∩ T^n A ∩ . . . ∩ T^{(k−1)n} A is non-empty. In the latter case, the weak mixing shows that for most n, the sets (or "events") A and T^n A are almost uncorrelated (or "independent"); similarly, for any fixed m, we have A ∩ T^m A and T^n (A ∩ T^m A) = T^n A ∩ T^{n+m} A almost uncorrelated for n large enough. By using the Cauchy-Schwarz inequality (in the form of a useful lemma of van der Corput) repeatedly, we can eventually show that A, T^n A, . . . , T^{(k−1)n} A are almost jointly independent (as opposed to being merely almost pairwise independent) for many n, at which point
the recurrence theorem is easy to show. It is somewhat more tricky to show that one
can also combine these arguments with each other to show that the recurrence property
also holds for the transfinite combinations of compact and weakly mixing systems that
come out of the Furstenberg structure theorem, but it can be done with a certain amount
of effort, and this concludes the proof of the recurrence theorem. This same method of
proof turns out, with several additional technical twists, to establish many further varieties of recurrence theorems, which in turn (via the correspondence principle) give several powerful results in combinatorial number theory, several of which continue to have no non-ergodic proof even today.
(There has also been a significant amount of progress more recently by several
ergodic theorists [CoLe1988], [FuWe1996], [HoKr2005], [Zi2007] in understanding
the “structured” side of the Furstenberg structure theorem, in which dynamical notions
of structure, such as compactness, have been converted into algebraic and topologi-
cal notions of structure, in particular into the actions of nilpotent Lie groups on their
homogeneous spaces. This is an important development, and is closely related to the
polynomial and generalised polynomial sequences appearing in the previous talk, but
it would be beyond the scope of this talk to discuss it here.)
Let us now leave ergodic theory and return to graph theory. Given the power of
the Furstenberg correspondence principle, it is natural to look for something similar
in graph theory, which would connect up results in finitary graph theory with some
infinitary variant. A typical candidate for a finitary graph theory result that one would
hope to do this for is the triangle removal lemma, which was discussed in Section 1.6.
That lemma is in fact closely connected with Szemerédi’s theorem, indeed it implies
the k = 3 case of that theorem (i.e. Roth’s theorem[Ro1953]) in much the same way
that Ramsey’s theorem implies Schur’s theorem. It does turn out that it is possible to
obtain such a correspondence, although the infinitary analogues of things like the trian-
gle removal lemma are a little strange-looking (see e.g. [Ta2007e] or [LoSz2008]). But
it is easier to proceed by instead working with the concept of a graph limit. There are
several equivalent formulations of this limit, including the notion of a “graphon” in-
troduced by Lovász and Szegedy[LoSz2006], the flag algebra construction introduced
by Razborov[Ra2008c], and the notion of a permutation-invariant measure space intro-
duced by myself[Ta2007e]. I will discuss my own construction here, which is closely modelled on the Furstenberg correspondence principle. It starts with a sequence $G_n$ of graphs (which one should think of as getting increasingly large, while remaining dense) and extracts a limit object, which is a probability space (X, B, µ) together with an action of the permutation group $S_\infty$ of the integers, as follows.
1. We let $X = 2^{\binom{\mathbb{Z}}{2}}$ be the space of all graphs on the integers, with the standard product (i.e. weak) topology, and hence product sigma-algebra. This space has an obvious action of the permutation group $S_\infty$, formed by permuting the vertices.
2. Each graph $G_n$ generates a random graph on the integers - or equivalently, a probability measure $\mu_n$ on X - as follows. We randomly and independently sample the vertices of the graph $G_n$ infinitely often, creating a sequence $(v_{n,i})_{i \in \mathbb{Z}}$ of vertices in the graph $G_n$. (Of course, many of these vertices will collide, but this will not be important for us; see the numerical sketch after this list.) This then creates a random graph on the integers, with any two integers i and j connected by an edge if their associated vertices $v_{n,i}, v_{n,j}$ are distinct and are connected by an edge in $G_n$. By construction, the probability measure $\mu_n$ associated to this graph is already $S_\infty$-invariant.
3. We then let n go to infinity, and extract a weak limit µ just as with the Furstenberg
correspondence principle.
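As a concrete illustration of the sampling in step 2, here is a minimal numerical sketch (the Erdős–Rényi graph standing in for $G_n$, the parameter choices, and the function name are mine); it checks that a local statistic, the edge density, of the sampled graph concentrates around that of $G_n$:

    import numpy as np

    def sample_induced_graph(adj, k, rng):
        """Step 2 of the construction: sample k vertices of G_n independently
        and uniformly (with replacement), and join integers i, j iff the
        sampled vertices are distinct and adjacent in G_n."""
        n = adj.shape[0]
        v = rng.integers(0, n, size=k)           # v[i] = randomly sampled vertex
        sub = adj[np.ix_(v, v)].copy()
        collide = (v[:, None] == v[None, :])     # collisions give non-edges
        sub[collide] = 0
        return sub

    rng = np.random.default_rng(0)
    n = 2000
    G = (rng.random((n, n)) < 0.3).astype(int)   # a dense "large" graph
    G = np.triu(G, 1); G = G + G.T               # symmetrise, zero diagonal

    k = 500
    H = sample_induced_graph(G, k, rng)
    print("edge density of G_n:   ", G.sum() / (n * (n - 1)))
    print("edge density of sample:", H.sum() / (k * (k - 1)))

The two printed densities agree up to small fluctuations, reflecting the fact that the measures $\mu_n$ capture the local statistics of $G_n$.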
It is then possible to prove results somewhat analogous to the Furstenberg structure
theorem and Furstenberg recurrence theorem in this setting, and use this to prove sev-
eral results in graph theory (as well as its more complicated generalisation, hypergraph
theory). I myself am optimistic that by transferring more ideas from traditional ergodic theory into this new setting of “ergodic graph theory”, one could obtain a new tool for systematically establishing a number of other qualitative results in graph
theory, particularly those which are traditionally reliant on the Szemerédi regularity
lemma[Sz1978] (which is almost a qualitative result itself, given how poor the bounds
are). This is however still a work in progress.
(In physics, one would also insert some physical constants, such as Planck’s constant
h̄, but for the discussion here it is convenient to normalise away all of these constants.)
Observe that the form of the heat equation and Schrödinger equation differ only by
a constant factor of i (cf. “Wick rotation”). This makes the algebraic structure of the
heat and Schrödinger equations very similar (for instance, their fundamental solutions
also only differ by a couple factors of i), but the analytic behaviour of the two equations
turns out to be very different. For instance, in the category of Schwartz functions, the
heat equation can be continued forward in time indefinitely, but not backwards in time;
in contrast, the Schrödinger equation is time reversible and can be continued indefi-
nitely in both directions. Furthermore, as we shall shortly discuss, parabolic equations
tend to dissipate or destroy the pseudorandom components of a state, leaving only the
structured components, whereas Hamiltonian equations instead tend to disperse or ra-
diate away the pseudorandom components from the structured components, without
destroying them.
Let us now discuss parabolic PDE in more detail. We begin with a simple example,
namely how the heat equation can be used to solve the Dirichlet problem of constructing a harmonic function $u_\infty$ (thus $\Delta u_\infty = 0$) on a nice domain Ω with some prescribed boundary
data. As this is only an informal discussion I will not write down the precise regularity
and boundedness hypotheses needed on the domain or data. The harmonic function
will play the role here of the “structured” or “geometric” object. From calculus of
variations we know that a smooth function $u_\infty : \Omega \to \mathbb{R}$ is harmonic with the specified boundary data if and only if it minimises the Dirichlet energy $E(u) := \frac{1}{2}\int_\Omega |\nabla u|^2$ (a convex functional of u) over all u with the prescribed boundary data. One way to locate the harmonic minimiser $u_\infty$ is to start with an arbitrary smooth initial function $u_0 : \Omega \to \mathbb{R}$, and then perform gradient flow $\partial_t u = -\frac{\delta E}{\delta u}(u) = \Delta u$ on this functional, i.e. solve the heat equation with initial data $u(0) = u_0$. One can then show (e.g. by
spectral theory of the Laplacian) that regardless of what (smooth) data u0 one starts
with, the solution u(t) to the heat equation exists for all positive time, and converges to
the (unique) harmonic function u∞ on Ω with the prescribed boundary data in the limit
t → ∞. Thus we see that the heat flow removes the “random” component u0 − u∞ of
the initial data over time, leaving only the “structured” component u∞ .
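Here is a minimal numerical sketch of this gradient-flow picture in one space dimension, where the harmonic functions on an interval are just the affine ones (the discretisation choices below are mine and purely illustrative):

    import numpy as np

    # Gradient flow for the Dirichlet energy: explicit-Euler heat flow on [0, 1]
    # with boundary data u(0) = 0, u(1) = 1.  The harmonic limit is u_inf(x) = x.
    m = 101
    x = np.linspace(0.0, 1.0, m)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2                        # stable step for explicit Euler

    u = np.random.default_rng(1).random(m)  # arbitrary (rough) initial data
    u[0], u[-1] = 0.0, 1.0                  # prescribed boundary values

    for _ in range(20000):
        u[1:-1] += dt * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2

    print("max deviation from harmonic limit:", np.abs(u - x).max())

The "random" component of the initial data decays exponentially fast under the flow, leaving only the structured (here, affine) component.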
There are many other settings in geometric topology in which one wants to locate
a geometrically structured object (e.g. a harmonic map, a constant-curvature manifold,
a minimal surface, etc.) within a certain class (e.g. a homotopy class) by minimising
an energy-like functional. In some cases one can achieve this by brute force, creating a
minimising sequence and then extracting a limiting object by some sort of compactness
argument (as is for instance done in the Sacks-Uhlenbeck theory[SaUh1981] of mini-
mal 2-spheres), but then one often has little control over the resulting structured object
that one obtains in this manner. By using a parabolic flow (as for instance done in
the work of Eells-Sampson[EeSa1964] to obtain harmonic maps in a given homotopy
class via harmonic map heat flow) one can often obtain much better estimates and other
control on the limit object, especially if certain curvatures in the underlying geometry
have a favourable sign.
The most famous recent example of the use of parabolic flows to establish geomet-
ric structure from topological objects is, of course, Perelman’s use of the Ricci flow ap-
plied to compact 3-manifolds with arbitrary Riemannian metrics, in order to establish
the Poincaré conjecture (for the special case of simply connected manifolds) and more
generally the geometrisation conjecture (for arbitrary manifolds). Perelman’s work
showed that Ricci flow, when applied to an arbitrary manifold, will eventually create
either extremely geometrically structured, symmetric manifolds (e.g. spheres, hyper-
bolic spaces, etc.), or singularities which are themselves very geometrically structured
(and in particular, their asymptotic behaviour is extremely rigid and can be classified
completely). By removing all of the geometric structures that are generated by the flow
(via surgery, if necessary) and continuing the flow indefinitely, one can eventually re-
move all the ”pseudorandom” elements of the initial geometry and describe the original
manifold in terms of a short list of very special geometric manifolds, precisely as pre-
dicted by the geometrisation conjecture. It should be noted that Hamilton[Ha1982] had
earlier carried out exactly this program assuming some additional curvature hypothe-
ses on the initial geometry; also, when Ricci flow is instead applied to two-dimensional
manifolds (surfaces) rather than three, Hamilton [Ha1988] observed that Ricci flow ex-
tracts a constant-curvature metric as its structured component of the original metric,
giving an independent proof of the uniformisation theorem (see [ChLuTi2006] for full
details).
Let us now leave parabolic PDE and geometric topology and now discuss Hamilto-
nian PDE, specifically those of (nonlinear) Schrödinger type. (Other classes of Hamil-
tonian PDE, such as nonlinear wave or Airy type equations, also exhibit similar fea-
tures, but we will restrict attention to Schrödinger for sake of concreteness.) These
equations formally resemble Hamiltonian ODE, which can be viewed as finite-dimensional
measure-preserving dynamical systems with a continuous time parameter t ∈ R. How-
ever, this resemblance is not rigorous because Hamiltonian PDE have infinitely many
degrees of freedom rather than finitely many; at a technical level, this means that the dy-
namics takes place on a highly non-compact space (e.g. the energy surface), whereas
much of the theory of finite-dimensional dynamics implicitly relies on at least local
compactness of the domain. Nevertheless, in many dispersive settings (e.g. when the
spatial domain is Euclidean) it seems that almost all of the infinitely many degrees of
freedom are so “pseudorandom” or “radiative” as to have an essentially trivial (or more
precisely, linear and free) impact on the dynamics, leaving only a mysterious “core”
of essentially finite-dimensional (or more precisely, compact) dynamics which is still
very poorly understood at present.
To illustrate these rather vague assertions, let us first begin with the free linear Schrödinger equation $iu_t + \Delta u = 0$, where $u : \mathbb{R} \times \mathbb{R}^n \to \mathbb{C}$ has some specified initial data $u(0) = u_0 : \mathbb{R}^n \to \mathbb{C}$, which for simplicity we shall place in the Schwartz class. It is
not hard to show, using Fourier analysis, that a unique smooth solution, well-behaved at
spatial infinity, to this equation exists, and furthermore obeys the L2 (Rn ) conservation
law
$$\int_{\mathbb{R}^n} |u(t,x)|^2 \, dx = \int_{\mathbb{R}^n} |u_0(x)|^2 \, dx, \qquad (3.4)$$
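A quick numerical check of (3.4) is easy to carry out; this is only a sketch, with a large periodic box standing in for $\mathbb{R}^n$ and the propagator applied exactly in Fourier space:

    import numpy as np

    # Free Schrodinger evolution i u_t + u_xx = 0 on a periodic grid, via the
    # exact Fourier-space propagator u_hat(t) = exp(-i xi^2 t) u_hat(0).
    n, L = 1024, 50.0
    x = np.linspace(-L / 2, L / 2, n, endpoint=False)
    xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)      # Fourier frequencies

    u0 = np.exp(-x**2) * np.exp(3j * x)              # Schwartz-class initial data
    u5 = np.fft.ifft(np.exp(-1j * xi**2 * 5.0) * np.fft.fft(u0))  # solution at t = 5

    dx = L / n
    print("mass at t=0:", (np.abs(u0)**2).sum() * dx)
    print("mass at t=5:", (np.abs(u5)**2).sum() * dx)  # equal, as (3.4) predicts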
which one might correlate with.) More generally, an arbitrary solution will decompose
uniquely into the sum of a radiating state obeying (3.6), and an unconditionally convergent linear combination of bound states. The proof of these facts largely rests on the
spectral theorem for the underlying Hamiltonian −∆ +V ; the bound states correspond
to pure point spectrum, the weak dispersion property (3.6) corresponds to continuous
spectrum, and the strong dispersion property (3.5) corresponds to absolutely continu-
ous spectrum. Thus the RAGE theorem gives a nice connection between dynamics and
spectral theory. Let us now turn to nonlinear Schrödinger equations. There are a large
number of such equations one could study, but let us restrict attention to a particularly
intensively studied case, the cubic nonlinear Schrödinger equation (NLS)
$$iu_t + \Delta u = \mu |u|^2 u,$$
where µ is either equal to +1 (the defocusing case) or −1 (the focusing case). (This particular equation arises often in physics as the leading term in a Taylor expansion of more complicated dispersive models, such as those for plasmas, mesons, or Bose-Einstein condensates.) We specify initial data u(0, x) = u0(x) as per usual, and
to avoid technicalities we place this initial data in the Schwartz class. Unlike the linear
case, it is no longer automatic that smooth solutions exist globally in time, although it
is not too hard to at least establish local existence of smooth solutions. There are thus
several basic questions:
1. (Global existence) Under what conditions do smooth solutions u to NLS exist
globally in time?
2. (Asymptotic behaviour, global existence case) If there is global existence, what
is the limiting behaviour of u(t) in the limit as t goes to infinity?
3. (Asymptotic behaviour, blowup case) If global existence fails, what is the limit-
ing behaviour of u(t) in the limit as t approaches the maximal time of existence?
For reasons of time and space I will focus only on Questions 1 and 2, although Ques-
tion 3 is very interesting (and very difficult). The answer to these questions is rather
complicated (and still unsolved in several cases), depending on the sign µ of the nonlin-
earity, the ambient dimension n, and the size of the initial data. Here are some sample
results regarding Question 1 (most of which can be found for instance in [Caz2003] or
[Ta2006d]):
• If n = 1, then one has global smooth solutions for arbitrarily large data and any
choice of sign.
• For n = 2, 3, 4, one has global smooth solutions for arbitrarily large data in the
defocusing case (this is particularly tricky[RyVi2007] in the energy-critical case
n = 4), and for small data in the focusing case. For large data in the focusing
case, finite time blowup is possible.
• For n ≥ 5, one has global smooth solutions for small data with either sign. For large data in the focusing case, finite time blowup is possible. For large data in the defocusing case, the existence of global smooth solutions is unknown even for spherically symmetric data; indeed this problem, being supercritical, is of comparable difficulty to the Navier-Stokes global regularity problem (Section 1.4).
Incidentally, the relevance of the sign µ can be seen by considering the conserved
Hamiltonian
$$H(u_0) = H(u(t)) := \int_{\mathbb{R}^n} \frac{1}{2} |\nabla u(t,x)|^2 + \frac{1}{4} \mu |u(t,x)|^4 \, dx.$$
In the defocusing case the Hamiltonian is positive definite and thus coercive; in the
focusing case it is indefinite, though in low dimensions and in conjunction with the L2
conservation law one can sometimes recover coercivity.
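For instance, here is a one-line sketch of the coercivity in the defocusing case µ = +1: since both terms in H are then non-negative, conservation of H and of the L² norm gives the uniform-in-time energy bound

$$\|\nabla u(t)\|_{L^2}^2 \leq 2H(u(t)) = 2H(u_0), \qquad \|u(t)\|_{H^1}^2 \leq \|u_0\|_{L^2}^2 + 2H(u_0),$$

whereas in the focusing case µ = −1 no such bound is immediate, since the two terms in H can partially cancel each other.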
Let us now assume henceforth that the solution exists globally (and, to make a
technical assumption, also assume that the solution stays bounded in the energy space
H 1 (Rn )) and consider Question 2. As in the linear case, we can see two obvious
possible asymptotic behaviours for the solution u(t). Firstly there is the dispersive
or radiating scenario in which (3.5) or (3.6) occurs. (For technical reasons relating
to Galilean invariance, we have to allow the compact set K to be translated in time
by an arbitrary time-dependent displacement x(t), unless we make the assumption of
spherical symmetry; but let us ignore this technicality.) This scenario is known to
take place when the initial data is sufficiently small. (Indeed, it is conjectured to take
place whenever the data is “strictly smaller” in some sense than that of the small-
est non-trivial bound state, aka the ground state; there has been some recent progress
on this conjecture [KeMe2006], [HoRo2008] in the spherically symmetric case.) In
dimensions n = 1, 3, 4, this scenario is also known to be true for large data in the de-
focusing case (the case n = 1 by inverse scattering considerations[No1980], the case
n = 3 by Morawetz inequalities[GiVe1985], and the case n = 4 by the recent work in
[RyVi2007]; the n = 2 case is a major open problem).
The opposite scenario is that of a nonlinear bound state $u(t,x) = e^{iEt}\psi(x)$, where E > 0 and ψ solves the time-independent NLS $-\Delta\psi + \mu|\psi|^2\psi = -E\psi$. From the Pohozaev identity or the Morawetz inequality one can show that non-trivial bound states only exist in the focusing case µ = −1, and in this case one can construct such states, for instance by using the work of Berestycki and Lions[BeLi1980]. Solutions constructed using these nonlinear bound states are known as stationary solitons (or stationary solitary waves). By applying the Galilean invariance of the NLS
equation one can also create travelling solitons. With some non-trivial effort one can
also combine these solitons with radiation (as was done recently in three dimensions
[Be2007]), and one should also be able to combine distant solitons with each other to
form multisoliton solutions (this has been achieved in one dimension by inverse scat-
tering methods[No1980], as well as for the gKdV equation[MaMeTs2002] which is
similar in many ways to NLS.) Presumably one can also form solutions which are a
superposition of multisolitons and radiation.
The soliton resolution conjecture asserts that for “generic” choices of (arbitrarily
large) initial data to an NLS with a global solution, the long-time behaviour of the
solution should eventually resolve into a finite number of receding solitons (i.e. a multisoliton solution), plus a radiation component which decays in the sense of (3.5) or (3.6). (For short times, all kinds of things could happen, such as soliton collisions,
solitons fragmenting into radiation or into smaller solitons, etc., and indeed this sort
of thing is observed numerically.) This conjecture (which is for instance discussed in
[So2006], [Ta2004], [Ta2007d]) is still far out of reach of current technology, except
in the special one-dimensional case n = 1 when the equation miraculously becomes
completely integrable, and the solutions can be computed rather explicitly via inverse
scattering methods, as was for instance carried out by Novokshenov[No1980]. In that case the soliton resolution conjecture was indeed verified for generic data (for which the associated Lax operator has no repeated eigenvalues or resonances); however, for
exceptional data one could have a number of exotic solutions, such as a pair of solitons
receding at a logarithmic rate from each other, or of periodic or quasiperiodic “breather
solutions” which are not of soliton form.
Based on this one-dimensional model case, we expect the soliton resolution con-
jecture to hold in higher dimensions also, assuming sufficient uniform bounds on the
global solution to prevent blowup or “weak turbulence” from causing difficulties. How-
ever, the fact that a good resolution into solitons is only expected for “generic” data
rather than all data makes the conjecture extremely problematic, as almost all of our
tools are based on a worst-case analysis and thus cannot obtain results that are only
supposed to be true generically. (This is also a difficulty which seems to obstruct the
global solvability of Navier-Stokes, as discussed in Section 1.4.) Even in the spheri-
cally symmetric case, which should be much simpler (in particular, the solitons must
now be stationary and centred at the origin), the problem is wide open.
Nevertheless, there is some recent work which gives a small amount of progress
towards the soliton resolution conjecture. For spherically symmetric energy-bounded
global solutions (of arbitrary size) to the focusing cubic NLS in three dimensions, it
is a result of myself [Ta2004] that the solution ultimately decouples into a radiating term obeying (3.5), plus a “weakly bound state” which is asymptotically orthogonal to all radiating states, is uniformly smooth, and exhibits a weak decay at spatial infinity.
If one is willing to move to five and higher dimensions and to weaken the strength of
the nonlinearity (e.g. to consider quadratic NLS in five dimensions) then a stronger
result[Ta2007d] is available under similar hypotheses, namely that the weakly bound
state is now almost periodic, ranging inside of a fixed compact subset of energy space,
thus providing a “dispersive compact attractor” for this equation. In principle, this
brings us back to the realm of dynamical systems, but we have almost no control on
what this attractor is (though it contains all the soliton states and respects the symme-
tries of the equation), and so it is unclear what the next step should be. (There is a
similar result in the non-radial case which is more complicated to state: see [Ta2007d]
for more details.)
3.1.4 Notes
These articles were originally posted on Apr 5-8, 2007 at
terrytao.wordpress.com/2007/04/05
terrytao.wordpress.com/2007/04/07
terrytao.wordpress.com/2007/04/08
3.2 Ostrowski lecture
We suppose that we can measure some (but perhaps not all) of the Fourier coefficients
of f , and ask whether we can reconstruct f from this information; the objective is to
use as few Fourier coefficients as possible. More specifically, we fix a set Λ ⊂ Z/NZ
of “observable” frequencies, and pose the following two questions:
1. Is it possible to reconstruct f uniquely from the known coefficients $c_\xi = \hat f(\xi)$, ξ ∈ Λ?
2. If so, what is a practical algorithm for finding f?
For instance, if Λ is the whole set of frequencies, i.e. Λ = Z/NZ, then the answer to Q1
is “yes” (because the Fourier transform is injective), and an answer to Q2 is provided
by the Fourier inversion formula
$$f(x) = \frac{1}{\sqrt{N}} \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} c_\xi e^{2\pi i x \xi / N},$$
which can be computed quite quickly, for instance by using the fast Fourier transform.
Now we ask what happens when Λ is a proper subset of Z/NZ. Then the answer
to Q1, as stated above, is “no” (and so Q2 is moot). One can see this abstractly by a
degrees-of-freedom argument: the space of all functions f on N points has N degrees
of freedom, but we are only making |Λ| measurements, thus leaving N − |Λ| remaining
degrees of freedom in the unknown function f . If |Λ| is strictly less than N, then
there are not enough measurements to pin down f precisely. More concretely, we can
easily use the Fourier inversion formula to concoct a function f which is not identically
zero, but whose Fourier transform vanishes on Λ (e.g. consider a plane wave whose
frequency lies outside of Λ). Such a function is indistinguishable from the zero function
as far as the known measurements are concerned.
However, we can hope to recover unique solvability for this problem by making
an additional hypothesis on the function f . There are many such hypotheses one could
make, but for this toy problem we shall simply assume that f is sparse. Specifically, we
fix an integer S between 1 and N, and say that a function f is S-sparse if f is non-zero
in at most S places, or equivalently if the support $\mathrm{supp}(f) := \{x \in \mathbb{Z}/N\mathbb{Z} : f(x) \neq 0\}$
has cardinality less than or equal to S. We now ask the following modified versions of
the above two questions:
1. Let S and N be known integers, let f : Z/NZ → C be an unknown S-sparse
function, let Λ ⊂ Z/NZ a known set of frequencies, and let cξ = fˆ(ξ ) be a
sequence of known Fourier coefficients of f for all ξ ∈ Λ. Is it possible to
reconstruct f uniquely from this information?
2. If so, what is a practical algorithm for finding f ?
Note that while we know how sparse f is, we are not told exactly where f is non-zero: there are S positions out of the N total positions where f might be non-zero, but
we do not know which S positions these are. The fact that the support is not known a
priori is one of the key difficulties with this problem. Nevertheless, setting that problem
aside for the moment, we see that f now has only S degrees of freedom instead of N,
and so by repeating the previous analysis one might now hope that the answer to Q1
becomes yes as soon as |Λ| ≥ S, i.e. one takes at least as many measurements as the
sparsity of f .
Actually, one needs at least |Λ| ≥ 2S (if 2S is less than or equal to N), for the following reason. Suppose that |Λ| were strictly less than 2S. Then the set of functions
supported on {1, . . . , 2S} has more degrees of freedom than are measured by the Fourier
coefficients at Λ. By elementary linear algebra, this therefore means that there exists a
function f supported on {1, . . . , 2S} whose Fourier coefficients vanish on Λ, but is not
identically zero. If we split f = f1 − f2, where f1 is supported on {1, . . . , S} and f2 is supported on {S + 1, . . . , 2S}, then we see that f1 and f2 are two distinct S-sparse functions whose Fourier transforms agree on Λ, thus contradicting unique recoverability of f.
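This degrees-of-freedom argument is easy to see numerically; here is a small numpy sketch (the construction via the null space of a partial Fourier minor follows the argument just given, but the specific code is mine):

    import numpy as np

    # If |Lambda| < 2S, two distinct S-sparse functions can share the same
    # observed Fourier coefficients: find a nonzero f supported on the first
    # 2S points whose transform vanishes on Lambda, then split f = f1 - f2.
    N, S = 64, 4
    rng = np.random.default_rng(2)
    Lam = rng.choice(N, size=2 * S - 1, replace=False)   # |Lambda| = 2S - 1 < 2S

    F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
    M = F[np.ix_(Lam, np.arange(2 * S))]     # (2S-1) x 2S partial Fourier matrix

    # a vector in the null space of M gives the desired f on {0, ..., 2S-1}
    _, _, Vh = np.linalg.svd(M)
    f = np.zeros(N, dtype=complex)
    f[: 2 * S] = Vh[-1].conj()

    f1, f2 = np.zeros_like(f), np.zeros_like(f)
    f1[:S] = f[:S]
    f2[S: 2 * S] = -f[S: 2 * S]                          # so that f = f1 - f2
    print("both S-sparse, same data on Lambda:",
          np.allclose((F @ f1)[Lam], (F @ f2)[Lam]), np.abs(f).max() > 0)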
One might hope that this necessary condition is close to being sharp, so that the
answer to the modified Q1 is yes as soon as |Λ| is larger than 2S. By modifying the
arguments of the previous paragraph we see that Q1 fails if and only if there exists a non-zero 2S-sparse function whose Fourier transform vanishes on all of Λ, but one can
hope that this is not the case, because of the following heuristic:
Principle 3.3 (Uncertainty principle). (informal) If a function is sparse and not iden-
tically zero, then its Fourier transform should be non-sparse.
This type of principle is motivated by the Heisenberg uncertainty principle in
physics; the size of the support of f is a proxy for the spatial uncertainty of f , whereas
the size of the support of fˆ is a proxy for the momentum uncertainty of f . There are a
number of ways to make this principle precise. One standard one is
Proposition 3.4 (Discrete uncertainty principle). If f is not identically zero, then
| supp( f )| × | supp( fˆ)| ≥ N.
and
$$\sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)| \leq |\mathrm{supp}(\hat f)|^{1/2} \Big( \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^2 \Big)^{1/2}.$$
From this one can quickly show that one does indeed obtain unique recoverability
for S-sparse functions in cyclic groups of prime order whenever one has |Λ| ≥ 2S, and
that this condition is absolutely sharp. (There is also a generalisation of the above
uncertainty principle to composite N due to Meshulam[Me2006].)
This settles the (modified) Q1 posed above, at least for groups of prime order. But
it does not settle Q2 - the question of exactly how one recovers f from the given data
N, S, Λ, (cξ )ξ ∈Λ . One can consider a number of simple-minded strategies to recover f :
1. Brute force. If we knew precisely what the support supp( f ) of f was, we can
use linear algebra methods to solve for f in terms of the coefficients cξ , since we
have |Λ| equations in S unknowns (and Lemma 3.5 guarantees that this system
has maximal rank). So we can simply exhaust all the possible combinations for
supp( f ) (there are roughly NS of these) and apply linear algebra to each combi-
nation. This works, but is horribly computationally expensive, and is completely
impractical once N and S are of any reasonable size, e.g. larger than 1000.
2. $l^0$ minimisation. Out of all the possible functions f which match the given data (i.e. $\hat f(\xi) = c_\xi$ for all ξ ∈ Λ), find the sparsest such solution, i.e. the solution which minimises the “$l^0$ norm” $\|f\|_{l^0} := \sum_{x \in \mathbb{Z}/N\mathbb{Z}} |f(x)|^0 = |\mathrm{supp}(f)|$. This
works too, but is still impractical: the general problem of finding the sparsest
solution to a linear system of equations contains the infamous subset sum deci-
sion problem as a special case (we’ll leave this as an exercise to the reader) and
so this problem is NP-hard in general. (Note that this does not imply that the
original problem Q1 is similarly NP-hard, because that problem involves a spe-
cific linear system, which turns out to be rather different from the specific linear
system used to encode subset-sum.)
3. $l^2$ minimisation (i.e. the method of least squares). Out of all the possible functions f which match the given data, find the one of least energy, i.e. which minimises the $l^2$ norm $\|f\|_{l^2} := (\sum_{x \in \mathbb{Z}/N\mathbb{Z}} |f(x)|^2)^{1/2}$. This method has the advantage (unlike 1. and 2.) of being extremely easy to carry out; indeed, the minimiser is given explicitly by the formula $f(x) = \frac{1}{\sqrt{N}} \sum_{\xi \in \Lambda} c_\xi e^{2\pi i x \xi/N}$. Unfortunately, this minimiser is not guaranteed at all to be S-sparse, and indeed the uncertainty principle suggests in fact that the $l^2$ minimiser will be highly non-sparse.
So we have two approaches to Q2 which work but are computationally infeasible, and
one approach which is computationally feasible but doesn’t work. It turns out however
that one can take a “best of both worlds” approach halfway between method 2. and
method 3., namely:
4. $l^1$ minimisation (or basis pursuit): Out of all the possible functions f which match the given data, find the one of least mass, i.e. which minimises the $l^1$ norm $\|f\|_{l^1} := \sum_{x \in \mathbb{Z}/N\mathbb{Z}} |f(x)|$.
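To see the method in action, here is a toy numpy/scipy sketch of basis pursuit posed as a linear program; to keep the program real-valued I use a random Gaussian measurement matrix rather than the partial Fourier matrix (the method abstracts to such matrices, as noted at the end of this section; all parameter choices here are illustrative only):

    import numpy as np
    from scipy.optimize import linprog

    # Basis pursuit as an LP: minimise sum(t) subject to -t <= f <= t, A f = b.
    rng = np.random.default_rng(3)
    N, S, m = 128, 5, 40                   # signal length, sparsity, measurements
    A = rng.standard_normal((m, N)) / np.sqrt(m)
    f_true = np.zeros(N)
    f_true[rng.choice(N, S, replace=False)] = rng.standard_normal(S)
    b = A @ f_true

    # variables are (f, t); objective sum(t); rows encode f - t <= 0, -f - t <= 0
    c = np.concatenate([np.zeros(N), np.ones(N)])
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([A, np.zeros((m, N))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    f_rec = res.x[:N]
    print("max recovery error:", np.abs(f_rec - f_true).max())

With these parameters the recovery is (with high probability over the random choices) exact up to solver tolerance, illustrating the theorem below.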
The key difference between this minimisation problem and the l 0 problem is that the
l 1 norm is convex, and so this minimisation problem is no longer NP-hard but can be
solved in reasonable (though not utterly trivial) time by convex programming tech-
niques such as the simplex method. So the method is computationally feasible; the
only question is whether the method actually works to recover the original S-sparse
function f .
Before we reveal the answer, we can at least give an informal geometric argument
as to why l 1 minimisation is more likely to recover a sparse solution than l 2 minimisa-
tion. The set of all f whose Fourier coefficients match the observed data cξ forms an
affine subspace of the space of all functions. The l 2 minimiser can then be viewed ge-
ometrically by taking l 2 balls (i.e. Euclidean balls) centred at the origin, and gradually
increasing the radius of the ball until the first point of contact with the affine subspace.
In general, there is no reason to expect this point of contact to be sparse (i.e. to lie
on a high-codimension coordinate subspace). If however we replace l 2 with l 1 , then
the Euclidean balls are replaced by octahedra, which are much “pointier” (especially
in high dimensions) and whose corners lie on coordinate subspaces. So the point of
first contact is now much more likely to be sparse. The idea of using l 1 as a “convex
relaxation” of l 0 is a powerful one in applied mathematics; see for instance [Tr2006].
It turns out that if Λ and f are structured in a perverse way, then basis pursuit
does not work (and more generally, any algorithm to solve the problem is necessarily
very unstable). We already saw the Dirac comb example, which relied on the composite
nature of N. But even when N is prime, we can construct pseudo-Dirac comb examples
which exhibit the problem: if f is for instance a discretised bump function adapted to an arithmetic progression such as $\{-\lfloor\sqrt{N}\rfloor, \dots, \lfloor\sqrt{N}\rfloor\}$, then elementary Fourier analysis reveals that the Fourier transform of f will be highly concentrated (though not completely supported) on a dual progression (which in the above example will also be basically $\{-\lfloor\sqrt{N}\rfloor, \dots, \lfloor\sqrt{N}\rfloor\}$), and will have a rapidly decreasing tail away from this
progression. (This is related to the well-known fact that the Fourier transform of a
Schwartz function is again a Schwartz function.) If we pick Λ to be far away from this progression - e.g. $\Lambda = \{\lfloor N/3\rfloor, \dots, \lfloor 2N/3\rfloor\}$ - then the Fourier transform will be very
small on Λ. As a consequence, while we know abstractly that exact reconstruction of
f is possible if N is a large prime assuming infinite precision in the measurements, any
presence of error (e.g. roundoff error) will mean that f is effectively indistinguishable
from the zero function. In particular it is not hard to show that basis pursuit fails in
general in this case.
The above counterexamples used very structured examples of sets of observed fre-
quencies Λ, such as arithmetic progressions. On the other hand, it turns out, remark-
ably enough, that if instead one selects random sets of frequencies Λ of some fixed size |Λ| (thus choosing uniformly at random among all the $\binom{N}{|\Lambda|}$ possibilities), then
things become much better. Intuitively, this is because all the counterexamples that
obstruct solvability tend to have their Fourier transform supported in very structured
sets, and the dichotomy between structure and randomness means that a random subset
of Z/NZ is likely to contain a proportional fraction of all structured sets. One specific
manifestation of this is
$$\sum_{\xi \in \Lambda} |\hat f(\xi)|^2 \approx \frac{|\Lambda|}{N} \sum_{x \in \mathbb{Z}/N\mathbb{Z}} |f(x)|^2$$
for all 4S-sparse functions f, where by X ≈ Y we mean that X and Y differ by at most
10% (say). (N is not required to be prime.)
The above formulation is a little imprecise; see [CaTa2006], [RuVe2006] for more rigorous versions. This principle (a form of what is known as the uniform uncertainty principle, or UUP) asserts that if the random set Λ is just a little bit
bigger than S (by a couple logs), then it is not possible for the Fourier transform of an
S-sparse function to avoid Λ, and moreover the set Λ must receive its “fair share” of the
l 2 energy, as predicted by Plancherel’s theorem. The “uniform” nature of this principle
refers to the fact that it applies for all S-sparse functions f , with no exceptions. For
a single function f , this type of localisation of the Plancherel identity is quite easy to
prove using Chernoff’s inequality. To extend this to all sparse f , the main strategy (first
used in this type of context in [Bo1989]) is to exploit the fact that the set of sparse f has
low metric entropy and so can be described efficiently by a relatively small number of
functions. (The principle cannot possibly extend to all functions f, since it is certainly
possible to create non-zero functions whose Fourier transform vanishes everywhere on
Λ.)
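The single-function statement is also easy to test numerically; here is a sketch (with unitary FFT normalisation, and parameter choices that are mine):

    import numpy as np

    # Empirical check of the localised Plancherel identity: for a random set
    # Lambda, the Fourier energy of a sparse f on Lambda is close to its
    # "fair share" (|Lambda|/N) of the total energy.
    rng = np.random.default_rng(4)
    N, S = 4096, 8
    f = np.zeros(N)
    f[rng.choice(N, 4 * S, replace=False)] = rng.standard_normal(4 * S)

    f_hat = np.fft.fft(f) / np.sqrt(N)             # unitary normalisation
    for size in [200, 400, 800]:
        Lam = rng.choice(N, size, replace=False)
        ratio = (np.abs(f_hat[Lam])**2).sum() / ((size / N) * (np.abs(f)**2).sum())
        print(size, round(ratio, 3))               # ratios close to 1

The printed ratios hover near 1, with fluctuations shrinking as |Λ| grows, as the Chernoff-type argument predicts for a single f.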
By using this principle (and variants of this principle), one can indeed show that
basis pursuit works:
Theorem 3.8 ([CaRoTa2006], [CaTa2006]). Suppose Λ is a random set with $|\Lambda| \gtrsim S \log N$. Then any given S-sparse function f will be recovered exactly by $l^1$ minimisation with overwhelming probability. If one makes the stronger hypothesis $|\Lambda| \gtrsim S \log^4 N$, then with overwhelming probability all S-sparse functions will be recovered exactly by $l^1$ minimisation. (Again, N is not required to be prime.)
Roughly speaking, the idea in the latter result is to use the UUP to show that the Fourier coefficients of any sparse (or $l^1$-bounded) competitor with support disjoint from that of the true S-sparse solution are going to be rather orthogonal to those of the true solution, and thus unlikely to be present in an $l^1$ minimiser. The former result is more delicate and
combinatorial, and requires computing high moments of random Fourier minors.
The method is rather robust; there is some followup work[CaRoTa2006b] which
demonstrates stability of the basis pursuit method with respect to several types of noise;
see also the survey [Ca2006]. It can also be abstracted from this toy Fourier problem
to a more general problem of recovering sparse or compressible data from few mea-
surements. As long as the measurement matrix obeys an appropriate generalisation of
the UUP, the basis pursuit methods are quite effective (in theory, in numerical experiments, and more recently in laboratory prototypes).
3.2.1 Notes
This article was originally posted on Apr 15, 2007 at
terrytao.wordpress.com/2007/04/15
Utpal Sarkar raised the interesting question of whether some analogue of Corollary
3.6 for arbitrary abelian groups (beyond those in [Me2006]) could be established under
the additional assumption that f and fˆ are not supported in subgroups (or cosets of
subgroups).
3.3 Milliman lectures
• Lagrange’s four square theorem: For every positive integer N, there exists a
pattern in S of the form a, b, c, N − a − b − c.
• Vinogradov’s theorem: For every sufficiently large integer N, there exists a pat-
tern in P of the form a, b, c, N − a − b − c.
• Fermat’s two square theorem: For every prime number N ≡ 1 (mod 4), there exists a pattern in S of the form a, N − a.
• Even Goldbach conjecture: For every even number N ≥ 4, there exists a pattern
in P of the form a, N − a.
• Fermat’s four square theorem: There does not exist any pattern in S of the form a, a + b, a + 2b, a + 3b with b ≠ 0.
• Sophie Germain conjecture: There are infinitely many patterns in P of the form
a, 2a + 1.
I have deliberately phrased the above results in a unified format, namely that of
counting additive patterns with one or more free parameters a, b, . . . in either the squares
or the primes. However, this apparent unification is actually an illusion: the results
involving square numbers are much older (the Pell equation solutions, for instance, were essentially known to Diophantus, as well as to the ancient Indians) and are proven using completely different methods than for the prime numbers. For the square numbers, there are some key algebraic identities and connections, ranging from the high-school factorisations $a^2 - b^2 = (a-b)(a+b)$, $a^2 - 2b^2 = (a - \sqrt{2}\,b)(a + \sqrt{2}\,b)$ and $a^2 + b^2 = (a - ib)(a + ib)$ to deeper connections between quadratic forms and elliptic curves, which allow one to prove the results on the left-hand column via the meth-
ods of algebraic number theory. For the primes, on the other hand, there are very few
usable algebraic properties available: one has local (mod q) information, such as the
fact that all primes are odd (with one exception), or adjacent to a multiple of 6 (with
two exceptions), but there are essentially no global algebraic identities or structures to
exploit amongst the primes (except, perhaps, for identities such as the Euler product formula $\zeta(s) = \prod_p (1 - p^{-s})^{-1}$ connecting the prime numbers to the Riemann zeta function and its relatives, although this only directly helps one count multiplicative pat-
terns in the primes rather than additive ones). So, whereas the square numbers can be
profitably studied by cleverly exploiting their special algebraic structure, when dealing
with the prime numbers it is in fact better to rely instead on more general tools which
require very little structural control on the set being studied. In particular, in recent
years we have learned that the methods of additive combinatorics, which offers tools
to count additive patterns in arbitrary sets of integers (or more generally, subsets of
an additive group), can be remarkably effective in additive prime number theory. Thus
- rather counter-intuitively - some of our strongest results about additive patterns in
primes have been obtained by using very little information about the primes at all!
To give a very simple example of how additive combinatorics can be applied to
the primes, let us consider the problem of finding parallelograms inside the primes -
patterns of the form a, a + b, a + c, a + b + c with b, c positive integers; for instance, 3,
7, 43, 47 is a parallelogram of primes. It is very hard to produce any parallelograms
of primes by algebraic means (such as an explicit formula); however, there is a simple
combinatorial argument that shows that such parallelograms exist in abundance. The
only actual information needed about the primes for this argument is the prime number
theorem, which says that the number of primes less than a large number N is equal to
(1 + o(1))N/log N in the limit N → ∞. (Actually, we won’t even need the full strength of the prime number theorem; the weaker statement that there are $\gg N/\log N$ primes less than N, which was known to Chebyshev and can be established by elementary means based on the prime factorisation of $\binom{2N}{N}$, will suffice.)
Let N be a large number; then there are (1 + o(1))N/log N primes less than N. This allows us to form roughly $(\frac{1}{2} + o(1)) N^2/\log^2 N$ differences p − q of primes 1 < q < p ≤ N. But each of these differences takes values between 1 and N. For N large enough, we
can thus use the pigeonhole principle to conclude that there are two differences p − q and p′ − q′ of primes which have the same value, which implies that the quadruplet p, q, q′, p′ forms a parallelogram. In fact, a slight refinement of this argument (using the Cauchy-Schwarz inequality, which can provide a more quantitative version of the pigeonhole principle) shows that there are $\gg N^3/\log^4 N$ parallelograms of primes less than N, and in particular that there are infinitely many parallelograms of primes.
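The collision count is simple enough to carry out by computer; here is a short sketch (the cutoff N = 2000 is an arbitrary choice, and quadruples in which two pairs share a prime are counted too):

    from itertools import combinations
    from collections import Counter

    # Pigeonhole count of parallelograms p, q, q', p' of primes below N:
    # tally all differences of primes and count collisions among them.
    N = 2000
    sieve = [True] * N
    sieve[0] = sieve[1] = False
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    primes = [p for p, flag in enumerate(sieve) if flag]

    diffs = Counter(b - a for a, b in combinations(primes, 2))
    parallelograms = sum(c * (c - 1) // 2 for c in diffs.values())
    print(len(primes), "primes below", N)
    print(parallelograms, "parallelograms (some degenerate)")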
The above example shows how one can detect additive patterns in the primes using
very little information about the primes themselves; in the above case, the only infor-
mation we actually needed about the primes was about their cardinality. (Indeed, the
argument is not really about primes at all, and is best viewed as a general statement
about dense sets of integers, known as the Szemerédi cube lemma.) More generally,
the strategy of the additive combinatorial approach is to minimise the number of facts
one actually needs to establish about the primes, and rely primarily on tools which are
valid for rather general classes of sets of integers.
A good example of this type of tool is Szemerédi’s theorem[Sz1975], which asserts that any set of integers A of positive density contains arbitrarily long arithmetic progressions; as with the case of parallelograms, the only information needed about the set
is that it is large. This theorem does not directly apply to the prime numbers P, as
they have density zero, but it turns out that there is a trick (which Ben Green and I
call the transference principle) which (very roughly speaking) lets one locate a dense
set of integers A which “models” the primes, in the sense that there is a relationship
between additive patterns in A and additive patterns in P. (The relationship here is
somewhat analogous to Monte Carlo integration, which uses the average value of a
function f on a sparse pseudorandom set to approximate the average value of f on a
much larger domain.) As a consequence of this principle, Ben and I were able to use
Szemerédi’s theorem to establish that the primes contained arbitrarily long arithmetic
progressions. There have since been a number of similar results in which Szemerédi-
type results for dense sets of integers have been transferred to yield similar statements
about the primes and related sets (e.g. constellations in Gaussian primes[Ta2006g], or
polynomial patterns in the ordinary primes[TaZi2008]).
In this talk, though, I am not going to discuss the above results further, but instead
focus on the task of using additive combinatorics to detect more general classes of
additive patterns in sets of integers such as the primes, with the philosophy of always
trying to use as little structural information about these sets as possible.
To illustrate some of the ideas, let us consider the odd Goldbach conjecture, which
asserts that any odd integer larger than 5 can be expressed as the sum of three primes.
Let’s first tackle a model problem in the same spirit: we’ll work in a cyclic group
Z/NZ instead of the integers, we will pick three sets A, B, C in this group as well as
an element x, and we will ask whether x can be expressed as the sum of an element a
from A, an element b from B, and an element c from C.
Of course, to make any headway on this problem we have to make some assump-
tions on A, B, C. Let us first assume that A, B, C are fairly dense subsets of Z/NZ, say $|A|, |B|, |C| \geq \frac{N}{10}$. Even with such large sets, there is no guarantee that every el-
ement x can be expressed as a sum of elements from A, B, and C respectively. For
instance, if $A = B = C = \{1, \dots, \lfloor N/10\rfloor + 1\}$, we see that only about 30% of the numbers in Z/NZ can be expressed in this way. Or, if N is a multiple of 10 and A = B = C = {10, 20, 30, . . .} consists of those elements in Z/NZ which are multiples of 10, then only 10% of the elements in Z/NZ can be expressed in this fashion. Thus
there are some non-trivial obstructions to this Goldbach-type problem.
However, it turns out that if one of the sets, say A, is sufficiently “uniform” or
“pseudorandom”, then one can always solve this Goldbach-type problem, regardless of
what the other two sets are. This type of fact is often established by Fourier-analytic
means (or by closely related techniques, such as spectral theory), but let me give a
heuristic combinatorial argument to indicate why one would expect this type of phe-
nomenon to occur. We will work in the contrapositive: we assume that we can find an
x which cannot be expressed as the sum of elements from A, B, and C, and somehow
eliminate the role of x, B, and C to deduce some “non-uniformity” or “structure” for
A.
So, suppose that x ≠ a + b + c for all a in A, b in B, c in C. This implies that x − a − b always avoids C. Thus x − a − b does not range freely throughout Z/NZ, but is instead concentrated in a set of 90% the size or smaller. Because of this more confined space, one would expect more “collisions” than usual, in the sense that there should be more solutions to the equation x − a − b = x − a′ − b′ with a, a′ in A and b, b′ in B than one would normally expect. Rearranging, we conclude that there are more solutions to the equation a − a′ = b − b′ than one might first expect. This means that the differences a − a′ and the differences b − b′ have to cluster in the same region of Z/NZ, which then suggests that we should have more collisions a − a′ = a″ − a‴ with a, a′, a″, a‴ in A than one might first expect. To put it another way, A should contain a higher than expected number of parallelograms a, a + r, a + s, a + r + s (also known as additive quadruples).
The above argument can be made rigorous by two quick applications of the Cauchy-Schwarz inequality. If we had |A|, |B|, |C| ≥ δN for some δ > 0, say, then it is not hard to use Cauchy-Schwarz to show that A will contain at least $\delta^4 N^3$ parallelograms (where we allow degenerate parallelograms, in order to simplify the formulae a little); but if there existed an x which was not the sum of an element from A, an element from B, and an element of C, one can use this to conclude that A must have a few more parallelograms; in fact it must have at least $(1 + c\delta)\delta^4 N^3$ for some absolute constant c > 0.
Taking contrapositives, we conclude that if A has a near-minimal number of parallelograms (between $\delta^4 N^3$ and $(1 + c\delta)\delta^4 N^3$), then we can solve this Goldbach-type problem for any x and any choice of sets B, C of density δ.
So, by using elementary additive combinatorics, we can reduce Goldbach-type
problems to the problem of counting parallelograms in a given set A. But how can
one achieve the latter task? It turns out that for this specific problem, there is an ele-
gant formula from Fourier analysis: the number of parallelograms in a set A ⊂ Z/NZ
is equal to
$$N^3 \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\widehat{1_A}(\xi)|^4 \qquad (3.7)$$
where $\widehat{1_A}(\xi) := \frac{1}{N} \sum_{x \in \mathbb{Z}/N\mathbb{Z}} 1_A(x) e^{-2\pi i x \xi/N}$; this is a fancy way of exploiting the fact that the linear phase $x \mapsto x\xi/N \bmod 1$ has a vanishing second derivative.
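The identity (3.7) is easy to verify numerically; here is a brute-force sketch (a small N is chosen so the triple loop is instant, and degenerate parallelograms are included, as above):

    import numpy as np

    # Check (3.7): the number of quadruples (x, x+r, x+s, x+r+s) in A, with
    # x, r, s ranging over Z/NZ, equals N^3 * sum_xi |1_A_hat(xi)|^4,
    # where 1_A_hat = fft(1_A) / N matches the normalisation in the text.
    N = 30
    rng = np.random.default_rng(5)
    A = (rng.random(N) < 0.4).astype(int)          # indicator function of A

    count = sum(A[x] * A[(x + r) % N] * A[(x + s) % N] * A[(x + r + s) % N]
                for x in range(N) for r in range(N) for s in range(N))

    fourier = (np.abs(np.fft.fft(A) / N) ** 4).sum() * N ** 3
    print(count, round(fourier, 6))                 # the two counts agree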
Anyway, returning to the formula (3.7), in the case when A has density exactly δ ,
thus |A| = δ N, we see that the number of parallelograms is equal to
$$\Big(\delta^4 + \sum_{\xi \neq 0} |\widehat{1_A}(\xi)|^4\Big) N^3.$$
Thus we see (informally, at least) that a set A is going to have a near-minimal number of parallelograms precisely when it is Fourier-pseudorandom in the sense that its Fourier coefficients at non-zero frequencies are all small, or in other words that the set A exhibits no correlation or bias with respect to any non-trivial linear phase function $x \mapsto e^{2\pi i x\xi/N}$. (It is instructive to consider our two counterexamples to the toy Goldbach problem, namely $A = \{1, \dots, \lfloor N/10\rfloor + 1\}$ and $A = \{10, 20, 30, \dots\}$. The first set is biased with respect to the phase $x \mapsto e^{2\pi i x/N}$; the second set is biased with respect to $x \mapsto e^{2\pi i x/10}$.)
This gives us a strategy to solve Goldbach-type problems: if we can show some-
how that a set A does not correlate strongly with any non-trivial linear phase function,
then it should be sufficiently Fourier pseudorandom that there is no further obstruc-
tion to the Goldbach problem. If instead A does closely resemble something related to a non-trivial linear phase function, then that is quite a bit of structural information on A, and we should be able to solve the Goldbach-type problem by explicit algebraic counting of solutions (as is for instance the case in the two model examples $A = \{1, 2, \dots, \lfloor N/10\rfloor + 1\}$ and $A = \{10, 20, 30, \dots\}$ discussed earlier).
In the case of sets of integers such as the primes, this type of strategy is known as
the Hardy-Littlewood circle method. It was successfully used by Vinogradov to estab-
lish his theorem that every sufficiently large odd number is the sum of three primes
(and thus every sufficiently large number is the sum of four primes); the problem boils
down to getting sufficiently strong estimates for exponential sums over primes such as $\sum_{p < N} e^{2\pi i \alpha p}$. In the “major arc” case where α is rational (or very close to rational)
with small denominator then methods from multiplicative number theory, based on
zeta functions and L-functions, become useful; in contrast, in the complementary “mi-
nor arc” case where α behaves irrationally, one can use more analytic methods (based,
ultimately, on the equidistribution of multiples of α modulo 1 on the unit circle, and
on the obvious fact that the product of two primes is a non-prime) to obtain good esti-
mates. (I hope to discuss this in more detail in a later post.) A similar argument was used by van der Corput to establish that the prime numbers contain infinitely many arithmetic progressions of length three. These arguments are actually quite quantitative
and precise; for instance, Vinogradov’s theorem not only gives the existence of a rep-
resentation N = p1 + p2 + p3 of any sufficiently large odd number as the sum of three
primes, but in fact gives an asymptotically precise formula as N → ∞ as to how many
such representations exist. Similarly, van der Corput’s argument gives an asymptoti-
cally precise formula as to how many arithmetic progressions of length three consisting
of primes less than N there are, as N → ∞.
This strategy unfortunately fails miserably for the even Goldbach problem, which
asks whether every even number greater than 2 is the sum of two primes; it turns out
that there is no useful analogue of the parallelogram in this problem, basically due to
the fact that there is only one free parameter in the pattern one is looking for. However,
it is possible to adapt the strategy to more complicated patterns with two or more free
parameters, such as arithmetic progressions of length greater than three. For instance,
if one wants to find arithmetic progressions of length 4 in a set A, it turns out that this
problem is controlled by the number of parallelopipeds
a, a + r, a + s, a + t, a + r + s, a + r + t, a + s + t, a + r + s + t
that A contains, in much the same way that the odd Goldbach problem was controlled
by parallelograms. So, if one knows how to count how many parallelopipeds there are
in a set, one can (in principle) count how many progressions of length 4 there are as well
(and one can also count a large variety of other patterns too). One would then hope for
an elegant formula analogous to (3.7) to count these objects, but unfortunately it seems
that no such formula exists. Part of the problem is that while parallelograms are closely tied to the linear (or Fourier) phases $x \mapsto x\xi/N$, because such phases have vanishing second derivative, parallelopipeds are more naturally tied to the larger class of phases which have vanishing third derivative, such as the quadratic phases $x \mapsto x^2\xi/N$. (Actually, there are also many more “pseudoquadratic” phases, such as $x \mapsto \lfloor \alpha x\rfloor \beta x$ for various real numbers α, β, whose third derivative exhibits some cancellation but does not vanish entirely, and which are connected to flows on nilmanifolds, but I will not discuss
them in detail here.) With this much larger class of potentially relevant phases, it ap-
pears that there is no useful analogue of the formula (3.7) (basically because there are so many such phases out there, most of which have no significant correlation with the set A, that the noise from these irrelevant phases drowns out the signal from those few phases that are actually important). Nevertheless, there is a set of tools, developed ini-
tially by Timothy Gowers, in what might loosely be called quadratic Fourier analysis,
which can make precise the connection between parallelopipeds and correlations with
quadratic (or pseudoquadratic) phases; there is also the beginnings of a more general
theory connecting higher dimensional parallelopipeds and higher degree phases. This
is still work in progress, but we have already been able to use the theory to understand several types of linear patterns; for instance, Ben and I showed that the number
of arithmetic progressions of length four consisting of primes less than a given large
number N is equal to
$$\Big(\frac{3}{4}\prod_{p \geq 5}\Big(1 - \frac{3p-1}{(p-1)^3}\Big) + o(1)\Big)\frac{N^2}{\log^4 N} \approx 0.4764\,\frac{N^2}{\log^4 N}.$$
Very briefly, the role of additive combinatorics (and generalised Fourier analysis) is to
replace problems of counting patterns involving multiple prime parameters, with that
of counting correlations that involve a single prime parameter (e.g. computing a sum
2
∑ p<N e2πiα p for various real numbers α), which is significantly easier (though not
entirely trivial) and amenable to a current technology from analytic number theory. So
we don’t dispense with the number theory entirely, but thanks to combinatorics we can
reduce the amount of difficult number theoretical work that we have to do to a feasible
level.
If A is drawn randomly from one of the above matrix ensembles, then we have a
very explicit understanding of how each of the coefficients of the matrix A behaves. But
in practice, we want to study more “global” properties of the matrix A which involve
rather complicated interactions of all the coefficients together. For instance, we could
be interested in the following (closely related) questions:
In the special cases of the real and complex Gaussian ensembles, there is a massive
amount of algebraic structure coming from the action of O(n) and U(n) that allows one
to explicitly compute various multidimensional integrals, and this approach actually works! One gets a very explicit and useful formula for the joint eigenvalue distribution (first worked out by Ginibre, I believe) this way. But for more general
ensembles, such as the Bernoulli ensemble, such algebraic structure is not present, and
so it is unlikely that any useful explicit formula for the joint eigenvalue distribution
exists. However, one can still obtain a lot of useful information if, instead of trying to
locate each eigenvalue or singular value directly, one instead tries to compute various
special averages (e.g. moments) of these eigenvalues or singular values. For instance,
from undergraduate linear algebra we have the fundamental formulae
$$\mathrm{tr}(A) = \sum_{k=1}^n \lambda_k$$
and
$$\det(A) = \prod_{k=1}^n \lambda_k,$$
connecting the trace and determinant of a matrix A to its eigenvalues, and more generally
$$\mathrm{tr}(A^m) = \sum_{k=1}^n \lambda_k^m, \qquad (3.8)$$
$$\det(A - zI) = \prod_{k=1}^n (\lambda_k - z);$$
and similarly for the singular values, we have
$$\mathrm{tr}((AA^*)^m) = \sum_{k=1}^n \sigma_k^{2m},$$
$$\det(AA^* - zI) = \prod_{k=1}^n (\sigma_k^2 - z).$$
So, if one can easily compute traces and determinants of the matrix A (and various
other matrices related to A), then one can in principle get quite a bit of control on
the eigenvalues and singular values. It is also worth noting that the eigenvalues and
singular values are related to each other in several ways; for instance, we have the
identity

∏_{k=1}^{n} |λ_k| = ∏_{k=1}^{n} σ_k        (3.9)

(which comes from comparing the determinants of A and AA∗), the inequality

σ_n ≤ sup_{1≤k≤n} |λ_k| ≤ σ_1        (3.10)

(which comes from looking at the ratio ‖Ax‖/‖x‖ when x is an eigenvector), and the inequality

∑_{1≤k≤n} |λ_k|^2 ≤ ∑_{1≤k≤n} σ_k^2        (3.11)
(which is easiest to see by using QR (or KAN) decomposition of the eigenvector matrix
to rotate the matrix A to be upper triangular, and then computing the trace of AA∗ ).
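(These facts are easy to test numerically. The following numpy sketch, an added illustration with arbitrary parameters, checks (3.8) with m = 2 together with (3.9), (3.10), and (3.11) on a single sample from the Bernoulli ensemble.)

import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.choice([-1.0, 1.0], size=(n, n))    # one sample of the Bernoulli ensemble

lam = np.linalg.eigvals(A)
sigma = np.linalg.svd(A, compute_uv=False)  # sorted sigma_1 >= ... >= sigma_n

# (3.8) with m = 2: tr(A^2) equals the sum of the squared eigenvalues
assert np.isclose(np.trace(A @ A), np.sum(lam ** 2).real)
# (3.9), compared via logarithms to avoid overflowing the products
assert np.isclose(np.sum(np.log(np.abs(lam))), np.sum(np.log(sigma)))
# (3.10): the extreme singular values bracket the eigenvalue moduli
rho = np.abs(lam).max()
assert sigma[-1] <= rho + 1e-9 and rho <= sigma[0] + 1e-9
# (3.11)
assert np.sum(np.abs(lam) ** 2) <= np.sum(sigma ** 2) + 1e-9
print("all checks passed")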
Let’s give some simple examples of this approach. If we take A to be the Gaussian
or Bernoulli ensemble, then the trace of A has expectation zero, and so we know that
the sum ∑_{k=1}^{n} λ_k has expectation zero also. (Actually, this is easy to see for symme-
try reasons: A has the same distribution as −A, and so the distribution of eigenvalues
also has a symmetry around the origin.) The trace of AA∗ , by contrast, is the sum of
the squares of all the matrix coefficients, and will be close to n2 (for the Bernoulli
ensemble, it is exactly n2 ); thus we see that ∑nk=1 σk2 ∼ n2 , and so by (3.11) we have
∑nk=1 2 2
√ |λk | = O(n ). So we see that the eigenvalues and singular values should be about
O( n) on the average. By working a little harder (e.g. by playing with very high √ mo-
ments of AA∗ ) one can show that the largest singular value is also going to be O( n)
with high probability,
√ which then implies by (3.10) that all eigenvalues and singular
values will be O( n). Unfortunately, this approach does not seem to yield much in-
formation on the least singular value, which plays a major role in the invertibility and
stability of A.
It is now natural to normalise the eigenvalues and singular values of A by 1/√n, and consider the distribution of the set of normalised eigenvalues {λ_k/√n : 1 ≤ k ≤ n}. If one plots these normalised eigenvalues numerically in the complex plane for moderately large n (e.g. n = 100), one sees a remarkable distribution: the eigenvalues appear to be uniformly distributed in the unit disk D := {z ∈ C : |z| ≤ 1}. (For small n, there is a
little bit of a clustering on the real line, just because polynomials with real coefficients
tend to have a couple of real zeroes, but this clustering goes away in the limit as n goes
to infinity.) This phenomenon is known as the circular law; more precisely, if we let n
tend to infinity, then for every sufficiently nice set R in the plane (e.g. one could take R
to be a rectangle), one has
lim_{n→∞} (1/n) |{1 ≤ k ≤ n : λ_k/√n ∈ R}| = (1/π) |R ∩ D|.
(Technically, this formulation is known as the strong circular law; there is also a weak
circular law, which asserts that one has convergence in probability rather than almost
sure convergence. But for this lecture I will ignore these distinctions.)
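The circular law is also easy to observe experimentally; in the following numpy sketch (an added illustration, with the arbitrary choices n = 1000 and R = [0, 1] × [0, 1], for which (1/π)|R ∩ D| = 1/4), the empirical fraction of normalised eigenvalues in R comes out close to 1/4:

import numpy as np

rng = np.random.default_rng(1)
n = 1000
A = rng.choice([-1.0, 1.0], size=(n, n))
lam = np.linalg.eigvals(A) / np.sqrt(n)     # normalised eigenvalues

# R = [0,1] x [0,1] meets the unit disk in a quarter disk of area pi/4
in_R = (lam.real >= 0) & (lam.real <= 1) & (lam.imag >= 0) & (lam.imag <= 1)
print(np.mean(in_R))                        # close to 1/4 for large n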
The circular law was first proven in the case of the complex Gaussian ensemble
by Mehta[Me1967], using an explicit formula for the joint distribution of the eigen-
values. But for more general ensembles, in which explicit formulae were not avail-
able, progress was more difficult. The method of moments (in which one uses (3.8)
to compute the sums of powers of the eigenvalues) is not very useful because of the
cancellations caused by the complex nature of the eigenvalues; indeed, one can show that tr(((1/√n)A)^m) is roughly zero for every m, which is consistent with the circular law but also does not preclude, for instance, all the eigenvalues clustering at the origin.
[For random self-adjoint matrices, the moment method works quite well, leading for
instance to Wigner’s semi-circular law.]
The first breakthrough was by Girko[Gi1984], who observed that the eigenvalue
distribution could be recovered from the quantities
(1/n) log |det((1/√n)A − zI)| = (1/n) ∑_{k=1}^{n} log |λ_k/√n − z|        (3.12)

for complex z (this expression is closely related to the Stieltjes transform (1/n) ∑_{k=1}^{n} 1/(z − λ_k/√n) of the normalised eigenvalue distribution of A, being an antiderivative of the real part of this transform). To compute this quantity, Girko then used the formula (3.9) to relate the determinant of (1/√n)A − zI to the singular values of this matrix. The singular value
distribution could then be computed by the moment method (note that singular values,
unlike eigenvalues, are real and non-negative, and so we do not have cancellation prob-
lems). Putting this all together and doing a large number of algebraic computations,
one eventually obtains (formally, at least) a proof of the circular law.
There was however a technical difficulty with the above analysis, which was that
the formula (3.9) becomes very unstable when the least singular value is close to zero
(basically because of a division-by-zero problem). This is not merely a technical issue, but is fundamental to the general problem of controlling the eigenvalues of non-self-adjoint matrices (1/√n)A: these eigenvalues can become very unstable inside a region of pseudospectrum, which can be defined as the set of complex numbers z for which the least singular value of (1/√n)A − zI is small. The classic demonstration of this comes from the
perturbed shift matrices
A_ε :=
⎡ 0  1  0  ⋯  0 ⎤
⎢ 0  0  1  ⋯  0 ⎥
⎢ ⋮  ⋮  ⋮  ⋱  ⋮ ⎥
⎢ 0  0  0  ⋯  1 ⎥
⎣ ε  0  0  ⋯  0 ⎦
For sake of discussion let us take n to be even. When ε = 0, this matrix is singular, with least singular value σ_n = 0 and with all n generalised eigenvalues equal to 0. But when ε becomes positive, the least singular value creeps up only to ε, while the n eigenvalues move rapidly away from the origin, becoming ε^{1/n} e^{2πij/n} for j = 1, . . . , n. This is ultimately because the zero set of the characteristic polynomial z^n − ε is very sensitive to the value of ε when that parameter is close to zero.
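This instability can be reproduced in a few lines; in the following numpy sketch (an added illustration, with the arbitrary choices n = 50 and ε = 10^{-6}) the least singular value is exactly ε, while the eigenvalues jump out to radius ε^{1/n} ≈ 0.76:

import numpy as np

n, eps = 50, 1e-6
A = np.diag(np.ones(n - 1), 1)              # ones on the superdiagonal
A[-1, 0] = eps                              # the small corner perturbation

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[-1])                            # least singular value: exactly eps
print(np.abs(np.linalg.eigvals(A)).max())   # ~ eps ** (1 / n) ~ 0.76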
So, in order to make the circular law argument complete, one needs to get good lower bounds on the least singular value of the random matrix A (as well as variants of this matrix, such as (1/√n)A − zI). In the case of continuous (non-Gaussian) ensembles,
this was first done by Bai [Ba1997]. To illustrate the basic idea, let us look at a toy
problem, to show that the least singular value of a real Gaussian matrix A is usually
non-zero (i.e. A is invertible with high probability). For this, we use some linear
algebra. Let X_1, . . . , X_n denote the rows of A, which we can view as vectors in R^n. Then the least singular value of A vanishes precisely when X_1, . . . , X_n lie on a common hyperplane through the origin. This implies that one of the vectors here is a linear combination of the other n − 1; by symmetry, we conclude that the probability that the least singular value vanishes is at most n times the probability that X_n lies in the hyperplane spanned by X_1, . . . , X_{n−1}; since the Gaussian distribution is continuous, this latter event has probability zero, and so A is invertible almost surely.
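(By contrast, for discrete ensembles such as the Bernoulli ensemble, exact singularity genuinely occurs with positive probability. The following Monte Carlo sketch, an added illustration with the arbitrary choice n = 4, makes this visible.)

import numpy as np

rng = np.random.default_rng(2)
n, trials = 4, 100_000
singular = 0
for _ in range(trials):
    A = rng.choice([-1.0, 1.0], size=(n, n))
    if round(np.linalg.det(A)) == 0:        # the determinant is an integer here
        singular += 1
print(singular / trials)                    # visibly positive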
There are many questions to study here, but the most basic is the sum-product
problem, which we can state as follows. Let A be a finite non-empty set of elements
of a ring R (e.g. finite sets of integers, or elements of a cyclic group Z/qZ, or sets of
matrices over some ring). Then we can form the sum set
A + A := {a + b : a, b ∈ A}

and the product set

A · A := {a · b : a, b ∈ A}.

To avoid degeneracies, let us assume that none (or very few) of the elements of A are zero divisors (as these may cause A · A to become very small). Then it is easy to see that
A + A and A · A will be at least as large as A itself.
Typically, both of these sets will be much larger than A itself; indeed, if we select
A at random, we generically expect A + A and A · A to have cardinality comparable to
|A|^2. But when A enjoys additive or multiplicative structure, the sets A + A or A · A can be of size comparable to A. For instance, if A is an arithmetic progression {a, a + r, a + 2r, . . . , a + (k − 1)r} or an additive subgroup of the ring R (modulo zero divisors, such as 0), then |A + A| ∼ |A|. Similarly, if A is a geometric progression {a, ar, ar^2, . . . , ar^{k−1}} or a multiplicative subgroup of the ring R, then |A · A| ∼ |A|. And of course, if A is
both an additive and a multiplicative subgroup of R (modulo zero divisors), i.e. if A
is a subring of R, then |A + A| and |A · A| are both comparable in size to |A|. These
examples are robust with respect to small perturbations; for instance, if A is a dense
subset of an arithmetic progression or additive subgroup, then it is still the case that
A + A is comparable in size to A. There are also slightly more complicated examples
of interest, such as generalised arithmetic progressions, but we will not discuss these
here.
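These dichotomies are easy to see experimentally; the following Python sketch (an added toy computation with arbitrary parameters) compares |A + A| and |A · A| for an arithmetic progression, a geometric progression, and a random set of the same size:

import random

def sums(A):
    return {a + b for a in A for b in A}

def prods(A):
    return {a * b for a in A for b in A}

k = 100
arith = {5 + 3 * i for i in range(k)}             # arithmetic progression
geom = {2 ** i for i in range(k)}                 # geometric progression
rand = set(random.sample(range(1, 10 ** 6), k))   # a "generic" set

for name, A in [("arith", arith), ("geom", geom), ("rand", rand)]:
    print(name, len(A), len(sums(A)), len(prods(A)))
# arith: small sum set, large product set; geom: the reverse; rand: both large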
Now let us work in the ring of integers Z. This ring has no non-trivial finite ad-
ditive subgroups or multiplicative subgroups (and it certainly has no non-trivial finite
subrings), but it of course has plenty of arithmetic progressions and geometric progres-
sions. But observe that it is rather difficult for a finite set A of integers to resemble both
an arithmetic progression and a geometric progression simultaneously (unless A is very
small). So one expects at least one of A + A and A · A to be significantly larger than A
itself. This claim was made precise by Erdős and Szemerédi[ErSz1983], who showed that

max(|A + A|, |A · A|) ≥ |A|^{1+ε}        (3.14)

for some absolute constant ε > 0. The value of this constant has improved steadily over
the years; the best result currently is due to Solymosi[So2005], who showed that one
can take ε arbitrarily close to 3/11. Erdős and Szemerédi in fact conjectured that one
can take ε arbitrarily close to 1 (i.e. for any finite set of integers A, either the sum set or the product set has to be very close to its maximal size of |A|^2), but this conjecture seems
out of reach at present. Nevertheless, even just the epsilon improvement over the trivial
bound of |A| is already quite useful. It is the first example of what is now called the
sum-product phenomenon: if a finite set A is not close to an actual subring, then either
the sum set A + A or the product set A · A must be significantly larger than A itself. One
can view (3.14) as a “robust” version of the assertion that the integers contain no non-trivial finite subrings; (3.14) is asserting that in fact the integers contain no non-trivial finite sets which even come close to behaving like a subring.
In 1999, Tom Wolff (personal communication) posed the question of whether the
sum-product phenomenon held true in finite fields F_p of prime order (note that such fields have no non-trivial subrings), and in particular whether (3.14) was true when A ⊂ F_p and A was not close to being all of F_p, in the sense that |A| ≤ p^{1−δ} for some δ > 0; of course one would need ε to depend on δ. (Actually, Tom only posed the question for |A| ∼ p^{1/2}, being motivated by finite field analogues of the Kakeya problem[Wo1999],
but the question was clearly of interest for other ranges of A as well.) This question was
solved in the affirmative by Bourgain, Katz, and myself[BoKaTa2004] (in the range p^δ ≤ |A| ≤ p^{1−δ}) and then by Bourgain, Glibichuk, and Konyagin[BoGlKo2006] (in the full range 1 ≤ |A| ≤ p^{1−δ}); the result is now known as the sum-product theorem for F_p (and there have since been several further proofs and refinements of this theorem).
The fact that the field has prime order is key; if for instance we were working in a field of order p^2, then by taking A to be the subfield of order p we see that both A + A and A · A have exactly the same size as A. So any proof of the sum-product theorem must
use at some point the fact that the field has prime order.
As in the integers, one can view the sum-product theorem as a robust assertion of
the obvious statement that the field F_p contains no non-trivial subrings. So the main
difficulty in the proof is to find a proof of this latter fact which is robust enough to
generalise to this combinatorial setting. The standard way to classify subrings is to use
Lagrange’s theorem that the order of a subgroup divides the order of the whole group,
which is proven by partitioning the whole group into cosets of the subgroup, but this
argument is very unstable and does not extend to the combinatorial setting. But there
are other ways to proceed. The argument of Bourgain, Katz, and myself (which is
based on an earlier argument of Edgar and Miller[EdMi2003]), roughly speaking, pro-
ceeds by investigating the “dimension” of F_p relative to A, or in other words the least number of elements v_1, . . . , v_d in F_p such that every element of F_p can be expressed in the form a_1v_1 + . . . + a_dv_d with a_1, . . . , a_d ∈ A. Note that the number of such formal representations is equal to |A|^d. The key observation is that as |F_p| is prime, it cannot equal |A|^d if d > 1 (a prime cannot be a perfect power), and so by the pigeonhole principle some element must have more than one representation. One can use this “linear dependence” to reduce the dimension by 1 (assuming that A behaves a lot like a subring), and so one can eventually reduce to the d = 1 case, which is prohibited by our assumption |A| ≤ p^{1−δ}. (The hypothesis |A| ≥ p^δ is needed
to ensure that the initial dimension d is bounded, so that the iteration only requires a
bounded number of steps.) The argument of Bourgain, Glibichuk, and Konyagin uses
a more algebraic method (a variant of the polynomial method of Stepanov[St1969]),
using the basic observation that the number of zeroes of a polynomial (counting multi-
plicity) is bounded by the degree of that polynomial, to obtain upper bounds on various quantities (such as the number of parallelograms in A). More recently, a short argument of
Garaev[Ga2008] proceeds using the simple observation that if A is any non-trivial subset of F_p, then there must exist a ∈ A such that a + 1 ∉ A; applying this to the “fraction field” Q[A] := {(a − b)/(c − d) : a, b, c, d ∈ A, c ≠ d} of A, one can conclude that Q[A] does not in fact behave like a field, and hence A does not behave like a ring.
The sum-product phenomenon implies that if a set A ⊂ F_p of medium size p^δ ≤
3. Alice can’t unlock Bob’s padlock... but she can unlock her own. So she removes
her lock, and sends the singly locked box back to Bob.
4. Bob can unlock his own padlock, and so retrieves the object safely. At no point was the object available to any interceptor.
A similar procedure (a slight variant of the Diffie-Hellman protocol, essentially the
Massey-Omura cryptosystem) can be used to transmit a digital message g (which one
should think of as just being a number) from Alice to Bob over an unsecured network,
as follows:
1. Alice and Bob agree (over the unsecured network) on some large prime p (larger
than the maximum size of the message g).
2. Alice “locks” the message by raising it to a secret exponent a (coprime to p − 1) known only to her, and sends the locked message g^a mod p to Bob.
3. Bob can't decode this message (he doesn't know a), but he doubly locks the message by raising it to his own secret exponent b, and returns the doubly locked message g^{ab} mod p back to Alice.
4. Alice then “unlocks” her part of the message by taking the a-th root (which can be done efficiently via Fermat's little theorem, by raising to the power a^{−1} mod (p − 1)) and sends g^b mod p back to Bob.

5. Bob then takes the b-th root of the message and recovers g.
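For concreteness, here is a minimal Python sketch of this three-pass protocol (an added illustration; the prime, message, and secret exponents are arbitrary toy choices, and the exponents must be coprime to p − 1 for the roots in steps 4 and 5 to exist):

p = 2 ** 127 - 1       # a large prime, agreed over the open network (toy choice)
g = 123456789          # the message, assumed to lie in {1, ..., p - 1}
a, b = 971, 787        # secret exponents of Alice and Bob, coprime to p - 1

step2 = pow(g, a, p)                  # Alice sends g^a mod p
step3 = pow(step2, b, p)              # Bob sends back g^(ab) mod p
a_inv = pow(a, -1, p - 1)             # a-th roots exist by Fermat's little theorem
step4 = pow(step3, a_inv, p)          # Alice sends g^b mod p
b_inv = pow(b, -1, p - 1)
assert pow(step4, b_inv, p) == g      # Bob recovers the message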
An eavesdropper (let's call her Eve) could intercept p, as well as the three “locked” values g^a, g^b, g^{ab} mod p, but does not thereby directly obtain g. Now, it is possible that one could use this information to reconstruct g (indeed, if one could quickly take discrete logarithms, then this would be a fairly easy task), but no feasible algorithm for this is known (if p is large, e.g. 500+ digits); the problem is generally believed to be roughly comparable in difficulty to that of factoring large numbers. But no-one knows how to rigorously prove that the Diffie-Hellman reconstruction problem is hard (e.g. not solvable in polynomial time); indeed, such a result would imply P ≠ NP, since this reconstruction problem is easily seen to be in NP (though it is not believed to be NP-complete).
Using the sum-product technology, Bourgain was at least able to show that the Diffie-Hellman protocol was secure (for sufficiently large p) if Eve was only able to see the high bits of g^a, g^b, g^{ab} mod p, thus pinning down g^a, g^b, g^{ab} to intervals. The reason for this is that the set {(g^a, g^b, g^{ab}) ∈ F_p^3 : a, b ∈ Z} has a lot of multiplicative structure (indeed, it is a multiplicative subgroup of the ring F_p^3) and so should be uniformly distributed in an additive sense (by adapting the above sum-product technology to F_p^3).
Another application of sum-product technology was to build efficient randomness
extractors - deterministic algorithms that can create high-quality (very uniform) ran-
dom bits from several independent low-quality (non-uniform) random sources; such
extractors are of importance in computer science and cryptography. Basically, the
sum-product estimate implies that if A, B, C ⊂ F_p are sets of medium size, then the set A + B · C is significantly larger than A, B, or C. As a consequence, if X, Y, Z are independent random variables in F_p which are not too narrowly distributed (in particular, they are not deterministic, i.e. concentrated on a single value), one can show (with the assistance of some additive combinatorics) that the random variable X + Y Z is significantly more uniformly distributed than X, Y, or Z. Iterating this leads to some
surprisingly good randomness extractors, as was first observed by Barak, Impagliazzo,
and Wigderson[BaImWi2006].
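Here is a crude numerical illustration of this mechanism (added here, with arbitrary toy parameters; it is not the actual Barak-Impagliazzo-Wigderson construction): even when X, Y, Z are each supported on only a quarter of the residues modulo p, the combination X + Y Z mod p is already noticeably flatter, as measured in total variation:

import numpy as np

p, m = 101, 200_000
rng = np.random.default_rng(3)

def biased(size):
    # a low-quality source, uniform on only a quarter of the residues mod p
    return rng.integers(0, p // 4, size=size)

X, Y, Z = biased(m), biased(m), biased(m)
W = (X + Y * Z) % p

def tv_to_uniform(samples):
    # total variation distance of the empirical law from uniform on F_p
    freq = np.bincount(samples, minlength=p) / len(samples)
    return 0.5 * np.abs(freq - 1.0 / p).sum()

print(tv_to_uniform(X))    # large: X misses three quarters of the residues
print(tv_to_uniform(W))    # noticeably smaller: X + YZ is much flatter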
Another application of the above sum-product technology was to get a product
estimate in matrix groups, such as SL2(F_p). Indeed, Helfgott[He2008] was able to show that if A was a subset of SL2(F_p) of medium or small size, and it was not trapped inside a proper subgroup of SL2(F_p), then A · A · A was significantly larger than A itself.
(One needs to work with triple products here instead of double products for a rather trivial reason: if A was the union of a subgroup and some external element, then A · A is still comparable in size to A, but A · A · A will be much larger.) This result may not
immediately look like a sum-product estimate, because there is no obvious addition, but
it is concealed within the matrix multiplication law for SL2(F_p). The key observation in Helfgott's argument, which relies crucially on the sum-product estimate, is that if V is a collection of diagonal matrices in SL2(F_p) of medium size, and g is a non-diagonal matrix element, then the trace set tr(V gV g^{−1}) is significantly larger than V itself. If one works out explicitly what this trace is, one sees a sum-product type of result emerging.
Conversely, if the trace tr(A) of a group-like set A is large, then the conjugacy classes
in A are fairly small (since trace is conjugation-invariant), which forces many pairs in
A to commute, which creates large sets V of simultaneously commuting (and hence
simultaneously diagonalisable) elements, due to the fact that if two elements in SL2
commute with a third, then they are quite likely to commute with each other. The
tension between these two implications is what underlies Helfgott’s results.
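One can watch this trace expansion in a toy computation; the following Python sketch (an added illustration, with arbitrary choices of p, V, and g) computes the trace set tr(V gV g^{−1}) for a set V of 32 diagonal matrices in SL2(F_p):

p = 1009

def mul(M, N):
    # product of 2 x 2 matrices over F_p
    return [[(M[0][0] * N[0][0] + M[0][1] * N[1][0]) % p,
             (M[0][0] * N[0][1] + M[0][1] * N[1][1]) % p],
            [(M[1][0] * N[0][0] + M[1][1] * N[1][0]) % p,
             (M[1][0] * N[0][1] + M[1][1] * N[1][1]) % p]]

V = [[[t, 0], [0, pow(t, -1, p)]] for t in range(2, 34)]   # 32 diagonal matrices
g = [[1, 1], [1, 2]]                                       # in SL_2, not diagonal
g_inv = [[2, p - 1], [p - 1, 1]]                           # its inverse mod p

traces = set()
for v in V:
    for w in V:
        M = mul(mul(mul(v, g), w), g_inv)
        traces.add((M[0][0] + M[1][1]) % p)
print(len(V), len(traces))    # the trace set is far larger than V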
The estimate of Helfgott shows that multiplication by medium-size sets in SL2 (F p )
expands rapidly across the group (unless it is trapped in a subgroup). As a consequence
of Helfgott's estimate, Bourgain and Gamburd[BoGa2006] were able to show that if S was any finite symmetric set of matrices in SL2(Z) which generated a sufficiently large (or more precisely, Zariski dense) subgroup of SL2(Z), and S_p was the projection of S to SL2(Z/pZ), then the random walk using S_p on SL2(Z/pZ) was very rapidly mixing, so that after about O(log p) steps, the walk was very close to uniform. (The precise statement was that the Cayley graph associated to S_p for each p formed an expander
family.) Quite recently, Bourgain, Gamburd, and Sarnak[BoGaSa2006] have applied
these results (and generalisations thereof) to the problem of detecting (or sieving) al-
most primes in thin algebraically generated sets. To motivate the problem, we observe
that many classical questions in prime number theory can be rephrased as one of detecting prime points (p_1, . . . , p_d) ∈ P^d in algebraic subsets O of a lattice Z^d. For instance, the twin prime problem asks whether the line O = {(n, n + 2) : n ∈ Z} contains infinitely many prime points. In general, these problems are very difficult, especially once one considers sets described by polynomials rather than linear functions; even the one-dimensional problem of determining whether the set O = {n^2 + 1 : n ∈ Z} contains infinitely many primes has been open for quite a long time (though it is worth mentioning the celebrated result of Friedlander and Iwaniec[FrIw1998] that the somewhat larger set O = {n^2 + m^4 : n, m ∈ Z} is known to contain infinitely many primes).
So prime points are hard to detect. However, by using methods from sieve theory,
one can often detect almost prime points in various sets O - points whose coordinates
are the products of only a few primes. For instance, a famous theorem of Chen[Ch1973] shows that the line O = {(n, n + 2) : n ∈ Z} contains infinitely many points which are
almost prime in the sense that the first coordinate is prime, and the second coordinate
is the product of at most two primes. The basic idea of sieve theory is to sift out primes
and almost primes by removing all points whose coordinates are divisible by small
factors (and then, due to various generalisations of the inclusion-exclusion principle,
one has to add back in points which are divisible by multiple small factors, and so
forth). See Section 1.10 for further discussion. In order for sieve theory to work well,
one needs to be able to accurately count the size of the original set O (or more precisely, the size of this set restricted to a ball or a similar object), and one also needs to count how many points in that set lie in a given residue class modulo q, for various values of q.
(For instance, to sieve out twin primes or twin almost primes in the interval {1, . . . , N},
one needs to count how many elements n in that interval are such that n and n + 2 are
both invertible modulo q (i.e. coprime to q) for various values of q.)
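Such counts are easy to verify directly; in the following Python sketch (an added illustration with the arbitrary choices N = 10^6 and q = 105) the number of n ≤ N with n and n + 2 both coprime to q matches the local density ∏_{p|q}(1 − 2/p) for odd q:

from math import gcd

N, q = 10 ** 6, 105            # q = 3 * 5 * 7
count = sum(1 for n in range(1, N + 1)
            if gcd(n, q) == 1 and gcd(n + 2, q) == 1)

predicted = N
for p in (3, 5, 7):
    predicted *= (p - 2) / p   # n must avoid two residue classes mod each odd p
print(count, round(predicted)) # the two counts agree up to a small error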
For arbitrary algebraic sets O, these tasks are very difficult. For instance, even
the basic task of determining whether a set O described by several polynomials is non-
empty is essentially Hilbert’s tenth problem, which is undecidable in general. But if the
set O is generated by a group Λ acting on Z^d (in some polynomial fashion), thus O = Λb for some point b ∈ Z^d, then the problems become much more tractable. If the group
Λ is generated by some finite set S, and we restrict attention to group elements with
some given word length, the problem of understanding how O is distributed modulo q is
equivalent to asking how random walks on S of a given length distribute themselves on
(Z/qZ)^d. This latter problem is very close to the problem solved by the mixing results
of Bourgain and Gamburd mentioned earlier, which is where the link to sum-product
estimates arises from. Indeed, Bourgain, Gamburd, and Sarnak have now shown that
rather general classes of algebraic sets generated by subgroups of SL2 (Z) will contain
infinitely many almost primes, as long as there are no obvious algebraic obstructions;
the methods should hopefully extend to more general groups, such as subgroups of
SLn (Z).
3.3.4 Notes
These articles were originally posted on Dec 4-6, 2007 at
terrytao.wordpress.com/2007/12/04
terrytao.wordpress.com/2007/12/05
terrytao.wordpress.com/2007/12/06
Thanks to intoverflow, Harald Helfgott, MK, Mark Meckes, ninguem, and Tom
Smith for corrections and references.
Harald Helfgott remarked that perhaps the right framework for sum-product es-
timates was that of abelian groups G acting on other abelian groups A (thus A is a
Z[G]-module); given any subsets G_0, A_0 of G and A respectively that obey various non-degeneracy conditions, one should be able to take a bounded number of combinations of G_0 and A_0 to generate either about |G_0||A_0| elements of A, or else to generate the entire submodule ⟨⟨G_0⟩⟨A_0⟩⟩.
Helfgott also remarked that the fact that two elements in SL2 that commute with a third are likely to commute with each other also holds in SLn, and more generally in any semisimple group of Lie type, since a generic element of a semisimple Lie group is regular semisimple. Emmanuel Kowalski pointed out that this latter result is explicitly stated in [St1965].
Bibliography
[Ba1973] G. Bachelis, On the upper and lower majorant properties in L^p(G), Quart. J. Math. Oxford Ser. 24 (1973), 119–128.
[Ba1997] Z. D. Bai, Circular law, Ann. Probab. 25 (1997), 494–529.
[BaImWi2006] B. Barak, R. Impagliazzo, A. Wigderson, Extracting randomness us-
ing few independent sources, SIAM J. Comput. 36 (2006), no. 4, 1095–1118.
[BaDadVWa2008] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A Sim-
ple Proof of the Restricted Isometry Property for Random Matrices (aka “The
Johnson-Lindenstrauss Lemma Meets Compressed Sensing”), preprint.
[Be1975] W. Beckner, Inequalities in Fourier analysis, Ann. of Math. 102 (1975), no.
1, 159–182.
[Be1946] F. A. Behrend, On sets of integers which contain no three terms in arithmetic
progression, Proc. Nat. Acad. Sci., 32 (1946), 331–332.
[Be2001] P. Belkale, Local systems on P^1 − S for S a finite set, Compositio Math. 129 (2001), no. 1, 67–86.
[Be2006] P. Belkale, Geometric proofs of Horn and saturation conjectures, J. Alge-
braic Geom. 15 (2006), no. 1, 133–173.
[Be2008] P. Belkale, Quantum generalization of the Horn conjecture, J. Amer. Math.
Soc. 21 (2008), no. 2, 365–408.
[BeLi1980] H. Berestycki, P.-L. Lions, Existence of a ground state in nonlinear equa-
tions of the Klein-Gordon type, Variational inequalities and complementarity prob-
lems (Proc. Internat. School, Erice, 1978), pp. 35–51, Wiley, Chichester, 1980.
[BeZe1992] A. Berenstein, A. Zelevinsky, Triple multiplicities for sl(r + 1) and the
spectrum of the exterior algebra of the adjoint representation, J. Algebraic Com-
bin. 1 (1992), no. 1, 7–22.
[Be2003] V. Bergelson, Minimal idempotents and ergodic Ramsey theory, Topics in
Dynamics and Ergodic Theory 8-39, London Math. Soc. Lecture Note Series 310,
Cambridge Univ. Press, Cambridge, 2003.
[BlPo2002] V. Blondel, N. Portier, The presence of a zero in an integer linear recurrent
sequence is NP-hard to decide, Lin. Alg. Appl. 351-352 (2002), 91–98.
[BoTh1995] B. Bollobás, A. Thomason, Projections of bodies and hereditary proper-
ties of hypergraphs, Bull. London Math. Soc. 27 (1995), no. 5, 417–424.
[Bo1977] E. Bombieri, The asymptotic sieve, Rend. Accad. Naz. XL (5) 1/2 (1975/76),
243–269 (1977).
[Bo1986] J. Bourgain, A Szemerédi type theorem for sets of positive density in R^k, Israel J. Math. 54 (1986), no. 3, 307–316.
[Bo1989] J. Bourgain, Bounded orthogonal systems and the Λ(p)-set problem, Acta
Math. 162 (1989), no. 3-4, 227–245.
[BoMi1987] J. Bourgain, V. D. Milman, New volume ratio properties for convex symmetric bodies in R^n, Invent. Math. 88 (1987), no. 2, 319–340.
[BuZw2005] N. Burq, M. Zworski, Bouncing ball modes and quantum chaos, SIAM
Rev. 47 (2005), no. 1, 43–49.
[CaTa2006] E. Candès, T. Tao, Near Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?, IEEE Trans. Inform. Theory 52 (2006), 5406–5425.
[ChMa1995] L. Chayes, J. Machta, On the behavior of the surface tension for spin
systems in a correlated porous medium, J. Statist. Phys. 79 (1995), no. 1-2, 117–
164.
[Ch1988] M. Christ, Weak type (1, 1) bounds for rough operators, Ann. of Math. (2)
128 (1988), no. 1, 19–42
[CoLe1988] J. Conze, E. Lesigne, Sur un théorème ergodique pour des mesures diagonales, C. R. Acad. Sci. Paris Sér. I Math. 306 (1988), no. 12, 491–493.
[Co2000] J. Conway, Universal quadratic forms and the fifteen theorem, Quadratic
forms and their applications (Dublin, 1999), 23–26, Contemp. Math., 272, Amer.
Math. Soc., Providence, RI, 2000.
[Da1986] S. Dani, On the orbits of unipotent flows on homogeneous spaces. II, Ergodic
Thy. Dynam. Systems 6 (1986), 167–182.
[De1974] P. Deligne, La conjecture de Weil I., Inst. Hautes Études Sci. Publ. Math.,
48 (1974), pp. 273–308.
[Do2006] D. Donoho, For most large underdetermined systems of equations, the min-
imal l1 -norm near-solution approximates the sparsest near-solution, Comm. Pure
Appl. Math. 59 (2006), no. 7, 907–934.
[Ed2004] Y. Edel, Extensions of generalized product caps, Designs, Codes, and Cryp-
tography, 31 (2004), 5–14.
[EdMi2003] G. Edgar, C. Miller, Borel subrings of the reals, Proc. Amer. Math. Soc.
131 (2003), no. 4, 1121–1129
[EeSa1964] J. Eells, J. Sampson, Harmonic mappings of Riemannian manifolds,
Amer. J. Math. 86 (1964), 109–160.
[EiMaVe2008] M. Einsiedler, G. Margulis, A. Venkatesh, Effective equidistribution
for closed orbits of semisimple groups on homogeneous spaces, preprint.
[Ei1905] A. Einstein, Ist die Trägheit eines Körpers von dessen Energieinhalt
abhängig?, Annalen der Physik 18 (1905), 639-643.
[El1997] G. Elekes, On the number of sums and products, Acta Arith. 81 (1997), 365–
367.
[ElHa1969] P. D. T. A. Elliott, H. Halberstam, A conjecture in prime number theory, Symp. Math. 4 (1968–1969), 59–72.
[ElKi2001] G. Elekes, Z. Király, On the combinatorics of projective mappings, J. Al-
gebraic Combin. 14 (2001), no. 3, 183–197.
[ElSz2008] G. Elek, B. Szegedy, Limits of Hypergraphs, Removal and Regularity
Lemmas. A Non-standard approach, preprint.
[En1978] V. Enss, Asymptotic completeness for quantum mechanical potential scat-
tering. I. Short range potentials, Comm. Math. Phys. 61 (1978), no. 3, 285–291.
[Er1945] P. Erdős, On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51
(1945), 898–902.
[Er1947] P. Erdős, Some remarks on the theory of graphs, Bull. Am. Math. Soc. 53
(1947), 292–294.
[Er1949] P. Erdős, On a new method in elementary number theory, Proc. Nat. Acad.
Sci. U.S.A. 35 (1949), 374–384.
[ErSz1983] P. Erdős, E. Szemerédi, On sums and products of integers, Studies in Pure Mathematics: To the memory of Paul Turán (P. Erdős, L. Alpár, G. Halász, editors), Akadémiai Kiadó and Birkhäuser Verlag, Budapest and Basel–Boston, 1983, 213–218.
[EsSeSv2003] L. Escauriaza, G. Serëgin, V. Šverák, L_{3,∞}-solutions of Navier-Stokes equations and backward uniqueness, (Russian) Uspekhi Mat. Nauk 58 (2003), no. 2(350), 3–44; translation in Russian Math. Surveys 58 (2003), no. 2, 211–250.
[GiTr2008] A. Gilbert, J. Tropp, Signal recovery from partial information via Orthog-
onal Matching Pursuit, preprint.
[GiVe1985] J. Ginibre, G. Velo, Scattering theory in the energy space for a class of nonlinear Schrödinger equations, J. Math. Pures Appl. (9) 64 (1985), no. 4, 363–401.
[GoTi2008b] F. Götze, A. N. Tikhomirov, The Circular Law for Random Matrices, preprint.
[Go1997] T. Gowers, Lower bounds of tower type for Szemerédi’s uniformity lemma,
Geom. Func. Anal. 7 (1997), 322–337.
[Go2000] T. Gowers, The two cultures of mathematics, in: Mathematics: Frontiers and
Perspectives, International Mathematical Union. V. Arnold, M. Atiyah, P. Lax, B.
Mazur, Editors. American Mathematical Society, 2000.
[Ha1988] R. Hamilton, The Ricci flow on surfaces, Mathematics and general relativ-
ity (Santa Cruz, CA, 1986), 237–262, Contemp. Math., 71, Amer. Math. Soc.,
Providence, RI, 1988.
[Ha1985] G. Hansel, A simple proof of the Skolem-Mahler-Lech theorem, Automata,
languages and programming (Nafplion, 1985), 244–249, Lecture Notes in Com-
put. Sci., 194, Springer, Berlin, 1985.
[Ha1976] B. Hansson, The existence of group preference functions, Public Choice 28
(1976), 89-98.
[HaLi1923] G. H. Hardy, J. E. Littlewood, Some problems of “partitio numerorum”; III: On the expression of a number as a sum of primes, Acta Math. 44 (1923), 1–70.
[HB1983] R. Heath-Brown, Prime twins and Siegel zeros, Proc. London Math. Soc.
(3) 47 (1983), no. 2, 193–224.
[HB2001] R. Heath-Brown, Primes represented by x^3 + 2y^3, Acta Math. 186 (2001), no. 1, 1–84.
[HBMo2002] R. Heath-Brown, B. Moroz, Primes represented by binary cubic forms,
Proc. London Math. Soc. (3) 84 (2002), no. 2, 257–288.
[He2006] H. Helfgott, The parity problem for reducible cubic forms, J. London Math.
Soc. (2) 73 (2006), no. 2, 415–435
[He2008] H. Helfgott, Growth and generation in SL2 (Z/pZ), Ann. of Math. 167
(2008), 601-623.
[He1991] E. Heller, Wavepacket dynamics and quantum chaology, Chaos et physique
quantique (Les Houches, 1989), 547–664, North-Holland, Amsterdam, 1991.
[HeKa2006] A. Henriques, J. Kamnitzer, The octahedron recurrence and gln crystals,
Adv. Math. 206 (2006), 211–249.
[He1930] J. Herbrand, Recherches sur la théorie de la démonstration, PhD thesis,
University of Paris, 1930.
[Hi1974] N. Hindman, Finite sums from sequences within cells of a partition of N, J.
Comb. Th. A 17 (1974), 1–11.
[HoRo2008] J. Holmer, S. Roudenko, A sharp condition for scattering of the radial 3d cubic nonlinear Schrödinger equation, preprint.
[HoPy1974] D. Holton, W. Pye, Creating Calculus. Holt, Rinehart and Winston, 1974.
[Ho1962] A. Horn, Eigenvalues of sums of Hermitian matrices, Pacific J. Math. 12
(1962) 225–241.
[HoKr2005] B. Host, B. Kra, Nonconventional ergodic averages and nilmanifolds,
Ann. of Math. (2) 161 (2005), no. 1, 397–488.
[KiSo1972] A. P. Kirman, D. Sondermann, Arrow's theorem, many agents, and invisible dictators, Journal of Economic Theory 5 (1972), 267–277.
[Ki1984] F. Kirwan, Convexity properties of the moment mapping. III, Invent. Math.
77 (1984), no. 3, 547–552.
[Kl2000] S. Klainerman, PDE as a unified subject, GAFA 2000 (Tel Aviv, 1999),
Geom. Funct. Anal. 2000, Special Volume, Part I, 279–315.
[Ko2008] U. Kohlenbach, Applied Proof Theory: Proof Interpretations and Their Use
in Mathematics. Springer Verlag, Berlin, 1–536, 2008.
[KnTa1999] A. Knutson, T. Tao, The honeycomb model of GLn (C) tensor products. I.
Proof of the saturation conjecture, J. Amer. Math. Soc. 12 (1999), no. 4, 1055–
1090.
[La2000] M. Lacey, The bilinear maximal functions map into L^p for 2/3 < p ≤ 1, Ann. of Math. (2) 151 (2000), no. 1, 35–57.
[Li2003] Y. Li, Chaos in PDEs and Lax pairs of Euler equations, Acta Appl. Math.
77 (2003), no. 2, 181–214.
[LiWa2002] M.-C. Liu, T. Wang, On the Vinogradov bound in the three primes Gold-
bach conjecture, Acta Arith. 105 (2002), no. 2, 133–175.
[Ma1989] G. Margulis, Discrete subgroups and ergodic theory, Number theory, trace
formulas and discrete groups (Oslo, 1987), 377–398, Academic Press, Boston,
MA, 1989.
[Me1967] M.L. Mehta, Random Matrices and the Statistical Theory of Energy Levels,
Academic Press, New York, NY, 1967.
[Mu2007] K. Mulmuley, Geometric complexity theory VI: the flip via saturated and
positive integer programming in representation theory and algebraic geometry,
preprint.
[Re1986] S. Reisner, Zonoids with minimal volume-product, Math. Z. 192 (1986), no.
3, 339–346.
[RoSh1957] C.A. Rogers, G.C. Shephard, The difference body of a convex body, Arch.
Math. 8 (1957), 220–233.
[Ro1953] K.F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953),
245–252.
[Ru1996] I. Ruzsa, Sums of finite sets, Number Theory: New York Seminar; Springer-
Verlag (1996), D.V. Chudnovsky, G.V. Chudnovsky and M.B. Nathanson editors.
[RuSz1978] I. Ruzsa, E. Szemerédi, Triple systems with no six points carrying three
triangles, Colloq. Math. Soc. J. Bolyai 18 (1978), 939–945.
[Sa1949] L. Santaló, Un invariante afín para los cuerpos convexos del espacio de n dimensiones, Portugaliae Math. 8 (1949), 155–161.
[Sa1998] Y. Saouter, Checking the odd Goldbach conjecture up to 10^20, Math. Comp. 67 (1998), no. 222, 863–866.
[Sc1985] J. Schaeffer, The equation u_tt − Δu = |u|^p for the critical value of p, Proc. Roy. Soc. Edinburgh Sect. A 101 (1985), no. 1-2, 31–44.
[Sc1916] I. Schur, Über die Kongruenz x^m + y^m ≡ z^m (mod p), Jber. Deutsch. Math.-Verein. 25 (1916), 114–116.
[Se1949] A. Selberg, An elementary proof of the prime number theorem, Ann. of Math. 50 (1949), 305–313.
[So2005] J. Solymosi, On the number of sums and products, Bull. London Math. Soc.
37 (2005), no. 4, 491–494.
[Sp2005] D. Speyer, Horn's Problem, Vinnikov Curves and Hives, Duke Math. J. 127 (2005), 395–428.
[Ta2006f] T. Tao, Global behaviour of nonlinear dispersive and wave equations, Cur-
rent Developments in Mathematics 2006, International Press. 255-340.
[Ta2008b] T. Tao, A quantitative formulation of the global regularity problem for the
periodic Navier-Stokes equation, preprint.