Optimal Transportation and Action Minimizing Measures

Scuola Normale Superiore of Pisa
and
École Normale Supérieure of Lyon
PhD thesis
24th October 2007
Optimal transportation and action-minimizing measures
Alessio Figalli
a.figalli@sns.it
Advisor: Prof. Luigi Ambrosio, Scuola Normale Superiore of Pisa
Advisor: Prof. Cédric Villani, École Normale Supérieure of Lyon
Contents
Introduction
1 The optimal transportation problem
1.1 Introduction
1.2 Background and some definitions
1.3 The main result
1.4 Costs obtained from Lagrangians
1.5 The interpolation and its absolute continuity
1.6 The Wasserstein space $W_2$
1.6.1 Regularity, concavity estimate and a displacement convexity result
1.7 Displacement convexity on Riemannian manifolds
1.7.1 Proofs
1.8 A generalization of the existence and uniqueness result
2 The irrigation problem
2.1 Introduction
2.2 Traffic plans
2.3 Dynamic cost of a traffic plan
2.4 Synchronizable traffic plans
2.5 Equivalence of the dynamical and classical irrigation problems
2.6 Stability with respect to the cost
3 Variational models for the incompressible Euler equations
3.1 Introduction
3.2 Notation and preliminary results
3.3 Variational models for generalized geodesics
3.3.1 Arnold’s least action problem
3.3.2 Brenier’s Lagrangian model and its extensions
3.3.3 Brenier’s Eulerian-Lagrangian model
3.4 Equivalence of the two relaxed models
3.5 Comparison of metrics and gap phenomena
3.6 Necessary and sufficient optimality conditions
3.7 Regularity of the pressure field
3.7.1 A difference quotients estimate
3.7.2 Proof of the main result
4 On the structure of the Aubry set and Hamilton-Jacobi equation
4.1 Introduction
4.2 Preparatory lemmas
4.3 Existence of $C^{1,1}_{loc}$ critical subsolution on noncompact manifolds
4.4 Proofs of Theorems 4.1.1, 4.1.2, 4.1.4
4.4.1 Proof of Theorem 4.1.1
4.4.2 Proof of Theorem 4.1.2
4.4.3 Proof of Theorem 4.1.4
4.4.4 A general result
4.5 Proof of Theorem 4.1.5
4.6 Applications in Dynamics
4.6.1 Preliminary results
4.6.2 Strong Mather condition
4.6.3 Mañé Lagrangians
5 DiPerna-Lions theory for SDE
5.1 Introduction and preliminary results
5.1.1 Plan of the chapter
5.2 SDE-PDE uniqueness
5.2.1 A representation formula for solutions of the PDE
5.3 Stochastic Lagrangian Flows
5.3.1 Existence, uniqueness and stability of SLF
5.3.2 SLF versus RLF
5.4 Fokker-Planck equation
5.4.1 Existence and uniqueness of measure valued solutions
5.4.2 Existence and uniqueness of absolutely continuous solutions in the uniformly parabolic case
5.4.3 Existence and uniqueness in the degenerate parabolic case
5.5 Conclusions
5.6 A generalized uniqueness result for martingale solutions
6 Appendix
6.1 Semi-concave functions
6.2 Tonelli Lagrangians
6.2.1 Definition and background
6.2.2 Lagrangian costs and semi-concavity
6.2.3 The twist condition for costs obtained from Lagrangians
Bibliography
Introduction
The Monge transportation problem is more than 200 years old [110], and in recent years
it has generated a huge amount of work.
Originally Monge wanted to move, in 3-space, a pile of rubble (déblais) to build up a mound
or fortification (remblais) while minimizing the cost. If the rubble consists of masses
$m_1, \dots, m_n$ at locations $\{x_1, \dots, x_n\}$, one should move them to another set of
positions $\{y_1, \dots, y_n\}$ by minimizing the weighted traveled distance. Therefore one
should try to minimize
\[ \sum_{i=1}^{n} m_i\,|x_i - T(x_i)| \]
over all bijections $T : \{x_1, \dots, x_n\} \to \{y_1, \dots, y_n\}$, where $|\cdot|$ denotes the
usual Euclidean norm on 3-space.
Nowadays, one would be more interested in minimizing the energy cost rather than
the traveled distance. Therefore one would rather try to minimize
\[ \sum_{i=1}^{n} m_i\,|x_i - T(x_i)|^2. \]
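For very small n the two discrete problems above can be solved by brute force. The following is a minimal sketch, not a method taken from the thesis: the function names `monge_cost` and `best_bijection` and the sample points are hypothetical, and the exponent `p` switches between the distance cost (p = 1) and the energy cost (p = 2).

```python
from itertools import permutations
import math

def monge_cost(masses, xs, ys, perm, p=1):
    """Cost sum_i m_i * |x_i - T(x_i)|**p of the bijection T(x_i) = ys[perm[i]]."""
    return sum(m * math.dist(x, ys[j]) ** p
               for m, x, j in zip(masses, xs, perm))

def best_bijection(masses, xs, ys, p=1):
    """Brute-force minimizer over all n! bijections (only sensible for small n)."""
    return min(permutations(range(len(xs))),
               key=lambda perm: monge_cost(masses, xs, ys, perm, p))

# Hypothetical data: two unit masses on a line in 3-space.
masses = [1.0, 1.0]
xs = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ys = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

for p in (1, 2):
    for perm in permutations(range(len(xs))):
        print(p, perm, monge_cost(masses, xs, ys, perm, p))
```

With these particular points the two bijections have the same cost for p = 1, while for p = 2 the translation is strictly cheaper.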
Of course, it is desirable to generalize this to continuous, rather than just discrete,
distributions of matter. To this aim, the Monge transportation problem can be stated in
the following general form: given two probability measures µ and ν, defined on the
measurable spaces X and Y , find a measurable map T : X → Y with
\[ T_\# \mu = \nu, \]
i.e.
\[ \nu(A) = \mu\big(T^{-1}(A)\big) \qquad \forall A \subset Y \text{ measurable}, \]
and in such a way that T minimizes the transportation cost. This last condition means
\[ \int_X c(x, T(x))\,d\mu(x) = \min_{S_\#\mu = \nu} \left\{ \int_X c(x, S(x))\,d\mu(x) \right\}, \]
where c : X × Y → R is some given cost function, and the minimum is taken over all
measurable maps S : X → Y with $S_\# \mu = \nu$. When the transport condition $T_\# \mu = \nu$ is
satisfied, we say that T is a transport map, and if T also minimizes the cost we call it an
optimal transport map.
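For finitely supported measures the transport condition can be checked directly. The sketch below is only an illustration under that assumption: measures are represented as dictionaries `{point: mass}`, and the helper names `push_forward` and `is_transport_map` are hypothetical.

```python
from collections import defaultdict

def push_forward(mu, T):
    """Image measure T_# mu of a finitely supported measure mu = {point: mass}."""
    nu = defaultdict(float)
    for x, mass in mu.items():
        nu[T(x)] += mass  # all the mass sitting at x is moved to T(x)
    return dict(nu)

def is_transport_map(mu, nu, T, tol=1e-12):
    """Check the transport condition T_# mu = nu atom by atom."""
    image = push_forward(mu, T)
    points = set(image) | set(nu)
    return all(abs(image.get(y, 0.0) - nu.get(y, 0.0)) <= tol for y in points)

mu = {0: 0.5, 1: 0.5}
nu = {2: 0.5, 3: 0.5}
print(is_transport_map(mu, nu, lambda x: x + 2))  # translation by 2
print(is_transport_map(mu, nu, lambda x: 2))      # constant map: all mass lands on 2
```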
In the development of the theory of optimal transportation, as well as in the devel-
opment of other theories, it is important on the one hand to explore new variants of the
original problem, on the other hand to figure out, in this emerging variety of problems,
some common (and sometimes unexpected) features. This kind of analysis is the main
scope of our thesis.
The problems we will consider are:
1. The optimal transportation problem on manifolds with geometric costs: we study
the problem of the existence and the uniqueness of an optimal transport map on
arbitrary manifolds (without any condition on the sectional curvature), for costs
of the form $c(x, y) := \inf_\gamma \int_0^1 L(\gamma, \dot\gamma)\,dt$, where the infimum is taken among all
absolutely continuous curves from x to y, and L(x, v) is a Tonelli Lagrangian.
2. The optimal irrigation problem: this is a generalization of the classical optimal
transportation problem, where one wants to connect a source measure to a target
measure using a “transport structure” in such a way that the mass moves, as much
as possible, in a grouped way. The motivation for such a problem comes from the
modeling of many biological and engineering structures.
3. The Brenier variational theory of incompressible flows: starting from the geometrical
interpretation of the Euler equations for incompressible fluids as a geodesic
equation in the space of the measure-preserving diffeomorphisms, one can look for
solutions of the Euler equations by minimizing the action functional. This leads
to the introduction of relaxed models and their study from a calculus of variations
point of view.
4. The Aubry-Mather theory and the solutions of Hamilton-Jacobi equations: the
regularity and the uniqueness of viscosity solutions of the Hamilton-Jacobi equation
are linked to the smallness of certain sets appearing in Lagrangian dynamics. Thus
we will be interested in estimating the Hausdorff dimension of the so-called quotient
Aubry set.
5. The DiPerna-Lions theory for martingale solutions of stochastic differential equations:
this theory allows one, in some sense, to prove a sort of existence and uniqueness
for an ordinary differential equation for almost every initial condition once one has
some well-posedness result at the level of the associated transport equation. Our
aim will be to develop such a theory in the case of an ordinary differential equation
perturbed by an irregular noise.
We remark that the first three topics are all variants of the optimal transportation
problem. Moreover, even though the last topic is only loosely related to optimal
transportation, at the technical level many connections arise, and the study of all of them
reveals some new connections. For instance, Bernard and Buffoni [22] have recently shown how
one can fit Mather’s theory, as well as optimal transportation problems on manifolds
with a geometric cost, in the framework of measures in the space of action-minimizing
curves. We pursue this search for a general unified framework, proving that
variational solutions of the Euler equations [8] can also be seen in this perspective, with
a possibly non-smooth action induced by the pressure field. The last two topics also
present some links with the first three. For instance, the proof in [23] of the existence
and uniqueness of an optimal transport plan strongly relies on the regularity properties
of solutions of Hamilton-Jacobi equations, while the natural framework which allows one to
develop a theory à la DiPerna-Lions for martingale solutions of stochastic differential
equations turns out to be that of measures in the space of paths, which, as we
said, is also natural in the optimal transportation problem and in the variational study
of the Euler equations.
Let us give a quick overview of all these subjects (each chapter contains a more
detailed mathematical and bibliographical description of the single problems), providing
also an outline of the content of the thesis. All the results in this thesis have been presented
in a series of papers (accepted, submitted, or in preparation), originating from several
collaborations developed during the PhD studies.
1. As we explained above, one is interested in finding a transport map T : X → Y from
µ to ν which minimizes the transportation cost, that is
\[ \int_X c(x, T(x))\,d\mu(x) = \min_{S_\#\mu = \nu} \left\{ \int_X c(x, S(x))\,d\mu(x) \right\}. \]
Even in Euclidean spaces, with the cost c equal to the Euclidean distance or its
square, the problem of the existence of an optimal transport map is far from being
trivial. Moreover, it is easy to build examples where the Monge problem is ill-posed
simply because there is no transport map: this happens for instance when µ is a Dirac
mass while ν is not. This means that one needs some restrictions on the measures µ and
ν.
The major advance on this problem is due to Kantorovich, who proposed in [85],
[86] a notion of weak solution of the optimal transport problem. He suggested looking
for plans instead of transport maps, that is, probability measures γ on X × Y whose
marginals are µ and ν, i.e.
\[ (\pi_X)_\# \gamma = \mu \quad \text{and} \quad (\pi_Y)_\# \gamma = \nu, \]
where $\pi_X : X \times Y \to X$ and $\pi_Y : X \times Y \to Y$ are the canonical projections. Denoting
by Π(µ, ν) the set of plans, the new minimization problem then becomes the following:
\[ C(\mu, \nu) = \min_{\gamma \in \Pi(\mu,\nu)} \left\{ \int_{X \times Y} c(x, y)\,d\gamma(x, y) \right\}. \tag{0.0.1} \]
If γ is a minimizer for the Kantorovich formulation, we say that it is an optimal plan.
Due to the linearity of the constraint γ ∈ Π(µ, ν), it turns out that weak topologies can
be used to provide existence of solutions to (0.0.1): this happens for instance whenever
X and Y are Polish spaces and c is lower semicontinuous (see [118], [132, Proposition 2.1]
or [133]). The connection between the formulation of Kantorovich and that of Monge can
be seen by noticing that any transport map T induces the plan defined by
$(\mathrm{Id}_X \tilde\times T)_\# \mu$, which is concentrated on the graph of T , where the map
$\mathrm{Id}_X \tilde\times T : X \to X \times Y$ is defined by
\[ (\mathrm{Id}_X \tilde\times T)(x) = (x, T(x)). \]
Thus, the problem of showing existence of optimal transport maps reduces to proving that
an optimal transport plan is concentrated on a graph. It is however clear, from what
we already said, that no such result can be expected without additional assumptions
on the measures and on the cost. The first existence and uniqueness result is due to
Brenier. In [30] he considers the case $X = Y = \mathbb{R}^n$, $c(x, y) = |x - y|^2$, and he shows
that, if µ is absolutely continuous with respect to the Lebesgue measure, there exists
a unique optimal transport map. After this result, many researchers started to work
on the problem, showing existence of optimal maps with more general costs, both in
a Euclidean setting (for example Caffarelli, Evans, Ambrosio and Pratelli, Trudinger
and Wang, McCann, Feldman), in the case of compact manifolds (McCann, Bernard
and Buffoni), and in some particular classes of non-compact manifolds (Feldman and
McCann), see Section 1.1 for precise references.
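The difference between plans and maps is easy to see on finitely supported measures. The sketch below is an illustration under that assumption only: plans are dictionaries `{(x, y): mass}`, and the helper names `marginals`, `induced_plan` and `plan_cost` are hypothetical. It uses the example mentioned above of a Dirac mass µ and a target ν that is not a Dirac mass: a plan exists because the mass at 0 may be split, whereas the plan induced by any map sends all of it to a single point.

```python
def marginals(gamma):
    """First and second marginals of a plan gamma = {(x, y): mass}."""
    mu, nu = {}, {}
    for (x, y), mass in gamma.items():
        mu[x] = mu.get(x, 0.0) + mass
        nu[y] = nu.get(y, 0.0) + mass
    return mu, nu

def induced_plan(mu, T):
    """Plan (Id x T)_# mu, concentrated on the graph of the map T."""
    gamma = {}
    for x, mass in mu.items():
        key = (x, T(x))
        gamma[key] = gamma.get(key, 0.0) + mass
    return gamma

def plan_cost(gamma, c):
    """Integral of the cost c with respect to the plan gamma."""
    return sum(mass * c(x, y) for (x, y), mass in gamma.items())

# mu is a Dirac mass at 0; nu puts half of the mass at -1 and half at +1.
gamma = {(0, -1): 0.5, (0, 1): 0.5}
print(marginals(gamma))
print(plan_cost(gamma, lambda x, y: (x - y) ** 2))

# Any map T sends the whole Dirac mass to the single point T(0).
print(marginals(induced_plan({0: 1.0}, lambda x: 1)))
```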
A fact which is now well understood is that the choice of the cost completely changes
the structure of the problem. In particular, the cases $c(x, y) = |x - y|^p$ with p > 1 are
completely different from the case c(x, y) = |x − y| (the latter was solved
in the Euclidean case many years after the result of Brenier: first by Evans and Gangbo
[60] under some regularity assumptions on the measures, then by Caffarelli, Feldman and
McCann [39] under much weaker assumptions on the measures, and finally, in a more
general case, by Ambrosio and Pratelli [13]). Indeed, the strict convexity of $|x - y|^p$ for
p > 1 allows one to prove existence and uniqueness of the transport map under the absolute
continuity assumption on µ. On the other hand, in the case c(x, y) = |x − y| one can still
prove existence of optimal maps if µ is absolutely continuous, but no uniqueness result
can be expected. This is a consequence, even on the real line, of the so-called book-shifting
phenomenon: taking $\mu = \mathcal{L}^1 \llcorner [0,1]$ and $\nu = \mathcal{L}^1 \llcorner [1/2,3/2]$, the two maps
$T_1(x) = x + 1/2$ and $T_2(x) = (x + 1)\chi_{[0,1/2]}(x) + x\chi_{[1/2,1]}(x)$ are both optimal. This is
a special case of the general fact that, with the cost c(x, y) = |x − y|, one can find an
optimal transport map imposing also that the common mass between µ and ν stays fixed.
In Chapter 1, following a joint work with Albert Fathi [66], we show existence and
uniqueness of an optimal transport map in a very general setting, which includes the case
of “geometric” costs on manifolds, that is costs given by
\[ c(x, y) := \inf_{\gamma(0)=x,\ \gamma(1)=y} \int_0^1 L(\gamma(t), \dot\gamma(t))\,dt, \]
where L : T M → R is a Tonelli Lagrangian. This is the most general known result, since
it is valid for a wide class of cost functions and it does not require any global assumption
on the manifold (say, a bound on the sectional curvature). To this aim, we will need to
understand the regularity of the cost along extremals, a problem which is closely linked
to weak KAM theory and the regularity of solutions of the Hamilton-Jacobi equation,
see also Section 6.2.2.
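The book-shifting example mentioned above can be checked numerically. The sketch below is a plain midpoint-rule quadrature on [0, 1]; the function name `transport_cost` and the number of nodes are assumptions made for illustration. Both maps have cost 1/2 for the distance cost, and 1/2 is also a lower bound for any transport map, since the integral of |T(x) − x| is at least the difference of the two barycenters, 1 − 1/2. For the quadratic cost the translation is strictly cheaper.

```python
def transport_cost(T, p=1, n=1000):
    """Midpoint-rule approximation of the integral over [0, 1] of |x - T(x)|**p."""
    nodes = ((i + 0.5) / n for i in range(n))
    return sum(abs(x - T(x)) ** p for x in nodes) / n

def T1(x):
    """Translation by 1/2."""
    return x + 0.5

def T2(x):
    """Shift [0, 1/2) by 1 and leave the common mass on [1/2, 1] fixed."""
    return x + 1.0 if x < 0.5 else x

for p in (1, 2):
    print(p, transport_cost(T1, p), transport_cost(T2, p))
```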
Moreover, in Section 1.5 we will study the so-called “displacement interpolation”,
which is a way to connect measures using the optimal transportation. For instance,
suppose that $\mu_0$ and $\mu_1$ are two absolutely continuous measures in $\mathbb{R}^d$, and let
$T : \mathbb{R}^d \to \mathbb{R}^d$ be the optimal transport map from $\mu_0$ to $\mu_1$ (as we said above, existence
and uniqueness of the optimal transport map in this special case is due to Brenier [30]).
Then, instead of “connecting” $\mu_0$ to $\mu_1$ in a linear way (that is $\mu_t = (1 - t)\mu_0 + t\mu_1$), one
can consider the interpolation $\mu_t := ((1 - t)\,\mathrm{Id} + tT)_\# \mu_0$, which is called “displacement
interpolation”. An interesting feature is that, from the convexity of certain functionals
along such curves, one can deduce existence, uniqueness and stability of the gradient
flows of such functionals, obtaining many interesting properties for Fokker-Planck-type
evolution equations such as the porous medium equation (see [42], and see also [11] for
an introduction and a wide bibliography on this subject).
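The contrast between the two interpolations can be illustrated on the real line. The sketch below replaces the absolutely continuous measures of the text with equal-weight samples, and relies on the standard one-dimensional fact that the optimal map for the quadratic cost is then the monotone (sorted) matching; both the discretization and the function names are assumptions made for illustration.

```python
def displacement_interpolation(xs, ys, t):
    """Atoms of mu_t = ((1 - t) Id + t T)_# mu_0 for equal-weight samples on the line.

    T is taken to be the monotone matching between the sorted samples.
    """
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs_sorted, ys_sorted)]

def linear_interpolation(xs, ys, t):
    """Atoms and weights of the mixture (1 - t) mu_0 + t mu_1."""
    atoms = {}
    for x in xs:
        atoms[x] = atoms.get(x, 0.0) + (1 - t) / len(xs)
    for y in ys:
        atoms[y] = atoms.get(y, 0.0) + t / len(ys)
    return atoms

xs, ys = [0.0, 1.0], [5.0, 4.0]
print(displacement_interpolation(xs, ys, 0.5))  # the mass itself travels
print(linear_interpolation(xs, ys, 0.5))        # mass fades out at xs and in at ys
```

At t = 1/2 the displacement interpolation has moved the two atoms halfway towards their targets, while the linear interpolation still has atoms only at the original source and target points.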
The convexity of certain suitable functionals on Riemannian manifolds makes it possible to
express Ricci curvature bounds on the manifold. In Section 1.7, following a joint work
with Cédric Villani [78], we use the general results on optimal transport maps mentioned
above to study the links between several possible notions of “displacement convexity” (i.e.
convexity along displacement interpolations) and to prove their equivalence.
Finally, in Section 1.8 we will generalize the existence and uniqueness of the optimal
transport map without assumptions on the finiteness of the transportation cost, and we
will also prove that the optimal transport map on a general manifold is approximately
differentiable a.e. whenever the cost is given by $c(x, y) = d^2(x, y)$ [75].
2. Another kind of transport problem is the so-called irrigation problem. Starting
from the observation of the frequent occurrence of branched networks in nature
(plants and trees, river basins, bronchial and cardiovascular systems) and in man-designed
structures (communication networks, electric power supply, water distribution
or drainage networks), and observing that the common function of such networks is to
transport some goods from an initial distribution (the supply) to another one (the
demand), we are interested in finding models which describe such phenomena. This was
done in [82, 98, 135, 25, 24, 29] by considering cost functions that encode the efficiency
of a transport induced by some structure. Branched structures, such as the ones observed in
nature, then arise as the optimal structures along which the transport takes place.
The first model is due to Gilbert [82]: given two atomic probability measures
$\mu = \sum_{i=1}^{N_1} a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^{N_2} b_j \delta_{y_j}$, find a finite, oriented and weighted
graph $\Gamma = (v_h, p_h)$ ($v_h$ are the vectors that orient the graph, $p_h$ the weights), which
satisfies Kirchhoff’s law at the junctions (the mass which enters is equal to the mass which
exits, except at the points $x_i$ where a mass $a_i$ exits, and at the points $y_j$ where a mass
$b_j$ enters). One then looks for a graph which minimizes the transportation cost
\[ C^\alpha(\Gamma) = \sum_h |v_h|\,p_h^\alpha \]
with 0 ≤ α ≤ 1. The motivation for introducing such a parameter α is that, since the
function $t \mapsto t^\alpha$ is sub-additive for 0 ≤ α ≤ 1, the inequality
$(p_{h_1} + p_{h_2})^\alpha \le p_{h_1}^\alpha + p_{h_2}^\alpha$ holds, and thus it is advantageous for the mass to
concentrate and to move together. This problem has
been recently generalized (by Xia, Morel, Bernot, Solimini, etc.) to the case of arbitrary
probability measures, and one arrives at problems where the optimal objects have a
branched structure, and the optimal transportation costs $C^\alpha(\mu, \nu)$ give rise to a distance
which metrizes the weak convergence. In order to extend the above problem to arbitrary
target and source measures, a “probabilistic” formalism that has been considered in
[98, 25, 24] is the one of traffic plans, which are suitable probability measures in the space
of continuous paths which “connect” two fixed measures µ and ν. In this framework, all
particles are indexed by the set Ω := [0, 1], and to each ω ∈ Ω is associated a 1-Lipschitz
path χ(ω, ·) in $\mathbb{R}^N$. This is a Lagrangian description of the dynamics of the particles that
can be encoded by the image measure $P_\chi$ of the map $\omega \mapsto \chi(\omega, \cdot)$ (which is therefore
a measure on the set of 1-Lipschitz paths). To each traffic plan one can associate a
suitable cost function which has to incorporate the principle that it is more efficient to
transport mass in a grouped way rather than in a separate way. As in the discrete
case considered by Gilbert, to embed this principle the costs incorporate a parameter
α ∈ [0, 1] and make use of the concavity of $x \mapsto x^\alpha$. Once the cost and the measures µ
and ν are given, one can consider what is called the irrigation problem by some authors
[25, 24, 26], i.e. the problem of minimizing the cost among structures transporting µ to
ν.
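The effect of the exponent α in Gilbert’s cost can be seen on a toy configuration with one source and two nearby targets. The sketch below is only an illustration: the coordinates, the branching point (which is not optimized) and the name `gilbert_cost` are hypothetical. It compares a V-shaped graph, where the two half-masses travel separately, with a Y-shaped graph, where they travel together before splitting.

```python
import math

def gilbert_cost(edges, alpha):
    """C^alpha = sum_h |v_h| * p_h**alpha for edges given as (start, end, weight)."""
    return sum(math.dist(a, b) * w ** alpha for a, b, w in edges)

source = (0.0, 0.0)                  # mass 1 leaves the source
y1, y2 = (1.0, 0.1), (1.0, -0.1)     # each target receives mass 1/2
branch = (0.8, 0.0)                  # hypothetical branching point, not optimized

# Both graphs satisfy Kirchhoff's law: mass 1 enters `branch`, 1/2 + 1/2 leaves it.
v_shaped = [(source, y1, 0.5), (source, y2, 0.5)]
y_shaped = [(source, branch, 1.0), (branch, y1, 0.5), (branch, y2, 0.5)]

for alpha in (0.5, 1.0):
    print(alpha, gilbert_cost(v_shaped, alpha), gilbert_cost(y_shaped, alpha))
```

For α = 1/2 the branched graph is cheaper on this configuration, while for α = 1 the cost is the usual length-times-mass cost and the straight segments win.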
In Chapter 2 we study different kinds of possible costs, and in some cases we show
their equivalence. Moreover, we study the properties of the costs when seen as functions
of the parameter α, and we use this analysis to show a stability property of minimizers.
This is a joint work with Marc Bernot [27].

3. The velocity of an incompressible fluid moving inside a region D is mathematically
described by a time-dependent and divergence-free vector field u(t, x) which is parallel to
the boundary ∂D. The Euler equations for incompressible fluids describe the evolution
of such a velocity field u in terms of the pressure field p:
\[
\begin{cases}
\partial_t u + (u \cdot \nabla) u = -\nabla p & \text{in } [0, T] \times D,\\
\operatorname{div} u = 0 & \text{in } [0, T] \times D,\\
u \cdot n = 0 & \text{on } [0, T] \times \partial D.
\end{cases}
\tag{0.0.2}
\]
Let us assume that u is smooth, so that it produces a unique flow g. Writing the Euler
equations in terms of g, we get
\[
\begin{cases}
\ddot g(t, a) = -\nabla p\,(t, g(t, a)) & (t, a) \in [0, T] \times D,\\
g(0, a) = a & a \in D,\\
g(t, \cdot) \in \mathrm{SDiff}(D) & t \in [0, T],
\end{cases}
\tag{0.0.3}
\]
where SDiff(D) denotes the space of measure-preserving diffeomorphisms of D. Viewing
SDiff(D) as an infinite-dimensional manifold with the metric inherited from the embedding
in $L^2$, and with tangent space made of the divergence-free vector fields, Arnold
interpreted the equation above, and therefore (0.0.2), as a geodesic equation on SDiff(D)
[15]. According to this interpretation, one can look for solutions of (0.0.3) by minimizing
\[ T \int_0^T \int_D \frac{1}{2} |\dot g(t, x)|^2 \,d\mu_D(x)\,dt \]
among all paths g(t, ·) : [0, T ] → SDiff(D) with g(0, ·) = f and g(T, ·) = h prescribed.
This minimization problem presents many difficulties from the calculus of variations
point of view (mainly because of the incompressibility constraint), and also gives rise
to many interesting questions. Brenier [31, 35] introduced two relaxed models to study
this problem. In particular, in [31], he defined a generalized incompressible flow as a
probability measure η on Ω(D) := C([0, T ], D) such that
\[ (e_t)_\# \eta = \mathcal{L}^d \llcorner D \quad \forall t \in [0, T], \qquad (e_0, e_T)_\# \eta = (\mathrm{id} \times h)_\# \mathcal{L}^d \llcorner D, \]
and defined the action of η as
\[ \mathcal{A}(\eta) := \int_{\Omega(D)} \int_0^T \frac{1}{2} |\dot\omega(t)|^2 \,dt\,d\eta(\omega). \]
The existence of a minimizer can be proved by a standard compactness and semicontinuity
argument. Moreover, by a duality argument, Brenier introduced the pressure field,
and proved that a minimizer η of the above variational problem solves the Euler equations
in a “weak” sense: for all $h \in C_c^\infty(0, 1)$ and every smooth compactly supported vector
field w on D, we have
\[ \int_{\Omega(D)} \int_0^T \dot\omega(t) \cdot \frac{d}{dt}\big[h(t)\,w(\omega(t))\big] \,dt\,d\eta(\omega) = \langle D_x p(t, x), h(t) w(x) \rangle \]
in the distributional sense.
In particular, this condition identifies uniquely the pressure field p (as a distribution)
up to trivial modifications, i.e. additive perturbations depending on time only. We also
remark that, if the measure η is given by
\[ \int_{\Omega(D)} f(\omega)\,d\eta(\omega) = \int_D f\big(t \mapsto g(t, x)\big)\,dx \]
with g : [0, T ] → SDiff(D) smooth, then $u(t, x) := \partial_t g(t, g^{-1}(t, x))$ is a solution of the
Euler equations. Now an important problem is to study the structure of minimizers,
finding necessary and sufficient conditions for optimality, a question which will be
addressed and solved in Chapter 3 following a joint work with Luigi Ambrosio [8]. As we
already said, the results we prove show a somewhat unexpected connection between the
variational theory of incompressible flows and the theory developed by Bernard-Buffoni
[22] of measures in the space of action-minimizing curves.
Indeed, first we slightly refine the deep analysis made in [35] of the regularity of
the gradient of the pressure field: Brenier proved that the distributions $\partial_{x_i} p$ are locally
finite measures in (0, T ) × D, but this information is not sufficient (due to a lack of time
regularity) to imply that p is a function. In Section 3.7 we improve this result, showing
that $p \in L^2_{loc}\big((0, T); BV_{loc}(D)\big)$ (this has been done in another joint work with Luigi
Ambrosio [9]). In particular p is a function at least in some $L^1_{loc}(L^r_{loc})$ space, for some
r > 1. We can therefore develop a refined analysis of the necessary and sufficient
optimality conditions for action-minimizing curves in Γ(D) (see Section 3.6), which involve
the Lagrangian
\[ L_p(\gamma) := \int \frac{1}{2} |\dot\gamma(t)|^2 - p(t, \gamma(t)) \,dt, \]
the (locally) minimizing curves for $L_p$ and the value function induced by $L_p$.
We also remark that the possibility of deducing such a regularity result for the pressure
is based on the equivalence, proved in Section 3.4, of the above mentioned Brenier model
and the Eulerian-Lagrangian model introduced by the same author in [35] (see Section
3.3.3). Indeed, the regularity of p is easier to study within the latter model.
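As a complement to the passage from (0.0.2) to (0.0.3) above, which is stated without computation, here is a sketch of the standard chain-rule argument. It assumes that u and g are smooth and that “flow” means $\dot g(t, a) = u(t, g(t, a))$ with $g(0, a) = a$; this defining relation is our reading and is not quoted from the text.

```latex
\begin{align*}
\dot g(t,a) &= u\big(t, g(t,a)\big), \qquad g(0,a) = a, \\
\ddot g(t,a) &= \partial_t u\big(t, g(t,a)\big)
               + \big(\dot g(t,a) \cdot \nabla\big) u\big(t, g(t,a)\big) \\
             &= \big[\partial_t u + (u \cdot \nabla) u\big]\big(t, g(t,a)\big)
              = -\nabla p\big(t, g(t,a)\big).
\end{align*}
```

The remaining two conditions in (0.0.2), div u = 0 and u · n = 0, are what make each g(t, ·) a measure-preserving map of D into itself, which is the constraint g(t, ·) ∈ SDiff(D) in (0.0.3).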
4. As we said, there is an important connection between Mather’s theory and optimal
transportation problems on manifolds [22, 23]. The key point is that cost functions
induced by Tonelli Lagrangians solve a Hamilton-Jacobi equation.
To study the dynamics of a Lagrangian system and to have uniqueness
of solutions of the Hamilton-Jacobi equation
\[ H(x, d_x u) = c, \]
it is important to understand the structure of some subsets of the tangent space which
capture the properties of the dynamics. Mather [105] proposed as an important problem to
show that the quotient Aubry set is totally disconnected if the Lagrangian (or, equivalently,
the Hamiltonian) is smooth. In Chapter 4 this problem will be completely solved up to
dimension 3, and in many particular cases in higher dimension, following a joint work
with Albert Fathi and Ludovic Rifford [67].
To understand the key idea of the proof, let us consider the particular case
\[ H(x, p) = \frac{1}{2} |p|^2 + V(x), \]
and without loss of generality let us assume $\max_x V(x) = 0$. Then in this case the
Hamilton-Jacobi equation one is interested in becomes
\[ \frac{1}{2} |d_x u|^2 + V(x) = 0 \]
(the value c = 0 is the Mañé critical value for the above Hamiltonian), and the
projected Aubry set is the set {V = 0}. As shown in Section 4.2, the key point to show
that the quotient Aubry set is totally disconnected (or small in the sense of the
Hausdorff dimension) is to prove a sort of Sard-type theorem for critical subsolutions of the
Hamilton-Jacobi equation (that is, functions u which satisfy $\frac{1}{2} |d_x u|^2 + V(x) \le 0$), showing
that the image of the set {V = 0} under the map u : M → R has zero Lebesgue measure.
Although the function u is only $C^1$, and so the classical Sard theorem cannot be applied,
in this case one has the extra information
\[ |d_x u|^2 \le -2V(x), \]
and the function V (x) is smooth by assumption. One can therefore use the regularity of
V to deduce that u is really “flat” near {V = 0}, and so to deduce the Sard-type result.
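To make the word “flat” slightly more concrete, here is a sketch of the elementary estimate behind it. It is written under extra assumptions of ours, made only for illustration: M is compact, V is of class $C^2$, and dist denotes the Riemannian distance. The constant C does not come from the text.

```latex
% Since V \le 0 = \max V, every point x_0 of \{V = 0\} is a maximum point of V,
% so d_{x_0} V = 0. A second-order Taylor expansion around a nearest point
% x_0 \in \{V = 0\} then gives
\[
  0 \le -V(x) \le C \operatorname{dist}\big(x, \{V = 0\}\big)^2,
  \qquad C := \tfrac12 \, \|D^2 V\|_\infty ,
\]
% and combining this with |d_x u|^2 \le -2V(x) yields
\[
  |d_x u| \le \sqrt{2C}\, \operatorname{dist}\big(x, \{V = 0\}\big).
\]
```

So the differential of u vanishes on {V = 0} and grows at most linearly with the distance to that set. This only indicates the mechanism; the Sard-type argument of Chapter 4 is more delicate.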
5. In Chapter 5, we will develop a theory à la DiPerna-Lions for martingale solutions,
in the sense of Stroock-Varadhan, of stochastic differential equations.
In [56, 4], the authors developed a theory which, roughly speaking, allows one to prove
existence and uniqueness in a weak sense for solutions of ordinary differential equations
with nonsmooth coefficients. This theory is based on the classical links between the
transport (or the continuity) equation
\[ \partial_t \mu + \operatorname{div}(b\mu) = 0 \]
and the associated ordinary differential equation
\[
\begin{cases}
\dot X(t, x) = b(t, X(t, x)),\\
X(0, x) = x.
\end{cases}
\]
What one proves is that, in a suitable sense (see [56, 4, 5] for a precise statement),
existence and uniqueness for the ordinary differential equation hold for almost every
initial condition if, and only if, the partial differential equation is well-posed in $L^\infty$. It
was pointed out in [4] that this theory has a probabilistic flavour, and therefore it is very
natural to look for a more general theory concerning stochastic differential equations
whose limit, as the diffusion coefficient tends to 0, should be the DiPerna-Lions theory.
In Section 5.2 we obtain this type of extension [77]: first we study the links between
the Fokker-Planck equation
\[ \partial_t \mu_t + \sum_i \partial_i (b_i \mu_t) - \frac{1}{2} \sum_{ij} \partial_{ij} (a_{ij} \mu_t) = 0 \quad \text{in } [0, T] \times \mathbb{R}^d, \tag{0.0.4} \]
and the associated stochastic differential equation
\[
\begin{cases}
dX = b(t, X)\,dt + \sigma(t, X)\,dB(t),\\
X(0) = x,
\end{cases}
\tag{0.0.5}
\]
where $a_{ij} := \sum_k \sigma_{ik} \sigma_{jk}$. The stochastic differential equation is considered in a weak
sense (that of martingale solutions), and we show that existence and uniqueness for
the stochastic differential equation hold for almost every initial condition if and only if
the partial differential equation is well-posed in $L^\infty$ (again uniqueness for the stochastic
differential equation holds in a more complicated sense). Moreover, a study of the
Fokker-Planck equations from a purely partial differential equation point of view (see Section
5.4) allows us to apply the above theory to some important specific cases.
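For readers who want to experiment with (0.0.5), the sketch below simulates a scalar equation with the Euler-Maruyama scheme. This is a standard numerical method for smooth coefficients, substituted here for illustration; it is not the martingale-solution framework of Chapter 5. The function names, the step count and the test coefficients are assumptions. A second helper computes the matrix $a = \sigma\sigma^T$ entering (0.0.4).

```python
import math
import random

def euler_maruyama(b, sigma, x0, T=1.0, n_steps=1000, rng=None):
    """One approximate sample of X(T) for dX = b(t, X) dt + sigma(t, X) dB, X(0) = x0."""
    rng = rng or random.Random(0)
    dt = T / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x += b(t, x) * dt + sigma(t, x) * dB
        t += dt
    return x

def diffusion_matrix(sigma):
    """a_ij = sum_k sigma_ik * sigma_jk for a matrix sigma given as nested lists."""
    return [[sum(si[k] * sj[k] for k in range(len(si))) for sj in sigma]
            for si in sigma]

# With zero diffusion the scheme reduces to the explicit Euler method for the ODE.
x_ode = euler_maruyama(lambda t, x: -x, lambda t, x: 0.0, x0=1.0)
print(x_ode, math.exp(-1.0))
print(diffusion_matrix([[1.0, 0.0], [1.0, 1.0]]))
```

Setting the diffusion coefficient to zero recovers the deterministic flow, in line with the remark above that the DiPerna-Lions theory should appear as the limit when the diffusion coefficient tends to 0.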
As we already said, all these chapters stem from a series of papers published during
the PhD studies, except some parts of Chapter 1. Indeed, a preliminary version of the
work with Albert Fathi [66] with weaker results was already present in the undergraduate
thesis [73]. Also the work with Cédric Villani [78] and the paper [75] were present in
[73] but we decided to include them in the chapter for a more complete and organic
exposition. We have chosen to leave out, since they are not directly linked with the
thesis’ subject, some other papers written during the PhD studies (see [76] and [1]). A
paper somehow related to optimal transportation, written before the beginning of the
PhD studies, is instead [74].
Regarding notation, we tried to make it as unified as possible. Nevertheless,
the main specific notation will be introduced chapter by chapter.
Chapter 1
The optimal transportation problem
1.1 Introduction¹
The optimal transportation problem we consider in this chapter is the following: given
two probability measures µ and ν, defined on the measurable spaces X and Y , find a
measurable map T : X → Y with
\[ T_\# \mu = \nu, \tag{1.1.1} \]
and in such a way that T minimizes the transportation cost, that is
\[ \int_X c(x, T(x))\,d\mu(x) = \min_{S_\#\mu = \nu} \left\{ \int_X c(x, S(x))\,d\mu(x) \right\}. \]
Here c : X × Y → R is some given cost function, and the minimum is taken over all
measurable maps S : X → Y with $S_\# \mu = \nu$. When condition (1.1.1) is satisfied, we say
that T is a transport map, and if T also minimizes the cost we call it an optimal transport
map.
Even in Euclidean spaces, with the cost c equal to the Euclidean distance or its
square, the problem of the existence of an optimal transport map is far from being
trivial. Due to the strict convexity of the square of the Euclidean distance, the case
$c(x, y) = |x - y|^2$ is simpler to deal with than the case c(x, y) = |x − y|. The reader
should consult the books and surveys given above to have a better view of the history
of the subject, in particular Villani’s second book on the subject [133]. However, for the
case where the cost is a distance, one should cite at least the work of Sudakov [129],
Evans-Gangbo [60], Feldman-McCann [71], Caffarelli-Feldman-McCann [39],
Trudinger-Wang [130], Ambrosio-Pratelli [13], and Bernard-Buffoni [23]. For the case where the
cost is the square of the Euclidean or of a Riemannian distance, one should cite at least
the work of Knott-Smith [87], Brenier [30], Rachev-Rüschendorf [117], Gangbo-McCann
[80], McCann [109], and Bernard-Buffoni [22].
¹ This chapter is based on joint works with Albert Fathi [66] and Cédric Villani [78], and on
the work in [75].
Our work is related to the case where the cost behaves like a square of a Riemannian
distance. It is strongly inspired by the work of Bernard-Buffoni [22]. In fact, we prove the
non-compact version of this last work adapting some techniques that were first used in the
Euclidean case in [11] by Ambrosio, Gigli, and Savaré. We show that the Monge transport
problem can be solved for the square distance on any complete Riemannian manifold
without any assumption on the compactness or curvature, with the usual restriction on
the measures. Most of the arguments in this chapter are well known to specialists, at
least in the compact case, but they have not been put together before and adapted to
the case we treat. Of course, there is a strong intersection with some of the results that
appear in [133]. For the case where the cost behaves like the distance of a complete
non-compact Riemannian manifold, see [74].
We will prove a generalization of the following theorem (see Theorems 1.4.2 and
1.4.3):
Theorem 1.1.1. Suppose that M is a connected complete Riemannian manifold, whose
Riemannian distance is denoted by d. Suppose that r > 1. If µ and ν are probability
(Borel) measures on M , with µ absolutely continuous with respect to Lebesgue measure,
and
\[ \int_M d^r(x, x_0) \, d\mu(x) < \infty \quad \text{and} \quad \int_M d^r(x, x_0) \, d\nu(x) < \infty \]
for some given x0 ∈ M , then we can find a transport map T : M → M , with T# µ = ν,
which is optimal for the cost d^r on M × M . Moreover, the map T is uniquely determined
µ-almost everywhere.
We recall that a measure on a smooth manifold is absolutely continuous with respect

to the Lebesgue measure if, when one looks at it in charts, it is absolutely continuous
with respect to Lebesgue measure. Again we note that there is no restriction on the
curvature of M in the theorem above.
The chapter is structured as follows: in Section 1.2 we recall some known results
on the general theory of the optimal transport problem, and we introduce some useful
definitions. Then, in Section 1.3 we will give very general results for the existence and
the uniqueness of optimal transport maps (Theorems 1.3.1 and 1.3.2, and Complement
1.3.4). In Section 1.4 the above results are applied in the case of cost functions coming
from (weak) Tonelli Lagrangians (Theorems 1.4.2 and 1.4.3). In Section 1.5, we study
the so-called “displacement interpolation”, showing a countably Lipschitz regularity for
the transport map starting from an intermediate time (Theorem 1.5.2). All the technical
results about semi-concave functions and Tonelli Lagrangians used in our proofs are
collected in the appendix at the end of the thesis.
1.2 Background and some definitions

The weak formulation of the transport problem proposed by Kantorovich in [85], [86] is
the following: one looks for plans instead of transport maps, that is probability measures
γ on X × Y whose marginals are µ and ν, and one minimizes
\[ C(\mu, \nu) = \min_{\gamma \in \Pi(\mu,\nu)} \left\{ \int_{X \times Y} c(x, y) \, d\gamma(x, y) \right\} \tag{1.2.1} \]
(here Π(µ, ν) denotes the set of plans). If γ is a minimizer for the Kantorovich
formulation, we say that it is an optimal plan. Using weak topologies, it is simple to prove
existence of optimal plans whenever X and Y are Polish spaces and c is lower
semicontinuous (see [118], [132, Proposition 2.1] or [133]).
It is well-known that a linear minimization problem with convex constraints, like
(1.2.1), admits a dual formulation. Before stating the duality formula, we make some
definitions similar to those of weak KAM theory (see [65]):
Definition 1.2.1 (c-subsolution). We say that a pair of functions ϕ : X → R∪{+∞},
ψ : Y → R ∪ {−∞} is a c-subsolution if
∀(x, y) ∈ X × Y, ψ(y) − ϕ(x) ≤ c(x, y).
Observe that when c is measurable and bounded below, and (ϕ, ψ) is a c-subsolution
with ϕ ∈ L^1(µ), ψ ∈ L^1(ν), then
\[ \forall \gamma \in \Pi(\mu, \nu), \quad \int_Y \psi \, d\nu - \int_X \varphi \, d\mu = \int_{X \times Y} (\psi(y) - \varphi(x)) \, d\gamma(x, y) \le \int_{X \times Y} c(x, y) \, d\gamma(x, y). \]
If moreover ∫_{X×Y} c(x, y) dγ < +∞, and
\[ \int_{X \times Y} (\psi(y) - \varphi(x)) \, d\gamma(x, y) = \int_{X \times Y} c(x, y) \, d\gamma(x, y), \]
then one would obtain the following equality:
\[ \psi(y) - \varphi(x) = c(x, y) \quad \text{for } \gamma\text{-a.e. } (x, y) \]
(without any measurability or integrability assumptions on (ϕ, ψ), this is just a formal
computation).
Definition 1.2.2 (Calibration). Given an optimal plan γ, we say that a c-subsolution

(ϕ, ψ) is (c, γ)-calibrated if ϕ and ψ are Borel measurable, and
ψ(y) − ϕ(x) = c(x, y) for γ-a.e. (x, y).
Theorem 1.2.3 (Duality formula). Let X and Y be Polish spaces equipped with
probability measures µ and ν respectively, c : X × Y → R a lower semicontinuous cost
function bounded from below such that the infimum in the Kantorovitch problem (1.2.1)
is finite. Then a transport plan γ ∈ Π(µ, ν) is optimal if and only if there exists a
(c, γ)-calibrated subsolution (ϕ, ψ).
For a proof of this theorem see [120] and [133, Theorem 5.9 (ii)].
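The notions of c-subsolution and calibration can be checked by hand on a small discrete example. The sketch below is an illustration under assumptions that are not in the text: uniform measures on three points of the real line, the quadratic cost, a candidate plan concentrated on a permutation `sigma`, and hand-picked potentials `phi` and `psi`. All names are hypothetical. It verifies the inequality of Definition 1.2.1, the equality of Definition 1.2.2 on the support of the plan, and the resulting equality between the transport cost and the dual value.

```python
xs = [0.0, 1.0, 3.0]   # support of mu (weights 1/3 each)
ys = [2.0, 5.0, 0.5]   # support of nu (weights 1/3 each)
cost = lambda x, y: (x - y) ** 2

# Candidate plan: mass 1/3 on each pair (xs[i], ys[sigma[i]]).
sigma = (2, 0, 1)

# Candidate potentials, chosen by hand for this toy example.
phi = [0.0, 0.0, 0.0]
psi = [min(cost(x, y) for x in xs) for y in ys]

def is_subsolution(phi, psi, tol=1e-12):
    """psi(y) - phi(x) <= c(x, y) for every pair of support points."""
    return all(psi[j] - phi[i] <= cost(xs[i], ys[j]) + tol
               for i in range(len(xs)) for j in range(len(ys)))

def is_calibrated(phi, psi, sigma, tol=1e-12):
    """psi(y) - phi(x) = c(x, y) on the support of the plan."""
    return all(abs(psi[sigma[i]] - phi[i] - cost(xs[i], ys[sigma[i]])) <= tol
               for i in range(len(xs)))

n = len(xs)
primal = sum(cost(xs[i], ys[sigma[i]]) for i in range(n)) / n  # cost of the plan
dual = sum(psi) / n - sum(phi) / n                             # int psi dnu - int phi dmu
print(is_subsolution(phi, psi), is_calibrated(phi, psi, sigma), primal, dual)
```

Since the pair is a c-subsolution, the dual value bounds the cost of every plan from below; since it is calibrated for this plan, the bound is attained, which is the discrete analogue of the optimality criterion of Theorem 1.2.3.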
Here we study Monge’s problem on manifolds for a large class of cost functions
induced by Lagrangians like in [22], where the authors consider the case of compact
manifolds. We generalize their result to arbitrary non-compact manifolds.
Following the general scheme of proof, we will first prove a result on more general
costs, see Theorem 1.3.2. In this general result, the fact that the target space for the
Monge transport is a manifold is not necessary. So we will assume that only the source
space (for the Monge transport map) is a manifold.
Let M be an n-dimensional manifold (Hausdorff and with a countable basis), N a
Polish space, c : M × N → R a cost function, µ and ν two probability measures on M
and N respectively. We want to prove existence and uniqueness of an optimal transport
map T : M → N , under some reasonable hypotheses on c and µ.
One of the conditions on the cost c is given in the following definition:
Definition 1.2.4 (Twist Condition). For a given cost function c(x, y), we define the
skew left Legendre transform as the partial map
\[ \Lambda^l_c : M \times N \to T^*M, \qquad \Lambda^l_c(x, y) = \left( x, \frac{\partial c}{\partial x}(x, y) \right), \]
whose domain of definition is
\[ \mathcal{D}(\Lambda^l_c) = \left\{ (x, y) \in M \times N \;\middle|\; \frac{\partial c}{\partial x}(x, y) \text{ exists} \right\}. \]
Moreover, we say that c satisfies the left twist condition if Λ^l_c is injective on D(Λ^l_c).
One can define similarly the skew right Legendre transform Λ^r_c : M × N → T^*N by
Λ^r_c(x, y) = (y, ∂c/∂y (x, y)). The domain of definition of Λ^r_c is
D(Λ^r_c) = {(x, y) ∈ M × N | ∂c/∂y (x, y) exists}. We say that c satisfies the right twist
condition if Λ^r_c is injective on D(Λ^r_c).
The usefulness of these definitions will be clear in Section 1.4, in which we will
treat the case where M = N and the cost is induced by a Lagrangian. This condition has
already appeared in the subject. It has been known (explicitly or not) by several people,
among them Gangbo (oral communication) and Villani (see [132, page 90]). It is used in
[22], since it is always satisfied for a cost coming from a Lagrangian, as we will see below.
We borrow the terminology “twist condition” from the theory of Dynamical Systems:
if h : R × R → R, (x, y) ↦ h(x, y) is C^2, one says that h satisfies the twist condition
if there exists a constant α > 0 such that ∂^2 h/∂x∂y ≥ α everywhere. In that case both
maps Λ^l_h : R × R → R × R, (x, y) ↦ (x, ∂h/∂x(x, y)) and Λ^r_h : R × R → R × R, (x, y) ↦
(y, ∂h/∂y(x, y)) are C^1 diffeomorphisms. The twist map f : R × R → R × R associated to
h is determined by f (x1 , v1 ) = (x2 , v2 ), where v1 = −∂h/∂x(x1 , x2 ), v2 = ∂h/∂y(x1 , x2 ),
which means f (x1 , v1 ) = Λ^r_h ◦ [Λ^l_h]^{-1}(x1 , −v1 ), see [102] or [79].
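As an illustration of the left twist condition of Definition 1.2.4, the following worked example (added for illustration and not part of the original argument) compares the two Euclidean costs mentioned in the introduction. It assumes M = N = R^n for the quadratic cost and M = N = R for the distance cost.

```latex
% Quadratic cost on R^n x R^n: the left twist condition holds.
c(x, y) = |x - y|^2, \qquad
\frac{\partial c}{\partial x}(x, y) = 2(x - y), \qquad
\Lambda^l_c(x, y) = \bigl(x, 2(x - y)\bigr).
% Given (x, p) in the image, y is recovered uniquely:
\Lambda^l_c(x, y) = (x, p) \iff y = x - \tfrac{1}{2}\, p,
% so \Lambda^l_c is injective on its domain (here all of R^n x R^n).

% Distance cost on R x R: the left twist condition fails.
c(x, y) = |x - y|, \qquad
\frac{\partial c}{\partial x}(x, y) = \operatorname{sign}(x - y) \quad (x \neq y),
% hence \Lambda^l_c(x, y) = \Lambda^l_c(x, y') whenever y and y' lie on the
% same side of x, and \Lambda^l_c is not injective.
```

This is consistent with the remark in Section 1.1 that the case c(x, y) = |x − y|^2 is simpler to deal with than the case c(x, y) = |x − y|.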
We now recall some useful measure-theoretical facts that we will need in the sequel.
Lemma 1.2.5. Let M be an n-dimensional manifold, N be a Polish space, and let
c : M × N → R be a measurable function such that x ↦ c(x, y) is continuous for any
y ∈ N . Then the set
\[ \left\{ (x, y) \;\middle|\; \frac{\partial c}{\partial x}(x, y) \text{ exists} \right\} \]
is Borel measurable. Moreover (x, y) ↦ ∂c/∂x (x, y) is a Borel function on that set.
Proof. This is a standard result in measure theory, so we give here just a sketch of the proof.
By the locality of the statement, using charts we can assume M = R^n. Let T_k : R^n →
R be a dense countable family of linear maps. For any j, k ∈ N, we consider the Borel
function
\[ L_{j,k}(x, y) := \sup_{|h| \in (0, 1/j)} \frac{|c(x + h, y) - c(x, y) - T_k(h)|}{|h|}
 = \sup_{|h| \in (0, 1/j),\ h \in \mathbb{Q}^n} \frac{|c(x + h, y) - c(x, y) - T_k(h)|}{|h|}, \]
where in the second equality we used the continuity of x ↦ c(x, y). Then it is not difficult
to show that the set of points where ∂c/∂x (x, y) exists can be written as
\[ \{ (x, y) \mid \inf_j \inf_k L_{j,k}(x, y) = 0 \}, \]
which is clearly a Borel set.
To show that (x, y) ↦ ∂c/∂x (x, y) is Borel, it suffices to note that the partial derivatives
\[ \frac{\partial c}{\partial x_i}(x, y) = \lim_{\ell \to \infty} \frac{c(x_1, \dots, x_i + \frac{1}{\ell}, \dots, x_n, y) - c(x_1, \dots, x_i, \dots, x_n, y)}{1/\ell} \]
are countable limits of continuous functions, and hence are Borel measurable.
Therefore, by the above lemma, D(Λ^l_c) is a Borel set. If we moreover assume that c
satisfies the left twist condition (that is, Λ^l_c is injective on D(Λ^l_c)), then one can define
\[ (\Lambda^l_c)^{-1} : T^*M \supset \Lambda^l_c(\mathcal{D}(\Lambda^l_c)) \to \mathcal{D}(\Lambda^l_c) \subset M \times N. \]
Then, by the injectivity assumption, one has that Λ^l_c(D(Λ^l_c)) is still a Borel set, and
(Λ^l_c)^{-1} is a Borel map (see [51, Proposition 8.3.5 and Theorem 8.3.7], [70]). We can thus
extend (Λ^l_c)^{-1} as a Borel map on the whole T^*M as
\[ \Lambda^{l,inv}_c(x, p) =
\begin{cases}
(\Lambda^l_c)^{-1}(x, p) & \text{if } p \in T^*_x M \cap \Lambda^l_c(\mathcal{D}(\Lambda^l_c)), \\
(x, \bar{y}) & \text{if } p \in T^*_x M \setminus \Lambda^l_c(\mathcal{D}(\Lambda^l_c)),
\end{cases} \]
where ȳ is an arbitrary but fixed point in N .
1.3 The main result

In order to have general results of existence and uniqueness of transport maps which are
sufficiently flexible to be used also in other situations, and to show clearly where
measure-theoretic problems enter in the proof of the existence of the transport map, we will
first give a general result where no measures are present (see 6.1.3 in the appendix for
the definition of locally semi-concave function and 6.1.7 for the definition of countably
(n − 1)-Lipschitz set).
Theorem 1.3.1. Let M be a smooth (second countable) manifold, and let N be a Polish
space. Assume that the cost c : M × N → R is Borel measurable, bounded from below,
and satisfies the following conditions:
(i) the family of maps x ↦ c(x, y) = c_y(x) is locally semi-concave in x locally uniformly
in y,
(ii) the cost c satisfies the left twist condition.
Let (ϕ, ψ) be a c-subsolution, and consider the set G(ϕ,ψ) ⊂ M × N given by
G(ϕ,ψ) = {(x, y) ∈ M × N | ψ(y) − ϕ(x) = c(x, y)}.
We can find a Borel countably (n − 1)-Lipschitz set E ⊂ M and a Borel measurable map
T : M → N such that
\[ G_{(\varphi,\psi)} \subset \mathrm{Graph}(T) \cup \pi_M^{-1}(E), \]
where πM : M × N → M is the canonical projection, and Graph(T ) = {(x, T (x)) | x ∈
M } is the graph of T .
In other words, if we define P = πM (G(ϕ,ψ)) ⊂ M , the part of G(ϕ,ψ) which is above
P \ E is contained in a Borel graph.
More precisely, we will prove that there exist an increasing sequence of locally
semi-convex functions ϕn : M → R, with ϕ ≥ ϕn+1 ≥ ϕn on M , and an increasing sequence
of Borel subsets Cn such that
• For x ∈ Cn , the derivative dx ϕn exists, ϕn+1 (x) = ϕn (x), and dx ϕn+1 = dx ϕn .
• If we set C = ∪n Cn , there exists a Borel countably (n − 1)-Lipschitz set E ⊂ M
such that P \ E ⊂ C.
Moreover, the Borel map T : M → N is such that
• For every x ∈ Cn , we have
\[ (x, T(x)) = \Lambda^{l,inv}_c(x, -d_x\varphi_n), \]
where Λ^{l,inv}_c is the extension of the inverse of Λ^l_c defined at the end of Section 1.2.
• If x ∈ P ∩ Cn \ E, then the partial derivative ∂c/∂x (x, T (x)) exists (i.e. (x, T (x)) ∈
D(Λ^l_c)), and
\[ \frac{\partial c}{\partial x}(x, T(x)) = -d_x\varphi_n. \]
In particular, if x ∈ P ∩ Cn \ E, we have
\[ (x, T(x)) \in \mathcal{D}(\Lambda^l_c) \quad \text{and} \quad \Lambda^l_c(x, T(x)) = (x, -d_x\varphi_n). \]
Therefore, thanks to the twist condition, the map T is uniquely defined on P \ E ⊂ C.
The existence and uniqueness of a transport map is then a simple consequence of the
above theorem.
Theorem 1.3.2. Let M be a smooth (second countable) manifold, let N be a Polish

space, and consider µ and ν (Borel) probability measures on M and N respectively.
Assume that the cost c : M × N → R is lower semicontinuous and bounded from below.
Assume moreover that the following conditions hold:
(i) the family of maps x ↦ c(x, y) = c_y(x) is locally semi-concave in x locally uniformly
in y,
(ii) the cost c satisfies the left twist condition,
(iii) the measure µ gives zero mass to countably (n − 1)-Lipschitz sets,
(iv) the infimum in the Kantorovitch problem (1.2.1) is finite.
Then there exists a Borel map T : M → N , which is an optimal transport map from
µ to ν for the cost c. Moreover, the map T is unique µ-a.e., and any plan γc ∈ Π(µ, ν)
optimal for the cost c is concentrated on the graph of T .
More precisely, if (ϕ, ψ) is a (c, γc )-calibrated pair, with the notation of Theorem
1.3.1, there exists an increasing sequence of Borel subsets Bn , with µ(∪n Bn ) = 1, such
that the map T is uniquely defined on B = ∪n Bn via
\[ \frac{\partial c}{\partial x}(x, T(x)) = -d_x\varphi_n \quad \text{on } B_n, \]
and any optimal plan γ ∈ Π(µ, ν) is concentrated on the graph of that map T .
We remark that condition (iv) is trivially satisfied if
\[ \int_{M \times N} c(x, y) \, d\mu(x) \, d\nu(y) < +\infty. \]
However we needed to state the above theorem in this more general form in order to
apply it in Section 1.5 (see Remark 1.5.3).
Proof of Theorem 1.3.2. Let γc ∈ Π(µ, ν) be an optimal plan. By Theorem 1.2.3 there
exists a (c, γc )-calibrated pair (ϕ, ψ). Consider the set
G = G(ϕ,ψ) = {(x, y) ∈ M × N | ψ(y) − ϕ(x) = c(x, y)}.
Since both M and N are Polish and both maps ϕ and ψ are Borel, the subset G is a
Borel subset of M × N . Observe that, by the definition of (c, γc )-calibrated pair, we have
γc (G) = 1.
By Theorem 1.3.1 there exists a Borel countably (n − 1)-Lipschitz set E such that
G \ (πM )^{-1}(E) is contained in the graph of a Borel map T . This implies that
\[ B = \pi_M\bigl(G \setminus (\pi_M)^{-1}(E)\bigr) = \pi_M(G) \setminus E \subset M \]
is a Borel set, since it coincides with (IdM ×̃ T )^{-1}(G \ (πM )^{-1}(E)) and the map x ↦
IdM ×̃ T (x) = (x, T (x)) is Borel measurable.
Thus, recalling that the first marginal of γc is µ, by assumption (iii) we get
γc ((πM )^{-1}(E)) = µ(E) = 0. Therefore γc (G \ (πM )^{-1}(E)) = 1, so that γc is concentrated
on the graph of T , which gives the existence of an optimal transport map. Note now that
µ(B) = γc ((πM )^{-1}(B)) ≥ γc (G \ (πM )^{-1}(E)) = 1. Therefore µ(B) = 1. Since B = P \ E,
where P = πM (G), using the Borel set Cn provided by Theorem 1.3.1, it follows that
Bn = P ∩ Cn \ E = B ∩ Cn is a Borel set with B = ∪n Bn . The end of Theorem 1.3.1
shows that T is indeed uniquely defined on B as said in the statement.
Let us now prove the uniqueness of the transport map µ-a.e.. If S is another optimal
transport map, consider the measures γT = (IdM ×T )# µ and γS = (IdM ×S)# µ. The
measure γ̄ = (γT + γS )/2 ∈ Π(µ, ν) is still an optimal plan, and therefore must be
concentrated on a graph. This implies for instance that S = T µ-a.e., and thus T is the
unique optimal transport map. Finally, since any optimal γ ∈ Π(µ, ν) is concentrated
on a graph, we also deduce that any optimal plan is concentrated on the graph of T .
Proof of Theorem 1.3.1. By definition of c-subsolution, we have ϕ > −∞ everywhere
on M , and ψ < +∞ everywhere on N . Therefore, if we define Wn = {ψ ≤ n}, we
have Wn ⊂ Wn+1 , and ∪n Wn = N . Since, by hypothesis (i), c(x, y) = c_y(x) is locally
semi-concave in x locally uniformly in y, for each y ∈ N there exists a neighborhood V_y of
y such that the family of functions (c(·, z))_{z∈V_y} is locally uniformly semi-concave. Since
N is separable, there exists a countable family of points (y_k)_{k∈N} such that ∪_k V_{y_k} = N .
We now consider the sequence of subsets (Vn )n∈N of N defined as
Vn = Wn ∩ (∪_{1≤k≤n} V_{y_k}).
We have Vn ⊂ Vn+1 . Define ϕn : M → R by
\[ \varphi_n(x) = \sup_{y \in V_n} \bigl( \psi(y) - c(x, y) \bigr) = \max_{1 \le k \le n} \Bigl( \sup_{y \in W_n \cap V_{y_k}} \psi(y) - c(x, y) \Bigr). \]
Since ψ ≤ n on Wn , and −c is bounded from above, we see that ϕn is bounded from
above. Therefore, by hypothesis (i), the family of functions (ψ(y) − c(·, y))_{y∈Wn ∩V_{y_k}}
is locally uniformly semi-convex and bounded from above. Thus, by Theorem 6.1.4
and Proposition 6.1.11 of the Appendix, the function ϕn is locally semi-convex. Since
ψ(y) − ϕ(x) ≤ c(x, y) with equality on G(ϕ,ψ) , and Vn ⊂ Vn+1 , we clearly have
ϕn ≤ ϕn+1 ≤ ϕ everywhere on M .
A key observation is now the following:
ϕ|Pn = ϕn |Pn ,
where Pn = πM (G(ϕ,ψ) ∩ (M × Vn )). In fact, if x ∈ Pn , by the definition of Pn we know

that there exists a point yx ∈ Vn such that (x, yx ) ∈ G(ϕ,ψ) . By the definition of G(ϕ,ψ) ,
this implies
ϕ(x) = ψ(yx ) − c(x, yx ) ≤ ϕn (x) ≤ ϕ(x).
Since ϕn is locally semi-convex, by Theorem 6.1.8 of the Appendix applied to −ϕn , it
is differentiable on a Borel subset Fn such that its complement Fnc is a Borel countably
(n − 1)-Lipschitz set. Let us then define F = ∩n Fn . The complement E = F c = ∪n Fnc
is also a Borel countably (n − 1)-Lipschitz set. We now define the Borel set
Cn = F ∩ {x ∈ M | ϕk (x) = ϕn (x) ∀k ≥ n}.
We observe that Cn ⊃ Pn ∩ F .
We now prove that G(ϕ,ψ) ∩ ((Pn ∩ F ) × Vn ) is contained in a graph.
To prove this assertion, fix x ∈ Pn ∩ F . By the definition of Pn , and what we said
above, there exists yx ∈ Vn such that
ϕ(x) = ϕn (x) = ψ(yx ) − c(x, yx ).
Since x ∈ F , the map z ↦ ϕn (z) − ψ(yx ) is differentiable at x. Moreover, by condition
(i), the map z ↦ −c(z, yx ) = −c_{yx}(z) is locally semi-convex and, by the definition of ϕn ,
for every z ∈ M , we have ϕn (z) − ψ(yx ) ≥ −c(z, yx ), with equality at z = x. These facts
taken together imply that ∂c/∂x (x, yx ) exists and is equal to −dx ϕn . In fact, working in a
chart around x, since c_{yx} = c(·, yx ) is locally semi-concave, by Definition 6.1.3 of a
locally semi-concave function, there exists a linear map lx such that
c(z, yx ) ≤ c(x, yx ) + lx (z − x) + o(|z − x|),
for z in a neighborhood of x. Using also that ϕn is differentiable at x, we get
\[ \begin{aligned}
\varphi_n(x) - \psi(y_x) + d_x\varphi_n(z - x) + o(|z - x|) &= \varphi_n(z) - \psi(y_x) \\
&\ge -c(z, y_x) \ge -c(x, y_x) - l_x(z - x) + o(|z - x|) \\
&= \varphi_n(x) - \psi(y_x) - l_x(z - x) + o(|z - x|).
\end{aligned} \]
This implies that lx = −dx ϕn , and that c_{yx} is differentiable at x with differential at x
equal to −dx ϕn . Setting now Gx = {y ∈ N | ψ(y) − ϕ(x) = c(x, y)}, we have just shown
that {x} × (Gx ∩ Vn ) ⊂ D(Λ^l_c) for each x ∈ Cn , and also ∂c/∂x (x, y) = −dx ϕn , for every
y ∈ Gx ∩ Vn . Recalling now that, by hypothesis (ii), the cost c satisfies the left
twist condition, we obtain that Gx ∩ Vn is reduced to a single element which is uniquely
characterized by the equality
\[ \frac{\partial c}{\partial x}(x, y_x) = -d_x\varphi_n. \]
So we have proved that G ∩ (M × Vn ) is the graph over Pn ∩ F of the map T defined
uniquely, thanks to the left twist condition, by
\[ \frac{\partial c}{\partial x}(x, T(x)) = -d_x\varphi_n \]
(observe that, since ϕn ≤ ϕk for k ≥ n with equality on Pn , we have dx ϕn |Pn = dx ϕk |Pn
for k ≥ n). Since Pn+1 ⊃ Pn , and Vn ⊂ Vn+1 ↗ N , we can conclude that G(ϕ,ψ) is a
graph over ∪n Pn ∩ F = P ∩ F (where P = πM (G(ϕ,ψ) ) = ∪n Pn ).
Observe that, at the moment, we do not know that T is a Borel map, since Pn is not
a priori Borel. Note first that, by definition of Cn ⊂ Cn+1 , we have ϕn = ϕn+1 on Cn , and
they are both differentiable at every point of Cn . Since ϕn ≤ ϕn+1 everywhere, by the
same argument as above we get dx ϕn = dx ϕn+1 for x ∈ Cn . Thus, setting C = ∪n Cn ,
we can extend T to M by
\[ T(x) =
\begin{cases}
\pi_N \Lambda^{l,inv}_c(x, -d_x\varphi_n) & \text{on } C_n, \\
\bar{y} & \text{on } M \setminus C,
\end{cases} \]
where πN : M × N → N is the canonical projection, Λ^{l,inv}_c is the Borel extension of
(Λ^l_c)^{-1} defined after Lemma 1.2.5, and ȳ is an arbitrary but fixed point in N . Obviously,
the map T thus defined is Borel measurable and extends the map T already defined on
P \ E.
In the case where µ is absolutely continuous with respect to Lebesgue measure we

can give a complement to our main theorem. In order to state it, we need the following
definition, see [11, Definition 5.5.1, page 129]:
Definition 1.3.3 (Approximate differential). We say that f : M → R has an

approximate differential at x ∈ M if there exists a function h : M → R differentiable at
x such that the set {f = h} has density 1 at x with respect to the Lebesgue measure
(this just means that the density is 1 in charts). In this case, the approximate value of
f at x is defined as f˜(x) = h(x), and the approximate differential of f at x is defined
as d˜x f = dx h. It is not difficult to show that this definition makes sense. In fact, both
h(x), and dx h do not depend on the choice of h, provided x is a density point of the set
{f = h}.
Another characterization of the approximate value f˜(x), and of the approximate
differential d˜x f is given, in charts, saying that the sets
\[ \left\{ y \;\middle|\; \frac{|f(y) - \tilde{f}(x) - \tilde{d}_x f(y - x)|}{|y - x|} > \varepsilon \right\} \]
have density 0 at x for each ε > 0 with respect to Lebesgue measure. This last definition
is the one systematically used in [70]. On the other hand, for our purpose, Definition
1.3.3 is more convenient.
The set of points x ∈ M where the approximate derivative d˜x f exists is measurable;
moreover, the map x ↦ d˜x f is also measurable, see [70, Theorem 3.1.4, page 214].
Complement 1.3.4. Under the hypotheses of Theorem 1.3.2, if we assume that µ is
absolutely continuous with respect to Lebesgue measure (this is stronger than condition
(iii) of Theorem 1.3.2), then for any calibrated pair (ϕ, ψ), the function ϕ is
approximately differentiable µ-a.e., and the optimal transport map T is uniquely determined
µ-a.e., thanks to the twist condition, by
\[ \frac{\partial c}{\partial x}(x, T(x)) = -\tilde{d}_x\varphi, \]
where d˜x ϕ is the approximate differential of ϕ at x. Moreover, there exists a Borel
subset A ⊂ M of full µ measure such that d˜x ϕ exists on A, the map x ↦ d˜x ϕ is Borel
measurable on A, and ∂c/∂x (x, T (x)) exists for x ∈ A (i.e. (x, T (x)) ∈ D(Λ^l_c)).
∂x
Proof. We will use the notation and the proof of Theorems 1.3.1 and 1.3.2. We denote
by Ãn ⊂ Bn the set of x ∈ Bn which are density points for Bn with respect to some
measure λ whose measure class in charts is that of Lebesgue (for example one can take λ
as the Riemannian measure associated to a Riemannian metric). By Lebesgue’s Density
Theorem λ(Bn \ Ãn ) = 0. Since µ is absolutely continuous with respect to Lebesgue
measure, we have µ(Ãn ) = µ(Bn ), and therefore Ã = ∪n Ãn is of full µ-measure, since
µ(Bn ) ↗ µ(B) = 1. Moreover, since ϕ = ϕn on Bn , and ϕn is differentiable at each
point of Bn , the function ϕ is approximately differentiable at each point of Ãn with
d˜x ϕ = dx ϕn .
The last part of this complement on measurability follows of course from [70, Theorem
3.1.4, page 214]. But in this case, we can give a direct simple proof. We choose An ⊂ Ãn
Borel measurable with µ(Ãn \An ) = 0. We set A = ∪n An . The set A is of full µ measure.
Moreover, for every x ∈ An , the approximate differential d˜x ϕ exists and is equal to dx ϕn .
Thus it suffices to show that the map x ↦ dx ϕn is Borel measurable, and this follows as
in Lemma 1.2.5.
1.4 Costs obtained from Lagrangians

Now that we have proved Theorem 1.3.2, we want to observe that the hypotheses are
satisfied by a large class of cost functions.
We will consider first the case of a Tonelli Lagrangian L on a connected manifold (see
Definition 6.2.4 of the Appendix for the definition of a Tonelli Lagrangian). For t > 0,
the cost c_{t,L} : M × M → R associated to L is given by
\[ c_{t,L}(x, y) = \inf_{\gamma} \int_0^t L(\gamma(s), \dot{\gamma}(s)) \, ds, \]
where the infimum is taken over all the continuous piecewise C^1 curves γ : [0, t] → M ,
with γ(0) = x, and γ(t) = y (see Definition 6.2.18 of the Appendix).
Proposition 1.4.1. If L : T M → R is a Tonelli Lagrangian on the connected manifold
M , then, for t > 0, the cost ct,L : M × M → R associated to the Lagrangian L is
continuous, bounded from below, and satisfies conditions (i) and (ii) of Theorem 1.3.2.
Proof. Since L is a Tonelli Lagrangian, observe that L is bounded below by C, where C is
the constant given in condition (c) of Definition 6.2.4. Hence the cost ct,L is bounded be-
low by tC. By Theorem 6.2.19 of the Appendix, the cost ct,L is locally semi-concave, and
therefore continuous. Moreover, we can now apply Proposition 6.1.17 of the Appendix
to conclude that ct,L satisfies condition (i) of Theorem 1.3.2.
The twist condition (ii) of Theorem 1.3.2 for ct,L follows from Lemma 6.2.22 and
Proposition 6.2.23.
For costs coming from Tonelli Lagrangians, we summarize the application of the main
Theorem 1.3.2, and its Complement 1.3.4.
Theorem 1.4.2. Let L be a Tonelli Lagrangian on the connected manifold M . Fix t > 0,
µ, ν a pair of probability measures on M , with µ giving measure zero to countably
(n − 1)-Lipschitz sets, and assume that the infimum in the Kantorovitch problem (1.2.1) with
cost c_{t,L} is finite. Then there exists a uniquely µ-almost everywhere defined transport
map T : M → M from µ to ν which is optimal for the cost c_{t,L}. Moreover, any plan
γ ∈ Π(µ, ν), which is optimal for the cost c_{t,L}, verifies γ(Graph(T )) = 1.
If µ is absolutely continuous with respect to Lebesgue measure, and (ϕ, ψ) is a
c_{t,L}-calibrated subsolution for (µ, ν), then we can find a Borel set B of full µ measure, such
that the approximate differential d˜x ϕ of ϕ at x is defined for x ∈ B, the map x ↦ d˜x ϕ
is Borel measurable on B, and the transport map T is defined on B (hence µ-almost
everywhere) by
\[ T(x) = \pi^* \phi^H_t(x, \tilde{d}_x\varphi), \]
where π^* : T^*M → M is the canonical projection, and φ^H_t is the Hamiltonian flow of the
Hamiltonian H associated to L.
We can also give the following description for T valid on B (hence µ-almost
everywhere):
\[ T(x) = \pi \phi^L_t\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr), \]
where φ^L_t is the Euler-Lagrange flow of L, and x ↦ $\widetilde{\mathrm{grad}}^L_x(\varphi)$ is the measurable vector
field on M defined on B by
\[ \frac{\partial L}{\partial v}\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr) = \tilde{d}_x\varphi. \]
Moreover, for every x ∈ B, there is a unique L-minimizer γ : [0, t] → M , with γ(0) =
x, γ(t) = T (x), and this curve γ is given by γ(s) = πφ^L_s(x, $\widetilde{\mathrm{grad}}^L_x(\varphi)$), for 0 ≤ s ≤ t.
Proof. The first part is a consequence of Proposition 1.4.1 and Theorem 1.3.2. When µ
is absolutely continuous with respect to Lebesgue measure, we can apply Complement
1.3.4 to obtain a Borel subset A ⊂ M of full µ measure such that, for every x ∈ A, we
have (x, T (x)) ∈ D(Λ^l_{c_{t,L}}) and
\[ \frac{\partial c_{t,L}}{\partial x}(x, T(x)) = -\tilde{d}_x\varphi. \]
By Lemma 6.2.22 and Proposition 6.2.23, if (x, y) ∈ D(Λ^l_{c_{t,L}}), then there is a unique
L-minimizer γ : [0, t] → M , with γ(0) = x, γ(t) = y, and this minimizer is of the form
γ(s) = πφ^L_s(x, v), where π : T M → M is the canonical projection, and v ∈ Tx M is
uniquely determined by the equation
\[ \frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, v). \]
Therefore T (x) = πφ^L_t(x, $\widetilde{\mathrm{grad}}^L_x(\varphi)$), where $\widetilde{\mathrm{grad}}^L_x(\varphi)$ is uniquely determined by
\[ \frac{\partial L}{\partial v}\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr) = -\frac{\partial c_{t,L}}{\partial x}(x, T(x)) = \tilde{d}_x\varphi, \]
which is precisely the second description of T . The first description of T follows from
the second one, once we observe that
\[ \begin{aligned}
\mathcal{L}\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr) &= \Bigl(x, \frac{\partial L}{\partial v}\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr)\Bigr) = (x, \tilde{d}_x\varphi), \\
\phi^H_t &= \mathcal{L} \circ \phi^L_t \circ \mathcal{L}^{-1}, \\
\pi^* \circ \mathcal{L} &= \pi,
\end{aligned} \]
where ℒ : T M → T^*M is the global Legendre Transform, see Definition 6.2.8 of the
Appendix.
We now turn to the proof of Theorem 1.1.1, which is not a consequence of Theorem
1.4.2 since the cost d^r with r > 1 does not come from a Tonelli Lagrangian for r ≠ 2.
Theorem 1.4.3. Suppose that the connected manifold M is endowed with a Riemannian
metric g which is complete. Denote by d the Riemannian distance. If r > 1, and µ and ν
are probability (Borel) measures on M , where µ gives measure zero to countably
(n − 1)-Lipschitz sets, and
\[ \int_M d^r(x, x_0) \, d\mu(x) < \infty \quad \text{and} \quad \int_M d^r(x, x_0) \, d\nu(x) < \infty \]
for some given x0 ∈ M , then we can find a transport map T : M → M , with T# µ = ν,
which is optimal for the cost d^r on M × M . Moreover, the map T is uniquely determined
µ-almost everywhere.
If µ is absolutely continuous with respect to Lebesgue measure, and (ϕ, ψ) is a
calibrated subsolution for the cost d^r(x, y) and the pair of measures (µ, ν), then the
approximate differential d˜x ϕ of ϕ at x is defined µ-almost everywhere, and the transport map
T is defined µ-almost everywhere by
\[ T(x) = \exp_x\left( \frac{\tilde{\nabla}_x\varphi}{r^{1/(r-1)} \, \|\tilde{\nabla}_x\varphi\|_x^{(r-2)/(r-1)}} \right), \]
where the approximate Riemannian gradient ∇̃ϕ of ϕ is defined by
\[ g_x(\tilde{\nabla}_x\varphi, \cdot) = \tilde{d}_x\varphi, \]
and exp : T M → M is the exponential map of g on T M , which is globally defined since
M is complete.
Proof. We first remark that
\[ \begin{aligned}
d^r(x, y) &\le [d(x, x_0) + d(x_0, y)]^r \\
&\le [2 \max(d(x, x_0), d(x_0, y))]^r \\
&\le 2^r [d(x, x_0)^r + d(y, x_0)^r].
\end{aligned} \]
Therefore
\[ \begin{aligned}
\int_{M \times M} d^r(x, y) \, d\mu(x) \, d\nu(y) &\le \int_{M \times M} 2^r [d(x, x_0)^r + d(y, x_0)^r] \, d\mu(x) \, d\nu(y) \\
&= 2^r \int_M d^r(x, x_0) \, d\mu(x) + 2^r \int_M d^r(y, x_0) \, d\nu(y) < \infty,
\end{aligned} \]
and thus the infimum in the Kantorovitch problem (1.2.1) with cost d^r is finite.
By Example 6.2.5, the Lagrangian L_{r,g}(x, v) = ‖v‖_x^r = g_x(v, v)^{r/2} is a weak Tonelli
Lagrangian. By Proposition 6.2.24, the non-negative and continuous cost d^r(x, y) is
precisely the cost c_{1,L_{r,g}}. Therefore this cost is locally semi-concave by Theorem 6.2.19.
By Proposition 6.1.17, this implies that d^r(x, y) satisfies condition (i) of Theorem 1.3.2.
The fact that the cost d^r(x, y) satisfies the left twist condition (ii) of Theorem 1.3.2
follows from Proposition 6.2.24. Therefore there is an optimal transport map T .
If the measure µ is absolutely continuous with respect to Lebesgue measure, and
(ϕ, ψ) is a calibrated subsolution for the cost d^r(x, y) and the pair of measures (µ, ν),
then by Complement 1.3.4, for µ-almost every x, we have (x, T (x)) ∈ D(Λ^l_{c_{1,L_{r,g}}}), and
\[ \frac{\partial c_{1,L_{r,g}}}{\partial x}(x, T(x)) = -\tilde{d}_x\varphi. \]
Since (x, T (x)) is in D(Λ^l_{c_{1,L_{r,g}}}), it follows from Proposition 6.2.24 that T (x) = πφ^g_1(x, vx ),
where π : T M → M is the canonical projection, the flow φ^g_t is the geodesic flow of g on
T M , and vx ∈ Tx M is determined by
\[ \frac{\partial c_{1,L_{r,g}}}{\partial x}(x, T(x)) = -\frac{\partial L_{r,g}}{\partial v}(x, v_x), \]
or, given the equality above, by
\[ \frac{\partial L_{r,g}}{\partial v}(x, v_x) = \tilde{d}_x\varphi. \]
Now the vertical derivative of L_{r,g} is computed in Example 6.2.5:
\[ \frac{\partial L_{r,g}}{\partial v}(x, v) = r \|v\|_x^{r-2} \, g_x(v, \cdot). \]
Hence vx ∈ Tx M is determined by
\[ r \|v_x\|_x^{r-2} \, g_x(v_x, \cdot) = \tilde{d}_x\varphi = g_x(\tilde{\nabla}_x\varphi, \cdot). \]
This gives the equality
\[ r \|v_x\|_x^{r-2} \, v_x = \tilde{\nabla}_x\varphi, \]
from which we easily get
\[ v_x = \frac{\tilde{\nabla}_x\varphi}{r^{1/(r-1)} \, \|\tilde{\nabla}_x\varphi\|_x^{(r-2)/(r-1)}}. \]
Therefore
\[ T(x) = \pi\phi^g_1\left(x, \frac{\tilde{\nabla}_x\varphi}{r^{1/(r-1)} \, \|\tilde{\nabla}_x\varphi\|_x^{(r-2)/(r-1)}}\right). \]
By definition of the exponential map exp : T M → M , we have exp_x(v) = πφ^g_1(x, v), and
the formula for T (x) follows.
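The step "from which we easily get" can be spelled out as follows. This short derivation is added for illustration; it assumes that the approximate gradient does not vanish at x (when it vanishes, the equation forces v_x = 0 and T(x) = x). The special case r = 2 is then read off directly.

```latex
% Take norms in  r \|v_x\|_x^{r-2} v_x = \tilde\nabla_x\varphi :
r \|v_x\|_x^{r-1} = \|\tilde\nabla_x\varphi\|_x
\quad\Longrightarrow\quad
\|v_x\|_x = \left( \frac{\|\tilde\nabla_x\varphi\|_x}{r} \right)^{1/(r-1)} .
% Substitute back, using  1 - \frac{r-2}{r-1} = \frac{1}{r-1} :
v_x = \frac{\tilde\nabla_x\varphi}{r \, \|v_x\|_x^{r-2}}
    = \frac{\tilde\nabla_x\varphi}
           {r \left( \|\tilde\nabla_x\varphi\|_x / r \right)^{(r-2)/(r-1)}}
    = \frac{\tilde\nabla_x\varphi}
           {r^{1/(r-1)} \, \|\tilde\nabla_x\varphi\|_x^{(r-2)/(r-1)}} .
% Special case r = 2 (cost d^2):  r^{1/(r-1)} = 2  and  (r-2)/(r-1) = 0, so
T(x) = \exp_x\!\left( \tfrac{1}{2} \tilde\nabla_x\varphi \right),
% which in Euclidean space R^n reads  T(x) = x + \tfrac{1}{2} \tilde\nabla\varphi(x).
```

The Euclidean case agrees with a direct computation: for c(x, y) = |x − y|^2 one has ∂c/∂x (x, y) = 2(x − y), so the relation ∂c/∂x (x, T (x)) = −d˜x ϕ gives T (x) = x + ∇̃ϕ(x)/2.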
1.5 The interpolation and its absolute continuity

For a cost ct,L coming from a Tonelli Lagrangian L, Theorem 1.4.2 shows not only that
we have an optimal transport map T but also that this map is obtained by following
an extremal for time t. We can therefore interpolate the optimal transport by maps Ts
where we stop at intermediary times s ∈ [0, t]. We will show in this section that these
maps are also optimal transport maps for costs coming from the same Lagrangian. Let
us give now precise definitions.
For the rest of this section, we consider a Tonelli Lagrangian L on the connected
manifold M . We fix t > 0 and µ0 and µt two probability measures on M , with µ0
absolutely continuous with respect to Lebesgue measure, and such that
\[ \min_{\gamma \in \Pi(\mu_0, \mu_t)} \left\{ \int_{M \times M} c_{t,L}(x, y) \, d\gamma(x, y) \right\} < +\infty. \]
We call Tt the optimal transport map given by Theorem 1.4.2 for (c_{t,L}, µ0 , µt ). By that
theorem, γt = (IdM ×̃ Tt )# µ0 is the unique optimal plan from µ0 to µt , and we denote
by (ϕ, ψ) a fixed (c_{t,L}, γt )-calibrated pair. By Theorem 1.4.2, we can find a Borel subset B ⊂ M such
that:
• the subset B is of full µ0 measure;
• the approximate differential d˜x ϕ exists for every x ∈ B, and is Borel measurable
on B;
• the map Tt is defined at every x ∈ B, and we have
\[ T_t(x) = \pi\phi^L_t\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr), \]
where φ^L_t is the Euler-Lagrange flow of L, π : T M → M is the canonical projection,
and the Lagrangian approximate gradient x ↦ $\widetilde{\mathrm{grad}}^L_x(\varphi)$ is defined by
\[ \frac{\partial L}{\partial v}\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr) = \tilde{d}_x\varphi; \]
• for every x ∈ B, the partial derivative ∂c_{t,L}/∂x (x, Tt (x)) exists, and is uniquely defined
by
\[ \frac{\partial c_{t,L}}{\partial x}(x, T_t(x)) = -\tilde{d}_x\varphi; \]
• for every x ∈ B there exists a unique L-minimizer γx : [0, t] → M between x and
Tt (x). This L-minimizer γx is given by
\[ \forall s \in [0, t], \quad \gamma_x(s) = \pi\phi^L_s\bigl(x, \widetilde{\mathrm{grad}}^L_x(\varphi)\bigr); \]
• for every x ∈ B, we have
ψ(Tt (x)) − ϕ(x) = c_{t,L}(x, Tt (x)).
We now make the following important remark, which we will also need later:
Remark 1.5.1. We observe that, for µ0 -a.e. x, there exists a unique curve from x to
$T_t(x)$ that minimizes the action. In fact, since $\frac{\partial c}{\partial x}(x, y)$ exists at $y = T_t(x)$ for µ0 -a.e. x,
the twist condition proved in Section 1.4 tells us that the velocity of such a curve at time 0
is uniquely determined µ0 -a.e.
For s ∈ [0, t], we can therefore define µ0 -a.e. an interpolation $T_s : M \to M$ by
\[ \forall x \in B, \quad T_s(x) = \gamma_x(s) = \pi\phi^L_s\big(x, \widetilde{\operatorname{grad}}^L_x(\varphi)\big). \]
Each map $T_s$ is Borel measurable. In fact, since the global Legendre transform is a
homeomorphism and the approximate differential is Borel measurable, the Lagrangian
approximate gradient $\widetilde{\operatorname{grad}}^L(\varphi)$ is itself Borel measurable. Moreover the map
$\pi\phi^L_s : TM \to M$ is continuous, and thus $T_s$ is Borel measurable. We can therefore define the
probability measure $\mu_s = T_{s\#}\mu_0$ on M , i.e. the measure $\mu_s$ is the image of $\mu_0$ under the
Borel measurable map $T_s$.
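To make the interpolation concrete, here is a minimal Python sketch in the simplest setting, which is an assumption of this example and not the generality of the text: M is the real line and L(x, v) = v²/2, so extremals are straight lines and T_s(x) = x + (s/t)(T_t(x) − x). The measures are replaced by equal-weight samples (hence not absolutely continuous), and T_t is taken to be the monotone rearrangement, which is optimal for the quadratic cost on the line. The function names `monotone_map` and `interpolate` are hypothetical.

```python
def monotone_map(x0, x1):
    """Send the k-th smallest point of x0 to the k-th smallest point of x1.

    Returns the list of images, indexed like x0.
    """
    order0 = sorted(range(len(x0)), key=lambda i: x0[i])
    sorted_targets = sorted(x1)
    target = [0.0] * len(x0)
    for rank, i in enumerate(order0):
        target[i] = float(sorted_targets[rank])
    return target


def interpolate(x0, target, s, t=1.0):
    """Stop each straight-line extremal at time s: T_s(x) = x + (s/t)(T_t(x) - x)."""
    return [x + (s / t) * (y - x) for x, y in zip(x0, target)]


x0 = [0.0, 2.0, 1.0]                    # sample of mu_0
x1 = [5.0, 3.0, 4.0]                    # sample of mu_t (with t = 1)
T_t = monotone_map(x0, x1)              # images T_t(x) of the points of x0
T_half = interpolate(x0, T_t, s=0.5)    # images T_s(x) at s = 1/2
```

In this toy case the intermediate map is injective on the sample because the monotone matching preserves order, which is the one-dimensional shadow of the injectivity statement proved below.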
Theorem 1.5.2. Under the hypotheses above, the maps $T_s$ satisfy the following
properties:
(i) For every s ∈ (0, t), the map $T_s$ is the (unique) optimal transport map for the cost
$c_{s,L}$ and the pair of measures $(\mu_0, \mu_s)$.
(ii) For every s ∈ (0, t), the map $T_s : B \to M$ is injective. Moreover, if we define
$\bar c_{s,L}(x, y) = c_{s,L}(y, x)$, the inverse map $T_s^{-1} : T_s(B) \to B$ is the (unique) optimal
transport map for the cost $\bar c_{s,L}$ and the pair of measures $(\mu_s, \mu_0)$, and it is countably
Lipschitz (i.e. there exists a countable Borel partition of M such that $T_s^{-1}$ is
Lipschitz on each set).
(iii) For every s ∈ (0, t), the measure $\mu_s = T_{s\#}\mu_0$ is absolutely continuous with respect
to Lebesgue measure.
(iv) For every s ∈ (0, t), the composition $\hat T_s = T_t \circ T_s^{-1}$ is the (unique) optimal transport
map for the cost $c_{t-s,L}$ and the pair of measures $(\mu_s, \mu_t)$, and it is countably
Lipschitz.
Proof. Fix s ∈ (0, t). It is not difficult to see, from the definition of $c_{t,L}$, that
\[ \forall x, y, z \in M, \quad c_{t,L}(x, z) \le c_{s,L}(x, y) + c_{t-s,L}(y, z), \tag{1.5.1} \]
and even that
\[ \forall x, z \in M, \quad c_{t,L}(x, z) = \inf_{y\in M} \big[ c_{s,L}(x, y) + c_{t-s,L}(y, z) \big]. \]
If $\gamma : [a, b] \to M$ is an L-minimizer, the restriction $\gamma|_{[c,d]}$ to a subinterval $[c, d] \subset [a, b]$ is
also an L-minimizer. In particular, we obtain
\[ \forall s \in\, ]a, b[, \quad c_{b-a,L}(\gamma(a), \gamma(b)) = c_{s-a,L}(\gamma(a), \gamma(s)) + c_{b-s,L}(\gamma(s), \gamma(b)). \]
Applying this to the L-minimizer $\gamma_x$, we get
\[ \forall x \in B, \quad c_{t,L}(x, T_t(x)) = c_{s,L}(x, T_s(x)) + c_{t-s,L}(T_s(x), T_t(x)). \tag{1.5.2} \]
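As a sanity check, the subadditivity (1.5.1) and the infimum formula can be verified by hand in the simplest case, assumed here purely for illustration: $M = \mathbb{R}^n$ and $L(x, v) = \frac12 |v|^2$, for which $c_{t,L}(x, y) = |x - y|^2/(2t)$. The point $y^\ast$ below is notation introduced only for this sketch.

```latex
\begin{align*}
c_{s,L}(x,y) + c_{t-s,L}(y,z) &= \frac{|x-y|^2}{2s} + \frac{|y-z|^2}{2(t-s)}, \\
\text{critical point in } y:\quad \frac{y-x}{s} = \frac{z-y}{t-s}
 &\iff y^\ast = x + \frac{s}{t}\,(z-x), \\
\frac{|x-y^\ast|^2}{2s} + \frac{|y^\ast-z|^2}{2(t-s)}
 &= \frac{s\,|z-x|^2}{2t^2} + \frac{(t-s)\,|z-x|^2}{2t^2}
 = \frac{|x-z|^2}{2t} = c_{t,L}(x,z).
\end{align*}
```

The minimizing point $y^\ast$ is the point reached at time s on the straight line from x to z, which is the Euclidean counterpart of equality along an L-minimizer.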
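We define for every s ∈ (0, t) two probability measures $\gamma_s$, $\hat\gamma_s$ on M × M by
\[ \gamma_s = (\mathrm{Id}_M \tilde\times T_s)_\# \mu_0 \quad\text{and}\quad \hat\gamma_s = (T_s \tilde\times T_t)_\# \mu_0, \]
where $\mathrm{Id}_M \tilde\times T_s$ and $T_s \tilde\times T_t$ are the maps from the subset B of full µ0 measure to M × M
defined by
\[ (\mathrm{Id}_M \tilde\times T_s)(x) = (x, T_s(x)), \qquad (T_s \tilde\times T_t)(x) = (T_s(x), T_t(x)). \]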
Note that the marginals of $\gamma_s$ are $(\mu_0, \mu_s)$, and those of $\hat\gamma_s$ are $(\mu_s, \mu_t)$. We claim that
$c_{s,L}$ is integrable for $\gamma_s$ and $c_{t-s,L}$ is integrable for $\hat\gamma_s$. In fact, we have
$C = \inf_{TM} L > -\infty$, hence $c_{r,L} \ge Cr$ for every r > 0. Therefore, the equality (1.5.2) gives
∀x ∈ B, [ct,L (x, Tt (x)) − Ct] = [cs,L (x, Ts (x)) − Cs] + [ct−s,L (Ts (x), Tt (x)) − C(t − s)].
Since the functions between brackets are all non-negative, we can integrate this equality
with respect to µ0 to obtain
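\begin{align*}
\int_{M\times M} [c_{t,L}(x, y) - Ct]\, d\gamma_t &= \int_{M\times M} [c_{s,L}(x, y) - Cs]\, d\gamma_s \\
&\quad + \int_{M\times M} [c_{t-s,L}(x, y) - C(t - s)]\, d\hat\gamma_s .
\end{align*}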
But all numbers involved in the equality above are non-negative, all measures are proba-
bility measures, and the cost ct,L is γt integrable since γt is an optimal plan for (ct,L , µ0 , µt ),
and the optimal cost of (ct,L , µ0 , µt ) is finite. Therefore we obtain that cs,L is γs -integrable,
and ct−s,L is γ̂s -integrable.
Since by definition of a calibrating pair we have ϕ > −∞ and ψ < +∞ everywhere
on M , we can find an increasing sequence of compact subsets Kn ⊂ M with ∪n Kn = M ,
and we consider $V_n = K_n \cap \{\varphi \ge -n\}$, $V_n' = K_n \cap \{\psi \le n\}$, so that $\cup_n V_n = \cup_n V_n' = M$.
We define the functions $\varphi^n_s, \psi^n_s : M \to \mathbb{R}$ by
\[ \psi^n_s(z) = \inf_{\tilde z\in V_n} \big[ \varphi(\tilde z) + c_{s,L}(\tilde z, z) \big], \qquad
 \varphi^n_s(z) = \sup_{\tilde z\in V_n'} \big[ \psi(\tilde z) - c_{t-s,L}(z, \tilde z) \big], \]
where $(\varphi, \psi)$ is the fixed $c_{t,L}$-calibrated pair. Note that $\psi^n_s$ is bounded from below by
$-n + s \inf_{TM} L > -\infty$. Moreover, the family of functions $(\varphi(\tilde z) + c_{s,L}(\tilde z, \cdot))_{\tilde z\in V_n}$ is locally
uniformly semi-concave with a linear modulus, since this is the case for the family of
functions $(c_{s,L}(\tilde z, \cdot))_{\tilde z\in V_n}$, by Theorem 6.2.19 and Proposition 6.1.17. It follows from
Proposition 6.1.16 that $\psi^n_s$ is semi-concave with a linear modulus. A similar argument
proves that $-\varphi^n_s$ is semi-concave with a linear modulus. Note also that, since $V_n$ and $V_n'$
are both increasing sequences, we have $\psi^n_s \ge \psi^{n+1}_s$ and $\varphi^n_s \le \varphi^{n+1}_s$, for every n. Therefore
we can define $\varphi_s$ (resp. $\psi_s$) as the pointwise limit of the sequence $\varphi^n_s$ (resp. $\psi^n_s$).
Using the fact that (ϕ, ψ) is a ct,L -subsolution, and inequality (1.5.1) above, we obtain
∀x, y, z ∈ M, ψ(y) − ct−s,L (z, y) ≤ ϕ(x) + cs,L (x, z).
Therefore we obtain for $x \in V_n$, $y \in V_n'$, $z \in M$
\[ \psi(y) - c_{t-s,L}(z, y) \le \varphi^n_s(z) \le \varphi_s(z) \le \psi_s(z) \le \psi^n_s(z) \le \varphi(x) + c_{s,L}(x, z). \tag{1.5.3} \]
Inequality (1.5.3) above yields
∀x, y, z ∈ M, ψ(y) − ct−s,L (z, y) ≤ ϕs (z) ≤ ψs (z) ≤ ϕ(x) + cs,L (x, z). (1.5.4)
In particular, the pair (ϕ, ψs ) is a cs,L -subsolution, and the pair (ϕs , ψ) is a ct−s,L -
subsolution. Moreover, ϕ, ψs , ϕs and ψ are all Borel measurable.
We now define
\[ B_n = B \cap V_n \cap T_t^{-1}(V_n'), \]
so that $\cup_n B_n = B$ has full µ0 -measure.
If $x \in B_n$, it satisfies $x \in V_n$ and $T_t(x) \in V_n'$. From Inequality (1.5.3) above, we
obtain
\begin{align*}
\psi(T_t(x)) - c_{t-s,L}(T_s(x), T_t(x)) &\le \varphi^n_s(T_s(x)) \le \varphi_s(T_s(x)) \\
&\le \psi_s(T_s(x)) \le \psi^n_s(T_s(x)) \le \varphi(x) + c_{s,L}(x, T_s(x)).
\end{align*}
Since $B_n \subset B$, for $x \in B_n$, we have $\psi(T_t(x)) - \varphi(x) = c_{t,L}(x, T_t(x))$. Combining this
with Equality (1.5.2), we conclude that the two extreme terms in the inequality above
are equal. Hence, for every $x \in B_n$, we have
\begin{align*}
\psi(T_t(x)) - c_{t-s,L}(T_s(x), T_t(x)) &= \varphi^n_s(T_s(x)) = \varphi_s(T_s(x)) \\
&= \psi_s(T_s(x)) = \psi^n_s(T_s(x)) = \varphi(x) + c_{s,L}(x, T_s(x)). \tag{1.5.5}
\end{align*}
In particular, we get
∀x ∈ B, ψs (Ts (x)) = ϕ(x) + cs,L (x, Ts (x)),
or equivalently
ψs (y) − ϕ(x) = cs,L (x, y) for γs -a.e. (x, y).
Since $(\varphi, \psi_s)$ is a (Borel) $c_{s,L}$-subsolution, it follows that the pair $(\varphi, \psi_s)$ is $(c_{s,L}, \gamma_s)$-calibrated.
Therefore, by Theorem 1.2.3 we get that $\gamma_s = (\mathrm{Id}_M \tilde\times T_s)_\# \mu_0$ is an optimal
plan for $(c_{s,L}, \mu_0, \mu_s)$. Moreover, since $c_{s,L}$ is $\gamma_s$-integrable, the infimum in the Kantorovitch
problem (1.2.1) in Theorem 1.3.2 with cost $c_{s,L}$ is finite, and therefore there
exists a unique optimal transport plan. This proves (i).
Note for further reference that a similar argument, using the equality
\[ \forall x \in B, \quad \psi(T_t(x)) = \varphi_s(T_s(x)) + c_{t-s,L}(T_s(x), T_t(x)), \]
which follows from Equation (1.5.5) above, shows that the measure $\hat\gamma_s = (T_s \tilde\times T_t)_\# \mu_0$ is
an optimal plan for the cost $c_{t-s,L}$ and the pair of measures $(\mu_s, \mu_t)$.
We now want to prove (ii). Since B is the increasing union of $B_n = B \cap V_n \cap T_t^{-1}(V_n')$,
it suffices to prove that $T_s$ is injective on $B_n$ and that the restriction $T_s^{-1}|_{T_s(B_n)}$ is locally
Lipschitz on $T_s(B_n)$.
Since Bn ⊂ Vn , by Inequality (1.5.3) above we have
∀x ∈ Bn , ∀y ∈ M, ϕns (y) ≤ ψsn (y) ≤ ϕ(x) + cs,L (x, y). (1.5.6)
Moreover, by Equality (1.5.5) above
∀x ∈ Bn , ϕns (Ts (x)) = ψsn (Ts (x)) = ϕ(x) + cs,L (x, Ts (x)). (1.5.7)
In particular, we have ϕns ≤ ψsn everywhere with equality at every point of Ts (Bn ). As
we have said above, both functions ψsn and −ϕns are locally semi-concave with a linear
modulus. It follows, from Theorem 6.1.19, that both derivatives dz ϕns , dz ψsn exist and
are equal for z ∈ Ts (Bn ). Moreover, the map z 7→ dz ϕns = dz ψsn is locally Lipschitz on
$T_s(B_n)$. Note that we also get from (1.5.6) and (1.5.7) above that for a fixed $x \in B_n$, we
have $\varphi^n_s \le \varphi(x) + c_{s,L}(x, \cdot)$ everywhere with equality at $T_s(x)$. Since $\varphi^n_s$ is semi-convex,
using that $c_{s,L}(x, \cdot)$ is semi-concave, again by Theorem 6.1.19, we obtain that the partial
derivative $\frac{\partial c_{s,L}}{\partial y}(x, T_s(x))$ of $c_{s,L}$ with respect to the second variable exists and is equal
to $d_{T_s(x)}\varphi^n_s = d_{T_s(x)}\psi^n_s$. Since $\gamma_x : [0, t] \to M$ is an L-minimizer with $\gamma_x(0) = x$ and
$\gamma_x(s) = T_s(x)$, it follows from Corollary 6.2.20 that
\[ d_{T_s(x)}\psi^n_s = \frac{\partial c_{s,L}}{\partial y}(x, T_s(x)) = \frac{\partial L}{\partial v}(\gamma_x(s), \dot\gamma_x(s)). \]
Since $\gamma_x$ is an L-minimizer, its speed curve is an orbit of the Euler-Lagrange flow, and
therefore, denoting by $\mathcal{L}$ the global Legendre transform,
\[ (T_s(x), d_{T_s(x)}\psi^n_s) = \mathcal{L}(\gamma_x(s), \dot\gamma_x(s)) = \mathcal{L}\,\phi^L_s(\gamma_x(0), \dot\gamma_x(0)), \]
and
\[ x = \pi\phi^L_{-s}\mathcal{L}^{-1}(T_s(x), d_{T_s(x)}\psi^n_s). \]
It follows that $T_s$ is injective on $B_n$ with inverse given by the map $\theta_n : T_s(B_n) \to B_n$
defined, for $z \in T_s(B_n)$, by
\[ \theta_n(z) = \pi\phi^L_{-s}\mathcal{L}^{-1}(z, d_z\psi^n_s). \]
Note that the map $\theta_n$ is locally Lipschitz on $T_s(B_n)$, since this is the case for $z \mapsto d_z\psi^n_s$,
and both maps $\phi^L_{-s}$, $\mathcal{L}^{-1}$ are $C^1$, since L is a Tonelli Lagrangian. An analogous
argument proves the countably Lipschitz regularity of $\hat T_s = T_t \circ T_s^{-1}$ in part (iv). Finally
the optimality of Ts−1 simply follows from
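\begin{align*}
\min_{\gamma\in\Pi(\mu_s,\mu_0)} \Big\{ \int_{M\times M} \bar c_{s,L}(x, y)\, d\gamma(x, y) \Big\}
 &= \min_{\gamma\in\Pi(\mu_0,\mu_s)} \Big\{ \int_{M\times M} c_{s,L}(x, y)\, d\gamma(x, y) \Big\} \\
 &= \int_M c_{s,L}(x, T_s(x))\, d\mu_0(x) \\
 &= \int_M \bar c_{s,L}(y, T_s^{-1}(y))\, d\mu_s(y).
\end{align*}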
Part (iii) of the Theorem follows from part (ii). In fact, if A ⊂ M is Lebesgue neg-
ligible, the image Ts−1 (Ts (B) ∩ A) is also Lebesgue negligible, since Ts−1 is countably
Lipschitz on Ts (B), and therefore sends Lebesgue negligible subsets to Lebesgue neg-
ligible subsets. It remains to note, using that B is of full µ0 -measure, that µs (A) =
Ts# µ0 (A) = µ0 (Ts−1 (Ts (B) ∩ A)) = 0.
To prove part (iv), we already know that $\hat\gamma_s = (T_s \tilde\times T_t)_\# \mu_0$ is an optimal plan for the
cost $c_{t-s,L}$ and the measures $(\mu_s, \mu_t)$. Since the Borel set B is of full µ0 -measure, and
$T_s : B \to T_s(B)$ is a bijective Borel measurable map, we obtain that $T_s^{-1}$ is a Borel map,
and $\mu_0 = (T_s^{-1})_\# \mu_s$. It follows that
\[ \hat\gamma_s = (\mathrm{Id}_M \tilde\times T_t T_s^{-1})_\# \mu_s. \]
Therefore the composition Tt Ts−1 is an optimal transport map for the cost ct−s,L and the
pair of measures (µs , µt ), and it is the unique one since ct−s,L is γ̂s -integrable and µs is
absolutely continuous with respect to the Lebesgue measure.
Remark 1.5.3. We observe that, in proving the uniqueness statement in parts (i) and
(iv) of the above theorem, we needed the full generality of Theorem 1.4.2, in which we
only assume that the infimum in the Kantorovitch problem is finite. Indeed, assuming
\[ \int_{M\times M} c_{t,L}(x, y)\, d\mu_0(x)\, d\mu_t(y) < +\infty, \]
there is a priori no reason why the two integrals
\[ \int_{M\times M} c_{s,L}(x, y)\, d\mu_0(x)\, d\mu_s(y), \qquad \int_{M\times M} c_{t-s,L}(x, y)\, d\mu_s(x)\, d\mu_t(y) \]
should be finite. So the existence and uniqueness of a transport map in Theorem
1.3.2 under the integrability assumption on c with respect to µ ⊗ ν instead of assumption
(iv) would not have been enough to obtain Theorem 1.5.2.
Remark 1.5.4. We remark that, if µ0 and µt are not assumed to be absolutely
continuous, so that an optimal transport map does not necessarily exist, one can still define
an “optimal” interpolation (µs )0≤s≤t between µ0 and µt using some measurable selection
theorem (see [133, Chapter 7]). Then, adapting our proof, one still obtains that, for any
s ∈ (0, t), there exists a unique optimal transport map Ss for (c̄s,L , µs , µ0 ) (resp. a unique
optimal transport map Ŝs for (ct−s,L , µs , µt )), and this map is countably Lipschitz.
We also observe that, if the manifold is compact, our proof shows that the above
maps are globally Lipschitz (see [22]).
1.6 The Wasserstein space W2

Let (M, g) be a smooth complete Riemannian manifold, equipped with its geodesic distance
d and its volume measure vol. We denote by $P_2(M)$ the set of probability
measures on M with finite second moment, that is
\[ \int_M d^2(x, x_0)\, d\mu(x) < +\infty \quad \text{for a certain } x_0 \in M. \]
We remark that, by the triangle inequality for d, the definition does not depend on the
point $x_0$. The space $P_2(M)$ can be endowed with the so-called Wasserstein distance $W_2$:
\[ W_2^2(\mu_0, \mu_1) := \min_{\gamma\in\Pi(\mu_0,\mu_1)} \Big\{ \int_{M\times M} d^2(x, y)\, d\gamma(x, y) \Big\}. \]
The quantity $W_2$ will be called the Wasserstein distance of order 2 between µ0 and µ1.
It is well-known that it defines a finite metric on $P_2(M)$, and so one can speak about
geodesics in the metric space $(P_2, W_2)$. This space turns out, indeed, to be a length space
(see for example [132], [133]). We denote by $P_2^{ac}(M)$ the subset of $P_2(M)$ that consists
of the Borel probability measures on M that are absolutely continuous with respect to
vol.
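For intuition, the distance W2 can be computed explicitly in one special case, sketched below in Python under assumptions that are not those of the text: M is the real line and both measures are uniform on the same number of atoms. In that case the minimum in the definition is attained by matching sorted atoms, and interpolating matched atoms linearly gives a constant-speed curve between the two measures. The function names `w2_sorted` and `geodesic` are hypothetical.

```python
import math


def w2_sorted(xs, ys):
    """W2 between two uniform empirical measures on the line with equally many atoms.

    On the real line the quadratic cost is minimized by the monotone (sorted) matching.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(xs))


def geodesic(xs, ys, t):
    """Atoms of the interpolated measure at time t: matched atoms move on straight lines."""
    return [(1 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


mu0 = [0.0, 1.0, 2.0]
mu1 = [4.0, 6.0, 8.0]
d01 = w2_sorted(mu0, mu1)
mu_half = geodesic(mu0, mu1, 0.5)
```

With these definitions one can check numerically that `w2_sorted(mu0, mu_half)` and `w2_sorted(mu_half, mu1)` both equal half of `d01`, as expected along a constant-speed geodesic.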
From the results proved above, it is easy to deduce the following:
Proposition 1.6.1. $P_2^{ac}(M)$ is a geodesically convex subset of $P_2(M)$. Moreover, if
$\mu_0, \mu_1 \in P_2^{ac}(M)$, then there is a unique Wasserstein geodesic $\{\mu_t\}_{t\in[0,1]}$ joining µ0 to µ1,
which is given by
\[ \mu_t = (T_t)_\sharp \mu_0 := (\exp[t\tilde\nabla\varphi])_\sharp \mu_0, \]
where $T(x) = \exp_x[\tilde\nabla_x\varphi]$ is the unique transport map from µ0 to µ1 which is optimal for
the cost $\frac12 d^2(x, y)$ (and so also optimal for the cost $d^2(x, y)$). Moreover:
1. $T_t$ is the unique optimal transport map from µ0 to µt for all t ∈ [0, 1];
2. $T_t^{-1}$ is the unique optimal transport map from µt to µ0 for all t ∈ [0, 1] (and, if
t ∈ [0, 1), it is locally Lipschitz);
3. $T \circ T_t^{-1}$ is the unique optimal transport map from µt to µ1 for all t ∈ [0, 1] (and, if
t ∈ (0, 1], it is locally Lipschitz).
Since we know that the transport is unique, the proof is quite standard. However,
for completeness, we give all the details.
Proof. Let $\{\mu_t\}_{t\in[0,1]}$ be a Wasserstein geodesic joining µ0 to µ1. Fix t ∈ (0, 1), and
let $\gamma_t$ (resp. $\hat\gamma_t$) be an optimal transport plan between µ0 and µt (resp. µt and µ1) (in
fact, we know that $\gamma_t$ is a graph and it is unique, but we will not use this). We
now define the probability measure on M × M × M
\[ \lambda_t(dx, dy, dz) := \int_M \gamma_t(dx|y) \times \hat\gamma_t(dz|y)\, d\mu_t(y), \]
where $\gamma_t(dx, dy) = \int_M \gamma_t(dx|y)\, d\mu_t(y)$ and $\hat\gamma_t(dy, dz) = \int_M \hat\gamma_t(dz|y)\, d\mu_t(y)$ are the
disintegrations of $\gamma_t$ and $\hat\gamma_t$ with respect to µt. Then, if we define
\[ \tilde\gamma_t := \pi^{1,3}_\sharp \lambda_t, \]
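it is simple to check that $\tilde\gamma_t$ is a transport plan from µ0 to µ1. Now, since $\{\mu_t\}_{t\in[0,1]}$ is a
geodesic, we have that
\begin{align*}
W_2(\mu_0, \mu_1) &\le \|d(x, z)\|_{L^2(\tilde\gamma_t, M\times M)} = \|d(x, z)\|_{L^2(\lambda_t, M\times M\times M)} \\
&\le \|d(x, y)\|_{L^2(\lambda_t, M\times M\times M)} + \|d(y, z)\|_{L^2(\lambda_t, M\times M\times M)} \tag{1.6.1} \\
&= \|d(x, y)\|_{L^2(\gamma_t, M\times M)} + \|d(y, z)\|_{L^2(\hat\gamma_t, M\times M)} \\
&= W_2(\mu_0, \mu_t) + W_2(\mu_t, \mu_1) = W_2(\mu_0, \mu_1).
\end{align*}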
This proves that γ̃t is an optimal transport plan between µ0 and µ1 , which implies that
γ̃t is supported on the graph of T . Moreover, since in (1.6.1) all the inequalities are
indeed equalities, we get that
d(x, z) = d(x, y) + d(y, z) for λt -a.e. (x, y, z) ∈ M × M × M
that is, y is on a geodesic from x to z. Moreover, since W2 (µ0 , µt ) = tW2 (µ0 , µ1 ), we also
have
d(x, y) = td(x, z), d(y, z) = (1 − t)d(x, z) for λt -a.e. (x, y, z) ∈ M × M × M.
Since, by Remark 1.5.1, the geodesic from x to T (x) is unique for µ0 -a.e. x, we conclude
that $\lambda_t$ is concentrated on the subset $\{(x, T_t(x), T(x))\}_{x\in \mathrm{supp}(\mu_0)}$, which implies that
$\mu_t = (T_t)_\sharp \mu_0$. Moreover we see that $\mu_t := (T_t)_\sharp \mu_0 \in P_2^{ac}(M)$. In fact,
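\begin{align*}
\int_M d^2(x, x_0)\, d\mu_t(x) &= \int_M d^2(T_t(x), x_0)\, d\mu_0(x) \\
&\le 2 \int_M \big[ d^2(x, x_0) + d^2(x, T_t(x)) \big]\, d\mu_0(x) \\
&\le 2 \int_M \big[ d^2(x, x_0) + d^2(x, T(x)) \big]\, d\mu_0(x) \\
&\le \int_M \big[ 6\, d^2(x, x_0) + 4\, d^2(x_0, T(x)) \big]\, d\mu_0(x) \\
&= 6 \int_M d^2(x, x_0)\, d\mu_0(x) + 4 \int_M d^2(x_0, y)\, d\mu_1(y) < +\infty,
\end{align*}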
and the result in Section 1.5 tells us that µt is absolutely continuous. Using the notation
of Section 1.4, we have
\[ c_t(x, y) = \inf_{\gamma(0)=x,\ \gamma(t)=y} \int_0^t \frac12 \|\dot\gamma(s)\|^2_{\gamma(s)}\, ds = \frac{1}{2t}\, d^2(x, y). \]
Since $T_t$ and $T_t^{-1}$ are optimal for the cost function $\frac{1}{2t} d^2(x, y)$, and $T \circ T_t^{-1}$ is optimal for
the cost function $\frac{1}{2(1-t)} d^2(x, y)$, we get that $T_t$, $T_t^{-1}$ and $T \circ T_t^{-1}$ are optimal also for the
cost $d^2(x, y)$. □
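The identity $c_t(x, y) = \frac{1}{2t} d^2(x, y)$ used in the proof follows from a standard Cauchy-Schwarz argument, sketched here for convenience (it is not spelled out in the text). The inequalities hold for every absolutely continuous curve $\gamma$ with $\gamma(0) = x$ and $\gamma(t) = y$; since M is complete, a minimizing geodesic realizing equality exists.

```latex
\begin{align*}
d(x,y) \ \le\ \int_0^t \|\dot\gamma(s)\|_{\gamma(s)}\,ds
 \ &\le\ \sqrt{t}\,\Big(\int_0^t \|\dot\gamma(s)\|_{\gamma(s)}^2\,ds\Big)^{1/2}
 \quad\Longrightarrow\quad
 \int_0^t \tfrac12\,\|\dot\gamma(s)\|_{\gamma(s)}^2\,ds \ \ge\ \frac{d^2(x,y)}{2t}, \\
\text{with equality} \ &\iff\ \gamma \text{ is a minimizing geodesic with constant speed }
 \|\dot\gamma\|_{\gamma} \equiv \frac{d(x,y)}{t}.
\end{align*}
```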
Proposition 1.6.1 tells us that $(P_2^{ac}(M), W_2)$ is also a length space.
1.6.1 Regularity, concavity estimate and a displacement convexity result
We now consider the cost function $c(x, y) = \frac12 d^2(x, y)$. Let $\mu, \nu \in P^{ac}(M)$ with
$W_2(\mu, \nu) < +\infty$, and let us denote by f and g their respective densities with respect to vol. Let
\[ T(x) = \exp_x[\tilde\nabla_x\varphi] \]
be the unique optimal transport map from µ to ν.

We recall that locally semi-convex (or semi-concave) functions with linear modulus
admit vol-a.e. a second order Taylor expansion (see [16], [50]). Let us recall the definition
of approximate hessian.
Definition 1.6.2 (approximate Hessian). We say that $f : M \to \mathbb{R}$ has an approximate
Hessian at x ∈ M if there exists a function $h : M \to \mathbb{R}$ such that the set {f = h} has
density 1 at x with respect to the Lebesgue measure and h admits a second order Taylor
expansion at x, that is, there exists a self-adjoint operator $H : T_xM \to T_xM$ such that
\[ h(\exp_x w) = h(x) + \langle \nabla_x h, w \rangle + \frac12 \langle Hw, w \rangle + o(\|w\|_x^2). \]
In this case the approximate Hessian is defined as $\tilde\nabla^2_x f := H$.
As in the case of the approximate differential, it is not difficult to show that this
definition makes sense.
Observing that $d^2(x, y)$ is locally semi-concave with linear modulus (see [66, Appendix]),
we get that $\varphi_n$ is locally semi-convex with linear modulus for each n. Thus we
can define µ-a.e. an approximate Hessian for $\varphi$ (see Definition 1.6.2):
\[ \tilde\nabla^2_x\varphi := \nabla^2_x\varphi_n \quad \text{for } x \in A_n \cap E_n, \]
where $A_n$ was defined in the proof of Complement 1.3.4, $E_n$ denotes the full µ-measure
set of points where $\varphi_n$ admits a second order Taylor expansion, and $\nabla^2_x\varphi_n$ denotes the
self-adjoint operator on $T_xM$ that appears in the Taylor expansion of $\varphi_n$ at x. Let
us now consider, for each set $F_n := A_n \cap E_n$, an increasing sequence of compact sets
$K^n_m \subset F_n$ such that $\mu(F_n \setminus \cup_m K^n_m) = 0$. We now define the measures $\mu^n_m := \mu \llcorner K^n_m$
and $\nu^n_m := T_\sharp \mu^n_m = (\exp[\nabla\varphi_n])_\sharp \mu^n_m$, and we renormalize them in order to obtain two
probability measures:
\[ \hat\mu^n_m := \frac{\mu^n_m}{\mu^n_m(M)} \in P_2^{ac}(M), \qquad
 \hat\nu^n_m := \frac{\nu^n_m}{\nu^n_m(M)} = \frac{\nu^n_m}{\mu^n_m(M)} \in P_2^{ac}(M). \]
We now observe that T is still optimal for the transport from $\hat\mu^n_m$ to $\hat\nu^n_m$. In fact, if this
were not the case, we would have
\[ \int_M c(x, S(x))\, d\hat\mu^n_m(x) < \int_M c(x, T(x))\, d\hat\mu^n_m(x) \]
for a certain transport map S from $\hat\mu^n_m$ to $\hat\nu^n_m$. This would imply that
\[ \int_M c(x, S(x))\, d\mu^n_m(x) < \int_M c(x, T(x))\, d\mu^n_m(x), \]
and so the transport map
\[ \tilde S(x) := \begin{cases} S(x) & \text{if } x \in K^n_m, \\ T(x) & \text{if } x \in M \setminus K^n_m \end{cases} \]
would have a cost strictly less than the cost of T , which would contradict the optimality
of T .
We will now apply the results of [50] to the compactly supported measures $\hat\mu^n_m$ and
$\hat\nu^n_m$ in order to get information on the transport problem from µ to ν. In what follows we
will denote by $\nabla_x d^2_y$ and by $\nabla^2_x d^2_y$, respectively, the gradient and the Hessian with respect
to x of $d^2(x, y)$, and by $d_x \exp$ and $d(\exp_x)_v$ the two components of the differential of the
map $TM \ni (x, v) \mapsto \exp_x[v] \in M$ (whenever they exist). By [50, Theorem 4.2], we get
the following.
Theorem 1.6.3 (Jacobian identity a.e.). There exists a subset E ⊂ M such that
µ(E) = 1 and, for each x ∈ E, $Y(x) := d(\exp_x)_{\tilde\nabla_x\varphi}$ and $H(x) := \frac12 \nabla^2_x d^2_{T(x)}$ both exist
and we have
\[ f(x) = g(T(x)) \det[Y(x)(H(x) + \tilde\nabla^2_x\varphi)] \neq 0. \]
Proof. It suffices to observe that [50, Theorem 4.2] applied to $\hat\mu^n_m$ and $\hat\nu^n_m$ gives that, for
µ-a.e. $x \in K^n_m$,
\[ \frac{f(x)}{\mu^n_m(M)} = \frac{g(T(x))}{\mu^n_m(M)} \det[Y(x)(H(x) + \nabla^2_x\varphi_n)] \neq 0, \]
which implies
\[ f(x) = g(T(x)) \det[Y(x)(H(x) + \tilde\nabla^2_x\varphi)] \neq 0 \quad \text{for } \mu\text{-a.e. } x \in K^n_m. \]
Passing to the limit as m, n → +∞ we get the result.
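To see what the Jacobian identity says in a familiar setting, one can specialize it to $M = \mathbb{R}^n$ with the flat metric. This specialization is an illustration added here, not part of the statement; the function $u$ below is notation introduced only for this sketch.

```latex
\begin{align*}
M = \mathbb{R}^n:\quad & \exp_x[v] = x + v \ \Longrightarrow\ Y(x) = I, \qquad
 \tfrac12\, d_y^2(x) = \tfrac12\,|x-y|^2 \ \Longrightarrow\ H(x) = I, \\
& T(x) = x + \tilde\nabla_x\varphi, \qquad
 f(x) = g\big(x + \tilde\nabla_x\varphi\big)\,\det\big[I + \tilde\nabla_x^2\varphi\big], \\
& u(x) := \tfrac12\,|x|^2 + \varphi(x) \ \Longrightarrow\
 f(x) = g\big(\tilde\nabla_x u\big)\,\det\big[\tilde\nabla_x^2 u\big].
\end{align*}
```

The last line is the Monge-Ampère equation satisfied almost everywhere by the potential of the optimal map for the quadratic cost.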
We can thus define µ-a.e. the (weak) differential of the transport map at x as
\[ d_xT := Y(x)\big( H(x) + \tilde\nabla^2_x\varphi \big). \]
Let us prove now that, indeed, T (x) is approximately differentiable µ-a.e., and that the
above differential coincides with the approximate differential of T . In order to prove
this fact, let us first make a formal computation. Observe that since the map
$x \mapsto \exp_x[-\frac12 \nabla_x d^2_y] = y$ is constant, we have
\[ 0 = d_x\big( \exp_x[-\tfrac12 \nabla_x d^2_y] \big) = d_x \exp[-\tfrac12 \nabla_x d^2_y] - d(\exp_x)_{-\frac12 \nabla_x d^2_y}\big( \tfrac12 \nabla^2_x d^2_y \big) \qquad \forall y \in M. \]
By differentiating (in the approximate sense) the equality $T(x) = \exp[\tilde\nabla_x\varphi]$ and recalling
the equality $\tilde\nabla_x\varphi = -\frac12 \nabla_x d^2_{T(x)}$, we obtain
\begin{align*}
\tilde d_xT &= d(\exp_x)_{\tilde\nabla_x\varphi}\big( \tilde\nabla^2_x\varphi \big) + d_x \exp[\tilde\nabla_x\varphi] \\
&= d(\exp_x)_{\tilde\nabla_x\varphi}\big( \tilde\nabla^2_x\varphi \big) + d(\exp_x)_{-\frac12 \nabla_x d^2_{T(x)}}\big( \tfrac12 \nabla^2_x d^2_{T(x)} \big) \\
&= d(\exp_x)_{\tilde\nabla_x\varphi}\big( H(x) + \tilde\nabla^2_x\varphi \big),
\end{align*}
as wanted. In order to make the above proof rigorous, it suffices to observe that for
µ-a.e. x, $T(x) \notin \mathrm{cut}(x)$, where cut(x) is defined as the set of points z ∈ M which cannot
be linked to x by an extendable minimizing geodesic. Indeed we recall that the square
of the distance fails to be semi-convex at the cut locus, that is, if x ∈ cut(y), then
\[ \inf_{0<\|v\|_x<1} \frac{d^2_y(\exp_x[v]) - 2 d^2_y(x) + d^2_y(\exp_x[-v])}{|v|^2} = -\infty \]
(see [50, Proposition 2.5]). Now fix $x \in F_n$. Since we know that $\frac12 d^2(z, T(x)) \ge \psi(T(x)) - \varphi_n(z)$
with equality for z = x, we obtain a bound from below of the Hessian of $d^2_{T(x)}$ at x
in terms of the Hessian of $\varphi_n$ at x (see the proof of [50, Proposition 4.1(a)]). Thus, since
each $\varphi_n$ admits vol-a.e. a second order Taylor expansion, we obtain that, for µ-a.e. x,
$x \notin \mathrm{cut}(T(x))$, or equivalently $T(x) \notin \mathrm{cut}(x)$.
This implies that all the computations we made above in order to prove the formula
for $\tilde d_xT$ are correct. Indeed the exponential map $(x, v) \mapsto \exp_x[v]$ is smooth if
$\exp_x[v] \notin \mathrm{cut}(x)$, the function $d^2_y$ is smooth around any $x \notin \mathrm{cut}(y)$ (see [50, Paragraph 2]), and
$\tilde\nabla_x\varphi$ is approximately differentiable µ-a.e. Thus, recalling that, once we consider the right
composition of an approximately differentiable map with a smooth map, the standard
chain rule holds (see the remarks after Definition 1.3.3), we have proved the following
regularity result for the transport map.
Proposition 1.6.4 (approximate differentiability of the transport map). The
transport map is approximately differentiable for µ-a.e. x, and its approximate differential
is given by the formula
\[ \tilde d_xT = Y(x)\big( H(x) + \tilde\nabla^2_x\varphi \big), \]
where Y and H are defined in Theorem 1.6.3.
To prove our displacement convexity result, the following change of variables formula
will be useful.
Proposition 1.6.5 (change of variables for optimal maps). If $A : [0, +\infty) \to \mathbb{R}$ is
a Borel function such that A(0) = 0, then
\[ \int_M A(g(y))\, d\mathrm{vol}(y) = \int_E A\Big( \frac{f(x)}{J(x)} \Big) J(x)\, d\mathrm{vol}(x), \]
where $J(x) := \det[Y(x)(H(x) + \tilde\nabla^2_x\varphi)] = \det[\tilde d_xT]$ (either both integrals are undefined or
both take the same value in R).
The proof follows by the Jacobian identity proved in Theorem 1.6.3, exactly as in
[50, Corollary 4.7].
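At a purely formal level, the change of variables formula can be read off from the Jacobian identity as follows. This is a heuristic sketch only: it ignores measurability and integrability issues, it uses $J > 0$, and the rigorous argument is the one in [50, Corollary 4.7].

```latex
\begin{align*}
\int_M A(g(y))\,d\mathrm{vol}(y)
 &= \int_{\{g>0\}} \frac{A(g(y))}{g(y)}\,g(y)\,d\mathrm{vol}(y)
 && \text{(since } A(0)=0\text{)} \\
 &= \int_E \frac{A(g(T(x)))}{g(T(x))}\,f(x)\,d\mathrm{vol}(x)
 && \text{(since } T_\sharp(f\,\mathrm{vol}) = g\,\mathrm{vol}\text{)} \\
 &= \int_E A\Big(\frac{f(x)}{J(x)}\Big)\,J(x)\,d\mathrm{vol}(x)
 && \text{(since } g(T(x)) = f(x)/J(x)\text{)}.
\end{align*}
```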
Let us now define for t ∈ [0, 1] the measure $\mu_t := (T_t)_\sharp \mu$, where
\[ T_t(x) = \exp_x[t\tilde\nabla_x\varphi]. \]
By the results in [66] and Proposition 1.6.1, we know that $T_t$ coincides with the unique
optimal map pushing µ forward to µt , and that µt is absolutely continuous with respect
to vol for any t ∈ [0, 1].
Given x, y ∈ M , following [50], we define for t ∈ [0, 1]
\[ Z_t(x, y) := \{ z \in M \mid d(x, z) = t\, d(x, y) \text{ and } d(z, y) = (1 - t)\, d(x, y) \}. \]
If N is now a subset of M , we set
\[ Z_t(x, N) := \bigcup_{y\in N} Z_t(x, y). \]
Letting $B_r(y) \subset M$ denote the open ball of radius r > 0 centered at y ∈ M , for t ∈ (0, 1]
we define
\[ v_t(x, y) := \lim_{r\to 0} \frac{\mathrm{vol}(Z_t(x, B_r(y)))}{\mathrm{vol}(B_{tr}(y))} > 0 \]
(the above limit always exists, though it will be infinite when x and y are conjugate
points; see [50]). Arguing as in the proof of Theorem 1.6.3, by [50, Lemma 6.1] we get
the following.
Theorem 1.6.6 (Jacobian inequality). Let E be the set of full µ-measure given by
Theorem 1.6.3. Then for each x ∈ E, $Y_t(x) := d(\exp_x)_{t\tilde\nabla_x\varphi}$ and $H_t(x) := \frac12 \nabla^2_x d^2_{T_t(x)}$ both
exist for all t ∈ [0, 1] and the Jacobian determinant
\[ J_t(x) := \det[Y_t(x)(H_t(x) + t\tilde\nabla^2_x\varphi)] \tag{1.6.2} \]
satisfies
\[ J_t^{1/n}(x) \ge (1 - t)\, [v_{1-t}(T(x), x)]^{1/n} + t\, [v_t(x, T(x))]^{1/n}\, J_1^{1/n}(x). \]
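In the flat case the Jacobian inequality reduces to a classical fact about determinants, which may help in reading the general statement. The following sketch assumes $M = \mathbb{R}^n$, where $v_t \equiv 1$, and uses that $\tilde d_xT = I + \tilde\nabla^2_x\varphi$ is symmetric and nonnegative definite (T being the gradient of a convex function); the inequality in the second line is Minkowski's determinant inequality.

```latex
\begin{align*}
M = \mathbb{R}^n:\quad & T_t(x) = x + t\,\tilde\nabla_x\varphi, \qquad
 \tilde d_xT_t = (1-t)\,I + t\,\tilde d_xT, \qquad v_t \equiv 1, \\
J_t^{1/n}(x) &= \det\big[(1-t)\,I + t\,\tilde d_xT\big]^{1/n}
 \ \ge\ (1-t)\,\det[I]^{1/n} + t\,\det\big[\tilde d_xT\big]^{1/n}
 = (1-t) + t\,J_1^{1/n}(x).
\end{align*}
```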
We now consider as source measure $\mu_0 = \rho_0\, d\mathrm{vol}(x) \in P^{ac}(M)$ and as target measure
$\mu_1 = \rho_1\, d\mathrm{vol}(x) \in P^{ac}(M)$, and we assume as before that $W_2(\mu_0, \mu_1) < +\infty$. By
Proposition 1.6.1 we have
\[ \mu_t = (T_t)_\sharp [\rho_0\, d\mathrm{vol}] = \rho_t\, d\mathrm{vol} \in P_2^{ac}(M) \]
for a certain $\rho_t \in L^1(M, d\mathrm{vol})$.
We now want to consider the behavior of the functional
\[ U(\rho) := \int_M A(\rho(x))\, d\mathrm{vol}(x) \]
along the path $t \mapsto \rho_t$. In Euclidean spaces, this path is called displacement interpolation
and the functional U is said to be displacement convex if
\[ [0, 1] \ni t \mapsto U(\rho_t) \text{ is convex for every } \rho_0, \rho_1. \]
A sufficient condition for the displacement convexity of U in $\mathbb{R}^n$ is that
$A : [0, +\infty) \to \mathbb{R} \cup \{+\infty\}$ satisfy
\[ (0, +\infty) \ni s \mapsto s^n A(s^{-n}) \text{ is convex and nonincreasing, with } A(0) = 0 \tag{1.6.3} \]
(see [106], [108]). Typical examples include the entropy $A(\rho) = \rho \log \rho$ and the $L^q$-norm
$A(\rho) = \frac{1}{q-1} \rho^q$ for $q \ge \frac{n-1}{n}$.
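That the two examples satisfy condition (1.6.3) can be checked by a direct computation, sketched here for convenience. In the second example we assume q > 0 and q ≠ 1, so that A is well defined with A(0) = 0.

```latex
\begin{align*}
A(\rho) = \rho\log\rho:\quad
 & s^nA(s^{-n}) = s^n\, s^{-n}\log(s^{-n}) = -n\log s, \\
 & \frac{d}{ds}\big(-n\log s\big) = -\frac{n}{s} < 0, \qquad
 \frac{d^2}{ds^2}\big(-n\log s\big) = \frac{n}{s^2} > 0; \\
A(\rho) = \frac{\rho^q}{q-1}:\quad
 & s^nA(s^{-n}) = \frac{s^{\,n(1-q)}}{q-1}, \qquad
 \frac{d}{ds}\,\frac{s^{\,n(1-q)}}{q-1} = -n\,s^{\,n(1-q)-1} < 0, \\
 & \frac{d^2}{ds^2}\,\frac{s^{\,n(1-q)}}{q-1} = n\,\big(1 - n(1-q)\big)\,s^{\,n(1-q)-2} \ \ge\ 0
 \iff q \ \ge\ \frac{n-1}{n}.
\end{align*}
```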
By all the results collected above, arguing as in the proof of [50, Theorem 6.2], we can
prove that the displacement convexity of U is still true on Ricci nonnegative manifolds
under the assumption (1.6.3).
Theorem 1.6.7 (displacement convexity on Ricci nonnegative manifolds). If
Ric ≥ 0 and A satisfies (1.6.3), then U is displacement convex.
Proof. As we remarked above, $T_t$ is the optimal transport map from µ0 to µt . So, by
Theorem 1.6.3 and Proposition 1.6.5, we get
\[ U(\rho_t) = \int_M A(\rho_t(x))\, d\mathrm{vol}(x) = \int_{E_t} A\bigg( \frac{\rho_0(x)}{\big( J_t^{1/n}(x) \big)^n} \bigg) \big( J_t^{1/n}(x) \big)^n\, d\mathrm{vol}(x), \tag{1.6.4} \]
where $E_t$ is the set of full µ0 -measure given by Theorem 1.6.3 and $J_t(x) \neq 0$ is defined in
(1.6.2). Since Ric ≥ 0, we know that $v_t(x, y) \ge 1$ for every x, y ∈ M (see [50, Corollary
2.2]). Thus, for fixed $x \in E_1$, Theorem 1.6.6 yields the concavity of the map
\[ [0, 1] \ni t \mapsto J_t^{1/n}(x). \]
Composing this function with the convex nonincreasing function $s \mapsto s^n A(s^{-n})$ we get
the convexity of the integrand in (1.6.4). The only problem we run into in trying to
conclude the displacement convexity of U is that the domain of integration appears to
depend on t. But, since by Theorem 1.6.3 $E_t$ is a set of full µ0 -measure for any t ∈ [0, 1],
we obtain that, for fixed t, t′, s ∈ [0, 1],
\[ U(\rho_{(1-s)t+st'}) \le (1 - s)\, U(\rho_t) + s\, U(\rho_{t'}), \]
simply by computing each of the three integrals above on the full measure set
$E_t \cap E_{t'} \cap E_{(1-s)t+st'}$.
1.7 Displacement convexity on Riemannian manifolds
For the past few years, there has been ongoing research to study the links between
Riemannian geometry and optimal transport of measures [132, 133]. In particular, it was
recently found that lower bounds on the Ricci curvature tensor can be recast in terms
of convexity properties of certain nonlinear functionals defined on spaces of probability
measures [50, 97, 116, 126, 127, 128]. In this section we solve a natural problem in this
field by establishing the equivalence of several such formulations.
Before explaining our results in more detail, let us give some notation and back-
ground. Let (M, g) be a smooth complete connected n-dimensional Riemannian mani-
fold, equipped with its geodesic distance d and its volume measure vol. Let P (M ) be
the set of probability measures on M . For any real number p ≥ 1, we denote by $P_p(M)$
the set of probability measures µ such that
\[ \int_M d^p(x, x_0)\, d\mu(x) < \infty \quad \text{for some } x_0 \in M. \]
The set $P_2(M)$ is equipped with the Wasserstein distance of order 2, denoted by $W_2$:
this is the square root of the optimal transport cost functional, when the cost function
c(x, y) coincides with the squared distance $d^2(x, y)$; see for instance [133, Definition 6.1].
Then P2 (M ) is a metric space, and even a length space; that is, any two probability
measures in P2 (M ) are joined by at least one geodesic curve (µt )0≤t≤1 . (Here and in
the sequel, by convention geodesics are supposed to be globally minimizing and to have
constant speed.)
A basic representation theorem (see [97, Proposition 2.10] or [133, Corollary 7.22])
states that any Wasserstein geodesic curve necessarily takes the form µt = (et )∗ Π, where
Π is a probability measure on the set Γ of minimizing geodesics [0, 1] → M , the symbol
∗ stands for push-forward, and et : Γ → M is the evaluation at time t: et (γ) := γ(t).

So the optimal transport problem between two probability measures µ0 and µ1 produces
three related objects:
- an optimal coupling π of µ0 and µ1 , which is a probability measure on M ×M whose
marginals are µ0 and µ1 , achieving the lowest possible cost for the transport between
these measures;
- a path (µt )0≤t≤1 in the space of probability measures;
- a probability measure Π on the space of geodesics, such that (et )∗ Π = µt and
(e0 , e1 )∗ Π = π. Such a Π is called a dynamical optimal transference plan [133, Defini-
tion 7.20].
The core of the studies in [50, 97, 116, 126, 127, 128] lies in the analysis of the
convexity properties of certain nonlinear functionals along geodesics in P2 (M ), defined
below:
Definition 1.7.1 (Nonlinear functionals of probability measures). Let ν be a
reference measure on M , absolutely continuous with respect to the volume measure. Let
$U : \mathbb{R}_+ \to \mathbb{R}$ be a continuous convex function with U (0) = 0; let $U'(\infty)$ be the limit of
U (r)/r as r → ∞. Let µ be a probability measure on M and let $\mu = \rho\nu + \mu_s$ be its
Lebesgue decomposition with respect to ν.
(i) If U (ρ) is bounded below by a ν-integrable function, then the quantity $U_\nu(\mu)$ is
defined by the formula
\[ U_\nu(\mu) = \int_M U(\rho(x))\, \nu(dx) + U'(\infty)\, \mu_s[M]. \]
(ii) If π is a probability measure on M × M , admitting µ as first marginal, β is a
positive function on M × M , and β U (ρ/β) is bounded below (as a function of x, y) by
a ν-integrable function of x, then the quantity $U^\beta_{\pi,\nu}(\mu)$ is defined by the formula
\[ U^\beta_{\pi,\nu}(\mu) = \int_{M\times M} U\Big( \frac{\rho(x)}{\beta(x, y)} \Big)\, \beta(x, y)\, \pi(dy|x)\, \nu(dx) + U'(\infty)\, \mu_s[M], \]
where π(dy|x) is the disintegration of π(dx dy) with respect to the x variable.
Remark 1.7.2. Sufficient conditions for $U_\nu$ and $U^\beta_{\pi,\nu}$ to be well-defined are discussed
in [133, Theorems 17.8 and 17.28, Application 17.29] and will not be addressed here.
Remark 1.7.3. If $U'(\infty) = \infty$, then finiteness of $U_\nu(\mu)$ implies that µ is absolutely
continuous with respect to ν. This is not true if $U'(\infty) < \infty$.
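Two standard choices of U illustrate the dichotomy in Remark 1.7.3. The examples below are added for illustration; the second one assumes n ≥ 2 and takes ν = vol and µ a Dirac mass at a point $x_0$ (notation introduced only here).

```latex
\begin{align*}
U(r) = r\log r:\quad & U'(\infty) = \lim_{r\to\infty}\log r = +\infty; \\
U(r) = -r^{1-1/n}:\quad & U'(\infty) = \lim_{r\to\infty}\big(-r^{-1/n}\big) = 0, \\
& \mu = \delta_{x_0},\ \nu = \mathrm{vol}\ \Longrightarrow\ \rho = 0,\ \mu_s = \delta_{x_0},\quad
 U_\nu(\delta_{x_0}) = \int_M U(0)\,d\nu + 0\cdot 1 = 0.
\end{align*}
```

In the second case the functional is finite on a measure that is purely singular with respect to ν.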
The various notions of convexity that are considered in [97, 126, 127, 128] are among
the following:
Definition 1.7.4 (Convexity properties). (i) Let U and ν be as in Definition 1.7.1,
and let λ ∈ R. We say that the functional $U_\nu$ is λ-displacement convex if for all Wasserstein
geodesics $(\mu_t)_{0\le t\le 1}$ whose image lies in the domain of $U_\nu$,
\[ U_\nu(\mu_t) \le (1 - t)\, U_\nu(\mu_0) + t\, U_\nu(\mu_1) - \frac12 \lambda\, t(1 - t)\, W_2^2(\mu_0, \mu_1), \qquad \forall t \in [0, 1]. \tag{1.7.1} \]
We say that the functional $U_\nu$ is displacement convex with distortion β if for all
Wasserstein geodesics $(\mu_t)_{0\le t\le 1}$ whose image lies in the domain of $U_\nu$, if π(dx dy) stands
for the associated optimal coupling between µ0 and µ1 , and $\check\pi$ is obtained from π by
exchanging the two variables x and y, then
\[ U_\nu(\mu_t) \le (1 - t)\, U^\beta_{\pi,\nu}(\mu_0) + t\, U^\beta_{\check\pi,\nu}(\mu_1), \qquad \forall t \in [0, 1]. \tag{1.7.2} \]
(ii) We say that $U_\nu$ is weakly λ-displacement convex (resp. weakly displacement convex
with distortion β) if for all µ0 , µ1 in the domain of $U_\nu$, there is some Wasserstein geodesic
from µ0 to µ1 along which (1.7.1) (resp. (1.7.2)) is satisfied.
(iii) We say that $U_\nu$ is weakly λ-a.c.c.s. displacement convex (resp. weakly a.c.c.s.
displacement convex with distortion β) if condition (1.7.1) (resp. (1.7.2)) is satisfied along
some Wasserstein geodesic when we further assume that µ0 , µ1 are absolutely continuous
and compactly supported.
Remark 1.7.5. The Wasserstein geodesic in (ii) and (iii) above is implicitly assumed
to have its image entirely contained in the domain of the functional Uν .
Remark 1.7.6. If $U_\nu$ is a λ-displacement convex functional, then the function $t \mapsto U_\nu(\mu_t)$
is λ-convex on [0, 1], i.e. for all 0 ≤ s ≤ s′ ≤ 1 and t ∈ [0, 1],
\[ U_\nu(\mu_{(1-t)s+ts'}) \le (1 - t)\, U_\nu(\mu_s) + t\, U_\nu(\mu_{s'}) - \frac12 \lambda\, t(1 - t)(s' - s)^2\, W_2(\mu_0, \mu_1)^2. \tag{1.7.3} \]
This is not a priori the case if we only assume that $U_\nu$ is weakly λ-displacement convex.
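The inequality in Remark 1.7.6 follows by applying the definition of λ-displacement convexity to a reparametrized piece of the geodesic. A short sketch, with the notation $\tilde\mu_t$ introduced only here: the restriction of a constant-speed minimizing geodesic to [s, s′] is again such a geodesic, and its image stays in the domain of the functional.

```latex
\begin{align*}
\tilde\mu_t &:= \mu_{(1-t)s+ts'}, \quad t\in[0,1], \qquad
 W_2(\tilde\mu_0,\tilde\mu_1) = W_2(\mu_s,\mu_{s'}) = (s'-s)\,W_2(\mu_0,\mu_1), \\
U_\nu(\tilde\mu_t) &\le (1-t)\,U_\nu(\tilde\mu_0) + t\,U_\nu(\tilde\mu_1)
 - \tfrac12\,\lambda\,t(1-t)\,W_2^2(\tilde\mu_0,\tilde\mu_1) \\
 &= (1-t)\,U_\nu(\mu_s) + t\,U_\nu(\mu_{s'})
 - \tfrac12\,\lambda\,t(1-t)\,(s'-s)^2\,W_2(\mu_0,\mu_1)^2 .
\end{align*}
```

The argument uses the inequality along every geodesic, which is exactly what is missing under the weak assumption.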
In short, weakly means that we require a condition to hold only for some geodesic
between two measures, as opposed to all geodesics, and a.c.c.s. means that we only
require the condition to hold when the two measures are absolutely continuous and
compactly supported.
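To make the λ-convexity inequality of Remark 1.7.6 concrete, here is a minimal numerical sketch in Python. It does not compute any Wasserstein geodesic: the functions `profile` and `stiffer` are hypothetical stand-ins for the map t ↦ Uν(µt), and the constant `K` is assumed to play the role of λ W2(µ0, µ1)². The sketch only checks that a profile whose second derivative equals K satisfies the inequality with equality, while a profile with a larger second derivative satisfies it strictly.

```python
def convexity_gap(f, K, s, s1, t):
    """Right-hand side minus left-hand side of the K-convexity inequality on [s, s1]."""
    u = (1 - t) * s + t * s1
    rhs = (1 - t) * f(s) + t * f(s1) - 0.5 * K * t * (1 - t) * (s1 - s) ** 2
    return rhs - f(u)

K = 3.0  # hypothetical value standing for lambda * W_2(mu_0, mu_1)^2

def profile(u):
    # Hypothetical profile u -> U_nu(mu_u) whose second derivative is exactly K
    return 0.5 * K * u * u - 2.0 * u + 1.0

def stiffer(u):
    # Hypothetical profile with second derivative 2K >= K
    return K * u * u

grid = [i / 10 for i in range(11)]
triples = [(s, s1, t) for s in grid for s1 in grid if s <= s1 for t in grid]

gaps_profile = [convexity_gap(profile, K, s, s1, t) for s, s1, t in triples]
gaps_stiffer = [convexity_gap(stiffer, K, s, s1, t) for s, s1, t in triples]

# The first gap vanishes up to rounding; the second is nonnegative.
print(max(abs(g) for g in gaps_profile))
print(min(gaps_stiffer))
```

A nonnegative gap for every admissible triple (s, s′, t) is exactly what (1.7.3) requires of the function t ↦ Uν(µt).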
There are obvious implications (with or without distortion):
λ-displacement convex
⇓
weakly λ-displacement convex
⇓
weakly λ-a.c.c.s. displacement convex.
Although the natural convexity condition is arguably the one appearing in (i), that
is, holding true along all Wasserstein geodesics, this condition is considerably more delicate to
study than the weaker conditions appearing in (ii) and (iii), in particular for stability
issues: See [97, 126, 127]. In the same references the equivalence between (ii) and (iii)
was established, at least for compact spaces [97, Proposition 3.21]. But the implication
(ii) ⇒ (i) remained open (and was listed as an open problem in a preliminary version
of [133]). Here we shall fill this gap (at least for the functionals defined above), thus
answering a natural question about the notion of displacement convexity. Here is our
main result:
Theorem 1.7.7. Let U , ν and β be as in Definition 1.7.1. Assume that U is Lipschitz.

For each a > 0, define Ua (r) = U (ar)/a. Then
(i) If (Ua )ν is weakly λ-a.c.c.s. displacement convex for any a ∈ (0, 1], then Uν is
λ-displacement convex;
(ii) If (Ua )ν is weakly a.c.c.s. displacement convex with distortion β for any a ∈ (0, 1],
then Uν is displacement convex with distortion β.
Among the consequences of Theorem 1.7.7 is the following corollary:
Corollary 1.7.8. Let M be a smooth complete Riemannian manifold with nonnegative
Ricci curvature and dimension n. Let U(r) = −r^{1−1/n}, and let ν be the volume measure
on M. Then Uν is displacement convex on P_p(M), where p = 2 if n ≥ 3, and p is any
real number greater than 2 if n = 2.
More generally, Theorem 1.7.7 makes it possible to drop the “weakly” in all displace-
ment convexity characterizations of Ricci curvature bounds.
Before turning to the proof of Theorem 1.7.7, let us explain a bit more about the dif-
ficulties and the strategy of proof. Obviously, there are two problems to tackle: first, the
possibility that µ0 and/or µ1 do not have compact support; and secondly, the possibility
that µ0 and/or µ1 are singular with respect to the volume measure.
It was shown in [97, 126, 127] that inequalities such as (1.7.1) or (1.7.2) are stable
under (weak) convergence. Then it is natural to approximate µ0 , µ1 by compactly
supported, absolutely continuous measures, and pass to the limit. This scheme of proof
is enough to show the implication (iii) ⇒ (ii) in Definition 1.7.4, but does not guarantee
that we can attain all Wasserstein geodesics in this way — unless of course we know that
there is a unique Wasserstein geodesic between µ0 and µ1 .
To treat the difficulty arising from the possible non-compactness, we use the results of
the previous sections, showing that the Wasserstein geodesic between any two absolutely
continuous probability measures on a Riemannian manifold M is unique, even if they
are not compactly supported.
The difficulty arising from the possible singularity of µ0 , µ1 is less simple. If µ0 and µ1
are both singular, then there are in general several Wasserstein geodesics joining them.
The simplest example is obtained by taking µ0 = δ_{x0} and µ1 = δ_{x1}, where δ_x stands
for the Dirac mass at x, and x0, x1 are joined by several minimizing geodesics. So it is part of
the problem to regularize µ0 , µ1 into absolutely continuous measures µ0,k , µ1,k so that,
as k → ∞, the optimal transport between µ0,k and µ1,k converges to a given optimal
transport between µ0 and µ1 .
We handle this by a rather nonstandard regularization procedure, which roughly goes
as follows. We start from a given dynamical optimal transference plan Π between µ0
and µ1 , leave intact that part Π(a) of Π which corresponds to the absolutely continuous
part of µ0 . Then we let displacement occur for a very short time at the level of that
part Π(s) of Π corresponding to the singular part of µ0 . Next we regularize the resulting
contribution of Π(s) .
Let us illustrate this in the most basic case when µ0 = δx0 and µ1 = δx1 . Let
γ = (γt )0≤t≤1 be a given geodesic between x0 and x1 ; we wish to approximate the
Wasserstein geodesic (δγt )0≤t≤1 . Instead of directly regularizing µ0 and µ1 , we shall first
replace µ0 by µτ = δγτ , where τ is positive but very small; and then regularize δγτ and
δx1 into probability measures µτ,ϕ and µ1,ϕ . What we have gained is that the geodesic
joining γτ to x1 = γ1 is unique, so we may let τ → 0 and ϕ → 0 in such a way that the
Wasserstein geodesic joining µτ,ϕ to µ1,ϕ does converge to (δγt )0≤t≤1 .
In a more general context, the procedure will be more tricky, and what will make it
work is the following important property [133, Theorem 7.29]: Geodesics in dynamical
optimal transport plans do not cross at intermediate times. In fact, if Π is a given
dynamical optimal transport plan, then for each t ∈ (0, 1) one can define a measurable
map Ft : M → Γ by the requirement that Ft ◦ et = Id, Π-almost surely. In plain
words, if γ is a geodesic along which there is optimal transport, then the position of γ
at time t determines the whole geodesic γ. This property will ensure that Π(a) and Π(s)
“do not overlap at intermediate times”.
Finally, we note that the results in this section can be extended to more general
situations outside the category of Riemannian manifolds: It is sufficient that the optimal
transport between any two absolutely continuous probability measures be unique. In fact,
there is a more general framework where these results still hold true, namely the case of
nonbranching locally compact, complete length spaces. This extension is established, by
a slightly different approach, in [133, Chapter 30].
1.7.1 Proofs
In the sequel, we shall use the notation Ua,ν for (Ua )ν . An important ingredient in the
proof of Theorem 1.7.7 will be the following lemma, which is of independent interest (and
will be used for different purposes in [133, Chapter 30]).

Lemma 1.7.9. Let U be a Lipschitz convex function with U (0) = 0. Let µ1 , µ2 be any
two probability measures on M , and let Z1 , Z2 be two positive numbers with Z1 + Z2 = 1.
Then
(i) $U_\nu(Z_1\mu_1 + Z_2\mu_2) \ge Z_1\,U_{Z_1,\nu}(\mu_1) + Z_2\,U_{Z_2,\nu}(\mu_2)$, with equality if µ1 and µ2 are
singular to each other;
(ii) Let π1, π2 be two probability measures on M × M, and let β be a positive measurable
function on M × M. Then
\[ U^{\beta}_{Z_1\pi_1+Z_2\pi_2,\nu}(Z_1\mu_1 + Z_2\mu_2) \ge Z_1\,U^{\beta}_{Z_1,\pi_1,\nu}(\mu_1) + Z_2\,U^{\beta}_{Z_2,\pi_2,\nu}(\mu_2), \]
with equality if µ1 and µ2 are singular to each other.

Proof of Lemma 1.7.9. We start by the following remark: If x, y are nonnegative num-
bers, then
U (x + y) ≥ U (x) + U (y). (1.7.4)
Inequality (1.7.4) follows at once from the fact that U (t)/t is a nondecreasing function
of t, and thus
\[ \frac{U(x)}{x} \le \frac{U(x+y)}{x+y}, \quad \frac{U(y)}{y} \le \frac{U(x+y)}{x+y} \;\Longrightarrow\; x\,U(x+y) + y\,U(x+y) \ge (x+y)\bigl(U(x)+U(y)\bigr). \]
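Next, with obvious notation,
\[ U_\nu(Z_1\mu_1 + Z_2\mu_2) = \int U(Z_1\rho_1 + Z_2\rho_2)\,d\nu + U'(\infty)\bigl(Z_1\,\mu_{1,s}[M] + Z_2\,\mu_{2,s}[M]\bigr); \]
\[ U_{Z_1,\nu}(\mu_1) = \frac{1}{Z_1}\int U(Z_1\rho_1)\,d\nu + U'(\infty)\,\mu_{1,s}[M]; \]
\[ U_{Z_2,\nu}(\mu_2) = \frac{1}{Z_2}\int U(Z_2\rho_2)\,d\nu + U'(\infty)\,\mu_{2,s}[M]; \]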
so part (i) of the lemma follows immediately from (1.7.4). The claim about equality is
obvious, since it amounts to saying that U(x + y) = U(x) + U(y) as soon as either x or y is
zero.
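Part (i) of the lemma can be checked numerically in a toy setting. The sketch below is only an illustration under stated assumptions: M is replaced by a four-point space, ν by the counting measure (so every measure is absolutely continuous and the U′(∞) terms vanish), and U(r) = √(1 + r²) − 1 is one hypothetical choice of a convex Lipschitz function with U(0) = 0. The helper names `U_nu`, `U_a_nu` and `mix` are introduced here for illustration only.

```python
import math

def U(r):
    # Hypothetical choice: convex, 1-Lipschitz, with U(0) = 0
    return math.sqrt(1.0 + r * r) - 1.0

def U_nu(rho):
    # U_nu(mu) for mu = rho * (counting measure) on a finite set; no singular part
    return sum(U(r) for r in rho)

def U_a_nu(a, rho):
    # (U_a)_nu(mu), where U_a(r) = U(a r) / a
    return sum(U(a * r) for r in rho) / a

def mix(z1, rho1, z2, rho2):
    # Density of Z1 mu1 + Z2 mu2
    return [z1 * p + z2 * q for p, q in zip(rho1, rho2)]

z1, z2 = 0.3, 0.7
rho1 = [0.5, 0.5, 0.0, 0.0]
rho2_overlap = [0.0, 0.25, 0.25, 0.5]   # shares a point with rho1
rho2_disjoint = [0.0, 0.0, 0.5, 0.5]    # mutually singular with rho1

def both_sides(rho2):
    lhs = U_nu(mix(z1, rho1, z2, rho2))
    rhs = z1 * U_a_nu(z1, rho1) + z2 * U_a_nu(z2, rho2)
    return lhs, rhs

print(both_sides(rho2_overlap))    # left-hand side strictly larger
print(both_sides(rho2_disjoint))   # both sides agree up to rounding
```

The pointwise inequality behind the first case is (1.7.4), applied with x = Z1ρ1 and y = Z2ρ2; in the second case one of the two numbers is zero at every point, which gives equality.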
The proof of part (ii) is based on a similar type of reasoning. First note that (with the
conventions U(0)/0 = U′(0), U(∞)/∞ = U′(∞) and, µs-almost surely, dµ/dν = +∞)
\[ U^{\beta}_{Z_1\pi_1+Z_2\pi_2,\nu}(Z_1\mu_1 + Z_2\mu_2) = \int_{M\times M} U\!\left(\frac{Z_1\rho_1(x) + Z_2\rho_2(x)}{\beta(x,y)}\right) \frac{\beta(x,y)}{Z_1\rho_1(x) + Z_2\rho_2(x)}\,(Z_1\pi_1 + Z_2\pi_2)(dx\,dy); \]
\[ U^{\beta}_{Z_1,\pi_1,\nu}(\mu_1) = \int U\!\left(\frac{Z_1\rho_1(x)}{\beta(x,y)}\right) \frac{\beta(x,y)}{Z_1\rho_1(x)}\,\pi_1(dx\,dy); \]
\[ U^{\beta}_{Z_2,\pi_2,\nu}(\mu_2) = \int U\!\left(\frac{Z_2\rho_2(x)}{\beta(x,y)}\right) \frac{\beta(x,y)}{Z_2\rho_2(x)}\,\pi_2(dx\,dy). \]
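So the proof of the lemma will be complete if we can show that
\[ U\!\left(\frac{Z_1\rho_1 + Z_2\rho_2}{\beta}\right) \frac{\beta}{Z_1\rho_1 + Z_2\rho_2}\,(Z_1\pi_1 + Z_2\pi_2) \ge U\!\left(\frac{Z_1\rho_1}{\beta}\right) \frac{\beta}{Z_1\rho_1}\,(Z_1\pi_1) + U\!\left(\frac{Z_2\rho_2}{\beta}\right) \frac{\beta}{Z_2\rho_2}\,(Z_2\pi_2). \tag{1.7.5} \]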
Since U(r)/r is a nondecreasing function of r, if X1, X2, p1, p2 are any four nonnegative
numbers then
\[ \frac{U(X_1 + X_2)}{X_1 + X_2}\,(p_1 + p_2) \ge \frac{U(X_1)}{X_1}\,p_1 + \frac{U(X_2)}{X_2}\,p_2. \]
To recover (1.7.5), it suffices to apply the latter inequality with
\[ X_1 = \frac{Z_1\rho_1(x)}{\beta(x,y)}, \qquad X_2 = \frac{Z_2\rho_2(x)}{\beta(x,y)}, \]
\[ p_1 = \frac{d(Z_1\pi_1)}{d(Z_1\pi_1 + Z_2\pi_2)}(x,y), \qquad p_2 = \frac{d(Z_2\pi_2)}{d(Z_1\pi_1 + Z_2\pi_2)}(x,y), \]
and to integrate against (Z1π1 + Z2π2)(dx dy).
Proof of Theorem 1.7.7. First we observe that Uν is well-defined on P2(M) since, if µ =
ρν + µs is the Lebesgue decomposition of a probability measure µ ∈ P(M), then
\[ U(\rho) \ge -\|U\|_{\mathrm{Lip}}\,\rho \in L^1(M,\nu). \]
In fact, there is also an upper bound, so Uν is well-defined on the whole of P2(M) with
values in R. Moreover, by an approximation argument, we may replace the assumptions
of weak a.c.c.s. displacement convexity by weak displacement convexity on the whole of
P2 (M ). (The proof is the same as in [97, Proposition 3.21] (in the compact case) or [133,
Theorem 30.5].)
Let µ0 , µ1 be any two measures in P2 (M ), and let Π be an optimal dynamical
transference plan between µ0 and µ1 . Let further
µ0 = ρ0 ν + µ0,s
be the Lebesgue decomposition of µ0 with respect to ν. Let E (a) and E (s) be two disjoint
Borel subsets of M such that ρ0 ν is concentrated on E (a) and µ0,s is concentrated on
E (s) . We decompose Π as
Π = Π(a) + Π(s) , (1.7.6)
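where
\[ \Pi^{(a)} := \Pi \llcorner \bigl\{\gamma\in\Gamma \mid \gamma(0)\in E^{(a)}\bigr\}, \qquad \Pi^{(s)} := \Pi \llcorner \bigl\{\gamma\in\Gamma \mid \gamma(0)\in E^{(s)}\bigr\}. \]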
Taking the marginals at time t in (1.7.6) we get
\[ \mu_t = \mu_t^{(a)} + \mu_t^{(s)}. \]
Finally, we renormalize $\mu_t^{(a)}$ and $\mu_t^{(s)}$ into probability measures: we define
\[ Z^{(a)} = \Pi^{(a)}[\Gamma] = \mu_0^{(a)}[M] = \mu_t^{(a)}[M]; \qquad Z^{(s)} = \Pi^{(s)}[\Gamma], \]
and
\[ \hat\Pi^{(a)} := \frac{\Pi^{(a)}}{Z^{(a)}}, \quad \hat\mu_t^{(a)} := \frac{\mu_t^{(a)}}{Z^{(a)}}; \qquad \hat\Pi^{(s)} := \frac{\Pi^{(s)}}{Z^{(s)}}, \quad \hat\mu_t^{(s)} := \frac{\mu_t^{(s)}}{Z^{(s)}}. \]
So
\[ \mu_t = Z^{(a)}\,\hat\mu_t^{(a)} + Z^{(s)}\,\hat\mu_t^{(s)}. \tag{1.7.7} \]
We remark that, by what we proved in Section 1.5, $\mu_t^{(a)}$ is absolutely continuous for any
t ∈ [0, 1), but $\mu_t^{(s)}$ is not necessarily completely singular.
It follows from [133, Theorem 7.29 (v)] that for any t ∈ (0, 1) there is a Borel map
Ft such that Ft(γt) = γ0, Π(dγ)-almost surely. Then $\mu_t^{(s)}$ is concentrated on $F_t^{-1}(E^{(s)})$,
while $\mu_t^{(a)}$ is concentrated on $F_t^{-1}(E^{(a)})$; so these measures are singular to each other.
Then by Lemma 1.7.9 and (1.7.7), for any t ∈ (0, 1),
\[ U_\nu(\mu_t) = Z^{(a)}\,U_{Z^{(a)},\nu}(\hat\mu_t^{(a)}) + Z^{(s)}\,U_{Z^{(s)},\nu}(\hat\mu_t^{(s)}). \tag{1.7.8} \]
In the sequel, we focus on part (i) of Theorem 1.7.7, since the reasoning is quite the
same for part (ii). By construction and the restriction property of optimal transport [133,
Theorem 7.29], $\hat\Pi^{(a)}$ is an optimal dynamical transference plan between $\hat\mu_0^{(a)}$ and $\hat\mu_1^{(a)}$, and
the associated Wasserstein geodesic is $(\hat\mu_t^{(a)})_{0\le t\le 1}$. Since by construction $\hat\mu_0^{(a)}$ is absolutely
continuous, by what we already proved $(\hat\mu_t^{(a)})$ is the unique Wasserstein geodesic joining
$\hat\mu_0^{(a)}$ to $\hat\mu_1^{(a)}$. Then we can apply the displacement convexity inequality of the functional
$U_{Z^{(a)},\nu}$ along that geodesic:
\[ U_{Z^{(a)},\nu}(\hat\mu_t^{(a)}) \le (1-t)\,U_{Z^{(a)},\nu}(\hat\mu_0^{(a)}) + t\,U_{Z^{(a)},\nu}(\hat\mu_1^{(a)}) - \frac{\lambda}{2}\,t(1-t)\,W_2^2(\hat\mu_0^{(a)},\hat\mu_1^{(a)}). \tag{1.7.9} \]
Next, let $\varphi_k \to 0$ be a sequence of positive numbers. From the nonbranching property
of P2(M) [133, Corollary 7.31], there is only one Wasserstein geodesic joining $\hat\mu_{\varphi_k}^{(s)}$ to $\hat\mu_1^{(s)}$,
and it is obtained by reparameterizing $(\hat\mu_t^{(s)})_{\varphi_k\le t\le 1}$ on [0, 1] (with an affine
reparameterization in t). So we can also apply the displacement convexity inequality of the functional
$U_{Z^{(s)},\nu}$ along that geodesic, and get
\[ U_{Z^{(s)},\nu}(\hat\mu_t^{(s)}) \le \left(\frac{1-t}{1-\varphi_k}\right) U_{Z^{(s)},\nu}(\hat\mu_{\varphi_k}^{(s)}) + \left(\frac{t-\varphi_k}{1-\varphi_k}\right) U_{Z^{(s)},\nu}(\hat\mu_1^{(s)}) - \frac{\lambda}{2}\,(t-\varphi_k)(1-t)\,W_2^2(\hat\mu_0^{(s)},\hat\mu_1^{(s)}). \tag{1.7.10} \]
(For the latter term we have used the fact that if (µt )0≤t≤1 is any Wasserstein geodesic,
then W2 (µs , µt ) = |t − s| W2 (µ0 , µ1 ).)
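The constant-speed identity W2(µs, µt) = |t − s| W2(µ0, µ1) can be verified by hand in a toy case. The Python sketch below assumes uniform empirical measures on the real line with the same number of atoms, for which the sorted (monotone) coupling is optimal and the displacement interpolation moves each atom linearly towards its partner. The sample points and the helper names `w2` and `interpolate` are hypothetical.

```python
import math

def w2(xs, ys):
    # W_2 between two uniform empirical measures on the line with equally many atoms:
    # in dimension one the sorted coupling is optimal for the quadratic cost.
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def interpolate(xs, ys, t):
    # Displacement interpolation mu_t: each sorted atom moves linearly to its partner.
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]

mu0 = [0.0, 1.0, 4.0]     # hypothetical atoms of mu_0
mu1 = [2.0, 3.0, 10.0]    # hypothetical atoms of mu_1

total = w2(mu0, mu1)
for s, t in [(0.0, 0.5), (0.25, 0.75), (0.1, 1.0)]:
    partial = w2(interpolate(mu0, mu1, s), interpolate(mu0, mu1, t))
    # The two printed numbers agree up to rounding.
    print(partial, abs(t - s) * total)
```

This is the identity used to replace W2(µ̂_{ϕk}, µ̂1) by (1 − ϕk) W2(µ̂0, µ̂1) in the last term of (1.7.10).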
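The first term in the right-hand side of (1.7.10) can be trivially bounded by U′(∞),
which coincides with $U_{Z^{(s)},\nu}(\hat\mu_0^{(s)})$ since $\hat\mu_0^{(s)}$ is totally singular. Indeed, since U(r)/r ≤
U′(∞), we have
\[
\begin{aligned}
U_{Z^{(s)},\nu}(\hat\mu_{\varphi_k}^{(s)})
&= \frac{1}{Z^{(s)}} \int_M U\bigl(Z^{(s)}\hat\rho_{\varphi_k}^{(s)}\bigr)\,d\nu + U'(\infty)\,\hat\mu_{\varphi_k,s}^{(s)}(M) \\
&= \frac{1}{Z^{(s)}} \int_{\{\hat\rho_{\varphi_k}^{(s)}>0\}} U\bigl(Z^{(s)}\hat\rho_{\varphi_k}^{(s)}\bigr)\,d\nu + U'(\infty)\,\hat\mu_{\varphi_k,s}^{(s)}(M) \\
&= \int_{\{\hat\rho_{\varphi_k}^{(s)}>0\}} \frac{U\bigl(Z^{(s)}\hat\rho_{\varphi_k}^{(s)}\bigr)}{Z^{(s)}\hat\rho_{\varphi_k}^{(s)}}\,\hat\rho_{\varphi_k}^{(s)}\,d\nu + U'(\infty)\,\hat\mu_{\varphi_k,s}^{(s)}(M) \\
&\le \int_{\{\hat\rho_{\varphi_k}^{(s)}>0\}} U'(\infty)\,\hat\rho_{\varphi_k}^{(s)}\,d\nu + U'(\infty)\,\hat\mu_{\varphi_k,s}^{(s)}(M) \\
&= U'(\infty)\,\hat\mu_{\varphi_k}^{(s)}(M) = U'(\infty).
\end{aligned}
\]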
Then by passing to the lim inf as k → ∞ in (1.7.10), we recover
\[ U_{Z^{(s)},\nu}(\hat\mu_t^{(s)}) \le (1-t)\,U_{Z^{(s)},\nu}(\hat\mu_0^{(s)}) + t\,U_{Z^{(s)},\nu}(\hat\mu_1^{(s)}) - \frac{\lambda}{2}\,t(1-t)\,W_2^2(\hat\mu_0^{(s)},\hat\mu_1^{(s)}). \tag{1.7.11} \]
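By combining together (1.7.8), (1.7.9) and (1.7.11), we obtain
\[
\begin{aligned}
U_\nu(\mu_t) \le{}& (1-t)\Bigl[Z^{(a)}\,U_{Z^{(a)},\nu}(\hat\mu_0^{(a)}) + Z^{(s)}\,U_{Z^{(s)},\nu}(\hat\mu_0^{(s)})\Bigr] + t\Bigl[Z^{(a)}\,U_{Z^{(a)},\nu}(\hat\mu_1^{(a)}) + Z^{(s)}\,U_{Z^{(s)},\nu}(\hat\mu_1^{(s)})\Bigr] \\
&- \frac{\lambda}{2}\,t(1-t)\Bigl[Z^{(a)}\,W_2^2(\hat\mu_0^{(a)},\hat\mu_1^{(a)}) + Z^{(s)}\,W_2^2(\hat\mu_0^{(s)},\hat\mu_1^{(s)})\Bigr].
\end{aligned}
\tag{1.7.12}
\]
The last term inside square brackets can be rewritten as
\[ \int d^2(\gamma_0,\gamma_1)\,\Pi^{(a)}(d\gamma) + \int d^2(\gamma_0,\gamma_1)\,\Pi^{(s)}(d\gamma) = \int d^2(\gamma_0,\gamma_1)\,\Pi(d\gamma) = W_2^2(\mu_0,\mu_1). \]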
Plugging this back into (1.7.12) and using Lemma 1.7.9, we conclude that
\[ U_\nu(\mu_t) \le (1-t)\,U_\nu(\mu_0) + t\,U_\nu(\mu_1) - \frac{\lambda}{2}\,t(1-t)\,W_2^2(\mu_0,\mu_1). \]
This finishes the proof of Theorem 1.7.7.
Proof of Corollary 1.7.8. Let U := r ↦ −r^{1−1/n}. By the estimates derived in [97,
Proposition E.17], Uν is well-defined on P_p(M). (This is made more explicit in [133,
Theorem 17.8 and Example 17.9].)
Let DCn be the displacement convex class of order n, that is, the class of functions
U ∈ C²(0, ∞) ∩ C([0, +∞)) such that U(0) = 0 and δ^n U(δ^{−n}) is a convex function of δ.
(See [133, Definition 17.1].) Obviously, U ∈ DCn. By [133, Proposition 17.7], there is a
sequence (U^{(ℓ)})_{ℓ∈N} of Lipschitz functions, all belonging to DCn, such that U^{(ℓ)} converges
monotonically to U as ℓ → ∞.
Since U^{(ℓ)} lies in DCn, it is by now classical (see [133, Theorem 17.15], which
summarizes the works of many authors) that U^{(ℓ)}_ν is a.c.c.s. displacement convex. By
Theorem 1.7.7, this functional is also displacement convex. Then it follows by an easy
limiting argument that Uν itself is displacement convex.
1.8 A generalization of the existence and uniqueness result
We now generalize the existence and uniqueness result for optimal transport maps
to the case where no integrability assumption is made on the cost function, adapting the ideas
of [107]. We assume that M is an n-dimensional manifold and N a locally compact
Polish space. We observe that, without the hypothesis
\[ \int_{M\times N} c(x,y)\,d\mu(x)\,d\nu(y) < +\infty, \]
in general the minimization problem
\[ C(\mu,\nu) := \inf_{\gamma\in\Pi(\mu,\nu)} \left\{ \int_{M\times N} c(x,y)\,d\gamma(x,y) \right\} \]
is ill-posed, as it may happen that C(µ, ν) = +∞. However, it is known that the
optimality of a transport plan γ is equivalent to the c-cyclical monotonicity of the
measure-theoretic support of γ whenever C(µ, ν) < +∞ (see [13], [120], [133]), and so one may
ask whether the fact that the support of γ is c-cyclically monotone implies that γ is
supported on a graph. Moreover, one can also ask whether this graph is unique, that is, whether
it does not depend on γ, which is the case when the cost is µ ⊗ ν-integrable, as Theorem
1.3.2 tells us. The uniqueness in that case follows from the fact that the functions ϕn are
constructed using a pair of functions (ϕ, ψ) which is optimal for the dual problem, and
so they are independent of γ. The result we now want to prove is the following:
Theorem 1.8.1. Assume that c : M × N → R is lower semicontinuous and bounded
from below, and let γ be a plan concentrated on a c-cyclically monotone set. If
(i) the family of maps x ↦ c(x, y) is locally semi-concave in x, locally uniformly
in y;
(ii) ∂c/∂x (x, ·) is injective on its domain of definition;
(iii) and the measure µ gives zero mass to sets with σ-finite (n − 1)-dimensional
Hausdorff measure,
then γ is concentrated on the graph of a measurable map T : M → N (existence).
Moreover, if γ̃ is another plan concentrated on a c-cyclically monotone set, then γ̃ is
concentrated on the same graph (uniqueness).
Proof. Since the proof of the existence result is the same as in Theorem 1.3.2, we
concentrate on the uniqueness part. As we observed before, the difference here with the
case of Theorem 1.3.2 is that the function ϕn depends on the pair (ϕ, ψ), which in this
case depends on γ.
Let (ϕ, ψ) be a pair associated to γ as in Theorem 1.3.2, and let ϕn and Bn be such
that γ is concentrated on the graph of the map T determined on Bn by
\[ \frac{\partial c}{\partial x}(x, T(x)) = -d_x\varphi_n \qquad \text{for } x \in B_n. \]
We observe that, thanks to the local compactness of N, in the formula
\[ \varphi_n(x) = \sup_{y\in V_n} \psi(y) - c(x,y) \tag{1.8.1} \]
we can assume Vn to be compact. We moreover recall the equality
\[ \varphi(x) = \psi(T(x)) - c(x,T(x)) \qquad \forall x \in \bigcup_n B_n. \tag{1.8.2} \]
Let now (ϕ̃, ψ̃) be a pair associated to γ̃, and let ϕ̃n , B̃n and T̃ be constructed as above.
We need to prove that T = T̃ µ-a.e.
Let us define Cn := Bn ∩ B̃n. Then µ(Cn) ↗ 1. We want to prove that, if x is a µ-density
point of Cn for a certain n, then T(x) = T̃(x) (we recall that, since µ(∪n Cn) = 1, also
the union of the µ-density points of Cn is of full µ-measure, see for example [61, Chapter
1.7]).
Let us assume by contradiction that T(x) ≠ T̃(x), that is
\[ d_x\varphi_n \ne d_x\tilde\varphi_n. \]
Since x ∈ supp(µ), each ball around x must have positive measure under µ. Moreover,
the fact that the sets {ϕn = ϕ} and {ϕ̃n = ϕ̃} have µ-density 1 at x implies that the set
{ϕ = ϕ̃}
has µ-density 0 at x. In fact, as ϕn and ϕ̃n are locally semi-convex, up to adding a C¹
function they are concave in a neighborhood of x and their gradients differ at x. So we
can apply the non-smooth version of the implicit function theorem proven in [107], which
tells us that {ϕn = ϕ̃n} is a set with finite (n − 1)-dimensional Hausdorff measure in a
neighborhood of x (see [107, Theorem 17 and Corollary 19]). So we have
\[
\limsup_{r\to 0} \frac{\mu(\{\varphi = \tilde\varphi\}\cap B_r(x))}{\mu(B_r(x))}
\le \limsup_{r\to 0} \left[ \frac{\mu(\{\varphi \ne \varphi_n\}\cap B_r(x))}{\mu(B_r(x))} + \frac{\mu(\{\varphi_n = \tilde\varphi_n\}\cap B_r(x))}{\mu(B_r(x))} + \frac{\mu(\{\tilde\varphi_n \ne \tilde\varphi\}\cap B_r(x))}{\mu(B_r(x))} \right] = 0.
\]
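Now, exchanging ϕn with ϕ̃n if necessary, we may assume that
\[ \mu(\{\varphi_n < \tilde\varphi_n\}\cap B_r(x)) \ge \frac{1}{3}\,\mu(B_r(x)) \qquad \text{for } r > 0 \text{ sufficiently small}, \]
which implies
\[ \mu(\{\varphi < \tilde\varphi\}\cap B_r(x)) \ge \frac{1}{4}\,\mu(B_r(x)) \qquad \text{for } r > 0 \text{ sufficiently small}. \tag{1.8.3} \]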
Let us define A := {ϕ < ϕ̃}, An := {ϕn < ϕ̃n }, En := A ∩ An ∩ Cn . Since the sets
{ϕn = ϕ} and {ϕ̃n = ϕ̃} have µ-density 1 in x, and x is a µ-density point of Cn , we have
\[ \lim_{r\to 0} \frac{\mu((A\setminus E_n)\cap B_r(x))}{\mu(B_r(x))} = 0, \]
and so, by (1.8.3), we get
\[ \mu(E_n\cap B_r(x)) \ge \frac{1}{5}\,\mu(B_r(x)) \qquad \text{for } r > 0 \text{ sufficiently small}. \tag{1.8.4} \]
Now, arguing as in the proof of Aleksandrov’s lemma (see [107, Lemma 13]), we can
prove that
\[ X := \tilde T^{-1}(T(A)) \subset A \]
and that X ∩ En lies at a positive distance from x. In fact let us assume, without loss of
generality, that
\[ \varphi(x) = \varphi_n(x) = \tilde\varphi(x) = \tilde\varphi_n(x) = 0, \qquad d_x\varphi_n \ne d_x\tilde\varphi_n = 0. \]
To obtain the inclusion X ⊂ A, let z ∈ X and y := T̃ (z). Then y = T (m) for a certain
m ∈ A. For any w ∈ M , recalling (1.8.2), we have
ϕ(w) ≤ c(m, y) − c(w, y) + ϕ(m),
ϕ̃(m) ≤ c(z, y) − c(m, y) + ϕ̃(z).

Since ϕ(m) < ϕ̃(m) we get
ϕ(w) < c(z, T̃ (z)) − c(w, T̃ (z)) + ϕ̃(z) ∀w ∈ M.
In particular, taking w = z, we obtain z ∈ A, which proves the inclusion X ⊂ A.

Let us suppose now, by contradiction, that there exists a sequence (zk ) ⊂ X ∩ En such
that zk → x. Again there exists mk such that T̃ (zk ) = T (mk ). As dx ϕ̃n = 0, the closure
of the superdifferential of a semi-concave function implies that dzk ϕ̃n → 0. We now
observe that, arguing exactly as above with ϕn and ϕ̃n instead of ϕ and ϕ̃, using (1.8.1),
(1.8.2), and the fact that ϕ = ϕn and ϕ̃ = ϕ̃n on Cn , one obtains
ϕn (w) < c(zk , T̃ (zk )) − c(w, T̃ (zk )) + ϕ̃n (zk ) ∀w ∈ M.
Taking w sufficiently near to x, we can assume that we are in R^n × N. We now remark
that, since zk ∈ En ⊂ D̃n, T̃(zk) varies in a compact subset of N. So, by hypothesis (i) on
c, we can find a common modulus of continuity ω in a neighborhood of x for the family
of uniformly semi-concave functions z ↦ c(z, T̃(zk)). Then, we get
\[
\begin{aligned}
\varphi_n(w) &< -\frac{\partial c}{\partial x}(z_k,\tilde T(z_k))(w - z_k) + \omega(|w - z_k|)\,|w - z_k| + \tilde\varphi_n(z_k) \\
&= d_{z_k}\tilde\varphi_n(w - z_k) + \omega(|w - z_k|)\,|w - z_k| + \tilde\varphi_n(z_k).
\end{aligned}
\]
Letting k → ∞ and recalling that $d_{z_k}\tilde\varphi_n \to 0$ and ϕ̃n(x) = ϕn(x) = 0, we obtain
\[ \varphi_n(w) - \varphi_n(x) \le \omega(|w - x|)\,|w - x| \;\Rightarrow\; d_x\varphi_n = 0, \]
which is absurd.
Thus there exists r > 0 such that Br(x) ∩ En and X ∩ En are disjoint, and (1.8.4) holds.
Defining now Y := T(A), by (1.8.4) we obtain
\[
\begin{aligned}
\nu(Y) = \mu(T^{-1}(Y)) \ge \mu(A) &= \mu(E_n) + \mu(A\setminus E_n) \ge \mu(B_r(x)\cap E_n) + \mu(X\cap E_n) + \mu(X\setminus E_n) \\
&= \mu(B_r(x)\cap E_n) + \mu(X) \ge \frac{1}{5}\,\mu(B_r(x)) + \nu(Y),
\end{aligned}
\]
which is absurd. □
Let us now consider the special case N = M, with M a complete manifold. As shown
in Section 1.4, the above theorem applies in the following cases:
1. c : M × M → R is defined by
\[ c(x,y) := \inf_{\gamma(0)=x,\ \gamma(1)=y} \int_0^1 L(\gamma(t),\dot\gamma(t))\,dt, \]
where the infimum is taken over all the continuous piecewise C¹ curves, and the
Lagrangian L(x, v) ∈ C²(TM, R) is C²-strictly convex and uniformly superlinear in
v, and satisfies a uniform boundedness condition in the fibers;
2. c(x, y) = d^p(x, y) for any p ∈ (1, +∞), where d(x, y) denotes a complete
Riemannian distance on M.
Chapter 2
The irrigation problem
2.1 Introduction
1
The variety of structures arising in nature is extraordinary. By exploring the relation-
ship between form and function, D’Arcy Thompson, in his pioneering work [53], tries
to find common principles behind the varied phenomena (physical, chemical, biological,
short or long time scale, etc.) that interact to give birth to these structures. Indeed,
despite the complexity of nature, the approach of retaining only a small but decisive set
of parameters and principles to model the phenomenon at the origin of a given structure
can be successful. See for example [113] or consider the work of Turing on morphogene-
sis that led him to explain the appearance of heterogeneous spatial patterns in terms of
reaction-diffusion mechanisms [131].
Recently, such an approach was taken to model branched networks that achieve a
transport from a source to a target. Such networks are everywhere in nature (plants and
trees, river basins, bronchial and cardiovascular systems) and in man-made
structures (communication networks, electric power supply, water distribution or drainage
networks). The common function of such networks is to transport some goods from an
initial distribution (the supply) to another (the demand). Following D’Arcy Thomp-
son, it is desirable to tie a link between this unity of form (branched networks) and
this unity of function (transporting goods from a supply to a demand). This was done
in [82, 98, 135, 25, 24, 29] by considering cost functions that encode the efficiency of
a transport induced by some structure. Branched structures, such as those observed in
nature, then arise as the optimal structures along which the transport takes place.
¹ This chapter is based on joint work with Marc Bernot [27].
A simple but crucial principle was incorporated in the design of all the cost functions
used by these authors. This principle states that it is more efficient to transport mass
in a grouped way rather than in a separate way. To embed this principle, the previously
mentioned costs incorporate a parameter α ∈ [0, 1] and make use of the concavity of
x ↦ x^α. The idea is that for positive masses m1 and m2, we have (m1 + m2)^α ≤ m1^α + m2^α,
so that it is advantageous for particles to move together in order to lower the cost (see for
example the role of α in (2.1.1)). This effect gets stronger as α decreases, while the limit
case α = 1 gives no importance to the grouping of particles.
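The subadditivity (m1 + m2)^α ≤ m1^α + m2^α is easy to probe numerically. The short Python sketch below uses a hypothetical helper, `grouped_over_separate`, which returns the ratio of the cost per unit length of moving two masses together to the cost of moving them separately. The ratio is at most 1, equals 1 at α = 1, and decreases as α decreases, which is one way to quantify that grouping matters more for small α. The masses used are arbitrary sample values.

```python
def grouped_over_separate(m1, m2, alpha):
    """Ratio (m1 + m2)^alpha / (m1^alpha + m2^alpha) for positive masses m1, m2."""
    return (m1 + m2) ** alpha / (m1 ** alpha + m2 ** alpha)

alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
ratios = [grouped_over_separate(1.0, 2.0, a) for a in alphas]

# Ratios never exceed 1 and increase with alpha; the last one (alpha = 1) equals 1.
for a, r in zip(alphas, ratios):
    print(a, r)
```

For two equal masses the ratio is 2^{α−1}, so the saving from grouping ranges from one half of the cost at α = 0 to nothing at α = 1.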
We now briefly review the different costs and descriptions of branched structures that
have been introduced so far. We then introduce a new dynamical cost functional, and
highlight the advantages it has over the other models.
The model described by Gilbert in [82] consists of a finite directed weighted graph G
with straight edges E(G) and a weight function w : E(G) → (0, ∞). The graph G
connects sources $\mu^+ = \sum_{i=1}^{k} a_i\,\delta_{x_i}$ and targets $\mu^- = \sum_{j=1}^{l} b_j\,\delta_{y_j}$ with $\sum_i a_i = \sum_j b_j$,
ai, bj ≥ 0, and is required to satisfy Kirchhoff’s law at each vertex. The cost of G is
defined to be:
\[ M^\alpha(G) = \sum_{e\in E(G)} w(e)^\alpha\,\mathcal{H}^1(e). \tag{2.1.1} \]
In [135], Xia extends this model to a continuous framework using Radon vector measures.
In both these models, the objects and their costs are static in the sense that no “particle”
is actually transported along the structure, and the cost depends only on the geometry
of the network.
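As a rough illustration of the cost (2.1.1), the Python sketch below evaluates it on two small graphs that both carry a unit mass from one source to two targets of mass 1/2 each: a V-shaped graph with two direct edges, and a Y-shaped graph with a common trunk. All coordinates, weights and the branch point are hypothetical sample data, and `gilbert_cost` is a helper introduced here, not a routine from the cited references. Both graphs satisfy Kirchhoff's law at every vertex.

```python
import math

def gilbert_cost(edges, alpha):
    # edges: list of (start_point, end_point, weight);
    # cost = sum over edges of weight^alpha * Euclidean length
    return sum(w ** alpha * math.dist(p, q) for p, q, w in edges)

source = (0.0, 0.0)
target_up, target_down = (2.0, 1.0), (2.0, -1.0)
branch = (1.0, 0.0)   # hypothetical branch point, not optimized

v_graph = [(source, target_up, 0.5), (source, target_down, 0.5)]
y_graph = [(source, branch, 1.0), (branch, target_up, 0.5), (branch, target_down, 0.5)]

for alpha in (0.5, 1.0):
    # For alpha = 0.5 the Y-shaped graph is cheaper; for alpha = 1 the V-shaped one is.
    print(alpha, gilbert_cost(v_graph, alpha), gilbert_cost(y_graph, alpha))
```

With α = 1 the cost is linear in the transported mass, so straight lines win. For smaller α the concavity of w ↦ w^α rewards the shared trunk, which is how branching appears in the model.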
In [98, 25, 24], a different kind of object, called traffic plan and denoted by χ, is
considered. In this framework, all particles are indexed by the set Ω := [0, 1], and
to each ω ∈ Ω is associated a 1-Lipschitz path χ(ω, ·) in R^N. This is a Lagrangian
description of the dynamics of particles that can be encoded by the image measure Pχ of
the map ω ↦ χ(ω, ·) (which is therefore a measure on the set of 1-Lipschitz paths). This
measure induces a network structure similar to the one considered by Xia. To each traffic
plan is associated a cost E^α which depends only on its network structure (see Definition
2.2.4) and, whenever it is finite, is the same as the one considered by Xia. Thus, though
a traffic plan is a dynamical object, its cost is static.
In [29], Brancolini, Buttazzo and Santambrogio consider an Eulerian formulation of
the problem, describing a transport from µ+ to µ− as a path in the space of measures.
The cost of such a path is defined as the length induced by a degenerate Riemannian
metric in the space of probability measures. More precisely, the cost of a path µ(t) is
given by
\[ \int_0^1 J(\mu(t))\,|\mu'|(t)\,dt, \]
where J is a functional in the space of probability measures and |µ′| denotes the metric
derivative (for the Wasserstein distance) of the path. Both the object and the cost are
dynamical in this model.

All the above described models propose structures that transport a measure µ+ to a
measure µ− and associate a cost to this structure. This leads one to consider what is called
the irrigation problem by some authors [25, 24, 26], i.e., given two measures µ+ and µ− ,
the problem of minimizing the cost among structures transporting µ+ to µ− . In the case
of the traffic plan model, an additional problem can be considered, namely the who goes
where problem [25, 24]. The latter problem consists in looking for an optimal structure
that achieves a given transference plan. In other words, rather than only prescribing the
initial and final distribution of masses as in the irrigation problem, one also prescribes
the coupling between initial and final positions of each particle. As an example, one can
think about the case where the initial distribution represents the habitations, and the
final distribution stands for the workplaces. In this case, it is natural to constrain each
inhabitant to go from his habitation to his workplace, and so the problem is to find the
best itinerary he can follow.
Here, we consider the Lagrangian formulation given in [98, 25]. This choice is
motivated by the fact that traffic plans allow one to recover the other descriptions. Indeed, given a
traffic plan χ, one can always canonically define both a structure similar to the one of
Xia, and a path in the space of measures by considering the time marginals of its induced
measure Pχ. We consider general costs of the form
\[ C(\chi) := \int_\Omega \int_{\mathbb{R}^+} c(\chi,\omega,t)\,|\dot\chi(\omega,t)|\,dt\,d\omega. \tag{2.1.2} \]
The advantage of the Lagrangian formulation with respect to the Eulerian one is that it
allows one to define costs of the above form which take into account the speed of each
single particle, so that only moving particles contribute to the total cost.
What we propose is to give a cost to the actual “dynamical” transport of mass from
µ+ to µ− that is induced by χ. To obtain such a cost, it is natural to require c(χ, ω, t) to
be local in space-time. By this property, we mean that c(χ, ω, t) only takes into account
the particles that are located at the point χ(ω, t) at time t. In [25], a cost
c(χ, ω, t) depending on the total mass of particles passing through the point χ(ω, t) at
some time is considered (see Definition 2.2.4). Since it takes into account only the global trajectories
of particles but not their local dynamics, this cost is local in space but not in time. The
associated functional E^α thus quantifies the cost of the structure achieving the transport,
rather than the cost of the transport itself. In other words, we could also say that E^α
evaluates the cost of a permanent regime connecting µ+ to µ−, rather than the cost of a
dynamical transport from µ+ to µ−. The elementary cost c we introduce in Definition
2.3.3 has the desired locality property, and we denote by C^α the induced cost via formula
(2.1.2). It is possible to extend the time domain by replacing R+ with R in (2.1.2), and
we denote by E^α_R and C^α_R the costs corresponding to E^α and C^α.
Figure 2.1: In the case of the static cost E^α in [25], a portion of a path where it
overlaps with itself contributes only once to the total cost, whereas the locality in
time of the model we propose gives the expected cost.
Figure 2.2: The best way to switch two equal masses between two points A and B is to
transport the mass at A to position B and the mass at B to position A along the segment
joining them. For such a structure, the C^α cost we propose distinguishes between
trajectories going from A to B and from B to A, which is not the case of the E^α cost.
Thus, the C^α cost is more realistic for the “who goes where” problem.
We illustrate the advantage of such a “dynamical” cost with respect to the static one
in [25] on two examples:
• It gives a more realistic cost to an overlapping path. Indeed, in the case of the
static cost in [25], a path that follows the same circuit twice contributes to the cost
once, while the locality in time of the model we propose gives the expected cost
(see Figure 2.1).
• It is more appropriate for the “who goes where” problem. Let us consider the
problem of two equal masses m located at points A and B, which represent both
the source and the target distribution, and where the transference plan constraint
consists in switching the two masses. In this case, the solution to this “who goes
where” problem is to transport the mass in A to position B and the mass in B
to position A along the segment joining them. For such a structure, the E^α cost
does not distinguish between trajectories going from A to B and from B to A.
Indeed, the E^α cost of this structure is |A − B|(2m)^α, while the natural one would
be 2|A − B|m^α. This is exactly the cost given by C^α (see Figure 2.2).
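The two closed-form values quoted in the second example can be compared directly. The Python sketch below does not implement the functionals E^α and C^α themselves (their definitions come later in the chapter); it only evaluates the two expressions |A − B|(2m)^α and 2|A − B|m^α stated above, through two hypothetical helpers, for sample values of the distance, the mass and α.

```python
def static_cost(length, m, alpha):
    # Value |A - B| (2m)^alpha: the segment is charged once, with the total mass 2m using it
    return length * (2 * m) ** alpha

def dynamical_cost(length, m, alpha):
    # Value 2 |A - B| m^alpha: the two opposite flows of mass m are charged separately
    return 2 * length * m ** alpha

length, m = 1.0, 1.0   # hypothetical distance |A - B| and mass
for alpha in (0.0, 0.5, 1.0):
    # The second value is 2^(1 - alpha) times the first.
    print(alpha, static_cost(length, m, alpha), dynamical_cost(length, m, alpha))
```

The ratio between the two values is 2^{1−α}, so the static value undercounts the switch by a factor of up to 2, and the two agree only at α = 1.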
We will consider the irrigation problems for all the costs just mentioned. As will be
proved in Section 2.5, the two irrigation problems with costs E^α and C^α are equivalent if
µ+ is finite atomic, while the equivalence for E^α_R and C^α_R always holds. More precisely, in
these cases, we will prove that any minimizer for the dynamical cost is an E^α-minimizer,
and that moreover, up to reparameterization, the converse is true (see the remarks after
Theorem 2.5.2). Since the cost E^α(χ) is invariant by reparameterization of the traffic
plan χ, while C^α(χ) in general is not, this fact will tell us in particular that the cost C^α
selects, among all the possible reparameterizations of an optimal traffic
plan χ, some particular ones, in which particles actually move in a grouped way.
Given two measures µ+ and µ−, let us define
\[ E^\alpha(\mu^+,\mu^-) := \inf E^\alpha(\chi), \]
where the infimum is taken over all traffic plans transporting µ+ onto µ− (the same can
be done with C^α, E^α_R and C^α_R). By the above formula, one obtains a one-parameter
family of distances between measures, each of them inducing the weak-∗ topology. It
turns out that the continuity of the function α ↦ E^α(µ+, µ−) is related to the following
stability property: given a converging sequence of traffic plans χn, respectively optimal
for the value αn, its limit is optimal for the limit value of αn. In particular, considering a
sequence αn → 1, one would obtain the convergence of optimal structures to an optimal
structure for the 1-Wasserstein distance. It is therefore of interest to study the α
dependence of E^α(µ+, µ−). This α dependence will be shown in Section 2.6 to be continuous
if α ∈ [1 − 1/N, 1] (N being the dimension of the ambient space).
The plan is as follows. In Section 2.2, we recall the main definitions and results
concerning traffic plans. In Section 2.3, we consider the energy functional of [25] in a
more general framework for which we obtain a general lower semicontinuity result. Then
we define a new dynamical (in the sense previously discussed) cost functional and obtain a
partial result of existence of a “dynamical” optimal traffic plan for the irrigation problem.
We can however obtain a more complete existence result by studying the properties of
E α -minimizers. Indeed, in Section 2.4, we prove that any E α -optimal traffic plan can
be suitably reparameterized. From this fact, we deduce in Section 2.5 that the cost of
optimal traffic plans and dynamical optimal traffic plans are the same, and that any
$E^\alpha$-optimal traffic plan can be reparameterized so that it also becomes optimal for the
dynamical cost C α (this is always true for ERα and CRα , while for E α and C α we need µ+
to be finite atomic). Finally, in Section 2.6, we prove continuity results of E α (µ+ , µ− )
with respect to α, for fixed µ+ and µ− . As we already said above, this implies that limits
of optimal (for different values of α) traffic plans are still optimal for the limit value.
2.2 Traffic plans

In this section, we recall the main definitions and results concerning traffic plans (see [98,
135, 24, 25, 26]). Let $X$ be some compact convex $N$-dimensional set in $\mathbb{R}^N$. We shall
denote by $\mathcal{L}^1(A)$ the Lebesgue measure of a measurable set $A \subset \mathbb{R}$, and by $\mathrm{Lip}_1(\mathbb{R}^+, X)$
the space of 1-Lipschitz curves in $X$ with the metric of uniform convergence on compact
sets of $\mathbb{R}^+$.
Definition 2.2.1. Let $\Omega = [0, 1]$. A traffic plan is a measurable map $\chi : \Omega \times \mathbb{R}^+ \to X$
such that for all $\omega$, $t \mapsto \chi(\omega, t)$ is 1-Lipschitz, and constant for $t$ sufficiently large.
Without risk of ambiguity, we shall call fiber both the path $\chi(\omega, \cdot)$ and $\omega \in \Omega$. We
denote by $P_\chi$ the law of $\omega \mapsto \chi(\omega) \in \mathrm{Lip}_1(\mathbb{R}^+, X)$ defined by $P_\chi(E) := \mathcal{L}^1(\chi^{-1}(E))$ for
every Borel set $E \subset \mathrm{Lip}_1(\mathbb{R}^+, X)$.
We remark that in the sequel we will also need to consider the restriction of a traffic
plan to a certain subset of fibers $\Omega' \subset \Omega$. By abuse of notation, though $\Omega'$ will not be of
unit mass, we will still call $\chi \llcorner \Omega'$ a traffic plan.
Definition 2.2.2. Two traffic plans $\chi$ and $\chi'$ are said to be equivalent if $P_\chi = P_{\chi'}$.
In what follows, a “traffic plan” also means the equivalence class of some $\chi$. All
proven properties of a traffic plan will be true for any representative up to the addition
or removal of a set of fibers with zero measure.
Stopping time, irrigated measures, transference plan

If χ : Ω × R+ → X is a traffic plan, define its stopping time by
Tχ (ω) := inf{t ≥ 0 : χ(ω) is constant on [t, ∞)}.
Let us denote the initial and final point of a fiber ω by τ (ω) = χ(ω, 0) and σ(ω) =
χ(ω, Tχ (ω)). To any χ, one can associate its irrigating and irrigated measure respectively
defined by
µ+ (χ)(A) := τ# Pχ (A) = L 1 ({ω : χ(ω, 0) ∈ A}),
µ− (χ)(A) := σ# Pχ (A) = L 1 ({ω : χ(ω, Tχ (ω)) ∈ A}),
where A is any Borel subset of RN .
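The definitions above can be illustrated on a discretised traffic plan. The following is a minimal sketch, not part of the original text: it assumes that $\Omega$ is replaced by finitely many fibers carrying given masses, and that each fiber is a list of positions sampled at the integer times $t = 0, 1, 2, \dots$ The function names `stopping_time` and `endpoint_measures` are hypothetical.

```python
from collections import defaultdict


def stopping_time(fiber):
    """Smallest sample index t such that the fiber is constant from t onwards."""
    t = len(fiber) - 1
    while t > 0 and fiber[t - 1] == fiber[-1]:
        t -= 1
    return t


def endpoint_measures(fibers, masses):
    """Return (mu_plus, mu_minus) as dictionaries {position: mass}."""
    mu_plus, mu_minus = defaultdict(float), defaultdict(float)
    for fiber, m in zip(fibers, masses):
        mu_plus[fiber[0]] += m                       # tau(omega) = chi(omega, 0)
        mu_minus[fiber[stopping_time(fiber)]] += m   # sigma(omega) = chi(omega, T_chi(omega))
    return dict(mu_plus), dict(mu_minus)


# Two fibers of mass 1/2 leaving the same source and splitting after one step.
fibers = [
    [(0, 0), (1, 0), (1, 1), (1, 1)],
    [(0, 0), (1, 0), (1, -1), (1, -1)],
]
masses = [0.5, 0.5]
mu_plus, mu_minus = endpoint_measures(fibers, masses)
```

In this toy plan the irrigating measure is a single Dirac mass at the source and the irrigated measure splits the mass evenly between the two endpoints.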
Energy of a traffic plan

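Definition 2.2.3. Let $\chi : \Omega \times \mathbb{R}^+ \to X$ be a traffic plan. Define the path class of $x \in \mathbb{R}^N$
in $\chi$ as the set
\[ \Omega^\chi_x := \{\omega : x \in \chi(\omega, \mathbb{R})\}, \]
and the multiplicity of $\chi$ at $x$ by $\theta_\chi(x) = \mathcal{L}^1(\Omega^\chi_x)$. For simplicity, we shall write $\Omega_x := \Omega^\chi_x$,
whenever the underlying traffic plan $\chi$ is not ambiguous.
We use the convention that $0^{\alpha-1} = +\infty$ for $\alpha \in [0, 1)$.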
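Definition 2.2.4. Let $\alpha \in [0, 1]$. We call energy of a traffic plan $\chi : \Omega \times \mathbb{R}^+ \to X$ the
functional
\[ E^\alpha(\chi) = \int_\Omega \int_{\mathbb{R}^+} \theta_\chi(\chi(\omega, t))^{\alpha-1} |\dot\chi(\omega, t)| \, dt \, d\omega. \tag{2.2.1} \]
Let $\mu^+$, $\mu^-$ be two probability measures in $X$. Denote by $\mathrm{TP}(\mu^+, \mu^-)$ the set of traffic
plans $\chi$ such that $\mu^+(\chi) = \mu^+$ and $\mu^-(\chi) = \mu^-$. If $C > 0$, call $\mathrm{TP}_C$ the set of traffic
plans such that $\int_\Omega T_\chi(\omega) \, d\omega \le C$, and $\mathrm{TP}_C(\mu^+, \mu^-) := \mathrm{TP}(\mu^+, \mu^-) \cap \mathrm{TP}_C$.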
Convergence
Definition 2.2.5. We say that a sequence of traffic plans χn converges to a traffic plan
χ if Pχn weakly-∗ converges to Pχ , or equivalently if the random variables χn converge
in law to χ.
Definition 2.2.6. We say that a sequence of traffic plans χn fiber converges to a traffic
plan χ if χn (ω) converges to χ(ω) uniformly on compact subsets of R+ for every ω ∈ Ω
(this is stronger than the usual almost sure convergence of random variables).
Remark 2.2.7. By Skorokhod theorem (see Theorem 11.7.2 [57]) χn converges to χ if
and only if there exist χ̃n and χ̃ equivalent to χn and χ respectively and such that χ̃n (ω)
fiber converges to χ̃(ω).
Proposition 2.2.8. Up to a subsequence, any sequence of traffic plans $\chi_n$ in $\mathrm{TP}_C$ converges
to a traffic plan $\chi$. In addition, $\mu^+(\chi_n) \rightharpoonup \mu^+(\chi)$ and $\mu^-(\chi_n) \rightharpoonup \mu^-(\chi)$.
Existence of minimizers
The optimization problem we are interested in is the irrigation problem, i.e. the problem
of minimizing E α (χ) in TP(µ+ , µ− ). The following results are proved in [24, 98, 25].
Theorem 2.2.9. If $C > 0$ and $\chi_n : \Omega \times \mathbb{R}^+ \to X$ is a sequence in $\mathrm{TP}_C$ converging to
the traffic plan $\chi$, then
\[ E^\alpha(\chi) \le \liminf_n E^\alpha(\chi_n). \]
We notice that the cost $E^\alpha(\chi)$ is invariant by time-reparameterization of $\chi$. Therefore,
one can always reparameterize $\chi$ so that $|\dot\chi(\omega, t)| = 1$ for all $t \in (0, T_\chi(\omega))$ without
changing the cost. In this case, since $\theta_\chi^{\alpha-1} \ge 1$, one gets $\int_\Omega T_\chi(\omega) \, d\omega \le E^\alpha(\chi)$. Thus,
if $\chi_n$ is a sequence of traffic plans with a uniformly bounded $E^\alpha$ cost, it is in $\mathrm{TP}_C$ up
to reparameterization for $C$ big enough. By Proposition 2.2.8 and Theorem 2.2.9, the
direct method of the calculus of variations ensures the existence of an optimal traffic
plan in $\mathrm{TP}(\mu^+, \mu^-)$.
Corollary 2.2.10. The problem of minimizing E α (χ) in TP(µ+ , µ− ) admits a solution.
Definition 2.2.11. A traffic plan χ is said to be optimal for the irrigation problem if it
is of minimal cost in TP(µ+ (χ), µ− (χ)).
Let
\[ E^\alpha(\mu^+, \mu^-) := \min_{\mathrm{TP}(\mu^+, \mu^-)} E^\alpha(\chi). \]
As proved in [25], there is an optimal traffic plan in TP(µ+ , µ− ) which is loop-free, i.e.
for almost any ω ∈ Ω, the map χ(ω, ·) is one to one in [0, Tχ (ω)]. Moreover, using
Propositions 6.4 and 6.6 in [25], given any optimal traffic plan with finite energy there is
an equivalent loop-free traffic plan with the same energy, hence optimal. Thus, without
loss of generality, we may assume that optimal traffic plans are loop-free.
The triangle inequality for the cost E α holds (just think of concatenating traffic plans
[26]):
Proposition 2.2.12. Let µ0 , µ1 and µ2 be probability measures. We have the triangle

inequality
E α (µ0 , µ2 ) ≤ E α (µ0 , µ1 ) + E α (µ1 , µ2 ).
Stability with respect to µ+ and µ−

The following results were first proved in a slightly different framework by Xia [135], and
their proofs adapt immediately to traffic plans (see [24]). We remark that, here and in
the sequel of the chapter, by atomic measure we mean a finite sum of delta measures.
Let $C$ be a cube with edge length $L$ and center $c$. Let $\nu$ be a probability measure on
the compact set $X$ where $X \subset C$. We may approximate $\nu$ by atomic measures as follows.
For each $i$, let
\[ C_i := \{C_i^h : h \in \mathbb{Z}^N \cap [0, 2^i)^N\} \]
be a partition of $C$ into cubes of edge length $L/2^i$. Now, for each $h \in \mathbb{Z}^N \cap [0, 2^i)^N$, let $c_i^h$
be the center of $C_i^h$ and $m_i^h = \nu(C_i^h)$ be the $\nu$ mass of the cube $C_i^h$.
Definition 2.2.13. We define the dyadic approximation of $\nu$ as
\[ A_i(\nu) := \sum_{h \in \mathbb{Z}^N \cap [0, 2^i)^N} m_i^h \, \delta_{c_i^h}. \]
We observe that the measures Ai (ν) weakly-∗ converge to ν.
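As a hedged illustration of Definition 2.2.13, the sketch below computes the dyadic approximation of a finitely supported measure. It is not taken from the original text: the function name `dyadic_approximation`, the representation of $\nu$ as lists of points and weights, and the convention of assigning points on the upper faces of the cube to the last cell are all assumptions.

```python
from collections import defaultdict
from math import floor


def dyadic_approximation(points, weights, i, center, L):
    """Dyadic approximation A_i(nu) of nu = sum_k weights[k] * delta_{points[k]}.

    The cube has the given center and edge length L and is split into
    2**i cells per axis. Returns a dictionary {cell center: mass}.
    """
    cells = 2 ** i
    edge = L / cells
    corner = tuple(c - L / 2 for c in center)       # lower corner of the cube
    approx = defaultdict(float)
    for p, w in zip(points, weights):
        # Multi-index h of the cell containing p (clamped into the last cell).
        h = tuple(min(cells - 1, floor((x - c0) / edge)) for x, c0 in zip(p, corner))
        cell_center = tuple(c0 + (hk + 0.5) * edge for hk, c0 in zip(h, corner))
        approx[cell_center] += w                    # m_i^h accumulates the nu-mass of the cell
    return dict(approx)


points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.8)]
weights = [0.25, 0.25, 0.5]
A1 = dyadic_approximation(points, weights, 1, (0.5, 0.5), 1.0)
A2 = dyadic_approximation(points, weights, 2, (0.5, 0.5), 1.0)
```

At level 1 the first two points fall into the same cell and their masses merge; at level 2 they are separated, which reflects the refinement of the partition as $i$ grows.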
Proposition 2.2.14. Let $\alpha \in (1 - \frac{1}{N}, 1]$. Let $\nu$ be a probability measure with support in
a cube centered at $c$ and of edge length $L$. We have
\[ E^\alpha(A_n(\nu), \nu) \le \frac{2^{n(N(1-\alpha)-1)}}{2^{1-N(1-\alpha)} - 1} \, \frac{\sqrt{N} \, L}{2}. \]
In particular, $E^\alpha(A_n(\nu), \nu) \to 0$ locally uniformly in $\alpha$ for all $\nu$ when $n \to \infty$.
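The role of the threshold $\alpha > 1 - \frac{1}{N}$ in this estimate can be checked numerically. The following sketch simply evaluates the right-hand side of the bound; the function name `dyadic_bound` is hypothetical and the code is only an illustration of the formula, not part of the original argument.

```python
from math import sqrt


def dyadic_bound(n, N, alpha, L):
    """Right-hand side of the estimate in Proposition 2.2.14."""
    s = 1.0 - N * (1.0 - alpha)          # s > 0 exactly when alpha > 1 - 1/N
    if s <= 0:
        raise ValueError("the bound requires alpha > 1 - 1/N")
    # 2**(n*(N*(1-alpha)-1)) equals 2**(-n*s), so the bound decays geometrically in n.
    return 2.0 ** (-n * s) / (2.0 ** s - 1.0) * sqrt(N) * L / 2.0


# In dimension N = 2 with alpha = 0.75, each refinement divides the bound by sqrt(2).
bounds = [dyadic_bound(n, 2, 0.75, 1.0) for n in range(5)]
```

As $\alpha$ decreases to $1 - \frac{1}{N}$ the exponent $s$ tends to 0 and the constant $1/(2^s - 1)$ blows up, which is consistent with the convergence being only locally uniform in $\alpha$.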
By this result and Theorem 2.2.9, it is not difficult to prove that the cost $E^\alpha$ metrizes
the weak-∗ convergence for $\alpha \in (1 - \frac{1}{N}, 1]$.
Lemma 2.2.15. Let $\alpha \in (1 - \frac{1}{N}, 1]$. A sequence of probability measures $\nu_n$ weakly-∗
converges to $\nu$ if and only if $E^\alpha(\nu_n, \nu) \to 0$ when $n \to \infty$.
Corollary 2.2.16. Let $\alpha \in (1 - \frac{1}{N}, 1]$. If $\chi_n$ is a sequence of optimal traffic plans for
the irrigation problem and $\chi_n \to \chi$, then $\chi$ is optimal.
Moreover, by Proposition 2.2.14, $E^\alpha(\mu^+, \mu^-)$ is always finite for $\alpha \in (1 - \frac{1}{N}, 1]$.
Regularity
The following regularity results were proved in [26].
Proposition 2.2.17. Let µ+ and µ− be atomic probability measures and α ∈ [0, 1]. An
optimum for the irrigation problem is a finite tree made of segments (in the sense that
the fibers χ(ω, ·), once parameterized by arc lengths, describe a finite set of piecewise
linear curves).
Theorem 2.2.18. Let $\alpha \in (1 - \frac{1}{N}, 1)$ and let $\chi$ be an optimal traffic plan in $\mathrm{TP}(\mu^+, \mu^-)$.
Assume that the supports of $\mu^+$ and $\mu^-$ are at positive distance. In any closed ball $B(x, r)$
not meeting the supports of $\mu^+$ and $\mu^-$, the traffic plan has the structure of a finite graph.
Extension of the time domain

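In Sections 2.4 and 2.5, we will consider traffic plans defined on $\Omega \times \mathbb{R}$. All the notions
introduced above are easy to generalize, and we shall denote by $\mathrm{TP}_{\mathbb{R}}(\mu^+, \mu^-)$ the set of
extended traffic plans from $\mu^+$ to $\mu^-$ and by $E^\alpha_{\mathbb{R}}$ the corresponding cost. We denote by
$\mathrm{TP}_{\mathbb{R},C}(\mu^+, \mu^-)$ the traffic plans $\chi \in \mathrm{TP}_{\mathbb{R}}(\mu^+, \mu^-)$ such that $\int_\Omega T_\chi(\omega) \, d\omega \le C$, where for
a traffic plan in $\mathrm{TP}_{\mathbb{R}}$
\[ T_\chi(\omega) := \inf\{t + s : t, s \ge 0, \ \chi(\omega) \text{ is constant on } (-\infty, -s] \cup [t, \infty)\}. \]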
Any traffic plan χ ∈ TPR can be shifted in time so that it can be seen as a traffic plan
in TP and the corresponding ERα and E α costs are the same. Thus, from the point
of view of the irrigation problem, the two formalisms yield the same optimal objects.
However, the introduction of this extended model is made necessary for the study of
the dynamical framework we propose, since the dynamical cost we will consider is not
invariant by time-reparameterization.
2.3 Dynamic cost of a traffic plan

Let $\chi$ be a traffic plan and $c(\chi, \omega, t)$ the elementary cost due to the particle $\omega$ along the
fiber $\chi(\omega)$ at time $t$. We define a general cost function $C$ of a traffic plan $\chi$ as follows:
\[ C(\chi) := \int_\Omega \int_{\mathbb{R}^+} c(\chi, \omega, t) |\dot\chi(\omega, t)| \, dt \, d\omega. \tag{2.3.1} \]
The choice c(χ, ω, t) = θχ (χ(ω, t))α−1 yields the energy of a traffic plan given by
Definition 2.2.4. In this section, we first prove that for a large class of elementary costs
c(χ, ω, t), the cost of a traffic plan C(χ) is lower semicontinuous. Then, we introduce
a dynamical elementary cost (see the introduction for the meaning of dynamical) for
which the corresponding cost C is lower semicontinuous. This yields the existence of a
minimizer for the dynamical irrigation problem.
Proposition 2.3.1. Let $c : \mathrm{TP}(\mu^+, \mu^-) \times \Omega \times \mathbb{R}^+ \to \mathbb{R}^+$ be such that $c(\cdot, \omega, \cdot)$ is lower
semicontinuous (with respect to the fiber convergence on traffic plans and the usual topology
in $\mathbb{R}^+$) for all $\omega$. If $\chi_n : \Omega \times \mathbb{R}^+ \to X$ fiber converges to the traffic plan $\chi$, then
\[ C(\chi) \le \liminf_n C(\chi_n). \]
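Proof. Let us set $c^\lambda(\chi, \omega, t) := \inf_{s \ge 0} \{c(\chi, \omega, s) + \lambda|t - s|\}$. Since $c(\chi, \omega, \cdot)$ is lower
semicontinuous, it is classical (see [10]) that $c^\lambda(\chi, \omega, \cdot)$ is $\lambda$-Lipschitz and that
\[ c(\chi, \omega, t) = \sup_\lambda c^\lambda(\chi, \omega, t). \]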
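The regularisation $c^\lambda$ used here can be explored on a grid. The sketch below is an illustration under stated assumptions, not part of the proof: the infimum over $s \ge 0$ is replaced by a minimum over finitely many grid points, the dependence on $\chi$ and $\omega$ is dropped, and the names `lipschitz_envelope` and `c` are hypothetical.

```python
def lipschitz_envelope(c, grid, lam):
    """Discrete version of c^lam(t) = inf_s { c(s) + lam * |t - s| } on the grid."""
    return [min(c(s) + lam * abs(t - s) for s in grid) for t in grid]


def c(t):
    """A lower semicontinuous step function: 0 on [0, 1], 1 on (1, 2]."""
    return 0.0 if t <= 1.0 else 1.0


grid = [k / 10 for k in range(21)]            # sample points in [0, 2]
env2 = lipschitz_envelope(c, grid, 2.0)       # lam = 2: a 2-Lipschitz minorant of c
env20 = lipschitz_envelope(c, grid, 20.0)     # lam = 20: already equal to c on this grid
```

On this example the envelope lies below $c$, is $\lambda$-Lipschitz, and increases with $\lambda$ until it coincides with $c$ at the grid points, which mirrors the identity $c = \sup_\lambda c^\lambda$.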
Let us prove that $c^\lambda(\cdot, \omega, t)$ is lower semicontinuous for all $\omega$ and $t$. Let $\chi_n \to \chi$, and,
for fixed $\omega$ and $t$, assume that up to a subsequence the liminf of $c^\lambda(\chi_n, \omega, t)$ is indeed a
limit. Now, for each $n$, take $t_n$ such that
\[ c^\lambda(\chi_n, \omega, t) \ge c(\chi_n, \omega, t_n) + \lambda|t - t_n| - \frac{1}{n}. \]
If $t_n \to +\infty$, since $c$ is non-negative,
\[ \lim_n c^\lambda(\chi_n, \omega, t) \ge \liminf_n \lambda|t - t_n| = +\infty \ge c^\lambda(\chi, \omega, t). \]
Otherwise, up to a subsequence, we can assume $t_n \to t_\infty$, so that $\liminf_n c(\chi_n, \omega, t_n) \ge c(\chi, \omega, t_\infty)$. Therefore,
\[ \lim_n c^\lambda(\chi_n, \omega, t) \ge \liminf_n \big( c(\chi_n, \omega, t_n) + \lambda|t - t_n| \big) \ge c(\chi, \omega, t_\infty) + \lambda|t - t_\infty| \ge c^\lambda(\chi, \omega, t). \]
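Let us fix now $T > 0$ and $\varepsilon > 0$, and let us consider $0 = t_1 \le \dots \le t_i \le \dots \le t_k = T$
such that $|t_{i+1} - t_i| \le \varepsilon$. Since $c^\lambda(\chi, \omega, \cdot)$ is $\lambda$-Lipschitz, $|\chi(\omega, \cdot)|$ is 1-Lipschitz, and
$\chi \mapsto \int |\dot\chi(\omega, t)| \, dt$ and $\chi \mapsto c^\lambda(\chi, \omega, t)$ are lower semicontinuous for the fiber convergence,
we have:
\[
\begin{aligned}
\liminf_n \int_{[0,T]} c^\lambda(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt
&\ge \liminf_n \sum_i \Big[ c^\lambda(\chi_n, \omega, t_i) \int_{t_i}^{t_{i+1}} |\dot\chi_n(\omega, t)| \, dt - \lambda\varepsilon(t_{i+1} - t_i) \Big] \\
&\ge \sum_i \Big[ c^\lambda(\chi, \omega, t_i) \int_{t_i}^{t_{i+1}} |\dot\chi(\omega, t)| \, dt - \lambda\varepsilon(t_{i+1} - t_i) \Big] \\
&\ge \int_{[0,T]} c^\lambda(\chi, \omega, t) |\dot\chi(\omega, t)| \, dt - 2\lambda\varepsilon T.
\end{aligned}
\]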
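This being true for all $\varepsilon$, we get for a.e. $\omega$ and all $T > 0$,
\[
\begin{aligned}
\liminf_n \int_{\mathbb{R}^+} c(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt
&\ge \liminf_n \int_{[0,T]} c(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt \\
&\ge \liminf_n \int_{[0,T]} c^\lambda(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt
\ge \int_{[0,T]} c^\lambda(\chi, \omega, t) |\dot\chi(\omega, t)| \, dt.
\end{aligned}
\]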
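Then, by Fatou’s lemma,
\[
\begin{aligned}
\liminf_n C(\chi_n) &= \liminf_n \int_\Omega \int_{\mathbb{R}^+} c(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt \, d\omega \\
&\ge \int_\Omega \liminf_n \int_{\mathbb{R}^+} c(\chi_n, \omega, t) |\dot\chi_n(\omega, t)| \, dt \, d\omega
\ge \int_\Omega \int_{[0,T]} c^\lambda(\chi, \omega, t) |\dot\chi(\omega, t)| \, dt \, d\omega,
\end{aligned}
\]
and we conclude thanks to the monotone convergence theorem. □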

We now define the dynamical multiplicity of ω at time t as the proportion of particles
that are exactly at the same place as ω at time t.
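Definition 2.3.2. Let $\chi : \Omega \times \mathbb{R}^+ \to X$ be a traffic plan. We define the path class of
$(\omega, t) \in \Omega \times \mathbb{R}$ in $\chi$ as the set
\[ [\omega, t]_\chi := \{\omega' : \chi(\omega', t) = \chi(\omega, t)\} \]
and the multiplicity of $\chi$ at $(\omega, t)$ by $\tilde\theta_\chi(\omega, t) := \mathcal{L}^1([\omega, t]_\chi)$.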

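Definition 2.3.3. Let $\alpha \in [0, 1]$. We call dynamical cost of a traffic plan $\chi : \Omega \times \mathbb{R}^+ \to X$
the functional
\[ C^\alpha(\chi) = \int_\Omega \int_{\mathbb{R}^+} \tilde\theta_\chi(\omega, t)^{\alpha-1} |\dot\chi(\omega, t)| \, dt \, d\omega, \tag{2.3.2} \]
i.e. $C^\alpha(\chi) = C(\chi)$ with $c(\chi, \omega, t) = \tilde\theta_\chi(\omega, t)^{\alpha-1}$.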
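To make the difference between the multiplicity $\theta_\chi$ and the dynamical multiplicity $\tilde\theta_\chi$ concrete, the following sketch evaluates discretised versions of $E^\alpha$ and $C^\alpha$ for fibers moving on the integer lattice. It is only an illustration under assumptions that are not in the original text: time is sampled at integers, each fiber moves by at most one site per step, fibers are loop-free, and the function name `static_and_dynamic_cost` is hypothetical.

```python
def static_and_dynamic_cost(fibers, masses, alpha):
    """Discretised E^alpha and C^alpha for fibers on the integer lattice.

    fibers[j][k] is the integer position of fiber j at time k. All fibers have
    the same number of samples and move by at most one site per time step.
    """
    def edge(f, k):                     # oriented edge traversed during step k
        return (f[k], f[k + 1])

    # Static multiplicity: mass of all fibers that ever traverse a segment.
    static = {}
    for f, m in zip(fibers, masses):
        segments = {frozenset(edge(f, k)) for k in range(len(f) - 1) if f[k] != f[k + 1]}
        for seg in segments:
            static[seg] = static.get(seg, 0.0) + m

    E = C = 0.0
    for f, m in zip(fibers, masses):
        for k in range(len(f) - 1):
            if f[k] == f[k + 1]:
                continue                # the fiber is at rest: no contribution
            # Dynamical multiplicity: mass travelling together with f during step k.
            together = sum(mj for fj, mj in zip(fibers, masses) if edge(fj, k) == edge(f, k))
            E += m * static[frozenset(edge(f, k))] ** (alpha - 1)   # unit step length
            C += m * together ** (alpha - 1)
    return E, C


m, alpha = 0.5, 0.5
swap = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]             # the two masses exchange positions
grouped = [[0, 1, 2, 3, 4, 4], [0, 1, 2, 3, 4, 4]]    # both masses travel together
delayed = [[0, 1, 2, 3, 4, 4], [0, 0, 1, 2, 3, 4]]    # same paths, second mass one step late
E_swap, C_swap = static_and_dynamic_cost(swap, [m, m], alpha)
E_grp, C_grp = static_and_dynamic_cost(grouped, [m, m], alpha)
E_del, C_del = static_and_dynamic_cost(delayed, [m, m], alpha)
```

With $|A - B| = 4$, the swap gives $E^\alpha = |A - B|(2m)^\alpha$ and $C^\alpha = 2|A - B|m^\alpha$. The grouped and delayed plans are reparameterizations of each other: they have the same $E^\alpha$, but only the grouped one has $C^\alpha = E^\alpha$.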

Theorem 2.3.4. If $\chi_n : \Omega \times \mathbb{R}^+ \to X$ is a sequence in $\mathrm{TP}(\mu^+, \mu^-)$ converging to the
traffic plan $\chi$, then
\[ C^\alpha(\chi) \le \liminf_n C^\alpha(\chi_n). \]
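Proof. Let us denote
\[ \delta(x, y) = \begin{cases} 0 & \text{if } x \ne y, \\ 1 & \text{if } x = y. \end{cases} \]
Setting
\[ c(\chi, \omega, t) := \Big[ \int_\Omega \delta(\chi(\omega, t), \chi(\omega', t)) \, d\omega' \Big]^{\alpha-1}, \]
where $\alpha \in [0, 1]$, we observe that $c(\chi, \omega, t) = \tilde\theta_\chi(\omega, t)^{\alpha-1}$, so that $C^\alpha(\chi) = C(\chi)$ as
defined by (2.3.1). Let us consider a sequence of traffic plans $\chi_n$ fiber converging to $\chi$,
and $t_n \to t$. We remark that the function
\[ \mathbb{R}^N \times \mathbb{R}^N \ni (x, y) \mapsto \delta(x, y) \in \mathbb{R} \]
is upper semicontinuous. Therefore, since $\chi_n(\omega)$ is a 1-Lipschitz curve, if $\chi_n(\omega) \to \chi(\omega)$
and $t_n \to t$ we have
\[ \limsup_n \delta(\chi_n(\omega, t_n), \chi_n(\omega', t_n)) \le \delta(\chi(\omega, t), \chi(\omega', t)). \]
Thus, by Fatou’s lemma,
\[ \limsup_n \int_\Omega \delta(\chi_n(\omega, t_n), \chi_n(\omega', t_n)) \, d\omega' \le \int_\Omega \delta(\chi(\omega, t), \chi(\omega', t)) \, d\omega', \]
and since $\alpha \le 1$,
\[ \liminf_n c(\chi_n, \omega, t_n) \ge c(\chi, \omega, t). \tag{2.3.3} \]
Therefore Proposition 2.3.1 ensures that $C^\alpha$ is lower semicontinuous. □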
Remark 2.3.5. It is not difficult to prove the upper semicontinuity of the multiplicity
θχ (χ(ω, t)), so that the elementary cost c(χ, ω, t) = θχ (χ(ω, t))α−1 satisfies the hypothesis
of Proposition 2.3.1. This yields a new simple proof of Theorem 2.2.9.
Like in the last paragraph of Section 2.2, it is possible to consider a dynamical cost
CRα (χ) for χ ∈ TPR (µ+ , µ− ). Proposition 2.3.1 and Theorem 2.3.4 hold with TPR and
CRα in place of TP and C α . The compactness of TPC stated in Proposition 2.2.8 yields:
Proposition 2.3.6. Let µ+ and µ− be probability measures on X, and let C > 0 be such
that TPC (µ+ , µ− ) is not empty (for example, take C ≥ diam(X)). Then, there exist
C α -minimizers (resp. CRα -minimizers) in TPC (resp. TPR,C ).
The argument used to prove Corollary 2.2.10 (which states the existence of $E^\alpha$-minimizers
in $\mathrm{TP}$) cannot be adapted to the case of $C^\alpha$, since neither $C^\alpha(\chi)$ nor $C^\alpha_{\mathbb{R}}(\chi)$ is invariant by
time-reparameterization of $\chi$. In particular, the situation where $C^\alpha$-minimizers in $\mathrm{TP}_C$
change as $C$ increases to $+\infty$ is not excluded (this is not the case for $E^\alpha$, since by the
reparameterization argument used to prove Corollary 2.2.10 we know that all minimizers
are in $\mathrm{TP}_C$ for $C = E^\alpha(\mu^+, \mu^-)$). However, we shall see in Section 2.5 that, by using the
synchronization techniques developed in Section 2.4, we are still able to prove existence
of $C^\alpha$-minimizers in $\mathrm{TP}(\mu^+, \mu^-)$ provided that $\mu^+$ is finite atomic, and of $C^\alpha_{\mathbb{R}}$-minimizers
in $\mathrm{TP}_{\mathbb{R}}(\mu^+, \mu^-)$.
2.4 Synchronizable traffic plans

Let us define the support of a traffic plan χ as the set of points with positive multiplicity.
This set will be denoted by Sχ .
Definition 2.4.1. A traffic plan χ ∈ TPR (µ+ , µ− ) (resp. TP(µ+ , µ− )) is said to be

synchronized (resp. positive synchronized) if it is loop-free, and for all x in the support
of χ there is a time tχ (x) such that χ(ω, tχ (x)) = x for all ω ∈ Ωx (i.e. all fibers which
pass through x have to pass at the same time).
Given two traffic plans χ and χ̃, we say that χ̃ is a reparameterization of χ if, for
almost every ω ∈ Ω, the curve χ̃(ω, ·) is a reparameterization of χ(ω, ·). We will say that
$\tilde\chi$ is an arc length parameterization of $\chi$ if, for almost every $\omega$, $\tilde\chi(\omega, \cdot)$ is an arc length
parameterization of $\chi(\omega, \cdot)$.
Definition 2.4.2. A traffic plan χ ∈ TPR (µ+ , µ− ) is said to be synchronizable (resp.

positive synchronizable) if there is some reparameterization χ̃ ∈ TPR (µ+ , µ− ) (resp. in
TP(µ+ , µ− )) of χ such that χ̃ is synchronized (resp. positive synchronized).
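Since $\theta_\chi^{\alpha-1} \le \tilde\theta_\chi^{\alpha-1}$, with equality if $\chi$ is (positive) synchronized, one can easily deduce
that if a traffic plan is synchronized (resp. positive synchronized), then $E^\alpha_{\mathbb{R}}(\chi) = C^\alpha_{\mathbb{R}}(\chi)$
(resp. $E^\alpha(\chi) = C^\alpha(\chi)$).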
The aim of this section is to prove that E α -optimal traffic plans are synchronizable.
Indeed, optimal traffic plans are such that there is a finite or countable set of points (xi )
and sets Ωi ⊂ Ωxi that form an (almost-)partition of Ω. This fact makes it possible to
synchronize independently each tree going through some xi , and then harmonize globally
these synchronizations thanks to the so-called strict single oriented path property that
we now discuss.
The strict single path definition was introduced in [26]. Following these authors, a
traffic plan is said to be strict single path if all fibers going through x and y have to
coincide between x and y. In other terms there is a single path (or none) between any
two points of the irrigation network. All optimal traffic plans can then be proven to
be strict single path up to the removal of a set of fibers with null measure. For our
synchronization purposes, we need to use a slight refinement of this notion, namely what
we call the strict single oriented path property. To state this property in precise terms,
we first need to introduce some definitions.
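Definition 2.4.3. Let $\chi$ be a loop-free traffic plan, and define $t_x(\omega) := \inf\{t : \chi(\omega, t) = x\}$.
Let $x, y$ in $S_\chi$, and define
\[ \Omega_{\overrightarrow{xy}} := \{\omega \in \Omega^\chi_x \cap \Omega^\chi_y : t_x(\omega) < t_y(\omega)\}, \]
the set of fibers passing through $x$ and then through $y$. We denote by $\chi_{xy}$ the restriction
of $\chi$ to $\bigcup_{\omega \in \Omega_{\overrightarrow{xy}}} \{\omega\} \times [t_x(\omega), t_y(\omega)]$. It is the traffic plan made of all pieces of fibers of $\chi$
joining $x$ to $y$. Denote its support by $\Gamma_{xy} := S_{\chi_{xy}}$.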
Definition 2.4.4. A traffic plan $\chi$ has the strict single oriented path property (and we
say that $\chi$ is strict single oriented) if, for every pair $x, y$ such that $|\Omega_{\overrightarrow{xy}}| > 0$, all fibers
in $\Omega_{\overrightarrow{xy}}$ coincide between $x$ and $y$ with an arc $\Gamma_{xy}$ joining $x$ to $y$, and $\Omega_{\overrightarrow{yx}} = \emptyset$.
By an immediate adaptation of the strict single path property of optimal traffic plans
proven in [26], we have the following result.
Proposition 2.4.5. (Strict single oriented path property) Let α ∈ [0, 1) and χ be
an optimal traffic plan such that E α (χ) < ∞. Then, up to removing a zero measure set
of fibers, χ has the strict single oriented path property.
We can now detail the lemmas used to prove the synchronizability of $E^\alpha$-optima.
Lemma 2.4.6. If $\chi$ is strict single oriented and $\tilde\Omega_x \subset \Omega_x$, then $\chi_x := \chi \llcorner \tilde\Omega_x$ is
synchronizable.
Proof. Let $\tilde\chi_x(\omega, t)$ be an arc length parameterization of $\chi_x(\omega, t)$ such that $\tilde\chi_x(\omega, 0) = x$.
Since $\chi_x(\omega, \cdot)$ is injective, there is only one such parameterization. Let us now prove that
$\tilde\chi_x$ is synchronized. Indeed, let us consider a point $y$ in the image of $\chi$. Since $\chi$ is strict
single oriented, there is only one path that connects $x$ to $y$ on the support of the traffic
plan $\chi_x$. This allows us to define $l_{\chi_x}(y)$ as the distance from $x$ to $y$ (through the support of
$\chi$). Since $\tilde\chi_x(\omega, \cdot)$ is parameterized by its arc length, we notice that for all $\omega \in \Omega_y \cap \tilde\Omega_x$,
$\tilde\chi_x(\omega, l_{\chi_x}(y)) = y$, i.e. $\tilde\chi_x$ is synchronized. □
Lemma 2.4.7. Let χ1 and χ2 be synchronized, connected, arc length parameterized, and
such that χ1 ∪ χ2 is strict single oriented. Then χ1 ∪ χ2 is synchronizable.
Proof. If the supports of $\chi_1$ and $\chi_2$ are disjoint, then $\chi_1 \cup \chi_2$ is already synchronized.
Otherwise, let $x$ be a point in the support of both $\chi_1$ and $\chi_2$. Since $\chi_1$ is synchronized,
there is some $t_{\chi_1}(x)$ such that for all $\omega \in \Omega^{\chi_1}_x$, $\chi_1(\omega, t_{\chi_1}(x)) = x$. We define $t_{\chi_2}(x)$
analogously. Let us prove that $t_{\chi_1}(x) - t_{\chi_2}(x)$ does not depend on the point $x$. Let us
consider two points $x_1$ and $x_2$ in the supports of both $\chi_1$ and $\chi_2$. By connectedness and
the strict single oriented path property, there is a unique path on the support of $\chi_1$
connecting $x_1$ and $x_2$ (the same holds for $\chi_2$). Since $\chi_1$ is arc length parameterized,
$t_{\chi_1}(x_1) - t_{\chi_1}(x_2)$ is exactly the distance between $x_1$ and $x_2$ (or its opposite, depending
on the orientation of the path). Since $\chi_1 \cup \chi_2$ is strict single oriented, the unique path
defined by $\chi_2$ is the same as the one of $\chi_1$, so that we have:
\[ t_{\chi_1}(x_1) - t_{\chi_1}(x_2) = t_{\chi_2}(x_1) - t_{\chi_2}(x_2). \]
Thus, shifting the time parameterization of $\chi_2$ by $t_{\chi_1}(x) - t_{\chi_2}(x)$ defines a traffic plan
$\tilde\chi_2$ such that $\chi_1 \cup \tilde\chi_2$ is synchronized. □
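The time shift constructed in this proof can be sketched on a toy example. The code below is an illustration under assumptions, not the original construction: each synchronized plan is represented only by its passage-time function $x \mapsto t_\chi(x)$, stored as a dictionary on finitely many points of the support, and the function name `synchronising_shift` is hypothetical.

```python
def synchronising_shift(t1, t2, tol=1e-12):
    """Time shift aligning two synchronized plans on their common support.

    t1 and t2 map each point x of the support to its passage time t_chi(x).
    Returns None if the supports are disjoint, and raises ValueError if
    t1(x) - t2(x) is not constant on the common points.
    """
    common = t1.keys() & t2.keys()
    if not common:
        return None                     # disjoint supports: nothing to align
    shifts = [t1[x] - t2[x] for x in common]
    if max(shifts) - min(shifts) > tol:
        raise ValueError("t1 - t2 is not constant on the common support")
    return shifts[0]


# Plan 1 runs along the horizontal axis; plan 2 joins it at (1, 0) after two steps.
t1 = {(0, 0): 0, (1, 0): 1, (2, 0): 2, (3, 0): 3}
t2 = {(1, 2): 0, (1, 1): 1, (1, 0): 2, (2, 0): 3, (3, 0): 4}
shift = synchronising_shift(t1, t2)
t2_shifted = {x: t + shift for x, t in t2.items()}
```

Here the shifted second plan starts at a negative time, so a further global shift would be needed to obtain nonnegative times; this is one way to see why allowing the time domain $\mathbb{R}$ is convenient.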
Definition 2.4.8. We shall say that a traffic plan χ is finitely (resp. countably) decom-
posable if there is a finite (resp. countable) set of points (xi ) and sets Ωi ⊂ Ωxi that form
a partition of Ω (almost everywhere).
Proposition 2.4.9. If χ is a strict single oriented countably decomposable traffic plan,
then it is synchronizable.
Proof. Let $\Omega_i \subset \Omega_{x_i}$ define a countable decomposition of $\chi$, and let us denote
$\chi_i := \chi \llcorner \Omega_i$. Lemma 2.4.6 ensures that all the $\chi_i$ are synchronizable, and we denote by
$\tilde\chi_i$ an equivalent synchronized traffic plan. Since $\chi$ is strict single oriented, $\cup_i \tilde\chi_i$ is strict
single oriented. Thus, by induction, the repeated application of Lemma 2.4.7 allows us to
define a synchronized traffic plan $\tilde\chi$ that is the union of time shifted versions of the $\tilde\chi_i$. Such
a traffic plan $\tilde\chi$ is a time-reparameterization of $\chi$ that is synchronized. □
Proposition 2.4.10. If µ+ is finite atomic, then any optimal traffic plan χ ∈ TPR (µ+ , µ− )
is positive synchronizable.
Proof. Let $(x_i)_{i=1}^n$ be a finite sequence such that $\mu^+ = \sum_{i=1}^n a_i \delta_{x_i}$. The sets defined
by $\Omega_1 := \Omega_{x_1}$ and $\Omega_i := \Omega_{x_i} \setminus (\cup_{j<i} \Omega_j)$ for $i > 1$ yield a partition of $\Omega$, so that $\chi$ is
finitely decomposable. Since $\chi$ is optimal, it is strict single oriented, and Proposition
2.4.9 ensures that $\chi$ is synchronizable. Since $\mu^+$ is atomic, by the construction of the
reparameterization given in Lemma 2.4.7 it is simple to see that, with a suitable time
shifting, $\chi$ is also positive synchronizable. □
Proposition 2.4.11. Any optimal traffic plan χ ∈ TPR is synchronizable.
Proof. Any optimal traffic plan is countably decomposable (see [26, Lemma 3.11]) and
strict single oriented. Thus, by Proposition 2.4.9, it is synchronizable. ¤
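2.5 Equivalence of the dynamical and classical irrigation problems

In the same way as for $E^\alpha$, we define:
\[ C^\alpha(\mu^+, \mu^-) := \inf_{\mathrm{TP}(\mu^+, \mu^-)} C^\alpha(\chi), \qquad C^\alpha_{\mathbb{R}}(\mu^+, \mu^-) := \inf_{\mathrm{TP}_{\mathbb{R}}(\mu^+, \mu^-)} C^\alpha_{\mathbb{R}}(\chi). \]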
Theorem 2.5.1. If µ+ is finite atomic, then, for all α ∈ [0, 1],
E α (µ+ , µ− ) = C α (µ+ , µ− ),
and
ERα (µ+ , µ− ) = CRα (µ+ , µ− ).
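Proof. We remark that, by the definition of $E^\alpha$ and $C^\alpha$, we immediately have the
inequality
\[ E^\alpha(\chi) \le C^\alpha(\chi) \quad \text{for all traffic plans } \chi, \tag{2.5.1} \]
so that
\[ E^\alpha(\mu^+, \mu^-) \le C^\alpha(\mu^+, \mu^-) \quad \forall \alpha \in [0, 1]. \]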
Let χ be a minimizer of E α . Proposition 2.4.10 ensures that there is a reparameteri-
zation χ̃ of χ such that χ̃ is positive synchronized, so that
E α (µ+ , µ− ) = E α (χ) = E α (χ̃) = C α (χ̃).
Thus, E α (µ+ , µ− ) = C α (µ+ , µ− ) for all α ∈ [0, 1]. Finally, Proposition 2.4.11 yields
ERα (µ+ , µ− ) = CRα (µ+ , µ− ) for all α ∈ [0, 1]. ¤
By Proposition 2.4.11, we also have:
Theorem 2.5.2. Let µ+ and µ− be two probability measures. Then
ERα (µ+ , µ− ) = CRα (µ+ , µ− ).
Theorem 2.5.2 states the equivalence of the cost given by the dynamical and the clas-
sical irrigation problem. Concerning minimizers, we can observe as a direct consequence
of Theorem 2.5.2 and (2.5.1) that every CRα -minimizer is an ERα -minimizer. Conversely,
by Proposition 2.4.11, any ERα -minimizer can be reparameterized so that it gives a CRα -
minimizer. The same considerations are true for E α and C α if µ+ is finite atomic thanks
to Proposition 2.4.10. Thus, in both these cases, the extended dynamical and classi-
cal irrigation problems yield exactly the same minimizers (up to reparameterization).
In particular, as a by-product, we obtain the existence of $C^\alpha$-minimizers if $\mu^+$ is finite
atomic, and existence of $C^\alpha_{\mathbb{R}}$-minimizers in general.
As a particular consequence of the fact that every CRα -minimizer is an ERα -minimizer,
we notice that CRα -minimizers inherit all the regularity properties of ERα -minimizers (the
same holds for C α -minimizers, in the case µ+ is finite atomic). Thus we can translate
the regularity results in Section 2.2 in the CRα framework.
Proposition 2.5.3. Let α ∈ [0, 1], µ+ and µ− be finite atomic measures, and χ ∈
TP(µ+ , µ− ) be a C α -minimizer. Then χ is a finite tree made of segments.
Theorem 2.5.4. Let $\alpha \in (1 - \frac{1}{N}, 1)$, $\mu^+$ and $\mu^-$ be probability measures, and $\chi \in \mathrm{TP}_{\mathbb{R}}(\mu^+, \mu^-)$
be a $C^\alpha_{\mathbb{R}}$-minimizer. Assume that the supports of $\mu^+$ and $\mu^-$ are at positive
distance. In any closed ball $B(x, r)$ not meeting the supports of $\mu^+$ and $\mu^-$, the traffic
plan $\chi$ has the structure of a finite graph.
2.6 Stability with respect to the cost

In this section we study the regularity with respect to α of E α (µ+ , µ− ) for fixed µ+ and
µ− . By the equivalence of ERα and CRα (Theorem 2.5.2), and E α and C α when µ+ is finite
atomic (Theorem 2.5.1), one can deduce similar stability results for the dynamical cost.
We start studying the regularity with respect to α of E α (χ) for a fixed traffic plan χ.
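Lemma 2.6.1. Let $\chi$ be a traffic plan. Then $[0, 1] \ni \alpha \mapsto E^\alpha(\chi) \in \mathbb{R}^+ \cup \{+\infty\}$ is
non-increasing. Fix now $\alpha \in [0, 1)$. Then:
(i) If $E^\alpha(\chi) < +\infty$, then $\beta \mapsto E^\beta(\chi)$ is finite and continuous on $[\alpha, 1]$.
(ii) If $E^\alpha(\chi) = +\infty$, then $E^{\alpha_n}(\chi) \to +\infty$ for any decreasing sequence $\alpha_n \searrow \alpha$.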
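Proof. The monotonicity of $\alpha \mapsto E^\alpha(\chi)$ is trivial.
Let $\chi$ be such that $E^\alpha(\chi) < +\infty$ and let $\beta_n \in [\alpha, 1]$ be such that $\beta_n \to \beta$. For all
$(\omega, t) \in \Omega \times \mathbb{R}^+$, we have
\[ \theta_\chi(\chi(\omega, t))^{\beta_n - 1} \to \theta_\chi(\chi(\omega, t))^{\beta - 1}. \]
In addition, as $\theta_\chi(\chi(\omega, t)) \le 1$, we have
\[ 0 \le \theta_\chi(\chi(\omega, t))^{\beta_n - 1} \le \theta_\chi(\chi(\omega, t))^{\alpha - 1}. \]
Thus, since
\[ E^\alpha(\chi) = \int_\Omega \int_{\mathbb{R}^+} \theta_\chi(\chi(\omega, t))^{\alpha-1} |\dot\chi(\omega, t)| \, dt \, d\omega < \infty, \]
the dominated convergence theorem ensures the convergence of $E^{\beta_n}(\chi)$ to $E^\beta(\chi)$.
Let us now consider a traffic plan $\chi$ such that $E^\alpha(\chi) = +\infty$, and let $\alpha_n$ be a decreasing
sequence converging to $\alpha$. Then for all $(\omega, t) \in \Omega \times \mathbb{R}^+$, $\theta_\chi(\chi(\omega, t))^{\alpha_n - 1}$ is increasingly
converging to $\theta_\chi(\chi(\omega, t))^{\alpha - 1}$. Thus, the monotone convergence theorem ensures that
$E^{\alpha_n}(\chi) \to +\infty$. □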
Now we can study the stability of E α (µ+ , µ− ) with respect to α.
Proposition 2.6.2. Let $\mu^+$ and $\mu^-$ be two probability measures. The function
$[0, 1] \ni \alpha \mapsto E^\alpha(\mu^+, \mu^-) \in \mathbb{R}^+ \cup \{+\infty\}$ is non-increasing, right continuous and left lower
semicontinuous.
Proof. For simplicity of notation set $f(\alpha) := E^\alpha(\mu^+, \mu^-)$. Observe that, since $\alpha \mapsto E^\alpha(\chi)$
is non-increasing for all $\chi$, $f$ is non-increasing, being an infimum of non-increasing
functions. Thus, $f$ is left lower semicontinuous, i.e.
\[ \liminf_n f(\alpha_n) \ge f(\alpha) \quad \text{for all } \alpha_n \nearrow \alpha. \]
In what follows, $\chi_\beta$ will always denote an optimal traffic plan for the exponent $\beta$, i.e.
such that $E^\beta(\chi_\beta) = f(\beta)$. Let us consider a decreasing sequence $\alpha_n$ such that $\alpha_n \searrow \alpha$
and a sequence of optimal traffic plans $\chi_{\alpha_n}$.
By Lemma 2.6.1 and the optimality of $\chi_{\alpha_n}$ for $E^{\alpha_n}$ we get
\[ f(\alpha) = E^\alpha(\chi_\alpha) = \lim_n E^{\alpha_n}(\chi_\alpha) \ge \limsup_n E^{\alpha_n}(\chi_{\alpha_n}) \ge \liminf_n E^{\alpha_n}(\chi_{\alpha_n}). \tag{2.6.1} \]
If $\liminf_n E^{\alpha_n}(\chi_{\alpha_n}) = +\infty$, there is nothing to prove. Otherwise, by applying the
reparameterization argument used to prove Corollary 2.2.10, we can assume that $\chi_{\alpha_n} \in \mathrm{TP}_C$
for some $C > 0$. Thus, by Proposition 2.2.8, there is a subsequence $\chi_{\alpha_{n_k}}$ such that
\[ \chi_{\alpha_{n_k}} \to \chi \quad \text{and} \quad \liminf_k E^{\alpha_{n_k}}(\chi_{\alpha_{n_k}}) = \liminf_n E^{\alpha_n}(\chi_{\alpha_n}). \tag{2.6.2} \]
Recalling that $\alpha \mapsto E^\alpha(\chi)$ is non-increasing, and that $E^{\alpha_m}$ is lower semicontinuous for
$m$ fixed, we have
\[ \liminf_k E^{\alpha_{n_k}}(\chi_{\alpha_{n_k}}) \ge \liminf_k E^{\alpha_m}(\chi_{\alpha_{n_k}}) \ge E^{\alpha_m}(\chi) \quad \text{for all } m. \tag{2.6.3} \]
By Lemma 2.6.1, $\lim_m E^{\alpha_m}(\chi) = E^\alpha(\chi)$, so that (2.6.1), (2.6.2) and (2.6.3) yield
\[ f(\alpha) \ge \limsup_n f(\alpha_n) \ge \liminf_n f(\alpha_n) \ge E^\alpha(\chi) \ge f(\alpha). \]
□
Corollary 2.6.3. Let $\alpha_n \in [0, 1]$ be a decreasing sequence converging to $\alpha$, and let $\mu^+$ and
$\mu^-$ be two probability measures. If $\chi_{\alpha_n}$ are optimal traffic plans for $E^{\alpha_n}$ and $\chi_{\alpha_n} \to \chi$,
then $\chi$ is optimal for $E^\alpha$.
Proof. By Proposition 2.6.2, and since $\alpha \mapsto E^\alpha(\chi)$ is non-increasing and $E^{\alpha_m}$ is lower
semicontinuous for fixed $m$, we have
\[ E^\alpha(\mu^+, \mu^-) = \lim_n E^{\alpha_n}(\chi_{\alpha_n}) \ge \liminf_n E^{\alpha_m}(\chi_{\alpha_n}) \ge E^{\alpha_m}(\chi). \]
Since by Lemma 2.6.1 $\lim_m E^{\alpha_m}(\chi) = E^\alpha(\chi)$, $\chi$ is optimal. □

If we now constrain $\alpha$ to be in $(1 - \frac{1}{N}, 1]$, we are able to say more. Indeed, in this
case, Proposition 2.2.14 allows us to approximate $\mu^+$ and $\mu^-$ with atomic measures $\mu_n^+$
and $\mu_n^-$ in such a way that $E^\alpha(\mu^+, \mu^-)$ is a uniform limit (locally in $\alpha$) of $E^\alpha(\mu_n^+, \mu_n^-)$.
Then it is sufficient to prove that $E^\alpha(\mu_n^+, \mu_n^-)$ is continuous for any $n$, in order to have
that $E^\alpha(\mu^+, \mu^-)$ is continuous on $(1 - \frac{1}{N}, 1]$.
Lemma 2.6.4. Let $\mu^+ = \sum_{i=1}^{k_1} a_i \delta_{x_i}$ and $\mu^- = \sum_{i=1}^{k_2} b_i \delta_{y_i}$ be atomic measures such that
$\sum_{i=1}^{k_1} a_i = \sum_{i=1}^{k_2} b_i$ (the irrigating and the irrigated measure have the same mass). Then
$\alpha \mapsto E^\alpha(\mu^+, \mu^-)$ is continuous on $[0, 1]$.
Proof. By Proposition 2.2.17, we know that, for all $\alpha \in [0, 1]$, an optimum for the
irrigation problem can be viewed as a weighted and oriented finite graph $G$. Then, if we
call $\chi_\alpha$ an optimum for $E^\alpha$, we have
\[ E^\alpha(\mu^+, \mu^-) = E^\alpha(\chi_\alpha) = \sum_{i=1}^{n_\alpha} l_i m_i^\alpha, \]
where the $l_i$ and $m_i$ are respectively the lengths and weights of the edges of $G$. Then,
since
\[ \beta \mapsto E^\beta(\chi_\alpha) = \sum_{i=1}^{n_\alpha} l_i m_i^\beta \]
is continuous and finite on $[0, 1]$, we see that $E^\alpha(\mu^+, \mu^-)$ is finite on $[0, 1]$. Moreover
we already know by Proposition 2.6.2 that $E^\alpha(\mu^+, \mu^-)$ is left lower semicontinuous and
right continuous. So, in order to conclude it is sufficient to prove that $E^\alpha(\mu^+, \mu^-)$ is left
upper semicontinuous. Let $(\alpha_n)$ be a sequence such that $\alpha_n \nearrow \alpha$. The continuity of
$\beta \mapsto E^\beta(\chi_\alpha)$ ensures that
\[ \limsup_n E^{\alpha_n}(\mu^+, \mu^-) = \limsup_n E^{\alpha_n}(\chi_{\alpha_n}) \le \limsup_n E^{\alpha_n}(\chi_\alpha) = E^\alpha(\chi_\alpha) = E^\alpha(\mu^+, \mu^-). \]
□
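The finite-graph formula $\sum_i l_i m_i^\alpha$ used in this proof is easy to experiment with. The sketch below is a toy comparison, not taken from the original text: one source of mass 1 at the origin irrigates two sinks of mass $1/2$ at $(h, \pm w)$, either by two straight edges or by a common trunk followed by two branches. The function names `graph_energy`, `v_shape` and `y_shape`, as well as the chosen geometry, are assumptions made for illustration.

```python
from math import hypot


def graph_energy(edges, alpha):
    """E^alpha of a finite weighted graph: sum of length * mass**alpha over the edges."""
    return sum(length * mass ** alpha for length, mass in edges)


def v_shape(h, w):
    """Two straight edges from the source (0, 0) to the sinks (h, +w) and (h, -w)."""
    return [(hypot(h, w), 0.5), (hypot(h, w), 0.5)]


def y_shape(h, w, b):
    """Common trunk from (0, 0) to (b, 0), then two branches to the sinks."""
    return [(b, 1.0), (hypot(h - b, w), 0.5), (hypot(h - b, w), 0.5)]


# beta -> E^beta of a fixed graph, sampled on [0, 1]
energies = [graph_energy(y_shape(4, 1, 3), k / 10) for k in range(11)]
```

For $\alpha = 1$ the two straight edges are cheaper, while for $\alpha = 1/2$ the branched structure wins, and for a fixed graph the energy is a continuous, non-increasing function of the exponent since all masses are at most 1.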
Theorem 2.6.5. Let $\alpha_n \in [1 - \frac{1}{N}, 1]$ be a sequence converging to $\alpha$. If the traffic plans
$\chi_{\alpha_n}$ are optimal for $E^{\alpha_n}$ and $\chi_{\alpha_n} \to \chi$, then $\chi$ is optimal for $E^\alpha$.
Proof. By Proposition 2.2.14, for all $\mu^+$ and $\mu^-$ there are atomic measures $\mu_n^+$ and $\mu_n^-$
such that $E^\alpha(\mu_n^+, \mu_n^-)$ converges uniformly to $E^\alpha(\mu^+, \mu^-)$ on $(1 - \frac{1}{N}, 1]$. Lemma 2.6.4
asserts that $\alpha \mapsto E^\alpha(\mu_n^+, \mu_n^-)$ is continuous, so that $\alpha \mapsto E^\alpha(\mu^+, \mu^-)$ is continuous on
$(1 - \frac{1}{N}, 1]$. By the same kind of argument as in the proof of Corollary 2.6.3, we deduce
that $\chi$ is optimal. If $\alpha = 1 - \frac{1}{N}$, we can suppose that up to a subsequence $\alpha_n \searrow \alpha$,
so that Corollary 2.6.3 ensures that $\chi$ is optimal (possibly trivially optimal in the case
$E^\alpha(\chi) = \infty$). □
Remark 2.6.6. In the case α = 1, the irrigation problem for the cost E α is equivalent to
the classical Monge-Kantorovich problem (see [110, 85, 132]). For that particular case,
Theorem 2.6.5 ensures that the transference plan associated to a sequence of optimal traf-
fic plans χαn , where αn → 1, converges, up to a subsequence, to an optimal transference
plan for the Monge-Kantorovich problem.
Chapter 3
Variational models for the incompressible Euler equations
3.1 Introduction
1
The velocity of an incompressible fluid moving inside a region $D$ is mathematically
described by a time-dependent and divergence-free vector field $u(t, x)$ which is parallel to
the boundary $\partial D$. The Euler equations for incompressible fluids describe the evolution
of such a velocity field $u$ in terms of the pressure field $p$:
\[
\begin{cases}
\partial_t u + (u \cdot \nabla) u = -\nabla p & \text{in } [0, T] \times D, \\
\operatorname{div} u = 0 & \text{in } [0, T] \times D, \\
u \cdot n = 0 & \text{on } [0, T] \times \partial D.
\end{cases}
\tag{3.1.1}
\]
Let us assume that u is smooth, so that it produces a unique flow g, given by
\[
\begin{cases}
\dot g(t,a) = u(t, g(t,a)),\\
g(0,a) = a.
\end{cases}
\]
By the incompressibility condition, we get that at each time t the map g(t, ·) : D → D
is a measure-preserving diffeomorphism of D, that is
g(t, ·)# µD = µD ,
(here and in the sequel $f_\#\mu$ is the push-forward of a measure µ through a map f, and
$\mu_D$ is the volume measure of the manifold D). Writing Euler equations in terms of g, we get
\[
\begin{cases}
\ddot g(t,a) = -\nabla p\,(t, g(t,a)) & (t,a)\in[0,T]\times D,\\
g(0,a) = a & a\in D,\\
g(t,\cdot)\in \mathrm{SDiff}(D) & t\in[0,T].
\end{cases}
\tag{3.1.2}
\]
¹ This chapter is based on two joint works with Luigi Ambrosio [8, 9].
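The measure-preserving property of the flow can be checked numerically on a simple example. The following is a minimal sketch, not taken from the source: the stream function ψ(x, y) = sin(πx) sin(πy)/π on D = [0, 1]², the RK4 integrator and the finite-difference Jacobian are all illustrative assumptions. The field u = (−∂_y ψ, ∂_x ψ) is divergence-free and tangent to ∂D, so the Jacobian determinant of g(t, ·) should stay equal to 1 up to discretization error.

```python
import numpy as np

def velocity(p):
    """Hypothetical divergence-free field u = (-d psi/dy, d psi/dx),
    with psi = sin(pi x) sin(pi y) / pi; it is tangent to the boundary of [0,1]^2."""
    x, y = p[..., 0], p[..., 1]
    ux = -np.sin(np.pi * x) * np.cos(np.pi * y)
    uy = np.cos(np.pi * x) * np.sin(np.pi * y)
    return np.stack([ux, uy], axis=-1)

def flow(a, t_final=1.0, steps=200):
    """RK4 approximation of g(t_final, a), where dg/dt = u(g) and g(0, a) = a."""
    g = np.array(a, dtype=float)
    h = t_final / steps
    for _ in range(steps):
        k1 = velocity(g)
        k2 = velocity(g + 0.5 * h * k1)
        k3 = velocity(g + 0.5 * h * k2)
        k4 = velocity(g + h * k3)
        g = g + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return g

def jacobian_det(a, eps=1e-5, **kw):
    """Central finite-difference estimate of det D_a g(t_final, a)."""
    a = np.asarray(a, dtype=float)
    dx = (flow(a + [eps, 0.0], **kw) - flow(a - [eps, 0.0], **kw)) / (2 * eps)
    dy = (flow(a + [0.0, eps], **kw) - flow(a - [0.0, eps], **kw)) / (2 * eps)
    return dx[0] * dy[1] - dx[1] * dy[0]

if __name__ == "__main__":
    # The determinant should be close to 1 (area is preserved by the flow).
    print(jacobian_det([0.3, 0.6]))
```

The determinant test only probes the local form of $g(t,\cdot)_\#\mu_D = \mu_D$; it says nothing about optimality of the path.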
Viewing the space SDiff(D) of measure-preserving diffeomorphisms of D as an infinite-
dimensional manifold with the metric inherited from the embedding in L2 , and with
tangent space made by the divergence-free vector fields, Arnold interpreted the equation
above, and therefore (3.1.1), as a geodesic equation on SDiff(D) [15]. According to this
interpretation, one can look for solutions of (3.1.2) by minimizing
\[
T\int_0^T\!\!\int_D \frac12\,|\dot g(t,x)|^2\, d\mu_D(x)\, dt
\tag{3.1.3}
\]
among all paths g(t, ·) : [0, T ] → SDiff(D) with g(0, ·) = f and g(T, ·) = h prescribed
(typically, by right invariance, f is taken as the identity map i), and the pressure field
arises as a Lagrange multiplier from the incompressibility constraint (the factor T in front
of the integral is just to make the functional scale invariant in time). We shall denote by
δ(f, h) the Arnold distance in SDiff(D), whose square is defined by the above-mentioned
variational problem in the time interval [0, 1].
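As a worked illustration of the scale invariance in time, consider the following hedged example, which is not taken from the source: D is the unit disc of R², $\mu_D$ is normalized to be a probability measure, and $g(t,x) = R_{\theta t/T}\,x$ is the rigid rotation path reaching the angle θ at time T (all of these are assumptions made for the sake of the computation).

```latex
\[
|\dot g(t,x)| = \frac{\theta}{T}\,|x|,
\qquad
\int_D |x|^2\,d\mu_D(x) = \frac1\pi\int_0^{2\pi}\!\!\int_0^1 r^3\,dr\,d\varphi = \frac12,
\]
\[
T\int_0^T\!\!\int_D \frac12\,|\dot g(t,x)|^2\,d\mu_D(x)\,dt
= T\cdot T\cdot\frac{\theta^2}{2T^2}\cdot\frac12
= \frac{\theta^2}{4}.
\]
```

The value does not depend on T, as expected, and it only gives an upper bound for the squared Arnold distance between i and $R_\theta$, since the rotation is just one admissible path.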
Although in the traditional approach to (3.1.1) the initial velocity is prescribed, while
in the minimization of (3.1.3) it is not, this variational problem has an independent interest
and leads to deep mathematical questions, namely existence of relaxed solutions, gap
phenomena and necessary and sufficient optimality conditions, that are investigated in
this chapter. We also remark that no existence result of distributional solutions of (3.1.1)
is known when d > 2 (the case d = 2 is different, thanks to the vorticity formulation of
(3.1.1)), see [94], [36] for a discussion on this topic and other concepts of weak solutions
to (3.1.1).
On the positive side, Ebin and Marsden proved in [58] that, when D is a smooth
compact manifold with no boundary, the minimization of (3.1.3) leads to a unique solution,
corresponding also to a solution to Euler equations, if f and h are sufficiently close
in a suitable Sobolev norm.
On the negative side, Shnirelman proved in [121], [122] that when d ≥ 3 the infimum
is not attained in general, and that when d = 2 there exists h ∈ SDiff(D) which cannot
be connected to i by a path with finite action. These “negative” results motivate the
study of relaxed versions of Arnold’s problem.
The first relaxed version of Arnold’s minimization problem was introduced by Brenier
in [31]: he considered probability measures η in Ω(D), the space of continuous paths
ω : [0, T ] → D, and minimized the energy
\[
\mathcal{A}_T(\eta) := T\int_{\Omega(D)}\int_0^T \frac12\,|\dot\omega(\tau)|^2\,d\tau\,d\eta(\omega),
\]
with the constraints
(e0 , eT )# η = (i, h)# µD , (et )# η = µD ∀t ∈ [0, T ] (3.1.4)
(here and in the sequel et (ω) := ω(t) are the evaluation maps at time t). According to
Brenier, we shall call these η generalized incompressible flows in [0, T ] between i and
h. Obviously any sufficiently regular path g(t, ·) : [0, 1] → S(D) induces a generalized
incompressible flow η = (Φg )# µD , where Φg : D → Ω(D) is given by Φg (x) = g(·, x), but
the converse is far from being true: the main difference between classical and generalized
flows consists in the fact that fluid paths starting from different points are allowed to
cross at a later time, and fluid paths starting from the same point are allowed to split at
a later time. This approach is by now quite common, see for instance [4] (DiPerna-Lions
theory), [25] (branched optimal transportation), [97], [133].
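The splitting of fluid paths can be made concrete on the circle R/Z. The sketch below is a hypothetical example, not taken from the source: every start point x carries two paths, t ↦ x + t/2 and t ↦ x − t/2 (mod 1), each with half of the mass, on the time interval [0, 1]. Both branches end at x + 1/2 (mod 1), so the final datum is the deterministic map h(x) = x + 1/2, while the flow is non-deterministic in between. The code checks the incompressibility constraint (e_t)_# η = µ_D on a grid of bins and evaluates the action.

```python
import numpy as np

def circle_flow(n=400):
    """Start points, velocities and weights of a discrete generalized flow on R/Z.

    Each start point carries two straight paths with velocities +1/2 and -1/2
    and half of the mass each (illustrative assumption, time interval [0, 1]).
    """
    x = (np.arange(n) + 0.5) / n
    starts = np.concatenate([x, x])
    vels = np.concatenate([np.full(n, 0.5), np.full(n, -0.5)])
    weights = np.full(2 * n, 1.0 / (2 * n))
    return starts, vels, weights

def marginal(starts, vels, weights, t, bins=20):
    """Bin masses of (e_t)_# eta, the law of the particle positions at time t."""
    pos = (starts + t * vels) % 1.0
    mass, _ = np.histogram(pos, bins=bins, range=(0.0, 1.0), weights=weights)
    return mass

def action(vels, weights):
    """Action on [0, 1]: integral of |velocity|^2 / 2 against the flow."""
    return float(np.sum(weights * 0.5 * vels ** 2))

def endpoint_gap(starts, vels):
    """Largest circular distance between the endpoints of the two branches."""
    n = len(starts) // 2
    end = (starts + vels) % 1.0
    d = np.abs(end[:n] - end[n:])
    return float(np.minimum(d, 1.0 - d).max())

if __name__ == "__main__":
    s, v, w = circle_flow()
    for t in (0.0, 0.25, 0.5, 1.0):
        print(t, np.abs(marginal(s, v, w, t) - 1.0 / 20).max())
    print(action(v, w), endpoint_gap(s, v))
```

In this example the action coincides with that of the classical translation g(t, x) = x + t/2; the sketch does not address whether either flow is optimal.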
Brenier’s formulation makes sense not only if h ∈ SDiff(D), but also when h ∈
S(D), where S(D) is the space of measure-preserving maps h : D → D, not necessarily
invertible or smooth. In the case D = [0, 1]d , existence of admissible paths with finite
action connecting i to any h ∈ S(D) was proved in [31], together with the existence
of paths with minimal action. Furthermore, a consistency result was proved: smooth
solutions to (3.1.1) are optimal even in the larger class of the generalized incompressible
flows, provided the pressure field p satisfies
\[
T^2 \sup_{t\in[0,T]}\ \sup_{x\in D}\ |\nabla_x^2 p(t,x)| \le \pi^2,
\tag{3.1.5}
\]
and are the unique ones if the inequality is strict. When η = (Φg )# µD we can recover
g(t, ·) from η using the identity
(e0 , et )# η = (i, g(t, ·))# µD , t ∈ [0, T ].
Brenier found in [31] examples of action-minimizing paths η (for instance in the unit
ball of R2 , between i and −i) where no such representation is possible. The same
examples show that the upper bound (3.1.5) is sharp. Notice however that (e0 , et )# η is
a measure-preserving plan, i.e. a probability measure in D × D having both marginals
equal to µD . Denoting by Γ(D) the space of measure-preserving plans, it is therefore
natural to consider t 7→ (e0 , et )# η as a “minimizing geodesic” between i and h in the
larger space of measure-preserving plans. Then, to be consistent, one has to extend
Brenier’s minimization problem by considering paths connecting γ, η ∈ Γ(D). We define
this extension, which turns out to be useful also to connect this model to the Eulerian-Lagrangian
one in [35], and to obtain necessary and sufficient optimality conditions even
when only “deterministic” data i and h are considered (because, as we said, the path
might be non-deterministic in between). In this presentation of our results, however, to
simplify the matter as much as possible, we shall consider the case of paths η between
i and h ∈ S(D) only.
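To see why the constant π² in (3.1.5) is natural, one can compute the pressure of a rigid rotation. The following is a hedged sketch under our own assumptions (D the unit disc of R², angular velocity ω = π/T so that the flow connects i to −i in time T, and |·| read as the operator norm of the Hessian); it is meant as an illustration, not as a reproduction of the argument in [31].

```latex
\[
u(x) = \omega\,x^{\perp},\quad x^{\perp} := (-x_2, x_1)
\qquad\Longrightarrow\qquad
\partial_t u = 0,\quad (u\cdot\nabla)u = -\omega^2 x,
\]
\[
\nabla p = \omega^2 x,
\qquad
p(x) = \frac{\omega^2}{2}\,|x|^2,
\qquad
\nabla_x^2 p = \omega^2\,\mathrm{Id},
\]
\[
T^2 \sup_{t,x} |\nabla_x^2 p| = T^2\omega^2 = \pi^2
\qquad\text{for } \omega = \frac{\pi}{T}.
\]
```

So this rotation satisfies (3.1.5) with equality, which is the borderline case in which the uniqueness part of the consistency result no longer applies.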
In Section 3.5 we study the relation between the relaxation $\delta_*$ of the Arnold distance,
defined by
\[
\delta_*(h) := \inf\Big\{ \liminf_{n\to\infty} \delta(i,h_n) :\ h_n \in \mathrm{SDiff}(D),\ \int_D |h_n - h|^2\,d\mu_D \to 0 \Big\},
\]
and the distance $\bar\delta(i,h)$ arising from the minimization of the Lagrangian model. It is not
hard to show that $\bar\delta(i,h) \le \delta_*(h)$, and a natural question is whether equality holds, or a
gap phenomenon occurs. In the case D = [0, 1]d with d > 2, an important step forward
was obtained by Shnirelman in [122], who proved that equality holds when h ∈ SDiff(D);
Shnirelman’s construction provides an approximation (with convergence of the action)
of generalized flows connecting i to h by smooth flows still connecting i to h. The main
result of this section is the proof that no gap phenomenon occurs, still in the case D =
[0, 1]d with d > 2, even when non-deterministic final data (i.e. measure-preserving plans)
are considered. The proof of this fact is based on an auxiliary approximation result,
Theorem 3.5.3, valid in any number of dimensions, which we believe to be of independent
interest: it allows one to approximate, with convergence of the action, any generalized flow
η in $[0,1]^d$ by $W^{1,2}$ flows (in time) induced by measure-preserving maps g(t, ·). This
fact shows that the “negative” result of Shnirelman on the existence in dimension 2 of
non-attainable diffeomorphisms is due to the regularity assumption on the path, and it
is false if one allows for paths in the larger space S(D). The proof of Theorem 3.5.3 uses
some key ideas from [122] (in particular the combination of law of large numbers and
smoothing of discrete families of trajectories), and some ideas coming from the theory
of optimal transportation.
Minimizing generalized paths η are not unique in general, as shown in [31]; however,
Brenier proved in [33] that the gradient of the pressure field p, identified by the
distributional relation
\[
\nabla p(t,x) = -\partial_t v_t(x) - \operatorname{div}\big((v\otimes v)_t(x)\big),
\tag{3.1.6}
\]
is indeed unique. Here $v_t(x)$ is the “effective velocity”, defined by $(e_t)_\#(\dot\omega(t)\,\eta) = v_t\,\mu_D$,
and $(v\otimes v)_t$ is the quadratic effective velocity, defined by $(e_t)_\#(\dot\omega(t)\otimes\dot\omega(t)\,\eta) = (v\otimes v)_t\,\mu_D$.
The proof of this fact is based on the so-called dual least action principle: if η is optimal,
we have
\[
\mathcal{A}_T(\nu) \ge \mathcal{A}_T(\eta) + \langle p, \rho^{\nu} - 1\rangle
\tag{3.1.7}
\]
for any measure ν in Ω(D) such that $(e_0,e_T)_\#\nu = (i,h)_\#\mu_D$ and $\|\rho^{\nu}-1\|_{C^1} \le 1/2$. Here
$\rho^{\nu}$ is the (absolutely continuous) density produced by the flow ν, defined by $\rho^{\nu}(t,\cdot)\,\mu_D = (e_t)_\#\nu$.
In this way, the incompressibility constraint can be slightly relaxed and one can
work with the augmented functional (still minimized by η)
\[
\nu \mapsto \mathcal{A}_T(\nu) - \langle p, \rho^{\nu} - 1\rangle,
\]
whose first variation leads to (3.1.6).

In Theorem 3.6.2, still using the key Proposition 2.1 from [33], we provide a simpler
proof and a new interpretation of the dual least action principle.
A few years later, Brenier introduced in [35] a new relaxed version of Arnold’s problem
of a mixed Eulerian-Lagrangian nature: the idea is to add to the Eulerian variable
x a Lagrangian one a representing, at least when f = i, the initial position of the
particle; then, one minimizes a functional of the Eulerian variables (density and velocity),
depending also on a. Brenier’s motivation for looking at the new model was that this
formalism allows one to show much stronger regularity results for the pressure field, namely
that $\partial_{x_i} p$ are locally finite measures in (0, T) × D. In Section 3.3.3 we describe this
new model in detail and, in Section 3.4, we show that the two models are basically equivalent.
We shall use this result to transfer the regularity information on the pressure
field up to the Lagrangian model, thus obtaining the validity of (3.1.7) for a much larger
class of generalized flows ν, that we call flows with bounded compression. The proof of
the equivalence follows by a general principle (Theorem 3.2.4, borrowed from [11]) that
allows one to move from an Eulerian to a Lagrangian description, lifting solutions to the
continuity equation to measures in the space of continuous maps.
In Section 3.6 we look for necessary and sufficient optimality conditions for the
geodesic problem. These conditions require that the pressure field p is a function and not
only a distribution: this technical result is achieved in the last section, where, by carefully
analyzing and improving Brenier’s difference-quotient argument, we show that
$\partial_{x_i} p \in L^2_{loc}\big((0,T); \mathcal{M}_{loc}(D)\big)$ (this implies, by Sobolev embedding, $p \in L^2_{loc}\big((0,T); L^{d/(d-1)}_{loc}(D)\big)$).
In this final section, although we do not see a serious obstruction to the extension of
our results to a more general framework, we consider the case of the flat torus $\mathbb{T}^d$ only,
and we shall denote by $\mu_{\mathbb T}$ the canonical measure on the flat torus. We observe that
in this case $p \in L^2_{loc}\big((0,T); L^{d/(d-1)}(\mathbb{T}^d)\big)$ and so, taking into account that the pressure
field in (3.1.7) is uniquely determined up to additive time-dependent constants, we may
assume that $\int_{\mathbb{T}^d} p(t,\cdot)\,d\mu_{\mathbb T} = 0$ for almost all t ∈ (0, T).
The first elementary remark is that any integrable function q in $(0,T)\times\mathbb{T}^d$ with $\int_{\mathbb{T}^d} q(t,\cdot)\,d\mu_{\mathbb T} = 0$
for almost all t ∈ (0, T) provides us with a null-Lagrangian for the geodesic problem,
as the incompressibility constraint gives
\[
\int_{\Omega(\mathbb{T}^d)}\int_0^T q(t,\omega(t))\,dt\,d\nu(\omega) = \int_0^T\!\!\int_{\mathbb{T}^d} q(t,x)\,d\mu_{\mathbb T}(x)\,dt = 0
\]
for any generalized incompressible flow ν. Taking also the constraint $(e_0,e_T)_\#\nu = (i,h)_\#\mu_{\mathbb T}$
into account, we get
\[
\mathcal{A}_T(\nu) = T\int_{\Omega(\mathbb{T}^d)}\Big(\int_0^T \frac12|\dot\omega(t)|^2 - q(t,\omega)\,dt\Big)\,d\nu(\omega) \ge \int_{\mathbb{T}^d} c_q^T(x,h(x))\,d\mu_{\mathbb T}(x),
\]
where $c_q^T(x,y)$ is the minimal cost associated with the Lagrangian $T\int_0^T \frac12|\dot\omega(t)|^2 - q(t,\omega)\,dt$.
Since this lower bound depends only on h, we obtain that any η satisfying (3.1.4) and
concentrated on $c_q$-minimal paths, for some $q \in L^1$, is optimal, and $\bar\delta^2(i,h) = \int c_q^T(i,h)\,d\mu_{\mathbb T}$.
This is basically the argument used by Brenier in [31] to show the minimality of smooth
solutions to (3.1.1), under assumption (3.1.5): indeed, this condition guarantees that
solutions of ω̈(t) = −∇p(t, ω) (i.e. stationary paths for the Lagrangian, with q = p) are
also minimal.
We are able to show that basically this condition is necessary and sufficient for
optimality if the pressure field is globally integrable (see Theorem 3.6.12). However,
since no global in time regularity result for the pressure field is presently known, we have
also been looking for necessary and sufficient optimality conditions that don’t require the
global integrability of the pressure field. Using the regularity $p \in L^1_{loc}((0,T); L^r(D))$ for
some r > 1, guaranteed in the case $D = \mathbb{T}^d$ with r = d/(d − 1) by the results contained
in the last section, we show in Theorem 3.6.8 that any optimal η is concentrated on
locally minimizing paths for the Lagrangian
\[
L_p(\omega) := \int \frac12\,|\dot\omega(t)|^2 - p(t,\omega)\,dt.
\tag{3.1.8}
\]
Since we are going to integrate p along curves, this statement is not invariant un-
der modifications of p in negligible sets, and the choice of a specific representative
p̄(t, x) := lim inf ε↓0 p(t, ·) ∗ φε (x) in the Lebesgue equivalence class is needed. Moreover,
the necessity of pointwise uniform estimates on pε requires the integrability of M p(t, x),
the maximal function of p(t, ·) at x (see (3.6.11)).
In addition, we identify a second necessary (and more hidden) optimality condition.
In order to state it, let us consider an interval [s, t] ⊂ (0, T) and the cost function
\[
c_p^{s,t}(x,y) := \inf\Big\{ \int_s^t \frac12|\dot\omega(\tau)|^2 - p(\tau,\omega)\,d\tau :\ \omega(s)=x,\ \omega(t)=y,\ Mp(\tau,\omega)\in L^1(s,t) \Big\}
\tag{3.1.9}
\]
(the assumption $Mp(\tau,\omega) \in L^1(s,t)$ is forced by technical reasons). Recall that, according
to the theory of optimal transportation, a probability measure λ in $\mathbb{T}^d\times\mathbb{T}^d$ is said
to be c-optimal if
\[
\int_{\mathbb{T}^d\times\mathbb{T}^d} c(x,y)\,d\lambda' \ge \int_{\mathbb{T}^d\times\mathbb{T}^d} c(x,y)\,d\lambda
\]
for any probability measure λ′ having the same marginals $\mu_1$, $\mu_2$ of λ. We shall also
denote by $W_c(\mu_1,\mu_2)$ the minimal value, i.e. $\int_{\mathbb{T}^d\times\mathbb{T}^d} c\,d\lambda$, with λ c-optimal. Now, let η be an
optimal generalized incompressible flow between i and h; according to the disintegration
theorem, we can represent $\eta = \int \eta_a\,d\mu_D(a)$, with $\eta_a$ concentrated on curves starting
at a (and ending, since our final condition is deterministic, at h(a)), and consider the
plans $\lambda_a^{s,t} = (e_s,e_t)_\#\eta_a$. We show that
\[
\text{for all } [s,t]\subset(0,T),\quad \lambda_a^{s,t} \text{ is } c_p^{s,t}\text{-optimal for } \mu_{\mathbb T}\text{-a.e. } a\in\mathbb{T}^d.
\tag{3.1.10}
\]
Roughly speaking, this condition tells us that one has not only to move mass from x to y
achieving $c_p^{s,t}$, but also to optimize the distribution of mass between time s and time t. In
the “deterministic” case when either $(e_0,e_s)_\#\eta$ or $(e_0,e_t)_\#\eta$ is induced by a transport
map g, the plan $\lambda_a^{s,t}$ has $\delta_{g(a)}$ either as first or as second marginal, and therefore it is
uniquely determined by its marginals (it is indeed the product of them). This is the
reason why condition (3.1.10) does not show up in the deterministic case.
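The notion of c-optimality can be illustrated on a finite toy problem. The sketch below is our own illustration and is not part of the source: both marginals are uniform on n atoms of the real line and the cost is quadratic. It searches only among plans induced by permutations; for equal-weight atoms this is enough, since by the Birkhoff-von Neumann theorem the minimum over all plans with these marginals is attained at a permutation. The helper names `plan_cost` and `optimal_permutation` are hypothetical.

```python
from itertools import permutations

def plan_cost(xs, ys, perm, cost):
    """Cost of the plan sending atom xs[i] (mass 1/n) to ys[perm[i]]."""
    n = len(xs)
    return sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n

def optimal_permutation(xs, ys, cost):
    """Brute-force c-optimal plan among permutation plans (small n only)."""
    n = len(xs)
    return min(permutations(range(n)),
               key=lambda p: plan_cost(xs, ys, p, cost))

if __name__ == "__main__":
    xs = [0.0, 0.4, 1.0]
    ys = [0.1, 0.5, 0.9]
    quadratic = lambda x, y: (x - y) ** 2
    best = optimal_permutation(xs, ys, quadratic)
    # For the quadratic cost on the line the monotone matching is optimal.
    print(best, plan_cost(xs, ys, best, quadratic))
```

The brute-force search has factorial cost and is only meant to make the definition tangible; it plays no role in the arguments of this chapter.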
Finally, we show in Theorem 3.6.12 that the two conditions are also sufficient, even
on general manifolds D: if, for some r > 1 and q ∈ L1loc ((0, T ); Lr (D)), a generalized
incompressible flow η concentrated on locally minimizing curves for the Lagrangian Lq
satisfies
\[
\text{for all } [s,t]\subset(0,T),\quad \lambda_a^{s,t} \text{ is } c_q^{s,t}\text{-optimal for } \mu_D\text{-a.e. } a\in D,
\]
then η is optimal in [0, T ], and q is the pressure field.

These results show a somewhat unexpected connection between the variational theory
of incompressible flows and the theory developed by Bernard-Buffoni [20] of measures
in the space of action-minimizing curves; in this framework one can fit Mather’s theory
as well as optimal transportation problems on manifolds, with a geometric cost. In our
case the only difference is that the Lagrangian is possibly nonsmooth (but hopefully
not so bad), and not given a priori, but generated by the problem itself. Our approach
also yields (see Corollary 3.6.13) a new variational characterization of the pressure field,
as a maximizer of the family of functionals (for [s, t] ⊂ (0, T))
\[
q \mapsto \int_{\mathbb{T}^d} W_{c_q^{s,t}}(\eta_a^s, \gamma_a^t)\,d\mu_{\mathbb T}(a),
\qquad Mq \in L^1\big([s,t]\times\mathbb{T}^d\big),
\]
where $\eta_a^s$, $\gamma_a^t$ are the marginals of $\lambda_a^{s,t}$.
3.2 Notation and preliminary results

Measure-theoretic notation. We start by recalling some basic facts in Measure The-
ory. Let X, Y be Polish spaces, i.e. topological spaces whose topology is induced by
a complete and separable distance. We endow a Polish space X with the correspond-
ing Borel σ-algebra and denote by P(X) (resp. M+ (X), M (X)) the family of Borel
probability (resp. nonnegative and finite, real and with finite total variation) measures
in X. For A ⊂ X and µ ∈ M(X) the restriction $\mu\llcorner A$ of µ to A is defined by
$\mu\llcorner A(B) := \mu(A\cap B)$. We will denote by i : X → X the identity map.
Definition 3.2.1 (Push-forward). Let µ ∈ M (X) and let f : X → Y be a Borel map.

The push-forward f# µ is the measure in Y defined by f# µ(B) = µ(f −1 (B)) for any Borel
set B ⊂ Y . The definition obviously extends, componentwise, to vector-valued measures.
It is easy to check that f# µ has finite total variation as well, and that |f# µ| ≤ f# |µ|.
An elementary approximation by simple functions shows the change of variable formula
\[
\int_Y g\,d f_\#\mu = \int_X g\circ f\,d\mu
\tag{3.2.1}
\]
for any bounded Borel function (or even either nonnegative or nonpositive, and R-valued,
in the case µ ∈ M+ (X)) g : Y → R.
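On a finite space the push-forward and the change of variable formula (3.2.1) reduce to elementary bookkeeping. The following minimal sketch, with hypothetical helper names `push_forward` and `integrate`, represents a measure as a dictionary from points to masses and uses exact rational arithmetic so that both sides of (3.2.1) can be compared without rounding.

```python
from fractions import Fraction
from collections import defaultdict

def push_forward(mu, f):
    """Push-forward f_# mu of a finite measure mu = {point: mass}."""
    nu = defaultdict(Fraction)
    for x, m in mu.items():
        nu[f(x)] += m  # the mass of x is moved to f(x)
    return dict(nu)

def integrate(g, mu):
    """Integral of g against the finite measure mu."""
    return sum(g(x) * m for x, m in mu.items())

if __name__ == "__main__":
    mu = {k: Fraction(1, 6) for k in range(6)}  # uniform measure on {0,...,5}
    f = lambda x: x % 3
    g = lambda y: y * y
    nu = push_forward(mu, f)
    # Both sides of the change of variable formula (3.2.1).
    print(integrate(g, nu), integrate(lambda x: g(f(x)), mu))
```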
Definition 3.2.2 (Narrow convergence and compactness). Narrow (sequential)

convergence in P(X) is the convergence induced by the duality with Cb (X), the space of
continuous and bounded functions in X. By Prokhorov theorem, a family F in P(X)
is sequentially relatively compact with respect to the narrow convergence if and only if it
is tight, i.e. for any ε > 0 there exists a compact set K ⊂ X such that µ(X \ K) < ε for
any µ ∈ F .
In this chapter we use only the “easy” implication in Prokhorov theorem, namely
that any tight family is sequentially relatively compact. It is immediate to check that a
sufficient condition for tightness of a family F of probability measures is the existence
of a coercive functional Ψ : X → [0, +∞] (i.e. a functional such that its sublevel sets
{Ψ ≤ t}, t ∈ R+ , are relatively compact in X) such that
\[
\int_X \Psi(x)\,d\mu(x) \le 1 \qquad \forall \mu\in F.
\]
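The verification of this sufficient condition is a one-line application of the Markov (Chebyshev) inequality; we spell it out as a short worked step, with the compact sets $K_t$ introduced here only for the purpose of the computation.

```latex
\[
K_t := \overline{\{\Psi \le t\}} \ \text{(compact, by coercivity)},
\qquad
\mu(X\setminus K_t) \le \mu(\{\Psi > t\}) \le \frac1t\int_X \Psi\,d\mu \le \frac1t
\qquad \forall \mu\in F .
\]
```

Given ε > 0 it then suffices to choose t > 1/ε to obtain $\mu(X\setminus K_t) < \varepsilon$ uniformly in µ ∈ F.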
Lemma 3.2.3 ([14], Lemma 2.4). Let µ ∈ P(X) and $u \in L^2(X;\mathbb{R}^m)$. Then, for any
Borel map f : X → Y, $f_\#(u\mu) \ll f_\#\mu$ and its density v with respect to $f_\#\mu$ satisfies
\[
\int_Y |v|^2\,d f_\#\mu \le \int_X |u|^2\,d\mu .
\]
Furthermore, equality holds if and only if u = v ∘ f µ-a.e. in X.
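In the finite setting the density v of Lemma 3.2.3 is simply the µ-weighted average of u over each fiber of f, and the inequality is Jensen's inequality on each fiber. The sketch below is our own illustration with scalar-valued u (m = 1) and a hypothetical helper `pushed_density`; it computes v and both sides of the inequality exactly.

```python
from fractions import Fraction
from collections import defaultdict

def pushed_density(mu, u, f):
    """Return nu = f_# mu and the density v of f_#(u mu) with respect to nu."""
    mass = defaultdict(Fraction)
    moment = defaultdict(Fraction)
    for x, m in mu.items():
        mass[f(x)] += m            # mass of the fiber over f(x)
        moment[f(x)] += u[x] * m   # integral of u over the same fiber
    v = {y: moment[y] / mass[y] for y in mass if mass[y] != 0}
    return dict(mass), v

if __name__ == "__main__":
    mu = {x: Fraction(1, 4) for x in range(4)}
    u = {0: 1, 1: 3, 2: 2, 3: 2}
    f = lambda x: x // 2
    nu, v = pushed_density(mu, u, f)
    lhs = sum(v[y] ** 2 * nu[y] for y in nu)   # integral of |v|^2 d f_# mu
    rhs = sum(u[x] ** 2 * mu[x] for x in mu)   # integral of |u|^2 d mu
    print(lhs, rhs)
```

Replacing u by the fiberwise constant function v ∘ f makes the two sides coincide, in agreement with the equality case of the lemma.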

Given $\mu \in M_+(X\times Y)$, we shall denote by $\mu_x\otimes\lambda$ its disintegration via the projection
map π(x, y) = x: here $\lambda = \pi_\#\mu \in M_+(X)$, and $x \mapsto \mu_x \in P(Y)$ is a Borel map (i.e.
$x \mapsto \mu_x(A)$ is Borel for all Borel sets A ⊂ Y) characterized, up to λ-negligible sets, by
\[
\int_{X\times Y} f(x,y)\,d\mu(x,y) = \int_X\Big(\int_Y f(x,y)\,d\mu_x(y)\Big)\,d\lambda(x)
\tag{3.2.2}
\]
for all nonnegative Borel maps f. Conversely, any λ and any Borel map $x \mapsto \mu_x \in P(Y)$
induce a probability measure µ in X × Y via (3.2.2).
Function spaces. We shall denote by Ω(D) the space C([0, T ]; D), and by ω : [0, T ] →
D its typical element. The evaluation maps at time t, ω 7→ ω(t), will be denoted by et .
If D is a smooth, compact Riemannian manifold without boundary (typically the
d-dimensional flat torus $\mathbb{T}^d$), we shall denote by $\mu_D$ its volume measure, and by $d_D$ its
Riemannian distance, normalizing the Riemannian metric so that $\mu_D$ is a probability
measure. Although it does not fit exactly in this framework, we occasionally consider
also the case D = [0, 1]d , because many results have already been obtained in this
particular case.
We shall often consider measures $\eta \in M_+(\Omega(D))$ such that $(e_t)_\#\eta \ll \mu_D$; in this
case we shall denote by $\rho^{\eta} : [0,T]\times D \to [0,+\infty]$ the density, characterized by
\[
\rho^{\eta}(t,\cdot)\,\mu_D := (e_t)_\#\eta, \qquad t\in[0,T].
\]
We denote by SDiff(D) the measure-preserving diffeomorphisms of D, and by S(D)

the measure-preserving maps in D:
S(D) := {g : D → D : g# µD = µD } . (3.2.3)
We also set
S i (D) := {g ∈ S(D) : g is µD -essentially injective} . (3.2.4)
For any g ∈ S i (D) the inverse g −1 is well defined up to µD -negligible sets, µD -measurable,
and g −1 ◦ g = i = g ◦ g −1 µD -a.e. in D. In particular, if g ∈ S i (D), g −1 ∈ S i (D).
We shall also denote by Γ(D) the family of measure-preserving plans, i.e. the prob-
ability measures in D × D whose first and second marginal are µD :
Γ(D) := {γ ∈ P(D × D) : (π1 )# γ = µD , (π2 )# γ = µD } (3.2.5)
(here π1 , π2 are the canonical coordinate projections).

Recall that SDiff(D) ⊂ S i (D) ⊂ S(D) and that any element g ∈ S(D) canonically
induces a measure preserving plan γg , defined by
γg := (i × g)# µD .
Furthermore, this correspondence is continuous, as long as convergence in $L^2(\mu)$ of the
maps g and narrow convergence of the plans are considered (see for instance Lemma 2.3
in [14]). Moreover
\[
\overline{\{\gamma_g : g\in S^i(D)\}}^{\,\mathrm{narrow}} = \Gamma(D),
\tag{3.2.6}
\]
\[
\overline{\mathrm{SDiff}(D)}^{\,L^2(\mu_D)} = S(D) \qquad \text{if } D=[0,1]^d \text{, with } d\ge 2
\tag{3.2.7}
\]
(the first result is standard, see for example the explicit construction in [37, Theorem
1.4 (i)] in the case $D = [0,1]^d$, while the second one is proved in [37, Corollary 1.5]).
The continuity equation. In the sequel we shall often consider weak solutions µt ∈
P(D) of the continuity equation
∂t µt + div(v t µt ) = 0, (3.2.8)
where $t \mapsto \mu_t$ is narrowly continuous (this is not restrictive, see for instance Lemma 8.1.2
of [11]) and $v_t(x)$ is a suitable velocity field with $\|v_t\|_{L^2(\mu_t)} \in L^1(0,T)$ (formally, $v_t$ is a
section of the tangent bundle and $|v_t|$ is computed according to the Riemannian metric).
The equation is understood in a weak (distributional) sense, by requiring that
\[
\frac{d}{dt}\int_D \phi(t,x)\,d\mu_t(x) = \int_D \partial_t\phi + \langle\nabla\phi, v_t\rangle\,d\mu_t
\qquad \text{in } \mathcal{D}'(0,T)
\]
for any $\phi \in C^1((0,T)\times D)$ with bounded first derivatives and support contained in
J × D, with $J \Subset (0,T)$. In the case when $D \subset \mathbb{R}^d$ is compact, we shall consider
functions $\phi \in C^1\big((0,T)\times\mathbb{R}^d\big)$, again with support contained in $J\times\mathbb{R}^d$, with $J \Subset (0,T)$.
The following general principle allows one to lift solutions of the continuity equation to
measures in the space of continuous paths.
Theorem 3.2.4 (Superposition principle). Assume that either D is a compact subset
of $\mathbb{R}^d$, or D is a smooth compact Riemannian manifold without boundary, and let
$\mu_t : [0,T] \to P(D)$ be a narrowly continuous solution of the continuity equation (3.2.8) for
a suitable velocity field $v(t,x) = v_t(x)$ satisfying $\|v_t\|^2_{L^2(\mu_t)} \in L^1(0,T)$. Then there exists
η ∈ P(Ω(D)) such that
(i) $\mu_t = (e_t)_\#\eta$ for all t ∈ [0, T];
(ii) the following energy inequality holds:
\[
\int_{\Omega(D)}\int_0^T |\dot\omega(t)|^2\,dt\,d\eta(\omega) \le \int_0^T\!\!\int_D |v_t|^2\,d\mu_t\,dt .
\]
Proof. In the case when $D = \mathbb{R}^d$ (and therefore also when $D \subset \mathbb{R}^d$ is closed) this result
is proved in Theorem 8.2.1 of [11] (see also [16], [123], [21] for related results). In the case
when D is a smooth, compact Riemannian manifold we recover the same result thanks
to an isometric embedding in $\mathbb{R}^m$, for m large enough. □
3.3 Variational models for generalized geodesics

3.3.1 Arnold’s least action problem
Let f, h ∈ SDiff(D) be given. Following Arnold [15], we define $\delta^2(f,h)$ by minimizing
the action
\[
\mathcal{A}_T(g) := T\int_0^T\!\!\int_D \frac12\,|\dot g(t,x)|^2\,d\mu_D(x)\,dt,
\]
among all smooth curves
\[
[0,T] \ni t \mapsto g(t,\cdot) \in \mathrm{SDiff}(D)
\]
connecting f to h. By time rescaling, δ is independent of T . Since right composition
with a given element g ∈ SDiff(D) does not change the action (as it amounts just to a
relabelling of the initial position with g), the distance δ is right invariant, so it will be
often useful to assume, in the minimization problem, that f is the identity map.
The action $\mathcal{A}_T$ can also be computed in terms of the velocity field u, defined by
$u(t,x) = \dot g(t,y)|_{y=g^{-1}(t,x)}$, as
\[
\mathcal{A}_T(u) = T\int_0^T\!\!\int_D \frac12\,|u(t,x)|^2\,d\mu_D(x)\,dt .
\]
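The equality of the two expressions of the action is a direct consequence of the measure-preserving property; as a short worked step, using $\dot g(t,x) = u(t,g(t,x))$, the change of variable formula (3.2.1) and $g(t,\cdot)_\#\mu_D = \mu_D$, we have for every t:

```latex
\[
\int_D |\dot g(t,x)|^2\,d\mu_D(x)
= \int_D |u(t,\cdot)|^2\circ g(t,\cdot)\,d\mu_D
= \int_D |u(t,y)|^2\,d\big(g(t,\cdot)_\#\mu_D\big)(y)
= \int_D |u(t,y)|^2\,d\mu_D(y).
\]
```

Multiplying by T/2 and integrating in time gives $\mathcal{A}_T(g) = \mathcal{A}_T(u)$.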
As we mentioned in the introduction, connections between this minimization problem
and (3.1.1) were achieved first by Ebin and Marsden, and then by Brenier: in [31], [35]
he proved that if (u, p) is a smooth solution of the Euler equation in [0, T] × D, with
$D = [0,1]^d$, and the inequality in (3.1.5) is strict, then the flow g(t, x) of u is the unique
solution of Arnold’s minimization problem with f = i, h = g(T, ·).
By integrating the inequality $d_D^2(h(x),f(x)) \le \int_0^1 |\dot g(t,x)|^2\,dt$ one immediately obtains
that $\|h-f\|_{L^2(D)} \le \sqrt2\,\delta(f,h)$; Shnirelman proved in [122] that in the case
$D = [0,1]^d$ with d ≥ 3 the Arnold distance is topologically equivalent to the $L^2$ distance:
namely, there exist C > 0, α > 0 such that
\[
\delta(f,g) \le C\,\|f-g\|^{\alpha}_{L^2(D)} \qquad \forall f,g\in \mathrm{SDiff}(D).
\tag{3.3.1}
\]
Shnirelman also proved in [121] that when d ≥ 3 the infimum is not attained in general
and that, when d = 2, δ(i, h) need not be finite (i.e., there exists h ∈ SDiff(D) which
cannot be connected to i by a path with finite action).
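For completeness, the elementary bound $\|h-f\|_{L^2(D)} \le \sqrt2\,\delta(f,h)$ can be spelled out as follows (a worked step with T = 1, for any smooth admissible path g connecting f to h):

```latex
\[
d_D^2\big(h(x),f(x)\big)
\le \Big(\int_0^1 |\dot g(t,x)|\,dt\Big)^{2}
\le \int_0^1 |\dot g(t,x)|^2\,dt
\qquad\text{(Cauchy--Schwarz)},
\]
\[
\|h-f\|_{L^2(D)}^2
\le \int_D\int_0^1 |\dot g(t,x)|^2\,dt\,d\mu_D(x)
= 2\,\mathcal{A}_1(g).
\]
```

Taking the infimum over all admissible g gives $\|h-f\|_{L^2(D)}^2 \le 2\,\delta^2(f,h)$.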
3.3.2 Brenier’s Lagrangian model and its extensions

In [31], Brenier proposed a relaxed version of the Arnold geodesic problem, and here we
present more general versions of Brenier’s relaxed problem, allowing first for final data
in Γ(D), and then for initial and final data in Γ(D).
Let γ ∈ Γ(D) be given; the class of admissible paths, called by Brenier generalized
incompressible flows, is made by the probability measures η on Ω(D) such that
(et )# η = µD ∀t ∈ [0, T ].
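Then the action of an admissible η is defined as
\[
\mathcal{A}_T(\eta) := \int_{\Omega(D)} \mathcal{A}_T(\omega)\,d\eta(\omega),
\]
where
\[
\mathcal{A}_T(\omega) :=
\begin{cases}
T\displaystyle\int_0^T \frac12\,|\dot\omega(t)|^2\,dt & \text{if } \omega \text{ is absolutely continuous in } [0,T],\\[2mm]
+\infty & \text{otherwise,}
\end{cases}
\tag{3.3.2}
\]
and $\bar\delta^2(\gamma_i,\gamma)$ is defined by minimizing $\mathcal{A}_T(\eta)$ among all generalized incompressible flows
η connecting $\gamma_i$ to γ, i.e. those satisfying
\[
(e_0,e_T)_\#\eta = \gamma.
\tag{3.3.3}
\]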
Notice that it is not clear, in this purely Lagrangian formulation, how the relaxed
distance δ(η, γ) between two measure preserving plans might be defined, not even when η
and γ are induced by maps g, h. Only when g ∈ S i (D) we might use the right invariance
and define δ(γg , γh ) := δ(γi , γh◦g−1 ).
These remarks led us to the following more general problem: let us denote
Ω̃(D) := Ω(D) × D,
whose typical element will be denoted by (ω, a), and let us denote by πD : Ω̃(D) → D
the canonical projection. We consider probability measures η in Ω̃(D) having µD as
second marginal, i.e. (πD )# η = µD ; they can be canonically represented as η a ⊗ µD ,
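where $\boldsymbol\eta_a \in P(\Omega(D))$ (here the boldface $\boldsymbol\eta$ denotes the flow, while η denotes a plan). The incompressibility constraint now becomes
\[
\int_D (e_t)_\#\boldsymbol\eta_a\,d\mu_D(a) = \mu_D \qquad \forall t\in[0,T],
\tag{3.3.4}
\]
or equivalently $(e_t)_\#\boldsymbol\eta = \mu_D$ for all t, if we consider $e_t$ as a map defined on $\tilde\Omega(D)$. Given
initial and final data $\eta = \eta_a\otimes\mu_D$, $\gamma = \gamma_a\otimes\mu_D \in \Gamma(D)$, the constraint (3.3.3) now
becomes
\[
(e_0,\pi_D)_\#\boldsymbol\eta = \eta_a\otimes\mu_D, \qquad (e_T,\pi_D)_\#\boldsymbol\eta = \gamma_a\otimes\mu_D .
\tag{3.3.5}
\]
Equivalently, in terms of $\boldsymbol\eta_a$ we can write
\[
(e_0)_\#\boldsymbol\eta_a = \eta_a, \qquad (e_T)_\#\boldsymbol\eta_a = \gamma_a .
\tag{3.3.6}
\]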

Then, we define $\bar\delta^2(\eta,\gamma)$ by minimizing the action
\[
\int_{\tilde\Omega(D)} \mathcal{A}_T(\omega)\,d\boldsymbol\eta(\omega,a)
\]
among all generalized incompressible flows $\boldsymbol\eta$ (according to (3.3.4)) connecting η to γ
(according to (3.3.5) or (3.3.6)). Notice that $\bar\delta^2$ is independent of T, because the action
is scaling invariant; so we can use any interval [a, b] in place of [0, T] to define $\bar\delta$, and in
this case we shall talk of generalized flow between η and γ in [a, b] (this extension will
play a role in Remark 3.3.2 below).
When $\eta_a = \delta_a$ (i.e. $\eta = \gamma_i$), (3.3.6) tells us that almost all trajectories of $\boldsymbol\eta_a$ start
from a: then $\int_D \boldsymbol\eta_a\,d\mu_D(a)$ provides us with a solution of Brenier’s original model with
the same action, connecting $\gamma_i$ to γ. Conversely, any solution ν of this model can be
written as $\int_D \nu_a\,d\mu_D$, with $\nu_a$ concentrated on the curves starting at a, and $\nu_a\otimes\mu_D$
provides us with an admissible path for our generalized problem, connecting $\gamma_i$ to γ,
with the same action.
Let us now analyze the properties of (Γ(D), δ); the fact that this is a metric space
and even a length space (i.e. any two points can be joined by a geodesic with length
equal to the distance) follows by the basic operations reparameterization, restriction and
concatenation of generalized flows, that we are now going to describe.
Remark 3.3.1 (Reparameterization). Let χ : [0, T] → [0, T] be a $C^1$ map with $\dot\chi > 0$,
χ(0) = 0 and χ(T) = T. Then, right composition of ω with χ induces a transformation
$\eta \mapsto \chi_\#\eta$ between generalized incompressible flows that preserves the initial and final
conditions. As a consequence, if η is optimal the functional $\chi \mapsto \mathcal{A}_T(\chi_\#\eta)$ attains its
minimum when χ(t) = t. Changing variables we obtain
\[
\mathcal{A}_T(\chi_\#\eta)
= T\int_0^T \dot\chi^2(t)\int_{\tilde\Omega(D)} \frac12\,|\dot\omega|^2(\chi(t))\,d\eta(\omega,a)\,dt
= T\int_0^T \frac{1}{\dot g(s)}\int_{\tilde\Omega(D)} \frac12\,|\dot\omega|^2(s)\,d\eta(\omega,a)\,ds
\]
with $g = \chi^{-1}$. Therefore, choosing g(s) = s + εφ(s), with $\phi \in C_c^1(0,T)$, the first variation
gives
\[
\int_0^T\Big(\int_{\tilde\Omega(D)} |\dot\omega|^2(s)\,d\eta(\omega,a)\Big)\,\dot\phi(s)\,ds = 0 .
\]
This proves that $s \mapsto \int_{\tilde\Omega(D)} |\dot\omega|^2(s)\,d\eta(\omega,a)$ is equivalent to a constant, i.e. it coincides
with a constant for almost every s. We shall call the square root of this quantity the speed of η.
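The first variation in Remark 3.3.1 can be made explicit as follows. This is a worked step in which we introduce, only for brevity, the notation $e(s) := \int_{\tilde\Omega(D)} \frac12|\dot\omega|^2(s)\,d\eta(\omega,a)$; note that for ε small $g(s) = s+\varepsilon\phi(s)$ is admissible, since $\dot g > 0$, g(0) = 0 and g(T) = T.

```latex
\[
F(\varepsilon) := T\int_0^T \frac{e(s)}{1+\varepsilon\dot\phi(s)}\,ds,
\qquad
\frac{1}{1+\varepsilon\dot\phi(s)} = 1 - \varepsilon\dot\phi(s) + O(\varepsilon^2),
\]
\[
0 = F'(0) = -T\int_0^T e(s)\,\dot\phi(s)\,ds
\qquad \forall \phi\in C_c^1(0,T).
\]
```

Hence the distributional derivative of e vanishes in (0, T), so that e coincides almost everywhere with a constant.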
Remark 3.3.2 (Restriction and concatenation). Let [s, t] ⊂ [0, T] and let
$r_{s,t} : C([0,T];D) \to C([s,t];D)$ be the restriction map. It is immediate to check that, for
any generalized incompressible flow $\boldsymbol\eta = \boldsymbol\eta_a\otimes\mu_D$ in [0, T] between η and γ, the measure
$(r_{s,t})_\#\boldsymbol\eta$ is a generalized incompressible flow in [s, t] between $\eta_s := (e_s)_\#\boldsymbol\eta_a\otimes\mu_D$ and
$\gamma_t := (e_t)_\#\boldsymbol\eta_a\otimes\mu_D$, with action equal to
\[
(t-s)\int_{\tilde\Omega(D)}\int_s^t \frac12\,|\dot\omega(\tau)|^2\,d\tau\,d\boldsymbol\eta(\omega,a).
\]
Let s < l < t and let $\boldsymbol\eta = \boldsymbol\eta_a\otimes\mu_D$, $\boldsymbol\nu = \boldsymbol\nu_a\otimes\mu_D$ be generalized incompressible flows,
respectively defined in [s, l] and [l, t], and joining η to γ and γ to θ. Then, writing
$\gamma_a = (e_l)_\#\boldsymbol\eta_a = (e_l)_\#\boldsymbol\nu_a$, we can disintegrate both $\boldsymbol\eta_a$ and $\boldsymbol\nu_a$ with respect to $\gamma_a$ to obtain
\[
\boldsymbol\eta_a = \int_D \boldsymbol\eta_{a,x}\,d\gamma_a(x) \in P(C([s,l];D)),
\qquad
\boldsymbol\nu_a = \int_D \boldsymbol\nu_{a,x}\,d\gamma_a(x) \in P(C([l,t];D)),
\]
with $\boldsymbol\eta_{a,x}$, $\boldsymbol\nu_{a,x}$ concentrated on the curves ω with ω(l) = x. We can then consider
the image $\lambda_{x,a}$, via the concatenation of paths (from the product of C([s, l]; D) and
C([l, t]; D) to C([s, t]; D)), of the product measure $\boldsymbol\eta_{a,x}\times\boldsymbol\nu_{a,x}$ to obtain a probability
measure in C([s, t]; D) concentrated on paths passing through x at time l. Finally,
setting
\[
\lambda = \int_{D\times D} \lambda_{x,a}\,d(\gamma_a\otimes\mu_D)(x,a),
\]
we obtain a generalized incompressible flow in [s, t] joining η to θ with action given by
\[
\frac{t-s}{l-s}\,\mathcal{A}_{[s,l]}(\boldsymbol\eta) + \frac{t-s}{t-l}\,\mathcal{A}_{[l,t]}(\boldsymbol\nu),
\]
where $\mathcal{A}_{[s,l]}(\boldsymbol\eta)$ is the action of $\boldsymbol\eta$ in [s, l] and $\mathcal{A}_{[l,t]}(\boldsymbol\nu)$ is the action of $\boldsymbol\nu$ in [l, t] (strictly
speaking, the action of their restrictions).
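The concatenation formula yields the triangle inequality once the intermediate time l is chosen suitably. The following worked step assumes that the two flows are optimal, with $\mathcal{A}_{[s,l]} = a^2$ and $\mathcal{A}_{[l,t]} = b^2$, where a and b denote the relaxed distances from η to γ and from γ to θ, and a, b > 0 (the symbols a, b are introduced here only for this computation; optimal flows exist by the scaling invariance of the action on any interval).

```latex
\[
\frac{l-s}{t-s} := \frac{a}{a+b}
\quad\Longrightarrow\quad
\frac{t-s}{l-s}\,a^2 + \frac{t-s}{t-l}\,b^2
= \frac{a+b}{a}\,a^2 + \frac{a+b}{b}\,b^2
= (a+b)^2 .
\]
```

The concatenated flow joins η to θ with action $(a+b)^2$, so the squared relaxed distance between η and θ is at most $(a+b)^2$; this choice of l corresponds to running both flows at the same speed.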
A simple consequence of the previous remarks is that δ is a distance in Γ(D) (it
suffices to concatenate flows with unit speed); in addition, the restriction of an optimal
incompressible flow η = η a ⊗ µD between ηa ⊗ µD and γa ⊗ µD to an interval [s, t] is still
an optimal incompressible flow in [s, t] between the plans (es )# η a ⊗µD and (et )# η a ⊗µD .
This property will be useful in Section 3.6.
Another important property of δ that will be useful in Section 3.6 is its lower semi-
continuity with respect to the narrow convergence, that we are going to prove in the
next theorem. Another non-trivial fact is the existence of at least one generalized in-
compressible flow with finite action. In [31, Section 4] Brenier proved the existence of
such a flow in the case D = Td . Then in [122, Section 2], using a (non-injective) Lips-
chitz measure-preserving map from Td to [0, 1]d , Shnirelman produced a flow with finite
action also in this case (see also [35, Section 3]). In the next theorem we will show how
to construct a flow with finite action in a compact subset D whenever flows with finite
action can be built in D0 and a possibly non-injective, Lipschitz and measure-preserving
map f : D0 → D exists.
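Theorem 3.3.3. Assume that $D \subset \mathbb{R}^d$ is a compact set. Then the infimum in the
definition of $\bar\delta(\eta,\gamma)$ is achieved,
\[
(\eta,\gamma) \mapsto \bar\delta(\eta,\gamma) \ \text{ is narrowly lower semicontinuous}
\tag{3.3.7}
\]
and
\[
\bar\delta(\gamma_i,\gamma_h) \le \delta(i,h) \qquad \forall h\in \mathrm{SDiff}(D).
\tag{3.3.8}
\]
Furthermore, $\sup_{\eta,\gamma\in\Gamma(D)} \bar\delta(\eta,\gamma) \le \sqrt d$ when either $D = [0,1]^d$ or $D = \mathbb{T}^d$ and, more generally,
\[
\sup_{\gamma\in\Gamma(D)} \bar\delta_D(\gamma_i,\gamma) \le \mathrm{Lip}(f)\,\sup_{\gamma'\in\Gamma(D')} \bar\delta_{D'}(\gamma_i,\gamma')
\]
whenever a Lipschitz measure-preserving map f : D′ → D exists.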

Proof. The inequality δ(γi , γh ) ≤ δ(i, h) simply follows by the fact that any smooth
flow g induces a generalized one, with the same action, by the formula η = Φ# µD ,
where Φ : D → Ω̃(D) is the map x ↦ (g(·, x), x). Assuming that some generalized
incompressible flow with a finite action between η and γ exists, the existence of an optimal
one follows by the narrow lower semicontinuity of η ↦ AT (η) (because ω ↦ AT (ω) is
lower semicontinuous in Ω(D)) and by the tightness of minimizing sequences (because
AT (ω) is coercive in Ω(D), by the Ascoli-Arzelà theorem). A similar argument also
proves the lower semicontinuity of (η, γ) ↦ δ(η, γ), as the conditions (3.3.4), (3.3.5) are
stable under narrow convergence (of η and η, γ).
When either D = [0, 1]^d or D = T^d, it follows by the explicit construction in [31],
[122] that δ(γi , γh ) ≤ √d for all h ∈ S(D); by right invariance (see Proposition 3.3.4
below) the same estimate holds for δ(γf , γh ) with f ∈ S^i(D); by density and lower
semicontinuity it extends to δ(η, γ), with η, γ ∈ Γ(D).
Let f : D′ → D be a Lipschitz measure-preserving map and h ∈ S(D); we claim that
it suffices to show the existence of γ′ ∈ Γ(D′) such that (f × f)# γ′ = (i × h)# µD . Indeed,
if this is proved, since f naturally induces by left composition a map F from Ω̃(D′) to
Ω̃(D) given by (ω(t), a) ↦ (f (ω(t)), a), then to any η ∈ Ω(D′) connecting i to γ′ we can
associate F# η, which will be a generalized incompressible flow connecting i to h. By the
trivial estimate
\[ A_T(F_\# \boldsymbol{\eta}) \le \mathrm{Lip}^2(f) \, A_T(\boldsymbol{\eta}), \]
one obtains δ_D(γi , h) ≤ Lip(f ) δ_{D′}(γi , γ′). By density and lower semicontinuity we get
the estimate on δ_D(γi , γ) for all γ ∈ Γ(D).
Thus, to conclude the proof, we have to construct γ′. Let us consider the disintegration
of µ_{D′} induced by the map f , that is
\[ \mu_{D'} = \int_D \mu_y \, d\mu_D(y), \tag{3.3.9} \]
where, for µD -a.e. y, µ_y is a probability measure in D′ concentrated on the compact set
f^{−1}(y). We now define γ′ as
\[ \gamma' := \int_D \mu_y \times \mu_{h(y)} \, d\mu_D(y). \]
Clearly the first marginal of γ′ is µ_{D′}; since h ∈ S(D), changing variables in (3.3.9) one
has µ_{D′} = ∫_D µ_{h(y)} dµD (y), and so also the second marginal of γ′ is µ_{D′}. Let us now prove
that (f × f)# γ′ = (i × h)# µD : for any φ ∈ Cb (D × D) we have
\begin{align*}
\int_{D \times D} \phi(y, y') \, d(f \times f)_\# \gamma'(y, y')
  &= \int_{D' \times D'} \phi(f(x), f(x')) \, d\gamma'(x, x') \\
  &= \int_D \int_{D' \times D'} \phi(f(x), f(x')) \, d\mu_y(x) \, d\mu_{h(y)}(x') \, d\mu_D(y) \\
  &= \int_D \phi(y, h(y)) \, d\mu_D(y),
\end{align*}
where in the last equality we used that µ_y is concentrated on f^{−1}(y) and µ_{h(y)} is
concentrated on f^{−1}(h(y)) for µD -a.e. y. □
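As an illustration only, the construction of γ′ can be checked by hand in a finite toy model. The sketch below assumes D′ = {0, . . . , 5} and D = {0, 1, 2} with uniform measures, f (x) = x mod 3 and a cyclic permutation h; all of these choices, and the helper names, are hypothetical and not taken from the text. It builds γ′ from the disintegration and verifies, with exact rational arithmetic, both marginals and the identity (f × f)# γ′ = (i × h)# µD .

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite model (an assumption, not from the text).
D_prime, D = range(6), range(3)
mu_Dp = {x: Fraction(1, 6) for x in D_prime}   # uniform measure on D'
mu_D = {y: Fraction(1, 3) for y in D}          # uniform measure on D

def f(x):
    """Measure-preserving, non-injective map D' -> D."""
    return x % 3

h = {0: 1, 1: 2, 2: 0}                         # measure-preserving map of D

def fibre_measure(y):
    """Disintegration mu_y: mu_D' conditioned on the fibre f^{-1}(y)."""
    fibre = [x for x in D_prime if f(x) == y]
    total = sum(mu_Dp[x] for x in fibre)
    return {x: mu_Dp[x] / total for x in fibre}

# gamma' := sum over y of mu_D(y) * (mu_y x mu_{h(y)})
gamma_p = defaultdict(Fraction)
for y in D:
    for x, wx in fibre_measure(y).items():
        for xp, wxp in fibre_measure(h[y]).items():
            gamma_p[(x, xp)] += mu_D[y] * wx * wxp

first_marginal = defaultdict(Fraction)
second_marginal = defaultdict(Fraction)
pushed = defaultdict(Fraction)                 # (f x f)_# gamma'
for (x, xp), w in gamma_p.items():
    first_marginal[x] += w
    second_marginal[xp] += w
    pushed[(f(x), f(xp))] += w

graph_plan = {(y, h[y]): mu_D[y] for y in D}   # (i x h)_# mu_D
```

In this model each fibre has two points, so γ′ charges every pair (x, x′) with f (x′) = h(f (x)) with mass 1/12.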
By (3.3.1), (3.3.8) and the narrow lower semicontinuity of δ(i, ·) we get
\[ \delta(\gamma_i, h) \le C \|h - i\|^{\alpha}_{L^2(D)} \qquad \text{if } h \in S(D), \ D = [0, 1]^d, \ d \ge 3. \tag{3.3.10} \]
We conclude this section by pointing out some additional properties of the metric
space (Γ(D), δ).
Proposition 3.3.4. (Γ(D), δ) is a complete metric space, whose convergence implies
narrow convergence. Furthermore, the distance δ is right invariant under the action of
S i (D) on Γ(D). Finally, δ-convergence is strictly stronger than narrow convergence and,
as a consequence, (Γ(D), δ) is not compact.
Proof. We will prove that δ(η, γ) ≥ W_2(η, γ), where W_2 is the quadratic Wasserstein
distance in P(D × D) (with the quadratic cost c((x1 , x2 ), (y1 , y2 )) = d_D^2(x1 , y1 )/2 +
d_D^2(x2 , y2 )/2); as this distance metrizes the narrow convergence, this will give the
implication between δ-convergence and narrow convergence. In order to show the inequality
δ(η, γ) ≥ W_2(η, γ) we consider an optimal flow η a ⊗ µD defined in [0, 1]; then, denoting
by ωa ∈ Ω(D) the constant path identically equal to a, and by ν a ∈ P(C([0, 1]; D × D))
the measure η a × δ_{ωa}, the measure ν := ∫_D ν a dµD (a) ∈ P(C([0, 1]; D × D)) provides a
“dynamical transference plan” connecting η to γ (i.e. (e0 )# ν = η, (e1 )# ν = γ, see [133,
Chapter 7]) whose action is δ^2(η, γ); since the action of any dynamical transference plan
bounds from above W_2^2(η, γ), the inequality is achieved.
The completeness of (Γ(D), δ) is a consequence of the inequality δ ≥ W2 (so that

Cauchy sequences in this space are Cauchy sequences for the Wasserstein distance), the
completeness of the Wasserstein spaces of probability measures and the narrow lower
semicontinuity of δ: we leave the details of the simple proof to the reader.
The right invariance of δ simply follows by the fact that η ◦ h = ηh(a) ⊗ µD , γ ◦ h =
γh(a) ⊗ µD , so that
δ(η ◦ h, γ ◦ h) ≤ δ(η, γ),
because we can apply the same transformation to any admissible flow η a ⊗µD connecting
η to γ, producing an admissible flow η h(a) ⊗ µD between η ◦ h and γ ◦ h with the same
action. If h ∈ S i (D) the inequality can be reversed, using h−1 .
Now, let us prove the last part of the statement. We first show that
\[ \frac{1}{2} \int_D d_D^2(f, h) \, d\mu_D \le \delta^2(\gamma_f, \gamma_h) \qquad \forall f, h \in S(D). \tag{3.3.11} \]
Indeed, considering again an optimal flow η a ⊗ µD , for µD -a.e. a ∈ D we have
\[ \frac{1}{2} d_D^2(f(a), h(a)) = W_2^2(\delta_{f(a)}, \delta_{h(a)}) \le T \int_{\Omega(D)} \int_0^T \frac{1}{2} |\dot\omega(t)|^2 \, dt \, d\boldsymbol{\eta}_a(\omega), \]
and we need only to integrate this inequality with respect to a. From (3.3.11) we obtain
that S(D) is a closed subset of Γ(D), relative to the distance δ. In particular, considering
for instance a sequence (gn ) ⊂ S(D) narrowly converging to γ ∈ Γ(D) \ S(D), whose
existence is ensured by (3.2.6), one proves that the two topologies are not equivalent and
the space is not compact. □
Combining right invariance with (3.3.10), we obtain
\[ \delta(\gamma_g, \gamma_h) = \delta(\gamma_i, \gamma_{h \circ g^{-1}}) \le C \|g - h\|^{\alpha}_{L^2(D)} \qquad \forall h \in S(D), \ g \in S^i(D) \tag{3.3.12} \]
if D = [0, 1]^d with d ≥ 3. By the density of S^i(D) in S(D) in the L^2 norm and the lower
semicontinuity of δ, this inequality still holds when g ∈ S(D).
3.3.3 Brenier’s Eulerian-Lagrangian model

In [35], Brenier proposed a second possible relaxation of Arnold’s problem, motivated
by the fact that this second relaxation allows for a much more precise description of the
pressure field, compared to the Lagrangian model (see Section 3.6).
Still denoting by η = ηa ⊗ µD ∈ Γ(D), γ = γa ⊗ µD ∈ Γ(D) the initial and final plan,
respectively, the idea is to add to the Eulerian variable x a Lagrangian one a (which, in
the case η = γi , simply labels the position of the particle at time 0) and to consider the
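family of distributional solutions, indexed by a ∈ D, of the continuity equation
\[ \partial_t c_{t,a} + \mathrm{div}(v_{t,a} \, c_{t,a}) = 0 \quad \text{in } \mathcal{D}'((0, T) \times D), \ \text{for } \mu_D\text{-a.e. } a, \tag{3.3.13} \]
with the initial and final conditions
\[ c_{0,a} = \eta_a, \qquad c_{T,a} = \gamma_a, \qquad \text{for } \mu_D\text{-a.e. } a. \tag{3.3.14} \]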
Notice that minimization of the kinetic energy ∫_0^T ∫_D |v t,a |^2 dct,a dt among all possible
solutions of the continuity equation would give, according to [19], the optimal transport
problem between ηa and γa (for instance, a path of Dirac masses on a geodesic connecting
g(a) to h(a) if ηa = δ_{g(a)}, γa = δ_{h(a)}). Here, instead, by averaging with respect to a we
minimize the mean kinetic energy
\[ \int_D \int_0^T \int_D |v_{t,a}|^2 \, dc_{t,a} \, dt \, d\mu_D(a) \]
with the only global constraint between the family {ct,a } given by the incompressibility
of the flow:
\[ \int_D c_{t,a} \, d\mu_D(a) = \mu_D \qquad \forall t \in [0, T]. \tag{3.3.15} \]
It is useful to rewrite this minimization problem in terms of the global measure c in
[0, T ] × D × D and the measures ct in D × D
\[ c := c_{t,a} \otimes (\mathcal{L}^1 \times \mu_D), \qquad c_t := c_{t,a} \otimes \mu_D \]
(from which ct,a can obviously be recovered by disintegration), and the velocity field
v(t, x, a) := v t,a (x): the action becomes
\[ A_T(c, v) := T \int_0^T \int_{D \times D} \frac{1}{2} |v(t, x, a)|^2 \, dc(t, x, a), \]
while (3.3.13) is easily seen to be equivalent to
\[ \frac{d}{dt} \int_{D \times D} \phi(x, a) \, dc_t(x, a) = \int_{D \times D} \langle \nabla_x \phi(x, a), v(t, x, a) \rangle \, dc_t(x, a) \tag{3.3.16} \]
for all φ ∈ Cb (D × D) with a bounded gradient with respect to the x variable.

Thus, we can minimize the action on the class of pairs (c, v) of measures and velocity fields
that satisfy (3.3.16) and (3.3.15), with the endpoint condition (3.3.14). The existence of
a minimum in this class can be proved by standard compactness and lower semicontinuity
arguments (see [35] for details). This minimization problem leads to a squared distance
between η and γ, that we shall still denote by δ^2(η, γ). Our notation is justified by the
essential equivalence of the two models, proved in the next section.
3.4 Equivalence of the two relaxed models

In this section we show that the Lagrangian model is equivalent to the Eulerian-Lagrangian
one, in the sense that minimal values are the same, and there is a way (not canonical, in
one direction) to pass from minimizers of one problem to minimizers of the other one.
Theorem 3.4.1. With the notations of Sections 3.3.2 and 3.3.3,
\[ \min_{\boldsymbol{\eta}} A_T(\boldsymbol{\eta}) = \min_{(c, v)} A_T(c, v) \]
for any η, γ ∈ Γ(D). More precisely, any minimizer η of the Lagrangian model connecting
η to γ induces in a canonical way a minimizer (c, v) of the Eulerian-Lagrangian one,
and satisfies for L 1 -a.e. t ∈ [0, T ] the condition
\[ \dot\omega(t) = v_{t,a}(e_t(\omega)) \qquad \text{for } \boldsymbol{\eta}\text{-a.e. } (\omega, a). \tag{3.4.1} \]

Proof. Up to an isometric embedding, we shall assume that D ⊂ Rm isometrically (this is
needed to apply Lemma 3.2.3). If η = η a ⊗µ ∈ P(Ω̃(D)) is a generalized incompressible
flow, we denote by D0 ⊂ D a Borel set of full measure such that AT (η a ) < ∞ for all
a ∈ D0 . For any a ∈ D0 we define
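\[ c^{\boldsymbol{\eta}}_{t,a} := (e_t)_\# \boldsymbol{\eta}_a, \qquad m^{\boldsymbol{\eta}}_{t,a} := (e_t)_\# \big( \dot\omega(t) \, \boldsymbol{\eta}_a \big). \]
Notice that m^η_{t,a} is well defined for L 1 -a.e. t, and absolutely continuous with respect to
c^η_{t,a}, thanks to Lemma 3.2.3; moreover, denoting by v^η_{t,a} the density of m^η_{t,a} with respect
to c^η_{t,a}, by the same lemma we have
\[ \int_D |v^{\boldsymbol{\eta}}_{t,a}|^2 \, dc^{\boldsymbol{\eta}}_{t,a} \le \int_{\Omega(D)} |\dot\omega(t)|^2 \, d\boldsymbol{\eta}_a(\omega), \tag{3.4.2} \]
with equality only if ω̇(t) = v^η_{t,a}(et (ω)) for η a -a.e. ω. Then, we define the global measure
and velocity by
\[ c^{\boldsymbol{\eta}} := c^{\boldsymbol{\eta}}_{t,a} \otimes (\mathcal{L}^1 \times \mu_D), \qquad v^{\boldsymbol{\eta}}(t, x, a) = v^{\boldsymbol{\eta}}_t(x, a) := v^{\boldsymbol{\eta}}_{t,a}(x). \]
It is easy to check that (c^η, v^η) is admissible: indeed, writing η = ηa ⊗ µ, γ = γa ⊗ µD ,
the conditions (e0 )# η a = ηa and (eT )# η a = γa yield c^η_{0,a} = ηa and c^η_{T,a} = γa (for µD -a.e.
a).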
This proves that (3.3.14) is fulfilled; the incompressibility constraint (3.3.15) simply
comes from (3.3.4). Finally, we check (3.3.13) for a ∈ D0 ; this is equivalent, recalling the
definition of v t,a , to
\[ \frac{d}{dt} \int_D \phi(x) \, dc^{\boldsymbol{\eta}}_{t,a}(x) = \int_D \langle \nabla\phi, m^{\boldsymbol{\eta}}_{t,a} \rangle, \tag{3.4.3} \]
which in turn corresponds to
\[ \frac{d}{dt} \int_{\Omega(D)} \phi(\omega(t)) \, d\boldsymbol{\eta}_a(\omega) = \int_{\Omega(D)} \langle \nabla\phi(\omega(t)), \dot\omega(t) \rangle \, d\boldsymbol{\eta}_a(\omega). \tag{3.4.4} \]
This last identity is a direct consequence of an exchange of differentiation and integral.
By integrating (3.4.2) in time and with respect to a we obtain that AT (c^η, v^η) ≤
AT (η), and equality holds only if (3.4.1) holds.
So, in order to conclude the proof, it remains to find, given a pair (c, v) of measure and
velocity field with finite action that satisfies (3.3.13), (3.3.14) and (3.3.15), an admissible
generalized incompressible flow η with AT (η) ≤ AT (c, v). By applying Theorem 3.2.4
to the family of solutions of the continuity equations (3.3.13), we obtain probability
measures η a with (et )# η a = ct,a and
\[ \int_{\Omega(D)} \int_0^T |\dot\omega(t)|^2 \, dt \, d\boldsymbol{\eta}_a(\omega) \le \int_0^T \int_D |v(t, x, a)|^2 \, dc_{t,a}(x) \, dt. \tag{3.4.5} \]
Then, because of (3.3.15), it is easy to check that η := η a ⊗ µD is a generalized
incompressible flow, and moreover η connects η to γ. By integrating (3.4.5) with respect to a,
we obtain that AT (η) ≤ AT (c, v). □
3.5 Comparison of metrics and gap phenomena

Throughout this section we shall assume that D = [0, 1]d . In [122], Shnirelman proved
when d ≥ 3 the following remarkable approximation theorem for Brenier’s generalized
(Lagrangian) flows:
Theorem 3.5.1. If d ≥ 3, then each generalized incompressible flow η connecting i to

h ∈ SDiff(D) may be approximated together with the action by a sequence of smooth
flows (gk (t, ·)) connecting i to h. More precisely:
(i) the measures η k := (gk (·, x))# µD narrowly converge in Ω(D) to η;
(ii) AT (gk ) = AT (η k ) → AT (η).
This result yields, as a byproduct, the identity
δ(γi , γh ) = δ(i, h) for all h ∈ SDiff(D), d ≥ 3. (3.5.1)

More generally the relaxed distance δ(η, γ) arising from the Lagrangian model can be
compared, at least when η = γi and the final condition γ is induced by a map h ∈ S(D),
with the relaxation δ∗ of the Arnold distance:
\[ \delta_*(h) := \inf \left\{ \liminf_{n \to \infty} \delta(i, h_n) \ : \ h_n \in \mathrm{SDiff}(D), \ \int_D |h_n - h|^2 \, d\mu_D \to 0 \right\}. \tag{3.5.2} \]
By (3.3.7) and (3.3.8), we have δ∗ (h) ≥ δ(γi , γh ), and a gap phenomenon is said to occur
if the inequality is strict.
In the case d = 2, while examples of h ∈ SDiff(D) such that δ(i, h) = +∞ are known
[121], the nature of δ∗ (h) and the possible occurrence of the gap phenomenon are not
clear.
In this section we prove the non-occurrence of the gap phenomenon when the final
condition belongs to S(D), and even when it is a transport plan, still under the
assumption d ≥ 3. To this aim, we first extend the definition of δ∗ by setting
\[ \delta_*(\gamma) := \inf \left\{ \liminf_{n \to \infty} \delta(i, h_n) \ : \ h_n \in \mathrm{SDiff}(D), \ \gamma_{h_n} \to \gamma \ \text{narrowly} \right\}. \tag{3.5.3} \]
This extends the previous definition (3.5.2), taking into account that γhn narrowly converge
to γh if and only if hn → h in L2 (µD ) (for instance, this is a simple consequence of
[14, Lemma 2.3]).
Theorem 3.5.2. If d ≥ 3, then δ∗ (γ) = δ(γi , γ) for all γ ∈ Γ(D).
The proof of the theorem, given at the end of this section, is a direct consequence of
Theorem 3.5.1 and of the following approximation result of generalized incompressible
flows by measure-preserving maps (possibly not smooth, or not injective), valid in any
number of dimensions.
Theorem 3.5.3. Let γ ∈ Γ(D). Then, for any probability measure η on Ω(D) such that
(et )# η = µD ∀t ∈ [0, T ], (e0 , eT )# η = γ,
and AT (η) < ∞, there exists a sequence of flows (gk (t, ·))k∈N ⊂ W 1,2 ([0, T ]; L2 (D)) such
that:
(i) gk (t, ·) ∈ S(D) for all t ∈ [0, T ], hence η k := (Φgk )# µD , with Φgk (x) = gk (·, x), are
generalized incompressible flows;
(ii) η k narrowly converge in Ω(D) to η and AT (gk ) = AT (η k ) → AT (η).

Proof. The first three steps of the proof are more or less the same as in the proof of
Shnirelman’s approximation theorem (Theorem 3.5.1 in [122]).
Step 1. Given ε > 0 small, consider the affine transformation of D into the concentric
cube Dε of size 1 − 4ε:
Tε (x) := (2ε, . . . , 2ε) + (1 − 4ε)x.
This transformation induces a map T̃ε from Ω(D) into C([0, T ]; Dε ) (which is indeed a
bijection) given by
T̃ε (ω)(t) := Tε (ω(t)) ∀ω ∈ Ω(D).
Then we define η̃ ε := (T̃ε )# η, and
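\[ \boldsymbol{\eta}_\varepsilon := (1 - 4\varepsilon)^d \, \tilde{\boldsymbol{\eta}}_\varepsilon + \boldsymbol{\eta}_{0,\varepsilon}, \]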
where η 0,ε is the “steady” flow in D \ Dε : it consists of all the curves in D \ Dε that
do not move for 0 ≤ t ≤ T . It is then not difficult to prove that η ε → η narrowly and
AT (η ε ) → AT (η), as ε → 0.
Therefore, by a diagonal argument, it suffices to prove our theorem for a measure
η which is steady near ∂D. More precisely we can assume that, if ω(0) is in the 2ε-
neighborhood of ∂D, then ω(t) ≡ ω(0) for η-a.e. ω. Moreover, arguing as in Step 1 of
the proof of the above mentioned approximation theorem in [122], we can assume that
the flow does not move for 0 ≤ t ≤ ε, that is, for η-a.e. ω, ω(t) ≡ ω(0) for 0 ≤ t ≤ ε.
Step 2. Let us now consider a family of independent random variables ω1 , ω2 , . . .
defined in a common probability space (Z, Z, P ), with values in C([0, T ], D) and having
the same law η. Recall that η is steady near ∂D and for 0 ≤ t ≤ ε, so we can see ωi
as random variables with values in the subset of Ω(D) given by the curves which do not
move for 0 ≤ t ≤ ε and in the 2ε-neighbourhood of the ∂D. By the law of large numbers,
the random probability measures in Ω(D)
\[ \boldsymbol{\nu}_N(z) := \frac{1}{N} \sum_{i=1}^N \delta_{\omega_i(z)}, \qquad z \in Z, \]
narrowly converge to η with probability 1. Moreover, always by the law of large numbers,
also
AT (ν N (z)) → AT (η)
with probability 1. Thus, choosing properly z, we have approximated η with measures
ν N concentrated on a finite number of trajectories ωi (z)(·) which are steady in [0, ε] and
close to ∂D. From now on (as typical in Probability theory) the parameter z will be
tacitly understood.
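The law of large numbers argument for the action can be illustrated numerically. The following sketch assumes, purely for illustration, straight paths t ↦ x + tv on [0, 1] with v uniform in (−1, 1); such a family is not claimed to be incompressible, and the function names are hypothetical. For these paths the action T ∫ |ω̇|^2 /2 dt equals v^2 /2, whose mean is 1/6, and the action of the empirical measure is the average of the sampled actions.

```python
import random

def action(v, T=1.0):
    """A_T of the straight path t -> x + t*v on [0, T]: T * int_0^T |v|^2 / 2 dt."""
    return T * T * v * v / 2.0

def empirical_action(n_paths, seed=0):
    """Action of the empirical measure (1/N) sum_i delta_{omega_i}."""
    rng = random.Random(seed)          # fixed seed: reproducible sample
    total = sum(action(rng.uniform(-1.0, 1.0)) for _ in range(n_paths))
    return total / n_paths

exact = 1.0 / 6.0                      # E[v^2 / 2] for v uniform on (-1, 1)
estimate = empirical_action(20_000)
```

With 20,000 samples the standard deviation of the average is of order 10^−3, so the estimate should agree with 1/6 to about two decimal places.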
Step 3. Let ϕ ∈ Cc^∞(R^d) be a smooth radial convolution kernel with ϕ(x) = 0 for
|x| ≥ 1 and ϕ(x) > 0 for |x| < 1. Given a finite number of trajectories ω1 , . . . , ωN as
described in Step 2, we define
\[ a_i(x) := \frac{1}{\varepsilon^d} \, \varphi\!\left( \frac{x - \omega_i(0)}{\varepsilon} \right) \qquad \text{if } \mathrm{dist}(\omega_i(0), \partial D) \ge \varepsilon, \]
\[ a_i(x) := \frac{1}{\varepsilon^d} \sum_{\gamma \in \Gamma} \varphi\!\left( \frac{x - \gamma(\omega_i(0))}{\varepsilon} \right) \qquad \text{if } \mathrm{dist}(\omega_i(0), \partial D) \le \varepsilon, \]
where Γ is the discrete group of motions in Rn generated by the reflections in the faces
of D. It is easy to check that ∫ ai = 1 and that supp(ai ) is the intersection of D with
the closed ball B ε (ωi (0)). Define
\[ g_{i,t}(x) := \omega_i(t) + (x - \omega_i(0)) \qquad \forall i = 1, \dots, N. \]
Let MN := (a1 , . . . , aN , g1,t (x), . . . , gN,t (x)) and let us consider the generalized flow η N
associated to MN , given by
\[ \int_{\Omega(D)} f(\omega) \, d\boldsymbol{\eta}_N := \frac{1}{N} \sum_{i=1}^N \int_D a_i(x) \, f(t \mapsto g_{i,t}(x)) \, dx \tag{3.5.4} \]
(that is, η N is the measure in the space of paths given by (1/N) Σ_i ∫_D ai (x) δ_{gi,· (x)} dx). The
measure η N is well defined for the following reason: if dist(ωi (0), ∂D) ≤ ε we have
gi,t (x) = x, and if dist(ωi (0), ∂D) > ε and ai (x) > 0 we still have that the curve t ↦
gi,t (x) is contained in D because ai (x) > 0 implies |x − ωi (0)| ≤ ε and, by construction,
dist(ωi (t), ∂D) ≥ ε for all times. Since the density ρηN induced by η N is given by
\[ \rho^N(t, x) := \frac{1}{N} \sum_{i=1}^N a_i(x + \omega_i(0) - \omega_i(t)), \]
the flow η N is not measure preserving. However we are more or less in the same situation
as in Step 3 in the proof of the approximation theorem in [122] (the only difference being
that we do not impose any final data). Thus, by [122, Lemma 1.2], with probability 1
\[ \begin{aligned}
&\sup_{x,t} |\rho^N(t, x) - 1| \to 0, \\
&\sup_{x,t} |\partial_x^{\alpha} \rho^N(t, x)| \to 0 \quad \forall \alpha, \\
&\int_D \int_0^T |\partial_t \rho^N(t, x)|^2 \, dt \, dx \to 0
\end{aligned} \tag{3.5.5} \]
as N → ∞. By the first two equations in (3.5.5), we can left compose gi,t with a smooth
correcting flow ζtN (x) as in Step 3 in the proof of the approximation theorem in [122],
in such a way that the flow η̃ N associated to M̃N := (a1 , . . . , aN , ζtN ◦ g1,t (x), . . . , ζtN ◦
gN,t (x)) via the formula analogous to (3.5.4) is incompressible. Moreover, thanks to the
third equation in (3.5.5) and the convergence of AT (ν N ) to AT (η), one can prove that
AT (η̃ N ) → AT (η) with probability 1.
We observe that, since η is steady for 0 ≤ t ≤ ε, the same holds by construction for
η̃ N . Without loss of generality, we can therefore assume that ζtN does not depend on t
for t ∈ [0, ε].
Step 4. In order to conclude, we see that the only problem now is that the flow η̃ N
associated to M̃N is still non-deterministic, since if x ∈ supp(ai ) ∩ supp(aj ) for i ≠ j,
then more than one curve starts from x. Let us partition D in the following way:
\[ D = D_1 \cup D_2 \cup \dots \cup D_L \cup E, \]
where E is ℒ^d-negligible, any set Dj is open, and all x ∈ Dj belong to the interior of
the supports of exactly M = M (j) ≤ N sets ai , indexed by 1 ≤ i1 < · · · < iM ≤ N
(therefore L ≤ 2^N). This decomposition is possible, as E is contained in the union of
the boundaries of supp ai , which is ℒ^d-negligible.
Fix one of the sets Dj and assume just for notational simplicity that ik = k for
1 ≤ k ≤ M . We are going to modify the flow η̃ N in Dj , increasing a little bit its action
(say, by an amount α > 0), in such a way that for each point in Dj only one curve
starts from it. Given x ∈ Dj , we know that M curves start from it, weighted with mass
ak (x) > 0, and Σ_{k=1}^M ak (x) = 1. These curves coincide for 0 ≤ t ≤ ε (since nothing
moves), and then separate. We want to partition Dj in M sets Ek , with
\[ \mathcal{L}^d(E_k) = \int_{D_j} a_k(x) \, dx, \qquad 1 \le k \le M, \]
in such a way that, for any x ∈ Ek , only one curve ω_x^k starts from it at time 0, ω_x^k(t) ∈ Dj
for 0 ≤ t ≤ ε, and the map Ek ∋ x ↦ ω_x^k(ε) ∈ Dj pushes forward ℒ^d ⌞ Ek into ak ℒ^d ⌞ Dj .
Moreover, we want the incompressibility condition to be preserved for all t ∈ [0, ε]. If
this is possible, the proof will be concluded by gluing ω_x^k with the only curve starting
from ω_x^k(ε) with weight ak (ω_x^k(ε)).
The above construction can be achieved in the following way. First we write the
interior of Dj , up to null measure sets, as a countable union of disjoint open cubes (Ci )
with size δi satisfying
\[ \frac{M^2}{\varepsilon} \sum_i \frac{\delta_i^2}{\bar b_i^2} \, \mathcal{L}^d(C_i) \le \alpha, \tag{3.5.6} \]
with b̄i := min_{1≤k≤M} min_{Ci} ak . This is done just considering the union of the grids in R^d
given by Z^d /2^n for n ∈ N, and taking initially our cubes in this family; if (3.5.6) does not hold,
we keep splitting the cubes until it is satisfied (b̄i can only increase under this additional
splitting, therefore a factor 4 is gained in each splitting). Once this partition is given,
the idea is to move the mass within each Ci for 0 ≤ t ≤ ε. At least heuristically, one can
imagine that in Ci the functions ak are almost constant and that the velocity of a generic
path in Ci is at most of order δi /ε. Thus, the total energy of the new incompressible
fluid in the interval [0, ε] will be of order
\[ \sum_i \int_{C_i} \int_0^{\varepsilon} |\dot\omega_x(t)|^2 \, dt \, dx \le \frac{C}{\varepsilon} \sum_i \delta_i^2 \, \mathcal{L}^d(C_i) \]
and the conclusion will follow by our choice of δi .
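The "factor 4 per splitting" remark can be made concrete in a simplified setting. The sketch below assumes (these are simplifications that are not in the text) that Dj is the whole unit cube, that all cubes of the dyadic grid have the same side δ = 2^−n, and that every b̄i is replaced by a common lower bound b̄; the left-hand side of (3.5.6) then reduces to M^2 δ^2 /(ε b̄^2), since the volumes of the cubes sum to 1. The function names and the sample values are hypothetical.

```python
def lhs_356(M, eps, b_bar, n):
    """Left-hand side of (3.5.6) for a uniform dyadic grid of side 2**-n.

    Simplifying assumptions: unit cube, equal cubes, common lower bound b_bar.
    """
    delta = 2.0 ** -n
    return M * M / eps * delta * delta / (b_bar * b_bar)

def refinement_level(M, eps, b_bar, alpha, max_level=60):
    """Smallest dyadic level n for which (3.5.6) holds under the assumptions above."""
    for n in range(max_level + 1):
        if lhs_356(M, eps, b_bar, n) <= alpha:
            return n
    raise ValueError("no admissible level up to max_level")

# Hypothetical sample values.
level = refinement_level(M=3, eps=0.1, b_bar=0.2, alpha=0.01)
gain = lhs_356(3, 0.1, 0.2, level + 1) / lhs_356(3, 0.1, 0.2, level)
```

Each further splitting halves δ and therefore divides the left-hand side by 4, so the required level grows only logarithmically in M^2 /(ε b̄^2 α).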

So, in order to make this argument rigorous, let us fix i and let us see how to construct
our modified flow in the cube Ci for t ∈ [0, ε]. Slicing Ci with respect to the first (d − 1)
variables, we see that the transport problem can be solved in each slice. Specifically, if
Ci is of the form xi + (0, δi )^d , and we define
\[ m^k := \int_{C_i} a_k(x) \, dx, \qquad k = 1, \dots, M, \]
whose sum is δi^d , then the points which belong to C_i^k := xi + (0, δi )^{d−1} × Jk have to move
along curves in order to push forward ℒ^d ⌞ C_i^k into ak ℒ^d ⌞ Ci , where Jk are M consecutive
open intervals in (0, δi ) with length δi^{1−d} m^k . Moreover, this has to be done preserving
the incompressibility condition.
If we write x = (x′, xd ) ∈ R^d with x′ = (x1 , . . . , xd−1 ), we can transport the M
uniform densities
\[ \mathcal{H}^1 \llcorner \big( x_i + \{x'\} \times J_k \big) \qquad \text{with } x' \in [0, \delta_i]^{d-1}, \]
into the M densities
\[ a_k(x', \cdot) \, \mathcal{H}^1 \llcorner \big( x_i + \{x'\} \times [0, \delta_i] \big) \]
moving the curves only in the d-th direction, i.e. keeping x′ fixed. Thanks to Lemma 3.5.4
below and a scaling argument, we can do this construction paying at most M^2 b̄i^{−2} δi^3 /ε in
each slice of Ci , and therefore with a total cost less than
\[ \frac{M^2}{\varepsilon} \sum_i \frac{\delta_i^{d+2}}{\bar b_i^2} \le \alpha. \]
This concludes our construction. □

Lemma 3.5.4. Let M ≥ 1 be an integer and let b1 , . . . , bM : [0, 1] → (0, 1] be continuous
with Σ_{k=1}^M bk = 1. Setting lk = ∫_0^1 bk dt ∈ (0, 1], and denoting by J1 , . . . , JM consecutive
intervals of (0, 1) with length lk , there exists a family of uniformly Lipschitz maps h(·, x),
with h(t, ·) ∈ S([0, 1]), such that
\[ h(1, \cdot)_\# (\chi_{J_k} \mathcal{L}^1) = b_k \mathcal{L}^1, \qquad k = 1, \dots, M, \]
and
\[ A_1(h) \le \frac{M^2}{\bar b^2}, \qquad \text{with } \bar b := \min_{1 \le k \le M} \min_{[0,1]} b_k > 0. \tag{3.5.7} \]
Proof. We start with a preliminary remark: let J ⊂ (0, 1) be an interval with length
l and assume that t ↦ ρt is a nonnegative Lipschitz map between [0, 1] and L^1(0, 1),
with ρt ≤ 1 and ∫_0^1 ρt dx = l for all t ∈ [0, 1], and let f (t, ·) be the unique (on J, up to
countable sets) nondecreasing map pushing χJ ℒ^1 to ρt . Assume also that supp ρt is an
interval and ρt ≥ r ℒ^1-a.e. on supp ρt , with r > 0. Under this extra assumption, f (t, x)
is uniquely determined for all x ∈ J, and implicitly characterized by the conditions
\[ \int_0^{f(t,x)} \rho_t(y) \, dy = \mathcal{L}^1((0, x) \cap J), \qquad f(t, x) \in \mathrm{supp}\, \rho_t. \]
This implies, in particular, that f (·, x) is continuous for all x ∈ J. We are going to prove
that this map is even Lipschitz continuous in [0, 1] and
\[ \left| \frac{d}{dt} f(t, x) \right| \le \frac{\mathrm{Lip}(\rho_\cdot)}{r} \qquad \text{for } \mathcal{L}^1\text{-a.e. } t \in [0, 1] \tag{3.5.8} \]
for all x ∈ J. To prove this fact, we first notice that the endpoints of the interval
supp ρt (whose length is at least l) move at most with velocity Lip(ρ· )/r; then, we fix
x ∈ J = [a, b] and consider separately the cases
\[ x \in \partial J = \{a, b\}, \qquad x \in \mathrm{Int}(J) = (a, b). \]
In the first case, since for any t ∈ [0, 1]
\[ \int_0^{f(t,a)} \rho_t(y) \, dy = 0, \qquad \int_0^{f(t,b)} \rho_t(y) \, dy = \mathcal{L}^1(J), \]
and by assumption f (t, x) ∈ supp ρt for any x ∈ J, we get supp ρt = [f (t, a), f (t, b)] for
all t ∈ [0, 1]. This, together with the fact that the endpoints of the interval supp ρt move
at most with velocity Lip(ρ· )/r, implies (3.5.8) if x ∈ ∂J. In the second case we have
\[ \int_0^{f(t,x)} \rho_t(y) \, dy \in (0, \mathcal{L}^1(J)), \]
therefore f (t, x) ∈ Int(supp ρt ) for all t ∈ [0, 1]. It suffices now to find a Lipschitz estimate
of |f (s, x) − f (t, x)| when s, t are sufficiently close. Assume that f (s, x) ≤ f (t, x): adding
and subtracting ∫_0^{f(s,x)} ρt (y) dy in the identity
\[ \int_0^{f(t,x)} \rho_t(y) \, dy = \int_0^{f(s,x)} \rho_s(y) \, dy \]
we obtain
\[ \int_{f(s,x)}^{f(t,x)} \rho_t(y) \, dy = \int_0^{f(s,x)} \big( \rho_s(y) - \rho_t(y) \big) \, dy. \]
Now, as f (s, x) belongs to supp ρt for |s − t| sufficiently small, we get
\[ r \, |f(s, x) - f(t, x)| \le \mathrm{Lip}(\rho_\cdot) \, |t - s|. \]
This proves the Lipschitz continuity of f (·, x) and (3.5.8).
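The implicit characterization of the nondecreasing map f can be turned into a small numerical procedure. The sketch below is only an illustration under stated assumptions: the density is evaluated on a uniform grid with a midpoint rule, the cumulative mass is inverted by linear interpolation, and only points in the interior of J are meant to be queried. The function name `monotone_map` and the sample density are hypothetical.

```python
from bisect import bisect_left

def monotone_map(rho, J, n=1000):
    """Approximate f on the interior of J = (a, b) from the condition
    int_0^{f(x)} rho(y) dy = length((0, x) ∩ J), using a midpoint-rule CDF."""
    a, b = J
    h = 1.0 / n
    cdf = [0.0]
    for k in range(n):                       # cumulative mass at the grid points k*h
        cdf.append(cdf[-1] + h * rho((k + 0.5) * h))

    def f(x):
        target = min(max(x, a), b) - a       # L^1((0, x) ∩ J)
        k = min(max(bisect_left(cdf, target), 1), n)
        lo, hi = cdf[k - 1], cdf[k]
        frac = 0.0 if hi == lo else min(max((target - lo) / (hi - lo), 0.0), 1.0)
        return (k - 1 + frac) * h            # linear interpolation inside the cell

    return f

# Hypothetical data: rho = 1/2 on (0.2, 0.8), with total mass 0.3 = length of J.
rho = lambda y: 0.5 if 0.2 < y < 0.8 else 0.0
f = monotone_map(rho, J=(0.1, 0.4))
```

For this piecewise constant density the exact map is f (x) = 0.2 + 2(x − 0.1) on J, which the numerical inversion reproduces at interior points.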

Given this observation, to prove the lemma it suffices to find maps t ↦ ρ_t^k connecting
χJk ℒ^1 to bk ℒ^1 satisfying:
(i) supp ρ_t^k is an interval, and ρ_t^k ≥ min_{[0,1]} bk ≥ b̄ ℒ^1-a.e. on its support;
(ii) Lip(ρ_·^k) ≤ (M − 1)/2 on [0, 1/2], and Lip(ρ_·^k) ≤ 2 on [1/2, 1];
(iii) Σ_{k=1}^M ρ_t^k = 1 for all t ∈ [0, 1].
Indeed, this would produce maps with time derivative bounded by (M − 1)/(2b̄) on [0, 1/2]
and bounded by 2/b̄ on [1/2, 1], and this easily gives (3.5.7).
The construction can be achieved in two steps. First, we connect χJk ℒ^1 to lk ℒ^1 in
the time interval [0, 1/2]; then, we connect lk ℒ^1 to bk ℒ^1 in [1/2, 1] by a linear interpolation.
The Lipschitz constants of the second step are easily seen to be less than 2, so let us
focus on the first interpolation.
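Let us first consider the case of two densities ρ^1 = χJ1 and ρ^2 = χJ2 , with J1 = (0, l1 )
and J2 = (l1 , l), where l = l1 + l2 . In the time interval [0, τ ], we define the expanding intervals
\[ J_{1,t} = \left( 0, \ l_1 + \frac{t}{\tau} l_2 \right), \qquad J_{2,t} = \left( l_1 - \frac{t}{\tau} l_1, \ l \right), \]
so that Jk,τ = (0, l) for k = 1, 2, and then define
\[ \rho^1_t := \begin{cases} 1 & \text{on } (0, \ l_1 - \frac{t}{\tau} l_1), \\ l_1/l & \text{on } (l_1 - \frac{t}{\tau} l_1, \ l_1 + \frac{t}{\tau} l_2), \\ 0 & \text{otherwise,} \end{cases}
\qquad
\rho^2_t := \begin{cases} 1 & \text{on } (l_1 + \frac{t}{\tau} l_2, \ l), \\ l_2/l & \text{on } (l_1 - \frac{t}{\tau} l_1, \ l_1 + \frac{t}{\tau} l_2), \\ 0 & \text{otherwise.} \end{cases} \]
By construction ρ_t^k ≥ lk on Jk,t for k = 1, 2, ρ_t^1 + ρ_t^2 = 1 on (0, l), and it is easy to see that
\[ \mathrm{Lip}(\rho^k_\cdot) \le \frac{l_1 l_2}{\tau l} \le \frac{l}{4\tau}. \tag{3.5.9} \]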
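The two-density interpolation can be checked with exact arithmetic. The sketch below assumes l = l1 + l2 and sample values l1 = 1/5, l2 = 3/10, τ = 1/4 (hypothetical choices); the mixing zone is taken closed so that the two densities sum to 1 at every point, which only changes them on a null set. It verifies that ρ_t^1 + ρ_t^2 = 1 on (0, l), that each mass is conserved in time, and the elementary inequality l1 l2 /(τ l) ≤ l/(4τ) used in (3.5.9).

```python
from fractions import Fraction as F

def rho(k, t, y, l1, l2, tau):
    """Density rho^k_t(y), k in {1, 2}, for 0 <= t <= tau and y in (0, l1 + l2)."""
    l = l1 + l2
    left = l1 - t / tau * l1       # left end of the mixing zone
    right = l1 + t / tau * l2      # right end of the mixing zone
    if left <= y <= right:
        return (l1 if k == 1 else l2) / l
    if k == 1:
        return F(1) if y < left else F(0)
    return F(1) if y > right else F(0)

def mass(k, t, l1, l2, tau):
    """Exact integral of rho^k_t over (0, l1 + l2)."""
    l = l1 + l2
    left, right = l1 - t / tau * l1, l1 + t / tau * l2
    mixed = (l1 if k == 1 else l2) / l * (right - left)
    pure = left if k == 1 else l - right
    return pure + mixed

# Hypothetical sample values.
l1, l2, tau = F(1, 5), F(3, 10), F(1, 4)
times = [tau * F(j, 8) for j in range(9)]
points = [(l1 + l2) * F(j, 101) for j in range(1, 101)]
```

At t = τ both densities are constant on (0, l), equal to l1 /l and l2 /l respectively, which is the starting point of the next step of the iteration.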
We can now define the desired interpolation on [0, 1/2] for general M ≥ 2. Let us define
\[ t_i := \frac{i}{2(M-1)} \qquad \text{for } i = 1, \dots, M - 1, \]
so that t_{M−1} = 1/2. We will achieve our construction of ρ_t^k on [0, 1/2] in M − 1 steps, where
at each step we will progressively define ρ_t^k on the time interval [t_{i−1}, t_i].
First, in the time interval [0, t_1], we leave fixed ρ_0^k := χJk ℒ^1 for k ≥ 3 (if such k
exist), while we apply the above construction in J1 ∪ J2 to ρ^1 and ρ^2. In this way, on
[0, t_1], ρ_0^1 := χJ1 ℒ^1 is connected to ρ_{t_1}^1 := (l1 /(l1 + l2 )) χ_{J1∪J2} ℒ^1, and ρ_0^2 := χJ2 ℒ^1 is
connected to ρ_{t_1}^2 := (l2 /(l1 + l2 )) χ_{J1∪J2} ℒ^1.
Now, as a second step, we want to connect ρ_{t_1}^k to (lk /(l1 + l2 + l3 )) χ_{J1∪J2∪J3} ℒ^1 for k = 1, 2, 3,
leaving the other densities fixed. To this aim, we define ρ_{t_1}^{12} := ρ_{t_1}^1 + ρ_{t_1}^2 = χ_{J1∪J2} ℒ^1.
In the time interval [t_1, t_2], we leave fixed ρ_0^k := χJk ℒ^1 for k ≥ 4 (if such k exist), and
we apply again the above construction in J1 ∪ J2 ∪ J3 to ρ_{t_1}^{12} and ρ_{t_1}^3 = χJ3 ℒ^1. In this
way, on [t_1, t_2], ρ_{t_1}^{12} is connected to ρ_{t_2}^{12} := ((l1 + l2 )/(l1 + l2 + l3 )) χ_{J1∪J2∪J3} ℒ^1, and ρ_{t_1}^3 is
connected to ρ_{t_2}^3 := (l3 /(l1 + l2 + l3 )) χ_{J1∪J2∪J3} ℒ^1. Finally, it suffices to define
ρ_t^1 := (l1 /(l1 + l2 )) ρ_t^{12} and ρ_t^2 := (l2 /(l1 + l2 )) ρ_t^{12}.
In the third step we leave fixed the densities ρ_{t_2}^k for k ≥ 5, and we do the same
construction as before adding the first three densities (that is, in this case one defines
ρ_{t_2}^{123} := ρ_{t_2}^1 + ρ_{t_2}^2 + ρ_{t_2}^3 = χ_{J1∪J2∪J3} ℒ^1). In this way, we connect ρ_{t_2}^{123} to
ρ_{t_3}^{123} := ((l1 + l2 + l3 )/(l1 + l2 + l3 + l4 )) χ_{J1∪J2∪J3∪J4} ℒ^1 and ρ_{t_2}^4 to
ρ_{t_3}^4 := (l4 /(l1 + l2 + l3 + l4 )) χ_{J1∪J2∪J3∪J4} ℒ^1, and then we define
ρ_t^k := (lk /(l1 + l2 + l3 )) ρ_t^{123} for k = 1, 2, 3.
Iterating this construction on [t_i, t_{i+1}] for i ≥ 4, one obtains the desired maps t ↦ ρ_t^k.
Indeed, by construction ρ_t^k ≥ lk on Jk,t , and Σ_{k=1}^M ρ_t^k = 1. Moreover, by (3.5.9), it is
simple to see that in each time interval [t_i, t_{i+1}] one has the bound
\[ \mathrm{Lip}(\rho^k_\cdot) \le \frac{M - 1}{2}. \]
So the energy can be easily bounded by (1/b̄^2)((M − 1)^2 /16 + 1) ≤ M^2 /b̄^2. □
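The arithmetic behind the final bound can be spelled out as follows. The sketch assumes that the action carries the factor 1/2 (kinetic energy), so that a speed bound V on a time interval of length 1/2 contributes at most (1/2)(1/2)V^2; with V = (M − 1)/(2b̄) on [0, 1/2] and V = 2/b̄ on [1/2, 1] this gives the two terms (M − 1)^2 /(16 b̄^2) and 1/b̄^2. The helper names and the value of b̄ used for checking are hypothetical.

```python
from fractions import Fraction as F

def time_nodes(M):
    """t_i = i / (2(M - 1)) for i = 1, ..., M - 1 (requires M >= 2)."""
    return [F(i, 2 * (M - 1)) for i in range(1, M)]

def energy_bound(M, b_bar):
    """Sum of the two half-interval contributions (1/2) * (1/2) * V^2."""
    first_half = F(1, 2) * F(1, 2) * (F(M - 1, 2) / b_bar) ** 2   # V = (M-1)/(2 b_bar)
    second_half = F(1, 2) * F(1, 2) * (F(2) / b_bar) ** 2          # V = 2 / b_bar
    return first_half + second_half

b_bar = F(1, 3)   # hypothetical lower bound for the densities b_k
bounds = {M: energy_bound(M, b_bar) for M in range(2, 50)}
```

Since (M − 1)^2 /16 + 1 ≤ M^2 for every M ≥ 1, the sum is always dominated by M^2 /b̄^2, which is the estimate (3.5.7).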
Proof. (of Theorem 3.5.2) By applying Theorem 3.5.3 to the optimal η connecting i to
γ, we can find maps gk ∈ S(D) such that γgk → γ narrowly and
\[ \limsup_{k \to \infty} \delta(\gamma_i, \gamma_{g_k}) \le \delta(\gamma_i, \gamma). \]
Now, if d ≥ 3 we can use (3.3.12), the triangle inequality, and the density of SDiff(D) in
S(D) in the L2 norm, to find maps hk ∈ SDiff(D) such that
\[ \limsup_{k \to \infty} \delta(\gamma_i, \gamma_{h_k}) \le \delta(\gamma_i, \gamma) \]
and γhk → γ narrowly. This gives the thesis. □
3.6 Necessary and sufficient optimality conditions

In this section we study necessary and sufficient optimality conditions for the generalized
geodesics; we shall work mainly with the Lagrangian model, but we will use the equivalent
Eulerian-Lagrangian model to transfer regularity information for the pressure field to
the Lagrangian model. Without any loss of generality, we assume throughout this section
that T = 1.
The pressure field p can be identified, at least as a distribution (precisely, an element
of the dual of C 1 ([0, 1] × D)), by the so-called dual least action principle introduced
in [33]. In order to describe it, let us build a natural class of first variations in the
Lagrangian model: given a smooth vector field w(t, x), vanishing for t sufficiently close
to 0 and 1, we may define the maps S ε : Ω̃(D) → Ω̃(D) by
\[ S^{\varepsilon}(\omega, a)(t) := \big( e^{\varepsilon w_t} \omega(t), \, a \big), \tag{3.6.1} \]
where e^{εwt} x is the flow, in the (ε, x) variables, generated by the autonomous field wt (x) =
w(t, x) (i.e. e^{0wt} = i and (d/dε) e^{εwt} x = w(t, e^{εwt} x)), and the perturbed generalized flows
η ε := (S ε )# η. Notice that η ε is incompressible if div wt = 0, and more generally the
density ρηε satisfies for all times t ∈ (0, 1) the continuity equation
\[ \frac{d}{d\varepsilon} \rho^{\boldsymbol{\eta}_\varepsilon}(t, x) + \mathrm{div}\big( w_t(x) \, \rho^{\boldsymbol{\eta}_\varepsilon}(t, x) \big) = 0. \tag{3.6.2} \]
This motivates the following definition.
Definition 3.6.1 (Almost incompressible flows). We say that a probability measure
ν on Ω(D) is an almost incompressible generalized flow if ρν ∈ C^1([0, 1] × D) and
\[ \| \rho^{\boldsymbol{\nu}} - 1 \|_{C^1([0,1] \times D)} \le \frac{1}{2}. \]
Now we provide a slightly simpler proof of the characterization given in [33] of the
pressure field (the original proof therein involved a time discretization argument).
Theorem 3.6.2. For all η, γ ∈ Γ(D) there exists p ∈ [C^1([0, 1] × D)]^∗ such that
\[ \langle p, \rho^{\boldsymbol{\nu}} - 1 \rangle_{(C^1)^*, C^1} \le A_1(\boldsymbol{\nu}) - \delta^2(\eta, \gamma) \tag{3.6.3} \]
for all almost incompressible flows ν satisfying (3.3.5).
Proof. Let us define the closed convex set C := {ρ ∈ C^1([0, 1] × D) : ‖ρ − 1‖_{C^1} ≤ 1/2},
and the function φ : C^1([0, 1] × D) → R+ ∪ {+∞} given by
\[ \phi(\rho) := \begin{cases} \inf \{ A_1(\boldsymbol{\nu}) : \rho^{\boldsymbol{\nu}} = \rho \text{ and (3.3.5) holds} \} & \text{if } \rho \in C; \\ +\infty & \text{otherwise.} \end{cases} \]
We observe that φ(1) = δ^2(η, γ). Moreover, it is a simple exercise to prove that φ
is convex and lower semicontinuous in C^1([0, 1] × D). Let us now prove that φ has
bounded (descending) slope at 1, i.e.
\[ \limsup_{\rho \to 1} \frac{[\phi(1) - \phi(\rho)]^+}{\| 1 - \rho \|_{C^1}} < +\infty. \]
By [33, Proposition 2.1] we know that there exist 0 < ε < 12 and c > 0 such that, for any
ρ ∈ C with kρ − 1kC 1 ≤ ε, there is a Lipschitz family of diffeomorphisms gρ (t, ·) : D → D
such that
gρ (t, ·)# µD = ρ(t, ·)µD ,
gρ (t, ·) = i for t = 0, 1, and the Lipschitz constant of (t, x) 7→ gρ (t, x) − x is bounded
by c. Thus, adapting the construction in [33, Proposition 2.1] (made for probability
measures in Ω(D), and not in Ω̃(D)), for any incompressible flow η connecting η to γ,
and any ρ ∈ C, we can define an almost incompressible flow ν still connecting η to γ
such that ρν = ρ, and
A1 (ν) ≤ A1 (η) + c0 kρ − 1kC 1 (1 + A1 (η)),
where c0 depends only on c (for instance, we define ν := G# η, where G : Ω̃(D) → Ω̃(D)
is the map induced by gρ via the formula (ω(t), a) 7→ (gρ (t, ω(t)), a)). In particular,
considering an optimal η, we get
2
φ(ρ) ≤ φ(1) + ckρ − 1kC 1 (1 + δ (η, γ)) (3.6.4)
for any ρ ∈ C with kρ−1kC 1 ≤ ε. This fact implies that φ is bounded on a neighbourhood
of 1 in C. Now, it is a standard fact of convex analysis that a convex function bounded on
a convex set is locally Lipschitz on that set. This provides the bounded slope property.
By a simple application of the Hahn-Banach theorem (see for instance Proposition 1.4.4
in [11]), it follows that the subdifferential of φ at 1 is not empty, that is, there exists p
in the dual of C 1 such that
\[ \langle p, \rho - 1 \rangle_{(C^1)^*, C^1} \le \phi(\rho) - \phi(1). \]
This is indeed equivalent to (3.6.3). $\square$
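The equivalence claimed in the last line can be spelled out as follows. This is only a sketch: it assumes, as the definition of $\phi$ suggests, that every almost incompressible flow $\boldsymbol{\nu}$ satisfying (3.3.5) has density $\rho^{\boldsymbol{\nu}} \in C$.

```latex
% Sketch: subdifferential inequality  <=>  (3.6.3).
% Assumption (not stated explicitly in this passage): \rho^{\nu} \in C for admissible \nu.
\begin{align*}
\langle p, \rho^{\boldsymbol{\nu}} - 1 \rangle
  &\le \phi(\rho^{\boldsymbol{\nu}}) - \phi(1)
  && \text{(subdifferential inequality at } \rho = \rho^{\boldsymbol{\nu}}\text{)} \\
  &\le \mathcal{A}_1(\boldsymbol{\nu}) - \overline{\delta}^2(\eta, \gamma)
  && \text{(since } \phi(\rho^{\boldsymbol{\nu}}) \le \mathcal{A}_1(\boldsymbol{\nu})
     \text{ and } \phi(1) = \overline{\delta}^2(\eta, \gamma)\text{)}.
\end{align*}
% Conversely, fix \rho \in C and take the infimum in (3.6.3) over all admissible
% \nu with \rho^{\nu} = \rho: the right-hand side becomes \phi(\rho) - \phi(1).
% For \rho \notin C the inequality is trivial because \phi(\rho) = +\infty.
```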
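This result tells us that, if $\boldsymbol{\eta}$ is an optimal incompressible generalized flow connecting
$\eta$ to $\gamma$ (i.e. $\mathcal{A}_1(\boldsymbol{\eta}) = \overline{\delta}^2(\eta, \gamma)$), and if we consider the augmented action
\[ \mathcal{A}_1^p(\boldsymbol{\nu}) := \int_{\tilde\Omega(D)} \int_0^1 \frac12 |\dot\omega(t)|^2 \, dt \, d\boldsymbol{\nu}(\omega, a) - \langle p, \rho^{\boldsymbol{\nu}} - 1 \rangle, \tag{3.6.5} \]
then $\boldsymbol{\eta}$ minimizes the new action among all almost incompressible flows $\boldsymbol{\nu}$ between $\eta$ and
$\gamma$.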
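Then, using the identities
\[ \frac{d}{d\varepsilon} \frac{d}{dt} S^\varepsilon(\omega)(t) \Big|_{\varepsilon=0} = \frac{d}{dt} w(t, \omega(t)) = \partial_t w(t, \omega(t)) + \nabla_x w(t, \omega(t)) \cdot \dot\omega(t) \]
and the convergence in the sense of distributions (ensured by (3.6.2)) of $(\rho^{\boldsymbol{\eta}^\varepsilon} - 1)/\varepsilon$ to
$-\operatorname{div} w$ as $\varepsilon \downarrow 0$, we obtain
\[ 0 = \frac{d}{d\varepsilon} \mathcal{A}_1^p(\boldsymbol{\eta}^\varepsilon) \Big|_{\varepsilon=0} = \int_{\tilde\Omega(D)} \int_0^1 \dot\omega(t) \cdot \frac{d}{dt} w(t, \omega(t)) \, dt \, d\boldsymbol{\eta}(\omega, a) + \langle p, \operatorname{div} w \rangle. \tag{3.6.6} \]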
As noticed in [33], this equation identifies uniquely the pressure field p (as a distribution)
up to trivial modifications, i.e. additive perturbations depending on time only.
In the Eulerian-Lagrangian model, instead, the pressure field is defined (see (2.20) in
[35]) and uniquely determined, still up to trivial modifications, by
\[ \nabla p(t, x) = -\partial_t \Big( \int_D v(t, x, a) \, dc_{t,x}(a) \Big) - \operatorname{div} \Big( \int_D v(t, x, a) \otimes v(t, x, a) \, dc_{t,x}(a) \Big), \tag{3.6.7} \]
all derivatives being understood in the sense of distributions in (0, 1) × D (here (c, v)
is any optimal pair for the Eulerian-Lagrangian model). We used the same letter p
to denote the pressure field in the two models: indeed, we have seen in the proof of
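Theorem 3.4.1 that, writing $\boldsymbol{\eta} = \boldsymbol{\eta}_a \otimes \mu_D$, the correspondence
\[ \boldsymbol{\eta} \mapsto (c^{\boldsymbol{\eta}}_{t,a}, v^{\boldsymbol{\eta}}_{t,a}) \quad \text{with} \quad c^{\boldsymbol{\eta}}_{t,a} := (e_t)_\# \boldsymbol{\eta}_a, \quad v^{\boldsymbol{\eta}}_{t,a} c^{\boldsymbol{\eta}}_{t,a} := (e_t)_\# (\dot\omega(t) \boldsymbol{\eta}_a) \]
maps optimal solutions for the first problem into optimal solutions for the second one.
Since under this correspondence (3.6.7) reduces to (3.6.6), the two pressure fields coincide.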
The following crucial regularity result for the pressure field is proved in the last
section, and it improves in the time variable the regularity $\partial_{x_i} p \in \mathcal{M}_{loc}((0,1) \times D)$
obtained by Brenier in [35].
Theorem 3.6.3 (Regularity of pressure). Let $(c, v)$ be an optimal pair for the
Eulerian-Lagrangian model, and let $p$ be the pressure field identified by (3.6.7). Then
$\partial_{x_i} p \in L^2_{loc}((0,1); \mathcal{M}(D))$ and
\[ p \in L^2_{loc}\big((0,1); BV_{loc}(D)\big) \subset L^2_{loc}\big((0,1); L^{d/(d-1)}_{loc}(D)\big). \]
In the case $D = \mathbb{T}^d$ the same properties hold globally in space, i.e. replacing $BV_{loc}(D)$
with $BV(\mathbb{T}^d)$ and $L^{d/(d-1)}_{loc}(D)$ with $L^{d/(d-1)}(\mathbb{T}^d)$.
The L1loc integrability of p allows much stronger variations in the Lagrangian model,
that give rise to possibly nonsmooth densities, which may even vanish.
From now on we shall confine our discussion to the case of the flat torus Td , as
our arguments involve some global smoothing that becomes more technical, and needs
to be carefully checked in more general situations. We also set µT = µTd and denote
by dT the Riemannian distance in Td (i.e. the distance modulo 1 in Rd /Zd ). In the
next theorem we consider generalized flows ν with bounded compression, defined by the
property ρν ∈ L∞ ((0, 1) × D).
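Theorem 3.6.4. Let $\boldsymbol{\eta}$ be an optimal incompressible flow in $\mathbb{T}^d$ between $\eta$ and $\gamma$. Then
\[ \langle p, \rho^{\boldsymbol{\nu}} - 1 \rangle \le \mathcal{A}_1(\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta}) \tag{3.6.8} \]
for any generalized flow with bounded compression $\boldsymbol{\nu}$ between $\eta$ and $\gamma$ such that
\[ \rho^{\boldsymbol{\nu}}(t, \cdot) = 1 \quad \text{for } t \text{ sufficiently close to } 0, 1. \tag{3.6.9} \]
If $p \in L^1([0,1] \times \mathbb{T}^d)$, the condition (3.6.9) is not required for the validity of (3.6.8).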
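Proof. Let $J := \{\rho^{\boldsymbol{\nu}}(t, \cdot) \ne 1\} \Subset (0,1)$ and let us first assume that $\rho^{\boldsymbol{\nu}}$ is smooth. If
$\|\rho^{\boldsymbol{\nu}} - 1\|_{C^1} \le 1/2$, then the result follows by Theorem 3.6.2. If not, for $\varepsilon > 0$ small enough
$(1 - \varepsilon)\boldsymbol{\eta} + \varepsilon\boldsymbol{\nu}$ is a slightly compressible generalized flow in the sense of Definition 3.6.1.
Thus, we have
\[ \varepsilon \langle p, \rho^{\boldsymbol{\nu}} - 1 \rangle = \langle p, \rho^{(1-\varepsilon)\boldsymbol{\eta} + \varepsilon\boldsymbol{\nu}} - 1 \rangle \le \mathcal{A}_1((1-\varepsilon)\boldsymbol{\eta} + \varepsilon\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta}) = \varepsilon \big( \mathcal{A}_1(\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta}) \big), \]
and this proves the statement whenever $\rho^{\boldsymbol{\nu}}$ is smooth.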
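The two equalities in the chain above rest on the fact that both the density and the action depend affinely on the flow. A brief sketch, assuming (as in the rest of the section) that $\rho^{\boldsymbol{\nu}}$ is the density of the time marginals of $\boldsymbol{\nu}$ and that $\mathcal{A}_1$ is an integral against $\boldsymbol{\nu}$:

```latex
% Sketch: density and action are affine in the flow, since both are obtained
% by integrating against the measure on path space.
\begin{align*}
\rho^{(1-\varepsilon)\boldsymbol{\eta} + \varepsilon\boldsymbol{\nu}} - 1
  &= (1-\varepsilon)\rho^{\boldsymbol{\eta}} + \varepsilon\rho^{\boldsymbol{\nu}} - 1
   = \varepsilon(\rho^{\boldsymbol{\nu}} - 1)
  && \text{(because } \rho^{\boldsymbol{\eta}} = 1\text{)}, \\
\mathcal{A}_1((1-\varepsilon)\boldsymbol{\eta} + \varepsilon\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta})
  &= (1-\varepsilon)\mathcal{A}_1(\boldsymbol{\eta}) + \varepsilon\mathcal{A}_1(\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta})
   = \varepsilon\big(\mathcal{A}_1(\boldsymbol{\nu}) - \mathcal{A}_1(\boldsymbol{\eta})\big).
\end{align*}
% For \varepsilon small, \|\varepsilon(\rho^{\nu} - 1)\|_{C^1} \le 1/2, so
% Theorem 3.6.2 applies to the interpolated flow; dividing by \varepsilon gives (3.6.8).
```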

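If $\rho^{\boldsymbol{\nu}}$ is not smooth, we need a regularization argument. Let us assume first that $\rho^{\boldsymbol{\nu}}$ is
smooth in time, uniformly with respect to $x$, but not in space. We fix a cut-off function
$\chi \in C^1_c(0,1)$ strictly positive on a neighbourhood of $J$ and define, for $y \in \mathbb{R}^d$, the maps
$T_{\varepsilon,y} : \tilde\Omega(\mathbb{T}^d) \to \tilde\Omega(\mathbb{T}^d)$ by
\[ T_{\varepsilon,y}(\omega, a) := (\omega + \varepsilon y \chi, a), \qquad (\omega, a) \in \tilde\Omega(\mathbb{T}^d). \]
Then, we set $\boldsymbol{\nu}^\varepsilon := \int_{\mathbb{R}^d} (T_{\varepsilon,y})_\# \boldsymbol{\nu} \, \phi(y) \, dy$, where $\phi : \mathbb{R}^d \to [0, +\infty)$ is a standard convolution
kernel. It is easy to check that $\boldsymbol{\nu}^\varepsilon$ still connects $\eta$ to $\gamma$, and that
\[ \rho^{\boldsymbol{\nu}^\varepsilon}(t, \cdot) = \rho^{\boldsymbol{\nu}}(t, \cdot) * \phi_{\varepsilon\chi(t)} \qquad \forall t \in [0,1], \]
where $\phi_\varepsilon(x) = \varepsilon^{-d} \phi(x/\varepsilon)$. Since
\[ \lim_{\varepsilon \downarrow 0} \mathcal{A}_1(\boldsymbol{\nu}^\varepsilon) = \lim_{\varepsilon \downarrow 0} \int_{\mathbb{R}^d} \int_{\tilde\Omega(\mathbb{T}^d)} \int_0^1 \frac12 |\dot\omega(t) + \varepsilon y \dot\chi(t)|^2 \, dt \, d\boldsymbol{\nu}(\omega, a) \, \phi(y) \, dy = \mathcal{A}_1(\boldsymbol{\nu}), \]
we can pass to the limit in (3.6.8) with $\boldsymbol{\nu}^\varepsilon$, whose densities are smooth, in place of $\boldsymbol{\nu}$.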
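In the general case we fix a convolution kernel with compact support $\varphi(t)$ and, with
the same choice of $\chi$ as before, we define the maps
\[ T_\varepsilon(\omega, a)(t) := \Big( \int_0^1 \omega(t - s\varepsilon\chi(t)) \varphi(s) \, ds, \; a \Big). \]
Setting $\boldsymbol{\nu}^\varepsilon = (T_\varepsilon)_\# \boldsymbol{\nu}$, it is easy to check that $\mathcal{A}_1(\boldsymbol{\nu}^\varepsilon) \to \mathcal{A}_1(\boldsymbol{\nu})$ and that
\[ \rho^{\boldsymbol{\nu}^\varepsilon}(t, x) = \int_0^1 \rho^{\boldsymbol{\nu}}(t - s\varepsilon\chi(t), x) \varphi(s) \, ds \]
are smooth in time, uniformly in $x$. So, by applying (3.6.8) with $\boldsymbol{\nu}^\varepsilon$ in place of $\boldsymbol{\nu}$, we
obtain the inequality in the limit.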
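Finally, if $p$ is globally integrable, we can approximate any generalized flow with
bounded compression $\boldsymbol{\nu}$ between $\eta$ and $\gamma$ by transforming $\omega$ into $\omega \circ \psi_\varepsilon$, where $\psi_\varepsilon :
[0,1] \to [0,1]$ is defined by $\psi_\varepsilon(t) := \frac{1}{1-2\varepsilon} \int_0^t \chi_{[\varepsilon, 1-\varepsilon]}(s) \, ds$ (so that $\psi_\varepsilon$ is constant for $t$ close
to 0 and 1). Passing to the limit as $\varepsilon \downarrow 0$ we obtain the inequality even without the
condition $\rho^{\boldsymbol{\nu}}(t, \cdot) = 1$ for $t$ close to 0, 1. $\square$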
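Remark 3.6.5 (Smoothing of flows and plans). Notice that the same smoothing
argument can be used to prove this statement: given a flow $\boldsymbol{\eta}$ between $\eta = \eta_a \otimes \mu_T$
and $\gamma = \gamma_a \otimes \mu_T$ (not necessarily with bounded compression), we can find flows with
bounded compression $\boldsymbol{\eta}^\varepsilon$ connecting $\eta^\varepsilon := (\eta_a) * \phi_\varepsilon \otimes \mu_T$ to $\gamma^\varepsilon := (\gamma_a) * \phi_\varepsilon \otimes \mu_T$, with
$\mathcal{A}_T(\boldsymbol{\eta}^\varepsilon) = \mathcal{A}_T(\boldsymbol{\eta})$ and
\[ \int_{\tilde\Omega(\mathbb{T}^d)} \int_0^1 r_\varepsilon(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}(\omega, a) = \int_{\tilde\Omega(\mathbb{T}^d)} \int_0^1 r(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}^\varepsilon(\omega, a) \qquad \forall r \in L^1\big([0,1] \times \mathbb{T}^d\big) \]
(where, as usual, $r_\varepsilon(t, x) = r(t, \cdot) * \phi_\varepsilon(x)$). In order to have these properties, it suffices
to define
\[ \boldsymbol{\eta}^\varepsilon := \int_{\mathbb{R}^d} (\sigma_{\varepsilon y})_\# \boldsymbol{\eta} \, \phi(y) \, dy, \]
where $\sigma_z(\omega, a) = (\omega + z, a)$. Notice also that the "mollified plans" $\eta^\varepsilon$, $\gamma^\varepsilon$ converge to $\eta$,
$\gamma$ in $(\Gamma(\mathbb{T}^d), \overline{\delta})$: if we consider the map $S^\varepsilon_y : \mathbb{T}^d \to \Omega(\mathbb{T}^d)$ given by $x \mapsto \omega_x(t) := x + \varepsilon t y$,
the generalized incompressible flow $\boldsymbol{\nu}^\varepsilon = \boldsymbol{\nu}^\varepsilon_a \otimes \mu_T$, with
\[ \boldsymbol{\nu}^\varepsilon_a := \int_{\mathbb{R}^d} (S^\varepsilon_y)_\# \lambda_a \, \phi(y) \, dy, \]
connects in $[0,1]$ the plan $\lambda = \lambda_a \otimes \mu_T$ to $\lambda^\varepsilon = (\lambda_a * \phi_\varepsilon) \otimes \mu_T$, with an action equal to
$\varepsilon^2 \int_{\mathbb{R}^d} |y|^2 \phi(y) \, dy$.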
In order to state necessary and sufficient optimality conditions at the level of single
fluid paths, we have to take into account that the pressure field is not pointwise defined,
and to choose a particular representative in its equivalence class, modulo negligible sets
in spacetime. Henceforth, we define
\[ \bar p(t, x) := \liminf_{\varepsilon \downarrow 0} p_\varepsilon(t, x), \tag{3.6.10} \]
where, thinking of $p(t, \cdot)$ as a 1-periodic function in $\mathbb{R}^d$, $p_\varepsilon$ is defined by
\[ p_\varepsilon(t, x) := (2\pi)^{-d/2} \int_{\mathbb{R}^d} p(t, x + \varepsilon y) e^{-|y|^2/2} \, dy. \]
Notice that $p_\varepsilon$ is smooth and still 1-periodic. The choice of the heat kernel here is convenient,
because of the semigroup property $p_{\varepsilon + \varepsilon'} = (p_\varepsilon)_{\varepsilon'}$. Recall that $\bar p$ is a representative,
because at any Lebesgue point $x$ of $p(t, \cdot)$ the limit of $p_\varepsilon(t, x)$ exists, and coincides with
$p(t, x)$.
In order to handle passages to limits, we need also uniform pointwise bounds on pε ;
therefore we define
\[ M f(x) := \sup_{\varepsilon > 0} (2\pi)^{-d/2} \int_{\mathbb{R}^d} |f|(x + \varepsilon y) e^{-|y|^2/2} \, dy, \qquad f \in L^1(\mathbb{T}^d). \tag{3.6.11} \]
We will use the following facts: first,
\[ M f_\varepsilon = \sup_{\varepsilon' > 0} |f_\varepsilon|_{\varepsilon'} \le \sup_{\varepsilon' > 0} (|f|_\varepsilon)_{\varepsilon'} \le \sup_{r > 0} |f|_r = M f \]
because of the semigroup property; second, standard maximal inequalities imply $\|M f\|_{L^p(\mathbb{T}^d)} \le
c_p \|f\|_{L^p(\mathbb{T}^d)}$ for all $p > 1$. Setting $M p(t, x) := M p(t, \cdot)(x)$, by Theorem 3.6.3 we infer
that $M p \in L^2_{loc}\big((0,1); L^{d/(d-1)}(\mathbb{T}^d)\big)$, so that in particular $M p \in L^1_{loc}\big((0,1) \times \mathbb{T}^d\big)$. This is
the integrability assumption on $p$ that will play a role in the rest of this section.
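To make the pointwise bound $|f_\varepsilon| \le M f$ concrete, here is a minimal numerical sketch on the one-dimensional torus. It is not part of the original argument: the function names (`gaussian_weights`, `smooth`, `maximal_function`), the grid size and the sample profile are illustrative assumptions, and the supremum over all $\varepsilon > 0$ is replaced by a maximum over a finite grid.

```python
import numpy as np

def gaussian_weights(n, eps, wraps=3):
    """Periodised Gaussian weights (standard deviation eps) on the grid j/n."""
    x = np.arange(n) / n
    w = np.zeros(n)
    for m in range(-wraps, wraps + 1):      # sum a few periodic copies
        w += np.exp(-((x + m) ** 2) / (2.0 * eps ** 2))
    return w / w.sum()                      # nonnegative weights summing to 1

def smooth(f, eps):
    """Discrete analogue of f_eps(x) = E[f(x + eps*Y)], Y a standard Gaussian."""
    n = len(f)
    w = gaussian_weights(n, eps)
    shifted = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n  # index of x_i + x_j
    return f[shifted] @ w

def maximal_function(f, eps_grid):
    """Approximate M f by a maximum over a finite grid of eps (an assumption)."""
    return np.max([smooth(np.abs(f), e) for e in eps_grid], axis=0)

n = 128
x = np.arange(n) / n
f = np.sign(np.sin(2 * np.pi * x)) * (1.0 + x)   # discontinuous sample profile
eps_grid = [0.01, 0.02, 0.05, 0.1, 0.2]
Mf = maximal_function(f, eps_grid)
# |f_eps| <= (|f|)_eps <= M f at every grid point, up to rounding
worst_gap = max(np.max(np.abs(smooth(f, e)) - Mf) for e in eps_grid)
```

Because the weights are nonnegative and sum to one, the inequality holds exactly in the discrete setting for every $\varepsilon$ in the grid; `worst_gap` is therefore nonpositive up to floating-point rounding.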
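Definition 3.6.6 (q-minimizing path). Let $\omega \in H^1((0,1); D)$ with $M q(\tau, \omega) \in L^1(0,1)$.
We say that $\omega$ is a $q$-minimizing path if
\[ \int_0^1 \frac12 |\dot\omega(\tau)|^2 - q(\tau, \omega) \, d\tau \le \int_0^1 \frac12 |\dot\omega(\tau) + \dot\delta(\tau)|^2 - q(\tau, \omega + \delta) \, d\tau \]
for all $\delta \in H^1_0((0,1); D)$ with $M q(\tau, \omega + \delta) \in L^1(0,1)$.
Analogously, we say that $\omega$ is a locally $q$-minimizing path if
\[ \int_s^t \frac12 |\dot\omega(\tau)|^2 - q(\tau, \omega) \, d\tau \le \int_s^t \frac12 |\dot\omega(\tau) + \dot\delta(\tau)|^2 - q(\tau, \omega + \delta) \, d\tau \tag{3.6.12} \]
for all $[s, t] \subset (0,1)$ and all $\delta \in H^1_0((s, t); D)$ with $M q(\tau, \omega + \delta) \in L^1(s, t)$.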
Remark 3.6.7. We notice that, for incompressible flows $\boldsymbol{\eta}$, the $L^1$ (resp. $L^1_{loc}$) integrability
of $M q(\tau, \omega)$ imposed on the curves $\omega$ (and on their perturbations $\omega + \delta$) is satisfied
$\boldsymbol{\eta}$-a.e. if $M q \in L^1\big((0,1) \times \mathbb{T}^d\big)$ (resp. $M q \in L^1_{loc}\big((0,1) \times \mathbb{T}^d\big)$); this can simply be
obtained by first noticing that the incompressibility of $\boldsymbol{\eta}$ and Fubini's theorem give
\[ \int_{\tilde\Omega(\mathbb{T}^d)} \int_J f(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}(\omega, a) = \int_J \int_{\mathbb{T}^d} f(\tau, x) \, d\mu_T(x) \, d\tau \]
for all nonnegative Borel functions $f$ and all intervals $J \subset (0,1)$, and then applying this
identity to $f = M q$.
Theorem 3.6.8 (First necessary condition). Let $\boldsymbol{\eta} = \boldsymbol{\eta}_a \otimes \mu_T$ be any optimal
incompressible flow on $\mathbb{T}^d$. Then, $\boldsymbol{\eta}$ is concentrated on locally $\bar p$-minimizing paths, where
$\bar p$ is the precise representative of the pressure field $p$, and on $\bar p$-minimizing paths if
$M p \in L^1([0,1] \times \mathbb{T}^d)$.
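Proof. With no loss of generality we identify $\mathbb{T}^d$ with $\mathbb{R}^d/\mathbb{Z}^d$. Let $\boldsymbol{\eta}$ be an optimal
incompressible flow and $[s, t] \subset (0,1)$. We fix a nonnegative function $\chi \in C^1_c(0,1)$ with
$\{\chi > 0\} = (s, t)$. Given $\delta \in H^1_0\big([s, t]; \mathbb{T}^d\big)$, $y \in \mathbb{R}^d$ and a Borel set $E \subset \tilde\Omega(\mathbb{T}^d)$, we define
$T_{\varepsilon,y} : \tilde\Omega(\mathbb{T}^d) \to \tilde\Omega(\mathbb{T}^d)$ by
\[ T_{\varepsilon,y}(\omega, a) := \begin{cases} (\omega, a) & \text{if } \omega \notin E; \\ (\omega + \delta + \varepsilon y \chi, a) & \text{if } \omega \in E \end{cases} \]
(of course, the sum is understood modulo 1) and $\boldsymbol{\nu}^{\varepsilon,y} := (T_{\varepsilon,y})_\# \boldsymbol{\eta}$.
It is easy to see that $\boldsymbol{\nu}^{\varepsilon,y}$ is a flow with bounded compression, since for all times $\tau$
the curves $\omega(\tau)$ are either left unchanged, or translated by the constant $\delta(\tau) + \varepsilon y \chi(\tau)$,
so that the density produced by $\boldsymbol{\nu}^{\varepsilon,y}$ is at most 2, and equal to 1 outside the interval
$[s, t]$.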
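Therefore, by Theorem 3.6.4 we get
\[ \int_{\mathbb{T}^d} \int_s^t \bar p \, (\rho^{\boldsymbol{\nu}^{\varepsilon,y}} - 1) \, d\tau \, d\mu_T \le \int_E \mathcal{A}_1(\omega + \delta + \varepsilon y \chi) - \mathcal{A}_1(\omega) \, d\boldsymbol{\eta}(\omega, a). \]
Rearranging terms, we get
\[ \int_E \int_s^t \frac12 |\dot\omega|^2 - \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}(\omega, a) \le \int_E \Big[ \int_s^t \frac12 |\dot\omega + \dot\delta + \varepsilon y \dot\chi|^2 - \bar p(\tau, \omega + \delta + \varepsilon y \chi) \, d\tau \Big] d\boldsymbol{\eta}(\omega, a). \]
We can now average the above inequality using the heat kernel $\phi(y) = (2\pi)^{-d/2} e^{-|y|^2/2}$,
and we obtain
\[ \int_E \int_s^t \frac12 |\dot\omega|^2 - \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}(\omega, a)
\le \int_E \Big[ \int_{\mathbb{R}^d} \int_s^t \frac12 |\dot\omega + \dot\delta + \varepsilon y \dot\chi|^2 \, d\tau \, \phi(y) \, dy - \int_s^t p_{\varepsilon\chi(\tau)}(\tau, \omega + \delta) \, d\tau \Big] d\boldsymbol{\eta}(\omega, a). \]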
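Now, let $\mathcal{D} \subset H^1_0\big([s, t]; \mathbb{T}^d\big)$ be a countable dense subset; by the arbitrariness of $E$ and
Remark 3.6.7 we infer the existence of an $\boldsymbol{\eta}$-negligible Borel set $B \subset \tilde\Omega(\mathbb{T}^d)$ such that
$M p(\tau, \omega) \in L^1(s, t)$ and
\[ \int_s^t \frac12 |\dot\omega|^2 - \bar p(\tau, \omega) \, d\tau \le \int_{\mathbb{R}^d} \int_s^t \frac12 |\dot\omega + \dot\delta + \varepsilon y \dot\chi|^2 \, d\tau \, \phi(y) \, dy - \int_s^t p_{\varepsilon\chi(\tau)}(\tau, \omega + \delta) \, d\tau \]
holds for all $\varepsilon = 1/n$, $\delta \in \mathcal{D}$ and $(\omega, a) \in \tilde\Omega(\mathbb{T}^d) \setminus B$. By a density argument, we see that
the same inequality holds for all $\varepsilon = 1/n$, $\delta \in H^1_0\big([s, t]; \mathbb{T}^d\big)$, and $(\omega, a) \in \tilde\Omega(\mathbb{T}^d) \setminus B$.
Now, if $\delta \in H^1_0\big([s, t]; \mathbb{T}^d\big)$ is such that $M p(\tau, \omega + \delta) \in L^1(s, t)$, we can use the bound
$|p_\varepsilon| \le M p$ to pass to the limit as $\varepsilon \downarrow 0$ and obtain that
(3.6.12) holds with $q = \bar p$.
The proof of the global minimality property in the case when $M p \in L^1([0,1] \times \mathbb{T}^d)$
is similar, just letting $\delta$ vary in $H^1_0\big([0,1]; \mathbb{T}^d\big)$ and using a fixed function $\chi \in C^1([0,1])$
with $\chi(0) = \chi(1) = 0$ and $\chi > 0$ in $(0,1)$. $\square$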
In order to state the second necessary optimality condition fulfilled by minimizers,
we need some preliminary definitions. Let $q \in L^1([s, t] \times D)$ and let us define the cost
$c^{s,t}_q : D \times D \to \mathbb{R}$ of the minimal connection in $[s, t]$ between $x$ and $y$, namely
\[ c^{s,t}_q(x, y) := \inf \Big\{ \int_s^t \frac12 |\dot\omega(\tau)|^2 - q(\tau, \omega) \, d\tau : \; \omega(s) = x, \; \omega(t) = y, \; M q(\tau, \omega) \in L^1(s, t) \Big\}, \tag{3.6.13} \]
with the convention $c^{s,t}_q(x, y) = +\infty$ if no admissible curve $\omega$ exists. Using this cost
function $c^{s,t}_q$, we can consider the induced optimal transport problem, namely
\[ W_{c^{s,t}_q}(\mu_1, \mu_2) := \inf \Big\{ \int_{D \times D} c^{s,t}_q(x, y) \, d\lambda(x, y) : \; \lambda \in \Gamma(\mu_1, \mu_2), \; (c^{s,t}_q)^+ \in L^1(\lambda) \Big\}, \tag{3.6.14} \]
where $\Gamma(\mu_1, \mu_2)$ is the family of all probability measures $\lambda$ in $D \times D$ whose first and second
marginals are respectively $\mu_1$ and $\mu_2$. Again, we set by convention $W_{c^{s,t}_q}(\mu_1, \mu_2) = +\infty$
if no admissible $\lambda$ exists.
Unlike most classical situations (see [132]), existence of an optimal $\lambda$ is not guaranteed
because the costs $c^{s,t}_q$ are not lower semicontinuous in $D \times D$, and also it seems difficult to get
lower bounds on $c^{s,t}_q$. The following upper bound on $W_{c^{s,t}_q}$ will be useful, however:
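Lemma 3.6.9. If $M q \in L^1([s, t] \times \mathbb{T}^d)$ there exists a nonnegative $\mu_T$-integrable function
$K^{s,t}_q$ satisfying
\[ c^{s,t}_q(x, y) \le K^{s,t}_q(x) + K^{s,t}_q(y) \qquad \forall x, y \in \mathbb{T}^d. \tag{3.6.15} \]
Remark 3.6.10. By (3.6.15) we deduce that, if $K^{s,t}_q \in L^1(\mu_1 + \mu_2)$, then $(c^{s,t}_q)^+ \in L^1(\lambda)$
for all $\lambda \in \Gamma(\mu_1, \mu_2)$ and we have
\[ \int_{\mathbb{T}^d \times \mathbb{T}^d} c^{s,t}_q(x, y) \, d\lambda(x, y) \le \int_{\mathbb{T}^d} K^{s,t}_q(w) \, d(\mu_1 + \mu_2)(w) \qquad \forall \lambda \in \Gamma(\mu_1, \mu_2). \]
In particular, $W_{c^{s,t}_q}(\mu_1, \mu_2)$ as defined in (3.6.14) is not equal to $+\infty$.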
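Proof. Assume $s = 0$ and let $l = t/2$. Let us fix $x, y \in \mathbb{T}^d$; given $z \in \mathbb{T}^d$ we consider
the projection on $\mathbb{T}^d$ of the Euclidean path
\[ \omega_z(\tau) := \begin{cases} x + \frac{\tau}{l}(z - x) & \text{if } \tau \in [0, l]; \\ z + \frac{\tau - l}{l}(y - z) & \text{if } \tau \in [l, t]. \end{cases} \]
This path leads to the estimate
\[ c^{0,t}_q(x, y) \le \frac{d_T^2(x, z) + d_T^2(z, y)}{2l} + \int_0^l M q\big(\tau, x + \tfrac{\tau}{l}(z - x)\big) \, d\tau + \int_l^t M q\big(\tau, z + \tfrac{\tau - l}{l}(y - z)\big) \, d\tau. \]
By integrating the free variable $z$ with respect to $\mu_T$, since $d_T \le \frac{\sqrt{d}}{2}$ on $\mathbb{T}^d \times \mathbb{T}^d$, we get
\[ c^{0,t}_q(x, y) \le \frac{d}{4l} + \int_{\mathbb{T}^d} \int_0^l M q\big(\tau, x + \tfrac{\tau}{l}(z - x)\big) + M q\big(l + \tau, z + \tfrac{\tau}{l}(y - z)\big) \, d\tau \, d\mu_T(z). \]
Therefore, the function
\[ K^{0,t}_q(w) := \frac{d}{4l} + \int_{\mathbb{T}^d} \int_0^l M q\big(\tau, w + \tfrac{\tau}{l}(z - w)\big) + M q\big(l + \tau, z + \tfrac{\tau}{l}(w - z)\big) \, d\tau \, d\mu_T(z) \tag{3.6.16} \]
fulfils (3.6.15). It is easy to check, using Fubini's theorem, that $K^{0,t}_q$ is $\mu_T$-integrable in
$\mathbb{T}^d$. Indeed,
\begin{align*}
\int_{\mathbb{T}^d} K^{0,t}_q(w) \, d\mu_T(w)
 &= \frac{d}{4l} + \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} \int_0^l M q\big(\tau, w + \tfrac{\tau}{l}(z - w)\big) \, d\tau \, d\mu_T(z) \, d\mu_T(w) \\
 &\quad + \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} \int_0^l M q\big(l + \tau, z + \tfrac{\tau}{l}(w - z)\big) \, d\tau \, d\mu_T(w) \, d\mu_T(z) \\
 &= \frac{d}{4l} + \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} \int_0^l M q\big(\tau, w + \tfrac{\tau}{l} y\big) \, d\tau \, d\mu_T(y) \, d\mu_T(w) \\
 &\quad + \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} \int_0^l M q\big(l + \tau, z + \tfrac{\tau}{l} y\big) \, d\tau \, d\mu_T(z) \, d\mu_T(y) \\
 &= \frac{d}{4l} + \int_0^l \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} M q\big(\tau, w + \tfrac{\tau}{l} y\big) + M q\big(l + \tau, w + \tfrac{\tau}{l} y\big) \, d\mu_T(w) \, d\mu_T(y) \, d\tau \\
 &= \frac{d}{4l} + \int_0^t \int_{\mathbb{T}^d} M q(\tau, w) \, d\mu_T(w) \, d\tau < +\infty.
\end{align*}
$\square$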
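The constant $d/(4l)$ in the proof comes from the diameter bound $d_T \le \sqrt{d}/2$ on the flat torus together with the kinetic cost $(d_T^2(x,z) + d_T^2(z,y))/(2l)$ of the broken path through $z$. The following sketch checks both numerically on random points; the helper names `torus_distance` and `two_leg_kinetic_cost`, and the values of `d` and `l`, are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

def torus_distance(x, y):
    """Distance modulo 1 between two points of the flat torus R^d / Z^d."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) % 1.0
    diff = np.minimum(diff, 1.0 - diff)       # shortest way around each circle
    return float(np.sqrt(np.sum(diff ** 2)))

def two_leg_kinetic_cost(x, z, y, l):
    """Kinetic action of the broken path x -> z -> y, each leg run at constant
    speed in time l: the integral of |velocity|^2 / 2 over both legs."""
    return (torus_distance(x, z) ** 2 + torus_distance(z, y) ** 2) / (2.0 * l)

d, l = 3, 0.25                                # illustrative dimension and half-time
rng = np.random.default_rng(0)
triples = rng.random((200, 3, d))             # 200 random triples (x, z, y)
max_dist = max(torus_distance(p[0], p[1]) for p in triples)
max_cost = max(two_leg_kinetic_cost(p[0], p[1], p[2], l) for p in triples)
```

The diameter is attained between a point and its antipode, e.g. the origin and $(1/2, \dots, 1/2)$, which is why the bound $d/(4l)$ cannot be improved without using more information on $x$, $y$, $z$.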
In the proof of the next theorem we are going to use the measurable selection theorem
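(see [43, Theorems III.22 and III.23]): if $(A, \mathcal{A}, \nu)$ is a measure space, $X$ is a Polish space
and $E \subset A \times X$ is $\mathcal{A}_\nu \otimes \mathcal{B}(X)$-measurable, where $\mathcal{A}_\nu$ is the $\nu$-completion of $\mathcal{A}$, then:
(i) the projection $\pi_A(E)$ of $E$ on $A$ is $\mathcal{A}_\nu$-measurable;
(ii) there exists an $(\mathcal{A}_\nu, \mathcal{B}(X))$-measurable map $\sigma : \pi_A(E) \to X$ such that $(x, \sigma(x)) \in E$
for $\nu$-a.e. $x \in \pi_A(E)$.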
The next theorem will provide a new necessary optimality condition involving not
only the path that should be followed between $x$ and $y$ (which, as we proved, should
minimize the Lagrangian $L_{\bar p}$ in (3.1.8)), but also the "weights" given to the paths. We
observe that, when a variation of these weights is performed, new flows $\tilde{\boldsymbol{\eta}}$ between $\eta$
and $\gamma$ are built which need not be of bounded compression, and for which $(e_t)_\# \tilde{\boldsymbol{\eta}}$ might
even be singular with respect to $\mu_T$; therefore we cannot use them directly in the variational
principle (3.6.8). However, this difficulty can be overcome by the smoothing procedure
in Remark 3.6.5.
Theorem 3.6.11 (Second necessary condition). Let $\boldsymbol{\eta} = \boldsymbol{\eta}_a \otimes \mu_T$ be an optimal
incompressible flow on $\mathbb{T}^d$ between $\eta$ and $\gamma$. Then, for all intervals $[s, t] \subset (0,1)$,
$W_{c^{s,t}_{\bar p}}(\eta_a, \gamma_a) \in \mathbb{R}$ and the plan $(e_s, e_t)_\# \boldsymbol{\eta}_a$ is optimal, relative to the cost $c^{s,t}_{\bar p}$ defined
in (3.6.13), for $\mu_T$-a.e. $a$.
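Proof. Let $[s, t] \subset (0,1)$ be fixed. Since
\[ \int_{\mathbb{T}^d} \int_{\mathbb{T}^d \times \mathbb{T}^d} c^{s,t}_{\bar p}(x, y) \, d(e_s, e_t)_\# \boldsymbol{\eta}_a \, d\mu_T(a) \le \int_{\mathbb{T}^d} \int_{\Omega(\mathbb{T}^d)} \int_s^t \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}_a(\omega) \, d\mu_T(a) = (t - s) \overline{\delta}^2(\eta, \gamma), \tag{3.6.17} \]
it suffices to show that
\[ (t - s) \overline{\delta}^2(\eta, \gamma) \le \int_{\mathbb{T}^d} W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \, d\mu_T(a). \tag{3.6.18} \]
We are going to prove this fact by a smoothing argument. We set $\eta^s = \eta^s_a \otimes \mu_T$, $\gamma^t =
\gamma^t_a \otimes \mu_T$, with $\eta^s_a = (e_s)_\# \boldsymbol{\eta}_a$, $\gamma^t_a = (e_t)_\# \boldsymbol{\eta}_a$. Recall that Remark 3.3.1 gives
\[ \overline{\delta}(\eta, \eta^s) = s \, \overline{\delta}(\eta, \gamma), \qquad \overline{\delta}(\gamma^t, \gamma) = (1 - t) \, \overline{\delta}(\eta, \gamma). \]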
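First, we notice that Lemma 3.6.9 gives
\[ \int_{\mathbb{T}^d} W_{c^{s,t}_{-|\bar p|}}(\eta^s_a, \gamma^t_a) \, d\mu_T(a) \le \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} K^{s,t}_{-|\bar p|}(w) \, d(\eta^s_a + \gamma^t_a)(w) \, d\mu_T(a) = 2 \int_{\mathbb{T}^d} K^{s,t}_{-|\bar p|}(w) \, d\mu_T(w) < +\infty. \tag{3.6.19} \]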
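We also remark that, since $\tau \mapsto \|p_\varepsilon(\tau, \cdot)\|_\infty$ is integrable in $(s, t)$, for any $\varepsilon > 0$ the cost
$c^{s,t}_{p_\varepsilon}$ is bounded both from above and below. Next, we show that
\[ c^{s,t}_{\bar p}(x, y) \ge \limsup_{\varepsilon \downarrow 0} c^{s,t}_{p_\varepsilon}(x, y) \qquad \forall (x, y) \in \mathbb{T}^d \times \mathbb{T}^d. \tag{3.6.20} \]
Indeed, let $\omega \in H^1\big([s, t]; \mathbb{T}^d\big)$ with $\omega(s) = x$, $\omega(t) = y$ and $M p(\tau, \omega) \in L^1(s, t)$ (if there is
no such $\omega$, there is nothing to prove). By the pointwise bound $|p_\varepsilon| \le M p$ and Lebesgue's
theorem, we get
\[ \int_s^t \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau = \lim_{\varepsilon \downarrow 0} \int_s^t \frac12 |\dot\omega(\tau)|^2 - p_\varepsilon(\tau, \omega) \, d\tau. \]
By the $L^1(L^\infty)$ bound on $M p_\varepsilon$, the curve $\omega$ is admissible also for the variational problem
defining $c^{s,t}_{p_\varepsilon}$, therefore the above limit provides an upper bound on $\limsup_\varepsilon c^{s,t}_{p_\varepsilon}(x, y)$. By
minimizing with respect to $\omega$ we obtain (3.6.20).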
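By (3.6.19) and the pointwise bound $\bar p \ge -|\bar p|$ we infer that the positive part of
$W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a)$ is $\mu_T$-integrable. Let now $\delta > 0$ be fixed, and let us consider the compact
space $X := \mathcal{P}(\mathbb{T}^d \times \mathbb{T}^d)$ and the $\mathcal{B}(\mathbb{T}^d)_{\mu_T} \otimes \mathcal{B}(X)$-measurable set
\[ E := \Big\{ (a, \lambda) \in \mathbb{T}^d \times X : \; \lambda \in \Gamma(\eta^s_a, \gamma^t_a), \; \int_{\mathbb{T}^d \times \mathbb{T}^d} c^{s,t}_{\bar p}(x, y) \, d\lambda < \delta + \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \Big\} \]
(we skip the proof of the measurability, that is based on tedious but routine arguments).
Since $W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) < +\infty$ for $\mu_T$-a.e. $a$, we obtain that for $\mu_T$-a.e. $a \in \mathbb{T}^d$ there exists
$\lambda \in \Gamma(\eta^s_a, \gamma^t_a)$ with $(a, \lambda) \in E$. Thanks to the measurable selection theorem we can select
a Borel family $a \mapsto \lambda_a \in \mathcal{P}(\mathbb{T}^d \times \mathbb{T}^d)$ such that $\lambda_a \in \Gamma(\eta^s_a, \gamma^t_a)$ and
\[ \int_{\mathbb{T}^d \times \mathbb{T}^d} c^{s,t}_{\bar p}(x, y) \, d\lambda_a < \delta + \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \qquad \text{for } \mu_T\text{-a.e. } a \in \mathbb{T}^d. \]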
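By Lemma 3.6.9 and Remark 3.6.10 we get
\[ c^{s,t}_{p_\varepsilon}(x, y) \le K^{s,t}_{p_\varepsilon}(x) + K^{s,t}_{p_\varepsilon}(y) \le K^{s,t}_p(x) + K^{s,t}_p(y) \qquad \forall x, y \in \mathbb{T}^d \]
and
\[ \int_{\mathbb{T}^d} \int_{\mathbb{T}^d \times \mathbb{T}^d} K^{s,t}_p(x) + K^{s,t}_p(y) \, d\lambda_a \, d\mu_T(a) = \int_{\mathbb{T}^d} \int_{\mathbb{T}^d} K^{s,t}_p \, d(\eta^s_a + \gamma^t_a) \, d\mu_T(a) < +\infty \]
(we used the pointwise bound $M p_\varepsilon \le M p$ and the fact that $q \mapsto K^{s,t}_q$ has a monotone
dependence upon $M q$, see (3.6.16)). Therefore (3.6.20) and Fatou's lemma give
\[ \delta + \int_{\mathbb{T}^d} \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \, d\mu_T(a) \ge \limsup_{\varepsilon \downarrow 0} \int_{\mathbb{T}^d} \int_{\mathbb{T}^d \times \mathbb{T}^d} c^{s,t}_{p_\varepsilon}(x, y) \, d\lambda_a \, d\mu_T(a). \tag{3.6.21} \]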
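Still thanks to the measurable selection theorem, we can find a Borel map $(x, y, a) \mapsto
\omega^{x,y}_{a,\varepsilon} \in C\big([s, t]; \mathbb{T}^d\big)$ with $\omega^{x,y}_{a,\varepsilon}(s) = x$, $\omega^{x,y}_{a,\varepsilon}(t) = y$, $M p_\varepsilon(\tau, \omega^{x,y}_{a,\varepsilon}) \in L^1(s, t)$ and
\[ \int_s^t \frac12 |\dot\omega^{x,y}_{a,\varepsilon}|^2 - p_\varepsilon(\tau, \omega^{x,y}_{a,\varepsilon}) \, d\tau < \delta + c^{s,t}_{p_\varepsilon}(x, y) \qquad \text{for } \lambda_a \otimes \mu_T\text{-a.e. } (x, y, a). \]
Let $\boldsymbol{\lambda}^\varepsilon = \boldsymbol{\lambda}^\varepsilon_a \otimes \mu_T$ be the push-forward, under the map $(x, y, a) \mapsto \omega^{x,y}_{a,\varepsilon}$, of the measure
$\lambda_a \otimes \mu_T$; by construction this measure fulfils $(e_s)_\# \boldsymbol{\lambda}^\varepsilon_a = \eta^s_a$, $(e_t)_\# \boldsymbol{\lambda}^\varepsilon_a = \gamma^t_a$ (because the
marginals of $\lambda_a$ are $\eta^s_a$ and $\gamma^t_a$), therefore it connects $\eta^s$ to $\gamma^t$ in $[s, t]$. Then, from (3.6.21)
we get
\[ 2\delta + \int_{\mathbb{T}^d} \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \, d\mu_T(a) \ge \limsup_{\varepsilon \downarrow 0} \int_{C([s,t]; \mathbb{T}^d) \times \mathbb{T}^d} \int_s^t \frac12 |\dot\omega(\tau)|^2 - p_\varepsilon(\tau, \omega) \, d\tau \, d\boldsymbol{\lambda}^\varepsilon(\omega, a). \]
Finally, Remark 3.6.5 provides us with a flow with bounded compression $\hat{\boldsymbol{\lambda}}^\varepsilon$ connecting
$\eta^{s,\varepsilon}$ to $\gamma^{t,\varepsilon}$ in $[s, t]$ with
\[ 2\delta + \int_{\mathbb{T}^d} \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \, d\mu_T(a) \ge \limsup_{\varepsilon \downarrow 0} \int_{C([s,t]; \mathbb{T}^d) \times \mathbb{T}^d} \int_s^t \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau \, d\hat{\boldsymbol{\lambda}}^\varepsilon(\omega, a). \tag{3.6.22} \]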
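Since $\eta^{s,\varepsilon} \to \eta^s$ and $\gamma^{t,\varepsilon} \to \gamma^t$ in $(\Gamma(\mathbb{T}^d), \overline{\delta})$, we can find (by scaling $\boldsymbol{\eta}$ from $[0, s]$ to
$[0, s_\varepsilon]$ and from $[t, 1]$ to $[t_\varepsilon, 1]$, and using repeatedly the concatenation, see Remark 3.3.2)
generalized flows $\boldsymbol{\nu}^\varepsilon$ between $\eta$ and $\gamma$ in $[0,1]$, $s_\varepsilon \uparrow s$, $t_\varepsilon \downarrow t$ satisfying:
(a) $\boldsymbol{\nu}^\varepsilon$ connects $\eta$ to $\eta^s$ in $[0, s_\varepsilon]$, $\eta^s$ to $\eta^{s,\varepsilon}$ in $[s_\varepsilon, s]$, $\gamma^{t,\varepsilon}$ to $\gamma^t$ in $[t, t_\varepsilon]$, $\gamma^t$ to $\gamma$ in $[t_\varepsilon, 1]$
and is incompressible in all these time intervals;
(b) the restriction of $\boldsymbol{\nu}^\varepsilon$ to $[s, t]$ coincides with $\hat{\boldsymbol{\lambda}}^\varepsilon$;
(c) the action of $\boldsymbol{\nu}^\varepsilon$ in $[0, s]$ converges to $\overline{\delta}^2(\eta, \eta^s) = s^2 \, \overline{\delta}^2(\eta, \gamma)$, and the action of $\boldsymbol{\nu}^\varepsilon$ in
$[t, 1]$ converges to $\overline{\delta}^2(\gamma^t, \gamma) = (1 - t)^2 \, \overline{\delta}^2(\eta, \gamma)$.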
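Since $\boldsymbol{\nu}^\varepsilon$ is a flow with bounded compression connecting $\eta$ to $\gamma$ we use (3.6.8) and the
incompressibility in $[0,1] \setminus [s, t]$ to obtain
\[ \int_{\tilde\Omega(\mathbb{T}^d)} \int_0^1 \frac12 |\dot\omega(\tau)|^2 \, d\tau \, d\boldsymbol{\nu}^\varepsilon(\omega, a) - \int_{\tilde\Omega(\mathbb{T}^d)} \int_s^t \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\nu}^\varepsilon(\omega, a) \ge \overline{\delta}^2(\eta, \gamma) \tag{3.6.23} \]
for all $\varepsilon > 0$. Taking into account that (b) and (c) imply
\[ \int_{\tilde\Omega(\mathbb{T}^d)} \int_0^s \frac12 |\dot\omega(\tau)|^2 \, d\tau \, d\boldsymbol{\nu}^\varepsilon(\omega, a) \to s \, \overline{\delta}^2(\eta, \gamma) \]
and
\[ \int_{\tilde\Omega(\mathbb{T}^d)} \int_t^1 \frac12 |\dot\omega(\tau)|^2 \, d\tau \, d\boldsymbol{\nu}^\varepsilon(\omega, a) \to (1 - t) \, \overline{\delta}^2(\eta, \gamma), \]
from (3.6.22) and (3.6.23) we get
\[ 2\delta + \int_{\mathbb{T}^d} \Big( W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \vee -\frac{1}{\delta} \Big) \, d\mu_T(a) \ge (1 - s - (1 - t)) \, \overline{\delta}^2(\eta, \gamma) = (t - s) \, \overline{\delta}^2(\eta, \gamma). \tag{3.6.24} \]
Letting $\delta \downarrow 0$ we obtain the $\mu_T$-integrability of $W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a)$ and (3.6.18). $\square$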
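A byproduct of the above proof is that equalities hold in (3.6.17), (3.6.18), and
therefore
\begin{align*}
& \int_{\mathbb{T}^d} \int_{\Omega(\mathbb{T}^d)} \Big( \int_s^t \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau - c^{s,t}_{\bar p}(\omega(s), \omega(t)) \Big) \, d\boldsymbol{\eta}_a(\omega) \, d\mu_T(a) \tag{3.6.25} \\
& \quad = \int_{\mathbb{T}^d} \int_{\Omega(\mathbb{T}^d)} \int_s^t \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}_a(\omega) \, d\mu_T(a) - \int_{\mathbb{T}^d} W_{c^{s,t}_{\bar p}}(\eta^s_a, \gamma^t_a) \, d\mu_T(a) = 0.
\end{align*}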
This yields in particular also the first optimality condition. However, as the proof of
Theorem 3.6.11 is much more technical than the one presented in Theorem 3.6.8, we
decided to present both.
Now we show that the optimality conditions in Theorems 3.6.8 and 3.6.11 are also
sufficient, even in the case of a general compact manifold without boundary D.
Theorem 3.6.12 (Sufficient condition). Assume that $\boldsymbol{\eta} = \boldsymbol{\eta}_a \otimes \mu_D$ is a generalized
incompressible flow in $D$ between $\eta$ and $\gamma$, and assume that for some map $q$ the following
properties hold:
(a) $M q \in L^1((0,1) \times D)$ and $\boldsymbol{\eta}$ is concentrated on $q$-minimizing paths;
(b) the plan $(e_0, e_1)_\# \boldsymbol{\eta}_a$ is optimal, relative to the cost $c^{0,1}_q$ defined in (3.6.13), for
$\mu_D$-a.e. $a$.
Then $\boldsymbol{\eta}$ is optimal and $q$ is the pressure field. In addition, if (a), (b) are replaced by
(a') $M q \in L^1_{loc}((0,1) \times D)$ and $\boldsymbol{\eta}$ is concentrated on locally $q$-minimizing paths;
(b') for all intervals $[s, t] \subset (0,1)$, the plan $(e_s, e_t)_\# \boldsymbol{\eta}_a$ is optimal, relative to the cost
$c^{s,t}_q$ defined in (3.6.13), for $\mu_D$-a.e. $a$,
the same conclusions hold.

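Proof. Assume first that (a) and (b) hold, and assume without loss of generality
that $\int_D q(t, \cdot) \, d\mu_D = 0$ for almost all $t \in (0,1)$. Recalling that, thanks to the global
integrability of $M q$, any generalized incompressible flow $\boldsymbol{\nu} = \boldsymbol{\nu}_a \otimes \mu_D$ between $\eta$ and $\gamma$
is concentrated on curves $\omega$ with $M q(\tau, \omega) \in L^1(0,1)$ (see Remark 3.6.7), we have
\begin{align*}
\mathcal{A}_1(\boldsymbol{\nu}) &= \int_D \int_{\Omega(D)} \int_0^1 \frac12 |\dot\omega|^2 - q(\tau, \omega) \, d\tau \, d\boldsymbol{\nu}_a(\omega) \, d\mu_D(a) \tag{3.6.26} \\
 &\ge \int_D \int_{D \times D} c^{0,1}_q(x, y) \, d(e_0, e_1)_\# \boldsymbol{\nu}_a \, d\mu_D(a) \ge \int_D W_{c^{0,1}_q}(\eta_a, \gamma_a) \, d\mu_D(a).
\end{align*}
When $\boldsymbol{\nu} = \boldsymbol{\eta}$ the first inequality is an equality, because $\boldsymbol{\eta}$ is concentrated on $q$-minimizing
paths, as well as the second inequality, because of the optimality of the plan $(e_0, e_1)_\# \boldsymbol{\eta}_a$.
This proves that $\boldsymbol{\eta}$ is optimal. Moreover, by using the inequality in (3.6.26) with a flow
$\boldsymbol{\nu}$ with bounded compression, one obtains
\[ \mathcal{A}_1(\boldsymbol{\nu}) \ge \mathcal{A}_1(\boldsymbol{\eta}) + \langle q, \rho^{\boldsymbol{\nu}} - 1 \rangle. \]
Considering almost incompressible flows $\boldsymbol{\nu}$ arising by a smooth perturbation of $\boldsymbol{\eta}$ as
described at the beginning of this section (see (3.6.1) in particular), the same argument
used to obtain (3.6.6) gives that $q$ satisfies (3.6.6), so that $q$ is the pressure field.
In the case when (a') and (b') hold, by localizing in all intervals $[s, t] \subset (0,1)$ the
previous argument (see Remark 3.3.2), one obtains that
\[ (t - s) \int_{\tilde\Omega(D)} \int_s^t \frac12 |\dot\omega|^2 \, d\tau \, d\boldsymbol{\eta}(\omega, a) = \overline{\delta}^2(\gamma_s, \gamma_t), \]
where $\gamma_s = (e_s, \pi_D)_\# \boldsymbol{\eta}$ and $\gamma_t = (e_t, \pi_D)_\# \boldsymbol{\eta}$. Letting $s \downarrow 0$ and $t \uparrow 1$ we obtain the
optimality of $\boldsymbol{\eta}$. $\square$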
A byproduct of the previous result is a new variational principle satisfied, at least
locally in time, by the pressure field. Up to a restriction to a smaller time interval we
shall assume that $M p \in L^1\big([0,1] \times \mathbb{T}^d\big)$.
Corollary 3.6.13 (Variational characterization of the pressure). Let $\eta, \gamma \in \Gamma(\mathbb{T}^d)$
and let $p$ be the unique pressure field induced by the constant speed geodesics in $[0,1]$
between $\eta = \eta_a \otimes \mu_T$ and $\gamma = \gamma_a \otimes \mu_T$. Assume that $M p \in L^1\big([0,1] \times \mathbb{T}^d\big)$ and, with no
loss of generality, $\int_{\mathbb{T}^d} p(t, \cdot) \, d\mu_T = 0$. Then $\bar p$ maximizes the functional
\[ q \mapsto \Psi(q) := \int_{\mathbb{T}^d} W_{c^{0,1}_q}(\eta_a, \gamma_a) \, d\mu_T(a) + \int_0^1 \int_{\mathbb{T}^d} q(\tau, x) \, d\mu_T(x) \, d\tau \]
among all functions $q : [0,1] \times \mathbb{T}^d \to \mathbb{R}$ with $M q \in L^1([0,1] \times \mathbb{T}^d)$.
Proof. We first remark that the functional $\Psi$ is invariant under sum of functions
depending on $t$ only, so we can assume that the spatial means of any function $q$ vanish.
From (3.6.25) we obtain that
\[ \int_{\mathbb{T}^d} W_{c^{0,1}_{\bar p}}(\eta_a, \gamma_a) \, d\mu_T(a) = \int_{\mathbb{T}^d} \int_{\Omega(\mathbb{T}^d)} \int_0^1 \frac12 |\dot\omega(\tau)|^2 - \bar p(\tau, \omega) \, d\tau \, d\boldsymbol{\eta}_a(\omega) \, d\mu_T(a). \]
By the incompressibility constraint, in the right hand side $\bar p$ can be replaced by any
function $q$ whose spatial means vanish and, if $M q \in L^1\big([0,1] \times \mathbb{T}^d\big)$, the resulting integral
bounds from above $\int_{\mathbb{T}^d} W_{c^{0,1}_q}(\eta_a, \gamma_a) \, d\mu_T(a)$, as we proved in (3.6.26). $\square$
3.7 Regularity of the pressure field

In this last section, using the Eulerian-Lagrangian formulation introduced by Brenier in
[35], we want to improve his regularity result to deduce that the pressure field is a locally
integrable function.
We therefore consider the family of distributional solutions $c_{t,a}$, indexed by $a \in D$,
of the continuity equation (3.3.13) with the initial and final conditions (3.3.14), and we
minimize the action
\[ \int_0^T \int_{D \times D} \frac12 |v|^2(t, x, a) \, dc(t, x, a), \]
under the global constraint given by the incompressibility of the flow (3.3.15). By what
we have already proved, the existence of minimizing pairs $(c, v)$ with finite action holds
when, for instance, $D = [0,1]^d$ or $D = \mathbb{T}^d$ is the flat $d$-dimensional torus (see Section
3.3.2). Moreover minimizing pairs $(c, v)$ satisfy the following two properties:
(a) (Constancy of kinetic energy) The map $t \mapsto \int |v|^2(t, x, a) \, dc_t$ coincides a.e. in
$(0, T)$ with a constant ($2T^{-1}$ times the minimal action);
(b) (Weak solution to Euler's equations) There exists a distribution $p$ in $(0, T) \times D$
satisfying
\[ \nabla p = -\partial_t \Big( \int_D v(t, x, a) \, dc_{t,x}(a) \Big) - \operatorname{div} \Big( \int_D v(t, x, a) \otimes v(t, x, a) \, dc_{t,x}(a) \Big), \]
in the sense of distributions.
In this section we refine a little bit the deep analysis made in [35] of the regularity of
the gradient of the pressure field: Brenier proved that the distributions ∂xi p are locally
finite measures in (0, T ) × D, but this information is not sufficient (due to a lack of
time regularity) to imply that $p$ is a function. As shown in Corollary 3.7.4, a sufficient
condition, that gives also $p \in L^2_{loc}\big((0, T); L^{d/(d-1)}_{loc}(D)\big)$, is that
\[ \partial_{x_i} p \in L^2_{loc}\big((0, T); \mathcal{M}_{loc}(D)\big), \qquad i = 1, \ldots, d. \]
The proof of this regularity property is the main scope of this section. The fact that p is
a function at least in some L1loc (Lrloc ) space, for some r > 1, plays an important role in the
analysis, developed in Section 3.6, of the necessary and sufficient optimality conditions
for action-minimizing curves in $\Gamma(D)$. Indeed, these conditions involve the Lagrangian
\[ L_p(\gamma) := \int \frac12 |\dot\gamma(t)|^2 - p(t, \gamma(t)) \, dt, \]
the (locally) minimizing curves for $L_p$ and the value function induced by $L_p$, and none
of these objects makes sense if $p$ is only a measure in the time variable.
From now on, we fix a minimizing pair $(c, v)$, and we shall denote by
\[ A^* := \frac12 \int_0^T \int_{D \times D} |v|^2(t, x, a) \, dc(t, x, a) = T \int_{D \times D} \frac12 |v|^2(t, x, a) \, dc_t(x, a) \]
its action (the last equality follows from the property (a) stated above). To simplify
our notation we just denote by $\int$ the integration on the whole space $(0, T) \times D \times D$,
whenever no ambiguity arises. We shall also assume that either $D$ is the closure of a
bounded Lipschitz domain in $\mathbb{R}^d$, or that $D = \mathbb{T}^d$ is the $d$-dimensional flat torus, and
denote by $\int dx$ the integration with respect to $\mu_D$.
3.7.1 A difference quotients estimate

In order to proceed to the proof, we recall an approximation of the pressure field obtained
in [35] through a dual formulation. The arguments in [35] extend with no change to the
more general model described in the introduction, where an initial and final measure-
preserving plan (instead of i and a measure-preserving map f ) are considered.
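Let us consider the Banach space $E := C^0(\tilde Q) \times [C^0(\tilde Q)]^d$, where $\tilde Q := [0, T] \times D \times D$,
and let us define the convex functions $\alpha : E \to (-\infty, \infty]$ and $\beta : E \to (-\infty, \infty]$ given by
\[ \alpha(F, \Phi) := \begin{cases} 0 & \text{if } F + \frac12 |\Phi|^2 \le 0, \\ +\infty & \text{otherwise,} \end{cases} \]
\[ \beta(F, \Phi) := \begin{cases} \langle c, F \rangle + \langle vc, \Phi \rangle & \text{if } F = -\partial_t \phi - p, \; \Phi = -\nabla_x \phi \\ & \text{for some } \phi \in C^0(\tilde Q) \text{ and } p \in C^0([0, T] \times D), \\ +\infty & \text{otherwise,} \end{cases} \]
where $(c, v)$ is the fixed minimizing pair. By the Fenchel-Rockafellar duality Theorem,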
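Brenier proved in [35, Section 3.2] that
\[ \sup_{(F, \Phi) \in E} \{ -\alpha(-F, -\Phi) - \beta(F, \Phi) \} = \inf_{(\tilde c, \tilde v \tilde c) \in E^*} \{ \alpha^*(\tilde c, \tilde v \tilde c) + \beta^*(\tilde c, \tilde v \tilde c) \}, \]
where $\alpha^*$ and $\beta^*$ denote the Legendre-Fenchel transforms of $\alpha$ and $\beta$ respectively.
Writing explicitly the minimization problem appearing in the right hand side, one exactly
recovers the minimization of the action $\frac12 \int |v|^2 \, dc$, coupled with the endpoint and
incompressibility constraints (3.3.14) and (3.3.15). Indeed
\[ \alpha^*(\tilde c, \tilde v \tilde c) = \frac12 \langle |\tilde v|^2, \tilde c \rangle = \frac12 \int |\tilde v|^2 \, d\tilde c, \]
and
\[ \beta^*(\tilde c, \tilde v \tilde c) := \begin{cases} 0 & \text{if } \langle c - \tilde c, \partial_t \phi + p \rangle + \langle vc - \tilde v \tilde c, \nabla_x \phi \rangle = 0 \quad \forall p, \phi, \\ +\infty & \text{otherwise.} \end{cases} \]
Thus it is simple to check that $\beta^*(\tilde c, \tilde v \tilde c) = 0$ if and only if the two constraints (3.3.14)
and (3.3.15) are satisfied.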
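One therefore deduces that the minimum of the action coincides with the dual problem
$\sup_{(F, \Phi) \in E} \{ -\alpha(-F, -\Phi) - \beta(F, \Phi) \}$, which more concretely can be written as
\[ \sup_{p, \phi} \; \langle c, \partial_t \phi + p \rangle + \langle vc, \nabla_x \phi \rangle, \]
with
\[ \partial_t \phi + \frac12 |\nabla_x \phi|^2 + p \le 0. \]
Thus, the duality tells us that, for any $\varepsilon > 0$, there exist $p_\varepsilon(t, x)$ and $\phi_\varepsilon(t, x, a)$ satisfying
\[ \partial_t \phi_\varepsilon + \frac12 |\nabla_x \phi_\varepsilon|^2 + p_\varepsilon \le 0 \]
and
\[ \frac12 \langle |v|^2, c \rangle \le \langle c, \partial_t \phi_\varepsilon + p_\varepsilon \rangle + \langle vc, \nabla_x \phi_\varepsilon \rangle + \varepsilon^2. \]
As shown in [35, Section 3.2], from this one deduces the estimate
\[ \frac12 \int |v - \nabla_x \phi_\varepsilon|^2 \, dc \le \varepsilon^2. \tag{3.7.1} \]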
We remark that, up to adding to $\phi_\varepsilon$ a function of time, one can always assume $\int_D p_\varepsilon(t, x) \, dx =
0$ for all $t \in [0, T]$. As shown in [35, Section 3.4], the family $p_\varepsilon$ is compact in the sense of
distributions, so that there exists a cluster point $p$. Moreover, since any limit point $p$ of
$p_\varepsilon$ is seen to satisfy (3.6.7) in the sense of distributions for any minimizing pair $(c, v)$, $\nabla p$
is uniquely determined, and this enforces the convergence of the whole family $(\nabla p_\varepsilon)_{\varepsilon > 0}$
to $\nabla p$ in the sense of distributions.
Let us now prove the following regularity result on ∇x φε : we present a proof slightly
different from the one in [35].
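Proposition 3.7.1. Let $\tau \in (0, T)$, let $w : D \to \mathbb{R}^d$ be a smooth divergence-free vector
field parallel to $\partial D$ and let $e^{sw}(x)$ be the measure-preserving flow in $D$ generated by $w$.
Then, for $\eta < \tau$ we have
\[ \int_\tau^{T - \tau} \int_{D \times D} \big| \nabla_x \phi_\varepsilon(t + \eta, e^{\delta w}(x), a) - \nabla_x \phi_\varepsilon(t, x, a) \big|^2 \, dc \le L(\varepsilon^2 + \eta^2 + \delta^2), \tag{3.7.2} \]
with $L$ depending only on $\tau$, $w$, $T$ and $A^*$.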

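Proof. In the sequel we fix a cut-off function $\zeta : [0, T] \to [0, 1]$ identically equal to 1 on
$[\tau, T - \tau]$. We recall the following estimate (Proposition 3.1 in [35]), which follows by
the "quasi optimality" of $(p_\varepsilon, \phi_\varepsilon)$ in the dual problem:
\[ \frac12 \int \big| (\partial_t + v^\eta \cdot \nabla_x) e^{\delta\zeta w} - \nabla_x \phi_\varepsilon \circ e^{\delta\zeta w} \big|^2 \, dc^\eta \le \varepsilon^2 + \frac12 \int \big| (\partial_t + v^\eta \cdot \nabla_x) e^{\delta\zeta w} \big|^2 \, dc^\eta - \frac12 \int |v|^2 \, dc \tag{3.7.3} \]
(here $e^{\delta\zeta w}(x)$ is the flow generated by $w$ starting from $x$, at time $\delta\zeta$), where $(v^\eta, c^\eta)$ is the
"reparameterization" of $(v, c)$ given by
\[ c^\eta = c^\eta(t) \, dt = c_{t + \eta\zeta(t)} \, dt, \qquad v^\eta(t, x, a) = (1 + \eta\zeta'(t)) \, v(t + \eta\zeta(t), x, a). \]
The minimality of $(v, c)$ gives $\int |v^\eta|^2 \, dc^\eta \ge \int |v|^2 \, dc$, and the constancy of $t \mapsto \int |v|^2(t, x, a) \, dc_t$
gives
\[ \int |v^\eta|^2 \, dc^\eta - \int |v|^2 \, dc = \int (\eta^2 (\zeta')^2 + 2\eta\zeta') \, |v|^2 \, dc \le C\eta^2, \tag{3.7.4} \]
with $C$ depending on $T$, $A^*$, and $\zeta$.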

Since c is a weak solution to the incompressible Euler equations and w is divergence-
free, we have Z
v · (∂t + v · ∇x )(ζw) dc = 0.
As a consequence, performing a change of variable in time, it is simple to check that

Z
v η · (∂t + v η · ∇x )(ζw) dcη = O(η). (3.7.5)
If we now add and subtract v η , we can rewrite (3.7.3) as

Z
¯ ¯
¯(∂t + v η · ∇x )(eδζw (x) − x) + (v η − ∇x φε ◦ eδζw )¯2 dcη
Z Z
¯ ¯
η ¯2
2
≤ 2ε + ¯ η δζw
(∂t + v · ∇x )(e (x) − x) + v dc − |v|2 dc.
η
Rearranging the squares we get

Z Z
¯ η ¯ £ ¤ £ ¤
¯v − ∇x φε ◦ eδζw ¯2 dcη ≤ −2 (∂t + v η · ∇x )(eδζw (x) − x) · v η − ∇x φε ◦ eδζw dcη
Z
+ 2ε + 2 v η · (∂t + v η · ∇x )(eδζw (x) − x) dcη
2
Z Z
+ |v | dc − |v|2 dc.
η 2 η
Defining
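\[
\begin{aligned}
f(\delta,\varepsilon,\eta) :={}& \int \bigl|v^\eta - \nabla_x\phi_\varepsilon\circ e^{\delta\zeta w}\bigr|^2\,dc^\eta
= \int \bigl|(1+\eta\zeta')\,v(t+\eta\zeta,x,a) - \nabla_x\phi_\varepsilon(t+\eta\zeta, e^{\delta\zeta w}(x),a)\bigr|^2\,dc \\
\ge{}& \int_\tau^{T-\tau}\!\!\int_{D\times D} \bigl|v - \nabla_x\phi_\varepsilon(t+\eta, e^{\delta w}(x),a)\bigr|^2\,dc
\end{aligned}
\]
we see that it suffices to bound $f$ from above. Since $e^{\delta\zeta w}x - x = \delta\zeta(t)w(x) + O(\delta^2)$ (in the $C^1$ norm in spacetime), by the Schwarz inequality, (3.7.4) and (3.7.5) we get
\[
f \le C\sqrt{f}\,\delta + 2\varepsilon^2 + C(\delta\eta + \delta^2) + C\eta^2,
\]
which implies $f(\delta,\varepsilon,\eta) \le C(\delta^2+\varepsilon^2+\eta^2)$, with $C$ depending on $T$, $A^*$, $\zeta$, and $w$. This, together with $\int |v - \nabla_x\phi_\varepsilon|^2\,dc \le 2\varepsilon^2$, gives (3.7.2).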
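The passage from the inequality for $f$ to the quadratic bound is a standard absorption argument. A sketch, using only Young's inequality $ab \le \frac12 a^2 + \frac12 b^2$ and writing $C$ for the constant appearing in that inequality (the explicit constants below are illustrative, not the author's):

```latex
\begin{align*}
C\sqrt{f}\,\delta &\le \tfrac12 f + \tfrac12 C^2\delta^2,
\qquad \delta\eta \le \tfrac12(\delta^2+\eta^2),\\
\tfrac12 f &\le \tfrac12 C^2\delta^2 + 2\varepsilon^2
  + C\bigl(\tfrac32\delta^2 + \tfrac12\eta^2\bigr) + C\eta^2,\\
f &\le (C^2+3C)\,\delta^2 + 4\varepsilon^2 + 3C\,\eta^2 .
\end{align*}
```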
3.7.2 Proof of the main result

Theorem 3.7.2. Let τ ∈ (0, T ) and let w : D → Rd be a smooth divergence-free vector
field parallel to ∂D. Then there exists a constant C = C(w, τ, T, A∗ ) such that
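\[
|\langle \nabla p\cdot w, \zeta f\rangle| \le C\|f\|_\infty \|\zeta\|_{L^2(0,T)} \qquad \forall \zeta \in C_c^\infty\bigl((\tau,T-\tau);[0,+\infty)\bigr),\ f \in C_c^\infty\bigl((0,1)\times D\bigr). \tag{3.7.6}
\]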
Proof. For ζ ∈ Cc∞ (τ, T − τ ) nonnegative, η ∈ (0, τ /2) and δ, ε > 0 we consider the
following expression:
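\[
\begin{aligned}
I = I(\zeta,\delta,\eta,\varepsilon) :={}& \int_0^T\!\!\int_D \zeta(t)\,\Bigl|\int_0^1 \bigl[p_\varepsilon(t+\eta\theta, e^{\delta w}(x)) - p_\varepsilon(t+\eta\theta,x)\bigr]\,d\theta\Bigr|\,dx\,dt \\
={}& \int \zeta(t)\,\Bigl|\int_0^1 \bigl[p_\varepsilon(t+\eta\theta, e^{\delta w}(x)) - p_\varepsilon(t+\eta\theta,x)\bigr]\,d\theta\Bigr|\,dc(t,x,a).
\end{aligned}
\]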
Our goal is to bound $I$ from above. This will be achieved in the following (many) steps: $I \le I_1 + I_2 + I_3$ and estimate of $I_2$, $I_3$; $I_1 \le 2\|\zeta\|_\infty\varepsilon^2 - (I_4 + I_5 + I_6)$ and estimate of $I_5$ and $I_6$; $I_4 = I_7 + I_8$ and estimate of $I_8$; $I_7 = I_9 + 2I_{10}$ and estimate of $I_9$; $I_{10} = I_{11} + I_{12}$ and estimate of $I_{12}$; finally $I_{11} = I_{13} + I_{14}$ and estimate of $I_{13}$ and $I_{14}$. In order to avoid cumbersome notation, during this proof we denote by $C$ a generic constant depending only on $(w, \tau, T, A^*)$, whose specific value can change from line to line.
We now consider $\lambda_\varepsilon(t,x,a) := -\bigl(\partial_t\phi_\varepsilon + \frac12|\nabla_x\phi_\varepsilon|^2 + p_\varepsilon\bigr) \ge 0$, and we recall that $\int \lambda_\varepsilon\,dc \le \varepsilon^2$. We have
I ≤ I1 + I2 + I3 ,
where
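\[
\begin{aligned}
I_1 &:= \int \zeta(t)\,\Bigl|\int_0^1 \bigl[\lambda_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) - \lambda_\varepsilon(t+\eta\theta, x, a)\bigr]\,d\theta\Bigr|\,dc,\\
I_2 &:= \int \zeta(t)\,\Bigl|\int_0^1 \bigl[\partial_t\phi_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) - \partial_t\phi_\varepsilon(t+\eta\theta, x, a)\bigr]\,d\theta\Bigr|\,dc,\\
I_3 &:= \int \zeta(t)\,\Bigl|\int_0^1 \Bigl[\frac12|\nabla_x\phi_\varepsilon|^2(t+\eta\theta, e^{\delta w}(x), a) - \frac12|\nabla_x\phi_\varepsilon|^2(t+\eta\theta, x, a)\Bigr]\,d\theta\Bigr|\,dc.
\end{aligned}
\]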
By (3.7.2) we have
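\[
\|\nabla_x\phi_\varepsilon(t+\eta\theta, e^{\delta w}(x), a)\|_{L^2(\zeta^2 c)} \le \|\nabla_x\phi_\varepsilon(t,x,a)\|_{L^2(\zeta^2 c)} + \sqrt{L}\,\|\zeta\|_\infty(\varepsilon+\eta+\delta) \qquad \forall \theta \in (0,1).
\]
Therefore, writing $|A|^2 - |B|^2$ as $(A-B)\cdot(A+B)$ and using (3.7.2) once more, we can estimate
\[
I_3 \le C(\varepsilon+\eta+\delta)\Bigl(\int \zeta^2(t)\,|\nabla_x\phi_\varepsilon|^2(t,x,a)\,dc\Bigr)^{1/2} + C\|\zeta\|_\infty^2(\varepsilon^2+\eta^2+\delta^2). \tag{3.7.7}
\]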
For I2 we first integrate with respect to θ and then use the mean value theorem to obtain
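\[
\begin{aligned}
I_2 &\le \frac{\delta}{\eta}\int \zeta(t)\int_0^1 \bigl|\bigl[\nabla_x\phi_\varepsilon(t+\eta, e^{\sigma\delta w}(x), a) - \nabla_x\phi_\varepsilon(t, e^{\sigma\delta w}(x), a)\bigr]\cdot w(e^{\sigma\delta w}(x))\bigr|\,d\sigma\,dc \\
&\le C\,\frac{\delta}{\eta}\int_0^1\!\!\int \zeta(t)\,\bigl|\nabla_x\phi_\varepsilon(t+\eta, e^{\sigma\delta w}(x), a) - \nabla_x\phi_\varepsilon(t, e^{\sigma\delta w}(x), a)\bigr|\,dc\,d\sigma \\
&\le C\,\frac{\delta}{\eta}\,(\varepsilon+\eta+\delta)\,\|\zeta\|_{L^2(0,T)}.
\end{aligned} \tag{3.7.8}
\]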
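Let us now consider $I_1$: using $\lambda_\varepsilon \ge 0$ and $\int \lambda_\varepsilon\,dc \le \varepsilon^2$, we obtain
\[
\begin{aligned}
I_1 &\le \int \zeta(t)\int_0^1 \bigl[\lambda_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) + \lambda_\varepsilon(t+\eta\theta, x, a)\bigr]\,d\theta\,dc \\
&\le 2\|\zeta\|_\infty\varepsilon^2 + \int \zeta(t)\int_0^1 \bigl[\lambda_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) + \lambda_\varepsilon(t+\eta\theta, x, a) - 2\lambda_\varepsilon(t,x,a)\bigr]\,d\theta\,dc \\
&\le 2\|\zeta\|_\infty\varepsilon^2 - I_4 - I_5 - I_6,
\end{aligned}
\]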
where
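\[
\begin{aligned}
I_4 &:= \int \zeta(t)\int_0^1 \bigl[\partial_t\phi_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) + \partial_t\phi_\varepsilon(t+\eta\theta, x, a) - 2\partial_t\phi_\varepsilon(t,x,a)\bigr]\,d\theta\,dc,\\
I_5 &:= \frac12\int \zeta(t)\int_0^1 \bigl[|\nabla_x\phi_\varepsilon|^2(t+\eta\theta, e^{\delta w}(x), a) + |\nabla_x\phi_\varepsilon|^2(t+\eta\theta, x, a) - 2|\nabla_x\phi_\varepsilon|^2(t,x,a)\bigr]\,d\theta\,dc,\\
I_6 &:= \int \zeta(t)\int_0^1 \bigl[p_\varepsilon(t+\eta\theta, e^{\delta w}(x)) + p_\varepsilon(t+\eta\theta, x) - 2p_\varepsilon(t,x)\bigr]\,d\theta\,dc.
\end{aligned}
\]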
Now we notice that
\[
I_6 = 0, \tag{3.7.9}
\]
since $\int c(t,x,da) = 1$ (by the incompressibility constraint (3.3.15)), $e^{\delta w}$ is measure-preserving, and $\int p_\varepsilon(t,x)\,dx = 0$. For $I_5$, we have the same bound as for $I_3$, that is
\[
|I_5| \le C(\varepsilon+\eta+\delta)\Bigl(\int \zeta^2(t)\,|\nabla_x\phi_\varepsilon|^2(t,x,a)\,dc\Bigr)^{1/2} + C\|\zeta\|_\infty^2(\varepsilon^2+\eta^2+\delta^2). \tag{3.7.10}
\]
We continue splitting I4 as I7 + I8 , with

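\[
\begin{aligned}
I_7 &:= \int\!\!\int_0^1 \bigl[\zeta(t)\bigl(\partial_t\phi_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) + \partial_t\phi_\varepsilon(t+\eta\theta, x, a)\bigr) - 2\zeta(t-\theta\eta)\,\partial_t\phi_\varepsilon(t,x,a)\bigr]\,d\theta\,dc,\\
I_8 &:= 2\int\!\!\int_0^1 \bigl[\zeta(t-\theta\eta) - \zeta(t)\bigr]\,\partial_t\phi_\varepsilon(t,x,a)\,d\theta\,dc.
\end{aligned}
\]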
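For $I_8$, using once more that $\lambda_\varepsilon \ge 0$ and $\int \lambda_\varepsilon\,dc \le \varepsilon^2$, we have the bound
\[
\begin{aligned}
|I_8| \le{}& 2\,\Bigl|\int\!\!\int_0^1 \bigl[\zeta(t-\theta\eta) - \zeta(t)\bigr]\lambda_\varepsilon(t,x,a)\,d\theta\,dc\Bigr| + \Bigl|\int\!\!\int_0^1 \bigl[\zeta(t-\theta\eta) - \zeta(t)\bigr]|\nabla_x\phi_\varepsilon|^2(t,x,a)\,d\theta\,dc\Bigr| \\
& + 2\,\Bigl|\int\!\!\int_0^1 \bigl[\zeta(t-\theta\eta) - \zeta(t)\bigr]p_\varepsilon(t,x,a)\,d\theta\,dc\Bigr| \\
\le{}& 4\|\zeta\|_\infty\varepsilon^2 + \Bigl|\int\!\!\int_0^1 \bigl[\zeta(t-\theta\eta) - \zeta(t)\bigr]|\nabla_x\phi_\varepsilon|^2(t,x,a)\,d\theta\,dc\Bigr|,
\end{aligned}
\]
where in the last inequality we used that $\int p_\varepsilon\,dc_t = \int p_\varepsilon\,dx = 0$. Using also the fact that $t \mapsto \int |v|^2(t,x,a)\,dc_t$ does not depend on $t$ we get
\[
|I_8| \le 4\|\zeta\|_\infty\varepsilon^2 + 2\|\zeta\|_\infty\int \bigl||\nabla_x\phi_\varepsilon|^2(t,x,a) - |v|^2(t,x,a)\bigr|\,dc. \tag{3.7.11}
\]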
We now consider I7 = I9 + 2I10 , where

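\[
\begin{aligned}
I_9 &:= \int \zeta(t)\int_0^1 \bigl[\partial_t\phi_\varepsilon(t+\eta\theta, e^{\delta w}(x), a) - \partial_t\phi_\varepsilon(t+\eta\theta, x, a)\bigr]\,d\theta\,dc,\\
I_{10} &:= \int\!\!\int_0^1 \bigl[\zeta(t)\,\partial_t\phi_\varepsilon(t+\eta\theta, x, a) - \zeta(t-\theta\eta)\,\partial_t\phi_\varepsilon(t,x,a)\bigr]\,d\theta\,dc.
\end{aligned}
\]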
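The decomposition of $I_7$ can be checked directly from the definitions of $I_7$, $I_9$ and $I_{10}$; the following is only a bookkeeping sketch, in which the two copies of $\zeta(t)\,\partial_t\phi_\varepsilon(t+\eta\theta,x,a)$ coming from $2I_{10}$ absorb the negative term of $I_9$:

```latex
\begin{align*}
I_9 + 2I_{10}
&= \int\!\!\int_0^1 \zeta(t)\bigl[\partial_t\phi_\varepsilon(t+\eta\theta,e^{\delta w}(x),a)
   - \partial_t\phi_\varepsilon(t+\eta\theta,x,a)\bigr]\,d\theta\,dc\\
&\quad + 2\int\!\!\int_0^1 \bigl[\zeta(t)\,\partial_t\phi_\varepsilon(t+\eta\theta,x,a)
   - \zeta(t-\theta\eta)\,\partial_t\phi_\varepsilon(t,x,a)\bigr]\,d\theta\,dc\\
&= \int\!\!\int_0^1 \bigl[\zeta(t)\bigl(\partial_t\phi_\varepsilon(t+\eta\theta,e^{\delta w}(x),a)
   + \partial_t\phi_\varepsilon(t+\eta\theta,x,a)\bigr)
   - 2\zeta(t-\theta\eta)\,\partial_t\phi_\varepsilon(t,x,a)\bigr]\,d\theta\,dc
 = I_7 .
\end{align*}
```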
We have, as for I2 ,
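\[
\begin{aligned}
|I_9| &= \frac1\eta\,\Bigl|\int \zeta(t)\bigl[\bigl(\phi_\varepsilon(t+\eta, e^{\delta w}(x), a) - \phi_\varepsilon(t+\eta, x, a)\bigr) - \bigl(\phi_\varepsilon(t, e^{\delta w}(x), a) - \phi_\varepsilon(t, x, a)\bigr)\bigr]\,dc\Bigr| \\
&= \frac\delta\eta\,\Bigl|\int \zeta(t)\int_0^1 \bigl[\nabla_x\phi_\varepsilon(t+\eta, e^{\sigma\delta w}(x), a) - \nabla_x\phi_\varepsilon(t, e^{\sigma\delta w}(x), a)\bigr]\cdot w(e^{\sigma\delta w}(x))\,d\sigma\,dc\Bigr| \\
&\le C\,\frac\delta\eta\,(\varepsilon+\eta+\delta)\,\|\zeta\|_{L^2(0,T)}.
\end{aligned} \tag{3.7.12}
\]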
For I10 , we use the continuity equation ∂t c + divx (vc) = 0 (see (3.3.16)) and add and
subtract ζ(t) to get
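\[
\begin{aligned}
I_{10} ={}& \int\!\!\int_0^1\!\!\int_0^1 \partial_t\bigl[\zeta(t-(1-\sigma)\eta\theta)\,\partial_t\phi_\varepsilon(t+\eta\theta\sigma, x, a)\bigr]\,\eta\theta\,d\sigma\,d\theta\,dc \\
={}& -\int\!\!\int_0^1\!\!\int_0^1 \zeta(t-(1-\sigma)\eta\theta)\,\partial_t\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc \\
={}& -\int\!\!\int_0^1\!\!\int_0^1 \bigl[\zeta(t-(1-\sigma)\eta\theta) - \zeta(t)\bigr]\,\partial_t\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc \\
& -\int\!\!\int_0^1\!\!\int_0^1 \zeta(t)\,\partial_t\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc \\
={}& -\int\!\!\int_0^1\!\!\int_0^1 \bigl[\zeta(t-(1-\sigma)\eta\theta) - \zeta(t)\bigr]\,\partial_t\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc \\
& -\int\!\!\int_0^1 \zeta(t)\bigl[\nabla_x\phi_\varepsilon(t+\eta\theta, x, a) - \nabla_x\phi_\varepsilon(t, x, a)\bigr]\cdot v(t,x,a)\,d\theta\,dc \\
=:{}& I_{11} + I_{12}.
\end{aligned}
\]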
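The first expression for $I_{10}$ comes from writing its integrand as an integral in $\sigma$. A sketch, where the auxiliary function $g$ is introduced here only for illustration, $\zeta$ and $\zeta'$ are evaluated at $t-(1-\sigma)\eta\theta$, and the derivatives of $\phi_\varepsilon$ at $(t+\eta\theta\sigma,x,a)$:

```latex
\begin{align*}
g(\sigma) &:= \zeta\bigl(t-(1-\sigma)\eta\theta\bigr)\,\partial_t\phi_\varepsilon(t+\eta\theta\sigma,x,a),\\
g(1)-g(0) &= \zeta(t)\,\partial_t\phi_\varepsilon(t+\eta\theta,x,a)
            - \zeta(t-\eta\theta)\,\partial_t\phi_\varepsilon(t,x,a),\\
g'(\sigma) &= \eta\theta\,\bigl[\zeta'\,\partial_t\phi_\varepsilon + \zeta\,\partial_{tt}\phi_\varepsilon\bigr]
 = \eta\theta\,\partial_t\bigl[\zeta\bigl(t-(1-\sigma)\eta\theta\bigr)\,
   \partial_t\phi_\varepsilon(t+\eta\theta\sigma,x,a)\bigr].
\end{align*}
```

Hence the integrand of $I_{10}$ equals $\int_0^1 g'(\sigma)\,d\sigma$, after which the continuity equation moves the time derivative onto $v\cdot\nabla_x$.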
Now we see that, using (3.7.2) and the Schwarz inequality, we easily get
\[
|I_{12}| \le C(\varepsilon+\eta)\Bigl(\int \zeta^2(t)\,|\nabla_x\phi_\varepsilon|^2\,dc\Bigr)^{\frac12} + C\|\zeta\|_\infty^2(\varepsilon^2+\eta^2). \tag{3.7.13}
\]
For I11 , it can be written as I13 + I14 , where

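\[
\begin{aligned}
I_{13} :={}& \int\!\!\int_0^1\!\!\int_0^1 \partial_t\bigl[\bigl(\zeta(t-(1-\sigma)\eta\theta) - \zeta(t)\bigr)\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\bigr]\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc \\
={}& \int\!\!\int_0^1 \bigl[\zeta(t-\eta\theta) - \zeta(t)\bigr]\nabla_x\phi_\varepsilon(t,x,a)\cdot v(t,x,a)\,d\theta\,dc \\
={}& \int\!\!\int_0^1 \bigl[\zeta(t-\eta\theta) - \zeta(t)\bigr]\bigl[\nabla_x\phi_\varepsilon(t,x,a) - v(t,x,a)\bigr]\cdot v(t,x,a)\,d\theta\,dc \\
& + \int\!\!\int_0^1 \bigl[\zeta(t-\eta\theta) - \zeta(t)\bigr]\,|v|^2(t,x,a)\,d\theta\,dc
\end{aligned}
\]
and
\[
I_{14} := \int\!\!\int_0^1\!\!\int_0^1 \bigl[\zeta'(t-(1-\sigma)\eta\theta) - \zeta'(t)\bigr]\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\,\eta\theta\,d\sigma\,d\theta\,dc.
\]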
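Recalling that $t \mapsto \int |v|^2(t,x,a)\,dc_t$ is constant, by (3.7.1) we have
\[
|I_{13}| \le \Bigl|\int\!\!\int_0^1 \bigl[\zeta(t-\eta\theta) - \zeta(t)\bigr]\bigl(\nabla_x\phi_\varepsilon(t,x,a) - v(t,x,a)\bigr)\cdot v(t,x,a)\,d\theta\,dc\Bigr| \le C\|\zeta\|_\infty\,\varepsilon. \tag{3.7.14}
\]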
Finally, by (3.7.2) we can bound I14 with
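\[
\begin{aligned}
|I_{14}| &\le \|\zeta''\|_\infty\,\eta^2 \int_{\tau/2}^{T-\tau/2}\!\!\int_{D\times A}\int_0^1\!\!\int_0^1 \bigl|\nabla_x\phi_\varepsilon(t+\eta\theta\sigma, x, a)\cdot v(t,x,a)\bigr|\,d\sigma\,d\theta\,dc \\
&\le \|\zeta''\|_\infty\,\eta^2\,C\bigl(\|\nabla_x\phi_\varepsilon\|_2 + C(\varepsilon+\eta)\bigr).
\end{aligned} \tag{3.7.15}
\]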
Collecting (3.7.7), (3.7.8), (3.7.9), (3.7.10), (3.7.12), (3.7.13), (3.7.14), (3.7.15) we can
bound from above I as follows:
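\[
\begin{aligned}
& C(\varepsilon+\eta+\delta)\Bigl(\int \zeta^2(t)\,|\nabla_x\phi_\varepsilon|^2\,dc\Bigr)^{\frac12} + C\|\zeta\|_\infty^2(\varepsilon^2+\eta^2+\delta^2) \\
& + I_8 + C\,\frac\delta\eta\,(\varepsilon+\eta+\delta)\,\|\zeta\|_{L^2(0,T)} + \|\zeta''\|_\infty\,\eta^2\,C\bigl(\|\nabla_x\phi_\varepsilon\|_2 + C(\varepsilon+\eta)\bigr) + 2\|\zeta\|_\infty\varepsilon^2 + C\|\zeta\|_\infty\,\varepsilon.
\end{aligned}
\]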
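Now, recalling the definition of $I$, we integrate $p_\varepsilon\zeta$ against a function $f \in C_c^\infty\bigl((0,T)\times D\bigr)$ and pass to the limit as $\varepsilon \to 0$, with $\eta = \delta$ frozen, to obtain
\[
\frac1\delta\,\Bigl|\int_0^1 \bigl\langle q,\ \zeta(t)\bigl[f(t-\delta\theta, e^{-\delta w}(x)) - f(t-\delta\theta, x)\bigr]\bigr\rangle\,d\theta\Bigr| \le C\|f\|_\infty\bigl(\|\zeta\|_{L^2(0,T)} + \delta\|\zeta''\|_\infty + \delta\|\zeta\|_\infty\bigr)
\]
for any limit point $q$ of $p_\varepsilon$ in the sense of distributions, thanks to the fact that, by (3.7.11), $I_8 \to 0$ as $\varepsilon \to 0$ (here we use again that $t \mapsto \int |v|^2(t,x,a)\,dc_t$ is constant). So, letting $\delta \to 0$, we finally obtain (3.7.6), with $\nabla q$ in place of $\nabla p$. But $\nabla p_\varepsilon \to \nabla p$ implies that $\nabla p = \nabla q$ and concludes the proof.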
Remark 3.7.3. In the case D = Td one can also consider constant vector fields w and
therefore (3.7.6) holds in a stronger (and simpler) form:
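\[
|\langle \partial_{x_i} p, \zeta f\rangle| \le C\|f\|_\infty\|\zeta\|_{L^2(0,T)} \qquad \forall \zeta \in C_c^\infty\bigl((\tau,T-\tau);[0,+\infty)\bigr),\ f \in C^\infty\bigl([0,1]\times\mathbb{T}^d\bigr) \tag{3.7.16}
\]
with $C$ depending only on $\tau$, $T$ and $A^*$.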
A simple localization and smoothing argument based on (3.7.6) gives that the pressure
field is locally (globally, in the case D = Td ) induced by a function.
Corollary 3.7.4. Let $d \ge 2$. Then for all smooth subdomains $D' \subset\subset D$ there exists
\[
q \in L^2_{loc}\bigl((0,T); BV(D')\bigr) \subset L^2_{loc}\bigl((0,T); L^{1^*}(D')\bigr)
\]
(here $1^* = d/(d-1)$) with $\nabla q = \nabla p$ in the sense of distributions in $(0,T)\times D'$. In the case $D = \mathbb{T}^d$ the same statement holds globally in space, i.e. with $D' = D$. Moreover, in this case the result holds also for $d = 1$ (with $1^* = \infty$).
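Proof. We first notice that for $d \ge 2$ any constant vector field $\bar w$ in $D'$ can be extended to a divergence-free, smooth and compactly supported vector field in $D$: indeed, if $D' \subset\subset D_1 \subset\subset D_2 \subset\subset D$, with $D_1$ and $D_2$ smooth, we may set $\hat w = \bar w$ in $D_1$, $\hat w = 0$ in $D \setminus D_2$, and $\hat w = \nabla\psi$ in $D_2 \setminus D_1$, where $\psi$ is a solution of
\[
\begin{cases}
\Delta\psi = 0 & \text{in } D_2 \setminus D_1,\\[2pt]
\dfrac{\partial\psi}{\partial\nu} = 0 & \text{on } \partial D_2,\\[6pt]
\dfrac{\partial\psi}{\partial\nu} = \bar w\cdot\nu & \text{on } \partial D_1,
\end{cases}
\]
(existence of $\psi$ can be obtained by minimizing $\frac12\int_{D_2\setminus D_1}|\nabla\phi|^2 - \int_{\partial D_1}\phi\,\bar w\cdot\nu$ in $H^1(D_2\setminus D_1)$). By construction $\hat w$ is divergence-free (in the sense of distributions) in $D$, compactly supported and coincides with $\bar w$ in a neighbourhood of $D'$, so that a suitable mollification of $\hat w$ provides the required extension.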
Thanks to this remark, (3.7.6) yields
\[
|\langle \partial_{x_i} p, \zeta f\rangle| \le L\|f\|_\infty\|\zeta\|_{L^2(0,T)} \qquad \forall \zeta \in C_c^\infty\bigl((\tau,T-\tau);[0,+\infty)\bigr),\ f \in C_c^\infty\bigl((0,1)\times D'\bigr), \tag{3.7.17}
\]
with $L$ depending only on $\tau$, $T$, $D'$ and $A^*$. If we denote by $q_\varepsilon$ the mollified functions of $p$, this easily implies that $|\nabla q_\varepsilon|$ is uniformly bounded in $L^2_{loc}\bigl((0,T); L^1(D')\bigr)$. In particular, if we denote by $\bar q_\varepsilon$ the mean value of $q_\varepsilon$ on $D'$, $q_\varepsilon - \bar q_\varepsilon$ is uniformly bounded in the space $L^2_{loc}\bigl((0,T); L^{1^*}(D')\bigr)$, and if $q$ is any weak limit point (in the duality with $L^2_{loc}\bigl((0,T); L^d(D')\bigr)$) we easily get $\nabla q = \nabla p$ and $q \in L^2_{loc}\bigl((0,T); BV(D')\bigr)$.
In the case $D = \mathbb{T}^d$ the proof is analogous: it suffices to apply Remark 3.7.3.
Chapter 4
On the structure of the Aubry set and Hamilton-Jacobi equation¹
4.1 Introduction
Let M be a smooth manifold without boundary. We denote by T M the tangent bundle
and by π : T M → M the canonical projection. A point in T M will be denoted by (x, v)
with x ∈ M and v ∈ Tx M = π −1 (x). In the same way a point of the cotangent bundle
T ∗ M will be denoted by (x, p) with x ∈ M and p ∈ Tx∗ M a linear form on the vector space
$T_xM$. We will suppose that $g$ is a complete Riemannian metric on $M$. For $v \in T_xM$, the norm $\|v\|_x$ is $g_x(v,v)^{1/2}$. We will denote by $\|\cdot\|_x$ the dual norm on $T^*M$. Moreover, for every pair $x, y \in M$, $d(x,y)$ will denote the Riemannian distance from $x$ to $y$.
We will assume in the whole chapter that $H : T^*M \to \mathbb{R}$ is a Hamiltonian of class $C^{k,\alpha}$, with $k \ge 2$, $\alpha \in [0,1]$, which satisfies the following three conditions:
(H1) $C^2$-strict convexity: $\forall (x,p) \in T^*M$, the second derivative along the fibers $\frac{\partial^2 H(x,p)}{\partial p^2}$ is strictly positive definite;
(H2) uniform superlinearity: for every $K \ge 0$ there exists a finite constant $C(K)$ such that
\[
\forall (x,p) \in T^*M, \qquad H(x,p) \ge K\|p\|_x + C(K);
\]
(H3) uniform boundedness in the fibers: for every $R \ge 0$, we have
\[
\sup_{x\in M}\,\{H(x,p) \mid \|p\|_x \le R\} < +\infty.
\]
¹ This chapter is based on joint work with Albert Fathi and Ludovic Rifford [67].
By the Weak KAM Theorem we know that, under the above conditions, there is
c(H) ∈ R such that the Hamilton-Jacobi equation
H(x, dx u) = c
admits a global viscosity solution u : M → R for c = c(H) and does not admit such a
solution for c < c(H), see [62], [52], [65], [68], [96]. In fact, if M is assumed to be compact,
then c(H) is the only value of c for which the Hamilton-Jacobi equation above admits
a viscosity solution. The constant c(H) is called the critical value, or the Mañé critical
value of H. In the sequel, a viscosity solution u : M → R of H(x, dx u) = c(H) will be
called a critical viscosity solution or a weak KAM solution, while a viscosity subsolution u
of H(x, dx u) = c(H) will be called a critical viscosity subsolution (or critical subsolution
if u is at least C 1 ).
We recall that the Lagrangian $L : TM \to \mathbb{R}$ associated to the Hamiltonian $H$ is defined by
\[
\forall (x,v) \in TM, \qquad L(x,v) := \max_{p \in T_x^*M}\,\{p(v) - H(x,p)\}.
\]
Since $H$ is of class at least $C^2$ and satisfies the three conditions (H1)-(H3), it is well-known (see for instance [65] or [68, Lemma 2.1]) that $L$ is finite everywhere, of class at least $C^2$, strictly convex and superlinear in each fiber $T_xM$, and satisfies
\[
\forall (x,p) \in T^*M, \qquad H(x,p) = \max_{v \in T_xM}\,\{p(v) - L(x,v)\}.
\]
Therefore the Fenchel inequality is always satisfied:
\[
p(v) \le L(x,v) + H(x,p).
\]
Moreover, we have equality in the Fenchel inequality if and only if
\[
(x,p) = \mathcal{L}(x,v),
\]
where $\mathcal{L} : TM \to T^*M$ denotes the Legendre transform defined as
\[
\mathcal{L}(x,v) := \Bigl(x, \frac{\partial L}{\partial v}(x,v)\Bigr).
\]
Under our assumption $\mathcal{L}$ is a diffeomorphism of class at least $C^1$. We will denote by $\phi^L_t$ the Euler-Lagrange flow of $L$, and by $X_L$ the vector field on $TM$ that generates the flow $\phi^L_t$.
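As an illustration of the duality between $L$ and $H$, consider the mechanical Lagrangian that appears later in Corollary 4.1.3. The computation below is a sketch that sets aside the regularity and growth assumptions on the potential $V$ needed for (H1)-(H3); it only shows how the Fenchel inequality and its equality case look in this model:

```latex
\begin{align*}
L(x,v) &= \tfrac12\|v\|_x^2 - V(x),\\
H(x,p) &= \max_{v\in T_xM}\bigl\{p(v) - \tfrac12\|v\|_x^2 + V(x)\bigr\}
        = \tfrac12\|p\|_x^2 + V(x),\\
p(v) &\le \|p\|_x\,\|v\|_x \le \tfrac12\|p\|_x^2 + \tfrac12\|v\|_x^2
      = L(x,v) + H(x,p).
\end{align*}
```

Equality holds exactly when $p = g_x(v,\cdot)$, which in this model is the Legendre transform of $(x,v)$.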
As done by Mather in [103], it is convenient to introduce for t > 0 fixed, the function
ht : M × M → R defined by
\[
\forall x, y \in M, \qquad h_t(x,y) := \inf \int_0^t L(\gamma(s), \dot\gamma(s))\,ds,
\]
where the infimum is taken over all the absolutely continuous paths $\gamma : [0,t] \to M$ with $\gamma(0) = x$ and $\gamma(t) = y$. The Peierls barrier is the function $h : M \times M \to \mathbb{R}$ defined by
\[
h(x,y) := \liminf_{t\to\infty}\,\{h_t(x,y) + c(H)t\}.
\]
It is clear that this function satisfies
\[
\begin{aligned}
\forall x, y, z \in M, \qquad h(x,z) &\le h(x,y) + h_t(y,z) + c(H)t,\\
h(x,z) &\le h_t(x,y) + c(H)t + h(y,z).
\end{aligned}
\]
Moreover, given a weak KAM solution $u$, we have
\[
u(y) - u(x) \le h(x,y) \qquad \forall x, y \in M.
\]
It follows that the function h is either identically +∞ or it is finite everywhere. If M

is compact, h is finite everywhere. In addition, if h is finite, then for each x ∈ M , the
function hx (·) = h(x, ·) is a critical viscosity solution (see [65] or [69]). Furthermore, h
satisfies the triangle inequality
∀x, y, z ∈ M, h(x, z) ≤ h(x, y) + h(y, z).
The projected Aubry set A is defined by
A := {x ∈ M | h(x, x) = 0} .
Since h satisfies the triangle inequality, the function dM : A × A → R defined as
∀x, y ∈ A, dM (x, y) := h(x, y) + h(y, x),
is a semi-distance on the projected Aubry set. We define the quotient Aubry set (AM , dM )
to be the metric space obtained by identifying two points in A if their semi-distance dM
vanishes. In [105], Mather formulated the following problem:
Mather’s Problem. If $L$ is $C^\infty$, is the set $\mathcal{A}_M$ totally disconnected, i.e. is each connected component of $\mathcal{A}_M$ reduced to a single point?
In [104], Mather brought a positive answer to that problem in low dimension. More
precisely, he proved that if M has dimension two, or if the Lagrangian is the kinetic
energy associated to a Riemannian metric on M in dimension ≤ 3, then the quotient
Aubry set is totally disconnected. In fact, Mather mentioned in [105] that it would be
even more interesting to be able to prove that the quotient Aubry set has vanishing
one-dimensional Hausdorff measure. The aim of the present chapter is to show that such
a property is satisfied under various assumptions. Let us state our results.
Theorem 4.1.1. If dim M = 1, 2 and H of class C 2 or dim M = 3 and H of class C 3,1 ,

then (AM , dM ) has vanishing one-dimensional Hausdorff measure.
Define the stationary projected Aubry set by
A0 := {x ∈ A | (x, dx hx ) = L(x, 0)} ,
and denote by (A0M , dM ) the quotiented metric space. In fact, at the very end of his
paper [104], Mather noticed that the argument used in the case where L is a kinetic
energy in dimension 3 proves the total disconnectedness of the quotient Aubry set in
dimension 3 as long as A0M is empty. Our result concerning the stationary projected
Aubry set is the following:
Theorem 4.1.2. If $\dim M \ge 3$ and $H$ is of class $C^{k,1}$ with $k \ge 2\dim M - 3$, then $(\mathcal{A}^0_M, d_M)$ has vanishing one-dimensional Hausdorff measure. Moreover, if $\alpha \in (0,1]$ is such that $\alpha\bigl(\frac{k+1}{2} + 1\bigr) \ge \dim M$ then $(\mathcal{A}^0_M, d_M)$ has vanishing $\alpha$-dimensional Hausdorff measure. In particular, if $H$ is $C^\infty$ then $(\mathcal{A}^0_M, d_M)$ has zero Hausdorff dimension.
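The numerical condition in Theorem 4.1.2 can be rearranged to make the link between the three statements explicit. A short sketch of the arithmetic:

```latex
\begin{align*}
\alpha\Bigl(\frac{k+1}{2}+1\Bigr) \ge \dim M
\;\iff\; \frac{k+1}{2} \ge \frac{\dim M}{\alpha} - 1
\;\iff\; k \ge \frac{2\dim M}{\alpha} - 3 .
\end{align*}
```

For $\alpha = 1$ this is the hypothesis $k \ge 2\dim M - 3$ of the first statement. If $H$ is $C^\infty$, the inequality can be met for every $\alpha \in (0,1]$ by taking $k$ large, so the $\alpha$-dimensional Hausdorff measure vanishes for all $\alpha > 0$, which is the zero Hausdorff dimension statement.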
This result is in some sense optimal: for each integer $d > 0$, and each $\epsilon > 0$, Mather has constructed on the torus $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$ a Tonelli Lagrangian $L$ of class $C^{2d-3,1-\epsilon}$ such that $\tilde{\mathcal{A}}$ is connected, contained in the fixed points of the Euler-Lagrange flow, and the Mather quotient $(\mathcal{A}_M, d_M)$ is isometric to an interval, see [105].
As a corollary of the above theorem, we have the following result, which was more or less
already proved by Mather in [105, §19 page 1722] (see also the work of Sorrentino [124],
where the author uses a strategy similar to ours to prove analogous results).
Corollary 4.1.3. Assume that $H$ is of class $C^2$ and that its associated Lagrangian $L$ satisfies the following conditions:
1. $\forall x \in M$, $\min_{v \in T_xM} L(x,v) = L(x,0)$;
2. the mapping $x \in M \mapsto L(x,0)$ is of class $C^{l,1}(M)$ with $l \ge 1$.
If $\dim M = 1, 2$ or $\dim M \ge 3$ and $l \ge 2\dim M - 3$, then $(\mathcal{A}_M, d_M)$ is totally disconnected. In particular, if $L(x,v) = \frac12\|v\|_x^2 - V(x)$, with $V \in C^{l,1}(M)$ and $l \ge 2\dim M - 3$ ($V \in C^2(M)$ if $\dim M = 1, 2$), then $(\mathcal{A}_M, d_M)$ is totally disconnected.
The Aubry set $\tilde{\mathcal{A}} \subset TM$ can be defined as the set of $(x,v) \in TM$ such that $x \in \mathcal{A}$ and $v$ is the unique $w \in T_xM$ such that $d_xu = \frac{\partial L}{\partial v}(x,w)$ for any critical viscosity subsolution.
This set is invariant under the Euler-Lagrange flow φLt . Then, for each x ∈ A, there is
only one orbit of φLt in Ã whose projection passes through x. We denote by Ap the set
of x ∈ A whose orbit in the Aubry set is periodic with (strictly) positive period. We call
this set the periodic projected Aubry set. We have the following result:
Theorem 4.1.4. If $\dim M \ge 3$ and $H$ is of class $C^k$ with $k \ge 8\dim M - 7$, then $(\mathcal{A}^p_M, d_M)$ has vanishing one-dimensional Hausdorff measure. Moreover, if $\alpha \in (0,1]$ is such that $\alpha\bigl(\frac{k-1}{8} + 1\bigr) \ge \dim M$ then $(\mathcal{A}^p_M, d_M)$ has vanishing $\alpha$-dimensional Hausdorff measure. In particular, if $H$ is $C^\infty$ then $(\mathcal{A}^p_M, d_M)$ has zero Hausdorff dimension.
In fact, we notice that the method we use to demonstrate Theorem 4.1.4 highlights
a general assumption under which we can prove that the quotient Aubry set has small
Hausdorff dimension, see Section 4.4. We observe that, if M is assumed to be compact,
the size of the quotient Aubry set can be evaluated on the union of the limit sets of the
orbits of the Aubry set. Moreover limit sets of flows are well understood on surfaces.
Such ideas together with Theorems 4.1.2 and 4.1.4 lead to the following result on surfaces:
Theorem 4.1.5. If M is a compact surface of class C ∞ and H is of class C ∞ , then
(AM , dM ) has zero Hausdorff dimension.
In the last section, we present applications in dynamics, of which Theorem 4.1.7 below is a corollary. If $X$ is a $C^k$ vector field on $M$, with $k \ge 2$, the Mañé Lagrangian $L_X : TM \to \mathbb{R}$ associated to $X$ is defined by
\[
L_X(x,v) = \frac12\|v - X(x)\|_x^2, \qquad \forall (x,v) \in TM,
\]
and we denote by $\phi^X_t$ the flow generated by $X$. We now recall the definition of chain-recurrence:
Definition 4.1.6. Take $\varepsilon > 0$. A periodic $\varepsilon$-chain is a sequence $(x_0, \dots, x_n)$ such that $x_0 = x_n$ and there exist $t_i \ge 1$ such that $d(x_{i+1}, \phi^X_{t_i}(x_i)) \le \varepsilon$ for $i = 0, \dots, n-1$. A point $x$ is chain-recurrent if for any $\varepsilon > 0$ there exists a periodic $\varepsilon$-chain passing through $x$.
Roughly speaking, the chain-recurrence and the fact of being in the projected Aubry
set are two different ways to characterize the set of points which are more or less invari-
ant under the flow for large times. It is therefore a natural question to understand when
these two definitions coincide. Fathi has raised the following problem (compare with the
list of questions http://www.aimath.org/WWN/dynpde/articles/html/20a/ ):
Problem. Let LX : T M → R be the Mañé Lagrangian associated to the Ck , k ≥ 2,

vector field X on the compact connected manifold M .
(1) Is the set of chain-recurrent points of the flow of X on M equal to the projected
Aubry set A?
(2) Give a condition on the dynamics of X that ensures that the only weak KAM
solutions are the constants.
The theorems obtained in the first part of the chapter together with applications in
dynamics developed in Section 6 give an answer to this question if dim M ≤ 3.
Theorem 4.1.7. Let X be a Ck , k ≥ 2 vector field on the compact connected C∞ manifold

M . Assume that one of the following conditions holds:
(1) The dimension of M is 1 or 2.
(2) The dimension of M is 3, and the vector field X never vanishes.
(3) The dimension of M is 3, and X is of class C3,1 .
Then the projected Aubry set A of the Mañé Lagrangian LX : T M → R associated to X

is the set of chain-recurrent points of the flow of X on M . Moreover, the constants are
the only weak KAM solutions for LX if and only if every point of M is chain-recurrent
under the flow of X.
The outline is the following: Sections 2 and 3 are devoted to preparatory results.
Section 4 is devoted to the proofs of Theorems 4.1.1, 4.1.2 and 4.1.4. Sections 5 and 6
present applications in dynamics.
4.2 Preparatory lemmas

We denote by SS the set of critical viscosity subsolutions and by S− the set of weak
KAM solutions. Hence S− ⊂ SS. If u : M → R is a critical viscosity subsolution, we
recall that
u(y) − u(x) ≤ h(x, y), ∀x, y ∈ M.
In [69], Fathi and Siconolfi proved that for every critical viscosity subsolution u : M → R,
there exists a C 1 critical subsolution whose restriction to the projected Aubry set is equal
to u. In the sequel, we denote by SS 1 the set of C 1 critical subsolutions. The following
lemma is fundamental in the proof of our results.
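Lemma 4.2.1. For every $x, y \in \mathcal{A}$,
\[
\begin{aligned}
d_M(x,y) &= \max_{u_1,u_2 \in \mathcal{S}_-}\,\{(u_1-u_2)(y) - (u_1-u_2)(x)\}\\
&= \max_{u_1,u_2 \in \mathcal{SS}}\,\{(u_1-u_2)(y) - (u_1-u_2)(x)\}\\
&= \max_{u_1,u_2 \in \mathcal{SS}^1}\,\{(u_1-u_2)(y) - (u_1-u_2)(x)\}.
\end{aligned}
\]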
Proof. Let x, y ∈ A be fixed. First, we notice that if u1 , u2 are two critical viscosity
subsolutions, then we have
(u1 − u2 )(y) − (u1 − u2 )(x) = (u1 (y) − u1 (x)) + (u2 (x) − u2 (y))
≤ h(x, y) + h(y, x) = dM (x, y).
Moreover, if we define u1 , u2 : M → R by u1 (z) := h(x, z) and u2 (z) := h(y, z) for any

z ∈ M , then we have
(u1 − u2 )(y) − (u1 − u2 )(x) = (h(x, y) − h(y, y)) − (h(x, x) − h(y, x))
= h(x, y) + h(y, x) = dM (x, y),
since h(x, x) = h(y, y) = 0. Since u1 , u2 are both critical viscosity solutions, we easily obtain
the first and the second equality. The last equality is an immediate consequence
of the Theorem of Fathi and Siconolfi recalled above.
The proofs of the next two lemmas can be found in [114]. Note that Norton’s
Lemma 4.2.2 is an elegant generalization of Morse’s original lemma, see [111].
Lemma 4.2.2 (The generalized Morse Vanishing Lemma). Let k ∈ N and α ∈

[0, 1]. Then any set A ⊂ Rn can be decomposed into a countable union A = ∪i∈N Ai where
1. A0 is countable;
2. Ai ⊂ Bi with Bi a C 1 -embedded compact disk of dimension ≤ n such that every

f ∈ C k,α (Rn , R) vanishing on A satisfies, for each i ≥ 1,
|f (x) − f (y)| ≤ Mi |x − y|k+α ∀y ∈ Ai , x ∈ Bi (4.2.1)
for a certain constant Mi .
Lemma 4.2.3. For any C 1 -embedded compact disk B, there is a constant C > 0 such
that for all x, y ∈ B there is a C 1 path in B from x to y with length less than C|x − y|.
The proof of Lemma 4.2.4 that we present here is derived from [18] (compare [72])
who proved that if E ⊂ Rn is a measurable set, f : E → R is continuous, and n ≥ 2 is
such that f satisfies
|f (x) − f (y)| ≤ C|x − y|n ∀x, y ∈ E,
then f (E) has Lebesgue measure zero.
Lemma 4.2.4. Let Ψ : E → X be a map where E is a subset of Rn and (X, dX ) is a

semi-metric space. Suppose that there are $\alpha \in (0,1]$ and $M > 0$ such that
\[
\forall x, y \in E, \qquad d_X(\Psi(x), \Psi(y)) \le M|x-y|^{\frac{n}{\alpha}}, \tag{4.2.2}
\]
then the $\alpha$-dimensional Hausdorff measure of $(\Psi(E), d_X)$ is zero.

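Proof. Since it suffices to prove that $\mathcal{H}^\alpha(\Psi(E \cap K)) = 0$ for each compact set $K \subset \mathbb{R}^n$, we can assume that $E$ is bounded, which in particular implies $\mathcal{L}^n(E) < +\infty$. We now write $E = E_1 \cup E_2$, where $E_1$ is the set of density points for $E$ and $E_2 := E \setminus E_1$. It is a standard result in measure theory that $\mathcal{L}^n(E_2) = 0$. Thus, for each fixed $\varepsilon > 0$, there exists a countable family of balls $\{B_i\}_{i\in I}$ such that
\[
E_2 \subset \bigcup_{i\in I} B_i \qquad\text{and}\qquad \sum_{i\in I} (\operatorname{diam} B_i)^n \le \varepsilon.
\]
Then, by (4.2.2), we have
\[
\mathcal{H}^\alpha(\Psi(E_2)) \le \sum_{i\in I} (\operatorname{diam}_X \Psi(B_i))^\alpha \le M^\alpha \sum_{i\in I} (\operatorname{diam} B_i)^n \le M^\alpha\varepsilon.
\]
Letting $\varepsilon \to 0$, we obtain $\mathcal{H}^\alpha(\Psi(E_2)) = 0$.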

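We now want to prove that $\mathcal{H}^\alpha(\Psi(E_1)) = 0$. Fix $P \in \mathbb{N}$. For every density point $x \in E_1$, there exists $\rho = \rho(x) > 0$ such that
\[
\frac{\mathcal{L}^n(E_1 \cap B(x,r))}{\mathcal{L}^n(B(x,r))} = \frac{\mathcal{L}^n(E \cap B(x,r))}{\mathcal{L}^n(B(x,r))} \ge 1 - \frac{1}{2P^n} \qquad \forall r \le \rho(x). \tag{4.2.3}
\]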
Now it is simple to prove that for all $y, z \in E_1 \cap B(x,r)$ there exist $P+1$ points $x_0, \dots, x_P \in E_1$, with $x_0 = y$ and $x_P = z$, such that
\[
|x_i - x_{i-1}| \le \frac{4r}{P} \qquad \forall\, 1 \le i \le P.
\]
Indeed, first take $y_1, \dots, y_{P-1}$ the $P-1$ points on the line segment $[y,z]$ such that $|y_i - y_{i-1}| = \frac{|y-z|}{P}$ and then observe that, by (4.2.3), $B(y_i, \frac{r}{P}) \cap E_1$ is not empty for each $i$, and so it suffices to take a point $x_i$ in that set. Then,
\[
d_X(\Psi(y), \Psi(z)) \le \sum_{i=1}^P d_X(\Psi(x_{i-1}), \Psi(x_i)) \le M\sum_{i=1}^P |x_i - x_{i-1}|^{\frac{n}{\alpha}} \le MP\Bigl(\frac{4r}{P}\Bigr)^{\frac{n}{\alpha}} = 4^{\frac{n}{\alpha}} M P^{1-\frac{n}{\alpha}} r^{\frac{n}{\alpha}}. \tag{4.2.4}
\]
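We are now able to prove that $\mathcal{H}^\alpha(\Psi(E_1)) = 0$.
Take an open set $\Omega \supset E_1$ such that $\mathcal{L}^n(\Omega) \le 2\mathcal{L}^n(E_1) = 2\mathcal{L}^n(E) < +\infty$ (we can assume $\mathcal{L}^n(E) > 0$) and consider the fine covering $\mathcal{F}$ given by $\mathcal{F} = \{B(x,r)\}_{x\in E_1}$ with $r$ such that $B(x,r) \subset \Omega$ and $r \le \frac{\rho(x)}{5}$, where $\rho(x)$ was defined above. By Vitali’s covering theorem (see [61, paragraph 1.5.1]), there exists a countable collection $\mathcal{G}$ of disjoint balls in $\mathcal{F}$ such that
\[
E_1 \subset \bigcup_{B\in\mathcal{G}} 5B,
\]
where $5B$ denotes the ball concentric to $B$ with radius 5 times that of $B$. We can thus consider the covering of $\Psi(E_1)$ given by $\{\Psi(5B \cap E_1)\}_{B\in\mathcal{G}}$. In this way, by (4.2.4), we get
\[
\begin{aligned}
\mathcal{H}^\alpha(\Psi(E_1)) &\le \sum_{B\in\mathcal{G}} (\operatorname{diam}_X \Psi(5B \cap E_1))^\alpha \le 4^n M^\alpha P^{\alpha-n} \sum_{B\in\mathcal{G}} (\operatorname{diam} 5B)^n \\
&\le 20^n M^\alpha P^{\alpha-n} \sum_{B\in\mathcal{G}} (\operatorname{diam} B)^n \\
&\le 20^n M^\alpha P^{\alpha-n} \mathcal{L}^n(\Omega) \le 2\cdot 20^n M^\alpha P^{\alpha-n} \mathcal{L}^n(E),
\end{aligned}
\]
and, in this case, we conclude by letting $P \to +\infty$, as $n \ge 2$.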
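The exponents in the last chain of inequalities come from raising (4.2.4) to the power $\alpha$. A sketch for a single ball $B = B(x,r)$ of the covering, applying (4.2.4) on $5B$ (which is allowed since $5r \le \rho(x)$):

```latex
\begin{align*}
\operatorname{diam}_X \Psi(5B\cap E_1)
 &\le 4^{\frac n\alpha} M\,P^{1-\frac n\alpha}\,(5r)^{\frac n\alpha},\\
\bigl(\operatorname{diam}_X \Psi(5B\cap E_1)\bigr)^{\alpha}
 &\le 4^{n} M^{\alpha} P^{\alpha-n} (5r)^{n}
 \le 4^{n} M^{\alpha} P^{\alpha-n} (\operatorname{diam} 5B)^{n}
 = 20^{n} M^{\alpha} P^{\alpha-n} (\operatorname{diam} B)^{n}.
\end{align*}
```

Since $\alpha \le 1$ and $n \ge 2$, the exponent $\alpha - n$ is at most $-1$, so $P^{\alpha-n} \to 0$ as $P \to +\infty$.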
4.3 Existence of $C^{1,1}_{loc}$ critical subsolution on noncompact manifolds
In [20], using some kind of Lasry-Lions regularization (see [89]), Bernard proved the existence of $C^{1,1}$ critical subsolutions on compact manifolds. Here, adapting his proof, we show that the same result holds in the noncompact case and we make clear that the Lipschitz constant of the derivative of the $C^{1,1}_{loc}$ critical subsolution can be uniformly bounded on compact subsets of $M$.
Theorem 4.3.1. Assume that $H$ is of class $C^2$. For every open subset $O$ of $M$ which is relatively compact in $M$, there is a constant $L = L(O) > 0$ such that if $u : M \to \mathbb{R}$ is a critical viscosity subsolution, then there exists a $C^{1,1}_{loc}$ critical subsolution $v : M \to \mathbb{R}$ whose restriction to the projected Aubry set is equal to $u$ and such that the mapping $x \mapsto d_xv$ is $L$-Lipschitz on $O$.
Before proving Theorem 4.3.1, we observe that the following result holds:
Lemma 4.3.2. There is a constant $K' > 0$ such that any critical viscosity subsolution $u : M \to \mathbb{R}$ is $K'$-Lipschitz on $M$, that is,
\[
|u(y) - u(x)| \le K' d(x,y), \qquad \forall x, y \in M,
\]
where $d$ denotes the Riemannian distance associated to the metric $g$.

Proof. Let $u : M \to \mathbb{R}$ be a critical viscosity subsolution and $x, y \in M$ be fixed. Let $\gamma_{x,y} : [0, d(x,y)] \to M$ be a minimizing geodesic with constant unit speed joining $x$ to $y$. By definition of $h_{d(x,y)}(x,y)$, one has
\[
h_{d(x,y)}(x,y) \le \int_0^{d(x,y)} L(\gamma_{x,y}(t), \dot\gamma_{x,y}(t))\,dt \le A(1)\,d(x,y),
\]
where $A(1) := \sup_{x\in M}\{L(x,v) \mid \|v\|_x \le 1\}$ is finite thanks to the uniform superlinearity of $H$ in the fibers. Thus, one has
\[
u(x) - u(y) \le h_{d(x,y)}(x,y) + c(H)\,d(x,y) \le (A(1) + c(H))\,d(x,y).
\]
Exchanging $x$ and $y$, we conclude that $u$ is $K'$-Lipschitz with $K' := A(1) + c(H)$.

Before giving the proof of the theorem, we also notice that since $L$ is uniformly superlinear in the fibers, there exists a finite constant $C(K')$ such that
\[
L(x,v) \ge 2K'\|v\|_x + C(K') \qquad \forall (x,v) \in TM.
\]
From that, we deduce that for every $t > 0$,
\[
h_t(x,y),\ h_t(y,x) \ge 2K' d(x,y) + C(K')t, \qquad \forall x, y \in M. \tag{4.3.1}
\]
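The deduction of (4.3.1) is a one-line integration of the pointwise lower bound on $L$. A sketch, for any absolutely continuous curve $\gamma : [0,t] \to M$ with $\gamma(0) = x$ and $\gamma(t) = y$, using that the length of $\gamma$ is at least $d(x,y)$:

```latex
\begin{align*}
\int_0^t L(\gamma(s),\dot\gamma(s))\,ds
 &\ge 2K'\int_0^t \|\dot\gamma(s)\|_{\gamma(s)}\,ds + C(K')\,t
 \ge 2K'\,d(x,y) + C(K')\,t .
\end{align*}
```

Taking the infimum over such curves gives the bound for $h_t(x,y)$, and the symmetry of $d$ gives the same bound for $h_t(y,x)$.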
Proof of Theorem 4.3.1. Let $K_n$ be an increasing sequence of compact sets such that $K_n \subset \mathring{K}_{n+1}$ and $\cup_n K_n = M$. We consider the two Lax-Oleinik semigroups $T_t^-$ and $T_t^+$ defined by
\[
T_t^-u(x) := \inf_{y\in M}\,\{u(y) + h_t(y,x) + c(H)t\}, \qquad T_t^+u(x) := \sup_{y\in M}\,\{u(y) - h_t(x,y) - c(H)t\},
\]
for every $x \in M$. In [65], Fathi proved that those two semigroups preserve the set of critical viscosity subsolutions and that, for all $t > 0$ and each continuous function $u$, the function $T_t^+u$ is locally semi-convex, while $T_t^-u$ is locally semi-concave. In [20], the idea for proving the existence of $C^{1,1}$ critical subsolutions on compact manifolds is the following. It is a known fact that a function is $C^{1,1}$ if and only if it is both semi-concave and semi-convex. Let now $u$ be a critical viscosity subsolution. If we apply the semigroup $T_t^+$ to $u$, we obtain a semi-convex critical viscosity subsolution $T_t^+u$. Thus, if one proves that, for $s$ small enough, $T_s^-T_t^+u$ is still semi-convex, as we already know that it is semi-concave, we would have found a $C^{1,1}$ critical subsolution. Since we want to give a uniform bound on the Lipschitz constant of the derivative of the $C^{1,1}_{loc}$ critical subsolution on compact sets, we will have to bound the constant of semi-convexity of $T_t^+u$ on compact subsets of $M$. Let us now prove the result in the noncompact case.
Let $u : M \to \mathbb{R}$ be a critical viscosity subsolution. Let $t > 0$ be fixed; we set $v := T_t^+u$.
Lemma 4.3.3. The function $v : M \to \mathbb{R}$ is locally semi-convex on $M$. Moreover, there exists a bounded subset $\mathcal{F}$ of $C^2(\mathring{K}_4, \mathbb{R})$ whose bound depends only on $t$ (not on $u$) verifying
\[
v(x) = \max_{f\in\mathcal{F}} f(x), \qquad \forall x \in K_4,
\]
and such that for each $x \in K_4$ and each $p \in D^-v(x)$ there is $f \in \mathcal{F}$ such that $f(x) = v(x)$ and $d_xf = p$.
Proof. Let $x \in K_4$ be fixed. From the definition of $T_t^+u(x)$, we have
\[
\begin{aligned}
T_t^+u(x) &\ge u(x) - h_t(x,x) - c(H)t\\
&\ge u(x) - tL(x,0) - c(H)t\\
&\ge u(x) - (A(0) + c(H))t,
\end{aligned}
\]
where $A(0) := \sup_{x\in M}\{L(x,0)\}$ is finite thanks to the uniform superlinearity of $H$ in the fibers. On the other hand, by Lemma 4.3.2 and (4.3.1), we have for every $y \in M$,
\[
\begin{aligned}
u(y) - h_t(x,y) - c(H)t &\le u(x) + K'd(x,y) - 2K'd(x,y) - C(K')t - c(H)t\\
&\le u(x) - K'd(x,y) - (C(K') + c(H))t.
\end{aligned}
\]
Therefore, the supremum in the definition of $T_t^+u(x)$ is necessarily attained at a point $y_x \in M$ satisfying
\[
d(x,y_x) \le \frac{A(0) - C(K')}{K'}\,t.
\]
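The localization of $y_x$ follows by chaining the lower bound on $T_t^+u(x)$ with the upper bound on each competitor. A sketch, evaluated at a point $y_x$ where the supremum is attained:

```latex
\begin{align*}
u(x) - (A(0)+c(H))\,t \;\le\; T_t^+u(x)
 &= u(y_x) - h_t(x,y_x) - c(H)\,t\\
 &\le u(x) - K'd(x,y_x) - (C(K')+c(H))\,t,\\
K'\,d(x,y_x) &\le \bigl(A(0)-C(K')\bigr)\,t .
\end{align*}
```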
Denote by $K_x$ the set of $y \in M$ such that $d(x,y) \le \frac{A(0) - C(K')}{K'}\,t$, and by $K$ the union of $K_x$ for $x \in K_4$. $K$ is a compact subset of $M$. By Proposition 6.2.17 of the Appendix, there is a compact set $\tilde K \subset M$ and a constant $A > 0$ such that every curve $\gamma : [0,t] \to M$ with $\gamma(0), \gamma(t) \in K$ and $\int_0^t L(\gamma(s), \dot\gamma(s))\,ds = h_t(\gamma(0), \gamma(t))$ satisfies
\[
\gamma([0,t]) \subset \tilde K, \qquad\text{and}\qquad \|\dot\gamma(s)\|_{\gamma(s)} \le A \quad \forall s \in [0,t]. \tag{4.3.2}
\]
Let $x \in K_4$. By construction of $K_x$, there is $y_x \in K_x$ such that
\[
T_t^+u(x) = u(y_x) - \int_0^t L(\gamma_x(s), \dot\gamma_x(s))\,ds - c(H)t,
\]
where $\gamma_x : [0,t] \to M$ is a curve such that $\gamma_x(0) = x$, $\gamma_x(t) = y_x$ and
\[
h_t(x,y_x) = \int_0^t L(\gamma_x(s), \dot\gamma_x(s))\,ds.
\]
Now, for any $x \in K_4$ and $y \in K_x$, there are $V_x$ an open neighbourhood of $x$ and $\Gamma_{x,y} : V_x \times [0,t] \to M$ a smooth mapping such that $\Gamma_{x,y_x}(x,\cdot) = \gamma_x(\cdot)$, and such that for every $x' \in V_x$ and $y \in K_x$, $\Gamma_{x,y}(x',\cdot)$ is a smooth curve joining $x'$ to $y$. We have
\[
\begin{aligned}
T_t^+u(x') &\ge u(y_x) - \int_0^t L\Bigl(\Gamma_{x,y_x}(x',s), \frac{d\Gamma_{x,y_x}}{ds}(x',s)\Bigr)\,ds - c(H)t\\
&\ge T_t^+u(x) + \phi_{x,y_x}(x'),
\end{aligned}
\]
where the function $\phi_{x,y_x} : V_x \to \mathbb{R}$ is defined by
\[
\phi_{x,y_x}(x') := \int_0^t \Bigl[L\Bigl(\Gamma_{x,y_x}(x,s), \frac{d\Gamma_{x,y_x}}{ds}(x,s)\Bigr) - L\Bigl(\Gamma_{x,y_x}(x',s), \frac{d\Gamma_{x,y_x}}{ds}(x',s)\Bigr)\Bigr]\,ds
\]
for all $x' \in V_x$. The function $\phi_{x,y_x}$ is smooth and satisfies $\phi_{x,y_x}(x) = 0$. By construction (because of the compactness of the set $\tilde K$), it is clear that the set of functions $\{\phi_{x,y_x}\}_{x\in K_4}$ can be bounded in $C^2(\mathring{K}_4)$. More generally, the whole family $\mathcal{G} := \{\phi_{x,y}\}_{x\in K_4,\, y\in K_x}$ is bounded in $C^2$.
Since K4 is compact, up to working in local charts, and using standard arguments to
extend the C 2 functions of our family constructed in charts to an open neighborhood of
K4 in such a way to preserve a global C 2 bound, we can assume that we are in Rn . Thus,
by [119, Proposition 6] applied to −Tt+ u, we obtain that v = Tt+ u is σ-semiconvex on
K4 , with the constant σ depending only on the C 2 bound of the family G (and therefore
is independent of the subsolution u). Now, by [119, Proposition 7], for any x ∈ K4 and
any p ∈ D− v(x) there exists a parabola Px,p with second derivative bounded by σ which
touches $v$ from below at $x$ with $d_xP_{x,p} = p$. By Lemma 4.3.2 we have the global bound $\|p\|_x \le K'$, and therefore $\mathcal{F} := \{P_{x,p}\}$ is the desired family.
We claim that, for $t_1, s_1 > 0$ small enough, the function $u_1 := T_{s_1}^-T_{t_1}^+u$ is $C^{1,1}$ on $K_2$ and that the Lipschitz constant of its derivative can be bounded independently of $u$. In order to prove this claim, we will show that, for $s$ small enough, we have
\[
T_s^-T_t^+u(x) = \min_{y\in K_3}\big\{T_t^+u(y) + h_s(y,x) + c(H)s\big\}, \quad \forall x\in K_2. \tag{4.3.3}
\]
Once we have proved this, the problem of proving $C^{1,1}$ regularity in $K_2$ will be exactly the same as in the compact case, and so the proof in [20] will work.
Indeed, again as in [20], for $s$ small enough $T_s^-(\mathcal F)$ is a bounded subset of $C^2(\mathring K_3)$ and, by (4.3.3), one can write
\[
T_s^-v = \max_{f\in\mathcal F} T_s^-f \quad\text{on } K_2,
\]
which implies that $T_s^-T_t^+u$ is $C^{1,1}$ on $K_2$. Moreover, we can assume that $s$ is sufficiently small so that $T_s^-(\mathcal F)$ is bounded in $C^2$ by a constant $\sigma'$ which is still independent of $u$, and this implies that the $C^{1,1}$ bound is independent of the particular subsolution $u$. Let us now prove (4.3.3).
Set $v := T_t^+u$ and fix $s > 0$. We recall that $v$ is a critical viscosity subsolution. Since $v$ is $K'$-Lipschitz on $M$, we deduce that for any $x, y \in M$,
\[
\begin{aligned}
v(y) + h_s(y,x) + c(H)s &\ge v(x) - K'd(x,y) + 2K'd(x,y) + C(K')s + c(H)s \\
&\ge v(x) + K'd(x,y) + (C(K') + c(H))s.
\end{aligned}
\]
But taking $y = x$ in the formula defining $T_s^-v(x)$ yields, for any $x \in M$,
\[
\begin{aligned}
T_s^-v(x) &\le v(x) + h_s(x,x) + c(H)s \\
&\le v(x) + sL(x,0) + c(H)s \le v(x) + (A(0) + c(H))s.
\end{aligned}
\]
Consequently, we deduce that, for every $x \in M$, the infimum in the definition of $T_s^-v(x)$ is necessarily attained at a point $y_x \in M$ satisfying
\[
d(x,y_x) \le \frac{A(0) - C(K')}{K'}\,s.
\]
So (4.3.3) follows by taking $s$ such that $\frac{A(0)-C(K')}{K'}\,s \le \operatorname{dist}(K_2, K_3^c)$. As we said above, the proof given in [20] now allows us to say that $u_1$ is $C^{1,1}$ in $K_2$. Let us now define $u_2(x) := T_{s_2}^-T_{t_2}^+u_1(x)$. Arguing as above we get that, for $t_2, s_2$ small enough, $u_2$ is $C^{1,1}$ in $K_3$. We now claim that, taking $t_2, s_2$ sufficiently small, we also have
\[
\operatorname{Lip}_{K_1}(d_xu_2) \le \Big(1 + \frac12\Big)\operatorname{Lip}_{K_2}(d_xu_1), \tag{4.3.4}
\]
where, for a function $f$, we denote by $\operatorname{Lip}_{K_n}(f)$ the Lipschitz constant of $f$ on $K_n$. This simply follows by observing that, if we denote by $\Gamma_{u_2} \subset T^*M$ the graph of the differential of $u_2$, as $d_xu_2$ is Lipschitz on $K_2$, the evolution of $u_2$ in $K_1$ by the two semigroups corresponds to the evolution of $\Gamma_{u_2}$ in $T^*K_1$ by the Hamiltonian flow, at least for small times (see [20]). Thus the smoothness of the Hamiltonian flow tells us that the Lipschitz constant of $d_xu_2$ converges uniformly in $K_1$ to the Lipschitz constant of $d_xu_1$. In particular, for $t_2, s_2$ sufficiently small, we get (4.3.4). Now we iterate the construction in the following way:
\[
u_{n+1}(x) := T^-_{s_{n+1}}T^+_{t_{n+1}}u_n(x),
\]
with $t_{n+1}, s_{n+1}$ small enough so that
\[
u_{n+1} \text{ is } C^{1,1} \text{ in } K_{n+2}, \qquad \operatorname{Lip}_{K_n}(d_xu_{n+1}) \le \Big(1 + \frac{1}{2^n}\Big)\operatorname{Lip}_{K_{n+1}}(d_xu_n).
\]
In this way, by the Ascoli-Arzelà theorem and a diagonal argument, we find a subsequence $u_{n_k}$ which converges in the $C^1$ topology to a function $u_\infty \in C^1$. Moreover, as
\[
\prod_{n=1}^{\infty}\Big(1 + \frac{1}{2^n}\Big) < +\infty,
\]
for a fixed compact $K_l$ we have a uniform bound for the Lipschitz constant of $d_xu_n$ on $K_l$ for any $n$. This implies that $u_\infty \in C^{1,1}_{loc}$. □
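The convergence of the infinite product used in the last step can be checked numerically. The following Python sketch is an illustration only and is not part of the original proof; the function name `partial_product` is ours. It evaluates the partial products of $\prod_n (1 + 2^{-n})$ and compares them with the elementary bound $\exp(\sum_n 2^{-n}) = e$, which follows from $1 + a \le e^a$.

```python
import math


def partial_product(n_terms: int) -> float:
    """Return prod_{n=1}^{n_terms} (1 + 2**-n)."""
    prod = 1.0
    for n in range(1, n_terms + 1):
        prod *= 1.0 + 0.5 ** n
    return prod


if __name__ == "__main__":
    # The partial products increase and stay below exp(1),
    # because 1 + a <= exp(a) and sum_{n>=1} 2**-n = 1.
    for n_terms in (1, 5, 10, 50):
        print(n_terms, partial_product(n_terms))
    print("upper bound exp(1):", math.e)
```

The partial products are increasing and bounded, so they converge; this is exactly what gives a bound on $\operatorname{Lip}_{K_l}(d_xu_n)$ that does not depend on $n$.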
4.4 Proofs of Theorems 4.1.1, 4.1.2, 4.1.4

4.4.1 Proof of Theorem 4.1.1
Let us first assume that $\dim M = 1, 2$. Without loss of generality, we can assume that we work in a relatively compact open set $O \subset \mathbb R^n$ with $n = 1, 2$. Our aim is to apply Lemma 4.2.4 with $E = \mathcal A \cap O$, $(X, d_X) = (M, d_M)$, $\Psi = \mathrm{Id}$ and $\alpha = 1$. By Lemma 4.2.1, we know that for every $x, y \in \mathcal A$,
\[
d_M(x,y) = \max_{u_1,u_2\in\mathcal S_-}\big\{(u_1-u_2)(y) - (u_1-u_2)(x)\big\}.
\]
Let $u_1, u_2 : M \to \mathbb R$ be two fixed weak KAM solutions. It is well known that the mappings $x \mapsto d_xu_1$ and $x \mapsto d_xu_2$ coincide and are locally Lipschitz on $\mathcal A$ (see [65]). Thus there is a constant $C > 0$ which does not depend on $u_1$ and $u_2$ such that, if we set $v := u_1 - u_2$, we have
\[
|v(y) - v(x)| \le C|x-y|^2, \quad \forall x,y \in \mathcal A\cap O.
\]
Since $u_1$ and $u_2$ are arbitrary, we get
\[
d_M(x,y) \le C|x-y|^2, \quad \forall x,y \in \mathcal A\cap O.
\]
By Lemma 4.2.4, we deduce that if $\dim M = 1, 2$, $(\mathcal A_M, d_M)$ has vanishing one-dimensional Hausdorff measure.
Let us now assume that $\dim M = 3$. The fact that $\mathcal A^0_M$ has vanishing one-dimensional Hausdorff measure will follow from Theorem 4.1.2. So it suffices to prove that the semi-metric space $\mathcal A \setminus \mathcal A^0$ has vanishing one-dimensional Hausdorff measure. Set $\mathcal A' := \mathcal A \setminus \mathcal A^0$ and consider $x \in \mathcal A'$. From [69, Proposition 5.2], there exists a curve $\gamma : \mathbb R \to M$ such that $\gamma(0) = x$ and such that each critical viscosity subsolution $u : M \to \mathbb R$ satisfies
\[
u(\gamma(t')) - u(\gamma(t)) = \int_t^{t'} L(\gamma(s),\dot\gamma(s))\,ds + c(H)(t'-t),
\]
for all $t < t' \in \mathbb R$. In particular, each such $u$ is differentiable at each point of $\gamma$ and satisfies
\[
d_{\gamma(t)}u = \frac{\partial L}{\partial v}(\gamma(t),\dot\gamma(t)), \quad \forall t\in\mathbb R
\]
(see [65]). Therefore we deduce that the function $v : M \to \mathbb R$ defined as $v := u_1 - u_2$ is constant along the orbits of the Aubry set. Since this is true for any pair of weak KAM solutions, this implies that $d_M(\gamma(s),\gamma(t)) = 0$ for all $s, t \in \mathbb R$. As a consequence, it suffices to prove that the set $(\mathcal A' \cap S, d_M)$ has vanishing one-dimensional Hausdorff measure for each open surface $S \subset M$ which is locally transverse to the orbits of the Aubry set. As above, this is a consequence of the fact that the mapping $x \mapsto d_xu$ is locally Lipschitz on $\mathcal A$ (with a constant which does not depend on $u$) and Lemma 4.2.4.

4.4.2 Proof of Theorem 4.1.2

Without loss of generality, we can assume that we work on a relatively compact open set $O \subset \mathbb R^n$ and that $c(H) = 0$. Set for every $x \in O$,
\[
\tilde H(x) := \min_{p\in\mathbb R^n}\{H(x,p)\} = -L(x,0),
\]
and
\[
\tilde p(x) := \frac{\partial L}{\partial v}(x,0).
\]
The function $\tilde H$ is of class $C^{k,1}$ on $O$ and satisfies for every $x \in O$,
\[
\tilde H(x) = H(x,\tilde p(x)) \le 0.
\]
Moreover, we notice that, by strict convexity of $H$ and the fact that $O$ is relatively compact, there exists $\alpha \ge 0$ such that for every $x \in O$,
\[
H(x,p) \le 0 \implies |p - \tilde p(x)| \le \alpha\sqrt{-\tilde H(x)}. \tag{4.4.1}
\]
Using Lemma 4.2.2, we can decompose $\mathcal A^0$ as
\[
\mathcal A^0 = \bigcup_{i\in\mathbb N} A_i.
\]
First, since $A_0$ is countable, it has zero Hausdorff dimension. Let us now show that each $A_i$ has vanishing one-dimensional Hausdorff measure. Let $i \ge 1$ be fixed. Since $\tilde H$ is a $C^{k,1}$ function vanishing on $\mathcal A^0$, by (4.2.1) we know that
\[
-\tilde H(x) = \big|\tilde H(x) - \tilde H(y)\big| \le M_i|x-y|^{k+1}, \quad \forall y\in A_i,\ \forall x\in B_i.
\]
Hence, from (4.4.1), we have for every $C^1$ critical subsolution $u : M \to \mathbb R$,
\[
|d_xu - \tilde p(x)| \le \alpha\sqrt{M_i}\,|x-y|^{\frac{k+1}{2}}, \quad \forall y\in A_i,\ \forall x\in B_i,
\]
which gives for every pair of $C^1$ critical subsolutions $u_1, u_2 : M \to \mathbb R$,
\[
|d_x(u_1-u_2)| \le C_i|x-y|^{\frac{k+1}{2}}, \quad \forall y\in A_i,\ \forall x\in B_i,
\]
for a certain constant $C_i$. By Lemma 4.2.3, integrating along a path from $x$ to $y$ in $B_i$ yields
\[
|(u_1-u_2)(y) - (u_1-u_2)(x)| \le \tilde C C_i|x-y|^{\frac{k+1}{2}+1}, \quad \forall x,y\in A_i.
\]
By Lemma 4.2.1, we deduce that
\[
d_M(x,y) \le \tilde C C_i|x-y|^{\frac{k+1}{2}+1}, \quad \forall x,y\in A_i.
\]
In order to conclude that $A_i$ has vanishing one-dimensional Hausdorff measure, it suffices, from Lemma 4.2.4, to have $k$ such that $\frac{k+1}{2}+1 \ge n$, that is $k \ge 2n-3$. As a consequence, we proved that the one-dimensional Hausdorff measure of $(\mathcal A^0_M, d_M)$ vanishes as soon as $k \ge 2n-3$. Finally, the fact that the $\alpha$-dimensional Hausdorff measure of $(\mathcal A^0_M, d_M)$ vanishes whenever $\alpha \in (0,1]$ is such that $\alpha\big(\frac{k+1}{2}+1\big) \ge \dim M$ follows from the same arguments.
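The arithmetic behind the threshold $k \ge 2n-3$ can be checked with exact rational arithmetic. The sketch below is only an illustration; the helper names `holder_exponent`, `min_regularity` and `critical_alpha` are ours and do not appear in the text. It searches for the smallest $k$ with $\frac{k+1}{2}+1 \ge n$ and computes the smallest $\alpha$ satisfying $\alpha(\frac{k+1}{2}+1) \ge n$.

```python
from fractions import Fraction


def holder_exponent(k: int) -> Fraction:
    """Exponent (k+1)/2 + 1 in the bound d_M(x, y) <= C |x - y|^exponent."""
    return Fraction(k + 1, 2) + 1


def min_regularity(n: int) -> int:
    """Smallest integer k >= 0 with (k+1)/2 + 1 >= n, found by direct search."""
    k = 0
    while holder_exponent(k) < n:
        k += 1
    return k


def critical_alpha(k: int, n: int) -> Fraction:
    """Smallest alpha with alpha * ((k+1)/2 + 1) >= n.

    A value above 1 means that no alpha in (0, 1] satisfies the condition.
    """
    return Fraction(n) / holder_exponent(k)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, min_regularity(n), 2 * n - 3)
```

For instance, with $n = 3$ and $k = 5$ the exponent is $4$, so the condition holds for every $\alpha \ge 3/4$.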

4.4.3 Proof of Theorem 4.1.4

Let $\tilde{\mathcal A}^p$ be the set of points in the Aubry set $\tilde{\mathcal A}$ that project on $\mathcal A^p$. For every $x \in \mathcal A$, let us denote by $T(x)$ the period of the unique orbit of the Euler-Lagrange flow in $\tilde{\mathcal A}$ passing through $x$. Fix $\bar x \in \mathcal A^p$ and denote by $\bar\gamma : [0,T(\bar x)] \to M$ the path such that $\bar\gamma(0) = \bar\gamma(T(\bar x)) = \bar x$ and satisfying
\[
h_{T(\bar x)}(\bar x,\bar x) + c(H)T(\bar x) = \int_0^{T(\bar x)} L(\bar\gamma(s),\dot{\bar\gamma}(s))\,ds + c(H)T(\bar x) = 0.
\]
Let $S$ be a smooth hypersurface in $M$ which is locally transverse to $\bar\gamma$ at $\bar x$ and $E$ be the fiber bundle over $S$, that is, the set of $(x,v) \in TM$ with $x \in S$. For every $(x,v) \in E$, let $\tau(x,v)$ be the first time $t > 0$ such that $\phi_t(x,v) \in E$. For the sake of simplicity, for every $(x,v) \in E$, we denote by $\gamma_{x,v} : [0,\tau(x,v)] \to M$ the trajectory of the Euler-Lagrange flow starting at $(x,v)$. We define the mapping $\theta : E \to S$ by
\[
\theta(x,v) := \gamma_{x,v}(\tau(x,v)), \quad \forall (x,v)\in E.
\]
The mapping $\theta$ plays the role of the Poincaré map (or first return map) associated with $\bar\gamma$ and $S$; it is well defined in a small neighbourhood $N \subset E$ of $(\bar x,\bar v)$ and it is of class $C^{k-1}$. Denote by $d_g^S(\cdot,\cdot)$ the distance on $S$ which corresponds to the Riemannian metric induced by $g$ on $S$. We recall that the mapping $(x,y) \mapsto d_g^S(x,y)^2$ is smooth in a small neighbourhood $N_S$ of $\bar x$ in $S$. Without loss of generality we can assume that $\theta(N) \subset N_S$. Define the mapping $\Psi : N \to \mathbb R$ by
\[
\Psi(x,v) := \Big(\int_0^{\tau(x,v)} L(\gamma_{x,v}(s),\dot\gamma_{x,v}(s))\,ds + c(H)\tau(x,v)\Big)^2 + d_g^S(\theta(x,v),x)^2, \quad \forall (x,v)\in N.
\]
By construction, there exists $\delta(\bar x) > 0$ such that, for every $(x,v) \in \tilde{\mathcal A}^p \cap N$, we have
\[
T(x) \in (\bar T - \delta(\bar x), \bar T + \delta(\bar x)) \implies \Psi(x,v) = 0,
\]
where $\bar T := T(\bar x)$. Denote by $\tilde{\mathcal A}_{\bar x}$ the set of $(x,v) \in \tilde{\mathcal A}^p \cap N$ such that $T(x) \in (\bar T - \delta(\bar x), \bar T + \delta(\bar x))$, and by $\mathcal A_{\bar x}$ its projection on $M$. Furthermore, we notice that for every $(x,v) \in N$, if we consider a minimizing geodesic with unit speed (for the Riemannian metric $g$ on $S$) $\gamma : [0, d_g^S(\theta(x,v),x)] \to S$ joining $\theta(x,v)$ to $x$, we have
\[
h_{\rho(x,v)}(x,x) \le \int_0^{\tau(x,v)} L(\gamma_{x,v}(s),\dot\gamma_{x,v}(s))\,ds + \int_0^{d_g^S(\theta(x,v),x)} L(\gamma(s),\dot\gamma(s))\,ds,
\]
where $\rho(x,v)$ is defined by
\[
\rho(x,v) := \tau(x,v) + d_g^S(\theta(x,v),x), \quad \forall (x,v)\in N.
\]
Hence, if we denote by $J$ an upper bound for $|L(x,v)|$ with $x \in N_S$ and $v \in T_xS$ satisfying $|v|_g \le 1$, we obtain for every $(x,v) \in N$,
\[
\begin{aligned}
h_{\rho(x,v)}(x,x) + c(H)\rho(x,v) &\le \Big|\int_0^{\tau(x,v)} L(\gamma_{x,v}(s),\dot\gamma_{x,v}(s))\,ds + c(H)\tau(x,v)\Big| + (c(H)+J)\,d_g^S(\theta(x,v),x) \\
&\le (1 + c(H) + J)\sqrt{\Psi(x,v)}.
\end{aligned} \tag{4.4.2}
\]
Without loss of generality, we can assume from now on that we work in $\mathbb R^n$. From Lemma 4.2.2, we can decompose the set $\tilde{\mathcal A}_{\bar x}$ as
\[
\tilde{\mathcal A}_{\bar x} = \bigcup_{i\in\mathbb N}\tilde A_i.
\]
Let $i \ge 1$ be fixed. Since $\Psi$ is of class $C^{k-1}$ on $N$, by (4.2.1) we know that
\[
0 \le \Psi(x,v) \le M_i\big(|x-y|^2 + |v-w|^2\big)^{\frac{k-1}{2}}, \quad \forall (y,w)\in\tilde A_i,\ \forall (x,v)\in\tilde B_i. \tag{4.4.3}
\]
We now need the following result.
Lemma 4.4.1. Let $L > 0$, $t_0 > 0$, $K$ be a compact subset of $M$, and $O$ be an open neighbourhood of $K$ in $M$. There exists $\hat M = \hat M(L,t_0,K,O) > 0$ such that, for every critical viscosity subsolution $u$ of class $C^{1,1}_{loc}$ such that the mapping $x \mapsto d_xu$ is $L$-Lipschitz on $O$, we have
\[
c(H) - H(x,d_xu) \le \hat M\{h_t(x,x) + c(H)t\}^{\frac12}, \quad \forall t\ge t_0,\ \forall x\in K.
\]
Proof. Let $x \in K$ be such that $H(x,d_xu) < c(H)$. For every $y \in M$, we set $\epsilon(y) := c(H) - H(y,d_yu)$. Since the mapping $y \mapsto \epsilon(y)$ is continuous, there exists a constant $C_L$ such that
\[
\epsilon(y) \le C_L, \quad \forall y\in K.
\]
Moreover, since $y \mapsto d_yu$ is $L$-Lipschitz on $O$, there exists $K_L > 0$ such that the mapping $y \mapsto H(y,d_yu)$ is $K_L$-Lipschitz on $O$ and $\frac{C_L}{2K_L} \le \operatorname{dist}(K,O^c)$. Hence $B\big(x,\frac{\epsilon(x)}{2K_L}\big) \subset O$ and so we have
\[
H(y,d_yu) \le c(H) - \frac{\epsilon(x)}{2}, \quad \forall y\in B\Big(x,\frac{\epsilon(x)}{2K_L}\Big).
\]
Since $K$ is compact and $L$ is uniformly superlinear in the fibers, there exists an upper bound $A$ for $\|w\|_y$ over the set of $(y,w)$ with $y \in K$ such that the corresponding periodic orbit in $TM$ minimizes $h_t(y,y)$ for some $t \ge t_0$ (this follows directly from the proof of Proposition 6.2.17 in the Appendix). Let $\gamma : [0,t] \to M$ be such that $\gamma(0) = \gamma(t) = x$ and satisfying
\[
h_t(x,x) = \int_0^t L(\gamma(s),\dot\gamma(s))\,ds.
\]
Since $\|\dot\gamma(s)\|_{\gamma(s)}$ is always bounded by $A$, we have
\[
\gamma(s) \in B\Big(x,\frac{\epsilon(x)}{2K_L}\Big), \quad \forall |s| \le \frac{\epsilon(x)}{2AK_L} =: s_0(x).
\]
Thus we have
\[
u(\gamma(s_0(x))) - u(x) \le \int_0^{s_0(x)} L(\gamma(s),\dot\gamma(s))\,ds + \Big(c(H) - \frac{\epsilon(x)}{2}\Big)s_0(x).
\]
Moreover, since $u$ is a critical viscosity subsolution, if $t \ge s_0(x)$ we have
\[
u(x) - u(\gamma(s_0(x))) \le \int_{s_0(x)}^t L(\gamma(s),\dot\gamma(s))\,ds + c(H)(t - s_0(x)).
\]
In consequence, we obtain
\[
\begin{aligned}
0 &\le \int_0^t L(\gamma(s),\dot\gamma(s))\,ds + c(H)t - \frac{\epsilon(x)}{2}s_0(x) \\
&= \{h_t(x,x) + c(H)t\} - \frac{\epsilon(x)}{2}s_0(x).
\end{aligned}
\]
Therefore we have
\[
\{h_t(x,x) + c(H)t\} \ge \frac{\epsilon(x)}{2}s_0(x) = \frac{\epsilon^2(x)}{4AK_L},
\]
which means that, as long as $s_0(x) \le t_0$,
\[
c(H) - H(x,d_xu) \le 2\sqrt{AK_L}\,\{h_t(x,x) + c(H)t\}^{\frac12}, \quad \forall t\ge t_0.
\]
Thus, in order to conclude, we need to have $s_0(x) = \frac{\epsilon(x)}{2AK_L} \le t_0$ for all $x \in K$, which is the case if
\[
\frac{C_L}{2AK_L} \le t_0 \iff AK_L \ge \frac{C_L}{2t_0}.
\]
Then we conclude
\[
c(H) - H(x,d_xu) \le \max\Big\{2\sqrt{AK_L},\ \sqrt{\frac{2C_L}{t_0}}\Big\}\{h_t(x,x) + c(H)t\}^{\frac12}, \quad \forall t\ge t_0,\ \forall x\in K.
\]
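The choice of the constant in the lemma can be made concrete as follows. This Python sketch is our reading of the last step and not the author's formulation: we assume that, when $AK_L < C_L/(2t_0)$, the product $AK_L$ is simply enlarged to $C_L/(2t_0)$, which keeps all the previous bounds valid and forces $s_0(x) \le t_0$. The names `effective_product`, `m_hat` and `epsilon_upper_bound` are hypothetical.

```python
import math


def effective_product(a_bound: float, k_lip: float, c_sup: float, t0: float) -> float:
    """Return max(A*K_L, C_L/(2*t0)).

    With this value in place of A*K_L, s0 = eps/(2*A*K_L) <= t0 whenever eps <= C_L.
    """
    return max(a_bound * k_lip, c_sup / (2.0 * t0))


def m_hat(a_bound: float, k_lip: float, c_sup: float, t0: float) -> float:
    """Constant 2*sqrt(effective product) = max(2*sqrt(A*K_L), sqrt(2*C_L/t0))."""
    return 2.0 * math.sqrt(effective_product(a_bound, k_lip, c_sup, t0))


def epsilon_upper_bound(action: float, a_bound: float, k_lip: float,
                        c_sup: float, t0: float) -> float:
    """Bound on eps(x) = c(H) - H(x, d_x u), given action = h_t(x,x) + c(H)*t >= 0."""
    return m_hat(a_bound, k_lip, c_sup, t0) * math.sqrt(action)
```

With this reading, the bound $\epsilon(x)^2 \le 4AK_L\{h_t(x,x)+c(H)t\}$ from the proof becomes `epsilon_upper_bound(action, ...)` after taking square roots.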
Returning to the proof of Theorem 4.1.4, we notice that, without loss of generality, we can assume that we work in a compact set $K$ which is included in a relatively compact open subset $O$ of $\mathbb R^n$ with $n = \dim M$. So, let us denote by $L = L(O)$ the uniform Lipschitz constant given by Theorem 4.3.1. Fix from now on a pair of $C^{1,1}_{loc}$ critical subsolutions $v_1, v_2 : M \to \mathbb R$ such that $d_xv_1, d_xv_2$ are $L$-Lipschitz on $O$. We notice that, by strict convexity of the Hamiltonian, there exists a constant $\beta > 0$ such that for every $x \in O$,
\[
H\Big(x,\frac{p_1+p_2}{2}\Big) \le \frac{H(x,p_1) + H(x,p_2)}{2} - \beta|p_1-p_2|^2, \quad \forall p_1,p_2\in T_x^*M.
\]
Hence, we have for every $x \in O$,
\[
H\Big(x,\frac{d_xv_1 + d_xv_2}{2}\Big) \le c(H) - \beta|d_xv_1 - d_xv_2|^2.
\]
By Lemma 4.4.1, applied to the critical subsolution $\frac{v_1+v_2}{2}$, we deduce that
\[
|d_xv_1 - d_xv_2|^2 \le \frac{\hat M}{\beta}\{h_t(x,x) + c(H)t\}^{\frac12}, \quad \forall t\ge\bar T - \delta(\bar x),\ \forall x\in K. \tag{4.4.4}
\]
Let $(x,v), (y,w) \in \tilde A_i$. From Lemma 4.2.3, there is a $C^1$ path $(x(\cdot),v(\cdot)) : [0,1] \to N$ in $\tilde B_i$ from $(x,v)$ to $(y,w)$ with length less than $C_i|(x,v)-(y,w)|$. Since $d_xv_1 = \mathcal L(x,v)$ and $d_yv_1 = \mathcal L(y,w)$ (where $\mathcal L$ denotes the Legendre transform), there is a constant $D > 0$ such that for every $s \in [0,1]$,
\[
|(x(s),v(s)) - (x,v)| \le C_i|(x,v)-(y,w)| \le C_i\sqrt{1+D^2}\,|x-y|.
\]
Hence, by (4.4.3), we have for every $s \in [0,1]$,
\[
\Psi(x(s),v(s)) \le M_iC_i^{k-1}\big(1+D^2\big)^{\frac{k-1}{2}}|x-y|^{k-1}.
\]
By (4.4.2), this means that for every $s \in [0,1]$,
\[
h_{\rho(x(s),v(s))}(x(s),x(s)) + c(H)\rho(x(s),v(s)) \le (1 + c(H) + J)D_i|x-y|^{\frac{k-1}{2}},
\]
for a certain constant $D_i$. By (4.4.4) we finally deduce that there exists some constant $E_i > 0$ such that, for every $s \in [0,1]$,
\[
\big|d_{x(s)}v_1 - d_{x(s)}v_2\big| \le E_i|x-y|^{\frac{k-1}{8}}.
\]
Integrating along the path, we obtain
\[
|(v_1-v_2)(x) - (v_1-v_2)(y)| \le CE_i|x-y|^{\frac{k-1}{8}+1}.
\]
Finally, from Lemma 4.2.1, we deduce that
\[
d_M(x,y) \le CE_i|x-y|^{\frac{k-1}{8}+1},
\]
for every $x, y \in \pi(\tilde A_i) \cap O$. As a consequence, we deduce easily that, if $\frac{k-1}{8}+1 \ge n$ (that is $k \ge 8n-7$), the semi-metric space $\mathcal A_{\bar x}$ has vanishing one-dimensional Hausdorff measure. Using a countable family of sets $\mathcal A_{\bar x}$ to cover $\mathcal A^p_M$, we conclude that the one-dimensional Hausdorff measure of $(\mathcal A^p_M, d_M)$ vanishes as soon as $k \ge 8n-7$. Finally, the fact that the $\alpha$-dimensional Hausdorff measure of $(\mathcal A^p_M, d_M)$ vanishes whenever $\alpha \in (0,1]$ is such that $\alpha\big(\frac{k-1}{8}+1\big) \ge \dim M$ follows from the same arguments as in the proof above.
4.4.4 A general result

We observe that most of the results proved above can be seen as particular cases of the
following general principle.
Theorem 4.4.2. Assume that $\dim M \ge 3$, $H$ is of class $C^2$, and that there are $r > 0$, $k', l \in \mathbb N$ and a function $G : TM \to \mathbb R$ of class $C^{k',1}$ which satisfies the following properties:
1. $G(x,v) \equiv 0$ on $\tilde{\mathcal A}$;
2. $\{m_r(x)\}^l \le G(x,v)$ for all $(x,v) \in TM$, where $m_r(x) := \inf_{t\ge r}\{h_t(x,x) + c(H)t\}$;
3. $k' \ge 4l(\dim M - 1) - 1$.
Then $(\mathcal A_M, d_M)$ has vanishing one-dimensional Hausdorff measure. Moreover, if $\alpha \in (0,1]$ is such that $\alpha\big(\frac{k'+1}{4l}+1\big) \ge \dim M$ then $(\mathcal A_M, d_M)$ has vanishing $\alpha$-dimensional Hausdorff measure. In particular, if $G$ is $C^\infty$, then $(\mathcal A_M, d_M)$ has zero Hausdorff dimension.
Proof. As in the proof of Theorem 4.1.4 we can assume that we work in a compact set $K$ which is included in a relatively compact open subset $O$ of $\mathbb R^n$ with $n = \dim M$. Let $L = L(O)$ be the uniform Lipschitz constant given by Theorem 4.3.1, and $v_1, v_2 : M \to \mathbb R$ be two $C^{1,1}_{loc}$ critical subsolutions such that $d_xv_1, d_xv_2$ are $L$-Lipschitz on $O$. By Lemma 4.4.1, there exists $\hat M > 0$ such that
\[
c(H) - H(x,d_xv_1) \le \hat M\{m_r(x)\}^{\frac12}, \qquad c(H) - H(x,d_xv_2) \le \hat M\{m_r(x)\}^{\frac12}, \quad \forall x\in K.
\]
Arguing now as in the proof of Theorem 4.1.4, we get
\[
|d_xv_1 - d_xv_2|^2 \le \frac{\hat M}{\beta}\{m_r(x)\}^{\frac12} \le \frac{\hat M}{\beta}G(x,v)^{\frac{1}{2l}}, \quad \forall x\in K.
\]
From Lemma 4.2.2, we can decompose the set $\tilde{\mathcal A}$ as
\[
\tilde{\mathcal A} = \bigcup_{i\in\mathbb N}\tilde A_i.
\]
Let $i \ge 0$ be fixed. Since $G$ is a nonnegative $C^{k',1}$ function vanishing on $\tilde{\mathcal A}$, by (4.2.1), we know that
\[
0 \le G(x,v) \le M_i\big(|x-y|^2 + |v-w|^2\big)^{\frac{k'+1}{2}}, \quad \forall (y,w)\in\tilde A_i,\ \forall (x,v)\in\tilde B_i.
\]
As in the proof of Theorem 4.1.4, we deduce easily that there is a constant $N_i > 0$ such that
\[
d_M(x,y) \le N_i|x-y|^{\frac{k'+1}{4l}+1},
\]
for every $x, y \in \pi(\tilde A_i) \cap O$. We now conclude as in the proofs of Theorems 4.1.2 and 4.1.4.
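Condition 3 of Theorem 4.4.2 is exactly the statement that the Hölder-type exponent $\frac{k'+1}{4l}+1$ is at least $\dim M$, and the threshold for $\alpha$ tends to zero as $k'$ grows, which is consistent with the zero Hausdorff dimension conclusion for smooth $G$. The Python sketch below checks both facts with exact fractions; it is an illustration only, and the function names are ours.

```python
from fractions import Fraction


def general_exponent(k_prime: int, l: int) -> Fraction:
    """Exponent (k'+1)/(4l) + 1 in the bound d_M(x, y) <= N_i |x - y|^exponent."""
    return Fraction(k_prime + 1, 4 * l) + 1


def condition_three(k_prime: int, l: int, n: int) -> bool:
    """Condition 3 of the theorem: k' >= 4l(n - 1) - 1, with n = dim M."""
    return k_prime >= 4 * l * (n - 1) - 1


def alpha_threshold(k_prime: int, l: int, n: int) -> Fraction:
    """Smallest alpha with alpha * ((k'+1)/(4l) + 1) >= n."""
    return Fraction(n) / general_exponent(k_prime, l)


if __name__ == "__main__":
    # The threshold decreases towards 0 as the regularity k' increases.
    for k_prime in (10, 100, 1000, 10000):
        print(k_prime, float(alpha_threshold(k_prime, 2, 3)))
```

In this sketch, taking $l = 2$ and $k' = k - 2$ gives the exponent $\frac{k-1}{8}+1$ that appears in the proof of Theorem 4.1.4; this identification is our observation and is not stated in the text.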
Remark 4.4.3. It can be shown that for every compact subset $K \subset M$, there is a constant $C_K > 0$ such that
\[
h(x,x) \le C_K\,d(x,\mathcal A)^2, \quad \forall x\in K,
\]
where $d(x,\mathcal A)$ denotes the Riemannian distance from $x$ to the set $\mathcal A$ (which is assumed to be nonempty). Therefore, from Theorem 4.4.2, we deduce that if there are $l \in \mathbb N$ and a function $G : M \to \mathbb R$ of class $C^{k',1}$ with $k' \ge 2l(\dim M - 1) - 1$ such that
\[
d(x,\mathcal A)^l \le G(x), \quad \forall x\in M,
\]
then $(\mathcal A_M, d_M)$ has vanishing one-dimensional Hausdorff measure.
4.5 Proof of Theorem 4.1.5

By Theorems 4.1.2 and 4.1.4 we know that $(\mathcal A^0_M \cup \mathcal A^p_M, d_M)$ has zero Hausdorff dimension. Theorem 4.1.5 will follow from the fact that $\mathcal A_M \setminus (\mathcal A^0_M \cup \mathcal A^p_M)$ is a finite set.
We recall that the Aubry set $\tilde{\mathcal A} \subset TM$ is defined as the set of $(x,v) \in TM$ such that $x \in \mathcal A$ and $v$ is the unique element of $T_xM$ such that $d_xu = \frac{\partial L}{\partial v}(x,v)$ for any critical viscosity subsolution $u$. This set is invariant under the Euler-Lagrange flow $\phi_t^L$. For every $x \in \mathcal A$, we denote by $O(x)$ the projection on $\mathcal A$ of the orbit of $\phi_t^L$ which passes through $x$. We observe that the following simple fact holds:
Lemma 4.5.1. If $x, y \in \mathcal A$ and $O(x) \cap O(y) \ne \emptyset$, then $d_M(x,y) = 0$.
Let us define
\[
C_0 := \{x\in\mathcal A \mid O(x)\cap\mathcal A^0 \ne \emptyset\}, \qquad C_p := \{x\in\mathcal A \mid O(x)\cap\mathcal A^p \ne \emptyset\}.
\]
Thus, if $x \in C_0 \cup C_p$, by Lemma 4.5.1 the Mather distance between $x$ and $\mathcal A^0 \cup \mathcal A^p$ is zero, and we are done.
Let us now define $C := \mathcal A \setminus (C_0 \cup C_p)$, and let $(C_M, d_M)$ be the quotient metric space. To conclude the proof, we show that this set consists of a finite number of points.
Let $u$ be a $C^{1,1}$ critical subsolution (whose existence is provided by [20]), and let $X$ be the Lipschitz vector field uniquely defined by the relation
\[
\mathcal L(x,X(x)) = (x,d_xu),
\]
where $\mathcal L$ denotes the Legendre transform. Its flow extends to the whole manifold the flow considered above on $\mathcal A$. We fix $x \in C$. Then $O(x)$ is a non-empty, compact, invariant set which contains a non-trivial minimal set for the flow of $X$ (see [115, Chapter 1]). By [101], we know that there exists at most a finite number of such non-trivial minimal sets. Therefore, again by Lemma 4.5.1, $(C_M, d_M)$ consists only of a finite number of points.
4.6 Applications in Dynamics

4.6.1 Preliminary results
Throughout this section, $M$ is assumed to be compact. As before, $H : T^*M \to \mathbb R$ is a Hamiltonian of class at least $C^2$ satisfying the three usual conditions (H1)-(H3), and $L$ is the Tonelli Lagrangian which is associated to it by Fenchel's duality. We denote by $\mathcal{SS}$ the set of critical viscosity subsolutions and by $\mathcal S_-$ the set of weak KAM solutions. Hence $\mathcal S_- \subset \mathcal{SS}$. If $u : M \to \mathbb R$ is a critical viscosity subsolution, we recall (see [65]) that the set $\tilde{\mathcal I}(u)$ is the subset of $TM$ defined by
\[
\tilde{\mathcal I}(u) = \{(x,v)\in TM \mid t\mapsto\gamma(t) := \pi\phi_t(x,v) \text{ is } (u,L,c(H))\text{-calibrated on } (-\infty,+\infty)\},
\]
that is, for all $t_1 < t_2 \in \mathbb R$,
\[
u(\gamma(t_2)) - u(\gamma(t_1)) = \int_{t_1}^{t_2} L(\gamma(t),\dot\gamma(t))\,dt + c(H)(t_2 - t_1).
\]
The following properties of $\tilde{\mathcal I}(u)$ are shown in [65].
Theorem 4.6.1. The set $\tilde{\mathcal I}(u)$ is invariant under the Euler-Lagrange flow $\phi_t^L$. Moreover, if $(x,v) \in \tilde{\mathcal I}(u)$, then $d_xu$ exists, and we have
\[
d_xu = \frac{\partial L}{\partial v}(x,v) \quad\text{and}\quad H(x,d_xu) = c(H).
\]
It follows that:
1) The restriction $\pi|_{\tilde{\mathcal I}(u)}$ of the projection is injective; therefore, if we set $\mathcal I(u) = \pi(\tilde{\mathcal I}(u))$, then $\tilde{\mathcal I}(u)$ is a continuous graph over $\mathcal I(u)$.
2) The map $x \mapsto d_xu$ is continuous on $\mathcal I(u)$.
Moreover the following results hold (see [65] or [63, Théorème 1]).
Theorem 4.6.2. 1) Two weak KAM solutions that coincide on $\mathcal A$ are equal everywhere.
2) For every $u \in \mathcal{SS}$, there is a unique weak KAM solution $u_- : M \to \mathbb R$ such that $u_- = u$ on $\mathcal A$; moreover, the two functions $u$ and $u_-$ are also equal on $\mathcal I(u)$.
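The Aubry set $\tilde{\mathcal A}$ is given by
\[
\tilde{\mathcal A} := \bigcap_{u\in\mathcal{SS}}\tilde{\mathcal I}(u) = \bigcap_{u\in\mathcal S_-}\tilde{\mathcal I}(u)
\]
(the equivalence between the definition with $\mathcal{SS}$ and the one with $\mathcal S_-$ can be easily shown from the results of [65]). The projected Aubry set $\mathcal A$ is simply the image $\pi(\tilde{\mathcal A})$. We also have
\[
\mathcal A := \bigcap_{u\in\mathcal{SS}}\mathcal I(u) = \bigcap_{u\in\mathcal S_-}\mathcal I(u).
\]
The Mañé set $\tilde{\mathcal N}$ is given by
\[
\tilde{\mathcal N} := \bigcup_{u\in\mathcal{SS}}\tilde{\mathcal I}(u) = \bigcup_{u\in\mathcal S_-}\tilde{\mathcal I}(u).
\]
Both $\tilde{\mathcal A}$ and $\tilde{\mathcal N}$ are compact subsets of $TM$ invariant under the Euler-Lagrange flow $\phi_t^L$ of $L$.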
Theorem 4.6.3 (Mañé). Each point of the invariant set $\tilde{\mathcal A}$ is chain-recurrent for the restriction $\phi_t^L|_{\tilde{\mathcal A}}$. Moreover, the invariant set $\tilde{\mathcal N}$ is chain-transitive for the restriction $\phi_t^L|_{\tilde{\mathcal N}}$.
Corollary 4.6.4. The restriction $\phi_t^L|_{\tilde{\mathcal A}}$ to the invariant subset $\tilde{\mathcal A}$ is chain-transitive if and only if $\tilde{\mathcal A}$ is connected.
Proof. This is an easy well-known result in the theory of Dynamical Systems: suppose $\theta_t$, $t \in \mathbb R$, is a flow on the compact metric space $X$. If every point of $X$ is chain-recurrent for $\theta_t$, then $\theta_t$ is chain-transitive if and only if $X$ is connected.
We now give the general relationship between uniqueness of weak KAM solutions and the quotient Mather set.
Proposition 4.6.5. The two following statements are equivalent:
1) Any two weak KAM solutions differ by a constant.
2) The Mather quotient $(\mathcal A_M, d_M)$ is trivial, i.e. is reduced to one point.
Moreover, if any one of these conditions is true, then $\tilde{\mathcal A} = \tilde{\mathcal N}$, and therefore $\tilde{\mathcal A}$ is connected and the restriction of the Euler-Lagrange flow $\phi_t^L$ to $\tilde{\mathcal A}$ is chain-transitive.
Proof. For every fixed $x \in M$, the function $y \mapsto h(x,y)$ is a weak KAM solution. Therefore if we assume that any two weak KAM solutions differ by a constant, then for $x_1, x_2 \in M$ we can find a constant $C_{x_1,x_2}$ such that
\[
\forall y\in M, \quad h(x_1,y) = C_{x_1,x_2} + h(x_2,y).
\]
If $x_2 \in \mathcal A$, then $h(x_2,x_2) = 0$; therefore, evaluating the equality above for $y = x_2$, we obtain $C_{x_1,x_2} = h(x_1,x_2)$. Substituting in the equality and evaluating at $y = x_1$, we conclude
\[
\forall x_1\in M,\ \forall x_2\in\mathcal A, \quad h(x_1,x_1) = h(x_1,x_2) + h(x_2,x_1).
\]
This implies
\[
\forall x_1,x_2\in\mathcal A, \quad h(x_1,x_2) + h(x_2,x_1) = 0,
\]
which means that $d_M(x_1,x_2) = 0$ for every $x_1, x_2 \in \mathcal A$.
To prove the converse, let us recall that for every critical subsolution $u$, we have
\[
\forall x,y\in M, \quad u(y) - u(x) \le h(x,y).
\]
Therefore, applying this to a pair $u_1, u_2 \in \mathcal{SS}$, we obtain
\[
\forall x,y\in M, \quad u_1(y) - u_1(x) \le h(x,y), \qquad u_2(x) - u_2(y) \le h(y,x).
\]
Adding and rearranging, we obtain
\[
\forall x,y\in M, \quad (u_1-u_2)(y) - (u_1-u_2)(x) \le h(x,y) + h(y,x).
\]
Since the right hand side is symmetric in $x, y$, we obtain
\[
\forall x,y\in M, \quad |(u_1-u_2)(y) - (u_1-u_2)(x)| \le h(x,y) + h(y,x).
\]
If we assume that 2) is true, this implies that $u_1 - u_2$ is equal to a constant $c$ on the projected Aubry set $\mathcal A$, that is $u_1 = u_2 + c$ on $\mathcal A$. If $u_1, u_2$ are in fact weak KAM solutions then we must have $u_1 = u_2 + c$ on $M$, because any two weak KAM solutions equal on the Aubry set are equal everywhere, see 1) of Theorem 4.6.2.
It remains to show the last statement. Notice that if $u_1, u_2 \in \mathcal{SS}$ differ by a constant then $\tilde{\mathcal I}(u_1) = \tilde{\mathcal I}(u_2)$. Therefore if any two elements in $\mathcal S_-$ differ by a constant, then
\[
\tilde{\mathcal A} = \tilde{\mathcal I}(u) = \tilde{\mathcal N},
\]
where $u$ is any element in $\mathcal S_-$. But, by Mañé's Theorem 4.6.3, the invariant set $\tilde{\mathcal N}$ is chain-transitive for the flow $\phi_t$, hence it is connected by Corollary 4.6.4.
We now denote by $X_L$ the Euler-Lagrange vector field of $L$, that is the vector field on $TM$ that generates $\phi_t^L$. We recall that an important property of $X_L$ is that
\[
\forall (x,v)\in TM, \quad T\pi(X_L(x,v)) = v,
\]
where $T\pi : T(TM) \to TM$ denotes the canonical projection.
Here is a last ingredient that we will have to use.
Proposition 4.6.6 (Lyapunov Property). Suppose $u_1, u_2 \in \mathcal{SS}$. The function $(u_1-u_2)\circ\pi$ is non-increasing along any orbit of the Euler-Lagrange flow $\phi_t^L$ contained in $\tilde{\mathcal I}(u_2)$. If we assume $u_1$ is differentiable at $x \in \mathcal I(u_2)$, and $(x,v) \in \tilde{\mathcal I}(u_2)$, then, using that $u_2$ is differentiable on $\mathcal I(u_2)$, we obtain
\[
X_L\cdot[(u_1-u_2)\circ\pi](x,v) = d_xu_1(v) - d_xu_2(v) \le 0.
\]
Moreover, the inequality above is an equality if and only if $d_xu_1 = d_xu_2$. In that case $H(x,d_xu_1) = H(x,d_xu_2) = c(H)$.
Proof. If $(x,v) \in \tilde{\mathcal I}(u_2)$ then $t \mapsto \pi\phi_t(x,v)$ is $(u_2,L,c(H))$-calibrated, hence
\[
\forall t_1\le t_2, \quad u_2\circ\pi(\phi_{t_2}(x,v)) - u_2\circ\pi(\phi_{t_1}(x,v)) = \int_{t_1}^{t_2} L(\phi_s(x,v))\,ds + c(H)(t_2-t_1).
\]
Since $u_1 \in \mathcal{SS}$, we get
\[
\forall t_1\le t_2, \quad u_1\circ\pi(\phi_{t_2}(x,v)) - u_1\circ\pi(\phi_{t_1}(x,v)) \le \int_{t_1}^{t_2} L(\phi_s(x,v))\,ds + c(H)(t_2-t_1).
\]
Combining these two facts, we conclude
\[
\forall t_1\le t_2, \quad u_1\circ\pi(\phi_{t_2}(x,v)) - u_1\circ\pi(\phi_{t_1}(x,v)) \le u_2\circ\pi(\phi_{t_2}(x,v)) - u_2\circ\pi(\phi_{t_1}(x,v)).
\]
This implies
\[
\forall t_1\le t_2, \quad (u_1-u_2)\circ\pi(\phi_{t_2}(x,v)) \le (u_1-u_2)\circ\pi(\phi_{t_1}(x,v)).
\]
Recall that $u_2$ is differentiable at every $x \in \mathcal I(u_2)$. Thus, if $d_xu_1$ also exists and $(x,v) \in \tilde{\mathcal I}(u_2)$, we obtain
\[
X_L\cdot[(u_1-u_2)\circ\pi](x,v) \le 0.
\]
We remark that $X_L\cdot[(u_1-u_2)\circ\pi](x,v) = d_x(u_1-u_2)(T\pi\circ X_L(x,v))$. Since $T\pi\circ X_L(x,v) = v$, we obtain
\[
X_L\cdot[(u_1-u_2)\circ\pi](x,v) = d_xu_1(v) - d_xu_2(v) \le 0.
\]
If the last inequality is an equality, we have $d_xu_1(v) = d_xu_2(v)$. Since $(x,v) \in \tilde{\mathcal I}(u_2)$, we have $d_xu_2 = \frac{\partial L}{\partial v}(x,v)$ and $H(x,d_xu_2) = c(H)$; therefore the Fenchel inequality yields the equality
\[
d_xu_2(v) = L(x,v) + H(x,d_xu_2) = L(x,v) + c(H).
\]
Since $u_1 \in \mathcal{SS}$, we know that $H(x,d_xu_1) \le c(H)$. The previous equality, using the Fenchel inequality $d_xu_1(v) \le L(x,v) + H(x,d_xu_1)$, and the fact that $d_xu_1(v) = d_xu_2(v)$, implies
\[
H(x,d_xu_1) = c(H) \quad\text{and}\quad d_xu_1(v) = L(x,v) + H(x,d_xu_1).
\]
This means that we have equality in the Fenchel inequality $d_xu_1(v) \le L(x,v) + H(x,d_xu_1)$; we therefore conclude that $d_xu_1 = \frac{\partial L}{\partial v}(x,v)$, and the right hand side of this last equality is $d_xu_2$.
4.6.2 Strong Mather condition

Definition 4.6.7. We will say that the Tonelli Lagrangian $L$ on $M$ satisfies the strong Mather condition if for every pair $u_1, u_2 \in \mathcal S_-$, the image $(u_1-u_2)(\mathcal A) \subset \mathbb R$ is of Lebesgue measure 0.
Notice that by part 2) of Theorem 4.6.2, if $L$ satisfies the strong Mather condition, then for every pair of critical subsolutions $u_1, u_2$, the image $(u_1-u_2)(\mathcal A) \subset \mathbb R$ is also of Lebesgue measure 0. By the results proved in this chapter, we get:
Theorem 4.6.8. Let $L$ be a Tonelli Lagrangian on the compact manifold $M$. It satisfies the strong Mather condition in any one of the following cases:
(1) The dimension of $M$ is 1 or 2.
(2) The dimension of $M$ is 3, and $\tilde{\mathcal A}$ contains no fixed point of the Euler-Lagrange flow.
(3) The dimension of $M$ is 3, and $L$ is of class $C^{3,1}$.
(4) The Lagrangian is of class $C^{k,1}$, with $k \ge 2\dim M - 3$, and every point of $\tilde{\mathcal A}$ is fixed under the Euler-Lagrange flow $\phi_t^L$.
(5) The Lagrangian is of class $C^k$, with $k \ge 8\dim M - 7$, and each point of $\tilde{\mathcal A}$ either is fixed under the Euler-Lagrange flow $\phi_t^L$ or its orbit in the Aubry set is periodic with (strictly) positive period.
Lemma 4.6.9. Suppose that $L$ is a Tonelli Lagrangian on the compact manifold $M$ that satisfies the strong Mather condition. For every $u \in \mathcal{SS}$, the set of points in $\tilde{\mathcal I}(u)$ which are chain-recurrent for the restriction $\phi_t^L|_{\tilde{\mathcal I}(u)}$ of the Euler-Lagrange flow is precisely the Aubry set $\tilde{\mathcal A}$.
Proof. First of all, we recall that, from Theorem 4.6.3, each point of $\tilde{\mathcal A}$ is chain-recurrent for the restriction $\phi_t^L|_{\tilde{\mathcal A}}$. By [69, Theorem 1.5], we can find a $C^1$ critical viscosity subsolution $u_1 : M \to \mathbb R$ which is strict outside $\mathcal A$, i.e. for every $x \notin \mathcal A$ we have $H(x,d_xu_1) < c(H)$. We define $\theta$ on $TM$ by $\theta = (u_1-u)\circ\pi$. By Theorem 4.6.1, we know that at each point $(x,v)$ of $\tilde{\mathcal I}(u)$ the derivative of $\theta$ exists and depends continuously on $(x,v) \in \tilde{\mathcal I}(u)$. By Proposition 4.6.6, at each point $(x,v)$ of $\tilde{\mathcal I}(u)$, we have
\[
X_L\cdot\theta(x,v) = d_xu_1(v) - d_xu(v) \le 0,
\]
with the last inequality an equality if and only if $d_xu_1 = d_xu$, and this implies $H(x,d_xu_1) = c(H)$. Since $u_1$ is strict outside $\mathcal A$, we conclude that $X_L\cdot\theta < 0$ on $\tilde{\mathcal I}(u)\setminus\tilde{\mathcal A}$. Suppose that $(x_0,v_0) \in \tilde{\mathcal I}(u)\setminus\tilde{\mathcal A}$. By invariance of both $\tilde{\mathcal A}$ and $\tilde{\mathcal I}(u)$, every point on the orbit $\phi_t^L(x_0,v_0)$, $t \in \mathbb R$, is also contained in $\tilde{\mathcal I}(u)\setminus\tilde{\mathcal A}$; therefore $t \mapsto c(t) := \theta(\phi_t(x_0,v_0))$ is (strictly) decreasing, and so we have $c(1) < c(0)$. Observe now that $\theta(\tilde{\mathcal A}) = (u_1-u)(\mathcal A)$ has measure 0 by the strong Mather condition, therefore we can find $c \in\, ]c(1),c(0)[\, \setminus\, \theta(\tilde{\mathcal A})$. By what we have seen, the directional derivative $X_L\cdot\theta$ is $< 0$ at every point of the level set $L_c = \{(x,v)\in\tilde{\mathcal I}(u) \mid \theta(x,v) = c\}$. Since $\theta$ is everywhere non-increasing on the orbits of $\phi_t^L$ and $X_L\cdot\theta < 0$ on $L_c$, we get
\[
\forall t>0,\ \forall (x,v)\in L_c, \quad \theta(\phi_t(x,v)) < c.
\]
Consider the compact set $K_c = \{(x,v)\in\tilde{\mathcal I}(u) \mid \theta(x,v) \le c\}$. Using again that $\theta$ is non-increasing on the orbits of $\phi_t^L|_{\tilde{\mathcal I}(u)}$, we have
\[
\forall t\ge0, \quad \phi_t^L(K_c) \subset K_c \quad\text{and}\quad \phi_t^L(K_c\setminus L_c) \subset K_c\setminus L_c.
\]
Using what we obtained above on $L_c$, we conclude that
\[
\forall t>0, \quad \phi_t^L(K_c) \subset K_c\setminus L_c.
\]
We now fix some metric on $\tilde{\mathcal I}(u)$ defining its topology. We then consider the compact set $\phi_1^L(K_c)$. It is contained in the open set $K_c\setminus L_c = \{(x,v)\in\tilde{\mathcal I}(u) \mid \theta(x,v) < c\}$. We can therefore find $\epsilon > 0$ such that the $\epsilon$-neighborhood $V_\epsilon(\phi_1^L(K_c))$ of $\phi_1^L(K_c)$ in $\tilde{\mathcal I}(u)$ is also contained in $K_c$. Since for $t \ge 1$ we have $\phi_{t-1}^L(K_c) \subset K_c$, and therefore $\phi_t^L(K_c) \subset \phi_1^L(K_c)$, it follows that
\[
V_\epsilon\Big(\bigcup_{t\ge1}\phi_t^L(K_c)\Big) \subset K_c.
\]
It is now easy to conclude that every $\epsilon$-pseudo orbit for $\phi_t^L|_{\tilde{\mathcal I}(u)}$ that starts in $K_c$ remains in $K_c$. Since $\theta(\phi_1^L(x_0,v_0)) = c(1) < c < c(0) = \theta(x_0,v_0)$, no $\alpha$-pseudo orbit starting at $(x_0,v_0)$ can return to $(x_0,v_0)$, for $\alpha \le \epsilon$ such that the ball of center $\phi_1^L(x_0,v_0)$ and radius $\alpha$, in $\tilde{\mathcal I}(u)$, is contained in $K_c$. Therefore $(x_0,v_0)$ cannot be chain-recurrent.
Theorem 4.6.10. Let $L$ be a Tonelli Lagrangian on the compact manifold $M$. If $L$ satisfies the strong Mather condition, then the following statements are equivalent:
(1) The Aubry set $\tilde{\mathcal A}$, or its projection $\mathcal A$, is connected.
(2) The Aubry set $\tilde{\mathcal A}$ is chain-transitive for the restriction of the Euler-Lagrange flow $\phi_t^L|_{\tilde{\mathcal A}}$.
(3) Any two weak KAM solutions differ by a constant.
(4) The Aubry set $\tilde{\mathcal A}$ is equal to the Mañé set $\tilde{\mathcal N}$.
(5) There exists $u \in \mathcal{SS}$ such that $\tilde{\mathcal I}(u)$ is chain-recurrent for the restriction $\phi_t|_{\tilde{\mathcal I}(u)}$ of the Euler-Lagrange flow.
Proof. From Corollary 4.6.4, we know that (1) and (2) are equivalent.
If (1) is true then for $u_1, u_2 \in \mathcal S_-$, the image $(u_1-u_2)(\mathcal A)$ is a sub-interval of $\mathbb R$, but by the strong Mather condition, it is also of Lebesgue measure 0, therefore $u_1 - u_2$ is constant. Hence (1) implies (3).
If (3) is true then (4) follows from Proposition 4.6.5.
Suppose now that (4) is true. Since for every $u \in \mathcal{SS}$ we have $\tilde{\mathcal A} \subset \tilde{\mathcal I}(u) \subset \tilde{\mathcal N}$, we obtain $\tilde{\mathcal I}(u) = \tilde{\mathcal N}$. But $\tilde{\mathcal N}$ is chain-transitive for the restriction $\phi_t^L|_{\tilde{\mathcal N}}$. Hence (4) implies (5).
If (5) is true for some $u \in \mathcal{SS}$, then every point of $\tilde{\mathcal I}(u)$ is chain-recurrent for the restriction $\phi_t^L|_{\tilde{\mathcal I}(u)}$. Lemma 4.6.9 then implies that $\tilde{\mathcal A} = \tilde{\mathcal I}(u)$, and we therefore satisfy (2).
4.6.3 Mañé Lagrangians

We now give an application to the Mañé example associated to a vector field. Suppose $M$ is a compact Riemannian manifold, where the metric $g$ is of class $C^\infty$. If $X$ is a $C^k$ vector field on $M$, with $k \ge 2$, we define the Lagrangian $L_X : TM \to \mathbb R$ by
\[
L_X(x,v) = \frac12\|v - X(x)\|_x^2,
\]
where as usual $\|v - X(x)\|_x^2 = g_x(v - X(x), v - X(x))$. We will call $L_X$ the Mañé Lagrangian of $X$, see the Appendix in [99]. The following proposition gives the obvious properties of $L_X$.
Proposition 4.6.11. Let $L_X$ be the Mañé Lagrangian of the $C^k$, $k \ge 2$, vector field $X$ on the compact Riemannian manifold $M$. We have
\[
\frac{\partial L_X}{\partial v}(x,v) = g_x(v - X(x),\cdot).
\]
Its associated Hamiltonian $H_X : T^*M \to \mathbb R$ is given by
\[
H_X(x,p) = \frac12\|p\|_x^2 + p(X(x)).
\]
The constant functions are solutions of the Hamilton-Jacobi equation
\[
H_X(x,d_xu) = 0.
\]
Therefore, we obtain $c(H) = 0$. Moreover, we have
\[
\tilde{\mathcal I}(0) = \operatorname{Graph}(X) = \{(x,X(x)) \mid x\in M\}.
\]
If we call $\phi_t$ the Euler-Lagrange flow of $L_X$ on $TM$, then for every $x \in M$ and every $t \in \mathbb R$, we have $\phi_t(x,X(x)) = (\gamma_x^X(t),\dot\gamma_x^X(t))$, where $\gamma_x^X$ is the solution of the vector field $X$ which is equal to $x$ for $t = 0$. In particular, the restriction $\phi_t|_{\tilde{\mathcal I}(0)}$ of the Euler-Lagrange flow to $\tilde{\mathcal I}(0) = \operatorname{Graph}(X)$ is conjugated (by $\pi|_{\tilde{\mathcal I}(0)}$) to the flow of $X$ on $M$.
Proof. The computation of $\frac{\partial L_X}{\partial v}$ is easy. For $H_X$, we recall that $H_X(x,p) = p(v_p) - L_X(x,v_p)$, where $v_p \in T_xM$ is defined by $p = \frac{\partial L_X}{\partial v}(x,v_p)$. Solving for $v_p$, and substituting yields the result.
If $u$ is a constant function then $d_xu = 0$ everywhere, and obviously $H_X(x,d_xu) = 0$. The fact that $c(H) = 0$ follows, since $c(H)$ is the only value $c$ for which there exists a viscosity solution of the Hamilton-Jacobi equation $H(x,d_xu) = c$.
Let us define $u_0$ as the null function on $M$. Suppose now that $\gamma : (-\infty,+\infty) \to M$ is a solution of $X$ (by compactness of $M$, solutions of $X$ are defined for all time). We have $d_{\gamma(t)}u_0(\dot\gamma(t)) = 0$, and $H_X(\gamma(t),d_{\gamma(t)}u_0) = 0$; moreover, since $\dot\gamma(t) = X(\gamma(t))$, we also get $L_X(\gamma(t),\dot\gamma(t)) = 0$. It follows that
\[
d_{\gamma(t)}u_0(\dot\gamma(t)) = L_X(\gamma(t),\dot\gamma(t)) + H_X(\gamma(t),d_{\gamma(t)}u_0) = L_X(\gamma(t),\dot\gamma(t)).
\]
By integration, we see that $\gamma$ is $(u_0,L_X,0)$-calibrated, therefore it is an extremal. Hence we get $\phi_t(\gamma(0),\dot\gamma(0)) = (\gamma(t),\dot\gamma(t))$, and $(\gamma(0),\dot\gamma(0)) \in \tilde{\mathcal I}(u_0)$. But $\dot\gamma(0) = X(\gamma(0))$, and $\gamma(0)$ can be an arbitrary point of $M$. This implies $\operatorname{Graph}(X) \subset \tilde{\mathcal I}(u_0)$. This finishes the proof because we know that $\tilde{\mathcal I}(u_0)$ is a graph on a part of the base $M$.
Lemma 4.6.12. Let LX : T M → R be the Mañé Lagrangian associated to the Ck , k ≥ 2,
vector field X on the compact connected manifold M . Assume that LX satisfies the strong
Mather condition, then we have:
(1) The projected Aubry set A is the set of chain-recurrent points of the flow of X on M .
(2) The constants are the only weak KAM solutions if and only if every point of M is
chain-recurrent under the flow of X.
Proof. To prove (1), we apply Lemma 4.6.9 to obtain that the Aubry set Ã is equal to the set
of points in Ĩ(0) = Graph(X) which are chain-recurrent for the restriction φt |Graph(X) .
But from Proposition 4.6.11, the projection π|Graph(X) conjugates φt |Graph(X) to the flow
of X on M . It now suffices to observe that A = π(Ã).
We now prove (2). Suppose that every point of M is chain-recurrent for the flow of X.
From what we have just seen A = M . Therefore property (1) of Theorem 4.6.10 holds.
Therefore by property (3) of that same theorem, we have uniqueness up to constants of
weak KAM solutions, but the constants are weak KAM solutions. To prove the converse,
assume that the constants are the only weak KAM solutions. This implies that property
(3) of Theorem 4.6.10 holds. Therefore by property (4) of that same theorem Ã = Ñ .
But Ĩ(0) = Graph(X) is squeezed between Ã and Ñ . Therefore Ã = Graph(X). Taking

images by the projection π we conclude that A = M . By part (1) of the present lemma,
every point of M is chain-recurrent for the flow of X on M .
Combining this last lemma and Theorem 4.6.8 completes the proof of Theorem 4.1.7.
Chapter 5
DiPerna-Lions theory for SDE
5.1 Introduction and preliminary results
1
Recent research activity has been devoted to the study of transport equations with rough
coefficients, showing that a well-posedness result for the transport equation in a certain
subclass of functions allows one to prove existence and uniqueness of a flow for the associated
ODE. The first result in this direction is due to DiPerna and P.-L. Lions [56], where
the authors study the connection between the transport equation and the associated
ODE γ̇ = b(t, γ), showing that existence and uniqueness for the transport equation is
equivalent to a sort of well-posedness of the ODE which says, roughly speaking, that
the ODE has a unique solution for $\mathscr{L}^d$-almost every initial condition (here and in the
sequel, $\mathscr{L}^d$ denotes the Lebesgue measure in Rd ). In that paper they also show that the
transport equation $\partial_t u + \sum_i b_i \partial_i u = c$ is well-posed in L∞ if b = (b1 , . . . , bn ) is Sobolev
and satisfies suitable global conditions (including L∞ -bounds on the spatial divergence),
which yields the well-posedness of the ODE.
In [4] (see also [5]), using a slightly different philosophy, Ambrosio studied the
connection between the continuity equation $\partial_t u + \sum_i \partial_i (b_i u) = c$ and the ODE γ̇ = b(t, γ).
This different approach allows him to develop the general theory of the so-called Regular
Lagrangian Flows (see [5, Remark 31] for a detailed comparison with the DiPerna-Lions
axiomatization), which relates existence and uniqueness for the continuity equation with
well-posedness of the ODE, without assuming any regularity on the vector field b. Indeed,
since the continuity equation is in a conservative form, it has a meaning in the sense of
distributions even when b is only $L^\infty_{loc}$ and u is $L^1_{loc}$. Thus, a general theory is developed
in [4] under very general hypotheses, showing as in [56] that existence and uniqueness
for the continuity equation is equivalent to a sort of well-posedness of the ODE. After
having proved this, in [4] the well-posedness of the continuity equation in L∞ is proved
in the case of vector fields with BV regularity whose distributional divergence belongs to
L∞ (for other similar results on the well-posedness of the transport/continuity equation,
see also [47, 48, 90, 83]).
Footnote 1: This chapter is based on the work in [77].
Our aim is to develop a stochastic counterpart of this theory: in our setting the conti-
nuity equation becomes the Fokker-Planck equation, while the ODE becomes an SDE.
Let us consider the following SDE
\[
\begin{cases}
dX = b(t, X)\,dt + \sigma(t, X)\,dB(t), \\
X(0) = x,
\end{cases}
\tag{5.1.1}
\]
where b : [0, T ]×Rd → Rd and σ : [0, T ]×Rd → L (Rr , Rd ) are bounded (here L (Rr , Rd )
denotes the vector space of linear maps from Rr to Rd ) and B is an r-dimensional Brownian
motion on a probability space (Ω, A, P). We want to study the existence and uniqueness
of martingale solutions for this equation. Let us define $a(t, x) := \sigma(t, x)\sigma^*(t, x)$
(that is, $a_{ij} := \sum_k \sigma_{ik}\sigma_{jk}$). We consider the so-called Fokker-Planck equation
\[
\begin{cases}
\partial_t \mu_t + \sum_i \partial_i (b_i \mu_t) - \frac{1}{2} \sum_{ij} \partial_{ij} (a_{ij} \mu_t) = 0 & \text{in } [0, T] \times \mathbb{R}^d, \\
\mu_0 = \bar\mu & \text{in } \mathbb{R}^d.
\end{cases}
\tag{5.1.2}
\]
We recall that, for a (possibly signed) measure µ = µ(t, x) = µt (x), being a solution of
(5.1.2) simply means that
\[
\frac{d}{dt} \int_{\mathbb{R}^d} \varphi(x)\,d\mu_t(x)
= \int_{\mathbb{R}^d} \Big[ \sum_i b_i(t, x)\,\partial_i \varphi(x) + \frac{1}{2} \sum_{ij} a_{ij}(t, x)\,\partial_{ij} \varphi(x) \Big]\,d\mu_t(x)
\qquad \forall \varphi \in C_c^\infty(\mathbb{R}^d)
\tag{5.1.3}
\]
in the distributional sense on [0, T ], and the initial condition means that µt w∗-converges
to µ̄ (i.e. converges in the duality with Cc (Rd )) as t → 0. We observe that, since the
equation (5.1.2) is in divergence form, it makes sense without any regularity assumption
on a and b, provided that
\[
\int_0^T \int_A \big( |b(t, x)| + |a(t, x)| \big)\,d|\mu_t|(x)\,dt < +\infty \qquad \forall A \subset\subset \mathbb{R}^d
\]
(here and in the sequel, |µt | denotes the total variation of µt ). Since b and a will always
be assumed to be bounded, in the definition of measure-valued solution of the PDE we
assume that
\[
\int_0^T |\mu_t|(A)\,dt < +\infty \qquad \forall A \subset\subset \mathbb{R}^d,
\tag{5.1.4}
\]
so that (5.1.2) surely makes sense. However, if µt is singular with respect to the Lebesgue
measure L d , then the products b(t, ·)µt and a(t, ·)µt are sensitive to modification of b(t, ·)
and a(t, ·) in L d -negligible sets. Since in the case of singular measures the coefficients
a and b will be assumed to be continuous, while in the case of coefficients in L∞ the
measures will be assumed to be absolutely continuous, (5.1.2) will always make sense.
Recall also that it is not restrictive to consider only solutions t ↦ µt of the Fokker-Planck
equation that are w∗-continuous on [0, T ], i.e. continuous in the duality with Cc (Rd ) (see
Lemma 5.2.1). Thus, we can assume that µt is defined for all t and even at the endpoints
of [0, T ].
For simplicity of notation, we define
\[
L_t := \sum_i b_i(t, \cdot)\,\partial_i + \frac{1}{2} \sum_{ij} a_{ij}(t, \cdot)\,\partial_{ij}.
\]
In this way the PDE can be written as
\[
\partial_t \mu_t = L_t^* \mu_t \qquad \text{in } [0, T] \times \mathbb{R}^d,
\]
where $L_t^*$ denotes the (formal) adjoint of Lt in L2 (Rd ). Using Itô’s formula it is simple
to check that, if X(t, x, ω) ∈ L2 (Ω, C([0, T ], Rd )) is a family of solutions of (5.1.1),
measurable in (t, x, ω), then the measure µt defined by
\[
\int_{\mathbb{R}^d} f(x)\,d\mu_t(x) := \int_{\mathbb{R}^d} \mathbb{E}[f(X(t, x, \omega))]\,d\mu(x) \qquad \forall f \in C_c(\mathbb{R}^d)
\]
is a solution of (5.1.2) with µ0 = µ (see also Lemma 5.2.4).
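The link between the SDE and the weak formulation (5.1.3) can be checked numerically in a toy case. The following is a minimal sketch, not part of the original argument: it assumes dimension one, an Euler-Maruyama discretization, the illustrative bounded coefficients b(t, x) = −tanh(x) and σ = 1, and the test function ϕ(x) = exp(−x²) (rapidly decaying rather than compactly supported). All function names are hypothetical. It estimates the mean of ϕ(X(T)) − ϕ(x) − ∫ Lt ϕ(X(t)) dt over many paths, which should be close to zero up to discretization and Monte Carlo error.

```python
import math
import random


def drift(t, x):
    # Bounded drift b(t, x); an arbitrary illustrative choice.
    return -math.tanh(x)


def dispersion(t, x):
    # Bounded dispersion sigma(t, x); here simply a constant.
    return 1.0


def phi(x):
    # Smooth, rapidly decaying test function.
    return math.exp(-x * x)


def dphi(x):
    return -2.0 * x * math.exp(-x * x)


def d2phi(x):
    return (4.0 * x * x - 2.0) * math.exp(-x * x)


def generator_phi(t, x):
    # L_t phi = b * phi' + (1/2) * a * phi'', with a = sigma^2 in one dimension.
    a = dispersion(t, x) ** 2
    return drift(t, x) * dphi(x) + 0.5 * a * d2phi(x)


def mean_martingale_defect(x0, T=1.0, n_steps=100, n_paths=4000, seed=0):
    """Average over simulated paths of
    phi(X_T) - phi(x0) - int_0^T L_t phi(X_t) dt  (left-endpoint Riemann sum)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        integral = 0.0
        for k in range(n_steps):
            t = k * dt
            integral += generator_phi(t, x) * dt
            # Euler-Maruyama step for dX = b dt + sigma dB.
            x += drift(t, x) * dt + dispersion(t, x) * sqrt_dt * rng.gauss(0.0, 1.0)
        total += phi(x) - phi(x0) - integral
    return total / n_paths


if __name__ == "__main__":
    print(mean_martingale_defect(0.5))
```

The same quantity, conditioned on the past of the path, is what Definition 5.1.1 requires to vanish; the sketch only tests the unconditional mean, which corresponds to (5.1.3) with initial datum δx.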

We define ΓT := C([0, T ], Rd ), and et : ΓT → Rd , et (γ) := γ(t). Let us recall the
Stroock-Varadhan definition of martingale solutions:
Definition 5.1.1. A measure νx,s on ΓT is a martingale solution of (5.1.1) starting from
x at time s if:
1. νx,s ({γ | γ(s) = x}) = 1;
2. for any ϕ ∈ Cc∞ (Rd ), the stochastic process on ΓT
\[
\varphi(\gamma(t)) - \int_s^t L_u \varphi(\gamma(u))\,du
\]
is a νx,s -martingale after time s with respect to the canonical filtration.

We will say that the martingale problem is well-posed if, for any (s, x) ∈ [0, T ] × Rd , we have
existence and uniqueness of martingale solutions.
In the sequel, we will deal with families {νx }x∈Rd of probability measures that are
measurable with respect to x according to the following standard definition.
Definition 5.1.2. We say that a family {νx }x∈Rd of probability measures on a measurable
space (Ω, A) is measurable if, for any A ∈ A, the real valued map x ↦ νx (A) is
measurable.
5.1.1 Plan of the chapter

• The theory of Stochastic Lagrangian Flows
In the first part, we develop a general theory (independent of specific regularity or
ellipticity assumptions), which roughly speaking allows one to deduce existence, uniqueness
and stability of martingale solutions for L d -almost every initial condition x whenever
existence and uniqueness is known at the PDE level in the L∞ -setting (and, conversely, if
existence and uniqueness of martingale solutions is known for L d -a.e. initial condition,
then existence and uniqueness for the PDE in the L∞ -setting holds).
More precisely, in Section 5.2 we study how uniqueness of the SDE is related to
that of the PDE. In Paragraph 5.2.1 we prove a representation formula for solutions of
the PDE, which shows that they can always be seen as a superposition of solutions of
the SDE also when standard existence results for martingale solutions of SDE do not
apply. In particular, assuming only the boundedness of the coefficients, we will show
that, whenever we have existence of a solution of the PDE starting from µ0 , there exists
at least one martingale solution of the SDE for µ0 -a.e. initial condition x.
In Section 5.3 we introduce the main object of our study, what we call Stochastic
Lagrangian Flow. In Paragraph 5.3.1 we state and prove our main result regarding the
existence and uniqueness of Stochastic Lagrangian Flows, showing that these flows exist
and are unique whenever the PDE is well-posed in the L∞ -setting. We also prove a
stability result, and we show that Stochastic Lagrangian Flows satisfy the Chapman-
Kolmogorov equation. Moreover, in Paragraph 5.3.2 we investigate the relation between
our result and its deterministic counterpart and, applying our stability result, we deduce
a vanishing viscosity theorem for Ambrosio’s Regular Lagrangian Flows.
• The Fokker-Planck equation

In the second part we study by purely PDE methods the well-posedness of the Fokker-
Planck equation in two extreme (with respect to the regularity imposed in time, or in
space) situations: in the first one, assuming uniform ellipticity of the coefficients and Lip-
schitz regularity in time, we are able to prove existence and uniqueness in the L2 -settings
assuming no regularity in space, but only suitable divergence bounds (see Theorem 5.4.3).
This result, together with Proposition 5.4.4, directly implies the following theorem (here
and in the sequel, S+ (Rd ) denotes the set of symmetric and non-negative definite d × d
matrices).
Theorem 5.1.3. Let us assume that a : [0, T ] × Rd → S+ (Rd ) and b : [0, T ] × Rd → Rd
are bounded functions such that:
1. $\sum_j \partial_j a_{ij} \in L^\infty([0, T] \times \mathbb{R}^d)$ for i = 1, . . . , d;
2. $\partial_t a_{ij} \in L^\infty([0, T] \times \mathbb{R}^d)$ for i, j = 1, . . . , d;
3. $\big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \big)^- \in L^\infty([0, T] \times \mathbb{R}^d)$;
4. $\langle \xi, a(t, x)\xi \rangle \ge \alpha |\xi|^2$ for all (t, x) ∈ [0, T ] × Rd , for a certain α > 0;
5. $\frac{a}{1+|x|^2} \in L^2([0, T] \times \mathbb{R}^d)$, $\frac{b}{1+|x|} \in L^2([0, T] \times \mathbb{R}^d)$.
Then there exists a unique solution of (5.1.2) in $\mathcal{L}_+$, where
\[
\mathcal{L}_+ := \big\{ u \in L^\infty([0, T], L^1_+(\mathbb{R}^d)) \cap L^\infty([0, T], L^\infty_+(\mathbb{R}^d)) \;\big|\; u \in C([0, T], w^*\text{-}L^\infty(\mathbb{R}^d)) \big\},
\]
and $L^1_+$ and $L^\infty_+$ denote the convex subsets of $L^1$ and $L^\infty$ consisting of non-negative
functions.
In the second case, a does not depend on the space variables, but it can be degenerate
and it is allowed to depend on t even in a measurable way. Since a can also be identically
0, we need to assume BV regularity on the vector field b, and so we can prove:
Theorem 5.1.4. Let us assume that a : [0, T ] → S+ (Rd ) and b : [0, T ] × Rd → Rd are
bounded functions such that:
1. $b \in L^1([0, T], BV_{loc}(\mathbb{R}^d, \mathbb{R}^d))$, $\sum_i \partial_i b_i \in L^1_{loc}([0, T] \times \mathbb{R}^d)$;
2. $\big( \sum_i \partial_i b_i \big)^- \in L^1([0, T], L^\infty(\mathbb{R}^d))$.
Then there exists a unique solution of (5.1.2) in L+ .
This theorem is a direct consequence of Theorem 5.4.12. Other existence and uniqueness
results for the Fokker-Planck equation, which are in some sense intermediate with
respect to the two extreme ones stated above, have been proved in a recent paper of LeBris
and P.-L.Lions [91]. As in our case, in that paper the authors are interested in the well-
posedness of the Fokker-Planck equation as a tool to deduce existence and uniqueness
results at the SDE level (see also [92]). In particular, in [91, Section 4] the authors give
a list of interesting situations in the modeling of polymeric fluids when SDEs with
irregular drift b and dispersion matrix σ arise (see also [88] and the references therein
for other existence and uniqueness results for non-smooth SDEs).
• Conclusions
In Section 5.5 we apply the theory developed in Paragraph 5.3.1 to obtain, in the cases
considered above, the generic well-posedness of the associated SDE.
Finally, in the last section we generalize an important uniqueness result of Stroock
and Varadhan (see Theorem 5.2.2 and the remarks at the end of Theorem 5.5.4).
5.2 SDE-PDE uniqueness

In this section we study the main relations between the SDE and the PDE. The main
result is a general representation formula for solutions of the PDE (Theorem 5.2.7) which
allows to relate uniqueness of the SDE to that of the PDE (Lemma 5.2.3).
As we already said in the introduction, here and in the sequel b and a are always
assumed to be bounded. Let us recall the following result on the time regularity of t 7→ µt
(see also [5, Remark 3] or [11, Lemma 8.1.2]):
Lemma 5.2.1. Up to modification of µt in a negligible set of times, t ↦ µt is
w∗-continuous on [0, T ]. Moreover, if |µt |(Rd ) ≤ C for any t ∈ [0, T ], then t ↦ µt is
narrowly continuous.
Proof. By (5.1.3), for any ϕ ∈ Cc∞ (Rd )
\[
\frac{d}{dt} \int_{\mathbb{R}^d} \varphi(x)\,d\mu_t(x) \in L^1([0, T]),
\]
and therefore, for a given ϕ, the map t ↦ ⟨µt , ϕ⟩ has a unique uniformly continuous
representative in [0, T ]. By a simple density argument, we can find a representative µ̃t
independent of ϕ ∈ Cc∞ (Rd ) such that t ↦ ⟨µ̃t , ϕ⟩ is uniformly continuous on [0, T ].
Together with (5.1.4), this implies that t ↦ ⟨µ̃t , ϕ⟩ is uniformly continuous for any
ϕ ∈ Cc (Rd ). If moreover |µt |(Rd ) ≤ C for any t ∈ [0, T ], then t ↦ ⟨µ̃t , ϕ⟩ is uniformly
continuous for any ϕ ∈ Cb (Rd ). □
We also recall the following important theorem of Stroock and Varadhan (for a proof,
see [125, Theorem 6.2.3]):
Theorem 5.2.2. Assume that for any (s, x) ∈ [0, T ]×Rd , for any νx,s and ν̃x,s martingale
solutions of (5.1.1) starting from x at time s, one has
(et )# νx,s = (et )# ν̃x,s ∀t ∈ [s, T ].
Then the martingale solution of (5.1.1) starting from any (s, x) ∈ [0, T ] × Rd is unique.
We start studying how the uniqueness of (5.1.1) is related to that of (5.1.2).

Lemma 5.2.3. Let A ⊂ Rd be a Borel set. The following two properties are equivalent:
(a) Time-marginals of martingale solutions of the SDE are unique for any x ∈ A.
(b) Finite non-negative measure-valued solutions of the PDE are unique for any non-
negative Radon measure µ0 concentrated in A.
Proof. (b) ⇒ (a): let us choose µ0 = δx , with x ∈ A. Then, if νx and ν̃x are two
martingale solutions of the SDE, we get that µt := (et )# νx and µ̃t := (et )# ν̃x are two
solutions of the PDE with µ0 = δx (see Lemma 5.2.4). This implies that µt = µ̃t , that is
\[
\langle \mu_t, \varphi \rangle = \int_{\Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma) = \int_{\Gamma_T} \varphi(\gamma(t))\,d\tilde\nu_x(\gamma) = \langle \tilde\mu_t, \varphi \rangle \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^d),
\]
that is (et )# νx = (et )# ν̃x (observe in particular that, if A = Rd and we have uniqueness
for the PDE for any initial time s ≥ 0, by Theorem 5.2.2 we get that νx = ν̃x for any
x ∈ Rd ).
(a) ⇒ (b): this implication follows by Theorem 5.2.7, which provides, for every finite
non-negative measure-valued solution of the PDE, the representation
\[
\int_{\mathbb{R}^d} \varphi\,d\mu_t = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma)\,d\mu_0(x),
\tag{5.2.1}
\]
where, for µ0 -a.e. x, νx is a martingale solution of the SDE starting from x (at time 0).
Therefore, by the uniqueness of (et )# νx , we obtain that solutions of the PDE are unique. □
We now prove that, if νx is a martingale solution of the SDE starting from x (at time
0) for µ0 -a.e. x, the right hand side of (5.2.1) always defines a non-negative solution of
the PDE. We recall that a locally finite measure in Rd is a possibly signed measure with
locally finite total variation.
Lemma 5.2.4. Let µ0 be a locally finite measure on Rd , and let {νx }x∈Rd be a measurable
family of probability measures on ΓT such that νx is a martingale solution of the SDE
starting from x (at time 0) for |µ0 |-a.e. x. Define on ΓT the measure $\nu := \int_{\mathbb{R}^d} \nu_x\,d\mu_0(x)$,
and assume that
\[
\int_0^T \int_{\mathbb{R}^d \times \Gamma_T} \chi_{B_R}(\gamma(t))\,d\nu_x(\gamma)\,d|\mu_0|(x)\,dt < +\infty \qquad \forall R > 0.
\tag{5.2.2}
\]
Then the measure $\mu_t^\nu$ on Rd defined by
\[
\langle \mu_t^\nu, \varphi \rangle := \langle (e_t)_\# \nu, \varphi \rangle = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma)\,d\mu_0(x) \qquad \forall \varphi \in C_c^\infty(\mathbb{R}^d)
\]
is a solution of the PDE.

Remark 5.2.5. Property (5.2.2) is trivially true if, for example, |µ0 |(Rd ) < +∞.
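Proof. Let us first show that the map t ↦ ⟨$\mu_t^\nu$, ϕ⟩ is absolutely continuous for any
ϕ ∈ Cc∞ (Rd ). We recall that a real valued map t ↦ f (t) is said to be absolutely continuous
if, for any ε > 0 there exists δ > 0 such that, given any family of disjoint intervals
(sk , tk ) ⊂ [0, T ], the following implication holds:
\[
\sum_k |t_k - s_k| \le \delta \quad \Rightarrow \quad \sum_k |f(t_k) - f(s_k)| \le \varepsilon.
\]
Take R > 0 such that supp(ϕ) ⊂ BR , and let $I = \cup_{k=1}^n (s_k, t_k)$ be a subset of [0, T ] with
(sk , tk ) disjoint and such that |tk − sk | ≤ 1. For µ0 -a.e. x, by the definition of martingale
solution we have
\begin{align*}
\int_{\Gamma_T} \varphi(\gamma(t_k))\,d\nu_x(\gamma) - \int_{\Gamma_T} \varphi(\gamma(s_k))\,d\nu_x(\gamma)
&= \int_{s_k}^{t_k} \int_{\Gamma_T} L_t \varphi(\gamma(t))\,d\nu_x(\gamma)\,dt \\
&= \int_{s_k}^{t_k} \int_{\Gamma_T} \sum_i b_i(t, \gamma(t))\,\partial_i \varphi(\gamma(t))\,d\nu_x(\gamma)\,dt \\
&\quad + \frac{1}{2} \int_{s_k}^{t_k} \int_{\Gamma_T} \sum_{ij} a_{ij}(t, \gamma(t))\,\partial_{ij} \varphi(\gamma(t))\,d\nu_x(\gamma)\,dt
\end{align*}
and so, integrating with respect to µ0 , we obtain
\[
|\langle \mu_{t_k}^\nu, \varphi \rangle - \langle \mu_{s_k}^\nu, \varphi \rangle|
\le \|\varphi\|_{C^2} \Big[ \|b\|_\infty + \frac{1}{2} \|a\|_\infty \Big] \int_{s_k}^{t_k} \int_{\mathbb{R}^d \times \Gamma_T} \chi_{B_R}(\gamma(t))\,d\nu_x(\gamma)\,d|\mu_0|(x)\,dt.
\]
Thus
\[
\sum_{k=1}^n |\langle \mu_{t_k}^\nu, \varphi \rangle - \langle \mu_{s_k}^\nu, \varphi \rangle|
\le \|\varphi\|_{C^2} \Big[ \|b\|_\infty + \frac{1}{2} \|a\|_\infty \Big] \sum_{k=1}^n \int_{s_k}^{t_k} \int_{\mathbb{R}^d \times \Gamma_T} \chi_{B_R}(\gamma(t))\,d\nu_x(\gamma)\,d|\mu_0|(x)\,dt,
\]
which shows that the map t ↦ ⟨$\mu_t^\nu$, ϕ⟩ is absolutely continuous thanks to (5.2.2) and the
absolute continuity property of the integral. So, in order to conclude that $\mu_t^\nu$ solves the
PDE, it suffices to compute the time derivative of t ↦ ⟨$\mu_t^\nu$, ϕ⟩, and, by the computation
we made above, one simply gets
\begin{align*}
\frac{d}{dt} \langle \mu_t^\nu, \varphi \rangle
&= \int_{\mathbb{R}^d} \Big( \frac{d}{dt} \int_{\Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma) \Big)\,d\mu_0(x) \\
&= \int_{\mathbb{R}^d} \int_{\Gamma_T} L_t \varphi(\gamma(t))\,d\nu_x(\gamma)\,d\mu_0(x) = \langle \mu_t^\nu, L_t \varphi \rangle.
\end{align*}
□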
Remark 5.2.6. We observe that, by the definition of $\mu_t^\nu$, the following implications hold:
1. µ0 ≥ 0 ⇒ ∀t ≥ 0, $\mu_t^\nu \ge 0$ and $\mu_t^\nu(\mathbb{R}^d) = \mu_0(\mathbb{R}^d)$ (the total mass can also be infinite);
2. µ0 signed ⇒ ∀t ≥ 0, $|\mu_t^\nu|(\mathbb{R}^d) \le |\mu_0|(\mathbb{R}^d)$ (the total variation can also be infinite).
5.2.1 A representation formula for solutions of the PDE

We denote by M+ (Rd ) the set of non-negative finite measures on Rd .
Theorem 5.2.7. Let µt be a solution of the PDE such that µt ∈ M+ (Rd ) for any
t ∈ [0, T ], with µt (Rd ) ≤ C for any t ∈ [0, T ]. Then there exists a measurable family
of probability measures {νx }x∈Rd such that νx is a martingale solution of (5.1.1) starting
from x (at time 0) for µ0 -a.e. x, and the following representation formula holds:
\[
\int_{\mathbb{R}^d} \varphi\,d\mu_t = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma)\,d\mu_0(x).
\tag{5.2.3}
\]
By this theorem it follows that, whenever we have existence of a solution of the

PDE starting from µ0 , there exists a martingale solution of the SDE for µ0 -a.e. initial
condition x.
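Proof. Up to a renormalization of µ0 , we can assume that µ0 (Rd ) = 1.
Step 1: smoothing. Let ρ : Rd → (0, +∞) be a convolution kernel such that
$|D^k \rho(x)| \le C_k |\rho(x)|$ for any k ≥ 1 ($\rho(x) = C e^{-\sqrt{1+|x|^2}}$, for instance). We consider
the measures $\mu_t^\varepsilon := \mu_t * \rho_\varepsilon$. They are smooth solutions of the PDE
\[
\partial_t \mu_t^\varepsilon + \sum_i \partial_i (b_i^\varepsilon \mu_t^\varepsilon) - \frac{1}{2} \sum_{ij} \partial_{ij} (a_{ij}^\varepsilon \mu_t^\varepsilon) = 0,
\tag{5.2.4}
\]
where
\[
b_t^\varepsilon = b^\varepsilon(t, \cdot) := \frac{(b(t, \cdot)\mu_t) * \rho_\varepsilon}{\mu_t^\varepsilon}, \qquad
a_t^\varepsilon = a^\varepsilon(t, \cdot) := \frac{(a(t, \cdot)\mu_t) * \rho_\varepsilon}{\mu_t^\varepsilon}.
\]
Then it is immediate to see that
\[
\|b_t^\varepsilon\|_\infty \le \|b_t\|_\infty, \qquad \|a_t^\varepsilon\|_\infty \le \|a_t\|_\infty.
\tag{5.2.5}
\]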
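For the reader's convenience, the two facts used in Step 1 can be sketched as follows; this is only an unpacking of the definitions, using that µt ≥ 0 and ρ > 0. Convolving the Fokker-Planck equation with ρε gives (5.2.4) because of the first line below, and the second line gives the bound on b in (5.2.5) (the bound on a is identical). The last line checks the case k = 1 of the kernel condition for the example kernel.

```latex
\begin{align*}
b_t^\varepsilon\,\mu_t^\varepsilon &= (b(t,\cdot)\mu_t) * \rho_\varepsilon, \qquad
a_t^\varepsilon\,\mu_t^\varepsilon = (a(t,\cdot)\mu_t) * \rho_\varepsilon, \\
\big| (b(t,\cdot)\mu_t) * \rho_\varepsilon \big|(x)
  &= \Big| \int_{\mathbb{R}^d} b(t,y)\,\rho_\varepsilon(x-y)\,d\mu_t(y) \Big|
  \le \|b_t\|_\infty \int_{\mathbb{R}^d} \rho_\varepsilon(x-y)\,d\mu_t(y)
  = \|b_t\|_\infty\,\mu_t^\varepsilon(x), \\
\nabla \rho(x) &= -\rho(x)\,\frac{x}{\sqrt{1+|x|^2}}
  \quad\Longrightarrow\quad |\nabla\rho(x)| \le \rho(x)
  \qquad \text{for } \rho(x) = C e^{-\sqrt{1+|x|^2}}.
\end{align*}
```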
Since $|D^k \rho(x)| \le C_k |\rho(x)|$, it is simple to check that $b^\varepsilon$ and $a^\varepsilon$ are smooth and bounded
together with all their spatial derivatives. By [125, Corollary 6.3.3], the martingale
problem for $a^\varepsilon$ and $b^\varepsilon$ is well-posed (see Definition 5.1.1) and the family $\{\nu_x^\varepsilon\}_{x \in \mathbb{R}^d}$ of
martingale solutions (starting at time 0) is measurable (see Definition 5.1.2). By (5.2.5)
we can apply Lemma 5.2.4, which tells us that $\tilde\mu_t^\varepsilon := (e_t)_\# \int_{\mathbb{R}^d} \nu_x^\varepsilon\,d\mu_0^\varepsilon(x)$ is a finite measure
which solves the smoothed PDE (5.2.4) with initial datum $\mu_0^\varepsilon$. Then, since the solution
of (5.2.4) is unique (Proposition 5.4.1), we obtain $\tilde\mu_t^\varepsilon = \mu_t^\varepsilon$, that is
\[
\int_{\mathbb{R}^d} \varphi\,d\mu_t^\varepsilon = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(\gamma(t))\,d\nu_x^\varepsilon(\gamma)\,d\mu_0^\varepsilon(x).
\tag{5.2.6}
\]
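Step 2: tightness. It is clear that the measures $\mu_0^\varepsilon = \mu_0 * \rho_\varepsilon$ are tight. So, if we define
$\nu^\varepsilon := \int_{\mathbb{R}^d} \nu_x^\varepsilon\,d\mu_0^\varepsilon(x)$, we have
\[
\lim_{R \to \infty} \sup_{0 < \varepsilon < 1} \nu^\varepsilon(\{|\gamma(0)| > R\}) = 0.
\]
For any ϕ ∈ Cc∞ (Rd ), let us define $A_\varphi := \|\varphi\|_{C^2} \big[ \|b\|_\infty + \frac{1}{2} \|a\|_\infty \big]$. Since for every
ϕ ∈ Cc∞ (Rd ) and any 0 < ε < 1
\[
\varphi(\gamma(t)) - \int_0^t \Big( \sum_i b_i^\varepsilon(u, \gamma(u))\,\partial_i \varphi(\gamma(u)) + \frac{1}{2} \sum_{ij} a_{ij}^\varepsilon(u, \gamma(u))\,\partial_{ij} \varphi(\gamma(u)) \Big)\,du
\]
is a $\nu^\varepsilon$-martingale with respect to the canonical filtration, by (5.2.5) we obtain that
$\varphi(\gamma(t)) + A_\varphi t$ is a $\nu^\varepsilon$-submartingale with respect to the canonical filtration. Thus [125,
Theorem 1.4.6] can be applied, and the tightness of $\nu^\varepsilon$ follows.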
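The submartingale property used here can be sketched in one line. The symbols $M_t^\varepsilon$ (the martingale displayed above) and $L_u^\varepsilon := \sum_i b_i^\varepsilon(u,\cdot)\partial_i + \frac{1}{2}\sum_{ij} a_{ij}^\varepsilon(u,\cdot)\partial_{ij}$ are introduced only for this note, and the norms are understood as in the definition of $A_\varphi$.

```latex
\begin{align*}
\varphi(\gamma(t)) + A_\varphi t
  &= M_t^\varepsilon + \int_0^t \big( L_u^\varepsilon \varphi(\gamma(u)) + A_\varphi \big)\,du, \\
|L_u^\varepsilon \varphi|
  &\le \|\varphi\|_{C^2} \Big[ \|b^\varepsilon\|_\infty + \tfrac{1}{2}\,\|a^\varepsilon\|_\infty \Big]
  \le A_\varphi \qquad \text{by (5.2.5)}.
\end{align*}
```

The integrand in the first line is therefore non-negative, so the process is the sum of a martingale and a non-decreasing adapted process, with a constant $A_\varphi$ that does not depend on ε.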
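Let ν be any limit point of $\nu^\varepsilon$, and consider the disintegration of ν with respect to
µ0 = (e0 )# ν, i.e. $\nu = \int_{\mathbb{R}^d} \nu_x\,d\mu_0(x)$. Passing to the limit in (5.2.6), we get
\[
\int_{\mathbb{R}^d} \varphi\,d\mu_t(x) = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(\gamma(t))\,d\nu_x(\gamma)\,d\mu_0(x).
\]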
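Step 3: νx is a martingale solution of the SDE for µ0 -a.e. x. Let εn → 0 be a
sequence such that ν is the weak limit of $\nu^{\varepsilon_n}$. Let us fix a continuous function f : Rd → R
with 0 ≤ f ≤ 1, s ∈ [0, T ], and an Fs -measurable continuous function Φs : ΓT → R with
0 ≤ Φs ≤ 1, where (Fs )0≤s≤T denotes the canonical filtration on ΓT . We define
\[
L_t^n := \sum_i b_i^{\varepsilon_n}(t, \cdot)\,\partial_i + \frac{1}{2} \sum_{ij} a_{ij}^{\varepsilon_n}(t, \cdot)\,\partial_{ij}.
\]
Since each $\nu_x^{\varepsilon_n}$ is a martingale solution, we know that for any t ∈ [s, T ] and for any
ϕ ∈ Cc∞ (Rd )
\begin{align*}
&\int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \int_0^t L_u^n \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x) \\
&\qquad = \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(s)) - \int_0^s L_u^n \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x)
\end{align*}
(see Definition 5.1.1), or equivalently
\[
\int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \varphi(\gamma(s)) - \int_s^t L_u^n \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x) = 0.
\]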
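Let us take b̃ : [0, T ] × Rd → Rd and ã : [0, T ] × Rd → S+ (Rd ) bounded and continuous,
and define
\[
\tilde L_t := \sum_i \tilde b_i(t, \cdot)\,\partial_i + \frac{1}{2} \sum_{ij} \tilde a_{ij}(t, \cdot)\,\partial_{ij},
\qquad
\tilde L_t^n := \sum_i \tilde b_i^{\varepsilon_n}(t, \cdot)\,\partial_i + \frac{1}{2} \sum_{ij} \tilde a_{ij}^{\varepsilon_n}(t, \cdot)\,\partial_{ij},
\]
where $\tilde b_i^{\varepsilon_n}$ and $\tilde a_{ij}^{\varepsilon_n}$ are defined analogously to $b_i^{\varepsilon_n}$ and $a_{ij}^{\varepsilon_n}$. Thus we can write
\begin{align*}
&\int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \varphi(\gamma(s)) - \int_s^t \tilde L_u^n \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x) \\
&\qquad = \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \int_s^t (L_u^n - \tilde L_u^n) \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x).
\end{align*}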
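Then, recalling that 0 ≤ f ≤ 1 and 0 ≤ Φs ≤ 1, we get
\begin{align*}
&\Big| \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \varphi(\gamma(s)) - \int_s^t \tilde L_u^n \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x) \Big| \\
&\quad \le \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \int_s^t \big| (L_u^n - \tilde L_u^n) \varphi(\gamma(u)) \big|\,du \Big] \Phi_s(\gamma)\,d\nu_x^{\varepsilon_n}(\gamma)\,f(x)\,d\mu_0^{\varepsilon_n}(x) \\
&\quad \le \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \int_s^t \big| (L_u^n - \tilde L_u^n) \varphi(\gamma(u)) \big|\,du \Big]\,d\nu_x^{\varepsilon_n}(\gamma)\,d\mu_0^{\varepsilon_n}(x) \\
&\quad = \int_s^t \int_{\mathbb{R}^d} \big| (L_u^n - \tilde L_u^n) \varphi(x) \big|\,d\mu_u^{\varepsilon_n}(x)\,du \\
&\quad \le \sum_i \int_s^t \int_{\mathbb{R}^d} \Big| \Big( \frac{(b_i(u, \cdot)\mu_u) * \rho_{\varepsilon_n}}{\mu_u^{\varepsilon_n}} - \frac{(\tilde b_i(u, \cdot)\mu_u) * \rho_{\varepsilon_n}}{\mu_u^{\varepsilon_n}} \Big) \partial_i \varphi \Big|(x)\,d\mu_u^{\varepsilon_n}(x)\,du \\
&\qquad + \frac{1}{2} \sum_{ij} \int_s^t \int_{\mathbb{R}^d} \Big| \Big( \frac{(a_{ij}(u, \cdot)\mu_u) * \rho_{\varepsilon_n}}{\mu_u^{\varepsilon_n}} - \frac{(\tilde a_{ij}(u, \cdot)\mu_u) * \rho_{\varepsilon_n}}{\mu_u^{\varepsilon_n}} \Big) \partial_{ij} \varphi \Big|(x)\,d\mu_u^{\varepsilon_n}(x)\,du \\
&\quad \le \sum_i \int_s^t \int_{\mathbb{R}^d} |b_i(u, \cdot) - \tilde b_i(u, \cdot)|(x)\,\big( |\partial_i \varphi| * \rho_{\varepsilon_n} \big)(x)\,d\mu_u(x)\,du \\
&\qquad + \frac{1}{2} \sum_{ij} \int_s^t \int_{\mathbb{R}^d} |a_{ij}(u, \cdot) - \tilde a_{ij}(u, \cdot)|(x)\,\big( |\partial_{ij} \varphi| * \rho_{\varepsilon_n} \big)(x)\,d\mu_u(x)\,du.
\end{align*}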
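Since ã and b̃ are continuous, $\tilde a^{\varepsilon_n}$ and $\tilde b^{\varepsilon_n}$ converge to ã and b̃ locally uniformly. So we
can pass to the limit in the above inequality as n → ∞, obtaining
\begin{align*}
&\Big| \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \varphi(\gamma(s)) - \int_s^t \tilde L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)\,f(x)\,d\mu_0(x) \Big| \\
&\quad \le \sum_i \int_s^t \int_{\mathbb{R}^d} |b_i(u, x) - \tilde b_i(u, x)|\,|\partial_i \varphi(x)|\,d\mu_u(x)\,du \\
&\qquad + \frac{1}{2} \sum_{ij} \int_s^t \int_{\mathbb{R}^d} |a_{ij}(u, x) - \tilde a_{ij}(u, x)|\,|\partial_{ij} \varphi(x)|\,d\mu_u(x)\,du.
\end{align*}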
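Choosing two sequences of continuous functions (b̃k )k∈N and (ãk )k∈N converging respectively
to b and a in L1 ([0, T ] × Rd , η), with $\eta := \int_0^T \mu_t\,dt$, we finally obtain
\[
\int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \varphi(\gamma(s)) - \int_s^t L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)\,f(x)\,d\mu_0(x) = 0,
\]
that is
\begin{align*}
&\int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(t)) - \int_0^t L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)\,f(x)\,d\mu_0(x) \\
&\qquad = \int_{\mathbb{R}^d \times \Gamma_T} \Big[ \varphi(\gamma(s)) - \int_0^s L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)\,f(x)\,d\mu_0(x).
\end{align*}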
By the arbitrariness of f we get that, for any 0 ≤ s ≤ t ≤ T , and for any Fs -measurable
function Φs , we have
\[
\int_{\Gamma_T} \Big[ \varphi(\gamma(t)) - \int_0^t L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)
= \int_{\Gamma_T} \Big[ \varphi(\gamma(s)) - \int_0^s L_u \varphi(\gamma(u))\,du \Big] \Phi_s(\gamma)\,d\nu_x(\gamma)
\tag{$*$}
\]
for µ0 -a.e. x. Letting Φs vary in a dense countable subset of Fs -measurable functions, by
approximation we deduce that, for any 0 ≤ s ≤ t ≤ T , for µ0 -a.e. x, the identity (∗) holds
for any Fs -measurable function Φs (here the µ0 -negligible exceptional set depends on s and t
but not on Φs ). Taking now s, t ∈ [0, T ] ∩ Q, we deduce that, for µ0 -a.e. x, the identity (∗)
holds for any s, t ∈ [0, T ] ∩ Q and for any Fs -measurable function Φs . By the continuity of
(∗) with respect to both s and t, and the continuity in time of the filtration
Fs , we conclude that νx is a martingale solution for µ0 -a.e. x. □
Remark 5.2.8. We observe that by (5.2.3) it follows that
\[
\mu_t(\mathbb{R}^d) \le C \quad \forall t \qquad \Rightarrow \qquad \mu_t(\mathbb{R}^d) = \mu_0(\mathbb{R}^d)
\]
(this result can also be proved more directly using as test functions in (5.1.2) a suitable
sequence (ϕn )n∈N ⊂ Cc∞ (Rd ), with 0 ≤ ϕn ≤ 1 and ϕn ↗ 1, and, even in the case
when the measures µt are signed, under the assumption |µt |(Rd ) ≤ C one obtains the
constancy of the map t ↦ µt (Rd )).
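The conservation of mass stated in Remark 5.2.8 has a simple discrete analogue, which the following sketch illustrates. It is not the method of the text: it assumes dimension one, a periodic domain [0, 2π) in place of Rd, smooth illustrative coefficients, and an explicit finite-volume scheme written in flux form (upwind for the drift term, centered for the diffusion term). All names and parameter values are hypothetical. Because each interface flux is added to one cell and subtracted from its neighbour, the total mass is preserved up to rounding; under the step-size restriction used here the density also stays non-negative.

```python
import math


def fokker_planck_step(u, b, a, dx, dt):
    """One explicit step for  d_t u + d_x(b u) - (1/2) d_xx(a u) = 0
    on a periodic grid, in conservative (flux) form."""
    n = len(u)
    flux = [0.0] * n  # flux[i]: flux through the interface between cells i and i+1
    for i in range(n):
        j = (i + 1) % n
        b_face = 0.5 * (b[i] + b[j])
        # Upwind choice for the transport part b*u.
        transport = b_face * (u[i] if b_face >= 0.0 else u[j])
        # Centered difference for the diffusive part -(1/2) d_x(a u).
        diffusion = -0.5 * (a[j] * u[j] - a[i] * u[i]) / dx
        flux[i] = transport + diffusion
    # flux[i - 1] with i = 0 wraps to the last interface (periodicity).
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]


def evolve(n=64, steps=200, dt=0.002):
    dx = 2.0 * math.pi / n
    xs = [i * dx for i in range(n)]
    b = [math.sin(x) for x in xs]                 # bounded drift
    a = [1.0 + 0.5 * math.cos(x) for x in xs]     # bounded, positive diffusion
    u = [math.exp(-4.0 * (x - math.pi) ** 2) for x in xs]  # initial density
    mass_start = sum(u) * dx
    for _ in range(steps):
        u = fokker_planck_step(u, b, a, dx, dt)
    return mass_start, sum(u) * dx, min(u)


if __name__ == "__main__":
    m0, m1, u_min = evolve()
    print(m0, m1, u_min)
```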
5.3 Stochastic Lagrangian Flows

In this section we want to prove an existence and uniqueness result for martingale so-
lutions which satisfy certain properties, in the spirit of the Regular Lagrangian Flows
(RLF) introduced in [4].
Definition 5.3.1. Given a measure $\mu_0 = \rho_0 \mathscr{L}^d \in \mathcal{M}_+(\mathbb{R}^d)$, with ρ0 ∈ L∞ (Rd ), we
say that a measurable family of probability measures {νx }x∈Rd on ΓT is a µ0 -Stochastic
Lagrangian Flow (µ0 -SLF) (starting at time 0), if:
1. for µ0 -a.e. x, νx is a martingale solution of the SDE starting from x (at time 0);
2. for any t ∈ [0, T ]
\[
\mu_t := (e_t)_\# \Big( \int \nu_x\,d\mu_0(x) \Big) \ll \mathscr{L}^d,
\]
and, denoting $\mu_t = \rho_t \mathscr{L}^d$, we have ρt ∈ L∞ (Rd ) uniformly in t.
More generally, one can analogously define a µ0 -SLF starting at time s with s ∈ (0, T )
requiring that νx is a martingale solution of the SDE starting from x at time s.
Remark 5.3.2. If {νx }x∈Rd is a µ0 -SLF, then it is also a $\mu_0'$-SLF for any $\mu_0' \in \mathcal{M}_+(\mathbb{R}^d)$
with $\mu_0' \le C\mu_0$. Indeed, this easily follows by the inequality
\[
0 \le \int_{\mathbb{R}^d} (e_t)_\# \nu_x\,d\mu_0'(x) \le C \int_{\mathbb{R}^d} (e_t)_\# \nu_x\,d\mu_0(x).
\]
5.3.1 Existence, uniqueness and stability of SLF

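We denote by $L^1_+$ and $L^\infty_+$ the convex subsets of $L^1$ and $L^\infty$ consisting of non-negative
functions, and, following [4], we define
\[
\mathcal{L} := \big\{ u \in L^\infty([0, T], L^1(\mathbb{R}^d)) \cap L^\infty([0, T], L^\infty(\mathbb{R}^d)) \;\big|\; u \in C([0, T], w^*\text{-}L^\infty(\mathbb{R}^d)) \big\},
\]
and
\[
\mathcal{L}_+ := \big\{ u \in L^\infty([0, T], L^1_+(\mathbb{R}^d)) \cap L^\infty([0, T], L^\infty_+(\mathbb{R}^d)) \;\big|\; u \in C([0, T], w^*\text{-}L^\infty(\mathbb{R}^d)) \big\}.
\]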
Under an existence and uniqueness result for the PDE in the class L+ , we prove existence
and uniqueness of SLF.
Theorem 5.3.3 (Existence of SLF starting from a fixed measure). Let us suppose
that, for some initial datum µ0 = ρ0 L d ∈ M+ (Rd ), with ρ0 ∈ L∞ (Rd ), there exists a
solution of the PDE in L+ . Then there exists a µ0 -SLF.
Proof. It suffices to apply Theorem 5.2.7 to the solution of the PDE in L+ . ¤
Let us assume now that forward uniqueness for the PDE holds in the class L+ for
any initial time, that is, for any s ∈ [0, T ], for any $\rho_s \in L^1_+(\mathbb{R}^d) \cap L^\infty_+(\mathbb{R}^d)$, if we denote
by $\rho_t \mathscr{L}^d$ and $\tilde\rho_t \mathscr{L}^d$ two solutions of the PDE in the class L+ starting from $\rho_s \mathscr{L}^d$ at time
s, then
\[
\rho_t = \tilde\rho_t \qquad \text{for any } t \in [s, T].
\]
Before stating and proving our main theorem, we first introduce some notation that
will be used also in the last section.
Let B be the Borel σ-algebra on ΓT = C([0, T ], Rd ), and define the filtrations
$\mathcal{F}_t := \sigma[e_s \mid 0 \le s \le t]$ and $\mathcal{F}^t := \sigma[e_s \mid t \le s \le T]$. Let P(ΓT ) denote the set of probability measures
on ΓT . Now, given ν ∈ P(ΓT ), we denote by
\[
\Gamma_T \ni \gamma \mapsto \nu^\gamma_{\mathcal{F}_t} \in \mathcal{P}(\Gamma_T)
\]
a regular conditional probability distribution of ν given Ft , that is a family of probability
measures on (ΓT , B) indexed by γ such that:
- for each B ∈ B, $\gamma \mapsto \nu^\gamma_{\mathcal{F}_t}(B)$ is Ft -measurable;
- for all A ∈ Ft and all B ∈ B,
\[
\nu(A \cap B) = \int_A \nu^\gamma_{\mathcal{F}_t}(B)\,d\nu(\gamma).
\tag{5.3.1}
\]
Since ΓT is a Polish space and every σ-algebra Ft is countably generated, such a function
exists and is unique, up to ν-null sets. In particular, up to changing this function in a
ν-null set, the following fact holds:
\[
\nu^\gamma_{\mathcal{F}_t}(\{\tilde\gamma \mid \tilde\gamma(s) = \gamma(s) \ \forall s \in [0, t]\}) = 1 \qquad \forall \gamma \in \Gamma_T.
\tag{5.3.2}
\]
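Finally, given 0 ≤ t1 ≤ . . . ≤ tn ≤ T , we set $\mathcal{M}^{t_1,\ldots,t_n} := \sigma[e_{t_1}, \ldots, e_{t_n}]$, and one can
analogously define $\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}}$. For $\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}}$ an analogue of (5.3.2) holds:
\[
\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}}(\{\tilde\gamma \mid \tilde\gamma(t_i) = \gamma(t_i) \ \forall i = 1, \ldots, n\}) = 1 \qquad \forall \gamma \in \Gamma_T.
\tag{5.3.3}
\]
If γ(ti ) = xi for i = 1, . . . , n, then we will also use the notation $\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}} = \nu^{x_1,\ldots,x_n}_{\mathcal{M}^{t_1,\ldots,t_n}}$.
By (5.3.1) one can check that $\int_{\Gamma_T} \nu^{\tilde\gamma}_{\mathcal{F}_{t_n}}\,d\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}}(\tilde\gamma)$ is a regular conditional probability
distribution of ν given $\mathcal{M}^{t_1,\ldots,t_n}$, which implies by uniqueness that
\[
\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}} = \int_{\Gamma_T} \nu^{\tilde\gamma}_{\mathcal{F}_{t_n}}\,d\nu^\gamma_{\mathcal{M}^{t_1,\ldots,t_n}}(\tilde\gamma) \qquad \text{for } \nu\text{-a.e. } \gamma.
\tag{5.3.4}
\]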
Theorem 5.3.4 (Uniqueness of SLF starting from a fixed measure). Let us

assume that forward uniqueness for the PDE holds in the class L+ for any initial time.
Then, for any µ0 = ρ0 L d ∈ M+ (Rd ), with ρ0 ∈ L∞ (Rd ), the µ0 -SLF is uniquely
determined µ0 -a.e. (in the sense that, if {νx } and {ν̃x } are two µ0 -SLF, then νx = ν̃x
for µ0 -a.e. x).
Proof. Let {νx } and {ν̃x } be two µ0 -SLF. Take now a function ψ ∈ Cc (Rd ), with
ψ ≥ 0. By Remark 5.3.2, {νx } and {ν̃x } are two ψµ0 -SLF. Thus, by Lemma 5.2.4 and
the uniqueness of the PDE in L+ , for any ϕ ∈ Cc (Rd ) we have
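\[
\int_{\mathbb{R}^d \times \Gamma_T} \varphi(e_t(\gamma))\,d\nu_x(\gamma)\,\psi(x)\,d\mu_0(x) = \int_{\mathbb{R}^d \times \Gamma_T} \varphi(e_t(\gamma))\,d\tilde\nu_x(\gamma)\,\psi(x)\,d\mu_0(x) \qquad \forall t \in [0, T].
\tag{5.3.5}
\]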
This clearly implies that, for any t ∈ [0, T ],
(et )# νx = (et )# ν̃x for µ0 -a.e. x.
We now want to use an analogous argument to deduce that, for any 0 < t1 < t2 < . . . <
tn ≤ T ,
(et1 , . . . , etn )# νx = (et1 , . . . , etn )# ν̃x for µ0 -a.e. x. (5.3.6)
The idea is that, given a measure µ̃s = ρ̃s L d ∈ M+ (Rd ), with ρ̃s ∈ L∞ , once we have a
µ̃s -SLF starting at time s we can multiply µ̃s by a function ψs ∈ Cc (Rd ) with ψs ≥ 0, and
by Remark 5.3.2 our µ̃s -SLF is also a ψs µ̃s -SLF starting at time s. Using this argument
n times at different times and the uniqueness of time marginals, we will obtain (5.3.6).
Fix 0 < t1 < . . . < tn ≤ T . Take ψ0 ≥ 0 with ψ0 ∈ Cc (Rd ) and $\int_{\mathbb{R}^d} \psi_0\,d\mu_0 = 1$, and
denote by $\mu_{t_1}^{\psi_0}$ the value at time t1 of the (unique) solution in L+ of the PDE starting
from ψ0 µ0 (which is induced both by {νx } and {ν̃x } by uniqueness, see equation (5.3.5)).
Let {νx,t1 }x∈Rd and {ν̃x,t1 }x∈Rd be the families of probability measures on ΓT given by
the disintegration of
\[
\nu^{\psi_0} := \int_{\mathbb{R}^d} \nu_x\,\psi_0(x)\,d\mu_0(x) \qquad \text{and} \qquad \tilde\nu^{\psi_0} := \int_{\mathbb{R}^d} \tilde\nu_x\,\psi_0(x)\,d\mu_0(x)
\]
with respect to $\mu_{t_1}^{\psi_0} = (e_{t_1})_\# \nu^{\psi_0} = (e_{t_1})_\# \tilde\nu^{\psi_0}$, that is
\[
\nu^{\psi_0} = \int_{\mathbb{R}^d} \nu_{x,t_1}\,d\mu_{t_1}^{\psi_0}(x), \qquad \tilde\nu^{\psi_0} = \int_{\mathbb{R}^d} \tilde\nu_{x,t_1}\,d\mu_{t_1}^{\psi_0}(x).
\tag{5.3.7}
\]
It is easily seen that {νx,t1 } and {ν̃x,t1 } are regular conditional probability distributions,
given $\mathcal{M}^{t_1} = \sigma[e_{t_1}]$, of $\nu^{\psi_0}$ and $\tilde\nu^{\psi_0}$ respectively (that is, with the notation introduced
before, $\nu_{x,t_1} = (\nu^{\psi_0})^x_{\mathcal{M}^{t_1}}$ and $\tilde\nu_{x,t_1} = (\tilde\nu^{\psi_0})^x_{\mathcal{M}^{t_1}}$). Thus, looking at {νx,t1 } and {ν̃x,t1 } as their
restriction to C([t1 , T ], Rd ), {νx,t1 } and {ν̃x,t1 } are $\mu_{t_1}^{\psi_0}$-SLF starting at time t1 . Indeed, by
the stability of martingale solutions with respect to regular conditional probability (see
[125, Chapter 6]), {νx,t1 } and {ν̃x,t1 } are martingale solutions of the SDE starting from
x at time t1 for $\mu_{t_1}^{\psi_0}$-a.e. x (see also the remarks at the end of the proof of Proposition
5.6.1), while property 2 of Definition 5.3.1 is trivially true since {νx } and {ν̃x } are ψ0 µ0 -SLF.
As before, since {νx,t1 } and {ν̃x,t1 } are also $\psi_1 \mu_{t_1}^{\psi_0}$-SLF for any ψ1 ∈ Cc (Rd ) with ψ1 ≥ 0,
using again the uniqueness of the PDE in L+ we get
Z Z
ψ0
ϕ(et2 (γ)) dνx,t1 (γ)ψ1 (x) dµt1 (x) = ϕ(et2 (γ)) dν̃x,t1 (γ)ψ1 (x) dµψt10 (x)
Rd ×ΓT Rd ×ΓT
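for any $\varphi \in C_c(\mathbb{R}^d)$, which can also be written as
\[
\int_{\mathbb{R}^d\times\Gamma_T}\varphi(e_{t_2}(\gamma))\,\psi_1(e_{t_1}(\gamma))\,d\nu_{x,t_1}(\gamma)\,d\mu^{\psi_0}_{t_1}(x)
= \int_{\mathbb{R}^d\times\Gamma_T}\varphi(e_{t_2}(\gamma))\,\psi_1(e_{t_1}(\gamma))\,d\tilde\nu_{x,t_1}(\gamma)\,d\mu^{\psi_0}_{t_1}(x). \tag{5.3.8}
\]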
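Recalling that by (5.3.7)
\[
\int_{\mathbb{R}^d}\nu_{x,t_1}\,d\mu^{\psi_0}_{t_1}(x) = \int_{\mathbb{R}^d}\nu_x\,\psi_0(x)\,d\mu_0(x),
\qquad
\int_{\mathbb{R}^d}\tilde\nu_{x,t_1}\,d\mu^{\psi_0}_{t_1}(x) = \int_{\mathbb{R}^d}\tilde\nu_x\,\psi_0(x)\,d\mu_0(x),
\]
by (5.3.8) we obtain
\[
\int_{\mathbb{R}^d\times\Gamma_T}\varphi(e_{t_2}(\gamma))\,\psi_1(e_{t_1}(\gamma))\,d\nu_x(\gamma)\,\psi_0(x)\,d\mu_0(x)
= \int_{\mathbb{R}^d\times\Gamma_T}\varphi(e_{t_2}(\gamma))\,\psi_1(e_{t_1}(\gamma))\,d\tilde\nu_x(\gamma)\,\psi_0(x)\,d\mu_0(x)
\]
for any non-negative $\psi_0, \psi_1, \varphi \in C_c(\mathbb{R}^d)$ (the constraint $\int_{\mathbb{R}^d}\psi_0\,d\mu_0 = 1$ can be easily removed multiplying the above equality by a positive constant). Iterating this argument, we finally get
\[
\int_{\mathbb{R}^d\times\Gamma_T}\psi_n(e_{t_n}(\gamma))\cdots\psi_1(e_{t_1}(\gamma))\,d\nu_x(\gamma)\,\psi_0(x)\,d\mu_0(x)
= \int_{\mathbb{R}^d\times\Gamma_T}\psi_n(e_{t_n}(\gamma))\cdots\psi_1(e_{t_1}(\gamma))\,d\tilde\nu_x(\gamma)\,\psi_0(x)\,d\mu_0(x),
\]
for any non-negative $\psi_0, \ldots, \psi_n \in C_c(\mathbb{R}^d)$, and thus (5.3.6) follows.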

Considering now only rational times, we get that there exists a subset $A \subset \mathbb{R}^d$, with $\mu_0(A^c) = 0$, such that, for any $x \in A$,
\[
(e_{t_1},\ldots,e_{t_n})_\#\nu_x = (e_{t_1},\ldots,e_{t_n})_\#\tilde\nu_x \qquad\text{for any } t_1,\ldots,t_n \in [0,T]\cap\mathbb{Q}.
\]
By continuity, this implies that, for any $x \in A$, $\nu_x = \tilde\nu_x$, as wanted. □
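Remark 5.3.5. Suppose that forward uniqueness for the PDE holds in the class $\mathcal{L}_+$, and take $\mu_0 = \rho_0\mathscr{L}^d$ and $\tilde\mu_0 = \tilde\rho_0\mathscr{L}^d$, with $\rho_0, \tilde\rho_0 \in L^1_+(\mathbb{R}^d)\cap L^\infty_+(\mathbb{R}^d)$. If $\{\nu_x\}$ is a $\mu_0$-SLF and $\{\tilde\nu_x\}$ is a $\tilde\mu_0$-SLF, then
\[
\nu_x = \tilde\nu_x \qquad\text{for } \mu_0\wedge\tilde\mu_0\text{-a.e. } x.
\]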
In fact, by Remark 5.3.2 {νx } and {ν̃x } are both µ0 ∧ µ̃0 -SLF, and thus we conclude by
the uniqueness result proved above.
By Theorems 5.3.3 and 5.3.4, and by the remark above, we obtain the following:
Corollary 5.3.6 (Existence and uniqueness of SLF). Let us assume that we have forward existence and uniqueness for the PDE in $\mathcal{L}_+$. Then there exists a measurable selection of martingale solutions $\{\nu_x\}_{x\in\mathbb{R}^d}$ which is a $\mu_0$-SLF for any $\mu_0 = \rho_0\mathscr{L}^d$ with $\rho_0 \in L^1_+(\mathbb{R}^d)\cap L^\infty_+(\mathbb{R}^d)$, and if $\{\tilde\nu_x\}_{x\in\mathbb{R}^d}$ is a $\tilde\mu_0$-SLF for a fixed $\tilde\mu_0 = \tilde\rho_0\mathscr{L}^d$ with $\tilde\rho_0 \in L^1_+(\mathbb{R}^d)\cap L^\infty_+(\mathbb{R}^d)$, then $\nu_x = \tilde\nu_x$ for $\mathscr{L}^d$-a.e. $x \in \mathrm{supp}(\tilde\mu_0)$.
Proof. It suffices to consider a SLF starting from a Gaussian measure (which exists by Theorem 5.3.3), and to apply Remark 5.3.5. □
From now on, the above selection of martingale solutions $\{\nu_x\}$, which is uniquely determined $\mathscr{L}^d$-a.e., will be called the SLF (starting at time 0 and relative to $(b, a)$).
We finally prove a stability result for SLF:
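Theorem 5.3.7 (Stability of SLF starting from a fixed measure). Let us suppose that $b^n, b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ and $a^n, a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$ are uniformly bounded functions, and that we have forward existence and uniqueness for the PDE in $\mathcal{L}_+$ with coefficients $(b, a)$. Let $\mu_0 = \rho_0\mathscr{L}^d \in \mathcal{M}_+(\mathbb{R}^d)$, with $\rho_0 \in L^\infty(\mathbb{R}^d)$, and let $\{\nu^n_x\}_{x\in\mathbb{R}^d}$ and $\{\nu_x\}_{x\in\mathbb{R}^d}$ be $\mu_0$-SLF for $(b^n, a^n)$ and $(b, a)$ respectively. Define $\nu^n := \int_{\mathbb{R}^d}\nu^n_x\,d\mu_0(x)$, $\nu := \int_{\mathbb{R}^d}\nu_x\,d\mu_0(x)$. Assume that:
1. $(b^n, a^n) \to (b, a)$ in $L^1_{loc}([0,T]\times\mathbb{R}^d)$;
2. setting $\rho^n_t\mathscr{L}^d = \mu^n_t := (e_t)_\#\nu^n$, for any $t \in [0,T]$
\[
\|\rho^n_t\|_{L^\infty(\mathbb{R}^d)} \le C \qquad\text{for a certain constant } C = C(T).
\]
Then $\nu^n \rightharpoonup^* \nu$ in $\mathcal{M}(\Gamma_T)$.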
Proof. Since $(b^n, a^n)$ are uniformly bounded in $L^\infty$, as in Step 2 of the proof of Theorem 5.2.7 one proves that the sequence of probability measures $(\nu^n)$ on $\mathbb{R}^d\times\Gamma_T$ is tight. In order to conclude, we must show that any limit point of $(\nu^n)$ is $\nu$.
Let $\tilde\nu$ be any limit point of $(\nu^n)$. We claim that $\tilde\nu$ is concentrated on martingale solutions of the SDE with coefficients $(b, a)$. Indeed, let us define $\tilde\mu_t := (e_t)_\#\tilde\nu$. Since $\mu^n_t \to \tilde\mu_t$ narrowly and $\rho^n_t$ are non-negative functions bounded in $L^\infty(\mathbb{R}^d)$, we get $\tilde\mu_t = \rho_t\mathscr{L}^d$ for a certain non-negative function $\rho_t \in L^\infty(\mathbb{R}^d)$. We now observe that the argument used in Step 3 of the proof of Theorem 5.2.7 was using only the property that, for any $\varphi \in C_c^\infty(\mathbb{R}^d)$,
\[
\limsup_{n\to+\infty}\sum_i\int_s^t\!\!\int_{\mathbb{R}^d}\Big|\big(b^n_i(u,x)-\tilde b_i(u,x)\big)\partial_i\varphi(x)\Big|\,\rho^n_u(x)\,dx\,du
\le \sum_i\int_s^t\!\!\int_{\mathbb{R}^d}\Big|\big(b_i(u,x)-\tilde b_i(u,x)\big)\partial_i\varphi(x)\Big|\,\rho_u(x)\,dx\,du,
\]
\[
\limsup_{n\to+\infty}\sum_{ij}\int_s^t\!\!\int_{\mathbb{R}^d}\Big|\big(a^n_{ij}(u,x)-\tilde a_{ij}(u,x)\big)\partial_{ij}\varphi(x)\Big|\,\rho^n_u(x)\,dx\,du
\le \sum_{ij}\int_s^t\!\!\int_{\mathbb{R}^d}\Big|\big(a_{ij}(u,x)-\tilde a_{ij}(u,x)\big)\partial_{ij}\varphi(x)\Big|\,\rho_u(x)\,dx\,du
\]
for any $\tilde b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ and $\tilde a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$ bounded and continuous. This property simply follows by assumption 1 and the $w^*$-convergence of $\rho^n_t$ to $\rho_t$ in $L^\infty([0,T]\times\mathbb{R}^d)$. Since $t \mapsto \rho_t\mathscr{L}^d$ is $w^*$-continuous in the sense of measures, the $w^*$-continuity of $t \mapsto \rho_t$ in $L^\infty(\mathbb{R}^d)$ follows. Thus, if we write $\tilde\nu := \int_{\mathbb{R}^d}\tilde\nu_x\,d\mu_0(x)$ (considering the disintegration of $\tilde\nu$ with respect to $\mu_0 = (e_0)_\#\tilde\nu$), we have proved that $\{\tilde\nu_x\}$ is a $\mu_0$-SLF for $(b, a)$. Therefore, by Theorem 5.3.4, we conclude that $\nu = \tilde\nu$. □
We remark that the theory just developed could be extended to more general situations. Indeed the key property of the convex class $\mathcal{L}_+$ is the following monotonicity property:
0 ≤ µ̃t ≤ µt ∈ L+ ⇒ µ̃t ∈ L+
(see also [5, Section 3]).
5.3.2 SLF versus RLF

We remark that, in the special case a = 0, our SLF coincides with a sort of superposition
of the RLF introduced in [4]:
Lemma 5.3.8. Let us assume a = 0. Then νx,s is a martingale solution of the SDE
(which, in this case, is just an ODE) starting from x at time s if and only if it is
concentrated on integral curves of the ODE, that is, for νx,s -a.e. γ,
\[
\gamma(t) - \gamma(s) = \int_s^t b(\tau,\gamma(\tau))\,d\tau \qquad \forall t \in [s,T].
\]
Proof. It is clear from the definition of martingale solution that, if $\nu_{x,s}$ is concentrated on integral curves of the ODE, then it is a martingale solution. Let us prove the converse implication. By the definition of martingale solution and the fact that $a = 0$, it is a known fact that
\[
M_t := \gamma(t) - \gamma(s) - \int_s^t b(\tau,\gamma(\tau))\,d\tau, \qquad t \in [s,T],
\]
is a $\nu_{x,s}$-martingale with zero quadratic variation. This implies that also $M_t^2$ is a martingale, and since $M_s = 0$ we get
\[
0 = \mathbb{E}^{\nu_{x,s}}[M_t^2] = \int_{\Gamma_T}\Big(\gamma(t) - \gamma(s) - \int_s^t b(\tau,\gamma(\tau))\,d\tau\Big)^2 d\nu_{x,s}(\gamma) \qquad \forall t \in [s,T],
\]
which gives the thesis. □

Thus, in the case a = 0, a martingale solution of the SDE starting from x is simply a
measure on ΓT concentrated on integral curves of b. By the results in [4] we know that, if
we have forward uniqueness for the PDE in L+ , then any measure ν on ΓT concentrated
on integral curves of b such that its time marginals induce a solution of the PDE in L+
is concentrated on a graph, i.e. there exists a function x 7→ X(·, x) ∈ ΓT such that
ν = X(·, x)# µ0 , with µ0 := (e0 )# ν
(see for instance [7, Theorem 18]). Then, if we assume forward uniqueness for the PDE in
L+ , our SLF coincides exactly with the RLF in [4]. Applying the stability result proved
in the above paragraph, we obtain that, as the noise tends to 0, our SLF converges to
the RLF associated to the ODE γ̇ = b(γ). So we have a vanishing viscosity result for
RLF.
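Corollary 5.3.9. Let us suppose that $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ is uniformly bounded, and that we have forward existence and uniqueness for the PDE in $\mathcal{L}_+$ with coefficients $(b, 0)$. Let $\{\nu^\varepsilon_x\}_{x\in\mathbb{R}^d}$ and $\{\nu_x\}_{x\in\mathbb{R}^d}$ be the SLF relative to $(b, \varepsilon I)$ and $(b, 0)$ respectively (existence and uniqueness of martingale solutions for the SDE with coefficients $(b, \varepsilon I)$, together with the measurability of the family $\{\nu^\varepsilon_x\}_{x\in\mathbb{R}^d}$, follow from [125, Theorem 7.2.1]). Let $\mu_0 = \rho_0\mathscr{L}^d \in \mathcal{M}_+(\mathbb{R}^d)$, with $\rho_0 \in L^\infty(\mathbb{R}^d)$, and define $\nu^\varepsilon := \int_{\mathbb{R}^d}\nu^\varepsilon_x\,d\mu_0(x)$, $\nu := \int_{\mathbb{R}^d}\nu_x\,d\mu_0(x)$. Set $\rho^\varepsilon_t\mathscr{L}^d = \mu^\varepsilon_t := (e_t)_\#\nu^\varepsilon$, and assume that for any $t \in [0,T]$
\[
\|\rho^\varepsilon_t\|_{L^\infty(\mathbb{R}^d)} \le C \qquad\text{for a certain constant } C = C(T).
\]
Then $\nu^\varepsilon \rightharpoonup^* \nu$ in $\mathcal{M}(\Gamma_T)$.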
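The vanishing viscosity statement can be illustrated numerically in a very simple smooth setting. The sketch below is only a toy experiment, not part of the proof: the drift `drift(x) = -tanh(x)`, the Euler-Maruyama discretization, the step count and the seeds are all assumptions made for illustration. It compares the endpoint of a path of dX = b(X) dt + sqrt(eps) dB with the endpoint of the ODE (eps = 0), reusing the same Brownian increments for every eps.

```python
import math
import random

def drift(x):
    # Hypothetical bounded, smooth drift chosen only for illustration.
    return -math.tanh(x)

def endpoint(x0, eps, seed, T=1.0, n_steps=200):
    """Euler-Maruyama endpoint for dX = b(X) dt + sqrt(eps) dB on [0, T]."""
    rng = random.Random(seed)  # same seed => same Brownian increments for every eps
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))
        x += drift(x) * dt + math.sqrt(eps) * dB
    return x

def mean_gap(eps, x0=1.0, n_paths=50):
    """Average distance between the noisy endpoint and the eps = 0 (ODE) endpoint."""
    ode_end = endpoint(x0, 0.0, seed=0)  # with eps = 0 the seed is irrelevant
    gaps = [abs(endpoint(x0, eps, seed=k) - ode_end) for k in range(n_paths)]
    return sum(gaps) / n_paths

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        print(eps, mean_gap(eps))
```

In this smooth example the gap is expected to shrink as eps decreases; the corollary itself concerns much rougher drifts, where such a pathwise comparison is not available and the argument goes through uniqueness for the PDE.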
In [4], the uniqueness of RLF implies the semigroup law (see [4], [5] for more details).
In our case, by the uniqueness of SLF, we have as a consequence that the Chapman-
Kolmogorov equation holds:
Proposition 5.3.10. For any $s \ge 0$, let $\{\nu_{x,s}\}_{x\in\mathbb{R}^d}$ denote the unique SLF starting at time $s$. Let us denote by $\nu_{s,x}(t, dy)$ the probability measure on $\mathbb{R}^d$ given by $\nu_{s,x}(t,\cdot) := (e_t)_\#\nu_{s,x}$. Then, for any $0 \le s < t < u \le T$,
\[
\int_{\mathbb{R}^d}\nu_{t,y}(u,\cdot)\,\nu_{s,x}(t,dy) = \nu_{s,x}(u,\cdot) \qquad\text{for } \mathscr{L}^d\text{-a.e. } x.
\]
Proof. Let us define
\[
\tilde\nu_{s,x} :=
\begin{cases}
\nu_{s,x} & \text{on } C([s,t],\mathbb{R}^d) \\[2pt]
\int_{\mathbb{R}^d}\nu_{t,y}\,\nu_{s,x}(t,dy) & \text{on } C([t,T],\mathbb{R}^d).
\end{cases}
\]
This gives a family of martingale solutions starting from $x$ at time $s$ (see [125]), and, using that $\{\nu_{x,s}\}$ and $\{\nu_{x,t}\}$ are SLF starting at time $s$ and $t$ respectively, it is simple to check that $\{\tilde\nu_{s,x}\}_{x\in\mathbb{R}^d}$ is a SLF starting at time $s$. Thus, by Theorem 5.3.4, we have the thesis. □
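As a sanity check of the Chapman-Kolmogorov identity in the simplest possible case, one can take d = 1, b = 0 and a = 1, so that the time marginal of the flow started from x at time s is the Gaussian N(x, t - s). The following sketch (an illustration under these assumptions; the function names and quadrature parameters are hypothetical) verifies the identity by a trapezoidal quadrature.

```python
import math

def gauss_kernel(s, t, x, y):
    """Density at y of N(x, t - s): time marginal for b = 0, a = 1 in dimension 1."""
    var = t - s
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def chapman_kolmogorov_lhs(s, t, u, x, z, half_width=12.0, n=4000):
    """Trapezoidal approximation of the integral over y of p(t,u;y,z) p(s,t;x,y)."""
    lo = x - half_width
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        y = lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * gauss_kernel(t, u, y, z) * gauss_kernel(s, t, x, y)
    return total * h

if __name__ == "__main__":
    lhs = chapman_kolmogorov_lhs(0.0, 0.5, 1.2, 0.3, -0.4)
    rhs = gauss_kernel(0.0, 1.2, 0.3, -0.4)
    print(lhs, rhs)
```

Here the two sides agree because variances of independent Gaussian increments add; in the general setting of the proposition no explicit kernel is available and the identity follows from uniqueness of the SLF.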
5.4 Fokker-Planck equation

We now want to study the Fokker-Planck equation
\[
\partial_t\mu_t + \sum_i\partial_i(b_i\mu_t) - \frac12\sum_{ij}\partial_{ij}(a_{ij}\mu_t) = 0 \qquad\text{in } [0,T]\times\mathbb{R}^d, \tag{5.4.1}
\]
where a = (aij ) is symmetric and non-negative definite (that is, a : [0, T ]×Rd → S+ (Rd )).
5.4.1 Existence and uniqueness of measure valued solutions

Proposition 5.4.1. Let us assume that a : [0, T ]×Rd → S+ (Rd ) and b : [0, T ]×Rd → Rd
are bounded functions, having two bounded continuous spatial derivatives. Then, for any
finite measure µ0 there exists a unique finite measure-valued solution of (5.4.1) starting
from µ0 such that |µt |(Rd ) ≤ C for any t ∈ [0, T ].
Proof. Existence: let $\{\nu_x\}_{x\in\mathbb{R}^d}$ be the measurable family of martingale solutions of the SDE
\[
\begin{cases}
dX = b(t,X)\,dt + \sqrt{a(t,X)}\,dB(t) \\
X(0) = x
\end{cases}
\]
(which exists and is unique by [125, Corollary 6.3.3]). Then, by Lemma 5.2.4 and Remark 5.2.6, the measure $\mu_t := (e_t)_\#\int_{\mathbb{R}^d}\nu_x\,d\mu_0(x)$ solves (5.4.1) and $|\mu_t|(\mathbb{R}^d) \le |\mu_0|(\mathbb{R}^d)$.
Uniqueness: by linearity, it suffices to prove that, if µ0 = 0, then µt = 0 for all t ∈ [0, T ].
Fix ψ ∈ Cc∞ (Rd ), t ∈ [0, T ], and let f (t, x) be the (unique) solution of
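\[
\begin{cases}
\partial_t f + \sum_i b_i\,\partial_i f + \frac12\sum_{ij}a_{ij}\,\partial_{ij}f = 0 & \text{in } [0,t]\times\mathbb{R}^d \\
f(t) = \psi & \text{on } \mathbb{R}^d
\end{cases}
\]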
(which exists and is unique by [125, Theorem 3.2.6]). By [125, Theorems 3.1.1 and 3.2.4],
we know that $f \in C_b^{1,2}$, i.e. it is uniformly bounded with one bounded continuous time derivative and two bounded continuous spatial derivatives. Since $\mu_t$ is a finite measure by assumption, and $t \mapsto \mu_t$ is narrowly continuous (Lemma 5.2.1), we can use $f(t,\cdot)$ as test functions in (5.1.3), and we get
\[
\frac{d}{dt}\int_{\mathbb{R}^d}f(t,x)\,d\mu_t(x)
= \int_{\mathbb{R}^d}\Big[\partial_t f(t,x) + \sum_i b_i(t,x)\,\partial_i f(t,x) + \frac12\sum_{ij}a_{ij}(t,x)\,\partial_{ij}f(t,x)\Big]\,d\mu_t(x) = 0
\]
(the above computation is admissible since $f \in C_b^{1,2}$). This implies in particular that
\[
0 = \int_{\mathbb{R}^d}f(0,x)\,d\mu_0(x) = \int_{\mathbb{R}^d}f(t,x)\,d\mu_t(x) = \int_{\mathbb{R}^d}\psi(x)\,d\mu_t(x).
\]
By the arbitrariness of $\psi$ and $t$ we obtain $\mu_t = 0$ for all $t \in [0,T]$. □
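The duality mechanism behind the uniqueness proof (the pairing of a solution of the adjoint equation with a solution of the Fokker-Planck equation is constant in time) can be checked by hand in an explicit example. The sketch below assumes d = 1, b = 0, a = 1, takes the Gaussian solution N(mean, var0 + s) of (5.4.1), and uses the terminal datum cos(Kx); this datum is bounded and smooth but not compactly supported, so it is only an illustrative stand-in for the test function of the proof. All names and numerical parameters are hypothetical.

```python
import math

K = 1.5        # frequency of the terminal datum psi(x) = cos(K x) (illustrative)
T_FINAL = 1.0  # terminal time at which f equals psi

def f(s, x):
    """Solution of  d_s f + (1/2) d_xx f = 0  on [0, T_FINAL] with f(T_FINAL, x) = cos(K x)."""
    return math.exp(-K * K * (T_FINAL - s) / 2.0) * math.cos(K * x)

def density(s, x, mean=0.4, var0=0.25):
    """Gaussian solution of  d_s mu = (1/2) d_xx mu  with variance var0 + s."""
    var = var0 + s
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pairing(s, half_width=15.0, n=6000):
    """Trapezoidal approximation of the integral of f(s, x) against mu_s."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * f(s, x) * density(s, x)
    return total * h

if __name__ == "__main__":
    print([pairing(s) for s in (0.0, 0.3, 0.7, 1.0)])
```

The printed values should coincide up to quadrature error, which is the discrete counterpart of the vanishing time derivative used in the proof.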

We remark that, in the uniformly parabolic case, the above proof still works under
weaker regularity assumptions. Indeed, in that case, one has existence of a measurable
family of martingale solutions of the SDE and of a solution f ∈ Cb1,2 ([0, t] × Rd ) of the
adjoint equation if a and b are just Hölder continuous (see [125, Theorem 3.2.1]). So we
get:
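Proposition 5.4.2. Let us assume that $a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$ and $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ are bounded functions such that:
1. $\langle\xi, a(t,x)\xi\rangle \ge \alpha|\xi|^2$ $\forall (t,x) \in [0,T]\times\mathbb{R}^d$, for some $\alpha > 0$;
2. $|b(t,x) - b(s,y)| + \|a(t,x) - a(s,y)\| \le C\big(|x-y|^\delta + |t-s|^\delta\big)$ $\forall (t,x), (s,y) \in [0,T]\times\mathbb{R}^d$, for some $\delta \in (0,1]$, $C \ge 0$.
Then, for any finite measure $\mu_0$ there exists a unique finite measure-valued solution of (5.4.1) starting from $\mu_0$.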
5.4.2 Existence and uniqueness of absolutely continuous solutions in the uniformly parabolic case

We are now interested in absolutely continuous solutions of (5.1.2). Therefore, we con-
sider the following equation
\[
\begin{cases}
\partial_t u + \sum_i\partial_i(b_i u) - \frac12\sum_{ij}\partial_{ij}(a_{ij}u) = 0 & \text{in } [0,T]\times\mathbb{R}^d, \\
u(0) = u_0,
\end{cases} \tag{5.4.2}
\]
which must be understood in the distributional sense on [0, T ] × Rd . We now first prove
an existence and uniqueness result in the L2 -setting under a regularity assumption on
the divergence of a, which enables us to write (5.4.2) in a variational form, and thus
to apply classical existence results (the uniqueness part in L2 is much more involved).
Afterwards, we will give a maximum principle result.
Let us make the following assumptions on the coefficients:
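\[
\begin{gathered}
\sum_j\partial_j a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d) \ \text{ for } i = 1,\ldots,d, \qquad
\Big(\sum_i\partial_i b_i - \frac12\sum_{ij}\partial_{ij}a_{ij}\Big)^- \in L^\infty([0,T]\times\mathbb{R}^d), \\
\langle\xi, a(t,x)\xi\rangle \ge \alpha|\xi|^2 \quad \forall (t,x) \in [0,T]\times\mathbb{R}^d, \ \text{ for some } \alpha > 0.
\end{gathered} \tag{5.4.3}
\]
Theorem 5.4.3. Let us assume that $a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$ and $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ are bounded functions such that (5.4.3) is fulfilled. Then, for any $u_0 \in L^2(\mathbb{R}^d)$, (5.4.2) has a unique solution $u \in Y$, where
\[
Y := \big\{u \in L^2([0,T], H^1(\mathbb{R}^d)) \ \big|\ \partial_t u \in L^2([0,T], H^{-1}(\mathbb{R}^d))\big\}.
\]
If moreover $\partial_t a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d)$ for $i, j = 1,\ldots,d$, then existence and uniqueness holds in $L^2([0,T]\times\mathbb{R}^d)$, and so in particular any solution $u \in L^2([0,T]\times\mathbb{R}^d)$ of (5.4.2) belongs to $Y$.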
The proof of the above theorem is quite standard, except for the uniqueness result in the large space $L^2$, which is indeed quite technical and involved. The motivation for this more general result is that $L^1_+(\mathbb{R}^d)\cap L^\infty_+(\mathbb{R}^d) \subset L^2(\mathbb{R}^d)$, and $L^1_+(\mathbb{R}^d)\cap L^\infty_+(\mathbb{R}^d)$ is the space where we need well-posedness of the PDE if we want to apply the theory on martingale solutions developed in the last section (see Theorems 5.1.3 and 5.5.1).
We now give some properties of the family of solutions of (5.4.2):
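Proposition 5.4.4. We assume that $a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$ and $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ are bounded functions, and that (5.4.3) is fulfilled. Then the solution $u \in Y$ provided by Theorem 5.4.3 satisfies:
(a) $u_0 \ge 0 \Rightarrow u \ge 0$;
(b) $u_0 \in L^\infty(\mathbb{R}^d) \Rightarrow u \in L^\infty([0,T]\times\mathbb{R}^d)$ and we have
\[
\|u(t)\|_{L^\infty(\mathbb{R}^d)} \le \|u_0\|_{L^\infty(\mathbb{R}^d)}\,\exp\Big(t\,\Big\|\Big(\sum_i\partial_i b_i - \frac12\sum_{ij}\partial_{ij}a_{ij}\Big)^-\Big\|_\infty\Big);
\]
(c) if moreover
\[
\frac{a}{1+|x|^2} \in L^2([0,T]\times\mathbb{R}^d), \qquad \frac{b}{1+|x|} \in L^2([0,T]\times\mathbb{R}^d),
\]
then $u_0 \in L^1 \Rightarrow \|u(t)\|_{L^1(\mathbb{R}^d)} \le \|u_0\|_{L^1(\mathbb{R}^d)}$ $\forall t \in [0,T]$.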
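The three properties above (positivity, the sup bound and the L¹ bound) have elementary discrete analogues. As a rough illustration only, the sketch below uses an explicit finite-difference scheme for the model case b = 0, a = 1 on a periodic one-dimensional grid; the periodic domain, the grid and the time step are assumptions that are not in the text, and the scheme says nothing about the weighted integrability conditions in (c).

```python
def heat_step(u, dt, dx):
    """One explicit step of  du/dt = (1/2) d^2u/dx^2  on a periodic grid.

    With r = dt / (2 dx^2) <= 1/2 each new value is a convex combination of
    old values, which gives positivity, the max bound and mass conservation.
    """
    n = len(u)
    r = 0.5 * dt / dx ** 2
    return [u[i] + r * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]

def evolve(u0, dt, dx, n_steps):
    """Apply heat_step n_steps times starting from u0."""
    u = list(u0)
    for _ in range(n_steps):
        u = heat_step(u, dt, dx)
    return u

if __name__ == "__main__":
    n = 64
    dx = 1.0 / n
    dt = 0.4 * dx ** 2  # so that r = 0.2
    u0 = [1.0 if 20 <= i < 30 else 0.0 for i in range(n)]
    u = evolve(u0, dt, dx, 200)
    print(min(u), max(u), sum(u) * dx, sum(u0) * dx)
```

In this model case the exponent in (b) vanishes, so the discrete counterpart is simply that the maximum does not increase.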
We observe that, by the above results together with Proposition 5.4.2, we obtain:
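Corollary 5.4.5. Let us assume that $a : [0,T]\times\mathbb{R}^d \to S(\mathbb{R}^d)$ and $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ are bounded functions such that:
1. $\langle\xi, a(t,x)\xi\rangle \ge \alpha|\xi|^2$ $\forall (t,x) \in [0,T]\times\mathbb{R}^d$, for some $\alpha > 0$;
2. $|b(t,x) - b(s,y)| + \|a(t,x) - a(s,y)\| \le C\big(|x-y|^\gamma + |t-s|^\gamma\big)$ $\forall (t,x), (s,y) \in [0,T]\times\mathbb{R}^d$, for some $\gamma \in (0,1]$, $C \ge 0$;
3. $\sum_j\partial_j a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d)$ for $i = 1,\ldots,d$, $\big(\sum_i\partial_i b_i - \frac12\sum_{ij}\partial_{ij}a_{ij}\big)^- \in L^\infty([0,T]\times\mathbb{R}^d)$;
4. $\frac{a}{1+|x|^2} \in L^2([0,T]\times\mathbb{R}^d)$, $\frac{b}{1+|x|} \in L^2([0,T]\times\mathbb{R}^d)$.
Then, for any $\mu_0 \in \mathcal{M}_+(\mathbb{R}^d)$ there exists a unique finite measure-valued solution $\mu_t \in \mathcal{M}_+(\mathbb{R}^d)$ of (5.1.2) starting from $\mu_0$. Moreover, if $\mu_0 = \rho_0\mathscr{L}^d$ with $\rho_0 \in L^2(\mathbb{R}^d)$, then $\mu_t \ll \mathscr{L}^d$ for all $t \in [0,T]$.
Proof. Existence and uniqueness of finite measure-valued solutions follows by Proposition 5.4.2. So the only thing to prove is that, if $\rho_0 \in L^1(\mathbb{R}^d)\cap L^2(\mathbb{R}^d)$ is non-negative, then $\mu_t \in \mathcal{M}_+(\mathbb{R}^d)$ and $\mu_t \ll \mathscr{L}^d$ for all $t \in [0,T]$. This simply follows by the fact that the solution $u \in Y$ provided by Theorem 5.4.3 belongs to $L^1_+(\mathbb{R}^d)$ by Proposition 5.4.4, and thus coincides with $\mu_t$ by uniqueness in the set of finite measure-valued solutions. □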
In order to prove the results stated before, we need the following theorem of J.-L. Lions (see [93]):
Theorem 5.4.6. Let $H$ be a Hilbert space, provided with a norm $|\cdot|$, and inner product $(\cdot,\cdot)$. Let $\Phi \subset H$ be a subspace endowed with a prehilbertian norm $\|\cdot\|$, such that the injection $\Phi \hookrightarrow H$ is continuous. We consider a bilinear form $B : H\times\Phi \to \mathbb{R}$ such that:
- $H \ni u \mapsto B(u,\varphi)$ is continuous on $H$ for any fixed $\varphi \in \Phi$;
- there exists $\alpha > 0$ such that $B(\varphi,\varphi) \ge \alpha\|\varphi\|^2$ for any $\varphi \in \Phi$.
Then, for any linear continuous form $L$ on $\Phi$ there exists $v \in H$ such that
\[
B(v,\varphi) = L(\varphi) \qquad \forall\varphi \in \Phi.
\]
Proof of Theorem 5.4.3. We will first prove existence and uniqueness of a solution in the space $Y$. Once this is done, we will show that, if $u$ is a weak solution of (5.4.2) belonging to $L^2([0,T]\times\mathbb{R}^d)$ and $\partial_t a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d)$ for $i, j = 1,\ldots,d$, then $u$ belongs to $Y$, and so it coincides with the unique solution provided before.
The change of unknown
\[
v(t,x) = e^{-\lambda t}u(t,x)
\]
leads to the equation
\[
\begin{cases}
\partial_t v + \sum_i\partial_i(\tilde b_i v) - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j v) + \lambda v = 0 & \text{in } [0,T]\times\mathbb{R}^d, \\
v(0) = u_0,
\end{cases} \tag{5.4.4}
\]
where $\tilde b_i := b_i - \frac12\sum_j\partial_j a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d)$. Assuming that $\lambda$ satisfies $\lambda > \frac12\|(\sum_i\partial_i\tilde b_i)^-\|_\infty$, we will prove existence and uniqueness for $u$.
Step 1: existence in Y . We want to apply Theorem 5.4.6.
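Let us take $H := L^2([0,T], H^1(\mathbb{R}^d))$, $\Phi := \{\varphi \in C^\infty([0,T]\times\mathbb{R}^d) \mid \mathrm{supp}\,\varphi \subset\subset [0,T)\times\mathbb{R}^d\}$. $\Phi$ is endowed with the norm
\[
\|\varphi\|_\Phi^2 := \|\varphi\|_H^2 + \frac12\int_{\mathbb{R}^d}|\varphi(0,x)|^2\,dx.
\]
The bilinear form $B$ and the linear form $L$ are defined as
\[
B(u,\varphi) := \int_0^T\!\!\int_{\mathbb{R}^d}\Big[u\Big(-\partial_t\varphi - \sum_i\tilde b_i\,\partial_i\varphi + \lambda\varphi\Big) + \frac12\sum_{ij}a_{ij}\,\partial_j u\,\partial_i\varphi\Big]\,dx\,dt,
\]
\[
L(\varphi) := \int_{\mathbb{R}^d}u_0(x)\,\varphi(0,x)\,dx.
\]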
Thanks to these definitions and our assumptions, Lions' theorem applies, and we find a distributional solution $v$ of (5.4.4). In particular,
\[
\partial_t v = -\sum_i\partial_i(\tilde b_i v) + \frac12\sum_{ij}\partial_i(a_{ij}\partial_j v) - \lambda v \in H^* = L^2([0,T], H^{-1}(\mathbb{R}^d)),
\]
and thus $v \in Y$. In order to give a meaning to the initial condition and to show the uniqueness, we recall that for functions in $Y$ there exists a well-defined notion of trace at 0 in $L^2(\mathbb{R}^d)$, and the following Gauss-Green formula holds:
\[
\int_0^T\!\!\int_{\mathbb{R}^d}\big(\partial_t u\,\tilde u + \partial_t\tilde u\,u\big)\,dx\,dt = \int_{\mathbb{R}^d}u(T,x)\,\tilde u(T,x)\,dx - \int_{\mathbb{R}^d}u(0,x)\,\tilde u(0,x)\,dx \qquad \forall u, \tilde u \in Y \tag{5.4.5}
\]
(both facts follow by a standard approximation with smooth functions and by the fact that, if $u$ is smooth and compactly supported in $[0,T)\times\mathbb{R}^d$, $\int_{\mathbb{R}^d}u^2(0,x)\,dx \le 2\|\partial_t u\|_{H^*}\|u\|_H$). Thus, by (5.4.4) and (5.4.5), we obtain that $v$ satisfies
\[
\int_{\mathbb{R}^d}(v(0,x) - u_0(x))\,\varphi(0,x)\,dx = 0 \qquad \forall\varphi \in \Phi,
\]
and therefore the initial condition is satisfied in $L^2(\mathbb{R}^d)$.

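Step 2: uniqueness in $Y$. For the uniqueness, if $v \in Y$ is a solution of (5.4.4) with $u_0 = 0$, again by (5.4.5) we get
\[
\begin{aligned}
0 &= \int_0^T\!\!\int_{\mathbb{R}^d}\Big(\partial_t v + \sum_i\partial_i(\tilde b_i v) - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j v) + \lambda v\Big)v\,dx\,dt \\
&= \frac12\int_0^T\!\!\int_{\mathbb{R}^d}\Big[\frac{d}{dt}v^2 - \sum_i\tilde b_i\,\partial_i(v^2) + \sum_{ij}a_{ij}\,\partial_i v\,\partial_j v + 2\lambda v^2\Big]\,dx\,dt \\
&\ge \frac12\int_{\mathbb{R}^d}v^2(T,x)\,dx + \Big(\lambda - \frac12\Big\|\Big(\sum_i\partial_i\tilde b_i\Big)^-\Big\|_\infty\Big)\int_0^T\!\!\int_{\mathbb{R}^d}v^2\,dx\,dt \\
&\ge \Big(\lambda - \frac12\Big\|\Big(\sum_i\partial_i\tilde b_i\Big)^-\Big\|_\infty\Big)\int_0^T\!\!\int_{\mathbb{R}^d}v^2\,dx\,dt.
\end{aligned}
\]
Since $\lambda > \frac12\|(\sum_i\partial_i\tilde b_i)^-\|_\infty$, we get $v = 0$.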
Remark 5.4.7. We observe that the above proof still works for the PDE
\[
\begin{cases}
\partial_t u + \sum_i\partial_i(b_i u) - \frac12\sum_{ij}\partial_{ij}(a_{ij}u) = U & \text{in } [0,T]\times\mathbb{R}^d, \\
u(0) = u_0,
\end{cases}
\]
with $U \in H^* = L^2([0,T], H^{-1}(\mathbb{R}^d))$. Indeed, it suffices to define $L$ as
\[
L(\varphi) := \langle U, \varphi\rangle_{H^*,H} + \int_{\mathbb{R}^d}u_0(x)\,\varphi(0,x)\,dx,
\]
and all the rest of the proof works without any changes.
Thanks to this remark, we can now prove uniqueness in the larger space $L^2([0,T]\times\mathbb{R}^d)$ under the assumption $\partial_t a_{ij} \in L^\infty([0,T]\times\mathbb{R}^d)$ for $i, j = 1,\ldots,d$.
Step 3: uniqueness in $L^2$. If $u \in L^2([0,T]\times\mathbb{R}^d)$ is a (distributional) solution of (5.4.1), then
\[
\partial_t u - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j u) = -\sum_i\partial_i(\tilde b_i u) \in L^2([0,T], H^{-1}(\mathbb{R}^d)).
\]
By Remark 5.4.7, there exists $\tilde u \in Y$ solution of the above equation, with the same initial condition. Let us define $w := u - \tilde u \in L^2([0,T]\times\mathbb{R}^d)$. Then $w$ is a distributional solution of
\[
\begin{cases}
\partial_t w - A(\partial_x)w := \partial_t w - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j w) = 0 & \text{in } [0,T]\times\mathbb{R}^d, \\
w(0) = 0.
\end{cases}
\]
In order to conclude the proof, it suffices to prove that $w = 0$.
Step 3.1: regularization. Let us consider the PDE
wε − εA(∂x )wε = w in [0, T ] × Rd (5.4.6)
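(this is an elliptic problem degenerate in the time variable). Applying Theorem 5.4.6, with $H = \Phi := L^2([0,T], H^1(\mathbb{R}^d))$,
\[
B(u,\varphi) := \int_0^T\!\!\int_{\mathbb{R}^d}\Big(u\varphi + \frac{\varepsilon}{2}\sum_{ij}a_{ij}\,\partial_j u\,\partial_i\varphi\Big)\,dx\,dt,
\qquad
L(\varphi) := \int_0^T\!\!\int_{\mathbb{R}^d}w\varphi\,dx\,dt,
\]
we find a unique solution $w_\varepsilon$ of (5.4.6) in $L^2([0,T], H^1(\mathbb{R}^d))$, that is $w_\varepsilon = (I - \varepsilon A(\partial_x))^{-1}w$, with $(I - \varepsilon A(\partial_x)) : L^2([0,T], H^1(\mathbb{R}^d)) \to L^2([0,T], H^{-1}(\mathbb{R}^d))$ an isomorphism. Now we want to find the equation solved by $w_\varepsilon$. We observe that, since $(I - \varepsilon A(\partial_x))^{-1}$ commutes with $A(\partial_x)$ and $\partial_t w = A(\partial_x)w$, the parabolic equation solved by $w_\varepsilon$ formally reads
\[
\partial_t w_\varepsilon - A(\partial_x)w_\varepsilon = [\partial_t, (I - \varepsilon A(\partial_x))^{-1}]w.
\]
Formally computing the commutator between $\partial_t$ and $(I - \varepsilon A(\partial_x))^{-1}$, one obtains
\[
\partial_t w_\varepsilon - A(\partial_x)w_\varepsilon = \varepsilon(I - \varepsilon A(\partial_x))^{-1}\sum_{ij}\partial_j(\partial_t a_{ij}\,\partial_i w_\varepsilon) \tag{5.4.7}
\]
in the distributional sense (see (5.4.9) below). Let us assume for a moment that (5.4.7) has been rigorously justified, and let us see how we can conclude.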
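Step 3.2: Gronwall argument. By (5.4.7) it follows that $\partial_t w_\varepsilon \in L^2([0,T], H^{-1}(\mathbb{R}^d))$. Thus, recalling that $w_\varepsilon \in L^2([0,T], H^1(\mathbb{R}^d))$, we can multiply (5.4.7) by $w_\varepsilon$ and integrate on $\mathbb{R}^d$, obtaining
\[
\frac12\frac{d}{dt}\int_{\mathbb{R}^d}|w_\varepsilon|^2\,dx + \alpha\int_{\mathbb{R}^d}|\nabla_x w_\varepsilon|^2\,dx
\le -\varepsilon\int_{\mathbb{R}^d}\sum_{ij}(\partial_t a_{ij})\,\partial_i w_\varepsilon\,\partial_j\big((I - \varepsilon A(\partial_x))^{-1}w_\varepsilon\big)\,dx.
\]
We observe that $w_\varepsilon(t) \to 0$ in $L^2$ as $t \searrow 0$. Indeed, since $w_\varepsilon \in Y$ there is a well-defined notion of trace at 0 in $L^2$ (see (5.4.5)), and it is not difficult to see that this trace is 0 since $w(0) = 0$ in the sense of distributions. Thus, integrating in time the above inequality, we get
\[
\|w_\varepsilon(t)\|^2_{L^2(\mathbb{R}^d)} + 2\alpha\|\nabla_x w_\varepsilon\|^2_{L^2([0,T]\times\mathbb{R}^d)}
\le 2C\varepsilon\,\|\nabla_x w_\varepsilon\|_{L^2([0,T]\times\mathbb{R}^d)}\,\big\|\nabla_x\big((I - \varepsilon A(\partial_x))^{-1}w_\varepsilon\big)\big\|_{L^2([0,T]\times\mathbb{R}^d)} \qquad \forall t \in [0,T]. \tag{5.4.8}
\]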
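Let us consider, for a general $v \in L^2$, the function $v_\varepsilon := (I - \varepsilon A(\partial_x))^{-1}v$. Multiplying the identity $v_\varepsilon - \varepsilon A(\partial_x)v_\varepsilon = v$ by $v_\varepsilon$ and integrating on $[0,T]\times\mathbb{R}^d$, we get
\[
\|v_\varepsilon\|^2_{L^2} + \alpha\varepsilon\|\nabla_x v_\varepsilon\|^2_{L^2} \le \|v_\varepsilon\|_{L^2}\|v\|_{L^2},
\]
which implies $\|v_\varepsilon\|_{L^2} \le \|v\|_{L^2}$, and therefore $\alpha\varepsilon\|\nabla_x v_\varepsilon\|^2_{L^2} \le \|v\|^2_{L^2}$. Applying this last inequality with $v = w_\varepsilon$, we obtain
\[
\big\|\nabla_x\big((I - \varepsilon A(\partial_x))^{-1}w_\varepsilon\big)\big\|_{L^2([0,T]\times\mathbb{R}^d)} \le \frac{1}{\sqrt{\alpha\varepsilon}}\,\|w_\varepsilon\|_{L^2([0,T]\times\mathbb{R}^d)}.
\]
Substituting the above inequality in (5.4.8), we have
\[
\begin{aligned}
\|w_\varepsilon(t)\|^2_{L^2(\mathbb{R}^d)} + 2\alpha\|\nabla_x w_\varepsilon\|^2_{L^2([0,T]\times\mathbb{R}^d)}
&\le 2C\sqrt{\frac{\varepsilon}{\alpha}}\,\|\nabla_x w_\varepsilon\|_{L^2([0,T]\times\mathbb{R}^d)}\,\|w_\varepsilon\|_{L^2([0,T]\times\mathbb{R}^d)} \\
&\le C\sqrt{\frac{\varepsilon}{\alpha}}\,\|\nabla_x w_\varepsilon\|^2_{L^2([0,T]\times\mathbb{R}^d)} + C\sqrt{\frac{\varepsilon}{\alpha}}\,\|w_\varepsilon\|^2_{L^2([0,T]\times\mathbb{R}^d)},
\end{aligned}
\]
which implies, for $\varepsilon$ small enough (say $\varepsilon \le 4\alpha^3/C^2$),
\[
\|w_\varepsilon(t)\|^2_{L^2(\mathbb{R}^d)} \le C\sqrt{\frac{\varepsilon}{\alpha}}\,\|w_\varepsilon\|^2_{L^2([0,T]\times\mathbb{R}^d)} \qquad \forall t \in [0,T].
\]
By Gronwall inequality $w_\varepsilon = 0$, and thus by (5.4.6) $w = 0$.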
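The two resolvent bounds used in Step 3.2 (the resolvent is an L² contraction, and it gains one derivative at the price of a factor 1/sqrt(alpha*eps)) can be checked mode by mode in a constant-coefficient toy case. The sketch below assumes a one-dimensional periodic domain, no time dependence, and the operator A = (alpha/2) d²/dx², none of which is in the text; a function is represented by a dictionary of cosine coefficients, a representation chosen only for this illustration.

```python
import math

def resolvent_coeffs(coeffs, eps, alpha):
    """Apply (I - eps*A)^{-1}, A = (alpha/2) d^2/dx^2, to sum_k c_k cos(k x) on [0, 2*pi]."""
    return {k: c / (1.0 + 0.5 * eps * alpha * k * k) for k, c in coeffs.items()}

def l2_norm_sq(coeffs):
    """Squared L^2 norm on [0, 2*pi] (Parseval, modes k >= 1)."""
    return math.pi * sum(c * c for c in coeffs.values())

def grad_l2_norm_sq(coeffs):
    """Squared L^2 norm of the derivative on [0, 2*pi]."""
    return math.pi * sum((k * c) ** 2 for k, c in coeffs.items())

if __name__ == "__main__":
    alpha = 0.7
    v = {k: 1.0 / k for k in range(1, 200)}
    for eps in (1e-3, 1e-1, 1.0):
        v_eps = resolvent_coeffs(v, eps, alpha)
        print(eps, l2_norm_sq(v_eps) <= l2_norm_sq(v),
              alpha * eps * grad_l2_norm_sq(v_eps) <= l2_norm_sq(v))
```

For each mode the second bound reduces to the elementary inequality x/(1 + x/2)² ≤ 1 with x = eps*alpha*k², which is why it holds uniformly in the frequency.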
Step 3.3: rigorous justification of (5.4.7). In order to conclude the proof of the
theorem, we only need to rigorously justify (5.4.7).
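Let $(a^n_{ij})_{n\in\mathbb{N}}$ be a sequence of smooth functions bounded in $L^\infty$, such that $\langle a^n\xi, \xi\rangle \ge \frac{\alpha}{2}|\xi|^2$, $\sum_j\partial_j a^n_{ij}$ and $\partial_t a^n_{ij}$ are uniformly bounded, and $a^n_{ij} \to a_{ij}$, $\sum_j\partial_j a^n_{ij} \to \sum_j\partial_j a_{ij}$, $\partial_t a^n_{ij} \to \partial_t a_{ij}$ a.e.
We now compute $[\partial_t, (I - \varepsilon A^n(\partial_x))^{-1}]$, where $A^n(\partial_x) := \sum_{ij}\partial_i(a^n_{ij}\partial_j\,\cdot)$:
\[
\begin{aligned}
[\partial_t, (I - \varepsilon A^n(\partial_x))^{-1}]
&= \Big[\partial_t, \sum_{k\ge0}\varepsilon^k A^n(\partial_x)^k\Big] = \sum_{k\ge0}\varepsilon^k[\partial_t, A^n(\partial_x)^k] \\
&= \varepsilon\sum_{k=0}^\infty\sum_{i=0}^{k-1}\big(\varepsilon A^n(\partial_x)\big)^i[\partial_t, A^n(\partial_x)]\big(\varepsilon A^n(\partial_x)\big)^{k-i-1} \\
&= \varepsilon\sum_{i=0}^\infty\sum_{k>i}\big(\varepsilon A^n(\partial_x)\big)^i[\partial_t, A^n(\partial_x)]\big(\varepsilon A^n(\partial_x)\big)^{k-i-1} \\
&= \varepsilon(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1},
\end{aligned} \tag{5.4.9}
\]
where at the second equality we used the algebraic identity $[A, B^k] = \sum_{i=0}^{k-1}B^i[A, B]B^{k-i-1}$.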
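The algebraic identity for the commutator with a power holds in any associative algebra, so it can be tested exactly on small integer matrices. The following sketch is only such a check (the matrices and helper names are arbitrary choices for illustration); it does not address the convergence of the Neumann series used above.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matpow(X, p):
    """X raised to the non-negative integer power p."""
    result = identity(len(X))
    for _ in range(p):
        result = matmul(result, X)
    return result

def commutator(X, Y):
    """[X, Y] = XY - YX."""
    return matsub(matmul(X, Y), matmul(Y, X))

def commutator_power_sum(A, B, k):
    """Right-hand side: sum over i = 0..k-1 of B^i [A, B] B^(k-i-1)."""
    n = len(A)
    total = [[0] * n for _ in range(n)]
    C = commutator(A, B)
    for i in range(k):
        term = matmul(matmul(matpow(B, i), C), matpow(B, k - i - 1))
        total = matadd(total, term)
    return total

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[0, 1], [-1, 2]]
    for k in range(6):
        print(k, commutator(A, matpow(B, k)) == commutator_power_sum(A, B, k))
```

Since the arithmetic is on integers, the comparison is exact and no tolerance is needed.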
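Thus, for any $\varphi, \psi \in C_c^\infty([0,T]\times\mathbb{R}^d)$, we have
\[
\begin{aligned}
\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big)\,dx\,dt
&= \int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A^n(\partial_x))^{-1}\partial_t\varphi\big]\,dx\,dt \\
&\quad + \varepsilon\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}\varphi\big]\,dx\,dt.
\end{aligned} \tag{5.4.10}
\]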
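We now want to pass to the limit in the above identity as $n \to \infty$. Since $(I - \varepsilon A^n(\partial_x))^{-1}$ is selfadjoint in $L^2([0,T]\times\mathbb{R}^d)$ and it commutes with $A^n(\partial_x)$, we get
\[
\begin{aligned}
&\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}\varphi\big]\,dx\,dt \\
&\quad = \int_0^T\!\!\int_{\mathbb{R}^d}\big[(I - \varepsilon A^n(\partial_x))^{-1}\psi\big]\big[[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}\varphi\big]\,dx\,dt \\
&\quad = -\int_0^T\!\!\int_{\mathbb{R}^d}\big[\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\psi\big)\big]\big[(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi\big]\,dx\,dt \\
&\qquad - \int_0^T\!\!\int_{\mathbb{R}^d}\big[(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\psi\big]\big[\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big)\big]\,dx\,dt.
\end{aligned}
\]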
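By (5.4.9) we have
\[
\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big) = (I - \varepsilon A^n(\partial_x))^{-1}\partial_t\varphi + \varepsilon(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}\varphi,
\]
and, observing that $[\partial_t, A^n(\partial_x)] = \sum_{ij}\partial_i(\partial_t a^n_{ij}\,\partial_j\,\cdot)$, we deduce that the right hand side is uniformly bounded in $L^2([0,T], H^1(\mathbb{R}^d))$. In the same way one obtains
\[
\begin{aligned}
\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi\big)
&= (I - \varepsilon A^n(\partial_x))^{-1}\partial_t\big(A^n(\partial_x)\varphi\big) \\
&\quad + \varepsilon(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi \\
&= (I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)]\varphi + (I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\partial_t\varphi \\
&\quad + \varepsilon(I - \varepsilon A^n(\partial_x))^{-1}[\partial_t, A^n(\partial_x)](I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi,
\end{aligned}
\]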
and, as above, the right hand side is uniformly bounded in $L^2([0,T], H^1(\mathbb{R}^d))$. Thus $\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big)$ is uniformly bounded in $L^2([0,T], H^1(\mathbb{R}^d)) \subset L^2([0,T]\times\mathbb{R}^d)$ (the same obviously holds for $\psi$ in place of $\varphi$), while $(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi$ is uniformly bounded in $H^1([0,T]\times\mathbb{R}^d)$ (again the same fact holds for $\psi$ in place of $\varphi$). Therefore, since $H^1_{loc}([0,T]\times\mathbb{R}^d) \hookrightarrow L^2_{loc}([0,T]\times\mathbb{R}^d)$ compactly, all we have to check is that
\[
\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big) \to \partial_t\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)
\]
and
\[
(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi \to (I - \varepsilon A(\partial_x))^{-1}A(\partial_x)\varphi
\]
in the sense of distributions (indeed, by what we have shown above, $\partial_t\big((I - \varepsilon A^n(\partial_x))^{-1}\varphi\big)$ will converge weakly in $L^2$ while $(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi$ will converge strongly in $L^2_{loc}$, and therefore it is not difficult to see that the product converges to the product of the limits). We observe that, since the solution of
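\[
\varphi_\varepsilon - \varepsilon A(\partial_x)\varphi_\varepsilon = \varphi \qquad\text{in } [0,T]\times\mathbb{R}^d \tag{5.4.11}
\]
belonging to $L^2([0,T], H^1(\mathbb{R}^d))$ is unique, and any limit point of $(I - \varepsilon A^n(\partial_x))^{-1}\varphi$ belongs to $L^2([0,T], H^1(\mathbb{R}^d))$ and is a distributional solution of (5.4.11), one obtains that
\[
(I - \varepsilon A^n(\partial_x))^{-1}\varphi \to (I - \varepsilon A(\partial_x))^{-1}\varphi
\]
in the distributional sense, which implies the convergence of $\partial_t(I - \varepsilon A^n(\partial_x))^{-1}\varphi$ to $\partial_t(I - \varepsilon A(\partial_x))^{-1}\varphi$. Regarding $(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi$, let us take $\chi \in C_c^\infty([0,T]\times\mathbb{R}^d)$. Then we consider
\[
\int_0^T\!\!\int_{\mathbb{R}^d}A^n(\partial_x)\varphi\,\big[(I - \varepsilon A^n(\partial_x))^{-1}\chi\big]\,dx\,dt
= -\int_0^T\!\!\int_{\mathbb{R}^d}\sum_{ij}a^n_{ij}\,\partial_j\varphi\,\partial_i\big((I - \varepsilon A^n(\partial_x))^{-1}\chi\big)\,dx\,dt.
\]
Recalling that $(I - \varepsilon A^n(\partial_x))^{-1}\chi$ is uniformly bounded in $L^2([0,T], H^1(\mathbb{R}^d))$, we get that $\partial_j(I - \varepsilon A^n(\partial_x))^{-1}\chi$ converges to $\partial_j(I - \varepsilon A(\partial_x))^{-1}\chi$ weakly in $L^2([0,T]\times\mathbb{R}^d)$ while $a^n_{ij} \to a_{ij}$ a.e., and so the convergence of $(I - \varepsilon A^n(\partial_x))^{-1}A^n(\partial_x)\varphi$ to $(I - \varepsilon A(\partial_x))^{-1}A(\partial_x)\varphi$ follows.
Thus we are able to pass to the limit in (5.4.10), and we get $\partial_t\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big) \in L^2([0,T], H^1(\mathbb{R}^d))$ and
\[
\begin{aligned}
\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\partial_t\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt
&= \int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A(\partial_x))^{-1}\partial_t\varphi\big]\,dx\,dt \\
&\quad + \varepsilon\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A(\partial_x))^{-1}[\partial_t, A(\partial_x)](I - \varepsilon A(\partial_x))^{-1}\varphi\big]\,dx\,dt.
\end{aligned}
\]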
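Observing that $(I - \varepsilon A(\partial_x))^{-1}$ is selfadjoint in $L^2([0,T]\times\mathbb{R}^d)$ (for instance, this can be easily proved by approximation), we have that the second integral in the right hand side can be written as
\[
\begin{aligned}
&\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A(\partial_x))^{-1}[\partial_t, A(\partial_x)](I - \varepsilon A(\partial_x))^{-1}\varphi\big]\,dx\,dt \\
&\quad = \int_0^T\!\!\int_{\mathbb{R}^d}\big[(I - \varepsilon A(\partial_x))^{-1}\psi\big]\big[[\partial_t, A(\partial_x)]\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\big]\,dx\,dt.
\end{aligned}
\]
Using now that $[\partial_t, A(\partial_x)] = \sum_{ij}\partial_i(\partial_t a_{ij}\,\partial_j\,\cdot)$ in the sense of distributions, it can be easily proved by approximation that the right hand side above coincides with
\[
-\int_0^T\!\!\int_{\mathbb{R}^d}\sum_{ij}(\partial_t a_{ij})\,\partial_i\big((I - \varepsilon A(\partial_x))^{-1}\psi\big)\,\partial_j\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt.
\]
Therefore we finally obtain
\[
\begin{aligned}
\int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\partial_t\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt
&= \int_0^T\!\!\int_{\mathbb{R}^d}\psi\,\big[(I - \varepsilon A(\partial_x))^{-1}\partial_t\varphi\big]\,dx\,dt \\
&\quad - \varepsilon\int_0^T\!\!\int_{\mathbb{R}^d}\sum_{ij}(\partial_t a_{ij})\,\partial_i\big((I - \varepsilon A(\partial_x))^{-1}\psi\big)\,\partial_j\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt.
\end{aligned} \tag{5.4.12}
\]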
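By what we have proved above, it follows that
\[
\begin{gathered}
\partial_t\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big) \in L^2([0,T], H^1(\mathbb{R}^d)), \\
A(\partial_x)\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big) = (I - \varepsilon A(\partial_x))^{-1}A(\partial_x)\varphi \in L^2([0,T], H^1(\mathbb{R}^d)).
\end{gathered} \tag{5.4.13}
\]
This implies that (5.4.12) holds also for $\psi \in L^2([0,T]\times\mathbb{R}^d)$, and that $(I - \varepsilon A(\partial_x))^{-1}\varphi$ is an admissible test function in the equation $\partial_t w - A(\partial_x)w = 0$. By these two facts we obtain
\[
\begin{aligned}
0 &= \int_0^T\!\!\int_{\mathbb{R}^d}w\,\big[(\partial_t + A(\partial_x))(I - \varepsilon A(\partial_x))^{-1}\varphi\big]\,dx\,dt \\
&= \int_0^T\!\!\int_{\mathbb{R}^d}w\,\big[(I - \varepsilon A(\partial_x))^{-1}(\partial_t + A(\partial_x))\varphi\big]\,dx\,dt \\
&\quad - \varepsilon\int_0^T\!\!\int_{\mathbb{R}^d}\sum_{ij}(\partial_t a_{ij})\,\partial_i\big((I - \varepsilon A(\partial_x))^{-1}w\big)\,\partial_j\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt \\
&= \int_0^T\!\!\int_{\mathbb{R}^d}w_\varepsilon\,\big[(\partial_t + A(\partial_x))\varphi\big]\,dx\,dt - \varepsilon\int_0^T\!\!\int_{\mathbb{R}^d}\sum_{ij}(\partial_t a_{ij})\,\partial_i w_\varepsilon\,\partial_j\big((I - \varepsilon A(\partial_x))^{-1}\varphi\big)\,dx\,dt,
\end{aligned}
\]
which exactly means that
\[
\partial_t w_\varepsilon - A(\partial_x)w_\varepsilon = \varepsilon(I - \varepsilon A(\partial_x))^{-1}\sum_{ij}\partial_j(\partial_t a_{ij}\,\partial_i w_\varepsilon)
\]
in the distributional sense.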
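Proof of Proposition 5.4.4. (a) Arguing as in the first part of the proof of Theorem 5.4.3, with the same notation we have
\[
\begin{aligned}
0 &= \int_0^T\!\!\int_{\mathbb{R}^d}\Big(\partial_t v + \sum_i\partial_i(\tilde b_i v) - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j v) + \lambda v\Big)v^-\,dx\,dt \\
&= \frac12\int_0^T\!\!\int_{\mathbb{R}^d}\Big[-\frac{d}{dt}(v^-)^2 + \sum_i\tilde b_i\,\partial_i\big((v^-)^2\big) - \sum_{ij}a_{ij}\,\partial_i v^-\,\partial_j v^- - 2\lambda(v^-)^2\Big]\,dx\,dt \\
&\le -\frac12\int_{\mathbb{R}^d}(v^-)^2(T,x)\,dx - \Big(\lambda - \frac12\Big\|\Big(\sum_i\partial_i\tilde b_i\Big)^-\Big\|_\infty\Big)\int_0^T\!\!\int_{\mathbb{R}^d}(v^-)^2\,dx\,dt \\
&\le -\Big(\lambda - \frac12\Big\|\Big(\sum_i\partial_i\tilde b_i\Big)^-\Big\|_\infty\Big)\int_0^T\!\!\int_{\mathbb{R}^d}(v^-)^2\,dx\,dt,
\end{aligned}
\]
where $v^- := \max\{-v, 0\}$ (so that $v\,\partial_i v^- = -\frac12\partial_i((v^-)^2)$, which gives the sign of the drift term after one integration by parts), and then $v^- = 0$.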
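(b) It suffices to observe that the above argument works for every $v \in Y$ such that $v(0) \ge 0$ and
\[
\partial_t v + \sum_i\partial_i(\tilde b_i v) - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j v) \ge 0.
\]
Applying this remark to the function $v := \|u_0\|_{L^\infty(\mathbb{R}^d)} - ue^{-\lambda t}$ with $\lambda > \|(\sum_i\partial_i\tilde b_i)^-\|_\infty$, and then letting $\lambda \to \|(\sum_i\partial_i\tilde b_i)^-\|_\infty$, the thesis follows.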
(c) The argument we use here is reminiscent of the one that we will use in the next para-
graph for renormalized solutions. Indeed, in order to prove the thesis, we will implicitly
prove that, if u ∈ L2 ([0, T ], H 1 (Rd )) is a solution of (5.4.2), it is also a renormalized
solution (see Definition 5.4.9).
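Let us define
\[
\beta_\varepsilon(s) := \sqrt{s^2 + \varepsilon^2} - \varepsilon \in C^2(\mathbb{R}).
\]
Notice that $\beta_\varepsilon$ is convex and
\[
\beta_\varepsilon(s) \to |s| \ \text{ as } \varepsilon \to 0, \qquad \beta_\varepsilon(s) - s\beta_\varepsilon'(s) \in [-\varepsilon, 0].
\]
Moreover, since $\beta_\varepsilon', \beta_\varepsilon'' \in W^{1,\infty}(\mathbb{R})$, it is easily seen that
\[
u \in L^2([0,T], H^1(\mathbb{R}^d)) \ \Rightarrow\ \beta_\varepsilon(u), \beta_\varepsilon'(u) \in L^2([0,T], H^1(\mathbb{R}^d)).
\]
Fix now a non-negative cut-off function $\varphi \in C_c^\infty(\mathbb{R}^d)$ with $\mathrm{supp}(\varphi) \subset B_2(0)$, and $\varphi = 1$ in $B_1(0)$, and consider the functions $\varphi_R(x) := \varphi(\frac{x}{R})$ for $R \ge 1$.
Thus, since $\beta_\varepsilon'' \ge 0$ and $a_{ij}$ is positive definite, recalling that $\tilde b_i = b_i - \frac12\sum_j\partial_j a_{ij}$, for any $t \in [0,T]$ we have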
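\[
\begin{aligned}
0 &= \int_0^t\!\!\int_{\mathbb{R}^d}\Big(\partial_t u + \sum_i\partial_i(\tilde b_i u) - \frac12\sum_{ij}\partial_i(a_{ij}\partial_j u)\Big)\beta_\varepsilon'(u)\,\varphi_R\,dx\,ds \\
&= \frac12\int_0^t\!\!\int_{\mathbb{R}^d}\Big(\frac{d}{dt}\big(\varphi_R\beta_\varepsilon(u)\big) - 2\sum_i\tilde b_i\,\partial_i\big(u\beta_\varepsilon'(u)\varphi_R\big) + 2\sum_i\tilde b_i\,\partial_i\big(\beta_\varepsilon(u)\big)\varphi_R \\
&\qquad\qquad\quad + \sum_{ij}a_{ij}\,\partial_i u\,\partial_j u\,\beta_\varepsilon''(u)\,\varphi_R + \sum_{ij}a_{ij}\,\partial_i\big(\beta_\varepsilon(u)\big)\,\partial_j\varphi_R\Big)\,dx\,ds \\
&\ge \frac12\int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(t))\,dx - \frac12\int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(0))\,dx \\
&\quad - \int_0^t\!\!\int_{\mathbb{R}^d}\sum_i\tilde b_i\Big(\partial_i\big(u\beta_\varepsilon'(u) - \beta_\varepsilon(u)\big)\varphi_R + \beta_\varepsilon(u)\,\partial_i\varphi_R\Big)\,dx\,ds \\
&\quad - \frac12\int_0^t\!\!\int_{\mathbb{R}^d}\sum_{ij}\Big((\partial_j a_{ij})\,\partial_i\varphi_R + a_{ij}\,\partial_{ij}\varphi_R\Big)\beta_\varepsilon(u)\,dx\,ds \\
&\ge \frac12\int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(t))\,dx - \frac12\int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(0))\,dx - \int_0^t\!\!\int_{\mathbb{R}^d}\Big(\sum_i\partial_i\tilde b_i\Big)^-\big(u\beta_\varepsilon'(u) - \beta_\varepsilon(u)\big)\varphi_R\,dx\,ds \\
&\quad - \int_0^t\!\!\int_{\mathbb{R}^d}\Big(\sum_i b_i\,\partial_i\varphi_R + \frac12\sum_{ij}a_{ij}\,\partial_{ij}\varphi_R\Big)\beta_\varepsilon(u)\,dx\,ds.
\end{aligned}
\]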
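Observing that $|\beta_\varepsilon(u)| \le |u|$, and using Hölder inequality and the inequalities
\[
\frac1R\,\chi_{\{R\le|x|\le2R\}} \le \frac{3}{1+|x|}\,\chi_{\{|x|\ge R\}}, \qquad
\frac{1}{R^2}\,\chi_{\{R\le|x|\le2R\}} \le \frac{5}{1+|x|^2}\,\chi_{\{|x|\ge R\}}, \tag{5.4.14}
\]
we get
\[
\begin{aligned}
\int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(t))\,dx
&\le \int_{\mathbb{R}^d}\varphi_R\,\beta_\varepsilon(u(0))\,dx + 2\varepsilon\int_0^t\!\!\int_{|x|\le2R}\Big(\sum_i\partial_i\tilde b_i\Big)^-\,dx\,ds \\
&\quad + \|\varphi\|_{C^2}\Big(6\Big\|\frac{b}{1+|x|}\Big\|_{L^2([0,T]\times\{|x|\ge R\})} + 5\Big\|\frac{a}{1+|x|^2}\Big\|_{L^2([0,T]\times\{|x|\ge R\})}\Big)\|u\|_{L^2([0,t]\times\mathbb{R}^d)}.
\end{aligned}
\]
Letting first $\varepsilon \to 0$ and then $R \to \infty$, we obtain
\[
\|u(t)\|_{L^1(\mathbb{R}^d)} \le \|u(0)\|_{L^1(\mathbb{R}^d)} \qquad \forall t \in [0,T].
\]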
5.4.3 Existence and uniqueness in the degenerate parabolic case

We now want to drop the uniform ellipticity assumption on a. In this case, to prove
existence and uniqueness in L+ , we will need to assume a independent of the space
variables.
• Uniqueness in L
The uniqueness result is a consequence of the following comparison principle in L (recall
that the comparison principle is said to hold if the inequality between two solutions at
time 0 is preserved at later times).
Theorem 5.4.8 (Comparison principle in L ). Let us assume that a : [0, T ] →

S+ (Rd ) and b : [0, T ] × Rd → Rd are such that:
P
2. a ∈ L∞ ([0, T ], S+ (Rd )).
Then (5.4.1) satisfies the comparison principle in L1 (Rd ) ∩ L∞ (Rd ). In particular solu-
tions of the PDE in L , if they exist, are unique.
Since we do not assume any ellipticity of the PDE, in order to prove the above result
we use the technique of renormalized solutions, which was first introduced in the study
of the Boltzmann equation by DiPerna and P.-L.Lions [54, 55], and then applied in the
context of transport equations by many authors (see for example [56, 28, 47, 48, 4]).
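Definition 5.4.9. Let $a : [0,T]\times\mathbb{R}^d \to S_+(\mathbb{R}^d)$, $b : [0,T]\times\mathbb{R}^d \to \mathbb{R}^d$ be such that:
1. $b, \sum_i\partial_i b_i \in L^1_{loc}([0,T]\times\mathbb{R}^d)$;
2. $a, \sum_j\partial_j a_{ij}, \sum_{ij}\partial_{ij}a_{ij} \in L^1_{loc}([0,T]\times\mathbb{R}^d)$.
Let $u \in L^\infty_{loc}([0,T]\times\mathbb{R}^d)$ and assume that
\[
c := \partial_t u + \sum_i b_i\,\partial_i u - \frac12\sum_{ij}a_{ij}\,\partial_{ij}u \in L^1_{loc}([0,T]\times\mathbb{R}^d). \tag{5.4.15}
\]
We say that $u$ is a renormalized solution of (5.4.15) if, for any convex function $\beta : \mathbb{R} \to \mathbb{R}$ of class $C^2$, we have
\[
\partial_t\beta(u) + \sum_i b_i\,\partial_i\beta(u) - \frac12\sum_{ij}a_{ij}\,\partial_{ij}\beta(u) \le c\,\beta'(u).
\]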
Equivalently the definition could be given in a partially conservative form:
\[
\partial_t\beta(u) + \sum_i\partial_i(b_i\beta(u)) - \frac12\sum_{ij}a_{ij}\,\partial_{ij}\beta(u) \le c\,\beta'(u) + \Big(\sum_i\partial_i b_i\Big)\beta(u).
\]
Recalling that $a$ is non-negative definite and $\beta$ is convex, it is simple to check that, if everything is smooth so that one can apply the standard chain rule, every solution of (5.4.15) is a renormalized solution. Indeed, in that case, one gets
\[
\partial_t\beta(u) + \sum_i b_i\,\partial_i\beta(u) - \frac12\sum_{ij}a_{ij}\,\partial_{ij}\beta(u) = c\,\beta'(u) - \frac12\beta''(u)\sum_{ij}a_{ij}\,\partial_i u\,\partial_j u \le c\,\beta'(u).
\]
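In our case, a solution of the Fokker-Planck equation is renormalized if
\[
\partial_t \beta(u) + \sum_i \Big( b_i - \sum_j \partial_j a_{ij} \Big) \partial_i \beta(u) - \frac{1}{2} \sum_{ij} a_{ij} \partial_{ij} \beta(u) \le \Big( \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} - \sum_i \partial_i b_i \Big) u \beta'(u),
\]
or equivalently, writing everything in the partially conservative form,
\[
\begin{aligned}
\partial_t \beta(u) &+ \sum_i \partial_i \Big( \Big( b_i - \sum_j \partial_j a_{ij} \Big) \beta(u) \Big) - \frac{1}{2} \sum_{ij} a_{ij} \partial_{ij} \beta(u) \\
&\le \Big( \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} - \sum_i \partial_i b_i \Big) u \beta'(u) + \sum_i \partial_i \Big( b_i - \sum_j \partial_j a_{ij} \Big) \beta(u) \\
&= \Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big) \big( \beta(u) - u \beta'(u) \big) - \frac{1}{2} \Big( \sum_{ij} \partial_{ij} a_{ij} \Big) \beta(u).
\end{aligned}
\]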
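Now, since
\[
\begin{aligned}
\sum_{ij} a_{ij} \partial_{ij} \beta(u) &= \sum_{ij} \partial_j \big( a_{ij} \partial_i \beta(u) \big) - \sum_{ij} \partial_j a_{ij}\, \partial_i \beta(u) \\
&= \sum_{ij} \partial_{ij} \big( a_{ij} \beta(u) \big) - 2 \sum_{ij} \partial_i \big( (\partial_j a_{ij}) \beta(u) \big) + \Big( \sum_{ij} \partial_{ij} a_{ij} \Big) \beta(u),
\end{aligned}
\]
the above expression can be simplified, and we obtain that a solution of the Fokker-Planck equation is renormalized if and only if
\[
\partial_t \beta(u) + \sum_i \partial_i (b_i \beta(u)) - \frac{1}{2} \sum_{ij} \partial_{ij} (a_{ij} \beta(u)) \le \Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big) \big( \beta(u) - u \beta'(u) \big). \tag{5.4.16}
\]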
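The product-rule identity used to pass to (5.4.16) can be checked exactly in one variable, where it reads $a w'' = (a w)'' - 2 (a' w)' + a'' w$ with $w$ standing for $\beta(u)$. The sketch below does this with integer polynomial arithmetic; the helper names and the sample polynomials are hypothetical and serve only as an illustration.

```python
# Polynomials in one variable are stored as coefficient lists [c0, c1, c2, ...].

def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])


def scale(k, p):
    return trim([k * c for c in p])


def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)


def diff(p):
    """Derivative of the polynomial p."""
    return trim([i * c for i, c in enumerate(p)][1:] or [0])


def identity_sides(a, w):
    """Return (a w'', (a w)'' - 2 (a' w)' + a'' w) as coefficient lists."""
    lhs = mul(a, diff(diff(w)))
    rhs = add(
        add(diff(diff(mul(a, w))), scale(-2, diff(mul(diff(a), w)))),
        mul(diff(diff(a)), w),
    )
    return lhs, rhs
```

For instance, with $a(x) = 1 + x^2$ and $w(x) = x^3$, the call `identity_sides([1, 0, 1], [0, 0, 0, 1])` returns the same polynomial $6x + 6x^3$ on both sides.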
It is not difficult to prove the following:
Lemma 5.4.10. Assume that there exist $p, q \in [1, \infty]$ such that
\[
\frac{a}{1 + |x|^2} \in L^1([0,T], L^p(\mathbb{R}^d)), \qquad \frac{b}{1 + |x|} \in L^1([0,T], L^q(\mathbb{R}^d)),
\]
and that
\[
\Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big)^- \in L^1_{loc}([0,T] \times \mathbb{R}^d).
\]
Setting $a, b = 0$ for $t < 0$, assume moreover that any solution $u \in \mathcal{L}$ of the Fokker-Planck equation in $(-\infty, T) \times \mathbb{R}^d$ is renormalized. Then the comparison principle holds in $\mathcal{L}$.
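Proof. By the linearity of the equation, it suffices to prove that
\[
u_0 \le 0 \ \Rightarrow\ u(t) \le 0 \quad \forall t \in [0,T].
\]
Fix a non-negative cut-off function $\varphi \in C_c^\infty(\mathbb{R}^d)$ with $\mathrm{supp}(\varphi) \subset B_2(0)$, and $\varphi = 1$ in $B_1(0)$, and take as renormalization function
\[
\beta_\varepsilon(s) := \frac{1}{2} \Big( \sqrt{s^2 + \varepsilon^2} + s - \varepsilon \Big) \in C^2(\mathbb{R}).
\]
Notice that $\beta_\varepsilon$ is convex and
\[
\beta_\varepsilon(s) \to s^+ \ \text{as } \varepsilon \to 0, \qquad \beta_\varepsilon(s) - s \beta_\varepsilon'(s) \in [-\varepsilon, 0].
\]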
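The properties of $\beta_\varepsilon$ just stated are elementary and can be checked numerically. The following sketch uses the explicit derivatives of $\beta_\varepsilon$ (computed by hand here, not quoted from the text); the function names are hypothetical.

```python
import math


def beta_eps(s, eps):
    """The renormalization function (sqrt(s^2 + eps^2) + s - eps) / 2."""
    return 0.5 * (math.sqrt(s * s + eps * eps) + s - eps)


def dbeta_eps(s, eps):
    """First derivative of beta_eps in s."""
    return 0.5 * (s / math.sqrt(s * s + eps * eps) + 1.0)


def d2beta_eps(s, eps):
    """Second derivative of beta_eps in s; non-negative, hence convexity."""
    return 0.5 * eps * eps / (s * s + eps * eps) ** 1.5


def defect(s, eps):
    """The quantity beta_eps(s) - s beta_eps'(s), which lies in [-eps, 0]."""
    return beta_eps(s, eps) - s * dbeta_eps(s, eps)
```

A direct computation gives `defect(s, eps)` equal to $\frac{1}{2}\big(\varepsilon^2/\sqrt{s^2+\varepsilon^2} - \varepsilon\big)$, so it in fact lies in $[-\varepsilon/2, 0]$, and $|\beta_\varepsilon(s) - s^+| \le \varepsilon/2$.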
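By (5.4.16), we know that
\[
\partial_t \beta_\varepsilon(u) + \sum_i \partial_i (b_i \beta_\varepsilon(u)) - \frac{1}{2} \sum_{ij} \partial_{ij} (a_{ij} \beta_\varepsilon(u)) \le \Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big) \big( \beta_\varepsilon(u) - u \beta_\varepsilon'(u) \big)
\]
in the sense of distributions in $(-\infty, T) \times \mathbb{R}^d$. Using as test function $\varphi_R(x) := \varphi(x/R)$ for $R \ge 1$, we get
\[
\begin{aligned}
\frac{d}{dt} \int_{\mathbb{R}^d} \varphi_R \beta_\varepsilon(u)\,dx \le{}& \int_{\mathbb{R}^d} \Big( \sum_i b_i(t) \partial_i \varphi_R + \frac{1}{2} \sum_{ij} a_{ij}(t) \partial_{ij} \varphi_R \Big) \beta_\varepsilon(u)\,dx \\
&+ \int_{\mathbb{R}^d} \varphi_R \Big( \sum_i \partial_i b_i(t) - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij}(t) \Big) \big( \beta_\varepsilon(u) - u \beta_\varepsilon'(u) \big)\,dx.
\end{aligned}
\]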
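Observing that $|\beta_\varepsilon(u)| \le |u|$, by Hölder's inequality and the inequalities (5.4.14) we
can bound the first integral in the right hand side, uniformly with respect to $\varepsilon$, with
\[
\begin{aligned}
&\int_{\{|x| \ge R\}} \|\varphi\|_{C^2} \Big( 3\,\frac{|b(t,x)|}{1 + |x|} + \frac{5}{2}\,\frac{|a(t,x)|}{1 + |x|^2} \Big) |u(t,x)|\,dx \\
&\qquad \le \|\varphi\|_{C^2} \Big( 3\,\Big\| \frac{b(t)}{1 + |x|} \Big\|_{L^q(\{|x| \ge R\})} \|u(t)\|_{L^{q'}(\mathbb{R}^d)} + \frac{5}{2}\,\Big\| \frac{a(t)}{1 + |x|^2} \Big\|_{L^p(\{|x| \ge R\})} \|u(t)\|_{L^{p'}(\mathbb{R}^d)} \Big)
\end{aligned}
\]
(recall that $u \in \mathcal{L}$, and thus $u \in L^\infty([0,T], L^r(\mathbb{R}^d))$ for any $r \in [1, \infty]$; here $p'$ and $q'$ denote the conjugate exponents of $p$ and $q$), while the second
integral is bounded by
\[
\varepsilon \int_{\{|x| \le 2R\}} \Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big)^- dx.
\]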
Letting first $\varepsilon \to 0$ and then $R \to \infty$, we get
\[
\frac{d}{dt} \int_{\mathbb{R}^d} u^+\,dx \le 0
\]
in the sense of distributions in $(-\infty, T)$. Since the function vanishes for negative times,
we conclude $u^+ = 0$. □
Now Theorem 5.4.8 is a direct consequence of the following:
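Proposition 5.4.11. Let us assume that $a : [0,T] \to S_+(\mathbb{R}^d)$ and $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$
are such that:
1. $b \in L^1([0,T], BV_{loc}(\mathbb{R}^d))$, $\sum_i \partial_i b_i \in L^1_{loc}([0,T] \times \mathbb{R}^d)$;
2. $a \in L^\infty([0,T], S_+(\mathbb{R}^d))$.
Then any distributional solution $u \in L^\infty_{loc}([0,T] \times \mathbb{R}^d)$ of (5.4.15) is renormalized.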
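Proof. We take $\eta$, a smooth convolution kernel in $\mathbb{R}^d$, and we mollify the equation with
respect to the spatial variable obtaining
\[
\partial_t u^\varepsilon + \sum_i b_i \partial_i u^\varepsilon - \frac{1}{2} \sum_{ij} a_{ij} \partial_{ij} u^\varepsilon = c * \eta_\varepsilon - r^\varepsilon, \tag{5.4.17}
\]
where
\[
r^\varepsilon := \sum_i (b_i \partial_i u) * \eta_\varepsilon - \sum_i b_i \partial_i (u * \eta_\varepsilon), \qquad u^\varepsilon := u * \eta_\varepsilon.
\]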
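By the smoothness of $u^\varepsilon$ with respect to $x$, by (5.4.17) we have that $\partial_t u^\varepsilon \in L^1_{loc}$. Thus
by the standard chain rule in Sobolev spaces we get that $u^\varepsilon$ is a renormalized solution,
that is
\[
\partial_t \beta(u^\varepsilon) + \sum_i b_i \partial_i \beta(u^\varepsilon) - \frac{1}{2} \sum_{ij} a_{ij} \partial_{ij} \beta(u^\varepsilon) \le (c * \eta_\varepsilon - r^\varepsilon)\, \beta'(u^\varepsilon)
\]
for any $\beta \in C^2(\mathbb{R})$ convex. Passing to the limit in the distributional sense as $\varepsilon \to 0$ in
the above inequality, the convergence of all the terms is trivial except for $r^\varepsilon \beta'(u^\varepsilon)$.
Let $\sigma_\eta$ be any weak limit point of $r^\varepsilon \beta'(u^\varepsilon)$ in the sense of measures (such a cluster point
exists since $r^\varepsilon \beta'(u^\varepsilon)$ is bounded in $L^1_{loc}$). Thus we get
\[
\partial_t \beta(u) + \sum_i b_i \partial_i \beta(u) - \frac{1}{2} \sum_{ij} a_{ij} \partial_{ij} \beta(u) - c\,\beta'(u) \le -\sigma_\eta \le |\sigma_\eta|.
\]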
Since the left hand side is independent of $\eta$, in order to conclude the proof it suffices to
prove that $\bigwedge_\eta |\sigma_\eta| = 0$, where $\eta$ varies in a dense countable set of convolution kernels.
This fact is implicitly proved in [5, Theorem 34], see in particular Step 3 therein. □
• Existence in L+
We can now prove an existence and uniqueness result in the class L+ .
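Theorem 5.4.12. Let us assume that $a : [0,T] \times \mathbb{R}^d \to S(\mathbb{R}^d)$ and $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$
are bounded functions such that
\[
\Big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \Big)^- \in L^1([0,T], L^\infty(\mathbb{R}^d)).
\]
Then, for any $\mu_0 = \rho_0 \mathscr{L}^d \in M_+(\mathbb{R}^d)$, with $\rho_0 \in L^1(\mathbb{R}^d) \cap L^\infty(\mathbb{R}^d)$, there exists a solution
of (5.1.2) in $\mathcal{L}_+$. If moreover $b \in L^1([0,T], BV_{loc}(\mathbb{R}^d))$, $\sum_i \partial_i b_i \in L^1_{loc}([0,T] \times \mathbb{R}^d)$, and
$a$ is independent of $x$, then this solution turns out to be unique.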

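Proof. Existence: it suffices to approximate the coefficients $a$ and $b$ locally uniformly
with smooth uniformly bounded coefficients $a^n$ and $b^n$ such that $\big( \sum_i \partial_i b^n_i - \frac{1}{2} \sum_{ij} \partial_{ij} a^n_{ij} \big)^-$
is uniformly bounded in $L^1([0,T], L^\infty(\mathbb{R}^d))$. Indeed, if we now consider the approximate
solutions $\mu^n_t = \rho^n_t \mathscr{L}^d \in M_+(\mathbb{R}^d)$, we know that
\[
\partial_t \rho^n_t + \sum_i \partial_i (b^n_i \rho^n_t) - \frac{1}{2} \sum_{ij} \partial_{ij} (a^n_{ij} \rho^n_t) = 0,
\]
that is
\[
\partial_t \rho^n_t - \frac{1}{2} \sum_{ij} a^n_{ij} \partial_{ij} \rho^n_t + \sum_i \Big( b^n_i - \sum_j \partial_j a^n_{ij} \Big) \partial_i \rho^n_t + \Big( \sum_i \partial_i b^n_i - \frac{1}{2} \sum_{ij} \partial_{ij} a^n_{ij} \Big) \rho^n_t = 0.
\]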
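Using the Feynman-Kac formula, we obtain the bound
\[
\|\rho^n_t\|_{L^\infty(\mathbb{R}^d)} \le \|\rho_0\|_{L^\infty(\mathbb{R}^d)} \exp\Big( \int_0^t \Big\| \Big( \sum_i \partial_i b^n_i(s) - \frac{1}{2} \sum_{ij} \partial_{ij} a^n_{ij}(s) \Big)^- \Big\|_{L^\infty(\mathbb{R}^d)} ds \Big).
\]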
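To see what the exponential bound above says in the simplest situation, one can look at a hypothetical one-dimensional transport example with $a = 0$ and $b(x) = -x$, so that $(\partial_x b)^- = 1$ and the bound reads $\|\rho_t\|_{L^\infty} \le e^t \|\rho_0\|_{L^\infty}$. This $b$ is unbounded, so the example lies outside the hypotheses of Theorem 5.4.12 and is meant only as an illustration of the formula. The sketch below checks numerically that $\rho(t,x) = e^t \rho_0(x e^t)$ solves the equation, conserves mass, and attains the bound.

```python
import math


def rho0(x):
    """Hypothetical initial density: a Gaussian profile with sup norm 1."""
    return math.exp(-x * x)


def rho(t, x):
    """Explicit solution of d_t rho + d_x(b rho) = 0 with b(x) = -x and a = 0."""
    return math.exp(t) * rho0(x * math.exp(t))


def pde_residual(t, x, h=1e-5):
    """Central-difference value of d_t rho + d_x(-x rho) at (t, x)."""
    rho_t = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
    flux_x = (-(x + h) * rho(t, x + h) + (x - h) * rho(t, x - h)) / (2 * h)
    return rho_t + flux_x


def sup_norm(t, n=2001, length=10.0):
    """Maximum of rho(t, .) on a uniform grid of [-length, length] containing 0."""
    half = (n - 1) // 2
    dx = length / half
    return max(rho(t, (k - half) * dx) for k in range(n))


def mass(t, n=20001, length=10.0):
    """Riemann-sum approximation of the integral of rho(t, .) over the line."""
    half = (n - 1) // 2
    dx = length / half
    return sum(rho(t, (k - half) * dx) for k in range(n)) * dx
```

In this example the density concentrates at the origin, its total mass stays equal to $\sqrt{\pi}$, and its maximum grows exactly like $e^t$, so the exponential factor in the bound cannot be improved in general.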
So we see that the approximate solutions are non-negative and uniformly bounded in
$L^1 \cap L^\infty$; the bound in $L^1$ follows by the constancy of the map $t \mapsto \|\rho^n_t\|_{L^1}$ (observe that
$\rho^n_t \ge 0$ and recall Remark 5.2.8). Therefore, any weak limit is a solution of the PDE in
$\mathcal{L}_+$.
Uniqueness: it follows by Theorem 5.4.8. □
5.5 Conclusions
Let us now combine the results proved in Sections 5.2 and 5.4 in order to get existence
and uniqueness of SLF. The first theorem follows directly by Corollary 5.3.6 and Theorem
5.1.3, while the second is a consequence of Corollary 5.3.6 and Theorem 5.1.4.

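1. $\sum_j \partial_j a_{ij} \in L^\infty([0,T] \times \mathbb{R}^d)$ for $i = 1, \dots, d$;
2. $\partial_t a_{ij} \in L^\infty([0,T] \times \mathbb{R}^d)$ for $i, j = 1, \dots, d$;
3. $\big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \big)^- \in L^\infty([0,T] \times \mathbb{R}^d)$;
5. $\frac{a}{1 + |x|^2} \in L^2([0,T] \times \mathbb{R}^d)$, $\frac{b}{1 + |x|} \in L^2([0,T] \times \mathbb{R}^d)$.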
Then there exists a unique SLF (in the sense of Corollary 5.3.6).
If moreover $(b^n, a^n) \to (b, a)$ in $L^1_{loc}([0,T] \times \mathbb{R}^d)$ and $\big( \sum_i \partial_i b^n_i - \frac{1}{2} \sum_{ij} \partial_{ij} a^n_{ij} \big)^-$ are
uniformly bounded in $L^1([0,T], L^\infty(\mathbb{R}^d))$, then the Feynman-Kac formula implies (ii) of
Theorem 5.3.7 (see the proof of Theorem 5.4.12). Thus we have stability of SLF.
Theorem 5.5.2. Let us assume that $a : [0,T] \to S(\mathbb{R}^d)$ and $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$ are
bounded functions such that:
1. $b \in L^1([0,T], BV_{loc}(\mathbb{R}^d))$, $\sum_i \partial_i b_i \in L^1_{loc}([0,T] \times \mathbb{R}^d)$;
2. $\big( \sum_i \partial_i b_i \big)^- \in L^1([0,T], L^\infty(\mathbb{R}^d))$.
Then there exists a unique SLF (in the sense of Corollary 5.3.6).
If moreover $(b^n, a^n) \to (b, a)$ in $L^1_{loc}([0,T] \times \mathbb{R}^d)$ and $\big( \sum_i \partial_i b^n_i - \frac{1}{2} \sum_{ij} \partial_{ij} a^n_{ij} \big)^-$ are
uniformly bounded in $L^1([0,T], L^\infty(\mathbb{R}^d))$, then the Feynman-Kac formula implies (ii) of
Theorem 5.3.7 (see the proof of Theorem 5.4.12). Thus we have stability of SLF.
In particular, by Corollary 5.3.9 and the Feynman-Kac formula (see the proof of
Theorem 5.4.12), the following vanishing viscosity result for RLF holds:
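Theorem 5.5.3. Let us assume that $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$ is bounded and:
1. $b \in L^1([0,T], BV_{loc}(\mathbb{R}^d))$, $\sum_i \partial_i b_i \in L^1_{loc}([0,T] \times \mathbb{R}^d)$;
2. $\big( \sum_i \partial_i b_i \big)^- \in L^1([0,T], L^\infty(\mathbb{R}^d))$.
Let $\{\nu^\varepsilon_x\}_{x \in \mathbb{R}^d}$ be the unique SLF relative to $(b, \varepsilon I)$, with $\varepsilon > 0$, and $\{\nu_x\}_{x \in \mathbb{R}^d}$ be the RLF
relative to $(b, 0)$ (which is uniquely determined $\mathscr{L}^d$-a.e. by the results in [4]). Then, as
$\varepsilon \to 0$,
\[
\int_{\mathbb{R}^d} \nu^\varepsilon_x f(x)\,dx \ \rightharpoonup^*\ \int_{\mathbb{R}^d} \nu_x f(x)\,dx \quad \text{in } M(\Gamma_T) \text{ for any } f \in C_c(\mathbb{R}^d).
\]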
We finally combine an important uniqueness result of Stroock and Varadhan (see

Theorem 5.2.2) with the well-posedness results on Fokker-Planck of the previous section.
By Theorem 5.2.2, Lemma 5.2.3 applied with A = Rd and Corollary 5.4.5, we have:

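2. $|b(t,x) - b(s,y)| + \|a(t,x) - a(s,y)\| \le C \left( |x - y|^\gamma + |t - s|^\gamma \right)$ $\forall (t,x), (s,y) \in [0,T] \times \mathbb{R}^d$, for some $\gamma \in (0,1]$, $C \ge 0$;
3. $\sum_j \partial_j a_{ij} \in L^\infty([0,T] \times \mathbb{R}^d)$ for $i = 1, \dots, d$, $\big( \sum_i \partial_i b_i - \frac{1}{2} \sum_{ij} \partial_{ij} a_{ij} \big)^- \in L^\infty([0,T] \times \mathbb{R}^d)$;
4. $\frac{a}{1 + |x|^2} \in L^2([0,T] \times \mathbb{R}^d)$, $\frac{b}{1 + |x|} \in L^2([0,T] \times \mathbb{R}^d)$.
Then, there exists a unique martingale solution starting from $x$ (at time 0) for any
$x \in \mathbb{R}^d$.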
We remark that this result is not interesting by itself, since it can be proved that the
martingale problem starting from any x ∈ Rd at any initial time s ∈ [0, T ] is well-posed
also under weaker regularity assumptions (see [125, Chapters 6 and 7]). We stated it
just because we believe that it is an interesting example of how existence and uniqueness
at the PDE level can be combined with a refined analysis at the level of the uniqueness
of martingale solutions. It is indeed in this spirit that we generalize Theorem 5.2.2 in
the following section, hoping that it could be useful for further analogous applications.
5.6 A generalized uniqueness result for martingale solutions
Here we generalize Theorem 5.2.2, using the notation introduced in Paragraph 5.3.1.
Proposition 5.6.1. For any $(s, x) \in [0,T] \times \mathbb{R}^d$, let $C_{x,s}$ be a subset of martingale
solutions of the SDE starting from $x$ at time $s$, and let us make the following assumptions:
there exists a measure $\mu_0 \in M_+(\mathbb{R}^d)$ such that:
(i) $\forall s \in [0,T]$, $C_{x,s}$ is convex for $\mu_0$-a.e. $x$;
(ii) $\forall s \in [0,T]$, $\forall t \in [s,T]$, for $\mu_0$-a.e. $x$,
\[
(e_t)_\# \nu^1_{x,s} = (e_t)_\# \nu^2_{x,s} \quad \forall \nu^1_{x,s}, \nu^2_{x,s} \in C_{x,s};
\]
(iii) for $\mu_0$-a.e. $x$, for any $\nu_x \in C_x := C_{x,0}$, for $\nu_x$-a.e. $\gamma$,
\[
\forall t \in [0,T], \quad \nu^{i,\gamma}_{x,F_t} := (\nu^i_x)^\gamma_{F_t} \in C_{\gamma(t),t},
\]
where, with the above notation, we mean that the restriction of $\nu^{i,\gamma}_{x,F_t}$ to $\Gamma^t_T := C([t,T], \mathbb{R}^d)$ is a martingale solution starting from $\gamma(t)$ at time $t$;
(iv) the solution of (5.1.2) starting from $\mu_0$ given by $\mu_t := (e_t)_\# \int_{\mathbb{R}^d} \nu_x\,d\mu_0(x)$ for a
measurable selection $\{\nu_x\}_{x \in \mathbb{R}^d}$ with $\nu_x \in C_x$ (observe that $\mu_t$ does not depend on
the choice of $\nu_x \in C_x$ by (ii)), satisfies $\mu_t \ll \mu_0$ for any $t \in [0,T]$.
Then, given two measurable families of probability measures $\{\nu^1_x\}_{x \in \mathbb{R}^d}$ and $\{\nu^2_x\}_{x \in \mathbb{R}^d}$ with
$\nu^1_x, \nu^2_x \in C_x$, $\nu^1_x = \nu^2_x$ for $\mu_0$-a.e. $x$. In particular, by standard measurable selection
theorems (see for instance [125, Chapter 12]), $C_x$ is a singleton for $\mu_0$-a.e. $x$.
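Proof. Let $\{\nu^1_x\}_{x \in \mathbb{R}^d}$ and $\{\nu^2_x\}_{x \in \mathbb{R}^d}$ be two measurable families of probability measures
with $\nu^1_x, \nu^2_x \in C_x$, and fix $0 < t_1 < \dots < t_n \le T$.
Claim: for $\mu_0$-a.e. $x$, for $\nu^i_x$-a.e. $\gamma$ ($i = 1, 2$),
\[
\nu^{i,\tilde\gamma}_{x,F_{t_n}} \in C_{\tilde\gamma(t_n),t_n} \quad \text{for } \nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}\text{-a.e. } \tilde\gamma,
\]
where $\nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}} := (\nu^i_x)^\gamma_{M^{t_1,\dots,t_n}}$.
This claim follows observing that, by assumption (iii), for $\mu_0$-a.e. $x$ there exists a subset
$\Gamma_x \subset \Gamma_T$ such that $\nu^i_x(\Gamma_x) = 1$ and $\nu^{i,\gamma}_{x,F_{t_n}} \in C_{\gamma(t_n),t_n}$ for any $\gamma \in \Gamma_x$. Thus, by (5.3.1)
applied with $\nu := \nu^i_x$, $A := \Gamma_T$, $B := \Gamma_x$, and with $M^{t_1,\dots,t_n}$ in place of $F_{t_n}$, one obtains
\[
0 = \nu^i_x(\Gamma^c_x) = \int_{\Gamma_T} \nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}(\Gamma^c_x)\,d\nu^i_x(\gamma),
\]
that is,
\[
\text{for } \nu^i_x\text{-a.e. } \gamma, \quad \nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}(\Gamma_x) = 1.
\]
This, together with assumption (iii), implies the claim.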

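By (5.3.3), $\nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}$ is concentrated on the set $\{\tilde\gamma \mid \tilde\gamma(t_n) = \gamma(t_n)\}$, and so, by the
claim above, we get
\[
\nu^{i,\tilde\gamma}_{x,F_{t_n}} \in C_{\gamma(t_n),t_n} \quad \text{for } \nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}\text{-a.e. } \tilde\gamma.
\]
Let $A \subset \mathbb{R}^d$ be such that $\mu_0(A^c) = 0$ and assumption (i) is true for any $x \in A$. By
assumption (iv), we have $\mu_{t_n}(A^c) = 0 = \int_{\mathbb{R}^d \times \Gamma_T} 1_{A^c}(\gamma(t_n))\,d\nu^i_x(\gamma)\,d\mu_0(x)$, that is
\[
\text{for } \mu_0\text{-a.e. } x, \quad \gamma(t_n) \in A \ \text{ for } \nu^i_x\text{-a.e. } \gamma. \tag{5.6.1}
\]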

Thus, for $\mu_0$-a.e. $x$, $C_{\gamma(t_n),t_n}$ is convex for $\nu^i_x$-a.e. $\gamma$, and so, by (5.3.4) applied with $\nu^i_x$, we
obtain that
\[
\text{for } \mu_0\text{-a.e. } x, \quad \nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}} \in C_{\gamma(t_n),t_n} \ \text{ for } \nu^i_x\text{-a.e. } \gamma \tag{5.6.2}
\]
(where, with the above notation, we again mean that the restriction of $\nu^{i,\gamma}_{x,M^{t_1,\dots,t_n}}$ to $\Gamma^{t_n}_T$
is a martingale solution starting from $\gamma(t_n)$ at time $t_n$). We now want to prove that, for
all $n \ge 1$, $0 < t_1 < \dots < t_n \le T$, we have that, for $\mu_0$-a.e. $x$,
\[
\int_{\Gamma_T} f_1(e_{t_1}(\gamma)) \cdots f_n(e_{t_n}(\gamma))\,d\nu^1_x(\gamma) = \int_{\Gamma_T} f_1(e_{t_1}(\gamma)) \cdots f_n(e_{t_n}(\gamma))\,d\nu^2_x(\gamma) \tag{5.6.3}
\]
for any $f_i \in C_c(\mathbb{R}^d)$. We observe that (5.6.3) is true for $n = 1$ by assumption (ii). We
want to prove it for any $n$ by induction. Let us assume (5.6.3) true for $n - 1$, and let us
prove it for $n$.
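We want to show that
\[
\int_{\Gamma_T} f_1(e_{t_1}(\gamma)) \cdots f_n(e_{t_n}(\gamma))\,d\nu^1_x(\gamma) = \int_{\Gamma_T} f_1(e_{t_1}(\gamma)) \cdots f_n(e_{t_n}(\gamma))\,d\nu^2_x(\gamma),
\]
which can be written also as
\[
E^{\nu^1_x}[f_1(e_{t_1}) \cdots f_n(e_{t_n})] = E^{\nu^2_x}[f_1(e_{t_1}) \cdots f_n(e_{t_n})],
\]
where $E^\nu := \int_{\Gamma_T} d\nu$. Now we observe that, for $i = 1, 2$,
\[
\begin{aligned}
E^{\nu^i_x}[f_1(e_{t_1}) \cdots f_n(e_{t_n})] &= E^{\nu^i_x}\Big[ E^{\nu^i_x}[f_1(e_{t_1}) \cdots f_n(e_{t_n}) \mid M^{t_1,\dots,t_{n-1}}] \Big] \\
&= E^{\nu^i_x}\Big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, E^{\nu^i_x}[f_n(e_{t_n}) \mid M^{t_1,\dots,t_{n-1}}] \Big] \\
&= E^{\nu^i_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^i_x(e_{t_1}, \dots, e_{t_{n-1}}) \big],
\end{aligned}
\]
where $\psi^i_x(e_{t_1}, \dots, e_{t_{n-1}}) := E^{\nu^i_x}[f_n(e_{t_n}) \mid M^{t_1,\dots,t_{n-1}}]$. Let $\phi \in C_c(\mathbb{R}^d)$, and let us prove
that
\[
\begin{aligned}
&\int_{\mathbb{R}^d} E^{\nu^1_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^1_x(e_{t_1}, \dots, e_{t_{n-1}}) \big] \phi(x)\,d\mu_0(x) \\
&\qquad = \int_{\mathbb{R}^d} E^{\nu^2_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^2_x(e_{t_1}, \dots, e_{t_{n-1}}) \big] \phi(x)\,d\mu_0(x).
\end{aligned} \tag{5.6.4}
\]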
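Let $B \subset \mathbb{R}^d$ be such that $\mu_0(B^c) = 0$ and assumption (ii) is true for any $x \in B$. By
assumption (iv), we also have $\mu_{t_{n-1}}(B^c) = 0 = \int_{\mathbb{R}^d \times \Gamma_T} 1_{B^c}(e_{t_{n-1}}(\gamma))\,d\nu^i_x(\gamma)\,d\mu_0(x)$, that
is
\[
\text{for } \mu_0\text{-a.e. } x, \quad \gamma(t_{n-1}) \in B \ \text{ for } \nu^i_x\text{-a.e. } \gamma. \tag{5.6.5}
\]
Let us consider $\nu^{i,\gamma}_{x,M^{t_1,\dots,t_{n-1}}}$. By (5.6.2),
\[
\text{for } \mu_0\text{-a.e. } x, \quad \nu^{i,\gamma}_{x,M^{t_1,\dots,t_{n-1}}} \in C_{\gamma(t_{n-1}),t_{n-1}} \ \text{ for } \nu^i_x\text{-a.e. } \gamma,
\]
and, combining this with (5.6.5), we obtain
\[
\text{for } \mu_0\text{-a.e. } x, \quad \nu^{i,\gamma}_{x,M^{t_1,\dots,t_{n-1}}} \in C_{\gamma(t_{n-1}),t_{n-1}} \ \text{ and } \ \gamma(t_{n-1}) \in B \ \text{ for } \nu^i_x\text{-a.e. } \gamma.
\]
By assumption (ii) applied with $t = t_n$, this implies that
\[
\text{for } \mu_0\text{-a.e. } x, \quad (e_{t_n})_\# \nu^{1,\gamma}_{x,M^{t_1,\dots,t_{n-1}}} = (e_{t_n})_\# \nu^{2,\gamma}_{x,M^{t_1,\dots,t_{n-1}}} \ \text{ for } \nu^i_x\text{-a.e. } \gamma,
\]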
which gives us that
\[
\text{for } \mu_0\text{-a.e. } x, \quad \psi^1_x(e_{t_1}, \dots, e_{t_{n-1}}) = \psi^2_x(e_{t_1}, \dots, e_{t_{n-1}}) \ \text{ for } \nu^i_x\text{-a.e. } \gamma. \tag{5.6.6}
\]
Thus we get
\[
\begin{aligned}
&\int_{\mathbb{R}^d} E^{\nu^1_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^1_x(e_{t_1}, \dots, e_{t_{n-1}}) \big] \phi(x)\,d\mu_0(x) \\
&\qquad = \int_{\mathbb{R}^d} E^{\nu^2_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^1_x(e_{t_1}, \dots, e_{t_{n-1}}) \big] \phi(x)\,d\mu_0(x) \\
&\qquad \overset{(5.6.6)}{=} \int_{\mathbb{R}^d} E^{\nu^2_x}\big[ f_1(e_{t_1}) \cdots f_{n-1}(e_{t_{n-1}})\, \psi^2_x(e_{t_1}, \dots, e_{t_{n-1}}) \big] \phi(x)\,d\mu_0(x),
\end{aligned}
\]
where the first equality in the above equation follows by the inductive hypothesis. Now,
by (5.6.4) and the arbitrariness of $\phi$ and of $f_j$, with $j = 1, \dots, n$, we obtain that, for all
$n \ge 1$ and all $0 < t_1 < \dots < t_n \le T$, we have
\[
\text{for } \mu_0\text{-a.e. } x, \quad (e_{t_1}, \dots, e_{t_n})_\# \nu^1_x = (e_{t_1}, \dots, e_{t_n})_\# \nu^2_x.
\]
Considering only rational times, we get that there exists a subset $D \subset \mathbb{R}^d$, with $\mu_0(D^c) = 0$, such that, for any $x \in D$,
\[
(e_{t_1}, \dots, e_{t_n})_\# \nu^1_x = (e_{t_1}, \dots, e_{t_n})_\# \nu^2_x \quad \text{for any } t_1, \dots, t_n \in [0,T] \cap \mathbb{Q}.
\]
By continuity, this implies that, for any $x \in D$, $\nu^1_x = \nu^2_x$, as wanted. □
The above result applies, for example, in the case when $C_{x,s}$ denotes the set of all
martingale solutions starting from $x$. In particular, we remark that, by the above proof,
one obtains the well-known fact that, if $\nu_x$ is a martingale solution starting from $x$ (at
time 0), then, for any $0 \le t_1 \le \dots \le t_n \le T$, $\nu^\gamma_{x,M^{t_1,\dots,t_n}}$ is a martingale solution starting
from $\gamma(t_n)$ at time $t_n$. More generally, since martingale solutions are closed under convex
combinations, if $\mu$ is a probability measure on $\mathbb{R}^d$, the average $\int_{\mathbb{R}^d} \nu^\gamma_{x,M^{t_1,\dots,t_n}}\,d\mu(x)$ is a
martingale solution starting from $\gamma(t_n)$ at time $t_n$.

Observe that assumption (iv) in the above proposition was necessary only to deduce, from
a $\mu_0$-a.e. assumption, a $\mu_t$-a.e. property. Thus, the above proof gives us the following
result:
Proposition 5.6.2. For any $(s, x) \in [0,T] \times \mathbb{R}^d$, let $C_{x,s}$ be a convex subset of martingale
solutions of the SDE starting from $x$ at time $s$, and let us make the following assumption:
there exists a measure $\mu_0 \in M_+(\mathbb{R}^d)$ such that:
(i) $\forall t \in [0,T]$, for $\mu_0$-a.e. $x$,
\[
(e_t)_\# \nu^1_x = (e_t)_\# \nu^2_x \quad \forall \nu^1_x, \nu^2_x \in C_x := C_{x,0}.
\]
If (i) holds, we can define $\mu_t := (e_t)_\# \int_{\mathbb{R}^d} \nu_x\,d\mu_0(x)$ for a measurable selection $\{\nu_x\}_{x \in \mathbb{R}^d}$
with $\nu_x \in C_x$, and this definition does not depend on the choice of $\nu_x \in C_x$. We now
assume that:
(i') $\forall s \in [0,T]$, $\forall t \in [s,T]$, for $\mu_s$-a.e. $x$,
\[
(e_t)_\# \nu^1_{x,s} = (e_t)_\# \nu^2_{x,s} \quad \forall \nu^1_{x,s}, \nu^2_{x,s} \in C_{x,s};
\]
(ii) $\forall s \in [0,T]$, $C_{x,s}$ is convex for $\mu_s$-a.e. $x$;
(iii) for $\mu_0$-a.e. $x$, for any $\nu_x \in C_x$, for $\nu_x$-a.e. $\gamma$,
\[
\forall t \in [0,T], \quad \nu^{i,\gamma}_{x,F_t} := (\nu^i_x)^\gamma_{F_t} \in C_{\gamma(t),t},
\]
where, with the above notation, we mean that the restriction of $\nu^{i,\gamma}_{x,F_t}$ to $\Gamma^t_T$ is a
martingale solution starting from $\gamma(t)$ at time $t$.
Then, given two measurable families of probability measures $\{\nu^1_x\}_{x \in \mathbb{R}^d}$ and $\{\nu^2_x\}_{x \in \mathbb{R}^d}$ with
$\nu^1_x, \nu^2_x \in C_x$, $\nu^1_x = \nu^2_x$ for $\mu_0$-a.e. $x$. In particular, by standard measurable selection
theorems (see for instance [125, Chapter 12]), $C_x$ is a singleton for $\mu_0$-a.e. $x$.
Chapter 6
Appendix
6.1 Semi-concave functions

We give the definition of semi-concave function and we recall their main properties. The
main reference on semi-concave functions is the book [41].
We first recall the definition of a modulus (of continuity).
Definition 6.1.1 (Modulus). A modulus ω is a continuous non-decreasing function

ω : [0, +∞) → [0, +∞) such that ω(0) = 0.
We will say that a modulus is linear if it is of the form ω(t) = kt, where k ≥ 0 is
some fixed constant.
We will need the notion of superdifferential. We define it in an intrinsic way on a

manifold.
Definition 6.1.2 (Superdifferential). Let $f : M \to \mathbb{R}$ be a function. We say that
$p \in T^*_x M$ is a superdifferential of $f$ at $x \in M$, and we write $p \in D^+ f(x)$, if there exists a
function $g : U \to \mathbb{R}$, defined on some open subset $U \subset M$ containing $x$, such that $g \ge f$,
$g(x) = f(x)$, and $g$ is differentiable at $x$ with $d_x g = p$.
We now give the definition of a semi-concave function on an open subset of a Euclidean

space.
Definition 6.1.3 (Semi-concavity). Let $U \subset \mathbb{R}^n$ be open. A function $f : U \to \mathbb{R}$ is
said to be semi-concave in $U$ with modulus $\omega$ (equivalently $\omega$-semi-concave) if, for each
$x \in U$, we have
\[
f(y) - f(x) \le \langle l_x, y - x \rangle + \|y - x\|\, \omega(\|y - x\|) \quad \forall y \in U
\]
for a certain linear form $l_x : \mathbb{R}^n \to \mathbb{R}$.

Note that necessarily lx ∈ D+ f (x). Moreover we say that f : U → R is locally semi-
concave if, for each x ∈ U , there exists an open neighborhood of x in which f is semi-
concave for a certain modulus.
We will say that the function f : U → R is locally semi-concave with a linear modulus
if, for each x ∈ U , we can find an open neighborhood Vx such that the restriction f |Vx is
ω-semi-concave, with ω a linear modulus.
Proposition 6.1.4. 1) Suppose $f_i : U \to \mathbb{R}$, $i = 1, \dots, k$, is $\omega_i$-semi-concave, where $U$ is
an open subset of $\mathbb{R}^n$. Then we have:
(i) for any $\alpha_1, \dots, \alpha_k \ge 0$, the function $\sum_{i=1}^k \alpha_i f_i$ is $\big( \sum_{i=1}^k \alpha_i \omega_i \big)$-semi-concave on $U$;
(ii) the function $\min_{i=1}^k f_i$ is $\big( \max_{i=1}^k \omega_i \big)$-semi-concave.
2) Any $C^1$ function is locally semi-concave.
Proof. The proof of 1)(i) is obvious. For the proof of (ii), we fix $x \in U$, and we find
$i_0 \in \{1, \dots, k\}$ such that $\min_{i=1}^k f_i(x) = f_{i_0}(x)$. Since $f_{i_0}$ is $\omega_{i_0}$-semi-concave, we can find
a linear map $l_x : \mathbb{R}^n \to \mathbb{R}$ such that
\[
\forall y \in U, \quad f_{i_0}(y) - f_{i_0}(x) \le l_x(y - x) + \|y - x\|\, \omega_{i_0}(\|y - x\|).
\]
It clearly follows that
\[
\forall y \in U, \quad \min_{i=1}^k f_i(y) - \min_{i=1}^k f_i(x) \le l_x(y - x) + \|y - x\| \max_{i=1}^k \omega_i(\|y - x\|).
\]
To prove 2), consider an open convex subset $C$ with $\bar{C}$ compact and contained in $U$.
By compactness of $\bar{C}$ and continuity of $x \mapsto d_x f$, we can find a modulus $\omega$, which is a
modulus of continuity for the map $x \mapsto d_x f$ on $C$. The Mean Value Formula in integral
form
\[
f(y) - f(x) = \int_0^1 d_{tx + (1-t)y} f(y - x)\,dt,
\]
which is valid for every $y, x \in C$, implies that
\[
\forall x, y \in C, \quad f(y) - f(x) \le d_x f(y - x) + \|y - x\|\, \omega(\|y - x\|).
\]
Therefore $f$ is $\omega$-semi-concave in the open subset $C$.
We now state and prove the first important consequences of the definition of semi-
concavity.
Lemma 6.1.5. Suppose $U$ is an open subset of $\mathbb{R}^n$. Let $f : U \to \mathbb{R}$ be an $\omega$-semi-concave
function. Then we have:
(i) for every compact subset $K \subset U$, we can find a constant $A$ such that for every
$x \in K$, and every linear form $l_x$ on $\mathbb{R}^n$ satisfying
\[
\forall y \in U, \quad f(y) - f(x) \le \langle l_x, y - x \rangle + \|y - x\|\, \omega(\|y - x\|),
\]
we have $\|l_x\| \le A$;
(ii) the function $f$ is locally Lipschitz.
Proof. From the definition, it follows that a semi-concave function is locally bounded
from above. We now show that $f$ is also locally bounded from below. Fix a (compact)
cube $C$ contained in $U$ and let $\{y_1, \dots, y_{2^n}\}$ be the vertices of the cube. Then, for each
$x \in C$, we can write $x = \sum_i \alpha_i y_i$, with $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$. By the semi-concavity of $f$ we have,
for each $i = 1, \dots, 2^n$,
\[
f(y_i) - f(x) \le \langle l_x, y_i - x \rangle + \|y_i - x\|\, \omega(\|y_i - x\|);
\]
multiplying by $\alpha_i$ and summing over $i$ (the linear terms cancel, since $\sum_i \alpha_i (y_i - x) = 0$), we get
\[
\sum_i \alpha_i f(y_i) \le f(x) + \sum_i \alpha_i \|y_i - x\|\, \omega(\|y_i - x\|) \le f(x) + B,
\]
with $B = D_C\, \omega(D_C)$, where $D_C$ is the diameter of the compact cube $C$. It follows that
\[
\forall x \in C, \quad f(x) \ge \min_i f(y_i) - B.
\]
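The barycentric step in this argument can be illustrated numerically. The sketch below uses a hypothetical example in dimension 2: the function $f(x) = \sin x_1 + \cos x_2$ on the cube $[0,3]^2$, whose Hessian is bounded above by the identity, so that $f$ is semi-concave with the linear modulus $\omega(t) = t/2$. The function, the cube and the helper names are assumptions made only for this illustration.

```python
import itertools
import math

LO, HI = 0.0, 3.0                                   # the cube C = [LO, HI]^2
VERTICES = list(itertools.product((LO, HI), repeat=2))


def f(p):
    """Hypothetical semi-concave function: its Hessian is <= the identity."""
    return math.sin(p[0]) + math.cos(p[1])


def omega(t):
    """Linear modulus associated with the Hessian bound above."""
    return 0.5 * t


def weights(p):
    """Multilinear weights alpha_i >= 0 with sum 1 and sum alpha_i y_i = p."""
    s = [(c - LO) / (HI - LO) for c in p]
    out = []
    for v in VERTICES:
        w = 1.0
        for sk, vk in zip(s, v):
            w *= sk if vk == HI else 1.0 - sk
        out.append(w)
    return out


def check(p):
    """Return (sum of weights, barycenter, sum alpha_i f(y_i), f(p) + error term)."""
    al = weights(p)
    bary = [sum(a * v[k] for a, v in zip(al, VERTICES)) for k in range(2)]
    lhs = sum(a * f(v) for a, v in zip(al, VERTICES))
    dist = [math.hypot(v[0] - p[0], v[1] - p[1]) for v in VERTICES]
    rhs = f(p) + sum(a * d * omega(d) for a, d in zip(al, dist))
    return sum(al), bary, lhs, rhs
```

For every point `p` of the cube, the third value returned by `check(p)` is bounded by the fourth, which is the weighted inequality of the proof; bounding each distance by the diameter $D_C = 3\sqrt{2}$ then gives the lower bound $f(p) \ge \min_i f(y_i) - D_C\, \omega(D_C)$.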
We now know that $f$ is locally bounded. Using this fact, it is not difficult to show (i). In
fact, suppose that the closed ball $\bar{B}(x_0, 2r)$, $r < +\infty$, is contained in $U$. For $x \in \bar{B}(x_0, r)$,
we have $x - rv \in \bar{B}(x_0, 2r) \subset U$ for each $v \in \mathbb{R}^n$ with $\|v\| = 1$, and therefore
\[
f(x - rv) - f(x) \le \langle l_x, -rv \rangle + \|{-rv}\|\, \omega(\|{-rv}\|) = -r \langle l_x, v \rangle + r\, \omega(r).
\]
Since, by the compactness of $\bar{B}(x_0, 2r)$, we already know that $\tilde{B} = \sup_{z \in \bar{B}(x_0, 2r)} |f(z)|$ is
finite, this implies
\[
\langle l_x, v \rangle \le \frac{f(x) - f(x - rv)}{r} + \omega(r) \le \frac{2\tilde{B}}{r} + \omega(r).
\]
It follows that, for $x \in \bar{B}(x_0, r)$,
\[
\|l_x\| \le \frac{2\tilde{B}}{r} + \omega(r).
\]
Since the compact set $K \subset U$ can be covered by a finite number of balls $\bar{B}(x_i, r_i)$,
$i = 1, \dots, \ell$, we obtain (i).
To prove (ii), we consider a compact subset $K \subset U$, and we apply (i) to obtain the
constant $A$. We denote by $D_K$ the (finite) diameter of the compact set $K$. For each
$x, y \in K$,
\[
\begin{aligned}
f(y) - f(x) &\le \langle l_x, y - x \rangle + \|y - x\|\, \omega(\|y - x\|) \\
&\le (\|l_x\| + \omega(D_K))\, \|y - x\| \\
&\le (A + \omega(D_K))\, \|y - x\|.
\end{aligned}
\]
Exchanging the roles of $x$ and $y$, we conclude that $f$ is Lipschitz on $K$.
Let us recall that a Lipschitz real valued function defined on an open subset of
a Euclidean space is differentiable almost everywhere (with respect to the Lebesgue
measure). Therefore by part (ii) of Lemma 6.1.5 above we obtain the following corollary:
Corollary 6.1.6. A locally semi-concave real valued function defined on an open subset
of a Euclidean space is differentiable almost everywhere with respect to the Lebesgue
measure.
In fact, in the case of semi-concave functions there is a better result which is given
in Theorem 6.1.8 below, whose proof can be found in [41, Section 4.1]. Let us first give
a definition:
Definition 6.1.7. We say that E ⊂ Rn is countably (n − 1)-Lipschitz if there exists a

countable family of compact subsets Kj ⊂ Rn such that:
1. E is contained in ∪j Kj ;
2. for each j there exists a hyperplane Hj ⊂ Rn = Hj ⊕Hj⊥ , where Hj⊥ is the Euclidean
orthogonal of Hj , such that Kj is contained in the graph of a Lipschitz function
fj : Aj → Hj⊥ defined on a compact subset Aj ⊂ Hj .
Note that in the definition above, by the graph property 2, the compact subset $K_j$
has finite $(n-1)$-dimensional Hausdorff measure. Therefore any countably $(n-1)$-Lipschitz set
is contained in a Borel (in fact $\sigma$-compact) countably $(n-1)$-Lipschitz set with $\sigma$-finite $(n-1)$-dimensional Hausdorff measure.
Theorem 6.1.8. If $\varphi : U \to \mathbb{R}$ is a semi-concave function defined on the open subset
$U$ of $\mathbb{R}^n$, then $\varphi$ is differentiable at each point in the complement of a Borel countably
$(n-1)$-Lipschitz set.
In order to extend the definition of locally semi-concave to functions defined on a

manifold, it suffices to show that this definition is stable by composition with diffeomor-
phisms.
Lemma 6.1.9. Let U, V ⊂ Rn be open subsets. Suppose that F : V → U is a C1 map.

If f : U → R is a locally semi-concave function then f ◦ F : V → R is also locally
semi-concave. Moreover, if F is of class C2 , and f : U → R is a locally semi-concave
function with a linear modulus then f ◦ F : V → R is also locally semi-concave with a
linear modulus.
Proof. Since the nature of the result is local, without loss of generality we can assume
that $f : U \to \mathbb{R}$ is semi-concave with modulus $\omega$. We now show that, for every
convex open subset $V'$ whose closure $\bar{V}'$ is compact and contained in $V$, the restriction
$f \circ F|_{V'} : V' \to \mathbb{R}$ is a semi-concave function. We set $C_{\bar{V}'} = \max_{z \in \bar{V}'} \|D_z F\|$, and we
denote by $\hat\omega_{\bar{V}'}$ a modulus of continuity for the continuous function $z \mapsto D_z F$ on the
compact subset $\bar{V}'$.
For each $x, y$ in the compact convex subset $\bar{V}' \subset V$, we have
\[
\begin{aligned}
f(F(y)) - f(F(x)) &\le \langle l_{F(x)}, F(y) - F(x) \rangle + \|F(y) - F(x)\|\, \omega(\|F(y) - F(x)\|) \\
&\le \langle l_{F(x)}, DF(x)(y - x) \rangle + \|l_{F(x)}\|\, \hat\omega_{\bar{V}'}(\|y - x\|)\, \|y - x\| \\
&\quad + C_{\bar{V}'} \|y - x\|\, \omega(C_{\bar{V}'} \|y - x\|).
\end{aligned}
\]
Since $F(\bar{V}')$ is a compact subset of $U$ we can apply part (i) of Lemma 6.1.5 to obtain
that $\tilde{C}_{\bar{V}'} = \sup_{x \in \bar{V}'} \|l_{F(x)}\|$ is finite. This implies that $f \circ F$ on $V'$ is semi-concave with the
modulus
\[
\tilde\omega(r) = \tilde{C}_{\bar{V}'}\, \hat\omega_{\bar{V}'}(r) + C_{\bar{V}'}\, \omega(C_{\bar{V}'} r).
\]
If $F$ is $C^2$, then its derivative $DF$ is locally Lipschitz on $V$, and we can assume that $\hat\omega_{\bar{V}'}$
is a linear modulus. Therefore, if $\omega$ is a linear modulus, we obtain that $\tilde\omega$ is also a linear
modulus.
Thanks to the previous lemma, we can define a locally semi-concave function (resp.
a locally semi-concave function for a linear modulus) on a manifold as a function whose
restrictions to charts are, when computed in coordinates, locally semi-concave (resp. locally
semi-concave for a linear modulus). Moreover, it suffices to check this local semi-concavity
in charts for a family of charts whose domains of definition cover the manifold.
It is not difficult to see that Theorem 6.1.8 is valid on any (second countable) manifold,
since we can cover such a manifold by the domains of definition of a countable family of
charts.
Now we want to introduce the notion of uniformly semi-concave family of functions.
Definition 6.1.10. Let fi : U → R, i ∈ I, be a family of functions defined on an

open subset U of Rn . We will say that the family (fi )i∈I is uniformly ω-semi-concave,
where ω is a modulus of continuity, if each fi is ω-semi-concave. We will say that the
family (fi )i∈I is uniformly semi-concave if there exists a modulus of continuity ω such
that the family (fi )i∈I is uniformly ω-semi-concave. We will say that the family (fi )i∈I
is uniformly semi-concave with a linear modulus, if it is uniformly ω-semi-concave, with
ω of the form t 7→ kt, where k is a fixed constant.
Theorem 6.1.11. Suppose that $f_i : U \to \mathbb{R}$, $i \in I$, is a family of functions defined on
an open subset $U$ of $\mathbb{R}^n$. Suppose that this family $(f_i)_{i \in I}$ is uniformly $\omega$-semi-concave,
where $\omega$ is a modulus of continuity. If the function
\[
f(x) = \inf_{i \in I} f_i(x)
\]
is finite everywhere on $U$, then $f : U \to \mathbb{R}$ is also $\omega$-semi-concave.

Proof. Fix $x_0 \in U$. We can find a sequence $i_n$ such that $f_{i_n}(x_0) \searrow f(x_0) > -\infty$. We
choose a cube $C \subset U$ with center $x_0$. Call $y_1, \dots, y_{2^n}$ the vertices of $C$. By the argument
in the beginning of the proof of Lemma 6.1.5, we have
\[
\forall x \in C,\ \forall i \in I, \quad \min_{1 \le j \le 2^n} f_i(y_j) \le f_i(x) + D_C\, \omega(D_C),
\]
where $D_C$ is the diameter of the compact cube $C$. Using the fact that $f(y_j) = \inf_{i \in I} f_i(y_j)$
is finite, it follows that there exists $A \in \mathbb{R}$ such that
\[
\forall x \in C,\ \forall i \in I, \quad f_i(x) \ge A.
\]
Choose now $\varepsilon > 0$ such that $\bar{B}(x_0, \varepsilon) \subset C$. If $l_i : \mathbb{R}^n \to \mathbb{R}$ is a linear form such that
\[
\forall y \in U, \quad f_i(y) \le f_i(x_0) + \langle l_i, y - x_0 \rangle + \|y - x_0\|\, \omega(\|y - x_0\|),
\]
we obtain that, for every $v \in \mathbb{R}^n$ of norm 1,
\[
A \le f_i(x_0) + \langle l_i, \varepsilon v \rangle + \varepsilon\, \omega(\varepsilon).
\]
Since $f_{i_n}(x_0) \searrow f(x_0)$, we can assume $f_{i_n}(x_0) \le M < +\infty$ for all $n$, which implies
\[
\|l_{i_n}\| \le \frac{M - A}{\varepsilon} + \omega(\varepsilon) < +\infty.
\]
Up to extracting a subsequence, we can assume $l_{i_n} \to l$ in $(\mathbb{R}^n)^*$, the dual space of $\mathbb{R}^n$.
Then, as for every $y \in U$ we have $f(y) \le f_{i_n}(y)$, passing to the limit in $n$ in the inequality
\[
f(y) \le f_{i_n}(x_0) + \langle l_{i_n}, y - x_0 \rangle + \|y - x_0\|\, \omega(\|y - x_0\|),
\]
we get
\[
f(y) \le f(x_0) + \langle l, y - x_0 \rangle + \|y - x_0\|\, \omega(\|y - x_0\|).
\]
Since $x_0 \in U$ is arbitrary, this concludes the proof.
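A simple one-dimensional instance may help to see the theorem at work. As a hypothetical example (not taken from the text), consider the family $f_c(x) = (x - c)^2$, $c \in \mathbb{Z}$: each $f_c$ satisfies $f_c(y) - f_c(x) = 2(x - c)(y - x) + |y - x|^2$, so the family is uniformly semi-concave with the linear modulus $\omega(t) = t$, and its infimum is the squared distance to the integers. The sketch below checks the semi-concavity inequality for the infimum, using the slope of a minimizing member of the family.

```python
import math


def f_c(x, c):
    """Member of the hypothetical family: f_c(x) = (x - c)^2."""
    return (x - c) ** 2


def nearest_integer(x):
    """An index c in Z at which inf_c f_c(x) is attained."""
    return math.floor(x + 0.5)


def f(x):
    """The infimum over c in Z of f_c(x), i.e. the squared distance to Z."""
    return f_c(x, nearest_integer(x))


def slope(x):
    """Linear form l_x: the derivative at x of a minimizing member f_c."""
    return 2.0 * (x - nearest_integer(x))


def omega(t):
    """Common linear modulus of the family."""
    return t


def gap(x, y):
    """f(y) - f(x) - l_x (y - x) - |y - x| omega(|y - x|); should be <= 0."""
    return f(y) - f(x) - slope(x) * (y - x) - abs(y - x) * omega(abs(y - x))
```

In this example the limiting linear form $l$ of the proof is simply the slope of the nearest parabola, and `gap(x, y)` is non-positive for all `x`, `y` even though `f` itself is not differentiable at the half-integers.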
Before generalizing the notion of uniformly semi-concave family of functions to man-

ifolds, let us look at the following example.
Example 6.1.12. For $k \in \mathbb{R}$, define $f_k : \mathbb{R} \to \mathbb{R}$ as $f_k(x) = kx$. It is clear that the
family $(f_k)_{k \in \mathbb{R}}$ is $\omega$-semi-concave for every modulus of continuity $\omega$. In fact
\[
f_k(y) - f_k(x) = k(y - x) \le k(y - x) + |y - x|\, \omega(|y - x|),
\]
since $\omega \ge 0$. Consider now the diffeomorphism $\varphi : \mathbb{R}^*_+ \to \mathbb{R}^*_+$, $\varphi(x) = x^2$. Then there
does not exist a non-empty open subset $U \subset \mathbb{R}^*_+$, and a modulus of continuity $\omega$, such
that the family $(f_k \circ \varphi|_U)_{k \in \mathbb{R}}$ is (uniformly) $\omega$-semi-concave. Suppose in fact, by contradiction,
that
\[
f_k \circ \varphi(y) - f_k \circ \varphi(x) \le l_x(y - x) + |y - x|\, \omega(|y - x|),
\]
where $l_x$ depends on $k$ but $\omega$ does not. Since $f_k \circ \varphi$ is differentiable we must have $l_x(y - x) = (f_k \circ \varphi)'(x)(y - x) = 2kx(y - x)$. Therefore we should have
\[
k y^2 - k x^2 \le 2kx(y - x) + |y - x|\, \omega(|y - x|).
\]
Fix $x, y \in U$, with $y \ne x$ and set $h = y - x$. Then
\[
k h^2 \le |h|\, \omega(|h|) \ \Rightarrow\ k \le \frac{\omega(|h|)}{|h|} \quad \forall k,
\]
which is obviously absurd.
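The obstruction in this example can be made concrete with a few lines of arithmetic. The sketch below (the function name is hypothetical) computes, for given $k$, $x$ and $y$, the smallest value that $\omega(|y - x|)$ would have to take for the inequality to hold; it equals $k|y - x|$ and is therefore unbounded in $k$.

```python
def required_modulus(k, x, y):
    """Smallest admissible value of omega(|y - x|) for f_k o phi at the pair (x, y).

    Here f_k(s) = k s and phi(s) = s^2, so the linear part at x is 2 k x (y - x).
    """
    h = y - x
    increment = k * y * y - k * x * x      # f_k(phi(y)) - f_k(phi(x))
    linear = 2.0 * k * x * h               # (f_k o phi)'(x) (y - x)
    return (increment - linear) / abs(h)   # equals k |h|
```

With `x = 1.0` and `y = 1.5`, the values `required_modulus(k, x, y)` for `k = 1, 10, 100` are `0.5`, `5.0` and `50.0`: no single modulus $\omega$ can dominate all of them at the fixed distance $|h| = 0.5$.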
Therefore the following is the only reasonable definition for the notion of a uniformly
locally semi-concave family of functions on a manifold.
Definition 6.1.13. We will say that the family of functions $f_i : M \to \mathbb{R}$, $i \in I$, defined
on the manifold $M$, is uniformly locally semi-concave (resp. with a linear modulus), if
we can find a cover $(U_j)_{j \in J}$ of $M$ by open subsets, with each $U_j$ domain of a chart
$\varphi_j : U_j \xrightarrow{\ \sim\ } V_j \subset \mathbb{R}^n$ (where $n$ is the dimension of $M$), such that for every $j \in J$ the
family of functions $(f_i \circ \varphi_j^{-1})_{i \in I}$ is a uniformly semi-concave family of functions on the
open subset $V_j$ of $\mathbb{R}^n$ (resp. with a linear modulus).
The following corollary is an obvious consequence of Theorem 6.1.11.
Corollary 6.1.14. If the family $f_i : M \to \mathbb{R}$, $i \in I$, is uniformly locally semi-concave
(resp. with a linear modulus) and the function
\[
f(x) = \inf_{i \in I} f_i(x)
\]
is finite everywhere, then $f : M \to \mathbb{R}$ is locally semi-concave (resp. with a linear modulus).
Definition 6.1.15. Suppose c : M × N → R is a function defined on the product of

the manifold M by the topological space N . We will say that the family of functions
(c(·, y))y∈N is locally uniformly locally semi-concave (resp. with a linear modulus), if for
each y0 ∈ N we can find a neighborhood V0 of y0 in N such that the family (c(·, y))y∈V0
is uniformly locally semi-concave on M (resp. with a linear modulus).
Proposition 6.1.16. Suppose $c : M \times N \to \mathbb{R}$ is a function defined on the product of the
manifold $M$ by the topological space $N$, such that the family of functions $(c(\cdot, y))_{y \in N}$
is locally uniformly locally semi-concave (resp. with a linear modulus). If $K \subset N$ is
compact, and the function
\[
f_K(x) = \inf_{y \in K} c(x, y)
\]
is finite everywhere on $M$, then $f_K : M \to \mathbb{R}$ is locally semi-concave on $M$ (resp. with a
linear modulus).
Proof. By compactness of $K$, we can find a finite family $V_i$, $i = 1, \dots, \ell$, of open subsets
of $N$ such that $K \subset \cup_{i=1}^\ell V_i$, and for every $i = 1, \dots, \ell$, the family $(c(\cdot, y))_{y \in V_i}$ is
uniformly locally semi-concave (resp. with a linear modulus). The function
\[
f_i(x) = \inf_{y \in K \cap V_i} c(x, y)
\]
is finite everywhere on $M$, because $f_i \ge f_K$. It follows from Corollary 6.1.14 that $f_i$
is locally semi-concave on $M$ (resp. with a linear modulus), for $i = 1, \dots, \ell$. Since
$f_K = \min_{i=1}^\ell f_i$, we can apply part (ii) of Proposition 6.1.4 to conclude that $f_K$ has the
same property.
Proposition 6.1.17. If $c : M \times N \to \mathbb{R}$ is a locally semi-concave function (resp. with a
linear modulus) on the product of the manifolds $M$ and $N$, then the family of functions
on $M$ $(c(\cdot, y))_{y \in N}$ is locally uniformly locally semi-concave (resp. with a linear modulus).
Proof. We can cover $M \times N$ by a family $(U_i \times W_j)_{i \in I, j \in J}$ of open sets with $U_i$ open in
$M$, $W_j$ open in $N$, where $U_i$ is the domain of a chart $\varphi_i : U_i \xrightarrow{\ \sim\ } \tilde{U}_i \subset \mathbb{R}^n$ (where $n$ is
the dimension of $M$), and $W_j$ is the domain of a chart $\psi_j : W_j \xrightarrow{\ \sim\ } \tilde{W}_j \subset \mathbb{R}^m$ (where $m$
is the dimension of $N$), and such that
\[
(\tilde{x}, \tilde{y}) \mapsto c\big( \varphi_i^{-1}(\tilde{x}), \psi_j^{-1}(\tilde{y}) \big)
\]
is $\omega_{i,j}$-semi-concave on $\tilde{U}_i \times \tilde{W}_j$, for some modulus $\omega_{i,j}$. It is then clear that the family
\[
\big( c( \varphi_i^{-1}(\tilde{x}), \psi_j^{-1}(\tilde{y}) ) \big)_{\tilde{y} \in \tilde{W}_j}
\]
is uniformly locally $\omega_{i,j}$-semi-concave on $\tilde{U}_i$.

The following corollary is now an obvious consequence of Propositions 6.1.17 and 6.1.16.
Corollary 6.1.18. Suppose c : M × N → R is a locally semi-concave function (resp. with a linear modulus) on the product of the manifolds M and N . Let K be a compact subset of N . If the function
\[ f_K(x) = \inf_{y \in K} c(x, y) \]
is finite everywhere on U , then fK : U → R is locally semi-concave (resp. with a linear modulus).
We end this section with another useful theorem. The proof we give is an adaptation
of the proof of [64, Lemma 3.8, page 494].
Theorem 6.1.19. Let $\varphi_1, \varphi_2 : M \to \mathbb{R}$ be two functions, with $\varphi_1$ locally semi-convex (i.e. $-\varphi_1$ locally semi-concave), and $\varphi_2$ locally semi-concave. Assume that $\varphi_1 \le \varphi_2$. If we define $E = \{x \in M \mid \varphi_1(x) = \varphi_2(x)\}$, then both $\varphi_1$ and $\varphi_2$ are differentiable at each $x \in E$ with $d_x\varphi_1 = d_x\varphi_2$ at such a point. Moreover, the map $x \mapsto d_x\varphi_1 = d_x\varphi_2$ is continuous on E.
If $\varphi_1$ is locally semi-convex and $\varphi_2$ is locally semi-concave, both with a linear modulus, then, in fact, the map $x \mapsto d_x\varphi_1 = d_x\varphi_2$ is locally Lipschitz on E.
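Proof. Since the statement is local in nature, we will assume that $M = \mathring{B}$ is the Euclidean unit ball of center 0 in $\mathbb{R}^n$, and that $-\varphi_1$ and $\varphi_2$ are semi-concave with (common) modulus $\omega$. Suppose now that $x \in E$. We can find two linear maps $l_{1,x}, l_{2,x} : \mathbb{R}^n \to \mathbb{R}$ such that
\[ \varphi_1(y) \ge \varphi_1(x) + l_{1,x}(y - x) - \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}), \]
\[ \varphi_2(y) \le \varphi_2(x) + l_{2,x}(y - x) + \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}). \]
Using $\varphi_1 \le \varphi_2$, and $\varphi_1(x) = \varphi_2(x)$, we obtain
\[ l_{1,x}(y - x) - \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}) \le \varphi_1(y) - \varphi_1(x) \le \varphi_2(y) - \varphi_2(x) \le l_{2,x}(y - x) + \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}). \tag{6.1.1} \]
In particular, we get
\[ l_{1,x}(y - x) - \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}) \le l_{2,x}(y - x) + \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}), \]
replacing y by x + v with $\|v\|_{\mathrm{euc}}$ small, we conclude
\[ l_{1,x}(v) - \|v\|_{\mathrm{euc}}\, \omega(\|v\|_{\mathrm{euc}}) \le l_{2,x}(v) + \|v\|_{\mathrm{euc}}\, \omega(\|v\|_{\mathrm{euc}}). \]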
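Therefore
\[ |[l_{2,x} - l_{1,x}](v)| \le 2\|v\|_{\mathrm{euc}}\, \omega(\|v\|_{\mathrm{euc}}), \]
for v small enough. Since $l_{2,x} - l_{1,x}$ is linear it must be identically 0. We set $l_x = l_{2,x} = l_{1,x}$. For $i = 1, 2$ and $y \in \mathring{B}$, we obtain from (6.1.1)
\[ |\varphi_i(y) - \varphi_i(x) - l_x(y - x)| \le \|y - x\|_{\mathrm{euc}}\, \omega(\|y - x\|_{\mathrm{euc}}). \tag{6.1.2} \]
This implies that $\varphi_i$ is differentiable at $x \in E$, with $d_x\varphi_i = l_x$. It remains to show the continuity of the derivative. Fix $r < 1$. We now find a modulus of continuity of the derivative on the ball $r\mathring{B}$. If $y_1, y_2 \in E \cap r\mathring{B}$, and $\|k\|_{\mathrm{euc}} \le 1 - r$, we can apply (6.1.2) three times to obtain
\[ \varphi_1(y_2) - \varphi_1(y_1) - d_{y_1}\varphi_1(y_2 - y_1) \le \|y_2 - y_1\|_{\mathrm{euc}}\, \omega(\|y_2 - y_1\|_{\mathrm{euc}}), \]
\[ \varphi_1(y_2 + k) - \varphi_1(y_2) - d_{y_2}\varphi_1(k) \le \|k\|_{\mathrm{euc}}\, \omega(\|k\|_{\mathrm{euc}}), \]
\[ -\varphi_1(y_2 + k) + \varphi_1(y_1) + d_{y_1}\varphi_1(y_2 + k - y_1) \le \|y_2 + k - y_1\|_{\mathrm{euc}}\, \omega(\|y_2 + k - y_1\|_{\mathrm{euc}}). \]
If we add the first two inequalities to the third one, we obtain
\[ [d_{y_1}\varphi_1 - d_{y_2}\varphi_1](k) \le \|y_2 - y_1\|_{\mathrm{euc}}\, \omega(\|y_2 - y_1\|_{\mathrm{euc}}) + \|k\|_{\mathrm{euc}}\, \omega(\|k\|_{\mathrm{euc}}) + [\|y_2 - y_1\|_{\mathrm{euc}} + \|k\|_{\mathrm{euc}}]\, \omega(\|y_2 - y_1\|_{\mathrm{euc}} + \|k\|_{\mathrm{euc}}), \]
which implies, exchanging k with −k, and using that the modulus $\omega$ is non-decreasing,
\[ |[d_{y_1}\varphi_1 - d_{y_2}\varphi_1](k)| \le 2[\|y_2 - y_1\|_{\mathrm{euc}} + \|k\|_{\mathrm{euc}}]\, \omega(\|y_2 - y_1\|_{\mathrm{euc}} + \|k\|_{\mathrm{euc}}). \]
Since $\|y_2 - y_1\|_{\mathrm{euc}} < 2$, we can apply the last inequality above with any k such that $\|k\|_{\mathrm{euc}} = (1 - r)\|y_2 - y_1\|_{\mathrm{euc}}/2$. If we divide this inequality by $\|k\|_{\mathrm{euc}}$, and take the sup over all k such that $\|k\|_{\mathrm{euc}} = (1 - r)\|y_2 - y_1\|_{\mathrm{euc}}/2$, we obtain
\[ \|d_{y_1}\varphi_1 - d_{y_2}\varphi_1\|_{\mathrm{euc}} \le 2\Bigl[\frac{2}{1-r} + 1\Bigr]\, \omega\Bigl(\Bigl(1 + \frac{1-r}{2}\Bigr)\|y_2 - y_1\|_{\mathrm{euc}}\Bigr). \]
It follows that a modulus of continuity of $x \mapsto d_x\varphi_1$ on $E \cap r\mathring{B}$ is given by
\[ t \mapsto \frac{6 - 2r}{1 - r}\, \omega\Bigl(\frac{3 - r}{2}\, t\Bigr). \]
This implies the continuity of the map $x \mapsto d_x\varphi_1$ on $E \cap r\mathring{B}$. It also shows that it is Lipschitz on $E \cap r\mathring{B}$ when $\omega$ is a linear modulus.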
6.2 Tonelli Lagrangians

6.2.1 Definition and background
We recall some of the basic definitions, and some of the results in the Calculus of Variations (in one variable). There are many references on the subject. In [65], one can find an introduction to the subject that is particularly suited for our purpose. Other references are [38] and the first chapters in [112]. A brief and particularly nice description of the main results is contained in [45].
Definition 6.2.1 (Lagrangian). If M is a manifold, a Lagrangian on M is a function L : T M → R. In the following we will assume that L is at least bounded below and continuous.
Definition 6.2.2 (Action). If L is a Lagrangian on M , for an absolutely continuous curve γ : [a, b] → M, a ≤ b, we can define its action $A_L(\gamma)$ by
\[ A_L(\gamma) = \int_a^b L(\gamma(s), \dot\gamma(s))\, ds. \]
Note that the integral is well defined with values in R ∪ {+∞}, because L is bounded below, and $s \mapsto L(\gamma(s), \dot\gamma(s))$ is defined a.e. and measurable. To make things simpler to write, we set $A_L(\gamma) = +\infty$ if γ is not absolutely continuous.
Definition 6.2.3 (Minimizer). If L is a Lagrangian on the manifold M , an absolutely continuous curve γ : [a, b] → M , with a ≤ b, is an L-minimizer, if $A_L(\gamma) \le A_L(\delta)$ for every absolutely continuous curve δ : [a, b] → M with the same endpoints, i.e. such that δ(a) = γ(a) and δ(b) = γ(b).
Definition 6.2.4 (Tonelli Lagrangian). We will say that L : T M → R is a weak Tonelli Lagrangian on M , if it satisfies the following hypotheses:
(a) L is $C^1$;
(b) for each x ∈ M , the map $L(x, \cdot) : T_xM \to \mathbb{R}$ is strictly convex;
(c) there exist a complete Riemannian metric g on M and a constant C > −∞ such that
\[ \forall (x, v) \in TM, \quad L(x, v) \ge \|v\|_x + C, \]
where $\|\cdot\|_x$ is the norm on $T_xM$ obtained from the Riemannian metric g;
(d) for every compact subset K ⊂ M the restriction of L to $T_KM = \bigcup_{x \in K} T_xM$ is superlinear in the fibers of T M → M : this means that for every A ≥ 0, there exists a constant C(A, K) > −∞ such that
\[ \forall (x, v) \in T_KM, \quad L(x, v) \ge A\|v\|_x + C(A, K). \]
We will say that L is a Tonelli Lagrangian, if it is a weak Tonelli Lagrangian, and satisfies the following two strengthenings of conditions (a) and (b) above:
(a’) L is $C^2$;
(b’) for every (x, v) ∈ T M , the second partial derivative $\frac{\partial^2 L}{\partial v^2}(x, v)$ is positive definite on $T_xM$.
Since above a compact subset of a manifold all Riemannian metrics are equivalent, if
condition (d) in the definition is satisfied for one particular Riemannian metric, then it
is satisfied for any other Riemannian metric.
Note that when L is a weak Tonelli Lagrangian on M , and U : M → R is a C1
function which is bounded below, then L + U , defined by (L + U )(x, v) = L(x, v) + U (x)
is a weak Tonelli Lagrangian. If moreover L is a Tonelli Lagrangian, and U is C2 and
bounded below, then L + U is a Tonelli Lagrangian. Therefore one can generate a lot of
(weak) Tonelli Lagrangians from the following simple example.
Example 6.2.5. Suppose that g is a complete smooth Riemannian metric on M , and r > 1. We define the Lagrangian $L_{r,g}$ on M by
\[ L_{r,g}(x, v) = \|v\|_x^r = g_x(v, v)^{r/2}. \]
1) $L_{2,g}$ is a Tonelli Lagrangian.
2) For any r > 1, the Lagrangian $L_{r,g}$ is $C^1$ and is a weak Tonelli Lagrangian.

In both cases, the Riemannian metric mentioned in condition (c) of Definition 6.2.4 is the same metric g.
Moreover, the vertical derivative of the Lagrangian $L_{r,g}$ is given by
\[ \frac{\partial L_{r,g}}{\partial v}(x, v) = r\|v\|_x^{r-2}\, g_x(v, \cdot). \]
Proof. Since r > 1 it is not difficult to check that $L_{r,g}$ has (in coordinates) partial derivatives everywhere with
\[ \frac{\partial L_{r,g}}{\partial x}(x, 0) = 0 \quad\text{and}\quad \frac{\partial L_{r,g}}{\partial v}(x, 0) = 0, \]
and that these partial derivatives are continuous. Therefore $L_{r,g}$ is $C^1$. A simple computation gives
\[ \frac{\partial L_{r,g}}{\partial v}(x, v) = r\|v\|_x^{r-2}\, g_x(v, \cdot). \]
We now prove conditions (c) and (d) of Definition 6.2.4 at once. In fact, if A is given, we have
\[ L_{r,g}(x, v) = \|v\|_x^r \ge A\|v\|_x - A^{r/(r-1)}, \]
as one can see by considering separately the two cases $\|v\|_x^{r-1} \ge A$ and $\|v\|_x^{r-1} \le A$. The rest of the proof is easy.
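The superlinearity estimate in the proof above can be checked numerically. The following is only a sanity check, not part of the argument: it samples the scalar inequality s^r ≥ A s − A^{r/(r−1)} with s standing for the norm of v. The function names `tonelli_gap` and `min_gap` and the sampling ranges are choices made here for illustration.

```python
import random


def tonelli_gap(norm_v: float, A: float, r: float) -> float:
    """Return ||v||^r - (A*||v|| - A^(r/(r-1))); the estimate says this is >= 0."""
    return norm_v ** r - (A * norm_v - A ** (r / (r - 1.0)))


def min_gap(samples: int = 20000, seed: int = 0) -> float:
    """Smallest gap found over random triples (r, A, ||v||) with r > 1, A >= 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(samples):
        r = rng.uniform(1.05, 6.0)   # exponent of the Lagrangian, r > 1
        A = rng.uniform(0.0, 5.0)    # slope in the superlinearity condition
        s = rng.uniform(0.0, 10.0)   # plays the role of ||v||_x
        worst = min(worst, tonelli_gap(s, A, r))
    return worst
```

Taking A = 1 in the same inequality gives condition (c) with the constant C = −1.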
The completeness of the Riemannian metric in condition (c) of Definition 6.2.4 above is crucial to guarantee that a set of the form
\[ F = \{\gamma \in C^0([a, b], M) \mid \gamma(a) \in K,\ A_L(\gamma) \le \kappa\}, \]
where K is a compact subset in M , κ is a finite constant, and a ≤ b, is compact in the $C^0$ topology. In fact, condition (c) implies that the curves in such a set F have a g-length which is bounded independently of γ. Since K is compact (assuming M connected to simplify things) this implies that there exist $x_0 \in M$ and R < +∞ such that all the curves in F are contained in the closed ball $\bar B(x_0, R) = \{y \in M \mid d(x_0, y) \le R\}$, where d is the distance associated to the Riemannian metric g. But such a ball $\bar B(x_0, R)$ is compact since g is complete (Hopf-Rinow Theorem). From there, one obtains that the set F is compact in the $C^0$ topology, see [38, Chapters 2 and 3].
The direct method in the Calculus of Variations, see [38, Theorem 3.7, page 114] or
[65] for Tonelli Lagrangians, implies:
Theorem 6.2.6. Suppose L is a weak Tonelli Lagrangian on the connected manifold M . Then for every a, b ∈ R, a < b, and every x, y ∈ M , there exists an absolutely continuous curve γ : [a, b] → M which is an L-minimizer with γ(a) = x and γ(b) = y.
In fact in [38, Theorem 3.7, page 114], the existence of absolutely continuous minimizers is valid under very general hypotheses on the Lagrangian L (the $C^1$ hypothesis on L is much stronger than necessary). We now come to the problem of regularity of minimizers which uses the $C^1$ hypothesis on L:
Theorem 6.2.7. If L is a weak Tonelli Lagrangian, then every minimizer γ : [a, b] → M is $C^1$. Moreover, on every interval $[t_0, t_1]$ contained in a domain of a chart, it satisfies the following equality written in the coordinate system
\[ \frac{\partial L}{\partial v}(\gamma(t_1), \dot\gamma(t_1)) - \frac{\partial L}{\partial v}(\gamma(t_0), \dot\gamma(t_0)) = \int_{t_0}^{t_1} \frac{\partial L}{\partial x}(\gamma(s), \dot\gamma(s))\, ds, \tag{6.2.1} \]
which is an integrated form of the Euler-Lagrange equation. This implies that $\partial L/\partial v(\gamma(t), \dot\gamma(t))$ is a $C^1$ function of t with
\[ \frac{d}{dt}\Bigl[\frac{\partial L}{\partial v}(\gamma(t), \dot\gamma(t))\Bigr] = \frac{\partial L}{\partial x}(\gamma(t), \dot\gamma(t)). \]
Moreover, if L is a $C^r$ Tonelli Lagrangian, with r ≥ 2, then any minimizer is of class $C^r$.
Proof. We will only sketch the proof. If L is a Tonelli Lagrangian, this theorem would
be a formulation of what is nowadays called Tonelli’s existence and regularity theory.
In that case its proof can be found in many places, for example [38], [45], or [65]. The
fact that the regularity of minimizers holds for C1 (or even less smooth) Lagrangians is
more recent. The fact that a minimizer is Lipschitz has been established by Clarke and
Vinter, see [44, Corollary 1, page 77, and Corollary 3.1, page 90] (again the hypothesis
L is C1 is stronger than the one required in this last work). The same fact under weaker
regularity assumptions on L has been proved in [6]. A short and elegant proof of the fact
that a minimizer for the class of absolutely continuous curves is necessarily Lipschitz has
been given by Clarke, see [46]. Once one knows that γ is Lipschitz, when L is C1 it is
possible to differentiate the action, see [38], [45], or [65], and, using an integration by
parts, one can show that γ satisfies the following integrated form of the Euler-Lagrange
equation for almost every t ∈ [t0 , t1 ], for some fixed linear form c:
\[ \frac{\partial L}{\partial v}(\gamma(t), \dot\gamma(t)) = c + \int_{t_0}^{t} \frac{\partial L}{\partial x}(\gamma(s), \dot\gamma(s))\, ds. \tag{6.2.2} \]
But the continuity of the right hand side in (6.2.2) implies that ∂L/∂v(γ(t), γ̇(t)) extends
continuously everywhere on [t0 , t1 ]. Conditions (a) and (b) on L imply that the global
Legendre transform
\[ \mathcal{L} : TM \to T^*M, \qquad (x, v) \mapsto \Bigl(x, \frac{\partial L}{\partial v}(x, v)\Bigr), \]
is continuous and injective, therefore a homeomorphism on its image by, for example,
Brouwer’s Theorem on the Invariance of Domain (see also Proposition 6.2.9 below). We
therefore conclude that γ̇(t) has a continuous extension to [t0 , t1 ]. Since γ is Lipschitz this
implies that γ is C1 . Equation (6.2.1) follows from (6.2.2), which now holds everywhere
by continuity.
In fact we will only use the cases when L is $C^2$, in which case this regularity of minimizers will follow from the “usual” Tonelli regularity theory, or when L is of the form $L(x, v) = \|v\|_x^p$, p > 1, where the norm is obtained from a $C^2$ Riemannian metric, in which case the minimizers are necessarily geodesics which are of course as smooth as the Riemannian metric, see Proposition 6.2.24 below.
To obtain further properties it is necessary to introduce the global Legendre transform.
Definition 6.2.8 (Global Legendre Transform). If L is a $C^1$ Lagrangian on the manifold M , its global Legendre transform $\mathcal{L} : TM \to T^*M$, where $T^*M$ is the cotangent bundle of M , is defined by
\[ \mathcal{L}(x, v) = \Bigl(x, \frac{\partial L}{\partial v}(x, v)\Bigr). \]
Proposition 6.2.9. If L is a weak Tonelli Lagrangian on the manifold M , then its global Legendre transform $\mathcal{L} : TM \to T^*M$ is a homeomorphism from T M onto $T^*M$.
Moreover, if L is a $C^r$ Tonelli Lagrangian with r ≥ 2, then $\mathcal{L}$ is a $C^{r-1}$ diffeomorphism.
Proof. We first prove the surjectivity of $\mathcal{L}$. Suppose $p \in T_x^*M$. By condition (d) in Definition 6.2.4, we have
\[ p(v) - L(x, v) \le p(v) - (\|p\|_x + 1)\|v\|_x - C(\|p\|_x + 1, \{x\}) \le -\|v\|_x - C(\|p\|_x + 1, \{x\}). \]
But this last quantity tends to −∞, as $\|v\|_x \to +\infty$. Therefore the continuous function $v \mapsto p(v) - L(x, v)$ achieves a maximum at some point $v_p \in T_xM$. Since this function is $C^1$, its derivative at $v_p$ must be 0. This yields $p - \partial L/\partial v(x, v_p) = 0$. Hence $(x, p) = \mathcal{L}(x, v_p)$.
To prove injectivity of $\mathcal{L}$, it suffices to show that for $v, v' \in T_xM$, with $v \ne v'$, we have $\partial L/\partial v(x, v) \ne \partial L/\partial v(x, v')$. Consider the function $\varphi : [0, 1] \to \mathbb{R}$, $t \mapsto L(x, tv + (1 - t)v')$, which by condition (b) of Definition 6.2.4 is strictly convex. Since it is $C^1$, we must have $\varphi'(0) \ne \varphi'(1)$. In fact, if that was not the case, then the non-decreasing function $\varphi'$ would be constant on [0, 1], and $\varphi$ would be affine on [0, 1]. This contradicts strict convexity. By a simple computation, we therefore get
\[ \frac{\partial L}{\partial v}(x, v')(v - v') = \varphi'(0) \ne \varphi'(1) = \frac{\partial L}{\partial v}(x, v)(v - v'). \]
This implies $\partial L/\partial v(x, v') \ne \partial L/\partial v(x, v)$. We now show that $\mathcal{L}$ is a homeomorphism. Since this map is continuous, and bijective, we have to check that it is proper, i.e. inverse images under $\mathcal{L}$ of compact subsets of $T^*M$ are (relatively) compact. For this it suffices to show that for every compact subset K ⊂ M , and every C < +∞, the set
\[ \Bigl\{(x, v) \in T_KM \Bigm| \Bigl\|\frac{\partial L}{\partial v}(x, v)\Bigr\|_x \le C\Bigr\} \]
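is compact. By convexity of $v \mapsto L(x, v)$, we obtain
\[ \frac{\partial L}{\partial v}(x, v)(v) \ge L(x, v) - L(x, 0). \]
But $\|\partial L/\partial v(x, v)\|_x \ge \partial L/\partial v(x, v)(v/\|v\|_x)$, therefore by condition (d) of Definition 6.2.4, we conclude that
\[ \forall A \ge 0,\ \forall (x, v) \in T_KM, \quad \Bigl\|\frac{\partial L}{\partial v}(x, v)\Bigr\|_x \ge A - [C(K, A)/\|v\|_x]. \]
Taking A = C + 1, we get the inclusion
\[ \Bigl\{(x, v) \in T_KM \Bigm| \Bigl\|\frac{\partial L}{\partial v}(x, v)\Bigr\|_x \le C\Bigr\} \subset \{(x, v) \in T_KM \mid \|v\|_x \le C(K, C + 1)\}, \]
and the compactness of the first set follows.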
Suppose now that L is a $C^r$ Tonelli Lagrangian with r ≥ 2. Obviously $\mathcal{L}$ is $C^{r-1}$. By the inverse function theorem, to show that it is a $C^{r-1}$ diffeomorphism, it suffices to show that the derivative is invertible at each point of T M . But a simple computation in coordinates shows that the derivative of $\mathcal{L}$ at (x, v) is given in matrix form by
\[ \begin{pmatrix} \mathrm{Id} & 0 \\ \dfrac{\partial^2 L}{\partial x \partial v}(x, v) & \dfrac{\partial^2 L}{\partial v^2}(x, v) \end{pmatrix}. \]
This is clearly invertible by (b’) of Definition 6.2.4.
Definition 6.2.10. If L is a Lagrangian on M , we define its Hamiltonian $H : T^*M \to \mathbb{R} \cup \{+\infty\}$ by
\[ H(x, p) = \sup_{v \in T_xM} p(v) - L(x, v). \]
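To make the definition concrete, here is a small numerical sketch under simplifying assumptions that are not in the text: M is the real line with its Euclidean metric, and L(x, v) = |v|^r as in Example 6.2.5. The supremum is approximated by a grid search and compared with the closed form (r − 1)(|p|/r)^{r/(r−1)}, which one obtains by solving p = r|v|^{r−2}v for the maximizer. The function names are hypothetical.

```python
def hamiltonian_numeric(p: float, r: float,
                        v_max: float = 10.0, steps: int = 200001) -> float:
    """Approximate H(p) = sup_v [p*v - |v|^r] by a grid search on [-v_max, v_max]."""
    best = float("-inf")
    for i in range(steps):
        v = -v_max + 2.0 * v_max * i / (steps - 1)
        best = max(best, p * v - abs(v) ** r)
    return best


def hamiltonian_closed_form(p: float, r: float) -> float:
    """Closed form (r - 1) * (|p| / r)^(r / (r - 1)) for L(x, v) = |v|^r on the line."""
    return (r - 1.0) * (abs(p) / r) ** (r / (r - 1.0))
```

The grid search always returns a value below the true supremum, so agreement with the closed form is expected only up to the grid resolution, and only when the maximizer lies inside [-v_max, v_max].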
Proposition 6.2.11. Let L be a weak Tonelli Lagrangian on the manifold M . Its Hamiltonian H is everywhere finite valued and satisfies the following properties:
(a*) H is $C^1$, and in coordinates
\[ \begin{cases} \dfrac{\partial H}{\partial p}(\mathcal{L}(x, v)) = v, \\[1ex] \dfrac{\partial H}{\partial x}(\mathcal{L}(x, v)) = -\dfrac{\partial L}{\partial x}(x, v); \end{cases} \]
(b*) for each x ∈ M , the map $H(x, \cdot) : T_x^*M \to \mathbb{R}$ is strictly convex;
(d*) for every compact subset K ⊂ M the restriction of H to $T_K^*M = \bigcup_{x \in K} T_x^*M$ is superlinear in the fibers of $T^*M \to M$: this means that for every A ≥ 0, there exists a finite constant $C^*(A, K)$ such that
\[ \forall (x, p) \in T_K^*M, \quad H(x, p) \ge A\|p\|_x + C^*(A, K). \]
In particular, the function H is a proper map, i.e. inverse images under H of compact subsets of R are compact.
If L is a $C^r$ Tonelli Lagrangian with r ≥ 2, then
(a’*) H is $C^r$;
(b’*) for every $(x, p) \in T^*M$, the second partial derivative $\frac{\partial^2 H}{\partial p^2}(x, p)$ is positive definite on $T_x^*M$.
Proof. To show differentiability, using a chart in M , we can assume that M = U is an open subset in $\mathbb{R}^m$. Moreover, since all Riemannian metrics are equivalent above compact subsets, replacing U by an open subset V with compact closure contained in U , we can assume that the norm used in (c) of Definition 6.2.4 is the constant standard Euclidean norm $\|\cdot\|_{\mathrm{euc}}$ on the second factor of $TV = V \times \mathbb{R}^m$, that is
\[ \forall x \in V,\ \forall v \in \mathbb{R}^m, \quad L(x, v) \ge A\|v\|_{\mathrm{euc}} + C(A), \]
where C(A) is a finite constant, and $\sup_{x \in V} L(x, 0) \le C < +\infty$.

We have $T^*V = V \times \mathbb{R}^{m*}$, where $\mathbb{R}^{m*}$ is the dual space of $\mathbb{R}^m$. We will denote by $\|\cdot\|_{\mathrm{euc}}$ also the dual norm on $\mathbb{R}^{m*}$ obtained from $\|\cdot\|_{\mathrm{euc}}$ on $\mathbb{R}^m$. We now fix R > 0. If $p \in \mathbb{R}^{m*}$ satisfies $\|p\|_{\mathrm{euc}} \le R$, we have
\[ p(v) - L(x, v) \le \|p\|_{\mathrm{euc}}\|v\|_{\mathrm{euc}} - (R + 1)\|v\|_{\mathrm{euc}} - C(R + 1) \le -\|v\|_{\mathrm{euc}} - C(R + 1). \]
Since L(x, 0) ≤ C for x ∈ V , it follows that, for $\|v\|_{\mathrm{euc}} > C - C(R + 1)$,
\[ p(v) - L(x, v) \le -C \le -L(x, 0). \]
This implies
\[ H(x, p) = \sup_{v \in \mathbb{R}^m} p(v) - L(x, v) = \sup_{\|v\|_{\mathrm{euc}} \le C - C(R+1)} p(v) - L(x, v). \]
Therefore the sup in the definition of H(x, p) is attained at a point $v_{(x,p)}$ with $\|v_{(x,p)}\|_{\mathrm{euc}} \le C - C(R + 1)$. Note that this point $v_{(x,p)}$ is unique (compare with the argument proving
that the Legendre transform is surjective). In fact, at its maximum $v_{(x,p)}$, the $C^1$ function $v \mapsto p(v) - L(x, v)$ must have 0 derivative, and therefore
\[ p = \frac{\partial L}{\partial v}(x, v_{(x,p)}). \]
This means $(x, p) = \mathcal{L}(x, v_{(x,p)})$, but the Legendre transform is injective by Proposition 6.2.9.
Note, furthermore, that the map
\[ f : \bigl(V \times \{\|p\|_{\mathrm{euc}} \le R\}\bigr) \times \{\|v\|_{\mathrm{euc}} \le C - C(R + 1)\} \to \mathbb{R}, \qquad ((x, p), v) \mapsto p(v) - L(x, v), \]
is $C^1$. Therefore we obtain that H is $C^1$ from the following classical lemma whose proof is left to the reader.
Lemma 6.2.12. Let f : N × K → R, $(x, k) \mapsto f(x, k)$ be a continuous map, where N is a manifold, and K is a compact space. Define F : N → R by $F(x) = \sup_{k \in K} f(x, k)$. Suppose that:
1. $\frac{\partial f}{\partial x}(x, k)$ exists everywhere and is continuous as a function of both variables (x, k);
2. for every x ∈ N , the set {k ∈ K | f (x, k) = F (x)} is reduced to a single point, which we will denote by $k_x$.
Then F is $C^1$, and the derivative $D_xF$ of F at x is given by
\[ D_xF = \frac{\partial f}{\partial x}(x, k_x). \]
Returning to the proof of Proposition 6.2.11, by the last statement of the above lemma we also obtain
\[ \frac{\partial H}{\partial p}(x, p) = v_{(x,p)} \quad\text{and}\quad \frac{\partial H}{\partial x}(x, p) = -\frac{\partial L}{\partial x}(x, v_{(x,p)}). \]
Since $(x, p) = \mathcal{L}(x, v_{(x,p)})$, this can be rewritten as
\[ \frac{\partial H}{\partial p} \circ \mathcal{L}(x, v) = v \quad\text{and}\quad \frac{\partial H}{\partial x} \circ \mathcal{L}(x, v) = -\frac{\partial L}{\partial x}(x, v), \tag{6.2.3} \]
which proves (a*). Note that when L is a $C^r$ Tonelli Lagrangian, by Proposition 6.2.9 the Legendre transform $\mathcal{L}$ is a $C^{r-1}$ global diffeomorphism. From the expression of the partial derivatives above, we conclude that ∂H/∂p and ∂H/∂x are both $C^{r-1}$. This proves (a’*).
We now prove (b’*). Taking the derivative in v of the first equality in (6.2.3)
\[ \frac{\partial H}{\partial p}\Bigl[x, \frac{\partial L}{\partial v}(x, v)\Bigr] = v, \]
we obtain the matrix equation
\[ \frac{\partial^2 H}{\partial p^2}(\mathcal{L}(x, v)) \cdot \frac{\partial^2 L}{\partial v^2}(x, v) = \mathrm{Id}_{\mathbb{R}^m}, \]
where the dot · represents the usual product of matrices. This means that the matrix representative of $\partial^2 H/\partial p^2(x, p)$ is the inverse of the matrix of a positive definite quadratic form, therefore $\partial^2 H/\partial p^2(x, p)$ is itself positive definite.
We prove (b*). Suppose $p_1 \ne p_2$ are both in $T_x^*M$. Fix t ∈ (0, 1), and set $p_3 = tp_1 + (1 - t)p_2$. The covectors $p_1, p_2, p_3$ are all distinct. Call $v_1, v_2, v_3$ elements in $T_xM$ such that $p_i = \partial L/\partial v(x, v_i)$. By injectivity of the Legendre transform, the tangent vectors $v_1, v_2, v_3$ are also all distinct. Moreover, for i = 1, 2 we have
\[ H(x, p_i) = p_i(v_i) - L(x, v_i), \]
\[ H(x, p_3) = p_3(v_3) - L(x, v_3) = t[p_1(v_3) - L(x, v_3)] + (1 - t)[p_2(v_3) - L(x, v_3)]. \]
Since the sup in the definition of H(x, p) is attained at a unique point, and $v_1, v_2, v_3$ are all distinct, for i = 1, 2 we must have
\[ p_i(v_3) - L(x, v_3) < p_i(v_i) - L(x, v_i) = H(x, p_i). \]
It follows that
\[ H(x, tp_1 + (1 - t)p_2) < tH(x, p_1) + (1 - t)H(x, p_2). \]
It remains to prove (d*). Fix a compact set K in M . Since
\[ H(x, p) \ge p(v) - L(x, v), \]
we obtain
\[ H(x, p) \ge \sup_{\|v\|_x \le A} p(v) + \inf_{x \in K,\ \|v\|_x \le A} -L(x, v). \]
But $\sup_{\|v\|_x \le A} p(v) = A\|p\|_x$, and $C^*(A, K) = \inf_{x \in K,\ \|v\|_x \le A} -L(x, v)$ is finite by compactness.
Since for a weak Tonelli Lagrangian L, the Hamiltonian $H : T^*M \to \mathbb{R}$ is $C^1$, we can define the Hamiltonian vector field $X_H$ on $T^*M$. This is rather standard and uses the fact that the exterior derivative of the Liouville form on $T^*M$ defines a symplectic form on $T^*M$, see [2] or [84]. The vector field $X_H$ is entirely characterized by the fact that in coordinates obtained from a chart in M , it is given by
\[ X_H(x, p) = \Bigl(\frac{\partial H}{\partial p}(x, p),\ -\frac{\partial H}{\partial x}(x, p)\Bigr). \]
So the associated ODE is given by
\[ \begin{cases} \dot x = \dfrac{\partial H}{\partial p}(x, p), \\[1ex] \dot p = -\dfrac{\partial H}{\partial x}(x, p). \end{cases} \]
In this form, it is an easy exercise to check that H is constant on any solution of $X_H$.
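The exercise amounts to one application of the chain rule. As a sketch, assuming $t \mapsto (x(t), p(t))$ is a $C^1$ solution of the ODE written in a chart, and writing the pairing between covectors and vectors as evaluation:

```latex
\[
\frac{d}{dt} H(x(t), p(t))
  = \frac{\partial H}{\partial x}(x, p)\,(\dot x) + \dot p\,\Bigl(\frac{\partial H}{\partial p}(x, p)\Bigr)
  = \frac{\partial H}{\partial x}(x, p)\Bigl(\frac{\partial H}{\partial p}(x, p)\Bigr)
    - \frac{\partial H}{\partial x}(x, p)\Bigl(\frac{\partial H}{\partial p}(x, p)\Bigr)
  = 0 .
\]
```

Only the $C^1$ regularity of H and of the solution is used, so the computation applies to weak Tonelli Lagrangians as well.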
We now come to the simple and important connection between minimizers and solutions of $X_H$.
Theorem 6.2.13. Suppose L is a weak Tonelli Lagrangian on M . If γ : [a, b] → M is a minimizer for L, then the Legendre transform of its speed curve $t \mapsto \mathcal{L}(\gamma(t), \dot\gamma(t))$ is a $C^1$ solution of the Hamiltonian vector field $X_H$ obtained from the Hamiltonian H associated to L.
Moreover, if L is a Tonelli Lagrangian, there exists a (partial) $C^1$ flow $\phi^L_t$ on T M such that every speed curve of an L-minimizer is a part of an orbit of $\phi^L_t$. This flow, called the Euler-Lagrange flow, is defined by
\[ \phi^L_t = \mathcal{L}^{-1} \circ \phi^H_t \circ \mathcal{L}, \]
where $\phi^H_t$ is the partial flow of the $C^1$ vector field $X_H$.
Proof. If we write $(x(t), p(t)) = \mathcal{L}(\gamma(t), \dot\gamma(t))$ then
\[ x(t) = \gamma(t) \quad\text{and}\quad p(t) = \frac{\partial L}{\partial v}(\gamma(t), \dot\gamma(t)). \]
By Theorem 6.2.7, x(t) = γ(t) is $C^1$ with $\dot x(t) = \dot\gamma(t)$. The fact that p(t) is $C^1$ follows again from Theorem 6.2.7, which also yields in local coordinates
\[ \dot p(t) = \frac{\partial L}{\partial x}(\gamma(t), \dot\gamma(t)). \]
Since $(x(t), p(t)) = \mathcal{L}(\gamma(t), \dot\gamma(t))$, we conclude from Proposition 6.2.11 that $t \mapsto (x(t), p(t))$ satisfies the ODE
\[ \begin{cases} \dot x = \dfrac{\partial H}{\partial p}(x, p), \\[1ex] \dot p = -\dfrac{\partial H}{\partial x}(x, p). \end{cases} \]
Therefore the Legendre transform of the speed curve of a minimizer is a solution of the Hamiltonian vector field $X_H$.
If L is a Tonelli Lagrangian, by Proposition 6.2.11 the Hamiltonian H is $C^2$. Therefore the vector field $X_H$ is $C^1$, and it defines a (partial) $C^1$ flow $\phi^H_t$. The rest follows from what was obtained above and the fact that the Legendre transform is $C^1$.
We recall the following definition.
Definition 6.2.14 (Energy). If L is a $C^1$ Lagrangian on the manifold M , its energy E : T M → R is defined by
\[ E(x, v) = H \circ \mathcal{L}(x, v) = \frac{\partial L}{\partial v}(x, v)(v) - L(x, v). \]
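As an illustration, the energy of the Lagrangian $L_{r,g}$ of Example 6.2.5 can be computed directly from the formula for its vertical derivative given there. The following is a worked instance for $v \ne 0$ (at $v = 0$ both sides vanish), not a statement taken from the text:

```latex
\[
E(x, v) = \frac{\partial L_{r,g}}{\partial v}(x, v)(v) - L_{r,g}(x, v)
        = r\|v\|_x^{r-2}\, g_x(v, v) - \|v\|_x^{r}
        = (r - 1)\|v\|_x^{r} .
\]
```

In particular, for r = 2 the energy coincides with the Lagrangian itself, $E(x, v) = \|v\|_x^2$.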
Corollary 6.2.15 (Conservation of Energy). If L is a $C^1$ Lagrangian on the manifold M , and γ : [a, b] → M is a $C^1$ minimizer for L, then the energy E is constant on the speed curve
\[ s \mapsto (\gamma(s), \dot\gamma(s)). \]
Proof. In fact $E(\gamma(s), \dot\gamma(s)) = H \circ \mathcal{L}(\gamma(s), \dot\gamma(s))$. But $s \mapsto \mathcal{L}(\gamma(s), \dot\gamma(s))$ is a solution of the vector field $X_H$, and the Hamiltonian H is constant on orbits of $X_H$.
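Conservation of energy can be observed numerically along solutions of the Euler-Lagrange equation, which speed curves of minimizers follow. The sketch below is an illustration under assumptions made here, not in the text: M is the real line, L(x, v) = v²/2 + U(x) with U(x) = 1 − cos x (a Tonelli Lagrangian of the form "L + U" discussed after Definition 6.2.4), and the flow is approximated with a classical Runge-Kutta scheme. All function names are hypothetical, and a solution of the Euler-Lagrange equation need not be a minimizer.

```python
import math
from typing import Tuple


def accel(x: float) -> float:
    # Euler-Lagrange equation d/dt(dL/dv) = dL/dx for L = v**2/2 + (1 - cos x):
    # x'' = sin(x).
    return math.sin(x)


def energy(x: float, v: float) -> float:
    # E(x, v) = dL/dv(x, v) * v - L(x, v) = v**2/2 - (1 - cos x).
    return 0.5 * v * v - (1.0 - math.cos(x))


def rk4_step(x: float, v: float, dt: float) -> Tuple[float, float]:
    # One classical Runge-Kutta step for the system x' = v, v' = accel(x).
    k1x, k1v = v, accel(x)
    k2x, k2v = v + 0.5 * dt * k1v, accel(x + 0.5 * dt * k1x)
    k3x, k3v = v + 0.5 * dt * k2v, accel(x + 0.5 * dt * k2x)
    k4x, k4v = v + dt * k3v, accel(x + dt * k3x)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return x_new, v_new


def max_energy_drift(x0: float = 0.3, v0: float = 0.5,
                     dt: float = 1e-3, steps: int = 5000) -> float:
    # Largest deviation of the energy from its initial value along the orbit.
    e0 = energy(x0, v0)
    x, v, drift = x0, v0, 0.0
    for _ in range(steps):
        x, v = rk4_step(x, v, dt)
        drift = max(drift, abs(energy(x, v) - e0))
    return drift
```

Any residual drift comes from the discretization, not from the dynamics; it shrinks as the step size `dt` is reduced.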
Proposition 6.2.16. If L is a weak Tonelli Lagrangian on the manifold M , then for every compact subset K ⊂ M , and every C < +∞, the set
\[ \{(x, v) \in T_KM \mid E(x, v) \le C\} \]
is compact, i.e. the map E : T M → R is proper on every subset of the form $\pi^{-1}(K)$, where K is a compact subset of M .
Proof. Since $E = H \circ \mathcal{L}$, this follows from the fact that H is proper and $\mathcal{L}$ is a homeomorphism.
Proposition 6.2.17. Let L be a weak Tonelli Lagrangian on M . Suppose K is a compact subset of M , and t > 0. Then we can find a compact subset $\tilde K \subset M$ and a finite constant A, such that every minimizer γ : [0, t] → M with γ(0), γ(t) ∈ K satisfies $\gamma([0, t]) \subset \tilde K$ and $\|\dot\gamma(s)\|_{\gamma(s)} \le A$ for every s ∈ [0, t].
Proof. We will use as a distance d the one coming from the complete Riemannian metric. All finite closed balls in this distance are compact (Hopf-Rinow theorem). We choose $x_0 \in K$, and R such that $K \subset B(x_0, R)$ (we could take R = diam(K), the diameter of K). We now pick x, y ∈ K. If α : [0, t] → M is a geodesic with α(0) = x, α(t) = y and whose length is d(x, y) (such a geodesic exists by completeness), the inequality
\[ d(x, y) \le d(x, x_0) + d(x_0, y) \le 2R \]
implies that $\alpha([0, t]) \subset \bar B(x_0, 3R)$. Moreover $\|\dot\alpha(s)\|_{\alpha(s)} = d(x, y)/t \le 2R/t$ for every s ∈ [0, t]. By compactness, the Lagrangian L is bounded on the set
\[ \mathcal{K} = \{(z, v) \in TM \mid z \in \bar B(x_0, 3R),\ \|v\|_z \le 2R/t\}. \]
We call θ an upper bound of L on $\mathcal{K}$. Obviously the action of α on [0, t] is less than tθ, and therefore if γ : [0, t] → M is a minimizer with γ(0), γ(t) ∈ K, we get $\int_0^t L(\gamma(s), \dot\gamma(s))\, ds \le t\theta$. Using condition (c) on the Lagrangian L and what we obtained above, we see that
\[ Ct + \int_0^t \|\dot\gamma(s)\|_{\gamma(s)}\, ds \le t\theta. \]
It follows that we can find $s_0 \in [0, t]$ such that
\[ \|\dot\gamma(s_0)\|_{\gamma(s_0)} \le \theta - C. \]
Moreover
\[ \gamma([0, t]) \subset \bar B(\gamma(0), t(\theta - C)) \subset \bar B(x_0, R + t(\theta - C)). \]
We set $\tilde K = \bar B(x_0, R + t(\theta - C))$. If we define
\[ \theta_1 = \sup\{E(z, v) \mid (z, v) \in TM,\ z \in \tilde K,\ \|v\|_z \le \theta - C\}, \]
we see that $\theta_1$ is finite by compactness. Moreover $E(\gamma(s_0), \dot\gamma(s_0)) \le \theta_1$. But, as mentioned earlier, the energy $E(\gamma(s), \dot\gamma(s))$ is constant on the curve. This implies that the speed curve
\[ s \mapsto (\gamma(s), \dot\gamma(s)) \]
is contained in the compact set
\[ \tilde{\mathcal{K}} = \{(z, v) \in TM \mid z \in \tilde K,\ E(z, v) \le \theta_1\}. \]
Observing that the set $\tilde{\mathcal{K}}$ does not depend on γ, this finishes the proof.
6.2.2 Lagrangian costs and semi-concavity

Definition 6.2.18 (Costs for a Lagrangian). Suppose L : T M → R is a Lagrangian on the connected manifold M , which is bounded from below. For t > 0, we define the cost $c_{t,L} : M \times M \to \mathbb{R}$ by
\[ c_{t,L}(x, y) = \inf_{\gamma(0) = x,\ \gamma(t) = y} A_L(\gamma), \]
where the infimum is taken over all the absolutely continuous curves γ : [0, t] → M , with γ(0) = x, and γ(t) = y, and $A_L(\gamma)$ is the action $\int_0^t L(\gamma(s), \dot\gamma(s))\, ds$ of γ.
Using a change of variable in the integral defining the action, it is not difficult to see that $c_{t,L} = c_{1,L_t}$ where the Lagrangian $L_t$ on M is defined by $L_t(x, v) = tL(x, t^{-1}v)$. Observe that $L_t$ is a (weak) Tonelli Lagrangian if L is.
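The change of variable can be written out as follows. This is a sketch, assuming γ : [0, t] → M is absolutely continuous and introducing, for illustration only, the reparametrized curve $\tilde\gamma(s) = \gamma(ts)$ on [0, 1], so that $\dot{\tilde\gamma}(s) = t\dot\gamma(ts)$:

```latex
\[
A_{L_t}(\tilde\gamma)
  = \int_0^1 t\, L\bigl(\tilde\gamma(s),\, t^{-1}\dot{\tilde\gamma}(s)\bigr)\, ds
  = \int_0^1 t\, L\bigl(\gamma(ts),\, \dot\gamma(ts)\bigr)\, ds
  = \int_0^t L\bigl(\gamma(u),\, \dot\gamma(u)\bigr)\, du
  = A_L(\gamma),
\]
```

with the substitution u = ts in the third equality. Since $\gamma \mapsto \tilde\gamma$ is a bijection between absolutely continuous curves on [0, t] and on [0, 1] with the same endpoints, taking the infimum on both sides gives the identity between the two costs.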
Theorem 6.2.19. Suppose that L : T M → R is a weak Tonelli Lagrangian. Then, for every t > 0, the cost $c_{t,L}$ is locally semi-concave on M × M . Moreover, if the derivative of L is locally Lipschitz, then $c_{t,L}$ is locally semi-concave with a linear modulus.
In particular, if L is a Tonelli Lagrangian, then for every t > 0 the cost $c_{t,L}$ is locally semi-concave on M × M with a linear modulus.
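Proof. By the remark preceding the statement of the theorem, it suffices to prove this for $c = c_{1,L}$. Let n be the dimension of M . Choose two charts $\varphi_i : U_i \xrightarrow{\sim} \mathbb{R}^n$, i = 0, 1, on M . We will show that
\[ (\tilde x_0, \tilde x_1) \mapsto c\bigl(\varphi_0^{-1}(\tilde x_0), \varphi_1^{-1}(\tilde x_1)\bigr) \]
is semi-concave on $\mathring{B} \times \mathring{B}$, where B is the closed Euclidean unit ball of center 0 in $\mathbb{R}^n$. By Proposition 6.2.17, we can find a constant A such that for every minimizer γ : [0, 1] → M , with $\gamma(i) \in \varphi_i^{-1}(B)$, we have
\[ \forall s \in [0, 1], \quad \|\dot\gamma(s)\|_{\gamma(s)} \le A. \]
We now pick δ > 0 such that for all $z_1, z_2 \in \mathbb{R}^n$, with $\|z_1\|_{\mathrm{euc}} \le 1$, $\|z_2\|_{\mathrm{euc}} = 2$,
\[ d\bigl(\varphi_i^{-1}(z_1), \varphi_i^{-1}(z_2)\bigr) \ge \delta, \quad i = 0, 1, \]
where $\|\cdot\|_{\mathrm{euc}}$ denotes the Euclidean norm. Then we choose ε > 0 such that Aε < δ. It follows that
\[ \gamma([0, \varepsilon]) \subset \varphi_0^{-1}\bigl(2\mathring{B}\bigr) \quad\text{and}\quad \gamma([1 - \varepsilon, 1]) \subset \varphi_1^{-1}\bigl(2\mathring{B}\bigr). \]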
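We set $\tilde x_i = \varphi_i(\gamma(i))$, i = 0, 1. For $h_0, h_1 \in \mathbb{R}^n$ we can define $\tilde\gamma_{h_0} : [0, \varepsilon] \to \mathbb{R}^n$ and $\tilde\gamma_{h_1} : [1 - \varepsilon, 1] \to \mathbb{R}^n$ as
\[ \tilde\gamma_{h_0}(s) = \frac{\varepsilon - s}{\varepsilon}\, h_0 + \varphi_0(\gamma(s)), \quad 0 \le s \le \varepsilon, \]
\[ \tilde\gamma_{h_1}(s) = \frac{s - (1 - \varepsilon)}{\varepsilon}\, h_1 + \varphi_1(\gamma(s)), \quad 1 - \varepsilon \le s \le 1. \]
We observe that when $h_0 = 0$ (or $h_1 = 0$) the curve coincides with γ. Moreover $\tilde\gamma_{h_0}(0) = \tilde x_0 + h_0$, $\tilde\gamma_{h_1}(1) = \tilde x_1 + h_1$. We suppose that $\|h_i\|_{\mathrm{euc}} \le 2$. In that case the images of both $\tilde\gamma_{h_0}$ and $\tilde\gamma_{h_1}$ are contained in $4\mathring{B}$ and
\[ \|\dot{\tilde\gamma}_{h_i}(s)\|_{\mathrm{euc}} \le \frac{1}{\varepsilon}\|h_i\|_{\mathrm{euc}} + \|(\varphi_i \circ \gamma)'(s)\|_{\mathrm{euc}} \le \frac{2}{\varepsilon} + \|(\varphi_i \circ \gamma)'(s)\|_{\mathrm{euc}}. \]
Since we know that the speed of γ is bounded in M , we can find a constant $A_1$ such that
\[ \forall s \in [0, \varepsilon], \quad \|\dot{\tilde\gamma}_{h_0}(s)\|_{\mathrm{euc}} \le A_1, \]
\[ \forall s \in [1 - \varepsilon, 1], \quad \|\dot{\tilde\gamma}_{h_1}(s)\|_{\mathrm{euc}} \le A_1. \]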
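To simplify the notation a little, we define the Lagrangian $L_i : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by
\[ L_i(z, v) = L\bigl(\varphi_i^{-1}(z), D[\varphi_i^{-1}](v)\bigr). \]
If we concatenate the three curves $\varphi_0^{-1} \circ \tilde\gamma_{h_0}$, $\gamma|_{[\varepsilon, 1-\varepsilon]}$ and $\varphi_1^{-1} \circ \tilde\gamma_{h_1}$, we obtain a curve in M between $\varphi_0^{-1}(\tilde x_0 + h_0)$ and $\varphi_1^{-1}(\tilde x_1 + h_1)$, and therefore
\[ c\bigl(\varphi_0^{-1}(\tilde x_0 + h_0), \varphi_1^{-1}(\tilde x_1 + h_1)\bigr) \le \int_0^{\varepsilon} L_0(\tilde\gamma_{h_0}(t), \dot{\tilde\gamma}_{h_0}(t))\, dt + \int_{\varepsilon}^{1-\varepsilon} L(\gamma(t), \dot\gamma(t))\, dt + \int_{1-\varepsilon}^{1} L_1(\tilde\gamma_{h_1}(t), \dot{\tilde\gamma}_{h_1}(t))\, dt. \]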
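Hence
\[ c\bigl(\varphi_0^{-1}(\tilde x_0 + h_0), \varphi_1^{-1}(\tilde x_1 + h_1)\bigr) - c\bigl(\varphi_0^{-1}(\tilde x_0), \varphi_1^{-1}(\tilde x_1)\bigr) \]
\[ \le \int_0^{\varepsilon} \bigl[L_0(\tilde\gamma_{h_0}(t), \dot{\tilde\gamma}_{h_0}(t)) - L_0(\varphi_0 \circ \gamma(t), (\varphi_0 \circ \gamma)'(t))\bigr]\, dt \]
\[ + \int_{1-\varepsilon}^{1} \bigl[L_1(\tilde\gamma_{h_1}(t), \dot{\tilde\gamma}_{h_1}(t)) - L_1(\varphi_1 \circ \gamma(t), (\varphi_1 \circ \gamma)'(t))\bigr]\, dt. \]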
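We now call ω a common modulus of continuity for the derivatives $DL_0$ and $DL_1$ on the compact set $\bar B(0, 4) \times \bar B(0, A_1)$. Here $DL_0$ and $DL_1$ denote the total derivatives of $L_0$ and $L_1$, i.e. with respect to all variables. When L has a derivative which is locally Lipschitz, then $DL_0$ and $DL_1$ are also locally Lipschitz on $\mathbb{R}^n \times \mathbb{R}^n$, and the modulus ω can be taken linear. Since $\tilde\gamma_{h_i}(s) \in \mathring{B}(0, 4)$ and $\|\dot{\tilde\gamma}_{h_i}(s)\|_{\mathrm{euc}} \le A_1$, we get the estimate
\[ c\bigl(\varphi_0^{-1}(\tilde x_0 + h_0), \varphi_1^{-1}(\tilde x_1 + h_1)\bigr) - c\bigl(\varphi_0^{-1}(\tilde x_0), \varphi_1^{-1}(\tilde x_1)\bigr) \]
\[ \le \int_0^{\varepsilon} DL_0\bigl(\varphi_0 \circ \gamma(t), (\varphi_0 \circ \gamma)'(t)\bigr)\Bigl(\frac{\varepsilon - t}{\varepsilon}\, h_0,\ -\frac{1}{\varepsilon}\, h_0\Bigr)\, dt \]
\[ + \int_{1-\varepsilon}^{1} DL_1\bigl(\varphi_1 \circ \gamma(t), (\varphi_1 \circ \gamma)'(t)\bigr)\Bigl(\frac{t - (1 - \varepsilon)}{\varepsilon}\, h_1,\ \frac{1}{\varepsilon}\, h_1\Bigr)\, dt \]
\[ + \frac{1}{\varepsilon}\, \omega\Bigl(\frac{1}{\varepsilon}\|h_0\|_{\mathrm{euc}}\Bigr)\|h_0\|_{\mathrm{euc}} + \frac{1}{\varepsilon}\, \omega\Bigl(\frac{1}{\varepsilon}\|h_1\|_{\mathrm{euc}}\Bigr)\|h_1\|_{\mathrm{euc}}. \]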
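We observe that the sum of the first two terms in the right hand side is linear, while the sum of the last two is bounded by
\[ \frac{1}{\varepsilon}\, \omega\Bigl(\frac{1}{\varepsilon}\|(h_0, h_1)\|_{\mathrm{euc}}\Bigr)\|(h_0, h_1)\|_{\mathrm{euc}}. \]
Therefore we obtain that
\[ (\tilde x_0, \tilde x_1) \mapsto c\bigl(\varphi_0^{-1}(\tilde x_0), \varphi_1^{-1}(\tilde x_1)\bigr) \]
is semi-concave for the modulus $\tilde\omega(r) = \frac{1}{\varepsilon}\, \omega\bigl(\frac{1}{\varepsilon}\, r\bigr)$ on $\mathring{B} \times \mathring{B}$, as wanted.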
Corollary 6.2.20. If L is a weak Tonelli Lagrangian on the connected manifold M , then, for every t > 0, a superdifferential of $c_{t,L}(x, y)$ at $(x_0, y_0)$ is given by
\[ (v, w) \mapsto \frac{\partial L}{\partial v}(\gamma(t), \dot\gamma(t))(w) - \frac{\partial L}{\partial v}(\gamma(0), \dot\gamma(0))(v), \]
where γ : [0, t] → M is a minimizer for L with $\gamma(0) = x_0$, $\gamma(t) = y_0$, and $(v, w) \in T_{x_0}M \times T_{y_0}M = T_{(x_0, y_0)}(M \times M)$.
Proof. Again we will do it only for t = 1. If we use the notation introduced in the previous proof, we see that a superdifferential of
\[ (\tilde x_0, \tilde x_1) \mapsto c\bigl(\varphi_0^{-1}(\tilde x_0), \varphi_1^{-1}(\tilde x_1)\bigr) \]
is given by
\[ (h_0, h_1) \mapsto l_0(h_0) + l_1(h_1), \]
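where
\[ l_0(h_0) = -\int_0^{\varepsilon} \Bigl[\frac{t - \varepsilon}{\varepsilon}\, \frac{\partial L_0}{\partial x}\bigl(\varphi_0 \circ \gamma(t), (\varphi_0 \circ \gamma)'(t)\bigr)(h_0) + \frac{1}{\varepsilon}\, \frac{\partial L_0}{\partial v}\bigl(\varphi_0 \circ \gamma(t), (\varphi_0 \circ \gamma)'(t)\bigr)(h_0)\Bigr]\, dt, \]
\[ l_1(h_1) = \int_{1-\varepsilon}^{1} \Bigl[\frac{t - (1 - \varepsilon)}{\varepsilon}\, \frac{\partial L_1}{\partial x}\bigl(\varphi_1 \circ \gamma(t), (\varphi_1 \circ \gamma)'(t)\bigr)(h_1) + \frac{1}{\varepsilon}\, \frac{\partial L_1}{\partial v}\bigl(\varphi_1 \circ \gamma(t), (\varphi_1 \circ \gamma)'(t)\bigr)(h_1)\Bigr]\, dt. \]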
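By Theorem 6.2.7, the curve $t \mapsto \varphi_0 \circ \gamma(t)$ is a $C^1$ extremal of $L_0$ and it satisfies the following integrated form of the Euler-Lagrange equation:
\[ \frac{\partial L_0}{\partial v}\bigl(\varphi_0 \circ \gamma(t), (\varphi_0 \circ \gamma)'(t)\bigr) - \frac{\partial L_0}{\partial v}\bigl(\varphi_0 \circ \gamma(0), (\varphi_0 \circ \gamma)'(0)\bigr) = \int_0^t \frac{\partial L_0}{\partial x}\bigl(\varphi_0 \circ \gamma(s), (\varphi_0 \circ \gamma)'(s)\bigr)\, ds. \]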
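This gives us
\[ l_0(h_0) = -\frac{\partial L_0}{\partial v}\bigl(\varphi_0 \circ \gamma(0), (\varphi_0 \circ \gamma)'(0)\bigr)(h_0) - \frac{1}{\varepsilon} \int_0^{\varepsilon} \frac{d}{dt}\Bigl[(t - \varepsilon) \int_0^t \frac{\partial L_0}{\partial x}\bigl(\varphi_0 \circ \gamma(s), (\varphi_0 \circ \gamma)'(s)\bigr)(h_0)\, ds\Bigr]\, dt. \]
Obviously the second term in the right hand side is 0, since the expression in brackets vanishes both at t = 0 and at t = ε, and so $l_0$ reinterpreted on $T_{x_0}M$ rather than on $\mathbb{R}^n$ gives $-\frac{\partial L}{\partial v}(\gamma(0), \dot\gamma(0))$. The treatment for $l_1$ is the same.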
We have avoided the first variation formula in the proof of Corollary 6.2.20, because this is usually proven for $C^2$ variations of curves and $C^2$ Lagrangians. Of course, our argument to prove this Corollary is basically a proof of the first variation formula for $C^1$ Lagrangians. This is already known and the proof is the standard one.
6.2.3 The twist condition for costs obtained from Lagrangians

Lemma 6.2.21. Let L be a weak Tonelli Lagrangian on the connected manifold M . Suppose that L satisfies the following condition:
(UC) If $\gamma_i : [a_i, b_i] \to M$, i = 1, 2 are two L-minimizers such that $\gamma_1(t_0) = \gamma_2(t_0)$ and $\dot\gamma_1(t_0) = \dot\gamma_2(t_0)$, for some $t_0 \in [a_1, b_1] \cap [a_2, b_2]$, then $\gamma_1 = \gamma_2$ on the whole interval $[a_1, b_1] \cap [a_2, b_2]$.
Then, for every t > 0, the cost $c_{t,L} : M \times M \to \mathbb{R}$ satisfies the left (and the right) twist condition of Definition 1.2.4.
Moreover, if $(x, y) \in \mathcal{D}(\Lambda^l_{c_{t,L}})$, then we have:
(i) there is a unique L-minimizer γ : [0, t] → M such that x = γ(0), and y = γ(t);
(ii) the speed $\dot\gamma(0)$ is uniquely determined by the equality
\[ \frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, \dot\gamma(0)). \]
Proof. We first prove part (ii). Pick γ : [0, t] → M an L-minimizer with x = γ(0) and y = γ(t). From Corollary 6.2.20 we obtain the equality
\[ \frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, \dot\gamma(0)). \tag{6.2.4} \]
Since the $C^1$ map $v \mapsto L(x, v)$ is strictly convex, the Legendre transform $v \in T_xM \mapsto \partial L/\partial v(x, v)$ is injective, and therefore $\dot\gamma(0) \in T_xM$ is indeed uniquely determined by Equation (6.2.4) above. This proves (ii).
To prove statement (i), consider another L-minimizer $\gamma_1 : [0, t] \to M$ with $x = \gamma_1(0)$ and $y = \gamma_1(t)$. By what we just said, we also have
\[ \frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, \dot\gamma_1(0)). \]
By the uniqueness already proved in statement (ii), we get $\dot\gamma_1(0) = \dot\gamma(0)$. It now follows from condition (UC) that $\gamma = \gamma_1$ on the whole interval [0, t].
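The twist condition follows easily. Consider $(x, y), (x, y_1) \in D(\Lambda^l_{c_{t,L}})$ such that
\[
\frac{\partial c_{t,L}}{\partial x}(x, y) = \frac{\partial c_{t,L}}{\partial x}(x, y_1). \tag{6.2.5}
\]
By (i) there is a unique $L$-minimizer $\gamma : [0, t] \to M$ (resp. $\gamma_1 : [0, t] \to M$) such that
$x = \gamma(0)$, $y = \gamma(t)$ (resp. $x = \gamma_1(0)$, $y_1 = \gamma_1(t)$), and
\[
\frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, \dot\gamma(0))
\quad\text{and}\quad
\frac{\partial c_{t,L}}{\partial x}(x, y_1) = -\frac{\partial L}{\partial v}(x, \dot\gamma_1(0)).
\]
From equation (6.2.5) and the injectivity of the Legendre transform of $L$, it follows
that $\dot\gamma_1(0) = \dot\gamma(0)$. From condition (UC) we get $\gamma = \gamma_1$ on the whole interval $[0, t]$. In
particular, we obtain $y = \gamma(t) = \gamma_1(t) = y_1$.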
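As an illustration of Lemma 6.2.21 (a worked example that is not part of the original argument), consider the simplest case, which we assume here: $M = \mathbb{R}^n$ with the Lagrangian $L(x, v) = \frac{1}{2}|v|^2$, whose minimizers are straight segments travelled at constant speed. Covectors are identified with vectors through the Euclidean scalar product.

```latex
% Assumed setting: M = R^n, L(x,v) = |v|^2/2 (illustrative choice only).
\[
\gamma(s) = x + \frac{s}{t}(y - x), \qquad
\dot\gamma(0) = \frac{y - x}{t}, \qquad
c_{t,L}(x, y) = \int_0^t \frac{1}{2}\left|\frac{y - x}{t}\right|^2 ds = \frac{|y - x|^2}{2t}.
\]
\[
\frac{\partial c_{t,L}}{\partial x}(x, y) = \frac{x - y}{t}
= -\dot\gamma(0) = -\frac{\partial L}{\partial v}(x, \dot\gamma(0)).
\]
% Twist: equal x-derivatives of the cost force equal end-points.
\[
\frac{\partial c_{t,L}}{\partial x}(x, y) = \frac{\partial c_{t,L}}{\partial x}(x, y_1)
\;\Longrightarrow\; \frac{x - y}{t} = \frac{x - y_1}{t}
\;\Longrightarrow\; y = y_1.
\]
```

In this case the Legendre transform is the identity map, so the injectivity used in the proof is immediate.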
The next lemma is an easy consequence of Lemma 6.2.21 above.

Lemma 6.2.22. Let $L$ be a weak Tonelli Lagrangian on $M$. If we can find a continuous
local flow $\phi_t$ defined on $TM$ such that:
(UC') for every $L$-minimizer $\gamma : [a, b] \to M$ and every $t_1, t_2 \in [a, b]$, the point
$\phi_{t_2 - t_1}(\gamma(t_1), \dot\gamma(t_1))$ is defined and $(\gamma(t_2), \dot\gamma(t_2)) = \phi_{t_2 - t_1}(\gamma(t_1), \dot\gamma(t_1))$,
then $L$ satisfies (UC). Therefore, for every $t > 0$, the cost $c_{t,L} : M \times M \to \mathbb{R}$ satisfies the
left (and the right) twist condition of Definition 1.2.4.
Moreover, if $(x, y) \in D(\Lambda^l_{c_{t,L}})$, then $y = \pi\phi_t(x, v)$, where $\pi : TM \to M$ is the
canonical projection, and $v \in T_xM$ is uniquely determined by the equation
\[
\frac{\partial c_{t,L}}{\partial x}(x, y) = -\frac{\partial L}{\partial v}(x, v).
\]
The curve $s \in [0, t] \mapsto \pi\phi_s(x, v)$ is the unique $L$-minimizer $\gamma : [0, t] \to M$ with
$\gamma(0) = x$, $\gamma(t) = y$.
Note that the following proposition is contained in Theorem 6.2.13.
Proposition 6.2.23. If $L$ is a Tonelli Lagrangian, then it satisfies condition (UC') for
the Euler-Lagrange flow $\phi^L_t$.
Proposition 6.2.24. Suppose $g$ is a complete Riemannian metric on the connected
manifold $M$, and $r > 1$. For a given $t > 0$, the cost $c_{t,L_{r,g}}$ of the weak Tonelli Lagrangian
$L_{r,g}$, defined by
\[
L_{r,g}(x, v) = \|v\|_x^r = g_x(v, v)^{r/2},
\]
is given by
\[
c_{t,L_{r,g}}(x, y) = t^{1-r}\, d_g(x, y)^r,
\]
where $d_g$ is the distance defined by the Riemannian metric. The Lagrangian $L_{r,g}$ satisfies
condition (UC') of Lemma 6.2.22 for the geodesic flow $\phi^g_t$ of $g$. Therefore its cost $c_{t,L_{r,g}}$
satisfies the left (and the right) twist condition. Moreover, if $(x, y) \in D(\Lambda^l_{c_{t,L_{r,g}}})$, then
$y = \pi\phi^g_t(x, v)$, where $\pi : TM \to M$ is the canonical projection, and $v \in T_xM$ is uniquely
determined by the equation
\[
\frac{\partial c_{t,L_{r,g}}}{\partial x}(x, y) = -\frac{\partial L_{r,g}}{\partial v}(x, v).
\]
Proof. Define $s$ by $1/s + 1/r = 1$. Let $\gamma : [a, b] \to M$ be a piecewise $C^1$ curve. Denoting
by $\ell_g(\gamma)$ the Riemannian length of $\gamma$, we can apply Hölder's inequality to obtain
\[
\ell_g(\gamma) = \int_a^b \|\dot\gamma(\tau)\|_{\gamma(\tau)}\,d\tau \le (b - a)^{1/s} \left( \int_a^b \|\dot\gamma(\tau)\|_{\gamma(\tau)}^r\,d\tau \right)^{1/r},
\]
with equality if and only if $\gamma$ is parameterized with $\|\dot\gamma(\tau)\|_{\gamma(\tau)}$ constant, i.e. proportionally
to arc-length. Raising both sides to the power $r$, this implies
\[
(b - a)^{-r/s}\, \ell_g(\gamma)^r \le \int_a^b \|\dot\gamma(\tau)\|_{\gamma(\tau)}^r\,d\tau,
\]
with equality if and only if $\gamma$ is parameterized proportionally to arc-length. Since any
curve can be reparametrized proportionally to arc-length and $r/s = r - 1$, we conclude
(taking $[a, b] = [0, t]$) that
\[
c_{t,L_{r,g}}(x, y) = t^{1-r}\, d_g(x, y)^r,
\]
and that an $L_{r,g}$-minimizing curve has to minimize the length between its end-points.
Therefore any $L_{r,g}$-minimizing curve is a geodesic and its speed curve is an orbit of
the geodesic flow $\phi^g_t$. Therefore $L_{r,g}$ satisfies condition (UC') of Lemma 6.2.22 for the
geodesic flow $\phi^g_t$ of $g$. The rest of the proposition follows from Lemma 6.2.22.
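The formula of Proposition 6.2.24 can be checked numerically in the flat case. The following is a minimal sketch, assuming $M = \mathbb{R}^2$ with the Euclidean metric (so that $d_g(x, y) = |x - y|$ and geodesics are straight lines); the function names `action`, `segment` and `endpoint_from_gradient` are hypothetical and introduced only for this illustration. It compares the action of a constant-speed segment and of an unevenly parameterized one with $t^{1-r}|x - y|^r$, and then recovers $y$ from $\partial c_{t,L_{r,g}}/\partial x$ by inverting the Legendre transform $v \mapsto r|v|^{r-2}v$.

```python
import math

def action(points, t, r):
    """Riemann-sum approximation of the action  int_0^t |gamma'(s)|^r ds  for a
    curve in R^n sampled at equally spaced times in [0, t] (Euclidean norm)."""
    dt = t / (len(points) - 1)
    total = 0.0
    for p, q in zip(points, points[1:]):
        speed = math.dist(p, q) / dt
        total += speed**r * dt
    return total

def segment(x, y, reparam, n=2001):
    """Sample the straight segment from x to y at parameters reparam(k/(n-1))."""
    pts = []
    for k in range(n):
        u = reparam(k / (n - 1))
        pts.append(tuple(a + u * (b - a) for a, b in zip(x, y)))
    return pts

def endpoint_from_gradient(x, p, t, r):
    """Given p = dc/dx(x, y), solve p = -r |v|^(r-2) v for v (Legendre
    inversion, Euclidean case only) and return the end-point x + t v."""
    norm_p = math.hypot(*p)
    if norm_p == 0.0:
        return tuple(x)  # zero gradient corresponds to v = 0, i.e. y = x
    speed = (norm_p / r) ** (1.0 / (r - 1.0))
    return tuple(a - t * speed * c / norm_p for a, c in zip(x, p))

# Assumed sample data: two points in the plane, time t and exponent r > 1.
x, y = (0.0, 0.0), (3.0, 4.0)
t, r = 2.0, 3.0
dist = math.dist(x, y)

closed_form = t ** (1 - r) * dist**r                          # t^(1-r) d(x, y)^r
constant_speed = action(segment(x, y, lambda u: u), t, r)     # arc-length parameterization
uneven_speed = action(segment(x, y, lambda u: u * u), t, r)   # same path, non-constant speed

# Gradient of the closed-form cost in x, then recovery of y from it.
p = tuple(r * t ** (1 - r) * dist ** (r - 2) * (a - b) for a, b in zip(x, y))
recovered = endpoint_from_gradient(x, p, t, r)

print(closed_form, constant_speed, uneven_speed, recovered)
```

The constant-speed parameterization attains the closed-form value, while the uneven one, which traces the same path, costs strictly more; this is the equality case of Hölder's inequality used in the proof.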
Bibliography
[1] A.Abbondandolo & A.Figalli: High action orbits for Tonelli Lagrangians and
superlinear Hamiltonians on compact configuration spaces. J. Differential Equations,
234 (2007), no.2, 626-653.
[2] R.Abraham & J.E.Marsden: Foundations of mechanics. Second edition, revised
and enlarged. (1978) Benjamin/Cummings Publishing Co. Inc. Advanced Book Program,
Reading, Mass.
[3] L.Ambrosio: Lecture notes on optimal transport problems, in Mathematical Aspects
of Evolving Interfaces, Lecture Notes in Math. 1812, Springer-Verlag, Berlin/New
York (2003), 1-52.
[4] L.Ambrosio: Transport equation and Cauchy problem for BV vector fields. Invent.
Math., 158 (2004), no.2, 227-260.
[5] L.Ambrosio: Lecture notes on transport equation and Cauchy
problem for non-smooth vector fields. Preprint, 2005 (available at
http://cvgmt.sns.it/people/ambrosio).
[6] L.Ambrosio, O.Ascenzi & G.Buttazzo: Lipschitz regularity for minimizers of
integral functionals with highly discontinuous integrands. J. Math. Anal. Appl., 142
(1989), no. 2, 301-316.
[7] L.Ambrosio & G.Crippa: Existence, uniqueness, stability and differentiability
properties of the flow associated to weakly differentiable vector fields. UMI-Lecture
Notes, to appear.
[8] L.Ambrosio & A.Figalli: Geodesics in the space of measure-preserving maps and
plans. Arch. Ration. Mech. Anal., to appear.
[9] L.Ambrosio & A.Figalli: On the regularity of the pressure field of Brenier’s weak
solutions to incompressible Euler equations. Calc. Var. Partial Differential Equations,
31 (2007), no. 4, 497-509.
[10] L.Ambrosio, N.Fusco & D.Pallara: Functions of bounded variation and free
discontinuity problems. Oxford Mathematical Monographs, 2000.
[11] L.Ambrosio, N.Gigli & G.Savaré: Gradient flows in metric spaces and in the
Wasserstein space of probability measures. Lectures in Mathematics, ETH Zurich,
Birkhäuser (2005).
[12] L.Ambrosio, B.Kirchheim & A.Pratelli: Existence of optimal transport maps
for crystalline norms. Duke Math. J., 125 (2004), no. 2, 207-241.
[13] L.Ambrosio & A.Pratelli: Existence and stability results in the L^1 theory
of optimal transportation. Lecture Notes in Mathematics, 1813, Springer-Verlag,
Berlin/New York (2003), 123-160.
[14] L.Ambrosio, S.Lisini & G.Savaré: Stability of flows associated to gradient
vector fields and convergence of iterated transport maps. Manuscripta Math., 121 (2006),
1-50.
[15] V.Arnold: Sur la géométrie différentielle des groupes de Lie de dimension infinie
et ses applications à l’hydrodynamique des fluides parfaits. (French) Ann. Inst. Fourier
(Grenoble), 16 (1966), fasc. 1, 319-361.
[16] V.Bangert: Analytische Eigenschaften konvexer Funktionen auf Riemannschen
Mannigfaltigkeiten. J. Reine Angew. Math., 307 (1979), 309-324.
[17] V.Bangert: Minimal measures and minimizing closed normal one-currents.
Geom. Funct. Anal., 9 (1999), 413-427.
[18] S.Bates: Toward a precise smoothness hypothesis in Sard’s Theorem. Proc. Amer.
Math. Soc., 117 (1993), no. 1, 279-283.
[19] J.-D. Benamou & Y. Brenier: A computational fluid mechanics solution to the
Monge-Kantorovich mass transfer problem. Numer. Math., 84 (2000), 375-393.
[20] P.Bernard: Existence of C^{1,1} critical sub-solutions of the Hamilton-Jacobi
equation on compact manifolds. Annales scientifiques de l’ENS, to appear.
[21] P.Bernard: Young measures, superposition, and transport. In preparation.
[22] P.Bernard & B.Buffoni: Optimal mass transportation and Mather theory. J.
Eur. Math. Soc., 9 (2007), no. 1, 85-121.
[23] P.Bernard & B.Buffoni: The Monge problem for supercritical Mañé potential
on compact manifolds. Adv. Math., 207 (2006), no. 2, 691-706.
[24] M.Bernot: Irrigation and Optimal Transport, Ph.D. Thesis, École Normale
Supérieure de Cachan, 2005. Available at http://www.umpa.ens-lyon.fr/~mbernot.
[25] M.Bernot, V.Caselles & J.M.Morel: Traffic plans. Publ. Mat., 49 (2005),
no. 2, 417-451.
[26] M.Bernot, V.Caselles & J.M.Morel: The structure of branched transportation
networks. Calc. Var. Partial Differential Equations, to appear.
[27] M.Bernot & A.Figalli: Synchronized traffic plans and stability of optima.
ESAIM Control Optim. Calc. Var., to appear.
[28] F.Bouchut: Renormalized solutions to the Vlasov equations with coefficient of
bounded variations. Arch. Rational Mech. Anal., 157 (2001), 75-90.
[29] A.Brancolini, G.Buttazzo & F.Santambrogio: Path functionals over
Wasserstein spaces. J. Eur. Math. Soc., 8 (2006), no. 3, 414-434.
[30] Y.Brenier: Polar decomposition and increasing rearrangement of vector fields.
C. R. Acad. Sci. Paris Sér. I Math., 305 (1987), no. 19, 805-808.
[31] Y.Brenier: The least action principle and the related concept of generalized flows
for incompressible perfect fluids. J. Amer. Math. Soc., 2 (1989), 225-255.
[32] Y.Brenier: Polar factorization and monotone rearrangement of vector-valued
functions. Comm. Pure Appl. Math., 44 (1991), 375-417.
[33] Y.Brenier: The dual least action problem for an ideal, incompressible fluid. Arch.
Rational Mech. Anal., 122 (1993), 323-351.
[34] Y.Brenier: A homogenized model for vortex sheets. Arch. Rational Mech. Anal.,
138 (1997), 319-353.
[35] Y.Brenier: Minimal geodesics on groups of volume-preserving maps and generalized
solutions of the Euler equations. Comm. Pure Appl. Math., 52 (1999), 411-452.
[36] Y.Brenier: Convergence of the Vlasov-Poisson system to the incompressible Euler
equations. Comm. Partial Differential Equations, 25 (2000), no. 3-4, 737-754.
[37] Y.Brenier & W.Gangbo: L^p approximation of maps by diffeomorphisms. Calc.
Var. Partial Differential Equations, 16 (2003), no. 2, 147-164.
[38] G.Buttazzo, M.Giaquinta & S.Hildebrandt: One-dimensional variational
problems. Oxford Lecture Series in Mathematics and its Applications, 15 (1998),
Oxford Univ. Press, New York.
[39] L.Caffarelli, M.Feldman & R.J.McCann: Constructing optimal maps for
Monge’s transport problem as a limit of strictly convex costs. J. Amer. Math. Soc.,
15 (2002), 1-26.
[40] L.A.Caffarelli & S.Salsa, Editors: Optimal transportation and applications.
Lecture Notes in Mathematics, 1813. Lectures from the C.I.M.E. Summer School
held in Martina Franca, September 2–8, 2001, Springer-Verlag, Berlin (2003).
[41] P.Cannarsa & C.Sinestrari: Semiconcave Functions, Hamilton-Jacobi Equations,
and Optimal Control. Progress in Nonlinear Differential Equations and Their
Applications, 58, Birkhäuser Boston (2004).
[42] J.A.Carrillo, R.J.McCann & C.Villani: Contractions in the 2-Wasserstein
length space and thermalization of granular media. Arch. Rational Mech. Anal., 179
(2006), no. 2, 217-263.
[43] C.Castaing & M.Valadier: Convex Analysis and Measurable Multifunctions.
Lecture Notes in Mathematics, 580, Springer-Verlag, Berlin-New York, 1977.
[44] F.H.Clarke & R.B.Vinter: Regularity properties of solutions to the basic prob-
lem in the calculus of variations. Trans. Amer. Math. Soc., 289 (1985), 73-98.
[45] F.H.Clarke: Methods of dynamic and nonsmooth optimization. CBMS-NSF Regional
Conference Series in Applied Mathematics, 57 (1989), Society for Industrial
and Applied Mathematics (SIAM), Philadelphia, PA.
[46] F.H.Clarke: A Lipschitz regularity theorem. Ergodic Theory Dynam. Systems, to
appear.
[47] F.Colombini & N.Lerner: Uniqueness of continuous solutions for BV vector
fields. Duke Math. J., 111 (2002), 357-384.
[48] F.Colombini & N.Lerner: Uniqueness of L^∞ solutions for a class of conormal
BV vector fields. Contemp. Math., vol. 368, 133-156. AMS Providence, RI, 2005.
[49] G.Contreras, J.Delgado & R.Iturriaga: Lagrangian flows: the dynamics
of globally minimizing orbits. II. Bol. Soc. Brasil. Mat. (N.S.), 28 (1997), no. 2,
155-196.
[50] D.Cordero-Erausquin, R.J.McCann & M.Schmuckenschläger: A Riemannian
interpolation inequality à la Borell, Brascamp and Lieb. Invent. Math., 146
(2001), no. 2, 219-257.
[51] D.L.Cohn: Measure theory. Birkhäuser, Boston, Mass. (1980).
[52] G.Contreras: Action potential and weak KAM solutions. Calc. Var. Partial Dif-
ferential Equations, 13 (2001), no. 4, 427-458.
[53] D’Arcy Thompson: On Growth and Form. Cambridge University Press, 1942.
[54] R.J.DiPerna & P.-L.Lions: On the Fokker-Planck-Boltzmann equation. Comm.
Math. Phys., 120 (1988), 1-23.
[55] R.J.DiPerna & P.-L.Lions: On the Cauchy problem for Boltzmann equations:
global existence and weak stability. Annals of Math., 130 (1989), no.2, 321-366.
[56] R.J.DiPerna & P.-L.Lions: Ordinary differential equations, transport theory and
Sobolev spaces. Invent. Math., 98 (1989), no.3, 511-547.
[57] R.M.Dudley: Real Analysis and Probability. Cambridge University Press, 2002.
[58] D.G.Ebin & J.Marsden: Groups of diffeomorphisms and the motion of an ideal
incompressible fluid. Annals of Math., 2 (1970), 102–163.
[59] L.C.Evans: Partial differential equations and Monge-Kantorovich mass transfer, in
R. Bott et al., editors, Current Developments in Mathematics. (1997), International
Press, Cambridge, 26-78.
[60] L.C.Evans & W.Gangbo: Differential equations methods for the Monge-
Kantorovich mass transfer problem. Mem. Amer. Math. Soc., 137 (1999).
[61] L.C.Evans & R.F.Gariepy: Measure Theory and Fine Properties of Functions.
Studies in Advanced Mathematics, CRC Press, Boca Raton, FL, 1992.
[62] A.Fathi: Théorème KAM faible et théorie de Mather sur les systèmes lagrangiens.
C. R. Acad. Sci. Paris Sér. I Math., 324 (1997), no. 9, 1043-1046.
[63] A.Fathi: Solutions KAM faibles conjuguées et barrières de Peierls. C. R. Acad.
Sci. Paris Sér. I Math., 325 (1997), no. 6, 649-652.
[64] A.Fathi: Regularity of C^1 solutions of the Hamilton-Jacobi equation. Ann. Fac.
Sci. Toulouse Math., 12 (2003), 479-516.
[65] A.Fathi: Weak KAM theorems in Lagrangian Dynamics. Book to appear.
[66] A.Fathi & A.Figalli: Optimal transportation on non-compact manifolds. Israel
J. Math., to appear.
[67] A.Fathi, A.Figalli & L.Rifford: On a problem of Mather. Comm. Pure Appl.
Math., to appear.
[68] A.Fathi & E.Maderna: Weak KAM theorem on non compact manifolds. Non-
linear Differential Equations Appl., to appear.
[69] A.Fathi & A.Siconolfi: Existence of C^1 critical subsolutions of the
Hamilton-Jacobi equation. Invent. Math., 155 (2004), 363-388.
[70] H.Federer: Geometric measure theory. Die Grundlehren der mathematischen
Wissenschaften, 153, Springer-Verlag New York Inc., New York (1969).
[71] M.Feldman & R.J.McCann: Monge’s transport problem on a Riemannian manifold.
Trans. Amer. Math. Soc., 354 (2002), 1667-1697.
[72] S.Ferry: When ε-boundaries are manifolds. Fund. Math., 90 (1976), no. 3,
199-210.
[73] A.Figalli: Trasporto ottimale su varietà non compatte. Degree Thesis (in English),
(2006) (available at http://cvgmt.sns.it/people/figalli).
[74] A.Figalli: The Monge problem on non-compact manifolds. Rend. Sem. Mat. Univ.
Padova, 117 (2007), 147-166.
[75] A.Figalli: Existence, Uniqueness, and Regularity of Optimal Transport Maps.
SIAM J. Math. Anal., 39 (2007), no. 1, 126-137.
[76] A.Figalli: A simple proof of the Morse-Sard theorem in Sobolev spaces. Proc.
Amer. Math. Soc., to appear.
[77] A.Figalli: Existence and uniqueness of martingale solutions for SDEs with rough
or degenerate coefficients. J. Funct. Anal., 254 (2008), no.1, 109-153.
[78] A.Figalli & C.Villani: Strong displacement convexity on Riemannian manifolds.
Math. Z., 257 (2007), no.2, 251-259.
[79] G.Forni & J.N.Mather: Action minimizing orbits in Hamiltonian systems,
in Transition to chaos in classical and quantum mechanics, Montecatini Terme, 1991.
Lecture Notes in Math., 1589 (1994), Springer, Berlin, 92-186.
[80] W.Gangbo & R.J.McCann: The geometry of optimal transportation. Acta
Math., 177 (1996), 113-161.
[81] W.Gangbo: The Monge mass transfer problem and its applications, in Monge
Ampère equation: applications to geometry and optimization. Contemp. Math., 226
(1999), Amer. Math. Soc., Providence, RI, 79-104.
[82] E.N.Gilbert: Minimum cost communication networks. Bell System Tech. J., 46
(1967), 2209-2227.
[83] M.Hauray, C.LeBris & P.-L.Lions: Deux remarques sur les flots généralisés
d’équations différentielles ordinaires. [Two remarks on generalized flows for ordinary
differential equations]. C. R. Acad. Sc. Paris, Sér. I, Math, submitted.
[84] H.Hofer & E.Zehnder: Symplectic invariants and Hamiltonian dynamics.
Birkhäuser Advanced Texts: Basler Lehrbücher. (1994) Birkhäuser Verlag, Basel.
[85] L.V.Kantorovich: On the transfer of masses. Dokl. Akad. Nauk. SSSR, 37
(1942), 227-229.
[86] L.V.Kantorovich: On a problem of Monge. Uspekhi Mat. Nauk., 3 (1948),
225-226.
[87] M.Knott & C.S.Smith: On the optimal mapping of distributions. J. Optim.
Theory Appl., 43 (1984), 39-49.
[88] N.V.Krylov & M.Röckner: Strong solutions of stochastic equations with singu-
lar time dependent drift. Probab. Theory Related Fields, 131 (2005), no. 2, 154-196.
[89] J.M.Lasry & P.-L.Lions: A remark on regularization in Hilbert spaces. Israel J.
Math., 55 (1986), no. 3, 257-266.
[90] C.LeBris & P.-L.Lions: Renormalized solutions of some transport equations with
partially W 1,1 velocities and applications. Annali di matematica pura e applicata, 183
(2004), 97-130.
[91] C.LeBris & P.-L.Lions: Existence and uniqueness of solutions to Fokker-Planck
type equations with irregular coefficients. Comm. Partial Differential Equations, to
appear.
[92] C.LeBris & P.-L.Lions: Generalized flows for stochastic differential equations
with irregular coefficients. In preparation.
[93] J.-L.Lions: Équations différentielles opérationnelles et problèmes aux limites.
(French) Die Grundlehren der mathematischen Wissenschaften, 111. Springer-Verlag,
Berlin, 1961.
[94] P.-L.Lions: Mathematical topics in fluid mechanics, Vol. I: incompressible models.
Oxford Lecture Series in Mathematics and its applications, 3, Oxford University
Press, 1996.
[95] P.-L.Lions: Sur les équations différentielles ordinaires et les équations de trans-
port. [On ordinary differential equations and transport equations]. C. R. Acad. Sc.
Paris, Sér. I, Math, 326 (1998), no. 7, 833-838.
[96] P.-L.Lions, G.Papanicolaou & S.R.S.Varadhan: Homogenization of
Hamilton-Jacobi equation. Unpublished preprint, 1987.
[97] J.Lott & C.Villani: Ricci curvature via optimal transport. Annals of Math., to
appear.
[98] F.Maddalena, S.Solimini & J.M.Morel: A variational model of irrigation
patterns. Interfaces and Free Boundaries, 5 (2003), 391–416.
[99] R.Mañé: On the minimizing measures of Lagrangian dynamical systems. Nonlin-
earity, 5 (1992), no. 3, 623-638.
[100] R.Mañé: Lagrangian flows: the dynamics of globally minimizing orbits. Bol. Soc.
Brasil. Mat. (N.S.), 28 (1997), no. 2, 141-153.
[101] N.G.Markley: On the number of recurrent orbit closures. Proc. Amer. Math.
Soc., 25 (1970), no. 2, 413-416.
[102] J.N. Mather: Existence of quasiperiodic orbits for twist homeomorphisms of the
annulus. Topology, 21 (1982), 457-467.
[103] J.N.Mather: Variational construction of connecting orbits. Ann. Inst. Fourier,
43 (1993), 1349-1386.
[104] J.N.Mather: Total disconnectedness of the quotient Aubry set in
low dimensions. Comm. Pure Appl. Math., 56 (2003), 1178-1183.
[105] J.N.Mather: Examples of Aubry sets. Ergod. Th. Dynam. Sys., 24 (2004), 1667-
1723.
[106] R.J.McCann: A convexity principle for interacting gases and equilibrium crystals.
PhD thesis, Princeton University, 1994.
[107] R.J.McCann: Existence and uniqueness of monotone measure-preserving maps.
Duke Math. J., 80 (1995), no. 2, 309-323.
[108] R.J.McCann: A convexity principle for interacting gases. Adv. Math., 128
(1997), 153-179.
[109] R.J.McCann: Polar factorization of maps on Riemannian manifolds. Geom.
Funct. Anal., 11 (2001), no. 3, 589-608.
[110] G.Monge: Mémoire sur la Théorie des Déblais et des Remblais. Hist. de l’Acad.
des Sciences de Paris (1781), 666-704.
[111] A.P.Morse: The behavior of a function on its critical set. Annals of Math., 40
(1939), 62-70.
[112] J.Moser: Selected chapters in the calculus of variations, Lectures in Mathematics
ETH Zürich, Lecture notes by Oliver Knill, Birkhäuser Verlag, Basel (2003).
[113] J.D.Murray: Mathematical Biology. Biomathematics texts 19, Springer, 1993.
[114] A.Norton: A Critical Set with Nonnull Image has Large Hausdorff Dimension.
Trans. Amer. Math. Soc., 296 (1986), no. 1, 367-376.
[115] I.Nikolaev & E.Zhuzhoma: Flows on 2-dimensional Manifolds. Lecture
Notes in Mathematics, vol. 1705, Springer-Verlag, Berlin, 1999.
[116] F.Otto & C.Villani: Generalization of an inequality by Talagrand and links
with the logarithmic Sobolev inequality. J. Funct. Anal., 173 (2000), no. 2, 361-400.
[117] S.T.Rachev & L.Rüschendorf: A characterization of random variables with
minimum L^2-distance. J. Multivariate Anal., 32 (1990), 48-54. Corrigendum in J.
Multivariate Anal., 34 (1990), 156.
[118] S.T.Rachev & L.Rüschendorf: Mass transportation problems. Vol. I: Theory,
Vol. II: Applications. Probability and its applications, Springer, 1998.
[119] L.Rifford: On viscosity solutions of certain Hamilton-Jacobi equations: Regularity
results and generalized Sard’s Theorems. Comm. Partial Differential Equations,
to appear.
[120] W.Schachermayer & J.Teichmann: Characterization of optimal transport
plans for the Monge-Kantorovich-problem. Proc. Amer. Math. Soc., to appear.
[121] A.I.Shnirelman: The geometry of the group of diffeomorphisms and the dynamics
of an ideal incompressible fluid. (Russian) Mat. Sb. (N.S.), 128 (170) (1985), no. 1,
82–109.
[122] A.I.Shnirelman: Generalized fluid flows, their approximation and applications.
Geom. Funct. Anal., 4 (1994), no. 5, 586–620.
[123] S.K.Smirnov: Decomposition of solenoidal vector charges into elementary
solenoids and the structure of normal one-dimensional currents. St. Petersburg Math.
J., 5 (1994), 841–867.
[124] A.Sorrentino: On the total disconnectedness of the quotient Aubry set. Preprint,
2006.
[125] D.W.Stroock & S.R.S.Varadhan: Multidimensional diffusion processes.
Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Math-
ematical Sciences], 233. Springer-Verlag, Berlin-New York, 1979.
[126] K.-T.Sturm: On the geometry of metric measure spaces. I. Acta Math., 196
(2006), no. 1, 65-131.
[127] K.-T.Sturm: On the geometry of metric measure spaces. II. Acta Math., 196
(2006), no. 1, 133-177.
[128] K.T.Sturm & M.K.von Renesse: Transport Inequalities, Gradient Estimates,
Entropy and Ricci Curvature. Comm. Pure Appl. Math., 58 (2005), no. 7, 923-940.
[129] V.N.Sudakov: Geometric problems in the theory of infinite-dimensional proba-
bility distributions. Proc. Steklov Inst. Math., 141 (1979), 1-178.
[130] N.S.Trudinger & X.J.Wang: On the Monge mass transfer problem. Calc. Var.
Partial Differential Equations, 13 (2001), 19-31.
[131] A.M.Turing: The chemical basis of morphogenesis. Phil. Trans. Soc. Lond.,
B237 (1952), 37-72.
[132] C.Villani: Topics in optimal transportation. Graduate Studies in Mathematics,
58 (2003), American Mathematical Society, Providence, RI.
[133] C.Villani: Optimal transport, old and new. Lecture notes, 2005 Saint-Flour summer
school, available online at http://www.umpa.ens-lyon.fr/~cvillani.
[134] R.Vinter: Optimal control. Systems & Control: Foundations & Applications,
(2000), Birkhäuser Boston Inc., Boston, MA.
[135] Q.Xia: Optimal paths related to transport problems. Commun. Contemp. Math.,
5 (2003), no.2, 251-279.
