Partial Differential Equations
Computational Methods in Applied Sciences
Volume 16
Series Editor
E. Oñate
International Center for Numerical Methods in Engineering (CIMNE)
Technical University of Catalonia (UPC)
Edificio C-1, Campus Norte UPC
Gran Capitán, s/n
08034 Barcelona, Spain
onate@cimne.upc.edu
www.cimne.com
For other titles published in this series, go to
www.springer.com/series/6899
Partial Differential Equations
Modeling and Numerical Simulation
Edited by
Roland Glowinski
University of Houston, TX, USA
and
Pekka Neittaanmäki
University of Jyväskylä, Finland
Editors
Roland Glowinski
Department of Mathematics
University of Houston
USA
roland@math.uh.edu
Pekka Neittaanmäki
Department of Mathematical Information Technology
University of Jyväskylä
Finland
pn@mit.jyu.fi
ISBN 978-1-4020-8757-8
e-ISBN 978-1-4020-8758-5
Library of Congress Control Number: 2008930138
© 2008 Springer Science+Business Media B.V.
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written
permission from the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
springer.com
Dedicated to Olivier Pironneau
Preface
For more than 250 years, partial differential equations have clearly been the
most important tool available to mankind for understanding a large
variety of phenomena, first natural ones and then those originating from human activity and technological development. Mechanics, physics and their
engineering applications were the first to benefit from the impact of partial
differential equations on modeling and design, but a little less than a century
ago the Schrödinger equation was the key that opened the door to the application
of partial differential equations to quantum chemistry, first for small atomic and
molecular systems, and then for systems of rapidly growing complexity.
The place of partial differential equations in mathematics is a very particular
one: initially, the partial differential equations modeling natural phenomena
were derived by combining calculus with physical reasoning in order to express conservation laws and principles in partial differential equation form,
leading to the wave equation, the heat equation, the equations of elasticity,
the Euler and Navier–Stokes equations for fluids, the Maxwell equations of
electromagnetics, etc. It was in order to solve the heat equation ‘constructively’
that Fourier developed the series bearing his name in the early 19th century;
Fourier series (and later integrals) have played (and still play) a fundamental
role in both pure and applied mathematics, including many areas quite remote
from partial differential equations.
On the other hand, several areas of mathematics, such as differential geometry, have benefited from their interactions with partial differential equations.
The need for a better understanding of the properties of the solutions of these
equations has driven both the mathematical investigation of their
existence, uniqueness, regularity and other properties, and the development
of constructive methods to approximate these solutions. Numerical methods
for the approximate solution of partial differential equations were invented,
developed and applied to real-life situations long before the advent (in the
mid-forties) of digital computers. Among these early methods we may mention
finite differences, Galerkin methods, Courant's finite elements, and a variety of iterative
methods. However, the exponential growth in the speed and memory of digital
computers has been at the origin of an explosive development of numerical
mathematics, which has in turn made possible applications of a size and complexity unthinkable
not so long ago.
Progress on the theory and on the numerics of partial differential equations
has been simultaneous, each feeding the other.
Methods for proving the existence of solutions have led to numerical
methods for the actual computation of these solutions. Conversely,
conjectures on mathematical properties of solutions have first been verified
computationally, thus providing a justification for further analytical investigations. Applications of partial differential equations are essentially everywhere,
since to the areas mentioned above we must add the bio and health sciences,
finance and image processing. (It is worth mentioning that today the term partial
differential equations has to be taken in a broader sense than, say, fifty years
ago, in order to include partial differential inequalities, which are of fundamental importance in, for example, the modeling of non-smooth phenomena.)
From the above comments, it is quite obvious that the “world of partial
differential equations” is a very large and complex one and is, therefore, quite
difficult to explore. Not surprisingly, the many aspects of partial differential
equations (theory, modeling and computation) have motivated a huge number
of publications (books, articles, conference proceedings, websites). Concerning
books, most of them are necessarily specialized (unless elementary), with topics such as elliptic equations, parabolic equations, Navier–Stokes equations or
Maxwell equations, to name some of the most popular ones. We therefore think
that there is a need for books on partial differential equations that address a
variety of topics at a reasonably advanced level. From a practical point of view,
the diversity we mentioned above implies that such books must necessarily be multi-authored. We think that the present volume answers such a
need, since it contains contributions by internationally renowned experts
on a diverse selection of topics, all related to partial differential equations,
ranging from well-established ones in mechanics and physics to very recent
ones in micro-electronics and finance. In all these contributions the emphasis
has been on the modeling and computational aspects.
This volume is structured as follows. In Part I, discontinuous Galerkin and
mixed finite element methods are applied to a variety of linear and nonlinear
problems, including the Stokes problem from fluid mechanics and fully nonlinear elliptic equations of Monge–Ampère type. Part II is dedicated to the
numerical solution of linear and nonlinear hyperbolic problems. Part III
discusses the solution by domain decomposition methods of scattering problems for wave models and of nonlinear variational problems related to electronic structure. Part IV is devoted to various issues concerning the modeling and
simulation of fluid mechanics phenomena involving free surfaces and moving
boundaries; the finite difference solution of a problem from spectral geometry
has also been included in this part. Part V is dedicated to inverse problems.
Finally, Part VI addresses the modeling and simulation of finance-related processes based on parabolic variational inequalities.
Some of the issues discussed in this volume were addressed at the
international conference held in Helsinki in the fall of 2005 to honor
Olivier Pironneau on the occasion of his 60th birthday. Additional material
has been included in order to broaden the scope of the volume.
Special acknowledgements are due to Marja-Leena Rantalainen from the University of Jyväskylä for her highly constructive role in the various stages of this
project.
Houston and Jyväskylä
Roland Glowinski
Pekka Neittaanmäki
Contents
List of Contributors
Part I Discontinuous Galerkin and Mixed Finite Element Methods
Discontinuous Galerkin Methods
Vivette Girault and Mary F. Wheeler
Mixed Finite Element Methods on Polyhedral Meshes for Diffusion Equations
Yuri A. Kuznetsov
On the Numerical Solution of the Elliptic Monge–Ampère Equation in Dimension Two: A Least-Squares Approach
Edward J. Dean and Roland Glowinski
Part II Linear and Nonlinear Hyperbolic Problems
Higher Order Time Stepping for Second Order Hyperbolic Problems and Optimal CFL Conditions
J. Charles Gilbert and Patrick Joly
Comparison of Two Explicit Time Domain Unstructured Mesh Algorithms for Computational Electromagnetics
Igor Sazonov, Oubay Hassan, Ken Morgan, and Nigel P. Weatherill
The von Neumann Triple Point Paradox
Richard Sanders and Allen M. Tesdall
Part III Domain Decomposition Methods
A Lagrange Multiplier Based Domain Decomposition Method for the Solution of a Wave Problem with Discontinuous Coefficients
Serguei Lapin, Alexander Lapin, Jacques Périaux, and Pierre-Marie Jacquart
Domain Decomposition and Electronic Structure Computations: A Promising Approach
Guy Bencteux, Maxime Barrault, Eric Cancès, William W. Hager, and Claude Le Bris
Part IV Free Surface, Moving Boundaries and Spectral Geometry Problems
Numerical Analysis of a Finite Element/Volume Penalty Method
Bertrand Maury
A Numerical Method for Fluid Flows with Complex Free Surfaces
Andrea Bonito, Alexandre Caboussat, Marco Picasso, and Jacques Rappaz
Modelling and Simulating the Adhesion and Detachment of Chondrocytes in Shear Flow
Jian Hao, Tsorng-Whay Pan, and Doreen Rosenstrauch
Computing the Eigenvalues of the Laplace–Beltrami Operator on the Surface of a Torus: A Numerical Approach
Roland Glowinski and Danny C. Sorensen
Part V Inverse Problems
A Fixed Domain Approach in Shape Optimization Problems with Neumann Boundary Conditions
Pekka Neittaanmäki and Dan Tiba
Reduced-Order Modelling of Dispersion
Jean-Marc Brun and Bijan Mohammadi
Part VI Finance (Option Pricing)
Calibration of Lévy Processes with American Options
Yves Achdou
An Operator Splitting Method for Pricing American Options
Samuli Ikonen and Jari Toivanen
List of Contributors
Yves Achdou
UFR Mathématiques
Université Paris 7
Case 7012
FR-75251 Paris Cedex 05
France
achdou@math.jussieu.fr
Maxime Barrault
EDF R&D
1 avenue du Général de Gaulle
92141 Clamart Cedex
France
maxime.barrault@edf.fr
Jean-Marc Brun
CEMAGREF/ITAP
FR-34095 Montpellier
France
jean-marc.brun@cemagref.fr
Alexandre Caboussat
Department of Mathematics
University of Houston
Houston, TX 77204-3008
USA
caboussat@math.uh.edu
Guy Bencteux
EDF R&D
1 avenue du Général de Gaulle
92141 Clamart Cedex
France
guy.bencteux@edf.fr
Eric Cancès
CERMICS
Ecole Nationale des Ponts
et Chaussées
6 & 8 avenue Blaise Pascal
Cité Descartes
77455 Marne-La-Vallée Cedex 2
France
cances@cermics.enpc.fr
Andrea Bonito
Department of Mathematics
University of Maryland
College Park, MD 20742-4015
USA
andrea.bonito@epfl.ch
Edward J. Dean
University of Houston
Department of Mathematics
4800 Calhoun
Houston, TX 77004
USA
dean@math.uh.edu
Jean-Charles Gilbert
INRIA
Domaine de Voluceau-Roquencourt
BP 105
FR-78153 Le Chesnay Cedex
France
Jean-Charles.Gilbert@inria.fr
Vivette Girault
Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
Case 187, 4 Place Jussieu
FR-75252 Paris Cedex 05
France
girault@ann.jussieu.fr
Roland Glowinski
University of Houston
Department of Mathematics
4800 Calhoun
Houston, TX 77004
USA
roland@math.uh.edu
William W. Hager
Department of Mathematics
University of Florida
Gainesville, FL 32611-8105
USA
hager@math.ufl.edu
Jian Hao
Department of Mathematics
University of Houston
Houston, TX 77204-3008
USA
jianh@math.uh.edu
Oubay Hassan
Civil and Computational
Engineering Centre
University of Wales-Swansea
Swansea SA2 8PP
Wales
UK
O.Hassan@swansea.ac.uk
Samuli Ikonen
Nordea Markets
FI-00020 Nordea
Finland
Samuli.Ikonen@nordea.com
Pierre-Marie Jacquart
Dassault Aviation
78, Quai Marcel Dassault
Cedex 300, Saint-Cloud 92552
France
pierre-marie.jacquart@dassaultaviation.fr
Patrick Joly
INRIA
Domaine de Voluceau-Roquencourt
BP 105
FR-78153 Le Chesnay Cedex
France
Patrick.Joly@inria.fr
Yuri Kuznetsov
University of Houston
Department of Mathematics
4800 Calhoun
Houston, TX 77004
USA
kuz@math.uh.edu
Alexander Lapin
Kazan State University
Department of Computational
Mathematics and Cybernetics
18 Kremlyovskaya St.
Kazan 420008
Russia
alapin@ksu.ru
Serguei Lapin
University of Houston
Department of Mathematics
4800 Calhoun Rd
Houston, TX 77204
USA
slapin@math.uh.edu
Claude Le Bris
CERMICS
6&8 Avenue Blaise Pascal
Cité Descartes
FR-77455 Marne-la-Vallée Cedex 02
France
lebris@cermics.enpc.fr
Bertrand Maury
Laboratoire de Mathématiques
Université Paris-Sud
FR-91405 Orsay Cedex
France
Bertrand.Maury@math.u-psud.fr
Bijan Mohammadi
Mathematics and Modeling Institute
Université de Montpellier II
CC 51
FR-34095 Montpellier
France
Bijan.Mohammadi@math.univ-montp2.fr
Ken Morgan
Civil and Computational
Engineering Centre
University of Wales-Swansea
Swansea SA2 8PP
Wales
UK
K.Morgan@swansea.ac.uk
Pekka Neittaanmäki
University of Jyväskylä
Department of Mathematical
Information Technology
P.O. Box 35 (Agora)
FI-40014, Jyväskylä
Finland
pn@mit.jyu.fi
Tsorng-Whay Pan
Department of Mathematics
University of Houston
Houston, TX 77204-3008
USA
pan@math.uh.edu
Jacques Périaux
University of Jyväskylä
Department of Mathematical
Information Technology
P.O. Box 35
FI-40014 University of Jyväskylä
Finland
jperiaux@free.fr
Marco Picasso
Institute of Analysis &
Scientific Computing
Ecole Polytechnique
Fédérale de Lausanne
1015 Lausanne
Switzerland
marco.picasso@epfl.ch
Jacques Rappaz
Institut d’Analyse et Calcul
Scientifique
Bat. de mathématiques, Station 8
Ecole Polytechnique Fédérale de
Lausanne
CH-1015 Lausanne
Switzerland
jacques.rappaz@epfl.ch
Doreen Rosenstrauch
The Texas Heart Institute & The
University of Texas Health Science
Center at Houston
Houston, TX 77030
USA
Doreen.Rosenstrauch@uth.tmc.edu
Richard Sanders
University of Houston
Department of Mathematics
4800 Calhoun
Houston, TX 77004
USA
sanders@math.uh.edu
Igor Sazonov
Civil and Computational
Engineering Centre
University of Wales-Swansea
Swansea SA2 8PP
Wales
UK
i.sazonov@swansea.ac.uk
Danny C. Sorensen
Rice University
Department of Computational
& Applied Mathematics
Houston, TX, 77251-1892
USA
sorensen@rice.edu
Allen M. Tesdall
Fields Institute
Toronto, ON M5T 3J1
and
Department of
Mathematics
University of Houston
Houston, TX 77204
USA
atesdall@fields.utoronto.ca
Dan Tiba
Romanian Academy
Institute of Mathematics
P.O. Box 1-764
RO-014700 Bucharest
Romania
dan.tiba@imar.ro
Jari Toivanen
Department of Mathematical
Information Technology
P.O. Box 35 (Agora)
FI-40014 University of Jyväskylä
Finland
Jari.Toivanen@mit.jyu.fi
Nigel P. Weatherill
Civil and Computational
Engineering Centre
University of Wales-Swansea
Swansea SA2 8PP
Wales
UK
N.P.Weatherill@swansea.ac.uk
Mary F. Wheeler
Institute for Computational
Engineering & Sciences (ICES)
University of Texas at Austin
Austin, TX 78712
USA
mfw@ices.utexas.edu
Discontinuous Galerkin Methods
Vivette Girault¹ and Mary F. Wheeler²
¹ Laboratoire Jacques-Louis Lions, Université Pierre et Marie Curie, Paris VI,
FR-75252 Paris cedex 05, France, girault@ann.jussieu.fr
² Institute for Computational Engineering and Sciences (ICES),
University of Texas at Austin, Austin, TX 78712, USA, mfw@ices.utexas.edu
Summary. In this article, we describe some simple and commonly used discontinuous Galerkin methods for elliptic, Stokes and convection-diffusion problems. We
illustrate these methods by numerical experiments.
1 Introduction and Preliminaries
Discontinuous Galerkin (DG) methods use discontinuous piecewise polynomial spaces to approximate the solution of PDEs in variational form. The
concept of discontinuous space approximations was introduced in the early
1970s, probably starting with the work of Nitsche [Nit71] in 1971 on domain
decomposition. It was followed by a number of important contributions, such
as the work of Babuška and Zlámal [BZ73], Crouzeix and Raviart [CR73],
Rachford and Wheeler [RW74], Oden and Wellford [OW75], Douglas and
Dupont [DD76], Baker [Bak77], Wheeler [Whe78], Arnold [Arn79, Arn82] and
Wheeler and Darlow [WD80]. Afterward, interest in DG methods for elliptic
problems declined, probably because the computing facilities of the time were not
sufficient to solve such schemes efficiently. By the end of the 1990s, the thesis of
Baumann [Bau97] and the spectacular increase in computing power had triggered
a renewal of interest in discontinuous Galerkin methods for elliptic and parabolic problems. The work of Baumann was followed by numerous publications,
such as Oden, Babuška and Baumann [OBB98], Baumann and Oden [BO99],
Rivière et al. [RWG99, RWG01], Rivière [Riv00], Arnold et al. [ABCM02],
among many others. Research on DG methods is now a very active field.
In the meantime, discontinuous methods were applied extensively to hyperbolic problems [Bey94, BOP96]. One of the first such methods was the upwind scheme
introduced by Reed and Hill in their 1973 report [RH73] on neutron transport.
The first numerical analysis was done by Lesaint and Raviart [LR74] in
1974 for the transport equation and by Girault and Raviart [GR79] in 1982 for
the Navier–Stokes equations. We refer to the books by Pironneau [Pir89] and
by Girault and Raviart [GR86] for a thorough study of this upwind scheme.
DG methods have many advantages over continuous methods. Because their functions are discontinuous, they allow non-conforming grids and polynomials of
different degrees on adjacent elements. They are locally mass conservative on each element. In time-dependent problems, their mass matrix is block
diagonal. They are particularly well adapted to problems with discontinuous
coefficients and can effectively capture discontinuities in the solution. They
can impose essential boundary conditions weakly, without the use of a multiplier. For the same reason, they can be applied to domain decomposition without introducing
multipliers. They can be applied to incompressible elasticity problems. They
can easily be coupled with continuous methods.
On the negative side, they are expensive because they require many degrees of freedom. For this reason, efficient solvers for DG discretizations of
elliptic or parabolic problems are still an object of research.
In this article, we present a survey of some simple DG methods for elliptic, flow and transport problems. We concentrate essentially on IIPG, SIPG
and NIPG (the incomplete, symmetric and nonsymmetric interior penalty Galerkin methods), on OBB-DG (the method of Oden, Babuška and Baumann) and on the upwind DG method of Lesaint and Raviart. Space does not permit us
to present all DG methods. We have therefore left out more sophisticated schemes, such as the Local Discontinuous Galerkin (LDG) methods, for
which we refer to Arnold et al. [ABCM02].
This article is organized as follows. In Section 2, we derive the equations
on which a number of DG methods are based when applied to simple model
problems. Section 3 is devoted to the approximation of a Darcy flow. In
Section 4, we describe some DG methods for an incompressible Stokes flow.
A convection-diffusion equation is approximated in Section 5. Section 6 is devoted to numerical experiments performed at the Institute for Computational
Engineering and Sciences, UT Austin.
In the sequel, we shall use the following functional notation. Let $\Omega$ be a
domain in $\mathbb{R}^d$, where $d$ is the dimension. For an integer $m \ge 1$, $H^m(\Omega)$ denotes
the Sobolev space defined recursively by
$$ H^m(\Omega) = \{ v \in H^{m-1}(\Omega);\ \nabla v \in H^{m-1}(\Omega)^d \}, $$
and we set
$$ H^0(\Omega) = L^2(\Omega), $$
equipped with the norm
$$ \|v\|_{L^2(\Omega)} = \Big( \int_\Omega |v|^2 \, dx \Big)^{1/2}. $$
For fluid pressure and other variables defined up to an additive constant, it
is useful in theory to fix the constant by imposing a zero mean value. We
therefore use the space
$$ L^2_0(\Omega) = \Big\{ v \in L^2(\Omega);\ \int_\Omega v \, dx = 0 \Big\}. $$
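The following Python sketch shows one way the additive constant can be fixed. It is purely illustrative: it works in one dimension, and the helper names are hypothetical. It approximates the $L^2(\Omega)$ norm on an interval by the composite trapezoid rule and maps a sampled function into $L^2_0(\Omega)$ by subtracting its mean value.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for the integral of the samples y over the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def l2_norm(v, x):
    """Approximation of ||v||_{L^2(a, b)} = (int |v|^2 dx)^(1/2) from samples."""
    return np.sqrt(trapezoid(np.abs(v) ** 2, x))

def project_to_l2_zero(v, x):
    """Subtract the (discrete) mean value, so the result has zero integral."""
    mean = trapezoid(v, x) / (x[-1] - x[0])
    return v - mean

x = np.linspace(0.0, 1.0, 1001)
p = x**2 + 3.0                        # a "pressure" known only up to a constant
p0 = project_to_l2_zero(p, x)
print(trapezoid(p0, x))               # zero up to rounding
print(l2_norm(np.sin(np.pi * x), x))  # about 1/sqrt(2)
```

The discrete mean is computed with the same quadrature rule that measures the integral, so the projected samples have exactly zero discrete integral, up to rounding.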
2 An Elementary Derivation of Some Simple DG Methods
In this section, we use very simple examples to derive the equations that
underlie the IIPG, SIPG, NIPG and OBB-DG methods and the upwind DG
method of Lesaint–Raviart. In each example, we work out the equations on
a plane domain $\Omega$, with boundary $\partial\Omega$, partitioned into two non-overlapping
subdomains $\Omega_1$ and $\Omega_2$ with interface $\Gamma_{12}$. To fix ideas, we assume that
each subdomain has part of its boundary on $\partial\Omega$.
2.1 The General Idea for Elliptic Problems
Consider the Laplace equation with a homogeneous Dirichlet boundary condition in $\Omega$ and with data $f$ in $L^2(\Omega)$:
$$ -\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega. \tag{1} $$
Let $v$ be a test function that is sufficiently smooth in each $\Omega_i$ but does not
necessarily belong to $H^1(\Omega)$. If we multiply both sides of the first equation in
(1) by $v$, apply Green's formula in each $\Omega_i$, and assume that the solution $u$ is
smooth enough, we obtain:
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_i} (\nabla u \cdot n_i)|_{\Omega_i}\, v|_{\Omega_i} \, d\sigma \Big) = \int_\Omega f v \, dx, \tag{2} $$
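where $n_i$ denotes the unit normal to $\partial\Omega_i$, exterior to $\Omega_i$. If $u$ is sufficiently
smooth, then the traces of $\nabla u \cdot n_i$ on $\Gamma_{12}$ coming from $\Omega_1$ and from $\Omega_2$ have the same absolute
value but opposite signs.
As the change in sign comes from the normal vector, we fix the orientation of the normal on $\Gamma_{12}$ once and for
all; for example, we choose the orientation of
$n_1$. We set $n_e = n_1$, denote by $n_\Omega$ the exterior normal to $\partial\Omega$,
and denote by $[v]_e$ and $\{v\}_e$ the jump and average of the trace of $v$ across $\Gamma_{12}$:
$$ [v]_e = v|_{\Omega_1} - v|_{\Omega_2}, \qquad \{v\}_e = \tfrac12 \big( v|_{\Omega_1} + v|_{\Omega_2} \big). $$
We then use the identity
$$ \forall a_1, a_2, b_1, b_2 \in \mathbb{R}, \quad a_1 b_1 - a_2 b_2 = \tfrac12 \big[ (a_1 + a_2)(b_1 - b_2) + (a_1 - a_2)(b_1 + b_2) \big] $$
with $a_i = (\nabla u \cdot n_e)|_{\Omega_i}$ and $b_i = v|_{\Omega_i}$. Since $[\nabla u \cdot n_e]_e = 0$, (2) becomes
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} (\nabla u \cdot n_\Omega)\, v \, d\sigma \Big) - \int_{\Gamma_{12}} \{\nabla u \cdot n_e\}_e [v]_e \, d\sigma = \int_\Omega f v \, dx. \tag{3} $$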
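The identity used above is a product rule for traces. Writing $a_i$ and $b_i$ for the traces on $\Omega_i$, it reads $[ab]_e = \{a\}_e [b]_e + [a]_e \{b\}_e$. The short Python check below verifies it on random values; the helper names `jump` and `avg` are hypothetical. When $a_i = (\nabla u \cdot n_e)|_{\Omega_i}$ has no jump, only the first term remains, and that term is exactly the interface integrand of (3).

```python
import random

def jump(t1, t2):
    """[w]_e = w|Omega_1 - w|Omega_2 for the two traces of w on Gamma_12."""
    return t1 - t2

def avg(t1, t2):
    """{w}_e = (w|Omega_1 + w|Omega_2) / 2."""
    return 0.5 * (t1 + t2)

random.seed(0)
for _ in range(1000):
    a1, a2, b1, b2 = (random.uniform(-10.0, 10.0) for _ in range(4))
    lhs = a1 * b1 - a2 * b2                      # jump of the product ab
    rhs = avg(a1, a2) * jump(b1, b2) + jump(a1, a2) * avg(b1, b2)
    assert abs(lhs - rhs) < 1e-9
# If a1 == a2 (no jump of the normal derivative), lhs reduces to {a}[b].
```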
The discontinuous Galerkin method called IIPG is based on (3). It uses the
regularity of the normal derivative of $u$. If, in addition, we want to use the
regularity of $u$ and its zero boundary value, then we can add or subtract
the following terms to the left-hand side of (3):
$$ \int_{\partial\Omega_i \setminus \Gamma_{12}} (\nabla v \cdot n_\Omega)\, u \, d\sigma, \ i = 1, 2, \qquad \int_{\Gamma_{12}} \{\nabla v \cdot n_e\}_e [u]_e \, d\sigma. $$
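Since these terms are zero, the resulting equation is equivalent to (3). The
discontinuous Galerkin method called SIPG is based on subtracting these
terms:
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} \big( (\nabla u \cdot n_\Omega) v + (\nabla v \cdot n_\Omega) u \big) d\sigma \Big) - \int_{\Gamma_{12}} \big( \{\nabla u \cdot n_e\}_e [v]_e + \{\nabla v \cdot n_e\}_e [u]_e \big) d\sigma = \int_\Omega f v \, dx, \tag{4} $$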
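and the discontinuous Galerkin methods called NIPG and OBB-DG are based
on adding these terms:
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} \big( (\nabla u \cdot n_\Omega) v - (\nabla v \cdot n_\Omega) u \big) d\sigma \Big) - \int_{\Gamma_{12}} \big( \{\nabla u \cdot n_e\}_e [v]_e - \{\nabla v \cdot n_e\}_e [u]_e \big) d\sigma = \int_\Omega f v \, dx. \tag{5} $$
In fact, the OBB-DG formulation is precisely (5).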
Clearly, the contribution of the surface integrals to the left-hand side of
(5) is antisymmetric. When $v = u$ these integrals cancel, and the left-hand side of (5) reduces to $\sum_{i=1}^{2} \|\nabla u\|^2_{L^2(\Omega_i)}$, which is non-negative.
The left-hand side of (4) is symmetric, but there is no reason why it
should be non-negative. The left-hand side of (3) has neither symmetry
nor positivity. The left-hand side of (5) can be made positive when $v = u$ by
adding to it the jump terms
$$ \frac{1}{|\Gamma_{12}|} \int_{\Gamma_{12}} [u]_e [v]_e \, d\sigma + \sum_{i=1}^{2} \frac{1}{|\partial\Omega_i \setminus \Gamma_{12}|} \int_{\partial\Omega_i \setminus \Gamma_{12}} u v \, d\sigma, $$
where for any set $S$, $|S|$ denotes the measure of $S$. But, of course, this will
not suffice for (3) and (4). However, since all these formulations will
be applied to functions in finite-dimensional spaces, we expect to make (3)
and (4) positive by incorporating adequate parameters into the jump terms.
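Thus we add
$$ J_0(u, v) = \frac{\sigma_{12}}{|\Gamma_{12}|} \int_{\Gamma_{12}} [u]_e [v]_e \, d\sigma + \sum_{i=1}^{2} \frac{\sigma_i}{|\partial\Omega_i \setminus \Gamma_{12}|} \int_{\partial\Omega_i \setminus \Gamma_{12}} u v \, d\sigma, \tag{6} $$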
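[Figure: two adjacent elements E1 and E2 sharing the edge γa, with unit normal na]
Fig. 1. Jumps and averages: the jump on an interior edge is given by $[v] = v|_{E_1} - v|_{E_2}$
and on a boundary edge by $[v] = v|_{E_1}$; the averages are respectively given by $\{v\} = \tfrac12 (v|_{E_1} + v|_{E_2})$ and $\{v\} = v|_{E_1}$. The unit normal to $\gamma_a$ is $n_a$.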
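where $\sigma_{12}$ and $\sigma_i$ are suitable non-negative parameters. Summing up, the
IIPG, SIPG, NIPG and OBB-DG formulations read:
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} \big( (\nabla u \cdot n_\Omega) v + \varepsilon (\nabla v \cdot n_\Omega) u \big) d\sigma \Big) - \int_{\Gamma_{12}} \big( \{\nabla u \cdot n_e\}_e [v]_e + \varepsilon \{\nabla v \cdot n_e\}_e [u]_e \big) d\sigma + J_0(u, v) = \int_\Omega f v \, dx, \tag{7} $$
with $\varepsilon = 0$ for IIPG, $\varepsilon = 1$ for SIPG and $\varepsilon = -1$ for NIPG and OBB-DG;
$\sigma_i = \sigma_{12} = 1$ for NIPG, $\sigma_i = \sigma_{12} = 0$ for OBB-DG, and $\sigma_i$ and $\sigma_{12}$ are well-chosen
positive parameters for IIPG and SIPG. An example of jumps and
averages for a non-conforming mesh is shown in Figure 1.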
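The following Python/NumPy sketch applies the IIPG, SIPG and NIPG variants of formulation (7) to a one-dimensional analogue of (1): $-u'' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$. The setting is an assumption of this sketch, not of the text:

- the elements are uniform, and the functions are discontinuous and piecewise linear;
- the mesh nodes play the role of $\Gamma_{12}$ and of the boundary pieces;
- the penalty is $\sigma/h$, with $h$ the element length. This follows the spirit of (29) below, since a point has no diameter.

The names `solve_dg_1d` and `l2_error`, and the values $\sigma = 10$ (SIPG, IIPG) and $\sigma = 1$ (NIPG), are illustrative choices. OBB-DG ($\sigma = 0$) is left out, because its stability is usually established only for polynomial degree at least 2.

```python
import numpy as np

def solve_dg_1d(n_el, eps, sigma, f):
    """Interior-penalty DG with discontinuous P1 elements for -u'' = f on (0, 1),
    u(0) = u(1) = 0.  eps = 1 (SIPG), 0 (IIPG) or -1 (NIPG); the penalty is sigma / h.
    Unknowns: the two end values of u_h on each element, 2 * n_el in total."""
    h = 1.0 / n_el
    x = np.linspace(0.0, 1.0, n_el + 1)
    ndof = 2 * n_el
    A = np.zeros((ndof, ndof))
    b = np.zeros(ndof)
    qp, qw = np.polynomial.legendre.leggauss(3)        # Gauss rule on [-1, 1]

    for e in range(n_el):
        dofs = [2 * e, 2 * e + 1]
        # volume term: integral over the element of u_h' v_h'
        A[np.ix_(dofs, dofs)] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        # load term: integral over the element of f v_h (mapped Gauss rule)
        xq = x[e] + (qp + 1.0) * h / 2.0
        b[dofs[0]] += np.sum(qw * f(xq) * (x[e + 1] - xq) / h) * h / 2.0
        b[dofs[1]] += np.sum(qw * f(xq) * (xq - x[e]) / h) * h / 2.0

    for k in range(n_el + 1):
        jump = np.zeros(ndof)   # coefficients of [w]_e at node x_k
        dn = np.zeros(ndof)     # coefficients of {w' n_e}_e at node x_k
        if k == 0:              # boundary node, n_e = n_Omega = -1
            jump[0] = 1.0
            dn[[0, 1]] = np.array([1.0, -1.0]) / h
        elif k == n_el:         # boundary node, n_e = n_Omega = +1
            jump[ndof - 1] = 1.0
            dn[[ndof - 2, ndof - 1]] = np.array([-1.0, 1.0]) / h
        else:                   # interior node, n_e = +1 (from element k-1 to element k)
            jump[2 * k - 1], jump[2 * k] = 1.0, -1.0
            dn[[2 * k - 2, 2 * k - 1]] += np.array([-0.5, 0.5]) / h
            dn[[2 * k, 2 * k + 1]] += np.array([-0.5, 0.5]) / h
        # A[i, j]: row i <-> test function v, column j <-> trial function u
        A -= np.outer(jump, dn)                   # - {u' n_e}_e [v]_e
        A -= eps * np.outer(dn, jump)             # - eps {v' n_e}_e [u]_e
        A += (sigma / h) * np.outer(jump, jump)   # + (sigma / h) [u]_e [v]_e
    return x, np.linalg.solve(A, b)

def l2_error(x, uh, u_exact):
    """L2(0, 1) norm of u_h - u, by Gauss quadrature on each element."""
    qp, qw = np.polynomial.legendre.leggauss(4)
    err2 = 0.0
    for e in range(len(x) - 1):
        h = x[e + 1] - x[e]
        xq = x[e] + (qp + 1.0) * h / 2.0
        uh_q = (uh[2 * e] * (x[e + 1] - xq) + uh[2 * e + 1] * (xq - x[e])) / h
        err2 += np.sum(qw * (uh_q - u_exact(xq)) ** 2) * h / 2.0
    return np.sqrt(err2)

f = lambda s: np.pi**2 * np.sin(np.pi * s)   # chosen so that u(s) = sin(pi s)
u = lambda s: np.sin(np.pi * s)
for name, eps, sigma in [("SIPG", 1, 10.0), ("IIPG", 0, 10.0), ("NIPG", -1, 1.0)]:
    errs = [l2_error(*solve_dg_1d(n, eps, sigma, f), u) for n in (16, 32)]
    print(name, errs)
```

Only the sign parameter `eps` changes between the three variants, just as $\varepsilon$ does in (7). With these choices, the SIPG error should drop by roughly a factor of four when $h$ is halved.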
Remark 1. The NIPG and OBB-DG formulations differ only in the presence
or absence of jump terms. In several cases, such as in Section 3, the jump terms turn out to be unnecessary, but they can be added to enhance
convergence. However, there are cases, such as in Section 4, where OBB-DG
seems to be sub-optimal without jumps.
Remark 2. As the normal derivative of the solution has no jumps, it is also
possible to add jumps involving this normal derivative (cf. [Dar80, WD80]):
$$ |\Gamma_{12}| \int_{\Gamma_{12}} [\nabla u \cdot n]_e [\nabla v \cdot n]_e \, d\sigma. $$
The resulting equation is still equivalent to (3).
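Finally, let us examine a Laplace equation with mixed non-homogeneous
Dirichlet–Neumann boundary conditions. As an example, we replace (1) by
$$ -\Delta u = f \ \text{in } \Omega, \qquad u = g_1 \ \text{on } \partial\Omega_1 \setminus \Gamma_{12}, \qquad \nabla u \cdot n_\Omega = g_2 \ \text{on } \partial\Omega_2 \setminus \Gamma_{12}. \tag{8} $$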
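In this case, we suppress from $J_0$ the boundary term on $\partial\Omega_2 \setminus \Gamma_{12}$:
$$ J_0(u, v) = \frac{\sigma_{12}}{|\Gamma_{12}|} \int_{\Gamma_{12}} [u]_e [v]_e \, d\sigma + \frac{\sigma_1}{|\partial\Omega_1 \setminus \Gamma_{12}|} \int_{\partial\Omega_1 \setminus \Gamma_{12}} u v \, d\sigma, \tag{9} $$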
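and the IIPG, SIPG, NIPG and OBB-DG formulations become:
$$ \sum_{i=1}^{2} \int_{\Omega_i} \nabla u \cdot \nabla v \, dx - \int_{\partial\Omega_1 \setminus \Gamma_{12}} \big( (\nabla u \cdot n_\Omega) v + \varepsilon (\nabla v \cdot n_\Omega) u \big) d\sigma - \int_{\Gamma_{12}} \big( \{\nabla u \cdot n_e\}_e [v]_e + \varepsilon \{\nabla v \cdot n_e\}_e [u]_e \big) d\sigma + J_0(u, v) = \int_\Omega f v \, dx + \int_{\partial\Omega_2 \setminus \Gamma_{12}} g_2 v \, d\sigma - \varepsilon \int_{\partial\Omega_1 \setminus \Gamma_{12}} g_1 (\nabla v \cdot n_\Omega) \, d\sigma + \frac{\sigma_1}{|\partial\Omega_1 \setminus \Gamma_{12}|} \int_{\partial\Omega_1 \setminus \Gamma_{12}} g_1 v \, d\sigma, \tag{10} $$
with the same values of $\varepsilon$, $\sigma_1$ and $\sigma_{12}$ as in (7).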
2.2 The General Idea for the Stokes Problem
Consider the incompressible Stokes problem in $\Omega$ with data $f$ in $L^2(\Omega)^2$:
$$ -\mu \Delta u + \nabla p = f, \qquad \operatorname{div} u = 0 \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \tag{11} $$
where the viscosity parameter $\mu$ is a given positive constant. This is a typical problem with a linear constraint (the zero divergence) and a Lagrange
multiplier (the pressure $p$).
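To treat the pressure term and the divergence constraint, we again take
a test function $v$ that is not necessarily globally smooth but has smooth
components in each $\Omega_i$. Assuming that the pressure $p$ is sufficiently smooth,
we apply Green's formula in each $\Omega_i$:
$$ \int_\Omega (\nabla p) \cdot v \, dx = \sum_{i=1}^{2} \Big( - \int_{\Omega_i} p \operatorname{div} v \, dx + \int_{\partial\Omega_i \setminus \Gamma_{12}} p (v \cdot n_\Omega) \, d\sigma \Big) + \int_{\Gamma_{12}} \{p\}_e [v]_e \cdot n_e \, d\sigma. \tag{12} $$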
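We apply the same formula to the divergence constraint. Thus, combining (12)
with (7), we obtain the following IIPG, SIPG, NIPG and OBB-DG formulations
for the Stokes problem (11):
$$ \mu \sum_{i=1}^{2} \Big( \int_{\Omega_i} \nabla u : \nabla v \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} \big( (\nabla u \cdot n_\Omega) \cdot v + \varepsilon (\nabla v \cdot n_\Omega) \cdot u \big) d\sigma \Big) - \mu \int_{\Gamma_{12}} \big( \{\nabla u \cdot n_e\}_e \cdot [v]_e + \varepsilon \{\nabla v \cdot n_e\}_e \cdot [u]_e \big) d\sigma + \mu J_0(u, v) + \sum_{i=1}^{2} \Big( - \int_{\Omega_i} p \operatorname{div} v \, dx + \int_{\partial\Omega_i \setminus \Gamma_{12}} p (v \cdot n_\Omega) \, d\sigma \Big) + \int_{\Gamma_{12}} \{p\}_e [v]_e \cdot n_e \, d\sigma = \int_\Omega f \cdot v \, dx, \tag{13} $$
$$ \sum_{i=1}^{2} \Big( \int_{\Omega_i} q \operatorname{div} u \, dx - \int_{\partial\Omega_i \setminus \Gamma_{12}} q (u \cdot n_\Omega) \, d\sigma \Big) - \int_{\Gamma_{12}} \{q\}_e [u]_e \cdot n_e \, d\sigma = 0, \tag{14} $$
with the parameters $\varepsilon$ and $\sigma$ interpreted as in (7).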
2.3 Upwinding in a Transport Problem: General Idea
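Consider the simple transport problem in $\Omega$:
$$ c + u \cdot \nabla c = f \ \text{in } \Omega, \tag{15} $$
where $f$ belongs to $L^2(\Omega)$ and $u$ is a sufficiently smooth vector-valued function
that satisfies
$$ \operatorname{div} u = 0 \ \text{in } \Omega, \qquad u \cdot n_\Omega = 0 \ \text{on } \partial\Omega. \tag{16} $$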
Recall the notation
$$ u \cdot \nabla c = \sum_{i=1}^{2} u_i \frac{\partial c}{\partial x_i}, $$
and note that when the functions involved are sufficiently smooth, Green's
formula and (16) yield
$$ \int_\Omega (u \cdot \nabla c)\, c \, dx = 0. \tag{17} $$
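For the applications we have in mind, let us assume that $c$ is sufficiently
smooth in each $\Omega_i$ but is not necessarily in $H^1(\Omega)$. Then we must give a
meaning to the product $u \cdot \nabla c$. Since the divergence of $u$ is zero, we have the identity
$$ \operatorname{div}(c u) = c (\operatorname{div} u) + u \cdot \nabla c = u \cdot \nabla c. $$
From it we derive, for any smooth function $\varphi$ with compact support in $\Omega$,
$$ \langle u \cdot \nabla c, \varphi \rangle = \langle \operatorname{div}(c u), \varphi \rangle = -\langle c u, \nabla \varphi \rangle = -\int_\Omega (c u) \cdot \nabla \varphi \, dx = -\sum_{i=1}^{2} \int_{\Omega_i} (c u) \cdot \nabla \varphi \, dx. \tag{18} $$
We use the last equality to define $u \cdot \nabla c$ in the sense of distributions.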
Now, we wish to extend this definition to functions $u$ and $\varphi$ that are not
necessarily smooth. We again take a test function $v$ that is sufficiently
smooth in each $\Omega_i$ but may not be in $H^1(\Omega)$. Applying Green's formula to the
last equality in (18) in each $\Omega_i$ and using the fact that $u$ has zero divergence,
we define:
$$ \int_\Omega (u \cdot \nabla c)\, v \, dx := \sum_{i=1}^{2} \Big( \int_{\Omega_i} (u \cdot \nabla c)\, v \, dx - \int_{\partial\Omega_i} c\, (u \cdot n_i)\, v \, d\sigma \Big). \tag{19} $$
To introduce upwinding into this formula, we consider each $\Omega_i$ and
the portion of its boundary where the flow driven by $u$ enters $\Omega_i$, i.e., where
$\{u\} \cdot n_i < 0$. We set
$$ (\partial\Omega_i)^- = \{ x \in \partial\Omega_i;\ \{u\} \cdot n_i(x) < 0 \}. \tag{20} $$
Then we replace (19) by
$$ \int_\Omega (u \cdot \nabla c)\, v \, dx := \sum_{i=1}^{2} \Big( \int_{\Omega_i} (u \cdot \nabla c)\, v \, dx - \int_{(\partial\Omega_i)^-} \{u\} \cdot n_i \, (c^{\mathrm{int}} - c^{\mathrm{ext}})\, v^{\mathrm{int}} \, d\sigma \Big), \tag{21} $$
where the superscript int (resp. ext) refers to the interior (resp. exterior) trace
of the function in $\Omega_i$. On the part of $(\partial\Omega_i)^-$ that lies on $\partial\Omega$, $c^{\mathrm{ext}} = 0$ and
$\{u\} = u$. This is a straightforward extension of the Lesaint–Raviart upwind
scheme.
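Below is a minimal sketch of the upwind idea in (21). It rests on simplifying assumptions that are not in the text:

- one space dimension, with $\Omega = (0, 1)$ split into uniform elements;
- a constant velocity $u = a > 0$, so $\operatorname{div} u = 0$. Unlike (16), the flow enters through $x = 0$, where $c^{\mathrm{ext}} = 0$ as stated above;
- piecewise-constant $c$ and $v$.

With $v = 1$ on element $E_i$, the volume term $\int (u \cdot \nabla c) v$ vanishes and only the inflow term survives. The elements can therefore be solved one after the other in the flow direction. The function name `upwind_dg0` is hypothetical.

```python
import numpy as np

def upwind_dg0(n_el, a, f):
    """Piecewise-constant upwind DG for c + a c' = f on (0, 1) with a > 0.

    On element E_i the inflow boundary is its left end point, where {u}.n_i = -a < 0,
    so (21) with v = 1 on E_i gives  h c_i + a (c_i - c_ext) = int_{E_i} f dx.
    At x = 0 (the inflow part of the domain boundary) c_ext = 0.
    """
    h = 1.0 / n_el
    xm = (np.arange(n_el) + 0.5) * h     # element midpoints
    c = np.zeros(n_el)
    c_ext = 0.0
    for i in range(n_el):                 # sweep in the flow direction
        load = h * f(xm[i])               # midpoint rule for int_{E_i} f dx
        c[i] = (load + a * c_ext) / (h + a)
        c_ext = c[i]                      # exterior (upwind) trace for the next element
    return xm, c

xm, c = upwind_dg0(1000, 1.0, lambda s: 1.0)
print(c[-1], 1.0 - np.exp(-1.0))          # computed value in the last element vs exact c(1)
```

For $f = 1$ and $a = 1$, the exact solution is $c(x) = 1 - e^{-x}$. The recursion gives $1 - (1 + h)^{-n}$ in the last of the $n$ elements, which tends to $1 - e^{-1}$ as $h \to 0$.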
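Finally, we wish to extend (21) to the case where $u$ satisfies (14) instead
of (16), while preserving some property analogous to (17). Keep in mind
the identity
$$ \int_\Omega (u \cdot \nabla c)\, c \, dx + \frac12 \int_\Omega (\operatorname{div} u)\, c^2 \, dx - \frac12 \int_{\partial\Omega} (u \cdot n)\, c^2 \, d\sigma = 0, \tag{22} $$
which follows from the divergence theorem applied to $c^2 u$ when $c$ and $u$ are sufficiently smooth. We then replace (21) by:
$$ \int_\Omega (u \cdot \nabla c)\, v \, dx := \sum_{i=1}^{2} \Big( \int_{\Omega_i} \big( u \cdot \nabla c + \tfrac12 (\operatorname{div} u)\, c \big) v \, dx - \frac12 \int_{\partial\Omega_i \setminus \Gamma_{12}} (u \cdot n_\Omega)\, c\, v \, d\sigma - \int_{(\partial\Omega_i)^-} \{u\} \cdot n_i \, (c^{\mathrm{int}} - c^{\mathrm{ext}})\, v^{\mathrm{int}} \, d\sigma \Big) - \frac12 \int_{\Gamma_{12}} [u]_e \cdot n_e \, \{c v\}_e \, d\sigma. \tag{23} $$
This is the upwind formulation proposed and analyzed by Rivière et al.
[GRW05].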
3 DG Approximation of an Elliptic Problem
Let $\Omega$ be a polygon in dimension $d = 2$ or a Lipschitz polyhedron in dimension
$d = 3$. Its boundary $\partial\Omega$ is partitioned into two disjoint parts, $\partial\Omega = \Gamma_D \cup \Gamma_N$,
which have polygonal boundaries if $d = 3$. For simplicity, we assume that $|\Gamma_D|$ is
positive. Consider the continuity equation for Darcy flow in pressure form
in $\Omega$:
$$ -\operatorname{div}(K \nabla p) = f \ \text{in } \Omega, \tag{24} $$
$$ p = g_1 \ \text{on } \Gamma_D, \tag{25} $$
$$ K \nabla p \cdot n_\Omega = g_2 \ \text{on } \Gamma_N, \tag{26} $$
where $n_\Omega$ is the unit normal vector to $\partial\Omega$, exterior to $\Omega$, and the permeability
$K$ is a uniformly bounded, positive definite symmetric tensor that is allowed
to vary in space. For $f \in L^2(\Omega)$, $g_1 \in H^{1/2}(\Gamma_D)$ and $g_2 \in L^2(\Gamma_N)$, system
(24)–(26) has a unique solution $p \in H^1(\Omega)$. We assume that $p$ is sufficiently
regular to guarantee the consistency of the schemes below.
Let $\mathcal{E}_h$ be a regular family of triangulations of $\Omega$ consisting of triangles (or
tetrahedra if $d = 3$) $E$ of maximum diameter $h$, such that no face or side
of $\partial E$ intersects both $\Gamma_D$ and $\Gamma_N$. It is regular in the sense of Ciarlet [Cia91]:
there exists a constant $\gamma > 0$, independent of $h$, such that
$$ \forall E \in \mathcal{E}_h, \quad \frac{h_E}{\varrho_E} = \gamma_E \le \gamma, \tag{27} $$
where $h_E$ denotes the diameter of $E$ (bounded above by $h$) and $\varrho_E$ denotes
the diameter of the ball inscribed in $E$.
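Condition (27) is easy to monitor in practice. The Python sketch below computes $\gamma_E = h_E/\varrho_E$ for a triangle, using the fact that the inscribed circle has diameter $4 \cdot \text{area}/\text{perimeter}$. It then returns the largest ratio over a mesh. It covers $d = 2$ only, and the helper names are hypothetical. A family of triangulations is regular in the sense of (27) when this value stays bounded as $h \to 0$.

```python
import numpy as np

def shape_ratio(p0, p1, p2):
    """gamma_E = h_E / rho_E for a triangle: h_E is its diameter (longest edge) and
    rho_E the diameter of its inscribed circle, i.e. 4 * area / perimeter."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    edges = [np.linalg.norm(p1 - p0), np.linalg.norm(p2 - p1), np.linalg.norm(p0 - p2)]
    d1, d2 = p1 - p0, p2 - p0
    area = 0.5 * abs(d1[0] * d2[1] - d1[1] * d2[0])
    rho = 4.0 * area / sum(edges)   # inradius = area / semi-perimeter, rho = 2 * inradius
    return max(edges) / rho

def mesh_gamma(points, triangles):
    """Largest gamma_E over a triangulation given as a vertex list and index triples."""
    points = np.asarray(points, dtype=float)
    return max(shape_ratio(*points[list(t)]) for t in triangles)

# Unit square cut along a diagonal: two right isosceles triangles
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(mesh_gamma(pts, [(0, 1, 2), (0, 2, 3)]))   # 1 + sqrt(2)
```

The ratio is $\sqrt{3}$ for an equilateral triangle and $1 + \sqrt{2}$ for the right isosceles triangles of the example. It grows without bound as a triangle degenerates into a sliver.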
To simplify the discussion, we assume that Eh is conforming, but most
results in this section remain valid for non-conforming grids as well as for
quadrilateral (or hexahedral if d = 3) grids. We denote by Γh the set of all
interior edges (or faces if d = 3) of Eh and by Γh,D (resp. Γh,N ) the set of
all edges or faces of Eh that lie on ΓD (resp. ΓN ). The elements E of Eh are
numbered and denoted by Ei , say for 1 ≤ i ≤ Ph . With any edge or face e
of Γh shared by Ei and Ej with i < j, we associate once and for all the unit
normal vector ne directed from Ei to Ej and we define the jump [ϕ]e and
average {ϕ}e of a function ϕ by:
$$[\varphi]_e=\varphi|_{E_i}-\varphi|_{E_j},\qquad\{\varphi\}_e=\frac12\big(\varphi|_{E_i}+\varphi|_{E_j}\big).$$
If e ⊂ ∂Ω, then ne = nΩ and the jump and average of ϕ coincide with the
trace of ϕ.
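The orientation convention above can be encoded in a few lines. In this hypothetical helper (our sketch, not the authors' code), the traces are passed together with the element numbers, and the pair is reordered so that the jump is always taken from the lower-numbered element, as n_e is; visiting the two neighbours in either order therefore gives the same values.

```python
def jump_and_average(i, phi_i, j=None, phi_j=None):
    """Jump [phi]_e and average {phi}_e on a face e.

    phi_i, phi_j are the traces of phi on e from elements E_i and E_j.
    n_e points from the lower-numbered element to the higher-numbered one,
    so the jump is always (lower index trace) - (higher index trace).
    On a boundary face (j is None) jump and average both equal the trace.
    """
    if j is None:
        return phi_i, phi_i
    if i > j:  # put the lower-numbered element first
        phi_i, phi_j = phi_j, phi_i
    return phi_i - phi_j, 0.5 * (phi_i + phi_j)
```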
Considering the differential operator in (24), we define the “discontinuous”
space:
$$H^1(\mathcal E_h)=\{v\in L^2(\Omega);\ \forall E\in\mathcal E_h,\ v|_E\in H^1(E)\},$$
equipped with the “broken” semi-norm
V. Girault and M.F. Wheeler
$$|||K^{1/2}\nabla v|||_{L^2(\mathcal E_h)}=\Big(\sum_{E\in\mathcal E_h}\|K^{1/2}\nabla v\|^2_{L^2(E)}\Big)^{1/2}, \tag{28}$$
and norm (for which it is a Hilbert space)
$$|||v|||_{H^1(\mathcal E_h)}=\Big(\|v\|^2_{L^2(\Omega)}+|||K^{1/2}\nabla v|||^2_{L^2(\mathcal E_h)}\Big)^{1/2}.$$
In view of (9), we define the jump bilinear form
$$J_0(u,v)=\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\frac{\sigma_e}{h_e}\int_e[u]_e\,[v]_e\,d\sigma, \tag{29}$$
where he denotes the diameter of e, and each σe is a suitable non-negative
parameter. It is convenient to define also the mesh-dependent semi-norm
$$[|v|]_{H^1(\mathcal E_h)}=\Big(|||K^{1/2}\nabla v|||^2_{L^2(\mathcal E_h)}+J_0(v,v)\Big)^{1/2}. \tag{30}$$
Now, we choose an integer k ≥ 1 and we discretize H 1 (Eh ) with the finite
element space
$$X_h=\{v\in L^2(\Omega):\ \forall E\in\mathcal E_h,\ v|_E\in P_k(E)\}. \tag{31}$$
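Because Xh imposes no continuity between elements, its dimension is simply Ph times dim Pk(E) = C(k + d, d). The hypothetical helpers below make this count explicit; for comparison, a continuous piecewise-linear space on the same mesh has one unknown per vertex (before boundary conditions are imposed).

```python
from math import comb

def dim_pk(k, d):
    """Dimension of P_k(E): polynomials of total degree <= k in d variables."""
    return comb(k + d, d)

def dim_xh(num_elements, k, d):
    """Dimension of the fully discontinuous space X_h in (31): no continuity
    constraints couple the elements, so the local dimensions simply add up."""
    return num_elements * dim_pk(k, d)
```

For example, 100 triangles with k = 1 give dim_xh(100, 1, 2) = 300 unknowns.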
It is possible to let k vary from one element to the next, but for simplicity we
keep the same k. Then, keeping in mind (10), we discretize (24)–(26) by the
following discrete system: Find ph ∈ Xh such that for all qh ∈ Xh ,
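$$
\begin{aligned}
&\sum_{E\in\mathcal E_h}\int_E K\nabla p_h\cdot\nabla q_h\,dx-\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\big(\{K\nabla p_h\cdot\mathbf n_e\}_e[q_h]_e+\varepsilon\{K\nabla q_h\cdot\mathbf n_e\}_e[p_h]_e\big)\,d\sigma+J_0(p_h,q_h)\\
&\quad=\int_\Omega f\,q_h\,dx+\int_{\Gamma_N}g_2\,q_h\,d\sigma-\varepsilon\sum_{e\in\Gamma_{h,D}}\int_e g_1\,(K\nabla q_h\cdot\mathbf n_\Omega)\,d\sigma+\sum_{e\in\Gamma_{h,D}}\frac{\sigma_e}{h_e}\int_e g_1\,q_h\,d\sigma,
\end{aligned}\tag{32}
$$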
with ε = 1 for SIPG, ε = 0 for IIPG and ε = −1 for NIPG and OBB-DG; and, for each e, σe = 1 for NIPG, σe = 0 for OBB-DG, and again σe is a well-chosen positive parameter for IIPG and SIPG.
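To make (32) concrete, here is a minimal sketch of its one-dimensional analogue, under our own simplifying assumptions: Ω = (0, 1), Dirichlet data at both ends (so ΓN is empty), piecewise-linear elements (k = 1), piecewise-constant K, a 2-point Gauss rule for f, and, since an "edge" is a point in 1D, he replaced by the mean length of the neighbouring elements. The function solve_ip_1d and its arguments are hypothetical. The parameter eps selects SIPG, IIPG or NIPG; sigma = 0 with eps = −1 would give OBB-DG, which the text restricts to k ≥ 2.

```python
import numpy as np

def solve_ip_1d(n_el, f, g_left, g_right, eps, sigma, K=None):
    """Interior penalty DG (P1) for -(K p')' = f on (0, 1), p = g at both ends.

    eps = 1 (SIPG), 0 (IIPG) or -1 (NIPG); sigma is the penalty parameter.
    Returns an (n_el, 2) array: trace of p_h at the left/right end of each element.
    """
    x = np.linspace(0.0, 1.0, n_el + 1)
    h = np.diff(x)
    K = np.ones(n_el) if K is None else np.asarray(K, dtype=float)
    ndof = 2 * n_el                     # dofs 2n, 2n+1: values at x_n, x_{n+1}
    B = np.zeros((ndof, ndof))          # rows: test function q, columns: trial p
    b = np.zeros(ndof)

    gp = np.array([-1.0, 1.0]) / np.sqrt(3.0)  # 2-point Gauss rule on [-1, 1]
    for n in range(n_el):
        dofs = [2 * n, 2 * n + 1]
        B[np.ix_(dofs, dofs)] += K[n] / h[n] * np.array([[1.0, -1.0], [-1.0, 1.0]])
        for s in gp:
            xq = x[n] + 0.5 * h[n] * (1.0 + s)
            phi = np.array([(x[n + 1] - xq) / h[n], (xq - x[n]) / h[n]])
            b[dofs] += 0.5 * h[n] * f(xq) * phi

    def trace(n, end):
        """Full-length vectors giving q(end) and K q'(end) on element n."""
        val, flux = np.zeros(ndof), np.zeros(ndof)
        val[2 * n + end] = 1.0
        flux[2 * n], flux[2 * n + 1] = -K[n] / h[n], K[n] / h[n]
        return val, flux

    for m in range(n_el + 1):           # the faces are the mesh nodes x_m
        if m == 0 or m == n_el:         # boundary face: jump = average = trace
            n, end, normal = (0, 0, -1.0) if m == 0 else (n_el - 1, 1, 1.0)
            J, Kd = trace(n, end)
            A, h_e = normal * Kd, h[n]  # A . q = K q' n_Omega
            g = g_left if m == 0 else g_right
        else:                           # interior face, n_e = +1 (E_{m-1} -> E_m)
            vL, dL = trace(m - 1, 1)
            vR, dR = trace(m, 0)
            J, A = vL - vR, 0.5 * (dL + dR)  # [q]_e and {K q' n_e}_e
            h_e = 0.5 * (h[m - 1] + h[m])    # ASSUMPTION: 1D stand-in for h_e
            g = None
        pen = sigma / h_e
        B += -np.outer(J, A) - eps * np.outer(A, J) + pen * np.outer(J, J)
        if g is not None:               # Dirichlet data moved to the right side
            b += -eps * g * A + pen * g * J

    return np.linalg.solve(B, b).reshape(n_el, 2)
```

Because all three variants are consistent, a linear exact solution such as p(x) = x (f = 0, g_left = 0, g_right = 1) is reproduced up to round-off, which is a convenient first test of an implementation.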
Remark 3. Let E be an element of Eh with no edge (or face) e on ∂Ω. Taking qh = χE, the characteristic function of E, in (32), we easily derive the following discrete mass balance relation, where nE denotes the unit normal exterior to E:
$$-\sum_{e\in\partial E}\int_e\{K\nabla p_h\}\cdot\mathbf n_E\,d\sigma+\sum_{e\in\partial E}\frac{\sigma_e}{h_e}\int_e\big(p_h^{int}-p_h^{ext}\big)\,d\sigma=\int_E f\,dx.$$
3.1 Numerical Analysis
To simplify the discussion, we introduce the bilinear form defined for any pair of functions p and q in Xh + H^s(Ω) with s > 3/2 (so that the integrals over e are well-defined):
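$$a_h(p,q)=\sum_{E\in\mathcal E_h}\int_E K\nabla p\cdot\nabla q\,dx-\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\big(\{K\nabla p\cdot\mathbf n_e\}_e[q]_e+\varepsilon\{K\nabla q\cdot\mathbf n_e\}_e[p]_e\big)\,d\sigma. \tag{33}$$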
Clearly, for NIPG,
$$a_h(q_h,q_h)+J_0(q_h,q_h)=[|q_h|]^2_{H^1(\mathcal E_h)}, \tag{34}$$
and, therefore, (32) has a unique solution. For IIPG and SIPG [Whe78,
DSW04], an argument on finite-dimensional spaces (cf. [GSWY]) shows that
for each e there exists a constant ce , independent of h, but depending on k,
the regularity constant γ of (27) and the maximum and minimum eigenvalues
of K on the elements adjacent to e, such that for all ph and qh in Xh
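$$\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\{K\nabla p_h\cdot\mathbf n_e\}_e[q_h]_e\,d\sigma\le|||K^{1/2}\nabla p_h|||_{L^2(\mathcal E_h)}\Big(\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\frac{c_e}{h_e}\|[q_h]_e\|^2_{L^2(e)}\Big)^{1/2}. \tag{35}$$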
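Identity (34) for NIPG can also be read at the matrix level. If F denotes the matrix of the consistency term, F[i, j] = Σ_e ∫_e {K∇φj · ne}e [φi]e dσ (row i: test function, column j: trial function), the face terms of (33) assemble to −(F + εFᵀ). For ε = −1 this matrix is skew-symmetric, so its quadratic form vanishes and only the volume and penalty terms remain; for ε = 1 it is symmetric, which is where the symmetry of SIPG comes from. The sketch below uses a random matrix as a stand-in for an assembled F; the name face_matrix is hypothetical.

```python
import numpy as np

def face_matrix(F, eps):
    """Matrix of -sum_e int_e ({K grad p . n_e}[q] + eps {K grad q . n_e}[p]),
    given F[i, j] = sum_e int_e {K grad phi_j . n_e}[phi_i] (rows: test functions)."""
    return -(F + eps * F.T)

rng = np.random.default_rng(0)
F = rng.standard_normal((6, 6))   # random stand-in for an assembled face matrix
q = rng.standard_normal(6)
for eps in (1, 0, -1):
    # only eps = -1 (NIPG, OBB-DG) gives zero, up to round-off
    print(eps, q @ face_matrix(F, eps) @ q)
```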
The assumptions on K imply that the constants ce can be bounded above
independently of h and e and, therefore, applying Young’s inequality, we can
choose constants σe , uniformly bounded above and below with respect to h:
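$$\forall e\in\Gamma_h\cup\Gamma_{h,D},\qquad 1\le\sigma_0\le\sigma_e\le\sigma_m, \tag{36}$$
such that (for instance)
$$\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\{K\nabla q_h\cdot\mathbf n_e\}_e[q_h]_e\,d\sigma\le\frac14[|q_h|]^2_{H^1(\mathcal E_h)}. \tag{37}$$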
With this choice of penalty parameters σe , the system (32) for IIPG and SIPG
has a unique solution. Furthermore, there exist two positive constants α and
M , independent of h such that for all ph and qh in Xh
$$
\begin{aligned}
|a_h(p_h,q_h)|+|J_0(p_h,q_h)|&\le M\,[|p_h|]_{H^1(\mathcal E_h)}\,[|q_h|]_{H^1(\mathcal E_h)},\\
a_h(q_h,q_h)+J_0(q_h,q_h)&\ge\alpha\,[|q_h|]^2_{H^1(\mathcal E_h)}.
\end{aligned}\tag{38}
$$
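The phrase "for instance" in (37) leaves the choice of σe open. One sufficient choice, derived here as an illustration and not a constant taken from the chapter, is σe ≥ 4ce for every e. Writing S for the left-hand side of (37) and applying (35) with ph = ±qh, we get, using ab ≤ a²/4 + b²,

$$
\begin{aligned}
|S|&\le|||K^{1/2}\nabla q_h|||_{L^2(\mathcal E_h)}\Big(\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\frac{c_e}{h_e}\|[q_h]_e\|^2_{L^2(e)}\Big)^{1/2}\le|||K^{1/2}\nabla q_h|||_{L^2(\mathcal E_h)}\Big(\frac14J_0(q_h,q_h)\Big)^{1/2}\\
&\le\frac14|||K^{1/2}\nabla q_h|||^2_{L^2(\mathcal E_h)}+\frac14J_0(q_h,q_h)=\frac14[|q_h|]^2_{H^1(\mathcal E_h)}.
\end{aligned}
$$

Since (33) and (30) give $a_h(q_h,q_h)+J_0(q_h,q_h)=[|q_h|]^2_{H^1(\mathcal E_h)}-(1+\varepsilon)S$, it follows that

$$a_h(q_h,q_h)+J_0(q_h,q_h)\ge\Big(1-\frac{1+\varepsilon}{4}\Big)[|q_h|]^2_{H^1(\mathcal E_h)},$$

so that, under this assumption, the second inequality in (38) holds with α = 1/2 for SIPG (ε = 1) and α = 3/4 for IIPG (ε = 0).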
This analysis cannot be applied to establish the solvability of OBB-DG, because the term J0 is missing. If k ≥ 2, one can show directly for OBB-DG that (32) has a unique solution (cf. [RWG01]), but the second part of (38) does not hold. When k = 1, there is a counter-example showing that (32) is not well-posed (cf. [OBB98]). For this reason, OBB-DG is only applied when k ≥ 2.
With the above choice of penalty parameters σe, a standard error analysis allows one to prove optimal a priori error estimates in the norm [|·|]H1(Eh) for IIPG, SIPG and NIPG: if the exact solution p of (24)–(26) belongs to H^{k+1}(Ω), then for the three methods
$$[|p_h-p|]_{H^1(\mathcal E_h)}=O(h^k).$$
The same result holds for OBB-DG, but the proof is more subtle. The difficulty
lies in estimating the term
$$T=\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\{K\nabla(p-R_hp)\cdot\mathbf n_e\}_e[q_h]_e\,d\sigma,$$
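where Rh is an interpolation operator in Xh and qh ∈ Xh is an arbitrary test function. If the jump terms J0 were present, we would write, as in the cases of IIPG, SIPG and NIPG:
$$|T|\le\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\Big(\frac{h_e}{\sigma_e}\Big)^{1/2}\|\{K\nabla(p-R_hp)\cdot\mathbf n_e\}_e\|_{L^2(e)}\Big(\frac{\sigma_e}{h_e}\Big)^{1/2}\|[q_h]_e\|_{L^2(e)}.$$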
With a standard interpolation operator, owing to the factor $h_e^{1/2}$, the term
$$\Big(\frac{h_e}{\sigma_e}\Big)^{1/2}\|\{K\nabla(p-R_hp)\cdot\mathbf n_e\}_e\|_{L^2(e)}=O(h^k).$$
Here we have no jumps, and the only way to recover the factor $h_e^{1/2}$ is to construct an interpolation operator Rh such that
$$\int_e\{K\nabla(p-R_hp)\cdot\mathbf n_e\}_e\,d\sigma=0.$$
If this is the case, then we can write
$$T=\sum_{e\in\Gamma_h\cup\Gamma_{h,D}}\int_e\{K\nabla(p-R_hp)\cdot\mathbf n_e\}_e\big([q_h]_e-c_e\big)\,d\sigma,$$
where the number ce is chosen so that
$$\|[q_h]_e-c_e\|_{L^2(e)}\le C\Big(h_{E_i}^{1/2}\|\nabla q_h\|_{L^2(E_i)}+h_{E_j}^{1/2}\|\nabla q_h\|_{L^2(E_j)}\Big),$$
and Ei and Ej are the elements adjacent to e. This interpolation operator is
constructed in [RWG01], for k ≥ 2. When k = 1, there are not enough degrees
of freedom for its construction.
When the solution of (24)–(26) belongs to H^2(Ω) for all sufficiently smooth data (this holds, for example, when Ω is convex, K and g1 are sufficiently smooth and ΓD is the whole boundary), a duality argument shows that the error for SIPG in the L2 norm has a higher order:
$$\|p_h-p\|_{L^2(\Omega)}=O(h^{k+1}). \tag{39}$$
More generally, if there exists $s\in(\tfrac12,1)$ such that the solution of (24)–(26) belongs to $H^{1+s}(\Omega)$ for all correspondingly smooth data, then (cf. [RWG01])
$$\|p_h-p\|_{L^2(\Omega)}=O(h^{k+s}).$$
This result follows from the symmetry of ah . For the other methods, which
are not symmetric, the same duality argument (cf. [RWG01]) does not yield
any increase in order, namely all we have is
$$\|p_h-p\|_{L^2(\Omega)}=O(h^k). \tag{40}$$
Nevertheless, numerical results for NIPG and OBB-DG suggest that (39) holds if k is odd, but so far we have no proof of this result.
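Rates such as (39) and (40), or the behaviour of NIPG and OBB-DG for odd k mentioned above, are usually checked by computing observed orders on a sequence of meshes: if e(h) ≈ Ch^r, then r ≈ log(e1/e2)/log(h1/h2). The helper below is a generic sketch, not taken from the chapter; the errors in the test line are synthetic (e = 3h², so the expected order is 2) and are not results reported by the authors.

```python
import math

def observed_orders(hs, errors):
    """Observed convergence orders log(e_i / e_{i+1}) / log(h_i / h_{i+1})
    for consecutive pairs of mesh sizes hs and errors."""
    return [math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
            for i in range(len(hs) - 1)]
```

For instance, observed_orders([0.1, 0.05, 0.025], [0.03, 0.0075, 0.001875]) returns two values close to 2, the order predicted by (39) for k = 1.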
Remark 4. The choice of penalty parameters for IIPG and SIPG is not straightforward. If they are chosen too small, the stability property in (38) may be lost; if they are chosen too large, the matrix of system (32) may become ill-conditioned.
Remark 5. Without adding jumps to the broken norm, one cannot prove basic inequalities on the functions of Xh, such as Poincaré's inequality; i.e., the gradients in each element are not sufficient to control the L2 norm. With jumps, one can prove Poincaré–Friedrichs inequalities, Sobolev inequalities, Korn's inequalities and trace inequalities. For Poincaré–Friedrichs and Korn's inequalities, we refer to the very good contributions of Brenner [Bre03, Bre04]. The Sobolev and trace inequalities can be derived by similar arguments (cf. [GRW05]). Note that, by virtue of Poincaré's inequality, (40) can be established directly for IIPG, SIPG and NIPG without assuming that the solution of (24)–(26) has extra smoothness for all smooth data.
4 DG Approximation of an Incompressible Stokes Problem
Let us revert to the problem (11) on a connected polygonal or polyhedral
domain:
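$$-\mu\Delta\mathbf u+\nabla p=\mathbf f,\qquad\operatorname{div}\mathbf u=0\ \text{in }\Omega,\qquad\mathbf u=\mathbf 0\ \text{on }\partial\Omega.$$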
For a given force f ∈ L2 (Ω)d , this problem has a unique solution u ∈ H01 (Ω)d
and p ∈ L20 (Ω) (cf., for instance, [Tem79, GR86]). In fact, the solution is more
regular and the scheme below is consistent (cf. [Gri85, Dau89]).
In view of the operator and boundary condition in (11), the relevant spaces
here are H 1 (Eh )d and L20 (Ω), and the set Γh,N is empty. The definition of J0 is
extended straightforwardly to vectors and the permeability tensor is replaced
by the identity multiplied by the viscosity. Thus, the semi-norms (28) and
(30) are replaced by
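$$|||\nabla\mathbf v|||_{L^2(\mathcal E_h)}=\Big(\sum_{E\in\mathcal E_h}\|\nabla\mathbf v\|^2_{L^2(E)}\Big)^{1/2}, \tag{41}$$
$$[|\mathbf v|]_{H^1(\mathcal E_h)}=\Big(|||\mu^{1/2}\nabla\mathbf v|||^2_{L^2(\mathcal E_h)}+J_0(\mathbf v,\mathbf v)\Big)^{1/2}. \tag{42}$$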
Again, we choose an integer k ≥ 1 and we discretize H 1 (Eh )d and L20 (Ω)
with the finite element spaces
$$X_h=\{\mathbf v\in L^2(\Omega)^d:\ \forall E\in\mathcal E_h,\ \mathbf v|_E\in P_k(E)^d\}, \tag{43}$$
$$M_h=\{q\in L^2_0(\Omega):\ \forall E\in\mathcal E_h,\ q|_E\in P_{k-1}(E)\}. \tag{44}$$
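The pair (43)–(44) can be quantified by counting unknowns. In the hypothetical helper below (our sketch), each element carries d · dim Pk(E) velocity unknowns and dim Pk−1(E) pressure unknowns, and the zero-mean condition in L²₀(Ω) removes one pressure unknown globally; with k = 1 the pressure is piecewise constant.

```python
from math import comb

def stokes_dg_dofs(num_elements, k, d):
    """Unknown counts for X_h x M_h in (43)-(44): d copies of P_k per element
    for the velocity, one copy of P_{k-1} per element for the pressure, minus
    one pressure unknown for the zero-mean constraint of L^2_0."""
    velocity = num_elements * d * comb(k + d, d)
    pressure = num_elements * comb(k - 1 + d, d) - 1
    return velocity, pressure
```

For example, stokes_dg_dofs(10, 2, 2) gives (120, 29): 10 triangles with quadratic velocities and linear pressures.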
The choice Pk−1 for the discrete pressure, one degree less than the velocity, is
suggested by the fact that L2 is the natural norm for the pressure. Keeping in
mind (13) and (14), we discretize (11) by the following discrete system: Find
uh ∈ Xh and ph ∈ Mh satisfying for all vh ∈ Xh and qh ∈ Mh :
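$$
\begin{aligned}
&\mu\sum_{E\in\mathcal E_h}\int_E\nabla\mathbf u_h:\nabla\mathbf v_h\,dx-\mu\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\big(\{\nabla\mathbf u_h\cdot\mathbf n_e\}_e[\mathbf v_h]_e+\varepsilon\{\nabla\mathbf v_h\cdot\mathbf n_e\}_e[\mathbf u_h]_e\big)\,d\sigma+\mu J_0(\mathbf u_h,\mathbf v_h)\\
&\quad-\sum_{E\in\mathcal E_h}\int_E p_h\operatorname{div}\mathbf v_h\,dx+\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\{p_h\}_e[\mathbf v_h]_e\cdot\mathbf n_e\,d\sigma=\int_\Omega\mathbf f\cdot\mathbf v_h\,dx,
\end{aligned}\tag{45}
$$
$$\sum_{E\in\mathcal E_h}\int_E q_h\operatorname{div}\mathbf u_h\,dx-\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\{q_h\}_e[\mathbf u_h]_e\cdot\mathbf n_e\,d\sigma=0, \tag{46}$$
with the interpretation for the parameters ε and σ of formula (7).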
Let ah and bh denote the bilinear forms
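$$a_h(\mathbf u,\mathbf v)=\mu\sum_{E\in\mathcal E_h}\int_E\nabla\mathbf u:\nabla\mathbf v\,dx-\mu\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\big(\{\nabla\mathbf u\cdot\mathbf n_e\}_e[\mathbf v]_e+\varepsilon\{\nabla\mathbf v\cdot\mathbf n_e\}_e[\mathbf u]_e\big)\,d\sigma, \tag{47}$$
$$b_h(\mathbf v,q)=\sum_{E\in\mathcal E_h}\int_E q\operatorname{div}\mathbf v\,dx-\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\{q\}_e[\mathbf v]_e\cdot\mathbf n_e\,d\sigma. \tag{48}$$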
Clearly, the properties of ah listed in the previous section are valid here and,
therefore, existence and uniqueness of uh hold for IIPG and SIPG if the
penalty parameters σe are well-chosen; they hold unconditionally for NIPG
and they hold for OBB-DG if k ≥ 2. But existence and uniqueness of ph are not straightforward, because they are a consequence of the uniform "inf-sup" condition, which is now a standard tool in studying problems with a linear constraint (cf. [Bab73, Bre74]): there is a constant β∗ > 0, independent of h, such that
$$\inf_{q_h\in M_h}\sup_{\mathbf v_h\in X_h}\frac{b_h(\mathbf v_h,q_h)}{[|\mathbf v_h|]_{H^1(\mathcal E_h)}\,\|q_h\|_{L^2(\Omega)}}\ge\beta^*. \tag{49}$$
By using the Raviart–Thomas interpolation operator (cf. [RT75, GR86]), we
can readily show that (49) holds for IIPG, SIPG, NIPG and OBB-DG (cf., for
instance, [SST03]). Hence the four schemes have a unique solution. However, in
order to derive optimal error estimates, we have to bound the term bh (vh , p −
ρh p), where ρh is a suitable approximation operator, for instance, a local L2
projection on each E, and vh is an arbitrary test function in Xh . It is easy to
prove that if p ∈ H k (Eh ) then
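$$|b_h(\mathbf v_h,p-\rho_hp)|\le Ch^k\Big(\sum_{e\in\Gamma_h\cup\partial\Omega}\frac1{h_e}\|[\mathbf v_h]_e\|^2_{L^2(e)}+|||\nabla\mathbf v_h|||^2_{L^2(\mathcal E_h)}\Big)^{1/2}.$$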
As J0 is zero for OBB-DG, we cannot obtain a good estimate for this method:
it does not seem to be well-adapted to this formulation of the Stokes problem.
On the other hand, we can obtain optimal error estimates for IIPG, SIPG,
NIPG: if the exact solution (u, p) of the problem (11) belongs to H k+1 (Ω)d ×
H k (Ω), then for the three methods
$$[|\mathbf u_h-\mathbf u|]_{H^1(\mathcal E_h)}+\|p_h-p\|_{L^2(\Omega)}=O(h^k). \tag{50}$$
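Remark 6. Let E be an element as in Remark 3. Taking first qh = χE in (46), and then vh with i-th component vh,i = χE and all other components zero in (45), we obtain the discrete balance relations:
$$\int_E\operatorname{div}\mathbf u_h\,dx-\frac12\sum_{e\in\partial E}\int_e\big(\mathbf u_h^{int}-\mathbf u_h^{ext}\big)\cdot\mathbf n_E\,d\sigma=0,$$
$$-\mu\sum_{e\in\partial E}\int_e\{\nabla u_{h,i}\}\cdot\mathbf n_E\,d\sigma+\mu\sum_{e\in\partial E}\frac{\sigma_e}{h_e}\int_e\big(u_{h,i}^{int}-u_{h,i}^{ext}\big)\,d\sigma+\sum_{e\in\partial E}\int_e\{p_h\}_e\,(\mathbf n_E)_i\,d\sigma=\int_E f_i\,dx.$$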
5 DG Approximation of a Convection-Diffusion Equation
Consider the convection-diffusion equation combining (24) and (15) in the
domain Ω of the previous sections:
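$$
\begin{aligned}
-\operatorname{div}(K\nabla c)+\mathbf u\cdot\nabla c&=f&&\text{in }\Omega,&&(51)\\
K\nabla c\cdot\mathbf n_\Omega&=0&&\text{on }\partial\Omega,&&(52)
\end{aligned}
$$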
where f belongs to L²₀(Ω), the tensor K satisfies the assumptions listed in Section 3, and u satisfies (16):
$$\operatorname{div}\mathbf u=0\ \text{in }\Omega,\qquad\mathbf u\cdot\mathbf n_\Omega=0\ \text{on }\partial\Omega.$$
This problem has a solution c ∈ H 1 (Ω), unique up to an additive constant
under mild restrictions on the velocity u, for instance, when u belongs to
H 1 (Ω)d . We propose to discretize it with a DG method when u is replaced
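by the solution uh ∈ Xh of a flow problem that satisfies bh(uh, qh) = 0 for all qh ∈ Mh:
$$\sum_{E\in\mathcal E_h}\int_E q_h\operatorname{div}\mathbf u_h\,dx-\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\{q_h\}_e[\mathbf u_h]_e\cdot\mathbf n_e\,d\sigma=0.$$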
For an integer ℓ ≥ 1, we define
$$Y_h=\{c\in L^2(\Omega):\ \forall E\in\mathcal E_h,\ c|_E\in P_\ell(E)\}. \tag{53}$$
In view of (23) and (32), we discretize (51)–(52) by: Find ch ∈ Yh such that
for all vh ∈ Yh :
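$$
\begin{aligned}
&\sum_{E\in\mathcal E_h}\int_E K\nabla c_h\cdot\nabla v_h\,dx-\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e\big(\{K\nabla c_h\cdot\mathbf n_e\}_e[v_h]_e+\varepsilon\{K\nabla v_h\cdot\mathbf n_e\}_e[c_h]_e\big)\,d\sigma+J_0(c_h,v_h)\\
&\quad+\sum_{E\in\mathcal E_h}\int_E\Big(\mathbf u_h\cdot\nabla c_h+\frac12(\operatorname{div}\mathbf u_h)\,c_h\Big)v_h\,dx-\frac12\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e[\mathbf u_h]_e\cdot\mathbf n_e\,\{c_hv_h\}_e\,d\sigma\\
&\quad-\sum_{E\in\mathcal E_h}\int_{(\partial E)^-}\{\mathbf u_h\}\cdot\mathbf n_E\,\big(c_h^{int}-c_h^{ext}\big)\,v_h^{int}\,d\sigma=\int_\Omega f\,v_h\,dx,
\end{aligned}\tag{54}
$$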
where (∂E)− is defined by (20),
$$(\partial E)^-=\{x\in\partial E:\ \{\mathbf u_h\}\cdot\mathbf n_E(x)<0\},$$
and the parameters ε and σe are the same as previously.
To simplify, we introduce the form th with the upwind approximation of
the transport term in (54):
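$$
\begin{aligned}
t_h(\mathbf u_h;v_h,w_h)={}&\sum_{E\in\mathcal E_h}\int_E\Big(\mathbf u_h\cdot\nabla v_h+\frac12(\operatorname{div}\mathbf u_h)\,v_h\Big)w_h\,dx-\sum_{E\in\mathcal E_h}\int_{(\partial E)^-}\{\mathbf u_h\}\cdot\mathbf n_E\,\big(v_h^{int}-v_h^{ext}\big)\,w_h^{int}\,d\sigma\\
&-\frac12\sum_{e\in\Gamma_h\cup\partial\Omega}\int_e[\mathbf u_h]_e\cdot\mathbf n_e\,\{v_hw_h\}_e\,d\sigma.
\end{aligned}\tag{55}
$$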
This form is positive in the following sense (cf. [GRW05]): for all vh ∈ Yh
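$$t_h(\mathbf u_h;v_h,v_h)=\frac12\sum_{E\in\mathcal E_h}\big\||\{\mathbf u_h\}\cdot\mathbf n_E|^{1/2}\big(v_h^{int}-v_h^{ext}\big)\big\|^2_{L^2((\partial E)^-\setminus\partial\Omega)}+\big\||\mathbf u_h\cdot\mathbf n_\Omega|^{1/2}\,v_h\big\|^2_{L^2((\partial\Omega)^-)}, \tag{56}$$
where
$$(\partial\Omega)^-=\{x\in\partial\Omega:\ \mathbf u_h\cdot\mathbf n_\Omega(x)<0\}.$$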
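The positivity identity (56) can be checked face by face. For any smooth v, (uh · ∇v + ½(div uh)v)v = ½ div(uh v²), so the volume term of (55) reduces to ½ Σ_E ∫_∂E (uh^int · nE)(v^int)². On an interior face e shared by Ei and Ej (ne pointing from Ei to Ej), adding this to the jump and upwind terms of (55) gives the pointwise integrand ½|{uh} · ne|(vi − vj)². The hypothetical function below evaluates the three contributions at one point of e from the two traces of uh · ne and of vh, so the identity can be tested on random data; it is our check, not the proof in [GRW05].

```python
def face_terms(a, b, vi, vj):
    """Integrand of t_h(u_h; v_h, v_h) at one point of an interior face e.

    a, b   : traces u_h|E_i . n_e and u_h|E_j . n_e (n_e from E_i to E_j)
    vi, vj : traces v_h|E_i and v_h|E_j
    """
    m = 0.5 * (a + b)                               # {u_h} . n_e
    vol = 0.5 * (a * vi**2 - b * vj**2)             # 1/2 div(u v^2) from both sides
    jump = -0.5 * (a - b) * 0.5 * (vi**2 + vj**2)   # -1/2 [u_h].n_e {v_h v_h}_e
    if m < 0:    # e lies on (dE_i)^-, where n_E = n_e
        up = -m * (vi - vj) * vi
    else:        # e lies on (dE_j)^- (or m = 0), where n_E = -n_e
        up = m * (vj - vi) * vj
    return vol + jump + up
```

The sum always equals 0.5 * abs(0.5 * (a + b)) * (vi - vj) ** 2, the interior-face integrand of (56).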
Therefore, if the penalty parameters σe are chosen as in Section 3, system (54) has a solution ch in Yh, unique up to an additive constant. In particular, this means that (54) is compatible with (51)–(52), which is an important property, cf. [DSW04].
However, proving a priori error estimates is more delicate, since uh comes from a previous computation. If the error in computing uh is measured in the norm (42), then the contribution of th(uh; ch, vh) to the error is estimated as for the Navier–Stokes equations. This requires discrete Sobolev inequalities, which, as mentioned in Remark 5, do not seem to be available for OBB-DG schemes. On the other hand, for IIPG, SIPG and NIPG, the analysis in [GRW05] carries over here and yields, when u and c are sufficiently smooth:
$$[|c_h-c|]_{H^1(\mathcal E_h)}=O(h^{\min(k,\ell)}),$$
where k is the exponent in (50).
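Remark 7. Let E be an element as in Remark 3. Taking vh = χE in (54), we obtain the discrete mass balance relation:
$$
\begin{aligned}
&-\sum_{e\in\partial E}\int_e\{K\nabla c_h\}\cdot\mathbf n_E\,d\sigma+\sum_{e\in\partial E}\frac{\sigma_e}{h_e}\int_e\big(c_h^{int}-c_h^{ext}\big)\,d\sigma+\int_E\Big(\mathbf u_h\cdot\nabla c_h+\frac12(\operatorname{div}\mathbf u_h)\,c_h\Big)dx\\
&\quad-\frac14\sum_{e\in\partial E}\int_e\big(\mathbf u_h^{int}-\mathbf u_h^{ext}\big)\cdot\mathbf n_E\,c_h^{int}\,d\sigma+\int_{(\partial E)^-}|\{\mathbf u_h\}\cdot\mathbf n_E|\,\big(c_h^{int}-c_h^{ext}\big)\,d\sigma=\int_E f\,dx.
\end{aligned}
$$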
6 Some Darcy Flow in Porous Media: Numerical Examples
In recent years DG methods have been investigated and applied to a wide
collection of fluid and solid mechanics problems arising in many engineering
and scientific fields such as aerospace, petroleum, environmental, chemical and
biomedical engineering, and earth and life sciences. Since the list of publications is substantial and continues to grow, we refer only to the collection [CKS00] to illustrate the diversity of applications. We do, however, provide some numerical examples arising in modeling Darcy flow and transport in porous media in which DG algorithms offer major advantages over traditional conforming finite element and finite difference methods.
Geological media such as aquifers and petroleum reservoirs exhibit a high
level of spatial variability at a multiplicity of scales, from the size of individual grains or pores, to facies, stratigraphic and hydrologic units, up to sizes
of formations. These problems are of great importance to a number of scientific disciplines that include the management and protection of groundwater
resources, the deposition of nuclear wastes, the recovery of hydrocarbons, and
the sequestration of excessive carbon dioxide. Numerical simulation of physical flows and chemical reactions in heterogeneous geological media and their
interplay is required for understanding as well as designing mitigation strategies for environmental cleanup or optimizing oil and gas production.
DG methods are effective in treating complex geological heterogeneities such as impermeable boundaries or flow faults occurring in the interior of a reservoir. Because of the flexibility of DG, these boundaries do not require special meshing. Instead, the face between two internal elements is simply switched to a no-flow boundary condition for both neighboring elements. In Figure 2 we show an example of a mesh with 1683 triangular elements, in which the dark
[Figure 2 image: two panels with axes X (m) and Y (m).]
Fig. 2. Mesh with internal boundary conditions (left) and pressure and flux solutions (right)
lines are impermeable boundaries. Also shown are the corresponding pressure and flux solutions, in which the impact of these boundaries is clearly visible.
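The treatment of impermeable interior faces described above can be mimicked in a one-dimensional setting. In (32), a Neumann face contributes only its datum g2 (here zero) and no coupling, so an interior face declared impermeable is simply skipped in the face loop. The sketch below assumes SIPG (ε = 1), f = 0, P1 elements on (0, 1), Dirichlet data at both ends, and the mean neighbouring element length in place of he; the function sipg_1d_with_barriers and its barriers argument are hypothetical names, not the code used for Figure 2.

```python
import numpy as np

def sipg_1d_with_barriers(n_el, g_left, g_right, barriers=(), sigma=10.0, K=None):
    """SIPG (P1) for -(K p')' = 0 on (0, 1), p(0) = g_left, p(1) = g_right.

    `barriers` lists interior nodes m (1 <= m < n_el) treated as impermeable:
    all face terms are skipped there, i.e. both neighbours see K p' n = 0.
    Returns an (n_el, 2) array of traces at the left/right end of each element.
    """
    x = np.linspace(0.0, 1.0, n_el + 1)
    h = np.diff(x)
    K = np.ones(n_el) if K is None else np.asarray(K, dtype=float)
    ndof = 2 * n_el
    B = np.zeros((ndof, ndof))
    b = np.zeros(ndof)

    def trace(n, end):
        """Full-length vectors giving q(end) and K q'(end) on element n."""
        val, flux = np.zeros(ndof), np.zeros(ndof)
        val[2 * n + end] = 1.0
        flux[2 * n], flux[2 * n + 1] = -K[n] / h[n], K[n] / h[n]
        return val, flux

    for n in range(n_el):                    # element stiffness matrices
        d = [2 * n, 2 * n + 1]
        B[np.ix_(d, d)] += K[n] / h[n] * np.array([[1.0, -1.0], [-1.0, 1.0]])

    for m in range(n_el + 1):
        if m in barriers:
            continue                         # no-flow face: no coupling at all
        if m in (0, n_el):                   # Dirichlet boundary face
            n, end, normal = (0, 0, -1.0) if m == 0 else (n_el - 1, 1, 1.0)
            J, Kd = trace(n, end)
            A, h_e = normal * Kd, h[n]
            g = g_left if m == 0 else g_right
            b += -g * A + sigma / h_e * g * J
        else:                                # ordinary interior face
            vL, dL = trace(m - 1, 1)
            vR, dR = trace(m, 0)
            J, A, h_e = vL - vR, 0.5 * (dL + dR), 0.5 * (h[m - 1] + h[m])
        B += -np.outer(J, A) - np.outer(A, J) + sigma / h_e * np.outer(J, J)

    return np.linalg.solve(B, b).reshape(n_el, 2)
```

With a barrier at the midpoint and p = 0, p = 1 at the two ends, each half sees a homogeneous Neumann condition at the barrier, so the discrete pressure is 0 on the left half and 1 on the right half, with no flux across the barrier; without the barrier the linear profile p(x) = x is recovered.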
Another important porous media application where DG could prove to be
extremely important is reactive transport. When dealing with general chemistry and transport, it is imperative that the transport operators be monotone
and conservative. While a number of monotone finite difference methods have
been proposed for structured grids, many of these approaches have not been
extended to unstructured grids. With the use of appropriate numerical fluxes,
approximate Riemann solvers and stability post-processing (slope-limiting),
DG methods can be used to construct discretizations which are conservative
and monotone.
A benchmark case in reactive transport is a simulation of a far field nuclear
waste management problem [cpl01, cpl]. The problem is characterized by large
discontinuous jumps in permeability, effective porosity, and diffusivity, and by
the need to model small levels of concentration of the radioactive constituents.
The permeability field layers of the subsurface are shown in Figure 3.
For this example the magnitude of the velocity varies greatly in the different layers due to the discontinuities in the permeability of the layers. In
addition, in the clay and marl layers, where permeability is small, transport is
dominated by molecular diffusion. In the limestone and dogger limestone layers, where permeability is large, transport is dominated by advection and
dispersion. This example demonstrates the ability of DG to handle both
Fig. 3. Permeability field layers in the reactive transport problem
[Figure 4 image: two panels of concentration contours, one labeled "Cone"; axes X, meters (0 to 25000) and Y, meters (0 to 600).]
Fig. 4. Simulation of nuclear reactive transport using DG - 1
[Figure 5 image: axes X, meters (0 to 25000) and Y, meters (0 to 600).]
Fig. 5. Simulation of nuclear reactive transport using DG - 2
advection-dominated and diffusion-dominated problems. Figure 4 shows the iodine concentration at 200,000 years and Figure 5 at 2 million years. The low numerical diffusion of the DG method was also found to be important in this benchmark problem because of the long simulation time, cf. [WESR03]. Details regarding this simulation and several mesh adaptation strategies are discussed in [SW06a, SW06b]. The latter demonstrated that, by employing dynamic adaptivity, time-dependent transport could be resolved without slope limiting for both long-term and short-term simulations. Moreover, mass conservation was retained locally during dynamic mesh modification.
The theoretical and computational results obtained for primal DG methods for transport and flow are summarized in Table 1. Two rows provide a
comparison of the methods for treating flow problems with highly varying
Table 1. Primal DG for transport

| | OBB-DG | NIPG | SIPG | IIPG |
|---|---|---|---|---|
| Penalty term | 0 | ≥ 0 | > σ0 > 0 | > 0 and ≪ σ0 |
| Optimality in L2(H1) or H1 | Yes | Yes | Yes | Yes |
| Optimality in L2(L2) or L2 | No | No | Yes | No |
| Robust for problems with highly varying coefficients | Yes | Yes | No | Yes |
| Scalar of primary interest (transport) | No | No | Yes | No |
| Compatibility flow condition | No | No | No | Yes |
coefficients and for transport problems in which the scalar variable is of primary interest. These results were obtained from an extensive set of numerical experiments. The studies indicate that the non-symmetric DG formulations are more robust in handling rough coefficients. The symmetric form performs better for treating diffusion/advection/reaction problems, since SIPG yields optimal L2 and negative-norm estimates. The last row summarizes a compatibility condition formulated in [DSW04], whose objective is to choose a flow field that preserves positive concentrations in reactive transport. The IIPG method is the only primal DG method for which this holds.
DG methods are currently being investigated for modeling multiphase flow in porous media; see, e.g., [BR04, KR06] for two-phase incompressible systems and [HF06, Esl05, SW] for two- and three-phase compressible systems. While much progress has been made in modeling transport, a major difficulty for DG has been the development of efficient parallel solvers for the large linear and nonlinear systems arising, respectively, from the pressure equation and from fully implicit formulations of multiphase flow. The development of DG solvers is an active area of research, and new domain decomposition approaches are currently being developed; see, e.g., [Kan05, Joh05, AA07, Esl05, BR00].
References
[AA07]
P. F. Antonietti and B. Ayuso. Schwarz domain decomposition preconditioners for discontinuous Galerkin approximations of elliptic problems:
non-overlapping case. M2AN Math. Model. Numer. Anal., 41(1):21–54,
2007.
[ABCM02] D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini. Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM J.
Numer. Anal., 39(5):1749–1779, 2002.
[Arn79]
D. N. Arnold. An interior penalty finite element method with discontinuous elements. PhD thesis, University of Chicago, Chicago, IL, 1979.
[Arn82]
D. N. Arnold. An interior penalty finite element method with discontinuous elements. SIAM J. Numer. Anal., 19(4):742–760, 1982.
[Bab73]
I. Babuška. The finite element method with Lagrangian multipliers.
Numer. Math., 20:179–192, 1973.
[Bak77]
G. Baker. Finite element methods for elliptic equations using nonconforming elements. Math. Comp., 31:45–59, 1977.
[Bau97]
C. E. Baumann. An hp-adaptive discontinuous finite element method for computational fluid dynamics. PhD thesis, University of Texas at Austin, Austin, TX, 1997.
[Bey94]
K. S. Bey. An hp-adaptive discontinuous Galerkin method for hyperbolic conservative laws. PhD thesis, University of Texas at Austin, Austin, TX, 1994.
[BO99]
C. E. Baumann and J. T. Oden. A discontinuous hp finite element method for convection-diffusion problems. Comput. Methods Appl. Mech. Engrg., 175(3–4):311–341, 1999.
[BOP96]
K. S. Bey, J. T. Oden, and A. Patra. hp-version discontinuous Galerkin methods for hyperbolic conservation laws. Comput. Methods Appl. Mech. Engrg., 133:259–286, 1996.
[BR00]
P. Bastian and V. Reichenberger. Multigrid for higher order discontinuous Galerkin finite elements applied to groundwater flow. Technical Report 2000-37, SFB 359, 2000.
[BR04]
P. Bastian and B. Riviere. Discontinuous Galerkin for two-phase flow in porous media. Technical Report 2004-28, IWR (SFB 359), University of Heidelberg, 2004.
[Bre74]
F. Brezzi. On the existence, uniqueness and approximation of the saddle-point problems arising from Lagrangian multipliers. RAIRO Anal. Numér., 8:129–151, 1974.
[Bre03]
S. Brenner. Poincaré–Friedrichs inequalities for piecewise H1 functions. SIAM J. Numer. Anal., 41:306–324, 2003.
[Bre04]
S. Brenner. Korn's inequalities for piecewise H1 vector fields. Math. Comp., 73:1067–1087, 2004.
[BZ73]
I. Babuška and M. Zlámal. Nonconforming elements in the finite element method with penalty. SIAM J. Numer. Anal., 10:863–875, 1973.
[Cia91]
P. G. Ciarlet. Basic error estimates for elliptic problems. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, Vol. II, pages 17–351. North-Holland, Amsterdam, 1991.
[CKS00]
B. Cockburn, G. E. Karniadakis, and C.-W. Shu, editors. Discontinuous Galerkin methods. Theory, computation and applications (Newport, RI, 1999). Number 11 in Lecture Notes in Computational Science and Engineering. Springer-Verlag, Berlin, 2000.
[cpl]
Couplex1 test case, nuclear waste disposal far field simulation. ANDRA (the French National Radioactive Waste Management Agency), http://www.andra.fr/couplex/.
[cpl01]
The couplex test cases. ANDRA (the French National Radioactive Waste Management Agency), http://www.andra.fr/couplex/, 2001.
[CR73]
M. Crouzeix and P. A. Raviart. Conforming and non-conforming finite element methods for solving the stationary Stokes problem. RAIRO Anal. Numér., 8:33–76, 1973.
[Dar80]
B. L. Darlow. A Penalty-Galerkin method for solving the miscible displacement problem. PhD thesis, Rice University, Houston, TX, 1980.
[Dau89]
M. Dauge. Stationary Stokes and Navier–Stokes systems on two or three-dimensional domains with corners. SIAM J. Math. Anal., 20(1):74–97, 1989.
[DD76]
J. Douglas, Jr. and T. Dupont. Interior penalty procedures for elliptic and parabolic Galerkin methods. In Computing Methods in Applied Sciences (Second Internat. Sympos., Versailles, 1975), number 58 in Lecture Notes in Phys., pages 207–216. Springer-Verlag, Berlin, 1976.
[DSW04]
C. Dawson, S. Sun, and M. F. Wheeler. Compatible algorithms for coupled flow and transport. Comput. Methods Appl. Mech. Engrg., 194:2565–2580, 2004.
[Esl05]
O. Eslinger. Discontinuous Galerkin finite element methods applied to two-phase air-water flow problems. PhD thesis, University of Texas at Austin, Austin, TX, 2005.
[GR79]
V. Girault and P.-A. Raviart. An analysis of upwind schemes for the Navier–Stokes equations. SIAM J. Numer. Anal., 19(2):312–333, 1979.
[GR86]
V. Girault and P.-A. Raviart. Finite Element Methods for the Navier–Stokes Equations. Theory and Algorithms. Number 5 in Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1986.
[Gri85]
P. Grisvard. Elliptic Problems in Nonsmooth Domains. Number 24 in Pitman Monographs and Studies in Mathematics. Pitman, Boston, MA, 1985.
[GRW05]
V. Girault, B. Rivière, and M. Wheeler. A discontinuous Galerkin method with non-overlapping domain decomposition for the Stokes and Navier–Stokes problems. Math. Comp., 74:53–84, 2005.
[GSWY]
V. Girault, S. Sun, M. F. Wheeler, and I. Yotov. Coupling discontinuous Galerkin and mixed finite element discretizations using mortar finite elements. SIAM J. Numer. Anal. Submitted Oct. 2006.
[HF06]
H. Hoteit and A. Firoozabadi. Compositional modeling by the combined discontinuous Galerkin and mixed methods. SPE J., 11:19–34, 2006.
[Joh05]
K. Johannsen. A symmetric smoother for the nonsymmetric interior penalty discontinuous Galerkin discretization. ICES Report 05-23, University of Texas at Austin, 2005.
[Kan05]
G. Kanschat. Block preconditioners for LDG discretizations of linear incompressible flow problems. J. Sci. Comput., 22(1–3):371–384, 2005.
[KR06]
W. Klieber and B. Riviere. Adaptive simulations of two-phase flow by discontinuous Galerkin methods. Comput. Methods Appl. Mech. Engrg., 196(1–3):404–419, 2006.
[LR74]
P. Lesaint and P. A. Raviart. On a finite element method for solving the neutron transport equation. In C. deBoor, editor, Mathematical Aspects of Finite Elements in Partial Differential Equations, pages 89–123. Academic Press, 1974.
[Nit71]
J. A. Nitsche. Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. Abh. Math. Sem. Univ. Hamburg, 36:9–15, 1971.
[OBB98]
J. T. Oden, I. Babuška, and C. E. Baumann. A discontinuous hp finite element method for diffusion problems. J. Comput. Phys., 146:491–516, 1998.
[OW75]
J. T. Oden and L. C. Wellford, Jr. Discontinuous finite element approximations for the analysis of shock waves in nonlinearly elastic materials. J. Comput. Phys., 19(2):179–210, 1975.
[Pir89]
O. Pironneau. Finite Element Methods for Fluids. Wiley, Chichester, 1989.
[RH73]
W. H. Reed and T. R. Hill. Triangular mesh methods for the neutron transport equation. Los Alamos Scientific Laboratory Report LA-UR-73-479, 1973.
[Riv00]
B. Rivière. Discontinuous Galerkin finite element methods for solving
the miscible displacement problem in porous media. PhD thesis, University of Texas at Austin, Austin, TX, 2000.
[RT75]
P. A. Raviart and J. M. Thomas. A mixed finite element method for second order elliptic problems. In Mathematical Aspects of Finite Element
Methods, number 606 in Lecture Notes in Mathematics. Springer-Verlag,
Berlin, 1975.
[RW74]
H. Rachford and M. F. Wheeler. An H1-Galerkin procedure for the
two-point boundary value problem. In C. deBoor, editor, Mathematical
Aspects of Finite Elements in Partial Differential Equations, pages 353–
382. Academic Press, 1974.
[RWG99] B. Riviere, M. F. Wheeler, and V. Girault. Part I: Improved energy
estimates for interior penalty, constrained and discontinuous Galerkin
methods for elliptic problems. Comput. Geosci., 3:337–360, 1999.
[RWG01] B. Riviere, M. F. Wheeler, and V. Girault. A priori error estimates for
finite element methods based on discontinuous approximation spaces
for elliptic problems. SIAM J. Numer. Anal., 39(3):902–931, 2001.
[SST03]
D. Schötzau, C. Schwab, and A. Toselli. Mixed hp-DGFEM for incompressible flows. SIAM J. Numer. Anal., 40(319):2171–2194, 2003.
[SW]
S. Sun and M. F. Wheeler. Discontinuous Galerkin methods for multiphase compressible flows. In preparation.
[SW06a]
S. Sun and M. F. Wheeler. Anisotropic and dynamic mesh adaptation for discontinuous Galerkin methods applied to reactive transport.
Comput. Methods Appl. Mech. Engrg., 195(25–28):3382–3405, 2006.
[SW06b]
S. Sun and M. F. Wheeler. A posteriori error estimation and dynamic adaptivity for symmetric discontinuous Galerkin approximations
of reactive transport problems. Comput. Methods Appl. Mech. Engrg.,
195:632–652, 2006.
[Tem79]
R. Temam. Navier–Stokes equations. Theory and numerical analysis.
North-Holland, Amsterdam, 1979.
[WD80]
M. F. Wheeler and B. L. Darlow. Interior penalty Galerkin procedures
for miscible displacement problems in porous media. In Computational
methods in nonlinear mechanics (Proc. Second Internat. Conf., Univ.
Texas, Austin, Tex., 1979), pages 485–506, Amsterdam, 1980. North-Holland.
[WESR03] M. F. Wheeler, O. Eslinger, S. Sun, and B. Rivière. Discontinuous
Galerkin method for modeling flow and reactive transport porous media. In Analysis and Simulation of Multifield Problems, pages 37–58.
Springer-Verlag, Berlin, 2003.
[Whe78]
M. F. Wheeler. An elliptic collocation-finite element method with interior penalties. SIAM J. Numer. Anal., 15(1):152–161, 1978.
Mixed Finite Element Methods on Polyhedral
Meshes for Diffusion Equations
Yuri A. Kuznetsov
Department of Mathematics, University of Houston, 651 Philip G. Hoffman Hall,
Houston, TX 77204–3008, USA kuz@math.uh.edu
Summary. In this paper, a new mixed finite element method for the diffusion
equation on polyhedral meshes is proposed. The method is applied to the diffusion
equation on meshes with mixed cells when all the coefficients and the source function
may have discontinuities inside polyhedral mesh cells. The resulting discrete equations operate only with the degrees of freedom for normal fluxes on the boundaries
of cells and one degree of freedom per cell for the solution function.
Key words: Diffusion equation, mixed finite element method, polyhedral
meshes, mixed cells
1 Introduction
In this paper, we propose a new mixed finite element method for the diffusion
equation on general polyhedral meshes in the case when the coefficients of the
equation and the source function may have strong discontinuities inside mesh
cells. Such mesh cells are called mixed ones. The major idea of the method is
reported in [Kuz05]. This work is a natural extension of the method in [Kuz06]
to 3D diffusion equations.
The discretization method consists of several steps. In the first step, we partition each polyhedral cell into polyhedral subcells, assuming that inside each subcell the coefficients and the source function are relatively smooth. Then, in each subcell we impose a local conforming tetrahedral mesh subject to a structure of the neighboring subcells. The subcell tetrahedral meshes are not required to be conforming on the interfaces between subcells. A special finite element subspace of Hdiv(Ω) is constructed, and the classical mixed finite element method [BF91, RT91] is used for discretization of the diffusion equation with the Neumann boundary condition. In the final step, the interior (with respect to the boundaries of polyhedral mesh cells) degrees of freedom for the normal fluxes and for the solution function are eliminated, and a new
degree of freedom per mesh cell for the solution function is defined. The final
system of discrete equations has the same structure as for the classical mixed
FE method.
The paper is organized as follows. In Section 2, we formulate the problem and the requirements for the discretization. In Section 3, we describe the polyhedral meshes to be used for the discretization and the partitionings of mesh cells into subcells, and we propose a special finite element subspace of $H(\mathrm{div};\Omega)$ for the mixed finite element method. In the final part of Section 3, we propose an alternative discretization method. Finally, in Section 4, we describe a condensation procedure for the underlying algebraic system and transform the condensed system into the standard form typical of the classical mixed finite element method on simplicial meshes. In Remark 2 of Section 4, we prove that the alternative discretization method is equivalent to the “div-const” mixed finite element method introduced and investigated in [KR03, KR05].
2 Problem Formulation
We consider the diffusion equation
$$-\operatorname{div}(a\operatorname{grad} p) + cp = f \quad \text{in } \Omega \tag{1}$$
with the Neumann boundary condition
$$(a\operatorname{grad} p)\cdot n = 0 \quad \text{on } \partial\Omega, \tag{2}$$
where $\Omega$ is a polyhedral domain in $\mathbb{R}^3$ with the boundary $\partial\Omega$, $a = a(x)$ is a symmetric positive definite $3\times 3$ matrix (diffusion tensor) for any $x = (x_1, x_2, x_3) \in \Omega$, $c$ is a nonnegative function, $f$ is a given source function, and $n$ is the outward unit normal to $\partial\Omega$. The domain $\Omega$ is partitioned into $m$ open non-overlapping simply connected polyhedral subdomains $\Omega_k$ with the boundaries $\partial\Omega_k$, $k = 1,\dots,m$, i.e. $\bar\Omega = \bigcup_{k=1}^m \bar\Omega_k$. For the sake of simplicity, we assume that in each of the subdomains $\Omega_k$ the matrix $a$ has constant entries and the coefficient $c$ is a nonnegative constant, $k = 1,\dots,m$. We naturally assume that in the case $c \equiv 0$ in $\Omega$ the compatibility condition
$$\int_\Omega f\,dx = 0 \tag{3}$$
holds.
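In this paper, we consider problem (1), (2) in the form of the first order system
$$\begin{aligned} a^{-1}u + \operatorname{grad} p &= 0 && \text{in } \Omega,\\ -\operatorname{div} u - cp &= -f && \text{in } \Omega,\\ u\cdot n &= 0 && \text{on } \partial\Omega, \end{aligned}\tag{4}$$
where $u$ is called the flux vector function.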
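For readers checking that (4) is indeed equivalent to (1), (2), the elimination of $u$ can be written out explicitly. The following is a routine formal sketch, assuming $p$ is regular enough for the manipulations to make sense:
$$\begin{aligned} u = -a\operatorname{grad} p \;&\Longrightarrow\; -\operatorname{div} u - cp = \operatorname{div}(a\operatorname{grad} p) - cp,\\ -\operatorname{div} u - cp = -f \;&\Longleftrightarrow\; -\operatorname{div}(a\operatorname{grad} p) + cp = f,\\ u\cdot n = 0 \;&\Longleftrightarrow\; (a\operatorname{grad} p)\cdot n = 0. \end{aligned}$$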
Let $\Omega_H$ be a polyhedral mesh in $\Omega$ with polyhedral mesh cells $E_k = \bar E_k \setminus \partial E_k$, where $\partial E_k$ are the boundaries of $E_k$, $k = 1,\dots,n$. Here, $n$ is a positive integer. We assume that $E_k \cap E_l = \emptyset$, $l \ne k$, $k, l = 1,\dots,n$, and $\bar\Omega = \bigcup_{k=1}^n \bar E_k$. We do not assume that the mesh $\Omega_H$ is geometrically conforming, i.e. the interface $\partial E_k \cap \partial E_l$ between two neighboring cells $E_k$ and $E_l$ need not be a face, an edge, or a vertex of these cells, $l \ne k$, $k, l = 1,\dots,n$. An example of two nonconforming neighboring prismatic cells is given in Figure 1.
The intersection of $E_k$ with $\bigcup_{l=1}^m \partial\Omega_l$ defines the partitioning of $E_k$ into $n_k$ polyhedral subcells $E_{k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. An example of a partitioning of a mesh cell into three subcells is given in Figure 2.
Fig. 1. An example of two neighboring prismatic mesh cells with nonconforming intersecting faces
[Figure 2: a polyhedral cell $E_k$ with subcells $E_{k,1}$, $E_{k,2}$, $E_{k,3}$ and subdomains $\Omega_1$, $\Omega_2$, $\Omega_3$]
Fig. 2. An example of a partitioning of a polyhedral cell into three polyhedral subcells
A mesh cell $E$ with discontinuities in the entries of the matrix $a$, in the coefficient $c$, or in both is called a mixed cell.
On the boundary $\partial E_k$ of a polyhedral cell $E_k$ we define a set of $s_k$ non-overlapping flat polygons $\Gamma_{k,i}$, $i = 1,\dots,s_k$, which satisfies the following three conditions:
1. $\partial E_k = \bigcup_{i=1}^{s_k} \bar\Gamma_{k,i}$;
2. each $\Gamma_{k,i}$ belongs to $\partial E_{k,s}$ for some $s \le n_k$;
3. each $\Gamma_{k,i}$ belongs either to $\partial\Omega$ or to $\partial E_{k',s'}$ for some $k' \ne k$, $k' \le n$, $s' \le n_{k'}$,
where $s_k$ is a positive integer, $k = 1,\dots,n$. A 2D example of the partitioning of $\partial E_k$ into $\Gamma_{k,i}$, $i = 1,\dots,s_k$, with $s_k = 8$ is given in Figure 3.
The goal of this paper is to develop a mixed finite element method for the diffusion problem (4) on the polyhedral meshes described above, under special conditions on the degrees of freedom (DOF) that can be used for the discretization. Namely, the final discretization may use only one DOF representing the normal component of the flux vector function $u$ in (4) on each $\Gamma_{k,i}$, $i = 1,\dots,s_k$, and only one DOF representing the solution function $p$ in (4) in each $E_k$, $k = 1,\dots,n$.
To anticipate the final discretization scheme derived in Section 4, we define the required discrete equation in $E_k$ for the second equation in (4) by integrating this equation over the mesh cell $E_k$:
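$$\int_{E_k} [-\operatorname{div} u - cp]\,dx = -\int_{E_k} f\,dx, \quad k = 1,\dots,n. \tag{5}$$
[Figure 3: a 2D cell $E_k$ with subcells $E_{k,1}$, $E_{k,2}$, boundary pieces $\Gamma_{k,1},\dots,\Gamma_{k,8}$, interface pieces $\gamma_{k,1}$, $\gamma_{k,2}$, and subdomains $\Omega_1$, $\Omega_2$, $\Omega_3$]
Fig. 3. A 2D example of the partitionings of $\partial E_k$ into $\Gamma_{k,i}$, $i = 1,\dots,8$, and of the interface between $E_{k,1}$ and $E_{k,2}$ into $\gamma_{k,j}$, $j = 1, 2$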
Equality (5) results in the discrete equation
$$-\sum_{i=1}^{s_k} u_{k,i}|\Gamma_{k,i}| - c_k|E_k|\hat p_k = -|E_k| f_k, \tag{6}$$
where
$$u_{k,i} = \frac{1}{|\Gamma_{k,i}|}\int_{\Gamma_{k,i}} u\cdot n_k\,ds \tag{7}$$
is the mean value of the normal flux $u\cdot n_k$ on $\Gamma_{k,i}$,
$$c_k = \frac{1}{|E_k|}\int_{E_k} c\,dx \quad\text{and}\quad f_k = \frac{1}{|E_k|}\int_{E_k} f\,dx \tag{8}$$
are the mean values of $c$ and $f$ in $E_k$, respectively, and
$$\hat p_k = \frac{\int_{E_k} cp\,dx}{\int_{E_k} c\,dx} \tag{9}$$
is the $c$-weighted mean value of $p$ in $E_k$. Here, $|\Gamma_{k,i}|$ and $|E_k|$ denote the area of $\Gamma_{k,i}$ and the volume of $E_k$, respectively, $i = 1,\dots,s_k$, and $n_k$ is the outward unit normal to $\partial E_k$, $k = 1,\dots,n$.
Equation (6) can be written in matrix form as
$$B_H^{0,(k)}\bar u^{(k)} - c_k|E_k|\hat p_k = -|E_k| f_k, \tag{10}$$
where
$$B_H^{0,(k)} = -\begin{pmatrix} |\Gamma_{k,1}| & \cdots & |\Gamma_{k,s_k}| \end{pmatrix} \in \mathbb{R}^{1\times s_k} \tag{11}$$
and $\bar u^{(k)} = (u_{k,1},\dots,u_{k,s_k})^T \in \mathbb{R}^{s_k}$, $k = 1,\dots,n$. The matrix $B_H^{0,(k)}$ will be used later to derive the final discretization for problem (4).
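As a quick sanity check of (6)–(11), the following sketch evaluates the cell balance (10) on a hypothetical unit-cube cell with the test flux $u(x) = x$, $c = 1$, and $p = 1$, so that the second equation of (4) holds pointwise with $f = 4$. The cell, the flux, and all variable names are illustrative assumptions rather than part of the method; since (10) is simply the integral of that equation over $E_k$, both sides should agree exactly.

```python
import numpy as np

# Hypothetical cell E_k = (0,1)^3: |E_k| = 1 and six faces Gamma_{k,i} of area 1,
# ordered x1=0, x1=1, x2=0, x2=1, x3=0, x3=1.
face_areas = np.ones(6)
normals = np.array([[-1, 0, 0], [1, 0, 0], [0, -1, 0],
                    [0, 1, 0], [0, 0, -1], [0, 0, 1]], dtype=float)
centroids = np.array([[0.0, 0.5, 0.5], [1.0, 0.5, 0.5], [0.5, 0.0, 0.5],
                      [0.5, 1.0, 0.5], [0.5, 0.5, 0.0], [0.5, 0.5, 1.0]])
volume = 1.0

# Test flux u(x) = x (div u = 3). u . n_k is linear on each flat face,
# so its mean value (7) equals its value at the face centroid.
u_bar = np.einsum("ij,ij->i", centroids, normals)

# Row matrix B_H^{0,(k)} of (11).
B0 = -face_areas.reshape(1, -1)

# With c = 1 and p = 1, -div u - c p = -f forces f = 4, hence
# c_k = 1, p_hat_k = 1 (weighted mean of a constant) and f_k = 4 in (8), (9).
c_k, p_hat_k, f_k = 1.0, 1.0, 4.0
lhs = (B0 @ u_bar).item() - c_k * volume * p_hat_k   # left side of (10)
rhs = -volume * f_k                                   # right side of (10)
```

Only the two faces with outward normals $+e_j$ carry flux here, which is why the net outflow equals $\operatorname{div} u\,|E_k| = 3$.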
Formula (9) assumes that the coefficient $c$ does not vanish identically in $E_k$. In the case $c \equiv 0$ in $E_k$, the discrete equation (6) is replaced by
$$-\sum_{i=1}^{s_k} u_{k,i}|\Gamma_{k,i}| = -|E_k| f_k, \tag{12}$$
and (10) is replaced by the equation
$$B_H^{0,(k)}\bar u^{(k)} = -|E_k| f_k. \tag{13}$$
3 Mixed Finite Element Method
Let $\partial_0 E_{k,s}$ be the part of the boundary $\partial E_{k,s}$ of a polyhedral subcell $E_{k,s}$ that lies in the interior of $E_k$, i.e. $\partial_0 E_{k,s} = \partial E_{k,s} \cap E_k$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. On $\partial_0 E_{k,s}$ we define a set of $t_{k,s}$ non-overlapping flat polygons $\gamma_{k,s,j}$ which satisfies the following two conditions:
1. $\partial_0 E_{k,s} = \bigcup_{j=1}^{t_{k,s}} \bar\gamma_{k,s,j}$;
2. each $\gamma_{k,s,j}$ belongs to $\partial_0 E_{k,s'}$ for some $s' \ne s$, $s' \le n_k$,
where $t_{k,s}$ is a positive integer, $s = 1,\dots,n_k$, $k = 1,\dots,n$.
Examples of the partitionings of ∂0 Ek,s into polygons γk,s,j are given in
Figures 3 and 4. In Figure 3, the interface ∂0 Ek,1 = ∂0 Ek,2 between Ek,1
and Ek,2 consists of γk,1 and γk,2 . In Figure 4, ∂0 Ek,1 consists of γk,1,1 = γ1 ,
γk,1,2 = γ2 , and γk,1,3 = γ3 , and ∂0 Ek,2 consists of γk,2,1 = γ3 , γk,2,2 = γ4 , and
γk,2,3 = γ5 . Finally, ∂0 Ek,3 consists of γk,3,1 = γ1 , γk,3,2 = γ2 , γk,3,3 = γ4 , and
γk,3,4 = γ5 .
Let $T_{h,k,s} = \{e_{k,s,i}\}$ be conforming tetrahedral partitionings of $E_{k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. Conformity of a tetrahedral partitioning (tetrahedral mesh) means that any two different intersecting closed tetrahedrons in $T_{h,k,s}$ share either a common vertex, a common edge, or a common face.
The boundaries $\partial E_{k,s}$ of $E_{k,s}$ are unions of polygons in $\{\Gamma_{k,i}\}$ and in $\{\gamma_{k,s,j}\}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. We assume that each of the tetrahedral meshes $T_{h,k,s}$ is also conforming with respect to the boundaries of the polygons in $\{\Gamma_{k,i}\}$ and in $\{\gamma_{k,s,j}\}$ belonging to $\partial E_{k,s}$, i.e. these boundaries belong to the union of edges of tetrahedrons in $T_{h,k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. We do not assume that the tetrahedral meshes $T_{h,k,s}$ and $T_{h,k',s'}$ are conforming on the interfaces between neighboring cells $E_k$ and $E_{k'}$, $k' \ne k$, nor on the interfaces between neighboring subcells $E_{k,s}$ and $E_{k,s'}$, $s' \ne s$.
[Figure 4: a 2D cell with subcells $E_{k,1}$, $E_{k,2}$, $E_{k,3}$ and interface segments $\gamma_1,\dots,\gamma_5$]
Fig. 4. An example of partitionings of $\partial_0 E_{k,s}$ into segments $\gamma_{k,s,j}$, $j = 1,\dots,t_{k,s}$, $s = 1, 2, 3$
Let $T_h$ be a tetrahedral partitioning of $\Omega$ such that its restrictions onto $E_{k,s}$ coincide with the tetrahedral meshes $T_{h,k,s}$, and let $RT_0(E_{k,s})$ be the lowest order Raviart–Thomas finite element spaces on $T_{h,k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. We define the finite element spaces $V_{h,k,s}$ consisting of vector functions $w \in RT_0(E_{k,s})$ which have constant normal fluxes $w\cdot n_{k,s}$ on each of the flat polygons $\Gamma_{k,i}$ and $\gamma_{k,s,j}$ belonging to $\partial E_{k,s}$, where $n_{k,s}$ are the outward unit normals to $\partial E_{k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. Then, we define the spaces $V_{h,k}$ on $E_k$ by requiring that the restrictions $w_{k,s}$ of any vector function $w_k \in V_{h,k}$ onto $E_{k,s}$ belong to the spaces $V_{h,k,s}$, $s = 1,\dots,n_k$, and that the normal components of $w_k$ are continuous across the polygons $\gamma_{k,s,j}$, $j = 1,\dots,t_{k,s}$. To satisfy the latter condition, we require that on each polygon $\gamma_{k,s,j}$ belonging to $\partial E_{k,s} \cap \partial E_{k,s'}$, $s' \ne s$, the outward normal components of the vector functions $w_{k,s}$ and $w_{k,s'}$ satisfy the equality $w_{k,s}\cdot n_{k,s} + w_{k,s'}\cdot n_{k,s'} = 0$ (recall that $n_{k,s} + n_{k,s'} = 0$ there), $j = 1,\dots,t_{k,s}$, $k = 1,\dots,n$.
Finally, we define the finite element space $V_h$ by requiring that the restrictions $w_k$ of any vector function $w \in V_h$ onto $E_k$ belong to the spaces $V_{h,k}$ and that the normal components of $w$ are continuous on the interfaces $\partial E_k \cap \partial E_l$ between $E_k$ and $E_l$. To satisfy the latter condition, we require that on each polygon $\Gamma_{k,i}$ belonging to $\partial E_k \cap \partial E_l$ the outward normal components of the vector functions $w_k$ and $w_l$ satisfy $w_k\cdot n_k + w_l\cdot n_l = 0$, $1 \le i \le s_k$, $l \ne k$, $k, l = 1,\dots,n$.
We define the finite element space $Q_h$ for the solution function $p$ by requiring that functions in $Q_h$ be constant on each of the tetrahedrons in the partitionings $T_{h,k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. With the FE spaces $V_h$ and $Q_h$ so defined, the mixed finite element discretization of (4) is as follows: Find $u_h \in V_h$, $u_h\cdot n = 0$ on $\partial\Omega$, and $p_h \in Q_h$ such that
$$\begin{aligned} \int_\Omega a^{-1}u_h\cdot v\,dx - \int_\Omega p_h\operatorname{div} v\,dx &= 0,\\ -\int_\Omega \operatorname{div} u_h\,q\,dx - \int_\Omega c\,p_h\,q\,dx &= -\int_\Omega f\,q\,dx \end{aligned}\tag{14}$$
for all $v \in V_h$, $v\cdot n = 0$ on $\partial\Omega$, and $q \in Q_h$.
The finite element problem (14) results in the system of linear algebraic equations
$$\begin{aligned} M\bar u + B^T\bar p + C^T\bar\lambda &= 0,\\ B\bar u - \Sigma\bar p &= F,\\ C\bar u &= 0. \end{aligned}\tag{15}$$
Here, $M \in \mathbb{R}^{\hat n\times\hat n}$ is a symmetric positive definite matrix, $\Sigma \in \mathbb{R}^{N\times N}$ is either a symmetric positive definite or a symmetric positive semidefinite matrix, $B \in \mathbb{R}^{N\times\hat n}$, and $C \in \mathbb{R}^{\tilde n\times\hat n}$, where $\hat n = \dim V_h$, $N$ is the total number of tetrahedrons in $T_h$, and $\tilde n$ is the total number of polygons $\Gamma_{k,i}$, $i = 1,\dots,s_k$, $k = 1,\dots,n$, belonging to $\partial\Omega$. The components of the Lagrange multiplier vector $\bar\lambda \in \mathbb{R}^{\tilde n}$ represent the mean values of the solution function $p$ on the polygons $\Gamma_{k,i} \subset \partial\Omega$, $i = 1,\dots,s_k$, $k = 1,\dots,n$. The third matrix equation in (15) enforces the Neumann boundary condition on $\partial\Omega$.
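We also consider another discretization of (4): Find $u_h \in V_h$, $u_h\cdot n = 0$ on $\partial\Omega$, and $p_h \in Q_h$ such that
$$\begin{aligned} \int_\Omega a^{-1}u_h\cdot v\,dx - \int_\Omega p_h\operatorname{div} v\,dx &= 0,\\ -\int_\Omega \operatorname{div} u_h\,q\,dx - \int_\Omega c\,\tilde p_h\,q\,dx &= -\int_\Omega \tilde f_h\,q\,dx \end{aligned}\tag{16}$$
for all $v \in V_h$, $v\cdot n = 0$ on $\partial\Omega$, and $q \in Q_h$. Here,
$$\tilde p_h(x) = \frac{1}{|E_{k,s}|}\int_{E_{k,s}} p_h(x')\,dx', \quad x \in E_{k,s}, \tag{17}$$
and
$$\tilde f_h(x) = \frac{1}{|E_{k,s}|}\int_{E_{k,s}} f(x')\,dx', \quad x \in E_{k,s}, \tag{18}$$
where $|E_{k,s}|$ is the volume of $E_{k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$.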
The finite element problem (16) results in the system of linear algebraic equations
$$\begin{aligned} M\bar u + B^T\bar p + C^T\bar\lambda &= 0,\\ B\bar u - \tilde\Sigma\bar p &= F^1,\\ C\bar u &= 0, \end{aligned}\tag{19}$$
where the matrices $M$, $B$, and $C$ are the same as in system (15). The matrix $\tilde\Sigma \in \mathbb{R}^{N\times N}$ is a block diagonal matrix with $\tilde N = \sum_{k=1}^n n_k$ diagonal submatrices
$$\tilde\Sigma_{k,s} = \frac{c_{k,s}}{|E_{k,s}|}\,D_{k,s}\bar e_{k,s}\bar e_{k,s}^T D_{k,s} \in \mathbb{R}^{N_{k,s}\times N_{k,s}}, \tag{20}$$
and the vector $F^1 \in \mathbb{R}^N$ consists of $\tilde N$ subvectors
$$F_{k,s} = -f_{k,s}\,D_{k,s}\bar e_{k,s} \in \mathbb{R}^{N_{k,s}} \tag{21}$$
(one matrix $\tilde\Sigma_{k,s}$ and one vector $F_{k,s}$ per subcell $E_{k,s}$), where $c_{k,s}$ is the value of the coefficient $c$ in $E_{k,s}$, $f_{k,s}$ is the value of the function $\tilde f_h$ in $E_{k,s}$, $\bar e_{k,s} = (1,\dots,1)^T \in \mathbb{R}^{N_{k,s}}$, and $N_{k,s}$ is the total number of tetrahedrons in $T_{h,k,s}$, $s = 1,\dots,n_k$, $k = 1,\dots,n$. Here, $D_{k,s}$ are diagonal $N_{k,s}\times N_{k,s}$ matrices with the volumes of the tetrahedrons $\{e_{k,s,i}\}$ in $T_{h,k,s}$ on the diagonals, $s = 1,\dots,n_k$, $k = 1,\dots,n$.
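The blocks (20)–(21) are straightforward to form once the tetrahedron volumes of a subcell are known. The sketch below, with made-up volumes and coefficient values and a hypothetical helper `subcell_blocks`, builds $\tilde\Sigma_{k,s}$ and $F_{k,s}$ and checks that $q^T\tilde\Sigma_{k,s}p$ reproduces $\int_{E_{k,s}} c\,\tilde p_h\,q\,dx$ for piecewise-constant $p$ and $q$, which is where (20) comes from.

```python
import numpy as np

def subcell_blocks(tet_volumes, c_ks, f_ks):
    """Return (Sigma_tilde_{k,s}, F_{k,s}) of (20)-(21) for one subcell E_{k,s}.

    tet_volumes: volumes of the tetrahedrons e_{k,s,i} (the diagonal of D_{k,s}).
    """
    d = np.asarray(tet_volumes, dtype=float)
    vol = d.sum()                                # |E_{k,s}|
    sigma_tilde = (c_ks / vol) * np.outer(d, d)  # D e e^T D equals the outer product d d^T
    F = -f_ks * d                                # D e equals d
    return sigma_tilde, F

# Made-up subcell with four tetrahedrons.
vols = np.array([0.1, 0.25, 0.15, 0.5])
c_ks, f_ks = 2.0, 3.0
Sig, F = subcell_blocks(vols, c_ks, f_ks)

# Consistency with (16)-(17): q^T Sigma_tilde p should equal
# the integral of c * p_tilde * q over E_{k,s} for piecewise-constant p, q.
rng = np.random.default_rng(0)
p, q = rng.standard_normal(4), rng.standard_normal(4)
p_tilde = vols @ p / vols.sum()                  # (17)
integral = c_ks * p_tilde * (vols @ q)           # c * p_tilde * sum_i |e_i| q_i
```

Note that each $\tilde\Sigma_{k,s}$ has rank one: it only sees the subcell mean of $p_h$.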
In Section 4, we shall prove that the method (16)–(18) is equivalent to
the “div-const” mixed finite element method [KR03, KR05] on the polyhedral
mesh consisting of the polyhedral mesh cells Ek,s , s = 1, nk , k = 1, n.
4 Hybridization and Condensation
The underlying system of algebraic equations for problem (14) can be written in the macro-hybrid form
$$\begin{aligned} M_k\bar u_k + B_k^T\bar p_k + C_k^T\bar\lambda_k &= 0,\\ B_k\bar u_k - \Sigma_k\bar p_k &= F_k, \end{aligned}\tag{22}$$
$k = 1,\dots,n$, complemented by the continuity conditions for the normal fluxes on the interfaces $\partial E_k \cap \partial E_l$ between neighboring cells $E_k$ and $E_l$, $k, l = 1,\dots,n$, and by the Neumann boundary condition for the normal fluxes on $\partial\Omega$. The vector $\bar\lambda_k \in \mathbb{R}^{s_k}$ represents the mean values of the solution function $p$ on the polygons $\Gamma_{k,i}$, $i = 1,\dots,s_k$, $k = 1,\dots,n$. The matrices $\Sigma_k$ are diagonal blocks of the matrix $\Sigma$, and the vectors $F_k$ are subvectors of the vector $F$ in (15). The matrices $M$ and $B$ in (15) are obtained by assembling the matrices $M_k$ and $B_k$ in (22), respectively.
We partition the components of the vector $\bar u_k$ in (22) into two groups. The first group, denoted by the subindex $H$, contains the DOF assigned to the polygons $\Gamma_{k,i}$, $i = 1,\dots,s_k$, on the boundary of $E_k$; the second group, denoted by the subindex $h$, contains the remaining DOF, which are interior to the cell $E_k$, $k = 1,\dots,n$. Then, equations (22) can be written in the equivalent block form (the subindex $k$ is omitted)
$$\begin{aligned} M_H\bar u_H + M_{Hh}\bar u_h + B_H^T\bar p + C^T\bar\lambda &= 0,\\ M_{hH}\bar u_H + M_h\bar u_h + B_h^T\bar p &= 0,\\ B_H\bar u_H + B_h\bar u_h - \Sigma\bar p &= F. \end{aligned}\tag{23}$$
First, we consider the case when the coefficient $c$ is a positive function in $E_k$, i.e. the matrix $\Sigma_k$ in (22) is symmetric and positive definite, $1 \le k \le n$. We eliminate the vectors $\bar u_h$ and $\bar p$ from (23) in two steps. In the first step, we eliminate the vector $\bar u_h$ and obtain the system
$$\begin{aligned} \tilde M_H\bar u_H + \tilde B_H^T\bar p + C^T\bar\lambda &= 0,\\ \tilde B_H\bar u_H - S_h\bar p &= F, \end{aligned}\tag{24}$$
where
$$\tilde M_H = M_H - M_{Hh}M_h^{-1}M_{hH}, \qquad \tilde B_H = B_H - B_hM_h^{-1}M_{hH}, \tag{25}$$
and
$$S_h = B_hM_h^{-1}B_h^T + \Sigma. \tag{26}$$
The matrices $\tilde M_H$ and $S_h$ are clearly symmetric and positive definite. Moreover, the null space of the matrix $B_hM_h^{-1}B_h^T$ is one-dimensional, and it is spanned by the vector $\bar e = (1,\dots,1)^T$ ($\bar e \in \ker B_h^T$).
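The first elimination step is a static condensation (a Schur complement). The NumPy sketch below checks (24)–(26) on a random stand-in for the local system (23): $M$ is a random SPD matrix, $B_h$ is random with its columns shifted to sum to zero so that $\bar e \in \ker B_h^T$, $\Sigma$ is a positive diagonal matrix (the case $c > 0$), and the term $C^T\bar\lambda$ is replaced by an arbitrary vector `r`. None of these matrices comes from an actual Raviart–Thomas assembly; the point is only the algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
nH, nh, N = 5, 6, 4   # boundary flux DOF, interior flux DOF, tetrahedrons

def spd(n):
    """Random symmetric positive definite stand-in matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

M = spd(nH + nh)
MH, MHh = M[:nH, :nH], M[:nH, nH:]
MhH, Mh = M[nH:, :nH], M[nH:, nH:]
BH = rng.standard_normal((N, nH))
Bh = rng.standard_normal((N, nh))
Bh -= Bh.mean(axis=0)                        # columns sum to zero: e in ker(Bh^T)
Sigma = np.diag(rng.uniform(0.5, 2.0, N))    # c > 0 case
r = rng.standard_normal(nH)                  # plays the role of -C^T lambda
F = rng.standard_normal(N)

# Full local system (23), unknowns (u_H, u_h, p).
K = np.block([[MH, MHh, BH.T],
              [MhH, Mh, Bh.T],
              [BH, Bh, -Sigma]])
full = np.linalg.solve(K, np.concatenate([r, np.zeros(nh), F]))

# Condensed matrices (25)-(26).
X = np.linalg.solve(Mh, MhH)                 # M_h^{-1} M_hH
MH_t = MH - MHh @ X
BH_t = BH - Bh @ X
Sh = Bh @ np.linalg.solve(Mh, Bh.T) + Sigma

# Reduced system (24), unknowns (u_H, p).
K_red = np.block([[MH_t, BH_t.T], [BH_t, -Sh]])
red = np.linalg.solve(K_red, np.concatenate([r, F]))
```

Both solves should return the same $\bar u_H$ and $\bar p$. Since $\bar e^T B_h = 0$, one also sees directly that $\bar e^T\tilde B_H = \bar e^T B_H$, the identity behind Statement 1.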
In the second step, we eliminate the vector $\bar p$ in (24) and obtain the system
$$\hat M_H\bar u_H + C^T\bar\lambda = \bar g \tag{27}$$
complemented by the interface and boundary conditions for the components of $\bar u_H$. Here,
$$\hat M_H = \tilde M_H + \tilde B_H^TS_h^{-1}\tilde B_H \tag{28}$$
and
$$\bar g = \tilde B_H^TS_h^{-1}F. \tag{29}$$
To analyze the matrix $\hat M_H$ in (28), we consider the eigenvalue problem
$$S_h\bar w = \mu\Sigma\bar w. \tag{30}$$
Let $\nu$ be the order of the matrix $S_h$. Then problem (30) has $\nu$ positive eigenvalues
$$1 = \mu_1 < \mu_2 \le \cdots \le \mu_\nu \tag{31}$$
and $\nu$ corresponding $\Sigma$-orthonormal eigenvectors
$$\bar w_1 = \frac{1}{\sigma}\bar e,\ \bar w_2,\ \dots,\ \bar w_\nu, \tag{32}$$
where $\bar e = (1,\dots,1)^T \in \mathbb{R}^\nu$ and
$$\sigma \equiv \sigma_k = \Big(\int_{E_k} c\,dx\Big)^{1/2}. \tag{33}$$
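Thus, we get
$$S_h^{-1} = \frac{1}{\sigma^2}\bar e\bar e^T + \sum_{j=2}^{\nu}\frac{1}{\mu_j}\bar w_j\bar w_j^T \equiv \frac{1}{\sigma^2}\bar e\bar e^T + Q_h \tag{34}$$
and
$$\hat M_H = M_H^0 + \frac{1}{\sigma^2}\tilde B_H^T\bar e\bar e^T\tilde B_H, \tag{35}$$
where the matrix
$$M_H^0 = \tilde M_H + \tilde B_H^TQ_h\tilde B_H \tag{36}$$
is symmetric and positive definite.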
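Because $\Sigma$ is diagonal and positive here, the generalized eigenvalue problem (30) can be reduced to a standard symmetric one through the scaling $\Sigma^{-1/2}S_h\Sigma^{-1/2}$. The sketch below, on random stand-in matrices with the same assumptions as for (24)–(26), computes the $\Sigma$-orthonormal eigenvectors, confirms that $\mu_1 = 1$ with $\bar w_1 = \bar e/\sigma$, and checks the splitting (34) of $S_h^{-1}$ against a direct inverse.

```python
import numpy as np

rng = np.random.default_rng(2)
N, nh = 4, 6
Bh = rng.standard_normal((N, nh))
Bh -= Bh.mean(axis=0)                     # e in ker(Bh^T)
A = rng.standard_normal((nh, nh))
Mh = A @ A.T + nh * np.eye(nh)            # SPD stand-in for M_h
sig = rng.uniform(0.5, 2.0, N)            # diagonal of Sigma (values c_i |e_i|)
Sigma = np.diag(sig)
Sh = Bh @ np.linalg.solve(Mh, Bh.T) + Sigma   # (26)

# Generalized problem (30) S_h w = mu Sigma w, reduced to a symmetric one
# with the scaling Sigma^{-1/2} S_h Sigma^{-1/2}.
s = 1.0 / np.sqrt(sig)
mu, Y = np.linalg.eigh(s[:, None] * Sh * s[None, :])
W = s[:, None] * Y                        # Sigma-orthonormal: W^T Sigma W = I

e = np.ones(N)
sigma2 = e @ Sigma @ e                    # sigma^2 of (33)
Qh = W[:, 1:] @ np.diag(1.0 / mu[1:]) @ W[:, 1:].T
Sh_inv = np.outer(e, e) / sigma2 + Qh     # right-hand side of (34)
```

The strict gap $\mu_2 > 1$ reflects the one-dimensional null space of $B_hM_h^{-1}B_h^T$.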
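Statement 1. The equality
$$\bar e^T\tilde B_H = B_H^0 \tag{37}$$
holds, where the matrix $B_H^0 \equiv B_H^{0,(k)}$ is defined in (11), $1 \le k \le n$. (Indeed, $\bar e^T B_h = 0$, hence $\bar e^T\tilde B_H = \bar e^T B_H$.)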
To derive the required final discretization for problem (4), we introduce the new variable $\hat p$ by the formula
$$\hat p = \frac{1}{\sigma^2}\bar e^T\big(\tilde B_H\bar u_H - F\big) \equiv \frac{1}{\sigma^2}\big(B_H^0\bar u_H + |E| f\big), \tag{38}$$
where
$$f = -\frac{1}{|E|}\bar e^TF. \tag{39}$$
Then, we get the system in terms of $\bar u_H^{(k)}$ and $\hat p_k$ (we restore the index $k$):
$$\begin{aligned} M_H^{0,(k)}\bar u_H^{(k)} + \big(B_H^{0,(k)}\big)^T\hat p_k + C_k^T\bar\lambda_k &= \hat g_k,\\ B_H^{0,(k)}\bar u_H^{(k)} - c_k|E_k|\hat p_k &= -|E_k| f_k, \end{aligned}\tag{40}$$
$k = 1,\dots,n$, complemented by the equations of continuity of normal fluxes on the interfaces between neighboring polyhedral cells and by the equations for the normal fluxes on $\partial\Omega$. Here,
$$\hat g_k = \bar g_k - \frac{1}{\sigma_k^2}\big(\tilde B_H^{(k)}\big)^T\bar e_k\bar e_k^TF_k \tag{41}$$
and the values of $c_k$ and $f_k$ are defined in (8). Recall that $\sigma_k^2 = c_k|E_k|$.
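The following sketch, again on random stand-in matrices with $\bar e \in \ker B_h^T$ and a positive diagonal $\Sigma$, solves (27) with the multiplier term dropped, forms $\hat p$ by (38) and $\hat g$ by (41), and checks that the pair $(\bar u_H, \hat p)$ satisfies (40) and that $\hat p$ coincides with the discrete $c$-weighted mean (9), $\bar e^T\Sigma\bar p/\sigma^2$. Here $B_H^0$ is taken to be $\bar e^T B_H$, as in Statement 1; the random data and the omission of $C^T\bar\lambda$ are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
nH, nh, N = 5, 6, 4

def spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

M = spd(nH + nh)
MH, MHh, MhH, Mh = M[:nH, :nH], M[:nH, nH:], M[nH:, :nH], M[nH:, nH:]
BH = rng.standard_normal((N, nH))
Bh = rng.standard_normal((N, nh))
Bh -= Bh.mean(axis=0)                          # e in ker(Bh^T)
Sigma = np.diag(rng.uniform(0.5, 2.0, N))      # c > 0 in the cell
F = rng.standard_normal(N)
e = np.ones(N)

# (25), (26), (28), (29)
X = np.linalg.solve(Mh, MhH)
MH_t, BH_t = MH - MHh @ X, BH - Bh @ X
Sh_inv = np.linalg.inv(Bh @ np.linalg.solve(Mh, Bh.T) + Sigma)
MH_hat = MH_t + BH_t.T @ Sh_inv @ BH_t
g_bar = BH_t.T @ Sh_inv @ F

# (34)-(36), (37), (39)
sigma2 = e @ Sigma @ e                         # sigma_k^2 = c_k |E_k|
Qh = Sh_inv - np.outer(e, e) / sigma2
M0 = MH_t + BH_t.T @ Qh @ BH_t
B0 = e @ BH                                    # row B_H^0, stored as a 1-D array
Ef = -(e @ F)                                  # |E_k| f_k

# Solve (27) with lambda = 0 (this sketch ignores the C^T lambda coupling).
uH = np.linalg.solve(MH_hat, g_bar)
p_bar = Sh_inv @ (BH_t @ uH - F)               # second equation of (24)

p_hat = (B0 @ uH + Ef) / sigma2                # (38)
g_hat = g_bar - BH_t.T @ e * (e @ F) / sigma2  # (41)
res1 = M0 @ uH + B0 * p_hat - g_hat            # first equation of (40)
res2 = B0 @ uH - sigma2 * p_hat + Ef           # second equation of (40)
```

The second residual vanishes by construction of $\hat p$; the first one is the nontrivial check that (35) and (41) fit together.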
Now, we return to system (23) and consider the case when $c \equiv 0$ in $E_k$, i.e. $\Sigma_k$ is the zero matrix. In this case, the matrix
$$S_h = B_hM_h^{-1}B_h^T \tag{42}$$
in (26) is singular.
Let us consider the eigenvalue problem
$$S_h\bar w = \mu D\bar w, \tag{43}$$
where the subindex $k$ denoting the cell $E = E_k$ is again omitted, and $D$ is the diagonal matrix with the volumes of the tetrahedrons in $E$ on its diagonal (cf. the matrices $D_{k,s}$ in (20)). This eigenvalue problem has one zero eigenvalue $\mu_1 = 0$ and $\nu - 1$ positive eigenvalues $\mu_2 \le \mu_3 \le \cdots \le \mu_\nu$, where $\nu$ is the order of $S_h$. We denote the system of $D$-orthonormal eigenvectors of problem (43) by
$$\bar w_1, \bar w_2, \dots, \bar w_\nu, \tag{44}$$
where
$$\bar w_1 = \frac{1}{|E|^{1/2}}\bar e. \tag{45}$$
The spectral decomposition of the matrix $S_h$ with respect to eigenvalue problem (43) is given by
$$S_h = DW\Lambda W^TD, \tag{46}$$
where
$$\Lambda = \operatorname{diag}(\mu_1, \mu_2, \dots, \mu_\nu) \tag{47}$$
and
$$W = \begin{pmatrix} \bar w_1 & \bar w_2 & \cdots & \bar w_\nu \end{pmatrix}. \tag{48}$$
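Consider the second equation in (24) in the form
$$S_h\bar p = \tilde B_H\bar u_H - F. \tag{49}$$
Since $\bar e^TS_h = 0$, this system is solvable only if $\bar e^T(\tilde B_H\bar u_H - F) = 0$, which, by (64) and $\bar e^TF = -|E|f$, is the cell balance equation (13). A solution vector $\bar p$ of this system can then be written as
$$\bar p = S_h^+\big(\tilde B_H\bar u_H - F\big) + \alpha\bar e \tag{50}$$
with an arbitrary coefficient $\alpha \in \mathbb{R}$, where
$$S_h^+ = W\Lambda^+W^T. \tag{51}$$
Here,
$$\Lambda^+ = \operatorname{diag}\big(0, \mu_2^{-1}, \dots, \mu_\nu^{-1}\big) \tag{52}$$
is a diagonal matrix.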
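Substituting the vector $\bar p$ from (50) into the second equation in (23), we get the equation
$$\big(M_{hH} + B_h^TS_h^+\tilde B_H\big)\bar u_H + M_h\bar u_h = B_h^TS_h^+F. \tag{53}$$
Thus,
$$\bar u_h = R_1\bar u_H + R_2F, \tag{54}$$
where
$$R_1 = -M_h^{-1}\big(M_{hH} + B_h^TS_h^+\tilde B_H\big) \tag{55}$$
and
$$R_2 = M_h^{-1}B_h^TS_h^+. \tag{56}$$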
Now, we replace the first two equations in (23) by a single equation. To derive this equation, we multiply the first two equations in (23) by the matrix
$$\begin{pmatrix} I_H & R_1^T \end{pmatrix},$$
where $I_H$ is the $s_k\times s_k$ identity matrix, and then substitute the vector $\bar u_h$ defined by (54) into the new equation. We obtain the resulting equation in terms of the vectors $\bar u_H$, $\bar p$, and $\bar\lambda$ in the form
$$M_H^0\bar u_H + \hat B_H^T\bar p + C^T\bar\lambda = \bar g, \tag{57}$$
where the matrix
$$M_H^0 = \begin{pmatrix} I_H & R_1^T \end{pmatrix}\begin{pmatrix} M_H & M_{Hh}\\ M_{hH} & M_h \end{pmatrix}\begin{pmatrix} I_H\\ R_1 \end{pmatrix} \tag{58}$$
is symmetric and positive definite,
$$\hat B_H^T = B_H^T + R_1^TB_h^T, \tag{59}$$
and
$$\bar g = -\big(M_{Hh} + R_1^TM_h\big)R_2F. \tag{60}$$
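Let us analyze the matrix $\hat B_H^T$ in (59):
$$\begin{aligned} \hat B_H^T &= B_H^T - \big(M_{hH}^T + \tilde B_H^TS_h^+B_h\big)M_h^{-1}B_h^T\\ &= \big(B_H^T - M_{hH}^TM_h^{-1}B_h^T\big)\big(I - S_h^+S_h\big)\\ &= \frac{1}{|E|}B_H^T\bar e\bar e^TD. \end{aligned}\tag{61}$$
To derive the latter formula, we used the identity
$$I - S_h^+S_h = \frac{1}{|E|}\bar e\bar e^TD \tag{62}$$
and the fact that $\bar e \in \ker B_h^T$.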
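For the case $c \equiv 0$, the pseudo-inverse (51) and the identities (61)–(62) can be checked numerically. The sketch below uses random stand-in matrices (with $\bar e \in \ker B_h^T$) and random positive tetrahedron volumes as the diagonal of $D$; it builds the $D$-orthonormal eigenvectors of (43) by a symmetric scaling, then forms $S_h^+$, $R_1$ from (55), $M_H^0$ from (58), and the matrix in (59). The data are assumptions of the sketch, not output of an actual mesh.

```python
import numpy as np

rng = np.random.default_rng(4)
nH, nh, N = 5, 6, 4
A = rng.standard_normal((nH + nh, nH + nh))
M = A @ A.T + (nH + nh) * np.eye(nH + nh)    # SPD stand-in for the local M
MhH, Mh = M[nH:, :nH], M[nH:, nH:]
BH = rng.standard_normal((N, nH))
Bh = rng.standard_normal((N, nh))
Bh -= Bh.mean(axis=0)                        # e in ker(Bh^T)
vols = rng.uniform(0.1, 1.0, N)              # tetrahedron volumes = diagonal of D
D, E, e = np.diag(vols), vols.sum(), np.ones(N)

Sh = Bh @ np.linalg.solve(Mh, Bh.T)          # (42): singular since Sh e = 0
BH_t = BH - Bh @ np.linalg.solve(Mh, MhH)    # (25)

# D-orthonormal eigenpairs of (43) from the symmetric matrix D^{-1/2} S_h D^{-1/2}.
s = 1.0 / np.sqrt(vols)
mu, Y = np.linalg.eigh(s[:, None] * Sh * s[None, :])
W = s[:, None] * Y                           # W^T D W = I; mu ascending, mu[0] ~ 0
lam_plus = np.zeros(N)
lam_plus[1:] = 1.0 / mu[1:]                  # Lambda^+ of (52)
Sh_plus = W @ np.diag(lam_plus) @ W.T        # S_h^+ of (51)

# R_1 of (55), M_H^0 of (58), and the transposed matrix in (59).
R1 = -np.linalg.solve(Mh, MhH + Bh.T @ Sh_plus @ BH_t)
P = np.vstack([np.eye(nH), R1])              # the block column (I_H; R_1)
M0 = P.T @ M @ P
BhatT = BH.T + R1.T @ Bh.T
```

The comparison targets are (62), i.e. `np.eye(N) - Sh_plus @ Sh` against $\bar e\bar e^TD/|E|$, and (61), i.e. `BhatT` against $B_H^T\bar e\bar e^TD/|E|$.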
Thus, equation (57) is equivalent to the equation
$$M_H^0\bar u_H + \big(B_H^0\big)^T\hat p + C^T\bar\lambda = \bar g, \tag{63}$$
where the matrix $B_H^0$ is defined in (11), i.e.
$$B_H^0 = \bar e^TB_H, \tag{64}$$
and
$$\hat p = \frac{1}{|E|}\bar e^TD\bar p \equiv \frac{1}{|E|}\int_E p_h\,dx \tag{65}$$
is the mean value of $p_h$ in the polyhedral cell $E$.
Complementing equation (63) in $E \equiv E_k$ by equation (10) with $c_k = 0$, we get the system in terms of $\bar u_H^{(k)}$ and $\hat p_k$ (we again restore the index $k$):
$$\begin{aligned} M_H^{0,(k)}\bar u_H^{(k)} + \big(B_H^{0,(k)}\big)^T\hat p_k + C_k^T\bar\lambda_k &= \hat g_k,\\ B_H^{0,(k)}\bar u_H^{(k)} &= -|E_k| f_k, \end{aligned}\tag{66}$$
where $M_H^{0,(k)} = M_H^0$ and $M_H^0$ is defined in (58). Recall that equations (66) are derived for the case $c \equiv 0$ in $E_k$, $1 \le k \le n$.
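Using the assembling procedure, we get the system in terms of $\bar u_H$, $\bar p_H$, and the boundary Lagrange multipliers $\bar\lambda$:
$$\begin{aligned} M^0\bar u_H + \big(B_H^0\big)^T\bar p_H + \big(C^0\big)^T\bar\lambda &= \bar g^0,\\ B_H^0\bar u_H - \Sigma^0\bar p_H &= F^0,\\ C^0\bar u_H &= 0. \end{aligned}\tag{67}$$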
The matrix $M^0$ in (67) is obtained by assembling the matrices $M_H^{0,(k)}$ defined in (36) if the coefficient $c$ is a positive function in $E_k$, or in (58) if $c \equiv 0$ in $E_k$, $k = 1,\dots,n$. Correspondingly, the components $\hat p_k$ of the vector $\bar p_H$ in (67) are defined either by (9) if the coefficient $c$ is a positive function in $E_k$, or by (65) if $c \equiv 0$ in $E_k$, $k = 1,\dots,n$.
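The elimination of $\bar u_H$ (condensation of system (67)) results in an algebraic system in terms of the vector $\bar p_H$ and the interface and boundary Lagrange multiplier vector $\bar\lambda$:
$$A\begin{pmatrix} \bar p_H\\ \bar\lambda \end{pmatrix} = \bar q. \tag{68}$$
Here,
$$A = \sum_{k=1}^n N_kA_kN_k^T, \tag{69}$$
where
$$A_k = \begin{pmatrix} c_k|E_k| & 0\\ 0 & 0 \end{pmatrix} + \begin{pmatrix} B_H^{0,(k)}\\ C_k \end{pmatrix}\big(M_H^{0,(k)}\big)^{-1}\begin{pmatrix} \big(B_H^{0,(k)}\big)^T & C_k^T \end{pmatrix} \tag{70}$$
are symmetric matrices, positive definite when $c_k > 0$ and positive semidefinite when $c_k = 0$, and $N_k$ are the underlying assembling matrices, $k = 1,\dots,n$. The formula for the vector $\bar q$ in (68) is straightforward to derive.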
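The assembly (69) is usually implemented by scatter-adding each local matrix into the global one rather than by forming the Boolean matrices $N_k$ explicitly. The sketch below does both on a hypothetical two-cell configuration with three faces per cell and one shared face. The local matrices follow (70), with random SPD stand-ins for $M_H^{0,(k)}$, made-up face areas in $B_H^{0,(k)}$, and $C_k$ taken as the identity; the helpers `local_matrix`, `assemble`, and `boolean_N` and the DOF numbering are illustrative assumptions.

```python
import numpy as np

def local_matrix(cE, B0, Ck, M0):
    """A_k of (70): [[c_k|E_k|, 0], [0, 0]] + [B0; C_k] M0^{-1} [B0^T, C_k^T]."""
    G = np.vstack([B0, Ck])                  # (1 + s_k) x s_k
    Ak = G @ np.linalg.solve(M0, G.T)
    Ak[0, 0] += cE
    return Ak

def assemble(local_mats, dof_maps, n_global):
    """A = sum_k N_k A_k N_k^T via scatter-add (each local map has distinct indices)."""
    A = np.zeros((n_global, n_global))
    for Ak, idx in zip(local_mats, dof_maps):
        A[np.ix_(idx, idx)] += Ak
    return A

def boolean_N(idx, n_global):
    """Explicit assembling matrix N_k: column j is the unit vector of global DOF idx[j]."""
    Nk = np.zeros((n_global, len(idx)))
    Nk[idx, np.arange(len(idx))] = 1.0
    return Nk

# Hypothetical mesh: two cells with three faces each, sharing face "c".
# Global unknowns: [p1, p2, lam_a, lam_b, lam_c, lam_d, lam_e].
maps = [np.array([0, 2, 3, 4]), np.array([1, 4, 5, 6])]
rng = np.random.default_rng(6)
mats = []
for cE in (1.0, 0.5):                        # c_k |E_k| > 0 in both cells
    R = rng.standard_normal((3, 3))
    M0 = R @ R.T + 3 * np.eye(3)             # SPD stand-in for M_H^{0,(k)}
    B0 = -rng.uniform(0.5, 1.5, (1, 3))      # -|Gamma_{k,i}|, cf. (11)
    Ck = np.eye(3)                           # hypothetical multiplier coupling
    mats.append(local_matrix(cE, B0, Ck, M0))

A = assemble(mats, maps, 7)
A_ref = sum(boolean_N(i, 7) @ Ak @ boolean_N(i, 7).T for Ak, i in zip(mats, maps))
```

With $c_k = 0$ the stacked matrix $\big(B_H^{0,(k)};\ C_k\big)$ has one more row than columns, so the local matrix (70) is then only positive semidefinite.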
Remark 1. If the function $f$ is constant in $E \equiv E_k$, then the vector $F$ in (23) is given by
$$F = -f_ED\bar e, \tag{71}$$
where $f_E$ is the value of $f$ in $E$, and $F$ belongs to the null space of the matrix $S_h^+$ in (51). Consequently, instead of (54) we have
$$\bar u_h = R_1\bar u_H, \tag{72}$$
and $\bar g$ in (57) is the zero vector. Simple analysis shows that the resulting discretization (66) is equivalent to the “div-const” discretization proposed in [KR03] (see also [KLS04, KR05]).
Remark 2. The previous remark concerns the case $c \equiv 0$ in $E \equiv E_k$, $1 \le k \le n$. Consider now the case when $c$ is a positive function in $E$, the diffusion equation is discretized by the method (16)–(18), and $n_k = 1$ for this cell. Under these assumptions, the equation (the index $k$ is omitted)
$$B_H\bar u_H + B_h\bar u_h - \tilde\Sigma\bar p = F^1, \tag{73}$$
where the matrix $\tilde\Sigma$ and the vector $F^1$ are defined in (20) and (21), respectively, is the counterpart of the third equation in (23). Similarly to (50), we can consider the following formula for the solution subvector $\bar p$:
$$\bar p = S_h^+\big(\tilde B_H\bar u_H - \tilde\Sigma\bar p - F^1\big) + \alpha\bar e \tag{74}$$
with some coefficient $\alpha \in \mathbb{R}$, where
$$S_h = B_hM_h^{-1}B_h^T \tag{75}$$
and $S_h^+$ is defined in (51). The vectors $\tilde\Sigma\bar p$ and $F^1$ belong to $\ker S_h^+$, since both are multiples of $D\bar e$. Therefore, instead of (74) we get
$$\bar p = S_h^+\tilde B_H\bar u_H + \alpha\bar e. \tag{76}$$
This proves that for the discretization method (16)–(18) formula (72) is still valid, and the final discretization (66) is equivalent to the “div-const” discretization in [KR03].
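Both remarks rest on the fact that vectors proportional to $D\bar e$ lie in $\ker S_h^+$. The sketch below checks this for $F = -f_ED\bar e$ (Remark 1) and for $\tilde\Sigma\bar p$ and $F^1$ with $n_k = 1$ (Remark 2), and verifies that (76), with $\alpha$ fixed by the cell balance (10), satisfies (74). All matrices are random stand-ins with $\bar e \in \ker B_h^T$, and the values of $f_E$ and $c$ are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
nH, nh, N = 5, 6, 4
A = rng.standard_normal((nh, nh))
Mh = A @ A.T + nh * np.eye(nh)               # SPD stand-in for M_h
MhH = rng.standard_normal((nh, nH))
BH = rng.standard_normal((N, nH))
Bh = rng.standard_normal((N, nh))
Bh -= Bh.mean(axis=0)                        # e in ker(Bh^T)
vols = rng.uniform(0.1, 1.0, N)              # diagonal of D
D, E, e = np.diag(vols), vols.sum(), np.ones(N)

Sh = Bh @ np.linalg.solve(Mh, Bh.T)          # (75)
BH_t = BH - Bh @ np.linalg.solve(Mh, MhH)    # (25)
s = 1.0 / np.sqrt(vols)
mu, Y = np.linalg.eigh(s[:, None] * Sh * s[None, :])
W = s[:, None] * Y                           # D-orthonormal eigenvectors of (43)
lam_plus = np.zeros(N)
lam_plus[1:] = 1.0 / mu[1:]
Sh_plus = W @ np.diag(lam_plus) @ W.T        # (51)

# Remark 1: F = -f_E D e, cf. (71), should lie in ker S_h^+, so R_2 F = 0 in (54).
fE = 2.5
F = -fE * D @ e

# Remark 2 (n_k = 1, c > 0): Sigma-tilde and F^1 from (20)-(21).
c = 1.5
Sig_t = (c / E) * np.outer(vols, vols)
F1 = -fE * vols
uH = rng.standard_normal(nH)
p0 = Sh_plus @ BH_t @ uH                     # (76) with alpha = 0
# Fix alpha from the cell balance (10): B_H^0 u_H - c|E| p_hat = -|E| f.
B0 = e @ BH
alpha = (B0 @ uH + E * fE) / (c * E) - (vols @ p0) / E
p = p0 + alpha * e
residual = Sh @ p - (BH_t @ uH - Sig_t @ p - F1)   # (74) rearranged
```

Here $\hat p = \bar e^TD\bar p/|E|$ is the plain cell mean, which for constant $c$ coincides with the $c$-weighted mean (9).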
Acknowledgement. This research was supported by Los Alamos Computational Sciences Institute (LACSI) and by ExxonMobil Upstream Research Company. The
author is grateful to S. Maliassov and M. Shashkov for fruitful discussions, as well
as to O. Boyarkin, V. Gvozdev, and D. Svyatskiy for numerical implementation and
applications of the proposed method.
References
[BF91] F. Brezzi and M. Fortin. Mixed and hybrid finite element methods. Springer-Verlag, Berlin, 1991.
[Kuz05] Yu. Kuznetsov. Mixed finite element method in domains of complex geometry. In Abstract Book – 1st International Seminar of SCOMA, number A4/2005 in Reports of the Department of Mathematical Information Technology, Series A, Collections, University of Jyväskylä, Jyväskylä, 2005.
[Kuz06] Yu. Kuznetsov. Mixed finite element method for diffusion equations on polygonal meshes with mixed cells. J. Numer. Math., 14(4):305–315, 2006.
[KLS04] Yu. Kuznetsov, K. Lipnikov, and M. Shashkov. The mimetic finite difference method on polygonal meshes for diffusion-type equations. Comput. Geosci., 8:301–324, 2004.
[KR03] Yu. Kuznetsov and S. Repin. New mixed finite element method on polygonal and polyhedral meshes. Russian J. Numer. Anal. Math. Modelling, 18(3):261–278, 2003.
[KR05] Yu. Kuznetsov and S. Repin. Convergence analysis and error estimates for mixed finite element method on distorted meshes. J. Numer. Math., 13(1):33–51, 2005.
[RT91] J. E. Roberts and J.-M. Thomas. Mixed and hybrid methods. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, Vol. II, pages 523–639. North-Holland, Amsterdam, 1991.
On the Numerical Solution of the Elliptic
Monge–Ampère Equation in Dimension Two:
A Least-Squares Approach
Edward J. Dean and Roland Glowinski
University of Houston, Department of Mathematics, 651 P. G. Hoffman Hall,
Houston, TX 77204-3008, USA roland@math.uh.edu, dean@math.uh.edu
1 Introduction
During his outstanding career, Olivier Pironneau has addressed the solution of a large variety of problems from the Natural Sciences, Engineering, and Finance, to name a few; evidence of this activity is provided by the many articles and books he has written. It is the opinion of these authors, former collaborators of O. Pironneau (cf. [DGP91]), that this chapter is well suited to a volume honoring him. Indeed, the two pillars of the solution methodology that we are going to describe are: (1) a nonlinear least-squares formulation in an appropriate Hilbert space, and (2) a mixed finite element approximation, reminiscent of the one used in [DGP91] and [GP79] for solving the Stokes and Navier–Stokes equations in their stream function-vorticity formulation; the contributions of O. Pironneau on these two topics are well known worldwide. Last but not least, we will show that the solution method discussed here can be viewed as a solution method for a non-standard variant of the incompressible Navier–Stokes equations, an area where O. Pironneau has made many outstanding and celebrated contributions (cf. [Pir89], for example).
The main goal of this article is to discuss the numerical solution of the Dirichlet problem for the prototypical two-dimensional elliptic Monge–Ampère equation, namely
$$\det D^2\psi = f \ \text{in } \Omega, \qquad \psi = g \ \text{on } \Gamma. \tag{E-MA-D}$$
In (E-MA-D): (1) $\Omega$ is a bounded domain of $\mathbb{R}^2$ and $\Gamma$ is its boundary; (2) $f$ and $g$ are given functions with $f > 0$; (3) $D^2\psi = (\partial^2\psi/\partial x_i\partial x_j)_{1\le i,j\le 2}$ is the Hessian of the unknown function $\psi$. The partial differential equation in (E-MA-D) is a fully nonlinear elliptic one (in the sense of, e.g., Gilbarg and Trudinger [GT01] and Caffarelli and Cabré [CC95]). The mathematical analysis of problems such as (E-MA-D) has produced an abundant literature; let us mention, among many others, [GT01, CC95, Aub82, Aub98, Cab02] and the references therein. On the other hand, to the best of our knowledge, the numerical analysis community has so far largely ignored these problems, notable exceptions being [BB00, OP88, CKO99] (see also [DG03, DG04]). Indeed, we cannot resist quoting [BB00] (an article dedicated to the numerical solution of the celebrated Monge–Kantorovitch optimal transportation problem):
“It follows from this theoretical result that a natural computational
solution of the L2 MKP is the numerical resolution of the Monge–
Ampère equation (6). Unfortunately, this fully nonlinear second-order
elliptic equation has not received much attention from numerical analysts and, to the best of our knowledge, there is no efficient finite-difference or finite-element methods, comparable to those developed
for linear second-order elliptic equations (such as fast Poisson solvers,
multigrid methods, preconditioned conjugate gradient methods, . . . ).”
We will show in this article that, actually, fully nonlinear elliptic problems such as (E-MA-D) can be solved by appropriate combinations of fast Poisson solvers and preconditioned conjugate gradient methods. However, unlike the (closely related) Dirichlet problem for the Laplace operator, problem (E-MA-D) may have multiple solutions (actually, at most two; cf., e.g., [CH89, Chapter 4]), and the smoothness of the data does not imply the existence of a smooth solution. Concerning the last property, suppose that $\Omega = (0, 1)\times(0, 1)$ and consider the special case where (E-MA-D) is defined by
$$\frac{\partial^2\psi}{\partial x_1^2}\frac{\partial^2\psi}{\partial x_2^2} - \left(\frac{\partial^2\psi}{\partial x_1\partial x_2}\right)^2 = 1 \ \text{in } \Omega, \qquad \psi = 0 \ \text{on } \Gamma. \tag{1}$$
Problem (1) cannot have smooth solutions since, for such solutions, the boundary condition $\psi = 0$ on $\Gamma$ implies that the product $(\partial^2\psi/\partial x_1^2)(\partial^2\psi/\partial x_2^2)$ vanishes on $\Gamma$ (on each side of the square, the second derivative along that side vanishes), so that $\det D^2\psi = -(\partial^2\psi/\partial x_1\partial x_2)^2 \le 0$ there; this implies in turn that $\det D^2\psi$ is strictly less than one in some neighborhood of $\Gamma$. The above (non-existence) result is not a consequence of the non-smoothness of $\Gamma$, since a similar non-existence property holds if in (1) one replaces the above $\Omega$ by the ovoid-shaped domain whose $C^\infty$-boundary is defined by
$$\Gamma = \bigcup_{i=1}^4 \Gamma_i,$$
with
$$\begin{aligned} \Gamma_1 &= \{x = \{x_1, x_2\} : x_2 = 0,\ 0 \le x_1 \le 1\},\\ \Gamma_2 &= \{x = \{x_1, x_2\} : x_1 = 1 - \ln 4/\ln(x_2(1 - x_2)),\ 0 \le x_2 \le 1\},\\ \Gamma_3 &= \{x = \{x_1, x_2\} : x_2 = 1,\ 0 \le x_1 \le 1\},\\ \Gamma_4 &= \{x = \{x_1, x_2\} : x_1 = \ln 4/\ln(x_2(1 - x_2)),\ 0 \le x_2 \le 1\}. \end{aligned}$$
Actually, for the above two $\Omega$s, the non-existence of solutions to problem (1) follows from the non-strict convexity of these domains. Although problem (1) has no classical solution, it has viscosity solutions in the sense of Crandall–Lions, as shown in, e.g., [CC95, Cab02, Jan88, Urb88, CIL92]. The Crandall–Lions viscosity approach relies heavily on the maximum principle, unlike the variational methods used to solve, for example, second order linear elliptic equations in divergence form in some appropriate subspace of the Hilbert space $H^1(\Omega)$. The least-squares approach discussed in this article operates in the space $H^2(\Omega)\times Q$, where $Q$ is the Hilbert space of the $2\times 2$ symmetric tensor-valued functions with components in $L^2(\Omega)$. Combined with mixed finite element approximations and operator-splitting methods, it is able, if $g$ has $H^{3/2}(\Gamma)$-regularity, to capture classical solutions, if such solutions exist, and to compute generalized solutions to problems like (1) which have no classical solution. Actually, we will show that these generalized solutions are also viscosity solutions, but in a sense different from that of Crandall–Lions.
Remark 1. Suppose that $\Omega$ is simply connected. Let us define a vector-valued function $u$ by $u = \{\partial\psi/\partial x_2, -\partial\psi/\partial x_1\}$ ($= \{u_1, u_2\}$). Problem (E-MA-D) then takes the equivalent formulation
$$\begin{cases} \det\nabla u = f \ \text{in } \Omega, \quad \nabla\cdot u = 0 \ \text{in } \Omega,\\[2pt] u\cdot n = \dfrac{dg}{ds} \ \text{on } \Gamma, \end{cases} \tag{2}$$
where $n$ denotes the outward unit vector normal to $\Gamma$, and $s$ is a counterclockwise curvilinear abscissa. Once $u$ is known, one obtains $\psi$ via the solution of the following Poisson–Dirichlet problem:
$$-\Delta\psi = \frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} \ \text{in } \Omega, \qquad \psi = g \ \text{on } \Gamma.$$
Problem (2) clearly has an incompressible fluid flow flavor, $\psi$ playing here the role of a stream function. The relations (2) can be used to solve problem (E-MA-D), but this approach will not be investigated further here.
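The equivalence stated in Remark 1 can be checked by a direct computation. The following sketch assumes $\psi$ is smooth enough (say $\psi \in C^2(\bar\Omega)$) and uses the shorthand $\psi_{ij} = \partial^2\psi/\partial x_i\partial x_j$ and the counterclockwise unit tangent $t = \{-n_2, n_1\}$, both introduced only for this check:
$$\nabla u = \begin{pmatrix} \partial u_1/\partial x_1 & \partial u_1/\partial x_2\\ \partial u_2/\partial x_1 & \partial u_2/\partial x_2 \end{pmatrix} = \begin{pmatrix} \psi_{12} & \psi_{22}\\ -\psi_{11} & -\psi_{12} \end{pmatrix},$$
$$\det\nabla u = \psi_{11}\psi_{22} - \psi_{12}^2 = \det D^2\psi, \qquad \nabla\cdot u = \psi_{12} - \psi_{12} = 0,$$
$$\frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} = -\psi_{11} - \psi_{22} = -\Delta\psi, \qquad u\cdot n = \frac{\partial\psi}{\partial x_2}n_1 - \frac{\partial\psi}{\partial x_1}n_2 = \nabla\psi\cdot t = \frac{dg}{ds} \ \text{on } \Gamma.$$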
Remark 2. As shown in [DG05], the methodology discussed in this article also applies (among other problems) to the Pucci–Dirichlet problem
$$\alpha\lambda^+ + \lambda^- = 0 \ \text{in } \Omega, \qquad \psi = g \ \text{on } \Gamma, \tag{PUC-D}$$
with $\lambda^+$ (resp., $\lambda^-$) the largest (resp., the smallest) eigenvalue of $D^2\psi$ and $\alpha \in (1, +\infty)$. (If $\alpha = 1$, one recovers the linear Poisson–Dirichlet problem.)
Remark 3. A shortened version of this article can be found in [DG04].
Remark 4. The solution of (E-MA-D) by augmented Lagrangian methods is
discussed in [DG03, DG06a, DG06b].
2 A Least Squares Formulation of the Problem (E-MA-D)
From now on, we suppose that $f > 0$ and that $\{f, g\} \in L^1(\Omega)\times H^{3/2}(\Gamma)$, implying that the following space and set are non-empty:
$$V_g = \{\varphi \mid \varphi \in H^2(\Omega),\ \varphi = g \text{ on } \partial\Omega\}, \qquad Q_f = \{q \mid q \in Q,\ \det q = f\},$$
with
$$Q = \{q \mid q \in (L^2(\Omega))^{2\times 2},\ q = q^t\}.$$
Solving the Monge–Ampère equation in $H^2(\Omega)$ is equivalent to looking for the intersection in $Q$ of the two sets $D^2V_g$ and $Q_f$, an infinite dimensional geometry problem “visualized” in Figures 1 and 2.
If $D^2V_g \cap Q_f \ne \emptyset$, as “shown” in Figure 1, then problem (E-MA-D) has a solution in $H^2(\Omega)$. If, on the other hand, it is the situation of Figure 2 which prevails, namely $D^2V_g \cap Q_f = \emptyset$, then (E-MA-D) has no solution in $H^2(\Omega)$. However, Figure 2 is constructive in the sense that it suggests looking for a pair $\{\psi, p\}$ which minimizes, globally or locally, some distance between $D^2\varphi$ and $q$ when $\{\varphi, q\}$ ranges over the set $V_g\times Q_f$.
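According to the above suggestion, and in order to handle those situations where (E-MA-D) has no solution in $H^2(\Omega)$ despite the fact that neither $V_g$ nor $Q_f$ is empty, we suggest solving the above problem via the following (nonlinear) least-squares formulation:
$$\begin{cases} \text{Find } \{\psi, p\} \in V_g\times Q_f \text{ such that}\\ j(\psi, p) \le j(\varphi, q), \quad \forall\{\varphi, q\} \in V_g\times Q_f, \end{cases} \tag{LSQ}$$
where, in (LSQ) and below, we have (with $dx = dx_1dx_2$):
$$j(\varphi, q) = \frac12\int_\Omega |D^2\varphi - q|^2\,dx \tag{3}$$
and
$$|q| = \big(q_{11}^2 + q_{22}^2 + 2q_{12}^2\big)^{1/2}, \quad \forall q\,(= (q_{ij})_{1\le i,j\le 2}) \in Q. \tag{4}$$
[Figure 1: sketch in $Q$ of the sets $D^2V_g$ and $Q_f$ intersecting, with $p = D^2\psi$ in the intersection]
Fig. 1. Problem (E-MA-D) has a solution in $H^2(\Omega)$.
[Figure 2: sketch in $Q$ of disjoint sets $D^2V_g$ and $Q_f$, with $D^2\psi \in D^2V_g$ and $p \in Q_f$]
Fig. 2. Problem (E-MA-D) has no solution in $H^2(\Omega)$.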
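To make (3)–(4) concrete, the following sketch evaluates $j(\varphi, q)$ on the unit square with a finite-difference Hessian on a uniform grid and a crude mean-value quadrature. This discretization is an assumption of the sketch and not the mixed finite element approximation used in this chapter; the helper names are hypothetical. For $\varphi = (x_1^2 + x_2^2)/2$ (so $D^2\varphi = I$) and the constant tensor $q = \operatorname{diag}(2, 1/2)$, which lies in $Q_f$ for $f = 1$, the exact value is $j = \tfrac12(1 + 1/4) = 0.625$, and the second-order differences used below are exact for this quadratic.

```python
import numpy as np

def frob_sq(S):
    """|S|^2 as in (4): s11^2 + s22^2 + 2 s12^2, with S given as (s11, s12, s22)."""
    s11, s12, s22 = S
    return s11**2 + s22**2 + 2.0 * s12**2

def hessian_fd(phi, h):
    """Finite-difference Hessian (phi_11, phi_12, phi_22) of grid values phi[i, j]."""
    d1, d2 = np.gradient(phi, h, edge_order=2)
    d11, d12 = np.gradient(d1, h, edge_order=2)
    d22 = np.gradient(d2, h, edge_order=2)[1]
    return d11, d12, d22

def j_functional(phi, q, h):
    """Crude approximation of j(phi, q) in (3) on the unit square (mean value times area 1)."""
    diff = [a - b for a, b in zip(hessian_fd(phi, h), q)]
    return 0.5 * float(np.mean(frob_sq(diff)))

n = 21
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")
phi = 0.5 * (X1**2 + X2**2)                  # D^2 phi = I
ones = np.ones_like(phi)
q = (2.0 * ones, 0.0 * ones, 0.5 * ones)     # det q = 1, so q lies in Q_f for f = 1
j_val = j_functional(phi, q, h)              # should be 0.625 up to rounding
```

Swapping `frob_sq` for the variant without the factor 2 on $q_{12}^2$ is all it takes to experiment with the alternative norm (5) of Remark 5.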
Remark 5. The results described in [DG05] concerning the numerical solution of Pucci's problem (PUC-D) (see Remark 2) suggest that defining $|q|$ by
$$|q| = \big(q_{11}^2 + q_{22}^2 + q_{12}^2\big)^{1/2}, \quad \forall q\,(= (q_{ij})_{1\le i,j\le 2}) \in Q, \tag{5}$$
instead of (4), may improve the convergence of the algorithms to be described in the following sections. We intend to check this conjecture in the near future.
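In order to solve (LSQ) by operator-splitting techniques, it is convenient to observe that (LSQ) is equivalent to
$$\text{(LSQ-P)}\qquad \begin{cases} \{\psi, p\} \in V_g \times Q,\\ j_f(\psi, p) \le j_f(\varphi, q),\quad \forall \{\varphi, q\} \in V_g \times Q, \end{cases}$$
where
$$j_f(\varphi, q) = j(\varphi, q) + I_f(q),\quad \forall \{\varphi, q\} \in V_g \times Q, \tag{6}$$
with
$$I_f(q) = \begin{cases} 0, & \text{if } q \in Q_f,\\ +\infty, & \text{if } q \in Q \setminus Q_f, \end{cases}$$
i.e., $I_f(\cdot)$ is the indicator functional of the set $Q_f$.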
3 An Operator-Splitting Based Method for the Solution of (E-MA-D) via (LSQ-P)
We can solve the least-squares problem (LSQ) by a block relaxation method operating alternately between $V_g$ and $Q_f$. Such relaxation algorithms are discussed in, e.g., [Glo84]. Closely related algorithms are obtained as follows:
Step 1. Derive the Euler–Lagrange equation of (LSQ-P).
Step 2. Associate with the above Euler–Lagrange equation an initial value problem (a flow, in the dynamical systems terminology) in $V_g \times Q$.
Step 3. Use operator-splitting to time-discretize the above flow problem.
Applying the above program, Step 1 provides us with the Euler–Lagrange equation of the problem (LSQ-P). A variational formulation of this equation reads as follows:
$$\begin{cases} \{\psi, p\} \in V_g \times Q,\\ \displaystyle\int_\Omega (D^2\psi - p) : (D^2\varphi - q)\,dx + \langle \partial I_f(p), q \rangle = 0,\quad \forall \{\varphi, q\} \in V_0 \times Q, \end{cases} \tag{7}$$
where $\partial I_f(p)$ denotes a generalized differential of the functional $I_f(\cdot)$ at $p$. Next, we have denoted by $S : T$ the Frobenius scalar product of the two $2 \times 2$ symmetric tensors $S\ (= (s_{ij}))$ and $T\ (= (t_{ij}))$, namely
$$S : T = s_{11}t_{11} + s_{22}t_{22} + 2s_{12}t_{12},$$
and, finally,
$$V_0 = H^2(\Omega) \cap H^1_0(\Omega).$$
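Next, we achieve Step 2 by associating with (7) the following initial value problem (flow), written in semi-variational form:
$$\begin{cases} \text{Find } \{\psi(t), p(t)\} \in V_g \times Q \text{ for all } t > 0 \text{ such that}\\ \displaystyle\int_\Omega \frac{\partial(\Delta\psi)}{\partial t}\,\Delta\varphi\,dx + \int_\Omega D^2\psi : D^2\varphi\,dx = \int_\Omega p : D^2\varphi\,dx,\quad \forall \varphi \in V_0,\\ \dfrac{\partial p}{\partial t} + p + \partial I_f(p) = D^2\psi,\\ \{\psi(0), p(0)\} = \{\psi_0, p_0\}, \end{cases} \tag{8}$$
and we look at the limit of $\{\psi(t), p(t)\}$ as $t \to +\infty$. The choice of $\psi_0$ and $p_0$ will be discussed in Remark 6.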
Finally, concerning Step 3, we advocate the following operator-splitting scheme (à la Marchuk–Yanenko; see, e.g., [Glo03, Chapter 6] and the references therein), but we acknowledge that other splitting schemes are possible:
$$\{\psi^0, p^0\} = \{\psi_0, p_0\}. \tag{9}$$
Then, for $n \ge 0$, $\{\psi^n, p^n\}$ being known, we obtain $\{\psi^{n+1}, p^{n+1}\}$ from the solution of
$$\frac{p^{n+1} - p^n}{\tau} + p^{n+1} + \partial I_f(p^{n+1}) = D^2\psi^n, \tag{10}$$
$$\begin{cases} \psi^{n+1} \in V_g;\\ \displaystyle\int_\Omega \Delta\!\left(\frac{\psi^{n+1} - \psi^n}{\tau}\right)\Delta\varphi\,dx + \int_\Omega D^2\psi^{n+1} : D^2\varphi\,dx = \int_\Omega p^{n+1} : D^2\varphi\,dx,\quad \forall \varphi \in V_0; \end{cases} \tag{11}$$
above, $\tau\ (> 0)$ is a time-discretization step.
The solution of the sub-problems (10) and (11) will be discussed in Sections
4 and 5, respectively.
Remark 6. The initialization of the flow defined by (8) and of its time-discrete variant defined by (9)–(11) is clearly an important issue. Let us denote by $\lambda_1$ and $\lambda_2$ the eigenvalues of the Hessian $D^2\psi$. It follows from (E-MA-D) that $\lambda_1\lambda_2 = f$, implying in turn that
$$\sqrt{\lambda_1\lambda_2} = \sqrt{f}. \tag{12}$$
We have, on the other hand,
$$|\Delta\psi| = |\lambda_1 + \lambda_2|. \tag{13}$$
Suppose that we look for a convex solution of (E-MA-D). Then $\lambda_1$ and $\lambda_2$ are positive. Comparing (12) (geometric mean) and (13) (arithmetic mean) suggests defining $\psi_0$ as the solution of
$$\Delta\psi_0 = 2\sqrt{f} \text{ in } \Omega,\qquad \psi_0 = g \text{ on } \Gamma. \tag{14}$$
If we look for a concave solution, we suggest defining $\psi_0$ as the solution of
$$-\Delta\psi_0 = 2\sqrt{f} \text{ in } \Omega,\qquad \psi_0 = g \text{ on } \Gamma. \tag{15}$$
If $\{f, g\} \in L^1(\Omega) \times H^{3/2}(\Gamma)$, then $\{\sqrt{f}, g\} \in L^2(\Omega) \times H^{3/2}(\Gamma)$, implying that each of the problems (14) and (15) has a unique solution in $V_g$ (assuming, of course, that $\Omega$ is convex and/or that $\Gamma$ is sufficiently smooth). Concerning $p_0$, an obvious choice is provided by
$$p_0 = D^2\psi_0, \tag{16}$$
another possibility being
$$p_0 = \begin{pmatrix} \sqrt{f} & 0\\ 0 & \sqrt{f} \end{pmatrix}. \tag{17}$$
The symmetric tensor defined by (17) clearly belongs to $Q_f$.
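The comparison of (12) and (13) is the arithmetic–geometric mean inequality. In the convex case it reads as follows, which shows that (14) is exact precisely when the Hessian is a multiple of the identity, i.e., when $D^2\psi$ coincides with the tensor (17):
$$\lambda_1, \lambda_2 > 0:\qquad \Delta\psi = \lambda_1 + \lambda_2 \;\ge\; 2\sqrt{\lambda_1\lambda_2} = 2\sqrt{f},\qquad \text{with equality if and only if } \lambda_1 = \lambda_2 = \sqrt{f}.$$
In the concave case, the same inequality applied to $-\lambda_1, -\lambda_2 > 0$ gives $-\Delta\psi \ge 2\sqrt{f}$, which motivates (15) in the same way.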
4 On the Solution of the Nonlinear Sub-Problems (10)
Concerning the solution of the sub-problems of type (10), we interpret (10) as the Euler–Lagrange equation of the following minimization problem:
$$\begin{cases} p^{n+1} \in Q_f,\\ J_n(p^{n+1}) \le J_n(q),\quad \forall q \in Q_f, \end{cases} \tag{18}$$
with
$$J_n(q) = \frac{1+\tau}{2}\int_\Omega |q|^2\,dx - \int_\Omega (p^n + \tau D^2\psi^n) : q\,dx. \tag{19}$$
It follows from (19) that the problem (18) can be solved point-wise on Ω
(in practice, at the grid points of a finite element or finite difference mesh).
To be more precise, we have to solve, a.e. on Ω, a minimization problem of
the following type:
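$$\begin{cases} \displaystyle\min_z \left[\frac12\left(z_1^2 + z_2^2 + 2z_3^2\right) - b_1(x)z_1 - b_2(x)z_2 - 2b_3(x)z_3\right]\\ \text{with } z = \{z_i\}_{i=1}^3 \in \{z \mid z \in \mathbb{R}^3,\ z_1z_2 - z_3^2 = f(x)\}. \end{cases} \tag{20}$$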
Actually, if one looks for convex (resp., concave) solutions of (E-MA-D), one should prescribe the following additional constraints: $z_1 \ge 0$, $z_2 \ge 0$ (resp., $z_1 \le 0$, $z_2 \le 0$). For the solution of the problem (20) (a constrained minimization problem in $\mathbb{R}^3$), we advocate those methods discussed in, e.g., [DS96] (after introduction of a Lagrange multiplier to handle the constraint $z_1z_2 - z_3^2 = f(x)$). Other methods are possible, including the reduction of (20) to a two-dimensional problem via the elimination of $z_3$. Indeed, we observe that (20) is equivalent to
$$\begin{cases} \displaystyle\min_z \left[\frac12(z_1 + z_2)^2 - b_1(x)z_1 - b_2(x)z_2 - 2|b_3(x)|\,(z_1z_2 - f(x))^{1/2}\right]\\ \text{with } z\ (= \{z_i\}_{i=1}^3) \in \{z \mid z \in \mathbb{R}^3,\ z_1z_2 - f(x) \ge 0,\ z_3 = \operatorname{sgn}(b_3(x))\,(z_1z_2 - f(x))^{1/2}\}, \end{cases} \tag{21}$$
which leads to the above-mentioned reduction; we then make the solution of the problem (21) “almost” trivial by using the following change of variables (reminiscent of the polar coordinate based technique used in [DG05] for the solution of Pucci’s equation (PUC-D), introduced in Remark 2):
$$z_1 = \rho\sqrt{f}\,e^{\theta},\qquad z_2 = \rho\sqrt{f}\,e^{-\theta},$$
with $\theta \in \mathbb{R}$ and $\rho \ge 1$ (resp., $\rho \le -1$) if one looks for a convex (resp., concave) solution of (E-MA-D); since $z_1z_2 - f = (\rho^2 - 1)f \ge 0$, the constraint in (21) is then automatically satisfied.
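The reduction of (20) to the $(\rho, \theta)$ variables can be sketched in plain Python for the convex case ($\rho \ge 1$). This is a hedged illustration, not the authors' solver. The identification $b = (p^n + \tau D^2\psi^n)(x)/(1 + \tau)$, with $(b_1, b_2, b_3)$ the $(11)$, $(22)$ and $(12)$ components, follows from (19) after division by $1 + \tau$. The coarse starting grid, its bounds, and the compass (pattern) search used to minimize over $(\rho, \theta)$ are our own assumptions, and the function names are hypothetical.

```python
import math


def local_objective(z, b):
    """Objective of (20): 0.5*(z1^2 + z2^2 + 2*z3^2) - b1*z1 - b2*z2 - 2*b3*z3."""
    z1, z2, z3 = z
    b1, b2, b3 = b
    return 0.5 * (z1 * z1 + z2 * z2 + 2.0 * z3 * z3) - b1 * z1 - b2 * z2 - 2.0 * b3 * z3


def point_from_polar(rho, theta, f, b3):
    """Map (rho, theta), rho >= 1, to the convex branch of {z1*z2 - z3^2 = f}."""
    s = math.sqrt(f)
    z1 = rho * s * math.exp(theta)
    z2 = rho * s * math.exp(-theta)
    # z3 = sgn(b3) * (z1*z2 - f)^(1/2), and z1*z2 - f = f*(rho^2 - 1) >= 0.
    z3 = math.copysign(s * math.sqrt(max(rho * rho - 1.0, 0.0)), b3)
    return z1, z2, z3


def solve_local_problem(b, f, n_grid=41, tol=1e-10):
    """Approximately minimize (20) over z1, z2 > 0 using the (rho, theta) variables."""
    b3 = b[2]

    def F(rho, theta):
        return local_objective(point_from_polar(rho, theta, f, b3), b)

    # Coarse grid search for a starting point; the box is a heuristic assumption.
    rho_max = 2.0 + sum(abs(c) for c in b) / math.sqrt(f)
    val, rho, theta = min(
        (F(r, t), r, t)
        for r in (1.0 + (rho_max - 1.0) * i / (n_grid - 1) for i in range(n_grid))
        for t in (-4.0 + 8.0 * j / (n_grid - 1) for j in range(n_grid))
    )
    # Compass search: accept the first improving +/- step, halve the step otherwise.
    step = 0.5
    while step > tol:
        for d_rho, d_theta in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            r, t = max(rho + d_rho, 1.0), theta + d_theta
            v = F(r, t)
            if v < val:
                val, rho, theta = v, r, t
                break
        else:
            step *= 0.5
    return point_from_polar(rho, theta, f, b3)
```

For instance, `solve_local_problem((2.0, 1.0, 0.5), 1.0)` returns a point satisfying $z_1z_2 - z_3^2 = 1$ up to rounding; in a full solver, such a call would be made at every grid point (or interior vertex).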
5 On the Conjugate Gradient Solution of the Linear Sub-Problems (11)
The sub-problems (11) are all members of the following family of linear variational problems:
$$\begin{cases} u \in V_g,\\ \displaystyle\int_\Omega \Delta u\,\Delta v\,dx + \tau\int_\Omega D^2u : D^2v\,dx = L(v),\quad \forall v \in V_0, \end{cases} \tag{22}$$
with the functional $L$ linear and continuous from $H^2(\Omega)$ into $\mathbb{R}$; the problems in (22) are clearly of the biharmonic type. The conjugate gradient solution of linear variational problems in Hilbert spaces, such as (22), has been addressed in, e.g., [Glo03, Chapter 3]. Following the above reference, we are going to solve (22) by a conjugate gradient algorithm operating in the spaces $V_0$ and $V_g$, both spaces being equipped with the scalar product defined by
$$\{v, w\} \to \int_\Omega \Delta v\,\Delta w\,dx,$$
and the corresponding norm. This conjugate gradient algorithm reads as follows:
Algorithm 1
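Step 1. $u^0$ is given in $V_g$.
Step 2. Solve then
$$\begin{cases} g^0 \in V_0,\\ \displaystyle\int_\Omega \Delta g^0\,\Delta v\,dx = \int_\Omega \Delta u^0\,\Delta v\,dx + \tau\int_\Omega D^2u^0 : D^2v\,dx - L(v),\quad \forall v \in V_0, \end{cases} \tag{23}$$
and set $w^0 = g^0$.
Step 3. Then, for $k \ge 0$, $u^k$, $g^k$, $w^k$ being known, the last two different from 0, we compute $u^{k+1}$, $g^{k+1}$, and if necessary $w^{k+1}$, as follows. Solve
$$\begin{cases} \bar g^k \in V_0,\\ \displaystyle\int_\Omega \Delta\bar g^k\,\Delta v\,dx = \int_\Omega \Delta w^k\,\Delta v\,dx + \tau\int_\Omega D^2w^k : D^2v\,dx,\quad \forall v \in V_0, \end{cases} \tag{24}$$
and compute
$$\rho_k = \frac{\int_\Omega |\Delta g^k|^2\,dx}{\int_\Omega \Delta\bar g^k\,\Delta w^k\,dx}, \tag{25}$$
$$u^{k+1} = u^k - \rho_k w^k, \tag{26}$$
and
$$g^{k+1} = g^k - \rho_k\bar g^k. \tag{27}$$
Step 4. If $\int_\Omega |\Delta g^{k+1}|^2\,dx \big/ \int_\Omega |\Delta g^0|^2\,dx \le \text{tol}$, take $u = u^{k+1}$; else, compute
$$\gamma_k = \frac{\int_\Omega |\Delta g^{k+1}|^2\,dx}{\int_\Omega |\Delta g^k|^2\,dx}, \tag{28}$$
$$w^{k+1} = g^{k+1} + \gamma_k w^k. \tag{29}$$
Step 5. Do $k = k + 1$ and return to Step 3.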
Numerical experiments have shown that Algorithm 1 (in fact, its discrete variants) has excellent convergence properties when applied to the solution of (E-MA-D). Combined with an appropriate mixed finite element approximation of (E-MA-D), it requires the solution of two discrete Poisson problems at each iteration.
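Read in matrix form, Algorithm 1 is a preconditioned conjugate gradient method. In the NumPy sketch below (an illustration under stated assumptions, not the authors' code), `A` stands for the matrix of $\{v, w\} \to \int_\Omega \Delta v\,\Delta w\,dx + \tau\int_\Omega D^2v : D^2w\,dx$ on some discrete subspace of $V_0$, `B` for the matrix of the scalar product $\{v, w\} \to \int_\Omega \Delta v\,\Delta w\,dx$, and `l` for the vector representing $L$. Boundary data are taken homogeneous, so all iterates stay in the discrete $V_0$. Problems (23) and (24) then become linear solves with `B`, and the denominator of (25) equals $w^TAw$. Dense random matrices replace the actual finite element matrices in the check, and the function name is hypothetical.

```python
import numpy as np


def pcg_algorithm1(A, B, l, u0, tol=1e-14, max_iter=500):
    """Solve A u = l by the conjugate gradient iteration (23)-(29).

    A : matrix of the bilinear form of (22) (symmetric positive definite),
    B : matrix of the scalar product (Delta v, Delta w) used by Algorithm 1,
    l : vector representing the linear functional L.
    """
    u = np.array(u0, dtype=float)
    # (23): B g0 = A u0 - l, then w0 = g0.
    g = np.linalg.solve(B, A @ u - l)
    w = g.copy()
    gBg0 = g @ (B @ g)
    gBg = gBg0
    for _ in range(max_iter):
        if gBg0 == 0.0:          # u0 already solves the problem
            break
        # (24): B gbar = A w.
        gbar = np.linalg.solve(B, A @ w)
        # (25): rho = (g, g)_B / (gbar, w)_B, where (gbar, w)_B = w^T A w.
        rho = gBg / (gbar @ (B @ w))
        u -= rho * w             # (26)
        g -= rho * gbar          # (27)
        gBg_new = g @ (B @ g)
        if gBg_new / gBg0 <= tol:    # stopping test of Step 4
            break
        gamma = gBg_new / gBg    # (28)
        w = g + gamma * w        # (29)
        gBg = gBg_new
    return u
```

The preconditioner `B` is what makes each iteration cost two Poisson-type solves once the mixed approximation of Section 6 is used.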
6 On a Mixed Finite Element Approximation of the Problem (E-MA-D)
6.1 Generalities
Considering the highly variational flavor of the methodology discussed in Sections 2 to 5, it makes sense to look for finite element based methods for the approximation of (E-MA-D). In order to avoid the complications associated with the construction of finite element subspaces of $H^2(\Omega)$, we will employ a mixed finite element approximation (closely related to those discussed in, e.g., [DGP91, GP79] for the solution of linear and nonlinear biharmonic problems). Following this approach, it will be possible to solve (E-MA-D) employing approximations commonly used for the solution of second order elliptic problems (piecewise linear and globally continuous over a triangulation of $\Omega$, for example).
6.2 A Mixed Finite Element Approximation
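For simplicity, we suppose that $\Omega$ is a bounded polygonal domain of $\mathbb{R}^2$. Let us denote by $\mathcal{T}_h$ a finite element triangulation of $\Omega$ (like those discussed in, e.g., [Glo84, Appendix 1]). From $\mathcal{T}_h$, we approximate the spaces $L^2(\Omega)$, $H^1(\Omega)$ and $H^2(\Omega)$ by the finite dimensional space $V_h$ defined by
$$V_h = \{v \mid v \in C^0(\bar\Omega),\ v|_T \in P_1,\ \forall T \in \mathcal{T}_h\}, \tag{30}$$
with $P_1$ the space of the two-variable polynomials of degree $\le 1$. A function $\varphi$ being given in $H^2(\Omega)$, we denote $\dfrac{\partial^2\varphi}{\partial x_i\partial x_j}$ by $D^2_{ij}(\varphi)$. It follows from Green’s formula that
$$\int_\Omega \frac{\partial^2\varphi}{\partial x_i^2}\,v\,dx = -\int_\Omega \frac{\partial\varphi}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx,\quad \forall v \in H^1_0(\Omega),\ \forall i = 1, 2, \tag{31}$$
$$\int_\Omega \frac{\partial^2\varphi}{\partial x_1\partial x_2}\,v\,dx = -\frac12\int_\Omega \left(\frac{\partial\varphi}{\partial x_1}\frac{\partial v}{\partial x_2} + \frac{\partial\varphi}{\partial x_2}\frac{\partial v}{\partial x_1}\right)dx,\quad \forall v \in H^1_0(\Omega). \tag{32}$$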
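Consider now $\varphi \in V_h$. Taking advantage of the relations (31) and (32), we define the discrete analogues of the differential operators $D^2_{ij}$ by
$$\begin{cases} \forall i = 1, 2,\ D^2_{hii}(\varphi) \in V_{0h},\\ \displaystyle\int_\Omega D^2_{hii}(\varphi)\,v\,dx = -\int_\Omega \frac{\partial\varphi}{\partial x_i}\frac{\partial v}{\partial x_i}\,dx,\quad \forall v \in V_{0h}, \end{cases} \tag{33}$$
$$\begin{cases} D^2_{h12}(\varphi) \in V_{0h},\\ \displaystyle\int_\Omega D^2_{h12}(\varphi)\,v\,dx = -\frac12\int_\Omega \left(\frac{\partial\varphi}{\partial x_1}\frac{\partial v}{\partial x_2} + \frac{\partial\varphi}{\partial x_2}\frac{\partial v}{\partial x_1}\right)dx,\quad \forall v \in V_{0h}, \end{cases} \tag{34}$$
where the space $V_{0h}$ is defined by
$$V_{0h} = V_h \cap H^1_0(\Omega)\ \left(= \{v \mid v \in V_h,\ v = 0 \text{ on } \Gamma\}\right). \tag{35}$$
The functions $D^2_{hij}(\varphi)$ are uniquely defined by the relations (33) and (34).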
However, in order to simplify the computation of the above discrete second order partial derivatives we will use the trapezoidal rule to evaluate the integrals
in the left hand sides of (33) and (34). Owing to their practical importance,
let us detail these calculations:
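1. First we introduce the set $\Sigma_h$ of the vertices of $\mathcal{T}_h$ and then $\Sigma_{0h} = \{P \mid P \in \Sigma_h,\ P \notin \Gamma\}$. Next, we define the integers $N_h$ and $N_{0h}$ by $N_h = \operatorname{Card}(\Sigma_h)$ and $N_{0h} = \operatorname{Card}(\Sigma_{0h})$. We then have $\dim V_h = N_h$ and $\dim V_{0h} = N_{0h}$. We suppose that $\Sigma_{0h} = \{P_k\}_{k=1}^{N_{0h}}$ and $\Sigma_h = \Sigma_{0h} \cup \{P_k\}_{k=N_{0h}+1}^{N_h}$.
2. To $P_k \in \Sigma_h$ we associate the function $w_k$ uniquely defined by
$$w_k \in V_h,\quad w_k(P_k) = 1,\quad w_k(P_l) = 0 \text{ if } l = 1, \ldots, N_h,\ l \neq k. \tag{36}$$
It is well known (see, e.g., [Glo84, Appendix 1]) that the sets $B_h = \{w_k\}_{k=1}^{N_h}$ and $B_{0h} = \{w_k\}_{k=1}^{N_{0h}}$ are vector bases of $V_h$ and $V_{0h}$, respectively.
3. Let us denote by $A_k$ the area of the polygon which is the union of those triangles of $\mathcal{T}_h$ which have $P_k$ as a common vertex. Applying the trapezoidal rule to the integrals in the left-hand side of the relations (33) and (34), we obtain:
$$\begin{cases} \forall i = 1, 2,\ D^2_{hii}(\varphi) \in V_{0h},\\ D^2_{hii}(\varphi)(P_k) = -\dfrac{3}{A_k}\displaystyle\int_\Omega \frac{\partial\varphi}{\partial x_i}\frac{\partial w_k}{\partial x_i}\,dx,\quad \forall k = 1, 2, \ldots, N_{0h}, \end{cases} \tag{37}$$
$$\begin{cases} D^2_{h12}(\varphi)\ (= D^2_{h21}(\varphi)) \in V_{0h},\\ D^2_{h12}(\varphi)(P_k) = -\dfrac{3}{2A_k}\displaystyle\int_\Omega \left(\frac{\partial\varphi}{\partial x_1}\frac{\partial w_k}{\partial x_2} + \frac{\partial\varphi}{\partial x_2}\frac{\partial w_k}{\partial x_1}\right)dx,\quad \forall k = 1, 2, \ldots, N_{0h}. \end{cases} \tag{38}$$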
Computing the integrals in the right hand sides of (37) and (38) is quite
simple since the first order derivatives of ϕ and wk are piecewise constant.
Taking the above relations into account, approximating (E-MA-D) is now a
fairly simple issue. Assuming that the boundary function g is continuous over
Γ , we approximate the affine space Vg by
Vgh = {ϕ | ϕ ∈ Vh , ϕ(P ) = g(P ), ∀P ∈ Σh ∩ Γ },
and then (E-MA-D) by
$$\text{(E-MA-D)}_h\qquad \begin{cases} \text{Find } \psi_h \in V_{gh} \text{ such that, for all } k = 1, 2, \ldots, N_{0h},\\ D^2_{h11}(\psi_h)(P_k)\,D^2_{h22}(\psi_h)(P_k) - |D^2_{h12}(\psi_h)(P_k)|^2 = f_h(P_k). \end{cases} \tag{39}$$
The iterative solution of the problem (E-MA-D)h will be discussed in the
following paragraph.
Fig. 3. A uniform triangulation of $\Omega = (0, 1)^2$ ($h = 1/8$).
Remark 7. Suppose that $\Omega = (0, 1)^2$ and that the triangulation $\mathcal{T}_h$ is like the one shown in Figure 3. Suppose that $h = \frac{1}{I+1}$, $I$ being a positive integer greater than 1. In this particular case, the sets $\Sigma_h$ and $\Sigma_{0h}$ are given by
$$\begin{cases} \Sigma_h = \{P_{ij} \mid P_{ij} = \{ih, jh\},\ 0 \le i, j \le I + 1\},\\ \Sigma_{0h} = \{P_{ij} \mid P_{ij} = \{ih, jh\},\ 1 \le i, j \le I\}, \end{cases} \tag{40}$$
implying that $N_h = (I + 2)^2$ and $N_{0h} = I^2$. It then follows from the relations (37) and (38) that (with obvious notation):
$$D^2_{h11}(\varphi)(P_{ij}) = \frac{\varphi_{i+1,j} + \varphi_{i-1,j} - 2\varphi_{ij}}{h^2},\quad 1 \le i, j \le I, \tag{41}$$
$$D^2_{h22}(\varphi)(P_{ij}) = \frac{\varphi_{i,j+1} + \varphi_{i,j-1} - 2\varphi_{ij}}{h^2},\quad 1 \le i, j \le I, \tag{42}$$
and
$$D^2_{h12}(\varphi)(P_{ij}) = \frac{\varphi_{i+1,j+1} + \varphi_{i-1,j-1} + 2\varphi_{ij}}{2h^2} - \frac{\varphi_{i+1,j} + \varphi_{i-1,j} + \varphi_{i,j+1} + \varphi_{i,j-1}}{2h^2},\quad 1 \le i, j \le I. \tag{43}$$
The finite difference formulas (41)–(43) are exact for the polynomials of degree $\le 2$. Also, as expected,
$$D^2_{h11}(\varphi)(P_{ij}) + D^2_{h22}(\varphi)(P_{ij}) = \frac{\varphi_{i+1,j} + \varphi_{i-1,j} + \varphi_{i,j+1} + \varphi_{i,j-1} - 4\varphi_{ij}}{h^2}; \tag{44}$$
we have thus recovered the well-known 5-point discretization formula for the finite difference approximation of the Laplace operator.
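The formulas (41)–(43) vectorize directly. The NumPy sketch below assumes that nodal values are stored as `phi[i, j]` $= \varphi(ih, jh)$, $0 \le i, j \le I + 1$, and that the diagonals of the triangulation run in the $(1, 1)$ direction, which is what (43) implies; it also evaluates the left-hand side of (39) minus $f_h$. Function names are hypothetical.

```python
import numpy as np


def discrete_hessian(phi, h):
    """Discrete second derivatives (41)-(43) at the interior nodes of a uniform grid.

    phi[i, j] is the nodal value at P_ij = (i*h, j*h), 0 <= i, j <= I + 1; the
    returned arrays have shape (I, I) and correspond to 1 <= i, j <= I.
    """
    c = phi[1:-1, 1:-1]                      # phi_{i,j}
    e, w = phi[2:, 1:-1], phi[:-2, 1:-1]     # phi_{i+1,j}, phi_{i-1,j}
    n, s = phi[1:-1, 2:], phi[1:-1, :-2]     # phi_{i,j+1}, phi_{i,j-1}
    ne, sw = phi[2:, 2:], phi[:-2, :-2]      # phi_{i+1,j+1}, phi_{i-1,j-1}
    d11 = (e + w - 2.0 * c) / h**2                               # (41)
    d22 = (n + s - 2.0 * c) / h**2                               # (42)
    d12 = (ne + sw + 2.0 * c - (e + w + n + s)) / (2.0 * h**2)   # (43)
    return d11, d22, d12


def monge_ampere_residual(phi, h, f):
    """D11*D22 - D12^2 - f at the interior nodes, i.e. the residual of (39)."""
    d11, d22, d12 = discrete_hessian(phi, h)
    return d11 * d22 - d12**2 - f
```

For a quadratic polynomial, the three arrays are constant and equal to the exact second derivatives, which gives a quick check of the indexing.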
6.3 On the Least-squares Formulation of (E-MA-D)h
Inspired by Sections 3 to 5, we now discuss the solution of (E-MA-D)$_h$ by a discrete variant of the solution methods discussed there. The first step in this direction is to approximate the least-squares problem (LSQ). To achieve this goal, we approximate the sets $Q$ and $Q_f$ by
$$Q_h = \{q \mid q = (q_{ij})_{1\le i,j\le 2},\ q_{21} = q_{12},\ q_{ij} \in V_{0h}\} \tag{45}$$
and
$$Q_{fh} = \{q \mid q \in Q_h,\ q_{11}(P_k)q_{22}(P_k) - |q_{12}(P_k)|^2 = f_h(P_k),\ \forall k = 1, 2, \ldots, N_{0h}\}, \tag{46}$$
respectively, the function fh in (46) (and in (E-MA-D)h ) being a continuous
approximation of f . Next, we approximate the least-squares functional j(·, ·)
(defined by (3) in Section 2) by jh (·, ·) defined as follows:
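$$j_h(\varphi, q) = \frac12\|D^2_h\varphi - q\|_h^2,\quad \forall \varphi \in V_h,\ q \in Q_h, \tag{47}$$
with
$$D^2_h\varphi = (D^2_{hij}(\varphi))_{1\le i,j\le 2}, \tag{48}$$
$$((S, T))_h = \frac13\sum_{k=1}^{N_{0h}} A_k\,S(P_k) : T(P_k) = \frac13\sum_{k=1}^{N_{0h}} A_k\,(s_{11}t_{11} + s_{22}t_{22} + 2s_{12}t_{12})(P_k),\quad \forall S, T \in Q_h, \tag{49}$$
and then
$$\|S\|_h = ((S, S))_h^{1/2},\quad \forall S \in Q_h. \tag{50}$$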
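The discrete scalar product (49), the norm (50) and the functional (47) reduce to weighted sums over the interior vertices. In this NumPy sketch (an assumption about data layout, not the authors' code), a symmetric tensor field of $Q_h$ is stored as the triple $(s_{11}, s_{22}, s_{12})$ of nodal arrays and `areas` holds the $A_k$. On the uniform mesh of Remark 7, each interior vertex belongs to six triangles of area $h^2/2$, so $A_k = 3h^2$ and the weight $A_k/3$ equals $h^2$. Function names are hypothetical.

```python
import numpy as np


def tensor_inner_h(S, T, areas):
    """((S, T))_h of (49).

    S and T are triples (s11, s22, s12) of arrays of nodal values at the
    interior vertices P_k; `areas` holds the corresponding A_k.
    """
    s11, s22, s12 = S
    t11, t22, t12 = T
    return np.sum(areas * (s11 * t11 + s22 * t22 + 2.0 * s12 * t12)) / 3.0


def tensor_norm_h(S, areas):
    """||S||_h of (50)."""
    return np.sqrt(tensor_inner_h(S, S, areas))


def j_h(hess_phi, q, areas):
    """j_h(phi, q) of (47), given the triple hess_phi = D2_h(phi)."""
    r = tuple(a - c for a, c in zip(hess_phi, q))
    return 0.5 * tensor_inner_h(r, r, areas)
```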
From the above relations, we approximate the problem (LSQ) by the following discrete least-squares problem:
$$\begin{cases} \{\psi_h, p_h\} \in V_{gh} \times Q_{fh},\\ j_h(\psi_h, p_h) \le j_h(\varphi, q),\quad \forall \{\varphi, q\} \in V_{gh} \times Q_{fh}. \end{cases} \tag{51}$$
6.4 On the Solution of the Problem (51)
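To solve the minimization problem (51), we shall use the following discrete variant of the algorithm (9)–(11):
$$\{\psi^0, p^0\} = \{\psi_0, p_0\}. \tag{52}$$
Then, for $n \ge 0$, $\{\psi^n, p^n\}$ being known, compute $\{\psi^{n+1}, p^{n+1}\}$ via the solution of
$$p^{n+1} = \arg\min_{q \in Q_{fh}} \left[\frac12(1+\tau)\|q\|_h^2 - ((p^n + \tau D^2_h\psi^n, q))_h\right], \tag{53}$$
and
$$\begin{cases} \psi^{n+1} \in V_{gh},\\ \left(\Delta_h\!\left[\dfrac{\psi^{n+1} - \psi^n}{\tau}\right], \Delta_h\varphi\right)_h + ((D^2_h\psi^{n+1}, D^2_h\varphi))_h = ((p^{n+1}, D^2_h\varphi))_h,\quad \forall \varphi \in V_{0h}, \end{cases} \tag{54}$$
where we have
$$\Delta_h\varphi = D^2_{h11}(\varphi) + D^2_{h22}(\varphi),\quad \forall \varphi \in V_h, \tag{55}$$
$$(\varphi_1, \varphi_2)_h = \frac13\sum_{k=1}^{N_{0h}} A_k\,\varphi_1(P_k)\varphi_2(P_k),\quad \forall \varphi_1, \varphi_2 \in V_{0h}, \tag{56}$$
the associated norm being still denoted by $\|\cdot\|_h$.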
The constrained minimization sub-problems (53) decompose into N0h
three-dimensional minimization problems (one per internal vertex of Th )
similar to those encountered in Section 4, concerning the solution of the problem (10). The various solution methods (briefly) discussed in Section 4 still
apply here. For the solution of the linear sub-problems (54), we advocate
the following discrete variant of the conjugate gradient algorithm (23)–(29)
(Algorithm 1):
Algorithm 2
Step 1. $u^0$ is given in $V_{gh}$.
Step 2. Solve
$$\begin{cases} g^0 \in V_{0h},\\ (\Delta_hg^0, \Delta_h\varphi)_h = (\Delta_hu^0, \Delta_h\varphi)_h + \tau((D^2_hu^0, D^2_h\varphi))_h - L_h(\varphi),\quad \forall \varphi \in V_{0h}, \end{cases} \tag{57}$$
and set
$$w^0 = g^0. \tag{58}$$
Step 3. Then, for $k \ge 0$, assuming that $u^k$, $g^k$ and $w^k$ are known, with the last two different from 0, solve
$$\begin{cases} \bar g^k \in V_{0h},\\ (\Delta_h\bar g^k, \Delta_h\varphi)_h = (\Delta_hw^k, \Delta_h\varphi)_h + \tau((D^2_hw^k, D^2_h\varphi))_h,\quad \forall \varphi \in V_{0h}, \end{cases} \tag{59}$$
and compute
$$\rho_k = (\Delta_hg^k, \Delta_hg^k)_h \big/ (\Delta_h\bar g^k, \Delta_hw^k)_h, \tag{60}$$
$$u^{k+1} = u^k - \rho_kw^k, \tag{61}$$
$$g^{k+1} = g^k - \rho_k\bar g^k. \tag{62}$$
Step 4. If $(\Delta_hg^{k+1}, \Delta_hg^{k+1})_h \big/ (\Delta_hg^0, \Delta_hg^0)_h \le \text{tol}$, take $u = u^{k+1}$; else, compute
$$\gamma_k = (\Delta_hg^{k+1}, \Delta_hg^{k+1})_h \big/ (\Delta_hg^k, \Delta_hg^k)_h \tag{63}$$
and update $w^k$ via
$$w^{k+1} = g^{k+1} + \gamma_kw^k. \tag{64}$$
Step 5. Do $k + 1 \to k$ and return to Step 3.
When solving the sub-problems (54), the linear functional $L_h(\cdot)$ encountered in (57) reads as follows:
$$L_h(\varphi) = (\Delta_h\psi^n, \Delta_h\varphi)_h + \tau((p^{n+1}, D^2_h\varphi))_h.$$
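Concerning the solution of the discrete biharmonic problems (57) and (59), let us observe that both problems are of the following type:
$$\begin{cases} \text{Find } u_h \in V_{0h}\ (\text{or } V_{gh}) \text{ such that}\\ (\Delta_hu_h, \Delta_hv)_h = L_h(v),\quad \forall v \in V_{0h}, \end{cases} \tag{65}$$
the functional $L_h(\cdot)$ being linear. Let us denote $-\Delta_hu_h$ by $\omega_h$. It then follows from (37), (55) and (56) that the problem (65) is equivalent to the following system of two coupled discrete Poisson–Dirichlet problems:
$$\begin{cases} \omega_h \in V_{0h},\\ \displaystyle\int_\Omega \nabla\omega_h \cdot \nabla v\,dx = L_h(v),\quad \forall v \in V_{0h}, \end{cases} \tag{66}$$
$$\begin{cases} u_h \in V_{0h}\ (\text{or } V_{gh}),\\ \displaystyle\int_\Omega \nabla u_h \cdot \nabla v\,dx = (\omega_h, v)_h,\quad \forall v \in V_{0h}. \end{cases} \tag{67}$$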
Both problems are well-posed. Actually, the solution (by direct or iterative methods) of discrete Poisson problems, such as (66) and (67), has motivated an extensive literature; some related references can be found in [Glo03, Chapter 5].
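On the uniform mesh of Remark 7, the splitting (66)–(67) takes a particularly simple matrix form, sketched below with NumPy under stated assumptions: homogeneous Dirichlet data (i.e., $u_h \in V_{0h}$), dense solves instead of fast Poisson solvers, and hypothetical function names. The stiffness matrix $K$ of $\{u, v\} \to \int_\Omega \nabla u\cdot\nabla v\,dx$ on $V_{0h}$ is then the 5-point matrix (4 on the diagonal, $-1$ for the four neighbors, no $h$ factor in two dimensions), and the discrete scalar product (56) has the diagonal matrix $M = h^2 I$, since $A_k/3 = h^2$. As $-\Delta_hu = M^{-1}Ku$, problem (65) reads $KM^{-1}Ku = l$, and (66)–(67) are $K\omega = l$ followed by $Ku = M\omega$.

```python
import numpy as np


def stiffness_5point(I):
    """P1 stiffness matrix on V_0h for the uniform triangulation of (0,1)^2 (I^2 interior nodes)."""
    T = 2.0 * np.eye(I) - np.eye(I, k=1) - np.eye(I, k=-1)
    return np.kron(np.eye(I), T) + np.kron(T, np.eye(I))


def solve_discrete_biharmonic(l, I):
    """Solve (65) with u_h in V_0h through the two Poisson problems (66)-(67)."""
    h = 1.0 / (I + 1)
    K = stiffness_5point(I)
    M = h**2 * np.eye(I * I)            # lumped mass of (56): A_k / 3 = h^2
    omega = np.linalg.solve(K, l)       # (66): K omega = l
    u = np.linalg.solve(K, M @ omega)   # (67): K u = M omega
    return u, omega
```

Each biharmonic-type solve thus costs two Poisson solves, which is the property exploited with fast solvers in Section 7.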
We shall conclude this section by observing that, via the algorithm (52)–(54), we have thus reduced the solution of (E-MA-D)$_h$ to the solution of
1. a sequence of discrete (linear) Poisson–Dirichlet problems.
2. a sequence of minimization problems in R3 (or R2 ).
7 Numerical Experiments
The least-squares based methodology discussed in the above sections has been applied to the solution of three particular (E-MA-D) problems, with $\Omega = (0, 1)^2$. The first test problem can be formulated as follows (with $|x| = (x_1^2 + x_2^2)^{1/2}$ and $R \ge \sqrt{2}$):
$$\det D^2\psi = \frac{R^2}{(R^2 - |x|^2)^2} \text{ in } \Omega,\qquad \psi = (R^2 - |x|^2)^{1/2} \text{ on } \Gamma. \tag{68}$$
The function $\psi$ defined by $\psi(x) = (R^2 - |x|^2)^{1/2}$ is a solution to the problem (68). Its graph is a piece of the sphere of center 0 and radius $R$. We have discretized the problem (68) relying on the mixed finite element approximation discussed in Section 6, associated with a uniform triangulation of $\Omega$ (like the one shown in Figure 3, but finer). The uniformity of the mesh allows us to solve the various elliptic problems encountered at each iteration of the algorithm (57)–(64) (Algorithm 2) by fast Poisson solvers, taking advantage of the decomposition properties of the discrete analogues of the biharmonic problems (23) and (24). To initialize the algorithm (52)–(54), we followed Remark 6 (see Section 3) and defined $\psi_0$ as the solution of the discrete Poisson problem
$$\begin{cases} \psi_0 \in V_{gh},\\ \displaystyle\int_\Omega \nabla\psi_0 \cdot \nabla v\,dx = 2(\sqrt{f_h}, v)_h,\quad \forall v \in V_{0h}, \end{cases}$$
and $p_0$ by $p_0 = D^2_h\psi_0$. The algorithm (52)–(54) diverges if $R = \sqrt{2}$ (which is not surprising since the corresponding $\psi \notin H^2(\Omega)$). On the other hand, for $R = 2$ we have quite fast convergence as soon as $\tau$ is large enough, the corresponding results being reported in Table 1. (We stopped iterating as soon as $\|D^2_h\psi_h^n - p_h^n\|_{0,\Omega} \le 10^{-6}$.)
Above, $\{\psi_h^c, p_h^c\}$ is the computed approximate solution, $h$ the space discretization step, nit the number of iterations necessary to achieve convergence, and $\|D^2_h\psi_h^c - p_h^c\|_{0,\Omega}$ is a trapezoidal rule based approximation of $\left(\int_\Omega |D^2_h\psi_h^c - p_h^c|^2\,dx\right)^{1/2}$.
Table 1. First test problem: convergence results
h      τ        nit   ‖D²_hψ_h^c − p_h^c‖_Q   ‖ψ_h^c − ψ‖_{L²(Ω)}
1/32   0.1      517   0.9813 × 10⁻⁶            0.450 × 10⁻⁵
1/32   1        73    0.9618 × 10⁻⁶            0.449 × 10⁻⁵
1/32   10       28    0.7045 × 10⁻⁶            0.450 × 10⁻⁵
1/32   100      21    0.6773 × 10⁻⁶            0.449 × 10⁻⁵
1/32   1,000    22    0.8508 × 10⁻⁶            0.449 × 10⁻⁵
1/32   10,000   22    0.8301 × 10⁻⁶            0.449 × 10⁻⁵
1/64   1        76    0.9624 × 10⁻⁶            0.113 × 10⁻⁵
1/64   10       29    0.8547 × 10⁻⁶            0.113 × 10⁻⁵
1/64   100      24    0.8094 × 10⁻⁶            0.113 × 10⁻⁵
Table 1 clearly suggests that: (1) for $\tau$ large enough, the speed of convergence is essentially independent of $\tau$; (2) the speed of convergence is essentially independent of $h$; (3) the $L^2(\Omega)$-approximation error is $O(h^2)$.
The second test problem is defined by
$$\det D^2\psi = \frac{1}{|x|} \text{ in } \Omega,\qquad \psi = \frac{2\sqrt{2}}{3}|x|^{3/2} \text{ on } \Gamma. \tag{69}$$
With these data, the function $\psi$ defined by $\psi(x) = \frac{2\sqrt{2}}{3}|x|^{3/2}$ is a solution of the problem (69). It is easily shown that $\psi \in W^{2,p}(\Omega)$ for all $p \in [1, 4)$, but that $\psi$ does not have the $C^2(\bar\Omega)$-regularity. Using the same approximation and algorithms as for the first test problem, we obtain the results reported in Table 2.
The various comments we have made concerning the solution of the first test problem still apply here. The graphs of $f$ and $\psi_h^c$ (for $h = 1/64$) have been visualized in Figures 4 and 5, respectively.
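The exact solutions quoted for (68) and (69) are easy to check numerically. The short Python sketch below is an illustration only (the helper name and the step `d` are our choices): it approximates $\det D^2\psi$ by central differences at a few points of $\Omega$ away from the origin and compares it with $f$, using $R = 2$, the value of Table 1.

```python
import math


def hessian_det_fd(psi, x1, x2, d=1e-3):
    """Central-difference approximation of det D^2 psi at (x1, x2)."""
    p11 = (psi(x1 + d, x2) - 2.0 * psi(x1, x2) + psi(x1 - d, x2)) / d**2
    p22 = (psi(x1, x2 + d) - 2.0 * psi(x1, x2) + psi(x1, x2 - d)) / d**2
    p12 = (psi(x1 + d, x2 + d) - psi(x1 + d, x2 - d)
           - psi(x1 - d, x2 + d) + psi(x1 - d, x2 - d)) / (4.0 * d**2)
    return p11 * p22 - p12**2


R = 2.0


def psi1(x1, x2):
    """Exact solution of the first test problem (68)."""
    return math.sqrt(R**2 - x1**2 - x2**2)


def f1(x1, x2):
    return R**2 / (R**2 - x1**2 - x2**2) ** 2


def psi2(x1, x2):
    """Exact solution of the second test problem (69)."""
    return 2.0 * math.sqrt(2.0) / 3.0 * (x1**2 + x2**2) ** 0.75


def f2(x1, x2):
    return 1.0 / math.hypot(x1, x2)
```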
The third test problem, namely
$$\det D^2\psi = 1 \text{ in } \Omega,\qquad \psi = 0 \text{ on } \Gamma, \tag{70}$$
has no solution in $H^2(\Omega)$, despite the smoothness of the data, making it, by far, the most interesting (in some sense) of our test problems from a computational point of view. We have reported in Table 3 the results produced by the algorithm (52)–(54) using $\|\psi_h^{n+1} - \psi_h^n\|_{L^2(\Omega)} \le 10^{-7}$ as the stopping criterion.
It is clear from Table 3 that the convergence is slower than for the first two test problems; however, some important features remain, such as: the number of iterations necessary to achieve convergence is essentially independent of $\tau$ as soon as this parameter is large enough, and increases slowly with $1/h$ (actually like $h^{-1/2}$). In Figures 6, 7 and 8 we have shown, respectively, the graph of $\psi_h^c$ (for $h = 1/64$), the graph of the function $x_1 \to \psi_h^c(x_1, 1/2)$ when $x_1 \in [0, 1]$, and the graph of the restriction of $\psi_h^c$ to the line $x_1 = x_2$ (i.e., the graph of the function $\xi \to \psi_h^c(\xi, \xi)$ when $\xi \in [0, 1]$). In Figures 7 and 8, we used − · −· (resp., − − − and a solid line) to represent the results corresponding to $h = 1/32$ (resp., $h = 1/64$ and $h = 1/128$).
Table 2. Second test problem: convergence results
h      τ       nit   ‖D²_hψ_h^c − p_h^c‖_Q   ‖ψ_h^c − ψ‖_{L²(Ω)}
1/32   1       145   0.9381 × 10⁻⁶            0.556 × 10⁻⁴
1/32   10      56    0.9290 × 10⁻⁶            0.556 × 10⁻⁴
1/32   100     46    0.9285 × 10⁻⁶            0.556 × 10⁻⁴
1/32   1,000   45    0.9405 × 10⁻⁶            0.556 × 10⁻⁴
1/64   1       151   0.9500 × 10⁻⁶            0.145 × 10⁻⁴
1/64   10      58    0.9974 × 10⁻⁶            0.145 × 10⁻⁴
1/64   100     49    0.9531 × 10⁻⁶            0.145 × 10⁻⁴
1/64   1,000   48    0.9884 × 10⁻⁶            0.145 × 10⁻⁴
Fig. 4. Second test problem: graph of $f$.
Fig. 5. Second test problem: graph of $\psi_h^c$ ($h = 1/64$).
Table 3. Third test problem: convergence results
h       τ        nit     ‖D²_hψ_h^c − p_h^c‖_Q
1/32    1        4,977   0.1054 × 10⁻¹
1/32    100      3,297   0.4980 × 10⁻²
1/32    1,000    3,275   0.4904 × 10⁻²
1/32    10,000   3,273   0.4896 × 10⁻²
1/64    1        6,575   0.1993 × 10⁻¹
1/64    100      4,553   0.1321 × 10⁻¹
1/64    1,000    4,527   0.1312 × 10⁻¹
1/128   100      5,401   0.1841 × 10⁻¹
1/128   1,000    5,372   0.1830 × 10⁻¹
Fig. 6. Third test problem: graph of $\psi_h^c$ ($h = 1/64$).
[Figure 7, panel “Cross sections”: $\psi_h^c(x_1, 1/2)$ for $x_1 \in [0, 1]$; vertical axis from 0 to 0.2.]
Fig. 7. Third test problem: graph of $\psi_h^c$ restricted to the line $x_2 = 1/2$.
[Figure 8, panel “Diagonal cross sections”: $\psi_h^c(\xi, \xi)$ for $\xi \in [0, 1]$; vertical axis from 0 to 0.2.]
Fig. 8. Third test problem: graph of $\psi_h^c$ restricted to the line $x_1 = x_2$.
The results in Figures 7 and 8 strongly suggest that $\psi_h$ converges to a limit as $h \to 0$. They also suggest that the convergence is superlinear with respect to $h$. The above limit can be viewed as a generalized solution of (E-MA-D) (in a least-squares sense). Actually, a closer inspection of the numerical results shows that the (Gaussian) curvature of the graph is negative close to the corners, implying that the Monge–Ampère equation (70) is violated there (since the curvature is given by $\det D^2\psi/(1 + |\nabla\psi|^2)^2$). Indeed, as expected, it is also violated along the boundary, since $\|D^2_h\psi_h^c - p_h^c\|_{0,\Omega} \approx 10^{-2}$, while $\|D^2_h\psi_h^c - p_h^c\|_{0,\Omega_1} \approx 10^{-4}$ and $\|D^2_h\psi_h^c - p_h^c\|_{0,\Omega_2} \approx 10^{-5}$, where $\Omega_1 = (1/8, 7/8)^2$ and $\Omega_2 = (1/4, 3/4)^2$. These results show that, in that particular case at least, the Monge–Ampère equation $\det D^2\psi = 1$ is verified with good accuracy sufficiently far away from $\Gamma$.
8 Further Comments
A natural question arising from the material discussed in the above sections
is the following one: Does our least-squares methodology provide viscosity solutions?
We claim that indeed the solutions obtained by the least-squares methodology discussed in the preceding sections are (kind of) viscosity solutions. To
show this property, let us consider (as in Section 3) the flow associated with
the least-squares optimality conditions (7). We have then
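$$\begin{cases} \text{Find } \{\psi(t), p(t)\} \in V_g \times Q \text{ for all } t > 0 \text{ such that}\\ \displaystyle\int_\Omega \frac{\partial(\Delta\psi)}{\partial t}\,\Delta\varphi\,dx + \int_\Omega D^2\psi : D^2\varphi\,dx = \int_\Omega p : D^2\varphi\,dx,\quad \forall \varphi \in V_0,\\ \displaystyle\int_\Omega \frac{\partial p}{\partial t} : q\,dx + \int_\Omega p : q\,dx + \langle \partial I_{Q_f}(p), q \rangle = \int_\Omega D^2\psi : q\,dx,\quad \forall q \in Q,\\ \{\psi(0), p(0)\} = \{\psi_0, p_0\}. \end{cases} \tag{71}$$
Assuming that $\Omega$ is simply connected, we introduce:
$$\mathbf{u} = \{u_1, u_2\} = \{\partial\psi/\partial x_2, -\partial\psi/\partial x_1\},\qquad \mathbf{v} = \{v_1, v_2\} = \{\partial\varphi/\partial x_2, -\partial\varphi/\partial x_1\},$$
$$\omega = \partial u_2/\partial x_1 - \partial u_1/\partial x_2,\qquad \theta = \partial v_2/\partial x_1 - \partial v_1/\partial x_2,$$
$$\mathbf{V}_g = \{\mathbf{v} \mid \mathbf{v} \in (H^1(\Omega))^2,\ \nabla\cdot\mathbf{v} = 0,\ \mathbf{v}\cdot\mathbf{n} = dg/ds \text{ on } \Gamma\},$$
$$\mathbf{V}_0 = \{\mathbf{v} \mid \mathbf{v} \in (H^1(\Omega))^2,\ \nabla\cdot\mathbf{v} = 0,\ \mathbf{v}\cdot\mathbf{n} = 0 \text{ on } \Gamma\},$$
$$L = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}.$$
Above, $\mathbf{n}$ is the unit vector of the outward normal at $\Gamma$ and $s$ is a counterclockwise curvilinear abscissa on $\Gamma$. The formulation (71) is equivalent to
$$\begin{cases} \text{Find } \mathbf{u}(t) \in \mathbf{V}_g \text{ for all } t > 0 \text{ such that}\\ \displaystyle\int_\Omega \frac{\partial\omega}{\partial t}\,\theta\,dx + \int_\Omega \nabla\mathbf{u} : \nabla\mathbf{v}\,dx = \int_\Omega Lp : \nabla\mathbf{v}\,dx,\quad \forall \mathbf{v} \in \mathbf{V}_0,\\ \dfrac{\partial p}{\partial t} + p + \partial I_{Q_f}(p) + L\nabla\mathbf{u} = 0,\\ \{\mathbf{u}(0), p(0), \omega(0)\} = \{\mathbf{u}_0, p_0, \omega_0\}. \end{cases} \tag{72}$$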
The problem (72) has a visco-elasticity flavor, $-Lp$ playing here the role of the so-called extra-stress tensor. As $t \to +\infty$, we thus obtain at the limit a (kind of) viscosity solution.
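The equivalence of (71) and (72) rests on a few identities, spelled out below, which follow directly from the above definitions; here $\psi_{ij} = \partial^2\psi/\partial x_i\partial x_j$, $(\nabla\mathbf{u})_{ij} = \partial u_i/\partial x_j$, and $A : B = \sum_{i,j} a_{ij}b_{ij}$ for general (not necessarily symmetric) $2 \times 2$ matrices:
$$\nabla\cdot\mathbf{u} = \psi_{21} - \psi_{12} = 0,\qquad \omega = -\psi_{11} - \psi_{22} = -\Delta\psi,\qquad \theta = -\Delta\varphi,$$
$$\nabla\mathbf{u} = \begin{pmatrix} \psi_{21} & \psi_{22}\\ -\psi_{11} & -\psi_{12} \end{pmatrix} = L\,D^2\psi,\qquad \nabla\mathbf{v} = L\,D^2\varphi,\qquad L^tL = I.$$
Consequently, $\frac{\partial(\Delta\psi)}{\partial t}\Delta\varphi = \frac{\partial\omega}{\partial t}\theta$, $\nabla\mathbf{u} : \nabla\mathbf{v} = D^2\psi : D^2\varphi$, $p : D^2\varphi = Lp : \nabla\mathbf{v}$, and $D^2\psi = L^t\nabla\mathbf{u} = -L\nabla\mathbf{u}$, which turns the (pointwise form of the) equation for $p$ in (71) into the second equation of (72).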
Acknowledgement. The authors would like to thank J. D. Benamou, Y. Brenier,
L. A. Caffarelli and P.-L. Lions for assistance and helpful comments and suggestions.
The support of NSF (grant DMS-0412267) is also acknowledged.
References
[Aub82] Th. Aubin. Nonlinear Analysis on Manifolds, Monge–Ampère Equations. Springer-Verlag, Berlin, 1982.
[Aub98] Th. Aubin. Some Nonlinear Problems in Riemannian Geometry. Springer-Verlag, Berlin, 1998.
[BB00] J.-D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math., 84(3):375–393, 2000.
[Cab02] X. Cabré. Topics in regularity and qualitative properties of solutions of nonlinear elliptic equations. Discrete Contin. Dyn. Syst., 8(2):331–359, 2002.
[CC95] L. A. Caffarelli and X. Cabré. Fully Nonlinear Elliptic Equations. American Mathematical Society, Providence, RI, 1995.
[CH89] R. Courant and D. Hilbert. Methods of Mathematical Physics, Vol. II. Wiley Interscience, New York, 1989.
[CIL92] M. G. Crandall, H. Ishii, and P.-L. Lions. User’s guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.), 27(1):1–67, 1992.
[CKO99] L. A. Caffarelli, S. A. Kochengin, and V. I. Oliker. On the numerical solution of reflector design with given far field scattering data. In L. A. Caffarelli and M. Milman, editors, Monge–Ampère Equation: Application to Geometry and Optimization, pages 13–32. American Mathematical Society, Providence, RI, 1999.
[DG03] E. J. Dean and R. Glowinski. Numerical solution of the two-dimensional elliptic Monge–Ampère equation with Dirichlet boundary conditions: an augmented Lagrangian approach. C. R. Math. Acad. Sci. Paris, 336(9):779–784, 2003.
[DG04] E. J. Dean and R. Glowinski. Numerical solution of the two-dimensional elliptic Monge–Ampère equation with Dirichlet boundary conditions: a least-squares approach. C. R. Math. Acad. Sci. Paris, 339(12):887–892, 2004.
[DG05] E. J. Dean and R. Glowinski. On the numerical solution of a two-dimensional Pucci’s equations with Dirichlet boundary conditions: a least-squares approach. C. R. Math. Acad. Sci. Paris, 341(6):375–380, 2005.
[DG06a] E. J. Dean and R. Glowinski. An augmented Lagrangian approach to the numerical solution of the Dirichlet problem for the elliptic Monge–Ampère equation in two dimensions. Electron. Trans. Numer. Anal., 22:71–96, 2006.
[DG06b] E. J. Dean and R. Glowinski. Numerical methods for fully nonlinear elliptic equations of the Monge–Ampère type. Comput. Methods Appl. Mech. Engrg., 195(13–16):1344–1386, 2006.
[DGP91] E. J. Dean, R. Glowinski, and O. Pironneau. Iterative solution of the stream function-vorticity formulation of the Stokes problem. Applications to the numerical simulation of incompressible viscous flow. Comput. Methods Appl. Mech. Engrg., 87(2–3):117–155, 1991.
[DS96] J. E. Dennis and R. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM, Philadelphia, PA, 1996.
[Glo84] R. Glowinski. Numerical Methods for Nonlinear Variational Problems. Springer-Verlag, New York, 1984.
[Glo03] R. Glowinski. Finite element methods for incompressible viscous flow. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, Vol. IX, pages 3–1176. North-Holland, Amsterdam, 2003.
[GP79] R. Glowinski and O. Pironneau. Numerical methods for the first biharmonic equation and for the two-dimensional Stokes problem. SIAM Rev., 17(2):167–212, 1979.
[GT01] D. Gilbarg and N. Trudinger. Elliptic Partial Differential Equations of Second Order. Springer-Verlag, Berlin, 2001.
[Jan88] R. Jensen. The maximum principle for viscosity solutions of fully nonlinear second order partial differential equations. Arch. Rational Mech. Anal., 101:1–27, 1988.
[OP88] V. I. Oliker and L. D. Prussner. On the numerical solution of the equation $(\partial^2z/\partial x^2)(\partial^2z/\partial y^2) - (\partial^2z/\partial x\partial y)^2 = f$ and its discretization, I. Numer. Math., 54(3):271–293, 1988.
[Pir89] O. Pironneau. Finite Element Methods for Fluids. Wiley, Chichester, 1989.
[Urb88] J. I. E. Urbas. Regularity of generalized solutions of Monge–Ampère equations. Math. Z., 197(3):365–393, 1988.
Higher Order Time Stepping for Second Order Hyperbolic Problems and Optimal CFL Conditions
J. Charles Gilbert and Patrick Joly
INRIA Rocquencourt, BP 105, 78153 Le Chesnay, France
Jean-Charles.Gilbert@inria.fr
Patrick.Joly@inria.fr
Summary. We investigate explicit higher order time discretizations of linear second
order hyperbolic problems. We study the even order (2m) schemes obtained by
the modified equation method. We show that the corresponding CFL upper bound
for the time step remains bounded when the order of the scheme increases. We
propose variants of these schemes constructed to optimize the CFL condition. The
corresponding optimization problem is analyzed in detail and the analysis results in
a specific numerical algorithm. The corresponding results are quite promising and
suggest various conjectures.
1 Introduction
We are concerned here with a very classical problem, namely the numerical
approximation of second order hyperbolic problems, more precisely problems
of the form
$$\frac{d^2u}{dt^2} + Au = 0, \tag{1}$$
where $A$ is a linear unbounded positive self-adjoint operator in some Hilbert space $V$. This appears to be the generic abstract form for a large class of partial differential equations in which $u$ denotes a function $u(x, t)$ from $\Omega \times \mathbb{R}^+$ (with $\Omega \subset \mathbb{R}^d$) into $\mathbb{R}^N$ and $A$ is a second order differential operator in space, of elliptic nature.
Such models are used for wave propagation in various domains of application,
in particular, in acoustics, electromagnetism, and elasticity [Jol03].
During the past four decades, a considerable literature has been devoted
to the construction of numerical methods for the approximation of (1). The
most recent research deals with the construction of higher order in space
and conservative methods for the space semi-discretization of (1) (see, for
instance, [Coh02] and the references therein). These methods lead us to
consider a family (indexed by h > 0, the approximation parameter which
68
J.Ch. Gilbert and P. Joly
tends to 0 – typically the step size of the computational mesh) of problems of
the form:
d2 u h
+ Ah uh = 0,
(2)
dt2
where the unknown $u_h$ is a function of time with values in some Hilbert space $V_h$ (whose norm will be denoted $\|\cdot\|$, even though it depends on $h$) and $A_h$ denotes a bounded, self-adjoint, positive operator in $V_h$ (namely an approximation of the second order differential operator $A$). Several approaches lead naturally to problems of the form (2), among which:
• variational finite differences [CJ96, Dab86, AKM74],
• finite element methods [CJRT01, CJKMVV99],
• mixed finite element methods [CF05, PFC05],
• conservative discontinuous Galerkin methods [HW02, FLLP05].
Of course, the norm of $A_h$ blows up when $h$ goes to 0, as $\|A_h\| = O(h^{-2})$.
It is well known that one has conservation of the discrete energy:
$$E_h(t) = \frac{1}{2}\left\|\frac{du_h}{dt}\right\|^2 + \frac{1}{2}\, a_h(u_h, u_h),$$
where $a_h(\cdot,\cdot)$ is the continuous symmetric bilinear form associated with $A_h$.
From the energy conservation result and the positivity of $A_h$, one deduces a stability result: the norm of the solution $u_h(t)$ can be estimated in terms of the norms of the Cauchy data
$$u_{0,h} = u_h(0), \qquad u_{1,h} = \frac{du_h}{dt}(0),$$
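with constants independent of $h$. This is also a direct consequence of the formula
$$u_h(t) = \cos\big(A_h^{1/2} t\big)\, u_{0,h} + A_h^{-1/2} \sin\big(A_h^{1/2} t\big)\, u_{1,h},$$
which yields, since $\big\|A_h^{-1/2}\sin\big(A_h^{1/2} t\big)\big\| \le t$ (because $|\sin s| \le |s|$),
$$\|u_h(t)\| \le \|u_{0,h}\| + t\, \|u_{1,h}\|. \qquad (3)$$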
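The conservation of $E_h$ and the bound (3) are easy to observe numerically. The following is a minimal sketch in Python with NumPy, assuming, purely for illustration, that a small random symmetric positive definite matrix stands in for $A_h$ (so that $V_h = \mathbb{R}^8$ with the Euclidean norm and $a_h(u, v) = (A_h u, v)$). The helper names `u`, `du` and `energy` are ours.

```python
import numpy as np

# Hypothetical stand-in for A_h: a random symmetric positive definite 8x8 matrix.
rng = np.random.default_rng(3)
B = rng.standard_normal((8, 8))
A = B @ B.T + np.eye(8)
lam, V = np.linalg.eigh(A)      # A = V diag(lam) V^T
w = np.sqrt(lam)                # eigenvalues of A^{1/2}

u0, u1 = rng.standard_normal(8), rng.standard_normal(8)
c0, c1 = V.T @ u0, V.T @ u1     # Cauchy data in the eigenbasis


def u(t):
    """u_h(t) = cos(A^{1/2} t) u0 + A^{-1/2} sin(A^{1/2} t) u1."""
    return V @ (np.cos(w * t) * c0 + np.sin(w * t) / w * c1)


def du(t):
    """Time derivative of u_h(t)."""
    return V @ (-w * np.sin(w * t) * c0 + np.cos(w * t) * c1)


def energy(t):
    """E_h(t) = 1/2 ||du_h/dt||^2 + 1/2 a_h(u_h, u_h), with a_h(u, v) = (A u, v)."""
    ut, vt = u(t), du(t)
    return 0.5 * vt @ vt + 0.5 * ut @ (A @ ut)


ts = np.linspace(0.0, 20.0, 101)
E = np.array([energy(t) for t in ts])
# bound (3), with a tiny allowance for round-off at t = 0
bound_ok = all(
    np.linalg.norm(u(t)) <= np.linalg.norm(u0) + t * np.linalg.norm(u1) + 1e-12 for t in ts
)
print(E.max() - E.min(), bound_ok)
```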
In what follows, we are interested in the time discretization of (2) by explicit
finite difference schemes. More specifically, we are interested in the stability
analysis of such schemes, i.e., in obtaining a priori estimates of the form
(3) after time discretization. The conservative nature (i.e., the conservation
of energy) of the continuous problem can be seen as a consequence of the
time reversibility of this equation. That is why we shall favor centered finite
difference schemes which preserve such a property at the discrete level.
The best known scheme is the classical second order leap-frog scheme. Let us consider a time step $\Delta t > 0$ and denote by $u_h^n \in V_h$ an approximation of $u_h(t^n)$, $t^n = n\Delta t$. This scheme is
Optimal Higher Order Time Discretizations
$$\frac{u_h^{n+1} - 2u_h^n + u_h^{n-1}}{\Delta t^2} + A_h u_h^n = 0. \qquad (4)$$
Of course, (4) must be completed by a start-up procedure using the initial conditions to compute $u_h^0$ and $u_h^1$. We omit this here for simplicity.
By construction, this scheme is second order accurate in time. Its stability
analysis is well known and we have (see, for instance, [Jol03]):
Theorem 1. A necessary and sufficient condition for the stability of (4) is
$$\frac{\Delta t^2}{4}\, \|A_h\| \le 1. \qquad (5)$$
Remark 1. The condition (5) appears as an abstract CFL condition. In the applications to concrete wave equations, it is possible to get a bound for $\|A_h\|$ of the form
$$\|A_h\| \le \frac{4c_+^2}{h^2},$$
where $c_+$ is a positive constant. This constant has the dimension of a propagation velocity and depends only on the continuous problem: it is typically related to the maximum wave velocity for the continuous problem. Therefore, a simpler (but more restrictive) sufficient stability condition takes the form
$$\frac{c_+ \Delta t}{h} \le 1.$$
In many situations, it is also possible to get a lower bound of the form (where $c_- \le c_+$ also has the dimension of a velocity)
$$\|A_h\| \ge \frac{4c_-^2}{h^2},$$
so that a necessary stability condition is
$$\frac{c_- \Delta t}{h} \le 1.$$
□
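Theorem 1 and Remark 1 can be illustrated on a model case. The sketch below (Python with NumPy) assumes, purely for illustration, that $A_h$ is the classical three-point finite difference approximation of $-c^2\, d^2/dx^2$ on $]0,1[$ with homogeneous Dirichlet conditions, for which $\|A_h\| < 4c^2/h^2$. It also uses the crude start-up $u_h^1 = u_h^0$. Running the leap-frog scheme (4) slightly below and slightly above the limit $\Delta t^2\|A_h\|/4 = 1$ shows bounded versus exponentially growing iterates.

```python
import numpy as np


def laplacian_1d(n, h, c=1.0):
    """Hypothetical A_h: (c/h)^2 tridiag(-1, 2, -1), the 3-point Laplacian with Dirichlet ends."""
    main = 2.0 * np.ones(n)
    off = -np.ones(n - 1)
    return (c / h) ** 2 * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1))


def leapfrog(A, u0, u1, dt, nsteps):
    """Scheme (4): u^{n+1} = 2 u^n - u^{n-1} - dt^2 A u^n, started from u^0, u^1."""
    u_prev, u = u0.copy(), u1.copy()
    for _ in range(nsteps):
        u_prev, u = u, 2.0 * u - u_prev - dt**2 * (A @ u)
    return u


n = 50
h = 1.0 / (n + 1)
A = laplacian_1d(n, h)
norm_A = np.linalg.eigvalsh(A).max()   # ||A_h|| for a symmetric positive matrix
dt_max = 2.0 / np.sqrt(norm_A)          # equality case of (5): dt^2 ||A_h|| / 4 = 1

rng = np.random.default_rng(0)
u0 = rng.standard_normal(n)
stable = leapfrog(A, u0, u0, 0.99 * dt_max, 500)
unstable = leapfrog(A, u0, u0, 1.01 * dt_max, 500)
print(norm_A * h**2, np.linalg.norm(stable), np.linalg.norm(unstable))
```

Here `norm_A * h**2` stays below 4, in agreement with the bound of Remark 1 with $c_+ = c = 1$.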
Next we investigate one way to construct more accurate (in time) discretization schemes for (2). This is particularly relevant when the operator $A_h$ represents a space approximation of the continuous operator $A$ in $O(h^k)$ with $k > 2$: if one thinks about taking a time step proportional to the space step $h$ (a usual choice which is in conformity with a CFL condition), one would like to adapt the time accuracy to the space accuracy. In comparison to what has been done on the space discretization side, we found very little work in this direction, even though many interesting solutions can probably be found in the literature on ordinary differential equations [HW96]. Most of the existing work is in the context of finite difference methods, compact schemes, etc., see, for instance, [Dab86, SB87, CJ96, AJT00], or [DPJ06, TT05] in the context of first order hyperbolic problems.
The content of the rest of this paper is as follows. In Section 2, we investigate a class of methods for the time discretization of (2), based on the so-called modified equation approach. These schemes can be seen as higher (even) order variations around the leap-frog scheme, whose main properties they preserve: explicit nature, time reversibility, energy conservation. It appears that the computational cost of one time step of the scheme of order $2m$ is $m$ times larger than for one step of the second order scheme. This can be counterbalanced if one can use larger time steps than for the second order scheme. This is where the stability analysis plays a major role (Section 2.2). This analysis shows that even though the maximum allowed time step increases with $m$ (particularly for small even values of $m$), it remains uniformly bounded in $m$ (Theorem 3). In Section 3, we investigate the question of constructing other schemes, conceived as modifications of the previous ones, that should satisfy:
• the good properties of the schemes (explicitness, conservativity, etc.) and the order of approximation are preserved,
• the maximal time step authorized by the CFL condition is larger.
We formulate this as a family of optimization problems that we analyze in
detail. We are able to prove the existence and the uniqueness of the solution
of these problems (Corollary 2) and to give necessary and sufficient conditions
of optimality (Theorems 4 and 5) that we use to construct an algorithm for
the effective computation of the solutions of these optimization problems.
This algorithm, as well as the corresponding numerical results, are presented
and discussed in Section 4. Our first results are quite promising and show
that the optimization procedure does allow us to improve significantly the
CFL condition. However, the corresponding numerical schemes still have to
be tested numerically. This will be the object of a forthcoming work.
2 Higher Order Schemes by the Modified Equation Approach
2.1 The Modified Equation Approach
It is possible to construct higher order schemes which remain explicit and centered. In particular, all the machinery of Runge–Kutta methods for ordinary
differential equations [HW96] is available. Let us concentrate here on a classical approach, the so-called modified equation approach [SB87, CdLBL97,
Dab86]. For instance, to construct a fourth order scheme, we start by looking
at the truncation error of (4)
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = \frac{d^2 u_h}{dt^2}(t^n) + \frac{\Delta t^2}{12}\frac{d^4 u_h}{dt^4}(t^n) + O(\Delta t^4).$$
Using the equation satisfied by $u_h$, we get the identity
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = -A_h u_h(t^n) + \frac{\Delta t^2}{12} A_h^2 u_h(t^n) + O(\Delta t^4),$$
which leads to the following fourth order scheme:
$$\frac{u_h^{n+1} - 2u_h^n + u_h^{n-1}}{\Delta t^2} + A_h u_h^n - \frac{\Delta t^2}{12} A_h^2 u_h^n = 0. \qquad (6)$$
This scheme can be implemented in such a way that each time step involves only two applications of the operator $A_h$, using Horner's rule:
$$u_h^{n+1} = 2u_h^n - u_h^{n-1} - \Delta t^2 A_h\Big(I - \frac{\Delta t^2}{12} A_h\Big) u_h^n.$$
More generally, an explicit centered scheme of order $2m$ is given by
$$\frac{u_h^{n+1} - 2u_h^n + u_h^{n-1}}{\Delta t^2} + A_h^{(m)}(\Delta t)\, u_h^n = 0, \qquad A_h^{(m)}(\Delta t) = A_h P_m(\Delta t^2 A_h), \qquad (7)$$
where the polynomial $P_m(x)$ is defined by
$$P_m(x) = 1 + 2\sum_{l=1}^{m-1} (-1)^l \frac{x^l}{(2l+2)!}. \qquad (8)$$
Indeed, a Taylor expansion gives
$$u_h(t^{n\pm1}) = u_h(t^n) + \sum_{k=1}^{2m+1} (\pm1)^k \frac{\Delta t^k}{k!}\frac{d^k u_h}{dt^k}(t^n) + O(\Delta t^{2m+2}),$$
so that
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = 2\sum_{k=1}^{m} \frac{\Delta t^{2k-2}}{(2k)!}\frac{d^{2k} u_h}{dt^{2k}}(t^n) + O(\Delta t^{2m}).$$
Since $\frac{d^{2k} u_h}{dt^{2k}}(t^n) = (-1)^k A_h^k u_h(t^n)$, we also have
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = -A_h u_h(t^n) + 2\sum_{k=2}^{m} (-1)^k \frac{\Delta t^{2k-2}}{(2k)!} A_h^k u_h(t^n) + O(\Delta t^{2m}),$$
or equivalently
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} + A_h u_h(t^n) + 2\sum_{k=1}^{m-1} (-1)^k \frac{\Delta t^{2k}}{(2k+2)!} A_h^{k+1} u_h(t^n) = O(\Delta t^{2m}).$$
This identity leads to the scheme (7)–(8).
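The claim that (7)-(8) is of order $2m$ can be checked on the scalar equation $u'' + \lambda u = 0$, whose solution $u(t) = \cos(\sqrt{\lambda}\, t)$ is known. The Python sketch below (the function names are ours) evaluates the residual of (7) on this exact solution at $t = 0$ for $\Delta t = 0.4$ and $\Delta t = 0.2$, and estimates the order from the ratio of the two residuals. The estimate should be close to $2m$; much smaller time steps would make the residual for larger $m$ drown in round-off.

```python
from math import cos, factorial, log2, sqrt


def P(m, x):
    """P_m(x) = 1 + 2 * sum_{l=1}^{m-1} (-1)^l x^l / (2l+2)!   (formula (8))."""
    return 1.0 + 2.0 * sum((-1) ** l * x**l / factorial(2 * l + 2) for l in range(1, m))


def truncation_error(m, dt, lam=1.0):
    """Residual of scheme (7) applied to u(t) = cos(sqrt(lam) t), evaluated at t = 0 (u(0) = 1)."""
    second_difference = (2.0 * cos(sqrt(lam) * dt) - 2.0) / dt**2
    return second_difference + lam * P(m, lam * dt**2)


for m in (1, 2, 3):
    observed = log2(truncation_error(m, 0.4) / truncation_error(m, 0.2))
    print(m, round(observed, 2))   # observed order in dt
```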
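Using Horner's rule again for the polynomial $P_m$ reduces the calculation of $u_h^{n+1}$ to $m$ successive applications of the operator $A_h$, according to the following algorithm:
Step 1. Set $u_h^{n,0} = u_h^n$.
Step 2. Compute
$$u_h^{n,k} = u_h^n - \frac{\Delta t^2 A_h u_h^{n,k-1}}{\big(2(m-k)+1\big)\big(2(m-k)+2\big)}, \qquad k = 1, \dots, m,$$
so that $u_h^{n,m-1} = P_m(\Delta t^2 A_h)\, u_h^n$ and $u_h^{n,m} = u_h^n - \frac{1}{2}\Delta t^2 A_h P_m(\Delta t^2 A_h)\, u_h^n$.
Step 3. Set $u_h^{n+1} = 2u_h^{n,m} - u_h^{n-1}$.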
In other words, since the most expensive step of the algorithm is the application of the operator Ah (a matrix-vector multiplication in practice), the
computational cost for one time step of the scheme of order 2m is only m
times larger than the computational cost for one time step of the scheme of
order 2.
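The Horner evaluation can be sketched as follows in Python with NumPy. This is an illustration under our own conventions, not code from the paper. Starting from $w = u_h^n$, the vector is updated as $w \leftarrow u_h^n - \Delta t^2 A_h w / \big((2j+1)(2j+2)\big)$ for $j = m-1, \dots, 0$, and then $u_h^{n+1} = 2w - u_h^{n-1}$. This uses exactly $m$ products by $A_h$. A reference function that builds $P_m(\Delta t^2 A_h)$ explicitly is included for comparison, and the random matrix is a hypothetical stand-in for $A_h$.

```python
import numpy as np
from math import factorial


def step_order_2m(A, u_prev, u, dt, m):
    """One step of scheme (7) with m products by A (Horner's rule)."""
    w = u.copy()                                    # w = u^n
    for k in range(1, m + 1):
        j = m - k                                   # innermost factor first: (2m-1)(2m), ..., (1)(2)
        w = u - dt**2 * (A @ w) / ((2 * j + 1) * (2 * j + 2))
    return 2.0 * w - u_prev                         # u^{n+1} = 2w - u^{n-1}


def step_direct(A, u_prev, u, dt, m):
    """Reference: 2u^n - u^{n-1} - dt^2 A P_m(dt^2 A) u^n, with P_m from (8) built explicitly."""
    B = dt**2 * A
    P = np.eye(len(u))
    for l in range(1, m):
        P = P + 2.0 * (-1) ** l / factorial(2 * l + 2) * np.linalg.matrix_power(B, l)
    return 2.0 * u - u_prev - B @ (P @ u)


rng = np.random.default_rng(1)
C = rng.standard_normal((6, 6))
A = C @ C.T                      # hypothetical symmetric positive semi-definite A_h
u_prev, u = rng.standard_normal(6), rng.standard_normal(6)
for m in (1, 2, 3, 4):
    diff = step_order_2m(A, u_prev, u, 0.1, m) - step_direct(A, u_prev, u, 0.1, m)
    print(m, np.abs(diff).max())
```

For $m = 1$ the loop reduces to the leap-frog scheme (4), and for $m = 2$ to the factored form of (6).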
2.2 Stability Analysis
The stability analysis of the higher order scheme (7) is similar to that of the second order scheme, but it is complicated by the fact that one must verify that the operator $A_h^{(m)}(\Delta t)$ is positive, which already imposes an upper bound on $\Delta t$.
Theorem 2. A sufficient stability condition for scheme (7) is given by
$$\Delta t^2 \|A_h\| \le \alpha_m, \qquad (9)$$
where we have defined
$$\alpha_m = \sup\{\alpha \mid \forall x \in [0, \alpha],\ 0 \le Q_m(x) \le 4\}, \qquad (10)$$
with
$$Q_m(x) = x P_m(x) = x + 2\sum_{l=1}^{m-1} (-1)^l \frac{x^{l+1}}{(2l+2)!}. \qquad (11)$$
This condition is necessary as soon as the spectrum of $A_h$ is the whole interval $[0, \|A_h\|]$.
Proof. Using von Neumann analysis [RM67] and the spectral theory of self-adjoint operators (namely the spectral theorem [RS78]), it is sufficient to look at the ($\lambda$-parameterized) family of difference equations ($u^n$ is now a sequence of complex numbers):
$$\frac{u^{n+1} - 2u^n + u^{n-1}}{\Delta t^2} + \lambda P_m(\lambda \Delta t^2)\, u^n = 0, \qquad \lambda \in \sigma(A_h), \qquad (12)$$
where $\sigma(A_h)$ is the spectrum of $A_h$. The characteristic equation of this recurrence is
$$r^2 - \big(2 - Q_m(\lambda \Delta t^2)\big)\, r + 1 = 0.$$
This is a second degree equation with real coefficients. Since the product of the roots is 1, both roots have modulus at most 1 (which is equivalent to the boundedness of $u^n$) if and only if the discriminant of this equation is non-positive, in which case the roots belong to the unit circle. This leads to $Q_m(\lambda\Delta t^2)\,[4 - Q_m(\lambda\Delta t^2)] \ge 0$, or
$$0 \le Q_m(\lambda \Delta t^2) \le 4.$$
If (9) holds, since $\sigma(A_h) \subset [0, \|A_h\|]$, then $\lambda \Delta t^2 \in [0, \alpha_m]$ and the definition (10) gives $0 \le Q_m(\lambda\Delta t^2) \le 4$, which proves that (9) is a sufficient stability condition. The second part of the proof is left to the reader.
□
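The root condition used in this proof can be checked directly. For a given value $q = Q_m(\lambda \Delta t^2)$, the characteristic equation $r^2 - (2-q)r + 1 = 0$ has both roots on the unit circle when $0 < q < 4$, and a root of modulus larger than 1 when $q < 0$ or $q > 4$ (the limit cases $q = 0$ and $q = 4$ give a double root $r = \pm 1$). A minimal Python/NumPy sketch (the function name is ours):

```python
import numpy as np


def max_root_modulus(q):
    """Largest |r| among the roots of r^2 - (2 - q) r + 1 = 0, with q = Q_m(lambda dt^2)."""
    return np.abs(np.roots([1.0, -(2.0 - q), 1.0])).max()


for q in (-0.1, 0.5, 2.0, 3.99, 4.1):
    print(q, max_root_modulus(q))
```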
Remark 2. The equality $\sigma(A_h) = [0, \|A_h\|]$ holds, for instance, when one uses a finite difference discretization of the wave equation with constant coefficients in the whole space. Fourier analysis then shows that the spectrum of $A_h$ is purely continuous.
□
The finiteness of $\alpha_m$ for each $m$ is quite obvious. However, its value is difficult to compute explicitly, except for the first values of $m$. One has, in particular,
$$\alpha_1 = 4, \quad \alpha_2 = 12, \quad \alpha_3 = 2\big(5 + 5^{1/3} - 5^{2/3}\big) \simeq 7.572, \quad \alpha_4 \simeq 21.4812, \ \dots \qquad (13)$$
For the exact, but very complicated, expression of $\alpha_4$, we refer to [CJRT01] or [Jol03]; other values of $\alpha_m$ are given in the column "k = 0" of Table 1 on page 88. It is particularly interesting to note that:
• for the fourth order scheme, one is allowed to take a time step which is $\sqrt{\alpha_2/\alpha_1}$ ($\simeq 1.732$) times larger than for the second order scheme, which almost balances the fact that the cost of one time step is twice as large;
• in the same way, with the scheme of order 8, one can take a time step $\sqrt{\alpha_4/\alpha_1}$ ($\simeq 2.317$) times larger (while each time step costs four times more).
Surprisingly, the scheme of order 6 seems less interesting: its stability condition is more constraining than that of the fourth order scheme.
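The values (13), and the even/odd behaviour analysed below, can be reproduced numerically. The following Python/NumPy sketch is our own illustration, not the procedure used by the authors. It computes the supremum (10) for a given polynomial by locating its critical points on a fine grid (refined by bisection), using the fact that the polynomial is monotone between consecutive critical points, and then bisects for the first crossing of the level 0 or 4. It assumes that the exit happens before `xmax = 60`, which holds here since $\alpha_m \le 4\pi^2$ for all the $m$ considered. Floating-point round-off limits its reliability when the polynomial comes within about $10^{-13}$ of the levels 0 or 4 without crossing them, which happens for larger $m$.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Polynomial


def q_poly(m):
    """Q_m(x) = x + 2 sum_{l=1}^{m-1} (-1)^l x^{l+1} / (2l+2)!   (formula (11))."""
    c = np.zeros(m + 1)
    c[1] = 1.0
    for l in range(1, m):
        c[l + 1] = 2.0 * (-1) ** l / factorial(2 * l + 2)
    return Polynomial(c)


def bisect(f, a, b, iters=100):
    """Root of f in [a, b]; assumes f(b) != 0 and f(a) has the opposite sign (or is 0)."""
    fb_positive = f(b) > 0
    for _ in range(iters):
        c = 0.5 * (a + b)
        if (f(c) > 0) == fb_positive:
            b = c
        else:
            a = c
    return 0.5 * (a + b)


def alpha(p, xmax=60.0, n=60001):
    """sup{a : 0 <= p(x) <= 4 on [0, a]}, assuming p(0) in [0, 4] and an exit before xmax."""
    dp = p.deriv()
    xs = np.linspace(0.0, xmax, n)
    d = dp(xs)
    # p is monotone between consecutive critical points (sign changes of p' on the grid)
    crit = [bisect(dp, xs[i], xs[i + 1]) for i in np.nonzero(d[:-1] * d[1:] <= 0)[0]]
    a = 0.0
    for b in crit + [xmax]:
        pb = p(b)
        if pb < 0.0 or pb > 4.0:          # first exit happens inside [a, b]
            target = 0.0 if pb < 0.0 else 4.0
            return bisect(lambda x: p(x) - target, a, b)
        a = b
    return np.inf


for m in range(1, 11):
    print(m, alpha(q_poly(m)))
```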
From the theoretical point of view, it would be interesting to know the
behaviour of αm for large m. For this we first identify the limit behaviour of
the polynomials Qm (x). One easily checks that
$$\lim_{m\to+\infty} Q_m(x) = Q_\infty(x) \equiv x + 2\sum_{l=1}^{+\infty} (-1)^l \frac{x^{l+1}}{(2l+2)!} = 2\big(1 - \cos\sqrt{x}\big). \qquad (14)$$
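Remark 3. Setting $P_\infty(x) = 2\,\frac{1 - \cos\sqrt{x}}{x}$ and taking (formally) the limit of (7) when $m \to +\infty$, we obtain the scheme
$$\frac{u_h^{n+1} - 2u_h^n + u_h^{n-1}}{\Delta t^2} + A_h P_\infty(\Delta t^2 A_h)\, u_h^n = 0. \qquad (15)$$
This scheme is, in fact, an exact scheme for the differential equation (2). It suffices to remark that
$$\sin\big(A_h^{1/2} t^{n+1}\big) - 2\sin\big(A_h^{1/2} t^n\big) + \sin\big(A_h^{1/2} t^{n-1}\big) = -2\big(I - \cos(A_h^{1/2}\Delta t)\big) \sin\big(A_h^{1/2} t^n\big),$$
$$\cos\big(A_h^{1/2} t^{n+1}\big) - 2\cos\big(A_h^{1/2} t^n\big) + \cos\big(A_h^{1/2} t^{n-1}\big) = -2\big(I - \cos(A_h^{1/2}\Delta t)\big) \cos\big(A_h^{1/2} t^n\big),$$
so that any solution of (2) of the form (for some $a$ and $b$ in $V_h$)
$$u_h(t) = \cos\big(A_h^{1/2} t\big)\, a + \sin\big(A_h^{1/2} t\big)\, b$$
satisfies
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = -A_h \big(\Delta t^2 A_h\big)^{-1}\, 2\big(I - \cos(A_h^{1/2}\Delta t)\big)\, u_h(t^n),$$
that is to say,
$$\frac{u_h(t^{n+1}) - 2u_h(t^n) + u_h(t^{n-1})}{\Delta t^2} = -A_h P_\infty(\Delta t^2 A_h)\, u_h(t^n).$$
□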
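The exactness of (15) is easy to observe numerically. In the following Python/NumPy sketch, a random symmetric positive definite matrix with spectrum in $[5, +\infty[$ is a hypothetical stand-in for $A_h$, and the operator $\Delta t^2 A_h P_\infty(\Delta t^2 A_h) = 2\big(I - \cos(\Delta t A_h^{1/2})\big)$ is formed through the spectral decomposition. Starting from two exact values, the scheme reproduces $u_h(t^n)$ up to round-off, even with a time step far beyond the leap-frog limit $2/\|A_h\|^{1/2} \le 2/\sqrt{5}$.

```python
import numpy as np

# Hypothetical stand-in for A_h: a random SPD matrix with spectrum in [5, +inf).
rng = np.random.default_rng(2)
C = rng.standard_normal((5, 5))
A = C @ C.T + 5.0 * np.eye(5)
lam, V = np.linalg.eigh(A)
omega = np.sqrt(lam)                     # eigenvalues of A^{1/2}


def exact(t, a, b):
    """u_h(t) = cos(A^{1/2} t) a + sin(A^{1/2} t) b."""
    return V @ (np.cos(omega * t) * (V.T @ a) + np.sin(omega * t) * (V.T @ b))


def q_inf_matrix(dt):
    """dt^2 A P_inf(dt^2 A) = 2 (I - cos(dt A^{1/2})), via the spectral decomposition."""
    return V @ np.diag(2.0 * (1.0 - np.cos(omega * dt))) @ V.T


a, b = rng.standard_normal(5), rng.standard_normal(5)
dt, nsteps = 1.5, 200                    # dt > 2 / sqrt(5) >= 2 / sqrt(||A||)
Q = q_inf_matrix(dt)
u_prev, u = exact(0.0, a, b), exact(dt, a, b)
for _ in range(nsteps - 1):              # scheme (15): u^{n+1} = 2u^n - u^{n-1} - Q u^n
    u_prev, u = u, 2.0 * u - u_prev - Q @ u
err = np.linalg.norm(u - exact(nsteps * dt, a, b))
print(err)
```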
Since $0 \le Q_\infty(x) \le 4$, if we define $\alpha_\infty$ by (10) for $m = +\infty$, we have $\alpha_\infty = +\infty$. Unfortunately, this does not mean, as we are going to see, that $\alpha_m \to +\infty$ when $m \to +\infty$. In fact, to describe the behaviour of $\alpha_m$, we have to distinguish between the even and odd sequences $\alpha_{2m}$ and $\alpha_{2m+1}$. Our first observation is that the convergence of the sequences $Q_{2m}(x)$ and $Q_{2m+1}(x)$ is monotone. Indeed, for $m \ge 1$,
$$Q_{2m-1}(x) - Q_{2m+1}(x) = \frac{2x^{2m}}{(4m)!}\Big(1 - \frac{x}{(4m+1)(4m+2)}\Big),$$
which shows that $Q_{2m+1}(x)$ is a strictly decreasing sequence for large $m$:
$$Q_{2m+1}(x) < Q_{2m-1}(x) \quad \text{as soon as } 0 < x < (4m+1)(4m+2).$$
In particular, since $(4m+1)(4m+2) > \pi^2$ for $m \ge 1$:
$$Q_\infty(\pi^2) = \lim_{m\to+\infty} Q_{2m+1}(\pi^2) = 4 \implies Q_{2m+1}(\pi^2) > 4,$$
which shows, using the definition (10), that
$$\alpha_{2m+1} \le \pi^2, \quad \text{for } m \ge 1.$$
Moreover, by the definition of $\alpha_m$, we know that $Q_m(\alpha_m) = 0$ or $4$. On the other hand, since the sequence $Q_{2m+1}(x)$ is decreasing, we have
$$Q_{2m+1}(x) > Q_\infty(x) = 2\big(1 - \cos\sqrt{x}\big) \ge 0 \quad \text{for any } x \in\ ]0, \pi^2].$$
This rules out $Q_{2m+1}(\alpha_{2m+1}) = 0$, which implies that
$$Q_{2m+1}(\alpha_{2m+1}) = 4.$$
Finally, the inequality $Q_{2m+1}(x) < Q_1(x) = x$, valid for $0 < x < 30$, implies
$$Q_{2m+1}(x) < 4, \quad \forall x \in [0, 4],$$
which implies, in particular, $\alpha_{2m+1} > 4$.
Let $\alpha_{\mathrm{odd}} \in [4, \pi^2]$ be any accumulation point of $\alpha_{2m+1}$. Since the convergence of $Q_m$ to $Q_\infty$ is uniform on any compact set, we get
$$Q_\infty(\alpha_{\mathrm{odd}}) = 4 \implies \alpha_{\mathrm{odd}} = \pi^2 \quad (\text{since } \alpha_{\mathrm{odd}} \in [4, \pi^2]).$$
In the same way,
$$Q_{2m+2}(x) - Q_{2m}(x) = \frac{2x^{2m+1}}{(4m+2)!}\Big(1 - \frac{x}{(4m+3)(4m+4)}\Big)$$
shows that the sequence $Q_{2m}(x)$ is strictly increasing for large $m$:
$$Q_{2m+2}(x) > Q_{2m}(x) \quad \text{as soon as } 0 < x < (4m+3)(4m+4).$$
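The two difference formulas for $Q_{2m-1} - Q_{2m+1}$ and $Q_{2m+2} - Q_{2m}$ can be verified in exact rational arithmetic. The following Python sketch (standard library only; the function names are ours) compares them with their closed forms for $m = 1, \dots, 5$ at a few rational points, one of them being a rational approximation of $4\pi^2$.

```python
from fractions import Fraction
from math import factorial


def Q(n, x):
    """Q_n(x) = 2 sum_{j=1}^{n} (-1)^(j+1) x^j / (2j)!, i.e. formula (11), in exact arithmetic."""
    return 2 * sum(Fraction((-1) ** (j + 1)) * x**j / factorial(2 * j) for j in range(1, n + 1))


def odd_gap(m, x):
    """Closed form for Q_{2m-1}(x) - Q_{2m+1}(x)."""
    return 2 * x ** (2 * m) / factorial(4 * m) * (1 - x / ((4 * m + 1) * (4 * m + 2)))


def even_gap(m, x):
    """Closed form for Q_{2m+2}(x) - Q_{2m}(x)."""
    return 2 * x ** (2 * m + 1) / factorial(4 * m + 2) * (1 - x / ((4 * m + 3) * (4 * m + 4)))


samples = [Fraction(3, 7), Fraction(10), 4 * Fraction(355, 113) ** 2]
ok = all(
    Q(2 * m - 1, x) - Q(2 * m + 1, x) == odd_gap(m, x)
    and Q(2 * m + 2, x) - Q(2 * m, x) == even_gap(m, x)
    for m in range(1, 6)
    for x in samples
)
print(ok)
```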
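In particular, as soon as $m \ge 1$ (so that $(4m+3)(4m+4) \ge 56 > 4\pi^2$),
$$Q_\infty(4\pi^2) = \lim_{m\to+\infty} Q_{2m}(4\pi^2) = 0 \implies Q_{2m}(4\pi^2) < 0,$$
which shows that
$$\alpha_{2m} \le 4\pi^2, \quad m \ge 1,$$
while the inequality $Q_{2m}(x) < 2\big(1 - \cos\sqrt{x}\big) \le 4$ on $]0, 4\pi^2]$ for $m \ge 1$ implies that
$$Q_{2m}(\alpha_{2m}) = 0.$$
Finally, the inequality, for $m > 1$,
$$Q_{2m}(x) > Q_2(x) = x\Big(1 - \frac{x}{12}\Big) \quad \text{for } 0 < x < 56,$$
shows that $Q_{2m}(x) > 0$ for $0 < x < 12$, which implies that
$$\alpha_{2m} \ge 12.$$
Let $\alpha_{\mathrm{even}} \in [12, 4\pi^2]$ be any accumulation point of $\alpha_{2m}$. We thus get
$$Q_\infty(\alpha_{\mathrm{even}}) = 0 \implies \alpha_{\mathrm{even}} = 4\pi^2 \quad (\text{since } \alpha_{\mathrm{even}} \in [12, 4\pi^2]).$$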
We have shown the following result:
Theorem 3. Let $\alpha_m$ be defined by (10). Then
$$\lim_{m\to+\infty} \alpha_{2m} = 4\pi^2, \qquad \lim_{m\to+\infty} \alpha_{2m+1} = \pi^2. \qquad (16)$$
3 Modified Higher Order Schemes: an Optimization Approach
For an integer $k$, we denote by $\mathcal{P}_k$ the set of polynomials of degree less than or equal to $k$ and define $\mathcal{P} \equiv \bigcup_{k \ge 0} \mathcal{P}_k$.
A general explicit scheme of order $2m$ is given by
$$\frac{u_h^{n+1} - 2u_h^n + u_h^{n-1}}{\Delta t^2} + \Big[P_m(\Delta t^2 A_h) + \Delta t^{2m} A_h^m R_k(\Delta t^2 A_h)\Big] A_h u_h^n = 0, \qquad (17)$$
where $R_k \in \mathcal{P}_{k-1}$. The cost of this new scheme is a priori $(m+k)/m$ times larger than the cost of the scheme corresponding to $R_k = 0$. As in Theorem 2, the stability condition of this new scheme is
$$\Delta t^2 \|A_h\| \le \alpha_m(R_k), \qquad (18)$$
where we have defined
$$\alpha_m(R) = \sup\{\alpha \mid \forall x \in [0, \alpha],\ 0 \le x\,[P_m(x) + x^m R(x)] \le 4\}. \qquad (19)$$
The natural idea, in some sense, to get an optimal scheme would be to solve the optimization problem:
$$\text{Find } R_{m,k} \in \mathcal{P}_{k-1} \text{ such that } \alpha_m(R_{m,k}) = \sup_{R \in \mathcal{P}_{k-1}} \alpha_m(R). \qquad (20)$$
Then, assuming that this problem has a solution $R_{m,k}$, one gets the optimal CFL constant for the schemes in the class, namely
$$\alpha_{m,k} = \alpha_m(R_{m,k}). \qquad (21)$$
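Clearly, since $\mathcal{P}_{k-1} \subset \mathcal{P}_k$, $\alpha_{m,k}$ is nondecreasing in $k$. We also have $\alpha_{m,k} > 0$, since $P_m(0) = 1$ ($m \ge 1$).
For what follows, it is useful to introduce the following affine map:
$$\psi_m : \mathcal{P} \to \mathcal{P}, \qquad R \mapsto \psi_m(R) = Q_m + x^{m+1} R, \qquad (22)$$
where we recall that $Q_m(x) = x P_m(x)$. Note that $\psi_m$ maps $\mathcal{P}_{k-1}$ into $\mathcal{P}_{m+k}$.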
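Lemma 1. The function $R \in \mathcal{P}_{k-1} \mapsto \alpha_m(R) \in \mathbb{R}_+^*$ has the following properties:
(i) It goes to 0 at infinity:
$$\lim_{\|R\| \to +\infty} \alpha_m(R) = 0.$$
(ii) It is upper semi-continuous:
$$R_n \to R \text{ in } \mathcal{P}_{k-1} \implies \alpha_m(R) \ge \limsup \alpha_m(R_n).$$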
Proof. Let $r_j(R)$ denote the coefficient of $x^j$ in $R \in \mathcal{P}_{k-1}$ and consider $R_n \in \mathcal{P}_{k-1}$ such that
$$\|R_n\|_\infty \equiv \sup_{0 \le j \le k-1} |r_j(R_n)| \longrightarrow +\infty.$$
Since $\mathcal{P}_{k-1}$ is finite dimensional, one can find a subsequence (still denoted $R_n$ for simplicity) and a fixed non-zero polynomial $\varphi \in \mathcal{P}_{k-1}$ such that, as soon as $\varphi(x) \ne 0$,
$$R_n(x) \sim \|R_n\|_\infty\, \varphi(x) \quad (n \to +\infty).$$
For such positive values of $x$, $[\psi_m(R_n)](x) \notin [0, 4]$ for sufficiently large $n$, which means that $\alpha_m(R_n) < x$, hence $\limsup \alpha_m(R_n) \le x$. Since $\varphi$ is a non-zero polynomial, one can find arbitrarily small values of such $x$, so that $\limsup \alpha_m(R_n) \le 0$. As $\alpha_m(R_n)$ is a sequence of positive real numbers, this means that $\alpha_m(R_n)$ tends to 0.
On the other hand, let $R_n \in \mathcal{P}_{k-1}$ be a sequence converging to $R$. Let $\varepsilon$ be any arbitrarily small positive number. By the uniform convergence of $R_n$ to $R$ on the interval $I_R(\varepsilon) = [0, \alpha_m(R) + \varepsilon]$ we have
$$\lim_{n\to+\infty} \|\psi_m(R_n) - 2\|_{L^\infty(I_R(\varepsilon))} = \|\psi_m(R) - 2\|_{L^\infty(I_R(\varepsilon))} > 2.$$
Thus, there exists an integer $N_\varepsilon$ such that
$$n \ge N_\varepsilon \implies \|\psi_m(R_n) - 2\|_{L^\infty(I_R(\varepsilon))} > 2 \implies \alpha_m(R_n) < \alpha_m(R) + \varepsilon.$$
Therefore,
$$\limsup \alpha_m(R_n) \le \alpha_m(R) + \varepsilon,$$
which yields ($\varepsilon$ being arbitrarily small) $\limsup \alpha_m(R_n) \le \alpha_m(R)$.
□
Fig. 1. Graph of the function $\alpha_1(r)$ (plotted for $-0.5 \le r \le 0.5$)
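The classical existence theory in analysis [Sch91, Theorem 2.7.11] then leads to an existence result: by Lemma 1, $\alpha_m$ is upper semi-continuous and tends to 0 at infinity, so that its supremum over the finite dimensional space $\mathcal{P}_{k-1}$ is attained.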
Corollary 1 (Existence of a solution). The optimization problem (20) has
(at least) one solution.
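Clearly, the function $R \mapsto \alpha_m(R)$ is not continuous. Let us consider, for instance, the case when $m = 1$ and $k = 1$. Then the function $\alpha_1(R)$ can be identified (with $R = -r$) with the function of the real variable $r$ defined by
$$\alpha_1(r) = \sup\{\alpha \mid \forall x \in [0, \alpha],\ 0 \le x - r x^2 \le 4\}.$$
It is straightforward to compute that
$$\alpha_1(r) = \frac{1 - \sqrt{1 - 16r}}{2r} \ \text{ if } r < \frac{1}{16} \ (\text{with the limit value } \alpha_1(0) = 4), \qquad \alpha_1(r) = \frac{1}{r} \ \text{ if } r \ge \frac{1}{16}. \qquad (23)$$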
It is clear that $\alpha_1$ is discontinuous at $r = 1/16$ since (see also Figure 1)
$$\alpha_1(1/16) = 16 \quad\text{and}\quad \lim_{r \uparrow 1/16} \alpha_1(r) = 8.$$
Note that for $r = 1/16$ the graph of the polynomial $x - r x^2$ is tangent to the line $y = 4$ at $x = 8 < \alpha_1(1/16) = 16$. This is an illustration of a more general property.
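Formula (23) and the jump at $r = 1/16$ can be checked against a brute-force evaluation of the definition of $\alpha_1(r)$. In this Python sketch (function names are ours), the brute force scans a uniform grid of step $5 \times 10^{-4}$ on $[0, 100]$, with a small tolerance for round-off, so both values should agree to within about one grid step.

```python
import math


def alpha1_closed(r):
    """Formula (23); at r = 0 the polynomial is x and the value is the limit 4."""
    if r == 0.0:
        return 4.0
    if r < 1.0 / 16.0:
        return (1.0 - math.sqrt(1.0 - 16.0 * r)) / (2.0 * r)
    return 1.0 / r


def alpha1_brute_force(r, xmax=100.0, n=200001, tol=1e-12):
    """First grid point at which x - r x^2 leaves [0, 4], up to a round-off tolerance."""
    h = xmax / (n - 1)
    for i in range(1, n):
        x = i * h
        y = x - r * x * x
        if y < -tol or y > 4.0 + tol:
            return x
    return math.inf


for r in (-0.3, -0.05, 0.02, 0.06, 1.0 / 16.0, 0.1, 0.3):
    print(r, alpha1_closed(r), alpha1_brute_force(r))
print(alpha1_closed(1.0 / 16.0 - 1e-9), alpha1_closed(1.0 / 16.0))   # close to 8, then 16
```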
Lemma 2. Let $D_k$ be the set of polynomials $R \in \mathcal{P}_{k-1}$ such that
$$\exists x^* \in\ ]0, \alpha_m(R)[ \ \mid\ [\psi_m(R)](x^*) = 0 \text{ or } 4. \qquad (24)$$
The function $R \mapsto \alpha_m(R)$ is discontinuous at every point of $D_k$ and continuous everywhere else.
Proof. Let $R \in D_k$ be such that $[\psi_m(R)](x^*) = 4$ for some $x^* \in\ ]0, \alpha_m(R)[$. (A similar argument works if $[\psi_m(R)](x^*) = 0$.) For any $\varepsilon > 0$, $\psi_m(R + \varepsilon) = \psi_m(R) + \varepsilon x^{m+1} > 4$ in a small neighborhood of $x^*$. This implies that $\alpha_m(R + \varepsilon) < x^* < \alpha_m(R)$, hence the discontinuity of $\alpha_m$ at $R$.
On the other hand, let $R \in \mathcal{P}_{k-1} \setminus D_k$ and consider a sequence of polynomials $R_n \in \mathcal{P}_{k-1}$ converging to $R$. Since
$$\frac{\big|[\psi_m(R_n)](x) - [\psi_m(R)](x)\big|}{x^{m+1}} = |R_n(x) - R(x)| \to 0$$
uniformly in $x \in [0, \alpha_m(R)]$, there exists an integer $N_1$ such that $[\psi_m(R)](x) - x^{m+1} \le [\psi_m(R_n)](x) \le [\psi_m(R)](x) + x^{m+1}$ for $n \ge N_1$ and $x \in [0, \alpha_m(R)]$. These inequalities and the fact that $[\psi_m(R)](0) = 0$ and $[\psi_m(R)]'(0) = 1$ imply that there is $\varepsilon_1 > 0$ such that $[\psi_m(R_n)](x) \in [0, 4]$ for $n \ge N_1$ and $x \in [0, \varepsilon_1]$. In other words,
$$\text{for } n \ge N_1, \quad \alpha_m(R_n) \ge \varepsilon_1.$$
For any $\varepsilon \in\ ]0, \varepsilon_1]$ small enough, and $J_R(\varepsilon) = [\varepsilon, \alpha_m(R) - \varepsilon]$, there holds
$$\|\psi_m(R) - 2\|_{L^\infty(J_R(\varepsilon))} < 2.$$
Then there exists an integer $N_\varepsilon \ge N_1$ such that for $n \ge N_\varepsilon$
$$\|\psi_m(R_n) - 2\|_{L^\infty(J_R(\varepsilon))} < 2, \quad\text{hence}\quad \alpha_m(R_n) > \alpha_m(R) - \varepsilon.$$
Now $\varepsilon > 0$ is arbitrarily small, so that $\liminf \alpha_m(R_n) \ge \alpha_m(R)$. The continuity of $\alpha_m$ at $R$ follows, since $\alpha_m$ is upper semi-continuous by Lemma 1.
□
Lemma 3. The set of solutions of the optimization problem (20) is a convex subset of $D_k$.
Proof. Let us first prove that any local maximum of $\alpha_m$ belongs to $D_k$. Indeed, it is easy to see that, if $R \notin D_k$, the function
$$t \in \mathbb{R} \mapsto \alpha_m(R + t)$$
is continuous and strictly monotone in a neighborhood of the origin. This shows that $R$ cannot be a local maximum of $\alpha_m$.
Let $R_1$ and $R_2$ be two solutions of (20):
$$\alpha_m(R_1) = \alpha_m(R_2) = \alpha_{m,k} \equiv \sup_{R \in \mathcal{P}_{k-1}} \alpha_m(R).$$
By definition of $\alpha_m$,
$$\forall x \in [0, \alpha_{m,k}], \quad 0 \le [\psi_m(R_1)](x) \le 4 \quad\text{and}\quad 0 \le [\psi_m(R_2)](x) \le 4.$$
Therefore, since $\psi_m$ is an affine function, for any $t \in [0, 1]$ there holds
$$\forall x \in [0, \alpha_{m,k}], \quad 0 \le \big[\psi_m\big(tR_1 + (1-t)R_2\big)\big](x) \le 4.$$
Hence $\alpha_m\big(tR_1 + (1-t)R_2\big) \ge \alpha_{m,k}$ and, by definition of $\alpha_{m,k}$,
$$\alpha_m\big(tR_1 + (1-t)R_2\big) = \alpha_{m,k}.$$
In other words, any point of the segment $[R_1, R_2]$ is a solution of (20), i.e., the set of solutions of (20) is convex.
□
As a consequence of Lemmas 2 and 3, we know that any solution $R$ of (20) is such that
$$T_R \equiv \{\tau \in\ ]0, \alpha_{m,k}[\ \mid\ [\psi_m(R)](\tau) = 0 \text{ or } 4\}$$
is nonempty. Let us call tangent point an element of $T_R$. Theorem 4 below is more precise, since it claims that there are $M \ge k$ tangent points $\tau_j$ at which $\psi_m(R)$ takes alternately the values 0 and 4. For any $R$, it is convenient to construct and enumerate these tangent points in decreasing order:
$$\tau_{M+1} = 0 < \tau_M < \cdots < \tau_1 < \tau_0 = \alpha_{m,k}.$$
The selected subset $\{\tau_1, \tau_2, \dots, \tau_M\} \subset T_R$ is built as follows. Let us start by setting
$$\tau_0 = \alpha_{m,k} \quad\text{and}\quad s_0 = \begin{cases} -1 & \text{if } [\psi_m(R)](\tau_0) = 4, \\ +1 & \text{if } [\psi_m(R)](\tau_0) = 0. \end{cases} \qquad (25)$$
The points $\tau_j \in T_R$, $j = 1, \dots, M$, and their number $M$ are determined by the following recurrence. For $j \ge 1$:
1. set $s_j = -s_{j-1}$;
2. if this is possible, take $\tau_j$ as the largest $\tau \in\ ]0, \tau_{j-1}[$ such that
$$[\psi_m(R)](\tau_j) = \begin{cases} 4 & \text{if } s_j = -1, \\ 0 & \text{if } s_j = +1. \end{cases}$$
The procedure stops when there is no relevant $\tau_j$ in step 2 above (it must stop because of the polynomial nature of $\psi_m(R)$). In the proof of Theorem 4 below, $s_j$ is actually the sign at $\tau_j$ of a certain function $\varphi$ that is added to a potential solution $R$.
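The selection procedure (25), 1, 2 can be sketched as follows in Python with NumPy. This is a hypothetical helper, not the authors' code: tangent points are detected numerically as critical points of $p = \psi_m(R)$ at which $p$ equals 0 or 4 up to a tolerance. The example uses the polynomial of Corollary 3 with $m = 1$ and $k = 3$, for which $\alpha_{1,3} = 64$; the procedure should select the three points $32(1 + \cos(\pi/4))$, $32$ and $32(1 - \cos(\pi/4))$, in this order.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial


def alternate_tangent_points(p, alpha, tol=1e-8):
    """Procedure (25), 1, 2: alternate tangent points of p in ]0, alpha[, in decreasing order."""
    crit = [z.real for z in p.deriv().roots() if abs(z.imag) < 1e-9 and 0.0 < z.real < alpha]
    # tangent points: critical points where p takes the value 0 or 4 (up to tol)
    tangent = [(t, 4.0 if abs(p(t) - 4.0) < tol else 0.0)
               for t in crit if abs(p(t)) < tol or abs(p(t) - 4.0) < tol]
    s = -1 if abs(p(alpha) - 4.0) < tol else +1       # s_0 as in (25)
    selected, upper = [], alpha
    while True:
        s = -s                                         # step 1
        target = 4.0 if s == -1 else 0.0               # step 2
        below = [t for t, value in tangent if t < upper and value == target]
        if not below:
            return selected
        upper = max(below)
        selected.append(upper)


# Example: 2 (1 - T_4(1 - 2x/64)), the m = 1, k = 3 polynomial of Corollary 3.
alpha = 64.0
p = 2.0 * (1.0 - Chebyshev.basis(4).convert(kind=Polynomial)(Polynomial([1.0, -2.0 / alpha])))
print(np.round(alternate_tangent_points(p, alpha), 4))
```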
A priori, because of the chosen selection procedure, it may occur that
M = 0, even though the number of tangent points is nonzero. The next
theorem shows that this is not the case for a local maximum.
Theorem 4 (Necessary optimality condition). Let R be a local maximum of (20). Then the number M of alternate tangent points selected by the
procedure (25)+1+2 satisfies M ≥ k.
Proof. We proceed by contradiction, assuming that $M \le k - 1$. For $j = 0, \dots, M-1$, one can find a point
$$\tau_{j+\frac12} \in\ ]\tau_{j+1}, \tau_j[ \quad\text{such that}\quad [\psi_m(R)]\big(]\tau_{j+1}, \tau_{j+\frac12}]\big) \subset\ ]0, 4[. \qquad (26)$$
Consider the polynomial $\varphi$ defined at $x \in \mathbb{R}$ by
$$\varphi(x) = s_0 \prod_{j=0}^{M-1} \big(x - \tau_{j+\frac12}\big).$$
Hence $\varphi \equiv s_0$ if $M = 0$. This polynomial is of degree $M \le k - 1$, so that it is a possible increment to $R$ in $\mathcal{P}_{k-1}$. For $t > 0$, consider the polynomial $p_t = \psi_m(R + t\varphi)$, which verifies for all $x \in \mathbb{R}$:
$$p_t(x) = [\psi_m(R)](x) + t x^{m+1} \varphi(x).$$
We shall get a contradiction and conclude the proof if we show that, for any
small t > 0, pt (x) ∈ ]0, 4[ for x ∈ ]0, αm,k ] (since then αm (R + tϕ) > αm,k and
R would not be a local maximum).
We shall only consider the case when [ψm (R)] (αm,k ) = 4, since the reasoning is similar when [ψm (R)] (αm,k ) = 0. Then s0 = −1 by (25).
• On the interval $]\tau_{1/2}, \alpha_{m,k}]$, $\psi_m(R)$ is greater than a positive constant (since it is positive on $]\tau_1, \alpha_{m,k}]$ by the definition of $\tau_1$, and $\tau_1 < \tau_{1/2} < \alpha_{m,k}$). On the other hand, on this interval, $\varphi$ is negative (since $s_0 < 0$) and bounded. Therefore, for $t > 0$ small enough, $p_t \in\ ]0, 4[$ on this interval.
• Since $\varphi(\tau_{1/2}) = 0$, $p_t(\tau_{1/2}) = [\psi_m(R)](\tau_{1/2})$, which is in $]0, 4[$ by the definition of $\tau_{1/2}$ in (26).
• On the interval $]\tau_{3/2}, \tau_{1/2}[$, $\psi_m(R)$ is less than a constant $< 4$ (since it is $< 4$ on $]\tau_2, \tau_{1/2}]$ by the definition of $\tau_2$ and $\tau_{1/2}$, see step 2 and (26), and $\tau_2 < \tau_{3/2} < \tau_1 < \tau_{1/2}$). On the other hand, $\varphi$ is positive and bounded on this interval. Therefore, for $t > 0$ small enough, $p_t \in\ ]0, 4[$ on this interval.
We proceed similarly for the other points $\tau_{j+1/2}$ ($j = 1, \dots, M-1$) and intervals $]\tau_{j+3/2}, \tau_{j+1/2}[$ ($j = 1, \dots, M-2$). Let us now consider the interval $]0, \tau_{M-1/2}[$, which contains tangent points that are all at $y = 0$ or all at $y = 4$.
• If $s_M > 0$ then, on the considered interval, the tangent points are all at $y = 0$, $\psi_m(R)$ is less than a constant $< 4$, and $\varphi$ is positive. It follows that, for $t > 0$ small enough, $p_t(\cdot) \in\ ]0, 4[$ on the interval.
• If $s_M < 0$ then, on the considered interval, the tangent points are all at $y = 4$, $\psi_m(R)$ is positive, and $\varphi$ is negative. Since the map $x \mapsto [\psi_m(R)](x)/x = 1 + c_1 x + \cdots$ is greater than a positive constant on the considered interval, the map $x \mapsto [\psi_m(R)](x)/x + t x^m \varphi(x) = p_t(x)/x$ is also positive on the interval for $t > 0$ sufficiently small. It follows that, for $t > 0$ small enough, $p_t(\cdot) \in\ ]0, 4[$ on the considered interval.
□
Our next result shows that the necessary optimality conditions of Theorem 4 are also sufficient. We shall need the following lemma on polynomials.
Lemma 4. If $P \in \mathcal{P}_{k-1}$ takes alternately nonnegative and non-positive values at $k+1$ successive distinct points, then $P = 0$.
Proof. Without loss of generality, we can assume that, for points $x_0 < x_1 < \cdots < x_k$, there hold
$$(-1)^j P(x_j) \ge 0, \quad \text{for } j = 0, 1, \dots, k. \qquad (27)$$
Let us introduce the set of indices
$$I(P) = \{j \in \{0, 1, \dots, k\} \mid P(x_j) = 0\}.$$
When $I(P) = \{0, 1, \dots, k\}$ (resp. $I(P) = \emptyset$), the conclusion is straightforward since then $P$ has $k+1$ (resp. $k$) roots.
Suppose now that $I(P) \ne \emptyset$ and $I(P) \ne \{0, 1, \dots, k\}$. Let us introduce the Lagrange interpolation polynomials associated with the $x_j$'s, $j \in I(P)$:
$$P_l(x) = \prod_{j \in I(P),\ j \ne l} \frac{x - x_j}{x_l - x_j}, \qquad l \in I(P).$$
Note that all the $P_l$'s belong to $\mathcal{P}_{k-1}$ since $I(P)$ contains at most $k$ points. For $\varepsilon > 0$, we introduce
$$P_\varepsilon = P + \varepsilon \sum_{l \in I(P)} (-1)^l P_l$$
and note that
$$\forall j \in I(P), \quad (-1)^j P_\varepsilon(x_j) = \varepsilon > 0.$$
On the other hand, since $P_\varepsilon \to P$ uniformly on $[x_0, x_k]$, there exists $\varepsilon_0 > 0$ such that
$$\forall \varepsilon < \varepsilon_0,\ \forall j \notin I(P), \quad (-1)^j P_\varepsilon(x_j) > 0.$$
Therefore, for $\varepsilon < \varepsilon_0$, $P_\varepsilon$ satisfies (27) with, moreover, $I(P_\varepsilon) = \emptyset$. This implies that $P_\varepsilon = 0$. By taking the limit when $\varepsilon$ tends to 0, we get $P = 0$ (actually this contradicts the fact that $I(P)$ can be nonempty and different from $\{0, \dots, k\}$).
□
Theorem 5 (Sufficient condition of optimality). Suppose that $P = \psi_m(R)$, for some $R \in \mathcal{P}_{k-1}$, has $k$ tangent points $\{\tau_j\}_{j=1}^k$ such that $0 < \tau_k < \cdots < \tau_1 < \tau_0 = \alpha_m(R)$ and $P(\tau_j) + P(\tau_{j+1}) = 4$ for $j = 0, \dots, k-1$. Then $R$ is optimal for problem (20).
Proof. Let $P_{m,k} = \psi_m(R_{m,k})$ be an optimal polynomial (Corollary 1). The difference $D = R - R_{m,k} \in \mathcal{P}_{k-1}$ takes at $x > 0$ the value
$$D(x) = \frac{P(x) - P_{m,k}(x)}{x^{m+1}}.$$
Since $R_{m,k}$ is optimal, $\alpha_{m,k} \ge \alpha_m(R) = \tau_0$, so that $P_{m,k}(\tau_j) \in [0, 4]$ for $j = 0, \dots, k$. Then $D(\tau_j) \ge 0$ (resp. $D(\tau_j) \le 0$) when $P(\tau_j) = 4$ (resp. $P(\tau_j) = 0$). Since $P(\tau_j)$, $j = 0, \dots, k$, alternates in $\{0, 4\}$, we have shown that
$$(-1)^j \big(P(\tau_0) - 2\big)\, D(\tau_j) \ge 0, \quad \text{for } j = 0, \dots, k.$$
These inequalities tell us that $D \in \mathcal{P}_{k-1}$ satisfies the conditions of Lemma 4. Therefore, $D = 0$, proving that $R$ is optimal.
□
The necessary and sufficient optimality conditions of Theorems 4 and 5
will be used to determine the optimal polynomials in Section 4. We conclude
this section with two corollaries of these optimality conditions. The first one
deals with the uniqueness of the solution. The second one provides a full
description of the optimal polynomials when m = 1, relating them to the
Chebyshev polynomials of the first kind [Che66, LT86, Wei06].
Corollary 2 (Uniqueness of the solution). The maximization problem (20) has one and only one solution. It has no other local maximum.
Proof. Existence has been established in Corollary 1. Uniqueness is actually a by-product of the proof of Theorem 5, where it is shown that if a polynomial $P = \psi_m(R)$, for some $R \in \mathcal{P}_{k-1}$, satisfies the optimality conditions (this is the case for any local maximum, by Theorem 4), then $R$ is equal to an arbitrarily fixed solution. Hence there cannot be more than one solution or local maximum.
□
Corollary 3 (Optimal polynomials when m = 1). For $k \ge 0$,
$$\alpha_{1,k} = 4(k+1)^2 \qquad (28)$$
and the optimal polynomial $\psi_1(R_{1,k})$ takes at $x \in [0, \alpha_{1,k}]$ the value
$$[\psi_1(R_{1,k})](x) = 2\Big[1 - T_{k+1}\Big(1 - \frac{2x}{\alpha_{1,k}}\Big)\Big], \qquad (29)$$
where $T_k$ denotes the Chebyshev polynomial of the first kind and degree $k$, which verifies $T_k(x) = \cos(k \arccos x)$ for $x \in [-1, 1]$.
Proof. Let $\alpha_{1,k}$ be defined by (28) and let $\varphi$ be the function defined at $x \in [0, \alpha_{1,k}]$ by the right-hand side of (29). The fact that $\varphi \equiv \psi_1(R_{1,k})$ will result from the following observations:
• $\varphi \in \psi_1(\mathcal{P}_{k-1})$. Indeed, $\varphi \in \mathcal{P}_{k+1}$ and $\varphi(0) = 2[1 - T_{k+1}(1)] = 0$. On the other hand, the above formula of $T_k$ shows that $T_k'(1) = k^2$, so that $\varphi'(0) = 4T_{k+1}'(1)/\alpha_{1,k} = 1$, which indicates that the coefficient of $x$ in $\varphi$ is the one of $Q_1$.
• The formula of $T_k$ clearly shows that $\varphi(x) \in [0, 4]$ for $x \in [0, \alpha_{1,k}]$. On the other hand, $\varphi(\alpha_{1,k}) = 2[1 + (-1)^k]$ and $\varphi'(\alpha_{1,k}) = 4T_{k+1}'(-1)/\alpha_{1,k} = (-1)^k$, so that $\varphi$ gets out of $[0, 4]$ at $x = \alpha_{1,k}$.
• The formula of $T_k$ shows that
$$\varphi(\tau) = 0 \quad\text{when}\quad \tau = 2(k+1)^2\Big(1 - \cos\frac{2j\pi}{k+1}\Big), \quad 0 < 2j < k+1,$$
$$\varphi(\tau) = 4 \quad\text{when}\quad \tau = 2(k+1)^2\Big(1 - \cos\frac{(2j+1)\pi}{k+1}\Big), \quad 0 < 2j+1 < k+1,$$
in which $j \in \mathbb{N}$. Therefore, $\varphi$ has $k$ tangent points in $]0, \alpha_{1,k}[$, at which $\varphi$ takes alternately the values 4 and 0.
Using the last observation and the fact that $\varphi(\alpha_{1,k}) = 2[1 + (-1)^k]$ ($= 0$ if $k$ is odd and $= 4$ if $k$ is even), we show that $\varphi$ satisfies the sufficient optimality conditions (Theorem 5). Hence $\varphi = \psi_1(R_{1,k})$.
□
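Corollary 3 can be checked numerically with NumPy's `Chebyshev` class. The sketch below (the function name `optimal_m1` is ours) builds (29) in the monomial basis by composing $T_{k+1}$ with $x \mapsto 1 - 2x/\alpha_{1,k}$. It then checks three things: the constant and linear coefficients are those of $Q_1$ (namely 0 and 1), the polynomial stays in $[0, 4]$ on $[0, \alpha_{1,k}]$, and $\varphi(\alpha_{1,k}) = 2[1 + (-1)^k]$.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Polynomial


def optimal_m1(k):
    """alpha_{1,k} = 4 (k+1)^2 and phi(x) = 2 [1 - T_{k+1}(1 - 2x / alpha_{1,k})], see (28)-(29)."""
    alpha = 4.0 * (k + 1) ** 2
    t = Chebyshev.basis(k + 1).convert(kind=Polynomial)     # T_{k+1} in the monomial basis
    phi = 2.0 * (1.0 - t(Polynomial([1.0, -2.0 / alpha])))   # composition with x -> 1 - 2x/alpha
    return alpha, phi


for k in range(6):
    alpha, phi = optimal_m1(k)
    values = phi(np.linspace(0.0, alpha, 2001))
    print(k, alpha, phi.coef[:2], values.min(), values.max(), phi(alpha))
```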
Remark 4. A natural question is whether the number of tangent points of an optimal polynomial $\psi_m(R_{m,k})$ can be greater than $k$. The answer actually depends on the coefficients of $x^0, \dots, x^m$, which are fixed in the optimization process. We do not know the answer when these coefficients are those of the polynomial $Q_m$, but for other coefficients the number of tangent points can be greater than $k$. The argument is the following. Let $[\psi_{m-1}(R_{m-1,2})](x) = Q_{m-1}(x) + x^m(r_0 + r_1 x)$ be the optimal polynomial with $m-1$ fixed and two free coefficients ($k = 2$). By Theorem 4, it has at least two tangent points. Now, consider the function $\tilde\psi_m$ obtained by replacing, in $\psi_m$ defined by (22), $Q_m$ by the polynomial $x \mapsto Q_{m-1}(x) + r_0 x^m$. Clearly, the optimal polynomial associated with $\tilde\psi_m$ on $\mathcal{P}_0$ is $\tilde\psi_m(\tilde R_{m,1})$, where $\tilde R_{m,1}$ is the constant $r_1$. Therefore, $\tilde\psi_m(\tilde R_{m,1}) = \psi_{m-1}(R_{m-1,2})$ has two tangent points, although the maximization has been done on $\mathcal{P}_0$.
□
Remark 5. When checking optimality by looking at the alternate character of $[\psi_m(R)](\tau_j)$ in $\{0, 4\}$, one has to include the point $\tau_0 = \alpha_m(R)$. In particular, when $k = 1$, a polynomial with a single tangent point may not be optimal. An example with $m = 4$ and $k = 1$ is shown in Figure 2. The optimal polynomial, given by
$$[\psi_4(R_{4,1})](x) = x - \frac{x^2}{12} + \frac{x^3}{360} - \frac{x^4}{20160} + r x^5 \quad \text{with } r \simeq 4.28 \times 10^{-7},$$
is represented by the solid curve; the dashed curve is $Q_4$. The optimal polynomial $\psi_4(R_{4,1})$ has only one tangent point, $\tau_1 \simeq 33.39$, while $\tau_0 = \alpha_{4,1} \simeq 44.03$. As predicted by Theorem 4, $[\psi_4(R_{4,1})](\tau_1) + [\psi_4(R_{4,1})](\tau_0) = 4$. Now, by increasing $r$ to $r \simeq 5.13 \times 10^{-7}$, one gets the dash-dotted curve, which has a tangent point at $\tau_1 \simeq 9.88$ but is not optimal, since the values of this polynomial $p$ at its points $\tau_1$ and $\tau_0$ do not satisfy $p(\tau_1) + p(\tau_0) = 4$ (for this polynomial, $\tau_0 \simeq 34.22$).
□
Fig. 2. Checking the sufficient condition of optimality for $m = 4$ and $k = 1$
4 Computational Issues
4.1 Algorithm Based on the Parametrization by the Tangent Points
In the numerical results discussed below, the optimal polynomial is sought through its $k$ alternate tangent points $(\tau_j)_{1 \le j \le k}$, with $\tau_1 > \tau_2 > \cdots > \tau_k$, whose existence is ensured by Theorem 4. These points are determined in the following manner. For $\tau = (\tau_1, \dots, \tau_k)$, let $R(\tau)$ be the polynomial in $\mathcal{P}_{k-1}$ satisfying
$$\big([\psi_m(R(\tau))](\tau_j)\big)_{1 \le j \le k} = v \in \mathbb{R}^k,$$
in which the components of $v$ take alternately the values 0 and 4. Whether one has to impose $v_1 = 0$ or $v_1 = 4$ is further discussed below. The coefficients $r = (r_0, \dots, r_{k-1})^T$ of $R(\tau)$ are uniquely determined by the equation above, which can also be written
$$\begin{pmatrix} \tau_1^{m+1} & \cdots & \tau_1^{m+k} \\ \vdots & & \vdots \\ \tau_k^{m+1} & \cdots & \tau_k^{m+k} \end{pmatrix} r = v - \begin{pmatrix} [\psi_m(0)](\tau_1) \\ \vdots \\ [\psi_m(0)](\tau_k) \end{pmatrix}. \qquad (30)$$
Next, let us introduce the function $F : \tau \in \mathbb{R}^k \mapsto F(\tau) \in \mathbb{R}^k$, where the components of $F(\tau)$ are the derivatives of the polynomial $\psi_m(R(\tau))$ at the $\tau_j$'s:
$$F(\tau) = \begin{pmatrix} [\psi_m(R(\tau))]'(\tau_1) \\ \vdots \\ [\psi_m(R(\tau))]'(\tau_k) \end{pmatrix}.$$
Obviously, F(τ) = 0 if τ is the vector of the alternating tangent points of the optimal polynomial. We propose to determine the root(s) τ of F by Newton's method (see [Deu04, BGLS06], for instance). The procedure could have been improved by using a version of Newton's method that handles inequality constraints (see, for example, [Kan01, BM05] and the references therein). Such a version would impose τ1 > τ2 > · · · > τk as well as the curvature of the solution polynomial at the tangent points: [ψm(R(τ))]″(τj)(2 − vj) ≥ 0, for 1 ≤ j ≤ k. We have not adopted this additional sophistication, however.
The Newton method requires the computation of F′(τ). Denote by rl(τ), 1 ≤ l ≤ k, the coefficients of R(τ), by δij the Kronecker symbol, and by Vk(τ) the Vandermonde matrix of order k, whose (i, l) entry is τi^(l−1). Then there holds

$$\begin{aligned} \frac{\partial F_i}{\partial \tau_j}(\tau) &= \delta_{ij}\,[\psi_m(R(\tau))]''(\tau_i) + \sum_{l=1}^{k} \frac{\partial r_l}{\partial \tau_j}(\tau)\,(m+l)\,\tau_i^{m+l-1} \\ &= \delta_{ij}\,[\psi_m(R(\tau))]''(\tau_i) + \big[\mathrm{Diag}(\tau_1^m,\dots,\tau_k^m)\,V_k(\tau)\,\mathrm{Diag}(m+1,\dots,m+k)\,r'(\tau)\big]_{ij}. \end{aligned}$$

To get an expression for r′(τ), let us differentiate the identity [ψm(R(τ))](τi) = vi with respect to τj. This yields

$$\delta_{ij}\,[\psi_m(R(\tau))]'(\tau_i) + \big(\tau_i^{m+1}\ \cdots\ \tau_i^{m+k}\big)\,\frac{\partial r}{\partial \tau_j}(\tau) = 0.$$

Denoting by M(τ) the coefficient matrix of the linear system (30), we get

$$r'(\tau) = -M(\tau)^{-1}\,\mathrm{Diag}\big([\psi_m(R(\tau))]'(\tau_1),\dots,[\psi_m(R(\tau))]'(\tau_k)\big) = -M(\tau)^{-1}\,\mathrm{Diag}(F(\tau)).$$

Therefore,

$$F'(\tau) = \mathrm{Diag}\big([\psi_m(R(\tau))]''(\tau_1),\dots,[\psi_m(R(\tau))]''(\tau_k)\big) - \mathrm{Diag}(\tau_1^m,\dots,\tau_k^m)\,V_k(\tau)\,\mathrm{Diag}(m+1,\dots,m+k)\,M(\tau)^{-1}\,\mathrm{Diag}(F(\tau)).$$
Observe that at a solution τ ∗ the second term above vanishes, so that F ′ (τ ∗ ) is
diagonal. It is also nonsingular if the second derivatives [ψm (R(τ ∗ ))]′′ (τj∗ ) are
nonzero. Around such a solution, Newton’s method is, therefore, well defined.
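The Newton iteration on F, using the Jacobian F′(τ) given above, fits in a few lines. The sketch below is our own Python/NumPy illustration, not the authors' Matlab code (they used fsolve). It repeats the assumed form of Qm and the hypothetical helper solve_R, adds a crude grid scan to read off αm(R), and does not safeguard the inequality constraints.

```python
import math

import numpy as np
from numpy.polynomial import Polynomial


def q_coeffs(m):
    """Assumed Q_m: Taylor polynomial of 2 - 2 cos(sqrt(x)) up to x^m."""
    return [0.0] + [2.0 * (-1) ** (l - 1) / math.factorial(2 * l) for l in range(1, m + 1)]


def psi(m, r):
    """psi_m(R)(x) = Q_m(x) + x^(m+1) R(x), R having coefficients r_1, ..., r_k."""
    return Polynomial(np.concatenate([q_coeffs(m), np.asarray(r, dtype=float)]))


def solve_R(m, tau, v):
    """System (30); also returns the matrix M(tau)."""
    k = tau.size
    M = tau[:, None] ** (m + np.arange(1, k + 1))[None, :]
    r = np.linalg.solve(M, np.asarray(v, dtype=float) - psi(m, np.zeros(0))(tau))
    return r, M


def newton_tangent_points(m, tau0, v, tol=1e-11, max_iter=50):
    """Newton's method on F(tau) = ([psi_m(R(tau))]'(tau_j))_j with the Jacobian F'(tau)."""
    tau = np.array(tau0, dtype=float)
    k = tau.size
    for _ in range(max_iter):
        r, M = solve_R(m, tau, v)
        p = psi(m, r)
        F = p.deriv()(tau)
        V = np.vander(tau, k, increasing=True)          # V[i, l] = tau_i^l
        J = (np.diag(p.deriv(2)(tau))
             - np.diag(tau ** m) @ V @ np.diag(m + np.arange(1.0, k + 1))
             @ np.linalg.solve(M, np.diag(F)))
        step = np.linalg.solve(J, F)
        tau -= step
        if np.max(np.abs(step)) < tol * (1.0 + np.max(np.abs(tau))):
            break
    return tau, solve_R(m, tau, v)[0]


def alpha(p, x_max, n=600_001, tol=1e-9):
    """First grid point where p leaves [0, 4] (up to tol): a crude estimate of alpha_m(R)."""
    x = np.linspace(0.0, x_max, n)
    y = p(x)
    out = np.nonzero((y < -tol) | (y > 4.0 + tol))[0]
    return x[out[0]] if out.size else math.inf


# m = 4, k = 1, started near the tangent point of Remark 5, with eps_v = 0 and v_1 = 0.
tau, r = newton_tangent_points(4, [33.0], [0.0])
print(tau, r, alpha(psi(4, r), 60.0))   # compare with Remark 5 and alpha_{4,1} in Table 1
```

The grid scan is deliberately simple. Its resolution (1e-4 here) bounds the accuracy of the reported α.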
In the numerical results presented below, we have used the nonlinear equation solver fsolve of Matlab (version 7.2), which does not take the inequality constraints into account. The vector v has been determined by the following heuristic. We have assumed that the optimal polynomial is negative for all x < 0 (it has unit slope at x = 0). This implies that rk, the coefficient of $x^{m+k}$ in the optimal polynomial, has the sign $(-1)^{m+k+1}$. If this assumption is correct, the optimal polynomial should leave the interval [0, 4] through y = 0 if m + k is even and through y = 4 if m + k is odd. According to Theorem 4, one should therefore take v1 = 4 − εv if m + k is even and v1 = εv if m + k is odd. The value of εv is taken nonnegative and as close as possible to 0; a positive value of εv is usually necessary to counterbalance rounding errors. The other values vi alternate in {εv, 4 − εv}. The initial point τ is chosen by trial and error, or according to the suggestions made in the discussion below.
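The heuristic for v reduces to a parity rule. Below is a minimal Python transcription, assuming the tangent points are ordered as τ1 > τ2 > · · · > τk. The function name is our own.

```python
def target_values(m, k, eps_v=0.0):
    """Alternating values v_1, ..., v_k in {eps_v, 4 - eps_v} for tau_1 > ... > tau_k."""
    # Assuming r_k has the sign (-1)^(m+k+1): for m + k even the polynomial leaves
    # [0, 4] through y = 0, so its largest tangent point touches y = 4 (and conversely).
    top = (m + k) % 2 == 0
    v = []
    for _ in range(k):
        v.append(4.0 - eps_v if top else eps_v)
        top = not top
    return v


print(target_values(4, 1), target_values(1, 2, eps_v=1e-3))
```

For m = 4 and k = 1 this gives v1 = εv, which matches the tangency at y = 0 observed in Remark 5.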
The proposed approach has the following advantages (+) and disadvantages (−):
+ The problem has few variables (just k).
+ The problem appears well conditioned, provided the second derivatives at the tangent points are reasonable, which seems to be the case.
− There is no guarantee that the solution found is the optimal one. A zero of F is not a solution to the original problem if the polynomial leaves [0, 4] at a point τ0 smaller than τ1; an example of this situation is given in Figure 3. However, if τ0 > τ1 and [ψm(R)](τ0) + [ψm(R)](τ1) = 4, the sufficient optimality conditions of Theorem 5 guarantee that R is the solution.
− The solution polynomial may leave the interval [0, 4] near a tangent point because of the limited precision of the solution, which has motivated the use of a small εv > 0.
− Convergence to a zero of F depends on the initialization of the iterative process: the iteration may instead stop at a mere stationary point τ* of ‖F‖₂², that is, a point satisfying F′(τ*)ᵀF(τ*) = 0.
[Figure 3: curves plotted for 0 ≤ x ≤ 50, vertical axis from −0.5 to 4.5.]
Fig. 3. A zero of F that is not an optimal polynomial (m = 3, k = 1).
4.2 Numerical Results
Computing αm,k
Table 1 shows the computed values of αm,k for 1 ≤ m ≤ 8 and 0 ≤ k ≤ 8.
The computed solutions always satisfied the optimality conditions, so we are fairly confident in the values of αm,k in the table. In particular, the small εv > 0 hardly modifies these values.
The column k = 0 of Table 1 corresponds to the polynomials Qm defined by
(11), for which the first values of the αm,0 ’s were already given in (13) (there
denoted αm ). We observe that the convergence of α2m+1,0 (resp. α2m,0 ) to
π 2 ≃ 9.87 (resp. 4π 2 ≃ 39.48), predicted by Theorem 3, is rather fast. On the
other hand, we observe that the values αm,k can be made spectacularly larger
than αm,0 , which was our objective.
We have verified that the optimal polynomials corresponding to m = 1
are, indeed, related to the Chebyshev polynomials through formula (29), as
claimed by Corollary 3. This fact can be observed in the first row of the table,
whose values of α1,k are, indeed, those given by (28).
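The first row of Table 1 coincides with 4(k+1)². Formula (29) is not reproduced in this section, so the Python sketch below checks only one plausible candidate, which we assume for illustration: ψ(x) = 2(1 − T_{k+1}(1 − 2x/α)) with α = 4(k+1)², where T_n is the Chebyshev polynomial of the first kind. The check confirms that this candidate vanishes at 0, has unit slope there, and stays in [0, 4] up to α.

```python
import numpy as np
from numpy.polynomial import Chebyshev

TABLE1_FIRST_ROW = [4.00, 16.00, 36.00, 64.00, 100.00, 144.00, 196.00, 256.00, 324.00]


def chebyshev_candidate(k):
    """Assumed candidate for m = 1: psi(x) = 2 (1 - T_{k+1}(1 - 2x/alpha)), alpha = 4 (k+1)^2."""
    alpha = 4.0 * (k + 1) ** 2
    T = Chebyshev.basis(k + 1)

    def psi(x):
        return 2.0 * (1.0 - T(1.0 - 2.0 * np.asarray(x, dtype=float) / alpha))

    slope0 = 4.0 * T.deriv()(1.0) / alpha   # chain rule, using T'_{k+1}(1) = (k+1)^2
    return alpha, psi, slope0


for k, tabulated in enumerate(TABLE1_FIRST_ROW):
    alpha, psi, slope0 = chebyshev_candidate(k)
    y = psi(np.linspace(0.0, alpha, 10_001))
    print(k, alpha, tabulated, float(slope0), float(y.min()), float(y.max()))
```

Unit slope at the origin forces α = 4(k+1)², because T′_{k+1}(1) = (k+1)². This is consistent with the tabulated row.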
Another observation is that the oscillating behaviour of αm with m, highlighted in the analysis leading to Theorem 3, is recovered in the sequences {αm,k}m≥1. The reason is similar: the first positive stationary point of the optimal polynomial, which is close to that of Q∞, is (resp. is not) a tangent point when m is odd (resp. even). This observation leads to the following conjecture. Denote by τm,k,j the jth tangent point of the optimal polynomial ψm(Rm,k) (1 ≤ j ≤ k). Then, as m goes to infinity, τ2m+1,k,j (resp. τ2m,k,j) converges to the jth (resp. (j+1)th) positive stationary point of Q∞, the polynomial defined by (14). More specifically,

$$\tau_{2m+1,k,j} \to j^2\pi^2 \quad\text{and}\quad \tau_{2m,k,j} \to (j+1)^2\pi^2, \qquad \text{when } m \to \infty. \tag{31}$$

In practice, these values can be used to choose a good starting point for the algorithm when m is large.
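Turning (31) into starting points for the Newton iteration takes one line per parity. In the Python sketch below we assume that j counts tangent points outward from the origin. The list is therefore reversed, to match the decreasing order τ1 > · · · > τk used in Section 4.1. The function name is hypothetical.

```python
import math


def conjectured_tangent_points(m, k):
    """Limits (31) of tau_{m,k,j}, j = 1, ..., k, to be used as starting points for large m.

    Assumption: j counts the tangent points from the origin, so the list is returned
    in decreasing order to match the convention tau_1 > ... > tau_k of Section 4.1."""
    shift = 0 if m % 2 == 1 else 1          # odd m: j^2 pi^2, even m: (j+1)^2 pi^2
    pts = [((j + shift) * math.pi) ** 2 for j in range(1, k + 1)]
    return pts[::-1]


print(conjectured_tangent_points(7, 2), conjectured_tangent_points(8, 2))
```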
Table 1. Computed values of the first αm,k's

         k=0     k=1     k=2     k=3     k=4     k=5     k=6     k=7     k=8
m=1     4.00   16.00   36.00   64.00  100.00  144.00  196.00  256.00  324.00
m=2    12.00   32.43   60.56   96.61  140.64  192.66  252.67  320.68  396.69
m=3     7.57   23.40   45.72   75.06  111.58  155.38  206.51  265.04  331.00
m=4    21.48   44.03   73.45  110.01  153.83  204.98  263.51  329.49  402.92
m=5     9.53   31.61   58.23   90.77  129.90  175.84  228.71  288.59  355.23
m=6    30.72   57.23   89.78  128.89  174.84  227.71  287.61  354.59  428.71
m=7     9.85   37.37   68.93  108.35  151.08  199.56  255.61  317.90  357.95
m=8    37.08   70.89  107.67  150.35  199.32  254.89  317.22  386.35  462.27
Diagonal schemes k = m
We found it worthwhile to look more closely at the case k = m, for two reasons. First, its computational effort per time step is twice that of the original (2m)th-order scheme, which corresponds to k = 0. The second reason is more intuitive: if one wants αm,k to be roughly proportional to m², one has to control the first m maxima or minima of the optimal polynomial ψm(Rm,k). We think this requires m parameters, which corresponds to k = m. Below, we call such a scheme diagonal.
Figure 4 shows the optimal polynomials ψm(Rm,m), for m = 1, . . . , 8. The tangent points are marked by circles on the graphs, while the αm,m's are marked by dots.
Table 2 investigates the asymptotic behaviour of the diagonal schemes:
1. Its first column highlights the growth of the ratio between the maximum time step allowed by the stability analysis in a diagonal scheme, ∆tm,m, and in the second order scheme, ∆t1,0. According to Section 2.2, there holds

$$\frac{\Delta t_{m,m}}{\Delta t_{1,0}} = \left(\frac{\alpha_{m,m}}{\alpha_{1,0}}\right)^{1/2} = \frac{\alpha_{m,m}^{1/2}}{2}. \tag{32}$$

2. The computational cost Cm,m(T) of the diagonal scheme of order 2m over an integration time T is proportional to the computational cost C¹m,m of one time step multiplied by the number of time steps. Hence, assuming that the largest time step allowed by the stability analysis is taken, one has

$$C_{m,m}(T) \simeq \frac{T}{\Delta t_{m,m}}\, C^1_{m,m}.$$

A similar expression holds for the computational cost C1,0(T) of the second order scheme, with C¹m,m and ∆tm,m replaced by C¹1,0 and ∆t1,0, respectively. The second column of Table 2 gives the ratio of these two costs. Each time step of the diagonal scheme requires 2m times more operator multiplications than each time step of the second order scheme, so C¹m,m ≃ 2m C¹1,0. Using this and (32), the ratio can be estimated by

$$\frac{C_{m,m}(T)}{C_{1,0}(T)} \simeq \frac{4m}{\alpha_{m,m}^{1/2}}.$$

The numbers in the second column of Table 2 suggest that this ratio is bounded. If the conjecture (33) below is correct, it should converge to 4√2/π ≃ 1.80 when m goes to infinity.
3. Taking k = m and j = ⌈m/2⌉ in (31), and assuming that αm,m ∼ 2τm,m,⌈m/2⌉ (as suggested by the approximate symmetry of the optimal polynomials), leads us to the following conjecture:

$$\frac{\alpha_{m,m}}{m^2} \to \frac{\pi^2}{2}, \qquad \text{when } m \to \infty. \tag{33}$$

This conjecture is explored numerically in the third column of Table 2. Note that it does not distinguish between even and odd values of m, at least asymptotically. However, judging from the αm,m's on the diagonal of Table 1, the even values of k = m appear more interesting than the odd ones.

[Figure 4: eight panels, one per m = 1, . . . , 8, with vertical axes from −0.5 to 4.5 and panel labels α1,1 = 16, α2,2 = 60.56, α3,3 = 75.06, α4,4 = 153.8, α5,5 = 175.84, α6,6 = 287.61, α7,7 = 317.90, α8,8 = 462.27.]
Fig. 4. The polynomials Qm = ψm(0) (dashed curves) and the optimal polynomials ψm(Rm,m) for m = 1, . . . , 8 (solid curves)

Table 2. Asymptotic behaviour of the diagonal schemes

 m    ∆tm,m/∆t1,0    Cm,m(T)/C1,0(T)    2αm,m/(m²π²)
 1        2.00             1.00              3.24
 2        3.89             1.03              3.07
 3        4.33             1.39              1.69
 4        6.20             1.29              1.95
 5        6.63             1.51              1.43
 6        8.48             1.42              1.62
 7        8.91             1.57              1.31
 8       10.75             1.49              1.46
 ∞         –               1.80              1.00
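The three numerical columns of Table 2 follow from the diagonal of Table 1 through three formulas: (32), the cost estimate 4m/αm,m^(1/2), and the normalisation of (33). The short Python sketch below recomputes them. It is only a consistency check of the arithmetic, printing to two decimals like the table.

```python
import math

# Diagonal of Table 1: alpha_{m,m}, m = 1, ..., 8.
ALPHA_DIAG = [16.00, 60.56, 75.06, 153.83, 175.84, 287.61, 317.90, 462.27]


def diagonal_scheme_ratios(m, alpha_mm):
    """Columns of Table 2 for the diagonal scheme of order 2m."""
    dt_ratio = math.sqrt(alpha_mm) / 2.0              # (32), using alpha_{1,0} = 4
    cost_ratio = 4.0 * m / math.sqrt(alpha_mm)        # C_{m,m}(T) / C_{1,0}(T)
    conj_ratio = 2.0 * alpha_mm / (m * math.pi) ** 2  # tends to 1 if (33) holds
    return dt_ratio, cost_ratio, conj_ratio


for m, a in enumerate(ALPHA_DIAG, start=1):
    print(m, " ".join(f"{x:5.2f}" for x in diagonal_scheme_ratios(m, a)))
print("cost ratio limit under (33):", f"{4 * math.sqrt(2) / math.pi:.2f}")
```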
5 Conclusion
In this paper, we have analyzed the stability of higher order time discretization
schemes for second order hyperbolic problems based on the modified equation
approach. We have in particular proven that the upper bound for the time
step (the CFL limit) remains uniformly bounded for large m (2m is the order of the scheme). On the basis of this information, we have proposed the
construction of new schemes that are seen as modifications of the previous
ones and are designed in order to optimize the CFL condition: this is formulated as an optimization problem in a space of polynomials of given degree.
Despite some unpleasant properties (the objective function is non-convex and
even discontinuous at the solution!), this problem can be fully analyzed. In
particular, we prove the existence and uniqueness of the solution and give necessary and sufficient conditions of optimality. These conditions are exploited
to design an algorithm for the effective numerical solution of the optimization
problem. The results obtained more than meet our original objective. They suggest conjectures which, if true, would mean that we are able to produce schemes of arbitrarily high order in time whose computational cost is almost independent of the order.
Of course, this is preliminary work and much remains to be done, including the following items:
• The effective efficiency of the new schemes should be tested on realistic wave propagation problems.
• The impact of the modification of the initial schemes (the ones based on the modified equation technique) on the effective accuracy should be analyzed through numerical dispersion studies; we are only guaranteed that the order of approximation is preserved.
• Our various theoretical conjectures should be addressed in a rigorous way.
These will be the subjects of forthcoming works.
References
[AJT00]
L. Anné, P. Joly, and Q. H. Tran. Construction and analysis of higher
order finite difference schemes for the 1D wave equation. Comput.
Geosci., 4(3):207–249, 2000.
[AKM74]
R. M. Alford, K. R. Kelly, and D. M. Boore. Accuracy of finite difference modeling of the acoustic wave equation. Geophysics, 39:834–842, 1974.
[BGLS06]
J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal.
Numerical Optimization – Theoretical and Practical Aspects. Universitext. Springer Verlag, Berlin, 2nd edition, 2006.
[BM05]
S. Bellavia and B. Morini. An interior global method for nonlinear
systems with simple bounds. Optim. Methods Softw., 20(4–5):453–
474, 2005.
[CdLBL97]
R. Carpentier, A. de La Bourdonnaye, and B. Larrouturou. On
the derivation of the modified equation for the analysis of linear
numerical methods. RAIRO Modél. Math. Anal. Numér., 31(4):459–
470, 1997.
[CF05]
G. Cohen and S. Fauqueux. Mixed spectral finite elements for the
linear elasticity system in unbounded domains. SIAM J. Sci. Comput., 26(3):864–884 (electronic), 2005.
[Che66]
E. W. Cheney. Introduction to Approximation Theory. McGraw-Hill,
1966.
[CJ96]
G. Cohen and P. Joly. Construction and analysis of fourth-order finite difference schemes for the acoustic wave equation in nonhomogeneous media. SIAM J. Numer. Anal., 33(4):1266–1302, 1996.
[CJKMVV99]
M. J. S. Chin-Joe-Kong, W. A. Mulder, and M. Van Veldhuizen.
Higher-order triangular and tetrahedral finite elements with mass
lumping for solving the wave equation. J. Engrg. Math., 35(4):405–
426, 1999.
[CJRT01]
G. Cohen, P. Joly, J. E. Roberts, and N. Tordjman. Higher order
triangular finite elements with mass lumping for the wave equation.
SIAM J. Numer. Anal., 38(6):2047–2078 (electronic), 2001.
[Coh02]
G. C. Cohen. Higher-order numerical methods for transient wave
equations. Scientific Computation. Springer-Verlag, Berlin, 2002.
[Dab86]
M. A. Dablain. The application of high order differencing for the
scalar wave equation. Geophysics, 51:54–56, 1986.
[Deu04]
P. Deuflhard. Newton Methods for Nonlinear Problems – Affine Invariance and Adaptive Algorithms. Number 35 in Computational Mathematics. Springer, Berlin, 2004.
[DPJ06]
S. Del Pino and H. Jourdren. Arbitrary high-order schemes for the linear advection and wave equations: application to hydrodynamics and aeroacoustics. C. R. Math. Acad. Sci. Paris, 342(6):441–446, 2006.
[FLLP05]
L. Fezoui, S. Lanteri, S. Lohrengel, and S. Piperno. Convergence and stability of a discontinuous Galerkin time-domain method for the 3D heterogeneous Maxwell equations on unstructured meshes. M2AN Math. Model. Numer. Anal., 39(6):1149–1176, 2005.
[HW96]
E. Hairer and G. Wanner. Solving ordinary differential equations. II, volume 14 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 2nd edition, 1996. Stiff and differential-algebraic problems.
[HW02]
J. S. Hesthaven and T. Warburton. Nodal high-order methods on unstructured grids. I. Time-domain solution of Maxwell's equations. J. Comput. Phys., 181(1):186–221, 2002.
[Jol03]
P. Joly. Variational methods for time-dependent wave propagation problems. In Topics in computational wave propagation, volume 31 of Lect. Notes Comput. Sci. Eng., pages 201–264. Springer, Berlin, 2003.
[Kan01]
Ch. Kanzow. An active set-type Newton method for constrained nonlinear systems. In M.C. Ferris, O.L. Mangasarian, and J.S. Pang, editors, Complementarity: applications, algorithms and extensions, pages 179–200, Dordrecht, 2001. Kluwer Acad. Publ.
[LT86]
P. Lascaux and R. Théodor. Analyse Numérique Matricielle Appliquée à l'Art de l'Ingénieur. Masson, Paris, 1986.
[PFC05]
S. Pernet, X. Ferrieres, and G. Cohen. High spatial order finite element method to solve Maxwell's equations in time domain. IEEE Trans. Antennas and Propagation, 53(9):2889–2899, 2005.
[RM67]
R. D. Richtmyer and K. W. Morton. Difference methods for initial-value problems, volume 4 of Interscience Tracts in Pure and Applied Mathematics. John Wiley & Sons, Inc., New York, 2nd edition, 1967.
[RS78]
M. Reed and B. Simon. Methods of modern mathematical physics. IV. Analysis of operators. Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1978.
[SB87]
G. R. Shubin and J. B. Bell. A modified equation approach to constructing fourth-order methods for acoustic wave propagation. SIAM J. Sci. Statist. Comput., 8(2):135–151, 1987.
[Sch91]
L. Schwartz. Analyse I – Théorie des Ensembles et Topologie. Hermann, Paris, 1991.
[TT05]
E. F. Toro and V. A. Titarev. ADER schemes for scalar non-linear hyperbolic conservation laws with source terms in three-space dimensions. J. Comput. Phys., 202(1):196–215, 2005.
[Wei06]
E. W. Weisstein. Chebyshev polynomial of the first kind. MathWorld. http://mathworld.wolfram.com/ChebyshevPolynomialoftheFirstKind.html, 2006.
Comparison of Two Explicit Time Domain
Unstructured Mesh Algorithms
for Computational Electromagnetics
Igor Sazonov, Oubay Hassan, Ken Morgan, and Nigel P. Weatherill
Civil and Computational Engineering Centre, School of Engineering, University of
Wales, Swansea SA2 8PP, Wales, UK
{i.sazonov,O.Hassan,K.Morgan,N.P.Weatherill}@swansea.ac.uk
Summary. An explicit finite element time domain method and a co-volume approach, based upon a generalization of the well-known finite difference time domain
scheme of Yee to unstructured meshes, are employed for the solution of Maxwell’s
curl equations in the time domain. A stitching method is employed to produce
meshes that are suitable for use with a co-volume algorithm. Examples, involving
EM wave propagation and scattering, are included and the numerical performance
of the two techniques is compared.
Key words: computational electromagnetics, Delaunay triangulation, Voronoı̈
tessellation, co-volume mesh generation, explicit schemes, finite element method,
co-volume method, EM wave propagation and scattering
1 Introduction
Computational methods are widely employed for the solution of Maxwell’s
equations in a variety of different application areas that fall within the general
field of electromagnetics. For practical applications, the requirement of modelling complex geometries means that unstructured mesh methods are particularly attractive, as fully automatic unstructured mesh generation procedures
are now widely available [Geo91, WH94, PPM99]. Following this philosophy
requires the identification of a suitable unstructured mesh-based solution algorithm and several low-order time domain procedures have been proposed
[MSH91, PLD92, CFS93, DL97, MWH+ 99]. These methods are readily implemented, but may require a significant computational resource to undertake accurate simulations involving wave propagation over a large number of
wavelengths [DBB99]. On the other hand, the Yee scheme [Yee66] is a co-volume solution technique, on a structured Cartesian mesh, that exhibits a
high degree of computational efficiency, in terms of both CPU and memory
requirements.
To provide a practically useful computational procedure, it is natural to attempt to develop hybrid solution procedures, employing an unstructured mesh
method in the vicinity of a complex geometry and the co-volume method elsewhere [RBT97, MM98, RB00, EL02, EHM+ 03]. An alternative approach is to
employ an unstructured mesh everywhere and to attempt to use an unstructured mesh implementation of the co-volume scheme [Mad95, GL93]. A basic
requirement for the successful implementation of the co-volume scheme is the
existence of two, high quality, mutually orthogonal meshes. For an unstructured mesh implementation, the obvious dual mesh choice is the Delaunay–
Voronoı̈ diagram. Despite the fact that real progress has been achieved in
unstructured mesh generation methods over the last two decades, co-volume
schemes have not generally proved to be effective for simulations involving
domains of complex shape [NW98]. This is due to the difficulties encountered
when attempting to generate sufficiently smooth, high quality dual meshes for
such problems. Standard mesh generation methods are designed to create high
quality Delaunay triangulations, but do not attempt to provide a high quality
dual Voronoı̈ mesh. A stitching method was recently proposed [SWH+ 06] for
the generation of meshes for the co-volume scheme in two dimensions. In this
approach, the problem of triangulation of a domain of complicated shape is
split into a set of relatively simple problems of local triangulation. Each local
mesh is constructed with properties which are close to those of an ideal mesh
and the local triangulations are combined, to form a consistent mesh, by using
a stitching algorithm. The quality of the stitched mesh is improved by the use
of standard mesh quality enhancement methods.
In this paper, we will utilise the meshes produced by the stitching method
to compare the efficiency and the accuracy of a co-volume scheme on unstructured meshes and an explicit linear finite element procedure for Maxwell’s
curl equations [MHP94, MHP96, MHPW00]. The layout of the paper is as
follows: Section 2 describes the governing equations. A brief description of the
finite element time domain algorithm is given in Section 3, while the implementation of the co-volume scheme on unstructured meshes is described in
Section 4. Section 5 provides a brief description of the approach used for the
generation of the required meshes. In Section 6, a study of the accuracy and
the efficiency of both algorithms is presented for wave propagation and wave
scattering examples. Finally, conclusions are drawn in Section 7.
2 Governing Equations
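The equations governing the propagation of electromagnetic waves through a free space region may be considered in the dimensionless integral form

$$\frac{\partial}{\partial t}\int_\Omega \mathbf{E}\cdot d\boldsymbol{\Omega} = \oint_\Gamma \mathbf{H}\cdot d\boldsymbol{\Gamma}, \qquad \frac{\partial}{\partial t}\int_\Omega \mathbf{H}\cdot d\boldsymbol{\Omega} = -\oint_\Gamma \mathbf{E}\cdot d\boldsymbol{\Gamma} \tag{1}$$

for an arbitrary surface Ω bounded by a closed contour Γ, or in the corresponding differential form

$$\frac{\partial \mathbf{H}}{\partial t} = -\nabla \times \mathbf{E}, \qquad \frac{\partial \mathbf{E}}{\partial t} = \nabla \times \mathbf{H}. \tag{2}$$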
Here, E and H denote the electric and magnetic field intensity respectively,
dΩ denotes an element of surface area, in the direction normal to the surface,
and dΓ is an element of contour length, in the tangent direction to the contour.
Consideration will be restricted to the solution of two-dimensional problems,
involving TE polarized waves. In this case, relative to a Cartesian x, y, z coordinate system, the field intensity vectors E = (Ex , Ey , 0) and H = (0, 0, Hz )
are functions of t, x and y only.
The scattering simulations that will be undertaken will involve the interaction between a known incident field, generated by a source located in the
far field, and a scatterer, surrounded by free space. It will be assumed that the
scatterer is a perfect electrical conductor (PEC) and that the incident field is
a plane single frequency wave. For such simulations, it is convenient to split
the total electric and magnetic fields as
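$$\mathbf{E} = \mathbf{E}_{\mathrm{inc}} + \mathbf{E}_{\mathrm{scat}}, \qquad \mathbf{H} = \mathbf{H}_{\mathrm{inc}} + \mathbf{H}_{\mathrm{scat}}, \tag{3}$$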
where the subscripts inc and scat refer to the incident and scattered wave components respectively. The problem is then formulated in terms of the scattered
fields. The boundary condition at the surface of the scatterer is the requirement that the tangential component of the total electric field should be zero.
The infinite solution domain must be truncated to enable a numerical simulation and the condition that must be imposed at the truncated far field
boundary is that the scattered field should only consist of outgoing waves.
This requirement is imposed by surrounding the computational domain with
an artificial perfectly matched layer (PML) [Ber94, BP97].
3 A Finite Element Method
An explicit finite element time domain (FETD) method, for implementation
on a general unstructured mesh of triangles, can be developed by initially
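writing equations (2) in the form

$$\frac{\partial \mathbf{U}}{\partial t} = -\frac{\partial \mathbf{F}_k}{\partial x_k} = -\mathbf{A}_k\,\frac{\partial \mathbf{U}}{\partial x_k}, \tag{4}$$

where k takes the values 1 and 2 and the summation convention is employed. Here x1 = x, x2 = y, Fk = AkU and

$$\mathbf{U} = \begin{bmatrix} H_z \\ E_x \\ E_y \end{bmatrix}, \qquad \mathbf{A}_k = \begin{bmatrix} 0 & -(k-1) & (2-k) \\ -(k-1) & 0 & 0 \\ (2-k) & 0 & 0 \end{bmatrix}. \tag{5}$$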
This equation is discretised using the explicit TG2 algorithm [DH03]. In this
method, the solution is advanced over a time step, ∆t, in a two-stage process.
In the first stage, the solution is advanced from time level tn to time level tn+1/2 = tn + ∆t/2 using the forward difference approximation

$$\mathbf{U}^{(n+1/2)} = \mathbf{U}^{(n)} - \frac{\Delta t}{2}\,\mathbf{A}_k \left(\frac{\partial \mathbf{U}}{\partial x_k}\right)^{(n)}. \tag{6}$$

Here, the superscript (n) denotes an evaluation at time t = tn. In the second stage, the solution at time level tn+1 = tn + ∆t is obtained from the central difference approximation

$$\mathbf{U}^{(n+1)} = \mathbf{U}^{(n)} - \Delta t\,\mathbf{A}_k \left(\frac{\partial \mathbf{U}}{\partial x_k}\right)^{(n+1/2)}. \tag{7}$$
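Apart from the spatial discretisation, the two-stage update (6) and (7) is a predictor at tn+1/2 followed by a midpoint corrector. The Python sketch below applies it to a generic linear semi-discrete system dU/dt = LU. The matrix L is our own stand-in for −Ak ∂/∂xk after some spatial discretisation; this is not the FETD implementation of the paper. On a 2 × 2 oscillator with exact solution (cos t, −sin t), halving ∆t should divide the error at t = 1 by about four, which is consistent with second-order accuracy in time.

```python
import numpy as np


def two_stage_step(U, L, dt):
    """One step of the two-stage update (6)-(7) for dU/dt = L U."""
    U_half = U + 0.5 * dt * (L @ U)     # stage 1: forward difference to t_{n+1/2}
    return U + dt * (L @ U_half)        # stage 2: central difference about t_{n+1/2}


def error_at(T, n_steps):
    L = np.array([[0.0, 1.0], [-1.0, 0.0]])   # toy operator: a harmonic oscillator
    U = np.array([1.0, 0.0])
    dt = T / n_steps
    for _ in range(n_steps):
        U = two_stage_step(U, L, dt)
    exact = np.array([np.cos(T), -np.sin(T)])
    return np.linalg.norm(U - exact)


e1, e2 = error_at(1.0, 100), error_at(1.0, 200)
print(e1, e2, e1 / e2)
```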
At time t = tn, a continuous piecewise linear approximation, on element e, may be expressed as

$$\mathbf{U}_e^{(n)} = N_{(J)}\,\mathbf{U}_{(J)}^{(n)}, \tag{8}$$

where N(J) is the piecewise linear shape function associated with node J of the mesh, U(J) represent nodal values and the implied summation extends over each node J of element e. A variational formulation [ZM06] of equation (6) is employed to obtain the solution at time level t = tn+1/2. To obtain the solution at the end of the time step, at each node I, the weak variational formulation [ZM06]

$$M_{(IJ)}\,\mathbf{U}_{(J)}^{(n+1)} = M_{(IJ)}\,\mathbf{U}_{(J)}^{(n)} + \Delta t \left[\sum_{e\in\Omega} \int_{\Omega_e} \mathbf{A}^k\,\mathbf{U}_e^{(n+1/2)}\,\frac{\partial N_{(I)}}{\partial x_k}\, d\Omega - \int_\Gamma \tilde{\mathbf{F}}_n\, N_{(I)}\, d\Gamma\right] \tag{9}$$

of equation (7) is employed over the computational domain, Ω. In equation (9), Γ denotes the boundary of the region Ω, F̃n is a normal boundary flux and M(IJ) is the standard consistent mass matrix for the mesh of linear triangular elements in Ω. Equation (9) is solved by explicit iteration, and the resulting algorithm is stable provided that a CFL condition of the form

$$\Delta t \le C \min_e h_e \tag{10}$$

is satisfied, where he denotes the minimum height of element e and C is a safety factor.
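Condition (10) needs only the smallest altitude of each triangle, he = 2 × (area of e)/(longest edge of e). The minimal NumPy sketch below computes it. C is left to the user because the paper does not give its value. The function name and the two-triangle mesh are illustrative.

```python
import numpy as np


def fetd_time_step(nodes, triangles, C):
    """Largest dt allowed by (10), dt <= C * min_e h_e, with h_e the smallest altitude of e."""
    p = nodes[triangles]                         # shape (n_elements, 3, 2)
    v1, v2 = p[:, 1] - p[:, 0], p[:, 2] - p[:, 0]
    area = 0.5 * np.abs(v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0])
    edges = np.stack([p[:, 1] - p[:, 0], p[:, 2] - p[:, 1], p[:, 0] - p[:, 2]], axis=1)
    longest = np.linalg.norm(edges, axis=2).max(axis=1)
    h = 2.0 * area / longest                     # smallest altitude = 2 * area / longest edge
    return C * h.min(), h


nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2], [0.0, -1.0]])
triangles = np.array([[0, 1, 2], [0, 3, 1]])   # one equilateral, one right isosceles triangle
dt, h = fetd_time_step(nodes, triangles, C=0.5)
print(h, dt)
```

The badly shaped element controls the bound. Here the right isosceles triangle (he = 1/√2) is more restrictive than the equilateral one (he = √3/2).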
For scattering simulations, the boundary condition at the surface of the
PEC scatterer is weakly imposed through the Galerkin statement. The truncated far field boundary is taken to be rectangular in shape and a structured
grid of triangular elements is used to discretise the PML region.
4 A Co-Volume Method
For the co-volume method, the governing equations are considered in the
integral, time domain form of the equation (1) and the discretisation is accomplished using two mutually orthogonal meshes [Mad95, GL93]. For this
purpose, we choose to employ the Delaunay–Voronoı̈ dual diagram, with the
integrals taken over the edges of the Delaunay and Voronoı̈ cells. To illustrate
the process, consider a triangular element m of the Delaunay mesh. This element will share an edge with Nm elements, with numbers mi , 1 ≤ i ≤ Nm ,
where Nm = 3, unless the element has an edge representing the boundary of
the domain. Suppose the Delaunay edge mmi is the common edge between
elements m and mi and let the length of this edge be denoted by ℓmmi . Similarly, suppose that the Voronoı̈ edge mmi is the line segment connecting the
circumcentres of element m and element mi . The length of this Voronoı̈ edge
will be denoted by hmmi . As basic unknowns in the solution algorithm, we
consider the value of the z-component of the magnetic field at the Voronoı̈
vertices, and denote this by Hm , and the projection of the electric field at
the midpoint of the Delaunay edge mmi , in the direction of the edge, and
denote this by Emmi . In this case, the laws of Ampère and Faraday can be
approximated, using central differencing, as
$$H_m^{(n+1/2)} = H_m^{(n-1/2)} - \frac{\Delta t}{S_m} \sum_{i=1}^{N_m} E_{mm_i}^{(n)}\,\ell_{mm_i}, \tag{11}$$

$$E_{mm_i}^{(n+1)} = E_{mm_i}^{(n)} + \frac{\Delta t}{h_{mm_i}} \left(H_m^{(n+1/2)} - H_{m_i}^{(n+1/2)}\right), \tag{12}$$

where Sm is the area of element m. This is a staggered explicit scheme, where the time step size for a stable implementation may be determined from the requirement [TH00]

$$\Delta t < C \min\{\ell_{\min}, h_{\min}\}. \tag{13}$$
Here ℓmin and hmin are the minimum Delaunay and Voronoı̈ edge lengths
respectively and C is a safety factor. This implies the use of meshes which do
not include either very short Delaunay, or very short Voronoı̈, edges. However,
Voronoı̈ edge lengths may vanish completely, on a general unstructured mesh,
when two adjacent triangles have a common circumcentre. When this happens,
the simple remedy is to merge these two triangles to form a single quadrilateral
element. The discrete formulae of the equations (11) and (12) may be applied
directly to this quadrilateral, with appropriate redefinition of Nm . Moreover,
the same merging procedure can be adopted when more than two triangles
share a common circumcentre and the discrete equations applied again to the
polygonal cell that is created by merging the triangles in this manner. This
merging process is illustrated in Figure 1. If the mesh contains short non-zero
Voronoı̈ sides, the merging process may still be carried out, to overcome the
severe restriction on the time step. However, this will reduce the accuracy
of the scheme, due to the slight local non-orthogonality introduced by the
merging.
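The staggered update (11)-(12) can be written with one array of cell values Hm and one array of edge values E. The Python sketch below is a toy illustration, not the authors' code. Each edge e joins cells a[e] and b[e] and stores E with the orientation seen from a[e]; under this assumed convention, E enters the circulation of a[e] with a plus sign and that of b[e] with a minus sign. On a closed ring of cells without boundary, (11) leaves the weighted sum Σ Sm Hm unchanged, which gives a cheap consistency check.

```python
import numpy as np


def covolume_step(H, E, a, b, ell, h, S, dt):
    """Advance H from n-1/2 to n+1/2 with (11), then E from n to n+1 with (12)."""
    circulation = np.zeros_like(H)
    np.add.at(circulation, a, E * ell)     # assumed: E_{m m_i} = +E[e] when m = a[e]
    np.add.at(circulation, b, -E * ell)    # and E_{m m_i} = -E[e] when m = b[e]
    H = H - dt / S * circulation           # (11)
    E = E + dt / h * (H[a] - H[b])         # (12): H_m - H_{m_i} with m = a[e]
    return H, E


# Toy closed ring of 8 unit cells, edge e joining cells e and e+1 (mod 8).
n = 8
a = np.arange(n)
b = (a + 1) % n
ell, h, S = np.ones(n), np.ones(n), np.ones(n)
H = np.zeros(n)
H[0] = 1.0
E = np.zeros(n)
total0 = np.sum(S * H)
for _ in range(1000):
    H, E = covolume_step(H, E, a, b, ell, h, S, dt=0.5)
print(np.sum(S * H) - total0, np.abs(H).max())
```

On this ring the pair of updates reduces to a discrete wave equation, so the iterates stay bounded for dt < 1. Flipping the sign in (12) should make them grow without bound, which gives a quick way to catch an orientation mistake.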
The boundary condition on the tangential component of the electric field
can be directly imposed at the surface of the PEC. The far field boundary
condition is again approximated by the addition of an artificial PML, with the
external boundary of the truncated domain taken to be rectangular in shape.
Fig. 1. An example of a Delaunay–Voronoı̈ dual diagram showing two mutually
orthogonal meshes suitable for use with a co-volume solution scheme. The dotted
lines indicate Voronoı̈ edges and the dots represent Voronoı̈ vertices. Quadrilateral
and pentagonal elements, formed by the merging of triangles, are indicated by bold
lines.
5 Mesh Generation
With algorithms of the form considered here, wave propagation problems are
normally simulated on a mesh, which is as uniform as possible, with a prescribed element size δ which is related to the wavelength. For two-dimensional
simulations, in the absence of boundaries, the ideal mesh for the co-volume
method is simply a mesh of equilateral triangles, with the Delaunay edge
length ℓ = δ. In this case, the Voronoı̈ elements are perfect hexagons, with
edge length h = δ/√3 ≈ 0.577 δ. This ideal mesh has the highest quality
but, for general scattering simulations, it almost certainly will not be able to
represent the geometry of the scatterer. To overcome this problem, a method
based on stitching the ideal mesh to a near-boundary unstructured mesh has
been developed [SWH+ 06]. In the vicinity of each boundary, a body fitted
local mesh is constructed, with the properties close to those of the ideal mesh.
Near-boundary elements are generated by a modified form of the advancing
front method. The ideal mesh is employed, away from boundaries, in the major portion of the domain. An additional temporary layer of near-boundary
elements is generated to assist the process of connecting the near-boundary
mesh to the ideal mesh. The new nodes of this extra layer are marked as
potential nodes for connection. For each of these potential nodes, the closest
node in the ideal mesh is identified. Joining, consecutively, these identified
nodes of the ideal mesh, we obtain a closed polygon, or set of polygons. The
gap between the near-boundary elements and the ideal mesh is triangulated using the Delaunay method. Here, points of the ideal mesh which lie
in the gap will also be used during the triangulation. Standard mesh enhancement procedures, such as edge swapping and Laplace smoothing, are used at
the end to improve the quality of the generated elements.
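The value h = δ/√3 is the distance between the circumcentres of two equilateral triangles that share an edge of length δ. The small Python check below confirms this; the helper names are our own. It also shows that on this ideal mesh the bound (13) is governed by the Voronoı̈ edge, since min{ℓ, h} = δ/√3.

```python
import math


def circumcentre(p, q, r):
    """Circumcentre of the triangle (p, q, r) in the plane."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return ux, uy


def ideal_mesh_edges(delta):
    """Delaunay edge length and dual Voronoi edge length for two adjacent equilateral triangles."""
    A, B = (0.0, 0.0), (delta, 0.0)
    C, D = (delta / 2, delta * math.sqrt(3) / 2), (delta / 2, -delta * math.sqrt(3) / 2)
    c1, c2 = circumcentre(A, B, C), circumcentre(A, D, B)
    return math.dist(A, B), math.dist(c1, c2)


ell, h = ideal_mesh_edges(1.0)
print(ell, h, h / ell, min(ell, h))  # min(ell, h) is the length entering (13)
```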
6 Numerical Examples
A number of examples will be presented which enable a comparison to be
made between the accuracy and the performance of the FETD approach and
the co-volume algorithm on unstructured meshes.
6.1 Narrow Waveguide
The first example involves the simulation of the propagation, in the positive
x-direction, of a plane harmonic TE wave, of wavelength λ, in a narrow rectangular waveguide. The waveguide occupies the region 0 ≤ x ≤ 200λ and its
width, 0.4λ, is small enough to avoid the generation of any wave normal to the
direction of propagation. Two unstructured meshes, with spacing δ ≈ λ/15
and δ ≈ λ/30, are generated using the stitching method. The majority of the
elements are almost equilateral triangles which exhibit all the desired mesh
quality properties [ZM06]. To enable a comparison with the results produced
by the traditional Yee scheme, two structured triangular grids are generated,
using the vertex spacings δ = λ/15 and δ = λ/30. On these meshes, the co-volume scheme of equations (11) and (12) reduces to the classical Yee
scheme. Figure 2 shows the structured mesh with δ = λ/15 and the unstructured mesh with δ ≈ λ/15. The solution is advanced for 170 cycles, using the
maximum allowable time step. For each case considered, the computed distribution of the magnetic field, between x = 139λ and x = 141λ, is compared
with the exact distribution in Figure 3. It can be seen that the Yee scheme on
the structured grid and the co-volume scheme on the unstructured grid maintain the amplitude of the propagating wave, while the FETD scheme fails
to maintain the amplitude. It can also be observed that the phase velocity
is under-predicted by both the Yee and the co-volume schemes and is over-predicted by the FETD scheme. However, the phase velocity obtained on the
unstructured meshes with the co-volume scheme is more accurate than the
phase velocity obtained using the traditional Yee scheme on the structured
Fig. 2. Details of the meshes employed for the propagation of a plane harmonic TE
wave in a waveguide: (a) the structured mesh with δ = λ/15; (b) the unstructured
mesh with δ ≈ λ/15.
[Figure 3 plots, panels (a) and (b): Hz versus x/λ from 139 to 141, comparing the exact solution with the Yee, FETD and co-volume results.]
Fig. 3. Propagation of a plane harmonic TE wave in a waveguide: magnetic field
after 170 cycles at a distance x ≈ 140λ, using (a) δ ≈ λ/15, (b) δ ≈ λ/30.
mesh. Table 1 compares the computational performance of the algorithms, in
terms of the required number of steps per cycle (spc), the CPU time needed
(time), the computed phase velocity (C) and the maximum amplitude (A) of
the magnetic field in the range 0 ≤ x ≤ 160λ. This table also enables computation of the speed-up factor, between the co-volume method and FETD,
which is achieved on both meshes. The effect of dispersion error on the phase
velocity, as a function of time step, is shown in Figure 4. A theoretical phase
velocity of one was specified for the present computation. This figure shows
the computed phase velocity, for various values of the time step, on the unstructured meshes using the co-volume scheme and the FETD scheme and, on
the structured meshes, using the Yee scheme, compared to the theoretically
expected Yee values [TH00]. The phase velocity obtained with the co-volume
method is considerably more accurate than that expected from the structured
grid implementation.
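The theoretically expected Yee values come from the numerical dispersion relation of the Yee scheme [TH00]. Consider a wave travelling along a grid axis of a uniform square Yee grid. In that case the two-dimensional relation reduces to the one-dimensional form sin(ωΔt/2)/(cΔt) = sin(kΔx/2)/Δx, which gives the normalized phase velocity below. The Python sketch assumes this special case, with N = λ/Δx points per wavelength and Courant number S = cΔt/Δx. It is not the code used to produce Figure 4.

```python
import math


def yee_phase_velocity(points_per_wavelength, courant):
    """Normalized numerical phase velocity v_p/c of the Yee scheme for a wave
    travelling along a grid axis, from
        sin(omega*dt/2)/(c*dt) = sin(k*dx/2)/dx.
    points_per_wavelength: N = lambda/dx; courant: S = c*dt/dx.
    """
    n, s = points_per_wavelength, courant
    arg = math.sin(math.pi * s / n) / s
    if arg > 1.0:
        raise ValueError("grid too coarse: numerical wavenumber is complex")
    return math.pi / (n * math.asin(arg))


# 2D stability on a uniform square grid requires S <= 1/sqrt(2).
for n in (15, 30):
    for s in (0.25, 0.5, 1.0 / math.sqrt(2.0)):
        print(n, round(s, 3), yee_phase_velocity(n, s))
```

For S < 1 the computed velocity is below one, consistent with the under-prediction by the Yee scheme reported above. It approaches one as the number of points per wavelength increases.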
Table 1. Propagation of a plane harmonic TE wave in a waveguide.
                    δ ≈ λ/15                            δ ≈ λ/30
Scheme       spc   time, s   C         A         spc   time, s   C         A
Yee          21    7         0.99613   1.00      43    61        0.99896   1.00
Co-volume    46    29        0.99850   1.00      106   263       0.99964   1.00
FETD         44    3151      1.0015    0.723     89    23040     1.0008    0.96
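The speed-up factor mentioned in the text can be read off Table 1 as the ratio of the FETD CPU time to the co-volume CPU time. The Python sketch below uses only the tabulated values. It also reports the phase-velocity error |C − 1| of each scheme.

```python
# CPU times (s) and computed phase velocities C, copied from Table 1.
table1 = {
    "lambda/15": {"Yee": (7, 0.99613), "Co-volume": (29, 0.99850),
                  "FETD": (3151, 1.0015)},
    "lambda/30": {"Yee": (61, 0.99896), "Co-volume": (263, 0.99964),
                  "FETD": (23040, 1.0008)},
}


def speed_up(rows):
    """Ratio of FETD to co-volume CPU time on one mesh."""
    return rows["FETD"][0] / rows["Co-volume"][0]


for mesh, rows in table1.items():
    errors = {scheme: abs(c - 1.0) for scheme, (_, c) in rows.items()}
    print(f"{mesh}: speed-up {speed_up(rows):.0f}, |C - 1| = {errors}")
```

The ratios are about 109 on the δ ≈ λ/15 mesh and about 88 on the δ ≈ λ/30 mesh.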
Fig. 4. Propagation of a plane harmonic TE wave in a waveguide showing variation
of the computed phase velocity with ∆t/h (∆t/h_e for FETD). Solid symbols
and solid line: δ ≈ λ/15; open symbols and dotted line: δ ≈ λ/30. Here h is the
averaged Voronoï edge length and h_e is the averaged minimal triangle height.
6.2 Scattering by a Circular PEC Cylinder
The second example is the simulation of scattering of a plane single frequency
TE wave by a perfectly conducting circular cylinder of diameter λ. The objective is to use this example to illustrate the order of accuracy that can
be achieved with the co-volume solution technique and the FETD technique
on unstructured meshes. The problem is solved on a series of unstructured
meshes, with mesh spacings ranging from λ/8 to λ/128. The minimum distance from the rectangular PML to the cylinder is λ. When the spacing is
Fig. 5. Scattering of a plane TE wave by a circular PEC cylinder of diameter λ
showing (a) an unstructured mesh with δ ≈ λ/16, (b) the corresponding computed
total magnetic field.
[Figure 6 axes: scattering width (dB) versus viewing angle (degrees).]
Fig. 6. Scattering of a plane TE wave by a circular PEC cylinder of diameter
λ showing a comparison between the computed and analytical scattering width
distributions.
λ/16, the mesh employed, excluding the PML region, and the corresponding
distribution of the computed total magnetic field are shown in Figure 5. The
computed scattering width distributions are compared to the exact distribution in Figure 6. For each simulation undertaken, the error, ESW, in the solution is determined as the maximum difference, in absolute value, between the
computed and analytical scattering width distributions. The variation of this
computed error, with the number of elements per wavelength, λ/δ, for both
the FETD and co-volume schemes, is shown in Figure 7. It can be observed
that a convergence rate of around O(δ²) is obtained with both methods on
these unstructured meshes, indicating that second-order accuracy is achieved.
It is likely that the error in the FETD results is slightly less because the approach adopted for the evaluation of the scattering width integral requires an
interpolation, in the co-volume scheme, to obtain all the field components at
[Figure 7 axes: ESW (dB), logarithmic scale from 10⁻³ to 10⁰, versus λ/δ from 8 to 128.]
Fig. 7. Scattering of a plane TE wave by a circular PEC cylinder of diameter λ
showing the variation of the computed error, with the number of elements, λ/δ, per
wavelength, for the co-volume scheme and the FETD scheme on the unstructured
meshes.
Table 2. Scattering of a plane TE wave by a circular PEC cylinder of diameter λ.
              Co-volume                 FETD                    Speed-up ratio
λ/δ     spc    time    ESW        spc    time    ESW        FETD/Co-volume
8       21     0.15    0.744      31     1.2     0.750      8
16      42     0.5     0.275      61     15      0.102      30
32      83     4.0     0.060      122    117     0.026      30
64      165    37      0.019      242    922     0.007      25
128     239    250     0.006      485    7295    0.002      30
one location. The values of spc, time and ESW are shown in Table 2 for the
co-volume scheme and the FETD scheme on these unstructured meshes. It
can be observed that, except on the coarsest mesh, the co-volume scheme is 25 to
30 times faster than the FETD scheme for these simulations.
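The convergence rate quoted above can be checked roughly from the errors in Table 2. Like the text, the sketch takes the tabulated ESW values (in dB) at face value as an error measure. It estimates the observed order from successive mesh pairs and from a least-squares fit of log ESW against log(λ/δ). The helper names are hypothetical.

```python
import math

# Mesh density lambda/delta and error ESW, copied from Table 2.
ppw = [8, 16, 32, 64, 128]
esw = {
    "Co-volume": [0.744, 0.275, 0.060, 0.019, 0.006],
    "FETD": [0.750, 0.102, 0.026, 0.007, 0.002],
}


def pairwise_orders(n, err):
    """Observed order between successive meshes:
    log(e_i / e_{i+1}) / log(n_{i+1} / n_i)."""
    return [math.log(err[i] / err[i + 1]) / math.log(n[i + 1] / n[i])
            for i in range(len(err) - 1)]


def fitted_order(n, err):
    """Least-squares slope of log(err) against log(n), with the sign flipped."""
    xs = [math.log(v) for v in n]
    ys = [math.log(e) for e in err]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return -num / den


for scheme, err in esw.items():
    print(scheme, [round(p, 2) for p in pairwise_orders(ppw, err)],
          round(fitted_order(ppw, err), 2))
```

The fitted orders come out at about 1.8 for the co-volume scheme and 2.1 for FETD. This is consistent with the "around O(δ²)" behaviour reported for both methods.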
As a more challenging variation of this example, we also consider scattering
of a plane single frequency wave by a perfectly conducting circular cylinder
of diameter 15λ. The mesh employed is generated to meet a mesh spacing
requirement of δ = λ/15. Again, the minimum distance from the PML region
to the cylinder is λ. The solution is advanced for 50 cycles of the incident
wave and the computed and exact scattering width distributions are compared in Figure 8(a). Excellent agreement with the exact solution is observed
using both schemes. The distribution of the computed total magnetic field
in the complete domain, including the PML, is shown in Figure 8(b). For
this example, the co-volume scheme is nearly 34 times faster than the FETD
scheme.
[Figure 8(a) axes: scattering width (dB) versus viewing angle (degrees).]
Fig. 8. Scattering of a plane TE wave by a circular PEC cylinder of diameter
15λ showing (a) a comparison between the exact and computed scattering width
distributions, (b) computed contour distribution of the total magnetic field in the
complete computational domain.
6.3 Scattering by a Square PEC Cylinder
The next example involves scattering of a plane single frequency electromagnetic wave by a perfectly conducting cylinder of square cross section. The
sides of the square are of length λ. The objective is to use this example to
illustrate the accuracy of the FETD and co-volume schemes in the presence
of singularities. This simple geometry means that the computational domain
may be discretised using a structured mesh of square elements and, in this
case, the co-volume scheme of equations (11) and (12) reduces to the classical Yee scheme. The distribution of the scattering width obtained using the
Yee scheme on a fine Cartesian grid, with 512 elements per wavelength, is
taken as the benchmark solution. An unstructured mesh, termed mesh a, is
generated with mesh spacing λ/16. The solution is advanced on this mesh for 8
cycles using both the co-volume and the FETD schemes. In this case, the error
ESW is determined as the maximum difference, in absolute value, between the
computed and the benchmark scattering width distributions. Table 3 shows
the values of spc, time and ESW for this grid. It is apparent that the error
in the FETD scheme is an order of magnitude greater than the error in the
co-volume method. This is believed to be caused by the geometric singularities at the corners, where a scheme such as FETD requires higher mesh resolution.
Two further unstructured meshes are generated, by reducing the spacing by
a factor of 2 (termed mesh b) and 4 (termed mesh c), in the vicinity of the
corners. Details of the three meshes in the region of one of the corners are
shown in Figure 9. Figure 10 shows the variation in the computed error with
the near corner resolution that is employed. It can be seen that the error in
the FETD results decreases as the mesh is refined. It is also clear that the
Table 3. Simulation of scattering of a plane TE wave by a square PEC cylinder of
side length λ.
Mesh          FETD                      Co-volume                 Speed-up
resolution    spc    time, s   ESW      spc    time, s   ESW      ratio
a             61     18        2.64     45     0.4       0.21     45
b             90     27        1.66     88     0.8       0.25     34
c             182    58        0.38     164    1.3       0.14     44
Fig. 9. Details of the meshes employed for the simulation of scattering of a plane
TE wave by a square PEC cylinder of side length λ showing (a) mesh a, (b) mesh
b, (c) mesh c.
[Figure 10 axes: ESW (dB) versus near-corner resolution.]
Fig. 10. Simulation of scattering of a plane TE wave by a square PEC cylinder
of side length λ showing the variation in the computed error with the near corner
mesh resolution.
error in the FETD results on mesh c is similar to the error in the co-volume
results obtained on mesh a. The nearly constant error in the co-volume results confirms the belief that no special modification of the scheme is required in the
vicinity of geometrical singularities. Table 3 also displays information about
the calculations performed on meshes b and c. For this example, the co-volume
scheme is faster than FETD by a factor of between 34 and 45. This
level of variation in the speed-up factor is probably due to the difficulty of
measuring accurately the short CPU times required for the co-volume solution.
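The error measure ESW used in this and the previous example is the maximum absolute difference between two scattering width distributions. The minimal Python sketch below assumes both distributions are sampled in dB on increasing viewing-angle grids. It also assumes the benchmark is linearly interpolated onto the computed angles; the text does not say how the two sets of angles are matched. The sample data are hypothetical.

```python
import bisect


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of the samples (xs, ys) at x; xs must be
    strictly increasing. Values outside the range are clamped to the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def scattering_width_error(angles, sw, ref_angles, ref_sw):
    """ESW: maximum absolute difference (dB) between a computed scattering
    width distribution and a benchmark, over the computed viewing angles."""
    return max(abs(s - interpolate(a, ref_angles, ref_sw))
               for a, s in zip(angles, sw))


# Hypothetical sampled distributions (degrees, dB), for illustration only.
ref_angles, ref_sw = [0.0, 90.0, 180.0], [10.0, 0.0, 10.0]
angles, sw = [0.0, 45.0, 90.0, 135.0, 180.0], [10.5, 5.0, -0.25, 5.0, 10.0]
print(scattering_width_error(angles, sw, ref_angles, ref_sw))
```

Because ESW is a maximum norm, a single poorly resolved lobe sets the error. This is consistent with the error being dominated by the regions near the corners in this example.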
6.4 Scattering by a PEC NACA0012 Aerofoil
The next example involves the simulation of scattering of a plane single frequency wave, directed along the x-axis, by a perfectly conducting NACA0012
aerofoil of length λ. The aim of this example is to analyse the performance
of the numerical schemes when the geometry exhibits high curvature. A
benchmark solution is computed using an unstructured mesh with spacing
λ/120. The unstructured mesh is generated, outside the aerofoil, in the region −λ ≤ x, y ≤ λ. The scattering width distributions computed on this
mesh with the co-volume scheme and the FETD scheme proved to be identical. An unstructured mesh was generated to meet the spacing requirement
of λ/15. Another unstructured mesh, providing better representation of the
leading edge curvature, is generated by locally reducing the mesh spacing in
the vicinity of the leading edge of the aerofoil by a factor of 2. A view of both
these meshes is shown in Figure 11.
The computed scattering width distributions are compared with the benchmark distribution in Figure 12. It can be observed that, on the uniform mesh, the co-volume results
are more accurate than the FETD results, and that the accuracy of the FETD results improves with the local refinement in the leading edge region.
Table 4 shows the values of spc, time and ESW. The co-volume method is
approximately 30 times faster than FETD for this example.
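The difficulty at the leading edge can be quantified from the aerofoil geometry. The sketch below uses the standard NACA four-digit thickness distribution with thickness ratio t = 0.12. The text does not state how the aerofoil surface was defined, so this is an assumption. The sketch compares the leading-edge radius, 1.1019 t² in chord units, with the mesh spacings used here, taking the chord equal to λ. The function names are hypothetical.

```python
import math


def naca4_half_thickness(x, t=0.12, chord=1.0):
    """Half-thickness of a symmetric NACA four-digit section at chordwise
    position x (0 <= x <= chord), using the open-trailing-edge coefficients."""
    xc = x / chord
    return 5.0 * t * chord * (0.2969 * math.sqrt(xc) - 0.1260 * xc
                              - 0.3516 * xc ** 2 + 0.2843 * xc ** 3
                              - 0.1015 * xc ** 4)


def naca4_leading_edge_radius(t=0.12, chord=1.0):
    """Leading-edge radius of a NACA four-digit section."""
    return 1.1019 * t ** 2 * chord


wavelength = 1.0                       # aerofoil length equal to lambda
r_le = naca4_leading_edge_radius(chord=wavelength)
for label, spacing in (("uniform", wavelength / 15), ("refined", wavelength / 30)):
    print(f"{label}: spacing / leading-edge radius = {spacing / r_le:.1f}")
```

The leading-edge radius is about 0.016λ. The uniform spacing λ/15 is therefore roughly 4.2 times this radius, and even the locally refined spacing is roughly 2.1 times it. This illustrates why the leading-edge curvature is hard to represent on these meshes.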
6.5 Scattering by a PEC Cavity
The final example considers the simulation of scattering of a plane single
frequency wave by a U-shaped PEC cavity. The thickness of the cavity walls
is equal to 0.4λ, the internal cavity width is 2λ and the internal cavity length
is 8λ. In the simulation, the wave is incident upon the open end of the cavity
and propagates in a direction which lies at an angle θ = 30◦ to the main
axis of the cavity. An unstructured mesh is employed, with typical spacing
λ/15, in the region that lies within a distance of λ from the scatterer, as
Fig. 11. Details of the unstructured meshes employed for the simulation of scattering of a plane TE wave by a PEC NACA0012 aerofoil of length λ showing (a) the
uniform mesh, (b) the locally refined mesh.
[Figure 12 axes: scattering width (dB) versus viewing angle (degrees); curves: co-volume λ/120, co-volume λ/15 uniform, FETD λ/120, FETD λ/15 uniform, FETD λ/15 refined.]
Fig. 12. Simulation of scattering of a plane TE wave by a PEC NACA0012 aerofoil
of length λ showing a comparison between the computed and benchmark scattering
width distributions.
Table 4. Simulation of scattering of a plane TE wave by a PEC NACA0012 aerofoil
of length λ.
Mesh          FETD                      Co-volume                 Speed-up
resolution    spc    time, s   ESW      spc    time, s   ESW      ratio
Uniform       59     12        6.00     46     0.4       0.9      30
Refined       97     20        2.14     99     0.6       0.5      33
shown in Figure 13(a). The simulations are advanced for 150 cycles and the
typical distribution of the contours of the computed total magnetic field in
the domain, excluding the PML, is shown in Figure 13(b). A comparison of
the computed scattering width distributions is given in Figure 14. Also shown
on this figure is the scattering width distribution computed using a high order
finite element frequency domain (FEFD) simulation [LMHW02]. The number
of steps per cycle is 57 for the co-volume scheme and 59 for the FETD method
and, for this example, the co-volume scheme requires 31 seconds of CPU time,
while the FETD method requires 1980 seconds. This represents a speed-up by
a factor of about 64.
Fig. 13. Simulation of scattering of a plane TE wave by a PEC cavity showing (a)
the unstructured mesh employed, (b) the computed total magnetic field after 150
cycles.
[Figure 14 axes: scattering width (dB) versus viewing angle (degrees).]
Fig. 14. Simulation of scattering of a plane TE wave by a PEC cavity showing
a comparison of the scattering width distributions computed, after 150 cycles, by
FETD, the co-volume scheme and a FEFD method.
7 Conclusions
The numerical performance of an explicit unstructured mesh co-volume time
domain scheme and a standard finite element time domain method has been
compared for a number of electromagnetic wave propagation and scattering
examples. To ensure the efficiency of the co-volume approach, the smooth
Delaunay–Voronoï dual meshes are generated using a stitching method. The numerical examples considered show that
the co-volume method is 30–60 times faster than the finite element method
for two-dimensional scattering problems. In addition, the co-volume method
proved to be less sensitive to special geometric features, such as singularities
and regions of high curvature. It is anticipated that, for three-dimensional
problems, a speed-up factor of three orders of magnitude could be achieved
if the mesh generation method can be extended to provide high quality tetrahedral elements.
References
[Ber94] J.-P. Berenger. A perfectly matched layer for absorption of electromagnetic waves. J. Comput. Phys., 114:185–200, 1994.
[BP97] F. Bonnet and F. Poupaud. Berenger absorbing boundary condition with time finite-volume scheme for triangular meshes. Appl. Numer. Math., 25:333–354, 1997.
[CFS93] J. P. Cioni, L. Fezoui, and H. Steve. A parallel time-domain Maxwell solver using upwind schemes and triangular meshes. Impact Comput. Sci. Engrg., 5:215–247, 1993.
[DBB99] A. Deraemaeker, I. Babuška, and P. Bouillard. Dispersion and pollution of the FEM solution for the Helmholtz equation in one, two and three dimensions. Internat. J. Numer. Methods Engrg., 46:471–499, 1999.
[DH03] J. Donéa and A. Huerta. Finite element methods for flow problems. John Wiley & Sons, 2003.
[DL97] E. Darve and R. Löhner. Advanced structured-unstructured solver for electromagnetic scattering from multimaterial objects. AIAA Paper 97–0863, Washington, 1997.
[EHM+03] M. El hachemi, O. Hassan, K. Morgan, D. P. Rowse, and N. P. Weatherill. Hybrid methods for electromagnetic scattering simulations on overlapping grids. Comm. Numer. Methods Engrg., 19:749–760, 2003.
[EL02] F. Edelvik and G. Ledfelt. A comparison of time-domain hybrid solvers for complex scattering problems. Internat. J. Numer. Model.: Elect. Net. Dev. Fields, 15:475–487, 2002.
[Geo91] P. L. George. Automatic mesh generation. Applications to finite element methods. John Wiley & Sons, 1991.
[GL93] S. Gedney and F. Lansing. Full wave analysis of printed microstrip devices using a generalized Yee algorithm. In Proceedings of the IEEE Antennas and Propagation Society International Symposium, pages 1179–1182, Ann Arbor, 1993. Pennsylvania State University.
[LMHW02] P. D. Ledger, K. Morgan, O. Hassan, and N. P. Weatherill. Arbitrary order edge elements for electromagnetic scattering simulations using hybrid meshes and a PML. Internat. J. Numer. Methods Engrg., 55:339–358, 2002.
[Mad95] N. Madsen. Divergence preserving discrete surface integral methods for Maxwell’s equations using nonorthogonal unstructured grids. J. Comput. Phys., 119:35–45, 1995.
[MHP94] K. Morgan, O. Hassan, and J. Peraire. An unstructured grid algorithm for the solution of Maxwell’s equations in the time domain. Internat. J. Numer. Methods Fluids, 19:849–863, 1994.
[MHP96] K. Morgan, O. Hassan, and J. Peraire. A time domain unstructured grid approach to the simulation of electromagnetic scattering in piecewise homogeneous media. Comput. Methods Appl. Mech. Engrg., 134:17–36, 1996.
[MHPW00] K. Morgan, O. Hassan, N. E. Pegg, and N. P. Weatherill. The simulation of electromagnetic scattering in piecewise homogeneous media using unstructured grids. Comput. Mech., 25:438–447, 2000.
[MM98] A. Monorchio and R. A. Mittra. A hybrid finite-element/finite-difference (FE/FDTD) technique for solving complex electromagnetic problems. IEEE Microwave Guided Wave Lett., 8:93–95, 1998.
[MSH91] A. H. Mohammadian, V. Shankar, and W. F. Hall. Computation of electromagnetic scattering and radiation using a time-domain finite-volume discretization procedure. Comput. Phys. Comm., 68:175–196, 1991.
[MWH+99] K. Morgan, N. P. Weatherill, O. Hassan, P. J. Brookes, R. Said, and J. Jones. A parallel framework for multidisciplinary aerospace engineering simulations using unstructured meshes. Internat. J. Numer. Methods Fluids, 31:159–173, 1999.
[NW98] R. A. Nicolaides and Q.-Q. Wang. Convergence analysis of a co-volume scheme for Maxwell’s equations in three dimensions. Math. Comp., 67:947–963, 1998.
[PLD92] B. Petitjean, R. Löhner, and C. R. Devore. Finite element solvers for radar cross section RCS calculations. AIAA Paper 92–0455, Washington, 1992.
[PPM99] J. Peraire, J. Peiró, and K. Morgan. Advancing front grid generation. In J. F. Thompson, B. K. Soni, and N. P. Weatherill, editors, Handbook of Grid Generation, pages 17.1–17.22. CRC Press, 1999.
[RB00] T. Rylander and A. Bondeson. Stable FEM–FDTD hybrid method for Maxwell’s equations. Comput. Phys. Comm., 125:75–82, 2000.
[RBT97] W. Ruey-Beei and I. Tatsuo. Hybrid finite-difference time-domain modeling of curved surfaces using tetrahedral edge elements. IEEE Trans. Antennas and Propagation, 45:1302–1309, 1997.
[SWH+06] I. Sazonov, D. Wang, O. Hassan, K. Morgan, and N. P. Weatherill. A stitching method for the generation of unstructured meshes for use with co-volume solution techniques. Comput. Methods Appl. Mech. Engrg., 195:1826–1845, 2006.
[TH00] A. Taflove and S. C. Hagness. Computational electrodynamics: The finite-difference time domain method. Artech House, Boston, 2nd edition, 2000.
[WH94] N. P. Weatherill and O. Hassan. Efficient three-dimensional Delaunay triangulation with automatic point creation and imposed boundary constraints. Internat. J. Numer. Methods Engrg., 37:2005–2040, 1994.
[Yee66] K. S. Yee. Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media. IEEE Trans. Antennas and Propagation, 14:302–307, 1966.
[ZM06] O. C. Zienkiewicz and K. Morgan. Finite elements and approximation. Dover, 2006.
The von Neumann Triple Point Paradox
Richard Sanders¹∗ and Allen M. Tesdall²†
¹ Department of Mathematics, University of Houston, Houston, TX 77204, USA
sanders@math.uh.edu
² Fields Institute, Toronto, ON M5T 3J1, Canada and Department of
Mathematics, University of Houston, Houston, TX 77204, USA
atesdall@fields.utoronto.ca
Summary. We describe the problem of weak shock reflection off a wedge and discuss the triple point paradox that arises. When the shock is sufficiently weak and the
wedge is thin, Mach reflection appears to be observed, although von Neumann
showed in 1943 that it is impossible. We summarize some recent numerical
results for weak shock reflection problems for the unsteady transonic small disturbance equations, the nonlinear wave system, and the Euler equations. Rather than
finding a standard but mathematically inadmissible Mach reflection with a shock
triple point, the solutions contain a complex structure: there is a sequence of triple
points and supersonic patches in a tiny region behind the leading triple point, with
an expansion fan originating at each triple point. The sequence of patches may be
infinite, and we refer to this structure as Guderley Mach reflection. The presence
of the expansion fans at the triple points resolves the paradox. We describe some
recent experimental evidence which is consistent with these numerical findings.
Key words: self-similar solutions, two-dimensional Riemann problems, triple
point paradox
1 Introduction
Consider a planar normal shock in an inviscid, compressible, calorically
perfect gas impinging on a fixed wedge with apex half angle θw, see
Figure 1. Given an upstream state with density ρ = ρr, velocity u = v = 0
and pressure p = pr, one calculates that downstream of a fast (i.e., u + c)
shock
∗ Research supported by the National Science Foundation, Grant DMS 03-06307.
† Research supported by the National Science Foundation, Grant DMS 03-06307,
NSERC grant 312587-05, and the Fields Institute.
Fig. 1. A planar shock moving from left to right impinges on a wedge. After contact,
I indicates the incident shock and R indicates the reflected shock. On the right, the
dotted line S indicates a slip line and M is the Mach stem. Regular reflection is
depicted on the left. Irregular reflection is depicted on the right.
Fig. 2. A blow-up of the incident and reflected shock intersection. Regular reflection
is on the left and irregular on the right. The constant states upstream and downstream of the incident shock are denoted by Ur and Ul . Whether or not constant
states indicated by the question marks exist depends on the strength of I.
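\[
\frac{p_l}{p_r} = \frac{2\gamma}{\gamma+1}M^2 - \frac{\gamma-1}{\gamma+1}, \qquad
\frac{\rho_l}{\rho_r} = \frac{(\gamma+1)M^2}{2+(\gamma-1)M^2}, \qquad
\frac{u_l}{c_r} = \frac{2}{\gamma+1}\left(M - \frac{1}{M}\right), \tag{1}
\]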
where γ denotes the ratio of specific heats, and M > 1 denotes the shock
Mach number, defined as the Rankine–Hugoniot shock speed divided by the
upstream speed of sound cr = √(γpr/ρr). Following interaction, a number of
self-similar (with respect to the wedge apex) reflection patterns are possible,
depending on the values of M and θw.
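Relations (1) are straightforward to evaluate. The Python sketch below (the function name is hypothetical) computes the state behind a normal shock of Mach number M moving into gas at rest. It returns the pressure and density ratios and the normalized particle velocity. It also returns the sound-speed ratio c_l/c_r = √((p_l/p_r)/(ρ_l/ρ_r)), which follows from c² = γp/ρ.

```python
import math


def normal_shock_state(mach, gamma=1.4):
    """State behind a normal shock of Mach number mach > 1 entering gas at
    rest, from the Rankine-Hugoniot relations (1).

    Returns (p_l/p_r, rho_l/rho_r, u_l/c_r, c_l/c_r)."""
    if mach <= 1.0:
        raise ValueError("shock Mach number must exceed 1")
    m2 = mach * mach
    p_ratio = 2.0 * gamma / (gamma + 1.0) * m2 - (gamma - 1.0) / (gamma + 1.0)
    rho_ratio = (gamma + 1.0) * m2 / (2.0 + (gamma - 1.0) * m2)
    u_over_c = 2.0 / (gamma + 1.0) * (mach - 1.0 / mach)
    c_ratio = math.sqrt(p_ratio / rho_ratio)
    return p_ratio, rho_ratio, u_over_c, c_ratio


print(normal_shock_state(2.0))    # a strong shock
print(normal_shock_state(1.01))   # a weak shock, M slightly above 1
```

For M = 2 and γ = 1.4, the sketch gives p_l/p_r = 4.5, ρ_l/ρ_r = 8/3 and u_l/c_r = 1.25. For M close to 1, all three quantities approach their undisturbed values.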
This wedge reflection problem has a rich history, experimentally, analytically, and numerically. Probably the earliest and most significant analytical
result was found by von Neumann [Neu43]. This work first formulated
the equations that describe two and three planar shocks meeting at a point and
separated by constant states, see Figure 2. The two-shock theory leads to
what is known as regular reflection. The three-shock theory leads to Mach
reflection. For supersonic regular reflection, the state U immediately behind the
reflected shock R is supersonic and becomes subsonic across a sonic line downstream (toward the wedge’s apex). When the incident shock angle with respect
to the wall is increased to π/2 − θ∗(M), where θw = θ∗(M) is the critical
wedge half angle, state U becomes sonic. Therefore, at θw = θ∗(M), acoustic
signals generated downstream (e.g., from the wedge apex) can overtake the
R-I reflection point, conceivably causing transition from regular reflection,
depicted on the left of Figure 2, to irregular reflection, depicted on the right.
This is one of several criteria that have been suggested to explain transition
from regular to irregular reflection; see Henderson [Hen87] for a thorough and
detailed discussion.
Loosely speaking, a weak incident shock has M slightly larger than 1,
whereas a strong incident shock has M substantially larger than 1. Theoretical analysis indicates that transition to Mach reflection is impossible when the
incident shock is sufficiently weak. In fact, triple point solutions, as depicted in
Figure 2(b), do not exist for sufficiently weak shocks. However, experiments in
which weak shock waves are reflected off a wedge with θw ≪ θ∗(M) appear to
show a standard Mach reflection pattern. This apparent disagreement between
theory and experiment was discussed by von Neumann and has since become
known as the von Neumann triple point paradox [Neu63, Hen87, SA05].
As far back as 1947, Guderley [Gud47, Gud62] proposed that there is an
expansion fan and a supersonic region directly behind the triple point in a
steady weak shock Mach reflection. He demonstrated that one could construct local solutions consisting of three plane shocks, an expansion fan, and
a contact discontinuity or slip line meeting at a point. However, despite intensive experimental [BT49, STS92, Ste59] and numerical [CH90, BH92, TR94]
studies, no evidence of an expansion fan or supersonic patch was observed.
The first evidence supporting Guderley’s proposed resolution was contained
in numerical solutions of shock reflection problems for the unsteady transonic
small disturbance equations in [HB00] and the compressible Euler equations
in [VK99]. These solutions contain a tiny supersonic region embedded in the subsonic flow directly behind the triple point in a weak
shock Mach reflection. Subsequently, Zakharian et al. [ZBHW00] found a supersonic region in a numerical solution of a shock reflection problem for the
Euler equations, for a set of parameter values corresponding to those used in
the unsteady transonic small disturbance solution in [HB00]. The supersonic
region in the solutions in [VK99, HB00, ZBHW00] is extremely small, which
explains why it had not been observed earlier.
This paper is organized as follows. In Section 2 the unsteady transonic
small disturbance asymptotic model for a weak shock impinging on a thin
wedge is recalled. Numerical evidence is offered that suggests an interesting resolution of the von Neumann paradox, and experimental evidence supporting
the numerical findings is displayed at the end of that section. In Section 3
a simple 3 × 3 hyperbolic system is given which exhibits irregular reflection
but does not admit Mach reflection. It is solved numerically, displaying a
structure very similar to that found in Section 2. Finally, the full compressible Euler equations are solved in Section 4 for a very weak incident shock
impinging on a thin wedge. The numerical solution appears to agree
with what is found for the model problems of the previous sections.
2 The Weak Shock Thin Wedge Limit
The compressible Euler equations are given by
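\[
\begin{aligned}
&\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{u}) = 0,\\
&\frac{\partial (\rho\mathbf{u})}{\partial t} + \nabla\cdot(\rho\mathbf{u}\otimes\mathbf{u}) + \nabla p = 0,\\
&\frac{\partial (\rho e)}{\partial t} + \nabla\cdot\big((\rho e + p)\mathbf{u}\big) = 0,
\end{aligned}
\tag{2}
\]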
where ρ is the fluid density, u = (u, v) is the x-y velocity vector, p is the
pressure and e is the total energy per unit mass. The internal energy per unit
mass is ε = e − ½|u|², and we take p = (γ − 1)ρε for a calorically perfect gas
with constant ratio of specific heats γ > 1.
Consider an incident planar shock with Mach number M = 1 + ε² striking
a thin wedge with half angle θw = aε, where ε > 0 is destined to vanish. Take
the undisturbed upstream state Ur as ρ = ρr, u = v = 0 and p = pr, yielding
an upstream speed of sound cr = √(γpr/ρr). From (1), one calculates that the
state Ul behind the incident shock satisfies
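\[
\begin{aligned}
\frac{p_l}{p_r} &= 1 + \frac{4\gamma}{\gamma+1}\varepsilon^2 + O(\varepsilon^4), &\qquad
\frac{u_l}{c_r} &= \frac{4}{\gamma+1}\varepsilon^2 + O(\varepsilon^4),\\
\frac{\rho_l}{\rho_r} &= 1 + \frac{4}{\gamma+1}\varepsilon^2 + O(\varepsilon^4), &\qquad
\frac{v_l}{c_r} &= -\frac{4}{\gamma+1}\,a\varepsilon^3 + O(\varepsilon^5).
\end{aligned}
\tag{3}
\]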
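The leading-order terms in (3) can be checked against the exact relations (1). This Python sketch sets M = 1 + ε² and computes, for pressure, density and the velocity u_l from (1), the difference between the exact ratio and its two-term expansion, divided by ε⁴. A bounded result confirms that the remainder is O(ε⁴). For pressure, the ε⁴ coefficient is exactly 2γ/(γ + 1). The wedge-dependent component v_l is not checked here. The helper names are hypothetical.

```python
def exact_ratios(eps, gamma):
    """p_l/p_r, rho_l/rho_r and u_l/c_r from (1) with M = 1 + eps**2."""
    m = 1.0 + eps ** 2
    p = 2.0 * gamma / (gamma + 1.0) * m ** 2 - (gamma - 1.0) / (gamma + 1.0)
    rho = (gamma + 1.0) * m ** 2 / (2.0 + (gamma - 1.0) * m ** 2)
    u = 2.0 / (gamma + 1.0) * (m - 1.0 / m)
    return p, rho, u


def leading_order(eps, gamma):
    """Two-term expansions from (3), without the O(eps**4) remainders."""
    k = 4.0 / (gamma + 1.0)
    return 1.0 + gamma * k * eps ** 2, 1.0 + k * eps ** 2, k * eps ** 2


def scaled_remainders(eps, gamma=1.4):
    """(exact - expansion) / eps**4 for pressure, density and velocity."""
    exact = exact_ratios(eps, gamma)
    approx = leading_order(eps, gamma)
    return [(e - a) / eps ** 4 for e, a in zip(exact, approx)]


for eps in (0.2, 0.1, 0.05):
    print(eps, [round(r, 4) for r in scaled_remainders(eps)])
```

The scaled remainders stay of order one as ε decreases, as the O(ε⁴) terms in (3) require.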
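Hunter and Brio [HB00] observed the scales shown in (3) and proposed an
asymptotic model based on
\[
p = p_r(1 + \varepsilon^2 \hat p), \qquad u = c_r \varepsilon^2 \hat u, \qquad
\rho = \rho_r(1 + \varepsilon^2 \hat\rho), \qquad v = c_r \varepsilon^3 \hat v,
\]
and the stretched independent variables
\[
\hat x = \frac{x - p(t)}{\varepsilon^2}, \qquad \hat y = \frac{y}{\varepsilon},
\]
where p(t) is the location where the incident shock would (neglecting possible
interactions) strike the wedge wall at time t,
\[
p(t) = c_r \cos(\theta_w)(1+\varepsilon^2)\,t = c_r \cos(a\varepsilon)(1+\varepsilon^2)\,t
\approx c_r\left(1 + \left(1 - \tfrac{a^2}{2}\right)\varepsilon^2\right)t,
\]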
Fig. 3. A weak shock over a thin wedge. Ur and Ul are the states to the right and
left of the incident shock I. θw = aε ≪ 1 and the incident shock has Mach number
M = 1 + ε2 . x = p(t) is the location where I would intersect the wall at time t,
neglecting interaction.
see Figure 3. Inserting these into (2), equating like powers of ε, and making
an additional order-one change of variables (denoted by ǔ, etc.), they find that
ǔ and v̌ asymptotically satisfy
\[
\check u_t + \left(\tfrac{1}{2}\check u^2\right)_{\check x} + \check v_{\check y} = 0, \qquad
\check u_{\check y} - \check v_{\check x} = 0. \tag{4}
\]
This is, of course, the celebrated unsteady transonic small disturbance equation
(UTSDE). The UTSDE is solved on the upper half plane with a no-flow
boundary condition v̌(x̌, 0, t) = 0 along y̌ = 0 and initial data
\[
\big(\check u(\check x,\check y,0),\, \check v(\check x,\check y,0)\big) =
\begin{cases}
(0,0) & \text{if } \check x > \check a \check y,\\
(1,-\check a) & \text{if } \check x < \check a \check y,
\end{cases}
\]
where
\[
\check a = \frac{a\varepsilon}{\sqrt{2}\,\sqrt{(1+\varepsilon^2)-1}} = \frac{a}{\sqrt{2}}
\sim \frac{1}{\sqrt{2}}\,\frac{\theta_w}{\sqrt{M-1}}.
\]
The jump at x̌ = ǎy̌ corresponds to the incident shock I. The data are vorticity-free
but incompatible with the no-flow boundary condition behind the shock. As time
advances, the reflected wave pattern R emerges from the trailing boundary.
For ǎ in the range 0 < ǎ < √2, regular reflection for this initial-boundary
value problem is impossible [BH92]. Moreover, it is shown in [BH92], as well as
in [TR94], that (4) can never admit triple point solutions. This asymptotic model equation is therefore very well suited to investigating the von Neumann
triple point paradox.
A numerical solution to (4) was obtained in [HB00] for the value ǎ = 0.5
(a value for which regular reflection does not occur). An irregular reflection
pattern globally resembling single Mach reflection was observed. When the
region containing the apparent triple point was greatly refined, however, a
small supersonic patch was detected in the subsonic zone directly below the reflected
shock and behind the Mach stem, see [HB00, page 242]. This,
along with the contemporaneous work in [VK99], was the first indication that
Guderley’s resolution of the triple point paradox might be essentially correct.
In a subsequent study using a new numerical scheme, Tesdall and Hunter
[TH02] further investigated the structure of irregular reflection found in
the UTSDE asymptotic model.
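The similarity parameter ǎ combines the shock strength and the wedge angle into a single number. This minimal Python sketch (function names are hypothetical) evaluates ǎ = θw/√(2(M − 1)) and applies the [BH92] criterion quoted above. The parameters are chosen so that ǎ = 0.5, the value used in [HB00]. The criterion only rules regular reflection out; it does not predict which irregular pattern occurs.

```python
import math


def similarity_parameter(mach, theta_w):
    """a-check = theta_w / sqrt(2 (M - 1)) for a weak shock of Mach number
    mach > 1 and a thin wedge of half angle theta_w (radians)."""
    if mach <= 1.0:
        raise ValueError("shock Mach number must exceed 1")
    return theta_w / math.sqrt(2.0 * (mach - 1.0))


def regular_reflection_ruled_out(a_check):
    """True when 0 < a_check < sqrt(2), the range in which [BH92] shows that
    regular reflection is impossible for the UTSDE problem."""
    return 0.0 < a_check < math.sqrt(2.0)


eps = 0.1
mach, theta_w = 1.0 + eps ** 2, 0.5 * math.sqrt(2.0) * eps  # a = 0.5*sqrt(2)
a_check = similarity_parameter(mach, theta_w)
print(a_check, regular_reflection_ruled_out(a_check))
```
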
The supersonic patch detected in [VK99, HB00] appeared to confirm
Guderley’s four-wave solution. The patch indicates that it is plausible for
an expansion wave to be a (unobserved) part of the observed three shock confluence. We briefly summarize the numerical techniques employed by Tesdall
and Hunter. First, they used a parabolic grid aligned with the weak reflected
shock. They then solved the UTSDE in self-similar variables x̌ → x̌/t, y̌ → y̌/t.
The advantage of using self-similar coordinates is that the problem remains
fixed on the computational grid, and a steady self-similar solution is obtained
by letting a pseudo-time t → ∞. Following the classical Cole–Murman approach, (ǔ, v̌) is written as grad φ. The nonlinearities in the resulting scalar
equation are discretized by a min-mod limited Engquist–Osher numerical flux.
A steady state solution is obtained by lagged implicit time marching and grid
continuation.
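The two discretization ingredients named above, an Engquist–Osher flux and a min-mod limiter, can be illustrated on the scalar Burgers flux f(u) = u²/2. This is a minimal one-dimensional sketch under that assumption, not the two-dimensional potential formulation of [TH02]; the function names are hypothetical.

```python
import numpy as np

def minmod(a, b):
    """Min-mod limiter: the smaller-magnitude slope if a and b share a sign, else 0."""
    return np.where(a * b > 0.0, np.sign(a) * np.minimum(np.abs(a), np.abs(b)), 0.0)

def engquist_osher_burgers(ul, ur):
    """Engquist-Osher flux for f(u) = u**2/2 (sonic point at u = 0)."""
    return 0.5 * np.maximum(ul, 0.0) ** 2 + 0.5 * np.minimum(ur, 0.0) ** 2

def interface_fluxes(u):
    """Fluxes at interfaces i+1/2, i = 1..len(u)-3, from min-mod limited
    piecewise-linear reconstruction of the cell averages u."""
    du = np.diff(u)
    slope = minmod(du[:-1], du[1:])        # limited slope in cells 1..len(u)-2
    ul = u[1:-2] + 0.5 * slope[:-1]        # left state at i+1/2 (from cell i)
    ur = u[2:-1] - 0.5 * slope[1:]         # right state at i+1/2 (from cell i+1)
    return engquist_osher_burgers(ul, ur)

print(interface_fluxes(np.arange(6.0)))    # [1.125 3.125 6.125]
```

For monotone linear data the limiter keeps the full slope, so each interface state is the exact midpoint value and the flux reduces to f(u) there; at a transonic rarefaction (ul < 0 < ur) the Engquist–Osher flux returns f(0) = 0, which is what suppresses expansion shocks.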
We present results obtained by the method of Tesdall and Hunter in Figure 4. The full simulation is carried out on a spatial grid that fits in [−3, 2] × [0, 2.5], with the inverse slope parameter ǎ = 0.5. The total number of grid points employed is approximately 2.7 × 10⁶, where, by local grid refinement, the region depicted in Figure 4(a) spans 768 × 608 ≈ 4.7 × 10⁵ points. This yielded a grid size near the triple point of approximately 1.5 × 10⁻⁵.
Clear evidence of an expansion fan is seen at the triple point depicted in Figure 4. Equally remarkable is what appears to be a sequence of progressively smaller and weaker shock/expansion pairs running a short distance (less than 2%) down the length of the Mach stem. The expansion from wave i appears to terminate through its interaction with the shock from wave i + 1. The supersonic region behind the leading triple point is extremely small, which explains why it had not been observed earlier. The results in [TH02] suggest that the sequence of triple points and expansion waves/shocks in a weak shock irregular reflection may be infinite, although no numerical simulation can determine whether it actually is. In fact, one could argue that the structure indicated in Figure 4 may depend on the numerical flux (upwind or non-upwind), or that the asymptotic model may predict something that is not physically realized. We address these concerns here and in the following sections.
Experimental confirmation is very challenging simply because the computed Guderley Mach reflection structure is so small and weak. Nevertheless, some experimental evidence has recently been obtained. Following the announcement of the Guderley Mach reflection solution found in [TH02], Skews and Ashworth [SA05] modified an existing shock tube apparatus in order to obtain Mach stem lengths more than an order of magnitude larger than those possible in conventional shock tubes. All experiments were carried out on a 15° ramp with incident shock Mach numbers ranging from 1.05 to 1.1. They present images that "clearly show the existence of an expansion wave immediately behind the reflected wave as proposed by Guderley", and they found "a distinct sharp contrasting line immediately after the expansion wave, indicating the existence of a terminating shock". In addition, some of their images show evidence of a second terminating shocklet behind the first, as predicted by the simulations in [TH02]. Professor Beric Skews graciously supplied us with the images which we give here in Figure 5. Further experimental refinements and data acquisition are currently underway.

[Figure 4: panels (a), (b), (c)]
Fig. 4. Closeups of an apparent triple point for the UTSDE using the approach of Tesdall and Hunter. In (a) and (b) the incident shock leaves to the upper right, the reflected shock towards the top, and the Mach stem exits at the bottom. The plot in (a) depicts contour lines of u and shows a sequence of expansions/shocks running down the Mach stem. The plot in (b) shows a detail of v; label 1 denotes the state v = 0, label 2 the state v = −ǎ, and label 3 points to the expansion wave emanating from what appears macroscopically to be a triple point. The dotted line in (b) delineates the supersonic patches within the subsonic zone behind the Mach stem. The Guderley Mach reflection structure can be seen better in the surface plot (c), where the viewer is upstream looking back at the triple point.

[Figure 5: panels (a), (b)]
Fig. 5. On the left, a schlieren image of an experimental weak shock reflection. The incident shock (vertical) exits at the top and is moving from left to right. The reflected wave exits to the upper left, and an expansion wave is visible immediately behind it. On the right, a high-contrast version of the image shows evidence of a second shocklet behind the first.
3 The Nonlinear Wave System
Here we consider a problem for the nonlinear wave system which is analogous
to the reflection of weak shocks discussed in the previous section. The shock
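reflection problem consists of the nonlinear wave system
$$\frac{\partial\rho}{\partial t} + \nabla\cdot\rho\mathbf u = 0,\qquad \frac{\partial\rho\mathbf u}{\partial t} + \operatorname{grad}p = 0,$$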
in the half space x > 0 with piecewise constant Riemann data consisting of
two states separated by a discontinuity located at x = κy. Again, ρ should be
thought of as density, u = (u, v) as velocity having x- and y-components, and
p = p(ρ) as pressure. For convenience, we assume p(ρ) = Cρ^γ, where C is a constant and γ = 2. See [TSK06].
The nonlinear wave system is a simplification of the isentropic Euler equations obtained by dropping the momentum transport terms from the momentum equations. Compared to the UTSDE, the nonlinear wave system is closer in structure to the Euler equations: it is linearly well-posed in space and time, it has a characteristic structure similar to the Euler equations with nonlinear acoustic waves coupled (weakly) to linearly degenerate waves, and it respects the spatial Euclidean symmetries of gas dynamics (excluding space-time Galilean symmetry, of course). In fact (see [KF94]), it may be the simplest system one can construct with these symmetries. It has also served as a prototypical model for the theoretical study of shock wave reflection [ČK98, ČKK05, ČKK01]. However, the greatest attribute of (3) for our purposes is the sheer simplicity of its wave structure. Moreover, the fluxes are quadratic (when γ = 2), and so its flux Jacobians are linear in the conserved variables. The Jacobian's eigenvalues are 0 and ±c, where c = √(p_ρ), and it has extremely simple eigenvectors. It is very well suited for efficient finite differencing.
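The eigenvalue claim can be checked numerically. This sketch assumes p = ρ²/2 (γ = 2 with C = 1/2, so p_ρ = ρ), a unit normal ν, and the fluxes F = (m, p, 0) and G = (n, 0, p) implied by the system above; it forms the Jacobian of ν_x F + ν_y G, whose entries are visibly linear in ρ, and compares its eigenvalues with 0 and ±c. The function name is hypothetical.

```python
import numpy as np

def nlw_jacobian(rho, nu):
    """Jacobian of nu_x*F(U) + nu_y*G(U) for U = (rho, m, n), F = (m, p, 0),
    G = (n, 0, p), with p = rho**2/2 so that dp/drho = rho."""
    nx, ny = nu
    dp = rho
    return np.array([[0.0, nx, ny],
                     [nx * dp, 0.0, 0.0],
                     [ny * dp, 0.0, 0.0]])

rho = 2.0
nu = np.array([0.6, 0.8])                  # unit normal
eig = np.sort(np.linalg.eigvals(nlw_jacobian(rho, nu)).real)
c = np.sqrt(rho)                           # c = sqrt(p_rho)
print(eig, [-c, 0.0, c])
```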
Let U = (ρ, m, n) denote the vector of conserved variables, where m = ρu
and n = ρv, and consider the following two-dimensional Riemann data:
$$U(x,y,0) = \begin{cases} U_1 \equiv (\rho_1, 0, 0) & \text{if } x < \kappa y,\\ U_0 \equiv (\rho_0, 0, n_0) & \text{if } x > \kappa y.\end{cases}\tag{5}$$
We choose ρ0 > ρ1 to obtain an upward moving shock in the far field, and
determine n0 so that the one-dimensional wave between U0 and U1 at inverse
slope κ consists of a shock and a contact discontinuity with a constant middle
state between them. The following expression for n0 is readily determined:
$$n_0 = \frac{1}{\kappa}\sqrt{(1+\kappa^2)\big(p(\rho_0)-p(\rho_1)\big)(\rho_0-\rho_1)}.\tag{6}$$
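Formula (6) can be checked against the Rankine–Hugoniot conditions. A minimal sketch, assuming p(ρ) = ρ²/2 (the equation of state used below) and assuming, as the shock-plus-contact structure described above implies, that the contact carries only the tangential momentum jump. Then, with N = (1, −κ)/√(1+κ²) the unit normal to x = κy, the normal momentum jumps from m_N = (0, n₀)·N to 0 across the shock, and the conditions are s(ρ₀ − ρ₁) = m_N and s·m_N = p(ρ₀) − p(ρ₁). The helper names are hypothetical.

```python
import math

def pressure(rho):
    """Assumed equation of state p = rho**2/2."""
    return 0.5 * rho * rho

def riemann_n0(rho0, rho1, kappa):
    """n0 from (6)."""
    return math.sqrt((1.0 + kappa**2) * (pressure(rho0) - pressure(rho1)) * (rho0 - rho1)) / kappa

def check_shock(rho0, rho1, kappa):
    """Return n0, the shock speed s along N, and the normal-momentum RH residual."""
    n0 = riemann_n0(rho0, rho1, kappa)
    Ny = -kappa / math.hypot(1.0, kappa)    # y-component of N = (1, -kappa)/sqrt(1+kappa^2)
    mN = n0 * Ny                            # normal momentum of U0 = (rho0, 0, n0)
    s = mN / (rho0 - rho1)                  # mass balance fixes the speed
    residual = s * mN - (pressure(rho0) - pressure(rho1))
    return n0, s, residual

n0, s, residual = check_shock(2.0, 1.0, 1.0)
print(n0, s, residual)
```

A negative s means the shock moves in the direction −N, that is, up and to the left, which matches the upward moving shock obtained with ρ₀ > ρ₁ and the positive root for n₀.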
There is no physical wall in the Mach reflection simulation below. Rather,
reflection occurs because the vertical axis is a line of left-right symmetry,
see Figure 6(a). Here, for κ sufficiently large (κ = 1 will do), regular reflection is impossible. Moreover, as with the UTSDE, (3) can never admit triple
point solutions, see [TSK06]. So we now investigate the structure of irregular
reflection, this time, however, for a hyperbolic system – one which resembles
the Euler equations but is not obtained from them via a limit.
The essential feature of the numerical method employed is the capability to locally refine the grid in the area of the apparent triple point. We again use the self-similar variables ξ = x/t and η = y/t to cast the problem into one that remains fixed on the grid. Non-uniform, logically rectangular, finite volume grids are constructed so that, for a given κ, the incident shock is aligned with the grid in the far field. Specifically, each problem with a given incident shock angle has an associated set of finite volume C-grids; each grid in the set corresponds to one level of grid refinement, and we use these for grid continuation to a steady state.
[Figure 6: (a) schematic of the computational domain with labeled points A, B, C, D and T; (b) computed self-similar solution, axes x/t and y/t]
Fig. 6. A schematic diagram of the computational domain is on the left. AD is the
line of symmetry. On the right is a computed self-similar solution with κ = 1.
The basic finite volume schemes used are quite standard. Each grid cell, Ω, is a quadrilateral and, using ν = (ν_ξ, ν_η) to denote the unit normal vector to a typical side of Ω, numerical fluxes are designed to be consistent with
$$\mathcal F(U) = \big(F(U) - \xi U\big)\nu_\xi + \big(G(U) - \eta U\big)\nu_\eta = \begin{pmatrix}\nu_\xi m + \nu_\eta n - \bar\xi\rho\\ \nu_\xi p - \bar\xi m\\ \nu_\eta p - \bar\xi n\end{pmatrix},$$
where $\bar\xi = \boldsymbol\xi\cdot\nu$ and $\boldsymbol\xi = (\xi, \eta)$. Since $\boldsymbol\xi$ varies in space, numerical flux formulae are evaluated with $\boldsymbol\xi$ frozen at the midpoint of each cell side. Two distinctly different numerical fluxes are utilized in the results presented below:
1. Lax–Friedrichs:
$$H_{LF} = \frac12\Big(\mathcal F(U_l) + \mathcal F(U_r) - \Lambda\,(U_r - U_l)\Big),$$
where Λ > 0 is a scalar constant chosen to be larger than the fastest wave speed found on the computational domain.
2. Roe:
$$H_{Roe} = \frac12\Big(\mathcal F(U_l) + \mathcal F(U_r) - R\Lambda L\,(U_r - U_l)\Big),$$
where Λ = diag(|−ξ̄ − c|, |−ξ̄|, |−ξ̄ + c|), and R and L are the matrices of the right and left eigenvectors of the Jacobian of 𝓕 evaluated at the midpoint U_Roe = ½(U_l + U_r). Since we use the equation of state p = ρ²/2, the midpoint yields an exact Roe average.
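A compact NumPy sketch of the two fluxes for the nonlinear wave system with p = ρ²/2 follows; it assumes a unit normal ν, frozen ξ̄, and hypothetical function names. Because the flux is quadratic in U, the Jacobian at the midpoint Ū = ½(U_l + U_r) satisfies 𝓕(U_r) − 𝓕(U_l) = J(Ū)(U_r − U_l) exactly, which is why the midpoint is an exact Roe average.

```python
import numpy as np

def normal_flux(U, nu, xi_bar):
    """Self-similar normal flux (F - xi U) nu_xi + (G - eta U) nu_eta, p = rho**2/2."""
    rho, m, n = U
    p = 0.5 * rho**2
    return np.array([nu[0] * m + nu[1] * n - xi_bar * rho,
                     nu[0] * p - xi_bar * m,
                     nu[1] * p - xi_bar * n])

def jacobian(U, nu, xi_bar):
    """Jacobian of normal_flux; linear in U because the flux is quadratic."""
    rho = U[0]
    return np.array([[-xi_bar, nu[0], nu[1]],
                     [nu[0] * rho, -xi_bar, 0.0],
                     [nu[1] * rho, 0.0, -xi_bar]])

def lax_friedrichs(Ul, Ur, nu, xi_bar, Lam):
    """H_LF with a scalar Lam larger than the fastest wave speed."""
    return 0.5 * (normal_flux(Ul, nu, xi_bar) + normal_flux(Ur, nu, xi_bar) - Lam * (Ur - Ul))

def roe(Ul, Ur, nu, xi_bar):
    """H_Roe with eigenvectors evaluated at the midpoint state (nu must be a unit vector)."""
    c = np.sqrt(0.5 * (Ul[0] + Ur[0]))              # c = sqrt(rho) at the midpoint
    R = np.array([[1.0, 0.0, 1.0],                   # columns: -xi_bar - c, -xi_bar, -xi_bar + c
                  [-c * nu[0], -nu[1], c * nu[0]],
                  [-c * nu[1], nu[0], c * nu[1]]])
    L = np.linalg.inv(R)
    abs_lam = np.diag(np.abs([-xi_bar - c, -xi_bar, -xi_bar + c]))
    return 0.5 * (normal_flux(Ul, nu, xi_bar) + normal_flux(Ur, nu, xi_bar)
                  - R @ abs_lam @ L @ (Ur - Ul))

Ul, Ur = np.array([1.0, 0.2, -0.1]), np.array([1.5, -0.3, 0.4])
nu, xb = np.array([0.6, 0.8]), 0.3
print(roe(Ul, Ur, nu, xb), lax_friedrichs(Ul, Ur, nu, xb, 5.0))
```

When every eigenvalue −ξ̄ ± c, −ξ̄ has the same sign, R Λ L equals ± the midpoint Jacobian, and the exact Roe property makes the Roe flux reduce to pure upwinding, 𝓕(U_l) or 𝓕(U_r).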
In order to investigate the structure of the solution near the triple point in a manner that has as little numerical bias as possible, we opted to first solve the problem using the classic first-order accurate Lax–Friedrichs finite volume scheme. That is, the Lax–Friedrichs flux is used in conjunction with piecewise constant cell-wise reconstruction. Figure 7 depicts a closeup of what was found on three grids with increasing refinement. The largest grid (c) contains approximately 11 million grid points. Approximately one quarter of these are contained in a square of side 0.05 centered on the triple point. The solution in (c) clearly resolves a small patch of supersonic flow behind the triple point. This patch is quite small, with a width of approximately 0.03 and a height of approximately 0.01. Note the thickening of the incident and Mach shocks as they leave the region of extreme grid refinement. The much weaker reflected shock is well resolved because it is aligned with the grid, and the grid in the direction normal to the reflected shock is very fine near the triple point.

[Figure 7: panels (a), (b), (c)]
Fig. 7. Density contour plots for the nonlinear wave system using the first-order accurate Lax–Friedrichs finite volume scheme in a neighborhood of the triple point. The region shown includes the locally refined 760 × 760 grid in (a), the 1280 × 1024 grid in (b) and the 2048 × 1320 grid in (c). The heavy line below the reflected shock and to the right of the Mach stem delineates a supersonic patch found within the subsonic zone. There is a slight indication of an expansion fan behind the leading triple point in (c).
[Figure 8: (a) density contours and (b) x-momentum contours, axes x/t and y/t]
Fig. 8. Density contours (a) and x-momentum contours (b) for the nonlinear wave
system using a high-order Roe scheme. These were obtained on the same grid depicted in Figure 7(c). There is now clear evidence of the sequence of interacting
shocks and expansions seen earlier for the UTSDE. The heavy line is the sonic line
and again delineates the supersonic patch.
The width of the supersonic patch is approximately 5% of the length of the
Mach stem. There is a slight indication of an expansion fan at the triple point,
but at this level of grid refinement there is no evidence yet of the sequence of
shocks and expansions seen in Figure 4.
Because of hardware limitations, there comes a point at which the results from a first-order scheme are, at best, inadequate. The large-grid results just displayed used a grid whose smallest cell size was on the order of one millionth of the extent of the computational domain. Moreover, these problems are steady and therefore require hundreds of thousands of pseudo-time iterations. At this stage we therefore employed a numerical approach that is (perhaps) somewhat less free of numerical bias: a high-order scheme based on the Roe numerical flux. High-order accuracy is achieved by using a piecewise quadratic reconstruction limited in characteristic variables. We give the finest grid results from this approach in Figure 8. Three shock/expansion pairs are now clearly evident. The primary wave is at the triple point and two others can be seen along the Mach stem, a pattern very similar to that found for the UTSDE.
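The reconstruction and limiter formulas are not specified above, so the following is only a hedged illustration of the idea for a scalar on a uniform 1D grid. The quadratic that matches the three cell averages ū_{i−1}, ū_i, ū_{i+1} takes the value (−ū_{i−1} + 5ū_i + 2ū_{i+1})/6 at x_{i+1/2}; the sketch limits the increment from ū_i with a three-argument min-mod, which is an assumed limiter choice. For a system such as the nonlinear wave equations, one would apply it componentwise to characteristic variables L·U and map back with R. Names are hypothetical.

```python
import numpy as np

def minmod3(a, b, c):
    """Componentwise min-mod of three arrays: 0 unless all three share a sign."""
    same = (np.sign(a) == np.sign(b)) & (np.sign(b) == np.sign(c))
    mag = np.minimum(np.minimum(np.abs(a), np.abs(b)), np.abs(c))
    return np.where(same, np.sign(a) * mag, 0.0)

def right_face_values(u):
    """Limited left-biased values at x_{i+1/2} for interior cells i = 1..len(u)-2."""
    um, u0, up = u[:-2], u[1:-1], u[2:]
    delta = (-um + 5.0 * u0 + 2.0 * up) / 6.0 - u0     # unlimited quadratic increment
    return u0 + minmod3(delta, up - u0, 2.0 * (u0 - um))

# cell averages of x**2 on unit cells centred at 0, 1, 2: the face x = 1.5 gives 2.25
print(right_face_values(np.array([1/12, 1 + 1/12, 4 + 1/12])))
```

In smooth monotone regions the min-mod leaves the quadratic value untouched, so smooth data (here a parabola) is reproduced exactly; at a local extremum the increment is set to zero and the scheme falls back to first order there.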
4 Weak Shock Irregular Reflection for the Euler Equations
We compute numerical solutions for the Euler equations (2) with γ = 5/3. A weak M = 1.04 vertically aligned incident shock impinges on a θw = 11.5° ramp. These data correspond to the parameter ǎ ≈ 1/2 in the UTSDE model from Section 2. The grid is defined by a conformal map of the form z = w^α, and so it is orthogonal, with a singularity at the ramp apex x = y = 0. The upstream speed of sound is c_r = 1, and the boundary data on the left, right and top are given to agree exactly with this shock located at x = 1.04. The lower boundary condition mimics symmetry about the x-axis for x < 0 and symmetry with respect to the ramp for x > 0. The grid geometry can be seen in Figure 9.

[Figure 9: geometry of the Euler example, with an inset marking the locally refined region]
Fig. 9. The geometry of the M = 1.04/11.5° Euler example. The inset indicates the region where extreme local grid refinement is performed.
This problem is well outside the range where regular reflection solutions are possible. As Figure 9 shows, its numerical solution (under the inset) clearly resembles single Mach reflection. However, Mach reflection (where three plane shocks meet at a point) is also not possible for a shock this weak [Hen87]. This example demonstrates a classic von Neumann triple point paradox.
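As a rough consistency check of the statement that these data correspond to ǎ ≈ 1/2, assume the weak-shock relation ǎ ≈ θ_w/√(2(M² − 1)) with θ_w in radians (an assumption here, taken as the asymptotic form of ǎ in Section 2). The helper name is hypothetical.

```python
import math

def a_check_estimate(mach, theta_w_deg):
    """Hypothetical helper: weak-shock estimate theta_w / sqrt(2 (M^2 - 1)),
    with the ramp angle converted from degrees to radians."""
    theta_w = math.radians(theta_w_deg)
    return theta_w / math.sqrt(2.0 * (mach**2 - 1.0))

print(round(a_check_estimate(1.04, 11.5), 3))   # 0.497
```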
This problem is solved in self-similar coordinates by essentially the same high-order Roe method discussed in the previous section. However, we simplify the Roe approach by again evaluating the Roe matrix at the midpoint, which for the Euler equations is only an approximation to the Roe average. Also, to avoid spurious expansion shocks, artificial dissipation of order $O(|U_r - U_l|)$ is added, field by field, to the diagonal part of the Roe dissipation matrix.
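The exact form of the added dissipation is not given. A minimal scalar sketch, assuming one possible choice, a term C·|λ(U_r) − λ(U_l)| added to |λ| for Burgers' flux f(u) = u²/2 (where λ = u), shows why such a term matters. The constant C and the function name are hypothetical.

```python
def roe_flux_burgers(ul, ur, C=0.0):
    """Roe flux for f(u) = u**2/2 with an optional dissipation term C*|ur - ul|.

    For Burgers' flux lambda = u, so |lambda_r - lambda_l| = |ur - ul| and the
    added term is O(|U_r - U_l|). C = 0 gives the plain Roe flux.
    """
    def f(u):
        return 0.5 * u * u
    lam_roe = 0.5 * (ul + ur)                  # exact Roe average for a quadratic flux
    diss = abs(lam_roe) + C * abs(ur - ul)
    return 0.5 * (f(ul) + f(ur)) - 0.5 * diss * (ur - ul)

print(roe_flux_burgers(-1.0, 1.0), roe_flux_burgers(-1.0, 1.0, C=0.5))   # 0.5 -0.5
```

With C = 0 the interface flux at the stationary expansion shock (u_l = −1, u_r = 1) equals f(±1) = 0.5 on both sides, so the unphysical discontinuity never evolves. With C > 0 the interface flux drops below 0.5, the left cell gains and the right cell loses, and a rarefaction starts to form.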
We locally refine a very small neighborhood around the apparent triple point as done earlier. The full finest grid has eleven million grid points, with 800 × 2000 = 1.6 × 10⁶ (Δx ≈ 5 × 10⁻⁷) devoted to the local refinement.
We plot the sonic number M, which is defined as follows. The eigenvalue corresponding to a fast shock in unit direction n for the self-similar Euler flux Jacobian is
$$\lambda = (u - \xi,\ v - \eta)\cdot n + c,$$
where ξ = x/t and η = y/t. Define r² = ξ² + η², set n = (ξ, η)/r and u_n = (u, v)·n to find
$$\lambda = c\left(\frac{u_n - r}{c} + 1\right) = c(1 - M),\qquad\text{where } M = \frac{r - u_n}{c}.$$
[Figure 10: panels (a) and (b), closeups of the Euler triple point]
Fig. 10. A closeup of the Euler triple point. The sonic number M on the left and
density ρ on the right. The dotted line on the left delineates the supersonic patch
within the subsonic zone behind the Mach stem.
[Figure 11: panels (a) and (b), vertical cross sections of M]
Fig. 11. Vertical cross sections of M taken bottom-up slightly to the left of the
Mach stem. On the left M = 1.04/11.5◦ . The reflected shock is the large jump.
Note the crossings at M = 1. On the right, a second example problem with a
slightly stronger incident shock M = 1.075/15.0◦ . The evidence of a sequence of
shock/expansion wave pairs is stronger for this second example.
When M < 1, the flow is called subsonic; when M > 1, it is called supersonic. In this sense, the fact that M crosses from subsonic to supersonic across a stationary self-similar shock is nothing more than the entropy condition λl > s > λr: in self-similar coordinates such a shock has s = 0, and since λ = c(1 − M), the condition λl > 0 > λr is equivalent to Ml < 1 < Mr.
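A small sketch of the sonic number computation with hypothetical names: it evaluates M = (r − u_n)/c and confirms λ = c(1 − M) for the radial direction n = (ξ, η)/r. In quiescent gas (u = v = 0, c = 1) it gives M = r, so points beyond r = 1 are supersonic in this sense.

```python
import math

def sonic_number(u, v, c, xi, eta):
    """M = (r - u_n)/c with r = |(xi, eta)|, n = (xi, eta)/r and u_n = (u, v).n."""
    r = math.hypot(xi, eta)
    un = (u * xi + v * eta) / r
    return (r - un) / c

def fast_eigenvalue(u, v, c, xi, eta):
    """lambda = (u - xi, v - eta).n + c in the radial direction n = (xi, eta)/r."""
    r = math.hypot(xi, eta)
    return ((u - xi) * xi + (v - eta) * eta) / r + c

print(sonic_number(0.0, 0.0, 1.0, 1.04, 0.0))   # quiescent gas at x/t = 1.04
```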
Figure 10 gives a sonic number contour plot (a) and density contours (b) in the triple point neighborhood. Clearly the evidence for Guderley Mach reflection in this example is not nearly as compelling as that found for our earlier examples. However, these shocks are extremely weak. In recent work for a γ = 7/5 gas, we slightly strengthened the incident shock (M = 1.075 on a 15° ramp) and obtained far more conclusive results; see the sonic number cross sections depicted in Figure 11.
References
[BH92] M. Brio and J. K. Hunter. Mach reflection for the two-dimensional Burgers equation. Phys. D, 60:194–207, 1992.
[BT49] W. Bleakney and A. H. Taub. Interaction of shock waves. Rev. Modern Physics, 21:584–605, 1949.
[CF76] R. Courant and K. O. Friedrichs. Supersonic Flow and Shock Waves. Springer, 1976.
[CH90] P. Colella and L. F. Henderson. The von Neumann paradox for the diffraction of weak shock waves. J. Fluid Mech., 213:71–94, 1990.
[ČK98] S. Čanić and B. L. Keyfitz. Quasi-one-dimensional Riemann problems and their role in self-similar two-dimensional problems. Arch. Rational Mech. Anal., 144:233–258, 1998.
[ČKK01] S. Čanić, B. L. Keyfitz, and E. H. Kim. Mixed hyperbolic-elliptic systems in self-similar flows. Bol. Soc. Bras. Mat., 32:1–23, 2001.
[ČKK05] S. Čanić, B. L. Keyfitz, and E. H. Kim. Free boundary problems for nonlinear wave systems: Mach stems for interacting shocks. SIAM J. Math. Anal., 37:1947–1977, 2005.
[Gud47] K. G. Guderley. Considerations of the structure of mixed subsonic-supersonic flow patterns. Air Material Command Tech. Report, F-TR-2168-ND, ATI No. 22780, GS-AAF-Wright Field 39, U.S. Wright-Patterson Air Force Base, Dayton, Ohio, October 1947.
[Gud62] K. G. Guderley. The Theory of Transonic Flow. Pergamon Press, Oxford, 1962.
[HB00] J. K. Hunter and M. Brio. Weak shock reflection. J. Fluid Mech., 410:235–261, 2000.
[Hen66] L. F. Henderson. On a class of multi-shock intersections in a perfect gas. Aero. Q., 17:1–20, 1966.
[Hen87] L. F. Henderson. Regions and boundaries for diffracting shock wave systems. Z. Angew. Math. Mech., 67:73–86, 1987.
[HT04] J. K. Hunter and A. M. Tesdall. Weak shock reflection. In D. Givoli, M. Grote, and G. Papanicolaou, editors, A Celebration of Mathematical Modeling. Kluwer Academic Press, New York, 2004.
[KF94] B. L. Keyfitz and M. C. Lopes Filho. A geometric study of shocks in equations that change type. J. Dynam. Differential Equations, 6:351–393, 1994.
[Neu43] J. von Neumann. Oblique reflection of shocks. Explosives Research Report 12, Bureau of Ordnance, 1943.
[Neu63] J. von Neumann. Collected Works, Vol. 6. Pergamon Press, New York, 1963.
[Ric81] R. D. Richtmyer. Principles of Mathematical Physics, Vol. 1. Springer, 1981.
[SA05] B. Skews and J. Ashworth. The physical nature of weak shock wave reflection. J. Fluid Mech., 542:105–114, 2005.
[Ste59] J. Sternberg. Triple-shock-wave intersections. Phys. Fluids, 2:179–206, 1959.
[STS92] A. Sasoh, K. Takayama, and T. Saito. A weak shock wave reflection over wedges. Shock Waves, 2:277–281, 1992.
[TH02] A. M. Tesdall and J. K. Hunter. Self-similar solutions for weak shock reflection. SIAM J. Appl. Math., 63:42–61, 2002.
[TR94] E. G. Tabak and R. R. Rosales. Focusing of weak shock waves and the von Neumann paradox of oblique shock reflection. Phys. Fluids, 6:1874–1892, 1994.
[TSK06] A. M. Tesdall, R. Sanders, and B. L. Keyfitz. The triple point paradox for the nonlinear wave system. SIAM J. Appl. Math., 67:321–336, 2006.
[VK99] E. Vasil’ev and A. Kraiko. Numerical simulation of weak shock diffraction over a wedge under the von Neumann paradox conditions. Comput. Math. Math. Phys., 39:1335–1345, 1999.
[ZBHW00] A. Zakharian, M. Brio, J. K. Hunter, and G. Webb. The von Neumann paradox in weak shock reflection. J. Fluid Mech., 422:193–205, 2000.
A Lagrange Multiplier Based Domain
Decomposition Method for the Solution of a
Wave Problem with Discontinuous Coefficients
Serguei Lapin1, Alexander Lapin2, Jacques Périaux3,4, and Pierre-Marie Jacquart5
1 Department of Mathematics, Washington State University, Pullman WA 99164 USA, slapin@math.wsu.edu
2 Kazan State University, Department of Computational Mathematics and Cybernetics, 18 Kremlyovskaya St., Kazan 420008, Russia, alapin@ksu.ru
3 Pole Scientifique Dassault/UPMC, jperiaux@free.fr
4 University of Jyväskylä, Department of Mathematical Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyväskylä, Finland
5 Dassault Aviation, 78, Quai Marcel Dassault, Cedex 300, Saint-Cloud 92552, France, pierre-marie.jacquart@dassault-aviation.fr
Summary. In this paper we consider the numerical solution of a linear wave equation with discontinuous coefficients. We divide the computational domain into two subdomains and use an explicit time-difference scheme along with piecewise linear finite element approximations on semimatching grids. We apply a boundary-supported Lagrange multiplier method to match the solution on the interface between subdomains. The resulting system of linear equations of "saddle-point" type is solved efficiently by a conjugate gradient method.
1 Problem Formulation
Let Ω ⊂ R2 be a rectangular domain with sides parallel to the coordinate
axes and boundary Γext (see Fig. 1). Now let Ω2 ⊂ Ω be a proper subdomain
of Ω with a curvilinear boundary and Ω1 = Ω \ Ω̄2 .
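We consider the following linear wave problem:
$$\begin{cases}
\varepsilon\,\dfrac{\partial^2 u}{\partial t^2} - \nabla\cdot(\mu^{-1}\nabla u) = f & \text{in } \Omega\times(0,T),\\[4pt]
\sqrt{\varepsilon\mu^{-1}}\,\dfrac{\partial u}{\partial t} + \mu^{-1}\dfrac{\partial u}{\partial n} = 0 & \text{on } \Gamma_{ext}\times(0,T),\\[4pt]
u(x,0) = \dfrac{\partial u}{\partial t}(x,0) = 0.
\end{cases}\tag{1}$$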
[Figure 1: the domain Ω with exterior boundary Γext, the subdomain Ω2, and the rectangle R with boundary γ]
Fig. 1. Computational domain.
Here $\nabla u = \big(\frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}\big)$ and n is the unit outward normal vector on Γext. We suppose that µi = µ|Ωi and εi = ε|Ωi are positive constants for i = 1, 2, and that fi = f|Ωi ∈ C(Ω̄i × [0, T]).
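Let
$$\varepsilon(x) = \begin{cases}\varepsilon_1 & \text{if } x\in\Omega_1,\\ \varepsilon_2 & \text{if } x\in\Omega_2,\end{cases}\qquad \mu(x) = \begin{cases}\mu_1 & \text{if } x\in\Omega_1,\\ \mu_2 & \text{if } x\in\Omega_2.\end{cases}$$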
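We define a weak solution of problem (1) as a function u such that
$$u\in L^\infty(0,T;H^1(\Omega)),\qquad \frac{\partial u}{\partial t}\in L^\infty(0,T;L^2(\Omega)),\qquad \frac{\partial u}{\partial t}\in L^2(0,T;L^2(\Gamma_{ext})) \tag{2}$$
which, for a.a. t ∈ (0, T) and for all w ∈ H¹(Ω), satisfies the equation
$$\int_\Omega\varepsilon(x)\frac{\partial^2u}{\partial t^2}w\,dx + \int_\Omega\mu^{-1}(x)\nabla u\cdot\nabla w\,dx + \sqrt{\varepsilon_1\mu_1^{-1}}\int_{\Gamma_{ext}}\frac{\partial u}{\partial t}w\,d\Gamma = \int_\Omega fw\,dx \tag{3}$$
with the initial conditions
$$u(x,0) = \frac{\partial u}{\partial t}(x,0) = 0.$$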
Note that the first term in (3) is understood as the duality pairing between (H¹(Ω))* and H¹(Ω).
Now, using the Faedo–Galerkin method (as in [DL92]), one can prove the
following:
Theorem 1. Under the assumptions (2) there exists a unique weak solution
of problem (1).
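Let
$$E(t) = \frac12\int_\Omega\varepsilon(x)\Big(\frac{\partial u}{\partial t}\Big)^2dx + \frac12\int_\Omega\mu^{-1}(x)|\nabla u|^2dx$$
be the energy of the system. We take $w = \partial u/\partial t$ in (3) and obtain
$$\frac{dE(t)}{dt} + \sqrt{\varepsilon_1\mu_1^{-1}}\int_{\Gamma_{ext}}\Big(\frac{\partial u}{\partial t}\Big)^2d\Gamma = \int_\Omega f\,\frac{\partial u}{\partial t}\,dx \le \|f\|_{L^2(\Omega)}\Big\|\frac{\partial u}{\partial t}\Big\|_{L^2(\Omega)}.$$
Since E(0) = 0 and the boundary term is nonnegative, integrating in time yields the stability inequality
$$E(t)\le \mathrm{const}\cdot T\,\|f\|^2_{L^2(\Omega\times(0,T))}\qquad\forall t\in(0,T).$$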
In order to use a structured grid in a part of the domain Ω, we introduce
a rectangular domain R with sides parallel to the coordinate axes, such that
Ω2 ⊂ R ⊂ Ω with γ the boundary of R (Fig. 1).
Define Ω̃ = Ω \ R̄ and let the subscript 1 of a function v1 mean that
this function is defined over Ω̃ × (0, T ), while v2 is a function defined over
R × (0, T ).
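Now we formulate problem (3) variationally as follows. Let
$$W_1 = \Big\{v\in L^\infty(0,T;H^1(\tilde\Omega)) : \frac{\partial v}{\partial t}\in L^\infty(0,T;L^2(\tilde\Omega)),\ \frac{\partial v}{\partial t}\in L^2(0,T;L^2(\Gamma_{ext}))\Big\},$$
$$W_2 = \Big\{v\in L^\infty(0,T;H^1(R)) : \frac{\partial v}{\partial t}\in L^\infty(0,T;L^2(R))\Big\}.$$
Find a pair (u1, u2) ∈ W1 × W2 such that u1 = u2 on γ × (0, T) and, for a.a. t ∈ (0, T),
$$\begin{cases}\displaystyle\int_{\tilde\Omega}\varepsilon_1\frac{\partial^2u_1}{\partial t^2}w_1\,dx + \int_{\tilde\Omega}\mu_1^{-1}\nabla u_1\cdot\nabla w_1\,dx + \int_R\varepsilon(x)\frac{\partial^2u_2}{\partial t^2}w_2\,dx\\[6pt] \displaystyle\qquad + \int_R\mu^{-1}(x)\nabla u_2\cdot\nabla w_2\,dx + \sqrt{\varepsilon_1\mu_1^{-1}}\int_{\Gamma_{ext}}\frac{\partial u_1}{\partial t}w_1\,d\Gamma = \int_{\tilde\Omega}f_1w_1\,dx + \int_Rf_2w_2\,dx\\[6pt] \qquad\text{for all } (w_1,w_2)\in H^1(\tilde\Omega)\times H^1(R) \text{ such that } w_1 = w_2 \text{ on } \gamma,\\[6pt] \displaystyle u(x,0) = \frac{\partial u}{\partial t}(x,0) = 0.\end{cases}\tag{4}$$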
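Now, introducing the interface-supported Lagrange multiplier λ (a function defined over γ × (0, T)), problem (4) can be written in the following way: find a triple (u1, u2, λ) ∈ W1 × W2 × L∞(0, T; H^{−1/2}(γ)) which, for a.a. t ∈ (0, T), satisfies
$$\begin{aligned}&\int_{\tilde\Omega}\varepsilon_1\frac{\partial^2u_1}{\partial t^2}w_1\,dx + \int_{\tilde\Omega}\mu_1^{-1}\nabla u_1\cdot\nabla w_1\,dx + \int_R\varepsilon(x)\frac{\partial^2u_2}{\partial t^2}w_2\,dx + \int_R\mu^{-1}(x)\nabla u_2\cdot\nabla w_2\,dx\\ &\quad + \sqrt{\varepsilon_1\mu_1^{-1}}\int_{\Gamma_{ext}}\frac{\partial u_1}{\partial t}w_1\,d\Gamma + \int_\gamma\lambda(w_2-w_1)\,d\gamma = \int_{\tilde\Omega}f_1w_1\,dx + \int_Rf_2w_2\,dx\quad\text{for all } w_1\in H^1(\tilde\Omega),\ w_2\in H^1(R),\end{aligned}\tag{5}$$
$$\int_\gamma\zeta(u_2-u_1)\,d\gamma = 0\quad\text{for all }\zeta\in H^{-1/2}(\gamma),\tag{6}$$
and the initial conditions from (1).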
Remark 1. We chose the time-dependent approach to capture time-harmonic solutions because it substantially simplifies the linear algebra of the solution process. Furthermore, there exist various techniques to speed up the convergence of transient solutions to periodic ones (see, e.g., [BDG+97]).
2 Time Discretization
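In order to construct a finite difference approximation in time of problem (5), (6), we partition the segment [0, T] into N intervals using a uniform discretization step Δt = T/N. Let $u_i^n \approx u_i(n\Delta t)$ for i = 1, 2, and $\lambda^n \approx \lambda(n\Delta t)$. The explicit-in-time semidiscrete approximation to problem (5), (6) reads as follows: $u_i^0 = u_i^1 = 0$; for $n = 1, 2, \dots, N-1$, find $u_1^{n+1}\in H^1(\tilde\Omega)$, $u_2^{n+1}\in H^1(R)$ and $\lambda^{n+1}\in H^{-1/2}(\gamma)$ such that
$$\begin{aligned}&\int_{\tilde\Omega}\varepsilon_1\frac{u_1^{n+1}-2u_1^n+u_1^{n-1}}{\Delta t^2}w_1\,dx + \int_{\tilde\Omega}\mu_1^{-1}\nabla u_1^n\cdot\nabla w_1\,dx + \int_R\varepsilon(x)\frac{u_2^{n+1}-2u_2^n+u_2^{n-1}}{\Delta t^2}w_2\,dx\\ &\quad + \int_R\mu^{-1}(x)\nabla u_2^n\cdot\nabla w_2\,dx + \sqrt{\varepsilon_1\mu_1^{-1}}\int_{\Gamma_{ext}}\frac{u_1^{n+1}-u_1^{n-1}}{2\Delta t}w_1\,d\Gamma + \int_\gamma\lambda^{n+1}(w_2-w_1)\,d\gamma\\ &\quad = \int_{\tilde\Omega}f_1^nw_1\,dx + \int_Rf_2^nw_2\,dx\qquad\text{for all } w_1\in H^1(\tilde\Omega),\ w_2\in H^1(R),\end{aligned}\tag{7}$$
$$\int_\gamma\zeta(u_2^{n+1}-u_1^{n+1})\,d\gamma = 0\qquad\text{for all }\zeta\in H^{-1/2}(\gamma).\tag{8}$$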
Remark 2. The integral over γ is written formally; the exact formulation requires the use of the duality pairing ⟨·, ·⟩ between H^{−1/2}(γ) and H^{1/2}(γ).
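To make the time scheme (7) concrete, here is a one-dimensional sketch without the domain decomposition: P1 elements with vertex-quadrature (lumped) mass, the centered second difference in time, and the first-order absorbing condition discretized with (u^{n+1} − u^{n−1})/(2Δt) as in (7). The 1D setting, the zero source and the function name are assumptions. With ε = μ = 1 and Δt = h, the limit Δt = √(εμ)h, the update reduces to exact transport, so a right-moving pulse leaves the domain without reflection.

```python
import numpy as np

def leapfrog_absorbing_1d(eps, mu, h, dt, u0, u1, nsteps):
    """March eps*u_tt - (mu^{-1} u_x)_x = 0 on a uniform 1D mesh with the
    absorbing condition sqrt(eps/mu) u_t + mu^{-1} du/dn = 0 at both ends.
    eps, mu: per-cell coefficients (length J); u0, u1: nodal values (length J+1)."""
    eps, mu = np.asarray(eps, float), np.asarray(mu, float)
    mass = np.zeros(len(eps) + 1)
    mass[:-1] += 0.5 * h * eps                 # each cell lumps h*eps/2 onto its two nodes
    mass[1:] += 0.5 * h * eps
    k = 1.0 / (mu * h)                         # cell stiffness mu^{-1}/h
    damp = np.zeros_like(mass)                 # boundary term sqrt(eps/mu)/(2 dt)
    damp[0] = np.sqrt(eps[0] / mu[0]) / (2.0 * dt)
    damp[-1] = np.sqrt(eps[-1] / mu[-1]) / (2.0 * dt)
    a = mass / dt**2
    u_old, u = np.array(u0, float), np.array(u1, float)
    for _ in range(nsteps):
        flux = k * (u[1:] - u[:-1])            # mu^{-1} u_x on each cell
        Ku = np.zeros_like(u)                  # assembled stiffness times u
        Ku[:-1] -= flux
        Ku[1:] += flux
        # a (u^{n+1} - 2u^n + u^{n-1}) + K u^n + damp (u^{n+1} - u^{n-1}) = 0
        u_new = (a * (2.0 * u - u_old) - Ku + damp * u_old) / (a + damp)
        u_old, u = u, u_new
    return u

J = 200
h = dt = 1.0 / J
j = np.arange(J + 1)
pulse = lambda s: np.exp(-((s - 100.0) / 10.0) ** 2)
u = leapfrog_absorbing_1d(np.ones(J), np.ones(J), h, dt, pulse(j), pulse(j - 1), 400)
print(np.max(np.abs(u)))                       # essentially zero: the pulse has left
```

The two starting levels pulse(j) and pulse(j − 1) encode a pulse moving one cell per step to the right; at the boundary node the update collapses to u₀^{n+1} = u₁^n, which is why the discrete condition absorbs the wave exactly in this special case.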
3 Fully Discrete Scheme
To construct a fully discrete space-time approximation to problem (5), (6), we use the lowest-order finite element method on two grids that are semimatching on γ (Fig. 2) for the space discretization. Namely, let T1h and T2h be triangulations of Ω̃ and R, respectively. We further suppose that both triangulations are regular in the sense that
$$\frac{h(e)}{r(e)} \le q = \text{const}$$
for all e ∈ T1h and e ∈ T2h, where q does not depend on e; r(e) is the radius of the circle inscribed in e, while h(e) is the diameter of e.
We denote by T1h the coarse triangulation and by T2h the fine one. Every edge ∂e ⊂ γ of a triangle e ∈ T1h is supposed to consist of m_e edges of triangles from T2h, with 1 ≤ m_e ≤ m for all e ∈ T1h.
Moreover, let T2h be such that the curvilinear boundary ∂Ω2 is approximated by a polygonal line consisting of edges of triangles from T2h whose vertices belong to ∂Ω2. Further, we say that a triangle e ∈ T2h lies in Ω2 if its larger part lies in Ω2, i.e. meas(e ∩ Ω2) > meas(e ∩ (R \ Ω2)); otherwise this triangle lies in R \ Ω2.
Let V1h ⊂ H¹(Ω̃) be the space of globally continuous functions that are affine on each e ∈ T1h, i.e. V1h = {uh ∈ H¹(Ω̃) | uh ∈ P1(e) ∀e ∈ T1h}. Similarly, V2h ⊂ H¹(R) is the space of globally continuous functions that are affine on each e ∈ T2h.
For approximating the Lagrange multipliers space Λ = H −1/2 (γ) we proceed as follows. Assume that on γ, T1h is two times coarser than T2h . Then
let us divide every edge ∂e of a triangle e from the coarse grid T1h , which is
located on γ (∂e ⊂ γ), into two parts using its midpoint. Now, we consider the
space of the piecewise constant functions, which are constant on every union
of half-edges with a common vertex (see Fig. 3).
[Figure 2: semimatching meshes of Ω and R along γ]
Fig. 2. Semimatching mesh on γ.

[Figure 3: the multiplier space on γ, built from unions of half-edges]
Fig. 3. Space Λ is the space of the piecewise constant functions defined on every union of half-edges with common vertex.

Further, we use quadrature formulas for approximating the integrals over the triangles from T1h and T2h, as well as over Γext. For a triangle e we set
$$\int_e\varphi(x)\,dx \approx \frac{\operatorname{meas}(e)}{3}\sum_{i=1}^{3}\varphi(a_i) \equiv S_e(\varphi),$$
where the a_i are the vertices of e and φ(x) is a continuous function on e. Similarly,
$$\int_{\partial e}\varphi(x)\,dx \approx \frac{\operatorname{meas}(\partial e)}{2}\sum_{i=1}^{2}\varphi(a_i) \equiv S_{\partial e}(\varphi),$$
where the a_i are the endpoints of the segment ∂e and φ(x) is a continuous function on this segment.
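We use the notations
$$S_i(\varphi) = \sum_{e\in\mathcal T_{ih}}S_e(\varphi),\quad i = 1, 2,\qquad S_{\Gamma_{ext}}(\varphi) = \sum_{\partial e\subset\Gamma_{ext}}S_{\partial e}(\varphi).$$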
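A small sketch of the vertex quadrature rules S_e and S_∂e defined above, with hypothetical function names. Because the rules only sample vertex values, they are exact for affine functions but not for quadratics such as a product of two P1 functions; this vertex sampling is what makes the mass contributions diagonal (mass lumping).

```python
import numpy as np

def triangle_vertex_rule(vertices, phi):
    """S_e(phi) = meas(e)/3 * (phi(a1) + phi(a2) + phi(a3))."""
    a, b, c = (np.asarray(v, dtype=float) for v in vertices)
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    return area / 3.0 * sum(phi(*v) for v in (a, b, c))

def segment_rule(p, q, phi):
    """S_de(phi) = meas(de)/2 * (phi(p) + phi(q))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.linalg.norm(q - p) / 2.0 * (phi(*p) + phi(*q))

ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(triangle_vertex_rule(ref, lambda x, y: 1 + 2 * x + 3 * y))  # exact: 4/3
print(triangle_vertex_rule(ref, lambda x, y: x * x))              # 1/6, exact value is 1/12
```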
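Now, the fully discrete problem reads as follows. Let $u_{ih}^0 = u_{ih}^1 = 0$, i = 1, 2. For $n = 1, 2, \dots, N-1$, find $(u_{1h}^{n+1}, u_{2h}^{n+1}, \lambda_h^{n+1}) \in V_{1h}\times V_{2h}\times\Lambda_h$ such that
$$\begin{aligned}&\frac{\varepsilon_1}{\Delta t^2}S_1\big((u_{1h}^{n+1}-2u_{1h}^n+u_{1h}^{n-1})w_{1h}\big) + S_1\big(\mu_1^{-1}\nabla u_{1h}^n\cdot\nabla w_{1h}\big)\\ &\quad + \frac{1}{\Delta t^2}S_2\big(\varepsilon(x)(u_{2h}^{n+1}-2u_{2h}^n+u_{2h}^{n-1})w_{2h}\big) + S_2\big(\mu^{-1}(x)\nabla u_{2h}^n\cdot\nabla w_{2h}\big)\\ &\quad + \frac{\sqrt{\varepsilon_1\mu_1^{-1}}}{2\Delta t}S_{\Gamma_{ext}}\big((u_{1h}^{n+1}-u_{1h}^{n-1})w_{1h}\big) + \int_\gamma\lambda_h^{n+1}(w_{2h}-w_{1h})\,d\gamma\\ &\quad = S_1(f_1^nw_{1h}) + S_2(f_2^nw_{2h})\qquad\text{for all } w_{1h}\in V_{1h},\ w_{2h}\in V_{2h},\end{aligned}\tag{9}$$
$$\int_\gamma\zeta_h(u_{2h}^{n+1}-u_{1h}^{n+1})\,d\gamma = 0\qquad\text{for all }\zeta_h\in\Lambda_h.\tag{10}$$
Note that in $S_2\big(\varepsilon(x)(u_{2h}^{n+1}-2u_{2h}^n+u_{2h}^{n-1})w_{2h}\big)$ we take ε(x) = ε2 if a triangle e ∈ T2h lies in Ω2 and ε(x) = ε1 if it lies in R \ Ω2, and similarly for $S_2(\mu^{-1}(x)\nabla u_{2h}^n\cdot\nabla w_{2h})$.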
Denote by u1, u2 and λ the vectors of the nodal values of the corresponding functions u1h, u2h and λh. Then, in order to find $\mathbf u_1^{n+1}$, $\mathbf u_2^{n+1}$ and $\boldsymbol\lambda^{n+1}$ for a fixed time $t^{n+1}$, we have to solve a system of linear equations of the form
$$\mathbf A\mathbf u + \mathbf B^T\boldsymbol\lambda = \mathbf F,\tag{11}$$
$$\mathbf B\mathbf u = 0,\tag{12}$$
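where the matrix A is diagonal, positive definite and defined by
$$(\mathbf A\mathbf u,\mathbf w) = \frac{\varepsilon_1}{\Delta t^2}S_1(u_{1h}w_{1h}) + \frac{1}{\Delta t^2}S_2(\varepsilon(x)u_{2h}w_{2h}) + \frac{\sqrt{\varepsilon_1\mu_1^{-1}}}{2\Delta t}S_{\Gamma_{ext}}(u_{1h}w_{1h}),$$
the rectangular matrix B is defined by
$$(\mathbf B\mathbf u,\boldsymbol\lambda) = \int_\gamma\lambda_h(u_{2h}-u_{1h})\,d\Gamma,$$
and the vector F depends on the nodal values of the known functions $u_{1h}^n$, $u_{2h}^n$, $u_{1h}^{n-1}$ and $u_{2h}^{n-1}$.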
Eliminating u from equation (11), we obtain
$$\mathbf B\mathbf A^{-1}\mathbf B^T\boldsymbol\lambda = \mathbf B\mathbf A^{-1}\mathbf F,\tag{13}$$
with the symmetric matrix C ≡ BA⁻¹Bᵀ. Let us prove that C is positive definite. Since A is symmetric positive definite, (Cλ, λ) = (A⁻¹Bᵀλ, Bᵀλ), so ker C = ker Bᵀ. Suppose that Bᵀλ = 0; then the function λh ∈ Λh corresponding to the vector λ satisfies
$$I \equiv \int_\gamma\lambda_hu_h\,d\gamma = 0$$
for all uh ∈ V1h. Choose uh equal to λh at the nodes of T1h located on γ. Direct calculations give
$$I = \frac12\sum_{i=1}^{N_\lambda}\left[\frac{h_i+h_{i+1}}{2}\lambda_i^2 + h_{i+1}\Big(\frac{\lambda_i+\lambda_{i+1}}{2}\Big)^2\right],$$
where Nλ is the number of edges of T1h on γ, hi is the length of the i-th edge, and $h_{N_\lambda+1}\equiv h_1$, $\lambda_{N_\lambda+1}\equiv\lambda_1$. Thus, the equality I = 0 implies that λ = 0, i.e. ker Bᵀ = {0}.
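The closed form for I can be checked numerically. The sketch assumes nodes on a closed polygon γ indexed 0 to N−1, with the edge between nodes i and i+1 of length h[i+1] (indices mod N, matching h_{Nλ+1} ≡ h_1); λ_h equals λ_i on the two half-edges adjacent to node i, and u_h interpolates the λ_i linearly. On each half-edge the integrand is linear, so the midpoint rule gives the exact integral. Function names are hypothetical.

```python
import numpy as np

def interface_integral_direct(lam, h):
    """Exact integral of lambda_h * u_h over gamma, half-edge by half-edge."""
    N = len(lam)
    total = 0.0
    for i in range(N):
        j = (i + 1) % N
        ell = h[j]                                          # edge between nodes i and j
        total += 0.5 * ell * lam[i] * (3.0 * lam[i] + lam[j]) / 4.0  # half-edge at node i
        total += 0.5 * ell * lam[j] * (lam[i] + 3.0 * lam[j]) / 4.0  # half-edge at node j
    return total

def interface_integral_closed_form(lam, h):
    """I = 1/2 sum_i [ (h_i + h_{i+1})/2 lam_i^2 + h_{i+1} ((lam_i + lam_{i+1})/2)^2 ]."""
    N = len(lam)
    return sum(0.5 * (0.5 * (h[i] + h[(i + 1) % N]) * lam[i] ** 2
                      + h[(i + 1) % N] * (0.5 * (lam[i] + lam[(i + 1) % N])) ** 2)
               for i in range(N))

rng = np.random.default_rng(1)
lam, h = rng.standard_normal(7), rng.uniform(0.1, 1.0, 7)
print(interface_integral_direct(lam, h), interface_integral_closed_form(lam, h))
```

Every term of the closed form is nonnegative and the λ_i² terms carry positive weights, which is the whole argument that I = 0 forces λ = 0.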
As a consequence we have
Theorem 2. The problem (9), (10) has a unique solution (uh , λh ).
Remark 3. A closely related domain decomposition method applied to the
solution of linear parabolic equations is discussed in [Glo03].
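Equation (13) is the system handled by the conjugate gradient algorithm of Section 5. A minimal NumPy sketch of CG applied to C = BA⁻¹Bᵀ for a diagonal A and a small random full-rank B follows. It uses the sign convention of (11)–(12), keeps u consistent with (11) at every iteration, and uses a relative stopping test that is an assumption. Each iteration costs one diagonal solve with A and one product each with B and Bᵀ. Names are hypothetical.

```python
import numpy as np

def saddle_point_cg(a_diag, B, F, lam0, tol=1e-12, maxit=500):
    """Solve A u + B^T lam = F, B u = 0 (A = diag(a_diag)) by CG on C = B A^{-1} B^T."""
    lam = lam0.copy()
    u = (F - B.T @ lam) / a_diag          # A u = F - B^T lam
    g = -(B @ u)                          # gradient of the dual functional, C lam - B A^{-1} F
    g0 = g @ g
    if g0 == 0.0:
        return u, lam
    w = g.copy()
    for _ in range(maxit):
        u_bar = (B.T @ w) / a_diag        # A u_bar = B^T w
        g_bar = B @ u_bar                 # g_bar = C w
        gg = g @ g
        rho = gg / (g_bar @ w)
        lam = lam - rho * w
        u = u + rho * u_bar               # keeps A u + B^T lam = F
        g = g - rho * g_bar
        gg_new = g @ g
        if gg_new <= tol**2 * g0:         # relative stopping test
            break
        w = g + (gg_new / gg) * w         # next conjugate direction
    return u, lam

rng = np.random.default_rng(0)
n, k = 8, 3
a_diag, B, F = rng.uniform(1.0, 2.0, n), rng.standard_normal((k, n)), rng.standard_normal(n)
u, lam = saddle_point_cg(a_diag, B, F, np.zeros(k))
print(np.linalg.norm(B @ u))              # constraint residual, close to zero
```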
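4 Energy Inequality
Theorem 3. Let h_min denote the minimal diameter of the triangles from T1h ∪ T2h. There exists a positive number c such that the condition
$$\Delta t \le c\min\{\sqrt{\varepsilon_1\mu_1},\ \sqrt{\varepsilon_2\mu_2}\}\,h_{\min}\tag{14}$$
ensures the positive definiteness of the quadratic form
$$\begin{aligned}E^{n+1} ={}& \frac{\varepsilon_1}{2}S_1\Big(\Big(\frac{u_{1h}^{n+1}-u_{1h}^n}{\Delta t}\Big)^2\Big) + \frac12S_2\Big(\varepsilon\Big(\frac{u_{2h}^{n+1}-u_{2h}^n}{\Delta t}\Big)^2\Big)\\ &+ \frac12S_1\Big(\mu_1^{-1}\Big|\nabla\frac{u_{1h}^{n+1}+u_{1h}^n}{2}\Big|^2\Big) + \frac12S_2\Big(\mu^{-1}\Big|\nabla\frac{u_{2h}^{n+1}+u_{2h}^n}{2}\Big|^2\Big)\\ &- \frac{\Delta t^2}{8}S_1\Big(\mu_1^{-1}\Big|\nabla\frac{u_{1h}^{n+1}-u_{1h}^n}{\Delta t}\Big|^2\Big) - \frac{\Delta t^2}{8}S_2\Big(\mu^{-1}\Big|\nabla\frac{u_{2h}^{n+1}-u_{2h}^n}{\Delta t}\Big|^2\Big),\end{aligned}\tag{15}$$
which we call the discrete energy.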
The system (9), (10) satisfies the energy identity
$$E^{n+1} - E^n + \frac{\sqrt{\varepsilon_1\mu_1^{-1}}}{4\Delta t}S_{\Gamma_{ext}}\big((u_{1h}^{n+1}-u_{1h}^{n-1})^2\big) = \frac12S_1\big(f_1^n(u_{1h}^{n+1}-u_{1h}^{n-1})\big) + \frac12S_2\big(f_2^n(u_{2h}^{n+1}-u_{2h}^{n-1})\big),\tag{16}$$
and the numerical scheme is stable: there exists a positive number M = M(T) such that
$$E^n \le M\,\Delta t\sum_{k=1}^{n-1}\big(S_1((f_1^k)^2) + S_2((f_2^k)^2)\big)\qquad\forall n.\tag{17}$$
Proof. Let n ≥ 1. From equation (10) written for $t^{n+1}$ and $t^{n-1}$ we obtain
$$\int_\gamma\zeta_h\big((u_{2h}^{n+1}-u_{2h}^{n-1}) - (u_{1h}^{n+1}-u_{1h}^{n-1})\big)\,d\gamma = 0\qquad\text{for all }\zeta_h\in\Lambda_h.\tag{18}$$
Choosing
$$w_{1h} = \frac{u_{1h}^{n+1}-u_{1h}^{n-1}}{2},\qquad w_{2h} = \frac{u_{2h}^{n+1}-u_{2h}^{n-1}}{2}$$
in (9) and $\zeta_h = -\lambda_h^{n+1}/2$ in (18), we add these equalities. Using the identities
$$(u_{ih}^{n+1}-2u_{ih}^n+u_{ih}^{n-1})(u_{ih}^{n+1}-u_{ih}^{n-1}) = (u_{ih}^{n+1}-u_{ih}^n)^2 - (u_{ih}^n-u_{ih}^{n-1})^2$$
and
$$u_{ih}^nu_{ih}^{n+1} = \frac14\Big((u_{ih}^{n+1}+u_{ih}^n)^2 - (u_{ih}^{n+1}-u_{ih}^n)^2\Big),$$
after several technical transformations we obtain
$$E^{n+1} - E^n + \frac{\sqrt{\varepsilon_1\mu_1^{-1}}}{4\Delta t}S_{\Gamma_{ext}}\big((u_{1h}^{n+1}-u_{1h}^{n-1})^2\big) = \frac12S_1\big(f_1^n(u_{1h}^{n+1}-u_{1h}^{n-1})\big) + \frac12S_2\big(f_2^n(u_{2h}^{n+1}-u_{2h}^{n-1})\big).$$
Therefore,
$$E^{n+1} \le E^n + \frac{\Delta t}{2}\sum_{i=1}^{2}S_i^{1/2}\big((f_i^n)^2\big)\left[S_i^{1/2}\Big(\Big(\frac{u_{ih}^{n+1}-u_{ih}^n}{\Delta t}\Big)^2\Big) + S_i^{1/2}\Big(\Big(\frac{u_{ih}^n-u_{ih}^{n-1}}{\Delta t}\Big)^2\Big)\right].\tag{19}$$
Now, we will show that under the condition (14) the quadratic form E^n is positive definite; more precisely, that there exists a positive constant δ such that
$$E^{n+1} \ge \delta\left(S_1\Big(\Big(\frac{u_{1h}^{n+1}-u_{1h}^{n}}{\Delta t}\Big)^2\Big) + S_2\Big(\Big(\frac{u_{2h}^{n+1}-u_{2h}^{n}}{\Delta t}\Big)^2\Big)\right). \tag{20}$$
Obviously, it is sufficient to prove the inequality
$$4\varepsilon_e\mu_e\,S_e(v_h^2) \ge \Delta t^2\,S_e(|\nabla v_h|^2) \quad \forall e\in T_{1h}\cup T_{2h},\ \forall v_h\in P_1(e), \tag{21}$$
where εe and µe are defined by εe = ε1 or εe = ε2 (respectively, µe = µ1 or µe = µ2). It is known that for a regular triangulation
$$S_e(|\nabla v_h|^2) \le \frac{1}{c_1^2}\,h_e^{-2}\,S_e(v_h^2) \tag{22}$$
with a positive constant c1, universal for all triangles e, where he is the minimal length of the sides of e. Combining (21) and (22), we observe that the time step ∆t should satisfy the inequality
$$\Delta t \le c\,\sqrt{\varepsilon_e\mu_e}\,h_e, \quad (c = 2c_1), \tag{23}$$
for all e ∈ T1h ∪ T2h. Evidently, (14) ensures the validity of (23).
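Further, by (20) written at times n and n + 1, each difference quotient on the right-hand side of (19) is bounded by $(E^{n+1}/\delta)^{1/2}$ or $(E^{n}/\delta)^{1/2}$; after division by $\sqrt{E^{n+1}} + \sqrt{E^{n}}$ this gives
$$\sqrt{E^{n+1}} - \sqrt{E^{n}} \le \frac{\Delta t}{2\sqrt{\delta}}\Big(S_1^{1/2}\big((f_1^n)^2\big) + S_2^{1/2}\big((f_2^n)^2\big)\Big).$$
Summing these inequalities, using E^1 = 0, the Cauchy–Schwarz inequality and (n − 1)∆t ≤ T, one obtains the stability inequality (17):
$$E^n \le M\,\Delta t\sum_{k=1}^{n-1}\Big(S_1\big((f_1^k)^2\big) + S_2\big((f_2^k)^2\big)\Big), \quad \forall n.$$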
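The conservation mechanism behind (15)–(17) can be observed on a much simpler model. The following Python sketch is a hypothetical one-dimensional, periodic finite-difference analogue (a single domain, no multiplier, no absorbing boundary and f = 0, so that the right-hand side of (16) vanishes); all function names are ours. For the periodic forward difference D used below one has |Dv|² ≤ 4h⁻²|v|², i.e. the 1D analogue of (22) holds with c1 = 1/2, so that (23) becomes ∆t ≤ h√(εµ).

```python
import math


def forward_diff(w, h):
    """Periodic forward difference (D w)_j = (w_{j+1} - w_j) / h."""
    n = len(w)
    return [(w[(j + 1) % n] - w[j]) / h for j in range(n)]


def leapfrog_step(u, u_prev, dt, h, eps, mu):
    """eps (u^{n+1} - 2 u^n + u^{n-1}) / dt^2 = mu^{-1} * (discrete Laplacian of u^n)."""
    n = len(u)
    lap = [(u[(j + 1) % n] - 2.0 * u[j] + u[j - 1]) / h**2 for j in range(n)]  # u[-1] wraps
    return [2.0 * u[j] - u_prev[j] + dt**2 / (eps * mu) * lap[j] for j in range(n)]


def discrete_energy(u_new, u_old, dt, h, eps, mu):
    """1D analogue of (15): kinetic + averaged-gradient term - (dt^2 / 8) correction."""
    v = [(a - b) / dt for a, b in zip(u_new, u_old)]
    avg = [(a + b) / 2.0 for a, b in zip(u_new, u_old)]
    kinetic = 0.5 * eps * h * sum(x * x for x in v)
    potential = 0.5 / mu * h * sum(x * x for x in forward_diff(avg, h))
    correction = dt**2 / (8.0 * mu) * h * sum(x * x for x in forward_diff(v, h))
    return kinetic + potential - correction


def run(n=64, cfl=0.9, steps=200, eps=1.0, mu=1.0):
    """Return the sequence of discrete energies for a smooth initial profile."""
    h = 1.0 / n
    dt = cfl * math.sqrt(eps * mu) * h          # satisfies the 1D (23) when cfl <= 1
    u_prev = [math.sin(2.0 * math.pi * j * h) for j in range(n)]
    u = list(u_prev)                             # zero initial velocity
    energies = [discrete_energy(u, u_prev, dt, h, eps, mu)]
    for _ in range(steps):
        u, u_prev = leapfrog_step(u, u_prev, dt, h, eps, mu), u
        energies.append(discrete_energy(u, u_prev, dt, h, eps, mu))
    return energies


energies = run()
print(min(energies), max(energies))
```

In this model the identity E^{n+1} = E^n holds algebraically for any ∆t; what a condition of the type (14) adds is the positivity of E, which is what turns the identity into a bound on the solution.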
5 Numerical Experiments
In order to solve the system of linear equations (11)–(12) at each time step
we use a Conjugate Gradient Algorithm in the form given by Glowinski and
LeTallec [GL89]:
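Step 1. $\lambda^0$ given.
Step 2. Solve $Au^0 = F - B\lambda^0$.
Step 3. $g^0 = -B^Tu^0$.
Step 4. If $\|g^0\| \le \varepsilon_0$, take $\lambda = \lambda^0$; else set $w^0 = g^0$.
For $m \ge 0$, assuming that $\lambda^m$, $u^m$, $g^m$, $w^m$ are known:
Step 5. Solve $A\bar u^m = Bw^m$.
Step 6. $\bar g^m = B^T\bar u^m$.
Step 7. $\rho_m = |g^m|^2/(\bar g^m, w^m)$, $\lambda^{m+1} = \lambda^m - \rho_m w^m$, $u^{m+1} = u^m + \rho_m\bar u^m$, $g^{m+1} = g^m - \rho_m\bar g^m$.
Step 8. If $(g^{m+1}\cdot g^{m+1})/(g^0\cdot g^0) \le \varepsilon$, take $\lambda = \lambda^{m+1}$; else compute $\gamma_m = (g^{m+1}\cdot g^{m+1})/(g^m\cdot g^m)$ and $w^{m+1} = g^{m+1} + \gamma_m w^m$. Do m = m + 1 and go to Step 5.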
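Steps 1–8 can be transcribed almost line by line. The Python sketch below applies them to a small dense saddle-point system in which, as Steps 2 and 3 suggest, u solves Au = F − Bλ and convergence means B^T u = 0. The helper names, the tiny Gaussian elimination used for the solves with A, and the test matrices are our assumptions, not the authors' implementation; in a finite element code the solves with A in Steps 2 and 5 would typically reuse a sparse factorization.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A only)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_t_vec(M, v):
    """M^T v for M stored as a list of rows."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def uzawa_cg(A, B, F, lam0, eps0=1e-14, eps=1e-20, max_iter=100):
    """Conjugate gradient on the multiplier (Steps 1-8); returns (u, lam, iterations)."""
    lam = list(lam0)                                              # Step 1
    u = solve(A, [f - bl for f, bl in zip(F, mat_vec(B, lam))])   # Step 2
    g = [-x for x in mat_t_vec(B, u)]                             # Step 3
    g0g0 = dot(g, g)
    if g0g0 ** 0.5 <= eps0:                                       # Step 4
        return u, lam, 0
    w = list(g)
    for m in range(max_iter):
        ubar = solve(A, mat_vec(B, w))                            # Step 5
        gbar = mat_t_vec(B, ubar)                                 # Step 6
        rho = dot(g, g) / dot(gbar, w)                            # Step 7
        lam = [lm - rho * wi for lm, wi in zip(lam, w)]
        u = [ui + rho * ub for ui, ub in zip(u, ubar)]
        g_new = [gi - rho * gb for gi, gb in zip(g, gbar)]
        if dot(g_new, g_new) / g0g0 <= eps:                       # Step 8: stopping test
            return u, lam, m + 1
        gamma = dot(g_new, g_new) / dot(g, g)
        w = [gn + gamma * wi for gn, wi in zip(g_new, w)]
        g = g_new
    return u, lam, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical SPD block
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                   # full column rank coupling
F = [1.0, 2.0, 3.0]
u, lam, its = uzawa_cg(A, B, F, [0.0, 0.0])
```

Since the dual operator B^T A^{-1} B is symmetric positive definite when B has full column rank, the iteration terminates, up to round-off, in at most as many steps as there are multipliers (two in this example).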
We consider the problem (9)–(10) with a source term given by the harmonic planar wave
$$u_{inc} = -e^{ik(t - \alpha\cdot x)}, \tag{24}$$
where x = {x_j}_{j=1}^2, α = {α_j}_{j=1}^2 with |α| = 1, and k is the angular frequency.
For our numerical simulation we consider two cases: the first with the
frequency of the incident wave f = 0.6 GHz and the second with f = 1.2 GHz,
which gives us wavelengths L = 0.5 meters and L = 0.25 meters, respectively.
We performed a series of numerical experiments: scattering by a perfectly
reflecting obstacle, wave propagation through a domain with an obstacle completely consisting of a coating material and scattering by an obstacle with
coating.
First, we consider the scattering by a perfectly reflecting obstacle. For the
experiment we have chosen Ω2 to be in the form of a perfectly reflecting airfoil,
and Ω is a 2 meter × 2 meter rectangle. We used a finite element mesh with
8019 nodes and 15324 elements in the case of f = 0.6 GHz (Fig. 4) and 19246
nodes and 37376 elements for f = 1.2 GHz.
Figure 5 shows the contour plot for the case when the incident wave is coming from the left, and Figure 6 shows the case when the incident wave is coming from the lower left corner at an angle of 45◦. For all the experiments we chose the time step to be ∆t = T/50, where T = 1/f ≈ 1.67 × 10^−9 s is the period corresponding to L = 0.5 meters and T = 1/f ≈ 0.83 × 10^−9 s is the period for L = 0.25 meters.
Fig. 4. Example of a finite element mesh.
Fig. 5. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25 (right) meters. Incident wave coming from the left.
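The wavelengths, periods and time steps quoted above follow from L = c0/f, T = 1/f and ∆t = T/50. The short Python check below assumes c0 = 3 × 10^8 m/s (a rounded value of the speed of light, our assumption); the function name is ours.

```python
C0 = 3.0e8  # assumed propagation speed in air [m/s] (rounded speed of light)


def wave_parameters(f_hz, steps_per_period=50):
    """Wavelength L = C0 / f, period T = 1 / f and time step dt = T / steps_per_period."""
    wavelength = C0 / f_hz
    period = 1.0 / f_hz
    return wavelength, period, period / steps_per_period


for f in (0.6e9, 1.2e9):
    L, T, dt = wave_parameters(f)
    print(f"f = {f / 1e9:.1f} GHz: L = {L:.2f} m, T = {T:.3e} s, dt = {dt:.3e} s")
```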
The next set of numerical experiments contains the simulations of wave propagation through a domain with an obstacle consisting entirely of a coating material. We have taken the coating material coefficients to be ε2 = 1 and µ2 = 9, implying that the speed of propagation in the coating material is three times lower than in air. As before, Ω is a 2 meter × 2 meter rectangle and Ω2 has the shape of an airfoil.
Fig. 6. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25 (right) meters. Incident wave coming from the lower left corner with an angle of 45 degrees.
For the solution of this problem with an incident frequency f = 0.6 GHz we used a mesh with a total of 8435 nodes and 16228 elements. The time step was taken to be ∆t = T/50, where T = 1/f ≈ 1.67 × 10^−9 s is the period. We used a mesh consisting of 20258 nodes (39514 elements) for solving the problem for an incident wave with frequency f = 1.2 GHz. The time step was equal to T/50, with T = 1/f ≈ 0.83 × 10^−9 s.
In Figures 7 and 8 we present the contour plot of the real part of the solution for the incident wavelengths L = 0.5 and L = 0.25 meters. We also performed numerical computations for the case when the obstacle is an airfoil with a coating (Figure 9). The coating region is moon-shaped and, as before, ε2 = 1 and µ2 = 9. Figure 10 shows the contour plot of the real part of the solution for the incident wavelengths L = 0.5 meters and L = 0.25 meters when the incident wave is coming from the left. Figure 11 presents the contour plot of the real part of the solution for the incident wavelengths L = 0.5 meters and L = 0.25 meters when the incident wave is coming from the lower left corner at an angle of 45◦.
An important observation for all of the numerical experiments mentioned is that, despite the mesh discontinuity across γ and the weak enforcement of the matching conditions, we do not observe any discontinuity of the computed fields.
Fig. 7. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25
(right). Incident wave coming from the left.
Fig. 8. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25
(right). Incident wave coming from the lower left corner with an angle of 45 degrees.
Fig. 9. Obstacle in a form of an airfoil with a coating.
Fig. 10. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25
(right). Incident wave coming from the left.
Fig. 11. Contour plot of the real part of the solution for L = 0.5 (left) and L = 0.25 (right). Incident wave coming from the lower left corner at an angle of 45 degrees.
References
[BDG+97] M. O. Bristeau, E. J. Dean, R. Glowinski, V. Kwok, and J. Périaux. Exact controllability and domain decomposition methods with nonmatching grids for the computation of scattering waves. In R. Glowinski, J. Périaux, and Z. Shi, editors, Domain Decomposition Methods in Sciences and Engineering, pages 291–307. John Wiley & Sons, 1997.
[DL92] R. Dautray and J.-L. Lions. Mathematical Analysis and Numerical Methods for Science and Technology, volume 5. Springer-Verlag, 1992.
[GL89] R. Glowinski and P. LeTallec. Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics. SIAM, Philadelphia, PA, 1989.
[Glo03] R. Glowinski. Finite element methods for incompressible viscous flow. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, volume IX, pages 3–1176. North-Holland, Amsterdam, 2003.
Domain Decomposition and Electronic Structure Computations: A Promising Approach
Guy Bencteux^{1,4}, Maxime Barrault^1, Eric Cancès^{2,4}, William W. Hager^3, and Claude Le Bris^{2,4}
^1 EDF R&D, 1 avenue du Général de Gaulle, 92141 Clamart Cedex, France, {guy.bencteux,maxime.barrault}@edf.fr
^2 CERMICS, École Nationale des Ponts et Chaussées, 6 & 8, avenue Blaise Pascal, Cité Descartes, 77455 Marne-La-Vallée Cedex 2, France, {cances,lebris}@cermics.enpc.fr
^3 Department of Mathematics, University of Florida, Gainesville, FL 32611-8105, USA, hager@math.ufl.edu
^4 INRIA Rocquencourt, MICMAC project, Domaine de Voluceau, B.P. 105, 78153 Le Chesnay Cedex, France
Summary. We describe a domain decomposition approach applied to the specific context of electronic structure calculations. The approach has been introduced
in [BCHL07]. We survey here the computational context, and explain the peculiarities of the approach as compared to problems of seemingly the same type in other
engineering sciences. Improvements of the original approach presented in [BCHL07],
including algorithmic refinements and effective parallel implementation, are included
here. Test cases supporting the interest of the method are also reported.
It is our pleasure and an honor to dedicate this contribution to Olivier Pironneau,
on the occasion of his sixtieth birthday. With admiration, respect and friendship.
1 Introduction and Motivation
1.1 General Context
Numerical simulation is nowadays a ubiquitous tool in materials science, chemistry and biology. Design of new materials, irradiation-induced damage, drug design and protein folding are instances of applications of numerical simulation. For convenience we now briefly present the context of the specific computational problem under consideration in the present article. A more detailed, mathematically oriented presentation is the purpose of the monograph [CDK+03] or of the review article [LeB05].
For many problems of major interest, empirical models where atoms are
represented as point particles interacting with a parameterized force-field are
G. Bencteux et al.
adequate models. On the other hand, when electronic structure plays a role in
the phenomenon under consideration, an explicit quantum modelling of the
electronic wavefunctions is required. For this purpose, two levels of approximation are possible.
The first category is the category of ab initio models, which are general-purpose models that aim at solving sophisticated approximations of the Schrödinger equation. Such models require only the knowledge of universal constants and a number of adjustable parameters that is ideally zero and in practice small. The most commonly used models in this category are Density Functional Theory (DFT) based models and Hartree–Fock (HF) type models. Although these two families of models have different theoretical groundings, they share the same mathematical nature. They are constrained minimization problems of the form
$$\inf\left\{E(\psi_1,\ldots,\psi_N),\ \psi_i\in H^1(\mathbb{R}^3),\ \int_{\mathbb{R}^3}\psi_i\psi_j = \delta_{ij},\ \forall\,1\le i,j\le N\right\}. \tag{1}$$
The functions ψi are called the molecular orbitals of the system. The energy
functional E, which of course depends on the model employed, is parametrized
by the charges and positions of the nuclei of the system under consideration.
With such models, systems with up to 10^4 electrons can be simulated.
Minimization problems of the type (1) are not approached by minimization algorithms, mainly because they are high-dimensional in nature. Instead, the numerical scheme consists in solving their Euler–Lagrange equations, which are nonlinear eigenvalue problems. The current practice is to iterate on the nonlinearity using fixed-point type algorithms, called in this framework Self-Consistent Field (SCF) iterations, with reference to the mean-field nature of DFT and HF type models.
The second category of models is that of semi-empirical models, such as Extended Hückel Theory based and tight-binding models, which introduce additional approximations into the above DFT or HF type models. They consist in solving linear eigenvalue problems. State-of-the-art simulations using such models address systems with up to 10^5–10^6 electrons.
Finite-difference schemes may be used to discretize the above problems. They have proved successful in some very specific niches, most of them related to solid-state science. However, in the overwhelming majority of contexts, the discretization of the nonlinear or linear eigenvalue problems introduced above is performed using a Galerkin formulation. The molecular orbitals ψi are expanded in a Galerkin basis {χi}1≤i≤Nb of size Nb > N, N being the number of electrons in the system. Basis functions may be plane waves. This is often the case for solid-state science applications, and then Nb is very large compared to N, typically one hundred times as large or more. They may also be localized functions, namely compactly supported or exponentially decreasing functions. Such basis sets correspond to the so-called Linear Combination of Atomic Orbitals (LCAO) approach. Then the dimension of the basis set needed to reach the extremely demanding accuracy required for
Domain Decomposition Approach for Computational Chemistry
electronic calculation problems is surprisingly small. Such basis sets, typically
in the spirit of spectral methods, or modal synthesis, are, indeed, remarkably
efficient. The domain decomposition method described in the present article
is restricted to the LCAO approach. Indeed, it strongly exploits the locality
of the basis functions.
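In both categories of models, linear or nonlinear, the elementary building block is the solution to a (generalized) linear eigenvalue problem of the following form:
$$\begin{cases} Hc_i = \varepsilon_i Sc_i, \\ c_i^tSc_j = \delta_{ij}, \\ \varepsilon_1 \le \cdots \le \varepsilon_N \le \varepsilon_{N+1} \le \cdots \le \varepsilon_{N_b}, \\ D^\star = \displaystyle\sum_{i=1}^N c_ic_i^t. \end{cases} \tag{2}$$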
The matrix H is an Nb × Nb symmetric matrix, called the Fock matrix. When
the linear system above is one iteration of a nonlinear cycle, this matrix is
computed from the result of the previous iteration. The matrix S is a Nb × Nb
symmetric positive definite matrix, called the overlap matrix, which depends
only on the basis set used (it corresponds to the mass matrix in the language
of finite element methods).
One searches for the solution of (2), that is, the matrix D⋆ called the density matrix. This formally requires the knowledge of the first N (generalized) eigenelements of the matrix H (in fact, we shall see below that this statement is not exactly true).
The system of the equations (2) is generally viewed as a generalized eigenvalue problem, and most of the computational approaches consist in solving
the system via the computation of each individual vector ci (discretizing the
wavefunction ψi of (1)), using a direct diagonalization procedure.
1.2 Specificities of the Approaches for Large Systems
The procedure mentioned above may be conveniently implemented for systems of limited size. For large systems, however, the solution procedure for
the linear problem suffers from two computational bottlenecks. The first one
is the need for assembling the Fock matrix. It a priori involves O(Nb^3) operations in DFT models and O(Nb^4) in HF models. Adequate approaches, which lower the complexity of this step, have been proposed. Fast multipole methods (see [SC00]) are one instance of such approaches. The second practical bottleneck is the diagonalization step itself. This is the focus of the present contribution. Because of the possibly prohibitive O(Nb^3) cost of direct diagonalization procedures, the so-called alternatives to diagonalization have been
introduced. The method introduced in the present contribution aims at competing with such methods, and eventually outperforming them. With a view
to understanding the problem under consideration, let us briefly review some
peculiarities of electronic structure calculation problems.
The situation critically depends on the type of basis set employed. With
plane wave basis sets, the number N of eigenelements to determine can be considered as small, compared to the size Nb of the matrix H (Nb ∼ 100N ). Then,
iterative diagonalization methods, based on the inverse power paradigm, are
a natural choice. In contrast, in the case of localized basis sets we deal with in
this article, Nb varies from 2 to 10 times N. In any case it remains strictly proportional to N. Hence, the problem (2) can be rephrased as follows: identify, say, one half of the eigenelements of a given matrix. This makes the problem very specific as compared to other linear eigenvalue problems encountered in other fields of the engineering sciences (see [AHLT05, HL07], for instance).
The sparsity of the matrices in the present context is another peculiarity of the problem. Although the matrices H and S are sparse for large molecular systems, they are not as sparse as the stiffness and mass matrices usually encountered when using finite difference or finite element methods. For example, the bandwidth of H and S is of the order of 10^2 in the numerical examples reported in Section 5.
1.3 Alternative Methods Towards Linear Scaling
In addition to the above mentioned peculiarities, a crucial specificity of the
problem (2) is that the eigenelements do not need to be explicitly identified.
As expressed by the last line of (2), only the knowledge of the density matrix
D⋆ is required, both for the evaluation of the Fock operator associated to
the next iteration, in a nonlinear context, and for the evaluation of relevant
output quantities, in the linear context or at the last step of the iteration loop.
From a geometrical viewpoint, D⋆ is the S-orthogonal projector (in the
sense that D⋆SD⋆ = D⋆ and (D⋆)^t = D⋆) on the vector subspace generated by
the eigenvectors associated with the lowest N eigenvalues of the generalized
eigenvalue problem Hc = εSc.
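The projector property and its link with the energy Tr(HD⋆) are easy to check on a model problem whose eigenpairs are known in closed form. The Python sketch below uses the hypothetical matrix H = tridiag(−1, 2, −1) of size Nb with S = I and N = Nb/2 (one half of the eigenelements, as discussed in Section 1.2). It builds D⋆ from the last line of (2) and checks that D⋆D⋆ = D⋆, Tr D⋆ = N and Tr(HD⋆) = ε1 + ⋯ + εN. The model matrix is ours and is not a Fock matrix.

```python
import math


def model_fock(nb):
    """Hypothetical test matrix H = tridiag(-1, 2, -1) of size nb (with S = I)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(nb)]
            for i in range(nb)]


def model_eigenpairs(nb):
    """Exact eigenpairs (eps_k, c_k) of tridiag(-1, 2, -1), with eps_1 <= ... <= eps_nb."""
    pairs = []
    for k in range(1, nb + 1):
        eps_k = 2.0 - 2.0 * math.cos(k * math.pi / (nb + 1))
        c_k = [math.sqrt(2.0 / (nb + 1)) * math.sin(j * k * math.pi / (nb + 1))
               for j in range(1, nb + 1)]
        pairs.append((eps_k, c_k))
    return pairs


def density_matrix(pairs, n_occ):
    """D* = sum_{i=1}^{N} c_i c_i^t (last line of (2) with S = I)."""
    nb = len(pairs[0][1])
    D = [[0.0] * nb for _ in range(nb)]
    for _, c in pairs[:n_occ]:
        for i in range(nb):
            for j in range(nb):
                D[i][j] += c[i] * c[j]
    return D


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


nb, n_occ = 12, 6                        # Nb = 2N: one half of the eigenelements
H = model_fock(nb)
pairs = model_eigenpairs(nb)
D = density_matrix(pairs, n_occ)
DD = matmul(D, D)                        # should equal D (projector)
HD = matmul(H, D)
trace_HD = sum(HD[i][i] for i in range(nb))
band_energy = sum(eps for eps, _ in pairs[:n_occ])   # eps_1 + ... + eps_N
print(trace_HD, band_energy)
```

Only D⋆ enters these checks, not the individual vectors ci: any orthonormal basis of the same N-dimensional subspace would produce the same matrix, which is the observation exploited by the methods discussed next.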
The above elementary remark is the bottom line for the development of
the alternative to diagonalization methods, also often called linear scaling
methods because their claimed purpose is to reach a linear complexity of the
solution procedure (either in terms of N the number of electrons, or Nb the
dimension of the basis set). For practical reasons, which will not be further
developed here, such methods assume that:
(H1) The matrices H and S are sparse, in the sense that, for large systems,
the number of non-zero coefficients scales as N . This assumption is not
restrictive. In particular, it is automatically satisfied for DFT and HF
models as soon as the basis functions are localized;
(H2) The matrix D⋆ built from the solution to (2) is also sparse. This condition seems to be fulfilled as soon as the relative gap
$$\gamma = \frac{\varepsilon_{N+1} - \varepsilon_N}{\varepsilon_{N_b} - \varepsilon_1} \tag{3}$$
deduced from the solution of (2) is large enough. This observation can be supported by qualitative physical arguments [Koh96], but has seemingly no mathematical grounding to date (see, however, [Koh59]).
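The relative gap (3) is a simple function of the spectrum. For the hypothetical model H = tridiag(b, a, b), whose eigenvalues a + 2b cos(kπ/(Nb + 1)) are known explicitly, the following Python sketch evaluates γ with N = Nb/2. This model has no gap in the limit of large Nb, so γ decreases as Nb grows, which is the regime in which (H2) is not expected to hold.

```python
import math


def relative_gap(eigs, n_occ):
    """Relative gap (3): (eps_{N+1} - eps_N) / (eps_{Nb} - eps_1); eigs sorted increasingly."""
    return (eigs[n_occ] - eigs[n_occ - 1]) / (eigs[-1] - eigs[0])


def tridiag_eigs(nb, a=2.0, b=-1.0):
    """Sorted eigenvalues a + 2 b cos(k pi / (nb + 1)) of the model matrix tridiag(b, a, b)."""
    return sorted(a + 2.0 * b * math.cos(k * math.pi / (nb + 1)) for k in range(1, nb + 1))


for nb in (10, 100, 1000):
    print(nb, relative_gap(tridiag_eigs(nb), nb // 2))
```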
State-of-the-art surveys on such methods are [BMG02, Goe99]. One of the most commonly used linear scaling methods is the Density Matrix Minimization (DMM) method [LNV93].
2 A New Domain Decomposition Approach
Our purpose is now to present a method, based on the domain decomposition paradigm, which we have recently introduced in [BCHL07], and for which we also consider a setting where the above two assumptions are valid. Although the approach is still under development, we hope that it will outperform existing ones in the near future. Preliminary test cases support this hope.
The approach described below is not the first occurrence of a method based
on a “geographical” decomposition of the matrix H in the context of quantum
chemistry (see, e.g., [YL95]). A significant methodological improvement is, however, achieved with the present method. To the best of our knowledge,
existing methods in the context of electronic calculations that may be recast
as domain decomposition methods only consist of local solvers complemented
by a crude global step. Our method seems to be the first one really exhibiting
the local/global paradigm in the spirit of methods used in other fields of the
engineering sciences.
In the following, we present and apply the method on one-dimensional systems, typically nanotubes or linear hydrocarbons. Generalizations to three-dimensional systems do not really bring up new methodological issues. They are, however, much more difficult in terms of implementation.
For simplicity, we now present our method assuming that S = I_Nb, i.e. that the Galerkin basis {χi}1≤i≤Nb is orthonormal. The extension of the method to the case when S ≠ I_Nb is straightforward. The space Mk,l denotes the vector space of the k × l real matrices.
Let us first notice that a solution D⋆ of (2) reads
$$D^\star = C^\star (C^\star)^t, \tag{4}$$
where C⋆ is a solution to the minimization problem
$$\inf\left\{\operatorname{Tr}\big(HCC^t\big),\ C\in\mathcal{M}_{N_b,N}(\mathbb{R}),\ C^tC = I_N\right\}. \tag{5}$$
Our approach consists in solving an approximation of the problem (5). The
latter is obtained by minimizing the exact energy Tr(HCC t ) on the set of the
matrices C that have the block structure displayed on Figure 1 and satisfy
the constraint C t C = IN .
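[Figure 1: the Nb × N matrix C, with Nb = (p+1) n/2 and N = m1 + ... + mp, has blocks C1, ..., Cp of size n × mi placed along the diagonal, consecutive blocks sharing n/2 rows, and zeros elsewhere.]
Fig. 1. Block structure of the matrices C.
[Figure 2: the Nb × Nb matrices H (diagonal blocks H1, ..., Hp of size n × n, overlapping like the blocks of C) and D, with zeros outside the band of diagonal blocks.]
Fig. 2. Block structure of the matrices H and D.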
A detailed justification of the choice of this structure is given in [BCHL07].
Let us only mention here that the decomposition is suggested by the localization of electrons and the use of a localized basis set. Note that each block overlaps only with its first neighbors. Again for simplicity, we present the method in the case where the overlap is exactly n/2, but it could be any integer smaller than n/2.
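The resulting minimization problem can be recast as
$$\inf\left\{\sum_{i=1}^p \operatorname{Tr}\big(H_iC_iC_i^t\big),\ C_i\in\mathcal{M}_{n,m_i}(\mathbb{R}),\ m_i\in\mathbb{N},\ C_i^tC_i = I_{m_i}\ \forall\,1\le i\le p,\ C_i^tTC_{i+1} = 0\ \forall\,1\le i\le p-1,\ \sum_{i=1}^p m_i = N\right\}. \tag{6}$$
In the above formula, T ∈ Mn,n(R) is the matrix defined by
$$T_{kl} = \begin{cases} 1 & \text{if } k - l = \dfrac{n}{2},\\ 0 & \text{otherwise},\end{cases} \tag{7}$$
Hi ∈ Mn,n(R) is a symmetric submatrix of H (see Figure 2), and
$$\operatorname{Tr}\big(HCC^t\big) = \sum_{i=1}^p \operatorname{Tr}\big(H_iC_iC_i^t\big), \qquad C^tC = \begin{pmatrix} C_1^tC_1 & C_1^tTC_2 & & 0\\ C_2^tT^tC_1 & C_2^tC_2 & \ddots & \\ & \ddots & \ddots & C_{p-1}^tTC_p\\ 0 & & C_p^tT^tC_{p-1} & C_p^tC_p \end{pmatrix}.$$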
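The block-tridiagonal structure of C^tC and the role of the matrix T in (7) can be checked numerically. The Python sketch below assembles C from random blocks following Figure 1 (with 0-based indices, block i starts at row i·n/2), compares the blocks of C^tC with C_i^tC_i and C_i^tTC_{i+1}, and evaluates both sides of the trace identity above. The sizes n = 4, (m_i) = (2, 1, 2) and the random data are arbitrary choices of ours.

```python
import random


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def shift_T(n):
    """T of (7) with 0-based indices: T[k][l] = 1 if k - l == n/2, else 0."""
    return [[1.0 if k - l == n // 2 else 0.0 for l in range(n)] for k in range(n)]


def assemble_C(blocks, n):
    """Embed the n x m_i blocks C_i into the Nb x N matrix C of Fig. 1 (overlap n/2)."""
    nb = (len(blocks) + 1) * n // 2
    N = sum(len(B[0]) for B in blocks)
    C = [[0.0] * N for _ in range(nb)]
    col = 0
    for i, B in enumerate(blocks):
        row0 = i * n // 2                      # block i starts n/2 rows below block i-1
        for r in range(n):
            for c in range(len(B[0])):
                C[row0 + r][col + c] = B[r][c]
        col += len(B[0])
    return C


def block_of(G, ms, i, j):
    """Block (i, j) of C^t C, i.e. the rows of block C_i and the columns of block C_j."""
    offs = [sum(ms[:k]) for k in range(len(ms))]
    return [row[offs[j]:offs[j] + ms[j]] for row in G[offs[i]:offs[i] + ms[i]]]


def local_energy(H, blocks, n):
    """Sum of Tr(H_i C_i C_i^t), H_i being the n x n diagonal submatrix of H seen by block i."""
    total = 0.0
    for i, B in enumerate(blocks):
        r0 = i * n // 2
        Hi = [row[r0:r0 + n] for row in H[r0:r0 + n]]
        HiCCt = matmul(matmul(Hi, B), transpose(B))
        total += sum(HiCCt[k][k] for k in range(n))
    return total


rng = random.Random(0)
n, ms = 4, [2, 1, 2]                           # hypothetical sizes: p = 3 blocks, Nb = 8
blocks = [[[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)] for m in ms]
C = assemble_C(blocks, n)
G = matmul(transpose(C), C)                    # C^t C, block tridiagonal
nb = len(C)
H = [[0.0] * nb for _ in range(nb)]
for i in range(nb):
    for j in range(i, nb):
        H[i][j] = H[j][i] = rng.uniform(-1, 1)  # random symmetric H
HCCt = matmul(matmul(H, C), transpose(C))
```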
In this way, we replace the N(N+1)/2 global scalar constraints C^tC = I_N, involving vectors of size Nb, by the $\sum_{i=1}^p m_i(m_i+1)/2$ local scalar constraints $C_i^tC_i = I_{m_i}$ and the $\sum_{i=1}^{p-1} m_im_{i+1}$ local scalar constraints $C_i^tTC_{i+1} = 0$, involving vectors of size n. We would like to emphasize that in this way we can only obtain a basis of the vector space generated by the lowest N eigenvectors of H. This is the very nature of the method, which consequently cannot be used to compute the eigenvectors themselves.
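The reduction in the number of constraints is easy to quantify. The Python sketch below counts both sets for hypothetical sizes (p = 10 blocks of size n = 20 with m_i = 5, hence N = 50 and Nb = 110): 1275 global constraints on vectors of size 110 against 375 local ones on vectors of size 20. For fixed block sizes the local count grows linearly with p, whereas the global one grows quadratically with N.

```python
def constraint_counts(ms):
    """Return (global, local) numbers of scalar orthogonality constraints for block sizes ms."""
    N = sum(ms)
    n_global = N * (N + 1) // 2                          # C^t C = I_N
    n_diag = sum(m * (m + 1) // 2 for m in ms)           # C_i^t C_i = I_{m_i}
    n_cross = sum(a * b for a, b in zip(ms, ms[1:]))     # C_i^t T C_{i+1} = 0
    return n_global, n_diag + n_cross


p, n = 10, 20                  # hypothetical: 10 blocks of size n = 20
ms = [5] * p                   # m_i = 5, hence N = 50
nb = (p + 1) * n // 2          # Nb = (p + 1) n / 2 = 110
print(nb, constraint_counts(ms))
```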
Before we describe in detail the procedure employed to solve the Euler–Lagrange equations of (6) in greater generality, let us consider, for pedagogical purposes, the following oversimplified problem:
$$\inf\left\{\langle H_1Z_1, Z_1\rangle + \langle H_2Z_2, Z_2\rangle,\ Z_i\in\mathbb{R}^{N_b},\ \langle Z_i, Z_i\rangle = 1,\ \langle Z_1, Z_2\rangle = 0\right\}. \tag{8}$$
We have denoted by ⟨·,·⟩ the standard Euclidean scalar product on R^Nb.
The problem (8) is not, strictly speaking, a particular instance of (6), but it shows the same characteristics and technical difficulties: a separable functional is minimized, there are constraints on the variables of each term, and there is a cross constraint between the two terms.
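The bottom line for our decomposition algorithm is to attack (8) as follows. Choose $(Z_1^0, Z_2^0)$ satisfying the constraints and construct the sequence $(Z_1^k, Z_2^k)_{k\in\mathbb{N}}$ by the following iteration procedure. Assume $(Z_1^k, Z_2^k)$ is known; then
Local step: Solve
$$\begin{aligned}\tilde Z_1^k &= \arg\inf\left\{\langle H_1Z_1, Z_1\rangle,\ Z_1\in\mathbb{R}^{N_b},\ \langle Z_1, Z_1\rangle = 1,\ \langle Z_1, Z_2^k\rangle = 0\right\},\\ \tilde Z_2^k &= \arg\inf\left\{\langle H_2Z_2, Z_2\rangle,\ Z_2\in\mathbb{R}^{N_b},\ \langle Z_2, Z_2\rangle = 1,\ \langle \tilde Z_1^k, Z_2\rangle = 0\right\};\end{aligned} \tag{9}$$
Global step: Solve
$$\alpha^* = \arg\inf\left\{\langle H_1Z_1(\alpha), Z_1(\alpha)\rangle + \langle H_2Z_2(\alpha), Z_2(\alpha)\rangle,\ \alpha\in\mathbb{R}\right\}, \tag{10}$$
where
$$Z_1(\alpha) = \frac{\tilde Z_1^k + \alpha\tilde Z_2^k}{\sqrt{1+\alpha^2}}, \qquad Z_2(\alpha) = \frac{-\alpha\tilde Z_1^k + \tilde Z_2^k}{\sqrt{1+\alpha^2}}, \tag{11}$$
and set
$$Z_1^{k+1} = \frac{\tilde Z_1^k + \alpha^*\tilde Z_2^k}{\sqrt{1+(\alpha^*)^2}}, \qquad Z_2^{k+1} = \frac{-\alpha^*\tilde Z_1^k + \tilde Z_2^k}{\sqrt{1+(\alpha^*)^2}}. \tag{12}$$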
This algorithm operates at two levels: a fine level, where two problems of dimension Nb are solved (rather than one problem of dimension 2Nb), and a coarse level, where a problem of dimension 2 is solved.
The local step monotonically reduces the objective function; however, it may not converge to the global optimum. The technical problem is that the Lagrange multipliers associated with the constraint ⟨Z1, Z2⟩ = 0 may converge to different values in the two subproblems associated with the local step. The global step again reduces the value of the objective function since Z̃1^k and Z̃2^k are feasible in the global step (they correspond to α = 0). The combined algorithm (local step + global step), therefore, makes the objective function decrease monotonically.
The simple case H1 = H2 is interesting to consider. First, if the algorithm is initialized with Z_2^0 = 0 in the first line of (9), it is easily seen that the local step is sufficient to converge to the global minimizer, in one single step. Second, it has been proved in [Bar05] that for a more general initial guess and under some assumption on the eigenvalues of the matrix H1, this algorithm globally converges to an optimal solution of (8). Ongoing work [BCHL] aims at generalizing the above proof to the case where the additional assumption on the eigenvalues is dropped. The analysis of the convergence in the case H1 ≠ H2 is a longer-term goal.
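The toy algorithm (9)–(12) can be run in a few lines for Nb = 3, where each local problem reduces to a 2 × 2 eigenvalue problem on the plane orthogonal to the other vector. The Python sketch below is ours. It assumes that the second line of (9) uses the freshly computed Z̃1^k, which makes Z̃1^k and Z̃2^k orthogonal, and it parametrizes (11) by a rotation (x, y) = (1, α)/√(1 + α²), which also covers the limit α → ∞. From the initial guess used below, with H1 = H2 = diag(1, 2, 3), the objective reaches 3 = ε1 + ε2 after one iteration; for H1 ≠ H2 it never increases, in line with the discussion above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(H, z):
    return [dot(row, z) for row in H]


def quad(H, z):
    """Rayleigh term <H z, z>."""
    return dot(mat_vec(H, z), z)


def lowest_eig2(p, q, r):
    """Lowest eigenpair of the symmetric 2x2 matrix [[p, q], [q, r]]."""
    lam = 0.5 * (p + r) - math.hypot(0.5 * (p - r), q)
    v1, v2 = (q, lam - p), (lam - r, q)          # both solve (M - lam I) v = 0
    x, y = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    nrm = math.hypot(x, y)
    return lam, ((1.0, 0.0) if nrm == 0.0 else (x / nrm, y / nrm))


def complement_basis(z):
    """Orthonormal basis of the plane orthogonal to the unit vector z in R^3."""
    basis = []
    for e in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
        v = list(e)
        for _ in range(2):                       # Gram-Schmidt, repeated once for accuracy
            for w in [z] + basis:
                c = dot(v, w)
                v = [vi - c * wi for vi, wi in zip(v, w)]
        nrm = math.sqrt(dot(v, v))
        if nrm > 1e-8:
            basis.append([vi / nrm for vi in v])
        if len(basis) == 2:
            break
    return basis


def local_min(H, z_other):
    """argmin <H z, z> over unit z orthogonal to z_other (one line of the local step (9))."""
    q1, q2 = complement_basis(z_other)
    _, (x, y) = lowest_eig2(quad(H, q1), dot(mat_vec(H, q1), q2), quad(H, q2))
    return [x * a + y * b for a, b in zip(q1, q2)]


def global_step(H1, H2, t1, t2):
    """Best rotation Z1 = x t1 + y t2, Z2 = -y t1 + x t2 with x^2 + y^2 = 1, i.e. (10)-(12)."""
    a1, b1, c1 = quad(H1, t1), dot(mat_vec(H1, t1), t2), quad(H1, t2)
    a2, b2, c2 = quad(H2, t1), dot(mat_vec(H2, t1), t2), quad(H2, t2)
    _, (x, y) = lowest_eig2(a1 + c2, b1 - b2, c1 + a2)
    z1 = [x * u + y * v for u, v in zip(t1, t2)]
    z2 = [-y * u + x * v for u, v in zip(t1, t2)]
    return z1, z2


def toy_dd(H1, H2, z1, z2, iters=20):
    """Local step + global step for problem (8); returns final iterates and objective history."""
    history = [quad(H1, z1) + quad(H2, z2)]
    for _ in range(iters):
        t1 = local_min(H1, z2)      # first line of (9): orthogonal to Z2^k
        t2 = local_min(H2, t1)      # second line of (9): orthogonal to the new Z~1^k
        z1, z2 = global_step(H1, H2, t1, t2)
        history.append(quad(H1, z1) + quad(H2, z2))
    return z1, z2, history


H = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
s3, s2 = 1.0 / math.sqrt(3.0), 1.0 / math.sqrt(2.0)
z1, z2, hist = toy_dd(H, H, [s3, s3, s3], [s2, -s2, 0.0])
print(hist[:3])
```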
3 The Multilevel Domain Decomposition (MDD) Algorithm
We define, for all p-tuples (Ci)1≤i≤p,
$$E\big((C_i)_{1\le i\le p}\big) = \sum_{i=1}^p \operatorname{Tr}\big(H_iC_iC_i^t\big), \tag{13}$$
and set by convention
$$U_0 = U_p = 0. \tag{14}$$
It has been shown in [BCHL07] that updating the block sizes mi along the
iterations is crucial to make the domain decomposition algorithm converge
toward a good approximation of the solution to (5). It is, however, observed
in practice that after a few iterations, the block sizes have converged (they do
not vary in the course of the following iterations). This is why, for the sake of
clarity, we have chosen to present here a simplified version of the algorithm
where block sizes are held constant along the iterations. For a description of
the complete algorithm with variable block sizes, we refer to [BCHL07].
At iteration k, we have at hand a set of matrices $(C_i^k)_{1\le i\le p}$ such that $C_i^k\in\mathcal{M}_{n,m_i}(\mathbb{R})$, $[C_i^k]^tC_i^k = I_{m_i}$, $[C_i^k]^tTC_{i+1}^k = 0$. We now explain how to compute the new iterate $(C_i^{k+1})_{1\le i\le p}$.
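Step 1: Local fine solver.
(a) For each i, find
$$\inf\left\{\operatorname{Tr}\big(H_iC_iC_i^t\big),\ C_i\in\mathcal{M}_{n,m_i}(\mathbb{R}),\ C_i^tC_i = I_{m_i},\ [C_{i-1}^k]^tTC_i = 0,\ C_i^tTC_{i+1}^k = 0\right\}. \tag{15}$$
This is done via diagonalization of the matrix Hi in the subspace
$$V_i^k = \left\{x\in\mathbb{R}^n,\ [C_{i-1}^k]^tTx = 0,\ x^tTC_{i+1}^k = 0\right\},$$
i.e. diagonalize $P_i^kH_iP_i^k$, where $P_i^k$ is the orthogonal projector on $V_i^k$. This provides (at least) $n - m_{i-1} - m_{i+1}$ real eigenvalues and associated orthonormal vectors $x_{i,j}^k$. The latter are T-orthogonal to the column vectors of $C_{i-1}^k$ and $C_{i+1}^k$.
(b) Collect the $m_i$ vectors $x_{i,j}^k$ associated with the lowest eigenvalues in the $n\times m_i$ matrix $\tilde C_i^k$.
Step 2: Global coarse solver. Solve
$$\mathbf{U}^* = \arg\inf\left\{f(\mathbf{U}),\ \mathbf{U} = (U_i)_i,\ \forall\,1\le i\le p-1,\ U_i\in\mathcal{M}_{m_{i+1},m_i}(\mathbb{R})\right\}, \tag{16}$$
where
$$f(\mathbf{U}) = E\Big(\big(C_i(\mathbf{U})\,[C_i(\mathbf{U})^tC_i(\mathbf{U})]^{-1/2}\big)_i\Big) \tag{17}$$
and
$$C_i(\mathbf{U}) = \tilde C_i^k + T\tilde C_{i+1}^kU_i\,[\tilde C_i^k]^tTT^t\tilde C_i^k - T^t\tilde C_{i-1}^kU_{i-1}^t\,[\tilde C_i^k]^tT^tT\tilde C_i^k. \tag{18}$$
Next set, for all 1 ≤ i ≤ p,
$$C_i^{k+1} = C_i(\mathbf{U}^*)\,\big[C_i(\mathbf{U}^*)^tC_i(\mathbf{U}^*)\big]^{-1/2}. \tag{19}$$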
Notice that in Step 1, the computation for each odd block is independent of the other odd blocks, and likewise for the even blocks. Thus, we use here a red/black strategy.
In the global step, we perturb each variable by a linear combination of the adjacent variables. The matrices U = (Ui)i in (16) play the same role as the real parameter α in the toy problem (10). The perturbation is designed so that the constraints are satisfied. However, our numerical experiments show that this is not exactly the case, in the sense that, for some i, $[C_i^{k+1}]^tT\,C_{i+1}^{k+1}$ may have coefficients as large as about 10^−3. All linear scaling algorithms have difficulties in ensuring this constraint. We should mention here that in our case the resulting deviation of C^tC from the identity is small, C^tC being in any case block tridiagonal.
In practice, we reduce the computational cost of the global step by again using a domain decomposition method. The blocks (Ci)1≤i≤p are collected in r overlapping groups (Gl)1≤l≤r as shown in Figure 3. As each group only overlaps with its first neighbors, the problem (16) can be solved first for the groups (G2l+1), next for the groups (G2l). We have observed that the number of iterations of the outer loop (local step + global step) does not significantly increase when the 'exact' global step (16) is replaced by the approximate global step consisting in optimizing first the odd groups, then the even groups. The numerical results obtained so far (see Section 5) tend to show that the resulting algorithm scales linearly with the system size.
A schematic view of the algorithm is provided in Figure 4.
[Figure 3: blocks 1 to 10 gathered into the groups G1, G2, G3.]
Fig. 3. Collection of p = 10 blocks into r = 3 groups.
Repeat until convergence:
1a. Local step on blocks: 1, 3, ..., (2i + 1), ...
1b. Local step on blocks: 2, 4, ..., (2i), ...
2a. Global step on groups: {1, 2}, {3, 4}, ..., {2i − 1, 2i}, ...
2b. Global step on groups: {2, 3}, {4, 5}, ..., {2i, 2i + 1}, ...
Fig. 4. Schematic view of the algorithm in the case of 2-block groups (r = 2): tasks appearing on the same line are independent of one another. The order between the steps 1a and 1b is reversed from one iteration to the next. The same holds for the steps 2a and 2b.
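The ordering of Figure 4 can be generated mechanically. In the Python sketch below (function names ours, blocks numbered from 1 as in the figure), each line of Figure 4 becomes a list of tasks, and a helper checks the independence property stated in the caption: no two adjacent blocks in a local line (block i only sees C_{i−1} and C_{i+1} in (15)), and pairwise disjoint groups in a global line.

```python
def mdd_schedule(p, iteration):
    """Task lists of Fig. 4 for p blocks and 2-block groups, one entry per line.

    Lines 1a/1b: local steps on odd/even blocks; lines 2a/2b: global steps on
    groups {2i-1, 2i} / {2i, 2i+1}. The order inside each pair is reversed every
    other iteration, as stated in the caption of Fig. 4."""
    odd = list(range(1, p + 1, 2))
    even = list(range(2, p + 1, 2))
    groups_a = [(i, i + 1) for i in range(1, p, 2)]
    groups_b = [(i, i + 1) for i in range(2, p, 2)]
    if iteration % 2 == 1:
        odd, even = even, odd
        groups_a, groups_b = groups_b, groups_a
    return [("local", odd), ("local", even), ("global", groups_a), ("global", groups_b)]


def independent(line):
    """Local tasks: no two adjacent blocks. Global tasks: pairwise disjoint groups."""
    kind, tasks = line
    if kind == "local":
        return all(abs(a - b) > 1 for a in tasks for b in tasks if a != b)
    used = [blk for grp in tasks for blk in grp]
    return len(used) == len(set(used))


for line in mdd_schedule(10, 0):
    print(line)
```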
One important point (not taken into account in [BCHL07]) is that the
Hessian of f enjoys a very specific structure. It is a sum of tensor products of
square matrices of size mi . For example, with two-block groups (r = 2), we
have
$$\mathcal{H}U = \sum_{i=1}^4 A^{(i)}\,U\,B^{(i)} \tag{20}$$
with A^(i) ∈ M_{m2,m2}(R) and B^(i) ∈ M_{m1,m1}(R). Consequently, it is possible to compute Hessian-vector products, without assembling the Hessian, in O(m1 m2 max(m1, m2)) elementary operations, instead of O(m1² m2²) with a naive implementation. An additional source of acceleration is the fact that this formulation uses only matrix-matrix products. Efficient implementations of matrix-matrix products, taking advantage of higher numbers of floating-point operations per memory access, are available in the Level 3 BLAS (see, for instance, [PA04]). This makes Newton-like methods affordable: a good estimate of the Newton direction can easily be computed using an iterative method.
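The saving in (20) comes from never forming the (m1m2) × (m1m2) Hessian. The Python sketch below compares the matrix-free product Σ A^(i) U B^(i) (two small matrix products per term, O(m1m2 max(m1, m2)) operations) with the assembled matrix Σ (B^(i))^t ⊗ A^(i) acting on the column-wise vectorization of U, which costs O(m1²m2²) operations per product. The matrices A^(i), B^(i) are random placeholders, not the actual Hessian terms of f.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def hessian_apply(As, Bs, U):
    """Matrix-free product sum_i A_i U B_i, as in (20): two small matmuls per term."""
    out = [[0.0] * len(U[0]) for _ in range(len(U))]
    for A, B in zip(As, Bs):
        term = matmul(matmul(A, U), B)
        for r in range(len(U)):
            for c in range(len(U[0])):
                out[r][c] += term[r][c]
    return out


def hessian_assemble(As, Bs):
    """Assembled matrix sum_i kron(B_i^t, A_i), acting on the column-wise vec(U)."""
    m2, m1 = len(As[0]), len(Bs[0])
    size = m1 * m2
    Hm = [[0.0] * size for _ in range(size)]
    for A, B in zip(As, Bs):
        for c in range(m1):              # column of the result
            for b in range(m1):          # column of U
                for r in range(m2):
                    for s in range(m2):
                        Hm[c * m2 + r][b * m2 + s] += B[b][c] * A[r][s]
    return Hm


rng = random.Random(1)
m1, m2 = 3, 2                            # hypothetical block sizes
As = [[[rng.uniform(-1, 1) for _ in range(m2)] for _ in range(m2)] for _ in range(4)]
Bs = [[[rng.uniform(-1, 1) for _ in range(m1)] for _ in range(m1)] for _ in range(4)]
U = [[rng.uniform(-1, 1) for _ in range(m1)] for _ in range(m2)]
HU = hessian_apply(As, Bs, U)
```

A Krylov solver such as SYMMLQ only needs the products, so in that setting `hessian_apply` alone would suffice; the assembled form appears here only as a reference for checking.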
In the current version of our domain decomposition algorithm, the global step is solved approximately by a single iteration of the Newton algorithm with initial guess Ui = 0, the Newton direction being computed iteratively by means of the SYMMLQ algorithm [PS75]. In the near future, we plan to test the efficiency of advanced first-order methods such as the one described in [HZ05]. No definite conclusions about the comparative efficiencies of the various numerical methods for performing the global step can be drawn yet.
4 Parallel Implementation
For parallel implementation, the single-program, multiple-data (SPMD) model is used, with message passing based on the MPI library; this allows us to maintain a single version of the code.
Each processor executes a single instance of the algorithm presented in
Section 3 applied to a contiguous subset of blocks. Compared to the sequential
version, additional data structures are introduced: each processor needs to
access the matrices Ci and Hi corresponding to the last block of the processor
located on its left and to the first block of the processor located on its right,
as shown in Figure 5. These frontier blocks play the role of ghost nodes in
classical domain decomposition without overlapping. For this reason, we will
sometimes call them the ghost blocks.
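The distribution of Figure 5 can be described by a few index computations. The Python sketch below (names ours) splits blocks 1, ..., p into contiguous, nearly equal chunks and lists, for each processor, the ghost blocks it must access: the last block of its left neighbor and the first block of its right neighbor. It also enforces the limit of nbloc/2 processors mentioned below, assuming that nbloc denotes the number of blocks p.

```python
def distribute_blocks(p, nproc):
    """Contiguous split of blocks 1..p over nproc processors, plus the ghost blocks of each."""
    if nproc < 1 or nproc > p // 2:
        raise ValueError("use between 1 and p // 2 processors")
    base, extra = divmod(p, nproc)
    chunks, start = [], 1
    for rank in range(nproc):
        size = base + (1 if rank < extra else 0)   # sizes differ by at most one block
        chunks.append(list(range(start, start + size)))
        start += size
    ghosts = []
    for rank in range(nproc):
        left = [chunks[rank - 1][-1]] if rank > 0 else []           # last block on the left
        right = [chunks[rank + 1][0]] if rank < nproc - 1 else []   # first block on the right
        ghosts.append(left + right)
    return chunks, ghosts


chunks, ghosts = distribute_blocks(10, 3)
for rank, (own, ghost) in enumerate(zip(chunks, ghosts), start=1):
    print(f"Proc {rank}: blocks {own}, ghost blocks {ghost}")
```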
The major part of the communications is performed between neighboring
processors at the end of each step of the algorithm (i.e. of each line in the
scheme displayed in Figure 4), in order to update the ghost blocks. This occurs
only four times per iteration and, as we will see in the next section, the sizes
of the exchanged messages are moderate.
Collective communications are needed to compute the current value of the function f appearing in the formula (17) and to check that the maximum deviation from orthogonality remains acceptable. They are also needed to sort the eigenvalues of the different blocks in the local step, in the complete version of the algorithm, allowing variable block sizes (see [BCHL07]). The important point is that the amount of data involved in the collective communications is small as well.
Fig. 5. Distribution of blocks over 3 processors. Arrows indicate the supplementary blocks a processor needs to access.
With this implementation we can use up to nbloc/2 processors. In order
to efficiently use a larger number of processors, sublevels of parallelism should
be introduced. For instance, each subproblem (15) (for a given i) can itself be
parallelized.
Apart from the very small volume of collective communications, the communication volume associated with each single processor remains constant, irrespective of the number of blocks per processor and of the number of processors.
We can thus expect very good scalability, except in situations where the load
is strongly unbalanced.
The implementation of the MDD algorithm described above can be easily
extended to cover the case of 2D and 3D molecular systems.
5 Numerical Tests
This section is devoted to the performance of the Multilevel Domain Decomposition (MDD) algorithm on matrices actually arising
in real-world applications of electronic structure calculations. The benchmark
matrices are of the same type as those used in the reference paper [BCHL07].
In the first subsection, we briefly recall how these matrices are generated
and we provide some practical details on our implementation of the MDD algorithm. The computational performance obtained on sequential and parallel architectures, including comparisons with the density matrix minimization
(DMM) method and with direct diagonalization using LAPACK, is discussed
in the second and third subsections, respectively.
5.1 General Presentation
Three families of matrices corresponding to the Hartree–Fock ground state of
some polymeric molecules are considered:
• Matrices of type P1 and P2 are related to COH-(CO)$_{n_m}$-COH polymeric
chains, with interatomic Carbon-Carbon distances equal to 5 and 4 atomic
units (a.u.), respectively;
• Matrices of type P3 are obtained with polyethylene molecules (CH$_3$-(CH$_2$)$_{n_m}$-CH$_3$) with physically relevant Carbon-Carbon distances.
The geometry of the very long molecules is guessed from the optimal distances
obtained by geometry optimization (with constraints for P1 and P2) on moderate size molecules (about 60 Carbon atoms) and minimal basis sets. All these
off-line calculations are performed using the GAUSSIAN package [FTS+98].
It is then observed that the overlap matrix and Fock matrix obtained exhibit
a periodic structure in their bulk. Overlap and Fock matrices for large size
molecules can then be constructed using this periodicity property. For nm
sufficiently large, bulk periodicity is also observed in the density matrix. This
property is used to generate reference solutions for large molecules.

Table 1. Localization parameters, block sizes and asymptotic gaps for the test cases.

     | Bandwidth of S | Bandwidth of H | n   | q   | Asymptotic gap (a.u.)
P1   | 59             | 99             | 130 | 50  | $1.04 \times 10^{-3}$
P2   | 79             | 159            | 200 | 80  | $3.57 \times 10^{-3}$
P3   | 111            | 255            | 308 | 126 | $2.81 \times 10^{-2}$
Table 1 gives a synthetic view of the structural properties of the
three families of matrices under examination. The integer q stands for the
overlap between two adjacent blocks (note that one could have taken n = 2q
if the overlap matrix S were equal to the identity, but that one has to take n > 2q
in our case since S ≠ I).
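A quick arithmetic check of the relation between n and q in Table 1 follows. It rests on an assumption that is ours, not the text's: a bulk block shares q indices with each of its two neighbors, so n − 2q indices belong to that block alone. The values are transcribed from Table 1, and `private_indices` is a hypothetical helper.

```python
# (n, q) for each family, transcribed from Table 1.
TABLE1 = {"P1": (130, 50), "P2": (200, 80), "P3": (308, 126)}


def private_indices(n: int, q: int) -> int:
    """Indices of a bulk block shared with neither neighbor, assuming it
    overlaps each of its two neighbors on q indices."""
    return n - 2 * q


for name, (n, q) in TABLE1.items():
    print(name, "n - 2q =", private_indices(n, q))
```

All three families satisfy n > 2q, leaving 30, 40 and 56 such indices respectively.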
Initial guess generation is of crucial importance for any linear scaling
method. The procedure in use here is in the spirit of the domain decomposition method:
1. A first guess of the block sizes is obtained by locating Z electrons around
each nucleus of charge Z;
2. A set of blocks Ci is built from the lowest mi (generalized) eigenvectors
associated with the block matrices Hi and Si (the block matrices Hi are
introduced in Section 2; the block matrices Si are defined accordingly);
3. These blocks are finally optimized with the local fine solver of the
MDD algorithm, including block size update (electron transfer).
Criteria for comparing the results
The quality of the results produced by the MDD and DMM methods is evaluated by computing two criteria. The first criterion is the relative energy
difference $e_E = |E - E_0|/|E_0|$ between the energy $E$ of the current iterate $D$ and
the energy $E_0$ of the reference density matrix $D^\star$. The second criterion is the
semi-norm
$$e_\infty = \sup_{(i,j)\ \text{s.t.}\ |H_{ij}| \ge \varepsilon} \left| D_{ij} - [D^\star]_{ij} \right| \tag{21}$$
with $\varepsilon = 10^{-10}$. The introduction of the semi-norm (21) is consistent with
the cut-off on the entries of $H$ (hence the value chosen for $\varepsilon$). Indeed, in most
cases, the matrix $D$ is only used to compute various observables
(in particular the electronic energy and the Hellmann–Feynman forces), all of
them of the form $\mathrm{Tr}(AD)$, where the matrix $A$ has the same sparsity pattern as
$H$ (see [CDK+03] for details). The final result of the calculation is, therefore,
insensitive to the entries $D_{ij}$ with indices $(i, j)$ such that $|H_{ij}|$ is below some
cut-off value.
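The two comparison criteria can be sketched as follows. For readability the matrices are assumed dense and stored as nested Python lists; a production code would exploit the band structure. The function names and the toy 3 × 3 matrices are hypothetical illustrations. Entries where |H_ij| falls below the cut-off are simply skipped in the supremum.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def relative_energy_error(E: float, E0: float) -> float:
    """e_E = |E - E0| / |E0|."""
    return abs(E - E0) / abs(E0)


def masked_sup_error(D: Matrix, D_ref: Matrix, H: Matrix,
                     cutoff: float = 1e-10) -> float:
    """e_inf: largest |D_ij - D*_ij| over the entries where |H_ij| >= cutoff."""
    worst = 0.0
    for Di, Dri, Hi in zip(D, D_ref, H):
        for dij, drij, hij in zip(Di, Dri, Hi):
            if abs(hij) >= cutoff:
                worst = max(worst, abs(dij - drij))
    return worst


# Toy data: the large deviation in the (0, 2) entry is ignored because H_02 = 0.
H = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
D_ref = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 1.0]]
D = [[1.0, 0.21, 0.3], [0.21, 1.0, 0.2], [0.3, 0.2, 1.0]]
print(masked_sup_error(D, D_ref, H))
print(relative_energy_error(-99.0, -100.0))
```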
In all the calculations presented below, the global step is performed with
groups consisting of two blocks (r = 2), and the algorithm is, therefore, exactly
that displayed in Figure 4.
5.2 Sequential Computations
The numerical results presented in this section have been obtained with a
single 2.8 GHz Xeon processor.
Density matrices have been computed for a series of matrices H and S
of types P1 , P2 , and P3 , using (1) the MDD algorithm, (2) a diagonalization
procedure (the dsbgv.f routine from the LAPACK library), and (3) the DMM
method [LNV93]. The latter method belongs to the class of linear scaling
algorithms. An important feature of the DMM method is that linear scaling
is achieved through cut-offs on the matrix entries. We have chosen here a
cut-off strategy based on a priori defined patterns, which may be suboptimal.
Our implementation of DMM converges to a fairly good approximation of the
exact density matrix and scales linearly, but the prefactor might possibly be
improved by more refined cut-off strategies.
A detailed presentation of the comparison between the three methods is
provided in [BCHL07]. Our new approach for computing the Newton direction in the global step (see Section 3) further improves the efficiency of MDD:
compared with the former implementation reported in [BCHL07], the CPU time is divided by 2 for molecules of type P1,
by 5 for P2, and by 10 for P3, and MDD now requires less memory
than DMM. These results are shown for P2 in Figures 6
and 7. They clearly demonstrate that the MDD algorithm scales linearly with
respect to the parameter nm (in both CPU time and memory occupancy).
Note also that for P2, the crossover point between diagonalization
and MDD (as far as CPU time is concerned) is now shifted to less than 2,000
basis functions.
5.3 Parallel Computations
We conclude with some tests of our parallel implementation of the MDD
algorithm described in Section 4. These tests have been performed on an 8-node
Linux cluster in dedicated mode, consisting of 8 dual-processor DELL Precision
450 workstations (Intel(R) Xeon(TM) CPU 2.40GHz), with Gigabit Ethernet connections.
They concern the polyethylene family P3, for which the size of each ghost
block is about 150 kB.
We only test here the highest level of parallelism of the MDD algorithm,
consistently with the relatively low number of processors used
in this first study. We plan to test multilevel parallelism in the near future.
In particular, the local step in each block, as well as the global step in each
group, will be parallelized.

Fig. 6. CPU time required for computing the density matrix of a molecule of type
P2 as a function of the number of basis functions (log-log plot of CPU time in seconds versus the number of basis functions Nb, for LAPACK, DMM and MDD).

Fig. 7. Memory required for computing the density matrix of a molecule of type
P2 as a function of the number of basis functions (log-log plot of memory requirement in Kbytes versus Nb, for LAPACK, DMM and MDD).
Tables 2 and 3 report the speedup (ratio between the wall clock time
with one processor and the wall clock time with several processors) and the efficiency (ratio between the speedup and the number of processors) of our
parallel MDD algorithm.
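Both derived quantities can be recomputed from the wall clock times. The sketch below uses the values of Table 2 and a hypothetical helper `speedup_and_efficiency`. It reproduces the tabulated speedups and efficiencies up to rounding.

```python
from typing import Dict, Tuple

# Wall clock times in seconds, transcribed from Table 2 (P3, nm = 3300, 128 blocks).
TABLE2 = {1: 4300.0, 2: 2400.0, 4: 1200.0, 8: 580.0, 16: 360.0}


def speedup_and_efficiency(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map p to (S_p, E_p), with S_p = T_1 / T_p and E_p = S_p / p."""
    t1 = times[1]
    return {p: (t1 / tp, t1 / (tp * p)) for p, tp in times.items()}


for p, (s, e) in speedup_and_efficiency(TABLE2).items():
    print(f"{p:2d} processors: speedup {s:5.2f}, efficiency {e:4.2f}")
```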
The scalability, namely the variation of the wall clock time when the number of processors and the size of the matrix grow proportionally, is reported
in Table 4, for a molecule of type P3.
Table 2. Wall clock time as a function of the number of processors for a molecule
of type P3, with nm = 3300 (128 blocks). 8 MDD iterations are necessary to achieve
convergence up to $5 \times 10^{-8}$ in energy and $3 \times 10^{-3}$ in the density matrix (for the
semi-norm (21)).

Number of processors | Wall clock time (s) | Speedup | Efficiency
1                    | 4300                |         |
2                    | 2400                | 1.8     | 0.9
4                    | 1200                | 3.6     | 0.9
8                    | 580                 | 7.4     | 0.9
16                   | 360                 | 12      | 0.75
Table 3. Wall clock time as a function of the number of processors for a molecule of
type P3, with nm = 13300 (512 blocks). 7 MDD iterations are necessary to achieve
convergence up to $5 \times 10^{-8}$ in energy and $3 \times 10^{-3}$ in the density matrix (for the
semi-norm (21)).

Number of processors | Wall clock time (s) | Speedup | Efficiency
1                    | 18460               |         |
4                    | 4820                | 3.8     | 0.96
8                    | 2520                | 7.3     | 0.92
16                   | 1275                | 14.5    | 0.91
Table 4. Scalability of the MDD algorithm for a molecule of type P3. The convergence thresholds are $2.5 \times 10^{-7}$ in energy and $4 \times 10^{-3}$ in the density matrix (for the
semi-norm (21)).

Number of processors                                       | 1    | 4    | 8    | 16
Wall clock time (s), 200 atoms (8 blocks) per processor    | 167  | 206  | 222  | 253
Wall clock time (s), 800 atoms (32 blocks) per processor   | 1249 | 1237 | 1257 | 1250
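Table 4 can be summarized by the weak-scaling efficiency T_1/T_p. This is a standard metric, not one used in the text itself, and the helper below is hypothetical. With 32 blocks per processor the wall clock time is essentially flat. With 8 blocks per processor it grows by about 50% from 1 to 16 processors. One plausible reading, consistent with the constant per-processor communication volume discussed in Section 4, is that larger local workloads amortize communication better.

```python
from typing import List

PROCS = [1, 4, 8, 16]
# Wall clock times in seconds, transcribed from Table 4.
T_8_BLOCKS = [167.0, 206.0, 222.0, 253.0]       # 200 atoms per processor
T_32_BLOCKS = [1249.0, 1237.0, 1257.0, 1250.0]  # 800 atoms per processor


def weak_scaling_efficiency(times: List[float]) -> List[float]:
    """T_1 / T_p; equals 1 when the wall clock time stays flat while the
    problem size grows proportionally with the number of processors."""
    return [times[0] / t for t in times]


for label, times in [("8 blocks/proc", T_8_BLOCKS), ("32 blocks/proc", T_32_BLOCKS)]:
    effs = weak_scaling_efficiency(times)
    print(label, [f"p={p}: {e:.2f}" for p, e in zip(PROCS, effs)])
```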
Note that the calculations reported in this article have been performed
with minimal basis sets. Testing the efficiency of the MDD algorithm with
larger basis sets is the subject of ongoing work.
Let us finally mention that our parallel implementation of the MDD algorithm makes it possible to solve (2) for a polyethylene molecule with 106 530 atoms
(372 862 basis functions) on 16 processors in 90 minutes.
6 Conclusion and Perspectives
In its current implementation, the MDD algorithm makes it possible to solve efficiently
the linear subproblem for linear molecules (polymers or nanotubes). The following issues will be addressed in the near future:
• Still in the case of 1D systems, we will allow blocks to have more than two
neighbors. This should increase the flexibility and efficiency of the MDD
algorithm. For instance, this should make calculations with large basis
sets including diffuse atomic orbitals affordable.
• We plan to implement the MDD algorithm in the framework of 2D and
3D molecular systems. Note that even with minimal overlap a given block
has typically 8 neighbors in 2D and 26 neighbors in 3D.
• The MDD algorithm will be extended to the cases of the nonlinear Hartree–
Fock and Kohn–Sham problems.
• The present version of the MDD algorithm is restricted to insulators (i.e. to
matrices H with a sufficiently large gap). The possibility of extending the
MDD methodology to cover the case of metallic systems is a challenging
issue that will be studied.
Acknowledgement. EC and CLB acknowledge financial support from the French
Ministry for research under contract grant “Nouvelles Interfaces des Mathématiques”
SIMUMOL, and from Electricité de France under contract EDF-ENPC. WH acknowledges support from the US National Science Foundation under grants 0203370,
0620286, and 0619080.
References
[AHLT05] P. Arbenz, U. L. Hetmaniuk, R. B. Lehoucq, and R. S. Tuminaro. A comparison of eigensolvers for large-scale 3D modal analysis using AMG-preconditioned iterative methods. Internat. J. Numer. Methods Engrg., 64:204–236, 2005.
[Bar05] M. Barrault. Développement de méthodes rapides pour le calcul de structures électroniques. PhD thesis, École Nationale des Ponts et Chaussées, 2005.
[BCHL] G. Bencteux, E. Cancès, W. W. Hager, and C. Le Bris. Work in progress.
[BCHL07] M. Barrault, E. Cancès, W. W. Hager, and C. Le Bris. Multilevel domain decomposition for electronic structure calculations. J. Comput. Phys., 222(1):86–109, 2007.
[BMG02] D. Bowler, T. Miyazaki, and M. Gillan. Recent progress in linear scaling ab initio electronic structure theories. J. Phys. Condens. Matter, 14:2781–2798, 2002.
[CDK+03] E. Cancès, M. Defranceschi, W. Kutzelnigg, C. Le Bris, and Y. Maday. Computational quantum chemistry: a primer. In C. Le Bris, editor, Handbook of Numerical Analysis, Special volume, Computational Chemistry, Vol. X, pages 3–270. North-Holland, 2003.
[FTS+98] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, V. G. Zakrzewski, J. A. Montgomery, R. E. Stratmann, J. C. Burant, S. Dapprich, J. M. Millam, A. D. Daniels, K. N. Kudin, M. C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G. A. Petersson, P. Y. Ayala, Q. Cui, K. Morokuma, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. Cioslowski, J. V. Ortiz, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, G. Gomperts, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P. M. W. Gill, B. G. Johnson, W. Chen, M. W. Wong, J. L. Andres, M. Head-Gordon, E. S. Replogle, and J. A. Pople. Gaussian 98 (Revision A.7). Gaussian Inc., Pittsburgh, PA, 1998.
[Goe99] S. Goedecker. Linear scaling electronic structure methods. Rev. Mod. Phys., 71:1085–1123, 1999.
[HL07] U. L. Hetmaniuk and R. B. Lehoucq. Multilevel methods for eigenspace computations in structural dynamics. In Domain Decomposition Methods in Science and Engineering XVI, volume 55 of Lect. Notes Comput. Sci. Eng., pages 103–113, Springer, Berlin, 2007.
[HZ05] W. Hager and H. Zhang. A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim., 16:170–192, 2005.
[Koh59] W. Kohn. Analytic properties of Bloch waves and Wannier functions. Phys. Rev., 115:809–821, 1959.
[Koh96] W. Kohn. Density functional and density matrix method scaling linearly with the number of atoms. Phys. Rev. Lett., 76:3168–3171, 1996.
[LeB05] C. Le Bris. Computational chemistry from the perspective of numerical analysis. Acta Numerica, 14:363–444, 2005.
[LNV93] X.-P. Li, R. W. Nunes, and D. Vanderbilt. Density-matrix electronic structure method with linear system size scaling. Phys. Rev. B, 47:10891–10894, 1993.
[PA04] W. P. Petersen and P. Arbenz. Introduction to Parallel Computing. Oxford University Press, 2004.
[PS75] C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617–629, 1975.
[SC00] E. Schwegler and M. Challacombe. Linear scaling computation of the Fock matrix. Theor. Chem. Acc., 104:344–349, 2000.
[YL95] W. Yang and T. Lee. A density-matrix divide-and-conquer approach for electronic structure calculations of large molecules. J. Chem. Phys., 103:5674, 1995.
Numerical Analysis of a Finite
Element/Volume Penalty Method
Bertrand Maury
Laboratoire de Mathématiques, Université Paris-Sud, FR-91405 Orsay Cedex, France
Bertrand.Maury@math.u-psud.fr
Summary. The penalty method makes it possible to incorporate a large class of
constraints in general purpose Finite Element solvers like freeFEM++. We present
here some contributions to the numerical analysis of this method. We propose an
abstract framework for this approach, together with some general error estimates
in terms of the penalty parameter ε and the space discretization parameter h. As
this work is motivated by the possibility of handling constraints like rigid motion in
fluid-particle flows, we shall pay special attention to a model problem of this kind,
where the constraint is prescribed over a subdomain. We show how the abstract
estimate can be applied to this situation, in the case where a non-body-fitted mesh
is used. In addition, we describe how this method provides an approximation of the
Lagrange multiplier associated with the constraint.
1 Introduction
Because of its conceptual simplicity and the fact that it is usually straightforward to implement, the penalty method has been widely used to incorporate
constraints in numerical optimization. The general principle can be seen as
a relaxed version of the following fact: given a proper functional J over a set
X, and K a subset of X, minimizing J over K is equivalent to minimizing
$J_K = J + I_K$ over X, where $I_K$ is the indicator function of K:
$$I_K(x) = \begin{cases} 0 & \text{if } x \in K,\\ +\infty & \text{if } x \notin K. \end{cases}$$
Assume now that K can be defined as $K = \{x \in X \mid \Psi(x) = 0\}$, where Ψ is
a non-negative function. The penalty method consists in considering the relaxed
functionals $J_\varepsilon$ defined as
$$J_\varepsilon = J + \frac{1}{\varepsilon}\Psi, \qquad \varepsilon > 0.$$
By definition of K, the function Ψ/ε approaches $I_K$ pointwise:
$$\frac{1}{\varepsilon}\Psi(x) \longrightarrow I_K(x) \quad \text{as } \varepsilon \to 0, \quad \forall x \in X.$$
If $J_\varepsilon$ admits a minimizer $u_\varepsilon$ for every ε, one can expect $u_\varepsilon$ to approach a (or
the) minimizer of J over K, if it exists.
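A minimal illustration of this relaxation uses the hypothetical scalar case X = ℝ, J(x) = x²/2 − 2x and Ψ(x) = x²/2, so that K = {0}. We minimize J_ε = J + Ψ/ε with a generic unconstrained routine; golden-section search is chosen here only for simplicity. This gives x_ε = 2ε/(1 + ε), which tends to the constrained minimizer 0 at rate ε. Meanwhile Ψ′(x_ε)/ε = x_ε/ε tends to 2, the multiplier λ satisfying J′(0) + λ = 0.

```python
import math
from typing import Callable


def golden_section_min(f: Callable[[float], float], lo: float, hi: float,
                       tol: float = 1e-12) -> float:
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d          # minimizer lies in [a, d]
        else:
            a = c          # minimizer lies in [c, b]
    return 0.5 * (a + b)


def J(x: float) -> float:
    """Objective of the toy problem."""
    return 0.5 * x * x - 2.0 * x


def psi(x: float) -> float:
    """Non-negative penalty vanishing exactly on K = {0}."""
    return 0.5 * x * x


def penalized_minimizer(eps: float) -> float:
    """Unconstrained minimizer of J_eps = J + psi / eps."""
    return golden_section_min(lambda x: J(x) + psi(x) / eps, -10.0, 10.0)


for eps in [1e-1, 1e-2, 1e-3]:
    x_eps = penalized_minimizer(eps)
    # psi'(x_eps) / eps = x_eps / eps approximates the multiplier 2
    print(f"eps={eps:.0e}  x_eps={x_eps:.3e}  multiplier estimate={x_eps / eps:.4f}")
```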
In actual Finite Element computations, an approximation $u_h^\varepsilon$ is computed as the solution to a finite dimensional problem, where h is a space discretization parameter. The present work is motivated by the fact that, even if the penalty
method for the continuous problem is convergent and the discretization procedure is sound, the rate of convergence of $u_h^\varepsilon$ toward the exact solution is not
straightforward to obtain.
To our knowledge, the first paper dedicated to the analysis of the penalty
method in the Finite Element context dates back to 1973 (see [Bab73]), where
this method was used to incorporate Dirichlet boundary conditions in some
variants of the Finite Element Method. Since then, this strategy has been
followed to integrate obstacles in fluid flow simulations [ABF99] and to model the
rigidity constraint [JLM05].
The present work is motivated by the handling of rigid particles in a
fluid flow. Various approaches have been proposed to incorporate rigid bodies
in a Stokes or Navier–Stokes fluid: the arbitrary Lagrangian–Eulerian approach
[JT96, Mau99] and the fictitious domain approach [PG02]. More recently, a strategy
based on augmented Lagrangian principles was proposed to handle a large
class of multimaterial flows [VCLR04, RPVC05]. In [JLM05], we tested the
raw penalty method to handle the rigidity constraint in a viscous fluid. This
approach is not sophisticated: it simply consists in adding to the variational
formulation the term
$$\frac{1}{\varepsilon}\int_\Omega \left(\nabla u + \nabla^T u\right) : \left(\nabla v + \nabla^T v\right).$$
It presents some drawbacks: as it is based on pure penalty and not augmented Lagrangian, the penalty parameter has to be taken very small for the
constraint to be fulfilled properly, which may harm the conditioning of the
system to solve. Yet it proves robust in practice, and it allows the use
of non-boundary-fitted (e.g., Cartesian) meshes. Besides, it is straightforward
to implement, so that a full Navier–Stokes solver (in 2D) with circular rigid
particles can be written in about 50 lines by using, for example, FreeFem++
(created by O. Pironneau, see [FFp]). Note that new tools for 3D problems
are already available (see, e.g., [ff3, DPP03] or [lif]), which make it possible to perform
computations of three-dimensional fluid-particle flows.
We shall actually focus here on a simpler problem (see the problem (8)),
which is a scalar version of the rigidity constraint. The fluid velocity is indeed
replaced by a temperature field, and the rigid particle is replaced by a zone
with infinite conductivity. The Lagrange multiplier can be interpreted in this
context as a heat source term (see Remark 6).
We begin by presenting some standard properties of the penalty method
for quadratic optimization, together with some convergence results (Section 2.1). Then
we present the model problem, describe how we penalize and discretize it,
and show how the abstract framework applies to this situation. We finish
by presenting an error estimate for the primal and dual components of the
solution, in terms of the quantities ε (the penalty parameter) and h (the
mesh step size), whose proof is postponed to another paper.
2 Preliminaries, Abstract Framework
2.1 Continuous Problem
We recall here some standard properties concerning the penalty method applied to infinite dimensional problems. Most of those properties are established
in [BF91], with a slightly different formalism. We shall consider the following
set of assumptions:
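$$
\left.
\begin{aligned}
&V \text{ is a Hilbert space}, \ \varphi \in V',\\
&a(\cdot,\cdot) \text{ bilinear, symmetric, continuous, elliptic } (a(v,v) \ge \alpha |v|^2),\\
&b(\cdot,\cdot) \text{ bilinear, symmetric, continuous, non-negative},\\
&K = \{u \in V \mid b(u,u) = 0\} = \ker b,\\
&J(v) = \tfrac12 a(v,v) - \langle \varphi, v\rangle, \quad u = \arg\min_K J,\\
&J_\varepsilon(v) = \tfrac12 a(v,v) + \tfrac{1}{2\varepsilon} b(v,v) - \langle\varphi, v\rangle, \quad u_\varepsilon = \arg\min_V J_\varepsilon.
\end{aligned}
\right\}
\tag{1}
$$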
Proposition 1. Under the assumptions (1), the solution uε to the penalized
problem converges to u.
Proof. We write the variational formulation for the penalized problem:
$$a(u_\varepsilon, v) + \frac{1}{\varepsilon} b(u_\varepsilon, v) = \langle\varphi, v\rangle \quad \forall v \in V. \tag{2}$$
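Taking $v = u_\varepsilon$, we get
$$\alpha |u_\varepsilon|^2 \le a(u_\varepsilon, u_\varepsilon) \le \|\varphi\|_{V'} |u_\varepsilon|,$$
so that $|u_\varepsilon|$ is bounded. We extract a subsequence, still denoted by $(u_\varepsilon)$, which
converges weakly to some $z \in V$. As $J_\varepsilon \ge J$ and $b(u, u) = 0$, we have
$$J(u_\varepsilon) \le J_\varepsilon(u_\varepsilon) \le J_\varepsilon(u) = J(u) \quad \forall \varepsilon > 0,$$
so that (J is convex and continuous, hence weakly lower semicontinuous) $J(z) \le \liminf J(u_\varepsilon) \le J(u)$. As
$$J(u_\varepsilon) + \frac{1}{2\varepsilon} b(u_\varepsilon, u_\varepsilon) \le J(u) \tag{3}$$
and $J(u_\varepsilon)$ is bounded from below (since $(u_\varepsilon)$ is bounded), $b(u_\varepsilon, u_\varepsilon)/\varepsilon$ is bounded, so that $b(u_\varepsilon, u_\varepsilon)$ goes to 0 with ε. Consequently, it
holds $0 \le b(z, z) \le \liminf b(u_\varepsilon, u_\varepsilon) = 0$, which implies $z \in K$, so that $z = u$ (u being the unique minimizer of J over K).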
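To establish the strong character of the convergence, we show that $u_\varepsilon$
converges toward u for the norm associated with $a(\cdot,\cdot)$, which is equivalent
to the original norm. As $u_\varepsilon$ converges weakly to u for this scalar product
($a(u_\varepsilon, v) \to a(u, v)$ for any $v \in V$), it is sufficient to establish the convergence
of $|u_\varepsilon|_a = a(u_\varepsilon, u_\varepsilon)^{1/2}$ towards $|u|_a$. Firstly, $|u|_a \le \liminf |u_\varepsilon|_a$, and the other
inequality comes from (3):
$$\frac12 a(u_\varepsilon, u_\varepsilon) - \langle\varphi, u_\varepsilon\rangle \le \frac12 a(u, u) - \langle\varphi, u\rangle,$$
so that, since $\langle\varphi, u_\varepsilon\rangle \to \langle\varphi, u\rangle$, $\limsup |u_\varepsilon|_a \le |u|_a$.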
The proposition does not say anything about the rate of convergence, and
it can be very poor, as the following example illustrates.
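Example 1. Consider $I = ]0, 1[$, $V = H^1(I)$, and the problem which consists
in minimizing the functional
$$J(v) = \frac12 \int_I |v'|^2$$
over $K = \{v \in V \mid v(x) = 0 \text{ a.e. in } O = ]0, 1/2[\}$. The solution to that
problem is obviously $u = \max\{0, 2(x - 1/2)\}$. Now let us denote by $u_\varepsilon$ the
minimizer of the penalized functional
$$J_\varepsilon(v) = \frac12 \int_I |v'|^2 + \frac{1}{2\varepsilon} \int_O |v|^2.$$
The solution to the penalized problem can be computed exactly:
$$u_\varepsilon(x) = k_\varepsilon \,\mathrm{sh}\frac{x}{\sqrt\varepsilon} \ \text{ in } ]0, 1/2[, \quad\text{with}\quad k_\varepsilon = \left(\mathrm{sh}\frac{1}{2\sqrt\varepsilon} + \frac{1}{2\sqrt\varepsilon}\,\mathrm{ch}\frac{1}{2\sqrt\varepsilon}\right)^{-1},$$
and $u_\varepsilon$ affine in $]1/2, 1[$, continuous at 1/2. This makes it possible to estimate
$|u_\varepsilon - u|$, which turns out to behave like $\varepsilon^{1/4}$.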
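The ε^{1/4} rate can be checked numerically from the closed-form solution. This sketch assumes the boundary conditions u(0) = 0 and u(1) = 1; the closed forms of Example 1 satisfy them, but they are not written out in the definition of V. The error is measured in the H¹ seminorm, which is a norm on functions vanishing at x = 0. It also uses that u_ε′ is continuous at 1/2, which fixes the slope of the affine part. `h1_seminorm_error` is a hypothetical helper. Because `math.sinh` overflows for ε below roughly 10⁻⁶, only moderate values of ε are used.

```python
import math


def h1_seminorm_error(eps: float) -> float:
    """|u_eps - u|_{H^1(0,1)} for Example 1, from the closed-form solutions.

    Assumes u(0) = 0 and u(1) = 1, consistent with the closed forms.
    """
    s = math.sqrt(eps)
    a = 1.0 / (2.0 * s)                          # a = 1 / (2 sqrt(eps))
    k = 1.0 / (math.sinh(a) + a * math.cosh(a))  # k_eps
    # On ]0,1/2[: u = 0 and u_eps' = (k / s) ch(x / s), hence
    # int_0^{1/2} |u_eps'|^2 dx = (k^2 / eps) * (1/4 + s * sh(1/s) / 4).
    left = (k * k / eps) * (0.25 + s * math.sinh(2.0 * a) / 4.0)
    # On ]1/2,1[: u' = 2 and u_eps is affine with slope u_eps'(1/2) = k ch(a) / s.
    slope = k * math.cosh(a) / s
    right = 0.5 * (slope - 2.0) ** 2
    return math.sqrt(left + right)


eps_values = [1e-2, 1e-3, 1e-4, 1e-5]
errors = [h1_seminorm_error(e) for e in eps_values]
for i in range(1, len(eps_values)):
    rate = math.log(errors[i - 1] / errors[i]) / math.log(eps_values[i - 1] / eps_values[i])
    print(f"eps={eps_values[i]:.0e}  error={errors[i]:.4f}  observed rate={rate:.3f}")
```

The observed rate approaches 0.25 as ε decreases. Most of the error comes from the boundary layer inside O.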
Yet, in many situations, convergence can be shown to be of order 1, provided
some assumptions are fulfilled. Let us introduce $\xi \in V'$ as the unique linear
functional such that
$$a(u, v) + \langle\xi, v\rangle = \langle\varphi, v\rangle \quad \forall v \in V. \tag{4}$$
Before stating the first order convergence result, we show here that the penalty
method provides an approximation of ξ.
Proposition 2. Let $\xi^\varepsilon \in V'$ be defined by
$$v \in V \longmapsto \langle\xi^\varepsilon, v\rangle = \frac{1}{\varepsilon} b(u_\varepsilon, v).$$
Then $\xi^\varepsilon$ converges (strongly) to ξ in V′, at least as fast as $u_\varepsilon$ converges to u.
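Proof. This is a direct consequence of the identity which we obtain by subtracting (2) from (4):
$$\langle\xi, v\rangle - \frac{1}{\varepsilon} b(u_\varepsilon, v) = a(u_\varepsilon - u, v) \quad \forall v \in V,$$
which yields $\|\xi - \xi^\varepsilon\|_{V'} \le C |u - u_\varepsilon|$.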
Let us now establish the first order convergence, provided an extra compatibility condition between b(·, ·) and ξ is met.
Proposition 3. Under the assumptions (1), we assume, in addition, that
there exists $\tilde\xi \in V$ such that
$$\langle\xi, v\rangle = b(\tilde\xi, v) \quad \forall v \in V.$$
Then $|u_\varepsilon - u| = O(\varepsilon)$.
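Proof. First of all, notice that it is possible to pick $\tilde\xi$ in $K^\perp$ (if not, we project
it onto $K^\perp$). Now, following the idea proposed in [Bab73] in a slightly
different context (see the proof of Theorem 3.2 therein), we introduce
$$R_\varepsilon(v) = \frac12 a(u - v, u - v) + \frac{1}{2\varepsilon} b(\varepsilon\tilde\xi - v, \varepsilon\tilde\xi - v)$$
and we expand
$$R_\varepsilon(v) = \frac12 a(u, u) + \frac{\varepsilon}{2} b(\tilde\xi, \tilde\xi) + \frac12 a(v, v) + \frac{1}{2\varepsilon} b(v, v) - a(u, v) - b(\tilde\xi, v).$$
As $b(\tilde\xi, v) = \langle\xi, v\rangle$ and $-a(u, v) - \langle\xi, v\rangle = -\langle\varphi, v\rangle$, the functional $R_\varepsilon$ is equal to
$J_\varepsilon$ up to a constant. Therefore, minimizing $R_\varepsilon$ or minimizing $J_\varepsilon$ are equivalent
tasks. Let us now introduce $w = \varepsilon\tilde\xi + u$. We obtain
$$R_\varepsilon(w) = \frac{\varepsilon^2}{2} a(\tilde\xi, \tilde\xi) + 0 \quad \text{because } u \in K = \ker b,$$
so that $|R_\varepsilon(w)| \le C\varepsilon^2$. As $u_\varepsilon$ minimizes $R_\varepsilon$,
$$0 \le R_\varepsilon(u_\varepsilon) = \frac12 a(u - u_\varepsilon, u - u_\varepsilon) + \frac{1}{2\varepsilon} b(\varepsilon\tilde\xi - u_\varepsilon, \varepsilon\tilde\xi - u_\varepsilon) \le C\varepsilon^2,$$
from which we deduce, as $a(\cdot,\cdot)$ is elliptic, $|u - u_\varepsilon| = O(\varepsilon)$.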
Corollary 1. Under the assumptions (1), we assume, in addition, that $b(\cdot,\cdot)$
can be written $b(u, v) = (Bu, Bv)$, where B is a linear continuous operator
from V into a Hilbert space Λ, with closed range. Then $|u_\varepsilon - u| = O(\varepsilon)$.
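Proof. Let us show that the assumption of Proposition 3 is met. Note first that ξ vanishes over K: since u minimizes J over the linear space K, $a(u, v) = \langle\varphi, v\rangle$ for all $v \in K$. It is therefore sufficient
to prove that any $\xi \in V'$ which vanishes over K identifies through $b(\cdot,\cdot)$ with
some $\tilde\xi \in V$, i.e. there exists $\tilde\xi \in V$ such that
$$\langle\xi, v\rangle = b(\tilde\xi, v) \quad \forall v \in V.$$
As ξ vanishes over K, it can be seen as a linear functional defined
on $K^\perp$, so that it is equivalent to establish that $T : V \to (K^\perp)'$ defined by
$$\tilde\xi \longmapsto \zeta : \langle\zeta, v\rangle = b(\tilde\xi, v) \quad \forall v \in K^\perp$$
is surjective. We denote by $T^\star \in \mathcal{L}(K^\perp, V)$ the adjoint of T. For all $w \in K^\perp$,
$$|T^\star w| = \sup_{v \neq 0} \frac{(T^\star w, v)}{|v|} = \sup_{v \neq 0} \frac{b(w, v)}{|v|} = \sup_{v \neq 0} \frac{(Bw, Bv)}{|v|} \ge \frac{|Bw|^2}{|w|}.$$
As B has closed range, $|Bw| \ge C |w|$ for all w in $(\ker B)^\perp = K^\perp$, so that
$$|T^\star w| \ge C^2 |w| \quad \forall w \in K^\perp,$$
from which we conclude that T is surjective (by the closed range theorem, an operator whose adjoint is bounded below is surjective).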
Remark 1. Note that Proposition 3 is strictly stronger than its corollary. Consider the standard situation $V = H^1(\Omega)$, where Ω is a smooth, bounded
domain, $a(u, v) = \int_\Omega \nabla u \cdot \nabla v$, $\langle\varphi, v\rangle = \int_\Omega f v$, where $f \in L^2(\Omega)$, and
$b(v, v) = \int_{\partial\Omega} v^2$. In this situation the corollary cannot be used, because the
trace operator from $H^1(\Omega)$ into $L^2(\partial\Omega)$ does not have a closed range. On the
other hand, one can establish that
$$\langle\xi, v\rangle = -\int_{\partial\Omega} \frac{\partial u}{\partial n}\, v$$
and, as the solution u is regular ($u \in H^2(\Omega)$), its normal derivative (in
$H^{1/2}(\partial\Omega)$) can be built as the trace of a function in $H^1(\Omega)$; taking $\tilde\xi$ as minus this function, Proposition 3 holds true.
We conclude this section with some considerations concerning the saddle-point formulation of the constrained problem. We consider again the closed-range
situation:
Proposition 4. Under the assumptions of Corollary 1, there exists $\lambda \in \Lambda$
such that
$$a(u, v) + (\lambda, Bv) = \langle\varphi, v\rangle \quad \forall v \in V. \tag{5}$$
The solution is unique in B(V) (which identifies with $\Lambda/\ker B^\star$).
Proof. The proof of this standard property can be found in [BF91]. In fact, it
has just been established in the proof of Corollary 1: λ is simply $B\tilde\xi$. As for
uniqueness in B(V), consider two solutions $\lambda_1, \lambda_2$ in B(V). Equation (5) implies
that $\lambda_2 - \lambda_1$ is in $\ker B^\star = B(V)^\perp$; as it also lies in B(V), it vanishes.
Proposition 5. Under the assumptions of Proposition 4 (the assumptions (1),
and B(V) is closed), we introduce
$$\lambda_\varepsilon = \frac{1}{\varepsilon} B u_\varepsilon.$$
Then $|\lambda_\varepsilon - \lambda| = O(\varepsilon)$, where λ is the unique solution of (5) in B(V).
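Proof. Subtracting the variational formulations for u and $u_\varepsilon$, we get
$$(\lambda_\varepsilon - \lambda, Bv) = a(u - u_\varepsilon, v) \quad \forall v \in V.$$
Now, as the range of B is closed, and $\lambda_\varepsilon - \lambda \in B(V) = (\ker B^\star)^\perp$, we have
the inf-sup condition (see, e.g., [BF91])
$$\sup_{v \in V} \frac{(\lambda_\varepsilon - \lambda, Bv)}{|v|} \ge \beta |\lambda_\varepsilon - \lambda|,$$
so that
$$\beta |\lambda_\varepsilon - \lambda| \le \sup_{v \in V} \frac{(\lambda_\varepsilon - \lambda, Bv)}{|v|} = \sup_{v \in V} \frac{a(u - u_\varepsilon, v)}{|v|} \le \|a\| \, |u_\varepsilon - u|,$$
which ensures first order convergence thanks to Corollary 1.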
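Propositions 3 to 5 can be observed on a hypothetical two-dimensional instance of (1). Take V = ℝ², a(u, v) = u·Av with A = diag(2, 1), and ⟨φ, v⟩ = v₁ + v₂. Take b(u, v) = (Bu)(Bv) with B = [1, −1], so that K = {u₁ = u₂}, Λ = ℝ, and B has closed range. The constrained solution is u = (2/3, 2/3) with multiplier λ = −1/3. The sketch solves the penalized equations (A + BᵀB/ε)u = φ. It shows that both |u_ε − u|/ε and |λ_ε − λ|/ε stay bounded; for this instance they tend to √5/9 and 2/9.

```python
import math
from typing import List, Tuple


def solve2(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 2x2 linear system by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]


A = [[2.0, 0.0], [0.0, 1.0]]      # a(u, v) = u.Av, symmetric and elliptic
B = [1.0, -1.0]                    # b(u, v) = (Bu)(Bv), K = ker B = {u1 = u2}
PHI = [1.0, 1.0]                   # <phi, v> = v1 + v2
U_EXACT = [2.0 / 3.0, 2.0 / 3.0]   # minimizer of J over K
LAMBDA_EXACT = -1.0 / 3.0          # solves Au + B^T lambda = phi, i.e. (5)


def penalized(eps: float) -> Tuple[List[float], float]:
    """u_eps from (A + B^T B / eps) u = phi, i.e. (2), and lambda_eps = B u_eps / eps."""
    m = [[A[i][j] + B[i] * B[j] / eps for j in range(2)] for i in range(2)]
    u = solve2(m, PHI)
    lam = (B[0] * u[0] + B[1] * u[1]) / eps
    return u, lam


for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    u, lam = penalized(eps)
    err_u = math.hypot(u[0] - U_EXACT[0], u[1] - U_EXACT[1])
    err_lam = abs(lam - LAMBDA_EXACT)
    print(f"eps={eps:.0e}  |u_eps-u|/eps={err_u / eps:.4f}  |lam_eps-lam|/eps={err_lam / eps:.4f}")
```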
2.2 Discretized Problem
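We consider now a family $(V_h)_h$ of inner approximation spaces ($V_h \subset V$) and
the associated penalized/discretized problems
$$\left\{\begin{aligned} &\text{Find } u_h^\varepsilon \in V_h \text{ such that } J_h^\varepsilon(u_h^\varepsilon) = \inf_{v_h \in V_h} J_h^\varepsilon(v_h),\\ &J_h^\varepsilon(v_h) = \frac12 a(v_h, v_h) + \frac{1}{2\varepsilon} b(v_h, v_h) - \langle\varphi, v_h\rangle. \end{aligned}\right. \tag{6}$$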
As far as we know, there does not exist any general theory which would
give an upper bound for the error $|u - u_h^\varepsilon|$ as the sum of a discretization error (typically h or $h^{1/2}$ for volume penalty, depending on whether the mesh
is boundary-fitted or not) and a penalty error (typically ε for closed-range
penalty terms). We propose here two general properties which are direct consequences of standard arguments. They are suboptimal in the sense that neither
of them is optimal from both standpoints (discretization and penalty), but,
at least, they make it possible to recover the behavior in extreme situations
(when ε goes to 0 much faster than h, and conversely).
We shall need the following lemma:
Lemma 1. Under the assumptions (1), there exists C > 0 such that
b(uε , uε ) ≤ Cε |u − uε | .
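Proof. By the definition of $u_\varepsilon$,
$$J_\varepsilon(u_\varepsilon) = \frac12 a(u_\varepsilon, u_\varepsilon) - \langle\varphi, u_\varepsilon\rangle + \frac{1}{2\varepsilon} b(u_\varepsilon, u_\varepsilon) \le J_\varepsilon(u) = \frac12 a(u, u) - \langle\varphi, u\rangle,$$
so that
$$0 \le \frac{1}{2\varepsilon} b(u_\varepsilon, u_\varepsilon) \le \frac12 a(u, u) - \frac12 a(u_\varepsilon, u_\varepsilon) + \langle\varphi, u_\varepsilon - u\rangle = \frac12 a(u + u_\varepsilon, u - u_\varepsilon) + \langle\varphi, u_\varepsilon - u\rangle,$$
which yields the estimate by continuity of $a(\cdot,\cdot)$ and φ, since $(u_\varepsilon)$ is bounded.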
Proposition 6. Under the assumptions (1), we denote by $u_h^\varepsilon$ the solution to
the problem (6). Then
$$|u_h^\varepsilon - u| \le C \left( \min_{v_h \in V_h \cap K} |v_h - u|^2 + |u_\varepsilon - u| \right)^{1/2}.$$
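Proof. As $u_h^\varepsilon$ minimizes $a(v - u_\varepsilon, v - u_\varepsilon) + b(v - u_\varepsilon, v - u_\varepsilon)/\varepsilon$ over $V_h$,
$$\begin{aligned}
\alpha |u_h^\varepsilon - u_\varepsilon|^2 &\le a(u_h^\varepsilon - u_\varepsilon, u_h^\varepsilon - u_\varepsilon)\\
&\le a(u_h^\varepsilon - u_\varepsilon, u_h^\varepsilon - u_\varepsilon) + \frac{1}{\varepsilon} b(u_h^\varepsilon - u_\varepsilon, u_h^\varepsilon - u_\varepsilon)\\
&\le \min_{v_h \in V_h} \left( a(v_h - u_\varepsilon, v_h - u_\varepsilon) + \frac{1}{\varepsilon} b(v_h - u_\varepsilon, v_h - u_\varepsilon) \right)\\
&\le \min_{v_h \in V_h \cap K} \left( a(v_h - u_\varepsilon, v_h - u_\varepsilon) + \frac{1}{\varepsilon} b(v_h - u_\varepsilon, v_h - u_\varepsilon) \right).
\end{aligned}$$
As $v_h$ is in K, the second term is $b(u_\varepsilon, u_\varepsilon)/\varepsilon$, which is bounded by $C |u_\varepsilon - u|$
(by Lemma 1). Finally, we get
$$|u_h^\varepsilon - u_\varepsilon| \le C \left( \min_{v_h \in V_h \cap K} |v_h - u_\varepsilon|^2 + |u_\varepsilon - u| \right)^{1/2},$$
from which we conclude by the triangle inequality.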
Proposition 7. Under the assumptions (1), it holds
$$|u_h^\varepsilon - u| \le \frac{C}{\sqrt\varepsilon} \inf_{v_h \in V_h} |u_\varepsilon - v_h| + |u_\varepsilon - u|,$$
where $u_h^\varepsilon$ is the solution to (6).
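Proof. One has
$$|u_h^\varepsilon - u| \le |u_h^\varepsilon - u_\varepsilon| + |u_\varepsilon - u|,$$
and we control the first term by Céa's lemma applied to the symmetric bilinear form
$a + b/\varepsilon$. Its ellipticity constant remains bounded below by α, while its continuity constant M behaves like 1/ε, so that the symmetric (energy-norm) version of Céa's lemma, with constant $\sqrt{M/\alpha}$, yields the factor $C/\sqrt\varepsilon$.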
Example 2. The simplest example of penalty formulation one may think of
is the following: the constraint to vanish on a subdomain $O \subset\subset \Omega$ is handled
by minimizing the functional
$$J_\varepsilon(v) = \frac12 \int_\Omega |\nabla v|^2 - \int_\Omega f v + \frac{1}{2\varepsilon} \int_O v^2. \tag{7}$$
Example 1 (which is a one-dimensional version of this situation) suggests that
$|u_\varepsilon - u|$ behaves like $\varepsilon^{1/4}$. If we admit this convergence rate, Proposition 6
provides an estimate in $h^{1/2} + \varepsilon^{1/8}$. This estimate is optimal in h: the natural
space discretization order is obtained if ε is small enough ($\varepsilon = h^4$ in the
present case).
Symmetrically, the natural order in ε can be recovered if h is small enough.
Indeed, if we admit that $u_\varepsilon$ can be approximated at the same order as u over
Ω, which is 1/2, then the choice $\varepsilon = h^{2/3}$ (i.e. $h = \varepsilon^{3/2}$) in Proposition 7 gives
$$|u_h^\varepsilon - u| \le \frac{C}{\varepsilon^{1/2}} \varepsilon^{3/4} + \varepsilon^{1/4} = O(\varepsilon^{1/4}).$$
Notice that if we replace $v^2$ by $v^2 + |\nabla v|^2$ in the integral over O in (7),
the assumptions of Corollary 1 are fulfilled, so that convergence holds at the
first order in ε. As a consequence, $|u - u_h^\varepsilon|$ is bounded by $C(h^{1/2} + \varepsilon^{1/2})$ (by
Proposition 6), which suggests the choice ε = h.
3 Model Problem
This section is dedicated to a special situation, which can be seen as a scalar
version of the rigidity constraint in a Stokes flow.
We introduce a domain Ω ⊂ R2 (smooth, or polygonal and convex), and
O ⊂⊂ Ω which we suppose circular (see Remark 4, at the end of this paper,
for extensions to more general situations). We consider the following problem:
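$$\left\{\begin{aligned} -\Delta u &= f && \text{in } \Omega \setminus O,\\ u &= 0 && \text{on } \partial\Omega,\\ u &= U && \text{on } \partial O,\\ \int_{\partial O} \frac{\partial u}{\partial n} &= 0, && \end{aligned}\right. \tag{8}$$
where U is an unknown constant, and $f \in L^2(\Omega \setminus O)$. The scalar field u can
be seen as a temperature, and O as a zone with infinite conductivity.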
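Definition 1. We say that $u$ is a weak solution to (8) if $u \in V = H^1_0(\Omega)$, there exists $U \in \mathbb{R}$ such that $u = U$ almost everywhere in $O$, and
$$\int_\Omega \nabla u\cdot\nabla v = \int_\Omega f v \quad \forall v \in \mathcal{D}(\Omega) \text{ such that } \nabla v = 0 \text{ in } O.$$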
176
B. Maury
Proposition 8. The problem (8) admits a unique weak solution $u \in V = H^1_0(\Omega)$, which is characterized as the solution to the minimization problem
$$\begin{cases}
\text{Find } u \in K \text{ such that } J(u) = \displaystyle\inf_{v\in K} J(v), \quad \text{with } J(v) = \frac{1}{2}\int_\Omega |\nabla v|^2 - \int_\Omega f v,\\[1ex]
K = \{v \in H^1_0(\Omega) \mid \nabla v = 0 \text{ a.e. in } O\},
\end{cases}\tag{9}$$
where $f$ has been extended by $0$ inside $O$.
Furthermore, the restriction of $u$ to $\Omega\setminus O$ is in $H^2(\Omega\setminus O)$.
Proof. Existence and uniqueness are direct consequences of the Lax–Milgram theorem applied in $K = \{v \in V \mid \nabla v = 0 \text{ a.e. in } O\}$, which gives, in addition, the characterization of $u$ as the solution to (9). Now the restriction of $u$ to $\Omega\setminus O$ satisfies $-\Delta u = f$, with regular Dirichlet boundary conditions on the boundary of $\Omega\setminus O$, which decomposes as $\partial O \cup \partial\Omega$. As $\Omega$ is a convex polygon and $\partial O$ is smooth, standard theory ensures $u|_{\Omega\setminus O} \in H^2(\Omega\setminus O)$.
We introduce the penalized version of the problem (9)
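$$\begin{cases}
\text{Find } u^\varepsilon \in V \text{ such that } J^\varepsilon(u^\varepsilon) = \displaystyle\inf_{v\in V} J^\varepsilon(v),\\[1ex]
J^\varepsilon(v) = \displaystyle\frac{1}{2}\int_\Omega |\nabla v|^2 + \frac{1}{2\varepsilon}\int_O |\nabla v|^2 - \int_\Omega f v,
\end{cases}\tag{10}$$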
for which linear convergence can be expected:
Proposition 9. Let $u$ be the solution to the problem (9), $u^\varepsilon$ the solution to the problem (10). It holds $\|u - u^\varepsilon\|_{H^1(\Omega)} = O(\varepsilon)$.
Proof. Let us show that
$$B : v \in H^1_0(\Omega) \longmapsto \nabla v \in L^2(O)^2$$
has a closed range. Consider $\mu \in \Lambda$ with $\mu = \nabla v$. We define $w \in H^1(O)$ as $w = v - m(v)$, where $m(v)$ is the mean value of $v$ over $O$. By the Poincaré–Wirtinger inequality, one has
$$\|w\|_{H^1(O)} \le C\,\|\mu\|_{L^2(O)^2}.$$
Now, as $O \subset\subset \Omega$, there exists a continuous extension operator from $H^1(O)$ to $H^1_0(\Omega)$, so that we can extend $w$ to obtain $\tilde w \in H^1_0(\Omega)$ with $B\tilde w = \nabla w = \mu$ and a norm controlled by $\|\mu\|_{L^2(O)^2}$, which proves the closed character of $B(V)$. The linear convergence is then given by Corollary 1.
3.1 Standard Estimate
Now we consider the family of Cartesian triangulations $(T_h)$ of the square $\Omega$ (see Fig. 1), and we denote by $V_h$ the standard Finite Element space of continuous, piecewise affine functions with respect to $T_h$.
The discretized/penalized problem reads
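$$\begin{cases}
\text{Find } u_h^\varepsilon \in V_h \text{ such that } J^\varepsilon(u_h^\varepsilon) = \displaystyle\inf_{v_h\in V_h} J^\varepsilon(v_h),\\[1ex]
J^\varepsilon(v_h) = \displaystyle\frac{1}{2}\int_\Omega |\nabla v_h|^2 + \frac{1}{2\varepsilon}\int_O |\nabla v_h|^2 - \int_\Omega f v_h.
\end{cases}\tag{11}$$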
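To see the mechanics of (11) in the simplest setting, here is a minimal sketch of a one-dimensional analogue, chosen for illustration only (the text treats the 2D problem): $\Omega = (0,1)$, $O = (0.2, 0.5)$, $f = 1$ extended by zero in $O$, and P1 elements on a uniform mesh. The penalty term simply multiplies the element stiffness by $1 + 1/\varepsilon$ inside $O$. The function names and the hand-computed constrained solution (constant value $U = 0.05$ on $O$) are illustrative assumptions. In 1D with piecewise constant data aligned with the mesh, the P1 nodal values coincide with those of $u^\varepsilon$, so the printed error isolates the penalty error; the discrete multiplier $(u_h^\varepsilon)'/\varepsilon$ on $O$ should approach the flux $0.15$.

```python
import numpy as np


def solve_penalized_1d(n, eps, a=0.2, b=0.5, f=1.0):
    """P1 finite elements for a 1D analogue of the penalized problem (11).

    Minimizes 1/2 int_0^1 |v'|^2 + 1/(2 eps) int_a^b |v'|^2 - int f v over
    continuous piecewise-linear v on a uniform mesh with v(0) = v(1) = 0,
    the source f being extended by 0 inside the obstacle O = (a, b).
    Returns the nodes, the nodal values of u_h^eps and, per element,
    lambda_h^eps = (u_h^eps)' / eps on O (0 outside O).
    """
    x = np.linspace(0.0, 1.0, n + 1)
    h = 1.0 / n
    mid = 0.5 * (x[:-1] + x[1:])
    in_obstacle = (mid > a) & (mid < b)
    k = np.where(in_obstacle, 1.0 + 1.0 / eps, 1.0)  # elementwise coefficient
    fe = np.where(in_obstacle, 0.0, f)               # elementwise source

    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    for e in range(n):
        i, j = e, e + 1
        s = k[e] / h
        A[i, i] += s
        A[j, j] += s
        A[i, j] -= s
        A[j, i] -= s
        # Exact integral of a constant source against the two hat functions.
        rhs[i] += fe[e] * h / 2
        rhs[j] += fe[e] * h / 2

    u = np.zeros(n + 1)  # homogeneous Dirichlet values at x = 0 and x = 1
    u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], rhs[1:-1])
    lam = np.where(in_obstacle, np.diff(u) / h / eps, 0.0)
    return x, u, lam


def exact_constrained(x):
    """Constrained solution for a = 0.2, b = 0.5, f = 1 (computed by hand)."""
    x = np.asarray(x)
    left = -x**2 / 2 + 0.35 * x
    right = -x**2 / 2 + 0.65 * x - 0.15
    return np.where(x <= 0.2, left, np.where(x < 0.5, 0.05, right))


for eps in (1e-2, 1e-4, 1e-6):
    x, u, lam = solve_penalized_1d(100, eps)
    err = np.max(np.abs(u - exact_constrained(x)))
    print(f"eps={eps:.0e}  max nodal error={err:.2e}  lambda_h={lam[lam != 0].mean():.4f}")
```

As $\varepsilon$ decreases by factors of 100 the nodal error should drop roughly by the same factor, in line with the linear rate of Proposition 9. A dense solver is used only for brevity; a tridiagonal solver would be the natural choice.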
Proposition 10 (Error estimate for (8)). Let $u$ be the weak solution to (9) and $u_h^\varepsilon$ the solution to (11). There exists $C > 0$ such that
$$|u - u_h^\varepsilon| \le C(h^{1/2} + \varepsilon^{1/2}). \tag{12}$$
Proof. The proof relies on Proposition 6, which asserts
$$|u_h^\varepsilon - u| \le C\left(\min_{v_h\in V_h\cap K} |v_h - u| + \sqrt{|u^\varepsilon - u|}\right).$$
By Proposition 9, $|u^\varepsilon - u| = O(\varepsilon)$, so the second term scales like $\varepsilon^{1/2}$. As for the space discretization term, the $h^{1/2}$-estimate is given by Proposition 11 below.
Proposition 11 (Approximation of u). We make the same assumptions as in the beginning of the section, and we consider $u \in H^1_0(\Omega)$ such that $u = U \in \mathbb{R}$ a.e. in $O$ and $u|_{\Omega\setminus O} \in H^2(\Omega\setminus O)$. As previously, we consider a Cartesian triangulation $T_h$ of $\Omega$ and the associated first order Finite Element space $V_h$. There exists $C > 0$ such that
$$\inf_{\tilde u_h\in V_h} \|u - \tilde u_h\|_{H^1(\Omega)} \le C h^{1/2}.$$
Fig. 1. Domains $\Omega$, $O$, and the mesh $T_h$.
Proof. We shall use in the proof the following notations: given a domain $\omega$ and $v$ a function over $\omega$, we denote by $|v|_{0,\omega}$ the $L^2$-norm of $v$ over $\omega$, by
$$|v|_{1,\omega} = \left(\int_\omega |\nabla v|^2\right)^{1/2}$$
the $H^1$-seminorm, and so on.
We denote by $I_h$ the standard interpolation operator from $C(\overline{\Omega})$ onto $V_h$. Notice that $u$ is continuous over $\Omega$ (it is piecewise $H^2$, and continuous through the interface $\partial O$). Let us assume here that the constant value $U$ on $O$ is $0$ (which can be achieved by subtracting a smooth extension of this constant outside $O$). We define $\tilde u_h$ as the function in $V_h$ which is $0$ at every vertex contained in a triangle which intersects $O$, and which coincides with $I_h u$ at all other vertices. We introduce a narrow band around $O$
$$\omega_h = \{x \in \Omega \mid x \notin O,\ d(x, O) < 2\sqrt{2}\,h\}.$$
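As $u|_{\Omega\setminus O} \in H^2(\Omega\setminus O)$, the standard finite element estimate gives
$$|u - \tilde u_h|_{0,\Omega\setminus(O\cup\omega_h)} \le C h^2 |u|_{2,\Omega\setminus O}, \tag{13}$$
$$|u - \tilde u_h|_{1,\Omega\setminus(O\cup\omega_h)} \le C h\, |u|_{2,\Omega\setminus O}. \tag{14}$$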
By construction, both $L^2$- and $H^1$-errors in $O$ are zero. It remains to estimate the error in the band $\omega_h$. The principle is the following: $\tilde u_h$ is a poor approximation of $u$ in $\omega_h$, but this is not very harmful because $\omega_h$ is small. Similar estimates are proposed in [SMSTT05] or [AR]. We shall give here a proof more adapted to our situation. First of all, we write
$$\|u - \tilde u_h\|_{H^1(\omega_h)} \le |u|_{0,\omega_h} + |u|_{1,\omega_h} + |\tilde u_h|_{0,\omega_h} + |\tilde u_h|_{1,\omega_h} = A + B + C + D. \tag{15}$$
We assume here that $u$ is $C^2$ in $\Omega\setminus O$ (the general case $u \in H^2(\Omega\setminus O)$ is obtained immediately by density). Using polar coordinates (we assume here that the radius of $O$ is 1), we write
$$|u|_{1,\omega_h}^2 = \int_0^{2\pi}\!\!\int_1^{1+2h} |\nabla u|^2\, r\,dr\,d\theta.$$
For $i = 1, 2$, one has
$$\partial_i u(r,\theta) = \partial_i u(1,\theta) + \int_1^r \partial_r\partial_i u(s,\theta)\,ds,$$
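so that, using that $u$ is constant on $\partial O$ (hence $|\nabla u| = |\partial u/\partial n|$ there), the Cauchy–Schwarz inequality, and a trace inequality,
$$\begin{aligned}
|u|_{1,\omega_h}^2 &\le 2\sum_{i=1}^{2}\int_0^{2\pi}\!\!\int_1^{1+2h} |\partial_i u(1,\theta)|^2\, r\,dr\,d\theta + 2\sum_{i=1}^{2}\int_0^{2\pi}\!\!\int_1^{1+2h}\left(\int_1^r \partial_r\partial_i u\,ds\right)^2 r\,dr\,d\theta\\
&\le C h\int_{\partial O}\left|\frac{\partial u}{\partial n}\right|^2 + 2h\sum_{i=1}^{2}\int_0^{2\pi}\!\!\int_1^{1+2h}\!\!\int_1^{1+2h} |\partial_r\partial_i u|^2\,ds\,r\,dr\,d\theta\\
&\le C h\,|u|_{2,\Omega\setminus O}^2 + C' h^2\,|u|_{2,\Omega\setminus O}^2,
\end{aligned}$$
from which we deduce $|u|_{1,\omega_h} \le C h^{1/2}$. A similar computation on $u$ gives immediately $|u|_{0,\omega_h} \le C h^{3/2}$. As for $\tilde u_h$ (the two last terms in (15)), the proof is less trivial. It relies on three technical lemmas which we give now before ending the proof.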
Lemma 2. There exist constants $C$ and $C'$ such that, for any non-degenerate triangle $T$, for any function $w_h$ affine in $T$,
$$C\,|T|\,\|w_h\|_{L^\infty(T)}^2 \le \|w_h\|_{L^2(T)}^2 \le C'\,|T|\,\|w_h\|_{L^\infty(T)}^2. \tag{16}$$
Proof. It is a consequence of the fact that, when deforming the supporting triangle $T$, the $L^\infty$-norm is unchanged whereas the $L^2$-norm scales like $|T|^{1/2}$.
Lemma 3. There exists a constant $C$ such that, for any non-degenerate triangle $T$, for any function $w_h$ affine in $T$,
$$|w_h|_{1,T}^2 \le C\,\frac{|T|}{\rho_T^2}\,\|w_h\|_{L^\infty(T)}^2,$$
where $\rho_T$ is the diameter of the inscribed circle.
Proof. Again, it is a straightforward consequence of the fact that, when deforming the supporting triangle $T$, the $L^\infty$-norm is unchanged whereas the gradient (which is constant over the triangle) scales like $1/\rho_T$, so that the $H^1$-seminorm scales like $|T|^{1/2}/\rho_T$.
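Lemmas 2 and 3 can be probed numerically on random triangles. The sketch below (NumPy; the helper name is hypothetical) computes, for an affine function on a triangle, the exact $L^2$ norm from the P1 mass matrix, the $H^1$ seminorm from its constant gradient, and $\rho_T = 4|T|/\text{perimeter}$. One admissible set of constants, derived here for illustration rather than taken from the text, is $C = 1/9$ and $C' = 1$ in (16) (the ratio $\|w_h\|^2_{L^2(T)}/(|T|\,\|w_h\|^2_{L^\infty(T)})$ does not depend on the shape of $T$ at all), and $C = 9$ in Lemma 3, since each barycentric coordinate function has a gradient of length $1/(\text{altitude}) \le 1/\rho_T$.

```python
import numpy as np


def p1_triangle_norms(P, w):
    """Area, squared L2 norm, squared H1 seminorm, Linf norm and inscribed-circle
    diameter rho_T for the affine function with vertex values w on triangle P (3x2)."""
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float)
    e1, e2 = P[1] - P[0], P[2] - P[0]
    area = 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])
    # The P1 mass matrix is |T|/12 * [[2,1,1],[1,2,1],[1,1,2]], hence:
    l2_sq = area / 12.0 * (np.sum(w**2) + np.sum(w) ** 2)
    # The gradient g is constant: g . e1 = w1 - w0 and g . e2 = w2 - w0.
    g = np.linalg.solve(np.array([e1, e2]), np.array([w[1] - w[0], w[2] - w[0]]))
    h1_sq = area * float(g @ g)
    linf = float(np.max(np.abs(w)))  # attained at a vertex for an affine function
    perimeter = sum(np.linalg.norm(P[i] - P[(i + 1) % 3]) for i in range(3))
    rho = 4.0 * area / perimeter     # twice the inradius 2|T|/perimeter
    return area, l2_sq, h1_sq, linf, rho


rng = np.random.default_rng(0)
ratios, lemma3 = [], []
for _ in range(2000):
    P = rng.uniform(-1.0, 1.0, size=(3, 2))
    w = rng.uniform(-1.0, 1.0, size=3)
    area, l2_sq, h1_sq, linf, rho = p1_triangle_norms(P, w)
    if area < 1e-4:
        continue  # skip nearly degenerate samples
    ratios.append(l2_sq / (area * linf**2))
    lemma3.append(h1_sq * rho**2 / (area * linf**2))
print(min(ratios), max(ratios))  # stays inside [1/9, 1], whatever the shape of T
print(max(lemma3))               # stays below 9
```

The lower value $1/9$ is attained for vertex values $(1, -1/3, -1/3)$ and the upper value $1$ for constant functions, which is why no shape condition enters Lemma 2.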
The last lemma quantifies how one can control the $L^2$-norm of the interpolant of a regular function on a triangle, by means of the $L^2$-norm and the $H^2$-seminorm of the function.
Lemma 4. There exists a constant $C$ such that, for any regular triangle $T$ (see below), for any $u \in H^2(T)$,
$$|I_h u|_{0,T}^2 \le C\left(|u|_{0,T}^2 + h^4\,|u|_{2,T}^2\right).$$
By regular, we mean that $T$ runs over a set of triangles such that the flatness $\operatorname{diam} T/\rho_T$ is bounded.
Proof. The interpolation operator $I_h : H^2(T) \to L^2(T)$ is continuous, and $|u|_{2,T}$ scales like $h/\rho_T^2 \approx 1/h$ whereas the $L^2$-norms scale like $h$.
We may now complete the proof of Proposition 11. The problematic triangles are those on which $\tilde u_h$ coincides neither with $0$ nor with $I_h u$. On such triangles, $\tilde u_h$ sticks to $I_h u$ at 1 or 2 vertices, and vanishes at 2 or 1 vertices. As a consequence, the $L^\infty$-norm of $\tilde u_h$ is at most the $L^\infty$-norm of $I_h u$. Let $T$ be such a triangle. We write (using Lemma 2, the latter remark, the fact that $I_h$ is a contraction from $L^\infty$ onto $L^\infty$, Lemma 2 again, and Lemma 4),
$$|\tilde u_h|_{0,T}^2 \le C'|T|\,\|\tilde u_h\|_{L^\infty(T)}^2 \le C'|T|\,\|I_h u\|_{L^\infty(T)}^2 \le \frac{C'}{C}\,\|I_h u\|_{L^2(T)}^2 \le C''\left(|u|_{0,T}^2 + h^4\,|u|_{2,T}^2\right).$$
By summing up all these contributions over all triangles outside $O$ which intersect $\omega_h$ (they are all contained in $\omega_{2h}$) and using the fact that the $L^2$-norm of $u$ on $\omega_h$ behaves like $h^{3/2}|u|_{2,\omega_h}$, we obtain
$$|\tilde u_h|_{0,\omega_h}^2 \le \sum_{T\cap\omega_h\neq\emptyset} |\tilde u_h|_{0,T}^2 \le C\left(|u|_{0,\omega_{2h}}^2 + h^4\,|u|_{2,\omega_{2h}}^2\right) \le C h^3\,|u|_{2,\omega_{2h}}^2,$$
which gives the expected $h^{3/2}$-estimate for $|\tilde u_h|_{0,\omega_h}$. The last term of (15) is handled straightforwardly: thanks to Lemmas 2 and 3, which imply the inverse inequality $|\tilde u_h|_{1,\omega_h} \le C h^{-1}|\tilde u_h|_{0,\omega_h}$, we obtain the $h^{1/2}$ bound for $|\tilde u_h|_{1,\omega_h}$.
3.2 Primal/Dual Estimate
Proposition 5 asserts the convergence of $\lambda^\varepsilon$ towards $\lambda$, the Lagrange multiplier associated with the constraint. One may wonder whether $\lambda_h^\varepsilon = Bu_h^\varepsilon/\varepsilon$ is likely to approximate $\lambda$. In general, a positive answer to that question can be given as soon as a uniform discrete inf-sup condition for $B$ is fulfilled. This condition is not verified in the situation we consider. The non-uniformity of the inf-sup condition is due to the fact that there may exist triangles whose intersection with $O$ is very small. We propose here a way to overcome this problem by suppressing those tiny areas in the penalty term, which leads us to introduce a discrete version $B_h$ of $B$. Let us first give some properties of the continuous Lagrange multiplier; we shall then give a precise description of the way the obstacle is lifted.
Proposition 12 (Saddle-point formulation of (9)). Let $u$ be the weak solution to (8). There exists a unique $\lambda \in \Lambda = L^2(O)^2$ such that $\lambda$ is a gradient, and
$$\int_\Omega \nabla u\cdot\nabla v + \int_O \lambda\cdot\nabla v = \int_\Omega f v \quad \forall v \in V.$$
In addition, $\lambda$ is in $H^1(O)^2$.
Proof. The first part is a consequence of Proposition 4 (we established in the proof of Proposition 9 that the range of $B$ is closed), which ensures the existence of $\lambda \in \Lambda$ and its uniqueness in $B(V)$.
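Let us now describe $\lambda$. We have
$$\int_\Omega \nabla u\cdot\nabla v + \int_O \lambda\cdot\nabla v = \int_\Omega f v,$$
so that, by taking test functions in $\mathcal{D}(O)$, we get $\lambda \in H_{\mathrm{div}}(O)$ with $\nabla\cdot\lambda = 0$.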
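Taking now test functions which do not vanish on the boundary of $O$, we identify the normal trace of $\lambda$ as $\partial u/\partial n \in H^{1/2}(\partial O)$. Therefore, $\lambda$ is defined as the unique divergence-free vector field in $O$ with normal component equal to $\partial u/\partial n$ on $\partial O$ which, in addition, is a gradient. In other words, $\lambda = \nabla\Phi$, with
$$\begin{cases}
\Delta\Phi = 0 & \text{in } O,\\[0.5ex]
\dfrac{\partial\Phi}{\partial n} = \dfrac{\partial u}{\partial n} & \text{on } \partial O.
\end{cases}$$
As $O$ is smooth, $\Phi \in H^2(O)$, so that $\lambda = \nabla\Phi \in H^1(O)^2$.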
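For the circular obstacle, the Neumann problem defining $\Phi$ can be solved explicitly by Fourier series. The sketch below assumes that $O$ is the unit disc centred at the origin (as in the proof of Proposition 11), that $n$ is the outward radial normal, and that the data $g = \partial u/\partial n$ are sampled at $N$ equispaced angles; the zero-mean requirement is the last condition in (8). The name `multiplier_in_unit_disc` is hypothetical. Writing $g = \operatorname{Re}\sum_k d_k e^{ik\theta}$, the function $\Phi = \operatorname{Re} F$ with $F(z) = \sum_k d_k z^k/k$ is harmonic and satisfies $\partial\Phi/\partial r = g$ on the circle, so $\lambda = \nabla\Phi = (\operatorname{Re} F'(z), -\operatorname{Im} F'(z))$.

```python
import numpy as np


def multiplier_in_unit_disc(g_samples, x, y):
    """Evaluate lambda = grad Phi at points (x, y) of the unit disc, where Phi is
    harmonic with dPhi/dr = g on the unit circle.

    g_samples holds g(theta_j), theta_j = 2*pi*j/N, and must have zero mean.
    With g = Re sum_k d_k exp(i k theta) and F(z) = sum_k d_k z**k / k,
    Phi = Re F and lambda = (Re F'(z), -Im F'(z)).
    """
    g = np.asarray(g_samples, dtype=float)
    N = g.size
    c = np.fft.rfft(g) / N
    if abs(c[0]) > 1e-10 * max(1.0, np.abs(g).max()):
        raise ValueError("Neumann data must have zero mean")
    K = (N - 1) // 2             # drop the Nyquist mode when N is even
    d = 2.0 * c[1:K + 1]         # d_k = a_k - i b_k
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    dF = np.zeros_like(z)
    for k in range(K, 0, -1):    # Horner evaluation of F'(z) = sum_k d_k z**(k-1)
        dF = dF * z + d[k - 1]
    return dF.real, -dF.imag


theta = 2 * np.pi * np.arange(64) / 64
# g = sin(2 theta) gives Phi = x * y, hence lambda(x, y) = (y, x).
print(multiplier_in_unit_disc(np.sin(2 * theta), 0.3, -0.2))
```

In this sketch the single layer weight of Remark 6 is simply the sampled data $g$ itself, while the interior values of $\lambda$ come from the analytic extension.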
Now we consider again the family of Cartesian triangulations $(T_h)$ of the square $\Omega$ (see Fig. 1), and we denote by $V_h$ the standard Finite Element space of continuous, piecewise affine functions with respect to $T_h$. As indicated at the beginning of this section, we suppress the small areas in the computation of the penalty term by introducing a reduced obstacle $O_h$:
Definition 2. The reduced obstacle $O_h \subset O$ is defined as the union of the sets $T \cap O$, where $T$ runs over triangles of $T_h$ such that their intersection with $O$ compares reasonably with their own size, in the following sense: given $\eta > 0$ a fixed parameter, we set
$$O_h = \bigcup_{|T\cap O| \ge \eta |T|} (T \cap O). \tag{17}$$
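Definition 2 is straightforward to implement once $|T \cap O|$ can be estimated. The sketch below rests on illustrative assumptions: $\Omega = [-2, 2]^2$, $O$ the unit disc, the Cartesian triangulation of Fig. 1 built by cutting each square along one diagonal, and $|T\cap O|/|T|$ approximated by the fraction of centroids of $m^2$ congruent subtriangles lying in $O$ (exact clipping of a triangle by a disc would also work but is longer). The value $\eta = 0$ lies outside Definition 2 and is used only as a reference that keeps every triangle meeting $O$. All function names are hypothetical.

```python
import numpy as np


def cartesian_triangles(n, half_width=2.0):
    """Cartesian triangulation of [-L, L]^2: n x n squares, each cut along a diagonal."""
    xs = np.linspace(-half_width, half_width, n + 1)
    tris = []
    for i in range(n):
        for j in range(n):
            p00, p10 = (xs[i], xs[j]), (xs[i + 1], xs[j])
            p01, p11 = (xs[i], xs[j + 1]), (xs[i + 1], xs[j + 1])
            tris.append((p00, p10, p11))
            tris.append((p00, p11, p01))
    return np.array(tris)  # shape (2 n^2, 3, 2)


def subtriangle_centroids(m):
    """Reference coordinates (s, t) of the centroids of the m^2 equal subtriangles."""
    pts = []
    for i in range(m):
        for j in range(m - i):
            pts.append(((i + 1 / 3) / m, (j + 1 / 3) / m))      # "upward" subtriangle
            if i + j <= m - 2:
                pts.append(((i + 2 / 3) / m, (j + 2 / 3) / m))  # "downward" subtriangle
    return np.array(pts)


def reduced_obstacle(tris, eta, radius=1.0, m=16):
    """Flag the triangles kept in O_h, i.e. T ∩ O nonempty and |T ∩ O| >= eta |T|,
    for O the disc of the given radius centred at the origin.

    |T ∩ O| / |T| is approximated by the fraction of subtriangle centroids in O."""
    st = subtriangle_centroids(m)
    p0, p1, p2 = tris[:, 0], tris[:, 1], tris[:, 2]
    pts = (p0[:, None, :]
           + st[None, :, 0:1] * (p1 - p0)[:, None, :]
           + st[None, :, 1:2] * (p2 - p0)[:, None, :])
    frac = np.mean(np.sum(pts**2, axis=2) < radius**2, axis=1)
    e1, e2 = p1 - p0, p2 - p0
    area = 0.5 * np.abs(e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])
    keep = (frac > 0) & (frac >= eta)
    return keep, frac, area


tris = cartesian_triangles(40)
for eta in (0.0, 0.1, 0.5):
    keep, frac, area = reduced_obstacle(tris, eta)
    # eta, number of kept triangles, approximate area of O_h
    print(eta, int(keep.sum()), float(np.sum(frac[keep] * area[keep])))
```

Increasing $\eta$ removes the triangles with small cut areas, which is what allows a uniform inf-sup constant, at the price of the discrepancy between $B_h$ and $B$ discussed in Remark 2.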
Definition 3. We recall that $V = H^1_0(\Omega)$, $\Lambda$ is $L^2(O)^2$, and $B \in \mathcal{L}(V, \Lambda)$ is the gradient operator (see Proposition 12). We define $B_h \in \mathcal{L}(V, \Lambda)$ as
$$v \in V \longmapsto \mu = B_h v = \chi_{O_h}\nabla v,$$
where $\chi_{O_h}$ is the characteristic function of $O_h$ (see Definition 2). Finally, the discretization space $\Lambda_h \subset \Lambda = L^2(O)^2$ is the set of all those vector fields $\mu_h$ such that their restriction to $O_h$ is the gradient of a scalar field $v_h \in V_h$, and which vanish almost everywhere in $O\setminus O_h$, which we can express
$$\Lambda_h = \{\mu_h \in \Lambda \mid \exists v_h \in V_h,\ \mu_h = B_h v_h\} = B_h(V_h).$$
The fully discretized problem reads
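$$\begin{cases}
\text{Find } u_h^\varepsilon \in V_h \text{ such that } J_h^\varepsilon(u_h^\varepsilon) = \displaystyle\inf_{v_h\in V_h} J_h^\varepsilon(v_h),\\[1ex]
J_h^\varepsilon(v_h) = \displaystyle\frac{1}{2}\int_\Omega |\nabla v_h|^2 + \frac{1}{2\varepsilon}\int_{O_h} |\nabla v_h|^2 - \int_\Omega f v_h.
\end{cases}\tag{18}$$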
We may now state the primal/dual estimate.
Proposition 13 (Primal/dual error estimate for (8)). Let $u$ be the weak solution to (8), $u_h^\varepsilon$ the solution to (18), $\lambda$ the Lagrange multiplier (see Proposition 12), and $\lambda_h^\varepsilon = B_h u_h^\varepsilon/\varepsilon$ (see Definition 3). We have the following error estimate:
$$|u - u_h^\varepsilon| + |\lambda - \lambda_h^\varepsilon| \le C(h^{1/2} + \varepsilon^{1/2}). \tag{19}$$
Proof. The proof of this estimate is quite technical (in particular, the discrete inf-sup condition, see below), and we shall detail it in a forthcoming paper. Let us simply say here that it relies on the following ingredients:
1. some general properties of the continuous penalty method which we established in the beginning of this section,
2. an abstract stability estimate for saddle point-like problems with stabilization, in the spirit of Theorem 1.2 in [BF91],
3. a uniform discrete inf-sup condition for $B_h$: for all $\lambda_h \in \Lambda_h$,
$$\sup_{v_h\in V_h} \frac{(B_h v_h, \lambda_h)}{|v_h|} \ge \beta\,\|\lambda_h\|_{\Lambda_h}, \tag{20}$$
4. some approximation properties for $V_h$ (Proposition 11 and a similar property for the Lagrange multiplier).
Remark 2 (Optimal estimate, role of $\eta$ in the definition of $O_h$). The estimate we establish is still suboptimal in $\varepsilon$: the order $1/2$ is obtained, whereas the continuous method converges linearly. This is due to the fact that we had to introduce a discrete operator $B_h$, and the difference leads to an extra term which scales like $\varepsilon^{1/2}$. It calls for some comments on the parameter $\eta$ which appears in the definitions of $O_h$ and $B_h$ (see Definitions 2 and 3). The smaller $\eta$ is, the closer $B_h$ is to $B$, which reduces the $\varepsilon^{1/2}$ term in the estimate. This observation may suggest letting $\eta$ go to zero in the theoretical estimate. But, on the other hand, when $\eta$ goes to 0, so does the inf-sup constant $\beta$ (see (20)), so that $1/\beta$, which is hidden in the constant $C$ in the error estimate (19), blows up.
Remark 3 (Boundary fitted meshes). Although it is somewhat in contradiction with its original purpose, the penalty method can be used together with a discretization based on a boundary fitted mesh. In that case, the approximation error no longer behaves like $h^{1/2}$, but like $h$. More importantly, it is not necessary to get rid of the tiny triangles which were incompatible, in the case of a Cartesian mesh, with the uniform discrete inf-sup condition. Now, considering that the half order in $\varepsilon$ was lost because we introduced a reduced obstacle, one can expect to recover the optimal order of convergence, both in $h$ and in $\varepsilon$.
Remark 4 (Technical assumptions). Some assumptions we made are only technical and can surely be relaxed without changing the convergence results. For example, the inclusion, which we supposed circular, could be any smooth domain. Note that a convex polygon is not acceptable, as it is seen by the problem from the outside, so that $u$ may no longer be in $H^2$, which rules out some of the approximation properties we used.
Remark 5 (Convergence in space). The poor rate of convergence in $h$ is optimal for a non-boundary-fitted mesh, at least if we consider the $H^1$-error over all of $\Omega$. Indeed, as the solution is constant inside $O$ and non-constant outside, with a jump in the normal derivative, the error on the gradient within each element intersecting $\partial O$ is $O(1)$ in the $L^\infty$-norm. By summing up over all those triangles, which cover a zone whose measure scales like $h$, we end up with this $h^{1/2}$-error. Note that a better convergence could be expected, in theory, if one considers only the error in the domain of interest $\Omega\setminus O$, the question being now whether the bad convergence in the neighborhood of $\partial O$ pollutes the overall approximation. Our feeling is that this pollution actually occurs, because nothing is done in the present approach to distinguish the real domain of interest from the fictitious domain (inside the obstacle), so that the method tends to balance
the errors on both sides. An interesting way to favor the side of interest is proposed in [DP02] for a boundary penalty method; it consists in having the diffusion coefficient vanish within $O$. Note that other methods have been proposed to reach the optimal convergence rate on non-boundary-fitted meshes (see [Mau01]), but they are less straightforward to implement.
The simplest way to improve the actual order of convergence is to carry out a local refinement strategy in the neighborhood of $\partial O$ (see [RAB06], for example). By using elements of scale $h^2$ in this zone, one recovers the first order convergence in space, at least in practice.
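The $h^{1/2}$ rate discussed in Remark 5 can be observed in a one-dimensional analogue, given here as an illustration rather than taken from the text: interpolating $u(x) = |x - c|$, whose derivative jumps at $c$, with P1 elements on a uniform mesh that does not resolve $c$. Only the element containing $c$ contributes; if $c$ sits at $x_k + \theta h$, the interpolant has slope $1 - 2\theta$ there and the $H^1$-seminorm error is exactly $2\sqrt{h\,\theta(1-\theta)}$. The helper name `interp_h1_error` is hypothetical.

```python
import numpy as np


def interp_h1_error(n, c):
    """H1-seminorm error of the P1 nodal interpolant of u(x) = |x - c| on a
    uniform mesh of [0, 1] with n elements (exact elementwise integration)."""
    x = np.linspace(0.0, 1.0, n + 1)
    u = np.abs(x - c)
    err_sq = 0.0
    for k in range(n):
        a, b = x[k], x[k + 1]
        slope = (u[k + 1] - u[k]) / (b - a)  # derivative of the interpolant
        left = min(max(c - a, 0.0), b - a)   # length of the part of [a, b] where u' = -1
        right = (b - a) - left               # length of the part where u' = +1
        err_sq += left * (-1.0 - slope) ** 2 + right * (1.0 - slope) ** 2
    return np.sqrt(err_sq)


# With c = 1/3 and n = 4, 16, 64 the kink always sits at one third of an
# element, so the error should equal 2 * sqrt(h * theta * (1 - theta)), theta = 1/3.
for n in (4, 16, 64):
    print(n, interp_h1_error(n, 1 / 3), 2 * np.sqrt((1 / n) * (1 / 3) * (2 / 3)))
```

Each refinement by a factor 4 halves the error, i.e. order $1/2$, mirroring the band of width $O(h)$ around $\partial O$ in two dimensions.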
Remark 6 (Meaning of $\lambda$). As we already mentioned, the Lagrange multiplier $\lambda$ can often be interpreted as a force or a heat source which ensures the prescribed constraint, depending on the context, and it may be useful to estimate this term accurately. For example, the problem we considered can be reformulated as a control problem: find a source term $g$ with zero mean value (no heat is injected into the system), required to vanish outside $O$, such that the solution $u$ to
$$-\Delta u = f + g, \qquad u = 0 \text{ on } \partial\Omega,$$
is constant over $O$. This equation is to be considered in the distributional sense, as $g$ is surely not a function. (If it were in $L^2$, for example, $u$ would be in $H^2(\Omega)$, which is surely not true as its normal derivative undergoes a jump through $\partial O$.) Abstractly speaking, this source term $g$ is simply the opposite of the linear functional $\xi$ which we introduced (see (4)) and it is related to the Lagrange multiplier $\lambda$ (see (5)); since $\nabla\cdot\lambda = 0$ in $O$, an integration by parts gives
$$\langle g, v\rangle = -\langle \xi, v\rangle = -\int_O \lambda\cdot\nabla v = -\int_{\partial O} \lambda\cdot n\, v.$$
The source term $g$ is, therefore, a single layer distribution supported by $\partial O$, with weight $-\lambda\cdot n = -\partial u/\partial n$ (where $n$ is the outward normal to $\partial O$). Note that this weight is in $H^{1/2}(\partial O)$.
Remark 7. Note that letting $\varepsilon$ go to 0 for any $h > 0$ leads to an estimate for a fictitious domain method (à la Glowinski, i.e. based on the use of Lagrange multipliers). In [GG95], an error estimate is obtained for such a method; it relies on two independent meshes for the primal and dual components of the solution (subject to some compatibility conditions between the sizes of the two meshes). We recover this estimate in the situation where the local mesh is simply the restriction of the covering mesh to the obstacle (to the reduced obstacle $O_h$, to be more precise).
Remark 8. The approach we presented can be extended to other situations, like the one we already considered in Example 2, as soon as an $H^1$-penalty is used. The functional to minimize is then
$$J_\varepsilon(v) = \frac{1}{2}\int_\Omega |\nabla v|^2 - \int_\Omega f v + \frac{1}{2\varepsilon}\int_O \left(v^2 + |\nabla v|^2\right),$$
so that $B$ coincides with the restriction operator from $H^1_0(\Omega)$ to $H^1(O)$. The discrete inf-sup condition, as well as the approximation properties, are essentially the same as in the case we considered here.
Concerning the original problem of simulating fluid-particle flows, an extra
difficulty lies in the fact that two constraints of different types must be dealt
with (global incompressibility and local rigid motion). It raises additional
issues which shall be addressed in the future.
References
[ABF99] Ph. Angot, Ch.-H. Bruneau, and P. Fabrie. A penalization method to take into account obstacles in incompressible viscous flows. Numer. Math., 81(4):497–520, 1999.
[AR] Ph. Angot and I. Ramière. Convergence analysis of the Q1-finite element method for elliptic problems with non boundary-fitted meshes. Submitted.
[Bab73] I. Babuška. The finite element method with penalty. Math. Comp., 27:221–228, 1973.
[BF91] F. Brezzi and M. Fortin. Mixed and hybrid finite element methods, volume 15 of Springer Series in Computational Mathematics. Springer-Verlag, New York, 1991.
[DP02] S. Del Pino. Une méthode d'éléments finis pour la résolution d'EDP dans des domaines décrits par géométrie constructive. PhD thesis, Université Pierre et Marie Curie, Paris, 2002.
[DPP03] S. Del Pino and O. Pironneau. A fictitious domain based general PDE solver. In E. Heikkola, editor, Numerical Methods for Scientific Computing, Barcelona, 2003.
[ff3] freeFEM3D (http://www.freefem.org/ff3d/).
[FFp] freeFEM++ (http://www.freefem.org/).
[GG95] V. Girault and R. Glowinski. Error analysis of a fictitious domain method applied to a Dirichlet problem. Japan J. Indust. Appl. Math., 12(3):487–514, 1995.
[JLM05] J. Janela, A. Lefebvre, and B. Maury. A penalty method for the simulation of fluid-rigid body interaction. In CEMRACS 2004—mathematics and applications to biology and medicine, volume 14 of ESAIM Proc., pages 115–123 (electronic). EDP Sci., Les Ulis, 2005.
[JT96] A. A. Johnson and T. E. Tezduyar. Simulation of multiple spheres falling in a liquid-filled tube. Comput. Methods Appl. Mech. Engrg., 134(3–4):351–373, 1996.
[lif] LifeV (http://www.lifev.org/).
[Mau99] B. Maury. Direct simulations of 2D fluid-particle flows in biperiodic domains. J. Comput. Phys., 156(2):325–351, 1999.
[Mau01] B. Maury. A fat boundary method for the Poisson problem in a domain with holes. J. Sci. Comput., 16(3):319–339, 2001.
[PG02] T.-W. Pan and R. Glowinski. Direct simulation of the motion of neutrally buoyant circular cylinders in plane Poiseuille flow. J. Comput. Phys., 181(1):260–279, 2002.
[RAB06] I. Ramière, Ph. Angot, and M. Belliard. A fictitious domain approach with spread interface for elliptic problems with general boundary conditions. Comput. Methods Appl. Mech. Engrg., 196(4–6):766–781, 2007.
[RPVC05] T. N. Randrianarivelo, G. Pianet, S. Vincent, and J. P. Caltagirone. Numerical modelling of solid particle motion using a new penalty method. Internat. J. Numer. Methods Fluids, 47:1245–1251, 2005.
[SMSTT05] J. San Martín, J.-F. Scheid, T. Takahashi, and M. Tucsnak. Convergence of the Lagrange-Galerkin method for the equations modelling the motion of a fluid-rigid system. SIAM J. Numer. Anal., 43(4):1536–1571 (electronic), 2005.
[VCLR04] S. Vincent, J.-P. Caltagirone, P. Lubin, and T. N. Randrianarivelo. An adaptative augmented Lagrangian method for three-dimensional multimaterial flows. Comput. & Fluids, 33(10):1273–1289, 2004.
A Numerical Method for Fluid Flows
with Complex Free Surfaces
Andrea Bonito¹∗, Alexandre Caboussat², Marco Picasso³, and Jacques Rappaz³
¹ Department of Mathematics, University of Maryland, College Park, MD 20742-4015, USA, andrea.bonito@a3.epfl.ch
² Department of Mathematics, University of Houston, 77204-3008, Houston, TX, USA, caboussat@math.uh.edu
³ Institute of Analysis and Scientific Computing, Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland, {marco.picasso;jacques.rappaz}@epfl.ch
Summary. A numerical method for the simulation of fluid flows with complex free
surfaces is presented. The liquid is assumed to be a Newtonian or a viscoelastic
fluid. The compressible effects of the surrounding gas are taken into account, as
well as surface tension forces. An Eulerian approach based on the volume-of-fluid
formulation is chosen. A time splitting algorithm, together with a two-grids method,
allows the various physical phenomena to be decoupled. A chronological approach is
adopted to highlight the successive improvements of the model and the wide range
of applications. Numerical results show the potentialities of the method.
1 Introduction
Complex free surface phenomena involving Newtonian and/or non-Newtonian flows are nowadays a topic of active research in many fields of physics, engineering or bioengineering. The literature contains numerous models for complex liquid-gas free surface problems, see, e.g., [FCD+06, SZ99]. For instance, when considering the injection of a liquid in a complex cavity initially filled with gas, an Eulerian approach is generally adopted in order to capture the topology changes of the liquid region.
Such two-phase flows are computationally expensive in three space dimensions since (at least) both the velocity and pressure must be computed at each grid point of the whole liquid-gas domain.
∗ Partially supported by the Swiss National Science Foundation Fellowship PBEL2–114311.
188
A. Bonito et al.
The purpose of this article is to review a numerical model for computing complex free surface flows in three space dimensions. The features of the model are the following. A volume-of-fluid method is used to track
the liquid domain, which can exhibit complex topology changes. The velocity
field is computed only in the liquid region. The incompressible liquid can be
modeled either as a Newtonian or as a viscoelastic fluid. The ideal gas law is
used to compute the external pressure in the surrounding gas and the resulting
force is added on the liquid-gas free surface. Surface tension effects can also
be taken into account on the liquid-gas free surface. The complete description
of the model can be found in [MPR99, MPR03, CPR05, Cab06, BPL06].
The numerical model is based on a time-splitting approach [Glo03] and a
two-grids method. This allows advection, diffusion and viscoelastic phenomena
to be decoupled, as well as the treatment of the liquid and gas phases. Finite
element techniques [FF92] are used to solve the diffusion phenomena using an
unstructured mesh of the cavity containing the liquid. A forward characteristic
method [Pir89] on a structured grid allows advection phenomena to be solved
efficiently.
The article is structured as follows. In Section 2, the simplest model is
presented: the liquid is an incompressible Newtonian fluid, the effects of the
surrounding gas and surface tension are neglected. The effects of the surrounding gas are described in Section 3, those of the surface tension in Section 4.
Finally, the case of a viscoelastic liquid is considered in Section 5. Numerical
results are presented throughout the text and illustrate the capabilities and
improvements of the model.
2 Modeling of an Incompressible Newtonian Fluid
with a Free Surface
2.1 Governing Equations
The model presented in this section has already been published in [MPR99, MPR03]. Let $\Lambda$, with a boundary $\partial\Lambda$, be a cavity of $\mathbb{R}^3$ in which a liquid must be confined, and let $T > 0$ be the final time of simulation. For any given time $t \in (0, T)$, let $\Omega_t$, with a boundary $\partial\Omega_t$, be the domain occupied by the liquid, let $\Gamma_t = \partial\Omega_t \setminus \partial\Lambda$ be the free surface between the liquid and the surrounding gas and let $Q_T$ be the space-time domain containing the liquid, i.e. $Q_T = \{(x, t) : x \in \Omega_t,\ 0 < t < T\}$.
In the liquid region, the velocity field $v : Q_T \to \mathbb{R}^3$ and the pressure field $p : Q_T \to \mathbb{R}$ are assumed to satisfy the time-dependent, incompressible Navier–Stokes equations, that is
$$\rho\frac{\partial v}{\partial t} + \rho(v\cdot\nabla)v - 2\,\mathrm{div}\big(\mu D(v)\big) + \nabla p = f \quad\text{in } Q_T, \tag{1}$$
$$\mathrm{div}\, v = 0 \quad\text{in } Q_T. \tag{2}$$
Here $D(v) = \frac{1}{2}(\nabla v + \nabla v^T)$ denotes the rate of deformation tensor, $\rho$ the constant density and $f$ the external forces.
Fluid Flows with Complex Free Surfaces
189
The dynamic viscosity $\mu$ can be constant or, in order to take into account turbulence effects, a turbulent viscosity $\mu_T = \mu_T(v) = \alpha_T\,\rho\,\sqrt{2D(v):D(v)}$, where $\alpha_T$ is a parameter to be chosen, is added. The use of a turbulent viscosity is required when large Reynolds numbers and thin boundary layers are involved. Alternatively, in order to consider Bingham flows (when considering mud flows or avalanches, for instance), a plastic viscosity $\mu_B = \alpha_0\,\rho/\sqrt{2D(v):D(v)}$, where $\alpha_0$ is a parameter to be chosen, can be added.
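Both viscosity models only need the rate of deformation tensor. The sketch below evaluates them at a single point from a $3\times 3$ velocity gradient, with the convention `grad_v[i, j]` $= \partial v_i/\partial x_j$ (an assumption) and $A : B = \sum_{ij} A_{ij}B_{ij}$. The lower bound `min_rate` in the plastic viscosity is an added regularization, not part of the text, that avoids division by zero in rigid regions where $D(v) = 0$. The function names are hypothetical.

```python
import numpy as np


def rate_of_deformation(grad_v):
    """D(v) = (grad v + grad v^T) / 2, with grad_v[i, j] = d v_i / d x_j."""
    G = np.asarray(grad_v, dtype=float)
    return 0.5 * (G + G.T)


def shear_rate(grad_v):
    """sqrt(2 D(v) : D(v)), where A : B = sum_ij A_ij B_ij."""
    D = rate_of_deformation(grad_v)
    return np.sqrt(2.0 * np.sum(D * D))


def turbulent_viscosity(grad_v, rho, alpha_T):
    """mu_T = alpha_T * rho * sqrt(2 D(v) : D(v))."""
    return alpha_T * rho * shear_rate(grad_v)


def plastic_viscosity(grad_v, rho, alpha_0, min_rate=1e-12):
    """mu_B = alpha_0 * rho / sqrt(2 D(v) : D(v)); min_rate is an added
    regularization (not from the text), since mu_B is unbounded where D(v) = 0."""
    return alpha_0 * rho / max(shear_rate(grad_v), min_rate)


# Simple shear v = (gamma * y, 0, 0): sqrt(2 D : D) = |gamma|.
gamma = 3.0
G = np.zeros((3, 3))
G[0, 1] = gamma
print(shear_rate(G), turbulent_viscosity(G, rho=1000.0, alpha_T=0.01),
      plastic_viscosity(G, rho=1000.0, alpha_0=0.5))
```

For the simple shear, $D(v)$ has the two off-diagonal entries $\gamma/2$, so $2D(v):D(v) = \gamma^2$ and $\mu_T = \alpha_T\rho|\gamma|$; a pure rotation gives $D(v) = 0$ and therefore no turbulent viscosity.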
Let $\varphi : \Lambda\times(0, T) \to \mathbb{R}$ be the characteristic function of the liquid domain $Q_T$. The function $\varphi$ equals one if the liquid is present, zero if it is not, thus $\Omega_t = \{x \in \Lambda : \varphi(x, t) = 1\}$. In order to describe the kinematics of the free surface, $\varphi$ must satisfy (in a weak sense)
$$\frac{\partial\varphi}{\partial t} + v\cdot\nabla\varphi = 0 \quad\text{in } \Lambda\times(0, T), \tag{3}$$
where the velocity $v$ is extended continuously in the neighborhood of $Q_T$.
At initial time, the characteristic function of the liquid domain $\varphi$ is given, which defines the initial liquid region $\Omega_0 = \{x \in \Lambda : \varphi(x, 0) = 1\}$. The initial velocity field $v$ is prescribed in $\Omega_0$.
The boundary conditions for the velocity field are the following. On the boundary of the liquid region being in contact with the walls (that is to say the boundary of $\Lambda$), inflow, slip or Signorini boundary conditions are enforced, see [MPR99, MPR03]. On the free surface $\Gamma_t$, the forces acting on the free surface are assumed to vanish when both the influence of the external media and the capillary and surface tension effects are neglected. If these influences are not neglected, the equilibrium of forces on the free surface has to be established. In the first case, the following equilibrium relation is then satisfied on the liquid-gas interface:
$$-pn + 2\mu D(v)n = 0 \quad\text{on } \Gamma_t,\ t \in (0, T), \tag{4}$$
where $n$ is the unit normal of the liquid-gas free surface oriented toward the external gas.
The mathematical description of our model is complete. The model unknowns are the characteristic function ϕ in the whole cavity, the velocity v
and pressure p in the liquid domain only. These unknowns satisfy the equations (1)–(3). Simplified problems extracted from this model of incompressible liquid flow with a free surface have been investigated theoretically in
[CR05, Cab05], in one and two dimensions of space, and existence results and
error estimates have been obtained.
2.2 Time Splitting Scheme
An implicit splitting algorithm is proposed to solve (1)–(3) by splitting the advection from the diffusion part of the Navier–Stokes equations. Let $0 = t^0 < t^1 < t^2 < \dots < t^N = T$ be a subdivision of the time interval $[0, T]$, define $\delta t^n = t^{n+1} - t^n$ the $n$-th time step, $n = 0, 1, 2, \dots, N-1$, and $\delta t$ the largest time step. Let $\varphi^n$, $v^n$, $p^n$, $\Omega^n$ be approximations of $\varphi$, $v$, $p$, $\Omega_t$ at time $t^n$, respectively. Then the approximations $\varphi^{n+1}$, $v^{n+1}$, $p^{n+1}$, $\Omega^{n+1}$ at time $t^{n+1}$ are computed by means of an implicit splitting algorithm, as illustrated in Figure 1.
Fig. 1. The splitting algorithm (from left to right). Two advection problems are solved to determine the new approximation of the characteristic function $\varphi^{n+1}$, the new liquid domain $\Omega^{n+1}$ and the predicted velocity $v^{n+1/2}$. Then, a generalized Stokes problem is solved in the new liquid domain $\Omega^{n+1}$ in order to obtain the velocity $v^{n+1}$ and the pressure $p^{n+1}$.
Two advection problems are solved first, leading to a prediction of the new velocity $v^{n+1/2}$ together with the new approximation of the characteristic function $\varphi^{n+1}$ at time $t^{n+1}$, from which the new liquid domain $\Omega^{n+1}$ and the new liquid interface $\Gamma^{n+1}$ are determined. Then a generalized Stokes problem is solved on $\Omega^{n+1}$ with the boundary condition (4) on the liquid interface $\Gamma^{n+1}$ and Dirichlet, slip or Signorini-type conditions on the boundary of the cavity $\Lambda$, which yields the velocity $v^{n+1}$ and pressure $p^{n+1}$ in the liquid.
This time-splitting algorithm introduces an additional error on the velocities and pressures which is of order O(δt), see, e.g., [Mar90]. This algorithm
allows the motion of the free surface to be decoupled from the diffusion step,
which consists in solving a Stokes problem in a fixed domain [Glo03].
Advection Step. Solve between the times $t^n$ and $t^{n+1}$ the two advection problems:
$$\frac{\partial v}{\partial t} + (v\cdot\nabla)v = 0, \qquad \frac{\partial\varphi}{\partial t} + v\cdot\nabla\varphi = 0 \tag{5}$$
with initial conditions $v^n$ and $\varphi^n$. This step is solved exactly by the method of characteristics [Mau96, Pir89] which yields a prediction of the velocity $v^{n+1/2}$ and the characteristic function of the new liquid domain $\varphi^{n+1}$:
$$v^{n+1/2}\big(x + \delta t^n v^n(x)\big) = v^n(x) \quad\text{and}\quad \varphi^{n+1}\big(x + \delta t^n v^n(x)\big) = \varphi^n(x) \tag{6}$$
for all $x$ belonging to $\Omega^n$. Then, the new liquid domain $\Omega^{n+1}$ is defined as the set of points such that $\varphi^{n+1}$ equals one.
Diffusion Step. The diffusion step consists in solving a generalized Stokes problem on the domain $\Omega^{n+1}$ using the predicted velocity $v^{n+1/2}$ and the boundary condition (4). The following backward Euler scheme is used:
$$\rho\,\frac{v^{n+1} - v^{n+1/2}}{\delta t^n} - 2\,\mathrm{div}\big(\mu D(v^{n+1})\big) + \nabla p^{n+1} = f(t^{n+1}) \quad\text{in } \Omega^{n+1}, \tag{7}$$
$$\mathrm{div}\, v^{n+1} = 0 \quad\text{in } \Omega^{n+1}, \tag{8}$$
where $v^{n+1/2}$ is the prediction of the velocity obtained with (6) after the advection step. The boundary conditions on the free surface are given by (4). The weak formulation corresponding to (7), (8) and (4), therefore, consists in finding $v^{n+1}$ and $p^{n+1}$ such that $v^{n+1}$ vanishes on $\partial\Lambda$ and
$$\int_{\Omega^{n+1}} \rho\,\frac{v^{n+1} - v^{n+1/2}}{\delta t^n}\cdot w\,dx + 2\int_{\Omega^{n+1}} \mu D(v^{n+1}) : D(w)\,dx - \int_{\Omega^{n+1}} p^{n+1}\,\mathrm{div}\,w\,dx - \int_{\Omega^{n+1}} f\cdot w\,dx - \int_{\Omega^{n+1}} q\,\mathrm{div}\,v^{n+1}\,dx = 0, \tag{9}$$
for all test functions $(w, q)$ such that $w$ vanishes on the boundary of the cavity where essential boundary conditions are enforced.
2.3 A Two-Grids Method for Space Discretization
Advection and diffusion phenomena being now decoupled, the equations (5) are first solved using the method of characteristics on a structured mesh of small cells in order to reduce numerical diffusion of the interface $\Gamma_t$ between the liquid and the gas, and to obtain an accurate approximation of the liquid region, see Figure 2 (left).
The bounding box of the cavity Λ is meshed into a structured grid of small cubic cells of size h, each cell being labeled by indices (ijk). Let $\varphi^n_{ijk}$ and $v^n_{ijk}$ be the approximate values of ϕ and v at the center of cell (ijk) at time $t^n$. The unknown $\varphi^n_{ijk}$ is the volume fraction of liquid in cell (ijk) and is the numerical approximation of the characteristic function ϕ at
Fig. 2. Two-grids method. The advection step is solved on a structured mesh of
small cubic cells composed of blocks whose union covers the physical domain Ωh
(left), while the diffusion step is solved on a finite element unstructured mesh of
tetrahedra (right).
A. Bonito et al.
Fig. 3. Effect of the SLIC algorithm on numerical diffusion. An example of two-dimensional advection and projection when the volume fraction of liquid in the cell is $\varphi^n_{ij} = \frac{1}{4}$. Left: without SLIC, the volume fraction of liquid is advected and projected onto four cells, with contributions (from the top left cell to the bottom right cell) $\frac{3}{16}\cdot\frac{1}{4}$, $\frac{1}{16}\cdot\frac{1}{4}$, $\frac{9}{16}\cdot\frac{1}{4}$, $\frac{3}{16}\cdot\frac{1}{4}$. Right: with SLIC, the volume fraction of liquid is first pushed to one corner, then advected and projected onto one cell only, with contribution $\frac{1}{4}$.
time $t^n$, which is piecewise constant on each cell of the structured grid. The advection step for cell (ijk) consists in advecting $\varphi^n_{ijk}$ and $v^n_{ijk}$ by $\delta t^n v^n_{ijk}$ and then projecting the values onto the structured grid, to obtain $\varphi^{n+1}_{ijk}$ and a prediction of the velocity $v^{n+1/2}_{ijk}$. A simple implementation of the SLIC (Simple Line Interface Calculation) algorithm, described in [MPR03] and inspired by [NW76], reduces the numerical diffusion of the domain occupied by the liquid by pushing the fluid along the faces of the cell before advecting it. The choice of how to push the fluid depends on the volume fractions of liquid in the neighboring cells. Cell advection and projection with the SLIC algorithm are illustrated in Figure 3, in two space dimensions for the sake of simplicity. We refer to [AMS04] for a recent improvement of the SLIC algorithm.
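The effect illustrated in Figure 3 can be reproduced with a minimal two-dimensional sketch. It assumes unit cells, a displacement $\delta t^n v^n_{ij}$ expressed in cell widths, and exact projection by overlap areas. The corner push is a simplification assumed here for illustration (the liquid is gathered into a square of side $\sqrt{\varphi}$ in the corner facing the displacement); the SLIC variant of [MPR03] instead chooses the push from the volume fractions of the neighboring cells. The function name `advect_and_project` is hypothetical.

```python
import math


def overlap(a0, a1, b0, b1):
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def advect_and_project(i, j, phi, dx, dy, slic=False):
    """Advect the liquid of unit cell (i, j) by (dx, dy) cell widths and project it.

    Returns a dict mapping target cells (i', j') to the liquid volume they receive.
    """
    if phi <= 0.0:
        return {}
    if slic:
        # Simplified corner push (assumption): a square of side sqrt(phi) placed in
        # the corner of the cell that faces the displacement.
        side = math.sqrt(phi)
        x0 = i + 1.0 - side if dx >= 0 else float(i)
        y0 = j + 1.0 - side if dy >= 0 else float(j)
        w = h = side
    else:
        # Without SLIC the liquid is spread uniformly over the whole cell.
        x0, y0, w, h = float(i), float(j), 1.0, 1.0
    density = phi / (w * h)  # liquid fraction inside the transported rectangle
    x0, y0 = x0 + dx, y0 + dy
    received = {}
    for ti in range(math.floor(x0), math.ceil(x0 + w)):
        for tj in range(math.floor(y0), math.ceil(y0 + h)):
            area = overlap(x0, x0 + w, ti, ti + 1) * overlap(y0, y0 + h, tj, tj + 1)
            if area > 0.0:
                received[(ti, tj)] = density * area
    return received
```

With $\varphi = 1/4$ and a displacement of 3/4 of a cell in each direction, the uniform branch spreads the liquid over four cells with contributions $\frac{1}{16}\cdot\frac14$, $\frac{3}{16}\cdot\frac14$, $\frac{3}{16}\cdot\frac14$ and $\frac{9}{16}\cdot\frac14$, as in Figure 3 (left), while the corner push sends the whole $\frac14$ to a single cell, as in Figure 3 (right).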
Remark 1. A post-processing technique avoids compression effects and guarantees the conservation of the mass of liquid. Related to global repair algorithms [SW04], this technique produces final values $\varphi^{n+1}_{ijk}$ between zero and one, even when the advection of $\varphi^n$ gives values strictly larger than one. It consists in moving the excess fraction of liquid from over-filled cells to receiver cells in a global manner, by sorting the cells according to $\varphi^{n+1}$. Details can be found in [MPR99, MPR03].
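A sketch of such a global repair step, under assumptions of our own: over-filled cells are clipped to one, and the excess is poured into partially filled cells, fullest first. The sorting criterion and the choice of receivers used in [MPR99, MPR03] may differ, and `redistribute_excess` is a hypothetical name.

```python
def redistribute_excess(phi):
    """Clip volume fractions to [0, 1] while conserving the total amount of liquid.

    Returns the repaired list and any excess that could not be placed.
    """
    phi = list(phi)
    excess = 0.0
    # Collect the liquid in excess from the over-filled cells.
    for k, value in enumerate(phi):
        if value > 1.0:
            excess += value - 1.0
            phi[k] = 1.0
    # Receivers: partially filled cells, fullest first (assumed ordering).
    # Empty cells are not used, so no liquid appears away from the interface.
    receivers = sorted(
        (k for k, value in enumerate(phi) if 0.0 < value < 1.0),
        key=lambda k: phi[k],
        reverse=True,
    )
    for k in receivers:
        if excess <= 0.0:
            break
        moved = min(1.0 - phi[k], excess)
        phi[k] += moved
        excess -= moved
    return phi, excess
```

For instance, `redistribute_excess([1.3, 0.5, 0.9, 0.0])` fills the third cell up to one, puts the remaining 0.2 into the second cell, and leaves the total amount of liquid (2.7) unchanged.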
Once the values $\varphi^{n+1}_{ijk}$ and $v^{n+1/2}_{ijk}$ have been computed on the cells, the fraction of liquid $\varphi^{n+1}_P$ and the velocity field $v^{n+1/2}_P$ are computed at the nodes $P$ of the finite element mesh with approximate projection methods. We take advantage of the difference of refinement between a coarse finite element mesh and a finer structured grid of cells. Let $T_h$ be the triangulation of the cavity $\Lambda$. For any vertex $P$ of $T_h$, let $\psi_P$ be the corresponding finite element basis function (i.e. the continuous, piecewise linear function with value one at $P$ and zero at the other vertices). Then $\varphi^{n+1}_P$, the volume fraction of liquid at vertex $P$ and time $t^{n+1}$, is computed by
$$\varphi^{n+1}_P = \left(\sum_{\substack{K\in T_h\\ P\in K}}\ \sum_{C_{ijk}\in K} \psi_P(C_{ijk})\,\varphi^{n+1}_{ijk}\right) \Bigg/ \left(\sum_{\substack{K\in T_h\\ P\in K}}\ \sum_{C_{ijk}\in K} \psi_P(C_{ijk})\right), \qquad (10)$$
where $C_{ijk}$ is the center of cell $(ijk)$. The same kind of formula is used to obtain the predicted velocity $v^{n+1/2}$ at the vertices of the finite element mesh. Once these values are available at the vertices, the approximation $\Omega_h^{n+1}$ of the liquid region used for solving (9) is defined as the union of all elements $K \in T_h$ having at least one vertex $P$ with $\varphi^{n+1}_P > 0.5$; the corresponding approximation of the free surface is denoted by $\Gamma_h^{n+1}$.
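A one-dimensional analogue of formula (10) can serve as a sketch: on a 1D mesh, $\psi_P$ is the hat function of node $P$, so $\psi_P(C_{ijk})$ is a linear interpolation weight of the cell center. The second function applies the criterion $\varphi^{n+1}_P > 0.5$ that defines $\Omega_h^{n+1}$. Both function names are hypothetical, and a cell center lying exactly on a node is assigned to a single element here.

```python
import bisect


def cells_to_nodes(nodes, cell_centers, cell_values):
    """1D analogue of (10): psi-weighted average of cell values at each mesh node."""
    num = [0.0] * len(nodes)
    den = [0.0] * len(nodes)
    for c, value in zip(cell_centers, cell_values):
        # Element K = [nodes[k], nodes[k + 1]] containing the cell center C.
        k = min(bisect.bisect_right(nodes, c) - 1, len(nodes) - 2)
        t = (c - nodes[k]) / (nodes[k + 1] - nodes[k])
        # Values of the two hat functions of K at the cell center.
        for p, psi in ((k, 1.0 - t), (k + 1, t)):
            num[p] += psi * value
            den[p] += psi
    return [n / d if d > 0.0 else 0.0 for n, d in zip(num, den)]


def liquid_elements(phi_nodes):
    """Elements having at least one vertex with phi > 0.5."""
    return [
        k for k in range(len(phi_nodes) - 1)
        if max(phi_nodes[k], phi_nodes[k + 1]) > 0.5
    ]
```

With two elements of size 1 and four cells per element, the first four cells full and the others empty, the nodal values are 1, 0.5 and 0, and only the first element belongs to the liquid region.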
Numerical experiments reported in [MPR99, MPR03] have shown that choosing the cells of the structured mesh approximately 5 to 10 times smaller than the finite elements is a good choice to reduce the numerical diffusion of the interface Γt. Furthermore, since the characteristics method is used, the time step is not restricted by the CFL number, defined as the time step times the maximal velocity divided by the mesh size, $\mathrm{CFL} = \delta t^n \max|v| / h$. Numerical results in [MPR99, MPR03] have shown that CFL numbers ranging from 1 to 5 are generally a good choice.
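The CFL arithmetic can be sketched as follows, assuming, as in the jet buckling example of Section 5.3, that the relevant mesh size is the cell size of the structured grid and that the maximal velocity is the inflow velocity. The function names are hypothetical.

```python
def cfl_number(dt, max_speed, cell_size):
    """CFL = time step * maximal velocity / cell size."""
    return dt * max_speed / cell_size


def time_step_for_cfl(target_cfl, max_speed, cell_size):
    """Time step that yields a prescribed CFL number."""
    return target_cfl * cell_size / max_speed


# Jet buckling data of Section 5.3: cells of 0.0002 m, dt = 0.001 s, U = 0.5 m/s.
jet_cfl = cfl_number(0.001, 0.5, 0.0002)
# Time steps corresponding to CFL numbers between 1 and 5 for the same data.
dt_range = (time_step_for_cfl(1.0, 0.5, 0.0002), time_step_for_cfl(5.0, 0.5, 0.0002))
```

For these data `jet_cfl` is 2.5, inside the recommended range, and CFL numbers from 1 to 5 correspond to time steps from 0.0004 s to 0.002 s.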
Remark 2. In many industrial mold filling applications, the shape of the cavity containing the liquid (the mold) is complex. Therefore, a special hierarchical data structure has been implemented to reduce memory requirements; see [MPR03, RDG+00]. The cavity is meshed into tetrahedra for the resolution of the diffusion problem. For the advection part, a hierarchical structure of blocks, which cover the cavity and are glued together, is defined. A computation is performed inside a block if and only if it contains cells with liquid; otherwise the whole block is deactivated.
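A sketch of the block-deactivation rule, assuming uniform cubic blocks of `block_size`³ cells addressed by integer division of the cell indices; the hierarchical, glued block structure of [MPR03, RDG+00] is more elaborate. Cells are stored sparsely in a dictionary, and only the blocks returned by the hypothetical function `active_blocks` would be processed in the advection step.

```python
def active_blocks(volume_fractions, block_size):
    """Return the blocks (bi, bj, bk) that contain at least one cell with liquid.

    volume_fractions maps cell indices (i, j, k) to the volume fraction of liquid.
    """
    return {
        (i // block_size, j // block_size, k // block_size)
        for (i, j, k), phi in volume_fractions.items()
        if phi > 0.0
    }
```

Any block not in the returned set is skipped entirely, which is where the memory and CPU savings come from when the liquid occupies a small part of a large mold.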
The diffusion step consists in solving the Stokes problem (9) with finite element techniques. Let $v_h^{n+1}$ (resp. $p_h^{n+1}$) be the piecewise linear approximation of $v^{n+1}$ (resp. $p^{n+1}$). The Stokes problem is solved with stabilized $P_1$-$P_1$ finite elements (Galerkin Least Squares, see [FF92]) and consists in finding the velocity $v_h^{n+1}$ and pressure $p_h^{n+1}$ such that
$$\begin{aligned}
&\rho\int_{\Omega_h^{n+1}} \frac{v_h^{n+1}-v_h^{n+1/2}}{\delta t^n}\cdot w\,dx + 2\int_{\Omega_h^{n+1}} \mu D(v_h^{n+1}):D(w)\,dx - \int_{\Omega_h^{n+1}} f\cdot w\,dx - \int_{\Omega_h^{n+1}} p_h^{n+1}\,\mathrm{div}\,w\,dx\\
&\quad - \int_{\Omega_h^{n+1}} q\,\mathrm{div}\,v_h^{n+1}\,dx - \sum_{K\subset\Omega_h^{n+1}} \alpha_K \int_K \left(\frac{v_h^{n+1}-v_h^{n+1/2}}{\delta t^n} + \nabla p_h^{n+1} - f\right)\cdot\nabla q\,dx = 0,
\end{aligned} \qquad (11)$$
for all velocity and pressure test functions $w$ and $q$ compatible with the boundary conditions on the boundary of the cavity $\Lambda$. The value of the parameter $\alpha_K$ is discussed in [MPR99, MPR03].
Finally, the continuous piecewise linear approximation $v_h^{n+1}$ is projected back onto the structured grid: its value on cell $(ijk)$ is obtained by interpolating the finite element approximation at the center $C_{ijk}$ of the cell. This yields a velocity $v^{n+1}_{ijk}$ on each cell of the structured grid for the next time step.
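The interpolation at a cell center can be sketched with barycentric coordinates, shown here on a triangle for brevity (on the tetrahedra of the 3D mesh the same idea uses four coordinates). It assumes the element containing $C_{ijk}$ has already been found, and velocity components are interpolated one at a time. The function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in the triangle (a, b, c), by Cramer's rule."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l_b = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    l_c = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - l_b - l_c, l_b, l_c


def interpolate_p1(p, triangle, nodal_values):
    """Value at p of the linear function taking nodal_values at the triangle vertices."""
    weights = barycentric(p, *triangle)
    return sum(w * v for w, v in zip(weights, nodal_values))
```

Since the barycentric coordinates are exactly the values of the $P_1$ basis functions at the point, a linear field such as $1 + 2x + 3y$ is reproduced exactly at any cell center.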
2.4 Numerical Results
The classical “vortex-in-a-box” test case widely treated in the literature is
considered here [RK98]. The initial liquid domain is a circle of radius 0.015
with its center located in (0.05, 0.075). It is stretched by a given velocity, given
by the stream function ψ(x, y) = 0.01π sin2 (πx/0.1) sin2 (πy/0.1) cos(πt/2).
The velocity being periodic in time, the initial liquid domain is reached after
a time T = 2. Figure 4 illustrates the liquid-gas interface for three structured
meshes [CPR05]. The interface with maximum deformation and the interface
after one period of time are represented. Numerical results show the efficiency
and convergence of the scheme.
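A sketch of the velocity field of this test case, assuming the usual convention $u = \partial\psi/\partial y$, $v = -\partial\psi/\partial x$ (the sign convention is not stated in the text). The derivatives are taken analytically; the field is divergence-free, vanishes on the box walls, and changes sign between times $t$ and $2 - t$. The function name `velocity` is hypothetical.

```python
import math

A = 0.01 * math.pi  # amplitude of the stream function
L = 0.1             # side of the box


def velocity(x, y, t):
    """(u, v) = (d psi / dy, -d psi / dx) for the vortex-in-a-box stream function."""
    k = math.pi / L
    time_factor = math.cos(math.pi * t / 2.0)
    u = A * k * math.sin(k * x) ** 2 * math.sin(2.0 * k * y) * time_factor
    v = -A * k * math.sin(2.0 * k * x) * math.sin(k * y) ** 2 * time_factor
    return u, v
```

Evaluating `velocity(x, y, 0.3)` and `velocity(x, y, 1.7)` at the same point gives opposite vectors, which is the time reversal that brings the interface back to the initial circle.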
An S-shaped channel lying between two horizontal plates is filled. The
channel is contained in a 0.17 m × 0.24 m rectangle. The distance between
the two horizontal plates is 0.008 m. Water is injected at one end with
Fig. 4. Single vortex test case, representation of the computed interface at times
t = 1 (maximal deformation) and t = 2 (return to initial shape). Left: coarser mesh,
middle: middle mesh, right: finer mesh.
Fig. 5. S-shaped channel: 3D results when the cavity is initially filled with vacuum.
Time equals 8.0 ms, 26.0 ms, 44.0 ms and 53.9 ms.
constant velocity 8.7 m/s. Density and viscosity are ρ = 1000 kg/m³ and µ = 0.01 kg/(m·s), respectively.
Slip boundary conditions are enforced to avoid boundary layers, and a turbulent viscosity is added, with coefficient $\alpha_T = 4h^2$ as proposed in [CPR05]. Since the ratio between the capillary number and the Reynolds number is very small, surface tension effects are neglected.
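An order-of-magnitude check of this claim, sketched under stated assumptions: the plate gap 0.008 m is taken as characteristic length, the inflow velocity 8.7 m/s as characteristic velocity, and the water-air surface tension σ = 0.0738 N/m from Section 4.3 is used. We also assume the capillary number is defined here as σ/(µU), the convention under which a small capillary-to-Reynolds ratio means negligible surface tension; with it, the ratio equals the inverse Weber number σ/(ρU²L).

```python
rho, mu = 1000.0, 0.01   # density (kg/m^3), viscosity (kg/(m s))
U, L = 8.7, 0.008        # assumed characteristic velocity (m/s) and length (m)
sigma = 0.0738           # assumed water-air surface tension (N/m)

reynolds = rho * U * L / mu
capillary = sigma / (mu * U)   # assumed convention sigma / (mu U)
ratio = capillary / reynolds   # equals sigma / (rho U^2 L), the inverse Weber number
# Re is about 7.0e3 and the ratio about 1.2e-4.
```

Under these assumptions the ratio is of order $10^{-4}$, which supports neglecting surface tension in this filling simulation.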
The final time is T = 0.054 s and the time step is 0.0001 s. The mesh is made of 96030 elements. In Figure 5, 3D computations are presented when a valve is placed at the end of the cavity, thus allowing the gas to exit. The CPU time for the simulations in three space dimensions is approximately 319 minutes for 540 time steps. Most of the CPU time is spent solving the Stokes problem. A comparison with experimental results shows that the bubbles of gas trapped by the liquid vanish too rapidly. To obtain more realistic results, the effect of the gas compressibility on the liquid must be considered. This is the scope of the next section.
3 Extension to the Modeling of an Incompressible
Liquid Surrounded by a Compressible Gas
3.1 Extension of the Model
In Section 2, the zero-force condition (4) was applied on the liquid-gas interface. In the simulation of Figure 5, this corresponds to filling a cavity under vacuum with liquid. In industrial mold filling processes, however, the mold is not initially under vacuum but contains compressible air that interacts with the liquid. Therefore, the model has to be extended. The velocity in the gas is disregarded here, since solving the compressible Euler equations in the gas domain would be expensive in CPU time. The model presented in Section 2 is extended by adding the normal forces due to the gas pressure on the free surface Γt, still neglecting tangential and capillary forces. The relationship (4) is replaced by
$$-p\,n + 2\mu D(v)\,n = -P\,n \quad \text{on } \Gamma_t,\ t\in(0,T), \qquad (12)$$
where P is the pressure in the gas. For instance, consider the experiment of Figure 5, where the cavity is being filled with liquid. The gas present in the cavity at the initial time can either escape, if a valve is placed at the end of the cavity (in which case the gas exerts very little resistance on the liquid), or be trapped in the cavity. When a bubble of gas is trapped by the liquid, the gas pressure prevents the bubble from vanishing rapidly, as happens in the vacuum case.
The pressure in the gas is assumed to be constant in space in each bubble of gas, that is to say in each connected component of the gas domain. Let $k(t)$ be the number of bubbles of gas at time $t$ and let $B_i(t)$ denote the domain occupied by bubble number $i$ (the $i$-th connected component). Let $P_i(t)$ denote the pressure in $B_i(t)$. At the initial time, $P_i(0)$ is constant in each bubble $i$. The gas is assumed to be ideal. If $V_i(t)$ denotes the volume of $B_i(t)$, the pressure in each bubble at time $t$ is computed using the ideal gas law at constant temperature:
$$P_i(t)\,V_i(t) = \text{constant}, \qquad i = 1,\dots,k(t). \qquad (13)$$
This relationship expresses the conservation of the number of molecules of trapped gas (gas that cannot escape through a valve) between times $t$ and $t + \delta t$. However, this simplified model requires tracking the position of the bubbles of gas between two time steps.
When $\delta t$ is small enough, three situations can occur between two time steps: a bubble remains a single bubble, a bubble splits into two bubbles, or two bubbles merge into one. Combinations of these three situations may also occur.
For instance, in the case of a single bubble, if the pressure P (t) in the
bubble at time t and the volumes V (t) and V (t + δt) are known, the gas
pressure at time t+δt is easily computed from the relation P (t+δt)V (t+δt) =
P (t)V (t). The other cases are described at the discrete level in the following.
Details can be found in [CPR05].
The additional unknowns in our model are the bubbles of gas Bi (t) and
the constant pressure P = Pi (t) in the bubble of gas number i. The equations
(1)–(3) are to be solved together with (12), (13).
3.2 Modification of the Numerical Method
The tracking of the bubbles of gas and the computation of their internal pressure introduce an additional step in our time splitting scheme. This procedure
is inserted between the advection step (6) and the diffusion step (7), (8), in
order to compute an approximation of the pressure to plug into (12).
Let us denote by $k^n$, $P_i^n$, $B_i^n$, $i = 1,2,\dots,k^n$, the approximations of $k$, $P_i$, $B_i$, $i = 1,2,\dots,k$, respectively, at time $t^n$. Let $\xi(t)$ be a bubble numbering function, defined as negative in the liquid region $\Omega_t$ and equal to $i$ in bubble $B_i(t)$. The approximations $k^{n+1}$, $P_i^{n+1}$, $B_i^{n+1}$, $i = 1,2,\dots,k^{n+1}$, and $\xi^{n+1}$ are computed as follows.
Numbering of the Bubbles of gas
Given the new liquid domain $\Omega^{n+1}$, the key point is to find the number of bubbles $k^{n+1}$ (that is to say the number of connected components) and the bubbles $B_i^{n+1}$, $i = 1,\dots,k^{n+1}$. Given a point $P$ in the gas domain $\Lambda\setminus\Omega^{n+1}$, we search for a continuous function $u$ such that $-\Delta u = \delta_P$ in $\Lambda\setminus\Omega^{n+1}$, with $u = 0$ in $\Omega^{n+1}$. Since the solution $u$ is strictly positive in the connected component containing $P$ and vanishes outside it, its support is the first bubble. The procedure is repeated until all the bubbles are identified. The algorithm reads as follows:
Set $k^{n+1} = 0$, $\xi^{n+1} = 0$ in $\Lambda\setminus\Omega^{n+1}$ and $\xi^{n+1} = -1$ in $\Omega^{n+1}$, and $\Theta^{n+1} = \{x\in\Lambda : \xi^{n+1}(x) = 0\}$.
While $\Theta^{n+1} \neq \emptyset$, do:
1. Choose a point $P$ in $\Theta^{n+1}$;
2. Solve the following problem: find $u : \Lambda \to \mathbb{R}$ which satisfies
$$\begin{cases} -\Delta u = \delta_P & \text{in } \Theta^{n+1},\\ u = 0 & \text{in } \Lambda\setminus\Theta^{n+1},\\ [u] = 0 & \text{on } \partial\Theta^{n+1}, \end{cases} \qquad (14)$$
where $\delta_P$ is the Dirac delta function at point $P$ and $[u]$ is the jump of $u$ across $\partial\Theta^{n+1}$;
3. Increase the number of bubbles at time $t^{n+1}$: $k^{n+1} = k^{n+1} + 1$;
4. Define the bubble of gas number $k^{n+1}$: $B^{n+1}_{k^{n+1}} = \{x\in\Theta^{n+1} : u(x) \neq 0\}$;
5. Update the bubble numbering function: $\xi^{n+1}(x) = k^{n+1}$ for all $x \in B^{n+1}_{k^{n+1}}$;
6. Update $\Theta^{n+1}$ for the next iteration: $\Theta^{n+1} = \{x\in\Lambda : \xi^{n+1}(x) = 0\}$.
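The loop structure of steps 1 to 6 can be sketched on a 2D grid of cells. Here the Poisson solve of step 2 is replaced by a breadth-first search, a different technique that marks the same set, since the support of $u$ is exactly the connected gas component containing $P$. The 4-neighbor connectivity and the name `number_bubbles` are assumptions for illustration; the paper works on the finite element mesh.

```python
from collections import deque


def number_bubbles(is_liquid):
    """Number the connected gas components of a 2D grid.

    is_liquid[r][c] is True for liquid cells. Returns (k, xi) where xi is -1 in
    the liquid and equals the bubble number (1, ..., k) in the gas.
    """
    rows, cols = len(is_liquid), len(is_liquid[0])
    # Initialization: xi = -1 in the liquid, 0 in the not-yet-numbered gas (Theta).
    xi = [[-1 if is_liquid[r][c] else 0 for c in range(cols)] for r in range(rows)]
    k = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if xi[r0][c0] != 0:
                continue  # liquid, or gas already assigned to a bubble
            # Steps 1 and 3: choose P = (r0, c0) in Theta and open a new bubble.
            k += 1
            xi[r0][c0] = k
            # Steps 2, 4 and 5: breadth-first search over the component of P,
            # standing in for the support of the Poisson solution u of (14).
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and xi[rr][cc] == 0:
                        xi[rr][cc] = k
                        queue.append((rr, cc))
            # Step 6 is implicit: Theta is the set of cells where xi is still 0.
    return k, xi
```

A liquid wall splitting the gas into two regions yields two bubbles, and a fully liquid grid yields none.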
The cost of this original numbering algorithm is bounded by the cost of solving $k^{n+1}$ Poisson problems in the gas domain. The corresponding CPU time is usually less than 10 percent of the total CPU time. This numbering algorithm is implemented on the finite element mesh: the Poisson problems (14) are solved on $T_h$ using standard continuous, piecewise linear finite elements.
Computation of the Pressure in the Gas
Once the connected components of gas are numbered, an approximation $P_i^{n+1}$ of the constant pressure in bubble $i$ at time $t^{n+1}$ has to be computed with (13). In the case of a single bubble in the liquid, (13) yields $P_1^{n+1}V_1^{n+1} = P_1^nV_1^n$. When two bubbles merge, this relation becomes $P_1^{n+1}V_1^{n+1} = P_1^nV_1^n + P_2^nV_2^n$. When a bubble $B_1^n$ splits into two, its parts at time $t^n$ contribute to bubbles $B_1^{n+1}$ and $B_2^{n+1}$. The volume of the part of bubble $B_1^n$ that contributes to bubble $B_j^{n+1}$ is denoted by $V_{1,j}^{n+1/2}$, $j = 1,2$. The pressure in bubble $B_j^{n+1}$ is computed by taking into account the compression or expansion of the two parts of the bubble:
$$P_j^{n+1} = P_1^n\,V_{1,j}^{n+1/2} / V_j^{n+1}, \qquad j = 1,2.$$
The implementation has to take into account several situations, depending on whether two bubbles at times $t^n$ and $t^{n+1}$ intersect or not; these are detailed in [CPR05]. The value of the pressure can then be inserted as a boundary term in (9) for the resolution of the generalized Stokes problem (7), (8).
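The three cases can be sketched with one formula, $P_j^{n+1} = \sum_i P_i^n V_{i,j}^{n+1/2} / V_j^{n+1}$, where $V_{i,j}^{n+1/2}$ is the volume of the part of old bubble $i$ that ends up in new bubble $j$. This unified form is our own generalization; it reduces to each relation above. The dictionary layout and the name `update_bubble_pressures` are assumptions, and computing the volumes $V_{i,j}^{n+1/2}$ (the hard part, see [CPR05]) is not shown.

```python
def update_bubble_pressures(old_pressures, transfers, new_volumes):
    """Ideal-gas pressure update at constant temperature, following (13).

    old_pressures[i]: pressure of old bubble i at time t^n.
    transfers[(i, j)]: volume of the part of old bubble i that ends up in new bubble j.
    new_volumes[j]: volume of new bubble j at time t^{n+1}.
    """
    new_pressures = {}
    for j, volume in new_volumes.items():
        # P * V is proportional to the number of gas molecules, which is conserved.
        pv = sum(old_pressures[i] * v for (i, jj), v in transfers.items() if jj == j)
        new_pressures[j] = pv / volume
    return new_pressures
```

Halving the volume of a single bubble at atmospheric pressure doubles its pressure; merging two equal bubbles at 1e5 Pa and 2e5 Pa gives 1.5e5 Pa; a split without volume change keeps the pressure.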
Remark 3. By using the divergence theorem in the variational formulation (9) and the fact that $P^{n+1}$ is piecewise constant, the integral on the free surface $\Gamma_h^{n+1}$ is transformed into an integral on $\Omega_h^{n+1}$; therefore, an approximation of the normal vector $n$ is not explicitly needed.
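A short derivation of Remark 3, sketched under two assumptions: a single bubble, so that $P^{n+1}$ is one constant on $\Gamma_h^{n+1}$, and test functions satisfying $w\cdot n = 0$ on the cavity walls $\partial\Omega_h^{n+1}\setminus\Gamma_h^{n+1}$ (true for both no-slip and slip conditions). Integrating the stress term by parts produces the boundary term $-\int_{\Gamma_h^{n+1}}(-p\,n + 2\mu D(v)\,n)\cdot w\,ds$, which by (12) becomes an additional term on the left-hand side of (9):
$$\int_{\Gamma_h^{n+1}} P^{n+1}\,w\cdot n\,ds = P^{n+1}\int_{\partial\Omega_h^{n+1}} w\cdot n\,ds = P^{n+1}\int_{\Omega_h^{n+1}} \mathrm{div}\,w\,dx,$$
where $n$ is the unit normal pointing out of the liquid. Only a volume integral remains, so no discrete normal is required. The case of several bubbles at different pressures is not covered by this single-constant sketch.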
3.3 Numerical Results
Numerical results are presented here for mold filling simulations in order
to show the influence of the gas pressure and to compare with results in
Section 2.4.
The same S-shaped channel is initially filled with gas at atmospheric pressure P = 101300 Pa. A valve is located at the upper extremity of the channel
allowing gas to escape. Numerical results (cf. Figure 6) show the persistence
of the bubbles. The CPU time for the simulations, including the bubble computations, is approximately 344 min (compared with 319 min in Section 2).
Fig. 6. S-shaped channel: 3D results when the cavity is initially filled with compressible gas at atmospheric pressure. Time equals 8.0 ms, 26.0 ms, 44.0 ms and
53.9 ms.
4 Extension to the Modeling of Incompressible
Liquid-Compressible Gas Two-Phases Flows
with Surface Tension Effects
4.1 Extension of the Model
Surface tension effects are usually neglected for high Reynolds numbers.
However, for creeping flows (with low Reynolds number and high Capillary
number), the surface tension effects become relevant. The model presented in
Section 3 is extended, so that tangential and capillary forces are still neglected
on the free surface, but the normal forces due to the surface tension effects are
added. Details can be found in [Cab06]. The relationship (12) is replaced by
$$-p\,n + 2\mu D(v)\,n = -P\,n + \sigma\kappa\,n \quad \text{on } \Gamma_t,\ t\in(0,T), \qquad (15)$$
where κ = κ(x, t) is the mean curvature of the interface Γt at point x ∈ Γt
and σ is a constant surface tension coefficient which depends on both media
on each side of the interface (namely the liquid and the gas). The continuum
surface force (CSF) model, see, e.g., [BKZ92, RK98, WKP99], is considered
for the modeling of surface tension effects.
4.2 Modification of the Numerical Method
The relationship (15) on the interface requires the computation of the curvature κ and the normal vector n. An additional step is added in the time
splitting scheme to compute these two unknowns before the diffusion part.
The approximations κn+1 and nn+1 of κ and n respectively are computed at
time tn+1 on the interface Γ n+1 as follows.
Since the characteristic function ϕn+1 is not smooth, it is first mollified,
see, e.g., [WKP99], in order to obtain a smoothed approximation ϕ̃n+1 , such
that the liquid-gas interface Γ n+1 is given by the level line {x ∈ Λ : ϕ̃n+1 (x) =
1/2}, with ϕ̃n+1 < 1/2 in the gas domain and ϕ̃n+1 > 1/2 in the liquid
domain. The smoothed characteristic function ϕ̃n+1 is obtained by convolution
of ϕn+1 with the fourth-order kernel function Kε described in [WKP99]:
$$\tilde\varphi^{n+1}(x) = \int_\Lambda \varphi^{n+1}(y)\,K_\varepsilon(x-y)\,dy \qquad \forall x\in\Lambda. \qquad (16)$$
The smoothing of $\varphi^{n+1}$ is performed only in a layer around the free surface.
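The parameter $\varepsilon$ is the smoothing parameter that describes the size of the support of $K_\varepsilon$, i.e. the thickness of the smoothing layer around the interface. At each time step, the normal vector $n^{n+1}$ and the curvature $\kappa^{n+1}$ on the liquid-gas interface are given respectively by
$$n^{n+1} = -\frac{\nabla\tilde\varphi^{n+1}}{|\nabla\tilde\varphi^{n+1}|}, \qquad \kappa^{n+1} = -\mathrm{div}\left(\frac{\nabla\tilde\varphi^{n+1}}{|\nabla\tilde\varphi^{n+1}|}\right),$$
see, e.g., [OF01, Set96].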
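A one-dimensional sketch of the mollification (16) on a uniform grid of spacing h. The fourth-order kernel of [WKP99] is not reproduced; a symmetric cosine kernel supported on $|r| < \varepsilon$ is used as a stand-in, and the discrete weights are renormalized so that constant states are preserved. For simplicity the whole line is smoothed, not only a layer around the interface. Function names are hypothetical.

```python
import math


def cosine_kernel(r, eps):
    """Symmetric kernel supported on |r| < eps (a stand-in, not the kernel of [WKP99])."""
    if abs(r) >= eps:
        return 0.0
    return (1.0 + math.cos(math.pi * r / eps)) / (2.0 * eps)


def mollify(values, h, eps):
    """Discrete 1D analogue of (16) with renormalized weights."""
    n = len(values)
    reach = int(math.ceil(eps / h))  # neighbors that can lie inside the support
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - reach), min(n, i + reach + 1)):
            w = cosine_kernel((i - j) * h, eps)
            num += w * values[j]
            den += w
        smoothed.append(num / den)
    return smoothed
```

Applied to a sharp step from 1 (liquid) to 0 (gas) with h = 0.1 and ε = 0.25, the two cells adjacent to the jump become 0.7 and 0.3, so the level 1/2 falls between them, while cells farther than ε from the interface are unchanged.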
Instead of using the structured grid of cells to compute the curvature,
see, e.g., [AMS04, SZ99], the computation of κn+1 is performed on the finite
element mesh, in order to use the variational framework of finite elements.
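The normal and curvature formulas can be checked with a finite-difference sketch in two dimensions. A smooth tanh profile stands in for the mollified function $\tilde\varphi$ (close to 1 inside a circle of radius R, close to 0 outside, equal to 1/2 on the circle); the radius, width and step size are assumptions. With $n = -\nabla\tilde\varphi/|\nabla\tilde\varphi|$ pointing out of the liquid, $\kappa = \mathrm{div}\,n$ should be close to $1/R$ on the circle (in 3D, a sphere gives $2/R$ with this definition). This is not the finite element computation of the paper.

```python
import math

R, WIDTH = 0.3, 0.05  # circle radius and transition thickness (assumed values)


def phi_tilde(x, y):
    """Smooth indicator: ~1 inside the circle (liquid), ~0 outside, 1/2 on it."""
    return 0.5 * (1.0 - math.tanh((math.hypot(x, y) - R) / WIDTH))


def normal(f, x, y, h):
    """n = -grad f / |grad f|, with the gradient from central differences."""
    gx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    norm = math.hypot(gx, gy)
    return -gx / norm, -gy / norm


def curvature(f, x, y, h):
    """kappa = -div(grad f / |grad f|) = div n, by central differences of n."""
    dnx_dx = (normal(f, x + h, y, h)[0] - normal(f, x - h, y, h)[0]) / (2.0 * h)
    dny_dy = (normal(f, x, y + h, h)[1] - normal(f, x, y - h, h)[1]) / (2.0 * h)
    return dnx_dx + dny_dy
```

At the point (R, 0) the computed normal is (1, 0) and `curvature(phi_tilde, R, 0.0, 1e-3)` is close to 1/R ≈ 3.33.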
The normal vector $n_h^{n+1}$ is given by the normalized gradient of $\tilde\varphi_h^{n+1}$ at each grid point $P_j$, $j = 1,\dots,M$, where $M$ denotes the number of nodes of the finite element discretization. Details can be found in [Cab06]. The curvature $\kappa^{n+1}$ is approximated by its $L^2$-projection onto the space of piecewise linear finite elements with mass lumping, denoted by $\kappa_h^{n+1}$. Denoting by $\psi_{P_j}$ the basis function of this space associated with node $P_j$ in the cavity, $\kappa_h^{n+1}$ is given by the relation
$$\int_\Lambda \kappa_h^{n+1}\,\psi_{P_j}\,dx = \int_\Lambda -\mathrm{div}\left(\frac{\nabla\tilde\varphi_h^{n+1}}{|\nabla\tilde\varphi_h^{n+1}|}\right)\psi_{P_j}\,dx \qquad \text{for all } j = 1,\dots,M.$$
The left-hand side of this relation is computed with mass lumping, while the right-hand side is integrated by parts. Explicit values of the curvature of the level lines of $\tilde\varphi_h^{n+1}$ are thus obtained at the vertices of the finite element mesh lying in a layer around the free surface. The restriction of $\kappa_h^{n+1}$ to the nodes lying on $\Gamma_h^{n+1}$ is used to compute (15).
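The mechanics of the lumped projection can be sketched in one dimension with $P_1$ elements: the lumped mass of node $j$ is half the length of each adjacent element, and the integrated-by-parts right-hand side only needs the elementwise-constant field $m$ (the normalized gradient in the paper) and the constant slopes of the hat functions. Boundary terms are dropped, so only interior nodes are meaningful. The 1D setting and the name `lumped_minus_divergence` are assumptions for illustration.

```python
def lumped_minus_divergence(nodes, m_elem):
    """Nodal values g_j approximating -dm/dx by a lumped L2-projection.

    nodes: increasing node coordinates; m_elem[k]: constant value of m on the
    element [nodes[k], nodes[k + 1]]. Solves, node by node,
        (lumped mass)_j * g_j = sum_K m_K * (d psi_j / dx on K) * |K|,
    i.e. the right-hand side integrated by parts, boundary terms dropped.
    """
    mass = [0.0] * len(nodes)
    rhs = [0.0] * len(nodes)
    for k, m in enumerate(m_elem):
        length = nodes[k + 1] - nodes[k]
        mass[k] += 0.5 * length  # lumped mass: half of the element to each node
        mass[k + 1] += 0.5 * length
        rhs[k] += m * (-1.0 / length) * length     # psi_k decreases on the element
        rhs[k + 1] += m * (1.0 / length) * length  # psi_{k+1} increases on it
    return [r / w for r, w in zip(rhs, mass)]
```

With $m(x) = x$ sampled at the element midpoints of a uniform mesh, every interior node receives exactly $-1 = -dm/dx$.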
4.3 Numerical Results
We consider a bubble of gas at the bottom of a cylinder filled with liquid, under gravity forces. The bubble rises and reaches an upper free surface between water and air; see Figure 7. The physical constants are µ = 0.01 kg/(m·s), ρ = 1000 kg/m³ and σ = 0.0738 N/m. The mesh is made of 115200 tetrahedra. The size of the cells of the structured mesh used for the advection step is approximately 5 to 10 times smaller than the size of the finite elements and
Fig. 7. Three-dimensional rising bubble under a free surface: representation of the gas domain at times t = 100.0, 200.0, 230.0, 240.0, 300.0 and 320.0 ms (left to right, top to bottom).
the time step is chosen such that the CFL number is approximately one. The smoothing parameter is ε = 0.005. The CPU time for this computation is approximately 20 hours for 1000 time steps.
5 Extension to the Modeling of Viscoelastic Flows
with a Free Surface
5.1 Extension of the Model
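The total stress tensor for incompressible viscoelastic fluids is, by definition, the sum of a Newtonian part $2\mu D(v) - pI$ and a non-Newtonian part denoted by $\sigma : Q_T \to \mathbb{R}^{3\times 3}$. Owing to this decomposition, the system (1)–(2) becomes
$$\rho\frac{\partial v}{\partial t} + \rho(v\cdot\nabla)v - \mathrm{div}\big(2\mu D(v) + \sigma\big) + \nabla p = f \quad \text{in } Q_T, \qquad (17)$$
$$\mathrm{div}\, v = 0 \quad \text{in } Q_T. \qquad (18)$$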
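The simplest constitutive (or closure) equation for the extra-stress $\sigma$, namely the Oldroyd-B model [Old50], is chosen to supplement the above system:
$$\sigma + \lambda\left(\frac{\partial\sigma}{\partial t} + (v\cdot\nabla)\sigma - (\nabla v)\sigma - \sigma(\nabla v)^T\right) = 2\eta D(v) \quad \text{in } Q_T. \qquad (19)$$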
Here λ > 0 is the relaxation time (the time for the stress to return to zero
under constant-strain condition) and η > 0 is the polymer viscosity. The
extra-stress σ has to be imposed only at the inflow. For more details, we refer
to [BPL06].
Remark 4. The numerical procedures described in this section can be extended
to more general deterministic models such as Phan-Thien Tanner [PTT77],
Giesekus [Gie82] and stochastic models such as FENE [War72] and FENE-P [BDJ80]. Two-dimensional computations of free surface flows with FENE
dumbbells have been performed in [GLP03].
5.2 Modification of the Numerical Procedure
The convective term in (19) is treated in the same fashion as (5). Continuous, piecewise linear finite elements are considered to approximate the
extra-stress tensor σ and an EVSS (Elastic Viscous Split Stress) procedure
[FGP97, BPS01, PR01] is used in order to obtain a stable algorithm even if
the solvent viscosity µ vanishes.
Advection Step. Together with (5), solve between the times $t^n$ and $t^{n+1}$
$$\frac{\partial\sigma}{\partial t} + (v\cdot\nabla)\sigma = 0 \qquad (20)$$
with initial condition given by the value of the tensor $\sigma$ at time $t^n$. This step is also solved with the characteristics method on the structured grid (see Figure 2), using the relation $\sigma^{n+1/2}(x + \delta t^n v^n(x)) = \sigma^n(x)$. As for the velocity and the volume fraction of liquid, the extra-stress tensor $\sigma^{n+1/2}$ is computed on the structured grid of cells $(ijk)$, leading to values $\sigma^{n+1/2}_{ijk}$. These values are then interpolated at the nodes of the finite element mesh using the same kind of formula as (10), which yields the continuous, piecewise linear extra-stress $\sigma_h^{n+1/2}$.
Diffusion Step. The diffusion step consists in solving the so-called three-field Stokes problem on the finite element mesh. Following the EVSS method, we define a new tensor $B_h^{n+1/2} : \Omega_h^{n+1} \to \mathbb{R}^{3\times 3}$ as the $L^2$-projection onto the finite element space of the predicted deformation tensor $D(v_h^{n+1/2})$, i.e.
$$\int_{\Omega_h^{n+1}} B_h^{n+1/2} : E_h\,dx = \int_{\Omega_h^{n+1}} D(v_h^{n+1/2}) : E_h\,dx$$
for all test functions $E_h$. Then (9) is modified to take explicitly into account the term coming from the extra-stress tensor. The extra term
$$2\int_{\Omega_h^{n+1}} \eta D(v_h^{n+1}) : D(w_h)\,dx - 2\int_{\Omega_h^{n+1}} \eta B_h^{n+1/2} : D(w_h)\,dx,$$
which vanishes at the continuous level, is also added. Thus, the weak formulation (9) becomes: find the piecewise linear finite element approximations $v_h^{n+1}$ and $p_h^{n+1}$ such that $v_h^{n+1}$ satisfies the essential boundary conditions on the boundary of the cavity $\Lambda$ and
$$\begin{aligned}
&\rho\int_{\Omega_h^{n+1}} \frac{v_h^{n+1}-v_h^{n+1/2}}{\delta t^n}\cdot w_h\,dx + 2\int_{\Omega_h^{n+1}} (\mu+\eta)\, D(v_h^{n+1}) : D(w_h)\,dx - \int_{\Omega_h^{n+1}} p_h^{n+1}\,\mathrm{div}\,w_h\,dx + \int_{\Omega_h^{n+1}} \sigma_h^{n+1/2} : D(w_h)\,dx\\
&\quad - 2\int_{\Omega_h^{n+1}} \eta B_h^{n+1/2} : D(w_h)\,dx - \int_{\Omega_h^{n+1}} f\cdot w_h\,dx - \int_{\Omega_h^{n+1}} q_h\,\mathrm{div}\,v_h^{n+1}\,dx = 0,
\end{aligned} \qquad (21)$$
for all test functions $w_h$, $q_h$. Once the velocity $v_h^{n+1}$ is computed, the extra-stress is recovered using (19). More precisely, the continuous, piecewise linear extra-stress $\sigma_h^{n+1}$ satisfies the prescribed boundary conditions at the inflow and
$$\begin{aligned}
&\int_{\Omega_h^{n+1}} \sigma_h^{n+1} : \tau_h\,dx + \lambda\int_{\Omega_h^{n+1}} \frac{\sigma_h^{n+1}-\sigma_h^{n+1/2}}{\delta t^n} : \tau_h\,dx\\
&\quad = 2\eta\int_{\Omega_h^{n+1}} D(v_h^{n+1}) : \tau_h\,dx + \lambda\int_{\Omega_h^{n+1}} \left((\nabla v_h^{n+1})\,\sigma_h^{n+1/2} + \sigma_h^{n+1/2}(\nabla v_h^{n+1})^T\right) : \tau_h\,dx,
\end{aligned} \qquad (22)$$
for all test functions $\tau_h$. Finally, the fields $v_h^{n+1}$ and $\sigma_h^{n+1}$ are interpolated at the centers $C_{ijk}$ of the cells.
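A pointwise sketch of the update (22), assuming a spatially homogeneous 2D flow (so that advection and the finite element test functions play no role) and the convention $(\nabla v)_{ij} = \partial v_i/\partial x_j$. Each step solves (22) for $\sigma^{n+1}$, with $\sigma^{n+1/2}$ taken as the previous value. For steady simple shear $v = (\dot\gamma y, 0)$, the iteration should approach the steady Oldroyd-B stresses $\sigma_{xy} = \eta\dot\gamma$, $\sigma_{xx} = 2\lambda\eta\dot\gamma^2$, $\sigma_{yy} = 0$, which follow from (19) with $\partial\sigma/\partial t = 0$. The function name is hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def oldroyd_b_step(sigma, grad_v, eta, lam, dt):
    """One step of the discrete constitutive law (22) at a single point.

    grad_v[i][j] = d v_i / d x_j; sigma plays the role of sigma^{n+1/2}.
    """
    d = [[0.5 * (grad_v[i][j] + grad_v[j][i]) for j in range(2)] for i in range(2)]
    left = matmul(grad_v, sigma)               # (grad v) sigma
    right = matmul(sigma, transpose(grad_v))   # sigma (grad v)^T
    c = lam / dt
    return [
        [(2.0 * eta * d[i][j] + lam * (left[i][j] + right[i][j]) + c * sigma[i][j]) / (1.0 + c)
         for j in range(2)]
        for i in range(2)
    ]
```

Starting from zero stress with η = λ = 1, a unit shear rate and δt = 0.1, a few thousand steps give $\sigma_{xy} \approx 1$ and $\sigma_{xx} \approx 2$: the normal stress difference that distinguishes the viscoelastic jet from the Newtonian one.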
Theoretical investigations for a simplified problem without advection and
free surface have been performed in [BCP07]. Using an implicit function theorem, existence of a solution and convergence of the finite element scheme
have been obtained. We refer to [BCP06b, BCP06a] for an extension to the
stochastic Hookean dumbbells model.
5.3 Numerical Results
Two different simulations are presented here: the buckling of a jet and the stretching of a filament. In the first simulation, different behaviors of Newtonian and viscoelastic fluids are observed and the elastic effect of the relaxation time $\lambda$ is pointed out. In the second simulation, fingering instabilities are observed, consistent with experiments. More details and test cases can be found in [BPL06].
Jet buckling
The transient flow of a jet of diameter d = 0.005 m, injected into a parallelepiped cavity of width 0.05 m, depth 0.05 m and height 0.1 m, is reproduced. Liquid enters from the top of the cavity with vertical velocity U = 0.5 m/s. The fluid parameters are given in Table 1; surface tension effects are not considered.
The finite element mesh has 503171 vertices and 2918760 tetrahedra. The cell size is 0.0002 m and the time step is 0.001 s, so the CFL number of the cells is 2.5. A comparison of the shape of the jet with the Newtonian flow is shown in Figure 8. This computation takes 64 hours on an AMD Opteron CPU with 8 GB of memory. The elastic effects in the liquid are clearly observed: when the viscoelastic jet starts to buckle, the Newtonian jet has already produced many
Table 1. Jet buckling. Liquid parameters.

| | ρ [kg/m³] | µ [Pa·s] | η [Pa·s] | λ [s] | De = λU/d |
|---|---|---|---|---|---|
| Newtonian | 1030 | 10.3 | 0 | 0 | 0 |
| Viscoelastic | 1030 | 1.03 | 9.27 | 1 | 100 |
Fig. 8. Jet buckling in a cavity. Shape of the jet at time t = 0.125 s (col. 1), t = 0.45
s (col. 2), t = 0.6 s (col. 3), t = 0.9 s (col. 4), t = 1.15 s (col. 5), t = 1.6 s (col. 6),
Newtonian fluid (row 1), viscoelastic fluid (row 2).
folds. For a discussion on the condition for a jet to buckle and comparison
with results obtained in [TMC+ 02], we refer to [BPL06].
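A quick check of the entries of Table 1. It assumes that for the Oldroyd-B model (17)–(19) the steady shear viscosity is µ + η (the steady shear stress is (µ + η)γ̇), so the two fluids of Table 1 share the same steady shear viscosity, which suggests the differences in Figure 8 come from elasticity rather than from viscosity.

```python
U, d = 0.5, 0.005  # inflow velocity (m/s) and jet diameter (m)
fluids = {
    "Newtonian": {"mu": 10.3, "eta": 0.0, "lam": 0.0},
    "Viscoelastic": {"mu": 1.03, "eta": 9.27, "lam": 1.0},
}
for name, p in fluids.items():
    deborah = p["lam"] * U / d             # De = lambda U / d
    shear_viscosity = p["mu"] + p["eta"]   # steady shear viscosity of Oldroyd-B
    print(f"{name}: De = {deborah:g}, mu + eta = {shear_viscosity:g} Pa s")
```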
Fingering instabilities
The numerical model is able to reproduce fingering instabilities, as reported in [RH99, BRLH02, MS02, DLCB03] for non-Newtonian flows. The flow of an Oldroyd-B fluid contained between two parallel coaxial circular disks with radius $R_0 = 0.003$ m is considered. At the initial time, the distance between the two end-plates is $L_0 = 0.00015$ m and the liquid is at rest. Then, the top end-plate is moved vertically with velocity $L_0\dot\varepsilon_0 e^{\dot\varepsilon_0 t}$, where $\dot\varepsilon_0 = 4.68\ \mathrm{s}^{-1}$. The liquid parameters are ρ = 1030 kg/m³, µ = 9.15 Pa·s, η = 25.8 Pa·s and λ = 0.421 s. Following [MS02, Section 4.4], since the aspect ratio $R_0/L_0$ is equal to 20, the Weissenberg number $We = De\,R_0^2/L_0^2$ is large.
The finite element mesh has 50 vertices along the radius and 25 vertices along the height, so the mesh size is 0.00006 m. The cell size is 0.00001 m and the initial time step is δt = 0.01 s, so the CFL number of the cells is initially close to one. The shape of the filament is reported in Figure 9 and 2D cuts in the middle of the height are reported in Figure 10. Fingering instabilities can be observed from the very beginning of the stretching, leading to branched structures, as described in [MS02, BRLH02, DLCB03]. These instabilities are essentially elastic and occur without surface tension effects [RH99]. Such complex shapes cannot be obtained with Lagrangian models, since the mesh distortion would be too large.
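A small sketch of the imposed kinematics: integrating the plate velocity $L_0\dot\varepsilon_0 e^{\dot\varepsilon_0 t}$ from a gap $L_0$ gives $L(t) = L_0 e^{\dot\varepsilon_0 t}$, so at $t = 0.745$ s (Figure 9) the gap is roughly 33 times its initial value, about 4.9 mm. The function names are hypothetical.

```python
import math

L0 = 0.00015  # initial gap between the plates (m)
R0 = 0.003    # plate radius (m)
RATE = 4.68   # stretching rate (1/s)


def plate_velocity(t):
    """Prescribed velocity of the top end-plate."""
    return L0 * RATE * math.exp(RATE * t)


def plate_gap(t):
    """Gap obtained by integrating plate_velocity from L(0) = L0."""
    return L0 * math.exp(RATE * t)
```

The aspect ratio $R_0/L_0$ is 20 at the start and drops below one by the final time shown in Figure 9, which illustrates the large deformation the Eulerian approach has to handle.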
Fig. 9. Fingering instabilities. Shape of the liquid region at times t = 0 s (left) and
t = 0.745 s (right).
Fig. 10. Fingering instabilities. Horizontal cuts through the middle of the liquid
region at times t = 0.119 s, t = 0.245 s, t = 0.364 s, t = 0.49 s (first row) and times
t = 0.609 s, t = 0.735 s, t = 0.854 s, t = 0.98 s (second row).
6 Conclusions
An efficient computational model for the simulation of two-phase flows has been presented. It can handle both Newtonian and non-Newtonian flows. It relies on an Eulerian framework and couples finite element techniques
with a forward characteristics method. Numerical results illustrate the large
range of applications covered by the model. Extensions are being investigated
(1) to couple viscoelastic and surface tension effects, (2) to reduce the CPU
time required to solve Stokes problems, and (3) to improve the reconstruction
of the interface and the computation of surface tension effects.
Acknowledgement. The authors wish to thank Vincent Maronnier for his contribution to this project and his implementation support.
References
[AMS04]
E. Aulisa, S. Manservisi, and R. Scardovelli. A surface marker algorithm
coupled to an area-preserving marker redistribution method for three-dimensional interface tracking. J. Comput. Phys., 197(2):555–584, 2004.
[BCP06a] A. Bonito, Ph. Clément, and M. Picasso. Finite element analysis of a
simplified stochastic Hookean dumbbells model arising from viscoelastic
flows. M2AN Math. Model. Numer. Anal., 40(4):785–814, 2006.
[BCP06b] A. Bonito, Ph. Clément, and M. Picasso. Mathematical analysis of a
simplified Hookean dumbbells model arising from viscoelastic flows. J.
Evol. Equ., 6(3):381–398, 2006.
[BCP07]
A. Bonito, Ph. Clément, and M. Picasso. Mathematical and numerical
analysis of a simplified time-dependent viscoelastic flow. Numer. Math.,
107(2):213–255, 2007.
[BDJ80]
R. B. Bird, N. L. Dotson, and N. L. Johnson. Polymer solution rheology based on a finitely extensible bead-spring chain model. J. Non-Newtonian Fluid Mech., 7:213–235, 1980.
[BKZ92]
J. U. Brackbill, D. B. Kothe, and C. Zemach. A continuum method for
modeling surface tension. J. Comput. Phys., 100:335–354, 1992.
[BPL06]
A. Bonito, M. Picasso, and M. Laso. Numerical simulation of 3D viscoelastic flows with free surfaces. J. Comput. Phys., 215(2):691–716,
2006.
[BPS01]
J. Bonvin, M. Picasso, and R. Stenberg. GLS and EVSS methods for
a three-field Stokes problem arising from viscoelastic flows. Comput.
Methods Appl. Mech. Engrg., 190(29–30):3893–3914, 2001.
[BRLH02] A. Bach, H. K. Rasmussen, P.-Y. Longin, and O. Hassager. Growth
of non-axisymmetric disturbances of the free surface in the filament
stretching rheometer: experiments and simulation. J. Non-Newtonian
Fluid Mech., 108:163–186, 2002.
[Cab05]
A. Caboussat. Numerical simulation of two-phase free surface flows.
Arch. Comput. Methods Engrg., 12(2):165–210, 2005.
[Cab06]
A. Caboussat. A numerical method for the simulation of free surface
flows with surface tension. Comput. & Fluids, 35(10):1205–1216, 2006.
[CPR05]
A. Caboussat, M. Picasso, and J. Rappaz. Numerical simulation of
free surface incompressible liquid flows surrounded by compressible gas.
J. Comput. Phys., 203(2):626–649, 2005.
[CR05]
A. Caboussat and J. Rappaz. Analysis of a one-dimensional free surface
flow problem. Numer. Math., 101(1):67–86, 2005.
[DLCB03] D. Derks, A. Lindner, C. Creton, and D. Bonn. Cohesive failure of
thin layers of soft model adhesives under tension. J. Appl. Phys.,
93(3):1557–1566, 2003.
[FCD+ 06] M. M. Francois, S. J. Cummins, E. D. Dendy, D. B. Kothe, J. M. Sicilian,
and M. W. Williams. A balanced-force algorithm for continuous and
sharp interfacial surface tension models within a volume tracking framework. J. Comput. Phys., 213(1):141–173, 2006.
[FF92]
L. P. Franca and S. L. Frey. Stabilized finite element method: II. The
incompressible Navier–Stokes equations. Comput. Methods Appl. Mech.
Engrg., 99:209–233, 1992.
Fluid Flows with Complex Free Surfaces
[FGP97]
207
M. Fortin, R. Guénette, and R. Pierre. Numerical analysis of the modified EVSS method. Comput. Methods Appl. Mech. Engrg., 143(1–2):79–
95, 1997.
[Gie82] H. Giesekus. A simple constitutive equation for polymer fluids based on the concept of deformation-dependent tensorial mobility. J. Non-Newtonian Fluid Mech., 11(1–2):69–109, 1982.
[Glo03] R. Glowinski. Finite element methods for incompressible viscous flow. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, Vol. IX, pages 3–1176. North-Holland, Amsterdam, 2003.
[GLP03] E. Grande, M. Laso, and M. Picasso. Calculation of variable-topology free surface flows using CONNFFESSIT. J. Non-Newtonian Fluid Mech., 113(2):123–145, 2003.
[Mar90] G. I. Marchuk. Splitting and alternating direction methods. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, Vol. I, pages 197–462. North-Holland, Amsterdam, 1990.
[Mau96] B. Maury. Characteristics ALE method for the 3D Navier–Stokes equations with a free surface. Int. J. Comput. Fluid Dyn., 6:175–188, 1996.
[MPR99] V. Maronnier, M. Picasso, and J. Rappaz. Numerical simulation of free surface flows. J. Comput. Phys., 155:439–455, 1999.
[MPR03] V. Maronnier, M. Picasso, and J. Rappaz. Numerical simulation of three dimensional free surface flows. Internat. J. Numer. Methods Fluids, 42(7):697–716, 2003.
[MS02] G. H. McKinley and T. Sridhar. Filament-stretching rheometry of complex fluids. Ann. Rev. Fluid Mech., 34:375–415, 2002.
[NW76] W. F. Noh and P. Woodward. SLIC (Simple Line Interface Calculation). In A. I. van de Vooren and P. J. Zandbergen, editors, Proc. of the 5th International Conference on Numerical Methods in Fluid Dynamics (Enschede, 1976), volume 59 of Lecture Notes in Physics, pages 330–340, Springer-Verlag, Berlin, 1976.
[OF01] S. Osher and R. P. Fedkiw. Level set methods: An overview and some recent results. J. Comput. Phys., 169:463–502, 2001.
[Old50] J. G. Oldroyd. On the formulation of rheological equations of state. Proc. Roy. Soc. London. Ser. A., 200(1063):523–541, 1950.
[Pir89] O. Pironneau. Finite Element Methods for Fluids. Wiley, Chichester, 1989.
[PR01] M. Picasso and J. Rappaz. Existence, a priori and a posteriori error estimates for a nonlinear three-field problem arising from Oldroyd-B viscoelastic flows. M2AN Math. Model. Numer. Anal., 35(5):879–897, 2001.
[PTT77] N. Phan-Thien and R. I. Tanner. A new constitutive equation derived from network theory. J. Non-Newtonian Fluid Mech., 2(4):353–365, 1977.
[RDG+00] M. Rappaz, J. L. Desbiolles, C. A. Gandin, S. Henry, A. Semoroz, and P. Thevoz. Modelling of solidification microstructures. Mater. Sci. Forum, 329(3):389–396, 2000.
[RH99] H. K. Rasmussen and O. Hassager. Three-dimensional simulations of viscoelastic instability in polymeric filaments. J. Non-Newtonian Fluid Mech., 82:189–202, 1999.
[RK98] W. J. Rider and D. B. Kothe. Reconstructing volume tracking. J. Comput. Phys., 141:112–152, 1998.
[Set96] J. A. Sethian. Level Set Methods, Evolving Interfaces in Geometry, Fluid Mechanics, Computer Vision, and Material Science. Monographs on Applied and Computational Mathematics. Cambridge University Press, 1996.
[SW04] M. Shashkov and B. Wendroff. The repair paradigm and application to conservation laws. J. Comput. Phys., 198(1):265–277, 2004.
[SZ99] R. Scardovelli and S. Zaleski. Direct numerical simulation of free surface and interfacial flows. Ann. Rev. Fluid Mech., 31:567–603, 1999.
[TMC+02] M. F. Tomé, N. Mangiavacchi, J. A. Cuminato, A. Castelo, and S. McKee. A finite difference technique for simulating unsteady viscoelastic free surface flows. J. Non-Newtonian Fluid Mech., 106:61–106, 2002.
[War72] H. R. Warner. Kinetic theory and rheology of dilute suspensions of finitely extendible dumbbells. Ind. Eng. Chem. Fund., 11:379–387, 1972.
[WKP99] M. W. Williams, D. B. Kothe, and E. G. Puckett. Accuracy and convergence of continuum surface tension models. In Fluid Dynamics at Interfaces (Gainesville, FL, 1998), pages 294–305. Cambridge University Press, 1999.
Modelling and Simulating the Adhesion
and Detachment of Chondrocytes
in Shear Flow
Jian Hao¹, Tsorng-Whay Pan¹, and Doreen Rosenstrauch²
¹ Department of Mathematics, University of Houston, Houston, TX 77204-3008, USA
jianh@math.uh.edu, pan@math.uh.edu
² The Texas Heart Institute and the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Doreen.Rosenstrauch@uth.tmc.edu
1 Introduction
Chondrocytes are typically studied in the environments where they normally reside, such as hip joints, intervertebral disks, or the ear. For example, in [SKE+99], the effect of seeding duration on the strength of chondrocyte adhesion to articular cartilage was studied in a shear flow chamber, since such adhesion may play an important role in the repair of articular defects by maintaining cells in positions where their biosynthetic products can contribute to the repair process. In this investigation, however, we focus mainly on the use of auricular chondrocytes in cardiovascular implants. Auricular chondrocytes are abundant and can be harvested easily and efficiently by a minimally invasive technique. They are able to produce collagen type-II and other important extracellular matrix constituents, which allows them to adhere strongly to artificial surfaces. They can be genetically engineered to act like endothelial cells, so that the biocompatibility of cardiovascular prostheses can be improved. In [SBBR+02], genetically engineered auricular chondrocytes were used to line the blood-contacting luminal surfaces of a left ventricular assist device (LVAD); a chondrocyte-lined LVAD was implanted into the tissue-donor calf, and the in vivo results demonstrated the feasibility of using autologous auricular chondrocytes to improve the biocompatibility of the blood–biomaterial interface in LVADs and cardiovascular prostheses. Cultured chondrocytes may therefore offer a more efficient and less invasive means of covering artificial surfaces with a viable and adherent cell layer.
In this chapter, we first develop a model for the adhesion of chondrocytes to the artificial surface and then combine it with a Lagrange multiplier based fictitious domain method to simulate the detachment of chondrocytes in shear flow. The chondrocytes in the simulation are treated as neutrally buoyant rigid particles: as argued in [KS06], scaling estimates show that, for typical values of cell elasticity, the deformations due to shear flow and lubrication forces are small, so the cells can be treated as rigid. The Newtonian incompressible viscous flow is modeled by the Navier–Stokes equations, since inertial effects are crucial for the lift-off of the cells; most studies of cell adhesion consider Stokes flow instead, because their main interest is the rolling of cells on the surface and the subsequent capture of cells such as white blood cells (see, e.g., [KS06, KH01, SZD03]).
2 Model for Cell Adhesion
Cell adhesion to the extracellular matrix (ECM) plays a key role in the assembly of cells into functional multicellular organisms. Chondrocytes produce collagen type-II and other important extracellular matrix constituents, which allows them to adhere strongly to artificial surfaces. Chondrocytes are responsible for the synthesis and maintenance of a viable ECM that is suitably adapted to cope with the physical pressures of its environment. On the lined surface of the LVAD, the formation of a cell monolayer was reported in [SBBR+02]. Adhesive interactions between chondrocytes and the ECM occur via a variety of molecular systems (see, e.g., the discussion of cell–matrix adhesion in [ZBCAG04]). Zaidel-Bar et al. have shown in [ZBCAG04] that cell-associated hyaluronan plays a central role in mediating the early stages of chondrocyte attachment to surfaces. Their results indicate that chondrocytes initially establish “soft contact” with the surface through a hyaluronan-based coat. This hyaluronan-mediated adhesion occurs within seconds after the cell first encounters the surface. Then, within a few tens of seconds to minutes, the hyaluronan-mediated adhesion is replaced by integrin-based interactions, which form sequentially: first dot-shaped focal complexes (FXs), then focal adhesions (FAs), and finally fibrillar adhesions (FBs).
In [ZBCAG04], chondrocytes were allowed to adhere to a serum-coated glass coverslip for 10–25 minutes and were then exposed to shear flow; they drifted under the flow for a considerable distance (compared to their diameters) before detaching from the surface. In [SKE+99], chondrocytes were seeded on the surface of a piece of articular cartilage for specific durations (5–40 minutes) and then exposed to shear flow in a flow chamber; the resistance to shear-stress-induced detachment was observed to increase with seeding duration. In [SBBR+02], by contrast, chondrocytes were given 24 hours for seeding on the luminal surfaces of LVADs and then 4 days in an incubator to promote ECM synthesis and maximize cell adherence. When a flow loop was used to precondition the seeded cells and promote good adherence, the cell loss during the process did not exceed 12%. Since the seeding durations in [SKE+99] are comparable to those used in [ZBCAG04], the results in [SKE+99] suggest that chondrocytes adhere to the surface mainly via the hyaluronan gel and that the number of integrin-based interactions is not yet high enough. In [SBBR+02], on the other hand, the results indicate that the adhesions are mainly integrin-mediated interactions (FBs) between members of the integrin family and the corresponding ECM proteins, such as collagen type-II and fibronectin [Loe93, GHR04].

[Figure 1 (labels: receptor, ligand)]
Fig. 1. Model geometry of cell and surface. The surface is covered by ligands, and the cell is rigid and covered by randomly distributed receptors.
To model cell adhesion, Hammer and coworkers [KH01, CH96] developed an adhesive dynamics algorithm in which adhesion molecules are modeled as linear Hookean springs distributed randomly over the particle surface, as shown in Figure 1. For chondrocytes, which have microvilli on the cell surface [CKGA03], the randomly distributed receptors shown in Figure 1 can still be used. The adhesive dynamics algorithm is as follows:
1. All free receptors in the contact area are tested for bond formation with the substrate ligands against the probability
$$P_f = 1 - \exp(-k_f n_l \tau),$$
where $k_f$ is the forward reaction rate, $n_l$ is the ligand density, and $\tau$ is the time step. If the generated random number (uniformly distributed in [0, 1]) is less than $P_f$, a bond is formed at this time step.
2. All currently bound receptors are tested for breakage against the probability
$$P_r = 1 - \exp(-k_r \tau),$$
where $k_r$ is the reverse reaction rate. If the generated random number is less than $P_r$, the bond breaks at this time step.
3. Each existing bond is characterized by the vector $x_b$, and the force exerted by the spring on the cell is $F_b = \sigma(|x_b| - \lambda)\,u_b$, with Hookean spring constant $\sigma$, equilibrium length $\lambda$, and unit direction vector $u_b = x_b/|x_b|$.
4. The forces from all springs and the associated torques are summed; this is the information that must be included in the Newton–Euler equations to study the interaction of the cell with the Navier–Stokes flow discussed in the following section.
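The four steps above can be sketched in Python for a single two-dimensional cell. This is a minimal illustration under stated assumptions, not the authors' implementation: the wall is the line y = 0, a receptor binds to a ligand located directly below its tip, receptor offsets `rel` are given in the lab frame (cell rotation is ignored), the reverse rate `kr` is held constant, and a bond formed during a step is only tested for breakage from the next step on. The names `Receptor` and `adhesive_dynamics_step` are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec2 = Tuple[float, float]


@dataclass
class Receptor:
    rel: Vec2                   # receptor tip relative to the cell center G (lab frame, assumed)
    bound: bool = False
    anchor: Vec2 = (0.0, 0.0)   # ligand position on the wall y = 0 once bound


def adhesive_dynamics_step(receptors: List[Receptor], center: Vec2, kf: float, nl: float,
                           kr: float, tau: float, sigma: float, lam: float,
                           contact_height: float, rng: random.Random) -> Tuple[Vec2, float]:
    """One pass of steps 1-4; returns the total spring force and torque on the cell."""
    p_form = 1.0 - math.exp(-kf * nl * tau)   # step 1: P_f
    p_break = 1.0 - math.exp(-kr * tau)       # step 2: P_r (constant k_r in this sketch)
    fx = fy = torque = 0.0
    for r in receptors:
        px, py = center[0] + r.rel[0], center[1] + r.rel[1]
        if not r.bound:
            # Step 1: a free receptor inside the contact zone may bind.
            if py <= contact_height and rng.random() < p_form:
                r.bound, r.anchor = True, (px, 0.0)
        elif rng.random() < p_break:
            # Step 2: an existing bond may break.
            r.bound = False
        if r.bound:
            # Step 3: Hookean force, x_b pointing from the receptor tip to its ligand.
            bx, by = r.anchor[0] - px, r.anchor[1] - py
            length = math.hypot(bx, by)
            if length > 0.0:
                mag = sigma * (length - lam)
                f1, f2 = mag * bx / length, mag * by / length
                # Step 4: accumulate the force and the torque about G (z-component of rel x f).
                fx += f1
                fy += f2
                torque += r.rel[0] * f2 - r.rel[1] * f1
    return (fx, fy), torque
```

A stretched bond (length above the equilibrium length) pulls the cell toward the wall; the returned force and torque are what step 4 hands to the particle equations, where they are combined with the short-range repulsion in $F^r$.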
The reverse reaction rate $k_r$ in [KH01] is given by
$$k_r = k_r^0 \exp\Big(\frac{r_0 F}{k_b T}\Big),$$
where $k_r^0$ is the reverse reaction rate when the spring length is at its equilibrium length, $r_0$ is the reactive compliance, $F$ is the force on the bond, equal to $\sigma(|x_b| - \lambda)$, $k_b$ is the Boltzmann constant, and $T$ is the temperature. The ratio of the forward and reverse reaction rates at any separation distance is given by
$$\frac{k_f}{k_r} = \frac{k_f^0}{k_r^0}\exp\Big(-\frac{\sigma(|x_b| - \lambda)^2}{2 k_b T}\Big),$$
where $k_f^0$ is the forward reaction rate when the spring length is at its equilibrium length. Combining the two expressions, the forward reaction rate in [KH01] takes the form
$$k_f = k_f^0 \exp\Big[\frac{\sigma(|x_b| - \lambda)\big(2r_0 - (|x_b| - \lambda)\big)}{2 k_b T}\Big].$$
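The two rate laws can be checked numerically. The Python sketch below (hypothetical function names) evaluates $k_r$ and $k_f$ as functions of the bond stretch $s = |x_b| - \lambda$; the numbers reuse the Table 1 values in CGS units (r0 = 0.02 µm = 2×10⁻⁶ cm, σ = 0.016 dyne/cm, T = 310 K, kf0 = 100/s, kr0 = 10/s) purely as an illustration.

```python
import math

BOLTZMANN_ERG_PER_K = 1.380649e-16   # k_b in CGS units (erg/K)


def reverse_rate(kr0: float, r0: float, force: float, kbT: float) -> float:
    """k_r = k_r0 * exp(r0 * F / (k_b T)), with F = sigma * (|x_b| - lambda)."""
    return kr0 * math.exp(r0 * force / kbT)


def forward_rate(kf0: float, r0: float, sigma: float, stretch: float, kbT: float) -> float:
    """k_f = k_f0 * exp[sigma * s * (2 r0 - s) / (2 k_b T)], with s = |x_b| - lambda."""
    return kf0 * math.exp(sigma * stretch * (2.0 * r0 - stretch) / (2.0 * kbT))


# Table 1 values in CGS units (illustration only).
kbT = BOLTZMANN_ERG_PER_K * 310.0
sigma, r0, kf0, kr0 = 0.016, 2e-6, 100.0, 10.0
for s in (0.0, 2e-6, 5e-6):   # bond stretch in cm
    kr = reverse_rate(kr0, r0, sigma * s, kbT)
    kf = forward_rate(kf0, r0, sigma, s, kbT)
    print(f"stretch={s:.1e} cm  kf={kf:.3g}/s  kr={kr:.3g}/s")
```

Stretching a bond raises $k_r$ and, beyond $s = 2r_0$, lowers $k_f$, so stretched bonds break more easily and re-form less readily; the ratio $k_f/k_r$ reproduces the relation above for every stretch.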
The strength of the adhesion of each cell (i.e., the number of bonds formed by the above dynamical process) depends on the densities of ligands and receptors in the contact region between the cell and the surface, on the area of the contact region, and on the two reaction rates. For hyaluronan-mediated adhesion, the above dynamical bonding approach is a good model. For the integrin-mediated adhesions of chondrocytes reported in [SBBR+02], we can still apply the above model to form bonds probabilistically, with two modifications: (1) larger spring constants are used, since focal adhesions and fibrillar adhesions are much stronger than hyaluronan-mediated adhesions; (2) once the number of bonds reaches its plateau, we switch to a deterministic approach and break a bond only when its length exceeds a chosen threshold.
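One way to express the two-regime breakage rule for integrin-mediated bonds is sketched below. The chapter does not say how the plateau is detected or which threshold length is used, so `plateau_reached` (a window test on the bond count) and `max_length` are hypothetical choices; modification (1) would simply be a larger `sigma` in the force computation.

```python
import math
import random
from typing import Sequence


def plateau_reached(bond_counts: Sequence[int], window: int, tol: int = 0) -> bool:
    """Hypothetical plateau test: the bond count varied by at most `tol` over the last `window` steps."""
    if len(bond_counts) < window:
        return False
    recent = bond_counts[-window:]
    return max(recent) - min(recent) <= tol


def bond_breaks(length: float, kr: float, tau: float, deterministic: bool,
                max_length: float, rng: random.Random) -> bool:
    """Breakage decision for one bond under the two-regime model."""
    if deterministic:
        # After the plateau: break only if the bond is longer than the chosen length.
        return length > max_length
    # Before the plateau: probabilistic test of step 2, P_r = 1 - exp(-k_r tau).
    return rng.random() < 1.0 - math.exp(-kr * tau)
```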
3 A Fictitious Domain Formulation
for the Fluid/Particle Interaction and Its Discretization
3.1 Fictitious Domain Formulation
In this section we briefly discuss the fictitious domain formulation for the fluid–particle interaction in shear flow, and its discretization in space and time, developed in [PG02]. Let $\Omega \subset \mathbb{R}^2$ be a rectangular region (three-dimensional cases are discussed in [PG05]). We suppose that $\Omega$ is filled with a Newtonian viscous incompressible fluid of density $\rho_f$ and viscosity $\mu_f$, and that it contains a moving, neutrally buoyant rigid particle $B$ of density $\rho_f$ centered at $G = \{G_1, G_2\}^t$ (see Fig. 2); the flow is modeled by the Navier–Stokes equations and the motion of $B$ is described by the Euler–Newton equations.

Fig. 2. An example of a two-dimensional flow region with one rigid body.

We define
$$
\begin{aligned}
W_{g_0,p} &= \{v \mid v \in (H^1(\Omega))^2,\ v = g_0(t) \text{ on the top and bottom of } \Omega \text{ and } v \text{ is periodic in the } x_1\text{-direction}\},\\
W_{0,p} &= \{v \mid v \in (H^1(\Omega))^2,\ v = 0 \text{ on the top and bottom of } \Omega \text{ and } v \text{ is periodic in the } x_1\text{-direction}\},\\
L^2_0 &= \Big\{q \mid q \in L^2(\Omega),\ \int_\Omega q\,dx = 0\Big\},\\
\Lambda_0(t) &= \{\mu \mid \mu \in (H^1(B(t)))^2,\ \langle \mu, e_i\rangle_{B(t)} = 0,\ i = 1, 2,\ \langle \mu, \overrightarrow{Gx}^{\perp}\rangle_{B(t)} = 0\},
\end{aligned}
$$
with $e_1 = \{1, 0\}^t$, $e_2 = \{0, 1\}^t$, $\overrightarrow{Gx}^{\perp} = \{-(x_2 - G_2),\, x_1 - G_1\}^t$, and $\langle\cdot,\cdot\rangle_{B(t)}$ an inner product on $\Lambda_0(t)$, which can be the standard inner product on $(H^1(B(t)))^2$ (see [GPH+01, Section 5] for further information on the choice of $\langle\cdot,\cdot\rangle_{B(t)}$). The fictitious domain formulation with distributed Lagrange multipliers for flow around a freely moving, neutrally buoyant particle (see [GPHJ99, GPH+01] for a detailed discussion of non-neutrally buoyant cases) is then as follows:
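For a.e. $t > 0$, find $u(t) \in W_{g_0,p}$, $p(t) \in L^2_0$, $V_G(t) \in \mathbb{R}^2$, $G(t) \in \mathbb{R}^2$, $\omega(t) \in \mathbb{R}$, $\lambda(t) \in \Lambda_0(t)$ such that
$$
\begin{aligned}
&\rho_f \int_\Omega \Big[\frac{\partial u}{\partial t} + (u \cdot \nabla)u\Big] \cdot v\,dx + 2\mu_f \int_\Omega D(u) : D(v)\,dx - \int_\Omega p\,\nabla \cdot v\,dx - \langle \lambda, v\rangle_{B(t)}\\
&\qquad = \rho_f \int_\Omega g \cdot v\,dx + \int_\Omega F \cdot v\,dx, \quad \forall v \in W_{0,p},
\end{aligned}
\tag{1}
$$
$$\int_\Omega q\,\nabla \cdot u(t)\,dx = 0, \quad \forall q \in L^2(\Omega), \tag{2}$$
$$\langle \mu, u(t)\rangle_{B(t)} = 0, \quad \forall \mu \in \Lambda_0(t), \tag{3}$$
$$\frac{dG}{dt} = V_G, \tag{4}$$
$$V_G(0) = V_G^0, \quad \omega(0) = \omega^0, \quad G(0) = G^0 = \{G_1^0, G_2^0\}^t, \tag{5}$$
$$
u(x, 0) =
\begin{cases}
u_0(x), & \forall x \in \Omega \setminus B(0),\\
V_G^0 + \omega^0 \{-(x_2 - G_2^0),\, x_1 - G_1^0\}^t, & \forall x \in B(0),
\end{cases}
\tag{6}
$$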
where $u$ and $p$ denote the velocity and pressure, respectively; the velocity boundary data $g_0(t)$ are $0$ at the bottom of $\Omega$ and $(c, 0)^t$ at the top of $\Omega$, with a fixed speed $c$ driving the shear flow; $\lambda$ is a Lagrange multiplier; $D(v) = [\nabla v + (\nabla v)^t]/2$; $g$ is gravity; $F$ is the pressure gradient pointing in the $x_1$-direction; $V_G$ is the translation velocity of the particle $B$; and $\omega$ is the angular velocity of $B$. We suppose that the no-slip condition holds on $\partial B$. We also use, if necessary, the notation $\phi(t)$ for the function $x \to \phi(x, t)$.
Remark 1. The hydrodynamical forces and torque imposed on the rigid body by the fluid are built into (1)–(6) implicitly (see [GPHJ99, GPH+01] for details); thus we do not need to compute them explicitly in the simulation. Since in (1)–(6) the flow field is defined on the entire domain $\Omega$, it can be computed on a simple structured grid.
The forces exerted by the Hookean springs of the cell adhesion model have been split off from the above equations; they are used, together with the short-range repulsion force, when predicting and correcting the motion and positions of the cells, as discussed in the next subsection. □
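Remark 2. In (3), the rigid body motion in the region occupied by the particle is enforced via the Lagrange multiplier $\lambda$. To recover the translation velocity $V_G(t)$ and the angular velocity $\omega(t)$, we solve the following equations:
$$
\begin{cases}
\langle e_i,\ u(t) - V_G(t) - \omega(t)\,\overrightarrow{Gx}^{\perp}\rangle_{B(t)} = 0, & \text{for } i = 1, 2,\\
\langle \overrightarrow{Gx}^{\perp},\ u(t) - V_G(t) - \omega(t)\,\overrightarrow{Gx}^{\perp}\rangle_{B(t)} = 0.
\end{cases}
\tag{7}
$$
□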
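When the inner product in (7) is replaced by a discrete pairing over collocation points $x_j$ covering the particle, of the form $\sum_j a(x_j)\cdot b(x_j)$, system (7) becomes a 3 × 3 linear system for $(V_G, \omega)$; it is exactly the normal-equation form of a least-squares fit of a rigid motion to the velocities at those points. A minimal Python sketch, assuming the nodal velocities are already known (function names are hypothetical):

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: List[List[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def recover_rigid_motion(points: Sequence[Point], velocities: Sequence[Point],
                         center: Point) -> Tuple[Point, float]:
    """Solve the discrete form of (7) for (V_G, omega)."""
    g1, g2 = center
    n = float(len(points))
    s1 = s2 = srr = 0.0   # sums of the components of Gx_perp and of |Gx_perp|^2
    b1 = b2 = b3 = 0.0    # right-hand sides: sums of u and of Gx_perp . u
    for (x1, x2), (u1, u2) in zip(points, velocities):
        p1, p2 = -(x2 - g2), x1 - g1   # Gx_perp at this collocation point
        s1 += p1
        s2 += p2
        srr += p1 * p1 + p2 * p2
        b1 += u1
        b2 += u2
        b3 += p1 * u1 + p2 * u2
    a = [[n, 0.0, s1], [0.0, n, s2], [s1, s2, srr]]
    rhs = [b1, b2, b3]
    d = _det3(a)
    sol = []
    for k in range(3):   # Cramer's rule is adequate for a 3 x 3 system
        mk = [row[:] for row in a]
        for i in range(3):
            mk[i][k] = rhs[i]
        sol.append(_det3(mk) / d)
    return (sol[0], sol[1]), sol[2]
```

If the sampled velocities are exactly a rigid motion, the sketch returns that motion; otherwise it returns the rigid motion closest to them in the discrete least-squares sense.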
Remark 3. In (1), $2\int_\Omega D(u) : D(v)\,dx$ can be replaced by $\int_\Omega \nabla u : \nabla v\,dx$, since $u$ is divergence-free and $v \in W_{0,p}$. Also, the gravity term $g$ in (1) can be absorbed into the pressure term. □
3.2 Space Approximation and Time Discretization
Concerning the space approximation of problem (1)–(6) by a finite element method, we have chosen P1-iso-P2 and P1 finite elements for the velocity field and the pressure, respectively (as in [BGP87]). More precisely, with $h$ a space discretization step, we introduce a finite element triangulation $\mathcal{T}_h$ of $\Omega$ and then a twice coarser triangulation $\mathcal{T}_{2h}$. (In practice, one constructs $\mathcal{T}_{2h}$ first and then obtains $\mathcal{T}_h$ by joining the midpoints of the edges of $\mathcal{T}_{2h}$, thus dividing each triangle of $\mathcal{T}_{2h}$ into four similar subtriangles, as shown in Figure 3.) We then approximate $W_{g_0,p}$, $W_{0,p}$, $L^2$ and $L^2_0$ by the following finite-dimensional spaces, respectively:
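$$W_{g_0,h}(t) = \{v_h \mid v_h \in (C^0(\Omega))^2,\ v_h|_T \in P_1 \times P_1,\ \forall T \in \mathcal{T}_h,\ v_h = g_0(t) \text{ on the top and bottom of } \Omega \text{ and } v_h \text{ is periodic at } \Gamma \text{ in the } x_1\text{-direction}\}, \tag{8}$$
$$W_{0,h} = \{v_h \mid v_h \in (C^0(\Omega))^2,\ v_h|_T \in P_1 \times P_1,\ \forall T \in \mathcal{T}_h,\ v_h = 0 \text{ on the top and bottom of } \Omega \text{ and } v_h \text{ is periodic at } \Gamma \text{ in the } x_1\text{-direction}\}, \tag{9}$$
$$L^2_h = \{q_h \mid q_h \in C^0(\Omega),\ q_h|_T \in P_1,\ \forall T \in \mathcal{T}_{2h},\ q_h \text{ is periodic at } \Gamma \text{ in the } x_1\text{-direction}\}, \tag{10}$$
$$L^2_{0,h} = \Big\{q_h \mid q_h \in L^2_h,\ \int_\Omega q_h\,dx = 0\Big\}. \tag{11}$$
In (8)–(11), $P_1$ is the space of polynomials in two variables of degree ≤ 1.

Fig. 3. Subdivision of a triangle of $\mathcal{T}_{2h}$.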
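The construction of $\mathcal{T}_h$ from $\mathcal{T}_{2h}$ (Fig. 3) is straightforward to automate. The Python sketch below (hypothetical function name; vertices as coordinate pairs, triangles as index triples) splits every coarse triangle into four by joining edge midpoints, sharing each midpoint between the two triangles adjacent to an edge so that the fine mesh stays conforming. The P1 velocity unknowns then live on all fine vertices, while the P1 pressure uses only the coarse ones.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Tri = Tuple[int, int, int]


def refine_p1_iso_p2(vertices: List[Point], triangles: List[Tri]) -> Tuple[List[Point], List[Tri]]:
    """Build T_h from T_2h by joining edge midpoints (each coarse triangle -> 4 similar ones)."""
    verts = list(vertices)
    mid_of_edge: Dict[Tuple[int, int], int] = {}   # shared so neighbouring triangles reuse midpoints

    def midpoint(i: int, j: int) -> int:
        key = (min(i, j), max(i, j))
        if key not in mid_of_edge:
            (xi, yi), (xj, yj) = verts[i], verts[j]
            verts.append(((xi + xj) / 2.0, (yi + yj) / 2.0))
            mid_of_edge[key] = len(verts) - 1
        return mid_of_edge[key]

    fine: List[Tri] = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # three corner triangles plus the central one; the orientation of (a, b, c) is preserved
        fine += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return verts, fine
```

For a unit square split into two coarse triangles, this yields 9 vertices and 8 fine triangles of area 1/8 each.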
Remark 4. A different finite element for the velocity field, the Taylor–Hood element, was considered in [JGP02] for simulating fluid/particle interaction with the distributed Lagrange multiplier based fictitious domain method in the case of non-neutrally buoyant particles. □
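A finite-dimensional space approximating $\Lambda_0(t)$ is defined as follows: let $\{x_i\}_{i=1}^N$ be a set of points covering $B(t)$ (see Figure 4, for example); we then define
$$\Lambda_h(t) = \Big\{\mu_h \;\Big|\; \mu_h = \sum_{i=1}^N \mu_i\,\delta(x - x_i),\ \mu_i \in \mathbb{R}^2,\ \forall i = 1, \dots, N\Big\}, \tag{12}$$
where $\delta(\cdot)$ is the Dirac measure at $x = 0$. Then, instead of the scalar product of $(H^1(B(t)))^2$, we shall use $\langle\cdot,\cdot\rangle_{B_h(t)}$ defined by
$$\langle \mu_h, v_h\rangle_{B_h(t)} = \sum_{i=1}^N \mu_i \cdot v_h(x_i), \quad \forall \mu_h \in \Lambda_h(t),\ v_h \in W_{0,h}. \tag{13}$$
Then we approximate $\Lambda_0(t)$ by
$$\Lambda_{0,h}(t) = \Big\{\mu_h \;\Big|\; \mu_h \in \Lambda_h(t),\ \langle \mu_h, e_i\rangle_{B_h(t)} = 0,\ i = 1, 2,\ \langle \mu_h, \overrightarrow{Gx}^{\perp}\rangle_{B_h(t)} = 0\Big\}. \tag{14}$$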
Fig. 4. An example of a set of collocation points chosen for enforcing the rigid body motion inside the disk and at its boundary.
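A set of collocation points in the spirit of Fig. 4 can be generated as follows for the elliptical cells used in Section 4. This is only a sketch under assumptions that the chapter does not state: interior points are taken as the nodes of a uniform grid of spacing h lying strictly inside the ellipse, boundary points are equally spaced in the parametric angle, and the ellipse is axis-aligned (rotation ignored). `discrete_pairing` evaluates the pairing (13); both function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def ellipse_collocation_points(center: Point, ra: float, rb: float, h: float,
                               n_boundary: int) -> List[Point]:
    """Grid nodes of spacing h strictly inside the ellipse, plus n_boundary boundary points."""
    g1, g2 = center
    pts: List[Point] = []
    for i in range(math.floor((g1 - ra) / h), math.ceil((g1 + ra) / h) + 1):
        for j in range(math.floor((g2 - rb) / h), math.ceil((g2 + rb) / h) + 1):
            x, y = i * h, j * h
            if ((x - g1) / ra) ** 2 + ((y - g2) / rb) ** 2 < 1.0:
                pts.append((x, y))
    for k in range(n_boundary):
        t = 2.0 * math.pi * k / n_boundary
        pts.append((g1 + ra * math.cos(t), g2 + rb * math.sin(t)))
    return pts


def discrete_pairing(mu: Sequence[Point], v: Callable[[Point], Point],
                     points: Sequence[Point]) -> float:
    """<mu_h, v_h>_{B_h} = sum_i mu_i . v_h(x_i), cf. (13)."""
    total = 0.0
    for (m1, m2), p in zip(mu, points):
        v1, v2 = v(p)
        total += m1 * v1 + m2 * v2
    return total
```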
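Using the above finite-dimensional spaces leads to the following approximation of problem (1)–(6):
For a.e. $t > 0$, find $u_h(t) \in W_{g_0,h}(t)$, $p_h(t) \in L^2_{0,h}$, $V_G(t) \in \mathbb{R}^2$, $G(t) \in \mathbb{R}^2$, $\omega(t) \in \mathbb{R}$, $\lambda_h(t) \in \Lambda_{0,h}(t)$ such that
$$
\begin{aligned}
&\rho_f \int_\Omega \Big[\frac{\partial u_h}{\partial t} + (u_h \cdot \nabla)u_h\Big] \cdot v\,dx + \mu_f \int_\Omega \nabla u_h : \nabla v\,dx - \int_\Omega p_h\,\nabla \cdot v\,dx - \langle \lambda_h, v\rangle_{B_h(t)}\\
&\qquad = \int_\Omega F \cdot v\,dx, \quad \forall v \in W_{0,h},
\end{aligned}
\tag{15}
$$
$$\int_\Omega q\,\nabla \cdot u_h(t)\,dx = 0, \quad \forall q \in L^2_h, \tag{16}$$
$$\langle \mu, u_h(t)\rangle_{B_h(t)} = 0, \quad \forall \mu \in \Lambda_{0,h}(t), \tag{17}$$
$$\frac{dG}{dt} = V_G, \tag{18}$$
$$V_G(0) = V_G^0, \quad \omega(0) = \omega^0, \quad G(0) = G^0 = \{G_1^0, G_2^0\}^t, \tag{19}$$
$$u_h(x, 0) = u_{0,h}(x) \quad (\text{with } \nabla \cdot u_{0,h} = 0). \tag{20}$$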
Applying a first-order operator-splitting scheme (Lie's scheme [CHMM78]), combined with the backward Euler scheme for some of the fractional steps, to the time discretization of (15)–(20), we obtain (after dropping some of the subscripts h):
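Algorithm 1
Step 1. $u^0 = u_{0,h}$, $V_G^0$, $\omega^0$, and $G^0$ are given;
Step 2. For $n \ge 0$, knowing $u^n$, $V_G^n$, $\omega^n$ and $G^n$, compute $u^{n+1/6}$ and $p^{n+1/6}$ via the solution of
$$
\begin{cases}
\rho_f \displaystyle\int_\Omega \frac{u^{n+1/6} - u^n}{\Delta t} \cdot v\,dx - \int_\Omega p^{n+1/6}\,\nabla \cdot v\,dx = 0, & \forall v \in W_{0,h},\\[2mm]
\displaystyle\int_\Omega q\,\nabla \cdot u^{n+1/6}\,dx = 0, & \forall q \in L^2_h;\quad u^{n+1/6} \in W^{n+1}_{g_0,h},\ p^{n+1/6} \in L^2_{0,h}.
\end{cases}
\tag{21}
$$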
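Step 3. Compute $u^{n+2/6}$ via the solution of
$$
\begin{cases}
\displaystyle\int_\Omega \frac{\partial u}{\partial t} \cdot v\,dx + \int_\Omega (u^{n+1/6} \cdot \nabla)u \cdot v\,dx = 0, & \forall v \in W_{0,h},\ \text{on } (t^n, t^{n+1}),\\[2mm]
u(t^n) = u^{n+1/6};\quad u(t) \in W^{n+1}_{g_0,h},
\end{cases}
\tag{22}
$$
$$u^{n+2/6} = u(t^{n+1}). \tag{23}$$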
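Step 4. Compute $u^{n+3/6}$ via the solution of
$$
\begin{cases}
\rho_f \displaystyle\int_\Omega \frac{u^{n+3/6} - u^{n+2/6}}{\Delta t} \cdot v\,dx + \alpha\mu_f \int_\Omega \nabla u^{n+3/6} : \nabla v\,dx = 0,\\[2mm]
\forall v \in W_{0,h};\quad u^{n+3/6} \in W^{n+1}_{g_0,h}.
\end{cases}
\tag{24}
$$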
Step 5. Predict the position and the translation velocity of the center of mass of the particles as follows: take $V_G^{n+4/6,0} = V_G^n$ and $G^{n+4/6,0} = G^n$. Then predict the new position of the particle via the following subcycling and predicting-correcting technique:
For k = 1, ..., N,
Call Adhesive Dynamics Algorithm,
$$\widetilde{V}_G^{n+4/6,k} = V_G^{n+4/6,k-1} + F^r(G^{n+4/6,k-1})\,\Delta t/(2N), \tag{25}$$
$$\widetilde{G}^{n+4/6,k} = G^{n+4/6,k-1} + \big(\widetilde{V}_G^{n+4/6,k} + V_G^{n+4/6,k-1}\big)\,\Delta t/(4N), \tag{26}$$
$$V_G^{n+4/6,k} = V_G^{n+4/6,k-1} + \big(F^r(\widetilde{G}^{n+4/6,k}) + F^r(G^{n+4/6,k-1})\big)\,\Delta t/(4N), \tag{27}$$
$$G^{n+4/6,k} = G^{n+4/6,k-1} + \big(V_G^{n+4/6,k} + V_G^{n+4/6,k-1}\big)\,\Delta t/(4N), \tag{28}$$
enddo;
and let $V_G^{n+4/6} = V_G^{n+4/6,N}$, $G^{n+4/6} = G^{n+4/6,N}$.
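Step 6. Now compute $u^{n+5/6}$, $\lambda^{n+5/6}$, $V_G^{n+5/6}$, and $\omega^{n+5/6}$ via the solution of
$$
\begin{cases}
\rho_f \displaystyle\int_\Omega \frac{u^{n+5/6} - u^{n+3/6}}{\Delta t} \cdot v\,dx + \beta\mu_f \int_\Omega \nabla u^{n+5/6} : \nabla v\,dx = \langle \lambda^{n+5/6}, v\rangle_{B_h^{n+4/6}}, & \forall v \in W_{0,h},\\[2mm]
\langle \mu, u^{n+5/6}\rangle_{B_h^{n+4/6}} = 0, & \forall \mu \in \Lambda_{0,h}^{n+4/6};\quad u^{n+5/6} \in W^{n+1}_{g_0,h},\ \lambda^{n+5/6} \in \Lambda_{0,h}^{n+4/6},
\end{cases}
\tag{29}
$$
and solve for $V_G^{n+5/6}$ and $\omega^{n+5/6}$ from
$$
\begin{cases}
\langle e_i,\ u^{n+5/6} - V_G^{n+5/6} - \omega^{n+5/6}\,\overrightarrow{G^{n+4/6}x}^{\perp}\rangle_{B_h^{n+4/6}} = 0, & \text{for } i = 1, 2,\\
\langle \overrightarrow{G^{n+4/6}x}^{\perp},\ u^{n+5/6} - V_G^{n+5/6} - \omega^{n+5/6}\,\overrightarrow{G^{n+4/6}x}^{\perp}\rangle_{B_h^{n+4/6}} = 0.
\end{cases}
\tag{30}
$$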
Step 7. Finally, take $V_G^{n+1,0} = V_G^{n+5/6}$ and $G^{n+1,0} = G^{n+4/6}$. Then predict the final position and translation velocity as follows:
For k = 1, ..., N,
Call Adhesive Dynamics Algorithm,
$$\widetilde{V}_G^{n+1,k} = V_G^{n+1,k-1} + F^r(G^{n+1,k-1})\,\Delta t/(2N), \tag{31}$$
$$\widetilde{G}^{n+1,k} = G^{n+1,k-1} + \big(\widetilde{V}_G^{n+1,k} + V_G^{n+1,k-1}\big)\,\Delta t/(4N), \tag{32}$$
$$V_G^{n+1,k} = V_G^{n+1,k-1} + \big(F^r(\widetilde{G}^{n+1,k}) + F^r(G^{n+1,k-1})\big)\,\Delta t/(4N), \tag{33}$$
$$G^{n+1,k} = G^{n+1,k-1} + \big(V_G^{n+1,k} + V_G^{n+1,k-1}\big)\,\Delta t/(4N), \tag{34}$$
enddo;
and let $V_G^{n+1} = V_G^{n+1,N}$, $G^{n+1} = G^{n+1,N}$; and set $u^{n+1} = u^{n+5/6}$, $\omega^{n+1} = \omega^{n+5/6}$.
In Algorithm 1, we have $t^{n+s} = (n + s)\Delta t$, $W^{n+1}_{g_0,h} = W_{g_0,h}(t^{n+1})$, $\Lambda^{n+s}_{0,h} = \Lambda_{0,h}(t^{n+s})$, $B_h^{n+s}$ is the region occupied by the particle centered at $G^{n+s}$, and $F^r$ is the combination of a short-range repulsion force, which prevents particle/particle and particle/wall penetration (see, e.g., [GPHJ99, GPH+01]), and the force obtained from the adhesive dynamics algorithm for cell adhesion. Finally, $\alpha$ and $\beta$ satisfy $\alpha + \beta = 1$; we have chosen $\alpha = 1$ and $\beta = 0$ in the numerical simulations discussed later.
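Steps 5 and 7 share the same subcycled predictor-corrector. The Python sketch below (hypothetical names) advances $(V_G, G)$ over $\Delta t/2$ with N substeps of size $\Delta t/(2N)$, following (25)–(28). Like those equations, it adds $F^r$ directly to the velocity increment (i.e., $F^r$ is taken per unit mass), and a user-supplied callback stands in for the call to the adhesive dynamics algorithm.

```python
from typing import Callable, Optional, Tuple

Vec2 = Tuple[float, float]


def subcycle_particle(v0: Vec2, g0: Vec2, force: Callable[[Vec2], Vec2], dt: float, n_sub: int,
                      adhesive_dynamics: Optional[Callable[[Vec2], None]] = None) -> Tuple[Vec2, Vec2]:
    """Advance (V_G, G) over dt/2 with n_sub predictor-corrector substeps, cf. (25)-(28)."""
    h = dt / (2 * n_sub)   # substep length dt/(2N); h/2 is the dt/(4N) factor in (26)-(28)
    v, g = v0, g0
    for _ in range(n_sub):
        if adhesive_dynamics is not None:
            adhesive_dynamics(g)   # placeholder for "Call Adhesive Dynamics Algorithm"
        f_old = force(g)
        # (25) predicted velocity
        v_pred = (v[0] + f_old[0] * h, v[1] + f_old[1] * h)
        # (26) predicted position
        g_pred = (g[0] + (v_pred[0] + v[0]) * h / 2, g[1] + (v_pred[1] + v[1]) * h / 2)
        # (27) corrected velocity, trapezoidal in the force
        f_pred = force(g_pred)
        v_new = (v[0] + (f_pred[0] + f_old[0]) * h / 2, v[1] + (f_pred[1] + f_old[1]) * h / 2)
        # (28) corrected position, trapezoidal in the velocity
        g_new = (g[0] + (v_new[0] + v[0]) * h / 2, g[1] + (v_new[1] + v[1]) * h / 2)
        v, g = v_new, g_new
    return v, g
```

For a constant force the scheme reproduces uniformly accelerated motion exactly, which makes a convenient sanity check; subcycling lets the stiff spring and repulsion forces be resolved on a finer time scale than the flow solver's $\Delta t$.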
The degenerate quasi-Stokes problem (21) is solved by the preconditioned conjugate gradient method introduced in [GPP98], in which the discrete elliptic problems arising from the preconditioning are solved by a matrix-free fast solver from FISHPAK by Adams et al. [ASS80]. The advection problem (22) for the velocity field is solved by a wave-like equation method as in [DG97]. Problem (24) is a classical discrete elliptic problem, which can be solved by the same matrix-free fast solver. To enforce the rigid body motion inside the region occupied by the particles, we have applied the conjugate gradient method discussed in [PG02, PG05].
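The first-order behavior of the Lie splitting [CHMM78] underlying Algorithm 1 can be observed on a scalar model problem that is not taken from the chapter: du/dt = -a u - u², split into a linear part treated by backward Euler (as in Step 4) and a nonlinear part integrated exactly. Halving the time step should roughly halve the error at a fixed final time. Function names are hypothetical.

```python
import math


def lie_splitting(u0: float, a: float, t_end: float, n_steps: int) -> float:
    """Lie (first-order) splitting for du/dt = -a*u - u**2 on (0, t_end)."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = u / (1.0 + a * dt)   # fractional step 1: backward Euler for du/dt = -a*u
        u = u / (1.0 + u * dt)   # fractional step 2: exact flow of du/dt = -u**2 over dt
    return u


def exact_solution(u0: float, a: float, t: float) -> float:
    """Closed-form solution of the Bernoulli equation du/dt = -a*u - u**2."""
    e = math.exp(-a * t)
    return a * u0 * e / (a + u0 * (1.0 - e))


for n in (50, 100, 200):
    err = abs(lie_splitting(1.0, 1.0, 1.0, n) - exact_solution(1.0, 1.0, 1.0))
    print(n, err)
```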
4 Numerical Results and Discussion
We consider the detachment of 20 cells in shear flow as the test problem for the cell adhesion model at the initial stage of adhesion. The computational domain is Ω = (0, 23) × (0, 10) (unit: 10 µm). The cells are elliptical, with long semi-axis $r_a = 0.5$ and short semi-axis $r_b = 0.4$. The velocity boundary conditions are a given constant on the top boundary, zero on the bottom boundary, and periodicity in the horizontal direction. Initially, the fluid and the cells are at rest and the cells are in the contact region (see Fig. 6(a)). We assume that the densities of the fluid and the cells are 1 g/cm³. The mesh size h for the flow field is 1/48 and the time step Δt is 0.001 (unit: 0.1 s). The parameters used in the simulations are given in Table 1.

Table 1. Simulation parameters.
Parameter | Definition | Simulation value
R | cell radius | 4.0–5.0 µm
Nr | receptor number | 780
NL | ligand density | 10⁶–10⁸ /cm
λ | equilibrium bond length | 0.2 µm
σ | spring constant | 0.016 dyne/cm
µ | viscosity | 0.01–0.014 g/cm·s
ρ | fluid density | 1.0 g/cm²
Umax | shear rate | 20–80/s
Hc | cut-off length | 0.4 µm
T | temperature | 310 K
kf0 | forward reaction rate | 100.0/s
kr0 | reverse reaction rate | 10.0/s
r0 | reactive compliance | 0.02 µm
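The nondimensional settings above translate into physical quantities as follows. The sketch (hypothetical function name) converts the mesh size, time step, and final time, counts the mesh intervals of $\mathcal{T}_h$ and the time steps, and estimates the top-wall speed needed for a given shear rate. That last conversion assumes a linear (Couette) velocity profile across the channel height, which the chapter does not state explicitly.

```python
LENGTH_UNIT_UM = 10.0   # one code length unit = 10 µm (Section 4)
TIME_UNIT_S = 0.1       # one code time unit = 0.1 s (Section 4)


def run_summary(lx=23.0, ly=10.0, h=1.0 / 48.0, dt=0.001, t_end=100.0, shear_rate=30.0):
    """Convert the nondimensional run settings to physical units (shear_rate in 1/s)."""
    velocity_unit_um_s = LENGTH_UNIT_UM / TIME_UNIT_S   # 100 µm/s
    height_um = ly * LENGTH_UNIT_UM                     # channel height, 100 µm
    top_speed_um_s = shear_rate * height_um             # assumes a linear (Couette) profile
    return {
        "h_um": h * LENGTH_UNIT_UM,
        "dt_s": dt * TIME_UNIT_S,
        "t_end_s": t_end * TIME_UNIT_S,
        "nx": round(lx / h),                            # intervals of T_h along x1
        "ny": round(ly / h),                            # intervals of T_h along x2
        "n_steps": round(t_end / dt),
        "top_speed_code": top_speed_um_s / velocity_unit_um_s,
    }


for key, value in run_summary().items():
    print(key, value)
```

With these settings the mesh has 1104 × 480 intervals, a run to t = 100 (10 s) takes 100,000 time steps, and h corresponds to about 0.21 µm.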
We ran the simulations up to t = 100 (10 s), which is long enough for the flow to be fully developed. The simulations were conducted at different shear rates and dynamical viscosities, and the results are summarized in Table 2. For a dynamical viscosity of 0.01 g/cm-s, the table shows that no cells were detached from the wall by the end of the run at a shear rate of 20/s, that the detachment percentage increases from 10% to 40% as the shear rate increases from 30/s to 40/s, and that all 20 cells were detached at a shear rate of 80/s. Figure 5 shows the effect of shear rate on cell detachment; this trend agrees qualitatively with the in vitro experiments of [SKE+99]. We also observed that, at a shear rate of 30/s, the detachment percentage increases from 10% to 35% when the dynamical viscosity is increased from 0.01 to 0.014 g/cm-s.
Figure 6 shows snapshots of the positions of the 20 cells at t = 0, 5, 5.35, 6.06, 9.49, and 10 s for the simulation with dynamical viscosity 0.01 g/cm-s and shear rate 30/s. The snapshots clearly depict the process of cell detachment from the wall: all the cells adhered to the wall at t = 5 s; one cell was about to detach at t = 5.35 s; and one cell had completely detached from the layer at t = 6.06 s. We found that, during the early stage of detachment, the percentage of detached cells is strongly linearly correlated with the observed time. This behavior was also found in the in vitro experiments of [SBBR+02].

Table 2. The calculated detachment percentages at t = 10 s.
Dynamical viscosity (g/cm-s) | Shear rate (/s) | Detachment (%)
0.01 | 20 | 0
0.01 | 30 | 10
0.014 | 30 | 35
0.01 | 40 | 40
0.01 | 80 | 100

[Figure 5: percentage of detached cells (%), 0 to 100, versus shear rate (/s), 20 to 80]
Fig. 5. The effect of shear rate on cell detachment (viscosity = 0.01 g/cm-s).
We have used our models and algorithms to simulate the adhesion and detachment of chondrocytes. The simulations successfully depict the process of cell detachment from the wall, and the numerical results agree qualitatively with experiments reported in the literature. Since there are few publications on modeling chondrocytes for this problem, our modeling and simulations are still preliminary. More work is needed on the modeling and on determining the parameters for cell adhesion at its different stages, as discussed in [ZBCAG04].
Acknowledgement. We acknowledge the helpful comments and suggestions of R.
Bai, S. Canic, E. J. Dean, R. Glowinski, J. He, H. H. Hu, P. Y. Huang, G. P.
Galdi, D. D. Joseph, and Y. Kuznetsov. We acknowledge also the support of NSF
(grants ECS-9527123, CTS-9873236, DMS-9973318, CCR-9902035, DMS-0209066,
DMS-0443826) and DOE/LASCI (grant R71700K-292-000-99).
[Figure 6: panels (a) to (f), cell positions in the (x, y) plane]
Fig. 6. Snapshots of 20 cells at t = 0.0 s (a), 5.0 s (b), 5.35 s (c), 6.06 s (d), 9.49 s (e), and 10.0 s (f) (viscosity = 0.01 g/cm-s, shear rate = 30/s). The percentage of detached cells is 10% at t = 10.0 s.
222
J. Hao et al.
References
[ASS80]
J. Adams, P. Swarztrauber, and R. Sweet. FISHPAK: A package of
Fortran subprograms for the solution of separable elliptic partial differential equations. The National Center for Atmospheric Research,
Boulder, CO, 1980.
[BGP87]
M. O. Bristeau, R. Glowinski, and J. Periaux. Numerical methods for
the Navier–Stokes equations. Applications to the simulation of compressible and incompressible viscous flow. Comput. Phys. Reports,
6:73–187, 1987.
[CH96]
K. Chang and D. Hammer. Influence of direction and type of applied
force on the detachment of macromolecularly-bound particles from surfaces. Langmuir, 12:2271–2282, 1996.
[CHMM78] A. J. Chorin, T. J. R. Hughes, J. E. Marsden, and M. McCracken.
Product formulas and numerical algorithms. Comm. Pure Appl. Math.,
31:205–256, 1978.
[CKGA03] M. Cohen, E. Klein, B. Geiger, and L. Addadi. Organization and
adhesive properties of the hyaluronan pericellular coat of chondrocytes
and epithelial cells. Biophys. J., 85:1996–2005, 2003.
[DG97]
E. J. Dean and R. Glowinski. A wave equation approach to the numerical solution of the Navier–Stokes equations for incompressible viscous
flow. C. R. Acad. Sci. Paris Sér. I Math., 325(7):783–791, 1997.
[GHR04]
U. R. Goessler, K. Hörmann, and F. Riedel. Tissue engineering with
chondrocytes and function of the extracellular matrix (review). Int. J.
Mol. Med., 13:505–513, 2004.
[GPH+ 01] R. Glowinski, T.-W. Pan, T. I. Hesla, D. D. Joseph, and J. Périaux.
A fictitious domain approach to the direct numerical simulation of incompressible viscous flow past moving rigid bodies: Application to particulate flow. J. Comput. Phys., 169(2):363–426, 2001.
[GPHJ99] R. Glowinski, T.-W. Pan, T. Hesla, and D. D. Joseph. A distributed
Lagrange multiplier/fictitious domain method for particulate flows. Int.
J. Multiph. Flow, 25(5):755–794, 1999.
[GPP98]
R. Glowinski, T.-W. Pan, and J. Périaux. Distributed Lagrange multiplier methods for incompressible flow around moving rigid bodies.
Comput. Methods Appl. Mech. Engrg., 151(1–2):181–194, 1998.
[JGP02]
L. H. Juarez, R. Glowinski, and T.-W. Pan. Numerical simulation of
the sedimentation of rigid bodies in an incompressible viscous fluid
by Lagrange multiplier/fictitious domain methods combined with the
Taylor–Hood finite element approximation. J. Sci. Comput., 17:683–
694, 2002.
[KH01]
M. R. King and D. A. Hammer. Multiparticle adhesive dynamics.
interactions between stably rolling cells. Biophys. J., 81:799–813, 2001.
[KS06]
C. Korn and U. S. Schwarz. Efficiency of initiating cell adhesion in
hydrodynamic flow. Phys. Rev. Lett., 97, 2006. 138103.
[Loe93]
R. F. Loeser. Integrin-mediated attachment of articular chondrocytes
to extracellular matrix proteins. Arthritis Rheum., 36:1103–1110, 1993.
[PG02]
T.-W. Pan and R. Glowinski. Direct simulation of the motion of neutrally buoyant circular cylinders in plane Poiseuille flow. J. Comput.
Phys., 181:260–279, 2002.
[PG05] T.-W. Pan and R. Glowinski. Direct simulation of the motion of neutrally buoyant balls in a three-dimensional Poiseuille flow. C. R. Mécanique, 333:884–895, 2005.
[SBBR+02] T. Scott-Burden, J. P. Bosley, D. Rosenstrauch, K. D. Henderson, F. J. Clubb, H. C. Eichstaedt, K. Eya, I. Gregoric, T. J. Myers, B. Radovancevic, and O. H. Frazier. Use of autologous auricular chondrocytes for lining artificial surfaces: a feasibility study. Ann. Thorac. Surg., 73:1528–1533, 2002.
[SKE+99] R. M. Schinagl, M. S. Kurtis, K. D. Ellis, S. Chien, and R. L. Sah. Effect of seeding duration on the strength of chondrocyte adhesion to articular cartilage. J. Orthopaedic Research, 17:121–129, 1999.
[SZD03] M. E. Staben, A. Z. Zinchenko, and R. H. Davis. Motion of a particle between two parallel plane walls in low-Reynolds-number Poiseuille flow. Phys. Fluids, 15:1711–1733, 2003.
[ZBCAG04] R. Zaidel-Bar, M. Cohen, L. Addadi, and B. Geiger. Hierarchical assembly of cell-matrix adhesion complexes. Biochem. Soc. Trans., 32(3):416–420, 2004.
Computing the Eigenvalues of the Laplace–Beltrami Operator on the Surface of a Torus: A Numerical Approach
Roland Glowinski¹ and Danny C. Sorensen²
¹ University of Houston, Department of Mathematics, Houston, TX, 77004, USA, roland@math.uh.edu
² Rice University, Department of Computational and Applied Mathematics, Houston, TX, 77251-1892, USA, sorensen@rice.edu
Summary. In this chapter, we present a methodology for numerically computing the eigenvalues and eigenfunctions of the Laplace–Beltrami operator on the surface of a torus. Beginning with a variational formulation, we derive an equivalent PDE formulation and then discretize the PDE using finite differences to obtain an algebraic generalized eigenvalue problem. This finite-dimensional eigenvalue problem is solved numerically using the Matlab function eigs, which is based on ARPACK. We show results for problems with about 16K (N = 16,384) variables, for which we computed the 15 lowest modes. We also show a bifurcation study of eigenvalue trajectories as functions of the aspect ratio of the major to the minor radius of the torus.
1 Introduction
A large number of physical phenomena take place on surfaces. Many of these are modeled by partial differential equations, a typical example being provided by elastic shells. It is not surprising, therefore, that many questions have arisen concerning the spectrum of partial differential operators defined on surfaces; this area of investigation is known as spectral geometry. One of the most important of these operators is the so-called Beltrami Laplacian, also known as the Laplace–Beltrami operator. The main goal of this chapter is to discuss the computation of the lowest eigenvalues of the Laplace–Beltrami operator associated with the boundary of a torus of R³. After a description of our methodology for the computation of these eigenvalues and their corresponding eigenfunctions, we present selected results from our numerical experiments. The methodology consists of obtaining a finite difference discretization of a PDE that is equivalent to a more standard variational formulation; the resulting finite-dimensional generalized
226
R. Glowinski and D.C. Sorensen
eigenvalue problem is then solved to obtain the approximations. A visualization of our results shows the expected Sturm–Liouville behavior of the eigenfunctions according to wave number. Eigenvalues typically have multiplicity one or two. However, we show that for certain ratios of the minor to major radii, it is possible to create eigenvalues of multiplicity three or four. This indicates that an interesting bifurcation structure is associated with this ratio.
A thorough discussion of the approximate solution of eigenvalue problems for elliptic operators is given by Babuška and Osborn [BO91].
2 Variational Formulation of the Eigenvalue Problem
Let Σ be the boundary of a three-dimensional torus defined by a great circle
of radius R and a small circle of radius ρ (see Figure 1).
Our goal here is to numerically approximate the eigenvalues and corresponding eigenfunctions of the Laplace–Beltrami operator associated with Σ.
A variational formulation of this problem reads as follows:
Find $\lambda\in\mathbb{R}$, $u\in H^1(\Sigma)$ such that
$$\int_\Sigma\nabla_\Sigma u\cdot\nabla_\Sigma v\,d\Sigma=\lambda\int_\Sigma uv\,d\Sigma,\quad\forall v\in H^1(\Sigma). \tag{1}$$
In equation (1):
(i) $\nabla_\Sigma$ is the tangential gradient on Σ,
(ii) $d\Sigma$ is the surface measure on Σ,
(iii) $H^1(\Sigma)=\{v\mid v\in L^2(\Sigma),\ \int_\Sigma|\nabla_\Sigma v|^2\,d\Sigma<+\infty\}$.
Any function constant over Σ is an eigenfunction of the Laplace–Beltrami operator, the corresponding eigenvalue being 0 (of multiplicity 1). Our interest is in the non-trivial solutions of (1). To compute them (at least some of the smallest ones), we use the (θ, φ) coordinates shown in Figure 1. Problem (1) then takes the following form:
Fig. 1. Torus surface Σ (left) and a view from under the top half (right) showing
the major radius R and angle φ and the minor radius ρ and angle θ.
Eigenvalues of the Laplace–Beltrami Operator on the Surface of a Torus
227
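Find $u\in H^1_p(\Omega_0)$ and λ such that
$$\int_{\Omega_0}\Big[\frac{\rho}{R+\rho\cos\theta}\frac{\partial u}{\partial\varphi}\frac{\partial v}{\partial\varphi}+\frac{R+\rho\cos\theta}{\rho}\frac{\partial u}{\partial\theta}\frac{\partial v}{\partial\theta}\Big]\,d\varphi\,d\theta=\lambda\int_{\Omega_0}\rho(R+\rho\cos\theta)\,uv\,d\varphi\,d\theta \tag{2}$$
for all $v\in H^1_p(\Omega_0)$, with $\Omega_0=(0,2\pi)\times(0,2\pi)$ and with
$$H^1_p(\Omega_0)=\{v\mid v\in H^1(\Omega_0),\ v(0,\theta)=v(2\pi,\theta)\ \text{for a.e. }\theta\in(0,2\pi),\ v(\varphi,0)=v(\varphi,2\pi)\ \text{for a.e. }\varphi\in(0,2\pi)\},$$
i.e., $H^1_p(\Omega_0)$ is a space of doubly periodic functions. In the following, keep in mind that 0 < ρ < R.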
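The weight ρ(R + ρ cos θ) on the right-hand side of (2) is the area element dΣ in the (φ, θ) coordinates. A quick sanity check of that weight is to integrate it over Ω0 with the periodic rectangle rule, which should reproduce the classical torus area 4π²Rρ. This is a sketch and not part of the method of this chapter; the function name `torus_area` and the default number of grid points are illustrative choices.

```python
import math


def torus_area(R: float, rho: float, n: int = 64) -> float:
    """Integrate the area weight rho*(R + rho*cos(theta)) over (0, 2*pi)^2.

    Uses the periodic rectangle rule with n points per direction. The weight
    does not depend on phi, so the phi-sum only contributes a factor n.
    """
    h = 2.0 * math.pi / n
    theta_sum = sum(rho * (R + rho * math.cos(j * h)) for j in range(n))
    return n * theta_sum * h * h


# Radii used for the eigenfunction plots of Fig. 4 (R = 4/3, rho = 1).
area = torus_area(4.0 / 3.0, 1.0)
exact = 4.0 * math.pi**2 * (4.0 / 3.0) * 1.0
print(area, exact)
```

Up to rounding, the rectangle rule is exact here because the discrete sum of cos(jh) over a full period vanishes. Both printed values are therefore ≈ 52.64.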
3 An Equivalent PDE Formulation
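It follows from the theory of uniformly elliptic operators with smooth coefficients that solving (2) is equivalent to finding $u\in C^\infty(\overline{\Omega}_0)$ such that
$$\begin{cases}-(R\rho^{-1}+\cos\theta)^{-1}\dfrac{\partial^2u}{\partial\varphi^2}-\dfrac{\partial}{\partial\theta}\Big[(R\rho^{-1}+\cos\theta)\dfrac{\partial u}{\partial\theta}\Big]=\lambda\rho^2(R\rho^{-1}+\cos\theta)\,u & \text{in }\Omega_0,\\[6pt] u(0,\theta)=u(2\pi,\theta),\quad\dfrac{\partial u}{\partial\varphi}(0,\theta)=\dfrac{\partial u}{\partial\varphi}(2\pi,\theta) & \forall\theta\in[0,2\pi],\\[6pt] u(\varphi,0)=u(\varphi,2\pi),\quad\dfrac{\partial u}{\partial\theta}(\varphi,0)=\dfrac{\partial u}{\partial\theta}(\varphi,2\pi) & \forall\varphi\in[0,2\pi].\end{cases}\tag{3}$$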
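The coefficients of (3) follow from those of (2) by dividing numerator and denominator by ρ. The θ-derivative term comes from one integration by parts in θ, whose boundary terms cancel by periodicity. The identities used are listed below for the reader's convenience:

```latex
\frac{\rho}{R+\rho\cos\theta}=\bigl(R\rho^{-1}+\cos\theta\bigr)^{-1},\qquad
\frac{R+\rho\cos\theta}{\rho}=R\rho^{-1}+\cos\theta,\qquad
\rho\,(R+\rho\cos\theta)=\rho^{2}\bigl(R\rho^{-1}+\cos\theta\bigr),
\\[4pt]
\int_{\Omega_0}\bigl(R\rho^{-1}+\cos\theta\bigr)\frac{\partial u}{\partial\theta}\frac{\partial v}{\partial\theta}\,d\varphi\,d\theta
=-\int_{\Omega_0}\frac{\partial}{\partial\theta}\Bigl[\bigl(R\rho^{-1}+\cos\theta\bigr)\frac{\partial u}{\partial\theta}\Bigr]v\,d\varphi\,d\theta
\quad(u,v\ \text{smooth and doubly periodic}).
```

The φ-term is handled in the same way. Its coefficient does not depend on φ, so it simply multiplies the second derivative of u with respect to φ.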
4 Finite Difference Discretization
Let I be a positive integer (I ≫ 1 in practice). From I, we define the spatial discretization step $h=2\pi/I$, and then $\varphi_i=ih$ and $\theta_j=jh$ for $i,j=0,1,\dots,I$. We denote the point $(\varphi_i,\theta_j)$ by $M_{ij}$. Taking advantage of the periodic boundary conditions, we discretize the elliptic equation in (3) at those points $M_{ij}$ such that $1\le i\le I$ and $1\le j\le I$.
With the usual notation ($u_{ij}\approx u(\varphi_i,\theta_j)$), centered differences applied to the elliptic equation in (3), multiplied by $h^2$, give for all $1\le i,j\le I$
$$(R\rho^{-1}+\cos\theta_j)^{-1}(2u_{ij}-u_{i+1,j}-u_{i-1,j})+(R\rho^{-1}+\cos(\theta_j+h/2))(u_{ij}-u_{i,j+1})+(R\rho^{-1}+\cos(\theta_j-h/2))(u_{ij}-u_{i,j-1})=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2u_{ij}, \tag{4}$$
with $u_{I+1,j}=u_{1j}$ and $u_{0j}=u_{Ij}$ for $j=1,2,\dots,I$, and with $u_{i,I+1}=u_{i1}$ and $u_{i0}=u_{iI}$ for $i=1,2,\dots,I$.
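Scheme (4), together with its periodic wrap-around, can be assembled directly into the matrices of the generalized eigenproblem stated below as (15). The code is a minimal pure-Python sketch, not the authors' Matlab code. The dictionary-of-keys storage, the lexicographic index k = (i − 1) + (j − 1)I and the helper name `assemble_torus` are all illustrative assumptions. Each unknown is coupled to itself and to four neighbours, so A stores 5N nonzeros when I ≥ 3. For I = 8 this gives 5·64 = 320, which agrees with the nz = 320 printed under the sparsity plot of Fig. 2, if that plot was made on an 8 × 8 grid. That grid size is only suggested by the plot's axis range; it is not stated in the text.

```python
import math
from collections import defaultdict


def assemble_torus(I, R, rho):
    """Assemble the matrices of scheme (4) on the periodic I x I grid.

    Returns (A, d): A is a dict {(row, col): value} holding the left-hand side
    of (4), and d is the diagonal of D, d_k = rho^2 (R/rho + cos(theta_j)) h^2.
    The unknown u_ij (1 <= i, j <= I) is stored at k = (i-1) + (j-1)*I.
    """
    h = 2.0 * math.pi / I
    c = R / rho

    def idx(i, j):
        # periodic wrap: index I+1 -> 1 and index 0 -> I, in both directions
        return (i - 1) % I + ((j - 1) % I) * I

    A = defaultdict(float)
    d = [0.0] * (I * I)
    for j in range(1, I + 1):
        theta = j * h
        a = 1.0 / (c + math.cos(theta))       # phi-coupling coefficient
        b_up = c + math.cos(theta + h / 2)    # theta-coupling towards j+1
        b_dn = c + math.cos(theta - h / 2)    # theta-coupling towards j-1
        for i in range(1, I + 1):
            k = idx(i, j)
            A[k, k] += 2 * a + b_up + b_dn
            A[k, idx(i + 1, j)] -= a
            A[k, idx(i - 1, j)] -= a
            A[k, idx(i, j + 1)] -= b_up
            A[k, idx(i, j - 1)] -= b_dn
            d[k] = rho**2 * (c + math.cos(theta)) * h**2
    return dict(A), d


A, d = assemble_torus(8, 4.0 / 3.0, 1.0)
print(len(A))  # 320: five stored entries per row for I = 8
```

The rows of A sum to zero, so constant vectors lie in its null space, which matches the zero eigenvalue noted in Section 2. The stored triplets can be passed to any sparse-matrix constructor, for instance Matlab's sparse(rows, cols, vals) after shifting to 1-based indices.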
If these discrete boundary conditions are used to eliminate the unknowns $u_{I+1,j}$, $u_{0j}$, $u_{i,I+1}$ and $u_{i0}$, we obtain the following discrete eigenproblem (in $\mathbb{R}^N$, $N=I^2$):
If $2\le i,j\le I-1$,
$$2\bigl[(R\rho^{-1}+\cos\theta_j)^{-1}+R\rho^{-1}+\cos\theta_j\cos(h/2)\bigr]u_{ij}-(R\rho^{-1}+\cos\theta_j)^{-1}(u_{i+1,j}+u_{i-1,j})-(R\rho^{-1}+\cos(\theta_j+h/2))\,u_{i,j+1}-(R\rho^{-1}+\cos(\theta_j-h/2))\,u_{i,j-1}=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2u_{ij}. \tag{5}$$
If $i=1$ and $2\le j\le I-1$,
$$2\bigl[(R\rho^{-1}+\cos\theta_j)^{-1}+R\rho^{-1}+\cos\theta_j\cos(h/2)\bigr]u_{1j}-(R\rho^{-1}+\cos\theta_j)^{-1}(u_{2j}+u_{Ij})-(R\rho^{-1}+\cos(\theta_j+h/2))\,u_{1,j+1}-(R\rho^{-1}+\cos(\theta_j-h/2))\,u_{1,j-1}=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2u_{1j}. \tag{6}$$
If $i=j=1$,
$$2\bigl[(R\rho^{-1}+\cos h)^{-1}+R\rho^{-1}+\cos h\cos(h/2)\bigr]u_{11}-(R\rho^{-1}+\cos h)^{-1}(u_{21}+u_{I1})-(R\rho^{-1}+\cos(3h/2))\,u_{12}-(R\rho^{-1}+\cos(h/2))\,u_{1I}=\lambda\rho^2(R\rho^{-1}+\cos h)h^2u_{11}. \tag{7}$$
If $i=1$ and $j=I$,
$$2\bigl[(R\rho^{-1}+1)^{-1}+R\rho^{-1}+\cos(h/2)\bigr]u_{1I}-(R\rho^{-1}+1)^{-1}(u_{2I}+u_{II})-(R\rho^{-1}+\cos(h/2))\,u_{11}-(R\rho^{-1}+\cos(h/2))\,u_{1,I-1}=\lambda\rho^2(R\rho^{-1}+1)h^2u_{1I}. \tag{8}$$
If $i=I$ and $2\le j\le I-1$,
$$2\bigl[(R\rho^{-1}+\cos\theta_j)^{-1}+R\rho^{-1}+\cos\theta_j\cos(h/2)\bigr]u_{Ij}-(R\rho^{-1}+\cos\theta_j)^{-1}(u_{1j}+u_{I-1,j})-(R\rho^{-1}+\cos(\theta_j+h/2))\,u_{I,j+1}-(R\rho^{-1}+\cos(\theta_j-h/2))\,u_{I,j-1}=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2u_{Ij}. \tag{9}$$
If $i=I$ and $j=1$,
$$2\bigl[(R\rho^{-1}+\cos h)^{-1}+R\rho^{-1}+\cos h\cos(h/2)\bigr]u_{I1}-(R\rho^{-1}+\cos h)^{-1}(u_{11}+u_{I-1,1})-(R\rho^{-1}+\cos(3h/2))\,u_{I2}-(R\rho^{-1}+\cos(h/2))\,u_{II}=\lambda\rho^2(R\rho^{-1}+\cos h)h^2u_{I1}. \tag{10}$$
If $i=I$ and $j=I$,
$$2\bigl[(R\rho^{-1}+1)^{-1}+R\rho^{-1}+\cos(h/2)\bigr]u_{II}-(R\rho^{-1}+1)^{-1}(u_{1I}+u_{I-1,I})-(R\rho^{-1}+\cos(h/2))\,u_{I1}-(R\rho^{-1}+\cos(h/2))\,u_{I,I-1}=\lambda\rho^2(R\rho^{-1}+1)h^2u_{II}. \tag{11}$$
If $2\le i\le I-1$ and $j=1$,
$$2\bigl[(R\rho^{-1}+\cos h)^{-1}+R\rho^{-1}+\cos h\cos(h/2)\bigr]u_{i1}-(R\rho^{-1}+\cos h)^{-1}(u_{i+1,1}+u_{i-1,1})-(R\rho^{-1}+\cos(3h/2))\,u_{i2}-(R\rho^{-1}+\cos(h/2))\,u_{iI}=\lambda\rho^2(R\rho^{-1}+\cos h)h^2u_{i1}. \tag{12}$$
If $2\le i\le I-1$ and $j=I$,
$$2\bigl[(R\rho^{-1}+1)^{-1}+R\rho^{-1}+\cos(h/2)\bigr]u_{iI}-(R\rho^{-1}+1)^{-1}(u_{i+1,I}+u_{i-1,I})-(R\rho^{-1}+\cos(h/2))\,u_{i1}-(R\rho^{-1}+\cos(h/2))\,u_{i,I-1}=\lambda\rho^2(R\rho^{-1}+1)h^2u_{iI}. \tag{13}$$
If $2\le i,j\le I-1$,
$$2\bigl[(R\rho^{-1}+\cos\theta_j)^{-1}+R\rho^{-1}+\cos\theta_j\cos(h/2)\bigr]u_{ij}-(R\rho^{-1}+\cos\theta_j)^{-1}(u_{i+1,j}+u_{i-1,j})-(R\rho^{-1}+\cos(\theta_j+h/2))\,u_{i,j+1}-(R\rho^{-1}+\cos(\theta_j-h/2))\,u_{i,j-1}=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2u_{ij}. \tag{14}$$
These finite difference formulas generate an approximation to problem (3) in the form of a symmetric generalized eigenvalue problem
$$Ax=\lambda Dx, \tag{15}$$
with A sparse, symmetric and positive semi-definite, and with D diagonal and positive definite (independently of the ordering of the variables). We used Matlab to solve problem (15) and obtain approximations to the eigenvalues and corresponding eigenfunctions of (2).
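Two standard consequences of the structure of (4) and (15) justify the claims just made about A and D. First, let x = (u_ij) with periodic indices, and let A be the matrix of the left-hand side of (4). Then the quadratic form xᵀAx is a sum of squares with positive weights, because 0 < ρ < R. It vanishes exactly for constant x, since the periodic grid is connected, so λ = 0 is a simple eigenvalue and all other eigenvalues are positive. Second, since D is diagonal and positive, (15) is equivalent to a standard symmetric eigenproblem:

```latex
x^{T}Ax=\sum_{i,j=1}^{I}\Bigl[\bigl(R\rho^{-1}+\cos\theta_j\bigr)^{-1}\bigl(u_{i+1,j}-u_{ij}\bigr)^{2}
+\bigl(R\rho^{-1}+\cos(\theta_j+h/2)\bigr)\bigl(u_{i,j+1}-u_{ij}\bigr)^{2}\Bigr]\ \ge\ 0,
\\[4pt]
Ax=\lambda Dx\iff\bigl(D^{-1/2}AD^{-1/2}\bigr)y=\lambda y,\qquad y=D^{1/2}x,\qquad
D=\operatorname{diag}\bigl(\rho^{2}(R\rho^{-1}+\cos\theta_j)h^{2}\bigr).
```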
5 Numerical Experiments
The Matlab function eigs, which is based on ARPACK [Sor92, LSY98], was used to compute the eigenvalues and corresponding eigenvectors. In all cases, we computed the 15 lowest (algebraically smallest) eigenvalues of the generalized eigenvalue problem (15) using the shift-invert option with shift σ = −0.0001. Since the eigenvalues are real and non-negative, this transformation maps the eigenvalues closest to the origin to the largest and best-separated eigenvalues of the transformed operator. Those are easily computed with a Krylov method.
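The effect of the shift can be made explicit. The lines below show the transformed problem solved by the Krylov iteration and the images of two eigenvalues under it. The value 0.32332 is the λ2 reported in Fig. 3 and serves here only as an illustration.

```latex
Ax=\lambda Dx\iff(A-\sigma D)^{-1}Dx=\mu x,\qquad\mu=\frac{1}{\lambda-\sigma},\qquad\sigma=-10^{-4},
\\[4pt]
\lambda_1=0\ \mapsto\ \mu=10^{4},\qquad\lambda_2=0.32332\ \mapsto\ \mu=\frac{1}{0.32342}\approx 3.09 .
```

Because A is positive semi-definite and D is positive definite, A − σD = A + 10⁻⁴D is positive definite and can be factored once and reused at every Krylov step. With σ = 0 it would be singular, since constant vectors lie in the null space of A. This is presumably why a small negative shift was used rather than σ = 0.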
[Figure 2. Left panel: "Sparsity Pattern of A", nz = 320. Right panel: "Eigenvalues λj vs wavenumber j, ρ = 1, R = 1.3333, N = 128", with λj plotted against the wave number j.]
Fig. 2. Sparsity pattern (left) of the matrix A and eigenvalue distribution (right)
of the lowest 15 modes plotted as a function of index.
[Figure 3. Left panel: "Contour 2, λ = 0.32332, ρ = 1, R = 1.705, N = 128", with θ on the horizontal axis and φ on the vertical axis.]
Fig. 3. Contour (left) and surface (right) plots of an eigenfunction corresponding
to the lowest nontrivial eigenvalue λ2 which is a double eigenvalue.
The Matlab command used to accomplish this was
[V,Lambda] = eigs(A,D,15,-.0001);
which calculates the k = 15 eigenvalues closest to the shift σ = −0.0001. The computed eigenvalues are returned in the diagonal matrix Lambda and the corresponding eigenvectors as the columns of the N × k matrix V. Figure 2 shows the sparsity pattern of the matrix A.
Figure 3 shows the surface and contour plots of the eigenfunction corresponding to the smallest nonzero eigenvalue λ2. This is a double eigenvalue, so λ3 = λ2; the eigenfunction for λ3 is not shown here. Figure 4 shows the surface plots of the eigenfunctions of modes 4 to 15. Surfaces 4 and 7 (the simple sheets) correspond to single eigenvalues; the remaining eigenfunction surfaces correspond to double eigenvalues. In all of these plots, R = 4/3 and ρ = 1. The dimension of the matrix is N = 16,384, corresponding to I = 128, i.e., a grid step size h = 2π/128.
Fig. 4. Eigenfunctions corresponding to eigenvalues λ4 to λ15 (in order left to right,
top to bottom).
[Figure 5: "Eigenvalues as Function of R/ρ", with the ratio R/ρ (from 1 to 5.5) on the horizontal axis and λj (from 0 to 4) on the vertical axis.]
Fig. 5. Bifurcation diagram of 14 leading nontrivial eigenvalues as functions of the
ratio R/ρ. Solid curves are double eigenvalues and dashed curves are singletons.
We note that eigenfunctions associated with single eigenvalues are sheets
that only change sign in the θ direction. Eigenfunctions corresponding to
double eigenvalues change sign in both the θ and φ directions. We studied the
eigenvalue trajectories plotted as functions of the aspect ratio R/ρ and noted
that crossings of these curves provided instances of quadruple eigenvalues
and also of triple eigenvalues. Results of this study are shown graphically in
Figure 5.
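The pairing of eigenvalues described above can be traced to the fact that the coefficients of (4) do not depend on i. Substituting $u_{ij}=e^{\mathrm{i}m\varphi_i}w_j$, with $m=0,\dots,I-1$ and $\mathrm{i}=\sqrt{-1}$, into (4) decouples it into I periodic three-term problems in θ:
$2(1-\cos mh)(R\rho^{-1}+\cos\theta_j)^{-1}w_j+(R\rho^{-1}+\cos(\theta_j+h/2))(w_j-w_{j+1})+(R\rho^{-1}+\cos(\theta_j-h/2))(w_j-w_{j-1})=\lambda\rho^2(R\rho^{-1}+\cos\theta_j)h^2w_j$.
Since cos mh = cos (I − m)h, the blocks m and I − m coincide. Consequently cos(mφ)w and sin(mφ)w share an eigenvalue for 0 < m < I/2, while m = 0 gives functions of θ alone. The pure-Python sketch below uses this reduction to compute the spectrum block by block on a coarse grid with R = 4/3, ρ = 1. It is a cross-check under stated assumptions, not the authors' computation, which applied eigs to the full matrix. The helper names and the small cyclic-Jacobi solver are illustrative choices.

```python
import math


def jacobi_eigenvalues(M, off_tol=1e-22, max_sweeps=50):
    """All eigenvalues of a small dense symmetric matrix (cyclic Jacobi)."""
    n = len(M)
    a = [row[:] for row in M]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < off_tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation (c, s) chosen so that the new a[p][q] vanishes
                th = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, th) / (abs(th) + math.sqrt(th * th + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def fourier_blocks(I, R, rho):
    """Eigenvalues of scheme (4) grouped by the phi-wavenumber m = 0..I-1."""
    h = 2.0 * math.pi / I
    c = R / rho
    spectra = {}
    for m in range(I):
        shift = 2.0 * (1.0 - math.cos(m * h))  # symbol of the phi second difference
        K = [[0.0] * I for _ in range(I)]
        d = [0.0] * I
        for j in range(I):                       # row j <-> theta_{j+1}
            theta = (j + 1) * h
            b_up = c + math.cos(theta + h / 2)
            b_dn = c + math.cos(theta - h / 2)
            K[j][j] += shift / (c + math.cos(theta)) + b_up + b_dn
            K[j][(j + 1) % I] -= b_up
            K[j][(j - 1) % I] -= b_dn
            d[j] = rho**2 * (c + math.cos(theta)) * h**2
        # D is diagonal and positive: use the symmetric form D^{-1/2} K D^{-1/2}
        S = [[K[p][q] / math.sqrt(d[p] * d[q]) for q in range(I)] for p in range(I)]
        spectra[m] = jacobi_eigenvalues(S)
    return spectra


spectra = fourier_blocks(16, 4.0 / 3.0, 1.0)
lowest = sorted((lam, m) for m, vals in spectra.items() for lam in vals)[:7]
print(lowest)
```

On such a coarse grid the numbers are only rough approximations of those in Figs. 2–5. The point of the sketch is the block structure, in which every block with 0 < m < I/2 appears twice.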
6 Conclusions
We have addressed the numerical solution of a problem from spectral geometry, namely the computation of the lowest eigenvalues of the Laplace–Beltrami operator on the surface of a torus in R³. The methodology developed here is expected to apply to a number of other surfaces. If combined with appropriate continuation techniques, this approach should enable the numerical solution of certain nonlinear eigenvalue problems, such as those encountered in [FGH07a, FGH07b, ETFS94, SSS]. We also briefly studied the bifurcations of the eigenvalue trajectories as functions of the aspect ratio R/ρ. An interesting observation was that trajectories of double eigenvalues can cross other trajectories of double eigenvalues, producing quadruple eigenvalues at certain ratios. The significance of this will be a subject of future study.
Acknowledgement. This work was supported in part by the NSF through Grants
DMS-9972591, CCR-9988393, ACI-0082645 and DMS-0412267.
References
[BO91] I. Babuška and J. E. Osborn. Eigenvalue problems. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis. Vol. II, Finite Element Methods (Part 1), pages 641–787. North-Holland Publishing Company, Amsterdam, 1991.
[ETFS94] W. S. Edwards, L. S. Tuckerman, R. A. Friesner, and D. C. Sorensen.
Krylov methods for the incompressible Navier–Stokes equations. Journal
of Computational Physics, 110:82–102, 1994.
[FGH07a] F. Foss, R. Glowinski, and R. H. W. Hoppe. On the numerical solution
of a semilinear elliptic eigenproblem of Lane–Emden type. (I): Problem formulation and description of the algorithms. Journal of Numerical
Mathematics, 15:181–208, 2007.
[FGH07b] F. Foss, R. Glowinski, and R. H. W. Hoppe. On the numerical solution of
a semilinear elliptic eigenproblem of Lane–Emden type. (II): Numerical
experiments. Journal of Numerical Mathematics, 15:277–298, 2007.
[LSY98] R. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users' Guide: Solution of Large Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM Publications, Philadelphia, PA, 1998.
[Sor92] D. C. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM Journal on Matrix Analysis and Applications, 13:357–385, 1992.
[SSS] H. A. Smith, R. K. Singh, and D. C. Sorensen. A Lanczos-based eigensolution technique for exact vibration analysis. International Journal for Numerical Methods in Engineering, 36:1987–2000.
A Fixed Domain Approach in Shape Optimization Problems with Neumann Boundary Conditions
Pekka Neittaanmäki¹ and Dan Tiba²
¹ University of Jyväskylä, Department of Mathematical Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyväskylä, Finland, pn@mit.jyu.fi
² Institute of Mathematics, Romanian Academy, P.O. Box 1-764, RO-014700 Bucharest, Romania, dan.tiba@imar.ro
Summary. Fixed domain methods have well-known advantages in the solution of
variable domain problems, but are mainly applied in the case of Dirichlet boundary
conditions. This paper examines a way to extend this class of methods to the more
difficult case of Neumann boundary conditions.
1 Introduction
Starting with the well-known monograph of Pironneau [Pir84], shape optimization problems are subject to very intensive research investigations. They
concentrate several major mathematical difficulties: unknown and possibly
non-smooth character of optimal geometries, lack of convexity of the functional to be minimized, high complexity and stiff character of the equations
to be solved numerically, etc. Accordingly, the relevant scientific literature is
huge and we quote here just the books of Mohammadi and Pironneau [MP01]
and of Neittaanmäki, Sprekels and Tiba [NST06] for an introduction to this
domain of mathematics.
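In this paper, we study the model optimal design problem
$$\min_{\Omega}\int_\Omega j(x,y(x))\,dx \tag{1}$$
subject to the Neumann boundary value problem
$$\int_\Omega\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial y}{\partial x_i}\frac{\partial v}{\partial x_j}+a_0yv\Big]dx=\int_\Omega fv\,dx\quad\text{for any } v\in H^1(\Omega). \tag{2}$$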
236
P. Neittaanmäki and D. Tiba
Here, $\Omega\subset D\subset\mathbb{R}^d$ is an unknown domain (the minimization parameter), while D is a fixed smooth open set in the Euclidean space $\mathbb{R}^d$. The functions $a_0$ and $a_{ij}$ are in $L^\infty(D)$ and $f\in L^2(D)$, so that (2) makes sense for any admissible Ω and, as is well known, defines the unique weak solution $y=y_\Omega\in H^1(\Omega)$ of the second-order elliptic equation
$$-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial y}{\partial x_i}\Big)+a_0y=f\quad\text{in }\Omega \tag{3}$$
with Neumann boundary conditions for the conormal derivative
$$\frac{\partial y}{\partial n_A}=\sum_{i,j=1}^d a_{ij}\frac{\partial y}{\partial x_j}\cos(\bar n,x_i)=0\quad\text{on }\partial\Omega. \tag{4}$$
In the classical formulation (3), (4), ∂Ω has to be assumed smooth, and $\bar n$ is the (outward) normal to ∂Ω at the points $x=(x_1,x_2,\dots,x_d)$. Non-homogeneous Neumann problems (i.e., with a non-zero right-hand side in (4)) may be considered as well, by a simple translation argument reducing everything to the homogeneous case.
The functional j : D × R → R is a general convex integrand in the sense
of Rockafellar [Roc70] – more assumptions will be added when necessary.
The open set Ω will be "parametrized" by some continuous function g : D → R by
$$\Omega=\Omega_g=\operatorname{int}\{x\in D\mid g(x)\ge0\} \tag{5}$$
and $g\in C(\overline D)$ will be the true unknown of the optimization problem (1), (2). The parametrization is, of course, non-unique, but this does not affect the argument. Arbitrary Carathéodory open sets Ω ⊂ D may be expressed in the form $\Omega_g$ if g is the signed distance function (at some power). Further constraints on $\Omega=\Omega_g$ (besides Ω ⊂ D) may be imposed in the abstract form
$$g\in\mathcal{C}, \tag{6}$$
where $\mathcal{C}\subset C(\overline D)$ is some convex closed subset. For instance, if E ⊂ D is a given subset and $\mathcal{C}=\{g\in C(\overline D)\mid g(x)\ge0,\ x\in E\}$, then the constraint $g\in\mathcal{C}$ is equivalent to the condition E ⊂ Ω. Other cost functionals may be studied as well:
$$\int_E j(x,y(x))\,dx$$
(if the constraint E ⊂ Ω is imposed) or
$$\int_\Gamma j(x,y(x))\,dx,$$
where Γ ⊂ D is a given smooth manifold and Ω ⊃ Γ for all admissible Ω.
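A minimal illustration of the parametrization (5) and of the constraint (6) is given below, under simplifying assumptions. D = (0, 1)², g is the signed distance to a circle (positive inside), and E is represented by a finite set of sample points. All names are hypothetical.

```python
import math


def g_disk(x, y, cx=0.5, cy=0.5, r=0.3):
    """Signed distance to a circle: positive inside, zero on it, negative outside."""
    return r - math.hypot(x - cx, y - cy)


def in_omega(g, x, y):
    """Membership in Omega_g = int{g >= 0}; for this signed distance it is {g > 0}."""
    return g(x, y) > 0.0


def satisfies_constraint(g, e_points):
    """Constraint (6) with C = {g >= 0 on E}, checked at sample points of E."""
    return all(g(x, y) >= 0.0 for x, y in e_points)


def area_of_omega(g, n=200):
    """Area of Omega_g inside D = (0, 1)^2 by midpoint sampling on an n x n grid."""
    h = 1.0 / n
    count = sum(in_omega(g, (i + 0.5) * h, (j + 0.5) * h)
                for i in range(n) for j in range(n))
    return count * h * h


print(area_of_omega(g_disk), math.pi * 0.3**2)
```

Any positive multiple of `g_disk` defines the same set Ω_g, which is the non-uniqueness of the parametrization mentioned above.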
Robin boundary conditions (instead of (4)) may also be treated by our
Shape Optimization Problems with Neumann Boundary Condition
237
method. In the case of Dirichlet boundary conditions other approaches may
be used [NPT07, NT95, Tib92].
In Section 2 we recall some geometric controllability properties that are at the core of our approach, while Section 3 contains the basic arguments. The paper ends with brief conclusions.
2 A Controllability-Like Result
In the classical book of Lions [Lio68], it is shown that, when $u\in L^2(\Gamma_1)$ is arbitrary and $y_u$ is the unique solution (in the transposition sense) of
$$-\Delta y=0\ \text{in }G,\qquad y=u\ \text{on }\Gamma_1,\qquad y=0\ \text{on }\Gamma_2,$$
then the set of normal traces $\{\partial y_u/\partial n\mid u\in L^2(\Gamma_1)\}$ is linear and dense in the space $H^{-1}(\Gamma_2)$. Notice that $\partial y_u/\partial n\in H^{-1}(\Gamma_2)$ due to some special regularity results, Lions [Lio68]. Here $G\subset\mathbb{R}^d$ is an open connected set such that its boundary is $\partial G=\Gamma_1\cup\Gamma_2$ and $\overline\Gamma_1\cap\overline\Gamma_2=\emptyset$. This density result may be interpreted as an approximate controllability property, in the sense that the "attainable" set of normal derivatives $\partial y_u/\partial n$ (when u ranges over $L^2(\Gamma_1)$) may approximate any element of the "image" space $H^{-1}(\Gamma_2)$. Constructive approaches and results involving constraints on the boundary control u are reported in [NST06, Ch. 5.2].
We continue with a distributed approximate controllability property, which is a constructive variant of Theorem 5.2.21 in [NST06]. We consider the equation (2) in D with a modified right-hand side:
$$\int_D\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial\tilde y}{\partial x_i}\frac{\partial\tilde v}{\partial x_j}+a_0\tilde y\tilde v\Big]dx=\int_D\chi_0u\tilde v\,dx\quad\forall\tilde v\in H^1(D), \tag{7}$$
where $u\in L^2(D)$ is a distributed control and $\chi_0$ is the characteristic function of some smooth open set $\Omega_0\subset D$ such that $\partial D\subset\overline\Omega_0$. That is, $\Omega_0$ is a relative neighborhood of ∂D, and we denote $\Gamma=\partial\Omega_0\setminus\partial D$. Clearly, $\overline\Gamma\cap\partial D=\emptyset$.
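Theorem 1. Let $w\in H^{1/2}(\Gamma)$ be given and let $[u_\varepsilon,y_\varepsilon]$ be the unique optimal pair of the control problem
$$\min_{u\in L^2(\Omega_0)}\Big\{\frac12|y-w|^2_{H^{1/2}(\Gamma)}+\frac\varepsilon2|u|^2_{L^2(\Omega_0)}\Big\},\quad\varepsilon>0, \tag{8}$$
subject to
$$\int_{\Omega_0}\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial y}{\partial x_i}\frac{\partial z}{\partial x_j}+a_0yz\Big]dx=\int_{\Omega_0}uz\,dx\quad\forall z\in H^1(\Omega_0). \tag{9}$$
Then we have
$$y_\varepsilon|_\Gamma\longrightarrow w\quad\text{strongly in }H^{1/2}(\Gamma)\ \text{as }\varepsilon\to0. \tag{10}$$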
Proof. The existence and uniqueness of the optimal pair $[u_\varepsilon,y_\varepsilon]\in L^2(\Omega_0)\times H^1(\Omega_0)$ of the control problem (8), (9) are obvious. The pair [0, 0] is clearly admissible and, for any ε > 0, we obtain
$$\frac12|y_\varepsilon-w|^2_{H^{1/2}(\Gamma)}+\frac\varepsilon2|u_\varepsilon|^2_{L^2(\Omega_0)}\le\frac12|w|^2_{H^{1/2}(\Gamma)}.$$
Therefore, $\{y_\varepsilon\}$ and $\{\varepsilon^{1/2}u_\varepsilon\}$ are bounded in $H^{1/2}(\Gamma)$ and $L^2(\Omega_0)$, respectively. We denote by $l\in H^{1/2}(\Gamma)$ the weak limit (on a subsequence) of $\{y_\varepsilon-w\}$.
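Let us define the adjoint system by
$$\int_{\Omega_0}\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial p_\varepsilon}{\partial x_i}\frac{\partial z}{\partial x_j}+a_0zp_\varepsilon\Big]dx=\int_\Gamma(y_\varepsilon-w)z\,d\sigma\quad\forall z\in H^1(\Omega_0), \tag{11}$$
which is a non-homogeneous Neumann problem, and $p_\varepsilon\in H^1(\Omega_0)$. We also introduce the equation in variations
$$\int_{\Omega_0}\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial\mu}{\partial x_i}\frac{\partial z}{\partial x_j}+a_0\mu z\Big]dx=\int_{\Omega_0}\nu z\,dx\quad\forall z\in H^1(\Omega_0), \tag{12}$$
which defines the variations $y_\varepsilon+\lambda\mu$, $u_\varepsilon+\lambda\nu$ for any $\nu\in L^2(\Omega_0)$ and $\lambda\in\mathbb{R}$.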
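A standard computation using (11), (12) and the optimality of $[u_\varepsilon,y_\varepsilon]$ gives
$$\begin{aligned}0&=\varepsilon(u_\varepsilon,\nu)_{L^2(\Omega_0)}+(y_\varepsilon-w,\mu)_{H^{1/2}(\Gamma)}\\&=\varepsilon(u_\varepsilon,\nu)_{L^2(\Omega_0)}+\int_{\Omega_0}\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial\mu}{\partial x_i}\frac{\partial p_\varepsilon}{\partial x_j}+a_0\mu p_\varepsilon\Big]dx\\&=\varepsilon(u_\varepsilon,\nu)_{L^2(\Omega_0)}+(p_\varepsilon,\nu)_{L^2(\Omega_0)}.\end{aligned} \tag{13}$$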
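Due to the convergence properties of the right-hand side in (11), $\{p_\varepsilon\}$ is bounded in $H^1(\Omega_0)$, and we can pass to the limit (on a subsequence) $p_\varepsilon\to p$ weakly in $H^1(\Omega_0)$ to obtain
$$\int_{\Omega_0}\Big[\sum_{i,j=1}^d a_{ij}\frac{\partial p}{\partial x_i}\frac{\partial z}{\partial x_j}+a_0zp\Big]dx=\int_\Gamma lz\,d\sigma\quad\forall z\in H^1(\Omega_0). \tag{14}$$
Since (13) holds for every $\nu\in L^2(\Omega_0)$, it gives $p_\varepsilon=-\varepsilon u_\varepsilon=-\varepsilon^{1/2}(\varepsilon^{1/2}u_\varepsilon)$. As $\{\varepsilon^{1/2}u_\varepsilon\}$ is bounded, the passage to the limit gives $p\equiv0$ in $\Omega_0$, and then (14) shows that $l=0$ on Γ.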
We have proved (10) in the weak topology of H 1/2 (Γ ). The strong convergence is a consequence of the Mazur theorem [Yos80] and of a variational
argument.
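The mechanism of Theorem 1 can be seen in a finite-dimensional analogue, under strong simplifying assumptions. The control-to-trace map u ↦ y|_Γ defined by (9) is replaced by a hypothetical 2 × 3 matrix C of full row rank. The H^{1/2}(Γ) and L²(Ω0) norms are replaced by Euclidean norms. Comparing the minimizer with u = 0 gives the same energy bound as at the start of the proof. The residual |Cu_ε − w| equals ε|(CCᵀ + εI)⁻¹w| and tends to zero as ε → 0.

```python
import math


def tikhonov_control(C, w, eps):
    """Minimizer of 0.5*|C u - w|^2 + 0.5*eps*|u|^2 for a 2 x n matrix C.

    Uses the identity u = C^T (C C^T + eps I)^{-1} w and solves the 2 x 2
    system by Cramer's rule.
    """
    n = len(C[0])
    G = [[sum(C[r][k] * C[s][k] for k in range(n)) + (eps if r == s else 0.0)
          for s in range(2)] for r in range(2)]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    z0 = (G[1][1] * w[0] - G[0][1] * w[1]) / det
    z1 = (G[0][0] * w[1] - G[1][0] * w[0]) / det
    return [C[0][k] * z0 + C[1][k] * z1 for k in range(n)]


def matvec(C, u):
    return [sum(row[k] * u[k] for k in range(len(u))) for row in C]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


C = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # hypothetical control-to-trace map
w = [1.0, 2.0]                            # hypothetical target trace
for eps in (1.0, 1e-2, 1e-4, 1e-6):
    u = tikhonov_control(C, w, eps)
    res = norm([a - b for a, b in zip(matvec(C, u), w)])
    print(f"eps={eps:g}  |Cu - w|={res:.2e}  eps^(1/2)|u|={math.sqrt(eps) * norm(u):.2e}")
```

For this particular C and w, w is an eigenvector of CCᵀ with eigenvalue 1, so the residual is exactly √5·ε/(1 + ε). In the infinite-dimensional setting of the theorem only density is available, which is why the text obtains convergence rather than a rate.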
Remark 1. The Mazur theorem alone and the linearity of (9) produce a sequence $\tilde u_\varepsilon$ (of convex combinations of $u_\varepsilon$) such that the corresponding sequence of states $\tilde y_\varepsilon$ satisfies (10). Theorem 1 gives a constructive answer to the approximate controllability question.
If $\Omega_0$ is smooth enough and $w\in H^{3/2}(\Gamma)$, then the trace theorem ensures the existence of $\hat y\in H^2(\Omega_0)$ such that $\partial\hat y/\partial n_A=0$ (null conormal derivative) and $\hat y|_\Gamma=w$. That is, the control
$$\hat u=-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial\hat y}{\partial x_i}\Big)+a_0\hat y$$
ensures the exact controllability property. Notice that $\hat u$ is not unique, since any element of $H^2_0(\Omega_0)$ may be added to $\hat y$ with all these properties preserved.
3 A Variational Fixed Domain Formulation
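We assume that $\Omega=\Omega_g$, where $g\in C(\overline D)$ is as in (5). Motivated by the result in the previous section, we consider the following homogeneous Neumann problem in D:
$$-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial\tilde y}{\partial x_i}\Big)+a_0\tilde y=f+(1-H(g))u\quad\text{in }D, \tag{15}$$
$$\frac{\partial\tilde y}{\partial n_A}=0\quad\text{on }\partial D. \tag{16}$$
Here H(·) is the Heaviside function in R, and H(g) is, consequently, the characteristic function of $\Omega_g$. Under the conditions of Theorem 1, the restriction $y=\tilde y|_{\Omega_g}$ is the solution of (2) in $\Omega=\Omega_g$. Moreover, since g = 0 on $\partial\Omega_g$, under smoothness conditions (in particular ∇g ≠ 0 there) ∇g is parallel to $\bar n$, the normal to $\partial\Omega_g$. Then we can rewrite (4) as
$$\sum_{i,j=1}^d a_{ij}\frac{\partial y}{\partial x_j}\nabla g\cdot e_i=0\quad\text{on }\partial\Omega_g, \tag{17}$$
where $e_i$ is the unit vector of the $x_i$ axis. Here we use that $\nabla g\cdot e_i=-|\nabla g|\cos(\bar n,x_i)$, because g ≥ 0 in $\Omega_g$ makes the outward normal $\bar n=-\nabla g/|\nabla g|$; thus (17) is (4) multiplied by a nonzero factor. If the elliptic operator is the Laplace operator, then (17) becomes simply
∇g · ∇y = 0 on ∂Ωg.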
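The following worked example uses an assumed geometry for illustration; it is not taken from the text. Let Ω_g be the disk of radius r0 centred at x0, parametrized by g(x) = r0² − |x − x0|². Then the Laplacian form of (17) reduces to the familiar Neumann condition on the circle:

```latex
\begin{aligned}
&\nabla g(x)=-2\,(x-x_0),\qquad \bar n(x)=\frac{x-x_0}{r_0}\quad\text{on } |x-x_0|=r_0,\\
&\nabla g\cdot\nabla y=-2\,(x-x_0)\cdot\nabla y=-2\,r_0\,\frac{\partial y}{\partial\bar n}
\quad\Longrightarrow\quad
\bigl(\nabla g\cdot\nabla y=0\ \text{on }\partial\Omega_g\bigr)\iff\Bigl(\frac{\partial y}{\partial\bar n}=0\ \text{on }\partial\Omega_g\Bigr).
\end{aligned}
```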
In order to fix a unique $u\in L^2(D)$ satisfying (15), (16), (17), we define the following optimal control problem with state constraints:
$$\min_{u\in L^2(D)}\frac12\int_Du^2\,dx, \tag{18}$$
governed by the state system (15), (16) and subject to the state constraint (17).
The discussion in Section 2 shows the existence of infinitely many admissible pairs [u, y] for the constrained control problem (15)–(18). (Here g is fixed and satisfies the necessary smoothness properties.)
In case g and $\Omega_g\subset D$ are variable and unknown, we say that (15)–(18) is the variational fixed domain (in D!) formulation of the Neumann boundary value problem. One can write the optimality conditions, which give a system of equations equivalent to (15)–(18) and extend the Neumann problem from $\Omega_g$ to D.
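We introduce the penalized control problem, for ε > 0, as follows (here [g ≡ 0] denotes $\partial\Omega_g$):
$$\min_{u\in L^2(D)}\Big\{\frac12\int_Du^2\,dx+\frac1{2\varepsilon}\int_{[g\equiv0]}F(y_\varepsilon)^2\,d\sigma\Big\} \tag{19}$$
subject to
$$-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial y_\varepsilon}{\partial x_i}\Big)+a_0y_\varepsilon=f+(1-H(g))u\quad\text{in }D, \tag{20}$$
$$\frac{\partial y_\varepsilon}{\partial n_A}=0\quad\text{on }\partial D. \tag{21}$$
Above,
$$F(y)=\sum_{i,j=1}^d a_{ij}\frac{\partial y}{\partial x_j}\nabla g\cdot e_i,$$
and the problem (19)–(21), which is unconstrained, remains a coercive and strictly convex control problem. Hence the approximating optimal pair $[u_\varepsilon,y_\varepsilon]\in L^2(D)\times H^2(D)$ exists and is unique (if ∂D is smooth enough).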
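Proposition 1. We have
$$|F(y_\varepsilon)|_{L^2(\partial\Omega_g)}\le C\varepsilon^{1/2}, \tag{22}$$
$$u_\varepsilon\to\hat u\quad\text{strongly in }L^2(D), \tag{23}$$
$$y_\varepsilon\to\hat y\quad\text{strongly in }H^2(D), \tag{24}$$
where C is a constant independent of ε > 0 and $[\hat u,\hat y]\in L^2(D)\times H^2(D)$ is the unique optimal pair of (15)–(18).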
Proof. As in Section 2, by the trace theorem, we may choose $\tilde y\in H^2(D\setminus\Omega_g)$ with the property that $\partial\tilde y/\partial n_A=0$ on $\partial(D\setminus\Omega_g)$, and $\tilde y$ may be extended to the solution of (2) inside $\Omega_g$. We can compute $\tilde u\in L^2(D\setminus\Omega_g)$ by (20) and extend it by 0 inside $\Omega_g$. Then $[\tilde u,\tilde y]$ is an admissible pair for the control problem (19)–(21) and, by the optimality of $[u_\varepsilon,y_\varepsilon]$, we get
$$\frac12\int_Du_\varepsilon^2\,dx+\frac1{2\varepsilon}\int_{[g\equiv0]}F(y_\varepsilon)^2\,d\sigma\le\frac12\int_D\tilde u^2\,dx \tag{25}$$
since $F(\tilde y)=0$ on $\partial\Omega_g$.
The inequality (25) gives (22) and shows that $\{u_\varepsilon\}$ is bounded in $L^2(D)$. By (20), (21), $\{y_\varepsilon\}$ is bounded in $H^2(D)$ and, on a subsequence, we have $y_\varepsilon\to\hat y$ weakly in $H^2(D)$ and $u_\varepsilon\to\hat u$ weakly in $L^2(D)$, where $[\hat u,\hat y]$ again satisfies (20), (21). Moreover, one can pass to the limit in (22) as ε → 0 to see that $F(\hat y)=0$ on $\partial\Omega_g$. This shows that $[\hat u,\hat y]$ is an admissible pair for the original state-constrained control problem (15)–(18). For any admissible pair $[\mu,z]\in L^2(D)\times H^2(D)$ of (15)–(18), we have $F(z)=0$ on $\partial\Omega_g$, so the inequality (25) is valid with $\tilde u$ replaced by μ, and we infer
$$\frac12\int_Du_\varepsilon^2\,dx\le\frac12\int_D\mu^2\,dx.$$
The weak lower semicontinuity of the norm gives
$$\frac12\int_D\hat u^2\,dx\le\frac12\int_D\mu^2\,dx,$$
that is, the pair $[\hat u,\hat y]$ is, in fact, the unique optimal pair of (15)–(18), and we also have
$$\lim_{\varepsilon\to0}\int_Du_\varepsilon^2\,dx=\int_D\hat u^2\,dx.$$
Then $u_\varepsilon\to\hat u$ strongly in $L^2(D)$ and $y_\varepsilon\to\hat y$ strongly in $H^2(D)$ by the strong convergence criterion in uniformly convex spaces. The convergence holds without taking subsequences, due to the uniqueness of $[\hat u,\hat y]$.
Remark 2. One can further regularize H in (20) by replacing it with a mollification $H^\varepsilon$ of the Yosida approximation $H_\varepsilon$ of the maximal monotone extension of H.
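The first step of Remark 2 can be made concrete. Assume the maximal monotone extension of H is the graph equal to 0 for r < 0, to [0, 1] at r = 0 and to 1 for r > 0. Its resolvent J_ε = (I + εH)⁻¹ and Yosida approximation H_ε = (I − J_ε)/ε can then be computed by hand. H_ε turns out to be the piecewise linear ramp equal to 0 for r ≤ 0, r/ε for 0 ≤ r ≤ ε and 1 for r ≥ ε. The sketch below only evaluates these two maps (hypothetical function names); the subsequent mollification that yields H^ε is not implemented.

```python
def resolvent_heaviside(r, eps):
    """J_eps(r) = (I + eps*H)^{-1}(r) for the maximal monotone Heaviside graph
    H(r) = 0 (r < 0), [0, 1] (r = 0), 1 (r > 0)."""
    if r < 0.0:
        return r            # H = 0 there, so J_eps is the identity
    if r <= eps:
        return 0.0          # r = 0 + eps*s with s in [0, 1]
    return r - eps          # H = 1 there


def yosida_heaviside(r, eps):
    """Yosida approximation H_eps(r) = (r - J_eps(r)) / eps, a Lipschitz ramp."""
    return (r - resolvent_heaviside(r, eps)) / eps


for r in (-0.5, 0.0, 0.025, 0.05, 0.1, 1.0):
    print(r, yosida_heaviside(r, 0.1))
```

H_ε is monotone and Lipschitz with constant 1/ε. This regularity is what a subsequent mollification improves to smoothness.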
Remark 3. One may even take null Dirichlet boundary conditions on ∂D instead of (16). Similar distributed controllability properties (approximate or exact) may be established in very much the same way.
For brevity, we consider the case of the Laplace operator. The penalized and regularized problem is the following:
$$\min_{u\in L^2(D)}\Big\{\frac12\int_Du^2\,dx+\frac1{2\varepsilon}\int_{[g\equiv0]}[\nabla y\cdot\nabla g]^2\,d\sigma\Big\},$$
$$-\Delta y+y=f+(1-H^\varepsilon(g))u\quad\text{in }D,\qquad y=0\quad\text{on }\partial D.$$
Here, the control u ensures the "transfer" from Dirichlet to (null) Neumann conditions on $\partial\Omega_g$, and all the results are similar to those of the Neumann–Neumann case.
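Theorem 2. The gradient of the cost functional (19) with respect to $u\in L^2(D)$ is given by
$$\nabla J(u_\varepsilon)=u_\varepsilon+(1-H(g))p_\varepsilon\quad\text{in }D, \tag{26}$$
where $p_\varepsilon\in L^2(D)$ is the unique solution of the adjoint equation
$$\int_Dp_\varepsilon\Big[-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial z}{\partial x_i}\Big)+a_0z\Big]dx=\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(z)\,d\sigma\quad\forall z\in H^2(D),\ \frac{\partial z}{\partial n_A}=0\ \text{on }\partial D, \tag{27}$$
in the sense of transpositions.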
Proof. We first discuss the existence of the unique transposition solution to (27).
The equation in variations corresponding to (20), (21) is
$$-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial z}{\partial x_i}\Big)+a_0z=(1-H(g))v\quad\text{in }D, \tag{28}$$
$$\frac{\partial z}{\partial n_A}=0\quad\text{on }\partial D, \tag{29}$$
for any $v\in L^2(D)$. By regularity theory for differential equations, the unique solution of (28), (29) satisfies $z\in H^2(D)$.
We perturb this equation by adding δv, δ > 0, to the right-hand side, and we denote by $z_\delta\in H^2(D)$ the corresponding solution. The mapping $v\mapsto z_\delta$, as constructed above, is an isomorphism $T_\delta:L^2(D)\to W=\{z\in H^2(D)\mid\partial z/\partial n_A=0\ \text{on }\partial D\}$.
We define the linear continuous functional on $L^2(D)$ by
$$v\longmapsto\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(T_\delta v)\,d\sigma\quad\forall v\in L^2(D). \tag{30}$$
The Riesz representation theorem applied to (30) ensures the existence of a unique $\tilde p_\delta\in L^2(D)$ such that
$$\int_D\tilde p_\delta v\,dx=\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(T_\delta v)\,d\sigma\quad\forall v\in L^2(D). \tag{31}$$
Choosing $v=T_\delta^{-1}z$, with z ∈ W arbitrary, relation (31) gives
$$\int_D\tilde p_\delta(1-H(g)+\delta)^{-1}\Big(-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial z}{\partial x_i}\Big)+a_0z\Big)dx=\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(z)\,d\sigma\quad\forall z\in W. \tag{32}$$
By redenoting $p_\varepsilon=\tilde p_\delta(1-H(g)+\delta)^{-1}\in L^2(D)$ (which conceptually may depend on δ > 0) in (32), we have proved existence for (27). The uniqueness of $p_\varepsilon$ may be shown by contradiction, directly in (27), as the factor multiplying $p_\varepsilon$ in the left-hand side of (27) "generates" the whole of $L^2(D)$ when z ∈ W is arbitrary.
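Coming back to the equation in variations (28), (29) and to the definition of the control problem (19)–(21), the directional derivative of the cost functional (19) is given by
$$\lim_{\lambda\to0}\frac1\lambda\big[J(u_\varepsilon+\lambda v)-J(u_\varepsilon)\big]=\int_Du_\varepsilon v\,dx+\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(z)\,d\sigma \tag{33}$$
and the Euler equation is
$$0=\int_Du_\varepsilon v\,dx+\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(z)\,d\sigma\quad\forall v\in L^2(D), \tag{34}$$
with z defined by (28), (29). Using (27) in (34), since z given by (28), (29) is an admissible test function, we get
$$\begin{aligned}0&=\int_Du_\varepsilon v\,dx+\int_Dp_\varepsilon\Big[-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial z}{\partial x_i}\Big)+a_0z\Big]dx\\&=\int_Du_\varepsilon v\,dx+\int_Dp_\varepsilon(1-H(g))v\,dx.\end{aligned} \tag{35}$$
This proves (26) and ends the argument.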
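Formula (26) is the familiar adjoint-state gradient. The sketch below checks the same structure on a deliberately simplified 1D analogue, in which every ingredient is an assumption made for illustration:

- D = (0, 1), with the operator −y″ + y discretized by a three-point finite-difference scheme with Neumann ends.
- Ω_g = (0.25, 0.75), given by g(x) = 0.25 − |x − 0.5|.
- F is replaced by g′ times a difference quotient across each interface cell.
- The Euclidean inner product is used instead of that of L²(D).

One solve with the symmetric matrix of the state equation gives the adjoint p. The vector u + (1 − H(g))p is then compared with central finite differences of the cost.

```python
import math


def solve_tridiag(lower, diag, upper, rhs):
    """Thomas algorithm: row k reads lower[k]*x[k-1] + diag[k]*x[k] + upper[k]*x[k+1]."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for k in range(1, n):
        m = diag[k] - lower[k] * cp[k - 1]
        cp[k] = upper[k] / m if k < n - 1 else 0.0
        dp[k] = (rhs[k] - lower[k] * dp[k - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x


def make_problem(n=40, eps=1e-2):
    """1D toy analogue of (19)-(21) on D = (0, 1) with -y'' + y and Neumann ends."""
    h = 1.0 / n
    x = [(k + 0.5) * h for k in range(n)]
    g = [0.25 - abs(xk - 0.5) for xk in x]             # Omega_g = (0.25, 0.75)
    diag = [2.0 / h**2 + 1.0] * n
    diag[0] = diag[-1] = 1.0 / h**2 + 1.0              # discrete Neumann ends
    return {
        "h": h, "eps": eps, "diag": diag, "off": [-1.0 / h**2] * n,
        "chi": [0.0 if gk > 0 else 1.0 for gk in g],   # 1 - H(g)
        "f": [math.sin(3.0 * xk) for xk in x],         # arbitrary source term
        # interface cells (k, k+1) where g changes sign, with the sign of g'
        "iface": [(k, 1.0 if x[k] < 0.5 else -1.0)
                  for k in range(n - 1) if (g[k] > 0) != (g[k + 1] > 0)],
    }


def F(P, y):
    """Discrete stand-in for F(y) = g' y' at the interface points."""
    return [s * (y[k + 1] - y[k]) / P["h"] for k, s in P["iface"]]


def F_transpose(P, r):
    """Transpose of the linear map y -> F(P, y)."""
    out = [0.0] * len(P["diag"])
    for (k, s), rk in zip(P["iface"], r):
        out[k] -= s * rk / P["h"]
        out[k + 1] += s * rk / P["h"]
    return out


def state(P, u):
    rhs = [fk + ck * uk for fk, ck, uk in zip(P["f"], P["chi"], u)]
    return solve_tridiag(P["off"], P["diag"], P["off"], rhs)


def cost(P, u):
    Fy = F(P, state(P, u))
    return 0.5 * sum(v * v for v in u) + 0.5 / P["eps"] * sum(v * v for v in Fy)


def gradient(P, u):
    """Adjoint gradient in the spirit of (26)-(27): u + (1 - H(g)) p."""
    Fy = F(P, state(P, u))
    p = solve_tridiag(P["off"], P["diag"], P["off"],
                      [v / P["eps"] for v in F_transpose(P, Fy)])
    return [uk + ck * pk for uk, ck, pk in zip(u, P["chi"], p)]


P = make_problem()
u0 = [math.cos(0.125 * k) for k in range(40)]
g0 = gradient(P, u0)
```

The discrete cost is quadratic in u, so central differences reproduce the adjoint gradient up to rounding. At nodes inside Ω_g the gradient reduces to u itself, as (26) predicts.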
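Remark 4. Theorem 2 may be applied to any control $u\in L^2(D)$. For the optimal control $u_\varepsilon$, the directional derivative (and the gradient) vanishes, and we obtain $u_\varepsilon=-p_\varepsilon(1-H(g))$; that is, $u_\varepsilon$ has support in $D\setminus\Omega_g$. This relation is the maximum (Pontryagin) principle applied to the control problem (19)–(21). Moreover, one can eliminate $u_\varepsilon$ and write the following system of two elliptic equations:
$$-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial y_\varepsilon}{\partial x_i}\Big)+a_0y_\varepsilon=f-(1-H(g))^2p_\varepsilon\ \text{in }D,\qquad\frac{\partial y_\varepsilon}{\partial n_A}=0\ \text{on }\partial D, \tag{36}$$
$$\int_Dp_\varepsilon\Big[-\sum_{i,j=1}^d\frac{\partial}{\partial x_j}\Big(a_{ij}\frac{\partial z}{\partial x_i}\Big)+a_0z\Big]dx=\frac1\varepsilon\int_{[g\equiv0]}F(y_\varepsilon)F(z)\,d\sigma\quad\forall z\in W, \tag{37}$$
which constructs in an explicit manner the extension of the Neumann boundary value problem from $\Omega_g$ to D, modulo the approximation discussed in Proposition 1.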
4 Conclusions
The shape optimization problem (1), (2) is thus transformed into the optimal control problem
$$\min_{g\in\mathcal{C}}\int_DH(g)\,j(x,y(x))\,dx \tag{38}$$
subject to (15)–(17), which, in turn, may be approximated by (19)–(21) or, equivalently, by (36)–(37). To obtain good differentiability properties with respect to g in the optimization problem (38), one should replace H by $H^\varepsilon$, some regularization of H, as previously mentioned. Analyzing further approximation properties and the gradient for (38) is a nontrivial task. However, evolutionary algorithms can be applied, since they involve only the values of the cost (38) and no computation of the gradient with respect to g.
As the initial population of controls g for the genetic algorithm, one may use the basis functions of the piecewise linear continuous finite element space associated with the finite element mesh in D. If some supplementary information is available on the desired shape (for instance, coming from the constraints), it should be imposed on the initial population. Then, standard procedures specific to evolutionary algorithms [Hol75] are to be applied.
References
[Hol75] J. H. Holland. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI, 1975.
[Lio68] J.-L. Lions. Contrôle optimal des systèmes gouvernés par des équations aux dérivées partielles. Dunod, Paris, 1968.
[MP01] B. Mohammadi and O. Pironneau. Applied shape optimization for fluids.
The Clarendon Press, Oxford University Press, New York, 2001.
[NPT07] P. Neittaanmäki, A. Pennanen, and D. Tiba. Fixed domain approaches in
shape optimization problems with Dirichlet boundary conditions. Reports
of the Department of Mathematical Information Technology, Series B,
Scientific Computing B16/2007, University of Jyväskylä, Jyväskylä, 2007.
[NST06] P. Neittaanmäki, J. Sprekels, and D. Tiba. Optimization of elliptic systems. Springer-Verlag, Berlin, 2006.
[NT95] P. Neittaanmäki and D. Tiba. An embedding of domains approach in
free boundary problems and optimal design. SIAM J. Control Optim.,
33(5):1587–1602, 1995.
[Pir84] O. Pironneau. Optimal shape design for elliptic systems. Springer-Verlag,
Berlin, 1984.
[Roc70] R. T. Rockafellar. Convex analysis. Princeton University Press, Princeton,
NJ, 1970.
[Tib92] D. Tiba. Controllability properties for elliptic systems, the fictitious domain method and optimal shape design problems. In Optimization, optimal control and partial differential equations (Iaşi, 1992), number 107 in
Internat. Ser. Numer. Math., pages 251–261, Basel, 1992. Birkhäuser.
[Yos80] K. Yosida. Functional analysis. Springer-Verlag, Berlin, 1980.
Reduced-Order Modelling of Dispersion
Jean-Marc Brun¹ and Bijan Mohammadi²
¹ CEMAGREF/ITAP, FR-34095 Montpellier, France
jean-marc.brun@cemagref.fr
² I3M-Univ. Montpellier II, CC051, FR-34095 Montpellier, France
bijan.mohammadi@univ-montp2.fr
Summary. We present low-complexity models for the transport of passive scalars in environmental applications. A multi-level analysis is used, with a reduction in the dimension of the solution space at each level. Similitude solutions are used in a non-symmetric metric for transport over long distances. Model parameters are identified by data assimilation. The approach does not require the solution of any PDE and is, therefore, mesh free. The model also makes it possible to access the solution at one point without computing it over the whole domain. Sensitivity analysis is used for risk analysis and for identifying the sources of an observed pollution.
Key words: Reduced order modelling, source identification, risk analysis by
sensitivity, non-symmetric geometry.
1 Introduction
Air and water contamination by pesticides is a major concern for health and the environment. One aims to model pesticide transport in atmospheric flows at a computational cost low enough to make assimilation-simulation and statistical risk analysis by Monte Carlo simulations realistic. In this problem, the available data are incomplete and highly variable, and the number of parameters involved is large. Solution-space reduction and reduced-order modelling, therefore, appear as a natural way to proceed.
Our contribution is to build a multi-level approach where a given level provides the inlet condition for the level above. At each level, one aims to use a priori information in the definition of the search space for the solution and to avoid solving partial differential equations.
More precisely, a near-field search space (near the injection device) is built using experimental observations. Once this local solution is known, the amount of species leaving the atmospheric sub-layer is evaluated. This quantity is a candidate for long-distance transport using similitude solutions for mixing layers and plumes [Sim97]. These solutions are known in Cartesian metrics. An original
246
J.-M. Brun and B. Mohammadi
contribution here is the generalization of these solutions in a non-symmetric, travel-time based metric to account for non-uniform winds. We add constraints so that the solutions built with this approach are solutions of the direct model (i.e. the flow equations and the transport model for a passive scalar). In particular, we require the divergence-free condition for the generated winds, as well as conservation, positivity and linearity of the solution of the transport equations.
Numerical examples compare this approach with a PDE-based simulation. They also show multi-source configurations, as well as the sensitivity analysis of a detected pollution, which is useful for both source identification and risk analysis.
2 Reduced-Order Modelling
One aims to model the very large, multi-scale phenomena involved in the agricultural phyto treatment of crops. The entities to account for range from rows of plants to water catchment basins, and one should also consider local topography and atmospheric conditions. It is, therefore, clear that modelling phenomena at length scales below a few meters becomes inevitable.
Consider the calculation of a state variable V(p), a function of independent variables p. Our aim is to define a suitable search space for the solution V(p) instead of considering a general function space. The latter is what one does in finite element methods, for instance, where the solution is expressed in some subspace S({W_N}) spanned by the chosen functional basis {W_N}, and the quality of the solution is monitored through the mesh quality and/or by increasing the order of the finite element [Cia78]. In all cases, the size of the problem is large, 1 ≪ N < ∞, and if the approach is consistent, the projected solution tends to the exact solution when N → ∞.
In a low-complexity approach, one replaces the calculation of V(p) by a projection onto a subspace S({w_n}) generated, for instance, by {w_n}, a family of solutions ('snapshots') of the initial full model (p → V(p)), with n ≪ N [VP05].
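The snapshot-based reduction described above, which the present approach avoids, is commonly implemented by proper orthogonal decomposition (POD). The Python sketch below is a generic POD and not the method of [VP05]. It extracts n orthonormal modes from N-dimensional snapshots by a thin SVD and then projects a new state onto them. The "full model" is a hypothetical Gaussian profile that stands in for p → V(p).

```python
import numpy as np


def pod_basis(snapshots, n_modes):
    """First n_modes POD modes of a snapshot matrix whose columns are full-model states."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :n_modes], s


def project(V, basis):
    """Orthogonal projection of a full state V (length N) onto span(basis)."""
    return basis @ (basis.T @ V)      # n reduced coordinates, then back to R^N


N = 500
x_grid = np.linspace(0.0, 1.0, N)


def full_model(p):
    """Hypothetical full model p -> V(p): a Gaussian profile whose width depends on p."""
    return np.exp(-((x_grid - 0.5) ** 2) / (0.01 + 0.02 * p))


snapshots = np.column_stack([full_model(p) for p in np.linspace(0.0, 1.0, 20)])
W, singular_values = pod_basis(snapshots, n_modes=5)    # n = 5 modes, N = 500
V_new = full_model(0.37)                                # parameter not among the snapshots
rel_err = np.linalg.norm(V_new - project(V_new, W)) / np.linalg.norm(V_new)
```

Computing the snapshots still requires n solutions of the full model. This is the cost that the approach below removes by replacing p → V(p) with a physically motivated approximate model.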
In our approach, we aim to remove the calculation of these snapshots, as this is not always an easy task. We take advantage of what we know about the physics of the problem and replace the direct model p → V(p) by an approximate model p → v(p) that is easier to evaluate. This is a very natural way to proceed, as one often does not need all the details of a given state. Also, it is sufficient for the low-complexity model to have a local domain of validity: one does not necessarily use the same low-complexity model over the whole range of the parameters. We have used this approach in the incomplete sensitivity concept, where the linearization is performed not for the direct model but for an approximate state equation [MP01].
Reduced-Order Modelling of Dispersion
247
2.1 Near-Field Solution
The first step is to model the solution at the outlet of the injection device used to spread the phyto treatment between rows. One important hypothesis is to assume two different time scales, based on the injection velocity and on the velocity at which the injection source moves. The injection velocity being much higher, one assumes that the local concentration at the outlet of the injection device is established instantaneously. This instantaneous local flow field is assumed to vanish immediately and not to affect the overall atmospheric circulation. The injection velocity serves only to determine the part of the pollutant that leaves the near-ground area and is a candidate for transport over large distances (see Section 2.2). These are strong hypotheses, which seriously reduce the search space for the solution.
One considers a cylindrical local reference frame where z indicates the direction of motion of the vehicle in the field. One looks for local injection solutions of the form
$$u_l \sim f_1(r)\,g_1(\theta)\,\big(\mathbf{z}\,h_1(z) + (1-h_1(z))\,\mathbf{r}\big) \quad\text{and}\quad c_l \sim f_2(r)\,g_2(\theta)\,h_2(z), \qquad (1)$$
where the subscript l stands for local. r is a unit vector with its origin at the injection point, sweeping the unit circle around this point in the plane perpendicular to z. This defines an instantaneous flow field around the injection point, and c_l denotes the local distribution of a passive scalar. The functions f_i(r), i = 1, 2, are solutions of a control problem for the assimilation of experimental data by a PDE-based model obtained by dimension reduction of the Navier–Stokes and transport equations [Fin00, RT81, Sum71, Bru06]. These experimental data show that, after injection, both the flow velocity and the phyto-product concentration drop to nearly zero after three rows of vegetation. The functions g_i(θ), i = 1, 2, are Gaussian distributions describing the characteristics of the injection device and are provided by the manufacturer. The functions h_i(z), i = 1, 2, include the characteristics of the vegetation through the assimilation of experimental data and describe how the density of the vegetation deflects the flow horizontally: h_1(z) ∈ [−1, 1] is an erf function, odd and monotonically increasing, and h_2(z) ∈ [0, 1] is a Gaussian distribution.
At this level, one includes compatibility conditions coming from the governing equations. In particular, one requires the conservation condition to hold for the concentration of the passive scalar, the flow field to be divergence free, and both variables to satisfy an advection equation:
$$\int_{\mathbb{R}^3} c_l\,dv = \text{given}, \qquad \nabla\cdot u_l = u_l\cdot\nabla c_l = 0. \qquad (2)$$
To summarize, the coefficients in the functions f_i, g_i, h_i, i = 1, 2, are the solution of an assimilation problem for experimental data under the constraint (2) [Bru06].
From now on, the variables are expressed in a global Cartesian reference frame where z denotes the vertical axis.
2.2 Long Range Transport and Non-Symmetric Geometry
The modelling above gives a local distribution for the advected quantities. We are now interested in the quantities that are candidates for transport over large distances. We suppose that they are given by
$$c^+(x,y) = \int_{z>H} c_l\,dz \quad\text{or}\quad c^+(x,y) = \int_{z>H} u_l^+\,c_l\,dz,$$
where H ∼ 2–3 m and $u_l^+ = \max\big(0, (u\cdot\mathbf{z})/|u|\big)$. The total quantity being transported is given by
$$C = \int_{\mathbb{R}^2} c^+(x,y)\,d\sigma,$$
which should be conserved by the reduced-order transport model we want to build, and for which c^+ is the input condition.
We now aim to reduce the search space for the solution further. The primary factors influencing the dispersion of a neutral plume are advection by the wind and turbulent mixing. The simplest model of this process assumes that the plume is advected downwind and spreads out in the horizontal and vertical directions. Hence, the distribution of a passive scalar c emitted from a given point and transported by a uniform plane flow field U along the x-coordinate is given by
$$c(x,y,z) = c_c(x)\,f\big(\sqrt{y^2+z^2},\,\delta(x)\big), \qquad (3)$$
where
$$c_c(x) \sim \exp(-a(U)\,x) \quad\text{and}\quad f\big(\sqrt{y^2+z^2},\,\delta(x)\big) \sim \exp\big(-b(U,\delta(x))\sqrt{y^2+z^2}\big).$$
Here c_c is the behavior along the central axis of the distribution, and δ(x) characterizes the thickness of the distribution at a given x-coordinate. An analogy exists with plane or axisymmetric mixing layers and neutral plumes, where δ is parabolic for a laminar jet and linear in turbulent cases [Cou89, Sim97]. a(·) is a positive, monotonically decreasing function, and b(·,·) is positive, monotonically increasing in U and decreasing in δ. In a uniform atmospheric flow field, this solution can be used for the transport of c^+ above.
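The Python sketch below evaluates the similitude solution (3). The text fixes only the monotonicity of a, b and the growth of δ, so the specific closures are hypothetical choices that respect those properties: a(U) = α/U, δ(x) = s·x (linear growth, as in the turbulent case) and b(U, δ) = βU/δ. The constants α, β, s and c0 are arbitrary.

```python
import numpy as np


def plume(x, y, z, U, c0=1.0, alpha=0.1, beta=1.0, spread=0.2):
    """Similitude solution (3) for a uniform wind U along x, valid for x > 0.

    Hypothetical closures satisfying the stated monotonicity properties:
      a(U) = alpha / U               positive, decreasing in U
      delta(x) = spread * x          linear thickness growth (turbulent case)
      b(U, delta) = beta * U / delta increasing in U, decreasing in delta
    """
    x = np.asarray(x, dtype=float)
    r = np.sqrt(np.asarray(y, dtype=float) ** 2 + np.asarray(z, dtype=float) ** 2)
    delta = spread * x
    c_axis = c0 * np.exp(-(alpha / U) * x)           # c_c(x), behavior on the central axis
    return c_axis * np.exp(-(beta * U / delta) * r)  # c_c(x) * f(sqrt(y^2 + z^2), delta(x))
```

With these choices, the concentration decays along the axis, and the plume becomes relatively wider downstream: the off-axis to on-axis ratio at a fixed radius grows with x.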
We would like to generalize this solution in a non-symmetric metric defined
by migration times based on the flow field and hence treat the case of variable
flow fields.
Nonsymmetric Geometry
In a symmetric geometry, the distance function between two points A and B satisfies
d(A, B) = 0 ⇒ A = B,
d(A, B) = d(B, A),
d(A, B) ≤ d(A, C) + d(C, B).
But the distance function can be non-uniform and anisotropic (the unit spheres being ellipsoids). In a chosen metric M, the distance between A and B is given by
$$d_M(A,B) = \int_0^1 \Big(\overrightarrow{AB}^{\,T}\, M\big(A + t\,\overrightarrow{AB}\big)\,\overrightarrow{AB}\Big)^{1/2} dt,$$
where M is positive definite, and symmetric in symmetric geometries. With M = I, one recovers the Euclidean geometry, and a variable M makes it possible to account for anisotropy and non-uniformity of the distance function. We have widely used this approach for mesh adaptation for steady and unsteady phenomena [AGFM02, HM97, BGM97], linking the metric to the Hessian of the solution. This definition of the metric makes it possible to equidistribute the interpolation error over a given mesh and, therefore, to monitor the quality of the solution.
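The metric length d_M can be evaluated by quadrature along the segment AB. The Python sketch below uses Gauss–Legendre points mapped to [0, 1]. The metric fields used in the checks (identity, constant anisotropic, and one that varies along x) are illustrative assumptions.

```python
import numpy as np


def metric_length(A, B, M, n_quad=32):
    """d_M(A, B) = int_0^1 (AB^T M(A + t AB) AB)^(1/2) dt by Gauss-Legendre quadrature.

    M is a callable returning a symmetric positive definite matrix at a point.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    AB = B - A
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * (nodes + 1.0)                     # map [-1, 1] to [0, 1]
    integrand = [np.sqrt(AB @ M(A + ti * AB) @ AB) for ti in t]
    return float(0.5 * np.dot(weights, integrand))  # factor 1/2 from the change of variable
```

With M = I this returns the Euclidean length, for example 5 between (0, 0) and (3, 4). With M(P) = (1 + P_x)² I, the length of the unit segment along x becomes 1.5.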
Consider now the following definition of the distance function:
Definition 1. If A is upwind with respect to B, then
$$d(B,A) = \infty \quad\text{and}\quad d(A,B) = \int_A^{B^\perp} \frac{ds}{|u|} = T,$$
where T is the migration time from A to B^⊥ along the characteristic passing through A.
u is the local velocity along this characteristic and is, by definition, tangent to the characteristic. B^⊥ denotes the projection of B onto this characteristic in the Euclidean metric. One supposes that this characteristic is unique, hence excluding sources and attraction points in the flow field. If this projection is not unique, one chooses the direction of projection that best satisfies the constraint u · ∇c_g = 0 at B.
Generalized Plume Solution
Once this distance has been built, we assume that the distribution of a passive scalar transported by a flow u can be written as
$$c_g = c_c(d)\,f\big(d_E^\perp,\,\delta(d)\big). \qquad (4)$$
Here the subscript g stands for global and refers to long-distance transport. d_E^⊥ is the Euclidean distance in the direction normal to the characteristic at B^⊥ (i.e. along the direction BB^⊥).
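In a uniform flow, the characteristics are straight lines, so Definition 1 and the generalized plume (4) can be written in closed form. The Python sketch below covers only this special case and shows the non-symmetry d(A, B) ≠ d(B, A). The closures c_c(d) = exp(−a d), f = exp(−d_E^⊥/δ) and δ(d) = s·d, with their constants, are hypothetical.

```python
import math

import numpy as np


def travel_time_distance(A, B, u):
    """Definition 1 specialized to a uniform flow u, whose characteristics are straight lines.

    Returns (d(A, B), d_E_perp): the migration time from A to B_perp and the Euclidean
    distance |B - B_perp|. d(A, B) is infinite when B is not downwind of A.
    """
    A, B, u = (np.asarray(v, dtype=float) for v in (A, B, u))
    speed = float(np.linalg.norm(u))
    e = u / speed                          # unit tangent to the characteristics
    s = float((B - A) @ e)                 # arc length from A to B_perp
    if s <= 0.0:
        return math.inf, math.nan
    B_perp = A + s * e                     # Euclidean projection of B on the characteristic through A
    return s / speed, float(np.linalg.norm(B - B_perp))


def generalized_plume(A, B, u, a=0.05, spread=0.5):
    """c_g = c_c(d) f(d_E_perp, delta(d)) as in (4), with hypothetical c_c, f and delta."""
    d, d_perp = travel_time_distance(A, B, u)
    if math.isinf(d):
        return 0.0                         # nothing is transported upwind
    delta = spread * d                     # thickness growing linearly with migration time
    return math.exp(-a * d) * math.exp(-d_perp / delta)
```

For u = (2, 0), A = (0, 0) and B = (10, 3), the sketch gives d(A, B) = 5 and d_E^⊥ = 3, while d(B, A) is infinite.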
Flow Field
One should keep in mind that, in realistic configurations, one has very little information on the details of the atmospheric flow compared to the accuracy one would like for the transport. For example, the flow will probably be described by fewer than one point per several square kilometers. We consider the near-ground flow field built from observation data as the solution of the following system:
$$u = \nabla\phi, \qquad -\Delta\phi = \sum_{i=1,\dots,n_{obs}} \big\|\nabla\phi(x_i) - u_{obs}(x_i)\big\|, \qquad (5)$$
where φ is a scalar potential and n_obs the number of observation points. The observations are close to the ground, at z = H, and this construction gives a map of the flow near the ground. This is completed in the vertical direction using generalized wall functions for turbulent flows [MP94, MP06]:
$$(u\cdot\tau)^+ = (u\cdot\tau)/u_\tau = f(z^+) = f(z\,u_\tau/\nu),$$
where τ = u_H/|u_H| is the local unit vector tangent to the ground in the direction of the flow, and we assume u · n(z = H) = 0, n being the normal to the ground. This is a nonlinear equation giving the friction velocity u_τ from (u · τ)_H. It is used, in turn, to define the horizontal velocity u · τ = u_τ f(z^+) for z > H. This construction gives two components of the flow, and the divergence-free condition implies that the third component is constant; it therefore vanishes, since it is assumed to be zero at z = H. This construction can be improved, but we find it sufficient for the level of accuracy required. In the presence of ground variations, the flow is locally rotated to remain parallel to the ground (see also Section 2.2 for ground variation modelling).
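The generalized wall function f of [MP06] is not given here. As an assumption, the Python sketch below replaces it with the classical logarithmic law f(z⁺) = ln(z⁺)/κ + B, with κ = 0.41, B = 5.0 and the kinematic viscosity of air ν = 1.5×10⁻⁵ m²/s. It solves the nonlinear equation (u · τ)_H = u_τ f(H u_τ/ν) for u_τ by bisection and then extends the tangential velocity above z = H.

```python
import math

KAPPA, B_LOG = 0.41, 5.0   # classical log-law constants (assumption, not the law of [MP06])


def f_log(z_plus):
    """Log-law stand-in for the wall function f(z+)."""
    return math.log(z_plus) / KAPPA + B_LOG


def friction_velocity(u_H, H, nu=1.5e-5, tol=1e-12, max_iter=200):
    """Solve u_H = u_tau * f(H * u_tau / nu) for u_tau by bisection."""
    def residual(u_tau):
        return u_tau * f_log(H * u_tau / nu) - u_H

    lo, hi = 30.0 * nu / H, u_H            # z+ = 30 at the lower end (start of the log layer)
    if residual(lo) > 0.0 or residual(hi) < 0.0:
        raise ValueError("no root bracketed: flow outside the log-law regime")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def tangential_velocity(z, u_tau, nu=1.5e-5):
    """u . tau = u_tau f(z u_tau / nu), used for z > H."""
    return u_tau * f_log(z * u_tau / nu)


u_tau = friction_velocity(u_H=3.0, H=3.0)   # hypothetical 3 m/s wind observed at H = 3 m
```

Bisection is used because the residual is monotonically increasing in u_τ over the log-law range, so a bracketing method is robust without derivatives.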
Calculation of Migration Times
As we said, our approach aims to provide the solution at a given point without calculating the whole solution. At a point B, one needs an estimate of the migration time from the source at A to B, using the construction in Section 2.2.
We avoid the explicit construction of the characteristics by using an iterative polynomial definition of a characteristic s(t) = (x(t), y(t), z(t)), t ∈ [0, 1]. The construction starts from a third-order polynomial satisfying, for each coordinate,
$$P_n(0) = x_A,\qquad P_n(1) = x_B,\qquad P_n'(0) = u_{1A},\qquad P_n'(1) = u_{1B}$$
(and similarly for y and z).
If P_n'(ζ) ≠ u_1(x = P_n(ζ)) at a point ζ ∈ ]0, 1[ chosen at random, this new point is assimilated by the construction, increasing the polynomial order by one.
The migration time is computed over this polynomial approximation of the characteristic. Here we make the approximation B^⊥ = B. This means that the characteristic passing through A passes exactly through B, which is unlikely. In a uniform flow, it amounts to assuming that the angle between the central axis and AB is small (cosine near 1). One therefore introduces a correction factor of 2/π ≈ 0.637 on the calculated times. This is the average of |cos θ| for θ uniformly distributed (white noise) in [0, π].
Once d has been calculated by this procedure, one needs to define d_E^⊥, which is unknown since B^⊥ is unknown. We make the approximation d_E^⊥ ∼ d_E(B, B^*), where B^* is the projection of B onto the vector ū, the averaged velocity along the polynomial characteristic. This approach gives satisfactory results for smooth atmospheric flow fields, which is our domain of interest: phyto treatments are, in principle, not applied when the wind is too strong or the temperature too high (e.g., for winds stronger than 20 km/h or air temperatures above 30 °C). This is also why low-order polynomials suffice in the polynomial construction above.
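The Python sketch below computes a migration time on the starting third-order (Hermite) polynomial of the construction above. It integrates T = ∫ ds/|u| along the curve by Gauss–Legendre quadrature and applies the 2/π correction factor. The iterative order increase at random points ζ is not implemented. The velocity field is a user-supplied callable that is assumed not to vanish along the curve, consistent with the exclusion of sources and attraction points.

```python
import numpy as np


def hermite_cubic(pA, pB, vA, vB):
    """Cubic P (vector valued) with P(0)=pA, P(1)=pB, P'(0)=vA, P'(1)=vB."""
    pA, pB, vA, vB = (np.asarray(a, dtype=float) for a in (pA, pB, vA, vB))

    def P(t):
        t = np.asarray(t, dtype=float)[:, None]
        return ((2 * t**3 - 3 * t**2 + 1) * pA + (t**3 - 2 * t**2 + t) * vA
                + (-2 * t**3 + 3 * t**2) * pB + (t**3 - t**2) * vB)

    def dP(t):
        t = np.asarray(t, dtype=float)[:, None]
        return ((6 * t**2 - 6 * t) * pA + (3 * t**2 - 4 * t + 1) * vA
                + (-6 * t**2 + 6 * t) * pB + (3 * t**2 - 2 * t) * vB)

    return P, dP


def migration_time(A, B, velocity, n_quad=32, correction=2.0 / np.pi):
    """Migration time int ds/|u| along the cubic approximation of the characteristic A -> B.

    Returns (raw time, time multiplied by the correction factor).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    P, dP = hermite_cubic(A, B, velocity(A), velocity(B))
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * (nodes + 1.0)                                    # Gauss points on [0, 1]
    speed = np.array([np.linalg.norm(velocity(p)) for p in P(t)])
    ds_dt = np.linalg.norm(dP(t), axis=1)                      # |s'(t)|
    T_raw = float(np.sum(0.5 * weights * ds_dt / speed))
    return T_raw, correction * T_raw
```

In a uniform flow u = (2, 0) from A = (0, 0) to B = (2, 0), the cubic reduces to the straight line s(t) = (2t, 0). The raw migration time is then 1, and the corrected time is 2/π.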
Ground Variations
At this point, one accounts for the topography or ground variations ((x, y) → ψ(x, y)) in the prediction model above. These are available from digital terrain models (DTM) [Arc06]. Although this plays an important role in the dispersion process, it is obviously hopeless to target a direct simulation based on a detailed ground description. Ground variation effects are implicitly present in the observation data for wind and transport, as mentioned in Section 2.2. However, as we said, observations are quite incomplete, and to improve the predictive capacity of the model one needs to model the dependency between ground variations and migration time. Therefore, in addition to the assimilation problem mentioned above, one scales the migration times used for transport over large distances by a positive, monotonically decreasing function f(φ) with f(0) = 1, where
$$\phi = (\nabla_{x,y}\psi\cdot u_H)/|u_H|.$$
Here u_H is the near-ground flow field constructed from the assimilated observations.
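The Python sketch below computes the slope along the wind, φ = (∇_{x,y}ψ · u_H)/|u_H|, on a DTM grid with centered finite differences and then scales a migration time. The text does not specify f, so f(φ) = exp(−αφ) is a hypothetical choice that is positive, decreasing and equal to 1 at 0. The grid, terrain and wind values are illustrative.

```python
import numpy as np


def slope_along_wind(psi, dx, dy, u_H):
    """phi = (grad_{x,y} psi . u_H) / |u_H| on a DTM grid psi[i, j] (rows: y, columns: x).

    u_H may be a constant vector (ux, uy) or a pair of arrays with the grid shape.
    """
    dpsi_dy, dpsi_dx = np.gradient(psi, dy, dx)     # axis 0 is y, axis 1 is x
    ux, uy = u_H
    return (dpsi_dx * ux + dpsi_dy * uy) / np.hypot(ux, uy)


def time_scaling(phi, alpha=2.0):
    """Hypothetical f(phi) = exp(-alpha phi): positive, decreasing, f(0) = 1."""
    return np.exp(-alpha * phi)


x = np.linspace(0.0, 1000.0, 101)                   # dx = 10 m
y = np.linspace(0.0, 1000.0, 51)                    # dy = 20 m
X, Y = np.meshgrid(x, y)
psi = 0.1 * X                                        # plane rising 10 % towards +x
phi = slope_along_wind(psi, 10.0, 20.0, (3.0, 0.0))  # wind blowing up the slope
T_scaled = 120.0 * time_scaling(phi)                 # a 120 s migration time, scaled
```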
3 Parameter and Source Identification
Two types of inverse problems have been treated. The first inverse problem is parameter identification in the model above. It assimilates either local experimental data (as described in Section 2.1) or partial data on the wind u_obs and on transported species c_obs measured by localized apparatus. In particular, the unknown parameters in our global transport model come from the solution of a minimization problem for
$$J(p, u_{obs}, c_{obs}) = \big\|c(p, u(p, u_{obs})) - c_{obs}\big\|, \qquad (6)$$
where p gathers all the unknown independent parameters of Section 2.2, ‖·‖ is a discrete L²-norm over the measurement points, and u(p) is the completion of the available wind measurements (u_obs) over the domain, as described in Section 2.2.
Once the model is established, the second inverse problem of interest is the identification of possible sources of an observed pollution. These sources lie in the regions where J'_p is large. In this case, the parameter p is the location of the different sources (crop fields).
To solve the minimization problems, we use a semi-deterministic global optimization algorithm based on the solution of the following boundary value problem [MS03, Ivo06, IMSH06]:
$$\begin{cases} p_{\zeta\zeta} + p_\zeta = -J'_p(p(\zeta)),\\ p(0) = p_0,\qquad J(p(1)) = J_m = 0, \end{cases} \qquad (7)$$
where ζ ∈ [0, 1] is a fictitious parameter and J_m is the infimum of our inverse problems (here taken as 0). This problem can be solved using solution techniques for free-surface BVPs to find p(1) realizing the infimum (i.e. J(p(1)) = J_m). An analogy can be drawn with the problem of finding the interface between water and ice, which is only known implicitly through the zero-temperature isovalue. If a local minimum is sufficient, the second boundary condition can be replaced by J'_p(p(1)) = 0.
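The algorithm of [MS03, Ivo06, IMSH06] is not reproduced here. As a simplified stand-in, the Python sketch below integrates the damped second-order dynamics of (7), p'' + p' = −J'_p(p), with semi-implicit Euler over a stretched fictitious interval [0, Z]. The unknown initial velocity p'(0) is treated as a shooting parameter: a few candidates are tried, and the end point with the smallest J is kept. The interval length Z, the step count, the candidate velocities and the quadratic test functional are all hypothetical.

```python
import numpy as np


def heavy_ball(p0, v0, grad, Z=20.0, n_steps=2000):
    """Integrate p'' + p' = -grad J(p) on [0, Z] with semi-implicit Euler."""
    h = Z / n_steps
    p = np.array(p0, dtype=float)
    v = np.array(v0, dtype=float)
    for _ in range(n_steps):
        v = v + h * (-v - grad(p))     # velocity update with damping and gradient force
        p = p + h * v                  # position update with the new velocity
    return p


def shooting_search(J, grad, p0, velocities, **kwargs):
    """Try several initial velocities p'(0) and keep the end point with the smallest J."""
    best_p = np.array(p0, dtype=float)
    best_J = J(best_p)
    for v0 in velocities:
        p_end = heavy_ball(p0, v0, grad, **kwargs)
        if J(p_end) < best_J:
            best_p, best_J = p_end, J(p_end)
    return best_p, best_J


p_star = np.array([1.0, -2.0])                        # hypothetical minimizer
J = lambda p: float(np.sum((p - p_star) ** 2))        # J_m = 0 is attained at p_star
grad = lambda p: 2.0 * (p - p_star)
best_p, best_J = shooting_search(J, grad, [0.0, 0.0], [np.zeros(2), np.array([1.0, 0.0])])
```

The damping term p' makes E = |p'|²/2 + J non-increasing along the continuous trajectory. Different initial velocities let the trajectory escape from different basins in multimodal problems.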
This algorithm requires the sensitivity of the functional with respect to the independent variables p. An interesting feature of the present low-cost modelling is that gradients are also available at very low computational cost. Indeed, sensitivity evaluation for large-dimension minimization problems is not an easy task. The most efficient approach is to use an adjoint variable, with the difficulty that it requires the development of dedicated software. Automatic differentiation brings some simplification but does not avoid the main difficulty of storing intermediate states, even though check-pointing techniques bring some relief [Gri01, CFG+01]. By simplifying the solution of the transport problem, the present approach also addresses this issue.
4 Numerical Results
The low-complexity transport model is applied to several flow conditions. Typical fields of 0.01–0.1 km² have been considered in a region of 400 km². Rows are spaced by about 1.5 m. The source of the treatment moves at a speed of around 1 m/s, and the injection velocity is taken as 7 to 10 m/s for a typical treatment of 100 kg/km². Single- and multi-source situations (Figs. 3 and 4) are considered. Examples of the constructed flow field are shown together with the wind measurement points assimilated by the model (Figs. 1 and 4). The transport-based and Euclidean distances are shown for a given point in Fig. 2. The impact of ground variations on the advected species is shown in Fig. 5. An example of a source identification problem is shown in Fig. 6.
Fig. 1. Typical trajectory of the vehicle in a culture of 10000 m2 and the location
of this field in a calculation domain of 400 km2 . Wind measurements based on two
points have been reported together with the constructed divergence free flow field
at z = H ∼ 3 m.
Fig. 2. Examples of symmetric Euclidean and non-symmetric travel time based
distances.
5 Concluding Remarks
A low-complexity model has been presented for the prediction of passive scalar
dispersion in atmospheric flows for environmental and agricultural applications. The solution search space has been reduced using a priori physical information. A non-symmetric metric based on migration times has been used
to generalize injection and plume similitude solutions in the context of variable flow fields. Data assimilation has been used to define the flow field and
the parameters in the dispersion model. Sensitivity analysis has been used
Fig. 3. Generalized similitude solution (right) for a 2-point based wind (similar to
Fig. 1) compared to a direct simulation with a PDE based transport-diffusion model
for the same wind. The similitude solution has been evaluated on all the nodes of
the finite element mesh for comparison.
Fig. 4. Regions affected by the treatment of two sources. The flow field has been built from the three measurement points indicated on the picture.
Fig. 5. Left: a typical digital terrain model (x and y coordinates range over 2 km).
Dispersion in a uniform north wind with (middle) and without (right) the ground
model (Section 2.2).
Fig. 6. Left: constructed flow field. Middle: dispersion from a vineyard. Right: sensitivity analysis for a dispersion detected in the lower left corner, which indicates possible origins of the pollution.
together with this low-complexity modelling to address robustness issues in the prediction. In addition to the data assimilation inverse problem, inverse source reconstruction has been considered, as it is a natural demand in environmental surveillance. Current work concerns the introduction of stochastic analysis in the present model to produce regional parametric risk maps using Monte Carlo simulations, which become achievable thanks to the low computational cost of the approach.
Acknowledgement. This contribution is dedicated to Professor O. Pironneau for his 60th birthday. It was carried out for Cemagref at Montpellier, France. The authors would like to thank V. Bellon-Maurel, B. Bonicelli, B. Ruelle and C. Sinfort for their kindness and valuable comments. Thanks also to S. Labbe from Cemagref/Teledetection for making DTM models available to us.
References
[AGFM02] F. Alauzet, P.-L. George, P. Frey, and B. Mohammadi. Transient fixed
point based unstructured mesh adaptation. Internat. J. Numer. Methods Fluids, 43(6):729–745, 2002.
[Arc06] ArcGIS. Geographic information system, 2006. http://www.esri.com/software/arcgis.
[BGM97] H. Borouchaki, P.-L. George, and B. Mohammadi. Delaunay mesh generation governed by metric specifications. Finite Elements in Analysis and Design, 2:85–109, 1997.
[Bru06] J. M. Brun. Modélisation à complexité réduite de la dérive. PhD thesis, University of Montpellier, 2006.
[CFG+ 01] G. Corliss, C. Faure, A. Griewank, L. Hascoet, and U. Naumann, editors. Automatic differentiation of algorithms: From simulation to optimization. Number 50 in Lect. Notes Comput. Sci. Eng. Springer, Berlin,
2001. Selected papers from the AD2000 Conference, Nice, France, June
2000.
[Cia78] Ph. Ciarlet. The finite element method for elliptic problems. North-Holland, 1978.
[Cou89] J. Cousteix. Turbulence et couche limite. Cepadues publishers, 1989.
[Fin00] J. Finnigan. Turbulence in plant canopies. Annu. Rev. Fluid Mech., 32:519–571, 2000.
[Gri01] A. Griewank. Computational differentiation. Springer, New York, 2001.
[HM97] F. Hecht and B. Mohammadi. Mesh adaptation by metric control for multi-scale phenomena and turbulence. AIAA paper 1997-0859, 1997.
[IMSH06] B. Ivorra, D. E. Hertzog, B. Mohammadi, and J. G. Santiago. Semi-deterministic and genetic algorithms for global optimization of microfluidic protein-folding devices. Internat. J. Numer. Methods Engrg., 66(2):319–333, 2006.
[Ivo06] B. Ivorra. Semi-deterministic global optimization. PhD thesis, University of Montpellier, 2006.
[MP94] B. Mohammadi and O. Pironneau. Analysis of the k-epsilon turbulence model. Wiley, 1994.
[MP01] B. Mohammadi and O. Pironneau. Applied shape optimization for fluids. Oxford University Press, 2001.
[MP06] B. Mohammadi and G. Puigt. Wall functions in computational fluid dynamics. Comput. & Fluids, 40(3):2101–2124, 2006.
[MS03] B. Mohammadi and J. H. Saiac. Pratique de la simulation numérique. Dunod, Paris, 2003.
[RT81] M. R. Raupach and A. S. Thom. Turbulence in and above plant canopies. Annu. Rev. Fluid Mech., 13:97–129, 1981.
[Sim97] J. Simpson. Gravity currents in the environment and laboratory. Cambridge University Press, 2nd edition, 1997.
[Sum71] B. Sumner. A modeling study of several aspects of canopy flow. Monthly Weather Review, 99(6):485–493, 1971.
[VP05] K. Veroy and A. Patera. Certified real-time solution of the parametrized steady incompressible Navier–Stokes equations: Rigorous reduced-basis a posteriori error bounds. Internat. J. Numer. Methods Fluids, 47(2):773–788, 2005.
Calibration of Lévy Processes with American
Options
Yves Achdou¹
UFR Mathématiques, Université Paris 7, Case 7012, FR-75251 PARIS Cedex 05,
France and Laboratoire Jacques-Louis Lions, Université Paris 6, France
achdou@math.jussieu.fr
Summary. We study options on financial assets whose discounted prices are exponentials of Lévy processes. The price of an American vanilla option as a function of the maturity and the strike satisfies a linear complementarity problem involving a non-local partial integro-differential operator. It leads to a variational inequality in a suitable weighted Sobolev space. Calibrating the Lévy process may be done by solving an inverse least-squares problem where the state variable satisfies the previously mentioned variational inequality. We first assume that the volatility is positive: after carefully studying the direct problem, we propose necessary optimality conditions for the least-squares inverse problem. We also consider the direct problem when the volatility is zero.
1 Introduction
The Black–Scholes model [BS73, Mer73] is a continuous-time model involving a risky asset (the underlying asset), whose price at time τ is S_τ, and a risk-free asset, whose price at time τ is S_τ^0 = e^{rτ}, r ≥ 0. It assumes that the price of the risky asset satisfies the following stochastic differential equation:
$$dS_\tau = S_\tau\,(r\,d\tau + \sigma\,dW_\tau), \qquad (1)$$
where W_τ is a standard Brownian motion on the probability space (Ω, A, P^*) (the probability P^* is called the risk-neutral probability).
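Under P^*, the solution of (1) is S_τ = S_0 exp((r − σ²/2)τ + σW_τ), so the discounted price e^{−rτ}S_τ has expectation S_0. The Python sketch below samples S_τ exactly from this formula and checks the property by Monte Carlo. The parameter values are arbitrary.

```python
import numpy as np


def sample_gbm(S0, r, sigma, tau, n_paths, seed=0):
    """Exact samples of S_tau solving dS = S (r dtau + sigma dW) under P*."""
    rng = np.random.default_rng(seed)
    W_tau = np.sqrt(tau) * rng.standard_normal(n_paths)      # W_tau ~ N(0, tau)
    return S0 * np.exp((r - 0.5 * sigma ** 2) * tau + sigma * W_tau)


S0, r, sigma, tau = 100.0, 0.05, 0.2, 1.0
S_tau = sample_gbm(S0, r, sigma, tau, n_paths=200_000)
discounted_mean = np.exp(-r * tau) * S_tau.mean()   # Monte Carlo estimate of E*(e^{-r tau} S_tau)
```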
An American vanilla call (resp. put) option on the risky asset is a contract
giving its owner the right to buy (resp. sell) a share at a fixed price x at
any time before a maturity date t. The price x is called the strike. Exercising the option yields a payoff P◦ (S) = (S − x)+ (resp. P◦ (S) = (S − x)− )
for the call (resp. put) option, when the price of the underlying asset is S.
¹
I wish to dedicate this work to O. Pironneau with all my friendship. I have been
working with Olivier for almost fifteen years now, and for me, it has always been
an exciting intellectual and human experience.
260
Y. Achdou
European options are similar contracts, except that they can be exercised only
at maturity t.
Consider an American option with payoff P◦ and maturity t. Under the assumptions that the market is complete and rules arbitrage out, Black–Scholes’
theory predicts that the price of this option at time τ is
$$P_\tau = \sup_{s\in\mathcal{T}_{\tau,t}} E^*\Big(e^{-r(s-\tau)}\,P_\circ(S_s)\,\Big|\,\mathcal{F}_\tau\Big), \qquad (2)$$
where T_{τ,t} denotes the set of stopping times in [τ, t] (see [LL97] for the proof of this formula). It can also be proved, see, e.g., [BL84, JLL90], that P_τ = P(τ, S_τ), where the two-variable function P is found by solving a parabolic linear complementarity problem:
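$$\frac{\partial P}{\partial\tau}+\frac{\sigma^2S^2}{2}\frac{\partial^2P}{\partial S^2}+rS\frac{\partial P}{\partial S}-rP\le 0,\qquad P(\tau,S)\ge P_\circ(S),\qquad \tau\in[0,t),\ S>0,$$
$$\Big(\frac{\partial P}{\partial\tau}+\frac{\sigma^2S^2}{2}\frac{\partial^2P}{\partial S^2}+rS\frac{\partial P}{\partial S}-rP\Big)\big(P-P_\circ(S)\big)=0,\qquad \tau\in[0,t),\ S>0,$$
$$P(\tau=t,S)=P_\circ(S). \qquad (3)$$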
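The complementarity problem (3) has a classical discrete counterpart, the Cox–Ross–Rubinstein (CRR) binomial tree. Each backward step of the tree takes the maximum of the discounted continuation value and the payoff, which enforces P ≥ P◦ with equality in the exercise region. This swapped-in technique serves only as an illustration and is not the variational approach of [BL84, JLL90]. The market parameters are hypothetical.

```python
import math

import numpy as np


def crr_put(S0, K, r, sigma, T, n=500, american=True):
    """CRR binomial price of a vanilla put; american=True adds the early-exercise max."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)        # risk-neutral probability of an up move
    disc = math.exp(-r * dt)
    j = np.arange(n + 1)
    S = S0 * u ** j * d ** (n - j)              # prices at maturity after j up moves
    V = np.maximum(K - S, 0.0)                  # terminal condition P(t, S) = P_o(S)
    for _ in range(n):
        S = S[1:] / u                           # prices one time step earlier
        V = disc * (q * V[1:] + (1.0 - q) * V[:-1])   # discounted continuation value
        if american:
            V = np.maximum(V, K - S)            # obstacle constraint P >= P_o
    return float(V[0])


p_am = crr_put(100.0, 100.0, 0.05, 0.2, 1.0)
p_eu = crr_put(100.0, 100.0, 0.05, 0.2, 1.0, american=False)
```

The American price exceeds the European one because of the early-exercise right. Both prices are also positively homogeneous: scaling S and the strike by the same factor scales the price by that factor.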
The critical parameter in the Black–Scholes model is the volatility σ. Unfortunately, taking σ to be constant and using (2) or (3) often leads to poor
predictions of the prices of the options which are available on the markets.
One possible fix is to assume that the process driving St is a more general
Lévy process: Lévy processes are processes with stationary and independent
increments which are continuous in probability, see, for example, the book by
Cont and Tankov [CT04] and the references therein.
For a Lévy process X_τ on a filtered probability space with probability P^*, the Lévy–Khintchine formula states that there exists a function χ : ℝ → ℂ such that
$$E^*\big(e^{iuX_\tau}\big) = e^{-\tau\chi(u)}, \qquad (4)$$
$$\chi(u) = \frac{\sigma^2u^2}{2} - i\beta u - \int_{|z|<1}\big(e^{iuz}-1-iuz\big)\,\nu(dz) - \int_{|z|>1}\big(e^{iuz}-1\big)\,\nu(dz), \qquad (5)$$
for σ ≥ 0, β ∈ ℝ and a positive measure ν on ℝ \ {0} such that
$$\int_{\mathbb{R}}\min(1, z^2)\,\nu(dz) < +\infty.$$
The measure ν is called the Lévy measure of X.
We assume that, under P^*, the discounted price of the risky asset is a martingale and that it is represented as the exponential of a Lévy process:
$$e^{-r\tau}S_\tau = S_0\,e^{X_\tau}.$$
Calibration of Lévy Processes with American Options
261
The fact that the discounted price is a martingale is equivalent to E^*(e^{X_τ}) = 1, i.e.
$$\int_{|z|>1} e^z\,\nu(dz) < \infty \quad\text{and}\quad \beta = -\frac{\sigma^2}{2} - \int_{\mathbb{R}}\big(e^z - 1 - z\,1_{|z|\le 1}\big)\,\nu(dz).$$
We will also assume that $\int_{|z|>1} e^{2z}\,\nu(dz) < \infty$, so that the discounted price is a square-integrable martingale.
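The martingale condition is equivalent to χ(−i) = 0 in (5). The Python sketch below checks this numerically for a hypothetical Lévy density k(z) of Merton type (Gaussian jumps). It computes β from the formula above by trapezoidal quadrature and evaluates χ with the same quadrature. The jump intensity, mean, width and the truncation of z to [−3, 3] are assumptions of the example. The two integrals of (5) are merged into one integrand with the indicator 1_{|z|≤1}, which changes nothing because the set |z| = 1 has measure zero.

```python
import numpy as np

# Quadrature grid for the jump variable z (the density below is negligible outside [-3, 3]).
Z_GRID = np.linspace(-3.0, 3.0, 20001)
INSIDE = np.abs(Z_GRID) <= 1.0          # indicator 1_{|z|<=1}


def trapezoid(values, x=Z_GRID):
    """Composite trapezoidal rule on the grid x."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x)))


def merton_density(z, lam=0.5, mu=-0.1, delta=0.15):
    """Hypothetical Lévy density k(z): Gaussian jumps with intensity lam."""
    return lam * np.exp(-(z - mu) ** 2 / (2.0 * delta ** 2)) / (np.sqrt(2.0 * np.pi) * delta)


def martingale_beta(sigma, k):
    """beta = -sigma^2/2 - int (e^z - 1 - z 1_{|z|<=1}) k(z) dz."""
    return -0.5 * sigma ** 2 - trapezoid((np.exp(Z_GRID) - 1.0 - Z_GRID * INSIDE) * k(Z_GRID))


def chi(u, sigma, beta, k):
    """Characteristic exponent (5); u may be complex."""
    integrand = (np.exp(1j * u * Z_GRID) - 1.0 - 1j * u * Z_GRID * INSIDE) * k(Z_GRID)
    jumps = trapezoid(integrand.real) + 1j * trapezoid(integrand.imag)
    return 0.5 * sigma ** 2 * u ** 2 - 1j * beta * u - jumps


sigma = 0.2
beta = martingale_beta(sigma, merton_density)
```

For real u, Re χ(u) ≥ 0. This is consistent with |E^*(e^{iuX_τ})| ≤ 1.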
We denote by B the integral operator
$$(Bv)(S) = \int_{\mathbb{R}}\Big(v(Se^z) - v(S) - S(e^z-1)\frac{\partial v}{\partial S}(S)\Big)\,\nu(dz).$$
Consider an American option with payoff P_◦ and maturity t. In [BL84], Bensoussan and Lions assumed σ > 0 and studied the variational inequality stemming from the complementarity problem P(t, S) = P_◦(S) and, for τ < t and S > 0,
$$\frac{\partial P}{\partial\tau}(\tau,S)+\frac{\sigma^2S^2}{2}\frac{\partial^2P}{\partial S^2}(\tau,S)+rS\frac{\partial P}{\partial S}(\tau,S)-rP(\tau,S)+(BP)(\tau,S)\le 0, \qquad (6)$$
$$P(\tau,S)\ge P_\circ(S), \qquad (7)$$
$$\Big(\frac{\partial P}{\partial\tau}(\tau,S)+\frac{\sigma^2S^2}{2}\frac{\partial^2P}{\partial S^2}(\tau,S)+rS\frac{\partial P}{\partial S}(\tau,S)-rP(\tau,S)+(BP)(\tau,S)\Big)\big(P(\tau,S)-P_\circ(S)\big)=0, \qquad (8)$$
in suitable Sobolev spaces with decaying weights near +∞ and 0. They
proved that the price of the American option is Pτ = P (τ, Sτ ). Other approaches with viscosity solutions are possible, see [Pha98], especially in the
case σ = 0. One advantage of the variational methods is that they provide
stability estimates. For numerical methods for options on Lévy driven assets,
see [MvPS04, MSW04, MNS03, AP05a, CV04, CV03].
In what follows, we assume that the Lévy measure has a density, ν(dz) =
k(z)dz. The main goal of the present work is to study a least-square method
for calibrating the volatility σ and the jump density k in order to recover the
prices of a family of American options available on the market.
We shall focus on a family of vanilla put options indexed by i ∈ I, with maturities $t_i$ and strikes $x_i$. One observes the price S◦ of the risky asset and the prices $(\bar P_i)_{i\in I}$ of the above-mentioned family of options. We call T the maximal maturity: $T = \max_{i\in I} t_i$.
The first idea is to try to minimize the functional
$$(\sigma, k) \mapsto \sum_{i\in I} \omega_i \big|\bar P_i - P_i(0, S_\circ)\big|^2 + J_R(\sigma, k)$$
for k and σ in a suitable set, where
• $\omega_i$ are positive weights,
• $J_R$ is a suitable regularizing functional,
• the prices $P_i(0, S_\circ)$ are computed by solving problem (6)–(8), with $t = t_i$ and $P_\circ(S) = (x_i - S)_+$.
Y. Achdou
Evaluating the functional requires solving #I variational inequalities. This approach was chosen in [Ach05, AP05b] for calibrating models of local volatility
(i.e. the volatility is a function of t and S) with American options.
In the present case, it is possible to choose a better approach: we call
(τ, S) → P (τ, S, t, x) the pricing function for the vanilla American put with
maturity t and strike x. Hereafter, we use the notation
$$P_\circ(x) = (x - S)_+ . \tag{9}$$
It can be seen that the solution of (6)–(8) is of the form P(τ, S, t, x) = x g(ξ, y), with y = S/x ∈ R+ and ξ = t − τ ∈ (0, t], where g is the solution of a complementarity problem independent of x, easily deduced from (6)–(8). For brevity, we do not write this problem. From this observation, easy calculations show that, as a function of t and x, P(0, S, t, x) satisfies the following forward problem: P(t = 0) = P◦ and, for t ∈ (0, T] and x > 0,
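$$\frac{\partial P}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 P}{\partial x^2} + rx\frac{\partial P}{\partial x} + BP \ge 0, \tag{10}$$
$$P(t,x) \ge P_\circ(x), \tag{11}$$
$$\Big(\frac{\partial P}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 P}{\partial x^2} + rx\frac{\partial P}{\partial x} + BP\Big)(P - P_\circ) = 0, \tag{12}$$
where the integral operator B is defined by
$$(Bu)(x) = -\int_{z\in\mathbb{R}} k(z)\Big( x(e^{z} - 1)\frac{\partial u}{\partial x}(x) + e^{z}\big(u(xe^{-z}) - u(x)\big)\Big)\,dz. \tag{13}$$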
The problem (10)–(12) can also be obtained by probabilistic arguments.
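The scaling behind the form P(τ, S, t, x) = x g(ξ, y) can be spelled out as follows. This is a sketch that assumes uniqueness for (6)–(8) and uses only that σ, r and ν depend neither on S nor on τ; g(ξ, y) := P(0, y, ξ, 1) is one concrete realization of the function g mentioned above.

```latex
% Fix \lambda > 0 and set Q(\tau,S) := \lambda P(\tau, S/\lambda, t, x).
% The operators S\partial_S, S^2\partial_{SS} and B are invariant under the dilation S \mapsto S/\lambda,
% so Q satisfies (6)-(8) with the payoff
\lambda\,(x - S/\lambda)_+ = (\lambda x - S)_+ ,
\qquad\text{hence}\qquad
P(\tau, \lambda S, t, \lambda x) = \lambda\, P(\tau, S, t, x).
% Choosing \lambda = 1/x and using that P depends on (\tau, t) only through t - \tau:
P(\tau, S, t, x) = x\, P(\tau, S/x, t, 1) = x\, g(t-\tau,\, S/x),
\qquad g(\xi, y) := P(0, y, \xi, 1).
```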
The new approach for calibrating the Lévy process is to minimize the functional
$$(\sigma, k) \mapsto \sum_{i\in I} \omega_i \big|\bar P_i - P(t_i, x_i)\big|^2 + J_R(\sigma, k)$$
for σ and k in a suitable set, where the prices $P(t_i, x_i)$ are computed by solving (10)–(12), with $P_\circ(x) = (x - S_\circ)_+$. In contrast with the previous approach, evaluating the functional requires solving one variational inequality only.
Such a forward problem is reminiscent of the forward equation which is
often used for the calibration of the local volatility with vanilla European options. This equation is known as Dupire’s equation in the finance community,
see [Dup97, AP05a]. Note that the arguments used to obtain (10)–(12) are
easier than those used for deriving Dupire’s equation, because the operator in
(6)–(8) is invariant under any change of variable S → λS, λ > 0, which is not the
case with local volatility. Note also that finding a forward linear complementarity problem in the variables t and x is not possible in the case of American
options with local volatility.
Calibration of σ and k is an inverse problem for finding the coefficients
of a variational inequality involving a partial integro-differential operator.
The main goal of the paper is to study the latter least-squares optimization problem theoretically, for a special parameterization of k (see (25) below), with σ bounded away from 0, and to give necessary optimality conditions. The results presented here have their discrete counterparts when the variational inequalities are discretized with finite elements or finite differences. Numerical results will be presented in a forthcoming paper.
2 Preliminary Results
2.1 Change of Unknown Function in the Forward Problem
It is helpful to change the unknown function: we set
$$u_\circ(x) = (S - x)_+, \qquad u(t,x) = P(t,x) - x + S. \tag{14}$$
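The function u satisfies: for t ∈ (0, T] and x > 0,
$$\frac{\partial u}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 u}{\partial x^2} + rx\frac{\partial u}{\partial x} + Bu \ge -rx, \tag{15}$$
$$u(t,x) \ge u_\circ(x), \tag{16}$$
$$\Big(\frac{\partial u}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 u}{\partial x^2} + rx\frac{\partial u}{\partial x} + Bu + rx\Big)(u - u_\circ) = 0. \tag{17}$$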
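The source term −rx in (15) can be traced to the following computation, which uses only (13) and (14): B annihilates linear functions and constants, so the shift x − S only contributes through the drift term.

```latex
% From (14): P = u + x - S and P_\circ = u_\circ + x - S. By (13):
B[x] = -\int_{\mathbb{R}} k(z)\big( x(e^{z}-1) + e^{z}(x e^{-z} - x) \big)\,dz = 0,
\qquad
B[1] = -\int_{\mathbb{R}} k(z)\, e^{z}(1 - 1)\,dz = 0.
% Since \partial_x (x - S) = 1 and \partial_{xx} (x - S) = 0:
\frac{\partial P}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 P}{\partial x^2} + rx\frac{\partial P}{\partial x} + BP
= \frac{\partial u}{\partial t} - \frac{\sigma^2 x^2}{2}\frac{\partial^2 u}{\partial x^2} + rx\frac{\partial u}{\partial x} + Bu + rx,
\qquad P - P_\circ = u - u_\circ ,
% so (10)-(12) become (15)-(17).
```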
The initial condition for u is
$$u(t=0, x) = u_\circ(x), \qquad x > 0. \tag{18}$$
For writing the variational inequalities stemming from (15)–(18), we need
to introduce suitable weighted Sobolev spaces. In particular, fractional order
weighted Sobolev spaces will be useful for studying the non-local part of the
operator.
2.2 Functional Setting
Sobolev Spaces on R
For a real number s, let the Sobolev space $H^s(\mathbb{R})$ be defined as follows: the distribution w defined on R belongs to $H^s(\mathbb{R})$ if and only if its Fourier transform ŵ satisfies
$$\int_{\mathbb{R}} (1+\xi^2)^s\, |\hat w(\xi)|^2\, d\xi < +\infty.$$
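The spaces $H^s(\mathbb{R})$ are Hilbert spaces, with the inner product and norm
$$(w_1, w_2)_{H^s(\mathbb{R})} = \int_{\mathbb{R}} (1+\xi^2)^s\, \hat w_1(\xi)\,\overline{\hat w_2(\xi)}\, d\xi, \qquad \|w\|_{H^s(\mathbb{R})} = \sqrt{(w,w)_{H^s(\mathbb{R})}}.$$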
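As a small numerical illustration of this definition, the sketch below evaluates ‖w‖²_{H^s(R)} for the Gaussian w(y) = e^{−y²/2}. The text does not fix the normalization of the Fourier transform; the code assumes ŵ(ξ) = ∫ w(y) e^{−iyξ} dy, for which ŵ(ξ) = √(2π) e^{−ξ²/2}, so the exact values for s = 0 and s = 1 are 2π√π and 3π√π.

```python
import math


def hs_norm_sq_gaussian(s, xi_max=12.0, n=24000):
    """Squared H^s norm of w(y) = exp(-y^2 / 2), by the trapezoidal rule on [-xi_max, xi_max].

    Assumes w_hat(xi) = sqrt(2 pi) exp(-xi^2 / 2), i.e. the convention
    w_hat(xi) = integral of w(y) exp(-i y xi) dy.
    """
    h = 2.0 * xi_max / n
    total = 0.0
    for j in range(n + 1):
        xi = -xi_max + j * h
        weight = 0.5 if j in (0, n) else 1.0
        w_hat_sq = 2.0 * math.pi * math.exp(-xi * xi)  # |w_hat(xi)|^2
        total += weight * (1.0 + xi * xi) ** s * w_hat_sq
    return h * total


norm0 = hs_norm_sq_gaussian(0.0)     # exact: 2 pi sqrt(pi)
norm_half = hs_norm_sq_gaussian(0.5)
norm1 = hs_norm_sq_gaussian(1.0)     # exact: 3 pi sqrt(pi)
```

Since 1 + ξ² ≥ 1, the norms increase with s, which is the inclusion H^{s'} ⊂ H^s for s' ≥ s.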
We refer to [Ada75] for the properties of the spaces $H^s(\mathbb{R})$. If s is a nonnegative integer, we define the semi-norm
$$|v|_{H^s(\mathbb{R})} = \Big(\sum_{\ell=1}^{s} \Big\|\frac{d^\ell v}{dy^\ell}\Big\|^2_{L^2(\mathbb{R})}\Big)^{1/2}.$$
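If s > 0 is not an integer, we define $|v|_{H^s(\mathbb{R})}$ by
$$|v|^2_{H^s(\mathbb{R})} = \sum_{\ell=1}^{m}\Big\|\frac{d^\ell v}{dy^\ell}\Big\|^2_{L^2(\mathbb{R})} + \int_{\mathbb{R}}\int_{\mathbb{R}} \frac{\Big|\frac{d^m v}{dy^m}(y) - \frac{d^m v}{dy^m}(z)\Big|^2}{|y-z|^{1+2(s-m)}}\,dy\,dz,$$
where m is the integer part of s.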
Some Weighted Sobolev Spaces on R+
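Let $L^2(\mathbb{R}_+)$ be the Hilbert space of square integrable functions on R+, endowed with the norm $\|v\|_{L^2(\mathbb{R}_+)} = \big(\int_{\mathbb{R}_+} v(x)^2\,dx\big)^{1/2}$ and the inner product $(v,w)_{L^2(\mathbb{R}_+)} = \int_{\mathbb{R}_+} v(x)w(x)\,dx$. Let $V^1$ be the weighted Sobolev space
$$V^1 = \Big\{ v \in L^2(\mathbb{R}_+),\ x\frac{\partial v}{\partial x} \in L^2(\mathbb{R}_+) \Big\}, \tag{19}$$
which is a Hilbert space with the norm
$$\|v\|_{V^1} = \Big( \|v\|^2_{L^2(\mathbb{R}_+)} + \Big\| x\frac{\partial v}{\partial x}\Big\|^2_{L^2(\mathbb{R}_+)} \Big)^{1/2}. \tag{20}$$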
It is proved in [AP05a] that $\mathcal{D}(\mathbb{R}_+)$ is a dense subspace of $V^1$, and that the following Poincaré inequality is true: for all $v \in V^1$,
$$\|v\|_{L^2(\mathbb{R}_+)} \le 2\,\Big\| x\frac{dv}{dx}\Big\|_{L^2(\mathbb{R}_+)}. \tag{21}$$
Thus the semi-norm $|v|_{V^1} = \big\|x\,\frac{dv}{dx}\big\|_{L^2(\mathbb{R}_+)}$ is a norm equivalent to $\|\cdot\|_{V^1}$.
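For completeness, here is a one-line argument for (21) on the dense subspace D(R+). It is a standard sketch, not necessarily the proof given in [AP05a]: integrate by parts and apply the Cauchy–Schwarz inequality.

```latex
% For v in D(R_+), the boundary term x v(x)^2 vanishes at 0 and at +infinity, so
\|v\|_{L^2(\mathbb{R}_+)}^2 = \int_0^{\infty} v(x)^2\,dx
 = -\int_0^{\infty} 2x\,v(x)\,\frac{dv}{dx}(x)\,dx
 \le 2\,\|v\|_{L^2(\mathbb{R}_+)}\,\Big\|x\frac{dv}{dx}\Big\|_{L^2(\mathbb{R}_+)} ;
% dividing by \|v\|_{L^2(R_+)} gives (21), which extends to V^1 by density.
```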
For a function v defined on R+, call ṽ the function defined on R by
$$\tilde v(y) = v(\exp(y))\,\exp\Big(\frac{y}{2}\Big). \tag{22}$$
By using the change of variable y = log(x), it can be seen that the mapping v ↦ ṽ is a topological isomorphism from $L^2(\mathbb{R}_+)$ onto $L^2(\mathbb{R})$, and from $V^1$ onto $H^1(\mathbb{R})$. This leads to defining the space $V^s$, for s ∈ R, by
$$V^s = \{ v : \tilde v \in H^s(\mathbb{R}) \}, \tag{23}$$
which is a Hilbert space with the norm $\|v\|_{V^s} = \|\tilde v\|_{H^s(\mathbb{R})}$. Using the interpolation theorem given, e.g., in [Ada75, Theorem 7.17], one can prove that if 0 < s < 1, then $V^s$ can be obtained by real interpolation between the spaces $V^1$ and $L^2(\mathbb{R}_+)$ (the parameter for the real interpolation is ν = 1/2 − s), and that the norm obtained by the interpolation process is equivalent to the one defined above. For s > 0, the space $V^{-s}$ is the topological dual of $V^s$. For s > 0, we introduce the semi-norm $|v|_{V^s} = |\tilde v|_{H^s(\mathbb{R})}$.
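A quick numerical check of the change of variable (22), using the test function v(x) = e^{−x} (our choice, for illustration). The map preserves the L² norm exactly, ∫ṽ² dy = ∫v² dx = 1/2. For the derivative, integration by parts (valid when x v² → 0 at 0 and +∞) gives ‖dṽ/dy‖² = ‖x v′‖² − ‖v‖²/4, here 1/4 − 1/8 = 1/8. Hence ‖ṽ‖² + ‖dṽ/dy‖² = (3/4)‖v‖² + ‖x v′‖², which lies between (3/4)‖v‖²_{V¹} and ‖v‖²_{V¹}.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for f on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + j * h) for j in range(1, n)))


def v_tilde(y):
    """Transform (22) of v(x) = exp(-x): v~(y) = v(e^y) e^{y/2}."""
    x = math.exp(y)
    return math.exp(-x) * math.exp(0.5 * y)


def dv_tilde(y):
    """d v~/dy = e^{y/2} (x v'(x) + v(x)/2), with x = e^y and v'(x) = -exp(-x)."""
    x = math.exp(y)
    return math.exp(0.5 * y) * (-x * math.exp(-x) + 0.5 * math.exp(-x))


# R is truncated to [-40, 5]; both integrands are negligible outside.
l2_tilde = trapezoid(lambda y: v_tilde(y) ** 2, -40.0, 5.0, 45000)       # expected 1/2
h1_semi_tilde = trapezoid(lambda y: dv_tilde(y) ** 2, -40.0, 5.0, 45000)  # expected 1/8
```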
Proposition 1. Let s be a real number such that 1/2 < s ≤ 1. Then every $v \in V^s$ is continuous on (0, +∞), and there exists a constant C > 0 such that, for all $v \in V^s$ and all x ∈ [1, +∞),
$$\sqrt{x}\,|v(x)| \le C\,\|v\|_{V^s}. \tag{24}$$
2.3 The Integro-Differential Operator
The Integral Operator
We study the integral operator B defined in (13). Let ψ be a measurable, non-negative and essentially bounded function defined on R, and α be a real number, 0 ≤ α < 1. Consider the kernel
$$k(z) = \frac{\psi(z)}{|z|^{1+2\alpha}}. \tag{25}$$
We assume that $z \mapsto \psi(z)\max(e^{2z}, 1)$ is a bounded function. If α = 0, assume furthermore that $\int_{-\infty}^{-1} \frac{\psi(z)}{|z|}\,dz < +\infty$. Note that, for B defined in (13), Bu is well defined if, for example, $u \in \mathcal{D}(\mathbb{R}_+)$.
Remark 1. To avoid ambiguities in the definition of k, we assume in most of what follows that there exists a positive constant $\underline{\psi}$ such that $\psi(z) \ge \underline{\psi} > 0$ in a fixed neighborhood of z = 0. This assumption is a little restrictive since, for example, a logarithmic singularity of k is ruled out. Most of the results below hold without this last assumption on ψ.
Proposition 2. Assume that $z \mapsto \psi(z)\max(e^{2z}, 1)$ is a bounded function. If α = 0, assume furthermore that $\int_{-\infty}^{-1} \frac{\psi(z)}{|z|}\,dz < +\infty$. Then, for each s ∈ R,
(i) if α > 1/2, then the operator B is continuous from $V^s$ to $V^{s-2\alpha}$,
(ii) if α < 1/2, then the operator B is continuous from $V^s$ to $V^{s-1}$,
(iii) if α = 1/2, then the operator B is continuous from $V^s$ to $V^{s-1-\varepsilon}$, for any ε > 0.
Remark 2. As a consequence of Proposition 2, if 1/2 < α < 1, then the operator B is continuous from $V^\alpha$ to $V^{-\alpha}$.
Proposition 3. If the assumptions of Proposition 2 are satisfied and if 1/2 < α < 1, then for any $u, v \in V^\alpha$,
$$\langle Bu, v\rangle + \langle Bv, u\rangle = \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}\big(u(x) - u(xe^{-z})\big)\big(v(x) - v(xe^{-z})\big)\,dx\,dz + \int_{\mathbb{R}} k(z)\big(2e^{z} - e^{2z} - 1\big)\,dz \int_{\mathbb{R}_+} u(x)v(x)\,dx, \tag{26}$$
where ⟨·,·⟩ stands for the duality pairing between $V^{-\alpha}$ and $V^\alpha$.
If 0 ≤ α ≤ 1/2, then (26) is true for $u, v \in V^s$, s > 1/2, defining ⟨·,·⟩ as the duality pairing between $V^{-s}$ and $V^s$.
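Setting v = u in (26) exposes the structure behind the Gårding inequality stated next: a non-negative non-local energy minus a multiple of ‖u‖². This is an algebraic consequence of (26) alone. Bounding the energy term from below by C|u|²_{V^α} is presumably where the lower bound on ψ near 0 assumed in Proposition 4 enters.

```latex
% Take v = u in (26) and use 2e^{z} - e^{2z} - 1 = -(e^{z} - 1)^2:
\langle Bu, u\rangle
 = \frac12 \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)\,e^{z}\,\big(u(x) - u(xe^{-z})\big)^2\,dx\,dz
 \;-\; \frac12 \Big(\int_{\mathbb{R}} k(z)\,(e^{z}-1)^2\,dz\Big)\,\|u\|^2_{L^2(\mathbb{R}_+)} .
```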
Proposition 4 (Gårding inequality). If the assumptions of Proposition 2 are satisfied and if there exists a constant $\underline{\psi}$ such that $\psi \ge \underline{\psi} > 0$ almost everywhere in a neighborhood of 0, then
(i) if 1/2 < α < 1, there exist a positive constant C and a non-negative constant λ such that, for all $v \in V^\alpha$,
$$\langle Bv, v\rangle \ge C|v|^2_{V^\alpha} - \lambda\|v\|^2_{L^2(\mathbb{R}_+)}; \tag{27}$$
(ii) if α ≤ 1/2, then (27) holds for any $v \in V^s$, s > 1/2 (⟨·,·⟩ standing for the duality pairing between $V^{-s}$ and $V^s$).
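Consider the two situations:
1. 1/2 < α < 1, ψ satisfies the assumptions of Proposition 2, and $u \in V^\alpha$; then it can be shown (using the interpolation theorem in [Ada75, Theorem 7.17]) that the functions $u_+$ and $u_-$ belong to $V^\alpha$;
2. α ≤ 1/2, ψ satisfies the assumptions of Proposition 2, and $u \in V^1$.
In both cases, $\int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}u_-(xe^{-z})u_+(x)\,dx\,dz$ is well defined because, using $u_-(x)u_+(x) = 0$,
$$\begin{aligned} \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}u_-(xe^{-z})u_+(x)\,dx\,dz &= \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}\big(u_-(xe^{-z}) - u_-(x)\big)u_+(x)\,dx\,dz \\ &\le \bigg(\int_{\mathbb{R}_+}\Big(\int_{\mathbb{R}} k(z)e^{z}\big(u_-(xe^{-z}) - u_-(x)\big)\,dz\Big)^2 dx\bigg)^{1/2} \|u_+\|_{L^2(\mathbb{R}_+)}, \end{aligned}$$
and it is non-negative. Therefore,
$$\begin{aligned} \langle Bu, u_+\rangle &= \langle Bu_+, u_+\rangle - \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}\big(u(xe^{-z}) - u_+(xe^{-z})\big)u_+(x)\,dx\,dz \\ &= \langle Bu_+, u_+\rangle + \int_{\mathbb{R}_+}\int_{\mathbb{R}} k(z)e^{z}u_-(xe^{-z})u_+(x)\,dx\,dz \;\ge\; \langle Bu_+, u_+\rangle. \end{aligned}$$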
We have proved
Proposition 5. Under the assumptions of Proposition 4, there exist a positive
constant C and a constant λ ≥ 0 such that, for all u ∈ V α if α > 1/2 or for
all u ∈ V 1 if α ≤ 1/2,
$$\langle Bu, u_+\rangle \ge C|u_+|^2_{V^\alpha} - \lambda\|u_+\|^2_{L^2(\mathbb{R}_+)}. \tag{28}$$
A weak maximum principle for parabolic problems stems from Proposition 5.
The Integro-Differential Operator when the Volatility σ is Positive
When σ > 0, the space V 1 plays a special role. Thus, we use the shorter
notation V = V 1 .
With B defined in (13), we introduce the integro-differential operator A:
$$Av = -\frac{\sigma^2 x^2}{2}\frac{\partial^2 v}{\partial x^2} + rx\frac{\partial v}{\partial x} + Bv. \tag{29}$$
If σ > 0, and if (α, ψ) satisfy the assumptions of Proposition 4, then
• A is a continuous operator from V to $V^{-1}$,
• we have the Gårding inequality: there exist c > 0 and λ ≥ 0 such that
$$\langle Av, v\rangle \ge c|v|^2_V - \lambda\|v\|^2_{L^2(\mathbb{R}_+)}, \qquad \forall v \in V, \tag{30}$$
• for any v ∈ V,
$$\langle Av, v_+\rangle \ge c|v_+|^2_V - \lambda\|v_+\|^2_{L^2(\mathbb{R}_+)}, \tag{31}$$
• the operator A + λI is one to one and continuous from $V^2$ onto $L^2(\mathbb{R}_+)$, with a continuous inverse.
Remark 3. Note that the assumption that ψ > 0 near z = 0 is not necessary
for A to have the above properties: indeed, since σ > 0, Gårding’s inequality
holds even if ψ = 0 near 0. The main advantage of this assumption is rather
that it permits a clear identification of the kernel’s singularity at z = 0.
3 The Variational Inequality when the Volatility σ is Positive
We are ready to write the variational inequalities corresponding to the linear
complementarity problem (15)–(18).
We introduce the closed convex subset of V:
$$K = \{ v \in V,\ v(x) \ge u_\circ(x) \text{ in } \mathbb{R}_+ \}. \tag{32}$$
The variational problem consists of finding $u \in L^2(0,T;V) \cap C^0([0,T];L^2(\mathbb{R}_+))$, with $\frac{\partial u}{\partial t} \in L^2(0,T;V')$, such that
1. there exists a constant $X_T > S$ such that u(t, x) = 0 for any t ∈ [0, T], $x \ge X_T$;
2. u(t) ∈ K for almost every t ∈ (0, T);
3. for any v ∈ K with bounded support, for almost every t ∈ (0, T),
$$\Big\langle \frac{\partial u}{\partial t} + Au + rx,\ v - u \Big\rangle \ge 0, \tag{33}$$
where ⟨·,·⟩ stands for the duality pairing between V′ (the dual of V) and V;
4. u(t = 0) = u◦.
Hereafter, this problem will be referred to as (VIP).
3.1 Existence and Uniqueness
Theorem 1. If σ > 0 and under the assumptions of Proposition 4, there exists a unique solution u of problem (VIP) defined above. Furthermore, $u \in C^0([0,T];K) \cap L^2(0,T;V^2)$ and $\frac{\partial u}{\partial t} \in L^2((0,T)\times\mathbb{R}_+)$.
There exists a non-decreasing and lower semi-continuous function $\gamma : (0,T] \to (S, X_T)$ such that, for all t ∈ (0, T), $\{x > 0 \text{ s.t. } u(t,x) = u_\circ(x)\} = [\gamma(t), +\infty)$.
Calling
$$\mu = \frac{\partial u}{\partial t} + Au + rx, \tag{34}$$
we have a.e. $0 \le \mu \le rx\,1_{\{u=0\}} = rx\,1_{\{x\ge\gamma(t)\}}$. The function µ is nondecreasing with respect to x (i.e. the distribution ∂µ/∂x is non-negative) and nonincreasing with respect to t (i.e. the distribution ∂µ/∂t is non-positive). For any $X > X_T$, the total variation of µ in (0, T) × (0, X) is bounded by rX(T + X). Almost everywhere in the coincidence set where u(t, x) = 0, it holds that µ(t, x) > 0.
Proof. The proof is too long to be given here; it is written in [Ach06]. Here, we limit ourselves to listing the main steps. The fact that Problem (15)–(18) is posed in an unbounded domain induces technical difficulties for variational methods. This leads us to first consider an approximate problem posed in a bounded domain. Therefore, the program is to
1. approximate (15)–(18) by a similar problem posed in [0, T] × [0, X], with a homogeneous Dirichlet condition on the boundary x = X, for some given positive parameter X > S, and write the related variational problem, which will be called (VIP$_X$) below;
2. solve first a penalized version of (VIP$_X$). For a function $v \in L^2((0,X))$, we call $E_X(v)$ the function in $L^2(\mathbb{R}_+)$ obtained by extending v by 0 outside (0, X). We introduce the Sobolev space
$$V_X = \{ v \in L^2(0,X),\ E_X(v) \in V \}, \tag{35}$$
with $\|v\|_{V_X} = \|E_X(v)\|_V$. We define the operators $A_X$ and $B_X : V_X \to V_X'$ by
$$\langle A_X v, w\rangle = \langle A E_X(v), E_X(w)\rangle \quad\text{and}\quad \langle B_X v, w\rangle = \langle B E_X(v), E_X(w)\rangle. \tag{36}$$
The penalized problem is to find $u_{X,\varepsilon}$ such that
$$\begin{aligned} &\frac{\partial u_{X,\varepsilon}}{\partial t} + A_X u_{X,\varepsilon} + rx\big(1 - 1_{\{x>S\}}\, \mathcal{V}_\varepsilon(u_{X,\varepsilon})\big) = 0, && t \in (0,T],\ 0 < x < X,\\ &u_{X,\varepsilon}(t=0, x) = u_\circ(x), && 0 < x < X,\\ &u_{X,\varepsilon}(t, X) = 0, && t \in (0,T], \end{aligned} \tag{37}$$
where $\mathcal{V}_\varepsilon(u) = \mathcal{V}(u/\varepsilon)$ and $\mathcal{V}$ is a smooth non-increasing convex function such that
$$\mathcal{V}(0) = 1, \qquad \mathcal{V}(u) = 0 \ \text{for } u \ge 1, \qquad 0 \ge \mathcal{V}'(u) \ge -2 \ \text{for } 0 \le u \le 1. \tag{38}$$
By using the theory of Lions [Lio69] for parabolic problems with semilinear monotone operators, one can prove that (37) has a unique solution, and pass to the limit as the penalty parameter tends to zero; one obtains existence and uniqueness for (VIP$_X$).
3. prove that the free boundary of (VIP$_X$) stays in a bounded domain as X tends to infinity: this shows that, for X large enough, a solution of (VIP$_X$) is actually a solution of (VIP).
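The text leaves the penalty function in (38) unspecified. Below is a hypothetical concrete choice, for illustration only: 𝒱(u) = 1 − 2u for u < 0, (1 − u)² on [0, 1] and 0 for u > 1. It is non-increasing and convex, with 𝒱(0) = 1 and −2 ≤ 𝒱′ ≤ 0 on [0, 1], but it is only C¹, whereas the text asks for a smooth function. The helper `penalty_term` evaluates the zeroth-order term rx(1 − 1_{x>S} 𝒱_ε(u)) of (37), which equals rx when u ≥ ε and vanishes where x > S and u ≤ 0.

```python
def penalty_V(u):
    """Hypothetical C^1 choice of the penalty function in (38)."""
    if u < 0.0:
        return 1.0 - 2.0 * u
    if u <= 1.0:
        return (1.0 - u) ** 2
    return 0.0


def penalty_V_prime(u):
    """Derivative of penalty_V; it is continuous, so penalty_V is C^1."""
    if u < 0.0:
        return -2.0
    if u <= 1.0:
        return -2.0 * (1.0 - u)
    return 0.0


def penalty_term(x, u, S, r, eps):
    """Zeroth-order term r x (1 - 1_{x>S} V_eps(u)) of (37), with V_eps(u) = V(u / eps)."""
    indicator = 1.0 if x > S else 0.0
    return r * x * (1.0 - indicator * penalty_V(u / eps))
```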
Remark 4. By using the theory presented in [BL84], it is possible to study the
variational inequality in Sobolev spaces with decaying weights as x → 0 and
x → +∞ (actually the variable log(x) was used instead of x in [BL84]). In
Theorem 1, we have avoided these weights.
Remark 5. The last statement of Theorem 1 tells us that there is almost everywhere strict complementarity: the reaction term µ is positive at almost every
point where u = 0.
3.2 Bounds and Sensitivity
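In what follows, we aim at obtaining estimates for the solution of (VIP) that are independent of the parameters (σ, α, ψ) when these parameters vary in a suitably defined set. Let us introduce $\mathcal{B} = \{ f : z \mapsto f(z)\max(1, |z|, e^{2z}) \in L^\infty(\mathbb{R}) \}$, endowed with the norm $\|f\|_{\mathcal{B}} = \|f(\cdot)\max(1, |\cdot|, e^{2\cdot})\|_{L^\infty(\mathbb{R})}$. Let us choose some constants $\underline{\sigma}, \bar\sigma, \underline{\alpha}, \underline{\psi}, \bar\psi$ and $\bar z$ such that $0 < \underline{\sigma} \le \bar\sigma$, $0 < \underline{\alpha} < \frac12$, $\bar\psi \ge \underline{\psi} > 0$ and $\bar z > 0$. Let us define the subset $\mathcal{F}$ of $\mathbb{R}_+^2 \times \mathcal{B}$ by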
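$$\mathcal{F} = [\underline{\sigma}, \bar\sigma] \times [0, 1 - \underline{\alpha}] \times \Big\{ \psi \in \mathcal{B} : \|\psi\|_{\mathcal{B}} \le \bar\psi;\ \psi \ge 0;\ \psi \ge \underline{\psi} \text{ a.e. in } [-\bar z, \bar z] \Big\}. \tag{39}$$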
We can make the following three observations:
1. The norm of A as an operator from V to V ′ is bounded independently of
(σ, α, ψ) in F.
2. The constants in (30) and (31) can be taken independent of (σ, α, ψ) in
F.
3. With λ in (30) independent of (σ, α, ψ) in F, the operator A + λI is one to
one and continuous from V 2 onto L2 (R+ ) and (A + λI)−1 : L2 (R+ ) → V 2
is bounded with constants independent of (σ, α, ψ) in F.
These last points are used for proving the following:
Proposition 6 (Bounds). The function γ is bounded in [0, T] by some constant X̄ independent of (σ, α, ψ) in F. The quantities $\|u\|_{L^\infty(0,T;V)}$, $\|u\|_{L^2(0,T;V^2)}$ and $\|\frac{\partial u}{\partial t}\|_{L^2((0,T)\times\mathbb{R}_+)}$ are bounded independently of (σ, α, ψ) in F.
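Proposition 7 (Sensitivity). There exists a constant C such that, for all (σ, α, ψ), (σ̃, α̃, ψ̃) in F,
$$\|u - \tilde u\|_{L^2(0,T;V)} + \|u - \tilde u\|_{L^\infty(0,T;L^2(\mathbb{R}_+))} \le C\big(|\sigma - \tilde\sigma| + |\alpha - \tilde\alpha| + \|\psi - \tilde\psi\|_{\mathcal{B}}\big),$$
$$\int_0^T\!\!\int_{\mathbb{R}_+} \big(\mu(\tilde u - u_\circ) + \tilde\mu(u - u_\circ)\big) \le C\big(|\sigma - \tilde\sigma| + |\alpha - \tilde\alpha| + \|\psi - \tilde\psi\|_{\mathcal{B}}\big)^2,$$
calling u = u(σ, α, ψ) the solution of (VIP) with parameters (σ, α, ψ) and µ = µ(σ, α, ψ) the corresponding reaction term (see (34)), and similarly ũ and µ̃ for (σ̃, α̃, ψ̃). Furthermore, let $(\sigma_n, \alpha_n, \psi_n)_{n\in\mathbb{N}}$ be a sequence of coefficients in F such that $\lim_{n\to\infty} (|\sigma - \sigma_n| + |\alpha - \alpha_n| + \|\psi - \psi_n\|_{\mathcal{B}}) = 0$. With the notations $u_n = u(\sigma_n, \alpha_n, \psi_n)$ and $\mu_n = \mu(\sigma_n, \alpha_n, \psi_n)$,
$$\lim_{n\to+\infty} \|u_n - u\|_{L^\infty((0,T)\times\mathbb{R}_+)} = 0, \qquad \lim_{n\to+\infty} \|\mu_n - \mu\|_{L^p((0,T)\times\mathbb{R}_+)} = 0,$$
for all p, 1 < p < +∞, and
$$\lim_{n\to+\infty}\Big(\|u_n - u\|_{L^\infty(0,T;V)} + \|u_n - u\|_{L^2(0,T;V^2)} + \Big\|\frac{\partial u_n}{\partial t} - \frac{\partial u}{\partial t}\Big\|_{L^2((0,T)\times\mathbb{R}_+)}\Big) = 0.$$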
4 Calibration by Least Squares
4.1 Orientation
For calibrating the Lévy process, one observes the spot price S and the prices
(p̄i )i∈I of a family of American put options with maturities/strikes given by
$(T_i, x_i)$; we call $\bar u_i = \bar p_i - x_i + S$, i ∈ I. The parameters of the Lévy process, i.e. the volatility σ, the exponent α and the function ψ, will be found as solutions of a least square problem, where the functional to be minimized is the sum of a suitable Tychonoff regularization functional $J_R(\sigma, \alpha, \psi)$ and of
$$J(u) = \sum_{i\in I} \omega_i \big(u(T_i, x_i) - \bar u_i\big)^2,$$
where $\omega_i$ are positive weights, and u = u(σ, α, ψ) is a solution of (VIP), with $T = \max_{i\in I} T_i$.
We aim at finding some necessary optimality conditions satisfied by the
solutions of the least square problem. The main difficulty comes from the fact that the differentiability of the functional J(u) with respect to the parameter (σ, α, ψ) is not guaranteed. To obtain some necessary optimality conditions, we shall first consider a least square problem where u is the solution of the penalized problem (37) rather than (VIP), obtain necessary optimality conditions for this new problem, then let the penalty parameter ε tend to 0 and pass to the limit in the optimality conditions. Such a program has already been
applied in [Ach05] for calibrating the local volatility with American options,
see also [AP05b, AP05a] for a related numerical method and results. The idea
originally comes from Hintermüller [Hin01] and Ito and Kunisch [IK00], who
applied a similar program for elliptic variational inequalities. At this point, we
should also mention Mignot and Puel [MP84] who applied an elegant method
for finding optimality conditions for a special control problem for a parabolic
variational inequality.
4.2 Preliminary Technical Results
With the aim of finding optimality conditions for the least square problem
(not completely defined yet), we first state some results concerning the adjoint
of B.
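Under the assumptions of Proposition 2, it can be checked that the operator $B^T$ defined by
$$B^T u(x) = \int_{z\in\mathbb{R}} k(z)\Big( x(e^{z}-1)\frac{\partial u}{\partial x}(x) - e^{2z}u(xe^{z}) + (2e^{z}-1)u(x)\Big)\,dz \tag{40}$$
is a continuous operator
$$\begin{cases} \text{from } V^s \text{ to } V^{s-2\alpha}, & \text{if } \alpha > \tfrac12,\\ \text{from } V^s \text{ to } V^{s-1}, & \text{if } \alpha < \tfrac12,\\ \text{from } V^s \text{ to } V^{s-1-\varepsilon}, \text{ for any } \varepsilon > 0, & \text{if } \alpha = \tfrac12. \end{cases}$$
If α > 1/2, then for all $u, v \in V^\alpha$, $\langle B^T u, v\rangle = \langle Bv, u\rangle$. This identity holds for all $u, v \in V^s$ with s > 1/2 if α ≤ 1/2.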
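The identity ⟨B^T u, v⟩ = ⟨Bv, u⟩ can be checked numerically. The sketch below makes the following assumptions, none of which comes from the text: a smooth standard normal kernel truncated to |z| ≤ 3 (finite activity, unlike (25)), Gaussian bumps as test functions, and trapezoidal quadrature in x on [0, 5]. It evaluates (13) and (40) pointwise. Since the identity holds separately for each z, both pairings share the same z-quadrature, and only the x-quadrature error separates them.

```python
import math


def trapz(values, h):
    """Trapezoidal rule on equally spaced samples with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def linspace(a, b, n):
    """n equally spaced points from a to b (inclusive)."""
    return [a + (b - a) * j / (n - 1) for j in range(n)]


def gaussian_bump(c, w):
    """Hypothetical test function f(x) = exp(-(x - c)^2 / w) and its derivative."""
    f = lambda x: math.exp(-(x - c) ** 2 / w)
    df = lambda x: -2.0 * (x - c) / w * f(x)
    return f, df


# Hypothetical smooth kernel (standard normal density), truncated to |z| <= 3.
ZS = linspace(-3.0, 3.0, 241)
HZ = ZS[1] - ZS[0]
KS = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in ZS]
EZ = [math.exp(z) for z in ZS]


def apply_B(v, dv, x):
    """(13): -integral of k(z) [x (e^z - 1) v'(x) + e^z (v(x e^{-z}) - v(x))] dz."""
    vals = [k * (x * (ez - 1.0) * dv(x) + ez * (v(x / ez) - v(x))) for k, ez in zip(KS, EZ)]
    return -trapz(vals, HZ)


def apply_BT(u, du, x):
    """(40): integral of k(z) [x (e^z - 1) u'(x) - e^{2z} u(x e^z) + (2 e^z - 1) u(x)] dz."""
    vals = [k * (x * (ez - 1.0) * du(x) - ez * ez * u(x * ez) + (2.0 * ez - 1.0) * u(x))
            for k, ez in zip(KS, EZ)]
    return trapz(vals, HZ)


def pairings(n_x=1001, x_max=5.0):
    """Approximate (<B^T u, v>, <B v, u>) by quadrature on [0, x_max]."""
    u, du = gaussian_bump(1.0, 0.1)
    v, dv = gaussian_bump(1.5, 0.1)
    xs = linspace(0.0, x_max, n_x)
    hx = xs[1] - xs[0]
    lhs = trapz([apply_BT(u, du, x) * v(x) for x in xs], hx)
    rhs = trapz([apply_B(v, dv, x) * u(x) for x in xs], hx)
    return lhs, rhs


lhs, rhs = pairings()
```

The two numbers should agree to many digits, while neither is close to zero.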
Lemma 1. Under the assumptions of Proposition 2, and if
(i) either α < 1/2,
(ii) or ψ is continuous near 0 and there exist a bounded function ω : R → R and two positive numbers ζ and C such that $\psi(z)e^{\frac32 z} - \psi(0)e^{-\frac32 z} = z\,\omega(z)$, with $|\omega(z)| \le C|z|e^{-\zeta|z|}$, for all z ∈ R,
then for any s ∈ R, the operator $B - B^T$ is continuous from $V^s$ to $V^{s-1}$.
4.3 The Least Square Problem and Its Penalized Version
In order to properly define the least square problem, we have to define the set
where (σ, α, ψ) may vary and the regularization functional.
Let us introduce a Hilbert space $H_\psi$, endowed with the norm $\|\cdot\|_{H_\psi}$, relatively compact in $\mathcal{B}$. Let $J_\psi$ be a convex, coercive and $C^1$ function defined on $H_\psi$. It is well known that $J_\psi$ is also weakly lower semicontinuous in $H_\psi$. Consider a closed and convex subset $\mathcal{H}_\psi$ of $H_\psi$. We assume that $\mathcal{H}_\psi$ is contained in $\{\psi : \|\psi\|_{\mathcal{B}} \le \bar\psi;\ \psi \ge 0\}$ and that
1. the functions $\psi \in \mathcal{H}_\psi$ are continuous near 0,
2. there exist two positive constants $\underline{\psi}$ and $\bar z$ such that $\psi(z) \ge \underline{\psi}$ for all z such that $|z| \le \bar z$,
3. there exist two constants ζ > 0 and C ≥ 0 such that, for all $\psi \in \mathcal{H}_\psi$, $\psi(z)e^{\frac32 z} - \psi(0)e^{-\frac32 z} = z\,\omega(z)$, with $|\omega(z)| \le C|z|e^{-\zeta|z|}$, for all z ∈ R. This assumption will allow us to use the results stated in Lemma 1.
Finally, consider the set $\mathcal{H} = [\underline{\sigma}, \bar\sigma] \times [0, 1 - \underline{\alpha}] \times \mathcal{H}_\psi$ and define
$$J_R(\sigma, \alpha, \psi) = |\sigma - \sigma_\circ|^2 + |\alpha - \alpha_\circ|^2 + J_\psi(\psi),$$
where σ◦ and α◦ are suitable prior parameters.
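Consider the least square problem:
$$\text{Minimize } J(u) + J_R(\sigma, \alpha, \psi) \quad\text{subject to}\quad (\sigma,\alpha,\psi) \in \mathcal{H},\ u = u(\sigma,\alpha,\psi) \text{ satisfies (VIP)}. \tag{41}$$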
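Given model prices at the observed maturities and strikes, the objective in (41) is simple to evaluate. The sketch below assumes that the model put prices p(T_i, x_i) are already available (for instance from a numerical solution of (VIP)) and that the value of J_ψ(ψ) is supplied by the caller, since J_ψ is left abstract in the text; the function name and arguments are ours. Because u = p − x + S and ū_i = p̄_i − x_i + S use the same shift, u(T_i, x_i) − ū_i = p(T_i, x_i) − p̄_i, so J is the usual weighted squared price misfit.

```python
def least_squares_objective(model_put, market_put, strikes, weights, S,
                            sigma, alpha, sigma0, alpha0, J_psi=0.0):
    """J(u) + J_R(sigma, alpha, psi), with u_bar_i = p_bar_i - x_i + S.

    model_put[i] is a (hypothetical) model price p(T_i, x_i), market_put[i] the
    observed price p_bar_i; J_psi is the value of the regularization J_psi(psi).
    """
    J = 0.0
    for p, p_bar, x, w in zip(model_put, market_put, strikes, weights):
        u = p - x + S          # change of unknown (14) applied to the model price
        u_bar = p_bar - x + S  # same change applied to the market price
        J += w * (u - u_bar) ** 2
    J_R = (sigma - sigma0) ** 2 + (alpha - alpha0) ** 2 + J_psi
    return J + J_R


value = least_squares_objective([2.5, 5.0], [2.0, 6.0], [90.0, 100.0], [1.0, 2.0], 100.0,
                                sigma=0.25, alpha=0.6, sigma0=0.2, alpha0=0.5)
```

Here J = 1·0.5² + 2·1² = 2.25 and J_R = 0.05² + 0.1² = 0.0125.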
We fix X̄ (independent of (σ, α, ψ) ∈ H) as in Proposition 6, and assume that $x_i < \bar X$, i ∈ I. Taking X ≥ X̄, it is also possible to consider the least square inverse problem corresponding to the penalized problem
$$\text{Minimize } J(u_\varepsilon) + J_R(\sigma, \alpha, \psi) \quad\text{subject to}\quad (\sigma,\alpha,\psi) \in \mathcal{H},\ u_\varepsilon \text{ satisfies (37)}. \tag{42}$$
Propositions 6 and 7 are useful for proving the following:
Proposition 8 (Approximation of the least square problem). Let $(\varepsilon_n)_n$ be a sequence of penalty parameters such that $\varepsilon_n \to 0$ as $n \to \infty$, and let $\big((\sigma^*_{\varepsilon_n}, \alpha^*_{\varepsilon_n}, \psi^*_{\varepsilon_n}), u^*_{\varepsilon_n}\big)$ be a solution of problem (42), with X fixed as above. Consider a subsequence such that $(\sigma^*_{\varepsilon_n}, \alpha^*_{\varepsilon_n}, \psi^*_{\varepsilon_n})$ converges to $(\sigma^*, \alpha^*, \psi^*)$ in F, $\psi^*_{\varepsilon_n}$ converges weakly to $\psi^*$ in $H_\psi$ and $u^*_{\varepsilon_n} \to u^*$ weakly in $L^2(0,T;V_X)$, where $V_X$ is defined in (35). Then $\big((\sigma^*, \alpha^*, \psi^*), u^*\big)$ is a solution of (41), where we agree to use the notation $u^*$ for the function $E_X(u^*)$. We have that
(i) $u^*_{\varepsilon_n}$ converges to $u^*$ uniformly in [0, T] × [0, X], and in $L^2(0,T;V_X)$;
(ii) $1_{\{x>S\}}\, rx\, \mathcal{V}_{\varepsilon_n}(u^*_{\varepsilon_n})$ converges to $\mu^*$ strongly in $L^2((0,T)\times(0,X))$;
(iii) for every smooth function χ with compact support contained in [0, X), $\chi u^*_{\varepsilon_n}$ converges to $\chi u^*$ strongly in $L^2(0,T;V^2)$ and in $L^\infty(0,T;V)$.
4.4 The Optimality Conditions
We fix X as above. Let a subsequence (σε∗n , αε∗n , ψε∗n , u∗εn ) of solutions of (42)
converge to (σ ∗ , α∗ , ψ ∗ , u∗ ) as in Proposition 8, then (σ ∗ , α∗ , ψ ∗ , u∗ ) is a solution of (41).
The optimality conditions will involve an adjoint problem. Since the cost functional involves point-wise values of u, the adjoint problem will have singular data. In that context, the notion of very weak solution of boundary value problems will be relevant: for that, we introduce the spaces Z̃ and Z,
$$\tilde Z = \Big\{ v \in L^2(0,T;V_X);\ \frac{\partial v}{\partial t} + A_X v \in L^2((0,T)\times(0,X)) \Big\}, \qquad Z = \{ v \in \tilde Z;\ v(t=0) = 0 \}, \tag{43}$$
where $A_X$ is the operator given by (36), (29) and (13), with the parameters $(\sigma^*, \alpha^*, \psi^*)$. These spaces, endowed with the graph norm, are Banach spaces.
We also need to introduce some functionals before stating the optimality
conditions. We assume that $u^*(T_i, x_i) > u_\circ(x_i)$ for all i ∈ I. It is clear from the continuity of $u^*$ and from the uniform convergence of $u^*_{\varepsilon_n}$ that there exist a positive real number a and an integer N such that, for n > N, $u^*_{\varepsilon_n}(t,x) > u_\circ(x) + \varepsilon_n$ for all (t, x) such that $|t - T_i| < a$ and $|x - x_i| < a$ for some i ∈ I. We may fix a smooth function φ taking the value 1 at every (t, x) lying outside the rectangles $\{|t - T_i| < a/2,\ |x - x_i| < a/2\}$, i ∈ I, and vanishing in neighborhoods of $(T_i, x_i)$, i ∈ I.
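For a function p such that $p \in L^2((0,T)\times\mathbb{R}_+)$ and $\varphi p \in L^2(0,T;V_X)$, we introduce the quantities
$$G^{(\sigma)}(u^*, p) = \int_0^T \Big\langle x^2\frac{\partial^2 u^*}{\partial x^2}, \varphi p\Big\rangle + \int_0^T\!\!\int_0^X (1-\varphi)\,x^2\frac{\partial^2 u^*}{\partial x^2}\,p, \tag{44}$$
$$G^{(\alpha)}(u^*, p) = \int_0^T \big\langle B_X^{(\alpha)} u^*, \varphi p\big\rangle + \int_0^T\!\!\int_0^X (1-\varphi)\,B_X^{(\alpha)} u^*\,p, \tag{45}$$
$$\big\langle G^{(\psi)}(u^*, p), \kappa\big\rangle = \int_0^T \big\langle B_X^{(\psi,\kappa)} u^*, \varphi p\big\rangle + \int_0^T\!\!\int_0^X (1-\varphi)\,B_X^{(\psi,\kappa)} u^*\,p, \tag{46}$$
where $\kappa \in H_\psi$, ⟨·,·⟩ denotes the duality pairing between $(V_X)'$ and $V_X$, and where
$$B_X^{(\alpha)} v(x) = -\int_{\mathbb{R}} k^*(z)\log(|z|)\Big( x(e^{z}-1)\frac{\partial v}{\partial x}(x) + e^{z}\big(1_{\{z > -\log(X/x)\}}\, v(xe^{-z}) - v(x)\big)\Big)\,dz,$$
$$B_X^{(\psi,\kappa)} v(x) = \int_{\mathbb{R}} \frac{\kappa(z)}{|z|^{1+2\alpha^*}}\Big( x(e^{z}-1)\frac{\partial v}{\partial x}(x) + e^{z}\big(1_{\{z > -\log(X/x)\}}\, v(xe^{-z}) - v(x)\big)\Big)\,dz.$$
One can check that $G^{(\sigma)}(u^*, p)$, $G^{(\alpha)}(u^*, p)$ and $\langle G^{(\psi)}(u^*, p), \kappa\rangle$ are well defined and do not depend on the particular choice of φ.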
We are now ready to state some necessary optimality conditions for the least square problem (42):
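Theorem 2. Let a subsequence $(\sigma^*_{\varepsilon_n}, \alpha^*_{\varepsilon_n}, \psi^*_{\varepsilon_n}, u^*_{\varepsilon_n})$ of solutions of (42) converge to $(\sigma^*, \alpha^*, \psi^*, u^*)$ as in Proposition 8 (we know that $(\sigma^*, \alpha^*, \psi^*, u^*)$ is a solution of (41)). We assume that $u^*(T_i, x_i) > u_\circ(x_i)$ for all i ∈ I.
There exist a function $p^* \in L^2((0,T)\times(0,X))$ and a Radon measure $\xi^*$ such that, for all v ∈ Z (Z is defined by (43)),
$$\int_0^T\!\!\int_0^X \Big(\frac{\partial v}{\partial t} + A_X v\Big)p^* + \langle \xi^*, v\rangle = 2\sum_{i\in I}\omega_i\big(u^*(T_i,x_i) - \bar u_i\big)\,v(T_i,x_i), \tag{47}$$
and
$$\mu^*\,|p^*| = 0, \tag{48}$$
$$|u^*|\,\xi^* = 0. \tag{49}$$
Furthermore, with φ defined above, $\varphi p^* \in L^2(0,T;V_X)$, and for all $(\sigma, \alpha, \psi) \in \mathcal{H}$,
$$(\sigma - \sigma^*)\big(2(\sigma^* - \sigma_\circ) + \sigma^* G^{(\sigma)}(u^*, p^*)\big) \ge 0, \tag{50}$$
$$(\alpha - \alpha^*)\big(\alpha^* - \alpha_\circ + G^{(\alpha)}(u^*, p^*)\big) \ge 0, \tag{51}$$
$$\big\langle DJ_\psi(\psi^*), \psi - \psi^*\big\rangle + \big\langle G^{(\psi)}(u^*, p^*), \psi - \psi^*\big\rangle \ge 0, \tag{52}$$
with $G^{(\sigma)}$, $G^{(\alpha)}$ and $G^{(\psi)}$ defined respectively by (44), (45) and (46).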
Proof. The proof consists of first finding the optimality conditions for (42),
then passing to the limit as the penalty parameter tends to zero. It is written
in [Ach06]. Optimality conditions for (42) can be obtained in a now classical
way (see, e.g., the pioneering book of O. Pironneau [Pir84]; he was among the first to understand the potential of optimal control techniques in connection with partial differential equations and optimum design).
Note that $p^*$ satisfies
$$\frac{\partial p^*}{\partial t} - A_X^T p^* = -2\sum_{i\in I}\omega_i\big(u^*(T_i,x_i) - \bar u_i\big)\,\delta_{t=T_i}\otimes\delta_{x=x_i} \tag{53}$$
in the sense of distributions in the open set $\{(t,x) : u^*(t,x) > u_\circ(x)\}$, and that (48) implies that $p^*$ vanishes in the coincidence set.
5 The Variational Inequality when σ = 0
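We focus on the case when σ = 0 and when (α, ψ) ∈ F₂ with
$$\mathcal{F}_2 = \Big[\frac12 + \underline{\alpha},\ 1 - \underline{\alpha}\Big] \times \Big\{\psi \in \mathcal{B} : \|\psi\|_{\mathcal{B}} \le \bar\psi;\ \psi \ge 0;\ \psi \ge \underline{\psi} \text{ a.e. in } [-\bar z, \bar z]\Big\}, \tag{54}$$
for three constants $\underline{\alpha}$, $\underline{\psi}$, $\bar\psi$ with $0 < \underline{\alpha} < \frac12$ and $\bar\psi > \underline{\psi} > 0$.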
Remark 6. In the case when σ = 0 and α < 1/2, A is a non-local hyperbolic
operator, and the present theory does not apply.
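We may prove that
• A is a continuous operator from $V^\alpha$ to $V^{-\alpha}$;
• we have the Gårding inequality: there exist c > 0 and λ ≥ 0 such that
$$\langle Av, v\rangle \ge c|v|^2_{V^\alpha} - \lambda\|v\|^2_{L^2(\mathbb{R}_+)}, \qquad \forall v \in V^\alpha, \tag{55}$$
and
$$\langle Av, v_+\rangle \ge c|v_+|^2_{V^\alpha} - \lambda\|v_+\|^2_{L^2(\mathbb{R}_+)}, \qquad \forall v \in V^\alpha; \tag{56}$$
• the operator A + λI is one to one and continuous from $V^{2\alpha}$ onto $L^2(\mathbb{R}_+)$.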
The goal is to obtain the existence of a weak solution to (15), (17), (18) by a singular perturbation argument: we fix (α, ψ) ∈ F₂ and, for η > 0, we call $u_\eta$ the solution to (15), (17), (18) corresponding to σ = η, given by Theorem 1. It can be proven that $\|u_\eta\|_{L^\infty(0,T;V^\alpha)}$ and $\|u_\eta\|_{L^2(0,T;V^{2\alpha})}$ are bounded independently of η, and that the free boundary associated with $u_\eta$ stays in [0, T] × [0, X̌], where X̌ does not depend on η. By the results contained in [Lio73, in particular Theorem 4.1, p. 286], one may pass to the limit as η tends to zero and prove the following result:
Theorem 3. We choose σ = 0 and (α, ψ) ∈ F₂, and we define
$$K = \{ v \in V^\alpha,\ v(x) \ge u_\circ(x) \text{ in } \mathbb{R}_+ \}.$$
There exists a unique weak solution of (15), (17) and (18) in (0, T) × R+, i.e. a function u which belongs to $C^0([0,T];K)$ and to $L^2(0,T;V^{2\alpha})$, with $\frac{\partial u}{\partial t} \in L^2((0,T)\times\mathbb{R}_+)$, such that u(t = 0) = u◦ and, for all v ∈ K with bounded support in x,
$$\Big\langle \frac{\partial u}{\partial t} + Au + rx,\ v - u\Big\rangle \ge 0, \quad \text{for a.a. } t > 0. \tag{57}$$
There exists X̌ > 0 such that
$$u(t, x) = 0, \qquad \forall t \in [0,T],\ x \ge \check X. \tag{58}$$
The function u is non-increasing with respect to x and non-decreasing with respect to t, and there exists a non-decreasing continuous function $\gamma : (0,T] \to (S, \check X)$ such that, for all t ∈ (0, T), $\{x > 0 \text{ s.t. } u(t,x) = u_\circ(x)\} = [\gamma(t), +\infty)$.
References
[Ach05] Y. Achdou. An inverse problem for a parabolic variational inequality arising in volatility calibration with American options. SIAM J. Control Optim., 43(5):1583–1615 (electronic), 2005.
[Ach06] Y. Achdou. An inverse problem for a parabolic variational inequality with an integro-differential operator arising in the calibration of Lévy processes with American options. Submitted, 2006.
[Ada75] R. A. Adams. Sobolev spaces, volume 65 of Pure and Applied Mathematics. Academic Press [A subsidiary of Harcourt Brace Jovanovich Publishers], New York, 1975.
[AP05a] Y. Achdou and O. Pironneau. Computational methods for option pricing, volume 30 of Frontiers in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2005.
[AP05b] Y. Achdou and O. Pironneau. Numerical procedure for calibration of volatility with American options. Appl. Math. Finance, 12(3):201–241, 2005.
[BL84] A. Bensoussan and J.-L. Lions. Impulse control and quasivariational inequalities. Gauthier-Villars, Montrouge, 1984. Translated from the French by J. M. Cole.
[BS73] F. Black and M. S. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:637–654, 1973.
[CT04] R. Cont and P. Tankov. Financial modelling with jump processes. Chapman & Hall/CRC Financial Mathematics Series. Chapman & Hall/CRC, Boca Raton, FL, 2004.
[CV03] R. Cont and E. Voltchkova. Finite difference methods for option pricing in jump-diffusion and exponential Lévy models. Rapport Interne 513, CMAP, Ecole Polytechnique, 2003.
[CV04] R. Cont and E. Voltchkova. Integro-differential equations for option prices in exponential Lévy models. Rapport Interne 547, CMAP, Ecole Polytechnique, 2004.
[Dup97] B. Dupire. Pricing and hedging with smiles. In Mathematics of derivative securities (Cambridge, 1995), pages 103–111. Cambridge Univ. Press, Cambridge, 1997.
[Hin01] M. Hintermüller. Inverse coefficient problems for variational inequalities: optimality conditions and numerical realization. M2AN Math. Model. Numer. Anal., 35(1):129–152, 2001.
[IK00] K. Ito and K. Kunisch. Optimal control of elliptic variational inequalities. Appl. Math. Optim., 41(3):343–364, 2000.
[JLL90] P. Jaillet, D. Lamberton, and B. Lapeyre. Variational inequalities and the pricing of American options. Acta Appl. Math., 21(3):263–289, 1990.
[Lio69] J.-L. Lions. Quelques méthodes de résolution des problèmes aux limites non linéaires. Dunod, 1969.
[Lio73] J.-L. Lions. Perturbations singulières dans les problèmes aux limites et en contrôle optimal, volume 323 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1973.
[LL97] D. Lamberton and B. Lapeyre. Introduction au calcul stochastique appliqué à la finance. Ellipses, 1997.
[Mer73] R. C. Merton. Theory of rational option pricing. Bell J. Econom. and Management Sci., 4:141–183, 1973.
[MNS03] A.-M. Matache, P.-A. Nitsche, and C. Schwab. Wavelet Galerkin pricing of American options on Lévy driven assets. Research Report SAM 2003-06, 2003.
[MP84] F. Mignot and J.-P. Puel. Contrôle optimal d’un système gouverné par une inéquation variationnelle parabolique. C. R. Acad. Sci. Paris Sér. I Math., 298(12):277–280, 1984.
[MSW04] A.-M. Matache, C. Schwab, and T. P. Wihler. Fast numerical solution of parabolic integro-differential equations with applications in finance. Technical report, IMA University of Minnesota, 2004. Research report No. 1954.
[MvPS04] A.-M. Matache, T. von Petersdorff, and C. Schwab. Fast deterministic pricing of Lévy driven assets. Mathematical Modelling and Numerical Analysis, 38(1):37–72, 2004.
[Pha98] H. Pham. Optimal stopping of controlled jump-diffusion processes: A viscosity solution approach. Journal of Mathematical Systems, 8(1):1–27, 1998.
[Pir84] O. Pironneau. Optimal shape design for elliptic systems. Springer Series in Computational Physics. Springer-Verlag, New York, 1984.
An Operator Splitting Method for Pricing American Options
Samuli Ikonen¹ and Jari Toivanen²
¹ Nordea Markets, FI-00020 Nordea, Finland, Samuli.Ikonen@nordea.com
² Department of Mathematical Information Technology, P.O. Box 35 (Agora), FI-40014 University of Jyväskylä, Finland, Jari.Toivanen@mit.jyu.fi
Summary. Pricing American options with methods based on partial (integro-)differential equations leads to linear complementarity problems (LCPs). We consider the numerical solution of these problems for the Black–Scholes model, Kou’s jump-diffusion model, and Heston’s stochastic volatility model. The finite difference discretization is described. The solutions of the discrete LCPs are approximated using an operator splitting method which separates the linear problem and the early exercise constraint into two fractional steps. The numerical experiments demonstrate that option prices can be computed in a few milliseconds on a PC.
1 Introduction
Since 1973, when Black, Scholes, and Merton developed models for pricing options [BS73, Mer73] and the Chicago Board Options Exchange started to operate, the trading of options has grown to a tremendous scale. Basic options give the right either to sell (put) or to buy (call) the underlying asset at the strike price. European options can be exercised only at the expiry time, while American options can be exercised at any time before the expiry. The Black–Scholes partial differential equation (PDE) describes the evolution of the option price in time for European options. In order to avoid arbitrage opportunities with an American option, the so-called early exercise constraint has to be imposed on its value. Combining this constraint with the PDE leads to a linear complementarity problem (LCP). For European options it is generally possible to derive formulas for their price, but American options usually need to be priced numerically. This paper considers the numerical solution of these pricing problems. For a general discussion of these topics, we refer to the books [AP05, CT04, TR00, Wil98].
The Black–Scholes model [BS73] assumes the same constant volatility for all options on the same underlying asset, regardless of their strike prices and expiry times. In practice, this does not hold in the markets. One possibility to make the prices consistent with the markets is to define the volatility as a function of time and the value of the underlying asset, and then calibrate this function; see [Dup94], for example. In 1976, Merton suggested adding jumps to the model of the underlying asset [Mer76]. This jump-diffusion model helps to explain a good part of the difference between the market prices and the ones given by the Black–Scholes model with a constant volatility. Since then, there has been growing activity to incorporate jumps into the models; see [CT04] and references therein. One of the models used in this paper is Kou’s jump-diffusion model. Another generalization is to make the volatility a stochastic process. Several such multifactor models have been proposed; see [FPS00], for example. Here Heston’s stochastic volatility model [Hes93] is used. One can also combine stochastic volatility and jump models as in [Bat96, DPS00], for example.
Several ways to solve the discretized LCPs resulting from pricing American options have been described in the literature. Perhaps the most common method is the projected SOR iteration proposed in [Cry71]. This method is fairly generic and easy to implement, but its convergence rate degrades as the grids are refined. For one-dimensional PDE models the resulting LCPs can be solved very efficiently using the direct algorithm in [BS77] if the matrix is a tridiagonal M-matrix and the solution has a suitable form. The full matrices resulting from jump-diffusion models require special techniques in order to obtain efficient algorithms. The papers [AO05, AA00, CV05, MSW05] study the numerical pricing of European options, and in [dFL04, dFV05, Toi06] the pricing of American options is considered. For higher-dimensional problems, like the ones resulting from Heston’s model, multigrid methods have been considered in [BC83, CP99, Oos03, RW04], for example. An alternative way is to approximate the LCPs using a penalty method [FV02, ZFV98]. This leads to a sequence of systems of linear equations with varying matrices. With this approach the constraints are always slightly violated. With the fairly similar Lagrange multiplier method [AP05, HIK03, IK06, IT06b] it can be guaranteed that the constraints are satisfied.
This paper considers an operator splitting method proposed for the Black–Scholes model in [IT04a]. The method was applied to Heston’s model and analyzed in [IT04b], and it was applied to Kou’s model in [Toi06]. The basic idea of this method is to split a time step with an LCP into two fractional time steps. The first fractional step requires a system of linear equations to be solved, and the second one enforces the early exercise constraint. The update to satisfy the constraint is simple and, thus, the main computational burden is the solution of the linear systems. A similar approach is commonly used to treat the incompressibility condition in computational fluid dynamics; see [Glo03], for example. The operator splitting method has two obvious benefits. First, there are several efficient methods available for solving the resulting systems of linear equations, while only a few methods are available for the original LCPs and they usually cannot compete in efficiency. Second, the operator splitting method is easier to implement than an efficient LCP solver. This paper demonstrates that the operator splitting method is suitable for pricing American options with different models and that computing a price sufficiently accurate for most purposes requires only a few milliseconds on a contemporary PC.
The outline of the paper is as follows. We begin by describing the three models and the resulting P(I)DEs for European options. After this, we formulate linear complementarity problems for the value of American options. Next, we sketch finite difference discretizations for the partial differential operators. Then the operator splitting method is described, and after this, methods for solving the resulting systems of linear equations are discussed. The paper ends with numerical examples for all of the considered models and conclusions.
2 Models
2.1 Black–Scholes Model
By assuming that the value of the underlying asset, denoted by x, follows a geometric Brownian motion with a drift, the Black–Scholes PDE [BS73]
$$v_t = A_{\mathrm{BS}} v = -\tfrac{1}{2}(\sigma x)^2 v_{xx} - r x v_x + r v \tag{1}$$
can be derived for the value of an option, denoted by v, where σ is the volatility of the value of the asset and r is the risk-free interest rate. In practice, the market prices of options do not satisfy (1). One possible way to make the model match the markets is to use a volatility function σ which depends on the value of the underlying asset and time; see [AP05, Dup94], for example. In this case, the volatility function has to be calibrated with the market data.
2.2 Jump-Diffusion Models
During periods of high market stress, such as the crash of 1987, the value of assets can move faster than a geometric Brownian motion would predict. Partly due to this, models which also allow jumps in the value of the asset have become more common; see [CT04] and references therein. Already in 1976, Merton considered such a model in [Mer76]. With independent and identically distributed jumps, the partial integro-differential equation (PIDE)
$$v_t = A_{\mathrm{JD}} v = -\tfrac{1}{2}(\sigma x)^2 v_{xx} - (r - \mu\zeta) x v_x + (r + \mu) v - \mu \int_{\mathbb{R}^+} v(t, xy) f(y)\, dy \tag{2}$$
can be derived for the value of an option, where µ is the rate of jumps, the function f is the probability density of the jump size y (a jump moves the asset value from x to xy), and ζ is the mean jump amplitude.
Merton used a Gaussian distribution for jumps in [Mer76]. Kou considered in [Kou02] a log-double-exponential distribution for jumps, which leads to a more flexible and tractable model. In this case, the density is
$$f(y) = \begin{cases} q\alpha_2 y^{\alpha_2 - 1}, & y < 1, \\ p\alpha_1 y^{-\alpha_1 - 1}, & y \ge 1, \end{cases} \tag{3}$$
where p, q, $\alpha_1$, and $\alpha_2$ are positive constants such that $\alpha_1 > 1$ and p + q = 1. The mean jump amplitude is
$$\zeta = \frac{p\alpha_1}{\alpha_1 - 1} + \frac{q\alpha_2}{\alpha_2 + 1} - 1.$$
We will employ this model in the numerical experiments. Also in this case, one possible way to calibrate the model is to let the volatility σ be a function of time and asset value as in [AA00].
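As a quick consistency check of the density (3) and the formula for ζ, the sketch below evaluates ζ in closed form and, independently, by quadrature of E[y] − 1. It is a minimal illustration in plain Python; the function names and the quadrature resolution are our own choices, not part of the method. For the parameters used later in (25) (α1 = α2 = 3, p = 1/3), both fractions in the formula equal 1/2, so ζ = 0 and the drift correction µζ in (2) vanishes.

```python
import math


def kou_density(y, p, alpha1, alpha2):
    """Log-double-exponential density (3) of the jump multiplier y > 0."""
    q = 1.0 - p
    if y < 1.0:
        return q * alpha2 * y ** (alpha2 - 1.0)
    return p * alpha1 * y ** (-alpha1 - 1.0)


def kou_mean_jump(p, alpha1, alpha2):
    """Closed-form mean jump amplitude zeta = E[y] - 1 (requires alpha1 > 1)."""
    q = 1.0 - p
    return p * alpha1 / (alpha1 - 1.0) + q * alpha2 / (alpha2 + 1.0) - 1.0


def midpoint(fun, a, b, n):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * math.fsum(fun(a + (j + 0.5) * h) for j in range(n))


def kou_moments_by_quadrature(p, alpha1, alpha2, n=20000):
    """Return (total probability, zeta) computed by quadrature.

    The part y < 1 is integrated directly; the tail y >= 1 is mapped onto
    (0, 1] with y = 1/s, dy = ds/s**2 (up to orientation), so no truncation is needed.
    """
    f = lambda y: kou_density(y, p, alpha1, alpha2)
    mass = midpoint(f, 0.0, 1.0, n) + midpoint(lambda s: f(1.0 / s) / s**2, 0.0, 1.0, n)
    mean = (midpoint(lambda y: y * f(y), 0.0, 1.0, n)
            + midpoint(lambda s: f(1.0 / s) / s**3, 0.0, 1.0, n))
    return mass, mean - 1.0


print(kou_mean_jump(1.0 / 3.0, 3.0, 3.0))        # approximately 0 for the parameters of (25)
print(kou_moments_by_quadrature(0.3, 4.0, 2.0))  # generic parameters: mass close to 1
```

The change of variables for the tail avoids choosing an artificial upper limit for y; it relies on α1 > 1 so that the transformed integrands stay integrable at s = 0.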
2.3 Stochastic Volatility Models
In practice, the volatility of the value of an asset is not constant over time. Several models have also been developed for the behavior of the volatility. Among the many stochastic volatility models, the one developed by Heston in [Hes93] is probably the most popular. It assumes the volatility to be a mean-reverting process. Under the assumption that the market price of volatility risk is zero, Heston’s model leads to the two-dimensional PDE
$$v_t = A_{\mathrm{SV}} v = -\tfrac{1}{2} y x^2 v_{xx} - \rho\gamma y x v_{xy} - \tfrac{1}{2}\gamma^2 y v_{yy} - r x v_x - \alpha(\beta - y) v_y + r v, \tag{4}$$
where y is the variance, that is, the square of the volatility, β is the mean level of the variance, α is the rate of reversion to the mean level, and γ is the volatility of the variance. The correlation between the price of the underlying asset and its variance is ρ.
3 Linear Complementarity Problems
The value of an option at the expiry time T is given by
$$v(T, x) = g(x), \tag{5}$$
where the payoff function g depends on the type of the option. For example, for a put option with a strike price K it is
$$g(x) = \max\{K - x, 0\}. \tag{6}$$
The value v of an American option satisfies the linear complementarity problem (LCP)
$$\begin{cases} v \ge g, \\ A v - v_t \ge 0, \\ (A v - v_t)(v - g) = 0, \end{cases} \tag{7}$$
where A is one of the operators $A_{\mathrm{BS}}$, $A_{\mathrm{JD}}$, or $A_{\mathrm{SV}}$ defined by (1), (2), and (4), respectively.
The operator splitting method is derived from a formulation with a Lagrange multiplier λ after a temporal discretization. At the continuous level, the formulation with the Lagrange multiplier reads
$$\begin{cases} \lambda \ge 0, \quad v \ge g, \\ A v - v_t = \lambda, \\ \lambda (v - g) = 0. \end{cases} \tag{8}$$
4 Discretizations
4.1 Spatial Discretizations
The LCPs are posed on an unbounded domain, as there is no upper limit for the value of the asset or, in the case of Heston’s stochastic volatility model, for the variance. In order to use finite difference discretizations for the spatial derivatives, the domain is truncated at sufficiently large values of x and y, which are denoted by X and Y, respectively. The choice of X for the Black–Scholes model is considered in [KN00], for example. On the truncation boundaries a suitable boundary condition needs to be imposed. For the one-dimensional models for put options, we use the homogeneous Dirichlet boundary condition v = 0 at x = X. For Heston’s model, homogeneous Neumann boundary conditions are imposed. While these are fairly typical choices for boundary conditions, other choices are also possible.
For the interval [0, X], we define subintervals $[x_{i-1}, x_i]$, i = 1, 2, ..., m, where the points $x_i$ satisfy $0 = x_0 < x_1 < \cdots < x_m = X$. For Heston’s model, the interval [0, Y] is similarly divided by the points $0 = y_0 < y_1 < \cdots < y_n = Y$. Finite difference discretizations seek approximations for the value of v at the grid points $x_i$ for the one-dimensional models and $(x_i, y_j)$ for Heston’s model. The spatial partial derivatives appearing in (7) and (8) need to be approximated using the grid point values. For the second-order derivative with respect to x, we use the finite difference approximation
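$$v_{xx}(t, x_i) \approx \frac{2}{\Delta x_{i-1}(\Delta x_{i-1} + \Delta x_i)} v(t, x_{i-1}) - \frac{2}{\Delta x_{i-1}\Delta x_i} v(t, x_i) + \frac{2}{\Delta x_i(\Delta x_{i-1} + \Delta x_i)} v(t, x_{i+1}), \tag{9}$$
where $\Delta x_{i-1} = x_i - x_{i-1}$ and $\Delta x_i = x_{i+1} - x_i$. For the first-order derivative, one possible approximation is
$$v_x(t, x_i) \approx -\frac{\Delta x_i}{\Delta x_{i-1}(\Delta x_{i-1} + \Delta x_i)} v(t, x_{i-1}) + \frac{\Delta x_i - \Delta x_{i-1}}{\Delta x_{i-1}\Delta x_i} v(t, x_i) + \frac{\Delta x_{i-1}}{\Delta x_i(\Delta x_{i-1} + \Delta x_i)} v(t, x_{i+1}). \tag{10}$$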
For Heston’s model the approximations for the partial derivatives with respect
to y can be defined analogously. The approximations (9) and (10) can be shown
to be second-order accurate with respect to the grid step size when the step
size varies smoothly; see [MW86], for example.
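The weights in (9) and (10) can be checked numerically. The sketch below (plain Python; all names are illustrative) computes both stencils at a point of a nonuniform grid, confirms that they are exact for a quadratic on an uneven stencil, and measures the error for v = sin on a stencil whose spacing varies smoothly, with $\Delta x_i - \Delta x_{i-1} = h^2$. A Taylor expansion shows that the leading error of (9) is $(\Delta x_i - \Delta x_{i-1}) v_{xxx}/3$, which is why smooth variation of the step size matters: with a fixed step ratio this term would reduce (9) to first order, while (10) stays second order with leading error $\Delta x_{i-1}\Delta x_i v_{xxx}/6$.

```python
import math


def fd_weights(x_prev, x_mid, x_next):
    """Weights of (9) and (10) at x_mid on a nonuniform grid.

    Returns (w2, w1): the weights multiplying v(x_prev), v(x_mid), v(x_next)
    in the approximations of v_xx and v_x, respectively.
    """
    h0 = x_mid - x_prev   # Delta x_{i-1}
    h1 = x_next - x_mid   # Delta x_i
    w2 = (2.0 / (h0 * (h0 + h1)), -2.0 / (h0 * h1), 2.0 / (h1 * (h0 + h1)))
    w1 = (-h1 / (h0 * (h0 + h1)), (h1 - h0) / (h0 * h1), h0 / (h1 * (h0 + h1)))
    return w2, w1


def apply_stencil(weights, values):
    return sum(w * v for w, v in zip(weights, values))


def sine_errors(h):
    """Errors of (9) and (10) for v = sin at x = 1 with Delta x_i - Delta x_{i-1} = h**2."""
    pts = (1.0 - h, 1.0, 1.0 + h * (1.0 + h))   # smoothly varying spacing
    w2, w1 = fd_weights(*pts)
    vals = [math.sin(p) for p in pts]
    return (abs(apply_stencil(w2, vals) + math.sin(1.0)),   # exact v_xx = -sin(1)
            abs(apply_stencil(w1, vals) - math.cos(1.0)))   # exact v_x = cos(1)


# Exact for quadratics on an uneven stencil: v = x**2 has v_xx = 2 and v_x(1.5) = 3.
w2, w1 = fd_weights(1.0, 1.5, 2.25)
print(apply_stencil(w2, [1.0, 2.25, 5.0625]), apply_stencil(w1, [1.0, 2.25, 5.0625]))
# Halving h reduces both errors by roughly a factor of four (second order).
print([a / b for a, b in zip(sine_errors(0.1), sine_errors(0.05))])
```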
When the coefficient of the first-order derivative is large compared to the coefficient of the second-order derivative, the above discretizations lead to matrices with positive off-diagonal entries. In this case the matrix cannot have the M-matrix property, and the resulting numerical solutions can have oscillations. This situation can be avoided by using locally one-sided differences for the first-order derivative. The drawback of this approach is that it reduces the accuracy to first order with respect to the grid step size. Nevertheless, we use this choice to ensure that the spatial discretizations lead to M-matrices and, thus, stable discretizations.
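The sign problem can be seen in a single row of the discrete Black–Scholes operator (1). In the sketch below (plain Python, hypothetical helper names), the central formula (10) produces a positive lower off-diagonal entry near x = 0, where the drift coefficient rx dominates the diffusion coefficient σ²x²/2 relative to the step size. Replacing (10) by a forward difference restores the M-matrix sign pattern; the forward direction is the right one here because the coefficient of $v_x$ in $A_{\mathrm{BS}}$ is −rx < 0, which puts −rx/Δx_i on the upper diagonal. The text does not state the exact switching rule, so using the central formula wherever it already gives nonpositive off-diagonals is our assumption. Since both stencils have weights summing to zero, every row sums to r, so nonpositive off-diagonals also imply diagonal dominance.

```python
def bs_row(x_prev, x_mid, x_next, sigma, r, one_sided=False):
    """Row (lower, diag, upper) of the discrete A_BS from (1) at x_mid.

    With one_sided=True the first derivative uses the forward difference
    (v(x_next) - v(x_mid)) / Delta x_i instead of the central formula (10).
    """
    h0, h1 = x_mid - x_prev, x_next - x_mid
    diff = 0.5 * (sigma * x_mid) ** 2        # coefficient of -v_xx in A_BS
    conv = r * x_mid                         # coefficient of -v_x in A_BS
    w2 = (2.0 / (h0 * (h0 + h1)), -2.0 / (h0 * h1), 2.0 / (h1 * (h0 + h1)))
    if one_sided:
        w1 = (0.0, -1.0 / h1, 1.0 / h1)
    else:
        w1 = (-h1 / (h0 * (h0 + h1)), (h1 - h0) / (h0 * h1), h0 / (h1 * (h0 + h1)))
    lower, diag, upper = (-diff * a - conv * b for a, b in zip(w2, w1))
    return lower, diag + r, upper


def has_m_matrix_signs(row):
    lower, diag, upper = row
    return lower <= 0.0 and upper <= 0.0 and diag > 0.0


def bs_row_auto(x_prev, x_mid, x_next, sigma, r):
    """Central (10) where it keeps the M-matrix sign pattern, otherwise one-sided."""
    row = bs_row(x_prev, x_mid, x_next, sigma, r)
    if has_m_matrix_signs(row):
        return row
    return bs_row(x_prev, x_mid, x_next, sigma, r, one_sided=True)


central = bs_row(0.0, 0.1, 0.2, sigma=0.25, r=0.1)
print(central)                                   # the lower entry is positive
print(bs_row_auto(0.0, 0.1, 0.2, sigma=0.25, r=0.1))
```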
Special care must be taken when discretizing the cross derivative $v_{xy}$ in Heston’s model if M-matrices are sought. In [IT05], a seven-point stencil leading to an M-matrix is described. With a strong correlation between the value of the asset and its volatility, there can be severe restrictions on the grid step sizes in order to obtain M-matrices and accurate discretizations.
The discretization of the integral term in the jump-diffusion model (2) leads to a full matrix; see [AO05, dFL04, MSW05], for example. It is computationally expensive to operate with the full matrix and, due to this, different fast ways of operating with it have been proposed in the above-mentioned articles. Fortunately, with Kou’s log-double-exponential f in (2), it is possible to derive recursive formulas with optimal computational complexity for evaluating quadratures for the integrals. This has been described in [Toi06], and we employ this approach in our numerical experiments.
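To see why Kou’s density allows recursive evaluation, substitute z = xy in the integral of (2) and split it at y = 1:
$$J(x) = \int_{\mathbb{R}^+} v(xy) f(y)\,dy = q\alpha_2 x^{-\alpha_2}\int_0^x v(z) z^{\alpha_2-1}\,dz + p\alpha_1 x^{\alpha_1}\int_x^\infty v(z) z^{-\alpha_1-1}\,dz.$$
Both integrals are cumulative in x, so their values at all grid points can be accumulated in one forward and one backward sweep, which costs O(m) operations instead of the O(m²) of a full matrix-vector product. The sketch below illustrates this idea only; it is not the quadrature of [Toi06]. It assumes v = 0 beyond X (as for a put with the Dirichlet condition at X), uses the trapezoidal rule in z on each grid interval, and requires α2 ≥ 1 so that $z^{\alpha_2 - 1}$ is finite at z = 0. A direct O(m²) evaluation of the same quadrature is included for checking.

```python
def kou_jump_integral(x, v, p, alpha1, alpha2):
    """Approximate J_i = int_0^inf v(x_i*y) f(y) dy at all grid points in O(m)."""
    m, q = len(x) - 1, 1.0 - p
    low = [0.0] * (m + 1)          # low[i] ~ int_0^{x_i} v(z) z^(alpha2-1) dz
    for i in range(1, m + 1):
        h = x[i] - x[i - 1]
        low[i] = low[i - 1] + 0.5 * h * (v[i - 1] * x[i - 1] ** (alpha2 - 1.0)
                                         + v[i] * x[i] ** (alpha2 - 1.0))
    up = [0.0] * (m + 1)           # up[i] ~ int_{x_i}^{X} v(z) z^(-alpha1-1) dz, i >= 1
    for i in range(m - 1, 0, -1):
        h = x[i + 1] - x[i]
        up[i] = up[i + 1] + 0.5 * h * (v[i] * x[i] ** (-alpha1 - 1.0)
                                       + v[i + 1] * x[i + 1] ** (-alpha1 - 1.0))
    J = [v[0]] + [0.0] * m         # at x = 0 the asset stays at 0 after a jump
    for i in range(1, m + 1):
        J[i] = (q * alpha2 * x[i] ** (-alpha2) * low[i]
                + p * alpha1 * x[i] ** alpha1 * up[i])
    return J


def kou_jump_integral_direct(x, v, p, alpha1, alpha2):
    """Same quadrature, but each J_i summed from scratch: O(m^2) work, for checking."""
    m, q = len(x) - 1, 1.0 - p
    J = [v[0]] + [0.0] * m
    for i in range(1, m + 1):
        low = sum(0.5 * (x[j] - x[j - 1]) * (v[j - 1] * x[j - 1] ** (alpha2 - 1.0)
                                             + v[j] * x[j] ** (alpha2 - 1.0))
                  for j in range(1, i + 1))
        up = sum(0.5 * (x[j + 1] - x[j]) * (v[j] * x[j] ** (-alpha1 - 1.0)
                                            + v[j + 1] * x[j + 1] ** (-alpha1 - 1.0))
                 for j in range(i, m))
        J[i] = q * alpha2 * x[i] ** (-alpha2) * low + p * alpha1 * x[i] ** alpha1 * up
    return J


K, X, m = 10.0, 20.0, 200
x = [X * i / m for i in range(m + 1)]
v = [max(K - xi, 0.0) for xi in x]                  # put payoff as a sample v
print(kou_jump_integral(x, v, 1.0 / 3.0, 3.0, 3.0)[100])   # J at x = 10
```

For v ≡ 1 on [0, X] (and 0 beyond), the exact value is J(x) = q + p(1 − (x/X)^{α1}), which gives a simple accuracy check on fine grids.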
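The grid point values of v are collected into a vector $\mathbf{v}$. Similarly, we define a vector $\mathbf{g}$ containing the grid point values of the payoff function g. The spatial discretization leads to a semi-discrete form of the LCP (7) given by
$$\begin{cases} \mathbf{v} \ge \mathbf{g}, \\ \mathbf{A}\mathbf{v} - \mathbf{v}_t \ge \mathbf{0}, \\ (\mathbf{A}\mathbf{v} - \mathbf{v}_t)^T(\mathbf{v} - \mathbf{g}) = 0, \end{cases} \tag{11}$$
where the matrix $\mathbf{A}$ is defined by the finite differences used and the inequalities between vectors are componentwise. The semi-discrete form with the Lagrange multiplier $\boldsymbol{\lambda}$ corresponding to (8) reads
$$\begin{cases} \mathbf{A}\mathbf{v} - \mathbf{v}_t = \boldsymbol{\lambda}, \\ \boldsymbol{\lambda} \ge \mathbf{0}, \quad \mathbf{v} \ge \mathbf{g}, \\ \boldsymbol{\lambda}^T(\mathbf{v} - \mathbf{g}) = 0, \end{cases} \tag{12}$$
where the vector $\boldsymbol{\lambda}$ contains the grid point values of the Lagrange multiplier.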
4.2 Temporal Discretization
For the temporal discretization, the time interval [0, T] is divided into subintervals defined by the times $0 = t_0 < t_1 < \cdots < t_l = T$. The vector containing the grid point values of v at $t_k$ is denoted by $\mathbf{v}^{(k)}$.
In option pricing problems, the backward time stepping usually starts from a non-smooth final value. Due to this, the time-stepping scheme should have good damping properties in order to avoid oscillations. For example, the popular Crank–Nicolson method does not have good damping properties, and it can lead to approximations with excessive oscillations. Instead, we employ the Rannacher time-stepping scheme [Ran84]. In the option pricing context, it has been analyzed recently in [GC06].
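In the Rannacher time-stepping scheme, the first few time steps are performed with the implicit Euler method, and then the Crank–Nicolson method is used. This leads to second-order accuracy and good damping properties. For the semi-discrete LCP (11), the scheme reads
$$\begin{cases} \mathbf{v}^{(k)} \ge \mathbf{g}, \\ \mathbf{B}^{(k)}\mathbf{v}^{(k)} - \mathbf{C}^{(k)}\mathbf{v}^{(k+1)} - \mathbf{f}^{(k)} \ge \mathbf{0}, \\ \left(\mathbf{B}^{(k)}\mathbf{v}^{(k)} - \mathbf{C}^{(k)}\mathbf{v}^{(k+1)} - \mathbf{f}^{(k)}\right)^T\left(\mathbf{v}^{(k)} - \mathbf{g}\right) = 0, \end{cases} \tag{13}$$
for k = l − 1, ..., 0, where
$$\mathbf{B}^{(k)} = \mathbf{I} + \theta_k \Delta t_k \mathbf{A}, \qquad \mathbf{C}^{(k)} = \mathbf{I} - (1 - \theta_k)\Delta t_k \mathbf{A}, \tag{14}$$
$\Delta t_k = t_{k+1} - t_k$, and $\mathbf{f}^{(k)}$ is due to possible non-homogeneous Dirichlet boundary conditions.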
When the first four time steps are performed with the implicit Euler method, the parameter $\theta_k$ is defined by
$$\theta_k = \begin{cases} 1, & k = l-1, \ldots, l-4, \\ \tfrac{1}{2}, & k = l-5, \ldots, 0. \end{cases} \tag{15}$$
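The temporal discretization of the semi-discrete form with the Lagrange multiplier (12) leads to
$$\begin{cases} \boldsymbol{\lambda}^{(k)} \ge \mathbf{0}, \quad \mathbf{v}^{(k)} \ge \mathbf{g}, \\ \mathbf{B}^{(k)}\mathbf{v}^{(k)} - \mathbf{C}^{(k)}\mathbf{v}^{(k+1)} - \mathbf{f}^{(k)} = \Delta t_k \boldsymbol{\lambda}^{(k)}, \\ \left(\boldsymbol{\lambda}^{(k)}\right)^T\left(\mathbf{v}^{(k)} - \mathbf{g}\right) = 0, \end{cases} \tag{16}$$
for k = l − 1, ..., 0.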
5 Operator Splitting Method
Here we describe an operator splitting method [IT04a] which approximates the solution of the LCP (16) by two fractional time steps. The first step requires the solution of a system of linear equations, and the second step updates the solution and the Lagrange multiplier to satisfy the linear complementarity conditions. The advantage of this approach is that it simplifies the solution procedure and allows any efficient method for linear systems to be used. More precisely, the steps in the operator splitting method are
$$\mathbf{B}^{(k)}\tilde{\mathbf{v}}^{(k)} = \mathbf{C}^{(k)}\mathbf{v}^{(k+1)} + \mathbf{f}^{(k)} + \Delta t_k \boldsymbol{\lambda}^{(k+1)} \tag{17}$$
and
$$\begin{cases} \mathbf{v}^{(k)} - \tilde{\mathbf{v}}^{(k)} - \Delta t_k\left(\boldsymbol{\lambda}^{(k)} - \boldsymbol{\lambda}^{(k+1)}\right) = \mathbf{0}, \\ \boldsymbol{\lambda}^{(k)} \ge \mathbf{0}, \quad \mathbf{v}^{(k)} \ge \mathbf{g}, \\ \left(\boldsymbol{\lambda}^{(k)}\right)^T\left(\mathbf{v}^{(k)} - \mathbf{g}\right) = 0. \end{cases} \tag{18}$$
The first step (17) uses the Lagrange multiplier vector $\boldsymbol{\lambda}^{(k+1)}$ from the previous time step instead of $\boldsymbol{\lambda}^{(k)}$, which decouples the linear system and the constraints. The second step has no spatial coupling, and the update can be made quickly by going through the components of the vectors $\mathbf{v}^{(k)}$ and $\boldsymbol{\lambda}^{(k)}$ one by one. Due to this, the main computational cost is the solution of the linear system in the first step (17). Under reasonable assumptions, it can be shown that the difference between the solutions of the original time stepping and the operator splitting time stepping is second order with respect to the time step size [IT04b]. Hence, the splitting does not reduce the order of accuracy compared to a second-order accurate time-stepping method such as the Rannacher scheme.
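The componentwise update in the second step (18) has a closed form. Writing $w_i = \tilde v_i - \Delta t_k \lambda^{(k+1)}_i$, the first equation of (18) gives $v_i = w_i + \Delta t_k \lambda_i$, and the remaining conditions form a scalar complementarity problem whose unique solution is $v_i = \max(w_i, g_i)$ and $\lambda_i = \max(0, (g_i - w_i)/\Delta t_k)$. The sketch below implements this in plain Python; the function name and the sample data are illustrative only.

```python
def splitting_update(v_tilde, lam_old, g, dt):
    """Second fractional step (18), solved independently for each component.

    With w_i = v~_i - dt*lam_old_i, the solution is v_i = max(w_i, g_i) and
    lam_i = max(0, (g_i - w_i)/dt); it satisfies v - v~ - dt*(lam - lam_old) = 0,
    lam >= 0, v >= g and lam_i*(v_i - g_i) = 0 for every i.
    """
    v, lam = [], []
    for vt, lo, gi in zip(v_tilde, lam_old, g):
        w = vt - dt * lo                 # value if the constraint is inactive
        if w >= gi:
            v.append(w)                  # constraint inactive: multiplier is zero
            lam.append(0.0)
        else:
            v.append(gi)                 # constraint active: project onto the payoff
            lam.append((gi - w) / dt)    # positive multiplier absorbs the difference
    return v, lam


v, lam = splitting_update([1.0, 0.5, 0.2], [0.0, 1.0, 0.0], [0.8, 0.6, 0.0], dt=0.1)
print(v, lam)   # only the second component is projected onto g
```

Branching on $w_i \ge g_i$ (rather than computing λ from the updated v) keeps the complementarity condition exact in floating-point arithmetic.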
6 Solution of Linear Systems
In each time step of the operator splitting method it is necessary to solve a system of linear equations with the matrix $\mathbf{B}$ defined in (14). Here and in the following, we omit the superscript (k) in order to simplify the notation. With the above finite difference discretization, the Black–Scholes PDE leads to a tridiagonal $\mathbf{B}$. In this case, the linear systems can be solved efficiently using the LU decomposition.
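For a tridiagonal matrix, the LU decomposition and the two triangular solves reduce to the Thomas algorithm, which needs O(m) operations. A minimal sketch in plain Python follows (the function name is ours). Pivoting is omitted; this is safe for diagonally dominant matrices such as $\mathbf{B} = \mathbf{I} + \theta\Delta t\mathbf{A}$ when $\mathbf{A}$ has nonpositive off-diagonals and nonnegative row sums, as for the M-matrix discretization described above. Since $\mathbf{B}$ only changes when $\theta_k$ or $\Delta t_k$ changes, the factors could be stored and reused; the sketch refactorizes on every call for simplicity.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by LU factorization without pivoting (Thomas algorithm).

    lower[i] multiplies x[i-1] (lower[0] is unused) and upper[i] multiplies
    x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    cp = [0.0] * n   # super-diagonal of the unit upper triangular factor
    dp = [0.0] * n   # right-hand side after forward substitution
    for i in range(n):
        den = diag[i] - (lower[i] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = upper[i] / den if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (lower[i] * dp[i - 1] if i > 0 else 0.0)) / den
    x = dp[:]        # back substitution
    for i in range(n - 2, -1, -1):
        x[i] -= cp[i] * x[i + 1]
    return x


# Matrix with 4 on the diagonal and -1 off it; the right-hand side is A @ [1, 2, 3, 4, 5].
print(solve_tridiagonal([0.0, -1, -1, -1, -1], [4.0] * 5, [-1, -1, -1, -1, 0.0],
                        [2.0, 4.0, 6.0, 8.0, 16.0]))
```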
With the jump-diffusion models, $\mathbf{B}$ is a full matrix and the use of LU decomposition would be computationally too expensive. We adopt the approach proposed in [AO05, dFV05], which is an iterative method based on a regular splitting of $\mathbf{B}$. We use the splitting
$$\mathbf{B} = \mathbf{T} - \mathbf{R}, \tag{19}$$
where $\mathbf{R}$ is the full matrix resulting from the integral term and, thus, $\mathbf{T}$ is a tridiagonal matrix defined by the other terms. The iterative method for a system $\mathbf{B}\mathbf{v} = \mathbf{b}$ then reads
$$\mathbf{v}^{j+1} = \mathbf{T}^{-1}\left(\mathbf{b} + \mathbf{R}\mathbf{v}^{j}\right), \qquad j = 0, 1, \ldots, \tag{20}$$
where the initial guess $\mathbf{v}^0$ is taken to be the solution from the previous time step. The solutions with $\mathbf{T}$, that is, the multiplications with $\mathbf{T}^{-1}$, can be computed efficiently using LU decomposition. The multiplications with $\mathbf{R}$ can be performed using the fast recursion formulas in [Toi06] when Kou’s model is used. Furthermore, it has been shown in [dFV05] that the iteration (20) converges fast. As the numerical experiments will demonstrate, two or three iterations are usually enough to obtain the solution with sufficient accuracy.
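The iteration (20) can be sketched in a few lines. In the sketch below (plain Python, illustrative names), $\mathbf{T}$ is stored by its three diagonals and $\mathbf{R}$ as a small dense matrix; for Kou’s model the products $\mathbf{R}\mathbf{v}$ would instead use the fast recursions mentioned above. The stopping test mirrors the relative residual criterion reported in Section 7.2. The test matrices are hypothetical, chosen only so that $\mathbf{T}$ is a diagonally dominant M-matrix and $\mathbf{R} \ge 0$ is small; with a zero initial guess more iterations are needed than with the previous time step’s solution.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve (LU without pivoting); lower[0] and upper[-1] are unused."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        den = diag[i] - (lower[i] * cp[i - 1] if i else 0.0)
        cp[i] = upper[i] / den if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (lower[i] * dp[i - 1] if i else 0.0)) / den
    for i in range(n - 2, -1, -1):
        dp[i] -= cp[i] * dp[i + 1]
    return dp


def tri_matvec(lower, diag, upper, v):
    n = len(v)
    return [diag[i] * v[i] + (lower[i] * v[i - 1] if i else 0.0)
            + (upper[i] * v[i + 1] if i < n - 1 else 0.0) for i in range(n)]


def dense_matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def regular_splitting_solve(lower, diag, upper, R, b, v0, rtol=1e-11, max_iter=100):
    """Iteration (20) for (T - R) v = b; returns (v, number of solves with T)."""
    v = list(v0)
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    for it in range(max_iter + 1):
        Rv = dense_matvec(R, v)
        res = [tv - rv - bi for tv, rv, bi in zip(tri_matvec(lower, diag, upper, v), Rv, b)]
        if math.sqrt(sum(e * e for e in res)) <= rtol * norm_b:   # ||B v - b|| small
            return v, it
        v = thomas(lower, diag, upper, [bi + rv for bi, rv in zip(b, Rv)])
    raise RuntimeError("iteration (20) did not converge")


n = 40
lower = [0.0] + [-0.5] * (n - 1)           # T: diagonally dominant M-matrix
diag = [2.2] * n
upper = [-0.5] * (n - 1) + [0.0]
R = [[0.1 / n * (1 + (i + j) % 3) for j in range(n)] for i in range(n)]   # R >= 0
b = [1.0] * n
v, its = regular_splitting_solve(lower, diag, upper, R, b, [0.0] * n)
print(its)
```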
With Heston’s model, $\mathbf{B}$ is a block tridiagonal matrix corresponding to a two-dimensional PDE. Furthermore, $\mathbf{B}$ is usually not well conditioned, partly due to the varying coefficients in the PDE. In order to obtain a method with optimal computational complexity, we employ a multigrid method. The analysis in [Oos03] shows that a multigrid method with an alternating direction smoother is robust with respect to all parameters in the problem and discretization. This smoother is computationally more expensive and more complicated to implement than point smoothers, but we use it as it guarantees fast multigrid convergence. The grid transfers are performed using full weighting restriction and bilinear prolongation.
7 Numerical Results
In our numerical examples, we price American put options with the parameters
$$\sigma = 0.25, \quad r = 0.1, \quad T = 0.25, \quad \text{and} \quad K = 10. \tag{21}$$
The additional parameters for Kou’s and Heston’s models are defined in the
subsequent sections. In Table 1, we have collected reference option prices for
three asset values. They are computed with very fine discretizations for the
one-dimensional models on the interval [0, 40] and the prices under Heston’s
model are from [IT06b] with y = 0.0625. Fig. 1 shows the price of the option
as a function of x computed with the different models in the interval 8.5 ≤
x ≤ 12.5.
In the following tables, all CPU times are given in milliseconds on a PC with a 3.8 GHz Intel Xeon processor, and the implementations have been made using Fortran.
7.1 Black–Scholes Model
Based on a few numerical experiments using the model parameters in (21), we observed that the truncation boundary can be chosen to be X = 2K = 20, with a truncation error so small that it does not influence the first five decimals of the prices at x = 9, 10, and 11. We define the spatial grid as
$$x_i = \left(1 + \frac{\sinh(\beta(i/m - \gamma))}{\sinh(\beta\gamma)}\right) K, \qquad i = 0, 1, \ldots, m, \tag{22}$$
Table 1. Reference prices for options with the different models
model            x = 9      x = 10     x = 11
Black–Scholes    1.030463   0.402425   0.120675
Kou              1.043796   0.429886   0.148625
Heston           1.107621   0.520030   0.213677
Fig. 1. The price of the option with respect to the value of the underlying asset for the three different models. [Figure: option price v versus asset value x for 8.5 ≤ x ≤ 12.5, with curves for the Black–Scholes, Kou, and Heston models.]
Table 2. Results for different grids with the Black–Scholes model
  l     m     error     ratio   time (ms)
 10    20   0.01056      –        0.02
 18    40   0.00208     5.1       0.06
 34    80   0.00058     3.6       0.21
 66   160   0.00022     2.7       0.79
130   320   0.00007     3.3       3.08
In (22), we have chosen β = 6 and γ = 1/2, which leads to some refinement near the strike price K. For the temporal discretization, we choose the approximation times to be
$$t_k = \frac{a^{-k/(l-2)} - 1}{a^{-1} - 1}\, T, \qquad k = 0, 1, \ldots, l - 4, \tag{23}$$
and
$$t_k = \frac{a^{-(k+l-4)/(2l-4)} - 1}{a^{-1} - 1}\, T, \qquad k = l - 3, \ldots, l. \tag{24}$$
The parameter a in (23) and (24) has been chosen to be a = 2, which leads to a mild refinement near the expiry.
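The grids (22)–(24) and the Rannacher parameters (15) are straightforward to generate. The sketch below (plain Python, illustrative names) does so for one of the space-time grids of Table 2. With γ = 1/2 the grid (22) runs from 0 to 2K and places x = K at i = m/2 when m is even. In (24) the exponent grows by 1/(2l − 4) per step instead of 1/(l − 2) as in (23), so the four steps nearest the expiry, which are the implicit Euler steps of (15), cover the same range as two steps of (23) would.

```python
import math


def space_grid(m, K, beta=6.0, gamma=0.5):
    """Spatial grid (22); points cluster around x = K, and x_m = 2K when gamma = 1/2."""
    s = math.sinh(beta * gamma)
    return [(1.0 + math.sinh(beta * (i / m - gamma)) / s) * K for i in range(m + 1)]


def time_grid(l, T, a=2.0):
    """Times (23)-(24): t_0 = 0 and t_l = T, refined towards the expiry T."""
    t = []
    for k in range(l + 1):
        e = -k / (l - 2) if k <= l - 4 else -(k + l - 4) / (2 * l - 4)
        t.append((a ** e - 1.0) / (1.0 / a - 1.0) * T)
    return t


def rannacher_theta(l):
    """theta_k from (15) for k = 0, ..., l - 1 (implicit Euler for k >= l - 4)."""
    return [1.0 if k >= l - 4 else 0.5 for k in range(l)]


x = space_grid(20, 10.0)                      # m = 20, K = 10 as in the first row of Table 2
t = time_grid(10, 0.25)                       # l = 10, T = 0.25
print(x[0], x[10], x[20])
print([round(b - a, 5) for a, b in zip(t, t[1:])])
print(rannacher_theta(10))
```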
Table 2 reports the l2 errors computed using the reference prices in Table
1 at x = 9, 10, and 11 for five different space-time grids. The ratio column
in the table gives the ratios between two successive l2 errors. The time is the
CPU time in milliseconds needed to price the options.
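To tie Sections 4–7 together, the following is a compact end-to-end sketch of the operator splitting method for the Black–Scholes American put, written in plain Python rather than the authors’ Fortran. It uses the grid (22), the times (23)–(24), the Rannacher parameters (15), the splitting steps (17)–(18), and tridiagonal solves. Several details are our own assumptions, not taken from the text: the rule for switching to a forward difference (applied where the central formula (10) would make the lower off-diagonal positive), the row at x = 0 (where (1) reduces to $v_t = rv$), and linear interpolation to read off prices between grid points. It is meant to land close to the reference values in Table 1, not to reproduce the errors in Table 2 digit for digit.

```python
import math


def thomas(lo, di, up, rhs):
    """Tridiagonal solve (LU without pivoting); lo[0] and up[-1] are unused."""
    n = len(di)
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(n):
        den = di[i] - (lo[i] * cp[i - 1] if i else 0.0)
        cp[i] = up[i] / den if i < n - 1 else 0.0
        dp[i] = (rhs[i] - (lo[i] * dp[i - 1] if i else 0.0)) / den
    for i in range(n - 2, -1, -1):
        dp[i] -= cp[i] * dp[i + 1]
    return dp


def bs_matrix(x, sigma, r):
    """Tridiagonal A_BS for the unknowns at x[0..m-1]; v(x[m]) = 0 (Dirichlet)."""
    m = len(x) - 1
    lo, di, up = [0.0] * m, [0.0] * m, [0.0] * m
    di[0] = r                                   # at x = 0, (1) reduces to v_t = r v
    for i in range(1, m):
        h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
        dif, conv = 0.5 * (sigma * x[i]) ** 2, r * x[i]
        w2 = (2 / (h0 * (h0 + h1)), -2 / (h0 * h1), 2 / (h1 * (h0 + h1)))
        w1 = (-h1 / (h0 * (h0 + h1)), (h1 - h0) / (h0 * h1), h0 / (h1 * (h0 + h1)))
        if dif * w2[0] + conv * w1[0] < 0:      # central (10) would make lo[i] > 0
            w1 = (0.0, -1 / h1, 1 / h1)         # use a forward difference instead
        lo[i], di[i], up[i] = (-dif * a - conv * b for a, b in zip(w2, w1))
        di[i] += r
    return lo, di, up


def american_put_bs(K=10.0, sigma=0.25, r=0.1, T=0.25, l=34, m=80, beta=6.0, gamma=0.5, a=2.0):
    """Operator splitting with Rannacher stepping; returns the grid and prices at t = 0."""
    s = math.sinh(beta * gamma)
    x = [(1 + math.sinh(beta * (i / m - gamma)) / s) * K for i in range(m + 1)]   # (22)
    t = []
    for k in range(l + 1):                                                       # (23)-(24)
        e = -k / (l - 2) if k <= l - 4 else -(k + l - 4) / (2 * l - 4)
        t.append((a ** e - 1) / (1 / a - 1) * T)
    lo, di, up = bs_matrix(x, sigma, r)
    g = [max(K - xi, 0.0) for xi in x[:m]]
    v, lam = g[:], [0.0] * m                    # v^(l) = g and lambda^(l) = 0
    for k in range(l - 1, -1, -1):
        dt, th = t[k + 1] - t[k], (1.0 if k >= l - 4 else 0.5)                    # (15)
        Av = [di[i] * v[i] + (lo[i] * v[i - 1] if i else 0.0)
              + (up[i] * v[i + 1] if i < m - 1 else 0.0) for i in range(m)]
        rhs = [v[i] - (1 - th) * dt * Av[i] + dt * lam[i] for i in range(m)]      # f = 0
        vt = thomas([th * dt * c for c in lo], [1 + th * dt * c for c in di],
                    [th * dt * c for c in up], rhs)                               # (17)
        w = [vt[i] - dt * lam[i] for i in range(m)]
        v = [max(wi, gi) for wi, gi in zip(w, g)]                                 # (18)
        lam = [(gi - wi) / dt if wi < gi else 0.0 for wi, gi in zip(w, g)]
    return x, v + [0.0]


def interpolate(x, v, s):
    """Piecewise linear interpolation of the grid values at the asset value s."""
    for i in range(len(x) - 1):
        if x[i] <= s <= x[i + 1]:
            wgt = (s - x[i]) / (x[i + 1] - x[i])
            return (1 - wgt) * v[i] + wgt * v[i + 1]
    raise ValueError("s lies outside the grid")


x, v = american_put_bs(l=34, m=80)
print([round(interpolate(x, v, s), 5) for s in (9.0, 10.0, 11.0)])
```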
7.2 Kou’s Jump-Diffusion Model
The parameters defining the jump probability and its distribution in Kou’s model are chosen to be
$$\alpha_1 = 3, \quad \alpha_2 = 3, \quad p = \tfrac{1}{3}, \quad \text{and} \quad \mu = 0.1. \tag{25}$$

Table 3. Results for different grids with Kou’s model
  l     m     error     ratio   iter   time (ms)
 10    20   0.01050      –      3.1      0.10
 18    40   0.00231     4.5     3.0      0.29
 34    80   0.00056     4.1     3.0      0.97
 66   160   0.00022     2.6     2.3      2.95
130   320   0.00006     3.7     2.0     10.17

Table 4. Results for different grids with Heston’s model
  l     m     n     error     ratio   iter   time (ms)
 10    20     8   0.02576      –      1.0       0.7
 18    40    16   0.00574     4.5     1.3       5.7
 34    80    32   0.00420     1.4     2.0      59.4
 66   160    64   0.00049     8.5     2.0     487.5
130   320   128   0.00012     4.1     2.0    4373.7
We have used the same space-time grids as with the Black–Scholes model. Table 3 reports the errors, their ratios, and the CPU times in milliseconds. The column iter in the table gives the average number of the iterations (20). The stopping criterion for the iterations was that the norm of the residual vector is less than $10^{-11}$ times the norm of the right-hand side vector.
7.3 Heston’s Stochastic Volatility Model
In Heston’s model, the behavior of the stochastic volatility and its correlation with the value of the asset are described by the parameters
$$\alpha = 5, \quad \beta = 0.16, \quad \gamma = 0.9, \quad \text{and} \quad \rho = 0.1. \tag{26}$$
The values of these parameters are the same as in many previous studies, including [CP99, IT07, Oos03, ZFV98]. The computational domain is truncated at X = 20 and Y = 1, as also in [Oos03, IT07], for example. We use the same non-uniform grids as in [IT05], and the parameter w in the discretization of the cross derivative (not discussed in this paper) is chosen using the formula in [IT07]. For the time stepping we use uniform time steps.
Table 4 reports the errors, their ratios, the average number of multigrid iterations, and the CPU times in milliseconds. The stopping criterion for the multigrid iterations was that the norm of the residual vector is less than $10^{-6}$ times the norm of the right-hand side vector.
8 Conclusions
We described an operator splitting method for solving linear complementarity problems (LCPs) resulting from American option pricing problems. We considered it in the case of the Black–Scholes model, Kou’s jump-diffusion model, and Heston’s stochastic volatility model for the value of the underlying asset. The numerical results demonstrated that with all these models, prices accurate enough for most purposes can be computed in a few milliseconds on a PC.
As future research, one could consider the construction of adaptive discretizations; see [AP05, LPvST07], for example. Also, the robustness and accuracy of discretizations for Heston’s model with higher correlations could be studied. A natural generalization would be to extend the methods to stochastic volatility models including jumps, like the ones in [Bat96, DPS00].
References
[AA00] L. Andersen and J. Andreasen. Jump-diffusion processes: Volatility smile fitting and numerical methods for option pricing. Rev. Deriv. Res., 4:231–262, 2000.
[AO05] A. Almendral and C. W. Oosterlee. Numerical valuation of options with jumps in the underlying. Appl. Numer. Math., 53:1–18, 2005.
[AP05] Y. Achdou and O. Pironneau. Computational methods for option pricing, volume 30 of Frontiers in Applied Mathematics. SIAM, Philadelphia, PA, 2005.
[Bat96] D. S. Bates. Jumps and stochastic volatility: Exchange rate processes implicit in Deutsche mark options. Rev. Financial Stud., 9:69–107, 1996.
[BC83] A. Brandt and C. W. Cryer. Multigrid algorithms for the solution of linear complementarity problems arising from free boundary problems. SIAM J. Sci. Statist. Comput., 4:655–684, 1983.
[BS73] F. Black and M. Scholes. The pricing of options and corporate liabilities. J. Polit. Econ., 81:637–654, 1973.
[BS77] M. J. Brennan and E. S. Schwartz. The valuation of American put options. J. Finance, 32:449–462, 1977.
[CP99] N. Clarke and K. Parrott. Multigrid for American option pricing with stochastic volatility. Appl. Math. Finance, 6:177–195, 1999.
[Cry71] C. W. Cryer. The solution of a quadratic programming problem using systematic overrelaxation. SIAM J. Control, 9:385–392, 1971.
[CT04] R. Cont and P. Tankov. Financial modelling with jump processes. Chapman & Hall/CRC, Boca Raton, FL, 2004.
[CV05] R. Cont and E. Voltchkova. A finite difference scheme for option pricing in jump diffusion and exponential Lévy models. SIAM J. Numer. Anal., 43:1596–1626, 2005.
[dFL04] Y. d’Halluin, P. A. Forsyth, and G. Labahn. A penalty method for American options with jump diffusion processes. Numer. Math., 97:321–352, 2004.
[dFV05] Y. d’Halluin, P. A. Forsyth, and K. R. Vetzal. Robust numerical methods for contingent claims under jump diffusion processes. IMA J. Numer. Anal., 25:87–112, 2005.
[DPS00] D. Duffie, J. Pan, and K. Singleton. Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68(6):1343–1376, 2000.
[Dup94] B. Dupire. Pricing with a smile. Risk, 7:18–20, 1994.
[FPS00] J.-P. Fouque, G. Papanicolaou, and K. R. Sircar. Derivatives in financial markets with stochastic volatility. Cambridge University Press, Cambridge, 2000.
[FV02] P. A. Forsyth and K. R. Vetzal. Quadratic convergence for valuing American options using a penalty method. SIAM J. Sci. Comput., 23:2095–2122, 2002.
[GC06] M. B. Giles and R. Carter. Convergence analysis of Crank-Nicolson and Rannacher time-marching. J. Comput. Finance, 9:89–112, 2006.
[Glo03] R. Glowinski. Finite element methods for incompressible viscous flow. In P. G. Ciarlet and J.-L. Lions, editors, Handbook of Numerical Analysis, Vol. IX, pages 3–1176. North-Holland, Amsterdam, 2003.
[Hes93] S. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financial Stud., 6:327–343, 1993.
[HIK03] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM J. Optim., 13:865–888, 2003.
[IK06] K. Ito and K. Kunisch. Parabolic variational inequalities: The Lagrange multiplier approach. J. Math. Pures Appl., 85:415–449, 2006.
[IT04a] S. Ikonen and J. Toivanen. Operator splitting methods for American option pricing. Appl. Math. Lett., 17:809–814, 2004.
[IT04b] S. Ikonen and J. Toivanen. Operator splitting methods for pricing American options with stochastic volatility. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing B11/2004, University of Jyväskylä, Jyväskylä, 2004.
[IT05] S. Ikonen and J. Toivanen. Componentwise splitting methods for pricing American options under stochastic volatility. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing B7/2005, University of Jyväskylä, Jyväskylä, 2005.
[IT07] S. Ikonen and J. Toivanen. Componentwise splitting methods for pricing American options under stochastic volatility. Int. J. Theor. Appl. Finance, 10(2):331–361, 2007.
[IT06b] K. Ito and J. Toivanen. Lagrange multiplier approach with optimized finite difference stencils for pricing American options under stochastic volatility. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing B6/2006, University of Jyväskylä, Jyväskylä, 2006.
[KN00] R. Kangro and R. Nicolaides. Far field boundary conditions for Black–Scholes equations. SIAM J. Numer. Anal., 38:1357–1368, 2000.
[Kou02] S. G. Kou. A jump-diffusion model for option pricing. Management Sci., 48:1086–1101, 2002.
[LPvST07] P. Lötstedt, J. Persson, L. von Sydow, and J. Tysk. Space-time adaptive finite difference method for European multi-asset options. Comput. Math. Appl., 53(8):1159–1180, 2007.
[Mer73] R. C. Merton. Theory of rational option pricing. Bell J. Econom. and Management Sci., 4:141–183, 1973.
[Mer76] R. Merton. Option pricing when underlying stock returns are discontinuous. J. Financial Econ., 3:125–144, 1976.
[MSW05] A.-M. Matache, C. Schwab, and T. P. Wihler. Fast numerical solution of parabolic integrodifferential equations with applications in finance. SIAM J. Sci. Comput., 27:369–393, 2005.
[MW86] T. A. Manteuffel and A. B. White, Jr. The numerical solution of second-order boundary value problems on nonuniform meshes. Math. Comp., 47:511–535, 1986.
[Oos03] C. W. Oosterlee. On multigrid for linear complementarity problems with application to American-style options. Electron. Trans. Numer. Anal., 15:165–185, 2003.
[Ran84] R. Rannacher. Finite element solution of diffusion problems with irregular data. Numer. Math., 43:309–327, 1984.
[RW04] C. Reisinger and G. Wittum. On multigrid for anisotropic equations and variational inequalities: pricing multi-dimensional European and American options. Comput. Vis. Sci., 7(3–4):189–197, 2004.
[Toi06] J. Toivanen. Numerical valuation of European and American options under Kou’s jump-diffusion model. Reports of the Department of Mathematical Information Technology, Series B, Scientific Computing B11/2006, University of Jyväskylä, Jyväskylä, 2006.
[TR00] D. Tavella and C. Randall. Pricing financial instruments: The finite difference method. John Wiley & Sons, Chichester, 2000.
[Wil98] P. Wilmott. Derivatives. John Wiley & Sons, Chichester, 1998.
[ZFV98] R. Zvan, P. A. Forsyth, and K. R. Vetzal. Penalty methods for American options with stochastic volatility. J. Comput. Appl. Math., 91:199–218, 1998.