©2006 Gilbert Strang

6.3 Multigrid Methods
The Jacobi and Gauss-Seidel iterations produce smooth errors. The error vector e has its high frequencies nearly removed in a few iterations. But low frequencies are reduced very slowly. Convergence requires O(N^2) iterations, which can be unacceptable. The extremely effective multigrid idea is to change to a coarser grid, on which smooth becomes rough and low frequencies act like higher frequencies.

On that coarser grid a big piece of the error is removable. We iterate only a few times before changing from fine to coarse and coarse to fine. The remarkable result is that multigrid can solve many sparse and realistic systems to high accuracy in a fixed number of iterations, not growing with n.
Multigrid is especially successful for symmetric systems. The key new ingredients are the (rectangular!) matrices R and I that change grids:

1. A restriction matrix R transfers vectors from the fine grid to the coarse grid.

2. The return step to the fine grid is by an interpolation matrix I = I_{2h}^h.

3. The original matrix A_h on the fine grid is approximated by A_{2h} = RA_hI on the coarse grid (equation (6) below).

Interpolation Iv = u produces u on the fine (h) grid from v on the coarse (2h) grid. The transferred values are u_{2j} = v_j, and the in-between values are averages:

$$
Iv \;=\; \frac{1}{2}\begin{bmatrix} 1 & & \\ 2 & & \\ 1 & 1 & \\ & 2 & \\ & 1 & 1 \\ & & 2 \\ & & 1 \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
\;=\;\begin{bmatrix} v_1/2 \\ v_1 \\ v_1/2 + v_2/2 \\ v_2 \\ v_2/2 + v_3/2 \\ v_3 \\ v_3/2 \end{bmatrix}
\;=\;\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \end{bmatrix} \qquad (1)
$$

$$
u_{2j+1} = \tfrac{1}{2}\,(v_j + v_{j+1}). \qquad (2)
$$
The odd-numbered rows of the interpolation matrix have entries 1/2 and 1/2. We almost always use grid spacings h, 2h, 4h, . . . with the convenient ratio 2. Other matrices I are possible, but linear interpolation is easy and effective. Figure 6.10a shows the new values u_{2j+1} (open circles) between the transferred values u_{2j} = v_j (solid circles).
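The interpolation matrix in (1) is easy to build and check in code. This NumPy sketch (our own helper function, not from the text) constructs I for m coarse values and applies it to a sample v:

```python
import numpy as np

def interpolation_1d(m):
    """Linear interpolation I from m coarse values to 2m+1 fine values,
    as in equations (1)-(2): u_{2j} = v_j, u_{2j+1} = (v_j + v_{j+1})/2."""
    n = 2 * m + 1                     # fine-grid interior points
    I = np.zeros((n, m))
    for j in range(m):                # coarse point j sits at fine index 2j+1 (0-based)
        I[2 * j + 1, j] = 1.0         # the transferred value equals v_j
        I[2 * j, j] += 0.5            # half of v_j goes to the left neighbor
        I[2 * j + 2, j] += 0.5        # half of v_j goes to the right neighbor
    return I

I = interpolation_1d(3)               # 7 fine values from 3 coarse values
v = np.array([2.0, 4.0, 6.0])
print(I @ v)                          # [1. 2. 3. 4. 5. 6. 3.]
```

Each column of I holds the weights 1/2, 1, 1/2 of equation (1), so a linear ramp on the coarse grid interpolates to a linear ramp on the fine grid (with the halved end values v_1/2 and v_3/2).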
[Figure 6.10: Interpolation to the h grid (7 u's); restriction to the 2h grid (3 v's). (a) Linear interpolation by u = I_{2h}^h v. (b) Restriction of u_j = sin(2πj/8) on the fine grid gives v_m = ((2+√2)/4) sin(2πm/4) on the coarse grid.]
When the v's represent smooth errors on the coarse grid (because Jacobi or Gauss-Seidel has been applied on that grid), interpolation gives a good approximation to the errors on the fine grid. A practical code can use 8 or 10 grids.
The second matrix we need is a restriction matrix R_h^{2h}. It transfers u on a fine grid to v on a coarse grid. One possibility is the one-zero injection matrix that simply copies v from the values of u at the same points on the fine grid. This ignores the odd-numbered fine-grid values u_{2j+1}. Another possibility (which we adopt) is the full weighting operator R that comes from transposing I_{2h}^h.
Fine grid h to coarse grid 2h by the restriction matrix R_h^{2h} = ½ (I_{2h}^h)^T. Full weighting Ru = v takes fine-grid u to coarse-grid v:

$$
Ru \;=\; \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 & & & & \\ & & 1 & 2 & 1 & & \\ & & & & 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \\ u_6 \\ u_7 \end{bmatrix}
\;=\;\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}. \qquad (3)
$$
The effect of this restriction matrix is shown in Figure 6.10b. We intentionally chose the special case in which u_j = sin(2πj/8) on the fine grid (open circles). Then v on the coarse grid (dark circles) is also a pure sine vector. But the frequency is doubled (a full cycle in 4 steps). So a smooth oscillation on the fine grid becomes half as smooth on the coarse grid, which is the effect we wanted.
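The frequency doubling is easy to verify numerically. This sketch (our own, using the 7-point fine grid of equation (3)) restricts u_j = sin(2πj/8) and compares against ((2+√2)/4) sin(2πm/4):

```python
import numpy as np

m, n = 3, 7                            # coarse and fine interior points, h = 1/8
I = np.zeros((n, m))                   # linear interpolation, equation (1)
for j in range(m):
    I[2*j, j], I[2*j+1, j], I[2*j+2, j] = 0.5, 1.0, 0.5
R = 0.5 * I.T                          # full weighting: rows (1 2 1)/4, equation (3)

jj = np.arange(1, n + 1)
u = np.sin(2 * np.pi * jj / 8)         # smooth sine on the fine grid
v = R @ u                              # restricted to the coarse grid

mm = np.arange(1, m + 1)
c = (2 + np.sqrt(2)) / 4               # amplitude of the restricted sine
# v is again a pure sine, with the frequency doubled (a full cycle in 4 steps)
print(np.allclose(v, c * np.sin(2 * np.pi * mm / 4)))   # True
```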
In two dimensions, interpolation sweeps horizontally along each row 2i of the fine grid:

$$
u_{2i,2j} = v_{i,j} \quad\text{and}\quad u_{2i,2j+1} = \tfrac{1}{2}(v_{i,j} + v_{i,j+1}) \quad\text{as in 1D.} \qquad (4)
$$

Now sweep vertically, up each column of the fine grid. Interpolation will keep those values (4) on even-numbered rows 2i. It will average those values to find u = I_{2D} v on the fine-grid odd-numbered rows 2i + 1:

$$
u_{2i+1,2j} = \tfrac{1}{2}(v_{i,j} + v_{i+1,j}) \quad\text{and}\quad
u_{2i+1,2j+1} = \tfrac{1}{4}(v_{i,j} + v_{i,j+1} + v_{i+1,j} + v_{i+1,j+1}). \qquad (5)
$$

The entries in the tall thin coarse-to-fine interpolation matrix I_{2D} are 1, 1/2, and 1/4.
The full weighting fine-to-coarse restriction operator R_{2D} is the transpose I_{2D}^T, multiplied by 1/4. That factor is needed (like 1/2 in one dimension) so that a constant vector of 1's will be restricted to a constant vector of 1's. (The entries along each row of the wide matrix R add to 1.) This restriction matrix has entries 1/4, 1/8, and 1/16, and each coarse-grid value v is a weighted average of nine fine-grid values u:
$$
R_{2D} = \frac{1}{4}\, I_{2D}^T: \qquad
v_{i,j} = \frac{1}{4}\,u_{2i,2j} + \frac{1}{8}\,\big(u_{2i\pm 1,\,2j} + u_{2i,\,2j\pm 1}\big) + \frac{1}{16}\,u_{2i\pm 1,\,2j\pm 1},
$$

with the nine weights centered at the fine-grid point (2i, 2j):

$$
\begin{bmatrix} 1/16 & 1/8 & 1/16 \\ 1/8 & 1/4 & 1/8 \\ 1/16 & 1/8 & 1/16 \end{bmatrix}
\qquad \text{(rows } 2i-1,\; 2i,\; 2i+1\text{)}.
$$
You can see how a sweep along each row with weights 1/4, 1/2, 1/4, followed by a sweep down each column, gives the nine coefficients in that restriction molecule. Its matrix R_{2D} is an example of a tensor product or Kronecker product kron(R, R). A 3 by 7 matrix R in one dimension becomes a 9 by 49 restriction matrix R_{2D} in two dimensions.
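The Kronecker-product construction can be checked directly. A NumPy sketch (the variable names are ours):

```python
import numpy as np

# 1D full-weighting restriction R1 (3 by 7) and interpolation I1 (7 by 3)
I1 = np.zeros((7, 3))
for j in range(3):
    I1[2*j, j], I1[2*j+1, j], I1[2*j+2, j] = 0.5, 1.0, 0.5
R1 = 0.5 * I1.T

# 2D operators are Kronecker (tensor) products of the 1D operators
R2D = np.kron(R1, R1)                  # 9 by 49 restriction
I2D = np.kron(I1, I1)                  # 49 by 9 interpolation
print(R2D.shape, I2D.shape)            # (9, 49) (49, 9)
print(np.allclose(R2D.sum(axis=1), 1)) # each row of R2D adds to 1: True
```

The distinct nonzero entries of R2D are exactly 1/4, 1/8, and 1/16, the products of the 1D weights 1/4 and 1/2 in the restriction molecule above.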
Now we can transfer vectors between grids. We are ready for the geometric multigrid method, when the geometry is based on spacings h and 2h and 4h. The idea extends to triangular elements (each triangle splits naturally into four similar triangles). The geometry can be more complicated than our model on a square. When the geometry becomes too difficult, or we are just given a matrix, we turn (in the final paragraph) to algebraic multigrid. This will imitate the multi-scale idea, but it works directly with Au = b and not with any underlying geometric grid.
Smoothing iterations on the fine grid leave an error dominated by the low frequency smooth part of the solution u. The multigrid method transfers the current residual r_h = b − Au_h to the coarse grid. We iterate a few times on that 2h grid, to approximate the coarse-grid error by E_{2h}. Then interpolate back to E_h on the fine grid, make the correction to u_h + E_h, and begin again.
This fine-coarse-fine loop is a two-grid V-cycle. We call it a v-cycle (small v). Here are the steps (remember, the error solves A_h(u − u_h) = b_h − A_h u_h = r_h):

1. Iterate a few times on A_h u = b_h on the fine grid (the smoother), reaching u_h.

2. Restrict the residual r_h = b_h − A_h u_h to the coarse grid: r_{2h} = R r_h.

3. Solve A_{2h} E_{2h} = r_{2h} (or come close to E_{2h} by 3 iterations from E = 0).

4. Interpolate E_{2h} back to E_h = I_{2h}^h E_{2h}. Add E_h to u_h.

5. Iterate a few more times on A_h u = b_h (the post-smoother), starting from the corrected u_h + E_h.
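The five steps above can be sketched in a few lines. This NumPy version is our own illustration (the function names and the choice of three sweeps of weighted Jacobi with ω = 2/3 are assumptions, though consistent with the experiments later in this section):

```python
import numpy as np

def K(n):                          # second difference matrix (2 on diagonal, -1 off)
    return 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def interp(m):                     # linear interpolation, (2m+1) by m
    I = np.zeros((2*m + 1, m))
    for j in range(m):
        I[2*j, j], I[2*j+1, j], I[2*j+2, j] = 0.5, 1.0, 0.5
    return I

def jacobi(A, u, b, sweeps=3, w=2/3):
    Dinv = 1.0 / np.diag(A)        # weighted Jacobi smoother
    for _ in range(sweeps):
        u = u + w * Dinv * (b - A @ u)
    return u

def vcycle(A, b, u):
    """One two-grid v-cycle: smooth, restrict, solve, interpolate, smooth."""
    n = len(b); m = (n - 1) // 2
    I = interp(m); R = 0.5 * I.T
    u = jacobi(A, u, b)                 # step 1: pre-smooth
    r2h = R @ (b - A @ u)               # step 2: restrict the residual
    A2h = R @ A @ I                     # coarse matrix by the RAI rule (6)
    E2h = np.linalg.solve(A2h, r2h)     # step 3: solve on the coarse grid
    u = u + I @ E2h                     # step 4: interpolate and correct
    return jacobi(A, u, b)              # step 5: post-smooth

n = 31; h = 1.0 / (n + 1)
A = K(n) / h**2
b = np.ones(n)
u = np.zeros(n)
exact = np.linalg.solve(A, b)
for k in range(5):
    u = vcycle(A, b, u)
    print(k, np.max(np.abs(u - exact)))   # error drops by a near-constant factor
```

Each pass through `vcycle` multiplies the error by roughly the same factor, independent of n; that is the behavior claimed in equation (12) below.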
Step 3 involves a fourth matrix A_{2h}, to be defined now. A_{2h} is square and it is smaller than the original A_h. In words, we want to project the larger matrix A_h onto the coarse grid. There is a natural choice! The variationally correct A_{2h} comes directly and beautifully from R and A and I:

$$
\text{The coarse grid matrix is } \; A_{2h} = R_h^{2h}\, A_h\, I_{2h}^h = RAI. \qquad (6)
$$

When the fine grid has N = 7 interior meshpoints, the matrix A_h is 7 by 7. Then the coarse grid matrix RAI is (3 by 7)(7 by 7)(7 by 3) = 3 by 3.
Example In one dimension, A = A_h might be the second difference matrix K/h^2. Our first example came from h = 1/8. Now choose h = 1/6, so that multigrid goes from five meshpoints inside 0 < x < 1 to two meshpoints (I is 5 by 2 and R is 2 by 5). The neat multiplication (we will use it again later) is RA_h = RK_5/h^2:

$$
RA \;=\; \frac{1}{4}\begin{bmatrix} 1 & 2 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 & 1 \end{bmatrix}
\,\frac{1}{h^2}\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}
\;=\; \frac{1}{(2h)^2}\begin{bmatrix} 0 & 2 & 0 & -1 & 0 \\ 0 & -1 & 0 & 2 & 0 \end{bmatrix}. \qquad (7)
$$
A natural choice for A_{2h} on the coarse grid is K_2/(2h)^2, and multigrid makes this choice:

$$
\text{Coarse grid matrix} \qquad
A_{2h} = RAI = \frac{1}{(2h)^2}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}. \qquad (8)
$$
The reader will appreciate that the I^T AI rule preserves symmetry and positive definiteness, when A has those properties. The rule arises naturally in Galerkin methods [ ], including the finite element method. Notice how the restriction operator R with the factor 1/4 automatically adjusts 1/h^2 to 1/(2h)^2.
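Both multiplications (7) and (8) can be confirmed numerically (a sketch; the variable names are ours):

```python
import numpy as np

h = 1.0 / 6                            # five fine meshpoints, two coarse
K5 = 2*np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
A = K5 / h**2

I = 0.5 * np.array([[1, 0], [2, 0], [1, 1], [0, 2], [0, 1]], dtype=float)
R = 0.5 * I.T                          # full weighting, rows (1 2 1)/4

RA = R @ A                             # equation (7)
print(RA * (2*h)**2)                   # rows [0 2 0 -1 0] and [0 -1 0 2 0]

A2h = R @ A @ I                        # equation (8): the RAI rule
K2 = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(np.allclose(A2h, K2 / (2*h)**2))   # True: A2h is K2/(2h)^2
```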
Steps 1 and 5 are necessary, but they are really outside the essential multigrid idea. The smoother is step 1, the post-smoother is step 5. Those are normal iterations for which weighted Jacobi or Gauss-Seidel is satisfactory. Each smoothing step multiplies the error by the usual

$$
\text{iteration matrix} \; = \; I - P^{-1}A. \qquad (9)
$$
The error e = u − u_h after step 4 becomes e − E. Since E_h = I E_{2h} and E_{2h} solves A_{2h}E_{2h} = r_{2h} = RA_h e, the two-grid matrix is S = I A_{2h}^{-1} R A_h = I(RAI)^{-1}RA, and the remaining error is (I − S)e. In our example with h = 1/6:

$$
A_{2h}^{-1}\, R\, A_h \;=\; \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.
$$

Now multiply by I to find S:

$$
S \;=\; I A_{2h}^{-1} R A_h \;=\;
\begin{bmatrix} 0 & 1/2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1/2 & 0 \end{bmatrix}. \qquad (11)
$$
The eigenvalues of this S are 1, 1, 0, 0, 0. If you square S, you recover S^2 = S. With its three columns of zeros, the nullspace of S contains all fine-grid vectors of the form (e_1, 0, e_3, 0, e_5). Those are vectors that don't appear on the coarse grid. If the error e had this form, then E = Se would be zero (no improvement from multigrid). But we don't expect a large component of those high frequency vectors in e, because of the smoothing. The column space of S contains column 2 = (1/2, 1, 1/2, 0, 0) and column 4 = (0, 0, 1/2, 1, 1/2). These are mixed-frequency vectors. We do expect them to appear in e, because the smoothing step didn't remove them. But these are vectors for which E = Se = e, and they are the errors that multigrid catches! After step 4 they are gone.
Let me say this in another way. Because S^2 = S, the only eigenvalues are λ = 0 and λ = 1. (If Su = λu we always have S^2 u = λ^2 u. Then S^2 = S gives λ^2 = λ.) Our example has λ = 1, 1, 0, 0, 0. The eigenvalues of I − S are 0, 0, 1, 1, 1. The eigenvectors e reveal what multigrid is doing:

E = Se = 0: In this case multigrid gives no improvement. The correction E added to u_h in step 4 is zero. In the example, this happens for errors e = (e_1, 0, e_3, 0, e_5) that are zero on the coarse grid where step 3 is working.

E = Se = e: In this case multigrid is perfect. The correction E_h added to u_h in step 4 is the whole error e_h. In the example, two eigenvectors of S for λ = 1 are e = (1, 2, 2, 2, 1) and e = (1, 2, 0, −2, −1). Those have large low-frequency components. They oscillate up and down only once and twice. They are in the column space of I. They are not perfect sines, but an important part of the low-frequency error is caught and removed. The number of independent vectors with Se = e is the number of coarse gridpoints (here 2). That measures the A_{2h} problem that step 3 deals with. It is the rank of S and also of I. The other 5 − 2 = 3 gridpoints account for the nullspace of S, where E = Se = 0 means no improvement from multigrid.
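A short check of these claims about S, for the same h = 1/6 example (our own sketch):

```python
import numpy as np

h = 1.0 / 6
K5 = 2*np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
A = K5 / h**2
I = 0.5 * np.array([[1, 0], [2, 0], [1, 1], [0, 2], [0, 1]], dtype=float)
R = 0.5 * I.T

S = I @ np.linalg.solve(R @ A @ I, R @ A)    # S = I (RAI)^{-1} RA

print(np.allclose(S @ S, S))                 # S^2 = S: a projection
print(np.round(sorted(np.linalg.eigvals(S).real), 6))  # eigenvalues 0, 0, 0, 1, 1
e = np.array([1.0, 0.0, 2.0, 0.0, 3.0])      # zero at the coarse points
print(np.allclose(S @ e, 0))                 # in the nullspace: no improvement
```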
Note: The high-frequency vectors (u_1, 0, u_3, 0, u_5) with Su = 0 are not exactly combinations of the last three discrete sines y_3, y_4, y_5. The frequencies are mixed by S, as equations (18)-(19) will clearly show. The exact statements are: column space of S = column space of I, and nullspace of S = nullspace of RA. The mixing of frequencies does not affect our main point: Iteration handles the high frequencies and multigrid handles the low frequencies.
You can see that a perfect smoother followed by perfect multigrid (exact solution at step 3) would leave no error. In reality, this will not happen. Fortunately, a careful (not so simple) analysis will show that a multigrid cycle with good smoothing can reduce the error by a constant factor ρ that is independent of h:

$$
\|\text{error after step 5}\| \;\le\; \rho\, \|\text{error before step 1}\| \quad\text{with } \rho < 1. \qquad (12)
$$

A typical value is ρ = 1/10. Compare with ρ = .99 for Jacobi alone. This is the Holy Grail of numerical analysis, to achieve a convergence factor ρ (a spectral radius of the overall iteration matrix) that does not move up to 1 as h → 0. We can achieve a given relative accuracy in a fixed number of cycles. Since each step of each cycle requires only O(n) operations on sparse problems of size n, multigrid is an O(n) algorithm. This does not change in higher dimensions.
There is a further point about the number of steps and the accuracy. The user may want the solution error e to be as small as the discretization error (when the original differential equation was replaced by Au = b). In our examples with second differences, this demands that we continue until e = O(h^2) = O(N^{-2}). In that case we need more than a fixed number of v-cycles. To reach ρ^k = O(N^{-2}) requires k = O(log N) cycles. Multigrid has an answer for this too.

Instead of repeating v-cycles, or nesting them into V-cycles or W-cycles, it is better to use full multigrid: FMG cycles are described below. Then the operation count comes back to O(n) even for this higher required accuracy e = O(h^2).
[Figure 6.11: V-cycles and W-cycles and FMG use several grids (h, 2h, 4h, 8h) several times.]
The full multigrid cycle in Figure 6.11c is asymptotically better than V or W. Full multigrid starts on the coarsest grid. The solution on the 8h grid is interpolated to provide a good initial vector u_{4h} on the 4h grid. A v-cycle between 4h and 8h improves it. Then interpolation predicts the solution on the 2h grid, and a deeper V-cycle makes it better (using 2h, 4h, 8h). Interpolation of that improved solution onto the finest grid gives an excellent start to the last and deepest V-cycle.
The operation counts for a deep V-cycle and for full multigrid are certainly greater than for a two-grid v-cycle, but only by a constant factor. That is because the count is divided by a power of 2 every time we move to a coarser grid. For a differential equation in d space dimensions, we divide by 2^d. The cost of a V-cycle (as deep as we want) is less than a fixed multiple of the v-cycle cost:

$$
\text{V-cycle cost} \;<\; \Big(1 + \frac{1}{2^d} + \frac{1}{4^d} + \cdots\Big)\,\text{v-cycle cost}
\;=\; \frac{2^d}{2^d - 1}\;\text{v-cycle cost}. \qquad (13)
$$

Full multigrid is no more than a series of inverted V-cycles, beginning on a very coarse mesh. By the same reasoning that led to (13),

$$
\text{Full multigrid cost} \;<\; \frac{2^d}{2^d - 1}\;\text{V-cycle cost}
\;<\; \Big(\frac{2^d}{2^d - 1}\Big)^2\,\text{v-cycle cost}. \qquad (14)
$$
Multigrid Matrices

For a 3-grid V-cycle, what matrix S_3 corresponds to the 2-grid v-cycle projection S = I(RAI)^{-1}RA? No smoothing is involved here. S and S_3 only project to coarser problems. By itself, S_3 will not be a good solver without smoothers.

To construct A_{4h}, replace the matrix A_{2h} = RAI on the middle grid by using a coarse restriction R_c = R_{2h}^{4h} that transfers down to the 4h grid, and an interpolation I_c = I_{4h}^{2h} that comes back to 2h:

$$
\text{Very coarse matrix} \qquad A_{4h} = R_c\, A_{2h}\, I_c. \qquad (15)
$$
If h = 1/16, that product is (3 × 7)(7 × 7)(7 × 3). The 3 by 3 problem uses A_{4h} on the 4h = 1/4 grid (with three interior unknowns). Then S_3 (again with S_3^2 = S_3) is the matrix that goes down two grids, solves the very coarse problem by A_{4h}^{-1}, and comes back up:

$$
E_3 = S_3 e = \text{Error removed} \qquad
S_3 = I_{2h}^h\, I_{4h}^{2h}\, A_{4h}^{-1}\, R_{2h}^{4h}\, R_h^{2h}\, A. \qquad (16)
$$

The error that remains after an unsmoothed V-cycle will be (I − S_3)e. On the h = 1/16 grid, there are 15 frequencies in the error e. Only the lowest 3 are (approximately) removed by S_3 e. We have only solved a 3 by 3 problem with A_{4h}. It is the smoothers, on the fine h grid and the middle 2h grid, that reduce the high and middle frequency errors in the solution.
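Equations (15)-(16) can be checked for h = 1/16 (a sketch; the helper names are ours). The two assertions of Problem 15 below, rank 3 and S_3^2 = S_3, are verified along the way:

```python
import numpy as np

def Kmat(n):                        # second difference matrix
    return 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def interp(m):                      # (2m+1) by m linear interpolation
    I = np.zeros((2*m + 1, m))
    for j in range(m):
        I[2*j, j], I[2*j+1, j], I[2*j+2, j] = 0.5, 1.0, 0.5
    return I

h = 1.0 / 16
A = Kmat(15) / h**2                 # fine grid, 15 unknowns
I1, I2 = interp(7), interp(3)       # 15 by 7 and 7 by 3
R1, R2 = 0.5 * I1.T, 0.5 * I2.T

A2h = R1 @ A @ I1                   # middle grid by the RAI rule (6)
A4h = R2 @ A2h @ I2                 # equation (15): very coarse 3 by 3 matrix

# equation (16): down two grids, solve the 3 by 3 problem, come back up
S3 = I1 @ I2 @ np.linalg.solve(A4h, R2 @ R1 @ A)
print(np.allclose(S3 @ S3, S3))                    # a projection
print(np.linalg.matrix_rank(S3))                   # rank 3
```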
Note to myself: S_{4h} = I_c A_{4h}^{-1} R_c A_{2h} = I_c A_{4h}^{-1} R_c RAI is 7 by 7, with I on the right instead of the left. Still S_{4h}^2 = S_{4h}.
Numerical Experiments

The real test of multigrid effectiveness is numerical! Its k-step approximation e_k should approach zero, and the graphs will show how quickly this happens. An initial guess u_0 includes an error e_0. Whatever iteration we use, we are trying to drive u_k to u, and e_k to zero. The multigrid method jumps between two or more grids, so as to converge more quickly, and our graphs will always show the error on the fine grid.

We can work with the equation Ae = 0, whose solution is e = 0. Following [ ], Figure 6.12a displays an initial error with low and high frequencies:

$$
(e_0)_j = \sin 4\pi j h + \sin 20\pi j h \quad\text{with } h = \tfrac{1}{32}.
$$

We draw continuous functions sin 4πx and sin 20πx rather than 31 discrete values. Figure 6.12b shows the error vector e_3 after three fine-grid sweeps of weighted Jacobi (with ω = 2/3). As expected, the high frequency component has greatly decayed. The error e_3 = (I − ωP^{-1}A)^3 e_0 is much smoother than e_0. For our second difference matrix A_h (better known as K), the Jacobi preconditioner from Section 6.2 simply has P^{-1} = ½I. We can ignore the factors h^2 that divide K and P, because they cancel.
Figure 6.12: The initial error and the error e3 after three weighted Jacobi relaxations.
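The smoothing experiment is easy to reproduce (our own sketch). The coefficient of each sine mode, before and after three sweeps, shows the high frequency collapsing while the low frequency survives:

```python
import numpy as np

n = 31; h = 1.0 / 32
j = np.arange(1, n + 1)
e = np.sin(4*np.pi*j*h) + np.sin(20*np.pi*j*h)   # low + high frequency error

K = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n) - (2/3) * 0.5 * K       # weighted Jacobi: I - w P^{-1} K, w = 2/3
e3 = np.linalg.matrix_power(M, 3) @ e # three sweeps applied to the error

# project onto the two discrete sines to read off each mode's coefficient
y4, y20 = np.sin(4*np.pi*j*h), np.sin(20*np.pi*j*h)
for y, name in [(y4, "k=4 (low) "), (y20, "k=20 (high)")]:
    before = (e  @ y) / (y @ y)
    after  = (e3 @ y) / (y @ y)
    print(name, round(before, 3), "->", round(after, 3))
```

Both coefficients start at 1; after three sweeps the k = 20 mode is essentially gone while the k = 4 mode is only mildly damped, which is exactly what Figure 6.12b shows.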
Now multigrid begins. The current fine-grid residual is r_h = A_h e_3. After restriction to the coarse grid it becomes r_{2h}. Three weighted Jacobi iterations on the coarse grid error equation A_{2h} E_{2h} = r_{2h} start with the guess E_{2h} = 0. That produces...

ADD V-CYCLE AND FMG EXPERIMENTS

Figure 6.13: The fine-grid error after 3 sweeps on the coarse grid (a) and then the fine grid (b).
Eigenvector Analysis

The reader will recognize that one matrix (like I − S for a v-cycle) can describe each multigrid method. The eigenvalues of a full multigrid matrix would be nice to know, but they are usually impractical to find. Numerical experiments build confidence. Computation also provides a diagnostic tool, to locate where convergence is stalled and a change is needed (often in the boundary conditions). If we want a predictive tool, the best is modal analysis.

The key idea is to watch the Fourier modes. In our example those are discrete sines, because of the boundary conditions u(0) = u(1) = 0. We will push this model problem all the way, to see what the multigrid matrix I − S does to those sine vectors.
The final result in (18)-(19) shows why multigrid works, and it also shows how pairs of frequencies are mixed. The eigenvectors are mixtures of two frequencies.

The frequencies that mix together are k and N + 1 − k. The discrete sines with those frequencies are y_k and Y_k:

$$
y_k = \Big(\sin\frac{k\pi}{N+1},\; \sin\frac{2k\pi}{N+1},\; \ldots\Big)
\quad\text{and}\quad
Y_k = \Big(\sin\frac{(N+1-k)\pi}{N+1},\; \sin\frac{2(N+1-k)\pi}{N+1},\; \ldots\Big).
$$

Where y_k goes up and down k times, Y_k does that N + 1 − k times.
There is something neat you have to see. Y_k and y_k have the same components except for alternating signs! Cancel N + 1 with N + 1 in those components of Y_k:

$$
Y_k = \Big(\sin\Big(\pi - \frac{k\pi}{N+1}\Big),\; \sin\Big(2\pi - \frac{2k\pi}{N+1}\Big),\; \ldots\Big)
= \Big({+}\sin\frac{k\pi}{N+1},\; {-}\sin\frac{2k\pi}{N+1},\; \ldots\Big). \qquad (17)
$$
For our model problem with second differences, we can report the result of multiplying by S = I(RAI)^{-1}RA. In fact the true multigrid matrix is I − S, to give the error e − E = (I − S)e that remains after steps 2, 3, 4. Seeing I − S is even better:

$$
(I - S)\,y_k = \tfrac{1}{2}\Big(1 - \cos\frac{k\pi}{N+1}\Big)(y_k + Y_k) \qquad (18)
$$

$$
(I - S)\,Y_k = \tfrac{1}{2}\Big(1 + \cos\frac{k\pi}{N+1}\Big)(y_k + Y_k) \qquad (19)
$$
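Equations (18)-(19) can be tested directly against S = I(RAI)^{-1}RA (our own sketch, with N = 7 and h = 1/8):

```python
import numpy as np

N = 7; h = 1.0 / (N + 1)
K = 2*np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A = K / h**2
I = np.zeros((N, 3))                   # linear interpolation, 7 by 3
for j in range(3):
    I[2*j, j], I[2*j+1, j], I[2*j+2, j] = 0.5, 1.0, 0.5
R = 0.5 * I.T
S = I @ np.linalg.solve(R @ A @ I, R @ A)

jj = np.arange(1, N + 1)
for k in [1, 2, 3]:
    yk = np.sin(jj * k * np.pi / (N + 1))
    Yk = np.sin(jj * (N + 1 - k) * np.pi / (N + 1))
    c = np.cos(k * np.pi / (N + 1))
    # the multigrid matrix I - S sends each sine into the mixture y_k + Y_k
    print(np.allclose((np.eye(N) - S) @ yk, 0.5*(1 - c)*(yk + Yk)),
          np.allclose((np.eye(N) - S) @ Yk, 0.5*(1 + c)*(yk + Yk)))
```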
Adding (18) and (19) shows that y_k + Y_k is an eigenvector of I − S with λ = 1. The complementary mixtures Z_k are eigenvectors with λ = 0:

$$
(I - S)\,Z_k = (I - S)\Big[\Big(1 + \cos\frac{k\pi}{N+1}\Big) y_k - \Big(1 - \cos\frac{k\pi}{N+1}\Big) Y_k\Big] = 0. \qquad (20)
$$

These mixtures Z_k in square brackets are completely killed (λ = 0) by multigrid. A smooth error vector e_h (after Jacobi iterations have reduced its high frequency components) has only small components along the eigenvectors y_k + Y_k. Multigrid doesn't touch those pieces (λ = 1). The larger components along the Z_k will die.
Where do (18)-(19) come from? Apply the full weighting R to the two fine-grid sines:

$$
R\,y_k^h = \Big(\tfrac{1}{2} + \tfrac{1}{2}\cos\frac{k\pi}{N+1}\Big)\, y_k^{2h}
\quad\text{and}\quad
R\,Y_k^h = -\Big(\tfrac{1}{2} - \tfrac{1}{2}\cos\frac{k\pi}{N+1}\Big)\, y_k^{2h}. \qquad (21)
$$

You see the aliasing by R. We cannot hope to decide the input y_k or Y_k from these outputs. This is normal for a short wide matrix. (The matrix R is 3 by 7 and 2 by 5 in our examples.) The coarse mesh output has only about half as many components as the fine mesh input.
The transpose of R does the opposite. Where R mixes two inputs y_k^h and Y_k^h into one output, the interpolation matrix I sends one input frequency on the coarse grid into a pair of frequencies on the fine grid:

$$
2\, I_{2h}^h\, y_k^{2h} = \Big(1 + \cos\frac{k\pi}{N+1}\Big)\, y_k^h - \Big(1 - \cos\frac{k\pi}{N+1}\Big)\, Y_k^h. \qquad (22)
$$

Interpolation of a smooth vector (low k) on a coarse grid will excite an oscillatory mode Y_k^h (high frequency N + 1 − k) on the fine grid. But those oscillations have small amplitude, because the cosine of kπ/(N+1) is near 1.
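Both (21) and (22) can be checked numerically for N = 7 (our sketch; the minus sign on RY_k follows the reconstruction above):

```python
import numpy as np

N = 7
I = np.zeros((N, 3))                       # linear interpolation, 7 by 3
for j in range(3):
    I[2*j, j], I[2*j+1, j], I[2*j+2, j] = 0.5, 1.0, 0.5
R = 0.5 * I.T                              # full weighting, 3 by 7

jj = np.arange(1, N + 1)                   # fine-grid indices
mm = np.arange(1, 4)                       # coarse-grid indices, spacing 2h
for k in [1, 2, 3]:
    c   = np.cos(k * np.pi / (N + 1))
    yk  = np.sin(jj * k * np.pi / (N + 1))
    Yk  = np.sin(jj * (N + 1 - k) * np.pi / (N + 1))
    y2h = np.sin(2 * mm * k * np.pi / (N + 1))
    # equation (21): R aliases both fine modes onto the same coarse sine
    print(np.allclose(R @ yk,  0.5*(1 + c) * y2h),
          np.allclose(R @ Yk, -0.5*(1 - c) * y2h))
    # equation (22): I sends one coarse sine into a pair of fine modes
    print(np.allclose(2 * I @ y2h, (1 + c)*yk - (1 - c)*Yk))
```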
The key formulas (18) and (19) that describe multigrid come from assembling (21) for R, (22) for I, and the known eigenvalues for A_h and A_{2h}. I think the calculation of S in (11) shows this best. Its zero columns put y_k + Y_k in its nullspace. The nonzero columns in S come from the interpolation matrix I. So Se captures a part of the whole error e. Multigrid solves a projection of the original problem.
The largest eigenvalue drops to 1/9 for M(I − S)M. The largest eigenvalue of M itself is .91; you see the value of multigrid. I hope you will try eig(M*M*(I − S)*M*M) with double smoothing (see Problems 9-12).

For a pure modal analysis we can drop the boundaries and work on an infinite grid. The Fourier modes are pure exponentials, and second differences act on each mode by a simple multiplier:

$$
(y_\omega)_n = e^{i\omega n} \qquad (23)
$$

$$
K\, y_\omega = (2 - 2\cos\omega)\, y_\omega. \qquad (24)
$$

This tells us the action of A_h = K/h^2. It also tells us about A_{2h}, when the coarse mesh changes h^2 to (2h)^2 and ω to 2ω.
The restriction matrix R still introduces aliasing. The frequencies that mix together are now ω and ω + π. Notice how increasing ω by π produces a factor e^{iπn} = (−1)^n with alternating signs. This is exactly what we saw for y_k and Y_k. This time it is y_ω − y_{ω+π} that has all zeros in its even-numbered components.

This pure Fourier analysis will go all the way to formulas for the infinite-grid (I − S)y_ω and (I − S)y_{ω+π}, just like equations (18) and (19). In the finite case, those equations explained why multigrid succeeds. They do the same in the infinite case.
Let me emphasize why this pure modal analysis (with no boundaries and constant coefficients) was mentioned. It allows Fourier to work freely. The differential equation −div(c(x, y) grad u) = f(x, y) on a general region would lead to giant complications in the eigenvectors for multigrid; it would be impossible to find them. But if we fix c = constant and ignore boundaries, those interior eigenvectors return to simple combinations of y_ω and y_{ω+π}. That leaves difficulties associated with the boundary conditions, which Fourier doesn't easily resolve.
Algebraic Multigrid

We close this section with a few words about algebraic multigrid, when the problem comes as a system of equations Au = b. There is no grid in the background. We need to find replacements for the key ideas on which geometric multigrid was based: smooth vectors, connected nodes, coarse meshes. Replacing the first two is fairly easy; rescaling A_h to A_{2h} on a (nonexistent) coarse mesh is less clear.

1. Smooth vectors. In geometric multigrid these are vectors for which the norms of u and Au are comparable. That will be our indication of a smooth vector. High frequencies in u would be greatly amplified by A (just as the second derivative amplifies sin kt by k^2). For the low frequencies, multigrid works on smooth errors where Jacobi doesn't.
2. Connected nodes. A significant entry A_ij plays the role of an edge between neighboring nodes i and j; strength of connection replaces geometric distance.

3. Coarse subset of nodes. Each significant entry A_ij indicates that the value of u_j strongly influences the value of u_i. Probably the errors e_j and e_i are comparable, when the error is smooth. We don't need both i and j in the coarse set C of nodes. On a grid they would be neighbors, not both in C. But if i is not in C, then every j that strongly influences i should either be in C or be strongly influenced by another node that is in C. This heuristic rule is discussed more fully in [ ], with an algorithm for constructing C. (In general, too many coarse unknowns are better than too few.)
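The heuristic rule for choosing C can be illustrated by a tiny greedy pass. This is a simplified sketch of our own, not the algorithm referenced above; the threshold θ and the index-order sweep are assumptions:

```python
import numpy as np

def coarse_set(A, theta=0.25):
    """Greedy sketch of the coarsening heuristic: j strongly influences i
    when |A[i,j]| >= theta * max over k != i of |A[i,k]|. Every fine node
    keeps a strong connection into the coarse set C."""
    n = A.shape[0]
    strong = []
    for i in range(n):
        cut = theta * max(abs(A[i, k]) for k in range(n) if k != i)
        strong.append({j for j in range(n) if j != i and abs(A[i, j]) >= cut})
    C = set()
    for i in range(n):              # greedy pass in index order
        if not (strong[i] & C):     # no strong connection to C yet: add i
            C.add(i)
    return sorted(C)

K = 2*np.eye(7) - np.eye(7, k=1) - np.eye(7, k=-1)
print(coarse_set(K))                # alternate nodes, as on a 1D grid
```

On the second difference matrix the strong connections are exactly the grid neighbors, so the greedy pass recovers every-other-node coarsening, matching the geometric 2h grid.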
is simpler). Fluids do have fine scales, so that multigrid becomes a natural idea, not only numerically but physically. Of course fluids don't generally present symmetric matrices, because of the convective terms, and finite element methods may require upwind adjustments. The analysis of multigrid convergence becomes harder for fluids, just when a multiscale approach becomes attractive.
15 Use A_{4h} from Problem 14 to compute the projection matrix S_3 in (16). Verify that this 15 by 15 matrix has rank 3, and that (S_3)^2 = S_3.

16 The weighted Jacobi smoothers are M_h = I_{15} − (1/3)K_{15} and M_{2h} = I_7 − (1/3)K_7. With MATLAB, compute the smoothed error reduction matrix V_3 and its eigenvalues: