Adjoint Sensitivity Analysis For Differential-Algebraic Equations: The Adjoint Dae System and Its Numerical Solution
Adjoint Sensitivity Analysis For Differential-Algebraic Equations: The Adjoint Dae System and Its Numerical Solution
Adjoint Sensitivity Analysis For Differential-Algebraic Equations: The Adjoint Dae System and Its Numerical Solution
1076
1077
G(x, p) =
g(x, t, p)dt,
Ax + Bx = 0,
where A and B are suciently smooth matrix functions, the adjoint or dual DAE is
given by
(4)
(A ) B = 0,
where denotes the conjugate transpose and denotes the time derivative. In related works [3, 4, 5, 7, 6], Balla and M
arz have derived the adjoint system for index-1
DAEs and showed its solvability and index-1 structure under minimal conditions of
smoothness. When the adjoint method is used for computation of the sensitivities, the
adjoint system (4) may become inhomogeneous and have dierent initial conditions,
depending on the form of the objective functions. Our contributions in the present
paper are rst to give a simple derivation using variational methods of the adjoint
system for general nonlinear DAEs and to show how to use it in sensitivity analysis. Then we present conditions for the consistent initialization of the adjoint DAE
system for DAEs of index up to two (Hessenberg). Finally, we show for linear DAEs
that if the original DAE is stable, then the adjoint DAE (for semi-explicit DAEs) or
an augmented adjoint DAE (for fully implicit DAEs) is stable, and that numerical
stability is maintained for the adjoint DAE or the augmented adjoint DAE.
We note that M
arz [24] has shown that for well-formulated DAE systems (these
DAE systems must be written in a specic form), the operations of adjoint and discretization commute. For such systems, stability of the adjoint and its discretization
1078
is maintained, hence there is no need for an augmented adjoint DAE. There are many
computational advantages for such DAE systems, hence we advocate rewriting the
system into this form whenever possible. However, for large nonlinear problems from
scientic computing this may not always be possible or convenient. Thus we believe
it is important also to consider numerical methods for DAEs of the more general form
(1).
The outline of this paper is as follows. In section 2 we derive the adjoint system by
a variational method, along with conditions for its consistent initialization for index-0
and index-1 DAEs. In section 3 we give some examples and derive the conditions for
the consistent initialization of the adjoint system for Hessenberg index-2 DAEs. In
section 4 we investigate the stability of the adjoint system and show how to formulate
an augmented adjoint system for which stability is maintained even for fully implicit
DAE systems. In section 5 we address the numerical stability of the adjoint and
augmented adjoint DAE systems.
2. Derivation of the adjoint system for sensitivity.
2.1. Sensitivity of G(x, p). We focus rst on solving the sensitivity problem
for G(x, p) dened by (2). Introducing a Lagrange multiplier , we form the augmented objective function
T
I(x, p) = G(x, p)
F (x, x,
t, p)dt.
0
Since F (x, x,
t, p) = 0, the sensitivity of G with respect to p is
T
T
dG
dI
=
=
(5)
(gp + gx xp )dt
(Fp + Fx xp + Fx x p )dt,
dp
dp
0
0
where subscripts on functions such as F or g are used to denote partial derivatives.
By integration by parts, we have
T
T
T
Fx x p dt = ( Fx xp )|0
( Fx ) xp dt.
0
(7)
we obtain
(8)
dG
=
dp
0
(gp Fp ) dt ( Fx xp )|T0 .
Fx |t=T = 0,
(10)
1079
dG
dp
(gp Fp ) dt + ( Fx xp )|t=0 .
This choice will not suce for a Hessenberg index-2 DAE system. Consider the system
d
x = f 1 (xd , xa , p),
(11)
0 = f 2 (xd , p),
where xd and xa denote the dierential and algebraic solution components, respecf 1
f 1
f 2
tively, A = x
d , B = xa , C = xd , and CB invertible. Here the adjoint system is
given by
+ d A + a C = g d ,
d
x
(12)
d B = gxa ,
where d and a denote the dierential and algebraic adjoint variables, respectively.
If the initial conditions are set as in (9), then d (T ) = 0, which may be in conict
with the constraint equations of (12) if g(x, p) depends explicitly on xa . To resolve
this potential conict, we require
d (T ) = C|t=T ,
(13)
(14)
(15)
(16)
gp + d fp1 + a fp2 dt + (d xdp )|t=0 + gxa (CB)1 fp2 .
dg
dp
d dG
dT dp
dg
= (gp Fp )(T )
dp
0
T Fp dt + (T Fx xp )|t=0
d( Fx xp )
,
dT
dg
dp .
1080
where T denotes T
. For index-0 and index-1 DAEs,
Hessenberg index-2 DAE system of the form (11),
d( Fx xp )|t=T
dT
= 0. For a
d(gxa (CB)1 fp2 )
d( Fx xp )|t=T
=
.
dT
dt
t=T
The corresponding adjoint equations are
(18)
(T Fx ) T Fx = 0.
For index-0 and index-1 DAEs (as shown above, the index-2 case is dierent), to nd
the boundary condition for this equation we write as (t, T ) because it depends on
both t and T . Then
(T, T )Fx |t=T = 0.
Taking the total derivative, we obtain
(t + T ) (T, T )Fx |t=T + (T, T )
dFx
= 0.
dt
+ Fx
(T Fx )|t=T = (T, T )
(19)
.
dt
t=T
For the index-one DAE case, (7) and (19) yield
(20)
For the regular implicit ODE case, Fx is invertible; thus we have (T, T ) = 0, which
).
leads to T (T ) = (T
We will discuss the appropriate boundary condition for index-2 Hessenberg DAEs
in the next section.
3. Examples. For simplicity of presentation, throughout this section we assume
that the only dependency of the dierential equations on the parameters p is through
the initial conditions and that the objective function g does not depend explicitly on
p. This is often the case in applications. We also assume that the initial conditions
are consistent with the algebraic constraints (including any hidden constraints) in a
neighborhood of the nominal values of the parameters.
3.1. Standard form ODE. Given the ODE initial value problem
x = f (x, t),
x(0) = x0 (p)
and the function g(x) at t = T , the corresponding adjoint system is
T = fx T ,
T (T ) = gx
dx0
dp .
1081
F (x,
x, t) = 0,
x(0) = x0 (p),
F
with A = F
x nonsingular, B = x , and the function g(x) at t = T , the corresponding
adjoint system is
(A T ) B T = 0,
A T (T ) = gx
dg
dp
= T (0)A(0)x0p .
x = f 1 (xd , xa , t),
0 = f 2 (xd , xa ),
d
x (0) = xd0 (p),
1
f
f
f
f
d
with A = x
d , B = xa , C = xd , D = xa nonsingular, and the function g(x ) at
t = T , the corresponding adjoint system is
dT = A dT C aT ,
0 = B dT + D aT ,
d
T (T ) = gxd
dxd
d
0
and the sensitivity is given by dg
dp = T (0) dp .
d
If the function g depends on both x and xa , g = g(xd (T ), xa (T )), the adjoint
equations are the same as above, but the boundary condition is now given by
dT (T ) = gxd C (D )1 gxa .
3.4. Hessenberg index-2 DAE. Given
x d = f 1 (xd , xa , t),
0 = f 2 (xd ),
d
x (0) = xd0 (p),
1
f
f
f
with A = x
d , B = xa , C = xd , and CB invertible, and given that the function
g(x, T, p) depends only on the dierential components xd of x, the corresponding
adjoint system is
dT = A dT C aT ,
0 = B dT ,
d
T (T ) = (I C (B C )1 B )gxd
dxd
d
0
and the sensitivity is given by dg
dp = T (0) dp .
d
a
If g(x, T, p) depends on both x and x , we must solve for the boundary condition
rst, from
d + A d + C a = gxd ,
(21)
0 = B d + gxa ,
d (T ) = C (B C )1 gxa .
1082
dT = A dT C aT ,
0 = B dT .
so that
Td (T, T )
dgxa
d
= |t=T
(CB)1 C.
dt t=T
dT (T, T ) = (I C (B C )1 B ) gxd |t=T + A d (T, T ) ,
dT ) (T, T )
d((CB)1 C)
dgxa
1
a |t=T
=
(CB)
C
g
x
dt t=T
dt
dT |t=T = d C (B C )1
dgxa
d(C (B C )1 )
g
.
xa
dt t=T
dt
t=T
From
d(C (B C )1 )
dC 1
1 dB
dC
=
(B C ) C (B C )
C +B
(B C )1 ,
dt
dt
dt
dt
we obtain
dT |t=T
dB 1
dC 1
d
1 dgxa
C (B C ) gxa P
(B C ) gxa .
= C (B C )
dt
dt
dt
d
T (T, T ) = P gxd + A (T, T )
(B C ) gxa .
dt
1083
1
et x + et x = 0.
2
1
et et + et = 0,
2
which is equivalent to
1
+ = 0.
2
(24)
Note that we solve the adjoint system in the backward direction. Thus the adjoint
system (24) is unstable.
= F , we can form the augmented adjoint system for (7):
Denoting
x
(25)
F = g ,
x
x
F = 0.
satises
If we solve the augmented adjoint system (25) instead of (24),
(26)
1
= 0,
which is stable in the backward direction. We will show that in general the aug is stable if the original system is stable. Thus in the
mented adjoint system (25) for
implementation [12] we solve (25) instead of (7).
We show rst that the adjoint system (7) for explicit ODE, semi-explicit index-1
DAE, and Hessenberg index-2 DAE systems preserves the stability. In these cases,
Then by bounded transformation, we show
there is no dierence between and .
is also preserved for
that the stability of the augmented adjoint system (25) for
implicit ODE and general index-1 DAEs.
4.1. Explicit ODE. For a linear ODE x = A(t)x, the corresponding homogeneous adjoint system is
= A .
Since the adjoint system is solved backward, we can do a change of variable = T t.
The adjoint system, now to be solved forward, is transformed to
= A .
Then we have the following well-known result.
1084
Theorem 4.1. If the ODE system x = A(t)x is stable, the adjoint ODE system
is also stable.
Proof. If A is constant, the result follows from the fact that A and A have the
same eigenvalues. If A is not constant, we look at the Greens function. Let X and
be the fundamental solutions of the original ODE and the adjoint system, respectively.
The original system is stable if and only if X(s)X 1 (t) is bounded for s > t. The
adjoint system is stable if and only if (s)1 (t) is bounded for s > t.
Let Z(t) = X (T t)(t). Then
= X A + X A = 0.
Z = X (T t)(t) + X (T t)(t)
Thus Z(t) is constant, so that
X (T t)(t) = X (T s)(s).
Then
(s)1 (t) = X(T t)X 1 (T s) .
Finally, note that s > t leads to T t > T s.
4.2. Semi-explicit index-1 DAE. Consider the linear semi-explicit index-1
DAE system
d
x = A(t)xd + B(t)xa ,
0 = C(t)xd + D(t)xa ,
where D(t) is invertible. The corresponding adjoint system is
d
= A d C a ,
0 = B d + D a .
The original system is stable if and only if the essential underlying ODE (EUODE)
x d = Axd B(D)1 Cxd is stable [1]. The corresponding EUODE for the adjoint
system is given by d = A d + C (D )1 B d , or d = (A B(D)1 C) d .
Thus the EUODE of the adjoint system is the adjoint of the EUODE of the original
system. We know from Theorem 4.1 that the EUODE of the adjoint system is stable.
Thus the adjoint system is also stable.
4.3. Hessenberg index-2 DAE. Consider the linear Hessenberg index-2 DAE
system
d
x = A(t)xd + B(t)xa + q(t),
0 = C(t)xd + r(t),
where C(t)B(t) is nonsingular. The corresponding EUODE is derived in [1] as follows. If B is suciently smooth, there exists a smooth bounded matrix function
R(t) R(nd na )nd whose linearly independent, normalized rows form a basis for the
nullspace of B . Here nd is the dimension of the dierential variables xd and na is
the dimension of the algebraic variables xa . Thus R(t)B(t) = 0 and
R
is invertible.
C
1085
(27)
0 t T,
xd is given by
(28)
x =
R
C
1
u
r
= Su F r,
RS = I,
CS = 0.
)r(t) + Rq(t).
u = (RAS + RS)u
(RAF RF
The original system is stable for the dierential variables if the EUODE (30) is stable.
Now consider the adjoint system
d
= A (t)d C (t)a + q(t),
0 = B (t)d + r(t).
Since CS = 0, we have S C = 0. Multiplying the adjoint system by S (t), we obtain
(31)
S = S A (t) + S q(t).
Letting v = S , we have
v
S
= R
(32)
d =
r
B
C (B C )1
.
v = S R v S C (B C )1 r S A R v + S A C (B C )1 r.
+ RAS) v.
v = (S R + S A R )v = (RS
We can see that the EUODE of the adjoint system is the same as the adjoint of
the EUODE of the original system. Thus from Theorem 4.1 we know that the adjoint system for a stable linear Hessenberg index-2 DAE is stable for the dierential
variables.
4.4. Fully implicit DAE. For implicit ODE and index-1 DAEs, we have already seen from the example (24) that the adjoint DAE (7) for may be unstable.
preserves the stability.
But we will show that the augmented adjoint system (25) for
To prove that, we rst refer to a lemma from Campbell, Bichols, and Terrel [11] that
adjoint commutes with any nonsingular linear transformation.
Lemma 4.2. Given the time-dependent linear DAE system
(35)
1086
in the adjoint (7) may not be bounded but the transformation in the augmented
will be. Thus we have the following theorem.
adjoint system (25) for
Theorem 4.3. For general index-0 and index-1 DAE systems, if the original
is stable.
DAE system is stable, the augmented adjoint DAE system (25) for
Proof. Using the smooth SVD [13] of A(t), we can construct orthogonal matrices
Q and P so that
0
(36)
P (t)A(t)Q(t) =
,
0 0
1
0
P (t), we have
where = diag{1 , . . . , k }, i = 0. Dening P (t) =
0
I
I 0
(37)
P (t)A(t)Q(t) =
.
0 0
Let x = Q
x. Since Q is orthogonal, if the original system is stable, the transformed
DAE system is also stable. From the conclusion for the explicit cases, we know that
the adjoint of the transformed DAE system is also stable. Now, instead of letting
we let
= A = (P A) .
Since
= P ,
I 0
PA =
Q ,
0 0
is
it follows that P A is bounded. Thus the augmented adjoint system (25) for
stable.
Remark. From the above proof, we can see that if P is bounded, is bounded.
P will be bounded if the pseudoinverse of A is bounded.
5. Numerical stability. In this section we consider numerical stability for the
adjoint system. For systems of index-0 or index-1, we consider the general linear DAE
(38)
A(t)x(t)
+ B(t)x(t) = 0.
(A ) B = 0.
1087
An+1 xn .
n = [An + hBn ]
An+1 n+1 .
Thus we have
1
1
1
0 = N AN CN (A0 + hB0 )1 .
Thus it follows for all linear ODEs and DAEs that asymptotic numerical stability
of the backward Euler method for the original system implies asymptotic numerical
stability for the adjoint system.
1088
Note that the backward Euler discretization (42) which has been assumed in the
stability analysis is not the same as the discretization which would normally be used
in software like DASPK3.0 for solving the adjoint system (39). The dierence occurs
for systems that are fully implicit. In order to solve (39) using DASPK3.0, we would
A + A B = 0.
Discretization by the backward Euler method would then yield (solving backward)
n+1 n
+ An n Bn n = 0,
An
(47)
h
which diers from the discretization (42) and in general does not preserve the asymptotic numerical stability of the forward system.
Consider example (22), whose backward Euler discretization leads to
(48)
xn+1 =
1
1+ h
2
1
xn ,
1
et et + et = 0.
2
n =
1
1 h
2
1
n+1 ,
which is unstable for any 0 < h < 4. Thus this discretization is not stable using
stepsizes for which the discretization of the forward problem is stable.
The discretization (42), which as we have seen preserves the numerical stability,
can be achieved in DAE software by solving the augmented adjoint system (25).
Acknowledgments. The authors would like to thank Steve Lee, Peter Brown,
Keith Grant, Luc Machiels, and Alan Hindmarsh of Lawrence Livermore National
Laboratory for bringing this problem to our attention and for several interesting
discussions.
REFERENCES
[1] U. M. Ascher and L. R. Petzold, Stability of computational methods for constrained dynamics systems, SIAM J. Sci. Comput., 14 (1993), pp. 95120.
[2] U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary Dierential Equations
and Dierential-Algebraic Equations, SIAM, Philadelphia, 1998.
[3] K. Balla, Linear subspaces for linear DAEs of index 1, Comput. Math. Appl., 32 (1996),
pp. 8186.
[4] K. Balla, Boundary conditions and their transfer for dierential algebraic equations of index
1, Comput. Math. Appl., 31 (1996), pp. 15.
1089