
Prof. L. Guzzella
Prof. R. D’Andrea

Midterm Examination November 12th, 2008

Dynamic Programming & Optimal Control (151-0563-00) Prof. R. D’Andrea

Solutions

Exam Duration: 150 minutes

Number of Problems: 4 (25% each)

Permitted aids:

• Textbook Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages.
• Your written notes.
• No calculators.

Important: Use only these prepared sheets for your solutions.



Problem 1 (25%)

[Figure 1: a directed graph with start node S, terminal node T, and intermediate nodes 1–10; the arc lengths used in the solution below are shown on the arcs.]

Find the shortest path from node S to node T for the graph given in Figure 1. Apply the label
correcting method. Use best-first search to determine at each iteration which node to remove
from OPEN; that is, remove node i with

di = min_{j ∈ OPEN} dj,

where the variable di denotes the length of the shortest path from node S to node i that has
been found so far.
Solve the problem by populating a table of the following form:

Iteration | Node exiting OPEN | OPEN | dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 | dT = UPPER
...

Solution 1

Iteration | Node exiting OPEN | OPEN  | dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 | dT = UPPER
    0     | -                 | S     | 0  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    1     | S                 | 1,2,3 | 0  3  1  3  ∞  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    2     | 2                 | 1,3,4 | 0  2  1  3  5  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    3     | 1                 | 3,4   | 0  2  1  3  4  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    4     | 3                 | 4,5   | 0  2  1  3  4  8  ∞  ∞  ∞  ∞  ∞   | ∞
    5     | 4                 | 5,6   | 0  2  1  3  4  7  5  ∞  ∞  ∞  ∞   | ∞
    6     | 6                 | 5,9   | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 9
    7     | 9                 | 5     | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 7
    8     | 5                 | -     | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 7

The shortest path is S → 2 → 1 → 4 → 6 → 9 → T with a total length of 7.
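The iterations above can be reproduced with a minimal Python sketch of the label correcting method with best-first node selection. Since Figure 1 is not reproduced here, the arc list below is an assumption inferred from the solution table; arcs incident to nodes 7, 8, and 10 are omitted because those nodes never receive a finite label.

```python
import math

# Arc list (i -> j, length) inferred from the solution table above; Figure 1
# is not fully reproducible, so treat these arcs as an assumption.
graph = {
    'S': [('1', 3), ('2', 1), ('3', 3)],
    '1': [('4', 2)],
    '2': [('1', 1), ('4', 4)],
    '3': [('5', 5)],
    '4': [('5', 3), ('6', 1)],
    '5': [],
    '6': [('9', 1), ('T', 4)],
    '9': [('T', 1)],
}

def label_correcting(graph, start, target):
    nodes = set(graph) | {j for arcs in graph.values() for j, _ in arcs}
    d = {n: math.inf for n in nodes}   # shortest path length found so far
    d[start], upper = 0, math.inf      # UPPER = best complete path to target
    open_set = {start}
    while open_set:
        # Best-first search: remove the node with the smallest label.
        i = min(open_set, key=d.get)
        open_set.remove(i)
        for j, length in graph.get(i, []):
            # Label correcting test: accept only strict improvements that
            # can still beat the best known complete path.
            if d[i] + length < min(d[j], upper):
                d[j] = d[i] + length
                if j == target:
                    upper = d[j]       # the target never enters OPEN
                else:
                    open_set.add(j)
    return upper

print(label_correcting(graph, 'S', 'T'))  # prints 7
```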



Problem 2 (25%)

Consider the dynamic system

xk+1 = (1 − a) wk + a uk, 0 ≤ a ≤ 1, k = 0, 1,

with initial state x0 = −1. The cost function, to be minimized, is given by


E_{w0,w1} [ x2² + Σ_{k=0}^{1} ( xk² + uk² + wk² ) ].

The disturbance wk takes the values 0 and 1. If xk ≥ 0, both values have equal probability. If
xk < 0, the disturbance wk is 0 with probability 1. The control uk is constrained by

0 ≤ uk ≤ 1, k = 0, 1.

Apply the Dynamic Programming algorithm to find the optimal control policy and the optimal cost J0(−1).

Solution 2

The optimal control problem is considered over a time horizon N = 2 and the cost, to be
minimized, is defined by

g2(x2) = x2² and gk(xk, uk, wk) = xk² + uk² + wk², k = 0, 1.

The DP algorithm proceeds as follows:

2nd stage:

J2(x2) = x2²

1st stage:
J1(x1) = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + J2(x2) }
       = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + J2((1 − a) w1 + a u1) }
       = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + ((1 − a) w1 + a u1)² }

Distinguish two cases: x1 ≥ 0 and x1 < 0.

I) x1 ≥ 0:

J1(x1) = min_{0≤u1≤1} { x1² + u1² + ½ [ 1 + ((1 − a) + a u1)² ] + ½ [ 0 + ((1 − a)·0 + a u1)² ] }
       = min_{0≤u1≤1} L(x1, u1),

where L(x1, u1) denotes the expression in braces.

Find the minimizing ū1 by


∂L/∂u1 |_{ū1} = (1 − a) a + 2 (1 + a²) ū1 = 0  ⇔  ū1 = −a (1 − a) / (2 (1 + a²)) ≤ 0.

Recall that the feasible set of inputs u1 is given by 0 ≤ u1 ≤ 1.

However, L(x1, u1) is convex in u1, since

∂²L/∂u1² = 2 (1 + a²) > 0.

Because the unconstrained minimizer ū1 lies at or below zero, outside the interior of the feasible set, L is non-decreasing in u1 on [0, 1], and the constrained optimum is attained at the lower boundary:

⇒ u1∗ = µ1∗(x1) = 0 ∀ x1 ≥ 0.



II) x1 < 0:

J1(x1) = min_{0≤u1≤1} { x1² + (1 + a²) u1² } = min_{0≤u1≤1} L(x1, u1),

with L(x1, u1) again denoting the expression in braces.

Find the minimizing ū1 by


∂L/∂u1 |_{ū1} = 2 (1 + a²) ū1 = 0  ⇔  ū1 = 0.

Since the sufficient condition for a local minimum, ∂²L/∂u1² |_{ū1} > 0, holds and ū1 = 0 is feasible, the optimal control is

⇒ u1∗ = µ1∗(x1) = 0 ∀ x1 < 0.

0th stage:
J0(−1) = min_{0≤u0≤1} E_{w0} { (−1)² + u0² + w0² + J1((1 − a) w0 + a u0) }

Since x0 = −1 < 0, the disturbance w0 is 0 with probability 1, and we get


J0(−1) = min_{0≤u0≤1} { 1 + u0² + J1(a u0) } = min_{0≤u0≤1} L(x0, u0),

where a u0 ≥ 0 and L(x0, u0) denotes the expression in braces. From the results above, the optimal cost-to-go function for x1 ≥ 0 is

J1(x1) = ½ + ½ (1 − a)² + x1².
Finally, the minimizing ū0 results from
∂L/∂u0 |_{ū0} = 2 ū0 + 2 a² ū0 = 0  ⇔  ū0 = 0.
Since ∂²L/∂u0² |_{ū0} > 0, the optimal control u0∗ is

⇒ u0∗ = µ0∗(−1) = 0.

With this, the optimal cost reads as

J0(−1) = 3/2 + ½ (1 − a)².

In brief, the optimal control policy is to always set the input to zero. This can also be seen directly from the problem data: the control enters the cost through uk², and since xk+1 = (1 − a) wk + a uk ≥ 0, any uk > 0 only moves the next state further from zero and increases the expected cost.
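As a quick numerical sanity check, not part of the original solution, here is a short Python sketch that runs the two-stage DP recursion by brute force over a grid of feasible controls and compares J0(−1) against the closed-form value 3/2 + ½ (1 − a)² for a few values of a.

```python
import numpy as np

def J0_numeric(a, n=501):
    """Brute-force DP for the two-stage problem; returns J0(-1)."""
    u = np.linspace(0.0, 1.0, n)          # feasible controls, 0 <= u <= 1

    def p_w1(x):                          # P(w = 1) given the current state
        return 0.5 if x >= 0 else 0.0

    def J1(x1):                           # cost-to-go at stage 1
        p = p_w1(x1)
        cost = np.zeros_like(u)
        for w, pr in ((0.0, 1.0 - p), (1.0, p)):
            x2 = (1.0 - a) * w + a * u    # x2 for each candidate u1
            cost += pr * (x1**2 + u**2 + w**2 + x2**2)
        return cost.min()

    x0 = -1.0                             # x0 < 0, so w0 = 0 with probability 1
    x1 = a * u                            # next state for each candidate u0
    cost = x0**2 + u**2 + np.array([J1(x) for x in x1])
    return cost.min()

for a in (0.0, 0.25, 0.5, 1.0):
    print(a, J0_numeric(a), 1.5 + 0.5 * (1.0 - a)**2)  # the two values agree
```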

Problem 3 (25%)

[Figure 2: a unit mass on a frictionless surface, driven by a force u from z = 0 at time t = 0 to z = 1 at time t = 1.]

At time t = 0, a unit mass is at rest at location z = 0. The mass is on a frictionless surface and
it is desired to apply a force u(t), 0 ≤ t ≤ 1, such that at time t = 1, the mass is at location
z = 1 and again at rest. In particular,

z̈(t) = u(t), 0 ≤ t ≤ 1, (1)

with initial and terminal conditions:

z(0) = 0, ż(0) = 0,
z(1) = 1, ż(1) = 0.

Of all the functions u(t) that achieve the above objective, find the one that minimizes
½ ∫_0^1 u²(t) dt.

Hint: The state for this system is x(t) = [ x1(t), x2(t) ]^T, where x1(t) = z(t) and x2(t) = ż(t).

Solution 3

Introduce the state vector

x = [ x1, x2 ]^T = [ z, ż ]^T.

Using this notation, the dynamics read as

ẋ1 = x2,
ẋ2 = u,

with initial and terminal conditions,

x1 (0) = 0, x2 (0) = 0,
x1 (1) = 1, x2 (1) = 0.

Apply the Minimum Principle.

• The Hamiltonian is given by

H(x, u, p) = g(x, u) + p^T f(x, u) = ½ u² + p1 x2 + p2 u.

• The optimal input u∗ (t) is obtained by minimizing the Hamiltonian along the optimal
trajectory. Differentiating the Hamiltonian with respect to u yields,

u∗ (t) + p2 (t) = 0 ⇔ u∗ (t) = −p2 (t).

Since the second derivative of H with respect to u is 1, u∗ (t) is indeed a minimum.

• The adjoint equations,

ṗ1 (t) = 0
ṗ2 (t) = −p1 (t),

are integrated and result in the following equations:

p1 (t) = c1 , c1 constant
p2 (t) = −c1 t − c2 , c2 constant.

Using this result, the optimal input is given by

u∗ (t) = c1 t + c2 .

• Recalling the initial and terminal conditions on x, we can solve for c1 and c2 .
Using these results, the optimal state trajectory x2∗(t) is obtained from

ẋ2∗(t) = c1 t + c2  ⇒  x2∗(t) = ½ c1 t² + c2 t + c3,  c3 constant,

and, therefore,

x2∗(0) = 0 ⇒ c3 = 0
x2∗(1) = 0 ⇒ ½ c1 + c2 = 0 ⇒ c1 = −2 c2,

yielding

x2∗(t) = −c2 t² + c2 t.

The optimal state x∗1 (t) is given by

ẋ1∗(t) = x2∗(t) = −c2 t² + c2 t  ⇒  x1∗(t) = −(1/3) c2 t³ + (1/2) c2 t² + c4,  c4 constant.

With the conditions on x1 , we get

x1∗(0) = 0 ⇒ c4 = 0
x1∗(1) = 1 ⇒ −(1/3) c2 + (1/2) c2 = 1 ⇒ c2 = 6 and c1 = −12.

• Finally, we obtain the optimal control

u∗ (t) = −12t + 6,

and the optimal state trajectory

x1∗(t) = z∗(t) = −2t³ + 3t²,
x2∗(t) = ż∗(t) = −6t² + 6t.
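As an added check, not part of the original solution, the following Python sketch verifies numerically that this trajectory satisfies the dynamics and all four boundary conditions, and that the cost ½ ∫_0^1 u²(t) dt evaluates to 6.

```python
import numpy as np

# Numerical sanity check of the Problem 3 solution: verify z'' = u,
# the four boundary conditions, and the value of the cost integral.
t = np.linspace(0.0, 1.0, 10_001)
u = -12.0 * t + 6.0                # u*(t)
z = -2.0 * t**3 + 3.0 * t**2       # x1*(t) = z*(t)
z_dot = -6.0 * t**2 + 6.0 * t      # x2*(t) = dz*/dt

assert np.allclose(np.gradient(z, t), z_dot, atol=1e-3)       # x1' = x2
assert np.allclose(np.gradient(z_dot, t), u, atol=1e-3)       # x2' = u
assert z[0] == 0.0 and z_dot[0] == 0.0                        # z(0) = 0, z'(0) = 0
assert np.isclose(z[-1], 1.0) and np.isclose(z_dot[-1], 0.0)  # z(1) = 1, z'(1) = 0

integrand = 0.5 * u**2
cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule
print(cost)  # approximately 6.0, the optimal value of (1/2) * integral of u^2
```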

Problem 4 (25%)

Recall the Minimum Principle.

Under suitable technical assumptions, the following Proposition holds:


Given the dynamic system

ẋ(t) = f(x(t), u(t)), x(0) = x0, 0 ≤ t ≤ T

and the cost function,

h(x(T)) + ∫_0^T g(x(t), u(t)) dt,
to be minimized, define the Hamiltonian function

H(x, u, p) = g(x, u) + p^T f(x, u).

Let u∗ (t), t ∈ [0, T ] be an optimal control trajectory and x∗ (t) the resulting state trajectory.
Then,

1. ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t)),   p(T) = ∂h/∂x (x∗(T)),

2. u∗(t) = arg min_{u∈U} H(x∗(t), u, p(t)),

3. H(x∗(t), u∗(t), p(t)) is constant.

Show that if the dynamics and the cost are time varying – that is, f (x, u) is replaced by f (x, u, t)
and g (x, u) is replaced by g (x, u, t) – the Minimum Principle becomes:

1. ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t), t),   p(T) = ∂h/∂x (x∗(T)),

2. u∗(t) = arg min_{u∈U} H(x∗(t), u, p(t), t),

3. H(x∗(t), u∗(t), p(t), t) is not necessarily constant,

where the Hamiltonian function is now given by

H(x, u, p, t) = g(x, u, t) + p^T f(x, u, t).



Solution 4

General idea:
Convert the problem to a time-independent one, apply the standard Minimum Principle presented in class, and simplify the resulting equations.

Follow the subsequent steps:


• Introduce an extra state variable y(t) representing the time:

y(t) = t, with ẏ(t) = 1 and y(0) = 0.

• Convert the problem into standard form by introducing the extended state ξ = [ x, y ]^T.
The dynamics now read as

ξ̇(t) = f̃(ξ, u) = [ f(x, u, y), 1 ]^T

and the cost is defined by

h̃(ξ(T)) + ∫_0^T g̃(ξ, u) dt,

where g̃(ξ, u) = g(x, u, y) and h̃(ξ) = h(x).


The Hamiltonian follows from these definitions:

H̃(ξ, u, p̃) = g̃(ξ, u) + p̃^T f̃(ξ, u), with p̃ = [ p, py ]^T.

• Apply the Minimum Principle:


Denoting the optimal control by u∗ (t) and the corresponding optimal state by ξ ∗ (t), we
get the following:
1. The adjoint equation is given by

ṗ̃(t) = −∂H̃/∂ξ (ξ∗(t), u∗(t), p̃(t)),   p̃(T) = ∂h̃/∂ξ (ξ∗(T)).   (2)
However,

H̃(ξ, u, p̃) = g(x, u, y) + p^T f(x, u, y) + py = H(x, u, p, y) + py;

that is,

∂H̃/∂x = ∂H/∂x,   ∂H̃/∂y = ∂H/∂y.

Moreover,

∂h̃/∂x = ∂h/∂x   and   ∂h̃/∂y = 0.
From (2), we recover the first equation,

ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t), t),   p(T) = ∂h/∂x (x∗(T)).

In addition, replacing y(t) by t again, we get

ṗy(t) = −∂H/∂t (x∗(t), u∗(t), p(t), t),   py(T) = 0.

2. The optimal input u∗(t) is obtained by

u∗(t) = arg min_{u∈U} { H(x∗(t), u, p(t), t) + py(t) }
      = arg min_{u∈U} H(x∗(t), u, p(t), t),

since the term py(t) does not depend on u.

3. Finally, the standard formulation gives us that

H(x∗(t), u∗(t), p(t), t) + py(t) is constant.


However, py(t) is constant only if ∂H/∂t = 0, which, in general, holds only if f and g do not depend on time. Hence H(x∗(t), u∗(t), p(t), t) is not necessarily constant.
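To see the difference between the time-invariant and time-varying cases concretely, here is a small symbolic check in Python (sympy) on a hypothetical scalar example, assumed purely for illustration: dynamics ẋ = u, time-varying stage cost g(x, u, t) = ½ u² + t x, terminal cost h = 0, and T = 1. The time-varying Minimum Principle derived above gives p(t) = (1 − t²)/2 and u∗(t) = −p(t); the script confirms that H varies along the optimal trajectory while H + py stays constant.

```python
import sympy as sp

# Symbolic check of Solution 4 on a hypothetical scalar example (assumed for
# illustration): dynamics x' = u, stage cost g(x, u, t) = u**2/2 + t*x,
# terminal cost h = 0, horizon T = 1, initial state x(0) = x0.
t, s, x0 = sp.symbols('t s x0')

# Time-varying Minimum Principle: p' = -dH/dx = -t with p(1) = 0, and u* = -p.
p = (1 - t**2) / 2
u = -p
x = x0 + sp.integrate(u.subs(t, s), (s, 0, t))    # x' = u, x(0) = x0

# Hamiltonian evaluated along the optimal trajectory.
H = u**2 / 2 + t * x + p * u

# Extra adjoint: p_y' = -dH/dt = -x* with p_y(1) = 0.
py = sp.integrate(x.subs(t, s), (s, t, 1))

print(sp.simplify(sp.diff(H, t)))        # x0 + t**3/6 - t/2: H is not constant
print(sp.simplify(sp.diff(H + py, t)))   # 0: H + p_y is constant, as derived
```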
