Midterm Examination – Dynamic Programming & Optimal Control (2008)
Guzzella, Prof. R. D’Andrea
Solutions
Problem 1 25%
Figure 1: weighted directed graph with start node S, terminal node T, and intermediate nodes (graph not reproduced in this extraction).
Find the shortest path from node S to node T for the graph given in Figure 1. Apply the label
correcting method. Use best-first search to determine at each iteration which node to remove
from OPEN; that is, remove the node i with

    d_i = min_{j ∈ OPEN} d_j,

where d_i denotes the length of the shortest path from node S to node i that has been found so far.
Solve the problem by populating a table of the following form:
Solution 1
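Since the edge data of Figure 1 did not survive extraction, the table is not reproduced here. As a sketch, the label-correcting method with best-first removal from OPEN (which, for non-negative edge lengths, behaves like Dijkstra's algorithm) can be implemented as follows; the example graph is a hypothetical stand-in, not the graph of Figure 1:

```python
import heapq

def best_first_label_correcting(graph, source, target):
    """Label-correcting shortest path: at each iteration remove from OPEN
    the node i with the smallest label d_i (best-first selection)."""
    d = {source: 0}            # d[i]: shortest path length from source found so far
    parent = {source: None}
    open_heap = [(0, source)]  # OPEN, keyed by the current label d[i]
    while open_heap:
        di, i = heapq.heappop(open_heap)
        if di > d.get(i, float("inf")):
            continue  # stale entry: node i was re-labeled after insertion
        for j, cij in graph.get(i, []):
            if di + cij < d.get(j, float("inf")):
                d[j] = di + cij      # label correction
                parent[j] = i
                heapq.heappush(open_heap, (d[j], j))
    # reconstruct the shortest path from source to target
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return d[target], path[::-1]

# hypothetical example graph: {node: [(neighbor, edge length), ...]}
g = {"S": [("A", 3), ("B", 1)], "B": [("A", 1), ("T", 5)], "A": [("T", 2)]}
dist, path = best_first_label_correcting(g, "S", "T")
```

For this stand-in graph the method returns the path S → B → A → T of length 4; on the exam graph one would record, at each iteration, the node removed from OPEN and the labels d_i in the requested table.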
Problem 2 25%
Consider the scalar system

    x_{k+1} = (1 − a) w_k + a u_k,   0 ≤ a ≤ 1,   k = 0, 1.

The disturbance w_k takes the values 0 and 1. If x_k ≥ 0, both values have equal probability. If x_k < 0, the disturbance w_k is 0 with probability 1. The control u_k is constrained by

    0 ≤ u_k ≤ 1,   k = 0, 1.

Apply the Dynamic Programming algorithm to find the optimal control policy and the optimal cost J_0(−1).
Solution 2
The optimal control problem is considered over a time horizon N = 2 and the cost, to be minimized, is

    E{ x_2^2 + Σ_{k=0}^{1} ( x_k^2 + u_k^2 + w_k^2 ) }.
2nd stage:

    J_2(x_2) = x_2^2.

1st stage:

    J_1(x_1) = min_{0 ≤ u_1 ≤ 1} E_{w_1}{ x_1^2 + u_1^2 + w_1^2 + J_2(x_2) }
             = min_{0 ≤ u_1 ≤ 1} E_{w_1}{ x_1^2 + u_1^2 + w_1^2 + J_2((1 − a) w_1 + a u_1) }
             = min_{0 ≤ u_1 ≤ 1} E_{w_1}{ x_1^2 + u_1^2 + w_1^2 + ((1 − a) w_1 + a u_1)^2 }.
I) x_1 ≥ 0: Here w_1 takes the values 0 and 1 with equal probability, so

    J_1(x_1) = min_{0 ≤ u_1 ≤ 1} { x_1^2 + u_1^2 + (1/2)[ 1 + ((1 − a) + a u_1)^2 ] + (1/2)[ 0 + ((1 − a) · 0 + a u_1)^2 ] }
             =: min_{0 ≤ u_1 ≤ 1} L(x_1, u_1).

Setting the derivative to zero,

    ∂L/∂u_1 = 2 u_1 + a ((1 − a) + a u_1) + a^2 u_1 = 2 (1 + a^2) u_1 + a (1 − a) = 0
    ⇔ ū_1 = − a (1 − a) / (2 (1 + a^2)) ≤ 0.

Since

    ∂²L/∂u_1² = 2 (1 + a^2) > 0,

it follows that ū_1 is a minimum of L. Because ū_1 ≤ 0 lies on or outside the boundary of the feasible set [0, 1], the feasible optimal control is u_1* = 0, giving

    J_1(x_1) = 1/2 + (1/2)(1 − a)^2 + x_1^2   for x_1 ≥ 0.
II) x_1 < 0: Here w_1 = 0 with probability 1, so

    J_1(x_1) = min_{0 ≤ u_1 ≤ 1} { x_1^2 + (1 + a^2) u_1^2 } =: min_{0 ≤ u_1 ≤ 1} L(x_1, u_1).

The minimizing control is u_1* = 0, yielding J_1(x_1) = x_1^2 for x_1 < 0.
0th stage:

    J_0(−1) = min_{0 ≤ u_0 ≤ 1} E_{w_0}{ (−1)^2 + u_0^2 + w_0^2 + J_1((1 − a) w_0 + a u_0) }.

Since x_0 = −1 < 0, the disturbance is w_0 = 0 with probability 1, so x_1 = a u_0 ≥ 0. From the results above, the optimal cost-to-go function for x_1 ≥ 0 is

    J_1(x_1) = 1/2 + (1/2)(1 − a)^2 + x_1^2.

Therefore

    J_0(−1) = min_{0 ≤ u_0 ≤ 1} { 1 + u_0^2 + 1/2 + (1/2)(1 − a)^2 + (a u_0)^2 } =: min_{0 ≤ u_0 ≤ 1} L(−1, u_0).
Finally, the minimizing ū_0 results from

    ∂L/∂u_0 |_{ū_0} = 2 ū_0 + 2 a^2 ū_0 = 0  ⇔  ū_0 = 0.

Since ∂²L/∂u_0² |_{ū_0} = 2 (1 + a^2) > 0, the optimal control is u_0* = ū_0 = 0, and the optimal cost is

    J_0(−1) = 3/2 + (1/2)(1 − a)^2.
In brief, the optimal control policy is to always set the input to zero, which can also be verified
by carefully looking at the equations given in the problem statement.
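As a numerical cross-check of the closed-form result J_0(−1) = 3/2 + (1/2)(1 − a)^2, the two-stage recursion can be brute-forced over a grid of admissible controls; a quick sketch (the grid resolution is an arbitrary choice):

```python
import numpy as np

def dp_cost(a, n_grid=1001):
    """Brute-force the two-stage DP for x0 = -1: grid the admissible
    controls u in [0, 1] and minimize the expected cost numerically."""
    us = np.linspace(0.0, 1.0, n_grid)

    def J2(x2):
        return x2**2

    def J1(x1):
        if x1 >= 0:
            # w1 is 0 or 1 with equal probability
            cost = (x1**2 + us**2
                    + 0.5 * (1.0 + J2((1.0 - a) + a * us))
                    + 0.5 * (0.0 + J2(a * us)))
        else:
            # w1 = 0 with probability 1
            cost = x1**2 + us**2 + J2(a * us)
        return cost.min()

    # stage 0: x0 = -1 < 0, so w0 = 0 with probability 1 and x1 = a*u0 >= 0
    costs0 = np.array([1.0 + u0**2 + J1(a * u0) for u0 in us])
    return costs0.min()
```

Since u = 0 lies exactly on the grid, the numerical minimum matches the analytic expression for every value of a.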
Problem 3 25%
Figure 2: a unit mass on a frictionless surface, at z = 0 at time t = 0, to be driven by a force u to z = 1 at time t = 1.
At time t = 0, a unit mass is at rest at location z = 0. The mass is on a frictionless surface and
it is desired to apply a force u(t), 0 ≤ t ≤ 1, such that at time t = 1, the mass is at location
z = 1 and again at rest. In particular,
z(0) = 0, ż(0) = 0,
z(1) = 1, ż(1) = 0.
Of all the functions u(t) that achieve the above objective, find the one that minimizes

    (1/2) ∫_0^1 u^2(t) dt.
Hint: The state for this system is x(t) = [ x_1(t), x_2(t) ]^T, where x_1(t) = z(t) and x_2(t) = ż(t).
Solution 3
In the state coordinates, the boundary conditions read

    x_1(0) = 0,  x_2(0) = 0,
    x_1(1) = 1,  x_2(1) = 0.
• With the state dynamics ẋ_1 = x_2, ẋ_2 = u, the Hamiltonian is

    H(x, u, p) = (1/2) u^2 + p_1 x_2 + p_2 u.

The optimal input u*(t) is obtained by minimizing the Hamiltonian along the optimal trajectory. Differentiating the Hamiltonian with respect to u yields

    ∂H/∂u = u + p_2 = 0  ⇒  u*(t) = −p_2(t).

The adjoint equations are

    ṗ_1(t) = −∂H/∂x_1 = 0,
    ṗ_2(t) = −∂H/∂x_2 = −p_1(t),

with solutions

    p_1(t) = c_1,  c_1 constant,
    p_2(t) = −c_1 t − c_2,  c_2 constant.

Hence

    u*(t) = −p_2(t) = c_1 t + c_2.
• Recalling the initial and terminal conditions on x, we can solve for c_1 and c_2. With the above results, the optimal velocity trajectory x_2*(t) satisfies

    ẋ_2*(t) = u*(t) = c_1 t + c_2  ⇒  x_2*(t) = (1/2) c_1 t^2 + c_2 t + c_3,  c_3 constant,
and, therefore,

    x_2*(0) = 0 ⇒ c_3 = 0,
    x_2*(1) = 0 ⇒ (1/2) c_1 + c_2 = 0 ⇒ c_1 = −2 c_2,

yielding

    ẋ_1*(t) = x_2*(t) = −c_2 t^2 + c_2 t  ⇒  x_1*(t) = −(1/3) c_2 t^3 + (1/2) c_2 t^2 + c_4,  c_4 constant.

From the boundary conditions on x_1,

    x_1*(0) = 0 ⇒ c_4 = 0,
    x_1*(1) = 1 ⇒ −(1/3) c_2 + (1/2) c_2 = 1 ⇒ c_2 = 6 and c_1 = −12.

The minimizing force is therefore

    u*(t) = −12 t + 6,   0 ≤ t ≤ 1.
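The boundary conditions, and the resulting cost (1/2)∫_0^1 (6 − 12t)^2 dt = 6, can be checked by numerically integrating the double integrator z̈ = u(t); a minimal sketch using a semi-implicit Euler scheme (step count chosen arbitrarily):

```python
def simulate(u, n_steps=100_000):
    """Integrate z'' = u(t) on [0, 1] from rest at z = 0, accumulating
    the control cost 0.5 * integral of u(t)^2."""
    dt = 1.0 / n_steps
    z, zdot, cost = 0.0, 0.0, 0.0
    for k in range(n_steps):
        t = k * dt
        zdot += u(t) * dt      # semi-implicit Euler: update velocity first,
        z += zdot * dt         # then position with the updated velocity
        cost += 0.5 * u(t) ** 2 * dt
    return z, zdot, cost

z1, zdot1, cost = simulate(lambda t: 6.0 - 12.0 * t)
```

Up to discretization error, the simulation confirms z(1) = 1, ż(1) = 0, and a cost of 6.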
Problem 4 25%
Let u∗ (t), t ∈ [0, T ] be an optimal control trajectory and x∗ (t) the resulting state trajectory.
Then,
1. ṗ(t) = −(∂H/∂x)(x*(t), u*(t), p(t)),   p(T) = (∂h/∂x)(x*(T)),
Show that if the dynamics and the cost are time varying – that is, f (x, u) is replaced by f (x, u, t)
and g (x, u) is replaced by g (x, u, t) – the Minimum Principle becomes:
1. ṗ(t) = −(∂H/∂x)(x*(t), u*(t), p(t), t),   p(T) = (∂h/∂x)(x*(T)),
Solution 4
General idea:
Convert the problem to a time-independent one, apply the standard Minimum Principle pre-
sented in class, and simplify the obtained equations.
• Convert the problem into standard form by introducing the extended state ξ = [ x, y ]^T, where the extra state y tracks time (ẏ = 1, y(0) = 0). The dynamics now read

    ξ̇(t) = f̃(ξ, u) = [ f(x, u, y), 1 ]^T,
that is,

    ∂H̃/∂x = ∂H/∂x,   ∂H̃/∂y = ∂H/∂y.

Moreover,

    ∂h̃/∂x = ∂h/∂x   and   ∂h̃/∂y = 0.
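These identities follow from the definition of the extended Hamiltonian with extended costate p̃ = [ p, p_y ]^T; assuming the standard definition H = g + pᵀf used in class, a short sketch:

```latex
\tilde{H}(\xi, u, \tilde{p})
  = \tilde{g}(\xi, u) + \tilde{p}^{T} \tilde{f}(\xi, u)
  = g(x, u, y) + p^{T} f(x, u, y) + p_y
  = H(x, u, p, y) + p_y .
```

Since p_y enters additively and does not depend on x or y, differentiating H̃ with respect to x or y gives the same result as differentiating H.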
From (2), we recover the first equation

    ṗ(t) = −(∂H/∂x)(x*(t), u*(t), p(t), t),   p(T) = (∂h/∂x)(x*(T)).

In addition, replacing y(t) by t again, we get

    ṗ_y(t) = −(∂H/∂t)(x*(t), u*(t), p(t), t),   p_y(T) = 0.