
Prof. L. Guzzella
Prof. R. D’Andrea

Midterm Examination November 12th, 2008

Dynamic Programming & Optimal Control (151-0563-00) Prof. R. D’Andrea

Solutions

Exam Duration: 150 minutes

Number of Problems: 4 (25% each)

Permitted aids:

• Textbook Dynamic Programming and Optimal Control by Dimitri P. Bertsekas, Vol. I, 3rd edition, 2005, 558 pages.
• Your written notes.
• No calculators.

Important: Use only these prepared sheets for your solutions.



Problem 1 (25%)

[Figure 1: a directed graph with start node S, terminal node T, and intermediate nodes 1–10; the arc lengths used in the solution below are shown on the arcs.]

Find the shortest path from node S to node T for the graph given in Figure 1. Apply the label
correcting method. Use best-first search to determine at each iteration which node to remove
from OPEN; that is, remove node i with

di = min_{j ∈ OPEN} dj,

where the variable di denotes the length of the shortest path from node S to node i that has
been found so far.
Solve the problem by populating a table of the following form:

Iteration | Node exiting OPEN | OPEN | dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 | dT = UPPER
...

Solution 1

Iteration | Node exiting OPEN | OPEN  | dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 | dT = UPPER
    0     | -                 | S     | 0  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    1     | S                 | 1,2,3 | 0  3  1  3  ∞  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    2     | 2                 | 1,3,4 | 0  2  1  3  5  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    3     | 1                 | 3,4   | 0  2  1  3  4  ∞  ∞  ∞  ∞  ∞  ∞   | ∞
    4     | 3                 | 4,5   | 0  2  1  3  4  8  ∞  ∞  ∞  ∞  ∞   | ∞
    5     | 4                 | 5,6   | 0  2  1  3  4  7  5  ∞  ∞  ∞  ∞   | ∞
    6     | 6                 | 5,9   | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 9
    7     | 9                 | 5     | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 7
    8     | 5                 | -     | 0  2  1  3  4  7  5  ∞  ∞  6  ∞   | 7

The shortest path is S → 2 → 1 → 4 → 6 → 9 → T with a total length of 7.
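The iterations above can be reproduced with a minimal Python sketch of the label correcting method with best-first node selection. Since Figure 1 is not reproduced here, the arc list below is an assumption inferred from the solution table; arcs incident to nodes 7, 8, and 10 are omitted because those nodes never receive a finite label.

```python
import math

# Arc list (i -> j, length) inferred from the solution table above; Figure 1
# is not fully reproducible, so treat these arcs as an assumption.
graph = {
    'S': [('1', 3), ('2', 1), ('3', 3)],
    '1': [('4', 2)],
    '2': [('1', 1), ('4', 4)],
    '3': [('5', 5)],
    '4': [('5', 3), ('6', 1)],
    '5': [],
    '6': [('9', 1), ('T', 4)],
    '9': [('T', 1)],
}

def label_correcting(graph, start, target):
    nodes = set(graph) | {j for arcs in graph.values() for j, _ in arcs}
    d = {n: math.inf for n in nodes}   # shortest path length found so far
    d[start], upper = 0, math.inf      # UPPER = best complete path to target
    open_set = {start}
    while open_set:
        # Best-first search: remove the node with the smallest label.
        i = min(open_set, key=d.get)
        open_set.remove(i)
        for j, length in graph.get(i, []):
            # Label correcting test: accept only strict improvements that
            # can still beat the best known complete path.
            if d[i] + length < min(d[j], upper):
                d[j] = d[i] + length
                if j == target:
                    upper = d[j]       # the target never enters OPEN
                else:
                    open_set.add(j)
    return upper

print(label_correcting(graph, 'S', 'T'))  # prints 7
```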



Problem 2 (25%)

Consider the dynamic system

xk+1 = (1 − a) wk + a uk, 0 ≤ a ≤ 1, k = 0, 1,

with initial state x0 = −1. The cost function, to be minimized, is given by


E_{w0,w1} [ x2² + Σ_{k=0}^{1} ( xk² + uk² + wk² ) ].

The disturbance wk takes the values 0 and 1. If xk ≥ 0, both values have equal probability. If
xk < 0, the disturbance wk is 0 with probability 1. The control uk is constrained by

0 ≤ uk ≤ 1, k = 0, 1.

Apply the Dynamic Programming algorithm to find the optimal control policy and the optimal cost J0(−1).

Solution 2

The optimal control problem is considered over a time horizon N = 2 and the cost, to be
minimized, is defined by

g2(x2) = x2² and gk(xk, uk, wk) = xk² + uk² + wk², k = 0, 1.

The DP algorithm proceeds as follows:

2nd stage:

J2(x2) = x2²

1st stage:
J1(x1) = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + J2(x2) }
       = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + J2((1 − a) w1 + a u1) }
       = min_{0≤u1≤1} E_{w1} { x1² + u1² + w1² + ((1 − a) w1 + a u1)² }

Distinguish two cases: x1 ≥ 0 and x1 < 0.

I) x1 ≥ 0:

J1(x1) = min_{0≤u1≤1} { x1² + u1² + ½ [ 1 + ((1 − a) + a u1)² ] + ½ [ 0 + ((1 − a)·0 + a u1)² ] }
       = min_{0≤u1≤1} L(x1, u1),

where L(x1, u1) denotes the expression in braces.

Find the minimizing ū1 by


∂L/∂u1 |_{ū1} = (1 − a) a + 2 (1 + a²) ū1 = 0  ⇔  ū1 = −a (1 − a) / (2 (1 + a²)) ≤ 0.

Recall that the feasible set of inputs u1 is given by 0 ≤ u1 ≤ 1.

However, L(x1, u1) is convex in u1, since

∂²L/∂u1² = 2 (1 + a²) > 0.

Because the unconstrained minimizer ū1 lies at or below zero, outside the interior of the feasible set, L is non-decreasing in u1 on [0, 1], and the constrained optimum is attained at the lower boundary:

⇒ u1∗ = µ1∗(x1) = 0 ∀ x1 ≥ 0.



II) x1 < 0:

J1(x1) = min_{0≤u1≤1} { x1² + (1 + a²) u1² } = min_{0≤u1≤1} L(x1, u1),

with L(x1, u1) again denoting the expression in braces.

Find the minimizing ū1 by


∂L/∂u1 |_{ū1} = 2 (1 + a²) ū1 = 0  ⇔  ū1 = 0.

Since the sufficient condition for a local minimum, ∂²L/∂u1² |_{ū1} > 0, holds and ū1 = 0 is feasible, the optimal control is

⇒ u1∗ = µ1∗(x1) = 0 ∀ x1 < 0.

0th stage:
J0(−1) = min_{0≤u0≤1} E_{w0} { (−1)² + u0² + w0² + J1((1 − a) w0 + a u0) }

Since x0 = −1 < 0, the disturbance w0 is 0 with probability 1, and we get


J0(−1) = min_{0≤u0≤1} { 1 + u0² + J1(a u0) } = min_{0≤u0≤1} L(x0, u0),

where a u0 ≥ 0 and L(x0, u0) denotes the expression in braces. From the results above, the optimal cost-to-go function for x1 ≥ 0 is

J1(x1) = ½ + ½ (1 − a)² + x1².
Finally, the minimizing ū0 results from
∂L/∂u0 |_{ū0} = 2 ū0 + 2 a² ū0 = 0  ⇔  ū0 = 0.
Since ∂²L/∂u0² |_{ū0} > 0, the optimal control u0∗ is

⇒ u0∗ = µ0∗(−1) = 0.

With this, the optimal cost reads as

J0(−1) = 3/2 + ½ (1 − a)².

In brief, the optimal control policy is to always set the input to zero. This can also be seen directly from the problem data: the control enters the cost through uk², and since xk+1 = (1 − a) wk + a uk ≥ 0, any uk > 0 only moves the next state further from zero and increases the expected cost.
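As a quick numerical sanity check, not part of the original solution, here is a short Python sketch that runs the two-stage DP recursion by brute force over a grid of feasible controls and compares J0(−1) against the closed-form value 3/2 + ½ (1 − a)² for a few values of a.

```python
import numpy as np

def J0_numeric(a, n=501):
    """Brute-force DP for the two-stage problem; returns J0(-1)."""
    u = np.linspace(0.0, 1.0, n)          # feasible controls, 0 <= u <= 1

    def p_w1(x):                          # P(w = 1) given the current state
        return 0.5 if x >= 0 else 0.0

    def J1(x1):                           # cost-to-go at stage 1
        p = p_w1(x1)
        cost = np.zeros_like(u)
        for w, pr in ((0.0, 1.0 - p), (1.0, p)):
            x2 = (1.0 - a) * w + a * u    # x2 for each candidate u1
            cost += pr * (x1**2 + u**2 + w**2 + x2**2)
        return cost.min()

    x0 = -1.0                             # x0 < 0, so w0 = 0 with probability 1
    x1 = a * u                            # next state for each candidate u0
    cost = x0**2 + u**2 + np.array([J1(x) for x in x1])
    return cost.min()

for a in (0.0, 0.25, 0.5, 1.0):
    print(a, J0_numeric(a), 1.5 + 0.5 * (1.0 - a)**2)  # the two values agree
```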

Problem 3 (25%)

[Figure 2: a unit mass on a frictionless surface, driven by a force u from z = 0 at time t = 0 to z = 1 at time t = 1.]

At time t = 0, a unit mass is at rest at location z = 0. The mass is on a frictionless surface and
it is desired to apply a force u(t), 0 ≤ t ≤ 1, such that at time t = 1, the mass is at location
z = 1 and again at rest. In particular,

z̈(t) = u(t), 0 ≤ t ≤ 1, (1)

with initial and terminal conditions:

z(0) = 0, ż(0) = 0,
z(1) = 1, ż(1) = 0.

Of all the functions u(t) that achieve the above objective, find the one that minimizes
½ ∫_0^1 u²(t) dt.

Hint: The state for this system is x(t) = [ x1(t), x2(t) ]^T, where x1(t) = z(t) and x2(t) = ż(t).

Solution 3

Introduce the state vector

x = [ x1, x2 ]^T = [ z, ż ]^T.

Using this notation, the dynamics read as

ẋ1 = x2,
ẋ2 = u,

with initial and terminal conditions,

x1 (0) = 0, x2 (0) = 0,
x1 (1) = 1, x2 (1) = 0.

Apply the Minimum Principle.

• The Hamiltonian is given by

H(x, u, p) = g(x, u) + p^T f(x, u) = ½ u² + p1 x2 + p2 u.

• The optimal input u∗ (t) is obtained by minimizing the Hamiltonian along the optimal
trajectory. Differentiating the Hamiltonian with respect to u yields,

u∗ (t) + p2 (t) = 0 ⇔ u∗ (t) = −p2 (t).

Since the second derivative of H with respect to u is 1, u∗ (t) is indeed a minimum.

• The adjoint equations,

ṗ1 (t) = 0
ṗ2 (t) = −p1 (t),

are integrated and result in the following equations:

p1 (t) = c1 , c1 constant
p2 (t) = −c1 t − c2 , c2 constant.

Using this result, the optimal input is given by

u∗ (t) = c1 t + c2 .

• Recalling the initial and terminal conditions on x, we can solve for c1 and c2 .
Using these results, the optimal state trajectory x2∗(t) is obtained from

ẋ2∗(t) = c1 t + c2  ⇒  x2∗(t) = ½ c1 t² + c2 t + c3,  c3 constant,

and, therefore,

x2∗(0) = 0 ⇒ c3 = 0
x2∗(1) = 0 ⇒ ½ c1 + c2 = 0 ⇒ c1 = −2 c2,

yielding

x2∗(t) = −c2 t² + c2 t.

The optimal state x∗1 (t) is given by

ẋ1∗(t) = x2∗(t) = −c2 t² + c2 t  ⇒  x1∗(t) = −(1/3) c2 t³ + (1/2) c2 t² + c4,  c4 constant.

With the conditions on x1 , we get

x1∗(0) = 0 ⇒ c4 = 0
x1∗(1) = 1 ⇒ −(1/3) c2 + (1/2) c2 = 1 ⇒ c2 = 6 and c1 = −12.

• Finally, we obtain the optimal control

u∗ (t) = −12t + 6,

and the optimal state trajectory

x1∗(t) = z∗(t) = −2t³ + 3t²,
x2∗(t) = ż∗(t) = −6t² + 6t.
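As an added check, not part of the original solution, the following Python sketch verifies numerically that this trajectory satisfies the dynamics and all four boundary conditions, and that the cost ½ ∫_0^1 u²(t) dt evaluates to 6.

```python
import numpy as np

# Numerical sanity check of the Problem 3 solution: verify z'' = u,
# the four boundary conditions, and the value of the cost integral.
t = np.linspace(0.0, 1.0, 10_001)
u = -12.0 * t + 6.0                # u*(t)
z = -2.0 * t**3 + 3.0 * t**2       # x1*(t) = z*(t)
z_dot = -6.0 * t**2 + 6.0 * t      # x2*(t) = dz*/dt

assert np.allclose(np.gradient(z, t), z_dot, atol=1e-3)       # x1' = x2
assert np.allclose(np.gradient(z_dot, t), u, atol=1e-3)       # x2' = u
assert z[0] == 0.0 and z_dot[0] == 0.0                        # z(0) = 0, z'(0) = 0
assert np.isclose(z[-1], 1.0) and np.isclose(z_dot[-1], 0.0)  # z(1) = 1, z'(1) = 0

integrand = 0.5 * u**2
cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule
print(cost)  # approximately 6.0, the optimal value of (1/2) * integral of u^2
```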

Problem 4 (25%)

Recall the Minimum Principle.

Under suitable technical assumptions, the following Proposition holds:


Given the dynamic system

ẋ(t) = f(x(t), u(t)), x(0) = x0, 0 ≤ t ≤ T

and the cost function,

h(x(T)) + ∫_0^T g(x(t), u(t)) dt,
to be minimized, define the Hamiltonian function

H(x, u, p) = g(x, u) + p^T f(x, u).

Let u∗ (t), t ∈ [0, T ] be an optimal control trajectory and x∗ (t) the resulting state trajectory.
Then,

1. ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t)),   p(T) = ∂h/∂x (x∗(T)),

2. u∗(t) = arg min_{u∈U} H(x∗(t), u, p(t)),

3. H(x∗(t), u∗(t), p(t)) is constant.

Show that if the dynamics and the cost are time varying – that is, f (x, u) is replaced by f (x, u, t)
and g (x, u) is replaced by g (x, u, t) – the Minimum Principle becomes:

1. ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t), t),   p(T) = ∂h/∂x (x∗(T)),

2. u∗(t) = arg min_{u∈U} H(x∗(t), u, p(t), t),

3. H(x∗(t), u∗(t), p(t), t) is not necessarily constant,

where the Hamiltonian function is now given by

H(x, u, p, t) = g(x, u, t) + p^T f(x, u, t).



Solution 4

General idea:
Convert the problem to a time-independent one, apply the standard Minimum Principle presented in class, and simplify the resulting equations.

Follow the subsequent steps:


• Introduce an extra state variable y(t) representing the time:

y(t) = t, with ẏ(t) = 1 and y(0) = 0.

• Convert the problem into standard form by introducing the extended state ξ = [ x, y ]^T.
The dynamics now read as

ξ̇(t) = f̃(ξ, u) = [ f(x, u, y), 1 ]^T

and the cost is defined by

h̃(ξ(T)) + ∫_0^T g̃(ξ, u) dt,

where g̃(ξ, u) = g(x, u, y) and h̃(ξ) = h(x).


The Hamiltonian follows from these definitions:

H̃(ξ, u, p̃) = g̃(ξ, u) + p̃^T f̃(ξ, u), with p̃ = [ p, py ]^T.

• Apply the Minimum Principle:


Denoting the optimal control by u∗ (t) and the corresponding optimal state by ξ ∗ (t), we
get the following:
1. The adjoint equation is given by

ṗ̃(t) = −∂H̃/∂ξ (ξ∗(t), u∗(t), p̃(t)),   p̃(T) = ∂h̃/∂ξ (ξ∗(T)).   (2)
However,

H̃(ξ, u, p̃) = g(x, u, y) + p^T f(x, u, y) + py = H(x, u, p, y) + py;

that is,

∂H̃/∂x = ∂H/∂x,   ∂H̃/∂y = ∂H/∂y.

Moreover,

∂h̃/∂x = ∂h/∂x   and   ∂h̃/∂y = 0.
From (2), we recover the first equation,

ṗ(t) = −∂H/∂x (x∗(t), u∗(t), p(t), t),   p(T) = ∂h/∂x (x∗(T)).

In addition, replacing y(t) by t again, we get

ṗy(t) = −∂H/∂t (x∗(t), u∗(t), p(t), t),   py(T) = 0.

2. The optimal input u∗(t) is obtained by

u∗(t) = arg min_{u∈U} { H(x∗(t), u, p(t), t) + py(t) }
      = arg min_{u∈U} H(x∗(t), u, p(t), t),

since the term py(t) does not depend on u.

3. Finally, the standard formulation gives us that

H(x∗(t), u∗(t), p(t), t) + py(t) is constant.


However, py(t) is constant only if ∂H/∂t = 0, which, in general, holds only if f and g do not depend on time. Hence H(x∗(t), u∗(t), p(t), t) is not necessarily constant.
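To see the difference between the time-invariant and time-varying cases concretely, here is a small symbolic check in Python (sympy) on a hypothetical scalar example, assumed purely for illustration: dynamics ẋ = u, time-varying stage cost g(x, u, t) = ½ u² + t x, terminal cost h = 0, and T = 1. The time-varying Minimum Principle derived above gives p(t) = (1 − t²)/2 and u∗(t) = −p(t); the script confirms that H varies along the optimal trajectory while H + py stays constant.

```python
import sympy as sp

# Symbolic check of Solution 4 on a hypothetical scalar example (assumed for
# illustration): dynamics x' = u, stage cost g(x, u, t) = u**2/2 + t*x,
# terminal cost h = 0, horizon T = 1, initial state x(0) = x0.
t, s, x0 = sp.symbols('t s x0')

# Time-varying Minimum Principle: p' = -dH/dx = -t with p(1) = 0, and u* = -p.
p = (1 - t**2) / 2
u = -p
x = x0 + sp.integrate(u.subs(t, s), (s, 0, t))    # x' = u, x(0) = x0

# Hamiltonian evaluated along the optimal trajectory.
H = u**2 / 2 + t * x + p * u

# Extra adjoint: p_y' = -dH/dt = -x* with p_y(1) = 0.
py = sp.integrate(x.subs(t, s), (s, t, 1))

print(sp.simplify(sp.diff(H, t)))        # x0 + t**3/6 - t/2: H is not constant
print(sp.simplify(sp.diff(H + py, t)))   # 0: H + p_y is constant, as derived
```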
