Adaptive Dynamic Programming Algorithm For Uncertain Nonlinear Switched Systems

International Journal of Power Electronics and Drive Systems (IJPEDS)

Vol. 12, No. 1, March 2021, pp. 551∼557

ISSN: 2088-8694, DOI: 10.11591/ijpeds.v12.i1.pp551-557

Dao Phuong Nam1, Nguyen Hong Quang2, Nguyen Nhat Tung3, Tran Thi Hai Yen4
School of Electrical Engineering, Hanoi University of Science and Technology,
Bách Khoa, Hai Bà Trung, Hà Noi, Vietnam
Thai Nguyen University of Technology, So 666 D. 3/2, P, Thành pho Thái Nguyên, Thái Nguyên, Vietnam
Electric Power University, 235 Hoàng Quoc Viet, Co Nhue, Tu Liêm, Hà Noi 129823, Vietnam

Article Info

Article history:
Received Feb 2, 2020
Revised Dec 15, 2020
Accepted Jan 10, 2021

Keywords:
Adaptive dynamic programming
HJB equation
Lyapunov stability
Neural networks
Nonlinear switched systems

ABSTRACT

This paper studies an approximate dynamic programming (ADP) strategy for a class of nonlinear switched systems subject to external disturbances. Neural networks (NNs) are used to approximate the unknown actor and critic for the corresponding nominal system. Both networks are trained simultaneously by minimizing the squared error of the Hamiltonian function. The tracking error of the closed-loop system is shown to converge to a neighborhood of the origin in the uniformly ultimately bounded (UUB) sense. Simulation results illustrate the effectiveness of the ADP-based controller.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Nguyen Hong Quang
Thai Nguyen University of Technology,
So 666 D. 3/2, P, Thành pho Thái Nguyên, Thái Nguyên, Vietnam

Many industrial systems can be described as switched systems, for example DC-DC converters [1]-[3], H-bridge inverters [4], multilevel inverters [5], and photovoltaic inverters [6]. Although many approaches to switched systems have been proposed, e.g., switching-delay tolerant control [7] and classical nonlinear control [8]-[12], optimization-based approaches, whose advantage is that input and state constraints can be taken into account, have received little attention. Fuzzy and artificial neural network (ANN) approaches, as well as the particle swarm optimization (PSO) technique, have been investigated for several systems such as photovoltaic inverters and transmission lines [13]-[17].
Adaptive dynamic programming has been studied in many settings, such as nonlinear continuous-time systems [18], actuator saturation [19], linear systems [20]-[22], and output constraints [23]. For nonlinear systems the algorithm is implemented with neural networks (NNs), whereas for linear systems the Kronecker product is employed. Furthermore, data-driven techniques should be mentioned as a way to compute the actor and critic precisely. ADP algorithms have also been applied to the control of robotic systems [24]-[25].
This work proposes an adaptive dynamic programming solution for perturbed nonlinear switched systems based on neural networks. The Hamiltonian function is used to derive the learning law of these neural networks. The UUB stability of the closed-loop system is analyzed, and simulation results illustrate the effectiveness of the proposed controller.


Consider the following uncertain nonlinear continuous-time switched system:

$$\dot{\xi}(t) = f_i(\xi(t)) + g_i(\xi(t))\,\bigl(u + \Delta(\xi, t)\bigr) \tag{1}$$

where $\xi(t) \in \Omega_x \subset \mathbb{R}^n$ denotes the state vector and $u(t) \in \Omega_u \subset \mathbb{R}^m$ the control input. The function $\beta : [0, +\infty) \to \Omega = \{1, 2, \ldots, l\}$ is the switching signal, a piecewise-continuous function of time, and $l$ is the number of subsystems. The functions $f_i(\xi)$ are uncertain smooth vector functions with $f_i(0) = 0$, and the functions $g_i(\xi)$ are smooth with the property $G_{\min} \leqslant \|g_i(\xi)\| \leqslant G_{\max}$. The switching index $\beta(t)$ is unknown.
Assumption 1: $\Delta(\xi, t)$ is bounded by a function $\varrho(\xi)$, i.e., $\|\Delta(\xi, t)\| \leqslant \varrho(\xi)$.
Consider the cost function associated with the uncertain switched system (1):

$$J(\xi, u) = \int_{t}^{\infty} r\bigl(\xi(\tau), u(\tau)\bigr)\, d\tau \tag{2}$$

where $r(\xi, u) = \xi^{T} Q \xi + u^{T} R u$ with $Q = Q^{T} > 0$ and $R = R^{T} > 0$.
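As an illustration, the running cost $r(\xi, u)$ and a sampled approximation of the integral in (2) can be sketched as follows. This is only a minimal sketch: the function names, the trapezoidal rule, and the truncation to a finite horizon are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def running_cost(xi, u, Q, R):
    """Running cost r(xi, u) = xi^T Q xi + u^T R u."""
    xi = np.asarray(xi, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(xi @ Q @ xi + u @ R @ u)

def cost_integral(xi_traj, u_traj, dt, Q, R):
    """Trapezoidal approximation of the cost along sampled trajectories.

    Assumption: the trajectories are sampled with a fixed step dt and the
    infinite horizon is truncated at the last sample.
    """
    r = np.array([running_cost(x, u, Q, R) for x, u in zip(xi_traj, u_traj)])
    return float(dt * (r[:-1] + r[1:]).sum() / 2.0)
```

For example, with $Q = \mathrm{diag}(1, 3)$, the state $\xi = [1, 1]^T$ and zero input, the running cost is $1 + 3 = 4$.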

The main purpose is to design a state feedback controller and an upper bound on the cost such that the closed-loop system under this controller is robustly stable and the performance index (2) is bounded as $J \leqslant K(\xi, u) \leqslant M$.
Definition: A function $K(\xi, u)$ satisfying this bound is called an appropriate performance index. The control input $u^* = \arg\min_u K(\xi, u)$ is then called the optimal control with respect to the appropriate performance index.

The nominal system obtained by eliminating the disturbance from the switched system (1) is described by:

$$\dot{\xi} = f_i(\xi) + g_i(\xi)\, u \tag{3}$$

The performance index of system (3) is modified as (4):

$$Q_1(\xi, u) = \int_{t}^{\infty} \bigl[\, r(\xi, u) + \gamma \varrho^2(\xi) \,\bigr]\, d\tau \tag{4}$$

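We show that $Q_1(\xi, u)$ with $\gamma \geqslant \|R\|$ is one of the appropriate performance indexes of the dynamical system (1). Defining $V^*(t) = \min_u Q_1(\xi, u)$, we have (5):

$$V^*(t) = \min_{u} \int_{t}^{\infty} \bigl[\, r(\xi, u) + \gamma \varrho^2(\xi) \,\bigr]\, d\tau \tag{5}$$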

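Based on the nominal system (3) and the cost function (4), the Hamiltonian function is given by (6):

$$H(\xi, u, V^*) = r(\xi, u) + \gamma \varrho^2(\xi) + \left(\frac{\partial V^*}{\partial \xi}\right)^{T} \bigl(f_i(\xi) + g_i(\xi)\, u\bigr) \tag{6}$$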

By the optimality principle, the optimal control input is obtained as (7):

$$u^*(\xi) = -\frac{1}{2} R^{-1} g_i(\xi)^{T} \frac{\partial V^*}{\partial \xi} \tag{7}$$
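For reference, the stationarity step behind (7) can be written out as below. It uses only the Hamiltonian (6), $r(\xi, u) = \xi^T Q \xi + u^T R u$, and $R = R^T > 0$; the second-order condition is added here as a check that the stationary point is a minimizer.

```latex
\frac{\partial H}{\partial u}
  = 2 R u + g_i(\xi)^{T} \frac{\partial V^{*}}{\partial \xi} = 0
\;\Longrightarrow\;
u^{*}(\xi) = -\frac{1}{2} R^{-1} g_i(\xi)^{T} \frac{\partial V^{*}}{\partial \xi},
\qquad
\frac{\partial^{2} H}{\partial u^{2}} = 2R > 0 .
```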
We now apply the control law (7) to the uncertain nonlinear continuous-time switched system (1) and obtain the following result.
Theorem 1: The system (1) under the controller $u^*(\xi) = -\frac{1}{2} R^{-1} g_i(\xi)^{T} \frac{\partial V^*}{\partial \xi}$ is stable with the associated Lyapunov function candidate:


$$V(t) = \int_{t}^{\infty} \bigl[\, r(\xi, u) + \gamma \varrho^2(\xi) \,\bigr]\, d\tau \tag{8}$$

where $\gamma \geqslant \|R\|$.
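Proof: Taking the derivative of $V$ along the trajectories of (1) under the control input $u(\xi) = -\frac{1}{2} R^{-1} g_i(\xi)^{T} \nabla V^*$, and using $H(\xi, u, V^*) = 0$ together with $(\nabla V^*)^{T} g_i(\xi) = -2u^{T} R$, we obtain (9):

$$\dot{V} = -\xi^{T} Q \xi - \bigl(\gamma \varrho^2(\xi) - \Delta^{T}(\xi, t)\, R\, \Delta(\xi, t)\bigr) - \bigl(u + \Delta(\xi, t)\bigr)^{T} R\, \bigl(u + \Delta(\xi, t)\bigr) \tag{9}$$

Since $\Delta^{T} R \Delta \leqslant \|R\|\, \varrho^2(\xi) \leqslant \gamma \varrho^2(\xi)$ by Assumption 1 and $\gamma \geqslant \|R\|$, and since $R > 0$, we conclude that (10):

$$\dot{V}(t) \leqslant -\xi^{T} Q \xi \tag{10}$$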
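The two ingredients of this step can be checked numerically: the completing-the-square identity $-u^T R u - 2u^T R \Delta = -(u + \Delta)^T R (u + \Delta) + \Delta^T R \Delta$, and the nonnegativity of $\gamma \varrho^2 - \Delta^T R \Delta$ whenever $\|\Delta\| \leqslant \varrho$ and $\gamma \geqslant \|R\|$. The sketch below is an assumption-laden illustration: it uses the weight $R$ from the simulation section, takes $\gamma$ equal to the spectral norm of $R$, and samples hypothetical bound values and disturbances at random.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.diag([2.0, 2.0])          # R from the simulation section
gamma = np.linalg.norm(R, 2)     # smallest admissible gamma: spectral norm of R

def margin(delta, rho):
    """gamma * rho^2 - delta^T R delta, which must stay nonnegative."""
    return gamma * rho**2 - float(delta @ R @ delta)

def square_identity_gap(u, delta):
    """Absolute gap between both sides of the completing-the-square identity."""
    lhs = -u @ R @ u - 2.0 * (u @ R @ delta)
    rhs = -(u + delta) @ R @ (u + delta) + delta @ R @ delta
    return abs(float(lhs - rhs))

worst_margin = np.inf
for _ in range(1000):
    rho = rng.uniform(0.1, 3.0)                  # hypothetical value of the bound
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)
    delta = rng.uniform(0.0, rho) * direction    # any disturbance with norm <= rho
    worst_margin = min(worst_margin, margin(delta, rho))
```

A random search of this kind is not a proof; it only guards against sign mistakes in the algebra.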
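Therefore, the system (1) is robustly stable. However, the HJB equation cannot be solved directly. Hence, the optimal performance index $V^*$ for system (3) is represented by a NN as (11):

$$V^* = w^{T} \sigma(\xi) + \varepsilon(\xi) \tag{11}$$

where $\sigma(\xi) : \mathbb{R}^n \to \mathbb{R}^N$ with $\sigma(0) = 0$, and $w \in \mathbb{R}^N$ is the constant NN weight vector. The function $\sigma(\xi)$ can be chosen such that $\varepsilon(\xi) \to 0$ and $\nabla\varepsilon(\xi) \to 0$ as $N \to \infty$, so for fixed $N$ we can assume the following.
Assumption 2: $\|\varepsilon(\xi)\| \leqslant \varepsilon_{\max}$; $\|\nabla\varepsilon(\xi)\| \leqslant \nabla\varepsilon_{\max}$; $\nabla\sigma_{\min} \leqslant \|\nabla\sigma(\xi)\| \leqslant \nabla\sigma_{\max}$; $\|w\| \leqslant w_{\max}$.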
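Substituting the optimal control (7) into the Hamiltonian (6), with the weighting coefficient $\gamma$ written as $\lambda$ from here on, we obtain (12):

$$H(\xi, u^*, V^*) = \xi^{T} Q \xi + \lambda \varrho^2(\xi) + (\nabla V^*)^{T} f_i(\xi) - \frac{1}{4} (\nabla V^*)^{T} g_i(\xi) R^{-1} g_i(\xi)^{T} (\nabla V^*) = 0 \tag{12}$$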
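Formula (11) leads to (13):

$$\nabla V^* = (\nabla\sigma(\xi))^{T} w + \nabla\varepsilon(\xi) \tag{13}$$

Substituting (13) into (12) and collecting the terms that contain $\nabla\varepsilon$ gives $\xi^{T} Q \xi + \lambda\varrho^2(\xi) + w^{T}\nabla\sigma(\xi) f_i(\xi) - \frac{1}{4} w^{T}\nabla\sigma(\xi)\, g_i(\xi) R^{-1} g_i(\xi)^{T} \nabla\sigma(\xi)^{T} w = e_{NN}$, where the residual $e_{NN}$ is described by (14):

$$e_{NN} = -\nabla\varepsilon(\xi)^{T} \bigl(f_i(\xi) + g_i(\xi)\, u^*\bigr) - \frac{1}{4} \nabla\varepsilon(\xi)^{T} g_i(\xi) R^{-1} g_i(\xi)^{T} \nabla\varepsilon(\xi) \tag{14}$$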
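It follows that $e_{NN}$ converges uniformly to zero as $N \to \infty$. For each fixed $N$, $e_{NN}$ is bounded on a region as $|e_{NN}| \leqslant e_{\max}$. In the ADP-based controller, the critic NN and the associated control input are computed as (15):

$$\hat{V} = \hat{w}^{T} \sigma(\xi) = \sigma(\xi)^{T} \hat{w}; \qquad \hat{u} = -\frac{1}{2} R^{-1} g_i(\xi)^{T} \nabla\hat{V} \tag{15}$$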
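The critic and the control input in (15) can be evaluated as in the following minimal sketch. Several choices here are assumptions for illustration only: the quadratic basis $\sigma$, the constant input matrix $g = [1, -1]^T$ (the way $u$ enters the simulated subsystems), and a scalar input weight $R = 2$. The paper does not specify the basis functions.

```python
import numpy as np

R_INV = np.array([[0.5]])   # assumption: scalar input weight R = 2

def sigma(xi):
    """Hypothetical quadratic basis [x1^2, x1*x2, x2^2], with sigma(0) = 0."""
    x1, x2 = xi
    return np.array([x1**2, x1 * x2, x2**2])

def grad_sigma(xi):
    """Jacobian of sigma with respect to xi, shape (N, n) = (3, 2)."""
    x1, x2 = xi
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def g(xi):
    """Assumed input matrix, shape (n, m) = (2, 1)."""
    return np.array([[1.0], [-1.0]])

def critic_value(w_hat, xi):
    """V_hat = w_hat^T sigma(xi)."""
    return float(w_hat @ sigma(xi))

def actor(w_hat, xi):
    """u_hat = -1/2 R^{-1} g^T grad(V_hat), with grad(V_hat) = grad_sigma^T w_hat."""
    grad_v = grad_sigma(xi).T @ w_hat          # shape (n,)
    return -0.5 * R_INV @ g(xi).T @ grad_v     # shape (m,)
```

With $\hat{w} = [1, 0, 1]^T$ and $\xi = [1, 2]^T$, this gives $\hat{V} = 5$, $\nabla\hat{V} = [2, 4]^T$ and $\hat{u} = 0.5$.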
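Replacing $V^*$ by $\hat{V}$ in the Hamiltonian yields the residual (16):

$$e_{HJB} = \xi^{T} Q \xi + \lambda\varrho^2(\xi) + \hat{w}^{T}\nabla\sigma(\xi) f_i(\xi) - \frac{1}{4} \hat{w}^{T}\nabla\sigma(\xi)\, g_i(\xi) R^{-1} g_i(\xi)^{T} \nabla\sigma(\xi)^{T} \hat{w} \tag{16}$$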
The training law is based on the steepest descent method:

$$\frac{d}{dt}\hat{w} = -\alpha \frac{\partial E}{\partial \hat{w}} \tag{17}$$

with $E = \frac{1}{2} e_{HJB}^{T} e_{HJB}$.
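A forward-Euler discretization of the steepest descent law (17) might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the quadratic basis, the scalar input weight $R = 2$, the bound $\varrho(\xi) = \varpi\|\xi\|$ with the hypothetical value $\varpi = 0.5$, and the use of the first simulated subsystem for $f_i$ and $g_i$ are all choices made here. The values $\alpha = 0.1$, $\lambda = 5$ and $Q$ are taken from the simulation section.

```python
import numpy as np

Q = np.diag([1.0, 3.0])          # from the simulation section
LAM, ALPHA = 5.0, 0.1            # lambda and alpha from the simulation section
R_INV = 0.5                      # assumption: scalar input weight R = 2
VARPI = 0.5                      # hypothetical: rho(xi) = VARPI * ||xi||
G_VEC = np.array([1.0, -1.0])    # g_i for a scalar input (+u in row 1, -u in row 2)

def f1(xi):
    """Drift of the first simulated subsystem."""
    x1, x2 = xi
    return np.array([-x1**3 - 2.0 * x2,
                     x1 + 0.5 * np.cos(x1**2) * np.sin(x2**3)])

def grad_sigma(xi):
    """Jacobian of the hypothetical basis [x1^2, x1*x2, x2^2], shape (3, 2)."""
    x1, x2 = xi
    return np.array([[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]])

def hjb_residual(w_hat, xi):
    """Return e_HJB from (16) and its gradient with respect to w_hat."""
    ds = grad_sigma(xi)
    drift = ds @ f1(xi)              # grad_sigma f, shape (3,)
    gv = ds @ G_VEC                  # grad_sigma g, shape (3,)
    s = float(w_hat @ gv)            # w^T grad_sigma g (scalar input)
    e = (xi @ Q @ xi + LAM * VARPI**2 * (xi @ xi)
         + w_hat @ drift - 0.25 * R_INV * s**2)
    de_dw = drift - 0.5 * R_INV * s * gv
    return float(e), de_dw

def critic_step(w_hat, xi, dt):
    """One Euler step of d(w_hat)/dt = -alpha * dE/dw_hat with E = e_HJB^2 / 2."""
    e, de_dw = hjb_residual(w_hat, xi)
    return w_hat - dt * ALPHA * e * de_dw
```

Because $E = \frac{1}{2} e_{HJB}^2$ is scalar here, $\partial E / \partial \hat{w} = e_{HJB}\, \partial e_{HJB} / \partial \hat{w}$, so a sufficiently small step reduces $E$ at the current state.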
Remark 1: The weight $\hat{w}$ is trained to minimize the network error term $G = \frac{1}{2} e_{HJB}^{T} e_{HJB}$. This result is obtained from (18):

$$\frac{\partial G}{\partial t} = -\alpha \frac{\partial G}{\partial \hat{w}} \tag{18}$$


Theorem 2: Consider the feedback controller in (15) with the critic weight updated by (18). Then the weight estimation error $\tilde{w} = w - \hat{w}$ and the closed-loop state vector $\xi(t)$ are uniformly ultimately bounded.
Proof: Choose the Lyapunov function:

$$V(t) = V_1(t) + V_2(t), \quad \text{where } V_1(t) = \frac{1}{2} \tilde{w}^{T}(t)\, \tilde{w}(t), \quad V_2(t) = V^* \tag{19}$$

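Using Assumption 3: $\|f_i(\xi) + g_i(\xi)\, u^*\| \leqslant \rho_{\max}$ and the definitions $\mu_i = f_i(\xi) + g_i(\xi)\, u^*$; $G_i = g_i(\xi) R^{-1} g_i(\xi)^{T}$; $\nabla\sigma = \nabla\sigma(\xi)$; $\nabla\varepsilon = \nabla\varepsilon(\xi)$, and taking the derivative of $V_1(t)$ along the training law (17), we obtain:

$$\dot{V}_1(t) = -\alpha\, \tilde{w}^{T} \Bigl(-e_{NN} + \tilde{w}^{T}\nabla\sigma\,\mu_i + \frac{1}{2}\tilde{w}^{T}\nabla\sigma\, G_i \nabla\varepsilon + \frac{1}{4}\tilde{w}^{T}\nabla\sigma\, G_i \nabla\sigma^{T}\tilde{w}\Bigr)\, \nabla\sigma \Bigl(\mu_i + \frac{1}{2} G_i \bigl(\nabla\sigma^{T}\tilde{w} + \nabla\varepsilon\bigr)\Bigr) \tag{20}$$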
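This leads to the estimate $\dot{V}_1(t) \leqslant -\pi_1$. For the term $V_2(t)$, from (12), (13) and (15) we have (21):

$$\dot{V}_2 = (\nabla V^*)^{T} \bigl(f_i + g_i(\hat{u} + \Delta)\bigr) = -\bigl(\xi^{T} Q \xi + \lambda\varrho^2(\xi)\bigr) - \frac{1}{4}(\nabla V^*)^{T} g_i R^{-1} g_i^{T} (\nabla V^*) + \frac{1}{2}(\nabla V^*)^{T} g_i R^{-1} g_i^{T} \bigl(\nabla\sigma(\xi)^{T}\tilde{w} + \nabla\varepsilon(\xi)\bigr) + (\nabla V^*)^{T} g_i \Delta \tag{21}$$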
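Assume that $\varrho(\xi) = \varpi\|\xi\|$. From (21) we have (22):

$$\dot{V}_2 \leqslant -\bigl(\lambda_{\min}(Q) + \lambda\varpi^2\bigr)\|\xi\|^2 + \theta_2 \tag{22}$$

with $\theta_2 = -\frac{1}{4}(\nabla V^*)^{T} g_i R^{-1} g_i^{T}(\nabla V^*) + \frac{1}{2}(\nabla V^*)^{T} g_i R^{-1} g_i^{T}\bigl(\nabla\sigma(\xi)^{T}\tilde{w} + \nabla\varepsilon(\xi)\bigr) + (\nabla V^*)^{T} g_i \Delta$.
Based on the above assumptions, we have (23):

$$\theta_2 \leqslant \frac{1}{4}\lambda_{\max}(R^{-1})\,(w_{\max}\nabla\sigma_{\max} + \nabla\varepsilon_{\max})^2\, G_{\max}^2 + \frac{1}{2}(\vartheta\nabla\sigma_{\max} + \nabla\varepsilon_{\max})\, G_{\max}^2\, \lambda_{\max}(R^{-1})\,(w_{\max}\nabla\sigma_{\max} + \nabla\varepsilon_{\max}) + (w_{\max}\nabla\sigma_{\max} + \nabla\varepsilon_{\max})\, G_{\max}\,\varpi\|\xi\| \tag{23}$$

Since the right-hand side of (23) grows at most linearly in $\|\xi\|$, for $\|\xi\|$ sufficiently large we have $\bigl(\lambda_{\min}(Q) + \lambda\varpi^2\bigr)\|\xi\|^2 - \theta_2 \geqslant \pi_2$ with $\pi_2 > 0$, and we obtain (24):

$$\dot{V}_2(t) \leqslant -\pi_2 \tag{24}$$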

Remark 2: The coefficients $\vartheta_1$; $\vartheta_2$ can be chosen by adjusting the NN that approximates the optimal performance index. Moreover, for an arbitrary switching index, after the time $\frac{V(0)}{\min(\pi_1; \pi_2)}$ the variables $\|\xi\|$ and $\|\tilde{w}\|$ tend to the corresponding bounded domains. The ADP controller $\hat{u}$ proposed in (15) tends to a neighborhood of $u^*$.
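Proof: The deviation of the control input is estimated as (25):

$$\|\hat{u} - u^*\| = \frac{1}{2}\left\| R^{-1} g_i(\xi)^{T} \bigl(\nabla\sigma(\xi)^{T}\tilde{w} + \nabla\varepsilon(\xi)\bigr) \right\| \leqslant \lambda_{\max}(R^{-1})\, G_{\max}\, (\nabla\sigma_{\max}\,\upsilon_1 + \nabla\varepsilon_{\max}) = \vartheta_3 \tag{25}$$

Thus the proof is completed.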


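In this section, we present simulations to validate the performance of the proposed control scheme. Let $N = 2$ and let the subsystems of the switched system be (26) and (27):

$$\begin{cases} \dot{x}_1 = -x_1^3 - 2x_2 + \bigl(u + \Delta_1(x, t)\bigr) \\ \dot{x}_2 = x_1 + 0.5\cos(x_1^2)\sin(x_2^3) - \bigl(u + \Delta_1(x, t)\bigr) \end{cases} \tag{26}$$

$$\begin{cases} \dot{x}_1 = -x_1^5 \sin(x_2) + \bigl(u + \Delta_2(x, t)\bigr) \\ \dot{x}_2 = \frac{1}{2}x_1 - \cos(x_1)\cos(x_2^3) - \bigl(u + \Delta_2(x, t)\bigr) \end{cases} \tag{27}$$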
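The initial state vector is chosen as (28):

$$x(0) = \begin{bmatrix} -5 & 5 \end{bmatrix}^{T} \tag{28}$$

The parameters are chosen as: $R = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$; $Q = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$; $\alpha = 0.1$; $\lambda = 5$.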
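The simulation setup can be sketched in code as follows. Only the subsystem dynamics and the initial state come from the paper. The switching rule, the forward-Euler integrator, and the callback signatures `controller(x)` and `disturbance(i, x, t)` are hypothetical, since the paper specifies neither the switching signal nor the disturbances. The coefficient of $x_1$ in the second subsystem is read as $1/2$, which is also an assumption about the original typesetting.

```python
import numpy as np

def subsystem(i, x, u, delta):
    """Right-hand side of subsystem i in {1, 2} for scalar input u and disturbance delta."""
    x1, x2 = x
    v = u + delta   # the disturbance enters through the input channel
    if i == 1:
        return np.array([-x1**3 - 2.0 * x2 + v,
                         x1 + 0.5 * np.cos(x1**2) * np.sin(x2**3) - v])
    if i == 2:
        return np.array([-x1**5 * np.sin(x2) + v,
                         0.5 * x1 - np.cos(x1) * np.cos(x2**3) - v])
    raise ValueError("subsystem index must be 1 or 2")

def switching_signal(t, dwell=1.0):
    """Hypothetical rule: alternate between the two subsystems every `dwell` seconds."""
    return 1 + int(t // dwell) % 2

def euler_step(x, t, dt, controller, disturbance):
    """One forward-Euler step of the switched system at time t."""
    i = switching_signal(t)
    return x + dt * subsystem(i, x, controller(x), disturbance(i, x, t))

x0 = np.array([-5.0, 5.0])   # initial state (28)
```

Because of the terms $x_1^3$ and $x_1^5$, the vector field is large near $x(0)$, so an explicit Euler scheme would need a small step size there. An adaptive integrator may be a better choice in practice.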
The simulation results shown in Figure 1 and Figure 2 validate the effectiveness of the proposed algorithm.

Figure 1. The response of x2
Figure 2. The response of x2

This paper has investigated the ADP problem for switched nonlinear systems under external disturbances. The nominal system, obtained by eliminating the disturbance, is considered first, and classical nonlinear control techniques are then used to handle the perturbed system. Neural networks have been designed to approximate the actor and the critic, and the learning algorithm tunes them simultaneously. Finally, the UUB property of the closed-loop system is guaranteed.

This research was supported by Research Foundation funded by Thai Nguyen University of Technol-


