
Proceedings of the American Control Conference, Seattle, Washington, June 1995

Identification of Nonlinear Dynamic Systems -
Classical Methods versus Radial Basis Function Networks

Oliver Nelles and Rolf Isermann
Darmstadt University of Technology, Institute of Automatic Control
Laboratory of Control Engineering and Process Automation
Landgraf-Georg-Str. 4, 64283 Darmstadt, Germany
E-mail: nelles@irt1.rt.e-technik.th-darmstadt.de

Abstract

This paper compares radial basis function networks for the identification of nonlinear dynamic systems with classical methods derived from the Volterra series. The performance of these different approaches as Hammerstein, Wiener and NDE models is analysed. Since the centres and variances of the Gaussian radial basis functions are fixed before learning and only the weights are learned, a linear optimization problem arises. Therefore training the network and estimating the parameters become comparable in computational effort.

It will be shown that the classical methods can compete with or even outperform the neural network if the assumptions about the structure are valid. However, if in practical applications the structure is not known, the radial basis function network performs much better than the classical methods.

1. Introduction

In recent years the capabilities of neural networks for system identification have been studied by many authors. Most papers concentrate their analysis on the performance of one kind of neural network. Some others compare different kinds of neural networks with their advantages and disadvantages. But neural networks and classical methods have hardly been compared.

The term "classical methods" refers to the widely known and applied parameter estimation methods that are based on Volterra kernels. Usually the simple Hammerstein, Wiener or combined models are applied. These models assume a special structure of the system, e.g. a nonlinear static function separated from the linear dynamic system. Also the more general NDE models have become quite popular, since they suit a lot of models that arise directly or by approximation from physical laws.

Neural networks, on the other hand, are not structured in this sense and try to learn any given nonlinear mapping. Most types of neural networks are static. However, a dynamic system can be modeled in discrete time by feeding previous process inputs and outputs into the network. Thus the following nonlinear mapping f(.) can be learned:

y(k) = f( u(k-1), u(k-2), ..., u(k-nu), y(k-1), y(k-2), ..., y(k-ny) )    (1)

Thus information about the dynamic orders nu and ny of the system has to be known a priori or has to be estimated from data, see e.g. [1]. Some dynamic neural networks try to avoid this disadvantage, but suffer from other drawbacks. In this paper the series-parallel model [2] is used for learning, that is, the delayed process outputs are fed into the network instead of the previous network outputs. It is known from linear system identification theory that this approach might cause a bias in the estimate. This bias is proportional to the power of the noise and is assumed to exist in nonlinear estimation as well. However, the advantage of this method is the guaranteed network stability during learning and a very simple learning law. Therefore the series-parallel model is used most frequently.

In linear systems the function f(.) is the hyperplane

y(k) = b_1·u(k-1) + b_2·u(k-2) + ... + b_nu·u(k-nu) - a_1·y(k-1) - a_2·y(k-2) - ... - a_ny·y(k-ny)    (2)

and only the slopes in each direction -a_i and b_i have to be identified to obtain a complete model description. However, for nonlinear systems the function f(.) can be of arbitrary shape and the problem is to approximate it. Thus in nonlinear system identification we have to deal with both estimation and approximation errors.

The following two sections give an introduction to the examined classical and neural methods. In section 4 the designed test processes and the excitation signal are described. Section 5 discusses the results and section 6 ends with conclusions.

2. Classical Methods

The parametric Volterra model can describe any nonlinear process of the following form [3]

L_1(D){y(t)} + F{y(t), Dy(t), ...} = L_2(D){u(t)}    (3)

where the L_i(D) are differential operators with D = d/dt and F(.) is a linear combination of a finite number of products and powers of the variable and its derivatives. The Hammerstein and Wiener models can be considered as special cases of (3), while the NDE (nonlinear differential equation) model is as general as the parametric Volterra model.

The parametric Volterra model, the generalized Hammerstein and Wiener models and the NDE model are shown in Fig. 1. The series connection of a static nonlinearity and a linear dynamic system is called the simple Hammerstein model. The series connection of a linear dynamic system and a static nonlinearity is called the simple Wiener model.
For system orders higher than one they are special cases of the corresponding generalized models; otherwise they are identical.

Fig. 1: Block diagrams of parametric models: a) parametric Volterra model, b) NDE model, c) generalized Hammerstein model, d) generalized Wiener model

The Hammerstein and NDE models are linear in their parameters, but the Wiener model is not. Thus for identification with the simple Wiener model, first the static nonlinearity and its inverse are approximated from a quasi-static test data set with a polynomial of order p. Second, the inverse static nonlinearity is used for calculating the output of the linear system. Third, the linear dynamic system is identified from these input/output signals [4]. Additionally we tested this approach for the Hammerstein type systems, but the performance was worse than estimating the dynamic and nonlinear parts simultaneously (which is possible because the Hammerstein model is linear in the process parameters).
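Since the Hammerstein model is linear in its parameters, this simultaneous estimation reduces to a single ordinary least-squares problem. The following minimal sketch illustrates the idea for a first order simple Hammerstein model with a polynomial input nonlinearity; the function name, the NumPy-based solver and the default order p = 3 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_hammerstein(u, y, p=3):
    """Least-squares fit of a first order simple Hammerstein model
    y(k) = b1*u(k-1) + ... + bp*u(k-1)**p - a1*y(k-1),
    which is linear in the parameters (b1, ..., bp, a1)."""
    # Regressors: powers of the delayed input plus the delayed output.
    Phi = np.column_stack([u[:-1] ** j for j in range(1, p + 1)] + [-y[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta  # [b1, ..., bp, a1]
```

The nonlinear and dynamic parts are thus obtained in one step, without the detour over an inverse nonlinearity that the Wiener model requires.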
The order of the approximation polynomials was determined by starting with a low order p and increasing it until no further improvement could be achieved. The polynomial orders used in this way range, depending on the process, from 2 to 13.
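The three-step Wiener procedure described above can be sketched as follows, assuming a quasi-static data set (u_qs, y_qs) is available and that NumPy's polynomial fit stands in for the paper's polynomial approximation; in practice the order p would be increased as just described until no further improvement is achieved.

```python
import numpy as np

def fit_wiener(u_qs, y_qs, u, y, p=5):
    """Three-step identification of a first order simple Wiener model."""
    # 1) Approximate the inverse static nonlinearity from quasi-static data
    #    (mapping measured outputs back to the linear block's output).
    inv_poly = np.polynomial.Polynomial.fit(y_qs, u_qs, deg=p)
    # 2) Reconstruct the unmeasured output of the linear block.
    x = inv_poly(y)
    # 3) Identify the linear dynamic system from u and x by least squares.
    Phi = np.column_stack([u[:-1], -x[:-1]])
    (b1, a1), *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
    return inv_poly, b1, a1
```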

3. Radial Basis Function Networks

Radial basis function (RBF) networks have been known as multidimensional function approximators in mathematics for a long time [5]. The breakthrough in time series analysis certainly came with the paper of Moody and Darken in 1989 [6]. RBF networks approximate the function f(.) by a weighted sum of radial basis functions. Fig. 2 shows the network structure. In this paper Gaussian RBFs are used; thus, using the Euclidean distance measure, the network's output can be written as

y = Σ_{i=1}^{N} w_i · exp( -(1/2) · Σ_{j=1}^{n} (x_j - c_ij)² / σ_ij² )    (4)

where c_ij is the j-th component of the i-th centre, σ_ij is the j-th component of the i-th standard deviation, N is the number of RBFs and n = nu + ny is the number of dimensions. Equation (4) differs from the form mainly used in standard papers in that it allows different widths in each dimension.

This paper only handles the case of fixed centres c_ij and standard deviations σ_ij. Thus learning the weights w_i is a linear optimization problem, because the error is linear in the parameters w_i. The following approach has been taken. First, the minimum and maximum values of the network's inputs, naturally given e.g. by actuator boundaries, are determined or estimated. Then the RBF centres are positioned on a regular lattice (see Fig. 3), with the number of RBFs in each dimension as a free parameter.
After that, the standard deviations are determined by

σ_ij = k_j · (distance of two neighbouring RBFs in dimension j)    (5)

with the k_j as free parameters. So all N RBFs have the same set of standard deviations, and usually all k_j are chosen equal too. If different standard deviations are chosen in each dimension, the Gaussians are elliptic and not radial any more.

Fig. 2: Radial basis function network with two input nodes, N Gaussians and one output node

Fig. 3: 4x4 RBF centres positioned on a lattice

For the choice of k_j and σ_ij theoretic foundations are given in [7]. However, assumptions have to be made about the function f(.) concerning its smoothness and an upper bound on the magnitude of its spectrum. Our experimental investigations show that in most cases the best results can be obtained by choosing k_j between 2 and 3. If the number of RBFs per dimension increases, the optimal k_j decreases less than linearly. These large sigmas lead to good results because they offer wider "possibilities" for the optimization algorithm, since most or even all Gaussians contribute significantly to any approximated point. On the other hand, sigmas larger than one destroy the desirable local properties of RBF approximators and lead to uninterpretably high absolute values (positive and negative) of the optimal weights. Although large sigmas lead to better interpolation between the training points, they tend to give bad results outside the training area.

To keep the number of parameters small we mainly worked with lattice sizes from 3x3 to 6x6. Since first order systems are considered here, n = 2 and we can set x1 = u(k-1) and x2 = y(k-1); y = y(k) has to be predicted by the network. In this paper the weights have been optimized by directly solving the linear equation system. Due to bad numerical conditioning when using larger values for the standard deviations, the pseudo-inverse was evaluated by singular value decomposition.
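Putting the pieces together, a minimal sketch of this construction, assuming NumPy: n_per_dim is the free lattice size, k corresponds to the factor k_j of equation (5), and np.linalg.pinv provides the SVD-based pseudo-inverse used for the weights. Names and defaults are illustrative.

```python
import numpy as np

def rbf_design_matrix(X, centres, sigmas):
    """Evaluate the Gaussians of equation (4): entry (k, i) is
    exp(-1/2 * sum_j ((x_j - c_ij) / sigma_j)**2)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) / sigmas) ** 2
    return np.exp(-0.5 * d2.sum(axis=2))

def fit_rbf(X, y, n_per_dim=4, k=2.5):
    """Fixed lattice centres, widths from equation (5), weights by a
    linear least-squares solve via the pseudo-inverse."""
    lo, hi = X.min(axis=0), X.max(axis=0)          # e.g. actuator boundaries
    grids = [np.linspace(l, h, n_per_dim) for l, h in zip(lo, hi)]
    centres = np.stack(np.meshgrid(*grids), -1).reshape(-1, X.shape[1])
    sigmas = k * (hi - lo) / (n_per_dim - 1)       # equation (5)
    Phi = rbf_design_matrix(X, centres, sigmas)
    w = np.linalg.pinv(Phi) @ y                    # SVD-based pseudo-inverse
    return centres, sigmas, w
```

For the first order examples considered here, X holds the two regressor columns u(k-1) and y(k-1) of the training record (series-parallel model) and y the corresponding outputs y(k).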

4. Test Processes and Excitation

Five representative processes out of many examined will be considered in this paper. The block diagrams of these processes are shown in Fig. 4. The first process is of Hammerstein type, the second one is of Wiener type, the third process suits an NDE model, and the fourth and fifth processes do not match any of these structures. It was expected that the RBF network would outperform the classical methods with processes 4 and 5, while the classical models with the appropriate structure would lead to the best results for processes 1, 2 and 3. In practice, however, usually no a priori knowledge about the structure is available. Special models are often merely assumed for simplicity, and one cannot expect the process to match exactly all underlying assumptions.

Fig. 4: Test processes:
1) atan(.) static nonlinearity with linear first order dynamic system (Hammerstein)
2) linear first order dynamic system with atan(.) static nonlinearity (Wiener)
3) y²(k-1) is fed back (NDE) [3]
4) dead zone with linear first order dynamic system and cubic static nonlinearity (q(x) = 0.1x³)
5) u³(k) as input and y(k-1)/(1+y²(k-1)) is fed back [2]
To obtain good identification results the choice of the input signal u(k) is of great importance. Only very little literature is available on this topic, e.g. [8] and [9]. It is clear that all frequencies and amplitudes of interest should be excited. For linear processes typically a PRBS (pseudo random binary signal) is used. This is a periodic signal that closely imitates white noise; thus it excites all frequencies. We modify this PRBS by assigning a different amplitude in the range Amin to Amax to each constant signal part. Fig. 5 shows this APRBS (amplitude modulated PRBS) for Amin = -4 and Amax = 4. We tested various other alternatives for the excitation signal, but the APRBS performed best. It should be noted that the APRBS does not change sign every step like the PRBS does. This is especially helpful when identifying systems with dead zone nonlinearities. Also the minimum hold time should be long enough so that the system's output has time to approach the new set point (that is no problem with linear systems). Otherwise only output amplitudes close to zero would be generated. On the other hand, a very long minimum hold time would lead to quasi-static excitation. Thus the minimum hold time was chosen equal to the process time constant (about 10 s in the examples presented here).
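A sketch of such an excitation generator, with the simplification that the constant segment lengths are drawn randomly between one and two minimum hold times instead of being taken from the shift register of a true PRBS; all names and defaults are illustrative assumptions.

```python
import numpy as np

def aprbs(n_steps, hold_min, a_min=-4.0, a_max=4.0, seed=0):
    """Amplitude modulated PRBS: a piecewise constant signal whose
    segments last at least hold_min samples (the minimum hold time)
    and whose levels are drawn uniformly from [a_min, a_max]."""
    rng = np.random.default_rng(seed)
    u = np.empty(n_steps)
    i = 0
    while i < n_steps:
        length = int(hold_min + rng.integers(0, hold_min))  # >= hold_min
        u[i:i + length] = rng.uniform(a_min, a_max)         # new level
        i += length
    return u

# Example: 400 samples with a 10-sample minimum hold time
# (the 10 s process time constant at 1 s sampling).
u_train = aprbs(400, hold_min=10)
```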

Fig. 5: Amplitude modulated pseudo random signal with amplitudes between -4 and 4 and a register length of 7
5. Results

First the network and the parametric models were trained with an APRBS. Then the obtained models were tested with different input signals: the training signal itself, and sine, sawtooth and rectangular signals with different amplitudes and frequencies. The following results have been obtained (see examples in Fig. 6):

Process 1 (Hammerstein): The Hammerstein model performed best, but the RBF network was only slightly worse. The Wiener model led to very bad results.

Process 2 (Wiener): The RBF network performed by far best. The Wiener model led to 5 to 10 times larger errors, but still gave a good approximation. The Hammerstein and NDE models failed totally.

Process 3 (NDE): The NDE model leads to a perfect fit. The RBF network's performance was worse, but still very good. The Hammerstein model failed (100 times larger error than the RBF network's).

Fig. 6: Some identification results for process 4 with a sine input test signal for the RBF network, Hammerstein, Wiener and NDE models
Process 4: The RBF network led to by far the best results. The Hammerstein model was worse (10 times larger error), but still quite good. Wiener and NDE failed.

Process 5: The RBF network performed best, but the Hammerstein model did only slightly worse for large input amplitudes. Wiener and NDE models failed.

Roughly speaking, the expected results have been confirmed. The RBF network performed very well in all cases. If the assumed structure was different from the actually existing one, the parametric models failed in most cases. However, there are some very remarkable outcomes. First, the Wiener model led to worse results than the RBF network, although all structure assumptions made were valid. This is obviously due to problems with approximating the inverse static nonlinearity. Second, for large input amplitudes the Hammerstein model performed quite well with process 5, although the structure assumptions were not valid. The reason for this astonishing result is the dominating nonlinearity u³ (the feedback gain is very small), which causes a structure close to a Hammerstein model. However, for small input amplitudes between -1 and 1 the time constant is highly dependent on the process output and only the RBF network gives a good approximation.

To examine the sensitivity to noise, a (nearly) white noise signal was added to the process output. No major differences to the previous results have been observed. But there is one important and interesting exception. For larger variances, e.g. k_j = 3, the RBF network tends to fail with process 2 (see Fig. 7). This may be explained as follows. The network could only be trained for output values between -1.57 and 1.57 (= atan(∞)), and all values outside this interval are due to the added noise signal. Although interpolation between two trained areas improves with larger values of σ, the generalisation performance for areas outside decreases. This phenomenon is probably due to the very high positive and negative weights learned for large σ-values. It is not observed for smaller σ-values.

Fig. 7: RBF network with large σ fails with process 2, if the output signal is spoiled with noise

6. Conclusions

For the identification of nonlinear dynamic systems, radial basis function networks and classical parameter estimation methods were compared. The RBF networks led to very good results for all examined processes. In most cases the classical methods performed better if the structure assumptions are valid, but they failed if the assumptions are invalid.

It was pointed out that large RBF variances lead to better but more noise-sensitive results and destroy the local approximation properties. Therefore a compromise has to be found. Since the variances and centres were fixed before learning, the calculation of a pseudo-inverse could be used for weight determination. Thus the RBF network learning was very fast (about 1 minute on an average PC). Therefore both approaches are perfectly comparable.

In our further work we intend to expand the results to systems of higher order and multi-input/output processes. It must be seen clearly that RBF networks suffer from the "curse of dimensionality" and so might have problems with high order systems. On the other hand, when using classical methods the probability of the structure assumptions being valid also decreases with increasing order.

Acknowledgements

We want to thank Peter Damm, who performed many of the simulations presented above during his diploma thesis.

References

[1] He X., Asada H.: "A New Method for Identifying Orders of Input-Output Models for Nonlinear Dynamic Systems", ACC, 1993
[2] Narendra K.S., Parthasarathy K.: "Identification and Control of Dynamical Systems Using Neural Networks", IEEE Transactions on Neural Networks, Vol. 1, No. 1, March 1990
[3] Isermann R., Lachmann K.-H., Matko D.: "Adaptive Control Systems", Prentice Hall, 1992
[4] Lachmann K.-H.: "Parameteradaptive Regelalgorithmen für bestimmte Klassen nichtlinearer Prozesse mit eindeutigen Nichtlinearitäten", Fortschrittberichte der VDI-Zeitschriften, Reihe 8, Nr. 66, VDI-Verlag, 1983
[5] Poggio T., Girosi F.: "A Theory of Networks for Approximation and Learning", MIT A.I. Memo No. 1140, C.B.I.P. Paper No. 31, July 1989
[6] Moody J., Darken C.J.: "Fast Learning in Networks of Locally-Tuned Processing Units", Neural Computation 1, pp. 281-294, 1989
[7] Sanner R.M., Slotine J.-J.E.: "Gaussian Networks for Direct Adaptive Control", IEEE Transactions on Neural Networks, Vol. 3, No. 6, November 1992
[8] Sanner R.M., Slotine J.-J.E.: "Stable Recursive Identification Using Radial Basis Function Networks", ACC, 1992
[9] Gorinevsky D.: "On the Persistency of Excitation in RBF Network Identification", ACC, 1994
[10] Haber R., Unbehauen H.: "Structure Identification of Nonlinear Dynamic Systems - A Survey on Input/Output Approaches", Automatica, Vol. 26, No. 4, pp. 651-677, 1990
[11] Kecman V., Reiffer B.-M.: "Exploiting the Structural Equivalence of Learning Fuzzy Systems and Radial Basis Function Neural Networks", EUFIT, Aachen, 1994
