1 Introduction

The most effective methods for simulation, prognosis, and control of technical systems are based on parametric models. In control design, these models are either used for simulation during the design of the control algorithm, or their parameters are used directly to design a controller. Besides conventional, physically motivated modelling, system identification offers an alternative. Since the biggest effort in control design lies in the modelling process [1, 2], the ever-growing complexity of technological applications requires expert knowledge that is not freely available. This difficulty is compounded by the fact that not all phenomena are understood well enough to be modelled physically.

This contribution is concerned with shaping the process of generating suitable data for the identification of parameters with low uncertainty. For this purpose, the optimal experiment design for the identification of Takagi-Sugeno (TS) models is investigated. Multisine and multistep signals, as well as a combination of both, are used as parametric signal models to relax the optimization problem. The impact of different scalar measures of the Fisher information matrix (FIM) is investigated, as well as different experiment design procedures that group the TS model parameters into local model and partition parameters. Two independent goals are investigated: the reduction of the uncertainty and the improvement of the simulation quality on arbitrary validation data. After a short introduction of the model class and the identification process, the method is presented, followed by three case studies: a simulated test system that has previously served as a benchmark [3], a test stand of a servo-pneumatic longitudinal drive (SPLD) that was used in Zaidi and Kroll [4], and a three-tank system used to validate the results of the parameter study.

2 Literature Overview

Experiment design, based on the works of Fisher at the beginning of the twentieth century, has been steadily extended and improved [5,6,7]. This development was driven by the need to analyze the significance of impact factors in regression applications. Industry has a strong interest in generating system inputs that improve the data generation for parametrized models, e.g., to reduce the experiment time required to obtain a suitable model quality.

Experiment design can be categorized in five ways: model-based vs. model-free, online vs. offline, linear vs. nonlinear, static vs. dynamic, and bias vs. variance reduction. There are other goals that can influence the experiment design, e.g., structure selection or instrumentation; these factors will not be addressed in this contribution. In process model-free designs, space-filling approaches are used. In the field of dynamic test signal design, these approaches make use of standard test signals, which are comprehensively presented in [8, 9]; their modification possibilities w.r.t. the identification of nonlinear models are discussed in [10,11,12]. Model-based approaches are often optimal experiment designs (OED) that use the model equations to minimize some criterion.

The minimization of a measure on the FIM is an OED approach and is the focus of this contribution. Kroll and Dürrbaum [13] and Dürrbaum and Kroll [14] are the motivation for this work. In Kroll and Dürrbaum [13], the FIM is used to determine the optimal experiment design for estimating the parameters of a static TS model; only the determinant is used as a scalar measure on the FIM. In Dürrbaum and Kroll [14], space-filling designs are introduced for static models to remedy the effect of wrongly assumed partition parameters. Although some of these aspects are already present in the static case, this contribution addresses the dynamic case in a systematic way, analyzing the impact of different optimization procedures and FIM measures with respect to the achievable uncertainty reduction as well as the model prediction quality. The actual implementation of the design scheme for nonlinear dynamic systems, which is left open in many other contributions, is also addressed. In the dynamic case, the broad term “space-filling” design translates to broadband excitation signals, making the approach for dynamic systems very different from its motivation rooted in static systems. The problems that arise from applying FIM-based experiment design to nonlinear system identification are addressed in other contributions, but the actual implementation of such designs for dynamic systems requires a holistic view of the problem. In the seminal work [15], the problem is stated in a way that makes it similar to an optimal control problem; the emergence of a parameter-dependent FIM is briefly discussed, but the complete ramifications of using dynamic models in the context of the FIM are not addressed. In Hametner [16], FIM-based test signal design is performed online to handle constraints locally. In Heinz [17], input signals are not the result of an optimization but are selected w.r.t. model-based measures. Larsson and Hjalmarsson [18] address the structure of the optimization problem. In Fang and Shenton [19], the problem of backpropagating the model output for the construction of the FIM is addressed for a model that is linear in its parameters (LiP), which reduces the complexity of the FIM significantly.

In recent works [20], confidence regions obtained from a posteriori analysis are parametrized and used for subsequent designs. In [21], the calculation of the posterior is circumvented by representing the desired samples as a reference trajectory, such that the input design task is transformed into an optimal control problem. Neilsen [22] uses the FIM to assess the uncertainty of quantities other than the model parameters that are more important to the applications at hand. Pleşu [23] uses the FIM to assess the global sensitivity prior to any experiment design, emphasizing the importance of high-quality initial models. In Walz et al. [24], the importance of task-oriented experiment design is stressed: excluding irrelevant system behavior from both the modelling and the experiment design significantly reduces the experiment costs in terms of time and computation. Current works on fuzzy systems focus on control design, such as Wang et al. [25] and Shen et al. [26], which make use of the persistent dwell-time switching law to design multi-objective fault-tolerant and H\({_{\infty}}\) controllers.

The focus of this contribution is the investigation of the design choices of the complete process from the initial design to optimal experiments. In the previously mentioned works that use an optimization procedure to generate an optimal input, the problem of constraints is either bypassed using online designs or moved to the specific application. Without suitable constraints, no technically feasible signals will result. In this contribution, parametrized signal models are used to reduce the number of optimization parameters as well as to exploit the inherent signal structure. This contribution specifically targets TS models, and therefore their unique model structure can be exploited in the design process. The details of using optimal model-based designs for dynamic nonlinear modelling are addressed, such as the necessity of a (high-quality) initial simulation model, obtained using process model-free designs, to run the optimization. In many other works, the determinant is used as a measure on the FIM [27, 28]; in this contribution, the impact of different measures is investigated.

3 Problem Statement

The goal is to estimate parameters of locally affine TS models given a system

$$\begin{aligned} y(k)\,=\,&f(y(k-1),\dots ,y(k-n), \nonumber \\&u(k-1),\dots ,u(k-m))+\epsilon (k) \end{aligned}$$
(1)

with additive noise \(\epsilon (k)\), input u(k), and output y(k). In order to identify the model parameters with minimal uncertainty, a test signal design scheme is deployed to generate suitable data. The scheme is based on the FIM that is used to estimate the covariance matrix of the model parameters. In the case of independently and identically normally distributed noise (i.i.d.) with variance \(\sigma ^2\) and mean \(\mu =0\) and for an unbiased estimator the FIM is given as

$$\begin{aligned} {\varvec{I}}&=\frac{1}{\sigma ^2}\sum \limits _{k=1}^N\left[ \left[ \frac{\partial {\hat{y}}(k)}{\partial \varvec{\varTheta }}\right] \cdot \left[ \frac{\partial {\hat{y}}(k)}{\partial \varvec{\varTheta }}\right] ^\top \right] _{\varvec{\varTheta }=\varvec{\varTheta }_0} \end{aligned}$$
(2)

with the number of observations N and the parameter vector \(\varvec{\varTheta }\). In the case of a nonlinear dynamic system, the following holds: The FIM depends on the model parameters; therefore, an initial parameter vector \(\varvec{\varTheta }_0\) is necessary for the optimization scheme. The derivatives of the model output \({\hat{y}}(k)\) contain lagged values of the output and the input u(k); therefore, a change in u(K) affects all values \({\hat{y}}(k)\) with \(k>K\).

To assess the uncertainty, scalar measures of the FIM are used, which leads to the following optimization problem:

$$\begin{aligned} {\varvec{u}}_\mathrm {opt}&=\mathrm {arg}\,\underset{{\varvec{u}}}{\mathrm {min}}\,J({\varvec{I}}) \end{aligned}$$
(3)

\({\varvec{u}}=\left\{ u(k)\right\}\) is the vector of the input time series and J is a scalar measure of the FIM \({\varvec{I}}\). To solve this problem, an (initial) simulation model is used to calculate the output values during the optimization, which is similar to optimal control problems. However, due to the nonlinearity of the cost functions J, which will be introduced later, general-purpose optimization tools must be used. To keep the problem computationally feasible, different signal models are used to reduce the number of optimization parameters. The focus of this contribution is the analysis of the impact of design choices such as signal models, FIM measures, and optimization procedures on the ability of the test signals to generate suitable identification data, as shown in Fig. 1. The computation inputs comprise all explicit inputs to the optimization procedure, such as the hyperparameters of the optimization algorithm or the bounds on the signal model parameter values.

Fig. 1

Impact factors of the optimization scheme

In preliminary investigations, variation of these factors resulted in vastly different results, therefore the focus of this contribution is the impact analysis of said factors.

4 Model Class and Identification Process

4.1 Locally Affine Fuzzy Takagi-Sugeno Models

In this contribution, locally affine dynamical Takagi-Sugeno models are considered. They are universal approximators [29]. TS models are a superposition of c local models \({\hat{y}}_i(k)\) that are weighted by their respective fuzzy basis functions \(\phi _i(k)\) (FBF). The prediction equation of such a model is:

$$\begin{aligned} {\hat{y}}(k)=\sum \limits _{i=1}^{c}\phi _i\left( {\varvec{z}}\left( k\right) ,\varvec{\varTheta }_{\mathrm {MF}}\right) \cdot {\hat{y}}_i\left( \varvec{\varphi }\left( k\right) ,\varvec{\varTheta }_{\mathrm {LM},i}\right) \end{aligned}$$
(4)

In (4), the FBF depend on the scheduling variable \({\varvec{z}}(k)\) and the fuzzy membership function (MF) parameters \(\varvec{\varTheta }_{\mathrm {MF}}\) (see (10)), which determine the partitioning into local models. The local models depend on the regression variable \(\varvec{\varphi }(k)\) and the i-th local parameter vector \(\varvec{\varTheta }_{\mathrm {LM},i}\). Local models in ARX configuration (autoregressive with exogenous input) can be written as:

$$\begin{aligned} {\hat{y}}_i(k)&=\varvec{\varphi }^{\top }(k)\cdot \varvec{\varTheta }_{\mathrm {LM},i} \end{aligned}$$
(5)

with the regression variable:

$$\begin{aligned} \varvec{\varphi }^{\top }(k)=&\left[ 1\quad y(k-1)\quad \cdots \quad y(k-n)\right. \nonumber \\&\left. u(k-1-\tau )\quad \cdots \quad u(k-m-\tau )\right] \end{aligned}$$
(6)

The dead time \(\tau\) is not considered in the following. The i-th local parameter vector comprises the coefficients \(a_{i,j}\) of the lagged outputs, the coefficients \(b_{i,j}\) of the lagged inputs, and the affine term \(c_i\):

$$\begin{aligned} \varvec{\varTheta }_{\mathrm {LM},i}^{\top }&=\begin{bmatrix}c_i&a_{i,1}&\cdots&a_{i,n}&b_{i,1}&\cdots&b_{i,m}\end{bmatrix} \end{aligned}$$
(7)

This leads to an alternative formulation of (5):

$$\begin{aligned} {\hat{y}}_i(k)=\sum \limits _{j=1}^{n}a_{i,j}y(k-j)+\sum \limits _{j=1}^{m}b_{i,j}u(k-j)+c_i \end{aligned}$$
(8)

In locally affine TS models, the nonlinear behavior of the global system is encoded in the partitioning that is defined by the FBF. Therefore, the choice of the scheduling variable \({\varvec{z}}(k)\) and of the membership function parameters \(\varvec{\varTheta }_{\mathrm {MF}}\) defines the global nonlinear structure. In practice, the scheduling variable is chosen using prior knowledge; without the use of brute-force methods [30], it is typically set equal to the regressor. The case of external scheduling is excluded in this contribution. Instead, it is assumed that the scheduling space is part of the regression space, which is a plausible approach for the modelling of dynamic systems, and that \({\varvec{z}}(k)\) can be derived from the regression vector by a linear mapping.

The FBF are chosen as prototype-based membership functions of the fuzzy-c-means (FCM) cluster algorithm:

$$\begin{aligned} \mu _i\left( {\varvec{z}}\left( k\right) ,\varvec{\varTheta }_\mathrm {MF}\right)&=\left[ \sum \limits _{j=1}^{c}\left( \frac{\left\Vert {\varvec{z}}\left( k\right) -{\varvec{v}}_i\right\Vert }{\left\Vert {\varvec{z}}\left( k\right) -{\varvec{v}}_j\right\Vert }\right) ^{\frac{2}{\nu -1}}\right] ^{-1} \end{aligned}$$
(9)

In (9), \(\nu >1\) is the fuzziness parameter that regulates how sharp the transitions between partitions are. It can be shown that membership functions (9) are normalized, meaning \(\sum \nolimits _{j=1}^c\mu _j\left( {\varvec{z}}\left( k\right) ,\varvec{\varTheta }_\mathrm {MF}\right) \equiv 1\,\forall \,{\varvec{z}}(k)\). It is possible to choose different distance norms \(\left\Vert \cdot \right\Vert\). In this contribution, the Euclidean distance is used. The prototypes \({\varvec{v}}_i\) make up the vector of membership function parameters:

$$\begin{aligned} \varvec{\varTheta }_{\mathrm {MF}}^\top&=\begin{bmatrix}{\varvec{v}}_1^\top&\cdots&{\varvec{v}}_c^\top \end{bmatrix} \end{aligned}$$
(10)
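
As an illustration, the following MATLAB sketch evaluates the FBF (9) for a batch of scheduling vectors. The function name and the data layout (scheduling data row-wise in Z, prototypes row-wise in V) are illustrative assumptions, not taken from the original implementation.

    function mu = fbfMembership(Z, V, nu)
    % FBF according to (9): Z is the (N x dim_z) scheduling data, V the (c x dim_z)
    % prototype matrix, nu > 1 the fuzziness parameter. Returns mu of size (N x c).
        c = size(V, 1);
        D = zeros(size(Z, 1), c);
        for i = 1:c
            D(:, i) = vecnorm(Z - V(i, :), 2, 2);   % Euclidean distances to prototype v_i
        end
        D = max(D, eps);                            % avoid division by zero at the prototypes
        mu = zeros(size(D));
        for i = 1:c
            mu(:, i) = 1 ./ sum((D(:, i) ./ D).^(2/(nu - 1)), 2);
        end
    end

The rows of mu sum to one, which reproduces the normalization property stated above.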

4.2 Identification Process

The identification of the parameters in (4) is conducted in four steps:

  1. Determination of the hyperparameters n, m, c, \(\nu\)

  2. Calculation of the initial partitioning through clustering

  3. Calculation of the initial local model parameters using ordinary least-squares (LS) estimation

  4. Optimization of the model parameters using parameter values from previous steps for initialization

Step 1. This step can be conducted in various ways. Cluster validity measures like those in Juhász et al. [31] should be used with caution in the context of dynamic models, since these measures treat the problem as quasi-static, which might not reflect the dynamical dependencies. In the model equations, \(\nu\), c, as well as the input and output lags n and m, are hyperparameters. They are determined by structure selection methods and prior knowledge.

Step 2. With the chosen scheduling variable and the hyperparameters c and \(\nu\), the initial partitioning can be calculated. The prototypes \({\varvec{v}}_i\) are obtained by FCM clustering [32]. Since the FCM algorithm only converges locally, it is run multiple times with random initializations.
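
A minimal sketch of this step, assuming the Fuzzy Logic Toolbox function fcm; the variable names and the number of restarts are illustrative (Z holds the scheduling data row-wise):

    % Repeated FCM runs with random initializations; the run with the lowest
    % final objective value provides the initial prototypes.
    bestObj = inf;
    for run = 1:20
        [V, U, objFcn] = fcm(Z, c, [nu, 200, 1e-6, 0]);   % [exponent, maxIter, minImprove, verbosity]
        if objFcn(end) < bestObj
            bestObj = objFcn(end);
            V_init  = V;                                  % prototypes used to initialize Theta_MF
        end
    end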

Step 3. For each data point \({\varvec{z}}(k)\) its membership \(\mu _{i,k}({\varvec{z}}(k))\) to each cluster is known. Using (5), (4) can be rewritten as follows:

$$\begin{aligned} {\hat{y}}\left( k\right)&=\sum \limits _{i=1}^{c}\mu _i\varvec{\varphi }^{\top }\varvec{\varTheta }_{\mathrm {LM},i}\nonumber \\&=\underset{\varvec{\varphi }^\top _\mathrm {E}}{\underbrace{\begin{bmatrix}\mu _1\varvec{\varphi }^{\top }\cdots \mu _c\varvec{\varphi }^{\top }\end{bmatrix}}}\cdot \underset{\varvec{\varTheta }_{\mathrm {LM}}}{\underbrace{\begin{bmatrix}\varvec{\varTheta }_{\mathrm {LM},1}^\top \cdots \varvec{\varTheta }_{\mathrm {LM},c}^\top \end{bmatrix}^\top }}\nonumber \\&:=\varvec{\varphi }^\top _\mathrm {E}\cdot \varvec{\varTheta }_{\mathrm {LM}} \end{aligned}$$
(11)

For readability, the arguments of the terms in (11) are dropped. In (11), \(\varvec{\varphi }^\top _\mathrm {E}\) is called the extended regression vector and \(\varvec{\varTheta }_\mathrm {LM}\) the local model (LM) parameter vector. Evaluating the model at \(N-n\) data points (in the case of \(n\ge m\)) leads to the following system of equations:

$$\begin{aligned} \underset{\hat{{\varvec{Y}}}}{\underbrace{\begin{bmatrix}{\hat{y}}\left( n\right) \\ \vdots \\ {\hat{y}}\left( N\right) \end{bmatrix}}}&=\underset{\varvec{\varPhi }_\mathrm {E}}{\underbrace{\begin{bmatrix}\varvec{\varphi }^{\top }_\mathrm {E}\left( n\right) \\ \vdots \\ \varvec{\varphi }^{\top }_\mathrm {E}\left( N\right) \end{bmatrix}}}\varvec{\varTheta }_\mathrm {LM} \end{aligned}$$
(12)

Setting the model output equal to the measured output \({\varvec{Y}}^\top =\begin{bmatrix}y(n)\cdots y(N)\end{bmatrix}\) leads to an overdetermined system of linear equations. If a quadratic cost function is minimized, the resulting parameters \(\hat{\varvec{\varTheta }}_\mathrm {LM}\) are the local model parameters of the nonlinear ARX (NARX) model. The quadratic cost function in the prediction error method (PEM) framework is:

$$\begin{aligned} J_\mathrm {PEM}\left( \varvec{\varTheta }_\mathrm {LM}\right) =\left\Vert \hat{{\varvec{Y}}}\left( \varvec{\varTheta }_\mathrm {LM}\right) -{\varvec{Y}}\right\Vert ^2 \end{aligned}$$
(13)

For the Euclidean norm, the minimizing argument is:

$$\begin{aligned} \hat{\varvec{\varTheta }}_\mathrm {LM}&=\mathrm {arg}\,\underset{\varvec{\varTheta }_\mathrm {LM}}{\mathrm {min}}J_\mathrm {PEM}=\left( \varvec{\varPhi }_\mathrm {E}^\top \varvec{\varPhi }_\mathrm {E}\right) ^{-1}\varvec{\varPhi }_\mathrm {E}^\top {\varvec{Y}} \end{aligned}$$
(14)
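
The following MATLAB sketch illustrates Step 3 for the regressor structure (32) of the simulation case study (n = 2, m = 1); y and u are measured column vectors, mu is the (N x c) membership matrix (e.g., from the membership function sketch above), and all variable names are illustrative:

    n    = 2;  N = numel(y);
    K    = ((n+1):N).';                               % time indices with complete regressors (1-based)
    Phi  = [ones(numel(K),1), y(K-1), y(K-2), u(K-1)];% regression vectors, cf. (6) and (32)
    nphi = size(Phi, 2);
    PhiE = zeros(numel(K), c*nphi);
    for i = 1:c
        PhiE(:, (i-1)*nphi + (1:nphi)) = mu(K, i) .* Phi;  % FBF-weighted local regressors, cf. (11)
    end
    ThetaLM = (PhiE' * PhiE) \ (PhiE' * y(K));        % LS estimate, cf. (14)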

Step 4. Parameter values determined in steps 2 & 3 can be used to initialize the nonlinear optimization. The partition parameters from clustering and the local model parameters from (14) are not optimal w.r.t. the recursive model evaluation. The recursive model evaluation will be called simulation, since only initial conditions are needed. The local model parameters and the partition parameters are subsumed into the model parameter vector:

$$\begin{aligned} \varvec{\varTheta }^\top&=\begin{bmatrix}\varvec{\varTheta }^\top _\mathrm {LM}&\varvec{\varTheta }^\top _\mathrm {MF}\end{bmatrix} \end{aligned}$$
(15)

The objective function of the nonlinear optimization problem is:

$$\begin{aligned} \hat{\varvec{\varTheta }}&=\mathrm {arg}\,\underset{\varvec{\varTheta }}{\mathrm {min}}\sum \limits _{k=1}^N\left( y(k)-{\hat{y}}\left( k,\varvec{\varTheta }\right) \right) ^2 \end{aligned}$$
(16)

The nonlinear optimization is conducted using the MATLAB function lsqnonlin with the Levenberg-Marquardt algorithm.
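
A minimal sketch of this step, assuming a placeholder function simulateTS that evaluates (4) recursively (simulation) for a given parameter vector, input, and initial conditions; variable names are illustrative:

    Theta0 = [ThetaLM; reshape(V_init.', [], 1)];     % initialization from Steps 2 and 3, cf. (15)
    res    = @(Theta) simulateTS(Theta, u, y0) - y;   % residual vector of the cost (16)
    opts   = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
    ThetaHat = lsqnonlin(res, Theta0, [], [], opts);  % no bounds with Levenberg-Marquardt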

4.3 Fisher Information Matrix for Dynamical TS Models

The Fisher Information Matrix \({\varvec{I}}\) for i.i.d. normally distributed noise is given in (2). The Cramér-Rao-Lower-Bound (CRLB) relates the covariance matrix of the estimated parameters with the FIM:

$$\begin{aligned} \mathrm {cov}\left( \hat{\varvec{\varTheta }}\right) \ge \left( {\varvec{I}}\left( \hat{\varvec{\varTheta }}\right) \right) ^{-1} \end{aligned}$$
(17)

Since the parameters of a TS model can be divided into two groups, the FIM is treated in the same way. The derivatives with respect to the local model parameters result in the extended regression vector:

$$\begin{aligned} \frac{\partial {\hat{y}}(k)}{\partial \varvec{\varTheta }_\mathrm {LM}}&=\varvec{\varphi }_\mathrm {E} \end{aligned}$$
(18)

The derivatives in the NARX case with respect to the partition parameters are more complicated:

$$\begin{aligned} \frac{\partial {\hat{y}}(k)}{\partial \varvec{\varTheta }_\mathrm {MF}}&=\sum \limits _{i=1}^c\left( \frac{\partial {\mu _i}(k)}{\partial \varvec{\varTheta }_\mathrm {MF}}{\hat{y}}_i(k)+ {\mu _i}(k)\underset{={\varvec{0}}}{\underbrace{\frac{\partial {\hat{y}}_i(k)}{\partial \varvec{\varTheta }_\mathrm {MF}}}}\right) \end{aligned}$$
(19)
$$\begin{aligned} \frac{\partial {\mu _i}(k)}{\partial \varvec{\varTheta }_\mathrm {MF}}&=\left[ \frac{\partial {\mu _i}(k)}{\partial {\varvec{v}}_1}\,\cdots \,\frac{\partial {\mu _i}(k)}{\partial {\varvec{v}}_p}\,\cdots \,\frac{\partial {\mu _i}(k)}{\partial {\varvec{v}}_c}\right] \end{aligned}$$
(20)
$$\begin{aligned} \frac{\partial {\mu _i}(k)}{\partial {\varvec{v}}_p}&={\left\{ \begin{array}{ll}\frac{2\mu _i(k)^2}{\nu -1}\frac{{\varvec{z}}(k)-{\varvec{v}}_i}{\left| {\varvec{z}}(k)-{\varvec{v}}_i\right| }\left( \frac{1}{\mu _i(k)}-1\right) &{}i=p\\ -\frac{2\mu _i(k)^2}{\nu -1}\left( {\varvec{z}}(k)-{\varvec{v}}_p\right) \frac{\left| {\varvec{z}}(k)-{\varvec{v}}_i\right| ^{\frac{2}{\nu -1}}}{\left| {\varvec{z}}(k)-{\varvec{v}}_p\right| ^{\frac{\nu +1}{\nu -1}}}&{}i\ne p\end{array}\right. }\nonumber \\&\mathrm {with}\,\nu >1 \end{aligned}$$
(21)
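
A sketch of how the FIM (2) can be assembled from these derivatives; outputGradient is a placeholder that returns the stacked derivative of the model output w.r.t. all model parameters according to (18)-(21), evaluated at the initial parameters (variable names are illustrative):

    np = numel(Theta0);                               % number of model parameters
    I  = zeros(np, np);
    for k = 1:N
        g = outputGradient(k, Theta0, u, yhat);       % d yhat(k) / d Theta at Theta_0
        I = I + g * g.';                              % rank-one update per sample, cf. (2)
    end
    I = I / sigma2;                                   % divide by the noise variance sigma^2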

5 FIM-Based Test Signal Design

In this section, the application of the FIM-based optimal test signal design to nonlinear dynamic TS models is presented.

Fig. 2

Test signal optimization scheme

5.1 Design Process

Compared with static and/or linear FIM-based OED problems, there are significant procedural differences, not all of which have been addressed in the literature:

  • A change in the input at time k impacts all future output values.

  • The input sequence \(\{u(k)\}\) is a trajectory.

  • \(\{u(k)\}\) is too long to be optimized for each u(k).

  • The FIM cannot be evaluated without model parameter values.

The consequences of these aspects are significant. In the case of a linear static problem, the FIM only depends on the choice of independent inputs; an optimization procedure therefore places these input values within specified bounds. In the case of a nonlinear static problem, the optimization procedure is localized, since model parameters are needed to calculate the FIM. In the case of a dynamic problem, the FIM contains system outputs that change with every iterate of the input. Therefore, during the optimization, a simulation model is needed to generate the system outputs in each iteration. The proposed test signal design method proceeds in five steps:

  1. Initial signal model parameters \(\varvec{\beta }_0\) are used to generate a test signal \({\varvec{u}}^{(0)}(\varvec{\beta }_0)\) using the signal model.

  2. The initial model \({\mathcal {M}}_0\) is used to generate the initial output \({\varvec{y}}^{(0)}\).

  3. The test signal, the output, and the initial model parameters are used to calculate the FIM.

  4. A scalar measure on the inverse FIM is calculated.

  5. IF the termination criterion is not met, THEN the signal parameters are adjusted; ELSE the optimal test signal has been found.

The procedure is presented in Fig. 2. The process model as well as the FIM calculation depend on the initial model parameters \(\varvec{\varTheta }_0\) that have been obtained from process model-free identification. Since gradient-based optimization schemes converge to local minima, a particle swarm algorithm has been used. This means that in each iteration, the model has to be simulated for each particle, causing significant computational cost. Since the absolute cost values differ w.r.t. various factors, a convergence-based termination criterion was chosen: a threshold of \(\epsilon _\mathrm {termination}=10^{-4}\) for the relative change of the best solution. Before different aspects of the method are clarified in the following subsections, problem (3) has to be reformulated to correspond to Fig. 2:

$$\begin{aligned} \varvec{\beta }_\mathrm {opt}=\mathrm {arg}\,\underset{\varvec{\beta }}{\mathrm {min}}\,J\left( {\varvec{I}}\left( \hat{{\varvec{y}}}\left( {\varvec{u}}\left( \varvec{\beta }\right) ,\varvec{\varTheta }_0\right) ,{\varvec{u}}\left( \varvec{\beta }\right) ,\varvec{\varTheta }_0\right) \right) \end{aligned}$$
(22)

The goal is an optimal test signal, but the test signal is generated by a signal model with parameters \(\varvec{\beta }\); therefore, the actual optimization is carried out w.r.t. those parameters. The adjustment of the signal model parameters is driven by a measure on the FIM. The FIM depends on a test signal, a simulated model output, and the initial model parameters; the simulated output itself depends on the test signal and the model parameters. Therefore, the optimization of the FIM is a highly complex, non-convex, and cascaded problem, which is addressed in empirical studies in this contribution.
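
A minimal sketch of this cascaded evaluation, matching (22); signalModel, simulateTS, fisherInformation, and fimMeasure are placeholders for the components described in this section, and the PSO configuration used in the study is given in Sect. 6.3:

    % Cost of a candidate signal parameter vector beta, cf. (22)
    J = @(beta) fimMeasure( ...
            fisherInformation( ...
                simulateTS(Theta0, signalModel(beta), y0), ...  % simulated output yhat
                signalModel(beta), Theta0));                    % test signal and initial parameters
    betaOpt = particleswarm(J, numel(beta0), betaLB, betaUB);   % search over signal model parameters
    uOpt    = signalModel(betaOpt);                             % optimal test signal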

5.2 Measures on the Fisher Information Matrix

Due to the CRLB there is an inverse relation between the FIM and the covariance matrix of the parameter estimates. With i.i.d. normally distributed zero-mean noise, the eigenvalues of the covariance matrix represent the spread along the half-axes of an ellipsoidal uncertainty region whose orientation is characterized by the eigenvectors. Therefore, the most commonly used measures on the FIM are based on eigenvalues. The FIM (2) can also be viewed as a matrix that aggregates the sensitivities of the model output w.r.t. the model parameters. Therefore, the optimization goal can be reformulated without rigorous statistical assumptions: The input should be chosen such that the model output is most sensitive to a change in the parameters. If the sensitivities are high, the range of values that a parameter can plausibly take is reduced and therefore, indirectly, the spread of the model parameters is reduced, even though this “spread” cannot be interpreted as an estimate of a statistical quantity in a rigorous manner. If a matrix is assessed by a scalar measure, information is lost. Therefore, different measures will be compared:

  • M1: Determinant of the inverse FIM \(J_\mathrm {det}\)

  • M2: Maximum Eigenvalue of the inverse FIM \(J_\mathrm {eig}\)

  • M4: Scaled trace of the inverse FIM \(J_\mathrm {str}\)

  • M3: Trace of the inverse FIM \(J_\mathrm {tr}\)

  • M5: Sum of squared inverse FIM elements \(J_\mathrm {sens}\)

The measures are implemented as follows:

$$\begin{aligned} J_\mathrm {det}&=\mathrm {log}\left( \mathrm {det}({\varvec{I}}^{-1})\right) =-\mathrm {log}\left( \mathrm {det}\left( {\varvec{I}}\right) \right) \end{aligned}$$
(23)
$$\begin{aligned} J_\mathrm {eig}&=\mathrm {max}\left( \mathrm {eig}\left( {\varvec{I}}\right) ^{-1}\right) \end{aligned}$$
(24)
$$\begin{aligned} J_\mathrm {str}&=\sum \limits _{k=1}^{n_\mathrm {p}}\frac{\lambda _{{\varvec{I}},\mathrm {init},k}}{\lambda _{{\varvec{I}},\mathrm {opt},k}}\end{aligned}$$
(25)
$$\begin{aligned} J_\mathrm {tr}&=-\mathrm {log}\left( \mathrm {tr}\left( {\varvec{I}}\right) \right) \end{aligned}$$
(26)
$$\begin{aligned} J_\mathrm {sens}&=\sum \limits _{k=1}^{n_{\mathrm {p}}}\sum \limits _{l=1}^{n_{\mathrm {p}}}g^2_{k,l} \end{aligned}$$
(27)

In (23) and (26), the negative logarithm of the determinant and of the trace is used to avoid calculating the inverse as well as to address the large range of values. In (24), the maximum of the inverse eigenvalues of the FIM is used. In (25), the eigenvalues are normalized to the corresponding eigenvalues of the initial test signal in order to consider the relative change of the eigenvalues. In (27), \(g_{k,l}\) are the elements of the inverse FIM and \(n_{\mathrm {p}}\) is the number of model parameters; the sensitivity sum is thus the sum of the squared elements of the inverse FIM. The choice of measure is a multi-objective optimization problem that is addressed by scalarization through weighted aggregation.
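
A sketch of how the measures (23)-(27) can be implemented; I denotes the FIM of the candidate signal and lambdaInit the eigenvalues of the FIM of the initial test signal (variable names are illustrative; using inv is acceptable here because only a scalar summary is needed):

    lam   = sort(eig(I));                             % eigenvalues of the candidate FIM
    Jdet  = -log(det(I));                             % M1, cf. (23)
    Jeig  = max(1 ./ lam);                            % M2, cf. (24)
    Jstr  = sum(sort(lambdaInit) ./ lam);             % M4, cf. (25), scaled to the initial design
    Jtr   = -log(trace(I));                           % M3, cf. (26)
    G     = inv(I);                                   % inverse FIM
    Jsens = sum(G(:).^2);                             % M5, cf. (27)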

5.3 Optimization and Identification Procedures

The parameters can be separated into the local model parameters \(\varvec{\varTheta }_\mathrm {LM}\) and the membership function parameters \(\varvec{\varTheta }_\mathrm {MF}\). The sensitivities w.r.t. these parameter groups of a TS model are on different scales, which was confirmed in preliminary experiments. Two approaches to rescale the sensitivities have been investigated. First, the sensitivities were normalized to a percentage change in the parameter; this approach did not yield any improvement of the optimal experiment design. Second, the derivatives were rescaled with the determinants of the block matrices of the local model and partition parameters; this did not improve the optimal experiment design either.

Therefore, another path has been chosen in this contribution. The optimization of the local model parameters and the partition parameters is separated which leads to smaller matrices. The FIM for the local model parameters \({\varvec{I}}_\mathrm {LM}\) is just composed of the “squared” extended regression matrix. The FIM for the partition parameters \({\varvec{I}}_\mathrm {MF}\) is more complicated:

$$\begin{aligned} {\varvec{I}}&=\frac{1}{\sigma ^2}\begin{bmatrix}{\varvec{I}}_\mathrm {LM}&{}{\varvec{I}}_\mathrm {coupl}\\ {\varvec{I}}^\top _\mathrm {coupl}&{}{\varvec{I}}_\mathrm {MF}\end{bmatrix} \end{aligned}$$
(28)

with

$$\begin{aligned} {\varvec{I}}_\mathrm {LM}&=\sum \limits _{k=1}^N\varvec{\varphi }_\mathrm {E}\varvec{\varphi }_\mathrm {E}^\top =\varvec{\varPhi }_\mathrm {E}^\top \varvec{\varPhi }_\mathrm {E}\end{aligned}$$
(29)
$$\begin{aligned} {\varvec{I}}_\mathrm {MF}&=\sum \limits _{k=1}^N\sum \limits _{i=1}^c\left( \frac{\partial \mu _i}{\partial \varvec{\varTheta }_\mathrm {MF}}\right) \left( \frac{\partial \mu _i}{\partial \varvec{\varTheta }_\mathrm {MF}}\right) ^{\top }{\hat{y}}_i(k)^2 \end{aligned}$$
(30)

These matrices lack the couplings present in the complete FIM, which is the subject of the investigation. It is not a priori clear what constitutes a fair comparison in this type of investigation. The following procedures have been defined:

  • P1: local model and membership function parameters

  • P2: only local model parameters

  • P3: only membership function parameters

  • P4: merged test signal

  • P5: sequential

The mentioned procedures are combinations of identification schemes, optimization schemes of the FIM, and ways of constructing the complete test signal. The procedures P1-P5 have been selected specifically to reduce the number of possible combinations in a way that enables a fair comparison. P1 is the baseline and denotes the optimization of the FIM (28) and the simultaneous estimation of all model parameters in the optimization step. In the other procedures, the block structure of the FIM has been exploited. In P2 and P3, the optimization of the test signal has been conducted w.r.t. the local model parameters and the partition parameters, using the partial FIMs (29) and (30), respectively. Since the signals resulting from P2 and P3 are optimized for the identification of the respective parameter group, only those parameters have been estimated during the nonlinear optimization step.

Procedures P4 and P5 use the FIM optimization schemes of P2 and P3, but each only with half-length signals. In P4, the two half-length signals are concatenated to obtain a full-length signal; even for multisine signals, the step introduced at the merging point has no significant impact. This signal is used for a simultaneous estimation of all model parameters. In P5, the half-length signals from P4 are used sequentially: first, the membership function parameters are estimated based on the half-length P3 signal, then the local model parameters based on the half-length P2 signal. FIM-based test signal design (for nonlinear models) is localized in the region of the true model parameters \(\varvec{\varTheta }_0\), which are substituted by the initial model parameters \(\varvec{\varTheta }_\mathrm {pmf}\). Therefore, the nonlinear optimization based on optimal datasets is initialized with the initial model parameters. FCM partitioning and NARX local model parameters might result in an initialization that has no connection to the optimal design, in which case no conclusions about the parameter uncertainty reduction could be drawn. The partial FIMs used in P2 and P3 can be obtained from the block structure (28), as sketched below.
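
A sketch of selecting the partial FIMs from the complete matrix (28), assuming the parameter ordering of (15); variable names are illustrative:

    nLM   = numel(ThetaLM);                           % number of local model parameters
    nMF   = numel(ThetaMF);                           % number of membership function parameters
    I_LM  = I(1:nLM, 1:nLM);                          % (29) up to the factor 1/sigma^2, used in P2
    I_MF  = I(nLM+1:nLM+nMF, nLM+1:nLM+nMF);          % (30) up to the factor 1/sigma^2, used in P3
    I_cpl = I(1:nLM, nLM+1:nLM+nMF);                  % coupling block, dropped in P2 and P3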

6 Case Studies

In this section, the results of two extensive case studies and one compact case study are presented. The presented procedures are applied to a simulation case study (SIM) as well as a servo-pneumatic longitudinal drive test stand (SPLD), and in parts to a three-tank system (3TK). Multisine, multistep, and a combination of both signal types are considered as test signals. Different measures on the FIM as well as different optimization procedures to exploit the TS model structure are investigated. The method is assessed mainly by the simulation error on validation datasets and secondarily by the model parameter uncertainty.

6.1 Experimental and Identification Setups

For the two extensive case studies, an artificial simulation system in the form of a difference equation and a laboratory test stand have been used. For the third, compact case study, another laboratory test stand was used.

6.1.1 Narendra System (SIM)

The system

$$\begin{aligned} y(k)&=\frac{y(k-1)\cdot y(k-2)\cdot \left( y(k-1)+2.5\right) }{1+y(k-1)^2+y(k-2)^2}\nonumber \\&+u(k-1)+\epsilon (k) \end{aligned}$$
(31)

was first presented in Narendra and Parthasarathy [3] and serves as an identification benchmark. It is defined by a nonlinear \(2^{\mathrm {nd}}\) order difference equation in the output y(k). u(k) is the input and \(\epsilon (k)\in {\mathcal {N}}\left( 0,0.5^2\right)\) is chosen to be an i.i.d. normally distributed random variable. The noise variance has been chosen as \(\sigma ^2=0.5^2\) to obtain a relatively low signal to noise ratio (SNR).
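
A minimal sketch of simulating (31) for a given input sequence u (column vector), assuming zero initial conditions and the stated noise level \(\sigma =0.5\):

    N = numel(u);
    y = zeros(N, 1);                                  % zero initial conditions
    for k = 3:N
        y(k) = (y(k-1)*y(k-2)*(y(k-1) + 2.5)) / (1 + y(k-1)^2 + y(k-2)^2) ...
               + u(k-1) + 0.5*randn;                  % process noise with sigma = 0.5
    end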

The hyperparameters c and \(\nu\) of the model are chosen to be \(c=5\) and \(\nu =1.1\) based on the work in Gringard et al. [33]. Since the true system is known and the assumption of fuzzy TS models is that the nonlinear behavior is captured by the partitioning, the dynamic orders of the input and output of the TS model are chosen to be the same as in (31): \(n=2\) and \(m=1\). From (31) we also understand that the nonlinear behavior depends on the first and second lag of the output. Therefore, the regression vector is chosen as:

$$\begin{aligned} \varvec{\varphi }^{\top }(k)=\begin{bmatrix}1&y(k-1)&y(k-2)&u(k-1)\end{bmatrix} \end{aligned}$$
(32)

With the knowledge about the system’s nonlinearity, the scheduling variable \({\varvec{z}}\) is chosen as:

$$\begin{aligned} {\varvec{z}}(k)^\top =\begin{bmatrix}y(k-1)&y(k-2)\end{bmatrix} \end{aligned}$$
(33)

This model has 30 parameters in total.

6.1.2 Servo-Pneumatic Longitudinal Drive (SPLD)

The SPLD test stand is described in detail in [4]. Further details can be found in [34]. The system input is the voltage applied to the servo valve and the output is the longitudinal position of the piston rod. The nonlinear behavior is due to the friction. The friction is reduced by dithering. W.r.t. the test signal design, a maximum frequency of \(f_\mathrm {max}=3\,\mathrm {Hz}\) for the multisine signals as well as the slowest time constant \(T_\mathrm {slow}=0.28\,\mathrm {s}\) for the multistep signals have been determined in preliminary experiments. To achieve a trade-off between reactivation effects of the FBF and smoothing model behavior, a fuzziness parameter of \(\nu =1.3\) has been selected. Using cluster-validation and assessing prediction errors, \(c=6\) local models are chosen. The orders of the input and output dynamics have been selected as \(n=4\) and \(m=3\), respectively, based on initial experiments. The scheduling vector is chosen equal to the regression vector. This results in 90 model parameters in total.

6.1.3 Three-Tank System (3TK)

The three-tank system [35] consists of three serially connected tanks. The system input is the volumetric flow rate q fed into the first tank by a pump. Since the fluid dynamics are significantly slower than the pump dynamics, the latter are neglected. The measured system output is the level \(h_2\) of the tank at the other end. The slowest time constant \(T_\mathrm {slow}=900\,\mathrm {s}\) has been used to determine the minimum and maximum step lengths for the S1 test signal design. In preliminary experiments, \(c=4\), \(n=3\), \(m=1\), and the fuzziness parameter \(\nu =1.3\) have been determined. The scheduling variable has been chosen as the first three lags of the output variable, resulting in 32 model parameters in total.

6.2 Test Signal Design

Table 1 Experiment design variations

Table 1 shows the experiment design variables and how they are applied to the two extensive case studies. The signal model parameters of the multistep signals are the step lengths and step amplitudes; for the multisine signals, the parameters are the amplitudes and phases of frequencies on a preselected grid. The results from the SPLD as well as preliminary studies on the SIM case study have shown that the combined test signal (S3) is not suitable for this task; it has therefore been removed from the experiment design for the SIM. The sensitivity sum (M5) was not used for the SPLD because the PSO was not able to consistently find a minimum due to the high number of parameters. The trace (M3) and the scaled trace (M4) did not yield any significant differences on the SIM case study; therefore, only the scaled trace (M4) has been retained for the SIM. The maximum eigenvalue (M2) has not been used for the SIM case study, because preliminary experiments as well as the results from the SPLD have shown that this measure has no impact. In the 3TK case study, one combination (S2, M1, P3) is used to design an optimal test signal to indicate that the proposed method can be transferred to different systems. The variations for the artificial test system (SIM) result in 30 (2 (Signals) \(\times\) 3 (Measures) \(\times\) 5 (Procedures)) optimal designs, the variations for the SPLD test stand result in 108 (3 (Signals) \(\times\) 4 (Measures) \(\times\) 9 (Procedures)) optimal designs. The number of data points of the signals is around 5000 for the SIM case study, around 2500 for the SPLD, and around 6000 for the 3TK. However, the lengths of all signals within a case study are comparable and will therefore not be discussed further. For each signal type and system, an initial test signal has been designed using process model-free methods. Repeated experiments have been conducted to choose the best model w.r.t. simulation quality on the validation dataset (that is also used for the assessment of the optimized models); this model is used to initialize the model-based designs.
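
A sketch of the two parametric signal models with illustrative variable names; a multistep signal is parametrized by step lengths (in samples) and step amplitudes, a multisine by amplitudes and phases on a fixed frequency grid:

    uStep = repelem(stepAmplitudes(:), stepLengths(:));        % multistep signal
    t     = (0:Nsamples-1).' * Ts;                             % time vector with sampling time Ts
    uSine = zeros(Nsamples, 1);
    for q = 1:numel(freqGrid)                                  % multisine signal
        uSine = uSine + sineAmplitudes(q)*sin(2*pi*freqGrid(q)*t + sinePhases(q));
    end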

6.3 Conducting the Experiments

To statistically investigate the impact of the experiment design choices on the reduction of the uncertainty of the estimated parameters and on the simulation quality, large amounts of data were collected. The initial models were obtained using standard test signal design methods. For the optimization, a particle swarm optimization (PSO) algorithm is used, because preliminary investigations have shown that gradient-based methods cannot handle the search space. If a solution candidate is found by the particle swarm, a gradient-based algorithm (fmincon) efficiently finalizes the optimization. The MATLAB function particleswarm was used with a swarm size of 100. The inertia range has been set to \(\begin{bmatrix}0.1&1.1\end{bmatrix}\). Both adjustment weights were set to 1.49. The meaning of these parameters is described in [36]. From preliminary work it is known that the PSO itself is robust w.r.t. random swarm initializations as well as w.r.t. model parameter uncertainties.
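
A sketch of this PSO configuration using MATLAB's particleswarm options, with fmincon as the finishing step; J and the bounds betaLB, betaUB are the objective and signal parameter bounds from the design problem (illustrative names):

    opts = optimoptions('particleswarm', ...
        'SwarmSize',              100, ...
        'InertiaRange',           [0.1, 1.1], ...
        'SelfAdjustmentWeight',   1.49, ...
        'SocialAdjustmentWeight', 1.49, ...
        'FunctionTolerance',      1e-4, ...           % relative-change termination threshold
        'HybridFcn',              @fmincon);          % gradient-based finishing step
    [betaOpt, Jopt] = particleswarm(J, numel(beta0), betaLB, betaUB, opts);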

For each of the initial test signals, repeated experiments have been conducted. For the artificial system, \(N_\mathrm {SIM}=25\) repetitions have been conducted for each of the multisine and multistep signals. For the SPLD, each initial test signal has been applied \(N_\mathrm {SPLD}=20\) times; on the 3TK system, each test signal has been applied \(N_\mathrm {3TK}=10\) times. The best-performing models were selected for further use in the model-based design process and as a baseline for the simulation quality. The optimization of a test signal has only been initialized with a model identified from a signal of the same type. From the repeated experiments, information about the parameter uncertainty of the process model-free test signal designs can be extracted.

The calculation of the optimal test signals as well as the identification and model evaluations have been conducted on a PC with an Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60 GHz and 64.0 GB of RAM.

6.4 Results

The data collected in the parameter case study is used to assess the simulation quality as well as the parameter estimation uncertainty. Since the experiments have been repeated many times, “a model” refers to all models that have been identified from datasets generated with the same test signal. This way, “a model” is defined by the combination of the signal model (S), the FIM measure (M), and the optimization procedure (P). Fundamentally, the RMSE is used to assess the simulation quality:

$$\begin{aligned} J_\mathrm {RMSE}=\sqrt{\frac{1}{N}\sum \limits _{k=1}^{N}\left( {\hat{y}}(k)-y(k)\right) ^2} \end{aligned}$$
(34)

All models have been evaluated on several validation datasets. Since there is no normalization method that is invariant w.r.t. different signals, the change relative to the initial model’s simulation quality is used. This enables a fair comparison over different validation signals. The parameter estimation uncertainty is assessed by comparing the empirical covariance matrices based on optimal experiments to the empirical covariance matrices based on the respective process model-free experiment. As an additional assessment, the method is applied to a system with different dynamics and evaluated based on the simulation error as well as direct uncertainty analysis.

6.4.1 Simulation Quality

The simulation quality is assessed in two ways. The first criterion is the fraction of models that improve the RMSE by at least \(25\,\%\) over the respective initial model's performance, which will be called the “success rate”. Figures 3, 4, 5, and 6 show the percentages of models reaching this threshold.

Fig. 3

Reduction of the RMSE in \(\%\) by at least \(25\,\%\) w.r.t. the initial model for different FIM measures (SIM)

Fig. 4

Reduction of the RMSE in \(\%\) by at least \(25\,\%\) w.r.t. the initial model for different procedures (SIM)

Fig. 5

Reduction of the RMSE in \(\%\) by at least \(25\,\%\) w.r.t. the initial model for different FIM measures (SPLD)

Fig. 6

Reduction of the RMSE in \(\%\) by at least \(25\,\%\) w.r.t. the initial model for different procedures (SPLD)

Figures 3 and 4 show the results for the Narendra system, Figs. 5 and 6 for the SPLD. The simulation error improvements are either grouped by the procedures or by the FIM measures.

Table 2 Avg. Improvement by FIM measure in \(\%\) (SIM)
Table 3 Avg. improvement by procedure in \(\%\) (SIM)
Table 4 Avg. improvement by FIM measure in \(\%\) (SPLD)
Table 5 Avg. improvement by procedure in \(\%\) (SPLD)

Tables 2, 3, 4, 5 show the average improvements of the models that are successful regarding the first criterion. The numbers show to what percentage of the initial model's performance the optimal model's performance can be lowered by the method. Figure 7 shows an evaluation of several models on a validation dataset for the compact 3TK case study. The FIM-based model outperforms both the model based on a PRBS and the PMF-based model.

Fig. 7

Simulation results on the 3TK system

6.4.2 Parameter Uncertainty

The investigation of the parameter uncertainty yielded the result that there is no significant change of the uncertainty w.r.t. different FIM measures; therefore, the results have been aggregated. A threshold is used to determine the fraction of models that show an uncertainty reduction of at least \(25\,\%\). The uncertainty is assessed by applying the measures to the empirical covariance matrices of the optimal models as well as of the initial models. The results are presented for M4 in Table 6. The covariance matrices are calculated separately for each parameter group. Since P2 and P3 do not make changes to the partition parameters and the local model parameters, respectively, the corresponding fields are left out. P5 is disregarded in this analysis because it consists of the sequential identifications with P2 and P3 signals. To verify the results from the extensive case studies, models based on different test signal designs have been analyzed for the 3TK system. Table 7 shows the measure M1 of these models. FIM-based design yields more accurate parameters than identification based on a PRBS or on the PMF design.

Table 6 Fraction of models in \(\%\) with reduced uncertainty using M4 of at least \(25\,\%\)
Table 7 Uncertainty assessment of the 3TK with M1

6.5 Discussion

The choice of the signal model is significant for the resulting model quality; therefore, it is important to characterize the type of excitation. The choice can have an impact in three ways. First, a signal model may not be suitable for identifying the plant in general. Second, using a plant model based on one signal type for the optimal experiment design of another signal type yields worse results. Third, choosing an input signal type that fits the operating scenario is more likely to cover the relevant behavior. For the Narendra system, multistep signals performed best, as can be seen in Figs. 3 and 4. S1 and S2 performed comparably for the SPLD, but multisine-based models do not perform well for the Narendra system. From Figs. 4 and 6 it follows that the procedures P1 and P3 perform very well both on the SPLD and on the Narendra system, disregarding the overall mediocre performance of S1 models on that test system. P2 focusses on the local model parameters, resulting in a high variability of the partition parameters and therefore of the simulation results. The signal resulting from P4 is not optimal w.r.t. either parameter group, resulting in mediocre simulation performance. The breakdown w.r.t. the measures in Figs. 3 and 5 implies that the choice of the measure has no significant impact on the results, which was verified by an analysis of variance (ANOVA).

Table 6 exemplifies the results regarding the uncertainty. It can be observed that the goal of reducing the parameter uncertainty was achieved in the majority of the experiments. However, accurately estimated parameters do not necessarily result in a high simulation quality. Considering the results for the uncertainty reduction of P3 as well as the simulation results, focussing on the membership function parameters leads to better models.

There are several reasons why P3 yields the best results. The membership function parameters determine the nonlinear behavior of the TS model. The quality of the local model parameter estimation is limited by the estimation of the membership function parameters. If this is not considered during the design, local model parameters cannot be estimated accurately. The optimization problem becomes more complex using all model parameters, explaining the inferior performance of P1 compared to P3.

S3 underperforms in every case. Observing the signals during the optimization shows that all signal model parameters are treated equally, leading to “chaotic” behavior that is not present if the two signal types are optimized individually. For the compact case study of the three-tank system (3TK), an optimal test signal was designed (S2, M1, P3). The uncertainty was reduced considering M1, as shown in Table 7. The assessment of the simulation quality based on the 3TK case study is shown exemplarily in Fig. 7.

6.6 Conclusions

The targeted test signal design goals are the reduction of the parameter estimation uncertainty as well as the improvement of the simulation quality. The design performance regarding both goals has been investigated. Table 6 shows that the first design goal is achieved in principle. However, this does not necessarily lead to an improvement of the simulation quality. This is not a case of a bias-variance trade-off; rather, the partitioning conditions the estimation of the local model parameters. Even though the local model parameters might be estimated accurately, a low confidence in the partitioning results in worse global simulation properties. Therefore, the design should emphasize the partition parameters. The estimation of the local model parameters is easier, and their consideration during the optimal test signal design increases the complexity of the optimization problem in a way that is detrimental to the solution. The choice of the FIM measure does not significantly impact the results; therefore, the measure should be chosen considering the computational complexity. Even though P3 is only optimal w.r.t. the partition parameters, it performs better than P1. This is an interesting result, since the computational costs are reduced significantly if the FIM measure is chosen appropriately. FIM-based test signal design for TS models assumes that at least the model structure is roughly correct and that the partition parameters are already in a region around their “optimal” values; otherwise, a computationally demanding robust design (such as a sequential design) must be applied. This again stresses the importance of model-free test signal design before attempting an optimal design. FIM-based optimal experiment design for dynamic TS models should include a well-designed preceding model-free design stage, consider the operating conditions, and focus on the estimation of the partition parameters to achieve models with low uncertainty as well as high simulation quality. In a previous study [33], the method has been applied to an electro-mechanical throttle; the results are comparable, even though fewer design choices were studied.

7 Summary and Outlook

In this contribution a novel method for FIM-based optimal experiment design for the identification of nonlinear dynamic Takagi-Sugeno multi-models was presented. The impact of design choices was analyzed on a simulated system as well as a laboratory test stand and tested on another laboratory test stand. It was demonstrated that the approach leads to improvements in parameter uncertainty, which was the stated goal, but also to significant improvements in simulation quality.

The presented method is a holistic approach that considers all aspects of FIM-based test signal design for the identification of nonlinear dynamic models in combination and analyzes the impact of design choices, whereas many other contributions focus on a specific aspect and assume the choice of measure and signal model. It was shown that the design choices have a significant impact on the achievable results, especially the exploitation of the unique multi-model structure. It was also shown that different measures on the FIM do not result in significant changes in either uncertainty or simulation quality, but that the computational cost can be reduced by choosing less complex measures.