Abstract
Extreme learning machine (ELM) is an emerging machine learning algorithm for training single-hidden-layer feedforward networks (SLFNs). Its salient features are that the hidden-layer parameters are generated randomly and only the corresponding output weights are determined analytically in a least-squares manner, so ELM is easy to implement and offers fast learning speed and good generalization performance. As the online version of ELM, online sequential ELM (OS-ELM) can handle data that arrive sequentially, one by one or chunk by chunk with fixed or varying chunk size. However, OS-ELM does not perform well on dynamic modeling problems because of the data saturation problem. To tackle this issue, we propose a novel OS-ELM, named adaptive OS-ELM (AOS-ELM), to enhance the generalization performance and dynamic tracking capability of OS-ELM for modeling problems in nonstationary environments. AOS-ELM efficiently reduces the negative effects of data saturation: approximate linear dependence (ALD) is adopted to filter out uninformative new data, and a modified hybrid forgetting mechanism (HFM) alleviates the impact of outdated data. The performance of AOS-ELM is verified on selected benchmark datasets and a real-world application, device-free localization (DFL), in comparison with classic ELM, OS-ELM, FOS-ELM, and DU-OS-ELM. Experimental results demonstrate that AOS-ELM achieves better performance.
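To make the workflow described above concrete, the following minimal sketch implements an ELM with an OS-ELM-style sequential update in Python/NumPy. It is an illustrative sketch under our own assumptions (sigmoid activation, network sizes, a small ridge term for numerical invertibility, and all variable names), not the authors' implementation of AOS-ELM.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch: random hidden layer, recursive least-squares
    output weights. Illustrative only; not the paper's implementation."""

    def __init__(self, n_inputs, n_hidden, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Hidden-layer parameters are generated randomly and never trained.
        self.W = rng.uniform(-1.0, 1.0, (n_hidden, n_inputs))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.beta = np.zeros(n_hidden)   # output weights
        self.P = None                    # P = (H^T H)^{-1}

    def _h(self, x):
        # Hidden-layer output vector (sigmoid activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(self.W @ x + self.b)))

    def init_fit(self, X0, y0):
        # Initial batch: beta = (H^T H)^{-1} H^T y (least squares);
        # the small ridge term keeps the inverse well defined.
        H = np.array([self._h(x) for x in X0])
        self.P = np.linalg.inv(H.T @ H + 1e-8 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ y0

    def partial_fit(self, x, y):
        # Sequential update for one new sample (Sherman-Morrison form).
        h = self._h(x)
        Ph = self.P @ h
        self.P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)
        self.beta += (self.P @ h) * (y - h @ self.beta)

    def predict(self, x):
        return self._h(x) @ self.beta

# Toy usage: fit an initial batch, then update sample by sample.
X = np.random.default_rng(1).standard_normal((50, 3))
y = np.sin(X[:, 0]) + X[:, 1]
net = OSELM(3, 20)
net.init_fit(X[:40], y[:40])
for x_i, y_i in zip(X[40:], y[40:]):
    net.partial_fit(x_i, y_i)
print(float(net.predict(X[-1])), float(y[-1]))
```

The `partial_fit` step realizes the recursive least-squares update derived in Appendix A.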
Acknowledgements
The authors would like to thank the anonymous reviewers for their insightful comments and suggestions.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Proof of Theorem 1
Assume that from time \(q\) to time \(q+r-1\), the input vector \(X_q\) and output vector \(Y_q\) of the \(r\) samples are
\[ X_q = [x_q, x_{q+1}, \ldots, x_{q+r-1}], \qquad Y_q = [y_q, y_{q+1}, \ldots, y_{q+r-1}]^T. \]
The corresponding output weights \(\beta_{q+r-1}\) can be obtained by solving the least-squares problem
\[ \min_{\beta} \left\| H_{q+r-1}\beta - Y_q \right\|^2, \]
where \(H_{q+r-1}\) is the hidden-layer output matrix of the \(r\) samples from time \(q\) to time \(q+r-1\):
\[ H_{q+r-1} = [h(q), h(q+1), \ldots, h(q+r-1)]^T, \]
with \(h(\cdot)\) denoting the hidden-layer output vector of a single sample. Let \(P_{q+r-1} = K_{q+r-1}^{-1}\) with \(K_{q+r-1} = H_{q+r-1}^T H_{q+r-1}\); the solution of \(\beta_{q+r-1}\) is
\[ \beta_{q+r-1} = P_{q+r-1} H_{q+r-1}^T Y_q. \]
At time \(q+r\), when a new data pair \((x_{q+r}, y_{q+r})\) arrives, \(H_{q+r-1}\) becomes
\[ H_{q+r} = \begin{bmatrix} H_{q+r-1} \\ h^T(q+r) \end{bmatrix}. \]
The corresponding output weights \(\beta_{q+r}\) can be obtained by
\[ \beta_{q+r} = K_{q+r}^{-1} H_{q+r}^T Y_{q+r}, \]
where \(Y_{q+r} = [y_q, y_{q+1}, \ldots, y_{q+r-1}, y_{q+r}]^T\) and \(K_{q+r} = H_{q+r}^T H_{q+r}\).
With \(P_{q+r} = K_{q+r}^{-1}\), the recursive solution of \(\beta_{q+r}\) is
\[ \beta_{q+r} = \beta_{q+r-1} + P_{q+r}\, h(q+r) \left( y_{q+r} - h^T(q+r)\, \beta_{q+r-1} \right), \]
where, by the Sherman-Morrison formula,
\[ P_{q+r} = P_{q+r-1} - \frac{P_{q+r-1}\, h(q+r)\, h^T(q+r)\, P_{q+r-1}}{1 + h^T(q+r)\, P_{q+r-1}\, h(q+r)}. \]
Then,
\[ K_{q+r} = H_{q+r}^T H_{q+r} = K_{q+r-1} + h(q+r)\, h^T(q+r). \]
Accordingly, we have
\[ K_{q+r} - K_{q+r-1} = h(q+r)\, h^T(q+r) \succeq 0. \]
Because \(P_{q+r-1} = (H_{q+r-1}^T H_{q+r-1})^{-1}\) and \(K_{q+r} = K_{q+r-1} + h(q+r)h^T(q+r)\), \(P_{q+r-1}\) and \(P_{q+r}\) are positive definite matrices. Thus, we have
\[ 0 \prec P_{q+r} \preceq P_{q+r-1}. \]
According to (40), the eigenvalues of \(K_{q+r}\) are non-decreasing in \(r\) and grow without bound as data accumulate, so \(P_{q+r} \rightarrow 0\) and the correction term \(P_{q+r}\, h(q+r)\left( y_{q+r} - h^T(q+r)\beta_{q+r-1} \right) \rightarrow 0\), i.e., \(\beta_{q+r} \rightarrow \beta_{q+r-1}\).
In summary, as samples accumulate without forgetting, the update gain vanishes and OS-ELM loses its capability to correct the model with new data. \(\square \)
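A short numerical illustration of this saturation effect is sketched below; the synthetic data, dimensions, and initialization are our own assumptions and serve only to show that the update gain shrinks as samples accumulate.

```python
import numpy as np

# Numerical illustration of the saturation argument above
# (synthetic data and all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
L = 10                          # number of hidden nodes
P = np.eye(L)                   # P_q: identity start (assumed regularized init)
beta = np.zeros(L)

for r in range(1, 5001):
    h = rng.standard_normal(L)  # hidden-layer output h(q+r)
    y = h @ np.ones(L) + 0.01 * rng.standard_normal()
    Ph = P @ h
    P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)   # Sherman-Morrison update of P
    gain = P @ h                             # update gain P_{q+r} h(q+r)
    beta += gain * (y - h @ beta)
    if r % 1000 == 0:
        # ||P|| and the gain both shrink toward zero, so new samples
        # barely change beta: the data saturation phenomenon.
        print(r, np.linalg.norm(P), np.linalg.norm(gain))
```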
Appendix B
Detailed derivation of ALD:
Let \(X_{N_0} = [x_1, x_2, \ldots, x_{N_0}]^T\) stack the initial samples as rows, with each \(x_i \in \mathbb {R}^w\). The mean of each variable of the initial training data \(\aleph _0 = \{ ({x_i},{y_i})\} _{i = 1}^{{N_0}}\) is
\[ u_{N_0} = \frac{1}{N_0}\, \mathbf {1}_{N_0}^T X_{N_0}, \]
where \({\mathbf{{1}}_{{N_0}}} = {[1,1,\ldots ,1]^T}\). The data scaled to zero mean and unit variance can be represented as
\[ \tilde{X}_{N_0} = \left( X_{N_0} - \mathbf {1}_{N_0} u_{N_0} \right) \Sigma _{N_0}^{-1}, \]
where \(\Sigma _{N_0} = \mathrm {diag}(\sigma _{N_0 1}, \sigma _{N_0 2}, \ldots , \sigma _{N_0 w})\), and \(\sigma _{N_0 i}\) stands for the standard deviation of the \(i\)th variable.
When the new datum \(x_{N_0+1}\) arrives, the corresponding mean and standard deviation can be updated recursively by
\[ u_{N_0+1} = \frac{N_0\, u_{N_0} + x_{N_0+1}}{N_0+1}, \qquad \sigma _{(N_0+1)i}^2 = \frac{N_0-1}{N_0}\, \sigma _{N_0 i}^2 + \frac{\left( x_{(N_0+1)i} - u_{N_0 i} \right)^2}{N_0+1}. \]
Let \(x_{N_0+1}\) be scaled with
\[ \tilde{x}_{N_0+1} = \left( x_{N_0+1} - u_{N_0+1} \right) \Sigma _{N_0+1}^{-1}, \]
where \(\Sigma _{N_0+1} = \mathrm {diag}(\sigma _{(N_0+1)1}, \sigma _{(N_0+1)2}, \ldots , \sigma _{(N_0+1)w})\).
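As a quick check on the recursions above, the snippet below updates the mean and unbiased standard deviation sample by sample and compares them with batch recomputation; the data are arbitrary and illustrative.

```python
import numpy as np

# Verify the recursive mean/variance update against batch recomputation.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))        # 50 samples, w = 4 variables

N0 = 40
u = X[:N0].mean(axis=0)                 # u_{N0}
var = X[:N0].var(axis=0, ddof=1)        # sigma_{N0 i}^2 (unbiased)

for k in range(N0, 50):                 # k = current sample count
    x = X[k]
    # sigma^2_{k+1} = (k-1)/k * sigma^2_k + (x - u_k)^2 / (k+1)
    var = (k - 1) / k * var + (x - u) ** 2 / (k + 1)
    u = (k * u + x) / (k + 1)           # u_{k+1} = (k u_k + x)/(k+1)

print(np.allclose(u, X.mean(axis=0)))            # True
print(np.allclose(var, X.var(axis=0, ddof=1)))   # True
```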
Expanding the ALD criterion shown in (18), we have
\[ \delta _{N_0+1} = \min _{a} \left\| \tilde{X}_{N_0}^T a - \tilde{x}_{N_0+1}^T \right\| ^2 = \min _{a} \left( a^T J_{N_0}\, a - 2\, a^T j_{N_0} + \tilde{j}_{N_0+1} \right), \]
where \({J_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{X}_{{N_0}}^T\), \({j_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{x}_{{N_0} + 1}^T\), and \({\tilde{j}_{{N_0} + 1}} = {\tilde{x}_{{N_0} + 1}} \cdot \tilde{x}_{{N_0} + 1}^T\).
In order to minimize \(\delta _{N_0+1}\), setting the gradient with respect to \(a\) to zero, we have
\[ a^{*} = J_{N_0}^{-1}\, j_{N_0}. \]
Substituting (48) into (47), we can obtain the recursive ALD value:
\[ \delta _{N_0+1} = \tilde{j}_{N_0+1} - j_{N_0}^T J_{N_0}^{-1}\, j_{N_0}. \]
Thus, we have the ALD test
\[ \delta _{N_0+1} = \tilde{j}_{N_0+1} - j_{N_0}^T a^{*} \le \xi , \]
where \(\xi \) is the error threshold: a new datum satisfying the test is approximately linearly dependent on the stored data and is filtered out.
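The sketch below evaluates the recursive ALD value derived above for new samples against a small stored data matrix; the data, dimensions, and the threshold \(\xi\) are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def ald_delta(X_tilde, x_tilde):
    """ALD value delta = j~ - j^T J^+ j for a (standardized) new sample
    x_tilde against the stored rows of X_tilde. A sketch of the derivation
    above, not the authors' code."""
    J = X_tilde @ X_tilde.T       # J_{N0} = X~ . X~^T (Gram matrix)
    j = X_tilde @ x_tilde         # j_{N0} = X~ . x~^T
    jj = x_tilde @ x_tilde        # j~_{N0+1} = x~ . x~^T (scalar)
    a = np.linalg.pinv(J) @ j     # minimizer a*; pinv in case J is singular
    return jj - j @ a             # delta_{N0+1} = j~ - j^T a*

# Toy usage: 3 stored samples in 8 dimensions (assumed already standardized).
rng = np.random.default_rng(1)
X_tilde = rng.standard_normal((3, 8))
xi = 0.1                                        # assumed error threshold

x_dep = 0.5 * X_tilde[0] + 0.25 * X_tilde[1]    # lies in the stored span
x_new = rng.standard_normal(8)                  # generic new sample

for x in (x_dep, x_new):
    delta = ald_delta(X_tilde, x)
    # delta <= xi: approximately linearly dependent -> filtered out.
    print(round(float(delta), 6), "discard" if delta <= xi else "keep")
```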
About this article
Cite this article
Zhang, J., Li, Y. & Xiao, W. Adaptive online sequential extreme learning machine for dynamic modeling. Soft Comput 25, 2177–2189 (2021). https://doi.org/10.1007/s00500-020-05289-6