Abstract
Extreme learning machine (ELM) is an emerging machine learning algorithm for training single-hidden-layer feedforward networks (SLFNs). Its salient features are that the hidden-layer parameters are generated randomly and only the corresponding output weights are determined analytically in a least-squares manner, so ELM is easy to implement and offers fast learning speed and good generalization performance. As the online version of ELM, online sequential ELM (OS-ELM) can handle data that arrive sequentially, one by one or chunk by chunk with fixed or varying chunk size. However, OS-ELM does not perform well on dynamic modeling problems because of the data saturation problem. To tackle this issue, we propose a novel OS-ELM, named adaptive OS-ELM (AOS-ELM), to enhance the generalization performance and dynamic tracking capability of OS-ELM for modeling problems in nonstationary environments. AOS-ELM efficiently reduces the negative effects of data saturation: approximate linear dependence (ALD) is adopted to filter out uninformative new data, and a modified hybrid forgetting mechanism (HFM) alleviates the impact of outdated data. The performance of AOS-ELM is verified on selected benchmark datasets and a real-world application, device-free localization (DFL), in comparison with classic ELM, OS-ELM, FOS-ELM, and DU-OS-ELM. Experimental results demonstrate that AOS-ELM achieves better performance.
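To make the workflow described above concrete, the following minimal sketch implements an ELM with an OS-ELM-style sequential update in Python/NumPy. It is an illustrative sketch under our own assumptions (sigmoid activation, network sizes, a small ridge term for numerical invertibility, and all variable names), not the authors' implementation of AOS-ELM.

```python
import numpy as np

class OSELM:
    """Minimal OS-ELM sketch: random hidden layer, recursive least-squares
    output weights. Illustrative only; not the paper's implementation."""

    def __init__(self, n_inputs, n_hidden, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # Hidden-layer parameters are generated randomly and never trained.
        self.W = rng.uniform(-1.0, 1.0, (n_hidden, n_inputs))
        self.b = rng.uniform(-1.0, 1.0, n_hidden)
        self.beta = np.zeros(n_hidden)   # output weights
        self.P = None                    # P = (H^T H)^{-1}

    def _h(self, x):
        # Hidden-layer output vector (sigmoid activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(self.W @ x + self.b)))

    def init_fit(self, X0, y0):
        # Initial batch: beta = (H^T H)^{-1} H^T y (least squares);
        # the small ridge term keeps the inverse well defined.
        H = np.array([self._h(x) for x in X0])
        self.P = np.linalg.inv(H.T @ H + 1e-8 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ y0

    def partial_fit(self, x, y):
        # Sequential update for one new sample (Sherman-Morrison form).
        h = self._h(x)
        Ph = self.P @ h
        self.P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)
        self.beta += (self.P @ h) * (y - h @ self.beta)

    def predict(self, x):
        return self._h(x) @ self.beta

# Toy usage: fit an initial batch, then update sample by sample.
X = np.random.default_rng(1).standard_normal((50, 3))
y = np.sin(X[:, 0]) + X[:, 1]
net = OSELM(3, 20)
net.init_fit(X[:40], y[:40])
for x_i, y_i in zip(X[40:], y[40:]):
    net.partial_fit(x_i, y_i)
print(float(net.predict(X[-1])), float(y[-1]))
```

The `partial_fit` step realizes the recursive least-squares update derived in Appendix A.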
Acknowledgements
The authors would like to thank the anonymous reviewers for their insightful comments and suggestions.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Proof of Theorem 1
Assume that from time \(q\) to time \(q+r-1\), the input vector \(X_q\) and output vector \(Y_q\) of the \(r\) samples are
\[ X_q = [x_q, x_{q+1}, \ldots, x_{q+r-1}], \qquad Y_q = [y_q, y_{q+1}, \ldots, y_{q+r-1}]^T. \]
The corresponding output weights \(\beta_{q+r-1}\) can be obtained by solving the least-squares problem
\[ \min_{\beta} \left\| H_{q+r-1}\beta - Y_q \right\|^2, \]
where \(H_{q+r-1}\) is the hidden-layer output matrix of the \(r\) samples from time \(q\) to time \(q+r-1\):
\[ H_{q+r-1} = [h(q), h(q+1), \ldots, h(q+r-1)]^T, \]
with \(h(\cdot)\) denoting the hidden-layer output vector of a single sample. Let \(P_{q+r-1} = K_{q+r-1}^{-1}\) with \(K_{q+r-1} = H_{q+r-1}^T H_{q+r-1}\); the solution of \(\beta_{q+r-1}\) is
\[ \beta_{q+r-1} = P_{q+r-1} H_{q+r-1}^T Y_q. \]
At time \(q+r\), when a new data pair \((x_{q+r}, y_{q+r})\) arrives, \(H_{q+r-1}\) becomes
\[ H_{q+r} = \begin{bmatrix} H_{q+r-1} \\ h^T(q+r) \end{bmatrix}. \]
The corresponding output weights \(\beta_{q+r}\) can be obtained by
\[ \beta_{q+r} = K_{q+r}^{-1} H_{q+r}^T Y_{q+r}, \]
where \(Y_{q+r} = [y_q, y_{q+1}, \ldots, y_{q+r-1}, y_{q+r}]^T\) and \(K_{q+r} = H_{q+r}^T H_{q+r}\).
With \(P_{q+r} = K_{q+r}^{-1}\), the recursive solution of \(\beta_{q+r}\) is
\[ \beta_{q+r} = \beta_{q+r-1} + P_{q+r}\, h(q+r) \left( y_{q+r} - h^T(q+r)\, \beta_{q+r-1} \right), \]
where, by the Sherman-Morrison formula,
\[ P_{q+r} = P_{q+r-1} - \frac{P_{q+r-1}\, h(q+r)\, h^T(q+r)\, P_{q+r-1}}{1 + h^T(q+r)\, P_{q+r-1}\, h(q+r)}. \]
Then,
\[ K_{q+r} = H_{q+r}^T H_{q+r} = K_{q+r-1} + h(q+r)\, h^T(q+r). \]
Accordingly, we have
\[ K_{q+r} - K_{q+r-1} = h(q+r)\, h^T(q+r) \succeq 0. \]
Because \(P_{q+r-1} = (H_{q+r-1}^T H_{q+r-1})^{-1}\) and \(K_{q+r} = K_{q+r-1} + h(q+r)h^T(q+r)\), \(P_{q+r-1}\) and \(P_{q+r}\) are positive definite matrices. Thus, we have
\[ 0 \prec P_{q+r} \preceq P_{q+r-1}. \]
According to (40), the eigenvalues of \(K_{q+r}\) are non-decreasing in \(r\) and grow without bound as data accumulate, so \(P_{q+r} \rightarrow 0\) and the correction term \(P_{q+r}\, h(q+r)\left( y_{q+r} - h^T(q+r)\beta_{q+r-1} \right) \rightarrow 0\), i.e., \(\beta_{q+r} \rightarrow \beta_{q+r-1}\).
In summary, as samples accumulate without forgetting, the update gain vanishes and OS-ELM loses its capability to correct the model with new data. \(\square \)
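A short numerical illustration of this saturation effect is sketched below; the synthetic data, dimensions, and initialization are our own assumptions and serve only to show that the update gain shrinks as samples accumulate.

```python
import numpy as np

# Numerical illustration of the saturation argument above
# (synthetic data and all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
L = 10                          # number of hidden nodes
P = np.eye(L)                   # P_q: identity start (assumed regularized init)
beta = np.zeros(L)

for r in range(1, 5001):
    h = rng.standard_normal(L)  # hidden-layer output h(q+r)
    y = h @ np.ones(L) + 0.01 * rng.standard_normal()
    Ph = P @ h
    P -= np.outer(Ph, Ph) / (1.0 + h @ Ph)   # Sherman-Morrison update of P
    gain = P @ h                             # update gain P_{q+r} h(q+r)
    beta += gain * (y - h @ beta)
    if r % 1000 == 0:
        # ||P|| and the gain both shrink toward zero, so new samples
        # barely change beta: the data saturation phenomenon.
        print(r, np.linalg.norm(P), np.linalg.norm(gain))
```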
Appendix B
Detailed derivation of ALD:
Let \(X_{N_0} = [x_1, x_2, \ldots, x_{N_0}]^T\) stack the initial samples as rows, with each \(x_i \in \mathbb {R}^w\). The mean of each variable of the initial training data \(\aleph _0 = \{ ({x_i},{y_i})\} _{i = 1}^{{N_0}}\) is
\[ u_{N_0} = \frac{1}{N_0}\, \mathbf {1}_{N_0}^T X_{N_0}, \]
where \({\mathbf{{1}}_{{N_0}}} = {[1,1,\ldots ,1]^T}\). The data scaled to zero mean and unit variance can be represented as
\[ \tilde{X}_{N_0} = \left( X_{N_0} - \mathbf {1}_{N_0} u_{N_0} \right) \Sigma _{N_0}^{-1}, \]
where \(\Sigma _{N_0} = \mathrm {diag}(\sigma _{N_0 1}, \sigma _{N_0 2}, \ldots , \sigma _{N_0 w})\), and \(\sigma _{N_0 i}\) stands for the standard deviation of the \(i\)th variable.
When the new datum \(x_{N_0+1}\) arrives, the corresponding mean and standard deviation can be updated recursively by
\[ u_{N_0+1} = \frac{N_0\, u_{N_0} + x_{N_0+1}}{N_0+1}, \qquad \sigma _{(N_0+1)i}^2 = \frac{N_0-1}{N_0}\, \sigma _{N_0 i}^2 + \frac{\left( x_{(N_0+1)i} - u_{N_0 i} \right)^2}{N_0+1}. \]
Let \(x_{N_0+1}\) be scaled with
\[ \tilde{x}_{N_0+1} = \left( x_{N_0+1} - u_{N_0+1} \right) \Sigma _{N_0+1}^{-1}, \]
where \(\Sigma _{N_0+1} = \mathrm {diag}(\sigma _{(N_0+1)1}, \sigma _{(N_0+1)2}, \ldots , \sigma _{(N_0+1)w})\).
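As a quick check on the recursions above, the snippet below updates the mean and unbiased standard deviation sample by sample and compares them with batch recomputation; the data are arbitrary and illustrative.

```python
import numpy as np

# Verify the recursive mean/variance update against batch recomputation.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))        # 50 samples, w = 4 variables

N0 = 40
u = X[:N0].mean(axis=0)                 # u_{N0}
var = X[:N0].var(axis=0, ddof=1)        # sigma_{N0 i}^2 (unbiased)

for k in range(N0, 50):                 # k = current sample count
    x = X[k]
    # sigma^2_{k+1} = (k-1)/k * sigma^2_k + (x - u_k)^2 / (k+1)
    var = (k - 1) / k * var + (x - u) ** 2 / (k + 1)
    u = (k * u + x) / (k + 1)           # u_{k+1} = (k u_k + x)/(k+1)

print(np.allclose(u, X.mean(axis=0)))            # True
print(np.allclose(var, X.var(axis=0, ddof=1)))   # True
```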
Expanding the ALD criterion shown in (18), we have
\[ \delta _{N_0+1} = \min _{a} \left\| \tilde{X}_{N_0}^T a - \tilde{x}_{N_0+1}^T \right\| ^2 = \min _{a} \left( a^T J_{N_0}\, a - 2\, a^T j_{N_0} + \tilde{j}_{N_0+1} \right), \]
where \({J_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{X}_{{N_0}}^T\), \({j_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{x}_{{N_0} + 1}^T\), and \({\tilde{j}_{{N_0} + 1}} = {\tilde{x}_{{N_0} + 1}} \cdot \tilde{x}_{{N_0} + 1}^T\).
In order to minimize \(\delta _{N_0+1}\), setting the gradient with respect to \(a\) to zero, we have
\[ a^{*} = J_{N_0}^{-1}\, j_{N_0}. \]
Substituting (48) into (47), we can obtain the recursive ALD value:
\[ \delta _{N_0+1} = \tilde{j}_{N_0+1} - j_{N_0}^T J_{N_0}^{-1}\, j_{N_0}. \]
Thus, we have the ALD test
\[ \delta _{N_0+1} = \tilde{j}_{N_0+1} - j_{N_0}^T a^{*} \le \xi , \]
where \(\xi \) is the error threshold: a new datum satisfying the test is approximately linearly dependent on the stored data and is filtered out.
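The sketch below evaluates the recursive ALD value derived above for new samples against a small stored data matrix; the data, dimensions, and the threshold \(\xi\) are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def ald_delta(X_tilde, x_tilde):
    """ALD value delta = j~ - j^T J^+ j for a (standardized) new sample
    x_tilde against the stored rows of X_tilde. A sketch of the derivation
    above, not the authors' code."""
    J = X_tilde @ X_tilde.T       # J_{N0} = X~ . X~^T (Gram matrix)
    j = X_tilde @ x_tilde         # j_{N0} = X~ . x~^T
    jj = x_tilde @ x_tilde        # j~_{N0+1} = x~ . x~^T (scalar)
    a = np.linalg.pinv(J) @ j     # minimizer a*; pinv in case J is singular
    return jj - j @ a             # delta_{N0+1} = j~ - j^T a*

# Toy usage: 3 stored samples in 8 dimensions (assumed already standardized).
rng = np.random.default_rng(1)
X_tilde = rng.standard_normal((3, 8))
xi = 0.1                                        # assumed error threshold

x_dep = 0.5 * X_tilde[0] + 0.25 * X_tilde[1]    # lies in the stored span
x_new = rng.standard_normal(8)                  # generic new sample

for x in (x_dep, x_new):
    delta = ald_delta(X_tilde, x)
    # delta <= xi: approximately linearly dependent -> filtered out.
    print(round(float(delta), 6), "discard" if delta <= xi else "keep")
```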
About this article
Cite this article
Zhang, J., Li, Y. & Xiao, W. Adaptive online sequential extreme learning machine for dynamic modeling. Soft Comput 25, 2177–2189 (2021). https://doi.org/10.1007/s00500-020-05289-6