Abstract
Computational intelligence techniques have been applied in a wide range of applications. Among them, neural networks and support vector machines (SVMs) have played the dominant roles. However, both are known to face challenging issues such as (1) slow learning speed, (2) tedious human intervention, and/or (3) poor computational scalability. The extreme learning machine (ELM), an emergent technology that overcomes some of the challenges faced by other techniques, has recently attracted the attention of a growing number of researchers. ELM works for generalized single-hidden-layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with the least human intervention. This paper surveys ELM and its variants, in particular (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.
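The statement that the hidden layer need not be tuned can be illustrated with a minimal sketch. It assumes sigmoid hidden nodes, hidden parameters drawn uniformly at random, and output weights obtained with a Moore-Penrose pseudoinverse. The names `elm_fit` and `elm_predict`, the number of hidden nodes, and the toy sine data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def elm_fit(X, T, n_hidden=50, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    # Hidden-node parameters are drawn at random and never tuned afterwards.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer outputs
    # Only the output weights are learned: the minimum-norm
    # least-squares solution of H @ beta = T via the pseudoinverse.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta


# Toy regression (assumed data): approximate sin(x) on [-3, 3].
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_fit(X, T, n_hidden=30)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

Training reduces to a single linear solve, with no iterative tuning of the hidden layer, which is the source of the speed advantage the abstract claims. The sketch covers only the batch learning mode and leaves out the regularized, sequential, incremental, and ensemble variants.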
Acknowledgments
This research was sponsored by a grant from the Academic Research Fund (AcRF) Tier 1 under project no. RG 22/08 (M52040128).
Cite this article
Huang, GB., Wang, D.H. & Lan, Y. Extreme learning machines: a survey. Int. J. Mach. Learn. & Cyber. 2, 107–122 (2011). https://doi.org/10.1007/s13042-011-0019-y
DOI: https://doi.org/10.1007/s13042-011-0019-y