Abstract
The state of the art in supervised learning offers effective models for generalization, face and image recognition, time series prediction, and more. However, most of these powerful models cannot effectively learn incrementally. Infinite Lattice Learner (ILL) is an ensemble model that extends state-of-the-art machine learning methods into incremental learning models. With ILL, even batch models can learn incrementally with exceptional data retention ability. Instead of continually revisiting past instances to retain learned information, ILL allows existing methods to converge on new information without overriding previous knowledge. Models wrapped in ILL can therefore efficiently chase a drifting function without continually revisiting a changing dataset, and can operate in continuous real-time environments where millions of unique samples are seen every day. Datasets too large to fit in memory, or even on a single machine, can be learned in portions. ILL tiles an underlying model over an infinite Cartesian grid of points; efficient algorithms for discovering nearby points, combined with lazy evaluation, make this seemingly impossible task tractable. Extensive empirical evaluation reveals impressive retention ability for all ILL models, and ILL likewise demonstrates strong generalization on a variety of datasets, from classification and regression to image recognition.
References
Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185
Balsubramani A, Dasgupta S, Freund Y (2013) The fast convergence of incremental PCA. In: Advances in neural information processing systems, pp 3174–3182
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 245–250
Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks. Technical report, DTIC Document
Cederborg T, Li M, Baranes A, Oudeyer P-Y (2010) Incremental local online gaussian mixture regression for imitation learning of multiple tasks. In: Intelligent robots and systems (IROS), 2010 IEEE/RSJ international conference on. IEEE, pp 267–274
Cohen MB, Elder S, Musco C, Musco C, Persu M (2015) Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing. ACM, pp 163–172
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cyc (2008) Graphic showing the maximum separating hyperplane and the margin. https://commons.wikimedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png. Accessed Mar 2019
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinf 7(1):1
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576
Gepperth A, Hammer B (2016) Incremental learning algorithms and applications. In: European symposium on artificial neural networks (ESANN)
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: International conference on artificial intelligence and statistics, pp 315–323
Hagan MT, Demuth HB, Beale MH et al (1996) Neural network design. PWS Pub, Boston
Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable MDPs. CoRR arXiv:1507.06527
Hazan E et al (2016) Introduction to online convex optimization. Found Trends® Optim 2(3–4):157–325
Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, 1995, vol 1. IEEE, pp 278–282
Huang G-B, Saratchandran P, Sundararajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67
Huang G-B, Chen L, Siew CK et al (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 42(2):513–529
Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554
Jiang F, Sui Y, Cao C (2013) An incremental decision tree algorithm based on rough sets and its application in intrusion detection. Artif Intell Rev 4:1–14
Johnson WB, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemp Math 26(189–206):1
Kalteh AM, Hjorth P, Berndtsson R (2008) Review of the self-organizing map (SOM) approach in water resources: analysis, modelling and application. Environ Model Softw 23(7):835–845
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69
Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480
Lange S, Riedmiller M (2010) Deep auto-encoder neural networks in reinforcement learning. In: The 2010 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed Mar 2019
Lovinger J, Valova I (2016) Neural field: supervised apportioned incremental learning (SAIL). In: 2016 international joint conference on neural networks (IJCNN), pp 2500–2506
Lowe D, Broomhead D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355
Ma K, Ben-Arie J (2014) Compound exemplar based object detection by incremental random forest. In: Pattern recognition (ICPR), 2014 22nd international conference on. IEEE, pp 2407–2412
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML, vol 30, p 1
Milborrow S (2011) Titanic decision tree. https://commons.wikimedia.org/wiki/File:CART_tree_titanic_survivors.png. Accessed Mar 2019
Montgomery DC, Peck EA, Vining GG (2015) Introduction to linear regression analysis. Wiley, New York
Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans Pattern Anal Mach Intell 36(11):2227–2240
Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models, vol 4. Irwin, Chicago
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, New York
Pace RK. California housing. http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html. Accessed Dec 2019
Pace RK, Barry R (1997) Sparse spatial autoregressions. Stat Probab Lett 33(3):291–297
Pang S, Ozawa S, Kasabov N (2005) Incremental linear discriminant analysis for classification of data streams. IEEE Trans Syst Man Cybern Part B (Cybern) 35(5):905–914
Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybern Part C (Appl Rev) 31(4):497–508
Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Ramos F, Ott L (2016) Hilbert maps: scalable continuous occupancy mapping with stochastic gradient descent. Int J Robot Res 35(14):1717–1730
Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113(13):130503
Riedmiller M (2005) Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Gama J, Camacho R, Brazdil PB, Jorge AM, Torgo L (eds) Machine learning: ECML 2005. ECML 2005. Lecture notes in computer science, vol 3720. Springer, Berlin, Heidelberg
Ristin M, Guillaumin M, Gall J, Van Gool L (2016) Incremental learning of random forests for large-scale image classification. IEEE Trans Pattern Anal Mach Intell 38(3):490–503
Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104
Rosasco L, Villa S (2015) Learning with incremental iterative regularization. In: Advances in neural information processing systems, pp 1630–1638
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
Rutkowski L, Jaworski M, Pietruczuk L, Duda P (2014) The cart decision tree for mining data streams. Inf Sci 266:1–15
Schmid H (2013) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the international conference on new methods in language processing, Manchester, UK, 1994
Seber GA, Lee AJ (2012) Linear regression analysis, vol 936. Wiley, New York
Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mobile Comput Commun Rev 5(1):3–55
Shi X, Yang Y, Guo Z, Lai Z (2014) Face recognition by sparse discriminant analysis via joint l2,1-norm minimization. Pattern Recognit 47(7):2447–2453
Sim T, Baker S, Bsat M (2002) The CMU pose, illumination, and expression (PIE) database. In: Automatic face and gesture recognition, 2002. Proceedings. Fifth IEEE international conference on. IEEE, pp 53–58
Song W, Zhu J, Li Y, Chen C (2016) Image alignment by online robust PCA via stochastic gradient descent. IEEE Trans Circuits Syst Video Technol 26(7):1241–1250
Soudry D, Di Castro D, Gal A, Kolodny A, Kvatinsky S (2015) Memristor-based multilayer neural networks with online gradient descent training. IEEE Trans Neural Netw Learn Syst 26(10):2408–2421
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge
Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Tagliaferri R, Longo G, Milano L, Acernese F, Barone F, Ciaramella A, De Rosa R, Donalek C, Eleuteri A, Raiconi G et al (2003) Neural networks in astronomy. Neural Netw 16(3):297–319
Tan Y, Wang J, Zurada JM (2001) Nonlinear blind source separation using a radial basis function network. IEEE Trans Neural Netw 12(1):124–134
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282
Tóth L (2013) Phone recognition with deep sparse rectifier neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6985–6989
Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, pp 2094–2100
Variddhisaï T, Mandic D (2017) Online multilinear dictionary learning for sequential compressive sensing. arXiv preprint arXiv:1703.02492
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England
Weinberger KQ, Blitzer J, Saul LK (2006) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems, pp 1473–1480
Weng J, Zhang Y, Hwang W-S (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal Mach Intell 25(8):1034–1040
Whitehead N, Fit-Florea A (2011) Precision & performance: floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA White Paper
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Los Altos
Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
Yoshida Y, Karakida R, Okada M, Amari S-I (2017) Statistical mechanical analysis of online learning with weight normalization in single layer perceptron. J Phys Soc Jpn 86(4):044002
Zhao H, Yuen PC (2008) Incremental linear discriminant analysis for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 38(1):210–221
Zhao H, Yuen PC, Kwok JT (2006) A novel incremental principal component analysis and its application for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 36(4):873–886
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Radius points algorithm with floating point math
When Algorithm 1 is implemented with floating point math (Whitehead and Fit-Florea 2011), as is common in computer systems, numerical precision errors can produce points that differ insignificantly where infinite precision would yield identical points. For example, given grid spacing \(g = 0.1\), radius \(r_1 = 0.1\), and center \(\vec{c} = [0.61]\), IEEE 754 double-precision math causes Algorithm 1 to return points \([g\lceil (\vec{c}_1 - r_1) / g\rceil = 0.6000000000000001]\) and \([g\lceil (\vec{c}_1 - r_1) / g\rceil + g = 0.7000000000000001]\). If we expand the radius to \(r_2 = 0.11\), Algorithm 1 returns points \([g\lceil (\vec{c}_1 - r_2) / g\rceil = 0.5]\), \([g\lceil (\vec{c}_1 - r_2) / g\rceil + g = 0.6]\), and \([g\lceil (\vec{c}_1 - r_2) / g\rceil + g + g = 0.7]\). Although the larger radius \(r_2\) should encompass at least the same points as the smaller radius \(r_1\), limited double precision yields completely different point representations for \(r_1\) and \(r_2\). Note that the radius parameter varies regularly during recursion of the radius points procedure and across the various neighborhood functions that utilize Algorithm 1. Rounding each component to the number of significant digits in \(g\) avoids these precision errors.
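The remedy can be sketched in a few lines of Python. The fragment below is a minimal one-dimensional illustration, not the authors' implementation of Algorithm 1: grid_points_1d and snap are hypothetical helper names, and snap assumes grid spacings of the form \(10^{-k}\) when converting \(g\) into a digit count.

import math

def grid_points_1d(center, radius, g):
    # Candidate lattice points within `radius` of `center` along one
    # dimension, starting from the ceil-based point used in Algorithm 1.
    point = g * math.ceil((center - radius) / g)
    points = []
    while point <= center + radius:
        points.append(point)
        point += g
    return points

def snap(points, g):
    # Round each component to the number of significant digits in the
    # grid spacing g (assumes g = 10^-k), so floating point noise cannot
    # produce distinct representations of the same lattice point.
    digits = max(0, -math.floor(math.log10(g)))
    return [round(p, digits) for p in points]

g, c = 0.1, 0.61
print(grid_points_1d(c, 0.10, g))           # [0.6000000000000001, 0.7000000000000001]
print(grid_points_1d(c, 0.11, g))           # [0.5, 0.6, 0.7]
print(snap(grid_points_1d(c, 0.10, g), g))  # [0.6, 0.7]
print(snap(grid_points_1d(c, 0.11, g), g))  # [0.5, 0.6, 0.7]

After snapping, the points discovered with the smaller radius are a subset of those discovered with the larger radius, as expected. In the full multi-dimensional procedure, the same rounding would be applied to each component of every candidate point.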
Cite this article
Lovinger, J., Valova, I. Infinite Lattice Learner: an ensemble for incremental learning. Soft Comput 24, 6957–6974 (2020). https://doi.org/10.1007/s00500-019-04330-7