Abstract
The state of the art in supervised learning offers effective models for generalization, face and image recognition, time series prediction, and more. However, most of these powerful models cannot effectively learn incrementally. Infinite Lattice Learner (ILL) is an ensemble model that extends state-of-the-art machine learning methods into incremental learning models. With ILL, even batch models can learn incrementally with exceptional data retention ability. Instead of continually revisiting past instances to retain learned information, ILL allows existing methods to converge on new information without overriding previous knowledge. Models wrapped in ILL can therefore efficiently chase a drifting function without continually revisiting a changing dataset, and can operate in continuous real-time environments where millions of unique samples are seen every day. Datasets too large to fit in memory, or even on a single machine, can be learned in portions. ILL tiles an underlying model over an infinite Cartesian grid of points; efficient algorithms for discovering nearby points, combined with lazy evaluation, make this seemingly impossible task tractable. Extensive empirical evaluation reveals impressive retention ability for all ILL models, and ILL likewise demonstrates strong generalization on a variety of datasets, from classification and regression to image recognition.
References
Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185
Balsubramani A, Dasgupta S, Freund Y (2013) The fast convergence of incremental PCA. In: Advances in neural information processing systems, pp 3174–3182
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 245–250
Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks. Technical report, DTIC Document
Cederborg T, Li M, Baranes A, Oudeyer P-Y (2010) Incremental local online gaussian mixture regression for imitation learning of multiple tasks. In: Intelligent robots and systems (IROS), 2010 IEEE/RSJ international conference on. IEEE, pp 267–274
Cohen MB, Elder S, Musco C, Musco C, Persu M (2015) Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing. ACM, pp 163–172
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cyc (2008) Graphic showing the maximum separating hyperplane and the margin. https://commons.wikimedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png. Accessed Mar 2019
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinf 7(1):1
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188
Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576
Gepperth A, Hammer B (2016) Incremental learning algorithms and applications. In: European symposium on artificial neural networks (ESANN)
Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: International conference on artificial intelligence and statistics, pp 315–323
Hagan MT, Demuth HB, Beale MH et al (1996) Neural network design. PWS Pub, Boston
Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable MDPs. CoRR arXiv:1507.06527
Hazan E et al (2016) Introduction to online convex optimization. Found Trends® Optim 2(3–4):157–325
Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, 1995, vol 1. IEEE, pp 278–282
Huang G-B, Saratchandran P, Sundararajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67
Huang G-B, Chen L, Siew CK et al (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 42(2):513–529
Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554
Jiang F, Sui Y, Cao C (2013) An incremental decision tree algorithm based on rough sets and its application in intrusion detection. Artif Intell Rev 4:1–14
Johnson WB, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemp Math 26(189–206):1
Kalteh AM, Hjorth P, Berndtsson R (2008) Review of the self-organizing map (SOM) approach in water resources: analysis, modelling and application. Environ Model Softw 23(7):835–845
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69
Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480
Lange S, Riedmiller M (2010) Deep auto-encoder neural networks in reinforcement learning. In: The 2010 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed Mar 2019
Lovinger J, Valova I (2016) Neural field: supervised apportioned incremental learning (SAIL). In: 2016 international joint conference on neural networks (IJCNN), pp 2500–2506
Lowe D, Broomhead D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355
Ma K, Ben-Arie J (2014) Compound exemplar based object detection by incremental random forest. In: Pattern recognition (ICPR), 2014 22nd international conference on. IEEE, pp 2407–2412
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML, vol 30, p 1
Milborrow S (2011) Titanic decision tree. https://commons.wikimedia.org/wiki/File:CART_tree_titanic_survivors.png. Accessed Mar 2019
Montgomery DC, Peck EA, Vining GG (2015) Introduction to linear regression analysis. Wiley, New York
Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans Pattern Anal Mach Intell 36(11):2227–2240
Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models, vol 4. Irwin, Chicago
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, New York
Pace RK. California housing. http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html. Accessed Dec 2019
Pace RK, Barry R (1997) Sparse spatial autoregressions. Stat Probab Lett 33(3):291–297
Pang S, Ozawa S, Kasabov N (2005) Incremental linear discriminant analysis for classification of data streams. IEEE Trans Syst Man Cybern Part B (Cybern) 35(5):905–914
Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybern Part C (Appl Rev) 31(4):497–508
Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Ramos F, Ott L (2016) Hilbert maps: scalable continuous occupancy mapping with stochastic gradient descent. Int J Robot Res 35(14):1717–1730
Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113(13):130503
Riedmiller M (2005) Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Gama J, Camacho R, Brazdil PB, Jorge AM, Torgo L (eds) Machine learning: ECML 2005. ECML 2005. Lecture notes in computer science, vol 3720. Springer, Berlin, Heidelberg
Ristin M, Guillaumin M, Gall J, Van Gool L (2016) Incremental learning of random forests for large-scale image classification. IEEE Trans Pattern Anal Mach Intell 38(3):490–503
Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104
Rosasco L, Villa S (2015) Learning with incremental iterative regularization. In: Advances in neural information processing systems, pp 1630–1638
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
Rutkowski L, Jaworski M, Pietruczuk L, Duda P (2014) The cart decision tree for mining data streams. Inf Sci 266:1–15
Schmid H (2013) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the international conference on new methods in language processing, Manchester, UK, 1994
Seber GA, Lee AJ (2012) Linear regression analysis, vol 936. Wiley, New York
Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mobile Comput Commun Rev 5(1):3–55
Shi X, Yang Y, Guo Z, Lai Z (2014) Face recognition by sparse discriminant analysis via joint l2,1-norm minimization. Pattern Recognit 47(7):2447–2453
Sim T, Baker S, Bsat M (2002) The CMU pose, illumination, and expression (PIE) database. In: Automatic face and gesture recognition, 2002. Proceedings. Fifth IEEE international conference on. IEEE, pp 53–58
Song W, Zhu J, Li Y, Chen C (2016) Image alignment by online robust PCA via stochastic gradient descent. IEEE Trans Circuits Syst Video Technol 26(7):1241–1250
Soudry D, Di Castro D, Gal A, Kolodny A, Kvatinsky S (2015) Memristor-based multilayer neural networks with online gradient descent training. IEEE Trans Neural Netw Learn Syst 26(10):2408–2421
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge
Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Tagliaferri R, Longo G, Milano L, Acernese F, Barone F, Ciaramella A, De Rosa R, Donalek C, Eleuteri A, Raiconi G et al (2003) Neural networks in astronomy. Neural Netw 16(3):297–319
Tan Y, Wang J, Zurada JM (2001) Nonlinear blind source separation using a radial basis function network. IEEE Trans Neural Netw 12(1):124–134
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282
Tóth L (2013) Phone recognition with deep sparse rectifier neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6985–6989
Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, pp 2094–2100
Variddhisaï T, Mandic D (2017) Online multilinear dictionary learning for sequential compressive sensing. arXiv preprint arXiv:1703.02492
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England
Weinberger KQ, Blitzer J, Saul LK (2006) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems, pp 1473–1480
Weng J, Zhang Y, Hwang W-S (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal Mach Intell 25(8):1034–1040
Whitehead N, Fit-Florea A (2011) Precision & performance: floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA White Paper
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Los Altos
Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
Yoshida Y, Karakida R, Okada M, Amari S-I (2017) Statistical mechanical analysis of online learning with weight normalization in single layer perceptron. J Phys Soc Jpn 86(4):044002
Zhao H, Yuen PC (2008) Incremental linear discriminant analysis for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 38(1):210–221
Zhao H, Yuen PC, Kwok JT (2006) A novel incremental principal component analysis and its application for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 36(4):873–886
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Radius points algorithm with floating point math
When Algorithm 1 is implemented with floating point math (Whitehead and Fit-Florea 2011), as is common in computer systems, numerical precision errors can produce points that differ insignificantly where infinite precision would yield identical points. For example, given grid spacing \(g = 0.1\), radius \(r_1 = 0.1\), and center \(\vec{c} = [0.61]\), IEEE 754 double-precision math causes Algorithm 1 to return points \([g\lceil (\vec{c}_1 - r_1) / g\rceil = 0.6000000000000001]\) and \([g\lceil (\vec{c}_1 - r_1) / g\rceil + g = 0.7000000000000001]\). If we expand the radius to \(r_2 = 0.11\), Algorithm 1 returns points \([g\lceil (\vec{c}_1 - r_2) / g\rceil = 0.5]\), \([g\lceil (\vec{c}_1 - r_2) / g\rceil + g = 0.6]\), and \([g\lceil (\vec{c}_1 - r_2) / g\rceil + g + g = 0.7]\). Although the larger radius \(r_2\) should encompass at least the same points as the smaller radius \(r_1\), limited double precision yields completely different point representations for \(r_1\) and \(r_2\). Note that the radius parameter varies regularly during recursion of the radius points procedure and across the various neighborhood functions that utilize Algorithm 1. Rounding each component to the number of significant digits in \(g\) avoids these precision errors.
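The remedy can be sketched in a few lines of Python. The fragment below is a minimal one-dimensional illustration, not the authors' implementation of Algorithm 1: grid_points_1d and snap are hypothetical helper names, and snap assumes grid spacings of the form \(10^{-k}\) when converting \(g\) into a digit count.

import math

def grid_points_1d(center, radius, g):
    # Candidate lattice points within `radius` of `center` along one
    # dimension, starting from the ceil-based point used in Algorithm 1.
    point = g * math.ceil((center - radius) / g)
    points = []
    while point <= center + radius:
        points.append(point)
        point += g
    return points

def snap(points, g):
    # Round each component to the number of significant digits in the
    # grid spacing g (assumes g = 10^-k), so floating point noise cannot
    # produce distinct representations of the same lattice point.
    digits = max(0, -math.floor(math.log10(g)))
    return [round(p, digits) for p in points]

g, c = 0.1, 0.61
print(grid_points_1d(c, 0.10, g))           # [0.6000000000000001, 0.7000000000000001]
print(grid_points_1d(c, 0.11, g))           # [0.5, 0.6, 0.7]
print(snap(grid_points_1d(c, 0.10, g), g))  # [0.6, 0.7]
print(snap(grid_points_1d(c, 0.11, g), g))  # [0.5, 0.6, 0.7]

After snapping, the points discovered with the smaller radius are a subset of those discovered with the larger radius, as expected. In the full multi-dimensional procedure, the same rounding would be applied to each component of every candidate point.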
Cite this article
Lovinger, J., Valova, I. Infinite Lattice Learner: an ensemble for incremental learning. Soft Comput 24, 6957–6974 (2020). https://doi.org/10.1007/s00500-019-04330-7