
Infinite Lattice Learner: an ensemble for incremental learning

Soft Computing · Methodologies and Application

Abstract

The state of the art in supervised learning has produced effective models for generalization, face and image recognition, time series prediction, and more. However, most of these powerful models cannot effectively learn incrementally. Infinite Lattice Learner (ILL) is an ensemble model that extends state-of-the-art machine learning methods into incremental learners. With ILL, even batch models can learn incrementally with exceptional data retention. Instead of continually revisiting past instances to retain learned information, ILL allows existing methods to converge on new information without overriding previous knowledge. Models wrapped in ILL can efficiently chase a drifting function without continually revisiting a changing dataset, and can operate in continuous real-time environments where millions of unique samples are seen every day. Datasets too large to fit in memory, or even on a single machine, can be learned in portions. ILL utilizes an infinite Cartesian grid of points with an underlying model tiled upon it; efficient algorithms for discovering nearby points and lazy evaluation make this seemingly impossible task practical. Extensive empirical evaluation reveals impressive retention ability for all ILL models, and ILL likewise demonstrates its generalization ability on a variety of datasets spanning classification, regression, and image recognition.
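
The lattice-and-lazy-evaluation idea can be illustrated with a small sketch. The Python fragment below is a conceptual toy only, not the authors' ILL implementation: the class name and the stand-in per-cell "model" (a running mean) are hypothetical, and ILL's neighborhood functions and underlying learners are omitted. It shows why an unbounded grid is tractable: cells are created only when data touches them.

```python
class LazyLattice:
    """Toy sketch of a lazily materialized Cartesian grid with spacing g.
    Each cell stores a tiny local estimate (here a running mean), so fitting
    new data touches only nearby cells and never overwrites distant ones."""

    def __init__(self, g):
        self.g = g
        self.cells = {}  # integer lattice index tuple -> (running mean, count)

    def _index(self, x):
        # Nearest lattice point of each coordinate, as exact integer indices.
        return tuple(round(v / self.g) for v in x)

    def learn(self, x, y):
        mean, n = self.cells.get(self._index(x), (0.0, 0))
        self.cells[self._index(x)] = ((mean * n + y) / (n + 1), n + 1)

    def predict(self, x, default=0.0):
        return self.cells.get(self._index(x), (default, 0))[0]


lattice = LazyLattice(g=0.1)
lattice.learn([0.61], 1.0)       # creates only the cell at index (6,)
lattice.learn([3.97], -2.0)      # a distant cell; the first is untouched
print(lattice.predict([0.63]))   # 1.0
print(len(lattice.cells))        # 2 -- the "infinite" grid stays tiny in memory
```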


References

  • Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175–185

  • Balsubramani A, Dasgupta S, Freund Y (2013) The fast convergence of incremental PCA. In: Advances in neural information processing systems, pp 3174–3182

  • Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

  • Bingham E, Mannila H (2001) Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 245–250

  • Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks. Technical report, DTIC Document

  • Cederborg T, Li M, Baranes A, Oudeyer P-Y (2010) Incremental local online Gaussian mixture regression for imitation learning of multiple tasks. In: 2010 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 267–274

  • Cohen M B, Elder S, Musco C, Musco C, Persu M (2015) Dimensionality reduction for k-means clustering and low rank approximation. In: Proceedings of the forty-seventh annual ACM symposium on theory of computing. ACM, pp 163–172

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

  • Cyc (2008) Graphic showing the maximum separating hyperplane and the margin. https://commons.wikimedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png. Accessed Mar 2019

  • Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinf 7(1):1

  • Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179–188

  • Gatys LA, Ecker AS, Bethge M (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576

  • Gepperth A, Hammer B (2016) Incremental learning algorithms and applications. In: European symposium on artificial neural networks (ESANN)

  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: International conference on artificial intelligence and statistics, pp 315–323

  • Hagan MT, Demuth HB, Beale MH et al (1996) Neural network design. PWS Pub, Boston

  • Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. CoRR arXiv:1507.06527

  • Hazan E et al (2016) Introduction to online convex optimization. Found Trends® Optim 2(3–4):157–325

  • Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, 1995, vol 1. IEEE, pp 278–282

  • Huang G-B, Saratchandran P, Sundararajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67

  • Huang G-B, Chen L, Siew CK et al (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892

  • Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 42(2):513–529

  • Hull JJ (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554

  • Jiang F, Sui Y, Cao C (2013) An incremental decision tree algorithm based on rough sets and its application in intrusion detection. Artif Intell Rev 4:1–14

  • Johnson WB, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemp Math 26:189–206

  • Kalteh AM, Hjorth P, Berndtsson R (2008) Review of the self-organizing map (SOM) approach in water resources: analysis, modelling and application. Environ Model Softw 23(7):835–845

  • Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69

  • Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480

  • Lange S, Riedmiller M (2010) Deep auto-encoder neural networks in reinforcement learning. In: The 2010 international joint conference on neural networks (IJCNN). IEEE, pp 1–8

  • Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed Mar 2019

  • Lovinger J, Valova I (2016) Neural field: supervised apportioned incremental learning (SAIL). In: 2016 international joint conference on neural networks (IJCNN), pp 2500–2506

  • Lowe D, Broomhead D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355

  • Ma K, Ben-Arie J (2014) Compound exemplar based object detection by incremental random forest. In: 2014 22nd international conference on pattern recognition (ICPR). IEEE, pp 2407–2412

  • Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of the ICML, vol 30, p 1

  • Milborrow S (2011) Titanic decision tree. https://commons.wikimedia.org/wiki/File:CART_tree_titanic_survivors.png. Accessed Mar 2019

  • Montgomery DC, Peck EA, Vining GG (2015) Introduction to linear regression analysis. Wiley, New York

  • Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans Pattern Anal Mach Intell 36(11):2227–2240

  • Neter J, Kutner MH, Nachtsheim CJ, Wasserman W (1996) Applied linear statistical models, vol 4. Irwin, Chicago

  • Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, New York

  • Pace RK. California housing. http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html. Accessed Dec 2019

  • Pace RK, Barry R (1997) Sparse spatial autoregressions. Stat Probab Lett 33(3):291–297

  • Pang S, Ozawa S, Kasabov N (2005) Incremental linear discriminant analysis for classification of data streams. IEEE Trans Syst Man Cybern Part B (Cybern) 35(5):905–914

  • Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for supervised neural networks. IEEE Trans Syst Man Cybern Part C (Appl Rev) 31(4):497–508

  • Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365

  • Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

  • Ramos F, Ott L (2016) Hilbert maps: scalable continuous occupancy mapping with stochastic gradient descent. Int J Robot Res 35(14):1717–1730

  • Rebentrost P, Mohseni M, Lloyd S (2014) Quantum support vector machine for big data classification. Phys Rev Lett 113(13):130503

  • Riedmiller M (2005) Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Gama J, Camacho R, Brazdil PB, Jorge AM, Torgo L (eds) Machine learning: ECML 2005. ECML 2005. Lecture notes in computer science, vol 3720. Springer, Berlin, Heidelberg

  • Ristin M, Guillaumin M, Gall J, Van Gool L (2016) Incremental learning of random forests for large-scale image classification. IEEE Trans Pattern Anal Mach Intell 38(3):490–503

  • Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104

  • Rosasco L, Villa S (2015) Learning with incremental iterative regularization. In: Advances in neural information processing systems, pp 1630–1638

  • Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747

  • Rutkowski L, Jaworski M, Pietruczuk L, Duda P (2014) The CART decision tree for mining data streams. Inf Sci 266:1–15

  • Schmid H (2013) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the international conference on new methods in language processing, Manchester, UK, 1994

  • Seber GA, Lee AJ (2012) Linear regression analysis, vol 936. Wiley, New York

  • Shannon CE (2001) A mathematical theory of communication. ACM SIGMOBILE Mobile Comput Commun Rev 5(1):3–55

  • Shi X, Yang Y, Guo Z, Lai Z (2014) Face recognition by sparse discriminant analysis via joint l2,1-norm minimization. Pattern Recognit 47(7):2447–2453

  • Sim T, Baker S, Bsat M (2002) The CMU pose, illumination, and expression (PIE) database. In: Proceedings of the fifth IEEE international conference on automatic face and gesture recognition. IEEE, pp 53–58

  • Song W, Zhu J, Li Y, Chen C (2016) Image alignment by online robust PCA via stochastic gradient descent. IEEE Trans Circuits Syst Video Technol 26(7):1241–1250

  • Soudry D, Di Castro D, Gal A, Kolodny A, Kvatinsky S (2015) Memristor-based multilayer neural networks with online gradient descent training. IEEE Trans Neural Netw Learn Syst 26(10):2408–2421

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction, vol 1. MIT Press, Cambridge

  • Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300

  • Tagliaferri R, Longo G, Milano L, Acernese F, Barone F, Ciaramella A, De Rosa R, Donalek C, Eleuteri A, Raiconi G et al (2003) Neural networks in astronomy. Neural Netw 16(3):297–319

  • Tan Y, Wang J, Zurada JM (2001) Nonlinear blind source separation using a radial basis function network. IEEE Trans Neural Netw 12(1):124–134

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288

  • Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282

  • Tóth L (2013) Phone recognition with deep sparse rectifier neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6985–6989

  • Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186

  • Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 2094–2100

  • Variddhisaï T, Mandic D (2017) Online multilinear dictionary learning for sequential compressive sensing. arXiv preprint arXiv:1703.02492

  • Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292

  • Watkins CJCH (1989) Learning from delayed rewards. Ph.D. thesis, University of Cambridge, England

  • Weinberger KQ, Blitzer J, Saul LK (2006) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems, pp 1473–1480

  • Weng J, Zhang Y, Hwang W-S (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal Mach Intell 25(8):1034–1040

  • Whitehead N, Fit-Florea A (2011) Precision & performance: floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA white paper

  • Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Los Altos

  • Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227

  • Yoshida Y, Karakida R, Okada M, Amari S-I (2017) Statistical mechanical analysis of online learning with weight normalization in single layer perceptron. J Phys Soc Jpn 86(4):044002

  • Zhao H, Yuen PC (2008) Incremental linear discriminant analysis for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 38(1):210–221

  • Zhao H, Yuen PC, Kwok JT (2006) A novel incremental principal component analysis and its application for face recognition. IEEE Trans Syst Man Cybern Part B (Cybern) 36(4):873–886


Author information

Corresponding author

Correspondence to Iren Valova.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Radius points algorithm with floating point math

When Algorithm 1 is implemented with floating point math (Whitehead and Fit-Florea 2011), as is common in computer systems, numerical precision errors can produce points that differ insignificantly even though they would be identical with infinite precision. For example, given grid spacing \(g = 0.1\), radius \(r_1 = 0.1\), and center \(\vec {c} = [0.61]\), using double-precision 64-bit IEEE 754 arithmetic, Algorithm 1 returns the points \([g\lceil (\vec {c}_1 - r_1) / g\rceil = 0.6000000000000001]\) and \([g\lceil (\vec {c}_1 - r_1) / g\rceil + g = 0.7000000000000001]\). If we expand the radius to \(r_2 = 0.11\), Algorithm 1 returns the points \([g\lceil (\vec {c}_1 - r_2) / g\rceil = 0.5]\), \([g\lceil (\vec {c}_1 - r_2) / g\rceil + g = 0.6]\), and \([g\lceil (\vec {c}_1 - r_2) / g\rceil + g + g = 0.7]\). Although the larger radius \(r_2\) should encompass at least the points found with the smaller radius \(r_1\), the limited precision of the double-precision 64-bit IEEE 754 format yields entirely different point coordinates for \(r_1\) and \(r_2\). Note that the radius parameter varies regularly during recursion of the radius points procedure and with the various neighborhood functions that utilize Algorithm 1. Rounding each component to the number of significant digits in \(g\) avoids these precision errors.
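
To make the failure mode concrete, the following Python sketch reproduces the one-dimensional example above. It is a simplified stand-in for Algorithm 1 (the function and helper names are hypothetical), but it exhibits the same behavior: the naive enumeration yields 0.6000000000000001 under \(r_1\) but 0.6 under \(r_2\), while snapping each coordinate to the number of decimal digits in \(g\) makes the two radii agree.

```python
import math

def lattice_points_1d(c, r, g):
    """Naive 1-D enumeration of grid points within radius r of center c
    (a simplified stand-in for Algorithm 1): start at g * ceil((c - r) / g)
    and step by the grid spacing g while still inside the radius."""
    p = g * math.ceil((c - r) / g)
    points = []
    while p <= c + r:
        points.append(p)
        p += g
    return points

def snap(p, g):
    """Round p to the number of decimal digits of the grid spacing g
    (assumes g is written in plain decimal notation), so that lattice
    points that should coincide compare, and hash, as equal."""
    digits = len(str(g).split(".")[1]) if "." in str(g) else 0
    return round(p, digits)

g, c = 0.1, 0.61
print(lattice_points_1d(c, 0.10, g))   # [0.6000000000000001, 0.7000000000000001]
print(lattice_points_1d(c, 0.11, g))   # [0.5, 0.6, 0.7]
print([snap(p, g) for p in lattice_points_1d(c, 0.10, g)])   # [0.6, 0.7]
```

With the snapped coordinates, every point produced under \(r_1\) is also produced under \(r_2\), so identical lattice points map to identical dictionary keys regardless of which radius generated them.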

About this article

Cite this article

Lovinger, J., Valova, I. Infinite Lattice Learner: an ensemble for incremental learning. Soft Comput 24, 6957–6974 (2020). https://doi.org/10.1007/s00500-019-04330-7
