Sparse projection oblique randomer forests

Published: 01 January 2020

Abstract

Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification comprising more than 100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while maintaining computational efficiency, scalability, and interpretability. Very sparse random projections can also be incorporated into gradient boosted trees, potentially yielding similar gains.
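
To make the abstract's key mechanism concrete, the sketch below illustrates one plausible reading of "very sparse random projections": at a tree node, candidate split directions are signed combinations of a few randomly chosen features, which are then scored just as an axis-aligned tree scores raw features. This is a minimal Python sketch under stated assumptions, not the authors' rerf (R/C++) implementation; the function names, the density parameter, and the Gini-based threshold search are illustrative choices.

    import numpy as np

    def sample_sparse_projections(p, d, density=0.1, rng=None):
        # Draw a p x d projection matrix whose entries are +1 or -1, each with
        # probability density/2, and 0 otherwise, so every column is a signed
        # combination of a small random subset of the p input features.
        # (Illustrative parameterization; the paper's exact scheme may differ.)
        rng = np.random.default_rng() if rng is None else rng
        signs = rng.choice([-1.0, 1.0], size=(p, d))
        mask = rng.random((p, d)) < density
        return signs * mask

    def gini(labels):
        # Gini impurity of a vector of class labels.
        _, counts = np.unique(labels, return_counts=True)
        q = counts / counts.sum()
        return 1.0 - np.sum(q ** 2)

    def best_oblique_split(X, y, A):
        # Project the node's samples onto each candidate direction (column of A)
        # and exhaustively search thresholds for the lowest weighted Gini
        # impurity, exactly as an axis-aligned tree scores raw features.
        Z = X @ A
        best = (np.inf, None, None)   # (impurity, projection index, threshold)
        for j in range(Z.shape[1]):
            for t in np.unique(Z[:, j])[:-1]:   # drop the max so both children are nonempty
                left, right = y[Z[:, j] <= t], y[Z[:, j] > t]
                imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if imp < best[0]:
                    best = (imp, j, t)
        return best

    # Toy usage: the class boundary lies along an oblique direction (x0 + x3),
    # which no single axis-aligned split can capture exactly.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    A = sample_sparse_projections(p=10, d=5, density=0.2, rng=rng)
    impurity, j, threshold = best_oblique_split(X, y, A)

An axis-aligned forest corresponds to the special case in which each column of A has exactly one nonzero entry; allowing a handful of nonzeros per column yields oblique splits while keeping each candidate projection cheap to compute and easy to interpret.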


Published In

The Journal of Machine Learning Research, Volume 21, Issue 1
January 2020
10260 pages
ISSN: 1532-4435
EISSN: 1533-7928
CC-BY 4.0

Publisher

JMLR.org

Publication History

Received: 01 September 2018
Revised: 01 October 2019
Accepted: 01 May 2020
Published: 01 January 2020
Published in JMLR Volume 21, Issue 1

Author Tags

  1. ensemble learning
  2. random forests
  3. decision trees
  4. random projections
  5. classification
  6. regression
  7. feature extraction
  8. sparse learning
