Sparse projection oblique randomer forests

Published: 01 January 2020

Abstract

Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification comprising more than 100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while maintaining computational efficiency, scalability, and interpretability. Very sparse random projections can also be incorporated into gradient boosted trees, potentially yielding similar gains.
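
To make the abstract's key mechanism concrete, the sketch below illustrates one plausible reading of "very sparse random projections": at a tree node, candidate split directions are signed combinations of a few randomly chosen features, which are then scored just as an axis-aligned tree scores raw features. This is a minimal Python sketch under stated assumptions, not the authors' rerf (R/C++) implementation; the function names, the density parameter, and the Gini-based threshold search are illustrative choices.

    import numpy as np

    def sample_sparse_projections(p, d, density=0.1, rng=None):
        # Draw a p x d projection matrix whose entries are +1 or -1, each with
        # probability density/2, and 0 otherwise, so every column is a signed
        # combination of a small random subset of the p input features.
        # (Illustrative parameterization; the paper's exact scheme may differ.)
        rng = np.random.default_rng() if rng is None else rng
        signs = rng.choice([-1.0, 1.0], size=(p, d))
        mask = rng.random((p, d)) < density
        return signs * mask

    def gini(labels):
        # Gini impurity of a vector of class labels.
        _, counts = np.unique(labels, return_counts=True)
        q = counts / counts.sum()
        return 1.0 - np.sum(q ** 2)

    def best_oblique_split(X, y, A):
        # Project the node's samples onto each candidate direction (column of A)
        # and exhaustively search thresholds for the lowest weighted Gini
        # impurity, exactly as an axis-aligned tree scores raw features.
        Z = X @ A
        best = (np.inf, None, None)   # (impurity, projection index, threshold)
        for j in range(Z.shape[1]):
            for t in np.unique(Z[:, j])[:-1]:   # drop the max so both children are nonempty
                left, right = y[Z[:, j] <= t], y[Z[:, j] > t]
                imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if imp < best[0]:
                    best = (imp, j, t)
        return best

    # Toy usage: the class boundary lies along an oblique direction (x0 + x3),
    # which no single axis-aligned split can capture exactly.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + X[:, 3] > 0).astype(int)
    A = sample_sparse_projections(p=10, d=5, density=0.2, rng=rng)
    impurity, j, threshold = best_oblique_split(X, y, A)

An axis-aligned forest corresponds to the special case in which each column of A has exactly one nonzero entry; allowing a handful of nonzeros per column yields oblique splits while keeping each candidate projection cheap to compute and easy to interpret.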


Published In

The Journal of Machine Learning Research, Volume 21, Issue 1
January 2020
10260 pages
ISSN: 1532-4435
EISSN: 1533-7928
CC-BY 4.0

Publisher

JMLR.org

Publication History

Received: 01 September 2018
Revised: 01 October 2019
Accepted: 01 May 2020
Published: 01 January 2020
Published in JMLR Volume 21, Issue 1

Author Tags

  1. ensemble learning
  2. random forests
  3. decision trees
  4. random projections
  5. classification
  6. regression
  7. feature extraction
  8. sparse learning
