DOI: 10.1145/1102351.1102381
Article

Closed-form dual perturb and combine for tree-based models

Published: 07 August 2005

Abstract

This paper studies the aggregation of predictions made by tree-based models over several perturbed versions of the attribute vector of a test case. A closed-form approximation of this scheme, combined with cross-validation to tune the level of perturbation, is proposed. This yields soft-tree models in a parameter-free way and preserves their interpretability. Empirical evaluations on classification and regression problems show that accuracy and the bias/variance tradeoff are significantly improved, at the price of an acceptable computational overhead. The method is further compared and combined with tree bagging.
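The scheme the abstract describes perturbs the *test* input (rather than the training set, as in bagging) and aggregates the tree's predictions over the perturbed copies; under Gaussian perturbation, the probability of reaching each leaf has a closed form via the normal CDF, which removes the need for sampling. A minimal single-split sketch of both variants is given below. This is an illustrative toy, not the paper's algorithm: the threshold 0.5, the leaf values, the noise level, and the function names are all invented for this example, whereas the paper handles full trees and tunes the noise level by cross-validation.

```python
import math
import random

# Toy regression tree: one split on a single feature x at threshold 0.5.
# Leaf values (1.0 and 3.0) are arbitrary illustrative choices.
def tree_predict(x):
    return 1.0 if x < 0.5 else 3.0

def monte_carlo_dpc(x, sigma, n=1000, seed=0):
    """Dual perturb and combine by sampling: average the tree's
    predictions over n Gaussian-perturbed copies of the test input."""
    rng = random.Random(seed)
    return sum(tree_predict(x + rng.gauss(0.0, sigma)) for _ in range(n)) / n

def closed_form_dpc(x, sigma):
    """Closed-form variant: under Gaussian perturbation the probability
    of landing in each leaf is a normal CDF evaluated at the split
    threshold, so the expectation is a CDF-weighted leaf average."""
    p_left = 0.5 * (1.0 + math.erf((0.5 - x) / (sigma * math.sqrt(2.0))))
    return 1.0 * p_left + 3.0 * (1.0 - p_left)

print(tree_predict(0.49), tree_predict(0.51))   # hard jump: 1.0 3.0
print(round(closed_form_dpc(0.5, 0.2), 3))      # softened at the split: 2.0
```

Near the threshold the perturbed prediction interpolates smoothly between the leaf values, which is the "soft tree" behaviour the abstract refers to, while far from any split it coincides with the hard tree's output.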





Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning
August 2005
1113 pages
ISBN:1595931805
DOI:10.1145/1102351
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%


Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Nov 2024


Cited By

  • (2022) Suspended sediment load modeling using advanced hybrid rotation forest based elastic network approach. Journal of Hydrology, 610, 127963. DOI: 10.1016/j.jhydrol.2022.127963. Online: Jul 2022.
  • (2022) Multi-level Machine Learning-Driven Tunnel Squeezing Prediction: Review and New Insights. Archives of Computational Methods in Engineering, 29(7), 5493-5509. DOI: 10.1007/s11831-022-09774-z. Online: 10 Jun 2022.
  • (2020) Regularisation of neural networks by enforcing Lipschitz continuity. Machine Learning. DOI: 10.1007/s10994-020-05929-w. Online: 6 Dec 2020.
  • (2018) Neural Random Forests. Sankhya A. DOI: 10.1007/s13171-018-0133-y. Online: 21 Jun 2018.
  • (2016) Comments on: A random forest guided tour. TEST, 25(2), 247-253. DOI: 10.1007/s11749-016-0487-1. Online: 19 Apr 2016.
  • (2015) Prediction intervals for electric load forecast: Evaluation for different profiles. 2015 18th International Conference on Intelligent System Application to Power Systems (ISAP), 1-6. DOI: 10.1109/ISAP.2015.7325539. Online: Sep 2015.
