
DOI: 10.5555/3524938.3525841

Born-again tree ensembles

Published: 13 July 2020

Abstract

The use of machine learning algorithms in finance, medicine, and criminal justice can deeply impact human lives. As a consequence, research into interpretable machine learning has grown rapidly in an attempt to better control and fix possible sources of mistakes and biases. Tree ensembles offer good prediction quality in various domains, but the concurrent use of multiple trees reduces the interpretability of the ensemble. Against this background, we study born-again tree ensembles, i.e., the process of constructing a single decision tree of minimum size that reproduces exactly the same behavior as a given tree ensemble over its entire feature space. To find such a tree, we develop a dynamic-programming-based algorithm that exploits sophisticated pruning and bounding rules to reduce the number of recursive calls. This algorithm generates optimal born-again trees for many datasets of practical interest, leading to classifiers that are typically simpler and more interpretable without any other form of compromise.
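
The paper specifies the algorithm in full; as a rough illustration of the idea summarized above, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical toy ensemble of three stumps on two features in [0, 1]: the ensemble's split points induce a finite grid of cells over the feature space, the majority vote is constant inside each cell, and a single equivalent tree can be built by recursing over hyper-rectangular regions of that grid with memoization, returning a leaf as soon as the prediction is constant over the whole region. The names ensemble_thresholds, ensemble_predict, cell_value, and build are invented for this sketch, and the paper's pruning and bounding rules are omitted.

    # Minimal sketch of the born-again idea on a hypothetical toy ensemble (not the paper's code).
    from functools import lru_cache
    from itertools import product

    # Split points used anywhere in the ensemble, per feature (features assumed to lie in [0, 1]).
    ensemble_thresholds = [[0.3, 0.7], [0.5]]

    def ensemble_predict(x):
        """Majority vote of three hard-coded stumps; stands in for a real tree ensemble."""
        votes = [x[0] > 0.3, x[0] > 0.7, x[1] > 0.5]
        return int(sum(votes) >= 2)

    def cell_value(cell):
        """Ensemble prediction inside one grid cell, evaluated at a representative point."""
        point = []
        for j, idx in enumerate(cell):
            ts = [0.0] + ensemble_thresholds[j] + [1.0]
            point.append((ts[idx] + ts[idx + 1]) / 2.0)
        return ensemble_predict(point)

    @lru_cache(maxsize=None)
    def build(region):
        """Smallest tree (node count, structure) reproducing the ensemble on `region`,
        where `region` is a tuple of inclusive (lo, hi) cell-index ranges per feature."""
        values = {cell_value(c) for c in product(*(range(lo, hi + 1) for lo, hi in region))}
        if len(values) == 1:                       # prediction constant on region -> single leaf
            return 1, ("leaf", values.pop())
        best = None
        for j, (lo, hi) in enumerate(region):      # try every admissible split position
            for cut in range(lo, hi):
                left = region[:j] + ((lo, cut),) + region[j + 1:]
                right = region[:j] + ((cut + 1, hi),) + region[j + 1:]
                ls, ltree = build(left)
                rs, rtree = build(right)
                size = ls + rs + 1
                if best is None or size < best[0]:
                    best = (size, ("split", j, ensemble_thresholds[j][cut], ltree, rtree))
        return best

    whole_grid = tuple((0, len(t)) for t in ensemble_thresholds)
    size, tree = build(whole_grid)
    print("born-again tree node count:", size)
    print(tree)

On this toy ensemble the recursion terminates almost immediately; for realistic forests the induced grid is far larger, which is why the pruning and bounding rules developed in the paper are needed to keep the number of recursive calls manageable.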

Supplementary Material

Additional material (3524938.3525841_supp.pdf)
Supplemental material.

Cited By

  • On computing optimal tree ensembles. In Proceedings of the 40th International Conference on Machine Learning, pp. 17364-17374, 23 July 2023. DOI: 10.5555/3618408.3619123.

Information & Contributors

Information

Published In

cover image Guide Proceedings
ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages

Publisher

JMLR.org

Publication History

Published: 13 July 2020

Qualifiers

  • Research-article
  • Research
  • Refereed limited
