Abstract
Random forests are among the most successful ensemble methods, exhibiting performance on the level of boosting and support vector machines. The method is fast, robust to noise, does not overfit, and offers possibilities for explanation and visualization of its output. We investigate ways to increase the strength or decrease the correlation of individual trees in the forest. Using several attribute evaluation measures instead of just one gives promising results. Moreover, replacing ordinary voting with voting weighted by the margin achieved on the most similar instances yields improvements that are statistically highly significant over several data sets.
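For illustration, the weighted-voting idea can be sketched in a few lines of Python. The sketch below is an assumption-laden approximation, not the procedure from the paper: scikit-learn's RandomForestClassifier plays the role of the forest, plain Euclidean nearest neighbours stand in for the paper's notion of "most similar instances", and each tree's weight is a simple correct-minus-incorrect margin on the neighbours of the instance being classified.

# Minimal sketch of margin-weighted voting in a random forest.
# Assumptions (not from the paper itself): Euclidean k-nearest
# neighbours approximate similarity, and a tree's weight is its
# correct-minus-incorrect fraction on those neighbours.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
nn = NearestNeighbors(n_neighbors=10).fit(X_tr)

def weighted_vote(x):
    # Find the training instances most similar to x.
    _, idx = nn.kneighbors(x.reshape(1, -1))
    neigh_X = X_tr[idx[0]]
    # Individual trees predict class indices into forest.classes_.
    neigh_codes = np.searchsorted(forest.classes_, y_tr[idx[0]])
    votes = np.zeros(len(forest.classes_))
    for tree in forest.estimators_:
        pred = tree.predict(neigh_X)
        # Margin stand-in: correct minus incorrect fraction on the neighbours.
        margin = np.mean(pred == neigh_codes) - np.mean(pred != neigh_codes)
        if margin > 0:  # only trees competent near x take part in the vote
            votes[int(tree.predict(x.reshape(1, -1))[0])] += margin
    return forest.classes_[np.argmax(votes)]

acc = np.mean([weighted_vote(x) == t for x, t in zip(X_te, y_te)])
print(f"margin-weighted voting accuracy: {acc:.3f}")

The number of neighbours and the positive-margin cutoff are arbitrary choices here; the sketch is meant only to show the mechanics of per-instance tree weighting, not to reproduce the paper's results.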
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Robnik-Šikonja, M. (2004). Improving Random Forests. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. Lecture Notes in Computer Science, vol. 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_34
DOI: https://doi.org/10.1007/978-3-540-30115-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8