Abstract
Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models at the nodes of the tree. We address this issue by using the AIC criterion [1] instead of cross-validation to prevent these models from overfitting. In addition, a weight trimming heuristic is applied, which produces a significant speedup. We compare the training time and accuracy of the new induction process with the original one on various datasets, and show that training time often decreases while classification accuracy diminishes only slightly.
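For reference, the AIC criterion [1] scores a fitted model by trading off goodness of fit against complexity: AIC = 2k − 2 ln(L̂), where k is the number of parameters and L̂ the maximized likelihood. Choosing the number of LogitBoost iterations that minimizes AIC avoids the repeated model fitting that cross-validation requires.

The weight trimming heuristic goes back to Friedman et al. [6]: at each boosting iteration, instances carrying only a small fraction of the total weight are skipped when fitting the weighted regression. The following Python sketch illustrates the idea; the function name, the trimming fraction beta, and its default value are illustrative assumptions, not taken from the paper.

import numpy as np

def trim_indices(weights, beta=0.1):
    """Indices of the heaviest instances that together carry at least
    a (1 - beta) fraction of the total boosting weight. The remaining
    low-weight instances are skipped in the current iteration.
    (Sketch only; names and defaults are assumptions, not the paper's.)"""
    order = np.argsort(weights)[::-1]            # heaviest instances first
    cum = np.cumsum(weights[order])
    cutoff = np.searchsorted(cum, (1.0 - beta) * weights.sum()) + 1
    return order[:cutoff]

# Example: the lightest instance (5% of the weight mass) is dropped.
w = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
print(trim_indices(w, beta=0.1))                 # [0 1 2 3]

Because boosting weights concentrate on hard-to-classify instances after a few iterations, even a small beta can exclude a large share of the training data from each regression fit, which is the source of the speedup.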
Keywords
- Training Time
- Training Instance
- Linear Logistic Regression
- Simple Linear Regression Model
- Model Selection Method
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Second Int. Symposium on Information Theory, pp. 267–281 (1973)
Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth (1984)
Bühlmann, P., Yu, B.: Boosting, model selection, lasso and nonnegative garrote. Technical Report 2005-127, Seminar for Statistics, ETH Zürich (2005)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proc. Int. Conf. on Machine Learning, pp. 148–156. Morgan Kaufmann, San Francisco (1996)
Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. The Annals of Statistics 28(2), 337–374 (2000)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Machine Learning 59(1/2), 161–205 (2005)
Nadeau, C., Bengio, Y.: Inference for the generalization error. In: Advances in Neural Information Processing Systems, vol. 12, pp. 307–313. MIT Press, Cambridge (1999)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sumner, M., Frank, E., Hall, M. (2005). Speeding Up Logistic Model Tree Induction. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol. 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_72
DOI: https://doi.org/10.1007/11564126_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7