Abstract
We propose a method for constructing regression trees with range and region splitting. We present an efficient algorithm that computes the optimal two-dimensional region minimizing the mean squared error of an objective numeric attribute in a given database. As two-dimensional regions, we consider a class R of grid regions, such as "x-monotone," "rectilinear-convex," and "rectangular" regions, in the plane associated with two numeric attributes. Having computed the optimal region R, we propose a test that splits the data into the records lying inside R and those lying outside it, and we use this test in the construction of regression trees. Experiments confirm that region splitting yields compact and accurate regression trees in many domains.
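The region-splitting test described above can be illustrated with a minimal sketch. This is not the paper's optimized algorithm for finding the best region; it only shows what a grid-region split looks like once a region is given. Here an x-monotone grid region is represented as one contiguous row interval per column, and all names (`region_split`, `mean_squared_error`, the attribute keys) are illustrative assumptions, not the authors' API.

```python
def mean_squared_error(values):
    """MSE of values around their mean (the intra-node variance)."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def region_split(records, region, x_attr, y_attr, n_bins, lo_x, hi_x, lo_y, hi_y):
    """Split records by membership in a grid region.

    region: dict mapping a column index to an inclusive (row_lo, row_hi)
    interval; columns absent from the dict contribute no grid cells.
    For an x-monotone region, each column holds one contiguous interval.
    """
    def bucket(v, lo, hi):
        # Map a numeric value to a grid index in [0, n_bins - 1].
        i = int((v - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)

    inside, outside = [], []
    for r in records:
        cx = bucket(r[x_attr], lo_x, hi_x)
        cy = bucket(r[y_attr], lo_y, hi_y)
        iv = region.get(cx)
        (inside if iv and iv[0] <= cy <= iv[1] else outside).append(r)
    return inside, outside

# Tiny example on a 4x4 grid: the region covers columns 1 and 2 only.
records = [
    {"x": 0.1, "y": 0.1, "t": 1.0},
    {"x": 0.4, "y": 0.6, "t": 5.0},
    {"x": 0.6, "y": 0.4, "t": 5.5},
    {"x": 0.9, "y": 0.9, "t": 1.5},
]
region = {1: (1, 3), 2: (0, 2)}  # one row interval per column
ins, outs = region_split(records, region, "x", "y", 4, 0.0, 1.0, 0.0, 1.0)
```

Splitting on the region separates the high-`t` records from the low-`t` ones, so the MSE of the objective attribute `t` drops sharply on both sides of the test, which is exactly the criterion the optimal-region computation minimizes.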
Cite this article
Morimoto, Y., Ishii, H. & Morishita, S. Efficient Construction of Regression Trees with Range and Region Splitting. Machine Learning 45, 235–259 (2001). https://doi.org/10.1023/A:1017980905332