Abstract
Classification and regression trees (CART) prove to be a true alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from biased variable selection, they are commonly applied to a wide range of topics and used in tree ensembles and random forests because of their simplicity and computation speed. Conditional inference trees and model-based trees, algorithms in which variable selection is handled via fluctuation tests, are known to give more accurate and interpretable results than CART, but at the cost of longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also benchmark GLM trees against CART, conditional inference trees and LM trees on simulated and empirical datasets in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees possible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.
Notes
Algorithm 1 assesses parameter instability only if the splitting variable has two or more levels for a categorical variable, or two or more values for a numeric variable. Otherwise, there is nothing to split.
Our specification has no intercept since two dummy variables are introduced. This means that neither the left nor the right sub-sample is taken as a reference. Note that a model with one intercept and one dummy variable is equivalent and would simply lead to rewriting Eq. (6). However, the fitted log-likelihood is identical with or without an intercept, see Corollary 3.1 and Examples 3.1, 3.2 of Brouste et al. (2020).
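As a minimal sketch of this equivalence (a toy Poisson sample, not taken from the paper), the no-intercept model with two dummies and the intercept-plus-one-dummy model return the same fitted log-likelihood:

```r
set.seed(1)
d <- data.frame(y = rpois(100, lambda = 2), left = rep(c(1, 0), each = 50))
d$right <- 1 - d$left
fit_no_int <- glm(y ~ 0 + left + right, family = poisson, data = d)  # two dummies, no intercept
fit_int    <- glm(y ~ left, family = poisson, data = d)              # intercept + one dummy
logLik(fit_no_int)
logLik(fit_int)  # same value; only the parameterization differs
```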
We do not consider an intercept as the fitted log-likelihood is identical with or without intercept, see Brouste et al. (2020, Corollary 3.1).
The p values can be computed by Monte-Carlo simulation and then adjusted with Bonferroni. This fourth option has also been examined, but its results are not reported: the goodness-of-fit is comparable to that of the other ctree options, while the computation time is much longer.
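A minimal sketch of this option (assuming the partykit package; the dataset is only illustrative, and the Bonferroni adjustment described above would be applied on top of the resampled p values):

```r
library(partykit)
## conditional inference tree with Monte-Carlo resampled p values
ct_mc <- ctree(dist ~ speed, data = cars,
               control = ctree_control(testtype = "MonteCarlo", nresample = 9999))
```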
Since the results of explicit GLM Tree and GLM Tree are equal, only those related to explicit GLM Tree are depicted.
We also fit a post-pruned version of the CART model using the one-standard-error criterion, which produces more accurate results for the CART model on the CategIG1 and ContIG1 datasets, but explicit GLM Tree remains better in terms of RMSE.
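A minimal sketch of one-standard-error post-pruning (assuming the rpart package; the dataset is only illustrative):

```r
library(rpart)
fit <- rpart(dist ~ speed, data = cars, method = "anova")
cp_tab <- fit$cptable
i_min  <- which.min(cp_tab[, "xerror"])                         # minimal cross-validated error
one_se <- cp_tab[i_min, "xerror"] + cp_tab[i_min, "xstd"]       # one-SE threshold
cp_1se <- cp_tab[which(cp_tab[, "xerror"] <= one_se)[1], "CP"]  # smallest tree within one SE
fit_1se <- prune(fit, cp = cp_1se)
```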
References
Box, G.E.P., Cox, D.R.: An analysis of transformations. J. R. Stat. Soc. Ser. B (Methodol.) 26(2), 211–252 (1964)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC, Boca Raton (1984)
Brouste, A., Dutang, C., Rohmer, T.: Closed form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modelling. Comput. Stat. 35, 689–724 (2020)
Chambers, J.M., Hastie, T.J.: Statistical Models in S. Chapman and Hall, London (1993)
Ciampi, A.: Generalized regression trees. Comput. Stat. Data Anal. 12(1), 57–78 (1991)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Denuit, M., Hainaut, D., Trufin, J.: Effective Statistical Learning Methods for Actuaries I: GLMs and extensions. Springer Actuarial Lecture Notes. Springer, Berlin (2019)
Fahrmeir, L., Kaufmann, H.: Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Stat. 13, 342–368 (1985)
Farkas, S., Lopez, O., Thomas, M.: Cyber claim analysis using Generalized Pareto regression trees with applications to insurance. Insur. Math. Econ. 98, 92–105 (2021)
Fokkema, M.: Fitting prediction rule ensembles with R Package pre. J. Stat. Softw. 92(1), 1–30 (2020)
Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)
Gama, J.: Functional trees. Mach. Learn. 55(3), 219–250 (2004)
Garge, N.R., Bobashev, G., Eggleston, B.: Random forest methodology for model-based recursive partitioning: the mobForest package for R. BMC Bioinform. 14, 125 (2013)
Hothorn, T., Zeileis, A.: partykit: a modular toolkit for recursive partytioning in R. J. Mach. Learn. Res. 16, 3905–3909 (2015)
Hothorn, T., Hornik, K., Zeileis, A.: Unbiased recursive partitioning: a conditional inference framework. J. Comput. Graph. Stat. 15(3), 651–674 (2006)
James, G., Witten, D., Hastie, T., Tibshirani, R.: ISLR: data for an introduction to statistical learning with applications in R (2017)
Kass, G.V.: An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 29(2), 119–127 (1980)
Kim, H., Loh, W.-Y.: Classification trees with unbiased multiway splits. J. Am. Stat. Assoc. 96(454), 589–604 (2001)
Landwehr, N., Hall, M., Eibe, F.: Logistic model trees. Mach. Learn. 59, 161–205 (2005)
Lawrence, J.: Introduction to Neural Networks: Design, Theory and Applications, 6th edn. California Scientific Software, Nevada City (1994)
Leisch, F., Dimitriadou, E.: mlbench: machine learning benchmark problems (2021)
Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)
Liu, N.-T., Lin, F.-C., Shih, Y.-S.: Count regression trees. Adv. Data Anal. Classif. (2019)
Loh, W.-Y.: Regression trees with unbiased variable selection and interaction detection. Stat. Sin. 12(2), 361–386 (2002)
Loh, W.-Y.: Fifty years of classification and regression trees. Int. Stat. Rev. 82(3), 329–348 (2014)
Loh, W.-Y., Shih, Y.-S.: Split selection methods for classification trees. Stat. Sin. 7(4), 815–840 (1997)
Loh, W.-Y., Vanichsetakul, N.: Tree-structured classification via generalized discriminant analysis. J. Am. Stat. Assoc. 83(403), 715–725 (1988)
McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monographs on Statistics and Applied Probability, vol. 37, 2nd edn. CRC, Boca Raton (1989)
Philipp, M., Rusch, T., Hornik, K., Strobl, C.: Measuring the stability of results from supervised statistical learning. J. Comput. Graph. Stat. 27(4), 685–700 (2018)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2021)
Rusch, T., Zeileis, A.: Gaining insight with recursive partitioning of generalized linear models. J. Stat. Comput. Simul. 83(7), 1301–1315 (2013)
Seber, G.A.F., Lee, A.J.: Linear Regression Analysis. Wiley, Hoboken (2003)
Seibold, H., Hothorn, T., Zeileis, A.: Generalised linear model trees with global additive effects. Adv. Data Anal. Classif. (2018)
Su, X., Wang, M., Fan, J.: Maximum likelihood regression trees. J. Comput. Graph. Stat. 13(3), 586–598 (2004)
Szöcs, E., Schäfer, R.B.: Ecotoxicology is not normal: a comparison of statistical approaches for analysis of count and proportion data in ecotoxicology. Environ. Sci. Pollut. Res. 22(18), 13990–13999 (2015)
Therneau, T., Atkinson, B.: rpart: recursive partitioning and regression trees (2019)
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Springer, Berlin (2002)
Weisberg, S.: Applied Linear Regression. Wiley, Hoboken (2005)
Wilson, K., Grenfell, B.T.: Generalized linear modelling for parasitologists. Parasitol. Today 13(1), 33–38 (1997)
Wood, S.N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(1), 3–36 (2011)
Zeileis, A., Hornik, K.: Generalized M-fluctuation tests for parameter instability. Stat. Neerl. 61(4), 488–508 (2007)
Zeileis, A., Hothorn, T., Hornik, K.: Model-based recursive partitioning. J. Comput. Graph. Stat. 17(2), 492–514 (2008)
Acknowledgements
The authors are also very grateful for the useful suggestions of the two anonymous referees, which led to significant improvements of this article. The remaining errors, of course, should be attributed to the authors alone. This paper also benefited from fruitful discussions with members of the French chair DIALog – Digital Insurance And Long-term risks – under the aegis of the Fondation du Risque, a joint initiative by UCBL and CNP; and with members of the French laboratory SAF (UCBL and ISFA).
Appendices
Appendix A Computation details on likelihood and deviance
A.1 Proof of the fitted log-likelihood
Using Corollary 3.1 of Brouste et al. (2020) for the j-th partitioning variable and the notation of Table 2, we have
where \({\tilde{b}}=(b^\prime )^{-1}\) is the inverse of \(b^\prime \).
A.2 Proof of the fitted deviance
Let us denote the fitted mean by \({\hat{\mu }}_i=g^{-1}(\langle z_i,\widehat{{\varvec{\theta }}(s)}\rangle )={\overline{y}}_n^{(j)}\). The deviance is given by
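For reference, the underlying textbook identity (McCullagh and Nelder 1989), stated here in generic exponential-family notation rather than as the paper's exact display, is \(D = 2\sum _{i=1}^n w_i\big [ y_i\big ({\tilde{b}}(y_i) - {\hat{\theta }}_i\big ) - b\big ({\tilde{b}}(y_i)\big ) + b\big ({\hat{\theta }}_i\big )\big ]\) with \({\hat{\theta }}_i={\tilde{b}}({\hat{\mu }}_i)\), up to the dispersion convention.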
A.3 Proof of the fitted log-likelihood with weights
We adapt the proof of Brouste et al. (2020) in order to take weights into account when maximizing the log-likelihood. For known weights \(w_i\), their Equations (4) and (5) become
We adapt their proof of Theorem 3.1. The system \(S({\varvec{\theta }})=0\) is
The first equation being redundant, the second equation simplifies to \(\forall j\in J\)
Let us introduce the weighted frequencies \( m_w^{(j)} = \sum _{i=1}^nx_i^{(2),j}w_i\) and the weighted averages \({\overline{y}}_w^{(j)}= \frac{1}{m_w^{(j)}}\sum _{i=1}^nx_i^{(2),j}w_iy_i \). Hence, if \(y_i\) takes values in \(\mathbb {Y}\subset b'(\Lambda )\) and \(\ell \) is injective, we have for all \(j\in J\)
The rest of the proof is identical, except that \(\overline{y}_n^{(j)}\) is replaced by \(\overline{y}_w^{(j)}\). Hence, a slight modification of their Theorem 3.1 gives, for \(j\in J\),
For the two-variable case, we can also adapt their Theorem 3.2 by changing absolute frequencies and averages to weighted frequencies and averages as below.
Frequency | Mean | Index |
---|---|---|
\(m_k^{(2)} = \sum \limits _{i=1}^n x_i^{(2),k}w_i \) | \({\overline{y}}_w^{(2),k} =\frac{1}{m_k^{(2)}} \sum \limits _{i=1}^n y_i x_i^{(2),k}w_i\) | \(k\in K\) |
\(m_l^{(3)} = \sum \limits _{i=1}^n x_i^{(3),l}w_i\) | \({\overline{y}}_w^{(3),l} =\frac{1}{m_l^{(3)}} \sum \limits _{i=1}^n y_i x_i^{(3),l}w_i\) | \(l\in L\) |
\(m_{k,l} = \sum \limits _{i=1}^n x_i^{(k,l)}w_i\) | \({\overline{y}}_w^{(k,l)}=\frac{1}{m_{k,l}}\sum \limits _{i=1}^{n} y_i x_i^{(k,l)}w_i\) | \((k,l)\in K\times L\) |
A.4 Proof of the log-likelihood of the transformed variable
For \(t(x) = \log (d_1 x+d_2)\), with \(t'(x) = \frac{d_1}{d_1x+d_2}\) and \(t^{-1}(x)=(e^x-d_2)/d_1\), the log-likelihood of \(y\) such that \(t(y)\) follows (1) is given by
Therefore, the fitted log-likelihood based on (15) is
where \(t_{j,w}^{l}(s)\) are the corresponding observations of the random variables \(T_{j,w}^{l}(s)\) defined in Table 4.
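For completeness, the change-of-variables identity behind this computation (a standard fact, stated here in generic notation rather than as the paper's display) is: for \(d_1>0\), if \(t(Y)\) has density \(f\), then \(Y\) has density \(f(t(y))\,t'(y)\), so the log-likelihood of \(y\) equals that of \(t(y)\) plus \(\log t'(y) = \log d_1 - \log (d_1 y + d_2)\).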
Appendix B Additional materials for numerical illustrations
B.1 Simon Wood's test functions
We define the smooth functions for Simon Wood's test datasets (Wood 2011) as
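As an illustration only (an assumption, not the paper's display), the smooth functions commonly associated with these test datasets are the Gu–Wahba type functions implemented in mgcv::gamSim:

```r
## standard gamSim-style smooths; whether they match the paper's exact choice is not confirmed here
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
f3 <- function(x) 0 * x
```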
B.2 Runtime comparison between GLM tree and explicit GLM tree
See Fig. 6.
Cite this article
Dutang, C., Guibert, Q. An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests. Stat Comput 32, 6 (2022). https://doi.org/10.1007/s11222-021-10059-x