

An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests


Abstract

Classification and regression trees (CART) prove to be a true alternative to full parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from a biased variable selection issue, they are commonly applied to various topics and used for tree ensembles and random forests because of their simplicity and computation speed. Conditional inference trees and model-based trees, algorithms for which variable selection is tackled via fluctuation tests, are known to give more accurate and interpretable results than CART, but yield longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also propose a benchmark of GLM trees against CART, conditional inference trees and LM trees on simulated and empirical datasets, in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees possible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.
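
To illustrate the idea behind the explicit split point procedure, the sketch below (not the authors' implementation; the function names and the Poisson/log-link choice with dispersion \(a(\phi)=1\) are assumptions made for concreteness) scans all candidate split points of a numeric splitting variable for a Poisson response: on each side of a split, the closed-form maximum likelihood estimator of the mean is the sample mean, so each candidate split can be scored from prefix sums of the response without refitting a GLM. Minimum node sizes and the parameter instability test of the full algorithm are omitted.

```python
import numpy as np

def explicit_poisson_split(z, y):
    """Best split point of a numeric splitting variable z for a Poisson response y.

    On each side of a candidate split the closed-form MLE of the mean is the
    sample mean ybar, so the fitted log-likelihood (up to terms constant in the
    split) is sum over the two sides of m * (ybar*log(ybar) - ybar), computable
    from prefix sums without refitting a GLM.
    """
    order = np.argsort(z)
    z_sorted, y_sorted = z[order], y[order]
    n = len(y_sorted)
    csum = np.cumsum(y_sorted)          # prefix sums of the response
    total = csum[-1]

    best_split, best_ll = None, -np.inf
    for i in range(1, n):               # candidate split between positions i-1 and i
        if z_sorted[i] == z_sorted[i - 1]:
            continue                    # not a distinct split point
        m_left, m_right = i, n - i
        ybar_left = csum[i - 1] / m_left
        ybar_right = (total - csum[i - 1]) / m_right
        ll = 0.0
        for m, ybar in ((m_left, ybar_left), (m_right, ybar_right)):
            if ybar > 0:                # convention 0*log(0) = 0 for an all-zero side
                ll += m * (ybar * np.log(ybar) - ybar)
        if ll > best_ll:
            best_ll, best_split = ll, 0.5 * (z_sorted[i - 1] + z_sorted[i])
    return best_split, best_ll

rng = np.random.default_rng(0)
z = rng.uniform(size=500)
y = rng.poisson(np.where(z < 0.4, 2.0, 6.0)).astype(float)
print(explicit_poisson_split(z, y))     # recovers a split close to 0.4
```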

Notes

  1. Algorithm 1 assesses parameter instability only if the splitting variable has two or more levels for a categorical variable, or two or more values for a numeric variable. Otherwise, there is nothing to split.

  2. Our specification has no intercept since two dummy variables are introduced. This means that neither the left nor the right sub-sample is taken as a reference. Note that a model with one intercept and one dummy variable is equivalent and would simply lead to rewriting Eq. (6). However, the fitted log-likelihood is identical with or without intercept, see Corollary 3.1 and Examples 3.1, 3.2 of Brouste et al. (2020).

  3. We do not consider an intercept as the fitted log-likelihood is identical with or without intercept, see Brouste et al. (2020, Corollary 3.1).

  4. The p values can be computed by Monte Carlo simulation and then adjusted with a Bonferroni correction. This fourth option has also been examined, but the results are not reported: the goodness of fit is comparable to the other ctree options, while the computation time is much longer.

  5. Since the results of explicit GLM Tree and GLM Tree are equal, only those related to explicit GLM Tree are depicted.

  6. We also fit a post-pruned version of the CART model using the one-standard-error criterion, which produces more accurate results for CART on the CategIG1 and ContIG1 datasets, but explicit GLM Tree remains better in terms of RMSE.

References

  • Box, G.E.P., Cox, D.R.: An analysis of transformations revisited. J. Am. Stat. 77, 209–210 (1964)

  • Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)

  • Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  • Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press, Boca Raton (1984)

  • Brouste, A., Dutang, C., Rohmer, T.: Closed form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modelling. Comput. Stat. 35, 689–724 (2020)

  • Chambers, J.M., Hastie, T.J.: Statistical Models in S. Chapman and Hall, London (1993)

  • Ciampi, A.: Generalized regression trees. Comput. Stat. Data Anal. 12(1), 57–78 (1991)

  • Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  • Denuit, M., Hainaut, D., Trufin, J.: Effective Statistical Learning Methods for Actuaries I: GLMs and Extensions. Springer Actuarial Lecture Notes. Springer, Berlin (2019)

  • Fahrmeir, L., Kaufmann, H.: Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Stat. 13, 342–368 (1985)

  • Farkas, S., Lopez, O., Thomas, M.: Cyber claim analysis using Generalized Pareto regression trees with applications to insurance. Insur. Math. Econ. 98, 92–105 (2021)

  • Fokkema, M.: Fitting prediction rule ensembles with R package pre. J. Stat. Softw. 92(1), 1–30 (2020)

  • Friedman, J.H.: Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)

  • Gama, J.: Functional trees. Mach. Learn. 55(3), 219–250 (2004)

  • Garge, N.R., Bobashev, G., Eggleston, B.: Random forest methodology for model-based recursive partitioning: the mobForest package for R. BMC Bioinform. 14, 125 (2013)

  • Hothorn, T., Zeileis, A.: partykit: a modular toolkit for recursive partytioning in R. J. Mach. Learn. Res. 16, 3905–3909 (2015)

  • Hothorn, T., Hornik, K., Zeileis, A.: Unbiased recursive partitioning: a conditional inference framework. J. Comput. Graph. Stat. 15(3), 651–674 (2006)

  • James, G., Witten, D., Hastie, T., Tibshirani, R.: ISLR: data for an introduction to statistical learning with applications in R (2017)

  • Kass, G.V.: An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 29(2), 119–127 (1980)

  • Kim, H., Loh, W.-Y.: Classification trees with unbiased multiway splits. J. Am. Stat. Assoc. 96(454), 589–604 (2001)

  • Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Mach. Learn. 59, 161–205 (2005)

  • Lawrence, J.: Introduction to Neural Networks: Design, Theory and Applications, 6th edn. California Scientific Software, Nevada City (1994)

  • Leisch, F., Dimitriadou, E.: mlbench: machine learning benchmark problems (2021)

  • Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)

  • Liu, N.-T., Lin, F.-C., Shih, Y.-S.: Count regression trees. Adv. Data Anal. Classif. (2019)

  • Loh, W.-Y.: Regression trees with unbiased variable selection and interaction detection. Stat. Sin. 12(2), 361–386 (2002)

  • Loh, W.-Y.: Fifty years of classification and regression trees. Int. Stat. Rev. 82(3), 329–348 (2014)

  • Loh, W.-Y., Shih, Y.-S.: Split selection methods for classification trees. Stat. Sin. 7(4), 815–840 (1997)

  • Loh, W.-Y., Vanichsetakul, N.: Tree-structured classification via generalized discriminant analysis. J. Am. Stat. Assoc. 83(403), 715–725 (1988)

  • McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monographs on Statistics and Applied Probability, vol. 37, 2nd edn. CRC, Boca Raton (1989)

  • Philipp, M., Rusch, T., Hornik, K., Strobl, C.: Measuring the stability of results from supervised statistical learning. J. Comput. Graph. Stat. 27(4), 685–700 (2018)

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2021)

  • Rusch, T., Zeileis, A.: Gaining insight with recursive partitioning of generalized linear models. J. Stat. Comput. Simul. 83(7), 1301–1315 (2013)

  • Seber, G.A.F., Lee, A.J.: Linear Regression Analysis. Wiley, Hoboken (2003)

  • Seibold, H., Hothorn, T., Zeileis, A.: Generalised linear model trees with global additive effects. Adv. Data Anal. Classif. (2018)

  • Su, X., Wang, M., Fan, J.: Maximum likelihood regression trees. J. Comput. Graph. Stat. 13(3), 586–598 (2004)

  • Szöcs, E., Schäfer, R.B.: Ecotoxicology is not normal: a comparison of statistical approaches for analysis of count and proportion data in ecotoxicology. Environ. Sci. Pollut. Res. 22(18), 13990–13999 (2015)

  • Therneau, T., Atkinson, B.: rpart: recursive partitioning and regression trees (2019)

  • Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S. Springer, Berlin (2002)

  • Weisberg, S.: Applied Linear Regression. Wiley, Hoboken (2005)

  • Wilson, K., Grenfell, B.T.: Generalized linear modelling for parasitologists. Parasitol. Today 13(1), 33–38 (1997)

  • Wood, S.N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(1), 3–36 (2011)

  • Zeileis, A., Hornik, K.: Generalized M-fluctuation tests for parameter instability. Stat. Neerl. 61(4), 488–508 (2007)

  • Zeileis, A., Hothorn, T., Hornik, K.: Model-based recursive partitioning. J. Comput. Graph. Stat. 17(2), 492–514 (2008)


Acknowledgements

The authors are also very grateful for the useful suggestions of the two anonymous referees, which led to significant improvements of this article. The remaining errors, of course, should be attributed to the authors alone. This paper also benefits from fruitful discussions with members of the French chair DIALog – Digital Insurance And Long-term risks – under the aegis of the Fondation du Risque, a joint initiative by UCBL and CNP; and with members of the French laboratory SAF (UCBL and ISFA).

Author information

Corresponding author

Correspondence to Christophe Dutang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Computation details on likelihood and deviance

A.1 Proof of the fitted log-likelihood

Using Corollary 3.1 of Brouste et al. (2020) for the j-th partitioning variable and using notations from Table 2, we have

$$\begin{aligned}
\log L\left(\widehat{\theta }_L(s), \widehat{\theta }_R(s), {\varvec{y}}, s\right)
&= \frac{1}{a(\phi )} \sum _{i \in L(j,s)}\left( y_i {\tilde{b}}\left( {\overline{y}}_j^{L}(s)\right) - b\left( {\tilde{b}}\left( {\overline{y}}_j^{L}(s)\right) \right) \right) \\
&\quad + \frac{1}{a(\phi )} \sum _{i \in R(j,s)}\left( y_i {\tilde{b}}\left( {\overline{y}}_j^{R}(s)\right) - b\left( {\tilde{b}}\left( {\overline{y}}_j^{R}(s)\right) \right) \right) + \sum _{i=1}^n c(y_i,\phi ) \\
&= \frac{1}{a(\phi )}\sum _{l \in \left\{ L,R\right\} }\left( {\tilde{b}}\left( {\overline{y}}_j^{l}(s)\right) m_j^l(s)\,{\overline{y}}_j^l(s) - b\left( {\tilde{b}}\left( {\overline{y}}_j^{l}(s)\right) \right) m_j^l(s)\right) + \sum _{i=1}^n c(y_i,\phi ),
\end{aligned}$$
(15)

where \({\tilde{b}}=(b^\prime )^{-1}\) is the inverse of \(b^\prime \).
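
As a purely illustrative check (not the authors' code), the right-hand side of (15), up to the constant \(\sum _{i=1}^n c(y_i,\phi )\), can be evaluated for any exponential family once \(b\) and \({\tilde{b}}=(b')^{-1}\) are supplied; the Poisson and Gaussian instances below are assumptions chosen for concreteness.

```python
import numpy as np

def fitted_loglik_split(y_left, y_right, b, b_tilde, a_phi=1.0):
    """Evaluate Eq. (15) up to the constant sum of c(y_i, phi):
    (1/a(phi)) * sum over {L,R} of m * (b_tilde(ybar)*ybar - b(b_tilde(ybar)))."""
    ll = 0.0
    for y_side in (y_left, y_right):
        m, ybar = len(y_side), float(np.mean(y_side))
        theta = b_tilde(ybar)            # closed-form MLE of the canonical parameter
        ll += m * (theta * ybar - b(theta))
    return ll / a_phi

# Poisson: b(theta) = exp(theta), so b_tilde(mu) = log(mu)
poisson = dict(b=np.exp, b_tilde=np.log)
# Gaussian: b(theta) = theta^2 / 2, so b_tilde(mu) = mu
gaussian = dict(b=lambda t: t ** 2 / 2, b_tilde=lambda mu: mu)

rng = np.random.default_rng(1)
y_left, y_right = rng.poisson(2.0, 100), rng.poisson(5.0, 100)
print(fitted_loglik_split(y_left, y_right, **poisson))
print(fitted_loglik_split(y_left, y_right, **gaussian))  # same data, other family, for illustration
```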

A.2 Proof of the fitted deviance

Let us denote the fitted mean by \({\hat{\mu }}_i=g^{-1}(\langle z_i,\widehat{{\varvec{\theta }}}{(s)}\rangle )={\overline{y}}_n^{(j)}\). The deviance is given by

$$\begin{aligned}
D(\widehat{{\varvec{\mu }}},{\varvec{y}}) &= -2\log L(\widehat{{\varvec{\mu }}},\phi ,{\varvec{y}}) + 2\log L({\varvec{y}},\phi ,{\varvec{y}}) \\
&= \frac{-2}{a(\phi )}\sum _{j=1}^2 {\tilde{b}}\left( {\overline{y}}_n^{(j)}\right) {\overline{y}}_n^{(j)} m_j + \frac{2}{a(\phi )}\sum _{j=1}^2 b\left( {\tilde{b}}\left( {\overline{y}}_n^{(j)}\right) \right) m_j \\
&\quad + \frac{2}{a(\phi )}\sum _{i=1}^n {\tilde{b}}\left( y_i\right) y_i - \frac{2}{a(\phi )}\sum _{i=1}^n b\left( {\tilde{b}}\left( y_i\right) \right) .
\end{aligned}$$
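
For instance, in the Poisson case (\(a(\phi )=1\), \({\tilde{b}}=\log \), \(b=\exp \)) the expression above reduces to the familiar Poisson deviance \(2\sum _i [y_i\log (y_i/{\hat{\mu }}_i) - (y_i - {\hat{\mu }}_i)]\). A small numerical check of this reduction (an illustration only, on simulated data):

```python
import numpy as np

def poisson_part(y, mu):
    """sum_i ( y_i*log(mu_i) - mu_i ), with the convention 0*log(0) = 0."""
    pos = y > 0
    return np.sum(y[pos] * np.log(mu[pos])) - np.sum(mu)

rng = np.random.default_rng(2)
g1 = rng.poisson(2.0, 50).astype(float)
g2 = rng.poisson(6.0, 50).astype(float)
y = np.concatenate([g1, g2])
mu_hat = np.concatenate([np.full(g.size, g.mean()) for g in (g1, g2)])

# Deviance as above: -2*logL(mu_hat) + 2*logL(y); the c(y_i, phi) terms cancel.
D_formula = -2 * poisson_part(y, mu_hat) + 2 * poisson_part(y, y)
# Familiar Poisson deviance: 2 * sum( y*log(y/mu_hat) - (y - mu_hat) ).
pos = y > 0
D_direct = 2 * (np.sum(y[pos] * np.log(y[pos] / mu_hat[pos])) - np.sum(y - mu_hat))
print(D_formula, D_direct)   # equal up to floating-point error
```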

A.3 Proof of the fitted log-likelihood with weights

We adapt the proof of Brouste et al. (2020) in order to take into account weights when maximizing the log-likelihood. For a known weight \(w_i\), their Equations (4) and (5) become

$$\begin{aligned}&\log L({\varvec{\theta }}\,|\,{\varvec{y}}) = \sum _{i=1}^n w_i \frac{y_i\ell (\eta _i) - b\left( \ell (\eta _i)\right) }{a(\phi )} + \sum _{i=1}^nw_ic(y_i,\phi ), \\&S_j({\varvec{\theta }}) = \frac{1}{a(\phi )}\sum _{i=1}^n w_i x_i^{(j)}\ell '(\eta _i)\left( y_i - b'\left( \ell (\eta _i)\right) \right) . \end{aligned}$$

We adapt their proof of Theorem 3.1. The system \(S({\varvec{\theta }})=0\) is

$$\begin{aligned} \left\{ \begin{array}{ll}\displaystyle \sum _{i=1}^n\ell '(\eta _i)w_i\left( y_i - b'\circ \ell (\eta _i)\right) = 0\\ \displaystyle \sum _{i=1}^nx_i^{(2),j}\ell '(\eta _i)w_i\left( y_i - b'\circ \ell (\eta _i)\right) = 0,\quad \forall j\in J. \end{array}\right. \end{aligned}$$

Since the first equation is redundant, the second set of equations simplifies, for all \(j\in J\), to

$$\begin{aligned}&\ell '(\theta _{(1)}+\theta _{(2),j})\sum \limits _{i=1}^nx_i^{(2),j}w_iy_i \\&- \ell '(\theta _{(1)}+\theta _{(2),j})b'\circ \ell (\theta _{(1)}+\theta _{(2),j})\sum \limits _{i=1}^nx_i^{(2),j}w_i = 0. \end{aligned}$$

Let us introduce the weighted frequencies \( m_w^{(j)} = \sum _{i=1}^nx_i^{(2),j}w_i\) and the weighted averages \({\overline{y}}_w^{(j)}= \frac{1}{m_w^{(j)}}\sum _{i=1}^nx_i^{(2),j}w_iy_i\). Hence, if \(y_i\) takes values in \(\mathbb {Y}\subset b'(\Lambda )\) and \(\ell \) is injective, we have for all \(j\in J\)

$$\begin{aligned} {\overline{y}}_w^{(j)}=b'\circ \ell (\theta _{(1)}+\theta _{(2),j}) \Leftrightarrow \theta _{(1)}+\theta _{(2),j} = g(\overline{y}_w^{(j)}). \end{aligned}$$

The rest of the proof is identical, except that \(\overline{y}_n^{(j)}\) is replaced by \(\overline{y}_w^{(j)}\). Hence, a slight modification of their Theorem 3.1 gives, for \(j\in J\),

$$\begin{aligned}
\widehat{{\varvec{\theta }}}_{n,(1)} = \frac{\sum \nolimits _{k=1}^d r_{k}\, g(\overline{y}_w^{(k)})}{\sum \nolimits _{k=1}^d r_{k} - r_{0}}, \qquad
\widehat{{\varvec{\theta }}}_{n,(2),j} = g(\overline{y}_w^{(j)}) - \frac{\sum \nolimits _{k=1}^d r_{k}\, g(\overline{y}_w^{(k)})}{\sum \nolimits _{k=1}^d r_{k} - r_{0}}.
\end{aligned}$$
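
A small numerical sketch (illustrative, not the authors' code) of the weighted quantities for a single categorical regressor: the weighted frequencies \(m_w^{(j)}\) and weighted averages \({\overline{y}}_w^{(j)}\) are all that is needed, and the fitted linear predictor of level \(j\) is \(g({\overline{y}}_w^{(j)})\). The Poisson family with canonical link \(g=\log \) and the function names are assumptions made for concreteness.

```python
import numpy as np

def weighted_level_fit(levels, y, w, g=np.log):
    """Weighted frequencies m_w^(j), weighted means ybar_w^(j), and the fitted
    linear predictor g(ybar_w^(j)) for each level j of a categorical regressor."""
    out = {}
    for j in np.unique(levels):
        mask = levels == j
        m_w = np.sum(w[mask])                       # m_w^(j) = sum_i x_i^(j) w_i
        ybar_w = np.sum(w[mask] * y[mask]) / m_w    # weighted level mean
        out[j] = (m_w, ybar_w, g(ybar_w))           # theta_(1) + theta_(2),j = g(ybar_w^(j))
    return out

rng = np.random.default_rng(3)
levels = rng.choice(["a", "b", "c"], size=200)
y = rng.poisson(np.where(levels == "a", 1.5, 4.0)).astype(float)
w = rng.uniform(0.5, 2.0, size=200)                 # case weights
for j, (m_w, ybar_w, eta) in weighted_level_fit(levels, y, w).items():
    print(j, round(m_w, 2), round(ybar_w, 3), round(eta, 3))
```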

For the two-variable case, we can also adapt their Theorem 3.2 by changing absolute frequencies and averages to weighted frequencies and averages as below.

| Frequency | Mean | Index |
| --- | --- | --- |
| \(m_k^{(2)} = \sum \limits _{i=1}^n x_i^{(2),k}w_i\) | \({\overline{y}}_w^{(2),k} =\frac{1}{m_k^{(2)}} \sum \limits _{i=1}^n y_i x_i^{(2),k}w_i\) | \(k\in K\) |
| \(m_l^{(3)} = \sum \limits _{i=1}^n x_i^{(3),l}w_i\) | \({\overline{y}}_w^{(3),l} =\frac{1}{m_l^{(3)}} \sum \limits _{i=1}^n y_i x_i^{(3),l}w_i\) | \(l\in L\) |
| \(m_{k,l} = \sum \limits _{i=1}^n x_i^{(k,l)}w_i\) | \({\overline{y}}_w^{(k,l)}=\frac{1}{m_{k,l}}\sum \limits _{i=1}^{n} y_i x_i^{(k,l)}w_i\) | \((k,l)\in K\times L\) |

A.4 Proof of the log-likelihood of a transformed variable

For \(t(x) = \log (d_1 x+d_2)\), with \(t'(x) = \frac{d_1}{d_1x+d_2}\) and \(t^{-1}(x)=(e^x-d_2)/d_1\), the log-likelihood of \(y\) such that \(t(y)\) follows (1) is given by

$$\begin{aligned} \log f_Y(y) = \frac{\lambda t(y) - b(\lambda )}{a(\phi )} + c(y,\phi )+ \log \left( \frac{d_1}{d_1y+d_2}\right) . \end{aligned}$$

Therefore, the fitted log-likelihood based on (15) is

$$\begin{aligned}
\log L_j\left(\widehat{\theta }_L(s), \widehat{\theta }_R(s), {\varvec{y}}, s\right)
&= \frac{1}{a(\phi )} \sum _{l\in \{L, R\}} {\tilde{b}}\left( {\overline{t}}_{j,w}^{l}(s)\right) m_{j,w}^l(s)\, {\overline{t}}_{j,w}^{l}(s) \\
&\quad - \frac{1}{a(\phi )} \sum _{l\in \{L, R\}} b\left( {\tilde{b}}\left( {\overline{t}}_{j,w}^{l}(s)\right) \right) m_{j,w}^l(s) + \sum _{i=1}^n{\tilde{c}}(y_i,\phi ),
\end{aligned}$$
(16)

where \(t_{j,w}^{l}(s)\) are the corresponding observations of the random variables \(T_{j,w}^{l}(s)\) defined in Table 4.
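
As a concrete (assumed) instance, take \(d_1=1\), \(d_2=0\) and a Gaussian base family, so that \(t(y)=\log y\) is normally distributed, i.e. a log-normal response. The sketch below writes the per-observation log-density as the Gaussian exponential-family terms evaluated at \(t(y)\) plus the Jacobian term \(\log (d_1/(d_1 y+d_2))\); it is an illustration only, and the function name is hypothetical.

```python
import numpy as np

def lognormal_loglik(y, mu, sigma2, d1=1.0, d2=0.0):
    """Log-likelihood of y when t(y) = log(d1*y + d2) is N(mu, sigma2):
    Gaussian exponential-family terms evaluated at t(y), plus the Jacobian
    term log( d1 / (d1*y + d2) )."""
    t = np.log(d1 * y + d2)
    lam = mu                                    # canonical parameter, b(lam) = lam^2 / 2
    exp_family = (lam * t - lam**2 / 2) / sigma2 \
        - t**2 / (2 * sigma2) - 0.5 * np.log(2 * np.pi * sigma2)
    jacobian = np.log(d1 / (d1 * y + d2))
    return float(np.sum(exp_family + jacobian))

rng = np.random.default_rng(4)
y = np.exp(rng.normal(0.5, 1.0, size=1000))     # log-normal sample
# the MLE of mu is the sample mean of t(y) = log(y), as in the closed-form fit
print(lognormal_loglik(y, mu=np.log(y).mean(), sigma2=1.0))
```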

Appendix B Additional materials for numerical illustrations

B.1 Simon Wood's test functions

We define smooth functions, following Simon Wood's test datasets (Wood 2011), as

$$\begin{aligned}
&f_0(x)=5\sin (2\pi x),\quad f_1(x)=\exp (3x)-7, \\
&f_2(x)=0.5\, x^{11}(10(1 - x))^6 - 10 (10x)^3(1 - x)^{10}, \\
&f_3(x)=15 \exp (-5 |x-1/2|)-6, \\
&f_4(x)=2-1_{(x\le 1/3)}(6x)^3 - 1_{(x\ge 2/3)} (6-6x)^3 - 1_{(1/3< x< 2/3)}\left(8+2\sin (9(x-1/3)\pi )\right), \\
&f_5(x) =\lfloor 20x\rfloor -10,\quad f_6(x) =10-\lceil 20x\rceil , \\
&f_7(x) =\sin (50 x)+10 x-10,\quad f_8(x) =8+2\cos (50 x)-50 x (1-x), \\
&f_9(x) =\lceil 50 x (1-x)\rceil -5,\quad f_{10}(x) =5\log (x+10^{-6})+5, \\
&f_{11}(x) =-10-5\log (x+10^{-6})+\sin (50x), \\
&f_{12}(x) =2\log (x+10^{-6})-2\log (1-x+10^{-6}),\quad f_{13}(x) =10|\sin (20 x)|, \\
&f_{14}(x) =1_{(x \le 1/2)}\times 5\sin (20 x) + 1_{(x > 1/2)}\times \left(5\sin (10) + \exp (5 (x-0.5))-1\right).
\end{aligned}$$
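
For convenience, here is a direct transcription of the first few of these test functions into numpy (a sketch only; the names f0 to f4 simply follow the formulas above):

```python
import numpy as np

def f0(x): return 5 * np.sin(2 * np.pi * x)
def f1(x): return np.exp(3 * x) - 7
def f2(x): return 0.5 * x**11 * (10 * (1 - x))**6 - 10 * (10 * x)**3 * (1 - x)**10
def f3(x): return 15 * np.exp(-5 * np.abs(x - 0.5)) - 6

def f4(x):
    """Piecewise definition: 2 minus a region-specific term on [0,1/3], [2/3,1] and (1/3,2/3)."""
    x = np.asarray(x, dtype=float)
    piece = np.where(x <= 1 / 3, (6 * x) ** 3, 0.0)
    piece = np.where(x >= 2 / 3, (6 - 6 * x) ** 3, piece)
    mid = (1 / 3 < x) & (x < 2 / 3)
    piece = np.where(mid, 8 + 2 * np.sin(9 * (x - 1 / 3) * np.pi), piece)
    return 2 - piece

x = np.linspace(0, 1, 101)
print(f0(x[:3]), f4(np.array([0.2, 0.5, 0.9])))
```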

B.2 Runtime comparison between GLM tree and explicit GLM tree

See Fig. 6.

Fig. 6 Mean runtime ratio of GLM tree over explicit GLM tree

About this article

Cite this article

Dutang, C., Guibert, Q. An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests. Stat Comput 32, 6 (2022). https://doi.org/10.1007/s11222-021-10059-x
