Abstract
We propose a statistical inference framework for the component-wise functional gradient descent algorithm (CFGD), also known as \(L_2\)-Boosting, under a normality assumption for the model errors. The CFGD is one of the most versatile tools for data analysis, because it scales well to high-dimensional data sets, allows for a very flexible definition of additive regression models, and has built-in variable selection. To account for this variable selection, we build on recent proposals for post-selection inference. However, the iterative nature of component-wise boosting, which can repeatedly select the same component for an update, necessitates adaptations of and extensions to existing approaches. We propose tests and confidence intervals for linear, grouped, and penalized additive model components selected by \(L_2\)-Boosting. Our concepts also transfer to slow-learning algorithms more generally, and to other selection techniques that restrict the response space to more complex sets than polyhedra. We apply our framework to an additive model for the sales prices of residential apartments and investigate the properties of our concepts in simulation studies.
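To make the iterative selection mechanism described in the abstract concrete, the following is a minimal sketch of component-wise \(L_2\)-Boosting with simple linear base-learners, written in Python/NumPy purely for illustration. The function name l2_boost, the step length nu, the iteration count, and the assumption of centered covariates are ours, not the paper's; full implementations with spline, grouped, and penalized base-learners are available, e.g., in the R package mboost.

```python
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise L2-Boosting with simple linear base-learners.

    Each iteration fits every covariate separately to the current
    residuals (the negative gradient of the L2 loss) and updates only
    the best-fitting component by a small step length nu, so the same
    component can be selected and updated repeatedly.
    Assumes the columns of X are centered.
    """
    n, p = X.shape
    coef = np.zeros(p)
    fit = np.full(n, y.mean())        # f_0: offset model (mean of y)
    path = []                         # selection path; entries may repeat
    for _ in range(n_steps):
        u = y - fit                   # current residuals
        # least-squares slope of each covariate regressed on u
        slopes = X.T @ u / (X ** 2).sum(axis=0)
        rss = ((u[:, None] - X * slopes) ** 2).sum(axis=0)
        j = int(np.argmin(rss))       # best-fitting base-learner
        coef[j] += nu * slopes[j]     # weak (shrunken) update
        fit += nu * slopes[j] * X[:, j]
        path.append(j)
    return y.mean(), coef, path
```

Because each update adds only a fraction nu of the selected base-learner's fit, many iterations are needed and the recorded selection path typically contains repeated entries; this repetition is the feature that, per the abstract, necessitates adaptations of existing post-selection inference approaches, which were developed for one-shot selection events with polyhedral selection regions.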
Cite this article
Rügamer, D., Greven, S. Inference for \(L_2\)-Boosting. Stat Comput 30, 279–289 (2020). https://doi.org/10.1007/s11222-019-09882-0