Abstract
Sparse covariates are frequent in classification and regression problems where the task of variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where there are only a small number of nonzero parameters, and for that reason they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model and our aim is to address robust and penalized estimation of the regression parameter. We introduce a family of penalized weighted M-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions, including the so-called Sign penalty. We provide a careful analysis of the estimators' convergence rates as well as their variable selection capability and asymptotic distribution for fixed and random penalties. A robust cross-validation criterion is also proposed. Through a numerical study, we compare the finite-sample performance of the classical and robust penalized estimators under different contamination scenarios. The analysis of real datasets enables us to investigate the stability of the penalized estimators in the presence of outliers.
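To fix ideas, the kind of estimator the abstract describes combines a weighted loss for the logistic model with a sparsity-inducing penalty. The following is a minimal numpy sketch of that general form, not the estimators studied in the paper: it minimizes a weighted logistic deviance plus an L1 penalty by proximal gradient descent (ISTA). The function names and the choice of unit weights are ours; in a robust variant, the weights would downweight observations with outlying covariates.

```python
import numpy as np

def sigmoid(z):
    """Logistic link."""
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def penalized_weighted_logistic(X, y, weights, lam, step=0.1, n_iter=500):
    """Proximal gradient (ISTA) for a weighted logistic loss + lam * ||beta||_1.

    X: (n, p) design matrix, y: (n,) binary responses in {0, 1},
    weights: (n,) nonnegative observation weights (robust versions
    would downweight atypical points).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Gradient of the weighted negative log-likelihood
        grad = X.T @ (weights * (sigmoid(X @ beta) - y)) / n
        # Gradient step followed by soft-thresholding (sparsity)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

With unit weights this reduces to an ordinary lasso-penalized logistic fit; a sufficiently large `lam` shrinks every coefficient exactly to zero, which is the variable selection mechanism the abstract refers to.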
Funding
This research was partially supported by Grant 20020170100022BA from the Universidad de Buenos Aires and PICT 2018-00740 from ANPCyT, Buenos Aires, Argentina, and also by the Spanish Project MTM2016-76969P from the Ministry of Economy, Industry and Competitiveness (MINECO/AEI/FEDER, UE) (Ana Bianco and Graciela Boente).
Supplementary Information
The supplementary material (available online; PDF, 704 KB) contains the proofs of the main results and describes the algorithm used to compute the estimators. It also includes additional numerical experiments and the analysis of a dataset related to tomography images.
Cite this article
Bianco, A.M., Boente, G. & Chebi, G. Penalized robust estimators in sparse logistic regression. TEST 31, 563–594 (2022). https://doi.org/10.1007/s11749-021-00792-w