
Penalized robust estimators in sparse logistic regression

  • Original Paper
  • Published in TEST

Abstract

Sparse covariates are frequent in classification and regression problems where variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where only a small number of parameters are nonzero, and for that reason they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model, and our aim is to address robust and penalized estimation of the regression parameter. We introduce a family of penalized weighted M-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions, including the so-called Sign penalty. We provide a careful analysis of the estimators' convergence rates as well as their variable selection capability and asymptotic distribution for fixed and random penalties. A robust cross-validation criterion is also proposed. Through a numerical study, we compare the finite-sample performance of the classical and robust penalized estimators under different contamination scenarios. The analysis of real datasets allows us to investigate the stability of the penalized estimators in the presence of outliers.
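
As a schematic guide for the reader, estimators of the kind described in the abstract fit the standard penalized weighted M-estimation framework for logistic regression. The display below is only a sketch of that generic form: the symbols \(\rho\), \(w\) and \(I_{\lambda_n}\) are placeholder notation, and the specific loss, weights and penalties (including the Sign penalty) are the ones defined in the paper itself.

\[
\widehat{\boldsymbol{\beta}}_n \;=\; \mathop{\mathrm{arg\,min}}_{\boldsymbol{\beta}\in\mathbb{R}^{p}}\;
\frac{1}{n}\sum_{i=1}^{n} \rho\bigl(y_i,\mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)\, w(\mathbf{x}_i)
\;+\; I_{\lambda_n}(\boldsymbol{\beta}),
\]

where \(\rho\) is a bounded loss that limits the influence of badly classified observations, \(w(\cdot)\) downweights high-leverage covariates, and \(I_{\lambda_n}\) is a penalty (for instance an \(\ell_1\)-type or Sign penalty) whose tuning parameter \(\lambda_n\) may be chosen by a robust cross-validation criterion such as the one proposed in the paper.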



Funding

This research was partially supported by Grant 20020170100022BA from the Universidad de Buenos Aires and PICT 2018-00740 from ANPCyT, Buenos Aires, Argentina, and also by the Spanish Project MTM2016-76969P from the Ministry of Economy, Industry and Competitiveness (MINECO/AEI/FEDER, UE) (Ana Bianco and Graciela Boente).

Author information

Corresponding author

Correspondence to Graciela Boente.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information


The supplementary material (available online, 704 KB PDF) contains the proofs of the main results and describes the algorithm used to compute the estimators. It also includes some additional numerical experiments and presents the analysis of a dataset related to tomography images.


About this article


Cite this article

Bianco, A.M., Boente, G. & Chebi, G. Penalized robust estimators in sparse logistic regression. TEST 31, 563–594 (2022). https://doi.org/10.1007/s11749-021-00792-w

