Abstract
To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the subsequent estimation. However, a large residual may be caused by an outlier in a single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator designed specifically for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of losing the regression equivariance property.
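The coordinate descent ("shooting") idea mentioned above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a model without intercept, and the names `shooting`, `ls_slope` and `simple_fit` are hypothetical. Each coefficient is updated in turn by a simple regression of the partial residuals on one predictor. The default simple regression here is plain least squares; a shooting S-type estimator would plug a robust simple S-regression into `simple_fit` instead, which this sketch does not implement.

```python
from typing import Callable, List, Sequence


def ls_slope(x: Sequence[float], r: Sequence[float]) -> float:
    """Least squares slope (no intercept) for regressing r on x.

    Stand-in for the simple regression step; a robust variant would
    replace this function (assumption, not the paper's code).
    """
    return sum(xi * ri for xi, ri in zip(x, r)) / sum(xi * xi for xi in x)


def shooting(
    columns: Sequence[Sequence[float]],
    y: Sequence[float],
    simple_fit: Callable[[Sequence[float], Sequence[float]], float] = ls_slope,
    sweeps: int = 200,
    tol: float = 1e-10,
) -> List[float]:
    """Cyclic coordinate descent over the predictors given as columns."""
    beta = [0.0] * len(columns)
    residuals = list(y)  # equals y - X beta for the starting value beta = 0
    for _ in range(sweeps):
        largest_change = 0.0
        for j, xj in enumerate(columns):
            # Partial residuals: add back the current contribution of predictor j.
            partial = [r + xi * beta[j] for r, xi in zip(residuals, xj)]
            new_bj = simple_fit(xj, partial)
            # Update the full residuals with the new coefficient.
            residuals = [pr - xi * new_bj for pr, xi in zip(partial, xj)]
            largest_change = max(largest_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if largest_change < tol:
            break  # no coefficient moved noticeably during this sweep
    return beta


# Toy data generated exactly from y = 2*x1 - 3*x2 (illustrative values only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 1.0, 0.0, 2.0]
y = [2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
beta_hat = shooting([x1, x2], y)
```

With the least squares default, the loop converges to the ordinary least squares solution on this toy data. The design point is that only one predictor enters each update, so a robust `simple_fit` could discount an outlying cell in that predictor without discarding the whole observation.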
Notes
The signal-to-noise ratio equals \(\frac{\sqrt{{{\varvec{\beta }}}'\Sigma {{\varvec{\beta }}}}}{\sigma }\).
The expected value of the number of contaminated rows is \(n(1-(1-\epsilon )^p)\) for a cellwise contamination level \(\epsilon \).
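The two formulas in these notes can be evaluated with a short sketch. The function names are hypothetical, and the row formula is read here under the assumption that cells are contaminated independently with probability \(\epsilon \), so that a row is clean with probability \((1-\epsilon )^p\).

```python
import math
from typing import Sequence


def signal_to_noise(
    beta: Sequence[float], Sigma: Sequence[Sequence[float]], sigma: float
) -> float:
    """sqrt(beta' Sigma beta) / sigma, as in the first note."""
    p = len(beta)
    quad = sum(beta[i] * Sigma[i][j] * beta[j] for i in range(p) for j in range(p))
    return math.sqrt(quad) / sigma


def expected_contaminated_rows(n: int, p: int, eps: float) -> float:
    """n * (1 - (1 - eps)^p): expected number of rows with at least one
    contaminated cell, assuming independent cellwise contamination."""
    return n * (1.0 - (1.0 - eps) ** p)


# Illustrative values only: 100 observations, 10 predictors, 5% of cells contaminated.
rows = expected_contaminated_rows(n=100, p=10, eps=0.05)
snr = signal_to_noise([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], sigma=1.0)
```

For these illustrative values, roughly 40 of the 100 rows are expected to contain at least one contaminated cell even though only 5% of the cells are contaminated, which is the situation in which downweighting whole rows discards a lot of clean information.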
Acknowledgments
We gratefully acknowledge support from the GOA/12/014 Project of the Research Fund KU Leuven. We thank the referees for their constructive comments, in particular the third anonymous referee, who corrected some flaws in the first version of the paper and made many suggestions for improving its write-up.
Cite this article
Öllerer, V., Alfons, A. & Croux, C. The shooting S-estimator for robust regression. Comput Stat 31, 829–844 (2016). https://doi.org/10.1007/s00180-015-0593-7
DOI: https://doi.org/10.1007/s00180-015-0593-7