Abstract
This paper considers simultaneous parameter estimation and variable selection and presents a new penalized regression method. The method is based on shrinking the coefficient estimates towards a predetermined coefficient vector that represents prior information. Depending on the prior information, the method can produce coefficient estimates of smaller length than the elastic net. We also show that the new method possesses the grouping effect, so that highly correlated predictors receive similar coefficient estimates. Simulation studies and a real data example show that the new method improves on the prediction performance of the well-known ridge, lasso and elastic net regression methods, yielding a lower mean squared error, and that it is competitive in variable selection under both sparse and non-sparse settings.
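As an illustration only (not the authors' implementation), shrinkage towards a prior vector can be sketched by noting that, if the penalty takes an elastic-net form applied to \(\varvec{\beta }-{\mathbf {b}}\), then with \(\varvec{\gamma }=\varvec{\beta }-{\mathbf {b}}\) the problem reduces to a standard elastic net fit on the residualized response \({\mathbf {y}}-{\mathbf {X}}{\mathbf {b}}\). The penalty form, the tuning-parameter scaling, the function name `go_type_fit`, and the use of scikit-learn's `ElasticNet` are assumptions made for this sketch.

```python
# Minimal sketch (not the authors' code): a GO-type fit that shrinks the
# coefficients towards a prior vector b, assuming an elastic-net-style penalty
# on (beta - b). With gamma = beta - b the objective becomes a standard
# elastic net on the residualized response y - X b, so an off-the-shelf solver
# can be reused. The penalty scaling follows scikit-learn's convention, which
# may differ from the paper's parameterization.
import numpy as np
from sklearn.linear_model import ElasticNet

def go_type_fit(X, y, b, lam=1.0, alpha=0.5):
    """Return an estimate shrunk towards the prior vector b (illustrative only)."""
    y_res = y - X @ b                      # residualize the response
    enet = ElasticNet(alpha=lam, l1_ratio=alpha, fit_intercept=False)
    enet.fit(X, y_res)                     # solve the standard elastic net in gamma
    return b + enet.coef_                  # beta_hat = b + gamma_hat

# Example: the prior pulls the estimate towards b instead of towards zero.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.r_[np.ones(5), np.zeros(5)]
y = X @ beta_true + 0.1 * rng.standard_normal(50)
b_prior = np.full(10, 0.5)                 # hypothetical prior information
print(go_type_fit(X, y, b_prior, lam=0.1, alpha=0.5))
```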
Appendix
1.1 Proof of Theorem 1
Let
We write the sub-gradients of this function with respect to \(\beta _{i}\) and \(\beta _{j}\) and set them equal to zero:
where \({\hat{s}}_{i}\) and \({\hat{s}}_{j}\) are the sub-gradients of the absolute value function of \(\beta _{i}\) and \(\beta _{j}\).
Subtracting Eq. (12) from Eq. (13) and applying the Cauchy–Schwarz inequality, we get
where \(\hat{{\mathbf {r}}}={\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\). Since \(\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}=2\left( 1-\rho \right) \), we obtain
Furthermore, \(Q\left( \hat{\varvec{\beta }};{\mathbf {b}},\lambda ,\alpha \right) \le Q\left( {\mathbf {0}};{\mathbf {b}},\lambda ,\alpha \right) \) holds because \(\hat{\varvec{\beta }}\) is the minimizer of Q. Hence, we write
which implies that
If we consider Eqs. (15) and (16) together, then
which completes the proof.\(\square \)
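For the reader's convenience, the argument can be sketched end to end as follows. The sketch assumes that the GO objective takes the elastic-net form with prior vector \({\mathbf {b}}\) written below and that \(\mathrm {sign}({\hat{\beta }}_{i}-b_{i})=\mathrm {sign}({\hat{\beta }}_{j}-b_{j})\), so that \({\hat{s}}_{i}={\hat{s}}_{j}\); the penalty parameterization and the exact statement of Theorem 1 may differ in the paper.
\[
Q\left( \varvec{\beta };{\mathbf {b}},\lambda ,\alpha \right) =\left\| {\mathbf {y}}-{\mathbf {X}}\varvec{\beta }\right\| _{2}^{2}+\lambda \alpha \left\| \varvec{\beta }-{\mathbf {b}}\right\| _{1}+\lambda \left( 1-\alpha \right) \left\| \varvec{\beta }-{\mathbf {b}}\right\| _{2}^{2},
\]
whose stationarity conditions at \(\hat{\varvec{\beta }}\) read
\[
-2{\mathbf {x}}_{k}^{\top }\hat{{\mathbf {r}}}+\lambda \alpha {\hat{s}}_{k}+2\lambda \left( 1-\alpha \right) \left( {\hat{\beta }}_{k}-b_{k}\right) =0,\qquad k\in \{i,j\}.
\]
Subtracting the two conditions, using \({\hat{s}}_{i}={\hat{s}}_{j}\) and the Cauchy–Schwarz inequality gives
\[
2\lambda \left( 1-\alpha \right) \left| \left( {\hat{\beta }}_{i}-b_{i}\right) -\left( {\hat{\beta }}_{j}-b_{j}\right) \right| =2\left| \left( {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right) ^{\top }\hat{{\mathbf {r}}}\right| \le 2\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}\left\| \hat{{\mathbf {r}}}\right\| _{2},
\]
and since \(\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}=2\left( 1-\rho \right) \) and \(\left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}\le Q\left( \hat{\varvec{\beta }};{\mathbf {b}},\lambda ,\alpha \right) \le Q\left( {\mathbf {0}};{\mathbf {b}},\lambda ,\alpha \right) \), it follows that
\[
\left| \left( {\hat{\beta }}_{i}-b_{i}\right) -\left( {\hat{\beta }}_{j}-b_{j}\right) \right| \le \frac{\sqrt{2\left( 1-\rho \right) }\,\sqrt{Q\left( {\mathbf {0}};{\mathbf {b}},\lambda ,\alpha \right) }}{\lambda \left( 1-\alpha \right) }.
\]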
Cite this article
Genç, M., Özkale, M.R. Usage of the GO estimator in high dimensional linear models. Comput Stat 36, 217–239 (2021). https://doi.org/10.1007/s00180-020-01001-2