Usage of the GO estimator in high dimensional linear models

Original paper · Computational Statistics

Abstract

This paper considers simultaneous parameter estimation and variable selection and presents a new penalized regression method. The method is based on the idea that the coefficient estimates are shrunken towards a predetermined coefficient vector that represents prior information. Depending on the prior information, the method can produce coefficient estimates of smaller length than the elastic net. We also establish that the new method has the grouping effect when the predictors are highly correlated. Simulation studies and a real-data example show that the new method improves on the prediction performance of the well-known ridge, lasso and elastic net regressions, yielding a lower mean squared error, and is competitive at variable selection under both sparse and non-sparse settings.
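
For reference, the penalized criterion that defines the GO estimator (stated formally in the Appendix) is

$$\begin{aligned} \hat{\varvec{\beta }}=\mathop {\mathrm {arg\,min}}\limits _{\varvec{\beta }}\frac{1}{2n}\left\| {\mathbf {y}}-{\mathbf {X}}\varvec{\beta }\right\| _{2}^{2}+\lambda \left( \alpha \left\| \varvec{\beta }\right\| _{1}+\frac{1-\alpha }{2}\left\| \varvec{\beta }-{\mathbf {b}}\right\| _{2}^{2}\right) , \end{aligned}$$

where \({\mathbf {b}}\) is the prior coefficient vector towards which the estimates are shrunken. Setting \({\mathbf {b}}={\mathbf {0}}\) recovers the elastic net penalty, which in turn nests ridge (\(\alpha =0\)) and lasso (\(\alpha =1\)) as special cases.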


Notes

  1. This method was originally named the naive elastic net by Zou and Hastie (2005). The authors use a scaled version of the method and call it the elastic net. We follow Friedman et al. (2010), who drop this distinction.
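
For context, the rescaling in question is the one introduced by Zou and Hastie (2005): writing \(\lambda _{2}\) for the ridge-penalty weight, the elastic net estimator is obtained from the naive one as

$$\begin{aligned} \hat{\varvec{\beta }}_{\mathrm {enet}}=\left( 1+\lambda _{2}\right) \hat{\varvec{\beta }}_{\mathrm {naive}}, \end{aligned}$$

which undoes the extra shrinkage introduced by the ridge component; Friedman et al. (2010) work directly with the unscaled solution.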

References

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122

  • Bühlmann P, Kalisch M, Meier L (2014) High-dimensional statistics with a view toward applications in biology. Ann Rev Stat Appl 1(1):255–278

  • Donoho DL, Johnstone IM (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425–455

  • Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499

  • Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332

  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin

  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical learning with sparsity: the lasso and generalizations. CRC Press, Boca Raton

  • Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1):55–67

  • Javanmard A, Montanari A (2014) Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res 15(1):2869–2909

  • Özkale MR, Kaçıranlar S (2007) The restricted and unrestricted two-parameter estimators. Commun Stat Theory Methods 36(15):2707–2725

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodological) 58(1):267–288

  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B (Stat Methodol) 67(1):91–108

  • Wang Y, Jiang Y, Zhang J, Chen Z, Xie B, Zhao C (2019) Robust variable selection based on the random quantile lasso. Commun Stat Simul Comput, pp 1–11

  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68(1):49–67

  • Zhang C, Wu Y, Zhu M (2019) Pruning variable selection ensembles. Stat Anal Data Min ASA Data Sci J 12(3):168–184

  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67(2):301–320

Author information

Corresponding author

Correspondence to Murat Genç.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

Let

$$\begin{aligned} Q\left( \varvec{\beta };{\mathbf {b}},\lambda ,\alpha \right) =\frac{1}{2n}\left\| {\mathbf {y}}-{\mathbf {X}}\varvec{\beta }\right\| _{2}^{2}+\lambda \left( \alpha \left\| \varvec{\beta }\right\| _{1}+\frac{1-\alpha }{2}\left\| \varvec{\beta }-{\mathbf {b}}\right\| _{2}^{2}\right) . \end{aligned}$$

We take the sub-gradients of this function with respect to \(\beta _{i}\) and \(\beta _{j}\) and set them equal to zero at the minimizer \(\hat{\varvec{\beta }}\):

$$\begin{aligned} \frac{\partial Q}{\partial \beta _{i}}=-\frac{1}{n}{\mathbf {x}}_{i}^{\top }\left( {\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\right) +\lambda \alpha {\hat{s}}_{i}+\lambda \left( 1-\alpha \right) \left( {\hat{\beta }}_{i}-b_{i}\right)&=0 \end{aligned}$$
(12)
$$\begin{aligned} \frac{\partial Q}{\partial \beta _{j}}=-\frac{1}{n}{\mathbf {x}}_{j}^{\top }\left( {\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\right) +\lambda \alpha {\hat{s}}_{j}+\lambda \left( 1-\alpha \right) \left( {\hat{\beta }}_{j}-b_{j}\right)&=0, \end{aligned}$$
(13)

where \({\hat{s}}_{i}\) and \({\hat{s}}_{j}\) are sub-gradients of the absolute value function at \({\hat{\beta }}_{i}\) and \({\hat{\beta }}_{j}\), respectively.

Subtracting Eq. (12) from Eq. (13), noting that the sub-gradient terms cancel since \({\hat{s}}_{i}={\hat{s}}_{j}\) when \({\hat{\beta }}_{i}\) and \({\hat{\beta }}_{j}\) share the same sign, and applying the Cauchy–Schwarz inequality, we get

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right| \le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}\left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}}, \end{aligned}$$
(14)

where \(\hat{{\mathbf {r}}}={\mathbf {y}}-{\mathbf {X}}\hat{\varvec{\beta }}\). Since \(\left\| {\mathbf {x}}_{i}-{\mathbf {x}}_{j}\right\| _{2}^{2}=2\left( 1-\rho \right) \), we obtain

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right| \le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{2\left( 1-\rho \right) \left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}}. \end{aligned}$$
(15)

Furthermore, \(Q\left( \hat{\varvec{\beta }};{\mathbf {b}},\lambda ,\alpha \right) \le Q\left( {\mathbf {0}};{\mathbf {b}},\lambda ,\alpha \right) \) holds because \(\hat{\varvec{\beta }}\) is the minimizer of Q. Hence, we write

$$\begin{aligned} \frac{1}{2n}\left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}+\lambda \alpha \left\| \hat{\varvec{\beta }}\right\| _{1}+\frac{\lambda \left( 1-\alpha \right) }{2}\left\| \hat{\varvec{\beta }}-{\mathbf {b}}\right\| _{2}^{2}\le \frac{1}{2n}\left\| {\mathbf {y}}\right\| _{2}^{2}+\frac{\lambda \left( 1-\alpha \right) }{2}\left\| {\mathbf {b}}\right\| _{2}^{2} \end{aligned}$$

which implies that

$$\begin{aligned} \left\| \hat{{\mathbf {r}}}\right\| _{2}^{2}\le \left\| {\mathbf {y}}\right\| _{2}^{2}+n\lambda \left( 1-\alpha \right) \left\| {\mathbf {b}}\right\| _{2}^{2}. \end{aligned}$$
(16)

If we consider Eqs. (15) and (16) together, then

$$\begin{aligned} \left| {\hat{\beta }}_{j}-{\hat{\beta }}_{i}-\left( b_{j}-b_{i}\right) \right|&\le \frac{1}{n\lambda \left( 1-\alpha \right) }\sqrt{2\left( 1-\rho \right) }\sqrt{\left\| {\mathbf {y}}\right\| _{2}^{2}+n\lambda \left( 1-\alpha \right) \left\| {\mathbf {b}}\right\| _{2}^{2}} \end{aligned}$$

which completes the proof.\(\square \)
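
To make the objective above concrete, here is a minimal numerical sketch in Python. It is an illustration under stated assumptions, not the authors' published code: the names `go_estimator` and `soft_threshold` are ours, predictors are assumed centered, and the coordinate update follows from solving the sub-gradient equations (12)–(13) one coordinate at a time. Setting \({\mathbf {b}}={\mathbf {0}}\) reduces the solver to the (naive) elastic net. The short check at the end evaluates the grouping bound (15) on a pair of highly correlated unit-norm predictors.

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def go_estimator(X, y, b, lam, alpha, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
      (1/2n)||y - X beta||_2^2
        + lam * (alpha * ||beta||_1 + (1 - alpha)/2 * ||beta - b||_2^2).
    Coordinate-wise stationarity (cf. Eqs. (12)-(13)) gives
      beta_j = S((1/n) x_j' r_j + lam*(1-alpha)*b_j, lam*alpha)
               / ((1/n)||x_j||_2^2 + lam*(1-alpha)),
    where r_j is the partial residual excluding predictor j."""
    n, p = X.shape
    beta = np.asarray(b, dtype=float).copy()   # warm start at the prior vector
    r = y - X @ beta                           # full residual
    col_ss = (X ** 2).sum(axis=0) / n          # (1/n)||x_j||_2^2
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            r_j = r + X[:, j] * beta[j]        # partial residual without x_j
            z = X[:, j] @ r_j / n + lam * (1 - alpha) * b[j]
            beta[j] = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            r = r_j - X[:, j] * beta[j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

# --- numerical check of the grouping bound (15) ---
rng = np.random.default_rng(0)
n, rho = 100, 0.95
e1 = rng.standard_normal(n); e1 /= np.linalg.norm(e1)
e2 = rng.standard_normal(n); e2 -= (e2 @ e1) * e1; e2 /= np.linalg.norm(e2)
X = np.column_stack([e1, rho * e1 + np.sqrt(1 - rho**2) * e2])  # unit norms, corr rho
y = X @ np.array([5.0, 5.0]) + 0.01 * rng.standard_normal(n)
b, lam, alpha = np.zeros(2), 0.01, 0.5

beta_hat = go_estimator(X, y, b, lam, alpha)
r_hat = y - X @ beta_hat
lhs = abs(beta_hat[1] - beta_hat[0] - (b[1] - b[0]))
rhs = np.sqrt(2 * (1 - rho) * (r_hat @ r_hat)) / (n * lam * (1 - alpha))
print(lhs, rhs)   # Eq. (15) asserts lhs <= rhs
```

With \({\mathbf {b}}={\mathbf {0}}\) the same routine produces naive elastic net estimates, which offers a convenient sanity check against an existing implementation such as glmnet, up to differences in standardization conventions.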

About this article

Cite this article

Genç, M., Özkale, M.R. Usage of the GO estimator in high dimensional linear models. Comput Stat 36, 217–239 (2021). https://doi.org/10.1007/s00180-020-01001-2
