Penalized expectile regression: an alternative to penalized quantile regression

Annals of the Institute of Statistical Mathematics

Abstract

This paper concerns the study of the entire conditional distribution of a response given predictors in a heterogeneous regression setting. A common approach to heterogeneous data is quantile regression, which minimizes the asymmetric \(L_1\) norm. As an alternative, we consider expectile regression, which minimizes the asymmetric \(L_2\) norm and detects heteroscedasticity effectively. We assume that only a small set of predictors is relevant to the response and develop penalized expectile regression with SCAD and adaptive LASSO penalties. With properly chosen tuning parameters, we show that the proposed estimators display oracle properties. A numerical study using simulated and real examples demonstrates the competitive performance of the proposed penalized expectile regression and suggests that combining it with penalized quantile regression is helpful, and recommended, for practitioners.
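
To make the abstract's contrast concrete, the two loss functions can be written in a few lines. The sketch below is ours, not code from the paper; expectile_loss matches the asymmetric \(L_2\) loss \(\rho _\tau \) differentiated in the Appendix, and the quantile (check) loss is included for comparison.

```python
import numpy as np

def expectile_loss(u, tau):
    """Asymmetric L2 loss: tau * u^2 for u >= 0 and (1 - tau) * u^2 for u < 0."""
    return np.where(u >= 0, tau, 1.0 - tau) * u ** 2

def quantile_loss(u, tau):
    """Asymmetric L1 (check) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return u * (tau - (u < 0).astype(float))

u = np.linspace(-2.0, 2.0, 5)
print(expectile_loss(u, 0.7))  # smooth and differentiable everywhere
print(quantile_loss(u, 0.7))   # piecewise linear, kinked at zero
```

The differentiability of the expectile loss is what makes the asymmetric \(L_2\) approach computationally and theoretically convenient, while the quantile loss retains a kink at zero.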


References

  • Aigner, D., Amemiya, T., Poirier, D. (1976). On the estimation of production frontiers: Maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review, 17, 377–396.

  • Belloni, A., Chernozhukov, V. (2011). \(\ell _1\)-penalized quantile regression in high-dimensional sparse models. The Annals of Statistics, 39, 82–130.

  • Belloni, A., Chernozhukov, V., Kato, K. (2015). Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika, 102, 77–94.

  • Chatterjee, A., Lahiri, S. N. (2010). Asymptotic properties of the residual bootstrap for Lasso estimators. Proceedings of the American Mathematical Society, 138, 4497–4509.

  • Fan, J., Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Friberg, H. A. (2014). Rmosek: The R-to-MOSEK optimization interface. R package version 1.2.5.1.

  • Geyer, C. J. (1994). On the asymptotics of constrained M-estimation. The Annals of Statistics, 22, 1993–2010.


  • Gu, Y., Zou, H. (2016). High-dimensional generalizations of asymmetric least squares regression and their applications. The Annals of Statistics, 44, 2661–2694.

  • Harrison, D., Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.

  • Javanmard, A., Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research, 15, 2869–2909.

  • Jones, M. C. (1994). Expectiles and M-quantiles are quantiles. Statistics & Probability Letters, 20, 149–153.


  • Kim, Y., Choi, H., Oh, H. S. (2008). Smoothly clipped absolute deviation on high dimensions. Journal of the American Statistical Association, 103, 1665–1673.

  • Knight, K., Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28, 1356–1378.

  • Kocherginsky, M., He, X., Mu, Y. (2005). Practical confidence intervals for regression quantiles. Journal of Computational and Graphical Statistics, 14(1), 41–55.

  • Koenker, R., Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Koenker, R., Mizera, I. (2014). Convex optimization in R. Journal of Statistical Software, 60, 1–23.

  • Kuan, C. M., Yeh, J. H., Hsu, Y. C. (2009). Assessing value at risk with care, the conditional autoregressive expectile models. Journal of Econometrics, 150, 261–270.

  • Li, Y., Zhu, J. (2008). \(L_1\)-norm quantile regression. Journal of Computational and Graphical Statistics, 17, 1–23.

  • Lockhart, R., Taylor, J., Tibshirani, R. J., Tibshirani, R. (2014). A significance test for the Lasso. The Annals of Statistics, 42, 413–468.

  • Minnier, J., Tian, T., Cai, T. (2011). A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association, 106, 1371–1382.

  • MOSEK ApS (2011). The MOSEK optimization tool manual, version 7.0. https://www.mosek.com.

  • Newey, W. K., Powell, J. L. (1987). Asymmetric least squares estimation and testing. Econometrica, 55, 819–847.

  • Schnabel, S. K., Eilers, P. H. C. (2009). Optimal expectile smoothing. Computational Statistics and Data Analysis, 53, 4168–4177.

  • Sobotka, F., Radice, R., Marra, G., Kneib, T. (2013a). Estimating the relationship between women’s education and fertility in Botswana by using an instrumental variable approach to semiparametric expectile regression. Journal of the Royal Statistical Society, Series C, 62, 25–45.

  • Sobotka, F., Radice, R., Marra, G., Kneib, T. (2013b). On confidence intervals for semiparametric expectile regression. Statistics and Computing, 23, 135–148.

  • Wang, L., Wu, Y., Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association, 107, 214–222.

  • Wu, Y., Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19, 801–817.

  • Yuille, A. L., Rangarajan, A. (2003). The concave–convex procedure. Neural Computation, 15, 915–936.

  • Zhang, C. H., Zhang, S. S. (2014). Confidence intervals for low-dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B, 76, 217–242.

  • Ziegel, J. (2014). Coherence and elicitability. Mathematical Finance, 26, 901–918.


  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.


  • Zou, H., Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1533.


Acknowledgements

This work is part of the first author’s dissertation. The third author’s research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2007611).

Corresponding author

Correspondence to Hosik Choi.

Appendix: Proofs of theorems

1.1 Proof of Theorem 1

Following Wu and Liu (2009), it is sufficient to show that for any given \(\delta > 0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \displaystyle \inf _{\Vert \mathbf {u} \Vert = C} R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) > R_{n}({\varvec{\beta }}_0) \right\} \ge 1 - \delta . \end{aligned}$$
(10)

This implies that there exists a local minimizer satisfying \(\Vert \hat{{\varvec{\beta }}} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\). Now consider

$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 - \mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\qquad +\, n \displaystyle \sum _{j = 1}^p \Big ( p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )\Big ). \end{aligned}$$

Because \(p'_{\lambda _n}(\theta ) = \lambda _n \{I(\theta \le \lambda _n ) + \frac{(a\lambda _n - \theta )_{+}}{(a-1)\lambda _n}I(\theta > \lambda _n ) \} \ge 0\) for some \(a > 2\) and \(\theta > 0\), and \(p_{\lambda _n}(0) = 0\),

$$\begin{aligned} n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) = n (p_{\lambda _n}(\vert u_j/\sqrt{n} \vert ) - p_{\lambda _n}(0)) \ge 0 \end{aligned}$$

for \(j = q+1, \ldots , p\). Hence,

$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad \ge \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \nonumber \\&\qquad +\, n\sum _{j = 1}^q \Big (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \nonumber \end{aligned}$$
(11)

We first consider the second term on the right-hand side of (11). For \(j = 1, \ldots , q\),

$$\begin{aligned}&n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) \\&\quad = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{u_j}{\sqrt{n}} + \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{u_j}{\sqrt{n}} \Big )^2 + o \Big (\frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{n} \Big ) \Big )\\&\quad = O \Big ( \sqrt{n} \max _{1 \le j \le q} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) + \max _{1 \le j \le q} p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \end{aligned}$$

For large n,

$$\begin{aligned} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} \lambda _n \Big (I(\vert \beta _{j0} \vert \le \lambda _n ) + \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{(a-1)\lambda _n}I(\vert \beta _{j0} \vert > \lambda _n ) \Big ) \\= & {} \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{a-1} \rightarrow 0 \text{ as } \lambda _n \rightarrow 0, \\ p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} -\frac{1}{a-1} I(\lambda _n< \vert \beta _{j0} \vert < a \lambda _n ) \rightarrow 0 \text{ as } \lambda _n \rightarrow 0. \end{aligned}$$
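For completeness (the proof quotes only the derivative), integrating \(p'_{\lambda }\) with \(p_{\lambda }(0) = 0\) recovers the SCAD penalty of Fan and Li (2001); this standard display is added here for the reader's convenience:

$$\begin{aligned} p_{\lambda }(\theta ) = {\left\{ \begin{array}{ll} \lambda \theta &{}\quad \text{ if } 0 \le \theta \le \lambda , \\ \dfrac{2a\lambda \theta - \theta ^2 - \lambda ^2}{2(a-1)} &{}\quad \text{ if } \lambda< \theta \le a\lambda , \\ \dfrac{(a+1)\lambda ^2}{2} &{}\quad \text{ if } \theta > a\lambda . \end{array}\right. } \end{aligned}$$

In particular, once \(\lambda _n \rightarrow 0\), eventually \(\vert \beta _{j0} \vert > a\lambda _n\) for \(j = 1, \ldots , q\), so both derivatives above vanish, which is exactly what the proof uses.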

Denote the first and second derivatives of \(\rho _{\tau }(\epsilon _i - t)\) at \(t = 0\) as follows:

$$\begin{aligned} g_\tau (\epsilon _i)= & {} \rho ^{'}_{\tau }(\epsilon _i - t)\mid _{t=0} = -2\tau \epsilon _i I(\epsilon _i \ge 0) - 2(1 - \tau )\epsilon _i I(\epsilon _i< 0),\\ h_\tau (\epsilon _i)= & {} \rho ^{''}_{\tau }(\epsilon _i - t)\mid _{t=0} = 2\tau I(\epsilon _i \ge 0) + 2(1 - \tau )I(\epsilon _i < 0). \end{aligned}$$

Then \(\mathrm {E}(g_\tau (\epsilon _i)) = 0\). Denote \({\mathrm {Var}}(g_\tau (\epsilon _i)) = \sigma _{g_\tau }^2\), \(\mathrm {E}(h_\tau (\epsilon _i)) = \mu _{h_\tau } > 0\) and \({\mathrm {Var}}(h_\tau (\epsilon _i)) = \sigma _{h_\tau }^2, i = 1, \ldots , n\). According to model (3.1), \(\epsilon _i = y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0, i = 1, \ldots , n\). Now we consider the first term on the right-hand side of (11):

$$\begin{aligned}&\displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} + \frac{h_\tau (\epsilon _i)}{2} \Big (\frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ). \end{aligned}$$

We note that

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}= & {} \mathrm {E}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}\right) + O_p \left( \sqrt{{\mathrm {Var}}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) } \right) \\= & {} O_p \left( \sqrt{\mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} \sigma _{g_\tau }^2} \right) , \end{aligned}$$

and

$$\begin{aligned} \displaystyle \sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \left( \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) ^2= & {} \frac{\mu _{h_{\tau }}}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + O_p \left( \sqrt{ \frac{1}{4} \sum _{i = 1}^n \Big (\mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} \Big )^2 \sigma _{h_\tau }^2} \right) \\= & {} \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + o_p(1). \end{aligned}$$

Therefore, \(R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0)\) is dominated by \( \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u}\), for \(\Vert \mathbf {u} \Vert = C\), where C is sufficiently large. In conclusion, there exists a local minimizer of \(R_{n}({\varvec{\beta }})\), \(\hat{{\varvec{\beta }}}^{(\mathrm{SCAD})}\), such that \(\Vert \hat{{\varvec{\beta }}}^{(\mathrm{SCAD})} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\), if \(\lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). \(\square \)
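
Computationally, \(g_\tau \) and \(h_\tau \) are the gradient and the (sign-dependent, otherwise constant) curvature of the expectile loss, so the unpenalized estimator can be computed by iteratively reweighted least squares with weights \(h_\tau (\cdot )/2\), i.e., \(\tau \) on nonnegative residuals and \(1-\tau \) on negative ones. The following is a minimal sketch of ours (assuming a full-rank design), not the authors' implementation:

```python
import numpy as np

def expectile_reg_irls(X, y, tau, n_iter=100, tol=1e-10):
    """Unpenalized expectile regression via iteratively reweighted least squares.

    Each step solves a weighted least-squares problem whose weights are
    tau (nonnegative residual) or 1 - tau (negative residual), i.e. h_tau / 2.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS start = tau = 1/2 solution
    for _ in range(n_iter):
        w = np.where(y - X @ beta >= 0, tau, 1.0 - tau)
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

The fixed point satisfies \(\sum _{i=1}^n g_\tau (y_i - \mathbf {x}_i^{\text{ T }}\hat{{\varvec{\beta }}}) \mathbf {x}_i = {\varvec{0}}\), the first-order condition of \(\sum _{i=1}^n \rho _\tau (y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }})\).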

1.2 Proof of Theorem 2

(a) For any \({\varvec{\beta }}_1 - {\varvec{\beta }}_{10} = O_p(n^{-\frac{1}{2}})\), \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\),

$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) - \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1 - \mathbf {x}_{i2}^{\text{ T }}{\varvec{\beta }}_2) \Big ) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ). \end{aligned}$$
(12)

By Condition 2 and following the proof of Theorem 1,

$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1)\\&\quad = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2\\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1) = O_p(1). \end{aligned}$$

Now we consider the last term on the right-hand side of (12). For \(j = q+1, \ldots , p\),

$$\begin{aligned} p_{\lambda _n}(\vert \beta _j \vert )= & {} \displaystyle \lim _{\theta \rightarrow 0^{+}} p_{\lambda _n}(\theta ) + \displaystyle \lim _{\theta \rightarrow 0^{+}} p'_{\lambda _n}(\theta ) \vert \beta _j \vert + o(\vert \beta _j \vert )\\= & {} \lambda _n \vert \beta _j \vert + o(\vert \beta _j \vert ). \end{aligned}$$

Therefore, \(n \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ) = n \lambda _n \Big (\sum _{j = q+1}^p \Big (\vert \beta _j \vert + o(\vert \beta _j \vert /\lambda _n)\Big ) \Big )\). Because \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), \(o(\vert \beta _j \vert /\lambda _n) = o \Big (\displaystyle \frac{1}{\sqrt{n}\lambda _n}\Big )\). We note that \(\sqrt{n} \lambda _n \rightarrow \infty \) and \(n \lambda _n \rightarrow \infty \) as \(n \rightarrow \infty \), so \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }})\) is dominated by

$$\begin{aligned} - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ). \end{aligned}$$

Consequently,

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \text{ as } n \rightarrow \infty . \end{aligned}$$

This completes the proof of part (a) of the theorem. \(\square \)

(b) From Theorem 1 and part (a), we know \(\hat{{\varvec{\beta }}}_1\) is a root-n consistent local minimizer of \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})\). Let \({\varvec{\theta }}_1 = \sqrt{n} ({\varvec{\beta }}_1 - {\varvec{\beta }}_{10})\), i.e., \({\varvec{\beta }}_1 = {\varvec{\beta }}_{10} + {\varvec{\theta }}_1/\sqrt{n}\). Then

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _j \vert )\\= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n}) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) \\\triangleq & {} Q_n({\varvec{\theta }}_1). \end{aligned}$$

Because \(\hat{{\varvec{\theta }}}_1 = \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10})\) is a local minimizer of \(Q_n({\varvec{\theta }}_1),\)

$$\begin{aligned} \displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0, \end{aligned}$$

for \(j = 1, \ldots , q\). We now expand the terms of \(Q_n({\varvec{\theta }}_1)\) and differentiate them one by one:

$$\begin{aligned} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} \rho _{\tau }(\epsilon _i) + g_\tau (\epsilon _i) \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )\\&+ \frac{h_\tau (\epsilon _i)}{2} \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )^2 + o(1), \nonumber \\ \frac{\partial }{\partial \theta _j} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} - g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{n}x_{ij}, \nonumber \\ p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert )= & {} p_{\lambda _n}(\vert \beta _{j0} \vert ) + p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{\theta _j}{\sqrt{n}}\\&+\, \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{\theta _j}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n} \Big ). \end{aligned}$$

Therefore, as \(n \rightarrow \infty ,\)

$$\begin{aligned} n \frac{\partial }{\partial \theta _j} p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\theta _j}{n} \Big ) \rightarrow 0. \end{aligned}$$
(13)

From the proof of Theorem 1, (13) holds. Plugging these expansions into \(\displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0\), for \(j = 1, \ldots , q\), we have

$$\begin{aligned} 0= & {} \sum _{i = 1}^n \Big (- g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big ) \\&+ n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij}= & {} \sum _{i=1}^n \Big (g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}}-\frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big )\\&- n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i=1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1 - \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}),\\ \hat{{\varvec{\theta }}}_1= & {} \left( \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \right) ^{-1} \left( \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1\right. \\&\left. - \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \right) , \end{aligned}$$

where \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10})\) is defined as a q-dimensional vector with the \(j\mathrm{th}\) element \(n \big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \big )\). According to (13) and Condition 2, as \(n \rightarrow \infty \), \(\mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \mu _{h_\tau } \Sigma _{11}\), \(\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow 0\), and \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \rightarrow 0\). In addition, \(\mathrm {E}\Big (g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = {\varvec{0}}, i = 1, \ldots , n,\)

$$\begin{aligned}&{\mathrm {Var}}\Big (\sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = \sigma _{g_\tau }^2 \frac{\sum _{i = 1}^n \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \sigma _{g_\tau }^2 \Sigma _{11},\\&\sum _{i = 1}^n \mathrm {E}\left( \Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^2 I\Big (\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert > \xi \Big ) \right) \le \displaystyle \sum _{i = 1}^n \frac{\mathrm {E}\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^4}{\xi ^2}\\&\quad = \frac{1}{\xi ^2} \mathrm {E}\Big (g_\tau ^4(\epsilon _i)\Big ) \displaystyle \sum _{i = 1}^n \left( \frac{\mathbf {x}_{i1}^{\text{ T }}\mathbf {x}_{i1}}{n} \right) ^2 \rightarrow 0, \end{aligned}$$

for any \(\xi > 0\). Applying the Lindeberg–Feller CLT, we have

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11}). \end{aligned}$$

By Slutsky’s theorem, \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\mu _{h_\tau } \Sigma _{11} \Big )^{-1} \mathbf {w_1}.\) Then, we can conclude,

$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10}) \xrightarrow []{\mathcal { L }} N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}). \end{aligned}$$

This completes the proof. \(\square \)
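
As a sanity check on the limit, at \(\tau = 1/2\) we have \(g_{1/2}(\epsilon ) = -\epsilon \) and \(\mu _{h_{1/2}} = 1\), so the asymptotic covariance reduces to the familiar least-squares form \(\sigma ^2 \Sigma _{11}^{-1}\). The covariance can also be checked by simulation. The sketch below is ours (not from the paper): it reuses expectile_reg_irls from above, takes \(\Sigma _{11} = I_q\), and centers the errors at their \(\tau \)-expectile so that \(\mathrm {E}(g_\tau (\epsilon _i)) = 0\) as the model assumes.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n, q, reps = 0.7, 400, 3, 1000
beta0 = np.array([1.0, -2.0, 0.5])

g = lambda e: -2.0 * np.where(e >= 0, tau, 1.0 - tau) * e  # rho_tau' at t = 0
h = lambda e: 2.0 * np.where(e >= 0, tau, 1.0 - tau)       # rho_tau'' at t = 0

# tau-expectile m of N(0, 1) via the standard fixed-point iteration, so that
# centered errors eps - m satisfy E(g_tau(eps - m)) = 0.
z = rng.standard_normal(10 ** 5)
m = 0.0
for _ in range(100):
    w = np.where(z >= m, tau, 1.0 - tau)
    m = (w * z).sum() / w.sum()
sig2_g, mu_h = g(z - m).var(), h(z - m).mean()

est = np.empty((reps, q))
for r in range(reps):
    X = rng.standard_normal((n, q))            # so Sigma_11 = I_q
    y = X @ beta0 + rng.standard_normal(n) - m
    est[r] = expectile_reg_irls(X, y, tau)

print(np.cov(np.sqrt(n) * (est - beta0), rowvar=False))  # empirical covariance
print(sig2_g / mu_h ** 2 * np.eye(q))                    # Theorem 2 limit
```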

1.3 Proof of Theorem 3

We first prove the asymptotic normality in part (b). Let \({\varvec{\theta }} = \sqrt{n} ({\varvec{\beta }} - {\varvec{\beta }}_{0})\). Then, we have

$$\begin{aligned} V_n({\varvec{\theta }})\triangleq & {} R_{n}({\varvec{\beta }}_0 + {\varvec{\theta }}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \nonumber \\= & {} \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{h_\tau (\epsilon _i)}{2} \Big ( -\frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ) \nonumber \\&+\, n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) \nonumber \\= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{\mu _{h_\tau }}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} {\varvec{\theta }} + \frac{1}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \Big ( \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \Big ) {\varvec{\theta }} \nonumber \\&+\, o_p(1) + n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ). \end{aligned}$$
(14)

From the proof of Theorem 2,

$$\begin{aligned}&\displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma ), \\&\frac{\mu _{h_\tau }}{2} \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow \frac{\mu _{h_\tau }}{2} \Sigma , \\&\frac{1}{2} \displaystyle \sum _{i = 1}^n \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow 0. \end{aligned}$$

Now we consider the last term of (14). For \(1 \le j \le q\),

$$\begin{aligned} w_j \xrightarrow []{\mathcal { P }} \vert \beta _{j0} \vert ^{-\gamma }, \sqrt{n} \big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \rightarrow \theta _j \text{ sgn }(\beta _{j0}). \end{aligned}$$

By Slutsky’s theorem, \(n \lambda _n \sum _{j = 1}^q w_j\big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \xrightarrow []{\mathcal { P }} 0\) because \(\sqrt{n} \lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). For \(q+1 \le j \le p\), \(\beta _{j0} = 0\) and

$$\begin{aligned} \sqrt{n} (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) = \vert \theta _j \vert , \sqrt{n} \lambda _n w_j = \lambda _n n^{(\gamma +1)/2} (\vert \sqrt{n} \tilde{\beta }_j \vert )^{-\gamma }, \end{aligned}$$

where \(\tilde{\beta }_j\) is the \(j\mathrm{th}\) element of \(\tilde{{\varvec{\beta }}}\) defined in (3.5) and \(\sqrt{n} \tilde{\beta }_j = O_p(1)\). Therefore

$$\begin{aligned} n \lambda _n w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) {\left\{ \begin{array}{ll} \xrightarrow []{\mathcal { P }} \infty &{}\text{ if } \theta _j \ne 0,\\ = 0 &{} \text{ if } \theta _j = 0, \end{array}\right. } \end{aligned}$$

for each \(j\) with \(q+1 \le j \le p\).

Applying Slutsky’s theorem again, we have \(V_n({\varvec{\theta }}) \xrightarrow []{\mathcal { L }} V({\varvec{\theta }})\) for every \({\varvec{\theta }}\). Here,

$$\begin{aligned} V({\varvec{\theta }}) = {\left\{ \begin{array}{ll} \displaystyle \frac{\mu _{h_\tau }}{2} {{\varvec{\theta }}_1}^{\text{ T }}\Sigma _{11} {\varvec{\theta }}_1 + \mathbf {w}_1^{\text{ T }}{\varvec{\theta }}_1 &{}\quad \text{ if } \theta _{j} = 0, q+1 \le j \le p, \\ \infty &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$

where \(\mathbf {w}_1 = (w_1, w_2, \ldots , w_q)^{\text{ T }}\sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11})\) and \( {\varvec{\theta }}_1 = (\theta _1, \theta _2, \ldots , \theta _q)^{\text{ T }}\). We note that \(V_n({\varvec{\theta }})\) is convex and the unique minimum of \(V({\varvec{\theta }})\) is

$$\begin{aligned} ((-(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1})^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}. \end{aligned}$$

With the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we have

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\beta }}}_1^{(\mathrm{AL})} - {\varvec{\beta }}_{10}) = \hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} -(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}) \end{aligned}$$

and

$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} - {\varvec{\beta }}_{20}) = \hat{{\varvec{\theta }}}_2 \xrightarrow []{\mathcal { L }} {\varvec{0}} \end{aligned}$$

where \(\hat{{\varvec{\theta }}}_2 = (\hat{\theta }_{q+1}, \hat{\theta }_{q+2}, \ldots , \hat{\theta }_p)^{\text{ T }}, \) which proves the asymptotic normality property.

Next, we show the sparsity property. For any \({\varvec{\beta }}_1 - {\varvec{\beta }}_{10} = O_p(n^{-\frac{1}{2}})\), \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), following the proof of Theorem 2, we have

$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o_p \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _{j} \vert . \end{aligned}$$
(15)

The first two terms are bounded in the same way as the proof of Theorem 2:

$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 \\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \\&\quad =\frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1)= O_p(1). \end{aligned}$$

For the third term on the right-hand side of (15),

$$\begin{aligned} n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _j \vert = n^{(\gamma +1)/2} \lambda _n \sqrt{n} \displaystyle \sum _{j = q+1}^p \vert \sqrt{n} \tilde{\beta }_j \vert ^{-\gamma } \vert \beta _j \vert \rightarrow \infty , \end{aligned}$$

because \(\sqrt{n} \tilde{\beta }_j = O_p(1)\) and \(n^{(\gamma +1)/2} \lambda _n \rightarrow \infty \). Therefore,

$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \; \text{ as } \; n \rightarrow \infty . \end{aligned}$$

This implies \(\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} = {\varvec{0}}\). \(\square \)
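
The estimator analyzed above is also straightforward to compute: the expectile loss is convex with a Lipschitz-continuous gradient, so proximal gradient descent with weight-scaled soft-thresholding applies. The following is our own sketch (ISTA with a fixed step), assuming the pilot \(\tilde{{\varvec{\beta }}}\) in (3.5) is the unpenalized expectile fit and reusing expectile_reg_irls from above:

```python
import numpy as np

def adaptive_lasso_expectile(X, y, tau, lam, gamma=1.0, n_iter=2000):
    """Minimize sum_i rho_tau(y_i - x_i' beta) + n * lam * sum_j w_j |beta_j|
    by proximal gradient (ISTA); a sketch, not the authors' algorithm."""
    n, p = X.shape
    beta_tilde = expectile_reg_irls(X, y, tau)  # pilot estimator (assumed (3.5))
    w = np.abs(beta_tilde) ** (-gamma)          # adaptive weights w_j
    # The loss Hessian is bounded by 2 max(tau, 1 - tau) X'X, giving a valid
    # Lipschitz constant for the gradient (hence a safe step size 1 / L).
    L = 2.0 * max(tau, 1.0 - tau) * np.linalg.eigvalsh(X.T @ X)[-1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ beta
        grad = -X.T @ (2.0 * np.where(r >= 0, tau, 1.0 - tau) * r)
        z = beta - grad / L
        # Soft-thresholding: prox of the weighted L1 penalty with step 1 / L.
        # An exactly zero pilot coordinate gives w_j = inf and keeps beta_j = 0.
        beta = np.sign(z) * np.maximum(np.abs(z) - n * lam * w / L, 0.0)
    return beta
```

Coordinates with small pilot estimates receive large weights \(w_j\) and are thresholded to exactly zero, which is the computational face of the sparsity property just proved.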

1.4 Proof of Corollary 1

From the proof of Theorem 2, it can be shown that

$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \Sigma _{11}^g), \end{aligned}$$

and \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\Sigma _{11}^h \Big )^{-1} \mathbf {w_1}\). \(\square \)

Cite this article

Liao, L., Park, C. & Choi, H. Penalized expectile regression: an alternative to penalized quantile regression. Ann Inst Stat Math 71, 409–438 (2019). https://doi.org/10.1007/s10463-018-0645-1
