Appendix: Proofs of theorems
1.1 Proof of Theorem 1
Following Wu and Liu (2009), it is sufficient to show that for any given \(\delta > 0\), there exists a large constant C such that
$$\begin{aligned} P\left\{ \displaystyle \inf _{\Vert \mathbf {u} \Vert = C} R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) > R_{n}({\varvec{\beta }}_0) \right\} \ge 1 - \delta . \end{aligned}$$
(10)
This implies that there exists a local minimizer \(\hat{{\varvec{\beta }}}\) satisfying \(\Vert \hat{{\varvec{\beta }}} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\). Now consider
$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 - \mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\qquad +\, n \displaystyle \sum _{j = 1}^p \Big ( p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )\Big ). \end{aligned}$$
Because \(p'_{\lambda _n}(\theta ) = \lambda _n \{I(\theta \le \lambda _n ) + \frac{(a\lambda _n - \theta )_{+}}{(a-1)\lambda _n}I(\theta > \lambda _n ) \} \ge 0\) for \(a > 2\) and all \(\theta > 0\), and \(p_{\lambda _n}(0) = 0\),
$$\begin{aligned} n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) = n (p_{\lambda _n}(\vert u_j/\sqrt{n} \vert ) - p_{\lambda _n}(0)) \ge 0 \end{aligned}$$
for \(j = q+1, \ldots , p\), since \(\beta _{j0} = 0\) for these indices. Hence,
$$\begin{aligned}&R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \\&\quad \ge \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \nonumber \\&\qquad +\, n\sum _{j = 1}^q \Big (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \nonumber \end{aligned}$$
(11)
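As a side check that is not part of the proof, the SCAD properties invoked above, namely \(p_{\lambda }(0) = 0\), \(p'_{\lambda } \ge 0\), and \(p'_{\lambda }(\theta ) \rightarrow 0\) for fixed \(\theta > 0\) as \(\lambda \rightarrow 0\), can be verified numerically from the standard closed form of the SCAD penalty. In the sketch below, the values \(\lambda = 0.5\) and \(a = 3.7\) are illustrative choices only:

```python
import numpy as np

def scad(theta, lam=0.5, a=3.7):
    """SCAD penalty p_lambda(theta) for theta >= 0; lam and a are illustrative values."""
    theta = np.asarray(theta, dtype=float)
    mid = (2 * a * lam * theta - theta**2 - lam**2) / (2 * (a - 1))
    return np.where(theta <= lam, lam * theta,
                    np.where(theta <= a * lam, mid, lam**2 * (a + 1) / 2))

def scad_deriv(theta, lam=0.5, a=3.7):
    """p'_lambda(theta) = lam * {I(theta <= lam) + (a*lam - theta)_+ /((a-1)lam) I(theta > lam)}."""
    theta = np.asarray(theta, dtype=float)
    return lam * np.where(theta <= lam, 1.0,
                          np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam))

grid = np.linspace(0.0, 3.0, 601)
assert scad(0.0) == 0.0                       # p_lambda(0) = 0
assert np.all(scad_deriv(grid) >= 0)          # p'_lambda >= 0, so p_lambda is nondecreasing
# the closed-form derivative agrees with a numerical derivative of the penalty
assert np.allclose(np.gradient(scad(grid), grid)[5:-5], scad_deriv(grid)[5:-5], atol=1e-2)

# inactive coordinates (beta_j0 = 0): the penalty difference in (11) is nonnegative
n, u_j = 200, 1.5
assert n * (scad(abs(u_j) / np.sqrt(n)) - scad(0.0)) >= 0

# for a fixed nonzero coefficient, p'_lambda(|beta_j0|) vanishes once a*lambda < |beta_j0|
for lam in [0.5, 0.1, 0.02]:
    print(f"lambda = {lam}: p'(1.0) = {float(scad_deriv(1.0, lam=lam)):.4f}")
```

The printed values confirm that the derivative at a fixed nonzero coefficient is exactly zero once \(a\lambda < \vert \beta _{j0} \vert \), which is the mechanism behind the limits displayed below.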
We first consider the second term on the right-hand side of (11). For \(j = 1, \ldots , q\),
$$\begin{aligned}&n (p_{\lambda _n}(\vert \beta _{j0} + u_j/\sqrt{n} \vert ) - p_{\lambda _n}(\vert \beta _{j0} \vert )) \\&\quad = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{u_j}{\sqrt{n}} + \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{u_j}{\sqrt{n}} \Big )^2 + o \Big (\frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{n} \Big ) \Big )\\&\quad = O \Big ( \sqrt{n} \max _{1 \le j \le q} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) + \max _{1 \le j \le q} p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \Big ). \end{aligned}$$
For large n,
$$\begin{aligned} p^{'}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} \lambda _n \Big (I(\vert \beta _{j0} \vert \le \lambda _n ) + \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{(a-1)\lambda _n}I(\vert \beta _{j0} \vert > \lambda _n ) \Big ) \\= & {} \frac{(a \lambda _n - \vert \beta _{j0} \vert )_{+}}{a-1} \rightarrow 0 \text{ as } \lambda _n \rightarrow 0, \\ p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )= & {} -\frac{1}{a-1} I(\lambda _n< \vert \beta _{j0} \vert < a \lambda _n ) \rightarrow 0 \text{ as } \lambda _n \rightarrow 0. \end{aligned}$$
Denote the first and second derivatives of \(\rho _{\tau }(\epsilon _i - t)\) at \(t = 0\) as follows:
$$\begin{aligned} g_\tau (\epsilon _i)= & {} \rho ^{'}_{\tau }(\epsilon _i - t)\mid _{t=0} = -2\tau \epsilon _i I(\epsilon _i \ge 0) - 2(1 - \tau )\epsilon _i I(\epsilon _i< 0),\\ h_\tau (\epsilon _i)= & {} \rho ^{''}_{\tau }(\epsilon _i - t)\mid _{t=0} = 2\tau I(\epsilon _i \ge 0) + 2(1 - \tau )I(\epsilon _i < 0). \end{aligned}$$
According to model (3.1), \(\epsilon _i = y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0, i = 1, \ldots , n\). Then \(\mathrm {E}(g_\tau (\epsilon _i)) = 0\). Denote \({\mathrm {Var}}(g_\tau (\epsilon _i)) = \sigma _{g_\tau }^2\), \(\mathrm {E}(h_\tau (\epsilon _i)) = \mu _{h_\tau } > 0\), and \({\mathrm {Var}}(h_\tau (\epsilon _i)) = \sigma _{h_\tau }^2\) for \(i = 1, \ldots , n\). Now we consider the first term on the right-hand side of (11):
$$\begin{aligned}&\displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0 -\mathbf {x}_i^{\text{ T }}\mathbf {u}/\sqrt{n}) - \rho _{\tau }(y_i - \mathbf {x}_i^{\text{ T }}{\varvec{\beta }}_0) \Big ) \\&\quad = \displaystyle \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} + \frac{h_\tau (\epsilon _i)}{2} \Big (\frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ). \end{aligned}$$
We note that
$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}= & {} \mathrm {E}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}}\right) + O_p \left( \sqrt{{\mathrm {Var}}\left( \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) } \right) \\= & {} O_p \left( \sqrt{\mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} \sigma _{g_\tau }^2} \right) , \end{aligned}$$
and
$$\begin{aligned} \displaystyle \sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \left( \frac{\mathbf {x}_i^{\text{ T }}\mathbf {u}}{\sqrt{n}} \right) ^2= & {} \frac{\mu _{h_{\tau }}}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + O_p \left( \sqrt{ \frac{\sigma _{h_\tau }^2}{4} \sum _{i = 1}^n \Big (\frac{(\mathbf {x}_i^{\text{ T }}\mathbf {u})^2}{n} \Big )^2 } \right) \\= & {} \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u} + o_p(1). \end{aligned}$$
Therefore, for \(\Vert \mathbf {u} \Vert = C\), the linear term above is \(O_p(C)\) while the quadratic term \( \frac{\mu _{h_\tau }}{2} \mathbf {u}^{\text{ T }}\frac{\sum _{i = 1}^n \mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \mathbf {u}\) is of order \(C^2\); hence the latter dominates \(R_{n}({\varvec{\beta }}_0 + \mathbf {u}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0)\) when C is sufficiently large, and (10) follows. In conclusion, there exists a local minimizer of \(R_{n}({\varvec{\beta }})\), \(\hat{{\varvec{\beta }}}^{(\mathrm{SCAD})}\), such that \(\Vert \hat{{\varvec{\beta }}}^{(\mathrm{SCAD})} - {\varvec{\beta }}_0 \Vert = O_p(n^{-\frac{1}{2}})\), if \(\lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). \(\square \)
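The moment facts used in this proof can be spot-checked by simulation. The sketch below is a minimal check, assuming \(\tau = 0.5\) and standard normal errors (an illustrative case in which the \(\tau \)-expectile of \(\epsilon _i\) is zero, so that \(\mathrm {E}(g_\tau (\epsilon _i)) = 0\) holds); it estimates \(\mathrm {E}(g_\tau )\), \(\mu _{h_\tau }\), and \(\sigma _{g_\tau }^2\) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
tau, n = 0.5, 200_000
eps = rng.standard_normal(n)   # errors whose tau-expectile is zero when tau = 0.5

def g_tau(e, tau):
    """First derivative of rho_tau(epsilon - t) at t = 0."""
    return np.where(e >= 0, -2 * tau * e, -2 * (1 - tau) * e)

def h_tau(e, tau):
    """Second derivative of rho_tau(epsilon - t) at t = 0."""
    return np.where(e >= 0, 2 * tau, 2 * (1 - tau))

g, h = g_tau(eps, tau), h_tau(eps, tau)
print("E g_tau   ~", round(float(g.mean()), 4))   # should be close to 0
print("mu_h      ~", round(float(h.mean()), 4))   # positive; equals 1 when tau = 0.5
print("sigma_g^2 ~", round(float(g.var()), 4))    # equals Var(eps) = 1 when tau = 0.5
```

Note that for \(\tau = 0.5\) one has \(g_\tau (\epsilon ) = -\epsilon \) and \(h_\tau (\epsilon ) = 1\), so \(\sigma _{g_\tau }^2/\mu _{h_\tau }^2\) reduces to \({\mathrm {Var}}(\epsilon )\), the familiar least-squares variance.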
1.2 Proof of Theorem 2
(a) For any \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1 - {\varvec{\beta }}_{10} \Vert = O_p(n^{-\frac{1}{2}})\) and any \({\varvec{\beta }}_2\) with \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\),
$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \Big (\rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) - \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1 - \mathbf {x}_{i2}^{\text{ T }}{\varvec{\beta }}_2) \Big ) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _{j} \vert ). \end{aligned}$$
(12)
By Condition 2 and following the proof of Theorem 1,
$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1)\\&\quad = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2\\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1) = O_p(1). \end{aligned}$$
Now we consider the last term on the right-hand side of (12). For \(j = q+1, \ldots , p\),
$$\begin{aligned} p_{\lambda _n}(\vert \beta _j \vert )= & {} \displaystyle \lim _{\theta \rightarrow 0^{+}} p_{\lambda _n}(\theta ) + \displaystyle \lim _{\theta \rightarrow 0^{+}} p'_{\lambda _n}(\theta ) \vert \beta _j \vert + o(\vert \beta _j \vert )\\= & {} \lambda _n \vert \beta _j \vert + o(\vert \beta _j \vert ). \end{aligned}$$
Therefore, \(n \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ) = n \lambda _n \Big (\sum _{j = q+1}^p \Big (\vert \beta _j \vert + o(\vert \beta _j \vert /\lambda _n)\Big ) \Big )\). Because \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), \(o(\vert \beta _j \vert /\lambda _n) = o \Big (\displaystyle \frac{1}{\sqrt{n}\lambda _n}\Big )\) for \(j = q+1, \ldots , p\). Since \(\sqrt{n} \lambda _n \rightarrow \infty \) and \(n \lambda _n \rightarrow \infty \) as \(n \rightarrow \infty \), while all remaining terms on the right-hand side of (12) are \(O_p(1)\), \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }})\) is dominated by
$$\begin{aligned} - n \displaystyle \sum _{j = q+1}^p p_{\lambda _n}(\vert \beta _j \vert ). \end{aligned}$$
Consequently,
$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \text{ as } n \rightarrow \infty . \end{aligned}$$
This completes the proof of part (a) of the theorem. \(\square \)
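The rate argument in part (a) can be made concrete. Since SCAD is exactly linear near the origin, \(p_{\lambda _n}(\theta ) = \lambda _n \theta \) for \(0 \le \theta \le \lambda _n\), a coefficient of size \(n^{-1/2}\) contributes a penalty of order \(n \lambda _n n^{-1/2} = \sqrt{n} \lambda _n \rightarrow \infty \), which dwarfs the \(O_p(1)\) loss terms. A minimal numerical illustration follows; the rate \(\lambda _n = n^{-1/3}\) is an illustrative choice satisfying \(\lambda _n \rightarrow 0\) and \(\sqrt{n} \lambda _n \rightarrow \infty \):

```python
import numpy as np

for n in [10**2, 10**4, 10**6, 10**8]:
    lam_n = n ** (-1 / 3)        # lam_n -> 0 but sqrt(n) * lam_n -> infinity
    beta_j = 1.0 / np.sqrt(n)    # coefficient of the critical size O(n^{-1/2}); beta_j <= lam_n
    # SCAD is linear on [0, lam_n], so p_lam(beta_j) = lam_n * beta_j exactly
    penalty_term = n * lam_n * beta_j          # = sqrt(n) * lam_n = n^{1/6}
    print(f"n = {n:>9}: n * p_lam(|beta_j|) = {penalty_term:.2f}")
```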
(b) From Theorem 1 and part (a), we know that \(\hat{{\varvec{\beta }}}_1\) is a root-n consistent local minimizer of \(R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})\). Let \({\varvec{\theta }}_1 = \sqrt{n} ({\varvec{\beta }}_1 - {\varvec{\beta }}_{10})\), i.e., \({\varvec{\beta }}_1 = {\varvec{\beta }}_{10} + {\varvec{\theta }}_1/\sqrt{n}\). Then
$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }})= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_1) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _j \vert )\\= & {} \displaystyle \sum _{i = 1}^n \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n}) + n \displaystyle \sum _{j = 1}^q p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) \\\triangleq & {} Q_n({\varvec{\theta }}_1). \end{aligned}$$
Because \(\hat{{\varvec{\theta }}}_1 = \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10})\) is a local minimizer of \(Q_n({\varvec{\theta }}_1),\)
$$\begin{aligned} \displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0, \end{aligned}$$
for \(j = 1, \ldots , q\). We now expand each term of \(Q_n({\varvec{\theta }}_1)\) and differentiate term by term:
$$\begin{aligned} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} \rho _{\tau }(\epsilon _i) + g_\tau (\epsilon _i) \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )\\&+ \frac{h_\tau (\epsilon _i)}{2} \Big (-\frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n} \Big ), \nonumber \\ \frac{\partial }{\partial \theta _j} \rho _{\tau }(y_i - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\beta }}_{10} - \mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1/\sqrt{n})= & {} - g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}{\varvec{\theta }}_1}{n}x_{ij}, \nonumber \\ p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert )= & {} p_{\lambda _n}(\vert \beta _{j0} \vert ) + p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{\theta _j}{\sqrt{n}}\\&+\, \frac{p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )}{2} \Big (\frac{\theta _j}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n} \Big ). \end{aligned}$$
Therefore, as \(n \rightarrow \infty ,\)
$$\begin{aligned} n \frac{\partial }{\partial \theta _j} p_{\lambda _n}(\vert \beta _{j0} + \theta _j/\sqrt{n} \vert ) = n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\theta _j}{n} \Big ) \rightarrow 0. \end{aligned}$$
(13)
From the proof of Theorem 1, both \(p^{'}_{\lambda _n}(\vert \beta _{j0} \vert )\) and \(p^{''}_{\lambda _n}(\vert \beta _{j0} \vert )\) tend to zero as \(\lambda _n \rightarrow 0\), so (13) holds. Plugging these expressions into \(\displaystyle \frac{\partial Q_n({\varvec{\theta }}_1)}{\partial \theta _j} \mid _{{\varvec{\theta }}_1 = \hat{{\varvec{\theta }}}_1} = 0\), for \(j = 1, \ldots , q\), we have
$$\begin{aligned} 0= & {} \sum _{i = 1}^n \Big (- g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}} + h_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big ) + n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij}= & {} \sum _{i=1}^n \Big (g_\tau (\epsilon _i) \frac{x_{ij}}{\sqrt{n}}-\frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1}^{\text{ T }}\hat{{\varvec{\theta }}}_1}{n}x_{ij} \Big )\\&- n \Big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \Big ),\\ \mu _{h_\tau } \sum _{i=1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1 - \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}),\\ \hat{{\varvec{\theta }}}_1= & {} \left( \mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \right) ^{-1} \left( \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} -\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n}\hat{{\varvec{\theta }}}_1\right. \\&\left. - \mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \right) , \end{aligned}$$
where the first two equations hold for each \(j = 1, \ldots , q\) and \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10})\) is the q-dimensional vector whose \(j\mathrm{th}\) element is \(n \big (p^{'}_{\lambda _n}(\vert \beta _{j0} \vert ) \text{ sgn }(\beta _{j0}) \frac{1}{\sqrt{n}} + p^{''}_{\lambda _n}(\vert \beta _{j0} \vert ) \frac{\hat{\theta }_j}{n} \big )\). According to (13) and Condition 2, as \(n \rightarrow \infty \), \(\mu _{h_\tau } \sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \mu _{h_\tau } \Sigma _{11}\), \(\sum _{i = 1}^n \frac{(h_\tau (\epsilon _i) - \mu _{h_\tau }) \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow 0\), and \(\mathbf {m}_{\lambda _n}(\hat{{\varvec{\theta }}}_1, {\varvec{\beta }}_{10}) \rightarrow 0\). In addition, \(\mathrm {E}\Big (g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = {\varvec{0}}, i = 1, \ldots , n,\)
$$\begin{aligned}&{\mathrm {Var}}\Big (\sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}}\Big ) = \sigma _{g_\tau }^2 \frac{\sum _{i = 1}^n \mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \rightarrow \sigma _{g_\tau }^2 \Sigma _{11},\\&\sum _{i = 1}^n \mathrm {E}\left( \Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^2 I\Big (\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert > \xi \Big ) \right) \le \displaystyle \sum _{i = 1}^n \frac{\mathrm {E}\Vert g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \Vert ^4}{\xi ^2}\\&\quad = \frac{1}{\xi ^2} \mathrm {E}\Big (g_\tau ^4(\epsilon _i)\Big ) \displaystyle \sum _{i = 1}^n \left( \frac{\mathbf {x}_{i1}^{\text{ T }}\mathbf {x}_{i1}}{n} \right) ^2 \rightarrow 0, \end{aligned}$$
for any \(\xi > 0\). Applying the Lindeberg–Feller central limit theorem, we have
$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11}). \end{aligned}$$
By Slutsky’s theorem, \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\mu _{h_\tau } \Sigma _{11} \Big )^{-1} \mathbf {w_1}.\) We can then conclude that
$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_1^{(\mathrm{SCAD})} - {\varvec{\beta }}_{10}) \xrightarrow []{\mathcal { L }} N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}). \end{aligned}$$
This completes the proof. \(\square \)
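As a numerical spot-check of the limiting covariance \(\sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}\), the sketch below simulates the linearized representation of \(\hat{{\varvec{\theta }}}_1\) obtained above. The fixed design with \(q = 2\), the standard normal errors, and \(\tau = 0.5\) are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)
tau, n, q, reps = 0.5, 500, 2, 2000
X1 = rng.standard_normal((n, q))              # illustrative fixed design for the active covariates
Sigma11 = X1.T @ X1 / n

def g_tau(e, tau):
    """First derivative of rho_tau(epsilon - t) at t = 0."""
    return np.where(e >= 0, -2 * tau * e, -2 * (1 - tau) * e)

mu_h = 1.0        # E h_tau = 2*tau*P(eps >= 0) + 2*(1-tau)*P(eps < 0) = 1 when tau = 0.5
draws = np.empty((reps, q))
for r in range(reps):
    eps = rng.standard_normal(n)
    score = X1.T @ g_tau(eps, tau) / np.sqrt(n)         # sum_i g_tau(eps_i) x_{i1} / sqrt(n)
    draws[r] = np.linalg.solve(mu_h * Sigma11, score)   # linearized theta_hat_1

sigma_g2 = 1.0    # Var g_tau(eps) = Var(eps) = 1 when tau = 0.5
print("Monte Carlo covariance of theta_hat_1:\n", np.round(np.cov(draws.T), 2))
print("theoretical (sigma_g^2/mu_h^2) * Sigma11^{-1}:\n",
      np.round(sigma_g2 / mu_h**2 * np.linalg.inv(Sigma11), 2))
```

With \(\tau = 0.5\), \(\sigma _{g_\tau }^2 = 1\) and \(\mu _{h_\tau } = 1\), so both printed matrices should be close to \(\Sigma _{11}^{-1}\).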
1.3 Proof of Theorem 3
We first prove the asymptotic normality in part (b). Let \({\varvec{\theta }} = \sqrt{n} ({\varvec{\beta }} - {\varvec{\beta }}_{0})\). Then, we have
$$\begin{aligned} V_n({\varvec{\theta }})\triangleq & {} R_{n}({\varvec{\beta }}_0 + {\varvec{\theta }}/\sqrt{n}) - R_{n}({\varvec{\beta }}_0) \nonumber \\= & {} \sum _{i = 1}^n \Big ( g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{h_\tau (\epsilon _i)}{2} \Big ( -\frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big )^2 + o \Big (\frac{1}{n}\Big ) \Big ) \nonumber \\&+\, n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) \nonumber \\= & {} \sum _{i = 1}^n g_\tau (\epsilon _i) \Big (- \frac{\mathbf {x}_{i}^{\text{ T }}{\varvec{\theta }}}{\sqrt{n}} \Big ) + \frac{\mu _{h_\tau }}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} {\varvec{\theta }} + \frac{1}{2} {\varvec{\theta }}^{\text{ T }}\sum _{i = 1}^n \Big ( \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \Big ) {\varvec{\theta }} \nonumber \\&+\, o_p(1) + n \lambda _n \displaystyle \sum _{j = 1}^p w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ). \end{aligned}$$
(14)
From the proof of Theorem 2,
$$\begin{aligned}&\displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w} \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma ), \\&\frac{\mu _{h_\tau }}{2} \displaystyle \sum _{i = 1}^n \frac{\mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow \frac{\mu _{h_\tau }}{2} \Sigma , \\&\frac{1}{2} \displaystyle \sum _{i = 1}^n \frac{(h_\tau (\epsilon _i)-\mu _{h_\tau }) \mathbf {x}_{i} \mathbf {x}_{i}^{\text{ T }}}{n} \rightarrow 0. \end{aligned}$$
Now we consider the last term of (14). For \(1 \le j \le q\),
$$\begin{aligned} w_j \xrightarrow []{\mathcal { P }} \vert \beta _{j0} \vert ^{-\gamma }, \sqrt{n} \big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \rightarrow \theta _j \text{ sgn }(\beta _{j0}). \end{aligned}$$
By Slutsky’s theorem, \(n \lambda _n \sum _{j = 1}^q w_j\big (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert \big ) \xrightarrow []{\mathcal { P }} 0\) because \(\sqrt{n} \lambda _n \rightarrow 0\) as \(n \rightarrow \infty \). For \(q+1 \le j \le p\), \(\beta _{j0} = 0\) and
$$\begin{aligned} \sqrt{n} (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) = \vert \theta _j \vert , \sqrt{n} \lambda _n w_j = \lambda _n n^{(\gamma +1)/2} (\vert \sqrt{n} \tilde{\beta }_j \vert )^{-\gamma }, \end{aligned}$$
where \(\tilde{\beta }_j\) is the \(j\mathrm{th}\) element of \(\tilde{{\varvec{\beta }}}\) defined in (3.5) and \(\sqrt{n} \tilde{\beta }_j = O_p(1)\). Therefore, for \(q+1 \le j \le p\),
$$\begin{aligned} n \lambda _n w_j (\vert \beta _{j0} + \theta _j/\sqrt{n} \vert - \vert \beta _{j0} \vert ) {\left\{ \begin{array}{ll} \xrightarrow []{\mathcal { P }} \infty &{}\text{ if } \theta _j \ne 0,\\ = 0 &{} \text{ if } \theta _j = 0. \end{array}\right. } \end{aligned}$$
Applying Slutsky’s theorem again, we have \(V_n({\varvec{\theta }}) \xrightarrow []{\mathcal { L }} V({\varvec{\theta }})\) for every \({\varvec{\theta }}\). Here,
$$\begin{aligned} V({\varvec{\theta }}) = {\left\{ \begin{array}{ll} \displaystyle \frac{\mu _{h_\tau }}{2} {{\varvec{\theta }}_1}^{\text{ T }}\Sigma _{11} {\varvec{\theta }}_1 + \mathbf {w}_1^{\text{ T }}{\varvec{\theta }}_1 &{}\quad \text{ if } \theta _{j} = 0, q+1 \le j \le p, \\ \infty &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$
where \(\mathbf {w}_1 \sim N({\varvec{0}}, \sigma _{g_\tau }^2 \Sigma _{11})\) consists of the first q components of \(\mathbf {w}\) and \( {\varvec{\theta }}_1 = (\theta _1, \theta _2, \ldots , \theta _q)^{\text{ T }}\). We note that \(V_n({\varvec{\theta }})\) is convex and the unique minimizer of \(V({\varvec{\theta }})\) is
$$\begin{aligned} ((-(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1})^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}. \end{aligned}$$
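That the finite branch of \(V\) has this closed-form minimizer is elementary; a minimal numerical check (with an illustrative positive definite \(\Sigma _{11}\), an illustrative value of \(\mu _{h_\tau }\), and a fixed draw standing in for \(\mathbf {w}_1\)) confirms that the gradient vanishes there and that perturbations only increase \(V\):

```python
import numpy as np

rng = np.random.default_rng(2)
q = 3
A = rng.standard_normal((q, q))
Sigma11 = A @ A.T + q * np.eye(q)       # an illustrative positive definite Sigma_11
mu_h = 0.8                              # an illustrative value of mu_{h_tau} > 0
w1 = rng.standard_normal(q)             # a fixed draw standing in for w_1

def V(theta):
    """Finite branch of the limit: (mu_h/2) theta' Sigma11 theta + w1' theta."""
    return 0.5 * mu_h * theta @ Sigma11 @ theta + w1 @ theta

theta_star = -np.linalg.solve(mu_h * Sigma11, w1)   # claimed minimizer -(mu_h Sigma11)^{-1} w1
assert np.allclose(mu_h * Sigma11 @ theta_star + w1, 0.0)   # gradient vanishes at theta_star
for _ in range(100):                                        # convexity: perturbing never helps
    assert V(theta_star + 0.1 * rng.standard_normal(q)) >= V(theta_star)
print("minimizer check passed")
```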
With the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we have
$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\beta }}}_1^{(\mathrm{AL})} - {\varvec{\beta }}_{10}) = \hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} -(\mu _{h_\tau } \Sigma _{11})^{-1} \mathbf {w}_{1} \sim N({\varvec{0}}, \sigma _{g_\tau }^2/\mu _{h_\tau }^2 \Sigma _{11}^{-1}) \end{aligned}$$
and
$$\begin{aligned} \sqrt{n} (\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} - {\varvec{\beta }}_{20}) = \hat{{\varvec{\theta }}}_2 \xrightarrow []{\mathcal { L }} {\varvec{0}} \end{aligned}$$
where \(\hat{{\varvec{\theta }}}_2 = (\hat{\theta }_{q+1}, \hat{\theta }_{q+2}, \ldots , \hat{\theta }_p)^{\text{ T }}\). This proves the asymptotic normality property.
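Before turning to sparsity, it may help to see the rate dichotomy in the penalty term numerically: for an active coefficient, \(\sqrt{n} \lambda _n w_j \rightarrow 0\), while for an inactive one, \(\sqrt{n} \lambda _n w_j = \lambda _n n^{(\gamma +1)/2} \vert \sqrt{n} \tilde{\beta }_j \vert ^{-\gamma } \rightarrow \infty \). The sketch below uses the illustrative choices \(\gamma = 1\) and \(\lambda _n = n^{-3/4}\) (so that \(\sqrt{n} \lambda _n \rightarrow 0\) and \(n^{(\gamma +1)/2} \lambda _n = n^{1/4} \rightarrow \infty \)), and treats \(\tilde{\beta }_j = n^{-1/2}\) as a deterministic stand-in for a root-n-consistent initial estimate of a zero coefficient:

```python
import numpy as np

gamma = 1.0
for n in [10**2, 10**4, 10**6, 10**8]:
    lam_n = n ** (-3 / 4)              # sqrt(n)*lam_n -> 0 while n^{(gamma+1)/2}*lam_n -> infinity
    w_active = 0.8 ** (-gamma)         # limit weight for a fixed nonzero beta_j0 = 0.8
    beta_tilde = 1.0 / np.sqrt(n)      # stand-in for a root-n-consistent estimate of beta_j0 = 0
    active = np.sqrt(n) * lam_n * w_active                       # -> 0
    inactive = np.sqrt(n) * lam_n * abs(beta_tilde) ** (-gamma)  # = n^{1/4} -> infinity
    print(f"n = {n:>9}: active term {active:.4f}, inactive term {inactive:.1f}")
```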
Next, we show the sparsity property. For any \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1 - {\varvec{\beta }}_{10} \Vert = O_p(n^{-\frac{1}{2}})\) and any \({\varvec{\beta }}_2\) with \(0 < \Vert {\varvec{\beta }}_2 \Vert \le Cn^{-\frac{1}{2}}\), following the proof of Theorem 2, we have
$$\begin{aligned}&R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \nonumber \\&\quad = \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 + o \Big ((\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}))^2 \Big ) \right) \nonumber \\&\qquad - \displaystyle \sum _{i = 1}^n \left( g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \right. \nonumber \\&\qquad \left. +\, o_p \Big ((\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }})^2 \Big ) \right) - n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _{j} \vert . \end{aligned}$$
(15)
The first two terms are bounded in the same way as the proof of Theorem 2:
$$\begin{aligned}&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) = O_p \left( \sqrt{\sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \sigma _{g_\tau }^2} \right) \\&\quad = O_p(1),\\&\sum _{i = 1}^n g_\tau (\epsilon _i) \mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\\&\quad = O_p \left( \sqrt{\sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\sigma _{g_\tau }^2} \right) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i1}^{\text{ T }}({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) \Big )^2 \\&\quad = \frac{\mu _{h_\tau }}{2} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}\sum _{i = 1}^n \frac{\mathbf {x}_{i1} \mathbf {x}_{i1}^{\text{ T }}}{n} \sqrt{n} ({\varvec{\beta }}_1 -{\varvec{\beta }}_{10}) + o_p(1) = O_p(1),\\&\sum _{i = 1}^n \frac{h_\tau (\epsilon _i)}{2} \Big (\mathbf {x}_{i}^{\text{ T }}(({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}\Big )^2 \\&\quad =\frac{\mu _{h_\tau }}{2} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }}) \sum _{i = 1}^n \frac{\mathbf {x}_i \mathbf {x}_i^{\text{ T }}}{n} \sqrt{n} (({\varvec{\beta }}_1 -{\varvec{\beta }}_{10})^{\text{ T }}, {\varvec{\beta }}_2^{\text{ T }})^{\text{ T }}+ o_p(1)= O_p(1). \end{aligned}$$
For the third term on the right-hand side of (15),
$$\begin{aligned} n \lambda _n \displaystyle \sum _{j = q+1}^p w_j \vert \beta _j \vert = n^{(\gamma +1)/2} \lambda _n \displaystyle \sum _{j = q+1}^p \vert \sqrt{n} \tilde{\beta }_j \vert ^{-\gamma } \sqrt{n} \vert \beta _j \vert \rightarrow \infty , \end{aligned}$$
because \(\sqrt{n} \tilde{\beta }_j = O_p(1)\) and \(n^{(\gamma +1)/2} \lambda _n \rightarrow \infty \). Therefore,
$$\begin{aligned} R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {\varvec{0}}^{\text{ T }})^{\text{ T }}) - R_{n}(({{\varvec{\beta }}_1}^{\text{ T }}, {{\varvec{\beta }}_2}^{\text{ T }})^{\text{ T }}) \rightarrow -\infty \; \text{ as } \; n \rightarrow \infty . \end{aligned}$$
Hence \(R_{n}\) cannot be locally minimized at any point with \({\varvec{\beta }}_2 \ne {\varvec{0}}\), which implies \(\hat{{\varvec{\beta }}}_2^{(\mathrm{AL})} = {\varvec{0}}\) with probability tending to one. \(\square \)
1.4 Proof of Corollary 1
From the proof of Theorem 2, it can be shown that
$$\begin{aligned} \displaystyle \sum _{i = 1}^n g_\tau (\epsilon _i) \frac{\mathbf {x}_{i1}}{\sqrt{n}} \xrightarrow []{\mathcal { L }} \mathbf {w_1} \sim N({\varvec{0}}, \Sigma _{11}^g), \end{aligned}$$
and \(\hat{{\varvec{\theta }}}_1 \xrightarrow []{\mathcal { L }} \Big (\Sigma _{11}^h \Big )^{-1} \mathbf {w_1}\). \(\square \)