Abstract
We propose an adaptive \(\ell _1\)-penalized estimator in the framework of Generalized Linear Models with identity-link and Poisson data, by taking advantage of a globally quadratic approximation of the Kullback-Leibler divergence. We prove that this approximation is asymptotically unbiased and that the proposed estimator has the variable selection consistency property in a deterministic matrix design framework. Moreover, we present a numerically efficient strategy for the computation of the proposed estimator, making it suitable for the analysis of massive counts datasets. We show with two numerical experiments that the method can be applied both to statistical learning and signal recovery problems.
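To make the approach concrete, the following Python sketch fits an adaptive \(\ell _1\)-penalized weighted least-squares surrogate to Poisson counts with identity link. The data-dependent weights \(1/(y_i+1)\), the adaptive penalties \(\hat{w}_j=1/|\hat{\beta }^{\mathrm{init}}_j|^{\gamma }\) and the plain proximal-gradient (ISTA) solver are illustrative assumptions consistent with the quantities appearing in the appendix; they are not the paper's exact definitions, and the nonnegativity constraint \((\mathbf X \beta )_i\ge 0\) used in the paper is omitted for simplicity.

```python
import numpy as np

def adaptive_l1_poisson(X, y, lam=1.0, gamma=1.0, n_iter=500):
    """Sketch: adaptive l1-penalized weighted least-squares surrogate for
    Poisson data with identity link. The weights diag(1/(y_i+1)), the adaptive
    penalties w_j = 1/|beta_init_j|^gamma and the ISTA solver are illustrative
    choices, not the exact estimator defined in the paper."""
    n, p = X.shape
    lam2 = 1.0 / (y + 1.0)                          # diagonal of Lambda^2 (assumed form)
    # pilot estimate: ridge-stabilized weighted least squares
    XtWX = X.T @ (lam2[:, None] * X)
    XtWy = X.T @ (lam2 * y)
    beta_init = np.linalg.solve(XtWX + 1e-6 * np.eye(p), XtWy)
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-12)  # adaptive weights
    # ISTA on 0.5*(y - X beta)^T Lambda^2 (y - X beta) + lam * sum_j w_j |beta_j|
    L = np.linalg.norm(np.sqrt(lam2)[:, None] * X, 2) ** 2  # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (lam2 * (y - X @ beta))
        z = beta - grad / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # soft-thresholding
    return beta

# toy usage: sparse nonnegative coefficients, Poisson counts
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(200, 50)))
beta_true = np.zeros(50)
beta_true[:3] = [5.0, 3.0, 2.0]
y = rng.poisson(X @ beta_true).astype(float)
beta_hat = adaptive_l1_poisson(X, y, lam=2.0)
print("selected support:", np.nonzero(beta_hat)[0])
```

The soft-thresholding step is the proximal operator of the weighted \(\ell _1\) penalty, so each iteration costs one matrix-vector product with \(\mathbf X \) and one with \(\mathbf X ^T\).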
References
Anscombe, F.J.: The transformation of Poisson binomial and negative-binomial data. Biometrika 35(3/4), 246–254 (1948). https://doi.org/10.2307/2332343
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
Bertero, M., Lanteri, H., Zanni, L.: Iterative image reconstruction: a point of view, pp. 37–63 (2008)
Bogdan, M., van den Berg, E., Sabatti, C., Su, W., Candès, E.J.: SLOPE-adaptive variable selection via convex optimization. Ann. Appl. Stat. 9(3), 1103 (2015)
Bonettini, S., Benvenuto, F., Zanella, R., Zanni, L., Bertero, M.: Gradient projection approaches for optimization problems in image deblurring and denoising. In: 2009 17th European Signal Processing Conference, pp. 1384–1388 (2009)
Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008). https://doi.org/10.1007/s00041-008-9045-x
De Vito, E., Rosasco, L., Caponnetto, A., Giovannini, U.D., Odone, F.: Learning from examples as an inverse problem. J. Mach. Learn. Res. 6(May), 883–904 (2005)
Dobson, A.J., Barnett, A.: An Introduction to Generalized Linear Models. CRC Press, Boca Raton (2008)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Figueiredo, M.A.T., Bioucas-Dias, J.M.: Restoration of Poissonian images using alternating direction optimization. IEEE Trans. Image Process. 19(12), 3133–3145 (2010). https://doi.org/10.1109/TIP.2010.2053941
Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, Berlin (2001)
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R., et al.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. https://doi.org/10.18637/jss.v033.i01 (2010)
Gu, R., Dogandžić, A.: A fast proximal gradient algorithm for reconstructing nonnegative signals with sparse transform coefficients. In: 2014 48th Asilomar Conference on Signals, Systems and Computers, IEEE, pp. 1662–1667 (2014)
Hansen, N.R., Reynaud-Bouret, P., Rivoirard, V., et al.: Lasso and probabilistic inequalities for multivariate point processes. Bernoulli 21(1), 83–143 (2015)
Harmany, Z.T., Marcia, R.F., Willett, R.M.: SPIRAL out of convexity: sparsity-regularized algorithms for photon-limited imaging. In: IS&T/SPIE Electronic Imaging, pp. 75330R–75330R. International Society for Optics and Photonics (2010)
Ivanoff, S., Picard, F., Rivoirard, V.: Adaptive Lasso and group-Lasso for functional Poisson regression. J. Mach. Learn. Res. 17(55), 1–46 (2016)
Jiang, X., Reynaud-Bouret, P., Rivoirard, V., Sansonnet, L., Willett, R.: A data-dependent weighted LASSO under Poisson noise. arXiv preprint arXiv:1509.08892 (2015)
Marschner, I.C., et al.: glm2: fitting generalized linear models with convergence problems. The R Journal 3(2), 12–15 (2011)
Martinez, J.G., Carroll, R.J., Müller, S., Sampson, J.N., Chatterjee, N.: Empirical performance of cross-validation with oracle methods in a genomics context. Am. Stat. 65(4), 223–228 (2011)
McCullagh, P., Nelder, J.A.: Generalized linear models, 2nd edn. Chapman & Hall, New York (1989)
Peyré, G.: The numerical tours of signal processing. Comput. Sci. Eng. 13(4), 94–97 (2011). https://doi.org/10.1109/MCSE.2011.71
Prince, J.L., Links, J.M.: Medical Imaging Signals and Systems. Pearson Prentice Hall Upper Saddle River, New Jersey (2006)
Silva, J., Tenreyro, S.: Poisson: some convergence issues. Stata Journal 11(2), 207–212 (2011)
Starck, J.L., Murtagh, F.: Astronomical Image and Data Analysis. Springer, New York (2007)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)
Zanella, R., Boccacci, P., Zanni, L., Bertero, M.: Corrigendum: efficient gradient projection methods for edge-preserving removal of Poisson noise. Inverse Probl. 29(11), 119501 (2013)
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)
Zou, H., Zhang, H.H.: On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 37(4), 1733 (2009)
Additional information
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 640216.
Appendix: proofs
In order to prove Theorem 1, we start by proving the following
Lemma 1
Let y be a Poisson random variable with mean \(\theta \). Let \(z>0\) be such that \(|z-\theta |\le c\sqrt{\theta }\), where c is a positive constant smaller than \(\sqrt{\theta }\). Then
Proof
Following Zanella et al. (2013) we obtain
where \(k\in \mathbb {N}\) and \(R_3\) is defined as follows
where \(\xi \ge -1\). By computing the moments of the Poisson random variable y centered in z we obtain
where \(r:=z-\theta \) and
To conclude we now prove that \(\mathscr {E}(\theta )=O(\frac{1}{\theta })\). Following the idea of the proof given in Zanella et al. (2013), we split the series into two parts: the first with k ranging from 0 to \([\frac{z}{2}]\), and the second with \(k \ge [\frac{z}{2}]+1\), where \([\chi ]\) denotes the integer part of \(\chi \). We observe that for k from 1 to \([\frac{z}{2}]\), or equivalently \(\xi \in (-1,-\frac{1}{2}]\), then
Since \(\frac{\theta ^s}{s!}=\frac{\theta ^s}{\varGamma (s+1)}\) is monotonically increasing for \(0\le s\le [\frac{\theta +c\sqrt{\theta }}{2}]\), using Eq. (24) and the Stirling formula we obtain
Since \(\frac{\theta +c\sqrt{\theta }}{2} - 1 \le \left[ \frac{\theta +c\sqrt{\theta }}{2} \right] \le \frac{\theta +c\sqrt{\theta }}{2}\) and \(\frac{\theta }{ \left[ \frac{\theta +c\sqrt{\theta }}{2} \right] } > 1\) for \(\theta \) large enough, the upper bound in inequality (25) can be bounded by
where \(v:=1+c\theta ^{-\frac{1}{2}}\). Then \(M(\theta )\rightarrow 0\) exponentially as \(\theta \rightarrow \infty \). Now we consider \(k\ge [\frac{z}{2}]+1\), or equivalently \(\xi >-\frac{1}{2}\). Since
we obtain
where
are the 5th and 4th moments of the Poisson random variable y centered in z. \(\square \)
Proof of Theorem 1
By the triangle inequality we have
Then, to get the thesis, thanks to Lemma 1, it is sufficient to prove that
By writing the left hand side of Eq. (29) as the difference between the second moments of a Poisson variable centered in z and in \(z+1\), we obtain
By some manipulations and by using that \(|z-\theta |\le c\sqrt{\theta }\) we get
\(\square \)
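For completeness, here is a worked instance of the elementary Poisson moment identities invoked in the last two steps; this is only a sketch, assuming \(y\sim \mathrm{Poisson}(\theta )\) and nothing else. Since \(\mathbb {E}[(y-z)^2]=\mathrm{Var}(y)+(\mathbb {E}y-z)^2\),

$$\begin{aligned} \mathbb {E}\big [(y-z)^2\big ]=\theta +(\theta -z)^2, \qquad \mathbb {E}\big [(y-z)^2\big ]-\mathbb {E}\big [(y-z-1)^2\big ]=2(\theta -z)-1, \end{aligned}$$

and the latter difference is bounded in absolute value by \(2c\sqrt{\theta }+1\) whenever \(|z-\theta |\le c\sqrt{\theta }\).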
To prove Theorem 2 we need some preliminary results. We start by defining
We observe that the components \(\epsilon _i\) are independent random variables with zero mean and \(\mathrm{Var}(\epsilon _i)=(\mathbf X \beta ^*)_i\), for all \(i\in \{1,\ldots ,n\}\). Hereafter, for ease of notation, we suppress the superscript (n) from the estimators.
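A quick Monte Carlo check of these properties of \(\epsilon \) is the following sketch; the three mean values are arbitrary toy stand-ins for \((\mathbf X \beta ^*)_i\).

```python
import numpy as np

# Monte Carlo check: eps_i = Y_i - (X beta*)_i has mean 0 and variance (X beta*)_i
# when Y_i ~ Poisson((X beta*)_i); the three mean values below are arbitrary toys.
rng = np.random.default_rng(1)
mu = np.array([0.5, 2.0, 10.0])        # stands in for (X beta*)_i
Y = rng.poisson(mu, size=(200_000, 3))
eps = Y - mu
print(eps.mean(axis=0))                # close to [0, 0, 0]
print(eps.var(axis=0))                 # close to [0.5, 2.0, 10.0]
```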
Lemma 2
There exists a constant \(G<+\infty \) such that
Proof of Lemma 2
We compute the term on the l.h.s. of Eq. (31). We have
with \(\mathbf D :=\varLambda ^2\epsilon \). By the Singular Value Decomposition we can write
where \(\mathbf U \) is an orthogonal matrix and \(\varSigma \) is a diagonal matrix containing the eigenvalues of \(\mathbf X ^T \mathbf X \). We define \(\mathbf H :=\mathbf U \mathbf D = \mathbf U \varLambda ^2 \epsilon \). The ith component of \(\mathbf H \) is given by
where \(u_{il}\) represents the (i, l)-entry of the matrix U. Since \(\frac{\epsilon _l}{Y_l+1}\) takes values between \(-\,(\mathbf X \beta ^*)_l\) and 1, we have that each component \(H_i\) takes values in a compact subset [R, S]. Therefore, as \(\mathbf H \in [R,S]^n\), the quadratic form \((\mathbf H ^T\varSigma \mathbf H )^2\) admits a maximum, i.e. there exists an \(\mathbf H ^*\) such that \((\mathbf H ^T\varSigma \mathbf H )^2\le ((\mathbf H ^*)^T\varSigma \mathbf H ^*)^2\). Then
with
\(\square \)
Corollary 2
Under assumption (H1) there exists a constant \(G<+\infty \) such that we have the following bound
where \(\hat{\beta }(\mathrm{PRLS})\) is the reweighted least squares estimator defined as follows
Proof of Corollary 2
By using the optimality conditions of the problem in Eq. (36) and the definition of \(\epsilon \) we have
Then, by using the Cauchy-Schwarz inequality we obtain
where
By assumption (H1) and Lemma 2 we have the thesis. \(\square \)
Proof of Theorem 2
We want to prove the bound in Eq. (12). From Corollary 2, since
we have to establish a bound for the first term of the r.h.s. of (40). In order to do so, we follow arguments similar to those used in the proof of Theorem 3.1 by Zou and Zhang (2009). By definition of \(\hat{\beta }_{(\mathbf {w} ,\lambda )}\), the following inequality applies
From the optimality conditions of the optimization problem in Eq. (36), we have
and we notice that
and
Using (41), (42), (43) and (44) we obtain
and finally, the Cauchy-Schwarz inequality and assumption (H1) lead to
The thesis follows from Eqs. (40), (46) and Corollary 2. \(\square \)
Proof of Theorem 3
For brevity we denote the APRiL estimator by \(\hat{\beta }\). To prove model selection consistency we show that, for \(n\rightarrow +\infty \),
and
We now prove Eq. (47). The functional defined in Eq. (8) is convex but not differentiable, and \(\mathscr {C}\) is a convex set. Then the solution \(\hat{\beta }\) is characterized by the Karush-Kuhn-Tucker (KKT) optimality conditions:
- \((\mathbf X \hat{\beta })_i\ge 0\) \(\forall i\in \{1,\ldots ,n\}\);
- \(\nu _i\ge 0\) \(\forall i\in \{1,\ldots ,n\}\);
- \(\nu _i (\mathbf X \hat{\beta })_i=0\) \(\forall i\in \{1,\ldots ,n\}\);
- if \(\hat{\beta }_j\ne 0\),
  $$\begin{aligned} -\mathbf x ^T_j\varLambda ^2(\mathbf Y -\mathbf X \hat{\beta })+\lambda \hat{w}_j \,\mathrm{sgn}(\hat{\beta }_j)-\mathbf x _j^T\nu =0; \end{aligned}$$ (49)
- if \(\hat{\beta }_j=0\),
  $$\begin{aligned} -\mathbf x _j^T\varLambda ^2(\mathbf Y -\mathbf X \hat{\beta })+\lambda \hat{w}_j s_j-\mathbf x _j^T\nu =0, \end{aligned}$$ (50)
  with \(s_j\in [-1,1]\).
Here \(\nu \) is the n-dimensional vector whose components are the Lagrange multipliers. Thanks to the KKT conditions, the event \(\{ \forall j\in (\mathscr {A^*})^C, \hat{\beta }_j=0 \}\) can be written as
where \(|s_j|\le 1\) [see Eq. (50)], \(\mathbf X _{\mathscr {A^*}}\) is the matrix formed by the columns \(\mathbf x _j\) and \(\hat{\beta }_{\mathscr {A^*}}\) is the vector formed by the components \(\hat{\beta }_j\) with \(j\in \mathscr {A^*}\). By taking the absolute value of each equation in (51), the event takes the form
This implies that Eq. (47) is equivalent to \(\mathbb {P}\left( \exists j\in (\mathscr {A^*})^C, \left| \mathbf x ^T_j(\varLambda ^2(\mathbf Y -\mathbf X _{\mathscr {A^*}}\hat{\beta }_{\mathscr {A^*}})+\nu )\right| >\lambda \hat{w}_j\right) \rightarrow 0\) for \(n\rightarrow +\infty \). We set
and
Then
The idea is to determine three bounds \(M_1\), \(M_2\) and \(M_3\) depending on n, such that
and \(M_1\), \(M_2\) and \(M_3\) go to 0 for \(n\rightarrow +\infty \). Let us start with the determination of bound \(M_1\). Using Corollary 1, it follows that
For the determination of bound \(M_2\), we use again Corollary 1. We have that
Finally, for the determination of bound \(M_3\), we write
By definition of \(\epsilon \), we have
By using assumption (H4), we get
and
where we used that \(\mathbb {E}\left( \frac{\epsilon _i^2}{Y_i+1}\right) \le 2\) for all \(i\in \{1,\ldots ,n\}\). Following the idea of the proof of Lemma 2, in particular the computation leading to (35), we have
Thanks to the Cauchy-Schwarz inequality, Eqs. (57), (58), (59), (60) and hypothesis (H1) we obtain the following bound
From optimality conditions in Eqs. (49) and (50) it follows that
so we have
where we have used the following bound
Then, we obtain
Now we prove that \(M_1\), \(M_2\) and \(M_3\) go to 0 for \(n\rightarrow +\infty \).
because \(\frac{\sqrt{n} \lambda \left( \frac{2}{\eta }\right) ^{\gamma }}{n^2} \longrightarrow 0\), \(\frac{\sqrt{n}\lambda n^{1+\delta \gamma }}{n^{2}}\longrightarrow 0\) and \(\frac{\lambda }{n}\left( \frac{2}{\eta }\right) ^{\gamma }\longrightarrow 0\) for \(n\rightarrow +\infty \), by assumption (H3 c) and by the positivity of the constants \(\gamma \) and \(\delta \);
because \(\frac{\lambda _1}{\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }} n}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumptions (H2) and (H3 a), \(\frac{\sqrt{n}}{n\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}}=\frac{1}{\left( \lambda n^{\frac{\gamma }{2}-1}\right) ^{\frac{1}{\gamma }}}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumption (H3 a), and finally \(\frac{1}{\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }} n^{\frac{1}{\gamma }+\delta }}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumption (H3 b);
because \(\frac{\lambda _1}{n \eta } \longrightarrow 0\) for \(n\rightarrow +\infty \), by assumption (H2) and the definition of \(\eta \), and \(\frac{\sqrt{n}}{n \eta }=O(\frac{1}{\sqrt{n}})\) for \(n\rightarrow +\infty \).
Now we prove Eq. (48). It is sufficient to show that
By Eq. (60) we have
where
Since \(\min _{j\in \mathscr {A}^*}|\beta _j^*|>0\), to conclude we show that \(\Vert \beta ^*_{\mathscr {A^*}}-\hat{\beta }(\mathrm{PRLS})_{\mathscr {A^*}}\Vert _2\) and \(\frac{2\lambda \sqrt{p}\hat{\eta }^{-\gamma }}{\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}\) go to 0 in probability. Equation (35) implies that the second term on the r.h.s. of (64) goes to zero. Moreover, for the second term in Eq. (63) we have that, given \(M>0\)
as \(M_1\rightarrow 0\) for \(n\rightarrow +\infty \) and \(\frac{\lambda }{n }\left( \frac{2}{\eta }\right) ^{\gamma }\rightarrow 0\) for \(n\rightarrow +\infty \) thanks to assumption (H3 c). This proves Eq. (48) and concludes the proof. \(\square \)
Proof of Proposition 1
The proof is analogous to that of Theorem 3. We start by proving that Eq. (47) holds. It is sufficient to show that the bounds \(M_1\), \(M_2\) and \(M_3\) go to 0 as \(n\rightarrow \infty \) under assumptions (H5), (H6), (H7) and (H8). We have that \(M_1 \rightarrow 0\) since \(\frac{\lambda _1 \sqrt{p}}{\eta n}\rightarrow 0\) by assumption (H7 a) and \(\frac{\sqrt{p}}{\sqrt{n}\eta }\rightarrow 0\) by assumption (H6). Moreover, \(M_2 \rightarrow 0\) since \(\left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }}\frac{\lambda _1 \sqrt{p}}{n}=\left( \frac{\lambda n^{\delta \gamma }}{p^{\gamma }}\right) ^{-\frac{1}{\gamma }} \frac{\lambda _1 n^{\delta +\frac{1}{\gamma }-1}}{\sqrt{p}}\rightarrow 0\) by assumptions (H8 b) and (H7 b), \(\left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }} \frac{\sqrt{p}}{\sqrt{n}}=\left( \frac{\lambda n^{\delta \gamma }}{p^{\gamma }}\right) ^{-\frac{1}{\gamma }} \frac{n^{-\frac{1}{2}+\delta +\frac{1}{\gamma }}}{\sqrt{p} }\rightarrow 0\) by assumption (H8 b) and by the hypothesis \(\frac{1}{\gamma }+\delta <\frac{c+1}{2}\), and \(p \left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }} n^{-\frac{1}{\gamma }-\delta }\rightarrow 0\) by assumption (H8 b). Finally, \(M_3 \rightarrow 0\) since the first two terms in the definition of \(M_3\) go to 0 by assumption (H5), \(\frac{\lambda p}{n \eta ^{\gamma }}\frac{\sqrt{p}}{\sqrt{n}}\rightarrow 0\) by assumptions (H5) and (H8 c), \(\frac{p}{n}\frac{\lambda n^{1+\delta \gamma }}{\sqrt{n}}\sqrt{p}= \lambda n^{\delta \gamma -\frac{1}{2}} p \sqrt{p}\rightarrow 0\) by assumption (H8 a) and \(\frac{p}{n}\frac{\lambda }{\eta ^{\gamma }}\rightarrow 0\) by assumption (H8 c).
Now we prove that Eq. (48) holds. It is sufficient to observe that, as \(n\rightarrow \infty \), the bound in Eq. (35) goes to 0 by assumption (H5) and the bound in Eq. (65) goes to 0 by assumption (H8 c). \(\square \)
Guastavino, S., Benvenuto, F. A consistent and numerically efficient variable selection method for sparse Poisson regression with applications to learning and signal recovery. Stat Comput 29, 501–516 (2019). https://doi.org/10.1007/s11222-018-9819-1
Keywords
- Adaptive regularization
- Lasso
- Model selection
- Sparse Poisson regression
- Statistical learning
- Image processing