
A consistent and numerically efficient variable selection method for sparse Poisson regression with applications to learning and signal recovery


Abstract

We propose an adaptive \(\ell _1\)-penalized estimator in the framework of Generalized Linear Models with identity link and Poisson data, by taking advantage of a globally quadratic approximation of the Kullback-Leibler divergence. We prove that this approximation is asymptotically unbiased and that the proposed estimator has the variable selection consistency property in a deterministic matrix design framework. Moreover, we present a numerically efficient strategy for the computation of the proposed estimator, making it suitable for the analysis of massive count datasets. We show with two numerical experiments that the method can be applied to both statistical learning and signal recovery problems.
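As a rough illustration of the approach summarised above, the following sketch (not the authors' implementation; the pilot estimate, the parameter values and the omission of the nonnegativity constraint \(\mathbf X \beta \ge 0\) are simplifications for illustration only) combines the quadratic approximation \(\frac{1}{2}\Vert \varLambda (\mathbf {Y}-\mathbf {X}\beta )\Vert ^2_2\), with \(\varLambda =\mathrm {diag}(1/\sqrt{Y_i+1})\), with an adaptive \(\ell _1\) penalty, and solves the resulting weighted lasso problem by a plain proximal-gradient (ISTA) loop.

```python
import numpy as np

def adaptive_poisson_lasso_sketch(X, Y, lam, gamma=1.0, delta=0.1, n_iter=1000):
    """Illustrative sketch (not the authors' implementation) of an adaptive
    l1-penalized estimator built on the quadratic approximation
        0.5 * || Lambda (Y - X beta) ||_2^2,  Lambda = diag(1 / sqrt(Y_i + 1)),
    of the Kullback-Leibler divergence for Poisson data.  The weights follow
    the form used in the appendix,
        w_j = (|beta_pilot_j| + n ** -(1/gamma + delta)) ** (-gamma),
    but the pilot is taken here, for simplicity, as the unpenalized reweighted
    least-squares solution of Eq. (36); the constraint X beta >= 0 is not enforced."""
    n, p = X.shape
    lam2 = 1.0 / (Y + 1.0)                      # diagonal of Lambda^2
    A = X.T @ (lam2[:, None] * X)               # X^T Lambda^2 X
    b = X.T @ (lam2 * Y)                        # X^T Lambda^2 Y
    beta_pilot = np.linalg.solve(A, b)          # pilot estimate; assumes A invertible, cf. (H1)
    w = (np.abs(beta_pilot) + n ** -(1.0 / gamma + delta)) ** (-gamma)

    L = np.linalg.eigvalsh(A).max()             # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):                     # plain proximal-gradient (ISTA) loop
        z = beta - (A @ beta - b) / L           # gradient step on the quadratic term
        beta = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # weighted soft-thresholding
    return beta
```

A FISTA-type acceleration (Beck and Teboulle 2009) or pathwise coordinate descent (Friedman et al. 2010), both cited in the references below, would be natural replacements for the plain ISTA iteration.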


References

  • Anscombe, F.J.: The transformation of Poisson binomial and negative-binomial data. Biometrika 35(3/4), 246–254 (1948). https://doi.org/10.2307/2332343

  • Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542

  • Bertero, M., Lanteri, H., Zanni, L.: Iterative image reconstruction: a point of view. pp 37–63 (2008)

  • Bogdan, M., van den Berg, E., Sabatti, C., Su, W., Candès, E.J.: SLOPE-adaptive variable selection via convex optimization. Ann. Appl. Stat. 9(3), 1103 (2015)

  • Bonettini, S., Benvenuto, F., Zanella, R., Zanni, L., Bertero, M.: Gradient projection approaches for optimization problems in image deblurring and denoising. In: 2009 17th European Signal Processing Conference, pp. 1384–1388 (2009)

  • Candès, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell \)1 minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008). https://doi.org/10.1007/s00041-008-9045-x

  • De Vito, E., Rosasco, L., Caponnetto, A., Giovannini, U.D., Odone, F.: Learning from examples as an inverse problem. J. Mach. Learn. Res. 6(May), 883–904 (2005)

  • Dobson, A.J., Barnett, A.: An Introduction to Generalized Linear Models. CRC Press, Boca Raton (2008)

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)

  • Figueiredo, M.A.T., Bioucas-Dias, J.M.: Restoration of Poissonian images using alternating direction optimization. IEEE Trans. Image Process. 19(12), 3133–3145 (2010). https://doi.org/10.1109/TIP.2010.2053941

  • Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning, vol. 1. Springer, Berlin (2001)

  • Friedman, J., Hastie, T., Höfling, H., Tibshirani, R., et al.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010). https://doi.org/10.18637/jss.v033.i01

  • Gu, R., Dogandžić, A.: A fast proximal gradient algorithm for reconstructing nonnegative signals with sparse transform coefficients. In: 2014 48th Asilomar Conference on Signals, Systems and Computers, pp. 1662–1667. IEEE (2014)

  • Hansen, N.R., Reynaud-Bouret, P., Rivoirard, V., et al.: Lasso and probabilistic inequalities for multivariate point processes. Bernoulli 21(1), 83–143 (2015)

  • Harmany, Z.T., Marcia, R.F., Willett, R.M.: SPIRAL out of convexity: sparsity-regularized algorithms for photon-limited imaging. In: IS&T/SPIE Electronic Imaging, p. 75330R. International Society for Optics and Photonics (2010)

  • Ivanoff, S., Picard, F., Rivoirard, V.: Adaptive Lasso and group-Lasso for functional Poisson regression. J. Mach. Learn. Res. 17(55), 1–46 (2016)

  • Jiang, X., Reynaud-Bouret, P., Rivoirard, V., Sansonnet, L., Willett, R.: A data-dependent weighted LASSO under Poisson noise. arXiv preprint arXiv:1509.08892 (2015)

  • Marschner, I.C., et al.: Glm2: fitting generalized linear models with convergence problems. The R Journal 3(2), 12–15 (2011)

  • Martinez, J.G., Carroll, R.J., Müller, S., Sampson, J.N., Chatterjee, N.: Empirical performance of cross-validation with oracle methods in a genomics context. Am. Stat. 65(4), 223–228 (2011)

  • McCullagh, P., Nelder, J.A.: Generalized linear models, 2nd edn. Chapman & Hall, New York (1989)

  • Peyré, G.: The numerical tours of signal processing. Comput. Sci. Eng. 13(4), 94–97 (2011). https://doi.org/10.1109/MCSE.2011.71

  • Prince, J.L., Links, J.M.: Medical Imaging Signals and Systems. Pearson Prentice Hall Upper Saddle River, New Jersey (2006)

  • Silva, J., Tenreyro, S.: Poisson: some convergence issues. Stata Journal 11(2), 207–212 (2011)

  • Starck, J.L., Murtagh, F.: Astronomical Image and Data Analysis. Springer, New York (2007)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)

  • Zanella, R., Boccacci, P., Zanni, L., Bertero, M.: Corrigendum: efficient gradient projection methods for edge-preserving removal of Poisson noise. Inverse Probl. 29(11), 119501 (2013)

  • Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)

  • Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)

  • Zou, H., Zhang, H.H.: On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 37(4), 1733 (2009)

Author information

Corresponding author

Correspondence to Sabrina Guastavino.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 640216.

Appendix: proofs

In order to prove Theorem 1, we start by proving the following

Lemma 1

Let y be a Poisson random variable with mean \(\theta \). Let \(z>0\) be such that \(|z-\theta |\le c\sqrt{\theta }\), where c is a positive constant smaller than \(\sqrt{\theta }\). Then

$$\begin{aligned} \mathbb {E}\left( D(z,y)-\frac{1}{2}\frac{(y-z)^2}{z}\right) = O\left( \frac{1}{\theta }\right) , \quad \text { as } \theta \rightarrow \infty . \end{aligned}$$
(22)

Proof

Following Zanella et al. (2013) we obtain

$$\begin{aligned} D(z,k)&= \frac{1}{2}\frac{(k-z)^2}{z}-\frac{1}{6}\frac{(k-z)^3}{z^2}+\frac{1}{3}\frac{(k-z)^4}{z^3} \\&\quad +\, k R_3\left( \frac{k-z}{z}\right) , \end{aligned}$$

where \(k\in \mathbb {N}\) and \(R_3\) is defined as follows

$$\begin{aligned} R_3(\xi )=\int _{0}^{\xi } \frac{(t-\xi )^3}{(1+t)^4} dt, \end{aligned}$$

where \(\xi \ge -1\). By computing the moments of the Poisson random variable y centered in z we obtain

$$\begin{aligned}&\mathbb {E}\left( D(z,y)-\frac{1}{2}\frac{(y-z)^2}{z}\right) =-\frac{1}{6}\frac{\theta -3\theta r-r^3}{z^2}\nonumber \\&\quad +\,\frac{1}{3}\frac{3\theta ^2+6\theta r^2-4\theta r+\theta + r^4}{z^3}+\mathscr {E}(\theta ), \end{aligned}$$
(23)

where \(r:=z-\theta \) and

$$\begin{aligned} \mathscr {E}(\theta )=\mathbb {E}\left( y R_3\left( \frac{y-z}{z}\right) \right) =\sum _{k=1}^{\infty } \frac{e^{-\theta }\theta ^k}{k!} k R_3\left( \frac{k-z}{z}\right) . \end{aligned}$$

To conclude, we now prove that \(\mathscr {E}(\theta )=O(\frac{1}{\theta })\). Following the idea of the proof given in Zanella et al. (2013), we split the series into two parts: in the first, k ranges from 1 to \([\frac{z}{2}]\); in the second, \(k \ge [\frac{z}{2}]+1\), where \([\chi ]\) denotes the integer part of \(\chi \). We observe that for k from 1 to \( [\frac{z}{2}]\), or equivalently \(\xi \in (-\,1,-\,\frac{1}{2}]\), we have

$$\begin{aligned} (1+\xi )|R_3(\xi )|\le \frac{1}{e}. \end{aligned}$$
(24)

Since \(\frac{\theta ^s}{s!}=\frac{\theta ^s}{\varGamma (s+1)}\) is monotonically increasing for \(0\le s\le [\frac{\theta +c\sqrt{\theta }}{2}]\), using Eq. (24) and Stirling's formula we obtain

$$\begin{aligned}&\left| \sum _{k=1}^{[\frac{z}{2}]}\frac{e^{-\theta }\theta ^k}{k!} k R_3\left( \frac{k-z}{z}\right) \right| \le \frac{1}{e} \sum _{k=1}^{[\frac{z}{2}]} z e^{-\theta }\frac{\theta ^k}{k!} \nonumber \\&\quad \le \frac{1}{e} \sum _{k=1}^{[\frac{\theta +c\sqrt{\theta }}{2}]} z e^{-\theta }\frac{\theta ^k}{k!}\le \frac{1}{e} \left[ \frac{\theta +c\sqrt{\theta }}{2}\right] e^{-\theta } \frac{\theta ^{[\frac{\theta +c\sqrt{\theta }}{2}]}}{[\frac{\theta +c\sqrt{\theta }}{2}]!} z \nonumber \\&\quad \le \frac{e^{-\theta -1+[\frac{\theta +c\sqrt{\theta }}{2}]}}{\sqrt{2\pi }}\left( \frac{\theta }{[\frac{\theta +c\sqrt{\theta }}{2}]}\right) ^{[\frac{\theta +c\sqrt{\theta }}{2}]} \left[ \frac{\theta +c\sqrt{\theta }}{2}\right] ^{\frac{1}{2}} z \, . \end{aligned}$$
(25)

Since \(\frac{\theta +c\sqrt{\theta }}{2} - 1 \le \left[ \frac{\theta +c\sqrt{\theta }}{2} \right] \le \frac{\theta +c\sqrt{\theta }}{2}\) and \(\frac{\theta }{ \left[ \frac{\theta +c\sqrt{\theta }}{2} \right] } > 1\) for \(\theta \) large enough, the upper bound in inequality (25) can be bounded by

$$\begin{aligned} M(\theta ) := \frac{e^{-\frac{1}{2}\theta \left( 1+2\theta ^{-1}-c\theta ^{-\frac{1}{2}}- v\log \left( \frac{2}{ v-2\theta ^{-1}}\right) \right) } (\theta v)^{\frac{3}{2}}}{2\sqrt{\pi }} \end{aligned}$$
(26)

where \(v:=1+c\theta ^{-\frac{1}{2}}\). Then \(M(\theta )\rightarrow 0\) exponentially as \(\theta \rightarrow \infty \). Now we consider \(k\ge [\frac{z}{2}]+1\), or equivalently \(\xi >-\frac{1}{2}\). Since

$$\begin{aligned} |R_3(\xi )|\le 4 \xi ^4, \end{aligned}$$
(27)

we obtain

$$\begin{aligned}&\left| \sum _{k=[\frac{z}{2}]+1}^{\infty }\frac{e^{-\theta }\theta ^k}{k!} k R_3\left( \frac{k-z}{z}\right) \right| \\&\quad \le 4\sum _{k=0}^{\infty }\frac{e^{-\theta }\theta ^k}{k!} k \left( \frac{k-z}{z}\right) ^4 \\&\quad = 4\mathbb {E}\left( \frac{(y-z)^5}{z^4}+\frac{(y-z)^4}{z^3}\right) \\&\quad = 4\frac{3\theta ^3+6\theta ^2 r^2-16\theta ^2 r+11\theta ^2+ \theta (r-1)^4}{z^4} \\&\quad \le 4 \frac{(3+6c^2)\theta ^3+16 c \theta ^2\sqrt{\theta }+11\theta ^2+\theta (c\sqrt{\theta }+1)^4}{(\theta -c\sqrt{\theta })^4}\\&\quad =O\left( \frac{1}{\theta }\right) , \end{aligned}$$

where

$$\begin{aligned} \mathbb {E}((y-z)^5)&= \theta ^5-5\theta ^4(z-2)+5\theta ^3(2z^2-6z+5)\\&\quad -\,5\theta ^2(2z^3-6z^2+7z-3)\\&\quad +\,\theta (5z^4-10z^3+10z^2-5z+1)-z^5 \end{aligned}$$
$$\begin{aligned} \mathbb {E}((y-z)^4)&= \theta ^4+\theta ^3(6-4z)+\theta ^2(6z^2-12z+7)\\&\quad +\,\theta (-4z^3+6z^2-4z+1)+z^4. \end{aligned}$$

are the 5th and 4th moments of the Poisson random variable y centered in z. \(\square \)

Proof of Theorem 1

By the triangle inequality we have

$$\begin{aligned}&\left| \mathbb {E}\left( D(z,y)-\frac{1}{2}\frac{(y-z)^2}{y+1}\right) \right| \nonumber \\&\quad \le \left| \mathbb {E}\left( D(z,y)-\frac{1}{2}\frac{(y-z)^2}{z}\right) \right| \nonumber \\&\qquad +\,\left| \mathbb {E}\left( \frac{1}{2}\frac{(y-z)^2}{z}-\frac{1}{2}\frac{(y-z)^2}{y+1}\right) \right| . \end{aligned}$$
(28)

Then, to obtain the claim, thanks to Lemma 1 it is sufficient to prove that

$$\begin{aligned} \mathbb {E}\left( \frac{(y-z)^2}{z}-\frac{(y-z)^2}{y+1}\right) = O\left( \frac{1}{\sqrt{\theta }}\right) ,\quad \text { as } \theta \rightarrow \infty .\nonumber \\ \end{aligned}$$
(29)

By writing the left-hand side of Eq. (29) as the difference between the second moments of a Poisson variable centered in z and in \(z+1\), we obtain

$$\begin{aligned}&\frac{1}{z}\mathbb {E}\left( (y-z)^2\right) -\frac{1}{\theta }\mathbb {E}\left( (y-z-1)^2\right) +\frac{1}{\theta }e^{-\theta }(z+1)^2 \\&\quad = \frac{\theta ^2-2\theta z+\theta +z^2}{z} \\&\qquad -\,\frac{\theta ^2-\theta (2z+1)+(z+1)^2-(z+1)^2 e^{-\theta }}{\theta }. \end{aligned}$$

By some manipulations and by using that \(|z-\theta |\le c\sqrt{\theta }\) we get

$$\begin{aligned}&\left| \mathbb {E}\left( \frac{(y-z)^2}{z}-\frac{(y-z)^2}{y+1}\right) \right| \\&\quad \le e^{-\theta }\left| \frac{(z+1)^2}{\theta }\right| +\left| \frac{(\theta -z)^3}{z\theta }\right| \\&\qquad +\,\left| \frac{(\theta -z)^2}{z\theta }\right| +\left| \frac{3(\theta -z)}{\theta }-\frac{1}{\theta }\right| \\&\quad \le e^{-\theta }\frac{(\theta +c\sqrt{\theta }+1)^2}{\theta }+\frac{c^3}{\sqrt{\theta }-c}\\&\qquad +\,\frac{c^2}{\theta -c\sqrt{\theta }}+\frac{3c}{\sqrt{\theta }}+\frac{1}{\theta } = O\left( \frac{1}{\sqrt{\theta }}\right) . \end{aligned}$$

\(\square \)
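As a numerical sanity check of the asymptotic unbiasedness established above, the following sketch evaluates \(\mathbb {E}\left( D(z,y)-\frac{1}{2}\frac{(y-z)^2}{y+1}\right) \) by truncated summation of the Poisson probability mass function, assuming, consistently with the expansion used in the proof of Lemma 1, that \(D(z,k)=k\log (k/z)-k+z\); the value of c and the truncation point are arbitrary choices. The gap vanishes as \(\theta \) grows, in agreement with the \(O(1/\sqrt{\theta })\) rate given by Lemma 1 and Eq. (29).

```python
import numpy as np
from scipy.special import gammaln

def approximation_gap(theta, c=0.5, n_sd=20.0):
    """Truncated-sum evaluation of E[ D(z, y) - (y - z)^2 / (2 (y + 1)) ]
    for y ~ Poisson(theta), with z = theta + c * sqrt(theta) and
    D(z, k) = k * log(k / z) - k + z  (0 * log 0 := 0)."""
    z = theta + c * np.sqrt(theta)
    k = np.arange(int(theta + n_sd * np.sqrt(theta)) + 10)
    pmf = np.exp(-theta + k * np.log(theta) - gammaln(k + 1))    # Poisson probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        k_log_k = np.where(k > 0, k * np.log(k / z), 0.0)
    D = k_log_k - k + z
    quad = 0.5 * (k - z) ** 2 / (k + 1.0)
    return np.sum(pmf * (D - quad))

for theta in [1e1, 1e2, 1e3, 1e4]:
    gap = approximation_gap(theta)
    # the gap vanishes as theta grows, while sqrt(theta) * gap stays bounded
    print(theta, gap, np.sqrt(theta) * gap)
```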

To prove Theorem 2 we need some preliminary results. We start by defining

$$\begin{aligned} \epsilon :=\mathbf {Y}-\mathbf X \beta ^*. \end{aligned}$$
(30)

We observe that the components \(\epsilon _i\) are independent random variables with zero mean and \(\mathrm{Var}(\epsilon _i)=(\mathbf X \beta ^*)_i\), for all \(i\in \{1,\ldots ,n\}\). Hereafter, for ease of notation, we suppress the superscript (n) from the estimators.

Lemma 2

There exists a constant \(G<+\infty \) such that

$$\begin{aligned} \mathbb {E}\left( \Vert \mathbf {X}^T\varLambda ^2\epsilon \Vert ^4_2\right) \le p^2 G^2 (\tau _{\max }(\mathbf X ^T \mathbf X ))^2. \end{aligned}$$
(31)

Proof of Lemma 2

We compute the term on the l.h.s. of Eq. (31). We have

$$\begin{aligned} \mathbb {E}\left( \Vert \mathbf X ^T\varLambda ^2\epsilon \Vert ^4_2\right) =\mathbb {E}( (\mathbf D ^T \mathbf X \mathbf X ^T \mathbf D )^2 ) \end{aligned}$$
(32)

with \(\mathbf D :=\varLambda ^2\epsilon \). Using the singular value decomposition, we can write

$$\begin{aligned} \mathbf X \mathbf X ^T= \mathbf U ^T \varSigma \mathbf U ~, \end{aligned}$$

where \(\mathbf U \) is an orthogonal matrix and \(\varSigma \) a diagonal matrix containing the eigenvalues of \(\mathbf X ^T \mathbf X \). We define \(\mathbf H :=\mathbf U \mathbf D = \mathbf U \varLambda ^2 \epsilon \). The ith component of \(\mathbf H \) is given by

$$\begin{aligned} H_i= \sum _{l=1}^{n} u_{il}\frac{\epsilon _l}{Y_l+1} ~, \end{aligned}$$

where \(u_{il}\) represents the (i, l)-entry of the matrix \(\mathbf U \). Since \(\frac{\epsilon _l}{Y_l+1}\) takes values between \(-\,(\mathbf X \beta ^*)_l\) and 1, we have that each component \(H_i\) takes values in a compact subset [R, S]. Therefore, as \(\mathbf H \in [R,S]^n\), the quadratic form \((\mathbf H ^T\varSigma \mathbf H )^2\) admits a maximum, i.e. there exists an \(\mathbf H ^*\) such that \((\mathbf H ^T\varSigma \mathbf H )^2\le ((\mathbf H ^*)^T\varSigma \mathbf H ^*)^2\). Then

$$\begin{aligned} \mathbb {E}\left( \Vert \mathbf X ^T\varLambda ^2\epsilon \Vert ^4_2\right)&=\mathbb {E}((\mathbf H ^T \varSigma \mathbf H )^2)\le ((\mathbf H ^*)^T\varSigma \mathbf H ^*)^2 \nonumber \\&\le p^2 G^2 (\tau _{\max }(\mathbf X ^T \mathbf X ))^2, \end{aligned}$$
(33)

with

$$\begin{aligned} G:=\max _{\begin{array}{c} i\in \{1,\ldots ,n\} : \\ \varSigma _{ii}\ne 0 \end{array}}(H^*_i)^2. \end{aligned}$$
(34)

\(\square \)

Corollary 2

Under assumption (H1) there exists a constant \(G<+\infty \) such that we have the following bound

$$\begin{aligned} \mathbb {E}(\Vert \hat{\beta }(\mathrm{PRLS})-\beta ^*\Vert ^2_2) \le \frac{p G B n}{ (bn)^2}, \end{aligned}$$
(35)

where \(\hat{\beta }(\mathrm{PRLS})\) is the reweighted least squares estimator defined as follows

$$\begin{aligned} \hat{\beta }(\mathrm{PRLS})=\arg \min _{\beta }\frac{1}{2}\Vert \varLambda (\mathbf {Y}-\mathbf {X}\beta )\Vert ^2_2~. \end{aligned}$$
(36)

Proof of Corollary 2

By using the optimality conditions of the problem in Eq. (36) and the definition of \(\epsilon \), we have

$$\begin{aligned} \hat{\beta }(\mathrm{PRLS})-\beta ^*=(\mathbf X ^T\varLambda ^2 \mathbf X )^{-1} (\mathbf X ^T\varLambda ^2\epsilon )~. \end{aligned}$$
(37)

Then, by using the Cauchy-Schwarz inequality we obtain

$$\begin{aligned}&\mathbb {E}(\Vert \hat{\beta }(\mathrm{PRLS})-\beta ^*\Vert ^2_2)\nonumber \\&\quad \le \sqrt{\mathbb {E}(\Vert (\mathbf X ^T\varLambda ^2 \mathbf X )^{-1}\Vert ^4_2)\mathbb {E}(\Vert \mathbf X ^T\varLambda ^2\epsilon \Vert ^4_2)}, \end{aligned}$$
(38)

where

$$\begin{aligned} \Vert (\mathbf X ^T\varLambda ^2 \mathbf X )^{-1}\Vert ^4_2= \frac{1}{(\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X ))^4}~. \end{aligned}$$
(39)

By assumption (H1) and Lemma 2 the claim follows. \(\square \)
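For concreteness, the following sketch (not the authors' code; design, \(\beta ^*\), sample sizes and the number of replications are arbitrary choices) computes \(\hat{\beta }(\mathrm{PRLS})\) through the closed-form solution of the problem in Eq. (36) and illustrates numerically the \(O(1/n)\) decay of the mean squared error suggested by the bound in Eq. (35).

```python
import numpy as np

rng = np.random.default_rng(0)

def prls(X, Y):
    """Closed-form solution of the reweighted least-squares problem in Eq. (36):
    beta_hat = (X^T Lambda^2 X)^{-1} X^T Lambda^2 Y, with Lambda^2 = diag(1 / (Y_i + 1))."""
    lam2 = 1.0 / (Y + 1.0)
    A = X.T @ (lam2[:, None] * X)
    b = X.T @ (lam2 * Y)
    return np.linalg.solve(A, b)

# Monte Carlo illustration of the O(1/n) decay of E || beta_hat(PRLS) - beta* ||_2^2
beta_star = np.array([2.0, 0.0, 1.0, 0.0, 3.0])
for n in (200, 2000, 20000):
    errs = []
    for _ in range(200):
        X = rng.uniform(0.5, 1.5, size=(n, beta_star.size))   # nonnegative design, so X @ beta_star > 0
        Y = rng.poisson(X @ beta_star).astype(float)
        errs.append(np.sum((prls(X, Y) - beta_star) ** 2))
    print(n, np.mean(errs))   # mean squared error roughly proportional to 1/n
```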

Proof of Theorem 2

We want to prove the bound in Eq. (12). Since

$$\begin{aligned} \mathbb {E}(\Vert \hat{\beta }_{(\mathbf {w},\lambda )}-\beta ^*\Vert ^2_2)&\le \mathbb {E}(\Vert \hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})\Vert ^2_2) \nonumber \\&\quad +\,\mathbb {E}(\Vert \hat{\beta }(\mathrm{PRLS})-\beta ^*\Vert ^2_2), \end{aligned}$$
(40)

and since Corollary 2 provides a bound for the second term on the r.h.s. of (40), it remains to establish a bound for the first term. In order to do so, we follow arguments similar to those in the proof of Theorem 3.1 by Zou and Zhang (2009). By definition of \(\hat{\beta }_{(\mathbf {w} ,\lambda )}\), the following inequality holds

$$\begin{aligned}&\frac{1}{2}\Vert \varLambda (\mathbf Y -\mathbf X \hat{\beta }_{(\mathbf {w} ,\lambda )})\Vert ^2_2-\frac{1}{2}\Vert \varLambda (\mathbf Y -\mathbf X \hat{\beta }(\mathrm{PRLS}))\Vert ^2_2 \nonumber \\&\quad \le \lambda \sum _{j=1}^{p}w_j(|\hat{\beta }(\mathrm{PRLS})_j|-|(\hat{\beta }_{(\mathbf {w} ,\lambda )})_j|). \end{aligned}$$
(41)

From the optimality conditions of the optimization problem in Eq. (36), we have

$$\begin{aligned}&\frac{1}{2}\Vert \varLambda (\mathbf Y -\mathbf X \hat{\beta }_{(\mathbf {w} ,\lambda )})\Vert ^2_2-\frac{1}{2}\Vert \varLambda (\mathbf Y -\mathbf X \hat{\beta }(\mathrm{PRLS}))\Vert ^2_2\nonumber \\&\quad =\frac{1}{2}(\hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS}))^T \mathbf X ^T\varLambda ^2 \mathbf X (\hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})), \end{aligned}$$
(42)

and we notice that

$$\begin{aligned}&\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )\Vert \hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})\Vert ^2_2 \nonumber \\&\quad \le (\hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS}))^T \mathbf X ^T\varLambda ^2 \mathbf X (\hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})) \end{aligned}$$
(43)

and

$$\begin{aligned}&\sum _{j=1}^{p}w_j(|\hat{\beta }(\mathrm{PRLS})_j | -|(\hat{\beta }_{(\mathbf {w} ,\lambda )})_j|) \nonumber \\&\quad \le \sqrt{\sum _{j=1}^{p}w_j^2}\Vert \hat{\beta }(\mathrm{PRLS})-\hat{\beta }_{(\mathbf {w} ,\lambda )}\Vert _2. \end{aligned}$$
(44)

Using (41), (42), (43) and (44) we obtain

$$\begin{aligned} \Vert \hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})\Vert _2\le \frac{2\lambda \sqrt{\sum _{j=1}^{p} w_j^2}}{\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}, \end{aligned}$$
(45)

and finally, the Cauchy-Schwarz inequality and assumption (H1) lead to

$$\begin{aligned} \mathbb {E}(\Vert \hat{\beta }_{(\mathbf {w} ,\lambda )}-\hat{\beta }(\mathrm{PRLS})\Vert _2^2)\le \frac{4\lambda ^2 \sqrt{\mathbb {E}\left( \left( \sum _{j=1}^{p} w_j^2\right) ^2\right) }}{(b n)^{2}}. \end{aligned}$$
(46)

The thesis follows from Eqs. (40), (46) and Corollary 2. \(\square \)

Proof of Theorem 3

For brevity we denote the APRiL estimator by \(\hat{\beta }\). To prove the model selection consistency we prove that for \(n\rightarrow +\infty \)

$$\begin{aligned} \mathbb {P}(\forall j\in (\mathscr {A^*})^C,\quad \hat{\beta }_j=0)\longrightarrow 1 \end{aligned}$$
(47)

and

$$\begin{aligned} \mathbb {P}(\forall j\in \mathscr {A^*},\quad |\hat{\beta }_j|>0)\longrightarrow 1 ~. \end{aligned}$$
(48)

We now prove Eq. (47). The functional defined in Eq. (8) is convex and non-differentiable, and \(\mathscr {C}\) is a convex set. Then the solution \(\hat{\beta }\) is characterized by the Karush-Kuhn-Tucker (KKT) optimality conditions:

  • \((\mathbf X \hat{\beta })_i\ge 0\) for all \(i\in \{1,\ldots ,n\}\);

  • \(\nu _i\ge 0\) for all \(i\in \{1,\ldots ,n\}\);

  • \(\nu _i (\mathbf X \hat{\beta })_i=0\) for all \(i\in \{1,\ldots ,n\}\);

  • if \(\hat{\beta }_j\ne 0\)

    $$\begin{aligned} -\,\mathbf x ^T_j\varLambda ^2(\mathbf Y -\mathbf X \hat{\beta })+\lambda \hat{w}_j sgn(\hat{\beta }_j)-\mathbf x _j^T\nu =0; \end{aligned}$$
    (49)
  • if \(\hat{\beta }_j=0\)

    $$\begin{aligned} -\,\mathbf x _j^T\varLambda ^2(\mathbf Y -\mathbf X \hat{\beta })+\lambda \hat{w}_j s_j-\mathbf x _j^T\nu =0, \end{aligned}$$
    (50)

    with \(s_j\in [-\,1,1]\).

Here \(\nu \) is the n-dimensional vector whose components are the Lagrange multipliers. Thanks to the KKT conditions, the event \(\{ \forall j\in (\mathscr {A^*})^C, \hat{\beta }_j=0 \}\) can be written as

$$\begin{aligned} \{\mathbf{x }^T_{j}\varLambda ^{2}(\mathbf{Y }-\mathbf{X }_{\mathscr {A}^{*}}{\hat{\beta }}_{\mathscr {A}^{*}})+\mathbf{x }_j^{T} \nu =\lambda {\hat{w}}_{j} s_{j}, \forall j\in (\mathscr {A}^*)^C\}, \end{aligned}$$
(51)

where \(|s_j|\le 1\) [see Eq. (50)], \(\mathbf X _{\mathscr {A^*}}\) is the matrix formed by the columns \(\mathbf x _j\) with \(j\in \mathscr {A^*}\), and \(\hat{\beta }_{\mathscr {A^*}}\) is the vector formed by the components \(\hat{\beta }_j\) with \(j\in \mathscr {A^*}\). By taking the absolute value in each equation of (51), the event takes the form

$$\begin{aligned} \{ |\mathbf x ^T_j(\varLambda ^2(\mathbf Y -\mathbf X _{\mathscr {A^*}}\hat{\beta }_{\mathscr {A^*}})+\nu )|\le \lambda \hat{w}_j, \forall j\in (\mathscr {A^*})^C \}. \end{aligned}$$
(52)

This implies that Eq. (47) is equivalent to \(\mathbb {P}\left( \exists j\in (\mathscr {A^*})^C, \left| \mathbf x ^T_j(\varLambda ^2(\mathbf Y -\mathbf X _{\mathscr {A^*}}\hat{\beta }_{\mathscr {A^*}})+\nu )\right| >\lambda \hat{w}_j\right) \rightarrow 0\) for \(n\rightarrow +\infty \). We set

$$\begin{aligned} \hat{S}_j:= & {} |\hat{\beta }(\mathrm{PRiL})_j|+\left( \frac{1}{n}\right) ^{\frac{1}{\gamma }+\delta }, \\ \hat{\eta }:= & {} \min _{j\in \mathscr {A^*}}\hat{S}_j, \\ \eta:= & {} \min _{j\in \mathscr {A^*}}|\beta ^{*}_j|+\left( \frac{1}{n}\right) ^{\frac{1}{\gamma }+\delta }, \end{aligned}$$

and

$$\begin{aligned} \hat{E}_j:= \left| \mathbf x ^T_j(\varLambda ^2(\mathbf Y -\mathbf X _{\mathscr {A^*}}\hat{\beta }_{\mathscr {A^*}})+\nu )\right| . \end{aligned}$$

Then

$$\begin{aligned}&\mathbb {P}\left( \exists j\in (\mathscr {A^*})^C \hat{E}_j>\lambda \hat{w_j}\right) \nonumber \\&\quad \le \sum _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{E}_j>\lambda \hat{w_j},\hat{\eta }>\frac{\eta }{2}, \hat{S}_j\le \left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \nonumber \\&\qquad +\,\sum _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{S}_j>\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) +\mathbb {P}\left( \hat{\eta }\le \frac{\eta }{2}\right) . \end{aligned}$$
(53)

The idea is to determine three bounds \(M_1\), \(M_2\) and \(M_3\) depending on n, such that

$$\begin{aligned}&\mathbb {P}\left( \hat{\eta }\le \frac{\eta }{2}\right) \le M_1, \end{aligned}$$
(54)
$$\begin{aligned}&\sum _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{S}_j>\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \le M_2, \end{aligned}$$
(55)
$$\begin{aligned}&\sum _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{E}_j>\lambda \hat{w_j},\hat{\eta }>\frac{\eta }{2}, \hat{S}_j\le \left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \le M_3. \end{aligned}$$
(56)

and \(M_1\), \(M_2\) and \(M_3\) go to 0 for \(n\rightarrow +\infty \). Let us start with the determination of bound \(M_1\). Using Corollary 1, it follows that

$$\begin{aligned} \mathbb {P}\left( \hat{\eta }\le \frac{\eta }{2}\right)&\le \mathbb {P}\left( \Vert \hat{\beta }(\mathrm{PRiL})-\beta ^*\Vert _2\ge \frac{\eta }{2}\right) \\&\le \frac{2}{\eta }\left( \frac{2\lambda _1 \sqrt{p}+\sqrt{p G B n}}{bn}\right) =:M_1. \end{aligned}$$

For the determination of bound \(M_2\), we use again Corollary 1. We have that

$$\begin{aligned}&\sum _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{S}_j>\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \\&\quad \le \frac{\mathbb {E}\left( \Vert \hat{\beta }(\mathrm{PRiL})-\beta ^*\Vert _2\right) +p\left( \frac{1}{n}\right) ^{\frac{1}{\gamma }+\delta }}{\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}}\\&\quad \le \frac{1}{(\frac{\lambda }{n})^{\frac{1}{\gamma }}}\left( \frac{2\lambda _1 \sqrt{p}+\sqrt{pGBn}}{bn}+\frac{p}{n^{\frac{1}{\gamma }+\delta }}\right) =: M_2. \end{aligned}$$

Finally, for the determination of bound \(M_3\), we write

$$\begin{aligned}&\sum \nolimits _{j\in (\mathscr {A^*})^C}\mathbb {P}\left( \hat{E}_j>\lambda \hat{w_j},\hat{\eta }>\frac{\eta }{2}, \hat{S}_j\le \left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \\&\quad \le 2\frac{\mathbb {E}\left( \sum _{j\in (\mathscr {A^*})^C}\hat{E}_j \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) }{n}. \end{aligned}$$

By definition of \(\epsilon \), we have

$$\begin{aligned}&\sum \nolimits _{j\in (\mathscr {A^*})^C} \hat{E}_j \nonumber \\&\quad =\sum \nolimits _{j\in (\mathscr {A^*})^C}\left| \mathbf x ^T_j\varLambda ^2\mathbf X _{\mathscr {A^*}}(\beta ^*_{\mathscr {A^*}}-\hat{\beta }_{\mathscr {A^*}})+\mathbf x ^T_j\varLambda ^2\epsilon +\mathbf x ^T_j\nu \right| \nonumber \\&\quad \le \sum \nolimits _{j\in (\mathscr {A^*})^C}\Vert \mathbf x _j^T \varLambda \Vert _2 \sqrt{\tau _{\max }(\mathbf X ^T\varLambda ^2 \mathbf X )} \Vert \beta ^*_{\mathscr {A^*}}-\hat{\beta }_{\mathscr {A^*}}\Vert _2 \nonumber \\&\qquad +\,\sum \nolimits _{j\in (\mathscr {A^*})^C}\left| \mathbf x _j^T\nu \right| + \sum \nolimits _{j\in (\mathscr {A^*})^C}\Vert \mathbf x ^T_j\varLambda \Vert _2\Vert \varLambda \epsilon \Vert _2. \end{aligned}$$
(57)

By using assumption (H4), we get

$$\begin{aligned} \sum _{j\in (\mathscr {A^*})^C}\Vert \mathbf x _j^T \varLambda \Vert _2\le p\max _{j=1,\ldots ,p} \Vert \mathbf x _j\Vert _2\le p L, \end{aligned}$$
(58)

and

$$\begin{aligned} \mathbb {E}(\Vert \varLambda \epsilon \Vert _2)=\mathbb {E}\left( \sqrt{\sum _{i=1}^{n}\left( \frac{\epsilon _i}{\sqrt{Y_i+1}}\right) ^2}\right) \le \sqrt{2 n} \end{aligned}$$
(59)

where we used that \(\mathbb {E}\left( \frac{\epsilon _i^2}{Y_i+1}\right) \le 2\) for all \(i\in \{1,\ldots ,n\}\). Following the argument used in the proof of Theorem 2, in particular the computation leading to Eq. (45), we have

$$\begin{aligned} \Vert \hat{\beta }_{\mathscr {A^*}}-\hat{\beta }(\mathrm{PRLS})_{\mathscr {A^*}}\Vert _2\le \frac{2\lambda \sqrt{p} \frac{1}{\hat{\eta }^{\gamma }}}{\tau _{\min }(\mathbf X ^T_{\mathscr {A^*}}\varLambda ^2 \mathbf X _{\mathscr {A^*}})}. \end{aligned}$$
(60)

Thanks to the Cauchy-Schwarz inequality, Eqs. (57), (58), (59), (60) and hypothesis (H1), we obtain the following bound

$$\begin{aligned}&\mathbb {E}\left( \sum _{j\in (\mathscr {A^*})^C}\hat{E}_j \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) \nonumber \\&\quad \le p L\sqrt{\mathbb {E}\left( \tau _{\max }( \mathbf X ^T\varLambda ^2 \mathbf X )\right) \mathbb {E}\left( \Vert \beta ^*_{\mathscr {A^*}}-\hat{\beta }_{\mathscr {A^*}}\Vert _2^2 \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) }\nonumber \\&\qquad +\,\mathbb {E}\left( \sum _{j\in (\mathscr {A}^*)^C}\left| \mathbf x _j^T\nu \right| \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) +p L\mathbb {E}(\Vert \varLambda \epsilon \Vert _2)\nonumber \\&\quad \le p L \sqrt{B n}\left( \frac{2\lambda \sqrt{p} \left( \frac{\eta }{2}\right) ^{-\gamma }+\sqrt{pGBn}}{b n }\right) \nonumber \\&\qquad +\,\mathbb {E}\left( \sum _{j\in (\mathscr {A}^*)^C}\left| \mathbf x _j^T\nu \right| \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) +p L \sqrt{2n}. \end{aligned}$$
(61)

From the optimality conditions in Eqs. (49) and (50) it follows that

$$\begin{aligned} | \mathbf x ^T_j\nu |\le | \mathbf x _j^T\varLambda ^2(\mathbf Y -\mathbf X \hat{\beta })|+\lambda \hat{w}_j, \end{aligned}$$

so we have

$$\begin{aligned}&\mathbb {E}\left( \sum _{j\in (\mathscr {A}^*)^C}\left| \mathbf x _j^T\nu \right| \mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) \\&\quad \le \mathbb {E}\left( \sum _{j\in \mathscr {A}^*}| \mathbf x _j^T\varLambda ^2\epsilon |\right) \\&\qquad +\,\mathbb {E}\left( \sum _{j\in \mathscr {A}^*}| \mathbf x _j^T\varLambda ^2 \mathbf X (\beta ^*-\hat{\beta })|\mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) \\&\qquad +\,\lambda \mathbb {E}\left( \sum _{j\in \mathscr {A}^*}\hat{w}_j\mathbf 1 _{\begin{Bmatrix} \hat{\eta }>\frac{\eta }{2} \end{Bmatrix}}\right) \\&\quad \le p L \sqrt{2n}+p L \sqrt{B n}\left( \frac{2\lambda \sqrt{p} n^{1+\gamma \delta }+\sqrt{p G B n} }{bn}\right) \\&\qquad +\,\lambda p \left( \frac{2}{\eta }\right) ^{\gamma } \end{aligned}$$

where we have used the following bound, which follows from \(\hat{w}_j=\hat{S}_j^{-\gamma }\le n^{1+\delta \gamma }\):

$$\begin{aligned}&\mathbb {E}\left( \left( \sum _{j=1}^{p}\hat{w}_j^2\right) ^2\right) \nonumber \\&\quad =\mathbb {E}\left( \left( \sum _{j=1}^{p}\frac{1}{\left( |\hat{\beta }(\mathrm{PRiL})_j|+\left( \frac{1}{n}\right) ^{\frac{1}{\gamma }+\delta }\right) ^{2\gamma }}\right) ^2\right) \nonumber \\&\quad \le p^2 n^{4(1+\delta \gamma )}. \end{aligned}$$
(62)

Then, we obtain

$$\begin{aligned}&\sum _{j\in (\mathscr {A^*})^C} \mathbb {P}\left( \hat{E}_j>\lambda \hat{w_j},\hat{\eta }>\frac{\eta }{2}, \hat{S}_j\le \left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}\right) \\&\quad \le \frac{4 p L}{n}\left( \sqrt{2n}+\frac{B\sqrt{pG}}{b}+\sqrt{B n p}\frac{\lambda \left( \frac{2}{\eta }\right) ^{\gamma }+\lambda n^{1+\delta \gamma }}{bn} + \lambda \frac{2^{\gamma -1}}{\eta ^{\gamma } L}\right) \\&\quad =: M_3. \end{aligned}$$

Now we prove that \(M_1\), \(M_2\) and \(M_3\) go to 0 for \(n\rightarrow +\infty \).

$$\begin{aligned} M_3 \rightarrow 0 \end{aligned}$$

because \(\frac{\sqrt{n} \lambda \left( \frac{2}{\eta }\right) ^{\gamma }}{n^2} \longrightarrow 0\), \(\frac{\sqrt{n}\lambda n^{1+\delta \gamma }}{n^{2}}\longrightarrow 0\) and \(\frac{\lambda }{n}\left( \frac{2}{\eta }\right) ^{\gamma }\longrightarrow 0\) for \(n\rightarrow +\infty \), by assumption (H3 c) and by the positivity of the constants \(\gamma \) and \(\delta \);

$$\begin{aligned} M_2 = \frac{1}{(\frac{\lambda }{n})^{\frac{1}{\gamma }}}\left( \frac{2\lambda _1 \sqrt{p}+\sqrt{pGBn}}{bn}+\frac{p}{n^{\frac{1}{\gamma }+\delta }}\right) \rightarrow 0 \end{aligned}$$

because \(\frac{\lambda _1}{\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }} n}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumptions (H2) and (H3 a), \(\frac{\sqrt{n}}{n\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }}}=\frac{1}{\left( \lambda n^{\frac{\gamma }{2}-1}\right) ^{\frac{1}{\gamma }}}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumption (H3 a), and finally \(\frac{1}{\left( \frac{\lambda }{n}\right) ^{\frac{1}{\gamma }} n^{\frac{1}{\gamma }+\delta }}\longrightarrow 0\) for \(n\rightarrow +\infty \) by assumption (H3 b);

$$\begin{aligned} M_1 = \frac{2}{\eta }\left( \frac{2\lambda _1 \sqrt{p}+\sqrt{pGBn}}{bn}\right) \rightarrow 0 \end{aligned}$$

because \(\frac{\lambda _1}{n \eta } \longrightarrow 0\) for \(n\rightarrow +\infty \), by assumption (H2) and the definition of \(\eta \), and \(\frac{\sqrt{n}}{n \eta }=O(\frac{1}{\sqrt{n}})\) for \(n\rightarrow +\infty \).

Now we prove Eq. (48). It is sufficient to show that

$$\begin{aligned} \mathbb {P}\left( \min _{j\in \mathscr {A^*}}|\hat{\beta }_j|>0\right) \longrightarrow 1,\quad n\rightarrow +\infty . \end{aligned}$$

By Eq. (60) we have

$$\begin{aligned} \min _{j\in \mathscr {A^*}}|\hat{\beta }_j|>\min _{j\in \mathscr {A^*}}|\hat{\beta }(\mathrm{PRLS})_j|-\frac{2\lambda \sqrt{p}\hat{\eta }^{-\gamma }}{\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )} ~, \end{aligned}$$
(63)

where

$$\begin{aligned} \min _{j\in \mathscr {A^*}}|\hat{\beta }(\mathrm{PRLS})_j|\ge \min _{j\in \mathscr {A^*}}|\beta ^*_j|-\Vert \beta ^*_{\mathscr {A^*}}-\hat{\beta }(\mathrm{PRLS})_{\mathscr {A^*}}\Vert _2 ~.\nonumber \\ \end{aligned}$$
(64)

Since \(\min _{j\in \mathscr {A}^*}|\beta _j^*|>0\), to conclude we show that \(\Vert \beta ^*_{\mathscr {A^*}}-\hat{\beta }(\mathrm{PRLS})_{\mathscr {A^*}}\Vert _2\) and \(\frac{2\lambda \sqrt{p}\hat{\eta }^{-\gamma }}{\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}\) go to 0 in probability. Equation (35) implies that the second term on the r.h.s. of (64) goes to zero. Moreover, for the second term in Eq. (63) we have that, given \(M>0\)

$$\begin{aligned}&\mathbb {P}\left( \frac{2\lambda \sqrt{p}}{\hat{\eta }^{\gamma }\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}>M\right) \nonumber \\&\quad \le \mathbb {P}\left( \frac{2\lambda \sqrt{p}}{\hat{\eta }^{\gamma }\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}>M, \{\hat{\eta }>\frac{\eta }{2}\}\right) +\mathbb {P}\left( \hat{\eta }\le \frac{\eta }{2}\right) \nonumber \\&\quad \le \frac{2\lambda \sqrt{p}}{M}\sqrt{\mathbb {E}\left( \left( \frac{1}{\tau _{\min }(\mathbf X ^T\varLambda ^2 \mathbf X )}\right) ^{2}\right) \mathbb {E}\left( \frac{1}{\hat{\eta }^{2\gamma }} \mathbf 1 _{\{\hat{\eta }>\frac{\eta }{2}\}}\right) }+M_1 \nonumber \\&\quad \le \frac{2\lambda \sqrt{p}}{b n M}\left( \frac{2}{\eta }\right) ^{\gamma }+M_1\longrightarrow 0 \text { for } n\rightarrow +\infty \end{aligned}$$
(65)

as \(M_1\rightarrow 0\) for \(n\rightarrow +\infty \) and \(\frac{\lambda }{n }\left( \frac{2}{\eta }\right) ^{\gamma }\rightarrow 0\) for \(n\rightarrow +\infty \) thanks to assumption (H3 c). This proves Eq. (48) and concludes the proof. \(\square \)

Proof of Proposition 1

The proof is analogous to that of Theorem 3. We start by proving that Eq. (47) holds. It is sufficient to show that the bounds \(M_1\), \(M_2\) and \(M_3\) go to 0 as \(n\rightarrow \infty \) under assumptions (H5), (H6), (H7) and (H8). We have that \(M_1 \rightarrow 0\) since \(\frac{\lambda _1 \sqrt{p}}{\eta n}\rightarrow 0\) by assumption (H7 a) and \(\frac{\sqrt{p}}{\sqrt{n}\eta }\rightarrow 0\) by assumption (H6). Moreover, \(M_2 \rightarrow 0\) since \(\left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }}\frac{\lambda _1 \sqrt{p}}{n}=\left( \frac{\lambda n^{\delta \gamma }}{p^{\gamma }}\right) ^{-\frac{1}{\gamma }} \frac{\lambda _1 n^{\delta +\frac{1}{\gamma }-1}}{\sqrt{p}}\rightarrow 0\) by assumptions (H8 b) and (H7 b), \(\left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }} \frac{\sqrt{p}}{\sqrt{n}}=\left( \frac{\lambda n^{\delta \gamma }}{p^{\gamma }}\right) ^{-\frac{1}{\gamma }} \frac{n^{-\frac{1}{2}+\delta +\frac{1}{\gamma }}}{\sqrt{p} }\rightarrow 0\) by assumption (H8 b) and by the hypothesis \(\frac{1}{\gamma }+\delta <\frac{c+1}{2}\), and \(p \left( \frac{\lambda }{n}\right) ^{-\frac{1}{\gamma }} n^{-\frac{1}{\gamma }-\delta }\rightarrow 0\) by assumption (H8 b). Finally, \(M_3 \rightarrow 0\) since the first two terms in the definition of \(M_3\) go to 0 by assumption (H5), \(\frac{\lambda p}{n \eta ^{\gamma }}\frac{\sqrt{p}}{\sqrt{n}}\rightarrow 0\) by assumptions (H5) and (H8 c), \(\frac{p}{n}\frac{\lambda n^{1+\delta \gamma }}{\sqrt{n}}\sqrt{p}= \lambda n^{\delta \gamma -\frac{1}{2}} p \sqrt{p}\rightarrow 0\) by assumption (H8 a) and \(\frac{p}{n}\frac{\lambda }{\eta ^{\gamma }}\rightarrow 0\) by assumption (H8 c).

Now we have to prove that Eq. (48) holds. It is sufficient to observe that, as \(n\rightarrow \infty \), the bound in Eq. (35) goes to 0 by assumption (H5) and the bound in Eq. (65) goes to 0 by assumption (H8 c). \(\square \)


Cite this article

Guastavino, S., Benvenuto, F. A consistent and numerically efficient variable selection method for sparse Poisson regression with applications to learning and signal recovery. Stat Comput 29, 501–516 (2019). https://doi.org/10.1007/s11222-018-9819-1
