High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency

Published: 01 August 2010, Journal of Machine Learning Research, Volume 11 (JMLR.org)

Abstract

We consider the problem of high-dimensional variable selection: given n noisy observations of a k-sparse vector β* ∈ R^p, estimate the subset of non-zero entries of β*. A significant body of work has studied the behavior of l1-relaxations when applied to random measurement matrices that are dense (e.g., Gaussian, Bernoulli). In this paper, we analyze sparsified measurement ensembles, and consider the trade-off between measurement sparsity, as measured by the fraction γ of non-zero entries, and statistical efficiency, as measured by the minimal number of observations n required for correct variable selection with probability converging to one. Our main result is to prove that it is possible to let the fraction of non-zero entries γ → 0 at some rate, yielding measurement matrices with a vanishing fraction of non-zeros per row, while retaining the same statistical efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our theoretical predictions.
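The abstract describes recovering the support of a k-sparse vector β* from noisy linear observations taken with a sparsified measurement ensemble, using an l1-relaxation. The following is a minimal sketch of that setup in Python; the dimensions, the particular ±1/√γ sparsification scheme, the noise level, and the Lasso regularization choice are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
p, k, n = 512, 8, 200      # ambient dimension, sparsity, number of observations
gamma = 0.1                # fraction of non-zero entries per measurement row
sigma = 0.25               # noise standard deviation

# k-sparse signal beta* with +/-1 entries on a random support.
beta_star = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta_star[support] = rng.choice([-1.0, 1.0], size=k)

# Sparsified measurement ensemble: each entry is 0 with probability 1 - gamma
# and +/- 1/sqrt(gamma) otherwise, so entries have unit variance on average
# (one common way to sparsify a dense +/-1 ensemble; an assumption here).
mask = rng.random((n, p)) < gamma
signs = rng.choice([-1.0, 1.0], size=(n, p))
X = mask * signs / np.sqrt(gamma)

# Noisy observations y = X beta* + w.
y = X @ beta_star + sigma * rng.standard_normal(n)

# l1-relaxation (Lasso); this regularization level is a heuristic choice.
lam = 2 * sigma * np.sqrt(np.log(p) / n)
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)

# Estimated support = indices of non-zero estimated coefficients.
support_hat = np.flatnonzero(np.abs(fit.coef_) > 1e-6)
print("true support:     ", np.sort(support))
print("estimated support:", np.sort(support_hat))
```

For moderate γ the estimated support typically matches the true one in such experiments, which is the qualitative trade-off between measurement sparsity and sample size that the paper quantifies.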


Cited By

  • AM-RP Stacking PILers: Random projection stacking pseudoinverse learning algorithm based on attention mechanism. The Visual Computer: International Journal of Computer Graphics, 40(1):273-285, January 2024. DOI: 10.1007/s00371-023-02780-7
  • Random projections. WIREs Computational Statistics, 13(1), December 2020. DOI: 10.1002/wics.1499
  • Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach. Algorithmic Learning Theory, pages 83-97, October 2016. DOI: 10.1007/978-3-319-46379-7_6