High-dimensional Variable Selection with Sparse Random Projections: Measurement Sparsity and Statistical Efficiency

Published: 01 August 2010, Journal of Machine Learning Research, Volume 11 (JMLR.org)

Abstract

We consider the problem of high-dimensional variable selection: given n noisy observations of a k-sparse vector β* ∈ R^p, estimate the subset of non-zero entries of β*. A significant body of work has studied the behavior of l1-relaxations when applied to random measurement matrices that are dense (e.g., Gaussian, Bernoulli). In this paper, we analyze sparsified measurement ensembles, and consider the trade-off between measurement sparsity, as measured by the fraction γ of non-zero entries, and statistical efficiency, as measured by the minimal number of observations n required for correct variable selection with probability converging to one. Our main result is to prove that it is possible to let the fraction of non-zero entries γ → 0 at some rate, yielding measurement matrices with a vanishing fraction of non-zeros per row, while retaining the same statistical efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our theoretical predictions.
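The abstract describes recovering the support of a k-sparse vector β* from noisy linear observations taken with a sparsified measurement ensemble, using an l1-relaxation. The following is a minimal sketch of that setup in Python; the dimensions, the particular ±1/√γ sparsification scheme, the noise level, and the Lasso regularization choice are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
p, k, n = 512, 8, 200      # ambient dimension, sparsity, number of observations
gamma = 0.1                # fraction of non-zero entries per measurement row
sigma = 0.25               # noise standard deviation

# k-sparse signal beta* with +/-1 entries on a random support.
beta_star = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta_star[support] = rng.choice([-1.0, 1.0], size=k)

# Sparsified measurement ensemble: each entry is 0 with probability 1 - gamma
# and +/- 1/sqrt(gamma) otherwise, so entries have unit variance on average
# (one common way to sparsify a dense +/-1 ensemble; an assumption here).
mask = rng.random((n, p)) < gamma
signs = rng.choice([-1.0, 1.0], size=(n, p))
X = mask * signs / np.sqrt(gamma)

# Noisy observations y = X beta* + w.
y = X @ beta_star + sigma * rng.standard_normal(n)

# l1-relaxation (Lasso); this regularization level is a heuristic choice.
lam = 2 * sigma * np.sqrt(np.log(p) / n)
fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y)

# Estimated support = indices of non-zero estimated coefficients.
support_hat = np.flatnonzero(np.abs(fit.coef_) > 1e-6)
print("true support:     ", np.sort(support))
print("estimated support:", np.sort(support_hat))
```

For moderate γ the estimated support typically matches the true one in such experiments, which is the qualitative trade-off between measurement sparsity and sample size that the paper quantifies.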


Cited By

  • AM-RP Stacking PILers: Random projection stacking pseudoinverse learning algorithm based on attention mechanism. The Visual Computer: International Journal of Computer Graphics, 40(1):273-285, January 2024. DOI: 10.1007/s00371-023-02780-7
  • Random projections. WIREs Computational Statistics, 13(1), December 2020. DOI: 10.1002/wics.1499
  • Sparse Learning for Large-Scale and High-Dimensional Data: A Randomized Convex-Concave Optimization Approach. Algorithmic Learning Theory, pages 83-97, October 2016. DOI: 10.1007/978-3-319-46379-7_6