
Sparser Johnson-Lindenstrauss Transforms

Published: 01 January 2014

Abstract

We give two different and simple constructions for dimensionality reduction in ℓ2 via linear mappings that are sparse: only an O(ε)-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion 1 + ε with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas [2003] and Dasgupta et al. [2010]. Such distributions can be used to speed up applications where ℓ2 dimensionality reduction is used.
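The abstract's sparse-matrix description can be made concrete with a small sketch. Below is a minimal NumPy illustration of one standard sparse-JL construction of this kind (the "block" variant: the m rows are split into s blocks, and each column gets exactly one ±1/√s entry per block, so the nonzero fraction per column is s/m). The function name, parameter choices, and demo sizes are illustrative assumptions, not code or constants from the paper.

```python
import numpy as np

def sparse_jl_matrix(m, d, s, seed=None):
    """Sample an m x d sparse JL embedding matrix via the block
    construction: split the m rows into s blocks of m/s rows and
    give each column one +-1/sqrt(s) entry per block."""
    rng = np.random.default_rng(seed)
    assert m % s == 0, "block construction needs s to divide m"
    block = m // s
    S = np.zeros((m, d))
    for j in range(d):
        # One uniformly random row inside each of the s blocks.
        rows = np.arange(s) * block + rng.integers(0, block, size=s)
        signs = rng.choice([-1.0, 1.0], size=s)
        S[rows, j] = signs / np.sqrt(s)
    return S

# Demo: the embedding roughly preserves squared l2 norms.
d, m, s = 10_000, 512, 8   # illustrative sizes; s/m is the nonzero fraction
x = np.random.default_rng(0).standard_normal(d)
S = sparse_jl_matrix(m, d, s, seed=1)
ratio = float(np.linalg.norm(S @ x) ** 2 / np.linalg.norm(x) ** 2)
```

Because each column has only s nonzeros, applying S costs s operations per nonzero coordinate of x rather than m, which is the source of the speedup for sparse inputs.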

References

[1] Dimitris Achlioptas. 2003. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 4, 671--687.
[2] Nir Ailon and Bernard Chazelle. 2009. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM J. Comput. 39, 1, 302--322.
[3] Nir Ailon and Edo Liberty. 2009. Fast dimension reduction using Rademacher series on dual BCH codes. Disc. Comput. Geom. 42, 4, 615--630.
[4] Nir Ailon and Edo Liberty. 2013. An almost optimal unrestricted fast Johnson-Lindenstrauss transform. ACM Trans. Algor. 9, 3, 21.
[5] Noga Alon. 2003. Problems and results in extremal combinatorics I. Discrete Math. 273, 1--3, 31--53.
[6] Rosa I. Arriaga and Santosh Vempala. 2006. An algorithmic theory of learning: Robust concepts and random projection. Mach. Learn. 63, 2, 161--182.
[7] Vladimir Braverman, Rafail Ostrovsky, and Yuval Rabani. 2010. Rademacher chaos, random Eulerian graphs and the sparse Johnson-Lindenstrauss transform. CoRR abs/1011.2590.
[8] J. Lawrence Carter and Mark N. Wegman. 1979. Universal classes of hash functions. J. Comput. Syst. Sci. 18, 2, 143--154.
[9] Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2004. Finding frequent items in data streams. Theoret. Comput. Sci. 312, 1, 3--15.
[10] Kenneth L. Clarkson and David P. Woodruff. 2009. Numerical linear algebra in the streaming model. In Proceedings of the 41st ACM Symposium on Theory of Computing (STOC). 205--214.
[11] Kenneth L. Clarkson and David P. Woodruff. 2013. Low rank approximation and regression in input sparsity time. In Proceedings of the 45th ACM Symposium on Theory of Computing (STOC). 81--90.
[12] Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. 2010. A sparse Johnson-Lindenstrauss transform. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC). 341--350.
[13] Sanjoy Dasgupta and Anupam Gupta. 2003. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algor. 22, 1, 60--65.
[14] Peter Frankl and Hiroshi Maehara. 1988. The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory Ser. B 44, 3, 355--362.
[15] Zoltán Füredi and János Komlós. 1981. The eigenvalues of random symmetric matrices. Combinatorica 1, 3, 233--241.
[16] David Lee Hanson and Farroll Tim Wright. 1971. A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Statist. 42, 3, 1079--1083.
[17] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. 2012. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory Comput. 8, 1, 321--350.
[18] Aicke Hinrichs and Jan Vybíral. 2011. Johnson-Lindenstrauss lemma for circulant matrices. Random Struct. Algor. 39, 3, 391--398.
[19] Piotr Indyk. 2001. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS). 10--33.
[20] T. S. Jayram and David P. Woodruff. 2013. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Trans. Algor. 9, 3, 26.
[21] William B. Johnson and Joram Lindenstrauss. 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26, 189--206.
[22] Daniel M. Kane, Raghu Meka, and Jelani Nelson. 2011a. Almost optimal explicit Johnson-Lindenstrauss transformations. In Proceedings of the 15th International Workshop on Randomization and Computation (RANDOM). 628--639.
[23] Daniel M. Kane and Jelani Nelson. 2010. A derandomized sparse Johnson-Lindenstrauss transform. CoRR abs/1006.3585.
[24] Daniel M. Kane and Jelani Nelson. 2012. Sparser Johnson-Lindenstrauss transforms. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1195--1206.
[25] Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff. 2011b. Fast moment estimation in data streams in optimal space. In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC). 745--754.
[26] Eyal Kaplan, Moni Naor, and Omer Reingold. 2009. Derandomized constructions of k-wise (almost) independent permutations. Algorithmica 55, 1, 113--133.
[27] Felix Krahmer and Rachel Ward. 2011. New and improved Johnson-Lindenstrauss embeddings via the Restricted Isometry Property. SIAM J. Math. Anal. 43, 3, 1269--1281.
[28] Jiří Matoušek. 2008. On variants of the Johnson-Lindenstrauss lemma. Random Struct. Algor. 33, 2, 142--156.
[29] Xiangrui Meng and Michael W. Mahoney. 2013. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In Proceedings of the 45th ACM Symposium on Theory of Computing (STOC). 91--100.
[30] Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press.
[31] Jelani Nelson and Huy L. Nguyễn. 2013a. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In Proceedings of the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 117--126.
[32] Jelani Nelson and Huy L. Nguyễn. 2013b. Sparsity lower bounds for dimensionality reducing maps. In Proceedings of the 45th ACM Symposium on Theory of Computing (STOC). 101--110.
[33] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2005. Introduction to Data Mining. Addison-Wesley.
[34] Tamás Sarlós. 2006. Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 143--152.
[35] Mikkel Thorup and Yin Zhang. 2012. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41, 2, 293--331.
[36] Santosh Vempala. 2004. The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 65, American Mathematical Society.
[37] Joachim von zur Gathen and Jürgen Gerhard. 1999. Modern Computer Algebra. Cambridge University Press.
[38] Jan Vybíral. 2011. A variant of the Johnson-Lindenstrauss lemma for circulant matrices. J. Funct. Anal. 260, 4, 1096--1105.
[39] Kilian Q. Weinberger, Anirban Dasgupta, John Langford, Alexander J. Smola, and Josh Attenberg. 2009. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML). 1113--1120.
[40] Eugene P. Wigner. 1955. Characteristic vectors of bordered matrices with infinite dimensions. Ann. Math. 62, 548--564.



Published In

Journal of the ACM, Volume 61, Issue 1 (January 2014), 222 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/2578041

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Received: 01 February 2012
Revised: 01 August 2013
Accepted: 01 August 2013
Published: 01 January 2014, in JACM Volume 61, Issue 1


Author Tags

  1. Dimensionality reduction
  2. Johnson-Lindenstrauss
  3. numerical linear algebra
  4. streaming algorithms

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2024) Fast Fusion Clustering via Double Random Projection. Entropy 26, 5 (376). DOI: 10.3390/e26050376
  • (2024) Hyperdimensional computing: A fast, robust, and interpretable paradigm for biological data. PLOS Computational Biology 20, 9 (e1012426). DOI: 10.1371/journal.pcbi.1012426
  • (2024) Efficiency of Stochastic Coordinate Proximal Gradient Methods on Nonseparable Composite Optimization. Mathematics of Operations Research. DOI: 10.1287/moor.2023.0044
  • (2024) RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search. Proceedings of the ACM on Management of Data 2, 3 (1-27). DOI: 10.1145/3654970
  • (2024) Stochastic Trust-Region Algorithm in Random Subspaces with Convergence and Expected Complexity Analyses. SIAM Journal on Optimization 34, 3 (2671-2699). DOI: 10.1137/22M1524072
  • (2024) Fast Metric Embedding into the Hamming Cube. SIAM Journal on Computing 53, 2 (315-345). DOI: 10.1137/22M1520220
  • (2024) On Design Choices in Similarity-Preserving Sparse Randomized Embeddings. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10651277
  • (2024) Jacobian sparsity detection using Bloom filters. Optimization Methods and Software (1-13). DOI: 10.1080/10556788.2023.2285486
  • (2024) Fast randomized numerical rank estimation for numerically low-rank matrices. Linear Algebra and its Applications 686 (1-32). DOI: 10.1016/j.laa.2024.01.001
  • (2023) Reward imputation with sketching for contextual batched bandits. Proceedings of the 37th International Conference on Neural Information Processing Systems (64577-64588). DOI: 10.5555/3666122.3668940
