Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Stable distributions, pseudorandom generators, embeddings, and data stream computation

Published: 01 May 2006 Publication History

Abstract

In this article, we show several results obtained by combining the use of stable distributions with pseudorandom generators for bounded space. In particular:---We show that, for any p ∈ (0, 2], one can maintain (using only O(log n2) words of storage) a sketch C(q) of a point qlnp under dynamic updates of its coordinates. The sketch has the property that, given C(q) and C(s), one can estimate ‖qsp up to a factor of (1 + ϵ) with large probability. This solves the main open problem of Feigenbaum et al. [1999].---We show that the aforementioned sketching approach directly translates into an approximate algorithm that, for a fixed linear mapping A, and given x ∈ ℜn and y ∈ ℜm, estimates ‖Axyp in O(n + m) time, for any p ∈ (0, 2]. This generalizes an earlier algorithm of Wasserman and Blum [1997] which worked for the case p = 2.---We obtain another sketch function C′ which probabilistically embeds ln1 into a normed space lm1. The embedding guarantees that, if we set m = log(1/δ)O(1/ϵ), then for any pair of points q, sln1, the distance between q and s does not increase by more than (1 + ϵ) with constant probability, and it does not decrease by more than (1 − ϵ) with probability 1 − δ. This is the only known dimensionality reduction theorem for the l1 norm. In fact, stronger theorems of this type (i.e., that guarantee very low probability of expansion as well as of contraction) cannot exist [Brinkman and Charikar 2003].---We give an explicit embedding of ln2 into lnO(log n)1 with distortion (1 + 1/nΘ(1)).

References

[1]
Alon, N., Matias, Y., and Szegedy, M. 1996. The space complexity of approximating the frequency moments. In Proceedings of the ACM Symposium on Theory of Computing, ACM, New York, 20--29.]]
[2]
Ar, S., Blum, M., Codenotti, B., and Gemmell, P. 1993. Checking approximate computation over the reals. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]]
[3]
Berger, B. 1997. The fourth moment method. SIAM J. Comput. 26.]]
[4]
Brinkman, B., and Charikar, M. 2003. On the impossibility of dimension reduction in l1. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]]
[5]
Broder, A. 1998. Filtering near-duplicate documents. In Proceedings of FUN.]]
[6]
Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference, 391--404.]]
[7]
Chambers, J. M., Mallows, C. L., and Stuck, B. W. 1976. A method for simulating stable random variables. J. Amer. Statist. Assoc. 71, 340--344.]]
[8]
Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J., and Yang, C. 2000. Finding interesting associations without support prunning. In Proceedings of the 16th International Conference on Data Engineering (ICDE).]]
[9]
Cormode, G. 2003. Stable distributions for stream computations: It's as easy as 0,1,2. In Proceedings of the Workshop on Management and Processing of Data Streams.]]
[10]
Cormode, G., Datar, M., Indyk, P., and Muthukrishnan, S. 2002a. Comparing data streams using hamming norms. In Proceedings of the International Conference on Very Large Databases (VLDB).]]
[11]
Cormode, G., Indyk, P., Koudas, N., and Muthukrishnan, S. 2002b. Fast mining of massive tabular data via approximate distance computations. In Proceedings of the 18th International Conference on Data Engineering (ICDE).]]
[12]
Cormode, G., and Muthukrishnan, S. 2003. Estimating dominance norms of multiple data streams. In Proceedings of the European Symposium on Algorithms.]]
[13]
Datar, M., Gionis, A., Indyk, P., and Motwani, R. 2002. Maintaining stream statistics over sliding windows. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]]
[14]
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, ACM, New York.]]
[15]
Engebretsen, L., Indyk, P., and O'Donnell, R. 2002. Deterministic dimensionality reduction with applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]]
[16]
Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001a. Secure multiparty computation of approximations. Lecture Notes in Computer Science, vol. 2076, Springer-Verlag, New York, 927.]]
[17]
Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M. J., and Wright, R. N. 2001b. Secure multiparty computation of approximations. http://eprint.iacr.org/2001/024/.]]
[18]
Feigenbaum, J., Kannan, S., Strauss, M., and Viswanathan, M. 1999. An approximate l1-difference algorithm for massive data streams. In Proceedings of the Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]]
[19]
Figiel, T., Lindenstrauss, J., and Milman, V. D. 1977. The dimension of almost spherical sections of convex bodies. Acta Math. 139, 53--94.]]
[20]
Freivalds, R. 1979. Fast probabilistic algorithms. In Proceedings of the Mathematical Foundations of Computer Science. Lecture Notes in Computer Science, vol. 74, Springer-Verlag, New York.]]
[21]
Gilbert, A., Guha, S., Kotidis, Y., Indyk, P., Muthukrishnan, M., and Strauss, M. 2002. In Proceedings of the Fast, small-space algorithms for approximate histogram maintenance. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]]
[22]
Henzinger, M., Raghavan, P., and Rajagopalan, S. 1998. Computing on data streams. Technical Note 1998-011, Digital Systems Research Center, Palo Alto, CA.]]
[23]
Indyk, P. 2000. Dimensionality reduction techniques for proximity problems. In Proceedings of the Ninth ACM-SIAM Symposium on Discrete Algorithms, ACM, New York.]]
[24]
Indyk, P. 2001. Tutorial: Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA.]]
[25]
Indyk, P. 2004. Algorithms for dynamic geometric problems over data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]]
[26]
Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series datasets using sketches. In Proceedings of the 26th International Conference on Very Large Databases (VLDB).]]
[27]
Indyk, P., and Motwani, R. 1998. Approximate nearest neighbor: towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing, ACM, New York.]]
[28]
Indyk, P., and Woodruff, D. 2005. Optimal approximations of the frequency moments of data streams. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York.]]
[29]
Johnson, W., and Lindenstrauss, J. 1984. Extensions of lipshitz mapping into hilbert space. Contemp. Math. 26, 189--206.]]
[30]
Johnson, W., and Schechtman, G. 1982. Embedding lmp into ln1. Acta Math. 149, 71--85.]]
[31]
Lindenstrauss, J., and Milman, V. D. 1993. The local theory of normed spaces and its applications to convexity. In Handbook of Convex Geometry, P. M. Gruber and J. M. Wills, eds Elsevier, Amsterdam, The Netherlands, 1149--1220.]]
[32]
Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In Proceedings of 35th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA, 577--591.]]
[33]
Matoušek, J. 2004. Collection of open problems on low-distortion embeddings of finite metric spaces. Available at http://kam.mff.cuni.cz/~matousek/metrop.ps.gz|.]]
[34]
Maurer, A. 2003. A bound on the deviation probability for sums of non-negative random variables. J. Ineq. Pure Applied Math. 4, 1, Art. 15.]]
[35]
Muthukrishnan, S. 2003. Data streams: Algorithms and applications (invited talk at SODA'03). Available at http://athos.rutgers.edu/~muthu/stream-1-1.ps.]]
[36]
Nisan, N. 1990. Pseudorandom generators for space-bounded computation. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 204--212.]]
[37]
Nisan, N. 1992. RL ⊂ SC. In Proceedings of the Annual ACM Symposium on Theory of Computing, 619--623.]]
[38]
Sivakumar, D. 2002. Algorithmic derandomization via complexity theory. In Proceedings of the Annual ACM Symposium on Theory of Computing, ACM, New York, 619--626.]]
[39]
Thaper, N., Guha, S., Indyk, P., and Koudas, N. 2002. Dynamic multidimensional histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), ACM, New York.]]
[40]
Wasserman, H., and Blum, M. 1997. Software reliability via run-time result checking. J. ACM.]]
[41]
Zolotarev, V. 1986. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society.]]

Cited By

View all
  • (2024)On the Feasibility of Forgetting in Data StreamsProceedings of the ACM on Management of Data10.1145/36516032:2(1-17)Online publication date: 14-May-2024
  • (2024)Streaming Algorithms with Few State ChangesProceedings of the ACM on Management of Data10.1145/36511452:2(1-28)Online publication date: 14-May-2024
  • (2024)Limitations of the Impagliazzo–Nisan–Wigderson Pseudorandom Generator Against Permutation Branching ProgramsAlgorithmica10.1007/s00453-024-01251-286:10(3153-3185)Online publication date: 1-Oct-2024
  • Show More Cited By

Index Terms

  1. Stable distributions, pseudorandom generators, embeddings, and data stream computation

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Journal of the ACM
      Journal of the ACM  Volume 53, Issue 3
      May 2006
      225 pages
      ISSN:0004-5411
      EISSN:1557-735X
      DOI:10.1145/1147954
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 May 2006
      Published in JACM Volume 53, Issue 3

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. data streams
      2. dimensionality reduction
      3. embeddings
      4. norms
      5. sketching

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)105
      • Downloads (Last 6 weeks)9
      Reflects downloads up to 12 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)On the Feasibility of Forgetting in Data StreamsProceedings of the ACM on Management of Data10.1145/36516032:2(1-17)Online publication date: 14-May-2024
      • (2024)Streaming Algorithms with Few State ChangesProceedings of the ACM on Management of Data10.1145/36511452:2(1-28)Online publication date: 14-May-2024
      • (2024)Limitations of the Impagliazzo–Nisan–Wigderson Pseudorandom Generator Against Permutation Branching ProgramsAlgorithmica10.1007/s00453-024-01251-286:10(3153-3185)Online publication date: 1-Oct-2024
      • (2023)Dynamic non-monotone submodular maximizationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666883(17369-17382)Online publication date: 10-Dec-2023
      • (2023)Towards Optimal Moment Estimation in Streaming and Distributed ModelsACM Transactions on Algorithms10.1145/359649419:3(1-35)Online publication date: 24-Jun-2023
      • (2023)Streaming Euclidean MST to a Constant FactorProceedings of the 55th Annual ACM Symposium on Theory of Computing10.1145/3564246.3585168(156-169)Online publication date: 2-Jun-2023
      • (2023)Inapproximability of Matrix \(\boldsymbol{p \rightarrow q}\) NormsSIAM Journal on Computing10.1137/18M123341852:1(132-155)Online publication date: 14-Feb-2023
      • (2023)Dimensionality Reduction for Categorical DataIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.313237335:4(3658-3671)Online publication date: 1-Apr-2023
      • (2023)Random Projection Based Efficient Detector in Massive MIMO Communication Networks2023 IEEE 24th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)10.1109/SPAWC53906.2023.10304438(426-430)Online publication date: 25-Sep-2023
      • (2023)Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00093(1515-1550)Online publication date: 6-Nov-2023
      • Show More Cited By

      View Options

      Get Access

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media