Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Model Counting Meets Distinct Elements in a Data Stream

Published: 01 June 2022 Publication History

Abstract

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.

References

[1]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137--147, 1999.
[2]
Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In Proc. of RANDOM, volume 2483, pages 1--10, 2002.
[3]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. of SODA, pages 623--632. ACM/SIAM, 2002.
[4]
J. L. Carter and M. N. Wegman. Universal classes of hash functions. In Proceedings of the ninth annual ACM symposium on Theory of computing, pages 106--112. ACM, 1977.
[5]
S. Chakraborty, K. S. Meel, and M. Y. Vardi. Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls. In Proc. of IJCAI, 2016.
[6]
G. Cormode and S. Muthukrishnan. Estimating dominance norms of multiple data streams. In G. D. Battista and U. Zwick, editors, Proc. of ESA, volume 2832 of Lecture Notes in Computer Science, pages 148--160. Springer, 2003.
[7]
G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. ACM Transactions on Algorithms (TALG), 7(2):1--20, 2011.
[8]
G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Continuous sampling from distributed streams. Journal of the ACM (JACM), 59(2):1--25, 2012.
[9]
P. Dagum, R. Karp, M. Luby, and S. Ross. An optimal algorithm for monte carlo estimation. SIAM Journal on computing, 29(5):1484--1496, 2000.
[10]
S. Ermon, C. P. Gomes, A. Sabharwal, and B. Selman. Low-density parity constraints for hashing-based discrete integration. In Proc. of ICML, pages 271--279, 2014.
[11]
W. Feng, T. P. Hayes, and Y. Yin. Distributed symmetry breaking in sampling (optimal distributed randomly coloring with fewer colors). arXiv preprint arXiv:1802.06953, 2018.
[12]
W. Feng, Y. Sun, and Y. Yin. What can be sampled locally? Distributed Computing, pages 1--27, 2018.
[13]
W. Feng and Y. Yin. On local distributed sampling and counting. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, pages 189--198, 2018.
[14]
M. Fischer and M. Ghaffari. A simple parallel and distributed sampling technique: Local glauber dynamics. In 32nd International Symposium on Distributed Computing, 2018.
[15]
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985.
[16]
P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In A. L. Rosenberg, editor, Proc. of SPAA, pages 281--291. ACM, 2001.
[17]
C. P. Gomes, J. Hoffmann, A. Sabharwal, and B. Selman. From sampling to model counting. In Proc. of IJCAI, pages 2293--2299, 2007.
[18]
Z. Huang, K. Yi, and Q. Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. In Proc. of PODS, pages 295--306, 2012.
[19]
A. Ivrii, S. Malik, K. S. Meel, and M. Y. Vardi. On computing minimal independent support and its applications to sampling and counting. Constraints, pages 1--18, 2015.
[20]
D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In Proc. of PODS, pages 41--52. ACM, 2010.
[21]
R. Karp and M. Luby. Monte-carlo algorithms for enumeration and reliability problems. Proc. of FOCS, 1983.
[22]
R. M. Karp, M. Luby, and N. Madras. Monte-carlo approximation algorithms for enumeration problems. Journal of Algorithms, 10(3):429 -- 448, 1989.
[23]
K. S. Meel and S. Akshay. Sparse hashing for scalable approximate model counting: Theory and practice. In Proc. of LICS, 2020.
[24]
K. S. Meel, A. A. Shrotri, and M. Y. Vardi. On hashing-based approaches to approximate dnf-counting. In In Proc. of FSTTCS, 2017.
[25]
K. S. Meel, A. A. Shrotri, and M. Y. Vardi. Not all fprass are equal: Demystifying fprass for dnf-counting (extended abstract). In Proc. of IJCAI, 8 2019.
[26]
A. Pavan and S. Tirthapura. Range-efficient counting of distinct elements in a massive data stream. SIAM J. Comput., 37(2):359--379, 2007.
[27]
C. R´e and D. Suciu. Approximate lineage for probabilistic databases. Proceedings of the VLDB Endowment, 1(1):797--808, 2008.
[28]
P. Senellart. Provenance and probabilities in relational databases. ACM SIGMOD Record, 46(4):5--15, 2018.
[29]
M. Soos and K. S. Meel. Bird: Engineering an efficient cnf-xor sat solver and its applications to approximate model counting. In Proceedings of AAAI Conference on Artificial Intelligence (AAAI)(1 2019), 2019.
[30]
L. Stockmeyer. The complexity of approximate counting. In Proc. of STOC, pages 118--126, 1983.
[31]
S. Tirthapura and D. P. Woodruff. Rectangle-efficient aggregation in spatial data streams. In Proc. of PODS, pages 283--294. ACM, 2012.
[32]
L. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.
[33]
D. P. Woodruff and Q. Zhang. Tight bounds for distributed functional monitoring. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 941--960, 2012.
  1. Model Counting Meets Distinct Elements in a Data Stream

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM SIGMOD Record
    ACM SIGMOD Record  Volume 51, Issue 1
    March 2022
    90 pages
    ISSN:0163-5808
    DOI:10.1145/3542700
    Issue’s Table of Contents
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2022
    Published in SIGMOD Volume 51, Issue 1

    Check for updates

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 67
      Total Downloads
    • Downloads (Last 12 months)19
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media