Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Better Streaming Algorithms for the Maximum Coverage Problem

Published: 01 October 2019 Publication History

Abstract

We study the classic NP-Hard problem of finding the maximum k-set coverage in the data stream model: given a set system of m sets that are subsets of a universe $\{1,\ldots,n \}$, find the k sets that cover the most number of distinct elements. The problem can be approximated up to a factor $1-1/e$ in polynomial time. In the streaming-set model, the sets and their elements are revealed online. The main goal of our work is to design algorithms, with approximation guarantees as close as possible to $1-1/e$, that use sublinear space $o(mn)$. Our main results are:
Two $(1-1/e-\epsilon )$ approximation algorithms: One uses $O(\epsilon ^{-1})$ passes and $\tilde {O}(\epsilon ^{-2} k)$ space whereas the other uses only a single pass but $\tilde {O}(\epsilon ^{-2} m)$ space. $\tilde {O}(\cdot )$ suppresses polylog factors.
We show that any approximation factor better than $(1-(1-1/k)^{k})\approx 1-1/e$ in constant passes requires ${\Omega }(m)$ space for constant k even if the algorithm is allowed unbounded processing time. We also demonstrate a single-pass, $(1-\epsilon )$ approximation algorithm using $\tilde {O}\left (\epsilon ^{-2} m \cdot \min (k,\epsilon ^{-1})\right )$ space.
We also study the maximum k-vertex coverage problem in the dynamic graph stream model. In this model, the stream consists of edge insertions and deletions of a graph on N vertices. The goal is to find k vertices that cover the most number of distinct edges.
We show that any constant approximation in constant passes requires ${\Omega }(N)$ space for constant k whereas $\tilde {O}(\epsilon ^{-2}N)$ space is sufficient for a $(1-\epsilon )$ approximation and arbitrary k in a single pass.
For regular graphs, we show that $\tilde {O}(\epsilon ^{-3}k)$ space is sufficient for a $(1-\epsilon )$ approximation in a single pass. We generalize this to a $(\kappa -\epsilon )$ approximation when the ratio between the minimum and maximum degree is bounded below by $\kappa $.

References

[1]
Ageev, A.A., Sviridenko, M.: Approximation algorithms for maximum coverage and max cut with given sizes of parts. In: IPCO, volume 1610 of Lecture Notes in Computer Science, pp. 17–30. Springer (1999)
[2]
Ageev, A.A., Sviridenko, M.: Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)
[3]
Ahn, K.J, Cormode, G., Guha, S., McGregor, A., Wirth, A.: Correlation clustering in data streams. In: ICML, volume 37 of JMLR Workshop and Conference Proceedings, pp. 2237–2246, JMLR.org (2015)
[4]
Ahn, K.J., Guha, S., McGregor, A.: Analyzing graph structure via linear measurements. In: SODA, pp. 459–467. SIAM (2012)
[5]
Ahn, K.J., Guha, S., McGregor, A.: Graph sketches: sparsification, spanners, and subgraphs. In: PODS, pp. 5–14. ACM (2012)
[6]
Ahn, K.J., Guha, S., McGregor, A.: Spectral sparsification in dynamic graph streams. In: APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pp. 1–10. Springer (2013)
[7]
Anagnostopoulos, A., Becchetti, L., Bordino, I., Leonardi, S., Mele, I., Sankowski, P.: Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst. 33(3), 11:1–11:35 (2015)
[8]
Assadi, S.: Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In: PODS, pp. 321–335. ACM (2017)
[9]
Assadi, S., Khanna, S., Li, Y.: Tight bounds for single-pass streaming complexity of the set cover problem. In: STOC, pp. 698–711. ACM (2016)
[10]
Assadi, S., Khanna, S., Li, Y., Yaroslavtsev, G.: Maximum matchings in dynamic graph streams and the simultaneous communication model. In: SODA, pp. 1345–1364. SIAM (2016)
[11]
Ausiello, G., Boria, N., Giannakos, A., Lucarelli, G., Paschos, V.Th.: Online maximum k-coverage. Discret. Appl. Math. 160(13-14), 1901–1913 (2012)
[12]
Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680. ACM (2014)
[13]
Badanidiyuru, A., Vondrák, J.: Fast algorithms for maximizing submodular functions. In: SODA, pp. 1497–1514. SIAM (2014)
[14]
Bateni, M.H., Esfandiari, H., Mirrokni, V.S.: Almost optimal streaming algorithms for coverage problems CoRR, arXiv: 1610.08096 (2016)
[15]
Bhattacharya, S., Henzinger, M., Nanongkai, D., Tsourakakis, C.E.: Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In: STOC, pp. 173–182. ACM (2015)
[16]
Bonnet, É., Escoffier, B., Paschos, V.Th., Stamoulis, G.: A 0.821-ratio purely combinatorial algorithm for maximum k-vertex cover in bipartite graphs. In: LATIN, volume 9644 of Lecture Notes in Computer Science, pp. 235–248. Springer (2016)
[17]
Caskurlu, B., Mkrtchyan, V., Parekh, O., Subramani, K.: On partial vertex cover and budgeted maximum coverage problems in bipartite graphs. In: IFIP TCS, volume 8705 of Lecture Notes in Computer Science, pp. 13–26. Springer (2014)
[18]
Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. Electronic Colloquium on Computational Complexity (ECCC) 18, 62 (2011)
[19]
Chakrabarti, A., Kale, S.: Submodular maximization meets streaming: Matchings, matroids, and more. In: IPCO, volume 8494 of Lecture Notes in Computer Science, pp. 210–221. Springer (2014)
[20]
Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. IEEE Computer Society (2003)
[21]
Chakrabarti, A., Wirth, A.: Incidence geometries and the pass complexity of semi-streaming set cover. In: SODA, pp. 1365–1373. SIAM (2016)
[22]
Chekuri, C., Gupta, S., Quanrud, K.: Streaming algorithms for submodular function maximization. In: ICALP (1), volume 9134 of Lecture Notes in Computer Science, pp. 318–330. Springer (2015)
[23]
Chekuri, C., Kumar, A.: Maximum coverage problem with group budget constraints and applications. In: APPROX-RANDOM, volume 3122 of Lecture Notes in Computer Science, pp. 72–83. Springer (2004)
[24]
Chitnis, R., Cormode, G., Esfandiari, H., Hajiaghayi, M.T., McGregor, A., Monemizadeh, M., Vorotnikova, S.: Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In: SODA, pp. 1326–1344. SIAM (2016)
[25]
Cormode, G., Datar, M., Indyk, P., Muthukrishnan, S.: Comparing data streams using hamming (how to zero in). IEEE Trans. Knowl. Data Eng. 15(3), 529–540 (2003)
[26]
Cormode, G., Karloff, H.J., Wirth, A.: Set cover algorithms for very large datasets. In: CIKM, pp. 479–488. ACM (2010)
[27]
Emek, Y., Rosén, A.: Semi-streaming set cover - (extended abstract). In: ICALP (1), volume 8572 of Lecture Notes in Computer Science, pp. 453–464. Springer (2014)
[28]
Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)
[29]
Guha, S., McGregor, A., Tench, D.: Vertex and hyperedge connectivity in dynamic graph streams. In: PODS, pp. 241–247. ACM (2015)
[30]
Har-Peled, S., Indyk, P., Mahabadi, S., Vakilian, A.: Towards tight bounds for the streaming set cover problem. In: PODS, pp. 371–383. ACM (2016)
[31]
Kapralov, M., Lee, Y.T., Musco, C., Musco, C., Sidford, A.: Single pass spectral sparsification in dynamic streams. IEEE Computer Society (2014)
[32]
Kapralov, M., Woodruff, D.P.: Spanners and sparsifiers in dynamic streams. In: PODC, pp. 272–281. ACM (2014)
[33]
Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. Theory of Computing 11, 105–147 (2015)
[34]
Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage problem. Inf. Process. Lett. 70(1), 39–45 (1999)
[35]
Kogan, D., Krauthgamer, R.: Sketching cuts in graphs and hypergraphs. In 6th Innovations in Theoretical Computer Science (2015)
[36]
Konrad, C.: Maximum matching in turnstile streams. In: ESA, volume 9294 of Lecture Notes in Computer Science, pp. 840–852. Springer (2015)
[37]
Krause, A., Guestrin, C.: Near-optimal observation selection using submodular functions. In: AAAI, pp. 1650–1654. AAAI Press (2007)
[38]
Kumar, R., Moseley, B., Vassilvitskii, S., Vattani, A.: Fast greedy algorithms in mapreduce and streaming. TOPC 2(3), 14 (2015)
[39]
McGregor, A.: Graph stream algorithms: a survey. SIGMOD Record 43(1), 9–20 (2014)
[40]
McGregor, A., Tench, D., Vorotnikova, S., Vu, H.T.: Densest subgraph in dynamic graph streams. In: MFCS (2), volume 9235 of Lecture Notes in Computer Science, pp. 472–482. Springer (2015)
[41]
McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: PODS, pp. 401–411. ACM (2016)
[42]
McGregor, A., Vu, H.T.: Better streaming algorithms for the maximum coverage problem. In: ICDT, volume 68 of LIPIcs, pp. 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
[43]
Panconesi, A., Srinivasan, A.: Randomized distributed edge coloring via an extension of the chernoff-hoeffding bounds. SIAM J. Comput. 26(2), 350–368 (1997)
[44]
Radhakrishnan, J., Shannigrahi, S.: Streaming algorithms for 2-coloring uniform hypergraphs. In: Algorithms and Data Structures - 12th International Symposium, WADS 2011, New york. Proceedings, pp. 667–678 (2011)
[45]
Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: SDM, pp. 697–708. SIAM (2009)
[46]
Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math. 8(2), 223–250 (1995)
[47]
Spielman, D.A., Teng, S.H.: Spectral sparsification of graphs. SIAM J. Comput. 40(4), 981–1025 (2011)
[48]
Srinivasan, A.: Distributions on level-sets with applications to approximation algorithms. In: FOCS, pp. 588–597. IEEE Computer Society (2001)
[49]
Sun, H.: Counting hypergraphs in data streams. CoRR, arXiv: 1304.7456 (2013)
[50]
Yu, H., Yuan, D.: Set coverage problems in a one-pass data stream. In: SDM, pp. 758–766. SIAM (2013)

Cited By

View all
  • (2024)Counterfactual Explanation at Will, with Zero Privacy LeakageProceedings of the ACM on Management of Data10.1145/36549332:3(1-29)Online publication date: 30-May-2024
  • (2023)The One-Way Communication Complexity of Submodular Maximization with Applications to Streaming and RobustnessJournal of the ACM10.1145/358856470:4(1-52)Online publication date: 12-Aug-2023
  • (2023)A master-apprentice evolutionary algorithm for maximum weighted set K-covering problemApplied Intelligence10.1007/s10489-022-03531-253:2(1912-1944)Online publication date: 1-Jan-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Theory of Computing Systems
Theory of Computing Systems  Volume 63, Issue 7
Oct 2019
306 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 October 2019

Author Tags

  1. Data streams
  2. Algorithms
  3. Approximations
  4. Maximum coverage

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 10 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Counterfactual Explanation at Will, with Zero Privacy LeakageProceedings of the ACM on Management of Data10.1145/36549332:3(1-29)Online publication date: 30-May-2024
  • (2023)The One-Way Communication Complexity of Submodular Maximization with Applications to Streaming and RobustnessJournal of the ACM10.1145/358856470:4(1-52)Online publication date: 12-Aug-2023
  • (2023)A master-apprentice evolutionary algorithm for maximum weighted set K-covering problemApplied Intelligence10.1007/s10489-022-03531-253:2(1912-1944)Online publication date: 1-Jan-2023
  • (2022)Almost-Smooth Histograms and Sliding-Window Graph AlgorithmsAlgorithmica10.1007/s00453-022-00988-y84:10(2926-2953)Online publication date: 1-Oct-2022
  • (2021)Cardinality constrained submodular maximization for random streamsProceedings of the 35th International Conference on Neural Information Processing Systems10.5555/3540261.3540758(6491-6502)Online publication date: 6-Dec-2021
  • (2020)Dynamic submodular maximizationProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496546(9806-9817)Online publication date: 6-Dec-2020

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media