Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2213556.2213560acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Graph sketches: sparsification, spanners, and subgraphs

Published: 21 May 2012 Publication History

Abstract

When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., those based on linear projections of the data. These are applicable in many models including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements.
In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of dense sub-graphs. Our main result is a sketch-based sparsifier construction: we show that Õ(nε-2) random linear projections of a graph on n nodes suffice to (1+ε) approximate all cut values. Similarly, we show that Õ(ε-2) linear projections suffice for (additively) approximating the fraction of induced sub-graphs that match a given pattern such as a small clique. Finally, for distance estimation we present sketch-based spanner constructions. In this last result the sketches are adaptive, i.e., the linear projections are performed in a small number of batches where each projection may be chosen dependent on the outcome of earlier sketches. All of the above results immediately give rise to data stream algorithms that also apply to dynamic graph streams where edges are both inserted and deleted. The non-adaptive sketches, such as those for sparsification and subgraphs, give us single-pass algorithms for distributed data streams with insertion and deletions. The adaptive sketches can be used to analyze MapReduce algorithms that use a small number of rounds.

References

[1]
K. J. Ahn and S. Guha. Graph sparsification in the semi-streaming model. In ICALP (2), pages 328--338, 2009.
[2]
K. J. Ahn and S. Guha. Laminar families and metric embeddings: Non-bipartite maximum matching problem in the semi-streaming model. Manuscript, available at http://arxiv.org/abs/1104.4058, 2011.
[3]
K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. In ICALP (2), pages 526--538, 2011.
[4]
K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In SODA, 2012.
[5]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137--147, 1999.
[6]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. of SODA, pages 623--632, 2002.
[7]
S. Baswana and S. Sen. A simple and linear time randomized algorithm for computing sparse spanners in weighted graphs. Random Struct. Algorithms, 30(4):532--563, 2007.
[8]
A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in Õ(n2)time. In STOC, pages 47--55, 1996.
[9]
L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, pages 253--262, 2006.
[10]
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Theor. Comput. Sci., 312(1):3--15, 2004.
[11]
K. L. Clarkson and D. P. Woodruff. Numerical linear algebra in the streaming model. In STOC, pages 205--214, 2009.
[12]
G. Cormode. Sketch techniques for approximate query processing. In G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine, editors, Synposes for Approximate Query Processing: Samples, Histograms, Wavelets and Sketches, Foundations and Trends in Databases. NOW publishers, 2011.
[13]
G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58--75, 2005.
[14]
G. Cormode, S. Muthukrishnan, and I. Rozenbaum. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In VLDB, pages 25--36, 2005.
[15]
G. Cormode, S. Muthukrishnan, K. Yi, and Q. Zhang. Optimal sampling from distributed streams. In PODS, pages 77--86, 2010.
[16]
M. Elkin. A near-optimal fully dynamic distributed algorithm for maintaining sparse spanners, 2006.
[17]
M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011.
[18]
L. Epstein, A. Levin, J. Mestre, and D. Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. CoRR, abs/00907.0305, 2000.
[19]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2):207--216, 2005.
[20]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709--1727, 2008.
[21]
G. Frahling, P. Indyk, and C. Sohler. Sampling in dynamic data streams and applications. In Symposium on Computational Geometry, pages 142--149, 2005.
[22]
W. S. Fung, R. Hariharan, N. J. A. Harvey, and D. Panigrahi. A general framework for graph sparsification. In STOC, pages 71--80, 2011.
[23]
S. Ganguly and L. Bhuvanagiri. Hierarchical sampling from sketches: Estimating functions over data streams. Algorithmica, 53(4):549--582, 2009.
[24]
A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937--947, june 2010.
[25]
R. E. Gomory and T. C. Hu. Multi-Terminal Network Flows. Journal of the Society for Industrial and Applied Mathematics, 9(4):551--570, 1961.
[26]
S. Guha, N. Koudas, and K. Shim. Approximation and streaming algorithms for histogram construction problems. ACM Trans. Database Syst., 31(1):396--438, 2006.
[27]
P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. J. ACM, 53(3):307--323, 2006.
[28]
P. Indyk and D. Woodruff. Optimal approximations of the frequency moments of data streams. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 202--208. ACM New York, NY, USA, 2005.
[29]
W. B. Johnson and J. Lindenstrauss. Extensions of Lipshitz mapping into Hilbert Space. Contemporary Mathematics, Vol 26, pages 189--206, May 1984.
[30]
H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710--716, 2005.
[31]
H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In PODS, pages 49--58, 2011.
[32]
D. M. Kane, J. Nelson, E. Porat, and D. P. Woodruff. Fast moment estimation in data streams in optimal space. In STOC, pages 745--754, 2011.
[33]
D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41--52, 2010.
[34]
D. R. Karger. Random sampling in cut, flow, and network design problems. In STOC, pages 648--657, 1994.
[35]
J. A. Kelner and A. Levin. Spectral sparsification in the semi-streaming setting. In STACS, pages 440--451, 2011.
[36]
A. McGregor. Finding graph matchings in data streams. APPROX-RANDOM, pages 170--181, 2005.
[37]
A. McGregor. Graph mining on streams. In Encyclopedia of Database Systems, pages 1271--1275, 2009.
[38]
S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, 2006.
[39]
N. Nisan. Pseudorandom generators for space-bounded computation. Combinatorica, 12(4):449--461, 1992.
[40]
A. Schrijver. Combinatorial Optimization - Polyhedra and Efficiency, volume 24 of Algorithms and Combinatorics. Springer, 2003.
[41]
M. Zelke. Weighted matching in the semi-streaming model. Algorithmica 2010.

Cited By

View all
  • (2024)LM-SRPQ: Efficiently Answering Regular Path Query in Streaming GraphsProceedings of the VLDB Endowment10.14778/3641204.364121417:5(1047-1059)Online publication date: 1-Jan-2024
  • (2024)Matrix hypercontractivity, streaming algorithms and LDCs: the large alphabet caseACM Transactions on Computation Theory10.1145/368882416:4(1-38)Online publication date: 11-Nov-2024
  • (2024)Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-CutProceedings of the ACM on Management of Data10.1145/36511482:2(1-18)Online publication date: 14-May-2024
  • Show More Cited By

Index Terms

  1. Graph sketches: sparsification, spanners, and subgraphs

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
    May 2012
    332 pages
    ISBN:9781450312486
    DOI:10.1145/2213556
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 May 2012

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data streams
    2. graph sketching
    3. spanners
    4. sparsification
    5. subgraphs

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '12
    Sponsor:

    Acceptance Rates

    PODS '12 Paper Acceptance Rate 26 of 101 submissions, 26%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)109
    • Downloads (Last 6 weeks)16
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)LM-SRPQ: Efficiently Answering Regular Path Query in Streaming GraphsProceedings of the VLDB Endowment10.14778/3641204.364121417:5(1047-1059)Online publication date: 1-Jan-2024
    • (2024)Matrix hypercontractivity, streaming algorithms and LDCs: the large alphabet caseACM Transactions on Computation Theory10.1145/368882416:4(1-38)Online publication date: 11-Nov-2024
    • (2024)Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-CutProceedings of the ACM on Management of Data10.1145/36511482:2(1-18)Online publication date: 14-May-2024
    • (2024)GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)ACM Transactions on Database Systems10.1145/364384649:3(1-31)Online publication date: 16-May-2024
    • (2024)DoppelGanger++: Towards Fast Dependency Graph Generation for Database ReplayProceedings of the ACM on Management of Data10.1145/36393222:1(1-26)Online publication date: 26-Mar-2024
    • (2024)Optimal Multi-pass Lower Bounds for MST in Dynamic StreamsProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649755(835-846)Online publication date: 10-Jun-2024
    • (2024)General-purpose query processing on summary graphsSocial Network Analysis and Mining10.1007/s13278-024-01314-w14:1Online publication date: 9-Aug-2024
    • (2024)A Framework for Adversarial Streaming Via Differential Privacy and Difference EstimatorsAlgorithmica10.1007/s00453-024-01259-886:11(3339-3394)Online publication date: 31-Aug-2024
    • (2024)Deterministic fault-tolerant connectivity labeling schemeDistributed Computing10.1007/s00446-024-00472-6Online publication date: 4-Nov-2024
    • (2023)Deterministic Fault-Tolerant Connectivity Labeling SchemeProceedings of the 2023 ACM Symposium on Principles of Distributed Computing10.1145/3583668.3594584(190-199)Online publication date: 19-Jun-2023
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media