Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2817946.2817950acmconferencesArticle/Chapter ViewAbstractPublication PagescosnConference Proceedingsconference-collections
research-article

Streaming Graph Partitioning in the Planted Partition Model

Published: 02 November 2015 Publication History

Abstract

The sheer increase in the size of graph data has created a lot of interest into developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning in order to minimize communication and achieve work balance.
In this work we contribute to the recent research line of streaming graph partitioning [30], ;31], [34] which computes an approximately balanced k-partitioning of the vertex set of a graph using a single pass over the graph stream using degree-based criteria. This graph partitioning framework is well tailored to processing large-scale and dynamic graphs. In this work we introduce the use of higher length walks for streaming graph partitioning and show that their use incurs a minor computational cost which can significantly improve the quality of the graph partition. We perform an average case analysis of our algorithm using the planted partition model [7], [25]. We complement the recent results of Stanton [30] by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the wide number of choices for the length of the walks we show that the proposed length is optimal. Finally, we perform simulations which indicate that our asymptotic results hold even for small graph sizes.

References

[1]
http://tinyurl.com/pmzmtys.
[2]
http://yann.lecun.com/exdb/mnist/.
[3]
Large scale machine learning and other animals, http://tinyurl.com/qfn766u, June 2014.
[4]
B. Bollobás. Random graphs. Cambridge University Press, 2001.
[5]
A. Buluc, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent advances in graph partitioning. Arxiv 1311.3144.
[6]
R. Chen, J. Shi, Y. Chen, H. Guan, and H. Chen. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. 2013.
[7]
A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms (RSA), 18(2):116--140, 2001.
[8]
D. P. Dubhashi and A. Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009.
[9]
U. Feige and R. Krauthgamer. A polylogarithmic approximation of the minimum bisection. SIAM J. Comput., 31(4):1090--1118, Apr. 2002.
[10]
A. D. Flaxman, A. M. Frieze, and J. Vera. A geometric preferential attachment model of networks. Internet Mathematics, 3(2):187--205, 2006.
[11]
A. Frieze and C. E. Tsourakakis. Some properties of random apollonian networks. In Internet Mathematics. 2014.
[12]
A. Gionis and C. E. Tsourakakis. Dense subgraph discovery: Kdd 2015 tutorial. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2313--2314. ACM, 2015.
[13]
J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Operating Systems Design and Implementation (OSDI), 2012.
[14]
G. Jerrum, M.and Sorkin. The metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82(1):155--175, 1998.
[15]
U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In IEEE International Conference on Data Mining (ICDM), 2009.
[16]
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J., 49(2):291--307, Feb. 1970.
[17]
D. E. Knuth. Seminumerical algorithms. 2007.
[18]
A. Konstantin and H. R\"acke. Balanced graph partitioning. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 120--124, 2004.
[19]
R. Krauthgamer, J. S. Naor, and R. Schwartz. Partitioning graphs into balanced components. In Symposium on Discrete Algorithms (SODA), pages 942--949, 2009.
[20]
L. Lovász and M. Simonovits. The mixing rate of markov chains, an isoperimetric inequality, and computing the volume. In Annual Symposium on Foundations of Computer Science (FOCS), pages 346--354. IEEE, 1990.
[21]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 340--349, 2010.
[22]
M. Maier, M. Hein, and U. von Luxburg. Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theoretical Computer Science, 410(19):1749--1764, 2009.
[23]
G. Malewicz, M. H. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10, pages 135--146, 2010.
[24]
D. Margo and M. Seltzer. A scalable distributed graph partitioner. Proceedings of the VLDB Endowment, 8(12), 2015.
[25]
F. McSherry. Spectral partitioning of random graphs. In Annual Symposium on Foundations of Computer Science (FOCS), pages 529--537. IEEE, 2001.
[26]
M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
[27]
J. Nishimura and J. Ugander. Restreaming graph partitioning: Simple versatile algorithms for advanced balancing. In ACM KDD, 2013.
[28]
K. Schloegel, G. Karypis, and V. Kumar. Parallel multilevel algorithms for multi-constraint graph partitioning (distinguished paper). In Proceedings from the 6th International Euro-Par Conference on Parallel Processing, Euro-Par '00, pages 296--310, 2000.
[29]
D. A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning. arXiv preprint arXiv:0809.3232, 2008.
[30]
I. Stanton. Streaming balanced graph partitioning algorithms for random graphs. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1287--1301, 2014.
[31]
I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012.
[32]
C. Tsourakakis. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web, pages 1122--1132. International World Wide Web Conferences Steering Committee, 2015.
[33]
C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 104--112. ACM, 2013.
[34]
C. E. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic. Fennel: Streaming graph partitioning for massive distributed graphs. Seventh ACM International Conference on Web Search and Data Mining, WSDM 2014, 2014.
[35]
C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703--726, 2011.
[36]
V. Vu. A simple svd algorithm for finding hidden partitions. CoRR, abs/1404.3918, 2014.
[37]
H. Zhou and D. Woodruff. Clustering via matrix powering. In Symposium on Principles of Database Systems (PODS), 2004.

Cited By

View all
  • (2022)Rough-Set-Based Real-Time Interest Label Extraction over Large-Scale Social NetworksComplexity10.1155/2022/20729502022Online publication date: 1-Jan-2022
  • (2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022
  • (2022)GCNSplitProceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3533702.3534920(1-12)Online publication date: 17-Jun-2022
  • Show More Cited By

Index Terms

  1. Streaming Graph Partitioning in the Planted Partition Model

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    COSN '15: Proceedings of the 2015 ACM on Conference on Online Social Networks
    November 2015
    280 pages
    ISBN:9781450339513
    DOI:10.1145/2817946
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 November 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. distributed computing
    2. planted partition model
    3. streaming graph partitioning

    Qualifiers

    • Research-article

    Conference

    COSN'15
    Sponsor:
    COSN'15: Conference on Online Social Networks
    November 2 - 3, 2015
    California, Palo Alto, USA

    Acceptance Rates

    COSN '15 Paper Acceptance Rate 22 of 82 submissions, 27%;
    Overall Acceptance Rate 69 of 307 submissions, 22%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)13
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 19 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Rough-Set-Based Real-Time Interest Label Extraction over Large-Scale Social NetworksComplexity10.1155/2022/20729502022Online publication date: 1-Jan-2022
    • (2022)AntiBenford Subgraphs: Unsupervised Anomaly Detection in Financial NetworksProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539100(2762-2770)Online publication date: 14-Aug-2022
    • (2022)GCNSplitProceedings of the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3533702.3534920(1-12)Online publication date: 17-Jun-2022
    • (2021)SDP: Scalable Real-time Dynamic Graph PartitionerIEEE Transactions on Services Computing10.1109/TSC.2021.3137932(1-1)Online publication date: 2021
    • (2021)Planted hitting set recovery in hypergraphsJournal of Physics: Complexity10.1088/2632-072X/abdb7d2:3(035004)Online publication date: 5-May-2021
    • (2021)Finding densest k-connected subgraphsDiscrete Applied Mathematics10.1016/j.dam.2021.08.032305:C(34-47)Online publication date: 31-Dec-2021
    • (2020)Flowless: Extracting Densest Subgraphs Without Flow ComputationsProceedings of The Web Conference 202010.1145/3366423.3380140(573-583)Online publication date: 20-Apr-2020
    • (2020)Optimal Learning of Joint Alignments with a Faulty Oracle2020 Information Theory and Applications Workshop (ITA)10.1109/ITA50056.2020.9244966(1-10)Online publication date: 2-Feb-2020
    • (2020)DETER: Streaming Graph Partitioning via Combined Degree and Cluster InformationAlgorithms and Architectures for Parallel Processing10.1007/978-3-030-38991-8_16(242-255)Online publication date: 22-Jan-2020
    • (2019)Window-based Streaming Graph Partitioning AlgorithmProceedings of the Australasian Computer Science Week Multiconference10.1145/3290688.3290711(1-10)Online publication date: 29-Jan-2019
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media