Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3308558.3313434acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

RealGraph: A Graph Engine Leveraging the Power-Law Distribution of Real-World Graphs

Published: 13 May 2019 Publication History

Abstract

As the size of real-world graphs has drastically increased in recent years, a wide variety of graph engines have been developed to deal with such big graphs efficiently. However, the majority of graph engines have been designed without considering the power-law degree distribution of real-world graphs seriously. Two problems have been observed when existing graph engines process real-world graphs: inefficient scanning of the sparse indicator and the delay in iteration progress due to uneven workload distribution. In this paper, we propose RealGraph, a single-machine based graph engine equipped with the hierarchical indicator and the block-based workload allocation. Experimental results on real-world datasets show that RealGraph significantly outperforms existing graph engines in terms of both speed and scalability.

References

[1]
Ching Avery. 2011. Giraph: Large-scale graph processing infrastructure on hadoop. In Hadoop Summit. 5-9.
[2]
Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 2 (2001), 163-177.
[3]
Michael J Carey, David J DeWitt, Michael J Franklin, Nancy E Hall, Mark L McAuliffe, Jeffrey F Naughton, Daniel T Schuh, Marvin H Solomon, CK Tan, Odysseas G Tsatalos, Seth J White, and Michael J Zwilling. 1994. Shoring up persistent applications. In Proceedings of the ACM international conference on management of data (SIGMOD). 383-394.
[4]
Michael J Carey, David J DeWitt, Joel E Richardson, and Eugene J Shekita. 1986. Object and file management in the EXODUS extensible database system. University of Wisconsin-Madison. Computer Sciences Department.
[5]
Rong Chen, Xin Ding, Peng Wang, Haibo Chen, Binyu Zang, and Haibing Guan. 2014. Computation and communication efficient graph processing with distributed immutable view. In Proceedings of the international symposium on high-performance parallel and distributed computing (HPDC). 215-226.
[6]
Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, and Huazhong Yang. 2016. NXgraph: An efficient graph processing system on a single machine. In Proceedings of the IEEE international conference on data engineering (ICDE). 409-420.
[7]
H-T Chou, David J Dewitt, Randy H Katz, and Anthony C Klug. 1985. Design and implementation of the Wisconsin storage system. Software: Practice and Experience 15, 10 (1985), 943-962.
[8]
Martin Erwig. 1992. Graph algorithms= iteration+ data structures?. In International workshop on graph-theoretic concepts in computer science. 277-292.
[9]
Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the USENIX symposium on operating systems design and implementation (OSDI). 17-30.
[10]
Minyang Han and Khuzaima Daudjee. 2015. Giraph unchained: barrierless asynchronous parallel execution in pregel-like graph processing systems. Proceedings of the VLDB Endowment 8, 9 (2015), 950-961.
[11]
Min-Hee Jang, Christos Faloutsos, Sang-Wook Kim, U Kang, and Jiwoon Ha. 2016. Pin-trust: Fast trust propagation exploiting positive, implicit, and negative information. In Proceedings of the ACM international conference on information and knowledge management (CIKM). 629-638.
[12]
Yong-Yeon Jo, Jiwon Hong, Myung-Hwan Jang, Jae-Geun Bang, and Sang-Wook Kim. 2016. Data Locality in graph engines: Implications and preliminary experimental results. In Proceedings of the ACM international conference on information and knowledge management (CIKM). 1885-1888.
[13]
Min-Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, and Jinwook Kim. 2016. GTS: A fast and scalable graph processing method based on streaming topology to GPUs. In Proceedings of the ACM international conference on management of data (SIGMOD). 447-461.
[14]
Aapo Kyrola, Guy E Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale graph computation on just a pc. In Proceedings of the USENIX symposium on operating systems design and implementation (OSDI). 31-46.
[15]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data 1, 1 (2007), 1-41.
[16]
Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716-727.
[17]
Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, and Yafei Dai. 2017. Garaph: Efficient GPU-accelerated graph processing on a single machine with balanced replication. In Proceedings of the USENIX annual technical conference (ATC). 195-207.
[18]
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a trillion-edge graph on a single machine. In Proceedings of the ACM european conference on computer systems (EuroSys). 527-543.
[19]
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the ACM international conference on management of data (SIGMOD). 135-146.
[20]
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report. Stanford InfoLab.
[21]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric graph processing using streaming partitions. In Proceedings of the ACM symposium on operating systems principles (SOSP). 472-488.
[22]
Robert Sedgewick and Kevin Wayne. 2011. Algorithms. Addison-wesley professional.
[23]
Kenji Suzuki, Isao Horiba, and Noboru Sugie. 2003. Linear-time connected-component labeling based on sequential local operations. Computer Vision and Image Understanding 89, 1 (2003), 1-23.
[24]
Guozhang Wang, Wenlei Xie, Alan J Demers, and Johannes Gehrke. 2013. Asynchronous large-scale graph processing made easy. In Proceedings of biennial conference on innovative data systems research (CIDR). 3-6.
[25]
Wook-Shin, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, and Hwanjo Yu. 2013. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC. In Proceedings of the ACM international conference on knowledge discovery and data mining (KDD). 77-85.
[26]
Reynold S Xin, Joseph E Gonzalez, Michael J Franklin, and Ion Stoica. 2013. GraphX: A resilient distributed graph system on spark. In Proceedings of the international workshop on graph data management experiences and systems (GRADE). 1-6.
[27]
Xintian Yang, Srinivasan Parthasarathy, and Ponnuswamy Sadayappan. 2011. Fast sparse matrix-vector multiplication on GPUs: Implications for graph mining. Proceedings of the VLDB Endowment 4, 4 (2011), 231-242.
[28]
Hilmi Yildirim and Mukkai S Krishnamoorthy. 2008. A random walk method for alleviating the sparsity problem in collaborative filtering. In Proceedings of the ACM conference on recommender systems (RecSys). 131-138.
[29]
Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E Priebe, and Alexander S Szalay. 2015. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proceedings of the USENIX conference on file and storage technologies (FAST). 45-58.
[30]
Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proceedings of the USENIX annual technical conference (ATC). 375-386.

Cited By

View all
  • (2023)SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix MultiplicationProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3615044(923-933)Online publication date: 21-Oct-2023
  • (2023)Disentangling Degree-related Biases and Interest for Out-of-Distribution Generalized Directed Network EmbeddingProceedings of the ACM Web Conference 202310.1145/3543507.3583271(231-239)Online publication date: 30-Apr-2023
  • (2023)On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386309(215-220)Online publication date: 15-Dec-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph engine
  2. power-law degree distribution
  3. real-world graph
  4. single machine

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19
WWW '19: The Web Conference
May 13 - 17, 2019
CA, San Francisco, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)65
  • Downloads (Last 6 weeks)9
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix MultiplicationProceedings of the 32nd ACM International Conference on Information and Knowledge Management10.1145/3583780.3615044(923-933)Online publication date: 21-Oct-2023
  • (2023)Disentangling Degree-related Biases and Interest for Out-of-Distribution Generalized Directed Network EmbeddingProceedings of the ACM Web Conference 202310.1145/3543507.3583271(231-239)Online publication date: 30-Apr-2023
  • (2023)On Overcoming HPC Challenges of Trillion-Scale Real-World Graph Datasets2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386309(215-220)Online publication date: 15-Dec-2023
  • (2022)CenGCN: Centralized Convolutional Networks with Vertex Imbalance for Scale-Free GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3149888(1-1)Online publication date: 2022
  • (2022)EnD: Enhanced Dedensification for Graph Compressing and Embedding2022 IEEE International Conference on Data Mining Workshops (ICDMW)10.1109/ICDMW58026.2022.00092(674-681)Online publication date: Nov-2022
  • (2021)RealGraphwebProceedings of the VLDB Endowment10.14778/3476311.347634214:12(2775-2778)Online publication date: 28-Oct-2021
  • (2021)A Data Layout with Good Data Locality for Single-Machine based Graph EnginesIEEE Transactions on Computers10.1109/TC.2021.3107725(1-1)Online publication date: 2021
  • (2021)Exploiting uninteresting items for effective graph-based one-class collaborative filteringThe Journal of Supercomputing10.1007/s11227-020-03573-877:7(6832-6851)Online publication date: 1-Jul-2021
  • (2020)GraphiteProceedings of the VLDB Endowment10.14778/3380750.338075113:6(783-797)Online publication date: 11-Mar-2020
  • (2020)ASiNEProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3397271.3401079(609-618)Online publication date: 25-Jul-2020

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media