Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2749246.2749258acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstractions

Published: 15 June 2015 Publication History

Abstract

Iterative computation on large graphs has challenged system research from two aspects: (1) how to conduct high performance parallel processing for both in-memory and out-of-core graphs; and (2) how to handle large graphs that exceed the resource boundary of traditional systems by resource aware graph partitioning such that it is feasible to run large-scale graph analysis on a single PC. This paper presents GraphLego, a resource adaptive graph processing system with multi-level programmable graph parallel abstractions. GraphLego is novel in three aspects: (1) we argue that vertex-centric or edge-centric graph partitioning are ineffective for parallel processing of large graphs and we introduce three alternative graph parallel abstractions to enable a large graph to be partitioned at the granularity of subgraphs by slice, strip and dice based partitioning; (2) we use dice-based data placement algorithm to store a large graph on disk by minimizing non-sequential disk access and enabling more structured in-memory access; and (3) we dynamically determine the right level of graph parallel abstraction to maximize sequential access and minimize random access. GraphLego can run efficiently on different computers with diverse resource capacities and respond to different memory requirements by real-world graphs of different complexity. Extensive experiments show the competitiveness of GraphLego against existing representative graph processing systems, such as GraphChi, GraphLab and X-Stream.

References

[1]
U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system - implementation and observations. In ICDM, 2009.
[2]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, pages 135--146, 2010.
[3]
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, 2010.
[4]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. In UAI, 2010.
[5]
U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. Gbase: A scalable and general graph management system. In KDD, pages 1091--1099, 2011.
[6]
A. Buluc and J. R. Gilbert. The combinatorial blas: Design, implementation, and applications. IJHPCA, 25(4):496--509, 2011.
[7]
R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: Taking the pulse of a fast-changing and connected world. In EuroSys, pages 85--98, 2012.
[8]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. In PVLDB, 2012.
[9]
V. Prabhakaran, M. Wu, X. Weng, F. McSherry, L. Zhou, and M. Haridasan. Managing large graphs on multi-cores with graph awareness. In ATC, 2010.
[10]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.
[11]
A. Kyrola, G. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a pc. OSDI'12.
[12]
Giraph. http://giraph.apache.org/.
[13]
B. Shao, H. Wang, and Y. Li. Trinity: A distributed graph engine on a memory cloud. In SIGMOD, 2013.
[14]
W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC. In KDD, pages 77--85, 2013.
[15]
W. Xie, G. Wang, D. Bindel, A. Demers, and J. Gehrke. Fast iterative graph computation with block updates. PVLDB, 2013.
[16]
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric Graph Processing using Streaming Partitions. In SOSP, 2013.
[17]
K. Lee and L. Liu. Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning. PVLDB'13.
[18]
Y. Tian, A. Balmin, S. Andreas Corsten, S. Tatikonda, and J. McPherson. From "Think Like a Vertex" to "Think Like a Graph". PVLDB, 2013.
[19]
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, 2014.
[20]
P. Yuan, W. Zhang, C. Xie, H. Jin, L. Liu, and K. Lee. Fast Iterative Graph Computation: A Path Centric Approach. In SC, 2014.
[21]
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW, 1998.
[22]
M. Bender, G. Brodal, R. Fagerberg, R. Jacob, and E. Vicari. Optimal sparse matrix dense vector multiplication in the i/o-model. Theory of Computing Systems, 47(4):934--962, 2010.
[23]
X. Zhu and Z. Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. In Carnegie Mellon University CALD Tech Report, 2002.
[24]
R. I. Kondor and J. D. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In ICML, 2003.
[25]
Y. Zhou, H. Cheng, and J. X. Yu. Clustering large attributed graphs: An efficient incremental approach. In ICDM, 2010.
[26]
W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In ICDM, 2006.
[27]
Y. Zhou and L. Liu. Activity-edge centric multi-label classification for mining heterogeneous information networks. In KDD, 2014.
[28]
S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The eigentrust algorithm for reputation management in p2p networks. In WWW, pages 640--651, 2003.
[29]
H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM, 2006.
[30]
H. Ma, H. Yang, M. R. Lyu, and I. King. Mining social networks using heat diffusion processes for marketing candidates selection. In CIKM, pages 233--242, 2008.
[31]
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.
[32]
S. A. Macskassy and F. Provost. A simple relational classifier. In MRDM, pages 64--76, 2003.
[33]
V. Satuluri and S. Parthasarathy. Scalable graph clustering using stochastic flows: Applications to community discovery. In KDD, 2009.
[34]
Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural/attribute similarities. VLDB'09.
[35]
Y. Zhou and L. Liu. Social influence based clustering of heterogeneous information networks. In KDD, 2013.
[36]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2000.
[37]
Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD'08.
[38]
H. Ma. An experimental study on implicit social recommendation. In SIGIR, pages 73--82, 2013.
[39]
Y. Zhou, L. Liu, C.-S. Perng, A. Sailer, I. Silva-Lepe, and Z. Su. Ranking services by service network structure and service attributes. In ICWS, 2013.
[40]
Yahoo Webscope. Yahoo! AltaVista Web Page Hyperlink Connectivity Graph, circa 2002. http://webscope.sandbox.yahoo.com/.
[41]
P. Boldi, M. Santini, and S. Vigna. A Large Time- aware Web Graph. SIGIR Forum, 42(2):33--38, 2008.
[42]
H. Kwak, C. Lee, H. Park, S. Moon. What is Twitter, a social network or a news media? In WWW, 2010.
[43]
M. Gjoka, M. Kurant, C. T. Butts, A. Markopoulou. Walking in Facebook: A case study of unbiased sampling of OSNs. In INFOCOM, 2010.
[44]
http://www.informatik.uni-trier.de/~ley/db/.
[45]
http://www.last.fm/api/.
[46]
T. Chakraborty, S. Sikdar, V. Tammana, N. Ganguly, and A. Mukherjee. Computer science fields as ground-truth communities: Their impact, rise and fall. In ASONAM, pages 426--433, 2013.
[47]
J. Cohen, P. Cohen, S. G. West, and L. S. Aiken. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. In Routledge, 2002.
[48]
O. Bretscher. Linear Algebra With Applications, 3rd Edition. In Prentice Hall, 1995.
[49]
F. Hillier and G. Lieberman. Introduction to Operations Research. In McGraw-Hill College, 1995.

Cited By

View all
  • (2023)Dimension-independent certified neural network watermarks via mollifier smoothingProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619611(28976-29008)Online publication date: 23-Jul-2023
  • (2022)Research on complex data analysis model based on distribution automation data2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC)10.1109/ITAIC54216.2022.9836961(1179-1183)Online publication date: 17-Jun-2022
  • (2022)Federated Fingerprint Learning with Heterogeneous Architectures2022 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM54844.2022.00013(31-40)Online publication date: Nov-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing
June 2015
296 pages
ISBN:9781450335508
DOI:10.1145/2749246
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 June 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. 3d cube representation
  2. graph processing system
  3. large-scale graph
  4. multigraph processing
  5. parallel computing

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
  • Intel ISTC

Conference

HPDC'15
Sponsor:

Acceptance Rates

HPDC '15 Paper Acceptance Rate 19 of 116 submissions, 16%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Dimension-independent certified neural network watermarks via mollifier smoothingProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619611(28976-29008)Online publication date: 23-Jul-2023
  • (2022)Research on complex data analysis model based on distribution automation data2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC)10.1109/ITAIC54216.2022.9836961(1179-1183)Online publication date: 17-Jun-2022
  • (2022)Federated Fingerprint Learning with Heterogeneous Architectures2022 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM54844.2022.00013(31-40)Online publication date: Nov-2022
  • (2020)Adversarial attacks on deep graph matchingProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3497474(20834-20851)Online publication date: 6-Dec-2020
  • (2020)Improving Collaborative Filtering with Social Influence over Heterogeneous Information NetworksACM Transactions on Internet Technology10.1145/339750520:4(1-29)Online publication date: 15-Oct-2020
  • (2020)Robust Meta Network Embedding Against Adversarial Attacks2020 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM50108.2020.00192(1448-1453)Online publication date: Nov-2020
  • (2020)Unsupervised Multiple Network Alignment with Multinominal GAN and Variational Inference2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9377929(868-877)Online publication date: 10-Dec-2020
  • (2020)Graph algorithms: parallelization and scalabilityScience China Information Sciences10.1007/s11432-020-2952-763:10Online publication date: 21-Sep-2020
  • (2019)Dual Adversarial Learning Based Network Alignment2019 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2019.00162(1288-1293)Online publication date: Nov-2019
  • (2019)Big RDF Data Storage, Computation, and Analysis: A Strawman's Arguments2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)10.1109/ICDCS.2019.00168(1693-1703)Online publication date: Jul-2019
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media