Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

The more the merrier: efficient multi-source graph traversal

Published: 01 December 2014 Publication History

Abstract

Graph analytics on social networks, Web data, and communication networks has been widely used in a plethora of applications. Many graph analytics algorithms are based on breadth-first search (BFS) graph traversal, which is not only time-consuming for large datasets but also involves much redundant computation when executed multiple times from different start vertices. In this paper, we propose Multi-Source BFS (MS-BFS), an algorithm that is designed to run multiple concurrent BFSs over the same graph on a single CPU core while scaling up as the number of cores increases. MS-BFS leverages the properties of small-world networks, which apply to many real-world graphs, and enables efficient graph traversal that: (i) shares common computation across concurrent BFSs; (ii) greatly reduces the number of random memory accesses; and (iii) does not incur synchronization costs. We demonstrate how a real graph analytics application---all-vertices closeness centrality---can be efficiently solved with MS-BFS. Furthermore, we present an extensive experimental evaluation with both synthetic and real datasets, including Twitter and Wikipedia, showing that MS-BFS provides almost linear scalability with respect to the number of cores and excellent scalability for increasing graph sizes, outperforming state-of-the-art BFS algorithms by more than one order of magnitude when running a large number of BFSs.

References

[1]
Graph 500 Benchmark, 2014. http://www.graph500.org/.
[2]
V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. Scalable Graph Exploration on Multicore Processors. In SC '10, pages 1--11, 2010.
[3]
L. A. N. Amaral, A. Scala, M. Barthélémy, and H. E. Stanley. Classes of Small-World Networks. PNAS, 97(21): 11149--11152, 2000.
[4]
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. In WebSci '12, pages 33--42, 2012.
[5]
D. A. Bader and K. Madduri. Designing Multithreaded Algorithms for Breadth-First Search and St-connectivity on the Cray MTA-2. In ICPP '06, pages 523--530, 2006.
[6]
D. A. Bader and K. Madduri. SNAP, Small-world Network Analysis and Partitioning: An Open-source Parallel Graph Framework for the Exploration of Large-scale Networks. In IPDPS, pages 1--12, 2008.
[7]
S. Beamer, K. Asanović, and D. Patterson. Direction-Optimizing Breadth-First Search. In SC '12, pages 12:1--12:10, 2012.
[8]
P. Boncz. LDBC: Benchmarks for Graph and RDF Data Management. In IDEAS '13, pages 1--2, 2013.
[9]
U. Brandes. A Faster Algorithm for Betweenness Centrality. Journal of Mathematical Sociology, 25: 163--177, 2001.
[10]
A. Buluç and K. Madduri. Parallel Breadth-First Search on Distributed Memory Systems. In SC '11, pages 65:1--65:12, 2011.
[11]
F. Checconi, F. Petrini, J. Willcock, A. Lumsdaine, A. R. Choudhury, and Y. Sabharwal. Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines. In SC '12, pages 13:1--13:12, 2012.
[12]
J. Cheng, Z. Shang, H. Cheng, H. Wang, and J. X. Yu. K-Reach: Who is in Your Small World. PVLDB, 5(11): 1292--1303, July 2012.
[13]
B. V. Cherkassky, A. V. Goldberg, and T. Radzik. Shortest Paths Algorithms: Theory and Experimental Evaluation. Mathematical Programming, 73(2): 129--174, 1996.
[14]
J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey. Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency. In IPDPS '12, pages 378--389, May 2012.
[15]
Faunus -- Graph Analytics Engine, 2014. http://thinkaurelius.github.io/faunus/.
[16]
D. Gregor and A. Lumsdaine. The Parallel BGL: A Generic Library for Distributed Graph Computations. In POOSC, 2005.
[17]
P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. WTF: The Who to Follow Service at Twitter. In WWW '13, pages 505--514, 2013.
[18]
S. Hong, T. Oguntebi, and K. Olukotun. Efficient Parallel Graph Exploration on Multi-Core CPU and GPU. In PACT '11, pages 78--88, Oct 2011.
[19]
A. Kemper and T. Neumann. HyPer: A Hybrid OLTP&OLAP Main Memory Database System Based on Virtual Memory Snapshots. In ICDE'11, pages 195--206, 2011.
[20]
A. Landherr, B. Friedl, and J. Heidemann. A Critical Review of Centrality Measures in Social Networks. Business & Information Systems Engineering, 2(6): 371--385, 2010.
[21]
LDBC Social Network Data Generator, 2014. https://github.com/ldbc/ldbc_snb_datagen.
[22]
J. Lee, C. Jung, D. Lim, and Y. Solihin. Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems. TPDS, 20(9): 1309--1324, 2009.
[23]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. PVLDB, 5(8): 716--727, April 2012.
[24]
A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry. Challenges in Parallel Graph Processing. Parallel Processing Letters, 17(1): 5--20, 2007.
[25]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing. In SIGMOD '10, pages 135--146, 2010.
[26]
Neo4j, 2014. http://neo4j.com/.
[27]
P. Olsen, A. Labouseur, and J.-H. Hwang. Efficient Top-K Closeness Centrality Search. In ICDE '14, pages 196--207, March 2014.
[28]
L. Qin, J. X. Yu, L. Chang, H. Cheng, C. Zhang, and X. Lin. Scalable Big Graph Processing in MapReduce. In SIGMOD '14, pages 827--838, 2014.
[29]
A. Rowstron, D. Narayanan, A. Donnelly, G. O'Shea, and A. Douglas. Nobody Ever Got Fired for Using Hadoop on a Cluster. In HotCDP '12, pages 2:1--2:5, 2012.
[30]
N. Satish, C. Kim, J. Chhugani, and P. Dubey. Large-Scale Energy-efficient Graph Traversal: A Path to Efficient Data-Intensive Supercomputing. In SC '12, pages 14:1--14:11, 2012.
[31]
B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In SIGMOD '13, pages 505--516, 2013.
[32]
D. Simmen, K. Schnaitter, J. Davis, Y. He, S. Lohariwala, A. Mysore, V. Shenoi, M. Tan, and X. Yu. Large-Scale Graph Analytics in Aster 6: Bringing Context to Big Data Discovery. PVLDB, 7(13): 1405--1416, August 2014.
[33]
Titan -- Distributed Graph Database, 2014. http://thinkaurelius.github.io/titan/.
[34]
S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications, volume 8. Cambridge University Press, 1994.

Cited By

View all
  • (2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
  • (2024)Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUsInternational Journal of High Performance Computing Applications10.1177/1094342024123192838:3(245-259)Online publication date: 1-May-2024
  • (2024)uBlade: Efficient Batch Processing for Uncertainty Graph QueriesProceedings of the ACM on Management of Data10.1145/36549822:3(1-24)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 8, Issue 4
December 2014
132 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 December 2014
Published in PVLDB Volume 8, Issue 4

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)132
  • Downloads (Last 6 weeks)25
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SeraphProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650719(373-387)Online publication date: 27-Feb-2024
  • (2024)Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUsInternational Journal of High Performance Computing Applications10.1177/1094342024123192838:3(245-259)Online publication date: 1-May-2024
  • (2024)uBlade: Efficient Batch Processing for Uncertainty Graph QueriesProceedings of the ACM on Management of Data10.1145/36549822:3(1-24)Online publication date: 30-May-2024
  • (2024)On Efficient Large Sparse Matrix Chain MultiplicationProceedings of the ACM on Management of Data10.1145/36549592:3(1-27)Online publication date: 30-May-2024
  • (2024)FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale SystemsProceedings of the 38th ACM International Conference on Supercomputing10.1145/3650200.3656621(38-49)Online publication date: 30-May-2024
  • (2024)Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix MultiplicationProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC41406.2024.00052(1-17)Online publication date: 17-Nov-2024
  • (2024)Optimizing sparse general matrix–matrix multiplication for DCUsThe Journal of Supercomputing10.1007/s11227-024-06234-280:14(20176-20200)Online publication date: 30-May-2024
  • (2023)MITra: A Framework for Multi-Instance Graph TraversalProceedings of the VLDB Endowment10.14778/3603581.360359416:10(2551-2564)Online publication date: 1-Jun-2023
  • (2023)HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication AcceleratorProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3613424.3623790(1148-1162)Online publication date: 28-Oct-2023
  • (2023)Closeness Centrality on Uncertain GraphsACM Transactions on the Web10.1145/360491217:4(1-29)Online publication date: 11-Jul-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media