Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/3291656.3291731acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

ShenTu: processing multi-trillion edge graphs on millions of cores in seconds

Published: 11 November 2018 Publication History

Abstract

Graphs are an important abstraction used in many scientific fields. With the magnitude of graph-structured data constantly increasing, effective data analytics requires efficient and scalable graph processing systems. Although HPC systems have long been used for scientific computing, people have only recently started to assess their potential for graph processing, a workload with inherent load imbalance, lack of locality, and access irregularity. We propose ShenTu8, the first general-purpose graph processing framework that can efficiently utilize an entire Petascale system to process multi-trillion edge graphs in seconds. ShenTu embodies four key innovations: hardware specialization, supernode routing, on-chip sorting, and degree-aware messaging, which together enable its unprecedented performance and scalability. It can traverse a record-size 70-trillion-edge graph in seconds. Furthermore, ShenTu enables the processing of a spam detection problem on a 12-trillion edge Internet graph, making it possible to identify trustworthy and spam webpages directly at the fine-grained page level.

References

[1]
D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler, "Genbank," Nucleic acids research, 2005.
[2]
H. Mustafa, I. Schilken, M. Karasikov, C. Eickhoff, G. Ratsch, and A. Kahles, "Dynamic compression schemes for graph coloring," bioRxiv, 2018.
[3]
B. Pakkenberg and H. Gundersen, "Total number of neurons and glial cells in human brain nuclei estimated by the disector and the fractionator," Journal of microscopy, 1988.
[4]
A. Lumsdaine, D. Gregor, B. Hendrickson, and J. Berry, "Challenges in parallel graph processing," Parallel Processing Letters, 2007.
[5]
M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the internet topology," in SIGCOMM, ACM, 1999.
[6]
J. Shun and G. E. Blelloch, "Ligra: a lightweight graph processing framework for shared memory," in ACM SIGPLAN Notices, ACM, 2013.
[7]
X. Zhu, W. Chen, W. Zheng, and X. Ma, "Gemini: A computation-centric distributed graph processing system," in OSDI, USENIX, 2016.
[8]
X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, "Top 10 algorithms in data mining," Knowledge and information systems, 2008.
[9]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: a system for large-scale graph processing," in SIGMOD, ACM, 2010.
[10]
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan, "One trillion edges: Graph processing at facebook-scale," VLDB, 2015.
[11]
M. Wu, F. Yang, J. Xue, W. Xiao, Y. Miao, L. Wei, H. Lin, Y. Dai, and L. Zhou, "Gram: scaling graph computation to the trillions," in SoCC, ACM, 2015.
[12]
Harshvardhan, A. Fidel, N. M. Amato, and L. Rauchwerger, "An algorithmic approach to communication reduction in parallel graph algorithms," in PACT, IEEE, 2015.
[13]
A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel, "Chaos: Scale-out graph processing from secondary storage," in SOSP, 2015.
[14]
P. Kumar and H. H. Huang, "G-store: high-performance graph store for trillion-edge processing," in SC, IEEE Press, 2016.
[15]
H. Liu and H. H. Huang, "Graphene: Fine-grained io management for graph computing," in FAST, USENIX Association, 2017.
[16]
S. Maass, C. Min, S. Kashyap, W. Kang, M. Kumar, and T. Kim, "Mosaic: Processing a trillion-edge graph on a single machine," in Proceedings of the Twelfth European Conference on Computer Systems, pp. 527--543, ACM, 2017.
[17]
D. Nguyen, A. Lenharth, and K. Pingali, "A lightweight infrastructure for graph analytics," in SOSP, SOSP, ACM, 2013.
[18]
K. Zhang, R. Chen, and H. Chen, "Numa-aware graph-structured analytics," in ACM SIGPLAN Notices, ACM, 2015.
[19]
N. Sundaram, N. Satish, M. M. A. Patwary, S. R. Dulloor, M. J. Anderson, S. G. Vadlamudi, D. Das, and P. Dubey, "Graphmat: High performance graph analytics made productive," VLDB, 2015.
[20]
A. Kyrola, G. E. Blelloch, and C. Guestrin, "Graphchi: Large-scale graph computation on just a pc.," in OSDI, 2012.
[21]
A. Roy, I. Mihailovic, and W. Zwaenepoel, "X-stream: edge-centric graph processing using streaming partitions," in SOSP, ACM, 2013.
[22]
D. Zheng, D. Mhembere, R. Burns, J. Vogelstein, C. E. Priebe, and A. S. Szalay, "Flashgraph: Processing billion-node graphs on an array of commodity ssds," in FAST, 2015.
[23]
X. Zhu, W. Han, and W. Chen, "Gridgraph: Large scale graph processing on a single machine using 2-level hierarchical partitioning," in USENIX ATC, 2015.
[24]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein, "Distributed graphlab: a framework for machine learning and data mining in the cloud," VLDB, 2012.
[25]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "Powergraph: Distributed graph-parallel computation on natural graphs.," in OSDI, 2012.
[26]
R. Chen, J. Shi, Y. Chen, and H. Chen, "Powerlyra: Differentiated graph computation and partitioning on skewed graphs," in Proceedings of the Tenth European Conference on Computer Systems, ACM, 2015.
[27]
S. Hong, S. Depner, T. Manhardt, J. Van Der Lugt, M. Verstraaten, and H. Chafi, "Pgx.d: A fast distributed graph processing engine," in SC, ACM, 2015.
[28]
D. Gregor and A. Lumsdaine, "The parallel bgl: A generic library for distributed graph computations," POOSC, 2005.
[29]
F. Checconi and F. Petrini, "Traversing trillions of edges in real time: Graph exploration on large-scale parallel machines," IPDPS, 2014.
[30]
H. Lin, X. Tang, B. Yu, Y. Zhuo, W. Chen, J. Zhai, W. Yin, and W. Zheng, "Scalable graph traversal on sunway taihulight with ten million cores," in IPDPS, IEEE, 2017.
[31]
K. Ueno, T. Suzumura, N. Maruyama, K. Fujisawa, and S. Matsuoka, "Extreme scale breadth-first search on supercomputers," in Big Data, IEEE, 2016.
[32]
C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, and L. Wilcox, "Extreme-scale amr," in SC, IEEE, 2010.
[33]
M. Bernaschi, M. Bisson, T. Endo, S. Matsuoka, and M. Fatica, "Petaflop biofluidics simulations on a two million-core system," in SC, 2011.
[34]
J. Chhugani, C. Kim, H. Shukla, J. Park, P. Dubey, J. Shalf, and H. D. Simon, "Billion-particle simd-friendly two-point correlation on large-scale hpc cluster systems," in SC, IEEE Computer Society Press, 2012.
[35]
T. Muranushi, H. Hotta, J. Makino, S. Nishizawa, H. Tomita, K. Nitadori, M. Iwasawa, N. Hosono, Y. Maruyama, H. Inoue, H. Yashiro, and Y. Nakamura, "Simulations of below-ground dynamics of fungi: 1.184 pflops attained by automated generation and autotuning of temporal blocking codes," in SC, 2016.
[36]
H. Fu, J. Liao, J. Yang, L. Wang, Z. Song, X. Huang, C. Yang, W. Xue, F. Liu, F. Qiao, W. Zhao, X. Yin, C. Hou, C. Zhang, W. Ge, J. Zhang, Y. Wang, C. Zhou, and G. Yang, "The Sunway TaihuLight supercomputer: system and applications," Science China Information Sciences, vol. 072001, 2016.
[37]
W. Zhang, J. Lin, W. Xu, H. Fu, and G. Yang, "Scstore: managing scientific computing packages for hybrid system with containers," Tsinghua Science and Technology, vol. 22, no. 6, pp. 675--681, 2017.
[38]
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica, "GraphX : Graph Processing in a Distributed Dataflow Framework," in OSDI '14, 2014.
[39]
S. Beamer, K. Asanovic, and D. Patterson, "Searching for a Parent Instead of Fighting Over Children: A Fast Breadth-First Search Implementation for Graph500," Tech Report UCB/EECS-2011-117, 2011.
[40]
M. Besta, M. Podstawski, L. Groner, E. Solomonik, and T. Hoefler, "To push or to pull: On reducing communication and synchronization in graph computations," in HPDC'17, ACM, 2017.
[41]
G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM SISC, 1998.
[42]
S. Beamer, A. Buluc, K. Asanovic, and D. Patterson, "Distributed memory breadth-first search revisited: Enabling bottom-up search," IPDPSW, 2013.
[43]
P. Erdős and A. Rényi, "On the existence of a factor of degree one of a connected random graph," Acta Mathematica Hungarica, 2005.
[44]
H. Kwak, C. Lee, H. Park, and S. Moon, "What is twitter, a social network or a news media?," in WWW, ACM, 2010.
[45]
P. Boldi and S. Vigna, "The webgraph framework i: compression techniques," in WWW, ACM, 2004.
[46]
W. Han, X. Zhu, Z. Zhu, W. Chen, W. Zheng, and J. Lu, "Weibo, and a tale of two worlds," in ASONAM 2015, ACM, 2015.
[47]
The lemur project: Clueweb12 web graph., "http://www.lemurproject.org/clueweb12/webgraph.php/."
[48]
WDC - Hyperlink Graphs, "http://webdatacommons.org/hyperlinkgraph/," 2018.
[49]
J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, "Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication," in ECML-PKDD'05, Springer, 2005.
[50]
D. Chakrabarti, Y. Zhan, and C. Faloutsos, "R-mat: A recursive model for graph mining," in SIAM DM'04, SIAM, 2004.
[51]
Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen, "Combating web spam with trustrank," in VLDB, VLDB Endowment, 2004.
[52]
T. Hoefler, T. Schneider, and A. Lumsdaine, "Multistage Switches are not Crossbars: Effects of Static Routing in High-Performance Networks," in Cluster'08, IEEE, Oct. 2008.
[53]
J. Dean and S. Ghemawat, "Mapreduce: simplified data processing on large clusters," Communications of the ACM, 2008.
[54]
K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova, "Monte carlo methods in pagerank computation: When one iteration is sufficient," SIAM NA, vol. 45, no. 2, pp. 890--904, 2007.
[55]
Search engine optimization marketing spending, "https://www.statista.com/statistics/269410/advertising-expenditure-for-seo-marketing/," 2018.

Cited By

View all
  • (2019)Automatic, application-aware I/O forwarding resource allocationProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323323(265-279)Online publication date: 25-Feb-2019
  • (2019)End-to-end I/O monitoring on a leading supercomputerProceedings of the 16th USENIX Conference on Networked Systems Design and Implementation10.5555/3323234.3323267(379-394)Online publication date: 26-Feb-2019

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SC '18: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
November 2018
932 pages

Sponsors

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Publication History

Published: 11 November 2018

Check for updates

Author Tags

  1. application programming interfaces
  2. big data applications
  3. data analysis
  4. graph theory
  5. supercomputers

Qualifiers

  • Research-article

Conference

SC18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)Automatic, application-aware I/O forwarding resource allocationProceedings of the 17th USENIX Conference on File and Storage Technologies10.5555/3323298.3323323(265-279)Online publication date: 25-Feb-2019
  • (2019)End-to-end I/O monitoring on a leading supercomputerProceedings of the 16th USENIX Conference on Networked Systems Design and Implementation10.5555/3323234.3323267(379-394)Online publication date: 26-Feb-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media