Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2884045.2884054acmotherconferencesArticle/Chapter ViewAbstractPublication PagesgpgpuConference Proceedingsconference-collections
research-article

General-purpose join algorithms for large graph triangle listing on heterogeneous systems

Published: 12 March 2016 Publication History

Abstract

We investigate applying general-purpose join algorithms to the triangle listing problem on heterogeneous systems that feature a multi-core CPU and multiple GPUs. In particular, we consider an out-of-core context where graph data are available on secondary storage such as a solid-state disk (SSD) and may not fit in the CPU main memory or GPU device memory. We focus on Leapfrog Triejoin (LFTJ), a recently proposed, worst-case optimal algorithm and present "boxing": a novel yet conceptually simple approach for partitioning and feeding out-of-core input data to LFTJ. The "boxing" algorithm integrates well with a GPU-Optimized LFTJ algorithm for triangle listing. We achieve significant performance gains on a heterogeneous system comprised of GPUs and CPU by utilizing the massive-parallel computation capability of GPUs. Our experimental evaluations on real-world and synthetic data sets (some of which containing more than 1.2 billion edges) show that out-of-core LFTJ is competitive with specialized graph algorithms for triangle listing. By using one or two GPUs, we achieve additional speedups on the same graphs.

References

[1]
A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS'08, pages 739--748. IEEE, 2008.
[2]
S. Baxter. Modern gpu. http://nvlabs.github.io/moderngpu/, 2013.
[3]
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM, volume 4, pages 442--446. SIAM, 2004.
[4]
S. Chu and J. Cheng. Triangle listing in massive networks and its applications. In SIGKDD, pages 672--680. ACM, 2011.
[5]
R. Dementiev. Algorithm engineering for large data sets. PhD thesis, Saarland University, 2006.
[6]
G. Diamos, H. Wu, J. Wang, A. Lele, and S. Yalamanchili. Relational algorithms for multi-bulk-synchronous processors. PPoPP '13, pages 301--302, 2013.
[7]
X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336. ACM, 2013.
[8]
M. A. Khamis, H. Q. Ngo, C. Ré, and A. Rudra. A resolution-based framework for joins: Worst-case and beyond. CoRR, abs/1404.0703, 2014.
[9]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10, pages 591--600, New York, NY, USA, 2010. ACM.
[10]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8):716--727, Apr. 2012.
[11]
B. Menegola. An external memory algorithm for listing triangles. Technical report, Universidade Federal do Rio Grande do Sul, 2010.
[12]
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In IMC'07, San Diego, CA, October 2007.
[13]
H. Q. Ngo, D. T. Nguyen, C. Ré, and A. Rudra. Beyond worst-case analysis for joins with minesweeper. CoRR, abs/1302.0914, 2014.
[14]
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms:{extended abstract}. In PODS, pages 37--48. ACM, 2012.
[15]
R. Pagh and F. Silvestri. The input/output complexity of triangle enumeration. In PODS'14, pages 224--233, 2014.
[16]
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814--818, 2005.
[17]
J. W. Raymond, E. J. Gardiner, and P. Willett. Rascal: Calculation of graph similarity using maximum common edge subgraphs. The Computer Journal, 45(6):631--644, 2002.
[18]
N. Rhodes, P. Willett, A. Calvet, J. B. Dunbar, and C. Humblet. Clip: similarity searching of 3d databases using clique detection. Journal of chemical information and computer sciences, 43(2):443--448, 2003.
[19]
T. Schank. Algorithmic aspects of triangle-based network analysis. Phd in computer science, University Karlsruhe, 2007.
[20]
J. Seo, J. Park, J. Shin, and M. S. Lam. Distributed socialite: a datalog-based language for large-scale graph analysis. Proceedings of the VLDB Endowment, 6(14):1906--1917, 2013.
[21]
T. L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.
[22]
H. Wu, G. Diamos, T. Sheard, M. Aref, S. Baxter, M. Garland, and S. Yalamanchili. Red fox: An execution environment for relational query processing on gpus. CGO '14, pages 44:44--44:54, 2014.
[23]
H. Wu, G. Diamos, J. Wang, S. Cadambi, S. Yalamanchili, and S. Chakradhar. Optimizing data warehousing applications for gpus using kernel fusion/fission. In PLC, IPDPSW '12, pages 2433--2442, 2012.
[24]
H. Wu, D. Zinn, M. Aref, and S. Yalamanchili. Multipredicate join algorithms for accelerating relational graph processing on GPUs. ADMS 2014, September 2014.
[25]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. CoRR, abs/1205.6233, 2012.
[26]
D. Zinn. General-purpose join algorithms for listing triangles in large graphs. CoRR, abs/1501.06689, 2015.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit
March 2016
107 pages
ISBN:9781450341950
DOI:10.1145/2884045
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 March 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. GPGPU
  2. data partitioning
  3. triangle listing

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation (NSF)

Conference

PPoPP '16

Acceptance Rates

GPGPU '16 Paper Acceptance Rate 9 of 23 submissions, 39%;
Overall Acceptance Rate 57 of 129 submissions, 44%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)0
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Parallel Logic Programming: A SequelTheory and Practice of Logic Programming10.1017/S147106842200005922:6(905-973)Online publication date: 28-Mar-2022
  • (2021)Accelerating multi-way joins on the GPUThe VLDB Journal10.1007/s00778-021-00708-y31:3(529-553)Online publication date: 2-Nov-2021
  • (2020)Load-Balancing Parallel Relational AlgebraHigh Performance Computing10.1007/978-3-030-50743-5_15(288-308)Online publication date: 15-Jun-2020
  • (2019)Distributed Relational Algebra at Scale2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC.2019.00014(12-22)Online publication date: Dec-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media