research-article

Open access

A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPU

Authors:

Xiaochun YeAuthors Info & Claims

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops

Pages 50 - 60

https://doi.org/10.1145/3605731.3605746

Published: 07 September 2023 Publication History

All formats PDF

Abstract

Single-Source Shortest Path (SSSP) algorithm is a common routine in graph processing and has been extensively studied on Graphics Processing Unit (GPU). Despite the powerful parallelism resources and high memory bandwidth provided by GPU, the performance of the SSSP algorithm is hindered by several bottlenecks, such as irregular memory access, load imbalance, and redundant operations. In this paper, three optimizations are proposed to boost the performance of the SSSP algorithm on GPU, including property-driven reordering, adaptive load balancing, and bucket-aware asynchronous execution. Property-driven reordering is employed to improve the data locality and work efficiency. Adaptive load balancing brings a higher utilization of software and hardware GPU resources. Bucket-aware asynchronous execution presents a bucket-based approach for asynchronous implementation to accelerate the convergence of SSSP search.

Extensive experimental results show that our work outperforms the state-of-the-art SSSP implementations, including GPU-based and CPU-based, with an average speedup of 5.09 × and 10.32 × on real-world and synthetic graphs. In addition, our SSSP algorithm indicates good scalability when the graph scale and GPU platform change.

References

[1]

2010. Graph500. http://www.graph500.org.

[2]

Masab Ahmad, Halit Dogan, and Omer Khan. 2019. Speculative Task Parallel Algorithm for Single Source Shortest Path. (2019).

[3]

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia, PA, USA) (KDD ’06). Association for Computing Machinery, New York, NY, USA, 44–54. https://doi.org/10.1145/1150402.1150412

Digital Library

[4]

Richard Bellman. 1958. On a routing problem. Quarterly of applied mathematics 16, 1 (1958), 87–90.

[5]

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (04 2016).

[6]

Ajay Brahmakshatriya, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2021. Compiling Graph Applications for GPUs with Graphit. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (Virtual Event, Republic of Korea) (CGO ’21). IEEE Press, 248–261. https://doi.org/10.1109/CGO51591.2021.9370321

Digital Library

[7]

Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining, In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM Proceedings Series 6, 442–446. https://doi.org/10.1137/1.9781611972740.43

[8]

Marielle Christiansen, Kjetil Fagerholt, Bjørn Nygreen, and David Ronen. 2013. Ship routing and scheduling in the new millennium. European Journal of Operational Research 228, 3 (2013), 467–483.

[9]

Vidushi Dadu, Sihao Liu, and Tony Nowatzki. 2021. PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. In Proceedings of the 48th Annual International Symposium on Computer Architecture (Virtual Event, Spain) (ISCA ’21). IEEE Press, 595–608. https://doi.org/10.1109/ISCA52012.2021.00053

Digital Library

[10]

Andrew Davidson, Sean Baxter, Michael Garland, and John Owens. 2014. Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, 349–359. https://doi.org/10.1109/IPDPS.2014.45

Digital Library

[11]

NVIDIA developer. 2017. NVIDIA TESLA V100 GPU ARCHITECTURE. Technical Report. NVIDIA.

[12]

Laxman Dhulipala, Guy Blelloch, and Julian Shun. 2017. Julienne: A Framework for Parallel Graph Algorithms Using Work-Efficient Bucketing. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (Washington, DC, USA) (SPAA ’17). Association for Computing Machinery, New York, NY, USA, 293–304. https://doi.org/10.1145/3087556.3087580

Digital Library

[13]

Edsger W Dijkstra 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1 (1959), 269–271.

[14]

Rongyu Dong, Huawei Cao, Xiaochun Ye, Yuan Zhang, Qinfen Hao, and Dongrui Fan. 2020. Highly Efficient and GPU-Friendly Implementation of BFS on Single-node System. In 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE, 544–553.

[15]

Xiaojun Dong, Yan Gu, Yihan Sun, and Yunming Zhang. 2021. Efficient Stepping Algorithms and Implementations for Parallel Shortest Paths. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (Virtual Event, USA) (SPAA ’21). Association for Computing Machinery, New York, NY, USA, 184–197. https://doi.org/10.1145/3409964.3461782

Digital Library

[16]

Izzat El Hajj, Juan Gómez-Luna, Cheng Li, Li-Wen Chang, Dejan Milojicic, and Wen-mei Hwu. 2016. KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (Taipei, Taiwan) (MICRO-49). IEEE Press, Article 13, 12 pages.

[17]

Pawan Harish and Petter J Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. High Performance Computing-HiPC 2007 4873, 197–208. https://doi.org/10.1007/978-3-540-77220-0_21

Digital Library

[18]

Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. 2007. The Dynamics of Viral Marketing. ACM Trans. Web 1, 1 (may 2007), 5–es. https://doi.org/10.1145/1232722.1232727

Digital Library

[19]

Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting Positive and Negative Links in Online Social Networks. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 641–650. https://doi.org/10.1145/1772690.1772756

Digital Library

[20]

Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Signed Networks in Social Media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 1361–1370. https://doi.org/10.1145/1753326.1753532

Digital Library

[21]

Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (Chicago, Illinois, USA) (KDD ’05). Association for Computing Machinery, New York, NY, USA, 177–187. https://doi.org/10.1145/1081870.1081893

Digital Library

[22]

Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.

[23]

Fredrik Liljeros, Christofer R Edling, Luis A Nunes Amaral, H Eugene Stanley, and Yvonne Åberg. 2001. The web of human sexual contacts. Nature 411, 6840 (2001), 907–908.

[24]

Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-First Graph Traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, Texas) (SC ’15). Association for Computing Machinery, New York, NY, USA, Article 68, 12 pages. https://doi.org/10.1145/2807591.2807594

Digital Library

[25]

Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and Processing of Graph Algorithms on GPUs. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (Renton, WA, USA) (USENIX ATC ’19). USENIX Association, USA, 411–427.

[26]

Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Garzarán, David Padua, and Keshav Pingali. 2016. DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem. In Proceedings of the 2016 International Conference on Supercomputing (Istanbul, Turkey) (ICS ’16). Association for Computing Machinery, New York, NY, USA, Article 32, 14 pages. https://doi.org/10.1145/2925426.2926287

Digital Library

[27]

Ulrich Meyer and Peter Sanders. 1998. Delta-Stepping: A Parallel Single Source Shortest Path Algorithm. In Proceedings of the 6th Annual European Symposium on Algorithms(ESA ’98). Springer-Verlag, Berlin, Heidelberg, 393–404.

[28]

Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (Williamsburg, VA, USA) (ASPLOS ’18). Association for Computing Machinery, New York, NY, USA, 622–636. https://doi.org/10.1145/3173162.3173180

Digital Library

[29]

Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, and Izzat El Hajj. 2022. A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. In Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization (Virtual Event, Republic of Korea) (CGO ’22). IEEE Press, 1–13. https://doi.org/10.1109/CGO53902.2022.9741284

Digital Library

[30]

Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (Austin, Texas) (AAAI’15). AAAI Press, 4292–4293.

[31]

Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Shenzhen, China) (PPoPP ’13). Association for Computing Machinery, New York, NY, USA, 135–146. https://doi.org/10.1145/2442516.2442530

Digital Library

[32]

L. Takac and Michal Zábovský. 2012. Data analysis in public social networks. International Scientific Conference and International Workshop Present Day Trends of Innovations (01 2012), 1–6.

[33]

Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, and Xiaodong Zhang. 2019. SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (Washington, District of Columbia) (PPoPP ’19). Association for Computing Machinery, New York, NY, USA, 38–52. https://doi.org/10.1145/3293883.3295733

Digital Library

[34]

Kai Wang, Don Fussell, and Calvin Lin. 2021. A Fast Work-Efficient SSSP Algorithm for GPUs. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Virtual Event, Republic of Korea) (PPoPP ’21). Association for Computing Machinery, New York, NY, USA, 133–146. https://doi.org/10.1145/3437801.3441605

Digital Library

[35]

Yangzihao Wang, Yuechao Pan, Andrew Davidson, Yuduo Wu, Carl Yang, Leyuan Wang, Muhammad Osama, Chenshan Yuan, Weitang Liu, Andy T. Riffel, and John D. Owens. 2017. Gunrock: GPU Graph Analytics. ACM Trans. Parallel Comput. 4, 1, Article 3 (aug 2017), 49 pages. https://doi.org/10.1145/3108140

Digital Library

[36]

Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities Based on Ground-Truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (Beijing, China) (MDS ’12). Association for Computing Machinery, New York, NY, USA, Article 3, 8 pages. https://doi.org/10.1145/2350190.2350193

Digital Library

[37]

Yu Zhang, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. 2021. LargeGraph: An efficient dependency-aware GPU-accelerated large-scale graph processing. ACM Transactions on Architecture and Code Optimization (TACO) 18, 4 (2021), 1–24.

Digital Library

[38]

Long Zheng, Xianliang Li, Xi Ge, Xiaofei Liao, Zhiyuan Shao, Hai Jin, and Qiang-Sheng Hua. 2021. Efficient Graph Processing with Invalid Update Filtration. IEEE Transactions on Big Data 7, 3 (2021), 590–602. https://doi.org/10.1109/TBDATA.2019.2921358

[39]

Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA) (OSDI’16). USENIX Association, USA, 301–316.

Index Terms

A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPU
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms

Recommendations

SIMD Monte-Carlo Numerical Simulations Accelerated on GPU and Xeon Phi

The efficiency of a pleasingly parallel application is studied for several computing platforms. A real world problem, i.e., Monte-Carlo numerical simulations of stratospheric balloon envelope drift descent is considered. We detail the optimization of ...
Parallel SIMD CPU and GPU Implementations of Berlekamp---Massey Algorithm and Its Error Correction Application

The Berlekamp---Massey algorithm finds the shortest linear feedback shift register for a binary input sequence. A wide range of applications like cryptography and digital signal processing use this algorithm. This research proposes novel parallel ...
Gillespie's Stochastic Simulation Algorithm on MIC coprocessors

To investigate the behavior of biochemical systems, many runs of Gillespie's Stochastic Simulation Algorithm (SSA) are generally needed, causing excessive computational costs on Central Processing Units (CPUs). Since all SSA runs are independent, the ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops

August 2023

217 pages

ISBN:9798400708428

DOI:10.1145/3605731

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 September 2023

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP-W 2023

ICPP-W 2023: 52nd International Conference on Parallel Processing Workshops

August 7 - 10, 2023

UT, Salt Lake City, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
398
Total Downloads

Downloads (Last 12 months)344
Downloads (Last 6 weeks)43

Reflects downloads up to 21 Nov 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents