Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3605731.3605746acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article
Open access

A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPU

Published: 07 September 2023 Publication History

Abstract

Single-Source Shortest Path (SSSP) algorithm is a common routine in graph processing and has been extensively studied on Graphics Processing Unit (GPU). Despite the powerful parallelism resources and high memory bandwidth provided by GPU, the performance of the SSSP algorithm is hindered by several bottlenecks, such as irregular memory access, load imbalance, and redundant operations. In this paper, three optimizations are proposed to boost the performance of the SSSP algorithm on GPU, including property-driven reordering, adaptive load balancing, and bucket-aware asynchronous execution. Property-driven reordering is employed to improve the data locality and work efficiency. Adaptive load balancing brings a higher utilization of software and hardware GPU resources. Bucket-aware asynchronous execution presents a bucket-based approach for asynchronous implementation to accelerate the convergence of SSSP search.
Extensive experimental results show that our work outperforms the state-of-the-art SSSP implementations, including GPU-based and CPU-based, with an average speedup of 5.09 × and 10.32 × on real-world and synthetic graphs. In addition, our SSSP algorithm indicates good scalability when the graph scale and GPU platform change.

References

[1]
2010. Graph500. http://www.graph500.org.
[2]
Masab Ahmad, Halit Dogan, and Omer Khan. 2019. Speculative Task Parallel Algorithm for Single Source Shortest Path. (2019).
[3]
Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia, PA, USA) (KDD ’06). Association for Computing Machinery, New York, NY, USA, 44–54. https://doi.org/10.1145/1150402.1150412
[4]
Richard Bellman. 1958. On a routing problem. Quarterly of applied mathematics 16, 1 (1958), 87–90.
[5]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (04 2016).
[6]
Ajay Brahmakshatriya, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2021. Compiling Graph Applications for GPUs with Graphit. In Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization (Virtual Event, Republic of Korea) (CGO ’21). IEEE Press, 248–261. https://doi.org/10.1109/CGO51591.2021.9370321
[7]
Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. 2004. R-MAT: A recursive model for graph mining, In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM Proceedings Series 6, 442–446. https://doi.org/10.1137/1.9781611972740.43
[8]
Marielle Christiansen, Kjetil Fagerholt, Bjørn Nygreen, and David Ronen. 2013. Ship routing and scheduling in the new millennium. European Journal of Operational Research 228, 3 (2013), 467–483.
[9]
Vidushi Dadu, Sihao Liu, and Tony Nowatzki. 2021. PolyGraph: Exposing the Value of Flexibility for Graph Processing Accelerators. In Proceedings of the 48th Annual International Symposium on Computer Architecture (Virtual Event, Spain) (ISCA ’21). IEEE Press, 595–608. https://doi.org/10.1109/ISCA52012.2021.00053
[10]
Andrew Davidson, Sean Baxter, Michael Garland, and John Owens. 2014. Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS, 349–359. https://doi.org/10.1109/IPDPS.2014.45
[11]
NVIDIA developer. 2017. NVIDIA TESLA V100 GPU ARCHITECTURE. Technical Report. NVIDIA.
[12]
Laxman Dhulipala, Guy Blelloch, and Julian Shun. 2017. Julienne: A Framework for Parallel Graph Algorithms Using Work-Efficient Bucketing. In Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (Washington, DC, USA) (SPAA ’17). Association for Computing Machinery, New York, NY, USA, 293–304. https://doi.org/10.1145/3087556.3087580
[13]
Edsger W Dijkstra 1959. A note on two problems in connexion with graphs. Numerische mathematik 1, 1 (1959), 269–271.
[14]
Rongyu Dong, Huawei Cao, Xiaochun Ye, Yuan Zhang, Qinfen Hao, and Dongrui Fan. 2020. Highly Efficient and GPU-Friendly Implementation of BFS on Single-node System. In 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom). IEEE, 544–553.
[15]
Xiaojun Dong, Yan Gu, Yihan Sun, and Yunming Zhang. 2021. Efficient Stepping Algorithms and Implementations for Parallel Shortest Paths. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (Virtual Event, USA) (SPAA ’21). Association for Computing Machinery, New York, NY, USA, 184–197. https://doi.org/10.1145/3409964.3461782
[16]
Izzat El Hajj, Juan Gómez-Luna, Cheng Li, Li-Wen Chang, Dejan Milojicic, and Wen-mei Hwu. 2016. KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (Taipei, Taiwan) (MICRO-49). IEEE Press, Article 13, 12 pages.
[17]
Pawan Harish and Petter J Narayanan. 2007. Accelerating large graph algorithms on the GPU using CUDA. High Performance Computing-HiPC 2007 4873, 197–208. https://doi.org/10.1007/978-3-540-77220-0_21
[18]
Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. 2007. The Dynamics of Viral Marketing. ACM Trans. Web 1, 1 (may 2007), 5–es. https://doi.org/10.1145/1232722.1232727
[19]
Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Predicting Positive and Negative Links in Online Social Networks. In Proceedings of the 19th International Conference on World Wide Web (Raleigh, North Carolina, USA) (WWW ’10). Association for Computing Machinery, New York, NY, USA, 641–650. https://doi.org/10.1145/1772690.1772756
[20]
Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. 2010. Signed Networks in Social Media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 1361–1370. https://doi.org/10.1145/1753326.1753532
[21]
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (Chicago, Illinois, USA) (KDD ’05). Association for Computing Machinery, New York, NY, USA, 177–187. https://doi.org/10.1145/1081870.1081893
[22]
Jure Leskovec, Kevin J Lang, Anirban Dasgupta, and Michael W Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29–123.
[23]
Fredrik Liljeros, Christofer R Edling, Luis A Nunes Amaral, H Eugene Stanley, and Yvonne Åberg. 2001. The web of human sexual contacts. Nature 411, 6840 (2001), 907–908.
[24]
Hang Liu and H. Howie Huang. 2015. Enterprise: Breadth-First Graph Traversal on GPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (Austin, Texas) (SC ’15). Association for Computing Machinery, New York, NY, USA, Article 68, 12 pages. https://doi.org/10.1145/2807591.2807594
[25]
Hang Liu and H. Howie Huang. 2019. SIMD-X: Programming and Processing of Graph Algorithms on GPUs. In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (Renton, WA, USA) (USENIX ATC ’19). USENIX Association, USA, 411–427.
[26]
Saeed Maleki, Donald Nguyen, Andrew Lenharth, María Garzarán, David Padua, and Keshav Pingali. 2016. DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem. In Proceedings of the 2016 International Conference on Supercomputing (Istanbul, Turkey) (ICS ’16). Association for Computing Machinery, New York, NY, USA, Article 32, 14 pages. https://doi.org/10.1145/2925426.2926287
[27]
Ulrich Meyer and Peter Sanders. 1998. Delta-Stepping: A Parallel Single Source Shortest Path Algorithm. In Proceedings of the 6th Annual European Symposium on Algorithms(ESA ’98). Springer-Verlag, Berlin, Heidelberg, 393–404.
[28]
Amir Hossein Nodehi Sabet, Junqiao Qiu, and Zhijia Zhao. 2018. Tigr: Transforming Irregular Graphs for GPU-Friendly Graph Processing. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (Williamsburg, VA, USA) (ASPLOS ’18). Association for Computing Machinery, New York, NY, USA, 622–636. https://doi.org/10.1145/3173162.3173180
[29]
Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, and Izzat El Hajj. 2022. A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. In Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization (Virtual Event, Republic of Korea) (CGO ’22). IEEE Press, 1–13. https://doi.org/10.1109/CGO53902.2022.9741284
[30]
Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (Austin, Texas) (AAAI’15). AAAI Press, 4292–4293.
[31]
Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Shenzhen, China) (PPoPP ’13). Association for Computing Machinery, New York, NY, USA, 135–146. https://doi.org/10.1145/2442516.2442530
[32]
L. Takac and Michal Zábovský. 2012. Data analysis in public social networks. International Scientific Conference and International Workshop Present Day Trends of Innovations (01 2012), 1–6.
[33]
Hao Wang, Liang Geng, Rubao Lee, Kaixi Hou, Yanfeng Zhang, and Xiaodong Zhang. 2019. SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (Washington, District of Columbia) (PPoPP ’19). Association for Computing Machinery, New York, NY, USA, 38–52. https://doi.org/10.1145/3293883.3295733
[34]
Kai Wang, Don Fussell, and Calvin Lin. 2021. A Fast Work-Efficient SSSP Algorithm for GPUs. In Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Virtual Event, Republic of Korea) (PPoPP ’21). Association for Computing Machinery, New York, NY, USA, 133–146. https://doi.org/10.1145/3437801.3441605
[35]
Yangzihao Wang, Yuechao Pan, Andrew Davidson, Yuduo Wu, Carl Yang, Leyuan Wang, Muhammad Osama, Chenshan Yuan, Weitang Liu, Andy T. Riffel, and John D. Owens. 2017. Gunrock: GPU Graph Analytics. ACM Trans. Parallel Comput. 4, 1, Article 3 (aug 2017), 49 pages. https://doi.org/10.1145/3108140
[36]
Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities Based on Ground-Truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (Beijing, China) (MDS ’12). Association for Computing Machinery, New York, NY, USA, Article 3, 8 pages. https://doi.org/10.1145/2350190.2350193
[37]
Yu Zhang, Da Peng, Xiaofei Liao, Hai Jin, Haikun Liu, Lin Gu, and Bingsheng He. 2021. LargeGraph: An efficient dependency-aware GPU-accelerated large-scale graph processing. ACM Transactions on Architecture and Code Optimization (TACO) 18, 4 (2021), 1–24.
[38]
Long Zheng, Xianliang Li, Xi Ge, Xiaofei Liao, Zhiyuan Shao, Hai Jin, and Qiang-Sheng Hua. 2021. Efficient Graph Processing with Invalid Update Filtration. IEEE Transactions on Big Data 7, 3 (2021), 590–602. https://doi.org/10.1109/TBDATA.2019.2921358
[39]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA) (OSDI’16). USENIX Association, USA, 301–316.

Index Terms

  1. A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPU

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops
    August 2023
    217 pages
    ISBN:9798400708428
    DOI:10.1145/3605731
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 September 2023

    Check for updates

    Author Tags

    1. GPU
    2. Graph
    3. Parallel Computing
    4. Single-Source Shortest Path

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP-W 2023

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 398
      Total Downloads
    • Downloads (Last 12 months)344
    • Downloads (Last 6 weeks)43
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media