Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3606043.3606085acmotherconferencesArticle/Chapter ViewAbstractPublication Pageshp3cConference Proceedingsconference-collections
research-article

Design and Implementation of External Storage Large-Scale Graph Computing System

Published: 16 November 2023 Publication History

Abstract

With the rise of big data, graph computing has become prevalent in many fields. To effectively address such problems, large-scale graph computing systems have emerged. Most existing systems adopt memory-based computing frameworks. However, the rapid growth of data scale often leads to insufficient memory storage. To tackle these issues, an external storage-based graph computing system called DCGraph has been designed and implemented. The bottleneck of large-scale graph computing systems based on single-machine external storage usually lies in external storage I/O. Hence, DCGraph has been optimized for external storage I/O. In the preprocessing stage, graph data is compressed and transformed into a two-dimensional CSC format. Each block records the offset of its data. This guarantees that data of the same target vertex is continuously stored in external storage within the same block. DCSC format is selected for compressing data blocks with many vertices of zero in-degree. Additionally, a jump-out calculation mode suitable for the CSC format is designed for specific graph calculation algorithms. During data reading, it is determined whether to skip the current record based on its subscript and offset. Selective scheduling is used to skip inactive data blocks during each layer iteration to reduce unnecessary data reading and processing. This approach reduces both the number and total amount of I/Os. Experimental results show that compared to GridGraph, DCGraph achieves a speedup ratio of 1.5 to 2.8 times while reducing the total amount of I/Os in calculation to less than half of that of GridGraph.
CCS CONCEPTS

References

[1]
McAfee A, Brynjolfsson E, Davenport T H, Big data: the management revolution [J]. Harvard businessreview, 2012, 90(10): 60-68.
[2]
Twitter usage statistics[EB/OL]. http://www.internetlivestats.com/twitter-statistics.
[3]
Kyrola A, Blelloch G, Guestrin C. GraphChi: Large-Scale Graph Computation on Just a PC[C]//10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012: 31-46.
[4]
Zhu X, Han W, Chen W. GridGraph: Large-Scale Graph Processing on a single-machine Using 2-Level Hierarchical Partitioning[C]//2015 USENIX Annual Technical Conference (USENIX ATC 15). 2015: 375-386.
[5]
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
[6]
Shvachko K, Kuang H, Radia S, The hadoop distributed file system[C]//2010 IEEE 26th symposium on mass storage systems and technologies (MSST). Ieee, 2010: 1-10.
[7]
Zaharia M, Chowdhury M, Franklin M J, Spark: Cluster computing with working sets[C]//2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 10). 2010.
[8]
Yang C, Buluç A, Owens J D. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU[J]. ACM Transactions on Mathematical Software (TOMS), 2022, 48(1): 1-51.
[9]
Zhang K, Chen R, Chen H. Numa-aware graph-structured analytics[J]. ACM SIGPLAN Notices, 2015, 50(8):183-193.
[10]
Li Bo, Huang Dongqiang, Jia Jinfang, Wu Li, Wang Xiaoying, Huang Jianqiang. Research on Template Computing Optimization Based on CPU+GPU Heterogeneity [J/OL]. Computer Engineering: 1-9 [2023-02-01].https: https://doi.org/10.19678/j.issn.1000-3428.0064282.
[11]
Qiu J, Dhulipala L, Tang J, Lightne: A lightweight graph processing system for network embedding[C]//Proceedings of the 2021 international conference on management of data. 2021: 2281-2289.
[12]
Thorpe J, Qiao Y, Eyolfson J, Dorylus: Affordable, Scalable, and Accurate {GNN} Training with Distributed {CPU} Servers and Serverless Threads[C]//15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 2021: 495-514.
[13]
Dias V, Teixeira C H C, Guedes D, Fractal: A general-purpose graph pattern mining system[C]//Proceedings of the 2019 International Conference on Management of Data. 2019: 1357-1374.
[14]
Lü Y, Guo H, Huang L, GraphPEG: Accelerating graph processing on GPUs[J]. ACM Transactions on Architecture and Code Optimization (TACO), 2021, 18(3): 1-24.
[15]
Wang H, Geng L, Lee R, SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU[C]//Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming. 2019: 38-52.
[16]
Chen R, Shi J, Chen Y, Powerlyra: Differentiated graph computation and partitioning on skewed graphs[J]. ACM Transactions on Parallel Computing (TOPC), 2019, 5(3): 1-39.
[17]
Zhao J, Zhang Y, Liao X, GraphM: an efficient storage system for high throughput of concurrent graph processing[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2019: 1-14.
[18]
Zheng L, Li X, Zheng Y, Scaph: Scalable GPU-Accelerated Graph Processing with Value-Driven Differential Scheduling[C]//2020 USENIX Annual Technical Conference (USENIX ATC 20). 2020: 573-588.
[19]
Chi Y, Dai G, Wang Y, Nxgraph: An efficient graph processing system on a single machine[C]//2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 2016: 409-420.
[20]
Han W S, Lee S, Park K, TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC[C]//Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. 2013: 77-85.
[21]
Zhao C, Zhang Z, Xu P, Kaleido: An efficient out-of-core graph mining system on A single machine[C]//2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020: 673-684.
[22]
Park Y, Min S, Lee J W. Ginex: SSD-enabled Billion-scale Graph Neural Network Training on a Single Machine via Provably Optimal In-memory Caching[J]. arXiv preprint arXiv:2208.09151, 2022.
[23]
Wu C, Zhang G, Wang Y, Redio: Accelerating Disk-Based Graph Processing by Reducing Disk I/Os[J]. IEEE Transactions on Computers, 2018, 68(3): 414-425.
[24]
Sun P, Wen Y, Duong T N B, GraphMP: I/O-efficient big graph analytics on a single commodity machine[J]. IEEE Transactions on Big Data, 2019, 6(4): 816-829.
[25]
Vora K. {LUMOS}:{Dependency-Driven} Disk-based Graph Processing[C]//2019 USENIX Annual Technical Conference (USENIX ATC 19). 2019: 429-442.

Index Terms

  1. Design and Implementation of External Storage Large-Scale Graph Computing System

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    HP3C '23: Proceedings of the 2023 7th International Conference on High Performance Compilation, Computing and Communications
    June 2023
    354 pages
    ISBN:9781450399883
    DOI:10.1145/3606043
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 November 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Big Data
    2. External Storage
    3. Graph Processing System
    4. I/O
    5. Single-Machine

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HP3C 2023

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 41
      Total Downloads
    • Downloads (Last 12 months)41
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 13 Nov 2024

    Other Metrics

    Citations

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format.

    HTML Format

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media