Abstract
The explosive growth of graph datasets has sharply increased the computing power and storage required for graph processing, and heterogeneous platforms have become necessary to handle large-scale graphs, the most popular being the CPU-GPU architecture. However, the steep learning curve and complex concurrency control of heterogeneous platforms pose a challenge for developers, and GPUs from different vendors ship different software stacks, which makes cross-platform porting and verification difficult. Recently, Intel proposed oneAPI, a unified programming model for managing multiple heterogeneous devices at the same time. It offers an accessible programming model for ordinary C++ developers and a convenient concurrency control scheme, allowing devices from different vendors to be managed simultaneously. This creates an opportunity to use oneAPI to design a general cross-architecture framework for large-scale graph computing. In this paper, we propose OneGraph, a large-scale graph computing framework for multiple types of accelerators built on Intel oneAPI. Our approach significantly reduces data transfer between the GPU and CPU and masks transfer latency through asynchronous transfers, which substantially improves performance. We conducted rigorous performance tests on the framework using four classical graph algorithms. The results show that our approach achieves an average speedup of 3.3x over state-of-the-art partitioning-based approaches. Moreover, thanks to the cross-architecture model of Intel oneAPI, the framework can be deployed on different GPU platforms without code modification, and our evaluation shows that OneGraph loses less than 1% performance compared with dedicated GPU programming models in large-scale graph computing.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The source code of OneGraph is available at https://github.com/NKU-EmbeddedSystem/OneGraph
Notes
This work is a redesign and optimization of our previous work, presented at the 50th International Conference on Parallel Processing (ICPP 2021) as "Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs".
References
AMD: ROCm. https://www.amd.com/zh-hans/graphics/servers-solutions-rocm-ml (2022). Accessed 7 Apr 2023
Boldi, P., Codenotti, B., Santini, M., Vigna, S.: UbiCrawler: a scalable fully distributed web crawler. Softw. Pract. Exp. 34, 711–726 (2004)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining, pp. 442–446. SIAM (2004)
CodePlay: oneAPI for AMD GPUs. https://developer.codeplay.com/products/oneapi/amd/home/ (2023a). Accessed 14 Apr 2023
CodePlay: oneAPI for NVIDIA GPUs. https://developer.codeplay.com/products/oneapi/nvidia/home/ (2023b). Accessed 14 Apr 2023
Dong, Y., et al.: PEGASUS: pre-training graph neural networks by contrastive decoding of graph random walks, pp. 6996–7008 (2021)
Ganguly, D., Zhang, Z., Yang, J., Melhem, R.: Adaptive page migration for irregular data-intensive applications under GPU memory oversubscription, pp. 451–461 (2020)
Han, W., Mawhirter, D., Wu, B., Buland, M.: Graphie: large-scale asynchronous graph traversals on just a GPU, pp. 233–245 (2017)
Harris, M.: Unified memory for CUDA beginners. https://developer.nvidia.com/blog/unified-memory-cuda-beginners/ (2017). Accessed 31 Dec 2020
Intel: Intel oneAPI. https://www.intel.com/content/www/us/en/software/oneapi.html (2023a). Accessed Mar 2023
Intel: Migrate CUDA applications to oneAPI cross-architecture programming model based on SYCL. https://www.intel.com/content/www/us/en/developer/articles/technical/migrate-cuda-applications-to-oneapi-based-on-sycl.html (2023b). Accessed 14 Apr 2023
Jiang, C., Chou, J., Zhou, T.: cuGraph: a GPU-accelerated graph analytics library, pp. 1–7. IEEE (2018)
Khorasani, F., Vora, K., Gupta, R., Bhuyan, L.N.: CuSha: vertex-centric graph processing on GPUs, pp. 239–252 (2014)
Khronos: OpenCL. https://www.khronos.org/opencl/ (2011). Accessed 7 Apr 2023
Khronos: SYCL 2020 provisional specification. Tech. Rep., Khronos Group (2020)
Kim, W.: Prediction of essential proteins using topological properties in GO-pruned PPI network based on machine learning methods. Tsinghua Sci. Technol. 17, 645–658 (2012)
Kim, H., Sim, J., Gera, P., Hadidi, R., Kim, H.: Batch-aware unified memory management in GPUs for irregular workloads. In: ASPLOS’20, pp. 1357–1370. Association for Computing Machinery, New York, NY, USA (2020)
Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)
Low, Y., et al.: GraphLab: a new framework for parallel machine learning. In: UAI’10, pp. 340–349. AUAI Press, Arlington, Virginia, USA (2010)
Malewicz, G., et al.: Pregel: a system for large-scale graph processing, pp. 135–146. ACM (2010)
NVIDIA: CUDA toolkit. https://developer.nvidia.com/cuda-toolkit (2022). Accessed 7 Apr 2023
NVIDIA: NVIDIA Tesla P100: the most advanced datacenter accelerator ever built, featuring Pascal GP100. https://www.nvidia.cn/content/dam/en-zz/Solutions/Data-Center/tesla-p100/pdf/nvidia-teslap100-techoverview.pdf (2016). Accessed 26 Nov 2022
Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: AAAI’15, pp. 4292–4293 (2015)
Sabet, A.H.N., Zhao, Z., Gupta, R.: Subway: minimizing data transfer during out-of-GPU-memory graph processing. In: EuroSys’20. Association for Computing Machinery, New York, NY, USA (2020)
Sahu, S., Mhedhbi, A., Salihoglu, S., Lin, J., Özsu, M.T.: The ubiquity of large graphs and surprising challenges of graph processing. Proc. VLDB Endow. 11, 420–431 (2017)
Sengupta, D., Song, S.L., Agarwal, K., Schwan, K.: GraphReduce: processing large-scale graphs on accelerator-based systems. In: SC’15. Association for Computing Machinery, New York, NY, USA (2015)
Tang, R., et al.: Ascetic: enhancing cross-iterations data efficiency in out-of-memory graph processing on GPUs. In: ICPP 2021. Association for Computing Machinery, New York, NY, USA (2021)
Wang, Y., et al.: Gunrock: a high-performance graph processing library on the GPU, pp. 1–12 (2016)
Acknowledgements
This work was supported in part by the Key Research and Development Program of Guangdong, China (2021B0101310002), Natural Science Foundation of China (62172239), and Intel Corporation.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Zhu, J., Han, J. et al. OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI. CCF Trans. HPC 6, 179–191 (2024). https://doi.org/10.1007/s42514-023-00172-w