arXiv:2311.13225 (cs)

[Submitted on 22 Nov 2023 (v1), last revised 12 Dec 2023 (this version, v2)]

Title: NeutronOrch: Rethinking Sample-based GNN Training under CPU-GPU Heterogeneous Environments

Authors: Xin Ai, Qiange Wang, Chunyu Cao, Yanfeng Zhang, Chaoyi Chen, Hao Yuan, Yu Gu, Ge Yu

Abstract: Graph Neural Networks (GNNs) have demonstrated outstanding performance in various applications. Existing frameworks train GNN models in CPU-GPU heterogeneous environments and integrate mini-batch and sampling techniques to overcome the GPU memory limitation. In such environments, sample-based GNN training can be divided into three steps: sample, gather, and train. Existing GNN systems use different task orchestrating methods to place each step on the CPU or the GPU. After extensive experiments and analysis, we find that existing task orchestrating methods fail to fully utilize the heterogeneous resources, limited by either inefficient CPU processing or GPU resource contention. In this paper, we propose NeutronOrch, a system for sample-based GNN training that incorporates a layer-based task orchestrating method and ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes the training task of the bottom layer down to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch offloads only the training of frequently accessed vertices to the CPU and lets the GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestrating method, fully overlapping different tasks on heterogeneous resources while strictly guaranteeing bounded staleness. Experimental results show that, compared with state-of-the-art GNN systems, NeutronOrch achieves up to 11.51x performance speedup.
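
The embedding reuse with bounded staleness mentioned in the abstract can be illustrated with a minimal Python sketch. This is not NeutronOrch's implementation: the names `BoundedStalenessCache`, `split_batch`, and `staleness_bound` are hypothetical, and the sketch assumes that staleness is measured as the number of training steps elapsed since the CPU last computed a vertex's bottom-layer embedding.

```python
from dataclasses import dataclass


@dataclass
class CachedEmbedding:
    value: list   # bottom-layer embedding computed on the CPU (assumed layout)
    version: int  # training step at which the embedding was computed


class BoundedStalenessCache:
    """Hypothetical cache of CPU-computed embeddings for hot vertices."""

    def __init__(self, staleness_bound):
        if staleness_bound < 0:
            raise ValueError("staleness_bound must be non-negative")
        self.staleness_bound = staleness_bound
        self._entries = {}

    def put(self, vertex, value, step):
        # Record the embedding together with the step that produced it.
        self._entries[vertex] = CachedEmbedding(list(value), step)

    def get(self, vertex, current_step):
        # Return the embedding only if it is within the staleness bound.
        entry = self._entries.get(vertex)
        if entry is None:
            return None
        if current_step - entry.version > self.staleness_bound:
            return None
        return entry.value


def split_batch(vertices, cache, current_step):
    """Split a mini-batch into vertices whose cached embeddings can be
    reused and vertices that must be computed from scratch."""
    reused = {}
    recompute = []
    for v in vertices:
        emb = cache.get(v, current_step)
        if emb is None:
            recompute.append(v)
        else:
            reused[v] = emb
    return reused, recompute


if __name__ == "__main__":
    cache = BoundedStalenessCache(staleness_bound=2)
    cache.put(7, [0.1, 0.2], step=10)
    print(split_batch([7, 8], cache, current_step=12))
    print(split_batch([7, 8], cache, current_step=13))
```

In this sketch a cached embedding is reused only while `current_step - version` stays within the bound; once it ages past the bound, the vertex falls back to the recompute path, which is one simple way to keep staleness bounded.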

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2311.13225 [cs.DC]
	(or arXiv:2311.13225v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2311.13225

Submission history

From: Xin Ai
[v1] Wed, 22 Nov 2023 08:26:42 UTC (895 KB)
[v2] Tue, 12 Dec 2023 02:36:51 UTC (1,109 KB)
