Abstract
High-performance computing increasingly relies on heterogeneous systems with specialized hardware accelerators to improve application performance. For example, general-purpose GPUs programmed through NVIDIA’s CUDA programming system have become a widespread class of accelerator in HPC systems. This trend has exacerbated the challenges of data placement: accelerators often have fast local memories to fuel their computational demands, but slower interconnects to feed those memories. Real-world data-transfer performance is strongly influenced not just by the underlying hardware, but also by the capabilities of the programming system. Understanding how application performance is affected by the logical communication exposed through programming abstractions, as well as by the underlying system topology, is therefore crucial for developing high-performance applications and architectures. This report presents initial data-transfer microbenchmark results from two POWER-based systems, obtained during work towards an automated system performance characterization tool.
This work is supported by IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) - a research collaboration as part of the IBM Cognitive Horizon Network. This work was supported by the Center for Applications Driving Architectures (ADA), one of six centers of JUMP, a Semiconductor Research Corporation program co-sponsored by DARPA. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation award OCI-0725070 and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Pearson, C., Chung, IH., Sura, Z., Hwu, WM., Xiong, J. (2018). NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_32
DOI: https://doi.org/10.1007/978-3-030-02465-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer Science (R0)