Stars
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
Intel® SHMEM - Device initiated shared memory based communication library
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
NUMA-aware multi-CPU multi-GPU data transfer benchmarks
OpenSHMEM Application Programming Interface
jrmadsen / omnitrace
Forked from ROCm/omnitrace
Omnitrace: Application Profiling, Tracing, and Analysis
A streamlined CMake build system foundation for developing HPC software