- research-article, November 2023
shmem4py: High-Performance One-Sided Communication for Python Applications
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1185–1193. https://doi.org/10.1145/3624062.3624602
This paper describes shmem4py, a Python wrapper for the OpenSHMEM application programming interface (API) which follows a design similar to that of the well-known mpi4py package. OpenSHMEM is a descendant of the one-sided communication library for the ...
- research-article, November 2023
OpenSHMEM Queues: An abstraction for enhancing message rate, bandwidth utilization, and reducing tail latency in OpenSHMEM Applications
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 448–457. https://doi.org/10.1145/3624062.3624113
The performance of OpenSHMEM applications is significantly influenced by the network message rate and the efficient utilization of bandwidth for small messages. While network hardware offers higher message rates, software overheads hinder the ...
- research-article, November 2023
RMARaceBench: A Microbenchmark Suite to Evaluate Race Detection Tools for RMA Programs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 205–214. https://doi.org/10.1145/3624062.3624087
Parallel programming models with Remote Memory Access (RMA), such as MPI RMA, OpenSHMEM, and GASPI, allow processes to modify the memory of other processes directly. Special care is required to avoid concurrent conflicting accesses that lead to data ...
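The RMARaceBench abstract above mentions "concurrent conflicting accesses". As a rough illustration of what that term usually means, here is a minimal Python sketch that scans a recorded trace of remote accesses and flags pairs that hit the same location on the same target, within the same synchronization epoch, where at least one access is a write. The trace format (`Access`, `epoch`) and `find_conflicts` are hypothetical and are not the benchmark's or any detection tool's method. The model is deliberately simplified: it treats synchronization only as numbered epochs and ignores atomic operations, which the standards allow to target the same location concurrently.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    """One remote memory access in a recorded trace (hypothetical format)."""
    origin: int     # rank/PE that issued the operation
    target: int     # rank/PE whose exposed memory is accessed
    addr: int       # offset into the target's exposed (e.g. symmetric) memory
    is_write: bool  # True for a put, False for a get
    epoch: int      # index of the synchronization epoch (e.g. between barriers)


def find_conflicts(trace):
    """Return every pair of accesses that touch the same target location in
    the same epoch where at least one of the two is a write."""
    conflicts = []
    for a, b in combinations(trace, 2):
        same_location = a.target == b.target and a.addr == b.addr
        unsynchronized = a.epoch == b.epoch
        if same_location and unsynchronized and (a.is_write or b.is_write):
            conflicts.append((a, b))
    return conflicts


trace = [
    Access(origin=0, target=2, addr=8, is_write=True, epoch=0),   # put from PE 0
    Access(origin=1, target=2, addr=8, is_write=False, epoch=0),  # get from PE 1: races with the put
    Access(origin=1, target=2, addr=16, is_write=True, epoch=0),  # different offset: no conflict
    Access(origin=0, target=2, addr=8, is_write=True, epoch=1),   # after a barrier: no conflict
]
for a, b in find_conflicts(trace):
    print(f"conflict on PE {a.target}, offset {a.addr}: PE {a.origin} vs PE {b.origin}")
```

Real detectors must also reason about fences, flushes, and per-origin ordering rules, which this pairwise check leaves out.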
SV-sim: scalable PGAS-based state vector simulation of quantum circuits
- Ang Li,
- Bo Fang,
- Christopher Granade,
- Guen Prawiroatmodjo,
- Bettina Heim,
- Martin Roetteler,
- Sriram Krishnamoorthy
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No. 97, Pages 1–14. https://doi.org/10.1145/3458817.3476169
High-performance quantum circuit simulation in a classic HPC is still imperative in the NISQ era. Observing that the major obstacle of scalable state-vector quantum simulation arises from the massively fine-grained irregular data-exchange with remote ...
- Article, September 2021
OpenSHMEM Active Message Extension for Task-Based Programming
OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 129–143. https://doi.org/10.1007/978-3-031-04888-3_8
As a lightweight library-based Partitioned Global Address Space (PGAS) programming model, OpenSHMEM provides efficient one-sided and collective communications and is receiving more attention in recent years. However, task-based programming models ...
- Article, September 2021
CircusTent: A Tool for Measuring the Performance of Atomic Memory Operations on Emerging Architectures
OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 92–110. https://doi.org/10.1007/978-3-031-04888-3_6
Endeavors to engineer the next generation of exascale platforms have resulted in a fundamental shift in system architectures. Orthogonal to what was once considered conventional wisdom, high performance systems designed today are characterized by ...
- Article, September 2021
Dynamic Symmetric Heap Allocation in NVSHMEM
OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 187–198. https://doi.org/10.1007/978-3-031-04888-3_12
The OpenSHMEM programming model encourages application developers to partition memory into local and symmetric segments through the use of the SHMEM_SYMMETRIC_SIZE environment variable. While this can lead to improved communication efficiency, it ...
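The NVSHMEM abstract above refers to splitting memory into local and symmetric segments whose size is fixed through SHMEM_SYMMETRIC_SIZE. A toy Python model can show both why a fixed symmetric heap is convenient and where it falls short: if every PE reserves a heap of the same size and makes the same allocation calls in the same order, each allocation lands at the same offset on every PE, so a remote address can be named by that offset alone; once the preset size is used up, allocation fails. `SymmetricHeap` is a hypothetical bump allocator written for illustration, not an OpenSHMEM or NVSHMEM API, and real allocators behind routines such as `shmem_malloc` are more sophisticated (they can free memory, for one).

```python
class SymmetricHeap:
    """Toy model of one PE's fixed-size symmetric heap (hypothetical, not a real API)."""

    def __init__(self, size_bytes, align=8):
        self.size = size_bytes  # fixed at startup, playing the role of SHMEM_SYMMETRIC_SIZE
        self.align = align
        self.top = 0            # bump pointer; this toy never frees memory

    def malloc(self, nbytes):
        # Round the current top up to the next alignment boundary.
        start = (self.top + self.align - 1) // self.align * self.align
        if start + nbytes > self.size:
            raise MemoryError(
                f"symmetric heap exhausted: {nbytes} bytes requested at offset "
                f"{start}, heap size is {self.size}"
            )
        self.top = start + nbytes
        return start  # same offset on every PE for the same call sequence


num_pes = 4
heaps = [SymmetricHeap(64) for _ in range(num_pes)]
# Collective allocations: every PE makes the same calls in the same order.
offsets = [[heap.malloc(n) for n in (10, 16)] for heap in heaps]
print(offsets)  # [[0, 16], [0, 16], [0, 16], [0, 16]]

try:
    heaps[0].malloc(40)  # heap top is at offset 32, so 40 more bytes do not fit in 64
except MemoryError as err:
    print(err)
```

The failure on the last call is the limitation that motivates growing the symmetric heap at run time instead of fixing its size up front.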
- research-article, March 2021
CircusTent: A Benchmark Suite for Atomic Memory Operations
MEMSYS '20: Proceedings of the International Symposium on Memory Systems, Pages 144–157. https://doi.org/10.1145/3422575.3422789
A paradigm shift is currently taking place in the field of computer architecture. Consistent silicon-level processor improvements, relied upon in the past to drive the augmentation of system scalability, have stalled. As such, it is widely believed ...
- research-article, September 2020
A Modern Fortran Interface in OpenSHMEM Need for Interoperability with Parallel Fortran Using Coarrays
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No. 24, Pages 1–25. https://doi.org/10.1145/3418084
Languages and libraries based on Partitioned Global Address Space (PGAS) programming models are convenient for exploiting scalable parallelism on large applications across different domains with irregular memory access patterns. OpenSHMEM is a PGAS-...
- research-article, November 2017
Graph500 on OpenSHMEM: Using A Practical Survey of Past Work to Motivate Novel Algorithmic Developments
PAW17: Proceedings of the Second Annual PGAS Applications Workshop, Article No. 2, Pages 1–8. https://doi.org/10.1145/3144779.3144781
Graph500 is an open specification of a graph-based benchmark for high-performance computing (HPC). The core computational kernel of Graph500 is a breadth-first search of an undirected graph. Unlike many other HPC benchmarks, Graph500 is therefore ...
- research-article, October 2016
CUDA-Aware OpenSHMEM
- Khaled Hamidouche,
- Akshay Venkatesh,
- Ammar Ahmad Awan,
- Hari Subramoni,
- Ching-Hsiang Chu,
- Dhabaleswar K. Panda
GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA networks like InfiniBand (IB) to GPUs. It enables IB network adapters to directly write/read data to/from GPU memory. Partitioned Global Address Space (PGAS) programming ...
- article, December 2015
Low-level PGAS computing on many-core processors with TSHMEM
Concurrency and Computation: Practice & Experience (CCOMP), Volume 27, Issue 17, Pages 5288–5310. https://doi.org/10.1002/cpe.3569
Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have ...
- research-article, November 2015
LLVM parallel intermediate representation: design and evaluation using OpenSHMEM communications
LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Article No. 2, Pages 1–8. https://doi.org/10.1145/2833157.2833158
We extend the LLVM intermediate representation (IR) to make it a parallel IR (LLVM PIR), which is a necessary step for introducing simple and generic parallel code optimization into LLVM. LLVM is a modular compiler that can be efficiently and easily ...
- Article, September 2015
On the Fence: An Offload Approach to Ordering One-Sided Communication
PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 1–12. https://doi.org/10.1109/PGAS.2015.9
Partitioned Global Address Space (PGAS) and one-sided communication models allow shared data to be transparently and asynchronously accessed by any process within a parallel computation. In order to ensure that updates are performed in the intended ...
- Article, September 2015
ISx: A Scalable Integer Sort for Co-design in the Exascale Era
PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 102–104. https://doi.org/10.1109/PGAS.2015.21
This paper introduces a new scalable integer sort application inspired by the NAS Parallel Benchmark integer sort. We provide a detailed analysis of the NPB integer sort to motivate the development of ISx, a new integer sort for co-design. ISx is a ...
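For readers unfamiliar with the NPB-style integer sort that the ISx abstract above builds on, here is a single-process Python sketch of the general bucket-distribution pattern: each PE generates uniformly distributed keys, each PE owns a contiguous key range, keys are routed to their owners in an all-to-all exchange, and each owner finishes with a local counting sort. Everything here is an assumption for illustration: "PEs" are just list indices, the exchange is a plain Python loop standing in for one-sided communication, and `isx_style_sort` is a hypothetical name. The paper's actual implementation and parameters may differ.

```python
import random


def isx_style_sort(num_pes, keys_per_pe, max_key, seed=0):
    """Simulate a bucketed distributed integer sort inside one process.

    Each "PE" is an index into Python lists; the exchange step stands in
    for the all-to-all communication a real PGAS code would perform."""
    rng = random.Random(seed)

    # 1. Every PE generates its own uniformly distributed keys in [0, max_key).
    local_keys = [[rng.randrange(max_key) for _ in range(keys_per_pe)]
                  for _ in range(num_pes)]

    # 2. PE p owns the contiguous key range [p * width, (p + 1) * width).
    width = -(-max_key // num_pes)  # ceiling division

    # 3. Exchange: route every key to the PE that owns its range.
    received = [[] for _ in range(num_pes)]
    for keys in local_keys:
        for key in keys:
            received[key // width].append(key)

    # 4. Each PE counting-sorts the keys it received over its own range only.
    sorted_per_pe = []
    for pe, keys in enumerate(received):
        low = pe * width
        counts = [0] * width
        for key in keys:
            counts[key - low] += 1
        sorted_per_pe.append([low + i for i, c in enumerate(counts) for _ in range(c)])
    return local_keys, sorted_per_pe


local_keys, sorted_per_pe = isx_style_sort(num_pes=4, keys_per_pe=100, max_key=1000)
print([len(keys) for keys in sorted_per_pe])  # keys received per PE; varies with the seed
```

Because the key ranges are disjoint and ordered by PE, concatenating the per-PE results in PE order gives the globally sorted sequence. With uniform keys each PE receives roughly `keys_per_pe` keys, which keeps the local sorting work balanced as the PE count grows.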
- Article, September 2015
Impact of Frequency Scaling on One Sided Remote Memory Accesses
PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 25–37. https://doi.org/10.1109/PGAS.2015.11
CPU Frequency scaling is a common approach used for achieving energy savings in parallel applications. A typical approach for achieving power savings is by reducing the frequency of a processor whenever the invested CPU cycles do not contribute to the ...