Keyword: OpenSHMEM : Search

research-article

Open Access

shmem4py: High-Performance One-Sided Communication for Python Applications

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1185–1193, https://doi.org/10.1145/3624062.3624602

This paper describes shmem4py, a Python wrapper for the OpenSHMEM application programming interface (API) which follows a design similar to that of the well-known mpi4py package. OpenSHMEM is a descendant of the one-sided communication library for the ...

research-article

Open Access

OpenSHMEM Queues: An abstraction for enhancing message rate, bandwidth utilization, and reducing tail latency in OpenSHMEM Applications

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 448–457, https://doi.org/10.1145/3624062.3624113

The performance of OpenSHMEM applications is significantly influenced by the network message rate and the efficient utilization of bandwidth for small messages. While network hardware offers higher message rates, software overheads hinder the ...

research-article

RMARaceBench: A Microbenchmark Suite to Evaluate Race Detection Tools for RMA Programs

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 205–214, https://doi.org/10.1145/3624062.3624087

Parallel programming models with Remote Memory Access (RMA), such as MPI RMA, OpenSHMEM, and GASPI, allow processes to modify the memory of other processes directly. Special care is required to avoid concurrent conflicting accesses that lead to data ...

Article

Extending OpenSHMEM with Aggregation Support for Improved Message Rate Performance

Euro-Par 2023: Parallel Processing, Pages 32–46, https://doi.org/10.1007/978-3-031-39698-4_3

Abstract

OpenSHMEM is a highly efficient one-sided communication API that implements the PGAS parallel programming model, and is known for its low latency communication operations that can be mapped efficiently to RDMA capabilities of network ...

Article

A Productive and Scalable Actor-Based Programming System for PGAS Applications

Computational Science – ICCS 2022, Pages 233–247, https://doi.org/10.1007/978-3-031-08751-6_17

Abstract

The Partitioned Global Address Space (PGAS) model is well suited for executing irregular applications on cluster-based systems, due to its efficient support for short, one-sided messages. Separately, the actor model has been gaining popularity as ...

research-article

Public Access

SV-sim: scalable PGAS-based state vector simulation of quantum circuits

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 97, Pages 1–14, https://doi.org/10.1145/3458817.3476169

High-performance quantum circuit simulation in a classic HPC is still imperative in the NISQ era. Observing that the major obstacle of scalable state-vector quantum simulation arises from the massively fine-grained irregular data-exchange with remote ...

Article

OpenSHMEM Active Message Extension for Task-Based Programming

OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 129–143, https://doi.org/10.1007/978-3-031-04888-3_8

Abstract

As a lightweight library-based Partitioned Global Address Space (PGAS) programming model, OpenSHMEM provides efficient one-sided and collective communications and is receiving more attention in recent years. However, task-based programming models ...

Article

SHMEM-ML: Leveraging OpenSHMEM and Apache Arrow for Scalable, Composable Machine Learning

OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 111–125, https://doi.org/10.1007/978-3-031-04888-3_7

Abstract

SHMEM-ML is a domain specific library for distributed array computations and machine learning model training & inference. Like other projects at the intersection of machine learning and HPC (e.g. dask, Arkouda, Legate Numpy), SHMEM-ML aims to ...

Article

CircusTent: A Tool for Measuring the Performance of Atomic Memory Operations on Emerging Architectures

OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 92–110, https://doi.org/10.1007/978-3-031-04888-3_6

Abstract

Endeavors to engineer the next generation of exascale platforms have resulted in a fundamental shift in system architectures. Orthogonal to what was once considered conventional wisdom, high performance systems designed today are characterized by ...

Article

Dynamic Symmetric Heap Allocation in NVSHMEM

OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 187–198, https://doi.org/10.1007/978-3-031-04888-3_12

Abstract

The OpenSHMEM programming model encourages application developers to partition memory into local and symmetric segments through the use of the SHMEM_SYMMETRIC_SIZE environment variable. While this can lead to improved communication efficiency, it ...

Article

A Study in SHMEM: Parallel Graph Algorithm Acceleration with Distributed Symmetric Memory

OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, Pages 3–20, https://doi.org/10.1007/978-3-031-04888-3_1

Abstract

Over the last few decades, the Message Passing Interface (MPI) has become the parallel-communication standard for distributed algorithms on high-performance CPUs. MPI’s minimal setup overhead and simple API calls give it a low barrier of entry, ...

research-article

CircusTent: A Benchmark Suite for Atomic Memory Operations

MEMSYS '20: Proceedings of the International Symposium on Memory Systems, Pages 144–157, https://doi.org/10.1145/3422575.3422789

A paradigm shift is currently taking place in the field of computer architecture. Consistent silicon-level processor improvements, relied upon in the past to drive the augmentation of system scalability, have stalled. As such, it is widely believed ...

research-article

A Modern Fortran Interface in OpenSHMEM Need for Interoperability with Parallel Fortran Using Coarrays

ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 24, Pages 1–25, https://doi.org/10.1145/3418084

Languages and libraries based on Partitioned Global Address Space (PGAS) programming models are convenient for exploiting scalable parallelism on large applications across different domains with irregular memory access patterns. OpenSHMEM is a PGAS-...

research-article

Public Access

Graph500 on OpenSHMEM: Using A Practical Survey of Past Work to Motivate Novel Algorithmic Developments

PAW17: Proceedings of the Second Annual PGAS Applications Workshop, Article No.: 2, Pages 1–8, https://doi.org/10.1145/3144779.3144781

Graph500 is an open specification of a graph-based benchmark for high-performance computing (HPC). The core computational kernel of Graph500 is a breadth-first search of an undirected graph. Unlike many other HPC benchmarks, Graph500 is therefore ...

research-article

CUDA-Aware OpenSHMEM

Parallel Computing (PACO), Volume 58, Issue C, Pages 27–36, https://doi.org/10.1016/j.parco.2016.05.003

GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA networks like InfiniBand (IB) to GPUs. It enables IB network adapters to directly write/read data to/from GPU memory. Partitioned Global Address Space (PGAS) programming ...

article

Low-level PGAS computing on many-core processors with TSHMEM

Concurrency and Computation: Practice & Experience (CCOMP), Volume 27, Issue 17, Pages 5288–5310, https://doi.org/10.1002/cpe.3569

Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have ...

research-article

LLVM parallel intermediate representation: design and evaluation using OpenSHMEM communications

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Article No.: 2, Pages 1–8, https://doi.org/10.1145/2833157.2833158

We extend the LLVM intermediate representation (IR) to make it a parallel IR (LLVM PIR), which is a necessary step for introducing simple and generic parallel code optimization into LLVM. LLVM is a modular compiler that can be efficiently and easily ...

Article

On the Fence: An Offload Approach to Ordering One-Sided Communication

PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 1–12, https://doi.org/10.1109/PGAS.2015.9

Partitioned Global Address Space (PGAS) and one-sided communication models allow shared data to be transparently and asynchronously accessed by any process within a parallel computation. In order to ensure that updates are performed in the intended ...

Article

ISx: A Scalable Integer Sort for Co-design in the Exascale Era

PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 102–104, https://doi.org/10.1109/PGAS.2015.21

This paper introduces a new scalable integer sort application inspired by the NAS Parallel Benchmark integer sort. We provide a detailed analysis of the NPB integer sort to motivate the development of ISx, a new integer sort for co-design. ISx is a ...

Article

Impact of Frequency Scaling on One Sided Remote Memory Accesses

PGAS '15: Proceedings of the 2015 9th International Conference on Partitioned Global Address Space Programming Models, Pages 25–37, https://doi.org/10.1109/PGAS.2015.11

CPU frequency scaling is a common approach used for achieving energy savings in parallel applications. A typical approach for achieving power savings is by reducing the frequency of a processor whenever the invested CPU cycles do not contribute to the ...
