Language types


Showing 1 - 20 of 210 Results


research-article
August 2024
Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication
- Hua Huang,
- Edmond Chow
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 11, Pages 1977–1988, https://doi.org/10.1109/TPDS.2024.3452478
We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will multiply the sparse matrix by multiple dense vectors at the same ...
Total Citations: 0
research-article
July 2024
IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 9, Pages 1672–1689, https://doi.org/10.1109/TPDS.2024.3432579
The matrix multiplication algorithm is a fundamental numerical technique in linear algebra and plays a crucial role in many scientific computing applications. Despite the high performance of mainstream basic linear algebra libraries for large-scale dense ...
Total Citations: 0
research-article
November 2023
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 2, Pages 203–220, https://doi.org/10.1109/TPDS.2023.3335671
Reclaiming memory in non-blocking dynamic data structures in unmanaged languages like C/C++ presents a unique challenge due to the risk of use-after-free errors caused by concurrent accesses. Existing safe memory reclamation (SMR) algorithms fall short of ...
Total Citations: 2
research-article
October 2023
Parallel and Distributed Bayesian Network Structure Learning
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 4, Pages 517–530, https://doi.org/10.1109/TPDS.2023.3326832
Bayesian networks (BNs) are graphical models representing uncertainty in causal discovery, and have been widely used in medical diagnosis and gene analysis due to their effectiveness and good interpretability. However, mainstream BN structure learning ...
Total Citations: 1
research-article
September 2023
UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization
- Yizhi Huang,
- Yan Liu,
- Yang Bai,
- Si Chen,
- Renfa Li
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 34, Issue 11, Pages 2978–2993, https://doi.org/10.1109/TPDS.2023.3317535
Recent research has shown that collaborative computing of CPUs and GPUs in the same system can effectively accelerate large-scale SGD-based matrix factorization (MF), but it faces the problem of limited scalability due to parameter synchronization in the ...
Total Citations: 0
research-article
December 2022
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4935–4947, https://doi.org/10.1109/TPDS.2022.3208082
The high concurrency and high throughput characteristics of graphics processing units (GPUs) have made researchers continue to use it to optimize distributed parallel computing architectures. With the upgrading of processor architecture, GPUs allow ...
Total Citations: 1
research-article
December 2022
LosaTM: A Hardware Transactional Memory Integrated With a Low-Overhead Scenario-Awareness Conflict Manager
- Chao Fu,
- Li Wan,
- Jun Han
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4849–4862, https://doi.org/10.1109/TPDS.2022.3206777
The vigorous development of high compute-intensive applications has led to the demand for maximizing the concurrency of multicore processors. The best-effort hardware transactional memory (HTM) is an important technology adopted by vendors to improve the ...
Total Citations: 0
research-article
Open Access
December 2022
Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing
- Shumpei Shiina,
- Kenjiro Taura
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4530–4546, https://doi.org/10.1109/TPDS.2022.3196192
Nested (fork-join) parallelism eases parallel programming by enabling high-level expression of parallelism and leaving the mapping between parallel tasks and hardware to the runtime scheduler. A challenge in dynamic scheduling of nested parallelism is how ...
Total Citations: 1
research-article
January 2022
A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 1, Pages 159–175, https://doi.org/10.1109/TPDS.2021.3090328
General sparse matrix-matrix multiplication (SpGEMM) is one of the most important mathematical library routines in a number of applications. In recent years, several efficient SpGEMM algorithms have been proposed, however, most of them are based on the ...
Total Citations: 11
research-article
September 2021
Optimizing the LINPACK Algorithm for Large-Scale PCIe-Based CPU-GPU Heterogeneous Systems
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 9, Pages 2367–2380, https://doi.org/10.1109/TPDS.2021.3067731
There is a widening gap between GPU and other components (CPU, PCIe bus and communication network) in heterogeneous parallel system. The gap forces us to orchestrate cooperative execution among these components much more carefully than ever before. By ...
Total Citations: 6
research-article
September 2021
Accelerating the Bron-Kerbosch Algorithm for Maximal Clique Enumeration Using GPUs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 9, Pages 2352–2366, https://doi.org/10.1109/TPDS.2021.3067053
Maximal clique enumeration (MCE) is a classic problem in graph theory to identify all complete subgraphs in a graph. In prior MCE work, the Bron-Kerbosch algorithm is one of the most popular solutions, and there are several improved algorithms proposed on ...
Total Citations: 4
research-article
August 2021
High-Performance Computing Implementations of Agent-Based Economic Models for Realizing 1:1 Scale Simulations of Large Economies
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 8, Pages 2101–2114, https://doi.org/10.1109/TPDS.2021.3060462
We present a scalable high-performance computing implementation of an agent-based economic model using distributed + shared-memory hybrid parallelization paradigms, capable of simulating 1:1 scale models of large economies like the ...
Total Citations: 1
research-article
May 2021
Analysis of Global and Local Synchronization in Parallel Computing
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 5, Pages 988–1000, https://doi.org/10.1109/TPDS.2020.3037469
In a parallel computing scenario, the synchronization overhead, needed to coordinate the execution on the parallel computing nodes, can significantly impair the overall execution performance. Typically, synchronization is achieved by adopting a global ...
Total Citations: 2
research-article
April 2021
Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 867–883, https://doi.org/10.1109/TPDS.2020.3036190
Static program analysis has been widely applied along the whole process of the program development for bug detection, code optimization, testing, etc. Although researchers have made significant work in static program analysis, it is still challenging to ...
Total Citations: 8
research-article
April 2021
SEIZE: Runtime Inspection for Parallel Dataflow Systems
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 842–854, https://doi.org/10.1109/TPDS.2020.3035170
Many Data-Intensive Scalable Computing (DISC) Systems provide easy-to-use functional APIs, and efficient scheduling and execution strategies allowing users to build concise data-parallel programs. In these systems, data transformations are concealed by ...
Total Citations: 0
research-article
April 2021
Accelerating Large-Scale Prioritized Graph Computations by Hotness Balanced Partition
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 746–759, https://doi.org/10.1109/TPDS.2020.3032709
Prioritized computation is shown promising performance for a large class of graph algorithms. It prioritizes the execution of some vertices that play important roles in determining convergence. For large-scale distributed graph processing, graph ...
Total Citations: 0
research-article
February 2021
CPDE: A Methodology for the Transparent Distribution of Centralized Smart Grid Programs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 2, Pages 342–354, https://doi.org/10.1109/TPDS.2020.3019759
Control and management in smart grids are facing many challenges such as scalability, heterogeneity and technology innovation. This requires a transformation from the traditional centralised paradigm into a distributed one. In this article, a new ...
Total Citations: 0
research-article
June 2020
Concurrent Irrevocability in Best-Effort Hardware Transactional Memory
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 31, Issue 6, Pages 1301–1315, https://doi.org/10.1109/TPDS.2019.2963030
Existing best-effort requester-wins implementations of transactional memory must resort to non-speculative execution to provide forward progress in the presence of transactions that exceed hardware capacity, experience page faults or suffer high-...
Total Citations: 0
research-article
Open Access
January 2020
Exploiting Parallelism and Vectorisation in Breadth-First Search for the Intel Xeon Phi
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 31, Issue 1, Pages 111–128, https://doi.org/10.1109/TPDS.2019.2927451
Modern applications generate massive amounts of data that is challenging to process or analyse. Graph algorithms have emerged as a solution for the analysis of such data because they can represent the entities participating in the generation of large-...
Total Citations: 0
research-article
September 2019
A Hardware Runtime for Task-Based Programming Models
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 30, Issue 9, Pages 1932–1946, https://doi.org/10.1109/TPDS.2019.2907493
Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce relevant ...
Total Citations: 5
