- research-article, August 2024
Exploring the Design Space of Distributed Parallel Sparse Matrix-Multiple Vector Multiplication
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 11, Pages 1977–1988, https://doi.org/10.1109/TPDS.2024.3452478
We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will multiply the sparse matrix by multiple dense vectors at the same ...
- research-article, July 2024
IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 9, Pages 1672–1689, https://doi.org/10.1109/TPDS.2024.3432579
The matrix multiplication algorithm is a fundamental numerical technique in linear algebra and plays a crucial role in many scientific computing applications. Despite the high performance of mainstream basic linear algebra libraries for large-scale dense ...
- research-article, November 2023
Simple, Fast and Widely Applicable Concurrent Memory Reclamation via Neutralization
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 2, Pages 203–220, https://doi.org/10.1109/TPDS.2023.3335671
Reclaiming memory in non-blocking dynamic data structures in unmanaged languages like C/C++ presents a unique challenge due to the risk of use-after-free errors caused by concurrent accesses. Existing safe memory reclamation (SMR) algorithms fall short of ...
- research-article, October 2023
Parallel and Distributed Bayesian Network Structure Learning
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 35, Issue 4, Pages 517–530, https://doi.org/10.1109/TPDS.2023.3326832
Bayesian networks (BNs) are graphical models representing uncertainty in causal discovery, and have been widely used in medical diagnosis and gene analysis due to their effectiveness and good interpretability. However, mainstream BN structure learning ...
- research-article, September 2023
UMA-MF: A Unified Multi-CPU/GPU Asynchronous Computing Framework for SGD-Based Matrix Factorization
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 34, Issue 11, Pages 2978–2993, https://doi.org/10.1109/TPDS.2023.3317535
Recent research has shown that collaborative computing of CPUs and GPUs in the same system can effectively accelerate large-scale SGD-based matrix factorization (MF), but it faces the problem of limited scalability due to parameter synchronization in the ...
- research-article, December 2022
Co-Concurrency Mechanism for Multi-GPUs in Distributed Heterogeneous Environments
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4935–4947, https://doi.org/10.1109/TPDS.2022.3208082
The high concurrency and high throughput characteristics of graphics processing units (GPUs) have made researchers continue to use it to optimize distributed parallel computing architectures. With the upgrading of processor architecture, GPUs allow ...
- research-article, December 2022
LosaTM: A Hardware Transactional Memory Integrated With a Low-Overhead Scenario-Awareness Conflict Manager
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4849–4862, https://doi.org/10.1109/TPDS.2022.3206777
The vigorous development of high compute-intensive applications has led to the demand for maximizing the concurrency of multicore processors. The best-effort hardware transactional memory (HTM) is an important technology adopted by vendors to improve the ...
- research-article, December 2022
Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4530–4546, https://doi.org/10.1109/TPDS.2022.3196192
Nested (fork-join) parallelism eases parallel programming by enabling high-level expression of parallelism and leaving the mapping between parallel tasks and hardware to the runtime scheduler. A challenge in dynamic scheduling of nested parallelism is how ...
- research-article, January 2022
A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 1, Pages 159–175, https://doi.org/10.1109/TPDS.2021.3090328
General sparse matrix-matrix multiplication (SpGEMM) is one of the most important mathematical library routines in a number of applications. In recent years, several efficient SpGEMM algorithms have been proposed, however, most of them are based on the ...
- research-article, September 2021
Optimizing the LINPACK Algorithm for Large-Scale PCIe-Based CPU-GPU Heterogeneous Systems
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 9, Pages 2367–2380, https://doi.org/10.1109/TPDS.2021.3067731
There is a widening gap between GPU and other components (CPU, PCIe bus and communication network) in heterogeneous parallel system. The gap forces us to orchestrate cooperative execution among these components much more carefully than ever before. By ...
- research-article, September 2021
Accelerating the Bron-Kerbosch Algorithm for Maximal Clique Enumeration Using GPUs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 9, Pages 2352–2366, https://doi.org/10.1109/TPDS.2021.3067053
Maximal clique enumeration (MCE) is a classic problem in graph theory to identify all complete subgraphs in a graph. In prior MCE work, the Bron-Kerbosch algorithm is one of the most popular solutions, and there are several improved algorithms proposed on ...
- research-article, August 2021
High-Performance Computing Implementations of Agent-Based Economic Models for Realizing 1:1 Scale Simulations of Large Economies
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 8, Pages 2101–2114, https://doi.org/10.1109/TPDS.2021.3060462
We present a scalable high-performance computing implementation of an agent-based economic model using distributed + shared-memory hybrid parallelization paradigms, capable of simulating 1:1 scale models of large economies like the ...
- research-article, May 2021
Analysis of Global and Local Synchronization in Parallel Computing
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 5, Pages 988–1000, https://doi.org/10.1109/TPDS.2020.3037469
In a parallel computing scenario, the synchronization overhead, needed to coordinate the execution on the parallel computing nodes, can significantly impair the overall execution performance. Typically, synchronization is achieved by adopting a global ...
- research-article, April 2021
Towards Efficient Large-Scale Interprocedural Program Static Analysis on Distributed Data-Parallel Computation
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 867–883, https://doi.org/10.1109/TPDS.2020.3036190
Static program analysis has been widely applied along the whole process of the program development for bug detection, code optimization, testing, etc. Although researchers have made significant work in static program analysis, it is still challenging to ...
- research-article, April 2021
SEIZE: Runtime Inspection for Parallel Dataflow Systems
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 842–854, https://doi.org/10.1109/TPDS.2020.3035170
Many Data-Intensive Scalable Computing (DISC) Systems provide easy-to-use functional APIs, and efficient scheduling and execution strategies allowing users to build concise data-parallel programs. In these systems, data transformations are concealed by ...
- research-article, April 2021
Accelerating Large-Scale Prioritized Graph Computations by Hotness Balanced Partition
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 4, Pages 746–759, https://doi.org/10.1109/TPDS.2020.3032709
Prioritized computation has shown promising performance for a large class of graph algorithms. It prioritizes the execution of some vertices that play important roles in determining convergence. For large-scale distributed graph processing, graph ...
- research-article, February 2021
CPDE: A Methodology for the Transparent Distribution of Centralized Smart Grid Programs
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 32, Issue 2, Pages 342–354, https://doi.org/10.1109/TPDS.2020.3019759
Control and management in smart grids are facing many challenges such as scalability, heterogeneity and technology innovation. This requires a transformation from the traditional centralised paradigm into a distributed one. In this article, a new ...
- research-article, June 2020
Concurrent Irrevocability in Best-Effort Hardware Transactional Memory
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 31, Issue 6, Pages 1301–1315, https://doi.org/10.1109/TPDS.2019.2963030
Existing best-effort requester-wins implementations of transactional memory must resort to non-speculative execution to provide forward progress in the presence of transactions that exceed hardware capacity, experience page faults or suffer high-...
- research-article, January 2020
Exploiting Parallelism and Vectorisation in Breadth-First Search for the Intel Xeon Phi
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 31, Issue 1, Pages 111–128, https://doi.org/10.1109/TPDS.2019.2927451
Modern applications generate massive amounts of data that is challenging to process or analyse. Graph algorithms have emerged as a solution for the analysis of such data because they can represent the entities participating in the generation of large-...
- research-article, September 2019
A Hardware Runtime for Task-Based Programming Models
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 30, Issue 9, Pages 1932–1946, https://doi.org/10.1109/TPDS.2019.2907493
Task-based programming models such as OpenMP 5.0 and OmpSs are simple to use and powerful enough to exploit task parallelism of applications over multicore, manycore and heterogeneous systems. However, their software-only runtimes introduce relevant ...