Keyword: Parallel Computing : Search

research-article

cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio

SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 15, Pages 1–18, https://doi.org/10.1109/SC41406.2024.00021

Existing GPU lossy compressors suffer from expensive data movement overheads, inefficient memory access patterns, and high synchronization latency, resulting in limited throughput. This work proposes cuSZp2, a generic single-kernel error-bounded lossy ...

extended-abstract

Open Access

Agile Queue: A Fast Scalable Concurrent FIFO Queue on GPU

ICPP Workshops '24: Workshop Proceedings of the 53rd International Conference on Parallel Processing, Pages 108–109, https://doi.org/10.1145/3677333.3678269

This work presents Agile Queue, a queue specifically designed to support high concurrency on modern GPUs. Its core idea is to replace conflicting accesses to shared objects with independent accesses to private data. The proposed Agile Queue operates on ...

Article

Open Access

Distributed SMT Solving Based on Dynamic Variable-Level Partitioning

Computer Aided Verification, Pages 68–88, https://doi.org/10.1007/978-3-031-65627-9_4

Abstract

Satisfiability Modulo Theories on arithmetic theories have significant applications in many important domains. Previous efforts have been mainly devoted to improving the techniques and heuristics in sequential SMT solvers. With the development of ...

research-article

Accelerating Static Null Pointer Dereference Detection with Parallel Computing

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware, Pages 135–144, https://doi.org/10.1145/3671016.3671385

High-precision static analysis can effectively detect Null Pointer Dereference (NPD) vulnerabilities in the C language, but the performance overhead is significant. In recent years, researchers have attempted to enhance the efficiency of static analysis by ...

research-article

Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems

Journal of Parallel and Distributed Computing (JPDC), Volume 190, Issue C, https://doi.org/10.1016/j.jpdc.2024.104890

Abstract

Deep Neural Networks (DNNs) have gained widespread popularity in different domain applications due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in ...

Highlights

Parallel Computing.
Machine Learning.
DNN Model Partitioning.
Embedded Systems.
Embedded Software.

short-paper

Open Access

Preliminary Results of the MLPerf BERT Inference Benchmark on AMD Instinct GPUs

PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 59, Pages 1–5, https://doi.org/10.1145/3626203.3670589

In recent years, Artificial Intelligence (AI) has reshaped various facets of our day-to-day lives. To evaluate both hardware and software deployment of Machine Learning (ML) applications, it is necessary to measure system-wide performance and accuracy. ...

short-paper

Multithreading USD and Qt: Adding Concurrency to Filament

DigiPro '24: Proceedings of the 2024 Digital Production Symposium, Article No.: 3, Pages 1–5, https://doi.org/10.1145/3665320.3670992

As production scene complexity and CPU core count increase, the performance of software used to interact with the scenes may not scale accordingly. Filament is Animal Logic’s in-house, USD-based, PyQt lighting DCC, and a key area for improving Filament ...

Article

Automated and Automatic Systems of Management of an Optimization Programs Package for Decisions Making

Mathematical Optimization Theory and Operations Research, Pages 390–407, https://doi.org/10.1007/978-3-031-62792-7_26

Abstract

It is known that, in spite of a large number of methods for numerical solutions to various classes of problems, the choice of the most efficient method for solving a particular problem under specific values of its parameters requires a large ...

research-article

Open Access

Shray: An Owner-Compute Distributed Shared-Memory System

ARRAY 2024: Proceedings of the 10th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, Pages 25–37, https://doi.org/10.1145/3652586.3663314

In this paper, we propose a new library for storing arrays in a distributed fashion on distributed memory systems. From a programmer's perspective, these arrays behave for arbitrary reads as if they were allocated in shared memory. When it comes to ...

research-article

DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs

ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 1–13, https://doi.org/10.1145/3650200.3656600

The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. Algorithms based on matrix multiplication exhibit excellent parallelism and scalability, but are constrained by high memory ...

Article

Open Access

Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means

Intelligent Information and Database Systems, Pages 224–236, https://doi.org/10.1007/978-981-97-4985-0_18

Abstract

This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a ...

poster

Accelerating Machine Learning Inference on GPUs with SYCL

IWOCL '24: Proceedings of the 12th International Workshop on OpenCL and SYCL, Article No.: 17, Pages 1–2, https://doi.org/10.1145/3648115.3648123

Recently, machine learning has established itself as a valuable tool for researchers to analyze their data and draw conclusions in various scientific fields, such as High Energy Physics (HEP). Commonly used machine learning libraries, such as Keras and ...

research-article

Free

JUST ACCEPTED

PaSTG: A Parallel Spatio-Temporal GCN Framework for Traffic Forecasting in Smart City

ACM Transactions on Sensor Networks (TOSN), Just Accepted, https://doi.org/10.1145/3649467

Predicting future traffic conditions from urban sensor data is crucial for smart city applications. Recent traffic forecasting methods are derived from Spatio-Temporal Graph Convolution Networks (STGCNs). Despite their remarkable achievements, these ...

Article

Dynamic Multi-bit Parallel Computing Method Based on Reconfigurable Structure

Algorithms and Architectures for Parallel Processing, Pages 347–359, https://doi.org/10.1007/978-981-97-0801-7_20

Abstract

Reconfigurable architecture has great potential in computation-intensive and memory-intensive applications due to its flexible information configuration. Aiming at the problem of low computing efficiency caused by the inconsistency between ...

research-article

HPPython: Extending Python with HPspmd for Data Parallel Programming

ISCAI '23: Proceedings of the 2023 2nd International Symposium on Computing and Artificial Intelligence, Pages 91–94, https://doi.org/10.1145/3640771.3643711

In light of previous endeavors and trends in the realm of parallel programming, HPPython emerges as an essential superset that enhances the accessibility of parallel programming for developers, facilitating scalability across multiple nodes. Despite ...

Article

P-QALSH+: Exploiting Multiple Cores to Parallelize Query-Aware Locality-Sensitive Hashing on Big Data

Web and Big Data, Pages 28–43, https://doi.org/10.1007/978-981-97-2390-4_3

Abstract

Approximate nearest neighbor (ANN) search in high dimensional Euclidean space is a fundamental problem of big data processing. Locality-Sensitive Hashing (LSH) is a popular scheme to solve the ANN search problem. In the index phase, an LSH scheme ...

Article

Open Access

Exploring Mapping Strategies for Co-allocated HPC Applications

Euro-Par 2023: Parallel Processing Workshops, Pages 271–276, https://doi.org/10.1007/978-3-031-48803-0_31

Abstract

In modern HPC systems with deep hierarchical architectures, large-scale applications often struggle to efficiently utilize the abundant cores due to the saturation of resources such as memory. Co-allocating multiple applications to share compute ...

Article

High-Performance Distributed Computing with Smartphones

Euro-Par 2023: Parallel Processing Workshops, Pages 229–232, https://doi.org/10.1007/978-3-031-48803-0_22

Abstract

The demand for large-scale computing, such as the application of AI in big data analytics and engineering simulations, has been steadily increasing. Meanwhile, many companies and individuals now own smartphones and PCs, but a significant portion ...

research-article

CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing, Pages 92–101, https://doi.org/10.1145/3605573.3605647

The parameter counts of deep learning (DL) models have ballooned from millions to trillions over the past decade, and thus the models cannot be fully placed in limited GPU memory. Existing works offload the parameter-update stage of DL model training to the CPU, thus leveraging CPU ...

research-article

Open Access

DFCPP Runtime Library for Dataflow Programming

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops, Pages 145–152, https://doi.org/10.1145/3605731.3605887

The Dataflow for C++ (DFCPP) library designed and implemented in this paper is a parallel programming library for dataflow computing on a general control flow hardware platform. Compared with existing dataflow programming libraries, DFCPP has an easy-to-use user ...
