- research-article, November 2024
cuSZp2: A GPU Lossy Compressor with Extreme Throughput and Optimized Compression Ratio
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 15, Pages 1–18
https://doi.org/10.1109/SC41406.2024.00021
Existing GPU lossy compressors suffer from expensive data movement overheads, inefficient memory access patterns, and high synchronization latency, resulting in limited throughput. This work proposes cuSZp2, a generic single-kernel error-bounded lossy ...
- extended-abstract, August 2024
Agile Queue: A Fast Scalable Concurrent FIFO Queue on GPU
ICPP Workshops '24: Workshop Proceedings of the 53rd International Conference on Parallel Processing, Pages 108–109
https://doi.org/10.1145/3677333.3678269
This work presents Agile Queue, a queue specifically designed to support high concurrency on modern GPUs. Its core idea is to replace conflicting accesses to shared objects with independent accesses to private data. The proposed Agile Queue operates on ...
- Article, July 2024
Distributed SMT Solving Based on Dynamic Variable-Level Partitioning
Satisfiability Modulo Theories on arithmetic theories have significant applications in many important domains. Previous efforts have been mainly devoted to improving the techniques and heuristics in sequential SMT solvers. With the development of ...
- research-article, July 2024
Accelerating Static Null Pointer Dereference Detection with Parallel Computing
Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware, Pages 135–144
https://doi.org/10.1145/3671016.3671385
High-precision static analysis can effectively detect Null Pointer Dereference (NPD) vulnerabilities in C language, but the performance overhead is significant. In recent years, researchers have attempted to enhance the efficiency of static analysis by ...
- research-article, July 2024
Optimizing DNN training with pipeline model parallelism for enhanced performance in embedded systems
Journal of Parallel and Distributed Computing (JPDC), Volume 190, Issue C
https://doi.org/10.1016/j.jpdc.2024.104890
Deep Neural Networks (DNNs) have gained widespread popularity in different domain applications due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in ...
Highlights: Parallel Computing; Machine Learning; DNN Model Partitioning; Embedded Systems; Embedded Software.
- short-paper, July 2024
Preliminary Results of the MLPerf BERT Inference Benchmark on AMD Instinct GPUs
Zixian Wang, Khai Vu, Miro Hodak, Aarush Mehrotra, Francisco Gutierrez, Kyle Smith, Gloria Seo, Austin Garcia, Bryan Chin, Marty Kandes, Mary P Thomas
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 59, Pages 1–5
https://doi.org/10.1145/3626203.3670589
In recent years, Artificial Intelligence (AI) has reshaped various facets of our day-to-day lives. To evaluate both hardware and software deployment of Machine Learning (ML) applications, it is necessary to measure system-wide performance and accuracy. ...
- short-paper, July 2024
Multithreading USD and Qt: Adding Concurrency to Filament
DigiPro '24: Proceedings of the 2024 Digital Production Symposium, Article No.: 3, Pages 1–5
https://doi.org/10.1145/3665320.3670992
As production scene complexity and CPU core count increase, the performance of software used to interact with the scenes may not scale accordingly. Filament is Animal Logic’s in-house, USD-based, PyQt lighting DCC, and a key area for improving Filament ...
- Article, June 2024
Automated and Automatic Systems of Management of an Optimization Programs Package for Decisions Making
Mathematical Optimization Theory and Operations Research, Pages 390–407
https://doi.org/10.1007/978-3-031-62792-7_26
It is known that, in spite of a large number of methods for numerical solutions to various classes of problems, the choice of the most efficient method for solving a particular problem under specific values of its parameters requires a large ...
- research-article, June 2024
Shray: An Owner-Compute Distributed Shared-Memory System
ARRAY 2024: Proceedings of the 10th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, Pages 25–37
https://doi.org/10.1145/3652586.3663314
In this paper, we propose a new library for storing arrays in a distributed fashion on distributed memory systems. From a programmer's perspective, these arrays behave for arbitrary reads as if they were allocated in shared memory. When it comes to ...
- research-article, June 2024
DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 1–13
https://doi.org/10.1145/3650200.3656600
The shortest paths problem is a fundamental challenge in graph theory, with a broad range of potential applications. Algorithms based on matrix multiplication exhibit excellent parallelism and scalability, but are constrained by high memory ...
- Article, July 2024
Superior Parallel Big Data Clustering Through Competitive Stochastic Sample Size Optimization in Big-Means
Intelligent Information and Database Systems, Pages 224–236
https://doi.org/10.1007/978-981-97-4985-0_18
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a ...
- poster, April 2024
Accelerating Machine Learning Inference on GPUs with SYCL
IWOCL '24: Proceedings of the 12th International Workshop on OpenCL and SYCL, Article No.: 17, Pages 1–2
https://doi.org/10.1145/3648115.3648123
Recently, machine learning has established itself as a valuable tool for researchers to analyze their data and draw conclusions in various scientific fields, such as High Energy Physics (HEP). Commonly used machine learning libraries, such as Keras and ...
- research-article, March 2024 (Just Accepted)
PaSTG: A Parallel Spatio-Temporal GCN Framework for Traffic Forecasting in Smart City
Predicting future traffic conditions from urban sensor data is crucial for smart city applications. Recent traffic forecasting methods are derived from Spatio-Temporal Graph Convolution Networks (STGCNs). Despite their remarkable achievements, these ...
- Article, March 2024
Dynamic Multi-bit Parallel Computing Method Based on Reconfigurable Structure
Algorithms and Architectures for Parallel Processing, Pages 347–359
https://doi.org/10.1007/978-981-97-0801-7_20
Reconfigurable architecture has great potential in computation-intensive and memory-intensive applications due to its flexible information configuration. Aiming at the problem of low computing efficiency caused by the inconsistency between ...
- research-article, March 2024
HPPython: Extending Python with HPspmd for Data Parallel Programming
ISCAI '23: Proceedings of the 2023 2nd International Symposium on Computing and Artificial Intelligence, Pages 91–94
https://doi.org/10.1145/3640771.3643711
In light of previous endeavors and trends in the realm of parallel programming, HPPython emerges as an essential superset that enhances the accessibility of parallel programming for developers, facilitating scalability across multiple nodes. Despite ...
- Article, April 2024
P-QALSH+: Exploiting Multiple Cores to Parallelize Query-Aware Locality-Sensitive Hashing on Big Data
Approximate nearest neighbor (ANN) search in high dimensional Euclidean space is a fundamental problem of big data processing. Locality-Sensitive Hashing (LSH) is a popular scheme to solve the ANN search problem. In the index phase, an LSH scheme ...
- Article, April 2024
Exploring Mapping Strategies for Co-allocated HPC Applications
Euro-Par 2023: Parallel Processing Workshops, Pages 271–276
https://doi.org/10.1007/978-3-031-48803-0_31
In modern HPC systems with deep hierarchical architectures, large-scale applications often struggle to efficiently utilize the abundant cores due to the saturation of resources such as memory. Co-allocating multiple applications to share compute ...
- Article, April 2024
High-Performance Distributed Computing with Smartphones
Nadeem Ishikawa, Hayato Nomura, Yuya Yoda, Osamu Uetsuki, Keisuke Fukunaga, Seiji Nagoya, Junya Sawara, Hiroaki Ishihata, Junsuke Senoguchi
Euro-Par 2023: Parallel Processing Workshops, Pages 229–232
https://doi.org/10.1007/978-3-031-48803-0_22
The demand for large-scale computing, such as the application of AI in big data analytics and engineering simulations, has been steadily increasing. Meanwhile, many companies and individuals now own smartphones and PCs, but a significant portion ...
- research-article, September 2023
CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel
ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing, Pages 92–101
https://doi.org/10.1145/3605573.3605647
The parameters of deep learning (DL) models have ballooned from millions to trillions over the past decade, and thus cannot be fully placed in limited GPU memory. Existing works offload the parameter-update stage of training a DL model to the CPU, thus leveraging CPU ...
- research-article, September 2023
DFCPP Runtime Library for Dataflow Programming
ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops, Pages 145–152
https://doi.org/10.1145/3605731.3605887
The Dataflow for C++ (DFCPP) designed and implemented in this paper is a parallel programming library for dataflow computing on a general control flow hardware platform. Compared with existing dataflow programming libraries, DFCPP has an easy-to-use user ...