Proceeding Downloads
Taillight Signal Recognition via Sequential Learning
In autonomous driving, it is crucial to capture the driving intentions of other vehicles on the road, which the autonomous vehicle can then use to plan a safe route. This study proposes a system to identify the driving intention of other ...
Utility-Based Task Assignment for ON-based Mobile Crowdsourcing
In this paper we are interested in opportunistic network-based (ON-based) mobile crowdsourcing (MCS), where a requester (called a server) assigns a set of tasks to a pool of workers, and the workers process the assigned tasks for payoff. The key to ...
Develop an AIoT Badminton Serving Machine
An AIoT badminton serving machine refers to a shuttlecock serving machine with an IoT wireless interface that can support the integration with computer vision and intelligent computing to develop innovative applications to improve teaching and learning ...
Badminton Shot Event Detection and Feature Calculation from 3D Rally Video
The technical performance of badminton players in games can be evaluated based on the performance indices of each shot. The most representative shot performance indices are ball speed, outgoing angle, and ball type usage. In this study, a stereoscopic ...
Experience Deploying Graph Applications on GPUs with SYCL
SYCL allows for deployment and use of accelerators across vendors’ platforms. In this work, we describe the experience of deploying graph analytics on vendors’ GPUs using SYCL. We contrast the CUDA and SYCL application programming interfaces by ...
Evaluating Accelerators for a High-Throughput Hash-Based Security Protocol
Security threats are rising due to widely available computational power and near-future quantum computers. New cryptographic protocols have been developed to address these challenges, but very few protocols take advantage of parallel computing. In this ...
A Bucket-aware Asynchronous Single-Source Shortest Path Algorithm on GPU
The Single-Source Shortest Path (SSSP) algorithm is a common routine in graph processing and has been extensively studied on Graphics Processing Units (GPUs). Despite the powerful parallelism resources and high memory bandwidth provided by GPUs, the ...
NFCache: Fine-grained and Flexible Offloading of Network Functions to Programmable Switches
A Service Function Chain (SFC) consists of a sequence of Network Functions (NFs) in order, and plays an important role in network performance and security. In recent years, due to the low throughput and high latency of network functions in the context ...
Enhanced Memory Corruption Detection in C/C++ Programs
Out-of-bounds memory accesses, which often occur in programs written in unsafe languages such as C or C++, cause severe problems. Although many useful tools target this problem, we report a new tool, called mcds, for detecting spatial and ...
Enhancing LLVM Optimizations for Linear Recurrence Programs on RVV
The RISC-V Vector Extension (RVV) has emerged as a promising vector architecture for high-performance computing. It enables parallel computing capability for RISC-V CPUs by introducing additional vector instructions and vector registers. To fully ...
Support of Sparse Tensor Computing for MLIR HLS
Sparse tensor computations are now widely used in machine learning. By skipping multiplications by zero values, sparse tensor computation can significantly reduce latency and power consumption. Popular frameworks like TensorFlow, PyTorch, ...
Pointer Analysis for Programs on Hybrid DRAM-PM Memory Systems
With the development of Non-Volatile Memory (NVM) technology, byte-addressable persistent memory (PM) has become increasingly practical. Hybrid DRAM-PM memory systems have added diversity to program design and execution. In this ...
Mapping-Free GPU Offloading in OpenMP Using Unified Memory
With the increasing demand for heterogeneous computing, OpenMP has, since version 4.0, provided an offloading feature that allows programmers to offload a task to a device (e.g., a GPU or an FPGA) by adding appropriate directives to the task. Compared ...
Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution
GPUs are renowned for their exceptional computational acceleration capabilities achieved through massive parallelism. However, utilizing GPUs for computation requires manual identification of code regions suitable for offloading, data transfer ...
Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery
Early-bird communication is a communication/computation overlap technique that combines fine-grained communication with partitioned communication to improve application run-time. Communication is divided among the compute threads such that each ...
Codelet Pipe: Realization of Dataflow Software Pipelining for Extended Codelet Model
Dataflow Software Pipelining for Codelet Model is a coarse-grained code-mapping scheme designed to exploit pipelined parallelism across Codelets executing on different cores. The extended operational semantics of the Codelet model exploit pipelined ...
Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation
We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG. We test the generated kernel codes for a variety of language-supported ...
DFCPP Runtime Library for Dataflow Programming
Dataflow for C++ (DFCPP), designed and implemented in this paper, is a parallel programming library for dataflow computing on a general control flow hardware platform. Compared with existing dataflow programming libraries, DFCPP has an easy-to-use user ...
Design and Implementation of Data Flow Programming Language DFC2
This article presents the design and development of a dataflow programming language called DFC2 (DataFlow C, version 2) based on the dataflow programming model. The DFC2 compiler is responsible for translating DFC2 code into C++, which is ...
Combining a Parallel Branch-and-Bound Algorithm with a Strong Heuristic to Solve the Sequential Ordering Problem
In this paper, we describe how to combine a parallel branch-and-bound (B&B) algorithm and a strong heuristic to solve the Sequential Ordering Problem (SOP), which is an NP-hard optimization problem. A parallel B&B algorithm is run in parallel with the ...
A GPU-Accelerated Population Generation, Sorting, and Mutation Kernel for an Optimization-Based Causal Inference Model
We develop a GPU-accelerated machine learning generative adversarial network model that can be used with observational data for the purpose of constructing causal inferences. The theoretical basis of our machine learning model is novel and is ...
Multiobjective Hyperparameter Optimization for Deep Learning Interatomic Potential Training Using NSGA-II
Deep neural network (DNN) potentials are an emerging tool for simulation of dynamical atomistic systems, with the promise of quantum mechanical accuracy at speedups of 10000×. As with other DNN methods, hyperparameters used during training can make a ...
Polar Representation of 2D Image Using Complex Exponential Spiking Neuron Network
The paper introduces an innovative hybrid encoding method for images. It proposes a conversion process where the image is transformed from the conventional Cartesian coordinates representation (x and y) to a polar coordinates representation using ...
A new paradigm for forest fire spread prediction: Faster decisions at high resolution
Climate change has led to a significant increase in the number of wildfire events and their severity. To mitigate their impact, it is necessary to be able to make quick decisions according to the fire behavior. To assist with these decisions, we ...
Coordinated Botnet Detection in Social Networks via Clustering Analysis
Graphs are a widely used tool in modeling social interaction networks. In a network that consists of authors and pages with time-stamped interactions between one page and one author, we can model the network as a bipartite temporal graph. These graphs ...
Index Terms
- Proceedings of the 52nd International Conference on Parallel Processing Workshops
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
ICPP '18 | 313 | 91 | 29% |
Overall | 313 | 91 | 29% |