GAS: General-Purpose In-Memory-Computing Accelerator for Sparse Matrix Multiplication
Sparse matrix multiplication is widely used in various practical applications. Different accelerators have been proposed to speed up sparse matrix-dense vector multiplication (SpMV), sparse matrix-sparse vector multiplication (SpMSpV), sparse matrix-dense ...
An Integrated FPGA Accelerator for Deep Learning-Based 2D/3D Path Planning
Path planning is a crucial component for realizing the autonomy of mobile robots. However, due to limited computational resources on mobile robots, it remains challenging to deploy state-of-the-art methods and achieve real-time performance. To address ...
Prefender: A Prefetching Defender Against Cache Side Channel Attacks as a Pretender
Cache side channel attacks are increasingly alarming in modern processors due to the recent emergence of Spectre and Meltdown attacks. A typical attack performs intentional cache access and manipulates cache states to leak secrets by observing the victim ...
Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in ...
Reordering and Compression for Hypergraph Processing
Hypergraphs are applicable to various domains such as social contagion, online groups, and protein structures due to their effective modeling of multivariate relationships. However, the increasing size of hypergraphs has led to high computation costs, ...
UniSched: A Unified Scheduler for Deep Learning Training Jobs With Different User Demands
The growth of deep learning training (DLT) jobs in modern GPU clusters calls for efficient deep learning (DL) scheduler designs. Due to the extensive applications of DL technology, developers may have different demands for their DLT jobs. It is important ...
Randomize the Running Function When It Is Disclosed
Address space layout randomization (ASLR) can hide code addresses, which has been widely adopted by security solutions. However, code probes can bypass it. In real attack scenarios, a single code probe can only obtain very limited code information instead ...
Monotonicity of Multi-term Floating-Point Adders
In the literature on algorithms for computing multi-term addition $s_{n}=\sum_{i=1}^{n}x_{i}$ ...
FPGA-Accelerated Range-Limited Molecular Dynamics
Long timescale Molecular Dynamics (MD) simulation of small molecules is crucial in drug design and basic science. To accelerate a small data set that is executed for a large number of iterations, high efficiency is required. Recent work in this domain has ...
A Computing-in-Memory-Based One-Class Hyperdimensional Computing Model for Outlier Detection
In this work, we present ODHD, an algorithm for outlier detection based on hyperdimensional computing (HDC), a non-classical learning paradigm. Along with the HDC-based algorithm, we propose IM-ODHD, a computing-in-memory (CiM) ...
HPDK: A Hybrid PM-DRAM Key-Value Store for High I/O Throughput
This paper explores the design of an architecture that replaces Disk with Persistent Memory (PM) to achieve the highest I/O throughput in Log-Structured Merge Tree (LSM-Tree) based key-value stores (KVS). Most existing LSM-Tree based KVSs use PM as an ...
AdaptMD: Balancing Space and Performance in NUMA Architectures With Adaptive Memory Deduplication
Memory deduplication effectively relieves the memory space bottleneck by removing duplicate pages, especially in virtualized systems in which virtual machines run the same OS and similar applications. However, due to the non-uniform access latencies in ...
Decentralized Task Offloading in Edge Computing: An Offline-to-Online Reinforcement Learning Approach
Decentralized task offloading among cooperative edge nodes has been a promising solution to enhance resource utilization and improve users’ Quality of Experience (QoE) in edge computing. However, current decentralized methods, such as heuristics ...
ElasticDNN: On-Device Neural Network Remodeling for Adapting Evolving Vision Domains at Edge
Executing deep neural networks (DNN) based vision tasks on edge devices encounters challenging scenarios of significant and continually evolving data domains (e.g. background or subpopulation shift). With limited resources, the state-of-the-art domain ...
Improved Fault Analysis on Subterranean 2.0
Subterranean 2.0, a NIST second round lightweight cryptographic primitive, was introduced by Daemen et al. in 2020. It has three modes of operation: Subterranean-SAE, Subterranean-deck, and Subterranean-XOF. So far, most of the existing ...