High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System
Deep convolutional neural networks (DNNs) have been widely used in many applications, particularly in machine vision. It is challenging to accelerate DNNs on embedded systems because real-world machine vision applications should reserve a lot of external ...
FD-CNN: A Frequency-Domain FPGA Acceleration Scheme for CNN-Based Image-Processing Applications
In emerging edge-computing scenarios, FPGAs have been widely adopted to accelerate convolutional neural network (CNN)–based image-processing applications such as image classification, object detection, and image segmentation. A standard ...
An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA
Transposed convolution has become prevalent in convolutional neural networks (CNNs), playing an important role in multiple scenarios such as image segmentation and the back-propagation process of training CNNs. This mainly benefits from the ability to up-...
Accelerating Attention Mechanism on FPGAs based on Efficient Reconfigurable Systolic Array
Transformer model architectures have recently received great interest in natural language, machine translation, and computer vision, where attention mechanisms are their building blocks. However, the attention mechanism is expensive because of its ...
On the RTL Implementation of FINN Matrix Vector Unit
Field-programmable gate array (FPGA)–based accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their ability to scale performance with increasing degrees of specialization with dataflow architectures or custom ...
ACDSE: A Design Space Exploration Method for CNN Accelerator based on Adaptive Compression Mechanism
Customized accelerators for Convolutional Neural Network (CNN) can achieve better energy efficiency than general computing platforms. However, the design of a high-performance accelerator should take into account a variety of parameters and physical ...
TH-iSSD: Design and Implementation of a Generic and Reconfigurable Near-Data Processing Framework
We present the design and implementation of TH-iSSD, a near-data processing framework to address the data movement problem. TH-iSSD does not impose any restriction on hardware selection and is highly reconfigurable: its core components, such as the on-...
RegKey: A Register-based Implementation of ECC Signature Algorithms Against One-shot Memory Disclosure
To ensure the security of cryptographic algorithm implementations, several cryptographic key protection schemes have been proposed to prevent various memory disclosure attacks. Among them, the register-based solutions do not rely on special hardware ...
SensiX++: Bringing MLOps and Multi-tenant Model Serving to Sensory Edge Devices
We present SensiX++, a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. SensiX++ operates on two fundamental principles: highly modular componentisation to externalise ...
Scheduling Dynamic Software Updates in Mobile Robots
We present NeRTA (Next Release Time Analysis), a technique to enable dynamic software updates for low-level control software of mobile robots. Dynamic software updates enable software correction and evolution during system operation. In mobile robotics, ...
Online Distributed Schedule Randomization to Mitigate Timing Attacks in Industrial Control Systems
Industrial control systems (ICSs) consist of a large number of control applications that are associated with periodic real-time flows with hard deadlines. To facilitate large-scale integration, remote control, and co-ordination, wireless sensor and ...
SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs
Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, since these networks are highly demanding in terms of memory and computation, implementing CNNs can be challenging. To make CNNs more ...
Energy-Efficient Communications for Improving Timely Progress of Intermittent-Powered BLE Devices
Battery-less devices offer potential solutions for maintaining sustainable Internet of Things (IoT) networks. However, limited energy harvesting capacity can lead to power failures, limiting the system’s quality of service (QoS). To improve timely task ...
A Comprehensive Model for Efficient Design Space Exploration of Imprecise Computational Blocks
After almost a decade of research, the development of more efficient imprecise computational blocks is still a major concern in the imprecise computing domain. There are many instances of the introduced imprecise components of different types, while their main ...
Dynamic Thermal Management of 3D Memory through Rotating Low Power States and Partial Channel Closure
Modern high-performance and high-bandwidth three-dimensional (3D) memories are characterized by frequent heating. Prior art suggests turning off hot channels and migrating data to the background DDR memory, incurring significant performance and energy ...
Enabling Binary Neural Network Training on the Edge
Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides
The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device ...
Design and Analysis of High Performance Heterogeneous Block-based Approximate Adders
Approximate computing is an emerging paradigm to improve the power and performance efficiency of error-resilient applications. As adders are one of the key components in almost all processing systems, a significant amount of research has been carried out ...