Search results for keyword: HLS

research-article

Free

JUST ACCEPTED

SILVIA: Automated Superword-Level Parallelism Exploitation via HLS-Specific LLVM Passes for Compute-Intensive FPGA Accelerators

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3705324

High-level synthesis (HLS) aims at democratizing custom hardware acceleration with highly abstracted software-like descriptions. However, efficient accelerators still require substantial low-level hardware optimizations, defeating the HLS intent. In the ...

research-article

Efficient deployment of Single Shot Multibox Detector network on FPGAs

Integration, the VLSI Journal (INTG), Volume 99, Issue C, https://doi.org/10.1016/j.vlsi.2024.102255

Abstract

FPGAs, characterized by their low power consumption and swift response, are ideally suited for parallel computations associated with object detection tasks, making them a popular choice for target detection and neural network acceleration. ...

Highlights

Parallel computation boosts speed and efficiency in convolutional layers.
Integrated parallel processing enhances convolution activation pooling.
Efficient memory management reduces read/write time for feature layers.
...

research-article

Exploiting Retina Biometric Fused with Encoded Hash for Designing Watermarked Convolutional Hardware IP Against Piracy

SN Computer Science (SNCS), Volume 5, Issue 8, https://doi.org/10.1007/s42979-024-03247-9

Abstract

The convolution layer in a convolutional neural network (CNN) is highly computationally intensive. It is crucial to design reusable low-cost hardware IP for convolutional layer for enabling hardware-based feature extraction. However, the ...

research-article

Hardware Security of Image Processing Cores Against IP Piracy Using PSO-Based HLS-Driven Multi-Stage Encryption Fused with Fingerprint Signature

SN Computer Science (SNCS), Volume 5, Issue 7, https://doi.org/10.1007/s42979-024-03255-9

Abstract

The increasing usage of image processing applications in modern technological environments is driven by their ability to enhance visual quality in diverse applications, from social media to medical imaging. The design of these cores as dedicated ...

research-article

DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLS

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No.: 45, Pages 1–32, https://doi.org/10.1145/3650038

Rapid growth in data size poses significant computational and memory challenges to data processing. FPGA accelerators and near-storage processing have emerged as compelling solutions for tackling the growing computational and memory requirements. Many ...

research-article

Open Access

Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

MLCAD '24: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, Article No.: 14, Pages 1–12, https://doi.org/10.1145/3670474.3685952

In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written ...

Article

Open Access

Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL

Euro-Par 2024: Parallel Processing, Pages 121–136, https://doi.org/10.1007/978-3-031-69766-1_9

Abstract

Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, the utilization of their network capabilities is often challenging and error-prone ...

research-article

Free

JUST ACCEPTED

A Scalable Accelerator for Local Score Computation of Structure Learning in Bayesian Networks

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3674842

A Bayesian network is a powerful tool for representing uncertainty in data, offering transparent and interpretable inference, unlike neural networks’ black-box mechanisms. To fully harness the potential of Bayesian networks, it is essential to learn the ...

research-article

Applying deep learning to real-time UAV-based forest monitoring: Leveraging multi-sensor imagery for improved results

Expert Systems with Applications: An International Journal (EXWA), Volume 245, Issue C, https://doi.org/10.1016/j.eswa.2023.123107

Abstract

Rising global fire incidents necessitate effective solutions, with forest surveillance emerging as a crucial strategy. This paper proposes a complete solution using technology that integrates visible and infrared spectrum images through Unmanned ...

Highlights

Real-time detection using UAVs for people and cars in forests based on deep learning.
4-channel object detection model with RGB and IR, for low-visibility environments.
A new annotated and aligned image dataset with four channels (...

short-paper

Best Student Paper

Design and Implementation of a Low-Latency Origin and Relay for Media-over-QUIC Transport

Zafer Gurel

MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference, Pages 524–526, https://doi.org/10.1145/3625468.3652914

The Media-over-QUIC Transport (MOQT) is an emerging application-layer protocol designed for low-latency media ingestion and distribution. It is applicable in both browser scenarios (i.e., HTTP/3 Web-Transport) and non-browser settings (i.e., raw QUIC) ...

Article

Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems

Applied Reconfigurable Computing. Architectures, Tools, and Applications, Pages 19–32, https://doi.org/10.1007/978-3-031-55673-9_2

Abstract

The Sparse Matrix Vector (SpMV) multiplication kernel is a key component of many high-performance computing applications, but at the same time one of the most challenging to optimize, primarily due to its low flop-per-byte ratio and irregular ...

research-article

Multi-cut based architectural obfuscation and handprint biometric signature for securing transient fault detectable IP cores during HLS

Integration, the VLSI Journal (INTG), Volume 95, Issue C, https://doi.org/10.1016/j.vlsi.2023.102114

Abstract

This paper presents a novel dual defense security methodology for fault detectable reusable hardware intellectual property (IP) core against reverse engineering and piracy using multi-cut based architectural obfuscation and handprint biometric ...

Highlights

A novel dual defense security methodology for fault detectable reusable hardware IP core.
Multi-cut based architectural obfuscation and handprint biometric signature-based approach.
Ensures protection of fault secured IPs against ...

short-paper

Open Access

Content Steering: a Standard for Multi-CDN Streaming

MHV '24: Proceedings of the 3rd Mile-High Video Conference, Page 128, https://doi.org/10.1145/3638036.3640293

DASH-IF Content Steering [4] (to be soon published as ETSI TS 103 998) is a new standard developed by the DASH Industry Forum (DASH-IF), defining means for managing DASH [5] media delivery using multiple content delivery networks (CDNs). At the server-...

research-article

Parallel chaos-based image encryption algorithm: high-level synthesis and FPGA implementation

The Journal of Supercomputing (JSCO), Volume 80, Issue 8, Pages 10985–11013, https://doi.org/10.1007/s11227-023-05784-1

Abstract

Nowadays, establishing security in data transmission is essential, and it is achieved by cryptography. Encryption of still or video images in specific applications such as Internet of Things, medical and satellite imaging, in applications ...

research-article

On the RTL Implementation of FINN Matrix Vector Unit

ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 6, Article No.: 94, Pages 1–27, https://doi.org/10.1145/3547141

Field-programmable gate array (FPGA)–based accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their ability to scale performance with increasing degrees of specialization with dataflow architectures or custom ...

research-article

Exploration of optimal functional Trojan-resistant hardware intellectual property (IP) core designs during high level synthesis

Microprocessors & Microsystems (MSYS), Volume 103, Issue C, https://doi.org/10.1016/j.micpro.2023.104973

Highlights

Exploration of optimal Trojan resistant design using area-latency tradeoff.
A novel methodology for functional Trojan-resistant hardware IP design.
Presents low-cost optimized Trojan-resistant design using PSO-DSE.
Attains ...

Abstract

Hardware Trojans that have the capability to change the computed functional output in intellectual property (IP) cores, integrated into computing systems can be a vital reliability concern in the context of correct system operation. Therefore, ...

review-article

MVSym: Efficient symbiotic exploitation of HLS-kernel multi-versioning for collaborative CPU-FPGA cloud systems

Integration, the VLSI Journal (INTG), Volume 93, Issue C, https://doi.org/10.1016/j.vlsi.2023.102052

Abstract

Cloud Warehouses have been exploiting CPU-FPGA collaborative environments, where clients share the same infrastructure to maximize resource utilization with energy efficiency. In this scope, resource provisioning is challenging as ...

Article

Accelerating Graph Neural Networks in Pytorch with HLS and Deep Dataflows

Jose Nunez-Yanez

Applied Reconfigurable Computing. Architectures, Tools, and Applications, Pages 131–145, https://doi.org/10.1007/978-3-031-42921-7_9

Abstract

Graph neural networks (GNNs) combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. In this paper, we investigate a dataflow of dataflows architecture that optimizes data access ...

Article

ArcvaVX: OpenVX Framework for Adaptive Reconfigurable Computer Vision Architectures

Applied Reconfigurable Computing. Architectures, Tools, and Applications, Pages 97–112, https://doi.org/10.1007/978-3-031-42921-7_7

Abstract

The field of computer vision is steadily growing in its complexity and application areas. FPGAs have shown that they can meet the growing demands for performance and energy efficiency. However, their programmability is a major challenge for ...

research-article

Public Access

An FPGA Accelerator for Genome Variant Calling

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 4, Article No.: 53, Pages 1–29, https://doi.org/10.1145/3595297

In genome analysis, it is often important to identify variants from a reference genome. However, identifying variants that occur with low frequency can be challenging, as it is computationally intensive to do so accurately. LoFreq is a widely used program ...