- research-article · November 2024 · Just Accepted
SILVIA: Automated Superword-Level Parallelism Exploitation via HLS-Specific LLVM Passes for Compute-Intensive FPGA Accelerators
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted. https://doi.org/10.1145/3705324
High-level synthesis (HLS) aims at democratizing custom hardware acceleration with highly abstracted software-like descriptions. However, efficient accelerators still require substantial low-level hardware optimizations, defeating the HLS intent. In the ...
- research-article · November 2024
Efficient deployment of Single Shot Multibox Detector network on FPGAs
Abstract: FPGAs, characterized by their low power consumption and swift response, are ideally suited for parallel computations associated with object detection tasks, making them a popular choice for target detection and neural network acceleration. ...
Highlights:
- Parallel computation boosts speed and efficiency in convolutional layers.
- Integrated parallel processing enhances convolution, activation, and pooling.
- Efficient memory management reduces read/write time for feature layers.
- ...
- research-article · October 2024
Exploiting Retina Biometric Fused with Encoded Hash for Designing Watermarked Convolutional Hardware IP Against Piracy
Abstract: The convolution layer in a convolutional neural network (CNN) is highly computationally intensive. It is crucial to design reusable low-cost hardware IP for the convolutional layer to enable hardware-based feature extraction. However, the ...
- research-article · October 2024
Hardware Security of Image Processing Cores Against IP Piracy Using PSO-Based HLS-Driven Multi-Stage Encryption Fused with Fingerprint Signature
Abstract: The increasing usage of image processing applications in modern technological environments is driven by their ability to enhance visual quality in diverse applications, from social media to medical imaging. The design of these cores as dedicated ...
- research-article · September 2024
DONGLE 2.0: Direct FPGA-Orchestrated NVMe Storage for HLS
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 17, Issue 3, Article No. 45, pages 1–32. https://doi.org/10.1145/3650038
Rapid growth in data size poses significant computational and memory challenges to data processing. FPGA accelerators and near-storage processing have emerged as compelling solutions for tackling the growing computational and memory requirements. Many ...
-
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
MLCAD '24: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, Article No. 14, pages 1–12. https://doi.org/10.1145/3670474.3685952
In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written ...
- Article · August 2024
Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
Abstract: Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, the utilization of their network capabilities is often challenging and error-prone ...
- research-article · July 2024 · Just Accepted
A Scalable Accelerator for Local Score Computation of Structure Learning in Bayesian Networks
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted. https://doi.org/10.1145/3674842
A Bayesian network is a powerful tool for representing uncertainty in data, offering transparent and interpretable inference, unlike neural networks’ black-box mechanisms. To fully harness the potential of Bayesian networks, it is essential to learn the ...
- research-article · July 2024
Applying deep learning to real-time UAV-based forest monitoring: Leveraging multi-sensor imagery for improved results
Expert Systems with Applications: An International Journal (EXWA), Volume 245, Issue C. https://doi.org/10.1016/j.eswa.2023.123107
Abstract: Rising global fire incidents necessitate effective solutions, with forest surveillance emerging as a crucial strategy. This paper proposes a complete solution using technology that integrates visible and infrared spectrum images through Unmanned ...
Highlights:
- Real-time detection using UAVs for people and cars in forests based on deep learning.
- 4-channel object detection model with RGB and IR, for low-visibility environments.
- A new annotated and aligned image dataset with four channels (...
- short-paper · April 2024 · Best Student Paper
Design and Implementation of a Low-Latency Origin and Relay for Media-over-QUIC Transport
MMSys '24: Proceedings of the 15th ACM Multimedia Systems Conference, pages 524–526. https://doi.org/10.1145/3625468.3652914
The Media-over-QUIC Transport (MOQT) is an emerging application-layer protocol designed for low-latency media ingestion and distribution. It is applicable in both browser scenarios (i.e., HTTP/3 WebTransport) and non-browser settings (i.e., raw QUIC) ...
- Article · March 2024
Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems
- Panagiotis Mpakos,
- Ioanna Tasou,
- Chloe Alverti,
- Panagiotis Miliadis,
- Pavlos Malakonakis,
- Dimitris Theodoropoulos,
- Georgios Goumas,
- Dionisios N. Pnevmatikatos,
- Nectarios Koziris
Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 19–32. https://doi.org/10.1007/978-3-031-55673-9_2
Abstract: The Sparse Matrix Vector (SpMV) multiplication kernel is a key component of many high-performance computing applications, but at the same time one of the most challenging to optimize, primarily due to its low flop-per-byte ratio and irregular ...
- research-article · April 2024
Multi-cut based architectural obfuscation and handprint biometric signature for securing transient fault detectable IP cores during HLS
Abstract: This paper presents a novel dual defense security methodology for fault detectable reusable hardware intellectual property (IP) core against reverse engineering and piracy using multi-cut based architectural obfuscation and handprint biometric ...
Highlights:
- A novel dual defense security methodology for fault detectable reusable hardware IP core.
- Multi-cut based architectural obfuscation and handprint biometric signature-based approach.
- Ensures protection of fault secured IPs against ...
- short-paper · March 2024
Content Steering: a Standard for Multi-CDN Streaming
MHV '24: Proceedings of the 3rd Mile-High Video Conference, page 128. https://doi.org/10.1145/3638036.3640293
DASH-IF Content Steering [4] (soon to be published as ETSI TS 103 998) is a new standard developed by the DASH Industry Forum (DASH-IF), defining means for managing DASH [5] media delivery using multiple content delivery networks (CDNs). At the server-...
- research-article · November 2023
On the RTL Implementation of FINN Matrix Vector Unit
ACM Transactions on Embedded Computing Systems (TECS), Volume 22, Issue 6, Article No. 94, pages 1–27. https://doi.org/10.1145/3547141
Field-programmable gate array (FPGA)–based accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their ability to scale performance with increasing degrees of specialization with dataflow architectures or custom ...
- research-article · March 2024
Exploration of optimal functional Trojan-resistant hardware intellectual property (IP) core designs during high level synthesis
Microprocessors & Microsystems (MSYS), Volume 103, Issue C. https://doi.org/10.1016/j.micpro.2023.104973
Highlights:
- Exploration of optimal Trojan-resistant design using area-latency tradeoff.
- A novel methodology for functional Trojan-resistant hardware IP design.
- Presents low-cost optimized Trojan-resistant design using PSO-DSE.
- Attains ...
Abstract: Hardware Trojans that can change the computed functional output of intellectual property (IP) cores integrated into computing systems can be a serious reliability concern for correct system operation. Therefore, ...
- review-article · November 2023
MVSym: Efficient symbiotic exploitation of HLS-kernel multi-versioning for collaborative CPU-FPGA cloud systems
- Michael Guilherme Jordan,
- Bernardo Neuhaus Lignati,
- Guilherme Korol,
- Mateus Beck Rutzig,
- Antonio Carlos Schneider Beck
Abstract: Cloud warehouses have been exploiting CPU-FPGA collaborative environments, where clients share the same infrastructure to maximize resource utilization with energy efficiency. In this scope, resource provisioning is challenging as ...
- Article · September 2023
Accelerating Graph Neural Networks in Pytorch with HLS and Deep Dataflows
Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 131–145. https://doi.org/10.1007/978-3-031-42921-7_9
Abstract: Graph neural networks (GNNs) combine sparse and dense data compute requirements that are challenging to meet in resource-constrained embedded hardware. In this paper, we investigate a dataflow of dataflows architecture that optimizes data access ...
- Article · September 2023
ArcvaVX: OpenVX Framework for Adaptive Reconfigurable Computer Vision Architectures
Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 97–112. https://doi.org/10.1007/978-3-031-42921-7_7
Abstract: The field of computer vision is steadily growing in its complexity and application areas. FPGAs have shown that they can meet the growing demands for performance and energy efficiency. However, their programmability is a major challenge for ...
- research-article · September 2023
An FPGA Accelerator for Genome Variant Calling
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 4, Article No. 53, pages 1–29. https://doi.org/10.1145/3595297
In genome analysis, it is often important to identify variants from a reference genome. However, identifying variants that occur with low frequency can be challenging, as it is computationally intensive to do so accurately. LoFreq is a widely used program ...