Large-scale combinatorial optimization in real-time systems by FPGA-based accelerators for simulated bifurcation
Combinatorial optimization problems are economically valuable but computationally hard to solve. Many practical combinatorial optimizations can be converted to the ground-state search problems of Ising spin models. Simulated bifurcation (SB) is a ...
On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations
High-Performance Computing (HPC) is at an inflection point in its evolution. General-purpose architectures approach limits in terms of speed and power/energy, requiring the development of specialized architectures to deliver accelerated performance. ...
Software-like Compilation for Data Center FPGA Accelerators
Compilation times for large Xilinx devices, such as the Amazon F1 instance, are on the order of several hours. However, today's data center designs often have many identical processing units (PUs), meaning that conventional design flows waste time ...
Automation of Domain-specific FPGA-IP Generation and Test
Multi-access edge computing (MEC) devices that perform processing between the edge and cloud are becoming important in the Internet of Things infrastructure. MEC devices are designed to reduce the load on the edge devices, ensure real-time performance, ...
A programming environment for multi-FPGA systems based on CyberWorkBench: an integrated design tool
This paper proposes a multi-FPGA programming environment based on NEC's integrated design tool CyberWorkBench (CWB) for the multi-FPGA system FiC (Flow-in-Cloud). Programmers describe their programs in SystemC as small modules connected with FIFO channels, ...
Accelerating Matrix Processing for MIMO Systems
Massive multiple-input multiple-output (MIMO) is being used in the fifth generation of wireless communication systems. As the number of antennas increases, the computational complexity grows dramatically, and this involves matrix calculations with complex ...
Kyokko: a vendor-independent high-speed serial communication controller
With the advancement of HLS technology, FPGA is finally drawing attention as a power-efficient accelerator device. Unlike GPUs, the computation pipeline and FPGA-to-FPGA interconnection can be tightly coupled on FPGAs because they have high-speed serial ...
StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs
Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis
The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other, less-known machine learning algorithms with a mature and solid ...
Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers
For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPUs, the Ampere 100 (A100), claiming significant performance improvements over previous generations, ...
A Sorting Library for FPGA Implementation in OpenCL Programming
In this study, we focus on data sorting, which is a basic arithmetic operation, and we present a sorting library that can be used with the OpenCL programming model for field-programmable gate arrays (FPGAs). Our sorting library is built by combining ...
FPGA Acceleration of Short Read Alignment
This work proposes a novel dataflow architecture for Smith-Waterman Matrix-fill and Traceback stages, which are at the heart of short-read alignment on NGS data. The FPGA accelerator is coupled with radical software restructuring to widely-used Bowtie2 ...
Towards Performance Characterization of FPGAs in Context of HPC using OpenCL Benchmarks
OpenCL-based HLS frameworks are known to reduce the development effort for FPGAs while offering good quality results. The upcoming trend to equip heterogeneous compute clusters with hybrid networks for inter-CPU and inter-FPGA communication allows ...
CoopCL: A Framework for Cooperative Execution of Data-parallel Kernels on Multi-device Platforms
This paper describes the CoopCL framework, which is specifically designed to reduce multi-device programming complexity. CoopCL consists of three core components: a C++ API, a custom compiler, and a runtime that abstracts and unifies the cooperative ...