- research-article, August 2024
SAT-Based Exact Modulo Scheduling Mapping for Resource-Constrained CGRAs
- Cristian Tirelli,
- Juan Sapriza,
- Rubén Rodríguez Álvarez,
- Lorenzo Ferretti,
- Benoît Denkinger,
- Giovanni Ansaloni,
- José Miranda Calero,
- David Atienza,
- Laura Pozzi
ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 20, Issue 3, Article No.: 8, Pages 1–26
https://doi.org/10.1145/3663675
Coarse-Grain Reconfigurable Arrays (CGRAs) represent emerging low-power architectures designed to accelerate Compute-Intensive Loops (CILs). The effectiveness of CGRAs in providing acceleration relies on the quality of mapping: how efficiently the CIL is ...
- research-article, December 2022
Towards High-Quality CGRA Mapping with Graph Neural Networks and Reinforcement Learning
ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, Article No.: 61, Pages 1–9
https://doi.org/10.1145/3508352.3549458
Coarse-Grained Reconfigurable Architectures (CGRA) is a promising solution to accelerate domain applications due to its good combination of energy-efficiency and flexibility. Loops, as computation-intensive parts of applications, are often mapped onto ...
- research-article, May 2022
RF-CGRA: a routing-friendly CGRA with hierarchical register chains
DATE '22: Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, Pages 262–267
CGRAs are promising architectures to accelerate domain-specific applications as they combine high energy-efficiency and flexibility. With either isolated register files (RFs) or link-consuming distributed registers in each processing element (PE), ...
- research-article, February 2021
Folded Integer Multiplication for FPGAs
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Pages 160–170
https://doi.org/10.1145/3431920.3439299
Encryption - especially the key exchange algorithms such as RSA - is an increasing use-model for FPGAs, driven by the adoption of the FPGA as a SmartNIC in the datacenter. While bulk encryption such as AES maps well to generic FPGA features, the very ...
- research-article, March 2017
A slack-based approach to efficiently deploy radix 8 booth multipliers
In 1951 A. Booth published his algorithm to efficiently multiply signed numbers. Since the appearance of such algorithm, it has been widely accepted that radix 4-based Booth multipliers are the most efficient. They allow the height of the multiplier to ...
- poster, February 2017
Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only)
FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Page 290
https://doi.org/10.1145/3020078.3021778
High-Level Synthesis (HLS) has been widely recognized and accepted as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, the massively parallel memory access demands and the extremely expensive ...
- research-article, February 2014
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
ODES '14: Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems, Pages 25–32
https://doi.org/10.1145/2568326.2568327
For the exploitation of the available parallelism clustered Very Long Instruction Word (VLIW) processors rely on highly optimizing compilers. Aiming this parallelism, many advanced compiler concepts have been developed and proposed in the past. Many of ...
- research-article, December 2013
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 3, Article No.: 46, Pages 1–26
https://doi.org/10.1145/2539036.2539042
We study the trade-off between throughput and memory footprint of embedded software that is synthesized from acyclic static dataflow (task graph) specifications targeting distributed memory multiprocessors. We identify iteration overlapping as a knob in ...
- research-article, December 2013
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures
ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 4, Article No.: 58, Pages 1–24
https://doi.org/10.1145/2541228.2555314
Coarse-Grained Reconfigurable Architectures (CGRAs) present a potential of high compute throughput with energy efficiency. A CGRA consists of an array of Functional Units (FUs), which communicate with each other through an interconnect network ...
- research-article, June 2012
EPIMap: using epimorphism to map applications on CGRAs
DAC '12: Proceedings of the 49th Annual Design Automation Conference, Pages 1284–1291
https://doi.org/10.1145/2228360.2228600
Coarse-Grained Reconfigurable Architectures (CGRAs) are an attractive platform that promise simultaneous high-performance and high power-efficiency. One of the primary challenges in using CGRAs is to develop efficient compilers that can automatically ...
- research-article, June 2012
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS), Volume 11S, Issue 1, Article No.: 19, Pages 1–24
https://doi.org/10.1145/2180887.2180896
Code generation in a compiler is commonly divided into several phases: instruction selection, scheduling, register allocation, spill code generation, and, in the case of clustered architectures, cluster assignment. These phases are interdependent; for ...
- research-article, October 2010
Resource recycling: putting idle resources to work on a composable accelerator
CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, Pages 21–30
https://doi.org/10.1145/1878921.1878925
Mobile computing platforms in the form of smart phones, netbooks, and personal digital assistants have become an integral part of our everyday lives. Moving ahead to the future, mobile multimedia support will become a key differentiating factor for ...
- research-article, October 2009
CGRA express: accelerating execution using dynamic operation fusion
CASES '09: Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, Pages 271–280
https://doi.org/10.1145/1629395.1629433
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used ...
- research-article, June 2009
Modulo scheduling without overlapped lifetimes
LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, Pages 1–10
https://doi.org/10.1145/1542452.1542454
This paper describes complementary software- and hardware-based approaches for handling overlapping register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N-instructions in a loop body and constructs an M-stage software ...
Also Published in:
ACM SIGPLAN Notices: Volume 44 Issue 7
- research-article, June 2009
AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures
IEEE Transactions on Computers (ITCO), Volume 58, Issue 6, Pages 770–783
https://doi.org/10.1109/TC.2009.32
This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the number of intercluster ...
- Article, May 2008
Reconstructing Control Flow in Modulo Scheduled Loops
ICIS '08: Proceedings of the Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008), Pages 539–544
https://doi.org/10.1109/ICIS.2008.16
Software pipelining is a loop optimization technique used to exploit instruction level parallelism in the loop. EPIC architectures, such as Intel IA-64 (Itanium) provide extensive hardware support for software pipelining to generate compact and highly ...
- research-article, April 2008
Modulo scheduling for highly customized datapaths to increase hardware reusability
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, Pages 124–133
https://doi.org/10.1145/1356058.1356075
In the embedded domain, custom hardware in the form of ASICs is often used to implement critical parts of applications when performance and energy efficiency goals cannot be met with software implementations on a general purpose processor or DSP. The ...
- research-article, April 2008
Latency-tolerant software pipelining in a production compiler
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, Pages 104–113
https://doi.org/10.1145/1356058.1356073
In this paper we investigate the benefit of scheduling non-critical loads for a higher latency during software pipelining. "Non-critical" denotes those loads that have sufficient slack in the cyclic data dependence graph so that increasing the ...
- Article, September 2007
Hierarchical coarse-grained stream compilation for software defined radio
CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, Pages 115–124
https://doi.org/10.1145/1289881.1289903
Software Defined Radio (SDR) is an emerging embedded domain where the physical layer of wireless protocols is implemented in software rather than the traditional application specific hardware. The operation throughput requirements of current third-...
- Article, March 2007
Compiler assisted architectural exploration for coarse grained reconfigurable arrays
GLSVLSI '07: Proceedings of the 17th ACM Great Lakes symposium on VLSI, Pages 164–167
https://doi.org/10.1145/1228784.1228827
A large number of factors influence the hardware cost and the mapping efficiency of applications on coarse grain reconfigurable architectures. This paper investigates for the first time in a unified way the four factors that are directly related with ...