CHIMAERA: A high-performance architecture with a tightly-coupled reconfigurable functional unit
ZA Ye, A Moshovos, S Hauck, P Banerjee
ACM SIGARCH Computer Architecture News, 2000 (dl.acm.org)
Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically-scheduled superscalar processor. Chimaera is capable of performing 9-input/1-output operations on integer data. We discuss the Chimaera C compiler, which automatically maps computations for execution in the RFU. Chimaera is capable of: (1) collapsing a set of instructions into RFU operations, (2) converting control flow into RFU operations, and (3) supporting a more powerful fine-grain data-parallel model than that supported by current multimedia extension instruction sets (for integer operations). Using a set of multimedia and communication applications, we show that even with simple optimizations, the Chimaera C compiler is able to map 22% of all instructions to the RFU on average. A variety of computations are mapped into RFU operations, ranging from simple add/sub-shift pairs to operations of more than 10 instructions, including several branches. Timing experiments demonstrate that, for a 4-way out-of-order superscalar processor, Chimaera results in average performance improvements of 21%, assuming a very aggressive core processor design (the most pessimistic RFU latency model) and communication overheads to and from the RFU.
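To make the kinds of candidates named in the abstract concrete, the following is a minimal illustrative sketch in C, not code from the paper or output of the Chimaera C compiler. The function names (add_shift, clamp_branchy, clamp_select) are hypothetical. The first function is an add-shift pair of the sort the abstract mentions; the second and third show the same integer computation written with branches and as straight-line select code, which is the general shape of converting control flow into a single multi-input, single-output operation. Whether a real compiler would map any of these to the RFU is an assumption.

```c
#include <stdint.h>

/* Hypothetical candidate 1: an add-shift pair.
   Two integer inputs, one output, two dependent instructions
   that could in principle be collapsed into one operation. */
static uint32_t add_shift(uint32_t a, uint32_t b)
{
    return (a + b) << 2;
}

/* Hypothetical candidate 2: a clamp written with branches.
   Three integer inputs, one output, two conditional branches. */
static int32_t clamp_branchy(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* The same computation as straight-line code: the branches become
   nested selects, so the result depends only on the three inputs
   and no control flow remains. */
static int32_t clamp_select(int32_t x, int32_t lo, int32_t hi)
{
    return (x < lo) ? lo : ((x > hi) ? hi : x);
}
```

Both clamp variants take three inputs and produce one output, well within the 9-input/1-output limit stated in the abstract; the sketch says nothing about actual RFU latency or mapping decisions.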