A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses

Authors:

João Canas Ferreira,

João M. P. Cardoso

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 7, Issue 4

Article No.: 29, Pages 1 - 20

https://doi.org/10.1145/2629468

Published: 29 December 2014

Abstract

This article presents a reconfigurable hardware/software architecture for binary acceleration of embedded applications. A Reconfigurable Processing Unit (RPU) acts as a coprocessor of the General Purpose Processor (GPP) and accelerates the execution of repetitive instruction sequences called Megablocks. A toolchain detects Megablocks from instruction traces and generates customized RPU implementations. Megablocks with memory accesses are implemented using a memory-sharing mechanism that supports concurrent accesses to the entire address space of the GPP’s data memory. The scheduling of load/store operations and the handling of memory accesses have been optimized to minimize the latency introduced by memory accesses. The system dynamically switches execution between the GPP and the RPU while running the original binaries of the input application. Our proof-of-concept prototype achieved geometric mean speedups of 1.60× for a set of 37 benchmarks and 1.18× for the subset comprising the 9 most complex benchmarks. Compared with a previous version of our approach, the geometric mean speedup for the 10 benchmarks previously used improved from 1.22× to 1.53×.
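The speedups in the abstract are reported as geometric means. As a minimal sketch of how such a figure is computed, the Python snippet below takes per-benchmark cycle counts for GPP-only execution and for GPP plus RPU execution and returns the geometric mean of the per-benchmark speedups. The function name and the cycle counts are hypothetical and chosen only for illustration; they are not measurements from the article.

```python
import math


def geometric_mean_speedup(baseline_cycles, accelerated_cycles):
    """Geometric mean of per-benchmark speedups (baseline / accelerated)."""
    if not baseline_cycles or len(baseline_cycles) != len(accelerated_cycles):
        raise ValueError("need two non-empty lists of equal length")
    # Averaging the logarithms avoids overflow in the running product.
    log_sum = sum(
        math.log(base / accel)
        for base, accel in zip(baseline_cycles, accelerated_cycles)
    )
    return math.exp(log_sum / len(baseline_cycles))


# Hypothetical cycle counts (illustration only, not data from the article).
gpp_only = [1000, 2000, 4000]
gpp_plus_rpu = [500, 2000, 8000]

# Per-benchmark speedups are 2.0, 1.0 and 0.5.
speedup = geometric_mean_speedup(gpp_only, gpp_plus_rpu)
print(f"geometric mean speedup: {speedup:.2f}x")
```

With these made-up inputs, a 2× speedup and a 2× slowdown cancel out, which is one reason the geometric mean is a common choice for summarizing ratios across a benchmark set.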

Cited By

Paulino N, Ferreira J, Cardoso J (2020). Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys 53:1 (1-36). Online publication date: 6-Feb-2020.
https://dl.acm.org/doi/10.1145/3369764
Paulino N, Ferreira J, Cardoso J (2017). Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:1 (21-34). Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1109/TVLSI.2016.2573640
Paulino N, Ferreira J, Cardoso J (2014). Trace-Based Reconfigurable Acceleration with Data Cache and External Memory Support. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (158-165). Online publication date: 26-Aug-2014.
https://dl.acm.org/doi/10.1109/ISPA.2014.29

Index Terms

A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Information & Contributors

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 7, Issue 4

January 2015

213 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/2699137

Editor:
Steve Wilton
Department of Electrical and Computer Engineering/University of British Columbia/Kaiser, Main Mall/Vancouver, Canada

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 December 2014

Accepted: 01 January 2014

Revised: 01 September 2013

Received: 01 June 2013

Published in TRETS Volume 7, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Fundação para a Ciência e a Tecnologia
European Regional Development Fund through the COMPETE Programme (Operational Programme for Competitiveness)
