research-article

A Reconfigurable Architecture for Binary Acceleration of Loops with Memory Accesses

Published: 29 December 2014

Abstract

This article presents a reconfigurable hardware/software architecture for binary acceleration of embedded applications. A Reconfigurable Processing Unit (RPU) is used as a coprocessor of the General Purpose Processor (GPP) to accelerate the execution of repetitive instruction sequences called Megablocks. A toolchain detects Megablocks from instruction traces and generates customized RPU implementations. Megablocks with memory accesses are implemented using a memory-sharing mechanism that supports concurrent accesses to the entire address space of the GPP’s data memory. The scheduling of load/store operations and the handling of memory accesses have been optimized to minimize the latency that memory accesses introduce. The system can dynamically switch execution between the GPP and the RPU while running the original binaries of the input application. Our proof-of-concept prototype achieved geometric mean speedups of 1.60× over a set of 37 benchmarks and 1.18× over a subset consisting of the 9 most complex benchmarks. For the 10 benchmarks used with a previous version of our approach, we improved the geometric mean speedup from 1.22× to 1.53×.
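To make the notion of a Megablock (a repetitive instruction sequence found in an instruction trace) more concrete, the following Python sketch scans a recorded trace of instruction addresses for the first sequence that repeats back-to-back. It is a simplified, brute-force, offline illustration and not the authors' toolchain: the function name `find_repeating_sequence`, its `max_pattern_size` and `min_repeats` parameters, and the example trace are hypothetical.

```python
from typing import List, Optional, Tuple


def find_repeating_sequence(
    trace: List[int], max_pattern_size: int = 32, min_repeats: int = 2
) -> Optional[Tuple[int, List[int], int]]:
    """Hypothetical helper: return (start_index, pattern, repeats) for the
    first sequence of instruction addresses that repeats back-to-back at
    least `min_repeats` times, or None if no such sequence exists."""
    n = len(trace)
    # Try every start position and, for each one, every candidate pattern
    # size from smallest to largest, so the shortest loop body wins.
    for start in range(n):
        for size in range(1, max_pattern_size + 1):
            pattern = trace[start:start + size]
            if len(pattern) < size:
                # The candidate would run past the end of the trace.
                break
            # Count how many consecutive copies of the pattern appear.
            repeats = 1
            pos = start + size
            while trace[pos:pos + size] == pattern:
                repeats += 1
                pos += size
            if repeats >= min_repeats:
                return start, pattern, repeats
    return None


if __name__ == "__main__":
    # Hypothetical trace: a prologue, a two-instruction loop body executed
    # three times, then an epilogue instruction.
    trace = [0x00, 0x04, 0x08, 0x0C, 0x08, 0x0C, 0x08, 0x0C, 0x10]
    print(find_repeating_sequence(trace))
```

For the example trace the call returns `(2, [0x08, 0x0C], 3)`: a loop body of two instructions starting at trace index 2 and repeated three times. Because the check requires exact repetition of addresses, it only captures iterations that follow the same execution path; a practical detector would also work incrementally on a live trace rather than rescanning it.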


Cited By

  • (2020) Improving Performance and Energy Consumption in Embedded Systems via Binary Acceleration: A Survey. ACM Computing Surveys 53, 1 (2020), 1-36. DOI: 10.1145/3369764. Online publication date: 6-Feb-2020
  • (2017) Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 1 (2017), 21-34. DOI: 10.1109/TVLSI.2016.2573640. Online publication date: 1-Jan-2017
  • (2014) Trace-Based Reconfigurable Acceleration with Data Cache and External Memory Support. Proceedings of the 2014 IEEE International Symposium on Parallel and Distributed Processing with Applications, 158-165. DOI: 10.1109/ISPA.2014.29. Online publication date: 26-Aug-2014


      Information

      Published In

      ACM Transactions on Reconfigurable Technology and Systems, Volume 7, Issue 4
      January 2015
      213 pages
      ISSN:1936-7406
      EISSN:1936-7414
      DOI:10.1145/2699137
      Editor: Steve Wilton
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 December 2014
      Accepted: 01 January 2014
      Revised: 01 September 2013
      Received: 01 June 2013
      Published in TRETS Volume 7, Issue 4


      Author Tags

      1. FPGA
      2. Megablock
      3. MicroBlaze
      4. Reconfigurable processor
      5. hardware acceleration
      6. hardware/software architectures
      7. instruction trace
      8. memory access

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

      • Downloads (last 12 months): 2
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 19 Nov 2024
