DOI: 10.1145/2656106.2656123
research-article

Heuristics for greedy transport triggered architecture interconnect exploration

Published: 12 October 2014

Abstract

Most of the power dissipated in Very Long Instruction Word (VLIW) processors goes to their large, multi-port register files (RFs). Transport Triggered Architecture (TTA) is a VLIW variant whose exposed datapath reduces the need for RF accesses and ports. In practice, however, the comparative advantage of TTAs suffers from a wide instruction word and a complex interconnection network (IC). We argue that these issues are at least partly due to suboptimal design choices. The design space of possible TTA architectures is very large, and previous automated and ad-hoc design methods often produce inefficient architectures. We propose a reduced design space in which efficient TTAs can be generated in a short time using execution trace-driven greedy exploration. The approach is evaluated by optimizing the equivalent of a 4-issue VLIW architecture. The algorithm finishes quickly and produces a processor whose core energy product is 10% lower than that of a fully connected TTA. Since the generated processor has low IC power and a shorter instruction word than a typical 4-issue VLIW, the results support the hypothesis that these drawbacks of TTA can be mitigated through efficient IC design.
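The abstract describes the exploration only at a high level. As a rough illustration of what execution trace-driven greedy interconnect pruning can look like, here is a minimal Python sketch; it is not the paper's algorithm. It assumes a simplified model in which each bus connects a set of source ports to a set of destination ports, every traced move needs its own bus within a cycle, and cost is the estimated cycle count plus a hypothetical weight `conn_weight` per bus-port connection. The names `Bus`, `cycles_needed`, `greedy_prune`, and all port names are illustrative only. Starting from fully connected buses, each round removes the single connection whose removal lowers that cost the most.

```python
from dataclasses import dataclass

# Hypothetical model: a move is one data transport, (source port, destination port).
Move = tuple[str, str]


@dataclass(frozen=True)
class Bus:
    """A transport bus and the ports connected to it (hypothetical model)."""
    sources: frozenset[str]
    dests: frozenset[str]

    def supports(self, move: Move) -> bool:
        return move[0] in self.sources and move[1] in self.dests


def cycles_needed(trace: list[list[Move]], buses: list[Bus]) -> int:
    """Estimate cycle count when every move in a cycle needs its own bus.

    First-fit assignment (not an optimal matching): moves that find no free
    supporting bus spill into an extra cycle.
    """
    total = 0
    for cycle in trace:
        pending = list(cycle)
        total += 1  # each traced cycle costs at least one cycle
        while True:
            free = set(range(len(buses)))
            spilled = []
            for move in pending:
                slot = next((b for b in sorted(free) if buses[b].supports(move)), None)
                if slot is None:
                    spilled.append(move)
                else:
                    free.discard(slot)
            if not spilled:
                break
            if len(spilled) == len(pending):
                raise ValueError(f"no bus supports {spilled}")
            pending = spilled
            total += 1  # spilled moves need another cycle
    return total


def cost(trace: list[list[Move]], buses: list[Bus], conn_weight: float) -> float:
    # Hypothetical cost: cycles plus a weighted count of bus-port connections.
    conns = sum(len(b.sources) + len(b.dests) for b in buses)
    return cycles_needed(trace, buses) + conn_weight * conns


def greedy_prune(trace, buses, conn_weight=0.5):
    """Repeatedly drop the one connection whose removal lowers cost the most."""
    best, best_cost = list(buses), cost(trace, buses, conn_weight)
    while True:
        step = None  # (cost, buses) of the best improving removal this round
        for i, bus in enumerate(best):
            for side in ("sources", "dests"):
                for port in sorted(getattr(bus, side)):
                    pruned = Bus(
                        bus.sources - {port} if side == "sources" else bus.sources,
                        bus.dests - {port} if side == "dests" else bus.dests,
                    )
                    trial = best[:i] + [pruned] + best[i + 1:]
                    try:
                        c = cost(trace, trial, conn_weight)
                    except ValueError:
                        continue  # removal would leave a traced move unroutable
                    if c < best_cost and (step is None or c < step[0]):
                        step = (c, trial)
        if step is None:
            return best, best_cost
        best_cost, best = step


# Toy example: three fully connected buses and a three-cycle trace.
srcs = frozenset({"alu.out", "rf.r0", "lsu.out"})
dsts = frozenset({"alu.in1", "alu.in2", "lsu.addr", "rf.w0"})
full = [Bus(srcs, dsts) for _ in range(3)]
trace = [
    [("rf.r0", "alu.in1"), ("lsu.out", "alu.in2")],
    [("alu.out", "rf.w0"), ("rf.r0", "lsu.addr")],
    [("lsu.out", "rf.w0")],
]
buses, c = greedy_prune(trace, full, conn_weight=0.5)
for b in buses:
    print(sorted(b.sources), "->", sorted(b.dests))
print("cost:", c)
```

Because every accepted removal strictly lowers the cost, the loop terminates; like any greedy search, it can stop in a local minimum. With `conn_weight=0.5`, one extra cycle is treated as being worth two connections, so the weight sets the trade-off between parallelism and interconnect size in this toy model.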

Cited By

  • (2023) AEx: Automated High-Level Synthesis of Compiler Programmable Co-Processors. Journal of Signal Processing Systems 95:9 (1051-1065). DOI: 10.1007/s11265-023-01841-3. Online publication date: 15-Feb-2023
  • (2019) LordCore: Energy-Efficient OpenCL-Programmable Software-Defined Radio Coprocessor. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27:5 (1029-1042). DOI: 10.1109/TVLSI.2019.2897508. Online publication date: May-2019
  • (2019) AEx: Automated Customization of Exposed Datapath Soft-Cores. 2019 22nd Euromicro Conference on Digital System Design (DSD) (35-42). DOI: 10.1109/DSD.2019.00016. Online publication date: Aug-2019


      Information

      Published In

      CASES '14: Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
      October 2014
      241 pages
      ISBN: 9781450330503
      DOI: 10.1145/2656106
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. TTA
      2. VLIW
      3. design space exploration
      4. interconnection network
      5. port sharing
      6. register file
      7. transport triggered architecture


      Conference

      ESWEEK'14
      ESWEEK'14: TENTH EMBEDDED SYSTEM WEEK
      October 12 - 17, 2014
      New Delhi, India

      Acceptance Rates

      Overall Acceptance Rate 52 of 230 submissions, 23%


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 2
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 20 Nov 2024


      Citations

      • (2019) ALMARVI Execution Platform. Journal of Signal Processing Systems 91:1 (61-73). DOI: 10.1007/s11265-018-1424-1. Online publication date: 1-Jan-2019
      • (2018) Transport-Triggered Soft Cores. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (83-90). DOI: 10.1109/IPDPSW.2018.00022. Online publication date: May-2018
      • (2018) Improving Code Density with Variable Length Encoding Aware Instruction Scheduling. Journal of Signal Processing Systems 84:3 (435-446). DOI: 10.1007/s11265-015-1081-6. Online publication date: 27-Dec-2018
      • (2018) Variable Length Instruction Compression on Transport Triggered Architectures. International Journal of Parallel Programming 46:6 (1283-1303). DOI: 10.1007/s10766-018-0568-8. Online publication date: 1-Dec-2018
      • (2017) Codesign Case Study on Transport-Triggered Architectures. Handbook of Hardware/Software Codesign (1303-1337). DOI: 10.1007/978-94-017-7267-9_39. Online publication date: 27-Sep-2017
      • (2016) OpenCL programmable exposed datapath high performance low-power image signal processor. 2016 IEEE Nordic Circuits and Systems Conference (NORCAS) (1-6). DOI: 10.1109/NORCHIP.2016.7792906. Online publication date: Nov-2016
      • (2016) HW/SW Co-design Toolset for Customization of Exposed Datapath Processors. Computing Platforms for Software-Defined Radio (147-164). DOI: 10.1007/978-3-319-49679-5_8. Online publication date: 30-Dec-2016
