Architecture Optimization of Application-Specific Implicit Instructions

Authors:

Andrea Di Biagio,

Giovanni Agosta,

Martino Sykora,

Cristina Silvano

ACM Transactions on Embedded Computing Systems (TECS), Volume 11, Issue S2

Article No.: 44, Pages 1 - 23

https://doi.org/10.1145/2331147.2331154

Published: 01 August 2012

Abstract

Dynamic configuration of application-specific implicit instructions has been proposed to better exploit the available instruction-level parallelism in pipelined processors. Supporting implicit instruction issue requires extending the pipeline with a trigger table, which describes the instruction that is implicitly issued in response to a value written into a triggering register by a triggering instruction (which may be an add or sub instruction). In this article, we explore the design optimization of the trigger table to maximize the number of instructions that can be implicitly issued while respecting the limited size of the trigger table. The concept of an implicitly issued instruction is formally defined by considering inter-basic-block analysis of control and data dependencies. A compilation tool chain has been developed to automatically identify the optimization opportunities, taking into account the constraints imposed by control and data dependencies as well as by architectural limitations. The proposed solutions have been applied to a baseline scalar MIPS processor where, for the selected set of benchmarks (DSPstone and MiBench/automotive), we obtained an average speedup of 17%.
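
The trigger-table mechanism described in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `ToyPipeline` and `ImplicitInstr`, the entry layout (destination register, operation, one extra operand register), the one-entry-per-triggering-register rule, and the capacity check are all hypothetical choices made for illustration. The only details taken from the abstract are that a write to a triggering register by an add or sub instruction causes a second instruction to be issued implicitly, and that the table has a limited size.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ImplicitInstr:
    """Hypothetical table entry: dest = op(value just written, regs[operand])."""
    dest: str
    op: Callable[[int, int], int]
    operand: str


class ToyPipeline:
    """Toy model (an assumption, not the paper's design) of implicit issue."""

    # The abstract names add and sub as possible triggering instructions.
    TRIGGERING_OPS = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}

    def __init__(self, table_size: int) -> None:
        self.regs: Dict[str, int] = {}
        self.table_size = table_size
        self.trigger_table: Dict[str, ImplicitInstr] = {}
        self.explicit_issued = 0
        self.implicit_issued = 0

    def configure(self, trigger_reg: str, instr: ImplicitInstr) -> None:
        """Bind an implicit instruction to a triggering register."""
        is_new = trigger_reg not in self.trigger_table
        if is_new and len(self.trigger_table) >= self.table_size:
            raise ValueError("trigger table full")
        self.trigger_table[trigger_reg] = instr

    def execute(self, opname: str, dest: str, a: str, b: str) -> None:
        """Run one explicit add/sub, then any implicit instruction it triggers."""
        value = self.TRIGGERING_OPS[opname](self.regs.get(a, 0), self.regs.get(b, 0))
        self.regs[dest] = value
        self.explicit_issued += 1
        entry = self.trigger_table.get(dest)
        if entry is not None:
            # Implicit issue: no extra instruction is fetched from the program.
            # In this toy model the implicit write does not trigger further entries.
            self.regs[entry.dest] = entry.op(value, self.regs.get(entry.operand, 0))
            self.implicit_issued += 1


# Usage: writing r3 with an add implicitly computes r5 = r3 * r4.
cpu = ToyPipeline(table_size=2)
cpu.regs.update({"r1": 5, "r2": 3, "r4": 10})
cpu.configure("r3", ImplicitInstr(dest="r5", op=lambda v, x: v * x, operand="r4"))
cpu.execute("add", "r3", "r1", "r2")
print(cpu.regs["r3"], cpu.regs["r5"], cpu.explicit_issued, cpu.implicit_issued)
```

In this model one explicit instruction produces two register updates, which is the kind of saving the optimization targets; the fixed `table_size` stands in for the architectural limit that the compiler has to respect when choosing which instructions to make implicit.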

Index Terms

1. Computer systems organization
  1. Architectures
    1. Serial architectures
      1. Pipeline computing
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ACM Transactions on Embedded Computing Systems Volume 11, Issue S2

Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS' 09

August 2012

396 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2331147

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 August 2012

Accepted: 01 January 2010

Revised: 01 December 2009

Received: 01 June 2009
