Architecture Optimization of Application-Specific Implicit Instructions

Published: 01 August 2012

Abstract

Dynamic configuration of application-specific implicit instructions has been proposed to better exploit the available instruction-level parallelism in pipelined processors. Supporting implicit instruction issue requires the pipeline to be extended with a trigger table that describes the instruction implicitly issued in response to a value written into a triggering register by a triggering instruction (which may be an add or sub instruction). In this article, we explore the design optimization of the trigger table to maximize the number of instructions that can be implicitly issued within the limited size of the trigger table. The concept of an implicitly issued instruction is formally defined by considering the inter-basic-block analysis of control and data dependencies. A compilation tool chain has been developed to automatically identify optimization opportunities, taking into account the constraints imposed by control and data dependencies as well as by architectural limitations. The proposed solutions have been applied to a baseline scalar MIPS processor where, for the selected set of benchmarks (DSPStone and Mibench/automotive), we obtained an average speedup of 17%.
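
The trigger-table mechanism described in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' design: the names `Instr`, `Pipeline`, `trigger_table`, and `capacity` are hypothetical, the table is assumed to be keyed by the destination register of the triggering instruction, implicit instructions are assumed not to trigger further implicit instructions, and the implicit instruction is executed right after the triggering one rather than modeling any pipeline timing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Instr:
    """A three-register ALU instruction (hypothetical encoding)."""
    op: str    # "add" or "sub"
    dst: int   # destination register index
    src1: int  # first source register index
    src2: int  # second source register index


class Pipeline:
    """Behavioral model of a register file plus a trigger table."""

    def __init__(self, trigger_table: Dict[int, Instr], capacity: int = 4):
        # Assumption: the table has a small fixed number of entries.
        if len(trigger_table) > capacity:
            raise ValueError("trigger table exceeds its capacity")
        self.regs = [0] * 32
        self.trigger_table = dict(trigger_table)
        self.implicit_issued = 0

    def _execute(self, ins: Instr) -> None:
        a, b = self.regs[ins.src1], self.regs[ins.src2]
        if ins.op == "add":
            self.regs[ins.dst] = a + b
        elif ins.op == "sub":
            self.regs[ins.dst] = a - b
        else:
            raise ValueError(f"unsupported op: {ins.op}")

    def issue(self, ins: Instr) -> None:
        """Issue an explicit instruction; a write to a triggering
        register also issues the table's implicit instruction."""
        self._execute(ins)
        implicit = self.trigger_table.get(ins.dst)
        if implicit is not None:
            # Assumption: no chaining of implicit instructions.
            self._execute(implicit)
            self.implicit_issued += 1


# Register 3 is the triggering register: writing it implicitly
# issues "sub r5, r3, r4" without fetching that instruction.
cpu = Pipeline({3: Instr("sub", 5, 3, 4)})
cpu.regs[1], cpu.regs[2], cpu.regs[4] = 5, 3, 10
cpu.issue(Instr("add", 3, 1, 2))   # r3 = 8, then implicitly r5 = 8 - 10
print(cpu.regs[3], cpu.regs[5], cpu.implicit_issued)
```

In this model, the optimization goal stated in the abstract corresponds to choosing the table entries so that `implicit_issued` is as large as possible for a given `capacity`.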



Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 11, Issue S2
Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS'09
August 2012
396 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2331147
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

Publication History

Published: 01 August 2012
Accepted: 01 January 2010
Revised: 01 December 2009
Received: 01 June 2009
Published in TECS Volume 11, Issue S2

Author Tags

  1. Pipeline architecture
  2. Implicit instruction issue

Qualifiers

  • Research-article
  • Research
  • Refereed
