DOI: 10.1145/1183401.1183427
Article

A scalable low power issue queue for large instruction window processors

Published: 28 June 2006 Publication History

Abstract

Large instruction windows and issue queues are key to exploiting greater instruction-level parallelism in out-of-order superscalar processors. However, conventional large monolithic issue queues have high cycle time and energy consumption. Previous efforts to reduce cycle time segment the issue queue and pipeline the wakeup, but this causes significant IPC loss. Other proposals improve energy efficiency by avoiding only the unnecessary tag comparisons; they do not reduce the number of tag broadcasts, and they also increase issue latency.

To address both problems comprehensively, we propose the Scalable Low-power Issue Queue (SLIQ). SLIQ augments a pipelined issue queue with direct indexing, which mitigates the problem of delayed wakeups while reducing cycle time. The SLIQ design also naturally yields significant energy savings by reducing both the number of tag broadcasts and the number of tag comparisons.

A 2-segment SLIQ incurs an average IPC loss of 0.2% over the entire SPEC CPU2000 suite while reducing issue latency by 25.2% compared with a monolithic 128-entry issue queue for an 8-wide superscalar processor. An 8-segment SLIQ improves scalability by reducing issue latency by 38.3% at an IPC loss of only 2.3%. The 8-segment SLIQ also reduces energy consumption and energy-delay product by 48.3% and 67.4%, respectively, on average.
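The abstract contrasts broadcast-based wakeup with direct indexing. The Python sketch below is a minimal behavioral model of that contrast, not the SLIQ design: it ignores segmentation, pipelining, select logic, and timing, and the class names, the unbounded per-producer consumer lists, and the counters are illustrative assumptions. It counts broadcasts and tag comparisons for a conventional queue that already skips comparisons on ready operands (the kind of scheme the abstract says still broadcasts every tag), and indexed ready-bit writes for a queue that wakes consumers by direct index.

```python
from collections import defaultdict


class BroadcastIQ:
    """Conventional CAM-style wakeup (illustrative model): each completing tag is
    broadcast to every waiting entry and compared against its pending operands."""

    def __init__(self):
        self.waiting = {}      # entry id -> set of outstanding source tags
        self.broadcasts = 0    # tag broadcasts driven across the queue
        self.comparisons = 0   # tag comparisons performed

    def dispatch(self, entry, src_tags):
        tags = set(src_tags)
        if not tags:
            return True        # no outstanding sources: ready at dispatch
        self.waiting[entry] = tags
        return False

    def complete(self, tag):
        self.broadcasts += 1
        woken = []
        for entry, pending in self.waiting.items():
            # Only still-pending operands are compared (ready ones are gated off).
            self.comparisons += len(pending)
            pending.discard(tag)
            if not pending:
                woken.append(entry)
        for entry in woken:
            del self.waiting[entry]
        return woken


class DirectIndexedIQ:
    """Hypothetical direct-indexed wakeup: at dispatch each pending operand is
    registered under its producer's tag, so a completing producer sets the ready
    bits of exactly its consumers, with no broadcast and no tag comparison."""

    def __init__(self):
        self.waiting = {}                    # entry id -> outstanding source tags
        self.consumers = defaultdict(list)   # producer tag -> consumer entry ids
        self.direct_writes = 0               # indexed ready-bit writes

    def dispatch(self, entry, src_tags):
        tags = set(src_tags)                 # a repeated source registers once
        if not tags:
            return True
        self.waiting[entry] = tags
        for tag in tags:
            self.consumers[tag].append(entry)
        return False

    def complete(self, tag):
        woken = []
        for entry in self.consumers.pop(tag, []):
            self.direct_writes += 1
            pending = self.waiting[entry]
            pending.discard(tag)
            if not pending:
                woken.append(entry)
                del self.waiting[entry]
        return woken


def run(iq, program, completion_order):
    """Dispatch every (entry, sources) pair, then complete producers in order.
    Returns entries in the order they became ready."""
    ready = []
    for entry, srcs in program:
        if iq.dispatch(entry, srcs):
            ready.append(entry)
    for tag in completion_order:
        ready.extend(iq.complete(tag))
    return ready


if __name__ == "__main__":
    # Hypothetical workload: four queued instructions waiting on producers p1..p3.
    program = [("a", ["p1"]), ("b", ["p1", "p2"]), ("c", ["p3"]), ("d", ["p2", "p3"])]
    order = ["p1", "p2", "p3"]

    bq, dq = BroadcastIQ(), DirectIndexedIQ()
    print(run(bq, program, order), bq.broadcasts, bq.comparisons)  # ['a', 'b', 'c', 'd'] 3 12
    print(run(dq, program, order), dq.direct_writes)               # ['a', 'b', 'c', 'd'] 6
```

On this toy workload both queues wake the same instructions in the same order, but the broadcast queue drives 3 broadcasts and 12 comparisons, while the direct-indexed queue performs 6 indexed writes and no broadcasts. A hardware direct-indexing scheme cannot keep unbounded consumer lists per producer, so a real design would need some bounded table and a fallback for producers with more dependents; this model deliberately sidesteps that cost.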
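Because energy-delay product (EDP) is the product of energy and delay, the two 8-segment averages quoted in the abstract can be combined to back out the delay change they imply. The short Python check below does that arithmetic. It treats the averaged reductions as if they applied to a single configuration, which is an assumption (an average of ratios need not equal a ratio of averages), and the function name is illustrative.

```python
def implied_delay_factor(energy_reduction: float, edp_reduction: float) -> float:
    """Return D_new / D_old given fractional reductions in energy and EDP.

    EDP = E * D, so D_new / D_old = (EDP_new / EDP_old) / (E_new / E_old).
    """
    energy_factor = 1.0 - energy_reduction   # E_new / E_old
    edp_factor = 1.0 - edp_reduction         # EDP_new / EDP_old
    return edp_factor / energy_factor


if __name__ == "__main__":
    # Average reductions reported for the 8-segment SLIQ in the abstract.
    factor = implied_delay_factor(energy_reduction=0.483, edp_reduction=0.674)
    print(f"implied delay factor:    {factor:.3f}")      # 0.631
    print(f"implied delay reduction: {1 - factor:.1%}")  # 36.9%
```

The result is a delay factor of about 0.631, roughly a 36.9% delay reduction. The abstract does not define the delay term used in its EDP figure, so this value should not be equated with the 38.3% issue-latency reduction, which is reported as a separate metric.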


Published In

ICS '06: Proceedings of the 20th annual international conference on Supercomputing
June 2006
385 pages
ISBN: 1595932828
DOI: 10.1145/1183401

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. complexity-effective architecture
  2. issue logic
  3. low-power architecture
  4. wakeup logic

Conference

ICS '06: International Conference on Supercomputing 2006
June 28 - July 1, 2006
Cairns, Queensland, Australia

Acceptance Rates

ICS '06 paper acceptance rate: 37 of 141 submissions (26%)
Overall acceptance rate: 629 of 2,180 submissions (29%)
