Article

Circuits for wide-window superscalar processors

Authors:

Bradley C. Kuszmaul,

Gabriel H. Loh,

Rahul Sami

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture

Pages 236 - 247

https://doi.org/10.1145/339647.339689

Published: 01 May 2000

Abstract

Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today's technology can achieve an increase of 10-60% (geometric mean of 31%) in program speed compared to today's processors. The processor operates at clock speeds comparable to today's processors, but achieves significantly higher instruction-level parallelism (ILP).
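
The abstract does not state how the 31% figure is aggregated. Assuming the common convention of taking the geometric mean of per-benchmark speed ratios s_i (large-window speed divided by baseline speed) over N benchmarks, the summary figure would be:

```latex
\bar{s} = \left( \prod_{i=1}^{N} s_i \right)^{1/N},
\qquad \text{reported increase} = \bar{s} - 1
```

Because a geometric mean always lies between the smallest and largest s_i, a 31% mean (s̄ ≈ 1.31) is consistent with per-benchmark gains spanning 10-60%.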

To measure the impact of a large window on clock speed, we design and simulate new implementations of the logic components that most limit the critical path of our large-window processor: the schedule logic and the wake-up logic. We use log-depth cyclic segmented prefix (CSP) circuits to reimplement these components. Our layouts and simulations of critical paths through these circuits indicate that our large-window processor could be clocked at frequencies exceeding 500MHz in today's technology. Our commit logic and rename logic can also run at these speeds.
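
To make the prefix idea concrete, here is a small software model of a cyclic segmented prefix. It is a sketch under stated assumptions, not the paper's circuit: it assumes the usual segmented-scan definition, in which each slot of a ring combines values with an associative operator starting from the nearest preceding slot whose segment bit is set. The log-depth tree is modeled with Hillis-Steele doubling over the ring unrolled twice, a software convenience rather than the authors' layout. The `grant` function is a hypothetical illustration of scheduling only: the segment starts at the oldest instruction, the operator is addition, and a requesting instruction wins a unit if at most `k` requests (counting its own) lie between the head and itself. All names (`cyclic_segmented_prefix`, `grant`, `oldest`, `k`) are illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cyclic_segmented_prefix(values: Sequence[T], seg: Sequence[bool],
                            op: Callable[[T, T], T]) -> List[T]:
    """Inclusive cyclic segmented prefix over a ring of n slots.

    out[i] = values[j] op values[j+1] op ... op values[i] (indices mod n),
    where j is the nearest slot at or before i (going backward around the
    ring) whose segment bit is set. At least one bit must be set.
    """
    n = len(values)
    if n != len(seg):
        raise ValueError("values and seg must have the same length")
    if n == 0:
        return []
    if not any(seg):
        raise ValueError("at least one segment bit must be set")

    # Software stand-in for the ring: unroll it twice, so every slot in the
    # second copy finds its nearest segment start within the preceding n slots.
    items: List[Tuple[bool, T]] = [(seg[k % n], values[k % n]) for k in range(2 * n)]

    def combine(left: Tuple[bool, T], right: Tuple[bool, T]) -> Tuple[bool, T]:
        # Associative segmented operator: a set bit on the right discards the left.
        lf, lv = left
        rf, rv = right
        return (lf or rf, rv if rf else op(lv, rv))

    # Hillis-Steele doubling: ceil(log2(2n)) rounds, mirroring a log-depth tree.
    d = 1
    while d < 2 * n:
        items = [combine(items[k - d], items[k]) if k >= d else items[k]
                 for k in range(2 * n)]
        d *= 2
    return [v for _, v in items[n:]]


def grant(requests: Sequence[bool], oldest: int, k: int) -> List[bool]:
    """Hypothetical scheduler: give k functional units to the k oldest
    requesting instructions in a wrap-around window whose head is `oldest`."""
    n = len(requests)
    seg = [i == oldest for i in range(n)]  # the segment starts at the head
    counts = cyclic_segmented_prefix([int(r) for r in requests], seg,
                                     lambda a, b: a + b)
    # counts[i] = number of requests from the head through slot i, inclusive
    return [requests[i] and counts[i] <= k for i in range(n)]


print(grant([True, False, True, True, True], oldest=2, k=2))
# [False, False, True, True, False]
```

With the head at slot 2, the two oldest requesters (slots 2 and 3) are granted, while slots 4 and 0 wait even though they also requested. Swapping addition for a different associative operator changes what the same tree computes, which is how one primitive can serve both the schedule and the wake-up logic.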

To measure the impact of a large window on ILP, we compare two microarchitectures. The first has a 128-instruction window, an 8-wide fetch unit, and 20-wide issue (four each of integer, branch, multiply, floating-point, and memory units); the second has a 32-instruction window and a 4-wide fetch unit, and is comparable to today's processors. For each, we simulate different window reuse and bypass policies. Our simulations show that the large-window processor achieves significantly higher IPC. This performance increase comes despite the fact that the large-window processor uses a wrap-around window while the small-window processor uses a compressing window, which effectively increases the number of outstanding instructions the small window can hold. In addition, the large-window processor sometimes pays an extra clock cycle for bypassing.
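
The contrast between the two window policies can be illustrated with a toy count of reclaimable slots. This is a hypothetical model, not the paper's simulator: it assumes that a wrap-around window reuses a slot only once that instruction and every older one are finished, while a compressing window squeezes out any finished entry immediately. The function names are made up for the example.

```python
from typing import Sequence


def reclaimable_wraparound(done: Sequence[bool]) -> int:
    """Slots a wrap-around window can refill (assumed policy): only the
    contiguous run of finished instructions starting at the head is freed."""
    count = 0
    for d in done:      # done[0] is the oldest instruction
        if not d:
            break       # an unfinished older instruction blocks every slot behind it
        count += 1
    return count


def reclaimable_compressing(done: Sequence[bool]) -> int:
    """Slots a compressing window can refill (assumed policy): every finished
    entry is squeezed out, wherever it sits."""
    return sum(1 for d in done if d)


window = [True, False, True, True]  # hypothetical 4-entry window, oldest first
print(reclaimable_wraparound(window), reclaimable_compressing(window))  # 1 3
```

Here one unfinished instruction near the head lets the wrap-around window refill only one slot, while the compressing window can refill three; this is the sense in which compression lets a small window keep more instructions outstanding.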

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Hardware
  1. Hardware validation

Published In

ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture

June 2000

327 pages

ISBN: 1581132328

DOI: 10.1145/339647

Chairmen: Alan Berenbaum (Lucent Technologies), Joel Emer (Compaq Computer Corp.)

ACM SIGARCH Computer Architecture News Volume 28, Issue 2
Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
May 2000
325 pages
ISSN: 0163-5964
DOI: 10.1145/342001
Chairmen: Alan Berenbaum (Lucent Technologies, Berkeley Heights, NJ), Joel Emer (Compaq Computer Corp., Palo Alto, CA)

Copyright © 2000 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ISCA00

Sponsor:

SIGARCH

ISCA00: 27th International Symposium on Computer Architecture

Vancouver, British Columbia, Canada

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 65
Total Downloads: 783
Downloads (last 12 months): 130
Downloads (last 6 weeks): 17

Reflects downloads up to 21 Nov 2024

Citations

Cited By

Sakai S, Suenaga T, Shioya R, Ando H (2018). Rearranging Random Issue Queue with High IPC and Short Delay. 2018 IEEE 36th International Conference on Computer Design (ICCD), pp. 123-131. DOI: 10.1109/ICCD.2018.00027. Online publication date: Oct-2018.
https://doi.org/10.1109/ICCD.2018.00027
Agarwal M, Jailia M (2016). Inconsistency in translation lookaside buffer. 2016 International Conference on ICT in Business Industry & Government (ICTBIG), pp. 1-5. DOI: 10.1109/ICTBIG.2016.7892705. Online publication date: 2016.
https://doi.org/10.1109/ICTBIG.2016.7892705
Chen C, Hsiao K (2007). Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality. IEEE Transactions on Computers 56(11), pp. 1534-1548. DOI: 10.1109/TC.2007.70743. Online publication date: 1-Nov-2007.
https://dl.acm.org/doi/10.1109/TC.2007.70743
Hsiao K, Chen C (2006). Wake-up logic optimizations through selective match and wakeup range limitation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14(10), pp. 1089-1102. DOI: 10.1109/TVLSI.2006.884150. Online publication date: 1-Oct-2006.
https://dl.acm.org/doi/10.1109/TVLSI.2006.884150
Jaleel A, Jacob B (2006). In-Line Interrupt Handling and Lock-Up Free Translation Lookaside Buffers (TLBs). IEEE Transactions on Computers 55(5), pp. 559-574. DOI: 10.1109/TC.2006.77. Online publication date: 1-May-2006.
https://dl.acm.org/doi/10.1109/TC.2006.77
Junwei Zhou, Mason A (2006). A Two-level Hybrid Select Logic for Wide-Issue Superscalar Processors. 2006 IEEE International Symposium on Circuits and Systems, pp. 41-44. DOI: 10.1109/ISCAS.2006.1692517. Online publication date: 2006.
https://doi.org/10.1109/ISCAS.2006.1692517
Hsiao K, Chen C (2006). Scheduler Optimization by Exploring Wakeup Locality. 2006 International Conference on Computer Engineering and Systems, pp. 115-120. DOI: 10.1109/ICCES.2006.320434. Online publication date: Nov-2006.
https://doi.org/10.1109/ICCES.2006.320434
Hsiao K, Chen C (2006). Improving Scalability and Complexity of Dynamic Scheduler through Wakeup-based Scheduling. 2006 International Conference on Computer Design, pp. 197-202. DOI: 10.1109/ICCD.2006.4380817. Online publication date: Oct-2006.
https://doi.org/10.1109/ICCD.2006.4380817
Hsiao K, Chen C, Bagherzadeh N, Valero M, Ramirez A (2005). An efficient wakeup design for energy reduction in high-performance superscalar processors. Proceedings of the 2nd conference on Computing frontiers, pp. 353-360. DOI: 10.1145/1062261.1062319. Online publication date: 4-May-2005.
https://dl.acm.org/doi/10.1145/1062261.1062319
Jaleel A, Jacob B (2005). Using Virtual Load/Store Queues (VLSQs) to Reduce the Negative Effects of Reordered Memory Instructions. Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pp. 191-200. DOI: 10.1109/HPCA.2005.42. Online publication date: 12-Feb-2005.
https://dl.acm.org/doi/10.1109/HPCA.2005.42