DOI: 10.1145/339647.339689

Circuits for wide-window superscalar processors

Published: 01 May 2000

Abstract

Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today's technology can achieve an increase of 10-60% (geometric mean of 31%) in program speed compared to today's processors. The processor operates at clock speeds comparable to today's processors, but achieves significantly higher ILP.
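The geometric mean quoted above is presumably taken over per-benchmark speedup ratios. A minimal sketch of that calculation follows; the function name and the sample ratios are hypothetical and are not the paper's data.

```python
import math
from typing import Sequence


def geometric_mean_speedup(ratios: Sequence[float]) -> float:
    """Geometric mean of speedup ratios (new speed / old speed).

    A ratio of 1.10 means a 10% increase in program speed.
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need at least one positive ratio")
    # Average in the log domain so no single benchmark dominates the mean.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios spanning the 10-60% range in the abstract.
sample_ratios = [1.10, 1.25, 1.40, 1.60]
mean_increase_percent = (geometric_mean_speedup(sample_ratios) - 1.0) * 100.0
```

The geometric mean is the usual choice for averaging ratios because it gives the same answer whichever processor is treated as the baseline.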
To measure the impact of a large window on clock speed, we design and simulate new implementations of the logic components that most limit the critical path of our large-window processor: the schedule logic and the wake-up logic. We use log-depth cyclic segmented prefix (CSP) circuits to reimplement these components. Our layouts and simulations of critical paths through these circuits indicate that our large-window processor could be clocked at frequencies exceeding 500 MHz in today's technology. Our commit logic and rename logic can also run at these speeds.
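The abstract names cyclic segmented prefix (CSP) circuits without defining them. As a rough software sketch, assume the common segmented-scan formulation: the inputs sit on a ring, each position carries a segment bit, and each output combines the inputs from the nearest segment start at or before that position (wrapping around) up to the position itself. The function names, the inclusive convention, and the requirement that at least one segment bit is set are assumptions, not details from the paper. The log-depth version uses recursive doubling, which is one way to reach logarithmic depth and not necessarily the circuit topology the authors lay out.

```python
import operator
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def csp_reference(values: Sequence[T], seg: Sequence[int],
                  op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Sequential model: walk backwards around the ring to the segment start."""
    n = len(values)
    assert n > 0 and any(seg), "sketch assumes at least one segment bit is set"
    out = []
    for i in range(n):
        j, acc = i, values[i]
        while not seg[j]:
            j = (j - 1) % n              # step to the cyclically preceding slot
            acc = op(values[j], acc)     # older value goes on the left
        out.append(acc)
    return out


def _combine(left: Tuple[T, bool], right: Tuple[T, bool],
             op: Callable[[T, T], T]) -> Tuple[T, bool]:
    """Associative segmented operator on (value, segment-started) pairs."""
    if right[1]:
        return right                     # a segment start blocks older values
    return (op(left[0], right[0]), left[1])


def csp_log_depth(values: Sequence[T], seg: Sequence[int],
                  op: Callable[[T, T], T] = operator.add) -> List[T]:
    """Recursive-doubling model: ceil(log2(n)) rounds of pairwise combines."""
    n = len(values)
    assert n > 0 and any(seg), "sketch assumes at least one segment bit is set"
    state = [(v, bool(s)) for v, s in zip(values, seg)]
    d = 1
    while d < n:
        # Each round doubles the span of ring positions folded into every slot.
        state = [_combine(state[(i - d) % n], state[i], op) for i in range(n)]
        d *= 2
    return [v for v, _ in state]
```

With the values 1 through 8, addition, and segment bits at positions 2 and 5, both functions return `[22, 24, 3, 7, 12, 6, 13, 21]`. Each round of the doubling loop corresponds to one level of combining logic, which is where the logarithmic depth comes from.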
To measure the impact of a large window on ILP, we compare two microarchitectures. The first has a 128-instruction window, an 8-wide fetch unit, and 20-wide issue (four each of integer, branch, multiply, float, and memory units). The second, which is comparable to today's processors, has a 32-instruction window and a 4-wide fetch unit. For each, we simulate different window reuse and bypass policies. Our simulations show that the large-window processor achieves significantly higher IPC. This performance increase comes despite the fact that the large-window processor uses a wrap-around window while the small-window processor uses a compressing window, which effectively increases the small-window processor's number of outstanding instructions. Furthermore, the large-window processor sometimes pays an extra clock cycle for bypassing.
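The abstract does not spell out the two window policies, so the following is only a toy model under stated assumptions. It assumes a wrap-around window reuses a slot only once every older instruction has also left the window, and a compressing window reclaims any vacated slot by shifting younger entries down. The function names are hypothetical.

```python
from typing import Sequence


def free_slots_wraparound(finished: Sequence[bool]) -> int:
    """Slots reusable in a wrap-around (circular) window.

    `finished` lists the window entries oldest first. Assumption: only the
    contiguous run of finished entries at the old end can be handed back.
    """
    count = 0
    for done in finished:
        if not done:
            break
        count += 1
    return count


def free_slots_compressing(finished: Sequence[bool]) -> int:
    """Slots reusable in a compressing window.

    Assumption: every finished entry is squeezed out, wherever it sits.
    """
    return sum(1 for done in finished if done)


# One unfinished instruction near the old end blocks the wrap-around window.
window_state = [True, False, True, True]
wrap_free = free_slots_wraparound(window_state)      # 1 slot
compress_free = free_slots_compressing(window_state)  # 3 slots
```

Under these assumptions the compressing window always frees at least as many slots as the wrap-around one, which matches the abstract's remark that compression effectively raises the number of outstanding instructions.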

      Published In

      ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture
      June 2000
      327 pages
      ISBN: 1581132328
      DOI: 10.1145/339647

      ACM SIGARCH Computer Architecture News, Volume 28, Issue 2
      Special Issue: Proceedings of the 27th annual international symposium on Computer architecture (ISCA '00)
      May 2000
      325 pages
      ISSN: 0163-5964
      DOI: 10.1145/342001
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate 543 of 3,203 submissions, 17%
