ReCycle: pipeline adaptation to tolerate process variation

Published: 09 June 2007
DOI: 10.1145/1250662.1250703

Abstract

Process variation affects processor pipelines by making some stages slower and others faster, thereby exacerbating pipeline imbalance. This reduces the frequency attainable by the pipeline. To improve performance, this paper proposes ReCycle, an architectural framework that comprehensively applies cycle time stealing to the pipeline, transferring the time slack of the faster stages to the slower ones by skewing clock arrival times to latching elements after fabrication. As a result, the pipeline can be clocked with a period equal to the average stage delay rather than the longest one. In addition, ReCycle's frequency gains are enhanced with donor stages, which are empty stages added to "donate" slack to the slow stages. Finally, ReCycle can also convert slack into power reductions.
For a 17FO4 pipeline, ReCycle increases the frequency by 12% and the application performance by 9% on average. Combining ReCycle and donor stages delivers improvements of 36% in frequency and 15% in performance on average, completely reclaiming the performance losses due to variation.
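
The claim that the clock period can drop from the longest stage delay to the average one can be illustrated with a small sketch. It is not the paper's algorithm: it assumes an idealized single pipeline loop, ignores setup/hold times, latch overhead, and limits on how far a clock can be skewed, and it models a donor stage as a stage with zero logic delay. The stage delays (in picoseconds) and all function names are hypothetical.

```python
from itertools import accumulate


def conventional_period(delays):
    """Without time stealing, the slowest stage sets the clock period."""
    return max(delays)


def recycled_period(delays):
    """Idealized cycle time stealing: the period equals the average stage delay.

    Assumption: all stages form one loop and clock skews are unconstrained.
    """
    return sum(delays) / len(delays)


def clock_skews(delays):
    """Clock arrival offset at the latch after each stage.

    Stage i gets period + skew[i] - skew[i-1] to finish, so the smallest
    workable offsets are the running sum of (delay - period). The last
    offset is zero, which closes the loop back to the first latch.
    """
    period = recycled_period(delays)
    return list(accumulate(d - period for d in delays))


# Hypothetical post-fabrication stage delays in picoseconds.
stage_delays = [90, 120, 100, 90]

print(conventional_period(stage_delays))  # 120
print(recycled_period(stage_delays))      # 100.0
print(clock_skews(stage_delays))          # [-10.0, 10.0, 10.0, 0.0]

# Assumed ideal donor stage: an empty stage that contributes no logic delay.
print(recycled_period(stage_delays + [0]))  # 80.0
```

In this toy case the slow 120 ps stage borrows 10 ps from each of its neighbours, so the period falls from 120 ps to 100 ps. The donor stage lowers the average further, at the cost of one extra pipeline stage, which is consistent with the abstract reporting a smaller gain in performance than in frequency.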

    Published In

    ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
    June 2007
    542 pages
    ISBN: 9781595937063
    DOI: 10.1145/1250662
    General Chair: Dean Tullsen
    Program Chair: Brad Calder

    ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
    May 2007
    527 pages
    ISSN: 0163-5964
    DOI: 10.1145/1273440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clock skew
    2. pipeline
    3. process variation

    Qualifiers

    • Article

    Conference

    ISCA '07

    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%
