Article

ReCycle: pipeline adaptation to tolerate process variation

Authors:

Abhishek Tiwari,

Smruti R. Sarangi,

Josep Torrellas

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Pages 323 - 334

https://doi.org/10.1145/1250662.1250703

Published: 09 June 2007

Abstract

Process variation affects processor pipelines by making some stages slower and others faster, therefore exacerbating pipeline unbalance. This reduces the frequency attainable by the pipeline. To improve performance, this paper proposes ReCycle, an architectural framework that comprehensively applies cycle time stealing to the pipeline - transferring the time slack of the faster stages to the slow ones by skewing clock arrival times to latching elements after fabrication. As a result, the pipeline can be clocked with a period equal to the average stage delay rather than the longest one. In addition, ReCycle's frequency gains are enhanced with Donor stages, which are empty stages added to "donate" slack to the slow stages. Finally, ReCycle can also convert slack into power reductions.

For a 17FO4 pipeline, ReCycle increases the frequency by 12% and the application performance by 9% on average. Combining ReCycle and donor stages delivers improvements of 36% in frequency and 15% in performance on average, completely reclaiming the performance losses due to variation.
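The abstract's central claim, that cycle time stealing lets the clock period approach the average stage delay instead of the longest one, can be illustrated with a small Python sketch. This is an idealized model and not the paper's method: it assumes a single pipeline loop, unlimited freedom to skew clocks, and donor stages with zero logic delay. The function names and the example delays are hypothetical, chosen only for illustration.

```python
def period_without_stealing(stage_delays_ps):
    """Conventional clocking: every stage gets the same time window,
    so the slowest stage sets the clock period."""
    return max(stage_delays_ps)


def period_with_stealing(stage_delays_ps, donor_stages=0):
    """Idealized cycle time stealing around one pipeline loop.

    Assumption: slack moves freely between stages, so the period is the
    total loop delay divided by the number of stages. Each donor stage is
    modeled as an extra stage with zero logic delay (an assumption).
    """
    total_stages = len(stage_delays_ps) + donor_stages
    return sum(stage_delays_ps) / total_stages


def frequency_gain(old_period, new_period):
    """Relative frequency increase when the period shrinks."""
    return old_period / new_period - 1.0


if __name__ == "__main__":
    # Hypothetical post-fabrication stage delays in picoseconds.
    delays = [100, 120, 80, 100]

    base = period_without_stealing(delays)
    stolen = period_with_stealing(delays)
    donated = period_with_stealing(delays, donor_stages=1)

    print(f"no stealing:        {base} ps")
    print(f"with stealing:      {stolen} ps "
          f"(+{frequency_gain(base, stolen):.0%} frequency)")
    print(f"with 1 donor stage: {donated} ps "
          f"(+{frequency_gain(base, donated):.0%} frequency)")
```

A real design also has to respect constraints this sketch ignores, such as hold-time requirements and limits on how far a clock edge can be skewed. A donor stage also lengthens the loop by one cycle, which may cost some instructions per cycle; that would be consistent with the abstract reporting a larger gain in frequency than in performance.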

Index Terms

ReCycle: pipeline adaptation to tolerate process variation
1. Hardware
  1. Robustness

Information & Contributors

Information

Published In

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

June 2007

542 pages

ISBN:9781595937063

DOI:10.1145/1250662

General Chair: Dean Tullsen, University of California, San Diego

Program Chair: Brad Calder, Microsoft & University of California, San Diego

ACM SIGARCH Computer Architecture News Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SPAA07

Sponsor:

SIGARCH
IEEE-CS

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9 - 13, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
