DOI: 10.5555/956417.956578
Article

Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches

Published: 03 December 2003

Abstract

High-performance caches statically pull up the bitlines in all cache subarrays to optimize cache access latency. Unfortunately, such an architecture results in a significant waste of energy in nanoscale CMOS implementations due to high leakage and bitline discharge in the unaccessed subarrays. Recent research advocates bitline isolation to control precharging of individual subarrays using bitline precharge devices. In this paper, we carefully evaluate the energy and performance trade-offs of bitline isolation, and propose a technique to exploit nearly its full potential to eliminate discharge and reduce overall energy in level-one caches. Cycle-accurate and circuit simulation results of a wide-issue superscalar processor indicate that: (1) in future CMOS technologies (e.g., 70nm and beyond), cache architectures that exploit bitline isolation can eliminate up to 90% of the bitline discharge, (2) on-demand precharging (i.e., decoding the address and subsequently precharging the accessed subarrays) is not viable in level-one caches because precharging increases the cache access latency, and (3) our proposal for gated precharging, which exploits subarray reference locality and precharges only the recently accessed subarrays, eliminates nearly all of the bitline discharge in nanoscale CMOS caches with only a 1% performance degradation.




Published In

MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
December 2003
412 pages
ISBN: 076952043X

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

MICRO-36

Acceptance Rates

MICRO 36 Paper Acceptance Rate 35 of 134 submissions, 26%;
Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

  • (2008) On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance. IEEE Transactions on Computers 57:1 (7-24). DOI: 10.1109/TC.2007.70770. Online publication date: 1-Jan-2008
  • (2006) Segmented bitline cache. Proceedings of the 13th international conference on High Performance Computing (123-134). DOI: 10.1007/11945918_17. Online publication date: 18-Dec-2006
  • (2006) Using branch prediction information for near-optimal i-cache leakage. Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture (24-37). DOI: 10.1007/11859802_4. Online publication date: 6-Sep-2006
  • (2004) Single-vDD and single-vT super-drowsy techniques for low-leakage high-performance instruction caches. Proceedings of the 2004 international symposium on Low power electronics and design (54-57). DOI: 10.1145/1013235.1013254. Online publication date: 9-Aug-2004
