
DOI: 10.5555/2190025.2190051

Link-time optimization for power efficiency in a tagless instruction cache

Published: 02 April 2011

Abstract

The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access because the tags and data banks of all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reducing this overhead eliminate unnecessary accesses to the data banks, or to the ways that are unlikely to hit. However, the tag banks still need to be checked. This paper considers a new hybrid hardware and linker-assisted approach to tagless instruction caching. Our novel cache architecture, supported by the compilation toolchain, removes the need for tag checks entirely for the majority of cache accesses. The linker places frequently executed instructions in specific program regions that are then mapped into the cache without the need for tag checks. The scheme requires only minor hardware modifications and no ISA changes, and it works across cache configurations. Our approach keeps the software and hardware independent, resulting in both backward and forward compatibility. Our evaluation on a superscalar processor with and without SMT support shows power savings of 66% within the instruction cache with no loss of performance. This translates to a 49% saving when considering the combined power of the instruction cache and the translation lookaside buffer (TLB), which is involved in managing our tagless scheme.
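The abstract does not give the placement algorithm or any cache parameters. As a hypothetical illustration only, the Python sketch below models the linker-side idea under assumed parameters: profiled functions are greedily packed, highest execution count per byte first, into a contiguous, line-aligned region no larger than one cache way. In this toy model every line of that region maps to a distinct cache set, which is the kind of fixed mapping that could let hardware locate hot instructions without comparing tags. The geometry (`LINE_SIZE`, `NUM_SETS`), the `Function` records, the greedy heuristic and all function names are assumptions, not the authors' design, and the sketch ignores how the hardware tracks which hot lines are resident (a task the abstract says involves the TLB).

```python
from dataclasses import dataclass

# Hypothetical cache geometry (assumed for illustration, not from the paper).
LINE_SIZE = 32                   # bytes per cache line
NUM_SETS = 256                   # sets per way
WAY_SIZE = LINE_SIZE * NUM_SETS  # bytes covered by one way (8 KiB here)


@dataclass
class Function:
    name: str
    size: int        # code size in bytes
    exec_count: int  # profiled execution count


def place_hot_functions(funcs, region_base, capacity=WAY_SIZE):
    """Assumed heuristic: greedily pack the functions with the highest
    execution count per byte into a contiguous region of at most
    `capacity` bytes starting at `region_base`.
    Returns (name -> start address, bytes used)."""
    layout = {}
    used = 0
    for f in sorted(funcs, key=lambda f: f.exec_count / f.size, reverse=True):
        if used + f.size <= capacity:
            layout[f.name] = region_base + used
            used += f.size
    return layout, used


def set_index(addr):
    """Cache set selected by an address under the assumed geometry."""
    return (addr // LINE_SIZE) % NUM_SETS


def in_hot_region(addr, region_base, used):
    """Fetches in this range are the (hypothetical) candidates for
    skipping the tag check."""
    return region_base <= addr < region_base + used


funcs = [
    Function("main_loop", 4096, 1_000_000),
    Function("init", 6000, 1),
    Function("hash", 2048, 400_000),
    Function("log", 3000, 10),
]
base = 0x10000  # line-aligned (and way-aligned) start of the hot region
layout, used = place_hot_functions(funcs, base)

# The hot region is no larger than one way, so each of its lines maps to a
# different set and no two hot lines can conflict with each other.
hot_lines = range(base, base + used, LINE_SIZE)
assert len({set_index(a) for a in hot_lines}) == len(hot_lines)
print(layout, used)  # prints {'main_loop': 65536, 'hash': 69632} 6144
```

Packing by execution count per byte is only one simple choice; a real link-time tool could work at basic-block or trace granularity instead.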

Cited By

  • (2015) Instruction-Cache Locking for Improving Embedded Systems Performance. ACM Transactions on Embedded Computing Systems 14:3 (1-25). DOI: 10.1145/2700100. Online publication date: 21-Apr-2015

Published In

CGO '11: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
April 2011
324 pages
ISBN:9781612843568

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Acceptance Rates

CGO '11 paper acceptance rate: 28 of 105 submissions (27%)
Overall acceptance rate: 312 of 1,061 submissions (29%)

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024
