DLIC: Decoded loop instructions caching for energy-aware embedded processors

Published: 05 September 2013

Abstract

With the explosive proliferation of embedded systems, especially in portable devices and wireless equipment, embedded systems have become indispensable to modern society and daily life. These devices are often battery powered, so low energy consumption in embedded processors is important and becomes more critical as system complexity grows. The on-chip instruction cache (I-cache) is usually the most energy-consuming component on the processor chip because of its large size and frequent accesses. To reduce this energy consumption, existing loop cache approaches use a tiny decoded cache to filter I-cache accesses and instruction decode activity for repeated loop iterations. However, such designs are effective only for small, simple loops and are suitable only for DSP kernel-like applications; they are not effective for the many embedded applications in which complex loops are common. In this article, we propose a decoded loop instruction cache (DLIC) that is small, and hence energy efficient, yet can capture most loops, including large nested loops containing branches, so that a significant number of I-cache accesses and instruction decodes can be eliminated. Experiments on a set of embedded benchmarks show that the proposed DLIC scheme can reduce energy consumption by up to 87% compared to a conventional cache-only design. On average, 66% of the energy spent on instruction fetching and decoding is saved, at a performance overhead of only 1.4%.
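The filtering effect of a loop cache can be illustrated with a toy trace-driven model. The sketch below is not the DLIC design proposed in the article. It models only a generic, single-loop decoded buffer of the kind the abstract describes as existing work, and every name in it (`simulate`, `capacity`, the instruction-index trace) is a hypothetical choice made for illustration. It assumes that a backward control transfer marks a loop, that the first pass after detection fills the buffer, and that later passes are served from the buffer without touching the I-cache or the decoder.

```python
def simulate(trace, capacity):
    """Toy model of a single-loop decoded buffer (illustrative assumption only).

    trace:    list of instruction indices in execution order
    capacity: number of decoded instructions the buffer can hold
    Returns (icache_accesses, buffer_hits).
    """
    lo = hi = None       # inclusive bounds of the loop being tracked, if any
    decoded = set()      # instruction indices whose decoded form is buffered
    icache_accesses = 0
    buffer_hits = 0
    prev = None

    for pc in trace:
        if prev is not None and pc < prev:
            # Backward transfer: candidate loop spans [pc, prev].
            if (lo, hi) != (pc, prev):
                decoded = set()
                if prev - pc + 1 <= capacity:
                    lo, hi = pc, prev    # start tracking the new loop
                else:
                    lo = hi = None       # loop too large for the buffer
        elif lo is not None and not (lo <= pc <= hi):
            # Execution left the tracked loop: stop tracking and flush.
            lo = hi = None
            decoded = set()

        if lo is not None and pc in decoded:
            buffer_hits += 1             # served from the decoded buffer
        else:
            icache_accesses += 1         # normal I-cache fetch and decode
            if lo is not None:
                decoded.add(pc)          # fill the buffer while tracking

        prev = pc

    return icache_accesses, buffer_hits


# A four-instruction loop (indices 10..13) executed five times.
example_trace = [8, 9] + [10, 11, 12, 13] * 5 + [14]
print(simulate(example_trace, capacity=8))
print(simulate(example_trace, capacity=2))
```

In this model a nested inner loop would repeatedly evict the outer loop's entries, and a loop larger than `capacity` is never buffered, which mirrors the limitation of simple loop caches that the abstract points out. How DLIC itself handles nested loops and branches is described in the article, not in this sketch.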

Published In

ACM Transactions on Embedded Computing Systems, Volume 13, Issue 1
August 2013
332 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2501626
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 September 2013
Accepted: 01 May 2012
Revised: 01 July 2011
Received: 01 November 2010
Published in TECS Volume 13, Issue 1

Author Tags

  1. Cache hierarchy
  2. embedded systems
  3. filter cache
  4. instruction decode
  5. low energy
  6. low power

Qualifiers

  • Research-article
  • Research
  • Refereed
