

A self-tuning cache architecture for embedded systems

Published: 01 May 2004

Abstract

Memory accesses often account for about half of a microprocessor system's power consumption. Customizing a microprocessor cache's total size, line size, and associativity to a particular program is well known to have tremendous benefits for performance and power. Customizing caches has until recently been restricted to core-based flows, in which a new chip will be fabricated. However, several configurable cache architectures have been proposed recently for use in prefabricated microprocessor platforms. Tuning those caches to a program is still, however, a cumbersome task left for designers, assisted in part by recent computer-aided design (CAD) tuning aids. We propose to move that CAD on-chip, which can greatly increase the acceptance of tunable caches. We introduce on-chip hardware implementing an efficient cache tuning heuristic that can automatically, transparently, and dynamically tune the cache to an executing program. Our heuristic not only reduces the number of configurations that must be examined, but also traverses the search space in a way that minimizes costly cache flushes. By simulating numerous Powerstone and MediaBench benchmarks, we show that such a dynamic self-tuning cache saves on average 40% of total memory access energy over a standard nontuned reference cache.
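The abstract describes the tuning heuristic only at a high level: examine few configurations, and order the search to avoid costly flushes. As a rough illustration only (not the authors' exact algorithm), a one-parameter-at-a-time greedy search over cache configurations might look like the following sketch, where the `energy` callback stands in for the on-chip energy evaluation of one configuration:

```python
# Illustrative sketch of greedy cache tuning (an assumption for exposition,
# not the paper's exact heuristic): tune one parameter at a time, keeping
# each setting that lowers a measured energy estimate.

def tune_cache(energy,
               sizes=(2048, 4096, 8192),
               line_sizes=(16, 32, 64),
               assocs=(1, 2, 4)):
    """Return ((size, line_size, assoc), energy) of the best configuration
    found by a one-parameter-at-a-time greedy search."""
    best = [sizes[0], line_sizes[0], assocs[0]]
    best_e = energy(*best)
    # Explore parameters in a fixed order (here: total size first, then
    # line size, then associativity), keeping any improvement found.
    for idx, options in enumerate((sizes, line_sizes, assocs)):
        for value in options:
            cand = list(best)
            cand[idx] = value
            e = energy(*cand)
            if e < best_e:
                best, best_e = cand, e
    return tuple(best), best_e
```

With a toy energy function whose minimum is at (4096, 32, 2), the search examines only about nine configurations instead of all twenty-seven, which is the kind of reduction the heuristic aims for.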




    Reviews

    Mehran Rezaei

    This paper discusses tunable cache organization, particularly for embedded systems. Its motivation is that 60 percent of the energy in embedded systems is spent in the cache. To reduce such energy consumption, currently one of the main trends in embedded systems research, the paper presents a self-tunable cache that achieves a better cache miss rate based on the type of application. A lower cache miss rate translates into less energy consumed, since fewer off-chip accesses are needed. The authors use a reconfigurable cache organization with three changeable parameters: cache size, associativity, and block size. Way prediction is also supported in their design. They have designed a finite state machine (FSM) that heuristically finds the best configuration for the running application; based on the FSM result, the cache can be tuned to perform best in terms of energy consumption. The simulation results show that this approach saves 40 percent of total memory access energy compared with a conventional cache.

    The paper is well written, with a lot of information for readers. The authors state reasonable energy equations, which can be used for further designs along the same lines, and a good average reduction in memory system energy is reported. The authors, however, do not show the degradation in execution performance caused by their approach; in fact, they present no execution time data at all, even though the SimpleScalar tool set used in their experiments is fully equipped to report cycle execution times. There are also simpler techniques that improve cache behavior; I wonder how much benefit this paper's idea would yield on top of, for example, stream buffers, which behave very well in multimedia applications.

    Online Computing Reviews Service
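The review alludes to the paper's energy equations without reproducing them. A minimal sketch of such a memory-energy model (the function name, coefficients, and terms below are assumptions for illustration, not the paper's actual equations) makes the review's point concrete: a lower miss rate means fewer off-chip accesses and hence less energy.

```python
# Illustrative memory-energy model (assumed, not the paper's equations):
# total energy = per-access cache energy plus a per-miss off-chip penalty.

def memory_energy(accesses, miss_rate,
                  e_hit=0.5e-9,      # assumed energy per cache access (J)
                  e_miss=10.0e-9):   # assumed energy per off-chip miss (J)
    """Estimate total memory access energy in joules."""
    misses = accesses * miss_rate
    return accesses * e_hit + misses * e_miss
```

Because the off-chip penalty dominates, halving the miss rate under these assumed coefficients cuts a large fraction of total memory energy, which is the mechanism behind the paper's reported savings.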



    Published In

    cover image ACM Transactions on Embedded Computing Systems
    ACM Transactions on Embedded Computing Systems  Volume 3, Issue 2
    May 2004
    225 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/993396

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Publication History

    Published: 01 May 2004
    Published in TECS Volume 3, Issue 2


    Author Tags

    1. Cache
    2. architecture tuning
    3. configurable
    4. dynamic optimization
    5. embedded systems
    6. low energy
    7. low power
    8. on-chip CAD

    Qualifiers

    • Article


    Cited By

    • (2022) Evaluating a Machine Learning-based Approach for Cache Configuration. 2022 IEEE 13th Latin America Symposium on Circuits and Systems (LASCAS), 1-4. DOI: 10.1109/LASCAS53948.2022.9789040. Online publication date: 1-Mar-2022.
    • (2020) A Machine Learning Methodology for Cache Memory Design Based on Dynamic Instructions. ACM Transactions on Embedded Computing Systems 19, 2, 1-20. DOI: 10.1145/3376920. Online publication date: 11-Mar-2020.
    • (2019) Efficient Cache Reconfiguration Using Machine Learning in NoC-Based Many-Core CMPs. ACM Transactions on Design Automation of Electronic Systems 24, 6, 1-23. DOI: 10.1145/3350422. Online publication date: 9-Sep-2019.
    • (2018) Access Adaptive and Thread-Aware Cache Partitioning in Multicore Systems. Electronics 7, 9, 172. DOI: 10.3390/electronics7090172. Online publication date: 1-Sep-2018.
    • (2018) Realizing Closed-Loop, Online Tuning and Control for Configurable-Cache Embedded Systems: Progress and Challenges. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 719-725. DOI: 10.1109/ISVLSI.2018.00136. Online publication date: Jul-2018.
    • (2018) Near Threshold Last Level Cache for Energy Efficient Embedded Applications. 2018 Ninth International Green and Sustainable Computing Conference (IGSC), 1-6. DOI: 10.1109/IGCC.2018.8752134. Online publication date: Oct-2018.
    • (2018) Runtime Adaptive Cache for the LEON3 Processor. Applied Reconfigurable Computing: Architectures, Tools, and Applications, 343-354. DOI: 10.1007/978-3-319-78890-6_28. Online publication date: 8-Apr-2018.
    • (2017) Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores. Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 37-42. DOI: 10.5555/3130379.3130388. Online publication date: 27-Mar-2017.
    • (2017) Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 37-42. DOI: 10.23919/DATE.2017.7926955. Online publication date: Mar-2017.
    • (2017) System performance optimization via design and configuration space exploration. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 1046-1049. DOI: 10.1145/3106237.3119880. Online publication date: 21-Aug-2017.
