DOI: 10.1145/2540708.2540715
research-article

Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Published: 07 December 2013

Abstract

In multicore processor systems, last-level caches (LLCs) play a crucial role in reducing system energy by i) filtering out expensive accesses to main memory and ii) reducing the time spent executing in high-power states. Cache compression can increase effective cache capacity and reduce misses, improve performance, and potentially reduce system energy. However, previous compressed cache designs have demonstrated only limited benefits due to internal fragmentation and limited tags.
In this paper, we propose the Decoupled Compressed Cache (DCC), which exploits spatial locality to improve both the performance and energy-efficiency of cache compression. DCC uses decoupled super-blocks and non-contiguous sub-block allocation to decrease tag overhead without increasing internal fragmentation. Non-contiguous sub-blocks also eliminate the need for energy-expensive re-compaction when a block's size changes. Compared to earlier compressed caches, DCC increases normalized effective capacity to a maximum of 4 and an average of 2.2 for a wide range of workloads. A further optimized Co-DCC (Co-Compacted DCC) design improves the average normalized effective capacity to 2.6 by co-compacting the compressed blocks in a super-block. Our simulations show that DCC nearly doubles the benefits of previous compressed caches with similar area overhead. We also demonstrate a practical DCC design based on a recent commercial LLC design.
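
The two limits named in the abstract, internal fragmentation and tag count, can be illustrated with a toy capacity model. The sketch below is not DCC's mechanism or the authors' evaluation method. The 64-byte line size, the allocation granule, the tags-per-line ratio, the uniform 24-byte compressed size, and all function names are assumptions chosen for illustration.

```python
LINE_BYTES = 64  # assumed uncompressed cache-line size (not from the abstract)


def blocks_that_fit(compressed_sizes, data_bytes, tag_count, granule_bytes):
    """Count how many blocks fit, in order, under a data budget and a tag budget.

    Each block occupies its compressed size rounded up to a multiple of
    granule_bytes; the rounding loss models internal fragmentation.
    """
    used = 0
    stored = 0
    for size in compressed_sizes:
        # Round the compressed size up to whole allocation granules.
        footprint = ((size + granule_bytes - 1) // granule_bytes) * granule_bytes
        if stored == tag_count or used + footprint > data_bytes:
            break  # out of tags or out of data space
        used += footprint
        stored += 1
    return stored


def normalized_effective_capacity(compressed_sizes, baseline_lines,
                                  tags_per_line, granule_bytes):
    """Blocks held, divided by the blocks an uncompressed cache of equal data size holds."""
    data_bytes = baseline_lines * LINE_BYTES
    tag_count = baseline_lines * tags_per_line
    stored = blocks_that_fit(compressed_sizes, data_bytes, tag_count, granule_bytes)
    return stored / baseline_lines


# Hypothetical workload: every block compresses to 24 bytes.
sizes = [24] * 100

coarse = normalized_effective_capacity(sizes, baseline_lines=8, tags_per_line=2, granule_bytes=32)
few_tags = normalized_effective_capacity(sizes, baseline_lines=8, tags_per_line=2, granule_bytes=8)
fine = normalized_effective_capacity(sizes, baseline_lines=8, tags_per_line=4, granule_bytes=8)
print(coarse, few_tags, fine)
```

In this toy setup, coarse 32-byte allocation yields a normalized capacity of 2.0 because each 24-byte block wastes 8 bytes. Switching to 8-byte granules with the same tag budget still yields 2.0, because the tags run out first. Only with both finer allocation and more tags does the value rise to 2.625. The numbers depend entirely on the assumed inputs and say nothing about the paper's measured results.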




    Published In

    MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
    December 2013
    498 pages
    ISBN:9781450326384
    DOI:10.1145/2540708
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache design
    2. compression
    3. energy efficiency
    4. multicore

    Qualifiers

    • Research-article

    Conference

    MICRO-46

    Acceptance Rates

    MICRO-46 Paper Acceptance Rate 39 of 239 submissions, 16%;
    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 25
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 10 Nov 2024

    Cited By

    • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024
    • (2023) Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 137-151. DOI: 10.1109/HPCA56546.2023.10071115. Online publication date: Feb-2023
    • (2022) täkō. Proceedings of the 49th Annual International Symposium on Computer Architecture, 42-58. DOI: 10.1145/3470496.3527379. Online publication date: 18-Jun-2022
    • (2021) Cache Compression with Efficient in-SRAM Data Comparison. 2021 IEEE International Conference on Networking, Architecture and Storage (NAS), 1-8. DOI: 10.1109/NAS51552.2021.9605440. Online publication date: Oct-2021
    • (2021) Reinforcement Learning based Data Compression for Energy-Efficient Non-volatile Caches. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 662-668. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00110. Online publication date: Dec-2021
    • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1125-1140. DOI: 10.1145/3373376.3378453. Online publication date: 9-Mar-2020
    • (2020) An Approximate Memory Architecture for Energy Saving in Deep Learning Applications. IEEE Transactions on Circuits and Systems I: Regular Papers 67:5, 1588-1601. DOI: 10.1109/TCSI.2019.2962516. Online publication date: May-2020
    • (2020) ZeroSpy: Exploring Software Inefficiency with Redundant Zeros. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. DOI: 10.1109/SC41405.2020.00033. Online publication date: Nov-2020
    • (2020) Improving the Utilization of Micro-operation Caches in x86 Processors. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 160-172. DOI: 10.1109/MICRO50266.2020.00025. Online publication date: Oct-2020
    • (2019) Cache Replacement Policies. Synthesis Lectures on Computer Architecture 14:1, 1-87. DOI: 10.2200/S00922ED1V01Y201905CAC047. Online publication date: 17-Jun-2019
