research-article

Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Authors:

Somayeh Sardashti,

David A. Wood

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 62 - 73

https://doi.org/10.1145/2540708.2540715

Published: 07 December 2013

Abstract

In multicore processor systems, last-level caches (LLCs) play a crucial role in reducing system energy by (i) filtering out expensive accesses to main memory and (ii) reducing the time spent executing in high-power states. Cache compression can increase effective cache capacity, which reduces misses, improves performance, and potentially reduces system energy. However, previous compressed cache designs have demonstrated only limited benefits because of internal fragmentation and a limited number of tags.

In this paper, we propose the Decoupled Compressed Cache (DCC), which exploits spatial locality to improve both the performance and energy-efficiency of cache compression. DCC uses decoupled super-blocks and non-contiguous sub-block allocation to decrease tag overhead without increasing internal fragmentation. Non-contiguous sub-blocks also eliminate the need for energy-expensive re-compaction when a block's size changes. Compared to earlier compressed caches, DCC increases normalized effective capacity to a maximum of 4 and an average of 2.2 for a wide range of workloads. A further optimized Co-DCC (Co-Compacted DCC) design improves the average normalized effective capacity to 2.6 by co-compacting the compressed blocks in a super-block. Our simulations show that DCC nearly doubles the benefits of previous compressed caches with similar area overhead. We also demonstrate a practical DCC design based on a recent commercial LLC design.
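The idea of non-contiguous sub-block allocation can be illustrated with a toy model. The sketch below is not the authors' design: the class `ToyDecoupledSet`, the function `normalized_effective_capacity`, the 64-byte block size, the 16-byte sub-block size, and the capacity formula are all assumptions chosen for illustration, and tag limits and eviction are left out. It shows only why a block that grows can take any free sub-block without other blocks being moved.

```python
import math

BLOCK_BYTES = 64      # assumed uncompressed block size (illustrative)
SUB_BLOCK_BYTES = 16  # assumed sub-block granularity (illustrative)


class ToyDecoupledSet:
    """Toy model of one cache set: a pool of sub-blocks, each with an owner pointer."""

    def __init__(self, num_sub_blocks):
        # owner[i] is None when sub-block i is free,
        # otherwise (super_block_tag, block_id) of the block that uses it.
        self.owner = [None] * num_sub_blocks

    def free_count(self):
        return self.owner.count(None)

    def resident_blocks(self):
        return len({o for o in self.owner if o is not None})

    def write(self, tag, block_id, compressed_bytes):
        """Store or resize a block (compressed_bytes >= 1).

        Returns the sub-block indices now holding the block,
        or None when there is not enough free space.
        """
        key = (tag, block_id)
        needed = math.ceil(compressed_bytes / SUB_BLOCK_BYTES)
        held = [i for i, o in enumerate(self.owner) if o == key]
        if needed <= len(held):
            # The block shrank (or kept its size): free the surplus sub-blocks.
            for i in held[needed:]:
                self.owner[i] = None
            return held[:needed]
        free = [i for i, o in enumerate(self.owner) if o is None]
        extra = needed - len(held)
        if extra > len(free):
            return None  # a real cache would evict here; this toy model does not
        # The block grew: take any free sub-blocks, contiguous or not.
        # No other block is touched, so nothing has to be re-compacted.
        for i in free[:extra]:
            self.owner[i] = key
        return held + free[:extra]


def normalized_effective_capacity(resident_blocks, data_bytes):
    """Assumed metric: resident blocks relative to what fits uncompressed."""
    return resident_blocks * BLOCK_BYTES / data_bytes


if __name__ == "__main__":
    s = ToyDecoupledSet(16)        # 16 sub-blocks = 256 bytes of data storage
    print(s.write(0xA, 0, 20))     # block 0 compresses to 20 bytes
    print(s.write(0xA, 1, 16))     # block 1 compresses to 16 bytes
    print(s.write(0xA, 2, 40))     # block 2 compresses to 40 bytes
    print(s.write(0xA, 0, 50))     # block 0 grows and picks up non-adjacent sub-blocks
```

In this toy model, rounding each compressed block up to a whole number of 16-byte sub-blocks bounds the wasted space per block by one sub-block, and the per-sub-block owner pointer is what lets a block's data sit anywhere in the set.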


Index Terms

Hardware > Integrated circuits > Semiconductor memory

Published In

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

December 2013

498 pages

ISBN:9781450326384

DOI:10.1145/2540708

General Chair: Matthew Farrens (UC Davis)
Program Chair: Christos Kozyrakis (Stanford University)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MICRO-46

Sponsor:

SIGMICRO

MICRO-46: The 46th Annual IEEE/ACM International Symposium on Microarchitecture

December 7 - 11, 2013

Davis, California

Acceptance Rates

MICRO-46 Paper Acceptance Rate 39 of 239 submissions, 16%;

Overall Acceptance Rate 484 of 2,242 submissions, 22%
