
research-article
Open access

Yet Another Compressed Cache: A Low-Cost Yet Effective Compressed Cache

Published: 17 September 2016

Abstract

Cache memories play a critical role in bridging the latency, bandwidth, and energy gaps between cores and off-chip memory. However, caches frequently consume a significant fraction of a multicore chip's area and thus account for a significant fraction of its cost. Compression has the potential to improve the effective capacity of a cache, providing the performance and energy benefits of a larger cache while using less area. The design of a compressed cache must address two important issues: (i) a low-latency, low-overhead compression algorithm that can represent a fixed-size cache block using fewer bits and (ii) a cache organization that can efficiently store the resulting variable-size compressed blocks. This article focuses on the latter issue.
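To make issue (ii) concrete, here is a minimal Python sketch of why the organization of variable-size compressed blocks matters. It compares how many bytes are reserved when compressed blocks are stored in whole 64-byte slots, in 16-byte segments, or byte-exactly. The 64-byte block, the 16-byte segment, the sample compressed sizes, and the function names are assumptions chosen for illustration, not parameters or results from the article.

```python
import math

BLOCK_BYTES = 64  # assumed uncompressed cache-block size


def allocated_bytes(compressed_size: int, granularity: int) -> int:
    """Bytes reserved for one block when space is handed out in `granularity`-byte units.
    Blocks that do not compress are stored at their full BLOCK_BYTES size."""
    size = min(compressed_size, BLOCK_BYTES)
    return math.ceil(size / granularity) * granularity


def fragmentation(sizes: list[int], granularity: int) -> float:
    """Fraction of reserved bytes that hold no compressed data (internal fragmentation)."""
    reserved = sum(allocated_bytes(s, granularity) for s in sizes)
    used = sum(min(s, BLOCK_BYTES) for s in sizes)
    return (reserved - used) / reserved


# Hypothetical compressed sizes (in bytes) of eight cache blocks.
sizes = [8, 20, 33, 64, 12, 40, 5, 27]
for g in (64, 16, 1):
    reserved = sum(allocated_bytes(s, g) for s in sizes)
    print(f"granularity {g:2d} B: {reserved} B reserved, {fragmentation(sizes, g):.1%} wasted")
```

With these sample sizes, whole-block slots reserve 512 bytes for 209 bytes of compressed data (about 59% wasted), while 16-byte segments reserve 272 bytes (about 23% wasted). Finer granularity reduces waste but typically requires more per-block metadata to locate each block, which is the trade-off a compressed cache organization has to manage.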
Here, we propose Yet Another Compressed Cache (YACC), a new compressed cache design that aims to improve effective cache capacity while remaining simple. YACC uses super-blocks to reduce tag overhead while packing variable-size compressed blocks to reduce internal fragmentation. YACC achieves the benefits of two state-of-the-art compressed caches, Decoupled Compressed Cache (DCC) [Sardashti and Wood 2013a, 2013b] and Skewed Compressed Cache (SCC) [Sardashti et al. 2014], with a more practical and simpler design. YACC's cache layout is similar to that of conventional caches, with a largely unmodified tag array and an unmodified data array. Compared to DCC and SCC, YACC requires neither the significant extra metadata (i.e., back pointers) that DCC needs to track blocks nor the complexity and overhead of skewed associativity (i.e., indexing ways differently) that SCC needs. An additional advantage over previous work is that YACC enables modern replacement mechanisms, such as RRIP [Jaleel et al. 2010].
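This abstract does not give YACC's exact tag and data-entry formats, so the following Python sketch is only a simplified, hypothetical model of the two ideas named in the paragraph above: one tag shared by a super-block of adjacent blocks, and several compressed blocks of that super-block packed into a single 64-byte data entry. The four-block super-block, the 16-byte segment, the 48-bit address width used in the note below, and all class and function names are assumptions for illustration, not the article's design.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 64        # assumed cache-block size
BLOCKS_PER_SB = 4       # hypothetical: 4 adjacent blocks form one super-block
SEGMENT_BYTES = 16      # hypothetical allocation unit inside a data entry
SEGMENTS_PER_ENTRY = BLOCK_BYTES // SEGMENT_BYTES  # 4 segments per 64-byte entry


def split_address(addr: int, num_sets: int) -> tuple[int, int, int]:
    """Split a byte address into (super-block tag, set index, block index in super-block)."""
    block_number = addr // BLOCK_BYTES
    sb_number = block_number // BLOCKS_PER_SB
    return sb_number // num_sets, sb_number % num_sets, block_number % BLOCKS_PER_SB


def tag_bits(addr_bits: int, num_sets: int, bytes_per_tag: int) -> int:
    """Tag width when one tag covers `bytes_per_tag` contiguous bytes (powers of two)."""
    return addr_bits - (num_sets.bit_length() - 1) - (bytes_per_tag.bit_length() - 1)


@dataclass
class SuperBlockEntry:
    """One super-block tag plus one 64-byte data entry shared by its blocks."""
    tag: int
    # Segments occupied by each block of the super-block (0 = not resident).
    segs: list[int] = field(default_factory=lambda: [0] * BLOCKS_PER_SB)

    def free_segments(self) -> int:
        return SEGMENTS_PER_ENTRY - sum(self.segs)

    def try_insert(self, block_idx: int, compressed_bytes: int) -> bool:
        """Pack a compressed block into this entry if its segments fit; else report failure."""
        size = min(max(compressed_bytes, 1), BLOCK_BYTES)   # clamp to [1, 64] bytes
        need = -(-size // SEGMENT_BYTES)                    # ceiling division
        have = self.free_segments() + self.segs[block_idx]  # a rewrite may reuse its old space
        if need > have:
            return False  # a real design would have to evict or place the block elsewhere
        self.segs[block_idx] = need
        return True


entry = SuperBlockEntry(tag=split_address(0x12345640, 8192)[0])
print(entry.try_insert(0, 20))  # 20 B -> 2 segments: True
print(entry.try_insert(1, 16))  # 16 B -> 1 segment: True
print(entry.try_insert(2, 30))  # needs 2 segments, 1 free: False
```

With 8192 sets (e.g., an assumed 8 MB, 16-way cache with 64-byte blocks) and 48-bit addresses, `tag_bits(48, 8192, 256)` gives a 27-bit tag that can cover up to four blocks, whereas `tag_bits(48, 8192, 64)` gives the 29-bit tag a conventional cache spends on every single block. That sharing is the sense in which super-blocks reduce tag overhead.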
For our benchmark set, compared to a conventional uncompressed 8MB LLC, YACC improves performance by 8% on average and up to 26%, and reduces total energy by 6% on average and up to 20%. An 8MB YACC achieves approximately the same performance and energy improvements as a 16MB conventional cache at a much smaller silicon footprint, with only 1.6% greater area than an 8MB conventional cache. YACC performs comparably to DCC and SCC but is much simpler to implement.
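The abstract credits YACC's conventional tag layout with enabling modern replacement mechanisms such as RRIP [Jaleel et al. 2010], but it does not say how RRIP is applied to super-block tags. For reference, the following is a generic Python sketch of one RRIP variant, Static RRIP with hit priority, for a single cache set using 2-bit re-reference prediction values (RRPVs). The class and method names are hypothetical, and keeping one RRPV per super-block tag in a design like YACC is an assumption, not a detail taken from this article.

```python
class SRRIPSet:
    """One cache set managed by Static RRIP with hit priority (generic sketch)."""

    def __init__(self, ways: int, rrpv_bits: int = 2):
        self.max_rrpv = (1 << rrpv_bits) - 1  # 3 for 2-bit RRPVs: "distant" re-reference
        self.tags = [None] * ways             # None marks an invalid way
        self.rrpv = [self.max_rrpv] * ways

    def access(self, tag) -> bool:
        """Look up `tag`; return True on a hit, otherwise fill it and return False."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # hit: predict near-immediate reuse
            return True
        way = self._victim()
        self.tags[way] = tag
        self.rrpv[way] = self.max_rrpv - 1       # insert with a "long" re-reference prediction
        return False

    def _victim(self) -> int:
        """Pick an invalid way if any; else the first way predicted 'distant'."""
        if None in self.tags:
            return self.tags.index(None)
        while True:
            for way, value in enumerate(self.rrpv):
                if value == self.max_rrpv:
                    return way
            self.rrpv = [value + 1 for value in self.rrpv]  # age all ways, then retry


s = SRRIPSet(ways=4)
print([s.access(t) for t in "ABCD"])  # four misses fill the set
print(s.access("A"))                  # hit: A's RRPV drops to 0
print(s.access("E"))                  # miss: after aging, B is the first 'distant' way
print(s.tags)
```

Inserting new blocks one step short of "distant" lets a block that is reused before the next eviction be promoted, while blocks that are never reused age out first; this is the scan-resistance property that motivates RRIP over LRU.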

References

[1]
Bulent Abali, Hubertus Franke, Xiaowei Shen, Dan E. Poff, and T. Basil Smith. 2001. Performance of hardware compressed main memory. In Proceedings of the 7th IEEE Symposium on High-Performance Computer Architecture.
[2]
Alaa R. Alameldeen, Milo M. K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Mark D. Hill, David A. Wood, and Daniel J. Sorin. 2003. Simulating a $2M commercial server on a $2K PC. IEEE Computer 36, 2, 50--57.
[3]
A. Alameldeen and D. Wood. 2003. Variability in architectural simulations of multi-threaded workloads. In Proceedings of the 9th IEEE Symposium on High-Performance Computer Architecture.
[4]
A. Alameldeen and D. Wood. 2004. Adaptive cache compression for high-performance processors. In Proceedings of the 31st Annual International Symposium on Computer Architecture.
[5]
Angelos Arelakis and Per Stenstrom. 2014. SC2: A statistical compression cache scheme. In Proceedings of the 41st Annual International Symposium on Computer Architecture.
[6]
V. Aslot, M. Domeika, R. Eigenmann, G. Gaertner, W. B. Jones, and B. Parady. 2001. SPEComp: A new benchmark suite for measuring parallel computer performance. In Proceedings of the Workshop on OpenMP Applications and Tools.
[7]
Seungcheol Baek, Hyung Gyu Lee, Chrysostomos Nicopoulos, Junghee Lee, and Jongman Kim. 2013. ECM: Effective capacity maximizer for high-performance compressed caching. In Proceedings of the IEEE Symposium on High-Performance Computer Architecture.
[8]
Christian Bienia and Kai Li. 2009. PARSEC 2.0: A new benchmark suite for chip-multiprocessors. In Proceedings of the Workshop on Modeling, Benchmarking, and Simulation.
[9]
CACTI. 2008. Home Page. Retrieved August 16, 2016, from http://www.hpl.hp.com/research/cacti/.
[10]
Xi Chen, Lei Yang, Robert P. Dick, Li Shang, and Haris Lekatsas. 2010. C-pack: A high-performance microprocessor cache compression algorithm. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18, 8, 1196--1208.
[11]
Julien Dusser, Thomas Piquet, and André Seznec. 2009. Zero-content augmented caches. In Proceedings of the 23rd International Conference on Supercomputing.
[12]
Julien Dusser and André Seznec. 2011. Decoupled zero-compressed memory. In Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers.
[13]
M. Ekman and P. Stenstrom. 2005. A robust main-memory compression scheme. ACM SIGARCH Computer Architecture News 33, 2, 74--85.
[14]
E. Hallnor and S. Reinhardt. 2005. A unified compressed memory hierarchy. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture.
[15]
David A. Huffman. 1952. A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40, 9, 1098--1101.
[16]
Intel. 2016. 6th Generation Intel Core i7 Processors. Retrieved August 16, 2016, from http://www.intel.com/products/processor/corei7/.
[17]
Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr., and Joel Emer. 2010. High performance cache replacement using re-reference interval prediction (RRIP). In Proceedings of the 37th International Symposium on Computer Architecture (ISCA’10).
[18]
Nam Sung Kim, Todd Austin, and Trevor Mudge. 2002. Low-energy data cache using sign compression and cache line bisection. In Proceedings of the 2nd Annual Workshop on Memory Performance Issues.
[19]
Soontae Kim, Jesung Kim, Jongmin Lee, and Seokin Hong. 2011. Residue Cache: A low-energy low-area L2 cache architecture via compression and partial hits. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.
[20]
Jang-Soo Lee, Won-Kee Hong, and Shin-Dug Kim. 2000. An on-chip cache compression technique to reduce decompression overhead and design complexity. Journal of Systems Architecture 46, 15, 1365--1382.
[21]
M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. 2005. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. ACM SIGARCH Computer Architecture News 33, 4, 92--99.
[22]
Micron. 2007. Calculating Memory System Power for DDR3. Technical Report TN-41-01. Micron Technology, Boise, ID.
[23]
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2013. Linearly compressed pages: A low-complexity, low-latency main memory compression framework. In Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture.
[24]
Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2012. Base-delta-immediate compression: Practical data compression for on-chip caches. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT’12). ACM, New York, NY, 377--388.
[25]
Jeffrey Rothman and Alan Smith. 1999. The pool of subsectors cache design. In Proceedings of the International Conference on Supercomputing.
[26]
Daniel Sanchez and Christos Kozyrakis. 2010. The ZCache: Decoupling ways and associativity. In Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[27]
S. Sardashti, A. Seznec, and D. Wood. 2014. Skewed Compressed Caches. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47).
[28]
Somayeh Sardashti and David A. Wood. 2013a. Decoupled Compressed Cache: Exploiting spatial locality for energy optimization. In IEEE Micro Top Picks from the 2013 Computer Architecture Conferences.
[29]
Somayeh Sardashti and David A. Wood. 2013b. Decoupled Compressed Cache: Exploiting spatial locality for energy-optimized compressed caching. In Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture.
[30]
A. Seznec. 1993. A case for two-way skewed-associative caches. In Proceedings of the 20th Annual International Symposium on Computer Architecture.
[31]
A. Seznec. 1994. Decoupled sectored caches: Conciliating low tag implementation cost and low miss ratio. In Proceedings of the International Symposium on Computer Architecture.
[32]
A. Seznec. 2004. Concurrent support of multiple page sizes on a skewed associative TLB. IEEE Transactions on Computers 53, 7, 924--927.
[33]
A. Seznec and F. Bodin. 1993. Skewed-Associative Caches. Research Report RR1655. INRIA. http://hal.inria.fr/docs/00/07/49/02/PDF/RR-1655.pdf.
[34]
Luis Villa, Michael Zhang, and Krste Asanovic. 2000. Dynamic zero compression for cache energy reduction. In Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture.
[35]
Jeffrey Scott Vitter. 1987. Design and analysis of dynamic Huffman codes. Journal of the ACM 34, 4, 825--845.
[36]
Jun Yang and Rajiv Gupta. 2002. Frequent value locality and its applications. ACM Transactions on Embedded Computing Systems 1, 1, 79--105.
[37]
Jason Zebchuk, Elham Safi, and Andreas Moshovos. 2007. A framework for coarse-grain optimizations in the on-chip memory hierarchy. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.
[38]
Jacob Ziv and Abraham Lempel. 1977. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23, 3, 337--343.
[39]
Jacob Ziv and Abraham Lempel. 1978. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24, 5, 530--536.



    Published In

    ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 3
    September 2016
    207 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2988523
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 September 2016
    Accepted: 01 July 2016
    Revised: 01 June 2016
    Received: 01 March 2016
    Published in TACO Volume 13, Issue 3


    Author Tags

    1. Compression
    2. cache design
    3. energy efficiency
    4. multicore systems
    5. performance

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (last 12 months): 159
    • Downloads (last 6 weeks): 16
    Reflects downloads up to 14 Dec 2024


    Cited By

    • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996--1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024.
    • (2023) Rigorous Evaluation of Computer Processors with Statistical Model Checking. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1242--1254. DOI: 10.1145/3613424.3623785. Online publication date: 28-Oct-2023.
    • (2023) Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 137--151. DOI: 10.1109/HPCA56546.2023.10071115. Online publication date: Feb-2023.
    • (2022) A Case for Partial Co-allocation Constraints in Compressed Caches. Embedded Computer Systems: Architectures, Modeling, and Simulation, 65--77. DOI: 10.1007/978-3-031-04580-6_5. Online publication date: 27-Apr-2022.
    • (2021) Byte-Select Compression. ACM Transactions on Architecture and Code Optimization 18, 4, 1--27. DOI: 10.1145/3462209. Online publication date: 3-Sep-2021.
    • (2021) Understanding Cache Compression. ACM Transactions on Architecture and Code Optimization 18, 3, 1--27. DOI: 10.1145/3457207. Online publication date: 8-Jun-2021.
    • (2021) Cache Compression with Efficient in-SRAM Data Comparison. 2021 IEEE International Conference on Networking, Architecture and Storage (NAS), 1--8. DOI: 10.1109/NAS51552.2021.9605440. Online publication date: Oct-2021.
    • (2021) Flush-Reload Attack and its Mitigation on an FPGA Based Compressed Cache Design. 2021 22nd International Symposium on Quality Electronic Design (ISQED), 535--541. DOI: 10.1109/ISQED51717.2021.9424252. Online publication date: 7-Apr-2021.
    • (2020) MemSZ. ACM Transactions on Architecture and Code Optimization 17, 4, 1--25. DOI: 10.1145/3424668. Online publication date: 10-Nov-2020.
    • (2020) FFConv. ACM Transactions on Embedded Computing Systems 19, 2, 1--24. DOI: 10.1145/3380548. Online publication date: 11-Mar-2020.