research-article

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Published: 20 June 2009

Abstract

Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management policies (e.g., LRU) can lead to poor performance and fairness when the multiple cores compete for the limited LLC capacity. Different memory access patterns can cause cache contention in different ways, and various techniques have been proposed to target some of these behaviors. In this work, we propose a new cache management approach that combines dynamic insertion and promotion policies to provide the benefits of cache partitioning, adaptive insertion, and capacity stealing all with a single mechanism. By handling multiple types of memory behaviors, our proposed technique outperforms techniques that target only capacity partitioning or only adaptive insertion.
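
The abstract names the ingredients (per-core insertion, promotion on hits, and eviction from a shared priority order) without giving the mechanism in detail. The following is a toy Python sketch of how such a scheme could behave on a single cache set. Everything in it is an illustrative assumption rather than the paper's actual design: the class name PseudoPartitionedSet, the insert_pos table (a hypothetical per-core insertion position standing in for a target allocation), the single-step promotion, and the promote_prob parameter.

```python
import random


class PseudoPartitionedSet:
    """One cache set, ordered from lowest priority (index 0) to highest."""

    def __init__(self, ways, insert_pos, promote_prob=1.0, rng=None):
        self.ways = ways
        self.insert_pos = insert_pos      # core id -> insertion index (assumed allocation)
        self.promote_prob = promote_prob  # chance that a hit moves a line up one step
        self.rng = rng or random.Random(0)
        self.lines = []                   # (core, tag) pairs; index 0 is the next victim

    def access(self, core, tag):
        """Return True on a hit and False on a miss."""
        entry = (core, tag)
        if entry in self.lines:
            i = self.lines.index(entry)
            # Promotion: move up by one position instead of jumping to the top.
            if i + 1 < len(self.lines) and self.rng.random() < self.promote_prob:
                self.lines[i], self.lines[i + 1] = self.lines[i + 1], self.lines[i]
            return True
        if len(self.lines) == self.ways:
            self.lines.pop(0)             # evict the lowest-priority line
        # Insertion: a core with a larger index starts further from eviction.
        pos = min(self.insert_pos[core], len(self.lines))
        self.lines.insert(pos, entry)
        return False


if __name__ == "__main__":
    # Core 0 reuses three lines; core 1 streams through ten lines with no reuse.
    s = PseudoPartitionedSet(ways=4, insert_pos={0: 3, 1: 0})
    for tag in "ABC":
        s.access(0, tag)
    for tag in range(10):
        s.access(1, tag)
    print([s.access(0, tag) for tag in "ABC"])  # [True, True, True]
```

In this sketch the streaming core inserts at the bottom of the order, so its lines mostly evict one another and core 0 keeps its working set without any hard way partition. A core that inserts low can still climb through repeated hits, which is one way to read the abstract's "capacity stealing".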

    Published In

    ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
    June 2009
    495 pages
    ISSN: 0163-5964
    DOI: 10.1145/1555815
    • ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
      June 2009
      510 pages
      ISBN: 9781605585260
      DOI: 10.1145/1555754
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 June 2009
    Published in SIGARCH Volume 37, Issue 3

    Author Tags

    1. cache
    2. contention
    3. insertion
    4. multi-core
    5. promotion
    6. sharing

    Qualifiers

    • Research-article

    Cited By

    • (2024) Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs. IEEE Transactions on Parallel and Distributed Systems, 35:1 (89-104). DOI: 10.1109/TPDS.2023.3332711. Online publication date: 1-Jan-2024
    • (2024) Real-Time Task Manager: A Python-Based Approach Using Psutil and Tkinter. 2024 8th International Conference on Computational System and Information Technology for Sustainable Solutions (CSITSS) (1-6). DOI: 10.1109/CSITSS64042.2024.10816758. Online publication date: 7-Nov-2024
    • (2024) Leveraging Replacement Algorithm for Improved Cache Management System. Wireless Personal Communications: An International Journal, 135:1 (389-401). DOI: 10.1007/s11277-024-11022-5. Online publication date: 1-Mar-2024
    • (2024) Cips: The Cache Intrusion Prevention System. Computer Security – ESORICS 2024 (3-23). DOI: 10.1007/978-3-031-70903-6_1. Online publication date: 5-Sep-2024
    • (2023) CLEPSYDRACACHE. Proceedings of the 32nd USENIX Conference on Security Symposium (1991-2008). DOI: 10.5555/3620237.3620349. Online publication date: 9-Aug-2023
    • (2023) RSPP: Restricted Static Pseudo-Partitioning for Mitigation of Cross-Core Covert Channel Attacks. ACM Transactions on Design Automation of Electronic Systems, 29:2 (1-22). DOI: 10.1145/3637222. Online publication date: 13-Dec-2023
    • (2023) Smart Cache Insertion and Promotion Policy for Content Delivery Networks. Proceedings of the 52nd International Conference on Parallel Processing (183-192). DOI: 10.1145/3605573.3605581. Online publication date: 7-Aug-2023
    • (2023) A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:4 (1058-1071). DOI: 10.1109/TCAD.2022.3179323. Online publication date: Apr-2023
    • (2023) A Survey on Way-Based Cache Partitioning. 2023 IEEE Silchar Subsection Conference (SILCON) (1-7). DOI: 10.1109/SILCON59133.2023.10405216. Online publication date: 3-Nov-2023
    • (2023) CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1208-1220). DOI: 10.1109/HPCA56546.2023.10071125. Online publication date: Feb-2023