research-article

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Authors:

Gabriel H. Loh

ACM SIGARCH Computer Architecture News, Volume 37, Issue 3

Pages 174 - 183

https://doi.org/10.1145/1555815.1555778

Published: 20 June 2009

Abstract

Many multi-core processors employ a large last-level cache (LLC) shared among the multiple cores. Past research has demonstrated that sharing-oblivious cache management policies (e.g., LRU) can lead to poor performance and fairness when the multiple cores compete for the limited LLC capacity. Different memory access patterns can cause cache contention in different ways, and various techniques have been proposed to target some of these behaviors. In this work, we propose a new cache management approach that combines dynamic insertion and promotion policies to provide the benefits of cache partitioning, adaptive insertion, and capacity stealing, all with a single mechanism. By handling multiple types of memory behaviors, our proposed technique outperforms techniques that target only capacity partitioning or only adaptive insertion.
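The abstract does not spell out the mechanism, so the following is only a minimal sketch of how insertion and promotion policies could jointly approximate a partition within one cache set. Everything in it is an illustrative assumption rather than the paper's algorithm: the class name `PseudoPartitionedSet`, the per-core `insert_pos` table, deterministic promotion by exactly one position on a hit, and the simplification that a tag uniquely identifies a line across cores.

```python
class PseudoPartitionedSet:
    """One cache set managed only by insertion position and one-step promotion.

    Hypothetical illustration; not taken from the paper's text.
    """

    def __init__(self, ways, insert_pos):
        # insert_pos[core] is the priority index at which that core's new
        # lines enter the set (0 = next line to be evicted). Assumed to be
        # derived from some per-core capacity target.
        self.ways = ways
        self.insert_pos = insert_pos
        self.lines = []  # index 0 = lowest priority, last = highest priority

    def access(self, core, tag):
        """Return True on a hit; on a miss, fill the line and return False."""
        for i, (_, t) in enumerate(self.lines):
            if t == tag:
                # Hit: move up by a single position instead of jumping to
                # the top, so a line must be reused repeatedly to climb.
                if i + 1 < len(self.lines):
                    self.lines[i], self.lines[i + 1] = (
                        self.lines[i + 1],
                        self.lines[i],
                    )
                return True
        # Miss: evict the lowest-priority line if the set is full.
        if len(self.lines) == self.ways:
            self.lines.pop(0)
        # Insert at the core's position, clamped to the current occupancy.
        pos = min(self.insert_pos[core], len(self.lines))
        self.lines.insert(pos, (core, tag))
        return False


# Core 0 inserts well above the eviction point; core 1 inserts at the bottom.
s = PseudoPartitionedSet(ways=4, insert_pos={0: 2, 1: 0})
for t in "abc":
    s.access(0, t)
for t in "xyz":          # a streaming core 1 only displaces its own lines
    s.access(1, t)
print([t for _, t in s.lines])
```

In this sketch a core that inserts near the eviction end mostly displaces its own lines, which resembles a partition, while lines that are reused can still climb past the insertion point, which is one way to picture capacity stealing.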

Index Terms

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
1. Computer systems organization
  1. Architectures
    1. Parallel architectures

Information & Contributors

Information

Published In

ACM SIGARCH Computer Architecture News Volume 37, Issue 3

June 2009

495 pages

ISSN:0163-5964

DOI:10.1145/1555815

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN:9781605585260
DOI:10.1145/1555754
General Chair: Steve Keckler, University of Texas at Austin
Program Chair: Luiz André Barroso, Google Inc.

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
