research-article

Vantage: scalable and efficient fine-grain cache partitioning

Authors:

Daniel Sanchez,

Christos Kozyrakis

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture

Pages 57 - 68

https://doi.org/10.1145/2000064.2000073

Published: 04 June 2011

Abstract

Cache partitioning has a wide range of uses in CMPs, from guaranteeing quality of service and controlled sharing to security-related techniques. However, existing cache partitioning schemes (such as way-partitioning) are limited to coarse-grain allocations, can support only a few partitions, and reduce cache associativity, hurting performance. Hence, these techniques can be applied only to CMPs with 2-4 cores and fail to scale to tens of cores.
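To make the granularity limitation concrete, the following is a minimal sketch of the arithmetic behind way-partitioning, where each partition is given whole ways. The cache size and line size are hypothetical values chosen only for illustration; they do not come from the paper.

```python
# Hypothetical cache parameters, assumed only for this illustration.
CACHE_BYTES = 8 * 1024 * 1024   # assumed 8 MB last-level cache
LINE_BYTES = 64                 # assumed 64-byte cache lines


def way_partition_granularity(num_ways: int) -> float:
    """Smallest allocation unit under way-partitioning, as a fraction of the cache."""
    return 1.0 / num_ways


def ways_per_partition(num_ways: int, num_partitions: int) -> int:
    """Ways (and hence associativity) each partition gets under an even split."""
    if num_partitions > num_ways:
        # Every partition needs at least one whole way.
        raise ValueError("way-partitioning cannot give every partition a way")
    return num_ways // num_partitions


def line_granularity() -> float:
    """Smallest allocation unit when sizes are specified in cache lines."""
    return LINE_BYTES / CACHE_BYTES


if __name__ == "__main__":
    print(way_partition_granularity(16))   # one way of a 16-way cache
    print(ways_per_partition(64, 32))      # even split of 64 ways over 32 partitions
    print(line_granularity())              # one line of the assumed cache
```

Under these assumptions, a 16-way cache cannot be split among 32 partitions at all, and even a 64-way cache leaves each of 32 partitions with only two ways, which is the associativity loss the abstract refers to.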

We present Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache line granularity, while maintaining high associativity and strong isolation among partitions. Vantage leverages cache arrays with good hashing and associativity, which enable soft-pinning a large portion of cache lines. It enforces capacity allocations by controlling the replacement process. Unlike prior schemes, Vantage provides strict isolation guarantees by partitioning most (e.g., 90%) of the cache instead of all of it. Vantage is derived from analytical models, which allow us to provide strong guarantees and bounds on associativity and sizing independent of the number of partitions and their behaviors. It is simple to implement, requiring around 1.5% state overhead and simple changes to the cache controller.
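The idea of enforcing line-granularity allocations through replacement decisions can be illustrated with a toy model. This is a simplified sketch and not the mechanism described in the paper: the `Partition` class, `managed_targets`, and `pick_victim` are hypothetical names, and the victim rule (evict from the partition that most exceeds its target) is an assumption made for illustration. Only the idea of partitioning most of the cache (e.g., 90%) with sizes given in lines comes from the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    target_lines: int       # allocation, in cache lines
    actual_lines: int = 0   # lines the partition currently holds


def managed_targets(total_lines: int, shares: List[int],
                    managed_percent: int = 90) -> List[int]:
    """Split the partitioned portion of the cache among partitions by share.

    Only managed_percent of the lines are partitioned; the rest is left
    unpartitioned, following the abstract's "most (e.g., 90%)" description.
    """
    managed = total_lines * managed_percent // 100
    total_share = sum(shares)
    return [managed * s // total_share for s in shares]


def pick_victim(candidates: List[int], partitions: List[Partition]) -> int:
    """Return the index of the replacement candidate to evict.

    candidates[i] is the id of the partition that owns candidate i.
    Assumed rule: evict from the partition furthest above its target.
    """
    def overshoot(i: int) -> int:
        p = partitions[candidates[i]]
        return p.actual_lines - p.target_lines

    return max(range(len(candidates)), key=overshoot)


if __name__ == "__main__":
    targets = managed_targets(1000, [1, 1, 2])
    parts = [Partition(t) for t in targets]
    parts[0].actual_lines, parts[1].actual_lines, parts[2].actual_lines = 300, 100, 460
    print(targets, pick_victim([1, 2, 0, 2], parts))
```

The sketch shows why more replacement candidates help: the more candidates the array offers on each miss, the more likely one of them belongs to a partition that is over its target.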

We evaluate Vantage using extensive simulations. On a 32-core system, using 350 multiprogrammed workloads and one partition per core, partitioning the last-level cache with conventional techniques degrades throughput for 71% of the workloads versus an unpartitioned cache (by 7% on average, with 25% maximum degradation), even when using 64-way caches. In contrast, Vantage improves throughput for 98% of the workloads, by 8% on average (up to 20%), using a 4-way cache.

Index Terms

Vantage: scalable and efficient fine-grain cache partitioning
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture

June 2011

488 pages

ISBN:9781450304726

DOI:10.1145/2000064

General Chairs: Ravi Iyer (Intel), Qing Yang (University of Rhode Island)
Program Chair: Antonio González (Intel and UPC)

ACM SIGARCH Computer Architecture News Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN:0163-5964
DOI:10.1145/2024723

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '11

Sponsor:

SIGARCH

ISCA '11: The 38th Annual International Symposium on Computer Architecture

June 4 - 8, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total citations: 189
Total downloads: 1,574
Downloads (last 12 months): 115
Downloads (last 6 weeks): 18

Reflects downloads up to 21 Sep 2024

Citations

Cited By

Ramkrishnan K, McCamant S, Zhai A, Yew P, Quek T, Gao D, Zhou J, Cardenas A (2024). Non-Fusion Based Coherent Cache Randomization Using Cross-Domain Accesses. Proceedings of the 19th ACM Asia Conference on Computer and Communications Security. DOI: 10.1145/3634737.3645011 (186-202). Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.1145/3634737.3645011
Bakhshalipour M, Gibbons P (2024). Tartan: Microarchitecting a Robotic Processor. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). DOI: 10.1109/ISCA59077.2024.00047 (548-565). Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00047
Wang Y, Yu J, Yu Z (2023). Resource scheduling techniques in cloud from a view of coordination: a holistic survey (从协同视角论云资源调度技术：综述). Frontiers of Information Technology & Electronic Engineering 24:1. DOI: 10.1631/FITEE.2100298 (1-40). Online publication date: 23-Jan-2023
https://doi.org/10.1631/FITEE.2100298
Ni Y, Mehra P, Miller E, Litz H (2023). TMC. Proceedings of the 2023 ACM Symposium on Cloud Computing. DOI: 10.1145/3620678.3624667 (376-393). Online publication date: 30-Oct-2023
https://dl.acm.org/doi/10.1145/3620678.3624667
Kwon J, Lee Y, Kal H, Kim M, Kim Y, Ro W (2023). McCore: A Holistic Management of High-Performance Heterogeneous Multicores. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture. DOI: 10.1145/3613424.3614295 (1044-1058). Online publication date: 28-Oct-2023
https://dl.acm.org/doi/10.1145/3613424.3614295
Mummidi C, Kundu S (2023). ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures. ACM Transactions on Architecture and Code Optimization 20:2. DOI: 10.1145/3572911 (1-19). Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1145/3572911
Chen R, Shi H, Li Y, Liu X, Wang G, Fedorova A, Narayanan D, Di Luna G, Querzoni L (2023). OLPart: Online Learning based Resource Partitioning for Colocating Multiple Latency-Critical Jobs on Commodity Computers. Proceedings of the Eighteenth European Conference on Computer Systems. DOI: 10.1145/3552326.3567490 (347-364). Online publication date: 8-May-2023
https://dl.acm.org/doi/10.1145/3552326.3567490
Gerlach L, Weber D, Zhang R, Schwarz M (2023). A Security RISC: Microarchitectural Attacks on Hardware RISC-V CPUs. 2023 IEEE Symposium on Security and Privacy (SP). DOI: 10.1109/SP46215.2023.10179399 (2321-2338). Online publication date: May-2023
https://doi.org/10.1109/SP46215.2023.10179399
Chen H, Wu X, Dai L, Liu Y (2023). Brief Industry Paper: Latency-Driven Optimization of Instruction Blocks Orchestration on Memory. 2023 IEEE Real-Time Systems Symposium (RTSS). DOI: 10.1109/RTSS59052.2023.00053 (463-467). Online publication date: 5-Dec-2023
https://doi.org/10.1109/RTSS59052.2023.00053
Jaamoum A, Hiscock T, Di Natale G (2022). Noise-Free Security Assessment of Eviction Set Construction Algorithms with Randomized Caches. Applied Sciences 12:5. DOI: 10.3390/app12052415 (2415). Online publication date: 25-Feb-2022
https://doi.org/10.3390/app12052415
