DOI: 10.1145/2000064.2000073

Vantage: scalable and efficient fine-grain cache partitioning

Published: 04 June 2011

Abstract

Cache partitioning has a wide range of uses in CMPs, from guaranteeing quality of service and controlled sharing to security-related techniques. However, existing cache partitioning schemes (such as way-partitioning) are limited to coarse-grain allocations, can support only a few partitions, and reduce cache associativity, hurting performance. Hence, these techniques can be applied to CMPs with 2-4 cores, but fail to scale to tens of cores.
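
The scaling limit of way-partitioning can be illustrated with a little arithmetic. The sketch below is not from the paper: the function name and the example cache geometry (16 ways, 8192 sets) are assumptions chosen only for illustration. It shows that the smallest allocation unit is a whole way, that the number of partitions cannot exceed the number of ways, and that an equal split leaves each partition with only a fraction of the ways (and hence of the associativity).

```python
def way_partition_limits(num_ways, num_sets, num_partitions):
    """Illustrative limits of equal way-partitioning (hypothetical helper).

    Returns (lines_per_way, max_partitions, ways_per_partition).
    """
    # Granting one way to a partition grants one line in every set,
    # so the allocation granularity is num_sets lines.
    lines_per_way = num_sets
    # Every partition needs at least one whole way.
    max_partitions = num_ways
    # With an equal split, each partition sees only this many ways;
    # 0 means there are not enough ways to go around.
    if num_partitions > num_ways:
        ways_per_partition = 0
    else:
        ways_per_partition = num_ways // num_partitions
    return lines_per_way, max_partitions, ways_per_partition


# Assumed geometry: 16 ways x 8192 sets.
print(way_partition_limits(16, 8192, 4))   # 4 partitions
print(way_partition_limits(16, 8192, 32))  # 32 partitions
```

With four partitions each one receives four ways; with 32 partitions the equal split is impossible, which is the situation the abstract describes for tens of cores.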
We present Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache line granularity, while maintaining high associativity and strong isolation among partitions. Vantage leverages cache arrays with good hashing and associativity, which enable soft-pinning a large portion of cache lines. It enforces capacity allocations by controlling the replacement process. Unlike prior schemes, Vantage provides strict isolation guarantees by partitioning most (e.g. 90%) of the cache instead of all of it. Vantage is derived from analytical models, which allow us to provide strong guarantees and bounds on associativity and sizing independent of the number of partitions and their behaviors. It is simple to implement, requiring around 1.5% state overhead and simple changes to the cache controller.
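
As a rough sketch of what line-granularity sizing over a partially partitioned cache means, the code below checks whether a set of per-partition targets, given in cache lines, fits within the partitioned portion of the cache. It models only the sizing arithmetic, not how Vantage enforces allocations through the replacement process. The function names, the 131072-line cache, and the 90% default (taken from the "e.g. 90%" figure above) are assumptions for illustration.

```python
def managed_budget(total_lines, managed_percent=90):
    """Lines available for partitioning, assuming managed_percent of the
    cache is partitioned and the rest is left unpartitioned."""
    return total_lines * managed_percent // 100


def targets_fit(total_lines, targets, managed_percent=90):
    """True if all per-partition targets (in lines) are non-negative and
    together fit in the partitioned portion of the cache."""
    budget = managed_budget(total_lines, managed_percent)
    return all(t >= 0 for t in targets) and sum(targets) <= budget


# Assumed cache: 131072 lines (e.g., 8 MiB with 64-byte lines), 32 partitions.
TOTAL_LINES = 131072
print(managed_budget(TOTAL_LINES))            # lines in the partitioned portion
print(targets_fit(TOTAL_LINES, [3686] * 32))  # targets need not be multiples of a way
print(targets_fit(TOTAL_LINES, [3687] * 32))  # one more line each exceeds the budget
```

Targets here are arbitrary line counts rather than multiples of a way size, which is the practical difference from way-granularity allocation.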
We evaluate Vantage using extensive simulations. On a 32-core system, using 350 multiprogrammed workloads and one partition per core, partitioning the last-level cache with conventional techniques degrades throughput for 71% of the workloads versus an unpartitioned cache (by 7% average, 25% maximum degradation), even when using 64-way caches. In contrast, Vantage improves throughput for 98% of the workloads, by 8% on average (up to 20%), using a 4-way cache.

Published In

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN:9781450304726
DOI:10.1145/2000064
  • ACM SIGARCH Computer Architecture News, Volume 39, Issue 3
    ISCA '11
    June 2011
    462 pages
    ISSN:0163-5964
    DOI:10.1145/2024723
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache partitioning
  2. multi-core
  3. qos
  4. shared cache

Qualifiers

  • Research-article

Conference

ISCA '11
Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
