Article

Virtual private caches

Authors:

Kyle J. Nesbit,

James E. Smith

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

Pages 57 - 68

https://doi.org/10.1145/1250662.1250671

Published: 09 June 2007

Abstract

Virtual Private Machines (VPMs) provide a framework for Quality of Service (QoS) in CMP-based computer systems. VPMs incorporate microarchitecture mechanisms that allow shares of hardware resources to be allocated to executing threads, thus providing applications with an upper bound on execution time regardless of other thread activity. Virtual Private Caches (VPCs) are an important element of VPMs. VPC hardware consists of two major components: the VPC Arbiter, which manages shared cache bandwidth, and the VPC Capacity Manager, which manages the cache storage. Both the VPC Arbiter and the VPC Capacity Manager provide minimum service guarantees that, when combined, achieve QoS for the cache subsystem. Simulation-based evaluation shows that conventional cache bandwidth management policies allow concurrently executing threads to affect each other significantly and uncontrollably. The evaluation targets cache bandwidth because the effects of cache capacity sharing have been studied elsewhere. In contrast with the conventional policies, the VPC Arbiter meets its QoS performance objectives on all workloads studied and over a range of allocated bandwidth levels. The VPC Arbiter's fairness policy, which distributes leftover bandwidth, mitigates the effects of cache preemption latencies, thus ensuring threads a high degree of performance isolation. Furthermore, the VPC Arbiter eliminates negative bandwidth interference, which can improve aggregate throughput and resource utilization.
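The abstract does not spell out the arbitration algorithm, so the following is only a minimal sketch of the general idea: each thread holds a bandwidth share, is guaranteed at least that fraction of grant slots while it has requests pending, and any leftover slots go to whichever threads are backlogged. The sketch uses a virtual-clock style of fair queuing from the networking literature as a stand-in. The class name `ShareArbiter`, the `shares` parameter, and the one-request-per-slot timing model are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


class ShareArbiter:
    """Toy bandwidth arbiter (illustrative assumption, not the VPC Arbiter itself).

    Each request is tagged with a virtual finish time; the pending request
    with the smallest tag is granted the next slot.
    """

    def __init__(self, shares):
        # shares: thread id -> fraction of total bandwidth (hypothetical parameter)
        if sum(shares.values()) > 1.0 + 1e-9:
            raise ValueError("allocated shares exceed total bandwidth")
        self.shares = dict(shares)
        self.queues = {t: deque() for t in shares}
        self.finish = {t: 0.0 for t in shares}  # last tag issued per thread
        self.clock = 0  # elapsed grant slots

    def enqueue(self, thread, request):
        # A tag never starts before "now", so an idle thread cannot bank credit.
        start = max(self.finish[thread], float(self.clock))
        self.finish[thread] = start + 1.0 / self.shares[thread]
        self.queues[thread].append((self.finish[thread], request))

    def grant(self):
        """Consume one slot; return (thread, request), or None if nothing is pending."""
        self.clock += 1
        ready = [(q[0][0], t) for t, q in self.queues.items() if q]
        if not ready:
            return None
        _, thread = min(ready)  # earliest virtual finish time wins
        _, request = self.queues[thread].popleft()
        return thread, request


# Two backlogged threads with a 75% / 25% split.
arbiter = ShareArbiter({"A": 0.75, "B": 0.25})
for i in range(100):
    arbiter.enqueue("A", i)
    arbiter.enqueue("B", i)
winners = [arbiter.grant()[0] for _ in range(40)]
print(winners.count("A"), winners.count("B"))  # prints "30 10"
```

When only one thread has pending requests, `grant()` gives it every slot, which is the work-conserving behavior that lets unused bandwidth be redistributed. A hardware arbiter would also have to handle variable service times and preemption latency, which this sketch ignores.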

Index Terms

Virtual private caches
1. Computer systems organization
  1. Architectures
2. Hardware
  1. Communication hardware, interfaces and storage

Published In

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture

June 2007

542 pages

ISBN:9781595937063

DOI:10.1145/1250662

General Chair: Dean Tullsen, University of California, San Diego
Program Chair: Brad Calder, Microsoft & University of California, San Diego

ACM SIGARCH Computer Architecture News Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2007

Qualifiers

Article

Conference

SPAA07

Sponsor:

SIGARCH
IEEE-CS

SPAA07: 19th ACM Symposium on Parallelism in Algorithms and Architectures

June 9 - 13, 2007

San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Bibliometrics

Total Citations: 158
Total Downloads: 1,344
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 10 Nov 2024

Cited By

Jiang Z, Yang K, Fisher N, Guan N, Audsley N, Dong Z (2024). Hopscotch: A Hardware-Software Co-Design for Efficient Cache Resizing on Multi-Core SoCs. IEEE Transactions on Parallel and Distributed Systems 35:1 (89-104). Online publication date: Jan-2024. https://doi.org/10.1109/TPDS.2023.3332711
Wang Y, Sun L, Wang H, Gopalakrishnan L, Eaton R (2020). Novel prioritized LRU circuits for shared cache in computer systems. Modern Physics Letters B (2050242). Online publication date: 30-May-2020. https://doi.org/10.1142/S0217984920502425
Chen J, Wei J, Feng Y, Bastani O, Dillig I (2019). Relational verification using reinforcement learning. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-30). Online publication date: 10-Oct-2019. https://dl.acm.org/doi/10.1145/3360567
Stein B, Nielsen B, Chang B, Møller A (2019). Static analysis with demand-driven value refinement. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-29). Online publication date: 10-Oct-2019. https://dl.acm.org/doi/10.1145/3360566
Li H, Lu T, Liu Y, Chen M (2019). Make Page Coloring more Efficient on Slice-Based Three-Level Cache. 2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS) (310-317). Online publication date: Dec-2019. https://doi.org/10.1109/ICPADS47876.2019.00051
Jahre M, Eeckhout L (2018). GDP: Using Dataflow Properties to Accurately Estimate Interference-Free Performance at Runtime. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) (296-309). Online publication date: Feb-2018. https://doi.org/10.1109/HPCA.2018.00034
Konrad R, Dansereau D, Masood A, Wetzstein G (2017). SpinVR. ACM Transactions on Graphics 36:6 (1-12). Online publication date: 20-Nov-2017. https://dl.acm.org/doi/10.1145/3130800.3130836
Cholewiak S, Love G, Srinivasan P, Ng R, Banks M (2017). Chromablur. ACM Transactions on Graphics 36:6 (1-12). Online publication date: 20-Nov-2017. https://dl.acm.org/doi/10.1145/3130800.3130815
Yan L, Sun W, Jensen H, Ramamoorthi R (2017). A BSSRDF model for efficient rendering of fur with global illumination. ACM Transactions on Graphics 36:6 (1-13). Online publication date: 20-Nov-2017. https://dl.acm.org/doi/10.1145/3130800.3130802
Vermij E, Fiorin L, Jongerius R, Hagleitner C, Lunteren J, Bertels K (2017). An Architecture for Integrated Near-Data Processors. ACM Transactions on Architecture and Code Optimization 14:3 (1-25). Online publication date: 6-Sep-2017. https://dl.acm.org/doi/10.1145/3127069
