DOI: 10.1145/1250662.1250671

Virtual private caches

Published: 09 June 2007

Abstract

Virtual Private Machines (VPMs) provide a framework for Quality of Service (QoS) in CMP-based computer systems. VPMs incorporate microarchitecture mechanisms that allow shares of hardware resources to be allocated to executing threads, thus providing applications with an upper bound on execution time regardless of other thread activity. Virtual Private Caches (VPCs) are an important element of VPMs. VPC hardware consists of two major components: the VPC Arbiter, which manages shared cache bandwidth, and the VPC Capacity Manager, which manages the cache storage. Both the VPC Arbiter and the VPC Capacity Manager provide minimum service guarantees that, when combined, achieve QoS for the cache subsystem. Simulation-based evaluation shows that conventional cache bandwidth management policies allow concurrently executing threads to affect each other significantly and uncontrollably. The evaluation targets cache bandwidth because the effects of cache capacity sharing have been studied elsewhere. In contrast with the conventional policies, the VPC Arbiter meets its QoS performance objectives on all workloads studied and over a range of allocated bandwidth levels. The VPC Arbiter's fairness policy, which distributes leftover bandwidth, mitigates the effects of cache preemption latencies, thus giving threads a high degree of performance isolation. Furthermore, the VPC Arbiter eliminates negative bandwidth interference, which can improve aggregate throughput and resource utilization.
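
The abstract does not spell out how the VPC Arbiter enforces bandwidth shares, so the following is only an illustrative sketch of one generic technique for share-based arbitration: virtual-time fair queuing. It is not the paper's hardware design. The class name `ShareArbiter`, its methods, and the share values are hypothetical. Each thread's virtual time advances by `1 / share` per granted request, the backlogged thread with the smallest virtual time wins, and bandwidth left unused by idle threads goes to whichever threads have pending requests.

```python
from collections import deque


class ShareArbiter:
    """Toy virtual-time arbiter (hypothetical, not the paper's VPC Arbiter)."""

    def __init__(self, shares):
        # shares: thread id -> fraction of cache bandwidth (assumed parameter)
        if any(s <= 0 for s in shares.values()):
            raise ValueError("every share must be positive")
        if sum(shares.values()) > 1.0 + 1e-9:
            raise ValueError("shares must not exceed the total bandwidth")
        self.shares = dict(shares)
        self.pending = {tid: deque() for tid in shares}
        self.vtime = {tid: 0.0 for tid in shares}
        self.global_vtime = 0.0  # virtual time of the most recent grant

    def enqueue(self, tid, request):
        if not self.pending[tid]:
            # A thread returning from idle starts at the current virtual
            # time, so it cannot bank bandwidth it left unused while idle.
            self.vtime[tid] = max(self.vtime[tid], self.global_vtime)
        self.pending[tid].append(request)

    def grant(self):
        """Grant one request slot; return (tid, request) or None if idle."""
        ready = [tid for tid, queue in self.pending.items() if queue]
        if not ready:
            return None
        # Smallest virtual time wins; ties are broken by thread id.
        tid = min(ready, key=lambda t: (self.vtime[t], t))
        self.global_vtime = self.vtime[tid]
        # A larger share means a smaller step, hence more frequent grants.
        self.vtime[tid] += 1.0 / self.shares[tid]
        return tid, self.pending[tid].popleft()
```

Under these assumptions, with shares of 0.5, 0.25, and 0.25 and every thread continuously backlogged, thread A receives half of the grants and B and C a quarter each. If only one thread has pending requests, it receives every grant, which is the sense in which leftover bandwidth is redistributed in this sketch.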

Published In

ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
June 2007
542 pages
ISBN:9781595937063
DOI:10.1145/1250662
General Chair: Dean Tullsen
Program Chair: Brad Calder

ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
May 2007
527 pages
ISSN:0163-5964
DOI:10.1145/1273440
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. chip multiprocessor
  2. performance isolation
  3. quality of service
  4. shared caches
  5. soft real-time

Qualifiers

  • Article

Conference

SPAA07
Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
