
DCAPS: dynamic cache allocation with partial sharing

Published: 23 April 2018

Abstract

In multicore systems, effective management of the shared last-level cache (LLC), such as hardware/software cache partitioning, has attracted significant research attention. A notable recent advance is Intel's Cache Allocation Technology (CAT), now available in commodity processors: CAT implements way partitioning and provides a software interface to control cache allocation. Unfortunately, CAT can only allocate cache at the granularity of whole ways, which does not scale well to large thread or program counts with diverse performance goals. This paper proposes Dynamic Cache Allocation with Partial Sharing (DCAPS), a framework that dynamically monitors and predicts a multi-programmed workload's cache demand and reallocates the LLC to meet a given performance target. Further, DCAPS exploits partial sharing of cache partitions among programs and thus achieves cache allocation at a practically finer granularity. DCAPS consists of three parts: (1) Online Practical Miss Rate Curve (OPMRC), a low-overhead software technique for predicting the online miss rate curves (MRCs) of the individual programs in a workload; (2) a prediction model that estimates the LLC occupancy of each program under any CAT allocation scheme; and (3) a simulated annealing algorithm that searches for a near-optimal CAT scheme for a specific performance goal. Our experimental results show that DCAPS can optimize for a wide range of performance targets and scales to a large core count.
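The paper itself provides no code here; as a rough illustration of the third component, the sketch below shows how a simulated-annealing search over per-program CAT way masks might look. It is a minimal sketch under simplifying assumptions: the cost function minimizes total miss rate, a program's occupancy is approximated by the capacity of the ways its mask covers (the paper's occupancy model under partial sharing is more detailed), and the per-program MRCs are taken as given (in DCAPS they come from OPMRC and the occupancy prediction model). All function names, parameters, and constants are hypothetical, not the authors' implementation.

```python
import random
import math

NUM_WAYS = 20          # ways in the shared LLC (illustrative, e.g. a 20-way Intel LLC)
WAY_SIZE_MB = 1.375    # capacity of one way; illustrative value

def misses(mrc, cache_mb):
    """Look up a program's miss rate from its miss rate curve (MRC),
    given the cache capacity it effectively occupies.
    mrc: list of (capacity_mb, miss_rate) points, sorted by capacity."""
    for cap, rate in mrc:
        if cache_mb <= cap:
            return rate
    return mrc[-1][1]

def cost(masks, mrcs):
    """Cost of a CAT scheme: total miss rate, approximating each program's
    occupancy by the capacity of the ways its mask covers."""
    total = 0.0
    for mask, mrc in zip(masks, mrcs):
        ways = bin(mask).count("1")
        total += misses(mrc, ways * WAY_SIZE_MB)
    return total

def neighbor(masks):
    """Perturb one program's way mask: flip a random way bit, keeping
    at least one way allocated (CAT requires a non-empty mask)."""
    new = list(masks)
    i = random.randrange(len(new))
    cand = new[i] ^ (1 << random.randrange(NUM_WAYS))
    if cand != 0:
        new[i] = cand
    return new

def anneal(mrcs, steps=5000, t0=1.0, alpha=0.999):
    """Simulated annealing over CAT way masks, one mask per program.
    Masks may overlap, which models partial sharing of partitions."""
    masks = [(1 << NUM_WAYS) - 1] * len(mrcs)   # start: all programs share all ways
    best, best_cost = masks, cost(masks, mrcs)
    cur, cur_cost, t = masks, best_cost, t0
    for _ in range(steps):
        cand = neighbor(cur)
        c = cost(cand, mrcs)
        # Accept improvements always; accept regressions with decaying probability.
        if c < cur_cost or random.random() < math.exp((cur_cost - c) / t):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        t *= alpha
    return best, best_cost
```

For example, given two programs whose MRCs are lists of (capacity, miss rate) pairs, `anneal([mrc_a, mrc_b])` returns one way mask per program; overlapping bits in the masks correspond to partially shared ways, which is the degree of freedom DCAPS exploits beyond strict way partitioning.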





Published In

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference
April 2018
631 pages
ISBN:9781450355841
DOI:10.1145/3190508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cache allocation technology
  2. cache occupancy
  3. cache partitioning
  4. miss rate curve
  5. multi-core architectures

Qualifiers

  • Research-article

Conference

EuroSys '18: Thirteenth EuroSys Conference 2018
April 23 - 26, 2018
Porto, Portugal

Acceptance Rates

EuroSys '18 paper acceptance rate: 43 of 262 submissions (16%)
Overall acceptance rate: 241 of 1,308 submissions (18%)

Article Metrics

  • Downloads (Last 12 months)345
  • Downloads (Last 6 weeks)54
Reflects downloads up to 21 Sep 2024


Cited By

  • (2024) Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation. ACM Transactions on Architecture and Code Optimization. DOI: 10.1145/3674736. June 2024.
  • (2024) DMA-assisted I/O for Persistent Memory. IEEE Transactions on Parallel and Distributed Systems, pp. 1-15. DOI: 10.1109/TPDS.2024.3373003. 2024.
  • (2024) SGXFault: An Efficient Page Fault Handling Mechanism for SGX Enclaves. IEEE Transactions on Dependable and Secure Computing 21(3), pp. 1173-1178. DOI: 10.1109/TDSC.2023.3268169. May 2024.
  • (2024) -LAP: A Lightweight and Adaptive Cache Partitioning Scheme With Prudent Resizing Decisions for Content Delivery Networks. IEEE Transactions on Cloud Computing 12(3), pp. 942-953. DOI: 10.1109/TCC.2024.3420454. July 2024.
  • (2023) Quarantine: Mitigating Transient Execution Attacks with Physical Domain Isolation. Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, pp. 207-221. DOI: 10.1145/3607199.3607248. October 2023.
  • (2023) Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters. ACM Transactions on Architecture and Code Optimization 20(3), pp. 1-24. DOI: 10.1145/3593055. July 2023.
  • (2023) Multi-Tenant In-Memory Key-Value Cache Partitioning Using Efficient Random Sampling-Based LRU Model. IEEE Transactions on Cloud Computing 11(4), pp. 3601-3618. DOI: 10.1109/TCC.2023.3300889. October 2023.
  • (2023) Reestablishing Page Placement Mechanisms for Nested Virtualization. IEEE Transactions on Cloud Computing, pp. 1-12. DOI: 10.1109/TCC.2023.3276368. 2023.
  • (2023) Orchid: An Online Learning Based Resource Partitioning Framework for Job Colocation With Multiple Objectives. IEEE Transactions on Computers 72(12), pp. 3443-3457. DOI: 10.1109/TC.2023.3303959. December 2023.
  • (2023) Precise control of page cache for containers. Frontiers of Computer Science 18(2). DOI: 10.1007/s11704-022-2455-0. September 2023.
