CABARRE: Request Response Arbitration for Shared Cache Management

Published: 09 September 2023

Abstract

Modern multi-processor systems-on-chip (MPSoCs) feature caches shared by multiple cores. These shared caches receive requests issued by the processor cores. Requests that miss in the cache may generate responses, which are received from the lower level of the memory hierarchy and written into the cache. Outstanding requests and responses contend for the shared cache bandwidth. To mitigate the impact of this contention on overall system performance, an efficient request and response arbitration policy is needed.
Prior research on shared cache management has neglected the additional contention caused by responses written to the cache. We propose CABARRE, a novel request and response arbitration policy for shared caches that improves overall system performance. CABARRE improves performance by 23% on average across a set of SPEC workloads compared to straightforward adaptations of state-of-the-art solutions.
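The contention the abstract describes can be illustrated with a toy model: a shared cache with a single port that, each cycle, must choose between serving a pending core request and writing back an incoming response. The sketch below is not the CABARRE policy itself; the `arbitrate` function, its queues, and the two baseline policies (`requests_first` and `round_robin`) are hypothetical names introduced only to show how the arbitration order changes service timing.

```python
from collections import deque


def arbitrate(requests, responses, cycles, policy="requests_first"):
    """Toy single-port cache arbiter: at most one queue entry is served per cycle.

    `requests` and `responses` are sequences of opaque transaction labels.
    Returns the labels in the order they were granted the cache port.
    """
    req_q, rsp_q = deque(requests), deque(responses)
    served = []
    for cycle in range(cycles):
        if policy == "requests_first":
            # Responses are only served when no request is pending.
            pick = req_q if req_q else rsp_q
        else:  # "round_robin": alternate priority between the two queues
            if cycle % 2 == 0:
                pick = req_q if req_q else rsp_q
            else:
                pick = rsp_q if rsp_q else req_q
        if not pick:
            continue  # both queues empty this cycle
        served.append(pick.popleft())
    return served
```

Under `requests_first`, responses are starved until the request queue drains; round-robin interleaves them, trading request latency for response throughput. Balancing this kind of trade-off between outstanding requests and responses is precisely what an arbitration policy at the shared cache must do.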


Cited By

  • (2024) POEM: Performance Optimization and Endurance Management for Non-volatile Caches. ACM Transactions on Design Automation of Electronic Systems 29, 5 (2024), 1–36. DOI: 10.1145/3653452. Online publication date: 27-Mar-2024.
  • (2024) FASTA: Revisiting Fully Associative Memories in Computer Microarchitecture. IEEE Access 12 (2024), 13923–13943. DOI: 10.1109/ACCESS.2024.3355961. Online publication date: 2024.


Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
Special Issue ESWEEK 2023
October 2023, 1394 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3614235
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2023
Accepted: 13 July 2023
Revised: 02 June 2023
Received: 23 March 2023
Published in TECS Volume 22, Issue 5s


Author Tags

  1. Shared caches
  2. requests
  3. responses
  4. arbitration
  5. cache bandwidth
  6. multi-core systems

Qualifiers

  • Research-article

Funding Sources

  • Semiconductor Research Corporation
