Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

WCET analysis with MRU cache: Challenging LRU for predictability

Published: 01 April 2014 Publication History

Abstract

Most previous work on cache analysis for WCET estimation assumes a particular replacement policy called LRU. In contrast, much less work has been done for non-LRU policies, since they are generally considered to be very unpredictable. However, most commercial processors are actually equipped with these non-LRU policies, since they are more efficient in terms of hardware cost, power consumption and thermal output, while still maintaining almost as good average-case performance as LRU.
In this work, we study the analysis of MRU, a non-LRU replacement policy employed in mainstream processor architectures like Intel Nehalem. Our work shows that the predictability of MRU has been significantly underestimated before, mainly because the existing cache analysis techniques and metrics do not match MRU well. As our main technical contribution, we propose a new cache hit/miss classification, k-Miss, to better capture the MRU behavior, and develop formal conditions and efficient techniques to decide k-Miss memory accesses. A remarkable feature of our analysis is that the k-Miss classifications under MRU are derived by the analysis result of the same program under LRU. Therefore, our approach inherits the advantages in efficiency and precision of the state-of-the-art LRU analysis techniques based on abstract interpretation. Experiments with instruction caches show that our proposed MRU analysis has both good precision and high efficiency, and the obtained estimated WCET is rather close to (typically 1%∼8% more than) that obtained by the state-of-the-art LRU analysis, which indicates that MRU is also a good candidate for cache replacement policies in real-time systems.

References

[1]
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., Boston, MA.
[2]
Hussein Al-Zoubi, Aleksandar Milenkovic, and Milena Milenkovic. 2004. Performance evaluation of cache replacement policies for the spec cpu2000 benchmark suite. In Proceedings of the 42nd Annual Southeast Regional Conference. ACM-SE 42. ACM, New York, 267--272.
[3]
Frances E. Allen. 1970. Control flow analysis. In Proceedings of the Symposium on Compiler Optimization.
[4]
Sebastian Altmeyer, Claire Maiza, and Jan Reineke. 2010. Resilience analysis: tightening the crpd bound for set-associative caches. In Proceedings of the ACM SIGPLAN/SIGBED 2010 Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'10). ACM, New York, 153--162.
[5]
R. Arnold, F. Mueller, D. B. Whalley, and M. G. Harmon. 1994. Bounding worst-case instruction cache performance. In Proceedings of RTSS.
[6]
Todd Austin, Eric Larson, and Dan Ernst. 2002. Simplescalar: An infrastructure for computer system modeling. Computer 35, 2, 59--67.
[7]
Clément Ballabriga and Hugues Casse. 2008. Improving the first-miss computation in set-associative instruction caches. In Proceedings of the Euromicro Conference on Real-Time Systems (ECRTS'08). IEEE Computer Society, Los Alamitos, CA, 341--350.
[8]
M. Berkelaar. lp_solve: (Mixed Integer) linear programming problem solver. ftp://ftp.es.ele.tue.nl/pub/lp_solve.
[9]
Sudipta Chattopadhyay, Abhik Roychoudhury, and Tulika Mitra. 2010. Modeling shared cache and bus in multi-cores for timing analysis. In Proceedings of the 13th International Workshop on Software Compilers for Embedded Systems (SCOPES'10). ACM, New York, 6:1--6:10.
[10]
Christoph Cullmann. 2011. Cache persistence analysis: A novel approachtheory and practice. In Proceedings of the 2011 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'11). ACM, New York, 121--130.
[11]
David Eklov, Nikos Nikoleris, David Black-Schaffer, and Erik Hagersten. 2011. Cache pirating: Measuring the curse of the shared cache. In Proceedings of the International Conference on Parallel Processing (ICPP'11). IEEE Computer Society, Los Alamitos, CA, 165--175.
[12]
C. Ferdinand. 1997. Cache behavior prediction for real-time systems. Ph.D. Thesis, Universitat des Saarlandes.
[13]
Christian Ferdinand and Reinhard Wilhelm. 1998. On predicting data cache behavior for real-time systems. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98), Frank Mueller and Azer Bestavros, Eds., Lecture Notes in Computer Science, vol. 1474, Springer, 16--30.
[14]
Daniel Grund and Jan Reineke. 2009. Abstract interpretation of FIFO replacement. In Proceedings of the 16th International Symposium on Static Analysis (SAS'09). Springer-Verlag, Berlin, 120--136.
[15]
Daniel Grund and Jan Reineke. 2010a. Precise and efficient FIFO-replacement analysis based on static phase detection. In Proceedings of the 2010 22nd Euromicro Conference on Real-Time Systems, (ECRTS'10). IEEE Computer Society, Los Alamitos, CA, 155--164.
[16]
Daniel Grund and Jan Reineke. 2010b. Toward precise PLRU cache analysis. In Proceedings of 10th International Workshop on Worst-Case Execution Time (WCET) Analysis. B. Lisper, Ed., 28--39.
[17]
Nan Guan, Mingsong Lv, Wang Yi, and Ge Yu. 2012. WCET analysis with MRU caches: Challenging LRU for predictability. In Proceedings of the IEEE 18th Real Time and Embedded Technology and Applications Symposium (RTAS'12). IEEE Computer Society, Los Alamitos, CA, 55--64.
[18]
Jan Gustafsson, Adam Betts, Andreas Ermedahl, and Björn Lisper. 2010. The mälardalen WCET benchmarks: Past, present and future. In Proceedings of the 10th International Workshop on Worst-Case Execution Time Analysis (WCET'10). B. Lisper, Ed., OASICS Series, vol. 15, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, 136--146.
[19]
Damien Hardy and Isabelle Puaut. 2008. WCET analysis of multi-level non-inclusive set-associative instruction caches. In Proceedings of the 2008 Real-Time Systems Symposium (RTSS'08). IEEE Computer Society, Los Alamitos, CA, 456--466.
[20]
Reinhold Heckmann, Marc Langenbach, Stephan Thesing, and Reinhard Wilhelm. 2003. The influence of processor architecture on the design and the results of WCET tools. Proc. IEEE 91, 7, 1038--1054.
[21]
John L. Hennessy and David A. Patterson. 2006. Computer Architecture: A Quantitative Approach. 4th Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA.
[22]
Bach Khoa Huynh, Lei Ju, and Abhik Roychoudhury. 2011. Scope-aware data cache analysis for WCET estimation. In Proceedings of the 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'11). IEEE Computer Society, Los Alamitos, CA, 203--212.
[23]
Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. 2005. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro 25, 2, 21--29.
[24]
Xianfeng Li, Yun Liang, Tulika Mitra, and Abhik Roychoudhury. 2007. Chronos: A timing analyzer for embedded software. Sci. Comput. Program. 69, 1--3, 56--67.
[25]
Y. S. Li, S. Malik, and A. Wolfe. 1996. Cache modeling for real-time software: Beyond direct mapped instruction caches. In Proceedings of the 1996 Real-Time Systems Symposium (NTSS'96). IEEE Computer Society, Los Alamitos, CA.
[26]
Yau-Tsun Steven Li and Sharad Malik. 1995. Performance analysis of embedded software using implicit path enumeration. In Proceedings of the 32nd Annual ACM/IEEE Design Automation Conference (DAC'95). ACM, New York, 456--461.
[27]
Yun Liang, Huping Ding, Tulika Mitra, Abhik Roychoudhury, Yan Li, and Vivy Suhendra. 2012. Timing analysis of concurrent programs running on shared cache multi-cores. Real-Time Syst. 48, 6, 638--680.
[28]
M. Lv. 2012. CATE: A simulator for Cache Analysis Technique Evaluation in WCET estimation. http://faculty.neu.edu.cn/ise/lvmingsong/cate/.
[29]
A. Malamy, R. Patel, and N. Hayes. 1994. Methods and apparatus for implementing a pseudo-LRU cache memory replacement scheme with a locking feature. United States Patent 5029072.
[30]
F. Mueller. 1994. Static cache simulation and its applications. Ph.D. thesis, Florida State University.
[31]
Frank Mueller. 2000. Timing analysis for instruction caches. Real-Time Syst. 18, 2/3, 217--247.
[32]
Peter P. Puschner and Alan Burns. 2000. Guest editorial: A review of worst-case execution-time analysis. Real-Time Syst. 18, 2/3, 115--128.
[33]
J. Reineke. 2008. Caches in WCET analysis - predictability, competitiveness, sensitivity. In Ph.D. thesis, Saarland University.
[34]
Jan Reineke and Daniel Grund. 2008. Relative competitive analysis of cache replacement policies. In Proceedings of the ACM SIGPLAN-SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'08). ACM, New York, 51--60.
[35]
Jan Reineke and Daniel Grund. 2013. Sensitivity of cache replacement policies. ACM Trans. Embed. Comput. Syst. 12, 1s, 42:1--42:18.
[36]
Jan Reineke, Daniel Grund, Christoph Berg, and Reinhard Wilhelm. 2007. Timing predictability of cache replacement policies. Real-Time Syst. 37, 2, 99--122.
[37]
R. Sen and Y. N. Srikant. 2007. WCET estimation for executables in the presence of data caches. In Proceedings of the 7th ACM & IEEE International Conference on Embedded Software (EMSOFT'07). ACM, New York, 203--212.
[38]
Tyler Sondag and Hridesh Rajan. 2010. A more precise abstract domain for multi-level caches for tighter WCET analysis. In Proceedings of the 31st IEEE Real-Time Systems Symposium (RTSS'10). IEEE Computer Society, Los Alamitos, CA, 395--404.
[39]
Jan Staschulat and Rolf Ernst. 2007. Scalable precision cache analysis for real-time software. ACM Trans. Embed. Comput. Syst. 6, 4.
[40]
Andrew S. Tanenbaum. 2007. Modern Operating Systems 3rd Ed. Prentice-Hall.
[41]
Henrik Theiling, Christian Ferdinand, and Reinhard Wilhelm. 2000. Fast and precise WCET prediction by separated cache and path analyses. Real-Time Syst. 18, 2/3, 157--179.
[42]
Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. 2008. The worst-case execution-time problem - Overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7, 3, 36:1--36:53.
[43]
Reinhard Wilhelm, Daniel Grund, Jan Reineke, Marc Schlickling, Markus Pister, and Christian Ferdinand. 2009. Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. Trans. Comput.-Aided Des. Integ. Cir. Sys. 28, 7, 966--978.

Cited By

View all
  • (2025)Dhcache: a dual-hash cache for optimizing the read performance in key-value storeThe Journal of Supercomputing10.1007/s11227-024-06828-w81:2Online publication date: 19-Jan-2025
  • (2024)Hot and Cold Data Migration Strategy Based on Newton’s Law of Cooling2024 7th International Conference on Data Science and Information Technology (DSIT)10.1109/DSIT61374.2024.10881624(1-7)Online publication date: 20-Dec-2024
  • (2024)Data Popularity for Cache Eviction Algorithms using Random ForestsEPJ Web of Conferences10.1051/epjconf/202429501015295(01015)Online publication date: 6-May-2024
  • Show More Cited By

Index Terms

  1. WCET analysis with MRU cache: Challenging LRU for predictability

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Embedded Computing Systems
      ACM Transactions on Embedded Computing Systems  Volume 13, Issue 4s
      Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)
      July 2014
      571 pages
      ISSN:1539-9087
      EISSN:1558-3465
      DOI:10.1145/2601432
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Journal Family

      Publication History

      Published: 01 April 2014
      Accepted: 01 April 2013
      Revised: 01 January 2013
      Received: 01 June 2012
      Published in TECS Volume 13, Issue 4s

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Hard real time
      2. MRU
      3. cache replacement
      4. worst-case execution times

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)19
      • Downloads (Last 6 weeks)1
      Reflects downloads up to 25 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2025)Dhcache: a dual-hash cache for optimizing the read performance in key-value storeThe Journal of Supercomputing10.1007/s11227-024-06828-w81:2Online publication date: 19-Jan-2025
      • (2024)Hot and Cold Data Migration Strategy Based on Newton’s Law of Cooling2024 7th International Conference on Data Science and Information Technology (DSIT)10.1109/DSIT61374.2024.10881624(1-7)Online publication date: 20-Dec-2024
      • (2024)Data Popularity for Cache Eviction Algorithms using Random ForestsEPJ Web of Conferences10.1051/epjconf/202429501015295(01015)Online publication date: 6-May-2024
      • (2024)PRVC: A Novel Vehicular Ad-Hoc Network Caching Based on Pre-trained Reinforcement LearningAdvanced Information Networking and Applications10.1007/978-3-031-57840-3_5(43-54)Online publication date: 11-Apr-2024
      • (2023)Leveraging LLVM's ScalarEvolution for Symbolic Data Cache Analysis2023 IEEE Real-Time Systems Symposium (RTSS)10.1109/RTSS59052.2023.00029(237-250)Online publication date: 5-Dec-2023
      • (2022)A Closer Look at Intel Resource Director Technology (RDT)Proceedings of the 30th International Conference on Real-Time Networks and Systems10.1145/3534879.3534882(127-139)Online publication date: 7-Jun-2022
      • (2022)Warping cache simulation of polyhedral programsProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523714(316-331)Online publication date: 9-Jun-2022
      • (2021)SRCP: sharing and reuse-aware replacement policy for the partitioned cache in multicore systemsDesign Automation for Embedded Systems10.1007/s10617-021-09251-z25:3(193-211)Online publication date: 1-Sep-2021
      • (2019)Fast and exact analysis for LRU cachesProceedings of the ACM on Programming Languages10.1145/32903673:POPL(1-29)Online publication date: 2-Jan-2019
      • (2019)Scope-aware Data Cache Analysis for OpenMP Programs on Multi-core ProcessorsJournal of Systems Architecture10.1016/j.sysarc.2019.04.001Online publication date: Apr-2019
      • Show More Cited By

      View Options

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media