Bounding and reducing memory interference in COTS-based multi-core systems

Published: 01 May 2016

Abstract

In multi-core systems, main memory is a major shared resource among processor cores. A task running on one core can be delayed by tasks running simultaneously on other cores because of interference in the shared main memory system. This memory interference delay can be large and highly variable, which poses a significant challenge for the design of predictable real-time systems. In this paper, we present techniques to reduce this interference and provide an upper bound on the worst-case interference on a multi-core platform that uses a commercial-off-the-shelf (COTS) DRAM system. We explicitly model the major resources in the DRAM system, including banks, buses, and the memory controller. By considering their timing characteristics, we analyze the worst-case memory interference delay imposed on a task by other tasks running in parallel. We find that memory interference can be significantly reduced by (i) partitioning DRAM banks, and (ii) co-locating memory-intensive tasks on the same processing core. Based on these observations, we develop a memory interference-aware task allocation algorithm for reducing memory interference. We evaluate our approach on a COTS-based multi-core platform running Linux/RK. Experimental results show that the predictions made by our approach are close to the measured worst-case interference under workloads with both high and low memory contention. In addition, our memory interference-aware task allocation algorithm provides a significant improvement in task schedulability over previous work, with as much as 96% more tasksets being schedulable.
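The co-location observation can be illustrated with a small sketch. This is not the allocation algorithm from the paper, which the abstract does not spell out. It is a plain first-fit packing over tasks sorted by memory intensity, written under the assumption that tasks sharing a core never run simultaneously and therefore do not delay each other in DRAM. The names `Task`, `Core`, `allocate`, the `mem_intensity` metric, and the per-core utilization cap are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    utilization: float    # fraction of one core the task needs (C / T)
    mem_intensity: float  # hypothetical metric, e.g. memory requests per ms


@dataclass
class Core:
    tasks: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Total utilization of the tasks already placed on this core.
        return sum(t.utilization for t in self.tasks)


def allocate(tasks, num_cores, cap=1.0):
    """First-fit packing, most memory-intensive tasks first.

    Sorting by memory intensity makes the heaviest memory users fill the
    same core before spilling onto the next one, so fewer of them can run
    in parallel on different cores. The cap is an assumed per-core
    utilization limit, not a schedulability test from the paper.
    """
    cores = [Core() for _ in range(num_cores)]
    ordered = sorted(tasks, key=lambda t: t.mem_intensity, reverse=True)
    for task in ordered:
        for core in cores:
            # Small tolerance guards against floating-point rounding.
            if core.load + task.utilization <= cap + 1e-9:
                core.tasks.append(task)
                break
        else:
            raise ValueError(f"task {task.name} does not fit on any core")
    return cores


if __name__ == "__main__":
    demo = [
        Task("C", 0.4, 5.0),
        Task("A", 0.4, 90.0),
        Task("D", 0.4, 2.0),
        Task("B", 0.4, 80.0),
    ]
    for index, core in enumerate(allocate(demo, num_cores=2)):
        print(f"core {index}: {[t.name for t in core.tasks]}")
```

A real allocator would replace the utilization cap with a response-time test that includes the memory interference bound, and would combine the placement with DRAM bank partitioning; both are outside the scope of this sketch.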



Published In

Real-Time Systems, Volume 52, Issue 3
May 2016
157 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Bank partitioning
  2. DRAM
  3. Memory controller
  4. Memory interference
  5. Multi-core
  6. Task allocation

Qualifiers

  • Article

Cited By

  • (2024) ITER: an ITERative approach for inter-core timing analysis in statically scheduled cyclic executive systems on COTS multicore platforms for CRTES. The Journal of Supercomputing 80:13 (19719-19770). DOI: 10.1007/s11227-024-06208-4. Online publication date: 1-Sep-2024
  • (2022) Mixed Criticality on Multi-cores Accounting for Resource Stress and Resource Sensitivity. Proceedings of the 30th International Conference on Real-Time Networks and Systems (103-115). DOI: 10.1145/3534879.3534883. Online publication date: 7-Jun-2022
  • (2022) Task and Memory Mapping Optimization for SDRAM Interference Minimization on Heterogeneous MPSoCs. 2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA) (1-8). DOI: 10.1109/ETFA52439.2022.9921677. Online publication date: 6-Sep-2022
  • (2022) A framework for multi-core schedulability analysis accounting for resource stress and sensitivity. Real-Time Systems 58:4 (456-508). DOI: 10.1007/s11241-022-09377-8. Online publication date: 1-Dec-2022
  • (2019) A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems. ACM Computing Surveys 52:3 (1-38). DOI: 10.1145/3323212. Online publication date: 18-Jun-2019
  • (2019) Cache-conscious off-line real-time scheduling for multi-core platforms: algorithms and implementation. Real-Time Systems 55:4 (810-849). DOI: 10.1007/s11241-019-09333-z. Online publication date: 1-Oct-2019
  • (2018) An extensible framework for multicore response time analysis. Real-Time Systems 54:3 (607-661). DOI: 10.1007/s11241-017-9285-4. Online publication date: 1-Jul-2018
  • (2017) Quantifying WCET reduction of parallel applications by introducing slack time to limit resource contention. Proceedings of the 25th International Conference on Real-Time Networks and Systems (188-197). DOI: 10.1145/3139258.3139263. Online publication date: 4-Oct-2017
  • (2017) Providing Predictable Performance via a Slowdown Estimation Model. ACM Transactions on Architecture and Code Optimization 14:3 (1-26). DOI: 10.1145/3124451. Online publication date: 22-Aug-2017
  • (2017) Predictable Shared Cache Management for Multi-Core Real-Time Virtualization. ACM Transactions on Embedded Computing Systems 17:1 (1-27). DOI: 10.1145/3092946. Online publication date: 6-Dec-2017
