
Bounding and reducing memory interference in COTS-based multi-core systems

Authors:

Dionisio De Niz,

Björn Andersson,

Ragunathan Rajkumar

Real-Time Systems, Volume 52, Issue 3

Pages 356 - 395

https://doi.org/10.1007/s11241-016-9248-1

Published: 01 May 2016

Abstract

In multi-core systems, main memory is a major shared resource among processor cores. A task running on one core can be delayed by other tasks running simultaneously on other cores due to interference in the shared main memory system. Such memory interference delay can be large and highly variable, thereby posing a significant challenge for the design of predictable real-time systems. In this paper, we present techniques to reduce this interference and provide an upper bound on the worst-case interference on a multi-core platform that uses a commercial-off-the-shelf (COTS) DRAM system. We explicitly model the major resources in the DRAM system, including banks, buses, and the memory controller. By considering their timing characteristics, we analyze the worst-case memory interference delay imposed on a task by other tasks running in parallel. We find that memory interference can be significantly reduced by (i) partitioning DRAM banks, and (ii) co-locating memory-intensive tasks on the same processing core. Based on these observations, we develop a memory interference-aware task allocation algorithm for reducing memory interference. We evaluate our approach on a COTS-based multi-core platform running Linux/RK. Experimental results show that the predictions made by our approach are close to the measured worst-case interference under workloads with both high and low memory contention. In addition, our memory interference-aware task allocation algorithm provides a significant improvement in task schedulability over previous work, with as many as 96% more tasksets being schedulable.
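The two observations above can be illustrated with a toy model. The Python sketch below is not the paper's analysis, which models DRAM banks, buses and the memory controller explicitly; it is a simplified fixed-priority response-time iteration for partitioned scheduling in which every modeling choice is a hypothetical assumption: a single constant `REQ_DELAY` stands for the worst-case delay one competing request can add, each of a task's requests is assumed to wait for at most one request per other core, and the task parameters are invented. Within those assumptions, it shows how co-locating memory-intensive tasks removes cross-core contention.

```python
import math
from dataclasses import dataclass

# Hypothetical worst-case delay (arbitrary time units) that one competing
# DRAM request from another core can add to one of our requests.
REQ_DELAY = 10


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int      # C_i: execution time measured in isolation
    period: int    # T_i, also used as the implicit deadline
    requests: int  # H_i: worst-case DRAM requests per job
    priority: int  # smaller value = higher priority


def response_time(task, core_tasks, other_cores):
    """Fixed-point response-time iteration with a toy memory-interference term.

    core_tasks: all tasks on the analyzed task's core (including the task).
    other_cores: one task list per other core.
    Returns the response time, or None if it exceeds the period.
    """
    hp = [t for t in core_tasks if t.priority < task.priority]
    r = task.wcet
    while True:
        # DRAM requests issued on this core within a window of length r.
        own = task.requests + sum(math.ceil(r / t.period) * t.requests for t in hp)
        interference = 0
        for tasks in other_cores:
            # Requests the other core can issue in the window
            # (+1 job to cover a job already running at the window start).
            theirs = sum((math.ceil(r / t.period) + 1) * t.requests for t in tasks)
            # Toy assumption: each of our requests waits for at most one request
            # per competing core, and a core cannot delay us with more requests
            # than it issues.
            interference += min(own, theirs) * REQ_DELAY
        new_r = (task.wcet
                 + sum(math.ceil(r / t.period) * t.wcet for t in hp)
                 + interference)
        if new_r > task.period:
            return None
        if new_r == r:
            return r
        r = new_r


def schedulable(allocation):
    """allocation: a list of per-core task lists (partitioned scheduling)."""
    for p, core_tasks in enumerate(allocation):
        others = [tasks for q, tasks in enumerate(allocation) if q != p]
        if any(response_time(t, core_tasks, others) is None for t in core_tasks):
            return False
    return True


# Hypothetical workload: two memory-intensive tasks (M1, M2)
# and two compute-bound tasks (L1, L2).
M1 = Task("M1", wcet=200, period=430, requests=4, priority=1)
M2 = Task("M2", wcet=200, period=430, requests=4, priority=2)
L1 = Task("L1", wcet=200, period=430, requests=0, priority=3)
L2 = Task("L2", wcet=200, period=430, requests=0, priority=4)

spread = [[M1, L1], [M2, L2]]     # memory-intensive tasks on different cores
colocated = [[M1, M2], [L1, L2]]  # memory-intensive tasks on the same core

print(schedulable(spread))     # False: L1's first iterate, 440, exceeds 430
print(schedulable(colocated))  # True
```

In this toy setting, co-location does not reduce the number of memory requests; it moves them onto one core, where fixed-priority scheduling already serializes them, so they no longer stall tasks on the other core. The model has no notion of banks and therefore says nothing about observation (i); in a more detailed model, the effect of bank partitioning could be reflected roughly as a smaller per-request delay.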

Information

Published In

Real-Time Systems, Volume 52, Issue 3

May 2016

157 pages

ISSN: 0922-6443

Copyright © 2016 Springer Science+Business Media New York.

Publisher

Kluwer Academic Publishers

United States

Qualifiers

Article

Bibliometrics

Total Citations: 12

Total Downloads: 0 (last 12 months: 0; last 6 weeks: 0)

Reflects downloads up to 31 Dec 2024

Cited By

Palomo X, Molina C (2024) ITER: an ITERative approach for inter-core timing analysis in statically scheduled cyclic executive systems on COTS multicore platforms for CRTES. The Journal of Supercomputing 80(13):19719-19770. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s11227-024-06208-4

Davis R, Bate I (2022) Mixed Criticality on Multi-cores Accounting for Resource Stress and Resource Sensitivity. In: Proceedings of the 30th International Conference on Real-Time Networks and Systems, pp 103-115. Online publication date: 7-Jun-2022.
https://dl.acm.org/doi/10.1145/3534879.3534883

González A, Chaudron J, Boniol F, Bouchebaba Y, Bussenot J (2022) Task and Memory Mapping Optimization for SDRAM Interference Minimization on Heterogeneous MPSoCs. In: 2022 IEEE 27th International Conference on Emerging Technologies and Factory Automation (ETFA), pp 1-8. Online publication date: 6-Sep-2022.
https://dl.acm.org/doi/10.1109/ETFA52439.2022.9921677

Davis R, Griffin D, Bate I (2022) A framework for multi-core schedulability analysis accounting for resource stress and sensitivity. Real-Time Systems 58(4):456-508. Online publication date: 1-Dec-2022.
https://dl.acm.org/doi/10.1007/s11241-022-09377-8

Maiza C, Rihani H, Rivas J, Goossens J, Altmeyer S, Davis R (2019) A Survey of Timing Verification Techniques for Multi-Core Real-Time Systems. ACM Computing Surveys 52(3):1-38. Online publication date: 18-Jun-2019.
https://dl.acm.org/doi/10.1145/3323212

Nguyen V, Hardy D, Puaut I (2019) Cache-conscious off-line real-time scheduling for multi-core platforms: algorithms and implementation. Real-Time Systems 55(4):810-849. Online publication date: 1-Oct-2019.
https://dl.acm.org/doi/10.1007/s11241-019-09333-z

Davis R, Altmeyer S, Indrusiak L, Maiza C, Nelis V, Reineke J (2018) An extensible framework for multicore response time analysis. Real-Time Systems 54(3):607-661. Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.1007/s11241-017-9285-4

Martinez S, Hardy D, Puaut I, Bini E, Pagetti C (2017) Quantifying WCET reduction of parallel applications by introducing slack time to limit resource contention. In: Proceedings of the 25th International Conference on Real-Time Networks and Systems, pp 188-197. Online publication date: 4-Oct-2017.
https://dl.acm.org/doi/10.1145/3139258.3139263

Xiong D, Huang K, Jiang X, Yan X (2017) Providing Predictable Performance via a Slowdown Estimation Model. ACM Transactions on Architecture and Code Optimization 14(3):1-26. Online publication date: 22-Aug-2017.
https://dl.acm.org/doi/10.1145/3124451

Kim H, Rajkumar R (2017) Predictable Shared Cache Management for Multi-Core Real-Time Virtualization. ACM Transactions on Embedded Computing Systems 17(1):1-27. Online publication date: 6-Dec-2017.
https://dl.acm.org/doi/10.1145/3092946
