Memory interference characterization between CPU cores and integrated GPUs in mixed-criticality platforms

Authors: Roberto Cavicchioli, Nicola Capodieci, Marko Bertogna

2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)

Pages 1 - 10

https://doi.org/10.1109/ETFA.2017.8247615

Published: 12 September 2017

Abstract

Most of today's mixed-criticality platforms feature Systems on Chip (SoCs) in which a multi-core CPU complex (the host) competes with an integrated Graphics Processing Unit (iGPU, the device) for access to main memory. The multi-core host and the iGPU share the same memory controller, which has to arbitrate data accesses from both clients through mechanisms that are often undisclosed or not priority-driven. This becomes critical when the iGPU is a high-performance, massively parallel computing complex potentially able to saturate the available DRAM bandwidth of the SoC under consideration. The contribution of this paper is to qualitatively analyze and characterize the conflicts caused by parallel accesses to main memory by both CPU cores and the iGPU, so as to motivate the need for novel paradigms for memory-centric scheduling mechanisms. We analyzed several well-known, commercially available platforms in order to estimate variations in throughput and latency across various memory access patterns, on both the host and the device side.

Cited By

Zuepke A, Bastoni A, Chen W, Caccamo M, Mancuso R (2024). MemPol: polling-based microsecond-scale per-core memory bandwidth regulation. Real-Time Systems 60:3 (369-412). DOI: 10.1007/s11241-024-09422-8. Online publication date: 1-Sep-2024.
https://dl.acm.org/doi/10.1007/s11241-024-09422-8

Perez-Cerrolaza J, Abella J, Kosmidis L, Calderon A, Cazorla F, Flores J (2022). GPU Devices for Safety-Critical Systems: A Survey. ACM Computing Surveys 55:7 (1-37). DOI: 10.1145/3549526. Online publication date: 15-Dec-2022.
https://dl.acm.org/doi/10.1145/3549526

Brilli G, Cavicchioli R, Solieri M, Valente P, Marongiu A (2022). Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs. ACM Transactions on Embedded Computing Systems 22:1 (1-25). DOI: 10.1145/3548773. Online publication date: 13-Dec-2022.
https://dl.acm.org/doi/10.1145/3548773

Information & Contributors

Information

Published In

2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)

Sep 2017

1377 pages

Copyright © 2017.

Publisher

IEEE Press

Qualifiers

Research-article
