research-article
Free access
Just Accepted

Tutorial: A Novel Runtime Environment for Accelerator-Rich Heterogeneous Architectures

Online AM: 08 August 2024

Abstract

As the landscape of computing advances, system designers are increasingly exploring methodologies that leverage higher levels of heterogeneity to enhance performance within constrained size, weight, power, and cost parameters. CEDR is an ecosystem that facilitates productive and efficient application development and deployment across heterogeneous computing systems. It fosters the co-design of applications, scheduling heuristics, and accelerators within a unified framework. Our goal is to present CEDR as a promising environment for lifting the barriers to research on heterogeneous systems and addressing the broader challenges within domain-specific architectures. We introduce CEDR and discuss the evolutionary design decisions underlying its programming model. Subsequently, we explore its utility for a broad range of users through design sweeps on off-the-shelf heterogeneous platforms across scheduling heuristics, hardware compositions, and workload scenarios.



Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems Just Accepted
EISSN:1558-3465
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Online AM: 08 August 2024
Accepted: 24 July 2024
Revised: 10 July 2024
Received: 02 April 2024


Author Tags

  1. Domain-Specific SoCs
  2. Heterogeneous application runtimes

