HERO: an open-source research platform for HW/SW exploration of heterogeneous manycore systems

Published: 04 November 2018

Abstract

Heterogeneous systems on chip (HeSoCs) co-integrate a high-performance multicore host processor with programmable manycore accelerators (PMCAs) to combine "standard platform" software support (e.g. the Linux OS) with energy-efficient, domain-specific, highly parallel processing capabilities.
In this work, we present HERO, a HeSoC platform that tackles this challenge in a novel way. HERO's host processor is an industry-standard ARM Cortex-A multicore complex, while its PMCA is a scalable, silicon-proven, open-source manycore processing engine based on the extensible, open RISC-V ISA.
We evaluate a prototype implementation of HERO, where the PMCA, implemented on an FPGA fabric, is coupled with a hard ARM Cortex-A host processor, and show that the run-time overhead compared to manually written PMCA code operating on private physical memory is lower than 10% for pivotal benchmarks and operating conditions.

Information & Contributors

Information

Published In

ANDARE '18: Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems
November 2018
36 pages
ISBN:9781450365918
DOI:10.1145/3295816
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. heterogeneous SoCs
  2. multi- and many-core architectures
  3. shared virtual memory

Qualifiers

  • Research-article

Conference

ANDARE'18

Acceptance Rates

Overall Acceptance Rate 3 of 4 submissions, 75%
