DOI: 10.1145/2968455.2968509
Research article, Public Access

Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip

Published: 01 October 2016

Abstract

Local memory is a key factor for the performance of accelerators in SoCs. Despite technology scaling, the gap between on-chip storage and the memory footprint of embedded applications keeps widening. We present a solution that preserves the speedup of accelerators when scaling from small to large data sets. Our design combines specialized DMA and address translation with a software layer in Linux; it is transparent to user applications and broadly applicable to any class of SoCs hosting high-throughput accelerators. We demonstrate the robustness of our design across many heterogeneous workload scenarios and memory allocation policies with FPGA-based SoC prototypes featuring twelve concurrent accelerators that access up to 768 MB of a 1 GB addressable DRAM.
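The abstract names two hardware mechanisms, specialized DMA and address translation, backed by a Linux software layer. The C sketch below illustrates one generic way such translation can work; it is not taken from the paper. It assumes a hypothetical single-level table, filled by a driver, that maps an accelerator's linear virtual buffer onto fixed-size (here 1 MiB) physically contiguous chunks. The names `xlate_table_t`, `xlate`, and `dma_split`, as well as the chunk size, are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk size: 1 MiB physically contiguous chunks. */
#define CHUNK_SHIFT 20u
#define CHUNK_SIZE  (1u << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1u)

/* Hypothetical translation table, assumed to be filled by a device driver. */
typedef struct {
    const uint64_t *chunk_base; /* physical base address of each chunk */
    size_t          n_chunks;   /* number of valid entries */
} xlate_table_t;

/* Translate an accelerator-virtual offset into a physical address.
 * Returns 0 on success, -1 if the offset lies outside the mapped buffer. */
static int xlate(const xlate_table_t *t, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t idx = vaddr >> CHUNK_SHIFT;   /* which chunk */
    if (idx >= t->n_chunks)
        return -1;                         /* translation fault */
    *paddr = t->chunk_base[idx] + (vaddr & CHUNK_MASK);
    return 0;
}

/* Split the request [vaddr, vaddr + len) into DMA bursts that never cross a
 * chunk boundary, since consecutive chunks need not be physically adjacent.
 * Calls emit() once per burst; returns the number of bursts, or -1 on a fault. */
static int dma_split(const xlate_table_t *t, uint64_t vaddr, uint64_t len,
                     void (*emit)(uint64_t paddr, uint64_t len))
{
    int bursts = 0;
    while (len > 0) {
        uint64_t paddr;
        if (xlate(t, vaddr, &paddr) != 0)
            return -1;
        uint64_t room = CHUNK_SIZE - (vaddr & CHUNK_MASK); /* bytes left in chunk */
        uint64_t n = len < room ? len : room;
        emit(paddr, n);
        vaddr += n;
        len -= n;
        bursts++;
    }
    return bursts;
}
```

Under these assumptions, splitting bursts at chunk boundaries lets a driver satisfy a large buffer from scattered physical memory while the accelerator still sees one linear address range. Larger chunks shrink the table but make each physically contiguous allocation harder to find.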



Information

Published In

ACM Other Conferences
CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
October 2016
187 pages
ISBN:9781450344821
DOI:10.1145/2968455

Publisher

Association for Computing Machinery, New York, NY, United States



Conference

ESWEEK'16: Twelfth Embedded Systems Week
October 1-7, 2016
Pittsburgh, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 134
  • Downloads (last 6 weeks): 16
Reflects downloads up to 21 Nov 2024.

Cited By
  • (2023) Redwood: Flexible and Portable Heterogeneous Tree Traversal Workloads. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 201-213. DOI: 10.1109/ISPASS57527.2023.00028. Online publication date: Apr-2023.
  • (2022) A review of auxiliary hardware architectures supporting dynamic taint analysis. International Conference on Cloud Computing, Internet of Things, and Computer Applications (CICA 2022), p. 109. DOI: 10.1117/12.2642713. Online publication date: 28-Jul-2022.
  • (2022) A Hybrid Memory/Accelerator Tile Architecture for FPGA-based RISC-V Manycore Systems. 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), pp. 300-306. DOI: 10.1109/FPL57034.2022.00053. Online publication date: Aug-2022.
  • (2022) Accelerator Design with High-Level Synthesis. Handbook of Computer Architecture, pp. 1-33. DOI: 10.1007/978-981-15-6401-7_19-1. Online publication date: 27-Jan-2022.
  • (2021) Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 350-365. DOI: 10.1145/3466752.3480065. Online publication date: 18-Oct-2021.
  • (2021) Automatic Generation of Heterogeneous SoC Architectures With Secure Communications. IEEE Embedded Systems Letters, 13(2):61-64. DOI: 10.1109/LES.2020.3003974. Online publication date: Jun-2021.
  • (2021) HARDROID: Transparent Integration of Crypto Accelerators in Android. 2021 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8. DOI: 10.1109/HPEC49654.2021.9622875. Online publication date: 20-Sep-2021.
  • (2020) ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1049-1054. DOI: 10.23919/DATE48585.2020.9116317. Online publication date: Mar-2020.
  • (2020) BYOC. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 699-714. DOI: 10.1145/3373376.3378479. Online publication date: 9-Mar-2020.
  • (2020) Scalable Open-Source System-on-Chip Design: (Invited Talk - Extended Abstract). 2020 IFIP/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SOC), pp. 7-9. DOI: 10.1109/VLSI-SOC46417.2020.9344077. Online publication date: 5-Oct-2020.
