DOI: 10.1145/3453688.3461533 · GLSVLSI Conference Proceedings

MemOReL: A <u>Mem</u>ory-oriented <u>O</u>ptimization Approach to <u>Re</u>inforcement <u>L</u>earning on FPGA-based Embedded Systems

Published: 22 June 2021

Abstract

Reinforcement Learning (RL) is the machine learning method that has come closest to human-like learning. While Deep RL is increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, by contrast, is a much simpler method, which makes it more feasible for resource-constrained embedded systems used in control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process, and a software-only implementation may not satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces latency by optimizing the memory accesses in each iteration. We then present multiple designs that offer varying trade-offs between the performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report a considerable improvement in throughput with lower resource utilization over state-of-the-art implementations.
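As background for the abstract, tabular Q-Learning with epsilon-greedy action selection can be sketched in a few lines of software. The toy chain environment, function name, and parameters below are illustrative assumptions, not the paper's accelerator design; the comments only point out which memory accesses recur in every iteration:

```python
import random

def train_q_table(num_states=6, num_actions=2, episodes=300,
                  alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration on a toy
    1-D chain: action 1 moves right, action 0 moves left, and the
    last state is the goal (reward 1, episode ends).  Illustrative
    software baseline only, not the paper's hardware design."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    goal = num_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(10 * num_states):  # step cap per episode
            # Epsilon-greedy selection with random tie-breaking:
            # explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(num_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(num_actions)
                                if q[s][i] == best])
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Q-update: the read-modify-write of q[s][a] and the max
            # over q[s_next] are the per-iteration Q-table memory
            # accesses that a memory-oriented accelerator targets.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

q = train_q_table()
# After training, the greedy policy in every non-goal state prefers
# moving right (action 1) toward the goal.
assert all(q[s][1] > q[s][0] for s in range(5))
```

Note that each inner-loop iteration reads one Q-table row for the current state and one for the successor state before writing a single entry back; it is this dependent read-update-write chain that makes the process inherently sequential.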

Supplemental Material

MP4 File
Presentation video: MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on Embedded Systems




Published In

cover image ACM Conferences
GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
June 2021
504 pages
ISBN:9781450383936
DOI:10.1145/3453688

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. energy-efficient computing
  2. fpga
  3. hardware accelerators
  4. high-level synthesis
  5. memory-centric computing

Qualifiers

  • Research-article

Data Availability

Presentation video: MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on Embedded Systems. https://dl.acm.org/doi/10.1145/3453688.3461533#GLSVLSI21-50.mp4

Conference

GLSVLSI '21: Great Lakes Symposium on VLSI 2021
June 22-25, 2021
Virtual Event, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2

Reflects downloads up to 21 Nov 2024

Cited By

  • (2024) Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1754-1762. DOI: 10.5555/3635637.3663037. Online publication date: 6 May 2024.
  • (2024) A RISC-V Hardware Accelerator for Q-Learning Algorithm. Applications in Electronics Pervading Industry, Environment and Society, pp. 74-79. DOI: 10.1007/978-3-031-48121-5_11. Online publication date: 13 January 2024.
  • (2023) FARANE-Q: Fast Parallel and Pipeline Q-Learning Accelerator for Configurable Reinforcement Learning SoC. IEEE Access 11, pp. 144-161. DOI: 10.1109/ACCESS.2022.3232853. Online publication date: 2023.
  • (2023) Automatic IP Core Generator for FPGA-Based Q-Learning Hardware Accelerators. Applications in Electronics Pervading Industry, Environment and Society, pp. 242-247. DOI: 10.1007/978-3-031-30333-3_32. Online publication date: 29 April 2023.
  • (2022) Reinforcement Learning Accelerator using Q-Learning Algorithm with Optimized Bit Precision. 2022 8th International Conference on Wireless and Telematics (ICWT), pp. 1-5. DOI: 10.1109/ICWT55831.2022.9935371. Online publication date: 21 July 2022.
  • (2021) "MR Q-Learning" Algorithm for Efficient Hardware Implementations. 2021 55th Asilomar Conference on Signals, Systems, and Computers, pp. 1186-1190. DOI: 10.1109/IEEECONF53345.2021.9723147. Online publication date: 31 October 2021.
