DOI: 10.1145/3453688.3461533 · GLSVLSI Conference Proceedings

MemOReL: A <u>Mem</u>ory-oriented <u>O</u>ptimization Approach to <u>Re</u>inforcement <u>L</u>earning on FPGA-based Embedded Systems

Published: 22 June 2021

Abstract

Reinforcement Learning (RL) is the machine learning method that has come closest to human-like learning. While Deep RL is increasingly popular for complex applications such as AI-based gaming, it has a high implementation cost in terms of both power and latency. Q-Learning, by contrast, is a much simpler method, which makes it more feasible for resource-constrained embedded systems used in control and navigation. However, the optimal policy search in Q-Learning is a compute-intensive and inherently sequential process, and a software-only implementation may not satisfy the latency and throughput constraints of such applications. To this end, we propose a novel accelerator design with multiple design trade-offs for implementing Q-Learning on FPGA-based SoCs. Specifically, we analyze the stages of the Epsilon-Greedy algorithm for RL and propose a novel microarchitecture that reduces latency by optimizing the memory accesses in each iteration. We then present multiple designs that offer varying trade-offs between the performance, power dissipation, and resource utilization of the accelerator. With the proposed approach, we report a considerable improvement in throughput with lower resource utilization over state-of-the-art implementations.
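As background for the abstract, tabular Q-Learning with epsilon-greedy action selection can be sketched in a few lines of software. The toy chain environment, function name, and parameters below are illustrative assumptions, not the paper's accelerator design; the comments only point out which memory accesses recur in every iteration:

```python
import random

def train_q_table(num_states=6, num_actions=2, episodes=300,
                  alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-Learning with epsilon-greedy exploration on a toy
    1-D chain: action 1 moves right, action 0 moves left, and the
    last state is the goal (reward 1, episode ends).  Illustrative
    software baseline only, not the paper's hardware design."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    goal = num_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(10 * num_states):  # step cap per episode
            # Epsilon-greedy selection with random tie-breaking:
            # explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(num_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(num_actions)
                                if q[s][i] == best])
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Q-update: the read-modify-write of q[s][a] and the max
            # over q[s_next] are the per-iteration Q-table memory
            # accesses that a memory-oriented accelerator targets.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

q = train_q_table()
# After training, the greedy policy in every non-goal state prefers
# moving right (action 1) toward the goal.
assert all(q[s][1] > q[s][0] for s in range(5))
```

Note that each inner-loop iteration reads one Q-table row for the current state and one for the successor state before writing a single entry back; it is this dependent read-update-write chain that makes the process inherently sequential.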

Supplemental Material

MP4 File
Presentation video: MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on Embedded Systems




Published In

cover image ACM Conferences
GLSVLSI '21: Proceedings of the 2021 Great Lakes Symposium on VLSI
June 2021
504 pages
ISBN:9781450383936
DOI:10.1145/3453688

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. energy-efficient computing
  2. fpga
  3. hardware accelerators
  4. high-level synthesis
  5. memory-centric computing

Qualifiers

  • Research-article

Data Availability

Presentation video: MemOReL: A Memory-oriented Optimization Approach to Reinforcement Learning on Embedded Systems. https://dl.acm.org/doi/10.1145/3453688.3461533#GLSVLSI21-50.mp4

Conference

GLSVLSI '21: Great Lakes Symposium on VLSI 2021
June 22-25, 2021
Virtual Event, USA

Acceptance Rates

Overall acceptance rate: 312 of 1,156 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2

Reflects downloads up to 21 Nov 2024

Cited By

  • (2024) Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, pp. 1754-1762. DOI: 10.5555/3635637.3663037. Online publication date: 6 May 2024.
  • (2024) A RISC-V Hardware Accelerator for Q-Learning Algorithm. Applications in Electronics Pervading Industry, Environment and Society, pp. 74-79. DOI: 10.1007/978-3-031-48121-5_11. Online publication date: 13 January 2024.
  • (2023) FARANE-Q: Fast Parallel and Pipeline Q-Learning Accelerator for Configurable Reinforcement Learning SoC. IEEE Access 11, pp. 144-161. DOI: 10.1109/ACCESS.2022.3232853. Online publication date: 2023.
  • (2023) Automatic IP Core Generator for FPGA-Based Q-Learning Hardware Accelerators. Applications in Electronics Pervading Industry, Environment and Society, pp. 242-247. DOI: 10.1007/978-3-031-30333-3_32. Online publication date: 29 April 2023.
  • (2022) Reinforcement Learning Accelerator using Q-Learning Algorithm with Optimized Bit Precision. 2022 8th International Conference on Wireless and Telematics (ICWT), pp. 1-5. DOI: 10.1109/ICWT55831.2022.9935371. Online publication date: 21 July 2022.
  • (2021) "MR Q-Learning" Algorithm for Efficient Hardware Implementations. 2021 55th Asilomar Conference on Signals, Systems, and Computers, pp. 1186-1190. DOI: 10.1109/IEEECONF53345.2021.9723147. Online publication date: 31 October 2021.
