DOI: 10.1145/3549179.3549181

FPGA hardware implementation of Q-learning algorithm with low resource consumption

Published: 20 August 2022

Abstract

Q-learning is a reinforcement learning algorithm with a wide range of applications across many fields. However, in settings such as robot control, where training time is tightly constrained, a Q-learning implementation on a GPU or CPU may not meet the requirement. In this paper, we propose a novel serial acceleration architecture for the Q-learning algorithm and implement it on an xczu7ev-ffvc1156 FPGA using the Vivado 2019.1 development environment. As a result, resource consumption is reduced by about 50% compared with the architecture proposed in [1], and the update cycle of the Q-learning algorithm is fixed at 4 clock cycles.
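For context, the per-step operation that any such accelerator must perform is the standard tabular Q-learning update of Watkins and Dayan [9]: Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)). The sketch below is a minimal software reference of that update in C, not the paper's architecture; the table dimensions, float arithmetic, and function name are illustrative assumptions, whereas the proposed hardware performs the equivalent update in a fixed 4-clock-cycle datapath.

```c
#include <stddef.h>

/* Illustrative table dimensions -- the paper does not fix these. */
#define N_STATES  16
#define N_ACTIONS 4

static float Q[N_STATES][N_ACTIONS];  /* Q-table, zero-initialized */

/* One tabular Q-learning update (Watkins & Dayan [9]):
 *   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
 * Software reference only; a hardware implementation would use
 * fixed-point arithmetic rather than float. */
void q_update(size_t s, size_t a, float r, size_t s_next,
              float alpha, float gamma)
{
    /* Max over the next state's action values. */
    float q_max = Q[s_next][0];
    for (size_t i = 1; i < N_ACTIONS; i++) {
        if (Q[s_next][i] > q_max)
            q_max = Q[s_next][i];
    }
    /* Temporal-difference update of the visited (state, action) entry. */
    Q[s][a] += alpha * (r + gamma * q_max - Q[s][a]);
}
```

In hardware, the max over next-state action values and the multiply-accumulate would typically map to a comparator tree and DSP slices rather than a sequential loop; how such units are shared or replicated is what drives the resource/latency trade-off the abstract describes.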

References

[1] Sergio Spanò, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Alberto Nannarelli, and Marco Re. 2019. An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm. IEEE Access (2019), 186340-186351. DOI: 10.1109/ACCESS.2019.2961174
[2] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.
[3] Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Marco Re, and Sergio Spanò. 2021. An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. Lecture Notes in Electrical Engineering (2021), 267-272.
[4] C. Blad, C. S. Kallesøe, and S. Bøgh. 2020. Control of HVAC-Systems Using Reinforcement Learning With Hysteresis and Tolerance Control.
[5] James J. Q. Yu, Wen Yu, and Jiatao Gu. 2019. Online Vehicle Routing With Neural Combinatorial Optimization and Deep Reinforcement Learning. IEEE Transactions on Intelligent Transportation Systems (2019), 3806-3817.
[6] Shengjia Shao, Jason Tsai, and Michal Mysior. 2018. Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control. IEEE (2018).
[7] Jiang Zhu, Yonghui Song, and Dingde Jiang. 2018. A New Deep-Q-Learning-Based Transmission Scheduling Mechanism for the Cognitive Internet of Things. IEEE Internet of Things Journal (2018).
[8] S. A. Dolas, S. A. Jain, and A. N. Bhute. 2021. The Safety Management System Using Q-Learning Algorithm in IoT Environment.
[9] Christopher J. C. H. Watkins and Peter Dayan. 1992. Technical Note: Q-Learning. Machine Learning (1992), 279-292.
[10] Meng Zhao, Hui Lu, Siyi Yang, and Fengjuan Guo. 2020. The Experience-Memory Q-Learning Algorithm for Robot Path Planning in Unknown Environment. IEEE Access (2020), 47824-47844.
[11] Amit Konar, Indrani Goswami Chakraborty, Sapam Jitu Singh, Lakhmi C. Jain, and Atulya K. Nagar. 2013. A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2013), 1141-1153.
[12] Ee Soong Low, Pauline Ong, and Kah Chun Cheah. 2019. Solving the optimal path planning of a mobile robot using improved Q-learning. Robotics and Autonomous Systems (2019), 143-161.
[13] Lucileide M. D. Da Silva, Matheus F. Torquato, and Marcelo A. C. Fernandes. 2019. Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA. IEEE Access (2019), 2782-2798. DOI: 10.1109/ACCESS.2018.2885950

Published In

PRIS '22: Proceedings of the 2022 International Conference on Pattern Recognition and Intelligent Systems
July 2022
102 pages
ISBN: 9781450396080
DOI: 10.1145/3549179

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. Q-learning
  3. acceleration

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PRIS 2022
