DOI: 10.1145/3549179.3549181

FPGA hardware implementation of Q-learning algorithm with low resource consumption

Published: 20 August 2022

Abstract

Q-learning is a reinforcement learning algorithm with a wide range of applications across many fields. However, in settings such as robot control, where training time is tightly constrained, a Q-learning implementation on a GPU or CPU may not meet the requirement. In this paper, we propose a novel serial acceleration architecture for the Q-learning algorithm and implement it on an xczu7ev-ffvc1156 FPGA using the Vivado 2019.1 development environment. As a result, resource consumption is reduced by about 50% compared with the architecture proposed in [1], and the update cycle of the Q-learning algorithm is fixed at 4 clock cycles.
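For context, the per-step operation that any such accelerator must perform is the standard tabular Q-learning update of Watkins and Dayan [9]: Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)). The sketch below is a minimal software reference of that update in C, not the paper's architecture; the table dimensions, float arithmetic, and function name are illustrative assumptions, whereas the proposed hardware performs the equivalent update in a fixed 4-clock-cycle datapath.

```c
#include <stddef.h>

/* Illustrative table dimensions -- the paper does not fix these. */
#define N_STATES  16
#define N_ACTIONS 4

static float Q[N_STATES][N_ACTIONS];  /* Q-table, zero-initialized */

/* One tabular Q-learning update (Watkins & Dayan [9]):
 *   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
 * Software reference only; a hardware implementation would use
 * fixed-point arithmetic rather than float. */
void q_update(size_t s, size_t a, float r, size_t s_next,
              float alpha, float gamma)
{
    /* Max over the next state's action values. */
    float q_max = Q[s_next][0];
    for (size_t i = 1; i < N_ACTIONS; i++) {
        if (Q[s_next][i] > q_max)
            q_max = Q[s_next][i];
    }
    /* Temporal-difference update of the visited (state, action) entry. */
    Q[s][a] += alpha * (r + gamma * q_max - Q[s][a]);
}
```

In hardware, the max over next-state action values and the multiply-accumulate would typically map to a comparator tree and DSP slices rather than a sequential loop; how such units are shared or replicated is what drives the resource/latency trade-off the abstract describes.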

References

[1] Sergio Spanò, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Alberto Nannarelli, and Marco Re. 2019. An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm. IEEE Access (2019), 186340-186351. DOI: 10.1109/ACCESS.2019.2961174
[2] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.
[3] Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Marco Re, and Sergio Spanò. 2021. An Action-Selection Policy Generator for Reinforcement Learning Hardware Accelerators. Lecture Notes in Electrical Engineering (2021), 267-272.
[4] C. Blad, C. S. Kallesøe, and S. Bøgh. 2020. Control of HVAC-Systems Using Reinforcement Learning With Hysteresis and Tolerance Control.
[5] James J. Q. Yu, Wen Yu, and Jiatao Gu. 2019. Online Vehicle Routing With Neural Combinatorial Optimization and Deep Reinforcement Learning. IEEE Transactions on Intelligent Transportation Systems (2019), 3806-3817.
[6] Shengjia Shao, Jason Tsai, and Michal Mysior. 2018. Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control. IEEE (2018).
[7] Jiang Zhu, Yonghui Song, and Dingde Jiang. 2018. A New Deep-Q-Learning-Based Transmission Scheduling Mechanism for the Cognitive Internet of Things. IEEE Internet of Things Journal (2018).
[8] S. A. Dolas, S. A. Jain, and A. N. Bhute. 2021. The Safety Management System Using Q-Learning Algorithm in IoT Environment.
[9] Christopher J. C. H. Watkins and Peter Dayan. 1992. Technical Note: Q-Learning. Machine Learning (1992), 279-292.
[10] Meng Zhao, Hui Lu, Siyi Yang, and Fengjuan Guo. 2020. The Experience-Memory Q-Learning Algorithm for Robot Path Planning in Unknown Environment. IEEE Access (2020), 47824-47844.
[11] Amit Konar, Indrani Goswami Chakraborty, Sapam Jitu Singh, Lakhmi C. Jain, and Atulya K. Nagar. 2013. A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2013), 1141-1153.
[12] Ee Soong Low, Pauline Ong, and Kah Chun Cheah. 2019. Solving the optimal path planning of a mobile robot using improved Q-learning. Robotics and Autonomous Systems (2019), 143-161.
[13] Lucileide M. D. Da Silva, Matheus F. Torquato, and Marcelo A. C. Fernandes. 2019. Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA. IEEE Access (2019), 2782-2798. DOI: 10.1109/ACCESS.2018.2885950

Published In

PRIS '22: Proceedings of the 2022 International Conference on Pattern Recognition and Intelligent Systems
July 2022
102 pages
ISBN: 9781450396080
DOI: 10.1145/3549179

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. Q-learning
  3. acceleration

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PRIS 2022
