A few lessons learned in reinforcement learning for quadcopter attitude control

Published: 19 May 2021

Abstract

In the context of developing safe air transportation, our work focuses on understanding how reinforcement learning methods can improve on the state of the art in traditional control, in nominal as well as non-nominal cases. The end goal is to train provably safe controllers by improving both training and verification methods. In this paper, we explore this path for controlling the attitude of a quadcopter: we discuss theoretical as well as practical aspects of training neural networks to control a Crazyflie 2.0 drone. In particular, we describe in detail our choices of training algorithm, neural network architecture, hyperparameters, observation space, and so on. We also discuss the robustness of the obtained controllers, both to partial loss of power on one rotor and to wind gusts. Finally, we measure the performance of the approach using a robust form of signal temporal logic to quantitatively evaluate the vehicle's behavior.
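The quantitative (robust) semantics of signal temporal logic mentioned in the abstract assigns a signed margin to a specification rather than a plain true/false verdict: positive means satisfied with slack, negative means violated. A minimal sketch of this idea, not the authors' code (the specification, the 0.1 rad threshold, the settling time, and the toy roll trajectory are all invented for illustration), computes the robustness of "after the settling time, the roll angle always stays within a bound":

```python
import math

def robustness_always_within(times, signal, t_settle, eps):
    """Robustness of G_[t_settle, end] (|x| < eps) over a sampled signal.

    The predicate |x| < eps has robustness eps - |x|; the 'always' (G)
    operator takes the minimum over the time window. The result is
    positive iff the property holds, and its magnitude is the margin.
    """
    window = [x for t, x in zip(times, signal) if t >= t_settle]
    return min(eps - abs(x) for x in window)

times = [i * 0.01 for i in range(201)]            # 2 s trajectory at 100 Hz
roll = [0.5 * math.exp(-3.0 * t) for t in times]  # toy decaying roll angle (rad)
rho = robustness_always_within(times, roll, t_settle=1.0, eps=0.1)
print(rho)  # positive: the toy controller settles within the 0.1 rad bound
```

Because the margin is a real number, it can rank controllers that all nominally satisfy the specification, which is what makes this semantics useful as a quantitative evaluation metric.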




Published In

HSCC '21: Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control
May 2021
300 pages
ISBN: 9781450383394
DOI: 10.1145/3447928
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • DGA/MRIS

Conference

HSCC '21

Acceptance Rates

HSCC '21 Paper Acceptance Rate 27 of 77 submissions, 35%;
Overall Acceptance Rate 153 of 373 submissions, 41%


Cited By

  • (2024) Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments. IEEE Transactions on Neural Networks and Learning Systems 35(3):3168-3180. DOI: 10.1109/TNNLS.2023.3263430. Online publication date: Mar-2024.
  • (2024) Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls. 2024 IEEE 10th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 56-66. DOI: 10.1109/SMC-IT61443.2024.00014. Online publication date: 15-Jul-2024.
  • (2024) Reinforcement Learning Based Attitude Control of Quadcopter. 2024 Tenth Indian Control Conference (ICC), pp. 520-525. DOI: 10.1109/ICC64753.2024.10883695. Online publication date: 9-Dec-2024.
  • (2024) Fault-tolerant control for UAV based on deep reinforcement learning under single rotor failure. 2024 36th Chinese Control and Decision Conference (CCDC), pp. 5279-5285. DOI: 10.1109/CCDC62350.2024.10588308. Online publication date: 25-May-2024.
  • (2024) A Review of Reinforcement Learning for Fixed-Wing Aircraft Control Tasks. IEEE Access 12:103026-103048. DOI: 10.1109/ACCESS.2024.3433540. Online publication date: 2024.
  • (2024) Reinforcement learning with formal performance metrics for quadcopter attitude control under non-nominal contexts. Engineering Applications of Artificial Intelligence 127(PA). DOI: 10.1016/j.engappai.2023.107090. Online publication date: 1-Feb-2024.
  • (2023) Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents. 2023 IEEE 9th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 45-55. DOI: 10.1109/SMC-IT56444.2023.00014. Online publication date: Jul-2023.
  • (2023) Fuzzy PID Controller for UAV Based on Reinforcement Learning. Proceedings of 2022 International Conference on Autonomous Unmanned Systems (ICAUS 2022), pp. 1724-1732. DOI: 10.1007/978-981-99-0479-2_160. Online publication date: 10-Mar-2023.
  • (2022) Using Double Deep Q-Learning to learn Attitude Control of Fixed-Wing Aircraft. 2022 16th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 646-651. DOI: 10.1109/SITIS57111.2022.00102. Online publication date: Oct-2022.
