A few lessons learned in reinforcement learning for quadcopter attitude control

Published: 19 May 2021

Abstract

In the context of developing safe air transportation, our work focuses on understanding how reinforcement learning methods can improve on the state of the art in traditional control, in nominal as well as non-nominal cases. The end goal is to train provably safe controllers by improving both training and verification methods. In this paper, we explore this path for controlling the attitude of a quadcopter: we discuss theoretical as well as practical aspects of training neural networks to control a Crazyflie 2.0 drone. In particular, we describe in detail our choices of training algorithm, neural network architecture, hyperparameters, observation space, and so on. We also discuss the robustness of the obtained controllers, both to partial loss of power on one rotor and to wind gusts. Finally, we measure the performance of the approach using a robust form of signal temporal logic to quantitatively evaluate the vehicle's behavior.
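The quantitative (robust) semantics of signal temporal logic mentioned in the abstract assigns a signed margin to a specification rather than a plain true/false verdict: positive means satisfied with slack, negative means violated. A minimal sketch of this idea, not the authors' code (the specification, the 0.1 rad threshold, the settling time, and the toy roll trajectory are all invented for illustration), computes the robustness of "after the settling time, the roll angle always stays within a bound":

```python
import math

def robustness_always_within(times, signal, t_settle, eps):
    """Robustness of G_[t_settle, end] (|x| < eps) over a sampled signal.

    The predicate |x| < eps has robustness eps - |x|; the 'always' (G)
    operator takes the minimum over the time window. The result is
    positive iff the property holds, and its magnitude is the margin.
    """
    window = [x for t, x in zip(times, signal) if t >= t_settle]
    return min(eps - abs(x) for x in window)

times = [i * 0.01 for i in range(201)]            # 2 s trajectory at 100 Hz
roll = [0.5 * math.exp(-3.0 * t) for t in times]  # toy decaying roll angle (rad)
rho = robustness_always_within(times, roll, t_settle=1.0, eps=0.1)
print(rho)  # positive: the toy controller settles within the 0.1 rad bound
```

Because the margin is a real number, it can rank controllers that all nominally satisfy the specification, which is what makes this semantics useful as a quantitative evaluation metric.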




Published In

HSCC '21: Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control
May 2021
300 pages
ISBN: 9781450383394
DOI: 10.1145/3447928
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • DGA/MRIS

Conference

HSCC '21

Acceptance Rates

HSCC '21 Paper Acceptance Rate 27 of 77 submissions, 35%;
Overall Acceptance Rate 153 of 373 submissions, 41%


Cited By

  • (2024) Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments. IEEE Transactions on Neural Networks and Learning Systems 35(3):3168-3180. DOI: 10.1109/TNNLS.2023.3263430. Online publication date: Mar-2024.
  • (2024) Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls. 2024 IEEE 10th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 56-66. DOI: 10.1109/SMC-IT61443.2024.00014. Online publication date: 15-Jul-2024.
  • (2024) Reinforcement Learning Based Attitude Control of Quadcopter. 2024 Tenth Indian Control Conference (ICC), pp. 520-525. DOI: 10.1109/ICC64753.2024.10883695. Online publication date: 9-Dec-2024.
  • (2024) Fault-tolerant control for UAV based on deep reinforcement learning under single rotor failure. 2024 36th Chinese Control and Decision Conference (CCDC), pp. 5279-5285. DOI: 10.1109/CCDC62350.2024.10588308. Online publication date: 25-May-2024.
  • (2024) A Review of Reinforcement Learning for Fixed-Wing Aircraft Control Tasks. IEEE Access 12:103026-103048. DOI: 10.1109/ACCESS.2024.3433540. Online publication date: 2024.
  • (2024) Reinforcement learning with formal performance metrics for quadcopter attitude control under non-nominal contexts. Engineering Applications of Artificial Intelligence 127(PA). DOI: 10.1016/j.engappai.2023.107090. Online publication date: 1-Feb-2024.
  • (2023) Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents. 2023 IEEE 9th International Conference on Space Mission Challenges for Information Technology (SMC-IT), pp. 45-55. DOI: 10.1109/SMC-IT56444.2023.00014. Online publication date: Jul-2023.
  • (2023) Fuzzy PID Controller for UAV Based on Reinforcement Learning. Proceedings of 2022 International Conference on Autonomous Unmanned Systems (ICAUS 2022), pp. 1724-1732. DOI: 10.1007/978-981-99-0479-2_160. Online publication date: 10-Mar-2023.
  • (2022) Using Double Deep Q-Learning to learn Attitude Control of Fixed-Wing Aircraft. 2022 16th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pp. 646-651. DOI: 10.1109/SITIS57111.2022.00102. Online publication date: Oct-2022.
