

Reinforcement Learning for UAV Attitude Control

Published: 13 February 2019

Abstract

Autopilot systems are typically composed of an “inner loop” that provides stability and control and an “outer loop” that is responsible for mission-level objectives, such as way-point navigation. Autopilot systems for unmanned aerial vehicles are predominantly implemented using Proportional-Integral-Derivative (PID) control systems, which have demonstrated exceptional performance in stable environments. However, more sophisticated control is required to operate in unpredictable and harsh environments. Intelligent flight control is an active area of research addressing the limitations of PID control, most recently through the use of reinforcement learning (RL), which has had success in other applications, such as robotics. Yet previous work has focused primarily on using RL in the mission-level (outer-loop) controller. In this work, we investigate the performance and accuracy of the inner control loop providing attitude control when using intelligent flight control systems trained with state-of-the-art RL algorithms: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). To investigate these questions, we first developed an open-source, high-fidelity simulation environment for training a quadrotor attitude controller through RL. We then used this environment to compare the performance of the RL-trained controllers to that of a PID controller and to determine whether RL is appropriate for high-precision, time-critical flight control.
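To make the comparison concrete, below is a minimal sketch of the kind of inner-loop PID attitude-rate controller the paper uses as its baseline. It is illustrative only: the gains, output limits, and loop rate are placeholder values rather than the authors' tuning, and the motor-mixing step that converts roll/pitch/yaw efforts into individual motor commands is omitted.

# Minimal sketch of an inner-loop PID attitude-rate controller (placeholder gains and limits).

class PID:
    """Single-axis PID controller with clamped integral and output."""

    def __init__(self, kp, ki, kd, out_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        # Clamp the integral term to limit windup.
        self.integral = max(-self.out_limit,
                            min(self.out_limit, self.integral + error * dt))
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, u))


def rate_control_step(pids, target_rates, gyro_rates, dt):
    """One inner-loop step: angular-velocity errors (deg/s) -> roll/pitch/yaw efforts."""
    return [pid.step(target - measured, dt)
            for pid, target, measured in zip(pids, target_rates, gyro_rates)]


# Example: placeholder gains for roll, pitch, and yaw, run at a 1 kHz control loop.
pids = [PID(0.8, 0.4, 0.02), PID(0.8, 0.4, 0.02), PID(1.0, 0.5, 0.0)]
efforts = rate_control_step(pids,
                            target_rates=[30.0, 0.0, 0.0],  # commanded rates (deg/s)
                            gyro_rates=[25.0, 1.0, -0.5],   # measured rates (deg/s)
                            dt=0.001)
print(efforts)  # these efforts would next be mixed into the four motor commands

In an RL-based flight controller of the kind the paper studies, this fixed control law would be replaced by a learned policy that maps the same angular-velocity error (and other state) directly to motor commands.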




Published In

ACM Transactions on Cyber-Physical Systems, Volume 3, Issue 2
April 2019
283 pages
ISSN: 2378-962X
EISSN: 2378-9638
DOI: 10.1145/3284746
Editor: Tei-Wei Kuo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 13 February 2019
Accepted: 01 December 2018
Revised: 01 September 2018
Received: 01 May 2018
Published in TCPS Volume 3, Issue 2


Author Tags

  1. Attitude control
  2. PID
  3. UAV
  4. adaptive control
  5. autopilot
  6. intelligent control
  7. machine learning
  8. quadcopter
  9. reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed


