

Reinforcement Learning for UAV Attitude Control

Published: 13 February 2019

Abstract

Autopilot systems are typically composed of an “inner loop” that provides stability and control and an “outer loop” that is responsible for mission-level objectives, such as way-point navigation. Autopilot systems for unmanned aerial vehicles are predominantly implemented using Proportional-Integral-Derivative (PID) control systems, which have demonstrated exceptional performance in stable environments. However, more sophisticated control is required to operate in unpredictable and harsh environments. Intelligent flight control is an active area of research addressing the limitations of PID control, most recently through the use of reinforcement learning (RL), which has had success in other applications, such as robotics. Yet previous work has focused primarily on using RL in the mission-level (outer-loop) controller. In this work, we investigate the performance and accuracy of the inner control loop providing attitude control when using intelligent flight control systems trained with state-of-the-art RL algorithms: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO). To investigate these questions, we first developed an open-source, high-fidelity simulation environment for training a quadrotor attitude controller through RL. We then used this environment to compare the performance of the RL-trained controllers to that of a PID controller and to determine whether RL is appropriate for high-precision, time-critical flight control.
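To make the comparison concrete, below is a minimal sketch of the kind of inner-loop PID attitude-rate controller the paper uses as its baseline. It is illustrative only: the gains, output limits, and loop rate are placeholder values rather than the authors' tuning, and the motor-mixing step that converts roll/pitch/yaw efforts into individual motor commands is omitted.

# Minimal sketch of an inner-loop PID attitude-rate controller (placeholder gains and limits).

class PID:
    """Single-axis PID controller with clamped integral and output."""

    def __init__(self, kp, ki, kd, out_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        # Clamp the integral term to limit windup.
        self.integral = max(-self.out_limit,
                            min(self.out_limit, self.integral + error * dt))
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, u))


def rate_control_step(pids, target_rates, gyro_rates, dt):
    """One inner-loop step: angular-velocity errors (deg/s) -> roll/pitch/yaw efforts."""
    return [pid.step(target - measured, dt)
            for pid, target, measured in zip(pids, target_rates, gyro_rates)]


# Example: placeholder gains for roll, pitch, and yaw, run at a 1 kHz control loop.
pids = [PID(0.8, 0.4, 0.02), PID(0.8, 0.4, 0.02), PID(1.0, 0.5, 0.0)]
efforts = rate_control_step(pids,
                            target_rates=[30.0, 0.0, 0.0],  # commanded rates (deg/s)
                            gyro_rates=[25.0, 1.0, -0.5],   # measured rates (deg/s)
                            dt=0.001)
print(efforts)  # these efforts would next be mixed into the four motor commands

In an RL-based flight controller of the kind the paper studies, this fixed control law would be replaced by a learned policy that maps the same angular-velocity error (and other state) directly to motor commands.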




Published In

ACM Transactions on Cyber-Physical Systems, Volume 3, Issue 2
April 2019
283 pages
ISSN: 2378-962X
EISSN: 2378-9638
DOI: 10.1145/3284746
Editor: Tei-Wei Kuo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 13 February 2019
Accepted: 01 December 2018
Revised: 01 September 2018
Received: 01 May 2018
Published in TCPS Volume 3, Issue 2


Author Tags

  1. Attitude control
  2. PID
  3. UAV
  4. adaptive control
  5. autopilot
  6. intelligent control
  7. machine learning
  8. quadcopter
  9. reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed


