Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning
Abstract
1. Introduction
2. The Background
3. Deep Reinforcement Learning
3.1. Model-Based Methods
3.2. Model-Free Methods
4. Imitation Learning
4.1. Behavior Cloning
4.2. Inverse Reinforcement Learning
4.3. Generative Adversarial Imitation Learning
5. Transfer Learning
5.1. Better Simulation
5.2. Policy Randomization
5.3. Robust Policy
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Billard, A.; Kragic, D. Trends and challenges in robot manipulation. Science 2019, 364. [Google Scholar] [CrossRef] [PubMed]
- Thibaut, J.; Toussaint, L. Developing Motor Planning over Ages. J. Exp. Child Psychol. 2010, 105, 116–129. [Google Scholar] [CrossRef] [PubMed]
- Xue, Y.; Ju, Z.; Xiang, K.; Chen, J.; Liu, H. Multimodal Human Hand Motion Sensing and Analysis—A Review. IEEE Trans. Cogn. Dev. Syst. 2019, 11, 162–175. [Google Scholar]
- Feix, T.; Romero, J.; Schmiedmayer, H.; Dollar, A.M.; Kragic, D. The GRASP Taxonomy of Human Grasp Types. IEEE Trans. Hum. Mach. Syst. 2016, 46, 66–77. [Google Scholar] [CrossRef]
- Homberg, B.S.; Katzschmann, R.K.; Dogar, M.R.; Rus, D. Haptic identification of objects using a modular soft robotic gripper. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October 2015; pp. 1698–1705. [Google Scholar]
- Edwards, C.; Edwards, A.; Stoll, B.; Lin, X.; Massey, N. Evaluations of an artificial intelligence instructor’s voice: Social Identity Theory in human-robot interactions. Comput. Hum. Behav. 2019, 90, 357–362. [Google Scholar] [CrossRef]
- Sahbani, A.; Elkhoury, S.; Bidaud, P. An overview of 3D object grasp synthesis algorithms. Robot. Auton. Syst. 2012, 60, 326–336. [Google Scholar] [CrossRef] [Green Version]
- Pinto, L.; Gupta, A. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 3406–3413. [Google Scholar]
- Vecerik, M.; Hester, T.; Scholz, J.; Wang, F.; Pietquin, O.; Piot, B.; Heess, N.; Rothorl, T.; Lampe, T.; Riedmiller, M. Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards. arXiv 2017, arXiv:1707.08817. [Google Scholar]
- Kroemer, O.; Niekum, S.; Konidaris, G. A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms. arXiv 2019, arXiv:1907.03146. [Google Scholar]
- Bohg, J.; Morales, A.; Asfour, T.; Kragic, D. Data-Driven Grasp Synthesis—A Survey. IEEE Trans. Robot. Autom. 2014, 30, 289–309. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.H.; Lei, Q.J.; Cheng, C.; Zhang, G.; Wang, W.; Xu, Z. A review: Machine learning on robotic grasping. In Proceedings of the Eleventh International Conference on Machine Vision (ICMV 2018), International Society for Optics and Photonics, Munich, Germany, 1–3 November 2018; Volume 11041. [Google Scholar]
- Bicchi, A.; Kumar, V. Robotic grasping and contact: A review. In Proceedings of the 2000 ICRA, Millennium Conference, IEEE International Conference on Robotics and Automation, Symposia Proceedings (Cat. No. 00CH37065), San Francisco, CA, USA, 24–28 April 2000; Volume 1, pp. 348–353. [Google Scholar]
- Colomé, A.; Pardo, D.; Alenya, G.; Torras, C. External force estimation during compliant robot manipulation. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 3535–3540. [Google Scholar]
- Bicchi, A. On the closure properties of robotic grasping. Int. J. Robot. Res. 1995, 14, 319–334. [Google Scholar] [CrossRef]
- Li, J.; Liu, H.; Cai, H. On computing three-finger force-closure grasps of 2-D and 3-D objects. IEEE Trans. Robot. Autom. 2003, 19, 155–161. [Google Scholar]
- Lin, Y.; Sun, Y. Grasp planning to maximize task coverage. Int. J. Robot. Res. 2015, 34, 1195–1210. [Google Scholar] [CrossRef]
- Kemp, C.C.; Edsinger, A.; Torres-Jara, E. Challenges for robot manipulation in human environments [grand challenges of robotics]. IEEE Robot. Autom. Mag. 2007, 14, 20–29. [Google Scholar] [CrossRef]
- Fang, B.; Jia, S.; Guo, D.; Xu, M.; Wen, S.; Sun, F. Survey of imitation learning for robotic manipulation. Int. J. Intell. Robot. Appl. 2019, 3, 362–369. [Google Scholar] [CrossRef]
- Alexandrova, S.; Cakmak, M.; Hsiao, K.; Takayama, L. Robot Programming by Demonstration with Interactive Action Visualizations. Robot. Sci. Syst. 2014, 10. [Google Scholar] [CrossRef]
- Huang, D.; Ma, M.; Ma, W.; Kitani, K.M. How do we use our hands? Discovering a diverse set of common grasps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 666–675. [Google Scholar]
- Pérez-D’Arpino, C.; Shah, J.A. Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 6175–6182. [Google Scholar]
- Yang, Y.; Fermuller, C.; Li, Y.; Aloimonos, Y. Grasp type revisited: A modern perspective on a classical feature for vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 400–408. [Google Scholar]
- Lenz, I.; Lee, H.; Saxena, A. Deep Learning for Detecting Robotic Grasps. Int. J. Robot. Res. 2013, 34, 705–724. [Google Scholar] [CrossRef] [Green Version]
- Lecun, Y.; Bengio, Y.; Hinton, G.E. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
- Redmon, J.; Angelova, A. Real-Time Grasp Detection Using Convolutional Neural Networks. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1316–1322. [Google Scholar]
- Varley, J.; Weisz, J.; Weiss, J.; Allen, P.K. Generating multi-fingered robotic grasps via deep learning. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–3 October 2015; pp. 4415–4420. [Google Scholar] [CrossRef]
- Kulkarni, T.D.; Narasimhan, K.R.; Saeedi, A.; Tenenbaum, J.B. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. arXiv 2016, arXiv:1604.06057. [Google Scholar]
- Hou, Z.; Fei, J.; Deng, Y.; Xu, J. Data-efficient Hierarchical Reinforcement Learning for Robotic Assembly Control Applications. IEEE Trans. Ind. Electron. 2020. [Google Scholar] [CrossRef]
- Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Sallab, A.A.A.; Yogamani, S.; Perez, P. Deep Reinforcement Learning for Autonomous Driving: A Survey. arXiv 2020, arXiv:2002.00444. [Google Scholar]
- Zhu, Y.; Wang, Z.; Merel, J.; Rusu, A.A.; Erez, T.; Cabi, S.; Tunyasuvunakool, S.; Kramar, J.; Hadsell, R.; De Freitas, N.; et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills. arXiv 2018, arXiv:1802.09564. [Google Scholar]
- Zhu, Z.; Hu, H. Robot Learning from Demonstration in Robotic Assembly: A Survey. Robotics 2018, 7, 17. [Google Scholar]
- Andrychowicz, O.M.; Baker, B.; Chociej, M.; Jozefowicz, R.; Mcgrew, B.; Pachocki, J.; Petron, A.; Plappert, M.; Powell, G.; Ray, A.; et al. Learning Dexterous In-Hand Manipulation. Int. J. Robot. Res. 2020, 39, 3–20. [Google Scholar] [CrossRef] [Green Version]
- Levine, S.; Popovic, Z.; Koltun, V. Nonlinear Inverse Reinforcement Learning with Gaussian Processes. Adv. Neural Inf. Process. Syst. 2011, 24, 19–27. [Google Scholar]
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef] [Green Version]
- Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun. Surv. Tutorials 2019, 21, 3133–3174. [Google Scholar] [CrossRef] [Green Version]
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef] [Green Version]
- Nguyen, H.; La, H. Review of deep reinforcement learning for robot manipulation. In Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019; pp. 590–595. [Google Scholar]
- Fan, L.; Zhu, Y.; Zhu, J.; Liu, Z.; Zeng, O.; Gupta, A.; Creus-Costa, J.; Savarese, S.; Fei-Fei, L. Surreal: Open-source reinforcement learning framework and robot manipulation benchmark. In Proceedings of the Conference on Robot Learning, Zürich, Switzerland, 29–31 October 2018; pp. 767–782. [Google Scholar]
- Hester, T.; Quinlan, M.; Stone, P. Generalized model learning for Reinforcement Learning on a humanoid robot. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 2369–2374. [Google Scholar]
- Lioutikov, R.; Paraschos, A.; Peters, J.; Neumann, G. Sample-Based Information-Theoretic Stochastic Optimal Control. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–5 June 2014; pp. 3896–3902. [Google Scholar]
- Schenck, C.; Tompson, J.; Fox, D.; Levine, S. Learning Robotic Manipulation of Granular Media. In Proceedings of the Conference on Robot Learning, PMLR, Mountain View, CA, USA, 13–15 November 2017. [Google Scholar]
- Ross, S.; Gordon, G.J.; Bagnell, J.A. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 627–635. [Google Scholar]
- Sutton, R.S.; Mcallester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Proceedings of the NIPS, Denver, CO, USA, 29 November–4 December 1999; Volume 12, pp. 1057–1063. [Google Scholar]
- Peters, J.; Schaal, S. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients. Neural Netw. 2008, 21, 682–697. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Endo, G.; Morimoto, J.; Matsubara, T.; Nakanishi, J.; Cheng, G. Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot. Int. J. Robot. Res. 2008, 27, 213–228. [Google Scholar] [CrossRef] [Green Version]
- Deisenroth, M.P.; Rasmussen, C.E. PILCO: A Model-Based and Data-Efficient Approach to Policy Search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, 28 June–2 July 2011; pp. 465–472. [Google Scholar]
- Yahya, A.; Li, A.; Kalakrishnan, M.; Chebotar, Y.; Levine, S. Collective robot reinforcement learning with distributed asynchronous guided policy search. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 79–86. [Google Scholar]
- Gass, S.I.; Harris, C.M. Encyclopedia of Operations Research and Management Science. J. Am. Stat. Assoc. 1997, 92, 800. [Google Scholar]
- Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 2016, 17, 1334–1373. [Google Scholar]
- Levine, S.; Pastor, P.; Krizhevsky, A.; Quillen, D. Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection. Int. J. Robot. Res. 2018, 37, 421–436. [Google Scholar] [CrossRef]
- Shi, H.; Shi, L.; Xu, M.; Hwang, K.S. End-to-end navigation strategy with deep reinforcement learning for mobile robots. IEEE Trans. Ind. Inform. 2019, 16, 2393–2402. [Google Scholar] [CrossRef]
- Kroemer, O.; Detry, R.; Piater, J.; Peters, J. Combining active learning and reactive control for robot grasping. Robot. Auton. Syst. 2010, 58, 1105–1116. [Google Scholar] [CrossRef] [Green Version]
- Tan, X.; Chng, C.B.; Su, Y.; Lim, K.B.; Chui, C.K. Robot-assisted training in laparoscopy using deep reinforcement learning. IEEE Robot. Autom. Lett. 2019, 4, 485–492. [Google Scholar] [CrossRef]
- Asada, M.; Noda, S.; Tawaratsumida, S.; Hosoda, K. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Mach. Learn. 1996, 23, 279–303. [Google Scholar] [CrossRef] [Green Version]
- Wulfmeier, M.; Ondruska, P.; Posner, I. Maximum Entropy Deep Inverse Reinforcement Learning. arXiv 2015, arXiv:1507.04888. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
- Zhang, F.; Leitner, J.; Milford, M.; Upcroft, B.; Corke, P. Towards vision-based deep reinforcement learning for robotic motion control. arXiv 2015, arXiv:1511.03791. [Google Scholar]
- Hausknecht, M.; Stone, P. Deep Recurrent Q-Learning for Partially Observable MDPs. arXiv 2015, arXiv:1507.06527. [Google Scholar]
- Cao, J.; Liu, W.; Liu, Y.; Yang, J. Generalize Robot Learning From Demonstration to Variant Scenarios with Evolutionary Policy Gradient. Front. Neurorobot. 2020, 14. [Google Scholar] [CrossRef]
- Yang, C.; Chen, C.; Wang, N.; Ju, Z.; Fu, J.; Wang, M. Biologically Inspired Motion Modeling and Neural Control for Robot Learning From Demonstrations. IEEE Trans. Cogn. Dev. Syst. 2019, 11, 281–291. [Google Scholar]
- Levine, S.; Koltun, V. Learning Complex Neural Network Policies with Trajectory Optimization. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 829–837. [Google Scholar]
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.I.; Moritz, P. Trust Region Policy Optimization. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1889–1897. [Google Scholar]
- Mirowski, P.; Pascanu, R.; Viola, F.; Soyer, H.; Ballard, A.J.; Banino, A.; Denil, M.; Goroshin, R.; Sifre, L.; Kavukcuoglu, K.; et al. Learning to Navigate in Complex Environments. arXiv 2016, arXiv:1611.03673. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Harley, T.; Lillicrap, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
- Lillicrap, T.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2016, arXiv:1509.02971. [Google Scholar]
- Gu, S.; Lillicrap, T.; Sutskever, I.; Levine, S. Continuous Deep Q-Learning with Model-based Acceleration. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
- Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In Proceedings of the 2017 IEEE International conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3389–3396. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
- Bojarski, M.; Testa, D.D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.A.; Zhang, J.; et al. End to End Learning for Self-Driving Cars. arXiv 2016, arXiv:1604.07316. [Google Scholar]
- Kumar, V.; Gupta, A.; Todorov, E.; Levine, S. Learning Dexterous Manipulation Policies from Experience and Imitation. arXiv 2016, arXiv:1611.05095. [Google Scholar]
- Wu, Y.; Charoenphakdee, N.; Bao, H.; Tangkaratt, V.; Sugiyama, M. Imitation Learning from Imperfect Demonstration. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6818–6827. [Google Scholar]
- Takeda, T.; Hirata, Y.; Kosuge, K. Dance Step Estimation Method Based on HMM for Dance Partner Robot. IEEE Trans. Ind. Electron. 2007, 54, 699–706. [Google Scholar] [CrossRef]
- Calinon, S.; Billard, A. Incremental learning of gestures by imitation in a humanoid robot. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, Arlington, VA, USA, 10–12 March 2007; pp. 255–262. [Google Scholar]
- Calinon, S.; Dhalluin, F.; Sauser, E.L.; Caldwell, D.G.; Billard, A. Learning and Reproduction of Gestures by Imitation. IEEE Robot. Autom. Mag. 2010, 17, 44–54. [Google Scholar] [CrossRef] [Green Version]
- Gams, A.; Nemec, B.; Ijspeert, A.J.; Ude, A. Coupling Movement Primitives: Interaction with the Environment and Bimanual Tasks. IEEE Trans. Robot. 2014, 30, 816–830. [Google Scholar] [CrossRef] [Green Version]
- Zhang, T.; Mccarthy, Z.; Jowl, O.; Lee, D.; Chen, X.; Goldberg, K.; Abbeel, P. Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1–8. [Google Scholar]
- Ng, A.Y.; Russell, S. Algorithms for Inverse Reinforcement Learning; ICML: Stanford, CA, USA, 2000; Volume 67, pp. 663–670. [Google Scholar]
- Krishnan, S.; Garg, A.; Liaw, R.; Thananjeyan, B.; Miller, L.; Pokorny, F.T.; Goldberg, K. SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards. Int. J. Robot. Res. 2019, 38, 126–145. [Google Scholar] [CrossRef]
- Jiang, Y.; Yang, C.; Wang, Y.; Ju, Z.; Li, Y.; Su, C.Y. Multi-hierarchy interaction control of a redundant robot using impedance learning. Mechatronics 2020, 67, 102348. [Google Scholar] [CrossRef]
- Abbeel, P.; Ng, A.Y. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 1. [Google Scholar]
- Ratliff, N.; Bagnell, J.A.; Zinkevich, M. Maximum margin planning. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; Volume 3, pp. 729–736. [Google Scholar]
- Klein, E.; Geist, M.; Piot, B.; Pietquin, O. Inverse Reinforcement Learning through Structured Classification. In Proceedings of the NIPS, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1007–1015. [Google Scholar]
- Ho, J.; Gupta, J.K.; Ermon, S. Model-Free Imitation Learning with Policy Optimization. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
- Xia, C.; Kamel, A.E. Neural inverse reinforcement learning in autonomous navigation. Robot. Auton. Syst. 2016, 84, 1–14. [Google Scholar] [CrossRef]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1126–1135. [Google Scholar]
- Ziebart, B.D.; Maas, A.L.; Bagnell, J.A.; Dey, A.K. Maximum Entropy Inverse Reinforcement Learning; The AAAI Press: Menlo Park, CA, USA, 2008; pp. 1433–1438. [Google Scholar]
- Finn, C.; Levine, S.; Abbeel, P. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
- Boularias, A.; Kober, J.; Peters, J. Relative Entropy Inverse Reinforcement Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 182–189. [Google Scholar]
- Peng, X.B.; Abbeel, P.; Levine, S.; De Panne, M.V. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills. ACM Trans. Graph. 2018, 37, 143. [Google Scholar] [CrossRef] [Green Version]
- Cai, Q.; Hong, M.; Chen, Y.; Wang, Z. On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator. arXiv 2019, arXiv:1901.03674. [Google Scholar]
- Ho, J.; Ermon, S. Generative Adversarial Imitation Learning. arXiv 2016, arXiv:1606.03476. [Google Scholar]
- Kuefler, A.; Morton, J.; Wheeler, T.A.; Kochenderfer, M.J. Imitating driver behavior with generative adversarial networks. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 204–211. [Google Scholar]
- Baram, N.; Anschel, O.; Caspi, I.; Mannor, S. End-to-End Differentiable Adversarial Imitation Learning. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 390–399. [Google Scholar]
- Merel, J.; Tassa, Y.; Dhruva, T.B.; Srinivasan, S.; Lemmon, J.; Wang, Z.; Wayne, G.; Heess, N. Learning human behaviors from motion capture by adversarial imitation. arXiv 2017, arXiv:1707.02201. [Google Scholar]
- Wang, Z.; Merel, J.; Reed, S.; Wayne, G.; De Freitas, N.; Heess, N. Robust Imitation of Diverse Behaviors. arXiv 2017, arXiv:1707.02747. [Google Scholar]
- Stadie, B.C.; Abbeel, P.; Sutskever, I. Third-Person Imitation Learning. arXiv 2017, arXiv:1703.01703. [Google Scholar]
- Liu, Y.; Gupta, A.; Abbeel, P.; Levine, S. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018. [Google Scholar] [CrossRef] [Green Version]
- Gupta, A.; Devin, C.; Liu, Y.; Abbeel, P.; Levine, S. Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning. arXiv 2017, arXiv:1703.02949. [Google Scholar]
- Raileanu, R.; Goldstein, M.; Szlam, A.; Fergus, R. Fast Adaptation via Policy-Dynamics Value Functions. arXiv 2020, arXiv:2007.02879. [Google Scholar]
- Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive Neural Networks. arXiv 2016, arXiv:1606.04671. [Google Scholar]
- Tzeng, E.; Devin, C.; Hoffman, J.; Finn, C.; Abbeel, P.; Levine, S.; Saenko, K.; Darrell, T. Adapting Deep Visuomotor Representations with Weak Pairwise Constraints. In Algorithmic Foundations of Robotics XII; Springer: Cham, Switzerland, 2015. [Google Scholar] [CrossRef]
- Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Feifei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3357–3364. [Google Scholar]
- Peng, X.B.; Andrychowicz, M.; Zaremba, W.; Abbeel, P. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 3803–3810. [Google Scholar]
- Ammar, H.B.; Eaton, E.; Ruvolo, P.; Taylor, M.E. Online Multi-Task Learning for Policy Gradient Methods. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1206–1214. [Google Scholar]
- Rusu, A.A.; Vecerik, M.; Rothorl, T.; Heess, N.; Pascanu, R.; Hadsell, R. Sim-to-Real Robot Learning from Pixels with Progressive Nets. In Proceedings of the Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017; pp. 262–270. [Google Scholar]
- He, Z.; Julian, R.; Heiden, E.; Zhang, H.; Schaal, S.; Lim, J.J.; Sukhatme, G.S.; Hausman, K. Zero-Shot Skill Composition and Simulation-to-Real Transfer by Learning Task Representations. arXiv 2018, arXiv:1810.02422. [Google Scholar]
- Ramakrishnan, R.; Kamar, E.; Dey, D.; Horvitz, E.; Shah, J.A. Blind Spot Detection for Safe Sim-to-Real Transfer. J. Artif. Intell. Res. 2020, 67, 191–234. [Google Scholar] [CrossRef]
- Hwasser, M.; Kragic, D.; Antonova, R. Variational Auto-Regularized Alignment for Sim-to-Real Control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020. [Google Scholar]
- Golemo, F.; Taiga, A.A.; Courville, A.; Oudeyer, P.Y. Sim-to-real transfer with neural-augmented robot simulation. In Proceedings of the Conference on Robot Learning, Zürich, Switzerland, 29–31 October 2018; pp. 817–828. [Google Scholar]
- Jeong, R.; Kay, J.; Romano, F.; Lampe, T.; Rothorl, T.; Abdolmaleki, A.; Erez, T.; Tassa, Y.; Nori, F. Modelling Generalized Forces with Reinforcement Learning for Sim-to-Real Transfer. arXiv 2019, arXiv:1910.09471. [Google Scholar]
- Hwangbo, J.; Lee, J.; Dosovitskiy, A.; Bellicoso, D.; Tsounis, V.; Koltun, V.; Hutter, M. Learning agile and dynamic motor skills for legged robots. Sci. Robot. 2019, 4. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Matas, J.; James, S.; Davison, A.J. Sim-to-Real Reinforcement Learning for Deformable Object Manipulation. In Proceedings of the Conference on Robot Learning, Zürich, Switzerland, 29–31 October 2018. [Google Scholar]
- Sadeghi, F.; Toshev, A.; Jang, E.; Levine, S. Sim2Real View Invariant Visual Servoing by Recurrent Control. arXiv 2017, arXiv:1712.07642. [Google Scholar]
- Mees, O.; Merklinger, M.; Kalweit, G.; Burgard, W. Adversarial skill networks: Unsupervised robot skill learning from video. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 30 May–5 June 2020; pp. 4188–4194. [Google Scholar]
- Ogenyi, U.E.; Liu, J.; Yang, C.; Ju, Z.; Liu, H. Physical human-robot collaboration: Robotic systems, learning methods, collaborative strategies, sensors and actuators. IEEE Trans. Syst. Man Cybern. 2019, 1–14. [Google Scholar] [CrossRef] [Green Version]
- Gribovskaya, E.; Kheddar, A.; Billard, A. Motion learning and adaptive impedance for robot control during physical interaction with humans. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 4326–4332. [Google Scholar]
- Rozo, L.; Calinon, S.; Caldwell, D.G.; Jimenez, P.; Torras, C. Learning physical collaborative robot behaviors from human demonstrations. IEEE Trans. Robot. 2016, 32, 513–527. [Google Scholar] [CrossRef] [Green Version]
- Calinon, S.; Evrard, P.; Gribovskaya, E.; Billard, A.; Kheddar, A. Learning collaborative manipulation tasks by demonstration using a haptic interface. In Proceedings of the 2009 International Conference on Advanced Robotics, Munich, Germany, 22–26 June 2009; pp. 1–6. [Google Scholar]
- Evrard, P.; Gribovskaya, E.; Calinon, S.; Billard, A.; Kheddar, A. Teaching physical collaborative tasks: Object-lifting case study with a humanoid. In Proceedings of the 2009 9th IEEE-RAS International Conference on Humanoid Robots, Paris, France, 7–10 December 2009; pp. 399–404. [Google Scholar]
- Levine, S.; Wagener, N.; Abbeel, P. Learning contact-rich manipulation skills with guided policy search. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 156–163. [Google Scholar]
- Kaelbling, L.P.; Lozanoperez, T. Unifying perception, estimation and action for mobile manipulation via belief space planning. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, St. Paul, MN, USA, 14–18 May 2012; pp. 2952–2959. [Google Scholar]
- Platt, R.W.; Kaelbling, L.P.; Lozanoperez, T.; Tedrake, R. Efficient Planning in Non-Gaussian Belief Spaces and Its Application to Robot Grasping. In Robotics Research; Springer: Cham, Switzerland, 2017; pp. 253–269. [Google Scholar]
- Li, Z.; Zhao, T.; Chen, F.; Hu, Y.; Su, C.; Fukuda, T. Reinforcement Learning of Manipulation and Grasping Using Dynamical Movement Primitives for a Humanoidlike Mobile Manipulator. IEEE-ASME Trans. Mechatron. 2017, 23, 121–131. [Google Scholar] [CrossRef]
- Kulvicius, T.; Biehl, M.; Aein, M.J.; Tamosiunaite, M.; Worgotter, F. Interaction learning for dynamic movement primitives used in cooperative robotic tasks. Robot. Auton. Syst. 2013, 61, 1450–1459. [Google Scholar] [CrossRef]
- Zhao, T.; Deng, M.; Li, Z.; Hu, Y. Cooperative Manipulation for a Mobile Dual-Arm Robot Using Sequences of Dynamic Movement Primitives. IEEE Trans. Cogn. Dev. Syst. 2020, 12, 18–29. [Google Scholar] [CrossRef]
- Xue, Z.; Ruehl, S.W.; Hermann, A.; Kerscher, T.; Dillmann, R. Autonomous grasp and manipulation planning using a ToF camera. Robot. Auton. Syst. 2012, 60, 387–395. [Google Scholar] [CrossRef]
- Jonschkowski, R.; Brock, O. Learning state representations with robotic priors. Auton. Robot. 2015, 39, 407–428. [Google Scholar] [CrossRef]
- Meng, J.; Zhang, S.; Bekyo, A.; Olsoe, J.; Baxter, B.; He, B. Noninvasive Electroencephalogram Based Control of a Robotic Arm for Reach and Grasp Tasks. Sci. Rep. 2016, 6, 38565. [Google Scholar] [CrossRef] [Green Version]
- Hahne, J.M.; Schweisfurth, M.A.; Koppe, M.; Farina, D. Simultaneous control of multiple functions of bionic hand prostheses: Performance and robustness in end users. Sci. Robot. 2018, 3. [Google Scholar] [CrossRef] [Green Version]
- Li, Z.; Xu, C.; Wei, Q.; Shi, C.; Su, C. Human-Inspired Control of Dual-Arm Exoskeleton Robots with Force and Impedance Adaptation. IEEE Trans. Syst. Man Cybern. 2019, 50, 5296–5305. [Google Scholar] [CrossRef]
- Calinon, S.; Bruno, D.; Malekzadeh, M.S.; Nanayakkara, T.; Caldwell, D.G. Human-robot skills transfer interfaces for a flexible surgical robot. Comput. Methods Programs Biomed. 2014, 116, 81–96. [Google Scholar] [CrossRef]
- Hu, D.; Gong, Y.; Hannaford, B.; Seibel, E.J. Semi-autonomous simulated brain tumor ablation with RAVENII Surgical Robot using behavior tree. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 3868–3875. [Google Scholar]
- Deng, M.; Li, Z.; Kang, Y.; Chen, C.L.P.; Chu, X. A Learning-Based Hierarchical Control Scheme for an Exoskeleton Robot in Human—Robot Cooperative Manipulation. IEEE Trans. Cybern. 2020, 50, 112–125. [Google Scholar] [CrossRef] [PubMed]
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wen, Y.; Si, J.; Brandt, A.; Gao, X.; Huang, H.H. Online Reinforcement Learning Control for the Personalization of a Robotic Knee Prosthesis. IEEE Trans. Cybern. 2020, 50, 2346–2356. [Google Scholar] [CrossRef] [PubMed]
- Duan, Y.; Schulman, J.; Chen, X.; Bartlett, P.L.; Sutskever, I.; Abbeel, P. Fast Reinforcement Learning via Slow Reinforcement Learning. arXiv 2017, arXiv:1611.02779. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 1999. [Google Scholar]
- Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-learning with memory-augmented neural networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1842–1850. [Google Scholar]
- Andrychowicz, M.; Denil, M.; Gomez, S.; Hoffman, M.W.; Pfau, D.; Schaul, T.; Shillingford, B.; De Freitas, N. Learning to learn by gradient descent by gradient descent. arXiv 2016, arXiv:1606.04474. [Google Scholar]
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching networks for one shot learning. arXiv 2016, arXiv:1606.04080. [Google Scholar]
- Ravi, S.; Larochelle, H. Optimization as a Model for Few-Shot Learning; ICLR: Toulon, France, 2017. [Google Scholar]
- Sung, F.; Zhang, L.; Xiang, T.; Hospedales, T.M.; Yang, Y. Learning to Learn: Meta-Critic Networks for Sample Efficient Learning. arXiv 2017, arXiv:1706.09529. [Google Scholar]
- Duan, Y.; Andrychowicz, M.; Stadie, B.C.; Ho, J.; Schneider, J.; Sutskever, I.; Abbeel, P.; Zaremba, W. One-Shot Imitation Learning. In Proceedings of the NIPS, Long Beach, CA, USA, 4–9 December 2017; pp. 1087–1098. [Google Scholar]
- Finn, C.; Yu, T.; Zhang, T.; Abbeel, P.; Levine, S. One-Shot Visual Imitation Learning via Meta-Learning. In Proceedings of the Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017; pp. 357–368. [Google Scholar]
- Yu, T.; Finn, C.; Xie, A.; Dasari, S.; Zhang, T.; Abbeel, P.; Levine, S. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. arXiv 2018, arXiv:1802.01557. [Google Scholar]
- Xu, D.; Nair, S.; Zhu, Y.; Gao, J.; Garg, A.; Feifei, L.; Savarese, S. Neural Task Programming: Learning to Generalize Across Hierarchical Tasks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 3795–3802. [Google Scholar]
| Model-Based/Free | Ref. | Year | Authors | Algorithm | Value/Policy-Based |
|---|---|---|---|---|---|
| Model-based | [40] | 2010 | Hester et al. | DT | Value-based |
| Model-based | [41] | 2014 | Lioutikov et al. | LLSE | Value-based |
| Model-based | [42] | 2017 | Schenck et al. | CNN | Value-based |
| Model-based | [47] | 2011 | Deisenroth et al. | PILCO | Policy-based |
| Model-based | [50] | 2016 | Levine et al. | GPS | Policy-based |
| Model-based | [51] | 2018 | Levine et al. | CEM | Policy-based |
| Model-free | [58] | 2015 | Zhang et al. | DQN | Value-based |
| Model-free | [63] | 2015 | Schulman et al. | TRPO | Policy-based |
| Model-free | [33] | 2018 | Andrychowicz et al. | PPO | Policy-based |
| Model-free | [64] | 2016 | Mirowski et al. | A3C | Policy-based |
| Model-free | [66] | 2016 | Lillicrap et al. | DDPG | Both |
| Model-free | [67] | 2016 | Gu et al. | NAF | Both |
| Model-free | [68] | 2017 | Gu et al. | Asynchronous NAF | Both |
| Model-free | [69] | 2018 | Haarnoja et al. | SAC | Both |
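As a concrete anchor for the value-based, model-free entries above (DQN and its relatives), the following minimal sketch shows the temporal-difference update they all share. It is an illustration only, not any cited implementation: the environment, the state/action sizes, and the hyperparameters are assumptions invented for the example, and a tabular Q-array stands in for the deep Q-network.

```python
import numpy as np

n_states, n_actions = 16, 4          # invented toy problem, not a robot model
alpha, gamma, eps = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate
Q = np.zeros((n_states, n_actions))  # tabular stand-in for the deep Q-network

rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical environment: a deterministic walk with a goal at the last state."""
    next_state = (state + action + 1) % n_states
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(200):
    state = 0
    for _ in range(100):                       # step cap keeps episodes finite
        # epsilon-greedy: mostly exploit current estimates, sometimes explore
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # TD target: immediate reward plus discounted value of the best next action
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break
```

Policy-based entries such as TRPO or PPO replace this value update with direct gradient ascent on expected return, while the "Both" rows (DDPG, NAF, SAC) maintain an actor and a critic simultaneously.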
| Categories | Ref. | Year | Authors | Algorithms |
|---|---|---|---|---|
| Behavior Cloning | [73] | 2007 | Takeda et al. | HMM |
| Behavior Cloning | [74] | 2007 | Calinon et al. | GMM |
| Behavior Cloning | [75] | 2010 | Calinon et al. | GMR |
| Behavior Cloning | [76] | 2014 | Gams et al. | DMPs |
| Behavior Cloning | [77] | 2018 | Zhang et al. | VR |
| Inverse Reinforcement Learning | [81] | 2004 | Abbeel et al. | MP |
| Inverse Reinforcement Learning | [82] | 2006 | Ratliff et al. | MMP |
| Inverse Reinforcement Learning | [83] | 2012 | Klein et al. | SC |
| Inverse Reinforcement Learning | [84] | 2016 | Ho et al. | AL |
| Inverse Reinforcement Learning | [85] | 2016 | Xia et al. | NIRL |
| Inverse Reinforcement Learning | [87] | 2008 | Ziebart et al. | Maximum Entropy IRL |
| Inverse Reinforcement Learning | [89] | 2011 | Boularias et al. | Relative Entropy IRL |
| Inverse Reinforcement Learning | [90] | 2018 | Peng et al. | DeepMimic |
| Generative Adversarial Imitation Learning | [94] | 2017 | Baram et al. | MGAIL |
| Generative Adversarial Imitation Learning | [95] | 2017 | Merel et al. | Extended GAIL |
| Generative Adversarial Imitation Learning | [96] | 2017 | Wang et al. | VAE |
| Generative Adversarial Imitation Learning | [97] | 2017 | Stadie et al. | TPIL |
| Generative Adversarial Imitation Learning | [98] | 2017 | Liu et al. | IFO |
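Of the three families in the table, behavior cloning is the most direct to express in code: it reduces imitation to supervised learning on state-action pairs. The sketch below is an illustrative toy under stated assumptions, not any cited system; the demonstration data are synthetic, and a linear least-squares policy stands in for the neural networks used in the works above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "demonstrations": states paired with the expert's actions (sizes invented).
n_demos, state_dim, action_dim = 500, 6, 2
states = rng.normal(size=(n_demos, state_dim))
expert_weights = rng.normal(size=(state_dim, action_dim))  # hidden expert policy
actions = states @ expert_weights + 0.01 * rng.normal(size=(n_demos, action_dim))

# Behavior cloning: fit a policy that regresses expert actions from states.
# A linear model solved by least squares stands in for a deep network here.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def cloned_policy(state):
    """Return the action the cloned policy predicts for a given state."""
    return state @ W

# The clone matches the expert closely on the demonstration distribution; a known
# limitation of behavior cloning is that errors compound on states it never saw.
print(np.max(np.abs(cloned_policy(states) - actions)))  # small residual error
```

Inverse RL and GAIL differ precisely in avoiding this direct regression: the former recovers a reward function that explains the demonstrations, and the latter trains the policy against a discriminator that tries to tell its trajectories from the expert's.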
| Improved Methods | Ref. | Year | Authors | Approaches |
|---|---|---|---|---|
| Better Simulation | [110] | 2018 | Golemo et al. | Neural-augmented simulation |
| Better Simulation | [102] | 2015 | Tzeng et al. | Weak pairwise constraints |
| Better Simulation | [103] | 2017 | Zhu et al. | Framework of AI2-THOR |
| Policy Randomization | [104] | 2018 | Peng et al. | Randomizing the dynamics of the simulator |
| Policy Randomization | [105] | 2014 | Ammar et al. | Multi-task policy gradient |
| Policy Randomization | [106] | 2017 | Rusu et al. | Progressive neural networks |
| Robust Policy | [107] | 2018 | He et al. | Model-predictive control |
| Robust Policy | [108] | 2020 | Ramakrishnan et al. | The oracle feedback |
| Robust Policy | [109] | 2020 | Hwasser et al. | Variational auto-regularized alignment |
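The policy-randomization rows are easy to make concrete: dynamics randomization, as in Peng et al. [104], resamples the simulator's physical parameters for every training episode so the learned policy cannot overfit a single, inevitably inaccurate model. The sketch below shows only that resampling loop; the parameter names, ranges, and the commented-out `make_sim`/`train_episode` hooks are placeholders assumed for the example, not any cited API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ranges for randomized simulator dynamics (all values invented).
PARAM_RANGES = {
    "link_mass_scale":  (0.8, 1.2),    # multiplier on nominal link masses
    "friction_coeff":   (0.5, 1.5),    # contact friction coefficient
    "action_latency_s": (0.00, 0.04),  # seconds of actuation delay
}

def sample_dynamics():
    """Draw one simulator configuration; a fresh draw is used for each episode."""
    return {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in PARAM_RANGES.items()}

for episode in range(3):
    params = sample_dynamics()
    # env = make_sim(**params)      # hypothetical simulator factory
    # train_episode(policy, env)    # hypothetical RL update on the randomized env
    print(f"episode {episode}: {params}")
```

A policy trained across many such draws treats the real robot's dynamics as just one more sample from the distribution, which is the core intuition behind sim-to-real transfer by randomization.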
| Applications | Classical Demos | References |
|---|---|---|
| Industrial Robot | Peg-in-hole | [117] |
| Industrial Robot | Grinding and polishing | [118] |
| Industrial Robot | Welding | [119] |
| Industrial Robot | Human-machine collaboration | [120] |
| Personal Robot | Ironing clothes | [121,122] |
| Personal Robot | Pouring water | [123,124,125] |
| Personal Robot | Autonomous navigation | [53,126] |
| Personal Robot | Obstacle avoidance | [76,127,128] |
| Medical Robot | Rehabilitation training | [129,130] |
| Medical Robot | Surgical operation | [131,132,133] |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).