GRI: General Reinforced Imitation and Its Application to Vision-Based Autonomous Driving
Figure 1. GRI is applied to vision-based autonomous driving in an end-to-end pipeline composed of a perception module, which encodes the RGB images from three cameras on the driving agent, and a decision-making module, which infers an action from the encoded features. The pipeline is trained in two phases: (1) the visual encoders are pretrained on several auxiliary tasks: semantic segmentation, road type classification, detection of a relevant traffic light and, if one is present, its state and the distance to it; (2) the visual encoders are frozen and a GRI-based DRL network is trained with both pre-generated expert data from an offline demonstration agent and an online exploration agent gathering data from a simulator. At any given training step, the next episode added to the replay buffer comes from the demonstration agent with probability $p_{demo}$, and otherwise from the exploration agent. Each action is a (steering, throttle) pair applied to the car.
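The data-collection rule in this caption can be sketched in a few lines. The Python snippet below is a minimal illustration under assumed interfaces, not the authors' implementation; `demo_dataset`, `exploration_agent`, `env`, and `replay_buffer` are hypothetical placeholders.

```python
import random

def collect_next_episode(demo_dataset, exploration_agent, env, replay_buffer, p_demo=0.2):
    """Add one episode to the replay buffer: with probability p_demo it comes from
    the offline demonstration agent, otherwise from the online exploration agent."""
    if random.random() < p_demo:
        # Offline branch: replay a pre-generated expert episode.
        episode = demo_dataset.sample_episode()
    else:
        # Online branch: roll out the current policy in the simulator.
        episode = []
        state, done = env.reset(), False
        while not done:
            action = exploration_agent.act(state)            # (steering, throttle)
            next_state, reward, done, info = env.step(action)
            episode.append((state, action, reward, next_state, done))
            state = next_state
    replay_buffer.add(episode)
```

In the distributed setting described below, the same ratio of demonstration to exploration data is obtained through the proportion of demonstration agents rather than an explicit sampling probability.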
Figure 2. Feature extraction from RGB camera images for the visual subsystem. Two encoder-decoder networks are pretrained on segmentation, classification, and regression tasks. Classification and regression are performed only on the center image, while all three images are segmented. After training, the visual encoders serve as fixed feature extractors with frozen weights. For the DRL backbone training, both encoder outputs are concatenated and stored in the memory buffer as the DRL input. Both encoders are EfficientNet-b1. The segmentation decoder is fully convolutional, and the classification decoder is an MLP with several outputs.
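As a rough illustration of this two-encoder design, the sketch below builds two EfficientNet-b1 backbones from torchvision with a fully convolutional segmentation head and an MLP head. All layer sizes, class counts, and output dimensions are assumptions made for the example, not the published architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b1

class VisualSubsystem(nn.Module):
    """Two EfficientNet-b1 encoder-decoder networks: one for semantic segmentation
    (all three cameras), one for the classification/regression tasks (center camera)."""

    def __init__(self, n_seg_classes=6, n_aux_outputs=9):
        super().__init__()
        # Both encoders are EfficientNet-b1 backbones without their classification heads.
        self.seg_encoder = efficientnet_b1(weights=None).features
        self.cls_encoder = efficientnet_b1(weights=None).features
        # Fully convolutional segmentation decoder (layer sizes are illustrative).
        self.seg_decoder = nn.Sequential(
            nn.Conv2d(1280, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(256, n_seg_classes, kernel_size=1),
        )
        # MLP decoder with several outputs: road type, traffic-light presence,
        # light state, and distance to the light (sizes are placeholders).
        self.cls_decoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(1280, 256), nn.ReLU(),
            nn.Linear(256, n_aux_outputs),
        )

    def encode(self, image):
        """After pretraining, both encoder outputs are pooled and concatenated to form
        the feature vector stored in the DRL replay buffer."""
        f_seg = self.seg_encoder(image).mean(dim=(2, 3))   # global average pooling
        f_cls = self.cls_encoder(image).mean(dim=(2, 3))
        return torch.cat([f_seg, f_cls], dim=1)
```

Freezing the feature extractors for the DRL phase would then amount to calling `requires_grad_(False)` on both encoders before training the backbone.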
Figure 3. Simplified representation of the distributed GRIAD setup with a Rainbow-IQN Ape-X backbone. A central computer receives data into a shared replay buffer from both exploration and demonstration agents running on other computers. Data are sampled from this replay buffer to perform backpropagation and update the weights of all the agents. Images from the agents are encoded using the network presented in Figure 2 before being stored in the memory buffer.
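The following single-process sketch illustrates that loop under simplifying assumptions: a uniform-sampling buffer stands in for the prioritized replay of the actual distributed Rainbow-IQN Ape-X backbone, and names such as `learner.update` and `agent.load_weights` are placeholders rather than the authors' code.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Uniform-sampling stand-in for the shared replay buffer of Figure 3."""

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # A transition holds the already-encoded image features, not raw images.
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def learner_loop(learner, replay_buffer, agents, n_steps, batch_size=32, sync_every=1_000):
    """Central learner: sample from the shared buffer, backpropagate, and
    periodically push updated weights back to the exploration agents."""
    for step in range(n_steps):
        batch = replay_buffer.sample(batch_size)
        learner.update(batch)                      # one gradient step on the DRL backbone
        if step % sync_every == 0:
            for agent in agents:
                agent.load_weights(learner.get_weights())
```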
Figure 4. MuJoCo environments used in our experiments: HalfCheetah-v2, Humanoid-v2, Ant-v2, and Walker2d-v2, respectively. Their joints are actuated to make them walk, and rewards depend on the distance covered.
Figure 5. Ablation over demonstration agents with the GRI-SAC setup on MuJoCo environments, analyzing the evolution of the evaluation reward as a function of the proportion of demonstration agents. GRI-SAC with 0% demonstration agents is vanilla SAC. We observe that GRI-SAC always reaches the level of the expert, even when the expert is significantly better than the trained vanilla SAC. The proportion of demonstration agents has a significant impact on the convergence dynamics.
Figure 6. GRI-DDPG with 20% demonstration agents on MuJoCo environments. GRI-DDPG systematically leads to a better reward than vanilla DDPG. However, contrary to GRI-SAC, GRI-DDPG with 20% demonstration agents does not systematically reach the expert level.
Abstract
1. Introduction
- Definition of the novel GRI method to combine offline demonstrations and online exploration.
- Presentation and ablation study of the GRI for Autonomous Driving (GRIAD) algorithm for vision-based driving.
- Further analysis of GRI-based algorithms on the MuJoCo benchmark.
2. Related Work
2.1. End-to-End Autonomous Driving on CARLA
2.2. Learning from Demonstration and Exploration
3. General Reinforced Imitation
3.1. Method
Algorithm 1: GRI: General Reinforced Imitation.
3.2. GRI for Autonomous Driving
4. Experimental Results
4.1. GRIAD on CARLA
4.2. GRI on the MuJoCo Benchmark
- For HalfCheetah-v2, a difficult task on which the expert is significantly stronger than the trained SAC, we observe that the beginning of training is slower with GRI-SAC; we call this a warm-up phase and explain it further in Section 4.3. However, the reward turns out to become significantly higher after some time. Here, GRI-SAC is better than SAC for every proportion of demonstration agents. The best scores were reached with 10% and 20% demonstration agents.
- For Humanoid-v2, a difficult task on which the expert is only slightly stronger than the trained SAC, we observe that the higher the proportion of demonstration agents, the longer the warm-up phase. Nonetheless, GRI-SAC models end up reaching higher rewards after their warm-up phase. The best scores are reached with 10% and 20% demonstration agents.
- Ant-v2 and Walker2d-v2 are the easiest of the four evaluated tasks. On Ant-v2, the SAC agent reaches the expert level, converging similarly to GRI-SAC regardless of the number of demonstration agents used; nevertheless, GRI-SAC converges faster with 10% and 20% demonstration agents. On Walker2d-v2, the final reward of GRI-SAC is significantly higher and reaches the expert level, while SAC remains below it.
GRI with DDPG as the DRL Backbone
4.3. Limitations and Quantitative Insights
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bojarski, M.; Testa, D.D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to End Learning for Self-Driving Cars. arXiv 2016, arXiv:1604.07316. [Google Scholar]
- Osa, T.; Pajarinen, J.; Neumann, G.; Bagnell, J.A.; Abbeel, P.; Peters, J. An algorithmic perspective on imitation learning. Found. Trends® Robot. 2018, 7, 1–179. [Google Scholar] [CrossRef]
- Prakash, A.; Chitta, K.; Geiger, A. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021. [Google Scholar]
- Toromanoff, M.; Wirbel, E.; Wilhelm, F.; Vejarano, C.; Perrotton, X.; Moutarde, F. End to End Vehicle Lateral Control Using a Single Fisheye Camera. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3613–3619. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1587–1596. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; Volume 48, pp. 1928–1937. [Google Scholar]
- Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning, Mountain View, CA, USA, 13–15 November 2017; pp. 1–16. [Google Scholar]
- Chen, D.; Koltun, V.; Krähenbühl, P. Learning to drive from a world on rails. In Proceedings of the ICCV, Virtual, 11–17 October 2021. [Google Scholar]
- Codevilla, F.; Santana, E.; Lopez, A.; Gaidon, A. Exploring the Limitations of Behavior Cloning for Autonomous Driving. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9328–9337. [Google Scholar] [CrossRef]
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 5026–5033. [Google Scholar] [CrossRef]
- Chen, D.; Zhou, B.; Koltun, V.; Krähenbühl, P. Learning by Cheating. In Proceedings of the Conference on Robot Learning (CoRL), London, UK, 8–11 November 2019. [Google Scholar]
- Gordon, D.; Kadian, A.; Parikh, D.; Hoffman, J.; Batra, D. SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1022–1031. [Google Scholar] [CrossRef]
- Toromanoff, M.; Wirbel, E.; Moutarde, F. End-to-End Model-Free Reinforcement Learning for Urban Driving Using Implicit Affordances. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Zhang, Z.; Liniger, A.; Dai, D.; Yu, F.; Van Gool, L. End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021. [Google Scholar]
- Hester, T.; Vecerík, M.; Pietquin, O.; Lanctot, M.; Schaul, T.; Piot, B.; Sendonaris, A.; Dulac-Arnold, G.; Osband, I.; Agapiou, J.P.; et al. Learning from Demonstrations for Real World Reinforcement Learning. arXiv 2017, arXiv:1704.03732. [Google Scholar]
- Reddy, S.; Dragan, A.D.; Levine, S. SQIL: Imitation Learning via Regularized Behavioral Cloning. arXiv 2019, arXiv:1905.11108. [Google Scholar]
- Rajeswaran, A.; Kumar, V.; Gupta, A.; Schulman, J.; Todorov, E.; Levine, S. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arXiv 2017, arXiv:1709.10087. [Google Scholar]
- Martin, J.B.; Chekroun, R.; Moutarde, F. Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling. arXiv 2021, arXiv:2110.14464. [Google Scholar]
- Xu, D.; Nair, S.; Zhu, Y.; Gao, J.; Garg, A.; Fei-Fei, L.; Savarese, S. Neural Task Programming: Learning to Generalize Across Hierarchical Tasks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 3795–3802. [Google Scholar] [CrossRef]
- Gao, Y.; Xu, H.; Lin, J.; Yu, F.; Levine, S.; Darrell, T. Reinforcement Learning from Imperfect Demonstrations. arXiv 2018, arXiv:1802.05313. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Hessel, M.; Modayil, J.; van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.G.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. arXiv 2017, arXiv:1710.02298. [Google Scholar] [CrossRef]
- Dabney, W.; Ostrovski, G.; Silver, D.; Munos, R. Implicit Quantile Networks for Distributional Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1096–1105. [Google Scholar]
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
- Toromanoff, M.; Wirbel, E.; Moutarde, F. Is Deep Reinforcement Learning Really Superhuman on Atari? In Proceedings of the Deep Reinforcement Learning Workshop of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Hu, H.; Liu, Z.; Chitlangia, S.; Agnihotri, A.; Zhao, D. Investigating the impact of multi-lidar placement on object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 2550–2559. [Google Scholar]
- Wu, P.; Jia, X.; Chen, L.; Yan, J.; Li, H.; Qiao, Y. Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Adv. Neural Inf. Process. Syst. 2022, 35, 6119–6132. [Google Scholar]
- Shao, H.; Wang, L.; Chen, R.; Waslander, S.L.; Li, H.; Liu, Y. ReasonNet: End-to-End Driving with Temporal and Global Reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 13723–13733. [Google Scholar]
- Chen, D.; Krähenbühl, P. Learning from all vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17222–17231. [Google Scholar]
- Shao, H.; Wang, L.; Chen, R.; Li, H.; Liu, Y. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023; pp. 726–737. [Google Scholar]
- Fujita, Y.; Nagarajan, P.; Kataoka, T.; Ishikawa, T. ChainerRL: A Deep Reinforcement Learning Library. J. Mach. Learn. Res. 2021, 22, 3557–3570. [Google Scholar]
| Method | Cameras | LiDAR | IMU | DS (Driving Score) | RC (Route Completion) | IS (Infraction Score) |
|---|---|---|---|---|---|---|
| GRIAD (ours) | 3 | ✗ | ✗ | 36.79 | 61.85 | 0.60 |
| Rails [10] | 4 | ✗ | ✗ | 31.37 | 57.65 | 0.56 |
| IAs [15] | 1 | ✗ | ✗ | 24.98 | 46.97 | 0.52 |
| TCP [30] | 1 | ✗ | ✓ | 75.13 | 85.53 | 0.87 |
| Latent Transfuser [3] | 3 | ✗ | ✓ | 45.2 | 66.31 | 0.72 |
| LBC [13] | 3 | ✗ | ✓ | 10.9 | 21.3 | 0.55 |
| ReasonNet [31] | 4 | ✓ | ✓ | 79.95 | 89.89 | 0.89 |
| LAV [32] | 4 | ✓ | ✓ | 61.8 | 94.5 | 0.64 |
| InterFuser [33] | 3 | ✓ | ✓ | 76.18 | 88.23 | 0.84 |
| Transfuser+ [3] | 4 | ✓ | ✗ | 50.5 | 73.8 | 0.68 |
| Task | Town, Weather | GRIAD (Explo. 12 M) | GRIAD (Explo. 12 M + Demo. 4 M) | GRIAD (Explo. 16 M) |
|---|---|---|---|---|
| Empty | train, train | 96.3 ± 1.5 | 98.0 ± 1.7 | 98.0 ± 1.0 |
| Regular | train, train | 95.0 ± 2.4 | 98.3 ± 1.7 | 98.6 ± 1.2 |
| Dense | train, train | 91.7 ± 2.0 | 93.7 ± 1.7 | 95.0 ± 1.6 |
| Empty | test, train | 83.3 ± 3.7 | 94.0 ± 1.6 | 96.3 ± 1.7 |
| Regular | test, train | 82.6 ± 3.7 | 93.0 ± 0.8 | 96.3 ± 2.5 |
| Dense | test, train | 61.6 ± 2.0 | 77.7 ± 4.5 | 78.0 ± 2.8 |
| Empty | train, test | 67.3 ± 1.9 | 83.3 ± 2.5 | 73.3 ± 2.5 |
| Regular | train, test | 76.7 ± 2.5 | 86.7 ± 2.5 | 81.3 ± 2.5 |
| Dense | train, test | 67.3 ± 2.5 | 82.6 ± 0.9 | 80.0 ± 1.6 |
| Empty | test, test | 60.6 ± 2.5 | 68.7 ± 0.9 | 62.0 ± 1.6 |
| Regular | test, test | 59.3 ± 2.5 | 63.3 ± 2.5 | 56.7 ± 3.4 |
| Dense | test, test | 40.0 ± 1.6 | 52.0 ± 4.3 | 46.0 ± 3.3 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).