Locally connected interrelated network: A forward propagation primitive

Published: 01 May 2023

Abstract

End-to-end learning for planning is a promising approach for finding good robot strategies in situations where the state transition, observation, and reward functions are initially unknown. Many neural network architectures for this approach have shown positive results. Across these networks, seemingly small components have been used repeatedly in different architectures, which means improving the efficiency of these components has great potential to improve the overall performance of the network. This paper aims to improve one such component: the forward propagation module. In particular, we propose the Locally Connected Interrelated Network (LCI-Net) – a novel type of locally connected layer with unshared but interrelated weights – to improve the efficiency of learning stochastic transition models for planning and of propagating information via the learned transition models. LCI-Net is a small differentiable neural network module that can be plugged into various existing architectures. For evaluation purposes, we apply LCI-Net to VIN and QMDP-Net. VIN is an end-to-end neural network for solving Markov Decision Processes (MDPs) whose transition and reward functions are initially unknown, while QMDP-Net is its counterpart for Partially Observable Markov Decision Processes (POMDPs) whose transition, observation, and reward functions are initially unknown. Simulation tests on benchmark problems involving 2D and 3D navigation and grasping indicate promising results: replacing the forward propagation module alone with LCI-Net improves the generalisation capability of VIN and QMDP-Net by more than 3× and 10×, respectively.
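
The abstract describes LCI-Net as a locally connected layer with unshared but interrelated weights that slots into the forward propagation step of VIN-style architectures. For orientation, below is a minimal sketch of the plain locally connected building block such a layer generalises: a convolution-like layer with a separate filter at every spatial position. This is not the paper's implementation – the way the per-position weights are made "interrelated" is the paper's contribution and is not reproduced here – and the class name, shapes, and PyTorch usage are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of a locally connected 2D layer:
# like a convolution, but with a separate (unshared) filter at every spatial
# position. LCI-Net builds on this idea by coupling ("interrelating") the
# unshared weights; that coupling is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    def __init__(self, in_ch, out_ch, height, width, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.out_ch = out_ch
        self.h, self.w = height, width
        # One filter bank per output location: (H*W, out_ch, in_ch*k*k)
        self.weight = nn.Parameter(
            torch.randn(height * width, out_ch, in_ch * kernel_size ** 2) * 0.01
        )
        self.bias = nn.Parameter(torch.zeros(height * width, out_ch))

    def forward(self, x):
        # x: (B, in_ch, H, W). Extract a k*k patch around every position.
        patches = F.unfold(x, self.k, padding=self.k // 2)   # (B, in_ch*k*k, H*W)
        patches = patches.permute(2, 0, 1)                   # (H*W, B, in_ch*k*k)
        # Apply a different filter at every position (unshared weights).
        out = torch.einsum('pbf,pof->pbo', patches, self.weight) + self.bias[:, None, :]
        return out.permute(1, 2, 0).reshape(-1, self.out_ch, self.h, self.w)

# Illustrative usage: propagating a batch of value maps on an 8x8 grid,
# in place of the shared-weight convolution a VIN-style module would use.
layer = LocallyConnected2d(in_ch=1, out_ch=4, height=8, width=8)
values = torch.rand(2, 1, 8, 8)
print(layer(values).shape)  # torch.Size([2, 4, 8, 8])
```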


Published In

International Journal of Robotics Research, Volume 42, Issue 6
May 2023
160 pages

Publisher

Sage Publications, Inc.

United States


Author Tags

  1. deep learning
  2. imitation learning
  3. reinforcement learning
  4. planning
  5. POMDP

Qualifiers

  • Research-article
