Boundary-aware value function generation for safe stochastic motion planning

Published: 16 October 2024

Abstract

Navigation safety is critical for many autonomous systems, such as self-driving vehicles in urban environments. It requires explicit consideration of boundary constraints that describe the borders of any infeasible, non-navigable, or unsafe regions. We propose a principled boundary-aware safe stochastic planning framework. Our method generates a value function that strictly separates the values of free (safe) states from those of non-navigable (boundary) states over the continuous state space, naturally leading to a safe boundary-aware policy. At the core of our solution lies a seamless integration of finite elements and kernel-based functions: the finite elements allow us to characterize the borders of safety-critical states accurately, while the kernel-based functions speed up computation for the non-safety-critical states. The proposed method was evaluated through extensive simulations and demonstrated safe behavior in mobile navigation tasks. Additionally, we show that a ground vehicle running our approach maneuvers safely and efficiently in cluttered real-world environments under strong external disturbances, such as a slippery floor and direct human intervention.
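To ground the abstract's terminology: in a diffusion Markov decision process with dynamics $dx = f(x,u)\,dt + \sigma(x)\,dW_t$, the continuous-state value function solves a second-order Hamilton-Jacobi-Bellman (HJB) equation. In one generic form (sign and discount conventions vary across formulations, and the paper's exact setup may differ),

$$0 = \min_{u}\Big[\, c(x,u) + \nabla V(x)^{\top} f(x,u) + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma(x)\sigma(x)^{\top}\,\nabla^{2} V(x)\big) \Big],$$

subject to boundary conditions on the borders of non-navigable regions. Enforcing those boundary conditions exactly is the job of the finite-element layer, while kernel-based approximation handles the interior cheaply.

The sketch below illustrates that division of labor in miniature, assuming a 2D state space: kernel regression represents the value function over free-space samples, and a thin band near the boundary blends linearly toward a pinned unsafe value so boundary states stay strictly below free-space values. Every name and parameter here (the RBF kernel, the linear blending band standing in for a proper finite-element mesh, the toy obstacle and goal) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def fit_kernel_values(X, v, sigma=0.5, reg=1e-8):
    # Solve (K + reg*I) alpha = v for RBF-kernel weights on free-space samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return np.linalg.solve(K + reg * np.eye(len(X)), v)

def kernel_value(x, X, alpha, sigma=0.5):
    # Smooth value estimate at a query point from the fitted kernel expansion.
    d2 = ((X - x) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ alpha

def hybrid_value(x, X, alpha, signed_dist, v_boundary=-1.0, band=0.2):
    # Inside an unsafe region: return the pinned boundary value exactly.
    d = signed_dist(x)
    if d <= 0.0:
        return v_boundary
    vk = kernel_value(x, X, alpha)
    # Deep in free space: the fast kernel estimate alone.
    if d >= band:
        return vk
    # Thin band near the boundary: a linear shape function pinning the border,
    # a crude stand-in for the paper's finite-element treatment of
    # safety-critical states.
    w = d / band
    return w * vk + (1.0 - w) * v_boundary

# Toy 2D setup: disk obstacle of radius 0.5 at the origin, goal at (1.5, 1.5).
signed_dist = lambda x: np.linalg.norm(x) - 0.5
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
X = X[np.array([signed_dist(p) > 0 for p in X])]       # keep free-space samples
v = -np.linalg.norm(X - np.array([1.5, 1.5]), axis=1)  # toy value: -distance to goal
alpha = fit_kernel_values(X, v)

print(hybrid_value(np.array([1.0, 1.0]), X, alpha, signed_dist))  # smooth interior value
print(hybrid_value(np.array([0.1, 0.0]), X, alpha, signed_dist))  # exactly -1.0 (unsafe)
```

A query deep in free space returns the smooth kernel estimate; a query inside the obstacle returns exactly the pinned boundary value, so no amount of kernel smoothing can leak safe-looking values across the border, which is precisely the failure mode a boundary-aware value function is meant to rule out.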


Published In

International Journal of Robotics Research, Volume 43, Issue 12
October 2024, 140 pages

Publisher

Sage Publications, Inc., United States

Author Tags

  1. Autonomous navigation
  2. motion planning and control
  3. diffusion Markov decision process
  4. value function
  5. finite element methods
  6. kernels
  7. second-order HJB equation

Qualifiers

  • Research-article
