Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states
Journal of the Brazilian Computer Society volume 15, pages 65–75 (2009)
Abstract
Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time consuming when robots are involved. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both determined by the complexity of the environment structure. The use of macro-states can prevent the convergence of learning algorithms, but it accelerates the learning process. On the other hand, partial-policies can guarantee that the agent fulfils its task even when it passes through macro-states. Experiments show that CFQ-Learning achieves a good balance between policy quality and learning rate.
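The abstract describes combining Q-learning with macro-states, i.e. learning over a coarse discretisation of the state space. Below is a minimal sketch of that idea only; the compulsory-flow partial-policy for getting around obstacles is not detailed in the abstract, so it is not modelled here, and all names (the `env` interface, `macro_state`, the block size) are illustrative assumptions, not the paper's definitions.

```python
import random
from collections import defaultdict

def macro_state(x, y, block=4):
    """Map a fine-grained grid position to a low-resolution macro-state
    by grouping cells into block x block regions (illustrative coarsening)."""
    return (x // block, y // block)

def q_learning_with_macro_states(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning whose table is indexed by macro-states rather than raw states.
    Assumes `env` exposes reset() -> (x, y), step(a) -> ((x, y), reward, done),
    and a list of discrete actions `env.actions`."""
    Q = defaultdict(float)  # Q[(macro_state, action)]
    for _ in range(episodes):
        x, y = env.reset()
        s = macro_state(x, y)
        done = False
        while not done:
            # epsilon-greedy action selection over the coarse state
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            (x, y), r, done = env.step(a)
            s_next = macro_state(x, y)
            # standard Q-learning update applied to the macro-state table
            best_next = max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Because many raw states share one macro-state, the table is far smaller and learning is faster, at the cost of the Markov property (and hence convergence guarantees) within each coarse region, which is the trade-off the abstract highlights.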
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Silva, V.F.d., Costa, A.H.R. Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states. J Braz Comp Soc 15, 65–75 (2009). https://doi.org/10.1007/BF03194507
DOI: https://doi.org/10.1007/BF03194507