Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states
Journal of the Brazilian Computer Society volume 15, pages 65–75 (2009)
Abstract
Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time consuming when robots are involved. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both determined by the complexity of the environment structure. The use of macro-states can prevent the convergence of learning algorithms, but it accelerates the learning process. On the other hand, partial-policies can guarantee that the agent fulfils its task even when it passes through macro-states. Experiments show that CFQ-Learning achieves a good balance between policy quality and learning rate.
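The abstract describes combining Q-learning with macro-states, i.e. learning over a coarse discretisation of the state space. Below is a minimal sketch of that idea only; the compulsory-flow partial-policy for getting around obstacles is not detailed in the abstract, so it is not modelled here, and all names (the `env` interface, `macro_state`, the block size) are illustrative assumptions, not the paper's definitions.

```python
import random
from collections import defaultdict

def macro_state(x, y, block=4):
    """Map a fine-grained grid position to a low-resolution macro-state
    by grouping cells into block x block regions (illustrative coarsening)."""
    return (x // block, y // block)

def q_learning_with_macro_states(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning whose table is indexed by macro-states rather than raw states.
    Assumes `env` exposes reset() -> (x, y), step(a) -> ((x, y), reward, done),
    and a list of discrete actions `env.actions`."""
    Q = defaultdict(float)  # Q[(macro_state, action)]
    for _ in range(episodes):
        x, y = env.reset()
        s = macro_state(x, y)
        done = False
        while not done:
            # epsilon-greedy action selection over the coarse state
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            (x, y), r, done = env.step(a)
            s_next = macro_state(x, y)
            # standard Q-learning update applied to the macro-state table
            best_next = max(Q[(s_next, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

Because many raw states share one macro-state, the table is far smaller and learning is faster, at the cost of the Markov property (and hence convergence guarantees) within each coarse region, which is the trade-off the abstract highlights.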
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Silva, V.F.d., Costa, A.H.R. Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states. J Braz Comp Soc 15, 65–75 (2009). https://doi.org/10.1007/BF03194507
DOI: https://doi.org/10.1007/BF03194507