Abstract
Probabilistic finite state machines have become a popular tool for modeling sequential processes, ranging from images and speech signals to text documents and spatial and genomic maps. In this paper, I describe two hierarchical abstraction mechanisms for simplifying the learning (estimation) and control (optimization) of complex Markov processes: spatial decomposition and temporal aggregation. I present several approaches to combining spatial and temporal abstraction, drawing on recent work by my group as well as that of others. I show how spatiotemporal abstraction enables improved solutions to three difficult sequential estimation and decision problems: hidden state modeling and control, learning parallel plans, and multiagent coordination.
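As a concrete illustration of what temporal aggregation means here, the sketch below implements a minimal version of the options/SMDP framework of Sutton, Precup, and Singh, one family of temporally extended actions in this line of work. The corridor environment, the Option fields, and all function and parameter names are illustrative assumptions of mine, not code from the paper.

import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Option:
    # Illustrative rendering of an option as (I, pi, beta).
    initiation: Callable[[int], bool]    # I: states where the option may be invoked
    policy: Callable[[int], int]         # pi: primitive action chosen in each state
    termination: Callable[[int], float]  # beta: probability of terminating in a state

N = 6  # hypothetical toy corridor of states 0..N-1, reward only at the right end

def env_step(state, action):
    # Toy dynamics: action is +1 (right) or -1 (left), clipped to the corridor.
    s2 = min(max(state + action, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

def run_option(state, option, gamma=0.9):
    # Execute the option until its termination condition fires, accumulating
    # the discounted reward and the compound discount needed by the SMDP update.
    total, discount = 0.0, 1.0
    while True:
        state, reward = env_step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        if random.random() < option.termination(state):
            return state, total, discount

go_right = Option(lambda s: s < N - 1, lambda s: +1, lambda s: float(s == N - 1))
go_left  = Option(lambda s: s > 0,     lambda s: -1, lambda s: float(s == 0))
options = [go_right, go_left]

Q: Dict[Tuple[int, int], float] = {(s, o): 0.0 for s in range(N) for o in range(len(options))}

for episode in range(200):
    s = random.randrange(N - 1)  # start anywhere except the goal
    while s != N - 1:
        avail = [o for o in range(len(options)) if options[o].initiation(s)]
        o = random.choice(avail) if random.random() < 0.1 else max(avail, key=lambda o: Q[(s, o)])
        s2, r, disc = run_option(s, options[o])
        # SMDP Q-learning: disc = gamma**k already reflects the option's duration k,
        # so a multi-step option is updated with a single backup.
        best = max(Q[(s2, o2)] for o2 in range(len(options)) if options[o2].initiation(s2))
        Q[(s, o)] += 0.1 * (r + disc * best - Q[(s, o)])
        s = s2

print("Q(s, go_right) along the corridor:", [round(Q[(s, 0)], 2) for s in range(N)])

The point of the sketch is the single-backup update: because each option runs to termination, the agent plans over a handful of macro-decisions rather than every primitive step, which is the efficiency argument temporal aggregation rests on.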
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Mahadevan, S. (2002). Spatiotemporal Abstraction of Stochastic Sequential Processes. In: Koenig, S., Holte, R.C. (eds) Abstraction, Reformulation, and Approximation. SARA 2002. Lecture Notes in Computer Science, vol 2371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45622-8_3