Abstract
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery – an optimization problem that combines inventory control and vehicle routing.
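The abstract names the two scaling devices for the state and action spaces only briefly, so the sketch below illustrates one plausible reading of each in Python. The class name, function name, feature choices, and update rule are illustrative assumptions, not the authors' implementation: a tabular linear value function is taken to keep a separate linear weight vector per discrete context (reducing to a plain table when there are no numeric features, and to a single linear function when there is only one context), and the joint action is chosen by greedy one-agent-at-a-time hill climbing rather than exhaustive search over the joint action space.

```python
import random
from collections import defaultdict


class TabularLinearValue:
    """Hypothetical sketch of a "tabular linear" value function: tabular in a
    discrete context (e.g. vehicle location), linear in numeric features
    (e.g. shop inventory levels). One weight vector is stored per context."""

    def __init__(self, num_features, alpha=0.1):
        self.alpha = alpha
        # One weight vector (bias + feature weights) per context, created lazily.
        self.weights = defaultdict(lambda: [0.0] * (num_features + 1))

    def value(self, context, features):
        w = self.weights[context]
        return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

    def update(self, context, features, target):
        # Gradient step toward a bootstrapped target, e.g. r - rho + h(s').
        w = self.weights[context]
        error = target - self.value(context, features)
        w[0] += self.alpha * error
        for i, fi in enumerate(features):
            w[i + 1] += self.alpha * error * fi


def hill_climb_joint_action(agents, candidate_actions, score):
    """Greedy coordinate ascent over a joint action: repeatedly improve one
    agent's action while the others are held fixed, instead of enumerating the
    full joint action space. `score` maps a joint action tuple to a value
    estimate; both arguments are placeholders for the application's own types."""
    joint = [random.choice(candidate_actions[a]) for a in agents]
    improved = True
    while improved:
        improved = False
        for i, agent in enumerate(agents):
            best_a, best_v = joint[i], score(tuple(joint))
            for a in candidate_actions[agent]:
                joint[i] = a
                v = score(tuple(joint))
                if v > best_v:
                    best_a, best_v = a, v
                    improved = True
            joint[i] = best_a
    return tuple(joint)
```

Per pass, the hill-climbing loop evaluates a number of joint actions on the order of the sum of the agents' individual action counts rather than their product, which is the kind of reduction in action-space complexity the abstract refers to.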
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Proper, S., Tadepalli, P. (2006). Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_74
DOI: https://doi.org/10.1007/11871842_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science