Abstract
Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce “tabular linear functions” that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algorithm called ASH-learning, which is an afterstate version of H-Learning. Our extensions make it practical to apply reinforcement learning to a domain of product delivery – an optimization problem that combines inventory control and vehicle routing.
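The abstract names the two scaling devices for the state and action spaces only briefly, so the sketch below illustrates one plausible reading of each in Python. The class name, function name, feature choices, and update rule are illustrative assumptions, not the authors' implementation: a tabular linear value function is taken to keep a separate linear weight vector per discrete context (reducing to a plain table when there are no numeric features, and to a single linear function when there is only one context), and the joint action is chosen by greedy one-agent-at-a-time hill climbing rather than exhaustive search over the joint action space.

```python
import random
from collections import defaultdict


class TabularLinearValue:
    """Hypothetical sketch of a "tabular linear" value function: tabular in a
    discrete context (e.g. vehicle location), linear in numeric features
    (e.g. shop inventory levels). One weight vector is stored per context."""

    def __init__(self, num_features, alpha=0.1):
        self.alpha = alpha
        # One weight vector (bias + feature weights) per context, created lazily.
        self.weights = defaultdict(lambda: [0.0] * (num_features + 1))

    def value(self, context, features):
        w = self.weights[context]
        return w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))

    def update(self, context, features, target):
        # Gradient step toward a bootstrapped target, e.g. r - rho + h(s').
        w = self.weights[context]
        error = target - self.value(context, features)
        w[0] += self.alpha * error
        for i, fi in enumerate(features):
            w[i + 1] += self.alpha * error * fi


def hill_climb_joint_action(agents, candidate_actions, score):
    """Greedy coordinate ascent over a joint action: repeatedly improve one
    agent's action while the others are held fixed, instead of enumerating the
    full joint action space. `score` maps a joint action tuple to a value
    estimate; both arguments are placeholders for the application's own types."""
    joint = [random.choice(candidate_actions[a]) for a in agents]
    improved = True
    while improved:
        improved = False
        for i, agent in enumerate(agents):
            best_a, best_v = joint[i], score(tuple(joint))
            for a in candidate_actions[agent]:
                joint[i] = a
                v = score(tuple(joint))
                if v > best_v:
                    best_a, best_v = a, v
                    improved = True
            joint[i] = best_a
    return tuple(joint)
```

Per pass, the hill-climbing loop evaluates a number of joint actions on the order of the sum of the agents' individual action counts rather than their product, which is the kind of reduction in action-space complexity the abstract refers to.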
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Proper, S., Tadepalli, P. (2006). Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_74
DOI: https://doi.org/10.1007/11871842_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science