Using a logarithmic mapping to enable lower discount factors in reinforcement learning
Abstract
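A minimal sketch of the idea the title names, assuming a tabular Q-learning setting with strictly positive rewards: under a low discount factor γ, the gaps between action values shrink toward zero, and mapping values through a logarithm stretches those gaps back out so they remain learnable. The function name, hyperparameters, and update rule below are illustrative assumptions, not the paper's algorithm, which also handles negative values and corrects the bias that log-space averaging introduces under stochastic transitions.

```python
import numpy as np

# Hedged sketch: Q-learning with action values stored in log-space,
# assuming strictly positive rewards so log(Q) is well defined.
def log_space_q_update(L, s, a, r, s_next, done,
                       alpha=0.05, gamma=0.1, q_min=1e-8):
    """One TD update on L[s, a] ~= log(Q(s, a))."""
    bootstrap = 0.0 if done else gamma * np.exp(L[s_next]).max()
    target = max(r + bootstrap, q_min)  # TD target in regular space
    # The step is taken in log-space: tiny absolute gaps between action
    # values become large gaps after the mapping, which is what makes
    # low discount factors workable.
    L[s, a] += alpha * (np.log(target) - L[s, a])
    return L

# Usage sketch (hypothetical sizes):
# L = np.full((n_states, n_actions), np.log(1e-2))
```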
Recommendations
Continuous-Time Markov Decision Processes with State-Dependent Discount Factors
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are ...
Rethinking the discount factor in reinforcement learning: a decision theoretic approach
AAAI'19/IAAI'19/EAAI'19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence
Reinforcement learning (RL) agents have traditionally been tasked with maximizing the value function of a Markov decision process (MDP), either in continuous settings, with fixed discount factor γ < 1, or in episodic settings, with γ = 1. While this has ...
Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning
A large class of problems of sequential decision making under uncertainty, of which the underlying probability structure is a Markov process, can be modeled as stochastic dynamic programs referred to, in general, as Markov decision problems or MDPs. ...
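As a concrete, hedged illustration of the two optimality criteria these recommendations mention (the discounted criterion with a fixed γ and the average-reward criterion), using made-up numbers:

```python
# Illustration with a hypothetical periodic reward stream: the discounted
# return weights early rewards more heavily as gamma decreases, while the
# average-reward criterion ignores transients entirely.
rewards = [0.0, 0.0, 1.0] * 100  # made-up reward sequence

def discounted_return(rs, gamma):
    return sum(r * gamma**t for t, r in enumerate(rs))

average_reward = sum(rewards) / len(rewards)  # -> 1/3
print(discounted_return(rewards, 0.99))  # weights the whole stream
print(discounted_return(rewards, 0.1))   # dominated by the first rewards
print(average_reward)
```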
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Chapter
- Research
- Refereed limited