Bayesian distributional policy gradients
Li et al., 2021 - Google Patents

- Document ID: 16582417338726923585
- Authors: Li L; Faisal A
- Publication year: 2021
- Publication venue: Proceedings of the AAAI Conference on Artificial Intelligence
Snippet
Abstract: Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading …
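The snippet captures the core mechanism: rather than a scalar value estimate, distributional RL maintains a probability distribution over the return. As an illustration only, the sketch below shows a generic categorical (C51-style) return distribution and one projected distributional Bellman update; it is not the paper's Bayesian distributional policy gradient method, and all constants and function names are assumptions made for the example.

```python
# Minimal sketch of a categorical return distribution (C51-style), assuming a
# fixed support of atoms; illustrative only, not the method of Li & Faisal.
import numpy as np

V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51       # assumed support of the return
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)    # fixed atom locations z_i
DELTA = (V_MAX - V_MIN) / (N_ATOMS - 1)       # spacing between atoms

def project_bellman_target(probs, reward, gamma=0.99):
    """Shift the return distribution through r + gamma * z, then project it
    back onto the fixed support by splitting each atom's probability mass
    between its two nearest support points."""
    target = np.zeros_like(probs)
    tz = np.clip(reward + gamma * ATOMS, V_MIN, V_MAX)   # shifted atoms
    b = (tz - V_MIN) / DELTA                              # fractional index
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    # When lower == upper the atom lands exactly on a support point, so the
    # full mass goes to that point (the extra boolean term handles this case).
    np.add.at(target, lower, probs * (upper - b + (lower == upper)))
    np.add.at(target, upper, probs * (b - lower))
    return target

# Start from a uniform belief over returns and apply one distributional update.
probs = np.full(N_ATOMS, 1.0 / N_ATOMS)
probs = project_bellman_target(probs, reward=1.0)
print("mean return estimate:", np.dot(ATOMS, probs))
```

Because the update propagates the whole distribution rather than its mean, quantities such as the variance of the return remain available, which is the uncertainty signal the abstract refers to.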
Classifications
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
      - G05B17/00—Systems involving the use of models or simulators of said systems
        - G05B17/02—Systems involving the use of models or simulators of said systems electric
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
        - G06N5/04—Inference methods or devices
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title |
| --- | --- |
| Swazinna et al. | Overcoming model bias for robust offline deep reinforcement learning |
| Lin et al. | Model-based adversarial meta-reinforcement learning |
| Li et al. | Bayesian distributional policy gradients |
| Ross et al. | Bayesian reinforcement learning in continuous POMDPs with application to robot navigation |
| Daly et al. | Learning Bayesian network equivalence classes with ant colony optimization |
| Carr et al. | Counterexample-guided strategy improvement for POMDPs using recurrent neural networks |
| Lin et al. | Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs |
| Rostami et al. | Using task descriptions in lifelong machine learning for improved performance and zero-shot transfer |
| Furmston et al. | Approximate Newton methods for policy search in Markov decision processes |
| WO2024066675A1 (en) | Multi-agent multi-task hierarchical continuous control method based on temporal equilibrium analysis |
| Hasinoff | Reinforcement learning for problems with hidden state |
| Wang et al. | Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking |
| Zhang et al. | Reinforcement learning under a multi-agent predictive state representation model: Method and theory |
| Beikmohammadi et al. | Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge |
| Hoffman et al. | An expectation maximization algorithm for continuous Markov decision processes with arbitrary reward |
| Morere et al. | Reinforcement learning with probabilistically complete exploration |
| Milios et al. | Probabilistic model checking for continuous-time Markov chains via sequential Bayesian inference |
| Djeumou et al. | Task-guided inverse reinforcement learning under partial information |
| Alpcan | Dual control with active learning using Gaussian process regression |
| Graña et al. | Cooperative multi-agent reinforcement learning for multi-component robotic systems: guidelines for future research |
| Rohatgi | Computationally Efficient Reinforcement Learning under Partial Observability |
| Huang et al. | Deep reinforcement learning |
| Boularias et al. | Apprenticeship learning with few examples |
| Dastider et al. | Learning adaptive control in dynamic environments using reproducing kernel priors with Bayesian policy gradients |