Bayesian distributional policy gradients
Li et al., 2021 - Google Patents

- Document ID: 16582417338726923585
- Authors: Li L; Faisal A
- Publication year: 2021
- Publication venue: Proceedings of the AAAI Conference on Artificial Intelligence
Snippet
Abstract: Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading …
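The snippet captures the core mechanism: rather than a scalar value estimate, distributional RL maintains a probability distribution over the return. As an illustration only, the sketch below shows a generic categorical (C51-style) return distribution and one projected distributional Bellman update; it is not the paper's Bayesian distributional policy gradient method, and all constants and function names are assumptions made for the example.

```python
# Minimal sketch of a categorical return distribution (C51-style), assuming a
# fixed support of atoms; illustrative only, not the method of Li & Faisal.
import numpy as np

V_MIN, V_MAX, N_ATOMS = -10.0, 10.0, 51       # assumed support of the return
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)    # fixed atom locations z_i
DELTA = (V_MAX - V_MIN) / (N_ATOMS - 1)       # spacing between atoms

def project_bellman_target(probs, reward, gamma=0.99):
    """Shift the return distribution through r + gamma * z, then project it
    back onto the fixed support by splitting each atom's probability mass
    between its two nearest support points."""
    target = np.zeros_like(probs)
    tz = np.clip(reward + gamma * ATOMS, V_MIN, V_MAX)   # shifted atoms
    b = (tz - V_MIN) / DELTA                              # fractional index
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    # When lower == upper the atom lands exactly on a support point, so the
    # full mass goes to that point (the extra boolean term handles this case).
    np.add.at(target, lower, probs * (upper - b + (lower == upper)))
    np.add.at(target, upper, probs * (b - lower))
    return target

# Start from a uniform belief over returns and apply one distributional update.
probs = np.full(N_ATOMS, 1.0 / N_ATOMS)
probs = project_bellman_target(probs, reward=1.0)
print("mean return estimate:", np.dot(ATOMS, probs))
```

Because the update propagates the whole distribution rather than its mean, quantities such as the variance of the return remain available, which is the uncertainty signal the abstract refers to.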
Classifications
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
      - G05B17/00—Systems involving the use of models or simulators of said systems
        - G05B17/02—Systems involving the use of models or simulators of said systems electric
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
        - G06N5/04—Inference methods or devices
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title |
| --- | --- |
| Swazinna et al. | Overcoming model bias for robust offline deep reinforcement learning |
| Lin et al. | Model-based adversarial meta-reinforcement learning |
| Li et al. | Bayesian distributional policy gradients |
| Ross et al. | Bayesian reinforcement learning in continuous POMDPs with application to robot navigation |
| Daly et al. | Learning Bayesian network equivalence classes with ant colony optimization |
| Carr et al. | Counterexample-guided strategy improvement for POMDPs using recurrent neural networks |
| Lin et al. | Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs |
| Rostami et al. | Using task descriptions in lifelong machine learning for improved performance and zero-shot transfer |
| Furmston et al. | Approximate Newton methods for policy search in Markov decision processes |
| WO2024066675A1 (en) | Multi-agent multi-task hierarchical continuous control method based on temporal equilibrium analysis |
| Hasinoff | Reinforcement learning for problems with hidden state |
| Wang et al. | Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking |
| Zhang et al. | Reinforcement learning under a multi-agent predictive state representation model: Method and theory |
| Beikmohammadi et al. | Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge |
| Hoffman et al. | An expectation maximization algorithm for continuous Markov decision processes with arbitrary reward |
| Morere et al. | Reinforcement learning with probabilistically complete exploration |
| Milios et al. | Probabilistic model checking for continuous-time Markov chains via sequential Bayesian inference |
| Djeumou et al. | Task-guided inverse reinforcement learning under partial information |
| Alpcan | Dual control with active learning using Gaussian process regression |
| Graña et al. | Cooperative multi-agent reinforcement learning for multi-component robotic systems: guidelines for future research |
| Rohatgi | Computationally Efficient Reinforcement Learning under Partial Observability |
| Huang et al. | Deep reinforcement learning |
| Boularias et al. | Apprenticeship learning with few examples |
| Dastider et al. | Learning adaptive control in dynamic environments using reproducing kernel priors with Bayesian policy gradients |