

Bayesian distributional policy gradients

Li et al., 2021

Document ID
16582417338726923585
Authors
Li L
Faisal A
Publication year
2021
Publication venue
Proceedings of the AAAI Conference on Artificial Intelligence

External Links

Snippet

Abstract: Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading …
Full text: ojs.aaai.org (PDF)
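The snippet captures the core distributional-RL idea: model the full probability distribution of the return Z rather than only its expectation, so that higher moments such as the variance are available as extra learning signals. Below is a minimal illustrative sketch, assuming a categorical fixed-support parameterisation in the style of C51; the constants, names, and the parameterisation itself are assumptions for illustration, not the paper's method.

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm): represent the return Z
# as a categorical distribution over a fixed support of N_ATOMS points, as in
# C51-style distributional RL. All constants below are assumed values.
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
support = np.linspace(V_MIN, V_MAX, N_ATOMS)  # atoms z_1 .. z_N

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# In practice the logits would come from a value-network head; zeros here
# yield a uniform distribution over the support.
logits = np.zeros(N_ATOMS)
probs = softmax(logits)

# The mean recovers the usual scalar value estimate; the variance is one of
# the extra uncertainty signals a full return distribution makes available.
mean_return = float(probs @ support)
var_return = float(probs @ (support - mean_return) ** 2)
print(f"E[Z] = {mean_return:.3f}, Var[Z] = {var_return:.3f}")
```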

Classifications

    • G: PHYSICS
        • G05: CONTROLLING; REGULATING
            • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
                • G05B 13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
                    • G05B 13/02: electric
                        • G05B 13/0265: the criterion being a learning criterion
                            • G05B 13/027: using neural networks only
                        • G05B 13/04: involving the use of models or simulators
                • G05B 17/00: Systems involving the use of models or simulators of said systems
                    • G05B 17/02: electric
        • G06: COMPUTING; CALCULATING; COUNTING
            • G06F: ELECTRICAL DIGITAL DATA PROCESSING
                • G06F 15/00: Digital computers in general; Data processing equipment in general
                    • G06F 15/18: in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
                • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
                    • G06F 17/10: Complex mathematical operations
                        • G06F 17/11: for solving equations, e.g. nonlinear equations, general mathematical optimization problems
                    • G06F 17/50: Computer-aided design
                        • G06F 17/5009: using simulation
            • G06N: COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00: Computer systems based on biological models
                    • G06N 3/02: using neural network models
                    • G06N 3/12: using genetic models
                        • G06N 3/126: Genetic algorithms, i.e. information processing using digital simulations of the genetic system
                • G06N 5/00: Computer systems utilising knowledge based models
                    • G06N 5/02: Knowledge representation
                        • G06N 5/022: Knowledge engineering, knowledge acquisition
                    • G06N 5/04: Inference methods or devices
                • G06N 7/00: Computer systems based on specific mathematical models
                    • G06N 7/005: Probabilistic networks
                • G06N 99/00: Subject matter not provided for in other groups of this subclass
                    • G06N 99/005: Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run

Similar Documents

Swazinna et al. Overcoming model bias for robust offline deep reinforcement learning
Lin et al. Model-based adversarial meta-reinforcement learning
Li et al. Bayesian distributional policy gradients
Ross et al. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation
Daly et al. Learning Bayesian network equivalence classes with ant colony optimization
Carr et al. Counterexample-guided strategy improvement for pomdps using recurrent neural networks
Lin et al. Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs
Rostami et al. Using task descriptions in lifelong machine learning for improved performance and zero-shot transfer
Furmston et al. Approximate newton methods for policy search in markov decision processes
WO2024066675A1 (en) Multi-agent multi-task hierarchical continuous control method based on temporal equilibrium analysis
Hasinoff Reinforcement learning for problems with hidden state
Wang et al. Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking
Zhang et al. Reinforcement learning under a multi-agent predictive state representation model: Method and theory
Beikmohammadi et al. Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge
Hoffman et al. An expectation maximization algorithm for continuous Markov decision processes with arbitrary reward
Morere et al. Reinforcement learning with probabilistically complete exploration
Milios et al. Probabilistic model checking for continuous-time Markov chains via sequential Bayesian inference
Djeumou et al. Task-guided inverse reinforcement learning under partial information
Alpcan Dual control with active learning using Gaussian process regression
Graña et al. Cooperative multi-agent reinforcement learning for multi-component robotic systems: guidelines for future research
Rohatgi Computationally Efficient Reinforcement Learning under Partial Observability
Huang et al. Deep reinforcement learning
Boularias et al. Apprenticeship learning with few examples
Dastider et al. Learning adaptive control in dynamic environments using reproducing kernel priors with bayesian policy gradients