Planning in observable POMDPs in quasipolynomial time
Golowich et al., 2022
- Document ID
- 783343790480394217
- Author
- Golowich N
- Moitra A
- Rohatgi D
- Publication year
- 2022
- Publication venue
- arXiv preprint arXiv:2201.04735
Snippet
Partially Observable Markov Decision Processes (POMDPs) are a natural and general model in reinforcement learning that take into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume access to a planning …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/002—Quantum computers, i.e. information processing by using quantum superposition, coherence, decoherence, entanglement, nonlocality, teleportation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Golowich et al. | | Planning in observable POMDPs in quasipolynomial time |
Curi et al. | | Efficient model-based reinforcement learning through optimistic policy search and planning |
Xu et al. | | Offline RL with no OOD actions: In-sample learning via implicit value regularization |
Zhang et al. | | GenDICE: Generalized offline estimation of stationary values |
Wang et al. | | Policy gradient method for robust reinforcement learning |
Rosencrantz et al. | | Learning low dimensional predictive representations |
Zhang et al. | | CORPP: Commonsense reasoning and probabilistic planning, as applied to dialog with a mobile robot |
Chen et al. | | On computation and generalization of generative adversarial imitation learning |
Mao et al. | | Information state embedding in partially observable cooperative multi-agent reinforcement learning |
Gao et al. | | An efficient quantum algorithm for generative machine learning |
Pan et al. | | Frequency-based search-control in Dyna |
Goo et al. | | Know your boundaries: The necessity of explicit behavioral cloning in offline RL |
Golowich et al. | | Planning and learning in partially observable systems via filter stability |
Li et al. | | Bayesian distributional policy gradients |
Zhang et al. | | Latent state marginalization as a low-cost approach for improving exploration |
Zhang et al. | | Learning retrospective knowledge with reverse reinforcement learning |
Wang et al. | | Mobile agent path planning under uncertain environment using reinforcement learning and probabilistic model checking |
Fan et al. | | FedHQL: Federated heterogeneous Q-learning |
Campbell et al. | | Multiagent allocation of Markov decision process tasks |
Beikmohammadi et al. | | Accelerating actor-critic-based algorithms via pseudo-labels derived from prior knowledge |
Rohatgi | | Computationally Efficient Reinforcement Learning under Partial Observability |
Wei | | Value of Information and Reward Specification in Active Inference and POMDPs |
Levchuk et al. | | Active learning and structure adaptation in teams of heterogeneous agents: designing organizations of the future |
Watanabe et al. | | Loopy belief propagation, Bethe free energy and graph zeta function |
Heng | | On the use of transport and optimal control methods for Monte Carlo simulation |