On the performance of temporal difference learning with neural networks
Tian et al., 2023
- Document ID
- 8542089570375555757
- Author
- Tian H
- Paschalidis I
- Olshevsky A
- Publication year
- 2023
- Publication venue
- arXiv preprint arXiv:2312.05397
Snippet
Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis …
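To make the snippet's description concrete, here is a minimal sketch of semi-gradient TD(0) policy evaluation with a small neural network as the value-function approximator. It is not the algorithm or the analysis from the paper, whose details are not given in this record. The function name `neural_td0`, the one-hot state features, the fixed random output signs, the 1/sqrt(m) scaling, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def neural_td0(P, r, gamma=0.5, hidden=32, steps=30000, lr=0.02, seed=0):
    """Illustrative semi-gradient TD(0) with a one-hidden-layer ReLU network.

    P: (n, n) transition matrix of the Markov chain induced by the policy.
    r: (n,) expected reward received when leaving each state.
    Returns the learned value estimate for every state.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    X = np.eye(n)                                  # one-hot features (assumption)
    W = rng.normal(size=(hidden, n))               # trainable hidden-layer weights
    b = rng.choice([-1.0, 1.0], size=hidden)       # fixed output signs (assumption)
    scale = 1.0 / np.sqrt(hidden)

    def value(s):
        # V(s) = (1 / sqrt(m)) * sum_k b_k * relu(w_k . x_s)
        return scale * (b @ np.maximum(W @ X[s], 0.0))

    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])             # sample one transition
        # TD error: bootstrapped target minus current estimate
        delta = r[s] + gamma * value(s_next) - value(s)
        # Gradient of V(s) with respect to W; the target is treated as a constant
        active = (W @ X[s] > 0).astype(float)
        grad = scale * (b * active)[:, None] * X[s][None, :]
        W += lr * delta * grad
        s = s_next

    return np.array([value(i) for i in range(n)])
```

On a small chain the result can be compared against the exact solution of the Bellman equation, `np.linalg.solve(np.eye(n) - gamma * P, r)`. With a constant step size the estimate fluctuates around that solution rather than converging to it exactly.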
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/12—Simultaneous equations, e.g. systems of linear equations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F17/13—Differential equations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
- G06F8/437—Type checking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Li et al. | Stabilisation of highly nonlinear hybrid stochastic differential delay equations by delay feedback control | |
Tian et al. | On the performance of temporal difference learning with neural networks | |
Wainwright | Variance-reduced $Q$-learning is minimax optimal | |
Fu et al. | Global finite-time stabilization of a class of switched nonlinear systems with the powers of positive odd rational numbers | |
Howson et al. | A new algorithm for the solution of multi-state dynamic programming problems | |
Ashcroft et al. | Lucid—A formal system for writing and proving programs | |
Daitch et al. | Faster approximate lossy generalized flow via interior point algorithms | |
Wolf et al. | Exact real-time dynamics of the quantum Rabi model | |
Stinga et al. | Regularity theory for the fractional harmonic oscillator | |
Jorba et al. | Effective reducibility of quasi-periodic linear equations close to constant coefficients | |
Farjadnasab et al. | Model-free LQR design by Q-function learning | |
Tran-Dinh et al. | Fast inexact decomposition algorithms for large-scale separable convex optimization | |
Nemkov et al. | Fourier expansion in variational quantum algorithms | |
Rakkiyappan et al. | Non-fragile robust synchronization for Markovian jumping chaotic neural networks of neutral-type with randomly occurring uncertainties and mode-dependent time-varying delays | |
Liang et al. | A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward | |
Lucia et al. | Efficient stochastic model predictive control based on polynomial chaos expansions for embedded applications | |
Devraj et al. | Zap Q-Learning - a user's guide | |
Johnstone et al. | Projective splitting with forward steps only requires continuity | |
Hirosawa et al. | Generalised energy conservation law for wave equations with variable propagation speed | |
Kelleche et al. | Adaptive Stabilization of a Kirchhoff moving string | |
Yin et al. | Fuzzy dynamical system approach for a dual-parameter hybrid-order robust control design | |
Wu et al. | Multiobjective control for nonlinear stochastic Poisson jump-diffusion systems via TS fuzzy interpolation and Pareto optimal scheme | |
Campbell et al. | A minimal norm corrected underdetermined Gauß–Newton procedure | |
Zhang et al. | Dynamic privacy allocation for locally differentially private federated learning with composite objectives | |
Bartłomiejczyk et al. | Hopf bifurcation in time‐delayed gene expression model with dimers |