Tian et al., 2023 - Google Patents

On the performance of temporal difference learning with neural networks

Document ID
8542089570375555757
Author
Tian H
Paschalidis I
Olshevsky A
Publication year
2023
Publication venue
arXiv preprint arXiv:2312.05397

Snippet

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12 Simultaneous equations, e.g. systems of linear equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13 Differential equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/436 Semantic checking
    • G06F8/437 Type checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems utilising knowledge based models
    • G06N5/02 Knowledge representation
    • G06N5/022 Knowledge engineering, knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures

Similar Documents

Publication Publication Date Title
Li et al. Stabilisation of highly nonlinear hybrid stochastic differential delay equations by delay feedback control
Tian et al. On the performance of temporal difference learning with neural networks
Wainwright Variance-reduced $Q$-learning is minimax optimal
Fu et al. Global finite-time stabilization of a class of switched nonlinear systems with the powers of positive odd rational numbers
Howson et al. A new algorithm for the solution of multi-state dynamic programming problems
Ashcroft et al. Lucid—A formal system for writing and proving programs
Daitch et al. Faster approximate lossy generalized flow via interior point algorithms
Wolf et al. Exact real-time dynamics of the quantum Rabi model
Stinga et al. Regularity theory for the fractional harmonic oscillator
Jorba et al. Effective reducibility of quasi-periodic linear equations close to constant coefficients
Farjadnasab et al. Model-free LQR design by Q-function learning
Tran-Dinh et al. Fast inexact decomposition algorithms for large-scale separable convex optimization
Nemkov et al. Fourier expansion in variational quantum algorithms
Rakkiyappan et al. Non-fragile robust synchronization for Markovian jumping chaotic neural networks of neutral-type with randomly occurring uncertainties and mode-dependent time-varying delays
Liang et al. A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward
Lucia et al. Efficient stochastic model predictive control based on polynomial chaos expansions for embedded applications
Devraj et al. Zap Q-Learning - a user's guide
Johnstone et al. Projective splitting with forward steps only requires continuity
Hirosawa et al. Generalised energy conservation law for wave equations with variable propagation speed
Kelleche et al. Adaptive Stabilization of a Kirchhoff moving string
Yin et al. Fuzzy dynamical system approach for a dual-parameter hybrid-order robust control design
Wu et al. Multiobjective control for nonlinear stochastic Poisson jump-diffusion systems via TS fuzzy interpolation and Pareto optimal scheme
Campbell et al. A minimal norm corrected underdetermined Gauß–Newton procedure
Zhang et al. Dynamic privacy allocation for locally differentially private federated learning with composite objectives
Bartłomiejczyk et al. Hopf bifurcation in time‐delayed gene expression model with dimers