Huizhen Yu
2020 – today
- 2024
  - [j16] Huizhen Yu: On Strategic Measures and Optimality Properties in Discrete-Time Stochastic Control with Universally Measurable Policies. Math. Oper. Res. 49(3): 1734-1760 (2024)
  - [i13] Yi Wan, Huizhen Yu, Richard S. Sutton: On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes. CoRR abs/2408.16262 (2024)
  - [i12] Huizhen Yu, Yi Wan, Richard S. Sutton: Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning. CoRR abs/2409.03915 (2024)
- 2023
  - [i11] Huizhen Yu, Yi Wan, Richard S. Sutton: A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays. CoRR abs/2312.15091 (2023)
- 2022
  - [j15] Huizhen Yu: On Linear Programming for Constrained and Unconstrained Average-Cost Markov Decision Processes with Countable Action Spaces and Strictly Unbounded Costs. Math. Oper. Res. 47(2): 1474-1499 (2022)
- 2020
  - [j14] Huizhen Yu: On the Minimum Pair Approach for Average Cost Markov Decision Processes with Countable Discrete Action Spaces and Strictly Unbounded Costs. SIAM J. Control. Optim. 58(2): 660-685 (2020)
  - [j13] Huizhen Yu: Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies. SIAM J. Control. Optim. 58(4): 2469-2502 (2020)
  - [c12] Huizhen Yu: Research on the Structural Impact of the Disappearance of China's Demographic Dividend on the Education Industry. ICETM 2020: 151-154
2010 – 2019
- 2018
  - [j12] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. J. Mach. Learn. Res. 19: 48:1-48:49 (2018)
  - [i10] Sina Ghiassian, Huizhen Yu, Banafsheh Rafiee, Richard S. Sutton: Two geometric input transformation methods for fast online reinforcement learning with neural nets. CoRR abs/1805.07476 (2018)
- 2017
  - [c11] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. Canadian AI 2017: 3-14
  - [i9] Ashique Rupam Mahmood, Huizhen Yu, Richard S. Sutton: Multi-step Off-policy Learning Without Importance Sampling Ratios. CoRR abs/1702.03006 (2017)
  - [i8] Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton: On Generalized Bellman Equations and Temporal-Difference Learning. CoRR abs/1704.04463 (2017)
  - [i7] Huizhen Yu: On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning. CoRR abs/1712.09652 (2017)
- 2016
  - [j11] Huizhen Yu: Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize. J. Mach. Learn. Res. 17: 220:1-220:58 (2016)
  - [i6] Huizhen Yu: Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms. CoRR abs/1605.02099 (2016)
- 2015
  - [j10] Huizhen Yu, Dimitri P. Bertsekas: A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies. Math. Oper. Res. 40(4): 926-968 (2015)
  - [j9] Huizhen Yu: On Convergence of Value Iteration for a Class of Total Cost Markov Decision Processes. SIAM J. Control. Optim. 53(4): 1982-2016 (2015)
  - [c10] Huizhen Yu: On Convergence of Emphatic Temporal-Difference Learning. COLT 2015: 1724-1751
  - [i5] Huizhen Yu: On Convergence of Emphatic Temporal-Difference Learning. CoRR abs/1506.02582 (2015)
  - [i4] Ashique Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton: Emphatic Temporal-Difference Learning. CoRR abs/1507.01569 (2015)
  - [i3] Huizhen Yu: Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize. CoRR abs/1511.07471 (2015)
- 2013
  - [j8] Huizhen Yu, Dimitri P. Bertsekas: Q-learning and policy iteration algorithms for stochastic shortest path problems. Ann. Oper. Res. 208(1): 95-132 (2013)
  - [j7] Huizhen Yu, Dimitri P. Bertsekas: On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems. Math. Oper. Res. 38(2): 209-227 (2013)
- 2012
  - [j6] Dimitri P. Bertsekas, Huizhen Yu: Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming. Math. Oper. Res. 37(1): 66-94 (2012)
  - [j5] Huizhen Yu: Least Squares Temporal Difference Methods: An Analysis under General Conditions. SIAM J. Control. Optim. 50(6): 3310-3343 (2012)
  - [i2] Huizhen Yu: A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies. CoRR abs/1207.1421 (2012)
  - [i1] Huizhen Yu, Dimitri P. Bertsekas: Discretized Approximations for POMDP with Average Cost. CoRR abs/1207.4154 (2012)
- 2011
  - [j4] Dimitri P. Bertsekas, Huizhen Yu: A Unifying Polyhedral Approximation Framework for Convex Optimization. SIAM J. Optim. 21(1): 333-360 (2011)
- 2010
  - [j3] Huizhen Yu, Dimitri P. Bertsekas: Error Bounds for Approximations from Projected Linear Equations. Math. Oper. Res. 35(2): 306-329 (2010)
  - [c9] Dimitri P. Bertsekas, Huizhen Yu: Distributed asynchronous policy iteration in dynamic programming. Allerton 2010: 1368-1375
  - [c8] Dimitri P. Bertsekas, Huizhen Yu: Q-learning and enhanced policy iteration in discounted dynamic programming. CDC 2010: 1409-1416
  - [c7] Huizhen Yu: Convergence of Least Squares Temporal Difference Methods Under General Conditions. ICML 2010: 1207-1214
2000 – 2009
- 2009
  - [j2] Huizhen Yu, Dimitri P. Bertsekas: Convergence Results for Some Temporal Difference Methods Based on Least Squares. IEEE Trans. Autom. Control. 54(7): 1515-1531 (2009)
  - [c6] Huizhen Yu, Dimitri P. Bertsekas: Basis function adaptation methods for cost approximation in MDP. ADPRL 2009: 74-81
- 2008
  - [j1] Huizhen Yu, Dimitri P. Bertsekas: On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP. Math. Oper. Res. 33(1): 1-11 (2008)
  - [c5] Huizhen Yu, Dimitri P. Bertsekas: New error bounds for approximations from projected linear equations. Allerton 2008: 1116-1123
  - [c4] Huizhen Yu, Dimitri P. Bertsekas: New Error Bounds for Approximations from Projected Linear Equations. EWRL 2008: 253-267
- 2006
  - [b1] Huizhen Yu: Approximate solution methods for POMDP and POSMDP. Massachusetts Institute of Technology, Cambridge, MA, USA, 2006
- 2005
  - [c3] Huizhen Yu: A Function Approximation Approach to Estimation of Policy Gradient for POMDP with Structured Policies. UAI 2005: 642-657
- 2004
  - [c2] Huizhen Yu, Dimitri P. Bertsekas: Discretized Approximations for POMDP with Average Cost. UAI 2004: 519
- 2001
  - [c1] Huizhen Yu, W. Eric L. Grimson: Combining Configurational and Statistical Approaches in Image Retrieval. IEEE Pacific Rim Conference on Multimedia 2001: 293-300
last updated on 2024-10-10 22:16 CEST by the dblp team
all metadata released as open data under CC0 1.0 license