Cited By
- Chen X, Ma X, Li Y, Yang G, Yang S, Gao Y (2023). Modified retrace for off-policy temporal difference learning. In: Evans R, Shpitser I (eds.), Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 303-312. doi:10.5555/3625834.3625863. Online publication date: 31-Jul-2023.
- Fellows M, Smith M, Whiteson S (2023). Why target networks stabilise temporal difference methods. In: Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (eds.), Proceedings of the 40th International Conference on Machine Learning, pp. 9886-9909. doi:10.5555/3618408.3618803. Online publication date: 23-Jul-2023.
- Devraj A, Meyn S (2017). Zap Q-learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 2232-2241. doi:10.5555/3294771.3294984. Online publication date: 4-Dec-2017.