default search action
Zhaohan Guo
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [c16]Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Rémi Munos, Mark Rowland, Michal Valko, Daniele Calandriello:
A General Theoretical Paradigm to Understand Learning from Human Preferences. AISTATS 2024: 4447-4455 - [c15]Daniele Calandriello, Zhaohan Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot:
Human Alignment of Large Language Models through Online Preference Optimisation. ICML 2024 - [c14]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. ICML 2024 - [c13]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. ICML 2024 - [i17]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. CoRR abs/2402.05749 (2024) - [i16]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney:
Understanding the performance gap between online and offline alignment algorithms. CoRR abs/2405.08448 (2024) - [i15]Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Ávila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana Borsa, Arthur Guez, Will Dabney:
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning. CoRR abs/2406.02035 (2024) - 2023
- [c12]Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Rémi Munos, Will Dabney, Diana L. Borsa:
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition. ICML 2023: 4009-4034 - [c11]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. ICML 2023: 33632-33656 - [i14]Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Rémi Munos, Will Dabney, Diana L. Borsa:
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition. CoRR abs/2305.00654 (2023) - [i13]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. CoRR abs/2312.00886 (2023) - 2022
- [c10]Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. NeurIPS 2022 - [i12]Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. CoRR abs/2206.08332 (2022) - [i11]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. CoRR abs/2212.03319 (2022) - 2021
- [i10]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Ávila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos:
Geometric Entropic Exploration. CoRR abs/2101.02055 (2021) - 2020
- [c9]Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell:
Never Give Up: Learning Directed Exploration Strategies. ICLR 2020 - [c8]Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell:
Agent57: Outperforming the Atari Human Benchmark. ICML 2020: 507-517 - [c7]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. ICML 2020: 3875-3886 - [c6]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. NeurIPS 2020 - [i9]Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Bilal Piot, Steven Kapturowski, Olivier Tieleman, Martín Arjovsky, Alexander Pritzel, Andrew Bolt, Charles Blundell:
Never Give Up: Learning Directed Exploration Strategies. CoRR abs/2002.06038 (2020) - [i8]Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, Charles Blundell:
Agent57: Outperforming the Atari Human Benchmark. CoRR abs/2003.13350 (2020) - [i7]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. CoRR abs/2004.14646 (2020) - [i6]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. CoRR abs/2006.07733 (2020)
2010 – 2019
- 2019
- [i5]Zhaohan Daniel Guo, Emma Brunskill:
Directed Exploration for Reinforcement Learning. CoRR abs/1906.07805 (2019) - 2018
- [i4]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Toby Pohlen, Rémi Munos:
Neural Predictive Belief Representations. CoRR abs/1811.06407 (2018) - 2017
- [c5]Zhaohan Guo, Philip S. Thomas, Emma Brunskill:
Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation. NIPS 2017: 2492-2501 - [i3]Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill:
Using Options for Long-Horizon Off-Policy Evaluation. CoRR abs/1703.03453 (2017) - [i2]Zhaohan Daniel Guo, Emma Brunskill:
Sample Efficient Feature Selection for Factored MDPs. CoRR abs/1703.03454 (2017) - 2016
- [c4]Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill:
A PAC RL Algorithm for Episodic POMDPs. AISTATS 2016: 510-518 - [c3]Yao Liu, Zhaohan Guo, Emma Brunskill:
PAC Continuous State Online Multitask Reinforcement Learning with Identification. AAMAS 2016: 438-446 - [i1]Zhaohan Daniel Guo, Shayan Doroudi, Emma Brunskill:
A PAC RL Algorithm for Episodic POMDPs. CoRR abs/1605.08062 (2016) - 2015
- [c2]Zhaohan Guo, Emma Brunskill:
Concurrent PAC RL. AAAI 2015: 2624-2630 - 2014
- [c1]Zhaohan Daniel Guo, Gökhan Tür, Wen-tau Yih, Geoffrey Zweig:
Joint semantic utterance classification and slot filling with recursive neural networks. SLT 2014: 554-559
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-10-04 21:05 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint