John Schulman
2020 – today
- 2024
- [c29]Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe:
Let's Verify Step by Step. ICLR 2024
- 2023
- [c28]Leo Gao, John Schulman, Jacob Hilton:
Scaling Laws for Reward Model Overoptimization. ICML 2023: 10835-10866
- [i37]Jacob Hilton, Jie Tang, John Schulman:
Scaling laws for single-agent reinforcement learning. CoRR abs/2301.13442 (2023)
- [i36]Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe:
Let's Verify Step by Step. CoRR abs/2305.20050 (2023)
- 2022
- [c27]Jacob Hilton, Karl Cobbe, John Schulman:
Batch size-invariance for policy optimization. NeurIPS 2022
- [c26]Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe:
Training language models to follow instructions with human feedback. NeurIPS 2022
- [i35]Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe:
Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022)
- [i34]Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, Mark Chen:
Efficient Training of Language Models to Fill in the Middle. CoRR abs/2207.14255 (2022)
- [i33]Leo Gao, John Schulman, Jacob Hilton:
Scaling Laws for Reward Model Overoptimization. CoRR abs/2210.10760 (2022)
- 2021
- [c25]Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman:
Phasic Policy Gradient. ICML 2021: 2020-2027
- [i32]William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada P. Mohanty, Keisuke Nakata, Ruslan Salakhutdinov, John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, Oriol Vinyals:
The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors. CoRR abs/2101.11071 (2021)
- [i31]Sharada P. Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Grazvydas Semetulskis, João Schapke, Jonas Kubilius, Jurgis Pasukonis, Linas Klimas, Matthew J. Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe:
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark. CoRR abs/2103.15332 (2021)
- [i30]Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt:
Unsolved Problems in ML Safety. CoRR abs/2109.13916 (2021)
- [i29]Jacob Hilton, Karl Cobbe, John Schulman:
Batch size-invariance for policy optimization. CoRR abs/2110.00641 (2021)
- [i28]Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman:
Training Verifiers to Solve Math Word Problems. CoRR abs/2110.14168 (2021)
- [i27]Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman:
WebGPT: Browser-assisted question-answering with human feedback. CoRR abs/2112.09332 (2021)
- 2020
- [j2]Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman:
Teacher-Student Curriculum Learning. IEEE Trans. Neural Networks Learn. Syst. 31(9): 3732-3740 (2020)
- [c24]Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman:
Leveraging Procedural Generation to Benchmark Reinforcement Learning. ICML 2020: 2048-2056
- [c23]Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever:
Distribution Augmentation for Generative Modeling. ICML 2020: 5006-5019
- [c22]Sharada P. Mohanty, Jyotish Poonganam, Adrien Gaidon, Andrey Kolobov, Blake Wulfe, Dipam Chakraborty, Grazvydas Semetulskis, João Schapke, Jonas Kubilius, Jurgis Pasukonis, Linas Klimas, Matthew J. Hausknecht, Patrick MacAlpine, Quang Nhat Tran, Thomas Tumiel, Xiaocheng Tang, Xinwei Chen, Christopher Hesse, Jacob Hilton, William Hebgen Guss, Sahika Genc, John Schulman, Karl Cobbe:
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark. NeurIPS (Competition and Demos) 2020: 361-395
- [i26]Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman:
Phasic Policy Gradient. CoRR abs/2009.04416 (2020)
- [i25]Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish:
Scaling Laws for Autoregressive Generative Modeling. CoRR abs/2010.14701 (2020)
2010 – 2019
- 2019
- [c21]Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, John Schulman:
Quantifying Generalization in Reinforcement Learning. ICML 2019: 1282-1289
- [i24]Jacob Jackson, John Schulman:
Semi-Supervised Learning by Label Gradient Alignment. CoRR abs/1902.02336 (2019)
- [i23]Thomas Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman:
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees. CoRR abs/1904.03646 (2019)
- [i22]Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman:
Leveraging Procedural Generation to Benchmark Reinforcement Learning. CoRR abs/1912.01588 (2019)
- 2018
- [c20]Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel:
Model-Based Reinforcement Learning via Meta-Policy Optimization. CoRL 2018: 617-629
- [c19]Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman:
Meta Learning Shared Hierarchies. ICLR (Poster) 2018
- [c18]Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, Sergey Levine:
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. Robotics: Science and Systems 2018
- [i21]Alex Nichol, Joshua Achiam, John Schulman:
On First-Order Meta-Learning Algorithms. CoRR abs/1803.02999 (2018)
- [i20]Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, John Schulman:
Gotta Learn Fast: A New Benchmark for Generalization in RL. CoRR abs/1804.03720 (2018)
- [i19]Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel:
Model-Based Reinforcement Learning via Meta-Policy Optimization. CoRR abs/1809.05214 (2018)
- [i18]Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, John Schulman:
Quantifying Generalization in Reinforcement Learning. CoRR abs/1812.02341 (2018)
- 2017
- [c17]Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel:
Variational Lossy Autoencoder. ICLR (Poster) 2017
- [c16]Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel:
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. NIPS 2017: 2753-2762
- [i17]John Schulman, Pieter Abbeel, Xi Chen:
Equivalence Between Policy Gradients and Soft Q-Learning. CoRR abs/1704.06440 (2017)
- [i16]Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman:
UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles. CoRR abs/1706.01502 (2017)
- [i15]Tambet Matiisen, Avital Oliver, Taco Cohen, John Schulman:
Teacher-Student Curriculum Learning. CoRR abs/1707.00183 (2017)
- [i14]John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov:
Proximal Policy Optimization Algorithms. CoRR abs/1707.06347 (2017)
- [i13]Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, John Schulman, Emanuel Todorov, Sergey Levine:
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. CoRR abs/1709.10087 (2017)
- [i12]Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman:
Meta Learning Shared Hierarchies. CoRR abs/1710.09767 (2017)
- 2016
- [b1]John Schulman:
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs. University of California, Berkeley, USA, 2016
- [c15]Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel:
Benchmarking Deep Reinforcement Learning for Continuous Control. ICML 2016: 1329-1338
- [c14]Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel:
VIME: Variational Information Maximizing Exploration. NIPS 2016: 1109-1117
- [c13]Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel:
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016: 2172-2180
- [c12]John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel:
High-Dimensional Continuous Control Using Generalized Advantage Estimation. ICLR (Poster) 2016
- [i11]Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel:
Benchmarking Deep Reinforcement Learning for Continuous Control. CoRR abs/1604.06778 (2016)
- [i10]Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermüller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul F. Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron C. Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Melanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian J. Goodfellow, Matthew Graham, Çaglar Gülçehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrançois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Joseph Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph P. Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang:
Theano: A Python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016)
- [i9]Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel:
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks. CoRR abs/1605.09674 (2016)
- [i8]Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba:
OpenAI Gym. CoRR abs/1606.01540 (2016)
- [i7]Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel:
InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. CoRR abs/1606.03657 (2016)
- [i6]Dario Amodei, Chris Olah, Jacob Steinhardt, Paul F. Christiano, John Schulman, Dan Mané:
Concrete Problems in AI Safety. CoRR abs/1606.06565 (2016)
- [i5]Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel:
Variational Lossy Autoencoder. CoRR abs/1611.02731 (2016)
- [i4]Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel:
RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning. CoRR abs/1611.02779 (2016)
- [i3]Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel:
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. CoRR abs/1611.04717 (2016)
- 2015
- [c11]John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, Philipp Moritz:
Trust Region Policy Optimization. ICML 2015: 1889-1897
- [c10]John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel:
Gradient Estimation Using Stochastic Computation Graphs. NIPS 2015: 3528-3536
- [i2]John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel:
Trust Region Policy Optimization. CoRR abs/1502.05477 (2015)
- [i1]John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel:
Gradient Estimation Using Stochastic Computation Graphs. CoRR abs/1506.05254 (2015)
- 2014
- [j1]John Schulman, Yan Duan, Jonathan Ho, Alex X. Lee, Ibrahim Awwal, Henry Bradlow, Jia Pan, Sachin Patil, Ken Goldberg, Pieter Abbeel:
Motion planning with sequential convex optimization and convex collision checking. Int. J. Robotics Res. 33(9): 1251-1270 (2014)
- [c9]Yan Duan, Sachin Patil, John Schulman, Kenneth Y. Goldberg, Pieter Abbeel:
Planning locally optimal, curvature-constrained trajectories in 3D using sequential convex optimization. ICRA 2014: 5889-5895
- [c8]Sachin Patil, Yan Duan, John Schulman, Ken Goldberg, Pieter Abbeel:
Gaussian belief space planning with discontinuities in sensing domains. ICRA 2014: 6483-6490
- [c7]Sachin Patil, Gregory Kahn, Michael Laskey, John Schulman, Ken Goldberg, Pieter Abbeel:
Scaling up Gaussian Belief Space Planning Through Covariance-Free Trajectory Optimization and Automatic Differentiation. WAFR 2014: 515-533
- 2013
- [c6]John Schulman, Alex X. Lee, Jonathan Ho, Pieter Abbeel:
Tracking deformable objects with point clouds. ICRA 2013: 1130-1137
- [c5]John Schulman, Ankush Gupta, Sibi Venkatesan, Mallory Tayson-Frederick, Pieter Abbeel:
A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario. IROS 2013: 4111-4117
- [c4]Alex X. Lee, Yan Duan, Sachin Patil, John Schulman, Zoe McCarthy, Jur van den Berg, Ken Goldberg, Pieter Abbeel:
Sigma hulls for Gaussian belief space planning for imprecise articulated robots amid obstacles. IROS 2013: 5660-5667
- [c3]John Schulman, Jonathan Ho, Cameron Lee, Pieter Abbeel:
Learning from Demonstrations Through the Use of Non-rigid Registration. ISRR 2013: 339-354
- [c2]John Schulman, Jonathan Ho, Alex X. Lee, Ibrahim Awwal, Henry Bradlow, Pieter Abbeel:
Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization. Robotics: Science and Systems 2013
- 2011
- [c1]John D. Schulman, Ken Goldberg, Pieter Abbeel:
Grasping and Fixturing as Submodular Coverage Problems. ISRR 2011: 571-583
last updated on 2024-08-21 20:24 CEST by the dblp team
all metadata released as open data under CC0 1.0 license