Rohin Shah
2020 – today
- 2024
- [i25] János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda:
AtP*: An efficient and scalable method for localizing LLM behaviour to components. CoRR abs/2403.00745 (2024)
- [i24] Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Grégoire Delétang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca D. Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane:
Evaluating Frontier Models for Dangerous Capabilities. CoRR abs/2403.13793 (2024)
- [i23] Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda:
Improving Dictionary Learning with Gated Sparse Autoencoders. CoRR abs/2404.16014 (2024)
- [i22] Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah:
On scalable oversight with weak LLMs judging strong LLMs. CoRR abs/2407.04622 (2024)
- [i21] Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, János Kramár, Anca D. Dragan, Rohin Shah, Neel Nanda:
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2. CoRR abs/2408.05147 (2024)
- 2023
- [c14] Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan:
SIRL: Similarity-based Implicit Representation Learning. HRI 2023: 565-574
- [c13] Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Rohin Shah:
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks. NeurIPS 2023
- [i20] Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan:
SIRL: Similarity-based Implicit Representation Learning. CoRR abs/2301.00810 (2023)
- [i19] Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada P. Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Ellen R. Novoseller, Vinicius G. Goecks, Nicholas R. Waytowich, David Watkins, Josh Miller, Rohin Shah:
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition. CoRR abs/2303.13512 (2023)
- [i18] Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik:
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla. CoRR abs/2307.09458 (2023)
- [i17] Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar:
Explaining grokking through circuit efficiency. CoRR abs/2309.02390 (2023)
- [i16] Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Rohin Shah:
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks. CoRR abs/2312.02405 (2023)
- [i15] Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah:
Challenges with unsupervised LLM knowledge discovery. CoRR abs/2312.10029 (2023)
- 2022
- [i14] Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas R. Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Jun Shern Chan, Daniel del Castillo, Tom Lieberum:
Retrospective on the 2021 BASALT Competition on Learning from Human Feedback. CoRR abs/2204.07123 (2022)
- [i13] Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah:
An Empirical Investigation of Representation Learning for Imitation. CoRR abs/2205.07886 (2022)
- [i12] Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton:
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals. CoRR abs/2210.01790 (2022)
- 2021
- [c12] Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca D. Dragan, Rohin Shah:
Evaluating the Robustness of Collaborative Agents. AAMAS 2021: 1560-1562
- [c11] David Lindner, Rohin Shah, Pieter Abbeel, Anca D. Dragan:
Learning What To Do by Simulating the Past. ICLR 2021
- [c10] Cynthia Chen, Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah:
An Empirical Investigation of Representation Learning for Imitation. NeurIPS Datasets and Benchmarks 2021
- [c9] Stephanie Milani, Anssi Kanervisto, Karolis Ramanauskas, Sander Schulhoff, Brandon Houghton, Sharada P. Mohanty, Byron Galbraith, Ke Chen, Yan Song, Tianze Zhou, Bingquan Yu, He Liu, Kai Guan, Yujing Hu, Tangjie Lv, Federico Malato, Florian Leopold, Amogh Raut, Ville Hautamäki, Andrew Melnik, Shu Ishida, João F. Henriques, Robert Klassert, Walter Laurito, Lucas Cazzonelli, Cedric Kulbach, Nicholas Popovic, Marvin Schweizer, Ellen R. Novoseller, Vinicius G. Goecks, Nicholas R. Waytowich, David Watkins, Josh Miller, Rohin Shah:
Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition. NeurIPS (Competition and Demos) 2021: 171-188
- [c8] Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas R. Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Jun Shern Chan, Daniel del Castillo, Tom Lieberum:
Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback. NeurIPS (Competition and Demos) 2021: 259-272
- [c7] Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli:
Optimal Policies Tend To Seek Power. NeurIPS 2021: 23063-23074
- [i11] Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, Anca D. Dragan, Rohin Shah:
Evaluating the Robustness of Collaborative Agents. CoRR abs/2101.05507 (2021)
- [i10] Rachel Freedman, Rohin Shah, Anca D. Dragan:
Choice Set Misspecification in Reward Inference. CoRR abs/2101.07691 (2021)
- [i9] Dmitrii Krasheninnikov, Rohin Shah, Herke van Hoof:
Combining Reward Information from Multiple Sources. CoRR abs/2103.12142 (2021)
- [i8] David Lindner, Rohin Shah, Pieter Abbeel, Anca D. Dragan:
Learning What To Do by Simulating the Past. CoRR abs/2104.03946 (2021)
- [i7] Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William H. Guss, Sharada P. Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca D. Dragan:
The MineRL BASALT Competition on Learning from Human Feedback. CoRR abs/2107.01969 (2021)
- 2020
- [b1] Rohin Shah:
Extracting and Using Preference Information from the State of the World. University of California, Berkeley, USA, 2020
- [c6] Rachel Freedman, Rohin Shah, Anca D. Dragan:
Choice Set Misspecification in Reward Inference. AISafety@IJCAI 2020
- [c5] Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell:
The MAGICAL Benchmark for Robust Imitation. NeurIPS 2020
- [i6] Sam Toyer, Rohin Shah, Andrew Critch, Stuart Russell:
The MAGICAL Benchmark for Robust Imitation. CoRR abs/2011.00401 (2020)
2010 – 2019
- 2019
- [c4] Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca D. Dragan:
Preferences Implicit in the State of the World. ICLR (Poster) 2019
- [c3] Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan:
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. ICML 2019: 5670-5679
- [c2] Micah Carroll, Rohin Shah, Mark K. Ho, Tom Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca D. Dragan:
On the Utility of Learning about Humans for Human-AI Coordination. NeurIPS 2019: 5175-5186
- [i5] Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca D. Dragan:
Preferences Implicit in the State of the World. CoRR abs/1902.04198 (2019)
- [i4] Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan:
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. CoRR abs/1906.09624 (2019)
- [i3] Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca D. Dragan:
On the Utility of Learning about Humans for Human-AI Coordination. CoRR abs/1910.05789 (2019)
- 2018
- [i2] Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell:
Active Inverse Reward Design. CoRR abs/1809.03060 (2018)
- 2016
- [i1] Rohin Shah, Emina Torlak, Rastislav Bodík:
SIMPL: A DSL for Automatic Specialization of Inference Algorithms. CoRR abs/1604.04729 (2016)
- 2014
- [c1] Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah E. Chasins, Rastislav Bodík:
Chlorophyll: synthesis-aided compiler for low-power spatial architectures. PLDI 2014: 396-407
last updated on 2024-09-18 23:41 CEST by the dblp team
all metadata released as open data under CC0 1.0 license