Article

Reinforcement learning with human teachers: evidence of feedback and guidance with implications for learning performance

Authors:

Andrea L. Thomaz,

Cynthia Breazeal

AAAI'06: Proceedings of the 21st national conference on Artificial intelligence - Volume 1

Pages 1000 - 1005

Published: 16 July 2006

Abstract

As robots become a mass consumer product, they will need to learn new skills by interacting with typical human users. Past approaches have adapted reinforcement learning (RL) to accept a human reward signal; however, we question the implicit assumption that people will only want to give the learner feedback on its past actions. We present findings from a human user study showing that people use the reward signal not only to provide feedback about past actions, but also to provide future-directed rewards to guide subsequent actions. Given this, we made specific modifications to the simulated RL robot to incorporate guidance. We then analyze and evaluate its learning performance in a second user study, and we report significant improvements on several measures. This work demonstrates the importance of understanding the human-teacher/robot-learner system as a whole in order to design algorithms that support how people want to teach while simultaneously improving the robot's learning performance.
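The distinction the abstract draws between feedback on past actions and guidance toward future actions can be illustrated with a small sketch. The abstract does not describe the authors' actual modifications, so everything below is an assumption made for illustration: a hypothetical five-state corridor task, scripted stand-ins for the human teacher (`teacher_feedback`, `teacher_guidance`), and a standard tabular Q-learning update in which guidance simply narrows the set of actions the learner considers next.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
GOAL = 4  # hypothetical corridor with states 0..4; the goal is the right end


def step(state, action):
    """Move one cell along the corridor, clamped to the range [0, GOAL]."""
    delta = 1 if action == "right" else -1
    return max(0, min(GOAL, state + delta))


def teacher_feedback(state, next_state):
    """Scripted stand-in for a human reward on the action just taken."""
    if next_state == GOAL:
        return 1.0
    return 0.1 if next_state > state else -0.1


def teacher_guidance(state, rng, rate):
    """Scripted stand-in for guidance: sometimes suggest which actions to try next."""
    return ["right"] if rng.random() < rate else None


def choose_action(q, state, candidates, rng, epsilon):
    """Epsilon-greedy selection restricted to the candidate actions."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    best = max(q[(state, a)] for a in candidates)
    return rng.choice([a for a in candidates if q[(state, a)] == best])


def train(episodes=50, guidance_rate=0.5, alpha=0.5, gamma=0.9,
          epsilon=0.2, seed=0):
    """Run tabular Q-learning; return the Q-table and total steps taken."""
    rng = random.Random(seed)
    q = defaultdict(float)
    total_steps = 0
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # per-episode step cap
            total_steps += 1
            # Guidance (future-directed): narrows the actions considered next.
            suggested = teacher_guidance(state, rng, guidance_rate)
            candidates = suggested if suggested else ACTIONS
            action = choose_action(q, state, candidates, rng, epsilon)
            next_state = step(state, action)
            # Feedback (past-directed): reward for the action just taken.
            reward = teacher_feedback(state, next_state)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
            if state == GOAL:
                break
    return q, total_steps
```

Calling `train(guidance_rate=0.0)` and `train(guidance_rate=1.0)` gives a rough way to compare learning with and without guidance on this toy task. Such a comparison says nothing about the user-study results reported in the paper.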

Index Terms
1. Computer systems organization
  1. Embedded and cyber-physical systems
2. Computing methodologies
  1. Artificial intelligence
    1. Control methods
    2. Planning and scheduling

Information

Proceedings volume: July 2006, 1005 pages

ISBN: 9781577352815

Editor: Anthony Cohn, University of Leeds

Sponsors: AAAI: American Association for Artificial Intelligence

Publisher: AAAI Press
