
Interaction Algorithm Effect on Human Experience with Reinforcement Learning

Published: 24 October 2018

Abstract

A goal of interactive machine learning (IML) is to enable people with no specialized training to intuitively teach intelligent agents how to perform tasks. Toward that goal, we are studying how the design of the interaction method for a Bayesian Q-learning algorithm affects the human's experience of teaching the agent, using human-centric metrics such as frustration in addition to traditional ML performance metrics. This study investigated two methods of natural language instruction: critique and action advice. We conducted a human-in-the-loop experiment in which people trained two agents with different teaching methods but, unknown to each participant, the same underlying reinforcement learning algorithm. The results show that an agent that learns from action advice creates a better user experience than an agent that learns from binary critique in terms of frustration, perceived performance, transparency, immediacy, and perceived intelligence. We identified nine main characteristics of an IML algorithm's design that impact the human's experience with the agent: the use of human instructions about the future, compliance with input, empowerment, transparency, immediacy, a deterministic interaction, the complexity of the instructions, the accuracy of the speech recognition software, and the robustness and flexibility of the interaction algorithm.
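To make the two interaction conditions concrete, the sketch below shows one way binary critique and action advice could be layered on top of a plain tabular Q-learning loop. This is a hypothetical illustration only, not the authors' Bayesian Q-learning implementation; all class and method names here are invented for the example.

    import random
    from collections import defaultdict

    class TeachableQAgent:
        """Toy tabular Q-learning agent with two human-feedback channels.

        Hypothetical sketch: give_critique mirrors the binary-critique
        condition and give_advice mirrors the action-advice condition
        from the study; this is not the paper's actual algorithm.
        """

        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.actions = list(actions)
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.q = defaultdict(float)       # (state, action) -> learned value
            self.shift = defaultdict(float)   # (state, action) -> net critique
            self.advice = {}                  # state -> human-advised action

        def give_critique(self, state, action, positive):
            # Binary critique: "good"/"bad" about an action already taken.
            self.shift[(state, action)] += 1.0 if positive else -1.0

        def give_advice(self, state, action):
            # Action advice: "when you see <state>, do <action>" --
            # an instruction about the agent's future behavior.
            self.advice[state] = action

        def choose(self, state):
            # Comply with standing advice; otherwise act epsilon-greedily
            # on Q-values biased by accumulated critique.
            if state in self.advice:
                return self.advice[state]
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions,
                       key=lambda a: self.q[(state, a)] + self.shift[(state, a)])

        def update(self, s, a, r, s_next):
            # Standard Q-learning backup from the environment reward.
            best_next = max(self.q[(s_next, b)] for b in self.actions)
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

Note the asymmetry this sketch makes visible: critique can only react to actions the agent has already taken, while advice shapes behavior before the agent acts. The abstract's "instructions about the future" characteristic corresponds to exactly this difference.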



Published In

ACM Transactions on Human-Robot Interaction, Volume 7, Issue 2
Special Issue on Artificial Intelligence and Human-Robot Interaction
July 2018
109 pages
EISSN: 2573-9522
DOI: 10.1145/3284682

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2018
Accepted: 01 August 2018
Revised: 01 August 2018
Received: 01 April 2018
Published in THRI Volume 7, Issue 2


Author Tags

  1. Human-agent interaction
  2. human factors
  3. natural language interface
  4. reinforcement learning
  5. sentiment


