
Interaction Algorithm Effect on Human Experience with Reinforcement Learning

Published: 24 October 2018

Abstract

A goal of interactive machine learning (IML) is to enable people with no specialized training to intuitively teach intelligent agents how to perform tasks. Toward that goal, we are studying how the design of the interaction method for a Bayesian Q-learning algorithm affects the human's experience of teaching the agent, using human-centric metrics such as frustration in addition to traditional ML performance metrics. This study investigated two methods of natural language instruction: critique and action advice. We conducted a human-in-the-loop experiment in which people trained two agents with different teaching methods but, unknown to each participant, the same underlying reinforcement learning algorithm. The results show that an agent that learns from action advice creates a better user experience than an agent that learns from binary critique in terms of frustration, perceived performance, transparency, immediacy, and perceived intelligence. We identified nine main characteristics of an IML algorithm's design that impact the human's experience with the agent: the use of human instructions about the future, compliance with input, empowerment, transparency, immediacy, a deterministic interaction, the complexity of the instructions, the accuracy of the speech recognition software, and the robustness and flexibility of the interaction algorithm.
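To make the two interaction conditions concrete, the sketch below shows one way binary critique and action advice could be layered on top of a plain tabular Q-learning loop. This is a hypothetical illustration only, not the authors' Bayesian Q-learning implementation; all class and method names here are invented for the example.

    import random
    from collections import defaultdict

    class TeachableQAgent:
        """Toy tabular Q-learning agent with two human-feedback channels.

        Hypothetical sketch: give_critique mirrors the binary-critique
        condition and give_advice mirrors the action-advice condition
        from the study; this is not the paper's actual algorithm.
        """

        def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
            self.actions = list(actions)
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.q = defaultdict(float)       # (state, action) -> learned value
            self.shift = defaultdict(float)   # (state, action) -> net critique
            self.advice = {}                  # state -> human-advised action

        def give_critique(self, state, action, positive):
            # Binary critique: "good"/"bad" about an action already taken.
            self.shift[(state, action)] += 1.0 if positive else -1.0

        def give_advice(self, state, action):
            # Action advice: "when you see <state>, do <action>" --
            # an instruction about the agent's future behavior.
            self.advice[state] = action

        def choose(self, state):
            # Comply with standing advice; otherwise act epsilon-greedily
            # on Q-values biased by accumulated critique.
            if state in self.advice:
                return self.advice[state]
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions,
                       key=lambda a: self.q[(state, a)] + self.shift[(state, a)])

        def update(self, s, a, r, s_next):
            # Standard Q-learning backup from the environment reward.
            best_next = max(self.q[(s_next, b)] for b in self.actions)
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

Note the asymmetry this sketch makes visible: critique can only react to actions the agent has already taken, while advice shapes behavior before the agent acts. The abstract's "instructions about the future" characteristic corresponds to exactly this difference.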



Published In

ACM Transactions on Human-Robot Interaction, Volume 7, Issue 2
Special Issue on Artificial Intelligence and Human-Robot Interaction
July 2018
109 pages
EISSN: 2573-9522
DOI: 10.1145/3284682

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2018
Accepted: 01 August 2018
Revised: 01 August 2018
Received: 01 April 2018
Published in THRI Volume 7, Issue 2


Author Tags

  1. Human-agent interaction
  2. human factors
  3. natural language interface
  4. reinforcement learning
  5. sentiment


