DOI: 10.1145/3319502.3374820

See What I See: Enabling User-Centric Robotic Assistance Using First-Person Demonstrations

Published: 09 March 2020

Abstract

We explore first-person demonstration as an intuitive way of producing task demonstrations to facilitate user-centric robotic assistance. First-person demonstration directly captures the human experience of task performance via head-mounted cameras and naturally includes productive viewpoints for task actions. We implemented a perception system that parses natural first-person demonstrations into task models consisting of sequential task procedures, spatial configurations, and unique task viewpoints. We also developed a robotic system capable of interacting autonomously with users as it follows previously acquired task demonstrations. To evaluate the effectiveness of our robotic assistance, we conducted a user study contextualized in an assembly scenario; we sought to determine how assistance based on a first-person demonstration (user-centric assistance) versus that informed only by the cover image of the official assembly instruction (standard assistance) may shape users' behaviors and overall experience when working alongside a collaborative robot. Our results show that participants felt that their robot partner was more collaborative and considerate when it provided user-centric assistance than when it offered only standard assistance. Additionally, participants were more likely to exhibit unproductive behaviors, such as using their non-dominant hand, when performing the assembly task without user-centric assistance.
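
The abstract describes task models built from sequential task procedures, spatial configurations, and task viewpoints. The sketch below is a minimal Python illustration of what such a data structure might look like; the class and field names (TaskStep, TaskModel, next_step, and the example values) are assumptions made for illustration only, not the authors' implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pose = Tuple[float, float, float]  # x, y, z position; orientation omitted for brevity


@dataclass
class TaskStep:
    action: str                      # e.g. "insert dowel into seat"
    parts: List[str]                 # objects manipulated in this step
    spatial_config: Dict[str, Pose]  # target placement of each part, taken from the demonstration
    viewpoint: Pose                  # head-mounted-camera pose observed while the step was shown


@dataclass
class TaskModel:
    name: str
    steps: List[TaskStep] = field(default_factory=list)

    def next_step(self, completed: int) -> Optional[TaskStep]:
        """Return the next step the robot should assist with, or None when the task is done."""
        return self.steps[completed] if completed < len(self.steps) else None


# Hypothetical two-step model, of the kind a demonstration parser might emit.
stool = TaskModel(
    name="stool_assembly",
    steps=[
        TaskStep("insert dowel", ["dowel", "seat"],
                 {"dowel": (0.10, 0.02, 0.05)}, viewpoint=(0.00, -0.30, 0.40)),
        TaskStep("attach leg", ["leg", "seat"],
                 {"leg": (0.12, 0.02, 0.00)}, viewpoint=(0.10, -0.25, 0.35)),
    ],
)
print(stool.next_step(completed=0).action)  # -> insert dowel

In a pipeline like the one described, the robot would consult the step sequence to decide what to hand over next and use the stored viewpoint and spatial configuration to present parts from the demonstrator's perspective.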

Supplementary Material

MP4 File (p639-wang.mp4)




Published In

HRI '20: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction
March 2020
690 pages
ISBN: 9781450367462
DOI: 10.1145/3319502
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. collaborative robotics
  2. first-person demonstration
  3. human-robot interaction
  4. programming by demonstration

Qualifiers

  • Research-article

Conference

HRI '20

Acceptance Rates

Overall Acceptance Rate 268 of 1,124 submissions, 24%


Cited By

  • (2023) Learning Interaction Regions and Motion Trajectories Simultaneously From Egocentric Demonstration Videos. IEEE Robotics and Automation Letters, 8(10), 6635-6642. DOI: 10.1109/LRA.2023.3301307. Online publication date: Oct-2023.
  • (2022) Detection of Physical Strain and Fatigue in Industrial Environments Using Visual and Non-Visual Low-Cost Sensors. Technologies, 10(2), 42. DOI: 10.3390/technologies10020042. Online publication date: 16-Mar-2022.
  • (2022) ARnnotate: An Augmented Reality Interface for Collecting Custom Dataset of 3D Hand-Object Interaction Pose Estimation. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1-14. DOI: 10.1145/3526113.3545663. Online publication date: 29-Oct-2022.
  • (2021) A Survey on End-User Robot Programming. ACM Computing Surveys, 54(8), 1-36. DOI: 10.1145/3466819. Online publication date: 4-Oct-2021.
  • (2021) Assisted End-User Robot Programming. Proceedings of the 2021 International Conference on Multimodal Interaction, 797-801. DOI: 10.1145/3462244.3481276. Online publication date: 18-Oct-2021.
  • (2021) Fine-Grained Activity Recognition for Assembly Videos. IEEE Robotics and Automation Letters, 6(2), 3728-3735. DOI: 10.1109/LRA.2021.3064149. Online publication date: Apr-2021.
  • (2021) Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation. IEEE Access, 9, 94644-94655. DOI: 10.1109/ACCESS.2021.3094052. Online publication date: 2021.
  • (2020) From Do You See What I See? to Do You Control What I See? Mediated Vision, From a Distance, for Eyewear Users. Proceedings of the 19th International Conference on Mobile and Ubiquitous Multimedia, 326-328. DOI: 10.1145/3428361.3432089. Online publication date: 22-Nov-2020.
  • (2020) User Needs and Design Opportunities in End-User Robot Programming. Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, 93-95. DOI: 10.1145/3371382.3378300. Online publication date: 23-Mar-2020.
  • (2020) Contextual Programming of Collaborative Robots. Artificial Intelligence in HCI, 321-338. DOI: 10.1007/978-3-030-50334-5_22. Online publication date: 10-Jul-2020.
