research-article

Improving Social Awareness Through DANTE: Deep Affinity Network for Clustering Conversational Interactants

Published: 29 May 2020

Abstract

We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering their social context. The predicted pair-wise affinities are then used in a graph clustering framework to identify both small (e.g., dyads) and large groups. The results from our evaluation on multiple, established benchmarks suggest that combining powerful deep learning methods with classical clustering techniques can improve the detection of conversational groups in comparison to prior approaches. Finally, we demonstrate the practicality of our approach in a human-robot interaction scenario. Our efforts show that our work advances group detection not only in theory, but also in practice.
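The two-stage idea in the abstract (score every pair of people, then cluster the resulting graph) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_affinity` is a hand-written, distance-based stand-in for the learned network, and thresholding followed by connected components is a simple substitute for whatever graph clustering method the paper uses. The function names, `scale`, and `threshold` are hypothetical.

```python
import math
from itertools import combinations


def toy_affinity(a, b, scale=1.5):
    """Hypothetical stand-in for a learned pairwise affinity.

    Returns a value in (0, 1] that decays with the distance between
    two (x, y) positions. The real model is a neural network that
    also considers social context; this is NOT that model.
    """
    dist = math.dist(a, b)
    return math.exp(-((dist / scale) ** 2))


def cluster_by_affinity(positions, threshold=0.5):
    """Group people whose pairwise affinity reaches the threshold.

    Uses union-find to take connected components of the thresholded
    affinity graph (an illustrative substitute for graph clustering).
    """
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(positions)), 2):
        if toy_affinity(positions[i], positions[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up scene: a triad, a dyad, and one person standing alone.
people = [(0.0, 0.0), (0.8, 0.0), (0.4, 0.7),
          (5.0, 5.0), (5.6, 5.2), (10.0, 0.0)]
groups = cluster_by_affinity(people)
print(groups)  # [[0, 1, 2], [3, 4], [5]]
```

Connected components merge groups transitively, so a single high-affinity pair can join two otherwise separate groups. That weakness is one reason a more careful clustering step may be preferable in practice.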

Supplementary Material

ZIP File (v4cscw020aux.mp4.zip)



Information

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 4, Issue CSCW1
CSCW
May 2020
1285 pages
EISSN:2573-0142
DOI:10.1145/3403424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. f-formations
  2. group conversations
  3. proxemic interactions
  4. spatial analysis

Qualifiers

  • Research-article
