
DOI: 10.1145/3577190.3614114
Research article · Open access

Component attention network for multimodal dance improvisation recognition

Published: 09 October 2023

Abstract

Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data from other modalities, such as audio, can be recorded alongside the motion and can benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, the component attention network (CANet), for multimodal fusion at three levels: 1) feature fusion with CANet, 2) model fusion with CANet and a graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in the different fusion methods and to identify critical temporal and component features. We show that our proposed model outperforms two baseline methods, demonstrating its potential for analyzing improvisation in dance.
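To make the fusion levels described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of attention-weighted feature fusion over modality components, plus a majority-vote helper for the late-fusion level. The module name `ComponentAttentionFusion`, the function `late_fusion_vote`, and all tensor shapes are illustrative assumptions; this is not the published CANet or GCN implementation.

```python
# Hypothetical sketch -- NOT the authors' CANet implementation.
# Assumes each sample is summarized as one feature vector per component
# (e.g., per body part from skeleton data, plus audio features such as MFCCs).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComponentAttentionFusion(nn.Module):
    """Weight per-component features with learned attention, then classify."""

    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # scalar relevance per component
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_components, feat_dim)
        attn = F.softmax(self.score(x).squeeze(-1), dim=1)  # (batch, n_components)
        fused = (attn.unsqueeze(-1) * x).sum(dim=1)         # attention-weighted sum
        return self.classifier(fused)                       # (batch, n_classes)


def late_fusion_vote(logits_per_model: list[torch.Tensor]) -> torch.Tensor:
    """Majority vote over the class predictions of several models."""
    preds = torch.stack([lg.argmax(dim=-1) for lg in logits_per_model])  # (n_models, batch)
    return preds.mode(dim=0).values                                      # (batch,)


# Usage sketch: 8 samples, 5 components (e.g., limbs + audio), 64-d features, 4 classes.
model = ComponentAttentionFusion(feat_dim=64, n_classes=4)
logits = model(torch.randn(8, 5, 64))
votes = late_fusion_vote([logits, torch.randn(8, 4)])  # combine with a second model
```

Under the paper's description, the second (model-fusion) level would additionally pass skeleton features through a GCN branch and combine its representation with the attention branch before classification; the voting helper above corresponds to the third, late-fusion level.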


Cited By

  • POA-Net: Dance Poses and Activity Classification Using Convolutional Neural Networks. 2024 IEEE Region 10 Symposium (TENSYMP), 1–6, 27 September 2024. DOI: 10.1109/TENSYMP61132.2024.10752281



Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023, 858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Attention Network
  2. Dance Recognition
  3. Multimodal Fusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Article Metrics

  • Downloads (last 12 months): 250
  • Downloads (last 6 weeks): 22
Reflects downloads up to 25 Nov 2024
