
DOI: 10.1145/3577190.3614114
Research article · Open access

Component attention network for multimodal dance improvisation recognition

Published: 09 October 2023

Abstract

Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data from other modalities, such as audio, can be recorded alongside the motion and can benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, the component attention network (CANet), for multimodal fusion at three levels: 1) feature fusion with CANet, 2) model fusion with CANet and a graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in the different fusion methods and to identify critical temporal and component features. We show that our proposed model outperforms two baseline methods, demonstrating its potential for analyzing improvisation in dance.
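To make the fusion levels described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of attention-weighted feature fusion over modality components, plus a majority-vote helper for the late-fusion level. The module name `ComponentAttentionFusion`, the function `late_fusion_vote`, and all tensor shapes are illustrative assumptions; this is not the published CANet or GCN implementation.

```python
# Hypothetical sketch -- NOT the authors' CANet implementation.
# Assumes each sample is summarized as one feature vector per component
# (e.g., per body part from skeleton data, plus audio features such as MFCCs).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComponentAttentionFusion(nn.Module):
    """Weight per-component features with learned attention, then classify."""

    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # scalar relevance per component
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_components, feat_dim)
        attn = F.softmax(self.score(x).squeeze(-1), dim=1)  # (batch, n_components)
        fused = (attn.unsqueeze(-1) * x).sum(dim=1)         # attention-weighted sum
        return self.classifier(fused)                       # (batch, n_classes)


def late_fusion_vote(logits_per_model: list[torch.Tensor]) -> torch.Tensor:
    """Majority vote over the class predictions of several models."""
    preds = torch.stack([lg.argmax(dim=-1) for lg in logits_per_model])  # (n_models, batch)
    return preds.mode(dim=0).values                                      # (batch,)


# Usage sketch: 8 samples, 5 components (e.g., limbs + audio), 64-d features, 4 classes.
model = ComponentAttentionFusion(feat_dim=64, n_classes=4)
logits = model(torch.randn(8, 5, 64))
votes = late_fusion_vote([logits, torch.randn(8, 4)])  # combine with a second model
```

Under the paper's description, the second (model-fusion) level would additionally pass skeleton features through a GCN branch and combine its representation with the attention branch before classification; the voting helper above corresponds to the third, late-fusion level.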


Cited By

  • POA-Net: Dance Poses and Activity Classification Using Convolutional Neural Networks. 2024 IEEE Region 10 Symposium (TENSYMP), 1–6, 27 September 2024. DOI: 10.1109/TENSYMP61132.2024.10752281



Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023, 858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Attention Network
  2. Dance Recognition
  3. Multimodal Fusion

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '23

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Article Metrics

  • Downloads (last 12 months): 250
  • Downloads (last 6 weeks): 22
Reflects downloads up to 25 Nov 2024
