DOI: 10.1145/3577190.3616118

Gesture Motion Graphs for Few-Shot Speech-Driven Gesture Reenactment

Published: 09 October 2023

Abstract

This paper presents the CASIA-GO entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) Challenge 2023. The system was originally designed for few-shot scenarios, such as generating gestures in the style of an arbitrary in-the-wild target speaker from short speech samples. Given a group of reference data comprising gesture sequences, audio, and text, it first constructs a gesture motion graph that captures the soft gesture units and the interframe continuity within the reference speech; when test audio and text are provided, new rhythmic and semantically appropriate gestures are then reenacted by pathfinding over this graph. To simulate a few-shot scenario, we randomly choose one clip from the training data as the reference for each test clip, and we provide compatible results for the subjective evaluations. Although each test clip uses on average only 0.25% of the whole training set, and the whole test set uses only 17.5% of it in total, the system produces valid results and ranks in the top third in the appropriateness-for-agent-speech evaluation.
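
As a rough illustration of the pipeline the abstract describes, the sketch below builds a toy gesture motion graph and reenacts motion by walking it: reference motion is cut into gesture units (nodes), units whose boundary frames are close are connected (edges that preserve interframe continuity), and the output is stitched together one unit per rhythmic unit of the test speech. This is a minimal sketch under stated assumptions, not the paper's method: the fixed-length segmentation, the boundary-distance threshold, and all function names are illustrative, and the actual system uses soft gesture units and rhythm- and semantics-aware pathfinding costs rather than a random walk. As a side note, the two utilization figures in the abstract are mutually consistent: at 0.25% of the training set per test clip, a 17.5% total suggests a test set of roughly 70 clips, assuming little overlap among the chosen reference clips.

import numpy as np

def build_gesture_graph(frames, seg_len=30, eps=0.5):
    # Split a reference motion clip (frames x features) into fixed-length
    # segments, a crude stand-in for the paper's soft gesture units, and
    # add an edge i -> j whenever the last frame of segment i is close to
    # the first frame of segment j, so traversing the edge preserves
    # interframe continuity.
    segs = [frames[s:s + seg_len]
            for s in range(0, len(frames) - seg_len + 1, seg_len)]
    edges = {i: [j for j, b in enumerate(segs)
                 if np.linalg.norm(segs[i][-1] - b[0]) < eps]
             for i in range(len(segs))}
    return segs, edges

def reenact(segs, edges, n_units, seed=0):
    # Walk the graph for n_units steps (e.g. one unit per speech onset
    # interval from librosa.onset.onset_detect) and concatenate the
    # visited segments. The paper scores candidate paths with rhythmic
    # and semantic costs instead of choosing successors at random.
    rng = np.random.default_rng(seed)
    path = [int(rng.integers(len(segs)))]
    for _ in range(n_units - 1):
        succ = edges[path[-1]]
        path.append(int(rng.choice(succ)) if succ
                    else int(rng.integers(len(segs))))
    return np.concatenate([segs[i] for i in path])

# Toy usage: a smooth random walk as 600 frames of 10-D "pose" features.
rng = np.random.default_rng(1)
motion = np.cumsum(rng.normal(0.0, 0.05, (600, 10)), axis=0)
segs, edges = build_gesture_graph(motion)
out = reenact(segs, edges, n_units=5)
print(out.shape)  # (150, 10): five 30-frame units stitched together

In the real system, the random successor choice would be replaced by a search over the graph (e.g. a beam search or shortest-path formulation) that scores candidate transitions against the onset timing of the test audio and the semantics of the transcript.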


Cited By

  • (2024) Gesture Area Coverage to Assess Gesture Expressiveness and Human-Likeness. Companion Proceedings of the 26th International Conference on Multimodal Interaction, 165–169. https://doi.org/10.1145/3686215.3688822 (online publication date: 4 Nov 2024)
  • (2023) The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. Proceedings of the 25th International Conference on Multimodal Interaction, 792–801. https://doi.org/10.1145/3577190.3616120 (online publication date: 9 Oct 2023)



      Published In

      ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
      October 2023
      858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. few-shot
      2. motion graph
      3. speech-driven gesture generation

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICMI '23

      Acceptance Rates

Overall Acceptance Rate: 453 of 1,080 submissions, 42%


      Article Metrics

• Downloads (Last 12 months): 59
• Downloads (Last 6 weeks): 3

Reflects downloads up to 21 Nov 2024.

