DOI: 10.1145/3429889.3429904

A Speech-Driven 3-D Lip Synthesis with Realistic Dynamics in Mandarin Chinese

Published: 04 December 2020

Abstract

In this paper, a new speech-driven lip-synchronization method is developed that predicts the 3-D geometric shape of the lips without using a speech-recognition model in the visualization procedure, and that can be trained and evaluated against realistic dynamics. Videos of Mandarin Chinese words are used as data. The speech signals are converted into MFCCs, which serve as the audio features. Sixty-eight facial landmarks are annotated on the corresponding video frames with the prediction algorithm from the Dlib library. Eos, a 3-D morphable face model, is then fitted to these facial landmarks to predict the 3-D face shape, from which the 3-D landmarks are obtained. A machine-learning sequence-tagging model, an averaged structured perceptron decoded with the Viterbi algorithm, is applied to model the direct prediction of the labial parameters from the acoustic MFCC parameters. The 3-D labial region of each frame's eos prediction is morphed according to the predicted 3-D labial landmarks, yielding a 3-D lip sequence that can be plotted in synchrony with the acoustic signal. In this 3-D lip synthesis, acoustic features are mapped directly onto realistic lip shapes; because neither lip units nor speech recognition is used, more of the realistic articulatory and personal detail is preserved. The predicted geometric shapes are compared with the realistic dynamics, and the comparison indicates that the synthesis performs well.
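The sequence-tagging step described above, an averaged structured perceptron decoded with the Viterbi algorithm, can be sketched in a minimal form. The sketch below is illustrative rather than the authors' implementation: it assumes the MFCC frames have already been extracted and the labial parameters discretized into a finite label set, and all names and the toy data are hypothetical.

```python
import numpy as np

def viterbi(emission, transition):
    """Return the highest-scoring label sequence.
    emission: (T, K) per-frame label scores; transition: (K, K) label-pair scores."""
    T, K = emission.shape
    score = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        # Score of every (previous label, current label) pair at frame t.
        cand = score[t - 1][:, None] + transition + emission[t][None, :]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # Follow back-pointers from the best final label.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def train_structured_perceptron(X, Y, n_labels, epochs=10):
    """Averaged structured perceptron. X is a list of (T, D) acoustic feature
    matrices (e.g. MFCC frames); Y holds the gold label sequences (discretized
    lip shapes). Returns averaged emission and transition weights."""
    D = X[0].shape[1]
    W = np.zeros((n_labels, D))              # per-label emission weights
    Trans = np.zeros((n_labels, n_labels))   # label-transition weights
    W_sum = np.zeros_like(W)
    Trans_sum = np.zeros_like(Trans)
    n_updates = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = viterbi(x @ W.T, Trans)
            if pred != list(y):
                # Standard update: add gold features, subtract predicted ones
                # (terms cancel wherever the two sequences agree).
                for t in range(len(y)):
                    if y[t] != pred[t]:
                        W[y[t]] += x[t]
                        W[pred[t]] -= x[t]
                    if t > 0:
                        Trans[y[t - 1], y[t]] += 1
                        Trans[pred[t - 1], pred[t]] -= 1
            # Averaging: accumulate the weights after every example.
            W_sum += W
            Trans_sum += Trans
            n_updates += 1
    return W_sum / n_updates, Trans_sum / n_updates

# Toy demonstration with two labels and 2-D stand-in "acoustic" features.
X = [np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]]),
     np.array([[0., 1.], [0., 1.], [1., 0.], [1., 0.]])]
Y = [[0, 0, 1, 1], [1, 1, 0, 0]]
W, Trans = train_structured_perceptron(X, Y, n_labels=2)
```

Averaging the weights over all updates, rather than keeping only the final vector, is the variant introduced by Collins and tends to generalize better on sequence data; decoding a new utterance then amounts to one call to `viterbi` on its emission scores.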



Published In

ISAIMS '20: Proceedings of the 1st International Symposium on Artificial Intelligence in Medical Sciences
September 2020, 313 pages
ISBN: 9781450388603
DOI: 10.1145/3429889

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

1. 3-D lip synthesis
2. Mandarin Chinese
3. realistic dynamics
4. speech-driven

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ISAIMS 2020

Acceptance Rates

ISAIMS '20 paper acceptance rate: 53 of 112 submissions (47%); overall acceptance rate: 53 of 112 (47%)

