DOI: 10.1145/3371382.3378297

Neural Speech Synthesis with Style Intensity Interpolation: A Perceptual Analysis

Published: 01 April 2020

Abstract

State-of-the-art speech synthesis has considerably reduced the perceptual gap between synthetic and human speech. However, the impact of speech style control on perception is not well understood. In this paper, we propose a method to analyze how controlling the parameters of a text-to-speech (TTS) system affects the perception of the generated sentence. The analysis relies on visualizing and analyzing listening test results. To this end, we train a speech synthesis system on several discrete categories of speech styles, each encoded as a one-hot vector in the network. After training, we interpolate between the vectors representing each style. A perception test showed that, despite being trained only on discrete categories of data, the network is capable of generating intermediate intensity levels between neutral and a given speech style.
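
The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes linear interpolation between the one-hot code for neutral and the one-hot code for a target style; the abstract says only that the style vectors are interpolated, so the blending rule, the style names, and the function names below are hypothetical and are not taken from the paper.

```python
# Hypothetical style inventory: the abstract does not list the categories used.
STYLES = ["neutral", "happy", "sad", "angry"]


def one_hot(style):
    """Return the one-hot vector for a style name in STYLES."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    return [1.0 if s == style else 0.0 for s in STYLES]


def interpolate_style(target, intensity):
    """Linearly blend the neutral code with the target style code.

    intensity = 0.0 yields the neutral code, 1.0 yields the target code,
    and values in between yield intermediate style vectors.
    Linear blending is an assumption made for this illustration.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    neutral = one_hot("neutral")
    styled = one_hot(target)
    return [(1.0 - intensity) * n + intensity * s
            for n, s in zip(neutral, styled)]


# Five intensity levels between neutral and the (hypothetical) "happy" style.
codes = [interpolate_style("happy", a) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

In a trained system, each vector in `codes` would replace the one-hot style input at synthesis time, and listeners would then rate the perceived intensity of the resulting utterances.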

Cited By

  • (2021) Analysis and Assessment of Controllability of an Expressive Deep Learning-Based TTS System. Informatics 8:4 (84). DOI: 10.3390/informatics8040084. Online publication date: 25-Nov-2021
  • (2021) VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech. 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 415-422. DOI: 10.1109/SLT48900.2021.9383526. Online publication date: 19-Jan-2021

    Published In

    HRI '20: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction
    March 2020
    702 pages
    ISBN: 9781450370578
    DOI: 10.1145/3371382
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. perception
    3. speech synthesis
    4. style interpolation

    Qualifiers

    • Abstract

    Conference

    HRI '20

    Acceptance Rates

    Overall Acceptance Rate 192 of 519 submissions, 37%
