DOI: 10.1145/3456126.3456129

SeeSpeech: See Emotions in The Speech

Published: 29 June 2021

Abstract

At present, machine understanding of speech focuses mostly on semantics, yet speech also carries emotion. Emotion can not only reinforce semantic content but even change it. This paper presents SeeSpeech, a system for classifying the emotion expressed in speech. SeeSpeech uses MCEP (mel-cepstral coefficients) as the speech emotion feature and feeds it into a CNN and a Transformer in parallel. To obtain richer features, the CNN branch uses batch normalization while the Transformer branch uses layer normalization; the outputs of the two branches are then combined, and the emotion class is obtained through a SoftMax layer. SeeSpeech achieves a peak classification accuracy of 97% on the RAVDESS dataset and 85% in a test on an actual edge gateway. These results show that SeeSpeech performs encouragingly on speech emotion classification and has broad application prospects in human-computer interaction.
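The abstract describes the dual-branch design only at a high level. The sketch below is a minimal, assumption-based illustration in PyTorch of that idea: MCEP frames pass through a CNN branch with batch normalization and a Transformer branch (whose encoder layers apply layer normalization internally), the two branch outputs are concatenated, and a SoftMax yields the emotion probabilities. All concrete dimensions here (24 MCEP coefficients, 8 RAVDESS emotion classes, hidden sizes, layer counts) are illustrative guesses, not the authors' configuration.

```python
# Minimal sketch of a SeeSpeech-style dual-branch classifier.
# Dimensions and layer counts are illustrative assumptions only.
import torch
import torch.nn as nn

class DualBranchEmotionNet(nn.Module):
    def __init__(self, n_mcep=24, n_classes=8, d_model=128):
        super().__init__()
        # CNN branch with batch normalization over the frame-by-coefficient "image".
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (batch, 32, 1, 1)
        )
        # Transformer branch; nn.TransformerEncoderLayer applies layer normalization internally.
        self.proj = nn.Linear(n_mcep, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Combine both branch outputs and classify.
        self.classifier = nn.Linear(32 + d_model, n_classes)

    def forward(self, mcep):
        # mcep: (batch, frames, n_mcep)
        cnn_feat = self.cnn(mcep.unsqueeze(1)).flatten(1)            # (batch, 32)
        trans_feat = self.transformer(self.proj(mcep)).mean(dim=1)   # (batch, d_model)
        logits = self.classifier(torch.cat([cnn_feat, trans_feat], dim=1))
        return logits.softmax(dim=-1)   # emotion class probabilities

# Example: a batch of 4 utterances, 200 frames of 24 MCEP coefficients each.
model = DualBranchEmotionNet()
probs = model(torch.randn(4, 200, 24))
print(probs.shape)  # torch.Size([4, 8])
```

In this sketch the combination step is a simple concatenation of the pooled CNN features and the averaged Transformer features; the paper does not state which fusion it uses, so this is one plausible reading of "combines the output of CNN and Transformer."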


    Published In

    ASSE '21: 2021 2nd Asia Service Sciences and Software Engineering Conference
    February 2021
    143 pages
    ISBN: 9781450389082
    DOI: 10.1145/3456126

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deep learning
    2. Emotion classification
    3. Emotions in speech

    Qualifiers

    • Research-article
    • Research
    • Refereed limited
