A 3-D Audio-Visual Corpus of Affective Communication

Published: 01 October 2010

Abstract

Communication between humans relies deeply on the capability to express and recognize feelings. For this reason, research on human-machine interaction needs to focus on the recognition and simulation of emotional states, a prerequisite of which is the collection of affective corpora. Currently available datasets remain a bottleneck because of the difficulties that arise during the acquisition and labeling of affective data. In this work, we present a new audio-visual corpus covering what are arguably the two most important modalities humans use to communicate their emotional states, namely speech and facial expression, the latter in the form of dense dynamic 3-D face geometries. We acquire high-quality data by working in a controlled environment and use video clips to induce affective states. The annotation of the speech signal includes a transcription of the corpus text into a phonological representation, accurate phone segmentation, fundamental frequency extraction, and signal intensity estimation. We employ a real-time 3-D scanner to acquire dense dynamic facial geometries and track the faces throughout the sequences, achieving full spatial and temporal correspondences. The corpus is a valuable tool for applications such as affective visual speech synthesis and view-independent facial expression recognition.
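To make the acoustic annotation concrete, the sketch below computes the two frame-level signals named in the abstract, fundamental frequency and intensity, for a single utterance. It is an illustrative reconstruction, not the authors' pipeline: the paper does not name its extraction tools, and the librosa calls, the placeholder file name `utterance.wav`, and the 65-300 Hz pitch search range are assumptions chosen for a generic adult-speech recording.

```python
# Illustrative sketch (not the paper's actual tooling) of the two acoustic
# annotations named in the abstract: fundamental frequency (F0) and intensity.
# "utterance.wav" is a placeholder file name, not part of the corpus.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)  # keep the native sample rate

# F0 contour via the pYIN probabilistic pitch tracker; the 65-300 Hz range
# is a generic choice for adult speech, not a setting from the paper.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=65.0, fmax=300.0, sr=sr, frame_length=2048
)

# Frame-wise intensity as RMS energy, converted to dB.
rms = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)[0]
intensity_db = librosa.amplitude_to_db(rms, ref=np.max)

# Time stamps per analysis frame, useful for aligning the contours
# with a phone segmentation.
times = librosa.times_like(f0, sr=sr, hop_length=512)
print(f"{len(times)} frames, voiced fraction: {np.nanmean(voiced_flag):.2f}")
```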

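The dense temporal correspondence mentioned in the abstract comes from tracking a face template through the scan sequence. As a hedged illustration of one ingredient of such tracking, the sketch below implements the Kabsch algorithm, the standard least-squares rigid alignment between two corresponding point sets; the paper's actual registration is non-rigid and more involved, and all arrays here are synthetic placeholders.

```python
# Minimal sketch of one ingredient of temporal correspondence: rigidly
# aligning corresponding vertex sets from two consecutive 3-D scans via
# the Kabsch algorithm. The paper's full non-rigid tracking pipeline is
# more involved; all data below are synthetic placeholders.
import numpy as np

def kabsch(P: np.ndarray, Q: np.ndarray):
    """Best-fit rotation R and translation t mapping points P onto Q
    (both N x 3) in the least-squares sense."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Synthetic demo: a random "template" frame and the same vertices after a
# known rigid motion, standing in for two consecutive scans.
rng = np.random.default_rng(0)
template = rng.normal(size=(500, 3))
a = np.radians(10.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
next_frame = template @ R_true.T + np.array([0.02, -0.01, 0.05])

R, t = kabsch(template, next_frame)
aligned = template @ R.T + t
print("max residual:", np.abs(aligned - next_frame).max())  # ~0 (float error)
```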

Information

Published In

IEEE Transactions on Multimedia, Volume 12, Issue 6
October 2010
122 pages

Publisher

IEEE Press


Author Tags

  1. 3-D face modeling
  2. audio-visual database
  3. emotional speech
  4. face tracking
  5. visual speech modeling

Qualifiers

  • Research-article


Cited By

  • (2024) Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data. Proceedings of the 26th International Conference on Multimodal Interaction, 22-32. https://doi.org/10.1145/3678957.3685717. Online: 4 Nov 2024.
  • (2024) ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE. Proceedings of the 17th ACM SIGGRAPH Conference on Motion, Interaction, and Games, 1-12. https://doi.org/10.1145/3677388.3696320. Online: 21 Nov 2024.
  • (2024) MMHead: Towards Fine-grained Multi-modal 3D Facial Animation. Proceedings of the 32nd ACM International Conference on Multimedia, 7966-7975. https://doi.org/10.1145/3664647.3681366. Online: 28 Oct 2024.
  • (2024) DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. Proceedings of the 32nd ACM International Conference on Multimedia, 10506-10514. https://doi.org/10.1145/3664647.3681359. Online: 28 Oct 2024.
  • (2024) CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. Proceedings of the 2024 International Conference on Multimedia Retrieval, 1175-1179. https://doi.org/10.1145/3652583.3657625. Online: 30 May 2024.
  • (2024) Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance. ACM SIGGRAPH 2024 Conference Papers, 1-13. https://doi.org/10.1145/3641519.3657413. Online: 13 Jul 2024.
  • (2024) Expressive 3D Facial Animation Generation Based on Local-to-Global Latent Diffusion. IEEE Transactions on Visualization and Computer Graphics 30(11), 7397-7407. https://doi.org/10.1109/TVCG.2024.3456213. Online: 1 Nov 2024.
  • (2024) CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation. IEEE Transactions on Circuits and Systems for Video Technology 34(9), 8953-8965. https://doi.org/10.1109/TCSVT.2024.3386836. Online: 1 Sep 2024.
  • (2024) 3D head-talk: speech synthesis 3D head movement face animation. Soft Computing - A Fusion of Foundations, Methodologies and Applications 28(1), 363-379. https://doi.org/10.1007/s00500-023-09292-5. Online: 1 Jan 2024.
  • (2024) ScanTalk: 3D Talking Heads from Unregistered Scans. Computer Vision – ECCV 2024, 19-36. https://doi.org/10.1007/978-3-031-73397-0_2. Online: 29 Sep 2024.
