Miyawaki Ryosuke (宮脇亮輔) et al., 2023 - Google Patents
A Data Collection Protocol, Tool, and Analysis of Multimodal Data at Different Speech Voice Levels for Avatar Facial Animation
- Document ID: 7306678649644290414
- Author: Miyawaki Ryosuke (宮脇亮輔)
- Publication year: 2023
Snippet
Knowing the relationship between speech-related facial movement and speech is essential for avatar animation. Accurate facial displays are necessary to convey perceptual speech characteristics fully. Recently, an effort has been made to infer the relationship between …
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques for extracting parameters related to health condition
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
      - G10L13/00—Speech synthesis; Text to speech systems
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
          - G06K9/00268—Feature extraction; Face representation
            - G06K9/00281—Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
Similar Documents
| Publication | Title |
|---|---|
| US20200279553A1 (en) | Linguistic style matching agent |
| US20210390748A1 (en) | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
| Busso et al. | Rigid head motion in expressive speech animation: Analysis and synthesis |
| US9691296B2 (en) | Methods and apparatus for conversation coach |
| Ding et al. | Laughter animation synthesis |
| JP2022534708A (en) | A Multimodal Model for Dynamically Reacting Virtual Characters |
| CN111415677A (en) | Method, apparatus, device and medium for generating video |
| Fort et al. | Seeing the initial articulatory gestures of a word triggers lexical access |
| JP2024525119A (en) | System and method for automatic generation of interactive synchronized discrete avatars in real time |
| Arias et al. | Realistic transformation of facial and vocal smiles in real-time audiovisual streams |
| US11860925B2 (en) | Human centered computing based digital persona generation |
| Janssoone et al. | Using temporal association rules for the synthesis of embodied conversational agents with a specific stance |
| US20240220811A1 (en) | System and method for using gestures and expressions for controlling speech applications |
| Amiriparian et al. | Synchronization in interpersonal speech |
| Mattheij et al. | Mirror mirror on the wall |
| Delbosc et al. | Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent |
| Chen et al. | VAST: Vivify your talking avatar via zero-shot expressive facial style transfer |
| Miyawaki (宮脇亮輔) et al. | A Data Collection Protocol, Tool, and Analysis of Multimodal Data at Different Speech Voice Levels for Avatar Facial Animation |
| Urbain et al. | Laugh machine |
| CN110166844B (en) | Data processing method and device for data processing |
| Heisler et al. | Making an android robot head talk |
| Granström et al. | Inside out – acoustic and visual aspects of verbal and non-verbal communication |
| Mathur | Scaling machine learning systems using domain adaptation |
| US12126791B1 (en) | Conversational AI-encoded language for data compression |
| Dahmani et al. | Some consideration on expressive audiovisual speech corpus acquisition using a multimodal platform |