A Data Collection Protocol, Tool, and Analysis of Multimodal Data at Different Speech Voice Levels for Avatar Facial Animation

宮脇亮輔 et al., 2023

Document ID
7306678649644290414
Author
宮脇亮輔 (Ryosuke Miyawaki)
Publication year
2023

Snippet

Knowing the relationship between speech-related facial movement and speech is essential for avatar animation. Accurate facial displays are necessary to convey perceptual speech characteristics fully. Recently, an effort has been made to infer the relationship between …
Continue reading at naist.repo.nii.ac.jp (PDF)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • G10L2021/105: Synthesis of the lips movements from speech, e.g. for talking heads
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013: Adapting to target pitch
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06K: RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00: Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221: Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268: Feature extraction; Face representation
    • G06K9/00281: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems

Similar Documents

Publication and title
US20200279553A1 (en) Linguistic style matching agent
US20210390748A1 (en) Personalized speech-to-video with three-dimensional (3d) skeleton regularization and expressive body poses
Busso et al. Rigid head motion in expressive speech animation: Analysis and synthesis
US9691296B2 (en) Methods and apparatus for conversation coach
Ding et al. Laughter animation synthesis
JP2022534708A (en) A Multimodal Model for Dynamically Reacting Virtual Characters
CN111415677A (en) Method, apparatus, device and medium for generating video
Fort et al. Seeing the initial articulatory gestures of a word triggers lexical access
JP2024525119A (en) System and method for automatic generation of interactive synchronized discrete avatars in real time
Arias et al. Realistic transformation of facial and vocal smiles in real-time audiovisual streams
US11860925B2 (en) Human centered computing based digital persona generation
Janssoone et al. Using temporal association rules for the synthesis of embodied conversational agents with a specific stance
US20240220811A1 (en) System and method for using gestures and expressions for controlling speech applications
Amiriparian et al. Synchronization in interpersonal speech
Mattheij et al. Mirror mirror on the wall
Delbosc et al. Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent
Chen et al. VAST: Vivify your talking avatar via zero-shot expressive facial style transfer
宮脇亮輔 et al. A Data Collection Protocol, Tool, and Analysis of Multimodal Data at Different Speech Voice Levels for Avatar Facial Animation
Urbain et al. Laugh machine
CN110166844B (en) Data processing method and device for data processing
Heisler et al. Making an android robot head talk
Granström et al. Inside out–acoustic and visual aspects of verbal and non-verbal communication
Mathur Scaling machine learning systems using domain adaptation
US12126791B1 (en) Conversational AI-encoded language for data compression
Dahmani et al. Some consideration on expressive audiovisual speech corpus acquisition using a multimodal platform