Öktem et al., 2019 - Google Patents

Prosodic phrase alignment for machine dubbing

Document ID: 4106507144681707661
Author: Öktem A; Farrús M; Bonafonte A
Publication year: 2019
Publication venue: arXiv preprint arXiv:1908.07226

Snippet

Dubbing is a type of audiovisual translation where dialogues are translated and enacted so that they give the impression that the media is in the target language. It requires a careful alignment of dubbed recordings with the lip movements of performers in order to achieve …

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
            - G10L2021/105—Synthesis of the lips movements from speech, e.g. for talking heads
        - G10L21/003—Changing voice quality, e.g. pitch or formants
          - G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
            - G10L21/013—Adapting to target pitch
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
          - G10L2015/088—Word spotting
        - G10L15/26—Speech to text systems
          - G10L15/265—Speech recognisers specially adapted for particular applications
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
        - G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication	Publication Date	Title
Öktem et al.	2019	Prosodic phrase alignment for machine dubbing
US10930263B1 (en)	2021-02-23	Automatic voice dubbing for media content localization
Federico et al.	2020	From speech-to-speech translation to automatic dubbing
US11942093B2 (en)	2024-03-26	System and method for simultaneous multilingual dubbing of video-audio programs
US20160021334A1 (en)	2016-01-21	Method, Apparatus and System For Regenerating Voice Intonation In Automatically Dubbed Videos
US8170878B2 (en)	2012-05-01	Method and apparatus for automatically converting voice
Georgakopoulou	2018	Technologization of audiovisual translation
Hu et al.	2021	Neural dubber: Dubbing for videos according to scripts
Székely et al.	2019	How to train your fillers: uh and um in spontaneous speech synthesis
Baños	2018	Technology and audiovisual translation
Spiteri Miggiani	2021	Exploring applied strategies for English-language dubbing
WO2023279976A1 (en)	2023-01-12	Speech synthesis method, apparatus, device, and storage medium
Sánchez-Mompeán	2020	Prefabricated orality at tone level: Bringing dubbing intonation into the spotlight
KR102261539B1 (en)	2021-06-07	System for providing artificial intelligence based korean culture platform service
de los Reyes Lozano et al.	2023	Beyond the black mirror effect: the impact of machine translation in the audiovisual translation environment
Prazák et al.	2012	Novel Approach to Live Captioning Through Re-speaking: Tailoring Speech Recognition to Re-speaker's Needs.
Virkar et al.	2022	Prosodic alignment for off-screen automatic dubbing
Dall	2017	Statistical parametric speech synthesis using conversational data and phenomena
CN113851140A (en)	2021-12-28	Voice conversion correlation method, system and device
Jankowska et al.	2017	Reading rate in filmic audio description
Kadam et al.	2021	A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation
KR101920653B1 (en)	2018-11-22	Method and program for edcating language by making comparison sound
Nouza et al.	2015	System for producing subtitles to internet audio-visual documents
Picart et al.	2013	HMM-based speech synthesis of live sports commentaries: Integration of a two-layer prosody annotation
US20240155205A1 (en)	2024-05-09	Method for generating captions, subtitles and dubbing for audiovisual media