Mocanu et al., 2021 - Google Patents

Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people

Document ID: 2255917904981265260
Authors: Mocanu B; Tapu R
Publication year: 2021
Publication venue: IEEE Access

Snippet

In this paper, we introduce a subtitle synchronization and positioning system designed to make multimedia documents more accessible to deaf and hearing-impaired people. The main contributions of the paper concern: a novel synchronization algorithm able to …

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
              - G06F17/30796—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using original textual content or text extracted from visual content or transcript of audio data
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
      - H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
        - H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
          - H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network, synchronizing decoder's clock; Client middleware
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
          - G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/26—Speech to text systems
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
      - H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
        - H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
          - H04N21/47—End-user applications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- H—ELECTRICITY
  - H04—ELECTRIC COMMUNICATION TECHNIQUE
    - H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
      - H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
        - H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication	Publication Date	Title
Makino et al.	2019	Recurrent neural network transducer for audio-visual speech recognition
EP1692629B1 (en)	2011-06-08	System & method for integrative analysis of intrinsic and extrinsic audio-visual data
US7299183B2 (en)	2007-11-20	Closed caption signal processing apparatus and method
US7046300B2 (en)	2006-05-16	Assessing consistency between facial motion and speech signals in video
WO2014141054A1 (en)	2014-09-18	Method, apparatus and system for regenerating voice intonation in automatically dubbed videos
WO2007004110A2 (en)	2007-01-11	System and method for the alignment of intrinsic and extrinsic audio-visual information
CN112733654B (en)	2022-05-24	Method and device for splitting video
Marcheret et al.	2015	Detecting audio-visual synchrony using deep neural networks.
CN114143479B (en)	2023-07-25	Video abstract generation method, device, equipment and storage medium
Federico et al.	2014	An automatic caption alignment mechanism for off-the-shelf speech recognition technologies
EP3839953A1 (en)	2021-06-23	Automatic caption synchronization and positioning
Martín et al.	2021	Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation
GB2366110A (en)	2002-02-27	Synchronising audio and video.
González-Carrasco et al.	2019	Sub-sync: Automatic synchronization of subtitles in the broadcasting of true live programs in spanish
Bang et al.	2020	Automatic construction of a large-scale speech recognition database using multi-genre broadcast data with inaccurate subtitle timestamps
Mocanu et al.	2021	Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people
US9020817B2 (en)	2015-04-28	Using speech to text for detecting commercials and aligning edited episodes with transcripts
Tapu et al.	2019	Dynamic subtitles: A multimodal video accessibility enhancement dedicated to deaf and hearing impaired users
KR102160117B1 (en)	2020-09-25	a real-time broadcast content generating system for disabled
CN116708055B (en)	2024-02-20	Intelligent multimedia audiovisual image processing method, system and storage medium
KR20210047583A (en)	2021-04-30	Database construction method and apparatus
KR101920653B1 (en)	2018-11-22	Method and program for edcating language by making comparison sound
Nouza et al.	2015	System for producing subtitles to internet audio-visual documents
JP6344849B2 (en)	2018-06-20	Video classifier learning device and program
Saz et al.	2018	Lightly supervised alignment of subtitles on multi-genre broadcasts