Mocanu et al., 2021 - Google Patents
Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired peopleMocanu et al., 2021
View PDF- Document ID
- 2255917904981265260
- Author
- Mocanu B
- Tapu R
- Publication year
- Publication venue
- IEEE Access
External Links
Snippet
In this paper, we introduce a subtitle synchronization and positioning system designed to increase the accessibility of deaf and hearing impaired people to multimedia documents. The main contributions of the paper concern: a novel synchronization algorithm able to …
- 206010011878 Deafness 0 title abstract description 11
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30796—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using original textual content or text extracted from visual content or transcript of audio data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network, synchronizing decoder's clock; Client middleware
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television, VOD [Video On Demand]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Makino et al. | Recurrent neural network transducer for audio-visual speech recognition | |
EP1692629B1 (en) | System & method for integrative analysis of intrinsic and extrinsic audio-visual data | |
US7299183B2 (en) | Closed caption signal processing apparatus and method | |
US7046300B2 (en) | Assessing consistency between facial motion and speech signals in video | |
WO2014141054A1 (en) | Method, apparatus and system for regenerating voice intonation in automatically dubbed videos | |
WO2007004110A2 (en) | System and method for the alignment of intrinsic and extrinsic audio-visual information | |
CN112733654B (en) | Method and device for splitting video | |
Marcheret et al. | Detecting audio-visual synchrony using deep neural networks. | |
CN114143479B (en) | Video abstract generation method, device, equipment and storage medium | |
Federico et al. | An automatic caption alignment mechanism for off-the-shelf speech recognition technologies | |
EP3839953A1 (en) | Automatic caption synchronization and positioning | |
Martín et al. | Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation | |
GB2366110A (en) | Synchronising audio and video. | |
González-Carrasco et al. | Sub-sync: Automatic synchronization of subtitles in the broadcasting of true live programs in spanish | |
Bang et al. | Automatic construction of a large-scale speech recognition database using multi-genre broadcast data with inaccurate subtitle timestamps | |
Mocanu et al. | Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people | |
US9020817B2 (en) | Using speech to text for detecting commercials and aligning edited episodes with transcripts | |
Tapu et al. | Dynamic subtitles: A multimodal video accessibility enhancement dedicated to deaf and hearing impaired users | |
KR102160117B1 (en) | a real-time broadcast content generating system for disabled | |
CN116708055B (en) | Intelligent multimedia audiovisual image processing method, system and storage medium | |
KR20210047583A (en) | Database construction method and apparatus | |
KR101920653B1 (en) | Method and program for edcating language by making comparison sound | |
Nouza et al. | System for producing subtitles to internet audio-visual documents | |
JP6344849B2 (en) | Video classifier learning device and program | |
Saz et al. | Lightly supervised alignment of subtitles on multi-genre broadcasts |