Nothing Special   »   [go: up one dir, main page]

Mocanu et al., 2021 - Google Patents

Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people

Mocanu et al., 2021

View PDF
Document ID
2255917904981265260
Author
Mocanu B
Tapu R
Publication year
Publication venue
IEEE Access

External Links

Snippet

In this paper, we introduce a subtitle synchronization and positioning system designed to increase the accessibility of deaf and hearing impaired people to multimedia documents. The main contributions of the paper concern: a novel synchronization algorithm able to …
Continue reading at ieeexplore.ieee.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G06F17/30796Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using original textual content or text extracted from visual content or transcript of audio data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television, VOD [Video On Demand]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network, synchronizing decoder's clock; Client middleware
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television, VOD [Video On Demand]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television, VOD [Video On Demand]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication Publication Date Title
Makino et al. Recurrent neural network transducer for audio-visual speech recognition
EP1692629B1 (en) System & method for integrative analysis of intrinsic and extrinsic audio-visual data
US7299183B2 (en) Closed caption signal processing apparatus and method
US7046300B2 (en) Assessing consistency between facial motion and speech signals in video
WO2014141054A1 (en) Method, apparatus and system for regenerating voice intonation in automatically dubbed videos
WO2007004110A2 (en) System and method for the alignment of intrinsic and extrinsic audio-visual information
CN112733654B (en) Method and device for splitting video
Marcheret et al. Detecting audio-visual synchrony using deep neural networks.
CN114143479B (en) Video abstract generation method, device, equipment and storage medium
Federico et al. An automatic caption alignment mechanism for off-the-shelf speech recognition technologies
EP3839953A1 (en) Automatic caption synchronization and positioning
Martín et al. Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation
GB2366110A (en) Synchronising audio and video.
González-Carrasco et al. Sub-sync: Automatic synchronization of subtitles in the broadcasting of true live programs in spanish
Bang et al. Automatic construction of a large-scale speech recognition database using multi-genre broadcast data with inaccurate subtitle timestamps
Mocanu et al. Automatic subtitle synchronization and positioning system dedicated to deaf and hearing impaired people
US9020817B2 (en) Using speech to text for detecting commercials and aligning edited episodes with transcripts
Tapu et al. Dynamic subtitles: A multimodal video accessibility enhancement dedicated to deaf and hearing impaired users
KR102160117B1 (en) a real-time broadcast content generating system for disabled
CN116708055B (en) Intelligent multimedia audiovisual image processing method, system and storage medium
KR20210047583A (en) Database construction method and apparatus
KR101920653B1 (en) Method and program for edcating language by making comparison sound
Nouza et al. System for producing subtitles to internet audio-visual documents
JP6344849B2 (en) Video classifier learning device and program
Saz et al. Lightly supervised alignment of subtitles on multi-genre broadcasts