KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
Koo et al., 2023
- Document ID
- 17125082080225205763
- Author
- Koo S
- Park C
- Kim J
- Seo J
- Eo S
- Moon H
- Lim H
- Publication year
- 2023
- Publication venue
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Snippet
Abstract: Automatic Speech Recognition (ASR) systems are instrumental across various applications, with their performance being critically tied to user satisfaction. Conventional evaluation metrics for ASR systems produce a singular aggregate score, which is insufficient …
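The "singular aggregate score" the snippet refers to is typically word error rate (WER), the conventional ASR metric. The sketch below is illustrative only and is not taken from the KEBAP paper: it assumes WER as the representative metric, and the `wer` function and example strings are hypothetical. It shows why a single number gives no insight into *why* errors occurred, which is the gap an explainable error benchmark targets.

```python
# Minimal sketch of a conventional aggregate ASR metric (word error rate),
# computed as word-level Levenshtein distance divided by reference length.
# Function name and example sentences are illustrative, not from the paper.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") collapse
    # into a single number with no indication of the error's cause or type.
    print(wer("the cat sat on the mat", "the cat sit on mat"))  # ~0.333
```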
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/265—Speech recognisers specially adapted for particular applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
Similar Documents
Publication | Title
---|---
JP5405672B2 (en) | Foreign language learning apparatus and dialogue system
Ngueajio et al. | Hey ASR system! Why aren’t you more inclusive? Automatic speech recognition systems’ bias and proposed bias mitigation techniques. A literature review
Keshet | Automatic speech recognition: A primer for speech-language pathology researchers
Kopparapu | Non-linguistic analysis of call center conversations
Mridha et al. | A study on the challenges and opportunities of speech recognition for Bengali language
Hong et al. | Identifying speech input errors through audio-only interaction
Rabiner et al. | Speech recognition: Statistical methods
Choe et al. | Language-specific effects on automatic speech recognition errors for world Englishes
CA3160315C (en) | Real-time speech-to-speech generation (RSSG) apparatus, method and a system therefore
Koo et al. | KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
AU2020103587A4 (en) | A system and a method for cross-linguistic automatic speech recognition
KR101145440B1 (en) | A method and system for estimating foreign language speaking using speech recognition technique
Kumar et al. | Automatic spontaneous speech recognition for Punjabi language interview speech corpus
Manjunath et al. | Automatic phonetic transcription for read, extempore and conversation speech for an Indian language: Bengali
Sazhok et al. | Punctuation Restoration for Ukrainian Broadcast Speech Recognition System based on Bidirectional Recurrent Neural Network and Word Embeddings
Jackson | Automatic speech recognition: Human computer interface for Kinyarwanda language
Tits et al. | Flowchase: a Mobile Application for Pronunciation Training
Koo et al. | Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
Kano et al. | An end-to-end model for cross-lingual transformation of paralinguistic information
Mann et al. | Tamil talk: What you speak is what you get!
Balaji et al. | CSE_SPEECH@LT-EDI-2023: Automatic Speech Recognition vulnerable old-aged and transgender people in Tamil
Sani et al. | Filled pause detection in Indonesian spontaneous speech
Vafaie et al. | Speech Recognition for Persian
Elfahal | Automatic recognition and identification for mixed Sudanese Arabic–English languages speech
de Carvalho Campinho | Automatic Speech Recognition for European Portuguese