Computer Science > Computation and Language

arXiv:2404.16743 (cs)

[Submitted on 25 Apr 2024 (v1), last revised 26 Apr 2024 (this version, v2)]

Title: Automatic Speech Recognition System-Independent Word Error Rate Estimation

Authors: Chanho Park, Mingjie Chen, Thomas Hain

Abstract: Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given only a speech utterance and its transcript. Previous work on WER estimation focused on building models that are trained with a specific ASR system in mind (referred to as ASR system-dependent). Such estimators are also domain-dependent and inflexible in real-world applications. In this paper, a hypothesis generation method for ASR System-Independent WER estimation (SIWE) is proposed. In contrast to prior work, the WER estimators are trained using data that simulates ASR system output. Hypotheses are generated by substituting phonetically similar or linguistically more likely alternative words. In WER estimation experiments, the proposed method performs similarly to ASR system-dependent WER estimators on in-domain data and achieves state-of-the-art performance on out-of-domain data. On the out-of-domain Switchboard and CALLHOME data, the SIWE model outperformed the baseline estimators in root mean square error and Pearson correlation coefficient by 17.58% and 18.21% relative, respectively. Performance improved further when the WER of the training set was close to the WER of the evaluation dataset.
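The quantities named in the abstract can be illustrated with a minimal sketch. It uses the standard definition of WER (word-level edit distance divided by the number of reference words) and the textbook formulas for root mean square error and the Pearson correlation coefficient. The function names and the whitespace tokenisation are assumptions made for illustration only; this is not the authors' implementation, and it does not reproduce their hypothesis generation method or their estimator.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace, with no text normalisation.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion of a reference word
                curr[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),   # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def rmse(true_wer: Sequence[float], est_wer: Sequence[float]) -> float:
    """Root mean square error between true and estimated WER values."""
    squared = [(t - e) ** 2 for t, e in zip(true_wer, est_wer)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical true and estimated utterance-level WER values.
    true_vals, est_vals = [0.10, 0.25, 0.40], [0.12, 0.20, 0.45]
    print(rmse(true_vals, est_vals), pearson(true_vals, est_vals))
```

A WER estimator is scored by comparing its predicted WER values against values computed from reference transcripts as above: lower RMSE and higher Pearson correlation indicate a better estimator.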

Comments:	Accepted to LREC-COLING 2024 (long)
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2404.16743 [cs.CL]
	(or arXiv:2404.16743v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.16743

Submission history

From: Chanho Park
[v1] Thu, 25 Apr 2024 16:57:05 UTC (110 KB)
[v2] Fri, 26 Apr 2024 11:11:02 UTC (110 KB)
