Computer Science > Sound

arXiv:2305.18794 (cs)

[Submitted on 30 May 2023]

Title:Understanding temporally weakly supervised training: A case study for keyword spotting

Authors:Heinrich Dinkel, Weiji Zhuang, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

View PDF

Abstract:The currently most prominent algorithm to train keyword spotting (KWS) models with deep neural networks (DNNs) requires strong supervision i.e., precise knowledge of the spoken keyword location in time. Thus, most KWS approaches treat the presence of redundant data, such as noise, within their training set as an obstacle. A common training paradigm to deal with data redundancies is to use temporally weakly supervised learning, which only requires providing labels on a coarse scale. This study explores the limits of DNN training using temporally weak labeling with applications in KWS. We train a simple end-to-end classifier on the common Google Speech Commands dataset with increased difficulty by randomly appending and adding noise to the training dataset. Our results indicate that temporally weak labeling can achieve comparable results to strongly supervised baselines while having a less stringent labeling requirement. In the presence of noise, weakly supervised models are capable to localize and extract target keywords without explicit supervision, leading to a performance increase compared to strongly supervised approaches.

Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.18794 [cs.SD]
	(or arXiv:2305.18794v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2305.18794

Submission history

From: Heinrich Dinkel [view email]
[v1] Tue, 30 May 2023 07:12:29 UTC (526 KB)

Computer Science > Sound

Title:Understanding temporally weakly supervised training: A case study for keyword spotting

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Sound

Title:Understanding temporally weakly supervised training: A case study for keyword spotting

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators