Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space
Abstract
In many applications of multi-microphone multi-device processing, the synchronization among different input channels can be affected by the lack of a common clock and isolated drops of samples. In this work, we address the issue of sample drop detection in the context of a conversational speech scenario, recorded by a set of microphones distributed in space. The goal is to design a neural-based model that given a short window in the time domain, detects whether one or more devices have been subjected to a sample drop event. The candidate time windows are selected from a set of large time intervals, possibly including a sample drop, and by using a preprocessing step. The latter is based on the application of normalized cross-correlation between signals acquired by different devices. The architecture of the neural network relies on a CNN-LSTM encoder, followed by multi-head attention. The experiments are conducted using both artificial and real data. Our proposed approach obtained F1 score of 88% on an evaluation set extracted from the CHiME-5 corpus. A comparable performance was found in a larger set of experiments conducted on a set of multi-channel artificial scenes.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2019
- DOI:
- 10.48550/arXiv.1911.06713
- arXiv:
- arXiv:1911.06713
- Bibcode:
- 2019arXiv191106713R
- Keywords:
-
- Computer Science - Sound;
- Electrical Engineering and Systems Science - Audio and Speech Processing;
- I.2.7
- E-Print:
- Submitted to ICASSP 2020