Robustní zpracování nahrávek pro operativu a bezpečnost

English title

Robust processing of recordings for operations and security

Type

grant

Keywords

speech recognition, robust, recordings, operations, security

Abstract

The project aims to strengthen the competencies of two leading Czech research institutes in mining speech information from real-world recordings in the security domain, to unify and better coordinate their work, and to cooperate closely with security forces so that research results are put into practice in investigation and intelligence. The goal covers advances in robust automatic speech recognition (ASR), training and adaptation of ASR systems for different environments, determining when each person speaks in a recording (diarization), and searching recordings with acoustic queries (Query by Example).

Team members

Karafiát Martin, Ing., Ph.D. (UPGM FIT VUT), research leader
Diez Sánchez Mireia, M.Sc., Ph.D. (UPGM FIT VUT), team leader
Matějka Pavel, Ing., Ph.D. (UPGM FIT VUT), team leader
Szőke Igor, Ing., Ph.D. (UPGM FIT VUT), team leader
Beneš Karel, Ing. (UPGM FIT VUT)
Brukner Jan, Ing. (UPGM FIT VUT)
Kesiraju Santosh (UPGM FIT VUT)
Malenovský Vladimír, Ing., Ph.D. (UPGM FIT VUT)
Mošner Ladislav, Ing. (UPGM FIT VUT)
Pálka Petr, Bc. (UPGM FIT VUT)
Plchot Oldřich, Ing., Ph.D. (UPGM FIT VUT)
Schwarz Petr, Ing., Ph.D. (UPGM FIT VUT)
Silnova Anna, MSc., Ph.D. (UPGM FIT VUT)
Švec Ján, Ing. (UPGM FIT VUT)
Veselý Karel, Ing., Ph.D. (UPGM FIT VUT)
Yusuf Bolaji (UPGM FIT VUT)

Publications

2024

KUNEŠOVÁ Marie, ZAJÍC Zbyněk, ŠMÍDL Luboš and KARAFIÁT Martin. Comparison of wav2vec 2.0 models on three speech processing tasks. International Journal of Speech Technology, vol. 27, no. 4, 2024, pp. 1-13. ISSN 1572-8110.
HAN Jiangyu, LANDINI Federico Nicolás, ROHDIN Johan A., DIEZ Sánchez Mireia, BURGET Lukáš, CAO Yuhang, LU Heng and ČERNOCKÝ Jan. Diacorrect: Error Correction Back-End for Speaker Diarization. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024, pp. 11181-11185. ISBN 979-8-3503-4485-1.
LANDINI Federico Nicolás, DIEZ Sánchez Mireia, STAFYLAKIS Themos and BURGET Lukáš. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, vol. 32, no. 7, 2024, pp. 3450-3465. ISSN 1558-7916.
KLEMENT Dominik, DIEZ Sánchez Mireia, LANDINI Federico Nicolás, BURGET Lukáš, SILNOVA Anna, DELCROIX Marc and TAWARA Naohiro. Discriminative Training of VBx Diarization. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024, pp. 11871-11875. ISBN 979-8-3503-4485-1.
ZHANG Lin, STAFYLAKIS Themos, LANDINI Federico Nicolás, DIEZ Sánchez Mireia, SILNOVA Anna and BURGET Lukáš. Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information? In: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop. Québec City: International Speech Communication Association, 2024, pp. 123-130.
MOŠNER Ladislav, SERIZEL Romain, BURGET Lukáš, PLCHOT Oldřich, VINCENT Emmanuel, PENG Junyi and ČERNOCKÝ Jan. Multi-Channel Extension of Pre-trained Models for Speaker Verification. In: Proceedings of Interspeech 2024. Kos: International Speech Communication Association, 2024, pp. 2135-2139. ISSN 1990-9772.
YUSUF Bolaji, ČERNOCKÝ Jan and SARAÇLAR Murat. Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units. In: Proceedings of Interspeech 2024. Kos: International Speech Communication Association, 2024, pp. 5068-5072. ISSN 1990-9772.
PEŠÁN Jan, JUŘÍK Vojtěch, RŮŽIČKOVÁ Alexandra, SVOBODA Vojtěch, JANOUŠEK Oto, NĚMCOVÁ Andrea, BOJANOVSKÁ Hana, ALDABAGHOVÁ Jasmína, KYSLÍK Filip, VODIČKOVÁ Kateřina, SODOMOVÁ Adéla, BARTYS Patrik, CHUDÝ Peter and ČERNOCKÝ Jan. Speech production under stress for machine learning: multimodal dataset of 79 cases and 8 signals. Nature Scientific Data, vol. 11, no. 1, 2024, pp. 1-9. ISSN 2052-4463.
ZHANG Lin, WANG Xin, COOPER Erica, DIEZ Sánchez Mireia, LANDINI Federico Nicolás, EVANS Nicholas and YAMAGISHI Junichi. Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio. In: Proceedings of Interspeech 2024. Kos: International Speech Communication Association, 2024, pp. 502-506. ISSN 1990-9772.
YUSUF Bolaji and SARAÇLAR Murat. Written Term Detection Improves Spoken Term Detection. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 32, no. 6, 2024, pp. 3213-3223. ISSN 2329-9290.

2023

SILNOVA Anna, SLAVÍČEK Josef, MOŠNER Ladislav, KLČO Michal, PLCHOT Oldřich, MATĚJKA Pavel, PENG Junyi, STAFYLAKIS Themos and BURGET Lukáš. ABC System Description for NIST LRE 2022. In: Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023, pp. 1-5.
MATĚJKA Pavel, SILNOVA Anna, SLAVÍČEK Josef, MOŠNER Ladislav, PLCHOT Oldřich, KLČO Michal, PENG Junyi, STAFYLAKIS Themos and BURGET Lukáš. Description and Analysis of ABC Submission to NIST LRE 2022. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 511-515. ISSN 1990-9772.
STAFYLAKIS Themos, MOŠNER Ladislav, KAKOUROS Sofoklis, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023, pp. 1136-1143. ISBN 978-1-6654-7189-3.
MOŠNER Ladislav, PLCHOT Oldřich, PENG Junyi, BURGET Lukáš and ČERNOCKÝ Jan. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 1693-1697. ISSN 1990-9772.
LANDINI Federico Nicolás, DIEZ Sánchez Mireia, LOZANO Díez Alicia and BURGET Lukáš. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In: Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7.
PENG Junyi, STAFYLAKIS Themos, GU Rongzhi, PLCHOT Oldřich, MOŠNER Ladislav, BURGET Lukáš and ČERNOCKÝ Jan. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7.
KAKOUROS Sofoklis, STAFYLAKIS Themos, MOŠNER Ladislav and BURGET Lukáš. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In: Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023, pp. 1-5. ISBN 978-1-7281-6327-7.

2022

SILNOVA Anna, STAFYLAKIS Themos, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., MATĚJKA Pavel, BURGET Lukáš, GLEMBEK Ondřej and BRUMMER Johan Nikolaas Langenhoven. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 9-16.
KOCOUR Martin, UMESH Jahnavi, KARAFIÁT Martin, ŠVEC Ján, LOPEZ Fernando, BENEŠ Karel, DIEZ Sánchez Mireia, SZŐKE Igor, LUQUE Jordi, VESELÝ Karel, BURGET Lukáš and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. In: Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022, pp. 276-280.
ALAM Jahangir, BURGET Lukáš, GLEMBEK Ondřej, MATĚJKA Pavel, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna and STAFYLAKIS Themos et al. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. In: Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022, pp. 346-353.
LANDINI Federico Nicolás, LOZANO Díez Alicia, DIEZ Sánchez Mireia and BURGET Lukáš. From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 5095-5099. ISSN 1990-9772.
MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Multi-Channel Speaker Verification with Conv-Tasnet Based Beamformer. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022, pp. 7982-7986. ISBN 978-1-6654-0540-9.
MOŠNER Ladislav, PLCHOT Oldřich, BURGET Lukáš and ČERNOCKÝ Jan. Multisv: Dataset for Far-Field Multi-Channel Speaker Verification. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Singapore: IEEE Signal Processing Society, 2022, pp. 7977-7981. ISBN 978-1-6654-0540-9.
BRUMMER Johan Nikolaas Langenhoven, SWART Albert du Preez, MOŠNER Ladislav, SILNOVA Anna, PLCHOT Oldřich, STAFYLAKIS Themos and BURGET Lukáš. Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 1446-1450. ISSN 1990-9772.
STAFYLAKIS Themos, MOŠNER Ladislav, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna, BURGET Lukáš and ČERNOCKÝ Jan. Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Incheon: International Speech Communication Association, 2022, pp. 605-609. ISSN 1990-9772.

2021

LANDINI Federico Nicolás, GLEMBEK Ondřej, MATĚJKA Pavel, ROHDIN Johan A., BURGET Lukáš, DIEZ Sánchez Mireia and SILNOVA Anna. Analysis of the BUT Diarization System for Voxconverse Challenge. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 5819-5823. ISBN 978-1-7281-7605-5.
KARAFIÁT Martin, VESELÝ Karel, ČERNOCKÝ Jan, PROFANT Ján, NYTRA Jiří, HLAVÁČEK Miroslav and PAVLÍČEK Tomáš. Analysis of X-Vectors for Low-Resource Speech Recognition. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 6998-7002. ISBN 978-1-7281-7605-5.

Products

2024

SW4 Acoustic Pattern Detector, software, 2024
Authors: Yusuf Bolaji, Karafiát Martin, Švec Jan, Šmídl Luboš
SW6 ASR of Eastern European Language, software, 2024
Authors: Šmídl Luboš, Karafiát Martin, Lehečka Jan, Szőke Igor, Švec Jan

2023

SW3 ASR for demanding acoustic conditions, software, 2023
Authors: Šmídl Luboš, Karafiát Martin, Švec Jan, Lehečka Jan, Mošner Ladislav, Brukner Jan

2022

SW2 Robust diarization, software, 2022
Authors: Karafiát Martin, Diez Sánchez Mireia, Švec Jan, Černocký Jan, Szőke Igor, Šmídl Luboš, Zajíc Zbyněk

2021

SW1 ASR of Asian language, software, 2021
Authors: Karafiát Martin, Lehečka Jan, Szőke Igor, Šmídl Luboš, Švec Jan