Improving low-complexity and real-time DeepFilterNet2 for personalized speech enhancement
DeepFilterNet2, a recently proposed real-time and low-complexity speech enhancement (SE) technique, has shown state-of-the-art SE performance in many deep noise suppression tasks. This paper proposes a new approach, termed pDeepFilterNet2, to ...
VAD system under uncontrolled environment: A solution for strengthening the noise robustness using MMSE-SPZC
Voice activity detection (VAD) plays a crucial role in speech processing, serving as a fundamental component for various applications such as speech recognition and communication systems. Numerous approaches have been explored to address the VAD ...
Feature fusion: research on emotion recognition in English speech
English speech incorporates numerous features associated with the speaker’s emotions, offering valuable cues for emotion recognition. This paper begins by briefly outlining preprocessing approaches for English speech signals. Subsequently, the Mel-...
An experiment of Moroccan dialect speech recognition in noisy environments using PocketSphinx
In this study, we introduce an experimental framework for Moroccan dialect speech recognition under various additive noise conditions using the open-source tool PocketSphinx. We curated a corpus comprising the ten most commonly used greetings in ...
Effect of identical twins on deep speaker embeddings based forensic voice comparison
Deep learning has gained widespread adoption in forensic voice comparison in recent years. It is mainly used to learn speaker representations, known as embedding features or vectors. In this work, the effect of identical twins on two state-of-the-...
Speech emotion recognition with transfer learning and multi-condition training for noisy environments
This paper explores the use of transfer learning techniques to develop robust speech emotion recognition (SER) models capable of handling noise in real-world environments. Two SER frameworks have been proposed in this work: Framework-1 is a two-...
Server-side rescoring of spoken entity-centric knowledge queries for virtual assistants
On-device virtual assistants (VAs) powered by automatic speech recognition (ASR) require effective knowledge integration for the challenging task of recognizing entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for ...
Attentional multi-feature fusion for spoofing-aware speaker verification
The Spoofing-Aware Speaker Verification (SASV) system is designed to protect automatic speaker verification (ASV) systems from potential speech spoofing attacks by integrating the ASV and countermeasure systems. The optimization of the ASV system ...
Fake news detection models using the largest social media ground-truth dataset (TruthSeeker)
Twitter is a powerful platform for communication and information sharing but is also susceptible to spreading false information. This false information has adverse consequences for society and can significantly impact public perception, decision-...
A robust approach to authorship verification using siamese deep learning: application in phishing email detection
Mohamed Abdelkarim Remmide, Fatima Boumahdi, Imane Rebeh Ammar Aouchiche, Amina Guendouz, Narhimene Boustia
Given the rapid and significant increase in email data, it is crucial for both individuals and organisations to prioritise the implementation of strong cybersecurity measures to combat attacks such as phishing emails. While continuous research has ...
Anomaly detection with a variational autoencoder for Arabic mispronunciation detection
Computer-assisted language learning (CALL) systems are attracting growing interest and establishing a presence in automated foreign language learning. They enhance traditional learning methods by providing access to various accents and ...
Temporal feature-based approaches for enhancing phoneme boundary detection and masking in speech
Automatic phoneme boundary detection is a key problem in speech processing and its applications. Accurate phoneme segmentation in continuous speech improves recognition quality. This study proposes an efficient temporal ...
A quantal model for Algerian vowel features identification using formants and subglottal resonances
The objective of the present study was to test the hypothesis that subglottal resonances (SGRs) serve as quantal boundaries that separate Arabic vowel features. Previous research suggests that SGR frequencies predict discontinuities in vowel ...
Automatic hate speech detection in audio using machine learning algorithms
Even though every individual is entitled to freedom of speech, some limitations exist when this freedom is used to target and harm another individual or a group of people, as it translates to hate speech. In this study, the proposed research deals ...
Analyzing acoustic patterns of vowel sounds produced by native Rangri speakers
This foundational-cum-descriptive study aims to document Rangri vowel realizations acoustically. Rangri is a member of the Indo-Aryan language family. Ranghar Muhajir of Pakistani Punjab and some parts of Sindh speak Rangri, a zero-resource ...
Pathological voice classification system based on CNN-BiLSTM network using speech enhancement and multi-stream approach
Developing a resilient speech classification system for individuals with voice disorders poses a formidable challenge due to the significant variability and distortions inherent in vocal signals. This article outlines the steps to create ...