- research-article, November 2024
Harnessing the power of Wav2Vec2 and CNNs for Robust Speaker Identification on the VoxCeleb and LibriSpeech Datasets
Expert Systems with Applications: An International Journal (EXWA), Volume 255, Issue PC
https://doi.org/10.1016/j.eswa.2024.124671
Abstract: Speaker identification, a cornerstone of speech processing, involves associating individuals with spoken segments within a known speaker pool. This paper presents a significant AI contribution: an innovative framework tailored for closed-set ...
Highlights
- Short-duration speech segment focus improves model’s practical applicability.
- Architectural changes increase neural networks’ vocal pattern precision.
- Integration of Wav2Vec2 framework advances state-of-the-art in SID.
- Novel ...
- research-article, January 2025
Text-independent voiceprint recognition via compact embedding of dilated deep convolutional neural networks
Computers and Electrical Engineering (CENG), Volume 118, Issue PB
https://doi.org/10.1016/j.compeleceng.2024.109408
Abstract: In order to process speech, most state-of-the-art experimental methods employ convolutional neural networks (CNNs), which operate on a continuous, 1-dimensional (1-D) time stream. In an audio signal, the mel-spectrogram facilitates the ...
- research-article, October 2024
SUETA: Speaker-specific utterance ensemble based transfer attack on speaker identification system
Abstract: With the widespread application of speaker identification (SI) systems in security-related tasks, the robustness of SI systems against adversarial examples has garnered increasing attention. Existing works have demonstrated the vulnerability of ...
- research-article, July 2024
Emotional speaker identification using PCAFCM-deepforest with fuzzy logic
Neural Computing and Applications (NCAA), Volume 36, Issue 30, Pages 18567–18581
https://doi.org/10.1007/s00521-024-10154-w
Abstract: Voice is perceived as a form of biometrics which communicates valuable and rich information pertinent to an individual, such as his or her identity, gender, accent, age and emotion. Speaker identification denotes the task of identifying speakers ...
- research-article, April 2024
A stacked convolutional neural network framework with multi-scale attention mechanism for text-independent voiceprint recognition
Pattern Analysis & Applications (PAAS), Volume 27, Issue 2
https://doi.org/10.1007/s10044-024-01278-9
Abstract: Short-utterance speaker identification is a difficult area of study in natural language processing (NLP). Most cutting-edge experimental approaches for speech processing make use of convolutional neural networks (CNNs) and deep neural networks and ...
- research-article, July 2024
Automatic speaker and age identification of children from raw speech using sincNet over ERB scale
Abstract: This paper presents the newly developed non-native children’s English speech (NNCES) corpus to reveal the findings of automatic speaker and age recognition from raw speech. Convolutional neural networks (CNN), which have the ability to learn low-...
Highlights
- The study proposes the use of the SincNet model to extract significant speech cues from children’s raw speech, and evaluates its effectiveness for automatic speaker and age identification tasks.
- The article highlights the benefits of ...
- research-article, July 2024
CACRN-Net: A 3D log Mel spectrogram based channel attention convolutional recurrent neural network for few-shot speaker identification
- Banala Saritha,
- Mohammad Azharuddin Laskar,
- Anish Monsley K,
- Rabul Hussain Laskar,
- Madhuchhanda Choudhury
Computers and Electrical Engineering (CENG), Volume 115, Issue C
https://doi.org/10.1016/j.compeleceng.2024.109100
Abstract: Advancements in deep learning for speaker identification are constrained by the limited availability of data, especially in law enforcement applications. This has led to the emergence of few-shot speaker identification, a technique that ...
- research-article, March 2024
Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs)
International Journal of Speech Technology (SPIJST), Volume 27, Issue 1, Pages 179–186
https://doi.org/10.1007/s10772-024-10090-z
Abstract: Machine learning provides researchers in speech processing and bioacoustics numerous advanced and non-invasive techniques to investigate animal vocalizations. Hidden Markov Models (HMMs) are machine learning techniques that were developed and ...
- research-article, April 2024
Residual networks for text-independent speaker identification: Unleashing the power of residual learning
Journal of Information Security and Applications (JISA), Volume 80, Issue C
https://doi.org/10.1016/j.jisa.2023.103665
Abstract: The human voice, a dynamic signal, conveys valuable information for speaker identification, encompassing gender, age, emotions, and language. In the biometrics industry, identifying voices in real-time amidst diverse accents, tones, and noisy ...
- research-article, July 2024
Optimizing Speaker Identification through SincsquareNet and SincNet Fusion with Attention Mechanism
Procedia Computer Science (PROCS), Volume 233, Issue C, Pages 215–225
https://doi.org/10.1016/j.procs.2024.03.211
Abstract: Advancements in machine learning and deep learning benefit access control systems, forensics, and biometrics particularly in speaker identification systems. The SincNet architecture is a distinct convolutional neural network (CNN) designed for ...
- research-article, April 2024
Comparing Machine Learning Models to Determine the Effect of Speech Duration on Speaker Identification within Kazakh Speech Corpus
Procedia Computer Science (PROCS), Volume 231, Issue C, Pages 727–733
https://doi.org/10.1016/j.procs.2023.12.146
Abstract: This paper conducts comparative analysis of speaker identification techniques, namely Gaussian Mixture Models (GMM), Bidirectional Long Short-Term Memory (BiLSTM) neural networks, and Recurrent Neural Networks (RNN), applied to the Kazakh Speech ...
- research-article, April 2024
SilentTrig: An imperceptible backdoor attack against speaker identification with hidden triggers
Pattern Recognition Letters (PTRL), Volume 177, Issue C, Pages 103–109
https://doi.org/10.1016/j.patrec.2023.12.002
Abstract: Speaker identification based on deep learning is known to be susceptible to backdoor attacks. However, the current research on audio backdoor attacks is limited, and these attacks often use obvious noises as triggers, which can raise suspicion ...
Highlights
- Backdoor attack against speaker identification, improving the imperceptibility.
- Optimized steganographic network for embedding triggers secretively.
- Two-stage adversarial optimization to get indistinguishable malicious samples.
- research-article, November 2023
Deep Learning-Based End-to-End Speaker Identification Using Time–Frequency Representation of Speech Signal
- Banala Saritha,
- Mohammad Azharuddin Laskar,
- Anish Monsley Kirupakaran,
- Rabul Hussain Laskar,
- Madhuchhanda Choudhury,
- Nirupam Shome
Circuits, Systems, and Signal Processing (CSSP), Volume 43, Issue 3, Pages 1839–1861
https://doi.org/10.1007/s00034-023-02542-9
Abstract: Speech-based speaker identification system is one of the alternatives to the conventional biometric contact-based identification systems. Recent works demonstrate the growing interest among researchers in this field and highlight the practical ...
- research-article, September 2023
Automatic age recognition, call-type classification, and speaker identification of Zebra Finches (Taeniopygia guttata) using hidden Markov models (HMMs)
International Journal of Speech Technology (SPIJST), Volume 26, Issue 3, Pages 641–650
https://doi.org/10.1007/s10772-023-10041-0
Abstract: Hidden Markov models (HMMs) were developed and implemented to discriminate between each of the 2 ages, 11 call-types, and 51 speakers of birds using cross-validation on the recordings in the 3314 database for chick (19–25 days of age) and adult (...
- research-article, August 2023
Speaker identification from emotional and noisy speech using learned voice segregation and speech VGG
Expert Systems with Applications: An International Journal (EXWA), Volume 224, Issue C
https://doi.org/10.1016/j.eswa.2023.119871
Abstract: Speech signals are more susceptible to emotional influences and acoustic interference than other communications. Applications for real-time speech processing face difficulties when dealing with noisy, emotion-filled speech data. ...
Highlights
- A novel method is proposed to enhance speaker identification in abnormal conditions.
- review-article, August 2023
End-to-end speaker identification research based on multi-scale SincNet and CGAN
Neural Computing and Applications (NCAA), Volume 35, Issue 30, Pages 22209–22222
https://doi.org/10.1007/s00521-023-08906-1
Abstract: Deep learning has improved the performance of speaker identification systems in recent years, but it has also presented significant challenges. Typically, data-driven modeling approaches based on DNNs rely on large-scale training data, but due to ...
- research-article, July 2023
Degramnet: effective audio analysis based on a fully learnable time–frequency representation
Neural Computing and Applications (NCAA), Volume 35, Issue 27, Pages 20207–20219
https://doi.org/10.1007/s00521-023-08849-7
Abstract: Current state-of-the-art audio analysis algorithms based on deep learning rely on hand-crafted Spectrogram-like audio representations, that are more compact than descriptors obtained from the raw waveform; the latter are, in turn, far from ...
- research-article, July 2023
A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients
Expert Systems with Applications: An International Journal (EXWA), Volume 222, Issue C
https://doi.org/10.1016/j.eswa.2023.119750
Abstract: Speaker identification aims at determining the speaker identity by analyzing his voice characteristics, and relies typically on statistical models or machine learning techniques. Frequency-domain features are by far the most used ...
Highlights
- We present a late fusion DNN model with RWs and GTCCs for speaker identification.
- research-article, June 2023
A robust DNN model for text-independent speaker identification using non-speaker embeddings in diverse data conditions
Neural Computing and Applications (NCAA), Volume 35, Issue 26, Pages 18933–18947
https://doi.org/10.1007/s00521-023-08736-1
Abstract: Deep learning has provided many advantages including the ability to extract features from the voice samples and represents the data in a more decisive mode. In speaker identification, many studies have been done to extract more and more meaningful ...
- research-article, January 2023
Speaker identification and localization using shuffled MFCC features and deep learning
International Journal of Speech Technology (SPIJST), Volume 26, Issue 1, Pages 185–196
https://doi.org/10.1007/s10772-023-10023-2
Abstract: The use of machine learning in automatic speaker identification and localization systems has recently seen significant advances. However, this progress comes at the cost of using complex models, computations, and increasing the number of ...