short-paper

Speaker Verification based on extraction of Deep Features

Authors:

Evangelos Mitsianis,

Evaggelos Spyrou,

Theodore Giannakopoulos

SETN '18: Proceedings of the 10th Hellenic Conference on Artificial Intelligence

Article No.: 36, Pages 1 - 4

https://doi.org/10.1145/3200947.3208070

Published: 09 July 2018

Abstract

In this paper we present an approach for speaker verification based on the extraction of deep features. More specifically, we propose a scheme based on a convolutional neural network. For audio representation we opt for spectrograms, i.e., images that result from the spectral content of voices. Our network is trained to extract visual features from these spectrograms. We demonstrate that our network is able to produce discriminative features for the problem at hand and, moreover, that when transfer learning is used, few samples may be needed for accurate speaker verification.
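To make the spectrogram representation concrete, the following is a minimal sketch of how a magnitude spectrogram can be computed from a one-dimensional signal. It is not the authors' pipeline: the Hann window, the frame length of 64 samples, the hop of 32 samples, the 8 kHz sampling rate, and the function names `hann` and `magnitude_spectrogram` are all assumptions chosen for illustration, and a naive DFT is used so the example needs only the Python standard library.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (an assumed choice; the abstract names no window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Split the signal into overlapping frames and return |DFT| per frame.

    The result is a list of frames; each frame holds frame_len // 2 + 1
    magnitudes (the non-negative frequency bins of a real signal).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive O(N^2) DFT; a real system would use an FFT routine.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for a voice recording: a pure 1 kHz tone at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(signal)

# The strongest bin of the first frame should sit at the tone frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Arranged with time on one axis and frequency on the other, `spec` is the kind of two-dimensional array that can be treated as an image and passed to a convolutional network.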


Index Terms

Computing methodologies > Machine learning > Machine learning approaches


Published In

SETN '18: Proceedings of the 10th Hellenic Conference on Artificial Intelligence

July 2018

339 pages

ISBN:9781450364331

DOI:10.1145/3200947

In-Cooperation

EETN: Hellenic Artificial Intelligence Society
UOP: University of Patras
University of Thessaly: University of Thessaly, Volos, Greece

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

SETN '18

SETN '18: 10th Hellenic Conference on Artificial Intelligence

July 9 - 12, 2018

Patras, Greece
