Speaker recognition based on short utterance compensation method of generative adversarial networks

International Journal of Speech Technology

Abstract

Building on the Gaussian mixture model–universal background model (GMM–UBM) speaker recognition system, this paper proposes a short utterance compensation method based on the generative adversarial network (GAN). The method addresses the shortage of corpus data in short utterances, which seriously reduces the recognition rate. Through adversarial training of a generator network and a discriminator network, it compensates short utterance samples into speech samples that carry sufficient speaker identity information. To avoid mode collapse and gradient instability during GAN training, the paper uses the condition information of the conditional GAN to guide the generator's compensation process, and it proposes two auxiliary training tasks to stabilize training: a task that measures the generator's compensation performance and a feature tag task for the discriminator. Finally, the proposed compensation method is evaluated on a GMM–UBM speaker recognition system. The experimental results indicate that it effectively reduces the equal error rate of the system under short utterance conditions.
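The abstract reports results as the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be estimated from verification scores; it is not taken from the paper. The function name `equal_error_rate`, the score lists, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER from two lists of verification scores.

    Assumption (not from the paper): a trial is accepted when its
    score is >= threshold, and higher scores mean "same speaker".
    Returns (eer, threshold) at the candidate threshold where the
    false acceptance and false rejection rates are closest.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor trials scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates as the EER estimate.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(eer, threshold)  # 0.25 0.6
```

With only a few trials the two error curves rarely cross exactly, so the sketch reports the midpoint of the two rates at the closest threshold. Evaluation toolkits usually interpolate between thresholds instead.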

Funding

This research was supported by the Natural Science Foundation of Chongqing City, China (cstc2017jcyjA0893) and by the project "Theoretical and applied research on enhanced Raman biosensor chip based on plasma waveguide" (csts2017jcyjAX0427).

Author information

Corresponding author

Correspondence to Yaqin Fu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, Z., Fu, Y., Luo, Y. et al. Speaker recognition based on short utterance compensation method of generative adversarial networks. Int J Speech Technol 23, 443–450 (2020). https://doi.org/10.1007/s10772-020-09711-0

  • DOI: https://doi.org/10.1007/s10772-020-09711-0
