A novel approach for speaker diarization system using TMFCC parameterization and Lion optimization

Journal of Central South University

Abstract

In an audio stream containing multiple speakers, speaker diarization determines “who spoke when”. The task is unsupervised because no prior information about the speakers is available. Diarization labels the speech signal according to speaker identity; that is, the input audio stream is partitioned into speaker-homogeneous segments. In this work, we present a novel speaker diarization system that uses the tangent weighted Mel frequency cepstral coefficient (TMFCC) as the feature parameter and the Lion algorithm to cluster the voice-activity-detected audio streams into speaker groups. Both main tasks of speaker indexing, i.e., speaker segmentation and speaker clustering, are thereby improved. The TMFCC makes more effective use of both low-energy and high-energy frames, which improves the performance of the proposed system. Experiments are carried out on audio signals from the ELSDSR corpus containing three, four and five speakers. The proposed system is evaluated using tracking distance and tracking time as metrics, and the experimental results show that the speaker diarization system with TMFCC parameterization and Lion-based clustering outperforms existing diarization systems, achieving 95% tracking accuracy.
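The pipeline outlined in the abstract (voice activity detection, frame-level feature extraction, clustering of speech frames into speaker groups) can be illustrated with a minimal sketch. It is not the authors' method: the abstract gives no formula for the TMFCC weighting or for the Lion algorithm, so the sketch substitutes pooled log-magnitude band features for TMFCC and plain k-means for Lion-based clustering. All function names, the energy threshold, and the synthetic two-tone "speakers" are assumptions made for illustration only.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, one row per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def energy_vad(frames, ratio=0.1):
    """Assumed VAD rule: a frame is speech if its energy exceeds ratio * mean energy."""
    energy = np.sum(frames ** 2, axis=1)
    return energy > ratio * energy.mean()

def log_band_features(frames, n_bands=16):
    """Stand-in features (NOT TMFCC): log of the mean magnitude in n_bands bands."""
    window = np.hanning(frames.shape[1])
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))
    bands = np.array_split(magnitude, n_bands, axis=1)
    pooled = np.stack([band.mean(axis=1) for band in bands], axis=1)
    return np.log(pooled + 1e-10)

def kmeans(features, k, n_iter=20):
    """Stand-in clustering (NOT the Lion algorithm): k-means, farthest-point init."""
    centroids = [features[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[np.argmax(dists)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                centroids[j] = features[labels == j].mean(axis=0)
    return labels

def diarize(signal, frame_len, hop, n_speakers):
    """Return one label per frame: -1 for non-speech, else a cluster index."""
    frames = frame_signal(signal, frame_len, hop)
    speech = energy_vad(frames)
    labels = np.full(len(frames), -1)
    labels[speech] = kmeans(log_band_features(frames[speech]), n_speakers)
    return labels

# Synthetic example: two "speakers" modelled as tones, separated by near-silence.
rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr
signal = np.concatenate([np.sin(2 * np.pi * 200 * t),
                         np.zeros(sr // 2),
                         np.sin(2 * np.pi * 1500 * t)])
signal = signal + 0.01 * rng.standard_normal(len(signal))
labels = diarize(signal, frame_len=400, hop=400, n_speakers=2)
print(labels)
```

The per-frame labels play the role of the "who spoke when" output: runs of equal labels correspond to speaker-homogeneous segments, and -1 marks frames rejected by the voice activity detector. Real speech would need proper cepstral features and a more careful clustering step than this toy example.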

Author information

Corresponding author

Correspondence to V. Subba Ramaiah.

About this article

Cite this article

Subba Ramaiah, V., Rajeswara Rao, R. A novel approach for speaker diarization system using TMFCC parameterization and Lion optimization. J. Cent. South Univ. 24, 2649–2663 (2017). https://doi.org/10.1007/s11771-017-3678-3

  • DOI: https://doi.org/10.1007/s11771-017-3678-3
