Probabilistic Modeling of Speech in Spectral Domain using Maximum Likelihood Estimation
Figure 1. Magnitude and phase response of the direct current (DC) removal filter.
Figure 2. LD fit for STFT coefficients with a high spectral resolution: female speech sample.
Figure 3. LD fit for STFT coefficients with a high spectral resolution: male speech sample.
Figure 4. LD fit for STFT coefficients with a high spectral resolution: real part of STFT coefficients.
Figure 5. LD fit for STFT coefficients with a high spectral resolution: imaginary part of STFT coefficients.
Figure 6. GD fit for STFT coefficients with a high spectral resolution: female speech sample.
Figure 7. GD fit for STFT coefficients with a high spectral resolution: male speech sample.
Figure 8. GD fit for STFT coefficients with a high spectral resolution: real part of STFT coefficients.
Figure 9. GD fit for STFT coefficients with a high spectral resolution: imaginary part of STFT coefficients.
Figure 10. LD fit for STFT coefficients with a low spectral resolution: male speech sample.
Figure 11. LD fit for STFT coefficients with a low spectral resolution: female speech sample.
Figure 12. GD fit for STFT coefficients with a low spectral resolution: female speech sample.
Figure 13. GD fit for STFT coefficients with a low spectral resolution: male speech sample.
Abstract
1. Introduction
2. Maximum Likelihood Estimation of Laplacian Distribution (LD) and Gaussian Distribution (GD) Parameters
3. Direct Current (DC) Removal and Voice Activity Detection
4. Experimental Procedure and Discussion of Results
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
RMS Error | CRB | Hypothesized Distribution(s) | Validity of Distribution | Efficiency of MLE |
---|---|---|---|---|
Small | Small | LD | Valid | Efficient |
Small | Large | GD | Valid | Inefficient |
Large | Small | LD and GD | Invalid | Efficient |
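
The RMS error column in the table above compares a fitted distribution with the observed data. As a minimal sketch of how such a comparison could be set up, the following Python example computes the standard closed-form maximum likelihood estimates for a Laplacian distribution (LD) and a Gaussian distribution (GD), then measures an RMS error between each fitted density and a normalised histogram. The histogram-based error metric, the function names, the bin count, and the synthetic Laplacian samples standing in for STFT coefficients are all assumptions made for illustration; this excerpt does not show how the authors computed their RMS error, and the Cramér-Rao bound (CRB) column is not covered here.

```python
import numpy as np


def laplace_mle(x):
    """MLE of Laplacian location (sample median) and scale (mean absolute deviation about it)."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    b = np.mean(np.abs(x - mu))
    return mu, b


def gaussian_mle(x):
    """MLE of Gaussian mean and variance (1/N normalisation, not 1/(N-1))."""
    x = np.asarray(x, dtype=float)
    mu = np.mean(x)
    var = np.mean((x - mu) ** 2)
    return mu, var


def laplace_pdf(x, mu, b):
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)


def gaussian_pdf(x, mu, var):
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def rms_fit_error(x, pdf, bins=101):
    """RMS difference between a density-normalised histogram and a fitted PDF.

    This metric is an assumption for illustration, not the paper's definition.
    """
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.sqrt(np.mean((density - pdf(centers)) ** 2)))


# Synthetic stand-in for the real or imaginary parts of STFT coefficients.
rng = np.random.default_rng(0)
samples = rng.laplace(loc=0.0, scale=1.0, size=50_000)

mu_l, b_l = laplace_mle(samples)
mu_g, var_g = gaussian_mle(samples)

err_ld = rms_fit_error(samples, lambda t: laplace_pdf(t, mu_l, b_l))
err_gd = rms_fit_error(samples, lambda t: gaussian_pdf(t, mu_g, var_g))
print(f"LD fit RMS error: {err_ld:.4f}, GD fit RMS error: {err_gd:.4f}")
```

With real data, `samples` would be replaced by the real or imaginary parts of the STFT coefficients of voiced frames, and the two errors would be compared to judge which hypothesized distribution fits better.
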
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Usman, M.; Zubair, M.; Shiblee, M.; Rodrigues, P.; Jaffar, S. Probabilistic Modeling of Speech in Spectral Domain using Maximum Likelihood Estimation. Symmetry 2018, 10, 750. https://doi.org/10.3390/sym10120750