Abstract
Speech processing is an important research field that includes speaker recognition, speech synthesis, speech coding, and speech noise reduction. Many languages have distinct speaking styles, called accents or dialects. Identifying the accent before speech recognition can improve the performance of speech recognition systems, and accent recognition becomes crucial when a language has many accents. Telugu is an Indian language widely spoken in the southern part of India. It has several accents, the main ones being Coastal Andhra, Telangana, and Rayalaseema. In the present work, speech samples are collected from native speakers of the different Telugu accents for both training and testing. Mel frequency cepstral coefficient (MFCC) features are extracted from each training and test sample. A Gaussian mixture model (GMM) is then used to classify the speech by accent. The proposed system recognizes the region a speaker belongs to, based on accent, with an overall efficiency of 91%.
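The classification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it fits one diagonal-covariance GMM per accent and assigns a test utterance to the accent whose model gives the highest average log-likelihood. The abstract does not state the number of mixture components, the feature dimension, or the covariance type, so the values used here (2 components, 13 dimensions, diagonal covariances) are assumptions. Randomly generated vectors stand in for real MFCC frames, and all function names are hypothetical.

```python
import numpy as np

ACCENTS = ("coastal_andhra", "telangana", "rayalaseema")


def log_weighted_densities(X, weights, means, variances):
    """Return log(w_k * N(x_n | mu_k, diag(var_k))), shape (n_frames, n_components)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss


def fit_diag_gmm(X, n_components=2, n_iter=30, seed=0, floor=1e-6):
    """Fit a diagonal-covariance GMM to X (n_frames, n_dims) with plain EM."""
    rng = np.random.default_rng(seed)
    n_frames = X.shape[0]
    # Initialise means from random frames, variances from the global variance.
    means = X[rng.choice(n_frames, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = log_weighted_densities(X, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means, and (floored) variances.
        nk = resp.sum(axis=0) + floor
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, floor)
    return weights, means, variances


def average_log_likelihood(X, model):
    """Mean per-frame log-likelihood of X under a fitted GMM."""
    log_p = log_weighted_densities(X, *model)
    return float(np.logaddexp.reduce(log_p, axis=1).mean())


def classify(X, models):
    """Pick the accent whose GMM scores the utterance highest."""
    return max(models, key=lambda name: average_log_likelihood(X, models[name]))


def synthetic_features(offset, n_frames, rng, n_dims=13):
    """Stand-in for MFCC frames: unit-variance noise around an accent-specific offset."""
    return offset + rng.standard_normal((n_frames, n_dims))


rng = np.random.default_rng(42)
# Assumed, well-separated offsets so the toy problem is easy.
offsets = dict(zip(ACCENTS, (0.0, 2.0, -2.0)))
models = {name: fit_diag_gmm(synthetic_features(off, 300, rng))
          for name, off in offsets.items()}
test_sets = {name: synthetic_features(off, 100, rng) for name, off in offsets.items()}
predictions = {name: classify(feats, models) for name, feats in test_sets.items()}
print(predictions)
```

In a real system the synthetic arrays would be replaced by MFCC frames extracted from recorded speech, and the number of components would be tuned on held-out data.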
Cite this article
Mannepalli, K., Sastry, P.N. & Suman, M. MFCC-GMM based accent recognition system for Telugu speech signals. Int J Speech Technol 19, 87–93 (2016). https://doi.org/10.1007/s10772-015-9328-y
DOI: https://doi.org/10.1007/s10772-015-9328-y