Abstract
This work starts from a question: "Do human vocal folds produce different wavelengths when people speak the same language in different accents?" Human listeners can usually identify a speaker's accent and region from speech with little effort; the challenge is giving a machine the same capability. Working with natural human language falls under Natural Language Processing (NLP), which is a branch of artificial intelligence. For feature extraction, we computed the discrete Fourier transform, applied a Mel-spaced filter bank and took the log filter-bank energies to obtain Mel-frequency cepstral coefficients (MFCCs), a numeric representation of the analog voice signal. For classification, we compared several machine learning and deep learning algorithms (linear regression, decision tree, gradient boosting, random forest and neural network) to find the best achievable accuracy. The highest accuracy obtained was 86% on 9303 data points. The data was collected from eight regions of Bangladesh: Dhaka, Khulna, Barisal, Rajshahi, Sylhet, Chittagong, Mymensingh and Noakhali. Detecting a speaker's region from voice could support security agencies and e-commerce marketing. We follow a simple workflow to reach the final result.
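The feature-extraction steps named in the abstract (discrete Fourier transform, Mel-spaced filter bank, log filter-bank energies, then cepstral coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, number of filters, number of coefficients, the Hamming window and the synthetic test tone are all assumptions chosen for illustration, and the function names are hypothetical. It uses only the Python standard library, so it is slow; a real pipeline would normally rely on an optimized FFT.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided DFT power spectrum."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, frame_len, sample_rate):
    """Build triangular filters whose edges are equally spaced on the mel scale."""
    num_bins = frame_len // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((frame_len + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * num_bins
        for k in range(left, centre):      # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=256, hop=128,
         num_filters=20, num_ceps=13):
    """Return one list of cepstral coefficients per frame (assumed defaults)."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Log filter-bank energies, floored to avoid log(0).
        log_energies = [
            math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
            for row in bank
        ]
        # Unnormalized DCT-II of the log energies gives the cepstral coefficients.
        ceps = [
            sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_ceps)
        ]
        features.append(ceps)
    return features


# Synthetic stand-in for a voice recording: 0.1 s of a 440 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(800)]
features = mfcc(tone, SAMPLE_RATE)
print(len(features), len(features[0]))
```

Each row of `features` is a fixed-length numeric vector for one short frame of audio, which is the kind of input a classifier such as a random forest or neural network can consume, typically after averaging or stacking the frames of a recording.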
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Badhon, S.M.S.I., Rahaman, H., Rupon, F.R., Abujar, S. (2021). Bengali Accent Classification from Speech Using Different Machine Learning and Deep Learning Techniques. In: Borah, S., Pradhan, R., Dey, N., Gupta, P. (eds) Soft Computing Techniques and Applications. Advances in Intelligent Systems and Computing, vol 1248. Springer, Singapore. https://doi.org/10.1007/978-981-15-7394-1_46
DOI: https://doi.org/10.1007/978-981-15-7394-1_46
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7393-4
Online ISBN: 978-981-15-7394-1
eBook Packages: Intelligent Technologies and Robotics (R0)