Abstract
Analyzing speech signals is a central problem in pattern classification. The task becomes harder when speakers mix languages in conversation, which is common in India, where the languages in use change from one state to the next. The problem is especially pronounced in southern India, where distinguishing among the regional languages is important. In this paper, we propose image-based features for speech signal classification, on the premise that languages can be told apart by visualizing patterns in their speech. We extract a modified Mel frequency cepstral coefficient (MFCC) feature, namely MFCC-Statistics Grade (MFCC-SG), render it as images using plotting techniques, and feed those images to a convolutional neural network. The study covers the top four languages of the region: Telugu, Tamil, Malayalam, and Kannada. Experiments on more than 900 hours of data collected from YouTube, yielding over 150,000 images, gave a highest accuracy of 94.51%.
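The idea of turning cepstral features into an image can be illustrated with a minimal sketch. This is not the MFCC-SG procedure, which the abstract does not specify: it computes ordinary MFCCs and scales them to an 8-bit grayscale grid of the kind a convolutional network could consume. The synthetic 440 Hz tone, the frame length, hop, filter count, coefficient count, and all function names are assumptions chosen for illustration, and only the Python standard library is used.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # peak and falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=128, hop=64, n_filters=12, n_coeffs=8):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Log mel-band energies, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # Unnormalized DCT-II of the log energies gives the cepstrum.
        frames.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)
        ])
    return frames


def to_grayscale(matrix):
    """Min-max scale a 2-D list to integer pixel values in 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[int(round(255 * (v - lo) / span)) for v in row] for row in matrix]


# Assumed toy input: a 440 Hz tone sampled at 8 kHz, not real speech.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
coeffs = mfcc(signal, sample_rate)
image = to_grayscale(coeffs)   # rows are frames, columns are coefficients
```

In practice the pixel grid would be written to an image file and the input would be recorded speech; a vectorized FFT would also replace the naive DFT used here for self-containment.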
Cite this article
Mukherjee, H., Dhar, A., Obaidullah, S.M. et al. Image-based features for speech signal classification. Multimed Tools Appl 79, 34913–34929 (2020). https://doi.org/10.1007/s11042-019-08553-6
DOI: https://doi.org/10.1007/s11042-019-08553-6