Abstract
Language resources for Urdu are not well developed. In this work, we summarize the development of an Urdu speech corpus for isolated words. The corpus comprises 250 isolated Urdu words recorded by ten individuals. The speakers include both native and non-native, male and female individuals. The corpus can be used for both speech and speaker recognition tasks. We also report results on an automatic speech recognition task for this corpus. The framework extracts Mel Frequency Cepstral Coefficients along with the velocity and acceleration coefficients, which are then fed to different classifiers to perform the recognition task. The classifiers used are Support Vector Machines, Random Forest and Linear Discriminant Analysis. Experimental results show that Support Vector Machines perform best, with a test set accuracy of 73%. The results reported in this work may provide a useful baseline for future research on automatic speech recognition of Urdu.
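As an illustration of the velocity and acceleration coefficients mentioned in the abstract, the following is a minimal sketch of how such dynamic features are commonly derived from a sequence of cepstral frames. The abstract does not state the formula or window width used, so the regression formula, the `width=2` default, the edge clamping, and the function names `delta` and `stack_features` are all assumptions made for illustration.

```python
def delta(frames, width=2):
    """Regression-based delta coefficients over a list of equal-length frames.

    Assumed formula (not taken from the paper):
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][k]  # clamp at the end
                prv = frames[max(t - n, 0)][k]             # clamp at the start
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(frames, width=2):
    """Append velocity (delta) and acceleration (delta of delta) to each frame."""
    vel = delta(frames, width)
    acc = delta(vel, width)
    return [c + v + a for c, v, a in zip(frames, vel, acc)]


# Toy input: 6 frames with 2 "cepstral" values each (hypothetical data).
frames = [[float(t), 2.0] for t in range(6)]
feats = stack_features(frames)
print(len(feats), len(feats[0]))
```

Each output row holds the static coefficients, then their velocities, then their accelerations, so a frame of 2 values becomes a vector of 6. Vectors of this kind, summarized per word, are what a classifier such as an SVM would then receive.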
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ali, H., Ahmad, N., Hafeez, A. (2016). Urdu Speech Corpus and Preliminary Results on Speech Recognition. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_24
DOI: https://doi.org/10.1007/978-3-319-44188-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44187-0
Online ISBN: 978-3-319-44188-7
eBook Packages: Computer Science, Computer Science (R0)