Abstract
Spectral features, which are typically used in automatic speech recognition (ASR) systems, do not capture the phase information of speech signals. Features that retain the phase of the signal can therefore complement them and improve the feature extraction (FE) block of an ASR system. In this paper, we propose an adaptive FE method founded on reconstructed phase space (RPS) and recurrence plot (RP) theory. The RP transformation can reveal important aspects of the dynamics of the high-dimensional speech trajectories reconstructed in the RPS. After transforming the speech signal into the image-like RP domain, where it is represented as a matrix, we apply a wavelet-based FE method: a two-dimensional adaptive wavelet transform, implemented through a customized filter bank, extracts dynamical features from the RP matrix for the ASR task. We evaluate the resulting features in an ASR task both alone and in combination with traditional MFCCs. On the TIMIT speech corpus, combining the proposed features with MFCCs yields a relative improvement of 7.79% in phoneme recognition accuracy over MFCC features alone.
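The pipeline summarized in the abstract (signal, then RPS trajectory, then RP matrix, then 2D wavelet features) can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding dimension, lag, and threshold are arbitrary illustrative values, the function names are hypothetical, and a fixed one-level Haar transform stands in for the paper's adaptive wavelet filter bank, whose filters are not given here.

```python
import numpy as np


def delay_embed(x, dim=3, lag=2):
    """Time-delay embedding: row n is [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim/lag")
    return np.stack([x[i * lag: i * lag + n_points] for i in range(dim)], axis=1)


def recurrence_plot(points, eps):
    """Binary recurrence plot: R[i, j] = 1 when ||p_i - p_j|| <= eps."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= eps).astype(float)


def haar2d_level(mat):
    """One level of a separable 2D Haar transform.

    Assumption: a fixed Haar filter pair replaces the adaptive filter bank.
    Subband names: first letter is the filter along axis 0, second along axis 1.
    """
    # Crop to even dimensions so samples can be paired along both axes.
    m = mat[: mat.shape[0] // 2 * 2, : mat.shape[1] // 2 * 2]
    lo0 = (m[0::2, :] + m[1::2, :]) / np.sqrt(2)
    hi0 = (m[0::2, :] - m[1::2, :]) / np.sqrt(2)
    ll = (lo0[:, 0::2] + lo0[:, 1::2]) / np.sqrt(2)
    lh = (lo0[:, 0::2] - lo0[:, 1::2]) / np.sqrt(2)
    hl = (hi0[:, 0::2] + hi0[:, 1::2]) / np.sqrt(2)
    hh = (hi0[:, 0::2] - hi0[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh


def subband_energies(rp):
    """Energy of each wavelet subband, used here as a toy feature vector."""
    return np.array([np.sum(band ** 2) for band in haar2d_level(rp)])


# Toy "speech frame": a sampled sinusoid (illustrative only, not real speech).
t = np.arange(64)
frame = np.sin(2 * np.pi * t / 16)

points = delay_embed(frame, dim=3, lag=2)   # trajectory in the RPS
rp = recurrence_plot(points, eps=0.3)       # image-like RP matrix
features = subband_energies(rp)             # four subband energies
```

Because the Haar filters are orthonormal, the four subband energies sum to the energy of the (even-sized) RP matrix, which gives a quick sanity check on the transform. An adaptive scheme would instead tune the filter coefficients for the recognition task.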
Data availability
The dataset employed is referenced in the article.
Funding
No funding.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by SF and YS. The first draft of the manuscript was written by SF and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Firooz, S., Almasganj, F. & Shekofteh, Y. Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories. SIViP 18, 1959–1967 (2024). https://doi.org/10.1007/s11760-023-02921-4
DOI: https://doi.org/10.1007/s11760-023-02921-4