Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories


Abstract

Spectral features, which are typically used in ASR systems, do not capture the phase information of speech signals. Features that retain phase information can therefore complement them and improve the feature extraction (FE) block of an ASR system. In this paper, we propose an adaptive FE method founded on the theories of the reconstructed phase space (RPS) and the recurrence plot (RP). The RP transformation reveals important aspects of the dynamics of the high-dimensional speech trajectories reconstructed in the RPS. After transforming the speech signal into the image-like RP domain, where it is represented as a matrix, we apply a wavelet-based FE method: a two-dimensional adaptive wavelet transform, implemented through a customized filter bank, extracts dynamical features from the RP matrix that are useful for the ASR task. We evaluate the resulting features in an ASR task both alone and in combination with traditional MFCCs. On the TIMIT speech corpus, combining the proposed features with MFCCs yields a relative improvement of 7.79% in phoneme recognition accuracy over MFCC features alone.
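The pipeline described in the abstract (speech frame, RPS trajectory, RP matrix, 2D wavelet features) can be illustrated with a minimal sketch. This is not the authors' implementation. The embedding dimension, delay, and distance threshold are illustrative assumptions, the function names are hypothetical, and a fixed one-level Haar transform stands in for the adaptive wavelet filter bank, whose design is not given in the abstract.

```python
import numpy as np

def embed(signal, dim=3, delay=2):
    """Time-delay embedding: row i is [x[i], x[i+delay], ..., x[i+(dim-1)*delay]].
    dim and delay are assumed values, not taken from the paper."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this dim and delay")
    return np.stack([x[k * delay:k * delay + n_points] for k in range(dim)], axis=1)

def recurrence_plot(trajectory, threshold):
    """Binary RP: entry (i, j) is 1 when trajectory points i and j lie within
    `threshold` of each other (Euclidean distance), else 0."""
    diffs = trajectory[:, None, :] - trajectory[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return (distances <= threshold).astype(float)

def haar_dwt2(matrix):
    """One level of a separable, orthonormal 2D Haar transform.
    Returns the four subbands (LL, LH, HL, HH). A fixed Haar filter is an
    assumption standing in for the adaptive filter bank."""
    rows = matrix.shape[0] // 2 * 2
    cols = matrix.shape[1] // 2 * 2
    m = matrix[:rows, :cols]  # drop a trailing odd row/column if present
    # Filter along columns (pairs of horizontally adjacent samples).
    lo = (m[:, 0::2] + m[:, 1::2]) / np.sqrt(2)
    hi = (m[:, 0::2] - m[:, 1::2]) / np.sqrt(2)
    # Filter along rows (pairs of vertically adjacent samples).
    ll = (lo[0::2] + lo[1::2]) / np.sqrt(2)
    lh = (lo[0::2] - lo[1::2]) / np.sqrt(2)
    hl = (hi[0::2] + hi[1::2]) / np.sqrt(2)
    hh = (hi[0::2] - hi[1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def subband_energies(matrix):
    """Feature vector: energy (sum of squares) of each Haar subband."""
    return np.array([np.sum(band ** 2) for band in haar_dwt2(matrix)])

# Usage on a synthetic frame (a sine wave stands in for a speech frame).
frame = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 100))
trajectory = embed(frame, dim=3, delay=2)
rp = recurrence_plot(trajectory, threshold=0.2)
features = subband_energies(rp)
```

Subband energies are only one plausible way to summarize the wavelet coefficients. In a combined system such as the one the abstract evaluates, a vector like `features` would be concatenated with the MFCC vector of the same frame.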

Data availability

The dataset employed in this study, the TIMIT speech corpus, is referenced in the article.

Funding

No funding.

Author information

Authors and Affiliations

School of Electrical & Computer Engineering, University of Tehran, Tehran, Iran
Shabnam Firooz
Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran
Farshad Almasganj
Faculty of Computer Science and Engineering, Shahid Beheshti University, Velenjak, Tehran, 19839-69411, Iran
Yasser Shekofteh

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by SF and YS. The first draft of the manuscript was written by SF and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yasser Shekofteh.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Firooz, S., Almasganj, F. & Shekofteh, Y. Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories. SIViP 18, 1959–1967 (2024). https://doi.org/10.1007/s11760-023-02921-4

Received: 16 September 2022
Revised: 23 November 2023
Accepted: 26 November 2023
Published: 15 December 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11760-023-02921-4
