Subspace filtering approach based on orthogonal projection for better analysis of stressed speech under clean and noisy environments

Published in: International Journal of Speech Technology

Abstract

This study explores a novel subspace-projection-based approach for the analysis of stressed speech. Studies have shown that stress influences the speech production system, resulting in large acoustic variation between neutral and stressed speech. This variation degrades the discrimination capability of an automatic speech recognition system trained on neutral speech when it is tested on stressed speech. An effort is made to reduce the acoustic mismatch by explicitly normalizing the stress-specific attributes. The stress-specific divergences are normalized by exploiting a subspace filtering technique. To accomplish this, an orthogonal-projection-based linear relationship between the speech and the stress information is explored to filter an effective speech subspace that carries the speech information. The speech subspace is constructed from neutral speech data using K-means clustering followed by singular value decomposition. The speech and the stress information are then separated by projecting the stressed speech orthogonally onto the effective speech subspace. Experimental results indicate that the bases of the effective subspace comprise the first few eigenvectors corresponding to the largest eigenvalues. To further improve system performance, both the neutral and the stressed speech are projected onto a lower-dimensional subspace. The projections derived using the neutral speech employ heteroscedastic linear discriminant analysis in a maximum-likelihood-linear-transformations-based semi-tied adaptation framework. Consistent improvements are noted for the proposed technique in all the discussed cases.
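The core pipeline sketched in the abstract can be illustrated in a few lines: cluster neutral-speech features with K-means, take the leading left singular vectors of the centroid matrix as the speech-subspace basis, and orthogonally project stressed-speech features onto that basis so the residual can be attributed to stress. This is a minimal sketch under stated assumptions: the cluster count, basis dimension, and 13-dimensional cepstral-like features are illustrative choices, not the authors' exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_speech_subspace(neutral_feats, n_clusters=16, n_bases=10, n_iter=20):
    """Cluster neutral features with K-means (Lloyd's algorithm), stack the
    cluster centroids, and keep the leading left singular vectors as the
    effective speech-subspace basis."""
    centroids = neutral_feats[rng.choice(len(neutral_feats), n_clusters,
                                         replace=False)]
    for _ in range(n_iter):
        # assign each feature vector to its nearest centroid
        dists = np.linalg.norm(neutral_feats[:, None, :] - centroids[None],
                               axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = neutral_feats[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # SVD of the (dim x n_clusters) centroid matrix; the singular vectors
    # with the largest singular values span the effective subspace
    U, _, _ = np.linalg.svd(centroids.T, full_matrices=False)
    return U[:, :n_bases]

def project_onto_subspace(feats, basis):
    """Orthogonal projection: with an orthonormal basis B, P = B B^T."""
    return feats @ basis @ basis.T

# Toy demonstration with 13-dimensional cepstral-like feature vectors.
neutral = rng.normal(size=(500, 13))
stressed = rng.normal(size=(100, 13))
B = build_speech_subspace(neutral)
speech_part = project_onto_subspace(stressed, B)   # speech information
stress_part = stressed - speech_part               # residual: stress information
```

Because the basis columns are orthonormal, the residual `stress_part` is exactly orthogonal to the speech subspace, which is what allows the speech and stress information to be treated separately downstream.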

Figures 1–9 (available in the full-text article)


Corresponding author

Correspondence to Bhanu Priya.

About this article

Cite this article

Priya, B., Dandapat, S. Subspace filtering approach based on orthogonal projection for better analysis of stressed speech under clean and noisy environments. Int J Speech Technol 19, 731–742 (2016). https://doi.org/10.1007/s10772-016-9362-4
