Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices

Marko Janev¹, Darko Pekar², Niksa Jakovljevic¹ & Vlado Delic¹

Abstract

This paper presents a novel algorithm for Gaussian Selection (GS) of mixture components in a continuous speech recognition system. The system is based on hidden Markov models (HMMs) whose output distributions are Gaussian mixtures with full covariance matrices. The purpose of Gaussian selection is to increase the speed of a speech recognizer without degrading its recognition accuracy. The basic idea is to form hyper-mixtures by clustering nearby mixtures into groups by means of Vector Quantization (VQ), and to assign each group a single set of Gaussian parameters. During decoding, only the hyper-mixtures whose likelihood for the current observation exceeds a designated threshold are selected, and only the mixtures belonging to them are evaluated, which reduces computation. Clustering and evaluation work well when the overlaps between mixtures are small and their variances are of a similar range. In practice, however, many models do not fit this profile. The Gaussian selection scheme proposed in this paper addresses this problem: besides the clustering algorithm, it incorporates an algorithm for grouping mixtures. Each mixture is assigned to one group from a predefined set, based on a value aggregated from the eigenvalues of its covariance matrix using Ordered Weighted Averaging (OWA) operators. Once the mixtures have been grouped, Gaussian clustering is performed on each group separately.
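
The eigenvalue-driven grouping step can be illustrated with a short sketch. It assumes Yager's definition of an OWA operator: with non-negative weights w_1, ..., w_n summing to 1, OWA(a_1, ..., a_n) = sum_j w_j b_j, where b_j is the j-th largest of the a_i. The weight vector, the group boundaries and the helper names `owa`, `assign_group` and `group_mixtures` are hypothetical placeholders chosen for illustration; the abstract does not state the values or procedure the authors actually use.

```python
import numpy as np


def owa(values, weights):
    """Ordered Weighted Averaging: sum_j w_j * b_j, where b_j is the
    j-th largest value. Weights must be non-negative and sum to 1."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same length")
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("OWA weights must be non-negative and sum to 1")
    ordered = np.sort(values)[::-1]  # descending: b_1 >= b_2 >= ... >= b_n
    return float(ordered @ weights)


def assign_group(cov, weights, boundaries):
    """Return (group index, OWA score) for one full covariance matrix.

    `boundaries` (hypothetical) is an ascending list of thresholds that
    splits the OWA score axis into len(boundaries) + 1 groups.
    """
    eigvals = np.linalg.eigvalsh(cov)  # real eigenvalues of a symmetric matrix
    score = owa(eigvals, weights)
    group = int(np.searchsorted(boundaries, score, side="right"))
    return group, score


def group_mixtures(covs, weights, boundaries):
    """Partition mixture components (by index) into eigenvalue-based groups."""
    groups = {}
    for idx, cov in enumerate(covs):
        group, _ = assign_group(cov, weights, boundaries)
        groups.setdefault(group, []).append(idx)
    return groups


# Example: three 3-D components with increasingly wide covariances.
w = np.array([0.5, 0.3, 0.2])  # hypothetical weights emphasising large eigenvalues
covs = [np.diag([0.2, 0.1, 0.05]), np.diag([3.0, 2.0, 1.0]), np.diag([9.0, 6.0, 5.0])]
print(group_mixtures(covs, w, boundaries=[1.0, 4.0]))  # {0: [0], 1: [1], 2: [2]}
```

Because the OWA operator sorts its arguments before weighting, the weight vector decides how the eigenvalue spectrum is summarised: w = (1, 0, ..., 0) yields the largest eigenvalue, w = (0, ..., 0, 1) the smallest, and uniform weights the mean eigenvalue (the trace divided by the dimension). Eigenvalues do not change when the covariance is rotated, so the group assignment reflects the spread of a Gaussian rather than its orientation. Clustering would then be run separately inside each group returned by `group_mixtures`.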
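
The hyper-mixture idea behind Gaussian selection can be sketched as follows, under several stated assumptions that do not come from the paper: plain k-means on the component means stands in for the VQ clustering; each hyper-mixture is a moment-matched Gaussian whose members are weighted equally; and the selection threshold is an absolute log-likelihood value, with the best-scoring hyper-mixture always retained so the selection is never empty. All function names are hypothetical.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a full-covariance Gaussian, computed via Cholesky."""
    chol = np.linalg.cholesky(cov)        # cov = L @ L.T
    z = np.linalg.solve(chol, x - mean)   # z = L^{-1} (x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + log_det + z @ z)


def build_hyper_mixtures(means, covs, n_clusters, n_iter=20, seed=0):
    """Cluster component means with plain k-means (standing in for the VQ
    step) and fit one moment-matched Gaussian per cluster."""
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = means[labels == k].mean(axis=0)
    hyper = []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue
        mu = means[members].mean(axis=0)
        diffs = means[members] - mu
        # Average member covariance plus the spread of member means (equal weights assumed).
        cov = covs[members].mean(axis=0) + diffs.T @ diffs / members.size
        hyper.append((mu, cov, members))
    return hyper


def select_components(x, hyper, log_threshold):
    """Indices of components whose hyper-mixture scores x at or above
    log_threshold; the best hyper-mixture is always kept."""
    scores = np.array([log_gauss(x, mu, cov) for mu, cov, _ in hyper])
    keep = scores >= log_threshold
    keep[scores.argmax()] = True
    return np.sort(np.concatenate([hyper[k][2] for k in np.flatnonzero(keep)]))


def gmm_loglik(x, weights, means, covs, active):
    """GMM log-likelihood of x using only the components listed in `active`."""
    terms = [np.log(weights[i]) + log_gauss(x, means[i], covs[i]) for i in active]
    return float(np.logaddexp.reduce(terms))


# Toy example: two well-separated groups of 2-D components.
means = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]])
covs = np.stack([np.eye(2)] * 4)
weights = np.full(4, 0.25)
hyper = build_hyper_mixtures(means, covs, n_clusters=2)
x = np.zeros(2)
active = select_components(x, hyper, log_threshold=-20.0)
print(active)  # [0 1]
approx = gmm_loglik(x, weights, means, covs, active)
exact = gmm_loglik(x, weights, means, covs, range(4))
```

The saving comes from evaluating only the selected members: the full-covariance computations in `gmm_loglik` skip every component whose hyper-mixture falls below the threshold. In this toy frame the skipped components lie far from x, so the approximate and exact log-likelihoods agree to within floating-point precision; with overlapping or very differently sized Gaussians that agreement would degrade, which is the situation the eigenvalue-based grouping is meant to address.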

Similar content being viewed by others

Complementary Gaussian Mixture Models for Multimodal Speech Recognition

Generation of GMM Weights by Dirichlet Distribution and Model Selection Using Information Criterion for Malayalam Speech Recognition

Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data

Author information

Authors and Affiliations

Faculty of Technical Sciences Novi Sad, University of Novi Sad, Novi Sad, Serbia
Marko Janev, Niksa Jakovljevic & Vlado Delic
Alfanum Speech Technologies, Novi Sad, Serbia
Darko Pekar

Corresponding author

Correspondence to Marko Janev.

About this article

Cite this article

Janev, M., Pekar, D., Jakovljevic, N. et al. Eigenvalues Driven Gaussian Selection in continuous speech recognition using HMMs with full covariance matrices. Appl Intell 33, 107–116 (2010). https://doi.org/10.1007/s10489-008-0152-9

Received: 14 April 2008
Accepted: 13 October 2008
Published: 03 December 2008
Issue Date: October 2010
DOI: https://doi.org/10.1007/s10489-008-0152-9