Abstract
The paper presents a novel precision matrix modeling technique for Gaussian Mixture Models (GMMs), based on the concept of sparse representation. The representation coefficients of each precision (inverse covariance) matrix, as well as an accompanying overcomplete matrix dictionary, are learned by minimizing an appropriate functional, whose first component corresponds to the sum of Kullback-Leibler (KL) divergences between the initial and the target GMM, and whose second component is a sparse regularizer of the coefficients. Compared to existing alternative approaches for approximate GMM modeling, such as the popular subspace-based representation methods, the proposed model yields a notably better trade-off between the representation error and the computational (memory) complexity. This is achieved under the assumption that the training data in the recognition system utilizing GMMs have an inherent sparseness property, which enables the application of the proposed model and an approximate representation using only one dictionary and a significantly smaller number of coefficients. The proposed model is experimentally compared with the Subspace Precision and Mean (SPAM) model, a state-of-the-art instance of subspace-based representation models, using both data from a real Automatic Speech Recognition (ASR) system and specially designed sets of artificially created (synthetic) data.
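The functional described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: each component's KL term is taken between two Gaussians that share the same mean, the direction is KL(original || approximation), each approximated precision matrix is a linear combination of dictionary matrices, and the sparse regularizer is an l1 penalty with a hypothetical weight `lam`. The function names are illustrative only.

```python
import numpy as np

def gaussian_kl_precisions(P0, P1):
    """KL( N(mu, inv(P0)) || N(mu, inv(P1)) ) for Gaussians sharing one mean.

    P0 and P1 are symmetric positive-definite precision matrices.
    """
    d = P0.shape[0]
    _, logdet0 = np.linalg.slogdet(P0)
    _, logdet1 = np.linalg.slogdet(P1)
    # tr(P1 @ inv(P0)) computed through a linear solve instead of an inverse
    trace_term = np.trace(np.linalg.solve(P0, P1))
    return 0.5 * (trace_term - d + logdet0 - logdet1)

def sparse_objective(precisions, dictionary, coeffs, lam):
    """Sum of per-component KL terms plus an l1 penalty on the coefficients.

    precisions: (M, d, d) original precision matrices
    dictionary: (K, d, d) dictionary matrices (assumed symmetric)
    coeffs:     (M, K) representation coefficients
    lam:        weight of the sparse regularizer (hypothetical parameter)
    """
    # Approximated precision of component m: sum_k coeffs[m, k] * dictionary[k]
    approx = np.einsum("mk,kij->mij", coeffs, dictionary)
    kl_sum = sum(gaussian_kl_precisions(P, A) for P, A in zip(precisions, approx))
    return kl_sum + lam * np.abs(coeffs).sum()
```

The sketch only evaluates the functional; it does not learn the dictionary or the coefficients, and it assumes every approximated precision matrix stays positive definite so that the log-determinant is defined.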
Notes
2 \(\lVert {x}\rVert _{l_{0}}\) represents the number of nonzero coefficients in vector \(x \in {\mathbb {R}^{n}}\), while \(\lVert {x}\rVert _{l_{1}}\) represents its convex relaxation.
3 There, a distinction was made between inner product and scalar multiplications, and the number of scalar and vector additions was also considered.
4 The sub-gradient ∂ f(x) of a convex function \(f: D\subset \mathbb {R}^{d} \rightarrow \mathbb {R}\) at a point x ∈ D is defined as the set of all \(a \in \mathbb {R}^{d}\) such that \(f(y)-f(x)\ge {\langle a\vert y-x \rangle }_{\mathbb {R}^{d}}\) for all y ∈ D.
5 Note that each matrix P can be written as \(P = Q\Lambda Q^{T}\), where Λ is a diagonal matrix whose entries are the eigenvalues of P, and Q is an orthogonal matrix from COE.
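The quantities in the notes above can be made concrete with a short sketch. It computes the l0 and l1 norms of a vector, returns one element of the sub-gradient of the l1 norm, and builds a symmetric matrix from prescribed eigenvalues and a random orthogonal factor. The sampling recipe (QR decomposition of a Gaussian matrix with a sign correction) is an assumption chosen for illustration and may differ from the ensemble the authors used. All function names are hypothetical.

```python
import numpy as np

def l0_norm(x):
    """Number of nonzero coefficients in x."""
    return int(np.count_nonzero(x))

def l1_norm(x):
    """Sum of absolute values, the convex relaxation of the l0 count."""
    return float(np.abs(x).sum())

def l1_subgradient(x):
    """One element a of the sub-gradient of the l1 norm at x.

    It satisfies l1_norm(y) - l1_norm(x) >= a @ (y - x) for every y.
    At zero entries any value in [-1, 1] is valid; sign() picks 0.
    """
    return np.sign(x)

def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix from the QR factor of a Gaussian matrix."""
    Z = rng.standard_normal((d, d))
    Q, R = np.linalg.qr(Z)
    # Flip column signs so the result does not depend on the QR sign convention
    return Q * np.sign(np.diag(R))

def matrix_from_eigenvalues(eigenvalues, rng):
    """Build P = Q diag(eigenvalues) Q^T with a random orthogonal Q."""
    Q = random_orthogonal(len(eigenvalues), rng)
    return Q @ np.diag(eigenvalues) @ Q.T
```

Choosing strictly positive eigenvalues yields a symmetric positive-definite matrix, which is the property a precision matrix must have.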
Acknowledgements
This research work has been supported by the Ministry of Science and Technology of the Republic of Serbia, as part of projects III44003, III43002 and TR32035.
Cite this article
Brkljač, B., Janev, M., Obradović, R. et al. Sparse representation of precision matrices used in GMMs. Appl Intell 41, 956–973 (2014). https://doi.org/10.1007/s10489-014-0581-6
DOI: https://doi.org/10.1007/s10489-014-0581-6