Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Clustering hidden Markov models with variational HEM

Published: 01 January 2014 Publication History

Abstract

The hidden Markov model (HMM) is a widely-used generative model that copes with sequential data, assuming that each observation is conditioned on the state of a hidden Markov chain. In this paper, we derive a novel algorithm to cluster HMMs based on the hierarchical EM (HEM) algorithm. The proposed algorithm i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a "cluster center", that is, a novel HMM that is representative for the group, in a manner that is consistent with the underlying generative model of the HMM. To cope with intractable inference in the E-step, the HEM algorithm is formulated as a variational optimization problem, and efficiently solved for the HMM case by leveraging an appropriate variational approximation. The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences, and automatic annotation and retrieval of music and of online hand-writing data, showing improvements over current methods. In particular, our variational HEM algorithm effectively leverages large amounts of data when learning annotation models by using an efficient hierarchical estimation procedure, which reduces learning times and memory requirements, while improving model robustness through better regularization.

References

[1]
Dimitris Achlioptas, Frank McSherry, and Bernhard Schölkopf. Sampling techniques for kernel methods. In Advances in Neural Information Processing Systems 14, volume 1, pages 335-342. MIT Press, 2002.
[2]
Jonathan Alon, Stan Sclaroff, George Kollios, and Vladimir Pavlovic. Discovering clusters in motion time-series data. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 368-375. IEEE, 2003.
[3]
Arthur Asuncion and David J Newman. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
[4]
Claus Bahlmann and Hans Burkhardt. Measuring HMM similarity with the Bayes probability of error and its application to online handwriting recognition. In Proceedings of the Sixth International Conference on Document Analysis and Recognition, pages 406-411. IEEE, 2001.
[5]
Arindam Banerjee, Srujana Merugu, Inderjit S Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. The Journal of Machine Learning Research, 6:1705-1749, 2005.
[6]
Eloi Batlle, Jaume Masip, and Enric Guaus. Automatic song identification in noisy broadcast audio. In Proceeding of the International Conference on Signal and Image Processing. IASTED, 2002.
[7]
Jerome R Bellegarda and David Nahamoo. Tied mixture continuous parameter modeling for speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 38(12):2033-2045, 1990.
[8]
Yoshua Bengio, Jean-Francois Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux, and Marie Ouimet. Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering. In Advances in Neural Information Processing Systems, volume 16, pages 177-184. MIT Press, 2004.
[9]
David M Blei and Michael I Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121-143, 2006.
[10]
William M Campbell. A SVM/HMM system for speaker recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages II-209. IEEE, 2003.
[11]
Gustavo Carneiro, Antoni B Chan, Pedro J Moreno, and Nuno Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):394-410, 2007.
[12]
Antoni B Chan, Emanuele Coviello, and Gert RG Lanckriet. Clustering Dynamic Textures with the Hierarchical EM Algorithm. In Proceeding of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010a.
[13]
Antoni B Chan, Emanuele Coviello, and Gert RG Lanckriet. Derivation of the hierarchical EM Algorithm for Dynamic Textures. Technical report, City University of Hong Kong, 2010b.
[14]
Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. Learning sequence kernels. In IEEE Workshop on Machine Learning for Signal Processing, pages 2-8. IEEE, 2008.
[15]
Emanuele Coviello, Antoni B Chan, and Gert RG Lanckriet. Time series models for semantic music annotation. IEEE Transactions on Audio, Speech, and Language Processing, 5(19): 1343-1359, 2011.
[16]
Emanuele Coviello, Antoni Chan, and Gert Lanckriet. The variational hierarchical EM algorithm for clustering hidden Markov models. In Advances in Neural Information Processing Systems 25, pages 413-421, 2012a.
[17]
Emanuele Coviello, Adeel Mumtaz, Antoni B Chan, and Gert RG Lanckriet. Growing a Bag of Systems Tree for fast and accurate classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1979-1986. IEEE, 2012b.
[18]
Imre Csiszár and Gábor Tusnády. Information geometry and alternating minimization procedures. Statistics and decisions, 1984.
[19]
Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39:1-38, 1977.
[20]
Inderjit S. Dhillon. Differential entropic clustering of multivariate Gaussians. In Advances in Neural Information Processing Systems, volume 19, page 337. MIT Press, 2007.
[21]
Gianfranco Doretto, Alessandro Chiuso, Ying Nian Wu, and Stefano Soatto. Dynamic textures. International Journal on Computer Vision, 51(2):91-109, 2003.
[22]
Douglas Eck, Paul Lamere, Thierry Bertin-Mahieux, and Stephen Green. Automatic generation of social tags for music recommendation. In Advances in Neural Information Processing Systems, pages 385-392. MIT Press, 2008.
[23]
Charless Fowlkes, Serge Belongie, Fan Chung, and Jitendra Malik. Spectral grouping using the nyström method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):214-225, 2004.
[24]
Jacob Goldberger and Sam Roweis. Hierarchical clustering of a mixture model. In Advances in Neural Information Processing Systems, pages 505-512. MIT Press, 2004.
[25]
Asela Gunawardana and William Byrne. Convergence theorems for generalized alternating minimization procedures. Journal of Machine Learning Research, 6:2049-73, 2005.
[26]
Peter Hall, Joel L Horowitz, and Bing-Yi Jing. On blocking rules for the bootstrap with dependent data. Biometrika, 82(3):561-574, 1995.
[27]
Trevor Hastie, Robert Tibshirani, Jerome Friedman, and James Franklin. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer, 27(2):83-85, 2005.
[28]
Christian Hennig. Cluster-wise assessment of cluster stability. Computational Statistics & Data Analysis, 52(1):258-271, 2007.
[29]
John R Hershey and Peder A Olsen. Variational Bhattacharyya divergence for hidden Markov models. In IEEE Computer Society International Conference on Acoustics, Speech and Signal Processing, pages 4557-4560. IEEE, 2008.
[30]
John R Hershey, Peder A Olsen, and Steven J Rennie. Variational Kullback-Leibler divergence for hidden Markov models. In IEEE Workshop on Automatic Speech Recognition & Understanding, pages 323-328. IEEE, 2007.
[31]
Matthew D Hoffman, David M Blei, and Perry R Cook. Easy as CBA: A simple probabilistic model for tagging music. In Proceedings of the 10th International Symposium on Music Information Retrieval, pages 369-374, 2009.
[32]
Xuedong D Huang and Mervyn A Jack. Semi-continuous hidden Markov models for speech signals. Computer Speech & Language, 3(3):239-251, 1989.
[33]
Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2 (1):193-218, 1985.
[34]
Tommi S. Jaakkola. Tutorial on Variational Approximation Methods. In Advanced Mean Field Methods: Theory and Practice, pages 129-159. MIT Press, 2000.
[35]
Tony Jebara, Risi Kondor, and Andrew Howard. Probability product kernels. The Journal of Machine Learning Research, 5:819-844, 2004.
[36]
Tony Jebara, Yingbo Song, and Kapil Thadani. Spectral clustering and embedding with hidden Markov models. In Machine Learning: ECML 2007, pages 164-175. 2007.
[37]
Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul. An introduction to variational methods for graphical models. Machine learning, 37(2):183-233, 1999.
[38]
Biing-Hwang Juang and Lawrence R Rabiner. A probabilistic distance measure for hidden Markov models. AT&T Technical Journal, 64(2):391-408, February 1985.
[39]
Leonard Kaufman and Peter Rousseeuw. Clustering by means of medoids. Statistical Data Analysis Based on the L1-Norm and Related Methods, pages 405-416, 1987.
[40]
Eamonn Keogh and Chotirat Ann Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358-386, 2005.
[41]
Eamonn J Keogh and Michael J Pazzani. Scaling up dynamic time warping for datamining applications. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 285-289. ACM, 2000.
[42]
Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, and Michael I Jordan. Bootstrapping big data. Advances in Neural Information Processing Systems, Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale, 2011.
[43]
Anders Krogh, Michael Brown, I Saira Mian, Kimmen Sjölander, and David Haussler. Hidden Markov models in computational biology. Applications to protein modeling. Journal of Molecular Biology, 235(5):1501-1531, 1994.
[44]
Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Scalable algorithms for string kernels with inexact matching. In Advances in Neural Information Processing Systems, pages 881-888, 2008.
[45]
Christina Leslie, Eleazar Eskin, and William Stafford Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing, volume 7, pages 566-575, 2002.
[46]
Marcus Liwicki and Horst Bunke. Iam-ondb-an on-line english sentence database acquired from handwritten text on a whiteboard. In Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on, pages 956-961. IEEE, 2005.
[47]
Rune B Lyngso, Christian NS Pedersen, and Henrik Nielsen. Metrics and similarity measures for hidden Markov models. In Proceedings International Conference on Intelligent Systems for Molecular Biology, pages 178-186, 1999.
[48]
David JC MacKay. Information Theory, Inference and Learning Algorithms. Cambridge university press, 2003.
[49]
Michael I Mandel and Daniel PW Ellis. Multiple-instance learning for music information retrieval. In Proceedings of the 9th International Symposium on Music Information Retrieval, pages 577-582, 2008.
[50]
Nag, Kin-Hong Wong, and Frank Fallside. Script recognition using hidden Markov models. In IEEE Computer Society International Conference on Acoustics, Speech, and Signal Processing, volume 11, pages 2071-2074. IEEE, 1986.
[51]
Radford M Neal and Geoffrey E Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models, 89:355-370, 1998.
[52]
Evert J Nyström. Über die praktische auflösung von integralgleichungen mit anwendungen auf randwertaufgaben. Acta Mathematica, 54(1):185-204, 1930.
[53]
Tim Oates, Laura Firoiu, and Paul R Cohen. Clustering time series with hidden markov models and dynamic time warping. In Joint Conferences on Artificial Intelligence, Workshop on Neural, Symbolic and Reinforcement Learning Methods for Sequence Learning, pages 17-21, 1999.
[54]
Antonello Panuccio, Manuele Bicego, and Vittorio Murino. A hidden Markov model-based approach to sequential data clustering. Structural, Syntactic, and Statistical Pattern Recognition, pages 734-743, 2002.
[55]
William D Penny and Stephen J Roberts. Notes on variational learning. Technical report, Oxford University, 2000.
[56]
Yuting Qi, John William Paisley, and Lawrence Carin. Music analysis using hidden Markov mixture models. IEEE Transactions on Signal Processing, 55(11):5209-5224, 2007.
[57]
Lawrence R Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall, Upper Saddle River (NJ, USA), 1993.
[58]
Jeremy Reed and Chin-Hui Lee. A study on music genre classification based on universal acoustic models. In Proceedings of the 7th International Symposium on Music Information Retrieval, pages 89-94, 2006.
[59]
Jose Rodriguez-Serrano and Florent Perronnin. A model-based sequence similarity with application to handwritten word-spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34:2108-2120, 2012.
[60]
Nicolas Scaringella and Giorgio Zoia. On the modeling of time information for automatic genre recognition systems in audio signals. In Proc. 6th International Symposium Music Information Retrieval, pages 666-671, 2005.
[61]
Padhraic Smyth. Clustering sequences with hidden Markov models. In Advances in Neural Information Processing Systems, pages 648-654. MIT Press, 1997.
[62]
Theodoros Theodoridis and Huosheng Hu. Action classification of 3d human models using dynamic ANNs for mobile robot surveillance. In IEEE Computer Society International Conference on Robotics and Biomimetics, pages 371-376. IEEE, 2007.
[63]
Douglas Turnbull, Luke Barrington, David Torres, and Gert RG Lanckriet. Semantic annotation and retrieval of music and sound effects. IEEE Transactions on Audio, Speech and Language Processing, 16(2):467-476, February 2008.
[64]
Nuno Vasconcelos. Image indexing with mixture hierarchies. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001.
[65]
Nuno Vasconcelos and Andrew Lippman. Learning mixture hierarchies. In Advances in Neural Information Processing Systems, 1998.
[66]
Martin J Wainwright and Michael I Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.
[67]
Ben H Williams, Marc Toussaint, and Amos J Storkey. Extracting motion primitives from natural handwriting data. In Proceeding of the 16th International Conference on Artificial Neural Networks, pages 634-643. Springer, 2006.
[68]
Jie Yin and Qiang Yang. Integrating hidden Markov models and spectral analysis for sensory time series clustering. In IEEE Computer Society International Conference on Data Mining. IEEE, 2005.
[69]
Shi Zhong and Joydeep Ghosh. A unified framework for model-based clustering. The Journal of Machine Learning Research, 4:1001-1037, 2003.

Cited By

View all
  • (2022)Recognition and Error Correction Techniques for Piano Playing Music Based on Convolutional Cyclic Hashing MethodWireless Communications & Mobile Computing10.1155/2022/56609612022Online publication date: 1-Jan-2022
  • (2020)Aggregated Wasserstein Distance and State Registration for Hidden Markov ModelsIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2019.290863542:9(2133-2147)Online publication date: 1-Sep-2020
  • (2019)Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate InferenceIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2018.284537141:6(1323-1337)Online publication date: 3-May-2019
  • Show More Cited By
  1. Clustering hidden Markov models with variational HEM

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The Journal of Machine Learning Research
    The Journal of Machine Learning Research  Volume 15, Issue 1
    January 2014
    4085 pages
    ISSN:1532-4435
    EISSN:1533-7928
    Issue’s Table of Contents

    Publisher

    JMLR.org

    Publication History

    Published: 01 January 2014
    Revised: 01 June 2013
    Published in JMLR Volume 15, Issue 1

    Author Tags

    1. clustering
    2. hidden Markov mixture model
    3. hidden Markov model
    4. hierarchical EM algorithm
    5. time-series classification
    6. variational approximation

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)35
    • Downloads (Last 6 weeks)7
    Reflects downloads up to 24 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Recognition and Error Correction Techniques for Piano Playing Music Based on Convolutional Cyclic Hashing MethodWireless Communications & Mobile Computing10.1155/2022/56609612022Online publication date: 1-Jan-2022
    • (2020)Aggregated Wasserstein Distance and State Registration for Hidden Markov ModelsIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2019.290863542:9(2133-2147)Online publication date: 1-Sep-2020
    • (2019)Density-Preserving Hierarchical EM Algorithm: Simplifying Gaussian Mixture Models for Approximate InferenceIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2018.284537141:6(1323-1337)Online publication date: 3-May-2019
    • (2018)Analyzing Accessed Content Sequences with HDP-based ModelsProceedings of the 3rd International Conference on Multimedia Systems and Signal Processing10.1145/3220162.3220187(142-149)Online publication date: 28-Apr-2018

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media