Abstract
We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure that appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models, different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
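For intuition, the flat-HMM forward pass that the paper's extended Baum-Welch procedure generalizes to hierarchies can be sketched as follows. This is not the HHMM algorithm itself; it is a minimal illustration, and all parameter values (`pi`, `A`, `B`) are toy numbers invented for this sketch, not taken from the paper.

```python
# Minimal sketch of the flat-HMM forward pass underlying Baum-Welch.
# The HHMM estimation procedure in the paper generalizes this recursion
# across levels of the state hierarchy; here we show only the flat case.

def forward_likelihood(pi, A, B, obs):
    """Return P(obs) under an HMM with initial distribution pi,
    transition matrix A, and emission matrix B (discrete symbols)."""
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state_t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# Toy two-state model over a binary observation alphabet (hypothetical values).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

The backward pass is symmetric, and Baum-Welch alternates expected state/transition counts (from the two passes) with parameter re-estimation; the paper's contribution is carrying this out efficiently over a recursive hierarchy of such models.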
Cite this article
Fine, S., Singer, Y. & Tishby, N. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning 32, 41–62 (1998). https://doi.org/10.1023/A:1007469218079