Abstract
We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure that appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models, different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
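For intuition, the flat-HMM forward pass that the paper's extended Baum-Welch procedure generalizes to hierarchies can be sketched as follows. This is not the HHMM algorithm itself; it is a minimal illustration, and all parameter values (`pi`, `A`, `B`) are toy numbers invented for this sketch, not taken from the paper.

```python
# Minimal sketch of the flat-HMM forward pass underlying Baum-Welch.
# The HHMM estimation procedure in the paper generalizes this recursion
# across levels of the state hierarchy; here we show only the flat case.

def forward_likelihood(pi, A, B, obs):
    """Return P(obs) under an HMM with initial distribution pi,
    transition matrix A, and emission matrix B (discrete symbols)."""
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state_t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)

# Toy two-state model over a binary observation alphabet (hypothetical values).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

The backward pass is symmetric, and Baum-Welch alternates expected state/transition counts (from the two passes) with parameter re-estimation; the paper's contribution is carrying this out efficiently over a recursive hierarchy of such models.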
Cite this article
Fine, S., Singer, Y. & Tishby, N. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning 32, 41–62 (1998). https://doi.org/10.1023/A:1007469218079