Abstract
We study the classification problem that arises when two variables—one continuous (x), one discrete (s)—evolve jointly in time. We suppose that the vector x traces out a smooth multidimensional curve, to each point of which the variable s attaches a discrete label. The trace of s thus partitions the curve into different segments whose boundaries occur where s changes value. We consider how to learn the mapping between the trace of x and the trace of s from examples of segmented curves. Our approach is to model the conditional random process that generates segments of constant s along the curve of x. We suppose that the variable s evolves stochastically as a function of the arc length traversed by x. Since arc length does not depend on the rate at which a curve is traversed, this gives rise to a family of Markov processes whose predictions are invariant to nonlinear warpings (or reparameterizations) of time. We show how to estimate the parameters of these models—known as Markov processes on curves (MPCs)—from labeled and unlabeled data. We then apply these models to two problems in automatic speech recognition, where x are acoustic feature trajectories and s are phonetic alignments.
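The invariance claimed above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes plain Euclidean arc length, and the helix `curve`, the cubic warping `t ** 3`, and all function names are hypothetical choices made for illustration. It samples the same curve under two different time parameterizations and compares the resulting lengths.

```python
import numpy as np

def arc_length(points):
    """Euclidean length of the polyline through `points` (shape [T, d])."""
    steps = np.diff(points, axis=0)          # displacement between samples
    return float(np.linalg.norm(steps, axis=1).sum())

def curve(t):
    """Hypothetical smooth trajectory x(t): one turn of a helix, t in [0, 1]."""
    return np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), t], axis=1)

t = np.linspace(0.0, 1.0, 2001)
warped_t = t ** 3   # assumed monotonic nonlinear warping of [0, 1] onto itself

length_uniform = arc_length(curve(t))        # curve traversed at a steady rate
length_warped = arc_length(curve(warped_t))  # same curve, slow start, fast finish

print(length_uniform, length_warped)
```

Both values approximate the closed-form helix length sqrt(4π² + 1) up to discretization error, even though the two samplings spend very different amounts of time on each part of the curve. A process whose state changes are driven by arc length, rather than by elapsed time, inherits this insensitivity to the rate of traversal.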
Cite this article
Saul, L.K., Rahim, M.G. Markov Processes on Curves. Machine Learning 41, 345–363 (2000). https://doi.org/10.1023/A:1007604231716
DOI: https://doi.org/10.1023/A:1007604231716