Abstract
Finite-State Transducers (FSTs) are a popular tool for modeling paired input-output sequences, and have numerous applications in real-world problems. Most training algorithms for learning FSTs rely on gradient-based or EM optimizations which can be computationally expensive and suffer from local optima issues. Recently, Hsu et al. [13] proposed a spectral method for learning Hidden Markov Models (HMMs) which is based on an Observable Operator Model (OOM) view of HMMs. Following this line of work we present a spectral algorithm to learn FSTs with strong PAC-style guarantees. To the best of our knowledge, ours is the first result of this type for FST learning. At its core, the algorithm is simple, and scalable to large data sets. We present experiments that validate the effectiveness of the algorithm on synthetic and real data.
This work was partially supported the EU PASCAL2 Network of Excellence (FP7-ICT-216886), and by a Google Research Award. B.B. was supported by an FPU fellowship (AP2008-02064) of the Spanish Ministry of Education. The Spanish Ministry of Science and Innovation supported A.Q. (JCI-2009-04240) and X.C. (RYC-2008-02223 and “KNOW2” TIN2009-14715-C04-04).
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Abe, N., Takeuchi, J., Warmuth, M.: Polynomial Learnability of Stochastic Rules with Respect to the KL-Divergence and Quadratic Distance. IEICE Transactions on Information and Systems 84(3), 299–316 (2001)
Bailly, R., Denis, F., Ralaivola, L.: Grammatical inference as a principal component analysis problem. In: Proc. ICML (2009)
Bernard, M., Janodet, J.-C., Sebban, M.: A discriminative model of stochastic edit distance in the form of a conditional transducer. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds.) ICGI 2006. LNCS (LNAI), vol. 4201, pp. 240–252. Springer, Heidelberg (2006)
Carlyle, J.W., Paz, A.: Realization by stochastic finite automaton. Journal of Computer and System Sciences 5, 26–40 (1971)
Casacuberta, F.: Inference of finite-state transducers by using regular grammars and morphisms. In: Oliveira, A.L. (ed.) ICGI 2000. LNCS (LNAI), vol. 1891, pp. 1–14. Springer, Heidelberg (2000)
Chang, J.T.: Full reconstruction of markov models on evolutionary trees: Identifiability and consistency. Mathematical Biosciences 137, 51–73 (1996)
Chen, S., Goodman, J.: An empirical study of smoothing techniques for language modeling. In: Proc. of ACL, pp. 310–318 (1996)
Clark, A.: Partially supervised learning of morphology with stochastic transducers. In: Proc. of NLPRS, pp. 341–348 (2001)
Clark, A., Costa Florêncio, C., Watkins, C.: Languages as hyperplanes: grammatical inference with string kernels. Machine Learning, 1–23 (2010)
Eisner, J.: Parameter estimation for probabilistic finite-state transducers. In: Proc. of ACL, pp. 1–8 (2002)
Fliess, M.: Matrices de Hankel. Journal de Mathematiques Pures et Appliquees 53, 197–222 (1974)
Haardt, M., Nossek, J.A.: Simultaneous schur decomposition of several nonsymmetric matrices to achieve automatic pairing in multidimensional harmonic retrieval problems. IEEE Transactions on Signal Processing 46(1) (1998)
Hsu, D., Kakade, S.M., Zhang, T.: A spectral algorithm for learning hidden markov models. In: Proc. of COLT (2009)
Jaeger, H.: Observable operator models for discrete stochastic time series. Neural Computation 12, 1371–1398 (2000)
Jelinek, F.: Statistical Methods for Speech Recognition (Language, Speech, and Communication). MIT Press, Cambridge (1998)
Knight, K., Graehl, J.: Machine transliteration. Computational Linguistics 24(4), 599–612 (1998)
Li, H., Kumaran, A., Pervouchine, V., Zhang, M.: Report of news 2009 machine transliteration shared task. In: Proc. Named Entities Workshop (2009)
Mossel, E., Roch, S.: Learning nonsingular phylogenies and hidden markov models. In: Proc. of STOC (2005)
Neuts, M.F.: Matrix-geometric solutions in stochastic models: an algorithmic approach. Johns Hopkins University Press, Baltimore (1981)
Ng, A., Jordan, M.: On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In: NIPS (2002)
Ron, D., Singer, Y., Tishby, N.: On the learnability and usage of acyclic probabilistic finite automata. In: Proc. of COLT, pp. 31–40 (1995)
Schützenberger, M.: On the definition of a family of automata. Information and Control 4, 245–270 (1961)
Siddiqi, S.M., Boots, B., Gordon, G.J.: Reduced-Rank Hidden Markov Models. In: Proc. AISTATS, pp. 741–748 (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balle, B., Quattoni, A., Carreras, X. (2011). A Spectral Learning Algorithm for Finite State Transducers. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science(), vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_20
Download citation
DOI: https://doi.org/10.1007/978-3-642-23780-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer ScienceComputer Science (R0)