Abstract
During the 80’s, Angluin introduced an active learning paradigm, using an Oracle, capable of answering both membership and equivalence queries. However, practical evidence tends to show that if the former are often available, this is usually not the case of the latter. We propose new queries, called correction queries, which we study in the framework of Grammatical Inference. When a string is submitted to the Oracle, either she validates it if it belongs to the target language, or she proposes a correction, i.e., a string of the language close to the query with respect to the edit distance. We also introduce a non-standard class of languages: The topological balls of strings. We show that this class is not learnable in Angluin’s Mat model, but is with a linear number of correction queries. We conduct several experiments with an Oracle simulating a human Expert, and show that our algorithm is resistant to approximate answers.
This work was supported in part by the IST Programme of the European Community, under the Pascal Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.
Chapter PDF
Similar content being viewed by others
References
Angluin, D.: Queries and concept learning. Machine Learning Journal 2, 319–342 (1987)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Doklady Akademii Nauk SSSR 163(4), 845–848 (1965)
Wagner, R., Fischer, M.: The string-to-string correction problem. Journal of the ACM 21, 168–178 (1974)
Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological sequence analysis. Cambridge University Press, Cambridge (1998)
Amengual, J.C., Sanchis, A., Vidal, E., Benedí, J.M.: Language simplification through error-correcting and grammatical inference techniques. Machine Learning Journal 44(1), 143–159 (2001)
Oncina, J., Sebban, M.: Learning stochastic edit distance: Application in handwritten character recognition. Pattern Recognition 39(9), 1575–1587 (2006)
Tantini, F., de la Higuera, C., Janodet, J.C.: Identification in the limit of systematic-noisy languages. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds.) ICGI 2006. LNCS (LNAI), vol. 4201, pp. 19–31. Springer, Heidelberg (2006)
Kearns, M.J., Li, M.: Learning in the presence of malicious errors. SIAM Journal of Computing 22(4), 807–837 (1993)
Crochemore, M., Hancart, C., Lecroq, T.: Algorithmique du texte, Vuibert (2001)
Schulz, K.U., Mihov, S.: Fast string correction with levenshtein automata. Int. Journal on Document Analysis and Recognition 5(1), 67–85 (2002)
Angluin, D.: Learning regular sets from queries and counterexamples. Information and Computation 75, 87–106 (1987)
de la Higuera, C.: Data complexity issues in grammatical inference. In: Data Complexity in Pattern Recognition, pp. 153–172. Springer, Heidelberg (2006)
Angluin, D.: Queries and concept learning. Machine Learning 2, 319–342 (1988)
Becerra-Bonache, L., Yokomori, T.: Learning mild context-sensitiveness: Towards understanding children’s language learning. In: Paliouras, G., Sakakibara, Y. (eds.) ICGI 2004. LNCS (LNAI), vol. 3264, pp. 53–64. Springer, Heidelberg (2004)
de la Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theoretical Computer Science 230, 39–48 (2000)
Kohonen, T.: Median strings. Information Processing Letters 3, 309–313 (1985)
Martínez-Hinarejos, C.D., Juan, A., Casacuberta, F.: Use of median string for classification. In: Proc. of the 15th International Conference on Pattern Recognition, vol. 2, pp. 2903–2906 (2000)
Becerra-Bonache, L., de la Higuera, C., Janodet, J.C., Tantini, F.: Apprentissage des boules de mots avec des requêtes de correction (in french). In: Proc. of 9th Conférence en Apprentissage, Presses Universitaires de Grenoble (2007)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Becerra Bonache, L., de la Higuera, C., Janodet, JC., Tantini, F. (2007). Learning Balls of Strings with Correction Queries. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science(), vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_6
Download citation
DOI: https://doi.org/10.1007/978-3-540-74958-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer ScienceComputer Science (R0)