Abstract
To study the problem of learning from noisy data, the common approach is to use a statistical model of noise. The influence of the noise is then considered according to pragmatic or statistical criteria, by using a paradigm taking into account a distribution of the data. In this article, we study the noise as a nonstatistical phenomenon, by defining the concept of systematic noise. We establish various ways of learning (in the limit) from noisy data. The first is based on a technique of reduction between problems and consists in learning from the data which one knows noisy, then in denoising the learned function. The second consists in denoising on the fly the training examples, thus to identify in the limit good examples, and then to learn from noncorrupted data. We give in both cases sufficient conditions so that learning is possible and we show through various examples (coming in particular from the field of the grammatical inference) that our techniques are complementary.
This work was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’views.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Sakakibara, Y.: Recent Advances of Grammatical Inference. Theoretical Computer Science 185, 15–45 (1997)
de la Higuera, C.: A bibliographical study of grammatical inference. Pattern Recognition 38, 1332–1348 (2005)
Lang, K.J., Pearlmutter, B.A., Price, R.A.: Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In: Honavar, V.G., Slutzki, G. (eds.) ICGI 1998. LNCS (LNAI), vol. 1433, pp. 1–12. Springer, Heidelberg (1998)
de la Higuera, C.: Data complexity in Grammatical Inference. In: Data complexity in Pattern Recognition. Advanced Information and Knowledge Processing, Springer, Heidelberg (2006); ISBN: 1-84628-171-7
Case, J., Jain, S., Sharma, A.: Synthesizing noise-tolerant language learners. Theoretical Computer Science 261, 31–56 (2001)
Stephan, F.: Noisy inference and oracles. Theoretical Computer Science 185, 129–157 (1997)
Wharton, R.M.: Approximate language identification. Information and Control 26, 236–255 (1974)
Kearns, M., Valiant, L.: Cryptographic limitations on learning boolean formulae and finite automata. In: 21st ACM Symposium on Theory of Computing, pp. 433–444 (1989)
Kearns, M.: Efficient noise-tolerant learning from statistical queries. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 392–401 (1993)
Sebban, M., Janodet, J.C.: On state merging in grammatical inference: a statistical approach for dealing with noisy data. In: Proceedings of ICML (2003)
Habrard, A., Bernard, M., Sebban, M.: Improvement of the state merging rule on noisy data in probabilistic grammatical inference. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) ECML 2003. LNCS (LNAI), vol. 2837, pp. 169–180. Springer, Heidelberg (2003)
Coste, F., Fredouille, D.: Unambiguous automata inference by means of state-merging methods. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) ECML 2003. LNCS (LNAI), vol. 2837, pp. 60–71. Springer, Heidelberg (2003)
Giles, C.L., Lawrence, S., Tsoi, A.: Noisy time series prediction using recurrent neural networks and grammatical inference. Machine Learning Journal 44, 161–183 (2001)
Yokomori, T., Kobayashi, S.: Inductive learning of regular sets from examples: a rough set approach. In: Proc. of International Workshop on Rough Sets and Soft Computing (1994)
Miclet, L., Bayoudh, S., Delhay, A.: Définitions et premières expériences en apprentissage par analogie dans les séquences. In: Denis, F. (ed.) CAP, PUG, pp. 31–48 (2005)
Vidal, E., Thollard, F., de la Higuera, C., Casacuberta, F., Carrasco, R.C.: Probabilistic finite state automata – part I and II. Pattern Analysis and Machine Intelligence 27, 1013–1039 (2005)
Abe, N., Warmuth, M.: On the computational complexity of approximating distributions by probabilistic automata. Machine Learning Journal 9, 205–260 (1992)
Thollard, F., Clark, A.: Pac-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 473–497 (2004)
Carrasco, R.C., Oncina, J.: Learning stochastic regular grammars by means of a state merging method. In: Carrasco, R.C., Oncina, J. (eds.) ICGI 1994. LNCS (LNAI), vol. 862, pp. 139–150. Springer, Heidelberg (1994)
Gold, M.: Language identification in the limit. Information and Control 10, 447–474 (1967)
Gold, E.M.: Complexity of automaton identification from given data. Information and Control 37, 302–320 (1978)
de la Higuera, C.: Complexity and reduction issues in grammatical inference. Technical Report, Universität Tübingen (2005) ISSN 0946-3852
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory 10, 707–710 (1965); Original in Doklady Akademii Nauk SSSR, 163(4), 845–848 (1965)
Wagner, R., Fisher, M.: The string-to-string correction problem. Journal of the ACM 21, 168–178 (1974)
Takada, Y.: Grammatical inference for even linear languages based on control sets. Information Processing Letters 28, 193–199 (1988)
de la Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theoretical Computer Science 230, 39–48 (2000)
Belmandt, Z.: Manuel de prétopologie et ses applications. Hermés (1993)
Pawlak, Z.: Theory of rough sets: A new methodology for knowledge discovery (abstract). In: ICCI 1990, p. 11 (1990)
Kobayashi, S., Yokomori, T.: On approximately identifying concept classes in the limit. In: ALT 1995, pp. 298–312 (1995)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tantini, F., de la Higuera, C., Janodet, JC. (2006). Identification in the Limit of Systematic-Noisy Languages. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science(), vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_3
Download citation
DOI: https://doi.org/10.1007/11872436_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer ScienceComputer Science (R0)