Abstract
We consider the task of training a neural network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government and Binding theory. We investigate the following models: feed-forward neural networks, Frasconi-Gori-Soda and Back-Tsoi locally recurrent neural networks, Williams and Zipser and Elman recurrent neural networks, Euclidean and edit-distance nearest-neighbors, and decision trees. The non-neural-network machine learning methods are included primarily for comparison. We find that the Elman and Williams & Zipser recurrent neural networks achieve the best performance and, we believe, learn a more parsimonious representation of the grammar.
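The core idea, classifying a sentence by feeding its words one at a time through a recurrent network whose hidden "context" state carries information across the sequence, can be illustrated with a minimal sketch of an Elman-style forward pass. All dimensions, weights, and the toy input below are illustrative assumptions, not the paper's actual configuration, and the weights are random rather than trained:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elman_classify(inputs, W_in, W_rec, W_out, b_h, b_out):
    """Run a sequence of input vectors through an Elman-style simple
    recurrent network; return the output unit's activation after the
    final token (a grammaticality score in (0, 1))."""
    h = np.zeros(W_rec.shape[0])                  # context (hidden) state
    for x in inputs:                              # one token at a time
        h = sigmoid(W_in @ x + W_rec @ h + b_h)   # Elman recurrence
    return sigmoid(W_out @ h + b_out)

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8                                # toy sizes (assumption)
W_in = rng.normal(scale=0.5, size=(n_hid, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hid, n_hid))
W_out = rng.normal(scale=0.5, size=n_hid)
b_h, b_out = np.zeros(n_hid), 0.0

# A "sentence" of three one-hot word vectors.
sentence = [np.eye(n_in)[i] for i in (0, 2, 1)]
score = elman_classify(sentence, W_in, W_rec, W_out, b_h, b_out)
print(0.0 < score < 1.0)
```

In training, the output would be pushed toward 1 for grammatical sentences and 0 for ungrammatical ones via a gradient method such as backpropagation through time.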
References
Allen, R. B. (1983). Sequential connectionist networks for answering simple questions about a microworld. In 5th Annual Proceedings of the Cognitive Science Society, pages 489–495.
Back, A. and Tsoi, A. (1991). FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Computation, 3(3):375–385.
Berg, G. (1992). A connectionist parser with recursive sentence structure and lexical disambiguation. In Proceedings AAAI, pages 32–37.
Chomsky, N. (1981). Lectures on Government and Binding. Foris Publications.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. Praeger.
Clouse, D., Giles, C., Horne, B., and Cottrell, G. (1994). Learning large de Bruijn automata with feed-forward neural networks. Technical Report CS94-398, Computer Science and Engineering, University of California at San Diego, La Jolla, CA.
Crutchfield, J. P. and Young, K. (1989). Computation at the onset of chaos. In Zurek, W., editor, Complexity, Entropy and the Physics of Information. Addison-Wesley, Reading, MA.
Darken, C. and Moody, J. (1991). Note on learning rate schedules for stochastic optimization. In Neural Information Processing Systems 3, pages 832–838. Morgan Kaufmann.
Das, S., Giles, C., and Sun, G. (1992). Learning context-free grammars: Limitations of a recurrent neural network with an external stack memory. In Proceedings of The Fourteenth Annual Conference of the Cognitive Science Society, pages 791–795, San Mateo, CA. Morgan Kaufmann Publishers.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14:179–211.
Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2/3):195–226.
Elman, J. L. (1984). Structured representations and connectionist models. In 6th Annual Proceedings of the Cognitive Science Society, pages 17–25.
Frasconi, P., Gori, M., and Soda, G. (1992). Local feedback multilayered networks. Neural Computation, 4(1):120–130.
Gasser, M. and Lee, C. (1990). Networks that learn phonology. Technical report, Computer Science Department, Indiana University.
Giles, C., Horne, B., and Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks. In press.
Giles, C., Miller, C., Chen, D., Chen, H., Sun, G., and Lee, Y. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3):393–405.
Hare, M. (1990). The role of similarity in Hungarian vowel harmony: A connectionist account. Technical Report CRL 9004, Centre for Research in Language, University of California, San Diego.
Hare, M., Corina, D., and Cottrell, G. (1989). Connectionist perspective on prosodic structure. Technical Report CRL Newsletter Volume 3 Number 2, Centre for Research in Language, University of California, San Diego.
Harris, C. L. and Elman, J. L. (1984). Representing variable information with simple recurrent networks. In 6th Annual Proceedings of the Cognitive Science Society, pages 635–642.
Haykin, S. (1994). Neural Networks, A Comprehensive Foundation. Macmillan, New York, NY.
St. John, M. F. and McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46:5–46.
Joshi, A. K. (1985). Tree adjoining grammars: how much context-sensitivity is required to provide reasonable structural descriptions? In Dowty, D. R., Karttunen, L., and Zwicky, A. M., editors, Natural Language Parsing. Cambridge University Press, Cambridge.
Kruskal, J. B. (1983). An overview of sequence comparison. In Sankoff, D. and Kruskal, J. B., editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, Massachusetts.
Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.
Lasnik, H. and Uriagereka, J. (1988). A Course in GB Syntax: Lectures on Binding and Empty Categories. MIT Press, Cambridge, MA.
Lawrence, S., Giles, C. L., and Fong, S. (1995). On the applicability of neural network and machine learning methodologies to natural language processing. Technical Report UMIACS-TR-95-64 and CS-TR-3479, Institute for Advanced Computer Studies, University of Maryland, College Park MD 20742.
MacWhinney, B., Leinbach, J., Taraban, R., and McDonald, J. (1989). Language learning: cues or rules? Journal of Memory and Language, 28:255–277.
Miikkulainen, R. and Dyer, M. (1989). Encoding input/output representations in connectionist cognitive systems. In Touretzky, D. S., Hinton, G. E., and Sejnowski, T. J., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 188–195, Los Altos, CA. Morgan Kaufmann.
Pereira, F. (1992). Inside-outside reestimation from partially bracketed corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 92).
Pesetsky, D. M. (1982). Paths and Categories. PhD thesis, MIT.
Pollack, J. (1990). Recursive distributed representations. Artificial Intelligence, 46:77–105.
Pollack, J. (1991). The induction of dynamical recognizers. Machine Learning, 7:227–252.
Pollard, C. (1984). Generalized context-free grammars, head grammars and natural language. PhD thesis, Department of Linguistics, Stanford University, Palo Alto, CA.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California.
Siegelmann, H. and Sontag, E. (1995). On the computational power of neural nets. Journal of Computer and System Sciences, 50(1):132–150.
Sperduti, A., Starita, A., and Goller, C. (1995). Learning distributed representations for the classification of terms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 509–515.
Stolcke, A. (1990). Learning feature-based semantics with simple recurrent networks. Technical Report TR-90-015, International Computer Science Institute, Berkeley, California.
Sun, G., Giles, C., Chen, H., and Lee, Y. (1993). The neural network push-down automaton: Model, stack and learning simulations. Technical Report UMIACS-TR-93-77, Institute for Advanced Computer Studies, University of Maryland, College Park, MD.
Touretzky, D. S. (1989a). Rules and maps in connectionist symbol processing. Technical Report CMU-CS-89-158, Carnegie Mellon University: Department of Computer Science, Pittsburgh, PA.
Touretzky, D. S. (1989b). Towards a connectionist phonology: The 'many maps' approach to sequence manipulation. In Proceedings of the 11th Annual Conference of the Cognitive Science Society, pages 188–195.
Watrous, R. and Kuhn, G. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4(3):406–414.
Williams, R. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280.
Williams, R. and Zipser, D. (1990). Gradient-based learning algorithms for recurrent connectionist networks. In Chauvin, Y. and Rumelhart, D., editors, Backpropagation: Theory, Architectures, and Applications. Erlbaum, Hillsdale, NJ.
Zeng, Z., Goodman, R., and Smyth, P. (1994). Discrete recurrent neural networks for grammatical inference. IEEE Transactions on Neural Networks, 5(2):320–330.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
Cite this paper
Lawrence, S., Fong, S., Giles, C.L. (1996). Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_36
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7