

Extracting automata from recurrent neural networks using queries and counterexamples (extended version)

Published: 08 June 2022

Abstract

We consider the problem of extracting a deterministic finite automaton (DFA) from a trained recurrent neural network (RNN). We present a novel algorithm that uses exact learning and abstract interpretation to perform efficient extraction of a minimal DFA describing the state dynamics of a given RNN. We use Angluin’s L* algorithm as a learner and the given RNN as an oracle, refining the abstraction of the RNN only as much as necessary for answering equivalence queries. Our technique allows DFA extraction from the RNN while avoiding state explosion, even when the state vectors are large and fine differentiation is required between RNN states. We experiment on multi-layer GRUs and LSTMs with state-vector dimensions, alphabet sizes, and underlying DFA which are significantly larger than in previous DFA-extraction work. Additionally, we discuss when it may be relevant to apply the technique to RNNs trained as language models rather than binary classifiers, and present experiments on some such examples. In some of our experiments, the underlying target language can be described with a succinct DFA, yet we find that the extracted DFA is large and complex. These are cases in which the RNN has failed to learn the intended generalisation, and our extraction procedure highlights words which are misclassified by the seemingly “perfect” RNN.
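
To make the oracle roles in the abstract concrete, below is a minimal Python sketch of the teacher side of such an L*-style extraction loop. It is not the authors' implementation: the rnn_classify stand-in, the toy alphabet, and the sampling-based equivalence check (used here in place of the paper's abstraction-refinement step) are all illustrative assumptions, and the L* learner itself (the observation table) is omitted.

    import random

    ALPHABET = "ab"

    def rnn_classify(word):
        # Stand-in for the trained RNN used as a membership oracle.
        # Here it simply accepts words with an even number of 'a's.
        return word.count("a") % 2 == 0

    def dfa_accepts(dfa, word):
        # A hypothesis DFA is (initial state, transition dict, accepting states).
        init, trans, accepting = dfa
        state = init
        for ch in word:
            state = trans[(state, ch)]
        return state in accepting

    def equivalence_query(dfa, max_len=10, samples=5000):
        # Approximate equivalence check: look for a word on which the
        # hypothesis DFA and the RNN disagree, returning it as a
        # counterexample (None means "no disagreement found").
        for _ in range(samples):
            n = random.randint(0, max_len)
            w = "".join(random.choice(ALPHABET) for _ in range(n))
            if dfa_accepts(dfa, w) != rnn_classify(w):
                return w
        return None

    # A deliberately wrong one-state hypothesis that accepts everything;
    # the teacher answers the equivalence query with a counterexample
    # such as "a", which the learner would then use to refine its hypothesis.
    trivial_dfa = (0, {(0, "a"): 0, (0, "b"): 0}, {0})
    print(equivalence_query(trivial_dfa))

In the algorithm described in the abstract, equivalence queries are instead answered by an abstraction of the RNN's state space, which is refined only when it disagrees with the RNN on a concrete word.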

References

[1]
Adi, Y., Kermany, E., Belinkov, Y., Lavi, O., & Goldberg, Y. (2016). Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. http://arxiv.org/abs/1608.04207
[2]
Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 87–106.
[3]
Arras, L., Montavon, G., Müller, K., & Samek, W. (2017). Explaining recurrent neural network predictions in sentiment analysis. http://arxiv.org/abs/1706.07206
[4]
Ayache, S., Eyraud, R., & Goudian, N. (2018). Explaining black boxes on sequential data using weighted automata. In: Unold O, Dyrka W, Wieczorek W (eds) Proceedings of the 14th International Conference on Grammatical Inference, ICGI 2018, Wrocław, Poland, September 5-7, 2018, PMLR, Proceedings of Machine Learning Research, vol 93, pp 81–103, http://proceedings.mlr.press/v93/ayache19a.html
[5]
Balle, B., Carreras, X., Luque, F. M., & Quattoni, A. (2014). Spectral learning of weighted automata - A forward-backward perspective. Machine Learning, 96(1–2), 33–63.
[6]
Barbot, B., Bollig, B., Finkel, A., Haddad, S., Khmelnitsky, I., Leucker, M., Neider, D., Roy, R., & Ye, L. (2021). Extracting context-free grammars from recurrent neural networks using tree-automata learning and a* search. In: Chandlee J, Eyraud R, Heinz J, Jardine A, van Zaanen M (eds) Proceedings of the Fifteenth International Conference on Grammatical Inference, PMLR, Proceedings of Machine Learning Research, vol 153, pp 113–129, https://proceedings.mlr.press/v153/barbot21a.html
[7]
Berg, T., Jonsson, B., Leucker, M., & Saksena, M. (2005). Insights to Angluin’s learning. Electronic Notes in Theoretical Computer Science, 118, 3–18.
[8]
Boser, B.E., Guyon, I.M., & Vapnik, V.N. (1992). A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ACM, New York, NY, USA, COLT ’92, pp 144–152, http://doi.acm.org/10.1145/130385.130401
[9]
Casey, M. (1998). Correction to proof that recurrent neural networks can robustly recognize only regular languages. Neural Computation, 10(5), 1067–1069.
[10]
Cechin, A.L., Simon, D.R.P., & Stertz, K. (2003). State automata extraction from recurrent neural nets using k-means and fuzzy clustering. In: Proceedings of the XXIII International Conference of the Chilean Computer Science Society, IEEE Computer Society, Washington, DC, USA, SCCC ’03, pp 73–78, http://dl.acm.org/citation.cfm?id=950790.951318
[11]
Cho, K., van Merrienboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. http://arxiv.org/abs/1409.1259
[12]
Chung, J., Gülçehre, Ç., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. http://arxiv.org/abs/1412.3555
[13]
Clark, A. (2010). Distributional learning of some context-free languages with a minimally adequate teacher. In: Sempere JM, García P (eds) Grammatical Inference: Theoretical Results and Applications, 10th International Colloquium, ICGI 2010, Valencia, Spain, September 13-16, 2010. Proceedings, Springer, Lecture Notes in Computer Science, vol 6339, pp 24–37,
[14]
Clark, A., & Eyraud, R. (2007). Polynomial identification in the limit of substitutable context-free languages. J Mach Learn Res 8:1725–1745, http://dl.acm.org/citation.cfm?id=1314556
[15]
Clark, A., & Yoshinaka, R. (2016). Distributional Learning of Context-Free and Multiple Context-Free Grammars, Springer Berlin Heidelberg, pp 143–172.
[16]
Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1(3), 372–381.
[17]
Cohen, M., Caciularu, A., Rejwan, I., & Berant, J. (2017). Inducing Regular Grammars Using Recurrent Neural Networks. http://arxiv.org/abs/1710.10453
[18]
D’Ulizia, A., Ferri, F., & Grifoni, P. (2010). A survey of grammatical inference methods for natural language learning. Artificial Intelligence Review, 36, 1–27.
[19]
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
[20]
Gers, F., & Schmidhuber, E. (2001). LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12(6), 1333–1340.
[21]
Giles, C. L., Sun, G. Z., Chen, H. H., Lee, Y. C., & Chen, D. (1990). Higher order recurrent networks and grammatical inference. In D. S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 380–387). Morgan-Kaufmann.
[22]
Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345–420.
[23]
Goldberg, Y. (2017). Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers.
[24]
Goldman, S. A., & Kearns, M. J. (1995). On the complexity of teaching. Journal of Computer and System Sciences, 50(1), 20–31.
[25]
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge: The MIT Press.
[26]
Gorman, K., & Sproat, R. (2016) Minimally supervised number normalization. Transactions of the Association for Computational Linguistics 4:507–519, https://www.transacl.org/ojs/index.php/tacl/article/view/897
[27]
Goudreau, M. W., Giles, C. L., Chakradhar, S. T., & Chen, D. (1994). First-order versus second-order single-layer recurrent neural networks. IEEE Transactions on Neural Networks, 5(3), 511–513.
[28]
Hewitt, J., Hahn, M., Ganguli, S., Liang, P., & Manning, C.D. (2020). RNNs can generate bounded hierarchical languages with optimal memory. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, pp 1978–2010, 10.18653/v1/2020.emnlp-main.156, https://www.aclweb.org/anthology/2020.emnlp-main.156
[29]
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
[30]
Isberner, M., Howar, F., & Steffen, B. (2014). The TTT algorithm: A redundancy-free approach to active automata learning. In: Bonakdarpour B, Smolka SA (eds) Runtime Verification - 5th International Conference, RV 2014, Toronto, ON, Canada, September 22-25, 2014. Proceedings, Springer, Lecture Notes in Computer Science, vol 8734, pp 307–322,
[31]
Jacobsson, H. (2005). Rule extraction from recurrent neural networks: A taxonomy and review. Neural Computation, 17(6), 1223–1263.
[32]
Kádár, Á., Chrupala, G., & Alishahi, A. (2016). Representation of linguistic form and function in recurrent neural networks. http://arxiv.org/abs/1602.08952
[33]
Karpathy, A., Johnson, J., & Li, F. (2015). Visualizing and understanding recurrent networks. http://arxiv.org/abs/1506.02078
[34]
Lei, T., Barzilay, R., & Jaakkola, T.S. (2016). Rationalizing neural predictions. http://arxiv.org/abs/1606.04155
[35]
Li, J., Chen, X., Hovy, E.H., & Jurafsky. D. (2015). Visualizing and understanding neural models in NLP. http://arxiv.org/abs/1506.01066
[36]
Linzen T, Dupoux E, Goldberg Y (2016) Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics 4:521–535, https://transacl.org/ojs/index.php/tacl/article/view/972
[37]
Mayr, F., Yovine, S. (2018). Regular Inference on Artificial Neural Networks. In: Holzinger A, Kieseberg P, Tjoa AM, Weippl E (eds) 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Springer International Publishing, Hamburg, Germany, Machine Learning and Knowledge Extraction, vol LNCS-11015, pp 350–369, 10.1007/978-3-319-99740-7_25, https://hal.inria.fr/hal-02060043, part 5: MAKE Explainable AI
[38]
Murdoch, W.J., & Szlam, A. (2017). Automatic rule extraction from long short term memory networks. http://arxiv.org/abs/1702.02540
[39]
Okudono, T., Waga, M., Sekiyama, T., & Hasuo, I. (2020). Weighted automata extraction from recurrent neural networks via regression on state spaces. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, AAAI Press, pp 5306–5314, https://aaai.org/ojs/index.php/AAAI/article/view/5977
[40]
Omlin, C. W., & Giles, C. L. (1996). Extraction of rules from discrete-time recurrent neural networks. Neural Networks, 9(1), 41–52.
[41]
Omlin CW, Giles CL (2000) Symbolic knowledge representation in recurrent neural networks: Insights from theoretical models of computation. In: Cloete I, Zurada JM (eds) Knowledge-based Neurocomputing, MIT Press, Cambridge, MA, USA, pp 63–116, http://dl.acm.org/citation.cfm?id=337224.337236
[42]
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp 8024–8035, http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[43]
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
[44]
Sakakibara, Y. (1992). Efficient learning of context-free grammars from positive structural examples. Information and Computation, 97(1), 23–60. https://www.sciencedirect.com/science/article/pii/089054019290003X
[45]
Shi, X., Padhi, I., & Knight, K. (2016). Does string-based neural MT learn source syntax? In: EMNLP, pp 1526–1534
[46]
Shibata, C., & Yoshinaka, R. (2016). Probabilistic learnability of context-free grammars with basic distributional properties from positive examples. Theoretical Computer Science, 620, 46–72 (Algorithmic Learning Theory). https://www.sciencedirect.com/science/article/pii/S0304397515009433
[47]
Strobelt, H., Gehrmann, S., Huber, B., Pfister, H., Rush, A.M. (2016). Visual analysis of hidden state dynamics in recurrent neural networks. http://arxiv.org/abs/1606.07461
[48]
Suzgun, M., Gehrmann, S., Belinkov, Y., & Shieber, S.M. (2019). LSTM networks can perform dynamic counting. http://arxiv.org/abs/1906.03648
[49]
Tellier, I. (2006). Learning recursive automata from positive examples. Revue d’Intelligence Artificielle, 20(6), 775–804.
[50]
Tomita, M. (1982). Dynamic construction of finite automata from examples using hill-climbing. In: Proceedings of the Fourth Annual Conference of the Cognitive Science Society, Ann Arbor, Michigan, pp 105–108
[51]
Wang, C., & Niepert, M. (2019). State-regularized recurrent neural networks. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th International Conference on Machine Learning, PMLR, Proceedings of Machine Learning Research, vol 97, pp 6596–6606, http://proceedings.mlr.press/v97/wang19j.html
[52]
Wang, Q., Zhang, K., Ororbia II, A.G., Xing, X., Liu, X., Giles, C.L. (2017). An empirical evaluation of recurrent neural network rule extraction. http://arxiv.org/abs/1709.10380
[53]
Wang, Q., Zhang, K., Ororbia II, A.G., Xing, X., Liu, X., & Giles, C.L. (2018). A comparison of rule extraction for different recurrent neural network models and grammatical complexity. http://arxiv.org/abs/1801.05420
[54]
Weiss, G., Goldberg, Y., & Yahav, E. (2018a). Extracting automata from recurrent neural networks using queries and counterexamples. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pp 5244–5253, http://proceedings.mlr.press/v80/weiss18a.html
[55]
Weiss, G., Goldberg, Y., & Yahav, E. (2018b). On the practical computational power of finite precision RNNs for language recognition. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, pp 740–745, https://www.aclweb.org/anthology/P18-2117
[56]
Weiss, G., Goldberg, Y., & Yahav, E. (2019). Learning deterministic weighted automata with queries and counterexamples. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 32
[57]
Yellin, D. M., & Weiss, G. (2021). Synthesizing context-free grammars from recurrent neural networks. In J. F. Groote & K. G. Larsen (Eds.), Tools and Algorithms for the Construction and Analysis of Systems (pp. 351–369). Cham: Springer International Publishing.
[58]
Yokomori, T. (2003). Polynomial-time identification of very simple grammars from positive data. Theoretical Computer Science, 298(1), 179–206.
[59]
Yoshinaka, R. (2019). Distributional learning of conjunctive grammars and contextual binary feature grammars. Journal of Computer and System Sciences, 104, 359–374.
[60]
Zeng, Z., Goodman, R. M., & Smyth, P. (1993). Learning finite state machines with self-clustering recurrent networks. Neural Computation, 5(6), 976–990.
[61]
Zhang, X., Du, X., Xie, X., Ma, L., Liu, Y., & Sun, M. (2021). Decision-guided weighted automata extraction from recurrent neural networks. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, AAAI Press, pp 11699–11707, https://ojs.aaai.org/index.php/AAAI/article/view/17391

Published In

Machine Learning, Volume 113, Issue 5
May 2024
1040 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 08 June 2022
Accepted: 18 December 2021
Revision received: 08 November 2021
Received: 11 April 2020

Author Tags

  1. Recurrent neural networks
  2. Automata
  3. Deterministic finite automata
  4. Exact learning
  5. Extraction

Qualifiers

  • Research-article
