Abstract
The work presented in this paper investigates the ability of a BERT neural language model pretrained on Italian to embed syntactic dependency relationships in its layers by approximating a dependency parse tree. To this end, a structural probe, i.e. a supervised model able to extract linguistic structures from a language model, has been trained on the contextual embeddings produced by the layers of BERT. An experimental assessment has been performed using an Italian BERT-base model and a set of Italian datasets annotated with the Universal Dependencies formalism. The results, obtained using standard dependency-parsing metrics, show that knowledge of Italian syntax is embedded in the central-upper layers of the BERT model, in line with what has been observed in the literature for English. In addition, the probe has also been used to experimentally evaluate the behaviour of the BERT model on two specific syntactic phenomena of Italian, namely the null subject and subject-verb agreement, where it outperforms an Italian state-of-the-art parser. These findings open a path towards new hybrid approaches that exploit the probe to mitigate the limitations of existing parsers in analysing articulated constructions of Italian syntax, which are traditionally hard to parse.
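To make the probing setup concrete, the sketch below illustrates a Hewitt-and-Manning-style structural distance probe over contextual embeddings, of the kind the abstract describes. It is a minimal illustration, not the paper's implementation; names such as `hidden_dim`, `probe_rank`, and `probe_loss` are assumptions chosen for the example.

```python
# Minimal sketch of a structural distance probe (PyTorch), assuming
# embeddings come from a single layer of an Italian BERT-base model.
import torch
import torch.nn as nn


class StructuralDistanceProbe(nn.Module):
    """Learns a linear map B so that squared L2 distances between projected
    word vectors approximate pairwise distances in the dependency tree."""

    def __init__(self, hidden_dim: int = 768, probe_rank: int = 128):
        super().__init__()
        # B: (hidden_dim, probe_rank), initialised with small random values
        self.proj = nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.05)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, hidden_dim) from one BERT layer
        transformed = embeddings @ self.proj                  # (batch, seq_len, rank)
        diffs = transformed.unsqueeze(2) - transformed.unsqueeze(1)
        return (diffs ** 2).sum(-1)                           # predicted squared distances


def probe_loss(pred_dist: torch.Tensor, gold_dist: torch.Tensor, lengths) -> torch.Tensor:
    """L1 loss between predicted and gold tree distances, normalised by the
    squared sentence length, following the structural-probe formulation."""
    loss = 0.0
    for b, n in enumerate(lengths):
        loss = loss + (pred_dist[b, :n, :n] - gold_dist[b, :n, :n]).abs().sum() / (n ** 2)
    return loss / len(lengths)
```

In such a setup, gold tree distances would be computed from the Universal Dependencies annotations, and a minimum spanning tree built over the predicted distances would yield an undirected parse that can be scored with standard metrics such as UUAS.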
Cite this article
Guarasci, R., Silvestri, S., De Pietro, G. et al. Assessing BERT’s ability to learn Italian syntax: a study on null-subject and agreement phenomena. J Ambient Intell Human Comput 14, 289–303 (2023). https://doi.org/10.1007/s12652-021-03297-4