Learning the Morphological and Syntactic Grammars for Named Entity Recognition
Figure 1. Sequential pattern.
Figure 2. Dependency parsing.
Figure 3. The structure of the NorG network. (a) The NorG embedding: word segment, lemma, and POS tag. (b) The NorG network: a bi-LSTM encodes the sentence, a bi-GAT incorporates the dependency grammar, and a CRF layer performs entity recognition.
Figure 4. Performance against sentence length.
Figure 5. Training step on the validation dataset.
Figure 6. Comparison of the NorG network using gold-standard and automatically obtained grammars.
Abstract
1. Introduction
- (1) In addition to embeddings derived from the text content, we propose using embeddings derived from different grammars for NER.
- (2) We propose the NorG network, which integrates text content, morphology, and syntax. We found that a bidirectional LSTM captures morphological knowledge well and that a bidirectional graph attention network (GAT) captures syntactic dependency knowledge well.
- (3) Experimental results demonstrate the effectiveness of the proposed method in four languages, and exploratory experiments reveal how the different grammar components influence NER performance.
2. Related Works
3. Materials and Methods
3.1. NorG Embedding
- Word Embedding: To capture contextual knowledge, we use word-level segments and embeddings pretrained on the corresponding monolingual corpora.
- Lemma Stamp: A lemma is the base form of a word. Because these languages are morphologically rich, the relationship between a word and its lemma can help the model locate named entities. For example, a word that frequently appears in inflected forms in a corpus is likely to be a verb, an adjective, or a noun, and is rarely an entity. For this reason, a binary mark is appended to the pretrained embedding: it is set to 1 if the lemma and the word segment are identical and to 0 if they differ.
- POS Tag Embedding: Part-of-speech (POS) tags can disambiguate words and enrich their semantic representation. The Universal POS tag set contains 17 classes, such as NOUN (noun) and ADV (adverb), whereas the Penn Treebank tag set contains 36 classes, such as NN (noun, singular) and NNS (noun, plural). In our experiments, Universal POS tags had a positive impact on named entity recognition, but the tag set has very few classes. We therefore re-encoded the POS tags through a dense layer, which allows a coarse tag such as NOUN to be subdivided into more fine-grained dimensions. These learned dimensions have no predefined linguistic meaning because they are produced by the neural network. Concretely, a dense layer maps the one-hot Universal POS tag vector to 300 dimensions, so the POS tag contributes 300 extra features alongside the pretrained word embedding (see Figure 3a).
- Capitalization: Uppercasing or lowercasing text changes the NER results on named entities [29]. After examining a large number of Nordic sentences, we found that many entities are written with capital letters (an initial capital, an all-caps word, or an abbreviation). For example, a person name with an initial capital is easier to recognize. Therefore, we train the NER model with capitalization preserved, even though lowercasing would reduce the vocabulary size and the complexity of the neural network.
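The features described in the list above can be combined per token roughly as in the following minimal sketch, written in plain Python so that it runs without extra libraries. It is an illustration under stated assumptions, not the authors' implementation: the lemma stamp uses an exact, case-sensitive string comparison; the POS projection is a randomly initialized linear dense layer (no activation) standing in for the trained one; the pretrained vectors are a hypothetical `word_vectors` dictionary; and `casing_flag` is a hypothetical one-bit stand-in for the capitalization information, since the text only states that case is preserved.

```python
import random

# The 17 Universal POS tags.
UPOS_TAGS = ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
             "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"]
WORD_DIM = 300  # assumed size of the pretrained word vectors
POS_DIM = 300   # the one-hot POS vector is mapped to 300 dimensions


def lemma_stamp(word, lemma):
    """Lemma stamp: 1.0 if the lemma equals the word segment, else 0.0 (exact match assumed)."""
    return 1.0 if word == lemma else 0.0


def casing_flag(word):
    """Hypothetical capitalization bit: 1.0 if the token starts with an uppercase letter."""
    return 1.0 if word[:1].isupper() else 0.0


class PosProjection:
    """Linear dense layer from a one-hot UPOS vector to POS_DIM features (random stand-in weights)."""

    def __init__(self, out_dim=POS_DIM, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(out_dim)] for _ in UPOS_TAGS]
        self.b = [0.0] * out_dim

    def __call__(self, tag):
        if tag not in UPOS_TAGS:
            raise ValueError(f"unknown UPOS tag: {tag}")
        one_hot = [1.0 if t == tag else 0.0 for t in UPOS_TAGS]
        # y_j = sum_i x_i * W[i][j] + b_j
        return [sum(x * row[j] for x, row in zip(one_hot, self.W)) + self.b[j]
                for j in range(len(self.b))]


def token_features(word, lemma, tag, word_vectors, pos_proj):
    """Concatenate word vector, projected POS tag, lemma stamp, and casing bit."""
    # Case is preserved in the lookup; unknown words fall back to zeros (an assumption).
    w = word_vectors.get(word, [0.0] * WORD_DIM)
    return w + pos_proj(tag) + [lemma_stamp(word, lemma), casing_flag(word)]


# Usage with three rows of the example sentence "Hvordan skal det gå med EU?".
rows = [("Hvordan", "hvordan", "ADV"), ("skal", "skulle", "AUX"), ("EU", "EU", "PROPN")]
proj = PosProjection()
features = [token_features(w, l, t, {}, proj) for w, l, t in rows]
print([len(f) for f in features])  # [602, 602, 602]
print([f[-2:] for f in features])  # [[0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
```

Because the input is one-hot, the dense layer effectively selects one row of `W` (plus the bias), which is why such a projection is often implemented as an embedding lookup in practice.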
3.2. NorG Network
3.2.1. Bi-LSTM Layer
3.2.2. Bi-GAT Layer
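As a hedged illustration of how a graph attention network [23] can consume a dependency tree, the sketch below applies a single-head GAT layer over the tree in two directions and concatenates the results. Splitting the directions into "heads attend to their dependents" and "dependents attend to their heads" is an assumption about what "bidirectional" means here, suggested by the "Use Unidirectional Dependency" ablation. The weights are random stand-ins, the input vectors stand in for bi-LSTM states, and `heads` is the dependency column of the example sentence "Hvordan skal det gå med EU?"; all function names are hypothetical.

```python
import math
import random


def heads_to_edges(heads):
    """heads[k] is the 1-based head of token k+1 (0 = root); returns 0-based (head, dependent) pairs."""
    return [(h - 1, d) for d, h in enumerate(heads) if h > 0]


def adjacency(n, edges, reverse=False):
    """A[i][j] = 1 means node i aggregates information from node j."""
    A = [[0] * n for _ in range(n)]
    for h, d in edges:
        if reverse:
            A[d][h] = 1  # dependent attends to its head
        else:
            A[h][d] = 1  # head attends to its dependents
    return A


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(H, A, W, a, slope=0.2):
    """Single-head GAT: alpha_ij = softmax_j(LeakyReLU(a . [W h_i || W h_j])) over N(i) plus i."""
    d = len(W)
    Wh = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in H]
    out = []
    for i in range(len(H)):
        nbrs = [j for j in range(len(H)) if A[i][j] or j == i]  # self-loop always kept
        e = []
        for j in nbrs:
            s = sum(a[k] * Wh[i][k] + a[d + k] * Wh[j][k] for k in range(d))
            e.append(s if s > 0 else slope * s)  # LeakyReLU
        m = max(e)
        exp_e = [math.exp(s - m) for s in e]  # numerically stable softmax
        z = sum(exp_e)
        alpha = [v / z for v in exp_e]
        out.append([elu(sum(al * Wh[j][k] for al, j in zip(alpha, nbrs))) for k in range(d)])
    return out


def bi_gat(H, heads, params_fwd, params_bwd):
    """Run one GAT pass per edge direction and concatenate the two outputs per token."""
    edges = heads_to_edges(heads)
    fwd = gat_layer(H, adjacency(len(H), edges), *params_fwd)
    bwd = gat_layer(H, adjacency(len(H), edges, reverse=True), *params_bwd)
    return [f + b for f, b in zip(fwd, bwd)]


rng = random.Random(0)


def random_params(d_in, d_out):
    W = [[rng.gauss(0.0, 0.5) for _ in range(d_in)] for _ in range(d_out)]
    a = [rng.gauss(0.0, 0.5) for _ in range(2 * d_out)]
    return W, a


heads = [4, 4, 4, 0, 6, 4, 4]  # "Hvordan skal det gå med EU ?"
H = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in heads]  # stand-in bi-LSTM states
out = bi_gat(H, heads, random_params(8, 4), random_params(8, 4))
print(len(out), len(out[0]))  # 7 8
```

Keeping a self-loop for every node means a token with no neighbours in one direction (for example, a leaf in the head-to-dependent pass) still keeps its own transformed features.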
3.2.3. CRF Layer
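A linear-chain CRF [22] scores a whole tag sequence as the sum of per-token emission scores and tag-to-tag transition scores, and the highest-scoring sequence is typically decoded with the Viterbi algorithm. The generic sketch below (not the authors' code) shows that decoding step. The emission, transition, start, and end scores are hand-picked, hypothetical numbers, with a large negative score on the O → I-GPE_ORG transition and on starting with I-GPE_ORG, reflecting the IOB2 rule that an I- tag must continue an entity.

```python
from itertools import product


def viterbi(emissions, transitions, start, end):
    """Return the best tag-index path and its score for a linear-chain CRF.

    emissions[t][y]: score of tag y at position t
    transitions[p][y]: score of moving from tag p to tag y
    start[y] / end[y]: scores for starting / ending the sequence with tag y
    """
    n_tags = len(start)
    score = [start[y] + emissions[0][y] for y in range(n_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for y in range(n_tags):
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            ptr.append(best_prev)
        score = new_score
        backpointers.append(ptr)
    final = [score[y] + end[y] for y in range(n_tags)]
    last = max(range(n_tags), key=lambda y: final[y])
    path = [last]
    for ptr in reversed(backpointers):  # follow the back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, final[last]


def sequence_score(path, emissions, transitions, start, end):
    """Unnormalized CRF score of one tag sequence."""
    s = start[path[0]] + emissions[0][path[0]] + end[path[-1]]
    for t in range(1, len(path)):
        s += transitions[path[t - 1]][path[t]] + emissions[t][path[t]]
    return s


TAGS = ["O", "B-GPE_ORG", "I-GPE_ORG"]
NEG = -10.0
emissions = [[2.0, 0.0, 0.0],   # token 1
             [0.5, 1.0, 1.2]]   # token 2: I-GPE_ORG has the highest emission
transitions = [[0.0, 0.0, NEG],  # O -> I-GPE_ORG is penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG]          # a sequence should not start with I-
end = [0.0, 0.0, 0.0]
path, best = viterbi(emissions, transitions, start, end)
print([TAGS[i] for i in path], best)  # ['O', 'B-GPE_ORG'] 3.0
# Brute-force check over all 3**2 tag sequences.
assert best == max(sequence_score(p, emissions, transitions, start, end)
                   for p in product(range(3), repeat=2))
```

Per-token argmax would pick the invalid sequence O, I-GPE_ORG here; the transition scores let the CRF prefer O, B-GPE_ORG instead.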
4. Experiments
4.1. NER Datasets
4.1.1. Norwegian Bokmål and Nynorsk
4.1.2. Danish
4.1.3. Finnish
4.2. Baselines
4.3. Hyperparameters of the NorG Network
5. Results
5.1. Main Results
5.2. Ablation Experiments
5.3. Performance against Sentence Length
5.4. Training Step
5.5. Performance on Automatically Obtained Grammars
6. Conclusions
- (1) The NorG network achieves the highest F1 scores among the compared methods on the two Norwegian datasets and the Danish dataset, and comes close to the best baseline (FinnishBERT) on Finnish.
- (2) The NorG network captures grammatical features from each component: removing the lemma, POS tag, or capitalization features lowers F1 in all four languages.
- (3) The NorG network is robust: its performance is only slightly affected by sentence length.
- (4) The NorG network already extracts some entities early in training and trains stably.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pollák, F.; Dorčák, P.; Markovič, P. Corporate Reputation of Family-Owned Businesses: Parent Companies vs. Their Brands. Information 2021, 12, 89.
- Li, J.; Sun, A.; Han, J.; Li, C. A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 2020, 34, 50–70.
- Gu, K.; Vosoughi, S.; Prioleau, T. SymptomID: A framework for rapid symptom identification in pandemics using news reports. ACM Trans. Manag. Inf. Syst. (TMIS) 2021, 12, 1–17.
- Lee, D.; Oh, B.; Seo, S.; Lee, K.H. News recommendation with topic-enriched knowledge graphs. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Conference, 19–23 October 2020; pp. 695–704.
- Tanvir, H.M.; Kittask, C.; Sirts, K. EstBERT: A Pretrained Language-Specific BERT for Estonian. In Proceedings of the 23rd Nordic Conference on Computational Linguistics, Reykjavik, Iceland, 31 May–2 June 2021.
- Kutuzov, A.; Barnes, J.; Velldal, E.; Øvrelid, L.; Oepen, S. Large-Scale Contextualised Language Modelling for Norwegian. In Proceedings of the 23rd Nordic Conference on Computational Linguistics, Reykjavik, Iceland, 31 May–2 June 2021.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
- Shen, Y.; Tan, S.; Sordoni, A.; Courville, A. Ordered neurons: Integrating tree structures into recurrent neural networks. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 3–5 June 2019; Volume 1, pp. 4171–4186.
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2009.
- Jørgensen, F.; Aasmoe, T.; Husevåg, A.S.R.; Øvrelid, L.; Velldal, E. NorNE: Annotating named entities for Norwegian. In Proceedings of the 12th Language Resources and Evaluation Conference, Le Palais du Pharo, France, 11–16 May 2020; pp. 4547–4556.
- Hvingelby, R.; Pauli, A.B.; Barrett, M.; Rosted, C.; Lidegaard, L.M.; Søgaard, A. DaNE: A named entity resource for Danish. In Proceedings of the 12th Language Resources and Evaluation Conference, Le Palais du Pharo, France, 11–16 May 2020; pp. 4597–4604.
- Derczynski, L. Simple natural language processing tools for Danish. arXiv 2019, arXiv:1906.11608.
- Bick, E. A named entity recognizer for Danish. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004.
- Johannessen, J.B.; Hagen, K.; Haaland, Å.; Jónsdottir, A.B.; Nøklestad, A.; Kokkinakis, D.; Meurer, P.; Bick, E.; Haltrup, D. Named entity recognition for the mainland Scandinavian languages. Lit. Linguist. Comput. 2005, 20, 91–102.
- Luoma, J.; Oinonen, M.; Pyykönen, M.; Laippala, V.; Pyysalo, S. A broad-coverage corpus for Finnish named entity recognition. In Proceedings of the 12th Language Resources and Evaluation Conference, Le Palais du Pharo, France, 11–16 May 2020; pp. 4615–4624.
- Kettunen, K.; Löfberg, L. Tagging named entities in 19th century and modern Finnish newspaper material with a Finnish semantic tagger. In Proceedings of the 21st Nordic Conference on Computational Linguistics, Gothenburg, Sweden, 22–24 May 2017; pp. 29–36.
- Ruokolainen, T.; Kauppinen, P.; Silfverberg, M.; Lindén, K. A Finnish news corpus for named entity recognition. Lang. Resour. Eval. 2020, 54, 247–272.
- Akbik, A.; Bergmann, T.; Blythe, D.; Rasul, K.; Schweter, S.; Vollgraf, R. FLAIR: An easy-to-use framework for state-of-the-art NLP. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA, 3–5 June 2019; pp. 54–59.
- Akbik, A.; Blythe, D.; Vollgraf, R. Contextual string embeddings for sequence labeling. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 1638–1649.
- Akbik, A.; Bergmann, T.; Vollgraf, R. Pooled contextualized embeddings for named entity recognition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 3–5 June 2019; Volume 1, pp. 724–728.
- Lafferty, J.; McCallum, A.; Pereira, F.C. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, San Francisco, CA, USA, 28 June–1 July 2001; pp. 282–289.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Zhang, Y.; Qi, P.; Manning, C.D. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Zhang, J.; Shi, X.; Xie, J.; Ma, H.; King, I.; Yeung, D.Y. GaAN: Gated attention networks for learning on large and spatiotemporal graphs. In Proceedings of the Association for Uncertainty in Artificial Intelligence, Monterey, CA, USA, 7–9 August 2018; pp. 339–349.
- Lee, J.B.; Rossi, R.; Kong, X. Graph classification using structural attention. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 19–23 August 2018; pp. 1666–1674.
- Virtanen, A.; Kanerva, J.; Ilo, R.; Luoma, J.; Luotolahti, J.; Salakoski, T.; Ginter, F.; Pyysalo, S. Multilingual is not enough: BERT for Finnish. arXiv 2019, arXiv:1912.07076.
- Akhtyamova, L.; Martínez, P.; Verspoor, K.; Cardiff, J. Testing contextualized word embeddings to improve NER in Spanish clinical case narratives. IEEE Access 2020, 8, 164717–164726.
- Yan, H.; Jin, X.; Meng, X.; Guo, J.; Cheng, X. Event detection with multi-order graph convolution and aggregated attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 5766–5770.
- Solberg, P.E.; Skjærholt, A.; Øvrelid, L.; Hagen, K.; Johannessen, J.B. The Norwegian Dependency Treebank. In Proceedings of the Ninth International Conference on Language Resources and Evaluation, Reykjavik, Iceland, 26–31 May 2014.
- Øvrelid, L.; Hohle, P. Universal Dependencies for Norwegian. In Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23–28 May 2016.
- Velldal, E.; Øvrelid, L.; Hohle, P. Joint UD parsing of Norwegian Bokmål and Nynorsk. In Proceedings of the 21st Nordic Conference of Computational Linguistics, Gothenburg, Sweden, 22–24 May 2017; pp. 1–10.
- Buch-Kromann, M. The Danish dependency treebank and the DTAG treebank tool. In Proceedings of the 2nd Workshop on Treebanks and Linguistic Theories (TLT), Växjö, Sweden, 14–15 November 2003; pp. 217–220.
- Johannsen, A.; Alonso, H.M.; Plank, B. Universal dependencies for Danish. In Proceedings of the International Workshop on Treebanks and Linguistic Theories (TLT14), Warsaw, Poland, 11–12 December 2015; p. 157.
- Keson, B. Vejledning til det Danske Morfosyntaktisk Taggede PAROLE-Korpus; Technical Report; Det Danske Sprog og Litteraturselskab (DSL): Copenhagen, Denmark, 2000.
- Nivre, J.; De Marneffe, M.C.; Ginter, F.; Goldberg, Y.; Hajic, J.; Manning, C.D.; McDonald, R.; Petrov, S.; Pyysalo, S.; Silveira, N.; et al. Universal dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016; pp. 1659–1666.
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146.
- Yang, J.; Zhang, Y. NCRF++: An Open-source Neural Sequence Labeling Toolkit. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 74–79.
- Wolf, T.; Chaumond, J.; Debut, L.; Sanh, V.; Delangue, C.; Moi, A.; Cistac, P.; Funtowicz, M.; Davison, J.; Shleifer, S.; et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Virtual, 16–20 November 2020; pp. 38–45.
- Okazaki, N. CRFsuite: A Fast Implementation of Conditional Random Fields. Software Package. 2007. Available online: http://www.chokkan.org/software/crfsuite (accessed on 20 May 2021).
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 2017 International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
- Huang, W.; Zhang, T.; Rong, Y.; Huang, J. Adaptive sampling towards fast graph representation learning. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montréal, QC, Canada, 2–8 December 2018.
- Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Processing Syst. 2016, 29, 3844–3852.
- Gui, T.; Zou, Y.; Zhang, Q.; Peng, M.; Fu, J.; Wei, Z.; Huang, X.J. A lexicon-based graph neural network for Chinese NER. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 1040–1050.
- Honnibal, M.; Montani, I.; Van Landeghem, S.; Boyd, A. spaCy: Industrial-Strength Natural Language Processing in Python. Software Package. 2020. Available online: https://spacy.io (accessed on 20 May 2021).
English | Norwegian Bokmål |
---|---|
Chief Justice Carsten Smith had ex-Queen Anne-Marie of Greece as his dinner partner. | Høyesterettsjustitiarius Carsten Smith hadde eks-dronning Anne-Marie av Hellas som sin borddame. |
Pürische Nacht is obviously modeled on the Crystal Night. | Pürische Nacht er åpenbart tegnet etter mønster av Krystallnatten. |
We picked up this threat and decided to evacuate the school, police inspector Heidi L. Arneberg at Fredrikstad police station tells Aftenposten.no. | Vi fanget opp denne trusselen, og besluttet å evakuere skolen, sier politiinspektør Heidi L. Arneberg ved Fredrikstad politistasjon til Aftenposten.no. |
Entities | Abbreviation | Explanation |
---|---|---|
Person | PER | Real or fictional characters and animals |
Organization | ORG | Any collection of people, such as firms, institutions, and organizations. |
Location | LOC | Places, buildings, facilities, etc. |
Geo-political entity | GPE | Geographical regions defined by political and/or social groups |
GPE (locative) | GPE_LOC | Geo-political entity with a locative sense |
GPE (organization) | GPE_ORG | Geo-political entity with an organization sense |
Product | PROD | Artificially produced entities are regarded as products |
Event | EVT | Festivals, cultural events, sports events, weather phenomena, wars, etc. |
Derived | DRV | Words that are derived from a name but are not names in themselves |
Miscellaneous | MISC | Other named entities |
IOB2 | Explanation |
---|---|
I-Entity | This token is inside the entity but is not its first token |
O | This token is outside the entity |
B-Entity | This token is the first token of the entity |
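To turn a sequence of IOB2 labels back into entities, consecutive tokens can be grouped as in the minimal sketch below. It is a generic decoder rather than the authors' evaluation code, and the function name `iob2_to_spans` is hypothetical. Treating an I- tag that does not continue an open entity of the same type as the start of a new entity is an assumed, lenient design choice.

```python
def iob2_to_spans(tags):
    """Convert IOB2 tags (e.g. 'B-PER', 'I-PER', 'O') into (type, start, end) spans, end exclusive."""
    spans = []
    etype, start = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:          # close the entity that was open
                spans.append((etype, start, i))
            etype, start = tag[2:], i      # a stray I- also opens a new entity
        elif tag == "O":
            if etype is not None:
                spans.append((etype, start, i))
            etype, start = None, None
        # Otherwise the tag is I- of the open type and the entity continues.
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


# NER labels of "Hvordan skal det gå med EU ?" from the example sentence table.
labels = ["O", "O", "O", "O", "O", "B-GPE_ORG", "O"]
print(iob2_to_spans(labels))                             # [('GPE_ORG', 5, 6)]
print(iob2_to_spans(["B-PER", "I-PER", "O", "B-LOC"]))   # [('PER', 0, 2), ('LOC', 3, 4)]
```

The second call uses a hypothetical label sequence to show a multi-token entity followed by an entity at the end of the sentence.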
Example sentence: Hvordan skal det gå med EU? (How will it go with the EU?)
Word_id | Word Segment | Lemma | POS Tag | Dependency (Head ID) | NER Label |
---|---|---|---|---|---|
1 | Hvordan (How) | hvordan | ADV | 4 | name = O |
2 | skal (will) | skulle | AUX | 4 | name = O |
3 | det (it) | det | PRON | 4 | name = O |
4 | gå (go) | gå | VERB | 0 (root) | name = O |
5 | med (with) | med | ADP | 6 | name = O |
6 | EU (EU) | EU | PROPN | 4 | name = B-GPE_ORG |
7 | ? (?) | $? | PUNCT | 4 | name = O |
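Annotations like those in the table are commonly distributed in the CoNLL-U format, with one token per line and ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below reads such data into the fields the NorG network uses: word segment, lemma, POS tag, dependency head, and NER label. It assumes, as the "name = ..." entries in the table suggest, that the NER label is stored in the MISC column as `name=...`; the function name is hypothetical, and the relation column is left as `_` in the sample because the table does not show it.

```python
def parse_conllu(text):
    """Parse CoNLL-U text into sentences (lists of token dicts)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if tokens:
                sentences.append(tokens)
                tokens = []
            continue
        if line.startswith("#"):          # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                      # skip multiword-token ranges and empty nodes
        misc = dict(kv.split("=", 1) for kv in cols[9].split("|") if "=" in kv)
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2], "upos": cols[3],
            "head": int(cols[6]),          # 0 marks the root
            "ner": misc.get("name", "O"),  # assumed location of the NER label
        })
    if tokens:
        sentences.append(tokens)
    return sentences


rows = [
    ("1", "Hvordan", "hvordan", "ADV", "4", "O"),
    ("2", "skal", "skulle", "AUX", "4", "O"),
    ("3", "det", "det", "PRON", "4", "O"),
    ("4", "gå", "gå", "VERB", "0", "O"),
    ("5", "med", "med", "ADP", "6", "O"),
    ("6", "EU", "EU", "PROPN", "4", "B-GPE_ORG"),
    ("7", "?", "$?", "PUNCT", "4", "O"),
]
sample = "# text = Hvordan skal det gå med EU?\n" + "\n".join(
    "\t".join([i, form, lemma, upos, "_", "_", head, "_", "_", "name=" + ner])
    for i, form, lemma, upos, head, ner in rows
)
sentence = parse_conllu(sample)[0]
print([t["head"] for t in sentence])                     # [4, 4, 4, 0, 6, 4, 4]
print([t["form"] for t in sentence if t["ner"] != "O"])  # ['EU']
```

The head column parsed this way is exactly what a graph layer needs to build the dependency edges of a sentence.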
Inputs | Outputs |
---|---|
Word Segment | NER label |
POS Tag | |
Lemma | |
Upper/Lower Case | |
Dependency | |
Dataset | Train (sentences) | Val (sentences) | Test (sentences) |
---|---|---|---|
NorNE (Bokmål) | 15,696 | 2410 | 1939 |
NorNE (Nynorsk) | 14,174 | 1890 | 1511 |
DaNE (Danish) | 4383 | 564 | 565 |
Turku NER (Finnish) | 12,217 | 1364 | 1555 |
Baseline | P | R | F1 |
---|---|---|---|
NCRF++ (2018) [39] | - | - | 89.47 |
BM + NN (2020) [11] | - | - | 90.92 |
NorBERT (2021) [6] | - | - | 85.50 |
cbow300 + CNN + crf | 95.84 | 68.37 | 79.22 |
ie + CNN + crf | 80.24 | 71.41 | 75.05 |
cbow300 + biLSTM + crf | 96.27 | 90.20 | 92.97 |
ie + biLSTM + crf | 83.74 | 74.20 | 78.19 |
cbow300 + biGRU + crf | 96.28 | 89.42 | 92.55 |
ie + biGRU + crf | 87.07 | 73.77 | 79.39 |
NorG network | 98.10 | 94.76 | 96.28 |
Baseline | P | R | F1 |
---|---|---|---|
NCRF++ (2018) [39] | - | - | 86.53 |
BM + NN (2020) [11] | - | - | 88.03 |
NorBERT (2021) [6] | - | - | 82.80 |
cbow300 + CNN + crf | 95.24 | 44.94 | 59.74 |
ie + CNN + crf | 83.49 | 67.90 | 73.96 |
cbow300 + biLSTM + crf | 94.49 | 76.39 | 83.99 |
ie + biLSTM + crf | 83.00 | 69.36 | 74.56 |
cbow300 + biGRU + crf | 93.70 | 72.71 | 81.32 |
ie + biGRU + crf | 90.21 | 63.58 | 73.72 |
NorG network | 98.72 | 88.22 | 92.92 |
Baseline | P | R | F1 |
---|---|---|---|
FLAIR (2019) [19] | - | - | 79.70 |
FLAIR + BPE (2020) [12] | - | - | 78.05 |
DanishBERT (2020) [40] | - | - | 83.76 |
cbow300 + CNN + crf | 96.06 | 56.61 | 70.82 |
ie + CNN + crf | 70.93 | 68.96 | 69.38 |
cbow300 + biLSTM + crf | 95.22 | 78.60 | 85.97 |
ie + biLSTM + crf | 84.58 | 65.48 | 73.56 |
cbow300 + biGRU + crf | 94.87 | 77.79 | 85.19 |
ie + biGRU + crf | 76.19 | 67.64 | 71.53 |
NorG network | 98.07 | 80.80 | 88.33 |
Baseline | P | R | F1 |
---|---|---|---|
CRFsuite (2007) [41] | 74.53 | 63.18 | 68.39 |
NCRF++ (2018) [39] | 82.92 | 80.20 | 81.54 |
FiNER tagger (2017) [17] | 77.16 | 71.24 | 74.08 |
FinnishBERT (2019) [28] | 90.87 | 92.44 | 91.65 |
cbow300 + CNN + crf | 96.93 | 51.20 | 66.23 |
ie + CNN + crf | 77.00 | 54.43 | 63.09 |
cbow300 + biLSTM + crf | 90.61 | 78.51 | 83.78 |
ie + biLSTM + crf | 82.98 | 52.45 | 63.27 |
cbow300 + biGRU + crf | 91.18 | 77.45 | 83.49 |
ie + biGRU + crf | 86.27 | 52.90 | 64.57 |
NorG network | 96.06 | 87.25 | 91.24 |
Model | Bokmål | Nynorsk | Danish | Finnish |
---|---|---|---|---|
NorG | 96.48 | 93.35 | 92.01 | 89.96 |
Use Initialized Embedding | 94.00 | 92.03 | 89.40 | 87.72 |
Remove POS Tag | 75.73 | 61.29 | 74.43 | 71.37 |
Remove Capitalization | 94.04 | 88.42 | 85.27 | 89.11 |
Remove Lemma | 94.38 | 90.28 | 85.65 | 88.51 |
Use Unidirectional Dependency | 95.06 | 93.29 | 85.18 | 89.11 |
Use a Bi-GCN layer for Dependency | 49.90 | 47.63 | 38.68 | 41.32 |
Use a Bi-GCS layer for Dependency | 95.84 | 92.87 | 85.12 | 91.05 |
Use a Bi-Cheb layer for Dependency | 94.23 | 91.94 | 85.58 | 89.25 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, M.; Yang, Q.; Wang, H.; Pasquine, M.; Hameed, I.A. Learning the Morphological and Syntactic Grammars for Named Entity Recognition. Information 2022, 13, 49. https://doi.org/10.3390/info13020049