Multilingual Fine-Grained Entity Typing

Marieke van Erp & Piek Vossen

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Included in the following conference series:

International Conference on Language, Data and Knowledge

Abstract

Many entity recognition approaches classify recognised entities into a limited set of coarse-grained entity types. However, for deeper natural language analysis and end-user tasks, fine-grained entity types are more useful. For example, while standard named entity recognition may determine that an entity is a person, knowing whether that entity is a politician or an actor is important for determining whether, in a subsequent relation extraction task, a relation should be acts or governs. Currently, fine-grained entity typing has only been investigated for English. In this paper, we present a fine-grained entity typing system for Dutch and Spanish using training data extracted from Wikipedia and DBpedia. Our system achieves performance comparable to English, with an F$_{1}$ measure of .90 on over 40 types for both Dutch and Spanish.

Notes

1. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.
2. https://wordnet.princeton.edu/.
3. https://dumps.wikimedia.org/backup-index.html.
4. https://github.com/attardi/wikiextractor.
5. Using the wikilinks and instance types dumps from the latest DBpedia, version 2016-04: http://wiki.dbpedia.org/downloads-2016-04.
6. The types we could not map were the following: location/structure/government, organization/stock_exchange, other/health, other/living_thing, other/product/car, other/product/computer, person/education, person/education/student, person/education/teacher.
7. Although there is more text in the Spanish DBpedia, we only included a sample here to showcase the adaptability of the approach to other languages.
8. https://github.com/facebookresearch/fastText.
9. If an entity X has the types location/structure and organisation/education assigned to it, two instances are generated, namely (X, location/structure) and (X, organisation/education).
10. The numbers of types at levels 1–3 do not add up to the total number of types, as some of the higher-level types, such as other, are not present on their own.
11. dbpedia: is shorthand for http://dbpedia.org/resource.
12. dbo: is shorthand for http://dbpedia.org/ontology/.
13. http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/.
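
The instance generation described in the note on multi-typed entities (an entity with several types yields one instance per type) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name expand_instances and the dictionary input format are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def expand_instances(entity_types: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Emit one (entity, type) training instance per assigned type.

    The input format (entity name -> list of type paths) is an assumption
    made for this sketch; the paper does not specify its data structures.
    """
    instances: List[Tuple[str, str]] = []
    for entity, types in entity_types.items():
        for fine_type in types:
            # Each type assigned to the entity becomes its own instance.
            instances.append((entity, fine_type))
    return instances


# The example from the note: entity X with two assigned types.
example = {"X": ["location/structure", "organisation/education"]}
for entity, fine_type in expand_instances(example):
    print(entity, fine_type)
```

Under this reading, an entity with k types contributes k single-label instances rather than one multi-label instance.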

References

Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Technical report, arXiv (2016). https://arxiv.org/abs/1607.04606
Corro, L.D., Abujabal, A., Gemulla, R., Weikum, G.: FINET: context-aware fine-grained named entity typing. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015, pp. 868–878 (2015)
Ekbal, A., Sourjikova, E., Frank, A., Ponzetto, S.P.: Assessing the challenge of fine-grained named entity recognition and classification. In: Proceedings of the 2010 Named Entities Workshop at ACL 2010, Uppsala, Sweden, July 2010, pp. 93–101 (2010)
Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
Giuliano, C.: Fine-grained classification of named entities exploiting latent semantic kernels. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, CoNLL, Boulder, Colorado, USA, pp. 201–209 (2009)
Hovy, D.: How well can we learn interpretable entity types from text? In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short papers), Baltimore, Maryland, USA, 23–25 June 2014, pp. 482–487. Association for Computational Linguistics (2014)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. Technical report, arXiv (2016). https://arxiv.org/abs/1607.01759
Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
Linguistic Data Consortium: ACE (automatic content extraction) English annotation guidelines for entities. Technical report, Linguistic Data Consortium, version 5.6.6 2006.08.01 (2006)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007)
Nadeau, D., Turney, P.D., Matwin, S.: Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity. In: Lamontagne, L., Marchand, M. (eds.) AI 2006. LNCS, vol. 4013, pp. 266–277. Springer, Heidelberg (2006). doi:10.1007/11766247_23
Nakashole, N., Tylenda, T., Weikum, G.: Fine-grained semantic typing of emerging entities. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013, pp. 1488–1497. Association for Computational Linguistics (2013)
Nothman, J., Curran, J., Murphy, T.: Transforming Wikipedia into named entity training data. In: Proceedings of the Australasian Language Technology Association Workshop, pp. 124–132 (2008)
Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. In: Proceedings of NAACL (2013)
Ren, X., He, W., Qu, M., Huang, L., Ji, H., Han, J.: AFET: automatic fine-grained entity typing by hierarchical partial-label embedding. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, TX, USA, 1–5 November 2016
Sekine, S., Sudo, K., Nobata, C.: Extended named entity hierarchy. In: LREC (2002)
Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, vol. 4, pp. 142–147. Association for Computational Linguistics (2003)
Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L., Xue, N.: OntoNotes: a large training corpus for enhanced processing. In: Olive, J., Christianson, C., McCary, J. (eds.) Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pp. 54–63. Springer, New York (2011)
Yaghoobzadeh, Y., Schütze, H.: Corpus-level fine-grained entity typing using contextual information. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015, pp. 715–725. Association for Computational Linguistics (2015)
Yaghoobzadeh, Y., Schütze, H.: Multi-level representations for fine-grained typing of knowledge base entities. In: Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 3–7 April 2017. https://arxiv.org/abs/1701.02025 (2017, to appear)
Yao, L., Riedel, S., McCallum, A.: Collective cross-document relation extraction without labelled data. In: Proceedings of EMNLP (2010)
Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
Yosef, M.A., Bauer, S., Hoffart, J., Spaniol, M., Weikum, G.: HYENA: hierarchical types classification for entity names. In: Proceedings of COLING 2012: Posters, Mumbai, India, December 2012, pp. 1361–1370 (2012)

Acknowledgements

The research for this paper was made possible by the CLARIAH-CORE project financed by NWO.

Author information

Authors and Affiliations

Computational Lexicology and Terminology Lab, The Network Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Marieke van Erp & Piek Vossen

Authors

Marieke van Erp
Piek Vossen

Corresponding author

Correspondence to Marieke van Erp.

Editor information

Editors and Affiliations

Universidad Politécnica de Madrid, Madrid, Spain
Jorge Gracia
Nanyang Technological University, Singapore, Singapore
Francis Bond
Insight Centre for Data Analytics, National University of Ireland, Galway, Galway, Ireland
John P. McCrae
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Paul Buitelaar
Goethe-University Frankfurt, Frankfurt, Germany
Christian Chiarcos
University of Leipzig, Leipzig, Germany
Sebastian Hellmann

Appendix A: Results

Table 5. Precision, recall and F$_{1}$ scores on the overall datasets (macro-averaged) and per class.
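
As an illustration of what macro-averaging means for the scores in the caption above, the sketch below computes precision, recall and F$_{1}$ per class and then takes the unweighted mean over classes. It is not the authors' evaluation script: the function name macro_prf, the single-label input format and the example type labels are assumptions, and macro F$_{1}$ is taken here as the mean of the per-class F$_{1}$ scores, which is only one of the conventions in use.

```python
from collections import Counter
from typing import List, Tuple


def macro_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    classes = sorted(set(gold) | set(pred))
    if not classes:
        return 0.0, 0.0, 0.0

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Unweighted mean: every class counts equally, regardless of frequency.
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels, chosen only to exercise the function.
gold = ["person/actor", "person/actor", "person/politician", "location/city"]
pred = ["person/actor", "person/politician", "person/politician", "location/city"]
print(macro_prf(gold, pred))
```

Because each class contributes equally to the mean, rare types influence a macro-averaged score as much as frequent ones.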

About this paper

Cite this paper

van Erp, M., Vossen, P. (2017). Multilingual Fine-Grained Entity Typing. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_23

DOI: https://doi.org/10.1007/978-3-319-59888-8_23
Published: 27 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59887-1
Online ISBN: 978-3-319-59888-8
eBook Packages: Computer Science, Computer Science (R0)
