
An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems

Frontiers of Information Technology & Electronic Engineering

Abstract

Entity recognition and disambiguation (ERD) is a crucial technique for knowledge base population and information extraction. In recent years, numerous papers have been published on this subject, and various ERD systems have been developed. However, confusion persists in the ERD field about how to compare these systems fairly and completely, so there is growing interest in a unified evaluation framework. In this paper, we present an easy-to-use evaluation framework (EUEF), which aims at facilitating the evaluation process and providing a fair comparison of ERD systems. EUEF is well designed and released to the public as open source, and thus can easily be extended with novel ERD systems, datasets, and evaluation metrics. With EUEF, it is easy to discover the advantages and disadvantages of a specific ERD system and its components. We compare several popular, publicly available ERD systems using EUEF and draw some interesting conclusions from a detailed analysis.
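To make the evaluation setting concrete, the sketch below shows one common way ERD benchmarks score a system: strong annotation matching, where a predicted (start, end, entity) triple counts as correct only if it exactly matches a gold triple, and micro precision/recall/F1 are computed over the matched sets. This is a hypothetical illustration of the general technique, not EUEF's actual API; the function name and data layout are assumptions.

```python
# Hypothetical sketch (not EUEF's actual interface): strong-matching
# evaluation of an ERD system. Annotations are (start, end, entity) triples;
# a prediction is correct only if it exactly matches a gold triple.

def f1_score(gold, predicted):
    """Micro precision, recall, and F1 over sets of (start, end, entity) triples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # exact span + entity matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: the second mention is recognized but disambiguated incorrectly,
# so only one of two predictions is counted as correct.
gold = [(0, 5, "Apple_Inc."), (10, 15, "Steve_Jobs")]
pred = [(0, 5, "Apple_Inc."), (10, 15, "Apple_(fruit)")]
print(f1_score(gold, pred))  # → (0.5, 0.5, 0.5)
```

Frameworks in this space typically also support weaker matching modes (e.g. overlapping spans, or entity-only matching), which change how `tp` is counted while the precision/recall/F1 formulas stay the same.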




Corresponding author

Correspondence to Hui Chen.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61572434), the China Knowledge Centre for Engineering Sciences and Technology (No. CKC-EST-2015-2-5), and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP), China (No. 20130101110-136)

ORCID: Hui CHEN, http://orcid.org/0000-0001-9709-977X


About this article


Cite this article

Chen, H., Wei, B.G., Li, Y.M., et al. An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems. Frontiers Inf Technol Electronic Eng 18, 195–205 (2017). https://doi.org/10.1631/FITEE.1500473

