
An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems

Frontiers of Information Technology & Electronic Engineering

Abstract

Entity recognition and disambiguation (ERD) is a crucial technique for knowledge base population and information extraction. In recent years, numerous papers have been published on this subject, and various ERD systems have been developed. However, confusion persists in the ERD field about how to compare these systems fairly and completely, so there is growing interest in a unified evaluation framework. In this paper, we present an easy-to-use evaluation framework (EUEF), which aims at facilitating the evaluation process and providing a fair comparison of ERD systems. EUEF is well designed and released to the public as open source, and thus can easily be extended with novel ERD systems, datasets, and evaluation metrics. With EUEF, it is easy to discover the advantages and disadvantages of a specific ERD system and its components. We compare several popular, publicly available ERD systems using EUEF and draw some interesting conclusions from a detailed analysis.
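To make the evaluation setting concrete, the sketch below shows one common way ERD benchmarks score a system: strong annotation matching, where a predicted (start, end, entity) triple counts as correct only if it exactly matches a gold triple, and micro precision/recall/F1 are computed over the matched sets. This is a hypothetical illustration of the general technique, not EUEF's actual API; the function name and data layout are assumptions.

```python
# Hypothetical sketch (not EUEF's actual interface): strong-matching
# evaluation of an ERD system. Annotations are (start, end, entity) triples;
# a prediction is correct only if it exactly matches a gold triple.

def f1_score(gold, predicted):
    """Micro precision, recall, and F1 over sets of (start, end, entity) triples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # exact span + entity matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: the second mention is recognized but disambiguated incorrectly,
# so only one of two predictions is counted as correct.
gold = [(0, 5, "Apple_Inc."), (10, 15, "Steve_Jobs")]
pred = [(0, 5, "Apple_Inc."), (10, 15, "Apple_(fruit)")]
print(f1_score(gold, pred))  # → (0.5, 0.5, 0.5)
```

Frameworks in this space typically also support weaker matching modes (e.g. overlapping spans, or entity-only matching), which change how `tp` is counted while the precision/recall/F1 formulas stay the same.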




Corresponding author

Correspondence to Hui Chen.

Additional information

Project supported by the National Natural Science Foundation of China (No. 61572434), the China Knowledge Centre for Engineering Sciences and Technology (No. CKC-EST-2015-2-5), and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP), China (No. 20130101110-136)

ORCID: Hui CHEN, http://orcid.org/0000-0001-9709-977X


About this article


Cite this article

Chen, H., Wei, B.G., Li, Y.M., et al. An easy-to-use evaluation framework for benchmarking entity recognition and disambiguation systems. Frontiers Inf Technol Electronic Eng 18, 195–205 (2017). https://doi.org/10.1631/FITEE.1500473

