Comparison of Methods to Annotate Named Entity Corpora

Published: 21 July 2018

Abstract

The authors compared two methods for annotating a corpus for the named entity (NE) recognition task using non-expert annotators: (i) revising the output of an existing NE recognizer (semi-automatic annotation) and (ii) annotating the NEs fully manually. The annotation time, the degree of agreement, and the performance against the gold standard were evaluated. Because each text was annotated by two annotators for each method, two performance figures were evaluated: the average performance of the two annotators and the performance when at least one annotator was correct. The experiments reveal that semi-automatic annotation is faster, achieves better agreement, and performs better on average. However, they also indicate that fully manual annotation should be used for texts whose document types differ substantially from those of the recognizer's training data. In addition, machine learning experiments using the semi-automatically and fully manually annotated corpora as training data indicate that the F-measures for some texts could be higher when manual rather than semi-automatic annotation was used. Finally, experiments using the annotated corpora as additional training data show that (i) NE recognition performance does not always correspond to the quality of the NE tag annotation and (ii) the system trained on the manually annotated corpus outperforms the system trained on the semi-automatically annotated corpus on newswire texts, even though the existing NE recognizer was trained mainly on newswires.
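
To make the evaluation scheme concrete, the following is a minimal sketch (hypothetical code, not the authors' actual scoring script) of entity-level precision, recall, and F-measure against a gold standard, together with the two aggregate figures described above: the average of the two annotators' scores, and the score under one plausible reading of "at least one annotator is correct", namely scoring the union of the two annotators' entities. Entities are represented as (start, end, type) spans; all names are illustrative.

    # Hypothetical evaluation sketch; an annotation is a set of (start, end, type) spans.

    def precision_recall_f1(predicted, gold):
        """Entity-level scores: a predicted entity counts only on an exact span/type match."""
        tp = len(predicted & gold)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        if precision + recall == 0.0:
            return precision, recall, 0.0
        return precision, recall, 2 * precision * recall / (precision + recall)

    def average_f1(annotator_a, annotator_b, gold):
        """Average performance of the two annotators on one text."""
        f1_a = precision_recall_f1(annotator_a, gold)[2]
        f1_b = precision_recall_f1(annotator_b, gold)[2]
        return (f1_a + f1_b) / 2

    def at_least_one_correct_f1(annotator_a, annotator_b, gold):
        """Score the union of both annotators' entities (assumed reading of
        'at least one annotator is correct')."""
        return precision_recall_f1(annotator_a | annotator_b, gold)[2]

    # Toy example: each annotator misses a different one of three gold entities.
    gold = {(0, 5, "PERSON"), (10, 18, "ORGANIZATION"), (25, 30, "DATE")}
    a = {(0, 5, "PERSON"), (10, 18, "ORGANIZATION")}
    b = {(0, 5, "PERSON"), (25, 30, "DATE")}
    print(average_f1(a, b, gold))               # 0.8
    print(at_least_one_correct_f1(a, b, gold))  # 1.0

In the paper's setting, these per-text scores would then be compared between the semi-automatic and fully manual conditions, alongside annotation time and agreement.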

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4
December 2018, 193 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3229525

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 July 2018
Accepted: 01 May 2018
Revised: 01 March 2018
Received: 01 October 2017
Published in TALLIP Volume 17, Issue 4

Author Tags

  1. Annotation
  2. named entity extraction
  3. non-expert annotator

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • KAKENHI

Cited By

  • (2024) Is Boundary Annotation Necessary? Evaluating Boundary-Free Approaches to Improve Clinical Named Entity Annotation Efficiency: Case Study. JMIR Medical Informatics 12, e59680. DOI: 10.2196/59680. Online publication date: 2-Jul-2024.
  • (2023) PaTAT: Human-AI Collaborative Qualitative Coding with Explainable Interactive Rule Synthesis. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-19. DOI: 10.1145/3544548.3581352. Online publication date: 19-Apr-2023.
  • (2022) Bootstrapping semi-supervised annotation method for potential suicidal messages. Internet Interventions 28, 100519. DOI: 10.1016/j.invent.2022.100519. Online publication date: Apr-2022.
  • (2022) Iterative Learning for Semi-automatic Annotation Using User Feedback. Intelligent Technologies and Applications, 31-44. DOI: 10.1007/978-3-031-10525-8_3. Online publication date: 23-Jul-2022.
  • (2021) The Application of Text Mining Algorithms to Discover One Topic Objects in Digital Learning Repositories. 2021 28th Conference of Open Innovations Association (FRUCT), 502-509. DOI: 10.23919/FRUCT50888.2021.9347611. Online publication date: 27-Jan-2021.
  • (2019) Stock Trend Extraction using Rule-based and Syntactic Feature-based Relationships between Named Entities. 2019 International Conference on Advanced Information Technologies (ICAIT), 78-83. DOI: 10.1109/AITC.2019.8920986. Online publication date: Nov-2019.
