Adapting a Robust Multi-genre NE System for Automatic Content Extraction

Diana Maynard, Hamish Cunningham, Kalina Bontcheva & Marin Dimitrov

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2443)

Included in the following conference series:

International Conference on Artificial Intelligence: Methodology, Systems, and Applications

Abstract

Many current information extraction systems are designed with particular applications and domains in mind. Given the growing need for robust language engineering tools that can handle a variety of language processing demands, we have used the GATE architecture to design MUSE, a system for named entity recognition and related tasks. In this paper, we address how this general-purpose system can be adapted to particular applications with minimal time and effort, and how the set of resources it uses can be adapted dynamically and automatically. We focus specifically on the challenges of the ACE (Automatic Content Extraction) entity detection and tracking task, for which preliminary results are promising.

Author information

Authors and Affiliations

Dept of Computer Science, University of Sheffield, 211 Portobello St, S1 4DP, Sheffield, UK
Diana Maynard, Hamish Cunningham & Kalina Bontcheva
Sirma AI Ltd, Ontotext Lab, 38A Hristo Botev Blvd, 1000, Sofia, Bulgaria
Marin Dimitrov

Editor information

Editors and Affiliations

ITRI, University of Brighton, Lewes Road, BN2 4GJ, Brighton, UK
Donia Scott

About this paper

Cite this paper

Maynard, D., Cunningham, H., Bontcheva, K., Dimitrov, M. (2002). Adapting a Robust Multi-genre NE System for Automatic Content Extraction. In: Scott, D. (ed.) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2002. Lecture Notes in Computer Science, vol 2443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46148-5_27

DOI: https://doi.org/10.1007/3-540-46148-5_27
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44127-4
Online ISBN: 978-3-540-46148-7
eBook Packages: Springer Book Archive