Abstract
Many current information extraction systems are designed with particular applications and domains in mind. To meet the increasing need for robust language engineering tools that can handle a variety of language processing demands, we have used the GATE architecture to design MUSE, a system for named entity recognition and related tasks. In this paper, we address how this general-purpose system can be adapted to particular applications with minimal time and effort, and how the set of resources it uses can be adapted dynamically and automatically. We focus specifically on the challenges of the ACE (Automatic Content Extraction) entity detection and tracking task, for which preliminary results are promising.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maynard, D., Cunningham, H., Bontcheva, K., Dimitrov, M. (2002). Adapting a Robust Multi-genre NE System for Automatic Content Extraction. In: Scott, D. (ed.) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2002. Lecture Notes in Computer Science, vol 2443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46148-5_27
DOI: https://doi.org/10.1007/3-540-46148-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44127-4
Online ISBN: 978-3-540-46148-7
eBook Packages: Springer Book Archive