Adapting a Robust Multi-genre NE System for Automatic Content Extraction

  • Conference paper
Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2443)

Abstract

Many current information extraction systems are designed with particular applications and domains in mind. With the increasing need for robust language engineering tools that can handle a variety of language processing demands, we have used the GATE architecture to design MUSE, a system for named entity recognition and related tasks. In this paper, we address how this general-purpose system can be adapted to particular applications with minimal time and effort, and how the set of resources used can be adapted dynamically and automatically. We focus specifically on the challenges of the ACE (Automatic Content Extraction) entity detection and tracking task, for which preliminary results are promising.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maynard, D., Cunningham, H., Bontcheva, K., Dimitrov, M. (2002). Adapting a Robust Multi-genre NE System for Automatic Content Extraction. In: Scott, D. (ed.) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2002. Lecture Notes in Computer Science, vol 2443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46148-5_27

  • DOI: https://doi.org/10.1007/3-540-46148-5_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44127-4

  • Online ISBN: 978-3-540-46148-7

  • eBook Packages: Springer Book Archive
