DOI: 10.1145/564376.564492

Adaptive information extraction for document annotation in Amilcare

Published: 11 August 2002

Abstract

Amilcare is a tool for Adaptive Information Extraction (IE) designed to support active annotation of documents for the Semantic Web (SW). It can be used either for unsupervised document annotation or to support human annotation. Amilcare can be ported to new applications and domains without any knowledge of IE, as it only requires users to annotate a small training corpus with the information to be extracted. It is based on (LP)², a supervised learning strategy for IE that copes with different text types, from newspaper-like texts to rigidly formatted Web pages, and even mixtures of them [1][5].

Adaptation starts with the definition of a tag set for annotation, possibly organized as an ontology. Users then manually annotate a small training corpus. Amilcare provides a default mouse-based interface called Melita, in which annotations are inserted by first selecting a tag from the ontology and then marking the text area to annotate with the mouse. Unlike similar annotation tools [4, 5], Melita actively supports training corpus annotation. While users annotate texts, Amilcare runs in the background, learning how to reproduce the inserted annotations. Induced rules are silently applied to new texts and their results are compared with the user's annotations. When the rules reach a user-defined level of accuracy, Melita presents new texts with a preliminary annotation derived by applying the rules. In this case users only have to correct mistakes and add missing annotations. User corrections are fed back to the learner for retraining. This technique focuses the slow and expensive user activity on uncovered cases and avoids asking users to annotate cases where satisfactory effectiveness has already been reached. Moreover, validating extracted information is a much simpler and less error-prone task than tagging bare texts, which speeds up the process considerably.

At the end of the corpus annotation process, the system is trained and the application can be delivered. MnM [6] and Ontomat annotizer [7] are two annotation tools adopting Amilcare's learner.

In this demo we simulate the annotation of a small corpus and show how and when Amilcare is able to support users in the annotation process, focusing on the way the user can control the tool's proactivity and intrusiveness. We also quantify this support with data derived from a number of experiments on corpora, focusing on training corpus size and on the correctness of suggestions as the corpus grows.
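
The annotation loop described in the abstract can be illustrated with a minimal sketch. It is not Amilcare's or Melita's code: `ToyLearner` is a hypothetical stand-in that only memorises annotated strings (it is not (LP)²), precision is assumed as the accuracy measure, and all names (`annotation_session`, `threshold`, the log fields) are invented for illustration. The sketch applies the current rules to each new text before the user sees it, compares the result with the user's annotations, switches to suggestion mode once the threshold is reached, and counts how many manual actions the user needs per document.

```python
from dataclasses import dataclass, field

# An annotation is a (text span, tag) pair. This representation is an assumption.
Annotation = tuple[str, str]


@dataclass
class ToyLearner:
    """Hypothetical stand-in for the learner: memorises the tag given to each span."""
    rules: dict[str, str] = field(default_factory=dict)

    def train(self, annotations: set[Annotation]) -> None:
        for span, tag in annotations:
            self.rules[span] = tag

    def annotate(self, text: str) -> set[Annotation]:
        # Apply every memorised rule whose span occurs in the text.
        return {(span, tag) for span, tag in self.rules.items() if span in text}


def precision(predicted: set[Annotation], gold: set[Annotation]) -> float:
    """Fraction of predicted annotations that the user agrees with."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


def annotation_session(corpus, threshold: float):
    """Simulate the loop; returns one (suggestions_shown, precision, user_actions) per text."""
    learner = ToyLearner()
    suggesting = False
    log = []
    for text, user_annotations in corpus:
        # Rules are applied to the new text before the user's annotations are known.
        predicted = learner.annotate(text)
        score = precision(predicted, user_annotations)
        if suggesting:
            # User only deletes wrong suggestions and adds the missing annotations.
            actions = len(predicted - user_annotations) + len(user_annotations - predicted)
        else:
            # Silent mode: the user inserts every annotation by hand.
            actions = len(user_annotations)
        log.append((suggesting, score, actions))
        # The final (corrected) annotations are fed back for retraining.
        learner.train(user_annotations)
        if score >= threshold:
            suggesting = True
    return log
```

On a three-text toy corpus in which the same speaker recurs, the third text needs one manual action instead of two, because the speaker annotation is suggested and only the new time has to be added. A real learner would generalise beyond exact string matches, which is where most of the saving would come from.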

References

[1] F. Ciravegna: "Adaptive Information Extraction from Text by Rule Induction and Generalisation", in Proceedings of 17th IJCAI, Seattle, August 2001.
[2] F. Ciravegna (2001): "Challenges in Information Extraction from Text for Knowledge Management", IEEE Intelligent Systems and Their Applications 16(6) 88--90.
[3] F. Ciravegna (2001): "(LP)², an Adaptive Algorithm for Information Extraction from Web-related Texts", in Proc. of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, Seattle.
[4] D. Day, J. Aberdeen, L. Hirschman, R. Kozierok, P. Robinson and M. Vilain: "Mixed-initiative development of language processing systems", in Proc. of the Fifth Conference on ANLP, Washington, 1997.
[5] H. Cunningham, D. Maynard, V. Tablan, C. Ursu, K. Bontcheva: "Developing Language Processing Components with GATE", www.gate.ac.uk
[6] J.B. Domingue, M. Lanzoni, E. Motta, M. Vargas-Vera and F. Ciravegna: "MnM: Ontology driven semi-automatic or automatic support for semantic markup", submitted paper.
[7] S. Handschuh, S. Staab and F. Ciravegna: "S-CREAM - Semi-automatic CREAtion of Metadata", submitted paper.

    Published In

    SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
    August 2002
    478 pages
    ISBN:1581135610
    DOI:10.1145/564376
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Conference

    SIGIR02
    Acceptance Rates

    SIGIR '02 Paper Acceptance Rate 44 of 219 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2014) Information extraction for deep web using repetitive subject pattern. World Wide Web 17:5 (1109-1139). DOI: 10.1007/s11280-013-0248-y. Online publication date: 1-Sep-2014
    • (2011) Ontology population and enrichment. Knowledge-driven multimedia information extraction and ontology evolution (134-166). DOI: 10.5555/2001069.2001075. Online publication date: 1-Jan-2011
    • (2010) Tools for Ontology Engineering and Management. Theory and Applications of Ontology: Computer Applications (131-154). DOI: 10.1007/978-90-481-8847-5_6. Online publication date: 12-Aug-2010
    • (2007) Automated Ontology Learning and Validation Using Hypothesis Testing. Advances in Intelligent Web Mastering (130-135). DOI: 10.1007/978-3-540-72575-6_21. Online publication date: 2007
    • (2006) Knowledge management for a large service-oriented corporation. Proceedings of the 6th international conference on Practical Aspects of Knowledge Management (326-337). DOI: 10.1007/11944935_29. Online publication date: 30-Nov-2006
