Vulcain — An Ontology-Based Information Extraction System

Amalia Todirascu,
Laurent Romary &
Dalila Bekhouche

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2553)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

This paper describes Vulcain, an information extraction system dedicated to message filtering for a specific domain. It focuses on a method for identifying domain-specific terms and concepts using syntactic information and an existing domain ontology. Terms are identified by partial syntactic analysis based on Tree Adjoining Grammars (TAG). The domain ontology is represented in description logics (DL), and DL inference mechanisms are used to validate the candidate concepts.

Author information

Authors and Affiliations

LORIA, INRIA Lorraine, Campus scientifique BP 239, 54506, Vandoeuvre-lès-Nancy Cedex, France
Amalia Todirascu, Laurent Romary & Dalila Bekhouche

Editor information

Editors and Affiliations

Department of Computer and Systems Sciences, Royal Institute of Technology, Forum 100, 16440, Kista, Sweden
Birger Andersson, Maria Bergholtz & Paul Johannesson

About this paper

Cite this paper

Todirascu, A., Romary, L., Bekhouche, D. (2002). Vulcain — An Ontology-Based Information Extraction System. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_6

DOI: https://doi.org/10.1007/3-540-36271-1_6
Published: 28 February 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive
