Abstract
We report on a set of experiments carried out in the context of the Flemish OntoBasis project. Our purpose is to extract semantic relations from text corpora in an unsupervised way and to use the output as preprocessed material for the construction of ontologies from scratch. The experiments are evaluated both quantitatively and in an "impressionistic" manner.
We have worked on two corpora: a 13M-word corpus composed of Medline abstracts related to proteins (SwissProt), and a small legal corpus (the EU VAT directive) consisting of 43K words. Using a shallow parser, we select functional relations from the syntactic structure subject-verb-direct-object. These functional relations correspond to what is called a "lexon". The selection relies on prepositional structures and statistical measures to retain the most relevant lexons. The paper therefore emphasises the filtering carried out to discard irrelevant structures automatically.
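As a rough illustration of the filtering step, the sketch below keeps only those subject-verb-object triples that recur often enough in the parser output. It is a minimal sketch under stated assumptions: the abstract does not specify which statistical measures are used, so a plain frequency threshold stands in for them, and the names `Lexon`, `select_lexons` and `min_count`, as well as the sample triples, are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Lexon(NamedTuple):
    """A subject-verb-object triple (hypothetical representation)."""
    subject: str
    verb: str
    obj: str


def select_lexons(triples: Iterable[Lexon], min_count: int = 2) -> list[Lexon]:
    """Keep triples seen at least `min_count` times, most frequent first.

    Assumption: raw frequency is a stand-in for the unspecified
    statistical measures mentioned in the abstract.
    """
    counts = Counter(triples)
    kept = [(t, n) for t, n in counts.items() if n >= min_count]
    # Sort by descending count, then alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [t for t, _ in kept]


# Hypothetical shallow-parser output (lemmatised triples).
triples = [
    Lexon("protein", "bind", "receptor"),
    Lexon("protein", "bind", "receptor"),
    Lexon("enzyme", "catalyse", "reaction"),
    Lexon("protein", "bind", "receptor"),
    Lexon("enzyme", "catalyse", "reaction"),
    Lexon("we", "report", "result"),
]

selected = select_lexons(triples)
for lexon in selected:
    print(lexon.subject, lexon.verb, lexon.obj)
```

With this toy input, the one-off triple ("we", "report", "result") falls below the threshold and is discarded, which is the kind of irrelevant structure the filtering is meant to remove.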
Domain experts have evaluated the precision of the outcomes on the SwissProt corpus. The global precision was rated at 55%, with a precision of 42% for the functional relations (lexons) and 76% for the prepositional relations. For the VAT corpus, a knowledge engineer judged that the outcomes are useful to support, and can speed up, his modelling task. In addition, a quantitative scoring method has been applied, with coverage and accuracy measures yielding scores of 52.38% and 47.12% respectively.
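To make the relationship between per-type and global precision concrete, the sketch below computes precision from expert judgements and pools the counts across relation types. It assumes that the global figure is a pooled precision over all judged relations, which the abstract does not state. The function names and the counts are hypothetical toy values, not the paper's data.

```python
def precision(correct: int, total: int) -> float:
    """Fraction of judged relations that the expert marked as correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def pooled_precision(groups: dict[str, tuple[int, int]]) -> float:
    """Precision over all relation types, pooling (correct, total) counts.

    Assumption: the global score is pooled over judged items, so larger
    groups weigh more than smaller ones.
    """
    correct = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return precision(correct, total)


# Hypothetical toy counts: (judged correct, judged in total).
judgements = {
    "lexons": (2, 8),
    "prepositional": (3, 4),
}

per_type = {name: precision(c, t) for name, (c, t) in judgements.items()}
overall = pooled_precision(judgements)
print(per_type, round(overall, 4))
```

Under this assumption the pooled value lies between the per-type values and is pulled towards the larger group, so it need not equal their simple average.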
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reinberger, ML., Spyns, P., Pretorius, A.J., Daelemans, W. (2004). Automatic Initiation of an Ontology. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE. OTM 2004. Lecture Notes in Computer Science, vol 3290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30468-5_39
DOI: https://doi.org/10.1007/978-3-540-30468-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23663-4
Online ISBN: 978-3-540-30468-5
eBook Packages: Springer Book Archive