Abstract
The evaluation of semantic relations acquired automatically from text is a challenging task that generally ends up being done by humans. Although less prone to error, manual evaluation is time-consuming, hard to repeat, and sometimes subjective. In this paper, we evaluate relational triples automatically, exploiting popular similarity measures on the Web. We used these measures to score triples according to the co-occurrence of their arguments and of textual patterns denoting their relation, and some of the resulting scores proved to be highly correlated with the proportion of correct triples. The measures were also used to select the correct triples in a set, with best F1 scores around 96%.
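As a rough illustration of the idea described in the abstract, the sketch below scores a triple from web page counts and then selects triples above a threshold. The abstract does not name the measures used, so the two shown here (a PMI estimate and a Jaccard coefficient over hit counts) are assumptions chosen as common examples of web co-occurrence measures. All function names, the threshold, and the counts are hypothetical, and obtaining real hit counts from a search engine is left out.

```python
import math


def web_pmi(hits_xy, hits_x, hits_y, total_pages):
    """Pointwise mutual information estimated from page counts.

    hits_xy: pages where both arguments co-occur (optionally with a
             relation pattern); hits_x, hits_y: pages for each argument;
             total_pages: assumed size of the index.
    """
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return 0.0
    return math.log2((hits_xy * total_pages) / (hits_x * hits_y))


def web_jaccard(hits_xy, hits_x, hits_y):
    """Jaccard coefficient over page counts."""
    denom = hits_x + hits_y - hits_xy
    return hits_xy / denom if denom > 0 else 0.0


def select_triples(scores, threshold):
    """Keep the triples whose score reaches the (hypothetical) threshold."""
    return {triple for triple, score in scores.items() if score >= threshold}


def f1_score(selected, correct):
    """F1 of a selected set against a manually validated set."""
    selected, correct = set(selected), set(correct)
    true_pos = len(selected & correct)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(selected)
    recall = true_pos / len(correct)
    return 2 * precision * recall / (precision + recall)


# Made-up counts, for illustration only.
scores = {
    ("dog", "hypernym-of", "animal"): web_jaccard(50, 100, 100),
    ("dog", "part-of", "moon"): web_jaccard(1, 100, 100),
}
kept = select_triples(scores, threshold=0.1)
```

In a setting like the one the abstract describes, the threshold would be tuned against a manually validated sample, and `f1_score` would compare the kept triples with that sample.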
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Costa, H.P., Gonçalo Oliveira, H., Gomes, P. (2011). Using the Web to Validate Lexico-Semantic Relations. In: Antunes, L., Pinto, H.S. (eds) Progress in Artificial Intelligence. EPIA 2011. Lecture Notes in Computer Science, vol 7026. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24769-9_43
DOI: https://doi.org/10.1007/978-3-642-24769-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24768-2
Online ISBN: 978-3-642-24769-9
eBook Packages: Computer Science, Computer Science (R0)