Abstract
This paper describes a method for automatically building a new thesaurus from previously collected information. The method draws on several resources: a text collection, a set of source thesauri, and other linguistic resources. Different techniques are applied in each phase of the process. First, indexing techniques extract from the text collection the initial set of terms of interest for the new thesaurus. These terms are then looked up in the source thesauri, which provides the initial structure of the new thesaurus. Finally, the new thesaurus is enriched by searching for new relationships among its terms. These relationships are first detected using similarity measures and then assigned a type (equivalence, hierarchy or associativity) using different linguistic resources. We evaluate the system by comparing the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). The experiments show a clear improvement in performance.
Supported by project TIC2003-09481-C04.
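The relationship-detection and query-expansion steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity over term-document frequency vectors is an assumption made here, as are the function names, the threshold, and the toy collection. This is an illustration of the general idea, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def term_vectors(docs):
    """Map each term to a {doc_id: frequency} vector (a toy indexing step)."""
    vectors = defaultdict(Counter)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            vectors[term][doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[d] * v[d] for d in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def related_terms(vectors, term, threshold=0.5):
    """Return (term, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    if term not in vectors:
        return []
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [(t, s) for t, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


def expand_query(query, vectors, threshold=0.5, max_new=2):
    """Append up to max_new related terms per query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for other, _score in related_terms(vectors, term, threshold)[:max_new]:
            if other not in expanded:
                expanded.append(other)
    return expanded


# Toy collection, invented for this sketch only.
DOCS = [
    "thesaurus query expansion",
    "thesaurus query retrieval",
    "image compression codec",
]
VECTORS = term_vectors(DOCS)

if __name__ == "__main__":
    print(expand_query("thesaurus", VECTORS))
```

The sketch only detects that two terms are related. Assigning each relationship a type (equivalence, hierarchy or associativity) would need the additional linguistic resources the abstract mentions and is not attempted here.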
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez-Agüera, J.R., Araujo, L. (2006). Query Expansion with an Automatically Generated Thesaurus. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_93
DOI: https://doi.org/10.1007/11875581_93
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)