Abstract
Database selection, also known as resource selection, server selection, and query routing, is an important topic in distributed information retrieval research. Several approaches to database selection use document frequency data to rank servers. Many researchers have shown that the effectiveness of these algorithms depends on database size and content. In this paper we propose a database selection algorithm that uses document frequency data and an extended database description to rank servers. The algorithm does not depend on the size and content of the databases in the system. We provide experimental evidence, based on actual data, that our algorithm outperforms the vGlOSS, CVV and CORI database selection algorithms with respect to the precision and recall evaluation measures.
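To make the idea of ranking servers from document frequency data concrete, the following is a minimal illustrative sketch. It is not the algorithm proposed in this paper, nor an implementation of vGlOSS, CVV or CORI. It assumes a hypothetical scoring rule in which each server is scored by the summed fraction of its documents that contain each query term; the function name `rank_servers` and the input layout are assumptions made for this example only.

```python
from typing import Dict, List


def rank_servers(
    query_terms: List[str],
    doc_freq: Dict[str, Dict[str, int]],
    sizes: Dict[str, int],
) -> List[str]:
    """Rank servers for a query using only document frequency data.

    doc_freq[server][term] is the number of documents on that server
    containing the term; sizes[server] is the server's document count.
    The score is a hypothetical one chosen for illustration: the sum,
    over query terms, of the fraction of documents containing the term.
    """
    scores: Dict[str, float] = {}
    for server, term_df in doc_freq.items():
        n_docs = sizes.get(server, 0)
        if n_docs <= 0:
            # An empty or unknown-size server cannot be scored meaningfully.
            scores[server] = 0.0
            continue
        scores[server] = sum(term_df.get(t, 0) / n_docs for t in query_terms)
    # Highest score first; ties broken by server name for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


if __name__ == "__main__":
    doc_freq = {
        "A": {"retrieval": 50, "database": 10},
        "B": {"retrieval": 5},
        "C": {},
    }
    sizes = {"A": 1000, "B": 10, "C": 100}
    print(rank_servers(["retrieval", "database"], doc_freq, sizes))
```

In this toy input, server B holds far fewer matching documents than server A but ranks first because half of its collection contains a query term. This shows how the choice of size normalisation alone can change a ranking, which is the kind of sensitivity to database size that the abstract refers to.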
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hyusein, B., Carthy, J. (2004). An Advanced Server Ranking Algorithm for Distributed Retrieval Systems on the Internet. In: Aykanat, C., Dayar, T., Körpeoğlu, İ. (eds) Computer and Information Sciences - ISCIS 2004. ISCIS 2004. Lecture Notes in Computer Science, vol 3280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30182-0_84
DOI: https://doi.org/10.1007/978-3-540-30182-0_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23526-2
Online ISBN: 978-3-540-30182-0
eBook Packages: Springer Book Archive