Abstract
Performing information retrieval (IR) efficiently in a distributed environment is currently one of the main challenges in IR. Document representations are distributed among nodes in a manner that allows a query processing algorithm to efficiently direct queries to those nodes that contribute to the result. Existing term-based document distribution algorithms do not scale with large collection sizes or many-term queries because they incur heavy network traffic during the distribution and query phases.
We propose a novel algorithm for document distribution, namely distance-based document distribution. The distribution obtained by our algorithm allows any IR query to be answered effectively by contacting only a few nodes, independent of both document collection size and network size, thereby improving efficiency. We accomplish this by linearizing the information retrieval search space such that it reflects the ranking formula that is later used for retrieval.
Our experimental evaluation indicates that effective information retrieval can be efficiently accomplished in distributed networks.
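The abstract does not spell out the algorithm, so the following is only a minimal sketch of what linearizing a search space by distance could look like, not the paper's method. It assumes documents and queries are non-zero term-weight vectors ranked by cosine similarity, maps each vector to a single key (its angle to a fixed reference vector), and assigns equal-width key ranges to nodes. The reference vector, the equal-width partitioning, the search radius, and all function names are hypothetical choices made for illustration. Because angular distance is a metric and is monotone in cosine similarity, the triangle inequality guarantees that every document within `radius` of the query has a key within `radius` of the query's key, so only the nodes covering that key interval need to be contacted.

```python
import math


def angular_distance(a, b):
    """Angle in radians between two non-zero vectors (monotone in cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def node_for_key(key, num_nodes):
    """Map a key in [0, pi] to a node index using equal-width ranges (an assumption)."""
    idx = int(key / math.pi * num_nodes)
    return min(max(idx, 0), num_nodes - 1)


def distribute(docs, reference, num_nodes):
    """Place each document on the node that owns its one-dimensional key."""
    nodes = [[] for _ in range(num_nodes)]
    for doc_id, vec in docs.items():
        key = angular_distance(vec, reference)
        nodes[node_for_key(key, num_nodes)].append((doc_id, vec))
    return nodes


def nodes_to_contact(query, reference, radius, num_nodes):
    """Nodes whose key range can hold a document within `radius` of the query."""
    key = angular_distance(query, reference)
    lo = node_for_key(max(0.0, key - radius), num_nodes)
    hi = node_for_key(min(math.pi, key + radius), num_nodes)
    return range(lo, hi + 1)


def search(query, reference, radius, nodes):
    """Return ids of documents within `radius` of the query, best match first."""
    hits = []
    for n in nodes_to_contact(query, reference, radius, len(nodes)):
        for doc_id, vec in nodes[n]:
            d = angular_distance(query, vec)
            if d <= radius:
                hits.append((d, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


# Toy data, invented for illustration only.
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
    "e": [0.5, 0.5, 0.0],
}
reference = [1.0, 1.0, 1.0]
num_nodes = 8
nodes = distribute(docs, reference, num_nodes)

query = [1.0, 0.05, 0.0]
radius = 0.2
contacted = list(nodes_to_contact(query, reference, radius, num_nodes))
result = search(query, reference, radius, nodes)
```

In this toy setup the query touches only a subset of the eight nodes, and the result matches a brute-force scan over all documents. A single reference vector can map dissimilar documents to the same key, so a practical scheme would need a more careful linearization and load balancing across nodes.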
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Herschel, S. (2007). Similarity-Based Document Distribution for Efficient Distributed Information Retrieval. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_9
DOI: https://doi.org/10.1007/978-3-540-76993-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science, Computer Science (R0)