Abstract
The amount of information available on the Internet is growing daily, and so are the importance and scale of Web search engines. Systems based on a single centralised index present several problems (such as a lack of scalability), which have led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared with a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that a clustered system does not improve on the performance obtained by the best replicated system. Indeed, the main advantage of a clustered system is the reduction in network traffic. However, the use of a switched network eliminates the network bottleneck, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative effect on performance of changes in the query topics over time when a distributed clustered system is used. In contrast, the performance of a distributed replicated system is independent of the query topics.
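The claim that a clustered system is sensitive to shifts in query topics while a replicated system is not can be illustrated with a toy load model. This is a minimal sketch under assumed simplifications, not the authors' simulation model: every query costs one unit of work, the replicated system dispatches queries round-robin over full-index replicas, and the clustered system routes each query to the single server holding its topic's documents (real clustered systems may query several clusters, and each server holds a smaller index, so per-query cost also differs). The function names, topic names, topic-to-server mapping and query streams are all hypothetical.

```python
"""Toy load model contrasting replicated and clustered (topic-partitioned) IR systems.

Hypothetical assumptions: each query costs one unit of work; the replicated
system sends each query to one of n full-index replicas in round-robin order;
the clustered system sends each query only to the server holding its topic.
"""
from typing import Dict, List


def replicated_load(queries: List[str], n_servers: int) -> List[int]:
    """Round-robin dispatch: per-server load does not depend on query topics."""
    load = [0] * n_servers
    for i, _topic in enumerate(queries):
        load[i % n_servers] += 1
    return load


def clustered_load(queries: List[str], topic_to_server: Dict[str, int],
                   n_servers: int) -> List[int]:
    """Topic routing: each query goes to the server that holds its topic."""
    load = [0] * n_servers
    for topic in queries:
        load[topic_to_server[topic]] += 1
    return load


def imbalance(load: List[int]) -> float:
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(load) / len(load)
    return max(load) / mean


if __name__ == "__main__":
    # Hypothetical mapping of topics to the four servers of a clustered system.
    topics = {"sports": 0, "news": 1, "shopping": 2, "travel": 3}
    # Hypothetical query streams: uniform topics vs. a shift towards "shopping".
    uniform = ["sports", "news", "shopping", "travel"] * 25
    shifted = ["shopping"] * 70 + ["sports", "news", "travel"] * 10
    for name, stream in [("uniform", uniform), ("shifted", shifted)]:
        rep = replicated_load(stream, 4)
        clu = clustered_load(stream, topics, 4)
        print(name, "replicated", rep, imbalance(rep),
              "clustered", clu, imbalance(clu))
```

With the uniform stream both models give four servers with 25 queries each. With the shifted stream the replicated loads are unchanged, while the clustered server holding "shopping" receives 70 of the 100 queries (an imbalance of 2.8), so in this toy model that server alone limits throughput and response time.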
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Cacheda, F., Carneiro, V., Plachouras, V., Ounis, I. (2007). Performance Comparison of Clustered and Replicated Information Retrieval Systems. In: Amati, G., Carpineto, C., Romano, G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71496-5_14
DOI: https://doi.org/10.1007/978-3-540-71496-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71494-1
Online ISBN: 978-3-540-71496-5
eBook Packages: Computer Science, Computer Science (R0)