Abstract
Different Web log studies calculate the same metrics from the logs of different search engines, sampled during different observation periods and processed under different values of two controllable variables peculiar to Web log analysis: a client discriminator, used to exclude clients who are agents, and a temporal cut-off, used to segment logged client transactions into temporal sessions. How strongly do the results depend on these variables? We analyze the sensitivity of the results to the two controllable variables. The sensitivity analysis shows that the metric values vary significantly with these variables. In particular, the metrics vary by up to 30-50% across the commonly assigned values. The differences caused by the controllable variables are therefore of the same order of magnitude as the differences between the metrics reported in different studies. Thus, direct comparison of the reported results is unreliable and leads to artifactual conclusions. To overcome this method dependency, we introduce and use a cross-analysis technique based on the direct comparison of logs. In addition, we propose an alternative, easily accessible comparison of the reported metrics, which corrects the reported values according to the controllable variables used in the studies.
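The role of the two controllable variables can be illustrated with a minimal sketch. It is not the procedure used in the paper: the discriminator is assumed here to be a simple threshold on the number of queries a client submits during the observation period, the metric (mean queries per temporal session) is chosen only for illustration, and the function names, parameters, and toy log are all hypothetical.

```python
from collections import defaultdict


def split_sessions(timestamps, cutoff_s):
    """Split one client's transaction times into temporal sessions.

    A new session starts whenever the gap to the previous
    transaction exceeds the temporal cut-off (in seconds).
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= cutoff_s:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def mean_queries_per_session(log, cutoff_s, max_queries_per_client):
    """Compute one illustrative metric under given controllable variables.

    log: iterable of (client_id, timestamp_seconds), one pair per query.
    max_queries_per_client: assumed client discriminator; clients with
    more queries than this are treated as agents and excluded.
    """
    by_client = defaultdict(list)
    for client, ts in log:
        by_client[client].append(ts)

    session_sizes = []
    for times in by_client.values():
        if len(times) > max_queries_per_client:
            continue  # excluded as a presumed agent (assumption)
        session_sizes.extend(len(s) for s in split_sessions(times, cutoff_s))
    return sum(session_sizes) / len(session_sizes) if session_sizes else 0.0


# Synthetic toy log, not real data: two human-like clients and one
# client that issues a query every second.
toy_log = (
    [("a", 0), ("a", 600), ("a", 4000), ("b", 0), ("b", 100)]
    + [("bot", i) for i in range(50)]
)

# The same log evaluated under different values of the two variables.
for cutoff_s in (300, 1800):
    for max_q in (10, 100):
        value = mean_queries_per_session(toy_log, cutoff_s, max_q)
        print(f"cut-off={cutoff_s}s, discriminator={max_q}: {value:.2f}")
```

Even on this toy log the metric changes when only the cut-off or only the discriminator threshold is changed, which is the kind of method dependency the sensitivity analysis quantifies on real logs.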
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buzikashvili, N. (2006). Comparing Web Logs: Sensitivity Analysis and Two Types of Cross-Analysis. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_39
DOI: https://doi.org/10.1007/11880592_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science (R0)