Abstract
This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified, respectively, in future crawls. These methods let web database administrators avoid unnecessarily requesting pages in a page group that cannot be downloaded or have not been modified. We postulate that the future change behavior of web pages is strongly related to their past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals for 100 days and estimated the future change behavior of those pages. Our estimates, evaluated against the actual change behavior of the pages, worked well.
This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-214-D00136).
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Kim, S.J., Lee, S.H. (2007). Estimating the Change of Web Pages. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2007. ICCS 2007. Lecture Notes in Computer Science, vol 4489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72588-6_129
DOI: https://doi.org/10.1007/978-3-540-72588-6_129
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72587-9
Online ISBN: 978-3-540-72588-6
eBook Packages: Computer Science, Computer Science (R0)