Abstract
One of the basic requirements of Web mining is a crawler system, which collects information from the Web. Predicting the performance, dependability, and other operational measures of a system requires constructing and evaluating a formal model of that system. We have constructed a formal model of a distributed crawler based on UbiCrawler, using stochastic activity networks (SANs). The SAN model is used to evaluate several performance measures of the crawler. The throughput results are the same as the published statistics of UbiCrawler. In addition, we have been able to evaluate two other measures: communication overhead and coverage. In this paper, we discuss the architecture of the distributed crawler, then present a SAN model of the crawler and the results of its evaluation.
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nasri, M., Shariati, S., Abdollahi Azgomi, M. (2008). Performance Modeling of a Distributed Web Crawler Using Stochastic Activity Networks. In: Sarbazi-Azad, H., Parhami, B., Miremadi, SG., Hessabi, S. (eds) Advances in Computer Science and Engineering. CSICC 2008. Communications in Computer and Information Science, vol 6. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89985-3_66
DOI: https://doi.org/10.1007/978-3-540-89985-3_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89984-6
Online ISBN: 978-3-540-89985-3
eBook Packages: Computer Science (R0)