Abstract
As the Internet grows exponentially, topic-specific web crawlers are becoming increasingly popular in web data mining. A central problem, which has been studied in depth, is how to order the unvisited URLs. We present the notion of a concept similarity context graph and propose a novel approach to topic-specific web crawling that calculates the prediction score of each unvisited URL from concept similarity in Formal Concept Analysis (FCA), while improving retrieval precision and recall. We first build a concept lattice from the visited pages, extract from it the core concepts that reflect the user's query topic, and then construct the concept similarity context graph based on the semantic similarities between the core concepts and the other concepts.
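The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the core-concept criterion, or the scoring rule, so the code below assumes Jaccard similarity between concept intents, treats a concept as "core" when its intent contains every query term, and lets each unvisited URL inherit the score of the visited page that links to it. All function names and the sample data are hypothetical.

```python
import heapq
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    `context` maps each object (a visited page) to its set of attributes
    (terms). Brute force over object subsets, so only suitable for toy data.
    """
    objects = list(context)
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            # Intent: attributes shared by every object in the group.
            intent = all_attrs
            for obj in group:
                intent = intent & context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


def jaccard(a, b):
    """Assumed similarity measure: Jaccard overlap of two intents."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def score_pages(context, query_terms):
    """Score each visited page by its similarity to the closest core concept.

    Assumption: a concept is "core" if it has a non-empty extent and its
    intent contains all query terms. Each page is represented by its object
    concept, whose intent is the page's own term set.
    """
    core = [intent for extent, intent in formal_concepts(context)
            if extent and query_terms <= intent]
    return {page: max((jaccard(frozenset(terms), c) for c in core), default=0.0)
            for page, terms in context.items()}


def order_frontier(links, page_scores):
    """Order unvisited URLs by the score of the page that links to them.

    `links` is a list of (parent_page, url) pairs; highest score comes first.
    """
    heap = [(-page_scores[parent], url) for parent, url in links]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

With three visited pages described by the terms {crawler, focused, web}, {crawler, web}, and {recipe, web}, and the query {crawler}, a link found on the first page would be crawled before a link found on the third. A real crawler would replace the brute-force enumeration with an incremental lattice algorithm and the Jaccard measure with whatever semantic similarity the method actually prescribes.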
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Y., Du, Y., Sun, J., Hai, Y. (2008). A Topic-Specific Web Crawler with Concept Similarity Context Graph Based on FCA. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_101
DOI: https://doi.org/10.1007/978-3-540-85984-0_101
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85983-3
Online ISBN: 978-3-540-85984-0
eBook Packages: Computer Science (R0)