Abstract
Hyperlink analysis algorithms are widely used by public search engines. However, as websites built on dynamic scripts have proliferated, these algorithms can no longer search the resulting pages efficiently, because such pages expose too little hyperlink information. Research on mining association information from web pages with dynamic scripts is progressing gradually. This paper proposes an improved search framework that handles pages with dynamic scripts more efficiently. By building state information tables that track the changes a page undergoes under the same URL, together with state transition chains for page loading, the paper then presents an analysis algorithm based on state-interrelated matching of these pages. Finally, the paper describes the complete implementation of the algorithm in detail and demonstrates its efficiency with experimental results.
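The abstract does not specify the layout of the state information tables or the matching measure, so the following is only a minimal sketch of the general idea under stated assumptions. The names `PageStates`, `reachable`, and `match_states` are hypothetical, each state is assumed to be summarized by a set of terms, and Jaccard similarity is swapped in as a stand-in for the paper's state-interrelated matching, which may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PageStates:
    """Hypothetical state information table for one URL (not the paper's layout)."""
    url: str
    # state id -> set of terms visible in that state (assumed representation)
    states: dict[str, set[str]] = field(default_factory=dict)
    # state transition chain: (from_state, event, to_state), in loading order
    transitions: list[tuple[str, str, str]] = field(default_factory=list)

    def add_state(self, state_id: str, terms) -> None:
        self.states[state_id] = set(terms)

    def add_transition(self, src: str, event: str, dst: str) -> None:
        # Both endpoints must already be recorded in the state table.
        if src not in self.states or dst not in self.states:
            raise KeyError(f"unknown state in transition {src!r} -> {dst!r}")
        self.transitions.append((src, event, dst))

    def reachable(self, start: str) -> set[str]:
        """Return every state reachable from `start` by following transitions."""
        seen = {start}
        frontier = [start]
        while frontier:
            current = frontier.pop()
            for src, _event, dst in self.transitions:
                if src == current and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
        return seen


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; an assumed stand-in for the paper's matching measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_states(p: PageStates, q: PageStates, start_p: str, start_q: str,
                 threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Pair up reachable states of two pages whose similarity meets the threshold."""
    pairs = []
    for sp in sorted(p.reachable(start_p)):
        for sq in sorted(q.reachable(start_q)):
            score = jaccard(p.states[sp], q.states[sq])
            if score >= threshold:
                pairs.append((sp, sq, score))
    # Highest-scoring pairs first.
    pairs.sort(key=lambda item: -item[2])
    return pairs
```

In this sketch, two pages are associated through their states rather than through hyperlinks: a state reached only after a script-driven transition can still be matched against a state of another page, which is the kind of information a purely link-based analysis would miss.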
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, T., Tan, L. (2011). An Efficient Algorithm of Association Information Mining on Web Pages with Dynamic Scripts. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_46
DOI: https://doi.org/10.1007/978-3-642-24273-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24272-4
Online ISBN: 978-3-642-24273-1
eBook Packages: Computer Science, Computer Science (R0)