Abstract
Accurately extracting the features of a web page is one of the basic problems of Web data mining. Taking the structure of the web page into account, this article introduces a block-based feature selection method. A neural network is used to recognize the priorities of the different web page blocks, and the VPDom tree is then built. The experiment shows that block-based feature selection can filter out the “noise” and enhance the main content of the web page.
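As a rough illustration of the idea of block-based feature selection, the sketch below weights each term by the priority of the block it appears in, so that terms from low-priority blocks (navigation bars, advertisements) contribute little or nothing. It is not the authors' implementation: the function name `block_weighted_terms`, the sample page, and the priority values are hypothetical, and the priorities are hard-coded here rather than predicted by a neural network. The VPDom tree is not modeled at all.

```python
import re
from collections import Counter


def block_weighted_terms(blocks, top_k=5):
    """Rank terms by the summed priority of the blocks they occur in.

    blocks: list of (text, priority) pairs, with priority assumed to lie
    in [0, 1]. In the paper a neural network assigns block priorities;
    here they are simply supplied by the caller (an assumption).
    """
    scores = Counter()
    for text, priority in blocks:
        for term in re.findall(r"[a-z]+", text.lower()):
            scores[term] += priority
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical page split into three blocks with hand-picked priorities.
page = [
    ("neural network feature selection for web pages", 1.0),  # main content
    ("home news contact login", 0.1),                         # navigation bar
    ("buy cheap network cables now", 0.0),                    # advertisement
]

top = block_weighted_terms(page, top_k=3)
print(top)
```

With these made-up priorities, terms that occur only in the advertisement block receive a score of zero, while terms from the main content block dominate the ranking.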
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, Y., Liu, R., He, X., Huang, Y. (2011). Block Based Web Page Feature Selection with Neural Network. In: Lin, S., Huang, X. (eds) Advances in Computer Science, Environment, Ecoinformatics, and Education. CSEE 2011. Communications in Computer and Information Science, vol 215. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23324-1_36
DOI: https://doi.org/10.1007/978-3-642-23324-1_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23323-4
Online ISBN: 978-3-642-23324-1
eBook Packages: Computer Science, Computer Science (R0)