An efficient content extraction method for webpage based on tag-line-block analysis
Recommendations
Web Content Extraction based on Webpage Layout Analysis
ITCS '10: Proceedings of the 2010 Second International Conference on Information Technology and Computer Science
For the web content extraction task, researchers have proposed many different methods, such as wrapper-based methods, DOM-tree rule-based methods, machine-learning-based methods, and so on. To some extent, all these methods ignore the layout information of the ...
Automatic Web Content Extraction for Generating Tag Clouds from Thai Web Sites
ICEBE '11: Proceedings of the 2011 IEEE 8th International Conference on e-Business Engineering
This paper proposes a novel Web content extraction approach based on heuristic rules and the XPath utility in XML. The main objective is to address the problem of Web visualization by generating tag clouds from Thai Web sites in order to provide an ...
Using main content extraction to improve performance of Vietnamese web page classification
SoICT '11: Proceedings of the 2nd Symposium on Information and Communication Technology
Web page classification is the process of categorizing a web page into one or more predetermined classes. If we remove all HTML tags from a web page, then this process can be considered a text classification problem. However, this ...
Information
Publisher
Springer-Verlag
Berlin, Heidelberg
Qualifiers
- Research-article
Funding Sources
- National Key Research and Development Program of China