Abstract
Traditional document classification methods require feature extraction and classifier training. Collecting trainable text terms is laborious and time-consuming, and extracting features from Chinese documents is especially difficult. To address these problems, this paper proposes an ontology-based approach that improves the efficiency and effectiveness of web document classification and retrieval. First, the approach establishes an ontology model based on the Hownet[6] knowledge base and its method. Then it creates an ontology for each subclass of the classification system, using RDFS to convert Hownet into ontologies and to define the relations among them. Web documents are then classified automatically by an ontology relevance calculation algorithm. Compared with KNN[2], our experiments indicate that the ontology-based approach achieves accuracy close to that of KNN, is more robust, and attains a better recall rate.
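The abstract does not spell out the relevance calculation, so the following is only a minimal sketch of the general idea of classifying by ontology relevance, under stated assumptions: each class ontology is reduced to a hypothetical dictionary of weighted concept terms, and relevance is taken to be the weighted share of document tokens that match those concepts. The names `CLASS_ONTOLOGIES`, `relevance`, and `classify`, as well as the terms and weights, are illustrative and do not come from the paper.

```python
from collections import Counter

# Hypothetical class ontologies: each maps concept terms to a weight.
# In the paper these would be derived from Hownet; the terms and weights
# below are invented purely for illustration.
CLASS_ONTOLOGIES = {
    "sports": {"football": 1.0, "match": 0.6, "team": 0.5},
    "finance": {"stock": 1.0, "market": 0.7, "bank": 0.8},
}


def relevance(tokens, ontology):
    """Weighted share of document tokens that match ontology concepts."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    matched = sum(weight * counts[term] for term, weight in ontology.items())
    return matched / len(tokens)


def classify(tokens, ontologies=CLASS_ONTOLOGIES):
    """Return (best_class, scores); best_class is None if nothing matches."""
    scores = {name: relevance(tokens, onto) for name, onto in ontologies.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


label, scores = classify("the team won the football match".split())
print(label, scores)
```

Unlike KNN, a scheme of this kind needs no labelled training documents, only the per-class ontologies, which is the trade-off the abstract emphasizes.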
References
Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20, 273–297 (1995)
Baoli, L., Qin, L., Shiwen, Y.: An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP), 215–226 (2004)
Kan, M.-Y.: Web page classification without the web page. In: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters table of contents, pp. 262–263 (2004)
Ehrig, M., Maedche, A.: Ontology-focused crawling of Web documents. In: Proceedings of the 2003 ACM symposium on Applied computing, pp. 1174–1178 (2003)
Lauser, B., Wildemann, T., Poulos, A., Fisseha, F., Keizer, J., Katz, S.: A comprehensive framework for building multilingual domain ontologies: Creating a prototype biosecurity ontology. In: DC-2002: Metadata for e-Communities: Supporting Diversity and Convergence, Florence, Italy (October 2002)
Dong, Z.: Knowledge Description: What, How and Who? In: Proceedings of International Symposium on Electronic Dictionary, Tokyo, Japan (1988)
Web Ontology Language (OWL), (Current November 10, 2005), http://www.w3.org/2004/OWL/
Resource Description Framework (RDF), (Current November 10, 2005), http://www.w3.org/RDF/
Report SMI-2001-0880. Stanford Knowledge Systems Laboratory. Available at http://www.ksl.stanford.edu/people/dlm/papers/ontologytutorial-noy-mcguinness-abstract.html
Arch-int, N.: A semantic information gathering approach for heterogeneous information sources on WWW. Journal of Information Science 29(5), 357 (2003)
Martin, P., Eklund, P.: Embedding knowledge in Web documents. Computer Networks 31, 1403–1420 (1999)
Liddy, E.D., Paik, W., Yu, E.S.: Text categorization for multiple users based on semantic features from a machine-readable dictionary. ACM Transactions on Information Systems 12(3), 278–295 (1994)
Chen, L.-C., Luh, C.-J., Jou, C.: Generating page clippings from web search results using a dynamically terminated genetic algorithm. Information Systems, 299–316 (2005)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, G., Yu, J., Ling, Y., Liu, J. (2006). Design and Implementation of an Ontology Algorithm for Web Documents Classification. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751632_71
DOI: https://doi.org/10.1007/11751632_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34077-5
Online ISBN: 978-3-540-34078-2
eBook Packages: Computer Science, Computer Science (R0)