Design and Implementation of an Ontology Algorithm for Web Documents Classification

Guiyi Wei, Jun Yu, Yun Ling & Jun Liu

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3983)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Traditional document classification methods require feature extraction and classifier training. Collecting trainable text terms is laborious and time-consuming, and extracting features from Chinese documents is especially difficult. To address this problem, this paper proposes an ontology-based approach that improves the efficiency and effectiveness of web document classification and retrieval. First, the approach establishes an ontology model based on the Hownet [6] knowledge base and its methodology. It then creates an ontology for each subclass of the classification system, using RDFS to convert Hownet into ontologies and to define the relations among them. Web documents are then classified automatically by an ontology relevance calculation algorithm. Compared with the KNN method [2], our experimental results indicate that the ontology-based approach achieves accuracy close to that of KNN, is more robust, and has a better recall rate.
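The abstract does not give the relevance formula, so the following is only a minimal sketch of what relevance-based classification against per-class ontologies might look like. The `ONTOLOGIES` table, its terms and weights, and the functions `relevance` and `classify` are all hypothetical and are not taken from the paper; the score used here (weighted concept matches divided by document length) is an assumed stand-in for the authors' algorithm.

```python
from collections import Counter

# Hypothetical toy "ontologies": each class maps concept terms to weights.
# Terms and weights are invented for illustration, not derived from Hownet.
ONTOLOGIES = {
    "sports": {"match": 1.0, "team": 0.8, "goal": 0.6},
    "finance": {"stock": 1.0, "market": 0.8, "bank": 0.6},
}


def relevance(tokens, concepts):
    """Assumed score: weighted concept matches divided by document length."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    score = sum(weight * counts[term] for term, weight in concepts.items())
    return score / len(tokens)


def classify(tokens, ontologies=ONTOLOGIES):
    """Return (best_class, scores); best_class is None if no concept matches."""
    scores = {name: relevance(tokens, concepts)
              for name, concepts in ontologies.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    label, scores = classify("the team scored a goal in the match".split())
    print(label, scores)
```

Unlike KNN, a scheme of this kind needs no labelled training documents: the class knowledge lives in the ontology tables, which is the trade-off the abstract points to.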

References

[1] Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)
[2] Baoli, L., Qin, L., Shiwen, Y.: An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP), 215–226 (2004)
[3] Kan, M.-Y.: Web page classification without the web page. In: Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pp. 262–263 (2004)
[4] Ehrig, M., Maedche, A.: Ontology-focused crawling of Web documents. In: Proceedings of the 2003 ACM Symposium on Applied Computing, pp. 1174–1178 (2003)
[5] Lauser, B., Wildemann, T., Poulos, A., Fisseha, F., Keizer, J., Katz, S.: A comprehensive framework for building multilingual domain ontologies: Creating a prototype biosecurity ontology. In: DC-2002: Metadata for e-Communities: Supporting Diversity and Convergence, Florence, Italy (October 2002)
[6] Dong, Z.: Knowledge Description: What, How and Who? In: Proceedings of International Symposium on Electronic Dictionary, Tokyo, Japan (1988)
[7] Web Ontology Language (OWL) (current November 10, 2005), http://www.w3.org/2004/OWL/
[8] Resource Description Framework (RDF) (current November 10, 2005), http://www.w3.org/RDF/
[9] Report SMI-2001-0880. Stanford Knowledge Systems Laboratory. Available at http://www.ksl.stanford.edu/people/dlm/papers/ontologytutorial-noy-mcguinness-abstract.html
[10] Arch-int, N.: A semantic information gathering approach for heterogeneous information sources on WWW. Journal of Information Science 29(5), 357 (2003)
[11] Martin, P., Eklund, P.: Embedding knowledge in Web documents. Computer Networks 31, 1403–1420 (1999)
[12] Liddy, E.D., Paik, W., Yu, E.S.: Text categorization for multiple users based on semantic features from a machine-readable dictionary. ACM Transactions on Information Systems 12(3), 278–295 (1994)
[13] Chen, L.-C., Luh, C.-J., Jou, C.: Generating page clippings from web search results using a dynamically terminated genetic algorithm. Information Systems, 299–316 (2005)

Author information

Authors and Affiliations

Zhejiang Gongshang University, Hangzhou, 310035, P. R. China
Guiyi Wei, Jun Yu, Yun Ling & Jun Liu

Editor information

Editors and Affiliations

Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Clayton School of IT, Monash University, 3800, Clayton, Australia
David Taniar
Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
School of Information and Communication Engineering, Sungkyunkwan University, Korea
Hyunseung Choo

About this paper

Cite this paper

Wei, G., Yu, J., Ling, Y., Liu, J. (2006). Design and Implementation of an Ontology Algorithm for Web Documents Classification. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751632_71

DOI: https://doi.org/10.1007/11751632_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34077-5
Online ISBN: 978-3-540-34078-2
eBook Packages: Computer Science, Computer Science (R0)
