Abstract
This paper proposes a method that uses an ontology hierarchy for automatic topic identification. The fundamental idea is to exploit the hierarchical structure of an ontology to find the topic of a text. Keywords extracted from a given text are mapped onto their corresponding concepts in the ontology. By optimizing over these concepts, we pick a single node among the concept nodes that we believe is the topic of the target text. However, a limited-vocabulary problem arises when mapping keywords onto their corresponding concepts. This forces us to extend the ontology by enriching each of its concepts with new concepts drawn from an external linguistic knowledge base (WordNet). Our intuition is that the more keywords are mapped onto ontology concepts, the better our topic identification technique performs.
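The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the single topic node is chosen, so the selection rule below (the deepest node whose subtree receives more than half of the mapped keyword hits) is an assumption made for illustration, not the authors' method. The toy ontology, the concept vocabulary, and the `ENRICHMENT` dictionary standing in for WordNet are likewise hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "sport": "entity",
    "finance": "entity",
    "football": "sport",
    "tennis": "sport",
    "banking": "finance",
}

# Hypothetical concept vocabulary: concept -> words that denote it.
LEXICON = {
    "football": {"football", "goal"},
    "tennis": {"tennis", "racket"},
    "banking": {"bank", "loan"},
}

# Stand-in for WordNet-based enrichment: extra words per concept.
ENRICHMENT = {
    "football": {"soccer", "striker"},
    "tennis": {"serve"},
}


def build_index(lexicon, enrichment=None):
    """Return a word -> concept lookup, optionally including enrichment words."""
    index = {}
    for source in (lexicon, enrichment or {}):
        for concept, words in source.items():
            for word in words:
                index.setdefault(word, concept)
    return index


def depth(node):
    """Number of edges between a node and the root."""
    d = 0
    while PARENT[node] is not None:
        node = PARENT[node]
        d += 1
    return d


def identify_topic(keywords, index, threshold=0.5):
    """Assumed heuristic: deepest node whose subtree holds more than
    `threshold` of all mapped keyword hits. Returns None if nothing maps."""
    hits = Counter(index[k] for k in keywords if k in index)
    total = sum(hits.values())
    if total == 0:
        return None
    # Propagate each concept's hit count up to the root.
    subtree = Counter()
    for concept, n in hits.items():
        node = concept
        while node is not None:
            subtree[node] += n
            node = PARENT[node]
    candidates = [c for c, n in subtree.items() if n / total > threshold]
    return max(candidates, key=depth)


keywords = ["soccer", "striker", "goal", "serve", "loan"]
print(identify_topic(keywords, build_index(LEXICON)))              # entity
print(identify_topic(keywords, build_index(LEXICON, ENRICHMENT)))  # football
```

With the unenriched vocabulary only two of the five keywords map to a concept, and the rule falls back to the uninformative root. After enrichment all five map, and the more specific node is selected, which is consistent with the intuition stated in the abstract.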
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tiun, S., Abdullah, R., Kong, T.E. (2001). Automatic Topic Identification Using Ontology Hierarchy. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2001. Lecture Notes in Computer Science, vol 2004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44686-9_43
DOI: https://doi.org/10.1007/3-540-44686-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41687-6
Online ISBN: 978-3-540-44686-6
eBook Packages: Springer Book Archive