DOI: 10.1145/2020408.2020597
poster

Ontology enhancement and concept granularity learning: keeping yourself current and adaptive

Published: 21 August 2011

Abstract

WordNet is a well-known semantic repository that is widely used in many applications. However, because editing and maintaining it is costly, WordNet keeps up with newly emerging concepts poorly compared with online encyclopedias such as Wikipedia. To keep WordNet current with folk wisdom, we propose a method that enhances WordNet automatically by merging Wikipedia entities into it, constructing an enriched ontology named WorkiNet. WorkiNet keeps the desirable structure of WordNet while capturing abundant information from Wikipedia. We also propose a learning approach that generates a tailor-made collection of semantic concepts for a given document collection. The learning process takes the characteristics of the document collection into account, and the semantic concepts in the tailor-made collection can be used as new features for document representation. The experimental results show that the adaptively generated feature space can significantly outperform a static one in text mining tasks, and that WorkiNet outperforms WordNet most of the time due to its high coverage.
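
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes a tiny hand-written taxonomy in place of WordNet, a hypothetical dictionary of new entities in place of Wikipedia, and a fixed `depth` parameter in place of the learned concept granularity. All names (`BASE_TAXONOMY`, `NEW_ENTITIES`, `enrich`, `concept_features`) are illustrative assumptions. It shows how attaching new entities raises coverage, and how documents can be represented by concept counts at a chosen granularity.

```python
from collections import Counter

# Hypothetical toy taxonomy: each concept maps to its parent (hypernym).
BASE_TAXONOMY = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "artifact": "entity",
    "software": "artifact",
}

# Hypothetical new entities and the existing concept each attaches to,
# standing in for encyclopedia entries merged into the base taxonomy.
NEW_ENTITIES = {"labradoodle": "dog", "linux": "software"}


def enrich(base, new_entities):
    """Return a copy of `base` with each new entity attached to its parent."""
    enriched = dict(base)
    for entity, parent in new_entities.items():
        if parent in enriched:  # attach only under concepts that exist
            enriched[entity] = parent
    return enriched


def path_to_root(taxonomy, concept):
    """Return the list of concepts from the root down to `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = taxonomy[concept]
    return path[::-1]


def concept_features(tokens, taxonomy, depth):
    """Count concepts at the given depth; tokens not in the taxonomy are skipped."""
    counts = Counter()
    for token in tokens:
        if token in taxonomy:
            path = path_to_root(taxonomy, token)
            # Generalise to the ancestor at `depth`; shallower tokens keep themselves.
            counts[path[min(depth, len(path) - 1)]] += 1
    return counts


doc = ["labradoodle", "cat", "linux"]
enriched = enrich(BASE_TAXONOMY, NEW_ENTITIES)
base_features = concept_features(doc, BASE_TAXONOMY, depth=1)
enriched_features = concept_features(doc, enriched, depth=1)
```

With the base taxonomy only `cat` is covered, so the document yields a single `animal` count. With the enriched taxonomy all three tokens map to a concept. Changing `depth` changes how coarse the resulting features are. In the paper this granularity is learned from the document collection rather than fixed by hand.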

Published In

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ontology
  2. tailor-made concept representation learning
  3. workinet

Qualifiers

  • Poster

Conference

KDD '11

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2020) A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario. International Journal of Information Technology & Decision Making, pages 1-41. DOI: 10.1142/S0219622020500182. Online publication date: 12-Jun-2020.
  • (2015) Adaptive Concept Resolution for document representation and its applications in text mining. Knowledge-Based Systems, 74:1, pages 1-13. DOI: 10.1016/j.knosys.2014.10.003. Online publication date: 1-Jan-2015.
  • (2013) Towards an enhanced and adaptable ontology by distilling and assembling online encyclopedias. Proceedings of the 22nd ACM international conference on Information & Knowledge Management, pages 1703-1708. DOI: 10.1145/2505515.2505597. Online publication date: 27-Oct-2013.
  • (2012) Actively Mining Search Logs for Diverse Tags. Information Retrieval Technology, pages 528-538. DOI: 10.1007/978-3-642-35341-3_48. Online publication date: 2012.
  • (2011) CCE. Proceedings of the 7th international conference on Advanced Data Mining and Applications - Volume Part I, pages 201-214. DOI: 10.1007/978-3-642-25853-4_16. Online publication date: 17-Dec-2011.
