World Knowledge as Indirect Supervision for Document Clustering

Published: 26 December 2016

Abstract

One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge that is not designed for any specific domain. The key challenges are therefore how to adapt world knowledge to a domain and how to represent it for learning. In this article, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and we represent the data together with the world knowledge as a heterogeneous information network. We then propose a clustering algorithm that can cluster objects of multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps its knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
A preliminary version of this work appeared in the proceedings of KDD 2015 [Wang et al. 2015a]. This journal version makes several major improvements. First, we propose a new and general framework for machine learning with world knowledge as indirect supervision, of which the document clustering in the original paper is a special case. Second, to make our unsupervised semantic parsing method easier to understand, we add several real examples that trace original sentences to the resulting logical forms with all the necessary information. Third, we add details of the three semantic filtering methods and conduct an in-depth analysis of them, using case studies to show why the conceptualization-based semantic filter can produce more accurate indirect supervision. Finally, in addition to the experiment on the 20newsgroups data and Freebase, we extend the clustering experiments to all combinations of text datasets (20newsgroups, MCAT, CCAT, ECAT) and world knowledge sources (Freebase, YAGO2).
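As a rough illustration of the representation described above, the following sketch builds a toy heterogeneous information network from documents and typed entities, and derives pairwise constraints from shared sub-types. It is not the authors' algorithm: the data, the function names `build_network` and `must_link_constraints`, and the reading of "sub-type information as constraints" as must-link pairs are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the paper): entities linked to each
# document, and a (type, sub_type) pair per entity as a knowledge base
# such as Freebase or YAGO2 might provide.
doc_entities = {
    "d1": ["Barack Obama", "United States"],
    "d2": ["Angela Merkel", "Germany"],
    "d3": ["Lionel Messi", "Argentina"],
}
entity_types = {
    "Barack Obama": ("person", "politician"),
    "Angela Merkel": ("person", "politician"),
    "Lionel Messi": ("person", "athlete"),
    "United States": ("location", "country"),
    "Germany": ("location", "country"),
    "Argentina": ("location", "country"),
}


def build_network(doc_entities, entity_types):
    """Return typed nodes and document-entity edges of a toy network."""
    nodes = {doc: "document" for doc in doc_entities}
    edges = []
    for doc, entities in doc_entities.items():
        for ent in entities:
            nodes[ent] = entity_types[ent][0]  # node type, e.g. "person"
            edges.append((doc, ent))
    return nodes, edges


def must_link_constraints(entity_types):
    """Assumed constraint form: pair entities sharing (type, sub_type)."""
    groups = defaultdict(list)
    for ent, key in entity_types.items():
        groups[key].append(ent)
    pairs = set()
    for members in groups.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


nodes, edges = build_network(doc_entities, entity_types)
constraints = must_link_constraints(entity_types)
```

In this toy setting, the two politicians form one constraint pair and the three countries form three, while the lone athlete has none. A constrained clustering method could then penalize assignments that separate such pairs.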

References

[1]
Andrew Arnold, Ramesh Nallapati, and William W. Cohen. 2007. A comparative study of methods for transductive transfer learning. In Proceedings of ICDM Workshop on Mining and Management of Biological Data (ICDM’07). 77--82.
[2]
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. Springer.
[3]
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’07). 2670--2676.
[4]
Sugato Basu, Arindam Banerjee, and Raymond J. Mooney. 2002. Semi-supervised clustering by seeding. In Proceedings of International Conference on Machine Learning (ICML’02). 27--34.
[5]
Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney. 2004. A probabilistic framework for semi-supervised clustering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’04). 59--68.
[6]
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’13). 1533--1544.
[7]
Jonathan Berant and Percy Liang. 2014. Semantic parsing via paraphrasing. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’14). 1415--1425.
[8]
Kurt D. Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of SIGMOD Conference. 1247--1250.
[9]
Guillaume Bouchard, Dawei Yin, and Shengbo Guo. 2013. Convex collective matrix factorization. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS’13). 144--152.
[10]
Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’13). 423--433.
[11]
Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2012. Structured learning with constrained conditional models. Machine Learning 88, 3 (2012), 399--431.
[12]
O. Chapelle, B. Schölkopf, and A. Zien (Eds.). 2006. Semi-Supervised Learning. MIT Press.
[13]
Zhiyuan Chen and Bing Liu. 2014. Topic modeling using topics from many domains, lifelong learning and big data. In Proceedings of International Conference on Machine Learning (ICML’14). 703--711.
[14]
James Clarke and Mirella Lapata. 2006. Constraint-based sentence compression: An integer programming approach. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’06). 144--151.
[15]
Michael Collins and Nigel Duffy. 2001. Convolution kernels for natural language. In Proceedings of Advances in Neural Information Processing Systems (NIPS’01). 625--632.
[16]
Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. 2007a. Co-clustering based classification for out-of-domain documents. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’07). 210--219.
[17]
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2007b. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning. ACM, 193--200.
[18]
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, and Yong Yu. 2008. Self-taught clustering. In Proceedings of International Conference on Machine Learning (ICML’08). 200--207.
[19]
Rina Dechter and Robert Mateescu. 2004. Mixtures of deterministic-probabilistic networks and their AND/OR search space. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI’04). 120--129.
[20]
Pascal Denis and Jason Baldridge. 2007. Joint determination of anaphoricity and coreference resolution using integer programming. In Proceedings of NAACL Conference. 236--243.
[21]
Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. 2003. Information-theoretic co-clustering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’03). 89--98.
[22]
Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’14). 601--610.
[23]
Eric Eaton and Paul L. Ruvolo. 2013. ELLA: An efficient lifelong learning algorithm. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 507--515.
[24]
Ofer Egozi, Shaul Markovitch, and Evgeniy Gabrilovich. 2011. Concept-based information retrieval using explicit semantic analysis. ACM Transactions on Information Systems 29, 2 (2011), 8.
[25]
Oren Etzioni, Michael Cafarella, and Doug Downey. 2004. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of International Conference on World Wide Web (WWW’04). 100--110.
[26]
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’11). 1535--1545.
[27]
Anthony Fader, Luke S. Zettlemoyer, and Oren Etzioni. 2013. Paraphrase-driven learning for open question answering. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’13). 1608--1618.
[28]
Christiane Fellbaum (Ed.). 1998. WordNet: An Electronic Lexical Database. MIT Press.
[29]
Samah Fodeh, Bill Punch, and Pang-Ning Tan. 2011. On ontology-driven document clustering using core semantic features. Knowledge and Information Systems 28, 2 (2011), 395--421.
[30]
Evgeniy Gabrilovich and Shaul Markovitch. 2005. Feature generation for text categorization using world knowledge. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’05). 1048--1053.
[31]
Evgeniy Gabrilovich and Shaul Markovitch. 2006. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’06). 1301--1306.
[32]
Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’07). 1606--1611.
[33]
Evgeniy Gabrilovich and Shaul Markovitch. 2009. Wikipedia-based semantic interpretation for natural language processing. Journal of Artificial Intelligence Research 34 (2009), 443--498.
[34]
Kuzman Ganchev, Joao Graça, Jennifer Gillenwater, and Ben Taskar. 2010. Posterior regularization for structured latent variable models. Journal of Machine Learning Research 11 (2010), 2001--2049.
[35]
Jiawei Han, Yizhou Sun, Xifeng Yan, and Philip S. Yu. 2010. Mining knowledge from databases: An information network analysis approach. In Proceedings of SIGMOD Conference. 1251--1252.
[36]
Huahai He and Ambuj K. Singh. 2006. Closure-tree: An index structure for graph queries. In Proceedings of IEEE International Conference on Data Engineering (ICDE’06). IEEE, 38--38.
[37]
Andreas Hotho, Steffen Staab, and Gerd Stumme. 2003. Ontologies improve text document clustering. In Proceedings of IEEE International Conference on Data Mining (ICDM’03). 541--544.
[38]
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua Li, Qiang Yang, and Zheng Chen. 2008. Enhancing text clustering by leveraging Wikipedia semantics. In Proceedings of ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR’08). 179--186.
[39]
Xia Hu, Nan Sun, Chao Zhang, and Tat-Seng Chua. 2009a. Exploiting internal and external semantics for the clustering of short texts using world knowledge. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’09). 919--928.
[40]
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. 2009b. Exploiting Wikipedia as external knowledge for document clustering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’09). 389--396.
[41]
Wen Hua, Yangqiu Song, Haixun Wang, and Xiaofang Zhou. 2013. Identifying users’ topical tasks in web search. In Proceedings of ACM International Conference on Web Search and Data Mining (WSDM’13). 93--102.
[42]
Arto Klami, Guillaume Bouchard, and Abhishek Tripathi. 2013. Group-sparse embeddings in collective matrix factorization. Technical Report, ArXiv.
[43]
Xiangnan Kong, Jiawei Zhang, and Philip S. Yu. 2013. Inferring anchor links across multiple heterogeneous social networks. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’13). 179--188.
[44]
Jayant Krishnamurthy and Tom M. Mitchell. 2012. Weakly supervised training of semantic parsers. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language (EMNLP-CoNLL’12). 754--765.
[45]
Tom Kwiatkowski, Eunsol Choi, Yoav Artzi, and Luke S. Zettlemoyer. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’13). 1545--1556.
[46]
Tom Kwiatkowski, Luke S. Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’11). 1512--1523.
[47]
Ken Lang. 1995. NewsWeeder: Learning to filter netnews. In Proceedings of International Conference on Machine Learning (ICML’95). 331--339.
[48]
Douglas B. Lenat and R. V. Guha. 1989. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley.
[49]
David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. 2004. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research 5 (2004), 361--397.
[50]
Lianghao Li and Qiang Yang. 2015. Lifelong machine learning test. In Proceedings of the AAAI Workshop.
[51]
Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, Dan Roth, and Xifeng Yan. 2013. Mining evidences for named entity disambiguation. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’13). 1070--1078.
[52]
Percy Liang. 2013. Lambda dependency-based compositional semantics. Technical Report, ArXiv.
[53]
Xiao Ling, Wenyuan Dai, Gui-Rong Xue, Qiang Yang, and Yong Yu. 2008. Spectral domain-transfer learning. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’08). 488--496.
[54]
Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu, and Sen Wu. 2013. Understanding and enhancement of internal clustering validation measures. IEEE Transactions on Cybernetics 43, 3 (2013), 982--994.
[55]
Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wú, and Philip S. Yu. 2006. Spectral clustering for multi-type relational data. In Proceedings of International Conference on Machine Learning (ICML’06). 585--592.
[56]
Bo Long, Zhongfei Mark Zhang, and Philip S. Yu. 2007. A probabilistic framework for relational clustering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’07). 470--479.
[57]
Zhongqi Lu, Weike Pan, Evan Wei Xiang, Qiang Yang, Lili Zhao, and ErHeng Zhong. 2013. Selective transfer learning for cross domain recommendation. In Proceedings of SIAM International Conference on Data Mining (SDM’13). 641--649.
[58]
Zhongqi Lu, Yin Zhu, Sinno Jialin Pan, Evan Wei Xiang, Yujing Wang, and Qiang Yang. 2014. Source free transfer learning for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’14). 122--128.
[59]
Ping Luo, Hui Xiong, Guoxing Zhan, Junjie Wu, and Zhongzhi Shi. 2009. Information-theoretic distance measures for clustering validation: Generalization and normalization. IEEE Transactions on Knowledge and Data Engineering 21, 9 (2009), 1249--1262.
[60]
Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language (EMNLP-CoNLL’12). 523--534.
[61]
Lilyana Mihalkova, Tuyen Huynh, and Raymond J. Mooney. 2007. Mapping and revising Markov logic networks for transfer learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’07), Vol. 7. 608--614.
[62]
Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the ACL/AFNLP Conference. 1003--1011.
[63]
Tom M. Mitchell, William W. Cohen, Estevam R. Hruschka Jr., Partha Pratim Talukdar, Justin Betteridge, Andrew Carlson, Bhavana Dalvi Mishra, Matthew Gardner, Bryan Kisiel, Jayant Krishnamurthy, Ni Lao, Kathryn Mazaitis, Thahir Mohamed, Ndapandula Nakashole, Emmanouil Antonios Platanios, Alan Ritter, Mehdi Samadi, Burr Settles, Richard C. Wang, Derry Tanti Wijaya, Abhinav Gupta, Xinlei Chen, Abulhair Saparov, Malcolm Greaves, and Joel Welling. 2015. Never-ending learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’15). 2302--2310.
[64]
Raymond J. Mooney. 2007. Learning for semantic parsing. In Proceedings of Conference on Intelligent Text Processing and Computational Linguistics (CICLing’07). 311--324.
[65]
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of International Conference on Machine Learning (ICML’11). 809--816.
[66]
Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345--1359.
[67]
Simone Paolo Ponzetto and Michael Strube. 2007. Deriving a large-scale taxonomy from Wikipedia. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’07). 1440--1445.
[68]
Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics 34, 2 (2008), 257--287.
[69]
Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In Proceedings of Conference on Computational Natural Language Learning (CoNLL’09). 147--155.
[70]
Siva Reddy, Mirella Lapata, and Mark Steedman. 2014. Large-scale semantic parsing without question-answer pairs. Transactions of the Association for Computational Linguistics 2 (2014), 377--392.
[71]
Matthew Richardson and Pedro Domingos. 2006. Markov logic networks. Machine Learning 62, 1--2 (2006), 107--136.
[72]
Dan Roth and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. In Proceedings of Conference on Computational Natural Language Learning (CoNLL’04). 1--8.
[73]
Dan Roth and Wen-tau Yih. 2005. Integer linear programming inference for conditional random fields. In Proceedings of International Conference on Machine Learning (ICML’05). 736--743.
[74]
Dan Roth and Wen-tau Yih. 2007. Global inference for entity and relation identification via a linear programming formulation. Introduction to Statistical Relational Learning (2007), MIT Press, 553--580.
[75]
Rajhans Samdani, Ming-Wei Chang, and Dan Roth. 2012. Unified expectation maximization. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL’12). 688--698.
[76]
Ajit P. Singh and Geoffrey J. Gordon. 2008. Relational learning via collective matrix factorization. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’08). 650--658.
[77]
Yangqiu Song, Shixia Liu, Xueqing Liu, and Haixun Wang. 2015. Automatic taxonomy construction from keywords via scalable Bayesian rose trees. IEEE Transactions on Knowledge and Data Engineering 27, 7 (2015), 1861--1874.
[78]
Yangqiu Song, Shimei Pan, Shixia Liu, Furu Wei, M. X. Zhou, and Weihong Qian. 2013. Constrained text coclustering with supervised and unsupervised constraints. IEEE Transactions on Knowledge and Data Engineering 25, 6 (2013), 1227--1239.
[79]
Yangqiu Song, Haixun Wang, Weizhu Chen, and Shusen Wang. 2014. Transfer understanding from head queries to tail queries. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’14). 1299--1308.
[80]
Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen. 2011. Short text conceptualization using a probabilistic knowledgebase. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’11). 2330--2336.
[81]
Yangqiu Song, Shusen Wang, and Haixun Wang. 2015. Open domain short text conceptualization: A generative + descriptive modeling approach. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’15). 3820--3826.
[82]
Alexander Strehl and Joydeep Ghosh. 2003. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3 (2003), 583--617.
[83]
Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A core of semantic knowledge. In Proceedings of International Conference on World Wide Web (WWW’07). 697--706.
[84]
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, and Jiawei Han. 2011a. Co-author relationship prediction in heterogeneous bibliographic networks. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks (ASONAM’11). IEEE, 121--128.
[85]
Yizhou Sun and Jiawei Han. 2012. Mining heterogeneous information networks: Principles and methodologies. Synthesis Lectures on Data Mining and Knowledge Discovery 3, 2 (2012), 1--159.
[86]
Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011b. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. In Proceedings of the VLDB Endowment (PVLDB’11). 992--1003.
[87]
Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu. 2012. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’12). 1348--1356.
[88]
S. V. N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, and Karsten M. Borgwardt. 2010. Graph kernels. Journal of Machine Learning Research 11 (Aug. 2010), 1201--1242.
[89]
Chenguang Wang, Nan Duan, Ming Zhou, and Ming Zhang. 2013. Paraphrasing adaptation for web search ranking. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’13). 41--46.
[90]
Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, and Jiawei Han. 2015a. Incorporating world knowledge to document clustering via heterogeneous information networks. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’15). 1215--1224.
[91]
Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. 2015b. KnowSim: A document similarity measure on structured heterogeneous information networks. In Proceedings of IEEE International Conference on Data Mining (ICDM’15). 1015--1020.
[92]
Chenguang Wang, Yangqiu Song, Dan Roth, Chi Wang, Jiawei Han, Heng Ji, and Ming Zhang. 2015c. Constrained information-theoretic tripartite graph clustering to identify semantically similar relations. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’15). 3882--3889.
[93]
Zheng Wang, Yangqiu Song, and Changshui Zhang. 2009. Knowledge transfer on hybrid graph. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI’09). 1291--1296.
[94]
Zhongyuan Wang, Fang Wang, Ji-Rong Wen, and Zhoujun Li. 2014a. Concept-based short text classification and ranking. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’14). 1069--1078.
[95]
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014b. Knowledge graph and text jointly embedding. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1591--1601.
[96]
Junjie Wu, Hui Xiong, and Jian Chen. 2009. Adapting the right measures for k-means clustering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’09). 877--886.
[97]
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q. Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proceedings of SIGMOD Conference. 481--492.
[98]
Chang Xu, Yalong Bai, Jiang Bian, Bin Gao, Gang Wang, Xiaoguang Liu, and Tie-Yan Liu. 2014. RC-NET: A general framework for incorporating knowledge into word representations. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’14). 1219--1228.
[99]
Xifeng Yan, Philip S. Yu, and Jiawei Han. 2004. Graph indexing: A frequent structure-based approach. In Proceedings of SIGMOD Conference. ACM, 335--346.
[100]
Xuchen Yao and Benjamin Van Durme. 2014. Information extraction over structured data: Question answering with Freebase. In Proceedings of Annual Meeting of the Association for Computational Linguistics (ACL’14). 956--966.
[101]
Jiawei Zhang, Xiangnan Kong, and Philip S. Yu. 2013. Predicting social links for new users across aligned heterogeneous social networks. In Proceedings of IEEE International Conference on Data Mining (ICDM’13). 1289--1294.
[102]
Jiawei Zhang, Xiangnan Kong, and Philip S. Yu. 2014. Transferring heterogeneous links across location-based social networks. In Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM’14). 303--312.
[103]
Shijie Zhang, Meng Hu, and Jiong Yang. 2007. TreePi: A novel graph indexing method. In Proceedings of the IEEE International Conference on Data Engineering (ICDE’07). 966--975.
[104]
Peixiang Zhao, Jiawei Han, and Yizhou Sun. 2009. P-rank: A comprehensive structural similarity measure over information networks. In Proceedings of ACM International Conference on Information and Knowledge (CIKM’09). 553--562.

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 2
May 2017
419 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3017677
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 December 2016
Accepted: 01 June 2016
Revised: 01 March 2016
Received: 01 October 2015
Published in TKDD Volume 11, Issue 2

Author Tags

  1. World knowledge
  2. document clustering
  3. heterogeneous information network
  4. knowledge base
  5. knowledge graph

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2022) Experience: Analyzing Missing Web Page Visits and Unintentional Web Page Visits from the Client-side Web Logs. Journal of Data and Information Quality 14:2 (1-17). DOI: 10.1145/3490392. Online publication date: 23-Mar-2022
  • (2022) DACHA: A Dual Graph Convolution Based Temporal Knowledge Graph Representation Learning Method Using Historical Relation. ACM Transactions on Knowledge Discovery from Data 16:3 (1-18). DOI: 10.1145/3477051. Online publication date: 30-Jun-2022
  • (2022) ICS: Total Freedom in Manual Text Classification Supported by Unobtrusive Machine Learning. IEEE Access 10 (64741-64760). DOI: 10.1109/ACCESS.2022.3184009. Online publication date: 2022
  • (2019) Time-Sync Video Tag Extraction Using Semantic Association Graph. ACM Transactions on Knowledge Discovery from Data 13:4 (1-24). DOI: 10.1145/3332932. Online publication date: 2-Jul-2019
  • (2018) Unsupervised meta-path selection for text similarity measure based on heterogeneous information networks. Data Mining and Knowledge Discovery 32:6 (1735-1767). DOI: 10.1007/s10618-018-0581-y. Online publication date: 1-Nov-2018
