Learning Semantic Representations from Directed Social Links to Tag Microblog Users at Scale

Published: 07 March 2020

Abstract

This article presents a network embedding approach to automatically generate tags for microblog users. Instead of using text data, we aim to annotate microblog users with meaningful tags by leveraging rich social link data. To utilize directed social links, we use two kinds of node representations that model a user's interests in terms of their followers and followees, respectively. To alleviate the sparsity problem, we propose a novel method based on two transformation functions for capturing implicit interest similarity. Unlike previous work on capturing high-order proximity, our model directly characterizes the effect of the context user on the proximity of node pairs. Another novelty of our model is that the importance scores of users, learned with the classic PageRank algorithm, are used to set the link weights. With such weights, our model is better able to disentangle the interest similarity evidence carried by a link. We jointly consider the above factors when designing the final objective function.
We construct a very large evaluation set consisting of 2.6M users, 0.5M tags, and 0.8B following links. To our knowledge, it is the largest reported dataset for microblog user tagging in the literature. Extensive experiments on this dataset demonstrate the effectiveness of the proposed approach. We implement this approach with several optimization techniques, which make the model easy to scale to very large social networks. Ubiquitous social links are an important data resource for understanding user interests. Our work provides an effective and efficient solution for annotating user interests using the link data alone, which has important practical value in industry. To illustrate the use of our models, we implement a demonstration system for visualizing, navigating, and searching microblog users.
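
The abstract names three ingredients: importance scores from classic PageRank, link weights derived from those scores, and two representations per user for directed links. The following is a minimal sketch of how these pieces could fit together on a toy graph, not the article's implementation. The names `pagerank`, `link_weights`, `link_score`, `as_source`, and `as_target` are hypothetical, and the inverse-importance weighting and the dot-product score are assumptions made only for illustration; the abstract does not state the actual weighting formula or objective.

```python
import numpy as np


def pagerank(edges, n, damping=0.85, iters=100, tol=1e-10):
    """Classic PageRank by power iteration.

    edges: list of (follower, followee) pairs, read as follower -> followee.
    n: number of users, indexed 0..n-1.
    """
    out_deg = np.zeros(n)
    for u, _ in edges:
        out_deg[u] += 1
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = np.zeros(n)
        for u, v in edges:
            new[v] += rank[u] / out_deg[u]
        # Rank held by users who follow nobody is spread uniformly.
        dangling = rank[out_deg == 0].sum()
        new = (1.0 - damping) / n + damping * (new + dangling / n)
        converged = np.abs(new - rank).sum() < tol
        rank = new
        if converged:
            break
    return rank


def link_weights(edges, rank):
    """ASSUMPTION: down-weight links that point at highly ranked users.

    This inverse-importance rule is illustrative only; the article's
    actual weighting scheme is not given in the abstract.
    """
    n = len(rank)
    return {(u, v): 1.0 / (n * rank[v]) for u, v in edges}


def link_score(as_source, as_target, u, v):
    """Score a directed link u -> v with two embeddings per user.

    as_source[u] is u's vector when acting as a follower and
    as_target[v] is v's vector when being followed (hypothetical names),
    so the score for u -> v generally differs from the score for v -> u.
    """
    logit = float(as_source[u] @ as_target[v])
    return 1.0 / (1.0 + np.exp(-logit))
```

For example, with `edges = [(0, 2), (1, 2), (2, 0), (3, 2)]`, user 2 is followed by three of the four users and receives the highest PageRank score, so under the assumed rule a link into user 2 gets a smaller weight than the link into user 0.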


Cited By

  • (2022) Hyperspherical Variational Co-embedding for Attributed Networks. ACM Transactions on Information Systems 40, 3 (1-36). DOI: 10.1145/3478284. Online publication date: 31-Jul-2022


Published In

ACM Transactions on Information Systems, Volume 38, Issue 2
April 2020
266 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3379433
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 March 2020
Accepted: 01 December 2019
Revised: 01 November 2019
Received: 01 September 2019
Published in TOIS Volume 38, Issue 2


Author Tags

  1. Microblog user tagging
  2. Network embedding
  3. Social importance

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Research Funds of Renmin University of China
  • National Natural Science Foundation of China
  • Beijing Outstanding Young Scientist Program
  • Fundamental Research Funds for the Central Universities
