short-paper

Distilling Knowledge on Text Graph for Social Media Attribute Inference

Authors:

Dinghao Wu

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2024-2028

https://doi.org/10.1145/3477495.3531968

Published: 07 July 2022

Abstract

The popularization of social media generates a large amount of user-oriented data, and text data in particular attracts researchers and speculators who seek to infer user attributes (e.g., age, gender) for their own purposes. This line of work generally casts attribute inference as a text classification problem and has begun to leverage graph neural networks for higher-level text representations. However, these text graphs are constructed over words, which leads to high memory consumption and poor effectiveness when only a few labeled texts are available. To address this challenge, we design a text-graph-based few-shot learning model for social media attribute inference. Our model builds a text graph with texts as nodes and with edges learned from the current text representations via manifold learning and message passing. To further exploit unlabeled texts and improve few-shot performance, we devise a knowledge distillation scheme to optimize the model. This design offers a trade-off between expressiveness and complexity. Experiments on social media datasets demonstrate that our model achieves state-of-the-art attribute inference performance with considerably fewer labeled texts.
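The pipeline the abstract describes (a graph whose nodes are texts, edges derived from the current text representations, message passing over that graph, and knowledge distillation) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The Gaussian-kernel edge weights with k-nearest-neighbour sparsification (in the spirit of transductive propagation networks), the parameter-free GCN-style symmetric normalization, and the temperature-scaled soft-target loss of Hinton et al. are assumed stand-ins; in the paper the edges are learned, whereas here `sigma` is a fixed hyperparameter. All function names and parameters (`build_text_graph`, `propagate`, `distillation_loss`, `sigma`, `k`, `temperature`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def build_text_graph(embeddings: List[Vector], sigma: float = 1.0, k: int = 2) -> Matrix:
    """Hypothetical helper: adjacency matrix of a graph whose nodes are texts.

    Edge weights use a Gaussian kernel on the squared Euclidean distance between
    text embeddings (an assumed choice). Each node keeps its top-k edges and the
    result is symmetrized, so a node may end up with more than k neighbours.
    """
    n = len(embeddings)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
                kernel[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda m: kernel[i][m], reverse=True)[:k]:
            adj[i][j] = adj[j][i] = kernel[i][j]  # kernel is symmetric
    return adj


def propagate(adj: Matrix, features: List[Vector]) -> List[Vector]:
    """One parameter-free message-passing step: D^-1/2 (A + I) D^-1/2 H."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always > 0 thanks to the self-loops
    out = []
    for i in range(n):
        row = [0.0] * len(features[i])
        for j in range(n):
            if a_hat[i][j] > 0.0:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                row = [r + w * f for r, f in zip(row, features[j])]
        out.append(row)
    return out


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Vector, student_logits: Vector,
                      temperature: float = 2.0) -> float:
    """Hinton-style soft-target loss: T^2 * KL(teacher_T || student_T)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Made-up 2-D embeddings for four texts forming two loose clusters.
    emb = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
    adj = build_text_graph(emb, sigma=0.5, k=1)
    print(propagate(adj, emb))
    print(distillation_loss([2.0, 0.5], [1.5, 1.0]))
```

With these toy inputs, texts 0 and 1 (and texts 2 and 3) become neighbours, and propagation pulls their features toward each other. In a full model one would expect the embeddings to come from a trained text encoder, the propagation step to carry learnable weights, and the distillation term to be computed on unlabeled texts alongside a supervised loss on the few labeled ones; the abstract does not specify these details.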

Supplementary Material

MP4 File (SIGIR2022_sp1432.mp4, 6.82 MB)

This is the presentation video for the paper "Distilling Knowledge on Text Graph for Social Media Attribute Inference." The video presents our motivation, describes our model in detail, and briefly analyzes our results.


Index Terms

Distilling Knowledge on Text Graph for Social Media Attribute Inference
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation


Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN:9781450387323

DOI:10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States



Conference

SIGIR '22

Sponsor:

SIGIR

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%


Bibliometrics

Total Citations: 11
Total Downloads: 264
Downloads (last 12 months): 55
Downloads (last 6 weeks): 7

Reflects downloads up to 03 Mar 2025


Cited By

Tian Y, Pei S, Zhang X, Zhang C, Chawla N (2025). Knowledge Distillation on Graphs: A Survey. ACM Computing Surveys 57:8 (1-16). DOI: 10.1145/3711121. Online publication date: 30-Jan-2025.
https://dl.acm.org/doi/10.1145/3711121

Sun Z, Chen Z, Zhang J, Hao D (2024). Fairness Testing of Machine Translation Systems. ACM Transactions on Software Engineering and Methodology 33:6 (1-27). DOI: 10.1145/3664608. Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3664608

Chen Z, Mao H, Li H, Jin W, Wen H, Wei X, Wang S, Yin D, Fan W, Liu H, Tang J (2024). Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. ACM SIGKDD Explorations Newsletter 25:2 (42-61). DOI: 10.1145/3655103.3655110. Online publication date: 28-Mar-2024.
https://dl.acm.org/doi/10.1145/3655103.3655110

Jing S, Chen L, Li Q, Wu D (2024). DOS-GNN: Dual-Feature Aggregations with Over-Sampling for Class-Imbalanced Fraud Detection On Graphs. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10650494. Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10650494

Li Q, Zhao T, Chen L, Xu J, Wang S (2024). Enhancing Graph Neural Networks with Limited Labeled Data by Actively Distilling Knowledge from Large Language Models. 2024 IEEE International Conference on Big Data (BigData) (741-746). DOI: 10.1109/BigData62323.2024.10825477. Online publication date: 15-Dec-2024.
https://doi.org/10.1109/BigData62323.2024.10825477

Ashmore B, Chen L (2024). Leveraging Homophily-Augmented Energy Propagation for Bot Detection on Graphs. Database Systems for Advanced Applications (68-83). DOI: 10.1007/978-981-97-5572-1_5. Online publication date: 31-Aug-2024.
https://doi.org/10.1007/978-981-97-5572-1_5

Jing S, Chen L, Li Q, Wu D (2024). H²GNN: Graph Neural Networks with Homophilic and Heterophilic Feature Aggregations. Database Systems for Advanced Applications (342-352). DOI: 10.1007/978-981-97-5572-1_23. Online publication date: 31-Aug-2024.
https://doi.org/10.1007/978-981-97-5572-1_23

Li X, Chen L, Wu D (2023). Adversary for Social Good: Leveraging Adversarial Attacks to Protect Personal Attribute Privacy. ACM Transactions on Knowledge Discovery from Data 18:2 (1-24). DOI: 10.1145/3614098. Online publication date: 13-Nov-2023.
https://dl.acm.org/doi/10.1145/3614098

Ashmore B, Chen L (2023). HOVER: Homophilic Oversampling via Edge Removal for Class-Imbalanced Bot Detection on Graphs. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, eds. Frommholz I, Hopfgartner F, Lee M, Oakes M, Lalmas M, Zhang M, Santos R (3728-3732). DOI: 10.1145/3583780.3615264. Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3583780.3615264

Li Q, Chen L, Jing S, Wu D (2023). Pseudo-Labeling with Graph Active Learning for Few-shot Node Classification. 2023 IEEE International Conference on Data Mining (ICDM) (1115-1120). DOI: 10.1109/ICDM58522.2023.00133. Online publication date: 1-Dec-2023.
https://doi.org/10.1109/ICDM58522.2023.00133
