DOI: 10.1145/3477495.3532021
research-article

MGPolicy: Meta Graph Enhanced Off-policy Learning for Recommendations

Published: 07 July 2022

Abstract

Off-policy learning has drawn considerable attention in recommender systems (RS) because it lets reinforcement learning avoid expensive online training. However, off-policy learning from logged data suffers from bias caused by the policy shift between the target policy and the logging policy. Consequently, most off-policy methods resort to inverse propensity scoring (IPS), which tends to overfit to exposed (or recommended) items and thus fails to explore unexposed items.
In this paper, we propose meta graph enhanced off-policy learning (MGPolicy), which is the first recommendation model for correcting the off-policy bias via contextual information. In particular, we explicitly leverage the rich semantics in meta graphs for user state representation, and then train the candidate generation model to promote an efficient search in the action space. Moreover, MGPolicy is designed with counterfactual risk minimization, which can correct policy learning bias and ultimately yield an effective target policy that maximizes the long-run rewards of the recommendation. We extensively evaluate our method through a series of simulations and on large-scale real-world datasets, achieving favorable results compared with state-of-the-art methods. Our code is currently available online.
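
To make the two estimators named in the abstract concrete, the following is a minimal sketch of generic clipped IPS and a variance-penalized objective in the spirit of counterfactual risk minimization. It is not the MGPolicy implementation: the function names (`ips_terms`, `crm_objective`), the clipping threshold `clip`, the penalty weight `lam`, and the reward-maximizing form of the objective are all assumptions chosen for illustration.

```python
import math


def ips_terms(rewards, target_probs, logging_probs, clip=10.0):
    """Per-sample clipped IPS terms: min(pi_target / pi_logging, clip) * reward.

    Hypothetical helper for illustration; `clip` caps the importance weight
    so that rarely exposed items do not dominate the estimate.
    """
    if not (len(rewards) == len(target_probs) == len(logging_probs)):
        raise ValueError("all inputs must have the same length")
    terms = []
    for reward, p_target, p_logging in zip(rewards, target_probs, logging_probs):
        if p_logging <= 0.0:
            raise ValueError("logging propensities must be positive")
        weight = min(p_target / p_logging, clip)
        terms.append(weight * reward)
    return terms


def crm_objective(rewards, target_probs, logging_probs, clip=10.0, lam=0.1):
    """Variance-penalized IPS value: mean(terms) - lam * sqrt(var(terms) / n).

    Assumed reward-maximizing form of a CRM-style objective; the penalty
    discourages policies whose estimated value rests on high-variance weights.
    """
    terms = ips_terms(rewards, target_probs, logging_probs, clip)
    n = len(terms)
    if n == 0:
        raise ValueError("need at least one logged interaction")
    mean = sum(terms) / n
    # Unbiased sample variance; defined as 0 for a single sample.
    variance = sum((t - mean) ** 2 for t in terms) / (n - 1) if n > 1 else 0.0
    return mean - lam * math.sqrt(variance / n)


if __name__ == "__main__":
    # Toy logged data: observed reward, target-policy and logging-policy
    # probabilities of the logged action.
    rewards = [1.0, 0.0, 1.0]
    target_probs = [0.5, 0.2, 0.75]
    logging_probs = [0.25, 0.4, 0.25]
    print(ips_terms(rewards, target_probs, logging_probs))
    print(crm_objective(rewards, target_probs, logging_probs, lam=0.1))
```

With `lam=0` the objective reduces to the plain clipped IPS estimate; increasing `lam` trades estimated value for lower variance, which is one common way to limit overfitting to frequently exposed items.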

Supplementary Material

MP4 File (SIGIR_presentation.mp4)
Presentation video.


Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN:9781450387323
DOI:10.1145/3477495
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bias
  2. counterfactual risk minimization
  3. off-policy learning
  4. recommendation

Qualifiers

  • Research-article

Conference

SIGIR '22
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 70
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 27 Nov 2024

Cited By

  • (2024) Counterfactual Explanation for Fairness in Recommendation. ACM Transactions on Information Systems 42:4 (1-30). DOI: 10.1145/3643670. Online publication date: 22-Mar-2024
  • (2023) Constrained Off-policy Learning over Heterogeneous Information for Fairness-aware Recommendation. ACM Transactions on Recommender Systems 2:4 (1-27). DOI: 10.1145/3629172. Online publication date: 26-Oct-2023
  • (2023) Contextualized Knowledge Graph Embedding for Explainable Talent Training Course Recommendation. ACM Transactions on Information Systems 42:2 (1-27). DOI: 10.1145/3597022. Online publication date: 27-Sep-2023
  • (2023) Causality-guided Graph Learning for Session-based Recommendation. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (3083-3093). DOI: 10.1145/3583780.3614803. Online publication date: 21-Oct-2023
  • (2023) Deconfounded recommendation via causal intervention. Neurocomputing 529:C (128-139). DOI: 10.1016/j.neucom.2023.01.089. Online publication date: 7-Apr-2023
  • (2023) Deep reinforcement learning in recommender systems. Knowledge-Based Systems 264:C. DOI: 10.1016/j.knosys.2023.110335. Online publication date: 15-Mar-2023
  • (2022) Being Automated or Not? Risk Identification of Occupations with Graph Neural Networks. Advanced Data Mining and Applications (520-534). DOI: 10.1007/978-3-031-22064-7_37. Online publication date: 30-Nov-2022
