DOI: 10.1145/3485447.3512080
Research article

MINDSim: User Simulator for News Recommenders

Published: 25 April 2022

Abstract

Recommender systems play an increasingly important role in online news platforms. Recently, there has been growing demand for applying reinforcement learning (RL) algorithms to news recommendation, with the aim of maximizing long-term and/or non-differentiable objectives. However, without an interactive simulated environment, it is extremely costly to develop powerful RL agents for news recommendation. In this paper, we build a user simulator, MINDSim, for news recommendation. To generate new users and simulate their behavior, we first construct a hidden space for users with a generative adversarial network, so that new users can be generated by sampling from this hidden space. To capture complex and fast drifts in user interest over time, we adopt an encoder-decoder architecture that takes the news clicked during the simulation as input and outputs the user's new interests for the next period of time. Finally, we build the MINDSim simulator on the MIcrosoft News Dataset (MIND); extensive experimental results on this large-scale real-world dataset demonstrate that MINDSim can simulate the behaviors of real users with high quality.
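The pipeline described in the abstract (sample a new user from a hidden space, simulate clicks, then update the user's interests from the clicked news) can be illustrated with a toy sketch. This is not the MINDSim implementation: the fixed random linear map stands in for a trained GAN generator, the sigmoid dot-product click model and the blending rule in `update_interest` stand in for the learned click behavior and the encoder-decoder, and every name and constant here (`DIM`, `make_generator`, `rate`, and so on) is a hypothetical chosen for illustration.

```python
import math
import random

DIM = 8  # hypothetical embedding size, chosen only for illustration


def make_generator(rng, latent_dim=4, dim=DIM):
    """Stand-in for a trained GAN generator: a fixed random linear map + tanh."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(dim)]

    def generate(z):
        return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in weights]

    return generate


def sample_user(rng, generate, latent_dim=4):
    """Create a new user by sampling a point in the hidden space."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return generate(z)


def click_probability(interest, news):
    """Toy click model (an assumption): sigmoid of the interest/news dot product."""
    score = sum(u * n for u, n in zip(interest, news))
    return 1.0 / (1.0 + math.exp(-score))


def update_interest(interest, clicked, rate=0.3):
    """Placeholder for the encoder-decoder: blend interest with the mean clicked news."""
    if not clicked:
        return list(interest)
    mean = [sum(col) / len(clicked) for col in zip(*clicked)]
    return [(1.0 - rate) * u + rate * m for u, m in zip(interest, mean)]


def simulate(rng, generate, slates):
    """Run one simulated user through a sequence of recommended slates."""
    interest = sample_user(rng, generate)
    clicks_per_step = []
    for slate in slates:
        clicked = [n for n in slate if rng.random() < click_probability(interest, n)]
        clicks_per_step.append(len(clicked))
        interest = update_interest(interest, clicked)
    return interest, clicks_per_step


if __name__ == "__main__":
    rng = random.Random(0)
    generate = make_generator(rng)
    # Three slates of five random news embeddings each, as stand-in recommendations.
    slates = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)] for _ in range(3)]
    final_interest, clicks = simulate(rng, generate, slates)
    print(len(final_interest), clicks)
```

An RL agent would take the place of the random slates, choosing each slate from the simulated user's observed clicks; the sketch only shows the environment side of that loop.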


    Published In

    WWW '22: Proceedings of the ACM Web Conference 2022
    April 2022
    3764 pages
    ISBN: 9781450390965
    DOI: 10.1145/3485447

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. news recommendation
    2. reinforcement learning
    3. user simulator

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '22: The ACM Web Conference 2022
    April 25 - 29, 2022
    Virtual Event, Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2024) Generation of Rating Matrix Based on Rational Behaviors of Users. Journal of Advanced Computational Intelligence and Intelligent Informatics 28:1 (129-140). DOI: 10.20965/jaciii.2024.p0129. Online publication date: 20-Jan-2024
    • (2024) Simulating News Recommendation Ecosystems for Insights and Implications. IEEE Transactions on Computational Social Systems 11:5 (5699-5713). DOI: 10.1109/TCSS.2024.3381329. Online publication date: Oct-2024
    • (2023) Goal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback. Proceedings of the 17th ACM Conference on Recommender Systems (362-373). DOI: 10.1145/3604915.3608775. Online publication date: 14-Sep-2023
    • (2023) Stochastic Composition Optimization of Functions Without Lipschitz Continuous Gradient. Journal of Optimization Theory and Applications 198:1 (239-289). DOI: 10.1007/s10957-023-02180-w. Online publication date: 10-Mar-2023
