
Denoising Self-Attentive Sequential Recommendation

Published: 13 September 2022

Abstract

Transformer-based sequential recommenders are powerful at capturing both short-term and long-term sequential item dependencies, mainly because their self-attention networks exploit pairwise item-item interactions within a sequence. However, real-world item sequences are often noisy, which is particularly true for implicit feedback: a large portion of clicks do not align well with user preferences, and many products end up with negative reviews or are returned. As such, the current user action depends only on a subset of items, not on the entire sequence. Yet many existing Transformer-based models use full attention distributions, which inevitably assign some credit to irrelevant items. This can lead to sub-optimal performance if the Transformer is not properly regularized.
Here we propose the Rec-denoiser model for better training of self-attentive recommender systems. Rec-denoiser aims to adaptively prune noisy items that are irrelevant to the next-item prediction. To achieve this, we attach a trainable binary mask to each self-attention layer to prune noisy attentions, resulting in sparse and clean attention distributions. This largely purifies item-item dependencies and improves model interpretability. In addition, the self-attention network is typically not Lipschitz continuous and is vulnerable to small perturbations, so Jacobian regularization is further applied to the Transformer blocks to improve their robustness to noisy sequences. Our Rec-denoiser is a general plugin that is compatible with many Transformers. Quantitative results on real-world datasets show that Rec-denoiser outperforms state-of-the-art baselines.
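The first mechanism is easy to sketch: a differentiable binary gate on every attention weight. Below is a minimal PyTorch sketch of one way to realize it, assuming a binary-concrete (Gumbel-sigmoid) relaxation with a straight-through estimator; the paper's exact parameterization, gradient estimator, and mask granularity may differ, and all names here (MaskedSelfAttention, mask_logits, temp) are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    """Self-attention with a trainable binary mask over attention weights.

    Illustrative sketch only: one keep/drop logit per head and item pair,
    relaxed with a binary-concrete sample plus a straight-through estimator,
    so the forward pass is hard {0, 1} while gradients stay non-zero.
    """

    def __init__(self, d_model: int, n_heads: int, max_len: int, temp: float = 0.5):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head, self.temp = n_heads, d_model // n_heads, temp
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One keep/drop logit per head and position pair (hypothetical layout).
        self.mask_logits = nn.Parameter(torch.zeros(n_heads, max_len, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, L, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)                      # (B, H, L, d)

        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)

        logits = self.mask_logits[:, :L, :L]                        # (H, L, L)
        if self.training:
            # Binary-concrete sample: logistic noise, temperature, sigmoid.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)
            soft = torch.sigmoid((logits + noise) / self.temp)
            # Straight-through: hard {0,1} forward, soft gradient backward.
            z = (soft > 0.5).float() + soft - soft.detach()
        else:
            z = (logits > 0).float()                                # deterministic at test time

        attn = attn * z.unsqueeze(0)                                # prune noisy attentions
        attn = attn / attn.sum(-1, keepdim=True).clamp_min(1e-9)    # renormalize survivors
        return self.out((attn @ v).transpose(1, 2).reshape(B, L, D))

In practice a sparsity penalty on the expected number of kept gates, e.g. torch.sigmoid(mask_logits).sum(), would be added to the training loss so the masks actually prune; causal masking, dropout, and residual connections are omitted here for brevity.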
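The second mechanism, Jacobian regularization, penalizes how sharply a Transformer block's output moves under small input perturbations, compensating for the lack of Lipschitz continuity. A standard way to make this tractable is a random-projection (Hutchinson-style) estimate of the squared Frobenius norm of the block Jacobian; the helper below is a hypothetical sketch along those lines, not the paper's exact formulation.

import torch

def jacobian_reg(block_in: torch.Tensor, block_out: torch.Tensor,
                 n_proj: int = 1) -> torch.Tensor:
    """Random-projection estimate of ||J||_F^2, where J = d(block_out)/d(block_in)
    for one Transformer block.

    For a random unit vector v, E[||J^T v||^2] equals ||J||_F^2 up to a constant
    factor, and J^T v costs one backward pass (a vector-Jacobian product).
    `block_in` must require gradients (e.g., via requires_grad_()).
    """
    reg = block_out.new_zeros(())
    for _ in range(n_proj):
        v = torch.randn_like(block_out)
        v = v / v.norm().clamp_min(1e-12)
        # Vector-Jacobian product; create_graph keeps the penalty differentiable.
        (vjp,) = torch.autograd.grad(block_out, block_in, grad_outputs=v,
                                     create_graph=True, retain_graph=True)
        reg = reg + vjp.pow(2).sum()
    return reg / n_proj

During training one would add gamma * jacobian_reg(h_in, h_out) to the recommendation loss, where h_in is a block input that requires gradients and gamma is a tuning weight.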




Published In

RecSys '22: Proceedings of the 16th ACM Conference on Recommender Systems
September 2022, 743 pages

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Differentiable Mask
    2. Noise Analysis
    3. Sequential Recommendation
    4. Sparse Transformer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 254 of 1,295 submissions, 20%

    Article Metrics

• Downloads (last 12 months): 420
• Downloads (last 6 weeks): 46

Reflects downloads up to 12 Nov 2024.

Cited By
• Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems 2(3), 1–25 (Jun 2024). DOI: 10.1145/3651169
• Unified Denoising Training for Recommendation. In Proceedings of the 18th ACM Conference on Recommender Systems (RecSys '24), 612–621 (Oct 2024). DOI: 10.1145/3640457.3688109
• Double Correction Framework for Denoising Recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), 1062–1072 (Aug 2024). DOI: 10.1145/3637528.3671692
• Explainable and Coherent Complement Recommendation Based on Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), 4678–4685 (Oct 2024). DOI: 10.1145/3627673.3680028
• A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), 3719–3723 (Oct 2024). DOI: 10.1145/3627673.3679870
• Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM '24), 106–115 (Mar 2024). DOI: 10.1145/3616855.3635832
• Recommender Transformers with Behavior Pathways. In Proceedings of the ACM Web Conference 2024 (WWW '24), 3643–3654 (May 2024). DOI: 10.1145/3589334.3645528
• Multi-level sequence denoising with cross-signal contrastive learning for sequential recommendation. Neural Networks 179, 106480 (Nov 2024). DOI: 10.1016/j.neunet.2024.106480
• Dual perspective denoising model for session-based recommendation. Expert Systems with Applications 249(PC) (Jul 2024). DOI: 10.1016/j.eswa.2024.123845
• Recommendations with minimum exposure guarantees. Expert Systems with Applications 236(C) (Feb 2024). DOI: 10.1016/j.eswa.2023.121164
