
Denoising Self-Attentive Sequential Recommendation

Published: 13 September 2022

Abstract

Transformer-based sequential recommenders are powerful at capturing both short-term and long-term sequential item dependencies, mainly because their self-attention networks exploit pairwise item-item interactions within a sequence. However, real-world item sequences are often noisy, which is particularly true for implicit feedback: a large portion of clicks do not align well with user preferences, and many products end up with negative reviews or are returned. As such, the current user action depends only on a subset of items, not on the entire sequence. Yet many existing Transformer-based models use full attention distributions, which inevitably assign some credit to irrelevant items. This can lead to sub-optimal performance if the Transformer is not properly regularized.
Here we propose the Rec-denoiser model for better training of self-attentive recommender systems. Rec-denoiser aims to adaptively prune noisy items that are irrelevant to the next-item prediction. To achieve this, we attach a trainable binary mask to each self-attention layer to prune noisy attentions, resulting in sparse and clean attention distributions. This largely purifies item-item dependencies and improves model interpretability. In addition, the self-attention network is typically not Lipschitz continuous and is vulnerable to small perturbations, so Jacobian regularization is further applied to the Transformer blocks to improve their robustness to noisy sequences. Our Rec-denoiser is a general plugin that is compatible with many Transformers. Quantitative results on real-world datasets show that Rec-denoiser outperforms state-of-the-art baselines.
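The first mechanism is easy to sketch: a differentiable binary gate on every attention weight. Below is a minimal PyTorch sketch of one way to realize it, assuming a binary-concrete (Gumbel-sigmoid) relaxation with a straight-through estimator; the paper's exact parameterization, gradient estimator, and mask granularity may differ, and all names here (MaskedSelfAttention, mask_logits, temp) are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    """Self-attention with a trainable binary mask over attention weights.

    Illustrative sketch only: one keep/drop logit per head and item pair,
    relaxed with a binary-concrete sample plus a straight-through estimator,
    so the forward pass is hard {0, 1} while gradients stay non-zero.
    """

    def __init__(self, d_model: int, n_heads: int, max_len: int, temp: float = 0.5):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head, self.temp = n_heads, d_model // n_heads, temp
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One keep/drop logit per head and position pair (hypothetical layout).
        self.mask_logits = nn.Parameter(torch.zeros(n_heads, max_len, max_len))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, L, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)                      # (B, H, L, d)

        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)

        logits = self.mask_logits[:, :L, :L]                        # (H, L, L)
        if self.training:
            # Binary-concrete sample: logistic noise, temperature, sigmoid.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)
            soft = torch.sigmoid((logits + noise) / self.temp)
            # Straight-through: hard {0,1} forward, soft gradient backward.
            z = (soft > 0.5).float() + soft - soft.detach()
        else:
            z = (logits > 0).float()                                # deterministic at test time

        attn = attn * z.unsqueeze(0)                                # prune noisy attentions
        attn = attn / attn.sum(-1, keepdim=True).clamp_min(1e-9)    # renormalize survivors
        return self.out((attn @ v).transpose(1, 2).reshape(B, L, D))

In practice a sparsity penalty on the expected number of kept gates, e.g. torch.sigmoid(mask_logits).sum(), would be added to the training loss so the masks actually prune; causal masking, dropout, and residual connections are omitted here for brevity.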
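The second mechanism, Jacobian regularization, penalizes how sharply a Transformer block's output moves under small input perturbations, compensating for the lack of Lipschitz continuity. A standard way to make this tractable is a random-projection (Hutchinson-style) estimate of the squared Frobenius norm of the block Jacobian; the helper below is a hypothetical sketch along those lines, not the paper's exact formulation.

import torch

def jacobian_reg(block_in: torch.Tensor, block_out: torch.Tensor,
                 n_proj: int = 1) -> torch.Tensor:
    """Random-projection estimate of ||J||_F^2, where J = d(block_out)/d(block_in)
    for one Transformer block.

    For a random unit vector v, E[||J^T v||^2] equals ||J||_F^2 up to a constant
    factor, and J^T v costs one backward pass (a vector-Jacobian product).
    `block_in` must require gradients (e.g., via requires_grad_()).
    """
    reg = block_out.new_zeros(())
    for _ in range(n_proj):
        v = torch.randn_like(block_out)
        v = v / v.norm().clamp_min(1e-12)
        # Vector-Jacobian product; create_graph keeps the penalty differentiable.
        (vjp,) = torch.autograd.grad(block_out, block_in, grad_outputs=v,
                                     create_graph=True, retain_graph=True)
        reg = reg + vjp.pow(2).sum()
    return reg / n_proj

During training one would add gamma * jacobian_reg(h_in, h_out) to the recommendation loss, where h_in is a block input that requires gradients and gamma is a tuning weight.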




Published In

RecSys '22: Proceedings of the 16th ACM Conference on Recommender Systems
September 2022, 743 pages

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Differentiable Mask
    2. Noise Analysis
    3. Sequential Recommendation
    4. Sparse Transformer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate 254 of 1,295 submissions, 20%

    Article Metrics

• Downloads (last 12 months): 420
• Downloads (last 6 weeks): 46

Reflects downloads up to 12 Nov 2024.

Cited By
• Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems 2(3), 1–25 (Jun 2024). DOI: 10.1145/3651169
• Unified Denoising Training for Recommendation. In Proceedings of the 18th ACM Conference on Recommender Systems (RecSys '24), 612–621 (Oct 2024). DOI: 10.1145/3640457.3688109
• Double Correction Framework for Denoising Recommendation. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), 1062–1072 (Aug 2024). DOI: 10.1145/3637528.3671692
• Explainable and Coherent Complement Recommendation Based on Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), 4678–4685 (Oct 2024). DOI: 10.1145/3627673.3680028
• A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), 3719–3723 (Oct 2024). DOI: 10.1145/3627673.3679870
• Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM '24), 106–115 (Mar 2024). DOI: 10.1145/3616855.3635832
• Recommender Transformers with Behavior Pathways. In Proceedings of the ACM Web Conference 2024 (WWW '24), 3643–3654 (May 2024). DOI: 10.1145/3589334.3645528
• Multi-level sequence denoising with cross-signal contrastive learning for sequential recommendation. Neural Networks 179, 106480 (Nov 2024). DOI: 10.1016/j.neunet.2024.106480
• Dual perspective denoising model for session-based recommendation. Expert Systems with Applications 249(PC) (Jul 2024). DOI: 10.1016/j.eswa.2024.123845
• Recommendations with minimum exposure guarantees. Expert Systems with Applications 236(C) (Feb 2024). DOI: 10.1016/j.eswa.2023.121164
