DOI: 10.1145/3604915.3608831 · RecSys Conference Proceedings · Short paper

Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

Published: 14 September 2023

Abstract

The Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability to capture a user’s dynamic interests from past interactions. Despite their success, Transformer-based models often require optimizing a large number of parameters, making them hard to train on the sparse data typical of sequential recommendation. To address data sparsity, previous studies have enhanced Transformers with self-supervised learning, such as pre-training embeddings from item attributes or applying contrastive data augmentations. However, these approaches introduce several training issues, including sensitivity to initialization, the need for manually designed data augmentations, and memory bottlenecks caused by large batch sizes.
In this work, we investigate Transformers from the perspective of loss geometry, aiming to enhance the models’ data efficiency and generalization in sequential recommendation. We observe that Transformers (e.g., SASRec) can converge to extremely sharp local minima if not adequately regularized. Inspired by the recent Sharpness-Aware Minimization (SAM), we propose SAMRec, which significantly improves the accuracy and robustness of sequential recommendation. SAMRec performs comparably to state-of-the-art self-supervised Transformers, such as S3Rec and CL4SRec, without the need for pre-training or strong data augmentations.
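To make the idea concrete, the sketch below shows a generic SAM-style two-step update (Foret et al. [10]) wrapped around a SASRec-like next-item loss in PyTorch. It is a minimal illustration of sharpness-aware training, not the authors’ SAMRec implementation; the names (sam_step, rho, base_opt, the batch keys) are assumptions for the example.

```python
import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    """One SAM-style update: perturb the weights toward higher loss, then
    descend using the gradient taken at the perturbed point.
    Illustrative only; rho and the batch layout are assumptions."""
    # 1) Gradient of the next-item loss at the current weights.
    loss = loss_fn(model(batch["seq"]), batch["target"])
    loss.backward()

    # 2) Move to the approximate worst-case point in an L2 ball of
    #    radius rho: w <- w + rho * g / ||g||.
    params = [w for w in model.parameters() if w.grad is not None]
    grad_norm = torch.norm(torch.stack([w.grad.norm(p=2) for w in params]), p=2)
    eps = []
    with torch.no_grad():
        for w in params:
            e = rho * w.grad / (grad_norm + 1e-12)
            w.add_(e)                # perturb weights in place
            eps.append(e)
    model.zero_grad()

    # 3) Gradient of the perturbed loss, then undo the perturbation and
    #    let the base optimizer (e.g., Adam) step with that gradient.
    loss_fn(model(batch["seq"]), batch["target"]).backward()
    with torch.no_grad():
        for w, e in zip(params, eps):
            w.sub_(e)                # restore the original weights
    base_opt.step()
    base_opt.zero_grad()
    return loss.item()
```

In effect, the base optimizer minimizes the loss at the worst point of a small neighborhood rather than at the current weights, which biases training toward flatter minima without pre-training or data augmentation.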

References

[1]
Maksym Andriushchenko and Nicolas Flammarion. 2022. Towards understanding sharpness-aware minimization. In International Conference on Machine Learning.
[2]
Huiyuan Chen, Yusan Lin, Menghai Pan, Lan Wang, Chin-Chia Michael Yeh, Xiaoting Li, Yan Zheng, Fei Wang, and Hao Yang. 2022. Denoising self-attentive sequential recommendation. In Proceedings of the 16th ACM Conference on Recommender Systems. 92–101.
[3]
Huiyuan Chen, Yusan Lin, Fei Wang, and Hao Yang. 2021. Tops, bottoms, and shoes: building capsule wardrobes via cross-attention tensor network. In Proceedings of the 15th ACM Conference on Recommender Systems. 453–462.
[4]
Huiyuan Chen, Chin-Chia Michael Yeh, Fei Wang, and Hao Yang. 2022. Graph neural transport networks with non-local attentions for recommender systems. In Proceedings of the ACM Web Conference 2022. 1955–1964.
[5]
Huiyuan Chen, Kaixiong Zhou, Kwei-Herng Lai, Xia Hu, Fei Wang, and Hao Yang. 2022. Adversarial graph perturbations for recommendations at scale. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1854–1858.
[6]
Xiangning Chen, Cho-Jui Hsieh, and Boqing Gong. 2022. When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations. In International Conference on Learning Representations.
[7]
Yongjun Chen, Zhiwei Liu, Jia Li, Julian McAuley, and Caiming Xiong. 2022. Intent contrastive learning for sequential recommendation. In Proceedings of the ACM Web Conference 2022. 2172–2182.
[8]
Gabriel de Souza Pereira Moreira, Sara Rabhi, Jeong Min Lee, Ronay Ak, and Even Oldridge. 2021. Transformers4Rec: Bridging the gap between NLP and sequential/session-based recommendation. In Proceedings of the 15th ACM Conference on Recommender Systems. 143–153.
[9]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171–4186.
[10]
Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. 2021. Sharpness-aware Minimization for Efficiently Improving Generalization. In International Conference on Learning Representations.
[11]
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 (2017).
[12]
Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. In International Conference on Learning Representations.
[13]
Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, and Samy Bengio. 2020. Fantastic Generalization Measures and Where to Find Them. In International Conference on Learning Representations.
[14]
Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining. 197–206.
[15]
Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2017. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In International Conference on Learning Representations.
[16]
Minyoung Kim, Da Li, Shell X Hu, and Timothy Hospedales. 2022. Fisher SAM: Information geometry and sharpness aware minimisation. In International Conference on Machine Learning. 11148–11161.
[17]
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer (2009), 30–37.
[18]
Jungmin Kwon, Jeongseop Kim, Hyunseo Park, and In Kwon Choi. 2021. ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks. In International Conference on Machine Learning. 5905–5914.
[19]
Vivian Lai, Samuel Carton, Rajat Bhatnagar, Q Vera Liao, Yunfeng Zhang, and Chenhao Tan. 2022. Human-ai collaboration via conditional delegation: A case study of content moderation. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–18.
[20]
Vivian Lai, Yiming Zhang, Chacha Chen, Q Vera Liao, and Chenhao Tan. 2023. Selective explanations: Leveraging human input to align explainable ai. Proceedings of the ACM on Human-Computer Interaction, CSCW2 (2023).
[21]
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. 2018. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems.
[22]
Jiacheng Li, Yujie Wang, and Julian McAuley. 2020. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining. 322–330.
[23]
Defu Lian, Yongji Wu, Yong Ge, Xing Xie, and Enhong Chen. 2020. Geography-aware sequential location recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[24]
Tao Lin, Lingjing Kong, Sebastian Stich, and Martin Jaggi. 2020. Extrapolation for large-batch training in deep learning. In International Conference on Machine Learning. 6094–6104.
[25]
Yong Liu, Siqi Mai, Xiangning Chen, Cho-Jui Hsieh, and Yang You. 2022. Towards efficient and scalable sharpness-aware minimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12360–12370.
[26]
Zhiwei Liu, Ziwei Fan, Yu Wang, and Philip S Yu. 2021. Augmenting sequential recommendation with pseudo-prior items via reversely pre-training transformer. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1608–1612.
[27]
Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, and Wenwu Zhu. 2020. Disentangled self-supervision in sequential recommenders. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 483–491.
[28]
Dominic Masters and Carlo Luschi. 2018. Revisiting small batch training for deep neural networks. arXiv preprint arXiv:1804.07612 (2018).
[29]
Thomas Möllenhoff and Mohammad Emtiyaz Khan. 2023. SAM as an Optimal Relaxation of Bayes. In International Conference on Learning Representations.
[30]
Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, and Daniel M Roy. 2021. Information-theoretic generalization bounds for stochastic gradient descent. In Conference on Learning Theory. 3526–3545.
[31]
Ruihong Qiu, Zi Huang, Hongzhi Yin, and Zijian Wang. 2022. Contrastive learning for representation degeneration problem in sequential recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. 813–823.
[32]
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. 452–461.
[33]
Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web. 811–820.
[34]
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1441–1450.
[35]
Jiaxi Tang and Ke Wang. 2018. Personalized top-N sequential recommendation via convolutional sequence embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 565–573.
[36]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
[37]
Song Wang, Xingbo Fu, Kaize Ding, Chen Chen, Huiyuan Chen, and Jundong Li. 2023. Federated Few-shot Learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
[38]
Yu Wang, Yuying Zhao, Yushun Dong, Huiyuan Chen, Jundong Li, and Tyler Derr. 2022. Improving fairness in graph neural networks via mitigating sensitive attribute leakage. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1938–1948.
[39]
Kaiyue Wen, Tengyu Ma, and Zhiyuan Li. 2023. How Sharpness-Aware Minimization Minimizes Sharpness? In International Conference on Learning Representations.
[40]
Liwei Wu, Shuqing Li, Cho-Jui Hsieh, and James Sharpnack. 2020. SSE-PT: Sequential recommendation via personalized transformer. In Proceedings of the 14th ACM Conference on Recommender Systems. 328–337.
[41]
Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. 2022. Contrastive learning for sequential recommendation. In 2022 IEEE 38th International Conference on Data Engineering. 1259–1273.
[42]
Chin-Chia Michael Yeh, Mengting Gu, Yan Zheng, Huiyuan Chen, Javid Ebrahimi, Zhongfang Zhuang, Junpeng Wang, Liang Wang, and Wei Zhang. 2022. Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4391–4401.
[43]
Kun Zhou, Hui Wang, Wayne Xin Zhao, Yutao Zhu, Sirui Wang, Fuzheng Zhang, Zhongyuan Wang, and Ji-Rong Wen. 2020. S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1893–1902.

Cited By

  • (2024) Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 106–115. 10.1145/3616855.3635832. Online publication date: 4-Mar-2024.
  • (2024) A global contextual enhanced structural-aware transformer for sequential recommendation. Knowledge-Based Systems 304, 112515. 10.1016/j.knosys.2024.112515. Online publication date: Nov-2024.


      Information

      Published In

      RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
      September 2023
      1406 pages

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 September 2023

      Author Tags

      1. Loss Landscape
      2. Sequential Recommendation
      3. Sharpness-aware Minimization
      4. Transformer

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      RecSys '23: Seventeenth ACM Conference on Recommender Systems
      September 18 - 22, 2023
      Singapore, Singapore

      Acceptance Rates

      Overall Acceptance Rate 254 of 1,295 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 145
      • Downloads (last 6 weeks): 12
      Reflects downloads up to 30 Nov 2024.

