STRec: Sparse Transformer for Sequential Recommendations

Authors:

Qing Li

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

Pages 101 - 111

https://doi.org/10.1145/3604915.3608779

Published: 14 September 2023

Abstract

With the rapid evolution of transformer architectures, researchers have begun applying them to sequential recommender systems (SRSs), reporting promising performance on SRS tasks compared with earlier SRS models. However, most existing transformer-based SRS frameworks retain the vanilla attention mechanism, which calculates attention scores between all item-item pairs. Under this setting, redundant item interactions can harm model performance and consume considerable computation time and memory. In this paper, we identify the sparse attention phenomenon in transformer-based SRS models and propose the Sparse Transformer for sequential Recommendation tasks (STRec), which achieves both efficient computation and improved performance. Specifically, we replace self-attention with cross-attention, making the model concentrate on the most relevant item interactions. To determine these necessary interactions, we design a novel sampling strategy that detects relevant items based on temporal information. Extensive experimental results validate the effectiveness of STRec, which achieves state-of-the-art accuracy while reducing inference time by 54% and memory cost by 70%. We also provide a large set of extended experiments to further investigate the properties of our framework.
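The core idea of attending from a small set of selected items to the full sequence, rather than between all item-item pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`select_queries`, `cross_attention`), the toy data, and in particular the selection rule (keep the `k` most recent items by timestamp) are assumptions made for illustration only, since the abstract does not specify how the temporal sampling works. Learned query, key, and value projections are omitted for brevity.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_queries(timestamps, k):
    # Hypothetical stand-in for the paper's sampling strategy:
    # keep the indices of the k most recent items, in sequence order.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i], reverse=True)
    return sorted(order[:k])


def cross_attention(items, query_idx):
    # Queries come only from the selected items; keys and values are all items.
    # Cost is len(query_idx) * len(items) score computations instead of len(items) ** 2.
    d = len(items[0])
    outputs = []
    for qi in query_idx:
        scores = [dot(items[qi], key) / math.sqrt(d) for key in items]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, items)) for j in range(d)])
    return outputs


# Toy sequence of five 2-dimensional item embeddings with interaction timestamps.
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
timestamps = [10, 20, 35, 36, 90]

query_idx = select_queries(timestamps, k=2)
out = cross_attention(items, query_idx)

full_pairs = len(items) ** 2                # pairs scored by vanilla self-attention
sparse_pairs = len(query_idx) * len(items)  # pairs scored by the sparse variant
```

In this toy setting, two selected queries over five items score 10 item pairs instead of the 25 scored by full self-attention. The savings reported in the abstract come from the authors' own model and experiments, not from this sketch.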

Cited By

Gao J, Zhao X, Li M, Zhao M, Wu R, Guo R, Liu Y, Yin D. (2024). SMLP4Rec: An Efficient All-MLP Architecture for Sequential Recommendations. ACM Transactions on Information Systems 42:3 (1-23). Online publication date: 22-Jan-2024.
https://dl.acm.org/doi/10.1145/3637871
Liu Y, Walder C, Xie L, Liu Y, Baeza-Yates R, Bonchi F. (2024). Probabilistic Attention for Sequential Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1956-1967). Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671733
Ding Z. (2024). Sequence recommendation based on sparse Transformer and filtering structure. 2024 4th International Conference on Neural Networks, Information and Communication (NNICE) (1452-1456). Online publication date: 19-Jan-2024.
https://doi.org/10.1109/NNICE61279.2024.10498558

Index Terms

STRec: Sparse Transformer for Sequential Recommendations
1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Collaborative filtering
2. Information systems

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems

September 2023

1406 pages

ISBN:9798400702419

DOI:10.1145/3604915

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

CityU - HKIDS Early Career Research Grant
SIRG - CityU Strategic Interdisciplinary Research Grant
Huawei (Huawei Innovation Research Program)
APRC - CityU New Research Initiatives
Tencent (CCF-Tencent Open Fund)
Tencent (Tencent Rhino-Bird Focused Research Program)
Ant Group (Ant Group Research Fund)
Ant Group (CCF-Ant Research Fund)

Conference

RecSys '23

RecSys '23: Seventeenth ACM Conference on Recommender Systems

September 18 - 22, 2023

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%
