Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3357384.3357925acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

Published: 03 November 2019 Publication History

Abstract

Click-through rate (CTR) prediction, which aims to predict the probability of a user clicking on an ad or an item, is critical to many online applications such as online advertising and recommender systems. The problem is very challenging since (1) the input features (e.g., the user id, user age, item id, item category) are usually sparse and high-dimensional, and (2) an effective prediction relies on high-order combinatorial features (a.k.a. cross features), which are very time-consuming to hand-craft by domain experts and are impossible to be enumerated. Therefore, there have been efforts in finding low-dimensional representations of the sparse and high-dimensional raw features and their meaningful combinations. In this paper, we propose an effective and efficient method called the AutoInt to automatically learn the high-order feature interactions of input features. Our proposed algorithm is very general, which can be applied to both numerical and categorical input features. Specifically, we map both the numerical and categorical features into the same low-dimensional space. Afterwards, a multi-head self-attentive neural network with residual connections is proposed to explicitly model the feature interactions in the low-dimensional space. With different layers of the multi-head self-attentive neural networks, different orders of feature combinations of input features can be modeled. The whole model can be efficiently fit on large-scale raw data in an end-to-end fashion. Experimental results on four real-world datasets show that our proposed approach not only outperforms existing state-of-the-art approaches for prediction but also offers good explainability. Code is available at: \urlhttps://github.com/DeepGraphLearning/RecommenderSystems.

References

[1]
Mart'in Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, et almbox. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, Vol. 16. 265--283.
[2]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations .
[3]
Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, Vol. 35, 8 (2013), 1798--1828.
[4]
Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H Chi. 2018. Latent Cross: Making Use of Context in Recurrent Recommender Systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 46--54.
[5]
Mathieu Blondel, Akinori Fujino, Naonori Ueda, and Masakazu Ishihata. 2016a. Higher-order factorization machines. In Advances in Neural Information Processing Systems. 3351--3359.
[6]
Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, and Naonori Ueda. 2016b. Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms. In International Conference on Machine Learning . 850--858.
[7]
Chen Cheng, Fen Xia, Tong Zhang, Irwin King, and Michael R Lyu. 2014. Gradient boosting factorization machines. In Proceedings of the 8th ACM Conference on Recommender systems. ACM, 265--272.
[8]
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et almbox. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7--10.
[9]
Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191--198.
[10]
Thore Graepel, Joaquin Qui nonero Candela, Thomas Borchert, and Ralf Herbrich. 2010. Web-scale Bayesian Click-through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine. In Proceedings of the 27th International Conference on International Conference on Machine Learning. 13--20.
[11]
Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-machine Based Neural Network for CTR Prediction. In Proceedings of the 26th International Joint Conference on Artificial Intelligence . AAAI Press, 1725--1731.
[12]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
[13]
Xiangnan He and Tat-Seng Chua. 2017. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 355--364.
[14]
Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, and Tat-Seng Chua. 2018. NAIS: Neural attentive item similarity model for recommendation. IEEE Transactions on Knowledge and Data Engineering, Vol. 30, 12 (2018), 2354--2366.
[15]
Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, et almbox. 2014. Practical lessons from predicting clicks on ads at facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising. ACM, 1--9.
[16]
Yuchin Juan, Yong Zhuang, Wei-Sheng Chin, and Chih-Jen Lin. 2016. Field-aware factorization machines for CTR prediction. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 43--50.
[17]
Diederick P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations .
[18]
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. 2011. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Commun. ACM, Vol. 54, 10 (2011), 95--103.
[19]
Jianxun Lian, Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and Guangzhong Sun. 2018. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 1754--1763.
[20]
Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. In International Conference on Learning Representations .
[21]
H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, et almbox. 2013. Ad Click Prediction: A View from the Trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 1222--1230.
[22]
Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-Value Memory Networks for Directly Reading Documents. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing . Association for Computational Linguistics, 1400--1409.
[23]
Alexander Novikov, Mikhail Trofimov, and Ivan Oseledets. 2016. Exponential machines. arXiv preprint arXiv:1605.03795 (2016).
[24]
Richard J Oentaryo, Ee-Peng Lim, Jia-Wei Low, David Lo, and Michael Finegold. 2014. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In Proceedings of the 7th ACM international conference on Web search and data mining. ACM, 123--132.
[25]
Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang. 2016. Product-based neural networks for user response prediction. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 1149--1154.
[26]
Steffen Rendle. 2010. Factorization machines. In Data Mining (ICDM), 2010 IEEE 10th International Conference on. IEEE, 995--1000.
[27]
Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web. ACM, 811--820.
[28]
Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 635--644.
[29]
Matthew Richardson, Ewa Dominowska, and Robert Ragno. 2007. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web. ACM, 521--530.
[30]
Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A Neural Attention Model for Abstractive Sentence Summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing . Association for Computational Linguistics, 379--389.
[31]
Lili Shan, Lei Lin, Chengjie Sun, and Xiaolong Wang. 2016b. Predicting ad click-through rates via feature-based fully coupled interaction tensor factorization. Electronic Commerce Research and Applications, Vol. 16 (2016), 30--42.
[32]
Ying Shan, T Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016a. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 255--262.
[33]
Weiping Song, Zhiping Xiao, Yifan Wang, Laurent Charlin, Ming Zhang, and Jian Tang. 2019. Session-based Social Recommendation via Dynamic Graph Attention Networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. ACM, 555--563.
[34]
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, Vol. 15, 1 (2014), 1929--1958.
[35]
Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et almbox. 2015. End-to-end memory networks. In Advances in neural information processing systems. 2440--2448.
[36]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 6000--6010.
[37]
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph Attention Networks. In International Conference on Learning Representations .
[38]
Ruoxi Wang, Bin Fu, Gang Fu, and Mingliang Wang. 2017. Deep & Cross Network for Ad Click Predictions. In Proceedings of the ADKDD'17 . ACM, 12:1--12:7.
[39]
Xiang Wang, Xiangnan He, Fuli Feng, Liqiang Nie, and Tat-Seng Chua. 2018. TEM: Tree-enhanced Embedding Model for Explainable Recommendation. In Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1543--1552.
[40]
Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional factorization machines: learning the weight of feature interactions via attention networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. AAAI Press, 3119--3125.
[41]
Weinan Zhang, Tianming Du, and Jun Wang. 2016. Deep learning over multi-field categorical data. In European conference on information retrieval . Springer, 45--57.
[42]
Qian Zhao, Yue Shi, and Liangjie Hong. 2017. GB-CENT: Gradient Boosted Categorical Embedding and Numerical Trees. In Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 1311--1319.
[43]
Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep Interest Network for Click-Through Rate Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 1059--1068.
[44]
Jie Zhu, Ying Shan, JC Mao, Dong Yu, Holakou Rahmanian, and Yi Zhang. 2017. Deep embedding forest: Forest-based serving with deep embedding features. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1703--1711.

Cited By

View all
  • (2025)Upper bound on the predictability of rating prediction in recommender systemsInformation Processing & Management10.1016/j.ipm.2024.10395062:1(103950)Online publication date: Jan-2025
  • (2024)Harnessing Risks with Data: A Leakage Assessment Framework for WDN Using Multi-Attention Mechanisms and Conditional GAN-Based Data BalancingWater10.3390/w1622332916:22(3329)Online publication date: 19-Nov-2024
  • (2024)MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for RecommendationsFuture Internet10.3390/fi1608027016:8(270)Online publication date: 29-Jul-2024
  • Show More Cited By
  1. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
    November 2019
    3373 pages
    ISBN:9781450369763
    DOI:10.1145/3357384
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 November 2019

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. ctr prediction
    2. explainable recommendation
    3. high-order feature interactions
    4. self attention

    Qualifiers

    • Research-article

    Funding Sources

    • Chinese Scholarship Council
    • National Natural Science Foundation of China
    • Natural Sciences and Engineering Research Council of Canada
    • Canada CIFAR AI Chair
    • National Key Research and Development Program of China
    • Beijing Municipal Commission of Science and Technology

    Conference

    CIKM '19
    Sponsor:

    Acceptance Rates

    CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)345
    • Downloads (Last 6 weeks)38
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Upper bound on the predictability of rating prediction in recommender systemsInformation Processing & Management10.1016/j.ipm.2024.10395062:1(103950)Online publication date: Jan-2025
    • (2024)Harnessing Risks with Data: A Leakage Assessment Framework for WDN Using Multi-Attention Mechanisms and Conditional GAN-Based Data BalancingWater10.3390/w1622332916:22(3329)Online publication date: 19-Nov-2024
    • (2024)MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for RecommendationsFuture Internet10.3390/fi1608027016:8(270)Online publication date: 29-Jul-2024
    • (2024)RGMeta: Enhancing Cold-Start Recommendations with a Residual Graph Meta-Embedding ModelElectronics10.3390/electronics1317347313:17(3473)Online publication date: 1-Sep-2024
    • (2024)XGBoost-Enhanced Graph Neural Networks: A New Architecture for Heterogeneous Tabular DataApplied Sciences10.3390/app1413582614:13(5826)Online publication date: 3-Jul-2024
    • (2024)Feature-Interaction-Enhanced Sequential Transformer for Click-Through Rate PredictionApplied Sciences10.3390/app1407276014:7(2760)Online publication date: 26-Mar-2024
    • (2024)MeFiNet: Modeling multi-semantic convolution-based feature interactions for CTR predictionIntelligent Data Analysis10.3233/IDA-22711328:1(261-278)Online publication date: 3-Feb-2024
    • (2024)Personalized Recommendation Multi-Objective Optimization Model Based on Deep LearningInternational Journal of Advanced Network, Monitoring and Controls10.2478/ijanmc-2024-00059:1(44-57)Online publication date: 28-Mar-2024
    • (2024)BAD-FM: Backdoor Attacks Against Factorization-Machine Based Neural Network for Tabular Data PredictionChinese Journal of Electronics10.23919/cje.2023.00.04133:4(1077-1092)Online publication date: Jul-2024
    • (2024)FSASA: Sequential recommendation based on fusing session-aware models and self-attention networksComputer Science and Information Systems10.2298/CSIS230522067G21:1(1-20)Online publication date: 2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media