research-article

Public Access

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

Authors:

Mark Wilkening,

Hsien-Hsin Sean Lee,

Carole-Jean Wu,

David BrooksAuthors Info & Claims

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 870 - 884

https://doi.org/10.1145/3466752.3480127

Published: 17 October 2021 Publication History

All formats PDF

Abstract

Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to previously proposed specialized recommendation accelerators and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3 × and 6 ×.

References

[1]

Bilge Acun, Matthew Murphy, Xiaodong Wang, Jade Nie, Carole-Jean Wu, and Kim Hazelwood. 2020. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale. arXiv preprint arXiv:2011.05497(2020).

[2]

Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, and Prashant J Nair. 2021. High-Performance Training by Exploiting Hot-Embeddings in Recommendation Systems. arXiv preprint arXiv:2103.00686(2021).

[3]

Bahar Asgari, Ramyad Hadidi, Jiashen Cao, Da Eun Shim, Sung-Kyu Lim, and Hyesoon Kim. 2021. FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction. In 2021 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[4]

Adrian M Caulfield, Eric S Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, 2016. A cloud-scale acceleration architecture. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1–13.

[5]

Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, and Hang Li. 2009. Ranking Measures and Loss Functions in Learning to Rank. In Advances in Neural Information Processing Systems, Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta (Eds.), Vol. 22. Curran Associates, Inc., 315–323. https://proceedings.neurips.cc/paper/2009/file/2f55707d4193dc27118a0f19a1985716-Paper.pdf

[6]

Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Teman. 2014. DaDianNao: A Machine-Learning Supercomputer. In MICRO.

[7]

Yu-Hsin Chen, Tushar Krishna, Joel S Emer, and Vivienne Sze. 2017. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2017), 127–138.

[8]

Yujeong Choi and Minsoo Rhu. 2020. Prema: A predictive multi-task scheduling algorithm for preemptible neural processing units. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 220–233.

[9]

Marshall Choy. 2020. Accelerating the Modern Machine Learning Workhorse: Recommendation Inference. https://sambanova.ai/blog/accelerating-the-modern-ml-workhorse-recommendation-inference

[10]

NEUCHIPS Corp.2020. NEUCHIPS Recommendation Accelerator RecAccel. https://2ca8d951-4386-4e41-9cab-50c86da5f5a8.filesusr.com/ugd/d79931_9382d53600f54d21a6eabe46d1f0ffa2.pdf

[11]

Zhaoxia Summer Deng, Jongsoo Park, Ping Tak Peter Tang, Haixin Liu, Jie Yang, Hector Yuen, Jianyu Huang, Daya S Khudia, Xiaohan Wei, Ellie Wen, Dhruv Choudhary, Raghuraman Krishnamoorthi, Carole-Jean Wu, Nadathur Satish, Changkyu Kim, Maxim Naumov, Sam Naghshineh, and Misha Smelyanskiy. 2021. Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale. IEEE Micro (2021), 1–1. https://doi.org/10.1109/MM.2021.3081981

Digital Library

[12]

Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, and Arun Kejariwal. 2021. Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems. arxiv:cs.IR/2105.01064

[13]

Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim Hazelwood, Asaf Cidon, and Sachin Katti. 2018. Bandana: Using Non-volatile Memory for Storing Deep Learning Models. arxiv:cs.LG/1811.05922

[14]

Benjamin Ghaemmaghami, Zihao Deng, Benjamin Cho, Leo Orshansky, Ashish Kumar Singh, Mattan Erez, and Michael Orshansky. 2020. Training with Multi-Layer Embeddings for Model Reduction. arxiv:cs.LG/2006.05623

[15]

Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, and Hadi Esmaeilzadeh. 2020. Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 681–697. https://doi.org/10.1109/MICRO50266.2020.00062

[16]

Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, and James Zou. 2019. Mixed dimension embeddings with application to memory-efficient recommendation systems. arXiv preprint arXiv:1909.11810(2019).

[17]

Shuhei Goda, Naomichi Agata, and Yuya Matsumura. 2020. A Stacking Ensemble Model for Prediction of Multi-Type Tweet Engagements. In Proceedings of the Recommender Systems Challenge 2020(RecSysChallenge ’20). Association for Computing Machinery, New York, NY, USA, 6–10. https://doi.org/10.1145/3415959.3415994

Digital Library

[18]

Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandom Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, and Carole-Jean Wu. 2020. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). 982–995. https://doi.org/10.1109/ISCA45697.2020.00084

Digital Library

[19]

Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M Rush, Gu-Yeon Wei, and David Brooks. 2019. Masr: A modular accelerator for sparse rnns. In 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 1–14.

Digital Library

[20]

Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Mark Hempstead, Bill Jia, 2020. The architectural implications of facebook’s DNN-based personalized recommendation. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 488–501.

[21]

Vipul Gupta, Dhruv Choudhary, Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, and Michael W. Mahoney. 2021. Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining(KDD ’21). Association for Computing Machinery, New York, NY, USA, 2928–2936. https://doi.org/10.1145/3447548.3467080

Digital Library

[22]

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, 243–254.

Digital Library

[23]

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context.ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (Dec. 2015), 19 pages. https://doi.org/10.1145/2827872

Digital Library

[24]

Kim Hazelwood, Sarah Bird, David Brooks, Soumith Chintala, Utku Diril, Dmytro Dzhulgakov, Mohamed Fawzy, Bill Jia, Yangqing Jia, Aditya Kalro, 2018. Applied machine learning at facebook: A datacenter infrastructure perspective. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 620–629.

[25]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web(WWW ’17). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 173–182. https://doi.org/10.1145/3038912.3052569

Digital Library

[26]

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web. 173–182.

Digital Library

[27]

Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861(2017).

[28]

Samuel Hsia, Udit Gupta, Mark Wilkening, Carole-Jean Wu, Gu-Yeon Wei, and David Brooks. 2020. Cross-Stack Workload Characterization of Deep Recommendation Systems. In 2020 IEEE International Symposium on Workload Characterization (IISWC). 157–168. https://doi.org/10.1109/IISWC50251.2020.00024

[29]

Yuzhen Huang, Xiaohan Wei, Xing Wang, Jiyan Yang, Bor-Yiing Su, Shivam Bharuka, Dhruv Choudhary, Zewei Jiang, Hai Zheng, and Jack Langman. 2021. Hierarchical Training: Scaling Deep Recommendation Models on Large CPU Clusters. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining(KDD ’21). Association for Computing Machinery, New York, NY, USA, 3050–3058. https://doi.org/10.1145/3447548.3467084

Digital Library

[30]

Ranggi Hwang, Taehun Kim, Youngeun Kwon, and Minsoo Rhu. 2020. Centaur: A Chiplet-Based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations. In Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture(ISCA ’20). IEEE Press, 968–981. https://doi.org/10.1109/ISCA45697.2020.00083

Digital Library

[31]

Taylor Gordon Ivan Medvedev, Haotian Wu. 2019. Powered by AI: Instagram’s Explore recommender system. https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system/

[32]

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422–446.

Digital Library

[33]

Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, 2021. MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions. Proceedings of Machine Learning and Systems 3 (2021).

[34]

Wenqi Jiang, Zhenhao He, Shuai Zhang, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, and Gustavo Alonso. 2021. FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining(KDD ’21). Association for Computing Machinery, New York, NY, USA, 3097–3105. https://doi.org/10.1145/3447548.3467139

Digital Library

[35]

Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 1–12.

Digital Library

[36]

Criteo Kaggle. 2014. Display Advertising Challenge: Predict click-through rates on display ads. https://www.kaggle.com/c/criteo-display-ad-challenge

[37]

Wang-Cheng Kang and Julian McAuley. 2019. Candidate Generation with Binary Codes for Large-Scale Top-N Recommendation. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1523–1532.

Digital Library

[38]

Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S Lee, 2020. Recnmp: Accelerating personalized recommendation with near-memory processing. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 790–803.

Digital Library

[39]

Liu Ke, Xuan Zhang, Jinin So, Jong-Geon Lee, Shin-Haeng Kang, Sukhan Lee, Songyi Han, Yeongon Cho, Jin Hyun Kim, Yongsuk Kwon, Kyungsoo Kim, Jin Jung, Ilkwon Yun, Sung Joo Park, Hyunsun Park, Joonho Song, Jeonghyeon Cho, Kyomin Sohn, Nam Sung Kim, and Hsien-Hsin Sean Lee. 2021. Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM. IEEE Micro (2021), 1–1. https://doi.org/10.1109/MM.2021.3097700

[40]

Byeongho Kim, Jaehyun Park, Eojin Lee, Minsoo Rhu, and Jung Ho Ahn. 2021. TRiM: Tensor Reduction in Memory. IEEE Computer Architecture Letters 20, 1 (2021), 5–8. https://doi.org/10.1109/LCA.2020.3042805

Digital Library

[41]

Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, and Vikas Chandra. 2019. Herald: Optimizing heterogeneous dnn accelerators for edge devices. arXiv preprint arXiv:1909.07437(2019).

[42]

Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. 2019. Tensordimm: A practical near-memory processing architecture for embeddings and tensor operations in deep learning. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 740–753.

Digital Library

[43]

Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. 2020. Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training. arxiv:cs.AR/2010.13100

[44]

Shih-Hsiang Lin, Pei-Yin Chen, and Yu-Ning Lin. 2017. Hardware Design of Low-Power High-Throughput Sorting Unit. IEEE Trans. Comput. 66, 8 (2017), 1383–1395. https://doi.org/10.1109/TC.2017.2672966

Digital Library

[45]

Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, and Mark Hempstead. 2020. Understanding Capacity-Driven Scale-Out Neural Recommendation Inference. arXiv preprint arXiv:2011.02084(2020).

[46]

Maxim Naumov, John Kim, Dheevatsa Mudigere, Srinivas Sridharan, Xiaodong Wang, Whitney Zhao, Serhat Yilmaz, Changkyu Kim, Hector Yuen, Mustafa Ozdal, 2020. Deep Learning Training in Facebook Data Centers: Design of Scale-up and Scale-out Systems. arXiv preprint arXiv:2003.09518(2020).

[47]

Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091(2019).

[48]

Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu-Yeon Wei, and David Brooks. 2019. Maxnvm: Maximizing dnn storage density and inference efficiency with sparse encoding and error mitigation. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 769–781.

Digital Library

[49]

Brandon Reagen, Paul Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Sae Kyu Lee, José Miguel Hernández-Lobato, Gu-Yeon Wei, and David Brooks. 2016. Minerva: Enabling low-power, highly-accurate deep neural network accelerators. In Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, 267–278.

Digital Library

[50]

Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. Scale-sim: Systolic cnn accelerator simulator. arXiv preprint arXiv:1811.02883(2018).

[51]

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4510–4520.

[52]

Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. 2020. Compositional embeddings using complementary partitions for memory-efficient recommendation systems. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 165–175.

Digital Library

[53]

Franyell Silfa, Gem Dot, Jose-Maria Arnau, and Antonio Gonzalez. 2017. E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks. (2017). arxiv:1711.07480https://arxiv.org/pdf/1711.07480.pdf

[54]

Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2019. Hypar: Towards hybrid parallelism for deep learning accelerator array. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 56–68.

[55]

Hu Wan, Xuan Sun, Yufei Cui, Chia-Lin Yang, Tei-Wei Kuo, and Chun Jason Xue. 2021. FlashEmbedding: Storing Embedding Tables in SSD for Large-Scale Recommender Systems. In Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems(APSys ’21). Association for Computing Machinery, New York, NY, USA, 9–16. https://doi.org/10.1145/3476886.3477511

Digital Library

[56]

Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, and Gu-Yeon Wei. 2021. RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference. In 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

[57]

Minhui Xie, Kai Ren, Youyou Lu, Guangxu Yang, Qingxing Xu, Bihai Wu, Jiazhen Lin, Hongbo Ao, Wanhong Xu, and Jiwu Shu. 2020. Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(SC ’20). IEEE Press, Article 21, 17 pages.

[58]

Xinyang Yi, Yi-Fan Chen, Sukriti Ramesh, Vinu Rajashekhar, Lichan Hong, Noah Fiedel, Nandini Seshadri, Lukasz Heldt, Xiang Wu, and Ed H. Chi. 2018. Factorized Deep Retrieval and Distributed TensorFlow Serving(SysML’18).

[59]

Chunxing Yin, Bilge Acun, Xing Liu, and Carole-Jean Wu. 2021. TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models(MLSys’21).

[60]

Jeff Zhang, Sameh Elnikety, Shuayb Zarar, Atul Gupta, and Siddharth Garg. 2020. Model-switching: Dealing with fluctuating workloads in machine-learning-as-a-service systems. In 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).

[61]

Jeff (Jun) Zhang, Parul Raj, Shuayb Zarar, Amol Ambardekar, and Siddharth Garg. 2019. CompAct: On-Chip ComPression of ActIvations for Low Power Systolic Array Based CNN Acceleration. ACM Trans. Embed. Comput. Syst. 18, 5s, Article 47 (Oct. 2019), 24 pages.

[62]

Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1–12. https://doi.org/10.1109/MICRO.2016.7783723

[63]

Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. arxiv:cs.DC/2003.05622

[64]

Weijie Zhao, Jingyuan Zhang, Deping Xie, Yulei Qian, Ronglai Jia, and Ping Li. 2019. AIBox: CTR Prediction Model Training on a Single Node. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management(CIKM ’19). Association for Computing Machinery, New York, NY, USA, 319–328.

Digital Library

[65]

Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi. 2019. Recommending What Video to Watch Next: A Multitask Ranking System. In Proceedings of the 13th ACM Conference on Recommender Systems(RecSys ’19). ACM, New York, NY, USA, 43–51. https://doi.org/10.1145/3298689.3346997

Digital Library

[66]

Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, and Kun Gai. 2019. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5941–5948.

Digital Library

[67]

Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. 2018. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1059–1068.

Digital Library

Cited By

Qiu YLu LYi SJing MZeng XKong YFan Y(2025)Flips: A Flexible Partitioning Strategy Near Memory Processing Architecture for Recommendation SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2025.353953436:4(745-758)Online publication date: Apr-2025
https://doi.org/10.1109/TPDS.2025.3539534
Tian RJiang JDu JHuang DLu Y(2024)Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU PlatformIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343262035:11(2177-2192)Online publication date: Nov-2024
https://doi.org/10.1109/TPDS.2024.3432620
Lim JKim YChung SKoushanfar FKong J(2024)Near-Memory Computing With Compressed Embedding Table for Personalized RecommendationIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2023.334587012:3(938-951)Online publication date: Jul-2024
https://doi.org/10.1109/TETC.2023.3345870
Show More Cited By

Recommendations

Personalized Recommendation Algorithm Using User Demography Information
WKDD '09: Proceedings of the 2009 Second International Workshop on Knowledge Discovery and Data Mining

Personalized recommendation systems are web-based systems that aim at predicting a user’s interest on available products and services by relying on previously rated items and dealing with the problem of information and product overload. User demography ...
Userrank for item-based collaborative filtering recommendation

With the recent explosive growth of the Web, recommendation systems have been widely accepted by users. Item-based Collaborative Filtering (CF) is one of the most popular approaches for determining recommendations. A common problem of current item-based ...
Attention-driven Factor Model for Explainable Personalized Recommendation
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Latent Factor Models (LFMs) based on Collaborative Filtering (CF) have been widely applied in many recommendation systems, due to their good performance of prediction accuracy. In addition to users' ratings, auxiliary information such as item features ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN:9781450385572

DOI:10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18 - 22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

18
Total Citations
View Citations
1,804
Total Downloads

Downloads (Last 12 months)407
Downloads (Last 6 weeks)35

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Qiu YLu LYi SJing MZeng XKong YFan Y(2025)Flips: A Flexible Partitioning Strategy Near Memory Processing Architecture for Recommendation SystemIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2025.353953436:4(745-758)Online publication date: Apr-2025
https://doi.org/10.1109/TPDS.2025.3539534
Tian RJiang JDu JHuang DLu Y(2024)Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU PlatformIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.343262035:11(2177-2192)Online publication date: Nov-2024
https://doi.org/10.1109/TPDS.2024.3432620
Lim JKim YChung SKoushanfar FKong J(2024)Near-Memory Computing With Compressed Embedding Table for Personalized RecommendationIEEE Transactions on Emerging Topics in Computing10.1109/TETC.2023.334587012:3(938-951)Online publication date: Jul-2024
https://doi.org/10.1109/TETC.2023.3345870
Yang WYang YJi SJiang JJing NWang QMao ZSheng W(2024)RecPIM: Efficient In-Memory Processing for Personalized Recommendation Inference Using Near-Bank ArchitectureIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.338611743:10(2854-2867)Online publication date: Oct-2024
https://doi.org/10.1109/TCAD.2024.3386117
Nair KPandey AKarabannavar SArunachalam MKalamatianos JAgrawal VGupta SSirasao ADelaye EReinhardt SVivekanandham RWittig RKathail VGopalakrishnan PPareek SJain RKandemir MLin JAkbulut GDas C(2024)Parallelization Strategies for DLRM Embedding Bag Operator on AMD CPUsIEEE Micro10.1109/MM.2024.342378544:6(44-51)Online publication date: 1-Nov-2024
https://dl.acm.org/doi/10.1109/MM.2024.3423785
Jain RBhasi VJog ASivasubramaniam AKandemir MDas C(2024)Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00091(1217-1232)Online publication date: 2-Nov-2024
https://doi.org/10.1109/MICRO61859.2024.00091
Choi YKim JRhu M(2024)ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00038(410-423)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00038
Lee YKim HRhu M(2024)PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA59077.2024.00033(340-353)Online publication date: 29-Jun-2024
https://doi.org/10.1109/ISCA59077.2024.00033
Krishnan ANambiar MSinghal R(2023)Hetero-Rec++: Modelling-based Robust and Optimal Deployment of Embeddings RecommendationsProceedings of the Third International Conference on AI-ML Systems10.1145/3639856.3639878(1-9)Online publication date: 25-Oct-2023
https://dl.acm.org/doi/10.1145/3639856.3639878
Hsia SGupta UAcun BArdalani NZhong PWei GBrooks DWu CAamodt TJerger NSwift M(2023)MP-Rec: Hardware-Software Co-design to Enable Multi-path RecommendationProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 310.1145/3582016.3582068(449-465)Online publication date: 25-Mar-2023
https://dl.acm.org/doi/10.1145/3582016.3582068
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten