Training personalized recommendation systems from (GPU) scratch: look forward not backwards

Published: 11 June 2022

Abstract

Personalized recommendation models (RecSys) are among the most popular machine learning workloads served by hyperscalers. A critical challenge in training RecSys is its high memory capacity requirement, with model sizes reaching hundreds of GBs to TBs. In RecSys, the so-called embedding layers account for the majority of memory usage, so current systems employ a hybrid CPU-GPU design in which the large CPU memory stores the memory-hungry embedding layers. Unfortunately, training embeddings involves several memory-bandwidth-intensive operations, which are at odds with the slow CPU memory and cause performance overheads. Prior work proposed caching frequently accessed embeddings inside GPU memory as a means of filtering down the embedding-layer traffic to CPU memory, but this paper observes several limitations of such cache designs. In this work, we present a fundamentally different approach to designing embedding caches for RecSys. Our proposed ScratchPipe architecture utilizes unique properties of RecSys training to develop an embedding cache that sees not only past but also "future" cache accesses. ScratchPipe exploits this property to guarantee that the active working set of the embedding layers can "always" be captured inside our proposed cache, enabling embedding-layer training to be conducted at GPU memory speed.
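The abstract's key observation is that a training run's input batches are known ahead of time, so an embedding cache can inspect future accesses instead of relying only on past reuse. The minimal sketch below illustrates that idea in Python; the class name, the eviction policy, and the synthetic data are hypothetical simplifications for illustration, not the paper's actual ScratchPipe design.

```python
# Illustrative sketch of a "look-forward" embedding cache. The policy and
# all names here are our own assumptions, not the paper's implementation.
import numpy as np
from collections import OrderedDict

class LookForwardEmbeddingCache:
    """Caches embedding rows needed by the next `lookahead` training batches.

    Because training inputs are materialized in advance, the cache can scan
    *future* batches and guarantee the active working set stays resident.
    """

    def __init__(self, host_table: np.ndarray, capacity: int, lookahead: int):
        self.host_table = host_table   # large table kept in (slow) host memory
        self.capacity = capacity       # number of rows that fit in fast memory
        self.lookahead = lookahead     # how many future batches to scan
        self.cache = OrderedDict()     # row id -> cached embedding row

    def prefetch(self, batch_stream: list, step: int) -> None:
        """Pull in every row the next `lookahead` batches will touch."""
        window = batch_stream[step : step + self.lookahead]
        needed = {row for batch in window for row in batch}
        for row in needed:
            if row not in self.cache:
                self._insert(row, needed)

    def _insert(self, row: int, pinned: set) -> None:
        # Evict only rows that are NOT needed within the lookahead window.
        while len(self.cache) >= self.capacity:
            victim = next(r for r in self.cache if r not in pinned)
            self.cache.pop(victim)     # write-back of gradients omitted
        self.cache[row] = self.host_table[row].copy()

    def lookup(self, rows):
        # Every access hits by construction, as long as no single lookahead
        # window exceeds the cache capacity.
        return np.stack([self.cache[r] for r in rows])

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
table = rng.standard_normal((10_000, 16)).astype(np.float32)  # "CPU-side" table
batches = [rng.integers(0, 10_000, size=32).tolist() for _ in range(100)]

cache = LookForwardEmbeddingCache(table, capacity=512, lookahead=4)
for step, batch in enumerate(batches):
    cache.prefetch(batches, step)  # in practice, run ahead and overlap with compute
    vectors = cache.lookup(batch)  # always a hit: served at fast-memory speed
```

The point of the sketch is the invariant, not the policy details: because eviction never removes a row that an upcoming batch needs, every lookup during training hits in the cache, which is the property the abstract describes as "always" capturing the active working set.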


Information

Published In

ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
June 2022
1097 pages
ISBN:9781450386104
DOI:10.1145/3470496

In-Cooperation

  • IEEE CS TCCA: IEEE CS Technical Committee on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Author Tags

  1. graphics processing unit (GPU)
  2. memory architecture
  3. neural network
  4. recommendation system
  5. systems for machine learning

Qualifiers

  • Research-article

Funding Sources

  • Samsung Advanced Institute of Technology (SAIT)
  • National Research Foundation of Korea (NRF)

Conference

ISCA '22

Acceptance Rates

ISCA '22 paper acceptance rate: 67 of 400 submissions, 17%
Overall acceptance rate: 543 of 3,203 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 278
  • Downloads (last 6 weeks): 19
Reflects downloads up to 23 Sep 2024

Cited By
  • (2024) RecTS: A Temporal-Aware Memory System Optimization for Training Deep Learning Recommendation Models. Proceedings of the 17th ACM International Systems and Storage Conference, 104-117. DOI: 10.1145/3688351.3689155. Online publication date: 16-Sep-2024.
  • (2024) Scalability Limitations of Processing-in-Memory using Real System Evaluations. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8(1), 1-28. DOI: 10.1145/3639046. Online publication date: 21-Feb-2024.
  • (2024) NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models. IEEE Transactions on Computers, 73(5), 1248-1261. DOI: 10.1109/TC.2024.3365939. Online publication date: May-2024.
  • (2024) Heterogeneous Acceleration Pipeline for Recommendation System Training. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1063-1079. DOI: 10.1109/ISCA59077.2024.00081. Online publication date: 29-Jun-2024.
  • (2024) ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 410-423. DOI: 10.1109/ISCA59077.2024.00038. Online publication date: 29-Jun-2024.
  • (2024) Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 382-395. DOI: 10.1109/ISCA59077.2024.00036. Online publication date: 29-Jun-2024.
  • (2024) PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 340-353. DOI: 10.1109/ISCA59077.2024.00033. Online publication date: 29-Jun-2024.
  • (2023) Temporal-Guided Knowledge Graph-Enhanced Graph Convolutional Network for Personalized Movie Recommendation Systems. Future Internet, 15(10), 323. DOI: 10.3390/fi15100323. Online publication date: 28-Sep-2023.
  • (2023) RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 268-286. DOI: 10.1145/3623278.3624761. Online publication date: 25-Mar-2023.
  • (2023) Optimizing CPU Performance for Recommendation Systems At-Scale. Proceedings of the 50th Annual International Symposium on Computer Architecture, 1-15. DOI: 10.1145/3579371.3589112. Online publication date: 17-Jun-2023.
