DOI: 10.1109/ISCA45697.2020.00083 (research article)

Centaur: a chiplet-based, hybrid sparse-dense accelerator for personalized recommendations

Published: 23 September 2020

Abstract

Personalized recommendation is the backbone machine learning (ML) algorithm powering several important application domains (e.g., ads, e-commerce) serviced from cloud datacenters. Sparse embedding layers are a crucial building block of recommendation models, yet little attention has been paid to properly accelerating this important ML algorithm. This paper first provides a detailed workload characterization of personalized recommendation and identifies two significant performance limiters: memory-intensive embedding layers and compute-intensive multi-layer perceptron (MLP) layers. We then present Centaur, a chiplet-based hybrid sparse-dense accelerator that addresses both the memory throughput challenges of embedding layers and the compute limitations of MLP layers. We implement and demonstrate our proposal on an Intel HARPv2, a package-integrated CPU+FPGA device, achieving a 1.7-17.2x performance speedup and a 1.7-19.5x energy-efficiency improvement over conventional approaches.




Published In

ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
May 2020
1152 pages
ISBN: 9781728146614

In-Cooperation

  • IEEE

Publisher

IEEE Press

Author Tags

  1. FPGA
  2. accelerator
  3. deep learning
  4. machine learning
  5. neural network
  6. processor architecture

Qualifiers

  • Research-article

Conference

ISCA '20

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Article Metrics

  • Downloads (last 12 months): 69
  • Downloads (last 6 weeks): 9
Reflects downloads up to 16 Dec 2024.


Cited By

  • Hetero-Rec++: Modelling-based Robust and Optimal Deployment of Embeddings Recommendations. Proceedings of the Third International Conference on AI-ML Systems, 2023, pp. 1-9. doi:10.1145/3639856.3639878
  • RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2023, pp. 268-286. doi:10.1145/3623278.3624761
  • A Survey of Design and Optimization for Systolic Array-based DNN Accelerators. ACM Computing Surveys 56(1), 2023, pp. 1-37. doi:10.1145/3604802
  • MP-Rec: Hardware-Software Co-design to Enable Multi-path Recommendation. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2023, pp. 449-465. doi:10.1145/3582016.3582068
  • GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2023, pp. 282-301. doi:10.1145/3582016.3582029
  • Optimizing CPU Performance for Recommendation Systems At-Scale. Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1-15. doi:10.1145/3579371.3589112
  • ETTE: Efficient Tensor-Train-based Computing Engine for Deep Neural Networks. Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1-13. doi:10.1145/3579371.3589103
  • Accelerating Personalized Recommendation with Cross-level Near-Memory Processing. Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1-13. doi:10.1145/3579371.3589101
  • Hetero-Rec: Optimal Deployment of Embeddings for High-Speed Recommendations. Proceedings of the Second International Conference on AI-ML Systems, 2022, pp. 1-9. doi:10.1145/3564121.3564134
  • Performance Model and Profile Guided Design of a High-Performance Session Based Recommendation Engine. Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering, 2022, pp. 133-144. doi:10.1145/3489525.3511692
