DOI: 10.1145/3698038.3698549
Research article | Free access

Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores

Published: 20 November 2024

Abstract

Storage disaggregation underlies today's cloud and is naturally complemented by pushing down some computation to storage, thus mitigating the potential network bottleneck between the storage and compute tiers. We show how ML training benefits from storage pushdowns by focusing on transfer learning (TL), the widespread technique that democratizes ML by reusing existing knowledge from related tasks. We propose HAPI, a new TL processing system centered around two complementary techniques that address challenges introduced by disaggregation. First, applications must carefully balance execution across tiers for performance. HAPI judiciously splits the TL computation during the feature extraction phase, yielding pushdowns that not only reduce network transfer time but also shorten total TL training time by overlapping the execution of consecutive training iterations across tiers. Second, operators want resource efficiency from the storage-side computational resources. HAPI employs storage-side batch size adaptation, which increases storage-side pushdown concurrency without affecting training accuracy. HAPI yields up to 2.5× training speed-up while choosing, in 86.8% of cases, either the best-performing split point or one whose performance is within 5% of the best.
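To make the two mechanisms concrete, the sketch below is a minimal illustration of the pattern the abstract describes, not HAPI's actual code or API: the model choice, the value of split_index, the storage_side_extract helper, and the micro-batch size are all assumptions. A frozen prefix of a pretrained network runs near storage in small micro-batches (so several concurrent pushdowns can fit in the storage server's memory), while the remaining layers plus a new task head train on the compute tier.

import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone; in TL the early layers stay frozen and only the
# later layers plus a new head are trained.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
layers = list(model.children())

split_index = 6  # hypothetical split point; a system like HAPI would pick this per workload

# Storage-side part: the frozen feature-extraction prefix (the pushdown).
storage_part = nn.Sequential(*layers[:split_index]).eval()
for p in storage_part.parameters():
    p.requires_grad = False

# Compute-side part: remaining backbone layers plus a new task-specific head.
compute_part = nn.Sequential(*layers[split_index:-1], nn.Flatten(),
                             nn.Linear(model.fc.in_features, 10))

def storage_side_extract(batch, micro_batch_size=8):
    # Run the frozen prefix in micro-batches so several concurrent pushdowns
    # can share the storage server's limited memory (batch size adaptation).
    outs = []
    with torch.no_grad():
        for chunk in torch.split(batch, micro_batch_size):
            outs.append(storage_part(chunk))
    return torch.cat(outs)

# Compute-tier training step consumes the pushed-down activations.
images = torch.randn(32, 3, 224, 224)        # stand-in for a decoded image batch
activations = storage_side_extract(images)   # would arrive over the network
logits = compute_part(activations)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (32,)))
loss.backward()

Under such a split, while the compute tier trains on one iteration's activations, the storage tier can already extract features for the next iteration, which is the cross-tier overlap the abstract refers to.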

Published In

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
November 2024
1062 pages
ISBN:9798400712869
DOI:10.1145/3698038
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Near-data processing
  2. object stores
  3. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Huawei

Conference

SoCC '24: ACM Symposium on Cloud Computing
November 20 - 22, 2024
Redmond, WA, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%
