Scaling Machine Learning with a Ring-based Distributed Framework

Published: 14 March 2024
DOI: 10.1145/3638584.3638667

Abstract

In centralized distributed machine learning systems, the communication overhead between servers and computing nodes is a key factor limiting training efficiency. Although existing research has proposed various measures to reduce inter-node communication in parameter server frameworks, the communication pressure inherent in centralized architectures remains significant. To address this issue, this paper proposes a ring-based parameter server framework that differs from the standard parameter server framework in both its node partitioning and its model training mechanism. The ring-based architecture eliminates the global model stored on the server side; instead, each computing node stores a local copy of the model. During training, computing nodes asynchronously update their local models using local or remote training data, and after all nodes finish training, an ensemble learning method predicts on test data using all local models. To prevent remote data reading from degrading training efficiency, a producer-consumer data reading strategy is proposed, which overlaps data reading with computation in a pipelined manner. To make full use of every node's input and output bandwidth, a circular data scheduling mechanism is proposed: at any given time, each node has at most one input stream and one output stream, which disperses communication pressure across the cluster. Experimental results show that the proposed architecture achieves significantly better accuracy (a 1.7%-2.1% RMSE improvement) than state-of-the-art baselines, as well as a 2.2x-3.4x speedup when reaching comparable RMSE.
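
The two communication mechanisms above lend themselves to a short illustration. The following Python code is a minimal sketch, not the authors' implementation: the node count, the pipeline depth, the toy "model" (a running mean), and every function name (ring_source, read_remote_shard, and so on) are assumptions introduced for the example. A producer thread prefetches remote shards chosen by a circular schedule while a consumer thread trains on them, so reading and training overlap in pipeline fashion and each node reads from at most one peer per step.

    # Hypothetical sketch: pipelined producer-consumer data reading plus a
    # circular schedule; all names and sizes are illustrative, not from
    # the paper.
    import queue
    import threading

    NUM_NODES = 4   # assumed cluster size
    NUM_STEPS = 3   # assumed number of remote-reading rounds

    def ring_source(node_id, step):
        # Circular schedule: at step t, node i reads from (i - t) mod N
        # and is read by (i + t) mod N, so every node has at most one
        # input stream and one output stream at any moment.
        return (node_id - step) % NUM_NODES

    def read_remote_shard(src):
        # Stand-in for a real remote read (an RPC or a shared file system).
        return [float(src)] * 8

    def producer(node_id, buf):
        # Producer: prefetch remote shards round by round; the bounded
        # queue applies backpressure so prefetching stays a fixed number
        # of shards ahead of training.
        for step in range(1, NUM_STEPS + 1):
            buf.put(read_remote_shard(ring_source(node_id, step)))
        buf.put(None)  # sentinel: no more shards

    def consumer(node_id, buf):
        # Consumer: update the node's local model copy while the producer
        # fetches the next shard, overlapping communication with training.
        mean, seen = 0.0, 0
        while True:
            batch = buf.get()
            if batch is None:
                break
            for x in batch:          # toy "update": a running mean
                seen += 1
                mean += (x - mean) / seen
        print(f"node {node_id}: local model = {mean:.3f}")

    if __name__ == "__main__":
        buf = queue.Queue(maxsize=2)  # pipeline depth of two shards
        t = threading.Thread(target=producer, args=(0, buf))
        t.start()
        consumer(0, buf)
        t.join()

In the full framework, every one of the N nodes would run such a producer-consumer pair concurrently, with sources staggered by the circular schedule so that the N transfers active at each step form a ring rather than converging on a single server.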

Published In

CSAI '23: Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, December 2023, 563 pages.
ISBN: 9798400708688
Proceedings DOI: 10.1145/3638584

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Artificial Intelligence
2. Machine Learning
3. Parameter Server
4. Ring-based Distributed Framework
