
DOI: 10.1145/3528416.3530246

Orchestra: adaptively accelerating distributed deep learning in heterogeneous environments

Published: 17 May 2022

Abstract

The synchronized Local-SGD (stochastic gradient descent) strategy has become increasingly popular in distributed deep learning (DML) because it effectively reduces the frequency of model communication while ensuring global model convergence. However, it performs poorly and leads to excessive training time in heterogeneous environments because of differences in workers' performance. In particular, in scenarios with unbalanced data, these differences between workers can aggravate low resource utilization and eventually produce stragglers, which seriously hurt the whole training procedure. Existing solutions either suffer under heterogeneous computing resources or do not fully address the dynamics of the environment.
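For context, a brief sketch of the standard synchronized Local-SGD update the abstract refers to (textbook formulation, not taken from this paper): each of the K workers runs H local SGD steps on its own data shard, and the local models are averaged at every synchronization point,

$$ w^{(k)}_{t+1} = w^{(k)}_t - \eta\, \nabla F_k\big(w^{(k)}_t;\ \xi^{(k)}_t\big), \qquad w^{(k)}_{t+1} \leftarrow \tfrac{1}{K}\textstyle\sum_{j=1}^{K} w^{(j)}_{t+1} \quad \text{if } (t+1) \bmod H = 0. $$

With a fixed H and fixed shard sizes, every synchronization waits for the slowest worker, which is exactly the straggler effect described above.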
In this paper, we eliminate the negative impact of dynamic resource constraints in heterogeneous DML environments with Orchestra, a novel adaptive load-balancing framework. The main idea of Orchestra is to improve resource utilization by balancing the training load against each worker's performance and the imbalance in data volume. In addition, one of Orchestra's strongest features is per-worker adaptation of the number of local updates at each epoch. To achieve this, we propose a distributed deep reinforcement learning-driven algorithm that lets each worker dynamically determine its number of local updates and its training data volume, subject to mini-batch time and resource constraints at each epoch. Our design significantly improves the convergence speed of the model in DML compared with other state-of-the-art approaches.
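To make the adaptive idea concrete, here is a minimal, self-contained simulation sketch (not the authors' implementation; the worker speeds, the toy least-squares objective, the proportional step-allocation rule, and all names are illustrative assumptions). Each worker receives a data shard and a per-round local-update budget proportional to its speed, so rounds take roughly equal wall-clock time, and the server averages the local models at each synchronization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: linear regression with data split unevenly across heterogeneous workers.
d, n = 10, 4000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)

num_workers = 4
speeds = np.array([4.0, 2.0, 1.0, 0.5])                 # relative worker speeds (assumed)
shard_sizes = (n * speeds / speeds.sum()).astype(int)    # load-balance data volume by speed
shards = np.split(np.arange(n), np.cumsum(shard_sizes)[:-1])

def local_sgd(w, idx, steps, lr=0.01, batch=32):
    """Run `steps` mini-batch SGD updates on one worker's data shard."""
    w = w.copy()
    for _ in range(steps):
        b = rng.choice(idx, size=batch)
        grad = X[b].T @ (X[b] @ w - y[b]) / batch
        w -= lr * grad
    return w

w_global = np.zeros(d)
base_steps = 8  # local updates per round for the slowest worker (assumed)
for rnd in range(50):
    # Per-round local-update budget proportional to speed, so all workers
    # finish a round in roughly the same wall-clock time (no stragglers).
    budgets = np.maximum(1, (base_steps * speeds / speeds.min()).astype(int))
    local_models = [local_sgd(w_global, shards[k], budgets[k]) for k in range(num_workers)]
    # Synchronize: average the local models, weighted by shard size.
    weights = shard_sizes / shard_sizes.sum()
    w_global = sum(wk * m for wk, m in zip(weights, local_models))

print("distance to w_true:", np.linalg.norm(w_global - w_true))
```

In Orchestra the per-worker budgets and data volumes are chosen adaptively by a deep reinforcement learning policy at each epoch, rather than by the fixed proportional rule used in this sketch.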


Cited By

  • (2024) Advancing e-commerce user purchase prediction: Integration of time-series attention with event-based timestamp encoding and Graph Neural Network-Enhanced user profiling. PLOS ONE 19(4), e0299087. DOI: 10.1371/journal.pone.0299087. Online publication date: 18-Apr-2024

      Published In

      CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers
      May 2022
      321 pages
      ISBN:9781450393386
      DOI:10.1145/3528416
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 May 2022


      Author Tags

      1. distributed deep learning
      2. heterogeneous environments
      3. load-balance
      4. local update adaptation

      Qualifiers

      • Extended-abstract

      Conference

      CF '22

      Acceptance Rates

      Overall Acceptance Rate 273 of 785 submissions, 35%


      Article Metrics

      • Downloads (last 12 months): 13
      • Downloads (last 6 weeks): 0

      Reflects downloads up to 15 Feb 2025

