
DOI: 10.1145/2806777.2806778
Research article · Open access

Managed communication and consistency for fast data-parallel iterative analytics

Published: 27 August 2015

Abstract

At the core of Machine Learning (ML) analytics is often an expert-suggested model, whose parameters are refined by iteratively processing a training dataset until convergence. The completion time (i.e., convergence time) and the quality of the learned model depend not only on the rate at which refinements are generated but also on the quality of each refinement. While data-parallel ML applications often employ a loose consistency model when updating shared model parameters to maximize parallelism, the accumulated error may seriously degrade the quality of refinements and thus delay completion, a problem that usually worsens with scale. Although more immediate propagation of updates reduces the accumulated error, this strategy is limited by physical network bandwidth. Additionally, the performance of the widely used stochastic gradient descent (SGD) algorithm is sensitive to step size; simply increasing communication often fails to bring improvement without tuning the step size accordingly, and tedious hand tuning is usually needed to achieve optimal performance.
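To make this tension concrete, here is a minimal sketch (not from the paper) of data-parallel SGD on a toy least-squares problem: workers refine stale local copies of the shared parameters and synchronize only periodically, so less frequent communication accumulates more parallel error. The objective, worker count, schedule, and all names are illustrative assumptions.

```python
import numpy as np

# Toy least-squares problem; everything here is an illustrative assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)

def data_parallel_sgd(step_size, sync_every, steps=500, workers=2):
    """Each worker applies SGD updates to a stale local cache of the
    shared parameters; updates reach the global copy only every
    `sync_every` steps, mimicking a loose consistency model."""
    w_global = np.zeros(10)
    caches = [w_global.copy() for _ in range(workers)]
    pending = [np.zeros(10) for _ in range(workers)]
    for t in range(steps):
        for k in range(workers):
            i = rng.integers(len(X))
            grad = (X[i] @ caches[k] - y[i]) * X[i]  # stochastic gradient
            update = -step_size * grad
            caches[k] += update    # refine the stale local copy
            pending[k] += update   # buffer the update until the next sync
        if (t + 1) % sync_every == 0:  # infrequent communication
            for k in range(workers):
                w_global += pending[k]
                pending[k][:] = 0.0
            for k in range(workers):
                caches[k] = w_global.copy()
    return np.linalg.norm(w_global - w_true)

# Less frequent synchronization accumulates more parallel error, and a
# step size tuned for one schedule can behave poorly under another.
for sync_every in (1, 10, 50):
    print(sync_every, data_parallel_sgd(step_size=0.01, sync_every=sync_every))
```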
This paper presents Bösen, a system that maximizes network communication efficiency under a given inter-machine bandwidth budget to minimize parallel error, while ensuring theoretical convergence guarantees for large-scale data-parallel ML applications. Furthermore, Bösen prioritizes the messages most significant to algorithm convergence, accelerating convergence further. Finally, Bösen is the first distributed implementation of the recently proposed adaptive revision algorithm, which provides orders-of-magnitude improvement over a carefully tuned fixed schedule of step size refinements for some SGD algorithms. Experiments on two clusters with up to 1024 cores show that our mechanism significantly improves upon static communication schedules.
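As a rough illustration of the update-prioritization idea (a hedged sketch under assumed names, not Bösen's actual interface), a sender with a fixed per-round bandwidth budget can transmit only the buffered parameter deltas that are largest in magnitude, deferring the rest:

```python
import heapq

def send_prioritized(pending_updates, budget):
    """Hypothetical helper: pick the `budget` buffered deltas with the
    largest magnitude, since those matter most to convergence, and leave
    the rest for a later round. Not Bosen's real API."""
    chosen = heapq.nlargest(budget, pending_updates,
                            key=lambda k: abs(pending_updates[k]))
    message = {k: pending_updates[k] for k in chosen}
    for k in chosen:
        del pending_updates[k]  # sent deltas are cleared from the buffer
    return message

# Example: with a budget of 2 updates per round, only the two largest
# accumulated deltas go out; the small ones wait.
buf = {"w0": 0.8, "w1": -0.05, "w2": -1.3, "w3": 0.02}
print(send_prioritized(buf, budget=2))  # {'w2': -1.3, 'w0': 0.8}
print(buf)                              # {'w1': -0.05, 'w3': 0.02}
```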



Information

Published In

SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
August 2015
446 pages
ISBN:9781450336512
DOI:10.1145/2806777
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 27 August 2015


Badges

  • Best Paper

Qualifiers

  • Research-article


Conference

SoCC '15
Sponsor: SoCC '15: ACM Symposium on Cloud Computing
August 27-29, 2015
Kohala Coast, Hawaii

Acceptance Rates

SoCC '15 paper acceptance rate: 34 of 157 submissions (22%)
Overall acceptance rate: 169 of 722 submissions (23%)



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 148
  • Downloads (last 6 weeks): 13
Reflects downloads up to 19 Nov 2024


Cited By

  • (2024) PrimePar: Efficient Spatial-temporal Tensor Partitioning for Large Transformer Model Training. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 801-817. https://doi.org/10.1145/3620666.3651357. Online publication date: 27-Apr-2024.
  • (2024) Federated Learning and Meta Learning: Approaches, Applications, and Directions. IEEE Communications Surveys & Tutorials, 26(1):571-618. https://doi.org/10.1109/COMST.2023.3330910. Online publication date: Sep-2025.
  • (2024) Enabling Resource-Efficient AIoT System With Cross-Level Optimization: A Survey. IEEE Communications Surveys & Tutorials, 26(1):389-427. https://doi.org/10.1109/COMST.2023.3319952. Online publication date: Sep-2025.
  • (2024) A Synchronous Parallel Method with Parameters Communication Prediction for Distributed Machine Learning. Collaborative Computing: Networking, Applications and Worksharing, pages 385-403. https://doi.org/10.1007/978-3-031-54531-3_21. Online publication date: 23-Feb-2024.
  • (2023) A Study on Distributed Machine Learning Techniques for Large-Scale Weather Forecasting. Scalable and Distributed Machine Learning and Deep Learning Patterns, pages 44-64. https://doi.org/10.4018/978-1-6684-9804-0.ch003. Online publication date: 2-Jun-2023.
  • (2023) Shapley Values as a Strategy for Ensemble Weights Estimation. Applied Sciences, 13(12):7010. https://doi.org/10.3390/app13127010. Online publication date: 10-Jun-2023.
  • (2023) SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), pages 650-661. https://doi.org/10.1109/QRS60937.2023.00069. Online publication date: 22-Oct-2023.
  • (2023) Edge Learning for B5G Networks With Distributed Signal Processing: Semantic Communication, Edge Computing, and Wireless Sensing. IEEE Journal of Selected Topics in Signal Processing, 17(1):9-39. https://doi.org/10.1109/JSTSP.2023.3239189. Online publication date: Jan-2023.
  • (2023) Role of Federated Learning for Internet of Vehicles: A Systematic Review. Artificial Intelligence of Things, pages 128-139. https://doi.org/10.1007/978-3-031-48781-1_11. Online publication date: 3-Dec-2023.
  • (2022) Stanza: Layer Separation for Distributed Training in Deep Learning. IEEE Transactions on Services Computing, 15(3):1309-1320. https://doi.org/10.1109/TSC.2020.2985684. Online publication date: 1-May-2022.
