
DOI: 10.1145/2806777.2806778
Research article · Open access

Managed communication and consistency for fast data-parallel iterative analytics

Published: 27 August 2015

Abstract

At the core of Machine Learning (ML) analytics is often an expert-suggested model, whose parameters are refined by iteratively processing a training dataset until convergence. The completion time (i.e., convergence time) and the quality of the learned model depend not only on the rate at which refinements are generated but also on the quality of each refinement. While data-parallel ML applications often employ a loose consistency model when updating shared model parameters to maximize parallelism, the accumulated error may seriously degrade the quality of refinements and thus delay completion, a problem that usually worsens with scale. Although more immediate propagation of updates reduces the accumulated error, this strategy is limited by physical network bandwidth. Additionally, the performance of the widely used stochastic gradient descent (SGD) algorithm is sensitive to step size; simply increasing communication often fails to bring improvement without tuning the step size accordingly, and tedious hand tuning is usually needed to achieve optimal performance.
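To make this tension concrete, here is a minimal sketch (not from the paper) of data-parallel SGD on a toy least-squares problem: workers refine stale local copies of the shared parameters and synchronize only periodically, so less frequent communication accumulates more parallel error. The objective, worker count, schedule, and all names are illustrative assumptions.

```python
import numpy as np

# Toy least-squares problem; everything here is an illustrative assumption.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)

def data_parallel_sgd(step_size, sync_every, steps=500, workers=2):
    """Each worker applies SGD updates to a stale local cache of the
    shared parameters; updates reach the global copy only every
    `sync_every` steps, mimicking a loose consistency model."""
    w_global = np.zeros(10)
    caches = [w_global.copy() for _ in range(workers)]
    pending = [np.zeros(10) for _ in range(workers)]
    for t in range(steps):
        for k in range(workers):
            i = rng.integers(len(X))
            grad = (X[i] @ caches[k] - y[i]) * X[i]  # stochastic gradient
            update = -step_size * grad
            caches[k] += update    # refine the stale local copy
            pending[k] += update   # buffer the update until the next sync
        if (t + 1) % sync_every == 0:  # infrequent communication
            for k in range(workers):
                w_global += pending[k]
                pending[k][:] = 0.0
            for k in range(workers):
                caches[k] = w_global.copy()
    return np.linalg.norm(w_global - w_true)

# Less frequent synchronization accumulates more parallel error, and a
# step size tuned for one schedule can behave poorly under another.
for sync_every in (1, 10, 50):
    print(sync_every, data_parallel_sgd(step_size=0.01, sync_every=sync_every))
```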
This paper presents Bösen, a system that maximizes network communication efficiency under a given inter-machine bandwidth budget to minimize parallel error, while ensuring theoretical convergence guarantees for large-scale data-parallel ML applications. Furthermore, Bösen prioritizes the messages most significant to algorithm convergence, accelerating convergence further. Finally, Bösen is the first distributed implementation of the recently proposed adaptive revision algorithm, which provides orders-of-magnitude improvement over a carefully tuned fixed schedule of step size refinements for some SGD algorithms. Experiments on two clusters with up to 1024 cores show that our mechanism significantly improves upon static communication schedules.
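As a rough illustration of the update-prioritization idea (a hedged sketch under assumed names, not Bösen's actual interface), a sender with a fixed per-round bandwidth budget can transmit only the buffered parameter deltas that are largest in magnitude, deferring the rest:

```python
import heapq

def send_prioritized(pending_updates, budget):
    """Hypothetical helper: pick the `budget` buffered deltas with the
    largest magnitude, since those matter most to convergence, and leave
    the rest for a later round. Not Bosen's real API."""
    chosen = heapq.nlargest(budget, pending_updates,
                            key=lambda k: abs(pending_updates[k]))
    message = {k: pending_updates[k] for k in chosen}
    for k in chosen:
        del pending_updates[k]  # sent deltas are cleared from the buffer
    return message

# Example: with a budget of 2 updates per round, only the two largest
# accumulated deltas go out; the small ones wait.
buf = {"w0": 0.8, "w1": -0.05, "w2": -1.3, "w3": 0.02}
print(send_prioritized(buf, budget=2))  # {'w2': -1.3, 'w0': 0.8}
print(buf)                              # {'w1': -0.05, 'w3': 0.02}
```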



Information

Published In

SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
August 2015
446 pages
ISBN:9781450336512
DOI:10.1145/2806777
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 27 August 2015


Badges

  • Best Paper

Qualifiers

  • Research-article


Conference

SoCC '15
Sponsor: SoCC '15: ACM Symposium on Cloud Computing
August 27-29, 2015
Kohala Coast, Hawaii

Acceptance Rates

SoCC '15 paper acceptance rate: 34 of 157 submissions (22%)
Overall acceptance rate: 169 of 722 submissions (23%)



Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 148
  • Downloads (last 6 weeks): 13
Reflects downloads up to 19 Nov 2024


Cited By

  • (2024) PrimePar: Efficient Spatial-temporal Tensor Partitioning for Large Transformer Model Training. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, pages 801-817. https://doi.org/10.1145/3620666.3651357. Online publication date: 27-Apr-2024.
  • (2024) Federated Learning and Meta Learning: Approaches, Applications, and Directions. IEEE Communications Surveys & Tutorials, 26(1):571-618. https://doi.org/10.1109/COMST.2023.3330910. Online publication date: Sep-2025.
  • (2024) Enabling Resource-Efficient AIoT System With Cross-Level Optimization: A Survey. IEEE Communications Surveys & Tutorials, 26(1):389-427. https://doi.org/10.1109/COMST.2023.3319952. Online publication date: Sep-2025.
  • (2024) A Synchronous Parallel Method with Parameters Communication Prediction for Distributed Machine Learning. Collaborative Computing: Networking, Applications and Worksharing, pages 385-403. https://doi.org/10.1007/978-3-031-54531-3_21. Online publication date: 23-Feb-2024.
  • (2023) A Study on Distributed Machine Learning Techniques for Large-Scale Weather Forecasting. Scalable and Distributed Machine Learning and Deep Learning Patterns, pages 44-64. https://doi.org/10.4018/978-1-6684-9804-0.ch003. Online publication date: 2-Jun-2023.
  • (2023) Shapley Values as a Strategy for Ensemble Weights Estimation. Applied Sciences, 13(12):7010. https://doi.org/10.3390/app13127010. Online publication date: 10-Jun-2023.
  • (2023) SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), pages 650-661. https://doi.org/10.1109/QRS60937.2023.00069. Online publication date: 22-Oct-2023.
  • (2023) Edge Learning for B5G Networks With Distributed Signal Processing: Semantic Communication, Edge Computing, and Wireless Sensing. IEEE Journal of Selected Topics in Signal Processing, 17(1):9-39. https://doi.org/10.1109/JSTSP.2023.3239189. Online publication date: Jan-2023.
  • (2023) Role of Federated Learning for Internet of Vehicles: A Systematic Review. Artificial Intelligence of Things, pages 128-139. https://doi.org/10.1007/978-3-031-48781-1_11. Online publication date: 3-Dec-2023.
  • (2022) Stanza: Layer Separation for Distributed Training in Deep Learning. IEEE Transactions on Services Computing, 15(3):1309-1320. https://doi.org/10.1109/TSC.2020.2985684. Online publication date: 1-May-2022.
