DOI:10.1145/2783258.2783323

Petuum: A New Platform for Distributed Machine Learning on Big Data

Published: 10 August 2015

Abstract

How can one build a distributed framework that allows efficient deployment of a wide spectrum of modern advanced machine learning (ML) programs for industrial-scale problems using Big Models (100s of billions of parameters) on Big Data (terabytes or petabytes)? Contemporary parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized operators relying on graphical representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of different ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by leveraging several fundamental properties underlying ML programs that make them different from conventional operation-centric programs: error tolerance, dynamic structure, and nonuniform convergence; all stem from the optimization-centric nature shared in ML programs' mathematical definitions, and the iterative-convergent behavior of their algorithmic solutions. These properties present unique opportunities for an integrative system design, built on bounded-latency network synchronization and dynamic load-balancing scheduling, which is efficient, programmable, and enjoys provable correctness guarantees. We demonstrate how such a design in light of ML-first principles leads to significant performance improvements versus well-known implementations of several ML programs, allowing them to run in much less time and at considerably larger model sizes, on modestly-sized computer clusters.
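
A minimal sketch of the bounded-staleness ("stale synchronous parallel") synchronization idea described above may help make the design concrete. This is not Petuum's API: every name below (BoundedStalenessTable, run_worker, the toy gradient step) is hypothetical, and Python threads stand in for cluster workers so the example stays self-contained. It only illustrates how a bounded clock gap trades consistency for throughput while iterative-convergent updates absorb the resulting error.

import threading
from collections import defaultdict

class BoundedStalenessTable:
    """Shared parameter store with a bounded clock gap between workers (hypothetical sketch)."""
    def __init__(self, num_workers, staleness):
        self.staleness = staleness          # max iterations the fastest worker may run ahead
        self.clocks = [0] * num_workers     # per-worker iteration counters
        self.params = defaultdict(float)    # shared model parameters
        self.cond = threading.Condition()

    def inc(self, key, delta):
        # Error tolerance: updates are applied without a global barrier;
        # slightly stale reads are acceptable for iterative-convergent ML.
        with self.cond:
            self.params[key] += delta

    def get(self, worker_id, key):
        # A worker at clock c may only read once the slowest worker has reached
        # clock c - staleness; otherwise it blocks until stragglers catch up.
        with self.cond:
            while min(self.clocks) < self.clocks[worker_id] - self.staleness:
                self.cond.wait()
            return self.params[key]

    def clock(self, worker_id):
        # End of one local iteration; wake any workers waiting on stragglers.
        with self.cond:
            self.clocks[worker_id] += 1
            self.cond.notify_all()

def run_worker(table, worker_id, num_iters):
    for _ in range(num_iters):
        w = table.get(worker_id, "w")       # bounded-stale read
        table.inc("w", -0.01 * (w - 1.0))   # toy gradient step toward w = 1.0
        table.clock(worker_id)

if __name__ == "__main__":
    table = BoundedStalenessTable(num_workers=4, staleness=2)
    threads = [threading.Thread(target=run_worker, args=(table, i, 200))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("w converged to approximately", round(table.params["w"], 3))

Setting staleness=0 recovers bulk-synchronous behavior, while a large staleness approaches fully asynchronous updates. The paper's model-parallel side additionally prioritizes which parameters to update based on their distance from convergence, which this data-parallel sketch does not show.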

Supplementary Material

MP4 File (p1335.mp4)

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 August 2015

Author Tags

  1. big data
  2. big model
  3. data-parallelism
  4. distributed systems
  5. machine learning
  6. model-parallelism
  7. theory

Qualifiers

  • Research-article

Conference

KDD '15

Acceptance Rates

KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 173
  • Downloads (Last 6 weeks): 23
Reflects downloads up to 13 Nov 2024

Citations

Cited By

  • (2024) Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. Proceedings of the 53rd International Conference on Parallel Processing, 10.1145/3673038.3673095, 514-523. Online publication date: 12-Aug-2024
  • (2024) KLNK: Expanding Page Boundaries in a Distributed Shared Memory System. IEEE Transactions on Parallel and Distributed Systems, 10.1109/TPDS.2024.3409882, 35(9), 1524-1535. Online publication date: Sep-2024
  • (2024) Asynchronous Decentralized Federated Learning for Heterogeneous Devices. IEEE/ACM Transactions on Networking, 10.1109/TNET.2024.3424444, 32(5), 4535-4550. Online publication date: Oct-2024
  • (2024) Privacy-Preserving and Secure Industrial Big Data Analytics: A Survey and the Research Framework. IEEE Internet of Things Journal, 10.1109/JIOT.2024.3353727, 11(11), 18976-18999. Online publication date: 1-Jun-2024
  • (2024) AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 10.1109/ICDE60146.2024.00394, 5238-5251. Online publication date: 13-May-2024
  • (2024) A Synchronous Parallel Method with Parameters Communication Prediction for Distributed Machine Learning. Collaborative Computing: Networking, Applications and Worksharing, 10.1007/978-3-031-54531-3_21, 385-403. Online publication date: 23-Feb-2024
  • (2023) Scaling Machine Learning with a Ring-based Distributed Framework. Proceedings of the 2023 7th International Conference on Computer Science and Artificial Intelligence, 10.1145/3638584.3638667, 23-32. Online publication date: 8-Dec-2023
  • (2023) Good Intentions: Adaptive Parameter Management via Intent Signaling. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 10.1145/3583780.3614895, 2156-2166. Online publication date: 21-Oct-2023
  • (2023) HPCClusterScape: Increasing Transparency and Efficiency of Shared High-Performance Computing Clusters for Large-scale AI Models. 2023 IEEE Visualization in Data Science (VDS), 10.1109/VDS60365.2023.00008, 21-29. Online publication date: 15-Oct-2023
  • (2023) Two-layer accumulated quantized compression for communication-efficient federated learning: TLAQC. Scientific Reports, 10.1038/s41598-023-38916-x, 13(1). Online publication date: 19-Jul-2023