
Optimal distributed online prediction using mini-batches

Published: 01 January 2012

Abstract

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
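
The abstract describes the distributed mini-batch idea in words; the sketch below illustrates the step it refers to: each of k workers averages gradients over its share of a mini-batch at the current shared predictor, the worker averages are combined (the vector-sum and broadcast phase), and a single serial gradient update is applied, as one processor would do with the whole batch. This is a minimal single-process simulation under assumed choices (a linear predictor with squared loss and a plain Euclidean gradient step); the names distributed_minibatch_step and eta are illustrative rather than taken from the paper, and the sketch ignores the communication latency that the paper's analysis accounts for.

```python
import numpy as np

def grad_squared_loss(w, x, y):
    """Gradient of the squared loss 0.5 * (w.x - y)^2 at w for a single input (x, y)."""
    return (w @ x - y) * x

def distributed_minibatch_step(w, batches, eta):
    """One illustrative distributed mini-batch update (not the paper's code).

    `batches` holds one (X_i, y_i) block per simulated worker. Each worker
    averages gradients over its own inputs at the shared point w; the worker
    averages are then combined and a single gradient step is taken, as if one
    processor had processed the entire mini-batch serially.
    """
    worker_grads = []
    for X_i, y_i in batches:
        g_i = np.mean([grad_squared_loss(w, x, y) for x, y in zip(X_i, y_i)], axis=0)
        worker_grads.append(g_i)
    g = np.mean(worker_grads, axis=0)  # stands in for the vector-sum / broadcast phase
    return w - eta * g                 # the usual serial update, applied to the averaged gradient

# Toy run: k = 4 workers, mini-batch size b = 32, dimension d = 5.
rng = np.random.default_rng(0)
d, k, b = 5, 4, 32
w_true = rng.normal(size=d)
w = np.zeros(d)
for t in range(200):
    X = rng.normal(size=(b, d))
    y = X @ w_true + 0.1 * rng.normal(size=b)
    batches = list(zip(np.array_split(X, k), np.array_split(y, k)))
    w = distributed_minibatch_step(w, batches, eta=0.1 / np.sqrt(t + 1))
```

Because each worker's average covers an equal share of the batch, averaging the worker averages equals the average over all b gradients, so the update coincides with the serial mini-batch step up to communication delays.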




      Information & Contributors

      Information

      Published In

      The Journal of Machine Learning Research, Volume 13, Issue 1
      January 2012
      3712 pages
      ISSN: 1532-4435
      EISSN: 1533-7928

      Publisher

      JMLR.org

      Publication History

      Published: 01 January 2012
      Published in JMLR Volume 13, Issue 1

      Author Tags

      1. convex optimization
      2. distributed computing
      3. online learning
      4. regret bounds
      5. stochastic optimization

      Qualifiers

      • Article


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 181
      • Downloads (Last 6 weeks): 7
      Reflects downloads up to 16 Nov 2024


      Citations

      Cited By

      • (2024) Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation. IEEE Transactions on Fuzzy Systems, 32(2):445-459. DOI: 10.1109/TFUZZ.2023.3300370. Online publication date: 1-Feb-2024.
      • (2022) CAEFL: composable and environment aware federated learning models. Proceedings of the 21st ACM SIGPLAN International Workshop on Erlang, pages 9-20. DOI: 10.1145/3546186.3549927. Online publication date: 6-Sep-2022.
      • (2022) Blockchain-enabled Federated Learning: A Survey. ACM Computing Surveys, 55(4):1-35. DOI: 10.1145/3524104. Online publication date: 21-Nov-2022.
      • (2022) NET-FLEET. Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 71-80. DOI: 10.1145/3492866.3549723. Online publication date: 3-Oct-2022.
      • (2022) Accelerating local SGD for non-IID data using variance reduction. Frontiers of Computer Science, 17(2). DOI: 10.1007/s11704-021-1018-0. Online publication date: 9-Aug-2022.
      • (2022) Scaling up stochastic gradient descent for non-convex optimisation. Machine Learning, 111(11):4039-4079. DOI: 10.1007/s10994-022-06243-3. Online publication date: 1-Nov-2022.
      • (2022) On the Parallelization Upper Bound for Asynchronous Stochastic Gradients Descent in Non-convex Optimization. Journal of Optimization Theory and Applications, 196(3):900-935. DOI: 10.1007/s10957-022-02141-9. Online publication date: 4-Dec-2022.
      • (2022) Unifying mirror descent and dual averaging. Mathematical Programming, 199(1-2):793-830. DOI: 10.1007/s10107-022-01850-3. Online publication date: 30-Jun-2022.
      • (2022) Communication-Efficient Distributed Online Prediction by Dynamic Model Synchronization. Machine Learning and Knowledge Discovery in Databases, pages 623-639. DOI: 10.1007/978-3-662-44848-9_40. Online publication date: 10-Mar-2022.
      • (2021) Accelerating Distributed Online Meta-Learning via Multi-Agent Collaboration under Limited Communication. Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 261-270. DOI: 10.1145/3466772.3467055. Online publication date: 26-Jul-2021.
