
Optimal distributed online prediction using mini-batches

Published: 01 January 2012

Abstract

Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
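
The abstract describes the distributed mini-batch idea in words; the sketch below illustrates the step it refers to: each of k workers averages gradients over its share of a mini-batch at the current shared predictor, the worker averages are combined (the vector-sum and broadcast phase), and a single serial gradient update is applied, as one processor would do with the whole batch. This is a minimal single-process simulation under assumed choices (a linear predictor with squared loss and a plain Euclidean gradient step); the names distributed_minibatch_step and eta are illustrative rather than taken from the paper, and the sketch ignores the communication latency that the paper's analysis accounts for.

```python
import numpy as np

def grad_squared_loss(w, x, y):
    """Gradient of the squared loss 0.5 * (w.x - y)^2 at w for a single input (x, y)."""
    return (w @ x - y) * x

def distributed_minibatch_step(w, batches, eta):
    """One illustrative distributed mini-batch update (not the paper's code).

    `batches` holds one (X_i, y_i) block per simulated worker. Each worker
    averages gradients over its own inputs at the shared point w; the worker
    averages are then combined and a single gradient step is taken, as if one
    processor had processed the entire mini-batch serially.
    """
    worker_grads = []
    for X_i, y_i in batches:
        g_i = np.mean([grad_squared_loss(w, x, y) for x, y in zip(X_i, y_i)], axis=0)
        worker_grads.append(g_i)
    g = np.mean(worker_grads, axis=0)  # stands in for the vector-sum / broadcast phase
    return w - eta * g                 # the usual serial update, applied to the averaged gradient

# Toy run: k = 4 workers, mini-batch size b = 32, dimension d = 5.
rng = np.random.default_rng(0)
d, k, b = 5, 4, 32
w_true = rng.normal(size=d)
w = np.zeros(d)
for t in range(200):
    X = rng.normal(size=(b, d))
    y = X @ w_true + 0.1 * rng.normal(size=b)
    batches = list(zip(np.array_split(X, k), np.array_split(y, k)))
    w = distributed_minibatch_step(w, batches, eta=0.1 / np.sqrt(t + 1))
```

Because each worker's average covers an equal share of the batch, averaging the worker averages equals the average over all b gradients, so the update coincides with the serial mini-batch step up to communication delays.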




      Information & Contributors

      Information

      Published In

      The Journal of Machine Learning Research, Volume 13, Issue 1
      January 2012
      3712 pages
      ISSN: 1532-4435
      EISSN: 1533-7928

      Publisher

      JMLR.org

      Publication History

      Published: 01 January 2012
      Published in JMLR Volume 13, Issue 1

      Author Tags

      1. convex optimization
      2. distributed computing
      3. online learning
      4. regret bounds
      5. stochastic optimization

      Qualifiers

      • Article


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 181
      • Downloads (Last 6 weeks): 7
      Reflects downloads up to 16 Nov 2024


      Citations

      Cited By

      • (2024) Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation. IEEE Transactions on Fuzzy Systems, 32(2):445-459. DOI: 10.1109/TFUZZ.2023.3300370. Online publication date: 1-Feb-2024.
      • (2022) CAEFL: composable and environment aware federated learning models. Proceedings of the 21st ACM SIGPLAN International Workshop on Erlang, pages 9-20. DOI: 10.1145/3546186.3549927. Online publication date: 6-Sep-2022.
      • (2022) Blockchain-enabled Federated Learning: A Survey. ACM Computing Surveys, 55(4):1-35. DOI: 10.1145/3524104. Online publication date: 21-Nov-2022.
      • (2022) NET-FLEET. Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 71-80. DOI: 10.1145/3492866.3549723. Online publication date: 3-Oct-2022.
      • (2022) Accelerating local SGD for non-IID data using variance reduction. Frontiers of Computer Science, 17(2). DOI: 10.1007/s11704-021-1018-0. Online publication date: 9-Aug-2022.
      • (2022) Scaling up stochastic gradient descent for non-convex optimisation. Machine Learning, 111(11):4039-4079. DOI: 10.1007/s10994-022-06243-3. Online publication date: 1-Nov-2022.
      • (2022) On the Parallelization Upper Bound for Asynchronous Stochastic Gradients Descent in Non-convex Optimization. Journal of Optimization Theory and Applications, 196(3):900-935. DOI: 10.1007/s10957-022-02141-9. Online publication date: 4-Dec-2022.
      • (2022) Unifying mirror descent and dual averaging. Mathematical Programming, 199(1-2):793-830. DOI: 10.1007/s10107-022-01850-3. Online publication date: 30-Jun-2022.
      • (2022) Communication-Efficient Distributed Online Prediction by Dynamic Model Synchronization. Machine Learning and Knowledge Discovery in Databases, pages 623-639. DOI: 10.1007/978-3-662-44848-9_40. Online publication date: 10-Mar-2022.
      • (2021) Accelerating Distributed Online Meta-Learning via Multi-Agent Collaboration under Limited Communication. Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pages 261-270. DOI: 10.1145/3466772.3467055. Online publication date: 26-Jul-2021.
