Memory-sample tradeoffs for linear regression with small error

Published: 23 June 2019

Abstract

We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a_1, b_1), (a_2, b_2), …, with a_i drawn independently from a d-dimensional isotropic Gaussian, and where b_i = ⟨a_i, x⟩ + η_i, for a fixed x ∈ ℝ^d with ||x||_2 = 1 and with independent noise η_i drawn uniformly from the interval [−2^(−d/5), 2^(−d/5)]. We show that any algorithm with at most d^2/4 bits of memory requires at least Ω(d log log(1/ε)) samples to approximate x to ℓ_2 error ε with probability of success at least 2/3, for ε sufficiently small as a function of d. In contrast, for such ε, x can be recovered to error ε with probability 1 − o(1) with memory O(d^2 log(1/ε)) using d examples. These are the first nontrivial lower bounds for regression with super-linear memory, and they may open the door to strong memory/sample tradeoffs for continuous optimization.
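
To make the contrast in the abstract concrete, here is a minimal sketch (assuming NumPy; it is not code from the paper) that simulates the data model above and compares the memory-rich baseline, which buffers d examples and solves the resulting d × d linear system, against a single low-memory streaming pass. The dimension d = 50, the stream helper, and the Kaczmarz-style update used for the low-memory pass are illustrative choices of ours, not the paper's algorithms or bounds.

```python
# Illustrative sketch only (see assumptions noted above): simulate the abstract's
# streaming regression model and compare a Theta(d^2)-memory solve against a
# single O(d)-memory streaming pass.
import numpy as np

rng = np.random.default_rng(0)
d = 50                                       # dimension (illustrative choice)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                       # fixed unknown x with ||x||_2 = 1

def stream(n):
    """Yield n labeled examples (a_i, b_i) from the model in the abstract."""
    noise = 2.0 ** (-d / 5)                  # eta_i ~ Uniform[-2^(-d/5), 2^(-d/5)]
    for _ in range(n):
        a = rng.standard_normal(d)           # a_i ~ d-dimensional isotropic Gaussian
        yield a, a @ x + rng.uniform(-noise, noise)

# Memory-rich baseline: buffer d examples (about d^2 numbers) and solve A x = b.
A, b = map(np.array, zip(*stream(d)))
x_solve = np.linalg.lstsq(A, b, rcond=None)[0]

# Low-memory contrast: one pass of Kaczmarz-style updates, keeping only O(d) numbers.
x_stream = np.zeros(d)
for a, b_i in stream(d):
    x_stream += a * (b_i - a @ x_stream) / (a @ a)

print("l2 error, buffered solve    :", np.linalg.norm(x_solve - x))
print("l2 error, one streaming pass:", np.linalg.norm(x_stream - x))
```

On a typical run the buffered solve recovers x to small error from just d examples, while the single low-memory pass is left with noticeably larger error. The paper's lower bound makes a version of this gap quantitative: with at most d^2/4 bits of memory, Ω(d log log(1/ε)) samples are needed to reach error ε.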

Published In

STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
June 2019
1258 pages
ISBN: 9781450367059
DOI: 10.1145/3313276
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Lower bounds for optimization
  2. Memory-bounded learning

Qualifiers

  • Research-article

Conference

STOC '19

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Cited By

  • (2024) Efficient Convex Optimization Requires Superlinear Memory. Journal of the ACM, 71(6):1–37. DOI: 10.1145/3689208
  • (2024) A New Information Complexity Measure for Multi-pass Streaming with Applications. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1781–1792. DOI: 10.1145/3618260.3649672
  • (2024) Memory-Sample Lower Bounds for LWE. Advances in Cryptology – CRYPTO 2024, pages 158–182. DOI: 10.1007/978-3-031-68388-6_7
  • (2023) Hypothesis selection with memory constraints. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 50453–50481. DOI: 10.5555/3666122.3668317
  • (2023) Efficient convex optimization requires superlinear memory (extended abstract). Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6468–6473. DOI: 10.24963/ijcai.2023/722
  • (2023) Memory-Sample Lower Bounds for Learning with Classical-Quantum Hybrid Memory. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1097–1110. DOI: 10.1145/3564246.3585129
  • (2023) Memory-Query Tradeoffs for Randomized Convex Optimization. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1400–1413. DOI: 10.1109/FOCS57990.2023.00086
  • (2023) Tight Time-Space Lower Bounds for Constant-Pass Learning. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1195–1202. DOI: 10.1109/FOCS57990.2023.00070
  • (2023) Near Optimal Memory-Regret Tradeoff for Online Learning. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1171–1194. DOI: 10.1109/FOCS57990.2023.00069
  • (2022) Estimation of entropy in constant space with improved sample complexity. Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 32474–32486. DOI: 10.5555/3600270.3602623
