Memory-sample tradeoffs for linear regression with small error

Published: 23 June 2019

Abstract

We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a_1, b_1), (a_2, b_2), …, with a_i drawn independently from a d-dimensional isotropic Gaussian, and where b_i = ⟨a_i, x⟩ + η_i, for a fixed x ∈ ℝ^d with ||x||_2 = 1 and with independent noise η_i drawn uniformly from the interval [−2^(−d/5), 2^(−d/5)]. We show that any algorithm with at most d^2/4 bits of memory requires at least Ω(d log log(1/ε)) samples to approximate x to ℓ_2 error ε with probability of success at least 2/3, for ε sufficiently small as a function of d. In contrast, for such ε, x can be recovered to error ε with probability 1 − o(1) with memory O(d^2 log(1/ε)) using d examples. These are the first nontrivial lower bounds for regression with super-linear memory, and they may open the door to strong memory/sample tradeoffs for continuous optimization.
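
To make the contrast in the abstract concrete, here is a minimal sketch (assuming NumPy; it is not code from the paper) that simulates the data model above and compares the memory-rich baseline, which buffers d examples and solves the resulting d × d linear system, against a single low-memory streaming pass. The dimension d = 50, the stream helper, and the Kaczmarz-style update used for the low-memory pass are illustrative choices of ours, not the paper's algorithms or bounds.

```python
# Illustrative sketch only (see assumptions noted above): simulate the abstract's
# streaming regression model and compare a Theta(d^2)-memory solve against a
# single O(d)-memory streaming pass.
import numpy as np

rng = np.random.default_rng(0)
d = 50                                       # dimension (illustrative choice)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                       # fixed unknown x with ||x||_2 = 1

def stream(n):
    """Yield n labeled examples (a_i, b_i) from the model in the abstract."""
    noise = 2.0 ** (-d / 5)                  # eta_i ~ Uniform[-2^(-d/5), 2^(-d/5)]
    for _ in range(n):
        a = rng.standard_normal(d)           # a_i ~ d-dimensional isotropic Gaussian
        yield a, a @ x + rng.uniform(-noise, noise)

# Memory-rich baseline: buffer d examples (about d^2 numbers) and solve A x = b.
A, b = map(np.array, zip(*stream(d)))
x_solve = np.linalg.lstsq(A, b, rcond=None)[0]

# Low-memory contrast: one pass of Kaczmarz-style updates, keeping only O(d) numbers.
x_stream = np.zeros(d)
for a, b_i in stream(d):
    x_stream += a * (b_i - a @ x_stream) / (a @ a)

print("l2 error, buffered solve    :", np.linalg.norm(x_solve - x))
print("l2 error, one streaming pass:", np.linalg.norm(x_stream - x))
```

On a typical run the buffered solve recovers x to small error from just d examples, while the single low-memory pass is left with noticeably larger error. The paper's lower bound makes a version of this gap quantitative: with at most d^2/4 bits of memory, Ω(d log log(1/ε)) samples are needed to reach error ε.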

Published In

STOC 2019: Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing
June 2019
1258 pages
ISBN: 9781450367059
DOI: 10.1145/3313276
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Lower bounds for optimization
  2. Memory-bounded learning

Qualifiers

  • Research-article

Conference

STOC '19

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

Cited By

  • (2024) Efficient Convex Optimization Requires Superlinear Memory. Journal of the ACM, 71(6):1–37. DOI: 10.1145/3689208
  • (2024) A New Information Complexity Measure for Multi-pass Streaming with Applications. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 1781–1792. DOI: 10.1145/3618260.3649672
  • (2024) Memory-Sample Lower Bounds for LWE. Advances in Cryptology – CRYPTO 2024, pages 158–182. DOI: 10.1007/978-3-031-68388-6_7
  • (2023) Hypothesis selection with memory constraints. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 50453–50481. DOI: 10.5555/3666122.3668317
  • (2023) Efficient convex optimization requires superlinear memory (extended abstract). Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6468–6473. DOI: 10.24963/ijcai.2023/722
  • (2023) Memory-Sample Lower Bounds for Learning with Classical-Quantum Hybrid Memory. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1097–1110. DOI: 10.1145/3564246.3585129
  • (2023) Memory-Query Tradeoffs for Randomized Convex Optimization. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1400–1413. DOI: 10.1109/FOCS57990.2023.00086
  • (2023) Tight Time-Space Lower Bounds for Constant-Pass Learning. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1195–1202. DOI: 10.1109/FOCS57990.2023.00070
  • (2023) Near Optimal Memory-Regret Tradeoff for Online Learning. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1171–1194. DOI: 10.1109/FOCS57990.2023.00069
  • (2022) Estimation of entropy in constant space with improved sample complexity. Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 32474–32486. DOI: 10.5555/3600270.3602623
