research-article

Open access

Efficiently learning structured distributions from untrusted batches

Authors:

Sitan Chen,

Jerry Li,

Ankur MoitraAuthors Info & Claims

STOC 2020: Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing

Pages 960 - 973

https://doi.org/10.1145/3357713.3384337

Published: 22 June 2020 Publication History

PDF eReader

Abstract

We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over 1, …, n. Each user sends a batch of k i.i.d. samples from this distribution; however an є-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-understood that (є) error is unavoidable. Suprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time which is exponential in either n or k.

We first give a sequence of polynomial time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions.

It turns out that this abstraction is quite powerful, and can be generalized to incorporate additional problem specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest.

References

[1]

J. Acharya, I. Diakonikolas, C. Hegde, J. Li, and L. Schmidt. 2015. Fast and Near-Optimal Algorithms for Approximating Distributions by Histograms. In PODS.

Abstract

References

Cited By

Index Terms

Recommendations

List-decodable robust mean estimation and learning mixtures of spherical gaussians

Learning mixtures of structured distributions over discrete domains

List-decodable covariance estimation

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations