List-decodable robust mean estimation and learning mixtures of spherical gaussians

Published: 20 June 2018 · DOI: 10.1145/3188745.3188758

Abstract

We study the problem of list-decodable (robust) Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. In the former problem, we are given a set T of points in ℝ^n with the promise that an α-fraction of the points in T, where 0 < α < 1/2, are drawn from an identity-covariance Gaussian G with unknown mean, and no assumptions are made about the remaining points. The goal is to output a small list of candidate vectors with the guarantee that at least one of the candidates is close to the mean of G (a simulation sketch of this input model is given after the contributions below). In the latter problem, we are given samples from a k-mixture of spherical Gaussians on ℝ^n, and the goal is to estimate the unknown model parameters up to small accuracy. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. Specifically, our main contributions are as follows:
List-Decodable Mean Estimation. Fix any d ∈ ℤ₊ and 0 < α < 1/2. We design an algorithm with sample complexity O_d(n^d/α) and runtime O_d((n/α)^d) that outputs a list of O(1/α) many candidate vectors such that, with high probability, one of the candidates is within ℓ2-distance O_d(α^{−1/(2d)}) from the mean of G. The only previous algorithm for this problem achieved error Õ(α^{−1/2}) under second moment conditions. For d = O(1/ε), where ε > 0 is a constant, our algorithm runs in polynomial time and achieves error O(α^{−ε}). For d = Θ(log(1/α)), our algorithm runs in time (n/α)^{O(log(1/α))} and achieves error O(log^{3/2}(1/α)), almost matching the information-theoretically optimal bound of Θ(log^{1/2}(1/α)) that we establish. We also give a Statistical Query (SQ) lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible.
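To read these bounds together (a worked restatement of the guarantees above, not a new claim; the O_d(·) hides a constant that grows with d, which is why the d = Θ(log(1/α)) error reads O(log^{3/2}(1/α)) rather than O(1)):

```latex
\[
  \text{error} = O_d\!\left(\alpha^{-1/(2d)}\right), \qquad
  \text{runtime} = O_d\!\left((n/\alpha)^{d}\right)
\]
\[
\begin{array}{lll}
  d = 1: & \text{error } O(\alpha^{-1/2}) & \text{(the previously known rate)} \\
  d = O(1/\epsilon): & \text{error } O(\alpha^{-\epsilon}) & \text{in } \mathrm{poly}(n/\alpha) \text{ time} \\
  d = \Theta(\log(1/\alpha)): & \text{error } O(\log^{3/2}(1/\alpha)) & \text{in } (n/\alpha)^{O(\log(1/\alpha))} \text{ time}
\end{array}
\]
```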
Learning Mixtures of Spherical Gaussians. We give a learning algorithm for mixtures of spherical Gaussians, with unknown spherical covariances, that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform k-mixture of identity-covariance Gaussians, we obtain the following: For any ε > 0, if the pairwise separation between the means is at least Ω(k^ε + √(log(1/δ))), our algorithm learns the unknown parameters within accuracy δ with sample complexity and running time poly(n, 1/δ, (k/ε)^{1/ε}). Moreover, our algorithm is robust to a small dimension-independent fraction of corrupted data. The previously best known polynomial-time algorithm required separation at least k^{1/4} polylog(k/δ). Finally, our algorithm works under separation Õ(log^{3/2}(k) + √(log(1/δ))) with sample complexity and running time poly(n, 1/δ, k^{log k}). This bound is close to the information-theoretically minimal separation of Ω(√(log k)).
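Suppressing the common additive √(log(1/δ)) term, the separation requirements order as follows for large k (our own side-by-side reading of the bounds above; the weaker separations are bought with more running time, poly(n, 1/δ, k^{log k}) in the quasi-polynomial regime versus poly(n, 1/δ, (k/ε)^{1/ε}) in the polynomial one):

```latex
\[
  \underbrace{\Omega\bigl(\sqrt{\log k}\bigr)}_{\text{information-theoretic}}
  \;\le\;
  \underbrace{\tilde{O}\bigl(\log^{3/2} k\bigr)}_{\text{this work, quasi-poly time}}
  \;\ll\;
  \underbrace{\Omega\bigl(k^{\epsilon}\bigr)}_{\text{this work, poly time}}
  \;\ll\;
  \underbrace{k^{1/4}\,\mathrm{polylog}(k/\delta)}_{\text{previous poly-time bound}}
\]
```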
Our main technical contribution is a new technique, using degree-d multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
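As promised above, here is a minimal simulation of the list-decoding input model (an illustrative NumPy sketch; the function name, default sizes, and the particular choice of adversarial points are our own, not the paper's):

```python
import numpy as np

def make_list_decoding_instance(n=100, N=10_000, alpha=0.1, seed=0):
    """Build a set T in which only an alpha-fraction of points are genuine
    samples from N(mu, I_n); the remaining points are arbitrary (here,
    planted clusters designed to look like plausible Gaussians)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n)                        # unknown true mean
    n_good = int(alpha * N)
    good = mu + rng.normal(size=(n_good, n))       # inliers ~ N(mu, I)
    # The (1 - alpha)-fraction of outliers is unconstrained; a hard case
    # plants roughly 1/alpha fake identity-covariance clusters.
    fake_means = 3.0 * rng.normal(size=(9, n))
    picks = rng.integers(0, 9, size=N - n_good)
    bad = fake_means[picks] + rng.normal(size=(N - n_good, n))
    T = np.vstack([good, bad])
    rng.shuffle(T)                                 # shuffle rows of T
    return T, mu

# An algorithm sees only T and must output O(1/alpha) candidate vectors,
# at least one of which is close to mu in l2-distance.
T, mu = make_list_decoding_instance()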
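The degree-1 shadow of this technique is a familiar spectral filter: if the inliers have identity covariance, any direction whose empirical variance is far above 1 must be dominated by outliers, and points far out along that direction can be down-weighted. Below is a toy sketch under our own simplifications; the paper's algorithm instead uses degree-d polynomials and a multifilter that branches into several candidate clusters, which is what produces the output list, whereas this version returns a single candidate:

```python
import numpy as np

def toy_spectral_filter(T, thresh=10.0, max_iter=50):
    """Iteratively down-weight points along the top-variance direction.

    For inliers ~ N(mu, I), the weighted covariance should be spectrally
    close to the identity; a top eigenvalue lam >> 1 flags a direction in
    which outliers dominate, so points with large projections along it
    are softly (exponentially) down-weighted."""
    w = np.ones(len(T))
    for _ in range(max_iter):
        mu_hat = np.average(T, axis=0, weights=w)
        centered = T - mu_hat
        cov = (centered * w[:, None]).T @ centered / w.sum()
        eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam < thresh:                           # near-identity: stop
            break
        proj = centered @ v
        w = w * np.exp(-proj**2 / (2.0 * lam))
    return np.average(T, axis=0, weights=w)

# mu_hat = toy_spectral_filter(T)  # on the instance sketched above
```

In the α < 1/2 regime a single reweighting cannot succeed in general, since the outliers may themselves form clusters indistinguishable from the true one; that is precisely what forces the branching multifilter and the list-valued output.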

Supplementary Material

MP4 File (7b-2.mp4)





    Published In

    STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
    June 2018
    1332 pages
    ISBN:9781450355599
    DOI:10.1145/3188745
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. learning mixtures of spherical Gaussians
    2. list-decodable learning
    3. robust statistics

    Qualifiers

    • Research-article

    Conference

    STOC '18
    Sponsor:
    STOC '18: Symposium on Theory of Computing
    June 25 - 29, 2018
    Los Angeles, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Article Metrics

    • Downloads (last 12 months): 120
    • Downloads (last 6 weeks): 9

    Reflects downloads up to 26 Sep 2024

    Cited By

    • (2024) Robust Federated Learning with Realistic Corruption. Web and Big Data, 10.1007/978-981-97-7241-4_15, pp. 228–242. Online publication date: 28-Aug-2024.
    • (2023) A spectral algorithm for list-decodable covariance estimation in relative Frobenius norm. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3668242, pp. 48819–48854. Online publication date: 10-Dec-2023.
    • (2023) SQ lower bounds for learning mixtures of linear classifiers. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3667310, pp. 27281–27293. Online publication date: 10-Dec-2023.
    • (2023) Label robust and differentially private linear regression. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3667120, pp. 23019–23033. Online publication date: 10-Dec-2023.
    • (2023) Learning mixtures of Gaussians using the DDPM objective. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666984, pp. 19636–19649. Online publication date: 10-Dec-2023.
    • (2023) Private distribution learning with public data. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666437, pp. 7184–7215. Online publication date: 10-Dec-2023.
    • (2023) SQ lower bounds for non-Gaussian component analysis with weaker assumptions. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666306, pp. 4199–4212. Online publication date: 10-Dec-2023.
    • (2023) Attribute-efficient PAC learning of low-degree polynomial threshold functions with nasty noise. Proceedings of the 40th International Conference on Machine Learning, 10.5555/3618408.3620115, pp. 40719–40748. Online publication date: 23-Jul-2023.
    • (2023) Nearly-linear time and streaming algorithms for outlier-robust PCA. Proceedings of the 40th International Conference on Machine Learning, 10.5555/3618408.3618721, pp. 7886–7921. Online publication date: 23-Jul-2023.
    • (2023) Efficient list-decodable regression using batches. Proceedings of the 40th International Conference on Machine Learning, 10.5555/3618408.3618688, pp. 7025–7065. Online publication date: 23-Jul-2023.
