DOI: 10.5555/2986459.2986698

Scalable training of mixture models via coresets

Published: 12 December 2011

Abstract

How can we train a statistical mixture model on a massive data set? In this paper, we show how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset will also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set. More precisely, we prove that a weighted set of O(dk³/ε²) data points suffices for computing a (1 + ε)-approximation for the optimal model on the original n data points. Moreover, such coresets can be efficiently constructed in a map-reduce style computation, as well as in a streaming setting. Our results rely on a novel reduction of statistical estimation to problems in computational geometry, as well as new complexity results about mixtures of Gaussians. We empirically evaluate our algorithms on several real data sets, including a density estimation problem in the context of earthquake detection using accelerometers in mobile phones.
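
To make the coreset idea concrete, the sketch below illustrates the importance-sampling pattern such constructions typically follow; it is an illustrative assumption, not the authors' exact algorithm. Each point's importance is estimated from a rough k-means solution, points are sampled proportionally to that estimate, and each sample is reweighted by the inverse of its sampling probability so that weighted costs on the coreset are unbiased estimates of costs on the full data. The function names and mixing constants below are assumptions made for illustration.

```python
import numpy as np

def rough_kmeans(X, k, iters=10, seed=0):
    # Cheap Lloyd iterations; used only to estimate per-point importance.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def gmm_coreset(X, k, m, seed=0):
    # Sample m weighted points whose weighted cost approximates that of the
    # full data. The sampling distribution mixes a uniform term with each
    # point's share of the k-means cost -- a crude stand-in for the
    # sensitivity bounds proved in the paper.
    rng = np.random.default_rng(seed)
    centers = rough_kmeans(X, k, seed=seed)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    s = 0.5 / len(X) + 0.5 * d2 / d2.sum()   # sampling probabilities, sum to 1
    idx = rng.choice(len(X), size=m, replace=True, p=s)
    weights = 1.0 / (m * s[idx])             # inverse-probability reweighting
    return X[idx], weights

# Usage: compress 100,000 points in R^3 into 500 weighted points, then hand
# (C, w) to any weighted-EM routine in place of the full data set.
X = np.random.default_rng(1).normal(size=(100_000, 3))
C, w = gmm_coreset(X, k=5, m=500)
print(C.shape, float(w.sum()))   # (500, 3); weights sum to roughly n
```

Because such coresets compose (a coreset of the union of two data sets can be obtained from the union of their coresets, at a small loss in accuracy), the same routine can be run independently on map-reduce partitions or on chunks of a stream and the results merged, which matches the distributed and streaming settings mentioned in the abstract.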

Published In

NIPS'11: Proceedings of the 24th International Conference on Neural Information Processing Systems
December 2011
2752 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Cited By

  • (2022) Pruning neural networks via coresets and convex geometry. Proceedings of the 36th International Conference on Neural Information Processing Systems, 38003-38019. DOI: 10.5555/3600270.3603024. Online publication date: 28-Nov-2022.
  • (2020) Small-GAN. Proceedings of the 37th International Conference on Machine Learning, 9005-9015. DOI: 10.5555/3524938.3525773. Online publication date: 13-Jul-2020.
  • (2020) A Complexity Theoretical Study of Fuzzy K-Means. ACM Transactions on Algorithms, 16(4):1-25. DOI: 10.1145/3409385. Online publication date: 16-Sep-2020.
  • (2019) Automated scalable Bayesian inference via Hilbert coresets. The Journal of Machine Learning Research, 20(1):551-588. DOI: 10.5555/3322706.3322721. Online publication date: 1-Jan-2019.
  • (2019) Efficient construction of approximate ad-hoc ML models through materialization and reuse. Proceedings of the VLDB Endowment, 11(11):1468-1481. DOI: 10.14778/3236187.3269462. Online publication date: 17-Jan-2019.
  • (2018) On coresets for logistic regression. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 6562-6571. DOI: 10.5555/3327757.3327763. Online publication date: 3-Dec-2018.
  • (2018) Efficient construction of approximate ad-hoc ML models through materialization and reuse. Proceedings of the VLDB Endowment, 11(11):1468-1481. DOI: 10.5555/3236187.3269462. Online publication date: 1-Jul-2018.
  • (2018) A fast approximation scheme for low-dimensional k-means. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 430-440. DOI: 10.5555/3174304.3175298. Online publication date: 7-Jan-2018.
  • (2018) Efficient construction of approximate ad-hoc ML models through materialization and reuse. Proceedings of the VLDB Endowment, 11(11):1468-1481. DOI: 10.14778/3236187.3236199. Online publication date: 5-Oct-2018.
  • (2018) Scalable k-Means Clustering via Lightweight Coresets. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1119-1127. DOI: 10.1145/3219819.3219973. Online publication date: 19-Jul-2018.
