DOI: 10.1145/3437963.3441793
Research Article | Open Access

β-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

Published: 08 March 2021

Abstract

Modern machine learning applications must address the intrinsic challenges of inference on massive real-world datasets, including scalability and robustness to outliers. Despite the many benefits of Bayesian methods (such as uncertainty-aware predictions, incorporation of expert knowledge, and hierarchical modeling), the quality of classic Bayesian inference depends critically on whether observations conform to the assumed data-generating model, which is impossible to guarantee in practice. In this work, we propose a variational inference method that, in a principled way, simultaneously scales to large datasets and robustifies the inferred posterior against outliers in the observed data. Reformulating Bayes' theorem via the β-divergence, we posit a robustified generalized Bayesian posterior as the target of inference. Building on recent formulations of Riemannian coresets for scalable Bayesian inference, we then propose a sparse variational approximation of the robustified posterior and an efficient stochastic black-box algorithm to construct it. Overall, our method releases cleansed data summaries that apply broadly to scenarios involving structured and unstructured data contamination. We demonstrate the applicability of our approach on diverse simulated and real datasets and on various statistical models, including Gaussian mean inference, logistic regression, and neural linear regression, showing its superiority over existing Bayesian summarization methods in the presence of outliers.
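
To make the β-divergence reformulation concrete, here is a minimal sketch, assuming the density power divergence of Basu et al. (1998) as the β-loss and the paper's simplest setting of one-dimensional Gaussian mean inference with known variance. The helper names, the grid-based normalization, and the choice β = 0.3 are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import norm

    def beta_loss(x, mu, sigma, beta):
        """Density power divergence loss for a single observation:
        -(1/beta) * p(x)^beta + (1/(beta+1)) * integral of p(y)^(beta+1) dy.
        It recovers the negative log-likelihood as beta -> 0, but remains
        bounded for points deep in the tails, so outliers cannot dominate."""
        p_x = norm.pdf(x, mu, sigma)
        # Closed-form integral of N(y; mu, sigma^2)^(beta+1) over y.
        integral = (2 * np.pi * sigma**2) ** (-beta / 2) / np.sqrt(beta + 1)
        return -(p_x**beta) / beta + integral / (beta + 1)

    def log_beta_posterior(mu_grid, data, sigma, beta, prior_mu=0.0, prior_sd=10.0):
        """Unnormalized log beta-posterior over candidate means: the log prior
        minus the summed beta-losses of all observations."""
        log_prior = norm.logpdf(mu_grid, prior_mu, prior_sd)
        total_loss = sum(beta_loss(x, mu_grid, sigma, beta) for x in data)
        return log_prior - total_loss

    # Toy check: three gross outliers barely move the beta-posterior mode.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.0, 1.0, 100), [25.0, 30.0, 40.0]])
    grid = np.linspace(-3.0, 3.0, 601)
    log_post = log_beta_posterior(grid, data, sigma=1.0, beta=0.3)
    print("beta-posterior mode:", grid[np.argmax(log_post)])  # close to 0

The coreset side can be sketched in the same hedged spirit. Sparse variational (Riemannian) coresets adjust nonnegative weights w over a small support by stochastic estimates of the KL gradient, d KL(pi_w || pi) / d w_n = -Cov_{pi_w}[f_n(theta), sum_m (1 - w_m) f_m(theta)], where f_n is the log-likelihood potential of point n; in β-Cores those potentials would be the robustified β-losses above, and the support is grown incrementally rather than fixed. The toy below uses a conjugate Gaussian-mean model so that pi_w can be sampled exactly; all names and the fixed 20-point support are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, 500)                  # data: x_n ~ N(theta, 1)

    def sample_pi_w(w, S):
        """Exact theta-samples from the coreset posterior pi_w under a N(0, 1)
        prior and unit-variance Gaussian likelihood with weights w."""
        prec = 1.0 + w.sum()
        mean = (w * x).sum() / prec
        return rng.normal(mean, 1.0 / np.sqrt(prec), S)

    def kl_grad(w, S=256):
        """Monte Carlo estimate of the gradient of KL(pi_w || pi) in w."""
        theta = sample_pi_w(w, S)
        F = -0.5 * (x[None, :] - theta[:, None]) ** 2  # S x N potentials f_n
        Fc = F - F.mean(axis=0)                        # center over samples
        resid = Fc.sum(axis=1) - Fc @ w                # centered (1 - w)^T f
        return -(Fc * resid[:, None]).mean(axis=0)     # -Cov[f_n, residual]

    # Projected stochastic gradient descent on a fixed 20-point support.
    support = rng.choice(len(x), 20, replace=False)
    w = np.zeros(len(x)); w[support] = len(x) / 20.0   # uniform initial weights
    for t in range(200):
        g = kl_grad(w)
        w[support] = np.maximum(w[support] - 0.5 / (1.0 + t) * g[support], 0.0)
    print("coreset size:", int((w > 0).sum()), "total weight:", w.sum())

In the full method the two pieces compose: the coreset potentials are the robustified β-losses, so the learned summary both compresses the dataset and down-weights contaminated points.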


Cited By

  • (2022) Black-box coreset variational inference. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 34175-34187. DOI: 10.5555/3600270.3602747. Online publication date: 28 November 2022.


Published In

WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
March 2021
1192 pages
ISBN: 9781450382977
DOI: 10.1145/3437963
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. big data summarizations
  2. coresets
  3. data valuation
  4. noisy observations
  5. robust statistics
  6. scalable learning
  7. variational inference

Conference

WSDM '21

Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

