DOI: 10.1145/3437963.3441793
Research Article | Open Access

β-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

Published: 08 March 2021

Abstract

Modern machine learning applications must address the intrinsic challenges of inference on massive real-world datasets, including scalability and robustness to outliers. Despite the many benefits of Bayesian methods (such as uncertainty-aware predictions, incorporation of expert knowledge, and hierarchical modeling), the quality of classic Bayesian inference depends critically on whether observations conform to the assumed data-generating model, which is impossible to guarantee in practice. In this work, we propose a variational inference method that, in a principled way, simultaneously scales to large datasets and robustifies the inferred posterior against outliers in the observed data. Reformulating Bayes' theorem via the β-divergence, we posit a robustified generalized Bayesian posterior as the target of inference. Building on recent formulations of Riemannian coresets for scalable Bayesian inference, we then propose a sparse variational approximation of the robustified posterior and an efficient stochastic black-box algorithm to construct it. Overall, our method releases cleansed data summaries that apply broadly to scenarios involving structured and unstructured data contamination. We demonstrate the applicability of our approach on diverse simulated and real datasets and on various statistical models, including Gaussian mean inference, logistic regression, and neural linear regression, showing its superiority over existing Bayesian summarization methods in the presence of outliers.
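
To make the β-divergence reformulation concrete, here is a minimal sketch, assuming the density power divergence of Basu et al. (1998) as the β-loss and the paper's simplest setting of one-dimensional Gaussian mean inference with known variance. The helper names, the grid-based normalization, and the choice β = 0.3 are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import norm

    def beta_loss(x, mu, sigma, beta):
        """Density power divergence loss for a single observation:
        -(1/beta) * p(x)^beta + (1/(beta+1)) * integral of p(y)^(beta+1) dy.
        It recovers the negative log-likelihood as beta -> 0, but remains
        bounded for points deep in the tails, so outliers cannot dominate."""
        p_x = norm.pdf(x, mu, sigma)
        # Closed-form integral of N(y; mu, sigma^2)^(beta+1) over y.
        integral = (2 * np.pi * sigma**2) ** (-beta / 2) / np.sqrt(beta + 1)
        return -(p_x**beta) / beta + integral / (beta + 1)

    def log_beta_posterior(mu_grid, data, sigma, beta, prior_mu=0.0, prior_sd=10.0):
        """Unnormalized log beta-posterior over candidate means: the log prior
        minus the summed beta-losses of all observations."""
        log_prior = norm.logpdf(mu_grid, prior_mu, prior_sd)
        total_loss = sum(beta_loss(x, mu_grid, sigma, beta) for x in data)
        return log_prior - total_loss

    # Toy check: three gross outliers barely move the beta-posterior mode.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.0, 1.0, 100), [25.0, 30.0, 40.0]])
    grid = np.linspace(-3.0, 3.0, 601)
    log_post = log_beta_posterior(grid, data, sigma=1.0, beta=0.3)
    print("beta-posterior mode:", grid[np.argmax(log_post)])  # close to 0

The coreset side can be sketched in the same hedged spirit. Sparse variational (Riemannian) coresets adjust nonnegative weights w over a small support by stochastic estimates of the KL gradient, d KL(pi_w || pi) / d w_n = -Cov_{pi_w}[f_n(theta), sum_m (1 - w_m) f_m(theta)], where f_n is the log-likelihood potential of point n; in β-Cores those potentials would be the robustified β-losses above, and the support is grown incrementally rather than fixed. The toy below uses a conjugate Gaussian-mean model so that pi_w can be sampled exactly; all names and the fixed 20-point support are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, 500)                  # data: x_n ~ N(theta, 1)

    def sample_pi_w(w, S):
        """Exact theta-samples from the coreset posterior pi_w under a N(0, 1)
        prior and unit-variance Gaussian likelihood with weights w."""
        prec = 1.0 + w.sum()
        mean = (w * x).sum() / prec
        return rng.normal(mean, 1.0 / np.sqrt(prec), S)

    def kl_grad(w, S=256):
        """Monte Carlo estimate of the gradient of KL(pi_w || pi) in w."""
        theta = sample_pi_w(w, S)
        F = -0.5 * (x[None, :] - theta[:, None]) ** 2  # S x N potentials f_n
        Fc = F - F.mean(axis=0)                        # center over samples
        resid = Fc.sum(axis=1) - Fc @ w                # centered (1 - w)^T f
        return -(Fc * resid[:, None]).mean(axis=0)     # -Cov[f_n, residual]

    # Projected stochastic gradient descent on a fixed 20-point support.
    support = rng.choice(len(x), 20, replace=False)
    w = np.zeros(len(x)); w[support] = len(x) / 20.0   # uniform initial weights
    for t in range(200):
        g = kl_grad(w)
        w[support] = np.maximum(w[support] - 0.5 / (1.0 + t) * g[support], 0.0)
    print("coreset size:", int((w > 0).sum()), "total weight:", w.sum())

In the full method the two pieces compose: the coreset potentials are the robustified β-losses, so the learned summary both compresses the dataset and down-weights contaminated points.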


Cited By

  • (2022) Black-box coreset variational inference. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 34175-34187. DOI: 10.5555/3600270.3602747. Online publication date: 28 November 2022.


Published In

WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
March 2021
1192 pages
ISBN: 9781450382977
DOI: 10.1145/3437963
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. big data summarizations
  2. coresets
  3. data valuation
  4. noisy observations
  5. robust statistics
  6. scalable learning
  7. variational inference

Conference

WSDM '21

Acceptance Rates

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

