
DOI: 10.5555/3174304.3175477

Improved coresets for kernel density estimates

Published: 07 January 2018

Abstract

We study the construction of coresets for kernel density estimates. That is, we show how to approximate the kernel density estimate described by a large point set with another kernel density estimate over a much smaller point set. For characteristic kernels (including Gaussian and Laplace kernels), our approximation preserves the L∞ error between kernel density estimates within error ϵ, with coreset size 4/ϵ², depending on no other aspects of the data, such as the dimension, the diameter of the point set, or the bandwidth of the kernel, which are common to other approximations. When the dimension is unrestricted, we show this bound is tight for these kernels as well as for a much broader class.
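To make the guarantee concrete, here is a minimal sketch (not from the paper) of what an ϵ-coreset for a KDE promises: the estimate built from the small set Q must track the estimate built from the full set P to within ϵ at every query point. The Gaussian bandwidth, the random-sample baseline for Q, and the helper names kde and linf_error are illustrative assumptions; the paper's actual construction is kernel herding, sketched after the next paragraph.

```python
import numpy as np

def kde(points, query, bandwidth=1.0):
    # Gaussian kernel density estimate of `points`, evaluated at each row of `query`.
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)

def linf_error(P, Q, query, bandwidth=1.0):
    # Largest gap between the full KDE (on P) and the coreset KDE (on Q)
    # over the probed query locations.
    return np.abs(kde(P, query, bandwidth) - kde(Q, query, bandwidth)).max()

rng = np.random.default_rng(0)
P = rng.normal(size=(5000, 2))                      # large point set
eps = 0.1
m = int(np.ceil(4 / eps ** 2))                      # coreset size 4/eps^2 from the abstract
Q = P[rng.choice(len(P), size=m, replace=False)]    # baseline coreset: uniform sample
queries = rng.normal(size=(300, 2))                 # probe locations for the error
print(f"coreset size {m}, observed L_inf gap {linf_error(P, Q, queries):.4f}")
```

A uniform sample is only a baseline here; the point of the paper is that a carefully chosen Q of size 4/ϵ² suffices regardless of dimension, diameter, or bandwidth.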
This work provides a careful analysis of the iterative Frank-Wolfe algorithm adapted to this context, an algorithm called kernel herding. This analysis unites a broad line of work that spans statistics, machine learning, and geometry.
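The herding rule itself is simple to state. Below is a hedged, generic sketch of kernel herding on a finite point set, following the greedy step described by Chen, Welling, and Smola [4]: repeatedly pick the point where the coreset's average kernel value lags the full data's average by the most. The Gaussian kernel and the restriction of candidates to P are illustrative assumptions, not details fixed by this abstract.

```python
import numpy as np

def gaussian_row(x, P, bandwidth=1.0):
    # K(x, p) for every p in P, with a Gaussian kernel.
    d2 = ((P - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_herding(P, m, bandwidth=1.0):
    # Greedily select m points of P whose KDE tracks the KDE of all of P.
    # Points may repeat, which corresponds to a coreset with multiplicity.
    n = len(P)
    # mu[i] = (1/n) * sum_j K(P[i], P[j]): the kernel mean map evaluated on P.
    mu = np.array([gaussian_row(p, P, bandwidth).mean() for p in P])
    running = np.zeros(n)   # running[i] = sum over chosen q of K(P[i], q)
    picked = []
    for t in range(m):
        # Frank-Wolfe / herding step: maximize the residual between the
        # full mean map and the average over the t points chosen so far.
        scores = mu - running / (t + 1)
        i = int(np.argmax(scores))
        picked.append(i)
        running += gaussian_row(P[i], P, bandwidth)
    return P[picked]
```

Paired with the linf_error helper above, Q = kernel_herding(P, m) can be compared directly against the random baseline; the paper's analysis of this Frank-Wolfe iteration is what yields the 4/ϵ² size bound.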
When the dimension d is constant, we demonstrate much tighter bounds on the size of the coreset, specifically for Gaussian kernels, showing that it is bounded by the size of the coreset for axis-aligned rectangles. Currently the best known constructive bound is [EQUATION], and non-constructively this can be improved by [EQUATION]. This improves the best constant-dimension bounds polynomially for d ≥ 3.

References

[1]
Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337--404, 1950.
[2]
Francis Bach, Simon Lacoste-Julien, and Guillaume Obozinski. On the equivalence between herding and conditional gradient algorithms. In Proceedings International Conference on Machine Learning, 2012.
[3]
Nikhil Bansal and Shashwat Garg. Algorithmic discrepancy beyond partial coloring. In Proceedings ACM Symposium on Theory of Computing, 2017.
[4]
Yutian Chen, Max Welling, and Alex Smola. Super-samples from kernel herding. In Conference on Uncertainty in Artificial Intelligence, 2010.
[5]
Ken Clarkson. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms, 6(4), 2010.
[6]
Efrén Cruz Cortés and Clayton Scott. Sparse approximation of a kernel mean. IEEE Transactions on Signal Processing, accepted (arXiv:1503.00323), 2015.
[7]
Luc Devroye and László Györfi. Nonparametric Density Estimation: The L1 View. Wiley, 1984.
[8]
Joseph C Dunn. Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM Journal on Control and Optimization, 18(5):473--487, 1980.
[9]
Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman Balakrishnan, and Aarti Singh. Confidence sets for persistence diagrams. The Annals of Statistics, 42:2301--2339, 2014.
[10]
Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3:95--110, 1956.
[11]
Robert Freund and Paul Grigas. New analysis and results for the Frank-Wolfe method. Mathematical Programming, (to appear).
[12]
Bernd Gärtner and Martin Jaggi. Coresets for polytope distance. In SOCG, 2009.
[13]
Joan Glaunès. Transport par difféomorphismes de points, de mesures et de courants pour la comparaison de formes et l'anatomie numérique. PhD thesis, Université Paris 13, 2005.
[14]
Teofilo F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293--306, 1985.
[15]
Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. JMLR, 13:723--773, 2012.
[16]
Zaid Harchaoui, Francis Bach, Olivier Cappé, and Éric Moulines. Kernel-based methods for hypothesis testing: A unified view. IEEE Signal Processing Magazine, 30:87--97, 2013.
[17]
Nick Harvey and Samira Samadi. Near-optimal herding. In COLT, pages 1165--1182, 2014.
[18]
Matthias Hein and Olivier Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In AISTATS, 2005.
[19]
Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, 2013.
[20]
Simon Lacoste-Julien and Martin Jaggi. On the global linear convergence of Frank-Wolfe optimization variants. In NIPS, 2015.
[21]
Sarang Joshi, Raj Varma Kommaraju, Jeff M. Phillips, and Suresh Venkatasubramanian. Comparing distributions and shapes using the kernel distance. In SOCG, 2011.
[23]
Yi Li, Philip M. Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. J. Comp. and Sys. Sci., 62:516--527, 2001.
[24]
Alfred Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2):429--443, 1997.
[25]
Aleksandar Nikolov. Tighter bounds for the discrepancy of boxes and polytopes. Technical report, arXiv:1701.05532, 2017.
[26]
Emanuel Parzen. On the estimation of a probability density function and the mode. Annals of Mathematical Statistics, 33:1065--1076, 1962.
[27]
Jeff M. Phillips. ϵ-samples for kernels. In SODA, 2013.
[28]
Jeff M. Phillips and Suresh Venkatasubramanian. A gentle introduction to the kernel distance. arXiv:1103.1625, March 2011.
[29]
Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density estimates. In SOCG, 2015.
[30]
Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[31]
David W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley, 1992.
[32]
Seth R. Flaxman, Yu-Xiang Wang, and Alexander J. Smola. Who supported Obama in 2012?: Ecological inference through distribution regression. In KDD, 2015.
[33]
Bernard W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, 1986.
[34]
Le Song, Xinhua Zhang, Alex Smola, Arthur Gretton, and Bernhard Schölkopf. Tailoring density estimation via reproducing kernel moment matching. In ICML, 2008.
[35]
Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. JMLR, 11:1517--1561, 2010.
[36]
Grace Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomization. In Advances in Kernel Methods - Support Vector Learning, pages 69--88. 1999.
[37]
Yan Zheng and Jeff M. Phillips. L∞ error and bandwidth selection for kernel density estimates of large data. In KDD, 2015.

    Published In

    SODA '18: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms
    January 2018
    2859 pages
    ISBN:9781611975031
Program Chair: Artur Czumaj

    Publisher

    Society for Industrial and Applied Mathematics

    United States

    Qualifiers

    • Research-article

    Conference

SODA '18: Symposium on Discrete Algorithms
January 7 - 10, 2018
New Orleans, Louisiana

    Acceptance Rates

Overall Acceptance Rate: 411 of 1,322 submissions, 31%
