Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/2095116.2095224acmotherconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
research-article

Learning k-modal distributions via testing

Published: 17 January 2012 Publication History

Abstract

A k-modal probability distribution over the domain {1,..., n} is one whose histogram has at most k "peaks" and "valleys." Such distributions are natural generalizations of monotone (k = 0) and unimodal (k = 1) probability distributions, which have been intensively studied in probability theory and statistics.
In this paper we consider the problem of learning an unknown k-modal distribution. The learning algorithm is given access to independent samples drawn from the k-modal distribution p, and must output a hypothesis distribution p such that with high probability the total variation distance between p and p is at most ε.
We give an efficient algorithm for this problem that runs in time poly(k, log(n), 1/ε). For k ≤ Õ(√ log n), the number of samples used by our algorithm is very close (within an Õ(log(1/ε)) factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases k = 0, 1 [Bir87b, Bir97].
A novel feature of our approach is that our learning algorithm crucially uses a new property testing algorithm as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the k-modal distribution into k (near)-monotone distributions, which are easier to learn.

References

[1]
{Bir87a} L. Birgé. Estimating a density under order restrictions: Nonasymptotic minimax risk. Annals of Statistics, 15(3):995--1012, 1987.
[2]
{Bir87b} L. Birgé. On the risk of histograms for estimating decreasing densities. Annals of Statistics, 15(3):1013--1022, 1987.
[3]
{Bir97} L. Birgé. Estimation of unimodal densities without smoothness assumptions. Annals of Statistics, 25(3):970--981, 1997.
[4]
{BKR04} Tugkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In ACM Symposium on Theory of Computing, pages 381--390, 2004.
[5]
{CKC83} L. Cobb, P. Koppstein, and N. H. Chen. Estimation and moment recursion relations for multimodal distributions of the exponential family. J. American Statistical Association, 78(381):124--130, 1983.
[6]
{CT04} K. S. Chan and H. Tong. Testing for multimodality with dependent data. Biometrika, 91(1):113--123, 2004.
[7]
{DDS11} C. Daskalakis, I. Diakonikolas, and R. A. Servedio. Learning poisson binomial distributions. Arxiv report, 2011, 2011.
[8]
{DKW56} A. Dvoretzky, J. Kiefer, and J. Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Mathematical Statistics, 27(3):642--669, 1956.
[9]
{DL96a} L. Devroye and G. Lugosi. Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes. Annals of Statistics, 25:2626--2637, 1996.
[10]
{DL96b} L. Devroye and G. Lugosi. A universally acceptable smoothing factor for kernel density estimation. Annals of Statistics, 24:2499--2512, 1996.
[11]
{DL01} L. Devroye and G. Lugosi. Combinatorial methods in density estimation. Springer Series in Statistics, Springer, 2001.
[12]
{dTF90} G. A. de Toledo and J. M. Fernandez. Patch-clamp measurements reveal multimodal distribution of granule sizes in rat mast cells. Journal of Cell Biology, 110(4):1033--1039, 1990.
[13]
{FPP+98} F. R. Ferraro, B. Paltrinieri, F. F. Pecci, R. T. Rood, and B. Dorman. Multimodal distributions along the horizontal branch. The Astrophysical Journal, 500:311--319, 1998.
[14]
{GGR98} O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45:653--750, 1998.
[15]
{Gol11} O. Goldreich. Highlights of the Bertinoro workshop on Sublinear Algorithms (unpublished comments). Posted at http://www.wisdom.weizmann.ac.il/õded/MC/072.html, accessed June 17, 2011, 2011.
[16]
{Gro85} P. Groeneboom. Estimating a monotone density. In Proc. of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, pages 539--555, 1985.
[17]
{Kem91} J. H. B. Kemperman. Mixtures with a limited number of modal intervals. Annals of Statistics, 19(4):2120--2144, 1991.
[18]
{KR00} M. Kearns and D. Ron. Testing problems with sub-learning sample complexity. J. Comp. Sys. Sci., 61:428--456, 2000.
[19]
{Mas90} P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Annals of Probability, 18(3):1269--1283, 1990.
[20]
{Mur64} E. A. Murphy. One cause? many causes?: The argument from the bimodal distribution. J. Chronic Diseases, 17(4):301--324, 1964.
[21]
{Mye81} R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6:58--73, 1981.
[22]
{NS60} D. J. Newman and L. Shepp. The double dixie cup problem. The American Mathematical Monthly, 67(1):pp. 58--61, 1960.
[23]
{Rao69} B. L. S. Prakasa Rao. Estimation of a unimodal density. Sankhya Ser. A, 31:23--36, 1969.
[24]
{Ron08} D. Ron. Property Testing: A Learning Theory Perspective. Foundations and Trends in Machine Learning, 1(3):307--402, 2008.
[25]
{Weg70} E. J. Wegman. Maximum likelihood estimation of a unimodal density. I. and II. Ann. Math. Statist., 41:457--471, 2169--2174, 1970.
[26]
{Yat85} Y. G. Yatracos. Rates of convergence of minimum distance estimators and Kolmogorov's entropy. Annals of Statistics, 13:768--774, 1985.

Cited By

View all
  • (2022)Independence testing for bounded degree Bayesian networksProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601363(15027-15038)Online publication date: 28-Nov-2022
  • (2018)Team Performance with Test ScoresACM Transactions on Economics and Computation10.1145/32746446:3-4(1-26)Online publication date: 23-Oct-2018
  • (2018)Testing Shape Restrictions of Discrete DistributionsTheory of Computing Systems10.1007/s00224-017-9785-662:1(4-62)Online publication date: 1-Jan-2018
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
SODA '12: Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete algorithms
January 2012
1764 pages

Sponsors

  • Kyoto University: Kyoto University

In-Cooperation

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 17 January 2012

Check for updates

Qualifiers

  • Research-article

Conference

SODA '12
Sponsor:
  • Kyoto University

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Independence testing for bounded degree Bayesian networksProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601363(15027-15038)Online publication date: 28-Nov-2022
  • (2018)Team Performance with Test ScoresACM Transactions on Economics and Computation10.1145/32746446:3-4(1-26)Online publication date: 23-Oct-2018
  • (2018)Testing Shape Restrictions of Discrete DistributionsTheory of Computing Systems10.1007/s00224-017-9785-662:1(4-62)Online publication date: 1-Jan-2018
  • (2017)Sample-optimal density estimation in nearly-linear timeProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3039686.3039769(1278-1289)Online publication date: 16-Jan-2017
  • (2015)Testing identity of structured distributionsProceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms10.5555/2722129.2722252(1841-1854)Online publication date: 4-Jan-2015
  • (2015)Learning from satisfying assignmentsProceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms10.5555/2722129.2722162(478-497)Online publication date: 4-Jan-2015
  • (2015)Team Performance with Test ScoresProceedings of the Sixteenth ACM Conference on Economics and Computation10.1145/2764468.2764496(511-528)Online publication date: 15-Jun-2015
  • (2015)Learning Arbitrary Statistical Mixtures of Discrete DistributionsProceedings of the forty-seventh annual ACM symposium on Theory of Computing10.1145/2746539.2746584(743-752)Online publication date: 14-Jun-2015
  • (2015)Learning Poisson Binomial DistributionsAlgorithmica10.1007/s00453-015-9971-372:1(316-357)Online publication date: 1-May-2015
  • (2014)Efficient density estimation via piecewise polynomial approximationProceedings of the forty-sixth annual ACM symposium on Theory of computing10.1145/2591796.2591848(604-613)Online publication date: 31-May-2014
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media