Anytime-valid inference for multinomial count data

Authors:

Michael Lindon,

Alan Malek

NIPS'22: Proceedings of the 36th International Conference on Neural Information Processing Systems

Article No.: 204, Pages 2817 - 2831

Published: 03 April 2024

Abstract

Many experiments compare count outcomes among treatment groups. Examples include the number of successful signups in conversion rate experiments or the number of errors produced by software versions in canary tests. Observations typically arrive in a sequence and practitioners wish to continuously monitor their experiments, sequentially testing hypotheses while maintaining Type I error probabilities under optional stopping and continuation. These goals are frequently complicated in practice by non-stationary time dynamics. We provide practical solutions through sequential tests of multinomial hypotheses, hypotheses about many inhomogeneous Bernoulli processes and hypotheses about many time-inhomogeneous Poisson counting processes. For estimation, we further provide confidence sequences for multinomial probability vectors, all contrasts among probabilities of inhomogeneous Bernoulli processes and all contrasts among intensities of time-inhomogeneous Poisson counting processes. Together, these provide an "anytime-valid" inference framework for a wide variety of experiments dealing with count outcomes, which we illustrate with several industry applications.
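
As a rough illustration of what a sequential test of a multinomial hypothesis can look like, the sketch below monitors a stream of category labels against a fixed null probability vector. It assumes a Dirichlet mixture over the alternative and rejects the first time the resulting Bayes factor reaches 1/level; because that Bayes factor is a nonnegative martingale under the null, Ville's inequality bounds the Type I error by the level at any stopping time. The function names, the uniform Dirichlet default and the stationary null are illustrative assumptions, not the authors' implementation.

```python
import math


def log_bayes_factor(counts, theta0, alpha0):
    """Log ratio of the Dirichlet-mixture likelihood to the null likelihood.

    counts: observed count per category so far
    theta0: null probability vector (all entries strictly positive)
    alpha0: Dirichlet concentration parameters (assumed mixing distribution)
    """
    n = sum(counts)
    a0 = sum(alpha0)
    # Log marginal likelihood of the observed sequence under the Dirichlet mixture.
    log_m1 = math.lgamma(a0) - math.lgamma(a0 + n)
    for a, c in zip(alpha0, counts):
        log_m1 += math.lgamma(a + c) - math.lgamma(a)
    # Log likelihood of the same sequence under the null.
    log_m0 = sum(c * math.log(p) for c, p in zip(counts, theta0) if c > 0)
    return log_m1 - log_m0


def sequential_multinomial_test(observations, theta0, alpha0=None, level=0.05):
    """Return (stopping_time, counts); stopping_time is None if never rejected.

    observations: iterable of category indices in 0..k-1, in arrival order
    """
    k = len(theta0)
    alpha0 = alpha0 or [1.0] * k  # uniform Dirichlet, an illustrative default
    counts = [0] * k
    threshold = math.log(1.0 / level)
    for t, x in enumerate(observations, start=1):
        counts[x] += 1
        # Checking after every observation is allowed: the bound holds uniformly in time.
        if log_bayes_factor(counts, theta0, alpha0) >= threshold:
            return t, counts
    return None, counts


if __name__ == "__main__":
    # A stream that always lands in category 0, tested against a fair 50/50 null.
    stop, final_counts = sequential_multinomial_test([0] * 50, [0.5, 0.5])
    print(stop, final_counts)
```

The same running Bayes factor can be inspected at every step without adjusting the threshold, which is the practical meaning of "anytime-valid" in this setting.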

Supplementary Material

Additional material (3600270.3600474_supp.pdf)

Published In

NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems

November 2022

39114 pages

ISBN: 9781713871088

Copyright © 2022 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

Research-article
Research
Refereed limited
