A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic

Uri Keich²¹ &
Niranjan Nagarajan²¹

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 3240))

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

608 Accesses
1 Citations

Abstract

The subject of estimating the p-value of the log-likelihood ratio statistic for multinomial distribution has been studied extensively in the statistical literature. Nevertheless, bioinformatics laid new challenges before that research by often concentrating its interest on the “thin tail” of the distribution where classical statistical approximation typically fails. Hence, some of the more recent development in this area have come from the bioinformatics community ([5], [3]).

Since algorithms for computing the exact p-value have an exponential complexity, the only generally applicable algorithms for reliably estimating the p-value are lattice based. In particular, Hertz and Stormo have a dynamic programming algorithm whose complexity is O(QKN ²), where Q is the size of the lattice, K is the size of the alphabet and N is the size of the sample. We present a new algorithm that is practically as reliable as Hertz and Stormo’s and has a complexity of O(QKNlog N). An interesting feature of our algorithm is that it can guarantee the quality of its estimated p-value.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Integrated likelihood inference in multinomial distributions

Article 08 November 2022

High-order asymptotic approximations for improved inference under exceptionally low false positive error rates

Article 14 July 2023

Minimum distance histograms with universal performance guarantees

Article Open access 13 July 2019

References

Baglivo, J., Olivier, D., Pagano, M.: Methods for exact goodness-of-fit tests. Journal of the American Statistical Association 87(418), 464–469 (1992)
Article MATH MathSciNet Google Scholar
Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Menlo Park, California, pp. 28–36 (1994)
Google Scholar
Bejerano, G.: Efficient exact value computation and applications to biosequence analysis. In: Vingron, M., Istrail, S., Pevzner, P.A., Waterman, M.S. (eds.) Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, pp. 38–47. ACM Press, New York (2003)
Chapter Google Scholar
Cressie, N., Read, T.R.C.: Person’s χ2 and the loglikelihood ratio statistic g2: A comparative review. International Statistical Review 57(1), 19–43 (1989)
Article MATH Google Scholar
Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)
Article Google Scholar
Hirji, K.A.: A comparison of algorithms for exact goodness-of-fit tests for multinomial data. Communications in Statistics-Simulation and Computations 26(3), 1197–1227 (1997)
Article MATH Google Scholar
Hoeffding, W.: Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369–408 (1965)
Article MATH MathSciNet Google Scholar
Kallenberg, W.C.M.: On moderate and large deviations in multinomial distributions. Annals of Statistics 13(4), 1554–1580 (1985)
Article MATH MathSciNet Google Scholar
Keich, U.: Efficiently computing the p-value of the entropy score. Journal of Computational Biology (in press)
Google Scholar
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes in C. The art of scientific computing, 2nd edn. Cambridge University Press, Cambridge (1992)
MATH Google Scholar
Rahmann, S.: Dynamic programming algorithms for two statistical problems in computational biology. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 151–164. Springer, Heidelberg (2003)
Chapter Google Scholar
Rice, J.A.: Mathematical Statistics and Data Analysis, 2nd edn. Duxbury Press, Boston (1995)
MATH Google Scholar
Stormo, G.D.: DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23 (2000)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, 14850, USA
Uri Keich & Niranjan Nagarajan

Authors

Uri Keich
View author publications
You can also search for this author in PubMed Google Scholar
Niranjan Nagarajan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Informatics and Computational Biology Unit, HIB, University of Bergen, 5020, Bergen, Norway
Inge Jonassen
Department of Biology,, Penn Center for Bioinformatics, Penn Genomics Institute, 415 S. University Ave., PA 19104, Philadelphia, USA
Junhyong Kim

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Keich, U., Nagarajan, N. (2004). A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science(), vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_10

Download citation

DOI: https://doi.org/10.1007/978-3-540-30219-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive

Publish with us

Policies and ethics

A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Integrated likelihood inference in multinomial distributions

High-order asymptotic approximations for improved inference under exceptionally low false positive error rates

Minimum distance histograms with universal performance guarantees

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Integrated likelihood inference in multinomial distributions

High-order asymptotic approximations for improved inference under exceptionally low false positive error rates

Minimum distance histograms with universal performance guarantees

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation