Abstract
The subject of estimating the p-value of the log-likelihood ratio statistic for multinomial distribution has been studied extensively in the statistical literature. Nevertheless, bioinformatics laid new challenges before that research by often concentrating its interest on the “thin tail” of the distribution where classical statistical approximation typically fails. Hence, some of the more recent development in this area have come from the bioinformatics community ([5], [3]).
Since algorithms for computing the exact p-value have an exponential complexity, the only generally applicable algorithms for reliably estimating the p-value are lattice based. In particular, Hertz and Stormo have a dynamic programming algorithm whose complexity is O(QKN 2), where Q is the size of the lattice, K is the size of the alphabet and N is the size of the sample. We present a new algorithm that is practically as reliable as Hertz and Stormo’s and has a complexity of O(QKNlog N). An interesting feature of our algorithm is that it can guarantee the quality of its estimated p-value.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Baglivo, J., Olivier, D., Pagano, M.: Methods for exact goodness-of-fit tests. Journal of the American Statistical Association 87(418), 464–469 (1992)
Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Menlo Park, California, pp. 28–36 (1994)
Bejerano, G.: Efficient exact value computation and applications to biosequence analysis. In: Vingron, M., Istrail, S., Pevzner, P.A., Waterman, M.S. (eds.) Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, pp. 38–47. ACM Press, New York (2003)
Cressie, N., Read, T.R.C.: Person’s χ2 and the loglikelihood ratio statistic g2: A comparative review. International Statistical Review 57(1), 19–43 (1989)
Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)
Hirji, K.A.: A comparison of algorithms for exact goodness-of-fit tests for multinomial data. Communications in Statistics-Simulation and Computations 26(3), 1197–1227 (1997)
Hoeffding, W.: Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369–408 (1965)
Kallenberg, W.C.M.: On moderate and large deviations in multinomial distributions. Annals of Statistics 13(4), 1554–1580 (1985)
Keich, U.: Efficiently computing the p-value of the entropy score. Journal of Computational Biology (in press)
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes in C. The art of scientific computing, 2nd edn. Cambridge University Press, Cambridge (1992)
Rahmann, S.: Dynamic programming algorithms for two statistical problems in computational biology. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 151–164. Springer, Heidelberg (2003)
Rice, J.A.: Mathematical Statistics and Data Analysis, 2nd edn. Duxbury Press, Boston (1995)
Stormo, G.D.: DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23 (2000)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keich, U., Nagarajan, N. (2004). A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science(), vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-30219-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive