Nothing Special   »   [go: up one dir, main page]

Skip to main content

A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic

  • Conference paper
Algorithms in Bioinformatics (WABI 2004)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 3240))

Included in the following conference series:

Abstract

The subject of estimating the p-value of the log-likelihood ratio statistic for multinomial distribution has been studied extensively in the statistical literature. Nevertheless, bioinformatics laid new challenges before that research by often concentrating its interest on the “thin tail” of the distribution where classical statistical approximation typically fails. Hence, some of the more recent development in this area have come from the bioinformatics community ([5], [3]).

Since algorithms for computing the exact p-value have an exponential complexity, the only generally applicable algorithms for reliably estimating the p-value are lattice based. In particular, Hertz and Stormo have a dynamic programming algorithm whose complexity is O(QKN 2), where Q is the size of the lattice, K is the size of the alphabet and N is the size of the sample. We present a new algorithm that is practically as reliable as Hertz and Stormo’s and has a complexity of O(QKNlog N). An interesting feature of our algorithm is that it can guarantee the quality of its estimated p-value.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Baglivo, J., Olivier, D., Pagano, M.: Methods for exact goodness-of-fit tests. Journal of the American Statistical Association 87(418), 464–469 (1992)

    Article  MATH  MathSciNet  Google Scholar 

  2. Bailey, T.L., Elkan, C.: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Menlo Park, California, pp. 28–36 (1994)

    Google Scholar 

  3. Bejerano, G.: Efficient exact value computation and applications to biosequence analysis. In: Vingron, M., Istrail, S., Pevzner, P.A., Waterman, M.S. (eds.) Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, pp. 38–47. ACM Press, New York (2003)

    Chapter  Google Scholar 

  4. Cressie, N., Read, T.R.C.: Person’s χ2 and the loglikelihood ratio statistic g2: A comparative review. International Statistical Review 57(1), 19–43 (1989)

    Article  MATH  Google Scholar 

  5. Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15, 563–577 (1999)

    Article  Google Scholar 

  6. Hirji, K.A.: A comparison of algorithms for exact goodness-of-fit tests for multinomial data. Communications in Statistics-Simulation and Computations 26(3), 1197–1227 (1997)

    Article  MATH  Google Scholar 

  7. Hoeffding, W.: Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369–408 (1965)

    Article  MATH  MathSciNet  Google Scholar 

  8. Kallenberg, W.C.M.: On moderate and large deviations in multinomial distributions. Annals of Statistics 13(4), 1554–1580 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  9. Keich, U.: Efficiently computing the p-value of the entropy score. Journal of Computational Biology (in press)

    Google Scholar 

  10. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes in C. The art of scientific computing, 2nd edn. Cambridge University Press, Cambridge (1992)

    MATH  Google Scholar 

  11. Rahmann, S.: Dynamic programming algorithms for two statistical problems in computational biology. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 151–164. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  12. Rice, J.A.: Mathematical Statistics and Data Analysis, 2nd edn. Duxbury Press, Boston (1995)

    MATH  Google Scholar 

  13. Stormo, G.D.: DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23 (2000)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Keich, U., Nagarajan, N. (2004). A Faster Reliable Algorithm to Estimate the p-Value of the Multinomial llr Statistic. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science(), vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-30219-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23018-2

  • Online ISBN: 978-3-540-30219-3

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics