Preservation of Statistically Significant Patterns in Multiresolution 0-1 Data

Prem Raj Adhikari & Jaakko Hollmén

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6282)

Included in the following conference series:

IAPR International Conference on Pattern Recognition in Bioinformatics

Abstract

Measurements in biology are made with high-throughput and high-resolution techniques, often resulting in data at multiple resolutions. Currently available standard algorithms can handle data at only one resolution. Generative models such as mixture models are often used to model such data. However, the significance of the patterns generated by generative models has so far received inadequate attention. This paper analyses the statistical significance of the patterns preserved when sampling between different resolutions and when sampling from a generative model. Furthermore, we study the effect of noise on the likelihood with respect to changing resolutions and sample sizes. Finite mixtures of multivariate Bernoulli distributions are used to model amplification patterns in cancer at multiple resolutions. Statistically significant itemsets are identified, using randomization, in the original data and in data sampled from the generative models, and their relationships are studied. The results showed that statistically significant itemsets are effectively preserved by mixture models, and that the preservation is more accurate at coarse resolution than at finer resolution. Furthermore, noise affects data at higher resolution and with smaller sample sizes more than data at lower resolution and with larger sample sizes.
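The pipeline described in the abstract (fit a finite mixture of multivariate Bernoulli distributions, sample new 0-1 data from it, and test itemsets for significance by randomization) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is fitted with EM, a standard choice for such mixtures, and the randomization null permutes each column independently (preserving column sums but breaking co-occurrence), which is simpler than the swap randomization that also preserves row sums. All function names (`fit_bernoulli_mixture`, `sample_mixture`, `support`, `empirical_p_value`) are hypothetical.

```python
import math
import random


def fit_bernoulli_mixture(data, k, n_iter=100, seed=0):
    """Fit p(x) = sum_j w_j * prod_i t_ji^x_i * (1 - t_ji)^(1 - x_i) with EM.

    data: list of 0-1 rows of equal length d (hypothetical helper).
    Returns (weights, probs): mixing proportions and k lists of d parameters.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    eps = 1e-6
    weights = [1.0 / k] * k
    # Random start strictly inside (0, 1) so every log below is defined.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        # E-step: component responsibilities, computed in log space.
        resp = []
        for x in data:
            logs = []
            for j in range(k):
                lp = math.log(weights[j])
                for i in range(d):
                    p = probs[j][i]
                    lp += math.log(p) if x[i] else math.log(1.0 - p)
                logs.append(lp)
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: re-estimate mixing proportions and Bernoulli parameters.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, eps)
            for i in range(d):
                pij = sum(r[j] * x[i] for r, x in zip(resp, data)) / max(nj, eps)
                probs[j][i] = min(max(pij, eps), 1.0 - eps)  # clip away from 0/1
    return weights, probs


def sample_mixture(weights, probs, n, seed=0):
    """Draw n 0-1 rows: pick a component, then sample each column independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        rows.append([1 if rng.random() < p else 0 for p in probs[j]])
    return rows


def support(data, itemset):
    """Fraction of rows in which every column index in itemset equals 1."""
    return sum(all(row[i] for i in itemset) for row in data) / len(data)


def empirical_p_value(data, itemset, n_rand=200, seed=0):
    """Randomization p-value for the support of itemset.

    Assumed null model: each column is permuted independently, which keeps
    column sums but destroys co-occurrence between columns.
    """
    rng = random.Random(seed)
    observed = support(data, itemset)
    cols = [list(c) for c in zip(*data)]
    hits = 0
    for _ in range(n_rand):
        for c in cols:
            rng.shuffle(c)
        randomized = [list(r) for r in zip(*cols)]
        if support(randomized, itemset) >= observed:
            hits += 1
    # The +1 terms count the observed data as one draw from the null.
    return (hits + 1) / (n_rand + 1)


if __name__ == "__main__":
    # Toy data: two groups of rows with disjoint co-occurring columns.
    gen = random.Random(1)
    data = ([[1 if gen.random() < 0.9 else 0 for _ in range(3)] + [0, 0, 0]
             for _ in range(40)]
            + [[0, 0, 0] + [1 if gen.random() < 0.9 else 0 for _ in range(3)]
               for _ in range(40)])
    weights, probs = fit_bernoulli_mixture(data, k=2, n_iter=50)
    sampled = sample_mixture(weights, probs, n=len(data), seed=2)
    for name, rows in (("original", data), ("sampled", sampled)):
        print(name, support(rows, (0, 1)), empirical_p_value(rows, (0, 1)))
```

Comparing the p-values of the same itemsets in the original and the sampled data gives a rough, toy-scale analogue of the preservation question the paper studies; with real data, many itemsets are tested at once, so a multiple-testing correction would also be needed.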

Author information

Authors and Affiliations

Department of Information and Computer Science, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076, Aalto, Espoo, Finland
Prem Raj Adhikari & Jaakko Hollmén

Editor information

Editors and Affiliations

Institute for Computing and Information Sciences, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Tjeerd M. H. Dijkstra, Elena Marchiori & Tom Heskes
Institute for Computing and Information Sciences, Turku Centre for Computer Science, Radboud University Nijmegen, Heyendaalseweg 135, 6525AJ, Nijmegen, The Netherlands
Evgeni Tsivtsivadze

About this paper

Cite this paper

Adhikari, P.R., Hollmén, J. (2010). Preservation of Statistically Significant Patterns in Multiresolution 0-1 Data. In: Dijkstra, T.M.H., Tsivtsivadze, E., Marchiori, E., Heskes, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2010. Lecture Notes in Computer Science, vol 6282. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16001-1_8

DOI: https://doi.org/10.1007/978-3-642-16001-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16000-4
Online ISBN: 978-3-642-16001-1
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition