Abstract
Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a great deal of attention because of its relevance to plant and animal breeding: more accurate prediction enables more effective breeding strategies. Most existing work considers only an additive model on single markers, or genotypes. In this work, we study the problem of epistasis detection for genetic trait prediction, where different alleles, or genes, can interact with each other. We have developed a novel method, MINED, to detect the significant pairwise epistasis effects that contribute most to prediction performance. Dynamic thresholding and a sampling strategy make detection efficient: MINED is generally 20 to 30 times faster than an exhaustive search. In our experiments on real plant data sets, MINED captures pairwise epistasis effects that improve the prediction, and it achieves better prediction accuracy than the state-of-the-art methods. To our knowledge, MINED is the first algorithm to detect epistasis in the genetic trait prediction problem. We further propose a constrained version, Constrained-MINED, that converts the epistasis detection problem into a Weighted Maximum Independent Set problem, and we show that it improves the prediction accuracy even more.
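To make the Weighted Maximum Independent Set formulation concrete, the following is a minimal sketch of a simple greedy heuristic for that problem, not the authors' implementation. It rests on assumptions the abstract does not state: each candidate marker pair is a vertex weighted by some detection score, and two candidate pairs conflict (share an edge) when they have a marker in common. The function names, the conflict rule, and the toy scores are all hypothetical.

```python
from itertools import combinations


def build_conflict_graph(pairs):
    """Build an adjacency map over candidate marker pairs.

    Assumed (hypothetical) conflict rule: two pairs conflict
    when they share at least one marker index.
    """
    adj = {p: set() for p in pairs}
    for a, b in combinations(pairs, 2):
        if set(a) & set(b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_weighted_independent_set(weights, adj):
    """Greedy heuristic for a weighted independent set.

    Repeatedly pick the remaining vertex with the largest
    weight / (remaining degree + 1), then discard it and its neighbours.
    This yields an independent set, not necessarily an optimal one.
    """
    remaining = set(weights)
    chosen = []
    while remaining:
        best = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda v: weights[v] / (len(adj[v] & remaining) + 1),
        )
        chosen.append(best)
        remaining -= adj[best] | {best}
    return chosen


# Toy scores for candidate marker pairs (made-up numbers, not from the paper).
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7, (4, 5): 0.4}
graph = build_conflict_graph(list(scores))
selected = greedy_weighted_independent_set(scores, graph)
print(selected)
```

In this toy run the pair (1, 2) is dropped because it shares a marker with the higher-ranked pair (0, 1), so no two selected pairs overlap. Exact solvers or other heuristics could be substituted; the greedy rule is used here only because it is short.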
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
He, D., Wang, Z., Parida, L. (2015). MINED: An Efficient Mutual Information Based Epistasis Detection Method to Improve Quantitative Genetic Trait Prediction. In: Harrison, R., Li, Y., Măndoiu, I. (eds) Bioinformatics Research and Applications. ISBRA 2015. Lecture Notes in Computer Science, vol 9096. Springer, Cham. https://doi.org/10.1007/978-3-319-19048-8_10
DOI: https://doi.org/10.1007/978-3-319-19048-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19047-1
Online ISBN: 978-3-319-19048-8
eBook Packages: Computer Science, Computer Science (R0)