Abstract
Malaria is one of the world’s deadliest diseases, and its most severe form is caused by the parasite Plasmodium falciparum. Sixty percent of P. falciparum genes have no known function, so new methods of gene function prediction are needed. To address this problem, we train a naïve Bayes classifier on multiple sources of data and then apply a modified version of the Gene Set Enrichment Analysis algorithm to predict gene function in P. falciparum. To define gene function, we exploit the hierarchical structure of the Gene Ontology, specifically its Biological Process category. We demonstrate the value of integrating multiple data sources by achieving accurate predictions for genes that cannot be annotated using simple sequence-similarity-based methods.
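The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the features, the training labels, or how the enrichment analysis was modified. The sketch assumes binary evidence features (for example, whether two genes are co-expressed or share a domain), a Laplace-smoothed naïve Bayes model, and the plain unweighted running-sum enrichment statistic. All function names and the toy data are hypothetical.

```python
import math


def train_naive_bayes(features, labels, alpha=1.0):
    """Estimate Laplace-smoothed P(feature = 1 | class) for binary features.

    features: list of equal-length lists of 0/1 evidence values (assumed encoding)
    labels:   list of 0/1 class labels (1 = positive example)
    """
    n_feat = len(features[0])
    counts = {0: [0] * n_feat, 1: [0] * n_feat}
    totals = {0: 0, 1: 0}
    for x, y in zip(features, labels):
        totals[y] += 1
        for j, v in enumerate(x):
            counts[y][j] += v
    probs = {
        c: [(counts[c][j] + alpha) / (totals[c] + 2 * alpha) for j in range(n_feat)]
        for c in (0, 1)
    }
    prior = (totals[1] + alpha) / (totals[0] + totals[1] + 2 * alpha)
    return probs, prior


def log_odds(x, probs, prior):
    """Posterior log-odds of the positive class, treating evidence sources as independent."""
    score = math.log(prior / (1 - prior))
    for j, v in enumerate(x):
        p1 = probs[1][j] if v else 1 - probs[1][j]
        p0 = probs[0][j] if v else 1 - probs[0][j]
        score += math.log(p1 / p0)
    return score


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: the signed maximum deviation from zero."""
    hits = sum(1 for g in ranked_genes if g in gene_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += 1 / hits if g in gene_set else -1 / misses
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical): two evidence sources, four training examples.
toy_features = [[1, 1], [1, 0], [0, 0], [0, 1]]
toy_labels = [1, 1, 0, 0]
toy_probs, toy_prior = train_naive_bayes(toy_features, toy_labels)

# Genes ranked by classifier score, tested against one hypothetical GO term.
toy_ranking = ["g1", "g2", "g3", "g4"]
toy_term_members = {"g1", "g2"}
toy_es = enrichment_score(toy_ranking, toy_term_members)
```

In this sketch the classifier scores rank the genes, and the enrichment step then asks whether genes annotated with a given Gene Ontology term cluster near the top of that ranking. A term's members concentrated at the top give a large positive score.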
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tedder, P.M.R., Bradford, J.R., Needham, C.J., McConkey, G.A., Bulpitt, A.J., Westhead, D.R. (2009). Bayesian Data Integration and Enrichment Analysis for Predicting Gene Function in Malaria. In: Ambos-Spies, K., Löwe, B., Merkle, W. (eds) Mathematical Theory and Computational Practice. CiE 2009. Lecture Notes in Computer Science, vol 5635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03073-4_47
DOI: https://doi.org/10.1007/978-3-642-03073-4_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03072-7
Online ISBN: 978-3-642-03073-4
eBook Packages: Computer Science (R0)