Abstract
Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the alignment of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of domain architecture. We developed several schemes for scoring the similarity of a pair of protein sequences by exploiting an analogy between comparing proteins using their domain content and comparing documents based on their word content. We evaluate the proposed methods using a benchmark of fifteen sequence families of known evolutionary history. The results demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance of both weighting critical domains and compensating for proteins with large numbers of domains.
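The document analogy above can be illustrated with a minimal sketch. It assumes one plausible scheme from text retrieval, inverse-document-frequency weighting with cosine normalization; the abstract does not specify the paper's actual scoring schemes, so this is not the authors' method. The function names, protein names, and domain lists below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def idf_weights(architectures):
    """Weight each domain by log(N / number of proteins containing it).

    Rare domains get high weight; promiscuous domains get low weight.
    (Assumed weighting scheme, borrowed from text retrieval.)
    """
    n = len(architectures)
    df = Counter()
    for domains in architectures.values():
        df.update(set(domains))  # count each domain once per protein
    return {d: math.log(n / count) for d, count in df.items()}


def weighted_vector(domains, idf):
    """Turn a protein's domain list into a sparse weighted count vector."""
    counts = Counter(domains)
    return {d: c * idf[d] for d, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors.

    Dividing by the vector norms compensates for proteins that simply
    contain many domains.
    """
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: protein name -> ordered list of domain labels.
architectures = {
    "kinase_a": ["Pkinase", "SH2", "SH3"],
    "kinase_b": ["Pkinase", "SH2"],
    "adaptor": ["SH3", "PDZ"],
    "receptor": ["Ig", "Ig", "Ig", "SH3"],
}

idf = idf_weights(architectures)
vectors = {name: weighted_vector(doms, idf) for name, doms in architectures.items()}


def similarity(a, b):
    """Domain-architecture similarity of two proteins in the toy data set."""
    return cosine_similarity(vectors[a], vectors[b])
```

In this toy data set, the two kinases share their rarer domains and score close to 1, while a pair that shares only the widespread SH3 domain scores close to 0. Both effects named in the abstract are visible: the weighting downplays the common domain, and the normalization keeps the three-copy Ig protein from dominating through size alone.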
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, N., Sedgewick, R.D., Durand, D. (2006). Domain Architecture in Homolog Identification. In: Bourque, G., El-Mabrouk, N. (eds) Comparative Genomics. RCG 2006. Lecture Notes in Computer Science, vol 4205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11864127_2
DOI: https://doi.org/10.1007/11864127_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44529-6
Online ISBN: 978-3-540-44530-2
eBook Packages: Computer Science, Computer Science (R0)