DOI: 10.1145/2382936.2382959
research-article

PseudoDomain: identification of processed pseudogenes based on protein domain classification

Published: 07 October 2012

Abstract

Pseudogenes are dysfunctional DNA sequences that share sequence similarity with functional genes. Accurate identification of pseudogenes is important for understanding the biological and evolutionary histories of genomes and genes. Most existing pseudogene identification tools rely on homology search between genomic sequences and annotated proteins of the same genome. However, when accurate annotations of the genome of interest are not available, these tools cannot provide reliable pseudogene identification. In this work, we introduce a new pseudogene identification tool named PseudoDomain, which is designed to accurately identify processed pseudogenes in genomes with or without gene annotations. PseudoDomain uses profile Hidden Markov Model-based homology search between genomic sequences and protein domain families, which are conserved across a large number of proteins. Experimental results show that our method effectively identifies processed pseudogenes with high sensitivity and a low false positive rate. In addition, it can accurately predict the number and positions of frameshifts within putative pseudogenes. The source code of PseudoDomain is available at http://sourceforge.net/projects/pseudodomain/.
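
To make the frameshift idea concrete, the following is a toy sketch only. It is not PseudoDomain's algorithm (which the abstract describes as profile HMM-based) and it does not use HMMER. It assumes the standard genetic code and uses hypothetical helper names (`translate`, `six_frame`) and a made-up sequence to show why a single-base deletion in a pseudogene moves the downstream protein signal into a different reading frame, which is the kind of event a frameshift-aware search has to locate.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate dna starting at offset; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frame(dna):
    """Translate all six reading frames, keyed '+1'..'+3' and '-1'..'-3'."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for off in range(3):
        frames[f"+{off + 1}"] = translate(dna, off)
        frames[f"-{off + 1}"] = translate(rc, off)
    return frames


# Hypothetical intact coding sequence and a copy with one base deleted.
intact = "ATGGCTAAAGGTTGGTTTGATTAA"
shifted = intact[:9] + intact[10:]  # single-base deletion after the third codon

frames = six_frame(shifted)
print(translate(intact))  # MAKGWFD*
print(frames["+1"])       # MAKVGLI  (start of the protein, then garbage)
print(frames["+3"])       # G*SWFD*  (end of the protein reappears in frame +3)
```

In this toy case the upstream residues stay in frame +1 while the downstream residues reappear in frame +3, so the position where the best-matching frame changes marks the frameshift.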


Published In

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. application
2. frameshift detection
3. profile HMM
4. protein domain classification
5. pseudogene identification

Acceptance Rates

BCB '12 Paper Acceptance Rate: 33 of 159 submissions, 21%
Overall Acceptance Rate: 254 of 885 submissions, 29%
