research-article

Intelligent Classification and Analysis of Essential Genes Using Quantitative Methods

Published: 17 April 2020

Abstract

Essential genes are considered to be the genes required to sustain the life of an organism. These genes encode proteins that maintain central metabolism, DNA replication, translation of genes, and basic cellular structure, and that mediate transport within and out of the cell. Identifying essential genes is one of the central problems in computational genomics. In the present study, to discriminate essential genes from other genes from a non-biologist's perspective, the purine and pyrimidine distribution over the essential genes of four exemplary species, namely Homo sapiens, Arabidopsis thaliana, Drosophila melanogaster, and Danio rerio, is thoroughly examined using several quantitative methods. Moreover, an intelligent classification method has been deployed to classify the essential genes of these species. Based on Shannon entropy, fractal dimension, Hurst exponent, and the distribution of purine and pyrimidine bases, 10 different clusters have been generated for the essential genes of the four species. Some proximity results for these clusters are also reported.
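
As a rough illustration of two of the quantities named in the abstract, the sketch below encodes a nucleotide sequence by purine (A, G) and pyrimidine (C, T/U) class and computes the Shannon entropy of the resulting two-symbol distribution. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the R/Y encoding, and the example sequence are all hypothetical, and the abstract does not specify how the features were actually computed.

```python
import math
from collections import Counter

# Standard biochemical classes; U is included so RNA input also works.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def purine_pyrimidine_encode(seq):
    """Map a nucleotide sequence to a string over {R, Y}.

    R marks a purine, Y a pyrimidine. Symbols outside both classes
    (for example N) are skipped. Hypothetical helper for illustration.
    """
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append("R")
        elif base in PYRIMIDINES:
            encoded.append("Y")
    return "".join(encoded)


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol frequencies in `symbols`."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def purine_fraction(encoded):
    """Fraction of positions in an R/Y string that are purines."""
    return encoded.count("R") / len(encoded) if encoded else 0.0


if __name__ == "__main__":
    example = "ATGGCGTACG"  # made-up sequence, not taken from the study
    ry = purine_pyrimidine_encode(example)
    print(ry, purine_fraction(ry), shannon_entropy(ry))
```

For a two-symbol encoding the entropy lies between 0 bits (all purines or all pyrimidines) and 1 bit (an even split), so a per-gene value of this kind could serve as one coordinate of a feature vector for clustering.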

    Information & Contributors

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1s
    Special Issue on Multimodal Machine Learning for Human Behavior Analysis and Special Issue on Computational Intelligence for Biomedical Data and Imaging
    January 2020
    376 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3388236
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 April 2020
    Accepted: 01 July 2019
    Revised: 01 June 2019
    Received: 01 May 2019
    Published in TOMM Volume 16, Issue 1s

    Author Tags

    1. Essential genes
    2. Hurst exponent
    3. Shannon entropy
    4. fractal dimension
    5. purines
    6. pyrimidines

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 8
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 13 Feb 2025

    Citations

    Cited By

    • (2023) Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification. Frontiers in Genetics 14. DOI: 10.3389/fgene.2023.1154120. Online publication date: 20-Apr-2023
    • (2023) A Pattern Classification Model for Vowel Data Using Fuzzy Nearest Neighbor. Intelligent Automation & Soft Computing 35:3 (3587-3598). DOI: 10.32604/iasc.2023.029785. Online publication date: 2023
    • (2023) PRMxAI: protein arginine methylation sites prediction based on amino acid spatial distribution using explainable artificial intelligence. BMC Bioinformatics 24:1. DOI: 10.1186/s12859-023-05491-x. Online publication date: 4-Oct-2023
    • (2023) Prediction of Protein-Protein Interaction Using Support Vector Machine Based on Spatial Distribution of Amino Acids. Advances and Applications of Artificial Intelligence & Machine Learning (23-32). DOI: 10.1007/978-981-99-5974-7_3. Online publication date: 15-Nov-2023
    • (2022) Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences. Mathematics 10:13 (2228). DOI: 10.3390/math10132228. Online publication date: 25-Jun-2022
    • (2022) Protein-protein interaction prediction from primary sequences using supervised machine learning algorithm. 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence) (268-272). DOI: 10.1109/Confluence52989.2022.9734190. Online publication date: 27-Jan-2022
    • (2022) Multifactorial feature extraction and site prognosis model for protein methylation data. Briefings in Functional Genomics 22:1 (20-30). DOI: 10.1093/bfgp/elac034. Online publication date: 30-Oct-2022
    • (2022) Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences. Computers in Biology and Medicine 141:C. DOI: 10.1016/j.compbiomed.2021.105024. Online publication date: 1-Feb-2022
    • (2021) Preliminaries on the Accurate Estimation of the Hurst Exponent Using Time Series. 2021 IEEE International Conference on Automation/XXIV Congress of the Chilean Association of Automatic Control (ICA-ACCA) (1-8). DOI: 10.1109/ICAACCA51523.2021.9465274. Online publication date: 22-Mar-2021
