Abstract
Discontinuities in long deoxyribonucleic acid (DNA) sequences can cause harmful diseases. Changes in DNA structure correspond to changes in the human immune system. Tuberculosis is a critical disease that causes coughing, fatigue, unintentional weight loss and fever in elderly people due to disorder in the DNA. Breaks or mutations in long DNA sequences are the pivotal causes of this fatal disease. This study develops an automated machine learning technique to count such breaks in long DNA sequences. Data cleansing and deep neural network techniques are applied to handle this big data. The National Center for Biotechnology Information (NCBI) database is used to extract the amino acid sequences for tuberculosis from the large DNA datasets. The results show that the proposed automated approach is effective at determining DNA sequence breaks for tuberculosis, owing to the high sensitivity of the Markov chain and to effective normalization techniques. The approach fixes the size of the training datasets and recursively divides the whole dataset into segments of a fixed length. The study also adopts multiple prediction approaches, such as the hidden Markov chain, the Box-Cox transformation and linear transformation, to forecast breaks at any position in the training and testing datasets. The results demonstrate that the hidden Markov chain model provides faster analysis with more accurate and reliable results.
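The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it names: dividing a long sequence into fixed-length segments and applying a Box-Cox transformation to a per-segment value. The function names, the toy sequence, the window length of 4 and the parameter λ = 0.5 are all assumptions made for illustration, and `count_changes` is a hypothetical stand-in for a break score, not the detector used in the study.

```python
import math


def split_fixed(sequence, window):
    """Split a sequence into consecutive segments of at most `window` symbols."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sequence[i:i + window] for i in range(0, len(sequence), window)]


def box_cox(x, lam):
    """Box-Cox transform of a positive value x: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def count_changes(segment):
    """Hypothetical break score: number of positions where adjacent symbols differ."""
    return sum(1 for a, b in zip(segment, segment[1:]) if a != b)


# Toy input (assumed); real data would come from a sequence database.
sequence = "AAAACCCCGGTTAAAC"
segments = split_fixed(sequence, 4)
raw = [count_changes(s) for s in segments]
# Shift by 1 so every value is strictly positive before the transform.
transformed = [box_cox(r + 1, 0.5) for r in raw]
```

Shifting the counts by one before the transform is a design choice of this sketch: the Box-Cox transformation is only defined for strictly positive values, and a segment with no changes would otherwise score zero.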
Cite this article
Nimmy, S.F., Sarowar, M.G., Dey, N. et al. Investigation of DNA discontinuity for detecting tuberculosis. J Ambient Intell Human Comput 15, 1149–1163 (2024). https://doi.org/10.1007/s12652-018-0878-0
DOI: https://doi.org/10.1007/s12652-018-0878-0