Abstract
Cancer is a life-threatening disease caused by changes in the structure of the genetic components of the cell. DNA sequences are among the most important factors in the formation and spread of this disease. Over the last two decades, signal processing has emerged as an approach to the analysis of DNA sequences. In this research, a hybrid model combining the discrete Fourier transform and an anti-notch digital filter is used for this purpose. The aim is to build an approach that can distinguish cancerous samples from non-cancerous ones. In other words, a pattern recognition model is designed to discriminate cancerous cell samples based on features of the protein coding regions of DNA sequences. Computational and statistical techniques are used in the feature extraction and feature selection stages. The proposed model is simple and avoids conventional challenges such as high computational complexity and excessive memory use. Case studies have been tested with the smallest possible number of features, depending on the nature of the features. The experimental results and the relationships among the features motivated the choice of an SVM classifier to separate the two categories. The extracted features and the classification output show good discrimination between cancerous and non-cancerous samples. One of the main advantages of the proposed model is that its performance is independent of the data length. Evaluation and validation results indicate high accuracy and precision for the proposed method, which is consistent with the view that cancer arises from genetic mutation.
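As a rough illustration of the Fourier-transform side of such a model, the sketch below measures the period-3 spectral component that is commonly associated with protein coding regions. It is not the authors' implementation: the binary indicator (Voss-style) mapping of the four bases, the function names `period3_power` and `sliding_period3`, and the default window and step sizes are all assumptions made for this example. The anti-notch filter and the SVM stage are not covered.

```python
import cmath

BASES = "ACGT"  # assumed alphabet; other symbols are ignored


def period3_power(seq):
    """Sum of |X_b[N/3]|^2 over the four binary indicator sequences.

    X_b is the DFT of the indicator sequence of base b (1 where the
    base occurs, 0 elsewhere). The indicator mapping is an assumption
    of this sketch, not something stated in the abstract.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N is a multiple of 3
    if n == 0:
        return 0.0
    # At k = N/3 the DFT kernel exp(-2*pi*i*k*m/N) reduces to
    # exp(-2*pi*i*m/3), which takes only three distinct values.
    twiddle = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    total = 0.0
    for base in BASES:
        coeff = sum(twiddle[i % 3] for i in range(n) if seq[i] == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for windows slid along the sequence.

    The window and step defaults are illustrative values only.
    """
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A strictly periodic sequence such as `"ATG" * 50` gives a large value from `period3_power`, while a homopolymer such as `"A" * 150` gives a value near zero. Features for a classifier could then be derived from the profile returned by `sliding_period3`, although which features the authors actually use is not stated in the abstract.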
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Cite this article
Khodaei, A., Feizi-Derakhshi, MR. & Mozaffari-Tazehkand, B. A pattern recognition model to distinguish cancerous DNA sequences via signal processing methods. Soft Comput 24, 16315–16334 (2020). https://doi.org/10.1007/s00500-020-04942-4
DOI: https://doi.org/10.1007/s00500-020-04942-4