Abstract
Recent genomic studies suggest that long non-coding RNAs (lncRNAs) play an important role in regulating plant growth. It is therefore important to identify more plant lncRNAs and to predict their functions. This paper presents an improved maximum correlation minimum redundancy method for lncRNA recognition. Sequence features, secondary structure features, and functional features are extracted, including a pseudo-nucleotide feature based on the physical and chemical properties of the dinucleotides of the RNA. The maximum correlation minimum redundancy method is then used to integrate several feature selection methods: the Pearson correlation coefficient, information gain, the Relief algorithm, and random forest. A classification model is built with a support vector machine (SVM) on the selected feature subset. Experimental results on an Arabidopsis sequence dataset show that the pseudo-nucleotide feature reflects information about different RNA sequences, and that the classification model built with the proposed method can identify plant lncRNAs more accurately than other methods.
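The selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scoring function (absolute Pearson correlation) for both relevance and redundancy, whereas the paper integrates several rankers, and the names `pearson`, `mrmr_select`, `columns`, and `labels` as well as the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(vx * vy)


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    Assumption: |Pearson| stands in for every relevance and redundancy
    measure; the paper combines several feature selection methods instead.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            abs(pearson(columns[j], columns[s])) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 1 = lncRNA, 0 = protein-coding transcript.
labels = [1, 1, 1, 0, 0, 0]
columns = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # f0: relevant
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # f1: exact copy of f0 (redundant)
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # f2: weaker, but complements f0
    [1.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # f3: unrelated to the labels
]
selected = mrmr_select(columns, labels, 2)
print(selected)
```

On this toy input the duplicate column f1 is penalised by the redundancy term, so the second pick is the complementary column f2 rather than the copy of f0. In a full pipeline the selected columns would then be passed to a classifier such as an SVM.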
Acknowledgement
The current study was supported by the National Natural Science Foundation of China (Nos. 61472061 and 31471880), and the Graduate Educational Reform Fund of Dalian University of Technology (Jg2017015).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Meng, J., Jiang, D., Chang, Z., Luan, Y. (2018). Prediction of LncRNA by Using Muitiple Feature Information Fusion and Feature Selection Technique. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_39
DOI: https://doi.org/10.1007/978-3-319-95933-7_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95932-0
Online ISBN: 978-3-319-95933-7
eBook Packages: Computer Science, Computer Science (R0)