Abstract
Long non-coding RNAs (lncRNAs) play important roles in regulating biological activities. Traditional feature engineering methods for lncRNA prediction rely on prior experience and require manual feature extraction from related datasets. In addition, the structure of plant lncRNAs is complex, which makes it difficult to extract highly discriminative features. This paper proposes an lncRNA recognition method based on long short-term memory networks (LSTM), called lncRNA-LSTM. First, K-means clustering is used to address the imbalance in sample sizes; then p-nts encoding is applied according to the characteristics of RNA sequences, and the encoded sequences are fed into a recurrent neural network consisting of an embedding layer, an LSTM layer and a fully connected layer. lncRNA-LSTM outperforms support vector machine, Naive Bayes and other models built on fused open reading frame, secondary structure and k-mer features. On the same Zea mays dataset, lncRNA-LSTM achieves 96.2% accuracy, which is 0.053, 0.173, 0.211 and 0.162 higher than that of CPC2, CNCI, PLEK and LncADeep, respectively, and its precision and recall are likewise higher and more robust.
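The abstract names three processing stages without giving their details, so the following is a minimal, dependency-free Python sketch of how they could fit together, not the authors' implementation. Several points are assumptions: that p-nts encoding cuts a sequence into consecutive p-nucleotide words (non-overlapping by default) and maps each word to an integer index in 1..4^p, with 0 reserved for padding and ambiguous bases; that the K-means step balances the classes by clustering the majority class on k-mer frequency profiles and keeping the member nearest each centroid; and that the network is one embedding layer, one LSTM layer and a single sigmoid output unit. All function and class names (`p_nts_encode`, `kmer_profile`, `kmeans_undersample`, `TinyLSTMClassifier`) are hypothetical, and the LSTM weights are random and untrained, so only the forward pass is illustrated.

```python
import math
import random

NUCS = "ACGU"  # RNA alphabet; T is converted to U before encoding


def p_nts_encode(seq, p=3, stride=None):
    """Assumed p-nts encoding: cut seq into p-nucleotide words and map each
    word to an index in 1..4**p. Index 0 marks padding or non-ACGU words."""
    stride = stride or p  # assumption: non-overlapping words by default
    seq = seq.upper().replace("T", "U")
    ids = []
    for start in range(0, len(seq) - p + 1, stride):
        word = seq[start:start + p]
        if any(ch not in NUCS for ch in word):
            ids.append(0)
            continue
        idx = 0
        for ch in word:  # read the word as a base-4 number
            idx = idx * 4 + NUCS.index(ch)
        ids.append(idx + 1)
    return ids


def kmer_profile(seq, k=2):
    """Normalized overlapping k-mer frequencies, used only as clustering features."""
    ids = [i for i in p_nts_encode(seq, p=k, stride=1) if i > 0]
    counts = [0.0] * (4 ** k)
    for i in ids:
        counts[i - 1] += 1
    return [c / len(ids) for c in counts] if ids else counts


def kmeans_undersample(majority, n_keep, k=2, iters=20, seed=0):
    """Assumed balancing step: run Lloyd's k-means with n_keep clusters on the
    majority class and keep the member closest to each centroid.
    Requires n_keep <= len(majority); returns at most n_keep sequences."""
    rng = random.Random(seed)
    feats = [kmer_profile(s, k) for s in majority]
    centroids = [feats[i] for i in rng.sample(range(len(feats)), n_keep)]
    for _ in range(iters):
        labels = [min(range(n_keep), key=lambda c: math.dist(f, centroids[c]))
                  for f in feats]
        for c in range(n_keep):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    kept = []
    for c in range(n_keep):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            kept.append(min(members, key=lambda i: math.dist(feats[i], centroids[c])))
    return [majority[i] for i in kept]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Embedding -> one LSTM layer -> fully connected sigmoid output.
    Forward pass only, with random untrained weights (illustration)."""

    def __init__(self, vocab, emb=4, hidden=8, seed=0):
        rng = random.Random(seed)

        def r():
            return rng.uniform(-0.1, 0.1)

        self.hidden = hidden
        self.E = [[r() for _ in range(emb)] for _ in range(vocab)]
        # One weight matrix per gate (input, forget, output, candidate) acting on [x; h].
        self.W = {g: [[r() for _ in range(emb + hidden)] for _ in range(hidden)]
                  for g in "ifoc"}
        self.b = {g: [0.0] * hidden for g in "ifoc"}
        self.w_out = [r() for _ in range(hidden)]
        self.b_out = 0.0

    def predict(self, ids):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        for t in ids:
            z = self.E[t] + h  # concatenate embedding and previous hidden state
            pre = {g: [sum(w * v for w, v in zip(row, z)) + bias
                       for row, bias in zip(self.W[g], self.b[g])]
                   for g in "ifoc"}
            i_g = [sigmoid(x) for x in pre["i"]]
            f_g = [sigmoid(x) for x in pre["f"]]
            o_g = [sigmoid(x) for x in pre["o"]]
            cand = [math.tanh(x) for x in pre["c"]]
            c = [f * cj + i * g for f, cj, i, g in zip(f_g, c, i_g, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_g, c)]
        # Final hidden state -> fully connected layer -> probability
        return sigmoid(sum(w * x for w, x in zip(self.w_out, h)) + self.b_out)
```

For example, `p_nts_encode("ACGUAC", p=3)` yields `[7, 50]`, and `TinyLSTMClassifier(vocab=4**3 + 1).predict([7, 50])` returns a value in (0, 1) that a trained model would read as the probability of the positive class. In practice the training loop, batching and padding would be handled by a deep learning framework rather than hand-written loops; the sketch only makes the data flow through the three stages concrete.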
References
Palazzo, A.F., Lee, E.S.: Non-coding RNA: what is functional and what is junk? Front. Genet. 6, 2 (2015)
Aryal, B., Rotllan, N., Fernández-Hernando, C.: Noncoding RNAs and atherosclerosis. Curr. Atheroscler. Rep. 16(5), 1–11 (2014)
Schmitz, S.U., Grote, P., Herrmann, B.G.: Mechanisms of long noncoding RNA function in development and disease. Cell. Mol. Life Sci. 73(13), 2491–2509 (2016)
O’Leary, V.B., Ovsepian, S.V., et al.: PARTICLE, a triplex-forming long ncRNA, regulates locus-specific methylation in response to low-dose irradiation. Cell Rep. 11(3), 474–485 (2015)
Schneider, H.W., Raiol, T., Brigido, M.M., et al.: A support vector machine based method to distinguish long non-coding RNAs from protein coding transcripts. BMC Genom. 18(1), 804 (2017)
Hu, L., Xu, Z., Hu, B., et al.: COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features. Nucleic Acids Res. 45(1), e2 (2017)
Kong, L., Zhang, Y., Ye, Z.Q., et al.: CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007)
Kang, Y.J., Yang, D.C., Kong, L., et al.: CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45(W1), W12–W16 (2017)
Wang, L., Park, H.J., Dasari, S., et al.: CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res. 41(6), e74 (2013)
Sun, L., Luo, H.T., Bu, D.C., et al.: Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 41(17), e166 (2013)
Li, A.M., Zhang, J.Y., Zhou, Z.Y.: PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinform. 15, 311 (2014)
Baek, J., Lee, B., Kwon, S., et al.: LncRNAnet: long non-coding RNA identification using deep learning. Bioinformatics 34(22), 3889–3897 (2018)
Yang, C., Yang, L.S., Zhou, M., et al.: LncADeep: an ab initio lncRNA identification and functional annotation tool based on deep learning. Bioinformatics 34(22), 3825–3834 (2018)
Pan, X.Y., Shen, H.B.: RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinform. 18, 136 (2017)
Bai, Y., Dai, X., Harrison, A.P., et al.: RNA regulatory networks in animals and plants: a long noncoding RNA perspective. Brief. Funct. Genomics 14(2), 91–101 (2015)
Liu, G., Guo, J.B.: Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338 (2019)
Paytuví Gallart, A., Hermoso Pulido, A., Anzar Martínez de Lagrán, I., et al.: GREENC: a Wiki-based database of plant lncRNAs. Nucleic Acids Res. 44(D1), D1161–D1166 (2016)
Li, X., Yang, L., Chen, L.-L.: The biogenesis, functions, and challenges of circular RNAs. Mol. Cell 71(3), 428–442 (2018)
Asgari, E., Mofrad, M.R.K.: Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10(11), e0141287 (2015)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Dinger, M.E., Pang, K.C., Mercer, T.R., et al.: Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput. Biol. 4(11), e1000176 (2008)
Lorenz, R., Bernhart, S.H., Höner zu Siederdissen, C., et al.: ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011)
Acknowledgment
The current study was supported by the National Natural Science Foundation of China (Nos. 61872055 and 31872116).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Meng, J., Chang, Z., Zhang, P., Shi, W., Luan, Y. (2019). lncRNA-LSTM: Prediction of Plant Long Non-coding RNAs Using Long Short-Term Memory Based on p-nts Encoding. In: Huang, DS., Huang, ZK., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2019. Lecture Notes in Computer Science, vol 11645. Springer, Cham. https://doi.org/10.1007/978-3-030-26766-7_32
DOI: https://doi.org/10.1007/978-3-030-26766-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26765-0
Online ISBN: 978-3-030-26766-7
eBook Packages: Computer Science, Computer Science (R0)