Non-coding RNAs (ncRNAs) play a vital role in biological processes and have been associated with diseases such as cancer. Classifying ncRNAs is necessary for understanding the underlying mechanisms of these diseases and for designing effective treatments. Recently, deep learning has been employed for ncRNA identification and classification and has shown promising results. In this study, we review progress in ncRNA type classification, specifically for lncRNA, lincRNA, circular RNA and small ncRNA, and present a comprehensive comparison of six deep-learning-based classification methods published in the past two years. We identify research gaps and challenges in ncRNA type classification, such as the classification of lncRNA subclasses, transcript length and compositional variation, dependency on database searches and the high false positive rate of existing approaches. We suggest future directions concerning cross-species performance deviation, deep learning model selection and sequence intrinsic features.
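Two terms in the abstract, "sequence intrinsic features" and "false positive rate", can be made concrete with a minimal sketch. It assumes that normalised k-mer frequencies stand in for an intrinsic feature and that labels are binary, with 1 for ncRNA and 0 for everything else. The function names `kmer_frequencies` and `false_positive_rate` are hypothetical and are not taken from any of the reviewed methods.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalised k-mer frequencies over the RNA alphabet ACGU.

    Illustrative only: k-mer composition is one example of a feature
    computed from the sequence alone, without any database search.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A, C, G, U are counted; others (e.g. with N) are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), assuming 1 = ncRNA (positive) and 0 = other."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0
```

A feature vector built this way has a fixed length of 4^k regardless of transcript length, which is one reason composition-based features are convenient inputs for classifiers.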
The dataset, source code and usage instructions are available at http://homepage.cs.latrobe.edu.au/ypchen/ncRNAanalysis/.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Amin, N., McGrath, A. & Chen, YP.P. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 1, 246–256 (2019). https://doi.org/10.1038/s42256-019-0051-2