Abstract
Non-coding (nc) RNA plays a vital role in biological processes and has been associated with diseases such as cancer. Classifying ncRNAs is necessary for understanding the underlying mechanisms of these diseases and for designing effective treatments. Recently, deep learning has been employed for ncRNA identification and classification and has shown promising results. In this study, we review progress in ncRNA type classification, specifically for lncRNA, lincRNA, circular RNA and small ncRNA, and present a comprehensive comparison of six deep-learning-based classification methods published in the past two years. We identify research gaps and challenges in ncRNA type classification, such as the classification of lncRNA subclasses, variation in transcript length and composition, dependency on database searches and the high false positive rate of existing approaches. We suggest future directions concerning cross-species performance deviation, deep learning model selection and sequence intrinsic features.
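To make the notion of "sequence intrinsic features" concrete, the following is a minimal sketch of one commonly used feature of this kind: normalized k-mer frequencies computed directly from a transcript, with no database search. It is an illustration only and is not taken from any of the reviewed methods; the function name `kmer_frequencies` and the default `k = 3` are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 3) -> Dict[str, float]:
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    Hypothetical helper for illustration; not part of any reviewed tool.
    """
    alphabet = "ACGU"
    # Accept DNA-style input by mapping T to U.
    seq = sequence.upper().replace("T", "U")
    # Slide a window of width k over the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Ignore windows containing ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set(alphabet)]
    counts = Counter(valid)
    total = len(valid)
    # Emit every possible k-mer so the feature vector has a fixed length of 4**k.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(alphabet, repeat=k)
    }
```

A fixed-length vector of this kind can be passed to any classifier regardless of transcript length, which is one reason alignment-free features are attractive when transcript length varies widely.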
Data availability
The dataset, source code and usage instructions are available at http://homepage.cs.latrobe.edu.au/ypchen/ncRNAanalysis/.
Change history
13 January 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Contributions
All authors contributed to the manuscript. N.A., Y.P.C. and A.M. conceived the idea. N.A. implemented the code, performed the experiments and wrote the paper. A.M. and Y.P.C. contributed to the write-up and to the analysis of the experiments. A.M. and Y.P.C. reviewed the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Amin, N., McGrath, A. & Chen, YP.P. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 1, 246–256 (2019). https://doi.org/10.1038/s42256-019-0051-2
DOI: https://doi.org/10.1038/s42256-019-0051-2