Evaluation of deep learning in non-coding RNA classification

An Author Correction to this article was published on 13 January 2020

A Matters Arising article relating to this article was published on 13 January 2020

This article has been updated

Abstract

Non-coding (nc) RNA plays a vital role in biological processes and has been associated with diseases such as cancer. Classification of ncRNAs is necessary for understanding the underlying mechanisms of these diseases and for designing effective treatments. Recently, deep learning has been employed for ncRNA identification and classification and has shown promising results. In this study, we review the progress of ncRNA type classification, specifically lncRNA, lincRNA, circular RNA and small ncRNA, and present a comprehensive comparison of six deep-learning-based classification methods published in the past two years. We identify research gaps and challenges of ncRNA types, such as the classification of subclasses of lncRNA, transcript length and compositional variation, dependency on database searches and the high false positive rate of existing approaches. We suggest future directions for cross-species performance deviation, deep learning model selection and sequence intrinsic features.

Fig. 1: Overall taxonomy of ncRNA.
Fig. 2: Architectures of deep learning models.
Fig. 3: Length distribution of the long non-coding and protein-coding transcripts in human and mouse datasets.
Fig. 4: ROC curves for the lncRNA classification algorithms.
Fig. 5: Precision recall curves for the lncRNA classification algorithms.
Fig. 6

Data availability

The dataset, source code and usage instructions are available at http://homepage.cs.latrobe.edu.au/ypchen/ncRNAanalysis/.

Change history

  • 13 January 2020

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Author information

Contributions

All authors contributed to the manuscript. N.A., Y.P.C. and A.M. conceived the idea. N.A. implemented the code, performed the experiments and wrote the paper. A.M. and Y.P.C. contributed to the write-up and to the analysis of the experiments. A.M. and Y.P.C. reviewed the article.

Corresponding author

Correspondence to Yi-Ping Phoebe Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amin, N., McGrath, A. & Chen, YP.P. Evaluation of deep learning in non-coding RNA classification. Nat Mach Intell 1, 246–256 (2019). https://doi.org/10.1038/s42256-019-0051-2
