Review Article | Published:

NON-CODING RNA

Towards a complete map of the human long non-coding RNA transcriptome

Abstract

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RELATED LINKS

buildLoci: https://github.com/julienlag/buildLoci

GENCODE: www.gencodegenes.org

lncrna.annotator: https://github.com/gold-lab/shared_scripts/tree/master/lncRNA.annotator

UniProt: http://www.uniprot.org/

Pfam: http://pfam.xfam.org/

Acknowledgements

R.J. acknowledges the support of the Swiss National Science Foundation through the National Centres for Competence in Research (NCCR) ‘RNA & Disease’ and the Medical Faculty of the University Hospital and University of Bern. The authors thank J. Carlevaro-Fita (University of Bern) for help with data analysis and J. Harrow (Illumina), J. Mudge (European Bioinformatics Institute), P. Flicek (European Bioinformatics Institute) and I. Jungreis (Massachusetts Institute of Technology) for fruitful discussions and feedback. A.F. is supported by the Wellcome Trust (WT098051 and WT108749/Z/15/Z), the National Human Genome Research Institute (NHGRI) (U41HG007234, 2U41HG007234) and the European Molecular Biology Laboratory. Work described in this publication was supported by the National Human Genome Research Institute of the US National Institutes of Health (grants U41HG007234, U41HG007000 and U54HG007004) and the Wellcome Trust (grant WT098051 to R.G.). Work in the laboratory of R.G. was supported by the National Human Genome Research Institute (awards U54HG007000, R01MH101814 and U41HG007234), the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’ and CERCA Programme/Generalitat de Catalunya. The authors thank the following individuals for administrative support: R. Garrido (Centre for Genomic Regulation) and S. Roesselet and D. Re (both at the University of Bern).

Reviewer information

Nature Reviews Genetics thanks M. Dinger, I. Ulitsky and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

B.U.-R. and R.J. researched data for the article. B.U.-R., A.F. and R.J. wrote the article. All authors provided substantial contributions to discussions of the content and reviewed and/or edited the manuscript before submission.

Competing interests

The authors declare no competing interests.

Correspondence to Rory Johnson.

Glossary

Long non-coding RNAs

(lncRNAs). RNA transcripts ≥200 nucleotides long that do not encode any identifiable peptide product.

Annotation

Catalogue of gene loci comprising detailed and hierarchical information on their genomic coordinates and those of their constituent transcript isoforms and exons, all of which are assigned unique and stable identifiers.

Transcriptome assembly

The use of bioinformatic algorithms to reconstruct gene and transcript models based on short sequence reads.

Manual annotation

The creation of gene and transcript models by human annotators based on RNA and protein evidence and according to defined protocols.

Biotype

An annotation label that describes the genomic classification, processing or other characteristics of a locus or transcript, and that is intended to provide insights into its biological function.

Expressed sequence tags

(ESTs). An early transcriptomic method in which short fragments of transcribed regions, often from 5′ or 3′ ends, are identified through sequencing of cDNA.

Cap analysis of gene expression

(CAGE). A cap-trapping and sequencing method that is considered a gold standard for mapping RNA 5′ ends.

Transcript models

Abstract descriptions of a transcription event, defining the genomic location of the start point, the end point and splice junctions.

Fragments per kilobase per million mapped

(FPKM). One of the principal units of RNA abundance in the context of RNA sequencing experiments, defined as the number of sequenced fragments per kilobase of annotation per million mapped fragments.
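
As a worked illustration of this definition, the following minimal sketch computes FPKM for a single transcript model. The function name `fpkm` and its parameter names are hypothetical, and the sketch assumes that "kilobase of annotation" means the annotated length of the transcript model in base pairs.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of annotation per million mapped fragments.

    fragments: sequenced fragments assigned to the transcript model
    length_bp: annotated length of the model in base pairs (assumption)
    total_mapped_fragments: all mapped fragments in the experiment
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # Dividing by (length_bp / 1e3) and by (total / 1e6) is the same as
    # multiplying the fragment count by 1e9 and dividing by both raw values.
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# 500 fragments on a 2,000 bp model in a library of 20 million mapped fragments
example = fpkm(500, 2000, 20_000_000)
print(example)  # 12.5
```

Because the value is normalized by both model length and library size, the same fragment count yields a lower FPKM for a longer model or a deeper library.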

Oligonucleotide capture

A method for enriching cDNA libraries with sequences of interest using solution-phase hybridization to tiled, labelled oligonucleotide probes.

Fig. 1: Basic concepts of lncRNA annotations.
Fig. 2: Annotation strategies for lncRNAs.
Fig. 3: Comparison of leading lncRNA annotations.
Fig. 4: Integrating capture and long-read sequencing with annotation pipelines.