NON-CODING RNA

Towards a complete map of the human long non-coding RNA transcriptome

Abstract

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.

Fig. 1: Basic concepts of lncRNA annotations.
Fig. 2: Annotation strategies for lncRNAs.
Fig. 3: Comparison of leading lncRNA annotations.
Fig. 4: Integrating capture and long-read sequencing with annotation pipelines.


    PubMed  PubMed Central  Article  CAS  Google Scholar 

  143. 143.

    Carlevaro-Fita, J., Rahim, A., Guigó, R., Vardy, L. A. & Johnson, R. Cytoplasmic long noncoding RNAs are frequently bound to and degraded at ribosomes in human cells. RNA 22, 867–882 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  144. 144.

    Banfai, B. et al. Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646–1657 (2012).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  145. 145.

    Verheggen, K. et al. Noncoding after all: biases in proteomics data do not explain observed absence of lncRNA translation products. J. Proteome Res. 16, 2508–2515 (2017).One of several studies that carefully examines proteomic evidence for productive translation of lncRNAs.

    PubMed  Article  CAS  Google Scholar 

  146. 146.

    Bruford, E. A., Lane, L. & Harrow, J. Devising a consensus framework for validation of novel human coding loci. J. Proteome Res. 14, 4945–4948 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  147. 147.

    Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013).

    Article  CAS  Google Scholar 

  148. 148.

    Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007). A pioneering bioinformatic tool for the discrimination of protein-coding and non-coding transcripts, in this case using an alignment-free sequence-feature and homology strategy.

    PubMed  PubMed Central  Article  Google Scholar 

  149. 149.

    Nelson, B. R. et al. A peptide encoded by a transcript annotated as long noncoding RNA enhances SERCA activity in muscle. Science 351, 271–275 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  150. 150.

    Ma, J. et al. Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J. Proteome Res. 13, 1757–1765 (2014).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  151. 151.

    Gibb, E. A. et al. Activation of an endogenous retrovirus-associated long non-coding RNA in human adenocarcinoma. Genome Med. 7, 22 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  152. 152.

    Gascoigne, D. K. et al. Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes. Bioinformatics 28, 3042–3050 (2012).

    PubMed  Article  CAS  Google Scholar 

  153. 153.

    Ezkurdia, I. et al. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev. Proteom. 12, 579–593 (2015).

    Article  CAS  Google Scholar 

  154. 154.

    Lopez, F., Granjeaud, S., Ara, T., Ghattas, B. & Gautheret, D. The disparate nature of “intergenic” polyadenylation sites. RNA 12, 1794–1801 (2006).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  155. 155.

    Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics https://doi.org/10.1002/0471250953.bi0403s18 (2007).

    PubMed  Article  Google Scholar 

Download references

Acknowledgements

R.J. acknowledges the support of the Swiss National Science Foundation through the National Centres for Competence in Research (NCCR) ‘RNA & Disease’ and the Medical Faculty of the University Hospital and University of Bern. The authors thank J. Carlevaro-Fita (University of Bern) for help with data analysis and J. Harrow (Illumina), J. Mudge (European Bioinformatics Institute), P. Flicek (European Bioinformatics Institute) and I. Jungreis (Massachusetts Institute of Technology) for fruitful discussions and feedback. A.F. is supported by the Wellcome Trust (WT098051 and WT108749/Z/15/Z), the National Human Genome Research Institute (NHGRI) (U41HG007234, 2U41HG007234) and the European Molecular Biology Laboratory. Work described in this publication was supported by the National Human Genome Research Institute of the US National Institutes of Health (grants U41HG007234, U41HG007000 and U54HG007004) and the Wellcome Trust (grant WT098051 to R.G.). Work in the laboratory of R.G. was supported by the National Human Genome Research Institute (awards U54HG007000, R01MH101814 and U41HG007234), the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’ and CERCA Programme/Generalitat de Catalunya. The authors thank the following individuals for administrative support: R. Garrido (Centre for Genomic Regulation) and S. Roesselet and D. Re (both at the University of Bern).

Reviewer information

Nature Reviews Genetics thanks M. Dinger, I. Ulitsky and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

B.U.-R. and R.J. researched data for the article. B.U.-R., A.F. and R.J. wrote the article. All authors provided substantial contributions to discussions of the content and reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Rory Johnson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RELATED LINKS

buildLoci: https://github.com/julienlag/buildLoci

GENCODE: www.gencodegenes.org

lncrna.annotator: https://github.com/gold-lab/shared_scripts/tree/master/lncRNA.annotator

UniProt: http://www.uniprot.org/

Pfam: http://pfam.xfam.org/

Glossary

Long non-coding RNAs

(lncRNAs). RNA transcripts ≥200 nucleotides long that do not encode any identifiable peptide product.

Annotation

Catalogue of gene loci comprising detailed and hierarchical information on their genomic coordinates and those of their constituent transcript isoforms and exons, all of which are assigned unique and stable identifiers.

Transcriptome assembly

The use of bioinformatic algorithms to reconstruct gene and transcript models based on short sequence reads.

Manual annotation

The creation of gene and transcript models by human annotators based on RNA and protein evidence and according to defined protocols.

Biotype

An annotation label referring to the genomic classification, processing or other characteristics of a locus or transcript intended to provide insights into biological function.

Expressed sequence tags

(ESTs). An early transcriptomic method in which short fragments of transcribed regions, often from 5′ or 3′ ends, are identified through sequencing of cDNA.

Cap analysis of gene expression

(CAGE). A cap-trapping and sequencing method that is considered a gold standard for mapping RNA 5′ ends.

Transcript models

Abstract descriptions of a transcription event, defining the genomic location of the start point, the end point and splice junctions.

Fragments per kilobase per million mapped

(FPKM). One of the principal units of RNA abundance in the context of RNA sequencing experiments, defined as the number of sequenced fragments per kilobase of annotation per million mapped fragments.

Oligonucleotide capture

A method for enriching cDNA libraries with sequences of interest using solution-phase hybridization to tiled, labelled oligonucleotide probes.
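
The FPKM definition in the glossary above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original article: the function name `fpkm`, its parameter names and the example numbers are all hypothetical and chosen only to show the arithmetic.

```python
def fpkm(fragment_count: int, feature_length_nt: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of annotated feature per million mapped fragments.

    All names here are illustrative assumptions, not taken from any published tool.
    """
    if feature_length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("feature length and library size must be positive")
    kilobases = feature_length_nt / 1_000                    # annotated length in kb
    millions_mapped = total_mapped_fragments / 1_000_000     # library size in millions
    return fragment_count / kilobases / millions_mapped


# Hypothetical example: 500 fragments assigned to a 2,000-nt transcript model
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
print(example_fpkm)  # 12.5
```

Because the value is normalized by annotated length, the same fragment count yields a different FPKM if the transcript model is truncated or extended, so abundance estimates depend on the quality of the underlying annotation.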

About this article

Cite this article

Uszczynska-Ratajczak, B., Lagarde, J., Frankish, A. et al. Towards a complete map of the human long non-coding RNA transcriptome. Nat Rev Genet 19, 535–548 (2018). https://doi.org/10.1038/s41576-018-0017-y
