The RNA Atlas expands the catalog of human non-coding RNAs

A Publisher Correction to this article was published on 28 June 2021

This article has been updated

Abstract

Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.

Fig. 1: RNA Atlas transcriptome generation and annotation.
Fig. 2: The RNA Atlas transcriptome catalogued many single-exon lncRNAs and revealed previously non-annotated PCGs.
Fig. 3: Analyses of RNA polyadenylation status.
Fig. 4: The association between sample ontology and expression distance.
Fig. 5: Total RNA transcriptomes facilitated the use of intron expression profiles to study regulatory modalities.
Fig. 6: Evidence for regulation by lncRNAs.
Fig. 7: Interpretation of lncRNA function.

Data availability

All types of RNA entities can be readily explored via the online R2: Genomics Analysis and Visualization Platform (http://r2.amc.nl) and via a dedicated accessible portal (http://r2platform.com/rna_atlas). This portal includes genome browser profiles for the total RNA as well as polyA tracks for all samples. All samples can also be used for correlations, differential signals and many more analyses. In addition, the LongHorn results, described in this manuscript, can be explored.

The raw data (FASTQ files) and processed expression measurement tables from all RNA biotypes across samples have been deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE138734.
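As a convenience for readers, the accession above can be turned into web addresses programmatically. The following is a minimal sketch, not part of the published workflow: the function names are hypothetical, and the URL patterns are assumptions based on the public GEO accession viewer and the usual layout of the GEO series file server, where the last three digits of the accession are replaced by "nnn". The code only builds strings and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: the public GEO accession viewer endpoint.
GEO_QUERY_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
# Assumption: root of the GEO series file tree.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def _check_series_accession(accession: str) -> None:
    """Raise ValueError unless the string looks like 'GSE' followed by digits."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")


def geo_series_url(accession: str) -> str:
    """Return the landing-page URL for a GEO series accession (hypothetical helper)."""
    _check_series_accession(accession)
    return f"{GEO_QUERY_ENDPOINT}?{urlencode({'acc': accession})}"


def geo_series_file_dir(accession: str) -> str:
    """Return the file-server directory for a GEO series (hypothetical helper).

    The parent directory name is the accession with its last three digits
    replaced by 'nnn', for example GSE138734 -> GSE138nnn.
    """
    _check_series_accession(accession)
    digits = accession[3:]
    stub = "GSE" + digits[:-3] + "nnn"
    return f"{GEO_SERIES_ROOT}/{stub}/{accession}/"


if __name__ == "__main__":
    print(geo_series_url("GSE138734"))
    print(geo_series_file_dir("GSE138734"))
```

If the file-server layout differs from this assumption, the landing page returned by `geo_series_url` remains the authoritative entry point for the deposited FASTQ files and expression tables.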

Code availability

Computer code used to generate the results presented in this manuscript is available at https://github.com/llorenzi90/RNA_Atlas.

Acknowledgements

F.A.C. is supported by a Special Research Fund (BOF) scholarship of Ghent University (BOF.DOC.2017.0026.01). R.C. is supported by the Fonds Wetenschappelijk Onderzoek (11Y6218N). T.-W.C. is supported by grants from the Ministry of Science and Technology, Taiwan (MOST-109-2311-B-009-002). A.U. is supported by research funding from the National Health and Medical Research Council (Australia) and the Leukemia & Lymphoma Society, the Leukemia Foundation and the Snowdome Foundation. G.A. is supported by a postgraduate scholarship from the Translational Cancer Research Network. M.R.W. and N.P.D. acknowledge support from the National Collaborative Research Infrastructure Strategy program, administered by Bioplatforms Australia. We thank N. Yigit, A. Barr, S. Pathak, L. Way and A. Mai for their contributions in library preparation and A. Yunghans, E. Jaeger and A. Moshrefi for their assistance in library organization and sequencing/tracking/data management. This project was funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 668858 and 826121 to P.M., P.S. and J. Koster and the Concerted Research Action of Ghent University (BOF/GOA 01G00819) to P.M. and K.B.

Author information

Contributions

P.M., J.V. and P.S. conceived the idea and designed and supervised the project. L.L. and H.-S.C. contributed to the implementation and design of most bioinformatic analyses. L.L. performed most of the raw sequencing data processing, transcriptome assembly and filtering, polyadenylation classification and most of the presented analyses for quality assessment and characterization of the generated transcriptome. H.-S.C., T.-W.C. and P.S. performed the analyses related to prediction and validation of regulatory interactions mediated by ncRNAs. F.A.C. and K.D.P. performed the analyses to select the RNA Atlas genes and contributed to quality validation of the transcriptome. S.G., S.K. and G.P.S. generated and sequenced the polyA and total RNA libraries. P.-J.V. performed the evaluation of coding potential, analyses of mass spectrometry data, alignment of candidate protein sequences to other animal proteins via BLASTp and analysis of conservation with chimpanzee. R.C. and Y.S. contributed to the analyses of RNA biotype expression and sample ontology associations. J.N. performed the polyA-minus sequencing and the qPCR experiments. K. Vanderheyden and J.N. generated and sequenced the small RNA libraries. J.A. implemented the identification of miRNAs and sequence motif analysis. S.L. designed the primers for the qPCR experiments and contributed to the graphic design of schematic figures. A.P.T. performed the analysis of overlap between ONT reads in public datasets and RNA Atlas-only single-exon genes. E.J.B., W.T. and F.G. performed the experiments of CRISPRi-mediated transcriptional silencing of lncRNA MALAT1. M.V. generated the integrated circRNA reference dataset used for comparisons with RNA Atlas circRNAs. T.G. and T.D.M. performed the imprinting analyses. T.B.H. and J. Kjems implemented the circRNA identification workflow. N.N. developed the polyA-minus sequencing protocol. T.T., K. Vermaelen and K.R.B. provided immune system-related cell lines and cell types. N.P.D., G.A., M.R.W. and A.U. performed analyses and annotation of circRNAs and contributed to the analysis of ONT reads in public datasets. J. Koster developed dedicated tools to analyze RNA Atlas data and results and implemented them in a dedicated RNA Atlas datascope in the online portal R2. P.M. led the writing of the manuscript in collaboration with L.L., H.-S.C. and P.S. L.L., H.-S.C., G.P.S., J.V., P.S. and P.M. contributed to the conceptualization, interpretation and discussion of results. All authors commented on the manuscript and contributed to the presentation of the data and results. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper.

Corresponding authors

Correspondence to Pavel Sumazin or Pieter Mestdagh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Steven Salzberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–28 and detailed description of Supplementary Tables 1–28.

Reporting Summary

Supplementary Tables 1–28.

Cite this article

Lorenzi, L., Chiu, HS., Avila Cobos, F. et al. The RNA Atlas expands the catalog of human non-coding RNAs. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-00936-1
