Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.
All types of RNA entities can be readily explored via the online R2: Genomics Analysis and Visualization Platform (http://r2.amc.nl) and via a dedicated accessible portal (http://r2platform.com/rna_atlas). This portal includes genome browser profiles for the total RNA as well as polyA tracks for all samples. All samples can also be used for correlations, differential signals and many more analyses. In addition, the LongHorn results, described in this manuscript, can be explored.
The raw data (FASTQ files) and processed expression measurement tables from all RNA biotypes across samples have been deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE138734.
Computer code used to generate the results presented in this manuscript is available at https://github.com/llorenzi90/RNA_Atlas.
F.A.C. is supported by a Special Research Fund (BOF) scholarship of Ghent University (BOF.DOC.2017.0026.01). R.C. is supported by the Fonds Wetenschappelijk Onderzoek (11Y6218N). T.-W.C. is supported by grants from the Ministry of Science and Technology, Taiwan (MOST-109-2311-B-009-002). A.U. is supported by research funding from the National Health and Medical Research Council (Australia) and the Leukemia & Lymphoma Society, the Leukemia Foundation and the Snowdome Foundation. G.A. is supported by a postgraduate scholarship from the Translational Cancer Research Network. M.R.W. and N.P.D. acknowledge support from the National Collaborative Research Infrastructure Strategy program, administered by Bioplatforms Australia. We thank N. Yigit, A. Barr, S. Pathak, L. Way and A. Mai for their contributions in library preparation and A. Yunghans, E. Jaeger and A. Moshrefi for their assistance in library organization and sequencing/tracking/data management. This project was funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 668858 and 826121 to P.M., P.S. and J. Koster and the Concerted Research Action of Ghent University (BOF/GOA 01G00819) to P.M. and K.B.
The authors declare no competing interests.
Peer review information Nature Biotechnology thanks Steven Salzberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Lorenzi, L., Chiu, HS., Avila Cobos, F. et al. The RNA Atlas expands the catalog of human non-coding RNAs. Nat Biotechnol 39, 1453–1465 (2021). https://doi.org/10.1038/s41587-021-00936-1