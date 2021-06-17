

The RNA Atlas expands the catalog of human non-coding RNAs

Nature Biotechnology (2021)

Abstract

Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.

Fig. 1: RNA Atlas transcriptome generation and annotation.
Fig. 2: The RNA Atlas transcriptome catalogued many single-exon lncRNAs and revealed previously non-annotated PCGs.
Fig. 3: Analyses of RNA polyadenylation status.
Fig. 4: The association between sample ontology and expression distance.
Fig. 5: Total RNA transcriptomes facilitated the use of intron expression profiles to study regulatory modalities.
Fig. 6: Evidence for regulation by lncRNAs.
Fig. 7: Interpretation of lncRNA function.

Data availability

All types of RNA entities can be readily explored via the online R2: Genomics Analysis and Visualization Platform (http://r2.amc.nl) and via a dedicated accessible portal (http://r2platform.com/rna_atlas). This portal includes genome browser profiles for the total RNA as well as polyA tracks for all samples. All samples can also be used for correlations, differential signals and many more analyses. In addition, the LongHorn results, described in this manuscript, can be explored.

The raw data (FASTQ files) and processed expression measurement tables from all RNA biotypes across samples have been deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE138734.

Code availability

Computer code used to generate the results presented in this manuscript is available at https://github.com/llorenzi90/RNA_Atlas.

Acknowledgements

F.A.C. is supported by a Special Research Fund (BOF) scholarship of Ghent University (BOF.DOC.2017.0026.01). R.C. is supported by the Fonds Wetenschappelijk Onderzoek (11Y6218N). T.-W.C. is supported by grants from the Ministry of Science and Technology, Taiwan (MOST-109-2311-B-009-002). A.U. is supported by research funding from the National Health and Medical Research Council (Australia) and the Leukemia & Lymphoma Society, the Leukemia Foundation and the Snowdome Foundation. G.A. is supported by a postgraduate scholarship from the Translational Cancer Research Network. M.R.W. and N.P.D. acknowledge support from the National Collaborative Research Infrastructure Strategy program, administered by Bioplatforms Australia. We thank N. Yigit, A. Barr, S. Pathak, L. Way and A. Mai for their contributions in library preparation and A. Yunghans, E. Jaeger and A. Moshrefi for their assistance in library organization and sequencing/tracking/data management. This project was funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreements 668858 and 826121 to P.M., P.S. and J. Koster and the Concerted Research Action of Ghent University (BOF/GOA 01G00819) to P.M. and K.B.

Author information

Author notes

  1. These authors contributed equally: Lucia Lorenzi, Hua-Sheng Chiu.

Affiliations

  1. Center for Medical Genetics, Ghent University, Ghent, Belgium

    Lucia Lorenzi, Francisco Avila Cobos, Pieter-Jan Volders, Robrecht Cannoodt, Justine Nuytens, Katrien Vanderheyden, Jasper Anckaert, Steve Lefever, Eric J. de Bony, Wim Trypsteen, Fien Gysens, Marieke Vromman, Katleen De Preter, Jo Vandesompele & Pieter Mestdagh

  2. Cancer Research Institute Ghent (CRIG), Ghent, Belgium

    Lucia Lorenzi, Francisco Avila Cobos, Pieter-Jan Volders, Robrecht Cannoodt, Justine Nuytens, Katrien Vanderheyden, Jasper Anckaert, Steve Lefever, Eric J. de Bony, Wim Trypsteen, Fien Gysens, Marieke Vromman, Tim De Meyer, Katleen De Preter, Jo Vandesompele, Pavel Sumazin & Pieter Mestdagh

  3. Texas Children’s Cancer Center, Baylor College of Medicine, Houston TX, USA

    Hua-Sheng Chiu

  4. Illumina, Inc., San Diego CA, USA

    Stephen Gross, Scott Kuersten & Gary P. Schroth

  5. VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium

    Pieter-Jan Volders & Yvan Saeys

  6. Data Mining and Modelling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium

    Robrecht Cannoodt & Yvan Saeys

  7. Department of Applied Mathematics, Computer Science, and Statistics, Ghent University, Ghent, Belgium

    Robrecht Cannoodt

  8. Data Intuitive, Lebbeke, Belgium

    Robrecht Cannoodt

  9. Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, New South Wales, Sydney NSW, Australia

    Aidan P. Tay

  10. Department of Biomedical Sciences, Macquarie University, New South Wales, Sydney NSW, Australia

    Aidan P. Tay

  11. Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium

    Tine Goovaerts & Tim De Meyer

  12. Interdisciplinary Nanoscience Centre (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark

    Thomas Birkballe Hansen & Jørgen Kjems

  13. Biogazelle, Zwijnaarde, Belgium

    Nele Nijs

  14. Department of Diagnostic Sciences, Ghent University, Ghent, Belgium

    Tom Taghon

  15. Department of Respiratory Medicine, Ghent University, Ghent, Belgium

    Karim Vermaelen & Ken R. Bracke

  16. Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney NSW, Australia

    Nandan P. Deshpande & Marc R. Wilkins

  17. Adult Cancer Program, Lowy Cancer Research Centre, UNSW Sydney, Sydney NSW, Australia

    Govardhan Anande & Ashwin Unnikrishnan

  18. Prince of Wales Clinical School, UNSW Sydney, Sydney NSW, Australia

    Govardhan Anande & Ashwin Unnikrishnan

  19. Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

    Ting-Wen Chen

  20. Department of Oncogenomics, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands

    Jan Koster

Contributions

P.M., J.V. and P.S. conceived the idea and designed and supervised the project. L.L. and H.-S.C. contributed to the implementation and design of most bioinformatic analyses. L.L. performed most of the raw sequencing data processing, transcriptome assembly and filtering, polyadenylation classification and most of the presented analyses for quality assessment and characterization of the generated transcriptome. H.-S.C., T.-W.C. and P.S. performed the analyses related to prediction and validation of regulatory interactions mediated by ncRNAs. F.A.C. and K.D.P. performed the analyses to select the RNA Atlas genes and contributed to quality validation of the transcriptome. S.G., S.K. and G.P.S. generated and sequenced the polyA and total RNA libraries. P.-J.V. performed the evaluation of coding potential, analyses of mass spectrometry data, alignment of candidate protein sequences to other animal proteins via BLASTp and analysis of conservation with chimpanzee. R.C. and Y.S. contributed to the analyses of RNA biotype expression and sample ontology associations. J.N. performed the polyA-minus sequencing and the qPCR experiments. K. Vanderheyden and J.N. generated and sequenced the small RNA libraries. J.A. implemented the identification of miRNAs and sequence motif analysis. S.L. designed the primers for the qPCR experiments and contributed to the graphic design of schematic figures. A.P.T. performed the analysis of overlap between ONT reads in public datasets and RNA Atlas-only single-exon genes. E.J.B., W.T. and F.G. performed the experiments of CRISPRi-mediated transcriptional silencing of lncRNA MALAT1. M.V. generated the integrated circRNA reference dataset used for comparisons with RNA Atlas circRNAs. T.G. and T.D.M. performed the imprinting analyses. T.B.H. and J. Kjems implemented the circRNA identification workflow. N.N. developed the polyA-minus sequencing protocol. T.T., K. Vermaelen and K.R.B. provided immune system-related cell lines and cell types. N.P.D., G.A., M.R.W. and A.U. performed analyses and annotation of circRNAs and contributed to the analysis of ONT reads in public datasets. J. Koster developed dedicated tools to analyze RNA Atlas data and results and implemented them in a dedicated RNA Atlas datascope in the online portal R2. P.M. led the writing of the manuscript in collaboration with L.L., H.-S.C. and P.S. L.L., H.-S.C., G.P.S., J.V., P.S. and P.M. contributed to the conceptualization, interpretation and discussion of results. All authors commented on the manuscript and contributed to the presentation of the data and results. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper.

Corresponding authors

Correspondence to Pavel Sumazin or Pieter Mestdagh.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Steven Salzberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


