An atlas of human long non-coding RNAs with accurate 5′ ends

Hon, Chung-Chau; Ramilowski, Jordan A.; Harshbarger, Jayson; Bertin, Nicolas; Rackham, Owen J. L.; Gough, Julian; Denisenko, Elena; Schmeier, Sebastian; Poulsen, Thomas M.; Severin, Jessica; Lizio, Marina; Kawaji, Hideya; Kasukawa, Takeya; Itoh, Masayoshi; Burroughs, A. Maxwell; Noma, Shohei; Djebali, Sarah; Alam, Tanvir; Medvedeva, Yulia A.; Testa, Alison C.; Lipovich, Leonard; Yip, Chi-Wai; Abugessaisa, Imad; Mendez, Mickaël; Hasegawa, Akira; Tang, Dave; Lassmann, Timo; Heutink, Peter; Babina, Magda; Wells, Christine A.; Kojima, Soichi; Nakamura, Yukio; Suzuki, Harukazu; Daub, Carsten O.; de Hoon, Michiel J. L.; Arner, Erik; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R. R.

doi:10.1038/nature21374

Article
Published: 01 March 2017

Chung-Chau Hon¹,
Jordan A. Ramilowski^1,2,
Jayson Harshbarger^1,2,
Nicolas Bertin^2,3,†,
Owen J. L. Rackham^4,5,
Julian Gough⁴,
Elena Denisenko⁶,
Sebastian Schmeier⁶,
Thomas M. Poulsen⁷,
Jessica Severin^1,2,
Marina Lizio^1,2,
Hideya Kawaji^1,2,8,
Takeya Kasukawa¹,
Masayoshi Itoh^1,2,8,
A. Maxwell Burroughs^1,2,9,
Shohei Noma^1,2,
Sarah Djebali^10,11,†,
Tanvir Alam^12,13,
Yulia A. Medvedeva^14,15,
Alison C. Testa¹⁶,
Leonard Lipovich^17,18,
Chi-Wai Yip¹,
Imad Abugessaisa¹,
Mickaël Mendez^1,2,†,
Akira Hasegawa^1,2,
Dave Tang^1,2,19,
Timo Lassmann^1,2,19,
Peter Heutink^1,20,
Magda Babina²¹,
Christine A. Wells^22,23,
Soichi Kojima²⁴,
Yukio Nakamura^25,26,
Harukazu Suzuki^1,2,
Carsten O. Daub^1,2,27,
Michiel J. L. de Hoon^1,2,
Erik Arner^1,2,
Yoshihide Hayashizaki^2,8,
Piero Carninci^1,2 &
Alistair R. R. Forrest^1,2,16

Nature volume 543, pages 199–204 (2017)

Abstract

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

**Figure 2: Cell-type-specific lncRNAs implicated in GWAS traits.**

**Figure 3: LncRNAs implicated in eQTL.**

**Figure 4: Functional evidence of human lncRNAs.**

References

1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)
2. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012)
3. Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nature Genet. 47, 199–208 (2015)
4. Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011)
5. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
6. Quek, X. C. et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 43, D168–D173 (2015)
7. Schmidt, L. H. et al. The long noncoding MALAT-1 RNA indicates a poor prognosis in non-small cell lung cancer and induces migration and tumor growth. J. Thorac. Oncol. 6, 1984–1992 (2011)
8. Andersson, R. et al. Nuclear stability and transcriptional directionality separate functionally distinct RNA species. Nature Commun. 5, 5336 (2014)
9. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008)
10. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014)
11. Quinn, J. J. & Chang, H. Y. Unique features of long non-coding RNA biogenesis and function. Nature Rev. Genet. 17, 47–62 (2016)
12. Palazzo, A. F. & Lee, E. S. Non-coding RNA: what is functional and what is junk? Front. Genet. 6, 2 (2015)
13. Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016)
14. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010)
15. Li, M. J. et al. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 44 (D1), D869–D876 (2016)
16. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015)
17. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015)
18. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nature Methods 10, 1177–1184 (2013)
19. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)
20. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl Acad. Sci. USA 100, 15776–15781 (2003)
21. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014)
22. Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010–1014 (2015)
23. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
24. Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013)
25. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013)
26. Sigova, A. A. et al. Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc. Natl Acad. Sci. USA 110, 2876–2881 (2013)
27. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)
28. Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nature Genet. 46, 1311–1320 (2014)
29. Xiang, J.-F. et al. Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus. Cell Res. 24, 513–531 (2014)
30. Ulitsky, I. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nature Rev. Genet. 17, 601–614 (2016)
31. Kapusta, A. et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 9, e1003470 (2013)
32. Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015)
33. Ng, S.-Y., Johnson, R. & Stanton, L. W. Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J. 31, 522–533 (2012)
34. Holm, H. et al. Several common variants modulate heart rate, PR interval and QRS duration. Nature Genet. 42, 117–122 (2010)
35. Pfeufer, A. et al. Genome-wide association study of PR interval. Nature Genet. 42, 153–159 (2010)
36. Smith, J. G. et al. Genome-wide association study of electrocardiographic conduction measures in an isolated founder population: Kosrae. Heart Rhythm 6, 634–641 (2009)
37. Paralkar, V. R. et al. Unlinking an lncRNA from its associated cis element. Mol. Cell 62, 104–110 (2016)
38. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)
39. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
40. Lai, F. et al. Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature 494, 497–501 (2013)
41. Clark, M. B. et al. The reality of pervasive transcription. PLoS Biol. 9, e1000625 (2011)
42. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007)
43. Severin, J. et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nature Biotechnol. 32, 217–219 (2014)
44. Hasegawa, A., Daub, C., Carninci, P., Hayashizaki, Y. & Lassmann, T. MOIRAI: a compact workflow system for CAGE analysis. BMC Bioinformatics 15, 144 (2014)
45. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnol. 28, 511–515 (2010)
46. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011)
47. Kent, W. J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)
48. Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnol. 32, 462–464 (2014)
49. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012)
50. Sloan, C. A. et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 44 (D1), D726–D732 (2016)
51. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)
52. Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275–i282 (2011)
53. Washietl, S. et al. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA 17, 578–594 (2011)
54. Olexiouk, V. et al. sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 44 (D1), D324–D329 (2016)
55. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002)
56. Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013)
57. Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013)
58. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)
59. Chao, A. & Shen, T.-J. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environ. Ecol. Stat. 10, 429–443 (2003)
60. Meehan, T. F. et al. Logical development of the cell ontology. BMC Bioinformatics 12, 6 (2011)
61. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E. & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 13, R5 (2012)
62. Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008)
63. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)
64. Sakharkar, M. K., Chow, V. T. K. & Kangueane, P. Distributions of exons and introns in the human genome. In Silico Biol. 4, 387–393 (2004)
65. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
66. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010)
67. Bostock, M., Ogievetsky, V. & Heer, J. D³: data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011)
68. Abugessaisa, I. et al. FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki. Database 2016, baw105 (2016)

Acknowledgements

FANTOM5 was made possible by research grants for the RIKEN Omics Science Center and the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT to Y.H. It was also supported by research grants for the RIKEN Preventive Medicine and Diagnosis Innovation Program (RIKEN PMI) to Y.H. and the RIKEN Centre for Life Science Technologies, Division of Genomic Technologies (RIKEN CLST (DGT)) from the MEXT, Japan. A.R.R.F. is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust, the MACA Ride to Conquer Cancer and the Australian Research Council’s Discovery Projects funding scheme (DP160101960). S.D. is supported by award number U54HG007004 from the National Human Genome Research Institute of the National Institutes of Health, funding from the Ministry of Economy and Competitiveness (MINECO) under grant number BIO2011-26205, and SEV-2012-0208 from the Spanish Ministry of Economy and Competitiveness. Y.A.M. is supported by the Russian Science Foundation, grant 15-14-30002. We thank RIKEN GeNAS for generation of the CAGE and RNA-seq libraries, the Netherlands Brain Bank for brain materials, the RIKEN BioResource Centre for providing cell lines and all members of the FANTOM5 consortium for discussions, in particular H. Ashoor, M. Frith, R. Guigo, A. Tanzer, E. Wood, H. Jia, K. Bailie, J. Harrow, E. Valen, R. Andersson, K. Vitting-Seerup, A. Sandelin, M. Taylor, J. Shin, R. Mori, C. Mungall and T. Meehan.

Author information

Nicolas Bertin, Sarah Djebali & Mickaël Mendez
† Present addresses: Human Longevity Singapore Pte. Ltd., Singapore (N.B.); GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet Tolosan, France (S.D.); Department of Computer Science, University of Toronto, Ontario, Canada (M.M.).
alistair.forrest@gmail.com

Authors and Affiliations

RIKEN Center for Life Science Technologies (Division of Genomic Technologies), 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045, Yokohama, Japan
Chung-Chau Hon, Jordan A. Ramilowski, Jayson Harshbarger, Jessica Severin, Marina Lizio, Hideya Kawaji, Takeya Kasukawa, Masayoshi Itoh, A. Maxwell Burroughs, Shohei Noma, Chi-Wai Yip, Imad Abugessaisa, Mickaël Mendez, Akira Hasegawa, Dave Tang, Timo Lassmann, Peter Heutink, Harukazu Suzuki, Carsten O. Daub, Michiel J. L. de Hoon, Erik Arner, Piero Carninci & Alistair R. R. Forrest
RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, 230-0045, Yokohama, Japan
Jordan A. Ramilowski, Jayson Harshbarger, Nicolas Bertin, Jessica Severin, Marina Lizio, Hideya Kawaji, Masayoshi Itoh, A. Maxwell Burroughs, Shohei Noma, Mickaël Mendez, Akira Hasegawa, Dave Tang, Timo Lassmann, Harukazu Suzuki, Carsten O. Daub, Michiel J. L. de Hoon, Erik Arner, Yoshihide Hayashizaki, Piero Carninci & Alistair R. R. Forrest
Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore, 117599, Singapore
Nicolas Bertin
Department of Computer Science, University of Bristol, Life Sciences building, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK
Owen J. L. Rackham & Julian Gough
Program in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, 8 College Road, 169857, Singapore
Owen J. L. Rackham
Institute of Natural and Mathematical Sciences, Massey University Auckland, Albany, 0632, New Zealand
Elena Denisenko & Sebastian Schmeier
Biotechnology Research Institute for Drug Discovery (BRD), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, 305-8568, Ibaraki, Japan
Thomas M. Poulsen
RIKEN Preventive Medicine and Diagnosis Innovation Program, 2-1 Hirosawa, Wako, 351-0198, Saitama, Japan
Hideya Kawaji, Masayoshi Itoh & Yoshihide Hayashizaki
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, 20894, Maryland, USA
A. Maxwell Burroughs
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
Sarah Djebali
Universitat Pompeu Fabra (UPF), Barcelona Biomedical Research Park (PRBB), Dr Aiguader 88, Barcelona, 08003, Spain
Sarah Djebali
Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center; Computer, King Abdullah University of Science and Technology (KAUST),
Tanvir Alam
Electrical and Mathematical Sciences and Engineering Division, Computer, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
Tanvir Alam
Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow, 119071, Russia
Yulia A. Medvedeva
Vavilov Institute of General Genetics, RAS, Moscow, 119991, Russia
Yulia A. Medvedeva
Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands, 6009, Western Australia, Australia
Alison C. Testa & Alistair R. R. Forrest
Center for Molecular Medicine and Genetics, Wayne State University, Detroit, 48201, Michigan, USA
Leonard Lipovich
Department of Neurology, School of Medicine, Wayne State University, Detroit, 48201, Michigan, USA
Leonard Lipovich
Telethon Kids Institute, The University of Western Australia, 100 Roberts Road, Subiaco, 6008, Western Australia, Australia
Dave Tang & Timo Lassmann
German Center for Neurodegenerative Diseases (DZNE), Tübingen, D-72076, Germany
Peter Heutink
Department of Dermatology and Allergy, Charité Universitätsmedizin Berlin, Berlin, 10117, Germany
Magda Babina
Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, 4072, Australia
Christine A. Wells
Department of Anatomy and Neuroscience, Faculty of Medicine, The University of Melbourne, 3010, Australia
Christine A. Wells
RIKEN CLST (Division of Bio-Function Dynamics Imaging), Wako, 351-0198, Saitama, Japan
Soichi Kojima
Cell Engineering Division, RIKEN BioResource Center, Tsukuba, 305-0074, Ibaraki, Japan
Yukio Nakamura
Faculty of Medicine, University of Tsukuba, Tsukuba, 305-8577, Ibaraki, Japan
Yukio Nakamura
Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, 141 83, Sweden
Carsten O. Daub

Contributions

The manuscript was written by A.R.R.F., C.C.H., J.A.R. and N.B. with help from P.C., E.A. and M.L. C.C.H., J.A.R., J.H., N.B., O.J.L.R., Y.H., P.C. and A.R.R.F. are core authors for the lncRNA work. P.H., M.B., C.A.W., S.K. and Y.N. provided samples. C.C.H. performed most of the analyses with help from others as listed below. C.C.H., N.B., J.A.R., O.J.L.R., J.G., A.M.B., S.D., A.H. and T.L.: RNA-seq assembly. C.C.H., J.A.R., N.B., A.C.T. and M.J.L.d.H.: coding potential assessment. C.C.H. devised and implemented the TIEScore, transcript model integration and CAT. S.S., C.C.H. and E.D. performed the GWAS and eQTL analyses. C.C.H., T.A. and Y.A.M. analysed TIRs. C.C.H. and T.M.P.: expression specificity analysis. L.L.: discussions in planning. J.H. implemented the web tool. M.I. and P.C. generated CAGE data. S.N. generated the RNA-seq. H.K. and T.L. clustered the CAGE data. C.C.H., N.B. and J.S. made ZENBU configurations. M.L., H.K., T.K. and I.A.: data handling. C.W.Y. curated cell-type and trait associations. M.M. helped with cell-type enrichment analysis. D.T. helped with repeats analysis. FANTOM5 headquarters: Y.H., A.R.R.F., P.C., M.I., C.O.D., H.S., T.L. and E.A. P.C., Y.H. and A.R.R.F. conceived the project and managed FANTOM5. The scientific coordinator was A.R.R.F. and the general organizer was Y.H.

Corresponding authors

Correspondence to Piero Carninci or Alistair R. R. Forrest.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks M. Gerstein, J. Rinn and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Building a 5′ complete lncRNA catalogue.

a, Integration of CAGE and transcript models. CAGE clusters were used to integrate transcript models from various sources and their 5′ completeness was assessed on the basis of TIEScore. b, Identification of lncRNAs. TIEScore identified 59,110 genes and coding potential assessment further identified 27,919 lncRNAs in FANTOM CAT at the robust TIEScore cutoff. c, Categorization of lncRNAs. LncRNAs were annotated according to their gene orientation (that is, genomic context) and DHS type²³ (that is, epigenomic context) and then categorized into divergent p-lncRNAs (purple), intergenic p-lncRNAs (blue), e-lncRNAs (green) and other lncRNAs (grey). d, Overlaps between FANTOM CAT and other lncRNA catalogues. e, LncRNA gene models outside FANTOM CAT are 5′ incomplete. LncRNAs found commonly in both catalogues (grey), or only in FANTOM CAT (red), show stronger evidence of transcription initiation (DHS, H3K4me1, H3K4me3 and PolII ChIP-seq²³) and conservation (phastCons³⁸) than those found only in other lncRNA catalogues (blue, green or yellow).

Extended Data Figure 2 FANTOM CAT is more 5′ complete than other lncRNA catalogues.

a, FANTOM CAT lncRNA TSS are well-supported. The 5′ ends of FANTOM CAT lncRNAs (first column) have stronger transcriptomic, epigenomic and genomic evidence of transcription initiation than the 5′ ends of lncRNA models in the Human BodyMap 2.0 (ref. 4), miTranscriptome³ and GENCODE release 25 (ref. 19) (second column). In b and c, the box plots show the median, quartiles and Tukey whiskers of the estimates of FDR of complete 5′ ends (b) and number of 5′ complete lncRNA genes (c) on the basis of ten sets of gold standard TSS and non-TSS regions (Methods). b, FDR of complete 5′ ends. c, Estimated number of 5′ complete lncRNA genes (total number of genes × [1 − FDR]). d, Validation rate of gene models using RAMPAGE. RAMPAGE data sets^25,50 (n = 207, Methods) were used to validate the lncRNA transcripts in FANTOM CAT and other catalogues (left). Transcripts containing full consensus CDS (CCDS transcripts) were used for control (right). The exon of a transcript is detected by RAMPAGE³¹ if it overlaps ≥3 RAMPAGE 3′ ends. Transcript detection rates of all catalogues were plotted (upper). About 95% of lncRNA transcripts in the robust FANTOM CAT can be detected, which is slightly higher than that of GENCODE release 25 (~92%). The TSS of a detected transcript is validated by RAMPAGE if it is located within the proximity of a RAMPAGE 5′ end (for example, from 0 to 500 bp, x axis, lower). At 100 bp, ~95% of lncRNA transcripts in the robust FANTOM CAT can be validated, versus ~85% for that of GENCODE release 25. We note the percentages of CCDS transcripts in FANTOM CAT and GENCODE release 25 detected or validated by RAMPAGE are similar, with the robust and stringent FANTOM CAT catalogues performing slightly better.
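
The arithmetic behind panels c and d can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy coordinates and the simplification to a single chromosome and strand are assumptions made purely for illustration. The first function applies the caption's formula (total number of genes × [1 − FDR]); the second counts the fraction of TSS that lie within a chosen distance of the nearest RAMPAGE 5′ end.

```python
from bisect import bisect_left


def estimated_complete_genes(total_genes, fdr):
    """Panel c formula: total number of genes x (1 - FDR)."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must lie in [0, 1]")
    return total_genes * (1.0 - fdr)


def validation_rate(tss_positions, rampage_5p_ends, max_distance):
    """Fraction of TSS within max_distance bp of any RAMPAGE 5' end.

    Assumption: all coordinates are on one chromosome and one strand.
    """
    if not tss_positions:
        return 0.0
    ends = sorted(rampage_5p_ends)
    validated = 0
    for tss in tss_positions:
        i = bisect_left(ends, tss)
        # Only the closest end on each side of the TSS can be the nearest one.
        neighbours = ends[max(i - 1, 0): i + 1]
        if any(abs(tss - end) <= max_distance for end in neighbours):
            validated += 1
    return validated / len(tss_positions)


# Hypothetical toy data, not values from the paper.
toy_tss = [100, 1000, 5000]
toy_ends = [90, 1200, 4999]
rate_at_100bp = validation_rate(toy_tss, toy_ends, 100)  # 2 of 3 TSS validated
rate_at_500bp = validation_rate(toy_tss, toy_ends, 500)  # all 3 TSS validated
```

Sweeping `max_distance` from 0 to 500 bp would trace a curve of the kind plotted on the x axis of the lower part of panel d.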

Extended Data Figure 3 Revision of lncRNA models in GENCODE.

a, An example of improved TSS annotation of a GENCODE release 25 lncRNA gene. The 5′ ends of the GENCODE release 25 annotated lncRNA transcripts of TUG1 (ENSG00000253352) are distant from the region of strong CAGE signal, whereas FANTOM CAT adds extra transcripts that start accurately from the proximal CAGE signal summit. b, An example of bridged gene models of GENCODE release 25 lncRNA genes. In GENCODE release 25, the locus was annotated with three short lncRNA genes; FANTOM CAT bridged these short lncRNA transcript models into a long transcript model (RP11-973H7.4, ENSG00000267654) starting from the proximal CAGE signal summit.

Extended Data Figure 4 Heterogeneity among lncRNA gene categories.

a, Epigenomic features surrounding TSS. The y axis refers to the fraction of TIRs that overlap peaks of the corresponding epigenomic signal from the Roadmap Epigenomics Consortium²³. b, Genomic features surrounding TSS. Sequence features conducive to generating longer transcripts are enrichment of 5′ splice sites (5′ SS) and depletion of polyadenylation sites (PAS). Sequence features associated with transcription initiation include CpG islands, the INR (initiator) motif and the TATA box motif. c, Core promoter motifs. Grey dashed lines indicate whole-genome background.

Extended Data Figure 5 Transposons at TIRs.

a, Percentages of genes with conserved and unconserved TIR (as defined in Fig. 1c) and their overlap with various classes of transposons. b, Enrichment of retrotransposons at unconserved TIR. The Venn diagrams show the overlap between unconserved TIR, DNA transposons and retrotransposons. Retrotransposons are significantly enriched in unconserved TIR of all gene classes (one-tailed Fisher’s exact test, P < 0.05).
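
For readers unfamiliar with the enrichment test named in panel b, the following is a minimal sketch of a one-tailed Fisher's exact test written with the Python standard library only. The function name and the layout of the 2 × 2 table are illustrative assumptions, and any counts passed to it here are hypothetical rather than taken from the paper.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed (enrichment) P value for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
      a = unconserved TIR overlapping a retrotransposon
      b = unconserved TIR without such an overlap
      c = other TIR overlapping a retrotransposon
      d = other TIR without such an overlap
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


# Hypothetical counts; enrichment would be called at P < 0.05 as in the caption.
p_value = fisher_exact_greater(30, 10, 15, 45)
is_enriched = p_value < 0.05
```

The tail is summed over all tables at least as extreme as the observed one in the direction of enrichment, which is what distinguishes the one-tailed test from its two-sided form.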

Extended Data Figure 6 Expression landscape of lncRNAs in primary cells.

a, Expression level and specificity. The abbreviation cpm denotes counts per million normalized by relative log expression (rle). The maximum expression level (log₂ cpm) and expression specificity (Chao–Shen’s corrected Shannon entropy⁵⁹) of genes among 69 primary cell facets¹⁰ were plotted. Box plots show the median (dashed lines), quartiles and Tukey whiskers. b, Percentage of genes within categories expressed within primary cell facets. The circles represent the mean among samples within a facet and the error bars represent 99.99% confidence intervals. Dashed lines represent the means among all samples. c, Number of lncRNA genes expressed within primary cell facets. Dashed line represents the mean among all samples. The x axis is sorted on the basis of the number of lncRNA genes expressed. A gene is considered ‘expressed’ when cpm ≥ 0.01.
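The two quantities in this legend, the expression threshold and an entropy-based specificity score, can be sketched as follows. This is a simplified illustration: it uses the plain Shannon entropy and does not apply the Chao–Shen correction used in the paper, and the function names are hypothetical.

```python
from math import log2

EXPRESSED_CPM = 0.01  # threshold stated in the legend (cpm >= 0.01)


def count_expressed(cpm_values, threshold=EXPRESSED_CPM):
    """Number of genes in one sample or facet with cpm at or above the threshold."""
    return sum(1 for cpm in cpm_values if cpm >= threshold)


def shannon_entropy(expression):
    """Plain (uncorrected) Shannon entropy, in bits, of one gene's expression
    across facets. Low entropy means expression is concentrated in few facets,
    high entropy means broad expression. No Chao-Shen correction is applied.
    """
    total = sum(expression)
    if total <= 0:
        return 0.0
    probs = [v / total for v in expression if v > 0]
    return -sum(p * log2(p) for p in probs)
```

A gene expressed equally in four facets has an entropy of 2 bits under this definition, whereas a gene expressed in a single facet has an entropy of 0.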

Source data

Extended Data Figure 7 Association of cell-type-enriched genes with trait-associated genes of different biological themes.

A detailed view of blocks from Fig. 2a. The dendrograms were coloured as in Fig. 2a. a, ‘Immune system’ cell types and ‘infection and immunity’ traits. b, ‘Hepato-intestinal system’ cell types and ‘hepatic function’ traits. c, ‘Pigmented cells’ cell types and ‘pigmentation’ traits. d, ‘Non-immune blood cells’ cell types and ‘blood homeostasis’ traits. e, ‘Cardiovascular system’ cell types and ‘cardiovascular function’ traits.

Extended Data Figure 8 LncRNA AP001057.1 is associated with classical monocytes and implicated in immune diseases.

a, Genomic view of AP001057.1 (ENSG00000232124) in the ZENBU genome browser⁴³. The strongest TSS of AP001057.1 overlaps with an enhancer DHS. The locus overlaps with fine-mapped SNPs associated with Crohn’s disease and GWAS SNPs associated with coeliac disease and inflammatory bowel disease. b, AP001057.1 is associated with classical monocytes (CL:0000860). c, AP001057.1 is significantly upregulated in monocytes upon stimulation with various immunogenic agents (FDR < 0.05 in edgeR⁵⁸, highlighted in red and indicated with asterisks). Note: we performed differential expression analysis to identify lncRNAs that are dynamically regulated upon stimulation, infection or differentiation on the basis of 25 manually curated series of FANTOM5 samples (Supplementary Table 18 and Methods), and the results are available in Supplementary Table 19. Figures were captured (with slight modifications) from the online resource at http://fantom.gsc.riken.jp/cat/v1/#/genes/ENSG00000232124.1.

Extended Data Figure 9 Selective constraints and enrichment of GWAS trait and eQTL-associated SNPs at lncRNA loci.

a, Selective constraints between species (phastCons³⁸) and within human population (derived allele frequency³⁹). b, Enrichment of GWAS SNPs. Only lead GWAS SNPs¹⁵ were used (Methods). c, Enrichment of PICS¹⁷ fine-mapped SNPs in global (all versus all) or focused (immune versus immune) analysis (Methods). d, Enrichment of GTEx eQTL SNPs¹⁶ associated with expression of mRNAs. Circles represent means and the error bars represent their 99.99% confidence intervals.

Source data

Extended Data Figure 10 Co-expression of various gene pairs linked by eQTL SNPs.

We searched for gene loci that overlap eQTL SNPs associated with expression variation of mRNAs (as identified by GTEx¹⁶). Gene loci overlapping these SNPs were then paired with the corresponding mRNA and their expression correlation across the FANTOM5 expression atlas was investigated. Rows compare the gene types overlapping the SNPs. a, mRNAs; b, all lncRNAs; c, divergent p-lncRNAs; d, intergenic p-lncRNAs; e, e-lncRNAs. Columns compare the relative orientation of the gene pairs and the position of the SNPs. The term ‘all’ refers to all orientations of the gene pairs and positions of the SNPs pooled. Gene pairs were binned on the basis of the number of SNPs linking the pair (bin = 5 SNPs). The data points represent the mean of absolute Spearman’s rho and the error bars represent its 99.99% confidence intervals. At each bin, the number of pairs plotted is the same for the three pair types as indicated.
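The summary statistic plotted here, the mean of absolute Spearman's rho per SNP-count bin, can be sketched in plain Python. The rank-based correlation is standard; the binning rule (1 to 5 SNPs in the first bin, 6 to 10 in the second, and so on) is an assumed reading of "bin = 5 SNPs", and all names are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_abs_rho_by_bin(pairs, bin_width=5):
    """pairs: iterable of (n_linking_snps, rho) for each gene pair.
    Returns {bin_index: mean |rho|}; the binning rule is an assumption.
    """
    bins = defaultdict(list)
    for n_snps, rho in pairs:
        bins[(n_snps - 1) // bin_width].append(abs(rho))
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}
```

In use, `spearman_rho` would be applied to the expression vectors of each gene pair across samples, and the resulting values passed to `mean_abs_rho_by_bin` together with the number of SNPs linking each pair.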

Source data

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-6, Supplementary Figures 1-14, descriptions for Supplementary Tables 1-19, online resources and Supplementary references. (PDF 9152 kb)

Supplementary Data

This zipped file contains Supplementary Tables 1-19 – see Supplementary Information document for descriptions. (ZIP 74370 kb)

Supplementary Data

This zipped file contains source data for Supplementary Figures 1-6. (ZIP 1386 kb)

Source data

Source data to Fig. 1

Source data to Fig. 2

Source data to Fig. 3

Source data to Fig. 4

Source data to Extended Data Fig. 5

Source data to Extended Data Fig. 6

Source data to Extended Data Fig. 7

Source data to Extended Data Fig. 8

Source data to Extended Data Fig. 9

Source data to Extended Data Fig. 10

Source data to Extended Data Fig. 11

About this article

Cite this article

Hon, CC., Ramilowski, J., Harshbarger, J. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017). https://doi.org/10.1038/nature21374

Received: 14 June 2016
Accepted: 08 January 2017
Published: 01 March 2017
Issue Date: 09 March 2017
DOI: https://doi.org/10.1038/nature21374
