Abstract

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

Acknowledgements

FANTOM5 was made possible by research grants for the RIKEN Omics Science Center and the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT to Y.H. It was also supported by research grants for the RIKEN Preventive Medicine and Diagnosis Innovation Program (RIKEN PMI) to Y.H. and the RIKEN Centre for Life Science Technologies, Division of Genomic Technologies (RIKEN CLST (DGT)) from the MEXT, Japan. A.R.R.F. is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust, the MACA Ride to Conquer Cancer and the Australian Research Council’s Discovery Projects funding scheme (DP160101960). S.D. is supported by award number U54HG007004 from the National Human Genome Research Institute of the National Institutes of Health, funding from the Ministry of Economy and Competitiveness (MINECO) under grant number BIO2011-26205, and SEV-2012-0208 from the Spanish Ministry of Economy and Competitiveness. Y.A.M. is supported by the Russian Science Foundation, grant 15-14-30002. We thank RIKEN GeNAS for generation of the CAGE and RNA-seq libraries, the Netherlands Brain Bank for brain materials, the RIKEN BioResource Centre for providing cell lines and all members of the FANTOM5 consortium for discussions, in particular H. Ashoor, M. Frith, R. Guigo, A. Tanzer, E. Wood, H. Jia, K. Bailie, J. Harrow, E. Valen, R. Andersson, K. Vitting-Seerup, A. Sandelin, M. Taylor, J. Shin, R. Mori, C. Mungall and T. Meehan.

Author information

Author notes

    • Nicolas Bertin
    • Sarah Djebali
    • Mickaël Mendez

    Present addresses: Human Longevity Singapore Pte. Ltd., Singapore (N.B.); GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet Tolosan, France (S.D.); Department of Computer Science, University of Toronto, Ontario, Canada (M.M.).

Affiliations

  1. RIKEN Center for Life Science Technologies (Division of Genomic Technologies), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan

    • Chung-Chau Hon
    • Jordan A. Ramilowski
    • Jayson Harshbarger
    • Jessica Severin
    • Marina Lizio
    • Hideya Kawaji
    • Takeya Kasukawa
    • Masayoshi Itoh
    • A. Maxwell Burroughs
    • Shohei Noma
    • Chi-Wai Yip
    • Imad Abugessaisa
    • Mickaël Mendez
    • Akira Hasegawa
    • Dave Tang
    • Timo Lassmann
    • Peter Heutink
    • Harukazu Suzuki
    • Carsten O. Daub
    • Michiel J. L. de Hoon
    • Erik Arner
    • Piero Carninci
    • Alistair R. R. Forrest
  2. RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan

    • Jordan A. Ramilowski
    • Jayson Harshbarger
    • Nicolas Bertin
    • Jessica Severin
    • Marina Lizio
    • Hideya Kawaji
    • Masayoshi Itoh
    • A. Maxwell Burroughs
    • Shohei Noma
    • Mickaël Mendez
    • Akira Hasegawa
    • Dave Tang
    • Timo Lassmann
    • Harukazu Suzuki
    • Carsten O. Daub
    • Michiel J. L. de Hoon
    • Erik Arner
    • Yoshihide Hayashizaki
    • Piero Carninci
    • Alistair R. R. Forrest
  3. Cancer Science Institute of Singapore, National University of Singapore, Centre for Translational Medicine, 14 Medical Drive, #12-01, Singapore 117599, Singapore

    • Nicolas Bertin
  4. University of Bristol, Department of Computer Science, Life Sciences building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK

    • Owen J. L. Rackham
    • Julian Gough
  5. Program in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, 8 College Road, 169857 Singapore

    • Owen J. L. Rackham
  6. Institute of Natural and Mathematical Sciences, Massey University Auckland, Albany 0632, New Zealand

    • Elena Denisenko
    • Sebastian Schmeier
  7. Biotechnology Research Institute for Drug Discovery (BRD), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki, 305-8568, Japan

    • Thomas M. Poulsen
  8. RIKEN Preventive Medicine and Diagnosis Innovation Program, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan

    • Hideya Kawaji
    • Masayoshi Itoh
    • Yoshihide Hayashizaki
  9. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA

    • A. Maxwell Burroughs
  10. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain

    • Sarah Djebali
  11. Universitat Pompeu Fabra (UPF), Barcelona Biomedical Research Park (PRBB), Dr Aiguader 88, Barcelona 08003, Spain

    • Sarah Djebali
  12. Computational Bioscience Research Center; Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

    • Tanvir Alam
  13. Institute of Bioengineering, Research Center of Biotechnology RAS, Moscow 119071, Russia

    • Yulia A. Medvedeva
  14. Vavilov Institute of General Genetics, RAS, Moscow 119991, Russia

    • Yulia A. Medvedeva
  15. Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, the University of Western Australia, Nedlands 6009, Western Australia, Australia

    • Alison C. Testa
    • Alistair R. R. Forrest
  16. Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan 48201, USA

    • Leonard Lipovich
  17. Department of Neurology, School of Medicine, Wayne State University, Detroit, Michigan 48201, USA

    • Leonard Lipovich
  18. Telethon Kids Institute, The University of Western Australia, 100 Roberts Road, Subiaco 6008, Western Australia, Australia

    • Dave Tang
    • Timo Lassmann
  19. German Center for Neurodegenerative Diseases (DZNE), D-72076 Tübingen, Germany

    • Peter Heutink
  20. Department of Dermatology and Allergy, Charité Universitätsmedizin Berlin, 10117 Berlin, Germany

    • Magda Babina
  21. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane 4072, Australia

    • Christine A. Wells
  22. Faculty of Medicine, Department of Anatomy and Neuroscience, The University of Melbourne, 3010, Australia

    • Christine A. Wells
  23. RIKEN CLST (Division of Bio-Function Dynamics Imaging), Wako, Saitama 351-0198, Japan

    • Soichi Kojima
  24. Cell Engineering Division, RIKEN BioResource Center, Tsukuba, Ibaraki 305-0074, Japan

    • Yukio Nakamura
  25. Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan

    • Yukio Nakamura
  26. Department of Biosciences and Nutrition, Karolinska Institutet, 141 83 Huddinge, Sweden

    • Carsten O. Daub

Authors

  1. Chung-Chau Hon
  2. Jordan A. Ramilowski
  3. Jayson Harshbarger
  4. Nicolas Bertin
  5. Owen J. L. Rackham
  6. Julian Gough
  7. Elena Denisenko
  8. Sebastian Schmeier
  9. Thomas M. Poulsen
  10. Jessica Severin
  11. Marina Lizio
  12. Hideya Kawaji
  13. Takeya Kasukawa
  14. Masayoshi Itoh
  15. A. Maxwell Burroughs
  16. Shohei Noma
  17. Sarah Djebali
  18. Tanvir Alam
  19. Yulia A. Medvedeva
  20. Alison C. Testa
  21. Leonard Lipovich
  22. Chi-Wai Yip
  23. Imad Abugessaisa
  24. Mickaël Mendez
  25. Akira Hasegawa
  26. Dave Tang
  27. Timo Lassmann
  28. Peter Heutink
  29. Magda Babina
  30. Christine A. Wells
  31. Soichi Kojima
  32. Yukio Nakamura
  33. Harukazu Suzuki
  34. Carsten O. Daub
  35. Michiel J. L. de Hoon
  36. Erik Arner
  37. Yoshihide Hayashizaki
  38. Piero Carninci
  39. Alistair R. R. Forrest

Contributions

The manuscript was written by A.R.R.F., C.C.H., J.A.R. and N.B. with help from P.C., E.A. and M.L. C.C.H., J.A.R., J.H., N.B., O.J.L.R., Y.H., P.C. and A.R.R.F. are core authors for the lncRNA work. P.H., M.B., C.A.W., S.K. and Y.N. provided samples. C.C.H. performed most of the analyses with help from others as listed below. C.C.H., N.B., J.A.R., O.J.L.R., J.G., A.M.B., S.D., A.H. and T.L.: RNA-seq assembly. C.C.H., J.A.R., N.B., A.T.C. and M.J.L.d.H.: coding potential assessment. C.C.H. devised and implemented the TIEScore, transcript model integration and CAT. S.S., C.C.H. and E.D. performed the GWAS and eQTL analyses. C.C.H., T.A. and Y.A.M. analysed TIRs. C.C.H. and T.M.P.: expression specificity analysis. L.L.: discussions in planning. J.H. implemented the web tool. M.I. and P.C. generated CAGE data. S.N. generated the RNA-seq data. H.K. and T.L. clustered the CAGE data. C.C.H., N.B. and J.S. made ZENBU configurations. M.L., H.K., T.K. and I.A.: data handling. C.W.Y. curated cell-type and trait associations. M.M. helped with cell-type enrichment analysis. D.T. helped with repeats analysis. FANTOM5 headquarters: Y.H., A.R.R.F., P.C., M.I., C.O.D., H.S., T.L. and E.A. P.C., Y.H. and A.R.R.F. conceived the project and managed FANTOM5. The scientific coordinator was A.R.R.F. and the general organizer was Y.H.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Piero Carninci or Alistair R. R. Forrest.

Reviewer Information

Nature thanks M. Gerstein, J. Rinn and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Notes 1-6, Supplementary Figures 1-14, descriptions for Supplementary Tables 1-19, online resources and Supplementary references.

Zip files

  1. Supplementary Data

    This zipped file contains Supplementary Tables 1-19 (see the Supplementary Information document for descriptions).

  2. Supplementary Data

    This zipped file contains source data for Supplementary Figures 1-6.

About this article

DOI

https://doi.org/10.1038/nature21374
