An atlas of human long non-coding RNAs with accurate 5′ ends

Abstract

Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5′ ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.

Figure 1: Conservation of lncRNAs.
Figure 2: Cell-type-specific lncRNAs implicated in GWAS traits.
Figure 3: LncRNAs implicated in eQTL.
Figure 4: Functional evidence of human lncRNAs.

References

  1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)

  2. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012)

  3. Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nature Genet. 47, 199–208 (2015)

  4. Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011)

  5. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)

  6. Quek, X. C. et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 43, D168–D173 (2015)

  7. Schmidt, L. H. et al. The long noncoding MALAT-1 RNA indicates a poor prognosis in non-small cell lung cancer and induces migration and tumor growth. J. Thorac. Oncol. 6, 1984–1992 (2011)

  8. Andersson, R. et al. Nuclear stability and transcriptional directionality separate functionally distinct RNA species. Nature Commun. 5, 5336 (2014)

  9. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008)

  10. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014)

  11. Quinn, J. J. & Chang, H. Y. Unique features of long non-coding RNA biogenesis and function. Nature Rev. Genet. 17, 47–62 (2016)

  12. Palazzo, A. F. & Lee, E. S. Non-coding RNA: what is functional and what is junk? Front. Genet. 6, 2 (2015)

  13. Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016)

  14. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010)

  15. Li, M. J. et al. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 44 (D1), D869–D876 (2016)

  16. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015)

  17. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015)

  18. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nature Methods 10, 1177–1184 (2013)

  19. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)

  20. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc. Natl Acad. Sci. USA 100, 15776–15781 (2003)

  21. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014)

  22. Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010–1014 (2015)

  23. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)

  24. Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013)

  25. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013)

  26. Sigova, A. A. et al. Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc. Natl Acad. Sci. USA 110, 2876–2881 (2013)

  27. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)

  28. Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nature Genet. 46, 1311–1320 (2014)

  29. Xiang, J.-F. et al. Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus. Cell Res. 24, 513–531 (2014)

  30. Ulitsky, I. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nature Rev. Genet. 17, 601–614 (2016)

  31. Kapusta, A. et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 9, e1003470 (2013)

  32. Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015)

  33. Ng, S.-Y., Johnson, R. & Stanton, L. W. Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. EMBO J. 31, 522–533 (2012)

  34. Holm, H. et al. Several common variants modulate heart rate, PR interval and QRS duration. Nature Genet. 42, 117–122 (2010)

  35. Pfeufer, A. et al. Genome-wide association study of PR interval. Nature Genet. 42, 153–159 (2010)

  36. Smith, J. G. et al. Genome-wide association study of electrocardiographic conduction measures in an isolated founder population: Kosrae. Heart Rhythm 6, 634–641 (2009)

  37. Paralkar, V. R. et al. Unlinking an lncRNA from its associated cis element. Mol. Cell 62, 104–110 (2016)

  38. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)

  39. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)

  40. Lai, F. et al. Activating RNAs associate with Mediator to enhance chromatin architecture and transcription. Nature 494, 497–501 (2013)

  41. Clark, M. B. et al. The reality of pervasive transcription. PLoS Biol. 9, e1000625 (2011)

  42. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007)

  43. Severin, J. et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nature Biotechnol. 32, 217–219 (2014)

  44. Hasegawa, A., Daub, C., Carninci, P., Hayashizaki, Y. & Lassmann, T. MOIRAI: a compact workflow system for CAGE analysis. BMC Bioinformatics 15, 144 (2014)

  45. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnol. 28, 511–515 (2010)

  46. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnol. 29, 644–652 (2011)

  47. Kent, W. J. BLAT--the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)

  48. Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnol. 32, 462–464 (2014)

  49. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012)

  50. Sloan, C. A. et al. ENCODE data at the ENCODE portal. Nucleic Acids Res. 44 (D1), D726–D732 (2016)

  51. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000)

  52. Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275–i282 (2011)

  53. Washietl, S. et al. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA 17, 578–594 (2011)

  54. Olexiouk, V. et al. sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res. 44 (D1), D324–D329 (2016)

  55. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002)

  56. Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29, 2487–2489 (2013)

  57. Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2013)

  58. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)

  59. Chao, A. & Shen, T.-J. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environ. Ecol. Stat. 10, 429–443 (2003)

  60. Meehan, T. F. et al. Logical development of the cell ontology. BMC Bioinformatics 12, 6 (2011)

  61. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E. & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 13, R5 (2012)

  62. Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008)

  63. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)

  64. Sakharkar, M. K., Chow, V. T. K. & Kangueane, P. Distributions of exons and introns in the human genome. In Silico Biol. 4, 387–393 (2004)

  65. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)

  66. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010)

  67. Bostock, M., Ogievetsky, V. & Heer, J. D3: data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011)

  68. Abugessaisa, I. et al. FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki. Database 2016, baw105 (2016)

Acknowledgements

FANTOM5 was made possible by research grants for the RIKEN Omics Science Center and the Innovative Cell Biology by Innovative Technology (Cell Innovation Program) from the MEXT to Y.H. It was also supported by research grants for the RIKEN Preventive Medicine and Diagnosis Innovation Program (RIKEN PMI) to Y.H. and the RIKEN Centre for Life Science Technologies, Division of Genomic Technologies (RIKEN CLST (DGT)) from the MEXT, Japan. A.R.R.F. is supported by a Senior Cancer Research Fellowship from the Cancer Research Trust, the MACA Ride to Conquer Cancer and the Australian Research Council’s Discovery Projects funding scheme (DP160101960). S.D. is supported by award number U54HG007004 from the National Human Genome Research Institute of the National Institutes of Health, funding from the Ministry of Economy and Competitiveness (MINECO) under grant number BIO2011-26205, and SEV-2012-0208 from the Spanish Ministry of Economy and Competitiveness. Y.A.M. is supported by the Russian Science Foundation, grant 15-14-30002. We thank RIKEN GeNAS for generation of the CAGE and RNA-seq libraries, the Netherlands Brain Bank for brain materials, the RIKEN BioResource Centre for providing cell lines and all members of the FANTOM5 consortium for discussions, in particular H. Ashoor, M. Frith, R. Guigo, A. Tanzer, E. Wood, H. Jia, K. Bailie, J. Harrow, E. Valen, R. Andersson, K. Vitting-Seerup, A. Sandelin, M. Taylor, J. Shin, R. Mori, C. Mungall and T. Meehan.

Author information

Contributions

The manuscript was written by A.R.R.F., C.C.H., J.A.R. and N.B. with help from P.C., E.A. and M.L. C.C.H., J.A.R., J.H., N.B., O.J.L.R., Y.H., P.C. and A.R.R.F. are core authors for the lncRNA work. P.H., M.B., C.A.W., S.K. and Y.N. provided samples. C.C.H. performed most of the analyses with help from others as listed below. C.C.H., N.B., J.A.R., O.R., J.G., A.M.B., S.D., A.H. and T.L.: RNA-seq assembly. C.C.H., J.A.R., N.B., A.T.C. and M.J. L.d.H.: coding potential assessment. C.C.H. devised and implemented the TIEScore, transcript model integration and CAT. S.S., C.C.H. and E.D. performed the GWAS and eQTL analyses. C.C.H., T.A. and Y.A.M. analysed TIRs. C.C.H. and T.M.P.: expression specificity analysis. L.L.: discussions in planning. J.H. implemented the web tool. M.I. and P.C. generated CAGE data. S.N. generated the RNA-seq. H.K. and T.L. clustered the CAGE data. C.C.H., N.B. and J.S. made ZENBU configurations. M.L., H.K., T.K. and I.A.: data handling. C.W.Y. curated cell-type and trait associations. M.M. helped with cell-type enrichment analysis. D.T. helped with repeats analysis. FANTOM5 headquarters: Y.H., A.R.R.F., P.C., M.I., C.O.D., H.S., T.L. and E.A. P.C., Y.H. and A.R.R.F. conceived the project and managed FANTOM5. The scientific coordinator was A.R.R.F. and the general organizer was Y.H.

Corresponding authors

Correspondence to Piero Carninci or Alistair R. R. Forrest.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information: Nature thanks M. Gerstein, J. Rinn and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Building a 5′ complete lncRNA catalogue.

a, Integration of CAGE and transcript models. CAGE clusters were used to integrate transcript models from various sources and their 5′ completeness was assessed on the basis of TIEScore. b, Identification of lncRNAs. TIEScore identified 59,110 genes and coding potential assessment further identified 27,919 lncRNAs in FANTOM CAT at the robust TIEScore cutoff. c, Categorization of lncRNAs. LncRNAs were annotated according to their gene orientation (that is, genomic context) and DHS type (ref. 23; that is, epigenomic context) and then categorized into divergent p-lncRNAs (purple), intergenic p-lncRNAs (blue), e-lncRNAs (green) and other lncRNAs (grey). d, Overlaps between FANTOM CAT and other lncRNA catalogues. e, LncRNA gene models outside FANTOM CAT are 5′ incomplete. LncRNAs found commonly in both catalogues (grey), or only in FANTOM CAT (red), show stronger evidence of transcription initiation (DHS, H3K4me1, H3K4me3 and PolII ChIP-seq; ref. 23) and conservation (phastCons; ref. 38) than those found only in other lncRNA catalogues (blue, green or yellow).

Extended Data Figure 2 FANTOM CAT is more 5′ complete than other lncRNA catalogues.

a, FANTOM CAT lncRNA TSS are well-supported. The 5′ ends of FANTOM CAT lncRNAs (first column) have stronger transcriptomic, epigenomic and genomic evidence of transcription initiation than the 5′ ends of lncRNA models in the Human BodyMap 2.0 (ref. 4), miTranscriptome (ref. 3) and GENCODE release 25 (ref. 19) (second column). In b and c, the box plots show the median, quartiles and Tukey whiskers of the estimates of FDR of complete 5′ ends (b) and number of 5′ complete lncRNA genes (c) on the basis of ten sets of gold standard TSS and non-TSS regions (Methods). b, FDR of complete 5′ ends. c, Estimated number of 5′ complete lncRNA genes (total number of genes × [1 − FDR]). d, Validation rate of gene models using RAMPAGE. RAMPAGE data sets (refs 25, 50; n = 207, Methods) were used to validate the lncRNA transcripts in FANTOM CAT and other catalogues (left). Transcripts containing full consensus CDS (CCDS transcripts) were used as controls (right). The exon of a transcript is detected by RAMPAGE (ref. 31) if it overlaps ≥3 RAMPAGE 3′ ends. Transcript detection rates of all catalogues were plotted (upper). About 95% of lncRNA transcripts in the robust FANTOM CAT can be detected, which is slightly higher than the rate for GENCODE release 25 (~92%). The TSS of a detected transcript is validated by RAMPAGE if it is located within the proximity of a RAMPAGE 5′ end (for example, from 0 to 500 bp, x axis, lower). At 100 bp, ~95% of lncRNA transcripts in the robust FANTOM CAT can be validated, versus ~85% for GENCODE release 25. We note that the percentages of CCDS transcripts in FANTOM CAT and GENCODE release 25 detected or validated by RAMPAGE are similar, with the robust and stringent FANTOM CAT catalogues performing slightly better.
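
The rules stated in panels c and d can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the legend: positions are plain integers on a single chromosome and strand, intervals are closed, and all function and parameter names are hypothetical rather than taken from the FANTOM CAT pipeline.

```python
from bisect import bisect_left


def estimated_complete_genes(total_genes, fdr):
    """Panel c: total number of genes x (1 - FDR)."""
    return total_genes * (1.0 - fdr)


def exon_detected(exon_start, exon_end, three_prime_ends, min_ends=3):
    """Panel d: an exon is detected if it overlaps >= min_ends RAMPAGE 3' ends.

    Assumption: closed interval, one chromosome, strand ignored.
    """
    hits = sum(1 for pos in three_prime_ends if exon_start <= pos <= exon_end)
    return hits >= min_ends


def tss_validated(tss, five_prime_ends_sorted, max_dist=100):
    """Panel d: a TSS is validated if a RAMPAGE 5' end lies within max_dist bp.

    five_prime_ends_sorted must be sorted in ascending order.
    """
    i = bisect_left(five_prime_ends_sorted, tss)
    # Only the nearest position on each side of the TSS can be the closest one.
    neighbours = five_prime_ends_sorted[max(i - 1, 0): i + 1]
    return any(abs(tss - pos) <= max_dist for pos in neighbours)
```

Sweeping `max_dist` from 0 to 500 and recording the fraction of validated TSS would give a curve of the kind described for the lower part of panel d.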

Extended Data Figure 3 Revision of lncRNA models in GENCODE.

a, An example of improved TSS annotation of a GENCODE release 25 lncRNA gene. The 5′ ends of GENCODE release 25 annotated lncRNA transcripts of TUG1 (ENSG00000253352) are distant from the region of strong CAGE signal, whereas FANTOM CAT added extra transcripts that accurately start from the proximal CAGE signal summit. b, An example of bridged gene models of GENCODE release 25 lncRNA genes. In GENCODE release 25, the locus was annotated with three short lncRNA genes; FANTOM CAT bridged these short lncRNA transcript models into a long transcript model (RP11-973H7.4, ENSG00000267654) starting from the proximal CAGE signal summit.

Extended Data Figure 4 Heterogeneity among lncRNA gene categories.

a, Epigenomic features surrounding TSS. The y axis refers to the fraction of TIR overlaps with peaks of the corresponding epigenomic signal from the Roadmap Epigenomics Consortium (ref. 23). b, Genomic features surrounding TSS. Sequence features conducive to generating longer transcripts are enrichment of 5′ splice site (5′ SS) and depletion of polyadenylation sites (PAS). Sequence features associated with transcription initiation include CpG islands, INR (initiator) motif and TATA box motif. c, Core promoter motifs. Grey dashed lines indicate whole-genome background.

Extended Data Figure 5 Transposons at TIRs.

a, Percentages of genes with conserved and unconserved TIR (as defined in Fig. 1c) and their overlap with various classes of transposons. b, Enrichment of retrotransposons at unconserved TIR. The Venn diagrams show the overlap between unconserved TIR, DNA transposons and retrotransposons. Retrotransposons are significantly enriched in unconserved TIR of all gene classes (one-tailed Fisher’s exact test, P < 0.05).
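
As a minimal sketch of the one-tailed Fisher's exact test named in panel b, the enrichment P value can be computed directly from the hypergeometric distribution. The layout of the 2×2 table (unconserved versus conserved TIR, overlapping versus not overlapping a retrotransposon) is an assumption made for illustration; the legend does not state which background set was used, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed (enrichment) Fisher's exact test on the table [[a, b], [c, d]].

    Assumed layout (not stated in the legend):
      a = unconserved TIR overlapping a retrotransposon
      b = unconserved TIR without overlap
      c = conserved TIR overlapping a retrotransposon
      d = conserved TIR without overlap
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # size of the unconserved set
    col1 = a + c          # total number of overlapping TIR
    n = a + b + c + d     # all TIR
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)
```

A gene class would then be called enriched when the returned value falls below the 0.05 threshold given in the legend.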

Extended Data Figure 6 Expression landscape of lncRNAs in primary cells.

a, Expression level and specificity. The abbreviation cpm denotes relative log expression (RLE)-normalized counts per million. The maximum expression level (log2 cpm) and expression specificity (Chao–Shen’s corrected Shannon entropy; ref. 59) of genes among 69 primary cell facets (ref. 10) were plotted. Box plots show the median (dashed lines), quartiles and Tukey whiskers. b, Percentage of genes within categories expressed within primary cell facets. The circles represent the mean among samples within a facet and the error bars represent 99.99% confidence intervals. Dashed lines represent the means among all samples. c, Number of lncRNA genes expressed within primary cell facets. Dashed line represents the mean among all samples. The x axis is sorted on the basis of number of lncRNA genes expressed. A gene is considered as ‘expressed’ when cpm ≥ 0.01.
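
To make the two quantities in this legend concrete, the sketch below implements the published Chao–Shen coverage-adjusted entropy estimator (ref. 59) for integer counts, together with the cpm ≥ 0.01 expression threshold. How the authors mapped expression values across the 69 facets onto the estimator's input is not described here, so the use of raw integer counts per facet is an assumption, as are the function names.

```python
from math import log


def chao_shen_entropy(counts):
    """Chao-Shen coverage-adjusted Shannon entropy (in nats).

    counts: non-negative integer counts, one per category (here assumed to be
    one per cell facet). Lower entropy means more facet-specific expression.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    singletons = sum(1 for c in counts if c == 1)
    if singletons == n:            # avoid zero coverage when every count is 1
        singletons = n - 1
    coverage = 1.0 - singletons / n
    entropy = 0.0
    for c in counts:
        p = coverage * c / n                  # coverage-adjusted frequency
        inclusion = 1.0 - (1.0 - p) ** n      # probability the category is seen
        entropy -= p * log(p) / inclusion
    return entropy


def is_expressed(cpm, threshold=0.01):
    """A gene is considered expressed when cpm >= 0.01 (threshold from the legend)."""
    return cpm >= threshold
```

With abundant counts and no singletons the estimator reduces to the ordinary Shannon entropy, so a gene spread evenly over two facets scores close to ln 2 and a gene confined to one facet scores 0.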

Extended Data Figure 7 Association of cell-type-enriched genes with trait-associated genes of different biological themes.

A detailed view of blocks from Fig. 2a. The dendrograms were coloured as in Fig. 2a. a, ‘Immune system’ cell types and ‘infection and immunity’ traits. b, ‘Hepato-intestinal system’ cell types and ‘hepatic function’ traits. c, ‘Pigmented cells’ cell types and ‘pigmentation’ traits. d, ‘Non-immune blood cells’ cell types and ‘blood homeostasis’ traits. e, ‘Cardiovascular system’ cell types and ‘cardiovascular function’ traits.

Extended Data Figure 8 LncRNA AP001057.1 is associated with classical monocytes and implicated in immune diseases.

a, Genomic view of AP001057.1 (ENSG00000232124) in the ZENBU genome browser (ref. 43). The strongest TSS of AP001057.1 overlaps with an enhancer DHS. The locus overlaps with fine-mapped SNPs associated with Crohn’s disease and GWAS SNPs associated with coeliac disease and inflammatory bowel disease. b, AP001057.1 is associated with classical monocytes (CL:0000860). c, AP001057.1 is significantly upregulated in monocytes upon stimulation with various immunogenic agents (FDR < 0.05 in edgeR (ref. 58), highlighted in red and indicated with asterisks). Note: we performed differential expression analysis to identify lncRNAs that are dynamically regulated upon stimulation, infection or differentiation on the basis of 25 manually curated series of FANTOM5 samples (Supplementary Table 18 and Methods), and the results are available in Supplementary Table 19. Figures were captured (with slight modifications) from the online resource at http://fantom.gsc.riken.jp/cat/v1/#/genes/ENSG00000232124.1.

Extended Data Figure 9 Selective constraints and enrichment of GWAS trait and eQTL-associated SNPs at lncRNA loci.

a, Selective constraints between species (phastCons; ref. 38) and within human population (derived allele frequency; ref. 39). b, Enrichment of GWAS SNPs. Only lead GWAS SNPs (ref. 15) were used (Methods). c, Enrichment of PICS (ref. 17) fine-mapped SNPs in global (all versus all) or focused (immune versus immune) analysis (Methods). d, Enrichment of GTEx eQTL SNPs (ref. 16) associated with expression of mRNAs. Circles represent means and the error bars represent their 99.99% confidence intervals.

Extended Data Figure 10 Co-expression of various gene pairs linked by eQTL SNPs.

We searched for gene loci that overlap eQTL SNPs associated with expression variation of mRNAs (as identified by GTEx; ref. 16). Gene loci overlapping these SNPs were then paired with the corresponding mRNA and their expression correlation across the FANTOM5 expression atlas was investigated. Rows compare the gene types overlapping the SNPs. a, mRNAs; b, all lncRNAs; c, divergent p-lncRNAs; d, intergenic p-lncRNAs; e, e-lncRNAs. Columns compare the relative orientation of the gene pairs and the position of the SNPs. The term ‘all’ refers to all orientations of the gene pairs and positions of the SNPs pooled. Gene pairs were binned on the basis of the number of SNPs linking the pair (bin = 5 SNPs). The data points represent the mean of absolute Spearman’s rho and the error bars represent its 99.99% confidence intervals. At each bin, the number of pairs plotted is the same for the three pair types as indicated.

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-6, Supplementary Figures 1-14, descriptions for Supplementary Tables 1-19, online resources and Supplementary references. (PDF 9152 kb)

Supplementary Data

This zipped file contains Supplementary Tables 1-19 – see Supplementary Information document for descriptions. (ZIP 74370 kb)

Supplementary Data

This zipped file contains source data for Supplementary Figures 1-6. (ZIP 1386 kb)

Cite this article

Hon, CC., Ramilowski, J., Harshbarger, J. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017). https://doi.org/10.1038/nature21374

  • DOI: https://doi.org/10.1038/nature21374
