Abstract

There is growing recognition that mammalian cells produce many thousands of large intergenic transcripts [1–4]. However, the functional significance of these transcripts has been particularly controversial. Although there are some well-characterized examples, most (>95%) show little evidence of evolutionary conservation and have been suggested to represent transcriptional noise [5,6]. Here we report a new approach to identifying large non-coding RNAs using chromatin-state maps to discover discrete transcriptional units intervening known protein-coding loci. Our approach identified 1,600 large multi-exonic RNAs across four mouse cell types. In sharp contrast to previous collections, these large intervening non-coding RNAs (lincRNAs) show strong purifying selection in their genomic loci, exonic sequences and promoter regions, with greater than 95% showing clear evolutionary conservation. We also developed a functional genomics approach that assigns putative functions to each lincRNA, demonstrating a diverse range of roles for lincRNAs in processes from embryonic stem cell pluripotency to cell proliferation. We obtained independent functional validation for the predictions for over 100 lincRNAs, using cell-based assays. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFκB, Sox2, Oct4 (also known as Pou5f1) and Nanog. Together, these results define a unique collection of functional lincRNAs that are highly conserved and implicated in diverse biological processes.
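The idea of looking for transcriptional units that lie between known protein-coding loci can be illustrated with a small interval filter. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the `Interval` type, the function names and all coordinates are hypothetical, and the real analysis works on chromatin-state domains with additional filtering that this page does not describe.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share any base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intergenic_domains(domains, genes):
    """Keep only the domains that overlap no known protein-coding gene."""
    genes_by_chrom = defaultdict(list)
    for gene in genes:
        genes_by_chrom[gene.chrom].append(gene)
    return [
        d for d in domains
        if not any(overlaps(d, g) for g in genes_by_chrom[d.chrom])
    ]


# Toy, made-up coordinates (assumption: not taken from the paper).
genes = [Interval("chr1", 1000, 5000), Interval("chr1", 20000, 30000)]
domains = [
    Interval("chr1", 4000, 6000),   # overlaps the first gene, so it is dropped
    Interval("chr1", 8000, 12000),  # lies between the two genes, so it is kept
    Interval("chr2", 100, 900),     # no annotated gene on chr2, so it is kept
]
candidates = intergenic_domains(domains, genes)
```

The linear scan per chromosome is chosen for readability; an interval tree or sorted sweep would be the usual choice at genome scale.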

Accessions

Primary accessions

Gene Expression Omnibus

Data deposits

Microarray data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE13765.

References

  1. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004)

  2. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005)

  3. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002)

  4. et al. The transcriptional activity of human chromosome 22. Genes Dev. 17, 529–540 (2003)

  5. , & Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556–565 (2007)

  6. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007)

  7. , , & The product of the H19 gene may function as an RNA. Mol. Cell. Biol. 10, 28–36 (1990)

  8. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991)

  9. , & Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genet. 21, 400–404 (1999)

  10. et al. Unregulated expression of the imprinted genes H19 and Igf2r in mouse uniparental fetuses. J. Biol. Chem. 277, 12474–12478 (2002)

  11. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007)

  12. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005)

  13. et al. Mouse transcriptome: neutral evolution of ‘non-coding’ complementary DNAs. Nature 431, 1–2 (2004); doi:10.1038/nature03016

  14. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)

  15. , , , & miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144 (2006)

  16. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008)

  17. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539–543 (2008)

  18. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl Acad. Sci. USA 104, 19428–19433 (2007)

  19. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007)

  20. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)

  21. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006)

  22. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl Acad. Sci. USA 99, 4465–4470 (2002)

  23. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)

  24. , & Discovering statistically significant biclusters in gene expression data. Bioinformatics 18 (Suppl. 1), S136–S144 (2002)

  25. et al. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc. Natl Acad. Sci. USA 102, 3738–3743 (2005)

  26. , , & Homeobox D10 induces phenotypic reversion of breast tumor cells in a three-dimensional culture model. Cancer Res. 65, 7177–7185 (2005)

  27. et al. Cre-lox-regulated conditional RNA interference from transgenes. Proc. Natl Acad. Sci. USA 101, 10380–10385 (2004)

  28. et al. The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells. Nature Genet. 38, 431–440 (2006)

  29. et al. Dissecting self-renewal in stem cells with RNA interference. Nature 442, 533–538 (2006)

  30. , , , & Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008)

Acknowledgements

We thank our colleagues at the Broad Institute, especially J. P. Mesirov for discussions and statistical insights, X. Xie for statistical help with conservation analyses, J. Robinson for visualization help, M. Ku, E. Mendenhall and X. Zhang for help generating ChIP samples, and N. Novershtern and A. Levy for providing transcription factor lists. M. Guttman is a Vertex scholar; I.A. acknowledges the support of the Human Frontier Science Program Organization. This work was funded by Beth Israel Deaconess Medical Center, the National Human Genome Research Institute, and the Broad Institute of MIT and Harvard.

Author Contributions J.L.R., E.S.L., A.R. and M. Guttman conceived and designed experiments. The manuscript was written by M. Guttman, A.R., J.L.R. and E.S.L. J.L.R., I.A., C.F., D.F., M.H., B.W.C., J.P.C. and M. Guttman performed molecular biology experiments. All data analyses were performed by M. Guttman in conjunction with M. Garber (conservation analyses), M.F.L. (codon substitution frequency), T.S.M. (ChIP-seq data), O.Z. (motif analysis) and M.N.C. (lincRNA genomic location analysis). Reagents were provided by M. Garber (pre-published conservation analysis tools); T.J. and D.F. (p53 wild-type and knockout MEFs); N.H., A.R. and I.A. (dendritic cell stimulated time course); B.E.B. (ChIP data); R.J., B.W.C. and J.P.C. (luciferase assays); and M.K. and M.F.L. (codon substitution frequency code).

Author information

Author notes

    • John L. Rinn & Eric S. Lander

    These authors contributed equally to this work.

Affiliations

  1. Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA

    Mitchell Guttman, Ido Amit, Manuel Garber, Courtney French, Michael F. Lin, Maite Huarte, Or Zuk, Tarjei S. Mikkelsen, Nir Hacohen, Bradley E. Bernstein, Manolis Kellis, Aviv Regev, John L. Rinn & Eric S. Lander

  2. Department of Biology,

    Mitchell Guttman, Bryce W. Carey, John P. Cassady, Rudolf Jaenisch, Tyler Jacks, Aviv Regev & Eric S. Lander

  3. The Koch Institute for Integrative Cancer Research,

    David Feldser & Tyler Jacks

  4. Division of Health Sciences and Technology, and,

    Tarjei S. Mikkelsen

  5. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    Manolis Kellis

  6. Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02215, USA

    Maite Huarte & John L. Rinn

  7. Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02114, USA

    Moran N. Cabili & Eric S. Lander

  8. Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA

    Bryce W. Carey, John P. Cassady, Rudolf Jaenisch & Eric S. Lander

  9. Center for Immunology and Inflammatory Diseases,

    Nir Hacohen

  10. Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts 02129, USA

    Bradley E. Bernstein

  11. Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, USA

    Bradley E. Bernstein & John L. Rinn

Authors

  1. Mitchell Guttman

  2. Ido Amit

  3. Manuel Garber

  4. Courtney French

  5. Michael F. Lin

  6. David Feldser

  7. Maite Huarte

  8. Or Zuk

  9. Bryce W. Carey

  10. John P. Cassady

  11. Moran N. Cabili

  12. Rudolf Jaenisch

  13. Tarjei S. Mikkelsen

  14. Tyler Jacks

  15. Nir Hacohen

  16. Bradley E. Bernstein

  17. Manolis Kellis

  18. Aviv Regev

  19. John L. Rinn

  20. Eric S. Lander

Corresponding author

Correspondence to John L. Rinn.

Supplementary information

PDF files

  1. Supplementary Figures

    This file contains Supplementary Figures 1–11 with legends.

  2. Supplementary Information

    This file contains Supplementary Methods and Supplementary References.

Excel files

  1. Supplementary Table 1

    In Supplementary Table 1 the K4-K36 domain coordinates are shown and the K4-K36 enriched domains in the 4 mouse cell types are listed. Coordinates are indicated in mouse genome build MM8.

  2. Supplementary Table 2

    In Supplementary Table 2 the lincRNA exon coordinates and Pi LOD enrichment scores are shown. lincRNA exons defined by NimbleGen tiling microarrays are listed in mouse genome build MM9. Each exon has an associated Pi LOD enrichment score (Methods) reported.

  3. Supplementary Table 4

    In Supplementary Table 4 the PCR validation primer sequences are shown. Primer sequences used for validation of lincRNA expression by PCR and qPCR are reported.

  4. Supplementary Table 5

    In Supplementary Table 5 the Northern blot analysis probe sequences and primers are shown. Primers and amplicons for Northern blot analyses are provided. The correct file for Supplementary Table 5 was uploaded on 4th March, 2009.

  5. Supplementary Table 6

    In Supplementary Table 6 the Codon Substitution Frequency (CSF) scores are shown. The CSF score for each K4-K36 domain is provided. Coordinates are reported in mouse genome build MM9. An updated version of Supplementary Table 6 was uploaded on 4th March, 2009.

  6. Supplementary Table 7

    In Supplementary Table 7 the exon conservation for lincRNAs and other annotations is shown. Pi LOD enrichment scores are provided for lincRNA exons and other annotations compared in the text. The coordinates are provided in mouse genome build MM9, and the max 12-mer LOD score and the randomized average max 12-mer LOD score are indicated along with the enrichment score.

  7. Supplementary Table 8

    In Supplementary Table 8 the lincRNA promoter conservation is shown. Pi LOD enrichment scores are provided for each lincRNA promoter region, protein-coding gene promoters, and random intergenic regions. Coordinates are provided in mouse genome build MM9.

  8. Supplementary Table 9

    In Supplementary Table 9 the human and mouse orthologous lincRNAs are shown. lincRNAs defined in human lung fibroblasts were lifted into the mouse genome (MM8) and enrichment statistics were computed for mouse lung fibroblasts (Methods). The enrichment p-values and fold are indicated.

  9. Supplementary Table 10

    In Supplementary Table 10 the lincRNA expression across the mouse tissue compendium is shown. lincRNA expression levels across various mouse cell types, tissues, and conditions are provided. The values are log values of the relative expression of each lincRNA.

  10. Supplementary Table 12

    In Supplementary Table 12 the P53-regulated lincRNAs upon DNA damage induction are shown. lincRNAs that temporally increase in P53 wild-type cells compared with P53 knockout cells upon stimulation with DNA damage are indicated along with their expression levels across the DNA damage time course.

  11. Supplementary Table 13

    In Supplementary Table 13 the P53 motif enrichments in induced lincRNAs are shown. P53 motif scores are provided for each lincRNA promoter along with the sequence of the best motif hit and its conservation. P53-induced lincRNAs are indicated in the last column.

  12. Supplementary Table 14

    In Supplementary Table 14 the NFKB-regulated lincRNAs are shown. lincRNAs that are differentially expressed in TLR4 stimulation of BMDC cells compared with unstimulated BMDC cells are provided.

  13. Supplementary Table 15

    In Supplementary Table 15 the ES cell lincRNAs bound by Oct4 and/or Nanog are shown. The coordinates of the lincRNAs bound by Oct4/Nanog in ES cells are provided.

  14. Supplementary Table 16

    In Supplementary Table 16 the functional association of lincENC1 is shown. GSEA results for lincENC1 are provided for both profiled exons in the transcript.

  15. Supplementary Table 17

    In Supplementary Table 17 the enrichment of Gene Ontology (GO) terms for lincRNA neighbors is shown. Significant GO terms (FDR < 0.05) are indicated along with their associated p-values.

Word documents

  1. Supplementary Table 3

    In Supplementary Table 3 the characteristic properties of lincRNAs are shown.

Text files

  1. Supplementary Table 11

    In Supplementary Table 11 the Gene Set Enrichment Analysis (GSEA) association matrix is shown. Functional associations between lincRNAs (columns) and MSigDB terms (rows) are indicated. Positive association is indicated by a 1, negative association by a -1, and no association by a 0.
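A matrix encoded this way (terms as rows, lincRNAs as columns, entries of 1, -1 or 0) is straightforward to load into a sparse per-lincRNA lookup. The sketch below is illustrative only: it assumes a tab-delimited text file whose first row holds the lincRNA names and whose first column holds the term names, which this page does not confirm, and the function name and the example labels (`lincRNA_A`, `TERM_X`, and so on) are hypothetical.

```python
import csv
import io


def load_association_matrix(handle):
    """Read a term-by-lincRNA matrix of 1 / -1 / 0 values.

    Assumed layout (not confirmed by the source): tab-delimited, header row
    of lincRNA names, first column of term names.
    Returns {lincRNA: {term: score}} holding only the non-zero associations.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    lincrnas = header[1:]  # first header cell labels the term column
    associations = {name: {} for name in lincrnas}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        term, values = row[0], row[1:]
        for name, value in zip(lincrnas, values):
            score = int(value)
            if score not in (-1, 0, 1):
                raise ValueError(f"unexpected score {score!r} for {name}/{term}")
            if score != 0:
                associations[name][term] = score
    return associations


# Made-up example data standing in for the real file.
example = "term\tlincRNA_A\tlincRNA_B\nTERM_X\t1\t0\nTERM_Y\t-1\t1\n"
matrix = load_association_matrix(io.StringIO(example))

# Terms positively associated with each lincRNA.
positive_terms = {
    name: sorted(term for term, score in terms.items() if score == 1)
    for name, terms in matrix.items()
}
```

Storing only the non-zero entries keeps the lookup small when most lincRNA and term pairs have no association.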

About this article

DOI

https://doi.org/10.1038/nature07672
