Article

The evolution of lncRNA repertoires and expression patterns in tetrapods

Nature volume 505, pages 635–640 (30 January 2014)

Abstract

Only a very small fraction of long noncoding RNAs (lncRNAs) are well characterized. The evolutionary history of lncRNAs can provide insights into their functionality, but the absence of lncRNA annotations in non-model organisms has precluded comparative analyses. Here we present a large-scale evolutionary study of lncRNA repertoires and expression patterns in 11 tetrapod species. We identify approximately 11,000 primate-specific lncRNAs and 2,500 highly conserved lncRNAs, including approximately 400 genes that are likely to have originated more than 300 million years ago. We find that lncRNAs, in particular ancient ones, are in general actively regulated and may function predominantly in embryonic development. Most lncRNAs evolve rapidly in terms of sequence and expression levels, but tissue specificities are often conserved. We compared expression patterns of homologous lncRNA and protein-coding families across tetrapods to reconstruct an evolutionarily conserved co-expression network. This network suggests potential functions for lncRNAs in fundamental processes such as spermatogenesis and synaptic transmission, but also in more specific mechanisms such as placenta development through microRNA production.

Data deposits

The sequencing data have been deposited in the Gene Expression Omnibus (accession GSE43520) and SRA (PRJNA186438 and PRJNA202404).

Acknowledgements

We thank L. Froidevaux and D. Cortéz for help with genome sequencing, J. Meunier for help with preliminary miRNA analyses, K. Harshman and the Lausanne Genomics Technology Facility for high-throughput sequencing support, I. Xenarios for computational support, and S. Bergmann and Z. Kutalik for advice on co-expression analyses. Human embryonic and fetal material was provided by the Joint MRC/Wellcome Trust (grant 099175/Z/12/Z) Human Developmental Biology Resource (http://www.hdbr.org). The computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the SIB Swiss Institute of Bioinformatics. This research was supported by grants from the European Research Council (Starting Independent Researcher Grant 242597, SexGenTransEvolution) and the Swiss National Science Foundation (grant 31003A_130287) to H.K. A.N. was supported by a FEBS long-term postdoctoral fellowship.

Author information

Author notes

    • Anamaria Necsulea
    • Magali Soumillon

    Present addresses: Laboratory of Developmental Genomics, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland (A.N.); Harvard Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA, and Broad Institute, Cambridge, Massachusetts 02142, USA (M.S.).

Affiliations

  1. Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland

    • Anamaria Necsulea
    • Magali Soumillon
    • Maria Warnefors
    • Angélica Liechti
    • Henrik Kaessmann

  2. Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland

    • Anamaria Necsulea
    • Magali Soumillon
    • Maria Warnefors
    • Angélica Liechti
    • Henrik Kaessmann

  3. The Robinson Institute, School of Molecular and Biomedical Science, University of Adelaide, Adelaide, South Australia 5005, Australia

    • Tasman Daish
    • Frank Grützner
  4. Department of Systematic Zoology, Faculty of Agriculture and Horticulture, Humboldt University Berlin, 10099 Berlin, Germany

    • Ulrich Zeller
  5. Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, California 94305, USA

    • Julie C. Baker

Authors

  1. Anamaria Necsulea

  2. Magali Soumillon

  3. Maria Warnefors

  4. Angélica Liechti

  5. Tasman Daish

  6. Ulrich Zeller

  7. Julie C. Baker

  8. Frank Grützner

  9. Henrik Kaessmann

Contributions

A.N. conceived and performed all biological analyses and wrote the manuscript, with input from all authors. A.N. and M.W. processed RNA-seq data. M.S. and A.L. generated RNA-seq data. T.D. and F.G. collected platypus samples. U.Z. collected opossum samples. J.C.B. provided mouse placenta samples and contributed to H19X analyses. The project was supervised and originally designed by H.K.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Anamaria Necsulea or Henrik Kaessmann.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains the Supplementary Discussion, Supplementary Methods and additional references.

Excel files

  1. Supplementary Table 1

    This Supplementary Table contains information for the RNA-seq samples used in this study.

  2. Supplementary Table 2

    Node and edge identifiers for the co-expression network.

  3. Supplementary Table 3

    This Supplementary Table contains the list of protein-coding genes, which have an excess of connections in cis in the co-expression network.

  4. Supplementary Table 4

    MCL clusters determined for the co-expression network and the GO enrichment results for each cluster.

Zip files

  1. Supplementary Tables 5 and 6

    This zipped file contains Supplementary Tables 5 and 6. Supplementary Table 5 shows results of the GO enrichment analysis for each lncRNA node in the co-expression network and Supplementary Table 6 contains the list of miRNAs associated with H19X in each species.

  2. Supplementary Data 1

    This Supplementary Dataset contains the lncRNA annotations used in this study.

  3. Supplementary Data 2

    This Supplementary Dataset contains information for homologous lncRNA families.

  4. Supplementary Data 3

    This Supplementary Dataset contains expression level estimates for lncRNAs and for Ensembl-annotated protein-coding genes.

  5. Supplementary Data 4

    This Supplementary Dataset contains miRNA expression values for 5 species.

About this article

DOI

https://doi.org/10.1038/nature12943
