The Norway spruce genome sequence and conifer genome evolution

Journal name:
Nature
Volume:
497
Pages:
579–584
DOI:
doi:10.1038/nature12211

Abstract

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to that of the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.

At a glance

Figures

  1. Figure 1: The gene-space and transcribed fraction of the P. abies 1.0 assembly.

    a, Gene family loss and gain in eight sequenced plant genomes (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Oryza sativa, Zea mays, Picea abies, Selaginella moellendorffii and Physcomitrella patens). Gene families were identified using TribeMCL (inflation value 4), and the DOLLOP program from the PHYLIP package was used to determine the minimum gene set for ancestral nodes of the phylogenetic tree. We used plant genome annotations filtered to remove transposable elements. ‘Orphans’ refers to gene families containing only a single gene. Blue numbers indicate the number of gene families. b, Box plot of the length distribution of the 10% longest introns in the same eight genomes. c, Scatter plots of cumulative intron length against log10 expression, calculated as fragments per kilobase per million mapped reads (FPKM), for high-confidence gene loci (top, orange) and lncRNA loci (middle, green). The bottom panel shows a histogram of cumulative intron size in the two sets of loci. d, Distribution of small (18–24-nucleotide (nt)) RNAs and their co-alignment-based colocation to genomic features (repeats, high-confidence genes and their promoter/UTRs). CDS, coding sequence.

  2. Figure 2: Conifer genomes contain expansions of a diverse set of LTR-RTs.

    a, Distribution of different classes of transposable elements from six gymnosperm species. The figure is based on the total fraction of transposable elements (TEs) identified and grouped into different classes from the different species. Genome sizes of the six species are given in circles and their phylogenetic relationship is shown, with tentative dating of divergence times (x-axis) based on 64 chloroplast genes over 39 species and five fossil calibration points. b, c, Heuristic neighbour-joining trees constructed from 5,922 sequences similar to the Ty3/Gypsy (b) and 3,052 sequences similar to the Ty1/Copia (c) reverse transcriptase domain from nine plant species. The trees to the right have only sequences from P. abies and Z. mays coloured, whereas the grey dots are the uncoloured versions of the other species represented on the left. d, Distributions of insertion times calculated for LTR-RTs in Picea abies, Picea glauca and Oryza glaberrima/O. sativa, using mutation rates (per base per year) of 2.2 × 10⁻⁹ for the Picea spp. and 1.8 × 10⁻⁸ for O. glaberrima (ref. 50).

  3. Figure 3: Intron sizes are conserved among gymnosperms.

    a, b, Intron size comparisons between P. abies, P. sylvestris (a) and G. gnemon (b), respectively. Orthologues of introns that were categorised as short (50–300 bp) or long (1–20 kb) in P. abies were identified in P. sylvestris and G. gnemon, and the corresponding intron size was scored.
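The legends for Figures 1c and 2d rely on two simple calculations, sketched here for orientation only. The FPKM function follows the standard definition of the unit. The insertion-time function assumes the commonly used relation T = K / (2r), where K is the sequence divergence between the two LTRs of one element and r is the substitution rate per base per year. The legend does not state that this exact formula was used, so it is an assumption, as are all function and variable names below. Only the two mutation rates are taken from the Figure 2d legend.

```python
# Illustrative sketch only; names and the T = K / (2r) model are assumptions,
# not the authors' pipeline.

# Mutation rates (per base per year) quoted in the Figure 2d legend.
PICEA_RATE = 2.2e-9
ORYZA_RATE = 1.8e-8


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def ltr_insertion_age(ltr_divergence: float, rate_per_base_per_year: float) -> float:
    """Estimated years since insertion, assuming T = K / (2r).

    Both LTRs are identical at insertion and then diverge independently,
    hence the factor of 2 in the denominator.
    """
    if ltr_divergence < 0 or rate_per_base_per_year <= 0:
        raise ValueError("divergence must be >= 0 and rate must be > 0")
    return ltr_divergence / (2.0 * rate_per_base_per_year)


if __name__ == "__main__":
    # Hypothetical locus: 500 fragments on a 2 kb transcript, 25 M mapped fragments.
    print(f"FPKM: {fpkm(500, 2000, 25_000_000):.1f}")

    # The same hypothetical LTR divergence dated with each rate from the legend.
    k = 0.044
    print(f"Picea rate: {ltr_insertion_age(k, PICEA_RATE) / 1e6:.2f} Myr")
    print(f"Oryza rate: {ltr_insertion_age(k, ORYZA_RATE) / 1e6:.2f} Myr")
```

Under this model, a given LTR divergence corresponds to a much older insertion when dated with the slower Picea rate than with the faster Oryza rate, which is why the choice of rate matters when comparing the distributions in Figure 2d.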

References

  1. Stewart, W. N. & Rothwell, G. W. Paleobotany and the Evolution of Plants (Cambridge Univ. Press, 1993)
  2. Savard, L. et al. Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proc. Natl Acad. Sci. USA 91, 5163–5167 (1994)
  3. Leslie, A. B. et al. Hemisphere-scale differences in conifer evolutionary dynamics. Proc. Natl Acad. Sci. USA 109, 16217–16221 (2012)
  4. Flory, W. S. Chromosome numbers and phylogeny in the gymnosperms. J. Arnold Arb. 17, 82–87 (1936)
  5. Morse, A. M. et al. Evolution of genome size and complexity in Pinus. PLoS ONE 4, e4332 (2009)
  6. Kovach, A. et al. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 11, 420 (2010)
  7. Ahuja, M. R. & Neale, D. B. Evolution of genome size in conifers. Silvae Genet. 54, 126–137 (2005)
  8. Buschiazzo, E., Ritland, C. E., Bohlmann, J. & Ritland, K. Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms. BMC Evol. Biol. 12, 8 (2012)
  9. Jaramillo-Correa, J. P., Verdu, M. & González-Martínez, S. C. The contribution of recombination to heterozygosity differs among plant evolutionary lineages and life-forms. BMC Evol. Biol. 10, 22 (2010)
  10. Murray, B. G. Nuclear DNA amounts in gymnosperms. Ann. Bot. (Lond.) 82, 3–15 (1998)
  11. Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 490, 49–54 (2012)
  12. Vicedomini, R., Vezzi, F., Scalabrin, S., Arvestad, L. & Policriti, A. GAM-NGS: genomic assemblies merger for next generation sequencing. BMC Bioinformatics 14, S6 (2013)
  13. Sahlin, K., Street, N., Lundeberg, J. & Arvestad, L. Improved gap size estimation for scaffolding algorithms. Bioinformatics 28, 2215–2222 (2012)
  14. Vezzi, F., Narzisi, G. & Mishra, B. Feature-by-feature – evaluating de novo sequence assembly. PLoS ONE 7, e31002 (2012)
  15. Ralph, S. G. et al. A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis). BMC Genomics 9, 484 (2008)
  16. Mackenzie, S. A. in Plant Mitochondria (ed. Logan, D. C.) 36–49 (Blackwell, 2007)
  17. Messing, J. et al. Sequence composition and genome organization of maize. Proc. Natl Acad. Sci. USA 101, 14349–14354 (2004)
  18. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007)
  19. Bennetzen, J. L., Coleman, C., Liu, R., Ma, J. & Ramakrishna, W. Consistent over-estimation of gene number in complex plant genomes. Curr. Opin. Plant Biol. 7, 732–736 (2004)
  20. Vanneste, K., Van de Peer, Y. & Maere, S. Inference of genome duplications from age distributions revisited. Mol. Biol. Evol. 30, 177–190 (2013)
  21. Jiao, Y. et al. Ancestral polyploidy in seed plants and angiosperms. Nature 473, 97–100 (2011)
  22. García-Gil, M. R. Evolutionary aspects of functional and pseudogene members of the phytochrome gene family in Scots pine. J. Mol. Evol. 67, 222–232 (2008)
  23. Magbanua, Z. V. et al. Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine. PLoS ONE 6, e16214 (2011)
  24. Bánfai, B. et al. Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646–1657 (2012)
  25. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012)
  26. Dolgosheina, E. V. et al. Conifers have a unique small RNA silencing signature. RNA 14, 1508–1515 (2008)
  27. Morin, R. D. et al. Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa. Genome Res. 18, 571–584 (2008)
  28. Wan, L.-C. et al. Identification and characterization of small non-coding RNAs from Chinese fir by high throughput sequencing. BMC Plant Biol. 12, 146 (2012)
  29. Zhang, J. et al. Dynamic expression of small RNA populations in larch (Larix leptolepis). Planta 237, 89–101 (2013)
  30. Henderson, I. R. & Jacobsen, S. E. Epigenetic inheritance in plants. Nature 447, 418–424 (2007)
  31. Sanmiguel, P. & Bennetzen, J. L. Evidence that a recent increase in maize genome size was caused by the massive amplification of intergene retrotransposons. Ann. Bot. (Lond.) 82, 37–44 (1998)
  32. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nature Rev. Genet. 8, 973–982 (2007)
  33. Devos, K. M., Brown, J. K. M. & Bennetzen, J. L. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12, 1075–1079 (2002)
  34. Vitte, C. & Panaud, O. Formation of solo-LTRs through unequal homologous recombination counterbalances amplifications of LTR retrotransposons in rice Oryza sativa L. Mol. Biol. Evol. 20, 528–540 (2003)
  35. Vicient, C. M. et al. Retrotransposon BARE-1 and its role in genome evolution in the genus Hordeum. Plant Cell 11, 1769–1784 (1999)
  36. Karlgren, A. et al. Evolution of the PEBP gene family in plants: functional diversification in seed plant evolution. Plant Physiol. 156, 1967–1977 (2011)
  37. Klintenäs, M., Pin, P. A., Benlloch, R., Ingvarsson, P. K. & Nilsson, O. Analysis of conifer FLOWERING LOCUS T/TERMINAL FLOWER1-like genes provides evidence for dramatic biochemical evolution in the angiosperm FT lineage. New Phytol. 196, 1260–1273 (2012)
  38. Gramzow, L. & Theissen, G. A hitchhiker’s guide to the MADS world of plants. Genome Biol. 11, 214 (2010)
  39. Smaczniak, C. et al. Developmental and evolutionary diversity of plant MADS-domain factors: insights from recent studies. Development 139, 3081–3098 (2012)
  40. Kubo, M. et al. Transcription switches for protoxylem and metaxylem vessel formation. Genes Dev. 19, 1855–1860 (2005)
  41. Piegu, B. et al. Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 16, 1262–1269 (2006)
  42. Hawkins, J. S., Kim, H., Nason, J. D., Wing, R. A. & Wendel, J. F. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 16, 1252–1261 (2006)
  43. Bennetzen, J. L., Ma, J. & Devos, K. M. Mechanisms of recent genome size variation in flowering plants. Ann. Bot. (Lond.) 95, 127–132 (2005)
  44. Pavy, N. et al. A spruce gene map infers ancient plant genome reshuffling and subsequent slow evolution in the gymnosperm lineage leading to extant conifers. BMC Biol. 10, 84 (2012)
  45. Bennetzen, J. L. & Kellogg, E. A. Do plants have a one way ticket to genomic obesity? Plant Cell 9, 1509–1514 (1997)
  46. Van de Peer, Y., Fawcett, J. A., Proost, S., Sterck, L. & Vandepoele, K. The flowering world: a tale of duplications. Trends Plant Sci. 14, 680–688 (2009)
  47. Fedoroff, N. V. Presidential address. Transposable elements, epigenetics, and genome evolution. Science 338, 758–767 (2012)
  48. Van de Peer, Y. A mystery unveiled. Genome Biol. 12, 113 (2011)
  49. Soltis, D. E. et al. Polyploidy and angiosperm diversification. Am. J. Bot. 96, 336–348 (2009)
  50. Ma, J. & Bennetzen, J. L. Rapid recent growth and divergence of rice nuclear genomes. Proc. Natl Acad. Sci. USA 101, 12404–12410 (2004)


Author information

Affiliations

  1. Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, 171 21 Solna, Sweden.

    • Björn Nystedt &
    • Ellen Sherwood
  2. Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, 901 87 Umeå, Sweden.

    • Nathaniel R. Street,
    • Douglas G. Scofield,
    • Nicolas Delhomme,
    • Olivier Keech,
    • Hannele Tuominen,
    • Bo Zhang,
    • Torgeir R. Hvidsten &
    • Stefan Jansson
  3. Department of Cell and Molecular Biology, Science for Life Laboratory, Karolinska Institutet, Box 1031, 171 77 Stockholm, Sweden.

    • Anna Wetterbom,
    • Johannes Luthman,
    • Fredrik Lysholm,
    • Carlos Talavera-López &
    • Björn Andersson
  4. Istituto di Genomica Applicata, Via J. Linussio 51, 33100 Udine, Italy.

    • Andrea Zuccolo,
    • Stefania Giacomello,
    • Riccardo Vicedomini &
    • Michele Morgante
  5. Institute of Life Sciences, Scuola Superiore Sant’Anna, 56127 Pisa, Italy.

    • Andrea Zuccolo
  6. Department of Plant Systems Biology (VIB) and Department of Plant Biotechnology and Bioinformatics (Gent University), Technologiepark 927, 9052 Gent, Belgium.

    • Yao-Cheng Lin,
    • Kevin Vanneste &
    • Yves Van de Peer
  7. Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, 901 87 Umeå, Sweden.

    • Douglas G. Scofield,
    • Zhi-Qiang Wu,
    • Stacey Lee Thompson &
    • Pär K. Ingvarsson
  8. School of Computer Science and Communication, Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 171 21 Solna, Sweden.

    • Francesco Vezzi,
    • Kristoffer Sahlin &
    • Lars Arvestad
  9. Università degli Studi di Udine, Via delle Scienze 208, 33100 Udine, Italy.

    • Stefania Giacomello,
    • Riccardo Vicedomini &
    • Michele Morgante
  10. School of Biotechnology, Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, 171 21 Solna, Sweden.

    • Andrey Alexeyenko,
    • Kristina Holmberg,
    • Jimmie Hällman,
    • Max Käller,
    • Nemanja Rilakovic &
    • Joakim Lundeberg
  11. Department of Forest Mycology and Plant Pathology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Box 7026, 750 07 Uppsala, Sweden.

    • Malin Elfstrand &
    • Åke Olson
  12. Department of Genetics, Friedrich-Schiller-University Jena, Philosophenweg 12, 07743 Jena, Germany.

    • Lydia Gramzow &
    • Günter Theißen
  13. Molecular Evolution, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, 752 37 Uppsala, Sweden.

    • Lisa Klasson
  14. BACPAC Resources, Children’s Hospital of Oakland Research Institute, Bruce Lyon Memorial Research Building, Oakland, California 94609, USA.

    • Maxim Koriabine &
    • Pieter de Jong
  15. Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, 901 83 Umeå, Sweden.

    • Melis Kucukoglu,
    • Totte Niittylä,
    • Rishikesh Bhalerao,
    • Rosario Garcia Gil,
    • Björn Sundberg &
    • Ove Nilsson
  16. Department of Forest and Conservation Sciences, University of British Columbia, 2424 Main Mall, Vancouver, British Columbia V6T 1Z4, Canada.

    • Carol Ritland,
    • Joerg Bohlmann &
    • Kermit Ritland
  17. Jardí Botànic, Universitat de Valencia, c/Quart 80, 46008 Valencia, Spain.

    • Josep A. Rosselló
  18. Marimurtra Botanical Garden, Carl Faust Fdn, 17300 Blanes, Spain.

    • Josep A. Rosselló
  19. Canada Research Chair in Forest and Environmental Genomics, Centre for Forest Research and Institute for Systems and Integrative Biology, Université Laval, Québec, Québec G1V 0A6, Canada.

    • Juliana Sena,
    • Jean Bousquet &
    • John MacKay
  20. Department of Biosciences and Nutrition, Science for Life Laboratory, Karolinska Institutet, Box 1031, 171 21 Solna, Sweden.

    • Thomas Svensson
  21. Michael Smith Laboratories, University of British Columbia, 321-2185 East Mall, Vancouver, British Columbia V6T 1Z4, Canada.

    • Philipp Zerbe &
    • Joerg Bohlmann
  22. Swedish e-Science Research Center, Department Numerical Analysis and Computer Science, Stockholm University, Box 1031, 171 21 Solna, Sweden.

    • Lars Arvestad
  23. Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, 1432 Ås, Norway.

    • Torgeir R. Hvidsten

Contributions

B.N. and N.R.S. are joint first authors, and A.W., A.Z., Y.-C.L. and D.G.S. are joint second authors, who contributed to most parts of the work. F.V., A.A., N.D., R.V., K.S. and E.S. contributed to the assembly and sequence analysis; S.G. to repeat analysis; N.D., M.E., L.G., M.Ku., T.N., Å.O., G.T., H.T., P.Z. and B.Z. to quality control of the assembly and analysis of gene families; K.H., J.H., O.K., M.Kä. and T.S. to the sequencing; L.K. to analysis of the mitochondrial genome; M.Ko. and N.R. to generation of fosmid pools; J.Lut., F.L., C.T.-L. and K.V. to analysis of the sequences of P. abies and other conifers; C.R. and J.S. to production of BAC sequences; and Z.-Q.W. to analysis of the chloroplast genome. J.A.R. determined the genome size. L.A., R.B., J.Boh., J.Bou., R.G.G., T.R.H., P.d.J., J.M., M.M., K.R., B.S., S.L.T., Y.V.d.P. and B.A. contributed to the design and supervision of various parts of the research. O.N. headed and P.K.I. managed the project, J.Lun. coordinated the sequencing and assembly activities, and S.J. the bioinformatics activities. B.N., N.R.S., A.Z., O.N., P.K.I., J.Lun. and S.J. wrote and edited most of the manuscript. All authors commented on the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

Raw data and assemblies are available from the ConGenIE (Conifer Genome Integrative Explorer) web resource (http://congenie.org), as well as the European Bioinformatics Institute (EMBL) and European Nucleotide Archive (ENA); see Supplementary Information 6.1 for accession numbers.


Supplementary information

PDF files

  1. Supplementary Information (12.4 MB)

    This file contains Supplementary Sections 1-6, each of which contains Supplementary Text and Data, Supplementary Figures, Supplementary Tables and additional references. Please note that the following Supplementary Figures and Tables appear as separate files: Supplementary Figure 5.2 and Supplementary Tables 3.3, 3.4, 3.11, 3.12 and 3.14.

  2. Supplementary Figures (506 KB)

    This file contains Supplementary Figure 5.2.

Excel files

  1. Supplementary Tables (18 KB)

    This file contains Supplementary Table 3.3.

  2. Supplementary Tables (24 KB)

    This file contains Supplementary Table 3.4.

  3. Supplementary Tables (14 KB)

    This file contains Supplementary Table 3.11.

  4. Supplementary Tables (18 KB)

    This file contains Supplementary Table 3.12.

  5. Supplementary Tables (16 KB)

    This file contains Supplementary Table 3.14.
