Abstract
Given the 2,400-fold range of genome sizes (0.06–148.9 Gbp (gigabase pair)) of seed plants (angiosperms and gymnosperms) with a broadly similar gene content (amounting to approximately 0.03 Gbp), the repeat-sequence content of the genome might be expected to increase with genome size, resulting in the largest genomes consisting almost entirely of repetitive sequences. Here we test this prediction, using the same bioinformatic approach for 101 species to ensure consistency in what constitutes a repeat. We reveal a fundamental change in repeat turnover in genomes above around 10 Gbp, such that species with the largest genomes are only about 55% repetitive. Given that genome size influences many plant traits, habits and life strategies, this fundamental shift in repeat dynamics is likely to affect the evolutionary trajectory of species lineages.
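The expectation being tested can be made concrete with a little arithmetic on the figures quoted in the abstract. The sketch below is illustrative only: the function names are hypothetical, and it assumes, as a deliberate oversimplification, that everything other than the roughly 0.03 Gbp of gene content is repetitive.

```python
# Figures quoted in the abstract, all in Gbp.
SMALLEST_GS = 0.06    # smallest seed-plant genome size
LARGEST_GS = 148.9    # largest seed-plant genome size
GENE_CONTENT = 0.03   # approximate gene content, broadly similar across species


def fold_range(smallest: float, largest: float) -> float:
    """Ratio of the largest to the smallest genome size."""
    return largest / smallest


def naive_repeat_fraction(genome_size: float,
                          gene_content: float = GENE_CONTENT) -> float:
    """Fraction of the genome left after subtracting gene content.

    Assumption (not from the paper): all non-gene DNA counts as repeat.
    """
    return 1.0 - gene_content / genome_size


if __name__ == "__main__":
    print(f"fold range: {fold_range(SMALLEST_GS, LARGEST_GS):.0f}")
    print(f"naive repeat fraction, smallest genome: "
          f"{naive_repeat_fraction(SMALLEST_GS):.2%}")
    print(f"naive repeat fraction, largest genome: "
          f"{naive_repeat_fraction(LARGEST_GS):.2%}")
```

Under this naive assumption the largest genome would be more than 99.9% repetitive, which is the prediction that the reported figure of about 55% contradicts.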
Data availability
The genomic DNA data analysed were either already available in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/home) or were generated by Illumina sequencing during the work described in this article and archived in the ENA (Supplementary Table 3). The ENA accession identifier for each sample, the source of the plant material and the sequencing platform are given in Supplementary Table 3a. Genome size data were taken from reported estimates in the Plant DNA C-values Database release 7.1 (https://cvalues.science.kew.org/) or from source publications not yet included in the database; in each case the source reference is provided in Supplementary Table 3a, column S, and the species analysed here are listed in Supplementary Table 3b (further information is available in Methods, ‘Flow cytometry and genome size data’).
Code availability
Most of the code used to analyse these data is integral to the published, established software packages stated above, and parameter settings are described as appropriate. New code was written to filter out low-quality sequence reads, reads containing adapter sequences and reads with similarity to the plastid and mitochondrial genomes; it is available in the Git repository https://bitbucket.org/repeatexplorer/re_utilities.
Acknowledgements
We thank the Natural Environment Research Council (NE/G020256/1), the Czech Academy of Sciences (RVO:60077344) and the Ramón y Cajal Fellowship (RYC-2017-2274) funded by the Ministerio de Ciencia y Tecnología (Gobierno de España) for support. In addition, the work was supported by the European Regional Development Fund–European Social Fund project ELIXIR-CZ–Capacity Building (no. CZ.02.1.01/0.0/0.0/16_013/0001777) and the ELIXIR-CZ research infrastructure project (LM2015047), which provided access to computing and storage facilities. We also thank the Natural Environment Research Council for funding a studentship to S.D. and the China Scholarship Council for funding W.W. Finally, we thank R.A. Nichols for helpful advice and J. Marquardt for supplying DNA of H. non-scripta.
Author information
Contributions
A.R.L., I.J.L., J. Macas and P. Novák conceived the experiment and designed, implemented and coordinated the project. P. Novák conducted genomic sequence analysis, P. Neumann conducted (retro)transposon protein-coding domains analysis, J.P. and J. Mlinarec provided material and flow cytometry analysis, and M.S.G. provided the statistical analysis. J. Mlinarec, L.J.K., S.D., W.W., A. Kovařík, A. Koblížková and J.P. provided sequence data and experimental advice. All authors were involved in writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Distribution of genome sizes (GS).
Distribution of genome sizes (GS) in (a) 10,770 angiosperms (out of c. 350,000 known species) and (b) 506 gymnosperms (out of c. 1,000 known species).
Extended Data Fig. 2 Genome proportion of the different classes of repeats.
Genome proportion of the different classes of repeats based on copy number fitted against ln-transformed genome size. The grey regression lines show estimated trends for all 101 species fitted with a beta regression (see also Supplementary Table 4). Orange lines show the estimated slope from phylogenetic generalized least squares (PGLS) using a phylogeny with proportional branch lengths (phy P) fitted with an Ornstein–Uhlenbeck process. We also tested a phylogeny with branch lengths transformed to a cladogram (phy C), and the results were similar to this PGLS (not shown).
Extended Data Fig. 3 Transposable element analysis of 77 seed plant species.
Analysis of 77 seed plant species (69 angiosperms (1 early-diverging angiosperm, 53 eudicots, 15 monocots) and eight gymnosperms) showing how the proportion of the genome occupied by transposable element-related protein coding domains varies with ln-transformed genome size. The regression lines show the slopes estimated from a beta regression, and from a PGLS with an Ornstein–Uhlenbeck process. The regression line of the graph is similar to that seen for the whole repetitive fraction (see Extended Data Fig. 2a).
Extended Data Fig. 4 Genome proportion of repeats in eudicots, monocots and gymnosperms.
Genome proportion of repeats in four categories (sequences with ≤ 20 copies, and low (21–500), middle (501–10,000) and high (>10,000) copy sequences) fitted against ln-transformed genome size, separately for eudicots, monocots and gymnosperms. See also Supplementary Table 4, which shows significant relationships in these datasets.
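The four copy-number categories used in Extended Data Fig. 4 can be expressed as a simple binning rule. The following is a minimal sketch, not the authors' code: the function names and category labels are hypothetical, and it only bins individual copy numbers rather than computing genome proportions.

```python
from collections import Counter


def copy_number_category(copies: int) -> str:
    """Return the repeat category for a sequence with the given copy number.

    Boundaries follow the caption of Extended Data Fig. 4; the label
    strings themselves are illustrative.
    """
    if copies < 1:
        raise ValueError("copy number must be a positive integer")
    if copies <= 20:
        return "<=20"
    if copies <= 500:
        return "low"
    if copies <= 10_000:
        return "middle"
    return "high"


def tally_categories(copy_numbers):
    """Count how many sequences fall into each of the four categories."""
    return Counter(copy_number_category(c) for c in copy_numbers)


if __name__ == "__main__":
    # Hypothetical copy numbers chosen to sit on the category boundaries.
    example = [1, 20, 21, 500, 501, 10_000, 10_001]
    print(tally_categories(example))
```

Keeping the boundaries in one function makes it easy to check that each threshold is inclusive on the side the caption states.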
Supplementary information
Supplementary Information
Supplementary Tables 1–8 and Figs. 1–4.
Supplementary Table Excel file 1
The data and methods used in previously published work to estimate repeat genome proportions (GP) are provided in an Excel spreadsheet (filename: Novak_Supplementary_Table_1 (2 Sept).xlsx).
Supplementary Table Excel file 2
Details of the materials used in sequencing and repeat genome proportions (GP) and genome-size (GS) data: (a) shows the 101 plant species analysed for total repeat GP and the GPs of each of the four repeat categories based on the number of mutual similarity hits. It also shows the GPs of transposable elements (TEs), genome sizes (GS, bp/1 C) of the species analysed and the sources of that data; (b) shows the species in which the GS data were obtained in this work; and (c) lists the technical and biological replicates examined with the sources of the data (filename: Novak_Supplementary_Table_3 (2 Sept).xlsx).
About this article
Cite this article
Novák, P., Guignard, M.S., Neumann, P. et al. Repeat-sequence turnover shifts fundamentally in species with large genomes. Nat. Plants 6, 1325–1329 (2020). https://doi.org/10.1038/s41477-020-00785-x
DOI: https://doi.org/10.1038/s41477-020-00785-x