Introduction

Chromosome evolution is an integral part of plant speciation1,2, and chromosome variations, such as polyploidy and dysploidy, provide genetic support for ecological differentiation and adaptation3. The crucial role of polyploidy has been widely demonstrated4,5,6,7, while more recently, the study of Escudero et al.8 highlighted that dysploidy can have a higher evolutionary impact than polyploidy, in the long run.

Several models have been proposed to account for the observed patterns of chromosome number variation. Some models hypothesised a progressive change from a starting number in a decreasing (‘Fusion Hypothesis’9), increasing (‘Fission Hypothesis’10,11) or diverging series (‘Modal Hypothesis’12). Bickham and Baker13 advocated a ‘Canalisation Model’, in which rapid karyotype evolution occurs immediately after a lineage enters a new adaptive zone and is followed by slower changes through time. Imai et al.14,15 developed the ‘Minimum Interaction Theory’ (MIT hereafter), which postulates that the only evolutionary relevant mutations are those linked to germinal cells, where non-homologous chromosomes are fixed during meiosis on the nuclear membrane by telomeres forming “suspension arches”. In these cells, interactions among non-homologous chromosomes would be counter-selected in two possible ways: by an increase in the nuclear volume and/or by an increase in chromosome number paralleled by a decrease of chromosome sizes. The same authors14,15 demonstrated that, indeed, in some animal taxa, a trend can be observed from low to high chromosome numbers, mediated by an imbalanced fission-to-fusion ratio, at least up to a theoretical maximum16.

Several works recorded a negative correlation between genome size and chromosome number in plants17,18 but a positive correlation between genome size and nuclear volume19. An increase in chromosome number by recurrent polyploidy can be counteracted by reductions brought about by descending dysploidy20. Accordingly, a chromosome duplication after polyploidisation, causing an increase in genome size, should be followed by an increase in nuclear volume, in agreement with MIT. Fewer and larger chromosomes, e.g., those originated through descending dysploidy, should be accompanied by larger nuclear volumes to reduce non-homologous chromosome interactions during meiosis consistent with MIT14. However, in this case no obvious increases in genome size should be expected, unless caused by massive appearance of repetitive non-coding DNA which is very frequent in plants3.

Although a generalised model of chromosome number variation is lacking and despite the failure of models to fully predict available experimental evidence, the development of such a diverse range of models is suggestive, per se, of the high evolutionary interest of chromosome number variation.

Despite the interest in chromosome number evolution over the last decades21,22,23,24, in the absence of studies linking changes of chromosome number to natural selection, it is impossible to identify the adaptive function of such variation. At present, an adaptive role has been demonstrated for other genomic phenotypic traits, such as genome size25,26,27 showing that species with large genomes may be at a selective disadvantage in extreme or unstable environmental conditions. However, for chromosome number variation, evolutionary hypotheses have been raised by a limited set of preliminary works28. It is generally admitted that chromosome number and genome size are not positively correlated in angiosperms and gymnosperms, while such a correlation is highly significant in ferns and lycophytes29. Further, Grant30 argued that in angiosperms, species with a basic chromosome number higher than 14 should be considered paleo-polyploid, thus linking high chromosome numbers with multiple ancient rounds of whole genome duplications. In addition, a series of recent studies suggested that paleo-polyploidy is widespread across angiosperms31 and seed plants32 in general, irrespective of their chromosome number. Unbalanced (i.e., odd) chromosome numbers often cause significant phenotypic changes and severely impact plant growth and sexual fitness33, with the exception of apomictic plants and those with holocentric chromosomes, for which no deleterious effect (i.e., counter-selection) is expected34,35. It has been postulated that descendant dysploid chromosome number changes, coupled with the transition to an annual life form, are the main trend in angiosperms34; however, this soon resulted in an overly broad generalization, like many other karyological assumptions concerning, for instance, the direction of variation in karyotype asymmetry36. Stebbins1 also hypothesized that a chromosome number reduction by dysploidy should be expected in plants occupying pioneer habitats to avoid excessive segregation and recombination of genes. Nevertheless, protection of favourable allelic co-occurrences against recombination is also achieved by inversions - with no effects on chromosome number - in some animal taxa37. However, high chromosome numbers carry an increased risk of mis-segregation during nuclear division20. Stebbins1 and Darlington38, together with Grant2, agreed that low recombination rates should be favoured in individuals living in unstable environments to quickly develop populations; in contrast, environmental stability is expected to select for increased recombination rates because loss of alleles is overbalanced by those rare allelic combinations with high fitness.

In particular, given that genetic recombination acts as a trade-off between the opposing needs of immediate fitness and evolutionary adaptability2, chromosome number should clearly play a central role in this balance1,39.

To date, the possible relationship among chromosome number, plant traits, and environmental parameters has scarcely been investigated quantitatively. Accordingly, the main aim of our study was to use a dataset of 801 Italian endemic vascular plants (1364 accessions) to test these relationships. Italian endemics represent an ideal case study because complete information is available for basic chromosome number (x), and they share a common geographical evolutionary history40. Whilst we expect a significant phylogenetic signal in chromosome numbers, we specifically aim to quantitatively test the hypothesis that environmental stability and longer life cycles select for higher chromosome numbers, while unstable habitats select for lower chromosome numbers. To this end, we used a phylogenetic comparative approach that estimates phylogenetic inertia and adaptation in chromosome number based on an Ornstein-Uhlenbeck process41. The method considers a single trait adapting to optima influenced by continuous, randomly changing predictor variables or in response to fixed categorical niches. One of the main parameters returned by the model is the phylogenetic half-life (t1/2), indicating the time it takes for half the ancestral influence on a trait to evolve towards the predicted optimal phenotype42. Using this method, we were able to estimate relationships between chromosome numbers, plant traits and environmental factors in an evolutionary framework. Finally, by comparing the results of the different models testing for evolvability of 2n and of x, we were able to assess the relative contribution of polyploidy to 2n and provide insight into its presumptive ecological significance.

Results

Phylogenetic effect in chromosome number

The phylogenetic effects in chromosome number varied among the examined clades, but were generally moderate to large (Table 1), thus rejecting the hypothesis of species independence. Diploid chromosome numbers (2n) exhibited significant phylogenetic effects (t1/2 > 0), and the supported values (values within two log-likelihood units lower than the maximum log-likelihood) ranged from a moderate to strong phylogenetic effect, not exceeding 1.0 total length (t1/2 < 1), with the exceptions of Malvids and Caryophyllales. Indeed, the best estimate for Malvids was t1/2 = 0.01, with a support region from 0 to 0.05, suggesting that the trait evolved rapidly, nearly instantly on the timescale of this phylogeny. In contrast, the half-life value for Caryophyllales was many times the total tree length (infinity), indicating that 2n evolved as if by Brownian motion in this clade.

Table 1 Phylogenetic regression results of the evolution of log-transformed chromosome numbers (2n and x) on climatic variables in each angiosperm clade (as indicated).

Compared to diploid numbers, the basic chromosome number (x) showed stronger phylogenetic effects with a half-life that included t1/2 > 1 and a supporting region that included t1/2 = ∞ in all clades. Furthermore, the wider support regions suggested that estimates of t1/2 for x were more uncertain than those for 2n.

The stronger phylogenetic effects exhibited by x were associated with coefficients of variation (CV) lower than 0.5, while the CV calculated for 2n was larger and was followed by weaker phylogenetic effects (Table 2).

Table 2 Mean chromosome numbers (±SD) and coefficient of variation (CV) for major clades of Italian endemics.

Adaptation and inertia in chromosome number

We found clear evidence for adaptation of chromosome number to environmental and morphological predictors, but in several cases, and especially for basic chromosome number (x), a pure Brownian motion model best explained the evolution of chromosome number on the phylogeny (Tables 1 and 3).

Table 3 Primary optima (±SE) of log-transformed chromosome numbers (2n and x) for each categorical predictor variable in each angiosperm clade (as indicated).

The effects of the continuous (climatic) predictors were generally weak. Indeed, only eight of the 60 models that used climatic predictors had lower AICc and lower half-life (indicating that not all the phylogenetic effect was due to phylogenetic inertia) compared to the model without the predictor (Table 1). Nevertheless, even in these cases (all these models refer to 2n), the half-life reduction was small and the AICc decrease never exceeded 2 units which indicated that there was a weak tendency for chromosome numbers to evolve towards the optimum. Hence, these results should be regarded with caution as they represent only a tentative indication of climatic effects on chromosome numbers.

In the eight models mentioned above, the climatic variables that explained variation in chromosome number, at least marginally, were mean temperature, temperature continentality and precipitation seasonality. Overall, the association of mean temperature with chromosome number was negative with the slope of the phylogenetic regression nearly flat in Caryophyllales and Lamiids, while it was steeper in other clades, despite explaining less than 4% of the variance. However, although the relationship was not significant, it was positive in Fabids. A negative relationship with chromosome number was also found for precipitation seasonality, while temperature continentality was generally positively associated with chromosome number.

For categorical predictors, the relationship with both 2n and x was overall robust with 32 out of the 84 models outperforming the model without the predictor (Table 3). Chromosome number was mostly affected by the habitat categories (light, moisture and nutrients), but in some clades, also by morphological categories (growth form and flower size). The gain in AICc values for the outperforming models attained more than 2 units with the percentage of the variance explained largely exceeding 5%. These models also exhibited a significant reduction in half-life, which indicated that chromosome number evolved in response to categorical variables while some, but not all, of the phylogenetic effects in chromosome number is due to phylogenetic inertia. Specifically, open habitat selected for low chromosome numbers while shaded, stable environments (e.g., forest) selected for higher chromosome numbers in Monocots, Fabids, and Campanulids (for both 2n and x) and in Malvids and Lamiids (for 2n only) (Fig. 1). Habitat moisture and nutrient availability were positively associated with chromosome number, although some optimal values (θ) had little biological meaning. In particular, this lack of biological meaning applies to eutrophic and wet categories owing to the low number of branches attributed to these categories. Hence, we regard these results with caution; nevertheless, the overall patterns of the models were robust and understandable. Chromosome number was also positively related with plant morphological traits, namely growth form (life forms with longer life cycles had higher chromosome numbers) and flower size but in a few cases, was also negatively associated with flowers clustered into a flower-like inflorescence.

Figure 1
figure 1

Adaptation of chromosome number (2n: circles and x: triangles) to habitat light. Mean ± SE [back transformed] of primary equilibria for the three habitat categories (open, semi-shaded and forest) mapped on the phylogeny using parsimony reconstruction. Empty symbols show estimates obtained from a model that is not outperforming a model with a single optima.

Discussion

Adaptation and inertia in chromosome number

We found evidence for adaptation of chromosome number to environmental and morphological predictors, especially for 2n, while, not surprisingly, x exhibited a lower degree of variation and less significant adaptive evidence, providing insight into the possible ecological significance of polyploidy. Therefore, although the majority of variation remains unexplained, a Brownian motion process alone is certainly inadequate to explain the evolution of chromosome number42,43.

Our study indicates that phylogenetic inertia is a significant component in all models (t1/2 > 0). Nevertheless, categorical predictors and, to a lesser extent, some climatic variables (mean temperature and precipitation seasonality) supported the hypothesis1,2,38 that environmental instability and seasonality select for low chromosome numbers (i.e., decreased recombination rates). Specifically, open, disturbed, drought-prone habitats selected for low chromosome numbers, while shaded, stable environments with good availability of water and nutrients selected for high chromosome numbers and consequently for increased recombination rates2.

In addition, in agreement with Bell44 and Stebbins1, who argued that rapid reproductive cycles correlate with low chromosome numbers, we found that low chromosome numbers are associated with small flowers densely clustered in inflorescences. In contrast, we found that higher chromosome numbers are linked to perennial herbs, especially geophytes, in agreement with previous studies1,17,45.

From our results, it seems that chromosome number is poorly related to large scale climatic conditions. Only three climate variables had a relationship with the chromosome number, so that species with higher chromosome numbers tend to occur in sites with lower annual temperature, lower precipitation seasonality and higher continentality, consistent with previous work46. Nevertheless, the effects of climatic predictors are best explained by models of 2n evolution, indicating the relative contribution of polyploidy and its ecological significance to diploid chromosome number. In addition, among categorical predictors, models fit on 2n exhibited stronger association with environmental and morphological predictors than the models fit on x, again suggesting a putative ecological significance of polyploidy. The increase of polyploid taxa with latitude and in colder sites with continental climates has been already shown by previous studies34,47,48,49, albeit questioned by others50,51,52. The recent availability of large chromosome number databases21,53 has spurred further research on this subject, spanning a substantially wider taxonomic space and confirming this trend across the whole Arctic flora54 and at other geographical scales22.

Phylogenetic effect in chromosome number and clade-specific implications

In this study, we found phylogenetic effects in chromosome number ranging from moderate to large. This finding is congruent with data published previously23, which highlighted that closely related species share similar patterns of chromosome number variation. Despite this, chromosome number per se has little systematic significance, because it is well known that many taxa share the same chromosome number across different groups1. Further, the use of other analytical approaches to measuring phylogenetic signal (e.g., Blomberg’s K andPagel’s lambda) returned highly congruent results with patterns of phylogenetic effects higher for basic chromosome number (x) than those for 2n (Table S3).

The degree of chromosome number variation detected here allows for the investigation of possible mechanisms regulating chromosome number together with genome size. Indeed, the six considered clades, despite their origin from ancestors with very small genomes55,56, show remarkable differences in genome size, which is larger on average in Monocots (1C = 11.9 pg) and Campanulids (1C = 4.03 pg) than in Fabids (1C = 2.35 pg), Lamiids (1C = 2.32), Caryophyllales (1C = 1.7 pg), and Malvids (1 C = 1.45 pg) (data from57). Hence, in the first two clades, large nuclear volumes19 allow an increase in chromosome numbers and/or chromosome sizes, in agreement with the MIT14. In contrast, in the remaining four clades, an increase in chromosome number should be paralleled by the occurrence of smaller chromosomes in the absence of significantly larger genomes and nuclear volumes, again consistent with the MIT14. Accordingly, a reduction of chromosome number by descending dysploidy should be seriously constrained by limited nuclear volumes in Fabids, Lamiids, Caryophyllales, and Malvids. Indeed, in our dataset, these clades show smaller CV values concerning x, linked to a lower frequency of dysploidy than that observed in Monocots and Campanulids.

Monocots exhibit a high variation in chromosome number and include genera (e.g., in Poales) in which holocentric chromosomes occur, which is a peculiar condition allowing for rapid diversification and an extended range of chromosome numbers58. However, the high frequency of geophytes in this clade also explains the obtained results, especially for 2n. Indeed, in geophytes, large cells are assumed to be an advantage during the rapid development of the plant body and cell expansion59; as a consequence, they show higher tolerance to genome duplication with a high frequency of polyploids60. Under the assumptions of19, the occurrence of polyploidy in Monocots and Campanulids should be accompanied by significantly larger nuclear volumes, given their larger average genome sizes57. If polyploidisation events are evolutionarily followed by descending dysploidy events, the consequent reduction of chromosome number and simultaneous increase of chromosome size in these two clades should not be subjected to selective constraint under MIT14. Finally, in Monocots the co-occurrence of massive polyploidisation and large genome size variation points towards evolutionary phenomena linked to the large genome constraint hypothesis27.

The performance of the models on Malvids might be biased by low sampling61, while in Fabids, the results are partly obscured by the co-occurrence of dysploidy and polyploidy phenomena in our dataset, especially in Mediterranean taxa (e.g., Genista). In Caryophyllales, results are biased by the predominance of taxa belonging to the genus Limonium, but model fits on 2n and x are largely comparable. In this latter clade, habitat patchiness/stochasticity and high frequency of hybridization, rather than polyploidy, might have influenced speciation and chromosome number changes.

In the present study, several associations among chromosome number, plant traits and environmental factors were found. However, as mentioned by other authors28,62, there is no reason to expect chromosome number per se to affect plant fitness. Instead, chromosome number variation can be driven by cellular processes that affect meiosis and mitosis14,20. Hence, we also interpret the evolution towards an optimal state as a karyotypic equilibrium determined by mutation rates63, by nuclear division dynamics14,20, and possibly by epigenomic surveillance systems18, rather than an adaptive optimum.

Albeit limited in sample size, at first glance, our results showed that the evolutionary process is homogeneous across the phylogeny41 among different clades of Italian endemics. Yet, having demonstrated that phylogenetic inertia is a significant component in chromosome number evolution, future studies can be extended to larger samples and also using different approaches64, where not only the primary optima but also the α-parameter (strength of selection) and σ-parameter (strength of drift) differ among niches, allowing clade-dependent rates of adaptation to be tested65.

In conclusion, our study presents non-stochastic demonstrations for chromosome number variation, and we argue that environmental stability favors higher recombination rates in comparison to unstable environments. In addition, whilst phylogeny is a strong predictor of trait values, especially for x, we highlight that a simple phylogenetic explanation is inadequate to account for its variation in 2n and x.

Methods

Chromosome data

Chromosome counts for 1364 accessions of 801 vascular plants endemic to Italy have been extracted from Chrobase.it (http://bot.biologia.unipi.it/chrobase/)66. Chrobase.it is an online dataset of chromosome counts for the Italian vascular flora21, hosting cytogenetic data for endemic and non-endemic species. For this study we only selected counts of endemic plants because they are the most sensitive components of a flora, often being restricted to ecologically selective habitats67, for which we are confident that the environmental variables calculated in the present study can be a good proxy for the total ecological requirements of the species. Most counts in Chrobase.it are associated with an exact geographic locality. For those chromosome counts lacking precise information (<10%), we identified an approximate locality based on the restricted distribution range of the species68,69.

Mean chromosome numbers were estimated for each species, while within-species variation to be incorporated into the phylogenetic analysis was not estimated separately for each species, because within-species samples were limited to a few counts70. Hence, we estimated the pooled variance across the species and used it, weighted by sample size, to estimate the observation variance of the individual species, as recommended by71. Preliminary analyses revealed that log transformation improved model fit by over 500 log-likelihood units relative to untransformed data, thus mean chromosome numbers were log transformed prior to analysis62.

Chromosome number evolution was analysed on a dataset separated into 6 major angiosperm clades72 (Monocots, Fabids, Malvids, Caryophyllales, Lamiids, and Campanulids) to guarantee an adequate number of taxa per clade (ranging from a minimum of 67 to a maximum of 200). This analysis allowed us to evaluate whether chromosome number evolution proceeded differently or homogeneously along different evolutionary histories. The complete data set assembled for the present study is reported in supplementary Table S2.

Phylogeny

We compiled a phylogeny using the dated, ultrametric supertree for 4685 European vascular plants (DaPhnE 1.0 supertree)73 based on 518 recent molecular phylogenies. We first completed this supertree for the 281 species in our data set that were absent from the supertree (or for which related species were missing in the supertree) following the methods of Durka and Michalski73, using more than 90 phylogenetic and systematic studies. We then reduced this tree to the 801 species in our dataset. The complete list of sources used in this paper is reported in supplementary Table S1. Mean diploid (2n) and basic (x, see74) chromosome numbers for each taxon were visualized on the phylogenetic tree (Figs 2 and S1) with the plotsimmap function in the package phytools75 of R76. Basic chromosome numbers (x) were obtained by a taxon by taxon screening of relevant karyological literature previously published by40.

Figure 2
figure 2

Ultrametric phylogenetic tree of 801 vascular plants endemic to Italy. The clades considered in this study are coloured and named according to72. Bars at the tips of the tree are proportional to the mean value of 2n and x per species (as indicated).

Climatic, ecological and morphological data

Climatic data associated with the sampling sites were downloaded from the Worldclim database at 2.5 min scale (http://www.worldclim.org77). The climate data considered were: mean annual temperature (°C), temperature seasonality (SD × 100), temperature continentality (°C), mean annual precipitation (mm) and precipitation seasonality (coefficient of variation). Means and standard deviations were estimated for each species within a buffer of 10 min over georeferenced sites using QGIS v. 2.18 (Quantum GIS, http://www.qgis.org).

Data on morphological traits and habitat characteristics for each of the considered species were retrieved from the literature78,79 (Table S2). To characterise the relevant features of vegetative morphology and reproductive strategies, we considered the growth form (annual herb, geophytes, perennial herb, and woody), the flower size (large >5 mm, small <5 mm, and inconspicuous 1 mm) and whether the flowers are densely clustered in inflorescences. Habitats were classified using ecologically meaningful categorical variables in terms of stability vs. instability of the habitats and reliable vs. unreliable resource availability. For this purpose, we classified the species into three categories according to soil moisture (dry [1–4], moist [5–8], and wet [9–12]), habitat light (open [9–12], semi-shaded [5–8], and forest [1–4]) and soil nutrient contents (oligo- [1–3], meso- [4–6], and eu-trophic [7–9]) based on Ellenberg indicator values (grouped as reported in square brackets) reported for the species79.

Phylogenetic comparative analysis

We investigated whether the evolution towards optima of diploid (2n) and basic (x) chromosome numbers is influenced by climatic variables (continuous predictors), habitat characteristics or plant traits (categorical predictors) within different angiosperm clades.

Thus, we used the phylogenetic comparative method implemented in the R program SLOUCH, established to study adaptive evolution of a trait along a phylogenetic tree41. The method assumes that the response trait evolves as if by an Ornstein–Uhlenbeck model of adaptive evolution towards a primary optimum θ, defined as the optimal state that species will approach in a given niche42.

Whilst we illustrated the overall evolutionary relationships among clades in Fig. 2, the phylogenetic comparative analyses were run only on the subtrees associated with each clade. Phylogenetic trees are scaled to 1.0 (from the root to the tip of the ultrametric tree) for an easier interpretation of parameter estimates41. The two main parameters returned by the model are the phylogenetic half-life (t1/2) and the stationary variance (vy). Phylogenetic half-life indicates the time it takes for half the ancestral influence on a trait to evolve towards the predicted optimal phenotype42. A half-life greater than zero means that adaptation is not instantaneous, while t1/2 = 0 means that there is no evolutionary lag. The stationary variance is the stochastic component of the model and can be considered as evolutionary changes in the response trait induced by genetic drift.

Phylogenetic half-life in a model that only includes the intercept is an estimate of the phylogenetic effect in the response trait. In such a model, a half-life = zero means that the response variable is not phylogenetically clustered, while a half-life >0 suggests that there is an influence of phylogeny on the data; a half-life with high values can be attributed to an underlying continuous Brownian motion process.

The intercept-only model is contrasted with a model that also includes a predictor variable. This type of model is regarded as an adaptation model because it tests whether the response traits evolve towards optima influenced by a predictor. By comparing a model with and without the inclusion of predictor variables, it is possible to determine how much of the phylogenetic signal can be attributed to phylogenetic inertia (i.e., resistance to adaptation). No reduction in t1/2 suggests that the phylogenetic signal can be entirely attributed to phylogenetic inertia; in contrast, when a trait evolves in response to a variable, a reduction in the half-life for the response trait (and/or of its support interval) should be observed.

The adaptation models, which include continuous predictors (i.e., the climatic variables in our study), are fitted using maximum likelihood for estimation of t1/2 and vy and generalized least squares for estimation of the regression parameters in an iterative procedure41. In addition, the SLOUCH model assumes that the predictors have a longer phylogenetic half-life than the model residuals, and this is well supported by the variables involved in our study. The model returns parameters of an optimal regression and of a phylogenetic regression. The former is the relationship between the response and predictor variable that is predicted to evolve free of ancestral influence (absence of inertia). Therefore, the slope of this regression must be steeper than that of the phylogenetic regression.

To evaluate the effect of the categorical predictors on the evolution of chromosome number, the ANOVA and ANCOVA extensions implemented in SLOUCH were used. Categorical predictors were mapped onto the phylogeny using parsimony reconstruction.

We used Akaike’s Information Criterion (AIC) and AIC weights (AICw) to compare intercept-only models to the adaptation models. Use of AICw standardizes the weight of evidence in favour of each alternative model between 0 and 180. We also reported AIC weights for each model relative to the set of models evaluated.

Finally, model interpretations were based on comparisons of t1/2 and vy of the adaptation models with the intercept-only model and with a pure Brownian motion model, along with the amount of variation in chromosome number that the models explain. All statistical analyses have been carried out in R v3.2.376.