Introduction

Concerted evolution is an evolutionary process by which sequences from the same gene family show higher sequence similarity to each other than to orthologous genes in related species1,2. Hence, genes evolved in a concerted manner present low polymorphism in their sequences, i.e., the sequences are homogenized. Concerted evolution is particularly notable in multicopy nuclear genes, where homogenization is mainly achieved by unequal crossing over and gene conversion3,4. One of the best characterized multicopy gene families is the 45S nuclear ribosomal DNA (nrDNA)5. It appears arranged as tandem repeated units with hundreds to thousands of copies in one or several loci per genome. These units are composed of the 18S rDNA, internal transcribed spacer 1 (ITS1), 5.8 S rDNA, internal transcribed spacer 2 (ITS2), and 26S rDNA, separated by longer non-transcribed intergenic spacers6. Among all these units, the internal transcribed spacers (ITS1 and ITS2) are the best-characterized nrDNA sequences7 partly because ITS sequences show characteristics advantageous for phylogenetic studies, such as biparental inheritance, short length, and high evolution rate4,8,9.

ITS sequences usually present fast concerted evolution with low levels of intra-genomic sequence variation and very few polymorphic positions10,11. However, in some animals (e.g.,12) and especially in plants13,14,15,16,17,18, sequence homogenization remains incomplete across ITS sequences, resulting in relatively high intra-genomic polymorphism. This ITS diversity is often linked to hybridization events9,19,20,21,22. Different ITS sequences may meet after hybridization and become homogenized after a time, but this homogenization may not be consistent among descendant lineages23. As concerted evolution tends to homogenize sequences rapidly8, evidence of non-concerted evolution is mostly expected in recently-formed hybrid species, where both parental ITS sequences may still be present. This phenomenon should be particularly conspicuous in recent allopolyploid species, where the occurrence of different ITS sequences located in distinct chromosomes tends to delay this homogenization24.

Erysimum l. (Brassicaceae) comprises more than 200 species25, mainly from Eurasia, with some species inhabiting North America and North Africa26,27. The Baetic Mountains (SE Iberia) constitute one of the most important glacial refugia in Europe and a hotspot for this group, with ~ 10 Erysimum species occurring in this small area28,29. In particular, these Erysimum species show characteristics that may facilitate hybridization and inter-specific gene-flow, such as occasional sympatry and a generalist pollination system32. Thus, previous studies have suggested that several of these taxa could have a hybrid origin30,31,32. Ploidy levels vary among and, in some cases, within species28,33, suggesting that a detailed understating of hybridization and allopolyploidization is necessary to shed light on the evolution of this group. However, the effects of hybridization and polyploidization on the genomes of these species are far from being fully understood.

In this study, we explore the homogenization dynamics of ITS, taking into account the interacting effects of concerted evolution, hybridization, and polyploidization. For this purpose, we analyzed polymorphisms at the species, population, and individual levels in ITS1 and ITS2 for seven Erysimum species. These species are closely related and belong to an Iberian clade within this genus34. We sequenced ITS1 and ITS2 by NGS to recover all the ITS copies present in the different genomes11,35. With these sequences, we then proceeded to quantify the degree of sequence homogenization in both ITS1 and ITS2 within individuals, populations, and species; and the concerted evolution levels in Erysimum spp. These species have been previously studied, showing a mainly outcrossing mating system with weak prezygotic barriers among them32,33. A cpDNA phylogeny has shown a recent origin (< 2 Mya) for these species36. Moreover, other phylogenies for these same species have shown reticulated patterns with a lack of species clustering in some cases and evidence of cytonuclear discordance, suggesting a recent hybridization scenario with allopolyploidization33. Due to their recent evolution, we hypothesize that polyploid species of this genus will have less ITS homogenization than diploid species. Any insight into ITS evolution in plants needs to consider the concomitant effects of hybridization and polyploidization on the rates of concerted evolution.

Materials and methods

Taxon sampling

We collected fresh leaves from polyploid and diploid Erysimum species. To determine DNA ploidy levels and assess genome size for each population, we used flow cytometry (see Table 1 for details on species ploidy levels). Specific details on the flow cytometry analyses could be found in Osuna-Mascaró et al.32,33. In particular, we sampled leaves from five individuals belonging to three different populations of Erysimum baeticumE. bastetanumE. mediohispanicumE. nevadense, and E. popovii, and five individuals from one population of E. lagascae and five from the microendemic E. fitzii. A total of 85 samples (= leaves of each individual) were dried and preserved in silica gel until DNA extraction. Table 1 shows the code, location, and ploidy levels of all samples.

Table 1 Taxonomic assignment, population code, location, elevation, and ploidy level for the Erysimum spp. populations sampled.

DNA extraction

We used at least 60 mg of dry plant material from each sample. We disrupted the tissues in liquid N2 using a mortar and pestle. Then, total genomic DNA was isolated using the GenElute Plant Genomic DNA Miniprep kit (Sigma-Aldrich, St. Louis, MO) following the manufacturer's protocol [https://www.sigmaaldrich.com/ES/es/technical-documents/protocol/genomics/dna-and-rna-purification/genelute-plant-genomic-dna-purification-kit]. The quantity and quality of the obtained DNA were checked using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, United States), and the integrity of the extracted DNA was checked on agarose gel electrophoresis.

ITS1 and ITS2 amplification

We independently amplified ITS1 and ITS2 in each sample. The ITS PCR reactions were performed in 25 μl with the following composition: 5 μL 5 × buffer containing MgCl2 at 1.5 mM (New England Biolabs), 0.1 mM each dNTP, 0.2 µM each primer, and 0.02 U Taq high fidelity DNA-polymerase (Q5 High-Fidelity DNA Polymerase, New England Biolabs). We used a set of long primers developed to have a 5' flanking sequence complementary to the Nextera XT DNA index to facilitate adapter ligation during library construction:

 > ITS1-Flabel (for ITS1 amplification)

TCG TCG GCA GCG TCA GAT GT GTA TAA GAG ACA GTC CGT AGG TGA ACC TGC GG

 > ITS1-Rlabel (for ITS1 amplification)

GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GGC TGC GTT CTT CAT CGA TGC

 > ITS3-Flabel (for ITS2 amplification)

TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG GCA TCG ATG AAG AAC GCA GC

 > ITS4-Rlabel (for ITS2 amplification)

GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GTC CTC CGC TTA TTG ATA TGC

Reactions included 30 cycles with the following conditions: 94 °C 15 s, 60 °C 30 s, and 72 °C 30 s. Amplified fragments were purified using spin columns (GenElute TM PCR Clean-Up Kit, Sigma-Aldrich) and were checked on agarose gel electrophoresis. Finally, we quantified the starting DNA concentration using the Infinite M200 PRO NanoQuant spectrophotometer (TECAN, Männedorf, Switzerland).

Library construction

We constructed two libraries, one for ITS1 amplicons and one for ITS2 amplicons. The libraries were prepared using the Nextera XT DNA Sample Preparation Kit. In brief, the DNA was tagged by adding a unique adapter label combination to the 3' and 5' ends of the DNA sequence. Then, the DNA was amplified via a nine-cycle PCR. The total volume reaction was 25 μl with the following composition: 5 μL 10 × buffer at 1.0 mM (New England BioLabs), 0.1 mM each dNTP, 0.2 µM each Nextera primer, 0.02 U Taq high fidelity DNA-polymerase (Q5, NEB), and 5 × Q5 High GC Enhancer (NEB). PCR thermocycling conditions were 98 °C during 5 s, 55 °C for 10 s, and 72 °C for 10 s. After that, we purified both libraries using the GenElute PCR Clean-Up Kit (Sigma) to remove short library fragments. Finally, we generated equal volumes of the libraries to prepare equimolar libraries for sequencing, and the final concentration of each library was quantified using the Infinite M200 PRO NanoQuant spectrophotometer (TECAN, Männedorf, Switzerland).

Library sequencing

ITS1 and ITS2 library sequencing were carried out by Novogene Bioinformatics Technology Co., Ltd, with an Illumina MiSeq platform (Illumina, USA) using a paired-end 150 bp sequence read run. The ITS libraries of E. mediohispanicum were sequenced twice due to an unexpected low sequencing output (we constructed new libraries as explained above). This sequencing was done using the Illumina Miseq platform and paired-end chemistry in the Center for Scientific Instrumentation (CIC) of the University of Granada, Spain.

Data analysis

FASTQ files were demultiplexed, and read quality was checked in FastQC v0.11.537. Then, we did a trimming procedure using first cutadapt v1.1538 to trim the adapters, followed by a quality trimming using Sickle v1.3339. Forward and reverse reads were paired in Geneious R.1140. Using the function "Set pair read" with default parameters for Illumina paired-end read technology. Then the paired reads were merged using BBMerge v37.6441 with a "Low Merge rate" to decrease false positives. Then, to reduce redundancy and noise caused by sequencing errors and tag switching events, we did a cluster analysis using CD-HIT v4.6.842. We clustered the sequences from each sample using an identity threshold of 0.99 (i.e., we merged the sequences with similarity ≥ 99%) and discarded the clusters that included < 5% of the total reads43. This step reduced the contribution of sequencing errors to the reported sequence diversity.

We aligned the sequences from each sample using MAFFT v7.45044 with default parameters, generating one alignment per species and marker. We trimmed the alignments using trimAl v1.245, removing poorly aligned regions with the "gappyout" method. We estimated population genetic parameters at intra-species, intra-population, and intra-individual levels using the R package PEGAS v0.146. We used the "nuc.div" function to calculate nucleotide diversity (π), estimated as the average number of nucleotide differences per site between two sequences47,48. We constructed boxplots in R to depict the nucleotide diversity (π) of each sample for ITS1 and ITS2 using the package ggplot249. Moreover, we estimated the haplotype diversity (Hd), with the "hap.div" function, as the probability of differentiation between two randomly chosen haplotypes. We then used the "haplotype" function to calculate the total number of haplotypes and the haplotype frequency distribution for each species, population, and individual. We represented the number of ITS1 and ITS2 haplotypes per sample for diploid and polyploid species with a boxplot generated in R using ggplot249. We checked for normality using Shapiro–Wilk’s method and then compared the nucleotide and haplotype diversity and the number of haplotypes among polyploid and diploid species and among ITS1 and ITS2 using the Mann–Whitney–Wilcoxon test. All statistical analyses were done in R v 4.1.0 using the package stats v3.6.150.

We investigated potential correlations among ploidy levels and haplotype and nucleotide diversity for ITS1 and ITS2 samples. Also, as these species were described as frequently hybridizing, we studied if there were shared haplotypes among different populations of the same species and among different species. To explore that, we estimated the total number of ITS1 and ITS2 haplotypes and their frequencies.

We analyzed the genetic structure of ITS1 and ITS2 by performing a hierarchical analysis of molecular variance (AMOVA;51). We used the "amova" function from the R package PEGAS v0.146 to explore the genetic variation explained by populations (i.e., at the population level), among individuals within populations (i.e., at the individual level), and within individuals (i.e., at the intra-genome level). We run an AMOVA for each species, including all the population samples, regardless of population ploidy levels. Moreover, we analyzed the amount of genetic variation in ITS1 and ITS2 explained by interspecific differences by partitioning the variance into three levels: among species, among populations within species, and within populations (i.e., among individuals). For that, we run two different AMOVA analyses first, all the sequences of ITS1 and then all the sequences of ITS2, regardless of the species and ploidy level.

Research involving plants

We obtained permission for collecting plant material from: Junta de Andalucía, Consejería de Medioambiente y Ordenación del Territorio. The sampling complied with all institutional, national, and international guidelines and legislations.

Results

From the initial 85 individuals, we obtained good-quality sequences for a total of 84 ITS1 and 81 ITS2 samples, with 10,156 ± 1233 sequences per individual for ITS1 and 49,428 ± 7678 sequences for ITS2 (Table S1).

Polyploid species (E. baeticum, E. bastetanum, E. popovii) tended to have higher nucleotide diversity than diploid species (Fig. 1) for both ITS1 (Wilcoxon test = 655, p value: 0.04; mean π ± SE; polyploid: 0.012 ± 0.007, diploid: 0.004 ± 0.006) and ITS2 (Wilcoxon test = 663, p value: 0.03; polyploids: 0.003 ± 0.004, diploids: 0.002 ± 0.003). In addition, the polyploid population of E. mediohispanicum (Em71, 4x) showed higher nucleotide diversity than the two diploid populations of this species, marginally significant for ITS1 (Wilcoxon test = 10, p value: 0.05; Em71: mean π = 0.011 ± 0.006; Em39: mean π = 0.006 ± 0.007 for ITS1; Em21: mean π = 0.0004 ± 0.001; Fig. S1) and more pronounced for ITS2 (Wilcoxon test = 8.5, p value: 0.04; Em71: mean π = 0.003 ± 0.002; Em39: mean π = 0.0003 ± 0.001 for ITS2; Em21: mean π = 0.001 ± 0.002 for ITS2; Fig. S1). Furthermore, the correlation between ploidy level and nucleotide diversity was highly significant for ITS1 (Spearman’s rho: 0.48, p value: 2.10 × 10–6; Fig. S2) and marginally significant for ITS2 (Spearman’s rho: 0.20, p value: 0.06). The difference in the degree of association of ITS1 and ITS2 polymorphisms with ploidy levels might be a consequence of overall diversity. ITS2 samples presented significantly lower nucleotide diversities than ITS1 ones (Wilcoxon test = 5165.5, p value: 3.33 × 10–7). Nucleotide diversity values for ITS1 and ITS2 at the three levels of analysis (species, population, individual) are shown in Tables S2S8.

Figure 1
figure 1

Boxplot depicting the nucleotide diversity (π) for ITS1 and ITS2 samples. Nucleotide polymorphism was estimated for each Erysimum individual as the average number of nucleotide differences per site between two sequences (Nei and Li47). E. baeticum (Ebb, ploidy 8x), E. bastetanum (Ebt, ploidy 4x and 8x), E. popovii (Ep, ploidy 4x and 10x), and one population of E. mediohispanicum (Em, ploidy 4x) are polyploids. E. nevadense (En), E. fitzii (Ef), two populations of E. mediohispanicum (Em), and E. lagascae (Ela) are diploids.

Haplotype diversity showed a similar pattern to that of the nucleotide diversity, with higher haplotype diversity for polyploid species than diploid species, for ITS1 (Wilcoxon test = 343, p value: 2.16 × 10–6; mean Hd = 0.89 ± 0.38 for polyploid; mean Hd = 0.50 ± 0.49 for diploid) and marginally significant for ITS2 (Wilcoxon test = 632.5, p value = 0.059; mean Hd = 0.39 ± 0.49 for polyploid; mean Hd = 0.28 ± 0.45 for diploid). Moreover, the degree of association between haplotype diversity and ploidy level seemed to differ between ITSs, being highly significant for ITS1 (Spearman’s rho: 0.43, p value: 2.96 × 10–5) but only marginally significant for ITS2 (Spearman’s rho: 0.18, p value: 0.09). The values of haplotype diversity for both ITS and three levels are shown in Tables S2S8. ITS2 presented lower haplotype diversity than ITS1 in terms of haplotype numbers (Wilcoxon test = 4458, p value 0.002; Table 2). ITS2 diversity was reduced to a single haplotype (i.e., no polymorphism was detected) in 49 individuals (Tables S2S8). Conversely, only 30 individuals showed no nucleotide diversity in ITS1 (Tables S2S8).

Table 2 Average haplotype diversity (Hp) per species, estimated for ITS1 and ITS2 samples. Maximum and minimum values (in parentheses) refer to individual samples.

Polyploid species showed higher number of haplotypes than diploid species (Fig. 2). Moreover, several ITS1 haplotypes were shared across species, particularly among some populations of E. bastetanum, E. fitzii, E. mediohispanicum, and E. nevadense. Specifically, we found that the three populations of E. bastetanum studied in this article shared haplotypes with two E. mediohispanicum populations (Em39, Em71) and with the three populations of E. nevadense. In addition, E. bastetanum populations and one population of E. nevadense (En05) shared haplotypes with the E. fitzii population included in the analyses. Conversely, no ITS2 haplotypes were found to be shared across different species (Tables S10, S11, S13, and S14).

Figure 2
figure 2

Boxplot depicting the number of ITS1 (left) and ITS2 (right) haplotypes per sample for diploid and polyploid species.

The hierarchical AMOVA showed that interspecific differences were a significant source of variation for both ITS (Table 3). The species-level explained 52.63 and 73.50% of the variance for ITS1 (p value < 0.001, Φ = 0.48) and ITS2 (p value < 0.001, Φ = 0.70) respectively, implying ample genetic divergence among species. Conversely, differences among populations were not significant and absorbed a relatively low amount of molecular variance (< 9% for both ITS1 & ITS2; Table 3). When the genetic structure was separately analyzed for each species, we found more complex results. Most of the variance (44.96–100% for ITS1; 29.12–100% for ITS2) resided within-individuals (see Table 4). Differences among populations varied from 0 to 48.07% for ITS1 and from 0 to 70.87% for ITS2. Moreover, the differences were only significant in E. mediohispanicum, E. nevadense, and E. popovii for ITS1 and E. bastetanum for ITS2 (Table 4).

Table 3 Hierarchical AMOVA results for ITS1 and ITS2 regions.
Table 4 ITS1 and ITS2 hierarchical AMOVA results for E. baeticum, E. bastetanum, E. fitzii, E.lagascae, E. mediohispanicum, E. nevadense, and E. popovii.

Discussion

We observed incomplete sequence homogenization for the 45S rDNA regions in the Erysimum species studied here. Our analyses were based on stringent trimming to avoid false polymorphisms due to sequencing errors. However, despite being so restrictive, we found high nucleotide and haplotype diversities overall, especially for ITS1, and a significant genetic structure that may inform the evolutionary history of these species.

Polyploid Erysimum species presented lower ITS homogenization levels than diploid species. Specifically, polyploid species presented higher nucleotide and haplotype diversity and a higher number of haplotypes, congruent with the hypothesis that polyploids harbor greater genetic diversity even within gene families52. The lack of concerted evolution in polyploid species has been previously described in several plant species in which an absence of sequence homogenization could be related to a recent allopolyploid origin34,53,54,55,56. Moreover, some studies have suggested that the number of rDNA loci, usually located in different chromosomes, is expected to be higher in polyploids, hindering sequence homogenization57,58. The number of rDNA loci and their chromosomic locations in these Erysimum species is unknown. In the genome of the diploid E. cheiranthoides34, the rDNA appears in eight locations in chromosomes 3, 6, 7, and 8, which may be related to the number of rDNA loci for the diploid Erysimum species studied here. In any case, a relatively higher number of rDNA loci is expected for polyploid Erysimum species. Although the number of rDNA loci may coincide with the sum of those of its parents in young allopolyploids, it could be more variable in older polyploids, where some loci are usually lost59,60,61,62.

We also detected limited sequence homogenization in diploid species, particularly in ITS1. The high molecular variance within diploid genomes (Table 4) could be the result of past hybridization events, which might result in the coexistence of multiple ITS families within individual genomes, particularly if hybrids are young34,63. This result is congruent with previous studies, in which the genomes of the diploid Erysimum species studied here were found to exhibit signatures of recent hybridization and introgression33. Moreover, Erysimum phylogenies based on ITS sequences64,65,66 showed a variable degree of phylogenetic incongruence compatible with hybridization. Here, the influence of hybridization on ITS diversity is further supported by the significant molecular variance among populations detected in some species (i.e., E. mediohispanicum, E. nevadense, for ITS1), showing a non-consistent homogenization pattern in the population level. Thus, these results suggest a different history of hybridization for each population, in concordance with previous studies32,33.

Our results indicated that sequence homogenization was heterogeneous across the 45S rDNA regions within a general scenario of high diversity (Tables S2S8). The degree of polymorphism exhibited by ITS1 was much higher than that of ITS2, suggesting that concerted evolution is operating more efficiently on the latter. This result agrees with previous studies that have shown that ITS1 is, on average, more variable than ITS2, which has been described as a very conserved marker67,68,69,70,71,72. This variation between the two spacers might help to analyze evolutionary patterns at different scales. While ITS1 variation might throw light on divergence at the population- or individual-level, our AMOVA results (Table 3) suggest that ITS2 could be useful for species-level characterization, at least in Erysimum spp.

Because of their sensitivity to hybridization, ITS markers have been previously used to identify the parental contributors of hybrid taxa17,53,64,73,74. Our study found shared haplotypes among diploid and polyploid species (specifically among E. bastetanum—a polyploidy—and the diploid species E. fitzii, E. mediohispanicum, and E. nevadense), which could be the result of incomplete lineage sorting or the effect of recent hybridization events. However, we have not found decisive evidence of whether these diploid species could be considered parental species of the polyploid taxon. Moreover, our results indicate hybridization across taxonomic levels (i.e., from individuals to species) since they are more congruent with multiple backcrossings across populations and taxa than with a single, “original” allopolyploidization event. Reticulated evolution seems to be the norm in this genus33,36,59,75,76,77. Thus, for the species analyzed in this study, Osuna-Mascaro et al.33 have found genomic evidence of rampant introgression between species, including both lilac- and yellow-flowered species. Future studies identifying the alleles co-located on the same chromosome through phased haplotypes78 or using PacBio single-molecule sequencing and the PURC method (Pipeline for Untangling Reticulate Complexes;79) could be used to identify parental species of the different hybrid taxa and trace back the evolutionary patterns of these Erysimum species.

Despite their evident versatility as molecular evolution markers, the analysis of ITS sequences needs to be undertaken to realize that concerted evolution might often be insufficient to ensure sequence homogenization80. Both ITS and, especially, ITS2 have for a long time been used as phylogenetic and barcoding markers in plants8,69,81,82,83. However, many studies have pointed out that evolutionary inferences based on these markers might lead to misleading or erroneous conclusions in species where sequence homogenization is lacking due to hybridization or other genome rearrangement events9,11. In this study, our results indicate that allopolyploidization and hybridization have severely impaired ITS sequence homogenization in Erysimum, implying that ITS-based phylogenies of this genus should be considered with prudence. Given that these causes of genomic rearrangement are widespread and prevalent among flowering plants11, caution is advised when using ITS for phylogenetic studies without prior knowledge of haplotype distribution, even for diploid species. Hence, intragenomic variation for ITS sequences could be used as an indication of possible recent hybridization.