Unraveling the evolutionary dynamics of ancient and recent polypoidization events in Avena (Poaceae)

Understanding the diversification of polyploid crops in the circum-Mediterranean region is a challenging issue in evolutionary biology. Sequence data of three nuclear genes and three plastid DNA fragments from 109 accessions of Avena L. (Poaceae) and the outgroups were used for maximum likelihood and Bayesian analyses. The evolution of cultivated oat (Avena sativa L.) and its close relatives was inferred to have involved ancient allotetraploidy and subsequent recent allohexaploidy events. The crown ages of two infrageneric lineages (Avena sect. Ventricosa Baum ex Romero-Zarco and Avena sect. Avena) were estimated to be in the early to middle Miocene, and the A. sativa lineages were dated to the late Miocene to Pliocene. These periods coincided with the mild seasonal climatic contrasts and the Mediterranean climate established in the Mediterranean Basin. Our results suggest that polyploidy, lineage divergence, and complex reticulate evolution have occurred in Avena, exemplifying the long-term persistence of tetraploids and the multiple origins of hexaploids related to paleoclimatic oscillations during the Miocene-Pliocene interval in the circum-Mediterranean region. This newly-resolved infrageneric phylogenetic framework represents a major step forward in understanding the origin of the cultivated oat.

The monophyly of Avena received strong support (MLBP = 96%, PP = 1.00). Three clades and two nodes were observed in the ppcB1 phylogram:  (Fig. S3). The clade A'C-PPI was sister to a single monophyletic lineage (PP = 0.62) containing nodes AB-PPI and A'C-PPII and clades A-PPI and A'C-PPIII in Avena (Fig. 2).

Figure 2. Maximum likelihood tree of Avena inferred from nuclear ppcB1 data including three clades (A-PPI in red, A'C-PPI in green, and A'C-PPIII in light green) and two nodes (AB-PPI in blue and A'C-PPII in dark green
In BI analyses, four clades plus eight polyploids [A'(D)-type sequences of tetraploids A. insularis and A. maroccana, and hexaploids (without A. nuda)] were observed in the GBSS1 tree: A'C-GBI (C-type sequences of clade AC-GBI members in ML analysis) (MLBS = 66%; PP = 1.00) (  (Fig. S10). Clades A'C-GBII plus AB-GBII included the same members as clade AB&AC-GBI (without A. sterilis). Clade AB-GBI was sister to clade AB-GBII, and this group (PP = 0.66) plus clade A'C-GBII (PP = 0.50) was assigned to a single monophyletic lineage (PP = 0.98), which was sister to clade A'C-GBI. This large clade received strong support (MLBS = 94%, PP = 1.00) in Avena (Fig. 4).

Discussion
Infrageneric phylogeny and allopolyploidy events in Avena. Two strongly supported infrageneric lineages within Avena were identified by the plastid data: the C-genome diploid lineage (Avena sect. Ventricosa) containing A. clauda, A. eriantha, and A. ventricosa in clade C-NRR; and the A-genome diploid-polyploid lineage (Avena sect. Avena) containing other congeneric species in clade AB-NRR + A'C-NRR (Fig. 6). Members of C-genome diploid lineage were distributed from the south Mediterranean to the Irano-Turanian region 5,6 , and they were easily distinguished based on their unequal glumes 15 , fusiform caryopses with striate sculpturing 32 , and heterobrachial chromosomes with heterochromatin blocks along long-arm terminals 8 . Morphological, cytogenetic, and phylogenetic evidence supported recognizing this lineage as a distinct section, Avena sect. Ventricosa, which was embedded within clades A'C-PPIII (Fig. S3) and A'C-GBI based on nuclear data (Figs S4 and S7). In the ppcB1 and GBSSI trees, Avena sect. Ventricosa shared a high degree of genetic similarity with C-type homoeologues of polyploids. Consequently, the ancestor of Avena sect. Ventricosa was probably the C genome donor for A'C(DC)-genome tetraploids and hexaploids.

Figure 5. Maximum likelihood tree of Avena inferred from nuclear gpa1 data including seven clades (A-GPI in red, C-GPI in brown, AB-GPI in light blue, AB-GPII in blue, A'C-GPI in green, A'C-GPII in dark green, and A'C-GPIII in light green). Explanation of branch thickness and colorful terminal symbols refer to
clustered with AB-genome tetraploids in node AB-PPI (Fig. S2), and As-genome diploids (A. atlantica, A. hirtula, and A. wiestii), A. damascena (Ad-genome), and A. longiglumis (Al-genome) clustered with AB-genome tetraploids in clade AB-GBII (Fig. S10). Phylogenetic relationships among the As-, Ad-, and Al-genome diploids  (Table S4). and A'C-genome tetraploids, together with those of As-, Ac-, Ad-, Al-, and Ap-genome diploids and AB-genome tetraploids indicated that the close relatives of the A'C-and AB-genome tetraploids might be found within different A-genome groups based on ppcB1 data. Therefore, we hypothesize that AB-and A'C(DC)-genome tetraploids originated from different A-genome diploid ancestors (Table S5 13,16,18,21,22 ). Whole genome sequencing data including repetitive DNA might be able to detect the A (A')-genome constitution in Avena tetraploids 12 .
Three secondary gene pool members, A. insularis, A. maroccana, and A. murphyi are native to the northwest Africa and adjacent environs (i.e., A. insularis in Sicily and Tunisia, A. maroccana in Moroccana, and A. murphyi in southern Spain and northern Morocco) 5,21 . They formed a clade A'C-GPI together with hexaploids in the gpa1 tree (Fig. S11). In view of the chromosome pairing capacity between A'C(DC)-genome tetraploids and hexaploids 21 and sequence-based diversity data 9 , the A' and C genomes in the three tetraploids matched closest with D and C genomes in cultivated oat 10 . Since the As-, Ad-, and Al-genome diploids were involved in the A'C(DC)-genome tetraploid formation, it cannot be excluded that A'C(DC)-genome tetraploids originated from the ancient allotetraploidy events owing to the isolated phylogenetic positions of A. maroccana in clade A'C-PPI (Fig. S1), those of A. insularis inserted within a monophyletic lineage of the GBSSI tree (Fig. S8), and that of A. murphyi in clade A'C-GPI (Fig. S11). If this was the case, one would expect three or more ancient A-genome diploids to have participated in the origin of A'C(DC)-genome tetraploid. The three tetraploids have been reported as AC-genome-derived based on anonymous genotyping-by-sequencing (GBS) markers 9 , while the A'C(DC) designation of the tetraploids is fully compatible with our results together with another analysis based on GBS markers located on hexaploid chromosomes 10 .
Within the As-genome diploids, Avena hispanica was isolated from the closely related A. hirtula and A. lusitanica in the clade A'C-NRR + AB-NRR of plastid tree (Fig. 6). However, A. lusitanica (group 5) showed specific genetic divergence from A. hirtula and A. hispanica (group 3) in high-density GBS analysis 10 . Based on the length of lemma biaristulate tips (5-12 mm 6 ) and the genome size (9.08 ± 0.11 12 ), A. hirtula can be easily differentiated from the two As-genome diploids, that have a similar genome size to the smallest Ad-genome diploid A. damascene 12 . The incongruencies among morphological characters and genetic differences make the identification of the As-genome species challenging. Avena lusitanica and A. hispanica might represent ecotypes of A. hirtula found in the circum-Mediterranean, western Asia, and Europe 5,33 .
Allohexaploid origin of Avena sativa. Two distinct steps were inferred for the formation of the cultivated oat. The first step includes the ancient allotetraploidy events involving the hybridization between the ancient A'(or diverged A)-and C-genome diploid ancestors to form A'C (now called DC)-genome tetraploids. The second step includes subsequent recent allohexaploidy events involving hybridization between DC-genome tetraploids and the more recent A-genome diploid progenitors to form the extant ACD-genome hexaploids 18 . The close relationship between the genetically homogeneous Avena sect. Ventricosa and the C-copy sequences of A'C-genome tetraploids plus hexaploids was a novel discovery which suggested their C-genome donor to be the ancestor of Avena sect. Ventricosa. This was consistent with the hypothesis that the paleotetraploidy events pre-dated and potentially triggered divergence of the extant A'C(DC)-genome tetraploids in narrow ranges of the Mediterranean Basin 9 . In the gpa1 tree, A'C(DC)-genome tetraploids together with hexaploids comprised the clade A'C-GPI (Fig. S11). Therefore, the nuclear data provided robust evidence for the designated D and C genomes in cultivated oat, matching closest with A'(D)-and C-genome in A. insularis, A. maroccana, and A. murphyi, and the A-genome designation matches better with the extant A-genome diploids in Avena.
The close relationships among three A-genome diploids and cultivated oat were observed in the ppcB1 tree, i.e., A. atlantica, A. longiglumis, and A. wiestii were embedded within the A'C-PPII-A1, A'C-PPII-A2, and A-PPI subclades (Figs S1 and S3). The IGS-RFLP dendrogram suggested that A. atlantica should be placed within the cluster containing polyploids rather than within the A. strigosa cluster 13 , showing that A. atlantica has genetic dissimilarities with A. strigosa 34 . Avena longiglumis formed a strongly supported subclade A'C-PPII-A2 with A. sativa (Fig. S1). In addition, two new ppcB1 clones of A. wiestii (Rawi 11581, US!) located in node A'C-PPII and clade A-PPI (Figs S1 and S3), together with two reported FL intron2 clones (CIav 9053) 18 indicate that the coexistance of diploid and tetraploid forms for A. wiestii is certainly different from other As-genome diploids. Although the different genome forms of A. wiestii were close in genome size 12 , the intraspecific differences between A wiestii deserves further investigation. Two plausible explanations can be proposed for the ploidy level of allelic variation found in A. wiestii. First, the three diploids may have arisen by allopolyploidy and subsequent unequal diploidization led to heterozygotes. Second, introgression may have brought about very subtle morphological and genetic changes in Avena (Fig. 2), because stabilized introgressant species were more common than cases of dispersed introgression involving extensive gene flow among distinct species 35 . The two explanations are not mutually exclusive, such as Leucaena 36 sharing unequal diploidization and introgression processes. The paternally inherited genome of an allopolyploid is usually more prone to genetic change than the maternally derived genome according to the nuclear cytoplasmic interaction hypothesis 37 . In support of this hypothesis, it has been proposed that A. atlantica, A. longiglumis, or A. wiestii might carry the diverged A-genomes because considerable allelic variation was detected in the ppcB1 and FL int2 data 18 .
Based on ppcB1 and cytogenetic data, a close phylogenetic relationships between the A and D genomes substantiates the multiple origins for cultivated oat 19 . However, the integrated theory for the long-term evolutionary impact of recurrent polyploidy was unclear for hexaploid divergence in Avena. Based on the level of genetic variation, it is logical to postulate that recurrent polyploidy from genetically distinct diploid progenitors would introduce genetic variation into hexaploids. Nuclear data have demonstrated that recurrent polyploidy can lead to hexaploids being reproductively isolated to varying degrees. Six hexaploids were found within A'C-PPIII, AB&A'C-GBI, AB-GPII clades, while some hexaploids were dispersed within other clades in the nuclear gene trees. For example, Avena nuda is morphologically distinct with falling caryopses, but it was independently inserted within node AB-PPI and clades AB-GBII and A'C-GBII, demonstrating varying degrees of interfertility Scientific RepoRts | 7:41944 | DOI: 10.1038/srep41944 when compared with A. sativa 15 . Clearly six hexaploids cannot be regarded as a single species designated as A. sativa 38 , especially for wild hexaploids- A. fatua, A. sterilis, A. hybrida, and A. occidentalis, each adapted to respective microenvironments in the circum-Mediterranean region 33 . Avena provided a great model for studying polyploidy, especially concerning the evolutionary and genetic processes associated with extensive intergenomic translocations 30 and northward diffusion into cooler areas 33 over a time scale of c. 20 mya (Fig. 6). Future studies of Avena need to investigate the unique and conserved genomic signatures using phylogenomics 39,40 . Paleoclimatic hypothesis for the lineage divergence of Avena. It has been proposed that the Miocene-Pliocene interval was a key period in the origin of Mediterranean temperate plants and involved two major climatic oscillations 41 . The former comprised mild seasonal climatic contrasts that resulted from rainfall decreasing and repeated cooling events during the early to middle Miocene; and the latter was characterized by a high seasonal Mediterranean climate resulting from the onset of aridity and seasonality during the late Miocene to Pliocene 27 . During these mild climatic contrasts, shifts in vegetation from subtropical forest to annual grasslands occurred in the Mediterranean Basin 29 (Fig. 6). These periods coincide with mild seasonal climatic contrasts that occurred during the early to middle Miocene. It appears a temporal relationship exists between the mild seasonal climatic contrasts and the divergence of major lineages in Avena.
Cultivated oat may have arisen multiple times in response to selection pressure such as geographic isolation. The long-term aridity of the Mediterranean Basin summer became more severe along a south-eastern to north-western gradient during the late Miocene to Pliocene 27 , leading to the domination of open habitats by C 3 -pooid grasses 42 . The increased colonization capacity of cultivated oat may be strongly linked to hybridization between diploid and tetraploid progenitors followed by chromosome duplication. Recurrent polyploidization events in the Avena sativa lineages (nodes 5, 6, and 7) seem to correlate with highly seasonal climatic oscillation. Geographic isolation might have contributed to genetic differentiation in the progenitor-derivative species pair, the presumed D(or A')-genome progenitors having disjunct distributions in the Mediterranean region (e.g., A. atlantica was endemic to Morocco, A. wiestii was endemic to the east Mediterranean, east Saharo-Arabian, and Irano-Turanian, and A. longiglumis was endemic to the west-south-east Mediterranean and Saharo-Arabian) 5 . The once extensive distribution of the narrow-endemic A'C(DC)-genome tetraploids underwent contraction. Hybridization might have been a key element in the successful spread of cosmopolitan cultivated oat as a result of incorporation of locally adapted genes from different progenitor genomes. If this was the case, then the initial hybridization must have pre-dated the formation of modern Mediterranean region 26 , which isolated A. wiestii (eastern-most) from A. atlantica (western-most). Therefore, the independent hexaploidy events of cultivated oat were modulated by harsh climatic oscillation, thus A. sativa was able to adapt to new habitats.
Avena represents a remarkable model to study because its history of polyploidy, lineage divergence, and complex reticulate evolution. The complex evolution of cultivated oat and its close relatives involved paleotetraploidy events between the ancient A(or A')-and C-genome diploid ancestors and subsequent recent allohexaploidy events between A'C(DC)-genome tetraploids and the more recent A-genome diploid progenitors. The pattern of recurrent polyploidizations in Avena and their temporal relationships with paleoclimatic oscillations is unparalleled among polyploid crops occurring in the circum-Mediterranean region 4,43 .

Methods
Taxon sampling and data collection. Eighty-nine accessions of 27 species were sampled to represent the morphological diversity and geographic range of six sections in Avena 5 , together with outgroups comprising 20 accessions of 16 species from seven allied genera (Supplementary Table S1 30 ) based on the recent phylogeny and classification of Poaceae 44 . Leaf material was obtained from seedlings and herbarium specimens.
Three low-copy nuclear genes, phosphoenolpyruvate carboxylase B1 (ppcB1), granule-bound starch synthase I (GBSSI) and G protein alpha subunit 1 (gpa1), were used. The ppcB1 gene encodes PEPC enzyme for the oxaloacetate replenishment of the tricarboxylic acid cycle in C 3 plants 45 , the GBSSI gene encodes GBSSI enzyme for the amylose synthesis in plants 46 , and the gpa1 gene encodes a G-protein α subunit for signal transduction in flowering plants 47 . These loci have previously been used for accurate phylogenetic assessments in Poaceae 2,47,48 . Based on genome-wide studies on cereal crops, the three loci appear to be on different chromosomes 4,48,49 , thus each of nuclear markers can provide an independent phylogenetic estimate.
Genomic DNA was extracted following Liu et al 31 . and 864 new sequences were generated for nuclear (ppcB1, GBSSI, and gpa1) and plastid (ndhA intron, rpl32-trnL, and rps16 intron) fragments, which were amplified using designed or published primers and protocols listed in Table S2 31 . Amplified products were purified using polyethylene glycol (PEG) precipitation protocols and sequenced using an ABI PRISM 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA, USA). For accessions that unsuccessfully underwent direct sequencing, the purified PCR products were cloned into pCR4-TOPO vectors and transformed into Escherichia coli TOP10 competent cells following the protocol of TOPO TA Cloning Kit (Invitrogen, Carlsbad, CA, USA). The resulting sequences were edited using Sequencher v.5.2.3 (Gene Codes Corp., Ann Arbor, MI, USA) and aligned with MUSCLE v.3.8.31 50 , followed by manual adjustment in SE-AL v.2.0a11 (http://tree.bio.ed.ac.uk/software/seal/). All sequences were deposited in GenBank (KT452936-453223, KT723464-724040).
Phylogenetic analyses. Phylogenetic analyses were performed using maximum likelihood 51 and Bayesian inference 52 . Nucleotide substitution models were selected based on the Akaike Information Criterion determined by Modeltest v.3.7 53 . ML and bootstrap analyses (MLBS) were performed using the best-fit model (Table S3) for 1,000 bootstrap replicates in GARLI v.0.96 54 , with runs set for unlimited generations, and automatic termination following 10,000 generations without significant topological change (lnL increase of 0.01). The output file containing the best trees for bootstrap reweighted data was then read into PAUP* v.4.0b10 55 where the majority-rule consensus tree was constructed to calculate MLBS.
Divergence time estimation. The molecular dating analyses employed plastid markers a strict molecular clock model was rejected at a significance level of 0.01 (LR = 963.1856, d.f. = 102, P < 0.01) based on a likelihood ratio test 51 . A Bayesian relaxed clock model was implemented in BEAST v.1.8.2 56 to estimate divergence times in Avena. Three plastid markers were partitioned using BEAUTI v.1.8.2 (within BEAST) with the best fit model determined by Modeltest v.3.7 (Table S3). The stipoid-Pooideae lineage including Avena plus outgroups was dated to be 49.71 mya based on eight phytolith fossils, and thus the crown age of Avena plus outgroups was set at 49.71 mya since fossil surveys provide no evidence of an earlier date for the origin of the stipoid-Pooideae lineage during the late Eocene 57 .
A Yule tree prior, linked uncorrelated lognormal relaxed clock model, and default operators were defined in the BEAST xml input file. After optimal operator adjustment as suggested by the output diagnostics from preliminary BEAST runs, two independent MCMC runs were performed for 30 million generations, each run sampling every 1,000 generations with 25% burn-ins. All parameters had a potential scale reduction factor that was close to one, indicating that the posterior distribution had been adequately sampled. A 50% majority rule consensus from the retained posterior trees (c. 45,000) of three runs was obtained using TreeAnnotator v.1.8.2 (within BEAST) with a PP limit of 0.5 and mean lineage heights. The convergence between two runs was checked using the "cumulative" and "compare" functions in AWTY 58 .