Molecular evolution of Wcor15 gene enhanced our understanding of the origin of A, B and D genomes in Triticum aestivum

Allohexaploid bread wheat was derived from three closely related species carrying the A, B and D genomes. Although numerous studies have sought to elucidate its origin and phylogeny, no consensus has been reached. In this study, we cloned and sequenced the genes Wcor15-2A, Wcor15-2B and Wcor15-2D in 23 diploid, 10 tetraploid and 106 hexaploid wheat varieties and analyzed their molecular evolution to reveal the origin of the A, B and D genomes in Triticum aestivum. Comparative analyses of the sequences in diploid, tetraploid and hexaploid wheats suggest that T. urartu, Ae. speltoides and Ae. tauschii subsp. strangulata are the most likely donors of the Wcor15-2A, Wcor15-2B and Wcor15-2D loci in common wheat, respectively. The Wcor15 genes from subgenomes A and D were highly conserved, with no insertions or deletions of bases during the evolution from diploid to tetraploid and hexaploid wheat. The non-coding region of the Wcor15-2B gene from the B genome may have mutated during the first polyploidization from Ae. speltoides to tetraploid wheat; however, no change occurred in this gene during the second allopolyploidization from tetraploid to hexaploid wheat. Comparison of the Wcor15 genes sheds light on the origin of the A, B and D genomes of common wheat.

(T. m. monococcum) 21. The Au and Am genomes have similar genome size and gene content 22. T. urartu, a wild diploid wheat from the Fertile Crescent region, has long been considered the A-genome donor of tetraploid and hexaploid wheat species 23,24. In polyploid wheat, the origin of the B genome is still under debate, in spite of many attempts to identify the parental species 24. It has been reported that the B genome is closely related to the S genome of the Sitopsis section 25-27, which contains five species: Ae. bicornis (SbSb, 2n = 2x = 14), Ae. longissima (SlSl, 2n = 2x = 14), Ae. sharonensis (SshSsh, 2n = 2x = 14), Ae. searsii (SsSs, 2n = 2x = 14) and Ae. speltoides (SS, 2n = 2x = 14) 28,29. Previous studies 30-33 have shown that Ae. speltoides is phylogenetically distinct from the other species in the Sitopsis section. Ae. speltoides (S genome) has been suggested as the most likely progenitor of the B genome 26,27. However, Huang et al. 24 and Haider 29 reported that none of the five Sitopsis species they investigated is a close relative of the B genome in T. aestivum, and concluded that the B genome donor remains unknown. There has been little debate about Ae. tauschii Coss (genome DD) as the D genome progenitor of T. aestivum 24.
There has been great interest in determining the ancestral diploid genome donors of hexaploid wheat 10,29. Understanding the origin of hexaploid wheat not only aids its genetic improvement but is also important for the development of artificial synthetic forms 20,34, because the genome progenitors of common wheat are very important genetic resources for improving the economic traits of modern cultivars 35,36. However, direct experimental evidence that clearly resolves the phylogenetic history of the A, B and D genome lineages is still lacking. This debate might be greatly simplified by analyzing the molecular evolution of a conserved gene among diploid, tetraploid and hexaploid wheat species.
Wcor15 (GenBank: AB095006), a member of the wheat cold-responsive gene family that encodes a chloroplast-targeted protein upon exposure to low temperature, plays an important role in the cold hardiness of wheat 37. Based on our sequencing data, we found that the Wcor15 gene is highly conserved in hexaploid wheat, not only in the coding region but also in the 5′ upstream non-coding region. In this study, we cloned and sequenced the Wcor15 gene from diploid, tetraploid and hexaploid wheats to reveal the origin of the A, B and D genomes in common wheat, and compared its evolution among diploid, tetraploid and hexaploid wheats.

Results
Cloning and characterization of homoeologous Wcor15 genes. The three homoeologous Wcor15 sequences were identified by using the ORF sequence (including the intron, 563 bp) of the Wcor15 gene (GenBank: AB095006) as a probe to screen the nucleotide databases of EBI (http://www.ebi.ac.uk/ena/) 38, and sequences were found from wheat genomes A, B and D, respectively (Table 1). Specific PCR primer pairs named Wcor15A, Wcor15B and Wcor15D (Table 2), which amplify the three homoeologous Wcor15 sequences with intact ORFs, were designed based on the highly variable regions of accessions CBTL0110083500 (2AL), CBTL0111257031 (2BL) and CBTL0110522649 (2DL).
The primer pairs were used to amplify genomic DNA of the hexaploid wheat cultivar Annong 0822. Each primer pair generated a single-band amplicon of the expected size. The genes were designated Wcor15-2A (KT264885), Wcor15-2B (KT264957) and Wcor15-2D (KT265022), respectively; each contained the 5′ upstream region, two exons, one intron and the 3′ downstream region. Further analysis demonstrated that the three sequences are very similar, differing by a few nucleotide insertions, deletions and substitutions (Supplementary Fig. S1). The Wcor15-2A sequence from the A genome is identical to the sequence AB095006 previously reported by Takumi et al. 37, suggesting that Wcor15-2A and Wcor15 (GenBank: AB095006) are the same gene. RT-PCR using RNA templates from Annong 0822 showed that all three homoeologous Wcor15 genes were specifically induced by low temperature (data not shown), suggesting that the three homoeologous Wcor15 genes are cold-responsive genes.
Table 2. Primers used in this study.
To further confirm the chromosomal location of the genes, a set of nulli-tetrasomic lines of cv. Chinese Spring was used. Wcor15-2B was amplified in all lines except nullisomic 2B-tetrasomic 2D (N2B-T2D), indicating that Wcor15-2B is located on chromosome 2B. Likewise, Wcor15-2A and Wcor15-2D were assigned to chromosomes 2A and 2D, respectively (Fig. 1).
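The reasoning behind a nulli-tetrasomic assignment can be sketched in a few lines of Python: a gene is placed on the chromosome whose nullisomic line fails to give the amplicon. This is only an illustration. Apart from N2B-T2D, the line names and all presence/absence calls below are hypothetical and are not the band patterns of Fig. 1.

```python
def locate_gene(band_by_line):
    """Return the chromosome whose nullisomic line lacks the amplicon.

    band_by_line maps a line name such as 'N2B-T2D' (nullisomic 2B,
    tetrasomic 2D) to True (band present) or False (no band).
    """
    # 'N2B-T2D' -> 'N2B' -> '2B': the chromosome that is missing in that line
    missing = {line.split("-")[0][1:] for line, band in band_by_line.items() if not band}
    if len(missing) != 1:
        raise ValueError(f"ambiguous or no assignment: {sorted(missing)}")
    return missing.pop()


# Hypothetical scores for the Wcor15B primer pair (illustrative only)
scores_2b = {"N2A-T2B": True, "N2B-T2D": False, "N2D-T2A": True}
print(locate_gene(scores_2b))  # 2B
```

The function raises an error when zero or several chromosomes are implicated, which mirrors the need for exactly one nullisomic line to lose the band before a location is assigned.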
Each Wcor15 cDNA clone contained an ORF of 441 nucleotides that putatively encoded a polypeptide of 147 amino acid residues (Fig. 2). They shared common characteristics such as a sorting signal predicted to target them to the chloroplast 37. The properties of the N-terminal ends of the Wcor15-2A, Wcor15-2B and Wcor15-2D polypeptides were determined. They have conserved regions coding for the putative chloroplast signal peptide and its putative cleavage site (Fig. 2), and share a common intron insertion site and a 14-3-3 protein recognition motif that could interact with 14-3-3 proteins. The binding of these proteins to the signal peptides is essential for chloroplast precursor proteins to be efficiently transported into chloroplasts 39,40. We also found that WCOR15-2A, WCOR15-2B and WCOR15-2D contain the 11-mer amino acid motifs and α-helix structures that characterize LEA Group 3 proteins 41. Together, these findings suggest that WCOR15-2A, WCOR15-2B and WCOR15-2D might be chloroplast-targeted LEA3 proteins.
Sequence analysis of the Wcor15-2A, Wcor15-2B and Wcor15-2D genes in hexaploid wheats (AABBDD, T. aestivum and T. spelta). The Wcor15A primer pair was used to amplify Wcor15-2A from 106 individual hexaploid wheats, including winter wheats, spring wheats and T. spelta from different geographical regions (Table 3). All the hexaploid wheats studied yielded the expected PCR product of approximately 1.8 kb. To further analyze Wcor15-2A, we randomly sequenced 100 samples (Supplementary Table S1). All sequences were identical (Supplementary Table S2), suggesting that the Wcor15-2A gene is highly conserved in hexaploid wheat.
The complete sequence of the Wcor15-2B gene was also amplified from these 106 hexaploid wheats using the Wcor15B primer pair. The PCR products from 54 wheats were sequenced (Supplementary Table S1). The Wcor15-2B sequences were highly conserved in the 54 hexaploid wheats (Supplementary Table S3). Fifteen substitutions (13 in the 5′ upstream region, 2 in the 3′ downstream region) and two indels (one in the 5′ upstream region, the other in the intron) occurred in the non-coding regions; however, no significant differences were found in the two exons among the 54 Wcor15-2B sequences (Supplementary Fig. S2). The deduced amino acid sequences shared 100% identity.
Wcor15-2D was also characterized in these 106 hexaploid wheat accessions. All of the samples yielded PCR products of ~2 kb. The PCR products from 33 wheat varieties were sequenced (Supplementary Table S1). No variation was found among the 33 hexaploid wheat varieties (Supplementary Table S4), indicating that the Wcor15-2D gene is highly conserved in hexaploid wheat.
Our results indicated that the three genes Wcor15-2A, Wcor15-2B and Wcor15-2D derived from the three homoeologous 2A, 2B and 2D chromosomes were highly conserved among hexaploid wheat varieties from different geographical regions.
Sequence analysis of the Wcor15-2A, Wcor15-2B and Wcor15-2D genes in tetraploid species (AABB). DNA from 10 tetraploid materials, comprising three T. dicoccoides, three T. dicoccum, three T. durum and one T. carthlicum (Table 3), was amplified using the primer pairs Wcor15A, Wcor15B and Wcor15D (Table 2). As expected, only Wcor15A and Wcor15B amplified PCR products of the expected size (Fig. 3a). The Wcor15D primer pair did not give rise to any amplification product (Fig. 3a), confirming the absence of Wcor15-2D from the tetraploid wheat genome.
The Wcor15-2A sequences from the A genome of the 10 tetraploid accessions (AABB) (Table 4) are identical to the Wcor15-2A sequence from hexaploid wheats (Supplementary Table S5), suggesting that the Wcor15-2A gene is highly conserved both within tetraploid wheats and between tetraploid and hexaploid wheats.
Alignment of the 10 Wcor15-2B sequences from tetraploid wheat showed a number of single-nucleotide substitutions among these sequences, the same pattern as observed for Wcor15-2B in the 54 hexaploid varieties (Supplementary Fig. S2), suggesting that diversification of Wcor15-2B did not occur between tetraploids and hexaploids during or after the second polyploidization.
Sequence analysis of the Wcor15-2A, Wcor15-2B and Wcor15-2D genes in diploid species (AA, SS and DD). To determine whether the Wcor15-2A, Wcor15-2B and Wcor15-2D genes changed between diploid and polyploid wheats, we sequenced these genes in a set of diploid wild relatives with genomes AA, SS and DD, respectively (Table 4). In all three accessions of T. urartu (genome AuAu) surveyed, the primer pairs Wcor15B and Wcor15D did not generate any amplification product (Fig. 3b), suggesting that the Wcor15-2B and Wcor15-2D sequences are absent from T. urartu. Amplicons were obtained from all three T. urartu accessions with the primer pair Wcor15A. The three sequences were identical (designated Wcor15-2A1) and showed 100% identity with the Wcor15-2A sequences from tetraploid and hexaploid wheats (Supplementary Table S5). The Wcor15A, Wcor15B and Wcor15D primer pairs failed to amplify DNA from T. monococcum and T. boeoticum (Fig. 3b). To obtain the Wcor15 gene from T. monococcum and T. boeoticum, we designed a new primer pair, Wcor15s, located close to the coding region, based on the previously reported Wcor15 gene (GenBank: AB095006). The three Wcor15 sequences obtained (Fig. 3c) were identical and were designated Wcor15-2A2; they contained a complete coding region. The identity between Wcor15-2A2 and Wcor15-2A was 97.87% at the DNA level (Supplementary Fig. S3 and Table S5).
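A percent-identity figure such as the one above can be computed from a pairwise alignment as the share of alignment columns that carry the same base. The sketch below is a minimal Python illustration. It assumes that gap columns count as mismatches over the full alignment length, which is one common convention and not necessarily the one used in this study, and the two short sequences are made-up toy data rather than Wcor15 sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over all alignment columns; '-' marks a gap.

    Assumption: gap columns count as mismatches (other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy aligned sequences (not real data): one substitution in ten columns
toy_a = "ATGGCTCTCA"
toy_b = "ATGGCACTCA"
print(round(percent_identity(toy_a, toy_b), 2))  # 90.0
```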
In all eleven accessions of the Sitopsis species surveyed (1 Ae. bicornis SbSb, 1 Ae. longissima SlSl, 3 Ae. sharonensis SshSsh, 3 Ae. searsii SsSs and 3 Ae. speltoides SS) (Table 3), the primer pairs Wcor15A, Wcor15B and Wcor15D did not generate any amplification product (Fig. 3d). To obtain the Wcor15 gene from the Sitopsis section, we again employed the primer pair Wcor15s, which amplifies only the coding region of the Wcor15 genes without the 5′ upstream sequence (> 1 kb). Eleven Wcor15 sequences were obtained (Fig. 3c). Sequence analysis showed that all three Ae. speltoides accessions shared the same two exons of Wcor15-2B with tetraploid and hexaploid wheats. However, the intron of Wcor15-2B had two haplotypes in tetraploid and hexaploid wheats, one with a G deletion and the other with a G insertion at the same position, whereas all three Ae. speltoides accessions had only one haplotype, the G deletion in the intron (Supplementary Fig. S4). The Wcor15-2B sequences from Ae. bicornis (Q03-021), Ae. longissima (Q03-004), Ae. sharonensis (PI584395, PI584408 and PI584406) and Ae. searsii (PI599142, PI599124 and PI599126) showed 100% identity with each other. Nevertheless, besides the G indel mentioned above, they differed from the Ae. speltoides gene at many bases: 2 in the first exon, 7 in the intron and 5 in the second exon (Supplementary Fig. S4). These results suggest that Ae. speltoides is the most likely donor of Wcor15-2B and that diversification of the gene occurred during the first polyploidization.
From diploid Ae. tauschii (As 80, As 77, As 2392, As 2386, As 2387 and As 2388), six Wcor15-2D sequences were cloned with the primer pair Wcor15D (Table 4). The six Wcor15-2D sequences fell into two types: (I) As 2386, As 2387 and As 2388, with 100% identity, and (II) As 80, As 77 and As 2392, which differed by only one base substitution in the upstream non-coding region. The Wcor15-2D sequences from As 2386, As 2387 and As 2388, which belong to Ae. tauschii subsp. strangulata, showed 100% identity with Wcor15-2D from hexaploid wheat varieties (Supplementary Table S6). The coding region sequences from Ae. bicornis (Q03-021), Ae. longissima (Q03-004), Ae. sharonensis (PI584395, PI584408 and PI584406) and Ae. searsii (PI599142, PI599124 and PI599126) are the same as the sequences from As 80, As 77 and As 2392 of Ae. tauschii subsp. tauschii. The primer pairs Wcor15A and Wcor15B failed to amplify a product from these species (Fig. 3e). The results suggest that Ae. tauschii subsp. strangulata is the donor of the Wcor15-2D gene in hexaploid wheat.
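Sorting accessions into sequence types, as done above for the six Ae. tauschii accessions, amounts to grouping names by identical sequence. The Python sketch below shows one way to do this. The accession names are taken from the text, but the short sequences attached to them are invented placeholders that differ by a single substitution; they are not the real Wcor15-2D sequences.

```python
from collections import defaultdict


def group_haplotypes(seqs):
    """Group accession names that share an identical sequence.

    seqs maps accession name -> sequence string. Returns a sorted list of
    sorted name lists, one list per distinct sequence (haplotype).
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq.upper()].append(name)
    return sorted(sorted(names) for names in groups.values())


# Placeholder sequences (illustrative only): the two types differ at one base
toy_seqs = {
    "As 2386": "ACGTTGCA", "As 2387": "ACGTTGCA", "As 2388": "ACGTTGCA",
    "As 80": "ACGTAGCA", "As 77": "ACGTAGCA", "As 2392": "ACGTAGCA",
}
for names in group_haplotypes(toy_seqs):
    print(names)
```

Each resulting group can then be compared against the polyploid sequence to see which haplotype, if any, matches it exactly.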

Discussion
Hexaploid bread wheat is believed to have originated through one or more hybridization events 16-18. The origin of the A, B and D genomes of bread wheat has been studied intensively. Understanding the origin of hexaploid wheat would not only clarify its genetic diversity but also help to broaden the genetic basis for wheat breeding 23,42. Previous studies have demonstrated that sequence data from conserved genes can be used to study the evolution of gene families in different species 43-45. In this study, we report the utility of the Wcor15 sequence for identifying the progenitors of tetraploid and hexaploid wheats and for tracing the evolution of their close relatives.
Wcor15 is a member of the Cor gene family; it encodes a chloroplast-targeted protein upon exposure to low temperature and plays an important role in the cold hardiness of wheat 37,46-51. Based on previous research on the Wcor15 gene (GenBank: AB095006) 37, we found that AB095006 is located on chromosome 2AL and named it Wcor15-2A. In addition to this gene, we cloned the other two homoeologous Wcor15 sequences (Wcor15-2B and Wcor15-2D) from wheat chromosomes 2BL and 2DL, respectively. Gene characterization showed that the three homoeologous Wcor15 genes may encode chloroplast-targeted LEA3 proteins, which is consistent with previous characterization of Wcor15-2A 37,41.
The Wcor15 gene is a good candidate gene for investigating the donors of the A, B and D genomes. The three homoeologous Wcor15 sequences from wheat genomes A, B and D, respectively (Table 1), differed from one another (Supplementary Fig. S1). Each of the three sequences was highly conserved within the respective diploid (Supplementary Figs S4-S6), tetraploid (Supplementary Figs S4 and S5) and hexaploid (Supplementary Figs S7-S10) wheats. Wcor15-2A and Wcor15-2B on the A and B genomes were very stable from diploid (AA, BB) to tetraploid (AABB) (Supplementary Figs S4 and S5), and from tetraploid (AABB) to hexaploid (AABBDD) (Supplementary Figs S4 and S5). Wcor15-2D is also highly conserved from diploid (DD) to hexaploid (AABBDD) (Supplementary Fig. S6). Comparison of the conserved Wcor15 gene can therefore provide evidence on the origin of the A, B and D genomes of common wheat.
The diploid wheats carrying the A genome included T. urartu (genome Au), T. monococcum (genome Am) and T. boeoticum (genome Am). To investigate the evolutionary relationships of the Wcor15-2A genes between diploid and polyploid wheats, the sequences from T. urartu, T. monococcum, T. boeoticum, and tetraploid and hexaploid wheats were compared. The six genes from diploid wheats (genome AA) were classified into two types (Supplementary Fig. S11). The three T. urartu accessions (PI428222, PI428260 and PI428266) were type I (Wcor15-2A1). The two T. monococcum accessions (Mo4 and TL) and one T. boeoticum accession (Bo8) were type II (Wcor15-2A2). Compared with the Wcor15-2A2 sequence, the Wcor15-2A1 sequence showed much higher identity (100%) with the Wcor15-2A sequences from tetraploid and hexaploid wheats, suggesting that T. urartu might be the direct donor of Wcor15-2A in common wheat and that the Wcor15-2A gene from the A genome did not mutate during the two sequential allopolyploidization events from T. urartu to tetraploid and hexaploid wheats. This result is consistent with previous studies 23,24,52. However, the absence of an amplicon from T. monococcum and T. boeoticum with the Wcor15A primer pair suggests that the non-coding regions of Wcor15-2A1 differ markedly from those of Wcor15-2A2. Alignment of the coding regions also revealed variation between Wcor15-2A2 and Wcor15-2A1 from T. urartu (Supplementary Fig. S3).
In the region amplified around the coding sequence, the Wcor15-2B sequences from different tetraploid and hexaploid wheats fell into two groups defined by the insertion or deletion of a G nucleotide in the intron. All three Ae. speltoides sequences shared 100% identity with one another and differed from tetraploid and hexaploid wheats only in carrying the G deletion alone in the intron. On the other hand, no amplicon was obtained from Ae. speltoides with the Wcor15B primer pair, suggesting that the non-coding regions of Wcor15-2B may differ markedly between Ae. speltoides and tetraploid and hexaploid wheats. Our results suggest that Ae. speltoides might be the direct donor of Wcor15-2B in tetraploid and hexaploid wheat varieties, and that the non-coding region of the Wcor15-2B gene from the B genome may have mutated during the first polyploidization from Ae. speltoides to tetraploid wheat; however, no change occurred in this gene during the second allopolyploidization from tetraploid to hexaploid wheat.
The Wcor15 coding region of Ae. tauschii subsp. tauschii is the same as the sequences from the S genome species Ae. bicornis, Ae. longissima, Ae. sharonensis and Ae. searsii. Mayer et al. 57 also reported that Ae. sharonensis was much closer to Ae. tauschii than to Ae. speltoides. Analysis of the multispecies coalescent species tree for diploid Aegilops and Triticum suggested that Ae. bicornis, Ae. longissima, Ae. sharonensis and Ae. searsii are more closely related to Ae. tauschii ssp. tauschii than to Ae. speltoides 58. However, no amplicon was obtained from Ae. bicornis, Ae. longissima, Ae. sharonensis and Ae. searsii with the Wcor15D primer pair, indicating that the non-coding region of Wcor15-2D in these four species differs markedly from that of Ae. tauschii ssp. tauschii.
This paper examined the evolutionary relationships of Wcor15 in diploid, tetraploid and hexaploid wheats during wheat allopolyploidization (Fig. 4). Triticum urartu, Ae. speltoides and Ae. tauschii subsp. strangulata are the most likely donors of the Wcor15-2A, Wcor15-2B and Wcor15-2D loci in common wheat, respectively. The Wcor15 genes from subgenomes A and D were highly conserved, with no insertions or deletions of bases during the evolution from diploid to tetraploid and hexaploid wheat. In contrast, the Wcor15-2B gene mutated only during the first allopolyploidization event.

Materials and Methods
Wheat germplasm. One hundred and six hexaploid wheats (genome AABBDD) were used in this study: 4 varieties from the Winter wheat region of North China (WWRNC), 24 varieties from the North China plain sub-region of the Yellow & Huai river winter wheat region (NCPSR), 28 varieties from the North Huai river plain sub-region of the Yellow & Huai river winter wheat region (NHRPSR), 7 varieties from the West upland sub-region of the Yellow & Huai river winter wheat region (WUSR), 3 varieties from the Jiaodong upland sub-region of the Yellow & Huai river winter wheat region (JUSR), 11 varieties from the Winter wheat region of the middle and lower reaches of the Yangtze river (WWR), 7 varieties from the Southwestern winter wheat region (SWWR), 13 introduced foreign varieties (IWVF) 59, 5 varieties from the Spring wheat region of North China (SWRNC) and 4 T. spelta accessions. In addition, 10 tetraploid accessions (AABB) and 23 diploid accessions (AA, BB and DD) were used (Table 3).
DNA extraction, primer design, PCR and sequencing. Genomic DNA was extracted from young leaves of 10-day-old seedlings using the Easypure plant Genomic DNA Kit (Sangon Biotech. Shanghai, China). Genome-specific primers were designed for each of the homoeologous Wcor15 genes (Table 2) using the software Primer Premier Version 5.0, and were synthesized by Shanghai Sangon Biological Technology Company.
PCR reactions were performed in a total volume of 20 μl, containing 12.8 μl ddH2O, 2.0 μl 10 × PCR buffer (with Mg2+), 2.0 μl dNTPs (2.5 mM), 0.5 μl of each primer (10 mM), 2.0 μl genomic DNA and 0.2 μl Taq DNA polymerase (5 U/μl). Amplifications were performed using a standard touchdown PCR protocol with the appropriate annealing temperature. Each PCR was repeated five times to give a total of 100 μl.
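As a quick arithmetic check of the reaction mix, the listed component volumes can be summed and scaled to the five pooled reactions. The Python sketch below uses the volumes given in the text; the names "forward primer" and "reverse primer" are an assumption standing in for "each primer" of a primer pair, and the helper function is purely illustrative.

```python
# Volumes in microlitres for one 20 µl reaction, as listed in the text.
# "forward primer" / "reverse primer" are assumed labels for "each primer".
REACTION_UL = {
    "ddH2O": 12.8,
    "10x PCR buffer (with Mg2+)": 2.0,
    "dNTPs (2.5 mM)": 2.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "genomic DNA": 2.0,
    "Taq DNA polymerase (5 U/µl)": 0.2,
}


def scale_mix(per_reaction, n_reactions):
    """Scale every component volume to n_reactions (rounded to 0.01 µl)."""
    return {name: round(vol * n_reactions, 2) for name, vol in per_reaction.items()}


total = round(sum(REACTION_UL.values()), 2)
pooled = scale_mix(REACTION_UL, 5)
print(total)                            # 20.0
print(round(sum(pooled.values()), 2))   # 100.0
```

The components add up to the stated 20 μl per reaction, and five reactions give the stated 100 μl.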
All PCR products were directly sequenced. Of each 100 μl of pooled PCR product, 50 μl was sequenced by Shanghai Sangon Biological Technology Company and the other 50 μl by Huada Biotech Company in Beijing. To guarantee sequence accuracy, DNA sequencing was repeated three times.