Genome-wide characterization of microsatellites in Triticeae species: abundance, distribution and evolution

Microsatellites are an important constituent of plant genomes and are distributed across the entire genome. In this study, genome-wide analysis of microsatellites in 8 Triticeae species and 9 model plants revealed that microsatellite characteristics were similar among the Triticeae species. Furthermore, genome-wide microsatellite markers were designed in wheat and then used to analyze the evolutionary relationship between wheat and other Triticeae species. Aegilops tauschii was found to be the closest species to Triticum aestivum, followed by Triticum urartu, Triticum turgidum and Aegilops speltoides, while Triticum monococcum, Aegilops sharonensis and Hordeum vulgare showed relatively lower PCR amplification effectivity. Additionally, when these markers were searched against the different chromosomes of wheat, a significantly higher PCR amplification effectivity was found for chromosomes of the same subgenome than for the homoeologous chromosomes. After a rigorous screening process, a total of 20,666 markers showed high amplification and polymorphic potential in wheat and its relatives; these were integrated with the publicly available wheat markers and then anchored to the genome of wheat (CS). This study not only provides a useful resource for SSR marker development in Triticeae species, but also sheds light on the evolution of polyploid wheat from the perspective of microsatellites.

genes, but also generates heterozygosity and novel interactions leading to genetic and phenotypic variability 12. Numerous studies have been devoted to understanding the mechanisms and evolution of polyploidy in T. aestivum 13-15, while the roles of microsatellites have not been well understood.
Initially, SSR markers were developed by screening SSR-positive clones in genomic DNA libraries, which was difficult and laborious. Later, bacterial artificial chromosome (BAC) end sequences 16 and expressed sequence tags (ESTs) 5,9,17,18 were widely used for SSR marker development in most plant species, which lowered cost and effort but could yield multiple sets of markers at the same locus. Thanks to the rapid development of genome sequencing technology, the availability and analysis of nearly complete genome sequences from many organisms have provided insight into the distribution and putative function of microsatellites, as well as into marker development 19-25. Genome sequences are now available for 8 Triticeae species, including Triticum aestivum and seven of its relatives, which provides the opportunity to characterize the distribution and frequency of microsatellites in Triticeae genome-wide.
In the present study, the genome sequences of 8 Triticeae species and 9 model plants were mined for the abundance and composition of microsatellites. The nature of these microsatellites was analyzed and compared based on the genome sequence of each species. Primers flanking the microsatellite motifs in T. aestivum were then designed and used to analyze the relationship between wheat and its relatives. To improve the utilization efficiency of the newly developed markers, only the longer perfect repeats (SSRs ≥ 20 nucleotides in length) that were not mononucleotide repeats were selected for the final genome-wide SSR marker development in wheat. In silico analysis of the conservation and cross-transferability of these markers among barley and the close relatives of wheat was also carried out to study their utility in comparative mapping of genes and genomes. Our study not only provides a rich resource for SSR marker development in Triticeae species, but also provides important information on the evolution of polyploid wheat.

Results and Discussion
Characteristics and frequency of microsatellites in Triticeae species. In the current study, microsatellite distribution was characterized and compared in 18 genomes. A total of 4,763,266 microsatellites were identified, with an overall frequency of 126.83 per Mb (Table 1). The variation in the microsatellite frequencies of these Poaceae species was 3.77-fold, which was highly similar to that reported for angiosperm species (3.7-fold) 26. Moreover, the Poaceae species with large genomes had a low or moderate microsatellite frequency, which agreed well with the significantly negative correlation between microsatellite frequency and genome size (r = −0.464) 26. Interestingly, GC content also showed a significantly negative correlation with microsatellite number or frequency (r = −0.510).
The microsatellite characteristics (e.g., the frequency and distribution of microsatellites with respect to motif length, type and repeat number) were generally similar among the Poaceae species. Comparative analysis of the occurrences of the various microsatellites revealed that 80% of the Poaceae species were rich in mononucleotide repeats (MNRs), while wheat (T. aestivum) and its progenitors (T. urartu and A. tauschii) had higher frequencies of DNRs (Table 1), which differs from a previous report 26. The difference in microsatellite characteristics could be due to the different criteria used to identify SSRs during database mining. For example, mononucleotide repeats of 10 and 11 copies, which accounted for 67.63% of all mononucleotide repeats, were retained in our study but removed by Shi et al. (2013). The distributions of the dominant/major motif types of microsatellites are shown in Figure S1. Specifically, the dominant motif types were A/T, AG/CT, AAG/CTT, AAAT/ATTT, AAAAG/CTTTT and AGATAT/ATATCT. The dominant/major motif type was almost identical across species for MNRs, DNRs and PNRs, whereas a shared dominant motif was less common for TNRs, TeNRs and HNRs. For example, AG/CT, the dominant motif type for DNRs, was dominant in 86.67% of the Poaceae species, while AAAT/ATTT for TeNRs was dominant in only 40.00%. Furthermore, microsatellite abundance increased significantly as the motif repeat number decreased, which might be because longer repeats have higher mutation rates and hence are more unstable.
Genic microsatellites, derived from transcripts, have some intrinsic advantages over genomic microsatellites because of their higher level of transferability to related species and the higher quality and robustness of their amplification products 9,27. We characterized the distribution of microsatellites in coding sequences across 8 Poaceae species and 3 other plants (Table 2). The results revealed an overall lower frequency in coding sequences than in noncoding regions, which might be attributed to negative selection against frameshift mutations in coding regions 28. It should be noted that 40.26% of rice genes contained microsatellites and more than half of them (21.46%) were located in the coding region. On the other hand, fewer microsatellites were found in wheat and its two progenitors (T. urartu and A. tauschii), accounting for 10.62%, 5.95% and 6.85% of their total genes, respectively. Gene Ontology (GO) enrichment analyses of genes containing microsatellites in the coding region revealed enrichment in biological regulation (GO: 0065007) and pigmentation (GO: 0043473) for Biological Process, and in binding (GO: 0005488) and transcription regulator activity (GO: 0030528) for Molecular Function across these plants, whereas microsatellites seemed significantly less abundant in genes for catalytic activity (GO: 0003824) (Table S1). Analysis of the distribution of microsatellites across the three subgenomes of wheat revealed that microsatellites were more abundant in the B and D genome chromosomes than in the A genome chromosomes (B > D > A) (Table S2). Furthermore, high similarity was observed for several microsatellite characteristics among the different chromosomes of wheat. MNRs contributed the largest proportion of SSRs, and PNRs and HNRs contributed only a very small part. Among MNRs, more than 90% of the wheat chromosomes were rich in the A/T type, while G/C was scarce.
In the DNR and TNR categories, the dominant motif type was perfectly uniform among the wheat chromosomes, with AG/CT and AAG/CTT being the most frequent motifs, respectively. There was more variation for TeNRs, PNRs and HNRs, each of which had 3 to 5 dominant motif types among the different chromosomes. For example, for TeNRs, AAAT/ATTT, AATT/AATT, AGAT/ATCT, ACAT/ATGT and ATGC/ATGC were the most frequently observed motifs in different wheat chromosomes (Fig. 1). This might be because the frequency of TeNRs, PNRs and HNRs was very low in all the wheat chromosomes, so that differences in their motif-wise distribution were not significant. Although most microsatellite characteristics were highly similar among chromosomes, the repeat densities varied between wheat chromosomes. The highest frequency of microsatellites was found in chromosome 2D (131.03 SSR/Mb), followed by 3B (120.43 SSR/Mb), whereas the lowest was observed in 3A (82.48 SSR/Mb), a variation of 1.59-fold (Table S2). The distributions of microsatellites in coding and non-coding regions were also compared among the wheat chromosomes. The whole genome and the non-coding sequences (including intergenic and intron sequences) were clearly similar, but the coding sequences were obviously different. Intron regions showed the highest frequency of microsatellites, whereas coding regions showed the lowest, with a variation of 1.81-fold. Intergenic sequences and the whole genome had similar frequencies of 106.92 and 104.99 per Mb, respectively (Fig. 2). Furthermore, coding sequences had the highest variation among wheat chromosomes (4.35-fold), ranging from 20.74 (5A) to 90.23 (2B), whereas there were only 1.59-, 1.59- and 1.62-fold changes in the whole genome, intergenic sequences and intron sequences, respectively (Table S2).
Besides microsatellite frequency, the distribution of microsatellite motif types also differed significantly between coding and non-coding regions. In all of the wheat chromosomes, TNRs were very frequent in coding sequences, the most common among them being CCG/CGG (Fig. 1). In non-coding sequences (including intergenic and intron sequences), by contrast, MNRs were predominant and AAG/CTT was the most prevalent TNR motif type. These results agreed well with earlier reports in rice 19, Brachypodium 21 and switchgrass 24.

Based on the assembled pseudochromosomes of bread wheat, the genomic distribution of microsatellites and its relation to the annotated genes and TEs were investigated. Greater physical densities of microsatellites were found in distal chromosomal regions than in central regions, similar to previous reports in Gossypium 23,29-31 and Brassica crops 22. Specifically, the genomic distribution of microsatellites was positively correlated with genes and negatively correlated with TEs (Fig. 3). For wheat, the frequencies of microsatellites in 1-Mb genomic intervals were significantly positively correlated with genes (r = 0.61) and negatively correlated with TEs (r = −0.50). These results are similar to previous reports that microsatellites are associated with gene sequences in plants 16,31. Furthermore, the flanking sequences (1000 bp) on both sides of each microsatellite were extracted and analyzed for repeat elements with the RepeatMasker program (RepeatMasker libraries version: rm-20120418). Compared with the whole genomic sequence, we observed an 11.90% reduction in the content of class I elements (retroelements) in these flanking sequences, whereas class II elements (DNA transposons) increased by 45.03% (Table 3). Gypsy long terminal repeat (LTR) retrotransposons were reduced by 22.05% and accounted for the greatest proportion of the reduction in class I elements, while CMC-EnSpm was the most enriched repeat element type among class II elements (increased by 45.99%).

Development and evaluation of genome-wide SSR markers in wheat and its relatives.
The 433,362 sequences containing perfect microsatellites were screened for suitable forward and reverse primer pairs on either side of the flanking genomic sequences. A total of 402,455 microsatellite markers were designed from the genomic sequences of wheat, a primer design success rate of 92.87%, which was similar to that documented in foxtail millet 20 and Gossypium species 23. These microsatellite markers were unevenly distributed over the 21 chromosomes of T. aestivum, with an average marker density of 84.49 markers per Mb.
The high-density microsatellite markers were then used to investigate the relationships between wheat and its relatives with in silico PCR. As expected, closer relatives had higher PCR amplification effectivity. Because all the other species studied here belong to the genus Triticum (wheat) or to the genus of its donors (Aegilops), it was expected that H. vulgare showed the lowest PCR amplification effectivity. A. tauschii was found to be the closest species to T. aestivum, followed by T. urartu and T. turgidum (Table 4). In the A-genome lineage, T. monococcum (A m genome) and T. urartu (A u genome) both carry the A genome, and both have been implicated as the source of the A genome in polyploid wheat 32. It has been argued that T. urartu and T. monococcum are the same species and (or) that the source of the A genome in polyploid wheat is T. urartu rather than T. monococcum 32. In general, the majority of studies support T. urartu as the donor of the A genome to polyploid wheat 9,32-34. In the present study, T. monococcum and T. urartu showed significantly different outcomes for markers derived from the wheat A subgenome. In T. urartu, nearly half of the markers (47.12%) developed from the wheat A subgenome could amplify prominent PCR products, while the figure was only 14.92% for T. monococcum, a 3.16-fold difference. This result agreed well with a previous report that T. monococcum and T. urartu form two separate groups for the A genome 9, and also confirmed that T. urartu is the most probable ancestor of the A genome of polyploid wheat 32,33. The origin of the B genome of polyploid wheat remains controversial, and nearly all the Sitopsis genome (S genome) species have been suggested as donors of the B genome, including Aegilops speltoides (genome S), A. bicornis (genome S b), A. longissima (genome S l), A. searsii (genome S s), and A. sharonensis (genome S sh) 32.
However, several studies indicated that the B genome of wheat has diverged significantly from all potential extant wild progenitors, although it is closer to A. speltoides than to any of the other Sitopsis genomes 35,36. Hence, the majority of evidence suggests that A. speltoides is the closest living relative of the B genome donor species 9,10. In the present study, two B-genome lineage species (A. speltoides and A. sharonensis) were investigated to clarify their relationship to polyploid wheat. Our results indicated that, with respect to the origin of the B genome, T. aestivum was more closely related to A. speltoides, while A. sharonensis showed a higher PCR amplification effectivity for markers derived from the wheat D subgenome (Table 4). However, only 15.94% of the markers derived from the wheat B subgenome amplified successfully in A. speltoides, which was significantly lower than the rates for T. urartu (47.12%, 2.96-fold change) and A. tauschii (72.34%, 4.54-fold change). Furthermore, the amplification rate in A. speltoides was only slightly higher for markers derived from the wheat B subgenome than for those derived from the A and D subgenomes. This result supports the view that A. speltoides is more likely than the other species to be the living relative of the B genome donor, but that it may not be the direct B genome donor of T. aestivum.
In the D-genome lineage, the data obtained in this study also supported the view that A. tauschii was the donor of the D genome of hexaploid wheat 9,11. In addition, it was interesting to find that markers derived from the wheat D subgenome had a significantly higher amplification rate in their donor than markers derived from the A subgenome had in theirs, a 1.54-fold difference (Table 4), which was mainly due to the characteristics of microsatellites and the particular formation process of hexaploid wheat. Previous studies showed that microsatellites are subject to a high rate of single-motif insertion and deletion mutations through replication slippage 37,38, indicating that microsatellites are in a constant state of change. Compared with the D subgenome, the A and B subgenomes have diverged from their progenitors for longer and, in particular, have experienced selective pressure through two polyploidization events. The later addition of the D genome to hexaploid wheat might have allowed it to retain more of the microsatellite characteristics of its ancestral species. On the other hand, a previous study showed that there is little genetic differentiation among the D genomes of T. aestivum, which appear to share a single D genome gene pool in the evolution of T. aestivum 11. The A. tauschii accession used for genome sequencing (AL8/78) has been demonstrated to be one of the closest accessions to the wheat D subgenome 39. Compared with the A subgenome donor, this closer relationship, combined with the bottleneck, might be another factor contributing to the higher amplification rate for markers derived from the wheat D subgenome.
In addition, all the developed genome-wide microsatellite markers were subjected to in silico PCR analysis among the different wheat chromosomes. Nearly all the microsatellite markers (average success rate 99.88%) yielded products in their chromosome of origin, while only 10.01% of the markers could successfully amplify in other chromosomes (Fig. 4). We then calculated, for each chromosome, the average amplification rate in its homoeologous chromosomes and in the other chromosomes of its subgenome. It is widely accepted that the three homoeologous chromosomes of wheat are more similar to one another than to the other chromosomes of the same subgenome. However, a significantly higher PCR amplification effectivity was found in chromosomes of the same subgenome than in the homoeologous chromosomes, with the exception of four chromosomes (1D, 2D, 3D and 4D) (Fig. 5). For example, for wheat chromosome 1A, an average amplification rate of 13.19% was found in 2A to 7A (subgenome chromosomes), while it was only 9.32% for its homoeologous chromosomes (1B and 1D). Hexaploid wheat is a relatively young species, so each subgenome may retain more of its progenitor's microsatellite characteristics than it has acquired since the hexaploid emerged. Furthermore, a previous study showed that the number of putative translocation events in the wheat D subgenome was about half of that in either the A or B subgenome, and that the majority of the translocations occurred among chromosomes of the same subgenome 40. Compared with the D subgenome (especially 1D, 2D, 3D and 4D), the more frequent interchromosomal exchanges among chromosomes of the same subgenome for A and B might contribute to their higher amplification rate in subgenome chromosomes than in homoeologous chromosomes.
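The grouping behind this comparison can be illustrated with a minimal Python sketch. It assumes that chromosome names follow the usual group-plus-subgenome convention (e.g. 1A) and that the pairwise amplification rates are already available as a dictionary; the function names and the input format are hypothetical and are not taken from the original analysis.

```python
from statistics import mean

SUBGENOMES = "ABD"
GROUPS = range(1, 8)  # homoeologous groups 1 to 7


def partner_chromosomes(chrom):
    """Return (homoeologous, same_subgenome) chromosome lists for a name such as '1A'."""
    group, sub = int(chrom[0]), chrom[1]
    homoeologous = [f"{group}{s}" for s in SUBGENOMES if s != sub]
    same_subgenome = [f"{g}{sub}" for g in GROUPS if g != group]
    return homoeologous, same_subgenome


def average_rates(chrom, rates):
    """Average amplification rate (%) of markers from `chrom` in each partner set.

    `rates` is a hypothetical dict mapping (source, target) chromosome pairs
    to the percentage of source markers that amplify in the target.
    """
    homoeologous, same_subgenome = partner_chromosomes(chrom)
    homo_avg = mean(rates[(chrom, t)] for t in homoeologous)
    sub_avg = mean(rates[(chrom, t)] for t in same_subgenome)
    return homo_avg, sub_avg
```

For chromosome 1A, `partner_chromosomes("1A")` returns 1B and 1D as the homoeologous set and 2A to 7A as the subgenome set, matching the example given in the text.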
On the other hand, to improve utilization efficiency, only the longer perfect repeats (SSRs ≥ 20 nucleotides in length) that were not mononucleotide repeats were selected for the final genome-wide SSR marker development in wheat. Mononucleotide repeats were not considered because of the difficulty of distinguishing bona fide microsatellites from sequencing or assembly errors, and because (A/T)n repeats in coding regions may be confused with polyadenylation tracts. SSR markers derived from longer perfect repeats (SSRs ≥ 20 nucleotides in length) have been demonstrated experimentally to be highly polymorphic in many organisms, such as human 41 and rice 42. Finally, a total of 61623 microsatellite markers were designed from the selected genomic sequences of wheat, with a success rate of 87.36%. The majority of the markers were derived from intergenic regions (55978, 90.84%), followed by intron regions (4416, 7.17%) and exon regions (1229, 1.99%) (Fig. 6).
Bread wheat is hexaploid with 21 pairs of chromosomes, derived from a combination of three diploid donor species via two polyploidization events. In bread wheat, SSR markers usually amplify multiple fragments from homoeologous DNA sequences, which can complicate or cause errors in genotype scoring. Therefore, all the newly developed SSR markers (61623) were subjected to in silico PCR analysis in the assembled genomic sequences of Triticum aestivum. A total of 20169 markers generated 1 in silico PCR product in CS, of which 18216 (90.32%), 1424 (7.06%) and 529 (2.62%) were from intergenic, intron and exon regions, respectively (Fig. 6). Furthermore, markers derived from exon regions (529, 43.43%) displayed the highest rate of unique single-allele amplification in silico, followed by those from intergenic regions (18216, 32.54%) and intron regions (1424, 32.25%). In contrast, 215 (0.35%), 5964 (9.68%), 2388 (3.88%) and 32394 (52.57%) markers generated 0, 2, 3 and ≥ 4 in silico PCR products from the survey sequences of CS, respectively. Similar results were found when these markers were searched against the close relatives of wheat (Fig. 6). Genetic and breeding studies have demonstrated that microsatellite markers generating one in silico PCR product can be especially useful 23. Therefore, all the 20169 newly developed SSR markers were retained as potential SSR markers in wheat and used for the subsequent evaluation of polymorphism in wheat and its relatives.
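The classification of markers by the number of in silico PCR products can be sketched in a few lines of Python. This is only an illustration under the assumption that the e-PCR output has already been reduced to a product count per marker; the marker identifiers and the dictionary input are hypothetical.

```python
from collections import Counter


def product_class(n_products):
    """Bin a product count into the classes used in the text: 0, 1, 2, 3 and >=4."""
    return ">=4" if n_products >= 4 else str(n_products)


def tally(product_counts):
    """Count markers per class.

    `product_counts` is a hypothetical dict: marker id -> number of
    in silico PCR products found in one genome.
    """
    return Counter(product_class(n) for n in product_counts.values())


def single_product_markers(product_counts):
    """Markers that generate exactly one in silico PCR product."""
    return sorted(m for m, n in product_counts.items() if n == 1)
```

Keeping the tally and the single-product selection separate makes it easy to apply the same functions to each relative in turn, since only one genome is searched at a time.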
To generate microsatellite markers with potential for direct use, we tested the polymorphism of 20,666 specific microsatellite markers (those generating 1 in silico PCR product in CS) between CS and w7984, using the genome sequencing data of w7984 (9.1 Gbp) 43. A total of 8894 markers generated 1 in silico PCR product in both CS and w7984, a proportion of 43.07% (8894/20666). To avoid genotyping errors due to random amplification, all the sequences with 1 in silico PCR product in w7984 (8894) were extracted and their microsatellite characteristics analyzed (Table 5). For a given primer pair, only amplicons flanking an SSR with the same basic motif as expected were retained and evaluated for polymorphism in CS and w7984. Finally, a total of 5478 newly developed SSR markers shared the same microsatellite type in CS and w7984, of which 3267 (59.64%) displayed polymorphisms (Table S2). The high-density SSR marker-based physical maps constructed in this study could be useful for the rapid selection of genome-wide SSR markers that are well distributed over these chromosomes for various genotyping applications (Figure S2). Furthermore, we tested the transferability of the 20666 newly developed microsatellite markers in wheat relatives and barley. To avoid random amplification and multiple fragments, transferable markers had to meet the following criteria: I) the marker should generate 1 in silico PCR product; II) the amplicon should flank an SSR with the same basic motif as expected. In silico analysis demonstrated that 5836 (28.24%) markers displayed a high level of transferability to at least one of the seven wheat relatives, of which 4066 (69.67%) markers displayed polymorphisms.

Figure 5. Different average amplification rates between wheat homoeologous chromosomes and its subgenome chromosomes revealed by the newly developed SSR markers. Note: all the developed genome-wide microsatellite markers derived from each chromosome were separately searched against the assembled genomic sequences of wheat with in silico PCR. The average amplification rates for the homoeologous chromosomes and for the subgenome chromosomes were calculated separately for each chromosome. For example, for wheat chromosome 1A, the average amplification rate for the homoeologous chromosomes was defined as the mean amplification rate in 1B and 1D, and that for the subgenome chromosomes as the mean in 2A to 7A.

Validation of SSR markers by PCR amplification. The newly developed SSR markers were validated in the wheat genome (CS) using 21 randomly selected SSR markers. Nineteen (90.48%) of the 21 primer pairs gave clear, successful and reproducible amplification of products of the expected size (Fig. 7A and Table S4), while 2 markers displayed a weak band at the expected position as a consequence of multiple-loci amplification. One reason might be that ePCR underestimates the complexity of wheat, so that more multiple-loci amplification occurs in real PCR than in ePCR. On the other hand, the incomplete wheat genome sequences used in this study might also contribute to underestimating the complexity of wheat. To evaluate the polymorphism and molecular diversity potential of the developed SSR markers, 8 validated microsatellite markers (3, 3 and 2 markers derived from intergenic, intron and exon regions, respectively) were amplified in 3 accessions of wheat (CS, w7984 and Opata) (Table S4). All the markers showed polymorphism among these 3 wheat accessions (Fig. 7C). Furthermore, 6 markers that were transferable to wheat relatives were validated by PCR amplification (Fig. 7C and Table S4). Thus, the high rate of successful amplification in wheat and its relatives demonstrated that the 20666 newly developed microsatellite markers for the hexaploid wheat genome are a useful resource for wheat genomics and molecular breeding, as well as for other Triticeae species.

Figure 6. Variation in the numbers of in silico PCR products with the newly developed SSR markers in wheat and its relatives. Note: the newly developed SSR markers were evaluated by searching against the assembled genomic sequences of wheat and its close relatives with in silico PCR. 1, 2, 3 and ≥ 4 correspond to markers that generated 1, 2, 3 and ≥ 4 in silico PCR products in the assembled genomic sequences, respectively. Panel labels: Exon Region, Intron Region, Intergenic Region, Whole genome.

Integration of the newly developed markers and other publicly available hexaploid wheat markers into the wheat genome sequence. Recently, SNP markers identified from whole-genome shotgun sequencing, transcriptome sequencing and genotyping by sequencing (GBS) have been widely used for high-throughput genotyping in wheat with technologies such as DArTSeq and the 90 K and 400 K SNP arrays. To enhance the newly developed microsatellite markers as a genomic resource for the wheat genetics and breeding community, we anchored the wheat 90 K SNPs, 400 K SNPs and other publicly available microsatellite markers to the same genomic sequence of CS. Finally, a total of 119,576 marker loci were anchored to the genomic sequence of CS, including 119,477 SNP marker loci (12,725 90 K and 106,752 400 K SNP markers) and 99 publicly available microsatellite markers (Table S5). Because the majority of these markers are now widely used or have been anchored to phenotypic maps, integrating them with the newly developed microsatellite markers allows the new markers to be associated immediately with traits targeted by breeders.

Conclusion
In the present study, we conducted a genome-wide analysis of microsatellites in 8 Triticeae species and 9 model plants. The origin, distribution and evolution of microsatellites in Triticum species were characterized and compared. Furthermore, in silico PCR of the microsatellite loci was used to analyze the relationships between wheat and its relatives, as well as between the wheat subgenomes and homoeologous chromosomes, which shed light on the evolution of polyploid wheat from the perspective of microsatellites. Additionally, 20,666 chromosome-specific SSR markers were developed, and their amplification efficiency and polymorphism were investigated in wheat (CS and w7984) as well as in other Triticeae species. Among them, 3267 and 4066 markers displayed polymorphisms between different wheat materials (CS and w7984) and in close relatives of wheat, respectively. Finally, the newly developed SSR markers were integrated with the publicly available wheat markers to increase the density of the wheat genetic map. Our study not only provides a rich resource for SSR marker development in wheat and other Triticeae species, but also provides important information on the evolution of polyploid wheat.

Materials and Methods
Sources of genome sequences. The
Identification of microsatellites and primer design. Genome sequences were searched for microsatellites using the default parameters of the MISA identification tool 27. The search criteria were: ten repeat units for mononucleotide repeats (MNRs), six repeat units for dinucleotide repeats (DNRs), and five repeat units for trinucleotide repeats (TNRs), tetranucleotide repeats (TeNRs), pentanucleotide repeats (PNRs) and hexanucleotide repeats (HNRs). Compound microsatellites were defined as ≥ 2 repeats interrupted by ≤ 100 bp. The forward and reverse primers flanking the identified microsatellite repeat motifs were designed in batches using the primer3_core program. Two Perl scripts, p3_in.pl and p3_out.pl, served as the interface modules for program-to-program data interchange between MISA and the primer design software Primer3 44. The major parameters for primer design were as follows: primer length of 18-27 bp, melting temperature of 57-63 °C, GC content of 30-70% and product size of 100-300 bp.
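The search criteria can be illustrated with a minimal Python sketch of a perfect-SSR finder. It is not MISA itself: it uses regular expressions, ignores compound microsatellites, and may miss repeats in some overlapping edge cases. The threshold of ten units for mononucleotide repeats is an assumption based on the MISA defaults and on the 10-copy mononucleotide repeats mentioned in the Results; the function names are hypothetical.

```python
import re

# Minimum repeat number per motif length (1 = mono ... 6 = hexa).
# The mononucleotide threshold of 10 is an assumption (MISA default).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_number) tuples for perfect SSRs (0-based start)."""
    seq = sequence.upper()
    found = []
    for unit_len, min_rep in min_repeats.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' reported as a dinucleotide
                found.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(found)
```

The primitive-motif check prevents a mononucleotide run from being counted again as a di-, tri- or tetranucleotide repeat, which is the behaviour expected of a motif-length classification.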

Statistical analysis and functional annotation. The repetitive elements were detected using the RepeatMasker program against the RepBase database (version: rm-20120418) with default parameters. TE families were classified as previously described 45. Each chromosome was divided into 1-Mb intervals for statistical analysis of the frequencies of microsatellites, genes and TEs. The Circos-0.67-7 software was used to visualize the frequencies of microsatellites, genes and TEs in the 21 wheat chromosomes 46. Statistical analyses were performed using SPSS Statistics 17.0 (SPSS Japan, Inc., Tokyo, Japan). Functional annotation of genes containing microsatellites was performed using Gene Ontology Tools.
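The interval-based analysis can be sketched as follows. This is a minimal illustration, assuming 0-based start coordinates for each feature and a plain Pearson correlation; the original statistics were computed in SPSS, and the function names here are hypothetical.

```python
from math import sqrt

WINDOW = 1_000_000  # 1-Mb interval, as described in the text


def window_counts(positions, chrom_length, window=WINDOW):
    """Count features per fixed-size window along one chromosome (0-based positions assumed)."""
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Applying `window_counts` to the microsatellite, gene and TE coordinates of a chromosome gives three lists of equal length, and `pearson_r` on any two of them corresponds to the kind of per-interval correlation reported in the Results.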
In silico evaluation of genome-wide SSR markers in wheat and its relatives. To improve the utilization efficiency of the newly developed SSR markers, only markers with longer perfect repeats (SSRs ≥ 20 nucleotides in length) that were not mononucleotide repeats were selected for in silico analysis of their polymorphism in wheat and its relatives. The software e-PCR-2.3.12 was used for in silico PCR analysis with the following parameters: word size 7, discontinuous word 0, margin 50 bp, 2 mismatches, 1 gap, and product size 100-300 bp 47. Only one genome was searched at a time. A primer pair was regarded as putatively polymorphic if it met the following criteria: I) it generated 1 in silico PCR product in any test genome; II) the amplicon flanked an SSR with the same basic motif as expected; III) the variation in PCR product length was derived from an alteration of the microsatellite motif. Hierarchical clustering was visualized as heatmaps in R using the gplots package (http://www.R-project.org), specifying average linkage and Pearson's correlation distance metric.
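The three criteria can be expressed as a small Python predicate. This is a sketch under several assumptions: criterion I is interpreted as exactly one product in both the reference and the test genome, criterion III as a length difference fully explained by a change in repeat number, and motifs are compared as plain strings without considering reverse complements or rotations. The `Amplicon` record and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    length: int   # in silico PCR product size in bp
    motif: str    # basic SSR motif found inside the amplicon
    repeats: int  # number of motif copies in the amplicon


def is_putative_polymorphic(expected_motif, ref_hits, test_hits):
    """Apply criteria I-III to the products of one primer pair in two genomes."""
    # I) exactly one in silico PCR product in each genome (assumed interpretation)
    if len(ref_hits) != 1 or len(test_hits) != 1:
        return False
    ref, test = ref_hits[0], test_hits[0]
    # II) the amplicon flanks an SSR with the expected basic motif
    if ref.motif != expected_motif or test.motif != expected_motif:
        return False
    # III) the product lengths differ, and the difference equals the
    #      change in repeat number times the motif length
    length_diff = test.length - ref.length
    repeat_diff = (test.repeats - ref.repeats) * len(expected_motif)
    return length_diff != 0 and length_diff == repeat_diff
```

Separating the single-product check from the motif and length checks mirrors the order of the criteria in the text, so markers with multiple products are rejected before any amplicon content is inspected.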

Validation of SSR markers by PCR amplification.
A total of 21 SSR primer pairs were synthesized to test PCR amplification in wheat and its close relatives. Genomic DNA of the selected materials was isolated from young leaves by a standard procedure 48. PCR amplification reactions were performed in a 20-µL volume that contained 1 µL template DNA (100 ng), 0.5 µL of each primer (10 µM), 1.6 µL dNTP (2.5 mM each), 1.6 µL MgCl2 (25 mM), 0.2 µL Taq DNA polymerase (5 U/µL), and 2 µL Taq buffer (10×). DNA amplification was conducted by the 'touchdown' method in two stages: stage I) initial denaturation at 94 °C for 5 min, followed by six cycles of denaturation at 94 °C for 30 s, annealing at 63 °C for 45 s with a 1 °C decrease in each subsequent cycle, and extension at 72 °C for 1 min; stage II) 26 cycles of 30 s at 94 °C, 45 s at 57 °C and 1 min at 72 °C, and a final extension at 72 °C for 10 min. The PCR products were separated on 3% agarose gels and visualized by ethidium bromide staining.
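The touchdown programme can be written out step by step with a short Python sketch, which makes the annealing temperature of each cycle explicit. The function name and the tuple layout (step name, temperature in °C, duration in seconds) are illustrative choices, not part of the original protocol.

```python
def touchdown_schedule():
    """Return the thermal cycling steps as (name, temperature_C, seconds) tuples."""
    steps = [("initial denaturation", 94, 300)]
    # Stage I: six cycles, annealing starts at 63 C and drops 1 C per cycle.
    for cycle in range(6):
        steps.append(("denaturation", 94, 30))
        steps.append(("annealing", 63 - cycle, 45))
        steps.append(("extension", 72, 60))
    # Stage II: 26 cycles at a fixed annealing temperature of 57 C.
    for _ in range(26):
        steps.append(("denaturation", 94, 30))
        steps.append(("annealing", 57, 45))
        steps.append(("extension", 72, 60))
    steps.append(("final extension", 72, 600))
    return steps


def total_minutes(steps):
    """Total programmed hold time in minutes (ramping time not included)."""
    return sum(seconds for _, _, seconds in steps) / 60
```

Stage I therefore anneals at 63, 62, 61, 60, 59 and 58 °C, so stage II at 57 °C continues the 1 °C decrement without a jump.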
Integration of the newly developed polymorphic SSR markers with 90 K SNPs, 400 K SNPs and other publicly available SSR markers. A total of 1705 SSR markers were downloaded from the GrainGenes database (http://wheat.pw.usda.gov/GG3/) and searched against the whole genome sequence of wheat (CS) using an in silico PCR strategy. The software e-PCR-2.3.12 was used for in silico PCR analysis with the following parameters: word size 7, discontinuous word 0, margin 50 bp, 2 mismatches, 1 gap, and product size 100-300 bp. Only the markers mapped to unique locations in the reference wheat survey genome (CS) were retained.
In addition, 458,919 SNP markers were downloaded from the GrainGenes database (http://wheat.pw.usda.gov/GG3/), of which 37,853 were wheat 90 K SNPs and 421,066 were 400 K SNPs. These SNP markers were mapped against the wheat survey sequence (CS) using BLAT (v.34) 49. Chromosomal locations of SNP markers were assigned using the criteria described by Mayer et al. (2014) with some modifications 14: I) 95% identity, 95% coverage and a gap rate of less than 2%; II) only the markers mapped to a unique location in the reference wheat survey genome (CS) were retained.
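The two assignment criteria can be sketched as a simple filter in Python. The sketch assumes that the alignment output has already been parsed into records with precomputed identity, coverage and gap rate (all in percent); these field names are hypothetical and do not correspond to BLAT's own output columns. Uniqueness is interpreted here as exactly one alignment passing the thresholds.

```python
from collections import defaultdict


def passes_thresholds(hit, min_identity=95.0, min_coverage=95.0, max_gap_rate=2.0):
    """Criterion I: identity and coverage of at least 95%, gap rate below 2%."""
    return (hit["identity"] >= min_identity
            and hit["coverage"] >= min_coverage
            and hit["gap_rate"] < max_gap_rate)


def uniquely_mapped(hits):
    """Criterion II: keep markers with exactly one alignment that passes criterion I.

    `hits` is a list of hypothetical dicts with the keys
    'marker', 'chrom', 'identity', 'coverage' and 'gap_rate'.
    """
    by_marker = defaultdict(list)
    for hit in hits:
        if passes_thresholds(hit):
            by_marker[hit["marker"]].append(hit)
    return {marker: kept[0] for marker, kept in by_marker.items() if len(kept) == 1}
```

Filtering on alignment quality before testing uniqueness means that a marker with one good alignment and several poor ones is still retained, which is one reasonable reading of the criteria.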