DArTseq-based analysis of genomic relationships among species of tribe Triticeae

Precise utilization of wild genetic resources to improve the resistance of their cultivated relatives to environmental growth limiting factors, such as salinity stress and diseases, requires a clear understanding of their genomic relationships. Although seriously criticized, analyzing these relationships in tribe Triticeae has largely been based on meiotic chromosome pairing in hybrids of wide crosses, a specialized and labourious strategy. In this study, DArTseq, an efficient genotyping-by-sequencing platform, was applied to analyze the genomes of 34 Triticeae species. We reconstructed the phylogenetic relationships among diploid and polyploid Aegilops and Triticum species, including hexaploid wheat. Tentatively, we have identified the diploid genomes that are likely to have been involved in the evolution of five polyploid species of Aegilops, which have remained unresolved for decades. Explanations which cast light on the progenitor of the A genomes and the complex genomic status of the B/G genomes of polyploid Triticum species in the Emmer and Timopheevi lineages of wheat have also been provided. This study has, therefore, demonstrated that DArTseq genotyping can be effectively applied to analyze the genomes of plants, especially where their genome sequence information are not available.


Results
Diploid analyzers of polyploid Aegilops and Triticum species. To determine the putative progenitors of each of the polyploid species of Aegilops, SilicoDArT markers in the diploid genomes of all the Aegilops species were used as genome analyzers (Table 1). For each of the diploid species, species-specific markers were selected by filtering markers present in one species but absent in all the others. This made dominant SilicoDArT markers preferred in this analysis, as codominant SNP markers do not give information on PAVs. The progenitors of the polyploid species were estimated based on the proportions of diploid markers that are retained in each polyploid genome (diploid-polyploid monomorphism). Because the number of the species-specific markers is affected by genetic similarity among the diploid species, especially the Sitopsis species, the genomes of the polyploid species of Aegilops were first analyzed with all the markers in each diploid genome of Aegilops species (Fig. 1a) before being analyzed with diploid species-specific markers (Fig. 1b). This allowed us to determine the suitability of the species-specific markers in estimating the progenitors of the polyploid species. The use of species-specific markers as analyzers reduced the background noise produced by monomorphic markers among the diploid species (Fig. 1b).  Having confirmed the adequacy of the species-specific markers of the analyzers, polyploid genomes of Triticum species were analyzed with only species-specific diploid analyzers (Table 2). Therefore, the conclusions made regarding the progenitors of the polyploid species (Aegilops and Triticum) are based on the species-specific markers of the analyzers. To select analyzers for the polyploid genomes of Triticum species, the genomes of 16 bread wheat-related diploid species were screened on the basis of the proportions of homoeology of their SilicoDArT markers to the total SilicoDArT markers in each of the three genomes of bread wheat. The homoeology of the diploid genomes to each of the genomes of bread wheat was estimated based on the number of markers of the diploid species assigned to each genome of bread wheat (Table 2). This estimation was possible because, in this study, we used DArTseq platform optimized for bread wheat. A diploid species with at least 10% homoeology to any of the three genomes of hexaploid wheat was selected as an analyzer for the corresponding genomes of each of the six polyploid Triticum species. With this criterion, a total of 13 analyzers were selected. Species-specific markers of the 13 selected diploid analyzers were then filtered for the analysis of the putative progenitors of the genomes of the polyploid species (Table 2).
Genomic differentiation and evolutionary relationships among polyploid and diploid species of Aegilops. Before applying the genome analyzers to determine the progenitors of the polyploid species, we used a total of 28,264 polyploid species-specific SilicoDArT markers, ranging from 187 in Ae. juvenalis to 4,759 in Ae. cylindrica (Table 3), to confirm genomic difference among the 12 polyploid species of Aegilops. The polyploid species-species markers were selected in the same manner as the diploid species-specific markers. The relatively low numbers of specific markers in the genomes of Ae. crassa and Ae. juvenalis is obviously because large proportions of their genomes (D and U genomes) are shared by the other species (Table 3). With the possibility of genomic adjustments during polyploidization 22,36 and the assumption that the original progenitors of the polyploid species may be different from the accessions of the diploid species used in this study, only diploid analyzers with considerably higher proportions of monomorphism with the polyploid species were taken as the putative progenitors of the polyploid species. Our analysis confirmed the putative diploid progenitors of Ae. ventricosa (D v N v ), Ae. cylindrica (C c D c ), Ae. kotschyi (S k U k ), Ae. biuncialis (U b M b ), Ae. triuncialis (U t C t ), Ae. ovata (U g M g ), and Ae. variabilis (S p U p ) 30 (Fig. 1). Noteworthy is that the proportions of the markers of three Sitopsis species (Ae. bicornis, Ae. longissima, and Ae. sharonensis) retained in the genomes of the two polyploid species with S-related genomes (Ae. kotschyi and Ae. variabilis) were not reasonably different. This made it difficult to decide which of the Sitopsis species donated the S-related genomes to Ae. kotschyi and Ae. variabilis, although Ae. bicornis and Ae. longissima, respectively, seem to be the most likely candidates. This observation confirms the likelihood of a common ancestry of the Sitopsis species 31 . Therefore, the original progenitor of the S-related genomes of the polyploid species may have been an ancient relative of the Sitopsis species, which is probably extinct. Although we adopted the polyploid species-specific markers ( Table 3) to differentiate Ae. kotschyi (S k U k ) from Ae. variabilis (S p U p ) and Ae. biuncialis (U b M b ) from Ae. ovata (U g M g ), these pairs of species have identical genomic constitutions (same progenitors; Fig. 1), and therefore may be considered to be variants/subspecies of the same species in each case.  The unidentified diploid progenitors of five polyploid Aegilops species (Ae. triaristata [U n X n , U n X n N n ], Ae. crassa [D cr X cr , D cr1 D cr2 X cr ], Ae. juvenalis [X j D j U j ], Ae. vavilovii [X va D va S va ], and Ae. columnaris [U c X c ]) 2,16,19-21 are most likely to be traced to Ae. speltoides or Ae. mutica. The competing proportions of monomorphic markers between each of the two diploid species (Ae. speltoides and Ae. mutica) and the genomes of the five polyploid species (Fig. 1) strongly suggest that the unidentified genomes may have been donated by an ancient species closely related to these two diploid species or their direct ancestor(s). Based on these results, we have proposed modifications in the genomic representations of Ae. crassa, Ae. juvenalis, Ae. vavilovii, Ae. columnaris, and Ae. triaristata (Table 4), in which the genomes of Ae. speltoides and Ae. mutica are jointly represented as T s . This does not suggest that the two diploid species are genomically the same, but shows that we could not, in this study, clearly determine which of the two species may have donated the controversial genome to the five polyploid species. In this proposal, the genome of tetraploid Ae. crassa is considered to be constituted of Ae. tauschii and T s genomes, while the genome of hexaploid Ae. crassa is designated as D 1 D 2 T s (tetraploid Ae. crassa x Ae. tauschii). Aegilops vavilovii is considered to have evolved from the hybridization of tetraploid Ae. crassa and Ae. searsii, granted that Ae. crassa and Ae. vavilovii have similar morphological traits and are reported to be sympatric 37 . Furthermore, our analysis shows that Ae. juvenalis has T s , D and U genomes; hexaploid Ae. triaristata, clearly lacks Ae. comosa genome but has the genomes of Ae. umbellulata, Ae. uniaristata and T s , and Ae. columnaris is composed of Ae. umbellulata and T s genomes.

Cluster analysis of diploid and polyploid Aegilops species. A phylogenetic tree (Fig. 2a) constructed
with 15,512 frequently called SNP markers separated the diploid Aegilops species into the already reported sections 30 , except that Ae. speltoides did not cluster with other species in the section Sitopsis, which has been reported by other researchers 18,29,32 (see online Supplementary Table S1 for information of the markers). Aegilops umbellulata (section Aegilops) seemed more distant from the others, whereas Ae. speltoides (section Sitopsis) appeared closer to Ae. mutica (section Amblyopyrum), and relatively distant from other species of section Sitopsis. Among Sitopsis species, Ae. longissima and Ae. sharonensis appeared genomically more proximal to each other than to others. The polyploid species of Aegilops formed two clusters clearly based on the putative common diploid progenitors, Ae. tauschii (D cluster) and Ae. umbellulata (U cluster) (Fig. 2b). Aegilops juvenalis, bearing both D and U genomes, clustered closely with Ae. crassa and Ae. vavilovii in the D cluster, indicating a possible evolutionary link between its (Ae. juvenalis) genome and the two species in the D cluster. This again suggests the likelihood of the presence of a diploid genome, perhaps T s , common to Ae. juvenalis, Ae. crassa and Ae. vavilovii (Fig. 1).

Genomic and evolutionary relationships in the Aegilops-Triticum species. We used species-specific
SilicoDArT markers of 13 bread wheat-related diploid species (Table 2) to determine the elementary donors of the A, B and D genomes in six polyploid Triticum species. As described for the estimation of the progenitors of the Ae. juvenalis 6x X j D j U j 187 Ae. ovata L. 4x U g M g 3163 Ae. triaristata 6x U n X n N n 2215 Ae. columnaris 4x U c X c 2470 Ae. variabilis 4x S p U p 2683 Total --28264 Table 3. Species-specific SilicoDArT markers of 12 polyploid species of Aegilops.

Reported genomic formula 2,3,16,19-21
Proposed genomic formula polyploid species of Aegilops, the proportions of species-specific markers of the diploid species shared with the genomes of the polyploid species enabled the determination of the progenitors of the genomes of the polyploid Triticum species (Figs 3-6).
The genome of T. urartu was the closest to the A genomes of all the polyploid species analyzed (Fig. 3), suggesting that T. urartu is the most likely donor of the A genome in each of them. The considerable similarity between the A genome of each of the polyploids and T. boeoticum -another A genome species -suggests a common ancestry of T. boeoticum and T. urartu. Similarly, Ae. searsii seems to be the most closely related to the B/G genomes of the polyploid species (Fig. 4). However, the proportion of Ae. speltoides markers assigned to the reference B genome is higher than those of every other diploid species analyzed ( Table 2). This strongly suggests   an evolutionary link between the genome of Ae. speltoides and the B/G genomes of the polyploids. This link is further supported by an almost equal similarity of the genomes of the two diploid species to the G genome of T. araraticum (Fig. 4). Using the species-specific markers of Ae. speltoides and Ae. searsii as analyzers, we found that chromosome 4 S of each of the diploid species are almost equally similar to chromosome 4B/G of each of the polyploids (Fig. 5). But chromosomes 2 S, 3 S and 7 S of Ae. speltoides appeared to be more similar to the corresponding chromosomes of T. araraticum than those of Ae. searsii are. These observations give the impression that the B/G genomes of polyploid Triticum species are likely to be recombinant genomes with varying contributions from Ae. speltoides and Ae. searsii. Analysis of the D genomes of the three hexaploid species unambiguously traced them to Ae. tauschii as the sole donor (Fig. 6). A further analysis using 66, 434 SNP markers consistently called in the six polyploid genomes (see online Supplementary Table S2 for information of the markers) indicated 72% similarity (monomorphism) across their A genomes, B/G genomes and the combined AB/AG genomes. However, higher similarity was observed among the AB genomes: hexaploids, 94%; tetraploids, 90%; hexaploid and tetraploid genomes combined, 84%. The slight differences in the proportions of monomorphic markers in the different groups of the AB genomes suggest that the AB genomes of the hexaploid species originated from the same tetraploid species, whereas those of the tetraploid species may have evolved from different accessions of the elementary A and B genome progenitors (T. urartu and Ae. speltoides/Ae. searsii, respectively). The lower similarity (84% as compared to 94%) across the hexaploid and tetraploid AB genomes may reflect further modification of AB genomes in hexaploid species resulting from their interaction with the D genome.

Discussion
The clustering patterns of Aegilops species were largely consistent with the established classifications 25,38,39 . Diploid species separated on the basis of their known sections in the genus; polyploid species were delineated following the presence of common diploid progenitor genomes (D and U genomes) among them 1,16,30 . However, as reported previously 18,29,32 , Ae. speltoides appeared distant from other species in the section Sitopsis; hence, its inclusion in the section needs to be reconsidered.
Markers specific to each of the 12 polyploid species clearly showed considerable polymorphisms among these genomes, including the genomes of species which arose from the same diploid progenitors. This suggests that genetic modifications, such as chromosomal alterations 2 , may have occurred during independent evolutionary events of those species with identical progenitors. Therefore, without these specific markers, it would be difficult to genomically differentiate Ae. kotschyi (S k U k ) from Ae. variabilis (S p U p ) and Ae. biuncialis (U b M b ) from Ae. ovata (U g M g ) because, from the stand point of our result (Fig. 1) and previous studies 30,31 , the species in each pair evolved from the same progenitors. Although each of the species in these two sets are recognized as independent, on the basis of differences in cytoplasm progenitors and/or nuclear genome variation 31,40 , this classification does not seem to be clearly justified. Therefore, in our opinion, each pair should be regarded as variants/subspecies of the same species.
The reported unknown diploid genomes, initially represented as modified M genome and later changed to X, in the genomes of Ae. triaristata, Ae. crassa, Ae. juvenalis, Ae. vavilovii, and Ae. columnaris 2,16,[19][20][21]30 is traceable to Ae. mutica or Ae. speltoides. The small proportions (<10%) of Ae. comosa-specific markers shared with the five polyploid species (Fig. 1b) is insufficient to infer the existence of remnants of Ae. comosa genome in the polyploid genomes. Assuming Ae. comosa was originally involved in the evolution of the polyploids, species-specific elements from other progenitors may have spread and eventually masked Ae. comosa-specific elements 41 . Our data suggest that ancient or ancestral forms of Ae. speltoides or Ae. mutica, which are probably extinct, donated the unidentified genomes to the five polyploid species. From our analysis, it appears that all the polyploid species originally had a genome of such an ancient species (Fig. 1). This observation agrees with the hypothesis that Ae. mutica (syn. Amblyopyrum muticum) and Ae. speltoides, both allogamous species with ancestral traits, diverged earlier than other Aegilops species and may therefore be the ancestors of the other Aegilops species 22 . Therefore, each diploid Aegilops species may have retained a substantial portion of the common ancestral genome (Ae. speltoides/Ae. mutica or their ancestor). The difference in the representation of the common progenitor in each of the polyploid species can result from the peculiar evolutionary event(s) of each species. Polyploids that arose from the hybridization of the common diploid ancestor with other diploid species should have larger portions of the genome of the common ancestor than those that did not directly evolve from the common ancestor. We validated the putative diploid progenitors of the A and D genomes of polyploid Triticum species to be T. urartu and Ae. tauschii, respectively [42][43][44][45][46] . We have also provided information that may help to explain the complex nature of the B/G genomes. The genomes of both Ae. speltoides and Ae. searsii are similar to the B/G genomes, especially to that of T. araraticum (Fig. 4), a relatively less advanced tetraploid genome 47,48 ; thus, the B/G genomes of polyploid Triticum species may have evolved from an ancestral genome that later differentiated into those of Ae. speltoides and Ae. searsii. Alternatively, the B/G genome may have arisen from the hybridization of Ae. speltoides and Ae. searsii before the emergence of the AB/AG genome at different times. The above considerations support earlier postulations that the B genome is the most modified of the three genomes of hexaploid wheat, whereas the A and D genomes are substantially similar to those of T. urartu and Ae. tauschii, respectively 22 . The previously suggested origin of the A genome of T. araraticum from T. boeoticum 23,24 is probably invalid (Fig. 3). Our results agree with the hypothesis that both Emmer and Timopheevi lineages of polyploid wheats have the same sources of elementary A and B/G genomes [25][26][27][28][29] . However, a common ancestry of the A-genome species cannot be ruled out and the A genomes of polyploid Triticum species may have evolved from a common ancestor of T. urartu and T. boeoticum before the differentiation of the two species. Although no karyotypic differences have been detected between these diploid A-genome species 25,49 , low fertility of interspecific F 1 hybrid plants of these two species has been reported 50 . The latter study, consistent with our result, confirms that the two species are genomically different. As previously documented 47,48 , our study suggests that the A and G genomes in T. araraticum are less modified than the A and B genomes in the Emmer lineage.
This study has demonstrated that DArTseq genotyping can be applied to conduct a large scale analysis of plant genomes, mostly because it allocates markers to individual chromosomes, which can be easily extracted and analyzed. This genomic sequence-based platform ensures a wide genomic coverage and is not subject to criticism associated with the factors that affect meiotic chromosome pairing in hybrids of distant crosses, which forms the main anchor of cytogenetic systems of genome analysis [9][10][11][51][52][53][54] . Also, the number of informative markers generated by DArTseq outstrips what is possible with conventional DNA marker procedures [14][15][16][17] , making it more robust and reliable. Genotyping of all the available accessions of species in tribe Triticeae using this platform would clarify the genomic relationships between the cultivated and wild species. This information would make the use of the available gene pools for breeding much more precise and would also help to clarify Triticeae taxonomy. As polyploidy and interspecific hybridization are key events in the evolution of higher plants 55 , this genome analysis approach would be useful in other groups of plant species, especially polyploids with unclear phylogeny.

Methods
Plant materials. All 23 Aegilops species, eight Triticum species, and three distant relatives of wheat were analyzed. Except bread wheat, represented by two cultivars 'Chinese Spring' (CS) and 'Norin 61' (N61), each species was represented by one accession (Table 5). Seedlings were raised in Greenhouses of the Arid Land Research Center, Tottori University and Laboratory of Plant Genetics, Kyoto University, Japan. Depending on growth rate and plant size, fresh leaves were harvested from each 2-4-week-old seedlings and genomic DNA samples were isolated and purified using the cetyl trimethyl ammonium bromide method. Quality check, quantification Genotyping-by-sequencing and data analysis. Purified DNA samples (1 μg for each sample) were sent to Diversity Arrays Technology Pty Ltd (http://www.diversityarrays.com/) for sequencing and marker identification. Sequences of the genomic representations were aligned to the wheat_ChineseSpring10 reference genome and wheat_ConsensusMap_version_4. This is because our analysis is based on DArTseq platform optimized for hexaploid wheat. DArTseq is a genotyping-by-sequencing system which utilizes Next-Generation-Sequencing platforms (HiSeq. 2500 in our case) to sequence the most informative representations of genomic DNA samples to aid marker discovery. In comparison to the array version of DArT, DArTseq results in higher marker densities 56 . The high marker number generated by this system gives it an edge over previous molecular marker procedures applied for genomic analysis of Triticeae species [14][15][16][17] . It, therefore, serves as a cheap alternative to genome analysis by whole genome sequencing, where the sequence information of genomes intended to be analyzed are not available. Two types of data are generated by DArTseq: SNP and SilicoDArT. SNP markers are nucleotide polymorphisms present in the restriction fragments, while SilicoDArT markers represent PAV of the restriction fragments. Therefore, codominant SNP markers are scored "0" (reference allele homozygote), "1" (SNP allele homozygote) and "2" (heterozygote: presence of both reference and SNP alleles), while dominant SilicoDArT markers are scored in a binary fashion, with "1" representing presence of the restriction fragment with the marker sequence and "0" designating its absence.  Frequently called SNP markers (>0.9 call rate) were used for phylogenetic tree constructions and differentiation of the genomes of polyploid species of Aegilops, whereas SilicoDArT markers (>0.7 call rate) were used for the determination of putative progenitors of the polyploid Aegilops and Triticum species. This reduction in call rate was made to accommodate more markers, ensure wider genomic coverage and reduce bias. To estimate the phylogenetic relationships among the 11 diploid and 12 polyploid Aegilops species, the raw genotypic data of the two sets (diploid and polyploid) were subjected to cluster analysis. Pearson's correlation coefficient (r) was used as similarity index, and the genetic distances among the species were estimated by transforming the r values to distance values, using d = 100(1 -r) (http://genomes.urv.cat/UPGMA/) 57 . Species-specific SilicoDArT markers of the polyploid species of Aegilops were used to differentiate their genomes, while species-specific SilicoDArT markers of diploid species of Aegilops were used to estimate the diploid-polyploid evolutionary relationships among all the Aegilops species. Diploid Triticum and Aegilops species whose total SilicoDArT markers showed at least 10% homoeology to the total SilicoDArT markers in any of the three genomes of hexaploid wheat were selected as analyzers to determine the putative progenitors of the corresponding genomes of each polyploid Triticum species. Species-specific SilicoDArT markers of these selected diploid species were used as analyzers to determine the putative progenitors of each polyploid Triticum species. In determining the progenitors of all the polyploid species (Aegilops and Triticum), the proportions of the species-specific markers of the diploid analyzers retained in the genomes of the polyploid species were used as a basis to draw conclusions on genomic proximity and evolutionary relationships among the species. Species-specific markers of Ae. speltoides and Ae. searsii were further used to examine the relationship between the seven B/G-genome chromosomes of each of the polyploid Triticum species and those of the two diploid species. The two diploid species were chosen based on the close proximity of their genomes to the B/G genomes of the polyploid species.

Availability of Materials and Data
The data on which are conclusions are based are included within the article and supplementary files and plant materials can be sourced from KOMUGI database maintained by the National BioResource Project -Wheat, Japan (https://shigen.nig.ac.jp/wheat/komugi/).