Genome sequence of the progenitor of wheat A subgenome Triticum urartu

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat1,2. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing3, linked reads and optical mapping4,5. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.

Interestingly, the increased number of B3 transcription factors in the three wheat genomes is mainly caused by expansion of the REM subfamily (Extended Data Figure 4c). The REM subfamily has been reported to function preferentially in flower development and vernalization (Luo et al., 2013). Thus, the predominance of B3 transcription factors in the wheat genomes may be related to adaptation to the cold season and may be involved in vernalization and flower development. However, further experiments are needed to confirm this.

S1.2 Families of disease resistance and prolamin genes
Given the report of a specific expansion of R genes (disease resistance genes) in the T. urartu genome (Ling et al., 2013), we also compared R genes. A total of 598 genes encoding NB-ARC domain and disease resistance proteins were identified, compared with the 593 R genes detected in the draft genome of T. urartu (Ling et al., 2013). The chromosomal locations of the R genes are shown in Supplementary Data 4.
Among the 598 R genes, only seven (1.2%) were not completely sequenced and contained 'N's in their ORF or 1 kb promoter sequence, whereas 36.3% of the R genes reported in the draft sequence (Ling et al., 2013) carried N bases. These results further support the improved quality and completeness of our new assembly. This information will facilitate functional studies of R genes in T. urartu and their application in improving the disease resistance of wheat.

S1.3 Gene expression profiling in leaf, root and spike of T. urartu
We identified 61,145 transcripts using RNA-seq data from three T. urartu tissues: leaf, root and spike. Of these, 5,944 (9.7%), 3,884 (6.4%) and 5,483 (9.0%) were differentially expressed (FDR < 1e-4) between leaf and root, between leaf and spike, and between spike and root, respectively. The differentially expressed genes were partitioned into clusters with predominantly high expression in spike (Supplementary Information Figure S1a), leaf (Supplementary Information Figure S1b) and root (Supplementary Information Figure S1c). Gene ontology analysis of the genes within each cluster shows organ specificity in gene function. Genes preferentially expressed in spike are enriched for GO terms related to hydrolase activity and to polysaccharide, carbohydrate, fatty acid and lipid metabolic processes (Supplementary Information Figure S1a); leaf-specific genes are enriched for photosynthesis and for pigment, chlorophyll and tetrapyrrole metabolic processes (Supplementary Information Figure S1b); and root-specific genes are enriched for oxidoreductase, peroxidase and antioxidant activity (Supplementary Information Figure S1c).
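The FDR threshold used above can be illustrated with a short sketch. The text does not state which multiple-testing procedure produced the FDR values, so the Benjamini-Hochberg adjustment shown here is an assumption, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the source does not name its FDR procedure; BH is used
    here only as a common default.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_differential(pvalues, fdr_cutoff=1e-4):
    """Flag each test whose adjusted p-value falls below the cutoff."""
    return [q < fdr_cutoff for q in benjamini_hochberg(pvalues)]
```

Applied to one p-value per transcript for a given tissue pair, `call_differential` would return the flags from which counts such as those reported above are tallied.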
divergence of A, B and D genomes and before tetraploidization of A and B genomes. Similar results were reported by Miftahudin et al. (2004) and Ma et al. (2015). (2) We also observed that a fragment at the end of Tu7, which showed good synteny with the corresponding parts of Ta7A, Ta7D and chromosome 7 of Ae. tauschii but not with the corresponding fragment of Ta7B, displayed synteny with the end of Ta4A in the comparison of Tu with Ta (Figures 2a and 2b, Extended Data Figure 5a). Based on these results, it is reasonable to deduce that the distal segment located on Ta4A arose from a one-way translocation from Ta7B, in agreement with previous reports (Miftahudin et al., 2004; Ma et al., 2015; Clavijo et al., 2017). This translocation event occurred during/after polyploidization of A and B

S2.2.1 Collinearity of T. urartu versus B. distachyon, O. sativa and S. bicolor
The evolutionary relationships between wheat and several other genome-sequenced grasses, including Brachypodium, sorghum and rice, have been reported (The International Brachypodium Initiative, 2010). Among them, Brachypodium is the closest and sorghum the most distant relative of wheat. The former diverged from wheat about 32-39 MYA and the latter about 45-60 MYA. Rice split from wheat about 40-54 MYA.
We found that Tu3 and Tu6 were the two most conserved chromosomes.

S2.2.2 Reconstruction of T. urartu chromosomes from 12 ancestral chromosomes
Based on the report that rice has largely maintained the basic structure of the 12 chromosomes of the grass ancestor (Salse et al., 2008), we reconstructed a model of the chromosomal evolution of T. urartu from the 12 ancestral chromosomes (A1-A12) using the precise collinear relationships between T. urartu and rice. The T. urartu chromosomes were mostly formed by insertion of one ancestral chromosome into the centromeric region of another. Tu1 was formed by insertion of A10 into A5 (corresponding to Os10 and Os5), Tu2 by insertion of A7 into A4 (corresponding to Os7 and Os4), Tu4 by insertion of A11 into A3 (corresponding to Os11 and Os3) and Tu7 by insertion of A8 into A6 (corresponding to Os8 and Os6). Tu5 is an exception: it was derived from concatenation of A12 and A9 (corresponding to Os12 and Os9). Moreover, segments from the two distal ends of A3 joined with A9 to build the complete Tu5 (Figure 2c).
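The fusion model described in this paragraph can be written down as a small lookup table. The sketch below is only an illustrative encoding: the dictionary layout and function names are hypothetical, and it covers only the five chromosomes whose origin is stated in this paragraph.

```python
# (event, ancestral chromosomes) for each Tu chromosome described above.
# For insertions the list is [host, inserted]; for Tu5 it lists the
# ancestral chromosomes that contributed segments.
FUSIONS = {
    "Tu1": ("insertion", ["A5", "A10"]),
    "Tu2": ("insertion", ["A4", "A7"]),
    "Tu4": ("insertion", ["A3", "A11"]),
    "Tu7": ("insertion", ["A6", "A8"]),
    "Tu5": ("concatenation", ["A12", "A9", "A3"]),
}


def rice_counterparts(tu_chrom):
    """Translate ancestral names to rice names (A5 -> Os5), as in the text."""
    _, ancestors = FUSIONS[tu_chrom]
    return ["Os" + name[1:] for name in ancestors]


def unassigned_ancestors():
    """Ancestral chromosomes A1-A12 not used by any entry in FUSIONS."""
    used = {name for _, ancestors in FUSIONS.values() for name in ancestors}
    return [f"A{i}" for i in range(1, 13) if f"A{i}" not in used]
```

Under this encoding, `unassigned_ancestors()` returns A1 and A2, the two ancestral chromosomes not involved in any fusion listed in this paragraph.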
There were two inversions and three translocations in the formation of the T. urartu genome. The fusion models of the T. urartu chromosomes from the ancestral chromosomes are completely different from those of B. distachyon (The International Brachypodium Initiative, 2010), indicating that the chromosome evolution of T. urartu (and even of the Triticeae) must have been independent of that of B. distachyon.
To perform more accurate and detailed investigation of the evolutionary scenario of T.
These AGK genes were depleted in pericentromeric and subtelomeric regions (Figure 2c), indicating that new genes are more likely to arise in these regions of the T. urartu genome.
Using the chromosome reconstruction model described above, we precisely located the chromosomal fusion sites of T. urartu and investigated the relationship between the locations of chromosomal fusions and AGK genes. We observed that fusion sites were preferentially located in regions rich in non-AGK genes (Figure 2c). A chi-square test also supported that AGK genes were significantly depleted at fusion locations (p-value = 0.02). Thus, regions rich in non-AGK genes appear more prone to chromosomal structural variation during T. urartu evolution.
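A test of this kind can be sketched as a Pearson chi-square test on a 2x2 table. The exact table and any continuity correction used for the reported p-value are not given in the text, so the layout below (genes at fusion locations versus elsewhere, AGK versus non-AGK) and the function name are assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Assumed layout: rows = genes at fusion locations / genes elsewhere,
    columns = AGK genes / non-AGK genes.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The four counts would have to come from the gene annotation and the fusion coordinates; none are given in this section, so no worked numbers are shown.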

S2.3 Evolution of ancient duplicated blocks in T. urartu
The common ancestor of grasses underwent a whole-genome duplication (WGD), and subsequent events including chromosome translocations, fusions and insertions shaped the structure of the various extant grass genomes. Seven ancestral chromosomes were doubled into 14 chromosomes, and two subsequent chromosomal fusions formed a 12-chromosome ancestor. The ancestral duplicated chromosomes were largely maintained in rice. They are Os1-Os5, Os2-Os4, Os2-Os6, Os3-Os7, Os3-Os10, Os8-Os9 and Os11-Os12 (Salse et al., 2008). Five duplication blocks were identified based on an intra-specific comparison of T.
The largest duplication block lies between Tu1 and Tu3, covering the chromosomal regions 434-557 Mb on Tu1 and 389-634 Mb on Tu3 (Extended Data Figure 7f). Synteny analysis of Tu3 vs. Os1 and Tu1 vs. Os5 showed that large segments of these two pairs of chromosomes were collinear. The duplicated block between Tu1 and Tu3 corresponds to the rice duplicated block between Os1 and Os5 (Extended Data Figures 8a and 7f). We identified 693 syntenic genes between Os1 and Os5. Of them, 310 (45%) and 320 (46%) genes of Os1 and Os5 have syntenic orthologues on Tu3 and Tu1, respectively, while only 147 (21%) genes are paired in T. urartu (Extended Data Figure 7g).
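The percentages in this paragraph follow directly from the reported counts, as the short sketch below shows. It assumes that the 693 syntenic genes are Os1-Os5 gene pairs and that the 147 paired genes are pairs with both copies retained in T. urartu. The constant and function names are hypothetical.

```python
OS1_OS5_PAIRS = 693        # syntenic genes between Os1 and Os5 (from the text)
WITH_TU3_ORTHOLOGUE = 310  # Os1 genes with a syntenic orthologue on Tu3
WITH_TU1_ORTHOLOGUE = 320  # Os5 genes with a syntenic orthologue on Tu1
PAIRED_IN_TU = 147         # genes still paired between Tu1 and Tu3


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to the nearest integer."""
    return round(100 * part / whole)


def retention_summary():
    """Recompute the retention percentages quoted in the text."""
    return {
        "Os1 with Tu3 orthologue": percent(WITH_TU3_ORTHOLOGUE, OS1_OS5_PAIRS),
        "Os5 with Tu1 orthologue": percent(WITH_TU1_ORTHOLOGUE, OS1_OS5_PAIRS),
        "paired in T. urartu": percent(PAIRED_IN_TU, OS1_OS5_PAIRS),
        # Derived under the pairing assumption stated in the lead-in.
        "not paired in T. urartu": percent(OS1_OS5_PAIRS - PAIRED_IN_TU, OS1_OS5_PAIRS),
    }
```

Under the stated assumption, roughly four in five of the rice duplicate pairs are no longer present as a pair in the corresponding Tu1-Tu3 block.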

S2.4.2 Collinearity between Tu3 and Ta3B
A total of 247 syntenic blocks were detected. On average, each block covered a 5.5 Mb chromosomal segment containing 53 genes on Tu3 and 51 genes on Ta3B, with about 9 collinear genes. The largest syntenic block covered a chromosomal segment of 64.3 Mb with 419 genes on Tu3 and 445 genes on Ta3B, of which 78 genes are collinear. Together, the syntenic blocks covered 617 Mb (82.6%) and 651 Mb (84.1%) of the Tu3 and Ta3B sequences, respectively (Extended Data Figure 8b). These results are consistent with the DNA collinearity obtained from MUMmer.
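The reported coverage suggests that the blocks overlap: 247 blocks averaging 5.5 Mb would sum to about 1,359 Mb, well above the 617 Mb covered on Tu3. A coverage figure of this kind is therefore presumably the length of the union of the block intervals rather than a simple sum. The sketch below shows that calculation. The function names are hypothetical, and this is not necessarily how the authors computed their values.

```python
def covered_length(blocks):
    """Length of the union of (start, end) intervals, in the input units."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(blocks):
        if cur_end is None or start > cur_end:
            # Gap before this block: close the current merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching block: extend the merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_fraction(blocks, chrom_length):
    """Fraction of a chromosome covered by at least one block."""
    return covered_length(blocks) / chrom_length
```

For reference, the reported values imply chromosome lengths of roughly 747 Mb for Tu3 (617 / 0.826) and 774 Mb for Ta3B (651 / 0.841).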
On the basis of the genomic annotations and collinearity analysis, we studied DNA contractions and expansions between Tu3 and Ta3B. Extended Data Figure 8c shows a syntenic block composed of five consecutive collinear gene pairs. About 100 kb more repetitive DNA was inserted on Ta3B than on Tu3, leading to expansion of this syntenic region on Ta3B. Extended Data Figure 8d shows another syntenic block composed of seven consecutive collinear gene pairs. We found a 70 kb contraction of repetitive DNA in the syntenic region of Ta3B compared with Tu3. Extended Data Figure 8e showed eight collinear
genomes. The novel translocations found in the TGACv1 hexaploid wheat genome (Clavijo et al., 2017) were not identified in the Tu genome. (3) Another obvious genome structure variation was observed between Tu4 and Ta4A, where Tu4AL corresponds to Ta4AS and Tu4AS to Ta4AL (Figures 2a and 2b), whereas no such variation was detected in the comparison of Tu4 with the TaB and TaD genomes. This result indicates that a pericentric inversion involving most of the long and short arms occurred on Ta4A during its evolution. This inversion occurred during or after the generation of tetra- or hexaploid wheat, since it was found only on Ta4A.

S2.2 Genomic comparison of T. urartu (Tu) with O. sativa (Os), B. distachyon (Bd) and S. bicolor (Sb)
Tu3 shared a common ancestor with Os1-Bd2-Sb3, and Tu6 with Os2-Bd3-Sb4. Notably, syntenic regions of consecutive Tu chromosomal segments were separated by non-homologous DNA segments of varied length in Bd, Os and Sb. For Tu3, two collinear blocks were separated by a non-collinear segment of 12-40 Mb on Bd2, 10-20 Mb on Os1 and 12-48 Mb on Sb3. For Tu6, two collinear regions were divided by 8-47 Mb on Bd3, 9-20 Mb on Os2 and 13-49 Mb on Sb4. These results suggest that segmental deletions likely occurred on Tu3 and Tu6 after the divergence of Tu and Bd (Figure 2c, Extended Data Figure 6, Supplementary Data 6).
The next most conserved chromosomes were Tu1, Tu2, Tu4 and Tu7, each comprising two chromosomal segments that originated from different ancestral chromosomes. The majority of Tu1 was orthologous with Os5-Bd2-Sb9, with homologous segments of Os10-Bd3-Sb1 inserted into it. A segment of about 400 Mb (from 20 to 420 Mb) on Tu2 shared a common ancestor with Os7-Bd1-Sb2, and the two segments flanking it corresponded to Os4-Bd5-Sb6. The smaller flanking segment was about 20 Mb and the larger one extended from ~420 to ~740 Mb. Similarly, most chromosomal regions of Tu4 and Tu7 were orthologous with Os3-Bd1-Sb1 and Os6-Bd1-Sb10, respectively. The internal part from ~51 to ~141 Mb on Tu4 was homologous to segments of Os11-Bd4-Sb5, and that from ~168 to ~470 Mb on Tu7 was homologous to segments of Os8-Bd3-Sb7 (Figure 2c, Extended Data Figure 6, Supplementary Data 6).
The least conserved T. urartu chromosome was Tu5. It was derived by concatenation of segments originating from three different ancestral chromosomes. In order, the segment from 0 to ~317 Mb on Tu5 was homologous to Os12-Bd4-Sb8, the segment from ~317 to ~530 Mb corresponded to Os9-Bd4-Sb2, and the segment from ~530 to ~648 Mb corresponded to Os3-Bd1-Sb1 (Figure 2c, Extended Data Figure 6, Supplementary Data 6). This observation is consistent with the model described by Pont et al. (2013).
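The segment boundaries given above can be collected into a coordinate lookup. The sketch below is illustrative only. All boundaries are the approximate values from the text. The 0-20 Mb placement of the smaller Os4-Bd5-Sb6 segment on Tu2 is an assumption. Tu1 is omitted because no coordinates are given for its inserted segment. The names `SEGMENTS`, `BACKGROUND` and `orthologous_block` are hypothetical.

```python
# (start_Mb, end_Mb, orthologous Os-Bd-Sb block); boundaries are approximate.
SEGMENTS = {
    # Assumption: the ~20 Mb flanking segment of Tu2 lies at 0-20 Mb.
    "Tu2": [(0, 20, "Os4-Bd5-Sb6"), (20, 420, "Os7-Bd1-Sb2"),
            (420, 740, "Os4-Bd5-Sb6")],
    "Tu4": [(51, 141, "Os11-Bd4-Sb5")],
    "Tu5": [(0, 317, "Os12-Bd4-Sb8"), (317, 530, "Os9-Bd4-Sb2"),
            (530, 648, "Os3-Bd1-Sb1")],
    "Tu7": [(168, 470, "Os8-Bd3-Sb7")],
}

# Block to which the rest of a chromosome is orthologous, where stated.
BACKGROUND = {
    "Tu3": "Os1-Bd2-Sb3",
    "Tu4": "Os3-Bd1-Sb1",
    "Tu6": "Os2-Bd3-Sb4",
    "Tu7": "Os6-Bd1-Sb10",
}


def orthologous_block(chrom, pos_mb):
    """Return the Os-Bd-Sb block for a position, or None if not described."""
    for start, end, block in SEGMENTS.get(chrom, []):
        if start <= pos_mb < end:
            return block
    return BACKGROUND.get(chrom)
```

For example, `orthologous_block("Tu5", 400)` falls in the ~317 to ~530 Mb segment, while a position on Tu4 outside ~51 to ~141 Mb falls back to the Os3-Bd1-Sb1 background.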