Durum wheat genome highlights past domestication signatures and future improvement targets

The domestication of wild emmer wheat led to the selection of modern durum wheat, grown mainly for pasta production. We describe the 10.45 gigabase (Gb) assembly of the genome of durum wheat cultivar Svevo. The assembly enabled genome-wide genetic diversity analyses revealing the changes imposed by thousands of years of empirical selection and breeding. Regions exhibiting strong signatures of genetic divergence associated with domestication and breeding were widespread in the genome with several major diversity losses in the pericentromeric regions. A locus on chromosome 5B carries a gene encoding a metal transporter (TdHMA3-B1) with a non-functional variant causing high accumulation of cadmium in grain. The high-cadmium allele, widespread among durum cultivars but undetected in wild emmer accessions, increased in frequency from domesticated emmer to modern durum wheat. The rapid cloning of TdHMA3-B1 rescues a wild beneficial allele and demonstrates the practical use of the Svevo genome for wheat improvement.

ARTICLES NATURE GENETICS
about 10,000 years ago [1]. Although the first evidence of durum wheat (DW) dates to 6,500-7,500 years ago, DW became established as a prominent crop only 1,500-2,000 years ago [2]. Thus, the human-driven tetraploid wheat evolution process is the result of domestication (wild emmer wheat, WEW, to domesticated emmer wheat, DEW), continued evolution under domestication (DEW to durum wheat landraces, DWL) and breeding improvement from DWL to modern durum wheat cultivars (DWC).
Wild relatives of modern crop plants can serve as sources of valuable genetic diversity for various traits (for example, disease resistance [3,4] and nutritional quality [5]). Comprehensive comparative genomic analysis between cultivated crops and wild progenitors is a key strategy to detect novel beneficial alleles and structural variations that could constrain breeding efforts, as well as to understand the broader genetic consequences of evolution and selection history [6,7].
Here we report the fully assembled genome of the modern DW cultivar (cv.) Svevo and provide a genome-wide account of modifications imposed by thousands of years of empirical selection and breeding. This was achieved by comparing the Svevo genome with the assembled genome of WEW accession Zavitan [8] and through a survey of the genetic diversity and selection signatures in a Global Tetraploid Wheat Collection consisting of 1,856 accessions. A region bearing a signature of historic selection co-locates with Cdu-B1, a quantitative trait locus (QTL) spanning 0.7 cM on chromosome 5B [9] known to control cadmium (Cd) accumulation in the grain. Identification of the gene(s) responsible for Cdu-B1 has been hampered by the large and repetitive nature of the DW genome and the low recombination rate in the region of interest. The efficient, genome-enabled dissection of the Cdu-B1 locus reported here demonstrates the value of the Svevo genome assembly for wheat improvement.

Results
The durum wheat reference genome. The Svevo genome sequence was assembled de novo using protocols previously described [8] and its main features are illustrated in Fig. 1. After sequencing (Supplementary Table 1) and assembly, the scaffolds (length of the shortest contig needed to cover 50% of the genome (N50) = 6.0 megabases (Mb); Supplementary Table 2) were ordered and oriented using the Svevo × Zavitan genetic map as previously described [10]. Thereafter, chromosome conformation capture sequencing (Hi-C) [11] resulted in a set of pseudomolecules (9.96 Gb; Supplementary Table 3) corresponding to the 14 chromosomes of DW and one group of unassigned scaffolds (499 Mb). The pseudomolecules encompass 95.3% of the assembled sequences and have 90% of the scaffolds oriented. Alignment of the DW genome with high-density SNP genetic maps [12] showed highly recombinogenic distal chromosome regions exhibiting an almost linear relationship between genetic and physical distance (Supplementary Fig. 1). These regions account for about 22% of the genome with an average recombination rate of 1.8 Mb cM⁻¹ (Supplementary Table 4). In contrast, large pericentromeric regions are nearly devoid of recombination and represent about 44% of the genome, with a mean recombination rate of 107 Mb cM⁻¹. Annotation of the Svevo genome led to the identification of 66,559 high confidence (HC) genes, 90.5% of which exhibited detectable evidence of expression in at least one of the 21 RNA-seq datasets listed in Supplementary Table 5. A detailed description of the DW genome is presented in the Supplementary Note (Sections 1.1 and 2.1). Projection onto the DW genome of 2,191 previously reported QTLs resulted in a full meta-QTL analysis (Supplementary Table 6b and Supplementary Dataset 1), revealing a QTL density distribution that closely mirrors the gene density distribution (Supplementary Table 7 and Fig. 1d,e).
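The scaffold N50 quoted above can be illustrated with a short calculation. This is a minimal sketch rather than the assembly pipeline used in the study: it assumes a plain list of scaffold lengths, and the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches half of the summed length; the length at which
    that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, in Mb): total = 20, half = 10,
# and 6 + 5 = 11 is the first running total to reach 10.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

For a real assembly the lengths would typically be read from a FASTA index; a few very long scaffolds can raise the N50 considerably, which is why it is usually reported alongside the total assembled length.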

Comparison between Svevo and Zavitan genomes.
To gain insights into short-term evolutionary changes, we compared the genome divergence between the modern DW cultivar Svevo and the WEW accession Zavitan [8]. The comparison revealed strong overall synteny (Fig. 1c) with high similarity in total HC gene number (DW 66,559; WEW 67,182; Supplementary Table 8), chromosome structure and transposable element composition (Supplementary Table 9). We identified syntenic LTR-retrotransposon insertions (Supplementary Fig. 2) not yet subjected to the rapid transposable element turnover of the intergenic space [1,13] because of the relatively short separation time between Svevo and Zavitan.
To monitor structural variations in the HC gene set, including minor changes that might have been generated during the short evolutionary timespan, a graph-based sequence clustering of all Svevo and Zavitan HC genes (in total 133,741) was undertaken. Stringent clustering (alignment e value < 10⁻¹⁰, overlap > 75% and identity > 75%) grouped only highly similar gene models in the same cluster. This approach produced 36,434 unigene groups, 79% (28,794) of which were clusters with at least two members, while 21% (7,640) contained only singletons (Supplementary Table 10).
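The thresholds above can be read as an edge filter on a gene similarity graph. The sketch below illustrates that idea under explicit assumptions: the exact graph-clustering algorithm is not specified in this section, so single-linkage grouping (connected components via union-find) is used here purely for illustration, and the function name, tuple layout and gene identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_genes(genes, hits, max_evalue=1e-10, min_overlap=0.75, min_identity=0.75):
    """Group genes into clusters connected by stringent pairwise alignments.

    genes: iterable of gene identifiers.
    hits:  iterable of (gene_a, gene_b, evalue, overlap, identity) tuples,
           with overlap and identity expressed as fractions (0-1).
    Assumption: clusters are the connected components of the filtered graph.
    """
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the root, compressing the path on the way.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for gene_a, gene_b, evalue, overlap, identity in hits:
        # Keep only alignments that pass all three thresholds from the text.
        if evalue < max_evalue and overlap > min_overlap and identity > min_identity:
            root_a, root_b = find(gene_a), find(gene_b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for gene in parent:
        groups[find(gene)].add(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy input: four homoeologous copies plus one weakly similar gene.
toy_genes = ["Sv_A1", "Sv_B1", "Zv_A1", "Zv_B1", "Sv_X"]
toy_hits = [
    ("Sv_A1", "Zv_A1", 1e-50, 0.90, 0.98),
    ("Sv_B1", "Zv_B1", 1e-40, 0.80, 0.95),
    ("Sv_A1", "Sv_B1", 1e-30, 0.90, 0.90),
    ("Sv_X", "Zv_A1", 1e-5, 0.90, 0.90),  # fails the e-value threshold
]
print(cluster_genes(toy_genes, toy_hits))
```

In this toy case the four homoeologous copies form one unigene group and `Sv_X` remains a singleton, mirroring the distinction between multi-member clusters and singletons reported above.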
The main scenarios for conserved and variable genes are summarized in Fig. 2a. The most frequent cluster configuration is made of two homoeologous gene copies per genome (one per A and B subgenomes), which occurs in 35% of all unigene groups. Altogether, the unigene groups with balanced copy numbers for Svevo and Zavitan represent up to 60% of all unigenes and involve 63% of all Svevo genes. The remaining 40% of unigenes (14,660) display asymmetric numbers of intact, full-length genes between Svevo and Zavitan (we named this intact gene number variation). Since the unigene classification is on the basis of HC genes, any mutation leading to a frameshift and/or premature stop codon that rules out a gene from the HC class in Svevo or in Zavitan results in an asymmetric unigene distribution. For at least two-thirds of the unigenes displaying variation of intact gene number, counterparts for the missing copies can still be found in the low confidence (LC) or pseudogene class. The complete gene loss caused by large structural variations was responsible for asymmetric gene distribution in only one-third of the cases. Among the unbalanced gene clusters, there are 6,120 mixed clusters with copies from both genomes, subdivided into more Svevo members or more Zavitan members, as well as 4,313 and 4,227 lineage-specific unigene groups (mostly singleton genes) in Svevo and Zavitan, respectively, which have no close homoeolog in the HC gene set of the other accession. A detailed example of the type of variation leading to intact gene number variation is given for the lineage-specific unigenes (Supplementary Table 10a). In Svevo, this class includes 4,811 genes that represent 7.2% of all HC genes, a value similar to the 5% found after the comparison between two cultivars in a recent pangenome study of hexaploid wheat [14]. When the Svevo-specific genes were mapped onto the Zavitan genome (Fig. 2a and Supplementary Table 10b), 1,493 genes (31%) were not found on the Zavitan sequence, 1,225 (26%) correspond to shorter counterparts of annotated Zavitan HC genes and 1,095 (23%) were annotated as LC genes or pseudogenes. The remaining 965 (20%, that is, 1.4% of all Svevo HC genes) map to unannotated regions and are candidates for genes missed in the automated annotation.
Loss and gain events can occur from ancestral four-member unigene clusters with one gene per A and B subgenome of both Svevo and Zavitan. One loss would result in a three-member unigene cluster, with one subgenome location missing. A total of 1,121 clusters with one lost Svevo member were found. The reverse situation with one lost member from Zavitan was found 852 times. The presumed HC gene losses are located predominantly in the more distal chromosomal regions (Supplementary Fig. 3a). A gain event would result in clusters with at least five members with one subgenome carrying two members. This condition was found 472 times with Svevo gains and 503 times with Zavitan gains. Most of the gains are located on the same chromosome, indicating tandem gene duplication as the prevailing mechanism (Supplementary Fig. 3b).
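The cluster categories described above follow directly from the number of intact HC genes each accession contributes to a unigene group. The following sketch shows one way to express that rule; the function name and category labels are hypothetical, and only the counts in `REPORTED_UNBALANCED` are taken from the text.

```python
def classify_cluster(n_svevo, n_zavitan):
    """Label a unigene group by its Svevo and Zavitan HC gene counts."""
    if n_svevo < 0 or n_zavitan < 0 or n_svevo + n_zavitan == 0:
        raise ValueError("a cluster needs at least one gene and non-negative counts")
    if n_svevo == n_zavitan:
        return "balanced"
    if n_zavitan == 0:
        return "svevo_specific"
    if n_svevo == 0:
        return "zavitan_specific"
    # Mixed clusters with copies from both genomes but unequal numbers.
    return "mixed_more_svevo" if n_svevo > n_zavitan else "mixed_more_zavitan"


# Counts of unbalanced unigene groups as reported in the text.
REPORTED_UNBALANCED = {
    "mixed": 6_120,
    "svevo_specific": 4_313,
    "zavitan_specific": 4_227,
}
# The three categories add up to the 14,660 unigene groups with
# intact gene number variation.
unbalanced_total = sum(REPORTED_UNBALANCED.values())

print(classify_cluster(2, 2))  # ancestral four-member cluster: balanced
print(classify_cluster(1, 2))  # one lost Svevo member: mixed_more_zavitan
print(unbalanced_total)        # 14660
```

A three-member cluster such as `(1, 2)` corresponds to the single-loss scenario, and a five-member cluster such as `(3, 2)` to the single-gain scenario described in the paragraph on loss and gain events.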
Functional categories associated with relevant copy number differences between the two accessions are highlighted in Fig. 2b. A statistical analysis for gene ontology over-representation revealed that specific functions can be classified as: differentially enriched in Svevo, differentially enriched in Zavitan, balanced and Svevo-specific unigene groups (Supplementary Fig. 3c). The balanced gene groups are enriched for a multitude of regulatory functions and indicate a stronger sensitivity to gene dosage effects for the main regulatory networks. Unbalanced unigene groups with more Svevo members and Svevo unique groups are enriched for functions involving protein phosphorylation, for example kinases, which are known to trigger signal transduction cascades in response to environmental cues. The distal highly recombinogenic regions of chromosomes are enriched in unigenes displaying variation of intact gene number (Fig. 2c), contain most of the known QTLs (Fig. 1e) and the HC genes display a reduced expression breadth (that is, average expression value across all tissue/treatment conditions, Fig. 1i). This indicates the presence of an increased number of condition-specific genes. The less dynamic interstitial regions contain a greater number of balanced gene families (Fig. 2c). Here the genes are expressed in nearly all conditions, indicative of an enrichment in housekeeping genes that is consistent with reports from the barley [11] and bread wheat [15] genomes. The positive correlation between recombination rate and DNA variants supports previous evidence that higher recombination rates and illegitimate recombination are drivers for tandem duplications [16].
The balanced copy number groups contain much longer genes (median 1,152 base pairs (bp)) than the groups displaying variation of intact gene number (median 879 bp; Fig. 2a). The highest median gene lengths are found in groups with two copies in each genome (1,242 bp), whereas the lowest are among the unique genes (Svevo, 735 bp; Zavitan, 768 bp) and, surprisingly, in groups with one copy in each genome (756 bp). Such a pronounced shift towards shorter genes indicates an ongoing gene decay by frameshifts and mutations leading to premature stop codons. Collectively, the relatively high number of genes undergoing degeneration could be a consequence of more freedom for gene loss facilitated by the functional redundancy of the tetraploid genome state.
Germplasm structure and phylogenetic relationships. The wheat iSelect 90K SNP Infinium assay [17] was used to genotype the Global Tetraploid Wheat Collection consisting of 1,856 accessions representing the four main germplasm groups involved in tetraploid wheat domestication history and breeding: WEW, DEW, DWL and DWC (Supplementary Table 11). A set of 17,340 SNPs (non-redundant, genetically and physically mapped, subgenome-specific Mendelian loci) was used for analysis of genetic diversity, population structure and identification of the selection signatures. Four non-hierarchical clustering analyses (DAPC, sNMF, ADMIXTURE, fineSTRUCTURE [18-21]), principal component analysis, and pairwise dissimilarity analysis on the basis of neighbor joining generated global and highly concordant pictures of the genetic relationships among taxa and populations (Fig. 3 and Supplementary Datasets 2 and 3). WEW, DEW, DWL and DWC clearly separated in the neighbor joining tree (Fig. 3a), a result indicative of strong demographic and founder effects and little evidence for polyphyletic origin. Principal component analysis (Fig. 3b) illustrates the broad genetic diversity of DEW, while DWC showed a comparatively limited genetic diversity and a close relationship to a specific DWL population.
For DRI, top and bottom 2.5% DI quantile distributions are highlighted as red- and blue-filled dots, respectively, while for the other selection metrics top 5% quantile distributions are highlighted as red-filled dots. The physical location of genes (Supplementary Table 12) and QTL confidence intervals relevant to domestication and breeding is reported.
ADMIXTURE analysis (Fig. 3c) showed that WEW and DEW have highly structured genetic diversity even at high k values, while DWL showed a high rate of admixture already at low k values. WEW germplasm was divided into two main populations from North Eastern Fertile Crescent and Southern Levant (WEW-NE and WEW-SL, respectively). WEW-NE was further divided into several populations from Turkey, Iran and Iraq, while WEW-SL included distinct populations from Israel (3), Jordan, Syria and Lebanon (Supplementary Fig. 4). DEW and DWL germplasm was characterized by a similar though independent radial dispersal pattern: Northern-to-Southern Fertile Crescent and from Fertile Crescent to Mediterranean basin (Western), Greece to Balkans (Western), Iran to Transcaucasia (Eastern) and Oman to India and Ethiopia. The germplasm belonging to DEW and DWL was subdivided in six main populations each, while all DWC clustered to a further distinct group that represents a wide branch of the durum North African and Turkey to Transcaucasian landrace populations (Supplementary Fig. 4).
After removing the accessions with a high level of admixture, the genetic relationships among the main tetraploid germplasm groups were further investigated using hierarchical analysis of variance (ANOVA), by computing the pairwise divergence index (or fixation index) Fst and Nei's genetic distances (Fig. 4), and by generating population-based whole-genome phylogenetic trees (Supplementary Fig. 5). The results confirm the radial dispersal patterns already reported and indicate the WEW-NE from Turkey as the most probable ancestor of all DEW populations (Fst and genetic distance values consistently lower for all WEW-DEW pairs). Two DEW populations from Southern Levant Fertile Crescent (Fig. 4) showed the closest relationship to all DWL populations (except T. turgidum ssp. turanicum), while the DWC germplasm was mostly related to the two DWL populations from North Africa and Transcaucasia (Fig. 4). The Ethiopian and T. turgidum ssp. turanicum populations were the most differentiated among the DWL germplasm and their contribution to the modern durum varieties was minimal.
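To make the fixation index concrete, the sketch below computes expected heterozygosity and a pairwise Fst from allele frequencies at biallelic SNPs. It is an illustration under stated assumptions, not the estimator used in the study (which is not specified in this section): it uses a Nei-style ratio, (Ht - Hs) / Ht, summed over loci for two equally weighted populations, and all function names and frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style Fst between two populations from per-locus allele frequencies.

    Hs is the mean within-population expected heterozygosity, Ht the expected
    heterozygosity of the pooled (average) allele frequency. Both are summed
    over loci before taking the ratio (Ht - Hs) / Ht.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")
    hs_sum = 0.0
    ht_sum = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        hs_sum += (expected_heterozygosity(p_a) + expected_heterozygosity(p_b)) / 2.0
        ht_sum += expected_heterozygosity((p_a + p_b) / 2.0)
    if ht_sum == 0.0:
        return 0.0  # no variation at any locus, so no measurable differentiation
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical allele frequencies at three SNPs in two populations.
wild = [0.50, 0.40, 0.60]
domesticated = [0.10, 0.45, 0.95]
print(round(pairwise_fst(wild, domesticated), 3))
```

Identical frequencies give a value of 0 and a fixed difference at every locus gives 1, so lower values for a pair of populations indicate a closer relationship, which is how the WEW-DEW comparisons above are read.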
Diversity reduction and signature of selection. The pattern of diversity for each germplasm group was assessed through a SNP-based gene diversity index [22] (Fig. 5a). WEW showed the highest average diversity with only two pericentromeric regions (chromosomes 2A and 4A) with a lower than average diversity. Thus, WEW provides a valuable reference for assessing the reduction of diversity associated with domestication and breeding in tetraploid wheat. Compared to WEW, each of the subsequently domesticated/improved germplasm groups showed several strong diversity depletions that arose independently and were progressively consolidated through domestication and breeding. With few exceptions, the diversity depletions that occurred in the early transition (WEW to DEW, Fig. 5b or DEW to DWL, Fig. 6a) are confirmed or even reinforced in the subsequent ones (Fig. 6a,b). Consequently, the genome of DWC is characterized by numerous regions showing near-fixation of allelic diversity (Fig. 6b). We applied five different metrics to detect selection signatures: diversity reduction index (DRI [23]), single site divergence index (Fst [24]), haplotype-based frequency differentiation index (hapFLK [25]), cross-population extended haplotype homozygosity (XP-EHH [26]), and spatial pattern of site frequency spectrum (XP-CLR [27]). Genomic regions supported by one or more indexes were considered as putative signatures of selection.

Frequently, two or more indexes occurred in overlapping regions, hereafter referred to as selection clusters. In total, 104 pericentromeric (average size 107.7 Mb) and 350 non-pericentromeric (average size 11.4 Mb) clusters were identified in one, two or three transitions. When 41 loci known to be under selection during emmer domestication and durum wheat evolution or breeding (Supplementary Table 12) were projected on the genome, many of them overlapped with selection clusters (Figs. 5 and 6; Supplementary Dataset 4). Most of the strongest pericentromeric diversity depletions (DRI > 4) occurred during emmer domestication (chromosomes 2A, 4A, 4B, 5A, 5B, 6A and 6B). Furthermore, one of the two brittle rachis regions marking the early domestication process (BRT-3B [8]) showed a localized sharp reduction in diversity confirmed by Fst and XP-CLR indexes. The same region then underwent an extreme diversity reduction in the DEW-to-DWL transition (DRI 3.4). An additional 14 pericentromeric and 90 non-pericentromeric (DRI > 2) diversity depletions, including one harboring the major tough glume QTL governing threshability (Tg-2B [28]), occurred during the DEW-to-DWL transition. Finally, several reductions in diversity (75 with DRI > 2) were specifically associated with breeding of modern durum cultivars, including some associated with disease resistance (for example, Sr13 [29] and Lr14 [30]) and grain yellow pigment content loci (for example, Psy-B1 [31]). A detailed description of the selection signatures is presented in the Supplementary Note.
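The notion of a selection cluster, that is, overlapping regions flagged by two or more indexes, can be sketched as an interval-merging step. The example below is a simplified illustration and not the procedure used in the study: the coordinates are hypothetical, the function name is an assumption, and regions are treated as overlapping whenever one starts at or before the end of the previous one on the same chromosome.

```python
def merge_signals(signals):
    """Merge overlapping selection signals into clusters.

    signals: iterable of (chromosome, start, end, metric) tuples.
    Returns a list of (chromosome, start, end, metrics) tuples, where metrics
    is a sorted tuple of the indexes supporting the merged region.
    """
    merged = []
    # Sorting tuples orders signals by chromosome, then by start position.
    for chrom, start, end, metric in sorted(signals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            last[2] = max(last[2], end)  # extend the current region
            last[3].add(metric)
        else:
            merged.append([chrom, start, end, {metric}])
    return [(c, s, e, tuple(sorted(m))) for c, s, e, m in merged]


# Hypothetical signals (positions in Mb) from different indexes.
toy_signals = [
    ("5B", 100, 200, "DRI"),
    ("5B", 150, 300, "Fst"),
    ("5B", 400, 500, "XP-CLR"),
    ("2A", 10, 20, "hapFLK"),
]
regions = merge_signals(toy_signals)
# Regions supported by at least two indexes correspond to selection clusters.
clusters = [r for r in regions if len(r[3]) >= 2]
print(clusters)  # [('5B', 100, 300, ('DRI', 'Fst'))]
```

Keeping the set of supporting metrics per merged region makes it straightforward to separate regions backed by a single index from clusters backed by several.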
Variation for cadmium grain content in tetraploid wheat. Cdu-B1 is a QTL located on the long arm of chromosome 5B, which accounts for >80% of the phenotypic variation in cadmium (Cd) concentration in grain [9,32,33]. The Cdu-B1 region corresponds to a physical interval of 4.27 Mb. A detailed comparison of the Zavitan and Svevo (low and high Cd, respectively) genomes, coupled with exome sequencing, revealed a segment of increased nucleotide variation in this refined region (Supplementary Fig. 6). Furthermore, the region contains 192 gene models, 48 of which have informative

Discussion
The genome assembly of the modern DW cv. Svevo, with a quality level consistent with those recently obtained for other species [8,11,15], represents an essential tool to study durum wheat domestication, evolution and breeding as well as to gain new insights into gene function and the genome-wide organization of QTLs for relevant agronomic traits. This study presents an inclusive analysis of a large panel of tetraploid wheat representing all known taxa and provides a global picture of genetic relationship and population structure. The process leading to modern durum wheat was revealed by the four main germplasm groups of the Global Tetraploid Wheat Collection. The combination of genetic diversity and selection signature analysis revealed a dynamic description of the modifications imposed on the genome by domestication and breeding. The strongest reductions in diversity occurred in well-defined pericentromeric regions during the domestication of WEW. Then, the reduction of diversity continued more moderately, but spread over the genome, during the evolution of DWL and, more recently, as a consequence of the breeding activity [23,41]. Multiple divergence and haplotype metrics identified several regions coincident with known domestication loci, as well as others that might indicate new putative loci under domestication or selection.
Identification of TdHMA3-B1 as the gene most likely responsible for phenotypic variation in grain Cd accumulation, a result supported by genetic and functional evidence, and the recovery of the TdHMA3-B1a allele for low Cd accumulation, provides an example of the relevance of the genomic tools presented here. The increase in frequency of TdHMA3-B1b during DW breeding could be due to the presence of a selective sweep for another gene in the linkage disequilibrium region, although no evidence has been found and further studies are required to support this hypothesis. Alternatively, the non-functional TdHMA3-B1b allele could exert some beneficial effects on plant fitness. Zinc (Zn) assimilation by plants results in the cotransport of Cd, and like other P1B-2-type ATPase transporters [42], TdHMA3-B1 can transport both metals. Although Cdu-B1 has no effect on agronomic performance under Zn-sufficient conditions, the low-Cd line from a pair of Cdu-B1 near-isogenic lines showed reduced biomass compared to the high-Cd line when grown under Zn deficiency [43]. Therefore, TdHMA3-B1b could provide a growth benefit in Zn-deficient soils, such as those that widely occur in wheat-growing regions of Turkey [44] where TdHMA3-B1b originated. A reduction in root vacuolar sequestration of Cd and Zn in high-Cd genotypes (non-functional TdHMA3-B1b allele) under Zn-limiting conditions may increase the pool of Zn available for transport to the shoot, thereby sustaining shoot growth.
Access to the fully annotated genome sequence in combination with the wealth of genotypic, genetic mapping [12] and gene expression data provides great potential for future innovation for the wheat scientific community and the breeding sector. Gene discovery, QTL cloning and the precision of genomics-assisted breeding to enhance grain quality and quantity of pasta wheat will benefit from the resources presented here. Furthermore, the durum sequence provides a fundamental tool to more effectively bridge and harness the allelic diversity present in wheat ancestors, most of which remains largely untapped.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, statements of data availability and associated accession codes are available at https://doi.org/10.1038/s41588-019-0381-3.

Data collection
This work took advantage of the novel genome sequence of durum wheat cv. Svevo generated by the authors and of existing datasets downloaded from public databases as detailed in the Supplementary note (section 1: Additional Materials and Methods). Swiss-Prot (version 02-15-10), Arabidopsis Araport 11 (version 201606), TrEMBL (version 02-15-10) and UniProt databases were the main sources.

Data analysis
This work employed a large number of software tools and data analyses; each tool is cited in the Supplementary note (section 1: Additional Materials and Methods) along with the corresponding reference or website information. In summary (numbers refer to the references listed in the Supplementary note file): 1-Genome assembly was carried out using the proprietary software package DenovoMAGIC2™ (NRGene, Nes Ziona, Israel) as described (ref. 8). The SNP dataset was filtered for uniqueness of map location (single-locus Mendelian loci) and absence of singletons and double-singletons. In order to limit the interference effects caused by ascertainment bias, which is particularly relevant for wild emmer and domesticated emmer accessions, the SNPs were further selected for overall null allele frequency < 0.25 (failure rate). All the fine-mapping and positional cloning steps of the cadmium transporter locus, including several genotyping and phenotyping steps, were carried out using standard replications. The same applies to the candidate gene expression analysis, including both in planta and in yeast expression and complementation experiments. Importantly, near-isogenic line stocks, collections of diverse germplasm of adequate size, and yeast mutant stocks were used as the most adequate materials to guarantee highly repeatable and reliable results. In conclusion, the experimental design, methods of replication and validation, and statistical methods used meet the standards outlined by Nature Genetics. All attempts at replication were successful. Several experiments required multiple replications and in each case the number of replications is described.
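The SNP filtering criteria summarized above can be sketched as a per-marker check. This is an illustrative reading, not the authors' pipeline: genotype calls are assumed to be homozygous allele codes with `None` for failed calls, "singletons and double-singletons" is interpreted here as a minor allele carried by only one or two accessions, and the function name and thresholds other than the 0.25 failure rate are hypothetical.

```python
from collections import Counter


def keep_snp(calls, max_failure_rate=0.25, min_minor_count=3):
    """Decide whether a SNP passes the illustrative quality filters.

    calls: list of allele codes per accession (for example "A" or "B"),
           with None marking a failed (null) call.
    A SNP is kept when its failure rate is below max_failure_rate, it is
    biallelic, and the minor allele occurs in at least min_minor_count
    accessions (assumption: this excludes singletons and double-singletons).
    """
    if not calls:
        return False
    failed = sum(1 for call in calls if call is None)
    if failed / len(calls) >= max_failure_rate:
        return False
    counts = Counter(call for call in calls if call is not None)
    if len(counts) != 2:
        return False  # monomorphic or not biallelic
    return min(counts.values()) >= min_minor_count


# Hypothetical calls for three SNPs across ten accessions.
good_snp = ["A"] * 5 + ["B"] * 4 + [None]
singleton_snp = ["A"] * 9 + ["B"]
failing_snp = ["A"] * 4 + ["B"] * 3 + [None] * 3
print(keep_snp(good_snp), keep_snp(singleton_snp), keep_snp(failing_snp))
# prints: True False False
```

Applying the failure-rate filter across the whole panel, rather than per germplasm group, matches the "overall" wording above; a per-group threshold would be a different design choice.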
Randomization
When generating the Illumina NGS Svevo genome sequencing data, library construction was carried out assuring the high quality and integrity of the original genomic DNA used. Random DNA fragmentation (shearing/sonication) and tagmentation for Nextera libraries were carried out by following carefully optimized Illumina protocols that ensured the random representativeness of genomic sequences for all paired-end (PE) and mate-pair libraries. Concerning the distribution of sequencing depth among the different types of libraries, 123X coverage was dedicated to PE libraries of 450 bp insert size (half of total coverage); this was required to obtain highly accurate contig assemblies. Coverage dedicated to each of the 750 bp PE and to the three Nextera mate-pair libraries (3 kb, 5 kb, 10 kb) was equally partitioned (38-41X each) in order to ensure that balanced sequence information of different insert sizes was conveyed to the scaffold assembler. During the selection of the accessions of the tetraploid diversity panel, great attention was given to sampling accessions from each of the four main wheat germplasm groups (WEW, DEW, DWL, DWC) to ensure an accurate germplasm representativeness. Within each group, uniform sub-regional sampling and the availability of accessions in germplasm banks were considered in order to represent all main sub-areas related to diffusion and domestication. Within sub-areas, random sampling in balanced numbers was carried out, after passport inspection of the accessions available in seedbanks, in order to limit as much as possible the sampling of duplicated or highly related accessions. Prior to the final diversity analysis, great care was taken in the joint inspection of the passport and molecular information available in order to filter out clearly duplicated, highly similar and redundant accessions. Germplasm structure was assessed in great detail, based on two independent dedicated software packages. As to the whole-genome scan for differential gene diversity among the four main germplasm subgroups, given the predominantly descriptive objective of this analysis, namely describing the diversity present among germplasm collections, selective sweep tests and corrections for population structure were not applied, except for the initial withdrawal of the Ethiopian DEW and DWL accessions, two groups clearly distinct from the main bulks of European, Mediterranean and Central Asian germplasm. A similar approach was applied for the cadmium transporter allele survey distribution (carried out for all 1,854 accessions available). As to the fine-mapping and positional cloning steps of the cadmium transporter locus, including several genotyping and phenotyping steps, standard randomization best practices were used across all experiments.

Blinding
Not applicable. This research did not include experiments subject to observer bias.

Fig. 1 | Structural, functional and conserved synteny landscape of the DW genome. Tracks from outside to inside. a, Chromosome name and size (100 Mb tick size, arms differentiated by gray shading). b, Density of WEW HC gene models (HC; 0-25 genes per Mb). c, Links connecting homologous genes between WEW and DW. d, Density of DW HC gene models (0-22 genes per Mb). e, Location of published QTLs. f, k-mer frequencies. g, Long terminal repeat (LTR)-retrotransposon density. h, DNA transposon frequency. i, Mean expression of HC genes calculated as log(FPKM + 1) of the mean expression value of all conditions (range 1.6-8.2). Links in center connect homoeologous genes between subgenomes; blue links between homoeologous chromosomes and green links between large translocated regions.

Fig. 2 | Comparison of the Svevo and Zavitan gene space. a, Main unigene group scenarios from co-clustering of Svevo and Zavitan HC genes. The diagram depicts the most common or typical scenarios; only HC intact genes were considered (CNV, copy number variation; CDS, coding sequence). b, Intact gene number variations. Each dot represents a gene cluster consisting of DW (x axis) and WEW (y axis) genes. Dots on the diagonal represent clusters with identical member numbers from both accessions. Functional predictions for some groups of genes with pronounced differences in member numbers are annotated on the diagram. c, Relative distance of DW genes from the centromere separated by gene cluster type. Proportionately more unigenes displaying intact gene number variation than balanced groups are observed towards the ends of the chromosome. HC genes unique to Svevo or Zavitan (not shown) are most highly represented at the ends of the chromosomes, with the median (black line) furthest from the centromere. Shape width represents the relative gene frequency.

Fig. 3 | Tetraploid germplasm structure and phylogenetic relationships. a, Neighbor-joining tree from Nei's genetic distances among the 1,856 accessions of the Global Tetraploid Wheat Collection. Genetic distances were computed from a set of 5,775 whole-genome linkage disequilibrium-pruned (r² = 0.5) SNPs. Correspondence between branches and main tetraploid wheat taxa/populations on the basis of ADMIXTURE and other population structure analyses is indicated by color code, with the exception of T. turgidum ssp. karamychevii and ssp. isphahanicum, which are indicated directly on the graph. Significance was estimated through 1,000 bootstrap resamplings. b, Principal component analysis plot of the Global Tetraploid Wheat Collection on the basis of genome-wide pairwise distances calculated from linkage disequilibrium-pruned SNPs. c, ADMIXTURE analyses of the Global Tetraploid Wheat Collection with k (number of populations assumed for the analysis) from 2 to 20.

Fig. 4 | Summary of Nei's genetic distances GDst (above diagonal) and pairwise Fst (below diagonal) between main tetraploid wheat populations. Diagonal numbers represent within-population genetic diversity (expected heterozygosity) values. Only low-admixture accessions were used (Q-membership higher than 0.5 for WEW, DEW; Q-membership higher than 0.4 for DWL, DWC). Statistics were estimated with 5,775 linkage disequilibrium-pruned (r² = 0.5) SNPs. WEW-NE, WEW from the North Eastern Fertile Crescent, Turkey, Iran and Iraq; WEW-SL, WEW from Southern Levant including Lebanon, Syria, Israel and Jordan; DEW-T-TRC-IRN, DEW from Turkey to Transcaucasia and Iran; DEW-T-BLK, DEW from Turkey to the Balkans; DEW-SthEU, DEW spread in Southern Mediterranean areas; DEW-SL-EU1, DEW from Southern Levant Fertile Crescent to Europe (population 1); DEW-SL-EU2, DEW from Southern Levant Fertile Crescent to Europe (population 2); DEW-ETH, DEW from Oman, India and Ethiopia; DWL-SL-NA, DWL from Southern Levant Fertile Crescent to North Africa and Iberia; DWL-GRC-BLK, DWL from Greece to Balkans; DWL-T-TRC, DWL from Turkey to Transcaucasia; DWL-T-FC, DWL diffused in Turkey to the whole Fertile Crescent; DWL-TRN, T. turanicum; DWL-ETH, DWL from Ethiopia; DWC-DRY, DWC from Italian and ICARDA breeding programs adapted to dryland areas; DWC-ITLY, DWC from Italy; DWC-CIM70, DWC from the wide adaptation, temperate-adapted photoperiod insensitive CIMMYT and ICARDA germplasm bred in the 1970s; DWC-CIM80, DWC from the high-yielding CIMMYT germplasm bred in the 1980s; DWC-AMR, DWC from the photoperiod-sensitive North American and French germplasm.
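To make the diagonal and below-diagonal statistics concrete, here is a minimal sketch of expected heterozygosity and a Nei-style pairwise fixation index for biallelic SNPs. It is a textbook formulation under stated assumptions (equal population weights, ratio of sums over SNPs, no sample-size correction) and is not claimed to be the estimator or software used for the figure; the function names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(freqs_pop1, freqs_pop2):
    """Nei-style Fst = (H_T - H_S) / H_T between two populations.

    freqs_pop1, freqs_pop2: allele frequencies of the same SNPs in each population.
    Assumes equal population weights and sums H_S and H_T over SNPs before dividing.
    """
    hs_sum = 0.0  # mean within-population heterozygosity, summed over SNPs
    ht_sum = 0.0  # heterozygosity of the pooled population, summed over SNPs
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        hs_sum += (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
        ht_sum += expected_heterozygosity((p1 + p2) / 2.0)
    if ht_sum == 0.0:
        return 0.0  # no variation at any SNP, so differentiation is undefined; report 0
    return (ht_sum - hs_sum) / ht_sum
```

Two populations fixed for alternative alleles give a value of 1, while identical allele frequencies give 0.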

Fig. 5 | Genome-wide analysis of SNP diversity in the Global Tetraploid Wheat Collection and cross-population selection signatures from wild to domesticated emmer transition (WEW to DEW) on the basis of 17,340 informative SNPs. a, SNP-based diversity index (DI) for the main germplasm groups identified in the Global Tetraploid Wheat Collection: WEW, DEW, DWL and DWC. DI is reported as a centered 25 SNP-based average sliding window (single SNP step). Top and bottom 2.5% DI quantile distributions are highlighted as red- and blue-filled dots, respectively. b, Cross-population selection index metrics for the comparison between WEW and DEW. Selection metrics are provided for: diversity reduction index (DRI), divergence index (Fst), cross-population extended haplotype homozygosity (XP-EHH), multilocus test for allele frequency differentiation (XP-CLR) and haplotype-based differentiation test (hapFLK). For DRI, top and bottom 2.5% DI quantile distributions are highlighted as red- and blue-filled dots, respectively, while for the other selection metrics top 5% quantile distributions are highlighted as red-filled dots. The physical location of genes (Supplementary Table 12) and QTL confidence intervals relevant to domestication and breeding is reported.
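The smoothing and highlighting described for panel a can be sketched as follows: a centered 25-SNP mean with a single-SNP step, followed by flagging the bottom and top 2.5% of the smoothed values. The truncated windows at the ends of the SNP series and the nearest-rank quantile cutoffs are assumptions, as are the function names; the caption does not specify either detail.

```python
import math


def centered_sliding_mean(values, window=25):
    """Centered moving average over SNP-ordered values, one output per SNP.

    Windows are truncated at both ends of the series (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def quantile_flags(values, lower=0.025, upper=0.975):
    """Label each value 'low', 'high' or None using nearest-rank cutoffs."""
    if not values:
        return []
    ranked = sorted(values)
    n = len(ranked)
    low_cut = ranked[math.floor(lower * (n - 1))]
    high_cut = ranked[math.ceil(upper * (n - 1))]
    flags = []
    for v in values:
        if v <= low_cut:
            flags.append("low")    # bottom tail (blue dots in the figure)
        elif v >= high_cut:
            flags.append("high")   # top tail (red dots in the figure)
        else:
            flags.append(None)
    return flags
```

In practice the smoothing would presumably be applied per chromosome so that windows do not span chromosome boundaries; that choice is also an assumption here.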

Fig. 6 | Analysis of diversity and selection signatures in tetraploid wheat. Genome-wide cross-population selection signatures in DEW to DWL and DWL to DWC on the basis of 17,340 informative SNPs. a, Cross-population selection index metrics for the DEW to DWL transition. b, Cross-population selection index metrics for the DWL to DWC transition. For both panels, selection metrics are provided for: DRI, Fst, XP-EHH, XP-CLR and hapFLK. For DRI, top and bottom 2.5% DI quantile distributions are highlighted as red- and blue-filled dots, respectively, while for the other selection metrics top 5% quantile distributions are highlighted as red-filled dots. The physical location of genes (Supplementary Table 12) and QTL confidence intervals relevant to domestication and breeding is reported.

Received: 20 December 2017; Accepted: 22 February 2019; Published online: 8 April 2019

Reporting Summary (Nature Research, October 2018). Corresponding author(s): Luigi Cattivelli. Last updated by author(s): Feb 12, 2019.

Genome annotation was supported with the following tools:
• HISAT2 (version 2.0.4)⁹ to align multiple sets of RNA-seq data to the assemblies;
• StringTie (version 1.2.3)⁹ to assemble mapped reads into transcript sequences for each dataset separately;
• GMAP¹¹ (version 06/30/2016) to align all sequences to the assemblies;
• Cuffcompare from the Cufflinks software suite¹² for transcript predictions;
• TransDecoder package (version 3.0.0) to extract the longest open reading frames for each transcript sequence and to translate them into predicted protein sequences;
• AHRD tool (Automated Assignment of Human Readable Descriptions, https://github.com/groupschoof/AHRD, version 3.3.3) to annotate gene functions;
• BUSCO (Benchmarking Universal Single-Copy Orthologs) tool (version 2, Embryophyta odb9)¹⁴ to determine the abundance of strongly conserved genes in the sets of all annotated genes;
• Tallymer¹⁵ to calculate the basic k-mer-defined repetitivity;
• Vmatch (www.vmatch.de) for transposon detection and classification by a homology search against the REdat_9.7_Triticeae and the