The genomes of 204 Vitis vinifera accessions reveal the origin of European wine grapes

To elucidate the still controversial processes that gave rise to European wine grapes from their wild progenitor, here we analyse 204 genomes of Vitis vinifera and show that all analyses support a single domestication event that occurred in Western Asia and was followed by numerous and pervasive introgressions from European wild populations. This admixture generated the so-called international wine grapes that have diffused from Alpine countries worldwide. Across Europe, marked differences in genomic diversity are observed in local varieties that are traditionally cultivated in different wine-producing countries, with Italy and France showing the largest diversity. Three genomic regions of reduced genetic diversity are observed, presumably as a consequence of artificial selection. In the lowest-diversity region, two candidate genes that gained berry-specific expression in domesticated varieties may contribute to the change in berry size and morphology that makes the fruit attractive for human consumption and adapted for winemaking.

We analysed WGS-derived short-read DNA sequences, using a reference-based alignment strategy, from a sample of 204 Vitis vinifera accessions (WGS panel) that captured the genetic diversity in the species, based on existing public information on the characterisation of grapevine germplasm repositories with SNP chips (diversity panel). DNA analyses were aimed at describing intraspecific genetic diversity and population structure as well as at characterizing genomic regions that showed low levels of nucleotide and haplotype diversity as a possible consequence of artificial selection. To finely characterize the genomic region that showed the lowest level of diversity, we generated RNA short reads and phenotypic data to investigate the impact of low diversity in the region on gene expression and berry-related traits.
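As an illustration of how a region of low nucleotide diversity might be located, the following minimal sketch computes per-base nucleotide diversity (π) in non-overlapping windows from biallelic allele counts. It is not the pipeline used in this study: the function names, the toy input and the assumption that every base in a window was callable are all hypothetical, and the estimator is the standard unbiased per-site heterozygosity 2pq·n/(n−1).

```python
from typing import List, Sequence, Tuple


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Unbiased heterozygosity at one biallelic site.

    alt_count: number of sampled chromosomes carrying the alternate allele.
    n_chrom:   number of sampled chromosomes (e.g. 2 x number of diploid accessions).
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    return 2.0 * p * (1.0 - p) * n_chrom / (n_chrom - 1)


def windowed_pi(
    sites: Sequence[Tuple[int, int, int]], window: int, length: int
) -> List[float]:
    """Per-base pi in non-overlapping windows along one sequence.

    sites:  (1-based position, alt_count, n_chrom) for each variant site.
    window: window size in bp.
    length: sequence length in bp.
    Assumption (hypothetical): every base in each window was callable,
    so the window size is used as the denominator.
    """
    n_windows = (length + window - 1) // window
    totals = [0.0] * n_windows
    for pos, alt, n in sites:
        totals[(pos - 1) // window] += site_pi(alt, n)
    result = []
    for i, total in enumerate(totals):
        size = min(window, length - i * window)  # last window may be shorter
        result.append(total / size)
    return result


if __name__ == "__main__":
    # Toy data, not from the study: two variant sites on a 20-bp sequence.
    toy_sites = [(1, 2, 4), (15, 1, 4)]
    print(windowed_pi(toy_sites, window=10, length=20))
```

Windows whose π falls well below the genome-wide distribution would be candidates for closer inspection; in real data the denominator should count only callable sites, and the choice of window size affects the resolution.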
The research sample for WGS is a group of cultivated grapevine varieties that is compared to undomesticated accessions. The sample is meant to represent all major groups of wine grapes that provided the foundation of the global wine industry and to include controls from table grapes and undomesticated grapes. This sample choice serves the aims of the study described in the panel above (which require that the analysed set captures most of the intraspecific variation in genetic diversity) and addresses the evolutionary questions about the relationships between natural populations and botanical groups of cultivated varieties (which require genetic variation within groups to be well represented). The study is also extended to a larger diversity panel (n = 1,445) using data from existing data sets cited in Figure  . These data sets represent the state of the art for the known genetic variation in the species V. vinifera. All accessions therein are characterized at a common set of variant sites that could be compared with WGS data. The rationales for including these data sets in the present study are: 1) to obtain a highly reliable representation and validation of the fraction of diversity captured by the WGS panel compared to the known diversity; 2) to extend population genetic analyses, for parameters that are first accurately defined in the WGS panel using millions of unbiased SNPs, to the largest genetic data sets available from the literature and public repositories, although these are genotyped at only a subset of variant sites.
Sample size was not predetermined. For evolutionary analyses based on whole-genome sequencing, the sample included 204 accessions that represent well the known diversity in the species. This sample size is sufficient to capture the known genetic diversity. We performed an a posteriori validation by comparing this sample (n = 204) with a larger diversity panel (n = 1,445) using GBS data from existing data sets. For RNA analysis and the characterization of the selective sweep on chromosome 17, we used a subset of accessions that carry, in different diploid combinations, the haplotypes that had been identified in the whole n = 204 sample.
Berries of 88 varieties were sampled at the same developmental stage on different dates (Supplementary Data 2), from two replicated field plots. From each plot, two batches of asynchronous berries were collected from the same bunches, one composed of hard berries (target developmental stage: 5.2°Brix), the other composed of soft berries (target developmental stage: 6.4°Brix), both sorted by firmness to the touch. The accuracy of berry sorting was validated by drawing random subsets of berries from each batch for destructive measurements, e.g. soluble solids concentration (Figure 6), berry weight, number of seeds per berry, seed fresh weight, and derived parameters (Supplementary Data 2). Data were collected by Gabriele Di Gaspero. Soluble solids concentration was read visually using a hand-held refractometer with automatic temperature compensation (ATC-1, ATAGO CO., LTD, Saitama, Japan). All data mentioned here were recorded in Excel spreadsheets.

August 2016 Gregorian Date). Each accession was sampled at a single time-point, corresponding to the exact day when hard and soft berries coexisted on the same bunches. The day of collection of each accession (when this condition occurred) is reported in Supplementary Data 2 along with soluble solids concentration data that account for the occurrence of this condition. The rationale for this sampling schedule was the need to collect berries from accessions with different phenology at a synchronized developmental stage. The samples were taken along a geographical line spanning approximately 90 km. The exact locations of the three sampling sites along this line are 46.06 N, 12.84 E; 46.03 N, 13.23 E; 45.86 N, 13.96 E.
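The stated span of approximately 90 km can be checked from the three site coordinates with a great-circle (haversine) calculation. The sketch below is only a worked check, not part of the study's methods; the function name, the spherical-Earth approximation and the mean-radius constant are assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Sampling sites as (latitude N, longitude E), copied from the text above.
SITES = [(46.06, 12.84), (46.03, 13.23), (45.86, 13.96)]

# Distance between the two outermost sites.
end_to_end_km = haversine_km(*SITES[0], *SITES[-1])

# Distance along the line, visiting the sites in order.
along_line_km = sum(
    haversine_km(*SITES[i], *SITES[i + 1]) for i in range(len(SITES) - 1)
)

if __name__ == "__main__":
    print(f"end to end: {end_to_end_km:.1f} km")
    print(f"along line: {along_line_km:.1f} km")
```

Both figures come out close to 90 km, consistent with the span reported in the text.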
One sample from the n = 204 set was excluded from certain analyses, and this was stated in the relevant part of the text: "We first used a model-based clustering approach using whole genome sequence data of 203 accessions of vinifera (after removing accession KE-06 from this specific analysis, following the classification of this individual as a feral escapee done by Liang and coworkers)". For genome-wide association analysis (GWAS) of the seed-to-berry ratio (reported in Supplementary Figure 20 for explaining the phenotypic effect of the selective sweep on chromosome 17), the accessions Sultanina and Kishmish Vatkana were excluded because they carried only remains of undeveloped seeds due to stenospermocarpy, a trait controlled by an independent locus on chromosome 18.
Reproducibility of the results regarding the genetic structure of the grapevine germplasm, the relationships between populations and the origin of European wine grapes was assessed by testing alternative assumptions and different hypothetical evolutionary scenarios, as reported in Supplementary Note 5 and Supplementary Figures S13-20.
Population genetic analyses were conducted without a priori allocation of samples into groups, with one exception explained below. Grouping of samples for generating figures and for calculating parameters within populations was based solely on DNA variation data, with an agnostic stance towards other assumptions, which we considered necessary for an unbiased approach. The only exception to this rule was made for generating Figure 1b, with the following rationale. A priori geographic clustering was used only for the TreeMix analysis on the extended germplasm of the diversity panel. Individual varieties were assigned to a country of origin based on information reported in the Vitis International Variety Catalogue database or, failing that, to the earliest or most renowned growing area. Countries were grouped into broad geographical areas that have climate conditions that are homogeneous within areas and differentiated between areas, following the Köppen-Geiger climate classification map. All groups were defined in the main article and the complete composition of all groups is reported in Source Data.
Berries used for generating phenotypic data were sampled blind, regardless of their size, shape, color, position on the bunch, or position on the vine.