Genome-wide patterns of population structure and association mapping of nut-related traits in Persian walnut populations from Iran using the Axiom J. regia 700K SNP array

Persian plateau (including Iran) is considered as one of the primary centers of origin of walnut. Sampling walnut trees originating from this arena and exploiting the capabilities of next-generation sequencing (NGS) can provide new insights into the degree of genetic variation across the walnut genome. The present study aimed to explore the population structure and genomic variation of an Iranian collection of Persian walnut (Juglans regia L.) and identify loci underlying the variation in nut and kernel related traits using the new Axiom J. regia 700K SNP genotyping array. We genotyped a diversity panel including 95 walnut genotypes from eight Iranian provinces with a variety of climate zones. A majority of the SNPs (323,273, 53.03%) fell into the “Poly High Resolution” class of polymorphisms, which includes the highest quality variants. Genetic structure assessment, using several approaches, divided the Iranian walnut panel into four principal clusters, reflecting their geographic partitioning. We observed high genetic variation across all of the populations (HO = 0.34 and HE = 0.38). The overall level of genetic differentiation among populations was moderate (FST = 0.07). However, the Semnan population showed high divergence from the other Iranian populations (on average FST = 0.12), most likely due to its geographical isolation. Based on parentage analysis, the level of relatedness was very low among the Iranian walnuts examined, reflecting the geographical distance between the Iranian provinces considered in our study. Finally, we performed a genome-wide association study (GWAS), identifying 55 SNPs significantly associated with nut and kernel-related traits. In conclusion, by applying the novel Axiom J. regia 700K SNP array we uncovered new unexplored genetic diversity and identified significant marker-trait associations for nut-related traits in Persian walnut that will be useful for future breeding programs in Iran and other countries.

Principal component analysis (PCA) was used to identify patterns of diversity among genotypes, based on phenotypic traits ( Supplementary Figs S2 and S3). PCA showed that the first eight components explained 82.1%  Table 2. Fruit traits utilized in the studied walnut genotypes. a SD is an abbreviation of standard deviation, which was calculated based on the measured values of twelve seeds. b CV is an abbreviation of coefficient of variation, which was estimated as the ratio of the standard deviation to the mean of all genotypes.
www.nature.com/scientificreports www.nature.com/scientificreports/ of the total variance among the 95 Iranian walnut genotypes (Supplementary Table S2). The first principal component accounted for 25.9% of the total phenotypic variation and showed the highest positive correlation with nut length (0.37), size index (0. 35 Genotyping and data quality control. We genotyped the whole Iranian collection using the latest the Applied Biosystem Axiom J. regia 700K SNP array (Table 1; Supplementary Table S1; Fig. S1). By applying default thresholds (dish quality control -dQC < 0.82 and quality control call rate < 0.97), all of the 95 samples passed the quality standards. The average cluster call rate and average reproducibility were 99.75% and 99.95%, indicating high quality genotyping results.
The SNPs were categorized into the six default classes using Affymetrix Power Tools (APT): 1) Poly High Resolution (PHR), which comprises polymorphic SNPs with three high-resolution genotypic clusters; 2) No Minor Homozygote (NMH), SNPs with no samples of the minor homozygous genotypes; 3) Mono High Resolution (MHR), SNPs which are monomorphic across the genotypes studied; 4) Call Rate Below Threshold (CRBT), SNPs with genotype call rate below threshold (97%); 5) Off-Target Variant (OTV) polymorphisms, where the genotyping data with low-intensity cluster resulted from dissimilarity between the probe and the target sequences and; (6) Other, which includes SNPs with no clear cluster pattern of the genotypic data.
A summary of the distribution of all SNPs in the different categories is shown in Table 4. Overall, the vast majority of the SNPs (323,273; 53.03%) fell into the PHR class of polymorphisms, which represent the highest quality variants. These were filtered for missing rate (<20%) and minor allele frequency (MAF > 0.05), obtaining a final subset of 313,657 PHR SNPs that were used in the subsequent analyses.
Population structure analysis. To study the structure of Persian walnut populations and the genetic relationship among samples, three different analyses were performed. Principal Component Analysis (PCA) and cluster analysis (CA) were used to assess the genetic distances among Iranian walnut genotypes by using a linkage disequilibrium (LD)-pruned SNP subset of 33,336 PHR polymorphisms. Figure 1a shows the first two principal components, which explained 8.05 and 7.33% of the total variation, respectively. PC1 clearly distinguishes Kerman individuals from the other provinces, while along PC2, genotypes from Fars and Yazd provinces clustered separately from Semnan (Fig. 1a). The cultivar Chandler, which was used as standard during the genotyping process, grouped with individuals from Ilam (Fig. 1a). Overall, we identified a clear separation of the Iranian walnut genotypes into four main genetic clusters centered in (1) Kerman, (2) Fars and Yazd, (3) Semnan, and (4) Ilam, Markazi, Hamedan, and West Azerbaijan provinces (Fig. 1a). By overlaying a map of Iran, we can observe a coincidence between these four walnut subpopulations and potential barriers to gene flow, such as mountains and deserts.
In contrast, the cluster analysis on a genome-wide Identical-By-State (IBS) matrix grouped the Iranian walnut genotypes into five groups ( Supplementary Fig. S4): (1) Yazd, Fars and West Azerbaijan, (2) Kerman, (3) Ilam, Markazi and Hamedan, (4) Semnan and (5) Ilam and Chandler (Fig. 2b). As with PCA, the cluster analysis assigned all genotypes to their geographical regions ( Supplementary Fig. S4), showing the Fars-Yazd and Semnan groups as the two most distant and the genotypes from Ilam and Markazi as genetically closest.  Table 3. Correlations among the fruits traits in the studied genotypes of walnut. **Correlation is significant at the 0.01 level. *Correlation is significant at the 0.05 level. ns Correlation is no significant. Traits abbreviations are explained in detail in the material and method section.
www.nature.com/scientificreports www.nature.com/scientificreports/ We then applied the model-based clustering approach implemented in fastSTRUCTURE software to determine the most likely number of genetic groups (K) within our Iranian walnut collection. According to the best choice algorithm function of fastSTRUCTURE, the most likely K ranged from 2 to 5 ( Fig. 1c). At K2 the two main groups comprised (1) Kerman individuals (n = 41), (2) all other genotypes (Fig. 1c). At K3 the three major groups encompassed (1) Kerman individuals (n = 41), (2) Semnan individuals (n = 9), and (3) the genotypes from other provinces (Fig. 1c). At K4, in addition to the clusters identified at K3, we observed a fourth group including the cultivar Chandler and two individuals from Ilam (Fig. 1c). At K5, we identified a further substructure with the Kerman population dividing it into two groups, (1) thirty-eight Kerman individuals, and (2) three very old Kerman individuals (Fig. 1c). As with the cluster analysis, most of the Ilam population (n = 10) and all of the Markazi (n = 4) individuals were not clearly assigned to a defined group by fastSTRUCTURE, showing  Table 4. Summary of SNP data generated in walnut populations using Axiom J. regia 700K SNP array. The SNPs classes are explained in detail in the results section. www.nature.com/scientificreports www.nature.com/scientificreports/ admixture with the Kerman, Semnan and Fars groups (Fig. 1c, K6, Supplementary Table S3). According to the marginal likelihood graph, the most likely number of subgroups in our Iranian walnut diversity panel was K3 ( Supplementary Fig. S5). The three-methods of population structure analysis each identified different degrees of substructure, but overall we can conclude that the Iranian walnut population panel comprises mainly four genetic clusters ( (Table 5).
The lowest value of allelic richness (A r ) was found in the West Azerbaijan population (1.44), while the Ilam and Kerman populations both showed the highest value of A r (1.62) ( Table 5). The observed heterozygosity (H o ) ranged from 0.30 in the Markazi population to 0.34 in the Ilam and West Azerbaijan populations (0.33) ( Table 5). The expected heterozygosity (Nei's gene diversity, H e ) varied from 0.26 (West Azerbaijan) to 0.34 (Kerman). The lowest (0.31) and highest (0.35) unbiased expected heterozygosity values (UH E ) were identified in the Semnan and Ilam populations respectively ( Table 5). The fixation index (F IS ) ranged from −0.29 in the West Azerbaijan population to 0.03 in Kerman and Fars populations, but on average, we observed low values of F IS in all Iranian walnut populations, indicating a deficit of homozygotes (Table 5).
To study the amount of genetic differentiation among the Iranian walnut populations, we evaluated genetic differentiation parameters (including G ′ ST , D Jost and F ST ) for each pairwise comparison between the Iranian walnut populations (Table 6 and Supplementary Tables S4, 5).
The global F ST , F IT and F IS values (0.07, 0.11 and 0.04 respectively) among our Iranian walnut populations indicate moderate genetic differentiation (Supplementary Table S4). A clear differentiation was observed between the Semnan population and those of Yazd, Fars, West Azerbaijan and Kerman provinces ( Table 6). The same differentiation was also observed when estimating the D Jost (Supplementary Table S5).These findings are in agreement with the above analysis of population structure. Comparison to the Iranian map (Fig. 1b), indicates that all of the genomic results for our population correlate with geographical and ecological location.
Parental analysis. We investigated the level of relatedness within our walnut collection, including 'Chandler' as a commercial walnut cultivar. A pair of Kerman genotypes had a proportion of IBD alleles (PI-HAT) value > 99% and were therefore considered to be genetically identical. The first and second-degree relationships, the most informative, are shown in Table 7. In first-degree relationships, the probability of parent-offspring pairs sharing zero, one, and two IBD alleles (Z0, Z1, and Z2) is expected to be closer to 0, 1 and 0 respectively, while for second-degree pairs, Z0, Z1, and Z2 are expected to be roughly 0.5, 0.5 and 0 respectively. Nine pairs of individuals showed PI_HAT equal to or higher than 0.5, and were then considered to be first-degree relatives ( Table 7). Eight pairs showing PI-HAT ≈ 0.25-0.35 and relatively high Z0 and Z1 (≈0.5) values were considered second-degree pairs (Table 7). Interestingly, we identified a second-degree relationship between two individuals www.nature.com/scientificreports www.nature.com/scientificreports/ from Ilam population and the cv. 'Chandler' . Overall, we identified first and second-degree relationships only between individuals of the same province.
Association mapping for nut-related traits. Due to the great phenotypic variability observed in our collection for nut-related traits, we ran GWAS to dissect, for the first time in walnut, the genetic basis of these traits. GWAS was performed using as phenotypic entrance, both the average of 12 kernels per tree and the individual best linear unbiased predictors (BLUPs), as well as the first five PCs of the phenotypic data. We applied the SUPER and FarmCPU algorithms, accounting for both population structure and kinship. A total of 55 loci on 11 chromosomes were significantly associated with six nut and kernel-related traits and the first five PCs of the phenotypic data ( Fig. 2; Supplementary Table S6). Three SNPs on chromosome 7 were identified for nut weight. Six SNPs on chromosome 3 were identified for kernel percentage. For both nut width and nut thickness a common SNP on chromosome 1 was identified. Six SNPs, two on chromosome 10 and one each on chromosomes 3, 4, 7, and 13, were identified for shape index. For roundness index, nine SNPs were identified -three on chromosome 10 and one each on chromosomes 1, 3, 4, 6, 8, and 13. We also found 19 SNPs significantly associated with the first five PCs of the phenotypic data. The allelic effect and minor allele frequency (MAF) for these ranged from −5.72 to 12.24 and 0.05 to 0.50 respectively. No significant associations were found for other traits but at a suggestive threshold we found association for all of the traits studied except kernel shrivel, kernel veins, and packing tissue thickness.
Considering the suggestive threshold, we identified 56 loci associated with 13 nut and kernel related traits (Supplementary Table S6). Seven SNPs, six on chromosome 7, and one on chromosome 13 were identified for shell texture. For shell seal, six were on chromosome 13, and one on chromosome 14. Six SNPs on chromosomes 8 (n = 3) and 13 (n = 3) were detected for kernel filled trait. We found 5 SNPs each associated with kernel weight, kernel plumpness, and ease of kernel removal from nuts. Four SNPs each were associated with nut length, size index, shell color, and kernel color and located across chromosomes 1, 2, 4, 6, 7, 10, 11, 12, 13, and 16. Two SNP on chromosome 4 were detected for nut shapes and one SNP each on chromosomes 3 and 4 were identified for shell strength. A single SNP on chromosome 13 was identified for shell thickness. The allelic effect and MAF for these suggestive SNPs ranged from −1.94 to 3.04 and 0.07 to 0.49 respectively.
The 55 significant (41 unique sequence) and 56 suggestive SNPs associated with nut and kernel-related traits were annotated using BLASTx queries (Supplementary Table S6). We found that the most significant SNPs associated with both shape index and round index, fell in genes coding for the WRKY transcription factor 70 isoform X1 (E value = 1E-12; (Fig. 2) and WD repeat-containing protein 3 isoform X2 (E value = 7E-09). Also, the SNP on chromosome 13 associated to kernel fill is located within a gene encoding for the cyclic DOF factor 2-like (E value = 1E-08). Other genes identified as involved in the studied traits included were the WD repeat-containing protein RUP2, the acidic endochitinase-like, the LRR receptor-like serine/threonine-protein kinase GSO1, the classical arabinogalactan protein 9-like, the cysteine-rich receptor-like protein kinase 25, and the NAC domain-containing protein 43 genes (Supplementary Table S6). However, no genes were identified for some marker-trait associations ("None" in the Supplementary Table S6).  www.nature.com/scientificreports www.nature.com/scientificreports/

Discussion
Iran has a long history of walnut production and is the third leading country in walnut production worldwide (445,829 in-shell tons 33 ). Most walnut trees grown in Iran originated from seed, and exhibit considerable diversity in yield, quality, and resistance to abiotic and biotic stresses 2,11,12 . Environmental stresses and climate change are reducing walnut yield 17 . To face these challenges, native genotypes with interesting phenotypic traits need to be explored and preserved for future development of improved scion and rootstock varieties 16 . In this regard, as a pre-selection step, we evaluated various Iranian walnut populations in their native habitats for traits including climatic adaptations, precocity, yield, nut quality and resistance to biotic and abiotic stresses. Based on the profiles of all these traits, we selected the most interesting and variable 95 genotypes for the assessment of walnut genomic variation as a first step towards future walnut sampling and the introduction of molecular breeding for Persian walnut in Iran. These individuals originated from eight Iranian provinces (Table 1), rich in native walnut trees adapted to local conditions 2,15,16,23 . Many of the sampled trees are estimated to be at least 100, and some up to 500, years old. In particular, this collection of walnut trees exhibited especially high levels of phenotypic variation for nut-related traits, which were high correlated to each other, as reported in previous work 17, [34][35][36][37][38] .
The study of genomic variation and genetic differentiation in domesticated or natural populations is important for understanding patterns of local adaptation and dissecting the genetic basis of traits of interest. Access to the new Axiom J. regia 700K SNP array 25 allowed us to explore in-depth the genome-wide allelic variation within our Iranian walnut genetic resources.
Compared to previous surveys of Iranian walnut 23,36-38 , we characterized a gene pool covering most of the Iranian walnut distribution at higher genetic resolution. Although the latest Axiom J. regia 700K SNP array was designed using the deep re-sequencing of 27 California walnut accessions, the conversion rate for Iranian samples (53.03%) was similar to that observed for the California material 25 , confirming this array's value in assessing the population structure and genomic variation of Iranian walnut populations.
Using the genetic profiles of high-quality SNPs evenly distributed across the genome, we identified four distinct genetic groups in our walnut collection (Fig. 1a,b). Such population structure can be explained by the geographical proveniences of our genotypes as well as the various climate conditions to which they adapted. For instance, the walnut genotypes from Kerman clearly separated from the others (Fig. 1a,b). The Kerman population includes many individuals located at high altitude and a set of three very old trees that formed a separate group at K5. In addition, most of the individuals from Ilam and Markazi were admixed with the genotypes from Kerman, Semnan, and Fars. This level of genetic similarity could be the result of human-mediated exchanges  Table 7. Parentage analysis and relationship categories assignment (RCA) for Iranian walnut genotyps obtained by SNP allelic profiles using PLINK. FID1 = Family ID for the first sample; IID1 = Individual ID for the first sample; FID2 = Family ID for the second sample, IID2 = Individual ID for the second sample; a probability to share zero IBD allele; b probability to share one IBD allele; c probability to share two IBD allele; d relatedness measure.
www.nature.com/scientificreports www.nature.com/scientificreports/ among these provinces. Overall, our population structure results are in line with previous genetic analysis of Persian walnut 7,23,39,40 .
The average heterozygosity (Ho = 0.34) was similar to that of UC Davis WIP accessions (Ho = 0.3) examined by Marrano et al. 25 using the same SNP array. In contrast, our Iranian collection was more heterozygous than six J. regia populations from Kerman province studied by Vahdati et al. 16 (0.23), but less heterozygous than other Persian walnut germplasm described in previous genetic surveys 23,40,41 . This discrepancy can be explained by the use in this study of SNP markers, which are bi-allelic and therefore detect polymorphisms differently from the multi-allelic SSRs 42 . In the Kerman, Fars, and Ilam populations, H e was slightly higher than H o , probably due to the Wahlund effect or inbreeding. On the other hand, the values of Ho and He in the West Azerbaijan population suggest low inbreeding and large genetic variation. However, these findings could also be due to the small sample size of our walnut panel. Collecting additional walnut material from Iran will be essential in providing further support for our conclusions.
As in Marrano et al. 25 , overall most of the Iranian populations exhibited little inbreeding and considerable genetic variation, as expected due to the dichogamous nature of walnut which promotes outcrossing. The high inbreeding coefficient found by Vahdati et al. 16 suggests significant heterozygote deficiency, likely due to crossing between closely related individuals. We observed greater genetic variation than Vahdati et al. 16 , probably because we assessed more diverse populations from eight provinces and using genome-wide markers. In addition, although walnut is a mixed-mating tree species, negative F IS values are expected in adult trees because breeding is usually purged at an early age, thus age of the tree and sampling is an important indicator in determining F IS values 41 . Absence of inbreeding may also be due to an "isolate breaking effect" that occurs when previously isolated populations interbreed 43,44 .
F ST values for most wind-pollinated tree species tend to be lower than 0.10 41 , indicating that more than 90% of the neutral genetic variation is maintained within populations. Our global F ST value of 0.07, indicates that 93% of the genetic diversity of our walnut collection occurs within the seven Iranian populations (Supplementary  Table S2). Our results agree with previous studies reporting overall F ST values for J. regia populations in Europe, Africa and Asia 40 , China 10,45 and six from Kerman province 16 . These findings possibly reflect the human-mediated dispersal of J. regia in space and time, resulting in a reassembly and homogenization of walnut genetic diversity.
The pairwise F ST analysis suggests the Semnan population is the most genetically differentiated from the other provinces. Semnan province is far from the other studied provinces so neither pollen nor natural seed dispersal from them is feasible. Thus, it will be interesting to explore more in-depth the Semnan population for crossing in future Iranian walnut breeding. The Ilam-West Azerbaijan, and Ilam-Markazi populations showed lower pairwise F ST values, suggesting gene flow among these populations. Due to their geographical location, there is a possibility of wind-driven cross-pollination, seed dispersal by animal movement, and seed movement of superior genotypes between orchards by walnut growers.
To date, there is no published parentage analysis of Iranian walnut populations. Although walnut can be propagated by several methods 46 , propagation of valuable trees from Iranian landraces or natural populations has been almost exclusively by seed, via humans or birds. However, we found a clonal relationship between one pair of very old individuals from Kerman province (Gugher) that were genetically identical and likely propagated by natural layering. Based on our observation at the site, natural layering occurred when a branch of the older tree (more than 500 years old) touched the ground, producing adventitious roots and eventually a new tree. Vahdati and Khalighi 47 have previously reported vegetative propagation by layering in walnut. This is the first report of vegetative propagation in an Iranian walnut population.
Parentage analysis identified seventeen pairwise relationships among the Iranian genotypes, nine classified as first and eight as second-degree relationships. All of these were between individuals within given populations, probably due to open pollination or seed exchange between local walnut growers. Interestingly, we identified a second-degree relationship between the cultivar Chandler and two individuals from the Ilam population, possibly indicating one of the ancestors of 'Chandler' could have originated from Iran. Our results are consistent with Tulecke 48 that stated the parents of Chandler might have originated from Iran. These parentage results, together with the genomic variation and differentiation analysis, are of interest both for clarifying the relationships between walnut accessions in view of GWAS analysis, and for accurate planning of future breeding programs in Iran.
Due to the limited number of individuals per population, our conclusions about genetic differentiation among populations have to be considered preliminary. The results of this study can be improved by increasing the sample size from the populations that we had a limited number of samples of them in our survey (Hamadan, West Azerbaijan, Markazi, and Yazd), and from the Semnan population as the most genetically differentiated from the other populations. In addition, in future studies, sampling of young and old trees from the same locations might reveal if local regeneration methods tend to preserve local genetic diversity or obscure it by the importation of new genetic types. Additional collections from other Iranian provinces and further genomic analysis will provide additional evidences to our results.
As further proof of the value of our walnut collection, we performed association mapping for nut and kernel-related traits, identifying marker-trait associations for 19 of the 22 traits studied. Our GWAS results revealed 55 significant SNPs (41 of unique sequence) associated with the variation of six traits, and 56 suggestive SNPs associated with 13 traits. None of the markers identified for the studied traits were previously mapped. In some cases, the same SNP was associated with different traits, which could be explained by the high correlation observed among them, or pleiotropic effects. Limited information exists in walnut regarding the genetic based of nut and kernel-related traits. The annotation of the significant and suggestive SNPs revealed genetic mechanisms for the studied traits also identified for other plant species. In particular, it has been already reported that the genes encoding for the WRKY transcription factor, the LRR receptor-like serine/threonine-protein kinase GSO1, the NAC domain-containing protein 43, and cyclic DOF factor 2-like are likely involved in embryo development 49 Scientific RepoRts | (2019) 9:6376 | https://doi.org/10.1038/s41598-019-42940-1 www.nature.com/scientificreports www.nature.com/scientificreports/ and, therefore, the determination of seed size 50 , as well as in the development of the epidermal surface in embryos and cotyledons 51 , seed size/weight 52 , and seed maturation 53 . The SNPs identified in this study, if appropriately validated, could be used as potential markers for marker-assisted breeding in walnut.

Conclusion
Genome-wide markers offer new opportunities for better understanding genomic variation and architecture of horticulturally important traits in walnut. We used the Axiom J. regia 700K SNP array to characterize Iranian walnut genotypes and to verify that the genetic variation available in our panel was suitable to perform an informative association mapping study. We observed a conversion rate similar to that obtained using the same SNP array in a Californian walnut collection. This indicates the Axiom J. regia 700K SNP array is a robust and valid genomic tool for further exploring the genetic variation and differentiation of walnut worldwide.
Population structure analysis of this Iranian walnut collection showed four main groups. Total differentiation among the populations was moderate, reflecting the occurrence of cross-hybridization events between native populations. Pairwise F ST analysis found the Semnan population to be the most genetically differentiated and further in-depth examination of it should be prioritized in view of its value for future breeding programs. Overall, we observed consistency among the different genetic analyses employed and results were in accord with geographical and ecological information. Our findings demonstrate that large genetic variation still exists within Iranian walnut populations located in one of the main centers of origin and domestication of Persian walnut. Also, the potential of our population for future GWAS studies was confirmed through the results of association mapping for nut and kernel-related traits. The information generated in this study will be useful for better understanding the genetic basis of adaptation in walnut and identifying resilience alleles to be used by future breeding programs in addressing the challenges of climate change. Our invaluable collection of walnut genotypes adapted to diverse climates and altitudes across Iran were maintained at the Nut Crops collection orchard of Aburaihan Campus, University of Tehran. All the seven populations investigated in present study, along with additional material collected from other parts of Iran, have been established in a common garden to investigate in the future the genetic architecture of local adaptation and the correlation among genotypes and both environmental variables and drought-related traits using GWAS approaches.

Materials and Methods
Sample collection and phenotypic measurements. In the first step, pre-selection of Iranian walnut genotypes based on phenotypic records from the Iranian Ministry of Agriculture and local growers was performed with the aim of selecting superior genotypes to be used in future scion and rootstock breeding programs. We selected 95 genotypes to characterize at both phenotypic and genomic levels. In particular, the selected 95 individuals inhabit disjunctive mountainous areas, and are old walnut trees from open pollinated seedlings (50to 500-year-old) with trunk diameters greater than 50 cm. They represent local populations (seedlings) that were randomly planted by humans or birds (mostly crows), and grow across wide areas in different parts (valleys or mountains) of Iran. Seeds of the studied genotypes collected from walnut populations located in eight main walnut producers provinces in Iran. The walnut trees investigated in the present research were separated from each other by 366-1768 km (approximately 890 km on average). We sampled four locations from Kerman (Baft-Gugher, Rabor, Rabor-Hanza and Bardsir), two each from Fars (Eqlid and Bavanat) and Ilam (Ilam and Eyvan) and one each from Semnan, Yazd, Markazi, West Azerbaijan and Hamadan (Table 1; Supplementary Table S1; Fig. S1). We planned to collect a minimum of ten samples per location; however, the number of samples collected per region was not equal because of different plant density found in each regions. These native genotypes are considered diverse on a regional scale since each region has gradually selected individuals adapted to environmental, horticultural, cultural and traditional features of the location. Therefore, the 95 selected genotypes likely represent a large part of the full genetic diversity found in Iranian walnut populations. A summary of their profiles for nut and kernel related traits are shown in Table 2.
Leaf tissue and open-pollinated seeds (at least 60 nuts per mother tree) were sampled. Twelve nuts of each selected genotype were used to evaluate 22 fruit-related traits, based on International Plant Genetic Resources Institute (IPGRI) or BI descriptors 54 . These included nut length (NuLe), nut width (NuWi), nut thickness (NuTh), nut weight (NuWe), kernel percentage (KePe), shape index (ShIn), size index (SiIn), round index (RoIn), nut shape (NuSh), shell thickness (SheTh), shell color (SheCo), shell texture (SheTe), shell seal (SheSe), shell strength (SheSt), packing tissue thickness (PaTiTh), kernel weight (KeWe), kernel color (KeCo), kernel plumpness (KePl), kernel shrivel (KeSh), kernel vein (KeVe), kernel fill (KeFi), and ease of kernel removal from nuts (EKeNu). Statistical analyses including descriptive statistics and normality testing on data and their residuals were performed in Minitab 18 statistical software (Minitab, Inc., State College, PA, USA). Multivariate statistical analyses, including principal component analysis (PCA), and correlation analysis were conducted using R 55 . Pearson and Spearman's rank correlation coefficients were used to determine the relationships between two continuous variables and two continuous or continuous-ordinal variables respectively. Plant materials and DNA extraction. This study examined 95 adult walnut trees grown locally in diverse parts of Iran with various climates. The plant material was collected from eight Iranian provinces: Kerman, Fars, Ilam, Semnan, Yad, Markazi, West Azerbaijan and Hamedan. A detailed list is presented in Table 1. During the summer of 2017 mature fresh leaves were collected, immediately frozen in liquid nitrogen, and lyophilized. Geographical information for each tree was recorded along with detailed climate and population data (Table 1, Fig. 1b).
Total genomic DNA was extracted from 40 mg of dry leaves using the E-Z 96 Plant DNA Kit (Omega Bio-tek; Norcross, GA) according to the manufacturer's instructions. DNA was quantified using Qubit dsDNA High Sensitivity (HS) Assay Kits (InVitrogen, Life Technologies). The quality of DNA samples was checked by agarose gel electrophoresis. SNP allele calling and data analysis. SNP allele calling was performed by the Bioinformatics Core of Affymetrix as described in the Axiom Genotyping Solution Data Analysis Guide (http://www.bea.ki.se/documents/axiom_genotyping_solution_analysis_guide.pdf). The samples with a dQC value ≥ 0.82 and a QC call rate ≥ 97% were considered for further analysis. SNPs were then classified into six major classes: PHR, NMH, OTV, MHR, CRBT and Other.
Analysis of population structure. PHR SNPs filtered for missing rate (>20%) and MAF (<5%), and LD-pruned (plink commands: indep-pairwise 50 5 0.25) were used to perform the population structure analysis. Two approaches were used: (i) Principal Component Analysis (PCA) and (ii) fastSTRUCTURE analysis. The PCA analysis was performed using the R package 'SNPRelate' 56 . The PCA plot was constructed using the R package ggplot2. A Bayesian clustering approach using fastSTRUCTURE software v1.0 57 was then applied. A number of clusters (K values) ranging from 2 to 10 were tested, with ten replicates each using the default convergence criterion and priors. The most likely K value was chosen by plotting the marginal likelihood of the data, and with the best choice function implemented in fastSTRUCTURE. The results of all replicates for each K cluster were summarized using CLUMPAK (http://clumpak.tau.ac.il/) 58 .
To confirm further the subgroups identified by the above analysis, individual dissimilarities for each pair of individuals were calculated and used for hierarchical cluster analysis by the R package 'SNPRelate' 56 . Genomic diversity and differentiation among Iranian walnut populations. The genomic diversity of populations was estimated using the 'diveRsity' package 59 , which is available in R 55 . Mean number of alleles per locus (A), mean observed heterozygosity (H o ), expected heterozygosity (H e ), unbiased expected heterozygosity (UH e ), allelic richness per population (Ar), and inbreeding coefficient (F IS ) were calculated for SNPs with a missing rate < 20% and MAF > 5% across the different sub-populations of Iranian walnut accessions. Populations with only one individual were excluded from the analysis.
Two important measurements of within-population genomic variation at a marker locus are the expected heterozygosity (H e or gene diversity) and the observed heterozygosity 60  Where F ranges from 0 (no inbreeding) to 1 (completely inbred population) Genetic differentiation statistics between subpopulations for each locus and across all loci were computed using the R package 'diveRsity' . These included: G ′ ST = Hedrick's standardized "differentiation" per locus 61 , D Jost = Jost's true allelic differentiation per locus 62 , and the three unbiased estimators of Wright's F-statistics 63 within-population inbreeding coefficient (F IS ), total-population inbreeding coefficient F (F IT ), and the among-population genetic differentiation coefficient θ(F ST ). The index F ST was computed as: where σ w is the variance of the allele frequencies between populations, σ b is the variance of the allele frequencies between individuals within populations, and σ a is the variance of the allele frequencies between gametes within individuals. PGDSpider version 2.1.1.5 64 was used to convert PLINK files (PED and MAP) to Genepop format as data input format for 'diveRsity' R package.
Relatedness Analysis. The PLINK 1.9 software 65 was employed on each pair of the Iranian walnut genotypes to infer relatedness for all pairwise comparisons among the 95 Iranian walnut accessions. Pairwise IBD analysis was used to explore the first-degree and second-degree relationships among individuals as the proportion of the SNPs at which there were zero, one, or two shared IBD alleles represented by Z0, Z1, and Z2 respectively. Relatedness was then measured using the PLINK PI_HAT parameter, which indicates the proportion of SNPs in IBD between individual pairs. Pairs of accessions with PI_HAT value > 95% were considered to be genetically identical. We considered indvidual pairs to be first and second-degree relatives if they had PI_HAT values ≥ 0.47 and 0.22 respectively. Association mapping for nut-related traits. Association mapping was performed for 22 seed-related traits using the average performance of each genotype, BLUPs and the first five PCs of the phenotypic data. GWAS was carried out by applying three models: MLMM 66 , SUPER 67 and Fixed and random model Circulating Probability Unification (FarmCPU) 68 method, as implemented in GAPIT 69 . Population structure and familial relatedness were taken into account in all models. We determined suggestive thresholds to correct the p-value for