Analysis of genetic diversity and population structure among cultivated potato clones from Korea and global breeding programs

Characterizing the genetic diversity and population structure of breeding materials is essential for breeding to improve crop plants. The potato is an important non-cereal food crop worldwide, but breeding potatoes remains challenging owing to their auto-tetraploidy and highly heterozygous genome. We evaluated the genetic structure of a 110-line Korean potato germplasm using the SolCAP 8303 single nucleotide polymorphism (SNP) Infinium array and compared it with potato clones from other countries to understand the genetic landscape of cultivated potatoes. Following the tetraploid model, we conducted population structure analysis, revealing three subpopulations represented by two Korean potato groups and one separate foreign potato group within 110 lines. When analyzing 393 global potato clones, country/region-specific genetic patterns were revealed. The Korean potato clones exhibited higher heterozygosity than those from Japan, the United States, and other potato landraces. We also employed integrated extended haplotype homozygosity (iHS) and cross-population extended haplotype homozygosity (XP-EHH) to identify selection signatures spanning candidate genes associated with biotic and abiotic stress tolerance. Based on the informativeness of SNPs for dosage genotyping calls, 10 highly informative SNPs discriminating all 393 potatoes were identified. Our results could help in understanding potato breeding history, which reflects regional adaptations and distinct market demands.


Results
Population structure analysis of the 110-line Korean potato germplasm panel using STRUCTURE, DAPC, and HC. STRUCTURE analysis provided an estimate of the number of populations in the Korean potato germplasm panel. The delta K statistic, estimated using Evanno's method, peaked at K = 3 (Fig. S3), indicating that the 110 clones in the panel could be grouped into three clusters based on differences in their genetic makeup. In the DAPC analysis, the find.clusters function gave the lowest Bayesian information criterion (BIC) value at K = 3, confirming a structured population, although DAPC identified no admixed clones (Table S9, Fig. S4).
The Ward dendrogram generated using Nei's genetic distance and hierarchical clustering also revealed the presence of three clusters in the population represented by the 110 potato clones (Fig. 1, Table S9).
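The delta K criterion used with STRUCTURE above can be illustrated with a short sketch. It assumes the commonly implemented form of Evanno's statistic: the absolute second difference of the mean ln P(D) across replicate runs, divided by the standard deviation of ln P(D) at that K. The function name and the log-likelihood values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp maps K to a list of ln P(D) values from replicate STRUCTURE runs.
    delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # the statistic is undefined at the ends of the K range
        sd = stdev(lnp[k])  # needs at least two replicate runs per K
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta

# Hypothetical ln P(D) values for K = 1..5, two replicate runs each.
example_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -901.0],
    3: [-700.0, -701.0],
    4: [-690.0, -694.0],
    5: [-685.0, -691.0],
}
example_delta = evanno_delta_k(example_lnp)
best_k = max(example_delta, key=example_delta.get)
```

With these made-up values the largest delta K falls at K = 3, which is the kind of peak reported in Fig. S3.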
The individual clusters for STRUCTURE, DAPC, and HC comprised similar sets of clones. For example, the clusters that included cv. Namjak also included the majority of the foreign potato clones: 26 (70.3%), 26 (70.3%), and 28 (75.7%) of the 37 foreign clones, respectively. In Cluster I (Namjak group), the average percentage of foreign clones that were common across all three population structure analyses was 87.9%. Six Korean potato varieties (Namjak, Sinnamjak, Golden Egg, Sepoong, Jayoung, and Hongyoung) were common to all three methods, and two more Korean potato clones (Hongjiseul and Jaekyo-P-15) were present in both the DAPC and HC results. Thus, the four colored potatoes (Jayoung, Hongyoung, Hongjiseul, and Jaekyo-P-15), which are pigmented in both their skin and flesh, were grouped together in Cluster I by DAPC and HC (Table S9). The remaining Korean potatoes (over 86.5%) were divided into two clusters, in which either cv. Daeseo or cv. Sumi was present. In Cluster II (Daeseo group), the average percentage of Korean potato clones that were common across all three population structure analyses was 75.3%, while that of Cluster III (Sumi group) was 73.3%. DAPC showed the highest average percentage of varieties common to more than two methods across all three clusters, suggesting that the DAPC results could be more reliable than the STRUCTURE or HC results, as described previously 18,19.
Figure 1. The 110-line Korean potato germplasm consists of three subgroups, which were inferred using three different approaches: STRUCTURE, discriminant analysis of principal components (DAPC), and hierarchical clustering (HC). Most of the Korean potatoes grouped together into two clusters, whereas the foreign potatoes were placed into the third cluster. (a) Proportional membership (Q) of each clone showing three distinct clusters using 6575 SNP markers. (b) DAPC using the adegenet R package confirmed the structured population. The axes represent the first two linear discriminants, and the small solid dots and ellipses represent each clone. The numbers in the circles indicate the different subpopulations identified by DAPC analysis. (c) A dendrogram of the 110 clones using HC (method = "ward.D2"). Two major clusters are observable: one corresponds to Cluster I, and the other consists of two subgroups (II and III). Note that Cluster III is presented in dark khaki, Cluster II in light blue, and Cluster I in dark gray, corresponding to the colors of the subgroups inferred by DAPC. The leaf colors indicate the respective market class of the individual clones.
We also calculated population genetics parameters, some of which were derived from diploid genotype calls, dosage calls, or both (Table 1). The minor allele frequency (MAF) ranged from 0.05 to 0.50, with a mean of 0.28. It was calculated with snpReady in R using diploid genotype calls. This value was similar to that calculated using the function minorAllele in the adegenet R package following the tetraploid model. The number of transitions across the samples in the genotype calls of a specific marker is easy to calculate, and this value indicates the informativeness of the SNP markers used in this study. Unlike the polymorphic information content (PIC), informativeness is calculated using dosage genotype calls.
The PIC ranged from 0.08 to 0.38 with a mean of 0.30, whereas the informativeness ranged from 0.25 to 0.79 with a mean of 0.64. The average observed heterozygosity (0.51) calculated using diploid genotype calls was lower than the average percent heterozygosity (0.63) calculated using dosage genotype calls. The average distance among the clones in the same cluster ranged from 0.32 to 0.37. Cluster II showed the highest heterozygosity among the clones, indicating that it was highly diverse, whereas the other two clusters showed lower heterozygosity. The fixation index (Fst) measures the genetic differentiation between populations. Cluster III had the highest Fst value (0.18), while Cluster II had the lowest (0.07), indicating that the clones in the former group are not currently breeding with one another, whereas those in the latter group share their genetic material through high levels of breeding. Tajima's D statistic was used to compare the observed nucleotide diversity against the expected diversity under the assumptions of selectively neutral polymorphisms and a constant population size 20. The value (3.37) of Tajima's D using diploid genotype calls was smaller than that (4.04) obtained after converting the dosage forms (AAAA, AAAB, AABB, ABBB, and BBBB) into diploid forms (AAAA = AA, BBBB = BB, and AAAB, AABB, ABBB = AB) for use in analysis packages that do not support polyploid data. The value of 4.04 is close to that (4.29) obtained in a diploidized version described by Pandey et al. 14.
DAPC, HC, and KLFDAPC analyses for an extended genetic diversity panel. We further investigated the Korean potato clones in the Korean potato germplasm panel using an extended genetic diversity panel that included 94 Japanese potatoes 13, 164 American potatoes, 15 Canadian potatoes, two German potatoes, one Chilean potato, and three potatoes of unknown origin 10.
Figure 2 (caption, partially recovered). ... (Table S2). These results confirm that clustering depends on the geographical location (Korea, Japan, and the USA) where the original crossing was carried out. Potato clones from Europe and other countries are placed into the Japanese cluster. The landrace potatoes are highlighted.
The ClusterCall R package was used to obtain dosage genotype calls from the XY raw data of the Japanese potatoes, the publicly available theta data of the potato clones from North America and other countries 7, and the .idat data from the Korean potato germplasm panel. Subsequently, the three sets of dosage genotype calls were merged into a single dataset, hereafter referred to as the extended genetic diversity panel, based on common SNP markers. After filtering at thresholds of MAF 0.05 and call rate 0.90, 3977 SNP markers remained (Table S3). DAPC was performed, and the lowest Bayesian information criterion value was found at K = 6 (Table S10). The four flesh-colored potatoes, Jayoung, Hongyoung, Hongjiseul, and Jaekyo-P-15, moved from Cluster I to the colored group (Table S10). The DAPC analysis of the extended potato diversity panel using 3977 SNP markers showed that differences in the percentages of the potato clones in specific clusters clearly reflect their country/regional origins. Figure 2A shows a ring plot representing the percentage of clones assigned to the six inferred clusters based on DAPC. Of the 73 Korean potatoes, 36% were grouped into Cluster IV and 53% were assigned to Cluster V; altogether, 89% of the Korean potatoes were grouped into these two clusters. Of the North American potatoes, 54% were placed into Clusters IV and V. The potatoes in Clusters II and VI were the Russet (19%) and pigmented (15%) potatoes, respectively. The Russet class was unique across all countries. Moreover, 77% of the 94 Japanese potatoes and 86% of the 14 European potatoes were grouped into Cluster III.
Although only one potato clone each was analyzed from Chile, Kazakhstan, New Zealand, Brazil, and Russia, they also grouped together into Cluster III (Fig. 2A). Two clones, originating from China and Russia, were assigned to Cluster V. We also performed kernel local Fisher discriminant analysis of principal components (KLFDAPC), a nonlinear version of DAPC, which could rectify the limitations of linear approaches by preserving nonlinear information and the multimodal space of the samples 21. The population genetic structure was projected onto the first two reduced features of the KLFDAPC with σ = 2 for the Korean potato clones and the potato varieties released from Japan, the United States, and other countries (Fig. 2B). This confirmed that clustering depended on the geographical location (Korea, Japan, and the USA) where the original crossing was carried out. Potato clones from Europe and other countries were placed in the Japanese clusters. Interestingly, the potato landraces highlighted in Fig. 2B overlapped three different groups from Korea, Japan, and the United States. It is likely that the clear distinction between the American potatoes and the Korean/Japanese clones was caused by the Russet varieties. The HC for the extended panel using 3977 SNP markers showed clustering profiles similar to those of the DAPC. The HC dendrogram (Fig. S6) made it easy to recognize the duplicates among the 393 clones, whose pairwise genetic distances were zero or almost zero. The identified duplicates were Namjak vs. Irish Cobbler, Sumi vs. Superior, Daeseo vs. Atlantic, Daeji vs. Dejima, CO97043_14W vs. MSQ070-1, Rosa_hari vs. Rosa, Russet Norkotah_hari vs. Russet Norkotah-S8 vs. Russet Norkotah-S3, Norin1_hari vs. Norin1, and InkaRouge_2x vs. Inka-no-mezame_2x.
The former potatoes were from the 110-line panel and the latter from the 393-line extended panel. In fact, the Korean varieties Namjak, Daeseo, Sumi, and Daeji in the 110-line panel were introduced from abroad and renamed; they correspond to the cultivars Irish Cobbler (Unknown), Atlantic (USA), Superior (USA), and Dejima (Japan), respectively. In addition, the foreign potatoes in the 110-line panel (Rosa_hari, Russet Norkotah_hari, and Norin1_hari) were placed beside the original varieties from the extended panel with a genetic distance of 0, indicating that the potato clones maintained in Korea have the same genetic identity as the original ones.
InkaRouge_2x and Inka-no-mezame_2x were duplicates, as described by Igarashi et al. 13. The HC dendrogram showed that the chip processing market potatoes grouped together, as did the pigmented potatoes and the Russet varieties.
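The duplicate screening described above, in which pairs of clones with a pairwise distance of zero or almost zero are flagged, can be sketched as follows. This is a minimal illustration and not the procedure used in the study: it replaces Nei's genetic distance with a simpler stand-in (the mean absolute difference in allele dosage, scaled to the range 0 to 1), and the function names, the threshold, and the toy dosage calls are assumptions.

```python
from itertools import combinations

def dosage_distance(a, b):
    """Mean absolute dosage difference, scaled to 0..1 for tetraploids.

    a and b are lists of dosage calls (0..4, or None for a missing call).
    Loci with a missing call in either clone are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None  # no locus could be compared
    return sum(abs(x - y) for x, y in pairs) / (4 * len(pairs))

def find_duplicates(calls, threshold=0.01):
    """Return (name1, name2, distance) for clone pairs at or below threshold."""
    duplicates = []
    for (n1, g1), (n2, g2) in combinations(calls.items(), 2):
        d = dosage_distance(g1, g2)
        if d is not None and d <= threshold:
            duplicates.append((n1, n2, d))
    return duplicates

# Toy dosage calls (not real data) for three clones at four SNPs.
toy_calls = {
    "Sumi": [0, 2, 4, 1],
    "Superior": [0, 2, 4, 1],
    "Daeseo": [1, 3, 2, 0],
}
toy_duplicates = find_duplicates(toy_calls)
```

The threshold would need tuning on real data, because genotyping error keeps true duplicates from reaching a distance of exactly zero.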
Heterozygosity and informativeness for a 393-line extended genetic diversity panel. The percentage of heterozygous SNP loci (percent heterozygosity) for the 393 lines is shown in Fig. 3. The percent heterozygosity for 68 (93.2%) of the 73 Korean potato clones was > 60.0% (Table S11). The highest percent heterozygosity was observed in cv. Daeseo (also known as Atlantic), as described by Igarashi et al. 13.
The Korean potato clones exhibited a higher average percent heterozygosity (65.6%) than the clones from Japan, the United States, and the landrace potatoes (62.4%, 63.2%, and 62.9%, respectively) according to a non-parametric Wilcoxon test (P < 0.001).
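Percent heterozygosity as used above is the share of SNP loci at which a clone carries both alleles. A minimal sketch, assuming dosage calls are coded 0 to 4 (copies of one allele) with None for missing calls; the function name is illustrative:

```python
def percent_heterozygosity(dosages):
    """Percentage of called loci with a heterozygous tetraploid genotype.

    Dosages 1, 2, and 3 (AAAB, AABB, ABBB) count as heterozygous;
    0 and 4 (AAAA, BBBB) are homozygous. Missing calls (None) are ignored.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called loci")
    heterozygous = sum(1 for d in called if 0 < d < 4)
    return 100.0 * heterozygous / len(called)
```

Group-level comparisons such as the Wilcoxon test reported above would then be run on the per-clone values.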
The informativeness of the 3977 SNP markers for the 393 potato clones from Korea, Japan, the United States, and other countries was calculated based on the transitions of genotype calls across samples and ranged from 25.4 to 79.4% (Table S12). Ten highly informative SNP markers were sufficient to identify all 393 clones used in this study, including the duplicate clones (Fig. S7), giving a power of discrimination equal to that of the high-density set of 3977 SNP markers. The MAF values for the selected 10-SNP set were ≥ 0.40, except for two markers (Table 2).
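One way to check how well a small marker set separates clones is to count the distinct multi-locus dosage profiles it produces. The sketch below does this and adds a simple greedy search for a small discriminating set. The greedy search is an illustrative stand-in, not the informativeness-based selection used in this study, and all names and data are hypothetical.

```python
def count_distinct_profiles(calls, marker_idx):
    """Number of distinct dosage profiles across clones for the given markers."""
    return len({tuple(g[i] for i in marker_idx) for g in calls.values()})

def greedy_marker_set(calls, max_markers=10):
    """Greedily add the marker that most increases the number of distinct profiles.

    Returns (chosen marker indices, number of distinct profiles reached).
    Missing calls, if coded as None, are treated as their own category.
    """
    n_markers = len(next(iter(calls.values())))
    chosen, best = [], 1  # with no markers, all clones share one empty profile
    while len(chosen) < max_markers and best < len(calls):
        remaining = [i for i in range(n_markers) if i not in chosen]
        if not remaining:
            break
        cand = max(remaining,
                   key=lambda i: count_distinct_profiles(calls, chosen + [i]))
        gain = count_distinct_profiles(calls, chosen + [cand])
        if gain == best:
            break  # no remaining marker separates any further clones
        chosen.append(cand)
        best = gain
    return chosen, best

# Toy dosage calls (not real data): four clones, three markers.
toy_panel = {
    "A": [0, 0, 1],
    "B": [0, 4, 1],
    "C": [2, 0, 1],
    "D": [2, 4, 1],
}
toy_selection = greedy_marker_set(toy_panel)
```

In the toy panel the third marker is monomorphic and is never selected, while the first two markers together separate all four clones.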

Detection of SNP loci under selection.
A total of 70 SNP loci under selection were identified using iHS and XP-EHH (Fig. 4, Table S13), among which the 13 top significant SNPs detected by both approaches are shown in Table 3, along with the putative functions of the candidate genes containing these significant SNPs. Candidate genes spanning ~100 kb upstream and downstream of the top significant SNPs were retrieved (Table 4), revealing that the Korean potatoes have footprints associated with several genes essential for biotic and abiotic stress tolerance. For example, candidate genes encoding the RPM1 interacting protein (Soltu.DM.09G018840), an essential regulator of plant defense, and leucine-rich repeat (LRR) family proteins (Soltu.DM.04G020580, Soltu.DM.04G020740, Soltu.DM.09G006540) were identified, whereas candidate genes such as nuclear factor Y (Soltu.DM.04G019240), the cystathionine beta-synthase family protein (Soltu.DM.04G020750), the zinc finger CCCH-type family protein (Soltu.DM.09G006620), and ascorbate peroxidase (Soltu.DM.09G006560) were identified for abiotic stress tolerance (Table S13).
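Retrieving candidate genes within roughly 100 kb on either side of a significant SNP is an interval-overlap query against a gene annotation. A minimal sketch, assuming the annotation is available as a list of records with chromosome, start, and end coordinates; the record layout, gene names, and coordinates below are placeholders and not real annotation data:

```python
def genes_near_snp(snp_chrom, snp_pos, genes, flank=100_000):
    """Return IDs of genes that overlap the window snp_pos +/- flank (in bp)."""
    lo, hi = snp_pos - flank, snp_pos + flank
    return [
        g["id"]
        for g in genes
        if g["chrom"] == snp_chrom and g["start"] <= hi and g["end"] >= lo
    ]

# Placeholder annotation records (hypothetical IDs and coordinates).
toy_genes = [
    {"id": "geneA", "chrom": "chr04", "start": 1_000_000, "end": 1_004_000},
    {"id": "geneB", "chrom": "chr04", "start": 1_150_000, "end": 1_152_000},
    {"id": "geneC", "chrom": "chr09", "start": 1_010_000, "end": 1_012_000},
    {"id": "geneD", "chrom": "chr04", "start": 1_300_000, "end": 1_305_000},
]
toy_hits = genes_near_snp("chr04", 1_060_000, toy_genes)
```

A gene counts as a hit if any part of it falls inside the window, so genes straddling the window boundary are included.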

Discussion
In this study, we used genome-wide SNP markers to evaluate a diversity panel composed of 393 potato varieties and advanced breeding lines that have been bred by different breeding programs worldwide, particularly from Japan, the United States, and Europe, focusing on their comparisons with Korean potatoes. The 110-line diversity panel, which included 45 commercial cultivars and 28 advanced breeding clones bred by Korean potato breeding programs, as well as 37 foreign potatoes, was investigated using three complementary approaches: STRUCTURE, DAPC, and HC. The Korean potatoes were divided into two groups, represented by cvs. Sumi and Daeseo, in agreement with the wide use of these cultivars as parents in cross breeding (Table S1). In the past, many foreign varieties were introduced and tested in local Korean environments. However, only a few varieties have been cultivated. For example, the potato varieties Atlantic, Superior, Irish Cobbler, and Dejima have been introduced and released under the registered names Daeseo, Sumi, Namjak, and Daeji, respectively 2. When analyzing the Korean potatoes in the 393-line diversity panel, it was clear that they grouped together according to their market class (Fig. S6). For example, Korean potatoes suitable for chip processing were grouped together with foreign chip processing potatoes, and the colored potatoes, cvs. Hongyoung, Jayoung, Hongjiseul, Jaekyo-P-15, and Daekwan2-60, were placed in the pigmented group that included Red Maria, Chieftain, All Blue, Shadow Queen, Purple Majesty, Winema, Aino-aka, Dragon Red, etc. We also looked at the groupings of landrace potatoes (pre-1930) (Triumph, Garnet Chile, Purple Peruvian, Nemuromurasaki, Kintoki-imo, Green Mountain, Russet Burbank, May Queen, Early Rose, Benimaru, and Kobo-imo) in the HC analysis (Fig. S6) and the KLFDAPC.
In the KLFDAPC, they were placed centrally, overlapping the more recently bred potatoes from Korea, Japan, and the United States in different directions, visually supporting the history of potato breeding and how potato varieties have diversified according to various breeding strategies (Fig. 2B). Among the various potato types grown in the United States, the Russet potato is the most popular market class 22. Russet potatoes are unique to the United States, and are not selected by breeding programs in either Korea or Japan, taking consumers' preferences into account. Approximately 35% of the potatoes in Japan are used for starch production 13 and many modern Japanese varieties have T-type chloroplast DNA 13,23, supporting the result that Japanese potatoes were not differentiated from European ones in our study. Unfortunately, most European varieties do not perform well in Korean environments. Korean potato programs have been pursuing the development of diverse market class potatoes, such as potatoes suitable for chip processing, French fries, and double cropping (spring/summer season products are used as seeds for winter season production in the south) under low input conditions. Accordingly, several promising varieties have been developed and released for agricultural deployment as alternatives to foreign varieties such as Daeseo (Atlantic) or Sumi (Superior). In terms of the high heterozygosity of Korean potatoes (Fig. 3 and Table S11), it might be wise to direct breeding efforts to improve Atlantic potatoes to adapt well to local environmental conditions, as they showed the highest genome-wide percent heterozygosity of the studied varieties and are the most popular variety grown worldwide 13.
Table 3. Candidate genes containing top significant single nucleotide polymorphisms detected using integrated extended haplotype homozygosity and cross-population extended haplotype homozygosity analyses.
Regarding the approaches employed in this study to reveal the genetic diversity and population structure of cultivated potatoes, dosage genotype calls could lead to more reasonable and accurate results than diploid genotype calls (Table 1). If no packages that support polyploid data are available, biallelic markers could be called in a diploidized version, in which the three heterozygous classes expected in potato are converted into one heterozygous class 17,24. Appropriate methods for integrating different sources of SNP data are needed to obtain biologically meaningful outcomes; we previously observed "strange" outcomes when we simply merged the publicly available genotype datasets (data not shown).
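The diploidized coding mentioned above collapses the three tetraploid heterozygous classes into a single class. A minimal sketch, assuming dosage calls count copies of the B allele (0 = AAAA, 4 = BBBB) and missing calls are None; the function name is illustrative:

```python
def diploidize(dosage):
    """Map a tetraploid dosage call (0..4) to a diploid-style genotype.

    AAAA -> AA, BBBB -> BB, and AAAB / AABB / ABBB -> AB.
    Missing calls (None) stay missing.
    """
    if dosage is None:
        return None
    if dosage == 0:
        return "AA"
    if dosage == 4:
        return "BB"
    if dosage in (1, 2, 3):
        return "AB"
    raise ValueError(f"invalid tetraploid dosage: {dosage!r}")

# Example: one clone's calls at six markers.
diploid_calls = [diploidize(d) for d in [0, 1, 2, 3, 4, None]]
```

The conversion discards dosage information, which is why statistics computed from diploidized calls can differ from those computed on the original dosage calls.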
We identified several candidate genes, with 3977 SNP markers, related to biotic and abiotic stress tolerance that may be involved in adaptation to local environmental conditions. Candidate genes with putative functions, such as the RPM1 interacting protein (Soltu.DM.09G018840), LRR/NB-ARC domain-containing disease resistance proteins (Soltu.DM.04G020580, Soltu.DM.04G020740, Soltu.DM.09G006540), nuclear factor Y (Soltu.DM.04G019240), the zinc finger CCCH-type family protein (Soltu.DM.09G006620), and ascorbate peroxidase (Soltu.DM.09G006560), were identified. RPM1-interacting protein 4 (RIN4) is a conserved plant immunity regulator that has been extensively studied and can be modified by pathogenic effector proteins 25. RIN4 plays an important role in both pattern-triggered immunity and effector-triggered immunity. Most disease resistance genes in plants encode nucleotide-binding site LRR proteins 26. The nuclear factor Y complex plays multiple essential roles in plant growth, development, and stress responses 27. CCCH genes are involved in plant developmental processes and biotic and abiotic stress responses 28,29. The less-common CCCH type of zinc finger superfamily proteins are important in plant development and tolerance to abiotic stresses such as salt, drought, flooding, cold temperatures, and oxidative stress 29. Ascorbate peroxidase catalyzes the conversion of H2O2 generated under environmental stress into H2O; therefore, it is of great importance as a key antioxidant enzyme in maintaining cellular homeostasis 30.
Table 4. The candidate selective sweep regions around the most significant single nucleotide polymorphisms, identified using integrated extended haplotype homozygosity and cross-population extended haplotype homozygosity analyses, which are associated with biotic or abiotic stress tolerances.
Although some important candidate genes were detected under selection, it is worth mentioning that the genome coverage of the current 8 K SNP array may be low, resulting in a lack of information on some important genomic regions harboring selection signatures. This issue may be addressed by using a greater density of SNPs.
To enable the selection of a small number of SNP markers for evaluating germplasm identity and purity, we devised a measure based on the number of transitions across the samples in the genotype calls of specific markers, rather than using previously described selection criteria such as high minor allele frequency, sampling of clustered SNPs in proportion to marker cluster distance, and a uniform genomic distribution 16. Our method enabled direct selection of the most informative SNPs with high minor allele frequency from the 3977 filtered high-quality SNPs without any additional considerations. The selected 10-SNP set can be used to evaluate genetic identity, genetic purity, parent-offspring identity, and the validation of crosses in nurseries 16,17. The identified SNP markers will be converted into a Kompetitive allele-specific PCR (KASP) system and validated for routine use in breeding programs as well as germplasm conservation.
Overall, these results on the molecular characterization of cultivated potato clones could help in understanding how potato cultivars diversify for distinct market classes depending on each country's breeding strategies and could assist in genomics-facilitated breeding efforts to create new varieties that are better adapted to climate change and meet market demands.

Materials and methods
Plant materials. The germplasm used in this study comprised 110 diverse potato clones, including 73 Korean potato clones (45 commercial varieties and 28 advanced breeding lines) selected over 40 years by a potato breeding program in Korea, and 37 potato collections from various countries (Japan, the United States, the Netherlands, Germany, Spain, the UK, Russia, Belarus, Kazakhstan, Brazil, New Zealand, and China) (Table S1). Although nine of the foreign clones had an unknown origin, they were selected for this study according to their agronomic performance. All potato clones are available as tissue culture plants or tubers for field evaluations at the Highland Agriculture Research Institute, National Agrobiodiversity Center, Rural Development Administration in Korea. The plant materials were obtained, and all experimental protocols in the present study were carried out, in compliance with international, national, and institutional guidelines.
The data were analyzed using Illumina GenomeStudio software according to the GenomeStudio® Polyploid Genotyping Module v2.0 Software Guide (Illumina, San Diego, CA). The SNP genotype data were filtered to exclude SNPs that were monomorphic, had > 10% missing data, or mapped to duplicate places in the genome. In addition, SNPs with MAF < 0.05, calculated by the function minorAllele in the R package adegenet 31, were removed. After filtering, 6575 SNPs remained (Table S3) and were distributed across the 12 chromosomes (Fig. S2). In addition, genotypes in nucleotide format were obtained in GenomeStudio, and a tetraploid format STRUCTURE input file (Table S4) was produced using a custom Python script. To determine the market class, phenotypic evaluations including tuber shape, tuber sucrose/glucose concentration, and chip color were carried out as described by Hirsch et al. 10.
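The marker filters described above (call rate and MAF under the tetraploid model) can be sketched per SNP as follows. This is an illustration under stated assumptions, not the adegenet or GenomeStudio implementation: dosage calls are assumed to be coded 0 to 4 with None for missing calls, and the function names are hypothetical.

```python
def call_rate(dosages):
    """Fraction of samples with a non-missing call at this SNP."""
    return sum(d is not None for d in dosages) / len(dosages)

def tetraploid_maf(dosages):
    """Minor allele frequency from tetraploid dosage calls (0..4)."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    freq = sum(called) / (4 * len(called))  # four allele copies per sample
    return min(freq, 1.0 - freq)

def passes_filters(dosages, min_call_rate=0.90, min_maf=0.05):
    """Keep SNPs with at most 10% missing data and MAF of at least 0.05.

    Monomorphic SNPs have MAF = 0 and are therefore removed as well.
    """
    return call_rate(dosages) >= min_call_rate and tetraploid_maf(dosages) >= min_maf
```

Removing SNPs that map to duplicate genome positions requires marker coordinates and is not covered by this sketch.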
Comparisons of reproducibility of dosage genotype calling methods. The three software packages, GenomeStudio (Illumina software), ClusterCall (R package) 7, and polyBreedR (the function geno_call, R package) (https://polyploids.r-universe.dev/articles/polyBreedR/Vignette1.html), which have been developed to generate dosage genotype calls based on different models, were compared in terms of reproducibility for three independent replicates of the 16 Korean varieties (Table S5).
The average proportion of loci with discordant calls among these replicates after filtering (call rate 0.90, MAF 0.05) was only 0.2%, with a maximum of 0.3%, in GenomeStudio, and only 0.4%, with a maximum of 0.8%, in ClusterCall. There were no significant differences between the two software programs.
In contrast, for the function geno_call of polyBreedR, which employed the normal mixture model implemented in the R package fitPoly, the average difference was 3.8%, with a maximum of 6.3%. Thus, ClusterCall was used to generate dosage genotype calls for the merged dataset from different sources of raw data, as described below (Table S5, Fig. S1).
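The reproducibility measure used in this comparison, the proportion of loci with discordant calls among replicates of the same variety, can be sketched as below. How missing calls were handled in the study is not stated; skipping loci with fewer than two called replicates is an assumption of this sketch, as is the function name.

```python
def discordant_fraction(replicates):
    """Fraction of loci at which replicate dosage calls disagree.

    replicates is a list of equal-length lists of dosage calls (0..4 or None),
    one list per replicate of the same variety. A locus is compared only if
    at least two replicates have a call; it is discordant if those calls differ.
    """
    n_compared = 0
    n_discordant = 0
    for calls in zip(*replicates):
        called = [c for c in calls if c is not None]
        if len(called) < 2:
            continue
        n_compared += 1
        if len(set(called)) > 1:
            n_discordant += 1
    return n_discordant / n_compared if n_compared else 0.0

# Toy example (not real data): three replicates, four loci.
toy_replicates = [
    [0, 1, 2, 4],
    [0, 1, 3, 4],
    [0, 1, 2, None],
]
toy_discordance = discordant_fraction(toy_replicates)
```

Averaging this value over the 16 replicated varieties would give a per-software summary comparable to the percentages quoted above.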