Genetic diversity analysis of Korean peanut germplasm using 48 K SNPs ‘Axiom_Arachis’ Array and its application for cultivar differentiation

Cultivated peanut (Arachis hypogaea) is one of the important legume oilseed crops. Cultivated peanut has a narrow genetic base. Therefore, it is necessary to widen its genetic base and diversity for additional use. The objective of the present study was to assess the genetic diversity and population structure of 96 peanut genotypes with 9478 high-resolution SNPs identified from a 48 K ‘Axiom_Arachis’ SNP array. The Korean set of genotypes was also compared with a mini-core of US genotypes, and both sets were used for genetic diversity analysis. Model-based structure analysis at K = 2 indicated the presence of two subpopulations in both sets of genotypes. Phylogenetic analysis and principal component analysis (PCA) clustered these genotypes into two major groups. However, the genotypes did not separate clearly by subspecies, botanical variety, or origin. The analysis also revealed that current Korean genetic resources lacked variability compared to US mini-core genotypes. These results suggest that Korean genetic resources need to be expanded by creating new allele combinations and widening the genetic pool to offer new genetic variations for Korean peanut improvement programs. The high-quality SNP data generated in this study could be used to identify varietal contaminants, QTL, and genes associated with desirable traits through mapping and genome-wide association studies.

typing data with the 'Axiom_Arachis' 48 K SNP array 37,39 containing highly informative genome-wide SNPs (Tables S3 and S4). Sample genotyping and SNP calling were performed on an Affymetrix GeneTitan platform as described by Pandey et al. 37 . In brief, target probes were prepared for each sample and subjected to amplification, fragmentation, and hybridization on the chip, followed by DNA ligation and signal amplification. The GeneTitan Multi-Channel Instrument was then used for sample staining and scanning. For all 96 peanut samples, allele calling was performed using the Axiom™ Analysis Suite software, version 1. For sample quality control (QC), the 'Best Practices' workflow was used to select the samples that passed the QC test. Genotype calls were then generated using the 'Sample QC' workflow, followed by the 'Genotyping' workflow on the imported CEL files (http://media.affymetrix.com/support/downloads/manuals/axiom_analysis_suite_user_guide.pdf). Finally, the 'Summary Only' workflow was run to generate a summary and retrieve the SNP data. SNPs with low call rates were removed using the selection criteria Dish QC (DQC) > 0.75 and call rate > 90%. Only high-quality SNPs were retained for further analysis.
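The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Axiom Analysis Suite implementation: the data layout and function names are hypothetical, and we assume here that DQC is applied per sample and call rate per SNP across the retained samples. Only the two thresholds come from the text.

```python
# Thresholds taken from the text; everything else is a hypothetical layout.
DQC_MIN = 0.75
CALL_RATE_MIN = 0.90


def snp_call_rate(calls):
    """Fraction of non-missing calls; None marks a no-call."""
    if not calls:
        return 0.0
    return sum(1 for g in calls if g is not None) / len(calls)


def filter_genotypes(genotypes, sample_dqc,
                     dqc_min=DQC_MIN, call_rate_min=CALL_RATE_MIN):
    """Keep samples with DQC above dqc_min, then SNPs with call rate above call_rate_min.

    genotypes:  dict snp_id -> dict sample_id -> call ('AA', 'AB', 'BB' or None)
    sample_dqc: dict sample_id -> DQC value
    """
    kept_samples = [s for s, dqc in sample_dqc.items() if dqc > dqc_min]
    kept_snps = {}
    for snp_id, calls in genotypes.items():
        row = [calls.get(s) for s in kept_samples]
        if snp_call_rate(row) > call_rate_min:
            kept_snps[snp_id] = dict(zip(kept_samples, row))
    return kept_samples, kept_snps


# Toy example: sample s3 fails DQC; snpB fails the call-rate filter.
toy_dqc = {"s1": 0.95, "s2": 0.90, "s3": 0.60}
toy_genotypes = {
    "snpA": {"s1": "AA", "s2": "AB", "s3": None},
    "snpB": {"s1": None, "s2": "BB", "s3": "BB"},
}
toy_samples, toy_snps = filter_genotypes(toy_genotypes, toy_dqc)
```

Applying the sample filter first means that no-calls from failed samples do not lower the call rate of otherwise good SNPs.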
Genetic diversity analysis. SNP data were used to elucidate the genetic diversity and the genetic relationships among individual genotypes. In addition, SNP data for the publicly available US mini-core genotypes were downloaded and merged with the Korean set of genotypes; the combined set was designated the Merge set and used to assess genetic diversity. Genetic diversity parameters such as allele frequency, heterozygosity, and polymorphism information content (PIC) were calculated using PowerMarker version 3.25 40 and TASSEL version 5.2.39 with default settings.
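For readers who want to see what these parameters measure, the following is a minimal per-locus sketch in Python. It uses the commonly cited textbook definitions (expected heterozygosity as 1 minus the sum of squared allele frequencies, and PIC with the additional cross-term); it is not the PowerMarker or TASSEL code, and the function names and the two-character genotype encoding are assumptions made for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid calls such as 'AA' or 'AB'.

    None marks a missing call and is ignored.
    """
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)  # counts each of the two alleles
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g[0] != g[1]) / len(called)


def expected_heterozygosity(freqs):
    """Gene diversity: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * p[i] ** 2 * p[j] ** 2
                for i in range(len(p)) for j in range(i + 1, len(p)))
    return 1.0 - sum(x * x for x in p) - cross


# Toy locus with both alleles at frequency 0.5.
toy_calls = ["AA", "AB", "AB", "BB", None]
toy_freqs = allele_frequencies(toy_calls)
```

For a biallelic SNP the PIC reaches its maximum of 0.375 when both alleles have frequency 0.5, which is why SNP-based PIC values are bounded well below those of multi-allelic markers.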
Population structure. Population structure analysis was conducted using the Bayesian clustering method implemented in STRUCTURE 2.3.4 41 . Values of K from 1 to 10 were tested to infer the number of subpopulations (K). Genetic clusters were estimated from replicate runs with a burn-in period of up to 500,000, followed by Markov chain Monte Carlo (MCMC) repetitions at each K value. In STRUCTURE, the log-likelihood LnP(D) is obtained for each K value, and the K with the maximum value is generally taken as the optimal number of subpopulations 42 .
Phylogenetic tree analysis. Phylogenetic analysis was performed to gain insight into the genetic diversity of the peanut genotypes. A phylogenetic tree was constructed using MEGA software (http://www.megasoftware.net/) with the neighbor-joining method and the following parameters: test of phylogeny, bootstrap method; number of bootstrap replications, 1000; model/method, maximum composite likelihood; substitutions to include, d: transitions + transversions; gaps/missing data treatment, pairwise deletion. One cluster was manually divided into subclades based on the hierarchically clustered genotype profiles.
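For the choice of K described under Population structure, a ΔK statistic is often computed from the LnP(D) values of replicate runs. The sketch below assumes the widely used ΔK definition (absolute second-order change in mean LnP(D), divided by the standard deviation of LnP(D) at that K); whether the analysis here used exactly this form is an assumption, and the function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Delta K per K from replicate LnP(D) values.

    lnp_runs: dict K -> list of LnP(D) values (at least two replicates per K).
    Delta K is only defined for K values that have both neighbours K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_runs[k])
            if sd > 0:  # skip K values whose replicates are identical
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result


# Toy replicate runs (invented numbers, for illustration only).
toy_runs = {
    1: [-1000, -1002],
    2: [-800, -802],
    3: [-790, -794],
    4: [-785, -789],
}
toy_dk = delta_k(toy_runs)
best_k = max(toy_dk, key=toy_dk.get)
```

Because ΔK needs both neighbours, it cannot be evaluated at the smallest or largest K tested, so K = 1 can never be selected by this criterion alone.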
Analysis of molecular variance. To examine the molecular variance among and within populations, an analysis of molecular variance (AMOVA) was performed. Both the Korean set and the Merge set of peanut genotypes were analyzed using Arlequin ver. 3.5.2 (http://cmpg.unibe.ch/software/arlequin35) to estimate the pairwise genetic distance (FST) between subpopulations and to calculate the genetic variation between and within populations.
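To make the meaning of FST concrete, the sketch below computes it for a single biallelic locus as (HT - HS) / HT, where HT is the expected heterozygosity of the pooled population and HS the weighted mean within subpopulations. This is a simplified textbook form chosen for illustration; Arlequin's AMOVA-based estimator is different, and the function name and inputs are hypothetical.

```python
def fst_biallelic(p_subpops, sizes=None):
    """FST = (HT - HS) / HT for one biallelic locus.

    p_subpops: frequency of one allele in each subpopulation
    sizes:     optional subpopulation sizes used as weights (equal weights if None)
    """
    n = len(p_subpops)
    if sizes is None:
        weights = [1.0 / n] * n
    else:
        total = sum(sizes)
        weights = [s / total for s in sizes]
    p_bar = sum(w * p for w, p in zip(weights, p_subpops))
    h_t = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, p_subpops))
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


fst_identical = fst_biallelic([0.5, 0.5])   # no differentiation
fst_fixed = fst_biallelic([1.0, 0.0])       # alternate alleles fixed
fst_partial = fst_biallelic([0.8, 0.2])
```

Values near 0 indicate that most variation lies within subpopulations, while values near 1 indicate that most variation lies among them, which is the same contrast the AMOVA percentages express.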
Development and validation of KASP markers. KASP markers were developed from putative polymorphic SNPs in the Axiom data to distinguish the 17 diverse genotypes currently used in the Korean peanut breeding program; these markers can also be used for varietal identification. Five sets of KASP assay markers were designed from the flanking sequences (50 bp each upstream and downstream) around the SNP position (Table S5). Validation and genotyping were performed for these 17 selected genotypes using the KASP assay. KASP amplification and allelic discrimination were performed on a QuantStudio 3 Real-Time PCR System (Thermo Fisher Scientific Korea Ltd.) according to the manufacturer's standard protocol. In brief, 5-10 ng of genomic DNA template was mixed with 5 µL of KASP reaction mixture and 0.14 µL of KASP assay. After thorough mixing, assays were run with the following thermal cycling conditions: 15 min at 94 °C; a touchdown phase of 10 cycles of 94 °C for 20 s and 61 °C to 55 °C (dropping 0.6 °C per cycle) for 60 s; and 26 cycles of 94 °C for 20 s and 55 °C for 60 s (first PCR stage). Recycling was then performed, consisting of three cycles of 94 °C for 20 s and 57 °C for 60 s (second PCR stage). The recycling was performed twice, and fluorescence was read for KASP genotyping after PCR amplification.
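The thermal profile above can be written out step by step, which makes the touchdown arithmetic easy to check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the first touchdown cycle anneals at 61 °C and that the temperature drops by 0.6 °C after each cycle, so the tenth cycle anneals at 55.6 °C before the 55 °C stage begins.

```python
def touchdown_annealing(start_c=61.0, step_c=0.6, cycles=10):
    """Annealing temperature for each touchdown cycle, rounded to 0.1 degrees C."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


def kasp_program(recycling_rounds=2):
    """Expand the protocol into a flat list of (step, temperature_c, seconds)."""
    steps = [("activation", 94.0, 15 * 60)]
    for temp in touchdown_annealing():          # touchdown phase, 10 cycles
        steps += [("denature", 94.0, 20), ("anneal", temp, 60)]
    for _ in range(26):                         # first PCR stage
        steps += [("denature", 94.0, 20), ("anneal", 55.0, 60)]
    for _ in range(recycling_rounds * 3):       # recycling: 3 cycles per round
        steps += [("denature", 94.0, 20), ("anneal", 57.0, 60)]
    return steps


schedule = touchdown_annealing()
program = kasp_program()
hold_seconds = sum(seconds for _, _, seconds in program)  # excludes ramp times
```

With two recycling rounds the profile comprises 42 amplification cycles in total; the summed hold times give a lower bound on run time because ramping between temperatures is not included.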

Results
Selection and genome-wide distribution of selected SNPs. A total of 96 peanut genotypes of the Korean set, originating from different countries (Table S1), were genotyped using the high-density 'Axiom_Arachis' 48 K SNP array. A total of 47,837 SNPs were initially obtained for these 96 genotypes based on SNP calling (Fig. 1A). The genome-wide distribution of the SNP calling pattern, genome density, and the genotype heterozygosity used in the analysis are shown in Fig. 1C (Fig. 1A). Similarly, the US mini-core genotypes together with the Korean set genotypes, designated the Merge set, were used in genetic analysis with 4475 SNP loci obtained after filtering. For the US mini core, chip data were obtained from PeanutBase (https://peanutbase.org/home). The SNP genotype data of the US mini core were further filtered to retain the Affymetrix IDs (SNP sites) shared between the US mini core and the Korean set (96 genotypes), and the resulting data were used as the Merge set (Tables S1 and S2).
(Table 1). The genetic diversity results indicated that neither the Korean set nor the Merge set of genotypes was very diverse. The F-statistic (FIS) of both sets (Korean and Merge) showed a high inbreeding coefficient (Table 1). Such results are expected because peanut is a highly self-pollinated crop, and the genotypes used in this study might have been developed through repeated selection. In addition, the relationships between the Korean set and the Merge set of genotypes were further investigated using IBS allele-sharing values. The MDS plot revealed that genome-wide IBS pairwise distances for the Korean set and the Merge set were 0.42-1.02 and 0.48-1.03, respectively (Figures S1A and S1B). Ten pairs of genotypes from the designated sets with maximum and minimum IBS values are presented in Table S7. The IBS-based genetic distance matrix for the Korean set and the Merge set was calculated using TASSEL version 5.2.39 (https://tassel.bitbucket.io/). The genetic distance ranged from 0.42 to 0.99, with an average of 0.90, for the Korean set (Table S7). The lowest genetic distance (0.42) was observed between genotypes GanghwaColl and Toalson in the Korean set. Likewise, the genetic distance ranged from 0.48 to 0.93 with an average of 0.99 for the Merge set. The lowest genetic distance (0.48) was observed between genotypes TwungChungMuZhuVun and PI337399 in the Merge set (Table S7).
(Fig. 2A). These results suggested that the Korean set of peanut genotypes should be categorized into two groups (GI and GII) (Fig. 2B). Analysis with K = 3 was also performed to further explore the population structure (Fig. 2A). At K = 3, which had the second-largest value (ΔK = 50.114), the genotypes were divided into three groups (GI, GII, and GIII) (Fig. 2B). At K = 2, 81 genotypes fell in group GI, whereas 15 genotypes fell in group GII (Table S1). Similarly, the population structure of the Merge set showed the maximum ΔK value (22,778.77) at K = 2 (Fig. 3A), followed by K = 3. Based on the representative population structure, the Merge set could be divided into two groups (GI and GII) at K = 2 and into three groups (GI, GII, and GIII) at K = 3 (Fig. 3B). At K = 2, 152 and 55 genotypes fell in the GI and GII groups, respectively (Table S2). Overall, we found a mixed pattern of separation across all genotypes and no distinct separation by subspecies, botanical variety, lineage, or origin.

Phylogenetic analysis. A phylogenetic tree comprising the 96 peanut genotypes of the Korean set was constructed based on 9947 high-resolution SNPs obtained after filtering (Fig. 4). These 96 genotypes were divided into two clusters (CI and CII), and the second cluster was manually subdivided into subclades CIIa and CIIb based on the hierarchically clustered genotype profiles. Cluster CI consisted of 43 genotypes, CIIa of 23 genotypes, and CIIb of 30 genotypes (Fig. 4). Cluster CIIa comprised 22 genotypes of Korean origin and only one genotype from India, whereas 13 of the 30 genotypes in cluster CIIb were of Korean origin. Based on the phylogenetic tree, Korean peanut genotypes might have been introduced not only from India and China but also from other parts of the world such as the USA, Argentina, Taiwan, and Zimbabwe. Korean genotypes belonging mainly to cluster CIIb might have been introduced directly from different parts of the world (Fig. 4). A phylogenetic tree was also constructed for the Merge set of 207 genotypes using 4475 SNPs (Fig. 5). The Merge set genotypes were likewise classified into two major clusters in the NJ phylogenetic tree. Cluster CI comprised 96 genotypes, whereas the second cluster, comprising 111 genotypes, was manually divided into subclades CIIa and CIIb based on the hierarchically clustered genotype profiles (Fig. 5). Although the US mini-core genotypes originated from different countries (Table S2), they formed clusters (CIIa and CIIb) distinct from the cluster of Korean set genotypes (CI). Cluster CIIa comprised 50 genotypes belonging to three different botanical varieties (fastigiata, hypogaea, vulgaris) from diverse parts of the world (Table S2). Cluster CIIb comprised 61 genotypes belonging to the same three botanical varieties (fastigiata, hypogaea, vulgaris), dominated by 8 genotypes from Israel and 6 genotypes from India; the remaining genotypes were from different parts of the world (Table S2). However, the genotypes did not group according to subspecies, botanical variety, or origin. Overall, results for both data sets were consistent with the model-based population structure at K = 2.
Principal component analysis. To further verify the clustering observed in the phylogenetic tree, PCA was conducted on the Korean set of genotypes using the high-resolution SNP data, and the first two principal components (PC1 and PC2) were examined. The scatter plot showed that PC1 and PC2 accounted for 29.92% and 39.74% of the variation, respectively (Fig. 6A). The plot suggested that the Korean set formed two main groups, with the majority of Korean genotypes falling in one group, consistent with the clusters of the phylogenetic tree except for a slight overlap between some genotypes (Fig. 6A). Likewise, PCA was performed on the Merge set of genotypes using 4475 SNPs. PC1 explained 50.12% of the variance and PC2 explained 16.14% (Fig. 6B). In Fig. 6B, the Korean set genotypes fall in one cluster and the remaining Merge set genotypes in another. These results suggest that the 207 genotypes of the Merge set group with different degrees of divergence, with the 96 Korean set genotypes grouping more closely together (Fig. 6B).
(Table 2). Likewise, variation among populations of the Merge set was 65.61%, whereas variation among individual samples within populations was 34.39% (Table 2). In the Korean set, the genetic difference among subpopulations was low compared to the difference within individual subpopulations. Conversely, in the Merge set, the genetic difference among subpopulations was high compared to the difference within individual subpopulations. These results suggested two subgroups, consistent with the population structure and phylogenetic tree analyses of both data sets.

Development and validation of KASP markers. Based on the Axiom SNP array data, we identified a few SNPs that could distinguish the Korean origin peanut genotypes (n = 17) currently in use in the Korean peanut breeding program. These selected SNPs were used to develop KASP primers. All the designated genotypes (n = 17) were then genotyped with these KASP markers to validate the results. Genotyping yielded the expected results, as shown in Fig. 7 and Table S6. For example, the KASP marker CV_1 could dis- Similarly, the other KASP markers (CV_2, CV_3, CV_4, and CV_5) clearly distinguished the designated genotypes from each other. These results indicated that the selected KASP marker sets could be utilized for variety identification or hybrid purity analysis. These marker sets can also be used in marker-assisted selection.

Discussion
Advances in the genomic research of peanut are limited, and advanced molecular techniques are underutilized in managing available germplasm and landraces. However, elucidating the genetic relationships and genetic diversity among germplasm or other breeding lines can lead to the precise use of genetic resources for crop improvement and for designing breeding programs 45,46 . Recent developments in peanut genomic research, especially the availability of the reference genome 7 and different SNP arrays 37,39 for high-throughput genotyping, provide a great opportunity for assessing genetic resources at the genetic level and for advancing breeding programs via multiple strategies. SNP-based genotyping is widely used because of its accuracy, abundance, and high throughput. In this study, a pseudomolecule reference genome was first identified based on the genomes of the two diploid progenitor species of cultivated peanut, A. duranensis (A genome) and A. ipaensis (B genome), as described previously 19,37 . Furthermore, the pseudomolecule-wise distribution of array SNPs and the genome-wise polymorphic SNPs were identified (Fig. 1A, Table S3). This analysis was broadly similar to previously reported studies 19,37 . The genetic diversity and population structure of the Korean set of peanut genotypes (n = 96) were then analyzed using 9947 high-resolution SNPs. At the same time, data for 111 US mini-core genotypes were extracted from the public domain and combined with the Korean set to generate the Merge set. The Merge set was also used in the analysis of genetic diversity and population structure, based on 4448 polymorphic SNPs. The genetic diversity analysis of the Korean set showed FST values suggesting that the genotypes in the Korean data set were closely related. Moreover, the Korean set of genotypes seems to have a mixed lineage from parents that may belong to different geographical origins or gene pools.
www.nature.com/scientificreports/
In contrast, the FST values of the Merge data set suggested that the US mini-core genotypes were more diverse than the Korean genotypes. Moreover, we observed a significant difference between the Korean data set and the mini-core data set (Fig. 5): no Korean genotypes fell in the CIIa or CIIb cluster (Fig. 5), and the percentage of variation among populations in the Merge set (between Korean and mini-core genotypes) was 66%, compared with 10% between groups CI and CII in the Korean set (Table 2). These results suggested that the Korean data set and the mini-core data set of peanut genotypes were significantly divergent. Unlike the FST values, the F-statistic (FIS) of the Korean set of genotypes was slightly higher than that of the Merge set (Table 1). A higher inbreeding coefficient was expected in the Korean data set; however, because we merged the Korean and mini-core genotypes and used only the SNPs common to both, FIS values could have been elevated in the Merge set as well. The inbreeding coefficient (FIS) is commonly interpreted as the probability that two alleles at a given locus in an individual are identical by descent from a common ancestor of its two parents. These results demonstrate that the Merge set, which includes the mini core, has slightly less inbreeding than the Korean set alone. Thus, the genetic diversity of the Korean set of peanut genotypes can be considered low (Table 1) compared to that of the US mini-core genotypes. These results for genetic diversity were highly comparable to those of some recent studies conducted using SNP markers in peanut 19,37,47,48 . Structure within the population is best described when it is separated into two or three subgroups (K = 2 and K = 3). In particular, the Korean set of peanut genotypes (n = 96) was categorized into two or three groups (GI, GII, and GIII) (Fig. 2B). Likewise, the 207 genotypes of the Merge set were separated into two or three groups (GI, GII, and GIII) (Fig. 3B) at K = 2 and K = 3, respectively. However, STRUCTURE analysis of the Korean data set did not indicate correspondence with subspecies, botanical variety, or origin lineage, and a mixed pattern of separation was observed. Similar results were reported in genetic analyses of the peanut mini core 20,49 . Thus, this result indicates that seeds of Arachis hypogaea L. might have been dispersed over a wide geographical area from the center of origin.
44 . These genotypes were clustered into two major clusters. The second cluster was subdivided into CIIa and CIIb, with each colored branch representing the respective cluster (CI, red; CIIa, purple; CIIb, green). The leaf node symbol represents the respective country mentioned in the legend.
Figure 5. Phylogenetic tree of the Merge set of peanut genotypes (n = 207) constructed using MEGA X software (http://www.megasoftware.net/) with the neighbor-joining method 44 . These genotypes were clustered into two major clusters. The second cluster was subdivided into CIIa and CIIb, with each colored branch representing the respective cluster (CI, red; CIIa, purple; CIIb, green).
Based on phylogenetic analysis, the Korean set of peanut genotypes (n = 96) was divided into two major clusters (CI and CII). CII was manually subdivided into clades CIIa and CIIb (Fig. 4). The Korean origin genotypes were distributed over all clusters (CI, CIIa, and CIIb), with US runners/Virginias and US Spanish or Spanish parent-derived genotypes in the CI and CIIb clusters, respectively (Fig. 4). More than 50.0% of the Korean genotypes in cluster CI had a close genetic distance to genotypes from the USA and China. Thus, these Korean peanut genotypes might have been derived from China and the USA. Other Korean genotypes were located in the CIIa cluster, showing 18.30% similarity with genotypes from different parts of the world (India, China, USA, Argentina, Taiwan, and Zimbabwe). Similarly, 31.0% of the Korean genotypes, in cluster CIIb, showed similarity with a single Indian genotype, indicating that the genotypes in this cluster might have originated from India. Phylogenetic analysis of the Merge set of 207 genotypes also showed two major groups (Fig. 5). As expected, the genotypes of cluster CI (n = 96) belonged to the Korean set, whereas clusters CIIa and CIIb, comprising 111 genotypes, were derived from the US mini-core set and formed a separate cluster (Fig. 5). In Fig. 5, all the Korean genotypes are in cluster CI, along with a few genotypes of US Spanish origin, whereas clusters CIIa and CIIb contain mostly genotypes from the mini-core collection and a few US runners. These results indicate that germplasm exchange between Korea and other parts of the world might have occurred. Peanut genotypes from across the globe in both sets (Korean and Merge) were separated into two major groups, although individuals within groups were mixed without any correlation with subspecies, botanical variety, or origin. However, the phylogenetic results agreed with the results of the model-based population structure analysis at K = 2 for both the Korean set and the Merge set. In addition, the clustering and grouping patterns of genotypes of different origins could be explained precisely using an SNP array, and they were comparable to those reported in previous studies 18,19,50-52 . Although the mini-core genotypes were selected to capture the genetic diversity in the peanut core and germplasm collections, we found a high degree of genetic resemblance among the genotypes, which is to be expected because peanut is known for its low genetic diversity. A previous GWAS study also reported a high level of similarity among the mini-core accessions 49 . Moreover, the clusters derived from both sets of genotypes were further validated by PCA (Fig. 6). The results were consistent with the relationships indicated by the phylogenetic tree and structure analyses. AMOVA results explained the differential variation between the Korean set and the Merge set of populations (Table 2). Variation among populations of the Korean set was 10.34%, whereas 89.66% of the variation was found among individual samples within populations (Table 2). Likewise, variation among populations of the Merge set was 65.61%, and variation among individual samples within populations was 34.39% (Table 2).
Overall, this study indicates a high inbreeding coefficient and low variability in the tested Korean set of genotypes. This is probably due to self-pollination and the repeated selection of these peanut lines over the years during cultivar development and breeding, which might have reduced the genetic diversity. Consequently, higher similarity and low genetic variability were observed in the Korean set of genotypes tested in the current study. This observation is in line with the pollination system and the history of peanut cultivation.
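The inbreeding coefficient referred to above can be illustrated per locus as FIS = 1 - Ho/He, where Ho is the observed and He the expected heterozygosity. The sketch below uses this textbook relation with a hypothetical two-character genotype encoding; it is not the estimator implemented in the software used for Table 1.

```python
from collections import Counter


def f_is(genotypes):
    """Per-locus FIS = 1 - Ho/He from diploid calls such as 'AA' or 'AB'.

    None marks a missing call. Returns None for a monomorphic locus (He = 0).
    """
    called = [g for g in genotypes if g is not None]
    ho = sum(1 for g in called if g[0] != g[1]) / len(called)
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    he = 1.0 - sum((n / total) ** 2 for n in counts.values())
    return None if he == 0 else 1.0 - ho / he


fis_selfed = f_is(["AA", "AA", "BB", "BB"])    # no heterozygotes
fis_random = f_is(["AA", "AB", "AB", "BB"])    # Hardy-Weinberg proportions
fis_excess = f_is(["AB", "AB", None])          # heterozygote excess
```

A fully homozygous but polymorphic locus gives FIS = 1, which is the pattern expected in a predominantly self-pollinated crop; negative values arise only when heterozygotes are more frequent than expected.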
The increasing number of individuals and the routine use of population studies in genetics and breeding programs require flexibility and precision in genotyping methods. The KASP marker assay is one such genotyping method that has emerged recently. It offers several advantages such as flexibility, robustness, multiplexing, cost-effectiveness, and rapid genotyping of small to large populations with hundreds of markers 53-55 . Thus, we developed KASP marker assays to validate and distinguish important pre-breeding genotypes using a unique SNP set that showed high-resolution calls on the Axiom 48 K SNP assay and polymorphism between the selected genotypes. Although the number of markers used for validation was small, their viability and usefulness were assessed by selecting a few SNPs to distinguish the Korean origin peanut genotypes (n = 17). The KASP results were similar to those from Axiom array genotyping, thus confirming and validating the SNP array results (Table S5). Validation using KASP markers suggested that the newly developed marker set could be utilized to identify varieties or analyze hybrid purity. In addition, these sets of markers could be used in marker-assisted selection or marker-trait association studies. Our results were consistent with previous studies reporting the application of KASP assays in legume species including peanut 51,53,54,56 .

Conclusion
The results of this study showed low genetic variation in Korean genotypes, which might be caused by domestication, intentional selection, reduced population size and genetic drift within the same gene pool, and breeding for a few desirable traits. In the present study, the genotypes did not group by subspecies, botanical variety, or origin. However, this study revealed that the US mini-core set of genotypes was more diverse and had a negative or lower inbreeding coefficient than the Korean set of genotypes. This suggests that the Korean breeding scheme needs to widen the genetic base of its breeding material and create new alleles or gene pool combinations by diversifying the current Korean breeding resources and increasing the breeding population. In addition, some genetically diverse genotypes might be useful for creating new trait combinations, developing mapping populations with desirable traits, improving crops, and generating new cultivars via appropriate breeding approaches. The present study also provides a number of high-resolution polymorphic SNP markers distributed across the A and B subgenomes of cultivated peanut. They can facilitate the development of new marker sets for the differentiation of varieties. Furthermore, these SNPs can be utilized for SNP-based genetic maps, mapping applications, background selection, and other molecular breeding applications in the Korean peanut breeding program.