RAD-Seq analysis of wild Japanese garlic (Allium macrostemon Bunge) growing in Japan revealed that this neglected crop was previously actively utilized

Allium macrostemon Bunge, commonly referred to as "no-biru" in Japan, is a widespread wild onion species found across the country. Despite being deeply entwined in ancient Japanese culture, it remains an underutilized crop in Japan. Determining the origins of its domestic populations and understanding their genetic composition is crucial to highlighting the plant's historical significance in Japan. This study aims to bridge this knowledge gap by examining the genetic diversity of 47 A. macrostemon samples from various regions in Japan using RAD-Seq. Our analyses distinguished unique population structures, dividing the samples into three distinct groups: A, B, and C. Notably, groups A and B showed clear evidence of bulb propagation, while group C did not. Group C formed four subgroups: C1, C2, C3, and C4. Hybridization between subgroup C1 and either group A, B, or both, resulted in the emergence of subgroups C2, C3, and C4. Thus, groups A, B, and C1 are posited as the ancestral populations. Additionally, our morphological observations indicated distinct differences among these three groups. Our findings also suggest that human migration may have influenced the plant's distribution, hinting at active usage in the past that later waned, causing its current underutilized status.


Sample collection of Allium macrostemon
We collected bulbs from 47 A. macrostemon plants from different locations in Japan, including fields and riverbanks.Figure 1 and Table 1 show details regarding the plants, their collection dates, and places of origin.We further created three technical replicate samples (samples 30 and 31, samples 14 and 15, as well as samples 17 and 32), resulting in a total of 50 samples subjected to ddRAD-Seq.www.nature.com/scientificreports/

Variants detection using RAD-Seq data
The ddRAD-Seq of the 50 samples, as shown in Table 1, generated more than 10 Gbp of data, comprising a total of 196.2 million raw, single-end 51-bp reads.Quality-based filtering yielded an average of 0.9 million reads (ranging from a minimum of 0.2 million to a maximum of 3 million) across the 50 samples (Supplementary Table S1).
The Stacks program-built loci de novo with an average coverage depth of 16.78-fold (Supplementary Table S2).
We identified a total of 5848 variant sites for subsequent analysis.

The plant consists of three groups
Principal component analysis (PCA) of these variant sites effectively partitioned the 50 samples into three groups, labeled A, B, and C (Fig. 2).The division into these three groups was primarily driven by the principal components 1 and 2, which contribute 16.5% and 9.86% respectively.Analysis of principal components 3, 4, and 5 did not reveal additional meaningful features (Supplementary Fig. S1).Multidimensional scaling (MDS) analysis classified the samples into three groups, with the same members as identified by the PCA (Fig. 3).Members of group A originated from three districts in Japan: the plants in Chugoku (35 and 42), Shikoku (26,  34, and 36), and Kyushu (39).Thus, they were found throughout Western Japan, excluding the Okinawa district.Members of group B, on the other hand, came from five districts in Japan: plants from Tohoku (21), Kanto (2, 12, 25, and 30, with 30's technical replicate 31), Chubu (8), Kinki (27), and Shikoku (14, and its technical replicate   Cluster analysis was performed using a pairwise distance matrix (Fig. 4), which identified the same three groups as those identified in the PCA and MDS analyses: groups A and B each formed a tight cluster, while group C was an outgroup of groups A and B. The cluster analysis detected several subclusters, but, with two exceptions, it did not detect any region-specific subclusters.The two exceptions are the subclusters to which plants 5, 10, 22, and 33 from the Tohoku district and plants 16 and 46 from the Kyushu district belong.

Gene flow from group A and/or group B to group C was detected, but not from group C to group A or group B
We performed an admixture analysis to estimate the degree of genetic mixing.We used values from 1 to 10 as possible K values (symbolizing the number of ancestral populations), computed the cross-validation (CV) errors, and estimated the most likely K value, i.e., the K value with the lowest CV (Supplementary Fig. S2).The CV error was minimized at K = 3, indicating that the most likely number of ancestral populations is three, aligning with the number of groups identified above.The results of the admixture analysis are displayed (Fig. 5, Supplementary Fig. S3).The admixture analysis detected no gene flow into groups A and B from other groups.Conversely, this analysis identified gene flow into group C from either group A, group B, or both.www.nature.com/scientificreports/groups A and B in the PCA and MDS analyses.Within each subgroup, we did not identify characteristics specific to particular regions.The admixture analysis also displayed more intense gene flow into group C from group B than from group A.

Statistical interpretations of the classifications
We performed a statistical analysis on the six populations.The Fst value analysis (Table 2) exposed a high level of genetic differentiation between groups A and B, yet a comparatively low average degree of genetic differentiation between group A and subgroups C1, C2, C3, or C4, as well as between group B and any of these subgroups.Notably, the genetic differentiation between group B and either C1 or C2 was higher than that between group A and either C1 or C2.Conversely, the Fst values between group B and either C3 or C4 were lower compared to those between group A and either C3 or C4.
For each population, we conducted a statistical analysis.The average values of nucleotide diversity (π) and mean expected heterozygosity (He) were higher in the subgroups of group C; these values in groups A and B were similar and lower than in group C (Table 3, Supplementary Table S3).A negative Fis value indicates avoidance of inbreeding in groups A, B, and C2, while the relatively high Fis values in groups C1, C3, and C4 imply a moderate degree of inbreeding (Table 4).

Both groups A and B propagated asexually, while group C propagated sexually
A. macrostemon reproduces asexually via bulbs, particularly with human intervention.We detected asexual reproduction in the plant by examining the conservation of heterozygous loci positions via pairwise alignments.The degrees of conservation for the technical replicates, plants 14 and 15, 30 and 31, and 17 and 32, were 49.0%, 47.6%, and 43.9%, respectively.De novo analysis of RAD-Seq data, which contains many missing values, is likely to result in 40-50% conservation between somatic clones.Consequently, we determined the presence or absence of asexual reproduction by setting the value of conservation at nearly 40% as the criterion.It should be noted that a similar value of conservation was also used as a criterion for clonal propagation in a study of Japanese pepper that used de novo analysis of RAD-Seq data 28 .
We investigated the conservation levels among plants in each group, both within and across districts (Table 4, Supplementary Table S5).The conservation level in group A was high, with over half of the pairs showing above 40% (40.8-52.1%).Some pairs had conservation levels slightly below 40% (36.6-39.7%).However, given the somatic mutations (including loss of heterozygosity) that occur during asexual reproduction, it is likely that these plants with slightly lower conservation values also propagated from the same ancestor via asexual reproduction.The conservation level in group B was high, with all pairs of plants showing an average of more than 40% (40.7-49.7%)except for one pair which showed 39.6%.Conversely, nearly all the pairs observed in group C showed a low level of conservation, with only six exceptions.Thus, groups A and B propagated from a single plant by asexual reproduction, whereas the members of group C propagated almost exclusively without asexual reproduction.www.nature.com/scientificreports/

Morphological differences exist among ancestral populations A, B, and C1
Out of the 47 individuals subjected to RAD-Seq, 22 were cultured and analyzed for morphological data using nine distinct measurements (Table 5).Subsequently, we conducted a principal component analysis (PCA) on the 22 cultured individuals, using the acquired morphological data (Fig. 6a).Given that groups A, B, and C1 represent ancestral populations, a separate PCA was performed exclusively on the ten individuals from these groups (Fig. 6b).The first PCA that included all 22 individuals accounted for 41.28% of the total variance, with the second principal component contributing 25.8%.Conversely, in the PCA exclusive to the ten selected individuals, the first and second principal components accounted for 58.4% and 23.3% of the variance, respectively.Notably, while the graphical representation in Fig. 6a did not clearly delineate genetic or regional grouping, Fig. 6b distinctly showcased genetic groupings.These observations underscore the presence of morphological disparities among the ancestral populations A, B, and C1.
Based on the results of the PCA, we further investigated which morphological measurements primarily influenced the groupings observed in Fig. 6b.Among the variables, leaf width, maximum weight of bulbs, lateral  Vol:.( 1234567890) We examined the preceding study 33 that analyzed the concentration of phenolics in A. macrostemon collected from various regions across Japan.Using data extracted from this study 33 , we conducted a PCA, employing the quantitative values of functional components as variables.However, similar to the results in Fig. 6a, this PCA did not facilitate the delineation of samples based on genetic or regional clusters.

Discussion
The key revelation of this study is the historical relevance of A. macrostemon, a plant currently overlooked in Japan but widely used in earlier times.With minimal exceptions, no distinct regional clusters were discernible across all groups.Previous reports have noted the lack of a clear correlation between the plant's collection sites and the content of flavonoids and ferulic acid glycosides 18,19,33 .This absence of phenotypic differentiation between various collection sites aligns with the genetic findings of our study.These results suggest that although the plant's inter-regional migration is currently inactive, it was once actively transported, with its bulbous nature potentially facilitating this movement.Its historical significance is underscored by its wide usage as both a vegetable and medicinal resource.
The previously active utilization of this plant influence our interpretation of classical Japanese literature.Mentions of the plant are found in Japanese literary works such as the Kojiki and Manyoshu.Modern interpretations often suggest that individuals in the past had a fondness for wild plants.However, given the active use of A. macrostemon as a food source, it is plausible that emperors and poets from these historical periods may have appreciated it not merely as a wild plant but as a cultivated vegetable.This potential reevaluation offers a more nuanced understanding of cultural and lifestyle patterns in historic Japan.
Furthermore, this study revealed that A. macrostemon exhibits two distinct reproductive strategies.Groups A and B, which were clearly established as discrete groups, demonstrated evidence of clonal propagation based on the analysis of the conservation of heterozygous loci positions.This suggests that each group represents a population formed primarily through asexual reproduction.The negative Fis values observed in groups A and B are also likely related to this clonal propagation (Table 3).Therefore, the principal mode of reproduction within groups A and B appears to be through bulb propagation, highlighting a notable asexual reproduction strategy.
Conversely, group C exhibited a more complex population structure and demonstrated a largely non-clonal mode of propagation, with a few exceptions (Table 4).This suggests that group C primarily employs an alternative, bulb-independent mode of reproduction, most likely through hybridization leading to seed formation.Within group C, several members demonstrated notably low heterozygosity.This shift towards homozygosity is likely related to the reproductive dynamics brought about by hybridization.
Our study subdivided group C into four subgroups.Among them, subgroup C1 showed no signs of gene flow from either group A or group B. Furthermore, there was no evidence of clonal propagation within subgroup C1.Thus, group A, group B, and subgroup C1 represent distinct ancestral populations.Gene flow from group A, group B, or both, was observed in other subgroups within group C, contributing to increased genetic diversity within this group (Fig. 5).The possibility of gene flow due to human intervention cannot be discounted, but natural hybridization seems more probable.Human activities, including the movement of plants and bulbs, might have heightened the chances of natural hybridization between groups.This implies that human-induced translocation of this plant might have facilitated the augmentation of its genetic diversity.Consequently, this observation suggests that incidental human intervention may escalate the genetic diversity of a plant species, even one that can also propagate in the wild.
We conducted a cultivation trial to ascertain the morphological traits of this plant's genetic resources in Japan.Although we observed morphological variations, a discernible pattern remained elusive, prompting us to withhold the publication of these results.Nevertheless, the RAD-Seq analysis uncovered that groups A, B, and C1 formed the ancestral populations.Consequently, we re-evaluated only these three groups based on morphological characteristics, resulting in their partition into three distinct clusters.Essentially, excluding the population resulting from hybridization, Japanese A. macrostemon was genetically and morphologically divided into three groups.It was not possible to classify the three ancestral groups based on differences in their functional components.Still, we remain open to the possibility that future research may uncover componential differences among three groups.If ongoing research finds no compositional characteristics distinguishing the three ancestral groups, and if historical records indicate distinct recognition of each group, it might suggest that past populations valued this plant more for its vegetable properties than its medicinal benefits.
Given its widespread presence in Japan, A. macrostemon might have originated locally or been introduced from overseas, either in its entirety or in parts.Notably, groups A and B, which primarily reproduce through bulb propagation-a method well-suited for human-assisted migration, might have been introduced from foreign regions.In this scenario, the most desirable varieties might have been selected overseas and then imported into Japan.Conversely, group C, which displays a bulb-independent mode of reproduction, could be endemic to Japan.This group's unique reproductive strategy may have evolved to adapt to the specific environmental conditions prevalent in Japan, suggesting a native origin.However, more comprehensive research is required to conclusively determine the plant's origin and the pathways of its spread.
Interestingly, the Japanese word 'no-biru' also has another name, ' chosen no-biru' .'Chosen' means Korea, suggesting that part of this plant may have been introduced to Japan from Korea 34 .However, ' chosen' also has other meanings.'Chosen' is used as a prefix for closely related organisms, like different species within the same www.nature.com/scientificreports/genus.It is therefore possible that the plant was previously differentiated into two species.Although it remains unclear whether ' chosen' refers to Korea or to a closely related organism, the possibility that people in the past distinguished at least two species of this plant is intriguing and suggests a link with the grouping in this study.The existence of morphological differences among ancestral populations A, B, and C1 also supports this possibility.Further studies analyzing individuals from China and Korea, which would allow us to speculate on the evolution of this plant and the route of its arrival in Japan, would provide further clarification.A critical reference for discussing the potential foreign introduction of A. macrostemon is our prior study on the scallion mosaic virus (ScaMV) in the family Potyviridae that infects this plant species 35 .This earlier research revealed that the genome types of ScaMV found in Japan are dispersed among various genomic patterns, suggesting multiple introductions into Japan via different routes.Furthermore, it was shown that the parent ScaMV genomes recombined relatively recently in Japan and were subsequently spread across various districts by human activities.This supports the speculation that the plant might have been established in Japan prior to the introduction of the virus.Nevertheless, it does not entirely exclude the possibility that the plant and the virus were introduced simultaneously as part of the same event.In essence, this underlines the complex historical and ecological interplays between A. macrostemon, its associated viruses, and human activities.
In conclusion, our findings provide valuable insights into the genetic resources of A. macrostemon collected across Japan.Through genetic-level investigations, we have provided evidence suggesting past human-induced migration of this species.In this study, the use of RAD-Seq technology was instrumental in analyzing the genetic diversity of a species without prior genomic information, similar to what has been done previously with Japanese pepper 28 .Looking forward, conservation efforts for this plant's genetic resources will require attention to not only genetic diversity but also key traits such as the chemical compounds present in the plant's bulbs and leaves, which possess potential value for culinary uses.

Plant materials
A total of 47 wild Japanese garlic (A.macrostemon) samples were collected from various sites along riverbanks and fields across Japan.We searched for wild Japanese garlic plants on foot and by car.All methods involving plants were carried out in accordance with relevant guidelines and regulations.Plants were collected with permission from their owners.All wild Japanese garlics were maintained at the Center for Education and Research in Agricultural Innovation, Saga University (Saga, Japan).Detailed information about each plant sample, its origin, and the respective collection date is provided in Table 1.Certain samples served as technical replicates of the same plant, samples 30 and 31, samples 14 and 15, as well as samples 17 and 32.

Cultivation trial and measurement
The cultivation trial of A. macrostemon was conducted in a field (grey lowland soil, pH 6.1, EC 0.19 mS cm −1 ) located at Saga University (latitude 33° 30′ 91.308″ N; longitude 130° 33′ 28.119″ W).The compound of fertilizer (N:P 2 O 5 :K 2 O = 14:14:14%) was applied at 8 kg of N, phosphate, and potassium each at 10 a −1 , and the field was covered with black mulch.The bulbs were planted on 9 October 2015 and 12 October 2016.The planting density was set with a 150 cm row width and 25 cm spacing between plants both lengthwise and widthwise.Five mediumscale plants of the isolate were arranged per row, with one plant per hole.We recorded various morphological parameters over a 2-year period (2016 and 2017), including the number of leaves, plant number of tillers, leaf length and width, leaf sheath diameter, the number of daughter bulbs, the total and maximum weight of bulbs, and the lateral diameter of the maximum bulb.

DNA extraction and double-digest restriction site amplified DNA sequencing (ddRAD-Seq)
The extraction and purification of DNA were performed using the CTAB method 36 , followed by RNase treatment.The quality of the isolated genomic DNA was validated through 0.8% agarose gel electrophoresis.The DNA concentration was quantified using a Qubit dsDNA BR Assay Kit (Invitrogen, MA, USA).The library for ddRAD-Seq was prepared based on the method of Sakaguchi et al. 37 , with some adaptations from the original protocol 25 .The libraries were sequenced using 51-bp single-end reads at Macrogen (Seoul, Korea) on one lane of a HiSeq 2000 (Illumina, San Diego, CA, USA).

De novo mapping and variant calling
The denovo_map.pl script from the Stacks package (version 2.60) 38 , a wrapper for ustacks, sstacks, tsv2bam, and gstacks, was used to map reads de novo without using a reference genome sequence.The options for denovo_map.pl were -M 4 (number of mismatches allowed between stacks within individuals [for ustacks]), -n 4 (number of mismatches allowed between stacks between individual [for cstacks]), and -m 3 (number of identical reads required to initiate a new putative allele [for ustacks]).Genotyping data were created by assigning one individual per population to specify all the samples.After performing denovo_map.pl, the populations program of the Stacks package (version 2.60) created the vcf (variant call format) 39 , plink 40 , and phylip files using the -R 0.5 (minimum percentage of individuals across populations required to process a locus), -write-single-snp (restrict data analysis to only the first SNP per locus), -min-maf 0.05 (minimum minor allele frequency required to process a nucleotide site at a locus), -vcf, -plink, and -phylip-var-all options.

Principal component analysis, multidimensional scaling analysis, and cluster analysis
PCA and MDS analyses were performed using the vcf file generated by the populations program.The SNPRelate program 41 in the R software environment (version 4.1.2) was employed to convert the vcf file into a gds (genomic data structure) file and generate PCA and MDS plots.The contribution ratio for each principal component was

Figure 1 .
Figure 1.Collection sites for Allium macrostemon.Colours are used to refer to the different districts.The map is sourced from http:// www.craft map.box-i.net/ (accessed on 21st February 2023).

Figure 2 .
Figure 2. Principal component analysis (PCA) of Allium macrostemon samples, with the first two components based on 5848 SNP markers.The contribution rate of each principal component is indicated in parentheses.The colour scheme of the samples corresponds to that in Fig. 1.Colours are used to refer to the different groups.The figure was generated using R software (version 4.1.2).

Figure 3 .
Figure 3. Multidimensional scaling analysis of Allium macrostemon samples using two-dimensional data based on 5848 SNP markers.The colour schemes of the samples and the groupings correspond to those in Figs. 1 and 2, respectively.This figure was generated using R software (version 4.1.2).

Figure 4 .
Figure 4. Cluster analysis of Allium macrostemon samples.The colour schemes of the samples and the groupings correspond to those in Figs. 1 and 2, respectively.This figure was generated using R software (version 4.1.2).

Figure 5 .
Figure 5. Admixture analysis of Allium macrostemon samples used in this study, illustrated by K = 2-5 admixture plots.The horizontal axis displays both the group names and sample numbers.The colour schemes of the samples and the groupings correspond to those in Figs. 1 and 2, respectively.It should be noted that the color used for genetic clusters differs from that used for group names and sample numbers.This figure was generated using R software (version 4.1.2).

Figure 6 .
Figure 6.Principal component analysis (PCA) of morphological data from samples of Allium macrostemon.(a) analysis of all 22 samples, (b) analysis of 10 samples belonging to groups A, B, and C1.The contribution rate of each principal component is indicated in parentheses.The colour schemes of the samples and the groupings correspond to those in Figs. 1 and 2, respectively.The figure was generated using R software (version 4.1.3). https://doi.org/10.1038/s41598-023-43537-5

Table 1 .
Information on Allium macrostemon samples used in the study.

Table 2 .
Fst values among the six groups.

Table 3 .
Population genetic statistics of each group.

Table 4 .
Conservation of heterozygous loci positions across pairs of samples in each group.Some pairs were selected and shown.Full data are presented in Supplementary TableS5.

Table 5 .
Morphological features of Allium macrostemon collected in various parts of Japan.
diameter of the maximum bulb, leaf sheath diameter, and the number of plant tillers were characterized by elevated values in group B. Conversely, the number of daughter bulbs demonstrated higher values within group A. Leaf length, the number of leaves, and the total weight of bulbs displayed enhanced values in both group A and group B, indicating shared morphological features among these two groups.Remarkably, none of the measurements presented high values for group C1, suggesting distinct morphological characteristics within this group.