Elucidation of Japanese pepper (Zanthoxylum piperitum De Candolle) domestication using RAD-Seq

Japanese pepper, Zanthoxylum piperitum, is native to Japan and has four well-known lineages (Asakura, Takahara, Budou, and Arima), which are named after their production area or morphology. Restriction-site associated DNA sequencing (RAD-Seq) was used to analyse 93 accessions from various areas, including these four lineages. Single nucleotide variant analysis was used to classify the plants into eight groups: the Asakura and Arima lineages each had two groups, the Takahara and Budou lineages each had one group, and two additional groups were present. In one Asakura group and two Arima groups, the plants were present in agricultural fields and mountains, thus representing the early stage of domestication of the Japanese pepper. The second Asakura lineage group was closely related to plants present in various areas, and this represents the second stage of domestication of this plant because, after early domestication, genetically related lineages with desirable traits spread to the periphery. These results demonstrate that domestication of Japanese pepper is ongoing. In addition, this study shows that spineless plants are polyphyletic, despite the spineless lineage being considered a subspecies of Japanese pepper.


Results
Variant detection by de novo mapping of RAD-Seq data. Double digest RAD-Seq generated more than 11.9 gigabases of data for a total of 235.2 million raw single-end 51-bp reads. Quality-based filtering resulted in an average of 2.5 million reads (maximum of 4.3 million and a minimum of 0.4 million) among 93 samples (Supplementary Table S1). The Stacks program built loci de novo with an average coverage depth of 27.32 times (Supplementary Table S2). We performed the following analysis using data from 4,334 variant sites.
Correlation of genetic similarity with geographic locations. Based Table 1), 93 samples were classified into seven groups. PC1 divided group A from the other groups (Fig. 1a); PC2 divided groups D and E from the other groups (Fig. 1a); PC3 (Fig. 1b) identified groups C and F and confirmed the presence of group E; PC4 identified group B (Fig. 1c) and confirmed the presence of groups C and F; PC5 reconfirmed the presence of groups B and F (Fig. 1d); and PC6 identified group G (Fig. 1e). Thus, probably because the genetic relationships among plants were complex, it was necessary to use PC1 through PC6 for classification.
The analysis identified group D as the second Asakura lineage group. Group D plants were from Yabu City only (Figs. 1a, 2, Table 1), except for plant 74, which was from Kami. Unlike group A, which had more cultivated plants, group D contained many wild plants found in mountainous areas. Most, but not all, of the group D plants were spineless. Plant 58 was 100 years old and was growing in an agricultural field. When we observed plant 58 in 2017, this plant was spineless. As of 2020, this plant was near death, and its spines were visible.
The analysis identified group E as the Takahara lineage. Group E (Fig. 1a) was an isolated group with plants only from Takayama city, located in the highland region of northern Gifu Prefecture (Fig. 2, Table 1). This group consisted of only cultivated plants. Most of the group E plants were spineless (Table 1).
The analysis identified group C as the first group of the Arima lineage. Group C plants were from Kobe and Asago cities in southern Hyogo Prefecture (Fig. 2 www.nature.com/scientificreports/  www.nature.com/scientificreports/ agricultural field in Asago City for the preservation of the Arima lineage. All plants in group C were spiny and transplanted from Mt. Rokko (Table 1). The analysis identified group F as the second Arima lineage group. All plants in group F (Table 1) were from the Arima lineage production area in Asago and Kobe (Fig. 2). Group F contained spiny plants. Three plants cultivated at the Fruit Tree Research Institute came from Mt. Inariyama, located near Arima. PCA did not clearly separate group F from the other plants.
The analysis identified group B as the Budou lineage. Group B was from Kainan city in Wakayama Prefecture (Fig. 2), where the Budou lineage is produced (Table 1). Group B is the grape-like Budou lineage, which has a different phenotype than the other lineages. The plants in group B were all cultivated and had spines.
Group G was designated as an unclassified lineage. This plant group included plants from different locations across Japan (Table 1), including Adachi, Akita, Agamachi, Ayabe, Kami, Kobe, Nagahama, Nangoku, Noto, Toyooka, and Tsuruoka. The plants in group G were both cultivated and wild (Table 1).
Tight cluster and additional groups revealed by cluster analysis. Cluster analysis was performed using a pairwise distance matrix. The cluster analysis identified the same seven groups that were identified in the PCA analysis and showed that plants 42,43,44,45,46,47,48,49,50, and 51 formed tight cluster.
In addition, cluster analysis identified an eighth group (Fig. 3), named group H. Group H (Fig. 3) contained the plants from Kochi Prefecture (Fig. 2, Table 1). This was named "unforeknown Kochi group". The plants in this group were spiny. Plants 1 and 87 were cultivated, and plant 2 was wild. These plants are used as ingredients to make rice cakes in this region (Table 1). The other plant (number 92) from Kochi Prefecture did not belong to group H.

Regional groups and their exceptions elucidated through phylogenetic analyses.
To further establish the above classifications, we performed phylogenetic analyses using the maximum likelihood (ML) method (Fig. 4) and a method based on the coalescent model (Fig. 5). Both phylogenetic trees classified plants into eight groups. The phylogenetic trees (Figs. 4 and 5) also showed a tight cluster in group A, containing plants 42,43,44,45,46,47,48,49,50, and 51, present as a subclade within clade A with a high bootstrap value. The branch lengths of the ML tree ( Fig. 4) indicated that these nine plants had a high degree of genetic similarity.
There were some differences in these eight groups between the two phylogenetic trees. Two trees separated plant 14 from the other plants in group E. The ML tree separated plant 13 from other plants in group D. Indeed, in PCA, these two plants were closer to the plants in the other groups (Fig. 1a,b). The discrepancies between the two phylogenetic analyses, including these ones, are related to the low bootstrap values.
The cluster analysis and both phylogenetic analyses showed that group G had two regional groups in common: (1) plants 78 and 79, and (2) plants 81, 82, and 83. Plants 78 and 79 in the first group were from Agamachi, Niigata Prefecture, and were wild spineless plants. In the second group, plant 81 was from Yamagata Prefecture, plant 82 was from Akita Prefecture, and plant 83 was from Tsuruoka, Yamagata Prefecture, which are adjacent to each other. However, plant 76 from Tsuruoka, Yamagata Prefecture, did not belong to this regional group.  www.nature.com/scientificreports/ www.nature.com/scientificreports/ In both phylogenetic trees, group G plants 88, 92, and 93 were isolated from the other group G plants. In group G in the coalescent tree (Fig. 5), plants 77 and 80 were separated from other plants in the same group.  www.nature.com/scientificreports/ the error value of the cross-validation was minimal. We presented the results of the admixture analysis from K = 2 to K = 7 (Fig. 6). www.nature.com/scientificreports/ In the case of K = 2 (Fig. 6), except for plant 42, the tight cluster within group A (plants 43,44,45,46,47,48,49,50, and 51) formed a single population, shown in yellow. The red colour corresponds to plants other than those in group A. The remaining group A plants were admixed groups of the two populations. Thus, at K = 2, group A members were clearly separated from the other plants.
The analysis of K = 3-7 separated the six groups A, B, C, D, E, and H (Fig. 6). However, the analysis did not separate group F, which is consistent with the results of PCA, which also did not clearly separate group F from other plants.
The admixture analysis also showed that the grouping of group G was ambiguous, and group G may not be a single isolated group. Reflecting this observation, the pattern of colours in group G was not consistent across several K values. Thus, the plants in group G plants may be admixtures of a variety of groups.
Conservation of heterozygosity. Asexual reproduction was confirmed by checking for heterozygosity conservation using pairwise alignments. Plants 8 and 9 of group F were grafted from the same tree (Table 1). Between these two plants, 2288 of 3699 sites were conserved heterozygous sites (61.9%) (Supplementary Fig. S2). Similarly, plants 33 and 34 of group C were grafted from the same tree (Table 1). Between these two plants, 1947 of the 3097 sites were conserved heterozygous sites (62.9%) (Supplementary Fig. S3). Grafting is a common method of propagating the Budou lineage, namely, group B plants. Between plants 3 and 4 of group B, 2198 of 3879 sites were conserved heterozygous sites (56.7%) (Supplementary Fig. S4). Thus, this degree of heterozygosity conservation was found among asexually reproduced plants when analysing the data using de novo mapping.
Between plants 40 and 41 of group A, growing nearby, 2140 of 3414 sites were conserved heterozygous sites (62.7%) ( Supplementary Fig. S5); between plants 38 and 39 of group A, growing nearby, 1700 of 3502 sites were conserved heterozygous sites (48.5%) ( Supplementary Fig. S6). Thus, plant pairs "40 and 41" and "38 and 39" were identified as asexually propagated plants, although no records have been kept regarding their propagation. Based on the analysis of these five pairs, we observed a conservation of heterozygosity of approximately 50% or above between asexually propagated plants when the RAD-Seq data were analysed de novo.
We examined the conservation of heterozygosity for the tight cluster in group A (plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51). The degree of conservation ranged from 50.4% to 68.9% ( Supplementary Fig. S7, Supplementary Table S3). By contrast, the plants from the tight clusters within group A shared lower levels of conservation (from 29.5% to 36.5%) with the other group A plants (plants, 35, 38, and 40) (Supplementary Fig. S7, Supplementary Table S3). Therefore, the members of the tight cluster in group A were asexually propagated plants derived from the same tree. (Tables 2 and 3, Supplementary Tables S4, S5, and S6), we removed the plants (13 and 14) for which we obtained uncertain results and the plants in group G that caused the ambiguity. Although groups C and F belonged to the Arima lineages, the Fst value between them was high (0.110, Table 2). Thus, the difference in the pairwise Fst values between these two groups may indicate two places of origin.

Statistical interpretations of the classifications. In our statistical analysis
Similarly, the pairwise Fst value between groups A and D of the Asakura lineage was high (0.0803, Table 2). The reason for the higher Fst value may be the presence of a tight cluster in group A. Therefore, we performed a second statistical analysis by modifying the grouping. In the new grouping, we designated the tight cluster in group A as A2. We denoted the plants that remained in group A as A1. As expected, the pairwise Fst values were high between group A2 and each of the other groups (Table 3). Table 3 shows that group A1 was genetically closer to group D (0.0539). Thus, the high genetic distance between groups A and D in Table 2 was due to the tight cluster within group A.
We performed tests for f3-statistics and f4-statistics (Supplementary Tables S5 and S6, respectively). Negative values of f3-statistics with A1 as the outgroup (Supplementary Table S5) showed that group A1 is an admixture population of group A2 and the remaining plants. This result is in good agreement with the results of the admixture analysis (Fig. 6). The f4-statistics test (Supplementary Table S6) also supports this result with high or low Z-scores. As for source populations other than A2, groups C and E were more likely than group D. Although groups C and F belonged to the Arima lineages, positive values of f3-statistics with A2 as the outgroup showed that group C may have more gene flow from groups E, D, and H than from group F.

Discussion
Our study classified Zanthoxylum piperitum into several groups based on DNA sequences and geographic information. Each group showed interesting characteristics, some of which may have been related to domestication. The domestication of Zanthoxylum piperitum can be described in two steps. The first step is the cultivation of wild plants in agricultural fields, in which case the wild and cultivated plants are genetically similar. In the second step, the lineages with desirable traits spread from their place of origin to the surrounding areas, in which case the genetic diversity of individuals within the lineages should be low. Grafting is a means of spreading crops such as fruit trees.
Group D plants reflect the first step of domestication, because the cultivated plants probably originated in the Yabu Mountains. Group D comprised both wild plants in mountainous areas and cultivated plants in agricultural fields (Table 1).
The spineless Asakura lineage (Z. piperitum (L.) DC forma inerme (Makino) Makino), first described 90 years ago 11 , may be placed in group D. This 90-year-old documentation 11  www.nature.com/scientificreports/ Figure 6. K = 2-7 admixture plots in admixture analysis of Zanthoxylum accessions used in this study. The horizontal axis shows the group names and the sample numbers. The colour scheme for the group names and the sample numbers is the same as in Fig. 3. The colour for the genetic cluster is different from that used to denote the group names and sample numbers. Figure was generated using R software (version 3.6.2) 44 .  Table S4). Furthermore, the amount of conservation of heterozygosity showed that these plants were propagated by grafting ( Supplementary Fig. S7, Supplementary Table S3). These plants were also found to be widely cultivated in many cities (Yabu, Ayabe, Gojo, Kyotamba, Kobe, and Kami), indicating that they have been spread. Plants 42 and 48 are highly productive trees in Yabu City, which supplies scions for grafting. Thus, the plants in this tight cluster exhibited a variety of desirable traits.
There are two potential places of origin for group A: Yabu City, where the Asakura district is located and another distant location. The following reasons may explain the first possibility: (1) many of the group A plants grow in Yabu City (plants 35, 38, 39, 42, 46, and 48); (2) a 90-year-old document 11 describes the spineless plants originating from Asakura District, Yabu City; and (3) all of the group A plants are spineless. Interestingly, we collected plants 35, 38, and 39 near the Konryuji temple, a potential place of origin for the Asakura lineage. Therefore, we do not exclude the possibility that the spineless Asakura lineage described 90 years ago 11 may represent group A plants. However, plants 35, 38, and 39 were cultivated. Thus, a serious problem for the first possibility is the absence of wild plants in group A in the mountains of Yabu City.
The second possibility is that the group A plants are from another distant location; that is, the plants in the tight cluster of group A of unknown origin hybridised with the group D plants to form the rest of the group A plants. This possibility is consistent with the results of admixture analysis for K = 2. However, it is worth noting that the results of the admixture analysis may have been artificial since the second step of domestication formed the ancestral population containing the plants in the tight cluster. In addition, the f3-statistics test results (Supplementary Table S5) do not necessarily indicate that the source population of group A is group D. If the second possibility is true, the origin of the plants in the tight cluster remains unknown, although the origin of the rest of the group A plants can be explained by hybridisation.
In groups C and F of the Arima lineage, we observed the first domestication step based on documentation. According to the records, the group F plants (plants 7, 8, and 9) were transplanted from Mt. Inariyama to Asago city (Table 1). Group C plants (plants 29, 30, 31, 32, 33, and 34) were transplanted from Mt. Rokko to Asago and Kobe cities. The other group F plants (plants 5, 6, 10, 11, and 12) should have the same origin. According to the PCA results, the group F plants (plants 5, 6, 10, 11, and 12) were more closely related to the plants in the other groups.
We did not find any records describing the origins of the group E plants. In addition, we could not find wild representatives of this group in the mountains. If the group E plants have adaptations to high altitudes, the mountainous areas near Takayama city may be where group E originated. Spineless plants are present in group E (plants 16, 17, 18,19, 20, 21, 22, 23, 24, 25, and 26) and are genetically different from the plants in groups A and D in Yabu City, where the Asakura district is located.
In addition to these observations, the results for group G also suggest that domestication is underway in many areas. Ongoing domestication is associated with the cultivation of wild lineages on nearby agricultural lands and  www.nature.com/scientificreports/ gardens. Group G was genetically different from the other plant groups undergoing domestication. Therefore, these plants may have been domesticated at their respective locations. Recent reports have stated that the domestication of other crop species, such as olives in Morocco 33 and dragon fruits/pitaya in Mexico 34 , is also ongoing. Both olives and dragon fruits undergo two common steps to become domesticated: first, the in situ conservation of wild plants, and second, the introduction of wild plants into agricultural fields. These plants were selected as the desirable plants for cultivation. The ongoing domestication of Japanese pepper is not entirely similar to that of olives and dragon fruit. In the first step of olive domestication, farmers continue to disturb the natural habitat of the olive when selecting wild populations 33 . For Japanese pepper, wild plants are harvested from their natural habitats and transplanted into agricultural fields or gardens while leaving other wild plants intact in the mountains. The second step of the domestication of olives and dragon fruits occurs through the continuous introduction of wild plants into agricultural fields to further select for preferred traits 33,34 . In Japanese pepper, wild plants are not frequently reintroduced into cultivated plant sites.
On the other hand, the ongoing domestication of Japanese pepper is similar to that of olives and dragon fruit in which domestication syndrome has been observed. Domestication syndrome changes the genomic, physiological, and morphological characteristics of cultivated plants compared to those of wild plants 35 11 , it contained some spiny plants. In addition, the spineless trait is not unique to the Asakura lineage, as some of the plants in the Takahara lineage (group E) are also spineless. Some group G plants were spineless. Importantly, these spineless plants were from many locations. Therefore, our study does not support the notion that the spineless lineage is either from Asakura or monophyletic. Our results indicate that the spineless lineage is not a single subspecies.
Some Fis values were negative (Supplementary Table S4). Notably, the tight cluster plants in group A (group A2) had the highest negative Fis value. Therefore, we speculated that some of these plants had been asexually reproduced by grafting, as asexual reproduction may cause negative Fis values 36 . Pairwise alignments support this possibility ( Supplementary Fig. S7 and Supplementary Table S3).
In this study, we elucidated the genetic diversity of Japanese pepper, while tracing the process of domestication. Our results demonstrate that the conservation of plant genetic resources is an important agricultural practice. Several cultivated lineages of Japanese pepper were found to be closely related to plants growing in the mountains, in contrast to other crops. On the other hand, several cultivated lineages, groups A, B, and E, had no close relatives growing in the mountains, which suggests that wild plants of this group no longer exist. Therefore, there is a need to conserve both plants growing in the mountains in situ and plants growing in agricultural fields or preservation facilities ex situ. The results presented in this study will be useful for developing a plan for the conservation and breeding of Japanese peppers.

Methods
Plant materials. Ninety-three Zanthoxylum piperitum samples collected from various parts of Japan were used ( Table 1). Photographs of the materials were taken, and the leaves were stored in a freezer (− 80 °C). At each location, where possible, plants growing in the mountains and agricultural fields were collected. Any lineages of Budou or Takahara that grew in the mountains were not found. All methods involving plants were carried out in accordance with relevant guidelines and regulations. Plants were collected with permission from their owners. DNA extraction and double-digest restriction site amplified DNA sequencing (ddRAD-Seq). DNA extraction and purification were performed using previously described methods 30 .
The library for ddRAD-Seq was prepared based on the original protocol 27 with some modifications 37 . The first restriction site near the primer binding site, which reads the single-ended DNA sequence, was BglII. The second restriction site, adjacent to the binding site for reading the index sequence, was EcoRI. The library was sequenced with 51-bp single-end reads at Macrogen (Seoul, Korea) on two lanes of an Illumina HiSeq2000 (Illumina, San Diego, CA, USA).
Processing and quality control of ddRAD-Seq data. The Stacks package 38 was used to analyse the ddRAD-Seq reads. The process_shortreads program of the Stacks package (version 2.4) was used to clean reads using -c (clean data, remove any read with an uncalled base), -q (discard reads with low-quality scores), and -r (rescue barcodes) options. The cutadapt program (version 2.6) 39 removed adaptor sequences from reads using the -b (remove adapter anywhere, the sequence of an adapter that may be ligated to the 5ʹ or 3ʹ end) and -e 0.1 (maximum allowed error rate by keeping the default value 0.1 = 10%) options. Any low-quality bases in the single-end reads were trimmed using the trimmomatic program (version 0.39) 40 with the following settings LEADING:19 (cut bases off the start of a read, if below a threshold quality), TRAILING:19 (cut bases off the end of a read, if below a threshold quality), AVGQUAL:20 (drop the read if the average quality is below a specified level), MINLEN:51 (drop the read if it is below a specified length), and SLIDINGWINDOW:20:20 (perform sliding window trimming, clipping once the average quality within the window falls below a threshold).  41 , plink 42 , phylip, and treemix files using the -R 0.5 (minimum percentage of individuals across populations required to process a locus), -write-single-snp (restrict data analysis to only the first SNP per locus), -min-maf 0.05 (minimum minor allele count required to process a SNP), -vcf, -plink, -phylip-var-all -treemix options. The populations program was also used to create pairwise alignments between the two individuals. In this case, the -R option was set to one.
Principal component analysis (PCA) and cluster analysis. Principal component analysis was performed based on the vcf file generated by the populations program. The SNPRelate program 43 in the R software environment (version 3.6.2) 44 converted the vcf file to a gds (genomic data structure) file, drew the PCA diagrams, and calculated the contribution ratios for each principal component. This step used only bi-allelic loci. The SNPRelate program plotted the dendrogram by considering the identity by state (IBS) pairwise distances. Images of the results were generated using the basic functions of the R software environment.
Admixture analysis. The plink program (plink 2, version 1.90p) 42 was used to create the input files for the admixture program. The admixture program (version 1.3) 45 was used to determine the admixture history and the cross-validation (CV) error for the hypothetical runs from K (number of ancestral populations) = 1-12. The CV error plot was drawn using the received log data to determine the optimal K value. R software (version 3.6.2) was used to draw admixture plots using the Q estimate files created by the admixture programs. Images of the results were generated using the basic functions of the R software environment.
Maximum likelihood phylogenetic tree analysis. ModelTest-NG was used to select the model 46 . The maximum likelihood (ML) phylogenetic tree was created using the raxmlHPC-PTHREADS-SSE3 program (version 8.2.12) 47 . The phylip file created by the populations program was used as the input file. The parameters for raxml were -f a (rapid bootstrap analysis and search for the best-scoring ML tree in one program run),-x 12,345 (an integer number [random seed] and turn on rapid bootstrapping), -p 12,345 (a random number seed for parsimony inferences), -N 1000 (bootstrap value), and -m GTRGAMMAX (model of binary [morphological], nucleotide, multi-state, or amino acid substitution). Group H was used as the root. Image of the phylogenetic tree was generated using Dendroscope (version 3.6.3) 48 and TreeView X (version 0.5.0, https:// treev iew-x. en. softo nic. com/).
Phylogenetic tree analysis under the coalescent model. Phylogenetic tree analysis was performed using SVDquartets 49 integrated into PAUP (version 4.0a) software 50 . The input file for the PAUP was created by converting a phylip file containing all sites to the nexus file format. The parameters used for SVDquartets were 'quartet evaluation' , 'evaluate all possible quartets' , 'tree inference' , 'select trees' , 'using QFM quartet assembly' , 'tree model' , 'multi-species coalescent' , 'write quartets file' , 'QMC format' , 'handling of ambiguities' and 'distribute' . The number of bootstrap analyses was 1000 replicates. Group A was used as the root. Image of the phylogenetic tree was generated using FigTree (version 1.4.4, http:// tree. bio. ed. ac. uk/).

Statistical analysis.
Groups separated by PCA and cluster analysis were assigned as separate populations in the population map data required for the Stacks package. The denovo_map.pl was re-performed after modifying the data for the population map. The populations program was run with the options -R 0.5, -write-single-snp, -min-maf 0.05, -fstats (SNP and haplotype-based F statistics), and -Fst_correction (a correction to be applied to Fst values with p-value [default p-value to keep an Fst measurement: 0.05]). The program treemix (version 1.12) 51 inferred gene flow using the f3-and f4-statistics with the option -k 500 (standard errors in blocks of 500 SNPs).