Introduction

Zanthoxylum species belongs to the family Rutaceae, commonly known as the citrus family. Many Zanthoxylum species are distributed worldwide and are used for culinary and medicinal purposes. For example, the Āyurvedic traditional medicine of India and Sri Lanka uses Z. tetraspermum and Z. rhetsa (“Katu Keena”)1. Sichuan pepper is an essential ingredient in the cuisines of the Sichuan province of China and contains several species of Zanthoxylum, including Z. simulans. Z. simulans is used in traditional Chinese medicine for its stomachic, analgesic, and anthelmintic properties2.

Zanthoxylum piperitum is native to Japan (except for Ryukyu) and southern Korea. In Japan, it is mainly used for culinary purposes. In English, it is known as Japanese pepper; in Japanese, it is called sanshō. This plant is considered to be highly medicinal in various uses3,4,5,6,7,8,9,10. Although Zanthoxylum piperitum is widely consumed and distributed throughout Japan, lineages have been formed based on the production area and morphology of this plant. The Asakura, Arima, and Takahara lineages are classified based on the place of production, while the Budou lineage is classified based on its morphology.

The Asakura lineage is believed to have originated in the Asakura district of Yabu City in central Hyōgo Prefecture. A typical feature of this lineage is the absence of spines. For this reason, the spineless plant is called the Asakura lineage in Japan. This spineless plant has been identified as a subspecies of Z. piperitum and named Z. piperitum (L.) DC forma inerme (Makino) Makino11. However, there is no evidence that all spineless plants are either from Asakura or are monophyletic. In Yabu city, Zanthoxylum piperitum grow as wild plants in mountainous areas and as cultivated plants in agricultural fields and gardens.

The Arima lineage, also known as the Rokko lineage, is another Japanese pepper lineage found in Kobe City in southern Hyōgo Prefecture. The flowers of these plants are typically used as ingredients in Arima cuisine.

The Takahara lineage is grown in Takayama City in northern Gifu Prefecture. These plants are grown at high altitudes 800 m above sea level, in contrast to the other lineages, which are cultivated on the plains. These plants bloom in mid-May, whereas other lineages bloom in late April. The wild Takahara lineage is difficult to find in the mountains of this region.

The Budou lineage (Budou means “grape” in Japanese) is categorised based on its morphology. This lineage is cultivated mainly in Kainan City, Wakayama Prefecture. It bears clusters of fruits that resemble grapes. According to oral tradition, it appeared as a chance seedling in a garden about 200 years ago and has been cultivated ever since. Therefore, the wild lineage is probably absent.

Many studies have investigated the genetic relationships among Zanthoxylum species12,13,14,15,16,17,18,19,20,21,22,23. There are no reports that compare the intraspecific diversity of Zanthoxylum piperitum in detail. However, three important studies have examined the DNA sequences of Zanthoxylum piperitum growing in Korea. One is the complete sequencing of the chloroplast genome12, and the others involved the development of markers to identify Zanthoxylum piperitum13,14. Many other studies have used the genetic information for Zanthoxylum piperitum as a reference for comparison15,16,17,18,19,20,22,23. There are two important studies on the intraspecific genetic diversity of Zanthoxylum species, which focus on: (1) the relationship and diversity between wild and cultivated plants22 and (2) the genetic structure of cultivated plants23. A previous study22 compared wild plants to cultivated plants using DNA sequence data and detected genetic similarities between cultivated and wild Z. armatum. Although this finding may be relevant to the domestication of Z. armatum, the study only used a small number of markers and was unable to examine the species domestication details.

Studies on intraspecific genetic diversity of Zanthoxylum species have been based on the analysis of a small number of markers, such as sequence-related amplified polymorphism markers22 and simple sequence repeat (SSR) markers23. Compared to genetic studies using a small number of markers, high-throughput sequencing can generate many single nucleotide variants (SNVs) that are suitable for analysing intraspecific differences24,25. Restriction site-associated DNA sequencing (RAD-Seq)26 is a high-throughput sequencing method. Initially, RAD-Seq cleaves individual genomes using a selected site-specific restriction enzyme. Then, DNA fragments adjacent to the restriction site are sequenced using a high-throughput sequencer. Double-digest RAD-Seq (ddRAD-Seq)27 is a variation of RAD-Seq that uses two restriction enzymes to cleave the genome. These methods allow for reduced representation of individual genomes. Moreover, these methods do not require prior knowledge of the genome sequence27. Our group has previously used the original RAD-Seq and ddRAD-Seq methods to analyse citrus28,29,30, loquat31, and firefly32.

The four lineages are well known, and Zanthoxylum piperitum is cultivated all over Japan, except for Ryukyu. Some lineages are present in both the mountains and agricultural fields. Therefore, the domestication of Japanese pepper may be an ongoing process. In the present study, ddRAD-Seq was used to elucidate the intraspecific genetic diversity and occurrence of domestication in Japanese pepper.

Results

Variant detection by de novo mapping of RAD-Seq data

Double digest RAD-Seq generated more than 11.9 gigabases of data for a total of 235.2 million raw single-end 51-bp reads. Quality-based filtering resulted in an average of 2.5 million reads (maximum of 4.3 million and a minimum of 0.4 million) among 93 samples (Supplementary Table S1). The Stacks program built loci de novo with an average coverage depth of 27.32 times (Supplementary Table S2). We performed the following analysis using data from 4,334 variant sites.

Correlation of genetic similarity with geographic locations

Based on the principal component analysis (PCA) (Fig. 1) and geographical information (Fig. 2, Table 1), 93 samples were classified into seven groups. PC1 divided group A from the other groups (Fig. 1a); PC2 divided groups D and E from the other groups (Fig. 1a); PC3 (Fig. 1b) identified groups C and F and confirmed the presence of group E; PC4 identified group B (Fig. 1c) and confirmed the presence of groups C and F; PC5 reconfirmed the presence of groups B and F (Fig. 1d); and PC6 identified group G (Fig. 1e). Thus, probably because the genetic relationships among plants were complex, it was necessary to use PC1 through PC6 for classification.

Figure 1
figure 1

Principal component (PC) analysis of Zanthoxylum accessions used in the study. The six major component data are shown by the five two-dimensional data sets: (a) first and second PCA axes, (b) first and third PCA axes, (c) first and fourth PCA axes, (d) first and fifth PCA axes, and (e) first and sixth PCA axes. Group A of the Asakura lineage is shown in blue; group B of the Budou lineage is shown in brown; group C of the Arima lineage is shown in orange; group D of the Asakura lineage is shown in violet; group E of the Takahara lineage is shown in maroon; group F of the Arima lineage is shown in green; group G is shown in violet. The letters indicating the sample number are also shown in the same colour. The contribution rate of each principal component is shown in parentheses. Figure was generated using R software (version 3.6.2)44.

Figure 2
figure 2

Sampling points and locations of the lineages. The colour scheme is the same as in Fig. 1.

Table 1 Information on Zanthoxylum piperitum accessions used in the study.

Group A may be an Asakura lineage group because the area across which the group A plants were distributed coincided with the production area for the Asakura lineage in Yabu City, central Hyogo Prefecture (Fig. 2, Table 1). All plants in group A were spineless. Group A was widely distributed and cultivated in other areas, including Kobe and Kami in Hyogo Prefecture, Ayabe, and Kyōtamba in Kyoto Prefecture, and Gojō in Nara Prefecture (Table 1). Interestingly, plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51 formed tight cluster, and are from Yabu, Ayabe, Gojō, Kyōtamba, Yabu, Kobe, Yabu, Gojō, Gojō, and Kami, respectively. The other plants in group A (35, 36, 37, 38, 39, 40, and 41) were from Yabu, Kyōtamba, Kobe, Yabu, Yabu, Kyōtamba, and Kyōtamba, respectively.

The analysis identified group D as the second Asakura lineage group. Group D plants were from Yabu City only (Figs. 1a, 2, Table 1), except for plant 74, which was from Kami. Unlike group A, which had more cultivated plants, group D contained many wild plants found in mountainous areas. Most, but not all, of the group D plants were spineless. Plant 58 was 100 years old and was growing in an agricultural field. When we observed plant 58 in 2017, this plant was spineless. As of 2020, this plant was near death, and its spines were visible.

The analysis identified group E as the Takahara lineage. Group E (Fig. 1a) was an isolated group with plants only from Takayama city, located in the highland region of northern Gifu Prefecture (Fig. 2, Table 1). This group consisted of only cultivated plants. Most of the group E plants were spineless (Table 1).

The analysis identified group C as the first group of the Arima lineage. Group C plants were from Kobe and Asago cities in southern Hyogo Prefecture (Fig. 2, Table 1), where the Arima lineage was produced. There is an agricultural field in Asago City for the preservation of the Arima lineage. All plants in group C were spiny and transplanted from Mt. Rokko (Table 1).

The analysis identified group F as the second Arima lineage group. All plants in group F (Table 1) were from the Arima lineage production area in Asago and Kobe (Fig. 2). Group F contained spiny plants. Three plants cultivated at the Fruit Tree Research Institute came from Mt. Inariyama, located near Arima. PCA did not clearly separate group F from the other plants.

The analysis identified group B as the Budou lineage. Group B was from Kainan city in Wakayama Prefecture (Fig. 2), where the Budou lineage is produced (Table 1). Group B is the grape-like Budou lineage, which has a different phenotype than the other lineages. The plants in group B were all cultivated and had spines.

Group G was designated as an unclassified lineage. This plant group included plants from different locations across Japan (Table 1), including Adachi, Akita, Agamachi, Ayabe, Kami, Kobe, Nagahama, Nangoku, Noto, Toyooka, and Tsuruoka. The plants in group G were both cultivated and wild (Table 1).

Tight cluster and additional groups revealed by cluster analysis

Cluster analysis was performed using a pairwise distance matrix. The cluster analysis identified the same seven groups that were identified in the PCA analysis and showed that plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51 formed tight cluster.

In addition, cluster analysis identified an eighth group (Fig. 3), named group H. Group H (Fig. 3) contained the plants from Kochi Prefecture (Fig. 2, Table 1). This was named “unforeknown Kochi group”. The plants in this group were spiny. Plants 1 and 87 were cultivated, and plant 2 was wild. These plants are used as ingredients to make rice cakes in this region (Table 1). The other plant (number 92) from Kochi Prefecture did not belong to group H.

Figure 3
figure 3

Cluster analysis of Zanthoxylum accessions. The new group, H group, is shown in light blue. The colour scheme is the same as in Fig. 1, with an additional colour. Figure was generated using R software (version 3.6.2)44.

Regional groups and their exceptions elucidated through phylogenetic analyses

To further establish the above classifications, we performed phylogenetic analyses using the maximum likelihood (ML) method (Fig. 4) and a method based on the coalescent model (Fig. 5). Both phylogenetic trees classified plants into eight groups. The phylogenetic trees (Figs. 4 and 5) also showed a tight cluster in group A, containing plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51, present as a subclade within clade A with a high bootstrap value. The branch lengths of the ML tree (Fig. 4) indicated that these nine plants had a high degree of genetic similarity.

Figure 4
figure 4

Maximum likelihood phylogenetic analysis of Zanthoxylum accessions. The colour scheme is the same as in Fig. 3. The numbers at the nodes indicate bootstrap values (% over 1000 replicates). The nodes with ≥ 50% bootstrap value are labeled. Figure was generated using Dendroscope (version 3.6.3)48 and TreeView X (version 0.5.0, https://treeview-x.en.softonic.com/).

Figure 5
figure 5

Coalescent model phylogenetic tree of Zanthoxylum accessions. The colour scheme is the same as in Fig. 3. The numbers at the nodes indicate bootstrap values (% over 1,000 replicates). The nodes with ≥ 50% bootstrap value are labeled. Figure was generated using FigTree (version 1.4.4, http://tree.bio.ed.ac.uk/).

There were some differences in these eight groups between the two phylogenetic trees. Two trees separated plant 14 from the other plants in group E. The ML tree separated plant 13 from other plants in group D. Indeed, in PCA, these two plants were closer to the plants in the other groups (Fig. 1a,b). The discrepancies between the two phylogenetic analyses, including these ones, are related to the low bootstrap values.

The cluster analysis and both phylogenetic analyses showed that group G had two regional groups in common: (1) plants 78 and 79, and (2) plants 81, 82, and 83. Plants 78 and 79 in the first group were from Agamachi, Niigata Prefecture, and were wild spineless plants. In the second group, plant 81 was from Yamagata Prefecture, plant 82 was from Akita Prefecture, and plant 83 was from Tsuruoka, Yamagata Prefecture, which are adjacent to each other. However, plant 76 from Tsuruoka, Yamagata Prefecture, did not belong to this regional group.

In both phylogenetic trees, group G plants 88, 92, and 93 were isolated from the other group G plants. In group G in the coalescent tree (Fig. 5), plants 77 and 80 were separated from other plants in the same group. This observation suggests that group G may not be a single isolated group. Thus, the grouping of group G is ambiguous, probably because plants in various locations belonged to group G.

Admixture history of the eight groups and the tight cluster within group A

Admixture analysis was used to estimate ancestral history and calculated cross-validation (CV) errors to estimate possible values of K (number of ancestral populations) = 1–12 (Supplementary Figure S1). K = 2 was the most likely value because the error value of the cross-validation was minimal. We presented the results of the admixture analysis from K = 2 to K = 7 (Fig. 6).

Figure 6
figure 6

K = 2–7 admixture plots in admixture analysis of Zanthoxylum accessions used in this study. The horizontal axis shows the group names and the sample numbers. The colour scheme for the group names and the sample numbers is the same as in Fig. 3. The colour for the genetic cluster is different from that used to denote the group names and sample numbers. Figure was generated using R software (version 3.6.2)44.

In the case of K = 2 (Fig. 6), except for plant 42, the tight cluster within group A (plants 43, 44, 45, 46, 47, 48, 49, 50, and 51) formed a single population, shown in yellow. The red colour corresponds to plants other than those in group A. The remaining group A plants were admixed groups of the two populations. Thus, at K = 2, group A members were clearly separated from the other plants.

The analysis of K = 3–7 separated the six groups A, B, C, D, E, and H (Fig. 6). However, the analysis did not separate group F, which is consistent with the results of PCA, which also did not clearly separate group F from other plants.

The admixture analysis also showed that the grouping of group G was ambiguous, and group G may not be a single isolated group. Reflecting this observation, the pattern of colours in group G was not consistent across several K values. Thus, the plants in group G plants may be admixtures of a variety of groups.

Conservation of heterozygosity

Asexual reproduction was confirmed by checking for heterozygosity conservation using pairwise alignments. Plants 8 and 9 of group F were grafted from the same tree (Table 1). Between these two plants, 2288 of 3699 sites were conserved heterozygous sites (61.9%) (Supplementary Fig. S2). Similarly, plants 33 and 34 of group C were grafted from the same tree (Table 1). Between these two plants, 1947 of the 3097 sites were conserved heterozygous sites (62.9%) (Supplementary Fig. S3). Grafting is a common method of propagating the Budou lineage, namely, group B plants. Between plants 3 and 4 of group B, 2198 of 3879 sites were conserved heterozygous sites (56.7%) (Supplementary Fig. S4). Thus, this degree of heterozygosity conservation was found among asexually reproduced plants when analysing the data using de novo mapping.

Between plants 40 and 41 of group A, growing nearby, 2140 of 3414 sites were conserved heterozygous sites (62.7%) (Supplementary Fig. S5); between plants 38 and 39 of group A, growing nearby, 1700 of 3502 sites were conserved heterozygous sites (48.5%) (Supplementary Fig. S6). Thus, plant pairs “40 and 41” and “38 and 39” were identified as asexually propagated plants, although no records have been kept regarding their propagation. Based on the analysis of these five pairs, we observed a conservation of heterozygosity of approximately 50% or above between asexually propagated plants when the RAD-Seq data were analysed de novo.

We examined the conservation of heterozygosity for the tight cluster in group A (plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51). The degree of conservation ranged from 50.4% to 68.9% (Supplementary Fig. S7, Supplementary Table S3). By contrast, the plants from the tight clusters within group A shared lower levels of conservation (from 29.5% to 36.5%) with the other group A plants (plants, 35, 38, and 40) (Supplementary Fig. S7, Supplementary Table S3). Therefore, the members of the tight cluster in group A were asexually propagated plants derived from the same tree.

Statistical interpretations of the classifications

In our statistical analysis (Tables 2 and 3, Supplementary Tables S4, S5, and S6), we removed the plants (13 and 14) for which we obtained uncertain results and the plants in group G that caused the ambiguity. Although groups C and F belonged to the Arima lineages, the Fst value between them was high (0.110, Table 2). Thus, the difference in the pairwise Fst values between these two groups may indicate two places of origin.

Table 2 Fst values for each pair of groups for A, B, C, D, E, F, H.
Table 3 Fst values for each pair of groups for A1, A2, B, C, D, E, F, H.

Similarly, the pairwise Fst value between groups A and D of the Asakura lineage was high (0.0803, Table 2). The reason for the higher Fst value may be the presence of a tight cluster in group A. Therefore, we performed a second statistical analysis by modifying the grouping. In the new grouping, we designated the tight cluster in group A as A2. We denoted the plants that remained in group A as A1. As expected, the pairwise Fst values were high between group A2 and each of the other groups (Table 3). Table 3 shows that group A1 was genetically closer to group D (0.0539). Thus, the high genetic distance between groups A and D in Table 2 was due to the tight cluster within group A.

We performed tests for f3-statistics and f4-statistics (Supplementary Tables S5 and S6, respectively). Negative values of f3-statistics with A1 as the outgroup (Supplementary Table S5) showed that group A1 is an admixture population of group A2 and the remaining plants. This result is in good agreement with the results of the admixture analysis (Fig. 6). The f4-statistics test (Supplementary Table S6) also supports this result with high or low Z-scores. As for source populations other than A2, groups C and E were more likely than group D. Although groups C and F belonged to the Arima lineages, positive values of f3-statistics with A2 as the outgroup showed that group C may have more gene flow from groups E, D, and H than from group F.

Discussion

Our study classified Zanthoxylum piperitum into several groups based on DNA sequences and geographic information. Each group showed interesting characteristics, some of which may have been related to domestication. The domestication of Zanthoxylum piperitum can be described in two steps. The first step is the cultivation of wild plants in agricultural fields, in which case the wild and cultivated plants are genetically similar. In the second step, the lineages with desirable traits spread from their place of origin to the surrounding areas, in which case the genetic diversity of individuals within the lineages should be low. Grafting is a means of spreading crops such as fruit trees.

Group D plants reflect the first step of domestication, because the cultivated plants probably originated in the Yabu Mountains. Group D comprised both wild plants in mountainous areas and cultivated plants in agricultural fields (Table 1).

The spineless Asakura lineage (Z. piperitum (L.) DC forma inerme (Makino) Makino), first described 90 years ago11, may be placed in group D. This 90-year-old documentation11, based on a much older book, states that the origin of the Asakura lineage was on a cliff near the Konryuji temple. Plant 90, belonging to group D, grows in a place similar to the place of origin and is spineless. In addition, spineless wild plants 53, 54, 60, and 69 were collected near the Konryuji temple.

Group A plants reflect the second domestication step because they include lineages with desirable traits and appear to have been spread. Group A contained a tight cluster comprising plants 42, 43, 44, 45, 46, 47, 48, 49, 50, and 51. This cluster showed minimal genetic diversity (Supplementary Table S4). Furthermore, the amount of conservation of heterozygosity showed that these plants were propagated by grafting (Supplementary Fig. S7, Supplementary Table S3). These plants were also found to be widely cultivated in many cities (Yabu, Ayabe, Gojo, Kyotamba, Kobe, and Kami), indicating that they have been spread. Plants 42 and 48 are highly productive trees in Yabu City, which supplies scions for grafting. Thus, the plants in this tight cluster exhibited a variety of desirable traits.

There are two potential places of origin for group A: Yabu City, where the Asakura district is located and another distant location. The following reasons may explain the first possibility: (1) many of the group A plants grow in Yabu City (plants 35, 38, 39, 42, 46, and 48); (2) a 90-year-old document11 describes the spineless plants originating from Asakura District, Yabu City; and (3) all of the group A plants are spineless. Interestingly, we collected plants 35, 38, and 39 near the Konryuji temple, a potential place of origin for the Asakura lineage. Therefore, we do not exclude the possibility that the spineless Asakura lineage described 90 years ago11 may represent group A plants. However, plants 35, 38, and 39 were cultivated. Thus, a serious problem for the first possibility is the absence of wild plants in group A in the mountains of Yabu City.

The second possibility is that the group A plants are from another distant location; that is, the plants in the tight cluster of group A of unknown origin hybridised with the group D plants to form the rest of the group A plants. This possibility is consistent with the results of admixture analysis for K = 2. However, it is worth noting that the results of the admixture analysis may have been artificial since the second step of domestication formed the ancestral population containing the plants in the tight cluster. In addition, the f3-statistics test results (Supplementary Table S5) do not necessarily indicate that the source population of group A is group D. If the second possibility is true, the origin of the plants in the tight cluster remains unknown, although the origin of the rest of the group A plants can be explained by hybridisation.

In groups C and F of the Arima lineage, we observed the first domestication step based on documentation. According to the records, the group F plants (plants 7, 8, and 9) were transplanted from Mt. Inariyama to Asago city (Table 1). Group C plants (plants 29, 30, 31, 32, 33, and 34) were transplanted from Mt. Rokko to Asago and Kobe cities. The other group F plants (plants 5, 6, 10, 11, and 12) should have the same origin. According to the PCA results, the group F plants (plants 5, 6, 10, 11, and 12) were more closely related to the plants in the other groups.

We did not find any records describing the origins of the group E plants. In addition, we could not find wild representatives of this group in the mountains. If the group E plants have adaptations to high altitudes, the mountainous areas near Takayama city may be where group E originated. Spineless plants are present in group E (plants 16, 17, 18,19, 20, 21, 22, 23, 24, 25, and 26) and are genetically different from the plants in groups A and D in Yabu City, where the Asakura district is located.

In addition to these observations, the results for group G also suggest that domestication is underway in many areas. Ongoing domestication is associated with the cultivation of wild lineages on nearby agricultural lands and gardens. Group G was genetically different from the other plant groups undergoing domestication. Therefore, these plants may have been domesticated at their respective locations.

Recent reports have stated that the domestication of other crop species, such as olives in Morocco33 and dragon fruits/pitaya in Mexico34, is also ongoing. Both olives and dragon fruits undergo two common steps to become domesticated: first, the in situ conservation of wild plants, and second, the introduction of wild plants into agricultural fields. These plants were selected as the desirable plants for cultivation. The ongoing domestication of Japanese pepper is not entirely similar to that of olives and dragon fruit. In the first step of olive domestication, farmers continue to disturb the natural habitat of the olive when selecting wild populations33. For Japanese pepper, wild plants are harvested from their natural habitats and transplanted into agricultural fields or gardens while leaving other wild plants intact in the mountains. The second step of the domestication of olives and dragon fruits occurs through the continuous introduction of wild plants into agricultural fields to further select for preferred traits33,34. In Japanese pepper, wild plants are not frequently reintroduced into cultivated plant sites.

On the other hand, the ongoing domestication of Japanese pepper is similar to that of olives and dragon fruit in which domestication syndrome has been observed. Domestication syndrome changes the genomic, physiological, and morphological characteristics of cultivated plants compared to those of wild plants35. The genomic characteristics of the plants in the tight cluster in group A were different from those of the wild plants. Similarly, the morphological characteristics of the plants in group B were different from those of wild plants.

Spineless plants are known as the Asakura lineage and are classified as a single subspecies, Z. piperitum (L.) DC forma inerme (Makino) Makino11. However, it is inappropriate to refer to the spineless plant as the Asakura lineage or Z. piperitum (L.) DC forma inerme (Makino) Makino. One Asakura lineage group (group A) contained a tight cluster of spineless plants of unknown origin. Although the second Asakura lineage group (group D) had a plant growing in a recorded place of origin11, it contained some spiny plants. In addition, the spineless trait is not unique to the Asakura lineage, as some of the plants in the Takahara lineage (group E) are also spineless. Some group G plants were spineless. Importantly, these spineless plants were from many locations. Therefore, our study does not support the notion that the spineless lineage is either from Asakura or monophyletic. Our results indicate that the spineless lineage is not a single subspecies.

Some Fis values were negative (Supplementary Table S4). Notably, the tight cluster plants in group A (group A2) had the highest negative Fis value. Therefore, we speculated that some of these plants had been asexually reproduced by grafting, as asexual reproduction may cause negative Fis values36. Pairwise alignments support this possibility (Supplementary Fig. S7 and Supplementary Table S3).

In this study, we elucidated the genetic diversity of Japanese pepper, while tracing the process of domestication. Our results demonstrate that the conservation of plant genetic resources is an important agricultural practice. Several cultivated lineages of Japanese pepper were found to be closely related to plants growing in the mountains, in contrast to other crops. On the other hand, several cultivated lineages, groups A, B, and E, had no close relatives growing in the mountains, which suggests that wild plants of this group no longer exist. Therefore, there is a need to conserve both plants growing in the mountains in situ and plants growing in agricultural fields or preservation facilities ex situ. The results presented in this study will be useful for developing a plan for the conservation and breeding of Japanese peppers.

Methods

Plant materials

Ninety-three Zanthoxylum piperitum samples collected from various parts of Japan were used (Table 1). Photographs of the materials were taken, and the leaves were stored in a freezer (− 80 °C). At each location, where possible, plants growing in the mountains and agricultural fields were collected. Any lineages of Budou or Takahara that grew in the mountains were not found. All methods involving plants were carried out in accordance with relevant guidelines and regulations. Plants were collected with permission from their owners.

DNA extraction and double-digest restriction site amplified DNA sequencing (ddRAD-Seq)

DNA extraction and purification were performed using previously described methods30. The library for ddRAD-Seq was prepared based on the original protocol27 with some modifications37. The first restriction site near the primer binding site, which reads the single-ended DNA sequence, was BglII. The second restriction site, adjacent to the binding site for reading the index sequence, was EcoRI. The library was sequenced with 51-bp single-end reads at Macrogen (Seoul, Korea) on two lanes of an Illumina HiSeq2000 (Illumina, San Diego, CA, USA).

Processing and quality control of ddRAD-Seq data

The Stacks package38 was used to analyse the ddRAD-Seq reads. The process_shortreads program of the Stacks package (version 2.4) was used to clean reads using -c (clean data, remove any read with an uncalled base), -q (discard reads with low-quality scores), and -r (rescue barcodes) options. The cutadapt program (version 2.6)39 removed adaptor sequences from reads using the -b (remove adapter anywhere, the sequence of an adapter that may be ligated to the 5ʹ or 3ʹ end) and -e 0.1 (maximum allowed error rate by keeping the default value 0.1 = 10%) options. Any low-quality bases in the single-end reads were trimmed using the trimmomatic program (version 0.39)40 with the following settings LEADING:19 (cut bases off the start of a read, if below a threshold quality), TRAILING:19 (cut bases off the end of a read, if below a threshold quality), AVGQUAL:20 (drop the read if the average quality is below a specified level), MINLEN:51 (drop the read if it is below a specified length), and SLIDINGWINDOW:20:20 (perform sliding window trimming, clipping once the average quality within the window falls below a threshold).

De novo mapping and variant calling

The denovo_map.pl of the Stacks package (version 2.5), a wrapper script for ustacks, cstacks, sstacks, tsv2bam, and gstacks, was used to map reads de novo without using a reference genome. The options for denovo_map.pl were -M 4 (number of mismatches allowed between stacks within individuals [for ustacks]), -n 4 (number of mismatches allowed between stacks between individuals [for cstacks]), and -m 3 (number of identical reads required to initiate a new putative allele [for ustacks]). Genotyping data were created to specify all the samples under analysis by assigning one individual per population. After performing denovo_map.pl, the populations program of the Stacks package (version 2.5) created the vcf (variant call format)41, plink42, phylip, and treemix files using the -R 0.5 (minimum percentage of individuals across populations required to process a locus), -write-single-snp (restrict data analysis to only the first SNP per locus), -min-maf 0.05 (minimum minor allele count required to process a SNP), -vcf, -plink, -phylip-var-all -treemix options. The populations program was also used to create pairwise alignments between the two individuals. In this case, the -R option was set to one.

Principal component analysis (PCA) and cluster analysis

Principal component analysis was performed based on the vcf file generated by the populations program. The SNPRelate program43 in the R software environment (version 3.6.2)44 converted the vcf file to a gds (genomic data structure) file, drew the PCA diagrams, and calculated the contribution ratios for each principal component. This step used only bi-allelic loci. The SNPRelate program plotted the dendrogram by considering the identity by state (IBS) pairwise distances. Images of the results were generated using the basic functions of the R software environment.

Admixture analysis

The plink program (plink 2, version 1.90p)42 was used to create the input files for the admixture program. The admixture program (version 1.3)45 was used to determine the admixture history and the cross-validation (CV) error for the hypothetical runs from K (number of ancestral populations) = 1–12. The CV error plot was drawn using the received log data to determine the optimal K value. R software (version 3.6.2) was used to draw admixture plots using the Q estimate files created by the admixture programs. Images of the results were generated using the basic functions of the R software environment.

Maximum likelihood phylogenetic tree analysis

ModelTest-NG was used to select the model46. The maximum likelihood (ML) phylogenetic tree was created using the raxmlHPC-PTHREADS-SSE3 program (version 8.2.12)47. The phylip file created by the populations program was used as the input file. The parameters for raxml were -f a (rapid bootstrap analysis and search for the best-scoring ML tree in one program run),-x 12,345 (an integer number [random seed] and turn on rapid bootstrapping), -p 12,345 (a random number seed for parsimony inferences), -N 1000 (bootstrap value), and -m GTRGAMMAX (model of binary [morphological], nucleotide, multi-state, or amino acid substitution). Group H was used as the root. Image of the phylogenetic tree was generated using Dendroscope (version 3.6.3)48 and TreeView X (version 0.5.0, https://treeview-x.en.softonic.com/).

Phylogenetic tree analysis under the coalescent model

Phylogenetic tree analysis was performed using SVDquartets49 integrated into PAUP (version 4.0a) software50. The input file for the PAUP was created by converting a phylip file containing all sites to the nexus file format. The parameters used for SVDquartets were ‘quartet evaluation’, ‘evaluate all possible quartets’, ‘tree inference’, ‘select trees’, ‘using QFM quartet assembly’, ‘tree model’, ‘multi-species coalescent’, ‘write quartets file’, ‘QMC format’, ‘handling of ambiguities’ and ‘distribute’. The number of bootstrap analyses was 1000 replicates. Group A was used as the root. Image of the phylogenetic tree was generated using FigTree (version 1.4.4, http://tree.bio.ed.ac.uk/).

Statistical analysis

Groups separated by PCA and cluster analysis were assigned as separate populations in the population map data required for the Stacks package. The denovo_map.pl was re-performed after modifying the data for the population map. The populations program was run with the options -R 0.5, -write-single-snp, -min-maf 0.05, -fstats (SNP and haplotype-based F statistics), and -Fst_correction (a correction to be applied to Fst values with p-value [default p-value to keep an Fst measurement: 0.05]). The program treemix (version 1.12)51 inferred gene flow using the f3- and f4-statistics with the option -k 500 (standard errors in blocks of 500 SNPs).