Vicia sepium L. (bush vetch, Fabaceae) is a small perennial herb that is widely distributed on the Eurasian continent in the temperate Euro-Siberian broad-leaved forest region1. It has a wide distribution, both horizontally and vertically; for example, in the Swiss Alps, it grows from the lowlands to the sub alpine zone2. However, because its seed dispersal is autochory, the seed dispersal system is not suitable for long-distance dispersal over watersheds. It has been introduced by humans as a pasture plant for livestock and become naturalized in North America1. In Japan, it is reported from only a few regions: Mt. Ibuki, central Honshu; central and southern Hokkaido; and Mt. Ishizuchi, Shikoku. As the distribution areas in Japan are limited and distant from each other, it has been considered that the Japanese V. sepium was introduced from Europe and became naturalized there3,4.

Mt. Ibuki is a mountain in central Japan with a harsh endemic environment5. Although its altitude is not very high (1377 m), the winter snow depth is extreme, reaching 11.82 m in 1927, which is the highest record in the world. Its base soil consists of limestone and is characterized by good drainage and aridity, which limits the development of forests. This unique environment forms an endemic flora composed of plants adapted to the heavy snow, alpine plants and endemic species [e.g., Veronica subsessilis (Miq.) Carrière, Cirsium confertissimum Nakai, and Euphrasia insignis Wettst. subsp. iinumae (Takeda) T.Yamaz.]. These plants on Mt. Ibuki have been used by local people as medicine, fertilizer, or pasture plants since the seventeenth century6.

It is considered that the V. sepium on Mt. Ibuki originated from seeds accidentally mixed with seeds of medicinal plants brought from Europe in the sixteenth century by a European Jesuit missionary, when Nobunaga Oda, a regional governor at the time, directed the creation of a large medicinal plant garden of up to 50 ha7,8,9. However, no traces of this garden exist, and it is described only in an old document called “Nanbanji-kohaiki” (History of the rise and fall of Christian temples in Japan)10. Therefore, the existence of V. sepium and two other non-medicinal plants of putative European origin [Lathyrus pratensis L. and Elymus caninus (L.) L.] is taken as evidence for this garden. Another old document, called “Somoku-zusetsu” (Figures and descriptions of plants)11, reported that a number of V. sepium individuals were growing on Mt. Ibuki. Although this document names this plant “Karasunoendo”, one of the Japanese names for Vicia sativa L., the plant can be identified as V. sepium by the rounded shape of the leaf tip in the drawing. In addition, V. sativa, drawn with its characteristic concave leaf tip, is documented on another page of the book as “Yahazunoendo”. This evidence suggests that V. sepium grew on Mt. Ibuki at least 166 years ago.

In Hokkaido, V. sepium is recorded from one central and two southern localities. A number of specimens from around Naganuma-cho, central Hokkaido, are stored in the herbarium of the Hokkaido University Museum (SAPS), which is the most authoritative herbarium in Hokkaido. In this area, the species grows mainly in artificial meadows on riverbank slopes12 that are close to sites of human disturbance. SAPS holds two old specimens of V. sepium collected from central Hokkaido in 1878 and 1894 (Class’80s.n., July 1878, SAPS066754 and H. Nagaya s.n., early June 1894, SAPS066753), which suggest that V. sepium was already growing here at least 145 years ago. On the basis of this evidence, it has been assumed that V. sepium was introduced into Hokkaido in the early Meiji era (late nineteenth century), during the early colonization of Hokkaido, by contamination of pasture seeds4,12.

In southern Hokkaido, it is recorded from Samani-cho and Matsumae-cho. The former is reported only in Takahashi13, and no specimens are stored in SAPS. Therefore, its distribution in Samani-cho is uncertain. The latter is a recently found new site, and two specimens are held (T. Shimazaki 21-023, 5 May 2021, SAPS061611 and T. Shimazaki 21-047, 24 May 2021, SAPS061868). Their habitats are bedrock on a ridge and meadow along a trail, respectively. In aerial photographs in Google Maps (, accessed on 13 April 2023), the new collection site seems to be a natural environment covered by forests and far from human disturbance. This new finding raises the question of whether V. sepium growing here is introduced or native.

Vicia sepium is also recorded from Mt. Ishizuchi, Shikoku3,4. These descriptions are probably based on the flora list of Mt. Ishizuchi reported by Jinno and Yamamoto14. Although “V. sepium” is listed, the Japanese name is given as “Karasunoendo”, one of the Japanese names for V. sativa. Since Jinno and Yamamoto14 did not illustrate the plants and no voucher specimens are recorded, unlike in the case of Mt. Ibuki, the plant may be V. sativa that was misidentified as V. sepium. In addition, V. sepium is not included in any of the later literature on the flora of Mt. Ishizuchi e.g.15,16. Thus, we considered that the distribution record of “V. sepium” on Mt. Ishizuchi indicates V. sativa.

Thus, only three regions in Japan are reliably known to have V. sepium growing: Mt. Ibuki and central and southern Hokkaido (Fig. 1). If it was introduced from Europe during the early modern period, there should be its signatures in its genome. When we estimate the trajectory of past population size changes of Japanese V. sepium populations over time, if only part of the diversity would have been introduced, the Japanese V. sepium population should show a sudden and severe population bottleneck at that time17. Furthermore, by comparing the phylogenetic relationships between Japanese and overseas samples, we may be able to get to the origin. Therefore, here, we investigated chloroplast and nuclear DNA data of Japanese and overseas (European, Siberian, and East Asian) V. sepium samples, and clarify the origin of Japanese V. sepium lineages, through phylogenetic, population structure, and population demographic analyses.

Figure 1
figure 1

Distribution and locations of sampling sites of Vicia sepium in Japan. (a) Known distribution of V. sepium in Japan. Sample sites; uncertain sites. (bd) Population locations on (b) Mt. Ibuki, (c) central Hokkaido, (d) southern Hokkaido. Inset in (b) shows flowers of V. sepium in Mt. Ibuki. Maps were created by R packages ggplot2 version 3.4.2 ( and maps version 3.4.1 ( using digital elevation model data freely provided by the Geographical Information Authority of Japan (

Materials and methods

Sampling and DNA extraction

We collected leaves of V. sepium from three populations in each on Mt. Ibuki and in central Hokkaido in 2016–2019 (Fig. 1). We also collected V. sativa and Vicia hirsuta as an outgroup in Mino, Japan, in 2016. Leaves were immediately dried with a silica gel and stored at room temperature until DNA extraction. DNA samples of German V. sepium plants were provided by the Botanic Garden and Botanical Museum Berlin-Dahlem18. We used leaves from southern Hokkaido and Russian V. sepium specimens stored in SAPS for DNA extraction. Details of the samples used in this study are summarized in Table 1. All plant samples were collected with permission. Genomic DNA was extracted from leaves by the CTAB method19 or with a DNeasy Plant Mini Kit (QIAGEN, Germany).

Table 1 Samples used in this study.

Chloroplast DNA sequencing

Two cpDNA regions, rbcL and matK, were sequenced from 37 individuals (Table 1) with the primers listed in Table S1. Although we attempted to obtain sequences from a Russian sample, no DNA was amplified from either region. The total volume for the PCR was 5.5 μL, containing 1.0 μL of template DNA, 2.5 μL of SapphireAmp Fast PCR Master Mix (Takara Bio, Japan), and 0.2 μM of each primer. PCR was performed on a TaKaRa PCR Thermal Cycler Dice Touch (Takara Bio, Japan) with an initial denaturation for 1 min at 94 °C; 35 cycles of denaturation for 5 s at 95 °C, annealing for 5 s at 58 °C, and extension for 7 s at 72 °C; and a final extension for 5 min at 72 °C. After purification of the PCR products with ExoSAP-IT (Thermo Fisher Scientific, USA), sequencing and electrophoresis were performed at Hokkaido System Science, Japan, with a BigDye Terminator Cycle Sequencing Kit version 3.1 (Thermo Fisher Scientific, US). Sequence data were aligned with sequences obtained from GenBank (Table S2) in MEGA version 11 software20.

High-throughput sequencing and SNP calling

Multiplexed inter-simple-sequence-repeat genotyping-by-sequencing (MIG-seq), which is effective for low-quality DNA such as that extracted from stored samples, was used to identify the genotypes of nuclear genomes21. As this study was initiated by two independent projects that later merged, the MIG-seq libraries were adjusted separately at Lake Biwa Museum and Tohoku University (Table 1). However, as we used the same protocol as Suyama, et al.22 to adjust the libraries, we considered that the two sequence data sets could be combined. The two libraries were sequenced independently on an Illumina MiSeq platform with a MiSeq Reagent Kit v3 (150 cycles; Illumina, CA, USA). Since we had already sequenced these libraries, the new samples from southern Hokkaido (H4) could not be included in the MIG-seq analysis.

Raw sequences from which primer sequences had already been removed with the “DarkCycle” option of the MiSeq control software were grouped into each index by using the index read option of the sequencer. Quality was filtered with the fastq_quality_filter of the FASTX_Toolkit version 0.0.13 ( with the “-q 30 -p 40” option. Extremely short reads were removed in TagDust version 1.33 with the “-fdr 0.01” option23. R1 and R2 reads from the same sample were concatenated into a single fastq file21. To remove sequences derived from the organelle genomes, we mapped each read to the mitochondrial and chloroplast genomes of Vicia fava KC18994724; and V. sepium MG68235225; in BWA version 0.7.17 with the BWA-MEM algorithm26, and unmapped reads were extracted in SAMtools version 1.927.

To obtain maximum and optimum SNPs for each analysis (Table 2), we performed both reference-based and de novo assemblies. For the reference-based assembly, we used the complete genome of V. sativa VSA_r1.028. Cleaned reads were mapped to the reference genome with the BWA and BWA-MEM algorithm, and bam files were generated by SAMtools. We then created a catalogue of each assembled locus with the gstacks tool of the Stacks version 2.62. For de novo assembly, we used the ustacks tool with the “-m 3 -M 2” option to assemble each read within an individual. We then created bam files for each individual using the cstacks with the “-n 2” option, sstacks and tsv2bam tools. Finally, we created a catalogue of each assembled locus with gstacks as in the reference-based assembly. In both assemblies, SNPs for each analysis were extracted from bam files and from the catalogue using the populations tool of Stacks with the options shown in Table 2. In addition, pairwise R2 values for each SNP pair in the datasets for phylogeny, genetic diversity, and population structure analyses were calculated for each SNP pair with PLINK version 1.90 beta 5.229. If R2 > 0.6, we removed one of the SNPs with a lower genotyping rate to reduce the effect of linkage between SNPs, using the “whitelist” option of the populations tool of Stacks30.

Table 2 Data set summary for each analysis.

Data analysis

Two cpDNA regions were concatenated and a haplotype network was constructed in TCS version 1.2131. For the SNP dataset, phylogenetic analysis was performed using a maximum likelihood approach implemented in RAxML-NG version 1.0.332. We used the GTR + Gamma model with an option to correct ascertainment bias (GTGTR4 + G + ASC_LEWIS), and performed 100 bootstrappings. The tree was visualized in FigTree version 1.4.4 (

To compare levels of genetic diversity between Japanese and overseas samples, we calculated the proportion of heterozygous sites within an individual, because the number of overseas samples is small. Differences in the proportion of heterozygous sites between sampled regions, excluding Russia (only one sample), were tested using a generalized linear mixed model with a binomial error distribution implemented in the lme4 package of R version 1.1.3133. Individual differences were treated as a random effect (random intercept).

Population structure was estimated with Admixture version 1.3.034. The number of ancestral populations (K) was assumed from 1 to 8. The program was run with default settings and performed 10 times for each K with a different random seed. To determine the best K, we used log-likelihood, ΔK35, and cross-validation (CV) error. Unlike log-likelihood and ΔK, CV-error has the best K with the lowest value. We calculated ΔK in the corrsieve package of R version 1.6.936. Independent results of the same K were merged in CLUMPAK37 to draw bar plots of ancestry.

A one-dimensional minor allele site frequency spectrum (1D-mSFS) was calculated from the vcf file with the R script “1D-msfs-R” ( Missing data were compensated for by bootstrapping. Using the 1D-mSFS, we estimated the trajectory of past effective population size changes with a maximum likelihood method implemented in Stairway plot version 2.1.138. We used a mutation rate of 7 × 10−9 per site per generation, which was estimated in Arabidopsis thaliana39. To convert the time scale from generations to years, we assumed two years per generation, as our study species is a perennial herb.

To infer when the Mt. Ibuki and central Hokkaido lineages diverged, we constructed and compared two population divergence models (Fig. 2). Model 1 (ancient divergence) assumes that the two lineages had already diverged before the end of the last glacial period (104 years ago). Model 2 (recent divergence) assumes that they diverged during the colonization of Hokkaido (within the last two centuries) and that the central Hokkaido lineage was introduced from Mt. Ibuki by humans. Any change in population size was assumed from the results of the population size change analysis (see Results). The search ranges of the parameters are summarized in Table S3. We did not consider migration between lineages on account of their separation distance. The same mutation rate and generation time were used as in the population size change analysis. For the observed data, we calculated a two-dimensional minor allele site frequency spectrum (2D-mSFS) from the vcf file with the R script “2D-msfs-R” ( Missing data were compensated for by bootstrapping within a lineage. The likelihood of each model was maximized from 50 random starting values, 40 expectation-conditional-maximization (ECM) optimization cycles, and 100,000 coalescent simulations in fastsimcoal2 version 270940. We calculated Akaike's information criterion (AIC) and selected the model with the lowest AIC value as the best model. The goodness of fit of the best model was checked by visually comparing the observed and predicted 2D-mSFS. Its confidence interval was calculated by parametric bootstrapping. We simulated the model in fastsimcoal2 with the maximum likelihood estimate parameter values 100 times and obtained its 2D-mSFS. Using the simulated 2D-mSFS as input data, we recalculated the parameters of the best model with the observed parameter values as a starting value, 15 ECM cycles, and 100,000 coalescent simulations. Finally, a 95% confidence interval (CI) was calculated from the obtained parameter values.

Figure 2
figure 2

Two divergence models. N, effective population size; T, event time.

We confirm that all the methods were carried out in accordance with relevant Institutional guidelines and regulations.


Phylogenetic relationships

In total, we obtained 27,206,661 raw reads after quality filtering and removal of organelle genomic regions from 85 individuals. The average number of raw reads per individual was 320,078 (range 193,678–559,622). These raw reads were used for SNP calling, and we used 370–1725 SNPs in each analysis (Table 2).

After alignment of the cpDNA sequences, we obtained 1064 bp concatenated sequences of rbcL and matK. There were three distinct haplotypes excluding V. sativa (Fig. 3a). All 32 Japanese sequences were assigned to haplotype J. All three UK sequences and one from Soest, Germany, were assigned to haplotype UG. Two Chinese sequences and the other four from Germany were assigned to haplotype CG.

Figure 3
figure 3

Phylogenetic relationships of Vicia sepium samples. Haplotype network of cpDNA sequences (a). J, Japanese samples. UG, UK sample and Soest, Germany, samples. CG, Chinese and remaining German samples. Missing haplotype. Maximum likelihood phylogenetic tree of nuclear genome (b). Numbers indicate bootstrap probability (major nodes only). Abbreviations are as in Table 1.

On the maximum likelihood phylogenetic tree, V. sepium showed clear monophyly with a bootstrap probability of 100% (Fig. 3b). The Japanese samples also showed clear monophyly with a bootstrap probability of 97%. The Mt. Ibuki and central Hokkaido samples showed monophyly, with bootstrap probabilities of 89% and 72%, respectively.

Genetic diversity and structure

The average proportions of heterozygous sites were 0.071 in Mt. Ibuki samples, 0.081 in central Hokkaido samples, 0.158 in German samples, and 0.143 in Russian samples (Fig. 4). The average values of the proportion of heterozygous sites in the Japanese samples were almost half of that in the overseas samples and significantly lower than that in German samples.

Figure 4
figure 4

Proportion of heterozygous sites within an individual. Different letters indicate significant differences between regions. The horizontal line indicates the median. Lower and upper limits of the box indicate 25th and 75th percentiles. Lower and upper limits of the vertical bar indicate minimum and maximum values within 1.5 × the length of the box. Dots indicate outliers.

As the number of ancestral population (K) increased, the log-likelihood also increased (Fig. 5a). However, ΔK was highest and CV error was lowest at K = 2 (Fig. 5b,c). As the CV errors of K = 2–4 were also smaller than that of K = 1, bar plots of ancestry in K = 2–4 are shown (Fig. 5d). At K = 2, populations I1 and I2 had a large proportion of cluster 2–1, which was dominant in the central Hokkaido populations (H1–3). However, with increasing K, the proportion became smaller and the cluster dominant in the central Hokkaido populations remained dominant only in central Hokkaido.

Figure 5
figure 5

Population structure estimated by Admixture. Changes in (a) log-likelihood, (b) ΔK, and (c) cross-validation error with K. (d) Proportion of ancestries from K = 2 to 4.

Population demography

Both lineages showed high effective population sizes during the last glacial period (until 104 years ago; Fig. 6). The central Hokkaido lineage began a gradual decline from 5000 years ago. The Mt. Ibuki lineage maintained a high effective population size until relatively recently, but showed a rapid decrease around 300–400 years ago. Finally, both lineages had a smaller current effective population size than that of the last glacial period.

Figure 6
figure 6

Trajectories of effective population size with time on Mt. Ibuki and in central Hokkaido.

Model 1 (ancient divergence) had a lower AIC value (ΔAIC = 54.9) than model 2 (recent divergence), and was selected as the best model (Table S4). The divergence time between the lineages was estimated to be 32,792 (95% CI 25,309–38,631) years ago (Table S5).


Although V. sepium is widely distributed on the Eurasian continent, its distribution in Japan is very limited. As explained in the introduction, it is possible that each Japanese lineage was introduced in different eras by different pathways. Here we assess this possibility on the basis of the genetic data.

The cpDNA network and nuclear DNA phylogenetic tree showed that the Japanese V. sepium lineage was monophyletic and clearly different from overseas samples (German, English, Russian, and Chinese). This indicates that the Mt. Ibuki and Hokkaido lineages have the same origin. Although the cpDNA haplotype of the two lineages was the same, each lineage was monophyletic on the nuclear DNA phylogenetic tree and had a unique ancestry at K = 3 and 4 in the genetic structure analysis. Thus, the two Japanese lineages are closely related but clearly distinct. However, from the results of genetic structure analysis, especially at K = 2, the genetic cluster dominant in the Hokkaido populations (H1–3) was also admixed in the Mt. Ibuki populations on the Shiga Prefecture side (I1) and the summit of the mountain (I2). Considering these facts, we think it unlikely that each Japanese lineage was introduced in different eras via different routes from the same source. Rather, two rational hypotheses can be considered: (1) Japanese V. sepium is not an introduced species but is a native species; and (2) it was introduced from Europe to Mt. Ibuki in the sixteenth century and form there into Hokkaido at the beginning of its colonization. To test these two hypotheses, we made two models. In Model 1 (ancient divergence), the two lineages diverged naturally before the end of the last glacial period, and the observed genetic admixture is due simply to incomplete lineage sorting. In Model 2 (recent divergence), the Hokkaido lineage was introduced recently by humans from Mt. Ibuki during the colonization of Hokkaido (ca. 150 years ago). Model selection supported the ancient divergence model, and the observed genetic admixture was due simply to incomplete lineage sorting. The best model also showed that the two lineages diverged during the last glacial period [maximum likelihood estimate of divergence time was 32,792 (95% CI 25,309–38,631) years ago]. Furthermore, if the Hokkaido lineage was introduced by humans, it should show a sudden population bottleneck in its population-size-change history at that time, because only a very small fraction of genetic diversity would have been introduced17. However, the trajectory of its population size change showed a continuous decline. This also rejects the hypothesis that the Hokkaido lineage was introduced by humans from Mt. Ibuki.

On the other hand, the Mt. Ibuki lineage showed a sudden population bottleneck 300–400 years ago. Although this supports the hypothesis that the Mt. Ibuki lineage was introduced by Christian missionaries in the Middle Ages, it contradicts the previous paragraph. Instead, since the whole of Mt. Ibuki has been used by local people from the Middle Ages to about 70 years ago to collect plants for medicine, fertilizer, and feed6, it is simpler to assume that this sudden population bottleneck observed is due to such human disturbance. This may become clearer when the population demographies of the other grasses or herbs growing on Mt. Ibuki are investigated in future studies.

All of our population genetic analyses of the SNP data sets generated by MIG-seq are based on the neutral theory. However, since MIG-seq is a genotyping-by-sequencing method closely related to microsatellites, and some microsatellites modulate transcription factors41, some SNPs may have been affected by natural selection. In addition, although the recent new evolutionary theory, maximum genetic diversity theory, suggests that the mutation rate changes in different genomic regions depending on the degree of complexity of organisms42, our demographic modeling assumes a constant mutation rate throughout the genome, and it may affect the estimation of event time, i.e., divergence time or time for population size change. Therefore, in the future, a more realistic population genetic model will be able to estimate more accurate results.

Although our overseas sample collection is incomplete and sequenced regions of cpDNA are short and limited, even if we will increase overseas samples and sequence more regions, the main result will not change, and therefore, we concluded that the Japanese lineage of V. sepium is not introduced but is native. The existence of a recently discovered natural habitat in southern Hokkaido supports this conclusion. Therefore, its existence on Mt. Ibuki does not support the existence of a rumored medicinal plant garden, as mentioned in the introduction. The Mt. Ibuki and the Hokkaido lineages diverged during the last glacial period, and the effective population size of the Hokkaido lineage decreased after the end of the last glacial period, owing to climate change during the Holocene. Although the Mt. Ibuki lineage maintained its effective population size after the end of the last glacial period owing to the endemic environment of the mountain, its effective population size also decreased owing to recent human disturbance. As a result, these two Japanese lineages now have very low genetic diversity compared with overseas V. sepium samples. Therefore, the distribution of V. sepium, which is widely distributed on the Eurasian continent, is very limited in Japan, perhaps because the Holocene climate of Japan is unsuitable for it. Although there are no plants distributed only in Mt. Ibuki and Hokkaido like V. sepium, as the Mt. Ibuki is known to be the southern/western limit of subalpine to alpine plants, for example, Geranium yesoense43. In addition, Japanese alpine plants distributed from the high mountains of central Japan to Hokkaido have two different origins, southern or northern44. Future population genetic studies of V. sepium with more detailed sampling in East Asian regions, including the Maritime Province of Siberia surrounding the Sea of Japan may help to elucidate its origin.