Introduction

The wild boar (Sus scrofa) is one of the most widely distributed terrestrial mammal species worldwide. These boars originated from islands in Southeast Asia and colonized various areas of Eurasia during several migration waves until the early Holocene1,2. Wild boars are classified into 16 subspecies based on their morphological characteristics and are found in Asia, Europe, and North Africa3; two subspecies exist in Japan, the Japanese wild boar (Sus scrofa leucomystax) and Ryukyu wild boar (Sus scrofa riukiuanus). These two subspecies are genetically distant and inhabit separate areas of Japan4. An ancestor of the Japanese wild boar spread from Southeast to Northeast Asia during the mid-to-late Pleistocene5 and expanded its habitats in Japan. In contrast, the Ryukyu boar migrated from Taiwan and elsewhere, unlike the Japanese boar, which migrated across the continent6.

Even when population densities are temporarily reduced by infectious disease or hunting, number of this species can increase rapidly owing to its high reproductive potential. In fact, over the past four decades, the Japanese wild boar population has progressively expanded by approximately 1.9 times7, resulting in substantial damage to food crops and approximately USD 34 million worth of damage in 20218. Notably, the wild boar population density of Japan is higher than that of other countries9. For the proper management of wild boars, an understanding of their present geographical distribution pattern is crucial.

Genetic characterization has been widely used to determine the population structure and patterns of gene flow in wildlife. DNA markers, such as mitochondrial DNA (mtDNA), microsatellites (simple sequence repeats [SSRs]), and single nucleotide polymorphisms (SNPs) using whole-genome sequences, are commonly used to analyze the population structure10,11. In comparison to other genetic analysis methods, mtDNA analysis is generally considered easier in terms of sample collection and laboratory procedures12. Additionally, mtDNA analysis provides only maternal genetic information, and inferences are based on a single short haploid locus. In contrast, because microsatellite loci from SSRs carry highly polymorphic, neutral, and biparental gene flow, they can provide relatively finer patterns than mtDNA for detecting recent population differences and admixtures13,14. However, population studies using SSR markers or SNPs are expensive and time-consuming.

Numerous extensive phylogeographic investigations of wild boars have been conducted in Japan, primarily employing the mtDNA control region15,16. However, the latest comprehensive study by Watanobe et al., which covered eastern Japan, dates back to 2003. Furthermore, as previously mentioned, the rapid expansion of the wild boar population’s range in Japan over the last four decades7 may have led to alterations or increased complexity in the population structure compared to the period of these earlier studies.

In this study, we conducted large-scale sampling covering Japanese wild boar habitats, comprising a total of 1062 individuals, to elucidate regional patterns of genetic population structure using 30 microsatellite loci (Fig. 1). We report the population structure and migration patterns of wild boar in Japan. Our study provides insights into the historical processes of biological range expansion, connectivity and dispersal barriers in wild boar and will be useful for wild boar management in Japan. Therefore, this study will be helpful as an invaluable reference data for future investigations of the genetic structure of the wild boar population on a large scale during and after the process of its recovery.

Figure 1
figure 1

Geographical distribution of 1062 wild boar samples collected in Japan from 2014 to 2020. Red points represent the locations where the wild boar samples were collected. Grey areas represent wild boar habitats in Japan as surveyed by the Japanese Ministry of Environment17. The details of each site are listed in Supplementary Data 3. The dashed line indicates the boundary between the habitats of Japanese wild boars (Sus scrofa leucomystax) and Ryukyu wild boars (Sus scrofa riukiuanus). Geographical data regarding the administrative boundaries were downloaded from the National Land Numerical Information download service, which is provided by the Ministry of Land, Infrastructure, Transport, and Tourism of Japan18. The map was created using QGIS version 3.28.1019.

Results

Microsatellite genotyping

Except for S0090, 29 loci in > 99% of individuals were genotyped. MICROCHECKER identified no loci with null alleles common to all populations. We identified no paired loci with a linkage disequilibrium (\(\overline{r}\)D) greater than 0.2, suggesting low covariation among the loci (Supplementary Data 1). Consequently, we used 29 locus alleles (excluding S0090) in subsequent analyses. All loci showed high levels of polymorphism, resulting in a total of 471 alleles, ranging from 7 (Sw122) to 35 (S0097), that were detected for these 29 loci; we detected a mean of 16 alleles per locus. The mean observed heterozygosity (HO) and expected heterozygosity (HE) ranged from 0.16 to 0.78 and 0.22 to 0.92, respectively (Supplementary Data 2). All loci deviated significantly from the Hardy–Weinberg equilibrium (HWE; p < 0.05).

Population structure

Discriminant analysis of principal components (DAPC) indicated that K = 15 was the most appropriate number of clusters for describing the dataset with the lowest Bayesian information criterion values (Supplementary Fig. 1a). Forty-two principal components (PCs) were retained via cross validation (Supplementary Fig. 1b). The genetic structure pattern showed that individuals close to each other were essentially in the same cluster; 90% clustering occurred within approximately 200 km (median 146 km) (Fig. 2a,b, Supplementary Fig. 2). The first discriminant function (PC1) separated Clusters 1 and 2 from the rest, whereas the second discriminant function (PC2) separated the clusters according to the longitude of the locations from where the individuals were collected (Fig. 2c). Clusters 6, 8, 9, 10, and 11 overlapped geographically and genetically, whereas the remaining clusters were geographically exclusive of the surrounding clusters. Notably, the geographical distance between individuals belonging to Cluster 3 showed a peak at 100 and 900 km, respectively, whereas the other clusters showed a single peak (Supplementary Fig. 2). In Cluster 3, one spatial cluster was located in the Kanto region, whereas the other was found in the Kyushu region, a large western island separated by the sea (Fig. 2a,b).

Figure 2
figure 2

Population structure of Sus scrofa leucomystax in Japan. (a) Bar plot of individual ancestry proportions for the genetic clusters inferred using discriminant analysis of principal components (DAPC). Estimated Bayesian genetic structures were based on 29 microsatellite loci with K = 15. (b) Pie charts represent the proportion of the assignment to each cluster based on the DAPC results shown in (a). Geographical data regarding the administrative boundaries were downloaded from the National Land Numerical Information download service, which is provided by the Ministry of Land, Infrastructure, Transport, and Tourism of Japan18. This map was created using QGIS version 3.28.1019. (c) Scatterplot of DAPC with principal components, PC1 and PC2.

Spatial autocorrelation

We observed a decrease in spatial autocorrelation as geographical distance increased. Moreover, we observed a significant positive correlation for spatial distances of up to 200 km (p < 0.05). When the distance between individuals was greater than 200 km, they were no longer related (i.e., spatially independent) (Fig. 3).

Figure 3
figure 3

Mantel correlogram depicting the association between genetic and geographic distances among pairs of individuals within each distance class of 20 km. Black solid points indicate significant association, whereas white empty points indicate no significance or a significantly negative correlation value (p < 0.05).

Potential geographical barrier

Because the estimated effective migration surface (EEMS) analysis method visualizes migration patterns and tests spatial genetic structures using genomic data and surrounding locations defined by the density of demes, we used 1000 demes, the maximum number allowed by the program. As all five Markov chain Monte Carlo runs of the program converged (Supplementary Fig. 3), we considered this analysis reliable. Our EEMS analysis showed potential barriers to migration not only in the sea separating Honshu from Shikoku and Kyushu, but also in terrestrial areas (Fig. 4a). Of these potential barriers, we identified six (labelled B1–B6) with greater than 90% posterior probability in the Bayesian estimation of migration parameters (Fig. 4b). We observed that six barriers were aligned with forest discontinuity areas (B1–3, B5–6) or topographical obstacles, such as the sea (B4). We identified three potential geographical barriers (B1–3) in Eastern Honshu. B1 was located in mountainous areas and the large southern plains. B2–3 was located in the plains surrounding urban areas, such as Nagoya and Osaka. In the Kyushu region, there were two areas with low migration rates (B5–6) and low forest connectivity. In contrast, we found no clear or significant geographical barriers in the Shikoku region or western Honshu.

Figure 4
figure 4

Effective migration surface distribution and comparison with forest continuity for identification of geographical barriers. Geographical data regarding the administrative boundaries were downloaded from the National Land Numerical Information download service, which is provided by the Ministry of Land, Infrastructure, Transport, and Tourism of Japan18. (a) Blue regions indicate a higher effective migration rate, whereas orange regions indicate lower effective migration rate than expected. (b) Forest continuity map20 overlaid with the geographical barriers that have > 90% probability shown as B1–6. The maps were created using R 4.0.2.

Discussion

In this study, we gathered samples from 1062 wild boars inhabiting various regions across Japan where wild boar populations are found. We used 29 microsatellite loci to reveal the genetic population structure at high resolution and identify the potential geographic barriers preventing the migration of wild boars. We observed a significant deviation from the HWE for all loci, which is consistent with the findings of other studies21,22, as there may be greater differences between subpopulations in large areas. From our multivariate clustering (DAPC) and EEMS analyses, we identified 15 genetically meaningful clusters; individuals belonging to each cluster were closely distributed, except for Cluster 3. Additionally, the clusters were separated by geographical features such as forest discontinuity, plains, and sea.

Cluster 1 individuals were exclusively located in an area separated from other areas by barrier B1 at its western and southern borders (Figs. 2 and 4). Murase et al.23 also reported that the wild boar populations in this area were a single population. B1 was identified as a continuous gap in mountainous areas, stretching in the northern and eastern directions (Fig. 4). Our findings are also consistent with those of a previous study conducted at the local scale, which suggested that rivers and urbanized areas act as natural migration barriers24. However, as several land uses are included in B1, it is unclear which land use category hinders migration the most.

In the area south of B1, individuals belonging to Cluster 2 were found (Figs. 2 and 4). This cluster was located on a peninsula with a large plain in the north and the surrounding sea, indicating that it was also an isolated population. However, in contrast to Cluster 1, the population in this peninsula became extinct in the 1970s due to indiscriminate hunting and infectious diseases, such as classical swine fever (CSF)23. Subsequently, in the mid-1980s, wild boars were anthropogenically reintroduced into the southern region of the peninsula for hunting purposes, and these wild boars subsequently expanded their habitat into the northern area23. Because EEMS is not suitable for examining the ancestral relationship between subpopulations, further studies focusing on SNPs are required to identify the genetic source of Cluster 2.

Although EEMS analysis identified the existence of B2 and B3 in the north of the Kii Peninsula, individuals in Clusters 6, 8, and 9 were located on both sides of B2 and B3 (Figs. 2 and 4; see also Fig. 1, which includes the peninsula names). Thus, these barriers are not completely responsible for the fragmentation of gene flow in different regions. Additionally, we did not identify any clear geographical barriers west of the Kansai region, suggesting that individuals belonging to Clusters 6–11 were more likely to migrate among each other than to the other clusters. Although Watanobe et al.5 reported the Japanese Alps in the center of Honshu as a potential geographic barrier to migration, we found no evidence of major discontinuities consistent with the Japanese Alps, which lie between B1 and B2. Our results indicate that gene flow is active among several subpopulations because of the high forest continuity in the areas surrounding B2 and B3. Furthermore, several studies have reported that landscapes such as forest cover or small-scale agriculture facilitate the dispersal and genetic homogenization of wild boars25,26.

B4 coincides with the sea between the Honshu, Shikoku, and Kyushu islands (Fig. 4; see also Fig. 1, which includes the names of these islands). Although mtDNA analysis has only identified the sea separating Honshu and Kyushu as a genetic boundary5, our analysis showed that this sea may also act as a barrier to dispersal. On Kyushu Island, B5 and B6 coincided with the plains with interrupted forest connectivity (Fig. 4), suggesting that they contributed to the population formation of Clusters 3, 13, 14, and 15 (Fig. 2).

Mantel analysis demonstrated a significant positive spatial autocorrelation between individuals (p < 0.05) located at distances of < 200 km (Fig. 3). This finding is consistent with the distance reported in other studies26,27. Furthermore, when plotting the distribution of the pairwise distances between individuals in each cluster, the 90th percentile for each cluster was < 200 km, except in Cluster 3 (Supplementary Fig. 2). These results indicate that wild boar subpopulations with genetic similarity were created with a diameter of approximately 200 km. Notably, most individuals in Cluster 3 were distributed in two distant locations (Supplementary Fig. 2): the Kanto (n = 32) and the Kyusyu (n = 33) regions, approximately 900 km away. Several studies have reported that the J12 haplotype, which was identified using mtDNA analysis in these studies, was also present in geographically distant areas, i.e., the Kanto, Kyusyu, and the Shikoku regions5,23, 28. The wild boar population in the western part of the Tochigi Prefecture in the Kanto region became extinct in approximately 1897 because of unregulated hunting and a CSF epidemic28,29. Subsequently, the population increased in southwestern Tochigi, either through range expansion from Gunma, which is located as western prefecture or translocation from distant areas mediated by humans29,30. Considering these circumstances, it is likely that the individuals belonging to Cluster 3 in the Tochigi Prefecture in this analysis were a natural expansion of the range after a previous human-mediated translocation from Miyazaki to the Gunma Prefecture. As discussed above for Cluster 3, further analysis focusing on SNPs is required to genetically prove this hypothesis.

However, caution should be exercised when interpreting the results of this study. First, even though we performed microsatellite analysis of a sufficient number of loci to identify their genetic clusters and elucidate geographical barriers to the distribution of wild boar populations, using SNPs or whole genome analysis could provide finer-scale population structures and evolutionary relationships among subpopulations than SSRs. Furthermore, the representation of genetic variation representation could have been biased based on the use of a reference SSR panel in a large-scale study. In addition, “size homoplasy”, specifically individuals with the same allele size but different ancestries, might have occurred in the microsatellite analysis. In fact, this phenomenon is quite common owing to the high mutation rate and the genotyping performed using fragment lengths that contain repetitive regions and flanking regions31,32. Thus, for each marker, not all allele value differences corresponded to repeat units. Therefore, further investigations are required to confirm our results, such as considering the chronological aspects between strains. Second, although several studies have reported the influence of topographical obstacles, such as rivers and freeways, on wild boar populations24,33, we did not explicitly consider these parameters here. Moreover, the geographical resolution used for analyzing the geographical barriers in this study seemed to be insufficient (1000 demes; each deme had an edge of approximately 15 km) to evaluate the influence of specific topographical obstacles. Further studies with more samples are required to increase the analysis resolution and identify the specific obstacles acting as barriers to wild boar migration.

In summary, our study demonstrates the population structure and migration patterns of wild boar in Japan. Generally, wild boar management in Japan is mainly conducted at the prefectural level, and their migration or population structure does not correspond to a specific administrative unit. Moreover, it is feasible for wild boar and domestic pigs to interbreed. In fact, the presence of hybrids has been reported in Japan4,34, 35. Thus, when farms have inadequate biosecurity, there is a risk that infectious diseases might also be easily transmitted. This underscores the necessity for cooperation beyond the municipal level for proper management. From the viewpoint of infectious disease control, the genetic boundaries identified in this study will be useful for planning or evaluating strategies to control the spread of infectious diseases particularly when inter-individual transmission is considered the main route of infection (i.e. in the early stages of an epidemic)36,31,38. In 2018, a CSF outbreak occurred in Japan and is still spreading, not only in domestic pigs but also in wild boars. Mitigating wild boar infection will be key to controlling CSF spread through the use of an oral vaccine or wild boar depopulation39. Additionally, the threat of the African swine fever (ASF) virus entering Japan is increasing, owing to the recent outbreak of ASF across East Asian countries. Hence, the results of the present analysis could provide valuable insights for considering measures to control the subsequent spread of infection, such as a possible ASF outbreak. Moreover, since this study was performed prior to the CSF outbreak, further validation, based on assessing changes in the population structure pre- and post-CSF outbreak, would also provide valuable information regarding the mechanisms of disease spread.

Methods

Ethical approval

All wild boar sampling activities were conducted from hunted and dead wild boars as official surveillance activities by licensed hunters and the Prefectural Animal Hygiene Service Center located in each prefecture. Therefore, no sampling activity from living animals was included in this study and no ethical approval was required.

Study area and sampling collection

We analyzed 1062 blood or tissue samples (ear and tongue) from hunted and dead wild boars collected by licensed hunters throughout the habitat areas identified in the survey between 2014 and 20207 (Fig. 1, Supplementary Fig. 4 and Supplementary Data 3). We recorded the date, location, estimated age, weight, and hunting method associated with each animal. Juveniles caught simultaneously were excluded to ensure sample independency. Blood samples were sent to the National Institute of Animal Health, NARO, where they were maintained at 4 °C and centrifuged for serum extraction. Until processing, tongue and serum samples were stored at -20 °C, and ear samples were stored in alcohol. The extraction of genomic DNA from the samples was done using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. We then performed multiplex PCR amplification using 30 microsatellite markers (Supplementary Data 4), as recommended by the ISAG/FAO40, using AmpliTaq Gold (Applied Biosystems, Waltham, MA, USA). The PCR mixture included 10 ng genomic DNA template, 1 × final concentration of PCR buffer, 200 µM final concentration dNTP mix, 0.1 µM each of forward and reverse primers, and 1 unit Taq DNA polymerase in a final reaction volume of 20 µl. Thermal cycling parameters for multiplex PCR were set with initial denaturation at 94 °C for 9 min, followed by 40 cycles at 94 °C for 30 s, annealing at 55 °C for 30 s, and extension at 72 °C for 1 min. After the PCR, two to three multiple PCRs, ensuring that the amplicon sizes and dyes did not overlap, were mixed and diluted in 1/10 ratio, and then 1 µl of the multiplexed PCR mixture was dissolved in 9 µl HiDi-formamide (Applied Biosystems, Waltham, MA, USA). Typing was performed using capillary electrophoresis on an ABI 3130XL Genetic Analyser (Applied Biosystems, Waltham, MA, USA) with ROX400 as the internal size standard, and the allele size was determined using GeneMapper v4.0 (Applied Biosystems, Waltham, MA, USA).

DNA extraction and genotyping

As minimizing missing information yields the most reliable results41, we excluded loci that could not be genotyped in > 1% of all samples. Potential errors of alleles were checked using the software MICROCHECKER v2.2.342, applying the Brookfield 1 index43, and retained loci with values < 0.244,45. We explored the linkage disequilibrium using the poppr package in R46, and excluded loci with a standard index of association (\(\overline{r}\)D) > 0.2. Based on the final dataset, the number of alleles, observed heterozygosity (HO), expected heterozygosity (HE), inbreeding coefficient (FIS), and deviation from the HWE were computed using the pegas and heirfstat packages47,48.

Population structure

We analyzed the population structure using DAPC implemented in the adegenet package of R49. Compared with other widely used clustering programs, such as STRUCTURE, which relies on an explicit population genetic model50, the DAPC method is a model-free method, particularly for determining deviation from the HWE in genetic clustering analysis. Additionally, the DAPC method is more robust and suitable for assessment when the genetic structure is more or less clinal and under complex dispersal scenarios, such as those represented by stepping-stone and hierarchical stepping-stone models49. The optimal number of retained PCs were determined using cross validation methods to avoid overfitting of the data and creating a artificially large separation between clusters. We identified the optimal number of genetic clusters (K) that minimizes within-cluster variance using k-means estimation, as determined using the Bayesian information criterion, and estimated the probabilities of membership to each cluster for each sample. To assess the spatial aggregation of the samples, we calculated the geographical distance between pairs of individuals within each defined cluster.

Spatial autocorrelation

Generally, genetic and geographic distances are correlated. To evaluate the relationship between genetic and geographic distances across space, we assessed the extent of spatial clustering in the samples using the vegan package in R. We calculated shared allele proportions (Dps) and geographic distances between pairs of individuals using distHaversine. We estimated correlograms of r-values for each class bin at a fixed distance of 20 km. We generated p-values using the permutational Mantel test (1000 permutations) and adjusted them using the Bonferroni procedure to correct the significance level for multiple comparisons.

Geographical barriers

To detect potential geographic barriers and gene flow patterns, we used the EEMS analysis method, which assesses the bias in genetic structure due to isolation by distance and then produces a visual representation of effective migration rates51. This method does not require the use of environmental variables and can therefore be evaluated independently of the hypothesis-driven resistance surface approach. In turn, a map of the inferred effective migration patterns can be used to visualize the spatial genetic structure of large, complex samples. We set the number of demes to 200, 500, and 1000, and ran five independent analyses with 1 × 106 burn-in Markov chain Monte Carlo steps and 1 × 107 iterations. We combined the results of the five runs and plotted them using the rEEMSplots2 package in R (https://github.com/dipetkov/reemsplots2). We superimposed the areas estimated as barriers with a probability of > 90% on the forest continuity polygon to assess the relationship between the natural habitat of wild boar and geographical barriers20.

Statistics and reproducibility

All statistical analyses are performed using either software packages as described or in R 4.0.2. All differences were considered statistically significant when p < 0.05 (see “Results” for details).