A reduced SNP panel to trace gene flow across southern European wolf populations and detect hybridization with other Canis taxa

Intra- and inter-specific gene flow are natural evolutionary processes. However, human-induced hybridization is a global conservation concern across taxa, and the development of discriminant genetic markers to differentiate among gene flow processes is essential. Wolves (Canis lupus) are affected by hybridization, particularly in southern Europe, where ongoing recolonization of historic ranges is augmenting gene flow among divergent populations. Our aim was to provide diagnostic canid markers focused on the long-divergent Iberian, Italian and Dinaric wolf populations, based on existing genomic resources. We used 158 canid samples to select a panel of highly informative single nucleotide polymorphisms (SNPs) to (i) distinguish wolves in the three regions from domestic dogs (C. l. familiaris) and golden jackals (C. aureus), and (ii) identify their first two hybrid generations. The resulting 192 SNPs correctly identified the five canid groups, all simulated first-generation (F1) hybrids (0.482 ≤ Qi ≤ 0.512 between their respective parental groups) and all first backcross (BC1) individuals (0.723 ≤ Qi ≤ 0.827 to parental groups). An assay design and test with invasive and non-invasive canid samples performed successfully for 178 SNPs. By separating natural population admixture from inter-specific hybridization, our reduced panel can help advance evolutionary research, monitoring, and timely conservation management.


Scientific Reports
| (2022) 12:4195 | https://doi.org/10.1038/s41598-022-08132-0 www.nature.com/scientificreports/ between wolves and jackals have been reported 19 and included samples from Croatia (n = 3) where hybrids with dogs have been documented 23 . Canine genotypes mapped to the CanFam2 version of the dog genome were converted to the CanFam3 version with the UCSC liftOver tool (http:// genome. ucsc. edu/ cgi-bin/ hgLift Over) and imported into the SNP&Variant Suite 8.0.1 (hereafter SVS, Golden Helix Inc., Bozeman, MT). Nine of the SNPs included had previously been associated with three phenotypic traits (Table S1): black-coloured coat, white-coloured nails, and the presence of dewclaws, vestigial first toes on the hind legs 49,52 . In wolves, these traits have been linked to wolf x dog hybridization 49,53,54 , although it is important to note that they can also be found in individuals that do not exhibit any other sign of dog introgression 49 . These SNPs are henceforth described as phenotypic loci.
We performed an initial quality control of the dataset, by first applying a filter based on individual genotyping success and retaining profiles with ≥ 90% success rate. Next, we screened the data with a per-SNP genotyping threshold of > 90%. We then pruned the dataset for loci in strong linkage disequilibrium (LD) in SVS using the default settings: a LD-threshold of 0.5 and sliding windows of 50 SNPs. None of the nine phenotypic loci passed the initial filtering process, owing to missing values in the SNP data set. However, given their potential information value, we nevertheless selected three SNPs, one for each trait, for inclusion in the candidate panel (Table S1; selected loci in bold font). We next performed a preliminary assessment of all the canine profiles in Admixture 1.23 55 with K = 3 population clusters (roughly corresponding to jackals, dogs and wolves) to determine overall population structure. To exclude possible hybrids, we ran pairwise comparisons, each time with K = 2, for each of the three wolf populations and dogs, then wolves and jackals, and finally dogs and jackals. We retained only wolves and jackals assigned with q i ≥ 0.95 to their respective population clusters, and dogs assigned with q i ≥ 0.90, considering their higher within-group variability.
Marker selection. AIMs to identify admixture between dogs, jackals and the three regionally distinct wolf populations were selected by performing pairwise F ST comparisons among the five canid groups and choosing loci with F ST ≥ 0.90, evenly distributed across the 38 autosomal chromosomes. We then selected the 45 bestperforming loci from each pairwise comparison for an initial panel. From this list, we selected a reduced panel by prioritizing loci that contributed to differentiation for (a) wolves vs. jackals, (b) F ST values across pairwise comparisons for the three regional wolf populations, (c) Iberian vs. Italian wolves, (d) Italian vs. Dinaric wolves, (e) dogs vs. Italian wolves, (f) dogs vs. Iberian wolves, (g) dogs vs. Dinaric wolves, and (h) wolves vs. dogs across all three regions.
Population structure analyses. We used the resulting SNPs in explorative multivariate analyses by performing a Discriminant Analysis of Principal Components (DAPC) in R 3.6.6 56 with the Adegenet package 2.1.3 57 to detect patterns of genetic differentiation among groups and individuals 58 . The "find.clusters" function was initially used to determine the best-supported number of genetic clusters using the Bayesian Information Criterion (BIC) running successive numbers (1-10) of K-means clusters of the individuals and the "table.value" function was used to graphically visualize the corresponding best clustering of the individuals. The optimal number of principal components (PCs) was identified using the "optim.a.score" function and the Discriminant Analysis (DA) was then run on the retained principal components using the "dapc" function. Finally, after selecting the optimal number of eigenvalues for the DA analysis, the DAPC results were graphically visualized with the "scatter.dapc" function and the assignment probability of individuals to each cluster were calculated and plotted using the "assignplot" function 59 .
We used the same dataset to perform Bayesian clustering analyses in Structure 2.3.4 60 . Five independent runs were performed for increasing values of K (the number of genetic clusters) from 1 to 10 using 500,000 Markov chain Monte Carlo (MCMC) iterations, after a burnin of 50,000 iterations, assuming no prior information (option Usepopinfo not activated), and choosing the "Admixture" and "Independent Allele Frequency" models 59 . We used the highest rate of increase in the posterior probability LnP(K) between consecutive values of K to estimate the most likely K-value, and then assessed the average (Q i ) and individual (q i ) proportions of membership in each cluster 61 . The software Clumpp 1.1.1 62 was used to concatenate the data from the five independent runs for each K value, and Distruct 1.1 63 to graphically display the results.
Subsequently, we used HybridLab 1.0 64 to simulate jackal x wolf, wolf x dog, and wolf x wolf hybrids. The five canid groups were used as reference populations, and we simulated pairwise crosses to generate 10 F1-hybrids and 10 BC1-individuals with the wild parental populations for each comparison (Table S2). The resulting profiles were analyzed in Structure as described above to evaluate the efficiency of the selected SNP panel in detecting hybrids. Finally, we ran Adegenet (PC Analysis performed with the "dudi.pca" function) and Structure as described above with the 192 SNPs and a larger dataset of 191 individuals. This dataset included the five reference groups plus additional canids categorized as admixed or not admixed in earlier studies 23,26,28,46,49,65 to test the discriminating power of the selected loci when applied to empirical data (comprising n = 10 Italian and n = 9 Dinaric wolves, n = 4 Italian wolf x dog, n = 4 Dinaric wolf x dog, n = 4 Iberian wolf x dog, and n = 2 jackal x dog hybrid individuals; admixed canids had been excluded from the marker selection step). To check the consistency of the inferred genetic structure, all clustering analyses were run again in Admixture to reassign each sample to its group of origin, assuming K-values from 1 to 10. The most likely number of clusters was identified based on the lowest cross-validation error 55 .
Genetic variability analyses. The proportions of polymorphic (PL) and monomorphic (ML) loci observed and effective allele numbers (A O and A E ), observed and unbiased expected heterozygosity (H O and uH E ), numbers of private alleles (N P ), Probabilities of Identity among unrelated (PID) and among full sib (PID sibs ) individuals 66 15 did not pass the filtering process in our study and were removed. The total dataset for further analyses therefore comprised 158 canids and 192 SNPs highly informative across groups (Table S5).
Population structure. The K-means clustering analysis identified the best-supported number of genetic clusters at K = 5, which showed the lowest BIC value (Fig. 1a). At K = 5 all individuals clustered in their five original sample groups, which clearly corresponded to the five inferred genetic clusters (Fig. 1b). Discriminant analysis was performed using the five clusters defined by the K-means procedure, retaining the first 100 PCs during the data transformation step, which explained more than 99.5% of the total variance in the data, and four principal eigenvalues. The DAPC clearly identified the five original canid groups with the first component, explaining most of the genetic variability (c. 66%) and distinguished dogs from all of the wild canids, which were visibly separated along the second axis, describing c. 19% of the genetic diversity (Fig. 1c). Furthermore, the discriminant functions based on DAPC correctly assigned all individuals to their a-priori genetic clusters defined by the K-means analyses with individual membership probabilities always > 0.999 (Fig. 1d). Multivariate analyses were strongly supported by the Bayesian clustering procedures implemented in Structure that showed increasing rates in the estimated posterior probability LnP(K) of the clusters until K = 5 (Fig. 2a). For all K-values we observed very low standard deviations among different runs of the same K, with an average variation of only 0.00017 (± 0.00014 SD) in individual coefficient values (q i ) among runs (Fig. 2a). At K = 2 (Fig. 2b), corresponding to the first main increase in LnP(K), dogs (mean estimated membership of population to the assigned cluster Q 1 = 0.999) were clearly separated from the other four taxa (mean Q 2 = 0.999). At K = 3, dogs (Q 1 = 0.998) clustered separately from Italian wolves (Q 2 = 1.000) and from the other wild canids (Q 3 = 0.990). At K = 4, dogs were included in cluster 1 (Q 1 = 0.997), Italian wolves in cluster 2 (Q 2 = 0.999), jackals in cluster 3 (Q 3 = 0.999), and Dinaric wolves together with Iberian wolves in cluster 4 (Q 4 = 0.994). At K = 5 ( Fig. 2b and Table 1), corresponding to the optimal number of genetic clusters and consistent with the phylogenetic and geographic subdivision of the samples, all five canid groups were correctly allocated in their own categories with dogs assigned to cluster 1 (Q 1 = 0.996), Italian wolves to cluster 2 (Q 2 = 0.999), jackals to cluster 3 (Q 3 = 0.999), www.nature.com/scientificreports/ Dinaric wolves to cluster 4 (Q 4 = 0.995) and Iberian wolves to cluster 5 (Q 5 = 0.998). For K > 5, the LnP(K) reached a plateau and no further interpretable substructures were observed in the data (Fig. 2a). Notably, the substructure identified at the optimal genetic subdivision (K = 5) with the 192 SNPs was highly coherent with results obtained through an additional Structure analysis performed with the entire panel of 98,004 unlinked SNPs (Fig. S1a). Bayesian clustering procedures run with the five canid groups plus their simulated first two generations of hybrids clearly confirmed that K = 5 corresponded to the optimal number of genetic clusters in the dataset with all five parental groups correctly assigned to their own cluster with 0.989 ≤ Q i ≤ 0.998 ( Fig. 3 and Table 2). As expected, all simulated F1 hybrids had intermediate Q i assignment values (0.482 ≤ Q i ≤ 0.512) between their respective parental groups ( Fig. 3 and Table 2) and all BC1 showed Q i assignment values to the parental wild groups they were backcrossed with ranging from 0.723 to 0.827 ( Fig. 3 and Table 2). Finally, both multivariate discriminant analyses and Bayesian clustering procedures that were run with the additional 33 canid genotypes clearly demonstrated the high discriminating power of the identified loci when applied to empirical data. These correctly and unambiguously identified all wolf (WDIN q i ≥ 0.992 and WIT q i ≥ 0.999), all known wolf x dog (0.441 ≤ q i ≤ 0.847) and jackal x dog (0.486 ≤ q i ≤ 0.744) hybrid individuals (Fig. 4a,b and Table S6). Bayesian clustering and assignment analyses run in Structure were highly concordant with the Maximum-likelihood clustering and assignment procedures performed in Admixture using both 192-SNP and 98,004-SNP parental   60 , assuming K = 5 clusters and using the "Admixture" and "Independent allele frequencies" models as parameter settings. Average Q i were obtained concatenating the data from the five independent runs using Clumpp 1.1.1 62 . www.nature.com/scientificreports/ population genotypes (Fig. S1b,c,d,e and Table S7, S8). Previous genome-wide analyses have indicated substructure between Dinaric and Balkan wolves 2 and earlier studies from Bulgaria have reported wolf-dog and possibly also wolf-jackal hybridization 19 . We therefore performed additional tests with 10 wolf profiles from Bulgaria. These analyses indicated that F1-hybrids and BC1-individuals from the Balkan region can also be distinguished based on the identified loci (results not shown).
Genetic variability. The proportions of polymorphic loci ranged from about 8% in jackals to more than 80% in dogs, with intermediate values (30%-45%) in the three wolf populations ( Table 3). The highest rates of A O , A E , A R , PIC, H O and uH E were observed in dogs and the lowest values were seen in jackals (Table 3). Among the wolf populations, Italian wolves showed the lowest values for the variability indexes, followed by Iberian and Dinaric wolves (Table 3). AMOVA showed that more than 81% of the total genetic diversity was significantly (P < 0.001) partitioned among the five groups. All pairwise F ST values (Table 4) were highly statistically significant (P < 0.0001) and aligned with the very low corresponding effective F ST -based migration rates ( Table 4). The PID and PID SIBS values for the 192 loci, and the probability of finding multiple individuals with the same genotype, were low (Table 3).
Candidate loci under selection and population divergence times. With a false discovery rate threshold of 0.05, BayeScan 73 identified n = 3 putative outlier SNPs, one for each pairwise comparison between wolf populations, mapped on chromosomes 1, 8, and 25, respectively (See Supplemental Note S1, Fig. S2 and Fig. S3 for details). For wolf population divergence times, ABC simulations showed the highest support for scenario number 7 (sequential population splitting with subsequent bottlenecks), which performed better than the other models (Supplemental Note S2).
Fluidigm assay design, development and application. We successfully designed primer pairs and assays for all the new 108 SNPs that satisfied the recommended parameters (minimum separation distance of 100 base pairs), avoiding regions with large repeats or otherwise difficult for the interpretation of results. After amplifying the 93 DNA samples 2-3 times at the 192 SNPs, we discarded 14 loci among the newly-designed SNPs (corresponding to 13% of panel 2-3, and 7% of the 192 SNPs) that failed amplifications in all reactions (Tables S3, S4, and S5), resulting in 178 SNPs (Fig. 5). Genotyping success for the remaining 178 SNPs was high across samples and markers, and only one non-invasively sampled dog with 96% missing data was removed from the analyses. All of the remaining wolf, hybrid, and dog samples were successfully genotyped at ≥ 95% of the tested loci, showing an average amplification success rate of 98% and an average allelic dropout rate of 4.4% across loci (Table S5). As expected, invasive samples showed the highest AS (99%) and the lowest error (0.7%) rates (Table S5), and their genotypes fully matched the original profiles from the Illumina CanineHD BeadChip. Interestingly, non-invasive samples showed only slightly lower AS (97%) and only slightly higher error (3.7%) rates across loci (Table S5). Cross-species amplification tests also produced valid genotypes for the other canid species with jackals showing AS = 97%, ADO = 1% and FA = 0, and red foxes showing AS = 90%, ADO = 0.5% and FA = 0 (Table S5). Notably, Panel 1 confirmed its high performance (AS = 99%, ADO = 0.6% and FA = 0 for tissue samples and AS = 98%, ADO = 6.1% and FA = 0 for faecal samples) for genotyping of various DNA sources. The newly designed and tested Panels 2 and 3 also performed well for genotyping both invasive (AS = 98%, ADO = 0.8% and FA = 0) and non-invasive samples (AS = 96%, ADO = 1.6% and FA = 0) on the Fluidigm platform (Table S5). A PCoA on invasive and non-invasive samples genotyped with the 178 SNPs confirmed the previous assignments. The different canid groups were clearly separated, with non-invasive genotypes clustering with profiles from their respective reference groups (Fig. S4). Admixed individuals were placed between parental groups, with recent admixture resulting in intermediate positions and older backcrosses positioned closer to wild parental groups (Fig. S4). Importantly, re-analyses with 178 of 192 SNPs did not affect the overall results. Outcomes from multivariate, Bayesian and Maximum-likelihood assignment and clustering analyses in  Table S9 and S12), simulated (Fig S5b; Tables S10 and S13) and other empirical data (Fig S5c,d; Table S11), were always significantly correlated (R ≥ 0.997; P < 0.0001 for all pairwise combinations) and not significantly different (t ≥ 0.005, P ≥ 0.977; t-tests for all pairwise combinations) from the results obtained with 192 SNPs.

Discussion
We employed a reduced panel of 192 SNPs to reliably discriminate between anthropogenic hybridization and natural gene flow among populations in five canid groups, centered on wolves in southern Europe. Of these SNPs, 178 were successfully amplified on the Fluidigm platform, and provided the same results as the 192 SNPs for analysis of population structure. Results from empirical data and simulated profiles suggest that the selected marker set is well-suited for this purpose, and our analyses unambiguously identified all F1-hybrids and all BC1-individuals. Our results have shown that the panel performs well, also for non-invasively collected samples, and the reduced set of SNPs presented by Harmoinen et al. 15 and in this study can therefore provide informative results for monitoring, research, and conservation management.

Marker selection and the power of discriminant loci.
Our results based on empirical data and simulated profiles indicated that the identified SNPs clearly differentiate F1-hybrids and accurately detect BC1individuals within and across canid species, showing no type I (non-admixed individuals erroneously identified www.nature.com/scientificreports/ as admixed animals) nor type II (admixed individuals falsely identified as non-admixed animals) errors 26 . The identified loci incorporate the panel of 93 SNPs developed for microfluidic genotyping by Harmoinen et al. 15 , who found almost identical genotypes in a test of 30 invasive and non-invasive samples from the same wolves. As a complement to the European wolf-dog hybrid panel presented by Harmoinen et al. 15 , we therefore propose the selected SNPs for a more detailed evaluation of population admixture versus hybridization in southern European populations, including possible events of jackal-hybridization 19,23 .
The presence of fixed alleles in jackals, dogs, and between wolf populations contributed to the discriminant power of certain loci, which can be particularly valuable as diagnostic markers for monitoring purposes. Where incomplete profiles are obtained from non-invasive samples, the results from a few such markers can be enough to allow species identification 74 , which is highly relevant for tracking the expanding distribution of jackals. Their rapid range expansion across Europe 22 and within our study area, recently illustrated by the confirmation of jackal presence south of the Po River in northern Italy (http:// gojage. blogs pot. com/ 2021/ 01/ all-roads-lead-to-romeeurop ean-golden. html), highlights the urgent need to account for this species in future canid monitoring and management plans. Our results indicated that the selected SNPs perform well for identification of F1-hybrids and BC1-individuals for all canid group combinations. Detecting the presence and distribution of these earlygeneration hybrids is typically a higher priority than that of individuals with lower proportions of dog ancestry, to prioritize scarce conservation resources 26 . The findings from our empirical and simulated data therefore indicate that our panel can reliably identify the animals that are most essential for monitoring and conservation management (e.g., the LIFE MIRCO-Lupo project, http:// www. lifem ircol upo. it/).   15 was developed and successfully tested with non-invasive samples, and the additional loci presented in this study also performed well with non-invasive sources of DNA, we underline that the proposed panel may need future modifications given the highly dynamic situation for canids in our study area. In view of the recent gene flow between Italian Alpine and Dinaric wolves 12 , the detection of dog-jackal hybridization 23 and the increasing overlap seen in wolf and jackal distributions uH E = unbiased expected heterozygosity; F IS = estimates of departure from Hardy-Weinberg equilibrium; P = probability of obtaining, by chance, F IS values higher than those observed after 10,000 random permutations (*P < 0.05, **P < 0.01, ***P < 0.001); PID = probability of identity; PID sibs = probability of identity among full sibs.  www.nature.com/scientificreports/ throughout Europe 21 , it will also be necessary to monitor for possible changes in the spatial and temporal distribution of allele frequencies in the selected SNPs 26 . However, ongoing international collaborations in population monitoring, as illustrated by the LIFE WolfAlps EU project (http:// www. lifew olfal ps. eu/), will provide such information for wolves and other wide-ranging carnivore species. Accordingly, the opportunities for data sharing offered by new genomic approaches will contribute toward keeping these tools as useful and updated as possible.
Population structure. All five a priori sampling groups emerged as well-separated genetic clusters based on multivariate and Bayesian clustering analyses of the identified SNPs. As expected, dogs represented the most divergent group, and Italian wolves were also clearly differentiated from the other wolves, as observed in earlier analyses of SNP profiles 1-3 and supported by earlier findings from analyses of mitochondrial DNA and microsatellite markers 11,75,76 . Moreover, pairwise values of genetic distances between populations align with previous findings 1-3 for southern European wolf populations. Notably, jackals emerged as a distinct group only at K = 4 population clusters, which is likely a reflection of the ascertainment bias in the SNP panel originally developed for dogs 1 .
Genetic variability. The lower diversity values for Italian wolves, followed by Iberian wolves and Dinaric wolves, accord with earlier SNP analyses of European wolf populations 1-3 . It is nonetheless important to note that the reported levels of diversity differ among analyses, and that the a priori selection of loci for the purposes of our study likely influenced the results. The SNP array that formed the basis of our study is centered on genetic variation in dogs 1,77 , magnifying their genetic diversity while underestimating that of wolves and the more distantly related jackals 1 . Furthermore, our selection of markers focused on SNPs with high F ST -values. For these reasons, direct comparisons of genetic variability among taxa should be treated with caution. Future developments based on whole-genome data may provide additional new markers less affected by ascertainment bias. However, the selection of loci from the existing-and commonly used-SNP panel facilitates the use of already-published canid profiles from earlier projects as reference populations, which could increase the utility of the panel.
Notwithstanding, we report various diversity parameters for comparative purposes, which might also inform future efforts for different canid groups or other taxa. Relevant examples are the values for PID and PID sibs , where values calculated across 21 jackals (PID = 1.343 × 10 -04 , PID sibs = 1.039 × 10 -02 ) were substantially higher than those calculated for the 30 Italian wolves (PID = 2.428 × 10 -10 , PID sibs = 1.247 × 10 -05 ) that represented the 2nd-highest values. Nonetheless, our selected SNPs provide essential information, given that the initial 93-SNP panel by Harmoinen et al. 15 did not distinguish among wild canids (i.e., among wolves, jackals, and red foxes). Although a higher number of SNPs are required to obtain PID values similar to standard microsatellite panels 32,43 , careful testing have shown that such panels can still work well across broad geographic regions and divergent populations 15 . Candidate loci under selection and population divergence times. We detected a limited number of loci under possible selection (Supplemental Note S1). Although earlier SNP analyses have suggested possible environmental selection in European wolves 3,78 , future research with more samples and genomic data are needed to evaluate these findings, including the potential presence of false positives among our results. Our population divergence modeling (Supplemental Note S2) indicated sequential population splits and bottlenecks broadly consistent with whole-genome results 5 , although our results showed very wide confidence intervals and could be partly affected by ascertainment bias, mutation rates, marker numbers, and secondary gene flow 3,5,79 . Fluidigm assay design, development and application. Our results indicate that the selected SNPs are well-suited for conservation management, including projects based on non-invasively collected samples. When tested on standard platforms such as the microfluidic arrays, most of the candidate markers performed efficiently and reliably, showing high amplification success and low error rates on various DNA sources from different canid groups. Well-differentiated allele frequencies allowed us to distinguish among the different canid groups and their first generation hybrids. Importantly, we were also able to distinguish the aforementioned groups from the red fox, whose DNA may be present in non-invasively collected samples such as livestock damage cases.

Conclusion and recommendations.
We found that the identified SNPs can reliably distinguish natural population admixture from inter-specific hybridization in five canid groups comprising dogs, jackals, and the long-divergent Iberian, Italian and Dinaric wolf populations, up to the first generation of backcrosses, using various sources of DNA, including non-invasively collected materials. New genomic tools provide opportunities to integrate data from different laboratories without the need for calibration, which is essential for wideranging species such as wolves. The sequencing of microsatellite markers 35 , which is now in progress for wolves (Skrbinšek et al., unpublished data), may also offer opportunities to combine hypervariable microsatellites with SNP loci possibly associated with adaptive genetic variation. The selected SNPs can provide timely identification of dispersers and ensure that such important individuals are not erroneously classified as hybrids, with the subsequent risk of management decisions that could be harmful for conservation. The proposed panel will also facilitate the monitoring of jackals, which are rapidly expanding their ranges across Europe, and contribute to research, monitoring, and conservation management of wild European canids.