Insights into the origin of the invasive populations of Trioza erytreae in Europe using microsatellite markers and mtDNA barcoding approaches

The African citrus psyllid Trioza erytreae is one of the major threats to citrus industry as the vector of the incurable disease known as huanglongbing (HLB) or citrus greening. The psyllid invaded the northwest of the Iberian Peninsula 6 years ago. The invasion alarmed citrus growers in the Mediterranean basin, the largest citrus producing area in Europe, which is still free of HLB. Before our study, no research had been carried out on the genetic diversity of T. erytreae populations that have invaded the Iberian Peninsula and the archipelagos of the Macaronesia (Madeira and the Canary Islands). In this study, combining microsatellites markers and mtDNA barcoding analysis, we characterize the genetic diversity, structure and maternal relationship of these new invasive populations of T. erytreae and those from Africa. Our results suggest that the outbreaks of T. erytreae in the Iberian Peninsula may have derived from the Canary Islands. The populations of T. erytreae that invaded Macaronesia and the Iberian Peninsula are likely to have originated from southern Africa. We anticipate our results to be a starting point for tracking the spread of this invasive pest outside of Africa and to be important for optimizing contingency and eradication plans in newly invaded and free areas.

The African citrus psyllid Trioza erytreae (Del Guercio) (Hemiptera: Triozidae) is an invasive pest that has become one of most severe threats to the Mediterranean citrus industry [1][2][3] . Along with the Asian citrus psyllid Diaphorina citri Kuwayama (Hemiptera: Psyllidae), T. erytreae is a vector of the phloem-restricted Gram-negative bacteria Candidatus Liberibacter species that occur in citrus. These bacteria are the causative agents of citrus greening disease, an incurable and most devastating disease that affect citrus, also known as huanglongbing (yellow dragon disease) or simply by its acronym HLB 4,5 . Spread of HLB by T. erytreae had a devastating impact on citrus production in the cooler highland regions of Kenya and Tanzania, where it caused losses of 25-100% 6 . In Florida, the largest orange growing area in the United States of America, the spread of HLB by D. citri caused citrus production decline by 74%, resulting in losses of about USD 4554 million 7,8 . Besides their role as HLB vectors, the nymphs of both psyllid species excrete large amount of honeydew that facilitates the growth of sooty moulds on infested trees 2,9 .
Trioza erytreae was first reported in 1897 in the South African regions of the Eastern Cape and Stellenbosch 10 , but it was not until 1918 that Del Guercio first described the African citrus psyllid from samples collected on

Results
Genome-wide characterisation of SSRs. We identified and mapped a total of 428,342 microsatellites across the 47,828 scaffolds of the unpublished genome sequence draft of T. erytreae using the GMATA software 35 . The SSRs frequency was estimated at 765.6 SSRs/Mb, which means 1 SSR for every 1.09 Kb. In silico identified SSRs were distributed among ten types of in tandem repeated motifs (from di-to deca-nucleotides). Analysis of SSR distribution revealed that the di-nucleotide motifs (340,227) were the most abundant SSRs, with a frequency of 79.4%. Both tetra- (20,902) and tri-(61,839) nucleotide repeats comprised about 5-15% ( Fig. 1A; Supplementary Data 1). The remaining motifs, from hepta-to deca-nucleotides, comprised less than 1.5% of total SSRs identified in this study (Fig. 1A). Considering the unknown orientation of DNA strands in the Tery6 draft genome sequence of T. erytreae, a further SSRs characterization was carried out grouping the repeat motifs into pairs of complementary sequences. According to this, GA/TC (36.6%) and CT/AG (31.9%) are the most frequent motif pairs, with a total frequency of 68.5% (Fig. 1B). Grouped motif pairs GC/GC (0.05%) and CG/ CG (~ 0.02%) were the least abundant di-nucleotide motifs. In decrease order, the most abundant tri-nucleotide motif pairs were ATT/AAT, ATA/TAT, ACA/TGT, TAA/TTA, AAC/GTT, TTG/CAA, and AAG/CTT, which encompassed 9.8% of all identified grouped motif pairs. Occurrence frequency of the remaining grouped motifs, including the rest of tri-and those from tetra-to deca-nucleotides (552 all together), was less than 11% of all motif pairs (Fig. 1B). Our data analysis reveals that SSR markers of 10 bp were most frequent, accounting for about 10% all SSR markers identified in this study. The overall trend of SSR length distribution in the T. erytreae genome is that the frequency of occurrence of SSRs gradually decreases as their length increases (Fig. 1C).  (Table 1) were used as potential markers to investigate the genetic diversity, structure and phylogeography of T. erytreae individuals from populations in mainland Europe and the archipelagos of the Macaronesia. Scaffolds Tery6_s00034 (274,710 bp), Tery6_s02825 (48,689 bp) and Tery6_s07841 (26739 bp) were randomly selected based on their sequence length (long, medium, and short scaffolds, respectively). SSRs were selected on the base of their type of repeat motif (di, tri-, tetra-and penta-nucleotides), nucleotide composition and length (number of in tandem repeated motifs) (    were not amplified efficiently and the corresponding primer pairs were discarded for further analysis. The individuals of T. erytreae collected in different geographical locations in the west coast of mainland Spain and Portugal, the Canary Islands and Madeira, as well as in South Africa and Kenya (Table 2), were analysed using the 10 selected SSR markers designed in this study. The scored allelic data for each SSR marker is summarised in the Table 3. The analysis showed that all SSR markers were polymorphic. Seventy alleles were detected over the ten selected SSR loci, and the average number of alleles per locus (N a ) was seven. SSR markers Tery08 and Tery11 had the highest number of alleles (12 and 20 alleles respectively), whereas Tery13 had the lowest (only two alleles). The expected (H e ) and observed (H o ) heterozygosity per locus in the entire population ranged from 0.20 to 0.77 and from 0.03 to 0.84, respectively. SSRs Tery11 and Tery08   Population structure based on T. erytreae SSR data. To assess the differentiation and genetic diversity among the local populations of T. erytreae sampled in newly invaded areas from Spain and Portugal, including Madeira and the Canary Islands, and those from the previous invaded areas in Africa (South Africa and Kenya), we used a Bayesian clustering method to analyse the SSR multi-locus genotyping data. The STRU CTU RE analysis according to the method of ΔK 36 showed that the overall genetic profile of all the individuals sampled could be described with two or three different hypothetically original populations corresponding to the highest ΔK values (Fig. 2). It means that the most likely values of genetic clusters (K) are 2 or 3. Nevertheless, Pritchard's method 37 showed a posterior probability of data at K = 7 (Fig. 2). The estimated likelihood distribution increased from K = 1 to K = 7, and then started to decrease. This implied that seven was the smallest value of K, which was the most likely number of inferred populations in our data set. Interestingly, the value of K at which the likelihood distribution reached its maximum coincided with a further peak value of the ΔK statistic at K = 7, suggesting a more complex hierarchical structure of the T. erytreae populations (Fig. 2). In consequence, we plotted the www.nature.com/scientificreports/ clustering results for K = 2, K = 3 and K = 7 (Fig. 3). Furthermore, we considered an initial structure of two populations (K = 2) as was suggested by the method of ΔK 36 whereby most of the analysed individuals were classified with high probability (Q > 0.90) in two clusters (Fig. 3). Cluster 1 (in green) was exclusively formed by individuals from newly invaded areas in Spain and Portugal, including those from the archipelagos of Madeira and the Canary Islands. On the other hand, Cluster 2 (in beige) was mainly comprised of individuals from Africa, but also included individuals from Camacha (Madeira). The exception to this pattern involved three locations in Madeira (Quebradas, Camacha and Moreno), Pretoria (South Africa), and Homa Bay (Kenya), where almost all individuals consistently had significant membership in both clusters. Looking at K = 3 plot, the Bayesian clustering analysis resolved Cluster 1 into two by reassigning some individuals to Cluster 3 (in purple). Almost of all individuals from Moreno, Poiso, and Farrobo (in Madeira and Porto Santo, respectively) were entirely reassigned to Cluster 3 along with several individuals from the Canary Islands and Galicia (Spain). In addition, individuals from Vairão (Porto) and São Vicente de Pereira Jusã (Aveiro) (both in the northwest coast of Portugal) were also assigned to Cluster 3, while those individuals sampled from southern locations up to Sobreda (Setúbal) were assigned to Cluster 1. The exceptions to this pattern were the individuals from Ribamar (Ericeira), which were assigned to Cluster 3. Most notably, samples from Kenya were genetically different from those of South Africa and grouped in Cluster 1. At K = 7 the population structure scenario was more hierarchical, but 73% of all individuals (108 out from 147) could be assigned to one of the seven clusters with more than 90% probability (Q > 0.9). The assignment of half of the remaining individuals (21 out of 39) could be done with more than 70% probability (Q > 0.7). Among the different groups, Cluster 1 (in green) and 2 (in beige) are restricted to the populations of South Africa and Kenya, respectively, with almost no presence of individuals from any of the newly  Genetic diversity analysis using T. erytreae SSR allelic data. The genetic diversity of T. erytreae populations was also assessed by means of a distance-based clustering method. The scored SSR allelic data obtained from the ten SSR loci developed in this study were used to calculate a genetic dissimilarity matrix and to compute a Neighbor Joining (NJ) tree. A preliminary dendogram constructed using only the African populations of T. erytreae showed that the individuals from South Africa grouped together into a single cluster clearly separated from the Kenyan population. The robustness of the tree clustering was supported by the high bootstrap values obtained for nearly all branches (Fig. 4). To confirm the results obtained from the structure analysis a NJ tree under topological constraints was inferred using as initial tree the population structure of individuals from all the sampled areas with Q > 0.7. The remaining individuals were positioned (constraint) on that previous The nucleotide sequences of the COI barcode fragments generated in this study (n = 39) and some previously deposited in the GenBank (n = 37) were used to analyse the maternal phylogenetic relationship of T. erytreae populations that have invaded Spain and Portugal, with those from South Africa and Kenya (Fig. 6). From all COI sequences used in this study, 38   www.nature.com/scientificreports/ (Cameroon, Ethiopia, Tanzania and Uganda). In accordance with the high level of identity of their COI barcode nucleotide sequences, the NJ tree generated from these sequences showed that the individuals from Spain and Portugal, including those from Madeira and the Canary Islands, formed a monophyletic group with the individuals from South Africa (Pretoria, Nelspruit, and Tzaneen). Our phylogenetic analysis reveals a clear differentiation between this monophyletic group and the individuals from Homa Bay (Kenya), as well as from those individuals previously reported in other locations in Kenya 39 . It was observed that the individuals from Spain and Portugal formed a paraphyletic group with those from Pretoria (Fig. 6), as the remaining South African individuals from Nelspruit and Tzaneen formed a separated clade. Furthermore, our analysis demonstrated the presence of two different T. erytreae lineages in Tzaneen, as most of their individuals formed a paraphyletic group with those from Nelspruit, while the remaining formed a clade with four individuals from West Acres (South Africa) 39 . The few exceptions to this observation were three South African individuals, one from Pretoria (Pretoria-100), and two from West Acres (TeSA1 and TeSA7), which may correspond to migrants from Nelspruit or Tzaneen, and Pretoria, respectively. In a sister clade position to the South African clade, the GenBank accessions of Kenyan and Tanzania COI sequences 39 included in this phylogenetic analysis formed a monophyletic group. The COI barcode sequences from Homa Bay (Kenya) clustered separately as an outgroup in a different clade with the corresponding fragment extracted from the mitochondrion genome sequences from Ethiopia and Uganda (MT416551 and MT416549, respectively) 27 , and Cameroon (MG989238) 40 present in GenBank.

Discussion
It is well known that SSR markers are useful to study population genetics and phylogeography in insect species and have been proven to be an efficient molecular tool to assist in the optimisation of integrated pest management programs 43,44 . However, although the contribution of SSRs in studies of population diversity and structure is undeniable, their comprehensive characterisation and development as molecular markers are only possible after genome sequences are available. In Hemiptera, microsatellites have been characterized for several invasive and devastating pests, including the Asian citrus psyllid D. citri 45 45 . Since the genome of T. erytreae is not yet fully annotated, we could not estimate the total number of SSRs in exons, introns or intergenic regions, but it was possible to get information for most of the SSR loci selected for further analysis. When comparing the number of various classes of SSRs in T. erytreae, we found that our data were in accordance with the averages estimated for other insect genomes 38 . The average of di-and trinucleotide SSRs is significantly higher than those observed for longer repeat types. Among the grouped motif pairs in T. erytreae, we found that CT/AG and GA/TC were predominant, while GC/ GC and CG/CG were the least frequent di-nucleotide motifs. These data agree with published results in other insect species, including members of the order Hemiptera 45 . In our study we developed ten informative microsatellites markers to determine the genetic variability and to estimate the gene flow and the structure of T. erytreae populations that have invaded Spain and Portugal. The Wright's fixation index (F w ) for all loci was higher than zero (0.41) considering the whole individuals sampled, indicating an overall heterozygote deficiency. This also suggested a slight degree of inbreeding among individuals within local populations. Our results show that the T. erytreae individuals analysed in this study are structured in seven genetic clusters (K = 7).
Although T. erytreae was recorded in the Iberian Peninsula in 2014, just 7 years ago 17 , our results suggest that genetic structure was already present among the newly invaded areas in the west coast of the Iberian Peninsula, from Galicia (Spain) to Setúbal (Portugal), between 2015 and 2019. Since T. erytreae was spotted in Galicia 12 years after of being reported in the Canary Islands, and that individuals from both locations share a very similar genetic structure, the Canary Islands emerge as the most likely source of the T. erytreae that invaded the north-western coast of the Spainish mainland. The little genetic homogeneity found in local populations suggests the presence of different colonizing lineages. In this regard, there is strong and growing evidence suggesting that multiple introductions, complex global movement, and population admixture contribute to increase the genetic diversity in invasive insect species, and that mixing of different lineages may contribute to rapid evolution and to invasiveness 46 .
Since the outbreak of T. erytreae was first reported in Madeira in 1994 13 and 8 years later (2002) in the Canary Islands 14 , it is possible that colonization of the latter archipelago has derived, at least in part, from Madeira. According to the genetic clusters identified in both archipelagos, the colonisation of the Canary Islands could have originated mainly from Quebradas and Farrobo (in the islands of Madeira and Porto Santo, respectively), and in a minor extension from Moreno and/or Poiso (both in Madeira). Our study shows that T. erytreae individuals from Madeira, the Canary Islands and Galicia are genetically more diverse compared to those from the sample locations in the coast of Portugal, South Africa and Kenya. Conversely to the high genetic diversity observed in Galicia, the local populations of T. erytreae sampled along the coast of Portugal seem relatively homogeneous with only two main genetic clusters. It is probable that the expansion of T. erytreae observed in recent years along the coast of Portugal 18 has had its origin in the invaded locations of Madeira, but we cannot rule out that it may have also been derived from Galicia. Remarkably, our distance-based clustering analysis of SSR allelic data suggest that the Camacha linage may have been derived from South Africa, somewhere in or around the triangle made up for the localities of Pretoria, Nelspruit, or Tzaneen, as the emerging position of the Cluster 7 in the base of the Cluster 2 supports this hypothesis.  www.nature.com/scientificreports/ The genetic diversity of T. erytreae populations from South Africa and Kenya seems different from each other. Moreover, their genetic diversity is also different from those populations of T. erytreae sampled in Spain and Portugal. Therefore, we cannot conclude from our study that the T. erytreae populations that invaded the Macaronesia and the Iberian Peninsula derived directly from one of these two African countries. Recently, a study on the intraspecific diversity of T. erytreae assessed by COI barcoding suggested that the populations of this species that have been found in Europe is most likely originated from South Africa 27 . The study that was strongly biased towards COI barcode accessions from Kenya with 74.2% of all sequences, compared to 14.6% from South Africa and only 7.9% from Spain and Portugal, and could not exclud the possibility of a Kenyan origin. However, according to our STRU CTU RE analysis, the presence in Pretoria (South Africa) of discrete membership fractions of the genetic clusters detected in the new invaded areas in Spain and Portugal suggests that the T. erytreae individuals that have invaded Europe may have their origin in South Africa. Our phylogenetic analysis using mtDNA barcoding also supports this hypothesis, as the individuals from Spain and Portugal (including those from the Canary Islands and Madeira) formed a monophyletic group with those individuals from Pretoria, Tzaneen and Nelspruit, suggesting that the T. erytreae individuals that invaded the Iberian Peninsula have their maternal origin somewhere in the northeast of South Africa. Furthermore, the fact that T. erytreae populations in the Canary Islands have been drastically reduced by its natural enemy Tamarixia dryi (Del Guercio) (Hymenoptera: Eulophidae), a highly specific parasitoid of T. erytreae imported from Pretoria 21,22 and released in the archipelago in 2018 23 , also supports the idea that the T. erytreae individuals that invaded the Macaronesia and the Iberian Peninsula have derived from the northeast of South Africa near Pretoria. Nevertheless, considering that Pretoria (Gauteng) is not a citrus growing area, it could be possible that the individuals of T. erytreae sampled in Pretoria could have derived from the neighbouring province of Limpopo, the largest citrus producing area in South Africa 47 .
A similar spatial pattern of invasion was observed in the case of the black citrus aphid Aphis citricidus (Kirkaldy) (Hemiptera: Aphididae), the main vector of citrus tristeza virus (CTV). This aphid species first invaded Madeira in 1994 48 , and 8 years later the north-western region of the Iberian Peninsula 49 . This similar invasive behaviour suggests a common pathway for both citrus pests. T. erytreae and A. citricidus are oligophagous species, with Citrus species as principal host plants but also feeding on other Rutaceae, and that the importation of citrus plants from non-European countries is forbidden in the EU. Taking all of this into consideration, we hypothesise that the introduction pathway of T. erytreae in the Macaronesia and Iberian Peninsula could be related to the commercial trade or transportation by travellers of ornamental Rutaceae. In the case of T. erytreae, another possibility was recently suggested which showed that the adults of the psyllid can survive several days on citrus fruits in low temperatures 20 .

Materials and methods
Trioza erytreae samples collection and total DNA extraction. A total of 147 T. erytreae individuals from 23 locations across Spain, Portugal, Kenya, and South Africa were collected between 2017 and 2019 ( Table 2). Geographical coordinates and host plants species were recorded. All samples were collected from citrus genus except one sample that was obtained from white sapote Casimiroa edulis La Llave & Lex. in Madeira, Portugal. Upon sampling, individuals were stored in 70% ethanol at 4˚C for preserving DNA integrity. Total DNA (mix of genomic and mitochondrial DNA) was isolated from individual insects following a modified Salting Out method 50 . DNA barcoding and phylogenetic analysis of COI. Total DNA (10 ng) of selected T. erytreae individuals was used as template for PCR amplification of a 714 bp fragment of the 5'-coding region of the mitochondrial Cytochrome C Oxidase I gene (COI; Gene ID: 37,507,472), from positions + 6 to + 719 with respect to the start codon. The COI barcode fragment was amplified using the specific primers Te-6U30 and Te-720L26, previously published by us 22 . PCR fragments were bi-directional sequenced using the same both primers. COI barcode fragments were sequenced by capillary electrophoresis using a 3130XL Genetic Analyzer (Applied Biosystems, Carlsbad-California, USA), at the DNA Sequencing Unit of the IBMCP (Valencia, Spain). Sequences were analysed and trimmed to remove primer sequences, using the Sequencer DNA Sequence Analysis Software v4.9 (Gene Codes Corporation, Michigan, USA). Forward and reverse high-quality reads obtained for each T. erytreae individual were assembled into consensus sequences and submitted to the GenBank public sequence repository. Multiple alignment of nucleotide sequences of COI barcode fragments was performed in MEGAX 42 using the built-in ClustalW alignment interface. For each COI barcode sequence, a fragment of 657 bp (from positions + 36 to + 692 with respect to the ATG) was used for multiple sequence alignment. The phylogenetic analysis was also carried out on MEGA X using the Maximum Likelihood Method based on the Tamura-Nei model 41 . The reliability of the constructed tree was evaluated using a bootstrap test of 10,000 replicates. Initial tree for the heuristic search were obtained automatically by applying the Maximum Parsimony method. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories [+ G, parameter = 0.0500]).
Genome mining for SSRs discovery. A genome sequence draft of T. erytreae, obtained in collaboration with the Genome Assembly and Annotation Team (CNAG-CRG, Barcelona-Spain), with an estimated size of 763 Mb (Rojas-Panadero et al., unpublished data), was mined for SSRs by means of the GMATA 35 software (Genome-wide Microsatellite Analysing Toward Application, http:// sourc eforge. net/p/ GMATA). The parameters of motif unit length were set to a minimum of two (excluding mononucleotide motifs) and maximum of ten nucleotides repeats, while the minimum of in tandem repeat copies of detected motifs (from di-to octanucleotides) was set to three. www.nature.com/scientificreports/ 4.4 PCR amplification and Genotyping of SSR loci. Excluding mononucleotides motives, SSRs of three or more in tandem repeated motifs were targeted to design primers for PCR amplification. Primer pairs with sufficient flaking sequence to amplify genomic SSR loci between 250 and 600 bp were designed in Primer Analysis Software Oligo 7.6 (MedProbe, Colorado, USA), using very high constrain parameters with Tm ranged between 58 and 60 °C. Amplicons sequence were blasted via BlastN against the T. erytreae draft genome sequence using the CLC Genomics Workbench v.9.5.3 (CLCbio, Aarhus, Denmark), to determine whether PCR primers would result in the amplification of single DNA fragments for each SSR loci. Finally, primer pairs were tested for amplification in PCR reactions using the Phire Hot Start II DNA Polymerase (Thermofisher Scientific; Vilnuis, Lithuania). PCR reactions were carried out in a total reaction volume of 20 µl containing 1X Phire Reaction Buffer (provided with 1.5 mM MgCl 2 ), 200 μM of all four dNTPs, 500 μM of forward and reverse primers, 0.4U of DNA Polymerase, and 20 ng of total DNA. Cycled conditions were set as: initial denaturation at 95 °C for 30 s, followed by 35 cycles of denaturation at 94 °C for 5 s, annealing at 60 °C for 5 s and extension at 72 °C for 10 s; and 10 min of extension at 72 °C. The PCR products were stained with gelRed and then resolved through a 2% agarose gel and visualised under UV light. Genotyping of selected SSR loci was performed on a capillary genetic fragment analyser. PCR reactions were set up as mentioned above with the only exception that forward primers labelled with wellRED fluorescent dye were used instead of unlabelled oligonucleotides. Denaturation and capillary electrophoresis were carried out on a CEQ™ 8000 Genetic Analysis System (Beckman Coulter Inc; Fullerton, USA), using linear polyacrylamide gel according to the manufacturer's instructions. The PCR amplified fragments were sized based on 400 or 600 bp DNA size standards (Beckman Coulter Inc; Fullerton, USA). Genetic analysis system software GenomeLab™ GeXP v10.0 was used for data collection and analysis, and subsequently to estimate the genetic diversity.

Genetic diversity analysis of T. erytreae populations. Population diversity organization was analysed
with the bioinformatic software DARwin v6 51 (Dissimilarity Analysis and Representation for Windows, http:// darwin. cirad. fr/ darwin). For each SSR locus, amplicons were scored as allelic data to calculate the genetic dissimilarity matrix using the simple matching dissimilarity index between pairs of accessions (units). Neighbour-Joining (NJ) tree was computed using the obtained dissimilarity matrix. The robustness of the consensus tree branches was tested using 10,000 bootstrap iterations.
Population genetic parameters and genetic structure analysis based on the SSR markers data. Two analyses were conducted to examine the populations of T. erytreae that invaded the Macaronesia (Madeira and Canary Islands) and the Iberian Peninsula (Table 1). Polymorphism analysis of the selected SSR loci was carried on the sampled T. erytreae local populations with factorial discriminant analysis performed in GENETIX 52 (https:// kimura. univ-montp2. fr/ genet ix/), a multivariate analysis approach that uses no priori genetic assumptions for relationships between allelic differences and genetic distances. By means of GENETIX, descriptive statistics of SSR allelic data, including the mean number of alleles per locus (N a ), the expected (H e ) and observed heterozygosity (H o ), Wright's fixation index over the whole population (F w ) and the inbreeding coefficient (F is ) were estimated (Table 3). A Bayesian clustering analysis performed with the STRU CTU RE v2.3.3 package software 37 (http:// cbsua pps. tc. corne ll. edu/ struc ture) was used to infer the genetic structure of the T. erytreae local populations. The software assumes a model in which there are K unknown populations, each of which is characterized by a set of allele frequencies at each analysed locus. The STRU CTU RE analysis was done using Admixture model but allowing allele frequencies to vary independently across populations. The additional parameters used were different values of F ST for different subpopulations, prior F ST mean 0.01, standard deviation 0.05, and constant lambda valued was set to 1. The length of the initial burn-in period was set to 500,000 iterations of burning followed by 1,000,000 Monte Carlo Markov Chain (MCMC) repetitions in the data collection phase, with ten independent runs for each value of K, and the number of inferred clusters varying from one to ten. Individuals in the populations were probabilistically assigned to hypothetical original populations, or jointly to two or more populations if their genotypes indicate that they are admixed. The most likely number of K was jointly determined by the method of ΔK developed by Evanno et al 36 and the Pritchard's method 37 based on the estimation of the K value at which the likelihood distribution began to decrease. The run at which K value yield the highest likelihood of the data given the parameters values was used for plotting the distribution of individuals membership coefficients (Q). Individuals analysed in this study were assigned with high probability (Q membership coefficient > 0.90), while the assignment of admixed individuals to the most likely population was set at Q > 0.70.

Data availability
Thirty-nine COI barcode DNA fragments generated from this study were deposited in the GenBank under the accession numbers MW286215-MW286253.