Introduction

The sycamore lace bug, Corythucha ciliata (Say) (Hemiptera: Tingidae), is an invasive insect species that originated from the eastern and central regions of North America1. It spread to Italy in 19602 and then became widely-distributed throughout Europe3. Shortly after 1990, it successively invaded South America4, Asia5 and Australia6. C. ciliata causes damage to sycamore trees by feeding on the foliage, thus reducing photosynthesis and promoting disease. Once established, it can form fairly stable, high density populations7,8,9,10. C. ciliata has become a primary pest of trees in the genus Platanus 11,12,13, and effective controls are lacking during widespread outbreaks14. For example, C. ciliata caused serious damage throughout Europe in the 1980 s, and this resulted in the formation of a special working group to study its biology, ecology, nature enemies and chemical control.

In China, C. ciliata was first discovered in Changsha in 200215 and then in Wuhan16, 17. In a relatively short time, it dispersed to at least 23 cities in China and to the Yangtze River basin17,18,19,20. Why is C. ciliata so successful in invading new places? It is helpful to note that this insect has a strong reproductive capacity, and adult females can produce 250–350 eggs20. Secondly, C. ciliata can withstand both high and low temperature extremes. In laboratory experiments, the Ltemp50 and Ltemp90 of C. ciliata adults and nymphs were 41.4 °C and 43.8 °C, respectively, after one hour of high temperature stress21. C. ciliata adults can also survive after exposure to very low temperatures; e.g. −23.3 °C. Additionally, C. ciliata is easily dispersed; although its wings are relatively weak, it can migrate thousands of meters via wind currents22 and has been extensively disseminated by human activity1. For example, it may have spread from North America to Italy as a result of the shipping industry10. In Turkey, C. ciliata has dispersed to large cities located along highways23, which indicates that human transportation is a major factor in dissemination. Finally, the host trees, Platanus spp., are widely planted in urban areas. C. ciliata has presumably co-evolved with Platanus spp.24, which contains seven tree species originating from North America, Europe, and Asia.

Mitochondrial DNA sequences and microsatellites are frequently used to investigate the genetic structure of populations25,26,27, and the combined use of these two genetic markers may result in more accurate information than that obtained from a single marker28. In the present study, three mtDNA genes (cytochrome c oxidase subunit I, COI), NADH dehydrogenase subunits 1 and 5 (ND1 and ND5) and nine microsatellite loci were used to study the genetic structure of 21 Chinese and two European populations of C. ciliata. The primary goals of this study were to: (1) investigate the genetic structure and diversity of C. ciliata populations in China and Europe; (2) identify the potential routes of dispersal in China; (3) define the demographic profile associated with the expansion of C. ciliata; and (4) analyze the genetic and environmental factors contributing to the invasiveness of C. ciliata.

Results

Genetic diversity

Using the mitochondrial markers, five haplotypes (H1–H5) were identified for COI, ND1 and ND5 (Fig. 1) (GenBank Accession numbers: KP420509-KP420523). Parameters of genetic diversity were calculated for the 23 populations based on concatenated sequences (Table 1), and COI, ND1, and ND5 sequences independently (Supplementary Table S1). The number of haplotypes per population (n) ranged from 1–6, 1–3, 1–2 and 1–3 in concatenated sequences, COI, ND1, and ND5, respectively. Haplotype diversity (H) ranged from 0.000–0.889, 0.000–0.711, 0.000–0.533, and 0.000–0.644 in concatenated sequences, COI, ND1 and ND5, respectively. Based on concatenated sequences, the H values of the ZJ and SV populations were the highest (H = 0.889) and the GY, WH, YZ and YC populations were the lowest (H = 0.000) (Table 1). In addition, nucleotide diversity (π) ranged from 0.00000–0.00376, 0.00000–0.00733, 0.00000–0.00249 and 0.00000–0.00356 in concatenated sequences, COI, ND1 and ND5, respectively. The π values were highest for the ZJ population (π = 0.00376) and lowest for the GY, WH, YZ and YC populations (π = 0.00000) based on the concatenated sequences (Table 1). We also analyzed the genetic diversity of the G1, G2, G3, and G4 groups (identified by SAMOVA analysis, see below) based on the concatenated sequences. The H value of G4 was higher than the other three groups (G1, G2, G3), and the π values for G3 and G4 were much higher than G1 and G2 (Supplementary Table S1).

Figure 1
figure 1

Collection sites in China and haplotype distribution in each population. (A,B and C) represent COI, ND1 and ND5, respectively; (D) national map and legend. The map is made by ArcGIS 10.2 software, http://www.arcgis.com/features/.

Table 1 Parameters of genetic diversity of 23 Corythucha ciliata populations based on mtDNA and microsatellite loci.

A total of 158 alleles were detected at the nine microsatellite loci, and genetic variability is shown in Supplementary Table S2. All markers were highly informative (PIC > 0.50), which make them useful for analysis of genetic diversity. The number of alleles per locus ranged from 10 (locus CA15) to 23 (locus GA365), and 17.5 was the mean number of alleles per locus. The number of private alleles (Unique Allele Number, UAN) occurred at very low frequencies (<0.025) for all loci. The mean observed and expected heterozygosity values across loci were 0.58 and 0.74, respectively, and the null allele frequency varied from −0.0507–0.1497 (Supplementary Table S2). What’s more, the specific null allele frequencies of nine loci in each population were showed in Supplementary Table S3. In addition, the genetic diversity of C. ciliata from the two European populations (Italy and Slovenia) was relatively higher than most of the other populations (P = 0.2422 for H; P = 0.0019 for A).

A total of 552 individuals representing the 23 populations were genotyped with the nine microsatellite markers (Table 1). The average number of allelic richness (AR) for each population ranged from 4.29 (GY, JA, YC and XZ) to 7.06 (SV and IT). The average number of alleles (A) for each population ranged from 4.56 (WH, ZZ, YC, and JA) to 7.67 (SV and IT). The average observed heterozygosity (HO) ranged from 0.47–0.77, mean expected heterozygosity (HE) was 0.50–0.75, and the average inbreeding index (Fis) ranged from −0.17 to 0.28. We also analyzed two groups (GI and GII, identified by Structure analysis, please see below), and the genetic diversity of GI was lower than GII (P < 0.0001) (Supplementary Table S4). Fisher’s exact test for Hardy-Weinberg equilibrium (HWE) across loci and populations indicated that all 23 populations, and the nine microsatellite loci showed significant deviation from HWE (Supplementary Tables S2 and S3). Besides, no pair of loci showed significant linkage disequilibrium.

Population genetic structure

With respect to concatenated mtDNA sequences, SAMOVA analysis indicated that the FCT value was highest (0.77243) when K was four; consequently, the 23 populations were divided into four groups designated G1, G2, G3 and G4. G1 included the following populations: HF, YCC, GY, WH, HY, ZZ, CS, TC, YC, JA, CD, HZ, XA, YW, SV and IT. G2 included NJ and YZ, G3 contained LY, SQ, XZ and TA, and G4 consisted of a single population, ZJ. Although G3 was an exception, the occurrence of G1, G2, and G4 was not consistent with geographic distribution.

AMOVA analysis revealed that over 73% of the variation occurred between groups; variation was relatively small within groups and within populations for the concatenated sequences, COI, ND1, and ND5 (Table 2). Furthermore, AMOVA revealed that significant genetic structure occurred in C. ciliata at various hierarchical levels (among groups, among populations within groups, and within populations) (P < 0.05) (Table 2).

Table 2 Partitioning of genetic variation at different hierarchical levels.

For microsatellite data, we implemented Bayesian cluster analysis with the Structure program. This analysis revealed that the most likely K value was 2 (chosen with Evanno’s ΔK method) (Fig. 2A), which suggested that two main clusters existed among the 23 populations (Fig. 2B). Cluster 1 (GI, shown in red) contained five populations (GY, WH, ZZ, YC and JA), whereas the remaining populations were assigned to cluster 2 (GII, green).

Figure 2
figure 2

Cluster analysis by STRUCTURE for 23 populations of C. ciliata. (A) Inference of the number of genetic clusters (K) for C. ciliata populations. (B) Proportion of genome of each individual assigned to each of the two clusters. Each individual is represented by a vertical bar.

According to the two clusters assigned by Structure, AMOVA suggested that significant genetic structure was present for C. ciliata at various hierarchical levels (between groups, among populations within groups, and within populations) (Table 2). A high level of differentiation (80.04%, F ST = 0.19958**) was found within populations based on microsatellites, which was different from mtDNA analysis where a high level of differentiation was found between groups.

The genetic differentiation among 23 C. ciliata populations was estimated by pairwise FST values between populations based on concatenated mtDNA sequences and microsatellite data (Supplementary Table S5). FST values varied from −0.09 to 1.00 for concatenated mtDNA sequences and from −0.02 to 0.32 for microsatellite data. The differentiation between populations was significant (P < 0.05 and P < 0.01) (Supplementary Table S5). Wright (1938)29 assumed that 0.05 < FST < 0.15 represented a moderate level of genetic differentiation; whereas 0.15 < FST < 0.25 and FST > 0.25 indicated a high level of genetic differentiation. Thus, C. ciliata groups possess obvious genetic differentiation (Table 3). We also noticed that the values for individual populations are not high for microsatellites but is high for some populations from mtDNA data. Usually, mitochondrial introgression from a close species, male-biased dispersal, demographic expansion or selection on mtDNA were considered as the main reasons for the mito-nuclear (microsatellite) discordance. In the present study, selection may be the cause of the observed discrepancy, and may have differential effects on genetic markers. Loci experiencing balancing selection may have allele frequencies more similar than expected under neutrality, reducing the diversity estimates, whereas selection pressure on mtDNA may accelerate the coalescence of lineages, and thus increase the levels of differentiation.

Table 3 The F ST value between the groups based on concatenated sequences.

We constructed a NJ tree from concatenated mtDNA sequences using Nei’s genetic distance and PHYLIP v. 3.224 (Fig. 3). In the phylogenetic tree, the clusters among populations were consistent with Nei’s genetic distance. According to the NJ tree, the following populations were assigned to sister groups: (i) WH, YC, JA, ZZ and GY; (ii) YW and HY; (iii) YCC and TC; (iv) CD and HZ; (v) SQ and XZ; (vi) NJ and YZ; and (vii) SV and IT. Some population clusters were consistent with the group assignment (G2 and G3) as defined in SAMOVA; examples include the cluster represented by TA, LY, SQ and XZ and the cluster containing YZ and NJ. It is noteworthy that the European populations SV and IT were closely related to the populations from eastern China represented by ZJ, TA, LY, SQ, XZ, NJ and YZ.

Figure 3
figure 3

NJ tree inferred from the concatenated mtDNA sequences.

Additionally, we constructed a NJ tree for COI sequences of 25 populations. This included the 23 populations listed in Table 1 and two additional populations from Canada (CN) and Germany (GM), which were obtained from GenBank (Fig. 4). In the phylogenetic tree, the clusters among populations were consistent with Nei’s genetic distance. According to the NJ tree, we found the same population clusters (NJ and YZ; ZJ, TA, LY, SQ, XZ) with the defined groups of the concatenated mtDNA sequences. The SV and IT populations were not sister groups, whereas the CN and GM populations were sisters. In the NJ tree, we also observed two clusters. In cluster 1, the SV population grouped with 11 Chinese populations. In cluster 2, the IT, GM and CN populations grouped with the remaining ten Chinese populations. The genetic diversity of cluster 2 was higher than cluster 1 although it is not significant (P = 0.1124 for H and P = 0.1831 for π).

Figure 4
figure 4

NJ tree inferred from the COI sequences.

We also constructed a NJ tree from microsatellite data (Fig. 5). Analysis indicated the presence of two primary clusters among the 23 populations, and the distribution was consistent with Nei’s genetic distance. When we compared the NJ tree derived from mtDNA sequences with the dendogram obtained for microsatellites, the cluster formed by the GY, WH, ZZ, YC and JA populations was conserved. The sister groups (SV and IT; NJ and YZ; CD and HZ; YW and HY; and YCC and TC) were also preserved (Figs 3 and 4).

Figure 5
figure 5

NJ tree inferred from multilocus microsatellites.

Gene flow and isolation by distance

The effective mutation scaled population size (Θ) was estimated for each population and group based on concatenated mtDNA sequences and microsatellite data. The bidirectional mutation-scaled immigration rate (M) was estimated for 253 population pairs (Supplementary Tables 6 and 7). The M ranged from 282.0 (ZZ → ZJ) to 777.3 (CS → GY), which indicated high gene flow (Θ * M) between population pairs with concatenated mtDNA sequences. Furthermore, 22 population pairs (8.7%) showed asymmetric migration rates (Supplementary Table S6).

With respect to the four groups identified by mtDNA analysis, M varied from 226.2 (G1 → G3) to 742.2 (G4 → G2) (Table 4). There were no asymmetric migration rates between group pairs. More importantly, the migrants from the eastern regions (including G2, G3 and G4) to the western regions (G1) were high (Table 4).

Table 4 Estimates of gene flow between four Corythucha ciliata groups based on concatenated mtDNA sequences.

Additionally, we also observed higher migration rates (M) and asymmetric migration between population pairs based on microsatellites. The M ranged from 131.93 (GY → CS) to 888.44 (HF → YW), and 48 population pairs (18.9%) had asymmetric migration rates (Supplementary Table 7). The FST value between GI and GII was 0.12927 (P < 0.0001).

The Mantel test indicated that there was no correlation between genetic and geographic distance (Km) for concatenated mtDNA sequences (r = −0.0683, P = 0.6943), COI (r = −0.1139, P = 0.8038), ND1 (r = −0.1378, P = 0.8723) and ND5 (r = −0.1547, P = 0.8820) and microsatellites (r = −0.1247, P = 0.8821).

Tests of alternative hypotheses on the source and dispersal of C. ciliata

To decipher the invasion and colonization routes of C. ciliata, we used the Approximate Bayesian Computation (ABC) analysis30, 31. The first step of ABC analysis was undertaken to define the most likely area for the initial invasion by C. ciliata. These results unambiguously pointed to a scenario where invasion initiated in the eastern coastal region (G2, G3 and G4) (logistic regression: 0.5192 [95% CI: 0.4556–0.5827]) (Scenario 2; Fig. 6A and Table 5). The second step of ABC analysis was intended to assess genetic relationships of the three most likely sources of introduction (G2, G3 and G4). Scenario testing analyses of the relationships among the three eastern groups suggested that G4 developed from an admixture of G2 and G3 (logistic regression: 0.6023 [95% CI: 0.5359–0.6688]) (Scenario 4; Fig. 6B and Table 5).

Figure 6
figure 6

Scenarios for the DIY ABC analyses, which was designed to infer the origin and dispersal of C. ciliata. (A) Two scenarios showing the origin of C. ciliata. (B) Six scenarios showing relationships of three groups (G2, G3 and G4); regarding the variation in population size and the split and admixture events. The best scenario in the corresponding step was indicated by red square.

Table 5 Description of the scenarios used in the approximate Bayesian.

Ecological niche models

To evaluate the abiotic environmental factors that have influenced the distribution of C. ciliata, we implemented the MaxEnt model, which enables probability distributions to be estimated from incomplete information32. The area under the curve (AUC) for the test data was 0.994, indicating a high fit of the modelled and observed distributions32, 33. As evident from Fig. 7, most regions of eastern, central, and southern China were predicted to be highly suitable for C. ciliata. The Jackknife evaluation indicated that precipitation during the warmest quarter (Bioclim 18) and temperature seasonality (Bioclim 4) were the main factors influencing the distribution of C. ciliata (32.9% and 22.8%, respectively).

Figure 7
figure 7

Potential distribution of C. ciliata under current climate conditions. The map is made by ArcGIS 10.2 software, http://www.arcgis.com/features/.

MJ networks of haplotypes and demographic history

MJ networks were reconstructed from haplotypes of COI, ND1, and ND5 (Fig. 8). Some haplotypes occurred at a higher frequency (G1 and G3) and were centrally located in networks (e.g. H2 for COI, H1 for ND1 and H1 for ND5).

Figure 8
figure 8

Median-joining networks of haplotypes. Sections of pies are proportional to haplotype frequency. (A,B and C) represent COI, ND1 and ND5, respectively. Pink, Group 1; blue, group 2; green, group 3; and yellow, group 4.

Most of Tajima’s D and Fu’s Fs values of concatenated sequences, COI, ND1 and ND5 were positive and not significant. However, exceptions included the Fs value for G1 in the concatenated sequences, and Tajima’s D values for G1 and G3 in the concatenated mtDNA sequences, COI, ND1 and ND5, respectively (Table 6). Thus, populations of C. ciliata in China experienced a population reduction or bottleneck effect. The mismatch distribution for concatenated sequences, COI, ND1, and ND5 was not unimodal (Fig. 9), which suggests that C. ciliata populations in China did not experience a recent expansion.

Table 6 Parameters of demographic history of the collated 23 populations and each of four population’s groups independently based on concatenated mtDNA sequences.
Figure 9
figure 9

Observed and simulated mismatch distributions. (A,B,C and D) represent concatenated sequences, COI, ND1 and ND5, respectively. The horizontal axis represents the number of pairwise differences, and the vertical axis represents the relative frequency.

Bottleneck effects based on microsatellite data

We used three microsatellite mutation models (IAM, SMM and TPM) to analyze bottleneck effects in C. ciliata. In the three mutation models, we used the Wilcoxon test to analyze whether the heterozygosity excess was significant. Two populations (CS and TA) and one population (XZ) had a statistically significant excess of heterozygotes when analyzed using the infinite allele model (IAM) and two-phased model of mutation (TPM), respectively (Table 7). However, heterozygosity excess was observed in 12 populations (HF, GY, WH, ZZ, CS, LY, NJ, SQ, XZ, YZ, YC and JA) using the step-wise mutation model (SMM, Table 7), suggesting that most C. ciliata populations in this study underwent a genetic bottleneck.

Table 7 Bottleneck test for Corythucha ciliata based on microsatellites.

Discussion

Genetic diversity and population differentiation

Based on analysis of variation in three mtDNA sequences and nine microsatellites, we conclude that the genetic diversity of C. ciliata populations in China is relatively low (mean values of H, π and A are 0.447, 0.00080 and 5.59, respectively) and generally less than the diversity exhibited in European populations (P = 0.2422 for H; P = 0.0019 for A) (Table 1 and Supplementary Table S1). Invasiveness of a species is often associated with a loss in genetic diversity, which may explain these findings34,35,36,37. Interestingly, the ZJ population (group 4) exhibited a high level of genetic diversity (H = 0.889 and π = 0.00376) and a high level of gene flow with the other three groups (9.135 to 21.450). The increased genetic variability due to the admixing between G2 and G3 (Fig. 6), along with multiple introductions, may explain the relatively high genetic variability of insects in G4. It is important to mention that ZJ (Zhenjiang) is one of the most important hub cities in eastern China; thus it remains a convenient location for dispersal of C. ciliata via humans and vehicles.

The HWE test revealed that the 23 populations significantly deviated from HWE. We analyzed three possible reasons for this deviation by excluding the presence of null alleles. Significant deviation from HWE could be caused by inbreeding (Fis: −0.17–0.28), and sampling bias and overlapping generations could result in the inbreeding phenomenon. In support of this, we observed that C. ciliata likes to congregate at feeding sites and exhibits generational overlap after the second generation38. Besides, we could not rule out possibility that homozygote excess was related to HWE because most of the loci had a tendency to slight homozygote excess (Table 1).

Genetic structure and dispersal of C. ciliata in China

We discovered that C. ciliata exhibits obvious genetic structure in China based on Structure analysis. SAMOVA indicated that the 23 populations were defined as four groups based on mtDNA sequences, and AMOVA indicated that significant genetic differentiation existed among groups, among populations within groups, and within populations. Additionally, the NJ tree revealed a pattern of genetic differentiation similar to the SAMOVA analysis. Conversely, Structure analysis indicated that the 23 populations formed two clusters based on microsatellites; AMOVA results showed significant genetic differentiation among groups, among populations within groups, and within populations. The NJ tree based on microsatellites revealed genetic differentiation similar to Structure analysis. The pairwise FST values also indicated that the Chinese populations of C. ciliata have obvious genetic differentiation. Moreover, genetic differentiation was not caused by geographic isolation because genetic and geographic distances were not correlated.

The direction of asymmetric gene flow can potentially reveal migration patterns and routes of invasion. Our analysis of gene flow indicated that C. ciliata migrated from the eastern regions (e.g. G2, G3 and G4) to the western regions of China (G1) (Table 4). Furthermore, ABC analyses indicated that the most likely scenario regarding the invasion of C. ciliata was a complex situation involving admixture and split. More specifically, G4 (ZJ) was an admixture derived from G2 and G3, which was then introduced to inland China (G3). This route of invasion is supported by the fact that sycamore trees were transplanted to China. Thus, we hypothesize that C. ciliata invaded China from seaports located along the east coast, possibly due to the importation of sycamore trees. However, it is important to note that our hypotheses are somewhat inconsistent with the initial reports of C. ciliata in Changsha15 and Wuhan16, 17, which are inland cities.

Some populations located in eastern China (e.g. ZJ) exhibited a high level of genetic diversity. These populations exhibit an asymmetric migration relative to other populations and have a slighter higher genetic diversity in mtDNA than other populations (Mean value of H: 0.517 > 0.414, P = 0.2411). This is likely to be another example of the Bridgehead effect39, which was predicted to be a common occurrence40 and formed the bridgehead populations that act as a source of western populations in China. The Bridgehead effect which refers to widespread secondary invasions stemming from a particular primary invasive population has been observed for the dispersal patterns of several invasive insects, including Frankliniella occidentalis 41 the fire ant Solenopsis invicta 42 and the western corn rootworm, Diabrotica virgifera virgifera 43.

In summary, these analyses provide compelling evidence that the invasion of C. ciliata started in eastern China and likely spread inland from eastern coastal areas.

Demographic history of C. ciliata

The analysis of Tajima’s D and Fu’s Fs values and mismatch distributions indicated that Chinese populations of C. ciliata fit the neutral evolution model and have not undergone a recent population expansion. Most of the D and Fs values were positive, which suggests that C. ciliata underwent a population reduction and wide migration or a recent bottleneck effect. This hypothesis is supported by several findings reported in this paper. For example, the overall genetic diversity for Chinese populations of C. ciliata was not high based on analysis of mtDNA genes and microsatellite data. Additionally, analysis of gene flow showed that the average migration rate between populations was high. Finally, the microsatellite data indicated that most populations experienced a bottleneck effect.

Invasiveness of C. ciliata and implications for management

It is important to ask why C. ciliata can successfully invade a new location despite low genetic diversity and wide migration or a recent bottleneck effect. Our study provides clues to address these questions from the standpoint of population genetics and environmental factors. First, Chinese populations of C. ciliata have population structure at different levels. Additionally, gene flow is higher between some pairs of populations and groups based on mitochondrial genes. This could be attributed to multiple introductions of C. ciliata in eastern coastal areas of China and subsequent dissemination. Repeated introductions or colonization of a given area from multiple sources can negate the decrease in genetic variability associated with colonization or rescue a species from an actual loss in genetic diversity37. ABC analyses indicated that the C. ciliata invasion started with the G2 and G3 groups independently and then evolved into G4. With respect to environmental factors and dissemination, it is noteworthy that C. ciliata can colonize and invade many different environments (Fig. 7).

In conclusion, the analyses of genetic diversity and population structure for C. ciliata conducted in this study provide a better understanding of the adaptability of this important pest. C. ciliata is adept at successfully invading new locations, which may be helpful in formulating more effective control measures for this invasive species.

Materials and Methods

Sampled populations and DNA extractions

In this study, C. ciliata adults (females and males) were collected from leaves of Platanus spp. in 23 urban locations, including Italy and Slovenia. The 21 Chinese populations (Fig. 1 and Table 8) encompassed most of the regions infested with C. ciliata. Adult samples of C. ciliata were preserved in 100% ethanol and stored at −20 °C. Genomic DNA was extracted from individual samples using DNAVzol (Bioteke, Beijing, China) and stored at −20 °C until needed for PCR.

Table 8 The information of sampling populations.

Amplification of mtDNA genes and sequencing

Three fragments of the mitochondrial genome (631, 751 and 786 bp of COI, ND1 and ND5, respectively) were amplified using DNA from 10 individuals per population. Primer pairs included COI-F (5′GGTCAACAAATCATAAAGATATTGG) and COI-R (5′-TAAACTTCAGGGTGACCAAAAAATCA)44, ND1-F (5′-ATTCAGACTCTCCTTCAGCA) and ND1-R (5′-GGATAACAGCCTAATCTTCT), and ND5-F (5′-AAAATCACCTCAACTATCAT) and ND5-R (5′-GCTCCTACACCAGTTTCTTC). Primers for ND1 and ND5 were designed based on the complete mitochondrial genome sequence of C. ciliata 45. Conditions for PCR amplification were as follows: initial denaturation for 5 min at 95 °C, followed by 35 cycles at 94 °C (1 min each), annealing for 1 min at 45–50 °C, elongation for 1–3 min (depending on predicted fragment length) at 68 °C, and a final extension step at 72 °C for 10 min. Amplification products were purified and then sequenced by Invitrogen Biotechnology Co. (Shanghai, China).

Amplification of microsatellites and determination of fragment size

The nine microsatellite loci, primer sequences, and PCR conditions used in the current study were previously described by Yang et al. 46. Amplifications were conducted using DNA from 24 individuals per population (total number = 552). Amplified PCR products were analyzed by Sangon Biotechnology Co. (Shanghai, China) using the ABI 3730 automated DNA sequencer (Applied Biosystems). GENEMAPPER v. 3.5 (Applied Biosystems) was used to obtain allele designations.

Analysis of diversity and population structure

Fragments comprising the three mitochondrial genes were initially aligned using ClustalX v. 2.047. The number of variable sites and haplotypes, haplotype diversity, and nucleotide diversity of the three mtDNA genes (COI, ND1 and ND5) were calculated by DnaSP v. 5.048. The expected and observed heterozygosity, polymorphic information content, and estimated null frequency were calculated for each locus in individual populations using CERVUS v. 3.0.349.

The number of alleles (A), allelic richness (AR), inbreeding index (Fis), and observed and expected heterozygosity (Ho and HE) were calculated by FSTAT v. 2.9.3.250. The deviation from Hardy-Weinberg equilibrium for each population and linkage disequilibrium for pairs of loci was calculated with GENEPOP v. 3.451, and P values were adjusted using the Sequential Bonferroni procedure52. Null alleles were identified using Micro-Checker v. 2.2.353. T-tests were used to evaluate the significance of differences in diversity estimates haplotype diversity (H), nucleotide diversity (π) and number of alleles (A) of the populations using JMP v.12.

Population groups were identified using concatenated sequences of the three mtDNA genes; information on longitude and latitude were included, and the data were analyzed using SAMOVA v. 1.054. The most likely number of clusters (K) was determined by repeating the analysis with K ranging from 2 to 6 and selecting the subdivision scheme associated with the highest FCT. The clustering of individuals from all populations was determined using Structure v. 2.055. The most supported number of clusters (K) was obtained based on multilocus microsatellite data. We used an admixture ancestry model and the correlated allele frequency model with a burn-in period of 100,000 iterations and 1,000,000 Markov chain Monte Carlo (MCMC) repetitions to calculate the most likely number of genetic clusters. We performed 10 independent runs for each K to confirm consistency across runs, tested K from 1 to 8, and used the ΔK method of Evanno et al.56 to choose the most supported value of K.

Analysis of molecular variance (AMOVA) was implemented by ARLEQUIN v. 3.057. The definition of groups in AMOVA was determined by SAMOVA and Structure analysis for mtDNA sequences and microsatellite data, respectively. Pairwise FST (differentiation index) was analyzed by ARLEQUIN v. 3.057.

The correlation between genetic [FST/(1 − FST)] and geographic distance matrices in log scale58 was performed using the Isolation by Distance Web Service (IBDWS)59 with 10,000 randomizations.

Gene flow analysis and median joining networks of haplotypes

The gene flow between pairs of populations and groups was estimated by MIGRATE v. 3.2.160. We used a Bayesian search strategy with the following parameters: three long chains (1,000,000 trees) with 10,000 trees discarded in the initial “burn-in”; replicates = YES, 5; randomtree = YES; and heating = ADAPTIVE: 1 {1.0 1.2 1.5 3.0}. The MIGRATE program was conducted for four cycles, and the seed number was changed for each run.

Median-joining (MJ) networks of haplotypes for the three mtDNA genes were constructed using Network v. 4.661, 62.

Scenarios of ABC analyses

To explore models that best describe the demographic history and geographic distribution of present-day populations, we evaluated the posterior probabilities of historical scenarios using the Approximate Bayesian Computation approach (ABC)30 as implemented in DIY ABC v. 2.1.031. Gene flow analyses revealed that migrants moved from the eastern regions (including G2, G3 and G4) to western China (G1). Therefore, to simplify scenarios for ABC analysis, we first combined the three groups from eastern regions and then tested two alternative scenarios for potential sources in China (Fig. 6A). A total of six scenarios were considered with respect to population size, split, and admixture events for C. ciliata in eastern inland regions (Fig. 6B).

Demographic analysis and bottleneck effects

The mismatch distribution in ARLEQUIN v. 3.0 was used for demographic analysis, and Tajima’s D and Fu’s Fs tests were implemented to analyze neutrality. Effective population size before expansion (Θ0), effective population size after expansion (Θ1), population expansion time (τ), and the sum of squared deviation (SSD) between observed and expected mismatches were also calculated. The significance of these statistics was tested by parametric bootstrapping (1,000 replicates) in ARLEQUIN v. 3.0.

Bottleneck effects were detected using Bottleneck v. 3.463. Analyses were based on the frequency of each allele with the following three models: the infinite allele model (IAM)64; the step-wise mutation model (SMM)65, and the two-phased model of mutation (TPM)66. The number of iterations for Bottleneck was 1,000, and the Wilcoxon sign-rank test was implemented.

In all instances where multiple tests were conducted simultaneously, the sequential Bonferroni procedure52 was applied to adjust the nominal significance level.

Phylogenetic reconstruction

Pairwise FST values for populations were calculated using ARLEQUIN v. 3.0 with 1,000 permutations. NJ phylogenetic trees were generated using PHYLIP v. 3.267 and based on Nei’s genetic distance. The NJ tree was constructed using COI sequences from the 23 populations listed in Table 1 and two additional populations from Germany (GM) and Canada (CN); the latter sequences were obtained from GenBank.

MaxEnt modeling algorithm

To estimate the environmental suitability of areas for C. ciliata invasion, we built a species distribution model (SDM) based on field observations of C. ciliata occurrences coupled with environmental maps. Data from sites known to be infested with C. ciliata in China were entered into MaxEnt32. Nineteen bioclimatic data layers were acquired from the WorldClim dataset (http://www.worldclim.org/)68 at a spatial resolution of 2.5 arc minutes. The random test percentage was set to 25%, and the Jackknife procedure was used to estimate the contribution of each variable based on performance of the model. The area under the curve (AUC) value was calculated for model validation; AUC reflects the model’s ability to distinguish between present records and random background points. AUC values ranged from 0.5 (not different from a randomly-selected predictive distribution) to 1 (with perfect predictive ability). Models having AUC values > 0.9 were considered to have very good, >0.8 good, and >0.7 useful discrimination abilities69. The final map was visualized and processed using the ArcGIS platform (http://www.esri.com/software/arcgis).