The tomato borer, Tuta absoluta, invading the Mediterranean Basin, originates from a single introduction from Central Chile

The Lepidopteran pest of tomato, Tuta absoluta, is native to South America and is invasive in the Mediterranean Basin. The species' routes of invasion were investigated. The genetic variability of samples collected in South America, Europe, Africa and the Middle East was analyzed using microsatellite markers to infer the source of the invasive populations precisely and to test the hypothesis of a single versus multiple introductions into the Old World. This analysis provides strong evidence that the invading populations have a single origin, in or close to Chile, and probably in Central Chile near the town of Talca in the district of Maule.


Results
Many samples were not at Hardy-Weinberg equilibrium (see Supplementary Table S1). These results, together with the difficulty of unambiguously scoring microsatellite genotypes (a well-known phenomenon in Lepidoptera 4), suggested the presence of frequent null alleles. Analyses were performed with and without the 5 loci (T437, T350, T378, T482, T458) that accounted for most disequilibria. The results presented here were obtained with all 12 loci that were analyzed, and those obtained without the 5 most problematic loci were qualitatively the same. Pairwise FST values and genotypic differentiation tests (see Supplementary Table S2) showed that samples from the native area were strongly and significantly differentiated between countries, with FST values ranging from 0.14 (Argentina vs Chile) to 0.31 (Argentina vs Venezuela) and p < 10^-5. Samples from the same country in South America were not significantly differentiated, with the notable exception of Southern and Northern Chile, which were differentiated from Central Chile (with FST of 0.075 and 0.042, respectively). Most Figure S1). In addition, all the samples from the invaded area displayed their minimum pairwise FST values with the samples from Central Chile, which were also the samples to which each African, Asian and European sample had its maximum mean assignment likelihood, LiRs (Figure 3). Moreover, the NJ tree grouped the invasive population samples together with the samples from Central Chile (Figure 2). These results all suggest that Chile, and particularly Central Chile, is the most probable source location of T. absoluta invading the Mediterranean Basin.
The results of the Approximate Bayesian Computation (ABC) analyses are shown in Table 1. These results clearly indicate that Chile is the true source of the population of T. absoluta invading the Mediterranean Basin, with posterior probabilities >0.9 and non-overlapping confidence intervals. Low type I and mean type II errors were obtained for both prior sets (Table 1). When Chile was removed from the analyses, the scenario with a non-sampled "ghost" South American population was selected with posterior probability >0.97 (details not shown), confirming Chile as the actual source of the invading populations of T. absoluta. Complementary analyses were performed considering the Northern, Central and Southern parts of Chile as 3 putative sources, in addition to the clusters of Colombia and Argentina and the ghost population (Supplementary Figure S2). They clearly revealed that the central part of Chile is the most probable source of invasive T. absoluta populations (Table 1), with posterior probabilities of 0.87 and 0.67 for the first and second sets of priors and samples, respectively. Again, low type I and mean type II errors were obtained for both prior sets (Table 1).

Discussion
The main result of this study is that the origin of the invading populations around the Mediterranean was identified as unique and was in or close to Chile, and probably Central Chile in the district of Maule. Within this region, T. absoluta displayed very weak genetic structure, so that it was not possible to infer the origin of invasive populations at a finer geographical scale. The absence of genetic structure in this region, as determined by the use of neutral genetic markers, indicates that the region surrounding Talca contains a single population of T. absoluta. The choice of new natural enemies (e.g. parasitoids) for possible use in biological control should take into account this new finding concerning the origin of the invasive population of T. absoluta in the Mediterranean, as suggested by Roderick & Navajas 5.
The second important conclusion of this study is that the native population of T. absoluta in South America is far from genetically homogeneous. Substantial genetic differentiation was found between northern and southern regions of South America, with more than 20% of the allele frequency variance found between southern Chile and the group of Venezuela and Colombia. Such a high level of differentiation is also found over smaller distances, for instance between Argentina and Chile or between Venezuela and Colombia. Such strong genetic structure in the native area does not usually facilitate the precise inference of the source of invasive populations because it requires an extensive sampling scheme.
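To make the reading of such differentiation values as a share of allele frequency variance concrete, the following minimal Python sketch computes a single-locus, Nei-style GST, (HT - HS) / HT, from allele frequencies. It is an illustration only and not the estimator referenced in the Methods: it ignores sample-size corrections and weights populations equally. The function name and the toy frequencies are hypothetical and are not taken from the data of this study.

```python
from statistics import mean


def gst_single_locus(freqs_by_pop):
    """Nei-style G_ST for one locus (simplified, no sample-size correction).

    freqs_by_pop: list of dicts mapping allele -> frequency, one dict per
    population; each dict is assumed to sum to 1.
    """
    alleles = set().union(*freqs_by_pop)
    # H_S: mean expected heterozygosity within populations
    h_s = mean(
        1.0 - sum(pop.get(a, 0.0) ** 2 for a in alleles) for pop in freqs_by_pop
    )
    # H_T: expected heterozygosity of the pooled (averaged) allele frequencies
    pooled = {a: mean(pop.get(a, 0.0) for pop in freqs_by_pop) for a in alleles}
    h_t = 1.0 - sum(f ** 2 for f in pooled.values())
    # Share of total diversity that lies between populations
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical toy frequencies for two populations at one biallelic locus
toy = [{"a1": 0.8, "a2": 0.2}, {"a1": 0.2, "a2": 0.8}]
print(round(gst_single_locus(toy), 3))
```

For a biallelic locus this quantity equals the variance of the allele frequency among populations divided by p(1 - p), where p is the mean frequency, which is the sense in which a value above 0.2 corresponds to more than 20% of the allele frequency variance lying between groups.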
The third main result was that there was an almost complete absence of genetic structuring in the invaded areas, from southern Spain to Israel and from Israel to Morocco. This genetic homogeneity over space was measured using hypervariable microsatellite markers and therefore was probably not the consequence of a low power of analysis. Instead, it very probably corresponds to a single introduction in Africa or Spain followed by an expansion without a noticeable demographic bottleneck, which would have led to genetic differentiation through space. The same situation has already been found in other recent insect invasions, such as the Asian ladybeetle Harmonia axyridis expansion in France 6, the western corn rootworm Diabrotica virgifera virgifera expansion in North America and Central Europe 7, and the Colorado potato beetle 8.

Methods
We collected samples from various regions in South America, and from the invaded area in North Africa, Europe and Asia (Figure 1, Supplementary Table S1). T. absoluta larvae were collected on tomato plants in greenhouses or in open field and stored in ethanol (>90%) prior to DNA extraction. Total genomic DNA of each sampled individual was extracted using the DNeasy Tissue Kit (Qiagen) following manufacturer's instructions. The genotypes of 966 individuals were obtained at 12 microsatellite markers (T454, T425, T437, T350, T235, T310, T271, T426, T478, T378, T482, T458) 9.

Table 1 | Posterior probabilities of the scenarios in two sets of ABC analyses for two different sets of priors and samples. Prior and sample sets are detailed in Supplementary Table S3. 95% confidence intervals (CI) are in brackets. The 95% CI of the selected scenarios never overlapped those of competing scenarios. The values presented in italics correspond to the second set of priors and the second set of samples. Type 1 error is the probability of selecting another scenario when the chosen scenario is true. Type 2 error is the mean probability of selecting the chosen scenario when it is false. Type 2 error i is the probability of selecting the chosen scenario when scenario i is true (the mean of type 2 error i is type 2 error). The lines in bold characters correspond to the chosen scenarios.

Intra and inter sample variability statistics, including the mean number of alleles per locus, Nei's diversity 10, and pairwise FST values 11, were computed using Genepop 12 (ver. 4). Hardy-Weinberg (HW) and genotypic differentiation tests were performed using Fisher exact tests implemented in Genepop. The most probable source population of the European and African invasive populations was investigated as follows 7. 1) First, we analyzed the pairwise FST values between each invasive population sample and each South American sample.
2) We then computed the mean individual assignment likelihood 13 (denoted LiRs) of each invading population sample i to each possible South American source population using GENECLASS2 14 (ver. 2.0). The most probable source of a target invasive population's sample i was determined as the South American population whose sample displays the minimum corrected FST value with i and the maximum mean individual assignment likelihood of i. 3) We also plotted a neighbor joining (NJ) tree 15 based on the genetic distance described by Cavalli-Sforza & Edwards 16 using the POPULATIONS software version 1.2.30 (http://bioinformatics.org/~tryphon/populations/). It is expected that the source of a target invasive population's sample i is located in close proximity to i in the tree. 4) Finally, a Bayesian clustering analysis was performed using the STRUCTURE software 17 with K, the number of clusters considered, varying from 1 to 10. For each value of K, we used an admixture model with correlated allele frequencies, the LocPrior option, 20 runs per K, 10^6 iterations for the MCMC and 2 × 10^5 iterations for the burn-in period. The most probable source of each target invasive population's sample i was determined as the population whose samples were the last ones to still cluster with i with increasing values of K. The most likely value of K was determined using the method of Evanno et al. 18 and by eye, by examining the geographical coherence of the clustering for increasing values of K. The four above-mentioned analyses were performed on samples treated individually, i.e. without pooling samples, because genotypic differentiation tests were frequently significant (see Supplementary Table S2).
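As an illustration of how the number of clusters can be chosen from replicate clustering runs, the following Python sketch computes a ΔK statistic in the spirit of Evanno et al. 18. It is a simplified sketch, not the exact procedure applied here: it assumes that ΔK is taken as the absolute second difference of the mean log-likelihoods across runs, divided by the standard deviation of the log-likelihoods at K, which is a common simplification. The function name and the log-likelihood values are hypothetical and are not outputs of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K whose neighbours K-1 and K+1 exist.

    lnp_by_k: dict mapping K -> list of ln P(D|K) values from replicate runs
    (at least two runs per K, so that a standard deviation is defined).
    """
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the two ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        delta[k] = second_diff / sd
    return delta


# Hypothetical log-likelihoods (two runs per K), for illustration only
toy_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-800.0, -801.0],
    4: [-790.0, -795.0],
}
toy_delta = evanno_delta_k(toy_lnp)
best_k = max(toy_delta, key=toy_delta.get)
print(toy_delta, best_k)
```

With K varying from 1 to 10, such a statistic is only defined for K = 2 to 9, which is one reason why a visual check of the geographical coherence of the clusters remains a useful complement.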
An Approximate Bayesian Computation (ABC) analysis 19 was carried out with DIY ABC 20 to measure our confidence in the source population inference. ABC is a model-based Bayesian method allowing posterior probabilities of historical scenarios to be computed, based on genetic data (here, the genotypes of the samples at the 12 microsatellites), historical data (dates of first observation of the samples), and priors on historical and genetic parameters (Supplementary Table S3, Supplementary Figure S2). We contrasted 4 historical scenarios differing by the actual source population of the Mediterranean invading populations (Supplementary Figure S2). The putative source populations were the 3 South American clusters (Venezuelan/Colombian, Argentinean, and Chilean clusters) obtained from the STRUCTURE analysis described above (Figure 1), plus a non-sampled "ghost" South American population, modeling the case where the actual source population in South America had not been sampled. In each scenario the four South American populations diverged independently from an ancestral population with a transitory reduction in population size at time ti (i = 1, 2, 3, 4). The analyses were conducted twice, with different parameter prior distributions and with 2 different sets of samples representative of the STRUCTURE clusters. The Spa_cas and Mar_lar samples (see Supplementary Table S1) were chosen because they were large enough (>25 individuals) and they were sampled closest to the first observation point in Europe and Africa, respectively 1. The other samples representative of the STRUCTURE clusters were chosen randomly among the samples with more than 25 individuals belonging to each cluster.
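The principle of comparing scenarios by ABC can be sketched in Python as follows. This is a deliberately minimal rejection-style illustration and not a reproduction of the DIY ABC procedure: the same number of data sets is simulated under each scenario, the simulations closest to the observed summary statistics are retained, and the posterior probability of a scenario is approximated by its share among the retained simulations. All names, the one-dimensional toy simulator (which stands in for drawing parameters from priors, simulating genotypes and computing summary statistics) and all numerical values are hypothetical.

```python
import random


def abc_model_choice(observed, simulators, n_sims=2000, n_keep=200, seed=0):
    """Rejection-style estimate of scenario posterior probabilities.

    observed: list of observed summary statistics.
    simulators: dict mapping scenario name -> function(rng) that returns a
    list of simulated summary statistics of the same length as `observed`.
    Scenarios receive equal prior weight (n_sims simulations each).
    """
    rng = random.Random(seed)
    records = []
    for name, simulate in simulators.items():
        for _ in range(n_sims):
            stats = simulate(rng)
            # Euclidean distance between simulated and observed statistics
            dist = sum((s - o) ** 2 for s, o in zip(stats, observed)) ** 0.5
            records.append((dist, name))
    records.sort(key=lambda r: r[0])
    closest = [name for _, name in records[:n_keep]]
    return {name: closest.count(name) / n_keep for name in simulators}


def make_toy_simulator(centre, spread=0.03):
    """Hypothetical stand-in for a scenario: one Gaussian summary statistic."""
    def simulate(rng):
        return [rng.gauss(centre, spread)]
    return simulate


# Four hypothetical scenarios; the centres are arbitrary illustrative values
toy_simulators = {
    "scenario_1": make_toy_simulator(0.02),
    "scenario_2": make_toy_simulator(0.10),
    "scenario_3": make_toy_simulator(0.15),
    "scenario_4": make_toy_simulator(0.25),
}
toy_posterior = abc_model_choice([0.03], toy_simulators)
print(toy_posterior)
```

Type I and type II errors of the kind reported in Table 1 can be approached in the same spirit, by simulating pseudo-observed data sets under a known scenario and counting how often each scenario is selected.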