Multiple incursion pathways for Helicoverpa armigera in Brazil show its genetic diversity spreading in a connected world

The Old World cotton bollworm Helicoverpa armigera was first detected in Brazil with subsequent reports from Paraguay, Argentina, Bolivia, and Uruguay. This pattern suggests that the H. armigera spread across the South American continent following incursions into northern/central Brazil, however, this hypothesis has not been tested. Here we compare northern and central Brazilian H. armigera mtDNA COI haplotypes with those from southern Brazil, Uruguay, Argentina, and Paraguay. We infer spatial genetic and gene flow patterns of this dispersive pest in the agricultural landscape of South America. We show that the spatial distribution of H. armigera mtDNA haplotypes and its inferred gene flow patterns in the southwestern region of South America exhibited signatures inconsistent with a single incursion hypothesis. Simulations on spatial distribution patterns show that the detection of rare and/or the absence of dominant mtDNA haplotypes in southern H. armigera populations are inconsistent with genetic signatures observed in northern and central Brazil. Incursions of H. armigera into the New World are therefore likely to have involved independent events in northern/central Brazil, and southern Brazil/Uruguay-Argentina-Paraguay. This study demonstrates the significant biosecurity challenges facing the South American continent, and highlights alternate pathways for introductions of alien species into the New World.


Results
PCR amplification and sequence analysis. All specimens from the southern/south-western regions of South America were successfully sequenced for the mtDNA COI fragment (GenBank accession numbers MG230495 -MG230526; KU255535-KU255543 from 7 ) using the Noc-COI-F/R primer pairs. Sequence identity searches against the NCBI GenBank database confirmed that all suspected moths matched (i.e., 99-100% nucleotide identity) published H. armigera sequences, and did not contain premature stop codons.
Unique amino acid substitutions were detected in three haplotypes (i.e., Harm_BC47, Harm_BC42, Harm_ BC43). For Harm_BC47, this involved an L/V change, and in Harm_BC42 a I/M change. Both substitutions involved amino acids with hydrophobic side chain. In Harm_BC43 the unique amino acid involved an A to G change where both amino acids belonged to the 'small' category. All remaining unique haplotypes (e.g., Harm_BC06, BC23, BC24. BC34, BC37, BC39, BC44, BC45, BC46) had nucleotide transition substitutions at 3 rd codon positions. The most significant unique non-Brazilian haplotypes detected were Harm_BC13, Harm_BC16, and Harm_BC17. All three unique haplotypes shared SNPs with other haplotypes, indicating that they did not have unexpected base changes. Furthermore, these haplotypes were also detected multiple times in separate sequencing efforts.
The range of genetic distances (i.e., measures of genetic divergence/degree of differentiation) of H. armigera within Asia (China, India and Pakistan) and within Australia were both 0.00-0.04%, while within Europe (Germany and unknown sites) and within South America (Brazil, Argentina, Uruguay and Paraguay) were both 0.00-0.02%. Estimates of evolutionary divergence between all H. armigera sequences from Australia, Asia, Europe and South America were therefore likewise low and ranged between 0.00-0.04%. Observed nucleotide diversity between countries/continents in H. armigera ranged from 0.0024 ± 0.0004 (s.e.) to 0.0040 ± 0.0006 (s.e.) ( Table 1).

Helicoverpa armigera haplotypes distribution and validation of the null model. Results from
validation of the null model indicated no tendency for the test to generate either Type I or Type II errors (i.e., having an almost perfect rectangular distribution of P-values across 10,000 statistical tests, with each test based on 10,000 random realisations of pseudo-observed data). At the 5% confidence level, almost exactly 5% of tests yielded statistically significant results with randomly generated data for the full dataset (501/10,000 = 0.0501 (Suppl. Fig. 1a); and when testing for Brazil vs. non-Brazil 496/10,000 = 0.0496 (Suppl. Fig. 1b)). Figure 1 shows the distribution of χ Ran 2 under the null model. No random realisations were found to have a value of the test statistic greater than or equal to that observed (χ Obs 2 = 719.2), thereby indicating highly significant non-randomness within the observations (P = 0.000), and strong support that at least one haplotype/location observation has observations that are either more or less than expected by chance alone.

Whole table analysis.
Sub-table analysis. Evidence from above (Whole Table Analysis) strongly supported some mtDNA haplotypes were differentially distributed across sampling sites, and the TS DIFF analysis was therefore used to further identify those haplotypes that were unduly rare or common across the locations. The results indicated haplotype Harm_BC01 and Harm_BC02 were both simultaneously under-represented in Argentina (ARG), Paraguay (PRY) and one Brazilian state (BA), and overly represented in another Brazilian state (PI) (Fig. 2). The analysis also found evidence to support haplotypes Harm_BC03 and Harm_BC04 as being sporadically overly represented in three Brazilian states (BA, MT and RS) (dark blue cells, Fig. 2).
Brazil vs. Non-Brazil. In the 'Brazil vs. non-Brazil' treatment of haplotype distribution data (Suppl. Table 4; Fig. 3), non-randomness of haplotype distribution within the matrix was again confirmed by the χ Obs 2 analysis (χ Obs 2 = 164.0, P-value < 0.000; Fig. 3). The irregular distribution of the test statistic in Fig. 2 reflected a smaller dataset and, therefore, fewer possible combinations of allowable observations to fulfil the row and column constraints.
Consistent with the Whole Table Analysis (i.e., Fig. 2), at the scale of the individual haplotypes, Harm_BC01 and Harm_BC02 were simultaneously under-represented in non-Brazil and over-represented in Brazil. For haplotypes Harm_BC13, Harm_BC16 and Harm_BC17 the opposite was true. A trend for over-representation of unique haplotypes Harm_BC43, Harm_BC44, Harm_BC46 and Harm_BC47 at Non-Brazil locations was also detected in this analysis, but deemed statistically insignificant under the Benjamin and Hochberg (1995) false discovery rate correction. The difference in the strength of statistical significance between the two analyses (Figs. 2 and 3) is due to a smaller allowable number of permutations when allocating observations at random to the two aggregated spatial categories.
AMOVA and FST analysis. Overall F ST estimates based on the partial mtDNA COI gene, when treating our data set as Brazilian, non-Brazilian (Uruguay, Argentina and Paraguay), and Old World samples, showed that a significant F ST value (0.2879) was detected between Brazilian and non-Brazilian samples, indicating significant population sub-structure (i.e., low gene flow) between these populations (Suppl. Table 5). In contrast, the low F ST estimate (0.0742, see also Tay et al. 31  Analysis of molecular variance (AMOVA) among populations of H. armigera indicated that low levels of genetic variation existed between populations. Approximately 7% of variation detected could be attributed to that www.nature.com/scientificreports www.nature.com/scientificreports/ between individuals within groups, while the majority ( > 90%) of variation could be explained by the heterogeneous populations present across different geographic regions in the South American continent (Suppl. Table 7).

Discussion
The rapid spread and establishment of H. armigera across much of the South American continent has generated a very large population with significant impacts on agricultural production. In contrast to usual invasions such as the African incursion of the New World fall armyworm S. frugiperda 16,34,[39][40][41][42] , the invasive H. armigera population in the New World appears to be very diverse. Furthermore, the spatial distribution of this diversity strongly suggests that the population has spread from two different regions of introduction. Statistical analyses of the haplotype distribution patterns show that some of the H. armigera haplotypes most commonly found in Brazil appeared to be uncommon in Argentina (i.e., Harm_BC01) and Paraguay (Harm_BC02), and conversely, that some of the less common and/or unique haplotypes found in the non-Brazilian countries (e.g., Harm_BC13, Harm_BC16, Harm_BC17) appeared disproportionately uncommon in Brazil. Furthermore, F ST analyses suggest reduced gene flow between populations from the Cone Sul region (Southern Brazil, Argentina, Paraguay and Uruguay) and populations from either northern/central Brazil or from the Old World. Within individual countries, disproportionally over-represented haplotypes were identified (e.g., see Suppl. Table 2 and Fig. 2), and www.nature.com/scientificreports www.nature.com/scientificreports/ within Brazil, the rare Harm_BC13 haplotype was detected in the State of Santa Catarina at sites located only approximately 350 km from Paraguay where this rare haplotype was also identified. Two other unique haplotypes (Harm_BC42, Harm_BC45) were also only identified in the southern state of Rio Grande do Sul in Brazil.
It is necessary to keep in mind that the northern and central Brazilian H. armigera may contain the same rare haplotypes as our southern region populations, although studies involving greater sample sizes from similar sampling periods 37,38 did not detect these rare haplotypes. Nevertheless, rare haplotypes detected in Argentina, Paraguay and Uruguay have also not been detected in other Brazilian samples to-date. We were unable to include the mtDNA COI sequences of Tay et al. 31 due to different mtDNA COI regions being characterised. The study of Tay et al. 31 analysed substantial Brazilian populations from similar period and had identified similar haplotype frequency patterns as this study. Furthermore, Tay et al. 31 showed that within Brazil, the H. armigera population contained both globally common and globally rare haplotypes. At the national level, there were 13 rare haplotypes identified in Brazil reported, and is similar to the total number of 12 rare haplotypes identified in this study. Despite using different parts of the mtDNA COI gene, both studies have detected similar patterns of rare and common partial mtDNA COI haplotypes, suggesting that the results of our analyses are representative of the diversity present in Brazil at sampling time.
Population structure studies in H. armigera based on the mtDNA genes have found a general lack of substructure even for populations separated by considerable geographic distances (e.g 43,44 .), and this finding is supported by studies based on limited nuclear markers (e.g. 15,17,[45][46][47][48][49] ). Anderson et al. 36 demonstrated differences between sub-species of H. armigera present in Australia/New Zealand (i.e., H. armigera conferta) and the Old www.nature.com/scientificreports www.nature.com/scientificreports/ World sub-species (i.e., H. armigera armigera), but not between global H. armigera armigera populations using genome-wide SNP markers. In northern/central Brazil, gene flow patterns of H. armigera showed non-significant levels of population substructure 24,37 but exhibited reduced gene flow with southern South American populations of H. armigera. These southern South American populations, and particularly that of Paraguay and Argentina, also exhibited significant population substructure with Old World populations, suggesting that their origins differed from the origins of the founding populations present in northern/central Brazil.
Given that sufficient gene flow to prevent population structure had previously been detected in Brazilian H. armigera populations 24,37,38 , and that H. armigera in Argentina, Paraguay and Uruguay 7,25,27 was confirmed shortly after the Brazilian detections, it would be a logical assumption that this likely represented natural migration, and/or movements of contaminated agricultural commodities from Brazil to the southern regions of South America. It was therefore unexpected to find unique mtDNA haplotypes in the southern South American populations that were as yet unreported in Brazil. Equally as unexpected was the lack of the most common haplotypes (e.g., Harm_BC01, Harm_BC02) in these countries given that they represented the major Brazil, and in fact, global, haplotypes.
To further explain the observed heterogeneous haplotype patterns across Paraguay, Uruguay, Argentina, and the southern Brazilian states of Rio Grande do Sul, Santa Catarina and Paraná (i.e., the Cone Sul region), two hypotheses may be put forward: (I) intrinsic factors associated with new biological incursions (e.g., stochastic lineage sorting, survival/reproductive variability, etc.) in a new environment, and (II) independent incursion pathways of H. armigera into South America.
In Brazil where the incursion of H. armigera was first reported, stochastic lineage sorting of founding populations (i.e., hypothesis I) could lead to the observed heterogeneous haplotype distribution patterns. This could involve factors such as lower population density and variability at population introduction phase and/or the lag-phase, high variability of reproductive success rates (e.g., see Gaither et al. 50 ), variable adaptation success rates (e.g., differential response to attacks by parasitoids and/or predation rates 51 ), susceptibility to viral/bacterial/ fungal pathogen attacks, climatic stress, etc. to the novel New World environments at the early incursion stages.
In Brazil, across a number of studies, unexpectedly high genetic diversity of H. armigera has been detected 22,24,31,37 as represented by multiple maternal lineages (Suppl. Table 2 e.g., Suppl. Fig. 2). This would likely indicate a complex incursion history of H. armigera into Brazil, and may be associated with agricultural commodity movements from Old World regions into Brazil 31 . For example, repeated incursions/releases at different Brazilian sites could function as population source increasing propagule pressure, raising and maintaining diversity which is important in sustaining an incipient population 52-54 (i.e. hypothesis II). These suggested scenarios involving the highly volatile and variable periods of an exotic organism's biology contrast the scenarios offered by Leite et al. 37 , where repeated bottleneck effects such as potentially associated with differential pest control/ management strategies were deemed likely factors that underpinned the rapid population expansion signatures in both H. armigera and the New World endemic and closely related H. zea (but see 31,44 ). In fact, H. zea in the New World was hypothesised as the outcome of an earlier incursion and the subsequent divergence (ca. 1.5 million years ago) from its common ancestor with H. armigera 44,55 , and involved a founder population with limited genetic diversity 56 . The H. zea genome, sequenced prior to the recent arrival of H. armigera in the New World, showed no evidence for subsequent introgression with H. armigera, and no evidence for the gain of additional genes affecting host use, but rather for the loss of genes already present in H. armigera 57 .
Repeated introductions and high propagule pressure are increasingly being recognised as important factors that underpin the establishment of an alien species 2,54 . With repeated introduction events, the likelihood of diverse maternal lineages that ultimately contribute to propagule pressure is high. Together with lineage sorting and stochastic processes (e.g., demographic, environmental 53 ) experienced by the invasive species in the new environment, sampling of the mtDNA COI gene and the construction of a haplotype network will likely appear similar to one of a rapid population expansion (i.e., a 'star-shaped' haplotype network). This scenario of a 'star-shaped' haplotype network, as detected in H. armigera populations in the South Americas (see Fig. 2 of 37 ), differed fundamentally to that reported for H. zea (i.e., Fig. 1 of 44 ; Fig. 2 of 37 ). Multiple introductions of an invasive pest insect that resulted in a mtDNA genetic signature similar to a rapid population expansion signature, has also been previously reported in Brazil (e.g. the Asian citrus psyllid Diaphorina citri, see 58 ).
With the migration and dispersal ability of H. armigera in mind, the high frequency (i.e., 68%) of the two most common Harm_BC01 and Harm_BC02 haplotypes in Brazil populations, and a lack of population structures in northern/central Brazil (e.g. 37 ) and the rest of the world, it was perhaps unexpected to observe statistically significant spatial mtDNA COI haplotype patterns and F ST estimates in the Cone Sul region. The significant over-and underrepresented haplotypes in the Cone Sul region suggest that this population likely originated from somewhere outside the extensively sampled areas of central and northern Brazilian populations (i.e., hypothesis II). For example, haplotypes Harm_BC13 and Harm_BC17 for Paraguay and Harm_BC44 and Harm_BC47 for Uruguay and Argentina, respectively, were over-represented in these countries and to a lesser extent, also the over-representation of haplotypes Harm_BC43, Harm_BC44, Harm_BC46 and Harm_BC47,although the P-values (0.100-0.114) lie outside of the range usually considered significant, thereby adding support that these maternal lineages likely originated from non-Brazil source populations (i.e., hypothesis II).
The detection of the H. armigera genetic spatial signatures identified in this work provide the first insights into potential transnational patterns of the incursion of this species into the New World. This reflects opportunities to improve biosecurity protocols relating to phytosanitary practices of agricultural and horticultural commodity movements in the Cone Sul region. Although the haplotype distribution pattern from Rio Grande do Sul was excluded by our spatial analysis, pairwise F ST estimates indicated a limited gene flow of the Rio Grande do Sul population with populations from three Brazilian states (see Suppl. Table 5), and coupled with the statistically significant over-representation of unique haplotypes in the Cone Sul region, our results therefore added support to the hypothesis of multiple introduction pathways of H. armigera into South America.  59 . Although the low sample size (n = 2) in Uruguay has prevented meaningful interpretations of gene flow patterns based on the mtDNA COI marker, future genome-wide SNP markers studies on these individuals may enable migration patterns and the introduction history to be interpreted with more confidence. Taken as a whole, these F ST results suggest that non-Brazilian H. armigera populations, and particularly those from Paraguay, followed by Argentina, and to a small extend that of the Rio Grande do Sul population, likely represent populations that were wholly or partially derived from alternative incursions event(s) from that detected in northern/central Brazil.
Single locus markers have clear limitations, however recent studies from multiple mtDNA markers 31 and from genome-wide SNP markers 59 have also demonstrated multiple introductions from globally diverse H. armigera populations in the invasive Brazilian populations. Taken as a whole, these findings provided evidence to support multiple independent introductions of H. armigera into the South American continent over the effect of stochastic lineage sorting. Our finding is not without precedent, with the globally invasive hemipteran whitefly Bemisia tabaci MED species (i.e., the real B. tabaci 60 ) also being shown to have been independently introduced into this southern region of South America 57,61 in addition to an earlier introduction elsewhere in Brazil 62 .
Findings that populations of H. armigera in the South American continent likely also involved multiple introduction pathways was fortuitous, because the pattern identified in this study is unlikely to be maintained over time as mixing with other populations in South America and further incursions are likely. That incursions of H. armigera from the Old World potentially involved multiple pathways will have significant implications to pest and resistance management strategies in the New World. For example, populations of H. armigera around the world have developed resistance to conventional pesticides (e.g. 31,[63][64][65][66][67][68] , and the South American populations have also arrived at least with resistance to pyrethroids 67 but also potentially with various enhanced allelochemical detoxification traits. Increasing genetic diversity is a key factor that underpins increasing invasion success 69,70 . While significant levels of genetic diversity now exist in Brazil, and propagule pressure has also therefore decreased, the genetic make-up of these populations could be further bolstered by likely unrelated source populations from other parts of the Old World and this will further complicate and challenge management strategies. As pointed out by De Barro et al. 71 , measures to restrict the recruitment of additional genetic diversity should be maintained even after establishment and spread have occurred, so as to avoid increasing the genetic diversity of damaging invasive pests.  Table 2). Preservation of specimens, gDNA extraction procedures and PCR amplification and sequencing of partial mitochondrial DNA COI gene were done following PCR conditions as detailed in Arnemann et al. 7 using the primers Noc-COI-F (5′-GCGAAAATGACTTTATTCAAC-3′) and Noc-COI-R (5′-CCAAAAAATCAAAATAAATGTTG-3′).

Sample collection and DNA extraction.
Selection of published mtDNA COI haplotype dataset. For our analysis, the following criteria underpinned our selection of mtDNA COI from publicly available sources: (A) only include published sequences from studies that had undergone critical review processes to avoid inclusion of untested haplotypes; (B) the sequences must originate from samples collected at a similar sampling time period as our material, (C) the published sequences must match our characterised partial mtDNA COI gene region, and (D) the populations must include northern/central Brazil to enable spatial comparisons to our Southern populations. Based on these criteria, sequences from three studies were chosen 7,37,38 , with the published data from Arnemann et al. 7 also here included as part of the southern populations of H. armigera, while the studies of Leite et al. 37 and Mastrangelo et al. 38 represented the most comprehensive population diversity surveys at the mtDNA COI region in Brazil and fulfilled all four criteria.  www.nature.com/scientificreports www.nature.com/scientificreports/ Sequence analysis of partial mtDNA COI gene. The programs Pregap and Gap4 within the Staden sequence analysis package 46 were used for editing DNA trace files and to assemble sequence contigs (i.e., haplotypes). Assembled mtDNA COI haplotypes were checked for premature stop codons that may be indicative of pseudogenes. Categorisation of global mtDNA COI haplotypes at the 5′ gene region, (Suppl. Table 1) and estimates of evolutionary divergence between all H. armigera individuals (i.e., between South America vs. Europe vs. Asia vs. Australia; n = 314) involved 548 bp of the mtDNA COI partial gene in the final dataset. The evolutionary divergence between haplotypes was estimated using the maximum composite likelihood model 72 , and included all (i.e., 1 st + 2 nd + 3 rd ) codon positions using MEGA6 73 . Estimates of haplotype diversity (h ± SE) and nucleotide diversity (π ± SE) were carried out using the molecular evolution software package DNA sequence polymorphism (DnaSP) v. 5.10.01 74 . Assessments for presence of PCR errors, or nuclear mitochondrial sequences (NUMTs) involved (1) examination for presence of premature stop codons in the sequenced partial mtDNA COI gene region, (2) ascertaining for conservation of amino acid substitution patterns where presence of unique amino acid changes were further assessed for conservation of biochemical properties and/or their molecule sizes, and (3) sequence characterisation of rare haplotypes as confirmed by multiple independent PCR and sequencing efforts.  Table 2), prior to performing a contingency table analysis using the χ 2 statistic to detect departures, as detailed below. The statistical test was based on the randomisation of haplotypes x locations generated according to an appropriate null model (see 'null model' below), similar to Gotelli's 75 analysis of species co-occurrence data.

Analysis of
Null and alternative hypotheses. Due to the perceived unevenness of mtDNA COI haplotype spatial patterns, a statistical test was applied to ascertain whether the diversity and frequencies of individual haplotypes had occurred independently and at random across South American sampling sites, or whether there was evidence for spatial segregation. For such a test the null hypothesis is therefore that mtDNA COI haplotypes are distributed randomly across the locations. The alternative hypothesis therefore considers at least one haplotype as being either more or less common, in at least one location, than expected due to chance alone (i.e. haplotypes were differentially distributed across locations). (1) below was used for the χ 2 statistic to detect departures from randomness:  Table 2), and Z Ran i j , is the expected value for the same table entry calculated under the null hypothesis. If certain haplotypes and/or locations were disproportionately under or over-represented, then χ Obs 2 would be expected to fall within the extreme tail of the distribution of χ Ran 2 (i.e., the calculated value of the statistic when the null hypothesis is known to be true). Although an analysis based on χ Obs 2 provides an overall test of haplotype randomness across the landscape, information on which location and/or haplotype combinations underpin any deviation from randomness will require an additional test statistic (i.e., Eq. (2)) to be applied at the scale of each haplotype/location combination:

Test statistics used. Equation
Equation (2) implies that when TS DIFF < 0.0 a haplotype would be less common than expected in location j due to chance alone, when TS DIFF > 1.0 the haplotype would be more common than expected in location j by chance, and when = . TS 0 0 VAR the observed data conformed with the null model. Analysis of TS DIFF can therefore be used to further explore primary circumstances (i.e., combinations of haplotype and location) and directionality (i.e., whether haplotypes were unexpectedly rare or common across the sites) if/when evidence of differentially distributed haplotypes across the landscapes was detected (i.e. to identify the genotypes that are unduly rare or common across the locations).
Determining the statistical significance of individual table entries is problematic because multiple comparisons are being simultaneously assessed. For example in Fig. 2, there are 11 locations × 25 haplotypes = 275 cell-level tests, of which approximately 14 would be expected to show significance simply by chance (i.e. at the 0.05 level of confidence). To control for this 'familywise' error rate the method of Benjamini and Hochberg 76 was applied, using a false discovery rate of 0.10 77 . This identified seven cell entries of interest in the full analysis (Fig. 2), and five pairwise (Brazil /non-Brazil) comparisons of interest when data were pooled across locations (Fig. 4).
Null model. The null model generated for this study involved distributing all 226 observations (i.e., the complete mtDNA COI dataset of H. armigera in South America; Suppl. Table 2) across haplotypes and locations at random, with the constraint of fixed row and column totals (i.e. the same number of observations of each haplotype, and the same number of observations for each location, are both retained in the randomised matrices). This treatment is necessary to ensure that the null model explicitly accounts for both unequal survey efforts at different sites and the overall lower frequencies of some haplotypes. The algorithm AS159 of Patefield 78  www.nature.com/scientificreports www.nature.com/scientificreports/ This null model represents a non-parametric alternative to Fisher's exact test, which is also based on fixed marginal totals, but which relies upon a chi-squared distribution to approximate the underlying exact distribution. This approximation can be poor if the data are sparse (see Suppl. Table 3), hence the decision to calculate P-values via random permutation.
Calculating P-values. P-values for χ Obs 2 were calculated by enumerating the number of times χ Ran 2 was less than or equal to χ Obs 2 (N LessEqual ), and also the number of times χ Ran 2 was greater than or equal to χ Obs 2 (N greaterEqual ), based on 10,000 randomly generated tables. The two-tailed P-value 79 is given by: Validating the null model. The statistical test was validated by creating n pseudo-observed data sets under the null model, followed by application of the test to each of the created data sets to confirm appropriate Type I and Type II error rates. At the 5% confidence level only 5% of pseudo random data sets should be statistically significant, and the expected P-value distribution across the n tests should be rectangular. If it is found that more than 5% of test results are significant at the 5% confidence level, then this indicates an elevated Type I error rate (i.e., a chance that the null hypothesis is incorrectly rejected when no real differences are actually present). Conversely, when less than the nominal number of pseudo random data sets yield a significant result it can lead to incorrect acceptance of the null hypothesis (i.e., concluding no significant difference even when one is actually present; (i.e., an elevated Type II error rate)).
The null model was validated through repeated analyses (10,000 times) using random pseudo-data reconstructed to match the n = 266 observations (Suppl. Table 2), but allowing observations to be randomly allocated to the matrices, and thereby ensuring conformity with the null hypothesis. Tables that had no observations for a given row or column during the constructing of pseudo-data were excluded. The code, the program and test data are provided as supplementary material (Suppl. Material 1).
Brazil vs. Non-Brazil. The above analyses were repeated, but combining the location data to consider just two categories of samples -the Brazilian and the non-Brazilian (Argentina, Paraguay, Uruguay) samples. Note that this analysis was carried out as a guide to assist interpretation for the observed within-table patterns of deviations, and have not impacted our main findings which was the result from the full dataset (i.e., Fig. 2) analysis.
AMOVA and FST analysis. The population pairwise F ST and AMOVA estimates were carried out using Arlequin 3.5.2.2 39 , and significance values were estimated with 10,000 permutations. For the AMOVA, populations were assigned into four groups to test molecular variation across geographical regions of Brazil (North: Roraima, Maranhão, Piauí; Central/Eastern: Bahia, Mato Grosso; South: Rio Grande do Sul, Santa Catarina, Paraná) and neighbouring countries (Argentina, Paraguay, Uruguay).