Microsatellites reveal that genetic mixing commonly occurs between invasive fall armyworm populations in Africa

Understanding the population structure and movements of the invasive fall armyworm (FAW, Spodoptera frugiperda) is important because it can help mitigate crop damage and highlight areas at risk of outbreaks or evolving insecticide resistance. Determining population structure in invasive FAW has been a challenge due to genetic mutations affecting the markers traditionally used for strain and haplotype identification: mitochondrial cytochrome oxidase I (COIB) and the Z-chromosome-linked Triosephosphate isomerase (Tpi). Here, we compare the results from COIB and Tpi markers with highly variable repeat regions (microsatellites) to improve our understanding of FAW population structure in Africa. The COIB marker showed very limited genetic diversity, whereas the TpiI4 marker showed greater diversity but very little evidence of genetic structuring between FAW populations across Africa. Microsatellites identified greater genetic diversity still, revealing a largely panmictic population of FAW alongside some evidence of genetic structuring between countries. We hypothesise that FAW use long-distance flight and prevailing winds to move frequently throughout Africa, leading to population mixing. Combined, these approaches provide important evidence that genetic mixing between invasive FAW populations may be more common than previously reported.

The fall armyworm (FAW, Spodoptera frugiperda) is a highly invasive crop pest in Africa, Asia and Australasia 1 . It is native to North America where it is largely migratory, surviving winters in southern Florida and Texas before migrating north as the temperature warms, though there is some evidence that parts of Central and South America, such as Puerto Rico, have more resident populations that rarely interact with FAW from elsewhere in America [2][3][4] . The migratory nature of FAW means that it has a strong flight ability, and some individuals can disperse as far as 300 miles before oviposition 5 . Wherever it disperses to, the effects are devastating, causing millions of tonnes of crops to be lost, resulting in huge economic losses as well as food shortages 6 .
Understanding the migratory routes of FAW is important as these can be used to predict areas at risk and give farmers warning for early intervention techniques 2,7 . Additionally, understanding gene flow can help to predict outbreaks and foresee the spread of insecticide resistance that primarily occurs through the mixing of populations, leading to resistance alleles becoming more common in populations that were previously susceptible 7,8 .
There is currently a lot known about FAW population structure and movements in its native range (North, Central and South America), enabling farmers to deal with outbreaks and minimise crop losses 2,7,9,10 . Much less is known about potential migration and population mixing in Africa, and much of the available research has been based on mitochondrial cytochrome oxidase I (COIB) and the Z-chromosome-linked Triosephosphate isomerase (Tpi) haplotypes 9,11,12 . There are two Tpi markers used for FAW, TpiE4 that is based on variation in exon 4 which can differentiate between the corn and rice strains, and TpiI4 that is based on intronic variation and has six recorded haplotypes (five corn, one rice) that can differentiate between strains and populations 9,11,12 . However, there is some disagreement between COIB and TpiE4 haplotypes in FAW in Africa for strain identification, with evidence suggesting that the COIB haplotypes are less reliable in distinguishing between invasive populations

Results
Strain identification and haplotyping using COIB and TpiE4 markers. The COIB marker was analysed using an enzyme-based PCR assay 19 , and the TpiE4/TpiI4 product was sequenced using Sanger sequencing. The expected strain discordance between the COIB and TpiE4 markers was observed in all countries, with the markers only reporting the same strain in 19% of samples (see Supplementary Table S1 online). In all countries, both markers identified larvae of the corn and rice strain (Fig. 1). Overall, the COIB marker most frequently reported samples as the rice strain (mean ± S.E. = 72% ± 0.09), whereas the TpiE4 marker reported them as the corn strain (mean ± S.E. = 92% ± 0.02). Both markers showed very similar strain frequencies across Malawi, Rwanda, Sudan and Zambia.
In Ghana, unlike in the other countries, the COIB marker reported more samples as corn strain (63%) than as rice strain, and there was significant variation between countries in the distribution of the corn strain based on the COIB marker (χ² = 53.17, d.f. = 4, P < 0.001). However, when using the TpiE4 marker, the proportion reported as the corn strain was similar across all five countries (χ² = 1.16, d.f. = 4, P = 0.885). Those larvae identified as corn strain by the COIB marker in Ghana (N = 45), Rwanda (N = 16), Sudan (N = 6) and Zambia (N = 8) were sequenced using Sanger sequencing to determine the haplotype. All larvae were identified as CSh4, suggesting very little genetic differentiation based on COIB haplotypes.
In all countries, the intronic TpiI4 marker identified both corn and rice strain FAW, with the corn strain (82-99%) being more common compared to the rice strain (1-18%) (Fig. 1A). The most common haplotype in every country was TpiCa1a, and the rarest was TpiCa2C (Fig. 1B). A novel rice haplotype (TpiRa1b) was identified in samples from Malawi, Rwanda and Sudan, where no larvae were of the previously recorded rice haplotype (Fig. 1B,C). The greatest number of different haplotypes was observed in Ghana, with four different haplotypes identified (TpiCa1a, TpiCa2a, TpiCa2C, TpiRa1a). Heterozygotes were recorded in all countries, however due to ambiguity in which haplotype combinations these were, they were only identified as heterozygotes (Fig. 1B). An amova was carried out on the TpiI4 alignment based on genetic distances and showed significant differences between the six countries, however the total variance explained by differences between countries was low, with most of the genetic variation being between individuals within countries, which would suggest a largely panmictic population (Table 1). To further check for genetic structuring based on TpiI4 markers, a PCA was carried out using the genetic distance between sequences and this showed clustering based on strain identification, but no evidence of structuring between the six countries ( Supplementary Fig. S1A online).

Microsatellite locus information.
Microsatellites were amplified by PCR individually, and then genotyped on an ABI3500 sequencer. All eight microsatellites successfully amplified, and the number of alleles found ranged from 3 to 13 (Table 2). Twenty-one individuals (23%) had missing allele data, ranging from 1 to 3 loci per individual, with an average of 0.34 (Table 2). Null allele frequencies were high for four loci: Spf1502, Spf343, Spf997 and Spf670 (Table 2). Seven of the eight microsatellites significantly deviated from Hardy-Weinberg equilibrium (HWE) when all individuals were considered together (Table 2). However, some of these microsatellites were in HWE at the within-country level (see Supplementary Table S2 (Table 3). In all three measures tested, a value of 0 suggests little genetic differentiation (panmixia) and 1 suggests high levels of segregation. The range of the three measures across all loci was 0.03 to 0.14 (Table 3). There was also evidence of low genetic variance based on Fst between countries at each locus tested (Table 3). Pairwise Fst values between the six countries ranged from −0.02 to 0.08 (mean ± S.E. = 0.03 ± 0.01), suggesting high levels of population mixing (Supplementary Table S4 online). The level of inbreeding occurring within populations can be inferred from Fis; however, this varied between loci, with high levels suggested for some loci (e.g. Spf1502 and Spf670) but low levels for others (e.g. Spf1592 and Spf789) (Table 3). These results suggest that FAW in Africa may frequently mix with FAW from other countries, with very little population differentiation occurring. Population differentiation was further analysed using an amova to determine if the genetic distance between individuals varies by country, location within country or sampling year (see Supplementary Table S5 online). There was no significant difference between samples from locations within countries (F(4, 81) = 1.17, P = 0.120), or between sampling years (F(1, 81) = 0.96, P = 0.543). The amova suggested, however, that FAW from each country were genetically different to FAW from other countries (F(5, 74) = 1.86, P = 0.001).

www.nature.com/scientificreports/

Table 1. Results of an amova to analyse differences between the six countries based on TpiI4. P value was calculated using a randomization test with 999 permutations.

Table 3. Genetic differentiation measures for FAW in Africa based on the eight microsatellites. In all three measures tested, a value of 0 suggests very little genetic differentiation (panmixia) and 1 suggests high levels of segregation. All measures are based on Hs (heterozygosity within populations) and Ht (heterozygosity without population structure). F-statistics represent genetic variance in a subpopulation compared to the whole (Fst values closer to 1 suggest high levels of differentiation between populations) or in a subpopulation compared to individuals within that subpopulation (Fis values close to 1 suggest high levels of inbreeding in populations). Negative values of Fst and Fis should be interpreted as 0 and suggest very low differentiation of populations (Fst) or very low chance of inbreeding (Fis). Confidence intervals of Fis based on bootstrapping are also provided.

Table 3 columns: Hs, Ht, Nei's Gst, Hedrick's Gst, Jost's D, Fst, Fis, CI Fis (−), CI Fis (+).
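The three differentiation measures reported in Table 3 are all functions of Hs and Ht. As a rough illustration only, the sketch below uses the standard textbook forms of Nei's Gst, Hedrick's standardised Gst and Jost's D; it omits the sample-size corrections that an R package such as MMOD may apply, and the function names and input values are hypothetical rather than taken from the study.

```python
def nei_gst(hs, ht):
    """Nei's Gst: share of total heterozygosity found between populations."""
    return (ht - hs) / ht


def hedrick_gst(hs, ht, n_pops):
    """Hedrick's G'st: Gst divided by its maximum possible value given Hs."""
    gst_max = ((n_pops - 1) * (1 - hs)) / (n_pops - 1 + hs)
    return nei_gst(hs, ht) / gst_max


def jost_d(hs, ht, n_pops):
    """Jost's D: allelic differentiation, independent of within-population diversity."""
    return ((ht - hs) / (1 - hs)) * (n_pops / (n_pops - 1))


# Hypothetical values for a single locus sampled in six populations
# (illustrative only; these are not results from the study).
example_hs, example_ht, example_n = 0.70, 0.72, 6
example_measures = (
    nei_gst(example_hs, example_ht),
    hedrick_gst(example_hs, example_ht, example_n),
    jost_d(example_hs, example_ht, example_n),
)
```

All three functions return 0 when Hs equals Ht, which corresponds to the panmixia end of the scale described in the Table 3 legend.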
As Country was the only significant factor influencing FAW population differentiation, a second amova was carried out to determine genetic variation between and within countries. This suggested significant differences between the six countries, however the total variance explained by differences between countries was low, and most of the genetic variation was found within individuals which would suggest a largely panmictic population (Table 4). To further check for genetic structuring based on the microsatellite markers, a PCA was carried out using the genetic distance between individuals and this showed no evidence of structuring between the six countries ( Supplementary Fig. S1B online).
Population clustering based on microsatellites. Clustering was carried out using an admixture model in STRUCTURE, with the number of clusters selected using Delta K (Evanno method) and LnPr(K) methods 25 . Based on Delta K there were three genetically distinct clusters in FAW (Fig. 2A,B). Based on LnPr(K), the most likely number of clusters was five (Fig. 2C,D). In both 3 and 5 cluster scenarios, FAW from Sudan and Zambia were more genetically isolated from the four other countries, though some individuals from the other four countries do show similar assignment patterns suggesting population mixing does occur between all countries. Samples from Ghana, Kenya, Malawi and Rwanda appear very similar to each other, suggesting high levels of population mixing between these countries (Fig. 2C,D, clustering with two and four clusters is shown in Supplementary Fig. S3 online). Based on the similarities between the structuring results for 3 and 5 clusters, and that the LnPr(K) begins to plateau after K = 3, we propose that population structure of FAW in Africa is best described by three genetic clusters. To identify potential substructure in FAW from the four countries exhibiting evidence of genetic similarity (Ghana, Kenya, Malawi and Rwanda), a separate analysis was performed in STRUCTURE. This identified 3 genetic clusters as the most likely scenario, based on both Delta K and LnPr(K), and further confirmed high levels of mixing between the countries, with no strong evidence of substructure identified (Fig. 2E-G).
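For readers unfamiliar with the Delta K criterion, the following is a minimal sketch of how it can be computed from replicate STRUCTURE runs. It assumes the simplified form based on the mean log-likelihood per K (absolute second difference of the means divided by the standard deviation across runs at K); the function names and the log-likelihood values in the usage note are hypothetical and are not output from this study.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Delta K per K from a dict mapping K -> list of LnP(D), one value per run.

    Delta K is undefined for the smallest and largest K tested, because it
    needs the mean log-likelihood at both K - 1 and K + 1.
    """
    avg = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in avg or k + 1 not in avg:
            continue
        sd = stdev(ln_prob[k])  # sample standard deviation across runs
        if sd == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        result[k] = second_diff / sd
    return result


def best_k(ln_prob):
    """K with the largest Delta K."""
    scores = delta_k(ln_prob)
    return max(scores, key=scores.get)
```

Called with a hypothetical input such as `{1: [-100.0, -102.0], 2: [-80.0, -82.0], 3: [-50.0, -52.0], 4: [-48.0, -50.0]}`, `best_k` picks the K at which the gain in log-likelihood slows most sharply, which is the same plateau logic used in the text to compare the Delta K and LnPr(K) results.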
STRUCTURE has been shown to miss some subdivision when clustering individuals 26 ; therefore, population clustering analysis was also carried out by identifying clusters de novo (i.e., with no prior population information provided) and then using Discriminant Analysis of Principal Components (DAPC). This approach determines the number of possible clusters by running successive K-means clustering and selecting the most suitable number of clusters based on the Bayesian Information Criterion (BIC). The number of PCs to keep was calculated to be 7 using the a-score 26 . This method identified three clusters as the best model based on BIC (BIC = 120.67) in FAW (Fig. 3A). Based on three clusters, FAW from Sudan were more genetically different to populations from elsewhere in Africa, with no individuals assigned to cluster 3, whereas cluster 3 individuals were found in all other countries (Fig. 3B-D). The three clusters highlighted similarities between the adjacent countries of Zambia and Malawi, with 50% and 44% of individuals respectively from these countries assigned to cluster 3, and similarities between Kenya and Rwanda, with 23% and 19% of individuals assigned to cluster 3 respectively (Fig. 3B-D). Ghana showed most similarities with Kenya and Rwanda, with 25% of individuals assigned to cluster 3 (Fig. 3B-D).

Discussion
This study is the first to use microsatellites to determine FAW population mixing and genetic diversity in Africa. Considering the limited genetic diversity and unreliability of the COIB and TpiE4 haplotypes for strain identification and the potential for confusion caused by corn and rice strain hybrids [11][12][13][14]27,28 , we sought to quantify the degree of population structuring in FAW in Africa using a more robust microsatellite approach. This revealed that microsatellites had higher levels of genetic diversity compared to the COIB and TpiE4 markers, revealing that FAW in Africa is largely a panmictic population.
The previously reported discordance between the TpiE4 and COIB markers for strain identification was mirrored in this study, with very little agreement occurring between the markers. Furthermore, based on the COIB haplotypes it was not possible to determine genetic differentiation between the countries as only COIB CSh4 was found. The intronic TpiI4 marker showed more variation between the individuals, however, the vast majority of larvae were TpiCa1a, which is in line with previous studies investigating FAW in Africa and Asia 12,13,28 . Previous work based on these markers in Africa concluded that there were significant differences between some African countries with widely separated populations being genetically distinct 28 . Whilst the findings here using the TpiI4 marker do support some evidence of genetic variation between countries, it was low, suggesting more of a panmictic population of FAW across Africa based on this marker. The low genetic variability observed with the COIB marker, and both TpiE4 and TpiI4 markers, limits the analyses that can be carried out and reduces the likelihood of genetic differentiation between countries being detected. By using highly variable microsatellites, we were able to overcome this challenge to determine genetic differentiation between FAW from different countries in Africa, as well as some similarities, suggesting the possible presence of both resident and migratory populations of FAW throughout the continent.

Table 4. Results of an amova to analyse differences between the six countries in this analysis based on the microsatellites. P value was calculated using a randomization test with 999 permutations.
Most of the microsatellites in this study were out of Hardy-Weinberg equilibrium (HWE), whereas in previous population genetics studies using microsatellites in FAW from Paraguay and Brazil, no loci were found to be out of HWE 8 . However, the deviation observed in the present study is to be expected in invasive FAW populations, which have been through a tight bottleneck, given that they probably originated from a small source population in Africa, providing further evidence of a common origin for FAW which then subsequently spread across the continent 9,27 . The microsatellites also showed evidence of a genetic bottleneck and loss of diversity in the African FAW compared to populations in Texas, Mississippi, Puerto Rico, and Brazil. For example, previously reported allele sizes for locus Spf997 were in the range of 95 to 139 8,17 , whereas in this study the allele size range for the same locus was 79 to 113. The evidence of this genetic bottleneck throughout Africa offers more evidence of a single origin population instead of multiple introduction events. It is likely that if multiple incursion events had occurred then the microsatellite size ranges observed here would have matched more closely with those previously recorded.
Although FAW in Africa are likely to have undergone a population bottleneck at the time of invasion, the range of alleles for each locus identified in this study (3 to 13) was similar to that previously reported from Paraguay and Brazil (3 to 15) 8,18 . Based on this range of alleles, previous work found genetic differentiation between northern and southern FAW populations across Brazil and Paraguay, as well as gene flow across all populations sampled 8 . This indicates that despite a recent bottleneck there is still sufficient genetic diversity in microsatellite regions to enable population genetic studies of FAW in Africa.
Populations from the six countries (Kenya, Ghana, Malawi, Rwanda, Sudan and Zambia) did not show strong signs of population differentiation when using traditional measures (Nei's GST, Hedrick's GST and Jost's D). This indicates that these populations mix frequently, and no strong genetic structure is evident. This was supported by the amova which showed that most of the genetic variance was occurring between individuals. This lack of population differentiation between countries provides evidence consistent with FAW undergoing long distance migratory flights in Africa creating a panmixia of populations. Additionally, there was no evidence of genetic differentiation between samples from different sampling locations within the same country, confirming that populations are mixing within countries. This has significant consequences for the evolution and spread of insecticide resistance, as resistance alleles can spread rapidly throughout each country and across Africa. This is an important finding as insecticide resistance (organophosphate and pyrethroid resistance) has already been reported in FAW in China, so is highly likely to be present in Africa 14 . Considering the key role that long-distance, migratory flights played in the rapid spread of insecticide resistance both within and across continents in the invasive cotton bollworm (Helicoverpa armigera) [29][30][31][32] , it is important to consider the implications of frequent, long-distance flights that seem to be occurring in FAW.
The evidence of panmixia contrasts with previous results based on the COIB and Tpi markers, which analysed FAW samples from across Africa and found evidence of genetic differentiation between geographically widespread countries 9,28 . Further investigation with microsatellites using clustering approaches show that whilst the countries included in this study are similar genetically (e.g. Kenya, Rwanda, and Ghana), others are more differentiated (e.g. Sudan). We conclude from this that genetic mixing of FAW populations is occurring widely across Africa, however, there are some FAW possibly forming resident and partially segregated populations, as seen in parts of South America and the Caribbean 7 . Alternatively, the possibility that FAW have not been in Africa long enough to evolve population differentiation should also be considered.
Previous reports based on COIB and Tpi suggested a possible east-west divide between FAW populations 9 , or no clear pattern of division between populations 28 . Our study using microsatellites found that the two most genetically distinct populations are the most northerly and most southerly populations. African countries located further south (Zambia, Malawi) showed more similarities to each other compared with countries further north (Kenya, Rwanda, Ghana and Sudan) (e.g., fewer individuals were assigned to cluster 3 in the north compared to the south). This pattern of genetic separation coincides with the known migratory routes of the congeneric African armyworm (Spodoptera exempta) in eastern Africa, which follow the movement of the dominant winds each season, typically moving moths towards the north-west from Kenya and northern Tanzania, and a more south-westerly movement across southern Africa from Malawi 33,34 . This is also aligned with the movement of the inter-tropical convergence zone (ITCZ), with the wind direction (and hence seasonal migration) being broadly south-easterly north of the equator and north-easterly south of the equator 34 . Based on the high levels of mixing between FAW populations alongside this evidence of some genetic structuring between northern and southern populations, it is hypothesised that FAW may also follow the movement of the dominant winds if they are migratory in Africa as, like many other insects, they rely on wind to support high-altitude long-distance flights 2,5,35,36 .
This study highlights the benefits of using multiple approaches to study genetic diversity, with evidence presented for both widespread genetic mixing between populations alongside some segregation between countries. This is most likely due to a proportion of FAW adults undergoing long-distance migratory flights whilst the remaining FAW form more sedentary, resident populations. These results provide important evidence that genetic mixing between FAW populations throughout Africa may be more common than previously reported. This has important consequences for FAW management when considering factors such as the spread of insecticide resistance and crop infestations across borders.

Methods
DNA extraction. DNA was extracted from samples following the standard protocol for tissue in the Qiagen DNeasy Blood and Tissue kit. DNA was stored in buffer AE at −20 °C. The protocol was altered slightly for extracting DNA from larvae collected in Sudan: 200 μl ATL and an additional 200 μl 1 × SSC were added before incubation, and the DNeasy Spin Column was centrifuged at 13,000 RPM.
Strain identification and haplotyping using COIB and Tpi markers. DNA was amplified for strain identification using COIB (F: 5′TAC ACG AGC ATA TTT TAC ATC, R: 5′GCT GGT GGT AAA TTT TGA TATC 27 ) and TpiI4/TpiE4 (F: 5′ATG ATT AGG ACA TCG GAG C, R: 5′ATG TAA TCC AGT CAA TGC CTA 37 , modified by de Boer). Cycling parameters for both COIB and TpiI4/TpiE4 were 94 °C for 10 min, 33 cycles of 94 °C for 1 min, 55 °C for 1 min and 72 °C for 1 min, and then a final extension of 72 °C for 5 min. Following COIB amplification, the product was incubated at 37 °C for 2 h with 1 µl EcoRV restriction enzyme and 2 µl NEBuffer to determine FAW strain. EcoRV cuts the amplicon at position 1182 bp if the sample is from the rice strain, resulting in two visible bands, and does not cut for the corn strain, resulting in one larger band when the product is run by gel electrophoresis (Table 3). There are five known haplotypes of the COIB marker: corn h1 (A₁₁₆₄A₁₂₈₇), corn h2 (A₁₁₆₄G₁₂₈₇), corn h3 (G₁₁₆₄A₁₂₈₇), corn h4 (G₁₁₆₄G₁₂₈₇) and rice (T₁₁₆₄A₁₂₈₇) 27

plement the reaction to 10 µL. The sequencing reaction was preincubated for 1 min at 96 °C followed by 25 cycles of: 10 s at 96 °C; 5 s at 50 °C; 4 min at 60 °C. Excess incorporated dye-terminators were removed using EDTA/ethanol precipitation before resuspending in 13 µL Hi-Di® formamide and capillary gel electrophoresis on an ABI 3500 Genetic Analyzer. Strain identification was carried out using the Tpi marker by Sanger sequencing following the same protocol as for COIB, based on nucleotide variation in exon 4 (TpiE4), where the corn strain has base C₁₈₃, the rice strain has base T₁₈₃ and hybrids (males only) have C/T₁₈₃ 27 . Tpi intron 4 (TpiI4) was used to determine TpiI4 haplotypes based on 18 previously recorded highly variable positions 12,37 . For sequencing analysis, raw sequences were assembled and aligned using ClustalW in BioEdit 38,39 .
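The haplotype and strain definitions above amount to a lookup on the bases observed at three diagnostic positions. A minimal sketch of that lookup is given below; the function and variable names are hypothetical, and the treatment of the IUPAC code Y as a C/T heterozygote call is an assumption about how a base caller might report hybrids rather than a detail stated in the text.

```python
# COIB haplotypes keyed by the bases at positions 1164 and 1287,
# as listed in the text.
COIB_HAPLOTYPES = {
    ("A", "A"): "corn h1",
    ("A", "G"): "corn h2",
    ("G", "A"): "corn h3",
    ("G", "G"): "corn h4",
    ("T", "A"): "rice",
}


def coib_haplotype(base_1164, base_1287):
    """Return the COIB haplotype, or 'unknown' for an unlisted combination."""
    key = (base_1164.upper(), base_1287.upper())
    return COIB_HAPLOTYPES.get(key, "unknown")


def tpie4_strain(base_183):
    """Return the strain implied by the TpiE4 base at exon 4 position 183."""
    call = base_183.upper()
    if call == "C":
        return "corn"
    if call == "T":
        return "rice"
    # Assumption: a heterozygous call may be written C/T, T/C or IUPAC Y.
    if call in {"C/T", "T/C", "Y"}:
        return "hybrid"
    return "unknown"
```

Keeping unlisted combinations as "unknown" rather than forcing a call reflects the way heterozygotes with ambiguous haplotype combinations were left unassigned in the results.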
Statistical analysis of strain and haplotype distributions for TpiE4 was carried out in R using a Poisson GLM followed by a Chi² test using the anova function. Based on the TpiI4 haplotypes, an amova was carried out using the POPPR package as this gave details of the variance explained within and between samples and populations (Kamvar et al. 2014), from which a P value was calculated using a randomization test with 999 permutations. The genetic distance computed for the amova was also used for Principal Components Analysis (PCA) using the prcomp function in R.
Microsatellite amplification. Eight highly variable microsatellites were selected for amplification because they showed the greatest diversity in FAW in previous studies 8,17 ; the microsatellite primer details are shown in Table 5.

Microsatellite genotyping. Fragment genotyping was carried out on an ABI3500 sequencer. Each reaction was composed of 11 µl HiDi Formamide, 0.4 µl Rox500 size standard and 1 µl PCR product (Spf343, Spf997 and Spf1706) or 0.5 µl PCR product (Spf1592, Spf670, Spf789, Spf918, Spf1502). Genotyping results were viewed on Thermo Fisher Connect™. The threshold for successful amplification was > 100 RFU, and for heterozygotes the minor peak was > 50% of the major peak. Alleles were called based on size measurements, and peaks were determined to be the same allele if their size measurements were within 0.5 nucleotides of each other (for example, sizes of 150.2 and 150.6 were both classed as 150).
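The calling rules described for genotyping can be sketched as two small functions: one applying the RFU and peak-ratio thresholds to a sample, and one grouping fragment sizes that fall within 0.5 nucleotides of each other. This is an illustrative sketch only. The function names are hypothetical, and labelling each group by the integer part of its smallest size is an assumption chosen to reproduce the 150.2/150.6 example, not a rule stated in the text.

```python
import math


def call_genotype(peaks, min_rfu=100, het_ratio=0.5):
    """Call a genotype from (size, height) peaks for one sample at one locus.

    Peaks at or below min_rfu are ignored. A second peak is accepted as a
    heterozygote allele only if it exceeds het_ratio times the major peak.
    Returns a tuple of two sizes, or None if no peak passes the threshold.
    """
    kept = sorted((p for p in peaks if p[1] > min_rfu),
                  key=lambda p: p[1], reverse=True)
    if not kept:
        return None
    major = kept[0]
    if len(kept) > 1 and kept[1][1] > het_ratio * major[1]:
        return tuple(sorted((major[0], kept[1][0])))
    return (major[0], major[0])


def bin_alleles(sizes, tol=0.5):
    """Map each measured size to an allele label.

    Sizes are sorted and grouped so that every member lies within tol of the
    smallest size in its group; the label is the integer part of that size
    (an assumption made for this sketch).
    """
    labels = {}
    anchor = None
    for size in sorted(sizes):
        if anchor is None or size - anchor > tol:
            anchor = size
        labels[size] = math.floor(anchor)
    return labels
```

For instance, `bin_alleles([150.2, 150.6, 152.1])` places 150.2 and 150.6 in the same allele class and 152.1 in a separate one, and `call_genotype` returns a homozygous call when the second-highest peak is below half the height of the major peak.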
Microsatellite analysis. Samples with fewer than 5 microsatellites amplified were removed from the analysis. Microsatellite analysis was carried out in R (v. 4.0.3) 40 . Hardy-Weinberg equilibrium was tested using the PEGAS R package 41 . The frequency of null alleles was determined using the Chakraborty et al. (1994) formula through the POPGENREPORT R package 42 . Heterozygosity and F-statistics were calculated using the HIERFSTAT package 43 . Genetic differentiation was measured using the MMOD package (Gst and Jost's D) 44 . Linkage disequilibrium was calculated using an association index in the POPPR package 45 and by composite linkage disequilibrium using GenePop (v 4.7) 46 . An Analysis of Molecular Variance (AMOVA) to determine population differentiation based on genetic distance was carried out for all variables using the adonis2 function from the VEGAN package in R 47 . Country was then examined using an amova with the POPPR package, as this gave details of the variance explained within and between samples and populations 45 , from which a P value was calculated using a randomization test with 999 permutations. The genetic distance computed for the amova was also used for Principal Components Analysis (PCA) using the prcomp function in R. To identify population clusters, a Discriminant Analysis of Principal Components (DAPC) was carried out after clusters were identified de novo (i.e., no prior location information) using the find.clusters function in the ADEGENET package in R 26 . The optimum number of clusters (K) was selected based on BIC (Fig. 3A). The number of PCs retained in the DAPC was 7; this was determined using the a-score with the optim.a.score function in the ADEGENET package in R (Supplementary Fig. S4 online). STRUCTURE (v. 2.3.4) was also used to identify population clusters, using an admixture model with 100,000 burn-in and 100,000 reps for K = 1 to K = 15 with 15 iterations per K 48 .
STRUCTURE results were visualised using STRUCTURE HARVESTER 49 , CLUMPP 50 and DISTRUCT 51 .
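The per-locus quantities used in the microsatellite analysis (heterozygosity, Fis and the null allele frequency) can be illustrated with a short sketch. It uses the simple, uncorrected expected heterozygosity and assumes that the null allele estimate referred to in the text takes the form (He − Ho)/(He + Ho), which is the estimator commonly attributed to Chakraborty et al.; the packages named above may apply sample-size corrections that are omitted here. The function name and input layout are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarise one locus from a list of (allele_a, allele_b) genotypes.

    Missing genotypes are given as None and skipped. Returns observed
    heterozygosity (ho), expected heterozygosity (he), the inbreeding
    coefficient (fis) and an estimated null allele frequency (null_freq).
    """
    typed = [g for g in genotypes if g is not None]
    if not typed:
        raise ValueError("no genotyped individuals at this locus")
    n = len(typed)
    counts = Counter(allele for g in typed for allele in g)
    total = 2 * n  # two allele copies per diploid individual
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    ho = sum(1 for a, b in typed if a != b) / n
    fis = 1.0 - ho / he if he > 0 else 0.0
    # Assumed form of the Chakraborty-style null allele estimator.
    null_freq = (he - ho) / (he + ho) if (he + ho) > 0 else 0.0
    return {"ho": ho, "he": he, "fis": fis, "null_freq": null_freq}
```

A locus with a deficit of heterozygotes relative to expectation gives a positive Fis and a positive null allele estimate, which is why loci with high null allele frequencies also tend to show high apparent inbreeding.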