Crop protection involves practices to manage the plant diseases, weeds and pests that damage agricultural crops and forestry. It plays a key role in safeguarding global crop production against losses, thereby helping to meet the increasing demand for food caused by a growing human population1. Cassava (Manihot esculenta Crantz) is an important root crop, which is drought tolerant and able to grow under suboptimal conditions such as low soil fertility2. It provides food for about 800 million people worldwide3. Cassava has proven to be an invaluable food security crop, particularly to smallholder farmers in Sub-Saharan African countries4. Cassava production, however, has been decreasing, particularly in East Africa, despite the increasing area under cultivation5. The main cause of this trend is two major viral diseases, cassava mosaic disease (CMD) and cassava brown streak disease (CBSD). These are both transmitted by their whitefly vector, Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae)6.

CMD in Africa is caused by at least seven species of single-stranded DNA cassava mosaic begomoviruses (CMBs) (family Geminiviridae: genus Begomovirus)7,8 and CBSD by two cassava brown streak ipomoviruses (CBSIs) (family Potyviridae: genus Ipomovirus)9,10,11,12. In addition to being transmitted by B. tabaci, both CMBs and CBSIs are spread by farmers, through the use of virus-infected stem cuttings. The two diseases can occur singly or in dual infections in cassava and the damage can be severe. Yield losses of about 47% from CMD infected fields were recorded in eastern and central African cassava-growing areas13, while in other fields losses of up to 70% were reported due to CBSD14.

B. tabaci has recently been considered as a complex of species, including at least 35 morphologically indistinguishable species15. Members of the different species are found on more than 500 plant-host species in 74 families, which includes crops that are crucial to food security, such as cassava15,16,17. These pests affect plants by direct phloem feeding by nymphs and adults on crop foliage or production of honeydew, which encourages the growth of sooty mould fungus on leaves18,19. However, by far the greatest impact is caused by the spread of >350 plant viruses including CMBs and CBSIs20,21,22. Epidemics of CMD and CBSD have been reported in several parts of Eastern and Southern Africa since the early 1990s and these were associated with unusually high numbers of whiteflies on cassava21,23,24. The presence of these ‘superabundant’ populations has been responsible for the rapid spread and development of two disease epidemics25, but the reason(s) for their upsurges remain uncertain26.

Five putative species of B. tabaci (described by their mtCO1 marker) have been found colonising cassava in sub-Saharan Africa (SSA) and these were named serially SSA1 to SSA516,25,27,28, with several sub-groups reported for some species. The SSA1 species is widely distributed in Africa; SSA2 is mostly found in the eastern, southern and central areas of Africa as well as in the south of Spain; while SSA3 and SSA4 have been reported in Cameroon and the Central African Republic; SSA5 has only been described in the Ivory Coast and South Africa27,28,29. SSA2 was hypothesised to be an invasive species associated with the CMD epidemic in Uganda in the 1990s, but has subsequently been rarely found30,31. In addition, phylogenetically distinct populations have been described within the SSA1 species, known as SSA1 sub-groups 1 and 2 (SSA1-SG1 and SSA1-SG2, respectively), which were also associated with the CMD and CBSD epidemics25,26,32.

Analysis of genotypes and genetic diversity of B. tabaci species is of crucial importance as it can facilitate selection of appropriate management control measures33. Analysing the nuclear genetic diversity of whitefly populations had been performed in the past using several types of markers, among which neutral-codominant markers such as microsatellites gave reliable results. Those markers allowed to distinguish B tabaci species and populations within those species, including Med Q1 and ASl34, Med and MEAM135,36,37, Med Q1/Q238, or between a wide range of species worldwide39. Nevertheless, those markers had not been commonly used to untangle population structure among SSA species in Sub-Sahara Africa.

The objectives of the current study, therefore, were to understand: (i) whitefly species’ distributions in cassava fields in Uganda in 1997 during the initial stages of the CMD epidemic and compare these with the high whitefly populations still present in 2017; (ii) the genetic pattern (diversity and genetic structuring) of population dynamics over two decades in rapidly evolving B. tabaci species. To meet these objectives, we estimated the genetic diversity and population structuring of the whitefly species by sequencing the partial mitochondrial cytochrome oxidase I (mtCOI) barcoding region and 13 nuclear markers from specimens collected in 1997 and 2017 from the same geographical location.


Whitefly abundance and CMD and CBSD symptoms

Whitefly abundance varied between fields from <10 to over 500 adults per plant. Fields with over 100 adult whiteflies per plant were considered superabundant populations. Based on this criterion, eight (61.5%) of the 13 fields visited in 2017 had superabundant whitefly populations, among which two fields did not show CMD and CBSD symptoms (Table 1). The distribution of CMD within fields was higher (69.2%) than that of CBSD (46.2%) with a maximum leaf severity score of 4 detected in two fields. Moreover, 38.5% of the fields were dually infected with both viruses. Furthermore, during the 1997 survey, three fields (42.9%) had superabundant whitefly populations (Table 1) and one of these had plants exhibiting severe CMD symptoms. Three fields had plants with CMD symptoms, despite a low whitefly number (<10 per plant). No CBSD symptoms in any plants were recorded during 1997.

Table 1 Location and information of adult whiteflies collected in Uganda.

Phylogenetic analysis

The partial mtCOI gene of 665 whiteflies was sequenced, of which 219 were from the 1997 collection (110 eggs, 79 nymphs, 28 pupae and 2 adults) and the remaining 446 were all adults from the 2017 collection. After manual checking and editing for errors, the mtCOI sequences were trimmed to different lengths: 700 bp (n = 251), 657 bp (n = 219), 500 bp (n = 112) and 300 bp (n = 80) depending on the sequence quality obtained. Despite the shorter sizes of some sequences (300 bp), it was possible to differentiate between putative species and SSA1 sub-groups, as well as to use the shorter sequences as species tags in further analysis. To increase robustness of the phylogenetic tree, a total of 470 sequences that were at least 651 bp long, together with an additional 12 sequences from the GenBank, were used. Bayesian phylogenetic analysis used to generate the tree divided our sequences into four main clusters (SSA1(-SG1 -SG2 and -SG3), SSA2, Mediterranean (Med) and Bemisia afer) supported by high posterior probability values (>0.9) (data not shown). The specific group of haplotypes within SSA1 named SSA1- SG1 were found to be dominant with 183 individuals (84%), followed by SSA2 (n = 22, 10%), whereas others belonged to the other haplotype groups named SSA1-SG2 (n = 2, 0.9%) and SSA1-SG3 (n = 1, 0.5%), and to B. afer individuals (n = 10, 4.6%) from the 1997 collection (Supplementary Fig. S1). The 2017 samples revealed both SSA1-SG1 (n = 126, 50.2%) and SSA1-SG2 (n = 122, 48.6%) as the dominant groups over all observed genetic clusters. The other species characterised were Med and B. afer, which together represented 1.2% of the total (Supplementary Fig. S2).

Haplotype diversity results revealed 12 haplotypes from B. tabaci species within the combined dataset of longest sequences (651 bp), comprising 470 (Table 2) individuals (219 and 251 from 1997 and 2017 respectively); however, only two of these were observed in both 1997 and 2017 (Fig. 1). Ten haplotypes were observed in 1997, among which five were observed for SSA1-SG1, with the largest group containing 176 individuals (84.2%) represented as P319F in the phylogenetic tree (Fig. 1). This haplotype shared 100% identity with the previously identified sequence of KM37789940 and KX57078541, both from Uganda. The remaining four SSA1-SG1 haplotypes contained nine individuals (4.3%). Two haplotypes were observed for SSA1-SG2. Apart from SSA1, two haplotypes were found for SSA2 (n = 22, 10.5%) from 1997 collected samples. In 2017, five haplotypes were found from B. tabaci species, including one haplotype for SSA1-SG1 (n = 126, 50.6%) represented as P10G3 (Fig. 1). These individuals shared 100% identity with the majority of SSA1-SG1 found in 1997. In addition, three SSA1-SG2 haplotypes were observed, among which 117 individuals (47%) shared 100% identity with the KM37789940 and KX57079041 reference sequences recognised in Malawi and Uganda obtained from GenBank, and the other two SSA1-SG2 haplotypes contained five individuals (2%). One haplotype (n = 1, 0.4%) of the Med species was found. Apart from 12 haplotypes of B. tabaci species, two other B. afer haplotypes were also observed. All the new haplotypes were submitted to GenBank and were assigned accession numbers from MK360160 to MK360177 (Table 2).

Table 2 Haplotype distribution within fields F1–F14 (Table 1) and two different years (1997 or 2017).
Figure 1
figure 1

Rooted posterior probability phylogenetic tree generated by MrBayes using the Markov chain Monte Carlo method for all the different mtDNA COI haplotype sequences (651 bp) of 1997 and 2017 samples together (n = 14) with reference sequences (n = 12, in bold) obtained from GenBank for comparison. Numbers associated with nodes indicate the posterior probability for those nodes. Horizontal bars represent genetic distances as indicated by the scale bar, vertical distances are arbitrary.

Nuclear genetic analysis

A total of 594 out of 662 individuals (407 and 203 from the 2017 and 1997 samples, respectively) were successfully genotyped. All loci were checked with Microchecker42 and no PCR artefacts linked to large allele drop-out or stuttering were detected. All individuals and loci with missing data greater than 20% and/or 25% of null alleles were discarded from the dataset, meaning that 68 individuals and one loci (CIRSSA41) were removed. The number of alleles per locus from the 13 microsatellite markers over the whole dataset ranged from 5 to 52. The highest polymorphism observed was for the P5 locus and the lowest was for the CIRSSA2 locus. The mean null allele frequency for all loci and populations was 0.128 but ranged from 0.01 to 0.56 (from CIRSSA6 to CIRSSA41, respectively) (Table 3).

Table 3 Characteristics of loci used for nuclear analysis.

The population genetic diversity indices were calculated in SSA1 and SSA2 species separately. The results from SSA1 species showed the mean alleles richness over all loci, per field, ranged from 4.22 (n = 14; 2017) to 6.66 (n = 57; 1997) with the highest mean values observed in the 1997 collection (Supplementary Table S1). None of our collected samples, analysed per population, showed deviation from Hardy–Weinberg equilibrium, using the exact test of Markov Chain of Monte Carlo (MCMC) (Supplementary Table S1). Three out of 66 pairs of loci across all populations showed deviation from genotypic disequilibrium after Bonferroni correction.

Genetic structuring and population differentiation

Bayesian clustering analysis revealed three major genetic clusters from the SSA1 population with an optimal number of clusters of K = 3 (estimated by means of ∆K, as described by Evanno et al.43). The first two genetic clusters at K3, in Fig. 2b, dominated the 1997 B. tabaci samples. Individuals of the two mtCO1 sub-groups SSA1-SG1 and SSA1-SG2 were not differentiated and belonged to the same genetic clusters (Fig. 2b). The third genetic cluster, denoted by yellow colour, dominated the 2017 collections except for some individuals of the two other genetic clusters found within the 1997 samples at K = 3. Conversely, a few individuals assigned to the pink genetic clusters were also found in 2017 collections of both yellow and blue genetic clusters. We can also observe a proportion of individuals with less than 50% of posterior probability assigned to one genetic cluster, which were perceived to be part of several genetic clusters. These individuals could be assigned as individuals with gene flow between genetic clusters.

Figure 2
figure 2

STRUCTURE bar plots for SSA1 and SSA2 populations collected from Uganda (a) for 33 populations of SSA1 arranged by subgroup, site and year at K = 2 and 3, e.g., K2(a) and K3(a) with recessive allele option turned on, and K2(b) and K3(b) without the option turned on. (b) For 102 randomly selected SSA1-SG1 and SSA1-SG2 together with 17 individuals of SSA2 at K = 3 and 4. The black line within SSA1 separates individuals of SSA1-SG1 and SSA1-SG2 for 2017 and SSA1-SG1 for 1997.

Although there were few samples of the SSA2 population (17 individuals), another Bayesian analysis was run together with SSA1 individuals to understand the genetic pattern between the two-putative species. To decrease the effect of unbalanced samples, 102 samples of SSA1 together with 17 samples of SSA2 were randomly chosen, and similar results were obtained showing some level of shared genetic background, as expected for closely related species (Fig. 2b).

Individuals with 70% posterior probability from the Bayesian analysis dataset were selected and used to perform a principal component analysis (PCA), subsequently the analysis split the dataset into three clusters/ellipses similar to the previously identified genetic clusters of Bayesian analysis (Fig. 3). The 2017 samples were aggregated in one ellipse. In contrast, the majority of 1997 individuals belonged to the other two ellipses. However, some individuals from the two collections were mixed within clusters, for instance, with individuals in green, cyan and black. In parallel, a discriminant analysis of principal components (DAPC) was performed and the best BIC value was found at K = 3, as the best K number of assumed populations. Accordingly, the DAPC spread the dataset into three clusters; two were dominated by 1997 collections, whereas the third cluster was represented by the 2017 samples (Data not shown).

Figure 3
figure 3

Principal component analysis of B. tabaci (SSA1-SG1 and SSA1-SG2) populations from Uganda. Colours show the genetic clusters found with the Bayesian analysis of structure at K = 3. Each dot represents one individual. The pink cluster is dominated by the 1997 population, whereas the blue and orange clusters are dominated by the 2017 population. In each cluster there are few individuals of different years represented by green, brown and black dots.

The AMOVA carried out on our SSA1 samples to test for population differentiation between years, using only sites that were sampled in 1997 and 2017 (ie. 6 sites were considered for each year) showed a very low, but significant variation (Table 4).

Table 4 Analysis of molecular variance (AMOVA average over loci) from Ugandan populations of B. tabaci, comparing SSA1 populations between sampling year, *indicates significant variation among populations within species and within individuals.

Further analysis of differentiation between groups was performed through analysis of the population pairwise matrix of genetic distances Fst between sites, subgroups and years. Results revealed a few significant differences between SSA1-SG1 and SSA1-SG2 of the 2017 samples for only 33 comparisons. Most of the populations of 1997 were significantly different from the ones of 2017, except for field 5 (Supplementary Table 2).

The bottelneck analyses performed on the 2017 dataset showed that all populations had undergone a significant bottleneck (One-tailed Wilcoxon sign-rank tests, P < 0.05) in the recent past with the SMM model (Supplementary Table 3).


We used two different molecular markers to identify population genetic variations within the African B. tabaci colonizing cassava in Uganda, to compare 2017 populations with those of the 1997 outbreak. Our results reveal that SSA1 was the dominant species in Uganda both in 1997 and 2017 and that its subgroups SG1 and SG2 can interbreed. Populations within SSA1 were found to be structured into three genetic clusters, irrespective of subgroups, which varied in abundance between 1997 and 2017. The SSA2 individuals were clustered separately. The main results obtained here are showing that the genetic composition of SSA1 whitefly species has changed rapidly over the 20 years period, which is contrasting with the previous invader hypothesis.

Out of the 13 cassava fields visited in 2017, we observed 8 fields with >100 mean adult whiteflies per plant, which were defined as having superabundant populations. Within those eight fields, 63% (n = 5) showed CMD symptoms and 50% showed symptoms of CBSD. The association between the two diseases and B. tabaci population on cassava has been reported by Colvin et al. and Legg et al.21,44. In the remaining five fields with <50 mean adult whiteflies per plant, up to 60% CMD and CBSD symptoms were observed. The improved cassava varieties TME14 and TME 204 were grown in these fields. Despite the relatively low number of whiteflies observed in the field located at Kynagolo (1–9 mean adult whiteflies per plant) planted with TME 14 and the field at Masaka (10–49 mean adult whiteflies per plant) planted with an unknown variety, both had average CMD and CBSD symptom severity (Table 1).

Regardless of the superabundant whitefly population in two fields (Ntale and Wasinda), no CMD and CBSD symptoms were observed, which may be related to the type of variety grown (NASE3, an improved cassava variety, which might be tolerant to CMD and CBSD).

The average age of cassava field with high whitefly abundance (100 and above) was 6 months, which contrasts with the age recently shown by Kalyebi45 of 2–3 months. These observed discrepancies might be linked to the surveyed cassava varieties, which might be more whitefly susceptible in our case, or the very small amount of field samples in our study (n = 8). Furthermore, our study revealed whitefly abundance increased toward the southern part of Uganda, with a maximum population of >500 whiteflies per plant in 2017. This result corresponds to the previous studies conducted in Uganda during the 1990s46,47.

Despite all efforts made to combat CMD and CBSD diseases since the first outbreaks of whitefly populations reported in the 1990s23,46,48 in Uganda, most of the cassava varieties grown in the 2017 surveyed fields (TME 14, TME 204 and NAROCAS II) were infected with both diseases. Development of whiteflies on cassava is the result of synergistic interaction of several factors including viruses, bacterial symbionts and cassava genotypes49, all of which require more research attention. In order to combat these problems effectively, research and development efforts need to be focussed on creating cassava varieties that combine both virus- and whitefly-resistance traits.

SSA1-SG1 was present at all sites in 1997 and 2017 (Table 2). The presence of SSA1-SG1 in different regions of central and eastern Africa, including Uganda, has been reported in several studies. Interestingly, a balanced distribution of SSA1-SG1 and SSA1-SG2 was found in 2017 compared with the very low proportions of SSA1-SG2 in 1997 (0.9% in 1997 vs 48.6% in 2017). Recent studies describing the SSA1 sub-groups found similar results in Uganda25,30,50.

The proportion of SSA2 was low (10%) in 1997 and we did not detect it at all in 2017. Legg et al.25 also reported a drastic decline in SSA2 from 63.9% for 1997–1999 sampling to 1.4% by 2009–2010. The reduction of SSA2 in favour of SSA1-SG1 and SSA1-SG2 in Uganda has also been reported in two recent studies30,51. The reason for the decrease of SSA2 in eastern Africa is unknown but could be associated with less suitable environmental conditions, such as the use of improved cassava varieties following the CMD epidemic, which might have impacted the abundance of SSA1. In addition, it might be related to biological consequences of mating interruption between the two species, where copulation events occur between individuals of the two species but without viable progeny. Consequences of this behaviour is observed with a decreasing success of mating of the species in lower abundance15,52.

Analysis of mitochondrial DNA placed SSA1-SG1 and SSA1-SG2 in different haplotype groups within SSA1; however, nuclear analyses based on several methods revealed substantive gene flow between these two haplotypes. The ability of these two groups to interbreed and exchange genetic material resulted in there being no significant genetic differentiation between the individuals of both groups. The homogenisation of this group from potentially two different maternal lineages probably resulted in the maintenance and increase of SSA1-SG2 over the years. Similar results were observed from a genomic approach, showing no differences between SSA1-SG2 and SSA1-SG1 from Burundi, Tanzania, Rwanda or the Democratic Republic of Congo31. These results collectively indicate that SSA1-SG2 and SSA1-SG1 should not be considered as different entities, but only as different mitochondrial haplotypes within the SSA1 species.

Nuclear analyses from 1997 sampling also revealed that SSA2 individuals analysed were considered as a separate group to SSA1; however, with a signal of low shared background between both groups. This weak signal could be explain by the fact that both species are closely related ones. Similar results were seen in the recent genomic analysis by Wosula et al.31 and had been observed in La Réunion between the invasive Middle East Asia minor 1 (MEAM1) and indigenous Indian Ocean (IO) species53. High abundant SSA1 populations and low abundant SSA2 populations in sympatry might have created conditions favouring mating between those groups, which could have resulted in few cases of mating success between both cryptic species.

Our study also revealed no new populations between 1997 and 2017, but a significant genetic difference between the two collection periods. This is clearly shown in the Bayesian analysis structure (Fig. 2a) with the dominance of one cluster at K3 for 2017 SSA1 populations, whereas the two other genetic clusters characterised the 1997 SSA1 populations. Despite the variations in distribution of genetic clusters, we also observed mixed genetic patterns within populations between the years.

The Bayesian analyses, DAPC (BIC criteria) and PCA all showed structuring of SSA1 putative species into three genetic clusters that can interbreed. However, despite the gene flow patterns between several individuals, this structuring into three genetic clusters was stable between 1997 and 2017 and the individuals did not completely homogenise into a single population, with significant differentiation observed between populations of both years. Although the reasons for this are not entirely clear, mating preferences or other specific loose barriers to hybridisation may act to support this pattern. Presence of different symbiont communities could also be a factor. Indeed, some symbionts are known to play such a role in other insects54,55 as well as whiteflies, where a specific bacterial community (Arsenophonus and/or Cardinium) had been partly implicated in manipulating reproduction of MEAM I and IO species56. SSA1 supports a high bacterial diversity51,57; however, no link has yet been established between the complex bacterial community and hybridisation barriers. Further studies should be conducted to better understand the roles of endosymbionts in different B. tabaci species.

Despite the moderate null allele frequencies detected in our dataset, our results remained consistent through the four different analyses. All analyses produced similar patterns, which indicates the robustness of our results. All results obtained here categorically reject the hypothesis that new outbreaks of whiteflies in Uganda in the 1990s were due to the arrival of a new population or species of whiteflies25. Nevertheless, frequency of specific genetic clusters significantly changed over the studied 20-year period within the SSA1 species, with 2017 populations having a strong signature of a recent bottleneck event. It is possible that the most abundant genetic cluster comprising SSA1-SG1 and SSA1-SG2 might have overcome or displaced the previous most abundant genetic cluster. Correspondingly, three new hypotheses might be raised to explain the observed results: (i) the previous “old” dominant genetic cluster might be less fit for new cassava cultivars released in Uganda, (ii) environmental change occurred within the studied 20-year period and the SSA1-SG1 and SSA1-SG2 clusters were preferably adapted to it and (iii) the re-emergence of CBSD in Uganda in the early 2000s58. Confirming these hypotheses will require further experiments.

The strategy of using disease-resistant cultivars has not proved effective in combating CMD and CBSD. Some ‘improved’ varieties such as TME 14 and TME 204 became susceptible in Uganda3 and many virus-resistant cassava varieties are highly susceptible to whiteflies59. The intensification of cassava production to meet the high demand for food under increasing human population in the era of climate change might be impossible without the concomitant control of B. tabaci populations and development of virus-resistant crop varieties. This can only be achieved by a better understanding of the main viral vectors, which will facilitate design and selection of appropriate disease management and control measures.


Whitefly collection

Live adult whiteflies were collected from cassava plants during a survey of 13 fields from seven districts (Mityana, Mpigi, Wakisa, Kalungu, Masaka, Rakai and Gomba) in Uganda in February 2017 (Table 1, Fig. 4). The fields were separated by about 20 km (except for three that were <10 km apart) and their GPS coordinates were recorded. QGIS v.2.18.17 online software ( was used to map the site locations (Fig. 4). Whiteflies were collected with a mouth aspirator and then preserved in Eppendorf tubes containing absolute ethanol. In the same geographical location adult whiteflies and cassava leaves with eggs, nymphs and pupa were collected (Table 1, Fig. 4, within 2–3 km radius) in February 1997 and stored at −80 °C. No GPS coordinates were recorded in 1997 but the names of sites/villages and recorded distances were used to approximately match the old and new sites.

Figure 4
figure 4

Geographical locations of sampling surveys conducted in (a) Uganda as a whole and (b) part of the central region in which sampling was conducted. Red and black circles are sample sites for whitefly collections made in February 1997 and February 2017.

Determination of whitefly population

The number of adult whiteflies on the top five leaves of five plants selected randomly in each cassava field was recorded as described by Sseruwagi et al.60 in the 2017 sampling. The number of whiteflies per plant was estimated according to the following system: “1” = 1–9 adults per plant, “10” = 10–49, “50” = 50–99, “100” = 100–499 and “500” = >500.

Assessment of CMD and CBSD symptom severity

The symptom severity for CMD and CBSD were recorded for each sampled field. The severity was assessed by using a disease scale 1–5 according to Sseruwagi et al.60, where 1 = no disease symptoms and 5 = the most severe symptoms. Five plants were randomly assessed in each field in 2017.

DNA extraction of B. tabaci

Leica MZ8 stereomicroscope 100X (Leica Microsystems, Nanterre, France) was used for selection, at most 35 adult female whiteflies were selected from each field. A total of 662 samples were successfully extracted for DNA in this study (108 eggs, 78 nymphs, 28 pupae and 2 adult whiteflies from the 1997 collection, and 446 adults from the 2017 collection). Sex differentiation for eggs, nymphs and pupae was obtained indirectly, with the use of the microsatellite markers, whenever all loci were homozygotes for a given individual it was considered as a male. Two methods of DNA extraction were utilised. The non-destructive method was used for 2017 samples at 3P, CIRAD UMR PVBMT in Reunion Island, as described in Delatte et al.61 and the destructive method of Ghosh et al.51 for 1997 collections at Natural Resource Institute (NRI), University of Greenwich, England.

Mitochondrial DNA amplification and sequencing

A total of 662 individuals were successfully PCR-amplified and sequenced for mtCOI by using a primer pair designed by Mugerwa et al.41 (2195Bt 5′-TGRTTTTTTGGTCATCCRGAAGT-3′ and C012/Bt-sh2 5′-TTTACTGCACTTTCTGCC-3′). The PCR reaction mixture was prepared with a final volume of 15 µl, containing 7.5 µl of Type-it (2x) PCR master mix (Qiagen, France), 4.1 µl of pure HPLC water CHROMASOLV (Sigma-Aldrich, France), 0.8 µl of each primer and 1 µl of DNA template. Initial denaturation of DNA template occurred at 95 °C for 15 min followed by 40 cycles of denaturation at 95 °C for 30 s, primer annealing at 52 °C for 30 s, extension at 72 °C for 1 min and final extension at 72 °C for 10 min. Plates were sent to the Macrogen Europe laboratory for sequencing.

Sequence analysis

Sequences were manually edited and aligned using the Geneious R10 software62. The number and distribution of haplotypes within our sequences were achieved through DnaSP v.6 software63. The selected sequences together with reference sequences from the literature were aligned using ClustalW64 before being subjected to Jmodeltest 2.1.1065. The phylogenetic tree was computed using MrBayes66 at GTR + G (the closest to the selected model under MrBayes). Four Markov chains were conducted simultaneously for 1 100 000 generations starting from random initial trees, and sampled every 200 generations. Variation in the ML scores was examined graphically and 10% of the trees generated prior to stabilization of ML scores were discarded.

Microsatellite PCR amplification and genotyping–Microsatellite design

Two pools of extracted DNA of 25 individuals (each tube) of B. tabaci from laboratory colonies of SSA2, SSA1-SG1, SSA1-SG2, SSA1-SG3 and SSA3 were made and sent to GenoScreenVR (Genoscreen, Lille, France). Each pool contained 10.2 ng of DNA. The company developed a microsatellite-enriched library using a 454GS-FLX Titanium pyrosequencing67 tool. The enriched library was then constructed as described by Atiama et al.68. Total DNA was enriched by probes with the following motifs: TG, TC, AAC, AGG, ACAT, ACG, AAG and ACTC. About 534,451 reads were obtained with average fragment length of 247 bp. A first filter of quality was applied to discard short fragments (<40 bp) and low-intensity fragments, which removed 38% of the sequences.

The software QDD69 was run on the remaining sequences to identify microsatellite motifs in 73,060 raw sequences, among which 160 primers were designed. The objective was to obtain primers that would cross-amplify between all the pooled species and other whiteflies from the same complex of species with different sizes. From these, 41 primers were selected that could amplify various fragment lengths (100–260 bp) and had different repeat motifs (from di nucleotides to tetra nucleotide motifs). Those primers were tested by PCR individually, on four female specimens of the following species or whitefly genotype groups (-SG): SSA1-SG1, SSA1-SG2, SSA1-SG3, SSA2, SSA3, MEAM1, Med and IO. The amplified DNA was loaded on agarose gels and sorted. Among all those tested primers we kept nine primers that were (i) amplifying for all species or genotype groups with good signal intensity, and (ii) giving polymorphisms between individuals within species and between species/genotypes. These nine primers were fluorescently labelled (forward primer; Applied Biosystems, Waltham, MA, USA) and tested in simplex and multiplex PCR mixes on several field samples of the different whitefly species named above. Only five of them were retained in the present study, the other four were discarded due to the high number of null alleles observed in different populations and species within the species complex tested.

Amplification and genotyping of old and recent field populations from Uganda

PCR for genotyping was conducted using 13 microsatellite loci, which were combined in three multiplex primer reactions. Five of the markers were newly developed for this study (Table 2). All markers were selected based on their ability to amplify different species within B. tabaci complex.

A PCR mix of 15 µl was prepared with 7.5 µl of 2x multiplex PCR master mix (Type-it, QIAGEN), 4.5 µl of HPLC water and 0.1 µl of each primer followed by addition of 2 µl of template DNA. The volume was slightly changed in Mix 2, in which 0.2 µl of WF1GO3 and P5 primers were used. All PCR programs were as follows: initial denaturation 95 °C (15 min) followed by 40 cycles, 95 °C (30 s), 55 °C (180 s), 72 °C (1 min) and at 60 °C (15 min) for denaturation, primer annealing, extension and final extension, respectively, except for Mix 3 for which the annealing temperature was increased from 55 °C to 56 °C. Prior to genotyping, the amplified PCR products were diluted in different ratios according to the band intensity obtained for each mix. The final mix consisted of 10.8 µl of formamide, 0.2 µl of Applied Biosystems LIZ size marker and 1 µl of diluted amplified DNA. The mix was run in an Applied Biosystems 3130XL DNA sequencer machine. Genotypic data were retrieved visualised and scored manually using Gene mapper v.4.0 software.

Population structure analysis

The Bayesian cluster approach with Structure v.2.3.470 was used to assess genetic population structure between individuals. The method assigns individuals to different clusters (a series of K to be set). Each K is the number of estimated population clusters characterised by posterior probabilities. Structure 2.3.4 was set at 100,000 burn in length with run length of 1,000,000 MCMC, this step was repeated three times and K was set to range from 1 to 20. The dataset was arranged according to mtCOI results and field numbers. The best number of clusters (K) was estimated by means of ∆K as described by Evanno et al.43 using the online program Structure Harvester71. An online program CLUMPAK (Clustering Markov Packager Across)72 was used to summarise the best K posterior probabilities and to reconstruct the bar plots using Clumpp73 and Distruct74 software.

As null alleles were still recorded in our datasets, we then ran two Bayesian analyses using 12 microsatellite loci with and without the recessive alleles option, as explained by Falush et al.75. Both datasets were executed using burn-in length of 100,000 and MCMC run length of 1,000,000, repeated three times, and an assumed number of population (K) values between 1 and 20. Similar results were obtained from both analyses (Fig. 2a,b), showing robust analyses regardless of null alleles.

Population genetic analyses

The basic population parameters were analysed by using a set of programs within Genetix v.4.05.2, such as the number of alleles per population, expected heterozygosity and observed heterozygosity (according to the method of Nei76) and correlation within individuals following the method of Weir and Cockerham77. Deviation from Hardy-Weinberg equilibrium was tested using MCMC (run length of 1,000,000) implemented in Arlequin v. following the method utilized by Guo and Thomson79. GENEPOP v.4.280 was used to test genotypic disequilibrium by Fisher’s method81. The effect of null alleles on inferring population structure was studied, as described by Falush et al.75. Allelic richness using rarefaction was estimated by FSTAT v. Genetic differentiation among year, between populations within year was inferred by AMOVA by Arlequin. PCA and DAPC were also used to determine the genetic clusters among individuals using R software v. 3.4.283 with the Adegenet package84.

Recent genetic bottleneck signature was also tested in population of 2017 using the genetic software Bottelneck 1.2.0285. The software measures the temporary excess of heterozygosity that results from a decrease of the effective population size and proposes tests to detect this anomaly86,87. Deviations from expected heterozygosity were computed through 1000 permutations, using both the stepwise mutation model (SMM) and the two-phased model of mutation (TPM). One-tailed Wilcoxon sign-rank tests were used to determine whether a population exhibits significant heterozygosity deficit or excess.