High-throughput barcoding method for the genetic surveillance of insecticide resistance and species identification in Anopheles gambiae complex malaria vectors

Surveillance of malaria vector species and the monitoring of insecticide resistance are essential to inform malaria control strategies and support the reduction of infections and disease. Genetic barcoding of mosquitoes is a useful tool to assist the high-throughput surveillance of insecticide resistance, discriminate between sibling species and to detect the presence of Plasmodium infections. In this study, we combined multiplex PCR, custom designed dual indexing, and Illumina next generation sequencing for high throughput single nucleotide polymorphism (SNP)-profiling of four species from the Anopheles (An.) gambiae complex (An. gambiae sensu stricto, An. coluzzii, An. arabiensis and An. melas). By amplifying and sequencing only 14 genetic fragments (500 bp each), we were able to simultaneously detect Plasmodium infection; insecticide resistance-conferring SNPs in ace1, gste2, vgsc and rdl genes; the partial sequences of nuclear ribosomal internal transcribed spacers (ITS1 and ITS2) and intergenic spacers (IGS), Short INterspersed Elements (SINE), as well as mitochondrial genes (cox1 and nd4) for species identification and genetic diversity. Using this amplicon sequencing approach with the four selected An. gambiae complex species, we identified a total of 15 non-synonymous mutations in the insecticide target genes, including previously described mutations associated with resistance and two new mutations (F1525L in vgsc and D148E in gste2). Overall, we present a reliable and cost-effective high-throughput panel for surveillance of An. gambiae complex mosquitoes in malaria endemic regions.

Surveillance of malaria vector species and the monitoring of insecticide resistance are essential to inform malaria control strategies and support the reduction of infections and disease. Genetic barcoding of mosquitoes is a useful tool to assist the high-throughput surveillance of insecticide resistance, discriminate between sibling species and to detect the presence of Plasmodium infections. In this study, we combined multiplex PCR, custom designed dual indexing, and Illumina next generation sequencing for high throughput single nucleotide polymorphism (SNP)-profiling of four species from the Anopheles (An.) gambiae complex (An. gambiae sensu stricto, An. coluzzii, An. arabiensis and An. melas). By amplifying and sequencing only 14 genetic fragments (500 bp each), we were able to simultaneously detect Plasmodium infection; insecticide resistance-conferring SNPs in ace1, gste2, vgsc and rdl genes; the partial sequences of nuclear ribosomal internal transcribed spacers (ITS1 and ITS2) and intergenic spacers (IGS), Short INterspersed Elements (SINE), as well as mitochondrial genes (cox1 and nd4) for species identification and genetic diversity. Using this amplicon sequencing approach with the four selected An. gambiae complex species, we identified a total of 15 non-synonymous mutations in the insecticide target genes, including previously described mutations associated with resistance and two new mutations (F1525L in vgsc and D148E in gste2). Overall, we present a reliable and cost-effective high-throughput panel for surveillance of An. gambiae complex mosquitoes in malaria endemic regions.
Malaria, a mosquito-borne disease caused by Plasmodium parasites, is an important public health problem. In the last decade, malaria elimination strategies have contributed to a global fall in incidence rates. However, infection control progress has recently stalled with no further reductions in malaria-attributable mortality, and even reversal in some regions 1 . There were 241 million cases and 627,000 deaths reported in 2020 alone, predominantly in Sub-Saharan Africa 1 . Vector control strategies are based on two primary interventions, namely, insecticide-treated nets (ITN) and indoor residual spraying (IRS). The World Health Organization (WHO) recommends five chemical classes of insecticides (neonicotinoids, carbamates, organochlorines, organophosphates and pyrethroids) for IRS. Whilst for ITNs only pyrethroid insecticides are currently being used alone The amplified complete sequences of the ribosomal DNA internal transcribed spacers ITS1 and ITS2 were also analysed to investigate genetic variation across species and geographic regions. For ITS1, sequence length variation of < 30 bp was detected among samples from the same country (Guinea), whereas for ITS2, the maximum intra-specific variation in sequence length was 5 bp (Ivory Coast and Guinea). A total of 24 SNPs were detected in ITS1 and 6 SNPs in ITS2 (Supplementary Table 3). Phylogenetic analyses were performed separately for each genomic target. Both trees contain the An. arabiensis sequences from Cabo Verde separated in a clear clade from the remaining samples. When using the ITS1 genomic region, there was greater resolution to a country level ( Fig. 1, Supplementary Fig. 1). One clade consisted of samples from Ivory Coast (15/18), Guinea (6/22) and Ghana (n = 2; An. melas). A second clade includes all the samples from the Kisumu colony, three samples from Ivory Coast (3/18) and most samples from Guinea (16/22).
The mitochondrial genes cox1 and nd4 were also analysed to investigate the phylogenetic relationships of these species and the geographic resolution. Sequencing data analyses revealed a total of 66 SNPs, including six nonsynonymous mutations (cox1 S333R, F346C; nd4 G327R, R288S, V375F and I218M) (Supplementary Table 3). Phylogenetic analyses using both loci revealed a clade that separated most of An. arabiensis (Supplementary Figs. 2,3) with additional clustering similar to the ITS2-based analyses, with ITS1 showing the best geographic resolution across the four loci investigated (Fig. 1).

Detection of SNPs in genes associated with insecticide resistance.
A total of 133 SNPs were detected across the seven amplicons designed to target for loci (vgsc, rdl, gste2, ace1) associated with insecticide resistance. These included 66 SNPs in exons (15 non-synonymous), 56 SNPs in introns, 10 SNPs in untranslated regions, and one SNP in a splicing region ( Table 2, Supplementary Table 4). Most non-synonymous SNPs have been previously linked with insecticide resistance. For the vgsc gene, four amplicons were designed (vgsc-DIS6, -DIIS6, -DIIIS6, -DIVS5). In the vgsc-DIS6 amplicon, a total of 16 SNPs were identified, and included the V402L substitution detected in An. gambiae s.s. and An. coluzzii samples from Guinea (3/21) and Ivory Coast (7/14) (Supplementary Table 5). In the vgsc-DIIS6 amplicon, a total of 5 SNPS were found, including the L995F substitution (995F resistance-R; 995L sensitive-S), which was detected in mosquitoes from Guinea (67% R/R; 33% R/S) and Ivory Coast (57% R/R; 36% R/S). In the vgsc-DIIIS6 amplicon, we detected a total of 22 SNPS with four amino acid substitutions, including three known insecticide resistance mutations (I1527T, F1529C and N1570Y). The 1527 T mutation was detected in samples from Guinea (4%) and Ivory Coast (36%). Whilst the F1529C mutation was only found in samples from Ivory Coast (44%), and N1570Y was detected in Guinea (36%) and Ivory Coast (4%). A new mutation F1525L was found in Guinea (68%) and Ivory Coast (28%). In the vgsc-DIVS5 amplicon. A total of three new SNPs were detected, including the A1746S mutation, which was present in samples from Guinea (24%) and Ivory Coast (7%) ( Fig. 2; Supplementary Table 5).
For the rdl gene, the amplicon designed allowed for the detection of 19 SNPs, including one non-synonymous substitution A296G, which was found in An. gambiae samples from Guinea (21% R/S) and Ivory Coast (55% R/S, 25% R/R) (  Table 5). The I114T mutation was found in An. gambiae samples from Guinea (29%), Ivory Coast (20%) and Ghana (n = 1; An. melas). The substitution L119V was detected in Guinea (50%) and Ivory Coast (27%). The F120L mutation was found in samples from Guinea (21%), Ivory Coast (63%) and Ghana (50%). Whilst V131L was found exclusively in An. arabiensis samples from Cabo Verde (54%). The T154S substitution was detected in the Kisumu wild type samples, as well as in Guinea (50% R/S, 21% R/R) and Ivory Coast (47% R/S, 17% R/R). In addition, a new mutation D148E was detected in An. coluzzii samples (20%) from Guinea. Lastly, for the single amplicon of the ace1 gene, 25 SNPs were found, including one non-synonymous G280S mutation associated with insecticide-resistance, which was present in samples from Guinea (17% R/S) and Ivory Coast (37% R/S, 3% R/R) ( Fig. 2 www.nature.com/scientificreports/ For the vgsc gene, all samples from Ghana and Ivory Coast had at least one SNP associated with resistance, 58% of samples from Guinea presented two mutations, and 28% of the samples from Ivory Coast harbored four SNPs (Fig. 2). For the gste2 gene, a total of three missense mutations linked to resistance were identified, and 72% of samples from Guinea had one SNP, while 4% had a combination of three SNPs. In Ivory Coast, 47% and 3% of samples presented with one and three non-synonymous mutations, respectively.  Most samples were also confirmed to be positive using standard qPCR (overall agreement of 93%; 13/14, 2 samples failed qPCR) ( Table 3). The higher the coverage for the targeted region, the lower the cycle threshold obtained in the qPCR assay. For samples with total amplicon coverage below 75-fold, the cycle threshold values after qPCR assays were between 28 and 34 (Table 3).  www.nature.com/scientificreports/ Comparison of amplicon sequencing with standard methods and pooling. To assess amplicon sequencing data accuracy when compared with standard methods, the following assays were performed in parallel using target-site mutations for the detection of L995F, N1570Y, and G280S. In the amplicon sequencing, the genotypes were predicted using the ratio of the alternative allele and total allele depths (Supplementary Table 6). For N1570Y, all samples were the homozygous reference allele genotype using both methods (24/24; 100%). For G280S and L995F, four samples were heterozygous genotype in one assay and homozygous non-reference in another (4/29; 86.2%), both assays still detected at least one allele carrying the mutation in these samples. For G280S, one sample was classified as susceptible (S/S) genotype by the standard method but was called heterozygous (R/S) using amplicon sequencing. The multiplex assay involves the use of unique barcodes per sample, permitting the pooling of many individual samples and their later discrimination in the analysis. Nevertheless, we tested the pooling of samples before amplification, as pooling of samples collected in the same location and time assists with estimating the prevalence of insecticide resistance, reduces DNA extraction costs and labour time (DNA extraction of pools of mosquitoes vs. individual mosquitoes), and increases sample size in a sequencing platform run. Pooling has been previously tested using TaqMan assays and found to be precise in detecting SNPs 28 . The efficacy of sample pooling in relation to SNP detection was verified by pooling DNA samples prior to PCR assays. We compared the results of individual samples to the pooling of five samples (Pool5) prior to amplification (Table 4). Each pool was amplified with a unique barcode to permit further pooling at sequencing step. The resistant alleles detected in each individual sample were also detected after pooling. For instance, for the gste2 positions 28598062 and 28598136, 1/5 samples were heterozygous (R/S), while 4/5 samples were a homozygous reference allele (S/S) genotype. In each case, one alternative allele (R) in a total of 10 is present in the pooled sample. The genotype prediction for Pool5 for those positions was heterozygous (R/S), confirming the presence of the resistant allele.

Discussion
Our work describes a valuable genomic tool for large-scale malaria vector control surveillance strategies that can be applied and extended to simultaneously identify species within the An. gambiae complex, detect Plasmodium infection, and find genetic variants in loci associated with insecticide resistance. Using previously described genetic markers (SINE200, IGS), the amplicon assay differentiated between four An. gambiae complex species. By incorporating rDNA nuclear (ITS1, ITS2) and mitochondria (cox1, nd4) genes, previously found to be useful for reconstructing phylogenetic relationships within several vector complexes and cryptic species, our analysis revealed that majority of An. arabiensis from Cabo Verde cluster together. The ITS1 locus data provided better species and geographic clustering than the other three loci, consistent with previous work 29 . Further, amplicons covering four loci (vgsc, rdl, gste2, ace1) associated with insecticide resistance were evaluated. Most mutations identified in coding regions (51/66; 77%) led to synonymous amino acid changes, with some studies suggesting that they can affect RNA stability, protein folding and cell fitness 29 .
Most of the 15 non-synonymous SNPs identified in our dataset have been previously associated with insecticide resistance (vgsc 6/7, rdl 1/1, gste2 5/6, ace1 1/1) 21,30 . For vgsc, a new mutation F1525L was found in Guinea and Ivory Coast. The known 995F frequency approaches fixation in Guinea and Ivory Coast (100% and 93%, respectively), and high frequencies of this mutation have been found in west and central Africa and have been linked with secondary selection of other mutations that may enhance or compensate for the L995F phenotype 31 . For example, the N1570Y substitution has been shown to substantially increase pyrethroid resistance when present in combination with L995F 32 , and in our data, the N1570Y frequency was low (< 3.8%) except in Guinea (36%), and always present in the same haplotype as the 995F substitution. We also detected A1746S at low frequencies in Guinea, as previously described 31 , and in Ivory Coast, not previously reported. For gste2, a new Table 4. Genotypes of pooled samples versus individual samples. Individual mosquitoes were processed and submitted for ampliconsequencing. In parallel, DNA aliquots of each sample were pooled together (pool5) and sequenced. SNPs were called for each individual and/or pooled sample and the genotypes detected were shown as RS (heterozygous), RR (homozygous non-reference), and SS (homozygous reference). Chr (chromosome), Nt (nucleotide). www.nature.com/scientificreports/ mutation D148E was found in An. coluzzii samples from Guinea. In addition, we detected the mutation L119V, previously reported in An. funestus as L119F and associated with high levels of metabolic resistance to DDT 33 . Mosquitoes from both Ivory Coast and Guinea also presented substitutions I114T and F120L, with the latter at a high frequency in Ivory Coast and located in close proximity with the putative DDT binding site 34 .
For rdl, the 296G mutation was detected in Guinea and Ivory Coast and is the target site for cyclodiene insecticides (e.g., dieldrin), which have been discontinued from use in malaria control programmes. Despite the ban on dieldrin decades ago, molecular analysis studies have shown that despite the A296G substitution declining in some mosquito populations, it has persisted at high frequencies in others [35][36][37][38][39] . Here we show higher allele frequency of A296G mutation in mosquitoes from Ivory Coast when compared with those from Guinea, corroborating a previous report 40 . The 296S substitution has also been associated with current insecticides (e.g. fipronil) in An. arabiensis, An. funestus, and An. sinensis 7,38,39,41 . In the ace1 gene, the G280S substitution (G119S in Musca domestica), associated with resistance to carbamates and organophosphates, was identified in our Guinea and Ivory Coast samples, and has been reported in the border country Mali, where it is associated with bendiocarb resistance 42,43 . Overall, An. gambiae s.l. mosquitoes from Guinea and Ivory Coast were shown to carry resistant alleles to insecticides of all approved classes. Extreme and multiple resistance mosquito populations have been reported in this geographical region, and therefore represent a major threat to malaria control in West Africa 44 . The Cabo Verde samples had no mutations associated with insecticide resistance. Recently, the kdr L1014S and L1014F substitutions were detected at low frequencies (10% and 19%, respectively) in An. arabiensis collected in Cabo Verde 45 , suggesting it is possible that these mutations are increasing in frequency over time in this region.
Our data revealed that the amplicon method can detect Plasmodium infections in a single wild-caught mosquito, as previously reported in human blood samples 26 and laboratory-infected mosquitoes 27 . More generally, one drawback in amplicon sequencing assays is the low level of primer tagging among samples pooled together in the same run 46 . To overcome possible false positive SNP calling, we set a high threshold for assuming a real SNP while still being able to predict with high confidence the presence of at least one resistant allele in each single mosquito. Using this approach, our findings are consistent with previous data on wild-caught mosquitoes from Ivory Coast 47 and Guinea 42,48 . Our data also corroborates previous findings in An. arabiensis from Cabo Verde, describing the high frequency of V131L mutations in the gste2 gene 49 . Furthermore, our panel provided information of a broad range of missense mutations that have not been investigated in those previous studies using the same batch of samples.
Previous work on amplicon sequencing has focused on species identification towards Anopheles genus and Plasmodium infections using a different panel of 62 loci 27 . Here we describe a high-throughput tool for the surveillance of insecticide resistance, species identification and malaria parasite infection detection in malaria vectors from specifically the An. gambiae complex using a panel of only 14 genetic fragments. The approach allows the tracking of known mutations across populations and the identification of new genetic variants. Combined with phenotyping, this assay can help identify functional SNPs that are predictive of resistance to a particular intervention/insecticide. The assay is easy to implement and can be applied to many samples at low cost, achieved through PCR multiplexing and dual barcoding. As the PCR stage already includes Illumina-compatible flowcell adaptor sequences, there is no need to prepare libraries for Illumina sequencing, leading to multi-locus amplicon pools that can be sequenced by commercial providers at relatively low cost (< US$ 0.5 per amplicon). Further, the amplicons can also be sequenced in portable sequencers (e.g., Oxford Nanopore Technology MinIon), allowing for vector surveillance at field sites. Large scale monitoring can provide key information to support decision making in malaria vector control programs, which are crucial in reducing disease burden. For example, the discovery of new mutations linked to resistance alongside the screening of well-known markers can lead to changes in insecticide use. New loci linked to resistance, speciation and geographical location can be included as amplicons, thereby further facilitating the tracking of mosquitoes and malaria parasites across populations, to inform vector and disease control.

Material and methods
Mosquito sampling. A total of 110 mosquito samples were used in our study, including wild caught and laboratory colonies of the An. gambiae complex. Specimens were captured as part of ongoing studies in Guinea (n = 24; years 2017-2018) 42,48 , Ivory Coast (n = 46; 2019) 47 , Cabo Verde (n = 24; 2017) and Ghana (n = 2; 2015) 50 . These mosquito collections were chosen to represent genetically diverse members of the An. gambiae complex with differing rates of Plasmodium infection, distinct insecticide resistance intensities and underlying resistance mechanisms to validate these novel assays. Adult mosquitoes from the Anopheles genus were identified by a morphological key 51 . In Ivory Coast, study activities were conducted in the village of Aboudé, rural Agboville, Agnéby-Tiassa region and adult mosquitoes were collected using human landing catches (HLCs) 51 . In Guinea, mosquito collections were undertaken in six villages in the Maferinyah sub-prefecture, Forecariah Prefecture, by manual aspiration from house walls and HLCs 51 . Samples from the city of Praia in Santiago Island in Cabo Verde were collected as larvae in natural and artificial breeding sites. In Ghana, mosquitoes were collected using CDC resting traps from the village of Dogo, in the Greater Accra region of Ghana 50 . An. gambiae s.s. Kisumu strains (originally from Kenya) (n = 14) were obtained from the insectary colony at the LSHTM 51 .
DNA extraction. Individual mosquitoes were disrupted using a Tissue Ruptor II (Qiagen, Hilden, Germany) at speed 3 for sixty seconds. DNA of each single mosquito was extracted using Qiagen DNeasy Blood and Tissue Kits following the manufacturer's protocol. DNA samples were quantified using the Qubit 2.0 fluorimeter HS DNA kit (ThermoFisher, Waltham, MA, USA) and stored at − 20 °C. org/ vecto rbase/ app) and aligned and regions of high homology among species were chosen. An. gambiae PEST, An. coluzzii Ngousso, An. melas CM1001059_A, and An. arabiensis reference genome sequences were used in the mapping process. Our high-throughput methodology was designed to amplify a total of 14 genes/genomic regions in the An. gambiae complex genome. Seven genomic regions were targeted for species identification and/or phylogenetic analyses: 28S ribosomal RNA and rRNA intergenic spacer region (IGS) 25 , SINE200 (short interspersed elements) 20 , nuclear ribosomal internal transcribed spacers 1 and 2 (ITS1, ITS2), cytochrome c oxidase I (cox1), mitochondrially encoded NADH dehydrogenases 4 (nd4) (Supplementary Table 7). A total of 17 SNPs associated with insecticide resistance were selected, based on previous reports on the Culicidae family (Supplementary table 8). Primers were designed to target 7 genomic regions containing those 17 SNPs: 4 different domains of the voltage-gated sodium channel (vgsc DI-IV), partial sequences of the genes acetylcholinesterase 1 (ace1), glutathione S-transferase 2 (gste2) and resistance to dieldrin gamma-amino butyric acid receptor (rdl) (Supplementary Table 9, Fig. 4). The 18S ribosomal RNA of Plasmodium spp was also targeted to detect Plasmodium-infected mosquitoes (Supplementary Table 9). For the 1st PCR step, primer design was based on a previous publication with a few modifications 26 , including a 5′tag inline barcode (6 bp long) that was added to each forward and reverse gene target primers. Each primer consisted of 3 regions from 5′: universal Illumina partial adaptors (PE adaptor), inline barcodes with unique combinations per sample and locus specific primer targeting the gene of interest ( Supplementary Fig. 5, Table 10). Target DNA sequences (genes/genomic regions) in Anopheles species (An. gambiae s.s and An. coluzzii, An. arabiensis and An. melas) were downloaded from the VectorBase website and aligned using ALIVIEW software 52 . Similarly, the 18S ribosomal RNA sequences of P. vivax, P. falciparum, P. ovale and P. malariae were downloaded from the PlasmoDB database (https:// plasm odb. org/ plasmo/ app), aligned and the region of interest was selected. Primer design was performed using the Primer3 program 53 and the selection criteria for the region of interest was based on three aspects: sequence similarity among all species, the position of SNPs of interest, and amplicon size of approximately 500 bp. Primers were tested using control samples for P. falciparum, P vivax, P. malaria and P. ovale from the LSHTM Malaria reference laboratory ( Supplementary Fig. 6). The universal adaptors contained non-annealing overhangs on both forward and reverse primers, which were incorporated into both ends of the PCR product during the 1st PCR step, and act as annealing sites for the Illumina indexing primers during the 2nd PCR step. Primers were designed to have similar annealing temperatures in order to be eligible for multiplexing. After designing the primers, we checked for the best combinations using "Multiple Primer Analyzer" software (Thermo Fisher Scientific) to avoid primer dimerization. The 1st PCR reaction (final volume of 25 μl) was performed multiplexing gene-specific primers (0.75 μl/10 μM) as follows: ITS1, ace1, vgsc-DIIIS6, cox1 (multiplex-1); ITS2, nd4, rdl, vgsc-DIIS6, (multiplex-2); vgsc-DIVS5, IGS, gste2 (multiplex-3); vgsc-DIS6, SINE200 (multiplex-4) and Plasmodium ssp (single-plex). Assays were performed using Q5 high fidelity DNA polymerase under the following thermocycler conditions: heat activation at 98 °C for 30 s, 35 cycles of denaturation at 98 °C for 10 s, annealing at 57 °C for 60 s, and elongation at 72 °C for 1 min, and one final elongation at 72 °C for 2 min.
Amplicon purification and next generation sequencing. After gene-specific multiplex PCR reactions, amplicons were visualized in a 1% agarose gel to check for band size and intensity, and DNA quantification was performed using Qubit dsDNA HS Kit (Thermo Fisher). Each PCR multiplex from different mosquito samples were pooled together, resulting in 4 final tubes (pool_multiplex 1, pool_multiplex 2, pool_multiplex3, pool_ multiplex4). The pools were bead purified with AMPure XP beads (Beckman Coulter, California, United States) according to manufacturer's protocol, using 200 µl PCR product (per pool), and 0.8 × PCR-pool volume of beads (= 120 ul). Concentration of purified multiplexes was measured on a Qubit prior to pooling all the 4 multiplexes in a single tube at a final concentration of 20 ng/µl (25 µl final volume). The final pool containing all PCR products for all mosquito samples was then used as template in the indexing PCR (second PCR step) performed by GENEWIZ and sequenced using the Illumina MiSeq platform (Illumina, San Diego, CA, USA). Sequencing was performed using a paired-end (250 bp) configuration. A minimum of 50,000 reads were obtained per pool (250 reads per amplicon in a pool of 200 amplicons) using the Genewiz service (< US$ 0.5 per amplicon). Sequence reads were trimmed to remove possible adapters and nucleotides with poor quality at 3′ end.
Plasmodium falciparum detection. Samples were tested for the presence of P. falciparum using a qPCR assay targeting the cox-1 gene 54 . Reactions were prepared with 1 µl of 10 mM forward (5′TTA CAT CAG GAA TGT TAT TGC-3′) and 10 mM reverse (5′-ATA TTG GAT CTC CTG CAA AT-3′) primers, 1 µl of water, 5 µl of SYBR® Green master mix (Applied Biosystems®, US), and 2 µl of gDNA. The plates were run in the Roche Light-cycler® 96 Real-Time PCR system for 15 min at 95 °C, 35 cycles of 95 °C for 15 s, and 58 °C for 30 s. Fluorescence results were analyzed using the Roche Lightcycler® 96 software.

Detection of insecticide resistant mutations.
To validate sequence results, conventional reactions were used to check for specific insecticide resistant mutations. PCR primers targeting L1014F substitution was carried out following MR4 guidelines (https:// www. beire sourc es. org/ Publi catio ns/ Metho dsinA nophe lesRe search. aspx) and qPCR was reformed to target the N1570Y mutation 32 . PCR followed by restriction enzyme digestion was used to detect the presence of the ace1 resistant allele as previously described 16 . Bioinformatics analysis. Raw sequences were de-multiplexed using an in-house python script. From the sequenced pool, individual sample data was de-multiplexed based on the in-line barcodes in each forward and reverse primer using an inhouse pipeline (https:// github. com/ LSHTM Patho genSe qLab/ ampli con-seq) to remove any mis-tagging across barcodes. Sequences were trimmed and aligned to the reference genome or the www.nature.com/scientificreports/ gene sequence fasta files. Sequence data was checked for quality using FastQC (v 0.11.5). Paired end reads were mapped against the reference sequences for the BWA-MEM algorithm (v0.7.17, default parameters 55 ). SNPs and small indels were called using freebayes (v1.3.5, -haplotype-length -1) and GATK HaplotypeCaller (v4.1.4.1, default parameters) software. The union of variant sets from both callers was used to produce a list of variants for each sample using a custom python script. High quality SNPs were identified using filters that included a minimum phred quality of 30 per called base, minimum depth of 30 reads, and minimum allele depth of 10. Finally, only SNPs that were present in more than one sample, and present across two independent pools were retained. For each individual sample, the percentage of alternative allele to total depth coverage was used to classify genotypes: (i) homozygous susceptible (S/S; 0% to 25%); (ii) heterozygous (R/S) (25% to 75%); (iii) homozygous non-reference (R/R; 75% to 100%). Visualization of sequence assemblies was carried out using Tablet software. Species identification was performed in samples based on the cut-off established by the comparison assay: An. coluzzii SINE200 reads > 10% were An. coluzzii; An. coluzzii reads < 1% sample was An. gambiae s.s form; if values lay between both reads, samples were called as hybrid.

Data availability
All raw sequence data is listed in the European Nucleotide Archive (ERR9693161 to ERR9693176).