High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology

Genetic polymorphisms in P. falciparum can be used to indicate the parasite’s susceptibility to antimalarial drugs as well as its geographical origin. Both of these factors are key to monitoring development and spread of antimalarial drug resistance. In this study, we combine multiplex PCR, custom designed dual indexing and Miseq sequencing for high throughput SNP-profiling of 457 malaria infections from Guinea-Bissau, at the cost of 10 USD per sample. By amplifying and sequencing 15 genetic fragments, we cover 20 resistance-conferring SNPs occurring in pfcrt, pfmdr1, pfdhfr, pfdhps, as well as the entire length of pfK13, and the mitochondrial barcode for parasite origin. SNPs of interest were sequenced with an average depth of 2,043 reads, and bases were called for the various SNP-positions with a p-value below 0.05, for 89.8–100% of samples. The SNP data indicates that artemisinin resistance-conferring SNPs in pfK13 are absent from the studied area of Guinea-Bissau, while the pfmdr1 86 N allele is found at a high prevalence. The mitochondrial barcodes are unanimous and accommodate a West African origin of the parasites. With this method, very reliable high throughput surveillance of antimalarial drug resistance becomes more affordable than ever before.


Results
A total of 457 patients of all ages attending the Bandim and Belem Health Care Centres in Guinea-Bissau, were diagnosed with malaria with an RDT, and donated their used RDTs to the current study. Additionally, out of these patients, 318 patients gave consent to donate venous blood, which was processed and dried on filter paper (see methods). See Fig. 1 for sampling overview and demographic details. The 318 patients who donated filter paper samples were also diagnosed by microscopy. Parasite densities ranged from 15 parasites/200 leucocytes to 500 parasites/13 leucocytes. Filter paper samples were used for full resistance profiling and mitochondrial barcoding while RDTs were only used for mitochondrial barcoding.
Sample calling and sensitivity. Several steps in the protocol of the platform presented in this study have been optimised for analysis of large numbers of samples. Multiplexing of the gene-specific PCRs followed by further multiplexing of the index PCRs and finally pooling of samples prior to sequencing (see Figs 2 and S1 for details), are all elements that meet the necessity of efficiency in high throughput surveillance, but which reduce the possibility of keeping track of all the individual PCR fragments and samples in the analysis. It is therefore crucial to investigate to what extent individual amplicons and samples were lost during the process. All of the 463 custom dual index combinations (457 samples and 6 controls) were successfully called from the sequencing data with belonging amplicon sequences, and hence no samples had been completely lost, however some had too few reads for SNP-analysis. Out of the 318 samples that were resistance-profiled, 264 were successfully sequenced in all resistance-conferring positions. Of the 54 samples that were not completely profiled, 45 were lacking adequate sequencing of a single amplicon, 7 were lacking adequate sequencing of 2 amplicons, while 2 samples were lacking adequate sequencing of 3 amplicons. Mitochondrial barcodes were successfully generated for all 457 samples. There was no correlation between parasite density and successful resistance-profiling/amplicon coverage.
Amplicon representation and coverage. The various amplicons sequenced in this study are depicted in Fig. 3. Amplicon representation was defined as the presence of sequences corresponding to a specific amplicon within the individual sample. Base calling was defined as the determination of a base at a specific position within the amplicon, and was also assessed for individual samples. Both amplicon representation and successful base calling for the individual samples was assessed after performing quality trimming of the raw sequences (see methods). The amplicon representation (listed in Table 1) was estimated as a presence/non-presence of sequences belonging to a specific amplicon in the individual samples (read count >0) and ranged from 98.1% to 100%. Successful base calling (determined as bases called with a minimum z-score of 1.96 and p-value ≤ 0.05) was estimated for all surveyed SNP-positions, as well as positions from each mitochondrial amplicon, and ranged from 89.8% to 100%. Average read count for the different resistance-conferring SNPs (per sample) varied from 250 reads to 4,734 reads (mean = 1,520 reads for individual bases, and 1,725 reads for entire amplicons, Table 1). Average read count over all amplicons was 2,043.

SNP-data from P. falciparum positive subjects attending the Bandim and Belem Health Care
Centres in Guinea-Bissau. Table 2 lists the results of frequencies of known molecular markers in pfcrt, pfmdr1, pfdhfr and pfdhps. Samples with <75% unanimous reads were categorized as mixed, and therefore counted in both the reference and alternative group. Furthermore, Table 3 lists other polymorphisms found (only pfcrt and pfmdr were found to have additional polymorphisms) and Table 4 describes all polymorphisms found in pfK13. The data acquired for the six control strains is listed in Table 5. Finally, Fig. 4 illustrates the distribution of the various haplotypes of pfcrt, pfmdr, pfdhfr and pfdhps (excluding mixed infections).
Polymorphisms found in pfmdr1. In total, 1369 bases were sequenced in pfmdr1, covering c11-162, c938-1080, and c1134-1297 (Fig. 3). The pfmdr1 c86N was found in 92.8% of the samples, while pfmdr1 c86Y was found in only 9.7% of samples (Table 2). pfmdr1 c184 was not analysed due to a design flaw. No samples were found to have mutations at pfmdr1 c1034 and c1042, while only 2 samples had the pfmdr1 c1246Y allele (Table 2). Additionally, non-synonymous mutations were found at c12 N→T (2 samples), c968 G→V (1 sample), and c1199 N→K (16 samples plus 9 mixed) ( Table 3). Synonymous mutations were found at c102 (12 samples plus 6 mixed), c149 (7 Patients of all ages attending the Bandim or Belem health care centres, with a fever or a history of fever within the past 24 hours, were tested for malaria positivity with an RDT. If the patient tested positive, the patient was asked for informed consent in donating the RDT used to test them for malaria infection. The malaria-positive patients were furthermore asked for informed consent in donating venous blood. If consent was given for this procedure, the blood was drawn and the patient was included in the study. In total, 457 malaria-positive patients gave consent to donate their RDTs and 318 of these patients gave consent to donate venous blood. If venous blood was donated, the RDT was not used for the analysis. samples), c1062 (1 sample), c1069 (16 samples plus 6 mixed), c1137 (4 samples plus 1 mixed) and c1139 (1 sample plus 1 mixed) ( Table 3). A total of 262 samples were completely profiled in pfcrt and pfmdr1, and amongst these samples we found that 41.2% carried both the pfcrt c72-76 CVIET haplotype and the pfmdr1 c86N allele.
Polymorphisms found in pfdhfr. A fragment of 461 bp was sequenced from pfdhfr, reaching from c11-164 (Fig. 3). The c16 A→V and the c164 I→L mutations were completely absent while the c51 N→I mutation was found in 88% of samples, the c59 C→R mutation was found in 87.1% of samples and the c108 S→N mutation was Primer and fragment design for library preparation. The P. falciparum gene fragments of interest are amplified with gene-specific primers, in a "gene-specific PCR" (1). The primers anneal to loci in the P. falciparum genome, while encoding non-annealing overhangs, which are then incorporated into the PCR products during amplification. The non-annealing overhangs are identical on all gene-specific primers, and serve as annealing site for the index primers (2), which can in turn anneal to all amplified gene fragments. The index primers also encode individual 8-base indices as well as an adaptor sequence. The index and adapter sequence are incorporated into the PCR product (denoted amplicon from here on), which can then bind to the Miseq flow cell through the adaptor sequence, and be identified through sequencing of it's unique index combination.
Polymorphisms found in pfdhps. Two fragments covering 887 bp reaching from c358-645 were sequenced for pfdhps (Fig. 3). The c431 V→I mutation was found in one sample, the c436 A→S polymorphism was found in 35.7% of samples, and the c437 A→G was found in 66.8% of samples. In contrary, the c540K→E mutation was only observed once in a mixed infection, the c581 A→G mutation was not observed at all and the c613 A→S was found once (Table 2). A total of 233 samples were completely profiled in pfdhfr and pfdhps, and amongst these samples we found that 52.8% carried the combined pfdhfr/pfdhps quadruple mutant (consisting of pfdhfr c51I, c59R, c108N + pfdhps c437G).
Mitochondrial barcode. Mitochondrial barcodes were successfully provided for all 457 samples. The results revealed a unanimous pattern across all samples, namely the CTTAG barcode (corresponding to nucleotide positions 772-853-973-1283-2382), which corresponds to the barcodes published for West-Africa 26 .

Discussion
Multi-drug resistant P. falciparum threatens the current trend of decreasing malaria burden observed in many malaria endemic countries. Molecular surveillance of antimalarial resistance is becoming an attractive operational tool to determine temporal and geographical emergence and spread of drug resistance. However, efficient molecular surveillance requires high throughput, sequence-based SNP analysis in order to assess all the SNPs and genes playing a role in antimalarial drug resistance. This study applied Illumina's Miseq NGS technology of paired-end sequencing on samples collected from patients with P. falciparum infections to provide a high-throughput molecular surveillance method, which is cost-effective as compared to other published methodologies including classic Sanger sequencing.
The sequenced amplicons cover the most established molecular markers of drug resistance in pfcrt, pfmdr1, pfdhfr and pfdhps, as well as the entire pfK13 gene. Additionally, the sequenced amplicons also covered the SNPs constituting the mitochondrial barcode linked to the geographical origin of P. falciparum. The method is PCR based and applies several PCR steps, ruling out the possibility of assessing gene copy numbers, as would be relevant for pfmdr1 and the recently identified plasmepsin 42 . However, the number and identity of amplicons incorporated in the analysis is only dependent on primer design. The method is therefore highly adaptable, since polymorphisms for analysis can easily be incorporated, including for example the exonuclease E415G polymorphism, which has recently been identified as a marker for PPQ resistance 42 , pfmdr1 c184, as well as SNPs occurring in the apicoplast, which can be used in further specifying the origin of the infecting parasite 42 . Importantly, the method is sequencing-based, which generates the possibility of monitoring entire genes, which is necessary in the case of polymorphisms in pfK13 and their relation to artemisinin resistance. It also opens the possibility of identifying other polymorphisms that are not normally analysed with SNP-detection based methods, as is seen for pfcrt and pfmdr1 in this study. The unique 5′-and 3′-indices enable identification of individual samples, and it is therefore feasible to analyse various genes belonging to the same infection, which again is necessary when monitoring parasite susceptibility to combination therapies, such as SP and ACTs.
The possibility of customising the library preparation is the reason the Illumina platform was chosen over other platforms that would provide longer sequences and hence require fewer amplicons for sequencing of entire genes. At 10 USD per sample, the current study is approximately 7x cheaper than an alternative protocol published for Ion Torrent 41 . We have created 50 unique indices (Table S2), which make it possible to create 2450 unique dual index combinations (50 × 50-50) and hence run as many samples on the Miseq simultaneously. It is thereby feasible to reduce sample costs to less than 4 USD per sample, approximating only 5 USD per sample  Table 1. Amplicon Amplicon representation, successful base calling and average read count. The table shows how many of the samples were found with pertaining reads corresponding to the various amplicons, after quality trimming (as sample count and percentage), as well as for how many samples successful base-calling was performed (requires a minimum z-score of 1.96). For pfK13 and for the mitochondrion, a position for each amplicon was analysed. Some of the SNP positions are located in the same amplicon. The table also lists the name of the individual amplicons, and Fig. 3 can be referred to for illustrations of the length and position of the amplicons within the various genes.
already with only 800 samples. When running large numbers of samples simultaneously, the challenge consists of controlling the process of mixing all amplicons in equal amounts prior to sequencing (see Figure S1 for protocol overview and Figures S2 and S3 for example data). Targeting the desired depth of coverage of the various amplicons can be approximated through this process, also if certain amplicons are to be sequenced deeper than others. However, varying depth of coverage should be expected, especially if multiplex PCR is applied for gene amplification and subsequently during the library preparation, as is the case in our study. Sequencing bias may also occur when using the library design applied 43 , leading to vastly differentiated depth of coverage of the different amplicons. This is due to a potential mismatch between the Illumina Nextera sequencing primer and the designed amplicons, which can also occur when applying the Illumina Nextera Library Preparation Kit 43 . It is noteworthy, however, that despite such a bias, the final coverage can still be expected to be adequate, due to the mere sequencing capacity of the Illumina Miseq (amounting to 7.5 billion bp per run).  Table 3. Other polymorphisms in pfcrt and pfmdr1. The table describes polymorphisms identified amongst the 318 resistance-profiled samples in pfcrt and pfmdr1 other than the positions of interest indicated on Fig. 3 and Table 2. The frequencies of both the reference and alternative are listed, as well as the frequency of mixed infections. Note that mixed infections were counted in both the reference and alternative frequencies.  Table 4. Polymorphisms in pfK13. The table describes the various polymorphisms found in pfK13 for the 318 resistance-profiled samples, and the frequencies in which the reference and alternative base were found, as well as the frequencies of mixed infections. SNP positions written in italic indicate that they are located in the propeller region of pfK13. Note that mixed infections were counted in both the reference and alternative frequencies.  While gene haplotype analysis excludes analysis of mixed infections, frequencies of individual SNPs can be given for major and minor alleles as long as a minor allele is detectable. Since the aim of the current study was not only to provide frequency data, but also haplotype data, the cut off for mixed infections was set as <75% of the bases supporting a given base for a given position. This cut off, however, can be reduced and should be considered subjective to study purpose. The experimental limitations for the cut off are largely dependent on the achieved depth of coverage as well as the applied statistical probability applied for base calling.
The method applied in this study generated sequence data from pfcrt, pfmdr1, pfdhfr, pfdhps and pfK13 from 318 P. falciparum samples. The polymorphisms discovered in pfK13 in the current study indicate no pfK13-mediated artemisinin tolerance in Bandim, Guinea-Bissau. Accordingly, an efficacy study published in 2016 showed high efficacy of AL in the same area 3 . The majority of the non-synonymous polymorphisms found in pfK13 were located in the N-terminal region (not the propeller), in particular the pfK13 c189T was found at 57.9%. The high prevalence of this SNP is consistent with previous studies in Africa and it has not been associated with artemisinin resistance 15,44,45 . Only two non-synonymous polymorphisms were found in the propeller region of pfK13: c578S and c698I. The pfK13 c578S has previously been identified at noteworthy frequencies in a number of SSA countries without clinical relevance 3,15,25,46 . However, the data generated in this study cannot rule out any pfK13-independent mechanisms conferring resistance towards artemisinin, as well as the study cannot conclude on the treatment efficacy for the patients involved. Interestingly, a Kenyan study has recently published the presence of residual P. falciparum parasitemia at day 3 after treatment with either artemether-lumefantrine or dihydroartemisinin-piperaquine in 31.8% of the infected children 47 . A follow-up study furthermore rules out the responsibility of PfK13-encoded polymorphisms in in these day 3 positive cases 48 . Importantly, the mitochondrial barcodes indicated no recent import of P. falciparum parasites from South-East Asia, where artemisinin resistance is a reality. Mainly the first position in the barcode, base 772, would reveal recent South-East Asian origin, if it were a T instead of a C 26 . All four of the South-East Asian control parasites were found to have a T on position 772 (DD2, K1, MRA-1238, MRA-1239, see Table 5). The pfcrt c76T polymorphism and the pfcrt c72-76 CVIET haplotype, were found in 52.4% and 43.5%, respectively. Almost all of the samples (90%) were pfmdr1 c86N. The pfdhfr triple mutant was present at 82.4%. The pfdhps c437G polymorphism was 66.8%, while the polymorphisms at codons 540, 581 and 613 known to occur further downstream in pfdhps in an East African context were either absent or only found in single samples. The pfdhps c431V, which recently has been found in Nigeria and has been proposed as a molecular reason for the lack of mutations downstream of c437 in pfdhps in West-African P. falciparum parasites 49,50 was only found in one sample. These findings are consistent with previously published findings from the same area 51 , while also indicating an increase in the frequency of pfmdr1 c86N 51 , consistent with the recommendation and use of both AL and QN in treatment of malaria in Guinea-Bissau 1, 12, 52 . Furthermore, the frequency of the pfdhfr/pfdhps quadruple mutant was 52.8%, which indicates a noteworthy increase since it was last documented in 2004 at 15% 53 . Selection of the pfdhfr/pfdhps quadruple mutant may be explained by the recommendation and use of SP for IPTp in Guinea-Bissau, as well as through possible import from neighbouring countries with higher levels of the quadruple mutant 49,52,54,55 .

Conclusion
The methodology presented in this study represents an affordable, high throughput and reliable platform for efficient surveillance of spread and emergence of antimalarial drug resistance. With Illumina technology already installed in several research facilities in countries with malaria, implementing surveillance of antimalarial drug resistance based on this method would be feasible in these locations. In the current study, the analysis of all molecular markers of drug resistance was performed for less than 10 USD per sample, but can be run for as little as 4 USD per sample. Between 3 and 16 amplicons from 457 P. falciparum patient samples were successfully sequenced on a single Miseq v3 flow cell with an average coverage of 2,043 reads and a minimum z-score of 1.96 applying to all published base calls.

Methods
Patients and malaria parasite samples. A demographic overview of the patients recruited and the sampling process is depicted in Fig. 1. Patients of all ages attending the Bandim and Belem health care centres in Guinea-Bissau between October 2014 and March 2016, symptomatic of and diagnosed with P. falciparum malaria using malaria RDTs were after informed consent asked to donate blood for filter paper sampling (n = 318). Blood samples were drawn in accordance with the relevant guidelines and regulations. An additional 139 patients donated their P. falciparum positive RDTs. All patients donating samples had fever (≥37, 5 °C), or a history of fever within the past 24 hours and hence qualified as having clinical malaria and were treated according to national malaria treatment guidelines. Filter paper samples were collected as venous blood, which was filtered with Plasmodipur filters (Europroxima, Arnhem, Netherlands), and dotted on Whatman filter paper nr. 3. Filter paper and RDTs were stored at room temperature for 6-9 months before transport to Denmark. Furthermore, 6 well-described parasite isolates (3D7, Dd2, K1, 7G8, MRA-1238, and MRA-1239) were used as controls. The data acquired for these control strains is listed in Table 5. Primer design. The experimental protocol was designed to investigate polymorphisms occurring in pfcrt (c72-76), pfmdr1 (c86, c1034, c1042 and c1246), pfdhfr (c16, c51, c59, c108 and c164), pfdhps (c431, c436, c437, c540, c581 and c613), all of pfK13, and mitochondrial-genome base positions (772, 853, 973, 1283 and 2383). All gene-accession numbers, PCR fragments and SNPs are illustrated in Fig. 3. The entire protocol for PCRs and Miseq sequencing is illustrated in Figure S1. Primers were designed based on the principles described in Illumina's own Miseq protocol for 16 S metagenomic sequencing library preparation 57 and are listed in Supplementary Tables S1 (gene specific primers) and S2 (index primers). The concept of gene-specific and index PCR as well as characteristics of the primers needed for each step are illustrated in Fig. 2. The gene-specific primers were designed to anneal to the P. falciparum gene of interest (at a melting temperature around 60 °C), amplifying a fragment of approximately 500-550 bp, in order to allow for overlap of the two reads conducted during paired-end sequencing (350 bp reads). The gene-specific primers all contain a non-annealing overhang on both forward and reverse primers. The overhangs are incorporated into both ends of the PCR product during the gene-specific PCR, and can now act as annealing site for the indexing primers (Table S2) during the indexing PCR. The overhang sequence is based on Illumina's protocol, because it also serves as primer annealing site for Illumina's own sequencing primers during sequencing. The length and melting temperature are therefore ideal for the temperatures applied by the Miseq during the different sequencing steps. Indexing primers were designed to anneal to the overhangs, while themselves containing overhangs consisting of individual 8-base indices and adapter sequences that will allow the final PCR product to bind to the sequencing flow cell. Indices were generated using the Barcode Generator V. 2.8 58 , indicating a desired barcode length of 8 bases, a distance between barcodes of 4 bases and a GC content of at least 40%. All indices used for generation of the presented data are listed in Table S3. All primers were ordered at Eurofins Genomics (MWG-Biotech AG, Edersberg, Germany). PCR reactions and programs. All PCRs were performed with AH PCR Mastermix (AH diagnostics, Århus, Denmark). Four different multiplex PCR reactions were run, while Pfcrt.2 and Mito.2 were run as simplex. The multiplex primer combinations are listed in Table S1. Gene-specific primers were used at a 0,065 μM final concentration each, together with 1 μl of template DNA in a total reaction volume of 10 μl. The gene-specific PCR program consisted of the following steps: Heat activation at 95 °C for 15 minutes, 30 cycles of denaturation at 95 °C for 20 seconds, annealing at 57 °C for 2 minutes, and elongation at 72 °C for 2 minutes, and one final elongation at 72 °C for 10 minutes. In order to target relatively uniform coverage of the various amplicons, a subset of the PCR products from the index-specific multiplex PCRs were analysed by qPCR for the relative presence of the different fragments (qPCR described below). Based on this data (not presented) the PCR products were mixed at a ratio of (multiplex PCRs 1-4) M1:M2:M3:M4/1:0,5:4:6. M5 and amplicon Mito.2 were mixed at a ratio of 2:1, and Pfcrt.2 was run as simplex on the index PCR. These mixtures of multiplexes were used as template in the indexing PCRs. Indexing PCR reactions were performed with a final primer concentration of 0,065 μM together with 0,75 μl of gene-specific PCR product (mix of multiplexes or simplex reactions) in a final volume of 11,75 μl. The indexing PCR program consisted of the following steps: Heat activation at 95 °C for 15 minutes, 20 cycles of denaturation at 95 °C for 20 seconds, annealing at 60 °C for 1 minute, and elongation at 72 °C for 1 minute, and one final elongation at 72 °C for 10 minutes. Reactions were performed on either a VWR DuoCycler, VWR UNO96 (VWR, Søborg, Denmark) or an MWG BIOTECH Primus96 Plus (MWG-BIOTECH, Edersberg, Germany). qPCR. Primers applied for qPCR analysis are listed in Table S3. qPCR reactions were performed with QuantiTect SYBR Green Master Mix (Qiagen, Hilden, Germany), a final primer concentration of 2 μM in a final volume of 20 μl. Template for the reaction consisted of 1 μl of PCR product obtained from the Index PCR, diluted 1:200. The qPCR program consisted of the following steps: Heat activation at 95 °C for 15 minutes, 20 cycles of denaturation at 95 °C for 30 seconds, annealing at 54 °C for 40 seconds, and elongation at 68 °C for 50 seconds, and one final elongation at 68 °C for 10 minutes. See Figure S2 for example data.

Ethical approval. Ethical approval and permission for sampling venous blood from malaria patients in
Amplicon purification, dilution and pooling. Pools were made for a maximum of 96 samples, consisting of 4 ul of each indexing PCR reaction, totalling 14 pools (corresponding to the 14 individual 96-well plates that the PCRs were run in). The amplicons were bead purified with AMPure XP beads (Beckman Coulter, California, United States) according to manufacturer's protocol, using 200 ul PCR product (per pool), and 0.6 x PCR-pool volume of beads (=120 ul), in order to eliminate primer dimers of approximately 250 bp. Purified PCR pools were analysed on Agilent 2100 Bioanalyser using the Bioanalyser 1000 Kit (Agilent Technologies, Santa Clara, CA, USA), to verify the proper elimination of primer dimers, and amplicon sizes of approximately 650-700 bp. Concentration of purified PCR pools were measured on a Nanodrop2000 UV-VIS Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and subsequently diluted to 4 nM pools, according to the following equation: (concentration of pool/(660 * fragment length (bp))) * 1,000,000. Finally, diluted amplicon pools were pooled further according to the space each pool was to be given on the flow cell during sequencing (described in Figure S3). This was decided based on the amount of samples represented by each pool, multiplied by the amount of fragments amplified for each sample in the pool.
Miseq sequencing. The pooled libraries were denatured with NaOH to a final concentration of 1 mM, diluted with hybridization buffer and further heat denatured at 96 °C for 2 min before running the sequencing. 5% PhiX was included for low diversity libraries. The sequencing was performed with an Illumina MiSeq instrument using paired end 2 × 300-bp reads and a MiSeq v3 flow cell according to the manufacturers recommendation. Sequencing was carried out by the DTU in-house facility (DTU Multi-Assay Core (DMAC), Technical University of Denmark).
Quality trimming of paired-end sequences and base calling. Raw sequences were quality trimmed with Cutadapt 59 at a phred score of 20, and primer sequences were trimmed from the 5′-end of the sequences, in order to avoid primer bias in the sequenced fragments. Base calling was performed with Assimpler 60 , a Python program developed to compare reads with a custom database, consisting of the 3D7 reference sequences of all the analysed genes. Assimpler will then output two files, a column file with the reference base and the called base from the sample at all positions in the custom database, as well as a contigs file. Bases were called with a minimum z-score of 1.96 corresponding to a p-value below 0,05. The Z-score was calculated as Z = (X − Y)/sqrt(X + Y), the number of reads X having the most common nucleotide at that position, and the number of reads Y supporting other nucleotides 60 . Mixed infections were categorised as samples with <75% unanimous reads for a given base position for the purpose of generating haplotype data.