Flexible and cost-effective genomic surveillance of P. falciparum malaria with targeted nanopore sequencing

de Cesare, Mariateresa; Mwenda, Mulenga; Jeffreys, Anna E.; Chirwa, Jacob; Drakeley, Chris; Schneider, Kammerle; Mambwe, Brenda; Glanz, Karolina; Ntalla, Christina; Carrasquilla, Manuela; Portugal, Silvia; Verity, Robert J.; Bailey, Jeffrey A.; Ghinai, Isaac; Busby, George B.; Hamainza, Busiku; Hawela, Moonga; Bridges, Daniel J.; Hendry, Jason A.

doi:10.1038/s41467-024-45688-z

Article
Open access
Published: 15 February 2024

Nature Communications volume 15, Article number: 1413 (2024)

Abstract

Genomic surveillance of Plasmodium falciparum malaria can provide policy-relevant information about antimalarial drug resistance, diagnostic test failure, and the evolution of vaccine targets. Yet the large and low complexity genome of P. falciparum complicates the development of genomic methods, while resource constraints in malaria endemic regions can limit their deployment. Here, we demonstrate an approach for targeted nanopore sequencing of P. falciparum from dried blood spots (DBS) that enables cost-effective genomic surveillance of malaria in low-resource settings. We release software that facilitates flexible design of amplicon sequencing panels and use this software to design two target panels for P. falciparum. The panels generate 3–4 kbp reads for eight and sixteen targets respectively, covering key drug-resistance associated genes, diagnostic test antigens, polymorphic markers and the vaccine target csp. We validate our approach on mock and field samples, demonstrating robust sequencing coverage, accurate variant calls within coding sequences, the ability to explore P. falciparum within-sample diversity and to detect deletions underlying rapid diagnostic test failure.

Introduction

The malaria parasite species Plasmodium falciparum is an example of both the potential value of genomic surveillance and the obstacles that can impede its implementation. Although a variety of antimalarial drugs exist, the evolution of resistance has compromised their efficacy^1,2. Most critical is resistance to artemisinin, the dominant chemotherapeutic agent in artemisinin-based combination therapy (ACT) and the foundation of global guidelines for the treatment of malaria³. Formerly confined to the Greater Mekong Subregion^4,5,6, genetic mutations associated with artemisinin resistance have recently been detected in Uganda⁷ and Rwanda⁸, escalating the risk of ACT failure in sub-Saharan Africa. Additionally, P. falciparum parasites with deletions causing false negative rapid diagnostic test (RDT) results have been detected at high frequency in Eritrea^9,10 and Ethiopia^11,12. The causal mutations underlying these phenotypes^12,13,14,15,16 and resistance to other common antimalarials are well characterised. By informing on the frequency and distribution of these mutations, genomic surveillance could play a crucial role in crafting evidence-based policies to limit their spread and improve malaria control.

Despite its potential value, multiple challenges limit widespread genomic surveillance of P. falciparum malaria. First, the nuclear genome is 23 Mbp¹⁷—considerably larger than typical bacterial (~3–5 Mbp)^18,19 or viral genomes (~10–100 kbp)²⁰. At present, this renders whole-genome sequencing strategies prohibitively costly to scale. Second, although targeted sequencing strategies—such as those employing multiplex polymerase chain reaction (PCR)^21,22,23, molecular-inversion probes^24,25 or hybrid capture — can be potentially more cost-effective, the genome of P. falciparum is extremely (A+T)-rich¹⁷ and often there is little unique and biochemically-suitable sequence (e.g., for primer or probe design) within proximity of targets. This makes the development of these approaches particularly difficult for P. falciparum. Third, many regions with a high unmet need for P. falciparum genomic surveillance are in sub-Saharan Africa, yet most existing targeted sequencing approaches have been developed for Illumina platforms^21,22,23,25. Due to their complexity, costs and maintenance requirements, these platforms are concentrated in centralised sequencing facilities—few of which are in sub-Saharan Africa. Although this situation is improving²⁶, deficits in local sequencing capacity still impel many small- and medium-sized labs to ship samples internationally for sequencing. This reduces country engagement, introduces ethical and logistical issues around sample export, and inevitably increases time to result, potentially delaying evidence-based policy decisions.

At the same time, there has been growing use of nanopore sequencing for pathogen genomic surveillance, facilitated by the small and portable MinION sequencing device (Oxford Nanopore Technologies). The MinION can be deployed in low resource settings, requires no maintenance, and permits real-time data analysis²⁷. It has been successfully deployed during Ebola²⁸, Zika²⁹, and SARS-CoV-2 outbreaks²⁶. A key advantage of nanopore-based sequencing is the generation of long reads (kbps to Mbps)³⁰ that can improve mapping and structural variant detection³¹, while a disadvantage is a higher base-level error rate compared to instruments from Illumina or Pacific Biosciences (PacBio). Although important proof-of-principle studies have demonstrated the feasibility of nanopore-based sequencing of P. falciparum, and investigated the consequences of its higher error rate^32,33,34, comparatively little effort has been made to develop methods for routine nanopore-based genomic surveillance of malaria.

In this study, we developed a flexible and cost-effective approach to targeted P. falciparum sequencing using the MinION. Flexibility is created through the development of open-source software, called multiply, that enables multiplex PCR design for a user-defined set of target genes and/or regions across the P. falciparum genome. We use this software to create eight- and sixteen-target amplicon sequencing panels, which encompass genes associated with antimalarial drug resistance, RDT failure, complexity of infection (COI) inference and malaria vaccine target csp^35,36. To sequence these panels we devised an optimised protocol that utilises dried blood spots (DBS) as input and costs approximately USD $25 per sample. We validate this approach on mock samples and Zambian field samples collected as DBS, and demonstrate adequate sequencing coverage of target genes, a high SNP calling accuracy within coding sequence (CDS), and how P. falciparum within-sample diversity is detectable in long-read data through analysis of the surface antigen gene msp2. Finally, we perform a proof-of-principle experiment demonstrating that our assay can identify hrp2/3 deletions that cause false-negative RDT results, presenting a novel statistical model for deletion calling from amplicon sequencing data.

Results

Designing amplicon panels for P. falciparum with multiply

New amplicon sequencing panels require the development of a multiplex PCR which, even for a moderate number of targets, entails evaluating vast combinations of primers for off-target binding, primer dimers, or polymorphic sites in the study population. To facilitate this process for amplicon panels where the targets are distributed across larger genomes (i.e., in contrast to tiling PCR of smaller pathogen genomes³⁷), we developed software called multiply (Fig. 1a). multiply provides a rapid and flexible approach to multiplex PCR design given a user-supplied list of target genes and/or regions. Briefly, multiply first generates a diverse set of candidate primers for each target using primer3³⁸. It then searches for polymorphic sites within primer binding locations by intersecting them with user-supplied Variant Call Format (VCF) files; computes primer-dimer scores for all candidate primer pairs using an algorithm similar to that described by Johnston et al.³⁹; and identifies potential off-target binding sites using blastn against the P. falciparum reference genome^40,41. At present, multiply does not check for potential off-target binding sites in the human genome, or in the genomes of other blood-borne pathogens. Results from these three steps are combined into a cost-function that scores multiplex PCR primer combinations, with a lower score indicating a better predicted performance. Finally, the cost-function is minimised using a greedy search algorithm to identify optimal combinations of primers for the specified targets.
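The final selection step described above can be illustrated with a minimal sketch. This is not the code of multiply: the cost terms, the dictionary layout, the primer names and the penalty values are all hypothetical, and the greedy strategy shown (add one target at a time, keeping the primer pair that raises the total cost least) is only one plausible reading of "greedy search".

```python
from itertools import combinations


def multiplex_cost(selection, primer_cost, dimer_cost):
    """Total cost of a multiplex: per-primer penalties plus pairwise dimer penalties.

    selection   -- {target: (forward_primer, reverse_primer)}
    primer_cost -- {primer: penalty}, e.g. for off-target matches or SNPs (hypothetical)
    dimer_cost  -- {frozenset({primer_a, primer_b}): penalty}; missing pairs cost 0
    """
    primers = [p for pair in selection.values() for p in pair]
    individual = sum(primer_cost[p] for p in primers)
    pairwise = sum(
        dimer_cost.get(frozenset((a, b)), 0.0) for a, b in combinations(primers, 2)
    )
    return individual + pairwise


def greedy_select(candidates, primer_cost, dimer_cost):
    """Add targets in the order given, each time keeping the cheapest primer pair."""
    selection = {}
    for target, pairs in candidates.items():
        selection[target] = min(
            pairs,
            key=lambda pair: multiplex_cost(
                {**selection, target: pair}, primer_cost, dimer_cost
            ),
        )
    return selection, multiplex_cost(selection, primer_cost, dimer_cost)


# Toy input: two targets, two candidate primer pairs each (names are made up).
candidates = {
    "kelch13": [("k_F1", "k_R1"), ("k_F2", "k_R2")],
    "dhps": [("d_F1", "d_R1"), ("d_F2", "d_R2")],
}
primer_cost = {
    "k_F1": 0.0, "k_R1": 0.0, "k_F2": 1.0, "k_R2": 0.0,
    "d_F1": 0.0, "d_R1": 0.0, "d_F2": 0.5, "d_R2": 0.0,
}
# One strongly interacting pair of primers across targets.
dimer_cost = {frozenset(("k_R1", "d_F1")): 5.0}

selection, cost = greedy_select(candidates, primer_cost, dimer_cost)
print(selection, cost)
```

In this toy case the greedy pass avoids the interacting primers by accepting a slightly worse dhps pair. A greedy search is a heuristic, so on larger inputs the result can depend on the order in which targets are visited.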

**Fig. 1: Design of long-range multiplex PCRs for the low-complexity *P. falciparum* genome using *multiply*.**

We used multiply to develop a multiplex PCR for P. falciparum malaria, selecting eight target genes that would maximise the public health utility of our data (Table 1). To leverage the long-read capability of nanopore sequencing, we restricted candidate amplicons to 3–4 kbp, aiming to produce CDS-spanning amplicons that would still be feasible for PCR. In the design process, multiply considered a total of 194 candidate primers across the eight targets. For these candidate primers, it identified 383 high scoring off-target complementary matches in the 3D7 reference genome (>12 bp aligned from the 3’ end). Overall, 209 matches involved candidate forward primers for dhps; a candidate reverse primer for plasmepsin I (pmI) had 35 matches; a candidate reverse primer for kelch13 had 24 matches; and most other candidate primers had 5 or fewer matches. By comparing to the variant calls from 7113 P. falciparum whole genome sequences in the Pf6 data release⁴², multiply identified 11 common SNPs (set to minor allele frequency >5% in any Pf6 population) within binding locations of candidate primers, which were excluded. Of the 18,915 unique pairwise alignments multiply computed between candidate primers, 585 had potentially problematic dimer scores (score < − 6). Using a greedy search algorithm, multiply heuristically minimised these factors to suggest a multiplex PCR primer combination from the over 370 million possibilities given the candidate primer set.
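As a quick arithmetic check on the numbers in this paragraph: the reported 18,915 pairwise alignments is exactly the number of unordered pairs among 194 primers when each primer is also aligned against itself. That interpretation is our inference from the arithmetic, not something the text states.

```python
from math import comb

n_primers = 194

# Unordered pairs of distinct primers, plus each primer paired with itself.
pairs_distinct = comb(n_primers, 2)
pairs_with_self = pairs_distinct + n_primers

# Share of alignments flagged as potentially problematic dimers (585 reported).
problematic_share = 585 / pairs_with_self

print(pairs_distinct, pairs_with_self)
print(f"{problematic_share:.1%} of alignments flagged")
```

Roughly 3% of the pairwise alignments are flagged, which is a small share but enough to matter once sixteen or more primers share one reaction.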

Table 1 Target genes for the NOMADS8 and NOMADS16 amplicon sequencing panels

We call the amplicons produced from this multiplex PCR the NOMADS8 (NMEC-Oxford Malaria Amplicon Drug-resistance Sequencing) panel. In total, the amplicons cover 28.8 kbp with an (A+T)-composition of 79%. The full coding sequences for 7 of 8 gene targets are captured completely within their amplicons. mdr1 has a coding sequence covering 4259 bp; our amplicon is only 3773 bp but includes important drug-resistance mutations (e.g., N86Y to D1246Y)⁴³. Using PCR conditions with reduced annealing and extension temperatures⁴⁴, we were able to obtain robust amplification of all individual targets and produce bands consistent with expectation for the multiplex, as assessed by agarose gel electrophoresis (Supplementary Fig. 1).

We used multiply to expand the NOMADS8 panel to include an additional eight targets. These were ama1, a highly polymorphic gene used in COI estimation⁴⁵; the RTS,S and R21 vaccine target csp^35,36; and the RDT antigen genes hrp2 and hrp3¹⁶, as well as their flanking genes. To incorporate these eight targets, multiply considered an additional 214 candidate primers and, keeping the 16 primers of the NOMADS8 panel fixed, repeated the selection process described above. The resulting amplicon panel, called NOMADS16, covers a total of 54.7 kbp (Table 1).

Minimising P. falciparum amplicon sequencing costs on the MinION

We combined existing and novel optimisations to minimise the costs of P. falciparum target amplification and sequencing on the MinION (Fig. 1b). Briefly, our protocol starts with DBS, which are relatively non-invasive and easy to collect, as input for DNA extraction. Bulk P. falciparum DNA is enriched with a reduced-volume selective-whole genome amplification (sWGA) step, saving approximately USD $4 per sample while still maintaining sufficient yield for subsequent multiplex PCR (Supplementary Fig. 2). Amplicons are barcoded and pooled using a modified version of a simple and cost-effective one-pot protocol⁴⁶. Overall, the protocol from sample to sequence can be completed in 2–3 days at USD $25 per sample, assuming 96 samples are run on a R9.4.1 (FLO-MIN106D) or R10.4.1 (FLO-MIN114) MinION Flow Cell without washing (Supplementary Table 1). Smaller batches of 24 samples run on a Flongle Flow Cell (FLO-FLG001) add a negligible extra USD $1 per sample.

Producing long-read data for policy-relevant P. falciparum genes

We explored the read lengths that are generated with our amplicon panels and protocol by sequencing a mock sample, created by combining P. falciparum 3D7 and human DNA in vitro, on a Flongle Flow Cell (Methods, Fig. 1c, d, e). For the NOMADS8 panel, the median length of reads that mapped to the P. falciparum reference genome and overlapped a target gene was 3.59 kbp. All eight target genes had a median read length greater than 3.04 kbp and, excluding mdr1, on average 91.7% of reads that overlapped a target gene spanned its entire CDS. This included reads spanning all 13 exons of crt1 and the entire CDS of the artemisinin-resistance associated gene kelch13 (Fig. 1c, d). In several cases, longer amplicons enabled multiply to select primers that bind to regions with more moderate (A+T) compositions in adjacent genes, and this was the case for the forward primer used to amplify kelch13 (Fig. 1d). Similarly, the median length of target-overlapping reads for the NOMADS16 panel was 3.37 kbp, with an average of 88.2% of these completely spanning their target’s CDS (excluding mdr1).

Characterising sequencing efficiency and coverage across mock and field samples

Sufficient coverage over target regions is a precondition for accurate variant calling and other downstream analyses. Whether this is achieved depends on the total sequencing throughput, the proportion of that throughput that is on-target (i.e., maps to the intended organism and regions), and how uniformly on-target throughput is distributed across the target regions and samples.

We characterised the coverage generated by our protocol by running experiments with both the NOMADS8 and NOMADS16 panels on two different sample sets. The first set included 24 mock samples, created in vitro from standard laboratory or cultured strains of P. falciparum malaria (Methods, Supplementary Table 2). The second was a set of 28 DBS assessed as P. falciparum positive by RDT, and collected from a clinical setting in Kaoma, Zambia (Methods).

We sequenced the mock samples with the NOMADS8 panel on a Flongle Flow Cell, generating 345 thousand reads or 1.08 Gbp (Fig. 2a). Of these reads, 80.0% passed the Guppy quality control filter and had identifiable sample barcodes on at least one end. We mapped these reads to the P. falciparum 3D7 reference genome and found that 76.2% (61.4% of the total reads) mapped successfully. To understand the causes of mapping failure, all unmapped reads were subsequently mapped to the human reference genome. Nearly all of the reads failing to map to the P. falciparum genome mapped successfully to the human reference (99.5%). These reads tended to be shorter and of lower quality than those mapping to P. falciparum, and in optimisation experiments we were able to reduce them by using a higher stringency DNA size selection step after adapter ligation (Supplementary Fig. 3, Methods). The human-mapped reads remaining in this experiment (18.5% of total) were not removed by size selection, despite being shorter. Of the reads mapping to P. falciparum, 93.5% mapped to target regions, suggesting that multiply largely avoided the production of off-target amplicons. In the end, 57.4% of total sequencing reads were on-target for this experiment. A similar percentage was found to be on-target for the NOMADS8 panel when sequencing field samples using a standard MinION Flow Cell (62.1%).
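The on-target figure follows from chaining the percentages reported above. The short sketch below reproduces that arithmetic using only numbers from the text; the function name is ours.

```python
def on_target_fraction(mapped_of_total, on_target_of_mapped):
    """Fraction of all reads that map to P. falciparum AND fall on a target."""
    return mapped_of_total * on_target_of_mapped


mapped_of_total = 0.614      # reads mapping to P. falciparum, as a share of all reads
on_target_of_mapped = 0.935  # share of those reads that overlap a target region

on_target = on_target_fraction(mapped_of_total, on_target_of_mapped)
print(f"{on_target:.1%}")  # 57.4%
```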

**Fig. 2: Sequencing throughput and coverage across samples and target genes for the NOMADS8 panel.**

Next, we interrogated how uniformly on-target sequencing coverage was distributed across the amplicons of the NOMADS8 panel by quantifying the number of reads that overlapped each of our targets. For the mock and field samples, the median fold-difference in coverage between the highest and lowest abundance amplicons was 9.3 and 16.2, respectively (Fig. 2b, c). With both the mock and field sample sequencing runs, the rank-order of amplicons by abundance was consistent (mock samples, Spearman’s ρ = 0.77; field samples, Spearman’s ρ = 0.85; Supplementary Fig. 4a, b). This indicates that coverage variation across amplicons is largely systematic, and likely a function of differences in PCR efficiency, rather than stochastic. However, comparing mock and field sample sets, the order of amplicon abundances differed slightly, indicating sample-set dependent effects. For example, mdr1 was lower abundance and dhfr was higher abundance in field samples; but notably, mdr1 is present in multiple copies in the laboratory strain Dd2, which is used in 8 of 24 of our mock samples (Supplementary Table 2). crt1 had the lowest abundance for both mock and field samples; it is also the longest amplicon in the NOMADS8 panel (3874 bp), with the second highest (A+T) composition (81.62%, behind pmIII with 81.66%) and the most bases in long homopolymers (605 bp in homopolymers of length 4 or greater). Despite this, crt1 still had a median of 230-fold coverage in the mock sample experiment and 508-fold coverage with the field samples (Supplementary Fig. 4a, b).
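The two summary statistics used here, the fold-difference between the most and least abundant amplicon and the rank consistency of amplicon abundances, can be computed as in the sketch below. The coverage values are invented for illustration, and comparing two samples pairwise is an assumption about how rank consistency might be assessed; the exact procedure behind the reported ρ values is not given in this section.

```python
from statistics import median


def fold_difference(coverage):
    """Ratio of the highest to the lowest amplicon coverage within one sample."""
    values = list(coverage.values())
    return max(values) / min(values)


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho for two equal-length sequences without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical per-amplicon read counts for two samples.
sample_a = {"kelch13": 900, "dhfr": 600, "mdr1": 400, "crt1": 100}
sample_b = {"kelch13": 1500, "dhfr": 700, "mdr1": 900, "crt1": 150}

folds = [fold_difference(s) for s in (sample_a, sample_b)]
amplicons = list(sample_a)
rho = spearman([sample_a[a] for a in amplicons], [sample_b[a] for a in amplicons])

print(median(folds), rho)
```

A high ρ with a large fold-difference is the pattern described in the text: the same amplicons are consistently over- or under-represented, which points to systematic rather than random variation.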

The NOMADS16 panel had less uniform coverage across amplicons in comparison to the NOMADS8 panel (Supplementary Fig. 5). In particular, the fold-difference between the highest and lowest abundance amplicons was 141 for the mock samples and 324 for the field samples. This was driven in part by the hrp3 upstream amplicon producing very low median coverage relative to other amplicons in both experiments (mock samples, median of 28-fold coverage; field samples, median of 30-fold coverage; Supplementary Fig. 4c, d); with the hrp3 upstream amplicon excluded, the fold-difference between the highest and lowest abundance amplicons was substantially reduced, but still higher than with the NOMADS8 panel (mock samples, 43.8; field samples, 51.5). As with the NOMADS8 panel, amplicons in the NOMADS16 panel had consistent relative abundances (mock samples, Spearman’s ρ = 0.85; field samples, Spearman’s ρ = 0.84; Supplementary Fig. 4c, d), but again the specific order varied somewhat between mock and field samples.

Effect of parasitemia on sequencing performance

We examined the effect that sample parasitemia had on three measures of sequencing performance: the number of reads generated per sample, normalised to the mean for the sequencing run; the percentage of those reads that mapped to P. falciparum; and the fold-difference in coverage between the highest and lowest abundance amplicons for the sample (Fig. 3). We jointly analysed data from across six different sequencing experiments to take into account batch effects caused by technical factors or variation in sample quality. These experiments used both NOMADS8 and NOMADS16 and included three different types of sample sets: 120 mock samples created by combining P. falciparum and human genomic DNA at different ratios to replicate varying parasitemia; 28 field samples sequenced in Oxford, UK as part of a training; and 41 field samples sequenced from a governmental container lab in Lusaka, Zambia (Methods). Both sets of field samples were collected as DBS. Jointly these sample sets had parasitemia values ranging from 10 parasites per microlitre (p/μL) to over 100,000 p/μL. Unfortunately, we note that parasitemia data was missing for 12/28 field samples sequenced in Oxford.

We did not perform any sample input normalisation and found that the number of reads per sample had only a weak positive correlation with parasitemia for both the NOMADS8 (Pearson’s r = 0.23) and NOMADS16 panels (Pearson’s r = 0.39). The P. falciparum mapping percentages had a stronger positive correlation with parasitemia (NOMADS8, Pearson’s r = 0.59; NOMADS16 Pearson’s r = 0.64); values were markedly lower below approximately 1000 p/μL.

The coverage fold-difference across amplicons was higher at lower parasitemia values, producing a negative correlation that was more pronounced for the NOMADS8 (Pearson’s r = − 0.41) than the NOMADS16 panel (Pearson’s r = − 0.24). With the hrp2/3 targets and their flanking genes removed, the fold-difference in coverage across the NOMADS16 panel was 5-fold less and the negative trend with parasitemia stronger (Pearson’s r = − 0.37). In addition to the hrp3 upstream target being low abundance, several of the titrated mock samples contained P. falciparum laboratory strains Dd2 and HB3, which have hrp2 and hrp3 deletions, respectively. This partially masked the effect of parasitemia and increased variation in coverage. For both NOMADS8 and NOMADS16 panels, roughly 1000 p/μL was the threshold below which coverage variation across amplicons increased.

SNPs are called accurately within coding sequences for clonal infections

We sought to assess how accurately molecular markers of antimalarial drug resistance could be detected with our method. Using Clair3 to call variants⁴⁷, we examined SNP calls for a set of substitution mutations associated with drug resistance (documented by the World Health Organisation⁴³) across seven of our clonal mock samples that were sequenced on an R10.4.1 Flow Cell with a MinION Mk1b device (Fig. 4a). For the three mock samples containing P. falciparum laboratory strains, we identified all expected mutations and no false positives. Similarly, for the four mock samples created from cultured P. falciparum strains from Cambodia with documented artemisinin resistance, we identified the expected kelch13 mutations and no false positives.

**Fig. 4: SNP calling accuracy for a set of clonal mock samples.**

We expanded our analysis to examine SNP calling performance beyond known drug-resistance associated mutations and also characterised the effect read depth had on accuracy. We focused on the laboratory strains Dd2 and HB3, for which high-quality whole-genome assemblies exist⁴⁸. For these two mock samples we randomly downsampled the reads mapping to each target to sets ranging from 100 to 10 reads. We created replicates by repeating this procedure 10 times, thereby producing a total of 100 mock samples in silico with varying read depths for each of Dd2 and HB3 (Methods). For these replicates we called variants using Clair3 and, treating the whole-genome assemblies as truth, evaluated accuracy using the haplotype comparison tool hap.py⁴⁹ (Methods). In Fig. 4b we show the mean F₁-scores (the harmonic mean of the precision and recall) for each target at a given read depth. The target msp2 has been excluded as its very high sequence divergence from the reference genome makes it a case that should be handled separately, with a reference-free approach.
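The downsampling design and the accuracy metric can be sketched as follows. The depth grid in steps of 10 reads is an assumption (the text gives only the range of 10 to 100 reads and a total of 100 replicates per strain), and the helper names are ours; the actual evaluation used Clair3 and hap.py rather than this code.

```python
import random


def downsample(read_ids, depth, seed):
    """Draw `depth` reads without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(read_ids, depth)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from SNP call counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumed design: 10 read depths x 10 replicates = 100 in silico samples per strain.
depths = list(range(10, 101, 10))
design = [(depth, replicate) for depth in depths for replicate in range(10)]

# Hypothetical pool of read identifiers mapping to one target.
read_pool = [f"read_{i}" for i in range(5000)]
subset = downsample(read_pool, depth=30, seed=1)

print(len(design), len(subset), round(f1_score(tp=9, fp=1, fn=2), 3))
```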

First we examined the coding sequences (CDS) of our targets (totalling 14.4 kbp, excluding msp2), as these are higher complexity and are also expected to capture the overwhelming majority of possible functional mutations. Overall, increasing from 10 to 30 reads resulted in a substantial improvement in the mean F₁ score (F₁ = 0.72 at 10 reads, to F₁ = 0.98 at 30 reads). With 40 reads or greater, SNPs within the CDS were called perfectly for all targets and replicates (F₁ = 1.0), aside from in a single replicate of mdr1. This error was in a Dd2 in silico replicate at the N86 codon position. Clones of Dd2 have been observed to carry multiple copies of mdr1, inducing heterozygosity at codon N86 (AAT), as different copies carry N86Y (TAT) or N86F (TTT). In the complete set of reads overlapping mdr1 in our Dd2 mock sample, the N86F mutation had a within-sample allele frequency of 67.4% (5451/8078 reads), most consistent with the mutation being carried by 2 of 3 mdr1 copies. Clair3 assumes a diploid genome, and this deviation from a 50% within-sample allele frequency likely led to the error. We observed similar errors, but at a higher frequency, in a previous analysis using an R9.4.1 Flow Cell (Supplementary Fig. 6).
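The "2 of 3 copies" interpretation can be checked with a simple nearest-fraction heuristic: compare the observed allele frequency with every k/n for small copy numbers n. This is only an illustrative sanity check with a function name and copy-number cap of our choosing, not an analysis reported by the authors.

```python
from fractions import Fraction


def closest_copy_fraction(alt_reads, total_reads, max_copies=4):
    """Return the fraction k/n (n <= max_copies) nearest to the observed frequency."""
    observed = alt_reads / total_reads
    candidates = {
        Fraction(k, n) for n in range(1, max_copies + 1) for k in range(n + 1)
    }
    return min(candidates, key=lambda f: abs(float(f) - observed))


# Read counts for N86F in the Dd2 mock sample, as reported in the text.
best = closest_copy_fraction(5451, 8078)
print(best)  # 2/3
```

A diploid caller expects heterozygous sites near 50%, so a site sitting near two thirds is a plausible source of the miscall described above.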

We next expanded the analysis to the entire region spanned by our amplicons (totalling 25.2 kbp, excluding msp2), which includes 10.8 kbp of very low complexity (86% A+T) intergenic sequence. Here, SNP calling accuracy across our targets improved considerably with increasing numbers of reads, from an F₁-score of 0.63 overall with 10 reads, to a final F₁-score of 0.89 with 100 reads. We observed considerable variation in F₁-score across targets, which we hypothesised was due to differing amounts of low-complexity intergenic sequence. To evaluate this further, we visualised the genomic positions of erroneous SNP calls at different read depths across our target panel (Fig. 4c, d). Consistent with our hypothesis, we observed that areas with a high false positive rate, or diminished true positive rate, tended to be in very low complexity intergenic sequence. For example, between exons 3 and 4 of crt there is a homopolymer of 41 A nucleotides, around which SNP errors clustered (Fig. 4c). Similarly, upstream of dhps there is a 50 bp AT dinucleotide repeat region in which SNP errors were concentrated (Fig. 4d).

Finally, we systematically evaluated the sequence context of all unique sites where a SNP calling error was observed in any of the 200 in silico replicates in our analysis. These error-producing sites were divided into two groups: those in which the error was corrected with additional reads (n = 198), and those in which the error remained even in replicates with 100 reads (n = 17). We then computed the (A+T)-content and maximum homopolymer length in a 21 bp window centred on each of the sites (+/-10 bp). We found that sites where the SNP calling error could be corrected with additional read depth had lower (A+T)-content (mean 82.5% vs 94.4%) and shorter homopolymers in their flanking bases (mean 5.7 bp vs 8.3 bp) than the uncorrected SNP calling errors (Fig. 4e). Of the uncorrected SNP calling errors, 11/17 (65%) were situated in 21 bp windows consisting of only A or T nucleotides and 6 of these contained homopolymers of length 10 or greater.
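The two sequence-context measures used in this analysis are straightforward to compute. Below is a minimal sketch, assuming 0-based positions and a plain string for the reference sequence; the example sequence is invented.

```python
from itertools import groupby


def window(seq, pos, flank=10):
    """Window of up to 2*flank+1 bases centred on 0-based `pos` (clipped at ends)."""
    return seq[max(0, pos - flank): pos + flank + 1]


def at_content(seq):
    """Fraction of bases that are A or T."""
    return sum(base in "AT" for base in seq.upper()) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))


# Hypothetical low-complexity stretch: a 12 bp poly-A run next to an AT repeat.
reference = "GC" + "A" * 12 + "TATATAT" + "GC"
site_window = window(reference, pos=11)

print(len(site_window), at_content(site_window), max_homopolymer(site_window))
```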

Long-read sequencing of the surface antigen gene msp2 provides insights into within-sample diversity

Long reads can facilitate interrogation of more complex regions of the genome. Both the NOMADS8 and NOMADS16 panels include the highly diverse surface antigen gene msp2, canonically used both for COI estimation and for distinguishing recrudescence from reinfection⁵⁰. Critically, msp2 genetic variation induces length polymorphism across a set of known repeat-containing alleles, enabling allele typing via capillary or gel electrophoresis.

We analysed reads deriving from msp2 across our mock samples and observed length polymorphism analogous to that detected with electrophoresis approaches (Fig. 5a). To further characterise msp2-derived reads, we mapped them to each of the four P. falciparum laboratory strains used in our mock samples and labelled them by the strain to which they had the highest identity alignment (Methods). With this basic approach to allele classification, we could both confirm that the observed length polymorphism was driven by different underlying alleles of the expected types, and identify mock samples carrying multiple alleles. Next, we sought to explore an approach to read classification that avoided using a priori information about allele types. To this end, we implemented a global alignment algorithm for pairs of reads that used base-level quality scores to assess the likelihood that both reads derived from the same underlying haplotype sequence (Methods). Using this algorithm, we performed pairwise global alignment of all msp2-derived reads for each sample and hierarchically clustered the resulting pairwise alignment score matrices (Fig. 5b). In cases where a single P. falciparum strain was used to produce a mock sample, the pairwise alignment score matrices had little structure, consistent with a single msp2 allele being present. In cases where multiple P. falciparum strains were combined to produce a mock sample, structure within the pairwise alignment matrices was consistent with multiple msp2 alleles being present.
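To give a sense of how base-level quality scores can enter a read-versus-read comparison, the sketch below scores a single aligned base pair as a log-odds ratio: the probability of the two observed bases if both reads derive from the same underlying base, against the probability if they are unrelated. The error model (errors spread uniformly over the three other bases, uniform base frequencies) and all function names are assumptions made for illustration. The authors' actual algorithm is described in their Methods and may differ, and a full global alignment would also need gap handling and dynamic programming, which are omitted here.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def same_haplotype_log_odds(base1, base2, q1, q2):
    """Log-odds that two observed bases share one true base vs. being unrelated."""
    e1, e2 = phred_to_error(q1), phred_to_error(q2)
    if base1 == base2:
        # Both correct, or both wrong in the same way.
        p_same = 0.25 * ((1 - e1) * (1 - e2) + e1 * e2 / 3)
    else:
        # One read wrong, or both wrong in different ways.
        p_same = 0.25 * (
            (1 - e1) * e2 / 3 + (e1 / 3) * (1 - e2) + 2 * (e1 / 3) * (e2 / 3)
        )
    p_unrelated = 1 / 16  # two independent, uniformly distributed bases
    return math.log(p_same / p_unrelated)


def score_aligned_pair(read1, quals1, read2, quals2):
    """Sum per-base log-odds over two reads that are already aligned without gaps."""
    return sum(
        same_haplotype_log_odds(b1, b2, q1, q2)
        for b1, b2, q1, q2 in zip(read1, read2, quals1, quals2)
    )


high_quality_mismatch = same_haplotype_log_odds("A", "C", 30, 30)
low_quality_mismatch = same_haplotype_log_odds("A", "C", 5, 5)
print(high_quality_mismatch, low_quality_mismatch)
```

The useful property is that a mismatch between two low-quality bases is penalised far less than one between two high-quality bases, so sequencing errors are less likely to be mistaken for distinct haplotypes.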

**Fig. 5: Analysis of length polymorphism and nucleotide identity of *msp2*-derived reads.**

The analysis using mock samples highlighted two limitations of these approaches. First, a general limitation of using only a single locus to learn about COI is that strains within a mixed/polyclonal infection may share the same allele at that locus, leading to underestimation of COI. We observed this with mock samples of COI = 2 created by combining Cam^WT and Cam^C580Y cultured P. falciparum strains. Second, reads identified as deriving from GB4 were underrepresented in higher COI mock samples (Fig. 5b). This may be due to the GB4 genomic DNA we obtained being lower quality. Consistent with this, the clonal mock sample created from GB4 genomic DNA produced fewer reads compared with the other laboratory strains (Fig. 2b).

We applied these approaches to characterise the msp2-derived reads in our field sample set and observed a variety of patterns reflecting clonal and mixed infections (Fig. 5c, d).

Detecting hrp2/3 deletions with the NOMADS16 amplicon panel

To characterise the ability of the NOMADS16 panel to detect hrp2 and hrp3 deletions that cause false-negative RDT results, we created a set of 45 clonal mock samples with a range of parasitemia levels (625−10,000 parasites per μL) from the lab strains 3D7 (hrp2+/hrp3+), Dd2 (hrp2−/hrp3+) and HB3 (hrp2+/hrp3−). We included three mock P. falciparum-negative samples as negative controls, yielding 48 mock samples total. We sequenced all 48 samples on a single R10.4.1 Flow Cell using a MinION Mk1b device, generating 4.85 million reads or 11.52 Gbp of sequencing data, and resulting in a mean of 39,602 reads mapping to P. falciparum per sample after quality filtering and demultiplexing (range 15,352−96,892; excluding negative controls).

As with previous experiments, we observed considerable variation in the mean abundance of the different amplicons in the NOMADS16 panel. We standardised this variation by converting the amplicon abundance for each sample to a proportion with respect to the total abundance of that amplicon across all samples. In a heatmap of these proportions, the expected hrp2 and hrp3 deletions were clearly visible in Dd2 and HB3 mock samples, respectively (Fig. 6c). More specifically, the Dd2 mock samples displayed a reduced abundance of the hrp2 upstream and hrp2 amplicons relative to 3D7 and HB3. For the hrp2 downstream amplicon, both Dd2 and HB3 had reduced coverage relative to 3D7; which is consistent with other studies that have observed a deletion in HB3 near the end of chromosome 8, but not affecting the hrp2 gene¹². In the HB3 mock samples, we observed a reduced relative abundance of the hrp3 upstream, hrp3, and hrp3 downstream amplicons (Fig. 6c). Interestingly, despite the mean abundance of the hrp3 upstream amplicon (20 reads, n = 45) and hrp3 downstream amplicon (51 reads, n = 45) being much lower than the average across all amplicons (2156 reads, n = 720), these experimental results suggest they are still informative for deletion detection.
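The standardisation step described above, expressing each sample's count for an amplicon as a proportion of that amplicon's total across all samples, is sketched below. The read counts are invented and only loosely modelled on the magnitudes reported in this section.

```python
def normalise_by_amplicon(counts):
    """Convert {sample: {amplicon: reads}} into per-amplicon proportions.

    Each value becomes the sample's share of that amplicon's reads summed over
    all samples, so every amplicon's proportions add up to 1.
    """
    amplicons = list(next(iter(counts.values())))
    totals = {a: sum(sample[a] for sample in counts.values()) for a in amplicons}
    return {
        name: {a: (sample[a] / totals[a] if totals[a] else 0.0) for a in amplicons}
        for name, sample in counts.items()
    }


# Hypothetical read counts for three clonal mock samples.
counts = {
    "3D7": {"hrp2": 700, "hrp3": 1400},
    "Dd2": {"hrp2": 14, "hrp3": 1300},
    "HB3": {"hrp2": 650, "hrp3": 28},
}
proportions = normalise_by_amplicon(counts)

for sample, row in proportions.items():
    print(sample, {a: round(p, 3) for a, p in row.items()})
```

Because each amplicon is scaled by its own total, a consistently weak amplicon and a consistently strong one end up on the same scale, and a deletion shows up as a sample whose share of one amplicon is far below its share of the others.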

**Fig. 6: Validation of *hrp2/3* deletion detection using the NOMADS16 panel.**

Next, we examined in more detail the abundance of the two amplicons that span the full-length hrp2 and hrp3 genes (Fig. 6d, e). The mean abundance of the hrp2 amplicon in hrp2+ mock samples was 669 reads (range 237–1229, n = 30), compared to a mean abundance of 14 reads (range 6–26, n = 15) in hrp2− mock samples, and 20 reads in the P. falciparum-negative controls (range 15–25, n = 3). For the hrp3 amplicon, the mean abundance was 1362 reads in hrp3+ mock samples (range 310–3401, n = 30), 28 reads in hrp3− mock samples (range 12–43, n = 15), and 28 reads in P. falciparum-negative controls (range 23–32, n = 3). The mean abundance for both hrp2 and hrp3 declined with parasitemia, but we still observed order-of-magnitude differences in abundance between deleted and wild-type parasite strains at 625 p/μL. We note that the reads observed in the P. falciparum-negative samples and in samples expected to carry deletions are likely the result of barcode misclassification and/or contamination, which has also been observed by others³⁴.

Finally, as proof-of-concept, we developed a statistical model for hrp2/3 deletion detection from data generated by the NOMADS16 panel. Importantly, we devised a model that can be calibrated to an individual sequencing run, leverages information across all amplicons in the NOMADS16 panel, and takes into account barcode misclassification and/or contamination; ultimately providing a probability of hrp2/3 deletion for each sample within a rigorous statistical framework (Methods). Applied to the data described above, our model detected all of the expected hrp2 and hrp3 deletions with complete certainty to a precision of four decimal places (p = 1.0000) (Fig. 6f).

Discussion

Though widely deployed for genomic surveillance of viral and bacterial pathogens, nanopore sequencing of P. falciparum malaria is relatively rare^32,33,34. Here, we have developed an approach to targeted nanopore sequencing of P. falciparum malaria that is flexible and cost-effective. Our approach begins with DBS as input and can produce genomic data of public health significance in 2 to 3 days at approximately USD $25 per sample. Importantly, DBS collection requires only a finger prick and is done routinely by malaria control programs. A major challenge with using DBS is that the modest amount of DNA extracted is primarily derived from the human host⁵¹. Here we use selective whole-genome amplification (sWGA) to enrich for bulk P. falciparum DNA prior to multiplex PCR, a strategy that has been adopted in several other P. falciparum amplicon sequencing workflows^21,23, to improve PCR performance and consistency from DBS. We have also demonstrated that the cost of sWGA can be substantially reduced when combined with targeted sequencing, allowing for a protocol that is both robust and affordable.

In developing multiply, we have provided a general and principled solution to the design of multiplex PCRs for targeted sequencing. This will enable rapid creation and updating of amplicon panels for P. falciparum, as well as accelerate the creation of panels for other organisms in the future. The software is open-source and freely available, allowing teams to design panels addressing their specific research or surveillance questions. Using multiply, we produced two amplicon sequencing panels containing eight and sixteen targets, reflecting the major public health uses of genomic data: tracking resistance to various antimalarial drugs, monitoring the sensitivity of RDTs, understanding the diversity of malaria vaccine targets and assessing within-sample diversity (which can help discriminate recrudescence from reinfection, and gives some indication of local transmission intensity). In contrast to all existing P. falciparum targeted sequencing approaches, our panels generate amplicons between 3 and 4 kbp, thereby producing individual reads that span the entire CDS of nearly all of our target genes.

A current priority of P. falciparum genomic surveillance is tracking the hrp2 and hrp3 deletions that can cause false-negative RDT results and are jeopardising the over 300 million RDTs distributed annually⁵². While several well-validated PCR-based methods exist to detect these deletions^16,53,54,55, there are only a few examples of detection by amplicon sequencing³⁴, or of incorporation into amplicon sequencing panels. A set of best practices for detecting these deletions by PCR recommended that an assay should: (i) target full-length hrp2 exon 2 and the exon 1/2 boundary, to ensure both complete and partial deletions of hrp2 are detected; (ii) target at least two single-copy genes, to ensure sufficient amplifiable DNA is present; and (iii) target one or both of the flanking genes, which are also lost in most deletions observed to date¹⁶. The NOMADS16 multiplex PCR was designed to meet all of these recommendations, and here we have shown it is able to accurately detect deletions across a set of mock samples of varying parasitemia. In addition, we have developed a novel statistical model for deletion detection that rigorously handles contamination and variation in sample quality, two issues that can complicate interpretation of amplicon sequencing data³⁴. Although further validation on field samples is necessary, this work represents an important initial proof-of-principle for this approach.

A limitation of our current method is that it has weaker performance on low parasitemia samples in comparison with other P. falciparum amplicon-based methods designed for short-read sequencing^21,22,23,56. Parasitemia and DBS sample quality (characterised by factors like age, storage conditions, number of blood spots and spot size) will influence the maximum amplicon length above which PCR performance will suffer due to an insufficient concentration of template DNA molecules of an adequate length. In addition, PCRs with longer amplicons typically have reduced efficiency in comparison with shorter alternatives. The 3 to 4 kbp, CDS-spanning amplicons in our panel exhibited robust assay performance on mock and field samples with above ~1000 parasites per microlitre. Additional experiments, especially with field samples, are necessary to more confidently establish this threshold and determine the requirements our long-read amplicon panels put on DBS collection procedures and quality. It is likely the NOMADS panels are best suited to higher parasitemia, symptomatic or clinical cases, rather than lower parasitemia asymptomatic cases. An advantage of developing multiply is that, should it be necessary, we will be able to rapidly design new amplicon panels with shorter lengths (e.g., 1–2 kbp). Moreover, work on adaptive sampling of reads during nanopore sequencing has recently been applied to P. falciparum malaria⁵⁷, and variations of this approach may help recover better data from low parasitemia samples.

An important question not directly addressed by this study is how sensitively our assay can detect minor clones, and the mutations they carry, in mixed/polyclonal P. falciparum infections⁵⁸. The extent to which a sequencing method can detect minor clones depends on two sequential processes. The first is the reliability with which the laboratory protocol recapitulates, in the sequencing reads, the number and proportions of P. falciparum strains that existed in the cognate sample. This is fundamentally a sampling process, with higher variation and lower sensitivity expected in low parasitemia and low coverage samples; but it is also influenced by the non-linear dynamics of any amplification procedures that may be employed. Our approach uses sWGA, which has been shown to weaken correlations between strain proportions in a mixed infection and within-sample allele frequencies²³. Once reads from different clones have been generated, the second process is to identify them by variant calling and/or haplotype inference, simultaneously distinguishing true variation from error or contamination. In clonal samples we determined that 30- to 50-fold coverage is sufficient for high accuracy SNP calling across all but the most extreme low complexity stretches of the genome (i.e., 100% (A+T)-content and/or homopolymers > 10 bp). The fact that coverage levels tens to hundreds of times higher than this can be readily attained gives some indication that detection of minor clones comprising ~5 to 10% of the sample may be feasible. Critically, we used Clair3 to perform variant calling, which was designed for diploid human or haploid genomes⁴⁷, but not for samples with varying and unknown ploidy as is the case with P. falciparum. In order to properly investigate the limits of minor clone detection, haplotype inference tools that can handle complex P. falciparum infections in conjunction with the greater length and higher error rate of nanopore reads must be developed. Going forward, these may be built by adapting existing short-read haplotype inference tools, such as DADA2⁵⁹, for nanopore; by adapting nanopore-based tools, such as Clair3⁴⁷ or WhatsHap⁶⁰, for malaria; or by developing new fit-for-purpose tools. SeekDeep⁶¹, an algorithm that has been successfully optimized for PacBio reads⁶², may be more readily adaptable to nanopore.

Long-read amplicon sequencing of P. falciparum malaria brings benefits for malaria genomic surveillance. There are three immediate examples. First, long reads, especially those spanning entire CDS, are better suited for the detection of rare and novel mutations. Approaches using shorter reads typically focus on ~250 bp regions around known, common mutations, and have primer binding locations within the target gene. Therefore, novel mutations can emerge undetected or disrupt primer annealing, ultimately requiring the redesign or expansion of an amplicon panel. For P. falciparum a critical surveillance region is the propeller domain of kelch13⁶³, which harbours an expanding list of mutations conferring artemisinin resistance^4,43, but at 855 bp is too long to capture with a single amplicon in short-read sequencing. Second, once suitable computational tools are developed, long reads will enable epidemiologically relevant read-based phasing of variants within target genes⁶⁰. For example, pyrimethamine treatment failure is predicted on the basis of a triple mutation within dhfr including N51I, C59R and S108N⁶⁴; however, in some geographies single, double, and triple mutations exist, complicating this prediction for mixed infections^65,66,67. Third, longer reads allow for better mapping in structurally complex or repetitive regions of the genome, and can assist with structural variant detection³¹. The investigation of several control-relevant regions of P. falciparum, including msp2, the histidine-rich proteins hrp2 and hrp3, and the vaccine target csp, will all benefit from long-read sequencing.

Over 95% of all P. falciparum malaria cases occur in Africa³, and yet the vast majority of P. falciparum genomic data is generated elsewhere. This discrepancy has resulted, in part, from a preponderance of protocols making use of second-generation sequencing platforms with inaccessibly high capital and maintenance costs. A flexible and cost-effective protocol for nanopore sequencing of P. falciparum malaria that uses the MinION significantly expands the settings in which genomic data collection is possible. While on-site or clinical sequencing remains impractical, there exists a multitude of research and public health laboratories across Sub-Saharan Africa with interest in generating P. falciparum genomic data who can benefit from our approach. It is important to highlight that challenges still exist for widespread implementation, in particular establishing timely, reliable, and affordable procurement processes for scientific reagents and equipment in Sub-Saharan Africa. Notwithstanding, the over 100,000 SARS-CoV-2 genomes sequenced on the African continent during the pandemic demonstrate that these challenges can be overcome²⁶. Given the rapid ongoing spread and, in some cases, even confluence⁶⁸ of P. falciparum drug and diagnostic resistance mutations in Africa, now is a critical time to expand P. falciparum genomic surveillance on the continent.

Methods

Development of multiplex PCR panels

The NOMADS8 and NOMADS16 panels were generated using a beta version of multiply, called pf-multiply, available at: https://github.com/JasonAHendry/pf-multiply (design file for NOMADS8 and NOMADS16). Both use a BED (*.bed) file to delineate the mdr1 amplicon. NOMADS8 was generated first using the command:

python multiply.py -d designs/pf-nomads8-mdr1part.ini

NOMADS16 was created by using the augment command of pf-multiply. The NOMADS8 multiplex PCR primers were combined at equimolar amounts for a total primer concentration of 0.6 μM; after an initial sequencing run with mock samples on an R9.4.1 Flongle Flow Cell, primer concentrations were crudely adjusted (doubled or halved) based on observed amplicon abundances, keeping the total primer concentration fixed. The same procedure was repeated with NOMADS16; i.e., one round of primer concentration adjustment was performed.

Creating mock samples of P. falciparum and human DNA

We ordered P. falciparum genomic DNA for laboratory strains 3D7, Dd2, GB4 and HB3 and Cambodian field-derived strains IPC 5202 (kelch13 R539T), IPC 4912 (kelch13 I543T), IPC 3445 (kelch13 C580Y) and IPC 3663 (kelch13 WT)⁶³ from BEI resources (www.beiresources.org). To create 10,000 p/μl in vitro DNA mixtures we diluted these stocks to 0.25 ng/μl in 25 ng/μl human genomic DNA from a pool of 36 HapMap cell lines⁶⁹. DNA mixtures were then combined at different numbers and ratios to replicate mixed infections of different proportions or COI, and/or serially diluted in additional human genomic DNA to replicate lower parasitemia infections (Supplementary Table 2). For validation of hrp2/3 deletions, parasite lines 3D7 (NF54), Dd2 and HB3 were cultured at 5% hematocrit in commercial red blood cells obtained from DRK Blutspendedienst Nord-Ost gemeinnützige GmbH, as previously described⁷⁰. Genomic DNA from all lines was extracted using a Qiagen Blood and Tissue Kit on parasite pellets lysed with 0.15% saponin. Extracted DNA was combined with human genomic DNA (Roche, 11691112001) to produce a 10,000 p/μl stock, as described above. Lower parasitemia samples were produced by 2-fold serial dilution of the 10,000 p/μl stock into human genomic DNA.

Collection of field samples

Samples from Zambia are from two studies. The first set was collected under an ethical waiver granted by the National Health Research Authority, Zambia under the Laboratory Quality Improvement Research In Ministry of Health Laboratories (NHRA000004/16/11/2021). Symptomatic patients visiting a clinic in Kaoma, Western Province, Zambia were diagnosed with an RDT while a microscopy slide and DBS (on Whatman No. 3 filter paper) were also collected. All samples were de-identified and no demographic or clinical data were recorded. Bulk DNA was extracted from DBS using a Qiagen QIAamp Kit following the manufacturer's instructions. Parasitemia was quantified by light microscopy from thin film blood slides. The second set was collected during a Therapeutic Efficacy Study (TES) conducted by the Ministry of Health through the National Malaria Elimination Centre under ERES Converge IRB under Therapeutic Efficacy Testing for Artemether-Lumefantrine, Artesunate-Amodiaquine and Dihydroartemisinin-Piperaquine in Selected Sites in Zambia. The TES is routinely conducted to assess the efficacy of three ACT antimalarial drugs used to treat uncomplicated malaria. Symptomatic patients visiting a clinic in Solwezi district, North-Western Zambia, were diagnosed for malaria using an RDT while a microscopy slide and DBS were collected. Parasitemia was quantified for every positive RDT, and any patient with parasitemia of 1000 parasites/μl or higher was given the option to enrol in the study through the written consent process. Additional clinical information such as fever status and demographic data (i.e., age, height, weight) were collected. The patient’s home address was recorded for study follow-up purposes, but the DBS were de-identified prior to any analysis being conducted.

Laboratory protocol and sequencing

The complete laboratory protocol, including materials and primer sequences, is available online at protocols.io (https://www.protocols.io/) as "Cost-effective targeted nanopore sequencing of P. falciparum malaria". In brief, for each sample 10–40 ng of extracted genomic DNA was used as template in a 50 μl sWGA reaction⁵¹, but with reduced phi29 DNA polymerase (NEB #M0269S) to minimise costs. Afterwards, 2 μl of sWGA product was used as template in a 25 μl multiplex PCR with KAPA HiFi Polymerase (Roche #KK2101) and either the NOMADS8 or NOMADS16 primer pools. Multiplex PCR products were cleaned using a 0.5X ratio of NEBNext Sample Purification Beads (NEB #E7103) and eluted in 15 μl nuclease-free water. The DNA eluate was quantified using the Qubit (ThermoFisher #Q32854) and between 100 and 600 ng of DNA was taken forward for barcoding and sequencing. We ligated unique barcodes from an Oxford Nanopore Technologies (ONT) Native Barcode Ligation Sequencing Kit (SQK-LSK109 with EXP-NBD104, EXP-NBD114 for R9.4.1 Flow Cells; SQK-NBD114.96 for R10.4.1 Flow Cells) to each sample using a modified one-pot barcoding protocol⁴⁶. Samples were then pooled before adapter ligation and sequencing, for which we followed ONT protocol recommendations.

Bioinformatics pipeline

For experiments using R9.4.1 Flow Cells, FAST5 files generated by the MinKNOW software were basecalled using Guppy (v5.0.11) with a minimum quality score threshold of 8. For the Flongle experiment, we used the super-accurate (SUP) basecalling model and for all other experiments we used the high accuracy (HAC) basecalling model. For experiments using R10.4.1 Flow Cells, POD5 files were basecalled using dorado (v0.34; https://github.com/nanoporetech/dorado) with the super-accurate (SUP) model. FASTQ files were then demultiplexed using Guppy (v5.0.11), without setting the --require_both_ends flag, i.e., with single-end demultiplexing. Demultiplexed FASTQ files were mapped to release 52 of the P. falciparum 3D7 reference genome downloaded from PlasmoDB⁷¹ (https://plasmodb.org) using minimap2⁷² (v2.24-r1122) and the -ax map-ont parameter setting. In the resultant BAM (*.bam) file, reads failing to map to the 3D7 reference were identified using samtools⁷³ (v1.16), with the command samtools view -f 0x904. These unmapped reads were converted back to FASTQ files using samtools fastq before being remapped to the GRCh38 human reference genome downloaded from NCBI’s Genome Database (https://www.ncbi.nlm.nih.gov/genome) and subsequently excluded from downstream analyses. Reads deriving from targets were defined as those that overlapped the coding sequence defined in the Gene Feature Format (GFF) (*.gff) file for release 52 of the P. falciparum 3D7 reference genome downloaded from PlasmoDB⁷¹. Variant calling of reads mapping to the 3D7 reference genome was performed using the singularity image of Clair3⁴⁷ (v1.0.4; https://github.com/HKU-BAL/Clair3) in diploid mode with the flags --platform='ont' --include_all_ctgs --enable_phasing set. For the SNP calling analysis in Fig. 4, we sent all variants to the alignment model by setting --var_pct_full=1.0 and --ref_pct_full=1.0.

SNP calling accuracy analysis

Downsampling reads

We partitioned reads mapped to the 3D7 reference into those overlapping each of our target genes using samtools⁷³ (v1.16), thereby producing BAM files for each of our targets. For each target BAM, we downsampled reads using the samtools view command and -s/--subsample flag to achieve the desired number of reads. This procedure was repeated for the Dd2 and HB3 samples; downsampling to 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 reads for each target, and ten times for each number of reads. As a result, for each given depth and target, we produced 10 randomly downsampled BAM files for each of Dd2 and HB3, for 20 replicates in total. In Fig. 4b the 'All' category was created by concatenating the BAM files generated in this way for all targets in the NOMADS8 panel, excluding msp2.
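As a rough illustration of the downsampling step, the sketch below builds the value passed to samtools view -s, whose integer part is the random seed and whose fractional part is the proportion of reads to keep. The helper name `subsample_argument`, the use of the replicate index as the seed, and the toy read total are assumptions; because -s samples by proportion, the number of reads retained is only approximately the target.

```python
def subsample_argument(n_reads_total, n_reads_target, seed):
    """Build the SEED.FRACTION value for `samtools view -s`.

    The integer part is used by samtools as the random seed and the
    fractional part as the proportion of reads to keep.
    """
    if not 0 < n_reads_target < n_reads_total:
        raise ValueError("target must be positive and smaller than the total")
    fraction = f"{n_reads_target / n_reads_total:.6f}"  # e.g. '0.050000'
    if fraction.startswith("1"):
        raise ValueError("fraction rounds to 1.0; no downsampling is needed")
    return f"{seed}{fraction[1:]}"  # drop the leading '0', keep '.050000'


# Ten depths and ten replicates per depth, as in the text; the total of
# 2000 reads in the target BAM is a made-up number.
depths = range(10, 101, 10)
arguments = {
    (depth, rep): subsample_argument(2000, depth, seed=rep)
    for depth in depths
    for rep in range(1, 11)
}
```

Each value could then be supplied to a command of the form samtools view -b -s VALUE target.bam, where target.bam is a placeholder file name.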

Creating a set of true variants

Dd2 and HB3 have been sequenced to high depth with Pacific Biosciences SMRT sequencing technology and assembled, with the resulting FASTA (*.fasta) sequences available on PlasmoDB⁷¹. To identify variants in these assemblies with respect to the 3D7 reference genome, we simulated high-quality (Phred 60) error-free reads in silico from the FASTA files, mapped them to the 3D7 reference genome with minimap2 (v2.24-r1122), and then identified variants using the bcftools⁷⁴ (v1.16) mpileup and call commands. In particular, we simulated 60 error-free reads, half forward and half reverse strand, for each target in our NOMADS8 panel by extracting the FASTA sequence spanning ±4 kbp of the target, based on GFF files for Dd2 and HB3.
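One way to picture the read simulation is the following minimal sketch, which writes FASTQ records with a constant Phred 60 quality string, half on the forward strand and half as reverse complements. It assumes each simulated read covers the whole extracted window; the function names, the read naming scheme and the toy sequence are illustrative assumptions rather than the authors' implementation.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def simulate_error_free_reads(name, sequence, n_reads=60, phred=60):
    """Create FASTQ records that copy `sequence` exactly.

    The first half of the reads are forward strand and the second half are
    reverse complements. Qualities use the Sanger offset of 33, so Phred 60
    is encoded as the character ']'.
    """
    quality = chr(33 + phred) * len(sequence)
    records = []
    for k in range(n_reads):
        forward = k < n_reads // 2
        read = sequence if forward else reverse_complement(sequence)
        strand = "fwd" if forward else "rev"
        records.append(f"@{name}_{strand}_{k}\n{read}\n+\n{quality}\n")
    return records


# Toy window sequence, invented for illustration.
records = simulate_error_free_reads("toy_target", "ATGAAGGTAATT")
```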

Stratified variant call comparisons

We used the tool hap.py⁴⁹ (https://github.com/Illumina/hap.py) from a Docker image (jmcdani20/hap.py:v0.3.12) to compute measures of variant calling accuracy across different target regions in comparison to the true variant set described above. To restrict the accuracy analysis to coding sequence, we subset the 3D7 GFF to only CDS features, identified the rows pertaining to our targets, and output the chromosome, start, and stop positions as a BED (*.bed) file. We then used the --stratifications flag of hap.py to compute measures over these intervals. We used the annotated VCF (*.vcf) files produced by hap.py to generate positional plots of false- and true-positive rate across targets in Python.
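The GFF-to-BED step can be sketched as follows. GFF coordinates are 1-based and inclusive whereas BED intervals are 0-based and half-open, so the start is shifted by one. The function name, the substring match on the attributes column and the example GFF record (including its source, strand and attribute fields) are assumptions for illustration; only the msp2 gene ID and interval are taken from the text of this article.

```python
def gff_cds_to_bed(gff_lines, target_ids):
    """Extract CDS features for the given gene IDs as BED intervals.

    Returns sorted (chrom, start, end) tuples with a 0-based, half-open
    interval, converted from the 1-based inclusive GFF coordinates.
    """
    intervals = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        chrom, start, end, attributes = fields[0], int(fields[3]), int(fields[4]), fields[8]
        # Simple substring match; a stricter parser would split the attributes.
        if any(gene_id in attributes for gene_id in target_ids):
            intervals.append((chrom, start - 1, end))
    return sorted(intervals)


# Illustrative GFF lines; the attribute strings are made up.
example_gff = [
    "##gff-version 3",
    "Pf3D7_02_v3\tsource\tgene\t273689\t274507\t.\t.\t.\tID=PF3D7_0206800",
    "Pf3D7_02_v3\tsource\tCDS\t273689\t274507\t.\t.\t0\tID=cds1;Parent=PF3D7_0206800.1",
]
bed = gff_cds_to_bed(example_gff, ["PF3D7_0206800"])
bed_text = "\n".join(f"{c}\t{s}\t{e}" for c, s, e in bed)
```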

Analysis of msp2 reads

Computing coding sequence lengths

After mapping reads to the 3D7 reference genome with minimap2⁷² (v2.24-r1122), we extracted reads that completely overlapped the msp2 (PF3D7_0206800) coding sequence using bedtools⁷⁴ (v2.31.0) with the intersect -F 1.0 command. From the resultant BAM file, we trimmed these reads to the extent of the msp2 coding sequence by keeping only the section of each read that aligned within the interval [273689, 274507] of chromosome 2 (Pf3D7_02_v3); indels were retained if the bases on either side of them aligned within the interval. Unusually short trimmed reads (< 400 bp) were removed as likely artefacts. Trimmed reads were used to create length distribution plots. They were independently mapped, using minimap2⁷² (v2.24-r1122), to release 52 of the reference genomes for 3D7, Dd2, GB4, and HB3 downloaded from PlasmoDB⁷¹. We let minimap2 output a PAF (*.paf) file and computed the identity of the mapping alignment by dividing column 10 (number of matches in alignment) by column 11 (total alignment length).
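The identity calculation from a PAF record can be sketched in a few lines of Python. PAF columns are 1-based in the description above, so columns 10 and 11 are list indices 9 and 10 after splitting on tabs. The function name and the example record are assumptions made for illustration.

```python
def paf_identity(paf_line):
    """Compute alignment identity from one PAF record.

    Identity is column 10 (number of matching bases) divided by
    column 11 (alignment block length).
    """
    fields = paf_line.rstrip("\n").split("\t")
    n_matches = int(fields[9])
    block_length = int(fields[10])
    if block_length == 0:
        raise ValueError("alignment block length is zero")
    return n_matches / block_length


# Made-up PAF record with the twelve mandatory columns.
example = "read1\t819\t0\t819\t+\tcontig1\t10000\t100\t919\t800\t825\t60"
identity = paf_identity(example)
```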

Global pairwise alignment

We implemented a banded version of the Needleman-Wunsch algorithm to compute global alignment scores between pairs of trimmed msp2 reads. We parameterised the scoring model such that scores reflect the log-probability that the two observed reads derived from the same underlying haplotype sequence; i.e., that all alignment differences were caused by sequencing error. Assuming an indel rate of 5%, which is broadly consistent with observed error rates, we used a linear gap score of ${\log }_{10}(0.05)$. For substitution scores, we took into account the base quality scores generated by Guppy as follows. Defining x and y as the two observed bases in the match, the likelihood that they were generated from the same haplotype base h is

$$P(x,y \mid h,p_x,p_y)=\begin{cases}(1-p_x)(1-p_y)+\dfrac{p_x p_y}{3} & \text{if } x=y\\ p_x(1-p_y)+p_y(1-p_x)+\dfrac{2p_x p_y}{3} & \text{if } x\ne y\end{cases}$$

(1)

where $p_x=10^{-Q_x/10}$ and $p_y=10^{-Q_y/10}$, with Q_x and Q_y being the Phred-scaled base quality scores for x and y. The substitution score is then computed as log₁₀(P(x, y∣h, p_x, p_y)). For all alignments in this study, a band width of 80 bp centred on the diagonal of the global alignment matrix was used. Hierarchical clustering of the resulting scores was performed using the scipy.cluster.hierarchy.linkage function from the SciPy⁷⁵ (v1.4.1) Python package.
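A minimal sketch of such a scoring scheme is given below: a quality-aware substitution score following Eq. 1, a linear gap score of log₁₀(0.05), and a banded Needleman-Wunsch recursion. It is not the authors' implementation. In particular, interpreting the 80 bp band as cells with |i − j| ≤ 40, returning negative infinity when the end cell falls outside the band, and all function names are assumptions.

```python
import math

GAP_SCORE = math.log10(0.05)  # linear gap score, assuming a 5% indel rate


def substitution_score(x, y, qx, qy):
    """log10 probability (Eq. 1) that bases x and y derive from the same base."""
    px, py = 10 ** (-qx / 10), 10 ** (-qy / 10)
    if x == y:
        p = (1 - px) * (1 - py) + px * py / 3
    else:
        p = px * (1 - py) + py * (1 - px) + 2 * px * py / 3
    return math.log10(p)


def banded_global_score(seq_a, qual_a, seq_b, qual_b, band_width=80):
    """Global alignment score restricted to cells with |i - j| <= band_width // 2."""
    half = band_width // 2
    n, m = len(seq_a), len(seq_b)
    if abs(n - m) > half:
        return float("-inf")  # the end cell lies outside the band
    # Row 0: aligning an empty prefix of seq_a against j bases of seq_b.
    prev = {j: j * GAP_SCORE for j in range(0, min(m, half) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - half), min(m, i + half) + 1):
            if j == 0:
                cur[j] = i * GAP_SCORE
                continue
            best = float("-inf")
            if (j - 1) in prev:  # diagonal move: align the two bases
                best = prev[j - 1] + substitution_score(
                    seq_a[i - 1], seq_b[j - 1], qual_a[i - 1], qual_b[j - 1]
                )
            if j in prev:  # gap in seq_b
                best = max(best, prev[j] + GAP_SCORE)
            if (j - 1) in cur:  # gap in seq_a
                best = max(best, cur[j - 1] + GAP_SCORE)
            cur[j] = best
        prev = cur
    return prev[m]


# Two toy reads differing by a single deleted base, all bases at Q30.
score = banded_global_score("ACGT", [30] * 4, "ACT", [30] * 3)
```

With these toy reads the best path uses three matches and one gap, so the score is close to log₁₀(0.05); pairwise scores of this kind could then be arranged into a matrix for hierarchical clustering.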

Statistical model for hrp2/3 deletion detection

Data and notation

We first describe the data and notation used by our model (Table 2). Amplicon sequencing data is represented as a two-dimensional matrix of non-negative integers, X, which holds read counts after quality filtering, demultiplexing and mapping. The matrix X has rows representing samples, which are indexed by i ∈ {1, 2, . . . , n}, and columns representing target genes, which are indexed by j ∈ {1, 2, . . . , m}; each element, x_ij, represents the number of reads from sample i that mapped to the target gene j. We define a corresponding binary matrix, C, where each element c_ij indicates that the given target gene j is either present (c_ij = 1) or deleted (c_ij = 0) in sample i. We define a vector a of size n such that a_i ∈ [0, 1], which represents the relative abundance of each sample in the sequencing library. Finally, we define two scalar parameters: a read misclassification rate, ϵ ∈ [0, 1], which represents the rate at which reads derived from sample k contribute to another sample’s read counts (i.e., to x_ij with i ≠ k), whether by contamination or incorrect sample assignment during demultiplexing; and a read count dispersion term $\nu \in \mathbb{R}^{+}$, which is given a precise mathematical definition below.

Table 2 Notation for hrp2/3 deletion detection model


Model

Each column of the read count matrix, x_j = {x_1j, x_2j, . . . x_nj}, contains the read counts for a given target gene j across all n samples. We model x_j with a Dirichlet-multinomial distribution:

$$P(\mathbf{x}_j;N_j,\boldsymbol{\alpha}_j)=\frac{\Gamma\left(\sum_{i=1}^{n}\alpha_{ij}\right)\Gamma(N_j+1)}{\Gamma\left(N_j+\sum_{i=1}^{n}\alpha_{ij}\right)}\prod_{i=1}^{n}\frac{\Gamma(x_{ij}+\alpha_{ij})}{\Gamma(\alpha_{ij})\Gamma(x_{ij}+1)},$$

(2)

where ${N}_{j}=\mathop{\sum }\nolimits_{i=1}^{n}{x}_{ij}$ is the total read counts for target j across all samples; and α_j = {α_1j, α_2j, . . . , α_nj} is a vector of compound parameters, α_ij, for the target gene j. These α_ij determine the expected number of reads for each sample for a given target gene and are computed in three steps. First, we use the relative abundance of a sample, a_i, and its deletion status for target gene j, c_ij, to compute the expected proportion of reads generated for target gene j that should derive from sample i:

$${p}_{ij}=\frac{{c}_{ij}{a}_{i}}{\mathop{\sum }\limits_{k=1}^{n}{c}_{kj}{a}_{k}}.$$

(3)

Note that p_ij either equals zero, if target gene j is deleted in sample i; or the proportion of sample i in the sequencing library, but relative to only the samples where the gene is present. In the process of generating these reads, these expected proportions are altered through read misclassification and sample contamination, such that a different set of expected proportions, π_ij, are reflected in the final data. Here, we make the assumption that read misclassification happens at a fixed rate, ϵ, and uniformly across samples, resulting in the expression:

$${\pi }_{ij}={p}_{ij}(1-\epsilon )+(1-{p}_{ij})\epsilon .$$

(4)

Both p_ij and π_ij sum to one for a given j. Finally, we parameterise the Dirichlet-multinomial with α_ij = νπ_ij. The effect is that the expected read counts, x_ij, equals the product of the total number of reads for target gene j, N_j, multiplied by the deletion-status and error-adjusted sample proportion, π_ij:

$$E[{x}_{ij}]={N}_{j}\frac{{\alpha }_{ij}}{\mathop{\sum }\nolimits_{k=1}^{n}{\alpha }_{kj}}={N}_{j}\frac{\nu {\pi }_{ij}}{\mathop{\sum }\nolimits_{k=1}^{n}\nu {\pi }_{kj}}={N}_{j}{\pi }_{ij}.$$

(5)

The variance of x_ij equals:

$$V[{x}_{ij}]={N}_{j}{\pi }_{ij}(1-{\pi }_{ij})\left(\frac{{N}_{j}+\nu }{1+\nu }\right),$$

(6)

which increases as ν shrinks towards zero, or asymptotically approaches the variance of a binomial distribution, as ν grows towards infinity; ν controls read count overdispersion relative to a binomial distribution. In summary, the α_ij incorporate information about the deletion status of the target gene, the relative abundance of each sample in the library, the rate of misclassification in the sequencing run, and the amount of overdispersion in read counts across samples.
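To make the likelihood concrete, the following sketch evaluates Eqs. 2 to 4 for a single target gene using only the Python standard library. The function names and the toy read counts, abundances and parameter values are assumptions for illustration and are not taken from the study. The sketch requires ϵ > 0 so that every α_ij is strictly positive, including for samples carrying a deletion.

```python
import math


def expected_proportions(c, a, eps):
    """Compute pi_ij for one target: Eq. 3 followed by Eq. 4."""
    denom = sum(ci * ai for ci, ai in zip(c, a))
    if denom == 0:
        raise ValueError("the target is deleted in every sample with reads")
    p = [ci * ai / denom for ci, ai in zip(c, a)]
    return [pi * (1 - eps) + (1 - pi) * eps for pi in p]


def dirichlet_multinomial_logpmf(x, alpha):
    """Log of Eq. 2 for read counts x and positive parameters alpha."""
    n_total, alpha_sum = sum(x), sum(alpha)
    out = (
        math.lgamma(alpha_sum)
        + math.lgamma(n_total + 1)
        - math.lgamma(n_total + alpha_sum)
    )
    for xi, ai in zip(x, alpha):
        out += math.lgamma(xi + ai) - math.lgamma(ai) - math.lgamma(xi + 1)
    return out


def log_likelihood(x, c, a, eps, nu):
    """Log-likelihood of one column of X given deletion statuses c."""
    pi = expected_proportions(c, a, eps)
    return dirichlet_multinomial_logpmf(x, [nu * v for v in pi])


# Toy example: three equally abundant samples, the second with very few reads.
x = [500, 5, 495]
a = [1 / 3, 1 / 3, 1 / 3]
ll_present = log_likelihood(x, [1, 1, 1], a, eps=0.01, nu=50)
ll_deleted = log_likelihood(x, [1, 0, 1], a, eps=0.01, nu=50)
```

In this toy case the configuration in which the second sample carries a deletion has the higher log-likelihood, which is the signal that the inference step described next relies on.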

Inference

Our aim is to infer whether a target gene of interest is present (c_ij = 1) or absent (c_ij = 0) in a given sample, using all of the salient information in the read count matrix X. We approach this with a Bayesian formulation: treating the Dirichlet-multinomial distribution, described above, as the likelihood and computing the posterior probability over c_j = (c_1j, c_2j, . . . , c_nj) as:

$$P(\mathbf{c}_j \mid \mathbf{x}_j;\epsilon,\nu,\mathbf{a})\propto P(\mathbf{x}_j \mid \mathbf{c}_j;\epsilon,\nu,\mathbf{a})\,P(\mathbf{c}_j).$$

(7)

A natural choice of prior for each c_ij would be a Bernoulli distribution, c_ij ~ Bern(θ), with 1 − θ giving the expected probability of deletion. Here, for simplicity, we have chosen a uniform prior equivalent to θ = 0.5, although in principle this could be adjusted based on previous knowledge about deletion prevalence of target gene j in the regions from which the samples were collected.

Also for simplicity, we have chosen to treat ϵ, ν, and a as fixed parameters and we fit them using point estimation. Let δ be a set containing the indices of all the negative control samples included in the sequencing run, such that ∣δ∣ represents the number of negative controls. We first make a simple point estimate of the misclassification rate by taking the empirical mean of x_ij/N_j over all these negative controls:

$$\epsilon=\frac{1}{| \delta | m}\mathop{\sum}\limits_{i\in \delta }\mathop{\sum }\limits_{j=1}^{m}{x}_{ij}/{N}_{j}.$$

(8)

This uses the fact that E[x_ij]/N_j = ϵ for cases where a_i = 0, which is true by definition for negative controls. Next we compute point estimates of the a and ν parameters. To make these estimates, we define a subset ϕ ⊂ {1, 2, . . . , m} representing the indices of the target genes with no known deletions. In the context of the NOMADS16 panel, this includes the ten target genes that remain after excluding hrp2, hrp2 upstream, hrp2 downstream, hrp3, hrp3 upstream, and hrp3 downstream. Closed form maximum-likelihood estimators of the parameters of a Dirichlet or Dirichlet-multinomial do not exist⁷⁶, and so instead we estimate a_i using the empirical mean of x_ij/N_j for the target genes in set ϕ:

$${a}_{i}=\frac{1}{| \phi | }\mathop{\sum}\limits_{j\in \phi }{x}_{ij}/{N}_{j}.$$

(9)

Then, following Minka (2012)⁷⁶, we estimate ν using:

$$\log(\nu)=\frac{1}{|\phi|-1}\sum_{j\in(|\phi|-1)}\log\left(\frac{a_i(1-a_i)}{\mathrm{var}(x_{ij}/N_j)}-1\right).$$

(10)
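The point estimates of ϵ (Eq. 8) and a (Eq. 9) amount to averaging column-normalised read counts, as in the sketch below; the estimate of ν in Eq. 10 is not included. The function names, the zero-based indexing and the toy read count matrix are assumptions made for illustration.

```python
def column_totals(X):
    """N_j: total reads for each target gene across all samples."""
    return [sum(row[j] for row in X) for j in range(len(X[0]))]


def estimate_epsilon(X, negative_controls):
    """Eq. 8: mean of x_ij / N_j over negative-control rows and all targets."""
    N = column_totals(X)
    m = len(N)
    total = sum(X[i][j] / N[j] for i in negative_controls for j in range(m))
    return total / (len(negative_controls) * m)


def estimate_abundance(X, phi):
    """Eq. 9: for each sample, mean of x_ij / N_j over deletion-free targets."""
    N = column_totals(X)
    return [sum(row[j] / N[j] for j in phi) / len(phi) for row in X]


# Toy matrix: rows are samples, columns are targets. The last row plays the
# role of a negative control; the first two columns are treated as targets
# with no known deletions. All numbers are invented.
X = [
    [480, 510, 300],
    [500, 470, 5],
    [20, 20, 15],
]
eps_hat = estimate_epsilon(X, negative_controls=[2])
a_hat = estimate_abundance(X, phi=[0, 1])
```

Because each column of x_ij/N_j sums to one across samples, the estimated abundances in `a_hat` also sum to one.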

With point estimates of ϵ, ν and a, we compute the posterior distribution of c_j using Markov Chain Monte Carlo (MCMC). For each target gene j, we run an independent Metropolis-Hastings algorithm to compute the posterior of c_j. We initialise the MCMC with c_ij = 1 for all i ∈ {1, 2, . . . , n}. In each iteration, we propose a new $\mathbf{c}_j'$ by choosing a sample index i uniformly at random, and then switching the deletion status of the corresponding c_ij by computing $c_{ij}'=1-c_{ij}$. As this proposal is symmetrical, the Hastings Ratio is 1 and we accept the update with probability:

$$\min\left[1,\frac{P(\mathbf{x}_j;N_j,\mathbf{c}_j',\epsilon,\nu,\mathbf{a})\,P(\mathbf{c}_j')}{P(\mathbf{x}_j;N_j,\mathbf{c}_j,\epsilon,\nu,\mathbf{a})\,P(\mathbf{c}_j)}\right]$$

(11)

In total we conducted 10,000 iterations of the MCMC for each target gene, discarding the first 500 as burn-in. Finally, the posterior probability that a given sample i is carrying a deletion of target gene j is equal to the fraction of the iterations in which c_ij = 0.
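
The sampling scheme described above can be sketched in Python as below. This is a minimal illustration under stated assumptions: `log_posterior` is a hypothetical callable standing in for log[P(x_j; N_j, c_j, ϵ, ν, a) P(c_j)], whose form is defined elsewhere in the paper, and the toy target used to exercise the sampler is not the paper's model.

```python
import math
import random


def sample_deletion_status(n, log_posterior, iterations=10_000, burn_in=500, seed=0):
    """Single-flip Metropolis-Hastings over a binary vector c of length n.

    Returns, for each sample i, the fraction of retained iterations with c[i] == 0.
    """
    rng = random.Random(seed)
    c = [1] * n                       # initialise with c_i = 1 for every sample
    current = log_posterior(c)
    zero_counts = [0] * n
    kept = 0
    for iteration in range(iterations):
        i = rng.randrange(n)          # choose a sample uniformly at random
        c[i] = 1 - c[i]               # symmetric proposal: flip its status
        proposed = log_posterior(c)
        log_ratio = proposed - current
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposed        # accept the flip
        else:
            c[i] = 1 - c[i]           # reject: undo the flip
        if iteration >= burn_in:      # discard the burn-in iterations
            kept += 1
            for k in range(n):
                if c[k] == 0:
                    zero_counts[k] += 1
    return [count / kept for count in zero_counts]


# Toy target (NOT the paper's model): independent deletion probabilities per sample.
p_deleted = [0.9, 0.1, 0.5]


def toy_log_posterior(c):
    return sum(math.log(p if c_i == 0 else 1 - p) for c_i, p in zip(c, p_deleted))


posterior = sample_deletion_status(len(p_deleted), toy_log_posterior)
```

With this toy target the returned fractions should lie close to the values in `p_deleted`, which gives a simple sanity check of the sampler before a real likelihood is plugged in.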

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Reads were mapped to release 52 of the P. falciparum reference genome for strains 3D7 (PlasmoDB-52_Pfalciparum3D7), Dd2 (PlasmoDB-52_PfalciparumDd2), GB4 (PlasmoDB-52_PfalciparumGB4) and HB3 (PlasmoDB-52_PfalciparumHB3) downloaded from PlasmoDB⁷¹; and to the GRCh38 human reference genome (GRCh38) downloaded from NCBI. Sequence data is available for download from NCBI’s Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) under the accession PRJNA956048.

Code availability

The multiply software is available at: https://github.com/JasonAHendry/multiply. The bioinformatics pipeline and the code for the SNP calling accuracy and msp2 analyses are available at: https://github.com/JasonAHendry/nomadic2.

References

White, N. J. Antimalarial drug resistance. J. Clin. Investig. 113, 1084–1092 (2004).
Haldar, K., Bhattacharjee, S. & Safeukui, I. Drug resistance in plasmodium. Nat. Rev. Microbiol. 16, 156–170 (2018).
World Health Organisation. WHO World Malaria Report 2022 (WHO, 2022).
MalariaGEN Plasmodium falciparum Community Project. Genomic epidemiology of artemisinin resistant malaria. Elife 5, e08714 (2016).
Imwong, M. et al. The spread of artemisinin-resistant plasmodium falciparum in the greater mekong subregion: a molecular epidemiology observational study. Lancet Infect. Dis. 17, 491–497 (2017).
Hamilton, W. L. et al. Evolution and expansion of multidrug-resistant malaria in southeast asia: a genomic epidemiology study. Lancet Infect. Dis. 19, 943–951 (2019).
Balikagala, B. et al. Evidence of artemisinin-resistant malaria in Africa. N. Eng. J. Med. 385, 1163–1171 (2021).
Uwimana, A. et al. Emergence and clonal expansion of in vitro artemisinin-resistant Plasmodium falciparum kelch13 R561H mutant parasites in Rwanda. Nat. Med. 26, 1602–1608 (2020).
Menegon, M. et al. Identification of Plasmodium falciparum isolates lacking histidine-rich protein 2 and 3 in Eritrea. Infect. Genet. Evol. 55, 131–134 (2017).
Berhane, A. et al. Major threat to malaria control programs by plasmodium falciparum lacking Histidine-Rich Protein 2, Eritrea. Emerg. Infect. Dis. 24, 462–470 (2018).
Golassa, L., Messele, A., Amambua-Ngwa, A. & Swedberg, G. High prevalence and extended deletions in Plasmodium falciparum hrp2/3 genomic loci in Ethiopia. PloS One 15, e0241807 (2020).
Feleke, S. M. et al. Plasmodium falciparum is evolving to escape malaria rapid diagnostic tests in Ethiopia. Nat. Microbiol. 6, 1289–1299 (2021).
Ariey, F. et al. A molecular marker of artemisinin-resistant plasmodium falciparum malaria. Nature 505, 50–5 (2014).
Miotto, O. et al. Genetic architecture of artemisinin-resistant plasmodium falciparum. Nat. Genet. 47, 226–234 (2015).
Gamboa, D. et al. A large proportion of p. falciparum isolates in the amazon region of peru lack pfhrp2 and pfhrp3: Implications for malaria rapid diagnostic tests. PLOS ONE 5, e8091 (2010).
Cheng, Q. et al. Plasmodium falciparum parasites lacking histidine-rich protein 2 and 3: a review and recommendations for accurate reporting. Malaria J. 13, 283 (2014).
Gardner, M. J. et al. Genome sequence of the human malaria parasite plasmodium falciparum. Nature 419, 498–511 (2002).
Rodríguez-Gijón, A. et al. A genomic perspective across earth’s microbiomes reveals that genome size in archaea and bacteria is linked to ecosystem type and trophic strategy. Front. Microbiol. 12, 761869 (2021).
Martinez-Gutierrez, C. A. & Aylward, F. O. Genome size distributions in bacteria and archaea are strongly linked to evolutionary history at broad phylogenetic scales. PLoS Genet. 18, e1010220 (2022).
Cui, J., Schlub, T. E. & Holmes, E. C. An allometric relationship between the genome length and virion volume of viruses. J. Virol. 88, 6403–6410 (2014).
Jacob, C. G. et al. Genetic surveillance in the greater mekong subregion and south asia to support malaria control and elimination. Elife 10, e62997 (2021).
Tessema, S. K. et al. Sensitive, highly multiplexed sequencing of microhaplotypes from the plasmodium falciparum heterozygome. J. Infect. Dis. 225, 1227–1237 (2022).
LaVerriere, E. et al. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: a malaria case study. Mol. Ecol. Resour. 22, 2285–2303 (2022).
Aydemir, O. et al. Drug-resistance and population structure of plasmodium falciparum across the democratic republic of congo using high-throughput molecular inversion probes. J. Infect. Dis. 218, 946–955 (2018).
Verity, R. et al. The impact of antimalarial resistance on the genetic structure of plasmodium falciparum in the drc. Nat. Commun. 11, 2107 (2020).
Tegally, H. et al. The evolving sars-cov-2 epidemic in africa: Insights from rapidly expanding genomic surveillance. Science 378, eabq5358 (2022).
Loman, N. J. & Watson, M. Successful test launch for nanopore sequencing. Nat. Methods 12, 303–4 (2015).
Quick, J. et al. Real-time, portable genome sequencing for ebola surveillance. Nature 530, 228–232 (2016).
Faria, N. R. et al. Establishment and cryptic transmission of zika virus in brazil and the americas. Nature 546, 406–410 (2017).
Payne, A., Holmes, N., Rakyan, V. & Loose, M. Bulkvis: a graphical viewer for oxford nanopore bulk fast5 files. Bioinformatics 35, 2193–2198 (2019).
Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genom. Biol. 21, 30 (2020).
Runtuwene, L. R. et al. Nanopore sequencing of drug-resistance-associated genes in malaria parasites, Plasmodium falciparum. Sci. Rep. 8, 8286 (2018).
Razook, Z. et al. Real time, field-deployable whole genome sequencing of malaria parasites using nanopore technology. bioRxiv https://doi.org/10.1101/2020.12.17.423341, https://www.biorxiv.org/content/early/2020/12/18/2020.12.17.423341.full.pdf (2020).
Sabin, S. et al. Portable and cost-effective genetic detection and characterization of plasmodium falciparum hrp2 using the minion sequencer. Sci. Rep. 13, 2893 (2023).
White, M. T. et al. Immunogenicity of the rts,s/as01 malaria vaccine and implications for duration of vaccine efficacy: secondary analysis of data from a phase 3 randomised controlled trial. Lancet Infect. Dis. 15, 1450–1458 (2015).
Datoo, M. S. et al. Efficacy of a low-dose candidate malaria vaccine, r21 in adjuvant matrix-m, with seasonal administration to children in burkina faso: a randomised controlled trial. Lancet 397, 1809–1818 (2021).
Quick, J. et al. Multiplex pcr method for minion and illumina sequencing of zika and other virus genomes directly from clinical samples. Nat. Protoc. 12, 1261–1276 (2017).
Untergasser, A. et al. Primer3plus, an enhanced web interface to primer3. Nucleic Acids Res. 35, W71–4 (2007).
Johnston, A. D., Lu, J., Ru, K. L., Korbie, D. & Trau, M. Primerroc: accurate condition-independent dimer prediction using roc analysis. Sci. Rep. 9, 209 (2019).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–10 (1990).
Camacho, C. et al. Blast+: architecture and applications. BMC Bioinform. 10, 421 (2009).
MalariaGEN et al. An open dataset of plasmodium falciparum genome variation in 7,000 worldwide samples [version 2; peer review: 2 approved]. Wellcome Open Res. 6, https://doi.org/10.12688/wellcomeopenres.16168.2 (2021).
World Health Organisation. Report on antimalarial drug efficacy, resistance and response: 10 years of surveillance (2010–2019). (WHO, 2020).
Su, X. Z., Wu, Y., Sifri, C. D. & Wellems, T. E. Reduced extension temperatures required for pcr amplification of extremely a+t-rich dna. Nucleic Acids Res. 24, 1574–5 (1996).
Miller, R. H. et al. A deep sequencing approach to estimate plasmodium falciparum complexity of infection (coi) and explore apical membrane antigen 1 diversity. Malar J. 16, 490 (2017).
Quick, J. One-pot ligation protocol for oxford nanopore libraries. protocols.io https://doi.org/10.17504/protocols.io.k9acz2e (2018).
Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).
Otto, T. D. et al. Long read assemblies of geographically dispersed plasmodium falciparum isolates reveal highly structured subtelomeres. Wellcome Open Res. 3, 52 (2018).
Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).
Snounou, G. & Beck, H. P. The use of pcr genotyping in the assessment of recrudescence or reinfection after antimalarial drug treatment. Parasitol. Today 14, 462–7 (1998).
Oyola, S. O. et al. Whole genome sequencing of plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J. 15, 597 (2016).
World Health Organisation. Response plan to pfhrp2 gene deletions. (WHO, 2019).
Grignard, L. et al. A novel multiplex qpcr assay for detection of plasmodium falciparum with histidine-rich protein 2 and 3 (pfhrp2 and pfhrp3) deletions in polyclonal infections. EBioMedicine 55, 102757 (2020).
Kreidenweiss, A. et al. Monitoring the threatened utility of malaria rapid diagnostic tests by novel high-throughput detection of plasmodium falciparum hrp2 and hrp3 deletions: a cross-sectional, diagnostic accuracy study. eBioMedicine 50, 14–22 (2019).
Vera-Arias, C. A. et al. High-throughput plasmodium falciparum hrp2 and hrp3 gene deletion typing by digital pcr to monitor malaria rapid diagnostic test efficacy. eLife 11, e72083 (2022).
Early, A. M. et al. Detection of low-density plasmodium falciparum infections using amplicon deep sequencing. Malar J. 18, 219 (2019).
Meulenaere, K. D. et al. Selective whole-genome sequencing of Plasmodium parasites directly from blood samples by nanopore adaptive sampling. mBio 15, e01967-01923 (2024).
Lerch, A. et al. Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections. BMC Genom. 18, 864 (2017).
Callahan, B. J. et al. Dada2: High-resolution sample inference from illumina amplicon data. Nat. Methods 13, 581–3 (2016).
Martin, M. et al. Whatshap: fast and accurate read-based phasing, https://doi.org/10.1101/085050 (2016).
Hathaway, N. J., Parobek, C. M., Juliano, J. J. & Bailey, J. A. Seekdeep: single-base resolution de novo clustering for amplicon deep sequencing. Nucleic Acids Res. 46, e21 (2018).
Waltmann, A. et al. Matched placental and circulating plasmodium falciparum parasites are genetically homologous at the var2csa id1-dbl2x locus by deep sequencing. Am. J. Trop. Med. Hyg. 98, 77–82 (2018).
Straimer, J. et al. Drug resistance. k13-propeller mutations confer artemisinin resistance in plasmodium falciparum clinical isolates. Science 347, 428–31 (2015).
Plowe, C. V. et al. Mutations in plasmodium falciparum dihydrofolate reductase and dihydropteroate synthase and epidemiologic patterns of pyrimethamine-sulfadoxine use and resistance. J. Infect. Dis. 176, 1590–6 (1997).
Roper, C. et al. Antifolate antimalarial resistance in southeast africa: a population-based analysis. Lancet 361, 1174–81 (2003).
Roper, C. et al. Intercontinental spread of pyrimethamine-resistant malaria. Science 305, 1124 (2004).
MalariaGEN et al. Pf7: an open dataset of plasmodium falciparum genome variation in 20,000 worldwide samples [version 1; peer review: awaiting peer review]. Wellcome Open Res. 8, https://doi.org/10.12688/wellcomeopenres.18681.1 (2023).
Fola, A. A. et al. Plasmodium falciparum resistant to artemisinin and diagnostics have emerged in ethiopia. Nat. Microbiol. 8, 1911–1919 (2023).
International HapMap Consortium. The international hapmap project. Nature 426, 789–96 (2003).
Trager, W. & Jensen, J. B. Human malaria parasites in continuous culture. Science 193, 673–5 (1976).
Aurrecoechea, C. et al. Plasmodb: a functional genomic database for malaria parasites. Nucleic Acids Res. 37, D539–D543 (2008).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. et al. The sequence alignment/map format and samtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. Bedtools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Minka, T. P. Estimating a dirichlet distribution. https://tminka.github.io/papers/dirichlet (2012).
Fidock, D. A. et al. Mutations in the p. falciparum digestive vacuole transmembrane protein pfcrt and evidence for their role in chloroquine resistance. Mol. Cell 6, 861–71 (2000).
Djimdé, A., Doumbo, O. K., Steketee, R. W. & Plowe, C. V. Application of a molecular marker for surveillance of chloroquine-resistant falciparum malaria. Lancet 358, 890–891 (2001).
Cowman, A. F., Morry, M. J., Biggs, B. A., Cross, G. A. & Foote, S. J. Amino acid changes linked to pyrimethamine resistance in the dihydrofolate reductase-thymidylate synthase gene of plasmodium falciparum. Proc. Natl. Acad. Sci. USA 85, 9109–13 (1988).
Brooks, D. R. et al. Sequence variation of the hydroxymethyldihydropterin pyrophosphokinase: dihydropteroate synthase gene in lines of the human malaria parasite, plasmodium falciparum, with differing resistance to sulfadoxine. Eur. J. Biochem. 224, 397–405 (1994).
Triglia, T., Wang, P., Sims, P. F., Hyde, J. E. & Cowman, A. F. Allelic exchange at the endogenous genomic locus in plasmodium falciparum proves the role of dihydropteroate synthase in sulfadoxine-resistant malaria. EMBO J. 17, 3807–3815 (1998).
Price, R. N. et al. Mefloquine resistance in plasmodium falciparum and increased pfmdr1 gene copy number. Lancet 364, 438–447 (2004).
Amato, R. et al. Genetic markers associated with dihydroartemisinin-piperaquine failure in plasmodium falciparum malaria in cambodia: a genotype-phenotype association study. Lancet Infect. Dis. 17, 164–173 (2017).
Witkowski, B. et al. A surrogate marker of piperaquine-resistant plasmodium falciparum malaria: a phenotype-genotype association study. Lancet Infec. Dis. 17, 174–183 (2017).

Acknowledgements

The NOMADS project is funded by the Bill & Melinda Gates Foundation (INV-003660 to K.S., D.J.B., M.M. and J.A.H., INV-048316 to K.S., D.J.B., M.M., B.H. and J.A.H.). The research was supported by the Wellcome Trust Core Award Grant Number 203141/Z/16/Z with additional support from the NIHR Oxford BRC. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. We are grateful to all health workers and patients who supported collection of field samples in Zambia. We thank Kirk Rockett for providing laboratory space and support during initial stages of the project; staff at the Oxford Genomics Centre for sequencing support, especially Amy Trebes and David Buck; Nada Kubikova for discussions about multiplex PCR primer design; Gavin Band and Annie Forster for beta testing of multiply; Shazia Ruybal for discussions about msp2 analysis. We also acknowledge the support of the Royal Geographical Society with IBG and Jaguar Land Rover for instigating initial collaborations through the 2018 RGS Land Rover Bursary which was awarded to J.A.H., I.G. and G.B.B.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: Mariateresa de Cesare, Mulenga Mwenda.

Authors and Affiliations

Nuffield Department of Medicine, University of Oxford, Wellcome Centre for Human Genetics, Oxford, UK
Mariateresa de Cesare, Anna E. Jeffreys, Isaac Ghinai, George B. Busby & Jason A. Hendry
PATH, Lusaka, Zambia
Mulenga Mwenda, Chris Drakeley, Kammerle Schneider, Brenda Mambwe & Daniel J. Bridges
National Malaria Elimination Centre, Chainama, Lusaka, Zambia
Jacob Chirwa, Busiku Hamainza & Moonga Hawela
Max Planck Institute for Infection Biology, Berlin, Germany
Karolina Glanz, Christina Ntalla, Manuela Carrasquilla, Silvia Portugal & Jason A. Hendry
Imperial College London, London, UK
Robert J. Verity
Department of Pathology and Laboratory Medicine and Center for Computational Molecular Biology, Brown University, Providence, RI, USA
Jeffrey A. Bailey

Contributions

M.dC.: Methodology, Investigation, Writing - Review & Editing. M.M.: Methodology, Investigation, Writing - Review & Editing. A.J.: Methodology, Investigation, Validation, Writing - Review & Editing. J.C.: Resources. C.D.: Project administration, Writing - Review & Editing. K.S.: Project administration, Funding acquisition, Writing - Review & Editing. B.M.: Investigation. K.G.: Investigation. R.V.: Conceptualisation, Formal analysis. C.N.: Resources. M.C.: Resources. S.P.: Resources. J.A.B.: Supervision, Writing - Review & Editing. I.G.: Conceptualisation, Funding acquisition, Writing - Review & Editing. G.B.: Conceptualisation, Funding acquisition, Writing - Review & Editing. B.H.: Project administration, Funding acquisition. M.H.: Project administration, Resources. D.J.B.: Conceptualisation, Supervision, Project administration, Funding acquisition, Writing - Review & Editing. J.A.H.: Conceptualisation, Supervision, Funding acquisition, Investigation, Formal analysis, Software, Visualization, Writing - Original Draft.

Corresponding author

Correspondence to Jason A. Hendry.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Cristian Koepfli, Yutaka Suzuki and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

de Cesare, M., Mwenda, M., Jeffreys, A.E. et al. Flexible and cost-effective genomic surveillance of P. falciparum malaria with targeted nanopore sequencing. Nat Commun 15, 1413 (2024). https://doi.org/10.1038/s41467-024-45688-z

Received: 06 March 2023
Accepted: 31 January 2024
Published: 15 February 2024
DOI: https://doi.org/10.1038/s41467-024-45688-z
