Introduction

Recent innovations in sequencing technologies have created new opportunities for the strategic use of genomics in malaria surveillance1,2,3. Examples include more accurate data on emergence and spread of drug and diagnostic resistance4,5, inferring parasite connectivity to support the classification of imported cases6, and predicting vaccine effectiveness7. Furthermore, and still at a more theoretical stage, genomic diversity can be used to assess differences and changes in transmission intensity8,9,10,11,12. This could be especially useful for stratification and evaluating the effectiveness of anti-malarial interventions.

For continuous genomic surveillance of malaria, samples must be collected regularly, and, especially critical for low-resource settings, cost-efficiently2,13,14. Pregnant women attending their first antenatal care (ANC) consultation are an easy-access subpopulation that could potentially serve as a sentinel group for malaria surveillance13,15,16. Besides low cost and easy accessibility, advantages of ANC-based surveillance include temporal continuity, known denominator populations, and the possibility of capturing asymptomatic infections15. Malaria burden trends in pregnant women at their first ANC visit have been shown to mirror community trends17, and routine malaria testing at ANC has already been implemented in Tanzania, where it is generally perceived as acceptable and positive by both patients and providers18. A few small studies, mostly outside of Africa, have investigated malaria genetic diversity in pregnant women using whole genome sequencing19, microsatellite markers20,21 or nested polymerase chain reaction (PCR)22,23. However, routinely collected genomic data from ANC has not been evaluated for its suitability for sentinel surveillance.

We hypothesized that the Plasmodium (P.) falciparum parasite populations circulating in pregnant women at their first ANC visit and in the community are genetically similar, including similar genetic diversity (intra-host and population-level), relatedness between infections, and prevalence of antimalarial resistance markers. To test our hypothesis, we analyzed the parasite population in ANC users in southern Mozambique, and compared it to parasites found in children aged 2–10 years sampled in household surveys. Furthermore, we compared the parasite populations in three areas with declining transmission between 2015 and 2018. Manhiça and Magude are low-transmission areas, with Magude recently targeted for elimination with a package of interventions24, while Ilha Josina is a historically high-transmission setting17.

Results

Sequencing performance

A total of 558 P. falciparum-positive dried blood spot (DBS) samples from ANC users (n = 378) and children sampled in population-representative household surveys (n = 180) were sequenced.

We used a panel targeting 241 amplicons in the P. falciparum genome25. The amplicons included markers of drug resistance in 15 genes and 165 microhaplotypes that had been selected to provide information about genetic diversity in the parasite population. On average, quality reads were obtained for 212 and 170 loci per sample for ANC users and children, respectively (Table 1). Overall, 68.5% (382/558) of the samples passed the filtering criteria. A lower proportion of samples from children passed filtering (51.7%) compared to ANC users (76.5%). The total number of reads per sample and the number of loci covered per sample (n = 558) were primarily a function of parasite density (Fig. 1A, B). Across all samples sequenced, parasite densities were lower in those from children than those from ANC (Table 1). Parasite densities were similar between populations among samples that passed filtering (geometric mean [GM] = 191 and GM = 154 parasites/µL, respectively), and among samples that were filtered out (GM = 7 parasites/µL for both, Fig. 1C). Sequencing coverage was high across samples that passed filtering (n = 382), with a geometric mean total reads per sample of 319,503 and 484,245, and a median of 223 and 213 out of 241 loci covered per sample, for ANC users and children, respectively (Table 1). On average, each locus (n = 241) was covered by 1.4 million reads and reads from 462 samples (Fig. 1D, E).

Table 1 Characteristics of study participants by population group and inclusion in analysis after filtering
Fig. 1: Sequencing performance.

A Total number of reads per sample (n = 558) by parasite density before filtering. Red indicates pregnant women at their first ANC visit, and blue indicates children sampled in household surveys. B Number of loci (total n = 241) with reads per sample (n = 558) by parasite density before filtering. C Parasite density among samples that passed and did not pass filtering by population group (n = 378 ANC users and n = 180 children). D, E Reads per locus per included sample (n = 382) for diversity loci by chromosome and for drug resistance markers by gene, respectively. On average, each locus was covered by 1.4 million reads. Boxes indicate the 25th and 75th percentiles with the centre line indicating the median, and the whiskers indicate the smallest value within 1.5 times the interquartile range below the 25th percentile, and the largest value within 1.5 times the interquartile range above the 75th percentile.

Intra-host genetic diversity

Half of the pregnant women attending ANC consultations carried polyclonal infections (Table 2). On average, ANC users had a multiplicity of infection (MOI; unadjusted for relatedness) of 2.37, i.e., carried 2.37 genetically different P. falciparum parasite clones (Supplementary Table 1). 1-Fws was 0.39, suggestive of inbreeding in the population. Effective multiplicity of infection (eMOI), which incorporates intra-host relatedness between clones, was lower than MOI at 1.8, indicative of co-transmission over super-infections, leading to inbreeding (Supplementary Fig. 1C). Parasite density was associated with measured intra-host diversity, with higher diversity observed for women with higher-density infections. eMOI showed an overall declining trend from 2017 to 2018–2019, and was highest in Magude. 1-Fws showed similar trends but did not reach statistical significance. Primigravid women had higher eMOI compared to multigravidae in the univariate analysis, but the effect disappeared when adjusting for parasitemia, time, and area (Supplementary Table 2). No statistically significant differences were observed between seasons or human immunodeficiency virus (HIV)-status groups. Among children, 62.4% carried polyclonal infections, the average MOI and eMOI were 2.86 and 2.3, respectively, and 1-Fws was 0.55. Similar to ANC users, children with higher-density infections showed higher eMOI.

Table 2 Factors associated with intra-host Plasmodium falciparum diversity among first antenatal care users and children

Temporal trends in intra-host genetic diversity

A significant interaction was observed between area and time in the multivariate analysis of intra-host diversity at ANC, indicating different temporal trends within the three areas. Parasite densities did not change over time (Supplementary Fig. 2). In Magude, eMOI declined by 50% per year (95% CI: −0.78;−0.25, p < 0.0002, Fig. 2A–C, Supplementary Table 3), with a shift toward more infections having eMOI < 2 (Supplementary Fig. 3), while 1-Fws and odds of infections being polyclonal showed declining trends (58% and 46% yearly decline, respectively). Similar trends were observed from naively estimated MOI and MOI not adjusted for intra-host relatedness (Supplementary Fig. 4). No temporal changes in intra-host diversity were observed in Manhiça, while in Ilha Josina, there was an increasing trend over time in polyclonal infections, although not statistically significant (Fig. 2B). Intra-host diversity among 47 children from Magude sampled cross-sectionally was compared with samples from ANC users in Magude (Fig. 2A–C Magude panel and D–F, Supplementary Table 4). In multivariate regressions combining both populations, all metrics of intra-host diversity showed declining trends over time. Both populations showed statistically significant declines in eMOI (−37% and −51% per year for children and ANC users, respectively), and eMOI was not associated with population group (p = 0.20). 1-Fws also tended to decline in both population groups, and no effect of population was detected. Further comparisons between ANC users and children in the other two areas were precluded by the limited number of samples from children.

Fig. 2: Temporal trends in Plasmodium falciparum intra-host genetic diversity among first antenatal care users by area and children in Magude.

Intra-host genetic diversity over time in pregnant women attending their first antenatal care (ANC) visit (in red) by area (n = 120 in Magude, n = 64 in Manhiça, n = 105 in Ilha Josina), and children aged 2–10 years old from the community (blue, n = 47). Darker shade of color reflects higher parasite density. Black dots represent Plasmodium falciparum (Pf) parasite rates (PR) by qPCR in the same population with 95% confidence interval (CI) bars. A Effective multiplicity of infection (eMOI) in first ANC users by area in 0-truncated Poisson regression adjusted for parasitemia (Pf parasites/µL) with 95% CI bands. P values for temporal trend of eMOI in the regression (two-sided test) adjusted for multiple testing. B Monoclonal (eMOI ≤ 1.1) and polyclonal (eMOI > 1.1) infections in pregnant women at ANC by area in a logistic regression adjusted for parasite density with 95% CI bands. C 1-Fws in pregnant women at ANC by area in a logistic regression adjusted for parasitemia with 95% CI bands. D–F eMOI, polyclonality and 1-Fws in children from Magude, estimated with Poisson and logistic regressions similar to A–C with 95% CI bands. P values in all graphs are for the temporal trend of the given metric in the regression (F test). Adjusted for multiple testing using the Benjamini–Hochberg method, a p value of <0.0062 indicates statistical significance.
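As an illustration of the multiple-testing adjustment mentioned in the legend, the following minimal Python sketch implements the standard Benjamini–Hochberg step-up procedure. The function name and the false discovery rate of 0.05 are assumptions made for illustration, since the level used is not stated here, and the p values in the usage note are invented rather than taken from this study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (rejected flags, largest rejected p value) for the BH procedure.

    alpha is the target false discovery rate (assumed value, for illustration).
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank

    # All hypotheses ranked at or below the cutoff are rejected.
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True

    threshold = p_values[order[cutoff_rank - 1]] if cutoff_rank else 0.0
    return rejected, threshold
```

For example, `benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9])` flags only the first two tests. The significance threshold therefore depends on the full set of p values tested, which is why a single cut-off is reported for the whole analysis.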

Relationship between intra-host genetic diversity and parasite rates

To assess the potential of using intra-host genetic diversity as a proxy for transmission intensity, we compared mean eMOI, proportion polyclonal, and mean 1-Fws with another proxy for transmission intensity, P. falciparum parasite rates measured by quantitative polymerase chain reaction (qPCR) (Fig. 2, Supplementary Figs. 5, 6). P. falciparum parasite rates in ANC users declined in all three areas during the study, starting from a higher rate in Ilha Josina compared to Magude and Manhiça (previously reported in17). In Magude, eMOI in both ANC users and children showed positive Pearson’s correlation coefficients (PCC) close to 1 (>0.85), although not statistically significant. Furthermore, for ANC users in Magude, both 1-Fws and the proportion of infections that were polyclonal showed PCC > 0.65. In the other two areas, PCC was negative, but small to moderate and not statistically significant, for all three metrics of intra-host genetic diversity.

Population genetic diversity

Among ANC users, population mean expected heterozygosity (HE) across the 165 microhaplotype loci ranged from <0.01 to 0.90, with a mean of 0.57 (95% CI: 0.54–0.60, Fig. 3A). Three to 58 unique alleles were observed for each locus (Fig. 3B). In order to compare HE between time windows, areas, and population groups, the larger populations were randomly subsampled within areas and/or years once without replacement to match the smaller population in size. Overall, HE did not change between 2017 and 2018–2019 (Fig. 3E, F, Supplementary Table 7). Comparing HE between ANC populations in the three areas, parasites in Magude showed less diversity than the parasite population in Ilha Josina (Fig. 3C, D, Supplementary Table 7). Mean HE did not differ between ANC and children populations (p = 0.95, Fig. 3G, H, Supplementary Table 8).
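The HE calculation and the single random subsampling step can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code: it applies the HE formula given in the Definitions section to allele frequencies that are assumed to be already estimated (the study estimated them with MOIRE), and the function names and the seed argument are hypothetical.

```python
import random


def expected_heterozygosity(freqs, n):
    """H_E = (n / (n - 1)) * (1 - sum_i p_i^2) for one locus.

    freqs: allele frequencies p_i at the locus (assumed to sum to 1).
    n: population size used for the finite-sample correction.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return (n / (n - 1)) * (1.0 - sum(p * p for p in freqs))


def subsample_to_match(larger, smaller_size, seed=0):
    """Draw one random subsample, without replacement, of the given size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(list(larger), smaller_size)
```

For a locus with two equally frequent alleles in a population of 10, `expected_heterozygosity([0.5, 0.5], 10)` gives 5/9, or about 0.56. Subsampling the larger group to the size of the smaller one keeps the n/(n − 1) correction and the sampling noise comparable between the groups being compared.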

Fig. 3: Population-level Plasmodium falciparum genetic diversity among first antenatal care users and children.

A Expected heterozygosity (HE) and 95% confidence interval for each microhaplotype (n = 165) across the population of pregnant women attending their first antenatal care visit (n = 289) estimated with MOIRE (R package). Mean HE across all loci indicated with a dotted line. Darker shade of green reflects higher diversity. B Number of distinct alleles observed for each locus (ordered by increasing HE as in A). C HE per locus for each area among ANC users. A single random subsampling without replacement of Magude and Ilha Josina was performed to balance sample size with Manhiça (n = 64). D Per-locus difference in HE between Magude and Ilha Josina. Overall difference between the two areas assessed with a linear mixed model with random intercepts and slopes per locus. E HE per locus for 2017 and 2018–2019 among ANC users. Random subsampling of 2017 performed to balance sample size with 2018–2019 (n = 123). Loci connected between years by gray lines. F Per-locus difference (Δ) in HE between years. Overall difference between years assessed with a linear mixed model with random intercepts and slopes per locus. G HE per locus for ANC users (pink) and children aged 2–10 years from the community (light blue) in overlapping years. Random subsampling of ANC users performed to balance sample size with children (n = 33), matching area of residence. H Per-locus difference in HE between children and ANC users. Overall difference between the two populations assessed with a linear mixed model with random intercepts and slopes per locus. Boxes indicate the 25th and 75th percentiles with the centre line indicating the median, and the whiskers indicate the smallest value within 1.5 times the interquartile range below the 25th percentile, and the largest value within 1.5 times the interquartile range above the 75th percentile. P values in C, E, G are from F-tests. Adjusted for multiple testing using the Benjamini–Hochberg method, a p value of <0.0062 indicates statistical significance.

Pairwise inter-host genetic relatedness

Genetic relatedness between pairs of P. falciparum infections, including unphased polyclonal infections, was estimated with an identity by descent (IBD)-based approach. To compare relatedness between areas and populations, we permuted the group labels and compared mean IBD in each area or population with the resulting permutation distributions. Infections from ANC users (n = 83,521 pairs) generally showed low relatedness, with a mean pairwise IBD of 0.026 (95% CI: 0.022;0.033) (Supplementary Fig. 7). IBD was slightly but significantly higher between infections in Magude compared to within and between other areas (Supplementary Fig. 7a, Supplementary Table 9). Infections in children tended to be more related to each other than infections in ANC users, or than infections compared between the two populations. Restricting the comparison to samples from overlapping years (2017–2019) and temporal windows (April 15 to June 30), mean IBD between ANC infections was 0.018, similar to the mean IBD of 0.017 observed for infections in children (Supplementary Fig. 7b, Supplementary Table 10).
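The label-permutation comparison can be illustrated with a short Python sketch. It is a simplified, one-sided version under stated assumptions: pairwise IBD values are taken as given, only within-group pairs are compared, and the function names, the number of permutations, and the seed are hypothetical rather than the settings used in the study.

```python
import random
from itertools import combinations


def mean_within_group_ibd(ibd, labels, group):
    """Mean pairwise IBD over all pairs of samples carrying the given label.

    ibd: dict mapping frozenset({sample_a, sample_b}) to an IBD estimate.
    labels: dict mapping sample name to its group (area or population).
    """
    members = [s for s, g in labels.items() if g == group]
    values = [ibd[frozenset(pair)] for pair in combinations(members, 2)]
    return sum(values) / len(values)


def permutation_p_value(ibd, labels, group, n_perm=1000, seed=0):
    """One-sided test: is mean IBD within `group` higher than under random labels?"""
    rng = random.Random(seed)
    observed = mean_within_group_ibd(ibd, labels, group)
    samples = list(labels)
    groups = [labels[s] for s in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(groups)  # reassign labels at random, keeping group sizes fixed
        permuted = dict(zip(samples, groups))
        if mean_within_group_ibd(ibd, permuted, group) >= observed:
            hits += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling labels keeps the pairwise IBD values and group sizes fixed, so the permutation distribution shows what mean relatedness would look like if group membership carried no information.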

Markers of drug resistance

The prevalence of all markers of antimalarial resistance assessed in this study was similar between ANC users and children from the community (Table 3). Parasites with quintuple 51-59-108-437-540 mutations in the dihydrofolate reductase and dihydropteroate synthetase (pfdhfr-pfdhps) genes were highly prevalent in both populations (>90%). In particular, sulphadoxine-pyrimethamine (SP) resistance-associated polymorphisms in the pfdhfr gene had almost reached fixation in the population, with 98.6% carrying the triple 51-59-108 mutant. Neither A581G nor I431V mutations in pfdhps were detected. Three quarters of the study population carried a multidrug resistance 1 (pfmdr1) Y184F gene mutation associated with amodiaquine resistance, while 1.2% carried the N86Y and 0.3% carried the D1246Y mutation. The chloroquine resistance transporter (pfcrt) 72-76 CVIET mutant genotype was observed in four individuals, three of them children. No mutations in the kelch 13 propeller gene (pfkelch13) associated with artemisinin partial resistance were observed in either population.

Table 3 Prevalence of Plasmodium falciparum drug resistance markers among first antenatal care users and children

Discussion

This study applied a multiplexed amplicon sequencing approach targeting microhaplotypes and drug resistance markers to assess the representativeness of pregnant women attending their first ANC consultation for sentinel P. falciparum genomic surveillance. We found that genetic diversity and pairwise inter-host relatedness, as well as prevalence of drug resistance markers, were consistent between first ANC users and children aged 2–10 years, representing the community. In Magude, which was subject to an elimination campaign, similar declining trends in intra-host diversity were observed for both ANC users and children. Our findings demonstrate the potential of ANC-based malaria genomics as a straightforward and cost-efficient approach to assess the impact of antimalarial interventions and genetic variants of public health concern.

Pregnant women seeking ANC have previously been shown to mirror trends in malaria prevalence in the general population, although with a delay, and with more heterogeneity between gravidity groups at higher transmission settings16,17. A few studies have also compared the genetic diversity of parasite populations in pregnant women and the community19,20,22,23, but these were based on small sample sizes, only one took place in Africa23, and, importantly, none accounted for parasite densities. With this study, we expand the potential scope of ANC-based surveillance to include genomic surveillance of P. falciparum genetic diversity and resistance markers. We find that both primigravid and multigravid first ANC users, regardless of HIV status, can be included in a sentinel population. Since no difference was observed between seasons, sampling could take place throughout the year. However, other studies did find seasonal differences8, indicating that this might depend on the setting. Furthermore, it may not be realistic to reach sufficient sample sizes at ANC facilities alone at very low transmission, and it would be necessary to combine ANC sampling with other sampling strategies, such as health facility surveys. ANC sampling would also not be ideal if the goal is to identify finer relatedness patterns, including transmission networks, because of the temporal sparsity of samples. The very low inter-host relatedness observed at ANC might reflect little localized transmission, although more dense sampling would probably be required to detect this. Consistent with previous observations that parasite populations are at least partially structured in time26, inter-host relatedness was higher among cross-sectionally sampled children than among continuously sampled ANC users, with the difference disappearing when restricting the comparison to similar temporal windows.

Genetic diversity has been proposed as a surrogate marker of transmission intensity4,9,10,12. In line with this and previous studies8,11, we found the highest population diversity (HE) in the highest-transmission setting, Ilha Josina. Conversely, we also found the lowest intra-host diversity in Ilha Josina (both eMOI and 1-Fws). This might be explained by importation of parasites to low-transmission Magude and Manhiça from areas with higher transmission. A study from nearby low-transmission Eswatini observed similarly high diversity, which was attributed to frequent importation27. Alternatively, pregnant women in Ilha Josina were previously found to, on average, have lower parasite densities compared to women from Magude and Manhiça28, probably reflecting higher levels of anti-parasite immunity. Since parasite density was a major predictor of intra-host diversity, this could affect the likelihood of encountering and being able to measure multiple clones in these women. Comparing P. falciparum parasite rates with metrics of intra-host genetic diversity, we only observed positive correlations in Magude. This indicates that other factors might affect the relationship between parasite rates and genetic diversity, such as the temporal scales at which malaria transmission and genetic diversity change10,29. Furthermore, the sites are likely to differ in host immunity due to previous exposure28 which, again, might affect the chance of measuring multiple clones in one individual. Similarly, healthcare-seeking behavior and the use of antimalarials might differ between the sites. Finally, it cannot be ruled out that diversity statistics may be more directly impacted by the interventions deployed in Magude, with the transmission decline being a correlate. Genetic diversity on its own might, therefore, not always be a suitable proxy for local transmission intensity, and stratification based on genetic metrics should be carefully validated against other epidemiological data, including assessing the potential role of importation and underlying reasons for changes to transmission intensity.

On the other hand, genetic indicators of reduced transmission observed within Magude (decline in eMOI and 1-Fws, increased mean IBD, and lower HE) highlight how parasite genomics can complement clinical and epidemiological data to evaluate the impact of control interventions. Between 2015 and 2017, Magude was targeted with biannual rounds of mass drug administration (MDA), followed by reactive focal MDA in 2018, and three rounds of indoor residual spraying (IRS)24. Even though parasite rates declined in all three areas during the study period, and at similar levels and rates in Magude and the control area Manhiça28, we only observed evidence of declining intra-host diversity in Magude. Furthermore, Magude showed significantly lower population diversity and higher mean IBD compared to the other areas. A study from Zambia found a similar reduction in the complexity of infections following an MDA trial30. These findings reveal programmatically important changes to the parasite population structure, not apparent from prevalence and incidence estimates.

Strengths of this study include the rich data obtained from deep amplicon sequencing, with sensitivity to achieve good coverage for samples with down to 10 parasites/µL. A previous study comparing amplicon sequencing to whole genome sequencing showed that amplicon sequencing provides higher coverage, thereby allowing for more sensitive detection of minority strains, even for infections with high parasite densities31. Compared to single nucleotide polymorphism (SNP)-based methods, microhaplotypes allow for higher resolution and consequently more accurate estimates of diversity and relatedness, while being more convenient than microsatellites9,31. Microhaplotypes have previously been used in a nation-wide study from Mozambique, where they proved able to distinguish parasites from the northern and southern parts of the country4. Furthermore, although possible to distinguish major and minor alleles in polyclonal infections using SNPs, highly diverse microhaplotypes allow a more accurate assessment and better utilization of the full allele diversity in polyclonal samples31, which was half of the samples in this study. Relatedness between clones within a sample was evident from eMOI being lower than MOI, indicating that co-transmission is a more common event than superinfections. For our main analysis, we take this relatedness into account by using eMOI rather than MOI to estimate within-host diversity. Another strength of this study is the large ANC sample size, collected prospectively across three years in three different transmission scenarios. To the best of our knowledge, this study represents the most comprehensive assessment of genetic diversity and relatedness of malaria infections among ANC users to date.

This study is limited by the number of samples available to sequence from children, particularly when stratifying by site and year, restricting comparisons with ANC users. For intra-host diversity, we therefore focused on Magude, where most samples from children originated. We did not consider the potential issue of parasite importation from neighboring regions, nor reasons for ANC non-attendance, although we would not expect any potential selection bias15 to affect the parasite population. Data on previous malaria infection and treatment was not available, and this study was, therefore, limited to single time-point assessment of malaria infection. To confirm the generalizability of this approach for routine surveillance, more studies should be carried out in different epidemiological settings and include larger community sample sizes. Finally, we observed a clear dependence of sequencing coverage on parasite density, which may be explained by technical limitations. When only few, if any, parasite genomes are present in DNA extracted from a DBS, it will be difficult to amplify the parasite DNA for sequencing. This limitation applies to all genotyping techniques9, and we reached comparably high sensitivity with the protocol applied here. The relationship between parasite density and intra-host diversity may also be affected by biological processes, such as competitive stress and host immunity32, and future studies are needed to investigate this. Regardless of underlying causes, parasite density is an important confounding factor to adjust for when studying intra-host diversity.

In conclusion, this study extends the scope of ANC-based sentinel surveillance to include genomic malaria surveillance. We did not observe differences in resistance markers between P. falciparum collected from ANC users and children representing the community. When adjusting for parasite density, time, and study area, we also did not see differences in genetic diversity or pairwise relatedness between the two populations. In both ANC users and the community, we found genetic indicators of a recent reduction in the parasite population in an area targeted for elimination, demonstrating the added value of genomic data for impact evaluation. Multiplexed amplicon sequencing has great potential to support decision-makers with genomic intelligence, and adopting a cost-effective and convenient ANC-based sampling strategy would be a valuable step towards making genomic surveillance more feasible in malaria-endemic areas.

Methods

Study design and setting

This genomic surveillance study took place between 2015 and 2019 in three malaria-endemic areas in Maputo province in southern Mozambique (Supplementary Methods). Transmission intensity ranged from low in Manhiça and Magude, to moderate-to-high in Ilha Josina, and it declined in all three areas during the study17. Magude district was subject to a package of interventions in 2015–2018 including MDA with dihydroartemisinin-piperaquine and IRS with dichlorodiphenyl-trichloroethane and pirimiphos-methyl, resulting in an 85% reduction in all-age P. falciparum parasite rates24 (Supplementary Methods p 3).

Study participants

Samples were collected from pregnant women at ANC clinics and children participating in household surveys. 10,439 pregnant women were recruited when attending their first ANC visit at Manhiça District Hospital, Ilha Josina Health Center, or Magude Health Center between November 2016 and November 2019. For 8910 of the visits, informed consent to participate was obtained, and 8745 visits were included in the study28. The main reason for exclusion was not residing in the area. Women donated a finger-prick drop of blood onto filter paper (dried blood spot), and HIV status, date, age, gravidity, area of residence, and recent movements were recorded. 6471 samples (74%) were selected for molecular analysis in order to determine P. falciparum parasite rates in the three sites with a margin of error lower than or equal to expected parasite rates (5–20%). 483 (7.5%) of these were positive for P. falciparum by qPCR. 9381 children aged 2–10 years were sampled for annual age-stratified household surveys in the study area. The surveys were conducted around May every year (following the rainy season) from 2015 to 2019. DBS were obtained together with basic sociodemographic, clinical, and vector-control information24. Self-reported gender was evenly represented in the surveys, with 50.3% girls and 49.2% boys (information for the remaining 0.5% was not available). 6767 (72.1%) of the DBS were selected for molecular analysis. 366 (5.4%) of the samples were positive by qPCR. This study was a secondary analysis of DBS left from previous analyses. 378 (78.3%) and 180 (49.1%) of samples from ANC users and children, respectively, had enough material left for sequencing (at least a third of a DBS).

Inclusion and ethics

The team of authors combines researchers from Mozambique and non-malaria endemic areas. Mozambican researchers were involved throughout the research process. The research question aims to improve local malaria surveillance and is relevant to the local communities affected by malaria. All study protocols were approved by CISM’s and Hospital Clínic of Barcelona’s ethics committees, and the Mozambican Ministry of Health National Bioethics Committee. All study participants gave written informed consent, or in the case of minors, written informed assent and consent by a parent/guardian.

Amplicon sequencing

DNA was extracted from 558 available P. falciparum-positive DBS samples (from 378 ANC users and 180 children) using a Tween-Chelex based protocol (Supplementary Methods p 3). A multiplex panel of PCR primers targeting 241 P. falciparum amplicons of 150–250 bp was developed (Paragon Genomics Inc, California, USA) (Supplementary Data 1). Amplicons included 165 microhaplotypes informative about genomic diversity and relatedness in southern Africa25,31, and markers of drug resistance in 15 genes4, including polymorphisms associated with resistance to artemisinin (pfk13), SP (pfdhfr and pfdhps genes), and amodiaquine (pfcrt and pfmdr1 genes). We followed the CleanPlex® protocol (Paragon Genomics Inc, USA) with some modifications (Supplementary Methods p 3). Briefly, DNA was amplified for 15 or 20 cycles for multiplexed PCR, depending on parasitemia and ability to amplify, and for 15 cycles for indexing PCR. A randomly selected subset of the resulting libraries was assessed by capillary electrophoresis using a TapeStation (Agilent Technologies, California, USA). Libraries were pooled accounting for differences in yield due to parasitemia, and the pool was bead-cleaned using CleanMag® Magnetic Beads at 1× ratio to remove primer dimers. Pooled libraries were run on an agarose gel from which the amplicon-sized band was excised, and DNA was extracted using the Monarch® DNA Gel Extraction Kit (New England Biolabs Inc., Massachusetts, USA). Library pools were quantified and assessed using a TapeStation and a Qubit fluorometer. The purified libraries were sequenced on either a MiniSeq or a NextSeq instrument (Illumina, San Diego, USA).

Bioinformatics and data filtering

FASTQ files were run through a Nextflow-based pipeline33 (version 0.1.5), to infer alleles. Briefly, reads were demultiplexed for each locus using cutadapt34, and DADA235 was used to cluster reads using an error-inference model. Reads were filtered and truncated based on quality and length, also using cutadapt and DADA2. To reduce false positives, homopolymers and tandem repeats were masked. The resulting allele table was subsequently filtered based on read counts and coverage across loci within a sample and across samples. Alleles with fewer reads than the maximum observed reads in any locus for negative controls (14 reads) were removed, along with alleles with <1% within-sample frequency. Samples with a coverage of <50 diversity loci with a read depth of 100 were filtered out. Finally, diversity loci with <100 samples covering them with a read depth of 100 were also removed.
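The filtering rules above can be sketched as follows. This is a minimal Python illustration, not the pipeline's own code: the tuple layout of the allele table and the function name are hypothetical, and two details are assumptions, namely that within-sample frequency is computed per locus from unfiltered read counts and that the sample filter is applied before the locus filter.

```python
from collections import defaultdict


def filter_allele_table(rows, diversity_loci, min_reads=14, min_freq=0.01,
                        min_depth=100, min_loci=50, min_samples=100):
    """rows: iterable of (sample, locus, allele, reads) tuples."""
    rows = list(rows)

    # Read depth per (sample, locus), summed over all alleles before filtering.
    depth = defaultdict(int)
    for sample, locus, _, reads in rows:
        depth[(sample, locus)] += reads

    # Allele filters: minimum read count (negative-control maximum) and
    # minimum within-sample frequency at the locus.
    kept = [(s, l, a, r) for s, l, a, r in rows
            if r >= min_reads and r >= min_freq * depth[(s, l)]]

    # Sample filter: require enough diversity loci covered at min_depth.
    loci_per_sample = defaultdict(set)
    for (s, l), d in depth.items():
        if l in diversity_loci and d >= min_depth:
            loci_per_sample[s].add(l)
    good_samples = {s for s, loci in loci_per_sample.items()
                    if len(loci) >= min_loci}

    # Locus filter: drop diversity loci covered by too few retained samples.
    samples_per_locus = defaultdict(set)
    for (s, l), d in depth.items():
        if s in good_samples and l in diversity_loci and d >= min_depth:
            samples_per_locus[l].add(s)
    bad_loci = {l for l in diversity_loci
                if len(samples_per_locus[l]) < min_samples}

    return [(s, l, a, r) for s, l, a, r in kept
            if s in good_samples and l not in bad_loci]
```

The default thresholds mirror the values stated in the text (14 reads, 1% frequency, 50 loci and 100 samples at a read depth of 100). They are exposed as parameters so that the sketch can be tried on a toy table with smaller cut-offs.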

Definitions

Rainy season was defined as November 1st to April 30th, and the remainder of the year as dry season28. Years were defined based on transmission season, i.e., from November 1st to October 31st. When comparing time periods for ANC users, 2018 and 2019 were combined to balance sample size with 2017, when more cases were sampled due to higher transmission. Only children were sampled in 2015 and 2016, and these years were also combined. Primigravidity was defined as a first pregnancy, while multigravidity was defined as having had at least one previous pregnancy. Population diversity was measured as HE, i.e., the probability that two randomly selected parasites carry distinct alleles at each diversity locus (n = 165). It was calculated as:

$$H_{\mathrm{E}}=\left(\frac{n}{n-1}\right)\left(1-\sum_{i}p_{i}^{2}\right)$$
(1)

where n is the population size, and pi is the frequency of the ith allele, with allele frequencies estimated statistically using a Markov chain Monte Carlo (MCMC) algorithm from MOIRE version 3.0.0 (R package)36,37. MOIRE source code is available at https://github.com/EPPIcenter/moire. Briefly, MOIRE uses a Bayesian approach to estimate allele frequencies, within-host relatedness, and within-host diversity from polyallelic data subject to experimental error. For the MCMC, we used 10,000 burn-in samples followed by 10,000 samples run across 20 parallel tempered chains. We set alpha and beta to 0.1 and 9.9 for false positive rates, and 0.1 and 9.9 for false negative rates, respectively, shape and scale to 0.1 and 10.0 for mean COI, and alpha and beta to 1.0 and 1.0 for relatedness. Intra-host diversity was measured using the following metrics: MOI, eMOI, 1-Fws, and proportion of polyclonal infections (eMOI > 1.1). Individual MOI and eMOI were also estimated with the MOIRE MCMC algorithm, accounting for allele frequencies across loci. eMOI furthermore takes within-host relatedness (r) into account, and can be interpreted as the expected MOI if population diversity were infinite (HE = 1). It is calculated as:

$$\mathrm{eMOI}=1+\left(1-r\right)\left(\mathrm{MOI}-1\right)$$
(2)

1-Fws was calculated as the allele heterozygosity of the individual (HW) relative to the population38:

$$1-F_{\mathrm{WS}}=\frac{H_{\mathrm{W}}}{H_{\mathrm{E}}}$$
(3)

We used 1-Fws in order to have increasing values with increasing diversity, aligned with the other metrics used. HW is defined as:

$$H_{\mathrm{W}}=1-\left(n_{i}\left(\frac{1}{n_{i}}\right)^{2}\right)$$
(4)

where ni is the number of alleles detected at the ith locus of a given sample. Individual mean 1-Fws was calculated across all diversity loci. Pairwise infection (inter-host) relatedness was estimated as IBD, i.e., the proportion of the genome shared between parasites through recent ancestry, using Dcifer version 1.2.0 (R package)39, accounting for the presence of multiple strains in one infection and the probability that regions of the genome are shared by chance. Prevalence of resistance markers was calculated as the number of individuals carrying a mutated allele out of all individuals with a valid genotype for the respective locus. When both wildtype and mutant alleles were present in one individual, the individual was considered a mutant carrier if the infection was polyclonal by eMOI (eMOI > 1.1); otherwise only the major allele (wildtype or mutant) was considered. For genotypes involving multiple amplicons, only samples with a single allele present were included to avoid issues with phasing.
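
As an illustration of Eqs. 1 to 4, the following Python sketch evaluates each metric from already-estimated inputs. It does not reproduce the MOIRE estimation itself; allele frequencies, MOI and r are assumed to be supplied by the caller, and all function names are hypothetical.

```python
def expected_heterozygosity(freqs, n):
    """Eq. 1: H_E = n / (n - 1) * (1 - sum_i p_i^2) for one locus."""
    return (n / (n - 1)) * (1 - sum(p * p for p in freqs))


def effective_moi(moi, r):
    """Eq. 2: eMOI = 1 + (1 - r) * (MOI - 1)."""
    return 1 + (1 - r) * (moi - 1)


def within_host_heterozygosity(n_alleles):
    """Eq. 4: H_W = 1 - n_i * (1 / n_i)^2, which simplifies to 1 - 1 / n_i."""
    return 1 - n_alleles * (1 / n_alleles) ** 2


def one_minus_fws(n_alleles, h_e):
    """Eq. 3 for a single locus: (1 - F_WS) = H_W / H_E."""
    return within_host_heterozygosity(n_alleles) / h_e


def is_polyclonal(emoi, cutoff=1.1):
    """Polyclonal infections are those with eMOI > 1.1, as defined in the text."""
    return emoi > cutoff
```

For example, an infection with MOI = 3 and within-host relatedness r = 0.5 has eMOI = 1 + 0.5 × 2 = 2, and a locus with two detected alleles has HW = 0.5. How per-locus values are averaged into the individual mean 1-Fws is left to the caller here, since the sketch makes no assumption about the handling of loci with HE = 0.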

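The rule for calling mutant carriers and computing marker prevalence can be sketched as follows. This is a simplified illustration for a single locus with one wildtype and one mutant allele. Two points are assumptions not stated in the text: the major allele is taken to be the one with more reads, and ties are resolved as wildtype. Function names are hypothetical.

```python
def is_mutant_carrier(wt_reads, mut_reads, emoi, polyclonal_cutoff=1.1):
    """Return True or False for a valid genotype, None if no genotype is available."""
    if wt_reads == 0 and mut_reads == 0:
        return None  # no valid genotype at this locus
    if wt_reads > 0 and mut_reads > 0:
        if emoi > polyclonal_cutoff:
            return True  # mixed alleles in a polyclonal infection
        # Monoclonal by eMOI: only the major allele is considered.
        # Assumption: major allele = more reads; ties count as wildtype.
        return mut_reads > wt_reads
    return mut_reads > 0


def prevalence(calls):
    """Mutant carriers divided by individuals with a valid genotype."""
    valid = [c for c in calls if c is not None]
    return sum(valid) / len(valid) if valid else None
```

Individuals without a valid genotype are excluded from the denominator, matching the definition of prevalence given above.
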
Statistical analysis

Univariate and multivariate regression analyses were used to estimate intra-host diversity and assess the effect of factors of interest. P values and confidence intervals for eMOI were obtained from zero-truncated Poisson regressions in order to restrict eMOI to positive values. Logistic regression was used for percentage polyclonal and 1-Fws. The effect size of continuous time on intra-host diversity was estimated from multivariate regressions with an interaction between time and area. To compare intra-host diversity between ANC users and children, only samples from Magude were included due to low sample sizes for children in Manhiça and Ilha Josina. HE was compared between populations with linear mixed models (R package nlme version 3.1) fitting locus as a random effect. Simple random subsampling without replacement, matching populations by area and year, was performed to compare groups of similar sample size. Differences in mean relatedness between areas and populations were assessed with permutation testing (10,000 permutations). Prevalence of resistance markers was compared with Pearson’s chi-square test or Fisher’s exact test. All statistical tests were two-sided, except the permutation test for mean relatedness. Multiple comparisons were corrected for using the Benjamini-Hochberg procedure with a q value of 0.05, resulting in a final alpha of 0.0062 applied to indicate significance. All analyses were performed using R version 4.3.0.
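
Two of the procedures above, the one-sided permutation test and the Benjamini-Hochberg correction, can be sketched generically in Python. The analyses in this study were run in R, so this is an illustration rather than the code used. The permutation scheme shown (shuffling pooled values between two groups) is an assumption, as the exact scheme applied to pairwise relatedness values is not specified here, and the function names are hypothetical.

```python
import random


def permutation_test_mean_diff(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided permutation p value for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # assumption: labels are exchangeable under the null
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals, q=0.05):
    """Return (rejected flags, critical value k/m * q) for the BH step-up rule."""
    m = len(pvals)
    if m == 0:
        return [], 0.0
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p value is at or below rank / m * q
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected, k / m * q
```

The critical value returned by the second function depends on the number of tests and on how many of them are rejected, which is why a q value of 0.05 can translate into a considerably smaller effective alpha for individual comparisons.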

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.