The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.
Gething, P.W. et al. A long neglected world malaria map: Plasmodium vivax endemicity in 2010. PLoS Negl. Trop. Dis. 6, e1814 (2012).
Price, R.N. et al. Vivax malaria: neglected and not benign. Am. J. Trop. Med. Hyg. 77 (suppl. 6), 79–87 (2007).
Battle, K.E. et al. The global public health significance of Plasmodium vivax. Adv. Parasitol. 80, 1–111 (2012).
Miller, L.H., Mason, S.J., Clyde, D.F. & McGinniss, M.H. The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N. Engl. J. Med. 295, 302–304 (1976).
Ménard, D. et al. Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people. Proc. Natl. Acad. Sci. USA 107, 5967–5971 (2010).
White, N.J. Determinants of relapse periodicity in Plasmodium vivax malaria. Malar. J. 10, 297 (2011).
Price, R.N. et al. Global extent of chloroquine-resistant Plasmodium vivax: a systematic review and meta-analysis. Lancet Infect. Dis. 14, 982–991 (2014).
Karunaweera, N.D. et al. Extensive microsatellite diversity in the human malaria parasite Plasmodium vivax. Gene 410, 105–112 (2008).
Barry, A.E., Waltmann, A., Koepfli, C., Barnadas, C. & Mueller, I. Uncovering the transmission dynamics of Plasmodium vivax using population genetics. Pathog. Glob. Health 109, 142–152 (2015).
Koepfli, C. et al. Plasmodium vivax diversity and population structure across four continents. PLoS Negl. Trop. Dis. 9, e0003872 (2015).
Carlton, J.M. et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature 455, 757–763 (2008).
Dharia, N.V. et al. Whole-genome sequencing and microarray analysis of ex vivo Plasmodium vivax reveal selective pressure on putative drug resistance genes. Proc. Natl. Acad. Sci. USA 107, 20045–20050 (2010).
Hester, J. et al. De novo assembly of a field isolate genome reveals novel Plasmodium vivax erythrocyte invasion genes. PLoS Negl. Trop. Dis. 7, e2569 (2013).
Chan, E.R. et al. Whole genome sequencing of field isolates provides robust characterization of genetic diversity in Plasmodium vivax. PLoS Negl. Trop. Dis. 6, e1811 (2012).
Neafsey, D.E. et al. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat. Genet. 44, 1046–1050 (2012).
Bright, A.T. et al. A high resolution case study of a patient with recurrent Plasmodium vivax infections shows that relapses were caused by meiotic siblings. PLoS Negl. Trop. Dis. 8, e2882 (2014).
Winter, D.J. et al. Whole genome sequencing of field isolates reveals extensive genetic diversity in Plasmodium vivax from Colombia. PLoS Negl. Trop. Dis. 9, e0004252 (2015).
Flannery, E.L. et al. Next-generation sequencing of Plasmodium vivax patient samples shows evidence of direct evolution in drug-resistance genes. ACS Infect. Dis. 1, 367–379 (2015).
Auburn, S. et al. Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data. PLoS One 7, e32891 (2012).
Menard, D. et al. Whole genome sequencing of field isolates reveals a common duplication of the Duffy binding protein gene in Malagasy Plasmodium vivax strains. PLoS Negl. Trop. Dis. 7, e2489 (2013).
Howes, R.E. et al. The global distribution of the Duffy blood group. Nat. Commun. 2, 266 (2011).
Suwanarusk, R. et al. Amplification of pvmdr1 associated with multidrug-resistant Plasmodium vivax. J. Infect. Dis. 198, 1558–1564 (2008).
Douglas, N.M. et al. Plasmodium vivax recurrence following falciparum and mixed species malaria: risk factors and effect of antimalarial kinetics. Clin. Infect. Dis. 52, 612–620 (2011).
Imwong, M. et al. The first Plasmodium vivax relapses of life are usually genetically homologous. J. Infect. Dis. 205, 680–683 (2012).
Lin, J.T. et al. Using amplicon deep sequencing to detect genetic signatures of Plasmodium vivax relapse. J. Infect. Dis. 212, 999–1008 (2015).
Manske, M. et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 487, 375–379 (2012).
Nair, S. et al. Single-cell genomics for dissection of complex malaria infections. Genome Res. 24, 1028–1038 (2014).
Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620 (2005).
Miotto, O. et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat. Genet. 47, 226–234 (2015).
Korsinczky, M. et al. Sulfadoxine resistance in Plasmodium vivax is associated with a specific amino acid in dihydropteroate synthase at the putative sulfadoxine-binding site. Antimicrob. Agents Chemother. 48, 2214–2222 (2004).
Imwong, M. et al. Novel point mutations in the dihydrofolate reductase gene of Plasmodium vivax: evidence for sequential selection by drug pressure. Antimicrob. Agents Chemother. 47, 1514–1521 (2003).
Alam, M.T. et al. Tracking origins and spread of sulfadoxine-resistant Plasmodium falciparum dhps alleles in Thailand. Antimicrob. Agents Chemother. 55, 155–164 (2011).
Pava, Z. et al. Expression of Plasmodium vivax crt-o is related to parasite stage but not ex vivo chloroquine susceptibility. Antimicrob. Agents Chemother. 60, 361–367 (2015).
Suwanarusk, R. et al. Chloroquine resistant Plasmodium vivax: in vitro characterisation and association with molecular polymorphisms. PLoS One 2, e1089 (2007).
Mu, J. et al. Multiple transporters associated with malaria parasite responses to chloroquine and quinine. Mol. Microbiol. 49, 977–989 (2003).
Raj, D.K. et al. Disruption of a Plasmodium falciparum multidrug resistance-associated protein (PfMRP) alters its fitness and transport of antimalarial drugs and glutathione. J. Biol. Chem. 284, 7687–7696 (2009).
Pagès, J.-M., James, C.E. & Winterhalter, M. The porin and the permeating antibiotic: a selective diffusion barrier in Gram-negative bacteria. Nat. Rev. Microbiol. 6, 893–903 (2008).
Bozdech, Z. et al. The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc. Natl. Acad. Sci. USA 105, 16290–16295 (2008).
Westenberger, S.J. et al. A systems-based analysis of Plasmodium vivax lifecycle transcription from human to mosquito. PLoS Negl. Trop. Dis. 4, e653 (2010).
Tao, Z.-Y., Xia, H., Cao, J. & Gao, Q. Development and evaluation of a prototype non-woven fabric filter for purification of malaria-infected blood. Malar. J. 10, 251 (2011).
Auburn, S. et al. Effective preparation of Plasmodium vivax field isolates for high-throughput whole genome sequencing. PLoS One 8, e53160 (2013).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Logan-Klumpler, F.J. et al. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40, D98–D108 (2012).
Tachibana, S. et al. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat. Genet. 44, 1051–1055 (2012).
Miles, A. et al. Genome variation and meiotic recombination in Plasmodium falciparum: insights from deep sequencing of genetic crosses. Preprint at bioRxiv 024182, http://dx.doi.org/10.1101/024182 (2015).
Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
We thank the patients and communities that provided samples for this study, and our many colleagues who supported this work in the field. Sequencing, data analysis and project coordination were funded by the Wellcome Trust (098051, 090770/Z/09/Z), the Medical Research Council (G0600718) and the UK Department for International Development (M006212). A.B. and I.M. acknowledge the Victorian State Government Operational Infrastructure Support and Australian Government National Health and Medical Research Council Independent Medical Research Institutes Infrastructure Support Scheme (NHMRC IRIISS). S.A. and R.N.P. are funded by the Wellcome Trust (Senior Fellowship in Clinical Science awarded to R.N.P., 091625). This study was supported in part by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health.
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Features of the core genome and the internal and subtelomeric hypervariable regions.
(a) Several metrics illustrating the genome accessibility properties of core (unshaded) and hypervariable masked (shaded) regions. Black lines show mean mapping quality per variant position on a scale from 40 to 60. The dashed black line shows the 5th percentile of mean mapping quality per 10-kb window (49.16). Blue lines show the mean proportion of missing genotypes per variant position on a scale from 0 to 0.5. The dashed blue line shows the 95th percentile of mean missingness per 10-kb window (0.224). Green lines show the number of variant positions on a scale from 0 to 2,000. Red lines show the mean number of technical replicate discordances per variant position on a scale from 0 to 0.4. All metrics are shown in non-overlapping 10-kb windows. Regions annotated as SubtelomericHypervariable are shaded in red and as InternalHypervariable are shaded in orange. (b) The read mapping properties of Core, InternalHypervariable and SubtelomericHypervariable regions. These analyses are based on a subset of 187 samples with ≥97% genome callable positions. Coverage classifications were defined as follows: no coverage (left set of bars) refers to positions with zero coverage in at least 1/187 samples; low coverage (middle) refers to positions with coverage <5 (uncallable) in at least 1/187 samples; and poor mapping (right) refers to positions where at least 1/187 samples had ≥10% of reads with mapping quality zero (ambiguously aligned reads). The bars are color-coded according to the genomic region as in a.
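The window-level reference lines in panel a (5th percentile of mean mapping quality, 95th percentile of mean missingness) can be illustrated with a minimal sketch. It is not the study's pipeline. It assumes that per-variant values have already been averaged into 10-kb windows and uses a nearest-rank percentile that may differ from the definition used in the study. The function names (`nearest_rank_percentile`, `flag_windows`) are hypothetical. The caption does not state that windows beyond these lines were masked, so the flagging is only illustrative.

```python
import math
from typing import Dict, List, Tuple


def nearest_rank_percentile(values: List[float], pct: int) -> float:
    """Return the pct-th percentile (0-100) using the nearest-rank method."""
    ordered = sorted(values)
    # Integer arithmetic in the numerator avoids float error at exact ranks.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_windows(mean_mq: Dict[str, float],
                 mean_missing: Dict[str, float]) -> Tuple[float, float, List[str]]:
    """Flag 10-kb windows with low mapping quality or high missingness.

    mean_mq and mean_missing map a (hypothetical) window id to the mean value
    per variant position in that window. A window is flagged when its mean
    mapping quality is below the 5th percentile, or its mean missingness is
    above the 95th percentile, across all windows. Illustrative only.
    """
    mq_cut = nearest_rank_percentile(list(mean_mq.values()), 5)
    miss_cut = nearest_rank_percentile(list(mean_missing.values()), 95)
    flagged = [w for w in mean_mq
               if mean_mq[w] < mq_cut or mean_missing[w] > miss_cut]
    return mq_cut, miss_cut, flagged
```

In the study the corresponding values were 49.16 (mapping quality) and 0.224 (missingness). A real analysis would compute them from all windows in the genome rather than from toy inputs.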
Estimates based on 237,116 SNPs segregating in 148 samples with low levels of missingness from WTH, WKH and PID.
Observed θ (number of minor allele genotype calls, computed as number of sites × MAF, divided by accessible genome length) for different MAF bins. Dark green and light green lines show MAF calculated from fractional genotype calls and majority allele genotype calls, respectively.
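The calculation described in this caption can be sketched as follows. We assume that "number of sites × MAF" for a bin means the sum of MAF over the sites in that bin, that MAF bins split (0, 0.5] into equal-width intervals, and that each sample's call at a site is given as a non-reference allele fraction. All function names are hypothetical. The two MAF variants mirror the dark green (fractional) and light green (majority) lines.

```python
from typing import List, Sequence


def maf_from_calls(nraf_calls: Sequence[float], majority: bool = False) -> float:
    """Minor allele frequency at one site from per-sample non-reference allele fractions.

    With majority=True each call is collapsed to its majority allele (1 if the
    fraction exceeds 0.5, otherwise 0; ties go to the reference) before
    averaging. With majority=False the fractional calls are averaged directly.
    """
    calls = [1.0 if f > 0.5 else 0.0 for f in nraf_calls] if majority else list(nraf_calls)
    freq = sum(calls) / len(calls)
    return min(freq, 1.0 - freq)


def theta_by_maf_bin(mafs: Sequence[float], accessible_length: int,
                     n_bins: int = 10) -> List[float]:
    """Observed theta per MAF bin: sum of MAF over the bin's sites / accessible length.

    The sum of MAF over a bin's sites equals (number of sites x mean MAF) in that bin.
    """
    width = 0.5 / n_bins
    totals = [0.0] * n_bins
    for maf in mafs:
        if not 0.0 < maf <= 0.5:
            raise ValueError(f"MAF must lie in (0, 0.5], got {maf}")
        # MAF = 0.5 would index one past the last bin, so clamp it.
        totals[min(int(maf / width), n_bins - 1)] += maf
    return [t / accessible_length for t in totals]
```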
Genome-wide values for r² were calculated between pairs of SNPs over a range of distances and corrected for the inflation caused by population structure and other confounders, as described in the Online Methods. Median values show that linkage disequilibrium decays over short distances: r² falls to <0.1 within 200 bp in western Thailand (green) and western Cambodia (blue) and within 500 bp in Papua Indonesia (red).
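The sketch below shows how r² and a decay distance can be computed. It assumes haploid 0/1 majority calls per sample and computes r² as the squared Pearson correlation of the two allele vectors. It does **not** apply the structure correction described in the Online Methods. The 100-bp distance bins and the function names are hypothetical choices.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two 0/1 allele vectors (haploid calls)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic site makes r^2 undefined
    return cov * cov / (var_a * var_b)


def decay_distance(pairs: List[Tuple[int, float]], bin_bp: int = 100,
                   threshold: float = 0.1) -> Optional[int]:
    """Upper edge (bp) of the first distance bin whose median r^2 is below threshold.

    pairs holds (distance_bp, r2) for SNP pairs. Returns None if no bin drops below.
    """
    bins: Dict[int, List[float]] = {}
    for dist, r2 in pairs:
        bins.setdefault(dist // bin_bp, []).append(r2)
    for idx in sorted(bins):
        if statistics.median(bins[idx]) < threshold:
            return (idx + 1) * bin_bp
    return None
```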
Each of the 148 samples from western Thailand (WTH), western Cambodia (WKH) and Papua Indonesia (PID) is displayed. The box on the left displays a histogram of nonreference allele frequency (NRAF) across all heterozygous SNPs. The horizontal axis is NRAF on a scale of 0 to 1. The vertical axis is the number of SNPs on a scale of 0 to 500. The box on the right displays heterozygosity in 20-kb bins across the genome. The horizontal axis represents genomic position, with vertical lines separating the 14 chromosomes. The vertical axis shows the proportion of heterozygous SNPs in a given bin on a scale of 0 to 0.03. The legend for each sample gives its geographical origin, average read depth, FWS, RoH and the inferred number of dominant clones. The line above each of these plots is colored according to the classification in Figure 3a (right). The inferred number of dominant clones is based on these criteria: one dominant clone, FWS ≥ 0.99; two dominant clones, FWS < 0.99 and NRAF histogram is bimodal and symmetric; three or more dominant clones, FWS < 0.99 and NRAF histogram is not bimodal and symmetric. RoH is the proportion of the genome occupied by runs of homozygosity, defined here as the proportion of 100-kb bins for which mean heterozygosity < 0.005. Samples are ordered by RoH. In samples with two dominant clones, the clones are classified as either unrelated (if RoH < 0.1) or related (if RoH > 0.1).
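The classification rules in this caption can be expressed directly in code. In this sketch, F_WS is taken as an input rather than computed. Whether the NRAF histogram is "bimodal and symmetric" is supplied by the caller as a boolean, because the caption does not define a formal test. The function names are hypothetical. RoH uses the caption's definition: the proportion of 100-kb bins with mean heterozygosity < 0.005.

```python
from typing import List


def runs_of_homozygosity(bin_het: List[float], max_het: float = 0.005) -> float:
    """Proportion of 100-kb bins whose mean heterozygosity is below max_het."""
    return sum(1 for h in bin_het if h < max_het) / len(bin_het)


def classify_infection(fws: float, nraf_bimodal_symmetric: bool,
                       bin_het: List[float]) -> str:
    """Infer the number of dominant clones and, for two clones, their relatedness."""
    if fws >= 0.99:
        return "one dominant clone"
    if not nraf_bimodal_symmetric:
        return "three or more dominant clones"
    roh = runs_of_homozygosity(bin_het)
    if roh > 0.1:
        return "two dominant clones, related"
    if roh < 0.1:
        return "two dominant clones, unrelated"
    # The caption's rules leave RoH exactly equal to 0.1 unassigned.
    return "two dominant clones, relatedness unclassified"
```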
The ΔK metric (ref. 30) evaluates the second-order rate of change of the likelihood function with respect to K and aims to identify the top-level hierarchical structure of the data. Using this metric, we found K = 3 to be the best choice for the number of putative populations.
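As a worked illustration, ΔK (Evanno et al. 2005) can be computed from replicate log-likelihoods at each K as ΔK = |m(L(K+1)) − 2m(L(K)) + m(L(K−1))| / s(L(K)). Here m is the mean and s the standard deviation across replicate runs. This sketch uses the mean-based second difference found in common implementations, and it assumes at least two replicates per K. The input values in the test are invented, and the function name is hypothetical.

```python
import statistics
from typing import Dict, List


def evanno_delta_k(loglik: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K whose neighbours K-1 and K+1 were also run.

    loglik maps K to the log-likelihoods of replicate runs at that K.
    """
    delta: Dict[int, float] = {}
    for k in sorted(loglik):
        if k - 1 not in loglik or k + 1 not in loglik:
            continue  # the second difference needs both neighbours
        second = abs(statistics.mean(loglik[k + 1])
                     - 2 * statistics.mean(loglik[k])
                     + statistics.mean(loglik[k - 1]))
        sd = statistics.stdev(loglik[k])  # requires at least two replicates
        if sd == 0:
            raise ValueError(f"zero spread in replicate likelihoods at K={k}")
        delta[k] = second / sd
    return delta
```

The K with the largest ΔK is taken as the top-level number of clusters. The smallest and largest K tested never receive a value.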
Supplementary Figure 7 Details of genomic regions demonstrating haplotype-based evidence of selection.
Each plot shows one of the regions from Supplementary Table 8 plus 20 kb of flanking sequence 5′ and 3′. The top four tracks in each plot show −log10(P) values for genome-wide selection scans. The vertical axis shows −log10(P) on a scale of 0 to 15. The vertical lines represent the boundaries of the regions of selection (red lines in Fig. 5). The lower two tracks in each plot show FST scores between different populations. The vertical axis is scaled from 0 to 1.1. Red points represent nonsynonymous SNPs, blue points represent synonymous SNPs and gray points represent noncoding SNPs. The track at the bottom of each plot illustrates the coordinates of all genes within the given region. Gene names are given where these are available (as per the December 2015 version of GeneDB).
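To show what the FST and −log10(P) tracks contain, here is a sketch that uses Hudson's FST estimator. This estimator is our assumption, because the caption does not name the method used in the study. The sketch takes allele frequencies and haploid sample sizes for two populations and combines SNPs as a ratio of averages. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def hudson_fst(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1 and p2 are allele frequencies in two populations sampled with n1 and n2
    haploid calls. Returning the pair lets callers average across SNPs.
    """
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)
           - p2 * (1 - p2) / (n2 - 1))
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def region_fst(snps: List[Tuple[float, int, float, int]]) -> float:
    """Ratio-of-averages FST over a region: sum of numerators / sum of denominators."""
    parts = [hudson_fst(*snp) for snp in snps]
    return sum(n for n, _ in parts) / sum(d for _, d in parts)


def neg_log10(p: float) -> float:
    """Convert a P value to the -log10(P) scale used for the selection-scan tracks."""
    return -math.log10(p)
```

Per-SNP values (the points in the lower tracks) are `num / den` for each SNP. Near-monomorphic SNPs have small denominators and give noisy estimates, which is one reason to prefer the ratio of averages for region-level summaries.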
Supplementary Figure 8 Distributions of variant filters illustrating thresholds applied in the study.
The horizontal axis shows the percentile for a given variant annotation when ranked from lowest to highest values. The vertical axis shows the mean discordance of technical replicates per variant within each percentile bin. The horizontal line on each plot shows double the mean technical replicate discordance rate per variant across all discovered variable biallelic positions in non-masked regions (0.049). The vertical line(s) on each plot indicate the thresholds used in the study.
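The curves in these plots can be reproduced in outline. Variants are ranked by an annotation, split into equal-count percentile bins, and the mean replicate discordance of each bin is compared with twice the overall mean. This is a minimal sketch: the function names are hypothetical, and it assumes per-variant discordance has already been computed from technical replicates.

```python
from typing import List, Sequence


def discordance_by_percentile(annotation: Sequence[float],
                              discordance: Sequence[float],
                              n_bins: int = 100) -> List[float]:
    """Mean replicate discordance per percentile bin of a variant annotation.

    Variants are ranked from lowest to highest annotation value and split into
    n_bins bins whose sizes differ by at most one. Empty bins yield NaN.
    """
    order = sorted(range(len(annotation)), key=lambda i: annotation[i])
    n = len(order)
    means: List[float] = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(sum(discordance[i] for i in idx) / len(idx) if idx else float("nan"))
    return means


def bins_above_reference(means: List[float], overall_mean: float) -> List[int]:
    """Indices of bins whose mean discordance exceeds twice the overall mean."""
    limit = 2 * overall_mean
    return [b for b, m in enumerate(means) if m > limit]  # NaN never exceeds
```

Bins that rise above the reference line mark annotation ranges enriched for unreliable calls. This is the pattern the vertical threshold lines are chosen to exclude.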
This tree was built using the same set of samples as was used in Figure 4, but here we mapped to the P01 reference (www.genedb.org/Homepage/PvivaxP01), rather than the Sal1 reference, before calling SNPs using GATK best practices. The structure of the tree is essentially the same as that in Figure 4, showing that the choice of reference genome makes little difference to the conclusions drawn from analysis of variation in the core genome.
Supplementary Figures 1–9, Supplementary Tables 1–8 and Supplementary Note. (PDF 3499 kb)
Gene-level summaries of variation data. The first sheets give aggregate metrics across all 148 samples used for population genetic analyses, and the other three sheets show metrics for WTH, WKH and ID respectively. Summaries are given for both high-quality SNPs (pass) and all discovered SNPs (all). We do not record SNPs or metrics for genes outside the core genome. N/S, nonsynonymous/synonymous ratio; π, nucleotide diversity per base; D, Tajima's D. (XLSX 3864 kb)
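The per-gene π and Tajima's D columns follow standard formulas, sketched below. We assume haploid majority calls, no missing data and a fixed sample size n, and the study's treatment of missingness and mixed infections is not described here. The function names are hypothetical. Counts may be non-reference allele counts, because neither π nor Tajima's D depends on which allele is counted.

```python
import math
from typing import Sequence


def mean_pairwise_differences(counts: Sequence[int], n: int) -> float:
    """k-hat: average number of differences between two sampled genomes, summed over sites.

    counts[i] is the number of the n samples carrying the counted allele at site i.
    """
    return sum(2 * c * (n - c) / (n * (n - 1)) for c in counts)


def nucleotide_diversity(counts: Sequence[int], n: int, length: int) -> float:
    """Nucleotide diversity per base (pi) over a gene with `length` accessible bases."""
    return mean_pairwise_differences(counts, n) / length


def tajimas_d(counts: Sequence[int], n: int) -> float:
    """Tajima's D from per-site allele counts in a sample of n haploid genomes."""
    seg = [c for c in counts if 0 < c < n]
    s = len(seg)
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k_hat = mean_pairwise_differences(seg, n)
    return (k_hat - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants pushes D below zero, and an excess of intermediate-frequency variants pushes it above zero.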
CNV calls. Start and end coordinates, and copy number for all CNV calls longer than 3 kb. (XLSX 54 kb)
Cite this article
Pearson, R., Amato, R., Auburn, S. et al. Genomic analysis of local variation and recent evolution in Plasmodium vivax. Nat Genet 48, 959–964 (2016). https://doi.org/10.1038/ng.3599