Genomic analysis of local variation and recent evolution in Plasmodium vivax


The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for the elimination of malaria. To characterize the genetic diversity of this parasite in individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region and analyzed data on >300,000 SNPs and nine regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at new loci, and these varied markedly between geographical locations. These findings demonstrate a dynamic landscape of local evolutionary adaptation in the parasite population and provide a foundation for genomic surveillance to guide effective strategies for control and elimination of P. vivax.


Figure 1: Defining the accessible genome.
Figure 2: Copy number variation.
Figure 3: Genetic structure of mixed infections.
Figure 4: Parasite population structure.
Figure 5: Population-specific signatures of recent positive selection.


  1. Gething, P.W. et al. A long neglected world malaria map: Plasmodium vivax endemicity in 2010. PLoS Negl. Trop. Dis. 6, e1814 (2012).

  2. Price, R.N. et al. Vivax malaria: neglected and not benign. Am. J. Trop. Med. Hyg. 77 (suppl. 6), 79–87 (2007).

  3. Battle, K.E. et al. The global public health significance of Plasmodium vivax. Adv. Parasitol. 80, 1–111 (2012).

  4. Miller, L.H., Mason, S.J., Clyde, D.F. & McGinniss, M.H. The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N. Engl. J. Med. 295, 302–304 (1976).

  5. Ménard, D. et al. Plasmodium vivax clinical malaria is commonly observed in Duffy-negative Malagasy people. Proc. Natl. Acad. Sci. USA 107, 5967–5971 (2010).

  6. White, N.J. Determinants of relapse periodicity in Plasmodium vivax malaria. Malar. J. 10, 297 (2011).

  7. Price, R.N. et al. Global extent of chloroquine-resistant Plasmodium vivax: a systematic review and meta-analysis. Lancet Infect. Dis. 14, 982–991 (2014).

  8. Karunaweera, N.D. et al. Extensive microsatellite diversity in the human malaria parasite Plasmodium vivax. Gene 410, 105–112 (2008).

  9. Barry, A.E., Waltmann, A., Koepfli, C., Barnadas, C. & Mueller, I. Uncovering the transmission dynamics of Plasmodium vivax using population genetics. Pathog. Glob. Health 109, 142–152 (2015).

  10. Koepfli, C. et al. Plasmodium vivax diversity and population structure across four continents. PLoS Negl. Trop. Dis. 9, e0003872 (2015).

  11. Carlton, J.M. et al. Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature 455, 757–763 (2008).

  12. Dharia, N.V. et al. Whole-genome sequencing and microarray analysis of ex vivo Plasmodium vivax reveal selective pressure on putative drug resistance genes. Proc. Natl. Acad. Sci. USA 107, 20045–20050 (2010).

  13. Hester, J. et al. De novo assembly of a field isolate genome reveals novel Plasmodium vivax erythrocyte invasion genes. PLoS Negl. Trop. Dis. 7, e2569 (2013).

  14. Chan, E.R. et al. Whole genome sequencing of field isolates provides robust characterization of genetic diversity in Plasmodium vivax. PLoS Negl. Trop. Dis. 6, e1811 (2012).

  15. Neafsey, D.E. et al. The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum. Nat. Genet. 44, 1046–1050 (2012).

  16. Bright, A.T. et al. A high resolution case study of a patient with recurrent Plasmodium vivax infections shows that relapses were caused by meiotic siblings. PLoS Negl. Trop. Dis. 8, e2882 (2014).

  17. Winter, D.J. et al. Whole genome sequencing of field isolates reveals extensive genetic diversity in Plasmodium vivax from Colombia. PLoS Negl. Trop. Dis. 9, e0004252 (2015).

  18. Flannery, E.L. et al. Next-generation sequencing of Plasmodium vivax patient samples shows evidence of direct evolution in drug-resistance genes. ACS Infect. Dis. 1, 367–379 (2015).

  19. Auburn, S. et al. Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data. PLoS One 7, e32891 (2012).

  20. Menard, D. et al. Whole genome sequencing of field isolates reveals a common duplication of the Duffy binding protein gene in Malagasy Plasmodium vivax strains. PLoS Negl. Trop. Dis. 7, e2489 (2013).

  21. Howes, R.E. et al. The global distribution of the Duffy blood group. Nat. Commun. 2, 266 (2011).

  22. Suwanarusk, R. et al. Amplification of pvmdr1 associated with multidrug-resistant Plasmodium vivax. J. Infect. Dis. 198, 1558–1564 (2008).

  23. Douglas, N.M. et al. Plasmodium vivax recurrence following falciparum and mixed species malaria: risk factors and effect of antimalarial kinetics. Clin. Infect. Dis. 52, 612–620 (2011).

  24. Imwong, M. et al. The first Plasmodium vivax relapses of life are usually genetically homologous. J. Infect. Dis. 205, 680–683 (2012).

  25. Lin, J.T. et al. Using amplicon deep sequencing to detect genetic signatures of Plasmodium vivax relapse. J. Infect. Dis. 212, 999–1008 (2015).

  26. Manske, M. et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 487, 375–379 (2012).

  27. Nair, S. et al. Single-cell genomics for dissection of complex malaria infections. Genome Res. 24, 1028–1038 (2014).

  28. Evanno, G., Regnaut, S. & Goudet, J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol. Ecol. 14, 2611–2620 (2005).

  29. Miotto, O. et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat. Genet. 47, 226–234 (2015).

  30. Korsinczky, M. et al. Sulfadoxine resistance in Plasmodium vivax is associated with a specific amino acid in dihydropteroate synthase at the putative sulfadoxine-binding site. Antimicrob. Agents Chemother. 48, 2214–2222 (2004).

  31. Imwong, M. et al. Novel point mutations in the dihydrofolate reductase gene of Plasmodium vivax: evidence for sequential selection by drug pressure. Antimicrob. Agents Chemother. 47, 1514–1521 (2003).

  32. Alam, M.T. et al. Tracking origins and spread of sulfadoxine-resistant Plasmodium falciparum dhps alleles in Thailand. Antimicrob. Agents Chemother. 55, 155–164 (2011).

  33. Pava, Z. et al. Expression of Plasmodium vivax crt-o is related to parasite stage but not ex vivo chloroquine susceptibility. Antimicrob. Agents Chemother. 60, 361–367 (2015).

  34. Suwanarusk, R. et al. Chloroquine resistant Plasmodium vivax: in vitro characterisation and association with molecular polymorphisms. PLoS One 2, e1089 (2007).

  35. Mu, J. et al. Multiple transporters associated with malaria parasite responses to chloroquine and quinine. Mol. Microbiol. 49, 977–989 (2003).

  36. Raj, D.K. et al. Disruption of a Plasmodium falciparum multidrug resistance-associated protein (PfMRP) alters its fitness and transport of antimalarial drugs and glutathione. J. Biol. Chem. 284, 7687–7696 (2009).

  37. Pagès, J.-M., James, C.E. & Winterhalter, M. The porin and the permeating antibiotic: a selective diffusion barrier in Gram-negative bacteria. Nat. Rev. Microbiol. 6, 893–903 (2008).

  38. Bozdech, Z. et al. The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites. Proc. Natl. Acad. Sci. USA 105, 16290–16295 (2008).

  39. Westenberger, S.J. et al. A systems-based analysis of Plasmodium vivax lifecycle transcription from human to mosquito. PLoS Negl. Trop. Dis. 4, e653 (2010).

  40. Tao, Z.-Y., Xia, H., Cao, J. & Gao, Q. Development and evaluation of a prototype non-woven fabric filter for purification of malaria-infected blood. Malar. J. 10, 251 (2011).

  41. Auburn, S. et al. Effective preparation of Plasmodium vivax field isolates for high-throughput whole genome sequencing. PLoS One 8, e53160 (2013).

  42. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  43. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  44. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  45. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  46. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).

  47. Logan-Klumpler, F.J. et al. GeneDB--an annotation database for pathogens. Nucleic Acids Res. 40, D98–D108 (2012).

  48. Tachibana, S. et al. Plasmodium cynomolgi genome sequences provide insight into Plasmodium vivax and the monkey malaria clade. Nat. Genet. 44, 1051–1055 (2012).

  49. Miles, A. et al. Genome variation and meiotic recombination in Plasmodium falciparum: insights from deep sequencing of genetic crosses. Preprint at bioRxiv 024182 (2015).

  50. Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

  51. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  52. Sabeti, P.C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).

  53. Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).


We thank the patients and communities that provided samples for this study, and our many colleagues who supported this work in the field. Sequencing, data analysis and project coordination were funded by the Wellcome Trust (098051, 090770/Z/09/Z), the Medical Research Council (G0600718) and the UK Department for International Development (M006212). A.B. and I.M. acknowledge the Victorian State Government Operational Infrastructure Support and Australian Government National Health and Medical Research Council Independent Medical Research Institutes Infrastructure Support Scheme (NHMRC IRIISS). S.A. and R.N.P. are funded by the Wellcome Trust (Senior Fellowship in Clinical Science awarded to R.N.P., 091625). This study was supported in part by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases, National Institutes of Health.

Author information


C.A., S.S., S.M., R.N., H.T., J.M., N.M.A., T.W., M.F.B., C.D., H.T.T., N.J.W., P.M., P.S., L.T., G.H., A.B., I.M., M.U.F., N.K., M.R. and Q.G. carried out field and laboratory work to obtain P. vivax samples for sequencing. C.H., E.D., D.M., M.K., S.C., B.M. and K.A.R. developed and implemented methods for sample processing and sequencing library preparation. R.D.P., L.H., B.J. and M.M. managed data production pipelines. S.A., O.M., V.J.C., B.M., K.A.R., A.M., J.C.R., R.M.F., F.N., R.N.P. and D.P.K. contributed to study design and management. R.D.P., R.A., S.A., O.M., J.A.-G. and D.P.K. performed data analyses. R.D.P., R.A., S.A. and D.P.K. drafted the manuscript, which was reviewed by all authors.

Corresponding author

Correspondence to Dominic P Kwiatkowski.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Features of the core genome and the internal and subtelomeric hypervariable regions.

(a) Several metrics illustrating the genome accessibility properties of core (unshaded) and hypervariable masked (shaded) regions. Black lines show mean mapping quality per variant position on a scale from 40 to 60. The dashed black line shows the 5th percentile of mean mapping quality per 10-kb window (49.16). Blue lines show the mean proportion of missing genotypes per variant position on a scale from 0 to 0.5. The dashed blue line shows the 95th percentile of mean missingness per 10-kb window (0.224). Green lines show the number of variant positions on a scale from 0 to 2,000. Red lines show the mean number of technical replicate discordances per variant position on a scale from 0 to 0.4. All metrics are shown in non-overlapping 10-kb windows. Regions annotated as SubtelomericHypervariable are shaded in red and as InternalHypervariable are shaded in orange. (b) The read mapping properties of Core, InternalHypervariable and SubtelomericHypervariable regions. These analyses are based on a subset of 187 samples with ≥97% genome callable positions. Coverage classifications were defined as follows: no coverage (left set of bars) refers to positions with zero coverage in at least 1/187 samples; low coverage (middle) refers to positions with coverage <5 (uncallable) in at least 1/187 samples; and poor mapping (right) refers to positions where at least 1/187 samples had ≥10% of reads with mapping quality zero (ambiguously aligned reads). The bars are color-coded according to the genomic region as in a.

Supplementary Figure 2 Minor allele frequency (MAF) spectrum.

Estimates based on 237,116 SNPs segregating in 148 samples with low levels of missingness from WTH, WKH and PID.

Supplementary Figure 3 Over-representation of rare alleles.

Observed theta (number of minor allele genotype calls (number of sites × MAF) / accessible genome length) for different MAF bins. Dark green and light green lines show MAF calculated from fractional genotype calls and majority allele genotype calls, respectively.

Supplementary Figure 4 Patterns of linkage disequilibrium.

Genome-wide values for r² were calculated between pairs of SNPs over a range of distances and corrected for the inflation caused by population structure and other confounders as described in the Online Methods. Median values of linkage disequilibrium decay over short distances; for example, r² falls below 0.1 within 200 bp in western Thailand (green) and western Cambodia (blue) and within 500 bp in Papua Indonesia (red).
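
As a rough illustration of the quantity plotted here, the sketch below computes the squared correlation between pairs of biallelic SNPs and takes the median per distance bin. It is a minimal sketch, not the authors' pipeline: it assumes each SNP is coded as one 0/1 call per sample, the function names `r_squared` and `median_r2_by_distance` are hypothetical, and it omits the correction for population structure described in the Online Methods.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


def r_squared(a, b):
    """Squared correlation between two biallelic SNPs coded 0/1 per sample."""
    if len(a) != len(b) or not a:
        raise ValueError("SNP vectors must be non-empty and of equal length")
    n = len(a)
    p_a = sum(a) / n                                  # allele frequency at SNP a
    p_b = sum(b) / n                                  # allele frequency at SNP b
    p_ab = sum(x * y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                           # undefined for monomorphic SNPs
    d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
    return d * d / denom


def median_r2_by_distance(positions, calls, bin_size=100, max_dist=1000):
    """Median r² per distance bin; keys are the lower bound of each bin in bp."""
    bins = defaultdict(list)
    for (i, pos_i), (j, pos_j) in combinations(enumerate(positions), 2):
        dist = abs(pos_j - pos_i)
        if dist > max_dist:
            continue
        value = r_squared(calls[i], calls[j])
        if value == value:                            # skip NaN (NaN != NaN)
            bins[(dist // bin_size) * bin_size].append(value)
    return {start: median(values) for start, values in sorted(bins.items())}


# Toy input (made-up calls for four samples at three SNPs, for illustration only).
toy_positions = [100, 150, 900]
toy_calls = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
toy_decay = median_r2_by_distance(toy_positions, toy_calls)
```

With real data, the distance at which the median first drops below 0.1 would correspond to the decay distances quoted in the legend, subject to the omitted correction.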

Supplementary Figure 5 Genetic structure of individual infections.

Each of the 148 samples from western Thailand (WTH), western Cambodia (WKH) and Papua Indonesia (PID) is displayed. The box on the left displays a histogram of nonreference allele frequency (NRAF) across all heterozygous SNPs. The horizontal axis is NRAF on a scale of 0 to 1. The vertical axis is the number of SNPs on a scale of 0 to 500. The box on the right displays heterozygosity in 20-kb bins across the genome. The horizontal axis represents genomic position, with vertical lines separating the 14 chromosomes. The vertical axis shows the proportion of heterozygous SNPs in a given bin on a scale of 0 to 0.03. The legend for each sample gives its geographical origin, average read depth, FWS, RoH and the inferred number of dominant clones. The line above each of these plots is colored according to the classification in Figure 3a (right). The inferred number of dominant clones is based on these criteria: one dominant clone, FWS ≥ 0.99; two dominant clones, FWS < 0.99 and NRAF histogram is bimodal and symmetric; three or more dominant clones, FWS < 0.99 and NRAF histogram is not bimodal and symmetric. RoH is the proportion of the genome occupied by runs of homozygosity, defined here as the proportion of 100-kb bins for which mean heterozygosity < 0.005. Samples are ordered by RoH. In samples with two dominant clones, the clones are classified as either unrelated (if RoH < 0.1) or related (if RoH > 0.1).
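
The classification criteria in this legend can be written out as a small decision function. The sketch below is an illustration only: the function and parameter names are hypothetical, the judgment of whether the NRAF histogram is bimodal and symmetric is passed in as a boolean because the legend does not say how it was assessed, and the case RoH = 0.1 exactly is reported separately because the legend does not assign it to either class.

```python
def proportion_roh(bin_heterozygosity, threshold=0.005):
    """Proportion of bins whose mean heterozygosity is below the threshold.

    The legend defines RoH over 100-kb bins with a threshold of 0.005.
    """
    if not bin_heterozygosity:
        raise ValueError("at least one bin is required")
    below = sum(1 for h in bin_heterozygosity if h < threshold)
    return below / len(bin_heterozygosity)


def classify_infection(fws, nraf_bimodal_symmetric, roh):
    """Apply the legend's criteria for the inferred number of dominant clones."""
    if fws >= 0.99:
        return "one dominant clone"
    if not nraf_bimodal_symmetric:
        return "three or more dominant clones"
    # Two dominant clones: relatedness is decided by RoH.
    if roh < 0.1:
        return "two dominant clones, unrelated"
    if roh > 0.1:
        return "two dominant clones, related"
    return "two dominant clones, relatedness not defined at RoH = 0.1"


# Toy input (made-up heterozygosity values for four bins, for illustration only).
toy_roh = proportion_roh([0.001, 0.02, 0.0, 0.03])
toy_label = classify_infection(fws=0.90, nraf_bimodal_symmetric=True, roh=toy_roh)
```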

Supplementary Figure 6 ΔK for values of K used in ADMIXTURE analysis of population structure.

The ΔK metric of Evanno et al. (ref. 28) evaluates the second-order rate of change of the likelihood function with respect to K and aims to identify the top-level hierarchical structure of the data. Following this metric, we found K = 3 to be the best choice for the number of putative populations.
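
To make the ΔK idea concrete, the sketch below shows one common way of computing it: the absolute second difference of the mean log-likelihood across K, divided by the standard deviation of the log-likelihood over replicate runs at that K. This is a minimal sketch under stated assumptions, not the authors' code: the function name `delta_k` is hypothetical, it assumes at least two replicate runs per K, and the log-likelihood values in the example are made up purely for illustration.

```python
from statistics import mean, stdev


def delta_k(loglik_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    loglik_by_k maps K to a list of log-likelihoods from replicate runs
    (at least two runs per K, otherwise the standard deviation is undefined).
    """
    means = {k: mean(values) for k, values in loglik_by_k.items()}
    result = {}
    for k in sorted(loglik_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue                      # second difference needs both neighbours
        spread = stdev(loglik_by_k[k])
        if spread == 0:
            continue                      # avoid division by zero for identical runs
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Made-up log-likelihoods for two replicate runs per K (illustration only).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-700.0, -702.0],
    4: [-690.0, -692.0],
    5: [-685.0, -687.0],
}
toy_delta = delta_k(toy_runs)
toy_best_k = max(toy_delta, key=toy_delta.get)
```

In this toy input the log-likelihood stops improving sharply after K = 3, so ΔK peaks there; the real analysis would use the likelihoods reported by the ADMIXTURE runs.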

Supplementary Figure 7 Details of genomic regions demonstrating haplotype-based evidence of selection.

Each plot shows one of the regions from Supplementary Table 8 plus 20 kb of flanking sequence 5′ and 3′. The top four tracks in each plot show −log10(P) values for genome-wide selection scans. The vertical axis shows −log10(P) on a scale of 0 to 15. The vertical lines represent the boundaries of the regions of selection (red lines in Fig. 5). The lower two tracks in each plot show FST scores between different populations. The vertical axis is scaled from 0 to 1.1. Red points represent nonsynonymous SNPs, blue points represent synonymous SNPs and gray points represent noncoding SNPs. The track at the bottom of each plot illustrates the coordinates of all genes within the given region. Gene names are given where these are available (as per the December 2015 version of GeneDB).

Supplementary Figure 8 Distributions of variant filters illustrating thresholds applied in the study.

The horizontal axis shows the percentile for a given variant annotation when ranked from lowest to highest values. The vertical axis shows the mean discordance of technical replicates per variant within each percentile bin. The horizontal line on each plot shows double the mean technical replicate discordance rate per variant across all discovered variable biallelic positions in non-masked regions (0.049). The vertical line(s) on each plot indicates the thresholds used in the study.
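
The construction of these plots can be sketched as follows: rank variants by an annotation value, split the ranking into percentile bins, and compare the mean replicate discordance in each bin with twice the overall mean. This is a minimal sketch under assumptions, not the study's filtering code: the function names are hypothetical, the input lists are made-up toy values, and the reference line is computed from the toy data rather than taken from the legend.

```python
def discordance_by_percentile(annotation, discordance, n_bins=100):
    """Mean discordance per percentile bin of the annotation (low to high).

    Returns a list of length n_bins; bins with no variants hold None.
    """
    if len(annotation) != len(discordance) or not annotation:
        raise ValueError("inputs must be non-empty and of equal length")
    order = sorted(range(len(annotation)), key=lambda i: annotation[i])
    n = len(order)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for rank, index in enumerate(order):
        bin_index = rank * n_bins // n        # equal-sized bins by rank
        sums[bin_index] += discordance[index]
        counts[bin_index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def bins_above_reference(bin_means, discordance, factor=2.0):
    """Indices of bins whose mean exceeds factor x the overall mean discordance."""
    reference = factor * sum(discordance) / len(discordance)
    return [i for i, m in enumerate(bin_means) if m is not None and m > reference]


# Toy input (made-up values): low annotation values carry the discordant calls.
toy_annotation = list(range(10))
toy_discordance = [0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0]
toy_bins = discordance_by_percentile(toy_annotation, toy_discordance, n_bins=5)
toy_flagged = bins_above_reference(toy_bins, toy_discordance)
```

A threshold would then be placed at the annotation value where the flagged bins end, which is what the vertical lines in the figure indicate.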

Supplementary Figure 9 Neighbor-joining tree using the P01 reference.

This tree was built using the same set of samples as was used in Figure 4, but here we mapped to the P01 reference, rather than the Sal1 reference, before calling SNPs using GATK best practices. The structure of the tree is essentially the same as that in Figure 4, showing that the choice of reference genome makes little difference to the conclusions drawn from analysis of variation in the core genome.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–8 and Supplementary Note. (PDF 3499 kb)

Supplementary Data 1

Gene-level summaries of variation data. The first sheet gives aggregate metrics across all 148 samples used for population genetic analyses, and the other three sheets show metrics for WTH, WKH and PID, respectively. Summaries are given for both high-quality SNPs (pass) and all discovered SNPs (all). We do not record SNPs or metrics for genes outside the core genome. N/S, nonsynonymous/synonymous ratio; π, nucleotide diversity per base; D, Tajima's D. (XLSX 3864 kb)

Supplementary Data 2

CNV calls. Start and end coordinates, and copy number for all CNV calls longer than 3 kb. (XLSX 54 kb)


Cite this article

Pearson, R., Amato, R., Auburn, S. et al. Genomic analysis of local variation and recent evolution in Plasmodium vivax. Nat Genet 48, 959–964 (2016).
