Abstract
One goal in sequencing the genome of Plasmodium falciparum, the agent of the most lethal form of malaria, is to discover vaccine and drug targets1. However, identifying those targets in a genome in which ∼60% of genes have unknown functions is an enormous challenge. Because most known malaria antigens and drug-resistance genes are highly polymorphic and under various selective pressures2,3,4,5,6, genome-wide analysis for signatures of selection may lead to the discovery of new vaccine and drug candidates. Here we surveyed 3,539 P. falciparum genes (∼65% of the predicted genes) for polymorphisms and identified various highly polymorphic loci and genes, some of which encode new antigens that we confirmed using human immune sera. Our collections of genome-wide SNPs (∼65% nonsynonymous), polymorphic microsatellites and indels provide a high-resolution map (one marker per ∼4 kb) for mapping parasite traits and studying parasite populations. The new antigens we report are urgently needed vaccine candidates for disease control.
Main
Genetic mapping using genetic crosses is a powerful approach for identifying genes underlying various parasite traits7,8,9,10. However, high costs and laborious procedures have limited the use of genetic crosses in P. falciparum. Association mapping using field isolates is an alternative, but it requires typing large numbers of genetic markers and parasite isolates because of the complex parasite population structure and highly variable recombination rates11. A genetic map of ∼800 microsatellite markers has been constructed for P. falciparum12; however, compared with recently developed methods for large-scale SNP typing13,14,15,16, microsatellite typing is relatively laborious. A genome-wide map of high-density SNP markers would greatly facilitate the identification of genes underlying parasite traits and help address important biological questions about the P. falciparum genome. At present, however, a high-density SNP map is available only for chromosome 3 (ref. 11).
We searched for polymorphisms in the P. falciparum genome by amplifying and sequencing 3,539 predicted genes or gene fragments (∼19% of each isolate's genome; 91% coding sequence) from four cloned isolates (Dd2, HB3, D10 and 7G8) (Table 1 and Supplementary Table 1 online). After aligning the DNA sequences to the 3D7 genome sequence1, we identified 3,918 well-validated SNPs, a genome-wide average of one SNP per ∼5.9 kb of DNA (Table 1 and Fig. 1). About half (54.3%) of the genes surveyed had one or more SNPs (Supplementary Fig. 1 online), and the majority (∼65%) of SNPs were nonsynonymous (nsSNPs; Table 1). The predominance of nsSNPs is probably due to codon bias and the high frequency of nonsynonymous sites in the parasite genome17,18. The estimate of the genome-wide average population mutation rate, 4Nμ (Watterson's θ), where N is the effective population size and μ is the per-nucleotide mutation rate, is 5.05 × 10⁻⁴, and the estimate of average pairwise nucleotide diversity (π) is 4.83 × 10⁻⁴ (Table 1), similar to previous reports11. These diversity values are lower than those of many model organisms, possibly because we sequenced mostly coding regions and applied stringent SNP-calling criteria (see Methods).
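For readers less familiar with these two estimators, a minimal sketch follows. It assumes n aligned, gap-free sequences of equal length with no missing data (for example, 3D7 plus the four isolates, n = 5) and computes Watterson's θ as S/(a_n L), where S is the number of segregating sites, L the number of aligned sites and a_n = Σ_{i=1}^{n−1} 1/i, and π as the mean number of pairwise differences per site. The function names are illustrative and not part of the authors' software.

```python
from itertools import combinations


def watterson_a(n):
    """Harmonic number a_n = sum_{i=1}^{n-1} 1/i used to scale S."""
    return sum(1.0 / i for i in range(1, n))


def diversity(seqs):
    """Return (S, theta_W per site, pi per site) for aligned, ungapped sequences."""
    n, length = len(seqs), len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need >= 2 aligned sequences of equal length")
    # A site is segregating if more than one base occurs in that column.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    theta_w = s / (watterson_a(n) * length)
    # Sum pairwise mismatches over all n(n-1)/2 sequence pairs.
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2))
    pi = diffs / (n * (n - 1) / 2) / length
    return s, theta_w, pi


if __name__ == "__main__":
    # Toy 4-bp alignment: one singleton SNP among five sequences.
    print(diversity(["AAAA", "AAAT", "AAAA", "AAAA", "AAAA"]))
```

In the toy alignment a single singleton SNP gives θ = 0.12 but π = 0.10 per site, because π weights rare variants less heavily than S does; that contrast is what Tajima's D formalizes.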
We resequenced ∼45 kb of chromosome 3 DNA, covering 183 known SNPs, in 99 worldwide isolates11; 108 of these 183 SNPs are common (minor allele frequency ≥ 0.05). We discovered 185 new SNPs, 29 of which were common (Supplementary Table 2 online). Because only ∼22% of the 202 kb sequenced in the five isolates was resequenced in the 99 isolates, we would expect the five isolates to miss ∼132 (29/0.22) common SNPs across the full 202 kb. Our survey of five isolates therefore captures ∼45% of common SNPs (108/(132 + 108) = 0.45) relative to the worldwide sample of 99 isolates, giving a frequency of one common SNP per 842 bp (202 kb/240) and a genome-wide expectation of >27,000 common SNPs. These estimates are based largely on SNPs from single-copy, nontelomeric genes.
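The extrapolation above can be reproduced with a few lines of arithmetic. This sketch follows the numbers in the text; the genome size used for the final genome-wide step (∼22.8 Mb for the 3D7 nuclear genome) is our assumption, and the function name is hypothetical.

```python
def extrapolate_common_snps(known_common=108, new_common=29,
                            resequenced_fraction=0.22,
                            surveyed_bp=202_000, genome_bp=22.8e6):
    """Reproduce the common-SNP extrapolation described in the text.

    genome_bp (~22.8 Mb nuclear genome) is an assumed value, not from the text.
    """
    # New common SNPs found in the resequenced ~22% scale up to the full 202 kb.
    missed = round(new_common / resequenced_fraction)   # 29 / 0.22 ~ 131.8 -> 132
    total_common = known_common + missed                # common SNPs in 202 kb
    captured = known_common / total_common              # fraction seen in five isolates
    spacing_bp = surveyed_bp / total_common             # bp per common SNP
    genome_wide = genome_bp / spacing_bp                # expected common SNPs genome-wide
    return missed, captured, spacing_bp, genome_wide


if __name__ == "__main__":
    missed, captured, spacing, total = extrapolate_common_snps()
    print(f"missed={missed}, captured={captured:.2f}, "
          f"spacing={spacing:.0f} bp, genome-wide={total:,.0f}")
```

Using the unrounded fraction 45/202 ≈ 0.223 instead of 0.22 lowers the expected number of missed SNPs to ∼130, so the estimate is only mildly sensitive to this rounding.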
Microsatellites are also abundant in the genome, averaging one polymorphic microsatellite per 1.3 genes (Fig. 1 and Table 1). The true frequency of polymorphic microsatellites throughout the genome would probably be much higher if more noncoding regions were assayed. Combining polymorphic microsatellites and SNPs, our data constitute a map with an average of one polymorphic marker per ∼3.6 kb for the P. falciparum genome (Fig. 1), providing a powerful tool for genetic studies of the parasite.
SNPs are not distributed evenly across chromosomes: some regions contain consecutive genes without any SNPs, whereas other segments contain consecutive genes with multiple SNPs (Fig. 1). The percentage of genes with SNPs ranges from 47.1% (chromosome 13) to 67.5% (chromosome 7) (Table 1). The number of SNPs per gene differs more than twofold between chromosomes, averaging 0.85–2.08 SNPs per sequenced gene, with large chromosomes having fewer SNPs per gene (Table 1 and Supplementary Table 1). Indeed, excluding chromosome 7, chromosome size is negatively correlated with the number of SNPs per gene (Supplementary Fig. 2 online). Genes at chromosome ends (∼15% of the sequenced genes from each end) had significantly higher average θ values than genes in the remainder of the chromosome (P = 0.0001, Wilcoxon signed rank test; P = 0.0001 also when ten genes from each end were compared). Because chromosome ends make up a relatively larger proportion of small chromosomes, a generally higher level of polymorphism at chromosome ends may contribute to this negative correlation.
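A minimal, dependency-free sketch of a Wilcoxon signed rank test is shown below, assuming one paired observation per chromosome (mean θ of end genes versus mean θ of the remaining genes). It uses the large-sample normal approximation without a tie correction; for small samples an exact test (for example, as offered by scipy.stats.wilcoxon) is preferable. The pairing scheme and the example values are our assumptions, not a description of the authors' analysis or data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed rank test (normal approximation).

    Returns (W+, z, p). Pairs with zero difference are discarded.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| from 1..n, giving tied magnitudes their average rank.
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, d) if diff > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, z, p


if __name__ == "__main__":
    # Hypothetical per-chromosome mean theta values (x 1e-4), not data from this study.
    ends = [9.1, 7.4, 8.8, 6.9, 7.7, 8.2, 9.5, 6.1, 7.0, 8.4, 7.9, 6.6, 7.2, 8.0]
    rest = [5.2, 4.9, 5.5, 4.1, 4.6, 5.0, 6.8, 4.4, 4.7, 5.1, 4.8, 3.9, 4.3, 4.5]
    print(wilcoxon_signed_rank(ends, rest))
```

With 14 chromosomes in which every end exceeds the interior, W+ reaches its maximum of 105 and the approximate two-sided P is about 0.001; smaller P values such as those reported require more pairs, for example pairing at the level of genes.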
With these genome-wide markers, we estimated the number of recombination events using methods described previously11. We detected recombination events at relatively high frequencies (Table 1); they were distributed nonuniformly both within and among chromosomes, clustering in subtelomeric regions, as previously described for a larger sample on chromosome 3 (ref. 11). Understanding the patterns and rates of recombination is of critical importance for genetic studies, particularly association mapping.
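The recombination counts in this study rely on the nonparametric bounds of Myers and Griffiths (ref. 29). As a simpler illustration of how recombination can be inferred from SNP haplotypes, the sketch below computes the classic Hudson–Kaplan lower bound R_M from the four-gamete test, a weaker bound that ref. 29 improves upon. It assumes phased haploid sequences (appropriate for blood-stage clones) encoded as strings of 0/1 at biallelic sites; the function names are illustrative.

```python
from itertools import combinations


def incompatible(haplotypes, i, j):
    """Four-gamete test: sites i and j show all four allele combinations."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def hudson_kaplan_rm(haplotypes):
    """Minimum number of recombination events R_M (Hudson-Kaplan lower bound).

    Each incompatible site pair (i, j) implies at least one recombination in
    the interval between i and j. R_M is the maximum number of such intervals
    that do not overlap, found greedily by earliest right endpoint.
    """
    n_sites = len(haplotypes[0])
    intervals = sorted(
        ((i, j) for i, j in combinations(range(n_sites), 2)
         if incompatible(haplotypes, i, j)),
        key=lambda iv: iv[1],
    )
    count, last_end = 0, -1
    for start, end in intervals:
        # Intervals that merely touch at a site are disjoint as recombination spans.
        if start >= last_end:
            count += 1
            last_end = end
    return count


if __name__ == "__main__":
    # Toy phased haplotypes at three biallelic SNPs.
    print(hudson_kaplan_rm(["000", "011", "101", "110"]))
```

In the toy example every pair of sites fails the four-gamete test, and two non-overlapping intervals, (0, 1) and (1, 2), give R_M = 2.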
Genes encoding surface antigens, cell adhesion molecules and proteins involved in drug interactions are mostly polymorphic (Supplementary Fig. 3 online). The antigen group has a high ratio of nonsynonymous pairwise differences per nonsynonymous site (pN) to synonymous pairwise differences per synonymous site (pS) (pN/pS = 5.8), suggestive of balancing, diversifying or partial directional selection. In addition, Tajima's D, which compares π with Watterson's θ to summarize the allele frequency distribution, identified some genes with an excess of intermediate-frequency variants (positive D) indicative of balancing selection, such as eba-175, which has been shown to be under strong balancing selection (Supplementary Table 1)19.
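Tajima's D standardizes the difference between the average number of pairwise differences and Watterson's estimator by its expected standard deviation under neutrality. The sketch below implements Tajima's standard formula for n aligned, gap-free sequences (n ≥ 4, below which the variance term vanishes) and returns None when no site segregates. It is illustrative and not the authors' script.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, ungapped sequences; None if S == 0."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None
    # Average number of pairwise differences (not divided by sequence length).
    k = sum(sum(a != b for a, b in zip(x, y))
            for x, y in combinations(seqs, 2)) / (n * (n - 1) / 2)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(tajimas_d(["A", "A", "A", "T", "T"]))  # intermediate-frequency variant
    print(tajimas_d(["A", "A", "A", "A", "T"]))  # singleton variant
```

Positive D points toward balancing selection, whereas negative D reflects an excess of rare variants; because population history can also shift D in either direction, D alone is supporting rather than conclusive evidence of selection.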
Chromosomal regions flanking many var clusters are more polymorphic than the genome-wide average (Supplementary Fig. 4 and Supplementary Table 1 online). In addition to many subtelomeric var genes, three of the four internal var clusters in the 3D7 genome (on chromosomes 7, 8 and 12) are flanked by five or more consecutive polymorphic genes, with some polymorphic stretches extending ∼100 kb from the core var cluster, particularly around the two clusters on chromosomes 7 and 8 (Supplementary Fig. 4). On one side of the internal var locus on chromosome 7, elevated polymorphism extended over 200 kb to the chloroquine-resistance transporter gene (pfcrt). Indeed, genes flanking the var loci on chromosomes 7 (25 genes) and 8 (14 genes) have significantly higher θ values than the average for their chromosome (Table 1; P < 0.0001 for chromosome 7 and P < 0.001 for chromosome 8, Wilcoxon signed rank test). The var genes encode a family of variant antigens called PfEMP1 that are important for immune evasion and disease pathogenesis20,21,22 and that may be under strong balancing selection from the host immune response23, thus maintaining more variation than expected under neutrality. Genes flanking var clusters may share correlated evolutionary histories with the var genes, preserving diverse alleles linked to each unique var haplotype; a peak of elevated nucleotide diversity surrounding a selected target is one of the signatures of balancing selection24,25,26. Some var clusters, however, do not show obviously elevated polymorphism in their flanking regions. This could be because var genes are absent from a specific location in some parasites, or because some var clusters are subject to weaker selective pressure (perhaps because they are expressed less frequently). If so, vaccine development based on var genes should perhaps emphasize var genes that are under strong selection.
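One simple way to visualize a peak of elevated diversity, such as those flanking var clusters, is a sliding-window mean of per-gene θ along a chromosome. In the sketch below, gene midpoints in kb, the 100-kb window, the 50-kb step and the twofold threshold over the chromosome mean are all illustrative assumptions; they are not parameters used in this study, which instead tested flanking genes with a Wilcoxon signed rank test.

```python
def windowed_theta(genes, window_kb=100.0, step_kb=50.0):
    """Mean per-gene theta in sliding windows.

    genes: list of (midpoint_kb, theta) tuples for one chromosome.
    Returns (window_start_kb, mean_theta, n_genes) for non-empty windows.
    """
    if not genes:
        return []
    last = max(pos for pos, _ in genes)
    windows, start = [], 0.0
    while start <= last:
        thetas = [t for pos, t in genes if start <= pos < start + window_kb]
        if thetas:
            windows.append((start, sum(thetas) / len(thetas), len(thetas)))
        start += step_kb
    return windows


def elevated_windows(genes, fold=2.0, **window_args):
    """Windows whose mean theta is at least `fold` times the chromosome mean."""
    chrom_mean = sum(t for _, t in genes) / len(genes)
    return [w for w in windowed_theta(genes, **window_args)
            if w[1] >= fold * chrom_mean]


if __name__ == "__main__":
    # Hypothetical genes with a diversity peak near 200-240 kb.
    genes = [(10, 1e-4), (60, 1e-4), (110, 1e-4), (160, 1e-4),
             (210, 5e-3), (230, 5e-3), (260, 1e-4), (310, 1e-4)]
    for start, mean_theta, n in elevated_windows(genes):
        print(f"window from {start:.0f} kb: mean theta {mean_theta:.2e} ({n} genes)")
```

Overlapping windows smooth the profile, so a single highly diverse gene region is reported by every window that covers it; here the two windows starting at 150 and 200 kb both flag the hypothetical peak.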
Signatures of selection can be exploited to identify genes encoding new antigens or drug targets. We searched for genomic regions with consecutive polymorphic genes or peaks of polymorphism (indicative of balancing selection) that may harbor genes encoding antigens. Indeed, approximately 40% of the 83 loci with five or more consecutive polymorphic genes contain genes encoding known antigens (Supplementary Table 3 and Supplementary Fig. 4 online). These results suggest that most of the parasite antigens are under selection from the host immune system. Further investigation of the other 37 loci (Supplementary Table 3), which contain genes of unknown function, may yield new vaccine candidates. Table 2 lists 56 highly polymorphic genes with θ values more than 2 s.d. above the mean θ value for the 1,920 genes with one or more SNPs. Although more than half of these genes encode proteins of unknown function, ∼18% of them (10) are known antigens (Table 2). Two genes encode proteins involved in lipid metabolism, one of which is a known drug target in other organisms27. According to the annotation in PlasmoDB28, 32 of these genes (57%) have one or more predicted transmembrane domains (38%) and/or a signal peptide (36%), whereas only approximately 11% and 31% of the genes in the genome have a predicted signal peptide or transmembrane domain, respectively. This higher proportion of proteins with signal peptides and transmembrane domains suggests membrane and/or surface localization, where the proteins could be recognized by the host immune system.
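The two screens described above, runs of five or more consecutive polymorphic genes and genes with θ more than 2 s.d. above the mean of polymorphic genes, can be sketched as follows. Genes are assumed to be listed in chromosomal order, and θ > 0 is used as a proxy for carrying at least one SNP; the function names are hypothetical.

```python
from statistics import mean, stdev


def polymorphic_runs(snp_counts, min_run=5):
    """(first, last) gene indices of runs of >= min_run consecutive genes with SNPs."""
    runs, start = [], None
    for i, count in enumerate(list(snp_counts) + [0]):  # trailing 0 closes a final run
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


def theta_outliers(thetas, n_sd=2.0):
    """Indices of genes whose theta exceeds mean + n_sd * s.d. of polymorphic genes."""
    polymorphic = [t for t in thetas if t > 0]
    cutoff = mean(polymorphic) + n_sd * stdev(polymorphic)
    return [i for i, t in enumerate(thetas) if t > cutoff]


if __name__ == "__main__":
    print(polymorphic_runs([0, 1, 2, 1, 3, 1, 0, 1, 1, 0, 2, 2, 2, 2, 2, 2]))
    print(theta_outliers([0.0] + [1e-4] * 9 + [1e-3]))
```

`statistics.stdev` is the sample standard deviation; the text does not state whether the sample or population s.d. was used, and with ∼1,920 genes the difference is negligible.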
We next expressed the 56 genes in Table 2, plus 52 genes that have five or more SNPs and encode a predicted signal peptide and/or transmembrane domain (Supplementary Table 4 online), using an E. coli cell-free rapid expression system. Protein expression was verified by protein blot using antibodies to the His tag incorporated at the C terminus of each expressed protein, and antigenicity was tested with pooled human immune sera. Eleven of the 65 expressed proteins were recognized by pooled human immune sera but not by pooled nonimmune sera (Fig. 2); seven of these are previously unknown antigens that require further evaluation as potential vaccine candidates.
This study identifies thousands of well-validated SNPs and polymorphic microsatellites for mapping genes that may be important in drug resistance, parasite development and disease pathogenesis. Developing high-density markers and high-throughput methods for genotyping large numbers of parasites is critical for mapping genes associated with malaria phenotypes, particularly in high-transmission populations, where linkage disequilibrium is limited. A high-throughput array-based genotyping method is being developed for use by the malaria community. The genome-wide data also show that the P. falciparum genome is highly polymorphic, with at least one polymorphic site per 0.5 kb in only five isolates. This work further shows that a genome-wide survey for polymorphisms and signatures of selection is a valuable approach for identifying antigens and should lead to the identification of many new antigens as potential vaccine targets. Our study also suggests that different var clusters may be under different immune selective pressures, which should be taken into consideration when designing a var-based vaccine. Further characterization of the proteins encoded by these genes may lead to new vaccines, which are urgently needed to combat this deadly disease.
Methods
Parasites and DNA amplification and sequencing.
DNA sequences of the 3D7 parasite were downloaded from PlasmoDB. Primers for PCR (Supplementary Table 1) and DNA sequencing were designed for predicted ORFs larger than 400 bp, excluding the well-known gene families var, stevor and rifin; for large genes, ∼1.5 kb was sequenced. Primers were chosen with custom primer selection software (a Visual Basic script). Primers for sequencing both DNA strands (18–25 bp, with four to seven G/Cs and spaced ∼400 bp apart) were selected automatically and synthesized commercially. Genomic DNA from the cultured parasites Dd2 (Thailand), HB3 (Honduras), 7G8 (Brazil) and D10 (Papua New Guinea) was amplified and sequenced as described previously18. Direct sequencing of PCR products avoids the artificial polymorphisms frequently introduced when AT-rich DNA is cloned into bacteria.
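The primer-spacing rules above can be illustrated with a greedy scan. This is a hypothetical re-implementation, not the authors' Visual Basic script: it picks forward-strand oligos only and checks nothing but length and G/C count, whereas practical primer design also considers melting temperature, self-complementarity and uniqueness. The low G/C window (four to seven G/Cs in 18–25 nt) suits the AT-rich (∼80% A+T) P. falciparum genome; the 50-bp search distance is our assumption.

```python
def gc_count(seq):
    """Number of G or C bases in seq."""
    return sum(base in "GC" for base in seq.upper())


def pick_primers(template, spacing=400, min_len=18, max_len=25,
                 min_gc=4, max_gc=7, search_bp=50):
    """Greedily choose one forward-strand primer near every `spacing` bp.

    For each target position, scan up to `search_bp` downstream start sites
    and accept the first oligo (shortest length first) whose G/C count lies
    in [min_gc, max_gc]. Returns a list of (start, primer) tuples.
    """
    primers = []
    for target in range(0, len(template) - min_len + 1, spacing):
        for start in range(target, min(target + search_bp, len(template) - min_len + 1)):
            hit = next(
                (template[start:start + n]
                 for n in range(min_len, max_len + 1)
                 if start + n <= len(template)
                 and min_gc <= gc_count(template[start:start + n]) <= max_gc),
                None,
            )
            if hit is not None:
                primers.append((start, hit))
                break
    return primers


if __name__ == "__main__":
    # Toy AT-rich template with a short G/C block every 400 bp.
    template = ("GCGCA" + "A" * 395) * 3
    for start, primer in pick_primers(template):
        print(start, primer)
```

Targets that have no acceptable oligo within the search distance are simply skipped, so the spacing between returned primers can exceed 400 bp in G/C-poor stretches.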
Sequence alignment and analysis.
A Java package was written to process raw sequence data, including trimming and aligning DNA sequences using Phred/Phrap and Sequencher 4.5 (GeneCodes). The 3D7 genomic sequence and the corresponding annotated coding sequences from PlasmoDB 5.0 (chromosomal flat files dated 2002 and 2005) provided the gene annotations, including coding and noncoding regions and gene ontology classifications. The program also mapped alignment files to the 3D7 genome and classified each variant found in an alignment as a SNP, an indel or a microsatellite. A set of scripts was then used to calculate summary statistics of diversity (θ, π and Tajima's D). All SNPs and microsatellites were confirmed by visually inspecting the chromatogram traces of every potential polymorphism. All alignments of indels and microsatellites were manually adjusted to minimize mismatches and size polymorphism. SNPs in repetitive regions were not called, because misalignments may create artificial SNPs. The number of recombination events throughout the genome was estimated using the nonparametric methodology of ref. 29, as described previously11.
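The classification step can be sketched for a single pairwise alignment of a query sequence against 3D7. In this hypothetical illustration (not the authors' Java package), mismatched columns are SNPs and gap runs are indels; an indel whose sequence is a whole number of copies of a short unit (≤6 bp here, an arbitrary choice) and that sits next to another copy of that unit in the reference is labelled a microsatellite polymorphism. In the study itself, all calls were confirmed against chromatograms and indel alignments were adjusted by hand.

```python
def repeat_unit(seq, max_unit=6):
    """Shortest unit u (len <= max_unit) with seq == u * k, else None."""
    for k in range(1, min(max_unit, len(seq)) + 1):
        if len(seq) % k == 0 and seq[:k] * (len(seq) // k) == seq:
            return seq[:k]
    return None


def classify_variants(ref_aln, qry_aln, max_unit=6):
    """Classify differences in a pairwise gapped alignment (gaps as '-').

    Returns (kind, ref_position, allele) tuples with 0-based reference
    coordinates; kind is 'SNP', 'indel' or 'microsatellite' (heuristic).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref = ref_aln.replace("-", "")  # ungapped reference, for flank checks
    variants, ref_pos, i = [], 0, 0
    while i < len(ref_aln):
        r, q = ref_aln[i], qry_aln[i]
        if r != "-" and q != "-":
            if r != q:
                variants.append(("SNP", ref_pos, f"{r}>{q}"))
            ref_pos += 1
            i += 1
            continue
        if r == "-" and q == "-":
            i += 1  # column gapped in both sequences (e.g. from a multiple alignment)
            continue
        # Extend a run of insertion (reference gap) or deletion (query gap) columns.
        is_insertion = r == "-"
        j = i
        while (j < len(ref_aln) and (ref_aln[j] == "-") == is_insertion
               and (qry_aln[j] == "-") != is_insertion):
            j += 1
        bases = (qry_aln if is_insertion else ref_aln)[i:j]
        start = ref_pos
        end = ref_pos if is_insertion else ref_pos + len(bases)
        unit = repeat_unit(bases, max_unit)
        in_tract = unit is not None and (
            ref[max(0, start - len(unit)):start] == unit
            or ref[end:end + len(unit)] == unit)
        kind = "microsatellite" if in_tract else "indel"
        variants.append((kind, start, ("ins:" if is_insertion else "del:") + bases))
        ref_pos, i = end, j
    return variants


if __name__ == "__main__":
    print(classify_variants("ACGTATATAT--GC", "ACGTATATATATGC"))
```

The flanking-copy rule is deliberately simple; it would, for example, call a single-base indel next to an identical base a mononucleotide microsatellite, which is why manual review of such calls remains necessary.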
Protein expression and blotting.
Cell-free expression of proteins was performed using a rapid translation system (Roche Diagnostics) according to the manufacturer's instructions. Proteins expressed in 50 μl of the E. coli cell-free expression system were enriched using precharged paramagnetic nickel particles (Promega), separated on 4%–12% polyacrylamide gels, transferred to PVDF membranes and detected using antibodies to the His tag or pooled human antisera from villagers in Mali.
Accession codes.
SNPs have been deposited in the NCBI dbSNP database (accession codes 65654288–65658180) and in PlasmoDB (v5.2).
URLs.
PlasmoDB: http://www.plasmodb.org/. Phred/Phrap: http://www.phrap.org/.
Note: Supplementary information is available on the Nature Genetics website.
References
1. Gardner, M.J. et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498–511 (2002).
2. Polley, S.D. & Conway, D.J. Strong diversifying selection on domains of the Plasmodium falciparum apical membrane antigen 1 gene. Genetics 158, 1505–1512 (2001).
3. Conway, D.J. et al. A principal target of human immunity to malaria identified by molecular population genetic and immunological analyses. Nat. Med. 6, 689–692 (2000).
4. Volkman, S.K. et al. Excess polymorphisms in genes for membrane proteins in Plasmodium falciparum. Science 298, 216–218 (2002).
5. Wootton, J.C. et al. Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature 418, 320–323 (2002).
6. Roper, C. et al. Intercontinental spread of pyrimethamine-resistant malaria. Science 305, 1124 (2004).
7. Wellems, T.E., Walker-Jonah, A. & Panton, L.J. Genetic mapping of the chloroquine-resistance locus on Plasmodium falciparum chromosome 7. Proc. Natl. Acad. Sci. USA 88, 3382–3386 (1991).
8. Su, X.-z., Kirkman, L.A., Fujioka, H. & Wellems, T.E. Complex polymorphisms in an approximately 330 kDa protein are linked to chloroquine-resistant P. falciparum in Southeast Asia and Africa. Cell 91, 593–603 (1997).
9. Vaidya, A.B. et al. A genetic locus on Plasmodium falciparum chromosome 12 linked to a defect in mosquito-infectivity and male gametogenesis. Mol. Biochem. Parasitol. 69, 65–71 (1995).
10. Wang, P., Read, M., Sims, P.F. & Hyde, J.E. Sulfadoxine resistance in the human malaria parasite Plasmodium falciparum is determined by mutations in dihydropteroate synthetase and an additional factor associated with folate utilization. Mol. Microbiol. 23, 979–986 (1997).
11. Mu, J. et al. Recombination hotspots and population structure in Plasmodium falciparum. PLoS Biol. 3, e335 (2005).
12. Su, X.-z. et al. A genetic map and recombination parameters of the human malaria parasite Plasmodium falciparum. Science 286, 1351–1353 (1999).
13. Lindblad-Toh, K. et al. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat. Genet. 24, 381–386 (2000).
14. Kennedy, G.C. et al. Large-scale genotyping of complex DNA. Nat. Biotechnol. 21, 1233–1237 (2003).
15. Hardenbol, P. et al. Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Genome Res. 15, 269–275 (2005).
16. Gunderson, K.L., Steemers, F.J., Lee, G., Mendoza, L.G. & Chee, M.S. A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 37, 549–554 (2005).
17. Hey, J. Parasite populations: the puzzle of Plasmodium. Curr. Biol. 9, R565–R567 (1999).
18. Mu, J. et al. Chromosome-wide SNPs reveal an ancient origin for Plasmodium falciparum. Nature 418, 323–326 (2002).
19. Baum, J., Thomas, A.W. & Conway, D.J. Evidence for diversifying selection on erythrocyte-binding antigens of Plasmodium falciparum and P. vivax. Genetics 163, 1327–1336 (2003).
20. Baruch, D.I. et al. Cloning the P. falciparum gene encoding PfEMP1, a malarial variant antigen and adherence receptor on the surface of parasitized human erythrocytes. Cell 82, 77–87 (1995).
21. Su, X.-z. et al. The large diverse gene family var encodes proteins involved in cytoadherence and antigenic variation of Plasmodium falciparum-infected erythrocytes. Cell 82, 89–100 (1995).
22. Smith, J.D. et al. Switches in expression of Plasmodium falciparum var genes correlate with changes in antigenic and cytoadherent phenotypes of infected erythrocytes. Cell 82, 101–110 (1995).
23. Trimnell, A. et al. Global genetic diversity and evolution of var genes associated with placental and severe childhood malaria. Mol. Biochem. Parasitol. 148, 169–180 (2006).
24. Hudson, R.R. & Kaplan, N.L. The coalescent process in models with selection and recombination. Genetics 120, 831–840 (1988).
25. Nordborg, M., Charlesworth, B. & Charlesworth, D. Increased levels of polymorphism surrounding selectively maintained sites in highly selfing species. Proc. R. Soc. Lond. B 263, 1033–1039 (1996).
26. Charlesworth, D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2, e64 (2006).
27. Sato, S. & Wilson, R.J. The plastid of Plasmodium spp.: a target for inhibitors. Curr. Top. Microbiol. Immunol. 295, 251–273 (2005).
28. Kissinger, J.C. et al. The Plasmodium genome database. Nature 419, 490–492 (2002).
29. Myers, S.R. & Griffiths, R.C. Bounds on the minimum number of recombination events in a sample history. Genetics 163, 375–394 (2003).
Acknowledgements
We thank C. Long and R. Fairhurst for pooled human immune sera and National Institute of Allergy and Infectious Diseases (NIAID) intramural editor B.R. Marshall for assistance. This work was supported by the Division of Intramural Research of the NIAID as well as by the US National Institutes of Health, the National Academies Keck Genome Initiative and the Human Frontier Science Program (P.A.).
Author information
Contributions
J.M.: DNA amplification, sequencing and data analysis; J.D.: DNA amplification and sequencing; K.S.: primer design; K.M.M. and J.K.: software and database development and data analysis; G.A.T.M.: manuscript preparation; P.A.: software and database development, data analysis and manuscript preparation; X-z.S.: project design, data analysis and manuscript preparation.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Distribution of SNPs among 3,539 genes or gene fragments from five P. falciparum isolates. (PDF 3 kb)
Supplementary Fig. 2
Relationship of SNP density and chromosome size. (PDF 4 kb)
Supplementary Fig. 3
Highly polymorphic genes grouped according to GO functional terms. (PDF 68 kb)
Supplementary Fig. 4
Plots of nucleotide polymorphism (Watterson's theta) per gene on the 14 chromosomes of Plasmodium falciparum. (PDF 426 kb)
Supplementary Table 1
Genes and sequences surveyed, SNP alleles, and diversity statistics. (XLS 1797 kb)
Supplementary Table 2
DNA sequences and SNPs obtained from 99 worldwide isolates. (PDF 90 kb)
Supplementary Table 3
Chromosomal loci with five or more consecutive polymorphic genes. (PDF 44 kb)
Supplementary Table 4
Polymorphic genes expressed in the cell-free E. coli rapid translation system. (PDF 137 kb)
Cite this article
Mu, J., Awadalla, P., Duan, J. et al. Genome-wide variation and identification of vaccine targets in the Plasmodium falciparum genome. Nat Genet 39, 126–130 (2007). https://doi.org/10.1038/ng1924
DOI: https://doi.org/10.1038/ng1924