Main

It cannot be restated too often that malaria, principally through infection by the protozoan apicomplexan parasite Plasmodium falciparum, is responsible for more than one million deaths each year. A recent survey has shown that, in 2002, roughly 2.2 billion people were at risk of contracting P. falciparum infection, with 515 million a conservative estimate of the number of individuals who became infected1.

Malaria is clearly a disease of poverty, and the dwindling effectiveness of frontline affordable drugs and the continuing lack of a vaccine mean that it poses a greater problem now than at any time since the failure of the WHO eradication drive in the 1950s. It is in this context that malaria research must act. Basic biological investigation of malaria parasites has always offered the promise of the development of new vaccines and drugs. Although biologists initially worked on single genes of therapeutic interest, the landmark publication2 of the complete P. falciparum genome with first-pass annotation in late 2002 irrevocably changed the face and practice of malaria research. This comprehensive data set, combined with the availability of substantial sequence tracts from the rodent malaria parasite (RMP) Plasmodium yoelii, has made it possible to embrace the latest global-genome-survey technologies3.

Thanks to the admirable real-time release policy of the three sequencing centres involved, this embrace was immediate, and two detailed proteomic studies accompanied the genome into press4,5. Here, we review the current status of Plasmodium genomic and post-genomic research, indicating trends and the future requirements necessary to build a detailed 'virtual parasite' in which we are fully informed of the molecular biology that underpins its biology and interactions with both host and vector (Box 1).

The P. falciparum genome

The major pre-genomic milestones in Plasmodium research are summarized in the accompanying TIMELINE, and the complex life cycle of a Plasmodium parasite is described in Box 2.

Timeline | Plasmodium genomics

The P. falciparum genome comprises 14 linear chromosomes ranging in size from 0.64 to 3.3 Mb and two non-nuclear genomes: a compact mitochondrial genome of 6 kb and a plastid-like, 35-kb circular genome that resides in an organelle now known as the apicoplast6. The genome of the 3D7 strain of P. falciparum was the first parasite genome to be sequenced to completion2. A chromosome-by-chromosome shotgun-sequencing strategy was used based on pulsed-field gel electrophoresis (PFGE)-separated chromosomes. Mainly due to the extreme AT bias (80%) of the genome, it took almost seven years before the 22.8-Mb nuclear genome was complete, revealing 5,268 predicted protein-encoding genes at an average frequency of one gene every 4.3 kb. Matching expressed sequence tags (ESTs) and proteomic analyses were used to experimentally validate approximately 49% and 52% of these genes, respectively2. At the time of publication, there were still 79 gaps in the sequence. The largest contiguous DNA sequence (contig) was chromosome 12 (2.3 Mb). Now, just three chromosomes (7, 8 and 13) are still awaiting closure, with only six gaps remaining (M. Berriman, personal communication).
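The quoted gene density follows directly from the genome size and gene count given above. A minimal sketch of that arithmetic (the function and variable names are illustrative, not taken from the genome paper):

```python
# Figures quoted in the text for the 3D7 nuclear genome.
GENOME_SIZE_MB = 22.8
PREDICTED_GENES = 5268

def mean_gene_spacing_kb(genome_mb, n_genes):
    """Average number of kilobases of genome per predicted gene."""
    return genome_mb * 1000 / n_genes

spacing = mean_gene_spacing_kb(GENOME_SIZE_MB, PREDICTED_GENES)
print(f"one gene every {spacing:.1f} kb")
```

This is a genome-wide average only; it says nothing about how evenly genes are spaced along individual chromosomes.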

Comparison of the first-pass annotation of the P. falciparum nuclear genome with other genomes showed that 60% of the predicted genes could not be assigned functions. The products of at least 1.3% of the P. falciparum genes are known to be involved in cell-to-cell adhesion or invasion of host cells, and a further 3.9% are postulated to have a role in evasion of the host immune response; many of these 250–300 proteins possess host-like extracellular adhesion domains. Curiously, only 8% of the P. falciparum genes could be assigned functions in metabolism, in contrast to 17% of the genes of the yeast Saccharomyces cerevisiae7. This suggests either that enzymes are more difficult to identify by sequence homology in P. falciparum owing to its great evolutionary distance from other well-studied organisms or that the P. falciparum genome contains fewer enzymes as a consequence of its parasitic lifestyle. The complement of transport molecules appeared to be similarly reduced compared with free-living organisms.

Approximately 10% of the nuclear-encoded proteins are predicted to target to the apicoplast2,8,9. The apicoplast is thought to serve as an organization centre for certain metabolic pathways (isoprenoid and fatty-acid biosynthesis), was probably acquired from non-green algae by endosymbiosis, and is found in many apicomplexan parasites (for reviews, see Refs 10,11). The evolutionary origin of the apicoplast immediately suggested potential drug targets based on herbicides and antibacterial agents.

Clustering of P. falciparum subtelomeric genes based on hidden Markov models revealed that nearly 50% of these genes form 12 distinct P. falciparum-specific gene families, and of these only five subtelomeric gene families are shared with the RMPs (see Supplementary information S1 (table)). These gene families account for 5–10% of all genes, and include gene families that are involved in immune evasion and sequestration (var12,13,14, 59 members) and putative variant antigens (rif, 149 members, and stevor, 28 members15,16). Although some genes that belong to the defined gene families are located in internal chromosomal regions, most are distributed in the subtelomeric regions, internal to the complex species-specific repeats that abut the telomere repeats2.

All chromosomes harbour some copies of one or more of the gene families, but the composition and order vary. P. falciparum erythrocyte membrane protein 1 (PfEMP1), encoded by the var genes, is demonstrably associated with clonal antigenic variation, a strategy used in the erythrocytic stages to evade the adaptive host immune response. PfEMP1 also has an important role in the sequestration of infected erythrocytes in capillaries in the brain (resulting in cerebral malaria) and other tissues, including the placenta. The subtelomeric location of 60% of the var genes should facilitate the generation of diversity in the gene repertoire of this family17,18, and the high frequency of (ectopic) recombination in these regions could also contribute to the observed size variation in the subtelomeric regions. There are also seven non-subtelomeric var loci containing between one and seven copies, regularly interrupted by rif genes (Fig. 1).

Figure 1: A whole-genome synteny map of Plasmodium falciparum and three rodent malaria parasites (RMPs).

A synteny map of the core regions of all chromosomes of P. falciparum and the RMPs, showing the 36 synteny blocks, 22 synteny breakpoints, 14 predicted centromeres, P. falciparum-specific indels and translocations in the RMP chromosomes. The 36 synteny blocks, coloured according to their chromosomal location in the composite RMP (cRMP) genome, are named with a Roman and an Arabic number referring to the corresponding chromosomal location in P. falciparum and the cRMP genome, respectively. Letters give the order in which the synteny blocks are connected. Small black arrows indicate the inverted orientation of a synteny block in P. falciparum relative to the cRMP genome. Indels containing P. falciparum-specific intrasyntenic genes are indicated by interruption of the coloured synteny blocks. Synteny blocks forming the cRMP chromosomes are linked by grey lines. Telomeres and subtelomeric linked ends are shown as white arrowheads. Bars under the cRMP chromosomes represent the differences in the organization of the synteny blocks of Plasmodium yoelii (Py), Plasmodium chabaudi (Pc) and Plasmodium vinckei (Pv) as a result of translocations. Colours indicate the cRMP chromosome with which recombination has taken place and colour gradients represent the ill-defined regions of the translocation breakpoints. Reproduced with permission from PLoS Pathogens Ref. 19.

Smaller gene families that encode diverse functions are also found (see Supplementary information S1 (table)), for example, genes encoding acyl coenzyme A synthetases (11 members) and receptor-associated protein kinases19 (21 members; Fig. 1). The genome sequence also expanded our knowledge of the extent of gene families that encode vaccine candidates, and might yet reveal new antigens for vaccine development. For example, before the sequencing initiative, the 6-Cys family, which contains transmission-blocking vaccine candidates, was thought to have three members (P48/45, P12 and P230); we now know that 10 members are expressed at different stages of the life cycle, but principally in gametocytes as five are exclusive to this stage and one (P36) is expressed in both gametocytes and sporozoites (for review, see Ref. 20).

The P. falciparum transcriptome

Several DNA microarray studies have been published, ranging from analysis of gene transcription using random clones selected from genomic DNA libraries21,22 to more recent quasi-global surveys of transcription23,24 based on oligonucleotides designed using the emerging genome sequence. The arrays are used to probe RNA from defined parasite stages. Additionally, gene transcription in defined parasite stages has been investigated in several EST surveys25,26,27,28 by different techniques creating and analysing stage-specific enriched cDNA libraries29,30, and through serial analyses of gene expression (SAGE)31,32.

Two microarray studies analysing transcription during the blood stages revealed that the parasite transcribes a large core of its genome during blood-stage development. In one study (Ref. 23), 83% of the genes present on the chip (which represents 97.2% of the annotated genes of the genome) were shown to be expressed during the asexual blood stages, with 24% expressed exclusively during this part of the life cycle. Using different cut-off criteria, the other study (Ref. 24) found that at least 60% of the genes in the genome are expressed during asexual blood-stage development. Furthermore, both studies found clear patterns of stage-specific gene transcription. Both studies also showed remarkable concordance in gene-transcription patterns and pinpointed genes encoding surface proteins that might serve as vaccine candidates for specific parasite stages. The study by Bozdech et al.24 suggested a cascade of transcriptional activation during blood-stage development, where transcripts are produced in an ordered manner as and when they are required to fulfil the demands of the cell (the 'transcripts-to-go' model). This raises the hope that inhibition of a few key early transcription factors might provide a means to arrest parasite development, a concept that remains generally valid despite the implications of the more recent discovery of translational repression in certain life-cycle stages33,34. The authors of both microarray studies proposed that the groups of genes with similar transcription profiles might be involved in similar functions or cellular processes, perhaps giving insight into the role of the genes for which the annotation did not reveal a function. SAGE technology revealed an unusual feature of blood-stage transcription: the significant accumulation of antisense transcripts in a stage-specific fashion31,32. This is thought to represent some level of gene regulation, but the mechanism, and indeed proof of function, of antisense RNA in Plasmodium remain obscure at present.

By including the invasive sporozoite and merozoite stages in their analysis, Le Roch et al.23 were able to identify gene groups that are associated with cell invasion, emphasizing the similarities of these stages with stage-specific modulation of gene-transcription patterns according to context. For example, one of the identified clusters contains genes transcribed in sporozoites, gametocytes and schizonts (at the onset of merozoite production) that could be essential for gliding motility and host-cell invasion, including actin and myosin.

Continued data mining of the published P. falciparum transcriptome, in addition to new transcriptome studies of defined developmental stages or mutant parasites, will provide a better understanding of the biology of the malaria parasite. Some examples of such studies are the analysis of the antioxidant defence system35, the pentose-phosphate pathway36, the transcription of variable antigens37 and detailed analysis of gametocyte development38,39,40. Not surprisingly, it seems that at least 60% of the Plasmodium genes are transcribed at multiple life-cycle stages, but stage- and strategy-specific transcription can also be found.

The P. falciparum proteome

Several detailed high-throughput mass-spectrometry studies of the P. falciparum proteome have been published. Reassuringly, the protein content of the different blood stages agrees well with the presence of transcripts of the genes encoding these proteins23,24 and, therefore, the proteome data generally support the 'transcripts-to-go' model.

Florens et al. characterized the proteome of four stages: sporozoites, merozoites, trophozoites and gametocytes4. Three additional proteomes from the ring, schizont and gamete stages were also included later in an even more global approach34. Of 2,415 proteins identified in the first study (later rising to 2,904), only 6% were expressed in all four stages, whereas more than half were unique to one stage. Almost half of the sporozoite proteins were stage specific, whereas for the blood stages the numbers ranged from 20% to 33%. These data form a sharp contrast with the global gene-transcription study analysing the same stages, which reported that 35% of the 5,119 genes on the chip are expressed in sporozoites, gametocytes and blood stages, whereas just under 30% appeared stage specific23. This discrepancy is most easily explained by the greater sensitivity of microarray analyses and technical issues associated with proteome analysis. The same issues applied to the second study, which combined transcriptome and proteome approaches and predicted that up to 40% of the transcripts for which no proteins were detected were insoluble34.

The proteome of the sexually developing parasite was described in more detail by Lasonder et al.5 Comparison of these data sets, with the help of annotation, identified candidate proteins for transmission-blocking vaccines, such as a family of genes containing Limulus coagulation factor C domains and predicted lectin domains. These proteins, initially thought to be exclusive to gametocytes and gametes, are also expressed in ookinetes and are essential for oocyst development41,42, indicating that they have a role in the interactions of the parasite with the mosquito midgut epithelium.

The annotation of the genome has benefited considerably from the proteome studies; for example, Lasonder et al. reported a set of peptides with significant matches in the P. falciparum genome for which no gene model had been predicted5, and further analysis of these peptides is ongoing (E. Lasonder, personal communication). Both transcriptome and proteome data revealed no tendency for the genome to be compartmentalized into regions containing genes that are coordinately expressed, ruling out an operon-like organization. Genome-wide transcription analysis revealed just 14 clusters of three or more co-regulated genes (60 in total)24, although no such clustering could be identified from the P. falciparum proteome4, probably hampered by the lower coverage of these data. However, a tandem array of five genes encoding proteins located in the Maurer's clefts (MCs) has been reported43.

In addition to the reported proteomes of the whole life-cycle stages, proteome studies have focused on specific organelles and structures. For example, the proteome of gradient-purified detergent-resistant membranes of mature blood-stage parasites (late schizonts/merozoites) has been analysed44. These membranes are greatly enriched in glycosylphosphatidylinositol-anchored proteins (GPI-APs) and their putative interacting partners. GPI-APs coat the surface of extracellular P. falciparum merozoites, and several are validated candidates for inclusion in a blood-stage malaria vaccine. In addition to detecting confirmed GPI-APs, this study identified new GPI-APs and several other novel, potentially GPI-AP-interacting proteins that are predicted to localize to the merozoite surface and/or apical, invasion-associated organelles (rhoptries and micronemes).

Furthermore, the proteomes of infected erythrocyte membranes and MCs of mature trophozoites and schizonts have been investigated, revealing 36 (Ref. 45) and 50 (Ref. 43) candidate proteins, respectively. MCs are parasite-derived membranous structures in the erythrocyte cytosol that are thought to be involved in the transport of parasite proteins to the erythrocyte surface46. Perhaps surprisingly, the two data sets share only four proteins, which could reflect the different methods of protein-sample preparation and detection. Alternatively, this lack of overlap could suggest that the parasite proteins on the erythrocyte surface are only transiently associated with the MCs, or that proteins residing in the MCs are more easily detected. Comparison of the P. falciparum genes encoding MC proteins with the RMP genes using the RMP–P. falciparum synteny map revealed that 36 of the 50 genes (72%) are syntenic with the RMPs or belong to locally expanded gene families shared between the different species (T.W.A.K., unpublished observations). The relatively large proportion of syntenic orthologues encoding MC proteins indicates that a considerable part of the protein-export machinery is conserved between P. falciparum and the RMPs.

The P. falciparum 'secretome' and 'permeome'

Malaria parasites secrete proteins across the vacuolar membrane into the erythrocyte cytosol or to the erythrocyte membrane, inducing modifications of the erythrocyte that are necessary for parasite survival, but which are also associated with disease. Two studies have independently identified a conserved sequence motif (RXLX(E/Q)) in such secreted proteins, termed either the Plasmodium export element (PEXEL)47 or the vacuolar transport signal (VTS)48. Bioinformatics using the PEXEL/VTS signal sequence predicts a 'secretome' of 300–400 proteins for P. falciparum (8% of all genes). In addition to 225 var, rif and stevor genes, the secretome includes 160 genes encoding proteins that are likely to be involved in remodelling of the host erythrocyte, including heat-shock proteins, kinases, phosphatases and putative transporters47; this vastly expands the number of potential vaccine and drug targets. The PEXEL/VTS motif seems to be distinct from known cellular-transport signals, which indicates that it might be a novel eukaryotic secretion signal associated with intracellular parasites. Interestingly, a similar sequence motif (RXLR) has been found in oomycete effector genes associated with avirulence49. The plant-pathogenic oomycetes require host tissue for at least parts of their life cycle, and the authors hypothesized that this motif, like the Plasmodium PEXEL/VTS motif, mediates transport to the host cells.
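The pentameric core motif RXLX(E/Q) can be expressed as a simple pattern search over a protein sequence. The sketch below illustrates only that core pattern and is not the prediction method used in the cited studies; a real secretome prediction would presumably apply further criteria (for example, a signal sequence and the position of the motif), which are assumptions on our part and are omitted here. The function name and the two toy sequences are invented for illustration.

```python
import re

# Core motif as written in the text: R, any residue, L, any residue, then E or Q.
# A lookahead is used so that overlapping occurrences are all reported.
PEXEL_CORE = re.compile(r"(?=(R[A-Z]L[A-Z][EQ]))")

def find_pexel_like(protein_seq):
    """Return (0-based start, matched pentamer) for each RxLx(E/Q) occurrence."""
    return [(m.start(), m.group(1)) for m in PEXEL_CORE.finditer(protein_seq.upper())]

# Toy sequences, made up for this example (not real Plasmodium proteins).
with_motif = "MKKYSRLLAEGT"
without_motif = "MKKYSRLAAEGT"

print(find_pexel_like(with_motif))
print(find_pexel_like(without_motif))
```

A five-residue pattern of this kind will also match many non-exported proteins by chance, which is one reason a bare motif scan should be read as a first filter rather than a prediction.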

In addition to the transport of parasite proteins to the erythrocyte, intra-erythrocytic parasites take up nutrients from the erythrocyte cytosol and excrete metabolic waste products. Membrane-transport proteins mediate these processes but are also implicated in antimalarial drug resistance. Furthermore, the parasites need ion channels to maintain their ion homeostasis. The initial annotation of the P. falciparum genome identified only a limited number of transporters and no channels. By combining different bioinformatic approaches, several putative ion channels and >100 transporters were identified, including equal numbers of known and candidate transporters for a range of organic and inorganic nutrients50, collectively termed the P. falciparum 'permeome'. Most of the permeome could be organized in known classes of transporters, but the authors found that 17% (19 genes) show no sequence homology to any known family of transporters, and five of these genes were grouped as a novel family of P. falciparum-specific transporters. Although this doubled the amount of candidate transporters compared with the number reported in the genome paper, the transporter-gene content of the P. falciparum genome (2.1%) is the lowest reported for any organism so far.

P. falciparum protein-interaction networks

Understanding the interactions between proteins can provide insights into the function of, and functional relationships between, these proteins. Recently, the first large-scale analyses of interactions between proteins during the asexual blood stages of P. falciparum have been published51,52. Using a high-throughput yeast two-hybrid assay, 2,846 interactions were identified involving 1,312 largely uncharacterized proteins in 29 highly connected protein complexes. By combining information on protein interactions with patterns of co-expression and putative function, informed by annotation and the presence of specific domains, groups of interacting proteins were identified that have a role in chromatin modification, transcription, mRNA stability and ubiquitination, and invasion of host cells.

Improved insights into the structure–function relationships of increasing numbers of proteins might reveal new drug targets. Several groups have begun initial attempts to achieve a larger-scale protein-structure analysis by generating expression libraries of soluble proteins. The Structural Genomics Consortium has started an admirable initiative to attempt a high-throughput elucidation of three-dimensional structures of Plasmodium proteins (Box 1). The structural data produced are freely available and, so far, 19 proteins from different Plasmodium species and other apicomplexan parasites have been resolved.

Comparative genomics

Although analysis of a single genome provides tremendous biological insights for any given organism, comparative analysis of multiple genomes can provide substantially more information on the physiology and evolution of genomes, and increases our ability to identify and assign putative functions to predicted coding regions. Orthology recognition is becoming increasingly sophisticated, and bioinformatic methods to improve Plasmodium annotation through the recognition of global orthologies have been developed to discover and annotate biosynthetic pathways25,53. Comparative genomics can also help to identify orthologous genes or refine gene predictions through local alignments, substantially improving multi-exon gene models3,54. When closely related species within a single genus are compared, this should provide additional levels of insight into, for example, the repertoire of species-specific genes that might be associated with differences in lifestyle, such as the invasion of reticulocytes versus normocytes by Plasmodium merozoites, and even into speciation.

Animal models of malaria, although limited, have long been established as alternative means to gain insights into many aspects of the biology underlying the parasite–host/vector interactions that cannot be obtained readily or ethically working with the human malaria parasites P. falciparum and Plasmodium vivax. In addition to several primate parasites (for example, Plasmodium reichenowi, a close relative of P. falciparum that infects chimpanzees, and Plasmodium knowlesi, which is more closely related to P. vivax) and a chicken parasite (Plasmodium gallinaceum, which has an intriguing phylogenetic relationship with all four human malaria parasites55,56), much work is done using RMPs as they are cheaper to maintain in vivo and there are fewer ethical concerns in the handling of their host organisms.

Significant amounts of genome data are available for all of the aforementioned parasites (Table 1), including the second most important human malaria parasite, P. vivax57. Its genome sequence has almost been completed, and annotation and analyses are drawing to a close (see Gene Indices database in Box 1; Jane Carlton, personal communication). These extensive genome data sets from different Plasmodium species have not only facilitated comparative genomics, but have also given rise to significant post-genomic studies, characterizing both the transcriptome and proteome of different life-cycle stages. Comparison of the P. falciparum genome with other genome data available in 2002 showed that 60% of the annotated P. falciparum genes could not be assigned functions and could therefore encode functions that are unique to P. falciparum or to the genus Plasmodium. More recently, the genomes of several other unicellular parasites have been published, allowing cross-genus genome comparisons of closely related parasites. The list of unicellular parasites for which significant amounts of genome sequence are now available includes two apicomplexan parasites infecting humans, Cryptosporidium parvum 58 and Cryptosporidium hominis 59; two apicomplexan parasites infecting cattle, Theileria parva 60 and Theileria annulata 61; Entamoeba histolytica 62; and three kinetoplastid parasites, Trypanosoma brucei 63, Trypanosoma cruzi 64 and Leishmania major 65, and the genome sequence of Toxoplasma gondii is nearing completion.

Table 1 Current status of the Plasmodium genome projects (March 2006)

The first comparative analysis of the genomes of two apicomplexan species, P. falciparum and C. parvum, showed that both lineages have acquired protein-adhesion domains originating from proteins of their animal hosts, and identified at least 145 apicomplexan-specific genes66. Initially, comparative genome analyses of Theileria, Cryptosporidium and Plasmodium species with other public genome databases indicated that the genomes of all three apicomplexan lineages have an unexpected paucity of specific transcription factors, despite their complex life cycles. However, a new apicomplexan family of genes with apetala 2 (AP2)-integrase DNA-binding domains was found; AP2 domains are predominantly found in plant transcription factors67. Further discussion of (comparative) genomics of the other apicomplexan and kinetoplastid parasites is beyond the scope of this review, but it is clear that detailed comparisons of the genomes of these species will help to unravel the function of many hypothetical Plasmodium genes and should lead to new insights into the complex parasitic lifestyles.

Comparative genomics between species

Extensive synteny between genomes. Comparative genome analysis between Plasmodium species was initiated by gene-mapping studies on separated chromosomes68,69,70, followed by more detailed analysis of small fragments of individual chromosomes71,72. In general, these studies demonstrated significant conservation of gene-linkage groups (synteny) between different species. By definition, synteny is the conservation of gene association in organized blocks. Within the blocks there can be deletions and changes in gene order but the syntenic relationship of the genes remains unaltered. However, the complete picture of the degree of synteny in Plasmodium remained unclear until sufficient sequence data were available.

The high degree of synteny between more distantly related Plasmodium species was demonstrated with the publication of the extensive genome shotgun sequence of the RMP P. yoelii3. Contigs covering >70% of the P. yoelii genome could be aligned along the core regions of the 14 P. falciparum chromosomes. The similarity of these two Plasmodium genomes was not only demonstrated by the high level of synteny but also mirrored by the predicted gene content. This was the first-ever comparison of the gene content of two eukaryotic species within a single genus, and it identified >3,300 P. yoelii orthologues of the 5,268 predicted P. falciparum genes. Although the orthologues were predominantly housekeeping genes, orthologues of candidate vaccine antigens involved in parasite–host/vector interactions (for example, circumsporozoite protein (CS) and members of the merozoite surface protein (MSP) and 6-Cys families) were also identified.

Such high levels of orthology were perhaps unsurprising, given that most of the features of the life cycle are conserved between different Plasmodium species. However, the validation of model malaria species that is provided by their genetic similarity to those species that infect humans has emphasized the fact that structure–function studies on P. falciparum vaccine candidates could be carried out with the more accessible, tractable model species. Therefore, the molecular mechanisms underlying gamete fertilization73 and sporozoite motility74 could reasonably be studied in model systems. Non-primate models might be less appropriate to investigate adaptive processes of human parasites, such as the ability of P. falciparum to successfully invade human erythrocytes through several independent routes, which are different from the routes of the RMPs and P. vivax. The complexity and flexibility of erythrocyte invasion by P. falciparum might have evolved as part of a selection and counter-selection 'arms race' that model species clearly cannot recreate.

The initial comparisons of the genomes of different Plasmodium species have recently been extended with the publication of two additional partial shotgun genome-sequence data sets (4× coverage each) from the RMPs P. berghei and Plasmodium chabaudi33. The virtually complete synteny and high levels of nucleotide-sequence identity (88–92%) between the genomes of the three RMPs allowed the construction of composite RMP contigs, extending the contig size by an average of 400% (Ref. 19). Combining these contigs with chromosome-mapping studies has enabled complete comparative synteny maps to be compiled for the 'prototype' RMP genome and that of P. falciparum19,33 (Fig. 1). These maps again showed the extensive synteny of the internal chromosome regions. Interestingly, a minimum of only 15 gross chromosomal rearrangements that reshuffle the 36 synteny blocks is needed to convert the P. falciparum genome into the composite RMP genome and vice versa19. Clearly, as comparative genomics is expanded to more Plasmodium species, it should be possible to reconstruct the organization of the minimal genome of the most recent common ancestor of the genus.
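The legend of Fig. 1 counts 36 synteny blocks, 22 synteny breakpoints and 14 chromosomes. One reading that is consistent with these numbers, offered here as an assumption rather than as the definition used in Ref. 19, is that each breakpoint is a junction between two adjacent synteny blocks on the same chromosome, so that breakpoints = blocks − chromosomes. The sketch below checks this bookkeeping on a purely hypothetical layout; the distribution of blocks over chromosomes is invented and does not reproduce the real map.

```python
def count_internal_junctions(chromosomes):
    """Count junctions between adjacent synteny blocks on the same chromosome."""
    return sum(len(blocks) - 1 for blocks in chromosomes if blocks)

# Hypothetical layout: 8 chromosomes carrying 3 blocks each and 6 carrying 2 each.
# Only the totals (14 chromosomes, 36 blocks) are taken from the Fig. 1 legend.
toy_layout = [["block"] * 3 for _ in range(8)] + [["block"] * 2 for _ in range(6)]

n_chromosomes = len(toy_layout)
n_blocks = sum(len(blocks) for blocks in toy_layout)
n_junctions = count_internal_junctions(toy_layout)

print(n_chromosomes, n_blocks, n_junctions)
```

Under this assumption the junction count depends only on the totals, not on how the blocks are distributed, so any layout with 36 blocks on 14 chromosomes gives 22.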

The combined sequence data from the different RMPs improved the P. falciparum orthologue predictions and revealed a conserved set of 2,125 genes with orthologues present in the data sets of all four species. Perhaps more telling is the fact that 4,500 of the 5,268 P. falciparum full-length protein-encoding genes had an orthologue in at least one of the three RMP genome data sets33. The discrepancy between these numbers is largely due to gaps in the sequences of the three RMPs. The 736 P. falciparum-specific genes without any RMP orthologue were analysed in more detail; 575 genes were located in the subtelomeric regions and sharply defined the boundaries between the subtelomeric and core regions of the chromosomes, whereas the remaining 161 P. falciparum-specific genes, as well as seven newly identified putative genes associated with chromosome-internal var clusters, were located in the core regions of the chromosomes19.

Lack of synteny in the subtelomeric regions. The subtelomeric regions of Plasmodium chromosomes lack synteny, which stems from their generally distinct content of gene families (for example, the var, rif and stevor genes in P. falciparum are replaced by other families in RMPs and P. vivax) and from the presence of large numbers of species-specific non-coding repeat sequences. However, the gene content of the subtelomeric regions of different Plasmodium species is not completely species specific. The RMPs and P. vivax share a distinct family of subtelomeric variant genes, collectively known as the pir (Plasmodium interspersed repeats) superfamily, originally described in P. vivax as vir genes75. The pir superfamily is predicted to be large, with 150–850 members in each species, and has been proposed to include the P. falciparum rif family76. The proteins encoded by certain members of the pir superfamily have been localized to the erythrocyte surface77, suggesting a role in antigenic variation and immune evasion. However, proteome data from P. berghei, in which the related gene family is the bir family, indicate that BIR proteins might have other, as-yet-unknown functions, as 9% of the BIR proteins were exclusively detected in mosquito stages33.

Our knowledge of the exact content and organization of subtelomeric gene families in species other than P. falciparum (3D7; 575 genes) remains incomplete. Although some general conservation might be anticipated between species, the true picture of multigene family diversity and organization, and the relationship to expression and function, can only emerge from increased genome sequencing. But what is clear is that the subtelomeric localization of these gene families should promote recombination, in turn generating diversity and therefore confusing synteny17,18.

This tendency to diversify is exemplified by the recent analysis of members of a subtelomeric gene family present in RMPs and P. falciparum. These genes were first identified as two different species-specific families through BLAST analyses within the RMPs (pyst-b) and P. falciparum (pf-fam-b), yet could be classified as members of the same interspecies gene superfamily (renamed pfmc-2tm) only through shared predicted protein structure (basic proteins with two transmembrane (TM) domains), as they lacked obvious sequence similarities78. Shared structural features of proteins encoded by subtelomeric gene families also suggest the existence of a gene superfamily within P. falciparum that includes both rif and stevor genes78. Interestingly, this superfamily might well be extended to include the subtelomeric pir genes found in other Plasmodium species, again indicating the rapid gene evolution that is one consequence of their subtelomeric location. Supplementary information S1 (table) provides an overview of all P. falciparum-specific, RMP-specific and their common subtelomeric gene families.

Disruption of synteny by species-specific genes. Analysis of the 168 chromosome-internal P. falciparum-specific genes that are not present in the RMP genomes revealed that 126 of these genes disrupt synteny within the synteny blocks (intrasyntenic genes), with 42 located at the synteny breakpoints between synteny blocks19 (intersyntenic genes; Fig. 2). Curiously, the synteny breakpoints in the RMPs only harboured five intersyntenic genes. Most P. falciparum intra- (62%) and intersyntenic (88%) genes encode predicted exported proteins that are destined for the membrane surface of the merozoite or the infected erythrocyte (including 13 var and 20 rif genes in the intra- and intersyntenic regions, respectively; Fig. 2), and therefore are probably involved in parasite–host interactions. The presence of species-specific genes at synteny breakpoints suggests that gross chromosomal rearrangements might also have helped shape the species-specific gene content of the genomes of Plasmodium species. Evidence for the association between such gross chromosomal rearrangements and the generation of species-specific gene families has been found for a gene family encoding transforming growth factor-β (TGF-β) receptor-like serine/threonine protein kinases (pftstk)19, consisting of 21 copies in P. falciparum (and possibly P. reichenowi) compared with one copy for all other malaria species. This gene family is the first gene family for which a single progenitor gene in other Plasmodium species has been identified and which appears to have expanded relatively recently only in P. falciparum and P. reichenowi, possibly as the result of a specific adaptation to the (common ancestor of) human and chimpanzee hosts.

Figure 2: Inter- and intrasyntenic var clusters of Plasmodium falciparum chromosome 4.

Comparison of the P. falciparum and composite rodent malaria parasite (RMP) genomes revealed the presence of P. falciparum-specific gene clusters either marking synteny breakpoints or disrupting synteny blocks, termed inter- and intrasyntenic regions, respectively. Here, two chromosome-internal var clusters of P. falciparum chromosome 4 (Pfchr4) are shown, one linking regions with synteny to RMP chromosomes 6 (green) and 7 (pink), the other disrupting a region syntenic to RMP chromosome 5 (lavender). The annotations of the syntenic (coloured) genes flanking the regions are provided and the P. falciparum-specific genes (white) are designated as follows: V, var genes; R, rif genes; H, hypothetical protein. The vicar elements are shown in grey. All pseudogenes are marked with a trident.

The analysis of the location of P. falciparum-specific genes using the synteny maps revealed that chromosome-internal rearrangements might have influenced the diversity and complexity of the Plasmodium genome, increasing the ability of the parasite to interact with its vertebrate host successfully. Furthermore, it indicates that determination of the synteny breakpoints might help to rapidly identify the species-specific gene content of future Plasmodium genomes.
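The distinction between intra- and intersyntenic genes can be expressed as a simple rule on gene order: a cluster of species-specific genes is intrasyntenic if the syntenic genes on both sides belong to the same synteny block, and intersyntenic if they belong to different blocks. The following sketch is only an illustration of that rule; the function name, the block labels and the toy chromosome (loosely modelled on the arrangement in Fig. 2) are assumptions, not the procedure used in the cited analysis.

```python
from itertools import groupby

def classify_clusters(genes):
    """genes: ordered list of (gene_id, block) along one chromosome, where
    block is a synteny-block label, or None for a species-specific gene.
    Returns [(gene_ids, label), ...] for each species-specific cluster."""
    # Collapse the chromosome into runs of consecutive genes sharing a label.
    runs = [(block, [gene for gene, _ in group])
            for block, group in groupby(genes, key=lambda item: item[1])]
    results = []
    for i, (block, ids) in enumerate(runs):
        if block is not None:
            continue  # syntenic run, nothing to classify
        left = runs[i - 1][0] if i > 0 else None
        right = runs[i + 1][0] if i + 1 < len(runs) else None
        if left is None or right is None:
            label = "unflanked"  # cluster at a chromosome end (hypothetical label)
        elif left == right:
            label = "intrasyntenic"  # interrupts a single synteny block
        else:
            label = "intersyntenic"  # sits at a breakpoint between two blocks
        results.append((ids, label))
    return results

# Toy chromosome with hypothetical gene IDs and block labels.
chromosome = [
    ("g1", "RMP6"), ("g2", "RMP6"),
    ("var1", None), ("rif1", None),
    ("g3", "RMP7"), ("g4", "RMP5"),
    ("var2", None),
    ("g5", "RMP5"),
]
clusters = classify_clusters(chromosome)
```

Applied to a new genome aligned against an existing synteny map, the same rule would flag candidate species-specific clusters at breakpoints without requiring prior annotation of the genes themselves.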

Transcriptomes and proteomes of RMPs

Although global transcription profiles might only be correctly interpreted when a whole-genome database is available, intelligent applications of cDNA-based technologies were initiated well before the publication of homologous genome data. Several RMP EST libraries and one P. vivax library79 have been produced, generating tens of thousands of sequences80,81,82,83 that can be compiled separately or in a common database, allowing investigations of transcript-specific features such as splicing. In addition, several stage-specific enriched suppression subtractive-hybridization libraries for RMPs throughout development in the mosquito have been generated41,84. These data have not only pinpointed stage-specific transcripts, but also confirmed the conserved nature of invasive organelles that are associated with host-cell invasion by different stages of Plasmodium. In keeping with their morphological similarities, certain invasive organelle proteins are expressed in more than one invasive stage80.

DNA microarray studies covering 70% of the genes of P. berghei have been performed on blood-stage parasites and generally support the 'transcripts-to-go' model24. Transcription profiles of purified immature and mature gametocytes showed that these forms share many of the cellular processes common to asexual blood-stage parasites, but enter G0 (G1 arrest)33. The switch from asexual to sexual development involves significant reprogramming of the transcriptional activity of the parasite76 (25% of the genes on the array were upregulated) carried out on the background of ongoing basic cellular processes33.

An extensive high-throughput proteome survey has been carried out on five stages of the P. berghei life cycle, including the first survey of Plasmodium ookinetes and oocysts33. The study uncovered many predicted ookinete surface proteins that can be explored for their transmission-blocking potential. In addition, this study revealed that the variable antigenic PIR proteins are not only expressed in the blood stages but are expressed as virtually non-overlapping subsets in many different life-cycle stages, such as gametocytes and sporozoites. This expression pattern is more reminiscent of RIFINs (repetitive interspersed family) in P. falciparum, expression of which is also not exclusive to blood stages, than of PfEMP1, suggesting that PIR and RIFIN proteins have multiple functions in their respective hosts and are not just involved in antigenic variation during the blood stages. Through comparison of the proteomes (1,836 proteins in total) of the five different stages, a dichotomous strategy of protein expression was visible: the stage-specific expression of proteins that are directly involved in the interaction between the parasite and the different host cells (43 sporozoite-, 372 asexual-blood-stage-, 127 gametocyte-, 317 ookinete- and 89 oocyst-specific proteins), coupled to more constitutive expression of 136 proteins present in at least four of the investigated stages underlying the conserved cellular machinery of the parasite in most of the different life-cycle stages. Again, this relatively low number of constitutively expressed genes is probably due to the lower coverage of proteomic techniques. Conserved elements of the cellular machinery include organelle components associated with cell invasion by the parasite and were not necessarily what might usually be considered housekeeping proteins33.

Proteomic analysis of the merozoite rhoptries of three RMPs identified 36 potential rhoptry proteins85. Comparison of these 36 RMP rhoptry proteins with the RMP–P. falciparum synteny map revealed that at least four genes are located in the subtelomeric regions and are therefore species-specific (T.W.A.K., unpublished observations). When the RMP rhoptry proteome was compared with the GPI-APs reported for the late-schizont and merozoite stages of P. falciparum44, only two proteins were found to be conserved. These proteins were also detected in the MCs43 and infected erythrocyte membranes45 of mature trophozoites and schizonts, respectively, further highlighting the technical difficulties of obtaining pure subcellular parasite fractions. Indeed, five additional RMP rhoptry proteins were found in these data sets but not in the GPI-AP data set. A comparison with the genome of P. vivax (a parasite that also preferentially invades reticulocytes), once that genome is released, might indicate that some of these 'species-specific' rhoptry proteins are actually conserved and can be functionally analysed for their role in determining the type of erythrocyte that the merozoite invades.
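The dichotomy between stage-specific and constitutive expression reported for the five-stage P. berghei survey amounts to counting, for each protein, the number of stages in which it was detected. A minimal sketch follows, assuming a simple presence/absence table; the protein identifiers are invented, and the threshold of four stages mirrors the definition of constitutive expression quoted in the text.

```python
from collections import Counter

def expression_classes(detected, constitutive_min=4):
    """detected: protein id -> set of life-cycle stages in which it was found.
    Returns (Counter of stage-specific proteins per stage, set of constitutive ids)."""
    stage_specific = Counter()
    constitutive = set()
    for protein, stages in detected.items():
        if len(stages) == 1:
            (stage,) = stages  # the single stage in which the protein was seen
            stage_specific[stage] += 1
        elif len(stages) >= constitutive_min:
            constitutive.add(protein)
    return stage_specific, constitutive

# Toy presence table (hypothetical protein IDs, not survey data).
detected = {
    "prot_1": {"sporozoite"},
    "prot_2": {"ookinete"},
    "prot_3": {"ookinete"},
    "prot_4": {"asexual blood stage", "gametocyte"},
    "prot_5": {"sporozoite", "asexual blood stage", "gametocyte", "ookinete"},
}
stage_specific, constitutive = expression_classes(detected)
```

Because a protein that escapes detection in one stage is counted as absent, incomplete proteome coverage inflates the stage-specific classes and deflates the constitutive class, which is consistent with the caveat about coverage made for the survey.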

In addition to the proteomes of ookinetes and oocysts of P. berghei, which have not yet been reported for a human malaria parasite, the individual proteomes of the male and female gametocytes have been analysed in P. berghei. The two proteomes contained 36% (236 of 650) and 19% (101 of 541) sex-specific proteins, respectively86. The protein content of the male gametocytes was the most distinct of all the proteomes reported for the life-cycle stages and shared only 69 proteins with the female gametocyte, showing the divergent features of both sexes. This proteome analysis revealed the presence of sex-specific phosphatases and protein kinases that are involved in gender-specific signalling pathways. Figure 3 provides a schematic overview of the available transcriptome and proteome data sets of the different P. falciparum and RMP life-cycle stages.

Figure 3: Transcriptomes and proteomes of different Plasmodium falciparum and rodent malaria parasite (RMP) life-cycle stages.

Transcriptome and proteome studies have covered a wide range of Plasmodium life-cycle stages, although no detailed studies have yet been published for the less-accessible liver stages, the short-living zygotes or oocyst/haemocoel sporozoites. The coloured books indicate stages for which P. falciparum and RMP transcription and expression data are available. Incomplete data sets or data sets of less pure stages are indicated by incompletely coloured books. Life cycle modified with permission from Nature Ref. 103 © (2002) Macmillan Publishers Ltd.

Meta-analyses and practical applications

The data sets from various malaria-parasite genome sequences and significant proteome and transcriptome surveys from at least two species provide a unique opportunity to perform comparative analyses and examine aspects of the biology of Plasmodium that would not be possible with data sets from a single species. Several studies have already been published that show how the use of different combinations of global Plasmodium databases can generate novel insights into parasite–host interactions with potential therapeutic value. Comparative post-genomic analysis of RMP genomes allowed additional detail to be teased out of the predicted protein sequences of orthologous genes. Calculation of the nonsynonymous (dN) versus synonymous (dS) nucleotide substitutions can reveal genes encoding more rapidly evolving proteins (high dN/dS values) compared with more conserved proteins (low dN/dS values)87,88. Not surprisingly, in RMPs, proteins containing predicted signal peptide (SP) sequences and/or TM domains encoding potential secreted or surface proteins that might be exposed to the host immune response showed the highest dN/dS ratios33. Analysis of the expression data generated by both transcriptome and proteome studies showed that a significantly greater number of blood-stage SP/TM proteins had high dN/dS ratios compared with mosquito-stage SP/TM proteins. This difference could reflect amino-acid changes that have accumulated as a consequence of interactions with the host immune response and, therefore, could identify genes that are under selective immune pressure.
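To make the dN/dS calculation concrete, the sketch below follows the general counting approach of Nei and Gojobori for a pair of aligned coding sequences, with a Jukes–Cantor correction. It is a simplified illustration rather than the procedure used in the cited studies: it assumes in-frame, gap-free, upper-case sequences of equal length without internal stop codons, it counts mutations to stop codons as nonsynonymous, and it does not exclude mutational pathways that pass through stop codons. All function names are assumptions of this example.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Expected number of synonymous sites in one codon (0 to 3)."""
    total = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total

def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all
    orders in which the differing positions could have changed."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = non = 0.0
    for order in paths:
        current = c1
        for p in order:
            nxt = current[:p] + c2[p] + current[p + 1:]
            if CODE[current] == CODE[nxt]:
                syn += 1
            else:
                non += 1
            current = nxt
    return syn / len(paths), non / len(paths)

def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return -0.75 * log(1 - 4 * p / 3) if p < 0.75 else float("inf")

def dn_ds(seq1, seq2):
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    pairs = list(zip(codons1, codons2))
    S = sum(syn_sites(a) + syn_sites(b) for a, b in pairs) / 2
    N = 3 * len(pairs) - S
    Sd = sum(count_diffs(a, b)[0] for a, b in pairs)
    Nd = sum(count_diffs(a, b)[1] for a, b in pairs)
    dS = jukes_cantor(Sd / S) if S else 0.0
    dN = jukes_cantor(Nd / N) if N else 0.0
    # The ratio is undefined when no synonymous change is observed.
    ratio = dN / dS if 0 < dS < float("inf") else None
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "dN": dN, "dS": dS, "ratio": ratio}

# Toy pair: one synonymous (GCT/GCC) and one nonsynonymous (AAA/AGA) change.
result = dn_ds("ATGGCTAAA", "ATGGCCAGA")
```

In practice such ratios are computed over full-length orthologue alignments with dedicated software, and very short or very divergent alignments give unstable estimates; the toy pair here only demonstrates the bookkeeping.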

Most methods to detect genes under natural selection are based on the comparative analysis of sequences within and between species. However, the single-genome-based method called codon volatility89 defines the proportion of point mutations that result in codons encoding a different amino acid. Although this approach has been questioned90,91, observations have confirmed that genes under selective pressure, such as var genes, contain relatively more volatile codons as opposed to genes that are under strong purifying selection to maintain their protein sequence, such as housekeeping genes89.
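The idea of codon volatility can be illustrated with a few lines of Python. In this sketch, the volatility of a codon is taken to be the fraction of its single-nucleotide neighbours that encode a different amino acid; excluding neighbours that are stop codons is an assumption of this example, and the statistical comparison between synonymous codons that the published method relies on is omitted.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def volatility(codon):
    """Fraction of non-stop point-mutation neighbours encoding another amino acid."""
    neighbours = [codon[:i] + base + codon[i + 1:]
                  for i in range(3) for base in BASES if base != codon[i]]
    sense = [n for n in neighbours if CODE[n] != "*"]  # assumption: ignore stops
    changed = sum(CODE[n] != CODE[codon] for n in sense)
    return changed / len(sense)

def gene_volatility(seq):
    """Mean volatility over the sense codons of an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    values = [volatility(c) for c in codons if CODE[c] != "*"]
    return sum(values) / len(values)

# Two synonymous arginine codons with different volatility.
cga = volatility("CGA")
aga = volatility("AGA")
```

Both codons encode arginine, yet a point mutation in AGA is more likely to change the amino acid than one in CGA; it is this difference between synonymous codons that the method exploits.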

An elegant method combining genome and proteome data to identify novel P. falciparum antigens used a strategy to mine genomic-sequence databases using epitope predictions to identify novel sporozoite antigens and epitopes recognized by experimentally vaccinated humans92. Such an approach could lead to the generation of an antigen map of sporozoite/liver stages ('immunosome'). Another novel method to identify antigens using genomic data has been described for P. chabaudi. This approach, termed linkage-group selection, is based on crossing two genetically different Plasmodium lines and then applying a selective pressure, in this study immune pressure, on the recombinant progeny. Subsequent analysis of the decrease in the frequency of parental alleles in the progeny after immune pressure by using quantitative genome-wide molecular markers can identify genome loci containing genes encoding proteins that were under immune selective pressure93. A third method to identify new antigens based on the genome and proteome data has been developed using P. yoelii. Exons of genes encoding sporozoite proteins were cloned in a DNA-immunization vector using high-throughput methods. These vectors were then used to immunize mice that were subsequently analysed for their protection against sporozoite infection94.
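The logic of linkage-group selection lends itself to a short illustration: markers linked to a locus under selection show a drop in the frequency of the allele inherited from the selected-against parent. The sketch below assumes that marker allele frequencies have already been measured in unselected and selected progeny; the marker names, the frequencies and the threshold are all invented for the example.

```python
def selection_valleys(unselected, selected, min_drop=0.3):
    """unselected/selected: marker -> frequency of the allele from the parent
    targeted by the selection pressure. Returns (marker, drop) pairs for
    markers whose frequency fell by at least min_drop, largest drop first."""
    drops = {m: unselected[m] - selected[m] for m in unselected if m in selected}
    hits = [(m, round(d, 3)) for m, d in drops.items() if d >= min_drop]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Toy frequencies (hypothetical markers on two chromosomes).
unselected = {"chr8_m1": 0.5, "chr8_m2": 0.5, "chr8_m3": 0.5, "chr3_m1": 0.5}
selected = {"chr8_m1": 0.35, "chr8_m2": 0.1, "chr8_m3": 0.15, "chr3_m1": 0.45}

valleys = selection_valleys(unselected, selected)
```

Neighbouring markers that drop together, as chr8_m2 and chr8_m3 do in this toy example, would delimit a candidate region to be searched for genes encoding targets of the immune pressure.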

Combined analysis of transcriptome and proteome data gives an insight into the regulation of transcription and protein expression. The genome of P. falciparum contains only a limited number of genes encoding transcription-associated proteins (one-third of the number usually found in the genomes of free-living eukaryotes95). However, proteins containing CCCH-type zinc-finger motifs, which are often associated with modulation of mRNA decay and translation rates, are abundant95, suggesting that post-transcriptional processes have a significant role in regulating P. falciparum protein levels. Bioinformatic analysis comparing mRNA-transcript and protein-abundance levels for seven different stages of the P. falciparum life cycle indeed implied mechanisms of post-transcriptional control, either involving interplay between mRNA stability and degradation, gene-specific control of mRNA translation, or a combination of both34. The combination of transcriptome and proteome data for P. berghei also demonstrated the presence of post-transcriptional control of gene expression in gametocytes through translational repression. Translational repression was known to affect the expression of two gametocyte-specific transcripts that encode vaccine-candidate antigens (P28 and P25) translated only in the zygote just after fertilization96,97. Comparison of gametocyte transcriptomes and proteomes with the P. berghei ookinete proteome identified nine genes undergoing translational repression, and a sequence motif putatively involved in translational repression was subsequently identified in the 1-kb region downstream of these genes. This motif is not conserved in P. falciparum but shares a conserved sub-motif, the nanos response element (NRE), to which RNA-binding proteins of the pumilio family (PUF) can bind and which has a role in translational repression98. Similar analysis of P. falciparum identified two genes that contain an NRE in their 3′ untranslated region which have abundant transcripts in the gametocyte stage, whereas the proteins they encode are significantly more abundant in the gamete stage34.
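The comparison used to identify translationally repressed genes can be summarized as a filter across three data sets: a transcript is abundant in gametocytes, the corresponding protein is not detected in gametocytes, and the protein is detected in a later stage. A minimal sketch follows, in which the gene identifiers, the abundance values and the transcript threshold are hypothetical and the data structures are an assumption of this example.

```python
def repression_candidates(gametocyte_mrna, gametocyte_proteins,
                          ookinete_proteins, min_mrna=10.0):
    """Return genes transcribed in gametocytes whose protein first
    appears in the ookinete proteome.

    gametocyte_mrna: gene -> transcript abundance (arbitrary units)
    gametocyte_proteins, ookinete_proteins: sets of detected gene products
    """
    return sorted(
        gene for gene, level in gametocyte_mrna.items()
        if level >= min_mrna                 # transcript is abundant
        and gene not in gametocyte_proteins  # but no protein in gametocytes
        and gene in ookinete_proteins        # protein appears after fertilization
    )

# Toy data (hypothetical gene IDs and values).
gametocyte_mrna = {"gene_a": 120.0, "gene_b": 85.0, "gene_c": 2.0, "gene_d": 60.0}
gametocyte_proteins = {"gene_b"}
ookinete_proteins = {"gene_a", "gene_b", "gene_c"}

candidates = repression_candidates(gametocyte_mrna, gametocyte_proteins,
                                   ookinete_proteins)
```

A filter of this kind yields candidates only: the absence of a protein from a proteome can also reflect limited detection sensitivity, so supporting evidence such as a shared sequence motif in the untranslated regions remains important.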

A detailed understanding of the specific mechanisms of transcriptional and translational control in Plasmodium might reveal novel therapeutic targets and strategies. For example, targeting the unlocking (derepressing) of translational repression in gametocytes circulating in the blood might lead to inappropriate expression of gametocyte-specific translationally repressed transcripts, possibly resulting in both the inhibition of further gametocyte development and exposure of their protein products (including current vaccine candidates) to the host immune system, generating transmission-blocking immune responses.

Concluding remarks

Malaria research is in a period of intense data collection, ensuring that the 'labels' on each gene in the Plasmodium genomes and the proteins that they encode are correct. The wealth of information on gene transcription and protein expression, in addition to the wide range of new techniques that are now available, will prove essential, as it is only through the lens of full and accurate annotation and protein characterization that we will be able to make sense of, and exploit, the genome information. One of the difficulties that malaria researchers face is that some of the life-cycle stages are difficult to retrieve in pure form, if accessible at all. Although some stages might be accessible in other Plasmodium species, such as the RMPs, access to the P. falciparum mosquito stages and the availability of a system to study liver-cell invasion by sporozoites will be indispensable to gain a comprehensive overview of the complete P. falciparum life cycle. One can only hope that intriguing stages such as the P. vivax hypnozoites (the dormant parasites present in liver cells), which are so few in number and small in size, will one day become accessible for genome-scale analyses. Furthermore, the availability of pure subcellular organelles and structures from the different life-cycle stages, including rhoptries, micronemes, dense granules and MCs, might shed light on biological processes such as host-cell invasion and modulation.

Although drug- and vaccine-discovery programmes are already (and rightly) underway as a result of the availability of the Plasmodium genomes, the hard choices will be at the level of inclusion or exclusion of drug targets or vaccine-candidate antigens for further development. However, the increased knowledge of structural and functional properties of a large number of Plasmodium proteins as well as antigenic properties of vaccine candidates will greatly benefit the decision-making process. More educated lead development might also come from large-scale gene-disruption studies. Reverse genetics is a powerful approach that is used in malaria research to specifically alter the parasite genome to explore its biology and gain new insights into gene function and expression. Recently, several reverse-genetic techniques using transposons99,100 have become available, but so far no genome-wide studies have been published. In a post-genomic setting, reverse genetics should be one of the principal technologies to be applied to increase our understanding of parasite biology.

HIV/AIDS is a devastating disease that the world has only known for 30 years and for which there is the prospect of combating and controlling the disease and its transmission, should the financial resources be made available to provide the drugs that have been developed. Conversely, malaria is an ancient disease that has been known for thousands of years, and the aetiological agent was first recognized more than 100 years ago, yet malaria is a steadily worsening scourge, with new therapeutics some distance away. Although the financial support for malaria research has improved, significant investment is still required at all levels of investigation, development and application to realize the potential of the P. falciparum genome and translate its promise into a tangible effect.