Main

Systems biology, also known as integrative biology, combines information from many different sources, including gene-expression data, two-hybrid genetic-interaction data, protein-interaction data from the systematic analysis of protein complexes, and protein-expression data. This enables gene functions to be predicted and networks to be elucidated. Through the use of genome sequences, automation and parallel technologies such as microarrays, large-scale data sets containing consistent, high-quality data can be created, assembled and evaluated at significant cost savings over single-gene analyses. Novel networks can also be revealed. Some of the genome-scale data sets that are available for model organisms such as Saccharomyces cerevisiae (Table 1) have resulted from the phenotypic analysis of a set of deletion strains for every gene1,2, two-hybrid screens3,4, gene-expression analyses5,6, protein-complex co-precipitation studies7, subcellular-localization and protein-expression studies8,9, comparative sequencing and analysis of genetic diversity10,11, and bioinformatic analysis of pathways12. Recent research in yeast has concentrated on synthesis of the combined data to characterize networks and protein interactions13,14. Because the yeast genome is well annotated, it is an excellent model for the development of new systematic methods. The challenge will be to translate these methods to organisms that affect human health, such as the human malaria parasite Plasmodium falciparum. This article describes how these approaches are accelerating malaria research, and speculates on how they might lead to the development of new therapies or vaccines for malaria.

Table 1 Methods for generating data that can be used in systematic analyses

As the causative agents of human malaria, Plasmodium parasites are important contributors to the global morbidity and mortality rates — 300–500 million cases of malaria occur each year. In Africa, it is estimated that malaria causes approximately 1.5–2.7 million deaths annually, primarily in children under five (reviewed in Ref. 15), and the disease also poses an important health threat to travellers. The economic cost of malaria to the developing world is enormous. It has been estimated that the gross national product per capita is reduced by more than 50% in malarious countries compared with non-malarious ones16.

A malaria vaccine would impact the lives of many people who live or travel in disease-endemic areas, and much effort has been devoted to this end. The feasibility of a human-malaria vaccine is discussed in Refs 17,18. Studies have shown that infective, sporozoite-stage parasites (Fig. 1), which had been irradiated so that they could invade, but not replicate, in the host, could elicit an immune response that subsequently protected from reinfection19,20. Also, humans who live in malaria-endemic regions gradually cease to show symptoms of the disease, although parasites can still be found in their bloodstream. The observation that administration of gamma globulin isolated from adults with so-called 'naturally acquired' immunity can reduce parasitaemia in recipients is consistent with the development of antigen-specific acquired immunity in humans21. However, although a recent subunit-vaccine formulation based on the sporozoite-expressed circumsporozoite protein (CSP) has shown some promise22, there is currently no licensed vaccine despite extensive testing of several candidates23. There are several reasons why the development of a malaria vaccine poses a considerable challenge: blood-stage parasites alter the complement of antigens that are on the surface of the red blood cell, or on the surface of the merozoite during its brief extracellular phase; the parasite population is genetically diverse; our understanding of the human immune response to malaria is incomplete; or the right antigen has yet to be discovered.

Figure 1: The malaria life cycle.

Plasmodium species are parasites of red blood cells and hepatocytes. Haploid sporozoites are injected into the vertebrate host when a female mosquito takes a blood meal. They rapidly invade hepatocytes, where they undergo asexual multiplication, generating several thousand merozoites. Merozoites released from hepatocytes then invade erythrocytes, where they again multiply and mature progressively from ring-stage parasites to trophozoites to schizonts. The erythrocyte eventually ruptures, releasing 8–32 new merozoites, which can infect new erythrocytes. The asexual cycle takes about 42–48 hours, during which time the human host has periodic cycles of fever and chills. In response to a cue that is not understood, some of the parasites exit the asexual cycle and form male and female gametocytes in a process known as gametocytogenesis. The mosquito takes up the mature sexual forms, and sexual reproduction of the parasite occurs in the insect's midgut. Sporozoites released from the oocyst move to the mosquito salivary glands, which facilitates parasite transmission. Pre-erythrocytic vaccines, which would prevent infection, are targeted against proteins that are present in the sporozoite or liver stages, for example, circumsporozoite protein (CSP), thrombospondin-related adhesion protein (TRAP), exported protein 1 (EXP1), liver-stage antigen-1 (LSA1) and LSA3 (Ref. 15). Erythrocytic vaccines, which would reduce the severity of the disease, target proteins that are expressed in the merozoite phase of the life cycle: apical membrane antigen-1 (AMA1), merozoite surface protein-1 (MSP1), MSP2–5, P. falciparum erythrocyte membrane protein-1 (PfEMP1), ring-infected erythrocyte surface antigen precursor (RESA), erythrocyte-binding antigen (EBA) and glutamate rich protein (GLURP)15. 
Furthermore, some research has focused on developing vaccines against sexual-stage antigens (Pfs230, Pfs48/45, Pfs25/28), which would not reduce symptoms but could reduce transmission of the disease15. Most drugs target erythrocytic parasites, but some are active against liver and sexual stages as well24. Figure modified with permission from Ref. 40 © (2002) Macmillan Magazines Ltd.

Although vector-control strategies such as bednets can reduce the spread of malaria, drugs remain crucial weapons for preventing infections as well as reducing transmission, symptoms and mortality from the disease. However, resistance to inexpensive drugs such as chloroquine has emerged and has spread quickly, and multidrug-resistant Plasmodium strains are now common (reviewed in Ref. 24).

Systems biology for malaria

To rationally develop new therapies for malaria, molecular details of parasite stage-specific development need to be understood. This is more difficult for Plasmodium than for many other microbial pathogens. Plasmodium has a complex life cycle, and although P. falciparum, the species responsible for most human deaths, can be cultured in human erythrocytes25, many stages of its life cycle cannot be easily maintained in cell culture. Cell culture of the mosquito stages has been described for rodent parasites26, but is not yet widely used. Other human Plasmodium species such as Plasmodium vivax, Plasmodium malariae and Plasmodium ovale also cause disease, and these species are even more refractory to experimental manipulation. P. vivax can only be cultured in human reticulocytes, which are difficult to obtain27. Transient transfection and gene disruption have been achieved in several Plasmodium species28,29,30,31 (see Supplementary information S1 (table)), but the process is labour intensive and, until recently, inefficient. Because the malaria parasites are haploid for most of their life cycle and because stable transfection has been reported only for erythrocytic stages, mutants that bear deleterious mutations in genes that are essential for parasite growth in erythrocytes are not easily recovered. The DNA of most Plasmodium species is AT rich (up to 90% in non-coding regions), and long, homopolymeric tracts of A and T recombine when cloned in Escherichia coli or are unclonable in some circumstances, making vector construction, subcloning and sequencing time-consuming (reviewed in Ref. 32).

When the complete genome sequence of P. falciparum was determined in 2002 (Ref. 33), it was shown to encode more completely uncharacterized genes ('hypothetical proteins') than other lower single-celled eukaryotes such as S. cerevisiae, in which 75% of putative protein-coding open reading frames are now characterized33,34,35,36. At least 65% of the genes identified in the P. falciparum project are described as 'hypothetical', indicating that they showed no significant homology to characterized genes from other species. Many others are described only as a 'kinase' or 'putative cysteine protease'. Although divergence from the ancestral eukaryotic tree might have reduced the number of statistically significant matches to characterized, homologous genes from other species when conventional search methods are used37, the low number of homologous genes might also reflect historically low levels of malaria-research funding. Also, some genes might be involved in parasite-specific processes that are not found in other model organisms. Because of difficulties in malaria research, the number of genes that remain hypothetical is unlikely to be reduced quickly if conventional gene-by-gene approaches are used.

Having as much information as possible about potential drug targets, such as proteases, kinases or other enzymes, helps in selecting truly novel targets from this list of hypothetical or uncharacterized genes. Although finding chemical inhibitors of known, validated targets, such as dihydrofolate reductase, can lead to new drugs, in many cases the lack of chemical diversity in the libraries used in drug-screening efforts can limit the development of novel classes of antimalarials against known targets. Finding a novel chemical scaffold would be ideal, because parasite resistance to one class of compounds might result in cross-resistance to similar classes.

The availability of genome sequences has been a tremendous boon to malaria researchers and has already facilitated the search for new drugs. Through comparative genome analysis, enzymatic pathways have been discovered that are not found in humans. Researchers in public–private partnerships (the Medicines for Malaria Venture) are working to discover novel inhibitors to interrupt some of these pathways, such as the non-mevalonate pathway of isoprenoid biosynthesis36.

Expression analysis and gene function

For sequenced microorganisms, DNA microarrays can be used to inexpensively determine the expression programme of almost every gene in the genome. The expression programme can be used to begin to predict the functions of uncharacterized genes. For example, in both prokaryotes and eukaryotes, genes that encode components of multiprotein complexes or pathways are often transcriptionally co-regulated. In prokaryotes, such genes are members of an operon, with a single promoter. In eukaryotes, co-regulated genes might have similar promoter elements in their upstream regions. In S. cerevisiae, reliable functional annotations are available for most genes, so that the hypothesis that genes encoding proteins involved in similar processes are co-transcribed can be mathematically validated. Indeed, in S. cerevisiae, genes that encode members of protein complexes such as the ribosome, the proteasome and the DNA polymerase complex or genes involved in metabolic pathways such as electron transport are under tight transcriptional control and show similar patterns of induction or repression under different conditions5,6. Gene function might be assigned in P. falciparum by examining gene-expression profiles in different conditions and sorting genes into groups (clusters) that have similar expression patterns. If some gene functions are known, others can be inferred. For example, the rhoptry is a specialized invasion organelle in P. falciparum. Clusters of genes that contain known rhoptry genes could be identified to pinpoint unassigned potential rhoptry genes.
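The 'guilt-by-association' reasoning described above can be illustrated with a minimal sketch. It assumes that expression profiles are available as lists of log ratios across life-cycle time points and ranks unannotated genes by their mean Pearson correlation with a set of known genes. The gene names, the values and the 0.9 threshold are hypothetical and chosen only for illustration; published analyses typically use dedicated clustering algorithms rather than this simple ranking.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no usable correlation
    return cov / (sx * sy)

def candidates_by_coexpression(profiles, known, threshold=0.9):
    """Return unannotated genes whose mean correlation with the known set
    is at least `threshold`, sorted from highest to lowest correlation."""
    hits = []
    for gene, profile in profiles.items():
        if gene in known:
            continue
        mean_r = sum(pearson(profile, profiles[k]) for k in known) / len(known)
        if mean_r >= threshold:
            hits.append((gene, round(mean_r, 3)))
    return sorted(hits, key=lambda h: -h[1])

# Hypothetical log-ratio profiles across five time points (made-up numbers).
profiles = {
    "rhoptry_known_1": [0.1, 0.2, 0.9, 2.5, 3.0],
    "rhoptry_known_2": [0.0, 0.3, 1.1, 2.4, 2.8],
    "hypothetical_A":  [0.2, 0.2, 1.0, 2.6, 3.1],
    "hypothetical_B":  [3.0, 2.2, 1.0, 0.3, 0.1],
}
known_rhoptry = {"rhoptry_known_1", "rhoptry_known_2"}
hits = candidates_by_coexpression(profiles, known_rhoptry)
print(hits)
```

In this toy example only the gene whose profile rises late in the time course, like the known rhoptry genes, is returned as a candidate; the gene with the opposite pattern is excluded.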

Gene-expression profiling is without doubt one of the most efficient approaches to extract large amounts of functional information. With amplification methods, ample quantities of parasite RNA (10–100 ng) can be extracted from almost every stage of the organism's life cycle35,38,39. In contrast to methods such as sequencing cDNA libraries or differential display, these data sets are comprehensive, which is the main power of the approach. It is just as important to know when a gene is not expressed as when it is expressed. In other single-celled parasites, such as trypanosomes, RNA editing has an important role in the control of protein expression. This has led to speculation that gene expression might be regulated differently in P. falciparum compared with other protozoan species40, especially as the P. falciparum genome seems to encode fewer conventional transcription factors and more RNA-binding proteins41. Preliminary analysis of gene-expression patterns has indicated that, similar to S. cerevisiae, transcripts that encode ribosomal proteins and other multiprotein complexes are tightly regulated during the erythrocytic life cycle, although the genes are distributed across all 14 chromosomes39. Although adjacent co-regulated genes are observed39,42, there is little evidence for a higher frequency of adjacent co-regulated genes than in S. cerevisiae6. As discussed below, there is preliminary evidence that genes with common expression patterns might also share sequence motifs in their upstream regions.

There are probably limits to the amount of information revealed by expression profiling, as some proteins are likely to be controlled at the level of transcription, others at the level of RNA stability, others at the translational level35,43 and others at the activity level44. Comprehensive proteomic analyses of different life-cycle stages of P. falciparum and Plasmodium berghei have been published35,42,45,46. In these so-called 'shotgun proteomics' experiments, protein extracts derived from relatively pure life-cycle-stage populations are digested and fractionated using chromatography or gel electrophoresis. The resulting peptides are then analysed using tandem mass spectrometry, and the sequences of detected peptides are compared with predicted proteins. Although it might be difficult to infer much from proteins that are detected once or twice across the malaria life cycle, this analysis can provide information about potential protein function if multiple high-confidence spectra are detected in just one (stage-specific) or in all (housekeeping) stages. Identifying the proteins in purified subcellular organelles, such as the rhoptries, provides additional functional data47.
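The distinction between stage-specific and housekeeping proteins can be sketched as a simple rule applied to spectrum counts. The example below assumes that the number of high-confidence spectra per protein and per stage has already been tabulated; the protein names, the counts and the cut-off of three spectra are hypothetical, and a real analysis would also need to account for protein length and sample complexity.

```python
def classify_by_spectra(spectral_counts, min_spectra=3):
    """Label each protein according to the stages in which it is supported
    by at least `min_spectra` spectra (an assumed, adjustable cut-off)."""
    labels = {}
    for protein, counts in spectral_counts.items():
        detected = [stage for stage, n in counts.items() if n >= min_spectra]
        if not detected:
            labels[protein] = "insufficient evidence"
        elif len(detected) == len(counts):
            labels[protein] = "housekeeping"
        elif len(detected) == 1:
            labels[protein] = "stage-specific: " + detected[0]
        else:
            labels[protein] = "multi-stage"
    return labels

# Hypothetical spectrum counts for four life-cycle stages (made-up numbers).
spectral_counts = {
    "protein_1": {"sporozoite": 12, "merozoite": 0, "trophozoite": 0, "gametocyte": 1},
    "protein_2": {"sporozoite": 8, "merozoite": 9, "trophozoite": 15, "gametocyte": 7},
    "protein_3": {"sporozoite": 1, "merozoite": 0, "trophozoite": 2, "gametocyte": 0},
    "protein_4": {"sporozoite": 0, "merozoite": 5, "trophozoite": 6, "gametocyte": 0},
}
labels = classify_by_spectra(spectral_counts)
for protein, label in labels.items():
    print(protein, label)
```

Proteins detected only once or twice fall into the 'insufficient evidence' class, which mirrors the caution expressed above about sparse detections.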

A significant challenge to the identification of novel malarial drug targets lies in identifying genes that are likely to be essential to parasite viability in erythrocytes. Drugs are seldom 100% effective at inhibiting protein activity and, therefore, if the protein is not crucial for survival, it will not be a good target. This is more important than determining whether or not a homologous enzyme is found in humans, because chemists can usually engineer small molecules that are selective for parasite protein and not human protein, providing a therapeutic window. Many effective and widely used antifungal drugs, for instance, target proteins that have close human homologues. Because Plasmodium is haploid for most of its life cycle, and stable integration of disruption constructs by stable transfection and homologous recombination has only been reported in erythrocytic stages, definitive genetic proof of a gene's essential function is difficult to obtain. The use of a tetracycline-regulatable promoter has been reported, and this might eventually allow gene-dosage experiments to be carried out48. On the other hand, essential proteins are more likely to be conserved in evolution49 and to have more interaction partners in protein–protein interaction networks. A comprehensive two-hybrid study in which 32,000 searches of P. falciparum baits against P. falciparum activation domains were carried out identified 2,846 protein–protein interactions50. Whereas having many interacting partners might be an indication that a protein is essential for viability, the identities of a protein's interacting partners might provide clues about the protein's function. 
For instance, if a protein interacts with a known protein that is involved in invasion, has an expression pattern that mimics those of other genes involved in invasion, and is encoded by a gene that carries promoter motifs indicating that it might be co-regulated with other invasion genes, it is probable that this protein is involved, directly or indirectly, in invasion.

The power of systems biology in malaria is that it allows predictions of gene function to be made from several independent lines of evidence and serves as a complement to the powerful traditional genetic methods that have proven so useful in model systems. However, tools are needed that can assign probabilities to predictions and that can flag interactions that are probably spurious.
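One simple way in which probabilities might be assigned to such predictions is a naive Bayes combination of evidence, sketched below. This is an illustration under stated assumptions and not a method taken from the studies cited here: each line of evidence is summarized as a likelihood ratio, the lines of evidence are treated as independent, and the prior and the likelihood ratios are hypothetical numbers.

```python
from math import exp, log

def posterior_probability(prior, likelihood_ratios):
    """Combine independent lines of evidence with naive Bayes:
    posterior odds = prior odds x product of likelihood ratios."""
    log_odds = log(prior / (1 - prior))
    for lr in likelihood_ratios:
        log_odds += log(lr)
    return 1 / (1 + exp(-log_odds))

# Hypothetical inputs: 2% of genes are assumed to be involved in invasion.
prior = 0.02
evidence = {
    "interacts with a known invasion protein": 8.0,  # assumed likelihood ratio
    "co-expressed with invasion genes": 5.0,         # assumed likelihood ratio
    "carries a shared upstream motif": 3.0,          # assumed likelihood ratio
}
p = posterior_probability(prior, evidence.values())
print(round(p, 2))
```

With these made-up numbers, three moderately informative observations raise a 2% prior to a posterior of about 0.71. A likelihood ratio close to 1 would flag evidence, such as a promiscuous two-hybrid bait, that adds little support.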

Blocking transmission and development

In response to a molecular signal that has not yet been identified, some blood-stage parasites exit the asexual erythrocytic cycle and undergo sexual development, differentiating into male and female gametocytes, which are taken up by the mosquito in a blood meal. These sexual forms of Plasmodium are responsible for transmission of the disease from one person to the next. An ideal antimalarial drug or drug combination would both kill rapidly multiplying asexual parasites and interfere with sexual development to block the transmission cycle. Signalling and developmental cascades such as those that might occur during sexual development or in the vector stages of the Plasmodium life cycle are common points of therapeutic intervention in humans. Because classic forward-genetic strategies cannot easily be applied to P. falciparum, identification of genes with important roles in Plasmodium developmental biology has been difficult, and compared with model organisms we know little about the identity of proteins that regulate growth and development. Knowing the identity of such proteins could provide new strategies for malaria control. For example, G-protein-coupled receptors and nuclear receptors, which bind small molecules and ultimately change transcriptional patterns, are some of the most important classes of drug targets in humans. It is likely that expression data from P. falciparum can be used to find novel transcription-factor-binding sites (motifs) that are associated with the transcriptional changes that occur during sexual development or other life-cycle stages. Sequences bearing these motifs could then be used to biochemically purify potential regulatory proteins. Using gene-expression information to find sequence motifs that are involved in controlling transcription has worked well in many other microorganisms, including bacteria51 and S. cerevisiae52, in which most (but not all) known transcription-factor-binding sites can be rediscovered. 
A comprehensive review and assessment of these techniques has recently been published53. Genes are first grouped by their common expression pattern52 or by their common function54. Then, the regions upstream of the start codon are searched systematically, either by enumerating all overrepresented sequences (such as ATGGAC) or by probabilistic methods, such as MEME10. In Plasmodium, site-directed mutagenesis of promoter sequences is considerably more difficult than in yeast. Therefore, expression profiling combined with bioinformatics analysis and comparative genomics (Box 1) is probably the most efficient way to identify specific motifs that can be used to construct stage-dependent reporters of gene activity or to identify the transcriptional regulatory protein that binds to the motif. A search of the 1,000-base-pair regions upstream of the start codons of genes that are expressed during sexual development in P. falciparum identified a statistically overrepresented novel palindromic sequence (TGTANNTACA)55. When compared with the frequency of the motif in all other upstream regions in the genome, the probability of this enrichment by chance is roughly 1 in 10^24. The motif is also statistically enriched upstream of homologous genes in P. berghei, Plasmodium yoelii and Plasmodium chabaudi, and it is preferentially found about 400–600 nucleotides upstream of the ATG in co-regulated genes but shows a random distribution in unregulated genes. Previous random exonuclease deletion mapping of a promoter carrying this motif56 showed that its disruption almost completely abolished promoter activity, providing strong evidence of its functional significance and the power of the bioinformatic approach. The next goal is to find the protein that binds to this or other motifs. If the activity of this transcription factor is controlled by the binding of a small molecule or a metabolite, an analogue of the small molecule could be used as a transmission-blocking drug. 
If other uncharacterized motifs are found to control the transcriptional regulation of known drug targets, other members of the regulatory pathway might be discovered among uncharacterized genes if they also bear the motif.
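The enrichment test that underlies this kind of motif search can be sketched as follows. The example counts how many upstream regions in a co-expressed cluster contain the palindromic motif TGTANNTACA and uses a hypergeometric test to ask how often at least that many would be expected by chance. The sequences and gene names are hypothetical and far shorter than real 1,000-base-pair upstream regions, and the hypergeometric test is one reasonable choice of statistic rather than necessarily the one used in the study cited above.

```python
import re
from math import comb

# Motif from the text; N is taken to mean any of the four bases.
MOTIF = re.compile(r"TGTA[ACGT]{2}TACA")

def has_motif(upstream_seq):
    """True if the upstream sequence contains at least one motif instance."""
    return MOTIF.search(upstream_seq.upper()) is not None

def hypergeom_upper_tail(k, cluster_size, motif_total, genome_size):
    """P(X >= k) when `cluster_size` genes are drawn without replacement from
    `genome_size` genes, of which `motif_total` carry the motif."""
    upper = min(cluster_size, motif_total)
    favourable = sum(
        comb(motif_total, i) * comb(genome_size - motif_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(genome_size, cluster_size)

def motif_enrichment(upstream, cluster):
    """Return (genes with motif in cluster, p-value) for a set of cluster genes."""
    motif_total = sum(has_motif(seq) for seq in upstream.values())
    k = sum(has_motif(upstream[g]) for g in cluster)
    return k, hypergeom_upper_tail(k, len(cluster), motif_total, len(upstream))

# Hypothetical upstream regions for a toy genome of ten genes.
upstream = {
    "sexual_1": "AAATGTACCTACAAAT",
    "sexual_2": "tttgtaggtacattt",
    "sexual_3": "CCTGTAATTACACC",
    "other_1": "GGTGTATATACAGG",
}
upstream.update({f"other_{i}": "ATATATATATATAT" for i in range(2, 8)})

k, p_value = motif_enrichment(upstream, ["sexual_1", "sexual_2", "sexual_3"])
print(k, round(p_value, 4))
```

In a genome-wide analysis the same calculation would be repeated for every candidate sequence, so the resulting probabilities would need to be corrected for multiple testing.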

Motifs can also be found by examining the proteome. Although quantitative proteomic methods that rely on enzymatic or metabolic labelling have not gained widespread use in malaria parasites, estimates of protein abundance can be determined by counting the number of peptide spectra that are detected for a protein using mass spectrometry for a particular life-cycle stage. Two studies have sought to compare protein and transcript levels in P. falciparum and P. berghei35,43. Both studies found a delay between the time a transcript appeared and the time a cognate protein appeared for some life-cycle stages, and concluded that post-transcriptional gene silencing might have a role in control of translation during sexual development. Furthermore, analysis of the upstream untranslated regions of P. berghei genes that are subject to post-transcriptional gene silencing revealed the presence of a 47-nucleotide sequence that was overrepresented among these genes35. The hypothesis that there might be a translational delay makes sense because the parasite must undergo a rapid morphological transition once it leaves the mammalian host and enters the mosquito. Therefore, the parasite might store transcripts in preparation for this transition. If a drug was identified that could relieve translational repression in maturing sexual stages, it might effectively sterilize these parasites, leaving them unable to complete their life cycle and preventing transmission.

Systems biology and vaccine development

Although discovering appropriate formulations and adjuvants will probably have a key role in the development of a successful vaccine, most research has tended to focus on a fairly small class of historical antigens (Fig. 1). The acquisition and analysis of genome-scale data sets through systems biology might assist in the identification of novel vaccine targets, which could theoretically be more protective. The study of gene-expression and proteomic data sets can reveal which of the 5,300 Plasmodium proteins are expressed in the invasive stages of the Plasmodium life cycle (sporozoites or merozoites) and might be the most promising antigens. Doolan et al.57 used proteomic and genomic data to identify 27 potential sporozoite-stage antigens. To determine whether these proteins were antigenic, peripheral blood mononuclear cells (PBMCs) were obtained from volunteers who had previously been immunized with an irradiated-sporozoite vaccine and subsequently challenged with infectious mosquitoes. The antigens were tested for their ability to induce and recall an ex vivo interferon-γ response in the PBMCs of these volunteers. Of the 27 proteins, 16 were antigenic in this assay. In some cases, the proportion of volunteers who showed a response to the antigen was higher than for antigens used in clinical trials, such as the CSP57. PBMCs of all the immunized volunteers reacted with one uncharacterized protein, PFL0800c, discovered by this genomic approach57. Both transcription39 and proteomic42 data indicate that this protein is one of the most highly expressed in the sporozoite stages.

Although data on stage-specific expression and the levels of expression in proteomic and transcription data allow the selection of novel proteins for further characterization as vaccine candidates, further information is provided by studying the frequency of polymorphisms in different genes across Plasmodium strains. Plasmodium is known to vary the presentation of antigens in successive generations58,59. Some of the antigenic variation is probably the result of mitotic recombination in members of multigene families, such as the mostly subtelomeric var genes, which encode versions of the P. falciparum erythrocyte membrane protein-1 (PfEMP1)60. Most of these proteins are weakly expressed, with a selective silencing mechanism in place to ensure that only one member of the family is fully transcribed at any time61. However, other P. falciparum immunogens, such as the CSP, the merozoite surface protein-1 (MSP1) or the apical membrane antigen-1 (AMA1), are not members of multigene families, are centromere proximal and are expressed at high levels. Levels of genetic variability in the introns of housekeeping genes are low in P. falciparum62 but in these immunogens, variability is exceptionally high63,64,65, implying that these genes are under intense selection pressure from the host. These and similar highly variable, uncharacterized proteins might be the targets of human naturally acquired immunity. Indeed, in P. chabaudi, mice that have been infected with one MSP1 variant cannot be reinfected by a different strain that carries the original MSP1 allele66. Genome-wide hybridization methods have been used to rapidly and inexpensively characterize genetic variability in yeast67, and this method also works in P. falciparum68, potentially providing an easy way to identify genes under selection across strains.

Outlook

Although systems biology holds significant promise, challenges remain. It is theoretically feasible to create, at least in rodent parasites, lines that express epitope-tagged or green-fluorescent-protein-tagged versions of all parasite proteins, and this could allow the rapid characterization of protein complexes or permit systematic, informative protein-localization studies. Indeed, by studying the localization and expression patterns of a handful of genes, and by comparing their protein sequences, a short amino-acid motif that directs a nascent protein out of the parasite to the surface of the red blood cell was discovered69,70.

The sequential disruption of genes is time consuming, and thought should be given to devising strategies for systematically creating and phenotyping mutant P. falciparum strains. Because Plasmodium genes are more difficult to clone and overexpress than genes from model organisms, there might be exceptional difficulties associated with creating arrays of Plasmodium recombinant proteins that can be used to examine kinase specificity or other assays of protein–protein interaction. However, the Gateway recombinational cloning system, which has been used in several genome-scale cloning efforts in humans, has recently been adapted to P. falciparum71.

Ultimately, all of the gene-identification and gene-expression data will need to be assembled in a repository where it can be easily retrieved. The malaria-research community is fortunate to have a dedicated repository — PlasmoDB — for malaria sequence and functional-genomics data sets and, therefore, this is unlikely to be a large hurdle. As systems-biology data are integrated, malariologists will probably use PlasmoDB to determine the probability that a gene is involved in a defined process based on expression data, proteomic data, protein-interaction and protein-localization data. Comparative sequencing or comparative genome hybridizations might identify subsets of genes that are under selection pressure from the host or from drugs. Discovering novel host alleles that result in increased or reduced susceptibility to malaria might reveal new interactions between parasite and host genes. Many of the approaches suggested here are not specific to P. falciparum but could be applied to a range of microbial pathogens. Researchers must keep in mind that more data is better, because the quality of a data set with 5,000 protein–protein interactions can be more readily evaluated with statistical methods than one that contains only 10 interactions. Support for this type of research, which might not involve testing a specific hypothesis, needs to be encouraged by the funding agencies. It might take several years before the impact of systems biology on malaria research can be fully realized because researchers will need to test the hypotheses using traditional methods and this is likely to be time consuming. Although every discovery might not lead directly to a new drug or a vaccine, the basic findings that are made through integrative biology will contribute to our understanding of Plasmodium biology, and this will serve as a foundation that will ultimately increase the speed at which new therapies can be developed.