Nature 419, 531-534 (3 October 2002) | doi:10.1038/nature01094; Received 6 August 2002; Accepted 2 September 2002

Letters to Nature

Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14

Malcolm J. Gardner1, Shamira J. Shallom1, Jane M. Carlton1, Steven L. Salzberg1, Vishvanath Nene1, Azadeh Shoaibi1, Anne Ciecko1, Jeffery Lynn1, Michael Rizzo1, Bruce Weaver1, Behnam Jarrahi1, Michael Brenner1, Babak Parvizi1, Luke Tallon1, Azita Moazzez1, David Granger1, Claire Fujii1, Cheryl Hansen1, James Pederson2, Tamara Feldblyum1, Jeremy Peterson1, Bernard Suh1, Sam Angiuoli1, Mihaela Pertea1, Jonathan Allen1, Jeremy Selengut1, Owen White1, Leda M. Cummings1,3, Hamilton O. Smith1,3, Mark D. Adams1,3, J. Craig Venter1,3, Daniel J. Carucci2, Stephen L. Hoffman2,3 and Claire M. Fraser1

The mosquito-borne malaria parasite Plasmodium falciparum kills an estimated 0.7–2.7 million people every year, primarily children in sub-Saharan Africa. Without effective interventions, a variety of factors—including the spread of parasites resistant to antimalarial drugs and the increasing insecticide resistance of mosquitoes—may cause the number of malaria cases to double over the next two decades1. To stimulate basic research and facilitate the development of new drugs and vaccines, the genome of Plasmodium falciparum clone 3D7 has been sequenced using a chromosome-by-chromosome shotgun strategy2, 3, 4. We report here the nucleotide sequences of chromosomes 10, 11 and 14, and a re-analysis of the chromosome 2 sequence5. These chromosomes represent about 35% of the 23-megabase P. falciparum genome.

P. falciparum chromosomes were resolved on preparative pulsed field gels, and used to prepare shotgun libraries of 1–2-kilobase (kb) DNA fragments in plasmid vectors. Sequences of randomly selected clones were assembled, and gaps were closed using primer walking on plasmid templates or polymerase chain reaction (PCR) products. The cross-contamination of the chromosomal libraries with sequences from other chromosomes (up to 25%) and the high (A + T) content (80.6%) of P. falciparum DNA caused extreme difficulties in the gap closure process. Intergenic regions and introns frequently contained long runs of up to 50 consecutive A or T residues that were difficult to clone and sequence. The high (A + T) content of the chromosomes also prevented the construction of large insert libraries that could be used to construct scaffolds of ordered and oriented contiguous DNA sequences (contigs) during assembly. Similar but more severe problems were reported in the sequencing of the (A + T)-rich chromosome 2 of the slime mould Dictyostelium discoideum6, illustrating the need to develop better methods for the cloning and sequencing of very (A + T)-rich genomes. The reported sequences contain three or four short gaps (<2 kb) in each chromosome. Contigs comprising these chromosomes were joined end-to-end before annotation. Efforts to close the remaining gaps will continue.
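
The base-composition problems described above can be measured directly from sequence. The minimal Python sketch below computes (A + T) content and lists homopolymer runs of A or T. The function names and the 20-residue reporting threshold are illustrative assumptions, not part of the original pipeline. The sketch also assumes that "runs of A or T" means runs of a single repeated base.

```python
import re


def at_content(seq):
    """Fraction of A + T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("A") + seq.count("T")) / acgt if acgt else 0.0


def homopolymer_runs(seq, min_len=20, bases="AT"):
    """(start, end, base) for each run of at least min_len identical residues from `bases`."""
    # Builds e.g. "A{20,}|T{20,}" (min_len=20 is an illustrative threshold).
    pattern = "|".join(f"{b}{{{min_len},}}" for b in bases)
    return [(m.start(), m.end(), m.group()[0]) for m in re.finditer(pattern, seq.upper())]


def longest_run(seq, bases="AT"):
    """Length of the longest run of a single repeated base from `bases`."""
    runs = homopolymer_runs(seq, min_len=1, bases=bases)
    return max((end - start for start, end, _ in runs), default=0)


if __name__ == "__main__":
    demo = "GC" + "A" * 25 + "GCGC" + "T" * 30 + "ATAT"
    print(round(at_content(demo), 3))  # 0.908
    print(homopolymer_runs(demo))      # [(2, 27, 'A'), (31, 61, 'T')]
```

Applied to an assembled chromosome, runs approaching 50 residues would be flagged as likely trouble spots for cloning and sequencing.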

Examination of the sequences of chromosomes 2, 10, 11 and 14 revealed that the structure of these chromosomes was similar to that of the other chromosomes. All contained the 97–99% (A + T) putative centromeric sequences reported previously7. Conserved subtelomeric sequences2 were observed in chromosomes 2, 10 and 11, but most of these elements had been deleted from both ends of chromosome 14. The termini of chromosome 14 consisted of telomeric hexamer repeats fused directly to truncated var (variant antigen) genes. Deletions of this type are thought to be due to chromosome breakage and healing events that occur during in vitro cultivation of the parasite.
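
A crude way to flag candidate regions such as the 97–99% (A + T) putative centromeres is a sliding-window scan of base composition. This is a hypothetical Python sketch, not the method by which those sequences were originally identified. The window size, step and threshold are assumptions.

```python
def high_at_regions(seq, window=2000, step=500, threshold=0.97):
    """Merge sliding windows whose (A + T) fraction reaches `threshold` into regions."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    hits = []
    # Only full-length windows are scanned; a partial window at the end is skipped.
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        if (win.count("A") + win.count("T")) / window >= threshold:
            hits.append((start, start + window))
    regions = []
    for start, end in hits:
        if regions and start <= regions[-1][1]:  # overlapping or touching windows
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


if __name__ == "__main__":
    demo = "GC" * 500 + "AT" * 1500 + "GC" * 500
    print(high_at_regions(demo, window=1000, step=500))  # [(1000, 4000)]
```

Because the genome as a whole is about 80% (A + T), a threshold well above that background is needed for the scan to be informative.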

Annotation procedures have improved since the publication of the P. falciparum chromosome 2 sequence5. A second gene finding program, phat (pretty handy annotation tool)8, has since become available and was used to supplement the GlimmerM program9 used previously. In this work, GlimmerM and phat were trained on a larger set than was used in the earlier work. This set comprised well-characterized genes, complementary DNAs (cDNAs) and products of PCR with reverse transcription (RT–PCR), with a total length of 540 kb. A program called Combiner was used to evaluate the GlimmerM and phat predictions, together with the results of searches against nucleotide and protein databases, and to construct consensus gene models. To assess the effect of these modifications, chromosome 2 was re-annotated and the results were compared with the previous annotation.

Application of these automated annotation procedures, followed by manual curation of the resulting gene models, produced 223 gene models for chromosome 2. The revised procedures detected 21 genes not predicted previously, and 13 of the existing chromosome 2 models collapsed into six models in the new annotation. All but one of the 21 new gene models had no significant similarity to proteins in a non-redundant amino-acid database. However, at least a portion of each of the 21 gene models had been predicted independently by both GlimmerM and phat, suggesting that many of these models are likely to represent coding sequences. On the other hand, five of the new gene models would encode proteins shorter than 100 amino acids and may be less likely to be genuine protein-coding genes.

Another major difference was the detection of additional small exons. In the earlier annotation of chromosome 2, the 209 predicted genes contained 353 exons, or an average of 1.7 exons per gene. The revised procedures reported here revealed 510 exons, or 2.3 exons per gene; 60% of the new exons were predicted to be additions to the gene models reported previously. Most cases involved the addition of one or two exons per gene. In three notable cases, however, 7 to 12 small exons were added to the earlier gene models, and almost all of the new exons had been predicted by both of the gene finding programs. Overall, use of the revised annotation procedures resulted in the detection of additional genes and many small exons, which is reflected in the higher gene density and shorter mean exon length in the newly annotated chromosome 2 sequence compared with the previous annotation (Table 1). Despite these improvements in software and training sets, gene finding in P. falciparum remains challenging, and the gene structures presented here should be regarded as preliminary until confirmed by sequence information obtained from cDNAs or RT–PCR experiments10. Accurate prediction of the 5' ends of genes is particularly difficult. Generation of larger training sets, including additional expressed sequence tags (ESTs) and full-length cDNAs, would greatly improve the sensitivity and accuracy of gene predictions.
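
The gene and exon counts reported for chromosome 2 are internally consistent, and simple arithmetic confirms this. The Python sketch below assumes that the only changes to the previous 209 models were the 21 additions and the merger of 13 models into six. The helper names are hypothetical.

```python
def reconcile(previous, added, merged_from, merged_into):
    """Gene count after adding new models and collapsing merged ones."""
    return previous + added - (merged_from - merged_into)


def exons_per_gene(exons, genes):
    """Mean exons per gene, rounded to one decimal place as in the text."""
    return round(exons / genes, 1)


if __name__ == "__main__":
    print(reconcile(209, 21, 13, 6))  # 223
    print(exons_per_gene(353, 209), exons_per_gene(510, 223))  # 1.7 2.3
```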


These annotation procedures were also applied to the analysis of chromosomes 10, 11 and 14 (Table 1; maps of these chromosomes are available as Supplementary Information). The 10 short gaps in the chromosomes should not have interfered with the gene predictions; only the genes adjacent to the gaps might have been affected. All three chromosomes were similar in terms of gene density, coding percentage and other parameters. A complete description of the parasite genome is contained in the accompanying Article2.

Annotation of chromosomes 10, 11 and 14 revealed four proteins with sequence similarity to SR proteins (PF10_0047, PF10_0217, PF11_0200, PF14_0656). SR proteins are a family of conserved splicing factors that contain RNA-binding domains and a protein interaction domain rich in Ser and Arg residues (SR domain). Three additional putative SR proteins were identified on chromosomes 5 and 13 (PFE0160c, PFE0865c, MAL13P1.120). SR proteins are thought to bind to exonic splicing enhancers (ESEs) and to interact with components of the spliceosome11. ESEs are short (6–9 bp) sequences within exons that assist in the recognition of nearby splice sites. ESEs have previously been characterized only in multicellular organisms. To determine whether P. falciparum may use ESEs as part of its splicing machinery, a Gibbs sampling algorithm for motif detection12 was applied to a set of P. falciparum exons. The exons were extracted from the set of well-characterized genes used to train the GlimmerM gene finder. The 50 bp at each end of every internal exon were selected and divided into two data sets. One set contained the regions adjacent to the 3' splice site (the start of the exon), and the other contained the regions adjacent to the 5' splice site (the end of the exon). At least 10 runs of the Gibbs sampler were performed on each data set to identify the most probable motif 5–9 nucleotides long. The motif with the highest maximum a posteriori probability was retained. This analysis identified a motif with the consensus GAAGAA, which is identical to ESEs found in human exons13, 14. The identification of several putative SR proteins, together with sequences identical to human ESEs, suggests that some features of exon recognition and splicing observed in higher eukaryotes may be conserved in P. falciparum.
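
The motif search can be illustrated with a minimal site-sampling Gibbs sampler in the spirit of Lawrence et al.12. The sketch assumes exactly one motif occurrence per sequence and a fixed motif width. It is a simplified Python illustration, not the software used in the study, and it differs from that analysis in several ways:

- Alignments are ranked by a log-likelihood ratio rather than the maximum a posteriori criterion.
- A greedy phase-shift step is added so the sampler can escape alignments that are off by a few positions.
- Widths 5–9 are not scanned.
- The demonstration uses synthetic (A + T)-rich sequences with a planted GAAGAA motif rather than P. falciparum exon ends.

All function names are hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only A, C, G and T, each >= width long


def background(seqs, pseudo=1.0):
    """Background base frequencies over all input sequences, with pseudocounts."""
    counts = Counter("".join(seqs))
    total = sum(counts[b] for b in ALPHABET) + pseudo * len(ALPHABET)
    return {b: (counts[b] + pseudo) / total for b in ALPHABET}


def profile(sites, width, pseudo=0.5):
    """Column-wise base frequencies of aligned sites, with pseudocounts."""
    total = len(sites) + pseudo * len(ALPHABET)
    cols = []
    for j in range(width):
        counts = Counter(site[j] for site in sites)
        cols.append({b: (counts[b] + pseudo) / total for b in ALPHABET})
    return cols


def ratio(window, prof, bg):
    """Likelihood of a window under the motif model divided by its background likelihood."""
    r = 1.0
    for j, base in enumerate(window):
        r *= prof[j][base] / bg[base]
    return r


def sites_at(seqs, pos, width):
    """The width-long site starting at each chosen position."""
    return [s[p:p + width] for s, p in zip(seqs, pos)]


def score(seqs, pos, width, bg):
    """Log-likelihood ratio of an alignment; used here to rank alignments."""
    sites = sites_at(seqs, pos, width)
    prof = profile(sites, width)
    return sum(math.log(ratio(site, prof, bg)) for site in sites)


def consensus(sites):
    """Most frequent base in each column (ties resolved in ALPHABET order)."""
    return "".join(
        max(ALPHABET, key=lambda b: sum(site[j] == b for site in sites))
        for j in range(len(sites[0]))
    )


def gibbs_motif(seqs, width, runs=10, iters=100, seed=0):
    """Site-sampling Gibbs motif search, one site per sequence, best of several runs."""
    rng = random.Random(seed)
    bg = background(seqs)
    limits = [len(s) - width for s in seqs]  # last valid start position per sequence
    best_score, best_pos = -math.inf, None
    for _ in range(runs):
        pos = [rng.randint(0, hi) for hi in limits]
        for _ in range(iters):
            for i, seq in enumerate(seqs):
                # Leave-one-out: build the motif model from all other sequences.
                others = sites_at(seqs[:i] + seqs[i + 1:], pos[:i] + pos[i + 1:], width)
                prof = profile(others, width)
                starts = range(limits[i] + 1)
                weights = [ratio(seq[x:x + width], prof, bg) for x in starts]
                # Resample this sequence's site in proportion to its likelihood ratio.
                pos[i] = rng.choices(starts, weights=weights)[0]
            # Greedy phase shift: slide every site by up to width // 2 (clamped to bounds).
            candidates = [[min(max(p + d, 0), hi) for p, hi in zip(pos, limits)]
                          for d in range(-(width // 2), width // 2 + 1)]
            current, pos = max(((score(seqs, c, width, bg), c) for c in candidates),
                               key=lambda t: t[0])
            if current > best_score:
                best_score, best_pos = current, list(pos)
    return best_pos, consensus(sites_at(seqs, best_pos, width)), best_score


if __name__ == "__main__":
    # Synthetic (A + T)-rich sequences with a planted GAAGAA (not real exon data).
    data_rng = random.Random(1)
    seqs = []
    for _ in range(20):
        s = [data_rng.choice("AT") for _ in range(40)]
        p = data_rng.randint(0, 34)
        s[p:p + 6] = "GAAGAA"
        seqs.append("".join(s))
    positions, motif, llr = gibbs_motif(seqs, width=6, runs=5, iters=50)
    print(motif, round(llr, 1))
```

Restarting from several random alignments and keeping the best-scoring one mirrors the multiple-run strategy described above, since a single Gibbs run can settle in a local optimum.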


Methods

Sequencing and closure

P. falciparum clone 3D7 was selected for sequencing because it can complete all phases of the life cycle, and had been used in a genetic cross15 and the Wellcome Trust Malaria Genome Mapping Project16. High-molecular-mass genomic DNA was subjected to electrophoresis on preparative pulsed field gels, and chromosomes were excised. DNA was extracted from the gel, sheared, and cloned into the pUC18 vector as described5 (chromosomes 2, 14) or into a modified pUC18 vector via BstXI linkers (chromosomes 10, 11). Sequences were assembled and gaps were closed by primer walking on plasmid DNAs or genomic PCR products, or by transposon insertion5. Ordering of contigs was facilitated by the use of sequence tagged sites16 and microsatellite markers17. The final assembly of each chromosome was verified by comparison with BamHI and NheI optical restriction maps18. For chromosomes 11 and 14, the average difference in size between the experimentally determined restriction fragments and the fragments predicted from the sequence was approximately 5–6% for both enzymes. For chromosome 10, the average difference in fragment sizes was 6.1% for the NheI map, but the BamHI optical and predicted restriction maps could not be aligned. Because the NheI optical restriction map agreed with that predicted from the sequence, the chromosome 10 assembly was judged to be correct.
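
The comparison with the optical maps can be approximated by an in silico digest of the assembled sequence. The Python sketch below makes several simplifying assumptions:

- The chromosome is linear and digestion is complete.
- The enzymes are BamHI (G^GATCC) and NheI (G^CTAGC).
- Optical and predicted fragments pair one-to-one in order.
- The predicted fragment size is the denominator of each percentage difference.

Real optical-map alignment must tolerate missing and extra cuts.

```python
import re

# Recognition sequence and cut offset on the top strand (both sites are palindromic).
ENZYMES = {"BamHI": ("GGATCC", 1), "NheI": ("GCTAGC", 1)}


def digest(seq, enzyme):
    """Fragment lengths from a complete digest of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # A lookahead finds every site, including any overlapping occurrences.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def mean_percent_difference(optical, predicted):
    """Mean absolute size difference, as a percentage of the predicted fragment size."""
    if len(optical) != len(predicted):
        # A real aligner would allow missing or extra cuts; this sketch gives up.
        raise ValueError("maps cannot be paired fragment-for-fragment")
    diffs = [abs(o - p) / p for o, p in zip(optical, predicted)]
    return 100.0 * sum(diffs) / len(diffs)


if __name__ == "__main__":
    seq = "AAAA" + "GGATCC" + "T" * 10 + "GGATCC" + "AA"
    print(digest(seq, "BamHI"))  # [5, 16, 7]
    print(digest(seq, "NheI"))   # [28]
```

The `ValueError` branch corresponds loosely to the chromosome 10 BamHI case, in which the optical and predicted maps could not be aligned.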

Annotation

GlimmerM9 and phat8 were trained on 117 P. falciparum genes and 39 cDNAs taken from GenBank, plus 32 genes from chromosomes 2 and 3 that had been verified by RT–PCR (provided by R. Huestis and K. Fischer; the training set is available at http://www.tigr.org/software/glimmerm/data). The Combiner program evaluated the GlimmerM and phat predictions together with sequence alignments of the chromosomes to protein and cDNA databases. It used a linear weighting method and dynamic programming to construct consensus gene models, which were then curated manually using AnnotationStation (Affymetrix). Predicted proteins were searched against a non-redundant amino-acid database using BLASTP; other features were identified by searches against the Pfam19, PROSITE20 and InterPro21 databases. The results of all analyses were reviewed using Manatee, a tool that interfaces with a relational database of the information produced by the annotation software. Predicted gene products were manually assigned Gene Ontology22 terms. Signal peptides and signal anchors were predicted with SignalP-2.0 (ref. 23). Transmembrane helices were predicted with TMHMM24. Mitochondrial- and apicoplast-targeted proteins were predicted by MitoProt II25, TargetP26 and PATS27. tRNAscan-SE28 was used to identify transfer RNAs.
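
The idea of combining evidence by linear weighting and dynamic programming can be sketched in Python. This is a deliberately simplified, hypothetical illustration and not the Combiner algorithm itself:

- Each evidence source votes for exact exon coordinates with an assumed weight.
- Weighted interval scheduling then selects the highest-scoring set of non-overlapping exons.
- Reading-frame, splice-site and start/stop codon constraints are ignored.

The source names, coordinates and weights are invented for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def vote(predictions, weights):
    """Linear weighting: each exon scores the summed weight of the sources predicting it."""
    scores = defaultdict(float)
    for source, exons in predictions.items():
        for exon in dict.fromkeys(exons):  # de-duplicate while keeping order
            scores[exon] += weights[source]
    return scores


def best_exon_set(scores, min_score=0.0):
    """Weighted interval scheduling over exons given as inclusive (start, end) pairs."""
    exons = sorted((e for e, s in scores.items() if s > min_score), key=lambda e: e[1])
    ends = [end for _, end in exons]
    best = [0.0] * (len(exons) + 1)  # best[k]: top score using only the first k exons
    take = [False] * (len(exons) + 1)
    for k, (start, end) in enumerate(exons, 1):
        # Number of earlier exons that end before this one starts.
        j = bisect_right(ends, start - 1, 0, k - 1)
        with_exon = best[j] + scores[(start, end)]
        take[k] = with_exon > best[k - 1]
        best[k] = max(with_exon, best[k - 1])
    chosen, k = [], len(exons)
    while k > 0:  # trace back through the table
        if take[k]:
            chosen.append(exons[k - 1])
            k = bisect_right(ends, exons[k - 1][0] - 1, 0, k - 1)
        else:
            k -= 1
    return sorted(chosen), best[-1]


if __name__ == "__main__":
    evidence = {  # hypothetical exon calls for one locus
        "GlimmerM": [(100, 250), (400, 520), (700, 800)],
        "phat": [(100, 250), (410, 520), (700, 800)],
        "protein_alignment": [(100, 250), (700, 800)],
    }
    weights = {"GlimmerM": 1.0, "phat": 1.5, "protein_alignment": 2.0}
    print(best_exon_set(vote(evidence, weights)))
    # ([(100, 250), (410, 520), (700, 800)], 10.5)
```

In this toy example, the two gene finders disagree on the start of the middle exon. The higher assumed weight for phat decides which boundary is kept.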


References

1. Breman, J. G. The ears of the hippopotamus: manifestations, determinants, and estimates of the malaria burden. Am. J. Trop. Med. Hyg. 64, 1-11 (2001).
2. Gardner, M. J. et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498-511 (2002).
3. Hall, N. et al. Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13. Nature 419, 527-531 (2002).
4. Hyman, R. W. et al. Sequence of Plasmodium falciparum chromosome 12. Nature 419, 534-537 (2002).
5. Gardner, M. J. et al. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum. Science 282, 1126-1132 (1998).
6. Glöckner, G. et al. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418, 79-85 (2002).
7. Bowman, S. et al. The complete nucleotide sequence of chromosome 3 of Plasmodium falciparum. Nature 400, 532-538 (1999).
8. Cawley, S. E., Wirth, A. I. & Speed, T. P. Phat--a gene finding program for Plasmodium falciparum. Mol. Biochem. Parasitol. 118, 167-174 (2001).
9. Salzberg, S. L., Pertea, M., Delcher, A., Gardner, M. J. & Tettelin, H. Interpolated Markov models for eukaryotic gene finding. Genomics 59, 24-31 (1999).
10. Huestis, R. & Fischer, K. Prediction of many new exons and introns in Plasmodium falciparum chromosome 2. Mol. Biochem. Parasitol. 118, 187-199 (2001).
11. Maniatis, T. & Tasic, B. Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature 418, 236-243 (2002).
12. Lawrence, C. E. et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208-214 (1993).
13. Ramchatesingh, J., Zahler, A. M., Neugebauer, K. M., Roth, M. B. & Cooper, T. A. A subset of SR proteins activates splicing of the cardiac troponin T alternative exon by direct interactions with an exonic enhancer. Mol. Cell. Biol. 15, 4898-4907 (1995).
14. Fairbrother, W. G., Yeh, R. F., Sharp, P. A. & Burge, C. B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007-1013 (2002).
15. Walliker, D., Quakyi, I., Wellems, T. E. & McCutchan, T. F. Genetic analysis of the human malaria parasite Plasmodium falciparum. Science 236, 1661-1666 (1987).
16. Foster, J. & Thompson, J. The Plasmodium falciparum genome project: a resource for researchers. Parasitol. Today 11, 1-4 (1995).
17. Su, X. et al. A genetic map and recombination parameters of the human malaria parasite Plasmodium falciparum. Science 286, 1351-1353 (1999).
18. Lai, Z. et al. A shotgun optical map of the entire Plasmodium falciparum genome. Nature Genet. 23, 309-313 (1999).
19. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 30, 276-280 (2002).
20. Falquet, L. et al. The PROSITE database, its status in 2002. Nucleic Acids Res. 30, 235-238 (2002).
21. Apweiler, R. et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29, 37-40 (2001).
22. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nature Genet. 25, 25-29 (2000).
23. Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1-6 (1997).
24. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580 (2001).
25. Claros, M. G. & Vincens, P. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur. J. Biochem. 241, 779-786 (1996).
26. Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005-1016 (2000).
27. Zuegge, J., Ralph, S., Schmuker, M., McFadden, G. I. & Schneider, G. Deciphering apicoplast targeting signals--feature extraction from nuclear-encoded precursors of Plasmodium falciparum apicoplast proteins. Gene 280, 19-26 (2001).
28. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997).
29. Florens, L. et al. A proteomic view of the Plasmodium falciparum life cycle. Nature 419, 520-526 (2002).
30. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419, 537-542 (2002).

Supplementary Information

Supplementary information accompanies this paper.


Acknowledgements

We thank our colleagues at The Institute for Genomic Research and the Naval Medical Research Center for support; J. Foster for providing markers for chromosome 14; R. Huestis and K. Fischer for providing RT–PCR data for chromosomes 2 and 3 before publication; and S. Cawley for assistance with phat. This work was supported by the Burroughs Wellcome Fund, the National Institute of Allergy and Infectious Diseases, the Naval Medical Research Center, and the US Army Medical Research and Materiel Command.


Competing interests statement

The authors declare no competing financial interests.

  1. The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
  2. Malaria Program, Naval Medical Research Center, 503 Robert Grant Avenue, Silver Spring, Maryland 20910-7500, USA
  3. Present addresses: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA (L.M.C.); Celera Genomics, 45 West Gude Drive, Rockville, Maryland 20850, USA (H.O.S., M.D.A.); The Center for the Advancement of Genomics, 1901 Research Boulevard, 6th Floor, Rockville, Maryland 20850, USA (J.C.V.); Sanaria, 308 Argosy Drive, Gaithersburg, Maryland 20878, USA (S.L.H.).

Correspondence and requests for materials should be addressed to M.J.G. (e-mail: gardner@tigr.org). Chromosome sequences have been deposited in GenBank with accession numbers AE001362.2 (chromosome 2), AE014185 (chromosome 10), AE01486 (chromosome 11) and AE01487 (chromosome 14), and in PlasmoDB (http://plasmodb.org).