Main

Schistosomiasis is a public health problem in many developing countries, and Schistosoma mansoni is the most widespread species of the causative trematode parasite1. Parasite eggs laid in the hepatic portal vasculature are the principal cause of morbidity, and the ensuing pathology may prove fatal2. Control of the disease by chemotherapy has relied heavily on praziquantel, potentially allowing drug-resistant parasites to emerge3. Protective immune mechanisms in humans that might form the basis for a vaccine have proven difficult to characterize4 owing to effective immune evasion by parasites. Nevertheless, the successful vaccination of both rodents and primates with attenuated larvae5 indicates that the goal is feasible.

As representatives of the platyhelminths, schistosomes are the lowest group of bilateria that diverged early from the metazoan lineage6. With a blind-ending gut and no body cavity, their body plan seems simple, but tissues corresponding to the main organ systems of higher animals are present. Schistosomes have a complex life cycle, and they are among the first animals to develop sexual dimorphism and heteromorphic sex chromosomes. They are intimately associated with the gastropod mollusk intermediate and the mammalian final host, perhaps relying on host signals for development. Active transmission between hosts and internal migrations show their capacity for sophisticated neuromuscular coordination.

The large size (270 Mb; ref. 7) and complexity of the S. mansoni genome have previously deterred full-scale sequencing (see The Institute for Genomic Research and The Sanger Institute websites). Current knowledge of expressed genes is limited to a set of 163 full-length cDNAs and approximately 16,000 ESTs, 75% derived from adult worms8,9. We report here a multicenter effort to obtain and annotate extensive transcriptome data for S. mansoni, using both a normalized cDNA library10 from adults and ORESTES minilibraries from six life-cycle stages (Supplementary Fig. 1 online). This approach, based on arbitrary primers and low-stringency RT–PCR11, preferentially amplifies the central, function-defining coding regions of messages12. This first large-scale database for a bilaterian acoelomate should enhance our understanding of the evolution, biology and adaptation to parasitism of these animals and identify novel proteins to be exploited as drug targets and vaccine candidates.

Results

Transcriptome features and gene complement

We obtained 163,586 EST reads from the S. mansoni transcriptome: 151,684 using ORESTES minilibraries and 11,902 from a normalized adult worm library. All our results are from a filtered data set of 124,681 analyzed reads, which resulted in 30,988 assembled EST sequences (Table 1), called Schistosoma mansoni assembled EST sequences (SmAEs). Newly identified S. mansoni genes are listed by product in Supplementary Table 1 online. The SmAE data set is estimated to sample 92% of the S. mansoni transcriptome. Comparison of SmAEs with publicly available sequences shows that 77% represent new S. mansoni gene fragments, either novel paralogs (1%), new orthologs (20%) or fragments with unknown function (no match in GenBank; 55%; Table 1). An average SmAE sequence provides around 32% coverage of a matching gene in GenBank (Supplementary Fig. 2 online); nevertheless, 359 novel orthologs have their entire coding region fully sequenced (Supplementary Table 2 online).

Table 1 S. mansoni transcriptome features and gene complement

The total number of genes in the parasite was predicted by two different methods to be around 14,000 (Table 1), comparable to the 14,000–19,000 predicted genes of other fully sequenced invertebrates13,14,15. Extrapolation from nonredundant bases acquired from adult worm ESTs indicates that 7,200 genes are expressed in this stage (Supplementary Fig. 3 online). We obtained 58,846 tags from serial analysis of gene expression (SAGE), and the number of unique tags reached a clear plateau at 6,263 (Supplementary Fig. 3 online), suggesting that almost all adult transcripts were sampled. Thus, about 50% of all S. mansoni genes are expressed in adult worms.
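The "about 50%" figure can be checked with a short worked calculation. The sketch below uses only the numbers quoted in this section; the variable and function names are illustrative, and treating each unique SAGE tag as one gene is a simplifying assumption made here, not a claim from the analysis.

```python
# Figures quoted in the text; names are illustrative only.
PREDICTED_TOTAL_GENES = 14_000    # approximate gene count (Table 1)
ADULT_GENES_FROM_ESTS = 7_200     # extrapolated from adult worm ESTs
ADULT_UNIQUE_SAGE_TAGS = 6_263    # plateau of unique SAGE tags


def expressed_fraction(expressed: int, total: int) -> float:
    """Return the fraction of the predicted gene complement that is expressed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return expressed / total


est_fraction = expressed_fraction(ADULT_GENES_FROM_ESTS, PREDICTED_TOTAL_GENES)
# Assumption: one unique SAGE tag is counted as one expressed gene.
sage_fraction = expressed_fraction(ADULT_UNIQUE_SAGE_TAGS, PREDICTED_TOTAL_GENES)

print(f"EST extrapolation: {est_fraction:.1%}")
print(f"SAGE plateau:      {sage_fraction:.1%}")
```

Under these assumptions the two estimates come to roughly 51% and 45%, which bracket the rounded value given in the text.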

Functional classification of transcripts

We assigned Gene Ontology classifications to 8,001 SmAEs (Gene Ontology browser is available at the project website). The distribution of SmAEs among the main categories is shown in Supplementary Table 3 online. Protein metabolism was the most frequently identified of the biological process categories (Fig. 1a). Searching for conserved domains (in the Pfam database) showed that protein kinases were the most abundant (Fig. 1b) proteins, with 180 identified, suggesting that S. mansoni has a more compact set of protein kinases than any of the fully sequenced metazoa16. Most of the top 15 Pfam domains were from proteins involved in either intercellular communication or transcriptional regulation, which is expected for a parasite with multiple tissues and organs.

Figure 1: Gene Ontology classification and frequently encountered Pfam domains in SmAEs.

(a) Percentage of S. mansoni SmAEs in each of the biological process categories of Gene Ontology classification. A total of 5,463 distinct SmAEs were assigned to 9,497 different biological processes (individual SmAEs can have multiple Gene Ontology assignments). (b) Fifteen Pfam domains occurred most frequently in S. mansoni SmAEs. Multiple Pfam domains on the same SmAE were counted only once.
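The counting rule in panel b (a domain is counted once per SmAE, however many times it occurs in that sequence) can be illustrated as follows. This is a minimal sketch; the SmAE identifiers and domain lists are invented for illustration and are not taken from the data set.

```python
from collections import Counter


def domain_frequencies(smae_domains: dict) -> Counter:
    """Count how many SmAEs carry each Pfam domain.

    smae_domains maps an SmAE identifier to the list of Pfam domains
    found on it; repeated domains on one SmAE are counted only once.
    """
    counts = Counter()
    for domains in smae_domains.values():
        counts.update(set(domains))  # set() collapses repeats within one SmAE
    return counts


# Hypothetical input, for illustration only.
example = {
    "SmAE_A": ["Pkinase", "Pkinase", "WD40"],
    "SmAE_B": ["Pkinase"],
    "SmAE_C": ["WD40", "zf-C2H2"],
}
example_counts = domain_frequencies(example)
print(example_counts.most_common(15))
```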

Being a metazoan

It has been proposed that the platyhelminth acoelomates, represented by S. mansoni, diverged from other eubilaterian metazoa more than a billion years ago6. As such, they lie somewhere between the unicellular eukaryotes Saccharomyces cerevisiae and Plasmodium falciparum and the more advanced invertebrates Caenorhabditis elegans, Drosophila melanogaster and Ciona intestinalis. Phylogenetic analyses (ref. 6 and Supplementary Fig. 4 online) support the ancient and independent divergence of acoelomates from other metazoa, which may explain the high fraction (55%) of SmAEs with no significant matches to sequences in GenBank. Thus, S. mansoni sequences should make an important contribution to understanding early metazoan evolution.

Metazoa-specific and eukarya-conserved sequences

We selected SmAEs that encode proteins that have been conserved among either the eukarya or the metazoa by comparison with known proteomes of organisms whose genomes have been completely sequenced. We built a metazoa-specific base set with the SmAEs that had orthologs only in each of the multicellular eukaryotes, Homo sapiens, D. melanogaster, C. elegans and C. intestinalis, but no matches with the unicellular eukaryotes, S. cerevisiae and P. falciparum, or with prokaryotes. The base set contains 1,598 sequences (645 genes) that may be essential to the more complex metazoan cell functions. The eukarya-conserved sequences had at least one ortholog in all of the eukaryotes listed above. This data set contains 3,194 SmAEs (1,443 genes), representing S. mansoni genes that would be important for eukaryotic cell functions.
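The membership rules for the two data sets amount to simple set logic. The sketch below assumes that, for each SmAE, the set of species with an ortholog and a flag for any prokaryotic match have already been derived from the proteome comparisons; the function name and the "other" label are hypothetical.

```python
METAZOA = frozenset({"H. sapiens", "D. melanogaster", "C. elegans", "C. intestinalis"})
UNICELLULAR = frozenset({"S. cerevisiae", "P. falciparum"})


def classify_smae(ortholog_species: set, has_prokaryote_match: bool) -> str:
    """Assign an SmAE to a data set from the species in which it has orthologs."""
    in_all_metazoa = METAZOA <= ortholog_species
    unicellular_hits = UNICELLULAR & ortholog_species
    # Eukarya-conserved: an ortholog in every one of the six eukaryotes.
    if in_all_metazoa and unicellular_hits == UNICELLULAR:
        return "eukarya-conserved"
    # Metazoa-specific: all four metazoa, no unicellular eukaryote, no prokaryote.
    if in_all_metazoa and not unicellular_hits and not has_prokaryote_match:
        return "metazoa-specific"
    return "other"


print(classify_smae(set(METAZOA), False))
print(classify_smae(set(METAZOA | UNICELLULAR), False))
```

An SmAE with orthologs in only some of the metazoa, or in only one of the two unicellular eukaryotes, falls into neither set under this reading of the criteria.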

The relative distribution of SmAEs in Gene Ontology categories for the eukarya-conserved and metazoa-specific data sets (Fig. 2) shows that the latter set contains higher proportions of sequences in a few categories (cell-to-cell interactions, developmental processes, response to external stimulus and signal transduction). In general, the metazoa-specific sequences that have diverse roles in the tissues of a complex organism are overrepresented relative to the eukarya-conserved sequences.

Figure 2: Category distribution of eukarya-conserved and metazoa-specific SmAEs.

The metazoa-specific sequences (solid bars) have orthologs in each of the multicellular eukaryotes H. sapiens, D. melanogaster, C. elegans and C. intestinalis but not in the unicellular eukaryotes S. cerevisiae and P. falciparum. The essential and conserved eukarya SmAEs (striped bars) have orthologs in all of the eukaryotes listed above.

Cell adhesion and tissue structure

As triploblastic acoelomates, schistosomes have three germ layers, bilateral symmetry, dorso-ventral patterning and rudimentary organs, for which intercellular adhesion mechanisms were an evolutionary prerequisite. The occurrence of homotypic cell adhesion is indicated by transcripts for protocadherins and the proteins that link them to the actin cytoskeleton in adherens junctions (Table 2). The small G proteins involved in actin polymerization are all present. The existence of organized tight junctions, important in maintaining the integrity of epithelia, can also be inferred, and evidence for gap junctions is provided by two pannexins/innexins. The extracellular matrix is represented by collagens, laminins and tenascins to which cells may attach by a potential integrin heterodimer; the intracellular links between integrins and the actin cytoskeleton are also evident.

Table 2 Cell adhesion and tissue structure orthologs

The ability to undergo remodeling is a feature of organized tissues, but evidence for apoptosis is fragmentary. Some orthologs of this pathway were found (Table 2) whereas others (Bax, Bcl-2 family, endonuclease G) were not. In contrast, numerous components of autophagy were identified, apart from Apg13p and initiator Apg12p. This situation probably reflects the absence of wandering phagocytes to eliminate redundant cells.

Antero-posterior axis differentiation

S. mansoni has several axis-determining components in common with other metazoa. The presence of nanos, pumilio and the knirps gap-gene strongly suggests parallels with the mechanism used by D. melanogaster, in which maternal factors segregate to one pole of the egg and determine the antero-posterior axis. We detected the polycomb group transcripts, enhancer of zeste, polyhomeotic distal and extra sex combs, responsible for the maintenance of pattern, but none of the archetypal Hox cluster sequences. Orthologs of putative S. mansoni homeotic transcription factors included LIM-homeodomain, double homeobox protein 4 and homeotic protein Msx1.

Dorso-ventral patterning

Dorso-ventral patterning may be dictated by an analog of the TGF-β pathway. We identified activin/TGF-β receptor orthologs, Smad4, Smad8 and Medea as well as the known Smad1 and Smad2 (ref. 17). The R-Smads (Smad1, Smad2 and Smad8) are anchored to the plasma membrane by SARA, also newly identified. Specification of the dorso-ventral axis may also involve the Wnt pathway; we identified two Wnts and their transmembrane receptor frizzled as well as the cytosolic components of the intracellular signaling cascade dishevelled, axin, Gsk3 and β-catenin.

Epithelia

Adult schistosomes have three epithelia, surface tegument, gastrodermis and protonephridial canals, which control the transport of material into and out of their bodies. We found transcripts of villin family members supervillin and archvillin, which may cap and bundle actin filaments to provide an internal scaffold for cellular extensions cross-braced at their base by spectrin, also present. Functional studies have identified mediated transport of sugars, amino acids and nucleotides18. At least nine SmAEs for sugar transporters (some ATP-driven) can be added to the already cloned Sgtp1, Sgtp2 and Sgtp4 (ref. 19). We identified several transporters for lipids, amino acids, nucleotides and ions (Table 3).

Table 3 Novel ortholog and paralog genes for transporters identified in S. mansoni

Endocytosis is prominent in the gastrodermis but caveolin-type lipid rafts have also been postulated in the tegument surface20. We did not identify caveolin transcripts but did find the raft-associated flotillin. Transcripts for components of clathrin-mediated endocytosis included the clathrin heavy chain, assembly protein Ap180 and adaptor complex Ap2, which together encode all the functions to select cargo and form a vesicle. Dynamin, the master regulator of endocytosis, was present, along with phospholipid-interacting endophilin, Eps15 and epsin. In addition to low density lipoprotein–binding proteins21, transcripts for serotransferrin, low density lipoprotein and very low density lipoprotein receptors attest to the importance of receptor-mediated endocytosis.

Motility and the nervous system

All life-cycle stages have an extensive and intricately organized musculature composed of smooth fibers22, and only the cercarial tail has a form of striated muscle. We identified transcripts for several myosins, two actins, tropomyosin, paramyosin and troponins C, I and T, involved in the regulation of contraction, and for the filament attachment proteins α-actinin, vinculin and titin; many of these are novel paralogs. We found no transcripts encoding specific striated muscle proteins.

Platyhelminths are the first metazoan group to possess a central nervous system23 and have a variety of sensory structures24 that transduce a wide range of stimuli. Notch receptor, its transcription factor partner (suppressor of hairless) and membrane-bound ligand (delta) suggest a role for Notch signaling in S. mansoni neurogenesis. Transcripts for axon guidance molecules to direct nerves to their synaptic partners (netrin and its membrane receptor Unc5, two semaphorin-like and two plexin-like molecules) document the presence of a molecular repertoire for sophisticated neural circuitry. Regarding sensory structures, we identified components of the light detection system (a rhodopsin paralog of that previously described8,25, rhodopsin kinase, arrestin and transducin), the first two in eggs and germ balls, respectively, consistent with the responsiveness of miracidia and cercariae to light.

Signaling

Transcriptome analysis identifies the molecular basis for some elements of schistosome neurotransmitter/receptor systems. We found ligand-gated channels, including three versions of the nicotinic acetylcholine receptor, choline o-acetyltransferase for synthesis and acetylcholine esterase for breakdown of this inhibitory neurotransmitter. We also found a glutamate receptor and transcripts for the γ-amino butyric acid (GABA) transporter and GABA receptor–associated protein but not the inhibitory GABA receptor itself.

We found G-protein-coupled receptors for glutamate and the excitatory transmitter serotonin along with its transporter, as well as a putative muscarinic acetylcholine receptor. Although S. mansoni has been reported to respond to catecholamine26, we found no transcripts for the relevant receptors. Primitive neuroendocrine processes are known to be mediated by FaRP-type peptides27, but we found a transcript only for allatostatin precursor protein. Nevertheless, orthologs of hormone proprotein convertase 2, which processes the precursors of bioactive peptides, and its regulatory neuroendocrine protein 7B2 were present, as was glycine peptidyl α-amide monooxygenase, required for the C-terminal amidation of the resulting peptides. Proprotein convertase 2 generates the opioid peptides and enkephalin in higher animals and might have the same function in schistosomes, as these peptides have previously been reported28.

It is difficult to envisage how hormone signaling might operate in acoelomates, except over a short distance or through the neuroendocrine route. Nevertheless, two members of the nuclear receptor superfamily (retinoid-X and fushi tarazu factor 1) have been characterized29, and SmAEs for a retinoic acid receptor (RAR-γ), a thyroid hormone receptor family member, a nuclear receptor 1 and a nuclear orphan receptor Tr2/4 can be added. But detection of transcripts for thyroid hormone interactor proteins 4, 12, 13 and 15 and thyroid hormone receptor–associated proteins Trap240 and Trap80, together with the reported effect of thyroid hormone on schistosome development30, suggests that at least one nuclear orphan receptor may have a functional ligand. An ortholog of thyroid peroxidase, required to synthesize thyroid hormone, is present, but thyroglobulin, its vertebrate substrate, is not. If there is endogenous thyroid hormone, perhaps S. mansoni uses an alternative tyrosine-rich protein as a precursor.

The presence of transcripts for a series of cytochrome P450 enzymes, testosterone 6-β-hydroxylase and 17β-hydroxysteroid dehydrogenase suggests that schistosomes synthesize steroid hormones from cholesterol. They also seem to have some receptor elements (progesterone receptor membrane component 2 and estrogen-related receptor), which could bind endogenous steroids or mediate the supposed action of exogenous steroids on their maturation. Identification of other receptors for insulin and FGF, but not their ligands, reinforces the concept that host molecules act on parasite receptors. The presence of SmAEs encoding neurotensin and natriuretic peptide receptors is notable but more difficult to place in context.

Sex determination and sexual maturation

Most platyhelminths are hermaphrodites, but sexual dimorphism seems to have evolved separately on at least eight occasions, arguing for a relatively simple underlying mechanism31. Determination of sex is inherent whereas envelopment by the male is a prerequisite for female maturation32, showing the need for cross-talk. We detected orthologs of fox-1, mog-1, mog-4, tra-2 and fem-1, involved in the determination of sex in C. elegans. We also found the ortholog of mago-nashi, which in C. elegans (mag-1) specifies female development by inhibiting the hermaphrodite phenotype. The presence of the above transcripts in S. mansoni confirms their evolutionarily ancient role in sex determination, but it is unclear how they contribute to the dioecious state.

Being a parasite

Schistosomes have a prolonged association with their hosts and should therefore possess specific adaptations to the parasitic way of life. Adult worms are bathed in, and feed on, host blood, and we found transcripts for echicetin-like molecules that affect hemostasis and prevent thrombosis. Adult worms also expressed apyrase (CD39/ATP-diphosphohydrolase), an enzyme involved in platelet aggregation and thromboregulation that has been localized to the tegument33, possibly indicating the capacity to inhibit platelet activation.

Longevity

In contrast to the short lifespan of C. elegans or D. melanogaster, schistosomes have a predicted lifespan of 6–10 years34. In yeast and C. elegans, an extra copy of Sir2 or sir-2.1, implicated in chromatin silencing, can increase lifespan, and we identified orthologs to sir-2.1, sir-2.2, sir-2.5, sir-2.6 and sir-2.7 in S. mansoni. We identified SmAEs from the insulin-signaling pathway, associated with longevity in C. elegans, including Daf2, an insulin-like receptor, Age1, a phosphatidylinositol-3-OH kinase and Daf16. Daf16 is a transcription factor that regulates many genes that affect lifespan, including enzymes that protect against or repair oxidative damage35. We also identified Pdk1 and PTEN, proteins that regulate the Daf2 pathway.

Stress responses

S. mansoni undergoes rapid transitions between environments that are accompanied by temperature and osmotic stresses. We extended the list of previously described heat shock genes (23 SmAEs, 12 possibly new), which includes an HtrA ortholog, a stress-regulated serine protease. Uroplakin is believed to limit the permeability of membranes to water and small non-electrolytes36; we found an ortholog in egg, miracidia and cercaria stages. Parasites also encounter oxidative stress during host immune attack, which is dealt with by antioxidant enzymes, both previously characterized (superoxide dismutases, thioredoxin and glutathione reductases and peroxidases) and novel, including mitochondrial thioredoxin 2, a PKC-interacting thioredoxin, thioredoxin-like 2, an ortholog of Plasmodium yoelii thioredoxin, and glutaredoxin 3.

The innate immune response comprises primitive mechanisms used by metazoa in defense against infection14,15. The Toll pathway has an important role in this, and we identified several components including Tollip, pellino and NF-κB kinase (NEMO), implying that S. mansoni can respond to extracellular pathogens. The presence of transcripts for adenosine deaminase, Dicer and Piwi/argonaute indicates that S. mansoni can also deal with intracellular attack mediated by viral dsRNA. By extension, the last two genes indicate that post-transcriptional gene silencing could occur, and the use of RNA interference to suppress schistosome gene function was recently reported37,38.

Evasion of host immune responses

S. mansoni has been proposed to use several strategies to evade host immune responses, including protection of the tegument surface by a secreted membranocalyx39, molecular mimicry, antigenic variation and immunomodulation. As an example of molecular mimicry, the convergent evolution of S. mansoni and Biomphalaria glabrata (snail intermediate host) tropomyosins 1 and 2 has been suggested40 on the basis of immunological cross-reactivity and amino acid sequence identity (63%). We detected a new isoform, tropomyosin 3, in adults, eggs and germ balls with only 35% amino acid identity to B. glabrata, suggesting a different tissue location not subjected to the same selective pressure.

In the context of antigenic variation, we found no evidence of highly variable gene families (compared with Plasmodium), but our database identified 449 putative novel paralogs to known S. mansoni genes (Table 1); 33 of these had high identity and >30% coverage (Supplementary Table 4 online). This multiplicity of isoforms would allow the parasite to use paralogs of an essential enzyme targeted by the immune system to avoid loss of function, thus making vaccine development more difficult. Indeed, we identified several paralogs of previously investigated vaccine candidates (Supplementary Table 5 online).

Non-synonymous single-nucleotide polymorphisms (SNPs) are another source of variation. Analysis of redundant EST coverage of genes encoding vaccine candidates identified eight putative polymorphisms, two of which could be validated (see Supplementary Methods online) in isolates from different regions of the world. We detected alternative splicing in several genes, including a recently identified exon skipping in Sm14 (ref. 41) present in germ balls, schistosomula and adults.

Modulation of mammalian host immune responses by a schistosome infection is well documented, but the agents and mechanisms are not yet fully defined. The presence of transcripts for pro-inflammatory phospholipase A2-activating protein supports the documented effect of lyso-phosphatidylserine as an inducer of T-regulatory cells and Th2 polarization42. S. mansoni eggs and adults induce a characteristic allergic response43,44. The identification of a family of orthologs to wasp venom allergen 5 raises the question of how the parasite benefits from amplifying such a response.

Stage-associated frequency of sequences

The frequency of reads in a SmAE cluster obtained from different life cycle stages can reflect differential gene expression when the same set of primers is used for generating ORESTES minilibraries. We validated this approach experimentally by semi-quantitative RT–PCR (Supplementary Fig. 5 online). We analyzed 5,172 sequences obtained with the same set of primers, generating 2,058 SmAEs. We found that 82 of these had conspicuously different patterns of distribution among stages (with 99.8% confidence), several being predominant in one stage only (Fig. 3 and Supplementary Table 6 online). In particular, germ balls overexpressed elastase 2a (secretion for host invasion45), troponin I and tropomyosin 2 (muscle development), and centrin3 and S-rex/Nsp (differentiation).

Figure 3: Frequency of sequenced transcripts in life-cycle stages.

Hierarchical clustering of SmAEs using relative expression inference, estimated from the count of reads in a SmAE obtained with the same primer from each stage: C, cercaria; S, schistosomula; A, adults; E, eggs; L, miracidia; G, germ balls. The SmAE number and annotation of each gene are shown. Color scale indicates the number of counts with black representing no count and red representing a count above 20. Cytophaga hutchinsonii, Loligo pealei, Canis familiaris, Neurospora crassa, Gallus gallus, Mizuhopecten yessoensis, Spodoptera frugiperda, Novosphingobium aromaticivorans, Streptomyces coelicolor, Mycobacterium avium, Pisum sativum, Pseudomonas fluorescens, Nicotiana tabacum, Zea mays, Ciona savignyi and Salmo salar are the full names of species not previously mentioned.

Potential drug targets and multidrug-resistance genes

One main benefit from our project should be the identification of novel proteins amenable to rational drug design. Selected examples of potential molecular targets are detailed in Table 4. Existing anthelminthics46 that disrupt neurotransmission provide the rationale for one group. Paralogs of calcium channel subunits, the targets of praziquantel, and cyclophilins, which mediate the antischistosomal effect of cyclosporin, are also listed. Molecules proposed as targets in other systems include innexins (connexins of vertebrates) and DNA polymerase. We identified transcripts for several multidrug resistance transporters, however, which could complicate the development of new drugs.

Table 4 Chemotherapy in schistosomiasis: potential new drug targets

Potential vaccine candidates

Potential vaccine candidates should include proteins that are preferentially surface-exposed or exported and that are expressed in intramammalian stages. These properties can be searched for using Gene Ontology categorization. Thus, orthologs of secreted toxins and surface proteins involved in cell adhesion both warrant investigation (Table 5). Three orthologs of Plasmodium circumsporozoite protein, expressed in schistosomula and adults, and an ortholog of the S. cerevisiae threonine-rich cell-wall protein may be surface-exposed. Likewise, receptors that potentially bind host hormones should be accessible to the immune system. Targeting glycosyl phosphatidyl inositol–anchored proteins or receptors for nutrients could impair vital functions in the parasite and thus provide another avenue for vaccine development.

Table 5 Novel S. mansoni genes to be investigated as vaccine candidates

Discussion

Our study of the S. mansoni transcriptome increases tenfold the number of ESTs available to define the gene complement of this blood fluke and will be an essential resource for annotation of its genome. Our overall impression of this member of one of the simplest extant bilaterian groups is that most, if not all, of the cellular and physiological systems of higher animals were established before the divergence of the platyhelminths. Thus, components required for tissue organization and smooth muscle function were present at an early stage of metazoan evolution. An extensive range of neurotransmitter systems and enzymes for the generation of neuropeptides and opioid peptides indicates substantial capacity for neurosecretory control of physiology. Potential components of thyroid and steroid hormone systems were identified; it will be pertinent to establish the source of ligands for the relevant receptors. Apoptosis seems to be a later evolutionary development, however, with autophagy the predominant means of removing unwanted cells.

Features of the transcriptome that can be associated with the parasitic way of life are more difficult to define. One probable reason for this is that we found no similarity for 55% of SmAEs. A singular advantage of parasitism is the ready access to a supply of nutrients, uptake of which is facilitated by a wide variety of transporters and receptors for lipids and cholesterol. With respect to immune evasion, the paucity of mechanisms for antigenic variation, compared with Plasmodium or Trypanosoma, is notable. Immune evasion by secretion of an inert bilayer masking the parasite-host interface can now be investigated by combining the transcriptome database with proteomics techniques to elucidate the architecture of the tegument surface. A similar approach should allow identification of protein immunomodulators known to be released by cercariae, adult worms and eggs.

We should not forget that S. mansoni is an important human pathogen with no vaccine and a single drug for treatment. Mining the SmAE database for drug targets and vaccine candidates should therefore be a priority. By analogy with other systems, we have singled out a number of chemotherapeutic possibilities from a potentially long list. The prediction of vaccine candidates from sequence information alone is highly speculative, but key antigens should now be identifiable by immunological studies in experimental animals and humans.

Methods

Parasites.

We maintained the BH and PR isolates of S. mansoni in the laboratory by routine passage through mice and snails and recovered parasite life cycle stages as described in Supplementary Methods online. We concentrated cercariae, schistosomula and adults by centrifugation and stored them at −20 °C in RNAlater (Ambion) according to the manufacturer's recommendations before extracting mRNA. We used freshly isolated parasites from the other stages (eggs, miracidia and germ balls) for immediate extraction of mRNA.

Construction of cDNA libraries and sequencing.

We obtained DNase-treated mRNA with MACs mRNA isolation kits (Miltenyi Biotec) and used it to construct cDNA and SAGE libraries. We carried out cDNA synthesis and amplification using the ORESTES protocol with modifications12,47 (see Supplementary Methods online). We prepared normalized poly-dT-primed cDNA libraries as previously described10 using the abundantly available mRNA from adult worms. We sequenced cDNA using standard fluorescence-labeling dye-terminator protocols. To analyze differential gene expression, we used a set of six primers to construct ORESTES cDNA minilibraries from all stages. Sequencing of at least two 96-well plates per library resulted in at least 140 sequences per stage per primer (see Supplementary Methods online).

EST processing pipeline and annotation.

We stored, processed and trimmed EST sequence chromatograms through a web-based service48 and accepted sequences with at least 100 bp with phred-15 or higher for further evaluation. We filtered sequences using BLASTN analysis with a local copy of GenBank NT database and the BlastMachine (Paracel) to eliminate those that matched non-S. mansoni sequences with E ≤ 10⁻¹⁵ and had at least 98% identity along at least 75 nucleotides. We also excluded reads that matched S. mansoni ribosomal or mitochondrial sequences and transposon sequences with E ≤ 10⁻¹⁵ and at least 85% identity along at least 75 nucleotides or that matched bacterial sequences with E ≤ 10⁻²⁰ and at least 95% identity along at least 75 nucleotides. We filtered further transposon and bacterial sequences by comparing with BLASTX against the set of transposon and bacterial sequences from GenBank NR and eliminating those with matching E ≤ 10⁻⁴ and at least 30% identity along at least 75 amino acids with transposons or matching E ≤ 10⁻⁶ and at least 95% identity along at least 75 amino acids with bacteria. We clustered and assembled ESTs using CAP3 (ref. 49). We assigned putative protein products to SmAEs based on BLASTX hits to National Center for Biotechnology Information's NR database. We assigned Gene Ontology terms to SmAEs based on BLASTX hits against a database locally built from public sequences associated with Gene Ontology terms. The public Gene Ontology annotated data sets used were from H. sapiens, D. melanogaster, Arabidopsis thaliana, Oryza sativa, C. elegans, S. cerevisiae, Schizosaccharomyces pombe and Vibrio cholerae plus a curated sequence database (Gene Ontology Annotation at EBI) available at the Gene Ontology Consortium website. In both cases, we used E ≤ 10⁻⁶ as the BLASTX cut-off. We used ESTscan to deduce amino acid sequences and used them as queries against the Pfam database 7.8.
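The contaminant filters described above share one form: an E-value ceiling, a minimum percent identity and a minimum alignment length. The sketch below encodes the five stated thresholds in a lookup table. The Hit record, the category labels and the function names are hypothetical conveniences for illustration and do not reproduce the pipeline's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    program: str     # "blastn" or "blastx"
    category: str    # hypothetical label for the kind of subject sequence
    evalue: float
    identity: float  # percent identity over the aligned region
    length: int      # aligned length (nucleotides for blastn, amino acids for blastx)


# (maximum E-value, minimum % identity, minimum aligned length), as stated in the text.
THRESHOLDS = {
    ("blastn", "non_schistosome"): (1e-15, 98.0, 75),
    ("blastn", "ribosomal_mitochondrial_transposon"): (1e-15, 85.0, 75),
    ("blastn", "bacterial"): (1e-20, 95.0, 75),
    ("blastx", "transposon"): (1e-4, 30.0, 75),
    ("blastx", "bacterial"): (1e-6, 95.0, 75),
}


def is_excluding_hit(hit: Hit) -> bool:
    """Return True if this hit meets the exclusion criteria for its category."""
    rule = THRESHOLDS.get((hit.program, hit.category))
    if rule is None:
        return False  # categories without a stated rule never exclude a read
    max_evalue, min_identity, min_length = rule
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.length >= min_length)


def keep_read(hits: list) -> bool:
    """A read is kept only if none of its hits triggers an exclusion rule."""
    return not any(is_excluding_hit(h) for h in hits)
```

For example, a BLASTN match to a bacterial sequence with E = 10⁻³⁰, 99% identity and 120 aligned nucleotides would exclude the read, whereas the same match at 90% identity would not.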

SAGE.

We constructed a SAGE library with mRNA derived from adult worms (males and females) using the I-SAGE Kit (Invitrogen). We treated poly(A)+ mRNA with DNase before extraction with oligo-dT. We cloned and sequenced concatamers and derived tags from high-quality sequence segments. To determine the relative abundance of transcripts in adult worms, we compared the SAGE tag list with the complete SmAE data set and with all full-length cDNA sequences from S. mansoni.

Phylogeny inferences.

We aligned protein sequences using the ClustalX multiple sequence alignment program. Only unambiguous positions were used in the phylogenetic analysis. We generated phylogenetic trees using the Phylip program as described in Supplementary Methods online.

Differential expression analysis.

To evaluate differential expression, we assembled the ORESTES sequences derived from six primers along all six life cycle stages and considered the number of reads per stage for each cluster as an indirect inference of the expression level in the stage. Sequences with a differential frequency of reads by stage (99.8% confidence) when analyzed by a randomization test50 are discussed. Hierarchical clustering of these data was done using correlation distance UPGMA as provided in the Spotfire for Functional Genomics software (Spotfire). We carried out semi-quantitative RT–PCR to confirm differential expression of three selected genes (see Supplementary Methods online).
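The idea of testing read frequencies by randomization can be sketched as a Monte Carlo test. The version below is a generic illustration and not necessarily the procedure of ref. 50. It assumes that, under the null hypothesis, the reads of a cluster fall into stages in proportion to the number of reads sequenced per stage, and it uses a chi-square-type statistic. All names and the example counts are hypothetical.

```python
import random


def chi_square_stat(counts, expected):
    """Sum of (observed - expected)^2 / expected over stages with expected > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def randomization_p_value(cluster_counts, library_sizes, n_iter=5000, seed=0):
    """Estimate how often random allocation of reads to stages gives a
    distribution at least as uneven as the one observed."""
    rng = random.Random(seed)
    n_reads = sum(cluster_counts)
    total = sum(library_sizes)
    expected = [n_reads * size / total for size in library_sizes]
    observed = chi_square_stat(cluster_counts, expected)
    stages = range(len(library_sizes))
    at_least_as_extreme = 0
    for _ in range(n_iter):
        # Allocate each read to a stage with probability proportional to library size.
        simulated = [0] * len(library_sizes)
        for stage in rng.choices(stages, weights=library_sizes, k=n_reads):
            simulated[stage] += 1
        if chi_square_stat(simulated, expected) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Hypothetical cluster: 25 reads, all from one of six equally sampled stages.
p_skewed = randomization_p_value([25, 0, 0, 0, 0, 0], [860] * 6)
# Hypothetical cluster spread evenly over the six stages.
p_even = randomization_p_value([4, 4, 4, 4, 4, 4], [860] * 6)
print(p_skewed < 0.002, p_even)
```

In this sketch, a cluster would be reported at 99.8% confidence when its estimated p-value falls below 0.002.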

SNP analysis.

We identified putative SNPs in S. mansoni genes using Polybayes as described in Supplementary Methods online. We selected a fraction of the putative SNPs in vaccine candidates for experimental validation using DNA derived from pooled adult worms (see Supplementary Methods online).

URLs.

Project website including Schistosoma Gene Ontology browser, BLAST server and SmAEs search tools, http://bioinfo.iq.usp.br/schisto/; The Institute for Genomic Research S. mansoni genome project, http://www.tigr.org/tdb/e2k1/sma1/; The Sanger Institute S. mansoni genome project, http://www.sanger.ac.uk/Projects/S_mansoni/; The Phred/Phrap/Consed System Home Page, http://www.phrap.org/; National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/BLAST/; Gene Ontology Consortium, http://www.geneontology.org/; ESTScan2 server, http://www.ch.embnet.org/software/ESTScan2.html; Pfam server, http://www.sanger.ac.uk/Software/Pfam/.

Accession numbers.

Sequences were deposited in GenBank under accession numbers CD059164–CD088507, CD088510–CD120734, CD120740–CD150744 and CD151578–CD202980. SNPs identified in this study were deposited in dbSNP at National Center for Biotechnology Information under the accession numbers ss8486502–ss8486509.

Note: Supplementary information is available on the Nature Genetics website.