The genomic basis of cnidarian evolution has so far been viewed from the perspective of an anthozoan, the sea anemone Nematostella vectensis6. Hydra is a medusozoan that diverged from anthozoans at least 540 million years ago. Features of Hydra and Nematostella are compared in Supplementary Table 1. We generated draft assemblies of the Hydra magnipapillata genome using a whole-genome shotgun approach (Supplementary Information sections 1–3 and Supplementary Figs 1–3). The Hydra genome is (A+T)-rich (71% A+T), and includes 57% transposable elements (see below). Although the sequenced strain reproduces clonally in the laboratory by asexual budding, it is diploid with substantial heterozygosity (0.7% single nucleotide polymorphism between alleles), which we find is distributed along the genome as expected if it were drawn from a randomly mating population (Supplementary Information section 3). These features complicate shotgun sequencing and assembly. Two complementary assemblies (CA and RP) were generated (Supplementary Information section 3) and deposited in GenBank. The CA assembly (1.5 gigabases (Gb)) has contig and scaffold N50 values of 12.8 kilobases (kb) and 63.4 kb, respectively. The RP assembly (1.0 Gb) has a contig N50 length of 9.7 kb and a scaffold N50 length of 92.5 kb. The CA assembly gives an estimated non-redundant genome size of 1.05 Gb. The RP assembly gives an estimated non-redundant genome size of 0.9 Gb (see Supplementary Information section 3 for a discussion of genome size calculations). For analysis, we chose the assembly that minimized sequence redundancy owing to the separate assembly of haplotypes (see Supplementary Information section 3 for further discussion). Approximately 99% of known Hydra genes are found in both assemblies, attesting to their completeness with respect to protein-coding genes.
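
For readers unfamiliar with the N50 statistic quoted for the contigs and scaffolds above, the following is a minimal sketch of how it is conventionally computed: the length L such that sequences of length L or longer together contain at least half of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not part of the assembly pipeline used here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths (in kb), for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 is weighted by length, a few long scaffolds can raise it substantially, which is why the contig and scaffold values of a single assembly can differ several-fold.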

Although the present Hydra assembly is too fragmented for a chromosome-scale analysis, we found evidence for synteny with other metazoans. Of the 33 longest gene-rich Hydra scaffolds (that is, those containing genes from at least 10 Hydra/Nematostella orthologue groups), 15 (45%) were significantly enriched (P < 0.01) for genes from specific eumetazoan linkage groups6, indicating that vestiges of the ancestral eumetazoan genome organization persist in Hydra. This is in contrast to the highly diverged genomes of Drosophila and Caenorhabditis elegans, which show no synteny with other metazoans by these methods.
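
The text above does not spell out the enrichment test used. As one plausible illustration only, a one-sided hypergeometric test asks how likely it is that a scaffold carrying n orthologues would contain k or more from a given linkage group by chance. The function name `hypergeom_enrichment_p` and all numbers below are hypothetical, and the actual analysis may have used a different statistic or a multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric tail probability P(X >= k).

    N: total orthologue groups considered (assumed background)
    K: groups in the background assigned to one linkage group
    n: groups found on the scaffold being tested
    k: groups on the scaffold assigned to that linkage group
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# Hypothetical example: 12 orthologues on a scaffold, 8 of them from a
# linkage group that holds 300 of 5,000 background groups.
p = hypergeom_enrichment_p(N=5000, K=300, n=12, k=8)
print(p < 0.01)
```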

We estimate that the Hydra genome contains 20,000 bona fide protein-coding genes (excluding transposable elements), based on expressed sequence tags (ESTs), homology and ab initio gene prediction (Supplementary Information section 6). The amino acid substitution rate in the Hydra lineage is enhanced relative to the Nematostella lineage; the sequence divergence between a Hydra peptide and its human orthologue is typically greater than the sequence divergence between Nematostella and human (Supplementary Information section 8 and Supplementary Fig. 4) as expected based on the longer branch leading to Hydra in peptide-based phylogenies6. Similarly, the rate of intron loss has been higher in the Hydra lineage; we find that 22% (126 out of 575) of the introns shared by Nematostella and human in well-aligned coding regions have been lost in Hydra. Conversely, only 6% (28 out of 476) of the introns shared by Hydra and human are absent in Nematostella.

Transposable elements make up 57% of the Hydra genome and represent over 500 different families (Supplementary Information section 9). The most abundant element, comprising 15% of the genome (Fig. 1 and Supplementary Table 3), is a non-long-terminal-repeat (non-LTR) retroelement of the chicken repeat 1 (CR1) family. To our knowledge, elements of this family are more abundant in the Hydra genome than in any other sequenced animal genome (in comparison, the CR1 family occupies only 1% of the Nematostella assembly and 3% of the chicken assembly). This retrotransposon is still active in Hydra, as indicated by its representation in 105 ESTs. We also found 789 cases of intronless genes that were derived recently from multi-exon genes, most probably through retrotransposition. DNA transposons (predominantly ‘cut-and-paste’ elements of the mariner, Transib and hAT (hobo-Ac-Tam3) types) occupy 20% of both the Hydra and Nematostella genomes, and are also active in Hydra based on the presence of ESTs.

Figure 1: Dynamics of transposable element expansion in Hydra reveals several periods of transposon activity.

a, The top panel shows phylogenetic relationships between four Hydra species based on ESTs (using Nei-Gojobori synonymous substitution rates; see Supplementary Fig. 8). The bottom panel shows the fraction of the genome that is occupied by a specific repeat class at a given divergence from the repeat consensus generated by the ReAS (recovery of ancestral sequences) algorithm (see Supplementary Information section 9). Substitution levels are corrected for multiple substitutions using the Jukes–Cantor formula $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}i\right)$, where i is per cent dissimilarity on the nucleotide level from the repeat consensus. This substitution level for transposons is equivalent to Nei-Gojobori synonymous substitution rates in the ESTs. Three element expansions are inferred; the most distinct are the most ancient, at a divergence level of 0.4, and the most recent, at 0.05. The middle expansion at about 0.2 is not well synchronized and is more clearly seen for individual element classes in Supplementary Figs 5 and 6. b, c, Example of periods of activity of a single Hydra CR1 retrotransposon family (b) and the maximum likelihood phylogeny of the family (c).


Timing of transposable element activity using sequence divergence of extant copies reveals at least three periods of element expansion (at 5%, 20% and 40% nucleotide substitutions; Fig. 1 and Supplementary Figs 5 and 6). In marked contrast, comparable expansions are absent from the Nematostella genome (Supplementary Fig. 7). Most individual Hydra transposable element families show discrete bursts of expansion (Fig. 1b, c) that are possibly associated with population bottlenecks7. The correspondence between speciation times in the genus Hydra and the timing of transposon activity may have been associated with the approximately threefold increase in genome size (Fig. 1a) in H. magnipapillata, H. vulgaris and H. oligactis relative to H. viridissima (380 megabases (Mb))8.
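
The Jukes–Cantor correction given in the legend of Fig. 1 can be sketched in a few lines. This is a minimal illustration, assuming that the dissimilarity i is supplied as a fraction (0.05 rather than 5); the function name `jukes_cantor` is hypothetical, and the inputs are illustrative values rather than measurements from the repeat library.

```python
import math


def jukes_cantor(i):
    """Jukes-Cantor corrected substitution level K = -3/4 * ln(1 - 4/3 * i).

    i is the observed nucleotide dissimilarity from the consensus,
    expressed as a fraction (assumption: 0.05 means 5%).
    The correction is undefined for i >= 0.75 (saturation).
    """
    if not 0 <= i < 0.75:
        raise ValueError("dissimilarity must satisfy 0 <= i < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * i)


# Illustrative inputs only: the correction grows with divergence,
# because older copies hide more multiple hits at the same site.
for observed in (0.05, 0.20, 0.40):
    print(observed, round(jukes_cantor(observed), 3))
```

The correction is small for recent copies and increasingly large for older ones, so uncorrected dissimilarity would compress the apparent age of the most ancient expansion.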

Addition of short RNA leader sequences to the 5′ ends of messenger RNAs by trans-splicing occurs in a subset of metazoans and unicellular eukaryotes9. Transcripts from at least one-third of EST-supported genes in Hydra undergo trans-spliced leader addition (Supplementary Information section 10). Hydra has multiple spliced leader genes (Supplementary Table 9), and a given transcript may be trans-spliced with several different spliced leaders. Notably, trans-splicing is absent from Nematostella (Supplementary Information section 10). It now seems likely that trans-splicing has evolved multiple times independently9.

Trans-splicing occurs in Hydra viridissima (N. A. Stover and R.E.S., unpublished data; GenBank accession number DQ092354) and in several other hydrozoans (Supplementary Table 10), and may be an ancestral feature of the class. Spliced leader addition gives a eukaryotic cell the opportunity to combine genes into operons, the multi-cistronic transcripts of which can be resolved into individual mRNAs by trans-splicing. We found 32 potential Hydra operons (Supplementary Information section 10, Supplementary Table 11 and Supplementary Fig. 9), but no obvious evidence for functional relationships between genes in these operons.

Bacteria are stably associated with Hydra10. Electron micrographs reveal bacterial cells underneath the glycocalyx, the coat that overlies the apical surface of the ectodermal epithelial layer of Hydra (Supplementary Fig. 10). Our assembly yielded eight large putative bacterial scaffolds as evidenced by: (1) high G+C content (in contrast to the low G+C content of the Hydra genome); (2) no high-copy repeat sequences typical of Hydra scaffolds; and (3) closely spaced single-exon open reading frames with best hits to bacterial genes (Supplementary Information section 11, Supplementary Fig. 11 and Supplementary Table 12). These scaffolds span a total of 4 Mb encoding 3,782 single-exon genes and represent an estimated 98% of the bacterial chromosome. Phylogenetic analysis of 16S rRNA (Supplementary Fig. 12) and conserved clusters of orthologous groups of proteins (COGs) indicate that this bacterium is a novel Curvibacter species belonging to the family Comamonadaceae (order Burkholderiales)11. About 60% of annotated Curvibacter sp. genes have an orthologue in another species of Comamonadaceae (Supplementary Table 13). Notably, the Curvibacter sp. genome encodes nine different ABC sugar transporters, compared to only one or two in other species of Comamonadaceae (Supplementary Table 14), possibly reflecting an adaptation to life in association with Hydra.
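
The first of the three criteria above, G+C content, is simple to compute per scaffold. The sketch below is an illustration only: the function names and the 0.5 threshold are assumptions chosen for the example (the Hydra genome itself is about 29% G+C), and the actual identification also relied on repeat content and gene structure.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


def has_high_gc(seq, threshold=0.5):
    """Flag a scaffold whose G+C fraction exceeds a hypothetical threshold."""
    return gc_fraction(seq) > threshold


# Toy sequences, not real scaffolds.
print(gc_fraction("ATATATGC"))          # (A+T)-rich, Hydra-like
print(has_high_gc("GCGCGGCCATGCGC"))    # G+C-rich, candidate bacterial
```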

Non-metazoan genes among cnidarian ESTs have been reported previously12, and we have now found further examples of such genes in the Hydra genome assembly. These genes are candidates for horizontal gene transfer (HGT) (Supplementary Information section 12). Seventy-one Hydra gene models showed closer relationships to bacterial genes than to metazoan genes based on sequence similarity and phylogenetic analysis (Supplementary Table 15). Of these, 51 have no blast hits to other metazoans, except in a few cases to Nematostella. Potential donors of these HGT candidates are widely distributed among different bacterial phyla (Supplementary Table 15) and show no enrichment for close relatives of Curvibacter. Approximately 70% of the HGT candidates have EST support, and transcripts from 30% of the genes have spliced leaders, indicating unambiguously that they are derived from Hydra and not from associated bacteria (Supplementary Table 15). The HGT candidates generally have fewer introns than Hydra genes and nearly one-half are single-exon genes (Supplementary Fig. 14), as expected if they were relatively recently acquired by Hydra. A number of the HGT candidates encode sugar-modifying enzymes. Three genes encode enzymes in the branch of the bacterial lipopolysaccharide synthesis pathway that leads to formation of the activated heptose precursor of the lipopolysaccharide inner core (Supplementary Fig. 13). This pathway could modify endogenous glycoproteins or proteoglycans in Hydra.

We also identified 90 transposable elements that were potentially horizontally transferred into the Hydra genome. These elements have expanded recently (less than 10% nucleotide divergence from their consensus) and have no older copies in the genome. The most frequent element class consists of hAT transposons with 34 different families, although all major classes of transposable element (DNA transposon, LTR and non-LTR elements) are represented. Transposable elements have been shown previously to be horizontally transferred in metazoans13.

We identified 51 unique non-tRNA/non-rRNA transcripts that correspond to putative non-coding RNA genes based on 454 sequencing of short transcripts from Hydra (Supplementary Information section 13 and Supplementary Table 16). At least 17 of these are microRNAs (miRNAs), compared to 40 identified miRNAs in Nematostella14. Surprisingly, only a single miRNA gene in the available data sets, miR-2022, is common to both cnidarian species.

Hox and ParaHox gene families arose from a megacluster that included a number of other homeobox genes (for example, NK genes)15. With the exception of engrailed, descendants of all of the classes of homeobox genes in the megacluster are found in Nematostella16,17. Hydra is missing a substantial fraction of megacluster descendants16, indicating secondary loss. For example, the eve and emx genes are absent from Hydra, although they are present in Nematostella and several hydrozoans (Supplementary Table 17). The loss of these genes from Hydra is therefore recent in relation to the diversification of hydrozoans. These genes are expressed in a cell-type-specific manner in larvae and adults of Nematostella17 and Hydractinia18; it is intriguing that the loss of these genes correlates with the absence of a larval stage in Hydra (Supplementary Table 17). The absence of these genes in Hydra indicates that despite their near-universal presence in animals, it is possible to construct a metazoan without either of them. In addition to the loss of emx and eve genes, Hydra has undergone several other marked gene losses; for example, it lacks fluorescent protein genes and key circadian rhythm genes (Supplementary Information section 14).

All major bilaterian signalling pathways, including Wnt, transforming growth factor-β, Hedgehog, receptor tyrosine kinase and Notch, are present in Hydra and Nematostella. An important signalling centre in Hydra is the head organizer, which uses the Wnt signalling pathway to establish positional values along the body column19,20. The head organizer, which is located at the apical tip of the adult polyp, is derived from the gastrula blastopore in cnidarians. A transplanted head organizer has the capacity to induce axis formation21, similar to the Spemann–Mangold organizer in Xenopus. Orthologues of a number of genes known to act in the Spemann–Mangold organizer in Xenopus are present in the Hydra and Nematostella genomes. Moreover, several of the secreted signalling molecules and transcription factors encoded by these genes are expressed specifically in the Hydra head organizer and the blastopore organizer in the Nematostella gastrula (Supplementary Information section 15 and Supplementary Table 18). Thus, the Hydra head organizer and the Xenopus Spemann–Mangold organizer may share common descent from an organizer in the ancestor of cnidarians and bilaterians.

The extracellular portions of two Hydra receptor tyrosine kinases22,23 contain a novel protein domain, sweet tooth (SWT). The SWT domain is also present in ESTs from the hydrozoan Clytia, but is absent from all other sequenced genomes, including that of Nematostella (Supplementary Fig. 15). SWT is among the most abundant protein domains encoded in the Hydra genome. The SWT domain is present in one or more copies in predicted secreted proteins. Given its presence in receptors and secreted proteins, we deduce that the SWT domain defines a large, diverse and novel set of signalling proteins.

Hydra contains a pluripotent stem cell type that gives rise to germ cells, nerve cells, nematocytes and secretory cells4. Of the five genes that have been shown to induce pluripotency in differentiated somatic cells of mammals (Myc, Nanog, Klf4, Oct4 and Sox2)24, homologues of three (Nanog, Klf4 and Oct4) are clearly not present in the Hydra genome. Hydra has four Myc homologues. There are two members of the Sox B group in Hydra. The Sox B group includes Sox2, but the evolutionary relationship between vertebrate Sox2 genes and Hydra Sox B genes is not clear25. We conclude that the stem cell genetic network in Hydra probably has an evolutionary origin independent from the network used in mammalian stem cells. Studies of diverse cnidarians support this scenario (see Supplementary Information section 14 for details).

Hydra’s shape is formed by epitheliomuscular cells, a cell type unique to cnidarians. A survey of genes that encode muscle structural and regulatory proteins in Hydra and Nematostella reveals a conserved eumetazoan core actin-myosin contractile machinery shared with bilaterians (Supplementary Table 19). Both cnidarians, however, lack crucial, specific regulators associated with vertebrate striated (troponin complex) or smooth muscles (caldesmon), indicating that these specializations arose after the cnidarian–bilaterian split. Hydra also shows secondary simplifications relative to Nematostella, which has a greater degree of muscle-cell-type specialization, including specialized retractor muscle cells. Hydra lacks several components of the dystroglycan complex (α/ε-sarcoglycan and β-sarcoglycan, α/β-dystroglycan and γ-syntrophin), which may lead to a less robust tethering of actin to the cell membrane than in Nematostella. Similarly, the absence of a bona fide myosin light chain kinase and phosphatase in Hydra indicates a divergence or loss of regulation by myosin regulatory light chain phosphorylation. The greater degree of muscle-cell-type specialization in Nematostella is also mirrored in the higher number of myosin light chain genes in this species. Thus, even among cnidarians, we see substantial variation in muscle-associated components superimposed on the eumetazoan core, with the Hydra muscular system representing a secondary simplification from a more complex cnidarian ancestor.

Ultrastructural studies show that nerve cells in Hydra form synapses on contractile epitheliomuscular cells (Fig. 2a), and that these synapses contain dense core vesicles, paramembranous densities and cleft filaments26 similar to canonical neuromuscular junctions in bilaterians. Several components of the bilaterian neuromuscular junction (choline transporter, nicotinic acetylcholine receptor) are encoded in the Hydra genome (Supplementary Information section 16 and Supplementary Table 20) and their expression is consistent with a role in neuromuscular signalling (Supplementary Figs 16 and 17). Other components, however, are found only in a possibly primitive form (putative carnitine acetyltransferases that lack the diagnostic residues for choline selectivity), and some components are absent (the vesicular acetylcholine transporter; Fig. 2b). Together, these data indicate that a canonical bilaterian neuromuscular junction was probably not present in the last common ancestor of cnidarians and bilaterians. Hydra is known to use neuropeptides for the control of behaviour27, and these may be contained in the dense-core vesicles seen at Hydra synapses.

Figure 2: The neuromuscular junction in Hydra.

a, Electron micrograph of a nerve synapsing on a Hydra epitheliomuscular cell. emc, epitheliomuscular cell; nv, nerve cell. Three vesicles are located in the nerve cell at the site of contact with the epitheliomuscular cell. Scale bar, 200 nm. b, Schematic diagram of a canonical neuromuscular junction. Yellow indicates presence in Hydra. Choline acetyltransferase (ChAT) is shown in red because it is not clear whether Hydra has an enzyme that prefers choline (Ch) as a substrate. Acetylcholine (ACh) molecules are shown as blue circles. The nicotinic acetylcholine receptor (nAChR) is shown in the open state with acetylcholine bound (left), and in the closed state in the absence of bound acetylcholine (right). AChE, acetylcholinesterase; ChT, choline transporter; MuSK, muscle-specific kinase; VAChT, vesicular acetylcholine transporter.


In Hydra and Nematostella, epitheliomuscular cells have an apical junctional belt in the form of a septate junction, clear apical–basal polarity, and hemidesmosome-like contact sites with the extracellular matrix (mesoglea) on their basal surface (Fig. 3a). The Hydra and Nematostella genomes encode almost all of the proteins known from bilaterians to be involved in the establishment of cell–cell and cell–substrate contacts (Fig. 3b and Supplementary Fig. 18). This indicates that the common cnidarian–bilaterian ancestor possessed a genetic inventory for the formation of all types of eumetazoan cell–cell and cell–substrate junctions. The presence of innexin genes in the Hydra28 and Nematostella genomes (Fig. 3b and Supplementary Fig. 19), combined with the lack of connexin genes in non-chordate genomes, clearly supports the view that innexin-based gap junctions are an ancestral eumetazoan feature, and that gap junctions formed by connexins29 arose later in animal evolution. Similarly, the lack of occludin genes in cnidarians and other non-chordates (Fig. 3) indicates that occludins and their function in tight junction formation first arose in the deuterostome lineage.

Figure 3: Hydra cell junctions.

a, Schematic diagram of the positions of cell–cell and cell–matrix contacts in Hydra epitheliomuscular cells. Septate junction, red; gap junctions, green; spot desmosomes, blue; hemidesmosome-like cell–matrix contact, yellow. Ecto, ectodermal cell; Endo, endodermal cell; M, mesoglea. For simplicity the nervous system has been omitted. b–e, Electron micrographs of cell–cell and cell–matrix contacts in Hydra. b, Apical septate junction. c, Spot desmosome between basal muscle processes. d, Gap junction in the lateral cell membrane. e, Hemidesmosome-like cell–mesoglea contact site. Scale bars in b–e indicate 100 nm. f, Phylogenetic distribution of cell–cell and cell–substrate contact proteins. A filled box indicates the presence of an orthologue from the corresponding protein family as identified by SMART/Pfam analysis or conserved cysteine patterns. See Supplementary Information section 17 and Supplementary Table 21 for details.


Although some gene families associated with cell–cell and cell–substrate interactions are also found in placozoans, demosponges and choanoflagellates, it is important to note that there are cell-adhesion-associated protein domains specific to cnidarians and bilaterians. For example, Hydra and Nematostella have classic cadherins exhibiting a highly conserved, bilaterian-type cytoplasmic (CCD) domain (Fig. 3b and Supplementary Fig. 18) that is able to interact with β- and p120/δ-catenin (Supplementary Information section 17). So far, only one sponge cadherin gene that encodes a cytoplasmic domain with weak similarity to the eumetazoan CCD domain has been detected.

The sequencing of the Hydra genome has revealed unexpected relationships between the genetic makeup of the animal and its biology. The genes encoding the proteins that form epithelial junctions in bilaterians are present in Hydra, yet there are obvious differences in the structures of the junctional complexes. Despite the morphological similarity of neuromuscular junctions in bilaterians and Hydra, several of the key genes required to make this junction in bilaterians are absent from Hydra. Hydra has a complete set of muscle genes but lacks mesoderm and forms muscles only in epithelial cells. Most of the genes required for stem cell pluripotency in mammals are absent from Hydra, yet Hydra has a multipotent stem cell system that functions similarly to stem cell systems in bilaterians. The availability of the Hydra genome sequence and methods to manipulate it30 provides an opportunity to understand how this remarkable animal evolved.

Methods Summary

The genome of Hydra magnipapillata strain 105 was sequenced at the J. Craig Venter Institute using the whole genome shotgun approach. Two different assemblies were generated and deposited in GenBank (accession numbers ABRM00000000 and ACZU00000000). Complementary DNA libraries were prepared using standard methods and ESTs were generated at the National Institute of Genetics (Mishima, Japan) and the Genome Sequencing Center (Washington University, St Louis). ESTs have been deposited in the dbEST database at the National Center for Biotechnology Information. The Curvibacter sp. genome sequence has been deposited in GenBank (accession numbers FN543101, FN543102, FN543103, FN543104, FN543105, FN543106, FN543107 and FN543108).