A goal of comparative genomics is to decipher the causal connections between genome composition and animal form. The phylum Cnidaria (sea anemones, corals, hydroids and jellyfish) holds a pivotal place in such studies. Phylogenetic analyses consistently support cnidarians as the sister clade to Bilateria (protostomes plus deuterostomes), the clade that encompasses 99% of extant animals (Fig. 1a)1,2. Putative fossils of extant cnidarian classes have been identified in lower Cambrian strata, suggesting that cnidarian diversification represents one of the oldest evolutionary events among living animal phyla3,4. Nearly all cnidarian life cycles incorporate polyp and/or medusa body plans (Fig. 1b), the former a sessile life stage, and the latter a swimming predator equipped with neural and sensory structures that rival those of many bilaterians. Sequenced cnidarian genomes include the sea anemones Nematostella vectensis5 and Exaiptasia pallida (syn. Aiptasia sp.)6, the coral Acropora digitifera7 and the hydroid Hydra vulgaris (formerly Hydra magnipapillata)8. However, none of these species has a medusa life stage, and thus a major event in the evolution of complex animal life has not been subjected to whole genome sequencing.

Fig. 1: Cnidarian relationships, life cycles and sensory structures.

a, Cladogram showing the phylogenetic relationships between cnidarians with published genome sequences. b, Representative life cycles for cnidarians. Red arrows indicate sexual reproduction; blue arrows indicate metamorphosis and/or asexual reproduction. Some images modelled after Technau and Steele87. c, Organization of the rhopalium, a sensory structure found only in certain medusozoans such as Aurelia. d, Antibody staining demonstrating the clustering of tyrosinated tubulin-positive neurons (green) in the rhopalia. Red, phalloidin (actin stain); green, tyrosinated tubulin (Sigma, cat. no. T9028); blue, TO-PRO-3 Iodide (nuclear stain). Scale bar, 50 µm.

To improve our understanding of life history evolution in cnidarians, we have generated a draft genome assembly from the moon jellyfish Aurelia (‘species 1’ strain sensu Dawson and Jacobs9), augmented with transcriptomes that cover the major life stages. Aurelia offers a tractable laboratory model and a valuable addition to comparative genomics. It is a member of the medusozoan class Scyphozoa, which represents a sister clade to Hydra and its relatives (Hydrozoa)10. The Aurelia medusa is a swimming planktivore, featuring complex neural and sensory system architecture manifested in eight structures called rhopalia, which are located on the margin of the medusa’s bell (Fig. 1c,d). The rhopalium features multiple sensory structures (including an eye-cup, a mechanosensory touch plate and a geosensory statocyst) and is patterned using several genes involved in bilaterian sensory organogenesis11,12. No comparable sensory structures exist in Nematostella, Exaiptasia, Acropora or Hydra. Genomes from medusa-bearing cnidarians such as Aurelia, alongside the forthcoming Clytia genome13, thus provide a new vantage into the evolution of complex animal life cycles.

Results and discussion

We sequenced and assembled the Aurelia genome using a combination of Illumina paired-end, mate-pair and PacBio data (see Methods section). Our final assembly has a total size of 713 megabases (Mb), which is consistent with previous estimates of the size of the Aurelia genome (C-value = 0.73 pg)14. This makes the Aurelia genome larger than sequenced anthozoan genomes, but smaller than some strains of H. vulgaris (~1.1–1.35 Gb for brown hydra and ~0.38 Gb for green hydra; see Supplementary Table 1)5,6,7,8. The Aurelia assembly is more fragmented than the anthozoan genomes. This is largely due to a high percentage of repetitive DNA, with transposable elements making up ~49.5% of the genome, and another ~0.8% of the genome consisting of simple tandem repeats (see Supplementary Table 5 for a summary of transposable elements). Synteny analysis performed with MCScanX15 suggests that anthozoans share far more syntenic blocks of orthologous genes amongst themselves than they do with Aurelia (see Supplementary Table 6 and the Supplementary Data). However, Aurelia shares more syntenic gene blocks with anthozoans than it does with Hydra, which suggests that its genome architecture is less derived. We found no evidence for trans-spliced leader sequences in our messenger RNA models, meaning that their presence in some hydrozoans is probably a clade-specific novelty16,17. Overall, the Aurelia genome shares characteristics with both anthozoans and hydrozoans, consistent with its phylogenetic placement (Fig. 1a).
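The agreement between the assembly size and the published C-value can be checked with a short calculation. The sketch below is illustrative only: it assumes the commonly used approximation that 1 pg of double-stranded DNA corresponds to roughly 978 Mb, a conversion factor that is not stated in the text, and the function name is hypothetical.

```python
# Assumption: 1 pg of double-stranded DNA is roughly 978 Mb.
# This conversion factor is a widely used approximation, not a value from the paper.
MB_PER_PG = 978.0


def c_value_to_mb(c_value_pg: float) -> float:
    """Convert a haploid C-value in picograms to an approximate genome size in Mb."""
    return c_value_pg * MB_PER_PG


estimated_mb = c_value_to_mb(0.73)  # C-value reported for Aurelia
assembly_mb = 713.0                 # total assembly size reported in the text
relative_diff = abs(estimated_mb - assembly_mb) / estimated_mb

print(f"Estimated from C-value: {estimated_mb:.0f} Mb")
print(f"Relative difference from assembly: {relative_diff:.2%}")
```

Under this assumption the C-value corresponds to roughly 714 Mb, so the two figures differ by well under 1%.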

Our annotation pipeline resulted in 29,964 gene models. This is on the higher end of gene count estimates in early branching animals, but is fewer than recent estimates for Acropora (Supplementary Table 1) and far fewer than the >40,000 genes currently predicted in the sponge Amphimedon18,19. Benchmarking Universal Single-Copy Ortholog (BUSCO)20 analysis of these gene models recovers complete or partial sequences for 76% of ‘core’ metazoan genes and 86% of ‘core’ eukaryotic genes, making the Aurelia assembly comparable to early branching organisms such as Amphimedon, Nematostella and Mnemiopsis (see Extended Data Table 3 in Levin et al.21, and the Supplementary Data for detailed BUSCO output). Using Pfam annotation, we catalogued the number of proteins with putative transcription-factor and peptide-signalling domains (Supplementary Tables 8 and 9; see the Supplementary Data for full Pfam annotation). In nearly every case, the numbers of conserved proteins in Aurelia fall within the range of other cnidarians. Based on these results, we feel confident that we have generated a draft genome of sufficient quality for comparative study.

The first question we wanted to address was intraspecies variability across Aurelia populations. The jellyfish used in our research, which is native to the coastline of California, is commonly referred to as Aurelia aurita. However, genetic markers reveal large sequence differences between various Aurelia populations (up to 40% divergence in ITS-1 and 23% in cytochrome c oxidase subunit I (CO1))9. Such diversity is comparable to interspecific differences in other marine animals, and suggests that the Aurelia species complex is ancient, probably originating in the Mesozoic9,22. Do these large differences in mitochondrial and non-coding regions imply equally large changes at the peptide level? To test this, we compared the protein models from our Californian strain of Aurelia to previously published transcriptomes from populations in Roscoff, France23, and Eilat, Israel24. The complete mitochondrial genome of our organism (contig ‘Seg3751’) shows 99% similarity to the ‘Aurelia aurita (2)’ mitogenome published by Park et al. (National Center for Biotechnology Information (NCBI) accession HQ694729)22. Phylogenetic analysis of the CO1 sequence derived from this mitogenome confirms that our strain is part of the ‘species 1’ complex (Fig. 2a). CO1 sequences of the Californian and Roscoff strains are ~97.8% identical, while the Californian and Eilat strains are ~81.5% identical. The average pair-wise identity between single-copy orthologous proteins is consistent with the CO1 results; amino acid sequences from the California and Roscoff strains are, on average, ~97.7% identical, while the California and Eilat strains are ~90.9% identical (Fig. 2b). For comparison, these same proteins in mice (Mus musculus) and rats (Rattus norvegicus) are, on average, ~95.1% identical (see the Supplementary Data). This means there is greater protein sequence divergence between some Aurelia populations than there is between mice and rats. 
These results suggest that, similar to Hydra, substantial variation exists across Aurelia genomes.
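The average pair-wise identity reported above can be illustrated with a minimal sketch. It assumes that each orthologue pair has already been aligned and that gapped columns are excluded from the calculation; the text does not specify how gaps were treated, so this is only one plausible definition. The toy sequences and function names are hypothetical.

```python
from typing import Iterable, Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def mean_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average percent identity across a collection of aligned orthologue pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    if not values:
        raise ValueError("no pairs supplied")
    return sum(values) / len(values)


# Toy alignments (hypothetical data, not taken from the study).
toy_pairs = [("MKT-AYLV", "MKTQAYIV"), ("ACDE", "ACDE")]
print(f"{mean_identity(toy_pairs):.1f}% mean identity")
```

Other conventions, such as counting gaps as mismatches or normalizing by the shorter sequence, would give somewhat different values, so the definition should be fixed before comparing strains.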

Fig. 2: Intraspecies variability across the genus Aurelia.

a, Unrooted phylogenetic tree of Aurelia strains based on the CO1 genetic marker. Our ‘California strain’ is noted with a red arrow; the ‘Roscoff’ and ‘Eilat’ strains are noted with green and purple arrows, respectively. b, A graph showing the percentage amino acid identity of peptides between the strains of Aurelia. This analysis is restricted to single-copy orthologues shared between the three strains.

As the first step in our comparison of the Aurelia genome to other cnidarian genomes, we used OrthoFinder25 to group the cnidarian proteomes, as well as those of the bilaterians Branchiostoma, Capitella, Drosophila, Homo, Lottia and Limulus, into putative sets of conserved orthologues. Aurelia shares 378 conserved orthologous groups (COGs) with 1 or more bilaterians to the exclusion of other cnidarian genomes, including 27 COGs shared with Drosophila and 60 COGs with humans (Supplementary Fig. 2; the full list is provided in the Supplementary Data). Noteworthy, vetted members of this list include homologues of FBXO25/FBXO32 and RAG1, members of the FoxO signalling pathway that regulates stem cell maintenance in Hydra26,27, as well as JMY, which dynamically regulates cell motility and p53-based tumour suppression28. RAG1 has previously been identified in the hydrozoan jellyfish Podocoryna29, which suggests that the FoxO pathway might be broadly conserved across medusa-bearing cnidarians. Despite the hypothesized derived nature of medusozoans, their orthologue repertoire is as similar to that of bilaterians as the anthozoan repertoire is (Fig. 3a); this suggests that medusozoans and anthozoans have retained comparable portions of the ancestral cnidarian/bilaterian gene repertoire.
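A comparison of orthologue repertoires of the kind summarized in Fig. 3a can be sketched by coding each orthologous group as present (1) or absent (0) in each taxon and correlating the resulting vectors. The sketch below uses the Pearson correlation as an assumption, since the text does not name the coefficient used; the taxon labels and presence/absence values are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[int], y: List[int]) -> float:
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical presence/absence calls for six orthologous groups in three taxa.
presence: Dict[str, List[int]] = {
    "taxon_A": [1, 1, 0, 1, 0, 1],
    "taxon_B": [1, 1, 0, 1, 1, 0],
    "taxon_C": [0, 1, 1, 0, 1, 0],
}

# Symmetric taxon-by-taxon correlation matrix.
matrix = {
    a: {b: pearson(presence[a], presence[b]) for b in presence}
    for a in presence
}

for taxon, row in matrix.items():
    print(taxon, " ".join(f"{value:+.2f}" for value in row.values()))
```

With real data, each vector would have one entry per orthologous group, and the matrix could then be drawn as a heat map.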

Fig. 3: Gene expansions and losses among sequenced cnidarian genomes.

a, A correlation matrix of orthologous gene clusters, represented as heat maps. The heat map codes clusters as binary ‘present/absent’ data for each taxon. b, A molecular clock for the five cnidarians and six bilaterians included in this study. Pie charts represent 8,263 conserved gene families present in the last common ancestor of cnidarians and bilaterians; the colours in each chart represent the number of families experiencing gene copy expansion, retraction or no change at that evolutionary node. Nodes within the Bilateria have been removed for simplicity, but all data for node dates and expansion/contraction statistics are available in the Supplementary Data. Abbreviations for the x axis: Cry, Cryogenian; Edi, Ediacaran; Є, Cambrian; O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; T, Triassic; J, Jurassic; K, Cretaceous; Pg, Paleogene; Ng, Neogene.

Focusing on orthologue clusters shared between cnidarians and bilaterians, we next traced patterns of gene gain and loss across 8,263 conserved gene families shared in the cnidarian/bilaterian (planulozoan) last common ancestor (Fig. 3b). Our results suggest that cnidarians and bilaterians each had their own pattern of gene expansions and contractions, as well as lineage-specific increases in novel gene families. This is consistent with the correlation matrix (Fig. 3a), which suggests that the organisms in our data set have largely dissimilar patterns of gene gain and loss compared with each other. The fraction of gene family contractions in Aurelia inherited from the planulozoan last common ancestor (~40%) is slightly higher than anthozoans (31–35%) but lower than Hydra (46%), which has undergone substantial gene loss. Regarding gene expansions, the rate in Aurelia (~23%) is comparable to that of available cnidarian genomes (~12–24%). If we expand our consideration to genes not present in the last common ancestor, gene innovation appears to be commonplace in the anthozoans; the number of COGs restricted to 2 or more anthozoans (1,695 clusters) is far greater than the numbers restricted to medusozoans (319 clusters; see Supplementary Fig. 2 for details). There are several sets of transcription factors that appear greatly expanded in Aurelia compared with other cnidarians, including proteins featuring a basic region leucine zipper, C2H2 type zinc finger, ETS, GATA zinc finger and/or HMG box domain (Supplementary Table 8). In all of these cases, many of the genes are differentially expressed, and demonstrate complex expression profiles across Aurelia’s life history (Supplementary Figs. 3 and 4). These gene expansions provide possible candidates for regulating the complex life cycle found in Aurelia, and are worthy of future study. 
But at a genome-wide vantage, there is little evidence that the expansion of conserved genes played an outsized role in the evolution of medusozoan body-plans.

Homeobox genes, a large clade of transcription factors that share a ~60-amino-acid DNA-binding homeodomain region, are primary candidates in the study of animal body-plan evolution, and a common starting point when analysing the gene content of early branching animal lineages30,31,32,33. In our list of COGs, we recovered several homeobox genes that Aurelia putatively shares with bilaterians to the exclusion of available cnidarian genomes. However, high sequence conservation within this gene group limits vetting with reciprocal Basic Local Alignment Search Tool (BLAST) searches, so we performed a more detailed analysis of homeobox evolution using phylogenetic analysis (see Methods section). We attribute cnidarian homeodomains to 69 bilaterian families encompassing 9 classes (Fig. 4), which significantly increases the reconstructed homeobox gene complement of the planulozoan last common ancestor32. Anthozoans have higher homeobox gene counts than medusozoans; this is partly attributable to gene loss in medusozoans, but is mostly the result of multiple rounds of anthozoan-specific gene duplication events32,34. Putative anthozoan expansions involve Dmbx-, POU3-, Barx-, Bari-, Nk2- and Noto-like genes, as well as large radiations of PRD- and ANTP-class genes that cannot be readily matched to bilaterian genes (Supplementary Table 10 and see the Supplementary Data for homeodomain trees and assignments). In contrast, Aurelia appears to be missing 21 homeodomains found in 1 or more anthozoans (17 of which are also missing in Hydra), while it has mild expansions of Otx-, Vsx- and Hox9-13/15-like genes. These results provide a case study where the anthozoan gene repertoire is larger than that of Aurelia, despite the latter’s complex life cycle.

Fig. 4: The homeodomain complement of various animals, divided into the 11 major classes proposed by Zhong and Holland88.

Rows represent candidate genomes from major animal groups, organized by their evolutionary relationships. Columns contain gene counts for each of the 11 major homeodomain classes. The hypothesized complement of the cnidarian/bilaterian last common ancestor is presented in the grey box to the left. Increases in cnidarian gene counts are noted in red. Gene counts for non-cnidarians are taken from HomeoDB2 (ref. 88) and refs 30,31,89.

Although conserved gene families are not broadly expanded in Aurelia, it is nevertheless possible that taxonomically restricted (orphan) genes have played a driving role in the evolution of medusozoan life stages. To test this hypothesis, we analysed RNA sequencing (RNA-seq) data from six stages in the Aurelia life cycle: planula, polyp, early strobila, late strobila, ephyra and juvenile medusa (Fig. 1b). A total of 11,963 differentially expressed genes were phylogenetically annotated based on a series of BLAST queries (results provided in the Supplementary Data). We found no evidence that taxonomically restricted genes demonstrate a collective trend towards upregulation in taxonomically restricted life stages (Fig. 5). Instead, genes unique to Aurelia are expressed more or less evenly across the life cycle. Some orphan genes are likely to play important roles in the development of the medusa23 but, at a transcriptome-wide level, the evolution of novel life stages in Aurelia appears to be the result of redeploying deeply conserved genes as opposed to acquiring new ones.

Fig. 5: RNA-seq expression profiles across the life cycle.

Breakdown of 11,963 differentially expressed genes across the Aurelia life cycle by their putative taxonomic origin (left), and by their associated gene expression profiles (right). The gene expression profiles are organized by life stage on the x axis. The y axis shows the log transcript per million (TMM) counts for each gene in the cluster.

Since it appears that the development of medusozoan life stages involves redeployment of conserved genes, we next asked whether these genes demonstrate evidence of conserved functionality. We first searched for transcripts that are differentially regulated between pan-cnidarian life stages (planula through polyp) and medusozoan-specific life stages (early strobila through medusa). This analysis was restricted to genes that were successfully annotated using the Uniprot Swissprot35 data set. Enriched gene ontology annotations from these two clusters (provided in the Supplementary Data) are consistent with recent research on Aurelia development; for example, that the polyp-to-medusa transition involves major changes in the nervous system36, musculature37 and cnidocyte composition38. In a separate analysis, we annotated these differentially expressed genes based on their best BLAST hits from the Drosophila or Homo proteomes (see the Supplementary Data). These annotated genes were clustered into expression profiles (Supplementary Fig. 9) and submitted to STRING v10 (ref. 39) to look for the possible conservation of protein–protein interactions and enriched gene networks. According to STRING, all clusters contain significantly more protein–protein interactions than expected by chance (protein–protein interaction enrichment P value <0.05). These results support the hypothesis that conserved, differentially expressed genes in the medusa life stages are frequently involved in gene networks present in bilaterian animals.

For a final analysis, we focused on the enrichment of eye development proteins, because the homology between bilaterian and cnidarian eyes has been the subject of a long-standing debate in evolutionary biology40. Aurelia rhopalia feature a simple ‘pit eye’ that is probably capable of recognizing the direction of light41 (Fig. 1c), and scyphozoans are the sister taxon to cubozoans (box jellies), which feature complex eyes with a lens and retina. We began our analysis by using QuickGO to collect all Drosophila proteins known to play a role in eye morphogenesis (see the Supplementary Data). We created an interaction network for these proteins using STRING, and coloured them based on their expression profile in Aurelia (Fig. 6a). Of the genes involved in Drosophila eye morphogenesis, 61% have a homologue in Aurelia (292/478 queries); of these, ~59% exhibit significant differential expression in Aurelia (172/292 queries). For the 172 differentially expressed genes, only 19 are upregulated in medusozoan-specific life stages. These results suggest that proteins involved in Drosophila eye morphogenesis are not uniformly upregulated in Aurelia, and that many aspects of eye development are unlikely to be conserved.
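The proportions quoted in this paragraph follow directly from the reported counts, as the short calculation below shows. The variable names are illustrative; the last two percentages are not stated in the text and are derived here from the same counts.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts reported in the text.
queries = 478                  # Drosophila eye morphogenesis proteins queried
with_homologue = 292           # queries with a homologue in Aurelia
differentially_expressed = 172 # homologues with significant differential expression
up_in_medusozoan_stages = 19   # of those, upregulated in medusozoan-specific stages

homologue_pct = percent(with_homologue, queries)
de_pct = percent(differentially_expressed, with_homologue)
medusa_up_pct = percent(up_in_medusozoan_stages, differentially_expressed)
overall_pct = percent(up_in_medusozoan_stages, queries)

print(f"Homologue in Aurelia: {homologue_pct:.1f}%")
print(f"Differentially expressed among homologues: {de_pct:.1f}%")
print(f"Upregulated in medusozoan stages among DE genes: {medusa_up_pct:.1f}%")
print(f"Upregulated in medusozoan stages among all queries: {overall_pct:.1f}%")
```

Only about 11% of the differentially expressed homologues, or roughly 4% of all queried proteins, are upregulated in the medusozoan-specific stages, which is the basis for the statement that upregulation is not uniform.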

Fig. 6: Clustering of differentially expressed genes and gene ontology (GO) terms.

a, A protein interaction network showing genes involved in Drosophila eye development. The circles are coloured based on their expression profile in Aurelia. b,c, Protein interaction networks and select enriched GO terms for the Aurelia genes most similar in expression profile to eyes absent. Networks and enrichment analysis were performed using STRING, and based on putative homology to proteins in Drosophila (b) and humans (c). The illustrated GO terms were chosen by how informative they are and their non-redundancy. A full list of proteins and enriched GO terms are provided in the Supplementary Data.

Despite the abovementioned results, many of the major players of the ‘canonical’ eye-patterning network are upregulated in Aurelia during development of the medusa, including sine oculis (so), eyes absent (eya) and ocelliless (oc) (Fig. 6a). Many of these genes have previously been shown to be expressed in the Aurelia rhopalia11,42. We therefore flipped our original question; instead of asking what bilaterian eye-patterning genes are conserved in Aurelia, we asked, what are the functions of putative Aurelia eye-patterning genes in bilaterians? We used our gene clustering analysis to extract the genes with most similar expression profiles to eyes absent (Supplementary Fig. 8). Based on putative homologues in Drosophila and humans, we looked for potential conserved protein interactions and enriched gene ontologies. When compared against the Drosophila proteome, the Aurelia genes with expression profiles most similar to eyes absent are enriched in functions involving neurogenesis and compound eye formation (Fig. 6b). This analysis revealed some candidate genes for eye development in Aurelia that were missed in the QuickGO analysis. Interestingly, the same set of genes does not show enrichment for eye development in humans; instead, the list is dominated by proteins involved in kidney/nephron formation, neuron commitment and heart morphogenesis (Fig. 6c). Overall, our results provide intriguing evidence that sensory structures in Aurelia share ‘deep homology’ with bilaterian organs via ancestral multifunctional cell types43,44, and provide a case study for how the Aurelia genome can be queried to study gene regulatory network evolution in animals.


In conclusion, our results do not support the hypothesis that an increase in life history complexity in cnidarians is associated with an increase in gene number. Instead, Aurelia appears to pattern its strobila, ephyra and medusa life stages using many of the same genes found in bilaterian animals, possibly through the redeployment and modification of ancestral gene networks. This finding adds to a growing body of evidence that the evolution of the medusa life stage required the co-option of previously existing developmental gene networks and cell types. For example, Kraus and colleagues examined the expression of ten pan-metazoan genes in Aurelia, and determined that the medusa’s bell demonstrates a similar expression profile to the polyp tentacle45. The fact that a similar pattern is observed in the hydrozoan Clytia led these authors to conclude that medusas are homologous across the Cnidaria, and were derived from the polyp’s tentacle anlagen45. Polyps and medusas of the hydrozoan Podocoryna share similar Wnt3/frizzled dynamics, suggesting that axial patterning in the medusa is derived from the polyp46. Other structures in the medusa could have even older origins; the eyes of Cladonema and Aurelia medusae express canonical photoproteins and transcription factors found in bilaterian eyes, suggesting that both may be derived from ancestral photosensitive cells42,47,48,49, and light-induced spawning in Clytia medusae is driven by a hormone-regulating opsin, which could suggest a deep homology between cnidarian gonadal photosensitive-neurosecretory cells and bilaterian deep brain photoreceptors50. While compelling, these studies focus on well-understood and broadly conserved developmental genes, and their results might consequently overemphasize the similarities between medusa development and the development of other animals. A major contribution of this study to this literature is to demonstrate that these previous observations made on small numbers of genes appear to hold true at a genome-wide vantage.

A second contribution of this study is that it provides the first direct comparison between anthozoan genomes and the genome of a medusa-bearing cnidarian, which led to our discovery that patterns of gene gain, loss and co-option are comparable between the lineages. As important as gene co-option appears in Aurelia’s evolution, we did discover multiple gene family expansions that could be candidate drivers of medusa development, as well as many taxonomically restricted genes that are upregulated in the polyp-to-medusa transition. This finding is consistent with previous studies that have leveraged high-throughput sequencing to holistically examine medusa development, and broadly support the hypothesis that this life stage is generated from a combination of modified gene regulation as well as gene gain and loss23,51,52,53. However, our analyses allow us to further hypothesize that taxonomically restricted genes are not overrepresented in the polyp-to-medusa transition, and that changes in gene content appear just as common in the anthozoans as they are in Aurelia. Although anthozoans such as Nematostella are sometimes described as ‘basal’ cnidarians, this study provides a powerful reminder that all living animals exhibit a mosaic of ancestral and derived traits, and that reconstructing the genomic evolutionary history of animal life will continue to require a broad, comparative approach54.

We see two ways to interpret our analysis of the Aurelia genome, both of which have strong implications for the early evolution of animal life. The first interpretation is that medusozoans evolved a complex life cycle primarily by redeploying genetic and developmental pathways present in the planulozoan last common ancestor. This interpretation, if correct, suggests that animals can transition into radically different ecological niches (in this case, transitioning from benthic to pelagic carnivores) without major innovations in gene content. As the Precambrian–Cambrian transition represents an ecological explosion as much as a morphological one55, our results challenge the importance of genetic innovations in the early expansion of animal niches. The second possibility is that the last common ancestor of cnidarians had a medusa life stage, which was subsequently lost in anthozoans. This scenario was supported by many studies done in the twentieth century56, but lost popularity after genetic analyses refuted the hypothesis that hydrozoans are the earliest branching cnidarian lineage. Later cladistic analyses of morphological characters57 and the derived structure of many medusozoan mitochondrial genomes58 have been used as additional evidence that the medusozoan body-plan is derived in Cnidaria. However, our results do not support this hypothesis at the genetic level. Despite the current popularity of the ‘polyp-first’ scenario, it is worth reiterating that neither the polyp nor medusa life stage is found outside of cnidarians; it is therefore equally parsimonious for the first cnidarians to have had a biphasic life cycle that was lost in anthozoans, or for the medusa phase to have originated in medusozoans (see Fig. 1a). Our results cannot distinguish between these two scenarios, but they are consistent with a growing body of literature that the earliest branching animals may have included pelagic carnivores with complex neural and muscular architecture59,60. 
The ecological roles that animals such as jellyfish and ctenophores could have played in Precambrian oceans, where their modern mesoplankton prey were probably absent, are thus a pressing question in studies of the early evolution of animals61.

In addition to questions of evolution, we anticipate the Aurelia genome proving valuable in many other areas of biology. Given the varying degrees of nervous system complexity and behaviour across its life stages, Aurelia has been and will continue to be an important model for studying the development and function of nervous systems12. Aurelia is a promising candidate for marine population genomics, as the division of this circumglobal genus into multiple species or subspecies remains unresolved9. It is also an important ecological model system, as Aurelia is a major culprit in environmentally and economically damaging jellyfish blooms, which may or may not be on the rise due to climate change62. Finally, Aurelia will provide an important study system in animal regeneration, as different life stages exhibit varying strategies of wound healing63. We look forward to additional progress in these fields now that the moon jellyfish has joined the genome family.


DNA collection and genome assembly

For genome sequencing, a single Aurelia polyp obtained from the Birch Aquarium (San Diego, California) was grown into a clonal population in the laboratory. A segment of the mitochondrial CO1 gene was amplified and sequenced, identifying the strain as Aurelia sp.1 (ref. 9). Polyps were kept in artificial seawater (ASW) at room temperature and fed with Artemia nauplii (Brine Shrimp Direct, UT) once every 2 d. Strobilation was induced with 5 µM 5-methoxy-2-methylindole in ASW, or by lowering the temperature of the ASW to 14 °C for about a month. Total DNA was extracted from ephyrae using a salting-out protocol described in the Supplementary Methods. Ephyrae were chosen as the source material for genomic DNA collection since multiple ephyrae are produced by each polyp, and as pelagic organisms they carry a substantially lower risk of introducing the algal contaminants that often grow alongside polyp communities. DNA was sheared to an average size of 10 kbp using a Covaris G-tube. The libraries used and statistics on the sequences obtained are described in the Supplementary Methods and summarized in Supplementary Table 2.

Genome assembly

The strategy for assembling the Aurelia genome is illustrated in Supplementary Fig. 1. The 250-bp paired-end reads were assembled into contigs using DISCOVAR de novo with its default options (version 53488, Broad Institute). Only contigs >1 kbp were used for the subsequent scaffolding steps. Initial scaffolding was performed using error-corrected PacBio reads (produced in 2012 using XL-P2 sequencing chemistry) and SSPACE-LR with its default options (version 1-1)64. The hybrid error correction of PacBio reads was performed using proovread (version 2.13.8)65, with error correction based on a combination of 250-bp paired-end reads merged with FLASh66, as well as high-confidence unitigs generated with ALLPATHS-LG (version 48257)67. Unitigs were generated from the 250-bp paired-end reads as a fragment library and the two mate-pair data sets as jumping libraries without quality trimming. ALLPATHS-LG was run with FRAG_COVERAGE and JUMP_COVERAGE set to 45, CLOSE_UNIPATH_GAPS set to FALSE and HAPLOIDIFY set to TRUE. The output of SSPACE-LR was further scaffolded using SSPACE (version 3.0)65,68 with the two sets of quality-trimmed mate-pair reads and the following options: -x 0 -m 32 -o 20 -k 5 -a 0.70 -n 15 -p 0 -v 0 -z 0 -g 0 -T 32 -S 0. Quality trimming of the 4-kbp mate-pair reads was done using HTQC69. Quality trimming of the 8-kbp mate-pair reads was done using cutadapt70 and Trimmomatic71. Scaffolding with SSPACE-LR was repeated before gaps were filled with PacBio reads using PBJelly (version 15.8.24)72 with -t 1000 -w 4000 options at the assembly step. All filtered reads without error correction were used for the gap filling with PBJelly. Additional scaffolding steps with SSPACE and SSPACE-LR were carried out after the gap filling. Final scaffolding was performed using L_RNA_scaffolder73 combined with the de novo transcriptome assembly (see below). 
Finally, gaps were filled using Sealer (version 1.9.0)74 and quality-trimmed 250-bp paired-end reads with -P 100 and -B 5000 options by scanning k-mer sizes from 96 through 86. Quality trimming of the 250-bp paired-end reads was done using Trimmomatic71. Assembly statistics at each step of the assembly pipeline are shown in Supplementary Table 3. Scaffolds larger than 2 kbp were used to calculate the final assembly statistics in Supplementary Table 1.
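Summary statistics for an assembly filtered by scaffold length can be computed as in the sketch below. It assumes that N50 is among the statistics of interest, which is a common choice but is not specified in the text, and it uses hypothetical scaffold lengths. The strict 2-kbp threshold mirrors the 'larger than 2 kbp' criterion stated above.

```python
from typing import Iterable


def n50(lengths: Iterable[int], min_length: int = 0) -> int:
    """N50 of the sequences strictly longer than min_length.

    N50 is the length L such that sequences of length >= L contain
    at least half of the total bases retained after filtering.
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences pass the length filter")
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


# Hypothetical scaffold lengths in bp (not taken from the assembly).
scaffold_lengths = [1500, 2500, 4000, 10000, 8000, 500]

print("N50 (scaffolds > 2 kbp):", n50(scaffold_lengths, min_length=2000))
print("N50 (all scaffolds):", n50(scaffold_lengths))
```

Because short scaffolds are removed before the total is computed, the filtered N50 can differ from the unfiltered value, so the threshold should always be reported alongside the statistic.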

Isolation of mRNA, library preparation and de novo transcriptome sequencing

DNA/RNA was extracted from samples using a phenol/chloroform protocol, and total RNA was isolated using a clean-up step with TRI reagent (Sigma-Aldrich). Details of the protocol are described in the Supplementary Methods. The concentration and integrity of each RNA extraction were verified using a 2100 Bioanalyzer (Agilent). Total RNA was converted into tagged complementary DNA libraries using the TruSeq RNA Sample Preparation Kit v2 (Illumina) according to the manufacturer’s protocol. Libraries were sequenced using an Illumina HiSeq 2000. We began by running 1 polyp sample on 1 lane with 100-nucleotide paired-end sequencing. After vetting the results, we performed additional 100-nucleotide paired-end sequencing on samples across the life cycle. These paired-end data sets were used for the de novo transcriptome assembly. Additional biological replicates were sequenced using 50-nucleotide single-end reads. Details about each sample and the relevant NCBI Sequence Read Archive accessions are provided in Supplementary Table 2.

Gene prediction and annotation

The annotation pipeline is described in detail in the Supplementary Methods and illustrated in Supplementary Fig. 1. Briefly, de novo transcriptome assembly was performed using Trinity75, and these data were passed to PASA76. Ab initio predictions were performed using GeneMark-ES77, GlimmerHMM78 and the AUGUSTUS web server79 with default settings. Trinity models and the Uniprot Swissprot protein data set were mapped to the genome using exonerate80 and GMAP81. All gene models were passed to EVidenceModeler76 to create a weighted consensus gene structure data set, and the weighted models were passed back into PASA to create a final set of predictions76.

Following gene modelling, the results went through an annotation pipeline that included the following analyses: (1) BLASTp of protein models against the Uniprot Swissprot data set, (2) BLASTx of transcript models against the Uniprot Swissprot data set and (3) protein domain identification using HMMER and the Pfam-A database82,83. Gene models were rejected if they lacked a protein model and Uniprot annotation and had fewer than ten total reads mapped from the RNA-seq analyses (described below). This resulted in a final count of 29,964 vetted gene models. An annotation report from this pipeline is included in the Supplementary Data. The gene annotations described above were used to create the tables comparing genes with conserved transcription-factor domains (Supplementary Table 8) and signalling molecules (Supplementary Table 9). Basic statistics on the gene models are provided in Supplementary Table 4.
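The rejection rule for gene models can be sketched as follows. This is an illustration, not the pipeline code. The `GeneModel` class and its field names are hypothetical, and the sketch assumes that a model is rejected only when all three conditions hold together (no protein model, no Uniprot annotation, and fewer than ten mapped reads).

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    # Hypothetical record; field names are illustrative only.
    gene_id: str
    has_protein_model: bool
    has_uniprot_hit: bool
    mapped_reads: int


def is_vetted(model, min_reads=10):
    """Return True unless the model fails all three criteria at once.

    Assumption: rejection requires the conjunction of the three conditions.
    """
    rejected = (
        not model.has_protein_model
        and not model.has_uniprot_hit
        and model.mapped_reads < min_reads
    )
    return not rejected


def vet(models, min_reads=10):
    """Keep only the gene models that pass the vetting rule."""
    return [m for m in models if is_vetted(m, min_reads)]
```

Under this reading, a model with no protein model or annotation is still retained if at least ten RNA-seq reads map to it.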

Test for trans-spliced leader additions

Because the gene models are built on the genomic backbone, we would not anticipate finding trans-spliced leader additions in these data. We instead used the de novo mRNA models, which were assembled by Trinity using 100-bp paired-end reads (see above). We performed two tests to look for conserved leader sequences. First, we used BLASTn to query all known Clytia16 and Hydra17 trans-spliced leader sequences against the Trinity mRNA models. After finding no hits, we truncated all Trinity mRNA models to the first 100 bp, and then performed an all-versus-all BLASTn analysis with an e-value cut-off of 10 × 10⁻⁵. Only one pair of unrelated mRNA models (that is, not sharing the same cluster and/or gene identity in the Trinity output) shared a conserved region in this analysis. We therefore conclude that there is no evidence in our data for trans-spliced leader addition in Aurelia.
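The truncation step that precedes the all-versus-all search can be sketched as below. This is a minimal illustration that assumes the Trinity models are available as FASTA text. The function name `truncate_fasta` is hypothetical and does not come from the original analysis.

```python
def truncate_fasta(lines, max_len=100):
    """Yield (header, prefix) pairs, keeping the first max_len bases per record.

    `lines` is any iterable of FASTA lines; sequences may wrap across lines.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)[:max_len]
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)[:max_len]
```

The truncated records could then be written back to FASTA and used as both query and database for the BLASTn comparison.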

RNA-seq analysis

We used a genome-guided approach to RNA-seq. First, raw reads were aligned to the Aurelia genome using Hisat-284. For paired-end data sets, only the first 50 nucleotides from the forward reads were used. Gene counts were then estimated with the StringTie package85. Following vetting of the data sets (Supplementary Fig. 7), differential gene expression was calculated using the EdgeR package86. Only vetted genes were included in the analysis. Differentially expressed genes were identified based on a false-discovery rate adjusted P value threshold of 0.05, and a minimum fourfold change in expression in at least one life stage comparison. The StringTie count matrix used for EdgeR is provided in the Supplementary Data.
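The thresholds for calling a gene differentially expressed can be sketched as a simple filter. The sketch makes three assumptions. Fold changes are supplied as log2 values. Both criteria must be met within the same life stage comparison. The adjusted P value must be strictly below 0.05. The function name is hypothetical, and the sketch does not reproduce the EdgeR analysis itself.

```python
import math


def is_differentially_expressed(comparisons, fdr_cutoff=0.05, min_fold=4.0):
    """Decide whether a gene passes in at least one pairwise comparison.

    `comparisons` is an iterable of (log2_fold_change, adjusted_p) tuples,
    one per life stage comparison. Assumption: both thresholds must be met
    within the same comparison.
    """
    log_threshold = math.log2(min_fold)  # fourfold change corresponds to |log2FC| >= 2
    return any(
        adj_p < fdr_cutoff and abs(log_fc) >= log_threshold
        for log_fc, adj_p in comparisons
    )
```

Taking the absolute value of the log2 fold change treats fourfold increases and fourfold decreases symmetrically.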

STRING analysis

For STRING analysis, all differentially expressed genes from Aurelia were queried against the predicted proteins for Drosophila (Uniprot identity: UP000000803) and Homo (Uniprot identity: UP000005640) using BLASTx (with an e-value cut-off of 10 × 10⁻⁵). The top BLAST hits were used to batch submit queries in the ‘Multiple Proteins’ section of the STRING v10 web server39.
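Selecting one top hit per query before submission to STRING can be sketched as follows. The sketch assumes the BLASTx results were saved in the standard 12-column tabular format, and that the top hit is the one with the highest bit score among hits at or below the e-value cut-off. The function name `top_hits` is hypothetical.

```python
def top_hits(rows, max_evalue=1e-4):
    """Map each query to its best-scoring subject.

    `rows` are tab-separated lines in the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. The default cut-off of 1e-4 equals 10 x 10^-5.
    Assumption: the top hit is the one with the highest bit score.
    """
    best = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comments or malformed lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

The resulting subject identifiers form the list that would be pasted into the batch query form.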

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.