Topological structures and syntenic conservation in sea anemone genomes

Zimmermann, Bob; Montenegro, Juan D.; Robb, Sofia M. C.; Fropf, Whitney J.; Weilguny, Lukas; He, Shuonan; Chen, Shiyuan; Lovegrove-Walsh, Jessica; Hill, Eric M.; Chen, Cheng-Yi; Ragkousi, Katerina; Praher, Daniela; Fredman, David; Schultz, Darrin; Moran, Yehu; Simakov, Oleg; Genikhovich, Grigory; Gibson, Matthew C.; Technau, Ulrich

doi:10.1038/s41467-023-44080-7

Article
Open access
Published: 13 December 2023

Nature Communications volume 14, Article number: 8270 (2023)

Abstract

There is currently little information about the evolution of gene clusters, genome architectures and karyotypes in early branching animals. Slowly evolving anthozoan cnidarians can be particularly informative about the evolution of these genome features. Here we report chromosome-level genome assemblies of two related anthozoans, the sea anemones Nematostella vectensis and Scolanthus callimorphus. We find a robust set of 15 chromosomes with a clear one-to-one correspondence between the two species. Both genomes show chromosomal conservation, allowing us to reconstruct ancestral cnidarian and metazoan chromosomal blocks, consisting of at least 19 and 16 ancestral linkage groups, respectively. We show that, in contrast to Bilateria, the Hox and NK clusters of investigated cnidarians are largely disintegrated, despite the presence of staggered hox/gbx expression in Nematostella. This loss of microsynteny conservation may be facilitated by shorter distances between cis-regulatory sequences and their cognate transcriptional start sites. We find no clear evidence for topologically associated domains, suggesting fundamental differences in long-range gene regulation compared to vertebrates. These data suggest that large sets of ancestral metazoan genes have been retained in ancestral linkage groups of some extant lineages; yet, higher order gene regulation with associated 3D architecture may have evolved only after the cnidarian-bilaterian split.

Introduction

Comparative genomics and epigenomics have provided fundamental insights into the evolution of gene regulation in multicellular organisms. Among basally branching animals, this led to the surprising discovery of conserved gene repertoires^1,2,3, of microsyntenic gene clusters and patterns of DNA methylation and histone modification codes^4,5,6. Recent applications of long-read sequencing technologies and proximity ligation methods such as Hi-C have facilitated chromosome-level genome assembly for many organisms. These chromosome-level genome assemblies have presented the opportunity to compare the content and localization of homologous genes between distantly related species and to reconstruct ancestral linkage groups. In vertebrates and several other bilaterians, topological analysis has revealed three-dimensional chromosomal architecture⁷, organized as chromosomal compartments and topologically associated domains (TADs)⁸. The boundaries of TADs have been shown to act as barriers of gene regulation^9,10,11,12. Analyses of chromosome evolution and three-dimensional genome structure have mainly focused on bilaterians, and it remains unclear whether non-bilaterian animals share the topological signatures of bilaterian chromosomes. In this regard, studying Cnidaria, the sister clade to Bilateria, is a crucial step to understanding the evolution of animal genomes.

Cnidaria constitutes a large clade of basally branching Metazoa, dating back between 590 and 690 Mya^13,14,15. Their robust phylogenetic position as the sister group to Bilateria makes them the key group to study the evolution of bilaterian features, such as axis organization, mesoderm formation and central nervous system development¹⁶. The starlet sea anemone, Nematostella vectensis (Fig. 1a; Hexacorallia, Actiniaria, Edwardsiidae), has been developed into an important model organism^17,18,19 and in 2007 became the first non-bilaterian animal to have a draft scaffold-level genome assembled¹. The Nematostella genome revealed uncanny conservation of gene content with vertebrates, as well as the first observations of large-scale macrosyntenic conservation, obtained by comparing human chromosomes to Nematostella scaffolds¹.

**Fig. 1: Chromosomal assembly of edwardsiid sea anemones *Nematostella vectensis* and *Scolanthus callimorphus*.**

Since the release of the Nematostella genome, genomes of representatives of all five cnidarian classes have become available^2,20,21,22,23,24,25,26,27, providing valuable insight into various aspects of the cnidarian gene complement and genome organization. However, these genomes originated from distantly related species, and few genomic studies of cnidarians have sought to search for genomic conservation signals in recently diverged species, a cornerstone of comparative genomics²⁸. For example, no genome sequence of an edwardsiid sea anemone has been reported, aside from that of Nematostella. Another sea anemone in the family Edwardsiidae is the “worm sea anemone” Scolanthus callimorphus (Fig. 1b), which dwells in European intertidal zones^29,30. According to our molecular clock calculations, it separated from Nematostella vectensis ~174 million years ago and is to date its closest sequenced relative (Supplementary Fig. 1, see Materials and Methods for details).

Here, we report chromosome-level genome assemblies and new gene models for the model organism Nematostella vectensis and for the closely related species Scolanthus callimorphus. We show that the macrosyntenic localization of orthologous genes across the chromosomes is highly conserved across many cnidarians and bilaterians, allowing us to reconstruct the minimal set of ancestral eumetazoan chromosomal linkage blocks. Notably, the macrosyntenic conservation contrasts with the loss of most of the microsyntenic blocks, including the integrity of the Hox cluster. Moreover, in Nematostella, which like all non-bilaterians lacks the TAD boundary protein CTCF^4,31, Hi-C analyses did not reveal any obvious TAD-like structures of the kind known in Bilateria. We propose that the proximity of most cis-regulatory regions to their target genes is an ancestral metazoan feature, whereas large-scale three-dimensional structures, such as TADs, only became necessary in bilaterians with larger genomes and more distantly located enhancers.

Results

High quality chromosome-level assemblies of two Edwardsiid genomes

Using short-read sequencing and a k-mer coverage model, we estimated the genome length of Nematostella at 244 Mb (Supplementary Fig. 2), which is substantially shorter than the previously suggested 450 Mb¹. This discrepancy could be partly attributed to the previous use of four haplotypes in sequencing, as well as to lower coverage and shorter read length. The genome of the sea anemone Exaiptasia pallida is similar in length to that of Nematostella³², while the Scolanthus genome, at an estimated 414 Mb, is at present the largest sequenced sea anemone genome (Fig. 1c). Using PacBio long-read sequencing and high-throughput chromosome conformation capture (Hi-C), we then assembled chromosome-level Nematostella and Scolanthus genomes, which surpass the quality of the original Nematostella genome in terms of contiguity, accuracy and mappability (see Supplementary Notes, Supplementary Figs. 2,3 for details).

Gene completeness, as measured by alignment of single-copy pan-metazoan genes to the assemblies using the BUSCO method³³, was comparable to the previous assembly (Supplementary Fig. 2i). However, to assess this fully, we generated new gene models. Using a combination of IsoSeq and RNA-Seq data, we identified 24,525 gene models and 36,280 transcripts (see Supplementary Notes for details). BUSCO analysis showed that the gene set was more complete than a previously generated gene set³⁴. In addition, the new gene models showed better mappability to a publicly available single-cell RNA-Seq data set^35,36 (Supplementary Fig. 4).

To facilitate the usage of the newly assembled genomes, we established a publicly accessible genome browser. Both new genome assemblies and associated data are available for browsing, downloading, and BLAST at SIMRbase (https://simrbase.stowers.org). The Nematostella vectensis genome assembly, referred to as Nvec200, has an abundance of aligned track data, including the newly generated gene models, a large collection of published RNAseq and ChIP-seq analyses, as well as 145 ultra-conserved non-coding elements (UCNEs) shared between Nematostella and Scolanthus (Supplementary Notes; Supplementary data file 8).

Comparison of Nematostella chromosomes to metazoan genomes

We identified 15 chromosomes in the new genome assemblies of Nematostella and Scolanthus, and we numbered these chromosomes from 1 to 15 according to decreasing size (Fig. 1d, e). This is in line with the previous estimates based on Nematostella metaphase plates¹ and the analysis of Nematostella chromosome spreads³⁷. Homologous chromosomes of the two species mostly corresponded in relative length, but were much larger in Scolanthus (Fig. 4d). This was accounted for by a large fraction of unclassified and potentially lineage-specific repeat sequences (Supplementary data file 1, Supplementary Notes). No obvious heteromorphic sex chromosomes could be identified based on read mapping depth, or from the level of heterochromatin usually associated with non-recombining sex chromosomes. Ascertaining the sex-determining regions of the genome will require more data and detailed analyses.

Next, we wished to determine the extent to which the Nematostella and Scolanthus chromosomes exhibit conservation of gene content and order (macro- and microsynteny). Indeed, each of the 15 chromosomes of both species shared a majority of orthologous genes with a single corresponding chromosome in the other species (Fig. 2a). We found that 8117 of 8692 mutual best BLAST hits between Nematostella and Scolanthus were retained on their respective chromosomes, implying a one-to-one homology between all 15 chromosomes. However, gene order has largely been lost since the most recent common ancestor (MRCA) of the two species, from which we estimate they diverged approximately 174 Mya (Supplementary Fig. 1).
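The retention figure above (8117 of 8692 pairs, roughly 93%) can be illustrated with a minimal sketch. It assumes that the best BLAST hit of every gene has already been extracted into two dictionaries and that the partner of each chromosome is simply the chromosome sharing the most ortholog pairs with it. All gene and chromosome names are hypothetical toy data, and the function names are illustrative rather than taken from the study's pipeline.

```python
from collections import Counter

def reciprocal_best_hits(a_to_b, b_to_a):
    """Return gene pairs (a, b) in which each gene is the other's best hit.

    a_to_b / b_to_a map a query gene to its best-scoring hit in the other species.
    """
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]

def chromosome_retention(pairs, chrom_a, chrom_b):
    """Fraction of ortholog pairs that lie on corresponding chromosomes.

    Assumption: the partner of each species-A chromosome is the species-B
    chromosome that shares the most ortholog pairs with it.
    """
    counts = Counter((chrom_a[a], chrom_b[b]) for a, b in pairs)
    partner = {}
    for (ca, cb), n in counts.items():
        if ca not in partner or n > counts[(ca, partner[ca])]:
            partner[ca] = cb
    retained = sum(n for (ca, cb), n in counts.items() if partner[ca] == cb)
    return retained / len(pairs), partner

# Hypothetical toy data: best hits in both directions and gene locations.
nv_best = {"nv1": "sc1", "nv2": "sc2", "nv3": "sc3", "nv4": "sc9", "nv5": "sc5", "nv6": "sc6"}
sc_best = {"sc1": "nv1", "sc2": "nv2", "sc3": "nv3", "sc9": "nv7", "sc5": "nv5", "sc6": "nv6"}
nv_chrom = {"nv1": "NvChr1", "nv2": "NvChr1", "nv3": "NvChr2", "nv5": "NvChr2", "nv6": "NvChr2"}
sc_chrom = {"sc1": "ScChr1", "sc2": "ScChr1", "sc3": "ScChr2", "sc5": "ScChr2", "sc6": "ScChr1"}

pairs = reciprocal_best_hits(nv_best, sc_best)        # nv4/sc9 is not reciprocal
fraction, partner = chromosome_retention(pairs, nv_chrom, sc_chrom)
print(f"{fraction:.0%} of {len(pairs)} pairs retained")
```

In the toy data, one of five reciprocal pairs (nv6/sc6) sits on a non-corresponding chromosome, which is the kind of pair that makes up the remaining ~7% in the comparison reported above.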

**Fig. 2: Macrosynteny conservation of edwardsiid genomes and reconstruction of ancestral linkage groups.**

To assess the macrosyntenic conservation between anthozoans we compared the Nematostella chromosomes to those of the sea anemone Exaiptasia pallida, the stony coral Acropora millepora and the soft coral Xenia sp. (Fig. 2a). While both Exaiptasia and Acropora are only assembled at the scaffold level, we observed that the gene content of the scaffolds suggests a similar karyotype to Nematostella. The chromosome-level genome assembly of the octocoral soft coral Xenia sp. also appears to have 15 chromosome-scale scaffolds²⁰. However, despite the apparently identical number of chromosomes, the chromosome-scale scaffolds of Xenia showed surprisingly little homology to those of the hexacoral Nematostella. We discovered many translocations and fusions, with only three chromosomes demonstrating a one-to-one relationship (Fig. 2a).

Notably, the hydrozoan Hydra vulgaris, which belongs to the clade of Medusozoa, also has 15 chromosomes³⁸. While our analyses revealed many ancestral linkage groups in the comparison with Hydra, we also detected numerous chromosomal rearrangements, including splits and fusions (Fig. 2a, Supplementary Fig. 5). Specifically, we observe that Nematostella chromosomes 2 and 3 are partially or completely split in other cnidarian genomes. By comparison, the scyphozoan jellyfish Rhopilema has 21 chromosomes, supported with karyotype images^21,39. We observed that all Rhopilema and Nematostella chromosomes show a clear 1-to-1, 1-to-2 or, in a single case, a 1-to-3 macrosyntenic correspondence, which can be traced to the ancestral linkage groups (Supplementary Figs. 5, 7). These comparisons among cnidarians suggest that the history of cnidarian chromosomes may be more complex than previously envisioned. This is in line with recent interpretations of cnidarian chromosome comparisons⁴⁰.

We then compared the chromosome-scale scaffolds of Nematostella with those of a bilaterian, the cephalochordate Branchiostoma floridae. Cephalochordates are early branching chordates lacking the two rounds of whole-genome duplication and allotetraploidization suggested for vertebrates^41,42, as represented by their single Hox cluster⁴³ (Fig. 2a). Strikingly, Branchiostoma and Nematostella chromosomes have retained extensive macrosynteny since their divergence from the bilaterian-cnidarian MRCA. The large-scale macrosynteny retention detected in Branchiostoma becomes less obvious once we focus on more recently branching bilaterian clades. Vertebrates, such as the early branching actinopterygian fish Lepisosteus oculatus⁴⁴ and humans, underwent two or more rounds of genome duplications and showed additional translocation events from the MRCA (Fig. 2a). Among protostomes, we observed that the lophotrochozoan Ram’s horn worm Streblospio benedicti and the sea scallop Patinopecten yessoensis⁴⁵, as well as the ecdysozoan millipede Trigoniulus corallinus⁴⁶, retained many macrosyntenic links (Fig. 2a).

The high degree of macrosynteny conservation with Branchiostoma motivated us to compare Nematostella chromosomes to those of the representative of an even older lineage, the Porifera (sponges), which branched off prior to the bilaterian-cnidarian split (Supplementary Fig. 1) and is considered by many to be the sister group to all other animals^47,48. Strikingly, the comparison with the recently-assembled chromosomes of the sponge Ephydatia muelleri revealed many shared ancestral linkage groups (Fig. 2a).

Reconstruction of ancestral linkage groups

The observed patterns of conserved macrosynteny between chromosomes of multiple distantly related species and the varying degrees of gene shuffling and chromosomal rearrangements in others (e.g., Drosophila, Caenorhabditis, humans) prompted us to reconstruct the set of predicted ancestral linkage groups (ALGs) for the MRCA of cnidarians, bilaterians and metazoans (Fig. 2b). First, we inferred gene orthology across all genomes investigated. For each of Metazoa, Bilateria and Cnidaria, we then inferred ALGs by identifying genes with high chromosomal linkage throughout each clade. Our method evaluates multiple scenarios and selects the ALGs that maximize “modularity”, i.e., groups with high chromosomal linkage within ALGs and low linkage between ALGs (see Methods and Supplementary Notes for details). We can then color the Oxford plots by the genes in these blocks, which suggests a pattern of splits and fusions leading from ancestral to extant chromosomes.
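The modularity criterion described above can be made concrete with a small sketch. The study's actual procedure is described in its Methods and Supplementary Notes and is not reproduced here; the code below instead assumes the standard Newman modularity of a weighted graph as a stand-in, with genes as nodes and an assumed edge weight equal to the number of species in which two genes share a chromosome. Gene names, weights and the function name are hypothetical.

```python
def modularity(edges, group_of):
    """Newman modularity of a partition of a weighted, undirected graph.

    edges: dict mapping (gene_u, gene_v) -> linkage weight (each pair listed once).
    group_of: dict mapping gene -> candidate linkage-group label.
    """
    total = sum(edges.values())
    within = {}   # edge weight falling inside each group
    degree = {}   # summed weighted degree of each group's genes
    for (u, v), w in edges.items():
        gu, gv = group_of[u], group_of[v]
        degree[gu] = degree.get(gu, 0.0) + w
        degree[gv] = degree.get(gv, 0.0) + w
        if gu == gv:
            within[gu] = within.get(gu, 0.0) + w
    return sum(within.get(g, 0.0) / total - (d / (2 * total)) ** 2
               for g, d in degree.items())

# Hypothetical linkage weights: two tightly linked gene sets joined by one weak edge.
edges = {("a", "b"): 5, ("b", "c"): 5, ("a", "c"): 5,
         ("d", "e"): 5, ("e", "f"): 5, ("d", "f"): 5,
         ("c", "d"): 1}

scenarios = {
    "one group":    dict.fromkeys("abcdef", 0),
    "two groups":   {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},
    "three groups": {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2},
}
scores = {name: modularity(edges, part) for name, part in scenarios.items()}
best = max(scores, key=scores.get)
```

Comparing scenarios in this way mirrors the idea of choosing the number of linkage groups whose partition scores highest, while noting that neighbouring scenarios can score similarly.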

We identified 19 ALGs present in the ancestor of cnidarians, although we could find similar modularity in scenarios of 16–18 ALGs (see Supplementary notes for details). These correspond well to the extant cnidarian chromosomes (Fig. 2c, Supplementary Fig. 5). However, while the Rhopilema and edwardsiid chromosomes appear highly representative of the ancestral cnidarian karyotype, the soft coral Xenia chromosomes appear to have undergone more chromosomal translocation events. Our results also indicate that although both Hydrozoa and Edwardsiidae have a clear 15 chromosome karyotype, their chromosomes originate from distinct fusions of cnidarian ALGs. For the ancestor of Bilateria, we identified 16 ALGs (Fig. 2c). In a previous study 17 ALGs were proposed⁴⁹, which resulted in similarly high modularity in our analysis (see Supplementary notes for details). By comparison with the sponge Ephydatia, we reconstructed a minimum of 16 metazoan ALGs for the metazoan ancestor, which were maintained in the cnidarian MRCA, Bilateria and the sponge (Fig. 2b, c). We also compare this to a recent reconstruction of the ancestor to bilaterians, cnidarians, and sponges in Supplementary data file 13⁴⁰.

In order to visualize how the chromosomes of extant species have undergone major splits and fusions, we projected the reconstructed ALGs of the respective cnidarian, bilaterian and metazoan ancestor to the chromosomes of the extant species. We found that many ALGs correspond to the same chromosomes across multiple species and lineages (Fig. 2c; Supplementary Figs. 5–7). To determine the extent of this, we further explored the relationships between metazoan ALGs and those of cnidarians and bilaterians (Fig. 2c). Remarkably, many ancestral chromosomes exhibited a 1-to-1 correspondence across all predicted ancestral lineages (Fig. 2c) and carried through to the extant lineages (Fig. 2d).

Chromosomal organization of the NK and extended Hox gene clusters

The chromosome-level assembly of the Nematostella genome allowed us to address the evolution of specific gene clusters. Prominent examples of clusters of homeodomain transcription factor coding genes ancestral for Bilateria include the SuperHox cluster, the ParaHox cluster, the NK/NK-like cluster as well as NK2 group genes located separately^43,50,51. It has been hypothesized that all of them originated from a single gene cluster, which then disintegrated during evolution⁵¹. Our analysis revealed that Nematostella possesses a separate ParaHox cluster of two genes (Gsx and Xlox/Cdx) on chromosome 10, and a SuperHox cluster on chromosome 2 containing Hox, Evx, Mnx, and Rough, as well as more distant Mox and Gbx⁵² (Fig. 3, Supplementary Fig. 8, Supplementary data file 6). We identified an NK cluster on chromosome 5 containing NK1, NK5, Msx, NK4, NK3, NK7, NK6, a more distant Lbx, a possible highly derived Tlx-like gene and, intriguingly, Hex, which is also linked to the NK cluster in the hemichordate Saccoglossus kowalevskii⁵³ and in the cephalochordate Branchiostoma floridae. Similar to Bilateria, the NK2 genes were clustered separately and found on chromosome 2 (Fig. 3, Supplementary Figs. 8, 9, Supplementary data file 6). In contrast, in the earlier-branching sponges, neither ParaHox nor extended Hox cluster genes exist, and only the NK cluster is present with a single NK2/3/4 gene, two NK5/6/7 genes, an Msx ortholog, as well as possible Hex and Tlx orthologs⁵⁴ (Fig. 3, Supplementary Fig. 10). Taken together, this allows us to propose that the bilaterian-cnidarian MRCA possessed an NK cluster on a chromosome different from the one carrying the SuperHox cluster, and a separate NK2 cluster, possibly on the same chromosome as the SuperHox cluster (Fig. 3).
The hypothesized SuperHox-NK Megacluster⁴³, if it ever existed, must have both formed and broken apart after the separation of the sponge lineage, but before the origin of the bilaterian-cnidarian ancestor (Fig. 3a, Supplementary notes). The lack of selection pressure in favor of microsynteny conservation is clearly illustrated by the comparison of the divergent Hox clusters of Scolanthus and Nematostella. Although the clusters are located on homologous chromosomes, the gene order, orientation, and number of intervening genes differ substantially between these two species (Fig. 3b). In contrast to the atomized Hox clusters of Nematostella and Scolanthus, the Hox cluster of their very distant anthozoan relative Xenia is compact and contains the “anterior” Hox genes HoxA and HoxB immediately next to the non-anterior Hox gene HoxE (Fig. 3b). Aside from the tandem duplications, this compact state likely represents the ancestral organization. Similarly, HoxE and HoxB are immediate neighbors in the genome of the jellyfish Rhopilema, although the Rhopilema Hox cluster shows some evidence of disintegration (Fig. 3b).

**Fig. 3: Evolution of a selection of Antennapedia class homeobox gene clusters.**

Topologically associating domains are not detected in either sea anemone genome

In the past decade, high-resolution chromosome conformation capture has increased interest in topologically associating domains (TADs), recurring chromosomal-folding motifs evidenced by signals in Hi-C contact maps⁵⁵. Flanking regions of TADs are positively correlated with CCCTC-binding factor (CTCF) binding sites. Interestingly, no CTCF ortholog has been detected in non-bilaterian animals^31,55, but previous studies of non-bilaterian animals have proposed the existence of TAD-like structures⁴⁷.

Similar to previous studies, we sought to identify putative TADs within the edwardsiid genomes. Initial inspections of the Hi-C maps of both Scolanthus and Nematostella, however, revealed that both species appeared to lack TAD-like structures (Fig. 4a; Supplementary Fig. 11); instead, the Hi-C contact frequency decayed smoothly as a function of read pair distance along the chromosomes (Supplementary Fig. 11). We quantified these findings by measuring the strength of topological boundaries using the insulation score, and found the topological boundaries to be weaker in the Nematostella genome than in the fly (Supplementary Fig. 12, p = 8.38 × 10⁻¹⁸, Mann-Whitney U test) and mouse genomes (p = 2.93 × 10⁻⁵⁵, Mann-Whitney U test). We also verified the lack of bilaterian-like TADs and the weakness of topological domain boundaries in an independent Nematostella chromosome-scale genome assembly and Hi-C dataset produced in parallel to ours (Supplementary Fig. 11)⁵⁶.
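To illustrate what an insulation score measures, the following minimal sketch computes a simplified version on toy contact matrices. It assumes a common formulation: for each position, average the contacts between the bins just upstream and the bins just downstream, then take the log2 ratio to the chromosome-wide mean. The window size, the matrices and the function name are hypothetical; the parameters and software used in the study are not given in this text, and real analyses additionally normalise the maps and handle empty bins.

```python
import math

def insulation_scores(contacts, window):
    """log2 ratio of contacts spanning each position to their overall mean.

    contacts: symmetric square matrix (list of lists) of Hi-C contact counts.
    window: number of bins considered on each side of the tested position.
    Returns {position: score} for positions with a full window on both sides.
    Note: a square with zero contacts would need a pseudocount before log2.
    """
    n = len(contacts)
    raw = {}
    for i in range(window, n - window + 1):
        # Contacts that span position i: upstream bins x downstream bins.
        square = [contacts[r][c]
                  for r in range(i - window, i)
                  for c in range(i, i + window)]
        raw[i] = sum(square) / len(square)
    mean = sum(raw.values()) / len(raw)
    return {i: math.log2(v / mean) for i, v in raw.items()}

n_bins = 8
# Toy map with two 4-bin domains: strong contacts within, weak contacts between.
domains = [[10.0 if (r < 4) == (c < 4) else 1.0 for c in range(n_bins)]
           for r in range(n_bins)]
# Toy map without domains: contacts only decay with distance along the chromosome.
smooth = [[100.0 / (1 + abs(r - c)) for c in range(n_bins)]
          for r in range(n_bins)]

domain_scores = insulation_scores(domains, window=2)
smooth_scores = insulation_scores(smooth, window=2)
boundary = min(domain_scores, key=domain_scores.get)   # deepest insulation minimum
```

In the domain map the score dips sharply at the position separating the two blocks, whereas the distance-decay map yields a flat profile, which is the qualitative contrast between strong and weak topological boundaries described above.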

TAD structures are often well-conserved around gene clusters that control spatio-temporal gene expression, especially of developmental regulators, such as the TADs that separate the anterior and posterior Hox genes in the mouse HoxD cluster⁵⁷. We therefore explored whether TAD-like structures existed around Hox clusters in Nematostella. Nematostella has six Hox genes: one is on chromosome 5, while the other five are located on chromosome 2 in two clusters separated by hundreds of genes (Fig. 4). We analyzed the genomic regions surrounding both partial Hox clusters, but again we could not detect any significant TAD-like structure.

Correlation of gene regulation with three-dimensional genome organization

We note that the Nematostella genome is fairly gene-dense (10 ± 4 genes per 100 kb), leaving relatively little intergenic sequence. It is conceivable that most relevant cis-regulatory elements are located in close proximity to the regulated gene. This is supported by the observation that many Nematostella transgenic reporter constructs faithfully mimic endogenous expression patterns with only 1–3 kb of upstream promoter sequence. Regions of open chromatin are reflected by ATAC-seq peaks, many of which contain cis-regulatory regions. We therefore wished to assess whether the distance between ATAC-seq peaks and the closest gene (i.e., transcriptional start or end site) correlates with genome size. We compared several species for which chromosome assemblies and ATAC-seq data are available and, as expected, found that the distance of ATAC-seq peaks to the next gene indeed correlates with the genome size of the organism. However, there is also a correlation between small genomes and the absence of clear TAD structures, for instance in C. elegans, sponges and cnidarians. Notably, at least one cnidarian genome, that of Hydra, is relatively large (1.27 Gb), yet does not show typical TAD structures, although smaller-scale 3D structures could be observed⁵⁸. Thus, within bilaterians, species with small genomes may show a tendency to lack TAD structures, while the genomes of non-bilaterians, which lack CTCF, may not organize into stable TAD-like structures. At present, Hydra is the only available large non-bilaterian genome with relatively short distances between ATAC-seq peaks and the next gene.
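The distance measure used in this comparison can be sketched as follows. The example assumes that each peak is represented by a single coordinate (for instance its midpoint) and that "closest gene" means the nearest transcriptional start or end site on the same chromosome, as stated above. Coordinates and the function name are hypothetical toy values, not data from the study.

```python
from bisect import bisect_left
from statistics import median

def distance_to_nearest_gene_end(peak_pos, boundaries):
    """Distance (bp) from a peak position to the closest gene start or end.

    boundaries: non-empty sorted list of gene start and end coordinates
    on one chromosome.
    """
    k = bisect_left(boundaries, peak_pos)
    # Only the boundary just before and just after the peak can be closest.
    candidates = boundaries[max(k - 1, 0):k + 1]
    return min(abs(peak_pos - b) for b in candidates)

# Hypothetical genes (start, end) and ATAC-seq peak midpoints on one chromosome.
genes = [(1000, 3000), (8000, 9500)]
peaks = [500, 3200, 5600, 9000]

boundaries = sorted(pos for gene in genes for pos in gene)
distances = [distance_to_nearest_gene_end(p, boundaries) for p in peaks]
typical_distance = median(distances)
```

Computing such a summary value per species and plotting it against genome size is one simple way to examine the relationship described in the paragraph above.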

TADs not only assure maintenance and directionality of enhancers to the regulated gene, they frequently also encompass more than one gene. In vertebrates, many TADs are conserved between species, suggesting that their maintenance is under selection pressure. This predicts that in organisms with TADs the number of microsyntenic genes is higher than in organisms lacking TADs. To test this, we compared the number of microsyntenies in different cnidarians, chordates and protostomes, roughly representing similar times of divergence. We find that, while the number of microsyntenies consisting mostly of two consecutive orthologous genes did not differ between clades, microsyntenies consisting of at least three or four orthologous genes were reduced in cnidarians. No microsyntenies of four or more genes were found in any pair of species spanning the anthozoan-medusozoan split (e.g., Nematostella-Rhopilema), whereas around a dozen such regions are shared among distantly branching spiralian and chordate-vertebrate (“deuterostome”) genomes (e.g., 11 blocks between the scallop Pecten maximus and the annelid Streblospio benedicti, and 14 blocks between amphioxus Branchiostoma floridae and chicken Gallus gallus) (Fig. 4f). A similar trend was observed for microsyntenies where intervening genes were allowed to be part of the blocks. For a commonly used threshold of up to 5 intervening genes⁴⁹, the Nematostella-Rhopilema species pair showed only 7 blocks, whereas Pecten-Streblospio had 82, and amphioxus-chicken had 45 microsyntenic blocks retained (Fig. 4f, Supplementary data file 12). Together, these data indicate higher retention and evolutionary exploration of longer syntenic regions in bilaterian genomes, potentially allowed for by the maintained distal topological architecture.
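The block counts above depend on two parameters, the minimum number of orthologous genes per block and the number of intervening genes tolerated. The sketch below shows how these parameters interact under a deliberately simplified definition: a block is a run of orthologs that are neighbours in species A and whose partners are also neighbours on one chromosome of species B, ignoring orientation. This is an assumption made for illustration and not the procedure used in the study; all names and positions are hypothetical.

```python
def count_microsynteny_blocks(order_a, pos_b, min_genes=3, max_intervening=0):
    """Count runs of orthologs that stay close together in both genomes.

    order_a: genes along one chromosome of species A, in chromosomal order,
             named by ortholog group (None for genes without an ortholog).
    pos_b:   dict ortholog group -> (chromosome, gene index) in species B.
    """
    step = max_intervening + 1          # largest allowed index difference
    blocks, run = 0, 0
    last_a, prev_b = None, None
    for idx, gene in enumerate(order_a):
        if gene not in pos_b:
            continue                    # intervening gene without ortholog
        chrom, j = pos_b[gene]
        linked = (
            prev_b is not None
            and idx - last_a <= step                 # close in species A
            and chrom == prev_b[0]                   # same chromosome in B
            and 0 < abs(j - prev_b[1]) <= step       # close in species B
        )
        if linked:
            run += 1
        else:
            if run >= min_genes:
                blocks += 1
            run = 1
        last_a, prev_b = idx, (chrom, j)
    if run >= min_genes:
        blocks += 1
    return blocks

# Hypothetical toy chromosome of species A and ortholog positions in species B.
order_a = ["g1", "g2", "g3", None, "g4", "g5", "g6", "g7"]
pos_b = {"g1": ("c1", 10), "g2": ("c1", 11), "g3": ("c1", 12),
         "g4": ("c2", 5), "g5": ("c2", 7), "g6": ("c2", 8),
         "g7": ("c3", 1)}

strict = count_microsynteny_blocks(order_a, pos_b, min_genes=3, max_intervening=0)
relaxed = count_microsynteny_blocks(order_a, pos_b, min_genes=3, max_intervening=1)
```

Allowing one intervening gene recovers a second block (g4, g5, g6) in the toy data, which parallels the higher block counts reported above when up to 5 intervening genes are tolerated.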

Discussion

Here we report the assembly of two high-quality, chromosome-level edwardsiid sea anemone genomes. By comparing them with other cnidarian, bilaterian and poriferan genomes, we have illuminated several intriguing aspects of early animal chromosomal evolution, the origin of NK and extended Hox clusters, the conservation of non-coding elements and the status of topologically associated domains in the bilaterian-cnidarian MRCA. In addition, the highly improved Nematostella genome and manually curated gene annotations will prove to be an invaluable resource for future studies of both coding and non-coding regions, structural variants among populations and continued development of functional tools for this model organism.

Nearly all members of the extended Hox cluster were distributed among distant, isolated microsyntenic blocks on chromosome 2 of Nematostella (pseudo-chromosome 4 in Scolanthus), with the single exception of HoxF/Anthox1, located on chromosome 5 (Fig. 3; Supplementary notes). This indicates a lack of proximity constraint on the Hox genes in cnidarians, contrasting with the situation in Bilateria. In addition, while a staggered spatiotemporal pattern of Hox expression along the secondary, directive axis of the Nematostella larva and polyp can be observed⁵⁹, unlike Bilateria, there is no correlation between expression and cluster position⁶⁰. Notably, HoxF/Anthox1 is not only located on a different chromosome but it is also the only Hox gene expressed in the (aboral) ectoderm, while all other Hox genes are expressed in the inner endomesodermal cell layer^61,62.

The dispersed NK and extended Hox clusters may be due to the diminishment or possible absence of higher-order chromosome organization at the level of microsynteny. In line with this, it was recently observed that the HoxD cluster boundaries in the mouse genome are marked by two TAD boundaries⁵⁷, and the cluster’s intra-TAD gene order is deemed to be under selective pressure⁶³. The lack of a CTCF gene in the Nematostella genome led us to hypothesize that the cnidarians might lack TADs, as TAD presence has been attributed to the appearance of CTCF⁶³. Moreover, CTCF is absent not only in cnidarians but also in earlier branching ctenophores and sponges, which raises the possibility that the existence of TADs might represent a bilaterian-specific feature. While we were unable to detect any noticeable structure resembling the current definitions of TADs, it remains an open question as to whether larger or smaller structures, beyond the resolution of our data, could yet be detected. One study has suggested evidence for higher-order chromosomal organization in a non-bilaterian, the sponge Ephydatia⁴⁷. However, the contact maps resemble patterns we observe in our assemblies at the boundary of scaffolds or contigs, which can be the result of differential mappability from repetitive content or assembly issues. We therefore deliberately do not report any results from a TAD finder, since, after multiple rigorous rounds of manual assembly update, we can assert that the data we have generated do not qualitatively represent TAD boundaries per se, and most results would likely be false positives. While the precise definition of a TAD is still evolving^64,65, both data sets lack many characteristics of TADs identified in CTCF-containing genomes: hierarchical compartments, mammalian-specific “corner peaks” indicating strong interactions, and in our case, loop peaks and inter-contig compartments. This is consistent with the presence of CTCF being necessary for the formation of TADs.
The apparent absence of TADs in our analysis could also be explained by a higher degree of cell type variability. Therefore, we cannot exclude the possibility that performing the experiment with a more homogenous cell population, or sequencing at a higher resolution, would reveal a signal on a smaller scale.

While microsynteny analyses reveal little conservation of the local gene order in the genomes of Nematostella and Scolanthus, macrosyntenic analysis of the edwardsiid chromosomes compared to available cnidarian genomes revealed a high level of conservation. We identified a stable set of 19 ALGs across all clades of sequenced cnidarian genomes. When compared to extant genomes, we can trace a small number of recombination events from the ALGs since the common cnidarian ancestor split an estimated 580 Mya. This stands in stark contrast to the history of, for example, the 326 Mya old ancestral genome of Amniota, which is estimated to have 49 distinct units, while the karyotypes of the extant amniote taxa consist of multiple translocated segments and variable chromosomes⁶⁶. However, far more remarkable is the macrosynteny maintained between the edwardsiids, the early branching chordate Branchiostoma, and the sponge Ephydatia. Our analyses revealed clear one-to-one, one-to-few or few-to-one conservation of the chromosome-level linkages between cnidarians, sponges and early chordates, which suggests a striking retention of macrosyntenies throughout evolution of these animal lineages. This in turn allowed us to deduce a set of 16 ALGs of the last common metazoan ancestor, which was maintained in the cnidarian-bilaterian ancestor and gave rise to the 19 ancestral cnidarian chromosomes and the 17 ancestral bilaterian chromosomes⁴⁹. The observed conservation of macrosyntenic linkages can result from a strong selective pressure to maintain intact chromosomes during meiosis, so that only local translocations within the chromosomes, scrambling local gene order, are favored. Changes in population sizes or asexual reproduction or selfing may facilitate chromosomal unit breakages. 
While such global chromosomal rearrangements have been observed for some clades (most notably dipterans but also nematodes and cephalopods), it still remains unclear whether additional gene regulatory constraints may exist at the whole-chromosomal level⁶⁷. It is tempting to speculate that the emergence of the TADs in Bilateria may have restricted local rearrangements and at the same time released the constraints on maintaining the ancestral macrosyntenies conserved all the way back to the origin of multicellular animals.

The high level of macrosynteny maintaining ancestral chromosomal blocks over hundreds of millions of years contrasts with the low level of microsynteny conservation, even between more closely related cnidarians. Despite this reshuffling of genes within chromosomes, many developmental regulators involved in axial patterning show a conserved expression pattern, suggesting that their cis-regulatory elements have been maintained. For instance, in Nematostella and Hydra, which have been separated for over 500 million years, all 12 Wnt genes, brachyury, and foxA are expressed at the oral pole, while foxQ2 and six3 are expressed aborally^68,69,70,71. To ensure conservation of spatio-temporal gene expression patterns, cis-regulatory elements must remain located in close proximity to the respective gene even when translocated to a new genomic position. This is supported by our distance analyses of ATAC-seq peaks. By contrast, in bilaterians with large genomes, such as vertebrates, enhancers can be located up to several megabases away from the gene they regulate, often with several genes in between. To maintain gene regulation by long-distance enhancers, we predicted that microsynteny conservation should be under higher selective pressure in these bilaterians than in non-bilaterians. Indeed, microsynteny conservation is higher in vertebrates than in cnidarians, in line with recent comparisons between skates, mouse and garfish⁷². We therefore propose a hypothetical evolutionary scenario in which close cis-regulation is ancestral to metazoans and maintained in non-bilaterians, which lack CTCF. It follows that CTCF evolved in the bilaterian common ancestor and was recruited to bind to the genome to act as an insulator and, together with cohesin, as a structural component defining recombination boundaries. By forming a physical boundary for recombination events, this preserved complex gene regulation by distant enhancers, while also acting as an insulator in TADs. 
Bilaterians that evolve a more compact genome would have a tendency to lose TADs and the dependence on CTCF. This seems to be the case in the C. elegans autosomes^73,74 and is also predicted for other small genomes, such as in urochordates.

Methods

Animal care and source

Nematostella vectensis animals (F1 of CH2xCH6, originally collected by Cadet Hand) were cultured at 18 °C under dim light conditions and fed daily with Artemia brine shrimp. Adult male and female individuals were verified by induction of spawning in isolation⁷⁵. Spawning was induced by a combination of white light and a temperature of 25 °C for 10 h. Scolanthus callimorphus animals were collected at the Île Callot, Carantec, France in the frame of the Assemble grant 227799 to U.T. After transport, they were kept in seawater at 20 °C and fed freshly hatched Artemia salina weekly or biweekly. Spawning could not be induced in the laboratory and the sex of the polyps was unknown. Nematostella vectensis has been kept as a laboratory strain for several decades. Neither Nematostella vectensis nor Scolanthus callimorphus is an endangered species, and both are simple invertebrates. Therefore, no ethical approval was necessary.

Sequencing

Short read DNA-Seq

Genomic DNA samples were extracted from individual adult male and female Nematostella using the DNeasy Blood and Tissue Kit (Qiagen). After purification, ~5 µg of genomic DNA was recovered from each sample. Following DNA extraction, samples were sheared and size selected for ~500 bp using a Blue Pippin Prep machine (Sage Science). Following size selection, sequencing libraries were created using a KAPA HTP Library Prep kit (Roche) and subjected to paired-end sequencing on an Illumina NextSeq 500. Scolanthus DNA samples for library preparation were aliquoted from high molecular weight extractions, described below.

High molecular weight DNA extraction and library prep

Nematostella high molecular weight DNA was extracted at Dovetail Genomics. Samples were quantified using Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). The PacBio SMRTbell library (~20 kb) for PacBio Sequel was constructed using SMRTbell Template Prep Kit 1.0 (PacBio, Menlo Park, CA, USA) using the manufacturer recommended protocol. The pooled library was bound to polymerase using the Sequel Binding Kit 2.0 (PacBio) and loaded onto PacBio Sequel using the MagBead Kit V2 (PacBio). Sequencing was performed on the PacBio Sequel SMRT cell, using Instrument Control Software Version 5.0.0.6235, Primary analysis software Version 5.0.0.6236 and SMRT Link Version 5.0.0.6792, yielding 24.67 Gb over 3,050,403 subreads.

High molecular weight DNA from a single Scolanthus callimorphus adult animal was extracted using a modified Urea-based DNA extraction protocol^76,77. A whole animal was flash frozen and ground with mortar and pestle. While frozen, drops of buffer UEB1 (7 M Urea, 312.5 mM NaCl, 50 mM Tris-HCl pH 8, 20 mM EDTA pH 8, 1% w:v N-Lauroylsarcosine sodium salt) were added and crushed with the tissue. Tissue was incubated in a final volume of 10 mL UEB1 at RT for 10 min. Three rounds of phenol-chloroform extraction were performed, followed by DNA precipitation by addition of 0.7 volume isopropanol. The pellet was transferred to a fresh tube and washed twice in 70% EtOH and twice more in 100% EtOH, dried, and resuspended in TE buffer.

A library for PacBio sequencing was then prepared from the high molecular weight sample using the SMRTbell® Express Template Prep Kit v1. The libraries were then sequenced on a PacBio Sequel machine over 3 SMRT Cells, yielding a total of 22.85 Gb over 1,474,285 subreads. An aliquot of the same sample was used to prepare a library using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina. This was then subjected to 50 cycles of single-end sequencing in one flow cell lane using an Illumina HiSeq 2500 system.

Chicago libraries

Two Chicago libraries were prepared as described previously⁷⁸. For each library, ~500 ng of HMW gDNA (mean fragment length = 100 kbp) was reconstituted into chromatin in vitro and fixed with formaldehyde. Fixed chromatin was digested with DpnII, the 5’ overhangs filled in with biotinylated nucleotides, and then free blunt ends were ligated. After ligation, crosslinks were reversed and the DNA purified from protein. Purified DNA was treated to remove biotin that was not internal to ligated fragments. The DNA was then sheared to ~350 bp mean fragment size and sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of each library. The libraries were sequenced on an Illumina HiSeq 2500 (rapid run mode). The number and length of read pairs produced for each library was: 116 million, 2 × 101 bp for library 1; 35 million, 2 × 101 bp for library 2. Together, these Chicago library reads provided 125 × sequence coverage of the genome (1–100 kb pairs).

Chromatin was extracted from nuclei of a single Nematostella vectensis adult male and a single Scolanthus callimorphus adult (unknown sex) using the Phase Genomics Proximo Hi-C animal protocol. After proximity ligation and purification, 16 ng and 9 ng of DNA were recovered, respectively. For library preparation, 1 µl of Library Reagent 1 was added and 12 PCR cycles were performed. The final library was subjected to 150 total cycles of paired-end sequencing using an Illumina NextSeq 550 machine, yielding a total of 13.5 gigabases.

Hi-C sequencing, Scolanthus callimorphus PacBio library preparation and sequencing, Scolanthus Illumina DNA library preparation and sequencing and adult Nematostella vectensis RNA library preparation and sequencing were performed at the Vienna Biocenter Core Facility (VBCF) NGS Unit (https://www.viennabiocenter.org/facilities). Nematostella vectensis DNA size selection, library preparation, and sequencing were performed by the Molecular Biology Core at the Stowers Institute for Medical Research.

Developmental and adult Nematostella RNA sequencing was performed as follows. Nematostella were spawned and eggs were de-jellied and fertilized as previously described⁷⁵. Spawning and embryo development took place at 18 °C. Eggs and embryos from different stages were collected (300 per sample) in duplicate as indicated: eggs (within 30 min of spawn), blastula (7.5 hpf), gastrula (23.5 hpf) and planula (72 hpf). Eggs and embryos were collected in Eppendorf tubes and centrifuged to a pellet at 21,000 × g for 1 min. All seawater was quickly removed and pellets were resuspended in 150 µl lysis buffer (RLT buffer supplied by the Qiagen RNeasy kit (#74104), supplemented with β-mercaptoethanol). The samples were homogenized with an electric pestle (1 min continuous drilling) and further supplemented with 200 µl of the above lysis buffer. Homogenized samples were then transferred into QIAshredder columns (Qiagen #79654) and centrifuged at 21,000 × g for 2 min. The flow-throughs were supplemented with 1 ml 70% ethanol, transferred to RNeasy columns and processed according to the Qiagen RNeasy protocol. Quality and integrity of the RNA were evaluated using the Agilent RNA 6000 Pico kit (Agilent Technologies) and RNA samples were stored at −80 °C until further processing. cDNA libraries were then constructed for polyA stranded sequencing. The resulting libraries were sequenced on Illumina HiSeq using paired-end runs (RapidSeq, 2 × 150 bp).

Genome Assembly

Size estimates for Nematostella vectensis and Scolanthus callimorphus were derived using Genomescope⁷⁹, taking the result of the highest k (56 and 18) which converged under the model.

Initial assemblies based on PacBio sequencing of Nematostella and Scolanthus were generated using canu version 1.8⁸⁰ with the parameters rawErrorRate=0.3 correctedErrorRate=0.045.

Nematostella haplotigs were removed using Purge Haplotigs⁸¹. First, the source PacBio reads were aligned onto the canu assembly using minimap2⁸² using the parameters -ax map-pb --secondary=no. Following this a coverage histogram was generated using the Purge Haplotigs script readhist. Per the documented Purge Haplotigs protocol, lower, mid, and high coverage limits were found by manual inspection of the plotted histogram to be 12, 57, and 130, respectively. All initial contigs marked as suspect or artifactual were removed from further analysis with the Purge Haplotigs script purge.
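
The role of the three coverage limits can be illustrated with a minimal Python sketch. It is a simplification and not the Purge Haplotigs implementation: the function names, the per-base depth list as input and the two 0.8 fraction thresholds are assumptions made for illustration. Only the limits 12, 57 and 130 come from the text.

```python
# Coverage limits reported in the text (low, mid, high).
LOW, MID, HIGH = 12, 57, 130


def depth_fractions(depths, low=LOW, mid=MID, high=HIGH):
    """Fraction of bases at haploid, diploid and out-of-range read depth."""
    n = len(depths)
    if n == 0:
        raise ValueError("contig has no depth values")
    haploid = sum(1 for d in depths if low <= d < mid)
    diploid = sum(1 for d in depths if mid <= d <= high)
    return {
        "haploid": haploid / n,
        "diploid": diploid / n,
        "out_of_range": (n - haploid - diploid) / n,
    }


def classify_contig(depths, junk_fraction=0.8, diploid_fraction=0.8):
    """Label a contig from its per-base depths.

    junk_fraction and diploid_fraction are illustrative thresholds,
    not values taken from the paper.
    """
    f = depth_fractions(depths)
    if f["out_of_range"] >= junk_fraction:
        return "artifactual"  # mostly below LOW or above HIGH
    if f["diploid"] <= diploid_fraction:
        return "suspect"      # candidate haplotig, needs alignment check
    return "primary"
```

In this sketch a contig covered mostly at haploid depth (between 12 and 57) is only flagged as suspect. Whether it is actually removed would depend on a subsequent alignment against the rest of the assembly, which is not modelled here.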

Due to lower sequencing coverage of Scolanthus, diploid per-scaffold coverage could not be deconvolved from haploid, and therefore Purge Haplotigs could not be used. Removal of redundant contigs was performed with Redundans version 0.14a⁸³ using the parameters --noscaffolding --nogapclosing --overlap 0.66. Only contigs marked in the reduced version of the genome were used in further analysis.

The input de novo assembly, shotgun reads, and Chicago library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assembly⁷⁸. Shotgun and Chicago library sequences were aligned to the draft input assembly using a modified SNAP read mapper (http://snap.cs.berkeley.edu). The separations of Chicago read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and make joins above a threshold. After scaffolding, shotgun sequences were used to close gaps between contigs.

Repetitive DNA and Hi-C scaffolding

Repetitive DNA was found using two strategies. First, known repeats found in repbase⁸⁴ were searched in the assemblies using RepeatMasker⁸⁵ using the parameters -s -align -e ncbi in addition to -species nematostella for Nematostella and -species edwardsiidae for Scolanthus. Second, novel repeat sequences were found using RepeatModeler version 2.0⁸⁶. After generating the repeat library, genomes’ repeat regions were detected with the corresponding library using the same parameters in RepeatMasker.

Hi-C sequences were aligned to the reduced and repbase masked genomes of Nematostella and Scolanthus using bwa mem^87,88 using the parameters -5SP. For Nematostella, an additional candidate assembly was generated by mapping Hi-C sequences to the Chicago library scaffolded sequences using repbase masking (dovetail_standardmask) in addition to the contig-based scaffolding (contig_standardmask). Duplicate reads were marked with the samblaster utility⁸⁹, and duplicate, secondary and supplementary mappings were removed with samtools. These mappings were used to generate initial chromosomal assemblies using Lachesis⁹⁰, specifying the restriction site GATC. Assemblies were manually reviewed using Juicebox Assembly Tools version 1.11.08⁹¹. Candidate assemblies were compared using the nucmer aligner with default parameters and visualized using mummerplot⁹². Assemblies were converted over to Juicebox format using juicebox_scripts (https://github.com/phasegenomics/juicebox_scripts). In the case of Scolanthus, duplicate regions were clipped, and the resulting contigs were subjected to another round of alignment, assembly and review.
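
The removal of duplicate, secondary and supplementary mappings corresponds to masking three bits of the SAM FLAG field. The following Python sketch shows that logic on plain SAM text lines. The function names are hypothetical, and the exact samtools invocation used for the assemblies is not stated in the text.

```python
# FLAG bits as defined in the SAM specification.
SECONDARY = 0x100      # 256
DUPLICATE = 0x400      # 1024
SUPPLEMENTARY = 0x800  # 2048
EXCLUDE = SECONDARY | DUPLICATE | SUPPLEMENTARY  # 3328


def keep_record(sam_line):
    """Return True for header lines and for primary, non-duplicate records."""
    if sam_line.startswith("@"):
        return True
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    return (flag & EXCLUDE) == 0


def filter_sam(lines):
    """Keep only the records that pass keep_record."""
    return [line for line in lines if keep_record(line)]
```

The same mask can be passed to `samtools view` through its `-F` option (here `-F 3328`), which excludes any record with at least one of the listed bits set.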

Nematostella scaffold correctness was assessed using REAPR⁹³. Nematostella assembly nemVec1 was downloaded from the JGI website¹. Sequences from the adult male and adult female (see Sequencing) were aligned to nemVec1 and the Nematostella genome after scaffolding with Chicago libraries using SMALT as well as the REAPR tool perfectmap using an expected insert size of 400, as determined from fragment analysis. Error-free bases and contiguity after breaking the genome were extracted from the results.

Genome and gene model set assembly and completeness was assessed using BUSCO version 3.0.2³³, using the gene set metazoa_odb9 as the standard.

Gene models

Nematostella, Scolanthus and M. senile sequences obtained from previous studies^34,94 and publicly available data (see Data Availability for details) were used to generate de novo assembled transcripts.

Trinity version 5.0.2⁹⁵ was run on each library using the flags --min_contig_length 200 --min_kmer_cov 2. For those which had a strand-specific library preparation, the flag --SS_lib_type RF was applied. To reduce redundancy, cd-hit version 4.6.8^96,97 was applied with the flags -M 0 -c 1. Transdecoder version 5.0.6⁹⁸ was used to detect open reading frames in the resulting reduced set of transcripts. Transcript abundance was quantified using salmon version 1.2.1⁹⁹ using the flags --seqBias --useVBOpt --discardOrphansQuasi --softclip.

For PacBio Iso-seq, 12 Nematostella RNA samples were collected over the course of multiple developmental stages, adult tissues and regeneration time points. For developmental stages, zygotes spawned as a single batch by a wildtype colony were kept at 22 °C, and collected at 0, 24, 48 and 72 hpf, and at 7 dpf. Adult tissues were collected from sex-sorted, sexually mature wildtype individuals kept at 22 °C. The male and female mesenteries were harvested separately by surgically opening the body column and carefully peeling off the attached body column tissues. Adult oral discs were collected by surgical removal of tentacles as well as the attached pharyngeal regions. Regeneration was induced by amputating the oral part of a sexually mature individual at the mid-pharyngeal level. Regenerating tissues close to the wound were collected at 4 hpa and 12 hpa, respectively. All the samples were deep-frozen and lysed using TRIzol™ reagent (Invitrogen). Phenol-chloroform extraction was performed to remove undissolved mesoglea from adult tissues. The Direct-zol™ RNA Miniprep Plus Kit (Zymo) was then used to purify total RNA from the aqueous phase. For each sample, 2 μg of total RNA with RIN > 7 was submitted to UC Berkeley for Iso-seq library construction.

RNA libraries were sequenced at UC Berkeley using the PacBio Sequel II system. Raw subread BAM files were processed and demultiplexed using PacBio’s isoseq v3.2 conda pipeline. The steps include consensus generation, primer demultiplexing, polyA refinement and data clustering using default parameters. This resulted in 406,317 high-quality HiFi reads, which were used to build the Nvec200 transcriptome.

HiFi reads were mapped to the Nematostella genome using minimap2⁸² using parameters (-ax splice -uf --secondary=no) to obtain the primary best alignments. Reads were then grouped and collapsed down to potential transcripts using PacBio’s cDNA_Cupcake toolkit and TAMA¹⁰⁰. Following PacBio’s guidelines, transcripts with degraded 5’ reads and fewer than 10 FL counts were removed. Chimeric transcripts were then analyzed to find potential fusion genes. For reads that did not map to the genome, de novo transcriptome assembly was performed using the graph-based tool Cogent with a k-mer size of 30. Cupcake and TAMA results were merged into non-redundant gene models using stringtie v2¹⁰¹. Deep RNA-seq reads from 4 developmental stages (egg, gastrula, post-gastrula and planula) were aligned to the genome using STAR v 2.7.3a¹⁰². Read alignments outside of the isoseq gene models were extracted with bedtools v2.29.2¹⁰³ and used for reference-guided transcriptome assembly using Stringtie2. Final gene models were obtained by merging Isoseq models and RNAseq models and manually corrected using previously cloned full length CDS from Nematostella vectensis in NCBI (Supplementary data file 7). Finally, transdecoder v5.5.0 was used to produce CDS annotation using a minimum protein length of 50 amino acids and prioritizing ORFs with significant similarity to any family in the PFAM database¹⁰⁴. Alignment of the protein candidates to the PFAM database was done using Hmmer v3.1b2¹⁰⁵. RNAseq libraries from the N. vectensis developmental time series were downloaded and aligned to the new genome using STAR v2.7.3a¹⁰² with standard parameters. Mapping and assignment efficiency was measured using featureCounts from the subread package¹⁰⁶ with the “-p” flag for paired-end libraries.

Evidence for Scolanthus gene models was taken from RNA-sequencing and repeats. Scolanthus RNA-seq reads (see Sequencing) were mapped to the Scolanthus contigs using STAR version 2.7.3a¹⁰². These mappings were used as evidence for intron junctions to generate putative gene models and to estimate hidden Markov model parameters using BRAKER2^107,108. Gene models were then refined using Augustus version 3.3.3¹⁰⁹ using extrinsic evidence from STAR splice junctions and the location of repeats from RepBase (see Genome Assembly) as counter-evidence for transcription. These models were filtered with the following criteria: 1) genes completely covered by RepeatModeler repeats (see Genome Assembly) were removed; 2) predicted gene models were required to be either supported by external RNA-seq evidence as reported by Augustus or have a predicted ortholog as reported by Eggnog-mapper¹¹⁰. This resulted in a set of 24,625 gene models. Transcription factor identity was inferred by aligning the predicted protein sequences to Pfam A domains version 32.0¹¹¹ using hmmer version 3.3¹⁰⁵. Transcription factor families were based on domains curated in a previous work¹¹².

Extended Hox cluster, NK cluster and ParaHox genes were found with BLAT¹¹³ matches of published models^{52,61,114,115,116,117,118,119} to the nv1 genome, taking the best hits. If an NVE gene model³⁴ corresponded to the matched genomic region, its location in the nv2 genome was then determined for macrosynteny analysis. In cases where no published gene was known, reciprocal BLAST hits between the bilaterian and cnidarian counterpart were taken as evidence for orthology.

Divergence estimates

Single copy orthologs were detected by collecting common complete and duplicated BUSCO genes present in the Scolanthus and Nematostella genomes. Where duplicated BUSCOs were present, the transcript with the highest score was taken. This resulted in a total of 541 orthologs. BUSCOs found in genomes obtained from previous studies^{2,21,22,26,27,32,41,44,47,120,121,122,123} were used to generate multiple alignments. Genes were aligned with mafft version 7.427 using the E-INS-i model and a maximum 1000 refinement iterations¹²⁴. Alignments were trimmed using trimAl version 1.4.rev15 using the “gappyout” criteria¹²⁵. A maximum likelihood tree was inferred using iqtree version 2.0.6, using the model finder partitioned on each gene, constrained to nuclear protein models¹²⁶. Divergence estimates were determined using r8s version 1.8.1 using the Langley-Fitch likelihood method¹²⁷. Age ranges were estimated by fixing the split between Bilateria and Cnidaria at 595.7 and 688.3 Mya¹²⁸.

Single copy orthologs were detected by collecting common complete BUSCO genes present in the A. millepora, A. digitifera, E. pallida, M. senile, Scolanthus and Nematostella genomes. This resulted in a total of 229 orthologs.

Ultraconserved elements

In order to determine noncoding elements conserved between Scolanthus and Nematostella, genomes repeat-masked from both de novo and repbase repeats were blasted using NCBI BLAST+ version 2.10.0¹²⁹, using the flags -evalue 1E-10 -max_hsps 100000000 -max_target_seqs 100000000 -task megablast -perc_identity 0 -template_length 16 -penalty -2 -word_size 11 -template_type coding_and_optimal. Additionally, the -dbsize parameter was set to the estimated genome size. Candidate hits were then filtered using criteria loosely based on previous work¹³⁰: for each high-scoring pair, a sliding window method was used to determine subsections of the alignment with at least 95 % identity, and extending these windows as long as the identity remains at this level. Nematostella elements mapping to more than one locus in the Scolanthus genome were reduced to the longest locus pair in both genomes. Elements mostly mapping to coding sequence were removed, and the remaining elements were classified as intron or non-coding, depending on location. Recurring UCE sequences that were not identified by RepeatModeler or RepeatMasker were detected with blastclust version 2.2.26 requiring the length of hit to cover at least 90 % of either sequence for linkage.
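
The sliding-window step can be sketched in Python as follows. This is an illustrative reading of the description above, not the code used for the analysis: the function name, the window length of 50 columns and the greedy left-to-right extension are assumptions, and only the 95 % identity threshold comes from the text. The two inputs are the aligned query and subject strings of one high-scoring pair, with `-` marking gaps.

```python
def conserved_segments(query, subject, window=50, min_identity=0.95):
    """Return (start, end) alignment-column ranges with identity >= min_identity.

    A window is seeded wherever `window` consecutive columns reach the
    threshold and is then extended to the right while the identity of the
    whole segment stays at or above the threshold. Ranges are half-open.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    # 1 for an identical, non-gap column, 0 otherwise
    match = [int(q == s and q != "-") for q, s in zip(query, subject)]
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)

    def identity(a, b):
        return (prefix[b] - prefix[a]) / (b - a)

    segments = []
    n = len(match)
    i = 0
    while i + window <= n:
        if identity(i, i + window) >= min_identity:
            end = i + window
            while end < n and identity(i, end + 1) >= min_identity:
                end += 1
            segments.append((i, end))
            i = end  # continue after the reported segment
        else:
            i += 1
    return segments
```

Because the extension is greedy, a segment may end with a few mismatched columns as long as the overall identity remains at the threshold. A stricter variant could trim segment ends back to the last matching column.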

Macrosynteny analysis

Branchiostoma floridae gene models and sequences were retrieved from the published study⁴¹. All against all comparisons were performed with OMA standalone version 2.5¹³¹. Genomes were downloaded from previous studies^{2,3,20,22,26,32,44,45,46,47,120,121,122,123,132,133,134,135,136}.

Ancestral genome reconstruction was carried out using a graph based approach. In brief, genes were summarized into multi-species orthologous groups, and these comprised the nodes, and orthology groups occurring on the same chromosome or scaffold of two different species were linked together. A consensus approach to community detection based on the Leiden algorithm¹³⁷ was used to determine ancestral linkage groups from this graph. For details, see the Supplementary Notes.
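
The graph construction step can be sketched as follows. This is one possible reading of the description above and not the authors' implementation: the input layout, the function name and the rule that an edge requires co-location in at least two species are assumptions. The community detection step is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def build_linkage_graph(locations, min_species=2):
    """Build a weighted orthogroup graph from chromosome assignments.

    locations: {species: {orthogroup_id: chromosome_or_scaffold}}
    Returns {(og_a, og_b): n_species}, where n_species is the number of
    species in which both orthogroups lie on the same chromosome. Pairs
    supported by fewer than `min_species` species are dropped.
    """
    support = defaultdict(int)
    for og_to_chrom in locations.values():
        by_chrom = defaultdict(list)
        for og, chrom in og_to_chrom.items():
            by_chrom[chrom].append(og)
        for ogs in by_chrom.values():
            for a, b in combinations(sorted(ogs), 2):
                support[(a, b)] += 1
    return {pair: n for pair, n in support.items() if n >= min_species}
```

The resulting edge list could then be handed to a community detection routine to group orthogroups into candidate linkage groups. The pairwise enumeration is quadratic in the number of orthogroups per chromosome, so a real analysis would likely need a more compact representation.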

TAD sliding window analysis

Hi-C maps were generated for multiple species’ genomes (g:) and reads (r:), including those of the ctenophore Hormiphora californensis (g: GCA_020137815.1, r: SRR13784181, SRR13784182)¹³⁸, the cnidarian Hydra vulgaris strain 105 (g: GCA_022113875.1, r: SRR14099165)⁴⁰, the cnidarian Haliclystus octoradiatus (g: GCA_916610825.1, r: ERR6745733)¹³⁹, the Nematostella assembly and reads presented in this manuscript (also, Hi-C reads from the Darwin Tree of Life project, ERR8571699⁵⁶), the cnidarian Diadumene lineata (g: GCA_918843875.1, r: ERR6688655)¹⁴⁰, the fire jellyfish Rhopilema esculentum (g: https://doi.org/10.5524/100720³⁹, r: SRR11649085²¹), the fly Drosophila melanogaster (g: assembly Release 6¹²², r: SRR10512944¹⁴¹), the scallop Pecten maximus (g: https://doi.org/10.6084/m9.figshare.10311068, r: SRR10119404)¹⁴², the cephalochordate Branchiostoma floridae (g: GCA_000003815.2 Bfl_VNyyK, r: SRR12007919, SRR12059951)⁴¹, and Mus musculus (g: GRCm39¹⁴³, r: SRR1771322-SRR1771324¹⁴⁴).

The Hi-C maps were generated by mapping the raw paired-end Hi-C reads to the genome assemblies using chromap v0.2.3¹⁴⁵. The pairs files were normalized using hicExplorer v3.6¹⁴⁶, and balanced using Cooler v0.8.10¹⁴⁷. The insulation scores were then calculated using FanC v 0.9.23b⁷³ using 100 kb bin sizes. The peaks and valleys in the insulation scores were also called with FanC v0.9.24¹⁴⁸. We defined the strength of a topological boundary transition as the delta between each peak and valley, that is, the transition from a highly interacting region to a poorly interacting region. We compared the distributions of these deltas with an uncorrected two-sided Mann-Whitney U test to test for significant differences in the median values.
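
The delta between neighbouring peaks and valleys can be illustrated with a short Python sketch. It does not reproduce the FanC peak and valley calling: extrema are found here by a simple comparison with the two adjacent bins, and both function names are hypothetical.

```python
def local_extrema(scores):
    """Return (index, kind) for strict local maxima and minima of the scores."""
    extrema = []
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            extrema.append((i, "peak"))
        elif scores[i] < scores[i - 1] and scores[i] < scores[i + 1]:
            extrema.append((i, "valley"))
    return extrema


def peak_valley_deltas(scores):
    """Absolute score difference between each adjacent peak/valley pair."""
    extrema = local_extrema(scores)
    deltas = []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        if kind_i != kind_j:  # skip peak-peak or valley-valley neighbours
            deltas.append(abs(scores[i] - scores[j]))
    return deltas
```

The lists of deltas obtained for two genomes could then be compared with a two-sided Mann-Whitney U test, for example with `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")`.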

Phylogenetic analysis of NK-like and SuperHox genes

Sequences of the NK class proteins were MUSCLE-aligned with default settings in MEGA11¹⁴⁹, and trimmed using the Automated1 setting in TrimAl (v. 1.3)¹²⁵. Trimmed sequences were used for calculating the NJ, and ML trees. The NJ tree was calculated in MEGA11¹⁴⁹, and the ML tree was calculated in IQ-TREE2¹²⁶ using the automatic model selection algorithm. Sequences of the SuperHox cluster genes were aligned as above but not trimmed. NJ and ML trees were calculated as above.

Nematostella gastrula ATAC-seq

Embryos were raised at 21 °C until gastrula stage (24 hpf). Ten embryos were collected and washed with cold PBS, then cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% NP-40, 0.1% Tween-20, 0.01% Digitonin) was added. Embryos were disintegrated by pipetting and incubated in lysis buffer for 6.5 minutes on ice. Nuclei were then pelleted by centrifugation (0.5 rcf, 10 min, 4 °C). The pellet was resuspended in 50 µl tagmentation mix (5× TD buffer, 0.1% Tween-20, 0.01% Digitonin, 2.5 µl Tn5 transposase). TD buffer and Tn5 transposase were kindly provided by the David Garfield lab. The transposition reaction was incubated at 37 °C for 30 min with agitation at 500 × g and then cooled down on ice for 5 min. DNA purification was carried out with the QIAquick PCR kit (QIAGEN, #28104) according to the manufacturer’s instructions, with an elution in 20 µl elution buffer. PCR amplification was performed for 12 cycles according to Buenrostro¹⁵⁰. A final purification step was performed with the QIAquick PCR kit (QIAGEN, #28104), elution in 15 µl EB. The quality of the library was validated with an agarose gel, from which DNA was extracted with a peqlab gel extraction kit, eluted in 20 µl EB, and stored at −20 °C. Library preparation and sequencing were done at Novogene and VBCF.

Comparative ATAC-seq analysis

ATAC-seq peak calling was performed for the following species using publicly available datasets (see Supplementary data file 14 for details).

All peak calling followed the ENCODE guidelines for ATAC-seq. Owenia and Nematostella peak calls were done following the unreplicated data guidelines (https://www.encodeproject.org/pipelines/ENCPL344QWT/), while Lytechinus and Patinopecten peak calls followed the replicated data guidelines (https://www.encodeproject.org/pipelines/ENCPL787FUN/). Briefly, raw reads were trimmed, aligned to their reference genome and filtered for mapq >=20. Pseudoreplicates were produced for each library as well as for the merged alignment of multiple libraries, and peak calling was done on the initial alignment, the merged alignment and all the pseudoreplicates. For unreplicated data, peaks were filtered based on p-signal <=0.05 and similar peak calls between the original alignment and at least one pseudoreplicate. For replicated data, final peaks were selected based on IDR scores between the merged library peaks and each replicate library.

ATAC-seq peaks for human (hg19), mouse (mm10), Caenorhabditis (Ce11), and Drosophila (dm6) were downloaded from the ChIP-atlas database (https://chip-atlas.org/^151,152). Only embryonic stage datasets were used in this study. Repetitive regions were identified using RepeatModeler and RepeatMasker for all genomes, and peaks overlapping repeat regions were removed before downstream analysis. A custom R script was used to find the closest gene for each ATAC-seq peak and measure the distances. Peaks overlapping genes were excluded from downstream analysis. Data points 1.5 times the interquartile range above the third quartile or below the first quartile were removed as outliers before plotting and regression. A simple linear regression was used to correlate genome size and mean intergenic distance for all species.

Microsynteny analysis

Orthofinder version 2.5.4 was run to obtain orthogroups for the selected species. The microsynteny pipeline from ref. ⁴⁹ was run for each of the three clades (deuterostomes, spiralians, cnidarians) separately. For this, a subset of orthogroups that had at least one gene per species in a given clade (e.g., every cnidarian species had an ortholog for the cnidarian clade micro-synteny analysis) was generated to correct for any missing genes. The micro-synteny analysis was thus based on 6016, 4832, and 6539 orthogroups for cnidarians, spiralians, and deuterostomes, respectively. For the strictest profiling of gene block length, we ran the micro-synteny pipeline without allowance for any intervening genes. The resulting blocks were then filtered to remove micro-syntenies composed of paralogous genes. The resulting total numbers of micro-syntenic blocks given different minimal required lengths (at least 2, 3, or 4 genes) are listed in Supplementary data file 12.
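
The orthogroup subsetting step can be expressed compactly in Python. The sketch below assumes a hypothetical in-memory layout in which each orthogroup maps species names to lists of gene identifiers. It does not parse Orthofinder output and is not part of the microsynteny pipeline itself.

```python
def shared_orthogroups(orthogroups, clade_species):
    """Keep orthogroups with at least one gene in every species of the clade.

    orthogroups: {orthogroup_id: {species: [gene_id, ...]}}
    clade_species: iterable of species names that must all be represented.
    """
    required = set(clade_species)
    return {
        og_id: members
        for og_id, members in orthogroups.items()
        if all(members.get(species) for species in required)
    }
```

An orthogroup with an empty gene list for one species is treated the same way as an orthogroup in which that species is absent.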

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Code availability

The tools and algorithms used for this work are described in the methods section and the supplement. Custom code used to analyze the data is available at https://github.com/nijibabulu/cnidariangenomes^153,154.

Data availability

All raw data and assembled genomes are available via the National Center for Biotechnology Information under the accessions PRJNA667495, PRJNA1036184 and PRJNA430035. The assembled genomes can be downloaded, browsed and searched on publicly available browsers at https://simrbase.stowers.org/starletseaanemone and https://simrbase.stowers.org/wormanemone. Publicly available data that were used in this manuscript include SRR1771322-SRR1771324 [https://www.ncbi.nlm.nih.gov/bioproject/PRJNA273476] (mouse Hi-C), SRR10512944 (D. melanogaster Hi-C), and ERR8571699 (Wellcome Sanger N. vectensis Hi-C). Data necessary to reproduce the analyses together with the code is available for download via figshare [https://doi.org/10.6084/m9.figshare.24258598.v2].

References

Putnam, N. H. et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86–94 (2007).
Chapman, J. A. et al. The dynamic genome of Hydra. Nature 464, 592–596 (2010).
Srivastava, M. et al. The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466, 720–726 (2010).
Schwaiger, M. et al. Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res. 24, 639–650 (2014).
Gaiti, F. et al. Landscape of histone modifications in a sponge reveals the origin of animal cis-regulatory complexity. Elife 6, e22194 (2017).
Nanty, L. et al. Comparative methylomics reveals gene-body H3K36me3 in Drosophila predicts DNA methylation and CpG landscapes in other invertebrates. Genome Res. 21, 1841–1850 (2011).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Acemel, R. D., Maeso, I. & Gómez‐Skarmeta, J. L. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. WIREs Dev. Biol. 6, e265 (2017).
Article Google Scholar
Anderson, E., Devenney, P. S., Hill, R. E. & Lettice, L. A. Mapping the Shh long-range regulatory domain. Development 141, 3934–3943 (2014).
Article CAS PubMed PubMed Central Google Scholar
Krefting, J., Andrade-Navarro, M. A. & Ibn-Salem, J. Evolutionary stability of topologically associating domains is associated with conserved gene regulation. BMC Biol. 16, 87 (2018).
Article PubMed PubMed Central Google Scholar
Rodríguez-Carballo, E., Lopez-Delisle, L., Yakushiji-Kaminatsui, N., Ullate-Agote, A. & Duboule, D. Impact of genome architecture on the functional activation and repression of Hox regulatory landscapes. BMC Biol. 17, 55 (2019).
Article PubMed PubMed Central Google Scholar
Cartwright, P. et al. Exceptionally preserved jellyfishes from the middle cambrian. PLoS One 2, e1121 (2007).
Article ADS PubMed PubMed Central Google Scholar
Technau, U., Genikhovich, G. & Kraus, J. E. M. Cnidaria. in Evolutionary Developmental Biology of Invertebrates 1 (ed. Wanninger, A.) 115–163 (Springer Vienna, 2015).
Zapata, F. et al. Phylogenomic analyses support traditional relationships within Cnidaria. PLoS One 10, e0139068 (2015).
Article PubMed PubMed Central Google Scholar
Technau, U. & Steele, R. E. Evolutionary crossroads in developmental biology: Cnidaria. Development 138, 1447–1458 (2011).
Article CAS PubMed PubMed Central Google Scholar
Röttinger, E. Nematostella vectensis, an emerging model for deciphering the molecular and cellular mechanisms underlying whole-body regeneration. Cells 10, 2692 (2021).
Rentzsch, F. & Technau, U. Genomics and development of Nematostella vectensis and other anthozoans. Curr. Opin. Genet. Dev. 39, 63–70 (2016).
Article CAS PubMed Google Scholar
Layden, M. J., Rentzsch, F. & Röttinger, E. The rise of the starlet sea anemone Nematostella vectensis as a model system to investigate development and regeneration: overview of starlet sea anemone Nematostella vectensis. Wiley Interdiscip. Rev. Dev. Biol. 5, 408–428 (2016).
Article PubMed PubMed Central Google Scholar
Hu, M., Zheng, X., Fan, C.-M. & Zheng, Y. Lineage dynamics of the endosymbiotic cell type in the soft coral Xenia. Nature 582, 534–538 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Nong, W. et al. Jellyfish genomes reveal distinct homeobox gene clusters and conservation of small RNA processing. Nat. Commun. 11, 3051 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Leclère, L. et al. The genome of the jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle. Nat. Ecol. Evol. 3, 801–810 (2019).
Article PubMed Google Scholar
Khalturin, K. et al. Medusozoan genomes inform the evolution of the jellyfish body plan. Nat. Ecol. Evol. 3, 811–822 (2019).
Article PubMed Google Scholar
Ohdera, A. et al. Box, stalked, and upside-down? Draft genomes from diverse jellyfish (Cnidaria, Acraspeda) lineages: Alatina alata (Cubozoa), Calvadosia cruxmelitensis (Staurozoa), and Cassiopea xamachana (Scyphozoa). Gigascience 8, giz069 (2019).
Kim, H.-M. et al. The genome of the giant Nomura’s jellyfish sheds light on the early evolution of active predation. BMC Biol. 17, 28 (2019).
Article PubMed PubMed Central Google Scholar
Gold, D. A. et al. The genome of the jellyfish Aurelia and the evolution of animal complexity. Nat. Ecol. Evol. 3, 96 (2019).
Article PubMed Google Scholar
Shinzato, C. et al. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476, 320–323 (2011).
Article ADS CAS PubMed Google Scholar
Wilson, T. J., Lazner, F., Kola, I. & Hertzog, P. J. The mouse and the genomic era. in Comparative Genomics (ed. Clark, M. S.) 97–121 (Springer US, 2000).
Gosse, P.H. Notes on some new or little known marine animals. The Annals and Magazine of Natural History, ser. 2. 12, 153–159 (1853).
Wilson, E. Scolanthus callimorphus Worm anemone. in Marine Life Information Network: Biology and Sensitivity Key Information Reviews, [on-line] (2005).
Heger, P., Marin, B., Bartkuhn, M., Schierenberg, E. & Wiehe, T. The chromatin insulator CTCF and the emergence of metazoan diversity. Proc. Natl Acad. Sci. 109, 17507–17512 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Baumgarten, S. et al. The genome of Aiptasia, a sea anemone model for coral symbiosis. Proc. Natl Acad. Sci. USA 112, 11893–11898 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness. in Gene Prediction: Methods and Protocols (ed. Kollmar, M.) 227–245 (Springer, 2019).
Fredman, D., Schwaiger, M., Rentzsch, F. & Technau, U. Nematostella vectensis transcriptome and gene models v2.0. Preprint at https://figshare.com/articles/Nematostella_vectensis_transcriptome_and_gene_models_v2_0/807696 (2013).
Cole, A. G. et al. Muscle cell-type diversification is driven by bHLH transcription factor expansion and extensive effector gene duplications. Nat. Commun. 14, 1747 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Steger, J. et al. Single-cell transcriptomics identifies conserved regulators of neuroglandular lineages. Cell Rep. 40, 111370 (2022).
Article CAS PubMed Google Scholar
Guo, L. et al. An adaptable chromosome preparation methodology for use in invertebrate research organisms. BMC Biol. 16, 25 (2018).
Article PubMed PubMed Central Google Scholar
Zacharias, H., Anokhin, B., Khalturin, K. & Bosch, T. C. G. Genome sizes and chromosomes in the basal metazoan Hydra. Zoology 107, 219–227 (2004).
Article PubMed Google Scholar
Li, Y. et al. Chromosome-level reference genome of the jellyfish Rhopilema esculentum. Gigascience 9, giaa036 (2020).
Simakov, O. et al. Deeply conserved synteny and the evolution of metazoan chromosomes. Sci. Adv. 8, eabi5884 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Simakov, O. et al. Deeply conserved synteny resolves early events in vertebrate evolution. Nat. Ecol. Evol. 4, 820–830 (2020).
Article PubMed PubMed Central Google Scholar
Dehal, P. & Boore, J. L. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3, e314 (2005).
Article PubMed PubMed Central Google Scholar
Pollard, S. L. & Holland, P. W. H. Evidence for 14 homeobox gene clusters in human genome ancestry. Curr. Biol. 10, 1059–1062 (2000).
Article CAS PubMed Google Scholar
Braasch, I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat. Genet. 48, 427–437 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wang, S. et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat. Ecol. Evol. 1, 1–12 (2017).
Article Google Scholar
Qu, Z. et al. Millipede genomes reveal unique adaptations during myriapod evolution. PLoS Biol. 18, e3000636 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kenny, N. J. et al. Tracing animal genomic evolution with the chromosomal-level assembly of the freshwater sponge Ephydatia muelleri. Nat. Commun. 11, 3676 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Kapli, P. & Telford, M. J. Topology-dependent asymmetry in systematic errors affects phylogenetic placement of Ctenophora and Xenacoelomorpha. Sci. Adv. 6, eabc5162 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493, 526–531 (2013).
Article ADS CAS PubMed Google Scholar
Butts, T., Holland, P. W. H. & Ferrier, D. E. K. The urbilaterian Super-Hox cluster. Trends Genet. 24, 259–262 (2008).
Article CAS PubMed Google Scholar
Ferrier, D. E. K. Evolution of homeobox gene clusters in animals: the giga-cluster and primary vs. secondary clustering. Front. Ecol. Evol. 4, 36 (2016).
Chourrout, D. et al. Minimal ProtoHox cluster inferred from bilaterian and cnidarian Hox complements. Nature 442, 684–687 (2006).
Article ADS CAS PubMed Google Scholar
Simakov, O. et al. Hemichordate genomes and deuterostome origins. Nature 527, 459–465 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Larroux, C. et al. The NK homeobox gene cluster predates the origin of hox genes. Curr. Biol. 17, 706–710 (2007).
Article CAS PubMed Google Scholar
Friedman, N. & Rando, O. J. Epigenomics and the structure of the living genome. Genome Res. 25, 1482–1490 (2015).
Article CAS PubMed PubMed Central Google Scholar
Fletcher, C. et al. The genome sequence of the starlet sea anemone, Nematostella vectensis (Stephenson, 1935). Wellcome Open Res. 8, 79 (2023).
Article Google Scholar
Rodríguez-Carballo, E. et al. The HoxD cluster is a dynamic and resilient TAD boundary controlling the segregation of antagonistic regulatory landscapes. Genes Dev. 31, 2264–2281 (2017).
Article PubMed PubMed Central Google Scholar
Cazet, J. F. et al. A chromosome-scale epigenetic map of the Hydra genome reveals conserved regulators of cell state. Genome Res. 33, 283–298 (2023).
He, S. et al. An axial Hox code controls tissue segmentation and body patterning in Nematostella vectensis. Science 361, 1377–1380 (2018).
Article ADS CAS PubMed Google Scholar
McGinnis, W. & Krumlauf, R. Homeobox genes and axial patterning. Cell 68, 283–302 (1992).
Article CAS PubMed Google Scholar
Ryan, J. F. et al. Pre-bilaterian origins of the Hox cluster and the Hox code: evidence from the sea anemone, Nematostella vectensis. PLoS One 2, e153 (2007).
Article ADS PubMed PubMed Central Google Scholar
Finnerty, J. R., Pang, K., Burton, P., Paulson, D. & Martindale, M. Q. Origins of BILATERAL SYMMETRY: Hox and Dpp expression in a sea anemone. Science 304, 1335–1337 (2004).
Article ADS CAS PubMed Google Scholar
Lazar, N. H. et al. Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res. 28, 983–997 (2018).
Article CAS PubMed PubMed Central Google Scholar
Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).
Article CAS PubMed Google Scholar
Szabo, Q., Bantignies, F. & Cavalli, G. Principles of genome folding into topologically associating domains. Sci. Adv. 5, eaaw1668 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Sacerdot, C., Louis, A., Bon, C., Berthelot, C. & Roest Crollius, H. Chromosome evolution at the origin of the ancestral vertebrate genome. Genome Biol. 19, 166 (2018).
Article PubMed PubMed Central Google Scholar
Clarence, T. et al. Robust 3D modeling reveals spatiosyntenic properties of animal genomes. iScience 26, 106136 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Sinigaglia, C., Busengdal, H., Leclère, L., Technau, U. & Rentzsch, F. The bilaterian head patterning gene six3/6 controls aboral domain development in a cnidarian. PLoS Biol. 11, e1001488 (2013).
Article CAS PubMed PubMed Central Google Scholar
Schwaiger, M. et al. An ancestral Wnt-Brachyury feedback loop in axial patterning and recruitment of mesoderm-determining target genes. Nat. Ecol. Evol. 6, 1921–1939 (2022).
Article PubMed Google Scholar
Kusserow, A. et al. Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433, 156–160 (2005).
Article ADS CAS PubMed Google Scholar
Lengfeld, T. et al. Multiple Wnts are involved in Hydra organizer formation and regeneration. Dev. Biol. 330, 186–199 (2009).
Article CAS PubMed Google Scholar
Marlétaz, F. et al. The little skate genome and the evolutionary emergence of wing-like fin appendages. bioRxiv 2022.03.21.485123 (2022) https://doi.org/10.1101/2022.03.21.485123.
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Kim, J. et al. Condensin DC loads and spreads from recruitment sites to create loop-anchored TADs in C. elegans. Elife 11, e68745 (2022).
Fritzenwanker, J. H. & Technau, U. Induction of gametogenesis in the basal cnidarian Nematostella vectensis (Anthozoa). Dev. Genes Evol. 212, 99–103 (2002).
Article PubMed Google Scholar
Chen, J. & Dellaporta, S. Urea-based plant DNA Miniprep. in The Maize Handbook (eds. Freeling, M. & Walbot, V.) 526–527 (Springer, 1994).
Sanders, S. M. et al. CRISPR/Cas9-mediated gene knockin in the hydroid Hydractinia symbiolongicarpus. BMC Genom. 19, 649 (2018).
Article Google Scholar
Putnam, N. H. et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350 (2016).
Article CAS PubMed PubMed Central Google Scholar
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Article CAS PubMed PubMed Central Google Scholar
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k -mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Article CAS PubMed PubMed Central Google Scholar
Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinform. 19, 460 (2018).
Article CAS Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Article CAS PubMed PubMed Central Google Scholar
Pryszcz, L. P. & Gabaldón, T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44, e113–e113 (2016).
Article PubMed PubMed Central Google Scholar
Bao, W., Kojima, K. K. & Kohany, O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 11 (2015).
Article PubMed PubMed Central Google Scholar
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. (2013).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. 117, 9451–9457 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio] (2013).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Faust, G. G. & Hall, I. M. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505 (2014).
Article CAS PubMed PubMed Central Google Scholar
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Article CAS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 9, R12 (2004).
Hunt, M. et al. REAPR: a universal tool for genome assembly evaluation. Genome Biol. 14, R47 (2013).
Article PubMed PubMed Central Google Scholar
Praher, D. et al. Conservation and turnover of miRNAs and their highly complementary targets in early branching animals. Proc. R. Soc. B Biol. Sci. 288, 20203169 (2021).
Article CAS Google Scholar
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Article CAS PubMed PubMed Central Google Scholar
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Article CAS PubMed PubMed Central Google Scholar
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Article CAS Google Scholar
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat. Protoc. 8, 1494–1512 (2013).
Srivastava, A., Malik, L., Sarkar, H. et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol 21, 239 (2020).
Kuo, R. I. et al. Illuminating the dark side of the human transcriptome with long read transcript sequencing. BMC Genom. 21, 751 (2020).
Article CAS Google Scholar
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
Article CAS PubMed PubMed Central Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Article CAS PubMed Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Article CAS PubMed Google Scholar
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Article ADS MathSciNet CAS PubMed PubMed Central Google Scholar
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Article CAS PubMed Google Scholar
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32, 767–769 (2016).
Article CAS PubMed Google Scholar
Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: incorporating protein homology information into gene prediction with GeneMark-EP and AUGUSTUS. 1 (2018).
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinforma. 7, 62 (2006).
Article Google Scholar
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Article CAS PubMed PubMed Central Google Scholar
El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
Article CAS PubMed Google Scholar
de Mendoza, A. et al. Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl Acad. Sci. 110, E4858–E4866 (2013).
Article PubMed PubMed Central Google Scholar
Kent, W. J. BLAT—The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
CAS PubMed PubMed Central Google Scholar
Matus, D. Q., Pang, K., Daly, M. & Martindale, M. Q. Expression of Pax gene family members in the anthozoan cnidarian, Nematostella vectensis: Pax gene expression in Nematostella vectensis. Evol. Dev. 9, 25–38 (2007).
Article CAS PubMed Google Scholar
Ryan, J. F. et al. The cnidarian-bilaterian ancestor possessed at least 56 homeoboxes: evidence from the starlet sea anemone, Nematostella vectensis. Genome Biol. 7, R64 (2006).
Article PubMed PubMed Central Google Scholar
Mazza, M. E., Pang, K., Martindale, M. Q. & Finnerty, J. R. Genomic organization, gene structure, and developmental expression of three Clustered otx genes in the sea anemone Nematostella vectensis. J. Exp. Zool. B Mol. Dev. Evol. 308B, 494–506 (2007).
Article CAS Google Scholar
Matus, D. Q., Thomsen, G. H. & Martindale, M. Q. Dorso/ventral genes are asymmetrically expressed and involved in germ-layer demarcation during cnidarian gastrulation. Curr. Biol. 16, 499–505 (2006).
Article CAS PubMed Google Scholar
Hudry, B. et al. Molecular insights into the origin of the Hox-TALE patterning system. Elife 3, e01939 (2014).
Article PubMed PubMed Central Google Scholar
Mazza, M. E., Pang, K., Reitzel, A. M., Martindale, M. Q. & Finnerty, J. R. A conserved cluster of three PRD-class homeobox genes (homeobrain, rx and orthopedia) in the Cnidaria and Protostomia. Evodevo 1, 3 (2010).
Article CAS PubMed PubMed Central Google Scholar
Ying, H. et al. The Whole-genome sequence of the coral Acropora millepora. Genome Biol. Evol. 11, 1374–1379 (2019).
Article CAS PubMed PubMed Central Google Scholar
Consortium*, The C. elegans Sequencing. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
Article ADS Google Scholar
Hoskins, R. A. et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25, 445–458 (2015).
Article PubMed PubMed Central Google Scholar
Hydra 2.0 Web Portal. https://research.nhgri.nih.gov/hydra.
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Article CAS PubMed PubMed Central Google Scholar
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Article PubMed PubMed Central Google Scholar
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sanderson, M. J. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302 (2003).
Article CAS PubMed Google Scholar
dos Reis, M. et al. Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Curr. Biol. 25, 2939–2950 (2015).
Article PubMed PubMed Central Google Scholar
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinforma. 10, 421 (2009).
Article Google Scholar
Dimitrieva, S. & Bucher, P. UCNEbase—a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Res. 41, D101–D109 (2013).
Article CAS PubMed Google Scholar
Roth, A. C. J., Gonnet, G. H. & Dessimoz, C. Algorithm of OMA for large-scale orthology inference. BMC Bioinforma. 9, 518 (2008).
Article Google Scholar
Moya, A. et al. Whole transcriptome analysis of the coral Acropora millepora reveals complex responses to CO₂-driven acidification during the initiation of calcification. Mol. Ecol. 21, 2440–2454 (2012).
Article CAS PubMed Google Scholar
Zakas, C., Harry, N. D., Scholl, E. H. & Rockman M. V. The Genome of the Poecilogonous Annelid Streblospio benedicti. Genome Biol Evol. 14, evac008 (2022).
Shingate, P. et al. Chromosome-level assembly of the horseshoe crab genome provides insights into its genome evolution. Nat. Commun. 11, 2322 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Harris, T. W. et al. WormBase: a modern model organism information resource. Nucleic Acids Res. 48, D762–D767 (2020).
CAS PubMed Google Scholar
Srivastava, M. et al. The Trichoplax genome and the nature of placozoans. Nature 454, 955–960 (2008).
Article ADS CAS PubMed Google Scholar
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Schultz, D. T. et al. A chromosome-scale genome assembly and karyotype of the ctenophore Hormiphora californensis. G3 11, jkab302 (2021).
Article PubMed PubMed Central Google Scholar
Blaxter, M. et al. The genome sequence of the spotted kaleidoscope jellyfish, Haliclystus octoradiatus (James-Clark, 1863). Wellcome Open Res. 8, 6 (2023).
Article PubMed Google Scholar
Wood, C. et al. The genome sequence of the orange-striped anemone, Diadumene lineata (Verrill, 1869). Wellcome Open Res. 7, 93 (2022).
Article PubMed PubMed Central Google Scholar
Adams, M. et al. One fly-one genome: chromosome-scale genome assembly of a single outbred Drosophila melanogaster. Nucleic Acids Res. 48, e75 (2020).
CAS PubMed PubMed Central Google Scholar
Kenny, N. J. et al. The gene-rich genome of the scallop Pecten maximus. Gigascience 9, giaa037 (2020).
Mouse Genome Sequencing Consortium. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Article Google Scholar
Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309 (2015).
Article CAS PubMed PubMed Central Google Scholar
Zhang, H. et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat. Commun. 12, 1–6 (2021).
ADS Google Scholar
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Article ADS PubMed PubMed Central Google Scholar
Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
Article CAS PubMed Google Scholar
Kruse, K., Hug, C. B. & Vaquerizas, J. M. FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data. Genome Biol. 21, 303 (2020).
Article PubMed PubMed Central Google Scholar
Tamura, K., Stecher, G. & Kumar, S. MEGA11: molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Article CAS PubMed PubMed Central Google Scholar
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Article PubMed Google Scholar
Zou, Z., Ohta, T., Miura, F. & Oki, S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res. 50, W175–W182 (2022).
Article CAS PubMed PubMed Central Google Scholar
Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).
Zimmermann, B., Simakov, O. & Montenegro, J. D. Nijibabulu/cnidariangenomes: v1.2.1 Zenodo. https://doi.org/10.5281/ZENODO.8407554 (2023).
Zimmermann, B. Source data for sea anemone genomes paper. https://doi.org/10.6084/m9.figshare.24258598.v1 (2023).
Voolstra, C. R. et al. Comparative analysis of the genomes of Stylophora pistillata and Acropora digitifera provides evidence for extensive differences between species of corals. Sci. Rep. 7, 1–14 (2017).
Article CAS Google Scholar
Cunning, R., Bay, R. A., Gillette, P., Baker, A. C. & Traylor-Knowles, N. Comparative analysis of the Pocillopora damicornis genome highlights role of immune system in coral evolution. Sci. Rep. 8, 16134 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Jiang, J. B. et al. A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome. Gigascience 8, giz026 (2019).
Jeon, Y. et al. The draft genome of an octocoral, Dendronephthya gigantea. Genome Biol. Evol. 11, 949–953 (2019).
Article CAS PubMed PubMed Central Google Scholar
Marlétaz, F. et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature 564, 64–70 (2018).
Article ADS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Matthew Nicotra for providing us with the HMW DNA extraction protocol used for Scolanthus. We thank Robert Reischl for the photo of Scolanthus and Patrick R.H. Steinmetz and Hanna Kraus for the photo of Nematostella vectensis (Fig. 1). Special thanks to Tatiana Lebedeva for the cartoon drawings of Nematostella (Fig. 4). We are grateful to the Stowers Institute Molecular Biology Core facility, particularly Amanda Lawlor, Michael Peterson and Anoja Perera. This work was supported by grants from the Austrian Science Fund FWF (P24858; P21108) to U.T., support from the Stowers Institute for Medical Research to M.G. and an NIH Ruth L. Kirschstein NRSA (F32 GM131522) to E.M.H. We are also grateful for the support of the CNRS Marine Station in Roscoff and the Assemble grant 227799 to U.T. for collecting Scolanthus.

Author information

Yehu Moran
Present address: The Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
These authors contributed equally: Bob Zimmermann, Juan D. Montenegro.

Authors and Affiliations

Department of Neurosciences and Developmental Biology, Faculty of Life Sciences, University of Vienna, Djerassiplatz 1, 1030, Vienna, Austria
Bob Zimmermann, Juan D. Montenegro, Lukas Weilguny, Jessica Lovegrove-Walsh, Daniela Praher, David Fredman, Darrin Schultz, Yehu Moran, Oleg Simakov, Grigory Genikhovich & Ulrich Technau
Research platform SinCeReSt, University of Vienna, Djerassiplatz 1, 1030, Vienna, Austria
Bob Zimmermann, Juan D. Montenegro, Oleg Simakov & Ulrich Technau
Stowers Institute for Medical Research, Kansas City, MO, 64110, USA
Sofia M. C. Robb, Whitney J. Fropf, Shuonan He, Shiyuan Chen, Eric M. Hill, Cheng-Yi Chen, Katerina Ragkousi & Matthew C. Gibson
Department of Biology, Amherst College, Amherst, MA, 01002, USA
Katerina Ragkousi
Max Perutz laboratories, University of Vienna, Dr. Bohrgasse 5, 1030, Vienna, Austria
Ulrich Technau

Contributions

B.Z. generated the DNA libraries for PacBio sequencing and for Hi-C, carried out most of the bioinformatic analyses and wrote the paper. J.D.M. generated the gene models and carried out other bioinformatic analyses. S.M.C.R. and W.J.F. set up the SIMR base browser and carried out bioinformatic analyses. L.W., D.F., D.S., O.S. and J.L.-W. carried out bioinformatic analyses. S.H., S.C., E.M.H., C.-Y.C., K.R., D.P., Y.M. and G.G. generated experimental data and contributed to the bioinformatic analyses. M.C.G. oversaw the project, contributed Iso-Seq data and edited the paper. U.T. designed the study, collected Scolanthus polyps, contributed various sequencing data and wrote the paper, which was further edited by all authors.

Corresponding authors

Correspondence to Matthew C. Gibson or Ulrich Technau.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1–14

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zimmermann, B., Montenegro, J.D., Robb, S.M.C. et al. Topological structures and syntenic conservation in sea anemone genomes. Nat Commun 14, 8270 (2023). https://doi.org/10.1038/s41467-023-44080-7

Received: 02 June 2023
Accepted: 29 November 2023
Published: 13 December 2023
DOI: https://doi.org/10.1038/s41467-023-44080-7

This article is cited by

Updated single cell reference atlas for the starlet anemone Nematostella vectensis
- Alison G. Cole
- Julia Steger
- Ulrich Technau
Frontiers in Zoology (2024)

Highly conserved and extremely evolvable: BMP signalling in secondary axis patterning of Cnidaria and Bilateria
- David Mörsdorf
- Paul Knabl
- Grigory Genikhovich
Development Genes and Evolution (2024)
