The genomic basis of animal origins: a chromosomal perspective from the sponge Ephydatia muelleri

Nathan J Kenny a, b * nathanjameskenny@gmail.com 0000-0003-4816-4103 Warren R. Francis * wfrancis@biology.sdu.dk 0000-0003-3473-4726 Ramón E. Rivera-Vicéns r.rivera@lrz.uni-muenchen.de 0000-0002-6229-3537 Ksenia Juravel k.juravel@lrz.uni-muenchen.de Alex de Mendoza alexmendozasoler@gmail.com 0000-0002-6441-6529 Cristina Díez-Vives cristinadiezvives@gmail.com Ryan Lister ryan.lister@uwa.edu.au 0000-0001-6637-7239 Luis Bezares-Calderon l.a.bezares-calderon@exeter.ac.uk 0000-0001-6678-6876 Lauren Grombacher grombach@ualberta.ca Maša Roller masa.milosevic@gmail.com 0000-0001-5667-3317 Lael D. Barlow lael@ualberta.ca 0000-0002-4358-753X Sara Camilli scamilli@princeton.edu Joseph F. Ryan joseph.ryan@whitney.ufl.edu Gert Wörheide m, n woerheide@lmu.de 0000-0002-6380-7421 April L Hill ahill5@bates.edu Ana Riesgo * a.riesgo@nhm.ac.uk 0000-0002-7993-1523 Sally P. Leys sleys@ualberta.ca 0000-0001-9268-2181

One of the key events in the history of life was the evolutionary transition from unicellular organisms to multicellular individuals in which differentiated cell types work cooperatively 1 . In animals, the events that enabled this transformation can often be inferred by comparing the genomes of living representatives of nonbilaterian animals to those of bilaterians and their sister taxa, and determining shared characters and key differences between them 2 .
However, the origins of several fundamental metazoan traits, such as tissues and nervous systems, are still unknown. Determining the origin of these characteristics requires more robust and contiguous genomic resources than are currently available for non-bilaterian animal taxa.
Porifera, commonly known as sponges, are one of the first lineages to have evolved during the rise of multicellular animals 3 and are an essential reference group for comparative studies. The benchmark genome for sponges, Amphimedon queenslandica 4 , has provided a wealth of insight into the genomic biology of sponges 5,6 , yet studies of other sponge species have suggested that traits in A. queenslandica may not be representative of the phylum as a whole 7,8 . For example, its genome is one of the smallest measured in sponges 6 , it is highly methylated in comparison to other animals 9 , it may have undergone some gene loss even in well-conserved families 10 and it has been described as possessing an 'intermediate' genomic state, between those of choanoflagellates and metazoans 5 . There are over 9,200 species of sponge (http://www.marinespecies.org/porifera/, 11 ), and whether the unusual characteristics of A. queenslandica are typical of this large and diverse phylum can only be tested with additional, and more contiguous, genome assemblies.
Sponges diverged from the metazoan stem lineage in the Neoproterozoic 12 and therefore are central to understanding the processes and mechanisms involved in the initial metazoan radiation. Sponges possess the fundamental characteristics shared by all animals, including development through embryogenesis to form tissues and signalling to coordinate whole body behaviour 13,14 . Most also have a highly conserved body plan, consisting of canals and pumping cells that filter water effectively 15 . However, within the four classes of sponges (Hexactinellida, Demospongiae, Calcarea, and Homoscleromorpha), several groups differ from this Bauplan.
The transition to freshwater is one of the most remarkable evolutionary trajectories marine animals can undergo, as it requires a complete spectrum of physiological adaptations to novel habitats. Not only can freshwater sponges, which consist of a single layer of cells over a scant extracellular matrix, control their osmolarity in freshwater 19 , they can withstand extremely cold temperatures and even freezing, as they inhabit some lakes that see temperatures below -40°C 20 , and can also tolerate extreme heat and desiccation in desert sand dunes and high up on tree trunks 21,22 . Freshwater sponges are both unfamiliar and yet so common worldwide that under the right conditions they can foul drinking water reservoirs, waste treatment plants, intake pipes and cooling systems for power plants 23 . The main adaptation which permits colonization of such extreme habitats is the production of sophisticated structures called gemmules, a distinctive stage in the life of these sponges 21 . The events that allowed colonization of freshwater and which are required for adaptation to extreme habitats by sponges are not yet fully understood 24 . Whether genomes of freshwater species are remarkably changed from those seen in marine sponges and other animals is also yet to be investigated.
The freshwater sponge Ephydatia muelleri (Lieberkühn, 1856) ( Fig. 1, Supplementary File 1 S1) is found in rivers and lakes throughout the northern hemisphere. Because of its global distribution and century-long history of study both in situ and in the laboratory, E. muelleri is an outstanding model for asking questions about adaptation and the evolution of animal characters.
It has separate sexes, allowing the study of inheritance 25 , but more practically, gemmules are clones that can easily be cultured at room temperature [26][27][28] . They also tolerate freezing 20 : this species can be stored at -80°C for several years prior to hatching 28 . Here we present a chromosome-level assembly of the 326 million base pairs (megabases, Mb) of its genome. The highly contiguous assembly of this genome offers an exceptionally rich resource for studying regulatory elements, macrosynteny and structural chromatin variation, all of which are key data for understanding the genomic biology of sponges and the early evolution of animals.

Results & Discussion:
A complete genome, with a higher gene content than most animals
We have produced a high-quality assembly of the 326 Mb Ephydatia muelleri genome using PacBio, Chicago and Dovetail Hi-C libraries sequenced to approximately 1,490 times total coverage (Supplementary File 1 S2). The resulting assembly has 1,444 scaffolds with a scaffold N50 of 9.8 Mb ( Fig. 2A
The abundance of genes seen in E. muelleri is in part due to tandem duplication. Many gene clusters have identical intron-exon structure between duplicated genes, suggesting that the duplications arose through replication slippage and unequal crossing-over. E. muelleri also shows evidence of widespread segmental duplication, with many gene clusters replicated. For example, both scaffold_0002 and scaffold_0004 contain a large cluster of predicted homologues of integrins, while on scaffold_0004 the integrin cluster overlaps with a large cluster of 177 predicted E3-ubiquitin ligases (e.g. Supplementary File 1, Fig. 6). The duplicated clusters are recognizable by their close sequence similarity, especially in coding sequences, whereas their intergenic and intronic sequences are highly variable, strongly suggesting that these clusters are true duplications and not assembly artifacts. Even BUSCO genes, which are found in single copy in most genomes, are duplicated in E. muelleri, with 19.6% represented by more than one copy (Supplementary File 1, S3.2, Table 4, 5).
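For readers less familiar with the contiguity metric quoted above, scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together span at least half of the assembly. The following is a minimal sketch of that definition only; the function name and the toy scaffold lengths are illustrative assumptions and are not taken from the E. muelleri assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length;
    the length that crosses this threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness, which is why it is usually reported alongside completeness measures such as the BUSCO results.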

Conservation of synteny with other metazoans
Sponges diverged from other metazoans in the Neoproterozoic (540-1000 mya) and yet we found evidence for conserved syntenic regions and even local gene order within scaffolds between E. muelleri and Trichoplax adhaerens, Nematostella vectensis and the chordate Branchiostoma floridae. Synteny conserved over hundreds of millions of years is consistent with the hypothesis that gene shuffling primarily occurs within chromosomes rather than between them, as predicted by the double cut and join-dosage sensitivity (DCJ-DS) model 33 . The DCJ-DS model predicts that dosage sensitive genes would tend to stay on the same chromosome, although the local order may change. We find this is clearly observable in shorter chromosomes

Pan-metazoan epigenetics: Chromatin structure and cytosine DNA methylation
The large number of genomes now available makes it clear that differences in gene regulation, as well as gene content, are responsible for the innovations seen in different animal body plans and phyla 34 . To understand the processes underlying gene regulation in non-bilaterian metazoans, data on the three-dimensional architecture of chromatin are needed from these clades. We used HOMER and Bowtie2 to analyse Hi-C data from E. muelleri and found that, as in other animals, the genome is organised into topologically associating domains (TADs) as well as loops, although we did not find mammalian-like "corner peaks" at the edges of the predicted TADs (Fig. 3A). These TADs are slightly larger on average than those seen in Drosophila melanogaster, at 142.4 kbp, compared with ~107 kbp 35 . As in other non-bilaterians, the E. muelleri genome lacks the CCCTC-binding factor (CTCF) but does possess a suite of non-CTCF zinc finger proteins which form a sister group to the bilaterian CTCF proteins 36  (mCG/CG), which is higher than most invertebrates profiled to date 9 , but much lower than A. queenslandica (81%) and S. ciliatum (51%) (Fig. 3B). The slightly higher repeat content of E. muelleri compared to A. queenslandica (47% and 43% respectively, Supplementary File 1, Table 6) indicates that hypermethylation in A. queenslandica cannot be driven by an exceptionally high repeat content in that species. The E. muelleri methylome thus challenges the assumption that all demosponges have hypermethylated genomes, and suggests that the A. queenslandica pattern is a lineage-specific innovation. Whether methylation levels differ significantly in freshwater compared to marine environments has yet to be explored, especially in invertebrate taxa, and could have a bearing on this inference.
Since cytosine methylation is highly mutagenic, vertebrate and A. queenslandica genomes are highly depleted for CpG dinucleotides 9 . Congruent with the intermediate methylation levels, we found that the genome of E. muelleri is also depleted for CpG dinucleotides, more than most invertebrates but less than in A. queenslandica (Supplementary File 1, Fig. 19). However, CpG content varies greatly across sponge genomes; for instance, S. ciliatum has higher methylation than E. muelleri, but has a relatively higher amount of CpGs.
This indicates that CpG depletion is not fully coupled to methylation levels in sponges, and that retention of CpGs might obey unknown species-specific constraints.
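CpG depletion of the kind discussed above is commonly summarised as the ratio of observed to expected CpG dinucleotides. The sketch below assumes the widely used normalisation (CpG count × sequence length) / (C count × G count); the exact statistic used for the E. muelleri analysis is described in Supplementary File 1 and may differ, and the function name and toy sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA sequence.

    Assumed normalisation: (CpG count * length) / (C count * G count).
    Values well below 1 indicate CpG depletion. Returns 0.0 when the
    sequence contains no C or no G, where the ratio is undefined.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap with itself
    return cpg_count * len(seq) / (c_count * g_count)


# Hypothetical toy sequences, for illustration only.
print(cpg_obs_exp("CCGG"))
print(cpg_obs_exp("ACACACGTGTGT"))
```

In practice the ratio would be computed over windows or gene bodies across the whole assembly rather than over a single short string.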
Given that E. muelleri shows methylation levels more consistent with canonical "mosaic" invertebrate methylomes than with a hypermethylated genome, we then checked whether gene body methylation accumulation is dependent on gene transcription. CpGs are more commonly observed near TSS (Transcriptional Start Sites) than in A. queenslandica, but marginally lower in absolute levels than those seen in S. ciliatum (Fig. 3C). As observed in many invertebrates, E. muelleri genes with mid-transcriptional levels show higher gene body methylation than non-expressed genes or highly expressed genes (Supplementary Fig. 19B) 37 . Promoters are strongly demethylated and repeats found within gene bodies tend to have higher methylation levels than those in intergenic regions, as seen in other invertebrates 38 , suggesting that not all repeats are actively targeted by DNA methylation in E. muelleri. Overall, the E. muelleri methylome shows many patterns similar to those of canonical "mosaic" invertebrate genomes, and may therefore provide a more appropriate comparison for future comparative epigenetics work than other existing sponge models.

Sponges show high levels of gene gain.
Every sponge species we examined showed a gain of 12,000 or more genes since their divergence from the most recent sister taxon or clade (

Molecular signals of freshwater adaptation
To determine whether transitions to freshwater are accompanied by the loss of a common set of genes in independent clades, we studied shared losses in four disparate animal lineages, using pairs of species for each lineage, in which one is marine and the other is freshwater. Our  File 1, Fig. 28).
Freshwater sponges, like hydra and many protists, use contractile vacuoles to excrete water 19 and so it is possible that the duplication of aquaglyceroporins in freshwater sponges may have allowed some of the genes to take on new functions. For example, in mammals AQP9 can mediate silicon influx in addition to being permeable to glycerol and urea, but not to water itself 41 . Since sponge aquaglyceroporins are more similar to AQP9 than to AQP3 or 7 (Supplementary File 1, S7.5.2 and Supplementary File 1, Fig. 28) it is possible that in freshwater sponges, in particular, these gene families function in silicon transport for skeletogenesis. Aquaporin-like molecules, glycerol uptake facilitator proteins (GLFPs), are found in bacteria and plants but, to date, have not been detected in animal genomes 42 . E. muelleri has 9 paralogs of GLFP, with five of them located on the same scaffold (Em0019) (Supplementary File 1, Fig. 28). We hypothesize that, as in plants 42 , the presence of GLFPs in sponges came about via horizontal gene transfer.

Gene expression during E. muelleri development
To understand which genes are shared with other metazoans and which are distinct during the development of the filter-feeding body plan, we examined differential gene expression from hatching gemmules through to the formation of a filtering sponge. The majority arachidonate pathways for glycogen breakdown and fatty acid metabolism were differentially upregulated to produce the breakdown of stored reserves (Fig. 5 and Supplementary File 1, S8).
Many of the genes involved in formation of a basement membrane and true epithelia were originally considered to be eumetazoan 43 , but we found, in the E. muelleri genome, genes for type IV collagen, contactin, laminin, PAR3/6, patj, perlecan and nidogen that exhibit gene expression profiles consistent with their known role in development of polarized epithelia (Fig. 5 and Supplementary File 1, Fig. 31). Similarly, claudin, which may be involved in the tight seal that E. muelleri epithelia have been shown to form 44 , is expressed later in developmental time.
While eukaryotic genes are expressed throughout the different developmental stages, many sponge-specific genes are expressed only as the sponge hatches and develops the aquiferous system that is common only to Porifera.

Sensation and non-nervous signalling in sponges
Sponges have no nervous system and yet they contract in response to a range of stimuli 45 . Exactly how contractions are coordinated is still unknown, and the potential position of ctenophores rather than sponges as sister to the rest of animals on the tree of life [46][47][48] provocatively implies that sponges could have lost neurons.
It has been difficult to identify a single character of neurons shared by all animals 49 , but the synapse and in particular the proteins that compose its scaffolding and chemical neurotransmitter complement, are agreed to be an important component 50  However, homologues of SNAP SNAREs are widely conserved among eukaryotes without nervous systems, such as plants 54 . This raises the question of whether particular SNAP SNARE genes found in early-branching metazoan lineages indicate an early origin of neuron-specific protein machinery.
Our phylogenetic analysis of SNAP-23/25 homologues revealed that the vertebrate neuron-specific paralogue SNAP-25 arose from a duplication that occurred in the vertebrate stem lineage, while the two SNAP-23/25-like genes found in E. muelleri arose from an independent duplication that occurred in Porifera (Supplementary File 1, Fig. 29). This means that, like their non-holozoan homologues, none of the identified poriferan SNAP-23/25-like genes are more closely related to SNAP-25 than to the non-neuronal vertebrate paralogue SNAP-23. Given the high quality of the assembly of the E. muelleri genome, this result shows that SNAP-25 synapses arose after sponges diverged from the rest of animals, and this is consistent with a late origin of synaptic type electro-chemical signalling in the metazoan stem lineage, after the divergence of Porifera.
One overt behaviour of E. muelleri is a series of convulsions which it uses to dislodge particles clogging its collar filters 26 . Previous work indicated that sensory cilia in the osculum were required for effective contractions and implicated a role for Transient Receptor Potential (TRP) channels in sensing changes in water moving through the sponge 55 . We found a large diversification of TRP channels in the E. muelleri genome, and these grouped with the TRPA and TRPML families (Supplementary File 1, S9.1). There is differential loss of TRPM, TRPML, TRPVnan, TRPV and TRPP2 in each of the four major lineages of sponges, but sponges as a group have lost TRPC/TRPN channels, as homologues of that group are known from choanoflagellates (Supplementary File 1, S9). TRPA genes are some of the best characterized and are known as mechano- or chemoreceptors, whereas TRPML families are largely considered to be expressed on organelles inside cells 56 . The diversification of TRPA channels in E. muelleri and other demosponges suggests a molecular mechanism for mechanoreception as well as chemical sensation in this clade.
In the E. muelleri genome we also found a wide range of ion channels involved in signalling in eumetazoans (Supplementary File 1, S9), but there are conspicuous absences including voltage-gated sodium and potassium channels, epithelial sodium-activated channels (ENaCs), leak channels, and glutamate-gated ion channels (GICs). Also absent are receptors for monoamine (serotonin and dopamine) signalling, key components of the biosynthesis pathways for these monoamines, and ionotropic glutamate receptors. While the latter are present in calcareous and homoscleromorph sponges, and in non-metazoans, demosponges seem to have lost them. In contrast, we found evidence for a diversity of metabotropic glutamate receptors (mGluR), as well as a wealth of metabotropic GABA-receptors, as discussed above.
In E. muelleri therefore, as in other demosponges, there is evidence for components that allow sensation of the environment via TRP channels, among others, and non-neuronal chemical signalling via metabotropic GPCRs (e.g. receptors for glutamate and potentially GABA and/or a range of organic acids), but no evidence for more rapid electro-chemical signalling. While we find no signature for any aspect of conventional nervous tissues in the E. muelleri genome, we cannot rule out the possibility that the phylum Porifera as a whole, or individual lineages within it (including E. muelleri), have lost these neuron-related components.

Host-microbe associations in E. muelleri
Most animals possess diverse symbiotic microbial consortia, which provide their hosts with metabolic advantages and new functions, and sponges are no different 57,58 . The release of the genome of A. queenslandica 4 opened a window into the study of the mechanisms of sponge-microorganism interactions 57 . To unravel the recognition mechanisms developed by host and microbes to facilitate symbioses, high quality genomes (and more genomic resources in general) are fundamental.
The genome of E. muelleri offers a model that allows exploration of eukaryotic patterns of microbial recognition in unique environments. We studied the microbiome of 11 different specimens of E. muelleri collected from six locations across 6,500 km in the northern hemisphere, and found that this species contained between 865 and 4,172 unique amplicon sequence variants (ASVs) (Supplementary File 1, S10). The microbiome of E. muelleri has a level of diversity comparable to that of the most diverse marine demosponges 58,59 . The microbiome of all specimens of E. muelleri is largely dominated by Proteobacteria and Bacteroidetes, as in other demosponges (Supplementary File 1, S10). However, like other freshwater sponges, E. muelleri possesses a large fraction of the order Betaproteobacteriales 59 , absent in marine sponges, which are traditionally associated with Gammaproteobacteria, a difference which is likely due to the differing pH and nutrients found in the two environments.
Surprisingly, even though the entire genome of an unknown species of Flavobacterium was recovered during the genome assembly (Supplementary File 1, S4), Flavobacteriales were not especially abundant in the other E. muelleri samples, only reaching 16% relative abundance in adult tissue from UK samples (Supplementary File 1, S10).
Overall the microbiome was relatively similar across all individuals collected, but geographic location was the major determinant of microbiome content, as has been found in marine sponges 58 . For example, only the samples collected from the Sooke Reservoir had a high abundance of Firmicutes and Campylobacteria. Likewise, only those samples collected from Maine had a moderate abundance of Cyanobacteria. Despite the distance separating samples, and therefore potential different ecologies of the collection sites, we found that four amplicon sequence variants were shared among all samples. These four ASVs were assigned to Burkholderiaceae (order Betaproteobacteriales) and Ferruginibacter (order Chitinophagales), and one was an unclassified bacterium (Supplementary File 1, S10). Whether these ASVs are fundamental for the metabolic function of E. muelleri, or whether they are simply cosmopolitan bacteria transported by the wind or on animals, and taken up by all sponges in lakes and rivers, is still to be determined. These findings and resources open the door to studies of species-specific patterns of host-microbe association at a broad scale.
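The "shared among all samples" criterion used above amounts to a set intersection over the per-sample ASV sets. A minimal sketch of that idea follows; the sample names and ASV identifiers are hypothetical placeholders, not data from this study, and the actual amplicon workflow is described in Supplementary File 1, S10.

```python
def core_asvs(asvs_by_sample):
    """Return the set of ASV identifiers present in every sample.

    `asvs_by_sample` maps a sample name to an iterable of the ASV
    identifiers detected in that sample.
    """
    asv_sets = [set(asvs) for asvs in asvs_by_sample.values()]
    if not asv_sets:
        return set()
    return set.intersection(*asv_sets)


# Hypothetical sample names and ASV identifiers, for illustration only.
toy_samples = {
    "site_A": ["asv1", "asv2", "asv3"],
    "site_B": ["asv2", "asv3", "asv4"],
    "site_C": ["asv3", "asv2"],
}
print(sorted(core_asvs(toy_samples)))
```

A strict intersection is sensitive to sequencing depth, since a rare ASV missed in one shallow sample drops out of the core set; prevalence thresholds below 100% are a common, more permissive alternative.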

Conclusions:
The high quality of the E. muelleri genome provides a new basis for comparative studies of animal evolution. To date we have lacked a chromosomal-quality poriferan assembly, and with this in hand for an experimentally tractable organism, comparative studies of a variety of ancestral characters, including longer-range gene regulation and genomic architecture, become possible.
Given their apparent anatomical simplicity, it can be surprising to some researchers that sponges have nearly twice the gene complement of other animals, but the high quality of this genome confirms this is not an artifact of previous genome assemblies, and suggests that gene duplication and adaptation to novel environments are responsible for the high gene counts.
Sponges possess complex filtering behaviours, integrate with an extensive network of microbes, and have an extensive defense system. As only approximately half of the genes found in sponges can be firmly identified, it is clear that there remains a huge amount of "hidden biology" yet to be understood in sponges, just as in other non-bilaterians 60 . The robust E. muelleri genome and model system provide an excellent tool for performing this work. They also open the door to comparative analysis of the genomic changes required for the challenging process of adaptation to freshwater, and to finding out whether these changes have arisen convergently in disparate phyla.
Complemented by additional RNAseq, methylation data, and the analysis of symbiont content, the E. muelleri genome offers an important new opportunity for exploring the molecular toolkit, from protein coding to gene regulation, that underpinned the early evolution of animals and their diverse, complex, and successful traits.

Summary of Methods: Full details for all sections available in Supplementary File 1.
Tissue used for DNA sequencing was derived from a single clone collected as overwintering cysts (gemmules) from the Sooke Reservoir, at the head tank of the city of Victoria, British Columbia drinking water system. A voucher specimen is deposited with the Automated annotation of gene sequences was performed using DIAMOND BLASTx 63 against the nr and swissprot databases followed by functional annotation. Methylation studies were performed using the MethylC-seq protocol 64 . TADs and loops were identified using HOMER 65 . Orthogroup based analyses were performed primarily using Orthofinder2 66 , with iqtree, mafft and diamond BLAST options (described in detail in Supplementary File 1, S7).
RNAseq reads to the reference E. muelleri genome. edgeR 68

Ethics declarations
Competing interests: The authors declare no competing interests.