Sponges are an ancient group of animals that diverged from other metazoans over 600 million years ago. Here we present the draft genome sequence of Amphimedon queenslandica, a demosponge from the Great Barrier Reef, and show that it is remarkably similar to other animal genomes in content, structure and organization. Comparative analysis enabled by the sequencing of the sponge genome reveals genomic events linked to the origin and early evolution of animals, including the appearance, expansion and diversification of pan-metazoan transcription factor, signalling pathway and structural genes. This diverse ‘toolkit’ of genes correlates with critical aspects of all metazoan body plans, and comprises cell cycle control and growth, development, somatic- and germ-cell specification, cell adhesion, innate immunity and allorecognition. Notably, many of the genes associated with the emergence of animals are also implicated in cancer, which arises from defects in basic processes associated with metazoan multicellularity.
The emergence of multicellular animals from single-celled ancestors over 600 million years ago required the evolution of mechanisms for coordinating cell division, growth, specialization, adhesion and death. Dysfunction of these mechanisms drives diseases such as cancers, in which social controls on multicellularity fail, and autoimmune disorders, in which distinctions between self and non-self are disrupted. The hallmarks of metazoan multicellularity are therefore intimately related to those of cancer1 and immunity2.
Sponges have a critical role in the search for the origins of metazoan multicellular processes3, as they are generally recognized as the oldest surviving metazoan phyletic lineage. Although the kinship of sponges to other animals was recognized by the nineteenth century4, the absence of a gut and nervous system had relegated sponges to the ‘Parazoa’5, a grade below the ‘Eumetazoa’ or ‘true animals’ (that is, cnidarians, ctenophores and bilaterians)6. Nevertheless, sponges share key adhesion and signalling genes7,8,9,10,11 with eumetazoans, as well as other genes important in body plan patterning such as developmental transcription factors12,13,14,15; sponge embryos and larvae (Fig. 1) are readily comparable to those of other animals12,16. Sponges are diverse and their phylogeny is poorly resolved17,18,19, allowing for the possibility that sponges are paraphyletic20, which implies that other animals evolved from sponge-like ancestors.
Here we report on the genome of Amphimedon queenslandica, a haplosclerid demosponge, the adult organization and lifestyle of which is typical for sponges, feeding on microbes and particulate organic matter filtered by flagellated collar cells that resemble choanoflagellates. Although the diversity of sponges and their uncertain phylogeny make it doubtful that any single species can reveal the intricacies of early animal evolution, comparison of the A. queenslandica draft genome with sequences from other species can provide a conservative estimate of the genome of the common ancestor of all animals and the timing and nature of the genomic events that led to the origin and early evolution of animal lineages.
The A. queenslandica genome harbours an extensive repertoire of developmental signalling and transcription factor genes, indicating that the metazoan ancestor had a developmental ‘toolkit’ similar to that of modern complex bilaterians. The origins of many of these and other genes specific to animal processes such as cell adhesion, and social control of cell proliferation, death and differentiation can be traced to genomic events (gene birth, subfamily expansions, intron gain/loss, and so on) that occurred in the lineage that led to the metazoan ancestor, after animals diverged from their unicellular ‘cousins’. In addition to possessing a wide range of metazoan-specific genes, the Amphimedon draft genome is missing some genes that are conserved in other animals, indicative of gene origin and expansion in eumetazoans after their divergence from the demosponge lineage and/or gene loss in Amphimedon.
Genome sequencing and annotation
Amphimedon queenslandica is a hermaphroditic spermcast spawner, and cannot be readily inbred in the laboratory (Fig. 1a–c and Supplementary Note 1)21. Adult sponges also harbour many commensal microbes. To minimize allelic variation and microbial contamination we sequenced genomic DNA from multiple embryos and larvae from a single mother. This DNA contains four dominant parental haplotypes (∼3% polymorphism), although a single brood may have multiple fathers (Supplementary Notes 2.1 and 3). We used ∼9-fold whole-genome Sanger shotgun coverage to produce a ∼167-megabase-pair assembly that typically represents each locus once rather than splitting alleles (Supplementary Notes 2 and 3) and captures ∼97% of the protein-coding gene content (Supplementary Note 2.5). We also recovered an alpha-proteobacterial genome that is probably a vertically transmitted commensal microbe of Amphimedon embryos (Supplementary Note 2.7).
The assembled A. queenslandica genome encodes ∼30,000 predicted protein-coding loci (Supplementary Note 4). This is an overestimate of the true gene number owing to overprediction, unrecognized transposable elements and gene fragmentation at contig or scaffold boundaries. Nevertheless, 18,693 (63%) have identifiable homologues in other organisms in the Swiss-Prot database; there are no doubt novel or rapidly evolving sponge genes unknown in other species. CpG dinucleotides are depleted, and TpG and CpA dinucleotides augmented, relative to overall G+C composition, which is indicative of germline cytosine methylation in the Amphimedon genome. This is consistent with the presence of a DNMT3-related putative de novo methyltransferase as well as proteins with predicted methyl-CpG-binding domains.
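The dinucleotide depletion described above can be illustrated with an observed/expected ratio. The following is a minimal sketch, not the procedure used for the Amphimedon assembly: it assumes that the expected frequency of a dinucleotide is the product of its two mononucleotide frequencies, and both the function name dinucleotide_obs_exp and the toy sequence are hypothetical.

```python
from collections import Counter

def dinucleotide_obs_exp(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumption: the expected count of XY is freq(X) * freq(Y) times the
    number of dinucleotides scored. Ratios well below 1 for CG, together
    with ratios above 1 for TG and CA, are the classic signature of
    methylated cytosines being deaminated to thymine over time.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    di = Counter(
        a + b for a, b in zip(seq, seq[1:]) if a in "ACGT" and b in "ACGT"
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        return {}
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = mono[x] * mono[y] / n_mono**2 * n_di
            # A base that never occurs gives an undefined ratio.
            ratios[x + y] = di[x + y] / expected if expected > 0 else float("nan")
    return ratios

# Toy sequence (invented for illustration) with no CpG at all.
toy = "TGCATGCA" * 5
for pair in ("CG", "TG", "CA"):
    print(pair, round(dinucleotide_obs_exp(toy)[pair], 2))
```

On real data the same calculation would be run over the whole assembly or in windows, and ambiguous bases (N) are skipped here rather than counted.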
Analysis of the Amphimedon gene set reveals marked conservation of gene structure (intron phase and position) and genome organization (synteny) relative to other animals (Supplementary Notes 5 and 6). In Amphimedon, intragenic position and phase are retained for 84% of the introns inferred for the metazoan ancestor, comparable to the 76% and 88% retention in human and sea anemone, respectively22,23. The organization of genes shows conserved synteny (that is, conserved linkage without necessarily requiring colinearity) relative to other animals. In particular, 83 of the 153 longest Amphimedon scaffolds (those that contain genes from more than ten distinct metazoan gene families, sufficient for synteny to be assessed) show segments of conserved synteny with other animals (Supplementary Note 6). This indicates that portions of the 15 ancestral linkage groups inferred for the cnidarian–bilaterian ancestor22,24 were already in place in the demosponge–eumetazoan ancestor. No such conserved synteny was detected between animals and the choanoflagellate Monosiga brevicollis.
We addressed the controversial phyletic branching of early animal lineages by comparing sets of orthologous genes in A. queenslandica and a diverse sampling of 18 complete genomes (Supplementary Note 7). Our analyses support the grouping of placozoans, cnidarians and bilaterians into a eumetazoan clade, with demosponges as an earlier-branching lineage25, and reject the diploblast–triploblast phylogeny17 in favour of a more conventional ‘sponges first’ tree19,20 (Fig. 1d). In our discussion below we therefore refer to descendants of the placozoan–cnidarian–bilaterian last common ancestor as Eumetazoa, and reserve ‘Eumetazoa sensu stricto’ for the more limited clade defined by descendants of the cnidarian–bilaterian ancestor.
Our analysis emphasizes the quantitative divergence between metazoans and their closest living unicellular relatives. For example, 28% of the amino acid substitutions between humans and their last common ancestor with choanoflagellates occurred on the metazoan stem lineage (bold line in Fig. 1d), before the divergence of sponges from other animals. This pre-metazoan period can be crudely estimated to be ∼150–200 million years (Supplementary Note 7.6).
The zootype and origin of metazoan genes
With multiple animal genomes now in hand, we can extend the ‘zootype’ concept26 to include other shared derived genomic characteristics of animals. Out of 4,670 pan-metazoan gene families defined by clustering sponge and eumetazoan peptides, 1,286 (27%) seem to be metazoan-specific (see Supplementary Note 9.2). Similarly, there are eumetazoan, eumetazoan sensu stricto and bilaterian genomic synapomorphies, as well as sponge-specific gene families (for example, kinases, see Supplementary Note 8). Owing to residual incompleteness of the sponge genome draft, and possible gene losses in the Amphimedon lineage, this analysis provides a conservative estimate.
Nearly three-quarters of the 1,286 animal-specific gene families arose by gene duplication on the metazoan stem (Supplementary Note 9). These include the early duplication of transcription factor families such as homeodomains and basic helix–loop–helix domains13,14,27. Additional gene duplication and divergence in eumetazoans further increased transcription factor gene family number, which in general are 2 to 34 times larger in eumetazoans than in Amphimedon. In contrast, substantial diversification of kinase gene families occurred before the divergence of the sponge and eumetazoan lineages (see below)28. We can assess the role of tandem duplication in the creation of these families by seeking evidence for linkages among anciently diverged paralogues (Supplementary Note 10). A significant fraction remain linked (up to 30%, as found in Trichoplax, P < 0.0001, with lower levels in other contemporary metazoan genomes), indicating that many gene family expansions originally occurred as tandem or proximal duplications, and that these genomically local duplications have remained linked over time. This is consistent with the overall preservation of relict linkages observed here and in other basal metazoan genomes22,24,25.
We find 235 animal-specific protein domains and 769 animal-specific domain combinations that evolved along the metazoan stem (Supplementary Note 9). Additionally, lineage-specific changes to these animal domain architectures occurred in early metazoan evolution16,29,30. For example, new combinations of domains in death-fold domain proteins and laminins possibly allow for the modification of protein interactions and pathways involved in programmed cell death and cell adhesion, respectively (Supplementary Note 9.3), and the co-option of sponge-, eumetazoan- or bilaterian-specific architectures into novel functions.
The 705 Amphimedon kinases represent the largest reported metazoan kinome, and include members of >70% of human kinase classes (compared with 59% in choanoflagellate, 83% in sea anemone, 70% in Caenorhabditis elegans and 77% in fruitfly; see Supplementary Note 8.7). Amphimedon has single copies of most metazoan kinase classes, but has several expansions of over 50 genes per class. The largest expansions are in the tyrosine kinase and tyrosine-kinase-like groups, and include over 150 likely receptor tyrosine kinases (RTKs). Unlike Monosiga, where RTKs could not be classified into metazoan families28, Amphimedon has kinase domains from six known animal families (epidermal growth factor receptor (EGFR), Met, discoidin domain receptor (DDR), regeneron orphan receptor (ROR), Eph and Sevenless). The EGFR and some Eph extracellular domain architectures are as in their eumetazoan counterparts, but many other RTKs have unique extracellular domains. For instance, DDRs have immunoglobulin repeats, and sushi domains are found in some members of the expanded Eph and Met families. This indicates that the activating ligands, presumably found largely in the external environment, may be distinct from those of eumetazoans.
Six hallmarks of animal multicellularity
The A. queenslandica genome allows us to assess systematically the origin of the six hallmarks of metazoan multicellularity: (1) regulated cell cycling and growth; (2) programmed cell death; (3) cell–cell and cell–matrix adhesion; (4) developmental signalling and gene regulation; (5) allorecognition and innate immunity; and (6) specialization of cell types. These cardinal features of metazoan multicellularity have their origins on the metazoan stem and often are the result of metazoan gene novelties combining with more ancient factors. A recurring theme is the overlap of these core ‘multicellularity’ genes with genes perturbed in cancer, a disease of aberrant multicellularity (see oncogenes and tumour suppressors in Figs 2 and 3).
Regulated cell cycling and growth
Although the core machinery of the animal cell cycle traces back to early eukaryotes (Fig. 2a and Supplementary Note 8.2), some critical metazoan regulatory mechanisms emerged more recently. For example, whereas the p53/p63/p73 tumour suppressor family is holozoan-specific31, the HIPK kinase that phosphorylates p53 in the presence of DNA breaks is metazoan-specific, and the MDM2 ubiquitin ligase that regulates p53 appears as a eumetazoan feature. Thus, the p53-mediated response to DNA damage may have emerged before the divergence of eumetazoans. The Myc oncogene illustrates how intramolecular regulation has also evolved. Although Amphimedon shares the four-amino-acid N-terminal DCMW motif present in other animal Myc proteins, this motif is missing in the Myc orthologue found in the unicellular Monosiga31. Because mutation of this motif disrupts Myc function in vertebrates, it may have an important role in all animals.
Tumour suppressors encoded by two classes of cyclin-dependent kinase (CDK) inhibitors mediate growth-factor-dependent regulation of the cell cycle. Although the INK4/CDKN2 class (p15/p16/p18/p19) regulates the eumetazoan-specific CDK4/6-cyclin D kinase and is chordate-specific, the Cip/Kip/CDKN1 class (p21/p27/p57) is more general, regulating many CDKs, and seems to have arisen on the eumetazoan stem. In bilaterians, Cip/Kip genes integrate external growth signals, and are regulated transcriptionally and post-transcriptionally by the major growth pathways (see below). The emergence of this class of CDK inhibitors on the eumetazoan stem suggests a central regulatory role even in early animals.
Although cell growth and cell division are tightly coupled in unicellular species, they can be separately regulated in multicellular organisms. In bilaterians, growth is regulated by six major signalling pathways (RTK signalling via Ras, insulin signalling via the phosphatidylinositol-3-OH kinase (PI(3)K) pathway, Rheb/Tor, cytokine-JAK/STAT, Warts/Hippo, and the Myc oncogene) that also modulate the cell cycle (Supplementary Note 8.2). Whereas the Rheb/Tor pathway dates back to early eukaryotes, the other pathways contain several genes that are holozoan and metazoan innovations. For example, the insulin receptor substrate and phosphotyrosine binding proteins GAB1/GAB2 emerged on the metazoan stem after the divergence of choanoflagellates, indicating that an insulin-signalling-like pathway may have been a key regulator of growth in early animals by tying into the ancient PDK1 and Akt kinases (Fig. 2b). However, because p21, p27 and MDM2 are all eumetazoan novelties, this pathway may not have acquired the ability to regulate cell proliferation until after the divergence of sponges from eumetazoans.
Programmed cell death
In contrast to the cell cycle machinery, most of the apoptotic circuitry is unique to animals, increasing in complexity along metazoan, eumetazoan and bilaterian stems (Fig. 2c and Supplementary Note 8.3). Both intrinsic and extrinsic programmed cell death pathways require caspases, a metazoan-specific family of cysteine aspartyl proteases. Amphimedon encodes initiator caspases with the characteristic caspase recruitment and death effector domains, as well as an expanded repertoire of effector caspases.
The intrinsic pathway drives cell death by permeabilization of the outer mitochondrial membrane and is regulated by the Bcl-2 oncogene family of pro- and antiapoptotic factors. The pro-apoptotic protein Bak arose in the metazoan lineage, whereas Bax and Bok seem to be eumetazoan-specific. Bcl-2/Bcl-X are antiapoptotic and metazoan-specific. Mitochondrial permeabilization releases proteins of varying evolutionary origin, including the ancient apoptosis-inducing factor (AIF) that contributes to caspase-independent apoptosis, metazoan-specific apoptotic protease activating factor 1 (Apaf-1), and eumetazoan sensu stricto-specific caspase-activated DNase (CAD) and its regulator ICAD.
The extrinsic apoptotic pathway is activated by external signals through transmembrane tumour necrosis factor receptors (TNFRs) whose intracellular death domain interacts with downstream adaptors. Amphimedon encodes a nerve growth factor receptor (NGFR) p75-like protein, although it lacks the crucial death domain that is seen in Nematostella and bilaterians (see ref. 32); other death TNFRs (that is, Fas, DR4, DR5 and TNFR1) are vertebrate-specific32,33. Because the intrinsic cascade is composed of components that pre-date metazoans, it is likely to be the original mechanism for inducing apoptosis.
Cell–cell and cell–matrix adhesion
The diagnostic domains of two major cell–cell adhesion superfamilies, the cadherins and the immunoglobulins, are present in Monosiga within the extracellular region of putative transmembrane proteins31,34 (Supplementary Note 8.8). Amphimedon cadherins differ from those of Monosiga in having proteins with domain architectures diagnostic for the metazoan-specific classical cadherin and seven pass transmembrane cadherin subfamilies31,35. A considerable expansion of immunoglobulin-like domain-containing proteins occurred on the metazoan stem, with 218 predicted in Amphimedon versus 5 in Monosiga31. The combination of N-terminal immunoglobulin domains with C-terminal FN3 repeats is found only in metazoans.
Similarly, metazoan extracellular matrix (ECM) proteins use domains that evolved on the holozoan stem. For example, Monosiga encodes proteins with collagen triple helix repeats and other genes with fibrillar collagen C-terminal domains, but these domains only appear together in metazoans30,31. Thrombospondin domain architectures are found in Amphimedon; however, agrin, netrin and perlecan seem to be eumetazoan innovations. The extracellular matrix receptors, α and β integrin (Int), are present in Amphimedon and other metazoans, but absent from the Monosiga and the other non-metazoan eukaryotic genomes we considered (Fig. 3a; see note added in proof).
Developmental signalling and transcription
Components of the major metazoan developmental signalling pathways, as well as classes of developmental transcription factors, are mostly present in Amphimedon and absent from Monosiga and other non-metazoan genomes13,14,16,27,29, suggesting that ontogenetic development, including primary germ cell formation (Supplementary Note 8.4), originated on the metazoan stem3,11,12. Although Amphimedon possesses a characteristically metazoan repertoire of transcription factor families (Supplementary Note 8.6)13,14,27,31, in general these families are further expanded in eumetazoans13. Some differences between sponges and eumetazoans correlate with morphological complexity. For example, sponges do not seem to have a mesoderm and accordingly Amphimedon lacks transcription factors involved in mesoderm development (Fkh, Gsc, Twist, Snail). In contrast, sponges possess several transcription factors involved in determination or differentiation of muscles and nerves despite lacking a neuromuscular system (PaxB, Lhx genes, SoxB, Msx, Mef2, Irx and bHLH neurogenic factors)13,14,27. Amphimedon lacks Hox genes and some other transcription factor subfamilies that are involved in specifying and patterning bilaterian nervous systems and body plans13,14,27,36,37.
Signalling cascades, such as the Wnt, TGF-β, Notch and Hedgehog pathways, pattern embryos by specifying cellular identity and coordinating morphogenetic events. The ligands and receptors of all of these cascades are metazoan innovations at the cell surface (Supplementary Note 8.5), except the eumetazoan sensu stricto-specific Hedgehog ligand29. The transcription factors specific to these pathways are also metazoan-specific (Tcf/Lef, Smads, CSL, Gli), whereas the cytosolic signal transducers generally have more ancient origins. This pattern suggests that these pathways arose by the engagement of novel ligands and receptors with already active signalling mechanisms, enabling multicellular communication.
Amphimedon also has fewer ligands and receptors in each pathway compared to eumetazoans (three Wnt and two Fzd, eight TGF-β ligands and five TGF-β receptors, one Notch and five Deltas) (Supplementary Note 8.5), as observed for many transcription factor families. In contrast to transcription factors13,14,27, however, these proteins generally cannot be assigned to eumetazoan subfamilies or are obvious recent sponge-specific duplications. This lack of phylogenetic resolution may reflect a period of rapid evolution and diversification of ligand/receptor molecules in sponge and eumetazoan lineages. Perhaps as a consequence, the inhibitors that interact with ligands and receptors to modulate pathway activity also appear to be lineage-specific. In particular, inhibitors described from bilaterians were not found in Amphimedon (for example, Chordin, Numb, I-Smads, Wif).
Allorecognition and innate immunity
The transition to multicellularity was accompanied by mechanisms to defend against invading pathogens and to prevent the fusion of genetically distinct conspecifics2. Although some metazoan immunity genes originated early in eukaryotic evolution, many are restricted to animals, as illustrated by the signalling cascades shared by the Toll-like receptor (TLR) and the interleukin-1 receptor (IL-1R) (Supplementary Note 8.10). An ancestral form belonging to this receptor superfamily was probably present in the last common metazoan ancestor and independently diversified in poriferan and cnidarian lineages. Nuclear factor κB (NF-κB), Tollip and ECSIT genes are present in holozoans; however, most TLR/IL-1R pathway proteins are composed of either metazoan-specific domains (for example, Pellino) or metazoan-specific architectures (for example, the death domain with TIR and protein kinase domains in MyD88 and IRAKs, respectively). Immune effector systems also consist largely of metazoan innovations, such as the macrophage-expressed gene 1 (MPEG1) that participates directly in pathogen elimination38. Likewise, all animals share specific antiviral defence factors such as MDA5-like RNA helicases and interferon regulatory factor-like proteins, although other systems (for example, RNAi) have more ancient origins39. A primordial complement pathway appears to have evolved exclusively on the eumetazoan sensu stricto stem and further diversified in bilaterians40.
Amphimedon and other demosponges encode unique extracellular Calx-β domain-containing proteoglycans called aggregation factors, which promote cell adhesion and may also be involved in allorecognition41. The presence of a cluster of aggregation-factor-related genes in the Amphimedon genome indicates that allorecognition could be under the control of a multigene family.
Specialized cell types
Sponge cells adhere to form tissue-like layers, but a true epithelial cell layer, characterized by aligned cell polarity, belt-form junctions and underlying basal lamina, is thought to be a eumetazoan innovation. Amphimedon possesses all the main components of the Par, Crumbs and Discs Large (Dlg) complexes, a set of interacting proteins that are largely metazoan-specific and determine polarity in epithelial cells (Fig. 3a and Supplementary Note 8.8). The main proteins comprising bilaterian spot-form and zonula adherens junctions are also present in Amphimedon and appear to be metazoan-specific34,42. By contrast, septate junction and basal lamina proteins appear to be largely eumetazoan innovations (Fig. 3a); Amphimedon does possess several genes with laminin-like domain architectures (Supplementary Note 9.3).
Sensory systems and the neuron
Sponges can sense and respond to their environment, although nerve cells seem to be restricted to eumetazoans sensu stricto43,44. However, the expression of orthologues of post-synaptic structural and proneural regulatory proteins in Amphimedon larval globular cells suggests an evolutionary connection with an ancestral protoneuron36,42. Amphimedon possesses homologues of bilaterian proteins involved in nervous system development (for example, elav- and musashi-like RNA-binding proteins, neural transcription factors), pre- and post-synaptic organization (for example, Discs large)42, endogenous and exogenous signalling (for example, G-protein-coupled receptors (GPCRs)), and neuroendocrine secretion, although bilaterian peptide hormones are not detected (Supplementary Note 8.9). Some key synaptic genes are conspicuously missing from Amphimedon (Fig. 3b and Supplementary Note 8.9), including the ionotropic glutamate receptor family42, whereas neuronal-type metabotropic glutamate, dopamine and serotonin receptors are present. Amphimedon has a homologue of the ephrin receptor, an axon guidance protein, although the ephrin ligand and developmental genes involved in axon guidance (for example, slit, netrin, unc-5 and robo) are not present. Amphimedon also possesses over 200 GPCRs, which includes a large lineage-specific expansion of rhodopsin-related GPCRs (Rh-GPCRs) that are encoded largely by clusters of single exon genes as observed in other metazoans (Supplementary Note 8.9). From these observations we infer that the metazoan ancestor possessed a complex sensory system, and many of the molecular requirements for neural development and nerve cell function. This suggests that exaptation was critical for the genesis of the first nerve cell, with eumetazoan-specific gene innovations providing the regulatory and structural requirements to connect these protoneural components into a functional neuron (Fig. 3b).
Molecular correlates of morphological complexity
With a diverse sample of genomes in hand, we sought differences in gene repertoire that are associated with gross morphological complexity. Figure 4 shows molecular function categories that are significantly enriched (P < 1 × 10⁻¹⁰) in one or more metazoan complexity group, with the relative frequencies of genes with these functions in each species shown by colour code. Here we have defined broad groupings representing three grades of morphological complexity, guided by the number of described cell types45, including non-bilaterian (or ‘basal’) metazoans (Nematostella, Trichoplax, Amphimedon; ∼5–15 cell types), invertebrate bilaterians (Drosophila, C. elegans, sea urchin; ∼50–100 cell types), and vertebrates (∼225 cell types, represented by the human genome), with a selection of non-animals as an outgroup (Supplementary Note 11). Similarly, using a principal component analysis, we also identified suites of molecular functions that are associated with complexity (Supplementary Figure 11.2). The first component differentiates between metazoans and non-metazoans; the second component partly differentiates between metazoan complexity groups.
Included among the functional categories that correlate with increase in metazoan morphological complexity are (Fig. 4 and Supplementary Table 11.1.1): GPCRs, ion channels, cell adhesion proteins, and defence and immunity proteins, which are enriched in basal metazoans relative to non-animals; homeobox transcription factors and gap junction proteins, which are enriched in bilaterians relative to non-bilaterian animals; and immunoglobulin receptor family members, immunoglobulins, MHC antigens, and cytokine receptors, which are enriched in vertebrates relative to invertebrate bilaterians. These broad associations with complexity are evidently superimposed on notable lineage-specific variation as seen in Fig. 4 (for example, serine protease gene loss in C. elegans, and voltage-gated ion channel expansion in Paramecium). Similar functional categories contribute to principal components (Supplementary Table 11.2.1).
The Amphimedon genome, combined with recently sequenced genomes of diverse invertebrates and a choanoflagellate, identifies innovations that underlie the emergence and early diversification of the Metazoa. These genomic comparisons reconstruct a common animal ancestor of remarkable complexity. Metazoans can now be defined by a long list of genomic synapomorphies—gene content, intron–exon structure and syntenies—as well as characteristics common to all animal life such as sex, development, controlled cellular proliferation, differentiation and growth, and immunity. To what extent the ancestral functioning of this gene set is reflected in modern poriferans is unclear, although studies of both sponge development, which yields a highly patterned larva with axial polarity12, and sponge immunity provide points of direct comparison with the eumetazoan condition.
Whereas the eumetazoan lineage produced a wide diversity of body forms, the sponge body plan has been stable for over 600 million years. What can explain this disparity in evolved morphological complexity? Although we have seen that sponges and eumetazoans share many common pathways related to morphogenesis and cell-type specification, there are notable genomic differences, including different microRNA assemblages46, lineage-specific domains and domain architectures, and the differential expansions of gene families. Although there has been minimal characterization of cis-regulatory architectures in non-bilaterians, we note that as most classes of bilaterian transcription factors are also present in sponges, cnidarians and placozoans, it may be that quantitative rather than qualitative differences in cis-regulatory mechanisms were needed to produce more diverse body plans.
The sexually-reproducing, heterotrophic metazoan ancestor had the capacity to sense, respond to, and exploit the surrounding environment while maintaining multicellular homeostasis. Although sponges lack some of the cell types found in eumetazoans, including neurons and muscles, they share with all other animals genes that are essential for the form and function of integrated multicellular organisms. With these genomic innovations enabling the regulation of cellular proliferation, death, differentiation and cohesion, metazoans transcended their microbial ancestry.
Note added in proof: After we completed our analysis, integrins and other cell-adhesion-related genes were discovered outside Metazoa47. The presumed earlier origin of integrins has been incorporated in Fig. 3a.
A detailed description of methods used in this study can be found in the Supplementary Information.
Genomic DNA was sheared and cloned into plasmid and fosmid vectors for whole-genome shotgun sequencing as described49. The data were assembled using a custom approach described in the Supplementary Information. The Amphimedon 9X assembly and the preliminary data analysis have been deposited at DDBJ/EMBL/GenBank as project accession ACUQ00000000.
Gene prediction and annotation
Protein-coding genes were annotated using homology-based methods (Augustus50, Genomescan51) and one ab initio method (SNAP52). Protein-coding gene predictions can be accessed from http://www.metazome.net/amphimedon.
Phylogenetic analyses were conducted using Bayesian inference and maximum likelihood with bootstrapping, using MrBayes55,56 and PHYML57, respectively. Alternative likelihood topologies were tested using TREEPUZZLE58 and CONSEL59. Bayesian analyses using site-heterogeneous models were done using aamodel (J. Huelsenbeck, unpublished) and PhyloBayes60,61.
Identification of Amphimedon orthologues of specific bilaterian genes
Putative orthologues of genes involved in various processes in bilaterians were identified by reciprocal BLAST of human, mouse, or Drosophila genes against the Amphimedon gene models (blastp) or the assembly (tblastn). PFAM62 domain composition, assignment of PANTHER HMMs63,64 and phylogenetic trees were used to determine orthology. Trees were built using the neighbour-joining method in Phylip65 with one-hundred bootstrap replicates.
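The reciprocal search step can be sketched as a reciprocal-best-hit filter over two tables of BLAST hits. This is only an illustration of the general technique under stated assumptions: hits are assumed to be already parsed into (query, subject, bit score) tuples, the gene identifiers are invented, and the domain-composition and tree-based checks described above are not represented.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples, for example
    parsed from tabular BLAST output (the parsing is not shown here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward, reverse):
    """Return sorted (query, subject) pairs that are each other's best hit.

    forward: hits from the reference proteins searched against the target.
    reverse: hits from the target proteins searched against the reference.
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)

# Hypothetical identifiers and scores, for illustration only.
forward = [
    ("humanA", "amqX", 200.0),
    ("humanA", "amqY", 150.0),
    ("humanB", "amqY", 90.0),
]
reverse = [
    ("amqX", "humanA", 190.0),
    ("amqY", "humanA", 140.0),
]
print(reciprocal_best_hits(forward, reverse))
```

Here humanB is dropped because its best hit, amqY, points back to humanA rather than to humanB. Pairs that pass this filter remain candidates only; orthology still depends on the domain and phylogenetic evidence.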
Molecular function enrichments and correlation of complexity
Metazoan gene families were assigned molecular functions using PANTHER63 annotations. Fisher’s exact test as implemented in R66 was run to test for enrichment or depletion of numbers of gene families for each molecular function category in the novel versus ancestral gene sets. Numbers of genes (not gene families) for various molecular function categories were tested for enrichment between different pairs of four eukaryotic complexity groups (vertebrate, non-vertebrate bilaterian, basal metazoan, non-animal) to identify molecular function families that correlate with the differences in complexity. Principal components analysis was used to identify the contribution of each molecular function category to a eukaryotic complexity group.
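The enrichment test on a single molecular function category can be illustrated with a 2 × 2 contingency table. The analysis above was run in R; the sketch below is a pure-Python stand-in that computes only the one-sided (enrichment) P value from the hypergeometric distribution, whereas R's fisher.test is two-sided by default. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test P value for enrichment.

    2 x 2 table of gene-family counts:

                   in category   not in category
      novel             a               b
      ancestral         c               d

    Returns the probability of seeing at least `a` novel families in the
    category when the row and column totals are held fixed.
    """
    row1 = a + b          # all novel families
    col1 = a + c          # all families in the category
    n = a + b + c + d     # all families
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)

# Hypothetical counts, for illustration only.
print(fisher_enrichment_p(3, 1, 1, 3))
```

Testing for depletion would sum the lower tail instead, and running the test across many categories calls for a multiple-testing correction or a stringent threshold.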
The genome sequence data can be accessed from DDBJ/EMBL/GenBank as project accession ACUQ00000000.
This study was supported by funds from the Australian Research Council (B.M.D., Maj.A.), US Department of Energy Joint Genome Institute (B.M.D., D.S.R., S.P.L.), Harvey Karp (K.S.K.), NSF (T.H.O.), NIH/NHGRI (G.M.), University of Queensland Postdoctoral Fellowship (Maj.A., S.F.C.), Sars International Centre for Marine Molecular Biology (Maj.A.), DFG (M.St.), ANR (M.V.), CNRS (M.V.), Gordon and Betty Moore Foundation (D.S.R.) and Richard Melmon (D.S.R.). We thank J. Huelsenbeck and I. Hariharan for help with phylogenetic analyses and growth pathways, respectively. The work conducted by the US Department of Energy Joint Genome Institute was supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231.
List of Amphimedon queenslandica kinases classified into known kinase families.
List of Amphimedon queenslandica GPCR genes classified into adhesion/secretin, glutamate, and rhodopsin-like families.
List of gene clusters that appear to be novel to metazoans, eumetazoans, eumetazoans sensu stricto, and bilaterians.
List of PFAM domains and domain architectures that appear to be novel to metazoans, eumetazoans, eumetazoans sensu stricto, and bilaterians.
List of molecular function categories that are enriched or depleted in novel metazoan genes.
List of molecular function categories that are enriched in animal complexity group comparisons.
List of molecular function categories that are depleted in animal complexity group comparisons.
Molecular function principal components that explain differences between animal complexity groups.