Introduction

Scleractinian corals serve the critical ecological role of building reefs that provide billions of dollars annually in goods and services1 and sustain high levels of biodiversity2. However, corals are declining rapidly as ocean acidification impairs coral calcification and interferes with metabolism3, ocean warming disrupts their symbiosis with photosynthetic dinoflagellates (family Symbiodiniaceae)4, and outbreaks of coral disease lead to mortality5. As basal metazoans, corals provide a model for studying the evolution of biomineralization6, symbiosis7, and immunity8,9 - key traits which mediate ecological responses to these stressors. Understanding the genomic architecture of these traits is therefore critical to understanding corals’ success over evolutionary time10 and under future environmental scenarios. In particular, there is great interest in whether corals possess the genes and genetic variation required to acclimatize and/or adapt to rapid climate change11,12,13. Addressing these questions relies on the growing genomic resources available for corals, and establishes a fundamental role for comparative genomic analysis in these organisms.

Genomic resources for corals have expanded rapidly in recent years, with genomic or transcriptomic information now available for at least 20 coral species10. Comparative genomics in corals has identified genes important in biomineralization, symbiosis, and environmental stress response10, and highlighted the evolution of specific immune gene repertoires in corals14,15. However, complete genome sequences have only been analyzed and compared for two coral species, Acropora digitifera16 and Stylophora pistillata17, revealing extensive differences in genomic architecture and content. Therefore, additional complete coral genomes and more comprehensive comparative analysis may be transformative in our understanding of the genomic content and evolutionary history of reef-building corals, as well as the importance of specific gene repertoires and diversification within coral lineages.

Here, we present the genome of Pocillopora damicornis, one of the most abundant and widespread reef-building corals in the world18. This ecologically important coral is a model species and is commonly used in experimental biology and physiology. It is also the subject of a large body of research on speciation19,20,21, population genetics22,23,24,25, symbiosis ecology26,27,28, and reproduction29,30,31. Consequently, the P. damicornis genome sequence advances a number of fields in biology, ecology, and evolution, and provides a direct foundation for future studies in transcriptomics, population genomics, and functional genomics of corals.

Using the P. damicornis genome and other publicly available genomes of cnidarians and basal metazoans, we performed a comparative genomic analysis within the Scleractinia. Using this analysis, we address the following critical questions: (1) which genes are specific to or diversified within the scleractinian lineage, (2) which genes are specific to or diversified within individual scleractinian coral species, and (3) which features distinguish the P. damicornis genome from those of other corals. We address these questions based on orthology of protein-coding genes, which generalizes the approaches taken by Bhattacharya et al.10 and Voolstra et al.17 to a larger set of complete genomes to describe both shared and unique adaptations in the Scleractinia. In comparing these genomes, we reveal prominent diversification and expansion of immune-related genes, demonstrating that immune pathways are the subject of diverse evolutionary adaptations in corals.

Results and Discussion

P. damicornis genome assembly and annotation

The estimated genome size of P. damicornis is 349 Mb, smaller than other scleractinian genomes analyzed to date (Table 1). The final assembly produced here was 234 Mb and likely lacks high-identity repetitive content (estimated ~25% of the genome based on 31-mers) that could not be assembled. Total non-repetitive 31-mer content was estimated at 262 Mb and the sum of contigs was 226 Mb, indicating that up to 14% of non-repetitive content may also be missing from the assembly, likely due to high heterozygosity (Dovetail Genomics, personal communication). However, the assembly comprises 96.3% contiguous sequence, and has the highest contig N50 (28.5 kb) of any cnidarian genome assembly (Table 1). We identified 26,077 gene models, which is consistent with the gene content of other scleractinian and cnidarian genomes (Table 1). Among these genes, 59.7% had identifiable homologs (E-value ≤ 10⁻⁵) in the SwissProt database, 73% contained identifiable homologs in at least one of the other 10 genomes, and 83.7% contained protein domains annotated by InterProScan. Genome completeness was evaluated using BUSCO, which found that 88.4% of metazoan single-copy orthologs were present and complete (0.5% were duplicated), 2.9% were present but fragmented, and 8.7% were missing. Together, these statistics indicate the P. damicornis genome assembly is of high quality and mostly complete (Table 1).
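As a rough illustration of two of the statistics quoted above, the following sketch computes an N50 from a list of sequence lengths and the fraction of expected sequence missing from an assembly. The function names and the toy contig lengths are hypothetical; only the 262 Mb and 226 Mb values come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_missing(expected_mb, assembled_mb):
    """Fraction of the expected sequence that is absent from the assembly."""
    return (expected_mb - assembled_mb) / expected_mb


# Toy contig lengths (hypothetical, not taken from the P. damicornis assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70

# Values quoted in the text: 262 Mb of non-repetitive 31-mer content
# versus 226 Mb of assembled contigs.
print(round(fraction_missing(262, 226), 3))  # 0.137
```

The second value, about 13.7%, corresponds to the "up to 14%" of non-repetitive content described as potentially missing.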

Table 1 Assembly and annotation statistics for the P. damicornis genome (Pdam) and others used for comparative analysis. (Spis = Stylophora pistillata17; Adig = Acropora digitifera16; Ofav = Orbicella faveolata100; Disc = Discosoma spp.101; Afen = Amplexidiscus fenestrafer101; Aipt = Aiptasia85; Nema = Nematostella vectensis102; Hydr = Hydra vulgaris103; Mlei = Mnemiopsis leidyi104; Aque = Amphimedon queenslandica105). *Re-annotated using present pipeline (github.com/jrcunning/ofav-genome).

Genomic feature frequency phylogeny

Feature frequency profiling shows the phylogenetic relationships among the genomes analyzed here (Fig. 1). This genome-scale analysis resolves the Complexa (A. digitifera) and Robusta branches (P. damicornis, S. pistillata, O. faveolata) of the scleractinians as a monophyletic sister clade to the corallimorpharians, lending further support to the conclusions of Lin et al.32 that corallimorpharians are not ‘naked corals’.

Figure 1

Genome phylogeny. Feature frequency profiling of protein-coding gene models produces a genome scale phylogeny that supports a monophyletic scleractinian clade (blue) with corallimorpharians as a sister clade (red). Bootstrap support from 100 pseudoreplicates was 100% at every node.
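To make the idea behind feature frequency profiling concrete, the sketch below builds a normalized k-mer ("feature") profile for each set of protein sequences and compares two profiles with the Jensen-Shannon divergence. This is a simplified illustration, not the FFP program itself: it assumes Jensen-Shannon divergence as the distance, uses features of length 2 rather than the length 8 reported in the Methods so that the toy data stay readable, and the sequences are invented.

```python
from collections import Counter
from math import log2


def feature_profile(sequences, k):
    """Relative frequencies of all overlapping length-k features (k-mers)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no features of length k found")
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pi, qi = p.get(feature, 0.0), q.get(feature, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Invented toy "proteomes"; real input would be all predicted proteins per genome.
genome_a = feature_profile(["MKTAYIAKQR", "MKTAYLL"], k=2)
genome_b = feature_profile(["MKTAYIAKQR", "MKTAYLV"], k=2)
genome_c = feature_profile(["GGSGGSGGSG"], k=2)

# Similar proteomes give a small divergence; proteomes with no shared
# features give the maximum divergence.
print(jensen_shannon(genome_a, genome_b) < jensen_shannon(genome_a, genome_c))
```

A matrix of such pairwise divergences can then be passed to a distance-based tree-building method to obtain a phylogeny like the one in Fig. 1.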

Scleractinian gene content and core function

Gene families were identified by ortholog clustering across the 11 genomes in Table 1 (Supplementary Data S1). Across all four scleractinian genomes, we identified 43,580 ortholog groups ranging in size from 1 to 566 genes, with 14,653 of these gene families present in more than one coral species. That only a third of ortholog groups occurred in multiple genomes suggests high divergence among scleractinians, consistent with the findings of Voolstra et al.17. The highest number of shared ortholog groups occurred between P. damicornis and S. pistillata, the two most closely related species, and a dendrogram based on shared gene content33 reproduces the evolutionary relationships among the four corals34 (Fig. 2). Although gene content in O. faveolata is more similar overall to the other robust corals than the complex A. digitifera, O. faveolata also has the highest number of species-specific gene families, which may reflect its large genome size and/or adaptations to the Atlantic Ocean.

Figure 2

Species-specific and shared gene families across four scleractinian genomes. Numbers indicate gene families, including both single-copy genes and multi-copy gene families. Dendrogram is based on shared gene content, following ref. 33.

A total of 7,536 ortholog groups were found in all four scleractinian genomes, constituting putative coral ‘core’ genes. Members of these core gene families comprised 46.6% of all P. damicornis genes, and functional profiling of the core genome revealed significant enrichment of 44 GO terms associated with basic cellular and metabolic functions, including nucleic acid synthesis and processing, cellular signaling and transport, and lipid, carbohydrate, and protein metabolism (Supplementary Data S2). This basic functionality explains why >30% of these gene families were also found in all other cnidarians, and 96.3% had orthologs in at least one non-coral. This is consistent with the identification of basic housekeeping functions in the core (shared) protein sets in other comparative studies10,17.

Genes specific to and diversified in scleractinians are enriched for immune functionality

A subset of the coral core gene families (n = 278; 3.7%) had no orthologs outside the Scleractinia, suggesting they may reflect important evolutionary innovations within this group35. (We refer to these genes as ‘coral-specific’ since they did not have orthologs in the non-scleractinian genomes analyzed here, yet these may still show sequence or domain similarity with genes in other organisms and in reference databases.) Other gene families (n = 21) were significantly larger in scleractinians than other anthozoans (Fisher’s exact test, p < 0.01; Fig. 3, Supplementary Table S1), indicating gene family expansion that may underlie adaptation to the scleractinian condition36,37. Genes belonging to both coral-specific and coral-diversified families in P. damicornis were enriched for GO terms involved in cellular signaling and immunity (Table 2), and showed significant similarity to proteins with known immune function (Fig. 3, Supplementary Table S1).

Figure 3

Heatmap showing gene ortholog groups that were larger in scleractinians compared to other cnidarians (Pd = P. damicornis, Sp = S. pistillata, Of = O. faveolata, Ad = A. digitifera, Nv = N. vectensis, Ap = A. pallida, Ds = Discosoma sp., Af = A. fenestrafer). For each ortholog group, the longest protein sequence from P. damicornis was compared to the UniProt-SwissProt database using blastp, and the top hit was selected based on the lowest E-value (if < 1e-10). UniProt accession numbers are shown in brackets. Sequences with no annotation had no hits to the SwissProt database with E < 1e-10. Gene family sizes and E-values for SwissProt hits can be found in Supplementary Table S1.

Table 2 Enrichment of GO terms in the coral-specific and coral-diversified genes in P. damicornis. Of the coral-specific genes (n = 349), 184 (53%) had GO annotations. Of the coral-diversified genes (n = 339), 229 (68%) had GO annotations.

Immune-related GO terms that were significantly enriched in the coral-specific gene set included viral defense, signal transduction, and NF-κB pathway regulation (Table 2). NF-κB signaling plays a central role in innate immunity38,39, and was recently demonstrated to be conserved and responsive to immune challenge in the coral O. faveolata40. Signal transduction was associated with 32 coral-specific genes that showed significant similarity to dopamine receptors, neuropeptide receptors, G-protein coupled receptors, and tumor necrosis factor (TNF) receptor-associated factors (TRAFs) (Supplementary Data S3), potentially representing other coral-specific immune pathways. Indeed, the TNF receptor superfamily is more diverse in corals than in any organism described thus far other than choanoflagellates9,41, with 40 proteins in A. digitifera. The P. damicornis genome contained 39 proteins with TNFR cysteine-rich domains, suggesting that diversification of this repertoire may be a common feature of corals. Another enriched GO term in the coral-specific gene set was caveola assembly (the formation of structures in cell membranes that anchor transmembrane proteins), which may also play a role in signal transduction and immunity42.

The coral-diversified gene families (Fig. 3, Supplementary Table S1) showed high similarity to receptors for pathogen recognition, such as a C-type lectin, G-protein-coupled receptors (GPCRs), and both Notch and Wnt-signaling receptors (lipoprotein receptor-related protein). Notch and Wnt signaling are critical developmental gene pathways with potentially diverse roles in coral biology, but which also have a role in coral innate immunity43, particularly in wound-healing processes39,44. Other coral-diversified genes were similar to Ras-related proteins with leucine-rich repeats, and a tetratricopeptide repeat-containing protein, which may play roles in signal transduction45. Many of these tetratricopeptide repeat proteins also contained a CHAT domain characteristic of caspases46, indicating a potential role in apoptotic signaling and/or coral bleaching. Other coral-diversified genes were similar to poly(ADP-ribose) polymerase, which may act as an anti-apoptotic signal transducer47, and lactadherin, which may be involved in phagocytosis and clearance of apoptotic cells48. Genes previously found to be differentially expressed in corals under stress or immune challenge were also found in the coral-diversified gene set, including the HSP70 co-chaperone sacsin49, the oligopeptide transporter solute carrier family 15 (ref. 50), and NFX1-type zinc finger protein51. Together, these results suggest that corals as a group have evolved a diverse set of immune signaling genes for interacting with and responding to pathogens and the environment. Importantly, the immune repertoire in corals also contains many other important gene families that are not discussed here (e.g., Toll-like receptors)39,40, since we focus only on those that are specific to or diversified in corals.

In addition to defending against pathogens, many of the immune pathways highlighted here may mediate the establishment and maintenance of symbiosis, including with beneficial bacteria and the Symbiodiniaceae15. Indeed, lectins and other pattern recognition receptors have previously been shown to regulate symbiont uptake and specificity7,52,53, while caspases have previously been shown to mediate bleaching and symbiont removal through apoptosis54. We also found that copper ion transmembrane transport was highly enriched in the coral-specific gene set (Table 2), which may reflect an important role for delivery of copper to endosymbionts, where it is a critical component of photosynthetic proteins (plastocyanin) and antioxidants (superoxide dismutase)55. For example, in mycorrhizal symbioses, fungi are known to deliver copper to their photosynthetic plant partners56, and shortage of trace metals such as copper has recently been linked to coral bleaching57.

In addition to immunity and symbiosis, the genes specific to and diversified within corals may underlie other unique scleractinian traits. Calcium carbonate skeleton formation, for example, may be linked to the diversification of calcium ion channels (e.g., polycystins) and cell adhesion proteins (e.g., coadhesin, Fig. 3), which have previously been identified as components of the skeletal organic matrix6,58. Corals may also have diversified mechanisms for controlling gene expression, evidenced by the enrichment of transcriptional regulation and chromatin silencing functions in the coral-specific gene set (Table 2), and the diversification of a histone demethylation protein family (Fig. 3). Finally, we note that some enriched GO annotations do not translate directly to corals (e.g., kidney development), and/or are only represented by a single gene (Table 2), and should therefore be interpreted with caution.

Within-species gene diversification also highlights immune function in scleractinians

The expansion of gene families within individual lineages may represent an important mechanism of molecular evolution driving adaptation and speciation36. Consistent with patterns of gene family size in other organisms59, the number of coral gene families decreased exponentially as gene family size increased (Fig. 4). P. damicornis had smaller gene families overall, and the fewest large gene families (n = 3 with size > 32, max size = 75), while A. digitifera had the most large gene families (n = 25 with size > 32, max size = 255), consistent with pervasive gene duplication in this species suggested by Voolstra et al.17. However, statistical comparison of shared gene family sizes across the four coral species, accounting for differences in total gene content, indicated that S. pistillata had the most significantly expanded gene families (n = 16), followed by A. digitifera (n = 11). Even though O. faveolata had the highest number of non-shared gene families (Fig. 2), only one shared gene family was significantly expanded, suggesting that its large genome size is the result of species-specific genes and/or even expansion of shared gene content. Finally, P. damicornis had no significantly expanded gene families relative to the other scleractinians, confirming that uneven gene family size17 and lineage-specific gene family expansion are common in the Scleractinia.
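The statistical comparison described above can be sketched as a one-sided Fisher's exact test on a 2x2 table of family versus non-family gene counts in two species, followed by a Benjamini-Hochberg adjustment across families. This is one plausible construction only: the exact contingency tables used in the study are not specified in the text, and all counts and function names below are hypothetical.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` family members in the focal species.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, row1)


def expansion_pvalue(family_focal, total_focal, family_other, total_other):
    """Test whether a family is larger in the focal species than in another,
    given each species' total gene count."""
    return fisher_greater(family_focal, total_focal - family_focal,
                          family_other, total_other - family_other)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (genes in family, total genes) for a focal species
# and a comparison species, for three gene families.
families = [(40, 1000, 10, 1000), (12, 1000, 10, 1000), (5, 1000, 6, 1000)]
raw = [expansion_pvalue(*counts) for counts in families]
adjusted = benjamini_hochberg(raw)
expanded = [i for i, p in enumerate(adjusted) if p < 0.01]
print(expanded)
```

Including each species' total gene count in the table is what lets the test account for differences in overall gene content, so that a species with more genes is not flagged simply for being larger.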

Figure 4

Gene family size distribution in four coral genomes. Pdam = P. damicornis, Spis = S. pistillata, Ofav = O. faveolata, Adig = A. digitifera. Bars represent the total number of gene families in a given size class using exponential binning, with each interval open on the left (i.e., the first interval contains gene families of size 1, the second interval contains gene families of size 2 and 3, etc.).
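The exponential binning described in this caption can be sketched as follows. The caption states only that the first interval holds families of size 1 and the second holds sizes 2 and 3; the sketch assumes that each later interval continues to double in width (sizes 4 to 7, 8 to 15, and so on). The function names and example sizes are hypothetical.

```python
from collections import Counter


def size_class(family_size):
    """Index of the exponential bin: 0 -> size 1, 1 -> sizes 2-3,
    2 -> sizes 4-7, and so on (assumed doubling of bin width)."""
    if family_size < 1:
        raise ValueError("gene family size must be at least 1")
    return family_size.bit_length() - 1


def bin_family_sizes(sizes):
    """Count gene families per exponential size class."""
    return Counter(size_class(s) for s in sizes)


# Hypothetical family sizes for one genome.
toy_sizes = [1, 1, 2, 3, 4, 75]
print(sorted(bin_family_sizes(toy_sizes).items()))
# [(0, 2), (1, 2), (2, 1), (6, 1)]
```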

Among the gene families showing lineage-specific expansions in corals, several were similar to reverse transcriptases and transposable elements (Supplementary Table S2); these may represent ‘genetic parasites’ that propagate across the genome60, but they may also play crucial roles in genome evolution and the regulation of gene expression61. Annotations of other expanded gene families (Supplementary Table S2) suggest important roles in interacting with the environment, cellular signaling, and immunity. While uneven gene family size could also reflect variation in assembly completeness and quality of genome assemblies62, these annotations are consistent with categories of genes known to undergo lineage-specific expansion across eukaryotes45.

One expanded gene family in A. digitifera was similar to NOD-like receptors (NLRs), which are cytoplasmic pattern recognition receptors that play a key role in pathogen detection and immune activation63. Characterized by the presence of NACHT domains, NLR genes are highly diversified and variable in number in the genomes of cnidarians64 and other species60. The expansion of this gene family in A. digitifera is consistent with these observations, and may represent adaptation to a new pathogen environment65, or to species-specific symbiotic interactions with microbial eukaryotes and prokaryotes39. Another expanded gene family in A. digitifera was similar to ephrin-like receptors, which may mediate signaling cascades and cell to cell communication66. In S. pistillata, one expanded gene family was similar to tachylectin-2, a pattern recognition receptor that has been identified in many cnidarians39. Previously, a tachylectin-2 homolog was found to be under selection in the coral Oculina67, providing more evidence that such genes are involved in adaptive evolution in corals. The one significantly expanded gene family in O. faveolata did not have a strong hit in the SwissProt database, but did contain a caspase-like domain suggesting a role in apoptosis, which was recently linked to disease susceptibility in this species68. Overall, differential expansion of genes related to the immune system is consistent with the findings of Voolstra et al.17, and suggests that this phenomenon is a general attribute of corals. Lineage-specific immune diversification in corals and other taxa may reflect interactions with specific consortia of eukaryotic and prokaryotic microbial symbionts69.

In addition to putative immune-related function, genes that have undergone lineage-specific expansions in corals may also play roles in biomineralization, which could contribute to variation in growth and morphology among coral species. For example, one significantly expanded gene family in A. digitifera was similar to a CUB and peptidase domain-containing protein that was found to be secreted in the skeletal organic matrix58, and another in S. pistillata was similar to fibrillar collagen with roles in biomineralization70.

Although the P. damicornis genome did not contain any gene families that were significantly expanded relative to the other corals, it did contain many genes (n = 6,966, 26.7%) with no orthologs in other genomes. While most of these P. damicornis-specific genes were unannotatable, protein domain homology revealed significant enrichment for 11 GO terms, including G-protein coupled receptor (GPCR) signaling pathway, bioluminescence, activation of NF-κB-inducing kinase, and positive regulation of JNK cascade (Table 3). The mitogen-activated protein kinase JNK plays a role in responses to stress stimuli, inflammation, and apoptosis71. JNK prevents the accumulation of reactive oxygen species (ROS) in corals in response to thermal and UV stress, and inhibition of JNK leads to coral bleaching and cell death72. The NF-κB transcription factor may also link oxidative stress and apoptosis involved in coral bleaching73, in addition to its central role in innate immunity39. The occurrence of lineage-specific genes that may function in these pathways indicates that P. damicornis may have evolved unique immune strategies for coping with environmental stress.

Table 3 Enrichment of GO terms in the P. damicornis-specific gene set. This gene set included 6,966 genes, of which 1,498 (22%) had GO annotations.

An expanded role of immunity in P. damicornis may explain how Pocillopora has achieved such a widespread distribution18,19. Indeed, Pocillopora corals function as fast-growing and weedy pioneer species in Hawaii74, on the Great Barrier Reef75, and in the eastern tropical Pacific (ETP)76. In fact, in the ETP, where the coral used in this study was collected, Pocillopora thrives in marginal habitats, often dealing with elevated turbidity and reduced salinity after heavy rainfall events, subaerial exposure during extreme low tides, and both warm- and cold-water stress due to ENSO events and periodic upwelling77. A diversified immune system may also allow for flexibility in symbiosis, which may further contribute to the success of Pocillopora26,78. While the wide distribution of P. damicornis suggests there may be considerable variation in its genome that is not captured by our sample from the ETP, this work provides a foundation for future genomic analysis in this important coral species.

Conclusions

This comparative analysis revealed significant expansion of immune-related pathways within the Scleractinia, and further lineage-specific diversification within each scleractinian species. Different immune genes were diversified in each species (e.g., NOD-like and tachylectin-like receptors in A. digitifera and S. pistillata, and caspase-like and JNK signaling genes in O. faveolata and P. damicornis), suggesting diverse adaptive roles for innate immune pathways. Indeed, immune pathways govern the interactions between corals and their algal endosymbionts15,79, the susceptibility of corals to disease80, and their responses to environmental stress72. Therefore, prominent diversification of immune-related functionality across the Scleractinia is not surprising, and may underlie responses to selection involving symbiosis, self-defense, and stress-susceptibility.

The function and diversity of both the Scleractinia-specific and the species-specific immune repertoires deserve further study as they could prove to be critical for coral survival in the face of climate change. Indeed, factors placing high selection pressure on corals, such as bleaching and disease, both involve challenges to the immune system. Lineage-specific adaptations indicate corals continue to evolve novel immune-related functionality in response to niche-specific selection pressures. These results suggest that evolution of the innate immune system has been a defining feature of the success of scleractinian corals, and likewise may mediate their continued success under climate change scenarios.

Methods

P. damicornis genome sequencing and assembly

The P. damicornis genotype used for sequencing was collected from Saboga Is., Panama in March 2005, and cultured indoors at the University of Miami Coral Resource Facility until the time of sampling. Genomic DNA was extracted from two healthy fragments and two bleached fragments of this genotype in September 2016 using a Qiagen DNeasy Midi kit and shipped overnight on dry ice to Dovetail Genomics (Santa Cruz, CA, USA). The bleached sample (low symbiont load) was used for DNA extraction for shotgun libraries (sequenced on an Illumina HiSeqX) for de novo assembly, since this step is more sensitive to contamination. The unbleached sample was used for DNA extraction for Chicago libraries (sequenced on an Illumina HiSeq2500), since this step is more sensitive to DNA size and quality but less sensitive to contamination. Genome scaffolds were assembled de novo using the HiRise software pipeline81. The Dovetail HiRise scaffolds were then filtered to remove those of potential non-coral origin using BLAST82 searches against three databases: (1) Symbiodiniaceae, containing the genomes of Breviolum minutum83 and Symbiodinium microadriaticum84, (2) Bacteria, containing 6,954 complete bacterial genomes from NCBI, and (3) viruses, containing 2,996 viral genomes from the phantome database (phantome.org; accessed 2017-03-01). Scaffolds with a BLAST hit to any of these databases with an E-value < 10⁻²⁰ and a bitscore > 1000 were considered to be non-coral in origin and removed from the assembly85.
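The contamination filter described above can be illustrated with a short sketch that flags scaffolds having at least one BLAST hit below the E-value threshold and above the bitscore threshold. It assumes that the scaffolds were the BLAST queries and that hits are in the standard 12-column tabular format (E-value in column 11, bitscore in column 12); neither detail is stated in the text, and the scaffold and subject identifiers are invented.

```python
def contaminant_scaffolds(blast_lines, max_evalue=1e-20, min_bitscore=1000.0):
    """Return IDs of query scaffolds with at least one hit passing both
    thresholds (E-value < max_evalue and bitscore > min_bitscore)."""
    flagged = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        scaffold = fields[0]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < max_evalue and bitscore > min_bitscore:
            flagged.add(scaffold)
    return flagged


# Invented hits: query, subject, %identity, length, mismatches, gap opens,
# query start/end, subject start/end, E-value, bitscore.
toy_hits = [
    "scaffold_1\tsubject_A\t88.0\t300\t36\t0\t1\t300\t1\t300\t1e-30\t400",
    "scaffold_2\tsubject_B\t99.0\t900\t9\t0\t1\t900\t1\t900\t1e-150\t1600",
    "scaffold_3\tsubject_C\t75.0\t120\t30\t0\t1\t120\t1\t120\t1e-5\t90",
]
print(contaminant_scaffolds(toy_hits))  # {'scaffold_2'}
```

Requiring both thresholds means that a short but highly significant match (such as the first toy hit) does not by itself cause a scaffold to be discarded.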

P. damicornis genome annotation

The filtered assembly was analyzed for completeness using BUSCO86 to search for 978 universal metazoan single-copy orthologs. The --long option was passed to BUSCO in order to train the ab initio gene prediction software Augustus87. Augustus gene prediction parameters were then used in the MAKER pipeline88 to annotate gene models, using as supporting evidence two RNA-seq datasets from P. damicornis89,90, one from closely-related S. pistillata17, and protein sequences from 20 coral species10. Results from this initial MAKER run were used to train a second gene predictor (SNAP)91 prior to an iterative MAKER run to refine gene models. Predicted protein sequences were then extracted from the assembly and putative functional annotations were added by searching for homologous proteins in the UniProt Swiss-Prot database92 using BLAST (E < 10⁻⁵), and protein domains using InterProScan93. Genome annotation summary statistics were generated using the Genome Annotation Generator software94. All data and code to reproduce these annotations as well as subsequent comparative genomic and statistical analyses are available at github.com/jrcunning/pdam-genome.

Comparative genomic analyses

We compared the predicted protein sequences of four scleractinians, two corallimorpharians, two actiniarians, one hydrozoan, one sponge, and one ctenophore (Table 1), by feature frequency profiling (FFP v3.19)95 using features of length 8 to create a whole genome phylogeny for these organisms (Fig. 1). We then identified ortholog groups (gene families) among the predicted proteins from these genomes using the software fastOrtho (http://enews.patricbrc.org/fastortho/) based on the MCL algorithm with a blastp E-value cutoff of 10⁻⁵. Based on these orthologous gene families, we defined and extracted several gene sets of interest: (1) gene families that were shared by all four scleractinians (i.e., coral ‘core’ genes), (2) gene families that were present in all four scleractinians but absent from other organisms (i.e., coral-specific genes), (3) gene families that were significantly larger in scleractinians relative to other anthozoans (Binomial generalized linear model, FDR-adjusted p < 0.01; i.e., coral-diversified genes), (4) gene families that were significantly larger in each scleractinian species relative to other scleractinians (pairwise comparisons using Fisher’s exact test, FDR-adjusted p < 0.01; i.e., coral species-specific gene family expansions), and (5) genes present in P. damicornis with no orthologs in any other genome (i.e., P. damicornis-specific genes).
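The presence/absence-based gene sets, (1), (2), and (5), can be expressed as simple set operations over ortholog groups. The sketch below assumes the clustering output has already been parsed into a mapping from group ID to the genes each species contributes; it does not read fastOrtho output directly, and the group IDs, gene IDs, and function name are hypothetical. Species codes follow Table 1.

```python
CORALS = {"Pdam", "Spis", "Adig", "Ofav"}


def classify_groups(groups):
    """Split ortholog groups into coral core, coral-specific, and
    P. damicornis-specific sets based on which species are present.

    `groups` maps a group ID to {species code: list of gene IDs}.
    """
    core, coral_specific, pdam_specific = set(), set(), set()
    for group_id, members in groups.items():
        present = {species for species, genes in members.items() if genes}
        if CORALS <= present:            # found in all four scleractinians
            core.add(group_id)
            if present == CORALS:        # and in no other genome
                coral_specific.add(group_id)
        if present == {"Pdam"}:          # no orthologs in any other genome
            pdam_specific.add(group_id)
    return core, coral_specific, pdam_specific


# Hypothetical ortholog groups.
toy_groups = {
    "OG1": {"Pdam": ["p1"], "Spis": ["s1"], "Adig": ["a1"], "Ofav": ["o1"], "Nema": ["n1"]},
    "OG2": {"Pdam": ["p2"], "Spis": ["s2"], "Adig": ["a2"], "Ofav": ["o2"]},
    "OG3": {"Pdam": ["p3", "p4"]},
    "OG4": {"Pdam": ["p5"], "Spis": ["s3"]},
}
core, coral_specific, pdam_specific = classify_groups(toy_groups)
print(sorted(core), sorted(coral_specific), sorted(pdam_specific))
# ['OG1', 'OG2'] ['OG2'] ['OG3']
```

By construction the coral-specific set is a subset of the core set, matching the description of coral-specific families as a subset of the coral core gene families.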

Functional characterization

Putative gene functionality was characterized using Gene Ontology (GO) analysis. GO terms were assigned to predicted P. damicornis protein sequences using InterProScan96. Significantly enriched GO terms in gene sets of interest relative to the whole genome were identified using the R package topGO97,98. These analyses were implemented using custom scripts available in the accompanying data repository (github.com/jrcunning/pdam-genome).
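The logic of a GO enrichment test can be sketched with a per-term, one-sided Fisher's exact (hypergeometric) test comparing a gene set against all annotated genes in the genome. This is a simplified stand-in rather than a reimplementation of topGO, which is an R package offering several algorithms, some of which account for the structure of the GO graph; the algorithm and settings used in the study are not stated in this section. The gene IDs, term labels, and function names are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N annotated genes,
    K of which carry the GO term of interest."""
    tail = sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def go_enrichment(gene_set, annotations):
    """Per-term enrichment p-values for a gene set relative to the genome.

    `annotations` maps each annotated gene to its set of GO terms; genes
    without annotations are excluded from both set and background.
    """
    background = set(annotations)
    selected = background & set(gene_set)
    N, n = len(background), len(selected)
    terms = set().union(*(annotations[g] for g in selected))
    results = {}
    for term in terms:
        K = sum(term in gene_terms for gene_terms in annotations.values())
        k = sum(term in annotations[g] for g in selected)
        results[term] = enrichment_pvalue(k, n, K, N)
    return results


# Hypothetical genome of ten annotated genes: three carry term_A.
toy_annotations = {f"g{i}": ({"term_A"} if i < 3 else {"term_B"})
                   for i in range(10)}
result = go_enrichment({"g0", "g1", "g2"}, toy_annotations)
print(result)  # term_A with p = 1/120
```

In practice the resulting p-values would be corrected for multiple testing, and terms represented by a single gene deserve the caution noted in the Results.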