Introduction

Stony corals (Scleractinia) can form mutualistic symbioses with photosynthetic dinoflagellates of the genus Symbiodinium. These symbioses are based on nutritional exchange and allow coral reefs to grow in oligotrophic marine environments (Muscatine et al., 1975; Trench, 1979). Reef-building corals provide the foundation of the coral reef ecosystem and a habitat for millions of marine species. Currently, because of global climate change and multiple other stressors, coral reefs around the world are in decline (Hughes et al., 2003; Hoegh-Guldberg et al., 2007). Hyperthermal stress has been proposed as a major global factor destabilizing the coral–algal symbiosis. It leads to the loss of endosymbionts (or their photopigments), which manifests at the colony level as a loss of tissue coloration, also known as ‘coral bleaching’ (Douglas, 2003; Weis, 2008). Bleaching can compromise coral health and increase the incidence of coral mortality (Baker, 2003; McClanahan, 2004; Hoegh-Guldberg et al., 2007). The susceptibility of the coral–algal symbiosis to stress and bleaching is influenced by the entire holobiont, comprising the coral host and its associated microbes (Rowan, 1998; Rohwer et al., 2002). In particular, the loss of dinoflagellate photosymbionts (Rowan, 2004) may also affect coral fitness and the coral's ability to respond to other stressors (Howells et al., 2012).

Dinoflagellates are important microbial eukaryotes that, together with diatoms, are the leading primary producers in the oceans (Lin, 2011). Recent studies suggest, however, that dinoflagellates of the genus Symbiodinium are also capable of heterotrophy (Jeong et al., 2012). Dinoflagellates have very large genomes (Hackett et al., 2004) and a number of unique features. These include DNA containing 5-hydroxymethyluracil (Rae, 1976), a lack of the usual histones (Rizzo, 1981) and a lack of the usual transcriptional regulatory elements (Li and Hastings, 1998). Dinoflagellate transcripts carry a conserved spliced leader sequence (Zhang et al., 2007). Their highly expressed genes occur at elevated copy numbers and in tandem repeats (Bachvaroff and Place, 2008). A number of dinoflagellate genes have been acquired from bacteria and other eukaryotes by horizontal gene transfer or endosymbiosis, resulting in important gene innovations (Wisecaver and Hackett, 2011; Wisecaver et al., 2013).

Symbiodinium, a symbiotic lineage of single-celled dinoflagellate protists, is divided into nine clades (A–I) and numerous phylogenetically distinct types (Santos et al., 2002; Pochon et al., 2004). Of these clades, Symbiodinium clades A, B, C and D are commonly associated with corals and other metazoans (Pochon and Gates, 2010). Symbiont abundance shows some seasonal variability (Chen et al., 2005). Nevertheless, clade C symbionts are the most common endosymbionts of reef-building corals in the Pacific and Indian Oceans (Lesser et al., 2013) and comprise many distinct species (Thornhill et al., 2013). Symbiodinium clades A and B are more abundant in Caribbean corals (LaJeunesse, 2002; Baker, 2003), whereas clade D dominates in corals living in the warm waters of the Persian Gulf (Ghavam Mostafavi et al., 2007). For the Great Barrier Reef (GBR), surveys of Symbiodinium genetic diversity are available in the web-based SymbioGBR database (Tonk et al., 2013).

Recent sequencing projects have revealed some unique characteristics of Symbiodinium transcriptional regulation (Bayer et al., 2012) and functional differences between clades (Ladner et al., 2012). The latest sequencing project, for Symbiodinium minutum (Shoguchi et al., 2013), estimated a genome size of 1.5 Gbp with 42 000 predicted protein-encoding genes. In this study, we focused on protein-encoding genes from four coral-associated Symbiodinium clades (A, B, C and D). We used transcriptomic data generated by massively parallel Illumina sequencing (San Diego, CA, USA). Our aim was to explore whether symbiotic dinoflagellates contain unique, evolutionarily conserved genes that might underlie their capacity for symbiosis with corals and other marine species. Here, we describe de novo transcriptome assemblies for the four Symbiodinium clades. We also provide gene annotation, gene ontology and pathway analyses for predicted Symbiodinium genes with orthologs in all four clades. Finally, we explore the possible roles of conserved calcium/calmodulin-dependent protein kinases (CCaMKs) and the inositol pathway in establishing cnidarian–dinoflagellate symbiosis.

Materials and methods

Cultures, RNA extraction and sequencing

Cultures of Symbiodinium spp. used in this study were polyclonal cultures of three clades, all obtained from Professor Roberto Iglesias-Prieto (RSU, UNAM, Puerto Morelos, Mexico):
- clade A (internal transcribed spacer (ITS) type A2), isolated from the zoanthid Zoanthus sociatus (Caribbean region);
- clade B (ITS type B2), isolated from Oculina diffusa (Bermuda);
- clade C (ITS type C1), isolated from the anemone Discosoma sanctithomae (Jamaica).

We also used a culture of clade D (ITS type D1), isolated from Porites annae (western Pacific) by Professor Michio Hidaka (University of the Ryukyus, Japan) and donated by Associate Professor Scott Santos (Auburn University, Auburn, AL, USA). Cultures were grown in axenic f/2 medium (Guillard and Ryther, 1962). They were maintained at 25 °C on a 12:12 h light:dark cycle, at an irradiance of 50 μmol quanta m⁻² s⁻¹ (measured using a Li-Cor flat quantum sensor, Lincoln, NE, USA). Algal cultures from different growth phases were combined to maximize the diversity of gene expression profiles. The cells were centrifuged, and the resulting pellet was snap-frozen in liquid nitrogen and stored at −80 °C until RNA extraction. Total RNA was extracted from the algal cells as previously described (Rosic and Hoegh-Guldberg, 2010). Briefly, this method combines Trizol reagent (Ambion Life Technologies, Austin, TX, USA) with the RNeasy kit (Qiagen, Hilden, Germany). RNA quantity and integrity were assessed using a NanoDrop ND-1000 spectrophotometer (Wilmington, DE, USA) and an Agilent 2100 Bioanalyzer (Santa Clara, CA, USA) (RNA integrity number >6). Equal concentrations of high-quality RNA from the algal cultures were used to prepare libraries with the Illumina TruSeq RNA Sample Preparation Kit (San Diego, CA, USA). The libraries were sequenced on the Illumina GA II Sequencing System at the Australian Genome Research Facility Ltd.
To minimize bacterial contamination, library construction included purification of poly-A-containing mRNA molecules using poly-T oligo-attached magnetic beads.

Short read data assessment and quality control

The Illumina sequencer assigns a Phred-like quality score to each base call. Nucleotides (nt) with quality scores <20 were trimmed from the reads, together with Illumina sequencing primers and multiplex adaptors. Reads <55 nt long after trimming were removed. The remaining ‘clean’ short reads were assembled into contiguous sequences (contigs) before further analyses.
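
The quality and length filters can be sketched as follows. This is a minimal illustration, not the pipeline actually used, which the text does not name. It assumes the following:
- trimming removes low-quality bases from the 3′ end only;
- qualities use Phred+33 encoding (GA II data of this era may instead use Phred+64, so the offset is a parameter);
- adaptor and primer removal are omitted.

The function names are hypothetical.

```python
from typing import List, Optional


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores (Phred+offset)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq: str, quality: str, min_q: int = 20, min_len: int = 55,
              offset: int = 33) -> Optional[str]:
    """Trim low-quality bases from the 3' end, then apply the minimum-length filter.

    Assumption: only the 3' end is trimmed, stopping at the first base (counting
    from the end) with quality >= min_q. Adaptor/primer removal is not shown.
    Returns the trimmed sequence, or None if the read is discarded.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    trimmed = seq[:end]
    return trimmed if len(trimmed) >= min_len else None


# Q20 corresponds to an estimated base-call error probability of 10 ** (-20 / 10) = 0.01.
read = "A" * 60 + "C" * 5
qual = "I" * 60 + "#" * 5  # 'I' = Q40, '#' = Q2 under Phred+33
print(trim_read(read, qual) == "A" * 60)  # True
```

A Phred score Q encodes an error probability of 10^(−Q/10), so the Q20 threshold removes bases with an estimated error rate above 1%.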

De novo short read assembly

The software applications Velvet version 1.1.04 (Zerbino and Birney, 2008) and Oases version 0.1 (Schulz et al., 2012) were used for de novo transcriptome assembly with a k-mer size of 53. Oases was used to cluster the Velvet-assembled contigs into transcript isoforms. Contigs with k-mer coverage below 3× were removed from the final assemblies.
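
Velvet expresses coverage in k-mers rather than nucleotides. The Velvet documentation relates the two as C_k = C · (L − k + 1) / L, where C is nucleotide coverage, L is read length and k is the k-mer size. The sketch below applies that relation to estimate what the 3× k-mer cutoff means in nucleotide terms. It assumes untrimmed 99-nt reads, so it is approximate for the trimmed read set. The function names are hypothetical.

```python
def kmer_coverage(nt_coverage: float, read_len: int, k: int) -> float:
    """Convert nucleotide coverage C to Velvet k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_len:
        raise ValueError("k-mer size cannot exceed read length")
    return nt_coverage * (read_len - k + 1) / read_len


def nucleotide_coverage(kmer_cov: float, read_len: int, k: int) -> float:
    """Invert the relation: C = Ck * L / (L - k + 1)."""
    if k > read_len:
        raise ValueError("k-mer size cannot exceed read length")
    return kmer_cov * read_len / (read_len - k + 1)


# 99-nt reads with k = 53: each read contributes 99 - 53 + 1 = 47 k-mers.
print(round(nucleotide_coverage(3, 99, 53), 2))  # 6.32
```

Under these assumptions, the 3× k-mer coverage cutoff corresponds to roughly 6.3× nucleotide coverage. The large k-mer relative to read length is why the two values differ by about a factor of two.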

To reduce the redundancy of the assembled contigs, each transcriptome was first aligned to itself using the nucleotide–nucleotide Basic Local Alignment Search Tool (BLASTn, version 2.2.27+; Altschul et al., 1990). Any pair of contigs that was >99% identical over 95% of the length of the shorter contig was collapsed into a single contig by removing the shorter contig. We also tested lower thresholds, collapsing into the longer contig any pair that was >80% or >90% identical over 95% of the length of the shorter contig. These lower thresholds did not substantially reduce the number of redundant transcripts further, so redundancy reduction was kept at the >99% identity level (Supplementary Table S1).
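
A minimal sketch of the collapsing rule, assuming the self-BLASTn results are in BLAST+ tabular format (`-outfmt 6`). Only the first four columns of that format are used here: qseqid, sseqid, pident and length. Coverage of the shorter contig is approximated as alignment length divided by contig length. The function name and contig IDs are hypothetical.

```python
import csv
from io import StringIO
from typing import Dict, Set


def collapse_redundant(blast_tsv: str, lengths: Dict[str, int],
                       min_ident: float = 99.0, min_cov: float = 0.95) -> Set[str]:
    """Return the contig IDs kept after dropping the shorter member of each redundant pair.

    blast_tsv: self-BLASTn hits in tabular format (-outfmt 6); only the first four
               columns (qseqid, sseqid, pident, length) are used here.
    lengths:   contig ID -> contig length (nt).
    A pair is redundant if pident > min_ident and the alignment length covers at
    least min_cov of the shorter contig (alignment length may include gaps, so this
    only approximates coverage).
    """
    removed: Set[str] = set()
    for row in csv.reader(StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        if query == subject:
            continue  # every contig trivially hits itself
        pident, aln_len = float(row[2]), int(row[3])
        # Ties in length are broken by ID so the choice is deterministic.
        shorter = min((query, subject), key=lambda c: (lengths[c], c))
        if pident > min_ident and aln_len / lengths[shorter] >= min_cov:
            removed.add(shorter)
    return set(lengths) - removed


hits = (
    "c1\tc2\t99.8\t490\t1\t0\t1\t490\t11\t500\t0.0\t900\n"    # redundant: c2 dropped
    "c1\tc3\t95.0\t800\t40\t0\t1\t800\t1\t800\t0.0\t1200\n"   # below 99% identity: kept
)
print(sorted(collapse_redundant(hits, {"c1": 1000, "c2": 500, "c3": 800})))  # ['c1', 'c3']
```

This is a single greedy pass. A full pipeline would also need a policy for chains of overlapping contigs, where a contig removed as the shorter member of one pair is the longer member of another.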

Assembly assessment and annotation

To assess the quality of the assembled transcriptomes, the contigs were aligned with BLASTn (E-value cutoff 10⁻⁵) to Symbiodinium expressed sequence tags (ESTs) from the GenBank and Joint Genome Institute databases. They were also aligned to Symbiodinium hemoglobin and heat shock protein genes (Rosic et al., 2011a, 2013). The reference database was then extended to include non-Symbiodinium sequences from GenBank (bacterial, environmental, invertebrate, plant and viral nucleotide sequences; the Acropora digitifera genome). It also included sequences from other databases: the A. millepora transcriptome (Moya et al., 2012) and the A. hyacinthus, A. tenuis and Porites astreoides transcriptomes (Eli Meyer and Mikhail Matz, www.bio.utexas.edu/research/matz_lab). This extended database was used to identify non-Symbiodinium homologs of contigs that did not align to known Symbiodinium sequences. For annotation, contigs >300 nt in length were aligned using the translated nucleotide–protein BLAST application (BLASTx, version 2.2.27+; E-value cutoff 10⁻⁵) to three protein databases: SwissProt (SP; version 2011_08), TrEMBL (version 2012_07) and the National Center for Biotechnology Information (NCBI) non-redundant protein sequence database (nr; 13 August 2011 update). Individually targeted sequences were additionally compared by BLAST against an EST database (http://sequoia.ucmerced.edu/SymBioSys/) and the coral proteome database ZoophyteBase (Dunlap et al., 2013). These databases include sequences from various cnidarian species and Symbiodinium strains. The online tool VENNY (http://bioinfogp.cnb.csic.es/tools/venny/) was used to identify database sequences aligned by contigs from two or more clades. It was also used to identify redundancy in the database alignments of contigs within each clade.
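
The Venn-diagram logic can be sketched with plain set operations: each database accession is assigned to the exact combination of clades whose contigs hit it. This is an illustrative stand-in for the VENNY web tool, not its implementation. The accession IDs and the function name are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(clade_hits: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group database accessions by the exact set of clades whose contigs hit them.

    Working at the accession level means that several contigs from one clade hitting
    the same accession count only once (within-clade redundancy is collapsed).
    """
    regions: Dict[FrozenSet[str], Set[str]] = {}
    all_ids: Set[str] = set().union(*clade_hits.values())
    for acc in all_ids:
        members = frozenset(clade for clade, ids in clade_hits.items() if acc in ids)
        regions.setdefault(members, set()).add(acc)
    return regions


# Hypothetical placeholder accessions, not real UniProt IDs.
hits = {
    "A": {"ACC1", "ACC2", "ACC3"},
    "B": {"ACC1", "ACC2"},
    "C": {"ACC1", "ACC4"},
    "D": {"ACC1", "ACC2"},
}
core = venn_regions(hits).get(frozenset(hits), set())  # accessions hit by all four clades
print(sorted(core))  # ['ACC1']
```

The core region of a four-way diagram is simply the intersection of the four accession sets. The remaining regions partition the union so that each accession is counted exactly once.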

Identification of orthologs in all four Symbiodinium transcriptomes

To identify orthologous transcripts among the de novo-assembled transcriptomes, we applied the Reciprocal Best BLAST Hits (RBBH) approach (Telford, 2007). We implemented it as a four-way reciprocal BLASTn strategy (E-value cutoff 10⁻¹⁵) covering all pairwise comparisons between clades (Ness et al., 2011). Criteria for identifying orthologous sequences were a minimum length of 200 nt, a minimum sequence identity of 90% and a minimum alignment coverage of 80% of the shorter contig (Ness et al., 2011).
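
A minimal sketch of RBBH with the stated filters, followed by a four-way consistency check. This sketch makes three interpretive assumptions, since the text does not state these details:
- "minimum length of 200 nt" is read as minimum alignment length;
- the best hit is the one with the highest bit score;
- "all pairwise comparisons" is read as requiring a reciprocal best hit in all six clade pairs.

The `Hit` type, the function names and the contig IDs are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity
    aln_len: int     # alignment length (nt)
    evalue: float
    bitscore: float


def best_hits(hits: Iterable[Hit], q_len: Dict[str, int], s_len: Dict[str, int],
              min_len: int = 200, min_ident: float = 90.0, min_cov: float = 0.8,
              max_evalue: float = 1e-15) -> Dict[str, str]:
    """Highest-bitscore hit per query among hits passing the orthology filters."""
    best: Dict[str, Hit] = {}
    for h in hits:
        shorter = min(q_len[h.query], s_len[h.subject])
        if (h.evalue > max_evalue or h.aln_len < min_len
                or h.pident < min_ident or h.aln_len / shorter < min_cov):
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def rbbh(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
         len_a: Dict[str, int], len_b: Dict[str, int]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    fwd = best_hits(a_vs_b, len_a, len_b)
    rev = best_hits(b_vs_a, len_b, len_a)
    return {(a, b) for a, b in fwd.items() if rev.get(b) == a}


def four_way_orthologs(rb: Dict[Tuple[str, str], Set[Tuple[str, str]]]
                       ) -> List[Tuple[str, str, str, str]]:
    """Keep (a, b, c, d) only if all six clade pairs are reciprocal best hits.

    rb[("X", "Y")] holds the RBBH pairs for clades X and Y, with X before Y in "ABCD".
    """
    from_a = {y: dict(rb[("A", y)]) for y in "BCD"}
    groups = []
    for a, b in rb[("A", "B")]:
        c, d = from_a["C"].get(a), from_a["D"].get(a)
        if c is None or d is None:
            continue
        if ((b, c) in rb[("B", "C")] and (b, d) in rb[("B", "D")]
                and (c, d) in rb[("C", "D")]):
            groups.append((a, b, c, d))
    return sorted(groups)
```

Requiring all six pairs to agree is stricter than chaining pairwise hits from a single reference clade. This fits the conservative intent of the criteria above.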

Functional profile and gene ontology (GO) enrichment analyses

To infer the function of the orthologous genes, we performed BLASTx against the SP protein database with an E-value cutoff of 10⁻¹⁵. This conservative approach restricted alignments to well-characterized proteins in the SP database. The SP genes that aligned best to the orthologous contigs were considered orthologous genes and used for downstream enrichment analyses.

GO enrichment and pathway analyses were performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID), using the UNIPROT ACCESSION identifiers of the SP genes (Huang da et al., 2009a, 2009b). These analyses included identifying enriched biological themes and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways. All SP-annotated genes from all four clades formed the background gene set for the enrichment analyses. DAVID uses a modified Fisher's exact test (the EASE score) to assess whether genes belonging to a particular pathway or category are statistically overrepresented. Significant processes were selected based on a corrected P-value <0.05. Specifically, we applied a Benjamini-corrected P-value cutoff of 0.05 to filter significantly enriched pathways and a P-value cutoff of 0.001 to filter enriched GO categories (Huang da et al., 2009a; Meyer et al., 2009).
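
The statistics behind this filtering can be sketched in a few lines. The sketch shows the standard one-sided Fisher's exact test (the hypergeometric upper tail) rather than DAVID's modified EASE variant, whose exact adjustment is not reproduced here. It assumes that DAVID's "Benjamini" column corresponds to the Benjamini–Hochberg step-up procedure. The function names and counts are hypothetical.

```python
from math import comb
from typing import List


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail), P(X >= k).

    N: genes in the background set, K: background genes annotated to the term,
    n: genes in the list of interest, k: list genes annotated to the term.
    """
    upper = min(n, K)
    numerator = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [fisher_upper_tail(5, 5, 5, 10), fisher_upper_tail(2, 5, 5, 10)]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.0079, 0.8968]
```

The background set matters: because all SP-annotated genes from the four clades form the background, enrichment is measured relative to the annotated Symbiodinium gene repertoire rather than to a whole reference genome.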

Taxonomic analyses

To evaluate the evolutionary origin of our transcripts and possible horizontal gene transfer, predicted transcripts from each sample were aligned to several publicly available data sets:
- ESTs and genome and transcriptome sequences from GenBank databases (accessed March 2012), including bacterial, environmental, invertebrate, plant and viral nucleotide sequences, the A. digitifera genome, the human genome and Symbiodinium ESTs;
- Symbiodinium ESTs from the Joint Genome Institute (University of California, Merced, CA, USA);
- the A. millepora transcriptome (Meyer et al., 2009; Moya et al., 2012);
- the A. hyacinthus, A. tenuis and P. astreoides transcriptomes (Eli Meyer, Mikhail Matz, et al. data: www.bio.utexas.edu/research/matz_lab).

Alignments were carried out using BLASTn with a word size of 11 and an E-value cutoff of 10⁻⁵.
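
The taxonomic percentages reported in the Results use two different denominators: all transcripts (aligned versus no hit) and aligned transcripts only (share per source). The sketch below makes that distinction explicit. It assumes each transcript is assigned to the source category of its single best hit; the text does not state how multiple hits were resolved. The category labels and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def taxonomic_profile(best_source: Dict[str, Optional[str]]) -> Tuple[float, Dict[str, float]]:
    """Summarize best-hit sources for a set of transcripts.

    best_source maps transcript ID -> source category of its best BLASTn hit,
    or None if the transcript had no hit at the chosen E-value cutoff.
    Returns (% of all transcripts with a hit, % of aligned transcripts per category).
    """
    total = len(best_source)
    aligned = [src for src in best_source.values() if src is not None]
    pct_aligned = 100.0 * len(aligned) / total if total else 0.0
    counts = Counter(aligned)
    per_source = {src: 100.0 * n / len(aligned) for src, n in counts.items()}
    return pct_aligned, per_source


example = {"t1": "dinoflagellate", "t2": "dinoflagellate", "t3": "bacterial", "t4": None}
pct, shares = taxonomic_profile(example)
print(pct)  # 75.0
```

Reporting per-source shares relative to aligned transcripts means that those shares sum to 100% regardless of how many transcripts had no hit.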

In addition, to compare our conserved Symbiodinium transcripts with other eukaryotic genomes, we aligned them using BLASTn (E-value cutoff 10⁻⁵) to several genome and transcriptome sequences. These were the dinoflagellates Alexandrium minutum and Symbiodinium minutum, the unicellular red alga Cyanidioschyzon merolae, and the genomes of the coral A. digitifera and humans.

Results

De novo short read assembly

Massively parallel Illumina sequencing generated over 19 million raw reads (99-nt single reads) and a total of 1.9–2.9 Gb of raw data for each of the four Symbiodinium transcriptomes (Table 1). Low-quality bases and primer and adaptor sequences were trimmed, and the reads were then assembled with Velvet/Oases. The assemblies yielded over 90 000 transcripts for each clade, with average contig lengths of 300–396 bp, N50 values of 310–479 bp and longest contigs of 3955–7242 bp (Table 2). To estimate the number of expressed genes, we clustered highly similar contigs sharing at least 99%, 90% or 80% sequence identity over 95% of the length of the shorter contig. Reducing redundancy decreased the number of predicted transcripts by <1% for each algal clade (Supplementary Table S1). We therefore clustered similar contigs at 99% sequence identity. The transcript length distribution for each Symbiodinium clade is shown in Supplementary Figure S1. After removal of short contigs (<300 nt), we obtained between 29 846 and 46 892 nonredundant transcripts per clade. BLASTx comparison of the predicted transcripts with the SP, TrEMBL and NCBI non-redundant protein sequence databases annotated 40–44% of the transcripts (Table 2). Supplementary Table S2 presents the similarity between our de novo transcriptome assemblies and both publicly available Symbiodinium EST sequences and non-Symbiodinium databases (BLASTn, E-value cutoff 10⁻⁵).
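
N50 is reported alongside mean contig length but is computed differently. A minimal sketch of the standard definition follows; the text does not state which tool computed the values in Table 2. The function name and contig lengths are hypothetical.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contigs supplied")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


print(n50([300, 350, 400, 500, 1200]))  # 500
```

Because N50 weights contigs by their length, a few long contigs can pull it well away from both the mean and the median contig length.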

Table 1 Sequencing statistics: the number of reads and total amount of sequence generated for each clade
Table 2 Overview of the sequencing data, assembly, clustering and annotation statistics: clustering was based on >99% pairwise identity between transcripts predicted by Velvet/Oases

Assembly assessment and annotation

To validate the accuracy of the de novo assemblies, we compared them with Symbiodinium (type C3) Sanger sequences (Leggat et al., 2007) for two sets of genes: the conserved HSP70 and HSP90 genes (Rosic et al., 2011a) and novel polymorphic hemoglobin-like proteins (Rosic et al., 2013). BLASTn and BLASTx alignments with stringent E-values and high bit scores confirmed the presence of these genes in the de novo assemblies of all four transcriptomes (Supplementary Table S3). For the conserved HSP70 and HSP90 genes, sequence identity was 95–99% between the Symbiodinium C3 EST Sanger sequences and our de novo-assembled genes from the various Symbiodinium clades and ITS-2 types. The hemoglobin genes were conserved at the protein level but more variable at the nucleotide level. Their sequence identities across clades ranged from 65–94% for Hb1 and from 69–96% for Hb2.

Functional profile and GO enrichment analyses

Enrichment analysis identified enriched KEGG pathways involving well-described proteins and ubiquitous biochemical processes. Six pathways were enriched in all four Symbiodinium clades: phosphatidylinositol signaling system, inositol phosphate metabolism, spliceosome, ribosome, endocytosis and sucrose metabolism (Table 3).

Table 3 Significant pathways enriched in all four Symbiodinium clades (corrected P-value <0.05 after multiple testing correction (MTC) by the Benjamini procedure)

Enrichment analyses of our annotated transcriptomes revealed GO categories that were significantly enriched in the four analyzed dinoflagellates. Using a Benjamini-corrected P-value <0.05 as the cutoff, the GO categories enriched in all four included 10 biological process categories. These were related to photosynthesis, transcription, nitrate metabolism, microtubule-based processes and phosphatidylinositol metabolic processes (Table 4 and Figure 1). No molecular function category was significantly enriched in all four dinoflagellates. For the cellular component category, all four Symbiodinium clades showed enrichment in 29 categories, including the chloroplast, microtubules, the dynein complex, the cytosol and other plastid parts (Table 5).

Table 4 Biological processes (BP) enriched in all four Symbiodinium clades (corrected P-value <0.05)
Figure 1

Biological processes (BP) enriched in all four Symbiodinium clades, using DAVID enrichment analyses of nonredundant transcripts ≥300 nt with BLASTx hits to SP (E-value <10⁻¹⁵).

Table 5 Cellular components (CC) enriched in all four Symbiodinium clades (corrected P-value <0.05)

The number of alignments to the SP database is presented in Table 2. Figure 2 summarizes, as a four-way Venn diagram, the nonredundant Symbiodinium transcripts from each clade that aligned to the SP database. This includes transcripts from two or more clades that aligned to the same SP genes. Orthology among all Symbiodinium clades was inferred for the 1053 genes at the core of the four-way Venn diagram. These genes shared the same UNIPROT ACCESSION IDs, showed a high degree of sequence similarity among the four Symbiodinium clades and included a number of multiple-copy genes (Supplementary Table S4). The shared genes included the following (Table 6):
- conserved genes such as heat shock proteins (Hsp70 and Hsp90);
- housekeeping genes (HKGs), including actin, calmodulin, tubulin, GAPDH and cyclophilin;
- several ribosomal and cytochrome genes, including a chloroplast-based heme-containing cytochrome P450;
- photosynthetic genes, including ribulose bisphosphate carboxylase and peridinin-chlorophyll a-binding protein.

Common antioxidant genes important in stress responses were also identified, including thioredoxin, ferredoxin and superoxide dismutase. We also identified calcium-dependent protein kinases and CCaMKs, which are important in intercellular signaling (Table 6). Several CCaMK isoforms were identified in the Symbiodinium clades. Their conserved catalytic regions are shown in a multiple sequence alignment (ClustalX) together with representative CCaMKs from other species (Figure 3). Phylogenetic analysis of the conserved catalytic domain revealed two monophyletic groups of Symbiodinium CCaMKs, suggesting different evolutionary origins (Figure 3c).

Figure 2

Venn diagram of predicted Symbiodinium transcripts from each clade (≥300 nt) aligned to the SP database, showing the number of genes unique to each clade and those shared among clades. Sequence homology was inferred when the expectation value (E-value) was <10⁻¹⁵.

Table 6 The UniProt IDs and descriptions of some of the common genes that have been found in all four Symbiodinium clades
Figure 3

(a) Diagram of the CCaMK structure (Swulius and Waxham, 2008), showing the conserved catalytic domain required for kinase activity (black box), the autoinhibitory domain (gray box) and the C-terminal regulatory domain containing Ca2+-binding EF-hands (white box). (b) Multiple sequence alignment (MSA) of the conserved CCaMK region from the four Symbiodinium clades, including representatives of different isoforms from other species: Karenia brevis (CO064068); Lilium longiflorum (2113422A); Medicago truncatula (Q6RET7); CCaMK of Malus domestica (Q07250); Acropora digitifera (adi_v1.00159); and Homo sapiens (Q14012). Conserved residues are indicated by white letters on a black background (conserved in 100% of the sequences), white letters on a gray background (80%) and black letters on a gray background (60%). The MSA was constructed using ClustalX (ftp://ftp.ebi.ac.uk/pub/software/clustalw2). (c) Phylogenetic analysis of the deduced amino acid sequences of Symbiodinium and other representative species, based on the MSA. The phylogenetic tree was tested with 1000 bootstrap replicates (Felsenstein, 1989), and bootstrap values >50% are indicated at each node. Distances were estimated by maximum likelihood based on the Dayhoff PAM matrix. The scale bar for branch length (0.1 substitutions per site) is shown under the tree.

Taxonomic analyses

Our taxonomic analyses revealed that up to 72% of Symbiodinium transcripts aligned with sequences from publicly available sequence databases, whereas 25–35% had no hits. The majority of the aligned transcripts matched the symbiotic dinoflagellate data sets (96.6–98%), whereas a relatively small proportion of the aligned transcripts matched bacterial (0.7–1.3%) and coral (0.2–0.7%) sequences (Supplementary Table S2).

Several transcripts per clade could align to the same SP gene. As a result, the 1053 shared SP genes corresponded to 2597 (clade A), 3626 (B), 2864 (C) and 2513 (D) transcripts. These transcripts were compared with five other eukaryotic genomes/transcriptomes (BLASTn, E-value cutoff 10⁻⁵). Of the transcripts from each clade, 12–17% aligned to non-Symbiodinium sequences, including those of Homo sapiens and A. digitifera (Supplementary Table S5). More than 80% of the transcripts aligned only to Symbiodinium sequences (our Symbiodinium clades and S. minutum) and were not found in other eukaryotes. Multiple transcripts were annotated with the same UNIPROT ACCESSION IDs, so we counted each gene once. On that basis, 432 (41%) of the 1053 genes were unique to Symbiodinium when compared with the other analyzed eukaryotes (Supplementary Table S6). We also aligned the 1053 SP gene sequences themselves directly to the other eukaryote sequences. Of these SP genes, 95.6% (1007 of 1053) were present in other eukaryotes and 4.4% (46 of 1053) were specific to Symbiodinium species (Supplementary Table S7).

Discussion

The number of predicted genes for the four Symbiodinium clades was estimated here at approximately 30 000–47 000, within the range reported by others (Bayer et al., 2012; Shoguchi et al., 2013). The de novo assemblies were validated using conserved Symbiodinium HSP70 and HSP90 genes (Rosic et al., 2011a) and polymorphic hemoglobin genes (Rosic et al., 2013). Heat shock proteins are evolutionarily conserved molecular chaperones. They are important for regular cellular functions involving protein folding and unfolding, degradation and transport, as well as for stress responses (Sørensen et al., 2003). In contrast, globin genes are characterized by a high level of polymorphism, although at the protein level they maintain the globin fold and a conserved histidine residue (Royer et al., 2005). From our de novo assemblies, we recovered polymorphic hemoglobin-like genes with preserved globin domains (Supplementary Table S3).

Biological processes enriched in all four symbiotic dinoflagellates included photosynthesis, metabolism, transcription/translation and cytoskeletal interactions (Figure 1). Six pathways were also enriched in all four: the phosphatidylinositol signaling system, inositol phosphate metabolism, spliceosome, ribosome, endocytosis and sucrose metabolism (Table 3). The spliceosome pathway was also enriched in the photosynthetic dinoflagellate Alexandrium, highlighting the importance of RNA splicing in dinoflagellates (Zhang et al., 2014). These results point to some of the fundamental processes that may be needed for a successful Symbiodinium–coral interaction. Weber and Medina (2012) proposed that a symbiotic lifestyle within Symbiodinium–holobiont systems requires physiological adaptations in several pathways. These pathways relate to nutrient transfer, carbon concentration, nitrogen recycling, calcification, oxidative stress and cell communication. Symbiodinium, however, forms symbiotic associations with many marine species. Its hosts include metazoans (Cnidaria, Porifera, Mollusca and Acoelomorpha) as well as single-celled eukaryotes (foraminifera, radiolarians and ciliates). Consequently, the importance of these conserved pathways and processes for establishing this photosynthetic symbiosis remains to be fully elucidated in both symbiotic and nonsymbiotic states.

We expected to find orthologs of HKGs such as actin, calmodulin, tubulin, GAPDH and cyclophilin (Table 6) expressed in all four clades. These proteins play important roles in essential cellular processes, and their expression is usually stable irrespective of stress exposure (Sturzenbaum and Kille, 2001; Huggett et al., 2005). The actin gene is highly expressed and conserved in dinoflagellates (Kim et al., 2011). Because of their stable expression during stress exposure, actin and other HKGs have been used as reference genes in several recent gene expression studies of different Symbiodinium clades (Rosic et al., 2011b; McGinley et al., 2012; Sorek and Levy, 2012; Ogawa et al., 2013). Photosynthesis-related transcripts were also shared among the clades. They included genes for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) and peridinin-chlorophyll a-binding protein. Dinoflagellates possess a nuclear-encoded form II of Rubisco, and peridinin-chlorophyll a-binding protein is unique to dinoflagellates (Rowan et al., 1996; Leggat et al., 2011). Voolstra et al. (2009) used an RBBH approach to find orthologs between EST libraries of two Symbiodinium spp. (CassKB8 and C3). They found 132 potential orthologs that, similarly, included HKGs such as actin and cyclophilin and photosynthesis-related genes. Here, we identified several antioxidant genes common to the four coral dinoflagellates (Table 6). These included genes from the thioredoxin (Trx) superfamily, manganese superoxide dismutase (SOD Mn) and catalase (CAT). Trx genes have also been found in Symbiodinium clades A and B (Bayer et al., 2012). SOD and CAT genes are known to be involved in the oxidative stress response and in scavenging reactive oxygen species (Lesser and Shick, 1989; Lesser, 2006; Levy et al., 2006).
Although there are different forms of SOD metalloproteins (McCord and Fridovich, 1969), the conserved mitochondrial SOD Mn form is also common in bacteria and many eukaryotic algae. Together with Fe SOD, it is considered an evolutionarily ancient form of SOD (Lesser, 2006). CAT is a heme-containing enzyme that catalyzes the conversion of hydrogen peroxide to water and oxygen; similarly, peroxidases catalyze the conversion of hydrogen peroxide to water (Lesser, 2006). In the chloroplast, Trx enzymes regulate the activity of photosynthetic enzymes via ferredoxin–thioredoxin reductase (Arnér and Holmgren, 2000). In legume roots, antioxidant proteins such as SOD, CAT and thioredoxin play an important role during nodule formation. Their expression increases in nodules, where they are needed to lower reactive oxygen species levels (Lee et al., 2005). Reducing expression of the thioredoxin gene by RNA interference impaired nodule formation (Lee et al., 2005). In root nodules, the symbiotic relationship occurs between bacteria and plants. By analogy, the conservation of antioxidant genes across Symbiodinium clades suggests that these antioxidants may be important in establishing the coral–algal symbiosis.
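
For reference, the antioxidant enzymes discussed above catalyze the following standard overall reactions. These are textbook stoichiometries added for illustration. In the peroxidase reaction, AH denotes a generic hydrogen donor; it is a placeholder, not a substrate identified in this study.

```latex
\begin{align*}
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\xrightarrow{\text{SOD}} \mathrm{H_2O_2} + \mathrm{O_2} \\
2\,\mathrm{H_2O_2} &\xrightarrow{\text{CAT}} 2\,\mathrm{H_2O} + \mathrm{O_2} \\
\mathrm{H_2O_2} + 2\,\mathrm{AH} &\xrightarrow{\text{peroxidase}} 2\,\mathrm{H_2O} + 2\,\mathrm{A}
\end{align*}
```

The reactions show why SOD and CAT act in sequence. SOD converts superoxide into hydrogen peroxide, which CAT (or a peroxidase) must then remove. The peroxidase reaction differs from catalase in requiring a reductant instead of releasing oxygen.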

The discovery of calcium-dependent protein kinases among the shared Symbiodinium transcripts is consistent with previous reports of these enzymes in plants, ciliates and some other protists (Harper and Harmon, 2005). In the Apicomplexa, Ca2+ acts as a second messenger via a range of calcium-dependent protein kinases. It initiates a number of signaling processes that are important for communication between eukaryotic cells (Nagamune and Sibley, 2006). Calcium signaling and calcium-regulated protein kinases also play a critical role in establishing plant–microbe symbioses, especially in the plant–rhizobium system (Harper and Harmon, 2005; Oldroyd et al., 2009). In legumes, CCaMK plays a role in establishing both rhizobial and mycorrhizal symbioses (Mitra et al., 2004). In symbiotic dinoflagellates, CCaMK was conserved (Table 6) and represented by several isoforms organized into two evolutionarily distant groups (Figure 3). However, CCaMKs have also been found in nonsymbiotic photosynthetic dinoflagellates such as Karenia brevis, where they function in intracellular signaling pathways (Lidie et al., 2005). They also occur in the coral host (Shinzato et al., 2011; Dunlap et al., 2013), where they play a role in coral sperm motility (Morita et al., 2009). Interestingly, CCaMKs are highly conserved in plants that establish rhizobial symbiosis with nitrogen-fixing soil bacteria. They are not conserved in nonleguminous plants such as Arabidopsis and wheat, which lack this type of symbiotic interaction (Mitra et al., 2004). In plants, mutants of DMI3, the gene encoding CCaMK, cannot establish symbiosis with mycorrhizal fungi (Mitra et al., 2004). By analogy, in photosynthetic symbioses such as that between corals and Symbiodinium, CCaMK may act through Ca2+ signaling to enable molecular recognition and the establishment of the coral–algal symbiosis.

Genes from the enriched KEGG pathway ‘Phosphatidylinositol signaling system’ were identified among the orthologous genes shared by all four Symbiodinium clades. They encode proteins including phosphatidylinositol 4-phosphate 5-kinases, phosphatidylinositol 4-kinase gamma 5, phosphatidate cytidylyltransferase and myo-inositol oxygenase (Supplementary Table S4). A notable characteristic of these genes is that they encompass multiple copies of the membrane occupation and recognition nexus (MORN) repeat. Proteins containing MORN repeats could be involved in cell division. In the parasite Toxoplasma gondii, they are located in the cell division apparatus and are proposed to mediate cytoskeletal interactions between parasite and host (Gubbels et al., 2006; Takeshima et al., 2000). In plants, the MORN repeat motif occurs in the ARC3 gene, which is important for chloroplast replication (Shimada et al., 2004). In addition, symbiotic interaction and signaling between legumes and rhizobacteria are mediated via phosphatidylinositide-regulated endocytosis (Peleg-Grossman et al., 2007). Glycosylphosphatidylinositol, in turn, is important for cell recognition and for the detection of microbes by the host (Davy et al., 2012). Our results are consistent with an important role for the conserved ‘Phosphatidylinositol signaling system’ pathway in symbiotic dinoflagellates and suggest its possible involvement in symbiotic interactions. Conclusive evidence could come from in vitro studies using mutants or silencing of particular genes from this pathway. These would be followed by an evaluation of the capacity of Symbiodinium to establish a symbiotic interaction with the host.

Almost 2% of the sequences from the Symbiodinium cultures matched bacterial and other non-dinoflagellate sequences (Supplementary Table S2). This may indicate gene acquisition via horizontal gene transfer or endosymbiosis. Both have been shown to be important mechanisms for acquiring innovative features during dinoflagellate evolution (Wisecaver and Hackett, 2011; Wisecaver et al., 2013). However, matches arising from highly conserved genes cannot be excluded.

Of the transcripts corresponding to the 1053 genes conserved in all four Symbiodinium clades, up to 17% aligned to the five other eukaryotic genomes/transcriptomes (Supplementary Table S5). Approximately 80% of the transcripts matched only our transcriptomes and the genome of Symbiodinium minutum (Shoguchi et al., 2013). They were found neither in the nonsymbiotic dinoflagellate Alexandrium minutum nor in the unicellular red alga Cyanidioschyzon merolae. Of the 1053 conserved SP genes, 432 were present only in Symbiodinium. The lack of alignment between these Symbiodinium transcripts and the sequences of the other analyzed eukaryotes does not exclude the presence of the associated SP genes in other organisms. Indeed, many of these genes encode proteins involved in housekeeping and the maintenance of regular cellular functions. The lack of alignment does, however, suggest sequence and possibly functional divergence of these genes within symbiotic dinoflagellate lineages (Supplementary Tables S6 and S7). These results therefore provide a potential pool of symbiosis-related transcripts that should be evaluated under symbiotic conditions. Future studies are also needed to explore the genetic basis of differences among Symbiodinium clades.

In this research, we used a conservative approach, targeting and evaluating functions and pathways conserved across all four Symbiodinium clades analyzed. This was intended to avoid possible bias arising from three circumstances:
1. Cultured Symbiodinium tend to represent only a portion of the in hospite population (Santos et al., 2001).
2. The cultures used here were polyclonal, which inevitably introduces additional genetic variability.
3. Bacterial contamination remains possible despite antibiotic treatment (Shoguchi et al., 2013).

Evaluating only conserved, shared and well-described proteins of these symbiotic dinoflagellates is one piece of a puzzle that includes many different genes and pathways. Nevertheless, these results provide a foundation for recognizing similarities among coral endosymbiotic algae. They also help explain the distinctive capacity of these algae, among dinoflagellates, to establish symbiotic relationships with corals and other marine species.