Introduction

Taxonomy is critical for quantifying biodiversity, deducing underlying ecological and evolutionary processes, improving conservation practices, and ultimately enhancing our understanding and appreciation of the natural world [1, 2]. Improved taxonomic resolution through the use of genetic data has greatly accelerated recognition of the processes governing the ecology and evolution of long established and more recently discovered species of microbial eukaryote [3,4,5,6,7]. However, formal descriptions of morphologically ambiguous eukaryotic microbes from many cryptically diverse and/or evolutionarily divergent phyla remain limited. This is especially the case for mutualisms between microalgae (i.e. dinoflagellates) and their marine invertebrate hosts, whose partnerships are the foundation of coral reef ecosystems around the world. Revisions to the systematic framework of these symbionts have helped to focus recent scientific discussions of these microbes [8,9,10,11,12,13], but much of their species taxonomy remains unresolved and undermines scientific efforts to improve awareness about their biology.

Because the species is the foremost taxonomic unit in biology, considerable importance is placed on its correct resolution. Ideally, a robust approach to defining a species supplies evidence that satisfies each of the morphological, ecological, phylogenetic, and biological species concepts. Each species definition emphasizes different stages in the process of speciation, including genetic isolation, population subdivision, and lineage divergence [14], and resulting changes in morphology and ecology. However, resolving single-celled species is often hindered by their size, lack of morphological features, and difficulties in culturing. Accepting these limitations, genetic data sourced from multiple independent phylogenetic and population genetic markers ultimately provide the best proxy for objectively delimiting species of eukaryotic microbes [3, 15].

When resolved to species, microalgal endosymbionts can serve as model systems to examine processes of dispersal, gene flow, and speciation among unicellular organisms [16]. A wide range of marine invertebrates that are mutualistic with photosynthetic dinoflagellates (Symbiodiniaceae; Dinophyceae) occur in shallow marine habitats across the world’s oceans. Reef-building corals are amongst the most diverse and abundant group of hosts to these symbionts and dominate tropical marine habitats where they build the framework supporting coral reef ecosystems. Dense dinoflagellate populations found in the tissues of corals provide sufficient biomass harvested from small samples that are easily and repeatedly obtained over space and time from hosts living in distinct habitats [17, 18]. The widespread distribution of many host coral species provides, in effect, a suite of biotic and abiotic environments from which to sample microalgal symbionts, allowing detailed investigations of their diversity, connectivity, ecology, and evolution over broad geographic scales and exposure to a range of physical–environmental conditions.

Worldwide, the genus Cladocopium is the foremost group of dinoflagellates associated with marine invertebrates. This genus has been characterized extensively using several genetic markers, which indicate that it comprises numerous genetically distinct, albeit closely related, lineages created by several adaptive radiations since the late Miocene [19,20,21,22,23,24,25]. Genetically distinct Cladocopium lineages, many referred to by their ‘ITS2 type’ designation, vary in their biogeographic distributions and in their ecology. Indeed, genetically distinct Cladocopium exhibit clear differences in prevalence across reef environments and each associates with distinct kinds of host taxa [26]. Correspondingly, these Cladocopium often exhibit marked differences in physiology [27,28,29,30]. Thus, collective differences in their physiological, ecological, and biogeographic traits define genetic ‘types’ as distinct biological entities and signify that they are discrete species [19, 20]. By this reasoning, the genus Cladocopium likely contains hundreds of species [25, 31].

Niche diversification via host specialization often appears to be the main cause of symbiont speciation [31,32,33]. To evaluate this mechanism in more detail, research presented herein examined Cladocopium associated with widespread coral species in the genus Pocillopora. These reef corals are prevalent throughout the Indo-Pacific [34], capable of long-range dispersal [35], rapid growth, relatively fast generational turnover [36], and may display marked differences in stress tolerance among individual colonies (e.g., sensitive or resistant to episodes of rapid ocean warming) [37, 38]. The extensive branching morphology of colonies provides critical habitat for many kinds of reef fish and invertebrates [39, 40]. Moreover, they are early colonizers repopulating reefs damaged by typhoons and events of mass coral bleaching and mortality [41]. They are therefore subjects of frequent physiological, ecological, and genetic investigations [42,43,44,45]. Species of Pocillopora often harbor specific ‘ITS2 types’ of Cladocopium not associated with other coral taxa [19, 20, 26, 46,47,48]. Here, we use a combination of genetic markers for phylogenetic and population genetic analyses to formally describe two new sibling species of Cladocopium. Each exhibited host fidelity over vast geographic ranges under varied environmental conditions and their age of origin is estimated using a calibrated molecular clock. With this clear taxonomic insight, we discuss the implications of reef corals dependent on symbionts that have adapted to, and evolved with, hosts through millions of years of changing climate.

Methods

Specimen collection

Pocillopora verrucosa (mt-ORF Type 3; n = 28), Pocillopora acuta (mt-ORF Type 5; n = 30), Pocillopora damicornis (mt-ORF Type 4; n = 15), Pocillopora meandrina (mt-ORF Type 1; n = 10), Pocillopora grandis (=eydouxi, mt-ORF Type 1; n = 64), and Pocillopora sp. (no mt-ORF data; n = 11) colonies were sampled by SCUBA at 17 sites throughout the Indo-Pacific between 2004 and 2017 (Fig. 1A and Table S1); many of these samples have been included in previous studies [19, 20, 32, 38, 47, 49,50,51,52,53,54]. Fragments of coral between 1 and 3 cm² were collected using a hammer and chisel, preserved in 20% DMSO buffer or 100% ethanol, and stored at −20 °C.

Fig. 1: Collection locations, light micrographs, and cell sizes of symbiotic dinoflagellates from corals in the genus Pocillopora.

A Indo-Pacific collection locations of Pocillopora corals. Arrows correspond to major ocean currents that influence dispersal and connectivity. B Light micrographs of cells representative of Cladocopium latusorum sp. nov. and C cells representative of Cladocopium pacificum sp. nov. Scale bar = 20 µm. D Size differences between C. latusorum (white circles) and C. pacificum (gray triangles). Each symbol corresponds to mean cell dimensions from independent samples. Error bars represent 95% confidence intervals.

Cell imaging and size measurements

Cells of Cladocopium were isolated from preserved fragments of both P. verrucosa and P. grandis. Because cell size can vary with latitude, only representative cells isolated from host colonies in similar habitats in Palau were used. Ten microliters of macerated tissue was drawn into a pipette tip and placed on a microscope slide under a coverslip. The cells were visualized using bright-field microscopy on an Olympus BX51 microscope (Olympus Corp., Tokyo, Japan), and imaged using the ORCA ER (Model C4742-80) and Olympus DP71 at the Penn State Microscopy Facility. The lengths and widths of at least 55 cells from 5 colonies per host species harboring each symbiont type were measured using ImageJ, and their averages and standard deviations were calculated in Microsoft Excel.
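The summary statistics described above can also be computed outside a spreadsheet. The following is a minimal Python sketch, assuming hypothetical length measurements (the values are illustrative and are not data from this study) and a normal-approximation 95% confidence interval of the mean; the function name `summarize` is likewise an assumption.

```python
import math
import statistics

def summarize(measurements_um):
    """Return the mean, sample SD, and 95% CI half-width of the mean."""
    n = len(measurements_um)
    mean = statistics.mean(measurements_um)
    sd = statistics.stdev(measurements_um)   # sample SD (n - 1 denominator)
    ci95 = 1.96 * sd / math.sqrt(n)          # normal approximation
    return mean, sd, ci95

# Hypothetical cell lengths in micrometres (illustrative only).
lengths = [10.1, 10.8, 11.3, 9.9, 10.6, 11.0, 10.4, 10.9]
mean, sd, ci95 = summarize(lengths)
print(f"length = {mean:.2f} um (SD {sd:.2f}; 95% CI +/- {ci95:.2f}; n = {len(lengths)})")
```

Applying the same function to widths, colony by colony, would give one mean per independent sample, which is the quantity plotted per symbol in Fig. 1D.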

Genomic DNA extraction, rRNA gene fingerprinting, amplification, and sequencing

Genomic DNA was extracted from a 1 cm² fragment of each sample using a modified Promega Wizard protocol [46, 55] (Promega, Madison, WI). The Pocillopora-specific mitochondrial open reading frame (mt-ORF), which resolves many Pocillopora species [19, 42, 45, 49], was amplified to confirm host species (Table S2).

The symbiont ‘type’ in each sample was initially identified using the internal transcribed spacer 2 (ITS2) of the Symbiodiniaceae rRNA gene region (primers and thermal cycling conditions listed in Table S2). Amplified products were electrophoresed using a CBScientific system with 45–80% denaturing gradient gels. Resultant gels were visualized using SYBR Green stain, and the characteristic fingerprint of each sample was compared to known standards and/or sequenced to confirm its ITS2 “type” designation (cf. [56]). Cladocopium samples identified as ITS2 “types” C1c, C1b-c, C42, C42a, C42b and C1c-ff, C1c-42-jj, C1c-42-ff, C1d, and C1c-d-t were analyzed with additional genetic markers.

Genetic markers including the D1/D2 domain of the large ribosomal subunit (LSU rRNA gene), internal transcribed spacer 2 (ITS2) of the rRNA gene region, partial chloroplast large-subunit cp23S, mitochondrial cytochrome b (cob), mitochondrial cytochrome oxidase 1 (cox 1), and non-coding plastid minicircle (psbAncr) were amplified using the primers and protocols listed in Table S2. The mitochondrial cob and cox 1 are conserved genes commonly used to distinguish between members of different genera, while the plastid cp23S and the nuclear ITS2 are often applied for species designations within a genus. ITS2 has been used to distinguish lineages of Symbiodiniaceae, though intragenomic variation often limits its use in reliable species designation. The nuclear LSU rRNA gene is commonly used to distinguish species across genera of Symbiodiniaceae, while the psbAncr is a rapidly evolving region that provides resolution between species as well as phylogeographic resolution within species [25, 57, 58].

Amplicons were Sanger sequenced in the forward direction only for LSU, cp23S, cob, and cox 1, in the reverse direction for mt-ORF, and in both directions for psbAncr using the BigDye Terminator 3.1 Cycle Sequencing Kit (ThermoFisher Scientific, Waltham, MA). Sequences were analyzed on an Applied Biosystems sequencer (Applied Biosystems, Foster City, CA) at the Penn State University Genomics Core Facility. Sequences were checked and edited in Geneious v 11.0.5 (Biomatters Ltd, Auckland, NZ).

DNA alignment, phylogenetic analysis, and divergence dating

The ITS2, cp23S, cob, cox 1, and LSU sequences were concatenated and aligned with the same genes, concatenated in the same order, obtained from Cladocopium goreaui, Cladocopium infistulum, and Cladocopium thermophilum. PAUP v 2.3 [59] was used to create an unrooted maximum parsimony phylogeny based on a heuristic search. Gaps (and insertions) were treated as a 5th character and scored as one change. Bootstrap values based on 1000 iterations were evaluated to statistically assess branch support. Maximum likelihood was also employed to compare branching topologies produced by a different phylogenetic reconstruction. Branch support was also assessed by Bayesian Inference using MrBayes v3.2.1 [60], implementing the General Time Reversible model. Each MCMC analysis was run for 1.0 × 10⁶ generations and sampled every 100 generations. The first quarter of trees were discarded as burn-in, corresponding with the convergence of chains.
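To make the gap-scoring rule concrete, the sketch below counts changes between two aligned sequences in Python. It is one reading of the rule stated above, not the PAUP implementation: a substitution counts as one change per site, and a contiguous run of gap-versus-base columns counts as a single change. The function name and example sequences are hypothetical.

```python
def count_changes(seq_a, seq_b):
    """Count changes between two aligned sequences of equal length.

    Substitutions are scored per site. A contiguous run of columns in which
    the same sequence carries a gap is scored as one change (assumed rule).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    changes = 0
    prev_gap_side = None  # which sequence held the gap in the previous column
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if (a == "-") != (b == "-"):          # exactly one sequence has a gap
            side = "a" if a == "-" else "b"
            if side != prev_gap_side:         # a new indel event starts here
                changes += 1
            prev_gap_side = side
        else:
            prev_gap_side = None
            if a != b:                        # ordinary substitution
                changes += 1
    return changes

# Hypothetical aligned fragments (illustrative only).
print(count_changes("ACGTACGT", "AC---CGA"))
```

Columns where both sequences carry a gap are ignored, since they carry no information about the pair being compared.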

Sequences of the entire psbAncr were initially aligned using ClustalW2 (https://www.ebi.ac.uk/Tools/msa/clustalw2/) and optimized manually. MrModelTest v2 [61] was used to estimate the best model of nucleotide substitution, and the Akaike Information Criterion was used to parameterize Bayesian Evolutionary Analysis 2 (BEAST v2.5.2) [62, 63]. Both strict and relaxed clock models were tested under an HKY + G substitution model and calibrated based on the closing of the Central American Seaway. Formation of the Isthmus of Panama land barrier is dated to between 4.2–4.7 million years ago (mya) [64] and 3.1 mya at the latest [65, 66]; it isolated once-connected populations of Eastern Pacific and Caribbean coral species, such as Porites panamensis and Porites colonensi, which contain Cladocopium symbionts. Following this timing, a gamma distribution with an alpha of 2.0 and a beta of 1.0, offset by 3.0 million years (my), was used. The MCMC chain was run for 100,000,000 generations and sampled every 1000 generations. The first 10% of iterations were discarded as burn-in, and convergence of independent runs with random starting seeds was assessed using TRACER v1.7.1. TreeAnnotator [67] was used to find the maximum clade credibility tree, which was visualized in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). Densitree v2.2.6 [68] was used to visualize tree posteriors.
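The shape of the calibration prior and the size of the retained posterior sample follow directly from the numbers given above. The Python sketch below works them out using only the standard library. Because beta is 1.0, the scale and rate parameterizations of the gamma distribution coincide, so no assumption about which one was used is needed; the closed-form CDF in the code holds only for alpha = 2 and beta = 1. Function and variable names are hypothetical.

```python
import math

ALPHA = 2.0    # gamma shape, from the text
BETA = 1.0     # gamma scale (identical to the rate when beta = 1)
OFFSET = 3.0   # offset in million years, from the text

def prior_cdf(age_my):
    """CDF of the offset Gamma(2, 1) prior at a node age in million years."""
    x = age_my - OFFSET
    if x <= 0:
        return 0.0
    # Closed form valid only for shape 2 and scale 1.
    return 1.0 - math.exp(-x) * (1.0 + x)

def prior_quantile(p, lo=OFFSET, hi=OFFSET + 50.0):
    """Invert the CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if prior_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

prior_mean = OFFSET + ALPHA * BETA
prior_mode = OFFSET + (ALPHA - 1.0) * BETA
print(f"prior mean {prior_mean:.1f} my, mode {prior_mode:.1f} my")
print(f"central 95% of prior: {prior_quantile(0.025):.2f} to {prior_quantile(0.975):.2f} my")

# Posterior sample bookkeeping (ignoring the logged initial state).
CHAIN_LENGTH = 100_000_000
SAMPLE_EVERY = 1_000
n_logged = CHAIN_LENGTH // SAMPLE_EVERY
n_burnin = n_logged // 10          # first 10% discarded
print(f"{n_logged - n_burnin} samples retained after burn-in")
```

The offset places zero prior probability on calibration ages younger than 3.0 my, consistent with treating the completed land barrier as a minimum age.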

Microsatellite analysis

One hundred and eighty-four samples from 5 of the 17 locations (Palau, Hawaii, Australia, Mexico, and Panama) were analyzed using eight previously published microsatellite markers including C1.05 [69], Sgr_21, Sgr_33, Sgr_37, Sgr_40, SgrSpl_22, SgrSpl_25, and SgrSpl_30 (Table S3) [23, 69]. Microsatellites were amplified according to corresponding published annealing temperatures listed in Table S3 and fragment sizes were analyzed using a LIZ500 base-pair standard on the ABI 3730 genetic analyzer (Applied Biosystems, Foster City, CA) at the Penn State Genomics Core Facility. The Geneious v 11.0.5 microsatellite plugin was used to score allele sizes. GenAlEx v6.5 [70] was used to identify and remove clones so only individual multi-locus genotypes (MLGs) were analyzed (Supplemental Data). One hundred and seven unique MLGs were retained and GenAlEx v6.5 [70] was used to calculate summary statistics including number of different alleles, number of effective alleles, observed and expected heterozygosity, and probability of identity (Table 1). Two alleles were frequently observed at each locus, consistent with previous Cladocopium microsatellite studies [23, 25, 51, 71] and the hypothesis that the Cladocopium genome is all or partially duplicated [72, 73]. A test for linkage disequilibrium was conducted for each pair of loci in each lineage using the log-likelihood ratio statistic with GenePop on the Web (http://genepop.curtin.edu.au/) [74, 75] (Supplemental Material).
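As an illustration of two of the summary statistics named above, the Python sketch below computes expected heterozygosity and probability of identity from allele frequencies using the standard random-mating formulas. Whether these match the exact options used in GenAlEx for this dataset is an assumption, and the locus names and allele sizes are hypothetical.

```python
from collections import Counter
from math import prod

def allele_frequencies(alleles):
    """Relative frequency of each allele size observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p ** 2 for p in freqs.values())

def probability_of_identity(freqs):
    """Single-locus PI = 2 * (sum p_i^2)^2 - sum p_i^4."""
    s2 = sum(p ** 2 for p in freqs.values())
    s4 = sum(p ** 4 for p in freqs.values())
    return 2.0 * s2 ** 2 - s4

# Hypothetical allele sizes (bp) pooled across individuals, per locus.
loci = {
    "locus_A": [150, 150, 154, 154, 158, 162],
    "locus_B": [201, 205, 205, 209, 209, 213],
}
per_locus_pi = []
for name, alleles in loci.items():
    freqs = allele_frequencies(alleles)
    per_locus_pi.append(probability_of_identity(freqs))
    print(name, "He =", round(expected_heterozygosity(freqs), 3))

# Multi-locus PI is the product across loci, assuming independent loci.
print("multi-locus PI:", prod(per_locus_pi))
```

Multiplying across loci is why even a modest panel of variable markers drives the multi-locus probability of identity toward very small values.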

Table 1 Microsatellite summary statistics.

Two separate and unsupervised (i.e. without a priori designations such as species, host species, or geography) clustering methods were used to determine the number of clusters of multi-locus genotypes and the probability of membership of each sample to those clusters: STRUCTURE, and principal components analysis (PCA) followed by embedding with t-Distributed Stochastic Neighbor Embedding (t-SNE). First, Bayesian clustering analyses of the allele frequencies at each locus were performed with STRUCTURE v2.3.4 [76], assuming no admixture, no location prior for seeding initial groupings, and no correlation of allele frequencies. The analyses ran 1,000,000 iterations with a burn-in of 10,000 for 100,000 replicates for each of the tested K values from 1 to 10. The results of this analysis were plotted using the PlotSTR script [51]. To confirm consistency between analytical methods, a second unsupervised clustering technique was utilized. A multi-locus genotype table was used to run a PCA, after which the principal components (PCs) were embedded into 2 dimensions using t-SNE. This technique utilizes all PCs, weighted by the amount of variance explained, with the aim of preserving high-dimensional structure [77]. A series of Gaussian mixture models (K = 1–10) was then fit to these embeddings to determine the number of groups as well as the probability of membership of each sample to each group. To determine the alleles at each locus responsible for clustering, the results of the clustering techniques were then analyzed using the BayesAllele script [51] to develop allele frequency by cluster plots. These analyses were performed in R (R Development Core Team 2018) and code for these analyses is available at https://github.com/KiraTurnham/ISME.Turnham.et.al.2021. Also available at this site are all supplemental datasets including multi-locus microsatellite alleles, summary statistics, DNA sequence nexus files, and BEAST input files.
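A PCA needs a numeric matrix, and the text does not state how the multi-locus genotype table was encoded. One common choice, shown here as an assumption rather than as the authors' R code, is an allele presence/absence matrix with one column per locus and allele combination. The Python sketch below builds such a matrix; sample names and allele sizes are hypothetical, and the PCA, t-SNE, and Gaussian mixture steps themselves are not reproduced.

```python
def encode_presence(genotypes):
    """Build an allele presence/absence matrix.

    genotypes: dict mapping sample -> dict mapping locus -> tuple of alleles.
    Returns (columns, rows): columns is a sorted list of (locus, allele)
    pairs and rows maps each sample to a list of 0/1 values.
    """
    columns = sorted({
        (locus, allele)
        for loci in genotypes.values()
        for locus, alleles in loci.items()
        for allele in alleles
    })
    rows = {
        sample: [1 if allele in loci.get(locus, ()) else 0
                 for locus, allele in columns]
        for sample, loci in genotypes.items()
    }
    return columns, rows

# Hypothetical genotypes at two of the loci named in the text.
genotypes = {
    "sample_1": {"Sgr_21": (150, 154), "C1.05": (200,)},
    "sample_2": {"Sgr_21": (150,), "C1.05": (200, 204)},
}
columns, rows = encode_presence(genotypes)
print(columns)
for sample, row in rows.items():
    print(sample, row)
```

This encoding accommodates samples that carry two alleles at a locus, as is frequently observed in Cladocopium, without requiring a ploidy assumption.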

Results

Taxonomic summary

Cladocopium latusorum Turnham, Sampayo & LaJeunesse, sp. nov. (Fig. 1B)

Phylum Dinoflagellata

Class Dinophyceae

Order Suessiales

Family Symbiodiniaceae

Genus Cladocopium LaJeunesse and Jeong 2018 [78]

Species Cladocopium latusorum Turnham, Sampayo & LaJeunesse, sp. nov.

Description

Vegetative ovate cells range in mean length between 10.5 µm (±0.72 µm; SD) and 11.2 µm (±0.68 µm; SD), and range in mean widths from 9.8 µm (±0.85 µm; SD) to 9.9 µm (±0.67 µm; SD) (Fig. 1B, D; Table S4). Nucleotide sequences of the large ribosomal subunit rRNA gene (GenBank MW711730–MW711732), abundant intragenomic variants of the internal transcribed spacer 2 of the rRNA gene region (MW757029–MW757034), partial chloroplast large-subunit, cp23S (MW711735), mitochondrial cytochrome b, cob, (MW713374), mitochondrial cox 1 (MW713056), and the full sequence of the psbAncr (MW819755–MW819779), genetically define this species. Two haploid alleles typically occur at microsatellite loci.

Holotype

The designated holotype are cells within DMSO preserved tissues of P. grandis deposited in the Algal Collection of the National Herbarium, Smithsonian Institution, Washington DC, USA; and assigned the catalog number: US Alg. Coll 227752.

Type locality

Republic of Palau, Oceania, Pacific Ocean (7°14.93′N, 134°14.149′E).

Habitat

Associated predominantly with the broadcasting corals P. grandis and P. meandrina, and also forms mutualisms with the brooding species P. damicornis and P. acuta.

Etymology

From the Latin word latus, meaning broad, in reference to its broad longitudinal distribution from the western Indian Ocean and Red Sea to the eastern Pacific Ocean.

Other notes

Species previously referred to in the literature as ITS2 “type” C1c, C1b-c, C42, C42a, C42b and C1c-ff, C1c-42-ff.

Cladocopium pacificum Turnham & LaJeunesse, sp. nov. (Fig. 1C)

Phylum Dinoflagellata

Class Dinophyceae

Order Suessiales

Family Symbiodiniaceae

Genus Cladocopium LaJeunesse and Jeong 2018 [78]

Species Cladocopium pacificum Turnham & LaJeunesse, sp. nov.

Description

Vegetative ovate cells range in mean length between 10.3 µm (±0.70 µm; SD) and 10.6 µm (±0.77 µm; SD), and range in mean widths from 9.3 µm (±0.77 µm; SD) to 9.6 µm (±0.76 µm; SD) (Fig. 1C, D and Table S4). Nucleotide sequences of the large ribosomal subunit rRNA gene (GenBank MW711733, MW711734), abundant intragenomic variants of the internal transcribed spacer 2 of the rRNA gene region (MW757035–MW757037), partial chloroplast large subunit, cp23S (MW711736), mitochondrial cytochrome b, cob, (MW713375), mitochondrial cox 1 (MW713057), and full sequence of the psbAncr (MW861711–MW861727), genetically define this species. Two haploid alleles typically occur at microsatellite loci.

Holotype

The designated holotype are cells within DMSO preserved tissues of P. verrucosa deposited in the Algal Collection of the National Herbarium, Smithsonian Institution, Washington DC, USA; and assigned the catalog number: US Alg. Coll 227753.

Type locality

Republic of Palau, Oceania, Pacific Ocean (7°14.93′N, 134°14.149′E).

Habitat

Associated with the broadcasting coral P. verrucosa and the brooding species P. acuta and P. damicornis.

Etymology

From the Latin word pacificus meaning peacemaking, peaceful, of peace.

Other notes

Species previously referred to in the literature as ITS2 “type” C1d, C1d-t.

Morphology

Cells of both species were spherical ovate and brown in pigmentation (Fig. 1B, C). Mean cell sizes of C. latusorum obtained from independent samples of P. grandis (=eydouxi) in Palau were larger than those of C. pacificum obtained from nearby colonies of P. verrucosa (Fig. 1D; Table S4). Each species has a single pyrenoid and a reticulated chloroplast that envelops the periphery of the cell cytoplasm.

Phylogenetic relationships based on conserved genes

The examination of a sequence alignment comprising nuclear, chloroplast, and mitochondrial genes resolves discrete phylogenetic lineages separate from currently described species of Cladocopium (Fig. 2A). Fixed nucleotide differences in the rRNA gene region (LSU and ITS2) resolve each species lineage. LSU sequences differed between C. latusorum and C. pacificum by 4–5 base substitutions and were distinct from C. goreaui and C. thermophilum by 8 base substitutions and from C. infistulum by 9 base substitutions (Fig. 2A). Direct Sanger sequencing of the D1/D2 variable domain of LSU shows intragenomic variation within each species; some nucleotide positions share two possible bases (mostly transitions T/C and A/G) and are viewed as double peaks in electropherograms. This inter-individual, intragenomic variation occurs consistently among genotypes obtained from locations separated by thousands of kilometers (Fig. 1A and Table S1).

Fig. 2: Phylogenetic and population genetic data resolving two species of Cladocopium.

A Unrooted phylogenetic reconstruction inferred from aligned concatenated DNA including rRNA genes (ITS2 and LSU), partial chloroplast cp23S, mitochondrial cob, and cox 1 genes, showing the relationship of Cladocopium latusorum and C. pacificum with other described species of Cladocopium. B Two-dimensional representation of Principal Components calculated for the multi-locus genotypes using t-SNE [77]. C STRUCTURE plot (K = 2) showing reproductive isolation between Cladocopium latusorum and C. pacificum in the Pacific Ocean. Vertical bars represent the probability of assignment of one individual to a distinct population. D Bubble plots show differences in allele frequencies between species for each of 8 microsatellite loci. Analyses were conducted using scripts available at https://github.com/KiraTurnham/ISME.Turnham.et.al.2021.

The ITS2 for both species is characterized by high levels of intragenomic variation. Denaturing gradient gel electrophoresis of PCR products or next-generation sequencing provides a profile of the state of fixed sequence heterogeneity in the genomes of each species (e.g., see figure four in LaJeunesse et al. [79]). The genomes of C. latusorum are typically dominated by three or more co-abundant ITS2 sequences that together are diagnostic for this species [20, 79]. The ancestral “C1” sequence is most frequently paired with two co-dominant sequences designated “b” and “c”. Alternatively, the genomes of certain genotypes contain additional co-dominant sequences including “C42” and related sequence derivatives, including “42a” and “42b” on the Great Barrier Reef [20] or ‘ff’ on the Western Australian coast [50]. Evidence from more rapidly evolving genetic markers indicates that these differences in ITS2 sequence and/or differences in the relative abundances of intragenomic variants constitute examples of inter-individual variation, i.e. population-level differences within a species. Compared to C. latusorum, the ribosomal array of C. pacificum is much less ‘fragmented’, meaning that less intragenomic variation exists in the ITS2 region, corresponding to fewer co-dominant repeats. Similar to C. latusorum, it possesses the ancestral “C1” sequence, which is co-dominant with a second sequence designated “d”. Some profiles of the rRNA gene array from C. pacificum may additionally contain a low abundance of the “c” sequence variant.

Sequences of conserved plastid and mitochondrial genes (cp23S, cob, cox 1) provided limited phylogenetic resolution. Sequences of the partial chloroplast 23S (cp23S) were identical for C. latusorum, C. pacificum, C. goreaui, and C. thermophilum, while distinct from C. infistulum by three base substitutions (Fig. 2A). Sequences of the mitochondrial cox 1 were identical for both C. latusorum and C. pacificum but differed from C. goreaui by seven fixed base substitutions (Fig. 2A). The mitochondrial cob was also identical for C. latusorum and C. pacificum but differed from C. goreaui by four base substitutions. The combination of rRNA gene and mitochondrial sequences identified C. latusorum and C. pacificum as a monophyletic group divergent from other described Cladocopium species (Fig. 2A). The nexus file containing alignments of concatenated ITS2, LSU, cp23S, cob and cox 1 sequences for each formally described species of Cladocopium is available in the Supplemental Data.

Microsatellite multi-locus genotypes

One hundred and seven multi-locus genotypes were assembled using neutral microsatellite loci from samples obtained at 5 locations spanning the Pacific Ocean (Supplemental Data). Markers in both species showed high heterozygosity, with a mean observed heterozygosity of 0.832 for C. latusorum and 0.754 for C. pacificum (Table 1). The probability that non-clonal individuals have the same multi-locus genotype (probability of identity) was equal to 3.3E−09 for C. latusorum and 4.9E−09 for C. pacificum (Table 1). Tests for linkage disequilibrium were significant for only one pair of loci (22 and 40, p = 0.06377; Supplemental Material). Furthermore, the number of recovered genotypes increases rapidly as loci are cumulatively subsampled, producing a curve consistent with a population of sexually recombinant individuals (Fig. S1).
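The genotype accumulation curve referred to above can be sketched as follows. This Python example counts the distinct multi-locus genotypes recovered for every subset of k loci and averages over subsets; using all subsets rather than random subsampling is an assumption made so the result is deterministic, and the genotype data are hypothetical.

```python
from itertools import combinations

def accumulation_curve(mlgs):
    """Mean number of distinct genotypes recovered with k loci, k = 1..L.

    mlgs: list of tuples, one tuple of per-locus genotypes per sample.
    Returns a list whose entry k-1 is the mean over all subsets of k loci.
    """
    n_loci = len(mlgs[0])
    curve = []
    for k in range(1, n_loci + 1):
        counts = []
        for subset in combinations(range(n_loci), k):
            distinct = {tuple(g[i] for i in subset) for g in mlgs}
            counts.append(len(distinct))
        curve.append(sum(counts) / len(counts))
    return curve

# Hypothetical genotypes for four samples at three loci.
mlgs = [
    ("a", "x", "m"),
    ("a", "y", "m"),
    ("b", "x", "n"),
    ("b", "y", "m"),
]
for k, value in enumerate(accumulation_curve(mlgs), start=1):
    print(f"{k} loci: {value:.2f} distinct genotypes on average")
```

In a recombining population alleles at different loci are shuffled among individuals, so each added locus tends to split existing genotype classes and the curve climbs quickly; in a largely clonal population the curve would plateau early.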

Each iteration of the t-SNE distance metric method supported three distinct clusters (K = 3). All C. pacificum genotypes clustered in one group regardless of their collection location, while C. latusorum subdivided into two clusters, albeit in close proximity (Fig. 2B). STRUCTURE analyses determined that C. pacificum and C. latusorum represent genetically isolated populations even with large overlaps in their geographical distributions (Fig. 2C). When clustered by STRUCTURE into two or more groups, the boundary between C. pacificum and C. latusorum remained intact, and further structure emerged within each species according to biogeographical location (Fig. S2). The differentiation of two reproductively isolated species was supported by differences in allele frequencies observed at many loci, and by the occurrence of certain alleles found in one lineage and not observed in the other (private alleles; Fig. 2D).

Phylogeographic patterning within an evolutionary divergence between species

Large differences in psbAncr nucleotide sequences differentiated Cladocopium latusorum from C. pacificum, and both were well diverged from the outgroup taxon, C. goreaui (Fig. 3A). The psbAncr sequences from other described Cladocopium (e.g. C. infistulum and C. thermophilum) could not be unambiguously aligned with the sequences obtained here. While the differentiation of C. latusorum at K = 3 by STRUCTURE (Fig. S2) was not supported by the psbAncr phylogeny, the phylogeny resolved fine-scale phylogeographic patterns where some related haplotypes occurred more frequently in particular geographic locations. For instance, C. latusorum from high-latitude Taiwan contained closely related haplotypes that were very distinct from others within the species (indicated by red triangles in Fig. 3A). The psbAncr haplotypes of C. latusorum from Eastern (blue triangles) and Western Australia (green squares) are also distinguished from those of other regions. Finally, populations of both C. latusorum and C. pacificum from the Eastern Pacific contained haplotypes not found in other Pacific regions (Fig. 3A). The nexus file containing psbAncr sequence alignments is available in the Supplemental Data.

Fig. 3: High-resolution phylogenetic analysis of Cladocopium latusorum and C. pacificum.

A Maximum Parsimony phylogeny of the psbAncr (Bootstrap support values based on 1000 replicates). B Pocillopora grandis typical host species mutualistic with C. latusorum. C Approximate geographic distribution of C. latusorum. D Pocillopora verrucosa typical host species mutualistic with C. pacificum. E Approximate geographic distribution of C. pacificum. Symbols correspond to samples sourced from geographic locations presented in Fig. 1A; squares denote samples from the Indian Ocean, triangles from the western Pacific, circles from the central Pacific and hexagons from the eastern Pacific.

The ecological attributes and geographic distribution of C. latusorum differed from C. pacificum. C. latusorum associated with Pocillopora grandis and P. meandrina (Table S1 and Fig. 3B). The biogeographic distribution of C. latusorum extended from the western Indian Ocean to the eastern Pacific Ocean (Fig. 3C). By contrast, C. pacificum was found in P. verrucosa (Table S1 and Fig. 3D), and its biogeographic distribution extended from the Indo-west Pacific to the tropical eastern Pacific Ocean (Fig. 3E). While each species associated with distinct lineages of broadcast spawning species (mitochondrial genetic types 1, 8 & 9 for C. latusorum and types 3 & 7 for C. pacificum), both were found to associate with larval brooding species, including P. acuta and P. damicornis (genetic types 5 and 4, respectively; Fig. 4A).

Fig. 4: Age estimates for the co-diversification of Cladocopium with their pocilloporid hosts.

A Differences in specificity (gray shading) of Cladocopium latusorum and C. pacificum to different lineages of broadcast spawning Pocillopora (circles), whereas both associate with brooding species (squares). Unrooted host phylogeny based on the mitochondrial open reading frame (number designations as described in [49]). B Time of divergence between Cladocopium latusorum and C. pacificum in correspondence with the adaptive radiation of Pocillopora [42]. Reduced time-calibrated psbA phylogeny estimated from BEAST 2.3.5 with bars representing the 95% highest posterior density interval and mean divergence times (ages) listed at each node. The Cladocopium associated with sibling species Porites panamensis (Eastern Pacific) and Porites porites var. colonensi (Western Atlantic) were used to calibrate the molecular clock based on the geological separation of the Atlantic and Pacific Oceans by the Isthmus of Panama (4.5–3.0 mya).

Molecular clock for estimating the age of Pocillopora symbionts

Bayesian posterior probabilities estimated that C. latusorum and C. pacificum diverged from each other during the late Pliocene approximately 2.7 mya (±1.0 my) when calibrated by sibling symbionts specific to sibling Porites corals separated by the Isthmus of Panama, a geological barrier known to be at least 3.1 million years old (Fig. 4B). Both the relaxed log-normal clock and the default strict clock models were tested. Both resulted in similar mean ages, and the strict clock model is drawn in Fig. 4B. This divergence time corresponds to the estimated mean divergence time of the broadcasting Pocillopora taxa with which each symbiont uniquely associates (Fig. 4B; [42]).

Discussion

Numerous biotic and abiotic factors influence the diversification of lineages and the origination of species. While geographic isolation and habitat depth can influence the diversification of symbiotic dinoflagellates [32, 80, 81], life with different host taxa exerts considerable selective pressures that guide lineage sorting and the origination of species having stringent host affiliations [25, 32, 33, 82]. Given the Pocillopora symbionts studied here and the existence of numerous other examples only provisionally examined, this form of “ecological speciation” (as defined in [83]) appears to be a prevailing process behind the creation of most symbiodiniacean species, especially in the genus Cladocopium [25]. In some instances, this diversification appears to have proceeded alongside that of the host (Fig. 4A; [66]).

Multiple species concepts are satisfied

The descriptions of C. latusorum and C. pacificum fulfill expectations of the morphological, ecological, phylogenetic (evolutionary), and biological species concepts [3, 14]. As each concept differs in its emphasis of early, middle, or late stages of speciation, the evidence in support of each species concept strengthens confidence that taxonomic designations accurately reflect biological reality. Furthermore, this combination of support provides fundamental insight into the ecology and evolution of the species in question.

While mating experiments are unrealistic for most micro-eukaryotes, especially for those that are resistant to cultivation, population genetic markers provide direct evidence of sexual recombination. Differences in microsatellite allele sizes and frequencies between C. latusorum and C. pacificum (especially when co-occurring geographically) indicate that they are reproductively isolated biological species (Fig. 2B, C; [83]). Each population contains recombinant genotypes that differentiate most unique individual genotypes (due to random sampling of a subset of loci out of 8 total; Fig. S1; [84]) and are therefore indicative of a sexual population, a finding supported by the lack of linkage disequilibrium of alleles between loci (Supplemental Material). The maintenance of two distinct, non-recombining gene pools both in sympatry and over thousands of kilometers satisfies the biological species concept for C. latusorum and C. pacificum.
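One common way to summarize multilocus linkage disequilibrium is the index of association (I_A), which compares the observed variance in pairwise genotype differences with the variance expected if loci were independent. The sketch below assumes haploid multilocus genotypes coded as one allele per locus; it is offered only as an illustration and is not necessarily the test reported in the Supplemental Material. The function name and the allele sizes are hypothetical.

```python
from itertools import combinations


def _variance(values):
    """Population variance (divide by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def index_of_association(genotypes):
    """I_A = V_observed / V_expected - 1 for haploid multilocus genotypes.

    genotypes: list of equal-length tuples, one allele per locus.
    """
    n_loci = len(genotypes[0])
    pairs = list(combinations(genotypes, 2))
    # For every pair of individuals, record a 0/1 mismatch at each locus.
    per_locus = [[int(a[j] != b[j]) for a, b in pairs] for j in range(n_loci)]
    # Total number of mismatching loci for each pair.
    totals = [sum(col[k] for col in per_locus) for k in range(len(pairs))]
    observed = _variance(totals)
    expected = sum(_variance(col) for col in per_locus)
    if expected == 0:
        raise ValueError("no allelic variation among genotypes")
    return observed / expected - 1.0


# Hypothetical microsatellite allele sizes at two loci (illustrative only).
clonal = [(180, 202), (180, 202), (186, 210), (186, 210)]
shuffled = [(180, 202), (180, 210), (186, 202), (186, 210)]
print(index_of_association(clonal), index_of_association(shuffled))
```

Alleles that always travel together (the first data set) inflate the index, whereas freely reassorted alleles (the second) do not. In practice the observed value is usually compared against values from data sets in which alleles are permuted among individuals within each locus.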

Some of the possible morphological attributes that can be used to distinguish members of the family Symbiodiniaceae include the number and shape of amphiesmal plates [78], chloroplast morphology [31, 85], chromosome numbers [86], and mean cell size. While simple to measure, cell size is perhaps the most meaningful feature. Differences in cell volume are known to affect cellular and physiological functions among Symbiodiniaceae and many other microalgae [87,88,89,90,91,92]. Here, C. latusorum and C. pacificum showed differences in mean cell volumes from samples taken at the same location (to avoid the influence of external environmental conditions on cell size; Fig. 1B, C). Cell volume differences among other symbiodiniacean species correspond to dissimilarities in light-harvesting and growth rates [88], and may affect how each species and its host tolerate thermal stress (e.g. [32]). While species most tolerant of thermal stress are often members of the genus Durusdinium [29, 93, 94], thermal tolerance and other physiological differences have also been documented among corals hosting closely related species of Cladocopium [27, 30, 95]. Indeed, there are large size ranges among Cladocopium species, which may ultimately influence their physiological attributes [31]. It is currently unknown to what extent C. latusorum and C. pacificum differ in physiology and how this may influence the distribution of their respective hosts.

Species that occupy separate ecological niches, by definition, possess intrinsic functional differences. During the process of ecological specialization, the preferential utilization of certain resources is reinforced by disruptive selection. This form of natural selection causes genetic diversification and ultimately speciation [96, 97]. From the symbiont perspective, the host as habitat is the major ecological resource upon which it depends, because external environmental conditions and nutritional resources are modulated through the host. In these symbiotic relationships, therefore, the host cellular and biochemical attributes comprise a major axis of natural selection driving niche diversification. Preferential colonization of certain host species by a portion of the symbiont population that shares similar genetic attributes will lead to genetic diversification if reproductive success is enhanced. While both Cladocopium spp. associate with larval brooding Pocillopora, they are ecologically differentiated by the species of broadcast spawning Pocillopora with which they associate (Fig. 4A).

The host’s biology of transmitting the symbiont to developing eggs (vertical transmission) appears to have further promoted the evolution of C. latusorum and C. pacificum from a recent common ancestor. The family Pocilloporidae (genera Pocillopora, Stylophora, and Seriatopora), together with members of the genera Montipora and Porites, incorporate symbionts into the egg during oogenesis [98,99,100]. The sequestration of symbiont cells to the host’s germline exerts additional selection pressure that favors well-adapted symbiont genotypes and more rapidly advances the process of speciation [101]. Therefore, it is not surprising that numerous independent monophyletic lineages in Cladocopium correspond to each of these three groups of common reef-forming coral [24, 26, 102]. In contrast, the majority of Indo-Pacific corals display open modes of symbiont acquisition (relying on environmental sources to obtain their symbionts) and associate with different species of Cladocopium, some of which are capable of forming stable mutualisms with many distantly related coral taxa [24, 102].

Relationship between coral and dinoflagellate diversification

The late Pliocene/early Pleistocene evolution of C. latusorum and C. pacificum corresponds to the origins of Pocillopora species around this same time [42]. There is no indication that long-term co-diversification, or co-speciation, is sustained between corals and their dinoflagellate symbionts [22, 103]. However, these findings support the concept that co-diversification does occur over shorter time frames of geological epochs, probably during intervals between major shifts in the planet’s environment [33]. Genetic and paleontological evidence placed the origination of modern Pocillopora species during the Pliocene and Pleistocene (~3.0 mya; [42]), epochs that are characterized by cycles of intense cooling and warming [104]. Consistent with this timing, we calculated the diversification of C. latusorum and C. pacificum at approximately 2.7 mya (± 1.0 my; Fig. 4B). While the evolution of host-specialized symbiont species may occur rapidly [97], these new mutualisms, once evolved, have persisted for a long interval of time (Fig. 4B). The long co-existence of these partnerships, over large geographic ranges, has implications for their conservation and the extent to which these partnerships can respond to rapid changes in climate (discussed below).

Geographic mosaic of symbionts with Pocillopora: the effect of latitude and isolation

The populations comprising C. latusorum and C. pacificum, respectively, appear connected genetically across much of the Pacific Ocean (Figs. 2C and 3A). However, fine-scale phylogeographic patterns and allele frequency differences in each species suggest some partitioning among populations (Figs. 3A and S2). Slight genetic differences between populations from distant regions are probably influenced by a number of factors including local adaptation to seasonal variability in temperature and light, ocean upwelling, and isolation by distance [32, 105, 106]. While sample sizes across geographic locations preclude population genetic analyses from microsatellite data, there were indications that some geographically separated regions contained distinct populations (Figs. 2B and S2). Populations of C. latusorum from eastern and western Australia possess psbAncr haplotypes similar to each other (Fig. 3A). Moreover, psbAncr haplotypes of C. latusorum from around Taiwan clustered together (Fig. 3A). Consistent with the Eastern Pacific being among the most isolated regions of the Indo-Pacific, the psbAncr haplotypes in this remote location clustered together and, when considered with microsatellite data, support that these populations are somewhat isolated genetically (Fig. S2). Interestingly, these Eastern Pacific psbAncr haplotypes were basal in the phylogenies of both species (Fig. 3A). Ultimately, more intensive sampling and application of additional population genetic markers are needed to substantiate and discover the detailed population-level structure within C. latusorum and C. pacificum across the biogeographical range of their hosts (e.g. [106]).

In addition to the symbiont species described here, Pocillopora form mutualisms with several other undescribed Cladocopium ‘species’ across the breadth of their geographic range, as well as Durusdinium glynnii [82]. Cladocopium C1h is particularly common in animals across the Indian Ocean [24, 50], while certain regionally endemic Cladocopium spp. are found in colonies from subtropical to temperate latitudes (23°–40°), characterized by turbid, cold-water environments [24, 107]. Some of these symbionts are monophyletic with C. latusorum and C. pacificum (such as C1ee and C1h; [19, 38]), while others appear to have evolved independently [20, 50, 108]. Where these other Pocillopora-specific Cladocopium spp. overlap in geography with C. latusorum and C. pacificum, their local distributions among colonies of Pocillopora appear to be zoned by water depth [26, 108]. Such biogeographic and ecological patterns indicate a geographic mosaic of co-diversification between Pocillopora and Cladocopium [109]. Having evolved over several millions of years, these mutualisms are adapted to a broad range of environmental extremes and thus represent a valuable study system for understanding the response of coral mutualisms to major shifts in climate.

Implications of widely distributed host-symbiont mutualisms that are millions of years old

The associations between Pocillopora and specific Cladocopium species across thousands of kilometers, covering environmental gradients and reef habitats found over large spans of latitude and longitude exemplify the fidelity exhibited by many of these animal–algal partnerships (Fig. 3C, E). Therefore, while prevailing environmental conditions and past episodes of severe thermal stress can influence the prevalence and ecological dominance of certain symbiont species, even among corals that vertically transmit their symbionts [38], high specificity between hosts and their adapted symbionts appears to often supersede external environmental influences [18, 110].

As with the two species described here, the majority of Cladocopium diversity appears to have originated during a time frame spanning 5–6 my to the present [25, 78]. The Pliocene and Pleistocene epochs have been the coldest times on Earth since the beginning of the Cenozoic era (65 mya) [104] when reef communities began recovery from the K/T boundary mass extinction event. Therefore, the sensitivity of corals to warming ocean temperatures may be explained by the evolution and ecological dominance of relatively “cold-adapted” Cladocopium-coral associations.

Nonetheless, both host and symbiont species lineages have endured numerous cooling and warming cycles of the planet, which intensified during the Mid-Pleistocene transition (~1 mya; e.g. [111, 112]). This longevity indicates that reef corals like Pocillopora, and their symbionts, have the capacity to adjust to modern-day ocean warming, at least over the coming decades [113]. However, although pocilloporids have rapid generation times, high dispersal capacity, and success as early colonizers following disturbances, they are susceptible to regional extinctions [114, 115]. The apparent capacity for dispersal and gene flow across populations of Pocillopora and their symbionts over thousands of kilometers likely expedited adaptation to past changes in climate and may help them persist through current shifts in climate [49, 106].

Ultimately, the broad geographic distributions and geological age of these and other host–symbiont combinations must be considered in forecasting their response to ocean warming, and should guide decisions when planning for their conservation [116]. The characterization of genetic variation captured in the sampling of C. latusorum and C. pacificum may offer some insight in forecasting the fate of these mutualisms. It should be noted that C. latusorum appears more genetically diverse than C. pacificum, assuming that microsatellite diversity (Table 1, Figs. S1 and S2) and psbAncr haplotype sequence variation (Fig. 3A) provide accurate proxies for the total genetic variation contained in the populations of these species. As selection pressure intensifies with rising ocean temperatures, this difference in genetic diversity may influence the relative adaptive potential of each of these symbiont species and their mutualisms.
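A comparison of haplotype diversity between two species can be made concrete with Nei's unbiased gene diversity, H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences. The sketch below is a minimal illustration under that assumption; the function name and haplotype labels are hypothetical and the samples are invented, not drawn from the psbAncr data set.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Invented samples for illustration only (labels are hypothetical).
more_diverse = ["h1", "h2", "h3", "h4"]   # every sequence distinct
less_diverse = ["h1", "h1", "h1", "h2"]   # one dominant haplotype
print(haplotype_diversity(more_diverse), haplotype_diversity(less_diverse))
```

Because the estimator corrects for sample size, it allows a rough comparison between species sampled at different intensities, although uneven geographic coverage can still bias the result.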