Introduction

Mitochondrial (mt) genes have a long history of use for phylogenetic reconstruction in animals1, and the relative ease with which complete mt genomes can now be obtained has fueled an increase in their use to resolve phylogenetic relationships within many groups2,3,4. Animal mt genomes typically include a highly conserved set of protein-coding genes with few non-coding intergenic regions; are inherited uniparentally without undergoing recombination; and in many cases have rates of substitution that may be an order of magnitude higher than those of the nuclear genome5. While these properties might be advantageous for phylogenetic reconstruction in some cases, they may also generate phylogenetic signals that differ from those of the nuclear genome. Discordance between nuclear and mt gene phylogenies is common and can result from biological processes such as introgression or incomplete lineage sorting (ILS) that act differently on mt vs. nuclear genomes (e.g.6,7,8). Alternatively, apparent mt-nuclear discordance can arise from inaccurate estimation of phylogenies due to low statistical power, poor model fit or taxon sampling issues8. Recent advances in computational models and increased taxon sampling of both mt and nuclear genomes have allowed these alternative sources of discordance to be evaluated in several well-sampled vertebrate taxa6,8. Studies have concluded that mt-nuclear discordance more often arises from biological processes such as introgression and ILS and persists even when factors that lead to inaccurate phylogenetic estimation have been addressed6,7,8.

Phylogenies of anthozoan cnidarians (e.g., corals and sea anemones) reconstructed from mt genes or genomes have often recovered relationships within and among orders that differ from those inferred from both nuclear genes and morphology. The mt genomes of these non-bilaterian metazoans have several unusual properties that are not found in bilaterians9. For example, the mt genomes of class Hexacorallia (e.g., sea anemones, scleractinian corals and black corals) encode the standard 13 protein-coding genes found in bilaterians, but only two tRNAs (trnW, trnM)10,11,12,13,14. Many hexacorals have group I introns in nad5 or cox110,11,12,13, and the latter gene may have a LAGLI-DADG type homing endonuclease encoded within it13. The ceriantharian tube anemones have multipartite linear mt genomes15. All members of class Octocorallia (e.g., soft corals, gorgonians and sea pens) have just a single tRNA (trnM), but with only one known exception (i.e., a member of genus Pseudoanthomastus16) their mt genomes include an additional protein-coding gene that encodes the DNA mismatch repair protein, mtMutS17. At least one sea pen has a bipartite circular mt genome18, and other octocoral lineages have undergone frequent rearrangements (inversions) of gene order by a mechanism that appears to involve intramolecular recombination19,20,21.

The unusual property of anthozoan mt genomes that has most impacted their utility for phylogenetic reconstruction is, however, the rate at which they evolve. Unlike bilaterian mt genomes that tend to evolve 5–10X faster than the nuclear genome22,23, anthozoan mt genes typically evolve 10–100X slower than nuclear genes24. As a result, mt genes that have been widely used in bilaterians for barcoding, species-level phylogenetic analyses and phylogeography are often invariant within—and sometimes between—anthozoan genera25,26. These slow rates of mt gene evolution have, however, increased the potential utility of mt genes for reconstructing deep phylogenetic relationships among the families and orders of Anthozoa, a group of organisms that last shared a common ancestor in the pre-Cambrian27,28. Nonetheless, phylogenies of Anthozoa reconstructed from complete mt genomes (or their protein-coding genes) have often been incongruent with other sources of morphological and phylogenomic evidence. The most notable of these discrepancies has been a lack of support for the monophyly of the anthozoan classes, Hexacorallia and Octocorallia. Mitochondrial phylogenies have often placed Octocorallia sister to the cnidarian sub-phylum Medusozoa4,21,29,30, despite the very strong morphological and life-history evidence for the monophyly of Anthozoa (see31), which has also been confirmed in several phylogenomic studies32,33. Moreover, in some of these same analyses Hexacorallia has been recovered outside of Cnidaria, as the sister to a clade of sponges4,34. Mitochondrial gene phylogenies have also recovered Ceriantharia (tube anemones) sister to the rest of Anthozoa15,30,35 rather than within Hexacorallia as supported by genomic-scale studies27,28,32. 
In addition, previous studies have suggested that Scleractinia is paraphyletic with Corallimorpharia4,12,36 and have differed from nuclear gene phylogenies in the placement of the orders Actiniaria, Zoantharia and Antipatharia and in the relationships among the major clades of Scleractinia37,38. Within Octocorallia, mt genes and/or genomes have provided little statistical support for the deepest nodes in either of the two major clades that have been recognized29,30,39,40.

Explanations proposed for the incongruence between mt and nuclear or morphological phylogenies of Anthozoa include substitution saturation of the mt genome21,36,41, rate heterogeneity between the major lineages29, and long branch attraction (LBA) due to the combined effects of rate heterogeneity and incomplete or biased taxon sampling34. Most mt genome phylogenies and phylogenomic analyses of anthozoans published to date have been taxon-sparse, often omitting entire orders29,32,33, or have drawn comparisons between topologies generated from completely different taxon sets41. As a result, it is still unclear whether the incongruence between mt and nuclear gene phylogenies of anthozoans is simply an artifact of incomplete, biased and incomparable taxon sampling or whether the evolutionary signal present in anthozoan mt genomes does indeed differ from that of the nuclear genome.

Recent advances in phylogenomic methods and technologies have made it possible to obtain complete mt genomes while simultaneously generating sequence reads for thousands of nuclear genes. In particular, target-enrichment methods used to sequence ultraconserved elements (UCEs) and exonic regions of the nuclear genome can recover complete or near-complete mt genomes as off-target reads3. Comparisons of mt vs nuclear gene phylogenies from the same set of taxa (often the same individuals) facilitate investigation of the causes of mt-nuclear incongruence by eliminating artifacts that may be caused by unequal or different taxon sampling.

In recent phylogenomic analyses of Anthozoa based on UCEs and exons27,28, complete or near-complete mt genomes were recovered for a majority of the taxa sequenced. Here, we used the complete set of mt protein-coding sequences to reconstruct the phylogenies of the Octocorallia and Hexacorallia classes and compared those to nuclear gene phylogenies generated for the same set of individuals. The dataset comprised a total of 202 species representing all orders and > 50% of extant families. With this comparable dataset, the impacts of sampling biases were removed and we were able to robustly explore whether incongruence is related to evolutionary signal. New findings on the unique properties of the recovered mt genomes are also noted.

Methods

Target-enrichment analyses

UCE and exon loci were target enriched and bioinformatically extracted from high-throughput sequencing data as described in Quattrini et al.27,42 using the anthozoa-v1 baitset42. Briefly, raw reads were cleaned using illumiprocessor43 and Trimmomatic v 0.35 (ref. 44) and then assembled using either Spades v 3.1 (ref. 45; with the --careful and --cov-cutoff 2 parameters) or Trinity v. 2.0 (ref. 46). The phyluce pipeline was then used as described in the online tutorials (https://phyluce.readthedocs.io/en/latest/tutorials/tutorial-1.html) with some modifications (see supplemental code in refs. 27, 42). Using phyluce, nuclear loci were aligned with MAFFT v7.130b (ref. 47), 75% and 50% taxon-occupancy matrices were created, and loci were concatenated (phyluce_align_format_nexus_files_for_raxml) separately for hexacorals (n = 108) and octocorals (n = 94).
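As an illustration of what a taxon-occupancy threshold does, the following minimal Python sketch keeps only loci that are present in at least a given fraction of taxa. It is not the phyluce implementation; the function name, the data layout and the toy locus names are assumptions made for this example.

```python
def filter_by_occupancy(loci, n_taxa, min_occupancy):
    """Keep loci that are present in at least `min_occupancy` (0-1) of taxa.

    `loci` maps a locus name to the set of taxa with a sequence for it.
    (Hypothetical data layout, chosen only for this illustration.)
    """
    return {
        name: taxa
        for name, taxa in loci.items()
        if len(taxa) / n_taxa >= min_occupancy
    }


# Hypothetical toy data: four taxa, three loci.
toy_loci = {
    "uce-1": {"A", "B", "C", "D"},  # present in 4 of 4 taxa
    "uce-2": {"A", "B", "C"},       # present in 3 of 4 taxa
    "uce-3": {"A", "B"},            # present in 2 of 4 taxa
}

matrix_75 = filter_by_occupancy(toy_loci, n_taxa=4, min_occupancy=0.75)
matrix_50 = filter_by_occupancy(toy_loci, n_taxa=4, min_occupancy=0.50)
```

A stricter threshold yields fewer loci with less missing data, which is why both the 75% and 50% matrices are carried through the analyses.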

Mt genome analyses

Whole and partial mt genomes were extracted from the off-target reads in the target-enrichment sequencing data and assembled in three ways. First, we used blastn to find whole or partial genomes in the Trinity or Spades assemblies and then extracted those as fasta sequences. Second, we used Novoplasty v 2.6 (ref. 48) to assemble mt genomes from the adapter-trimmed paired-end reads. Seed files used to help assemble each species consisted of cox1 sequences downloaded from GenBank for the species of interest or a closely related species. Third, Geneious Prime 2020 (https://www.geneious.com) was used for genomes that were difficult to assemble with Spades and Novoplasty, using the Map to Reference tool with Mapper set to “Geneious” and Sensitivity set to “Medium Sensitivity/Fast”. Reference sequences included individual mt loci (mtMutS, cox1 or 16S) from closely related taxa. The fine-tuning option required from “up to 5” to “up to 25” iterations to assemble the complete mt genome from the reference sequences.

Following mt genome assembly, fasta files were uploaded to Mitos2 (ref. 49; http://mitos2.bioinf.uni-leipzig.de) for annotation (translation code = 4). For further analyses, we used only species whose mt genomes were represented by at least 50% of the protein-coding genes (hexacorals n = 108, octocorals n = 94, Suppl. Table 1), except that we also included five ceriantharians with low mt genome recovery (15–53% of genes recovered per species). Protein-coding genes were then each aligned separately using MAFFT v7.130b (ref. 47) and adjusted by eye to ensure the sequences were in frame. Loci were then concatenated with phyluce_align_concatenate_alignments.
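The gene-recovery cutoff and the in-frame check can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function names are hypothetical, and the in-frame rule (alignment length divisible by three, gaps only as whole codons) is an assumed stand-in for the manual inspection described above.

```python
def recovery_fraction(genes_found, n_expected):
    """Fraction of the expected protein-coding genes that were recovered."""
    return len(set(genes_found)) / n_expected


def keep_species(genes_found, n_expected, threshold=0.5):
    """True if at least `threshold` of the expected genes were recovered."""
    return recovery_fraction(genes_found, n_expected) >= threshold


def is_in_frame(aligned_seq, gap="-"):
    """Assumed in-frame rule for one aligned coding sequence: the length is
    a multiple of three and every codon is either fully gapped or gap-free."""
    if len(aligned_seq) % 3 != 0:
        return False
    codons = (aligned_seq[i:i + 3] for i in range(0, len(aligned_seq), 3))
    return all(codon.count(gap) in (0, 3) for codon in codons)
```

For example, a hexacoral with 7 of the 13 protein-coding genes recovered (about 54%) would pass the 50% cutoff, whereas one with only 2 of 13 (about 15%) would not.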

Some mt genomes for which we had corresponding nuclear data could not be assembled, or were published in previous studies, and so sequences were downloaded from GenBank and subsequently used in our analyses (Suppl. Table 1). We used mt data from GenBank for 26 hexacorals; 16 of these were of the same individuals used in our study. All octocoral mt genomes were also assembled concurrently in another study16 and added to GenBank by those authors.

Phylogenomic analyses

Removing loci that are saturated can improve phylogenomic analyses50. Therefore, we ran saturation tests on each of the different locus datasets using Phylomad51. For nuclear loci, we ran saturation tests using models of entropy in two ways: on all variable sites, and only on those variable sites that had no missing data in each locus alignment (following refs. 50, 51). Entropy values below the predicted threshold value indicate a high risk of substitution saturation. Datasets are denoted hereafter as LR (low risk loci) and LRM (low risk loci with no missing data in saturation test). For the mt data, we ran saturation tests on sites with no missing data for the concatenated alignment. Loci with substitution saturation were removed and then various datasets were used for further phylogenetic analyses (Suppl. Table 2, Table 1).
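The distinction between "all variable sites" and "variable sites with no missing data" can be made concrete with a short Python sketch. It only illustrates the site selection; it does not reproduce the entropy test itself, and the function name and the set of characters treated as missing are assumptions.

```python
MISSING = {"-", "?", "N"}  # assumed symbols for gaps and unknown bases


def variable_sites(alignment, complete_only=False):
    """Return 0-based indices of alignment columns with more than one
    observed nucleotide state.

    With complete_only=True, columns containing any missing-data symbol
    are excluded.
    """
    selected = []
    for index, column in enumerate(zip(*alignment)):
        states = {base.upper() for base in column}
        observed = states - MISSING
        has_missing = bool(states & MISSING)
        if len(observed) > 1 and not (complete_only and has_missing):
            selected.append(index)
    return selected


# Hypothetical toy alignment of three sequences.
toy_alignment = ["ACGTA", "ATGAC", "A-CA-"]
all_variable = variable_sites(toy_alignment)
complete_variable = variable_sites(toy_alignment, complete_only=True)
```

In the toy alignment, columns 1 to 4 are variable, but only columns 2 and 3 are variable and free of missing data.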

Table 1 Summary statistics for different alignment datasets.

Selection tests were conducted using codon-based models in Codeml within PAML v. 4 (ref. 52). The one-ratio model (M0) was run on the mt alignment only, for both octocorals and hexacorals. This allowed us to estimate average omega (dN/dS) and kappa (ts/tv) values across all branches in the corresponding mt phylogenies. An omega value of 1 indicates that the locus is evolving neutrally, values > 1 indicate positive selection and values < 1 indicate negative (purifying) selection. Higher kappa values indicate a stronger bias toward transitions relative to transversions.
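The interpretation of omega, and the transition/transversion distinction underlying kappa, can be written out as a small Python sketch. The helper names are hypothetical and the code does not estimate either parameter; Codeml does that by maximum likelihood.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_omega(omega, tol=1e-9):
    """Interpret a dN/dS ratio using the conventional thresholds."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive selection" if omega > 1.0 else "purifying selection"


def substitution_type(base_a, base_b):
    """Label a nucleotide change as a transition or a transversion."""
    a, b = base_a.upper(), base_b.upper()
    if a == b:
        return "none"
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"
```

Under this convention an omega of 0.10 is read as purifying selection, and an A to G change counts as a transition whereas an A to C change counts as a transversion.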

Phylogenomic analyses were conducted using maximum likelihood in IQTree v 2.1 (ref. 53) on each of the concatenated datasets (Table 1). We ran partitioned analyses on the different datasets using the best model for each locus (-m TESTMERGE) chosen with ModelFinder54. Ultrafast bootstrapping (-bb 1000; ref. 55) and the SH-like approximate likelihood ratio test (-alrt 1000; ref. 56) were conducted, and site-concordance factors (-scfl 100; ref. 57) were calculated. For nuclear data, a species tree analysis was also conducted using ASTRAL III v 5.7, which is statistically consistent under a multispecies coalescent model58. A tree for each locus was constructed in IQTree using the best-fit model of evolution selected with ModelFinder. We used the 75% taxon-occupancy data matrices for octocorals and for hexacorals. Treeshrink59 was used to remove long branches, and the newick utility, nw_ed, was used to remove branches with < 30% bootstrap support prior to running ASTRAL. Site concordance factors (-scfl 100) were also calculated on the ASTRAL species tree but using the concatenated alignment of the loci used in the species tree analysis. Renilla muelleri was spuriously placed relative to other octocorals in some phylogenies. Because this species is well-supported in Pennatuloidea, we pruned this species from all phylogenetic trees using the R phytools package60.

Following phylogenetic inference, we calculated Robinson-Foulds (R-F; ref. 61) distances using IQTree v2.1 (-rf). R-F distances were calculated between all pairs of hexacoral unrooted trees and all pairs of octocoral unrooted trees. We also used the R TreeDist62 package to calculate generalized Robinson-Foulds (gR-F) distances as a comparison. For both hexacorals and octocorals, the most congruent pair of mt and nuclear maximum likelihood trees was identified from the smallest R-F distance and plotted. In cases where R-F distances were the same, we chose the topology with lower gR-F distances that also had the most nodes with bs support values > 95%. Hexacorals were rooted at the Ceriantharia based on prior phylogenomic studies of the phylum Cnidaria32,33 and Scleralcyonacea was rooted to Malacalcyonacea based on prior phylogenomic studies27,28,40.
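For readers unfamiliar with the metric, the R-F distance between two unrooted trees on the same taxa is the number of non-trivial bipartitions found in one tree but not the other. The Python sketch below is a minimal, self-contained illustration of that definition; it is not the IQTree or TreeDist code, the function names are hypothetical, and the simple Newick reader assumes unquoted taxon names.

```python
def leaf_sets(newick):
    """Parse a Newick string and return (all_leaves, clades), where clades
    holds the leaf set under every parenthesised group. Branch lengths and
    internal-node labels are ignored."""
    stack, clades, name, skip = [set()], [], "", False
    for ch in newick.strip().rstrip(";") + ",":
        if ch == "(":
            stack.append(set())
        elif ch in ",)":
            if name:
                stack[-1].add(name)
            name, skip = "", False
            if ch == ")":
                done = stack.pop()
                clades.append(frozenset(done))
                stack[-1] |= done
                skip = True  # ignore a support label after ")"
        elif ch == ":":
            skip = True      # ignore the branch length that follows
        elif not skip and not ch.isspace():
            name += ch
    return frozenset(stack[0]), clades


def splits(newick):
    """Return the taxon set and the non-trivial bipartitions of a tree,
    each stored as the side that excludes a fixed reference taxon."""
    leaves, clades = leaf_sets(newick)
    reference = min(leaves)
    result = set()
    for clade in clades:
        side = clade if reference not in clade else leaves - clade
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(frozenset(side))
    return leaves, result


def robinson_foulds(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)
```

Because bipartitions are stored relative to a fixed reference taxon, two Newick strings that differ only in where the tree is rooted give a distance of zero, which matches the use of unrooted trees described above.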

GC content (%) was calculated for all mt and nuclear loci to determine if GC content contributed to mt-nuclear discordance. The program SeqKit63 was used to calculate the GC content at each locus for every individual. GC content was then averaged over all loci for each individual using awk and plotted in R using ggplot2 (ref. 64). A one-way analysis of variance was conducted in R to test whether GC content differed significantly between mt and nuclear data. Code for all analyses can be found in Suppl. File S1 and all trees and alignments can be found on figshare.
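The per-locus GC calculation and the per-individual averaging can be sketched in Python as follows. This is an illustrative equivalent of the SeqKit and awk steps, not the commands used in the study; the function names and the decision to ignore gaps and ambiguous bases are assumptions.

```python
def gc_percent(sequence):
    """GC content (%) of one sequence, counting only A, C, G and T
    (gaps and ambiguous bases are ignored; an assumption of this sketch)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        return None
    gc_count = sum(base in "GC" for base in bases)
    return 100.0 * gc_count / len(bases)


def mean_gc_per_individual(loci_by_individual):
    """Average the per-locus GC content over all loci of each individual.

    `loci_by_individual` maps individual -> {locus name: sequence}.
    """
    means = {}
    for individual, loci in loci_by_individual.items():
        values = [gc_percent(seq) for seq in loci.values()]
        values = [value for value in values if value is not None]
        means[individual] = sum(values) / len(values) if values else None
    return means


# Hypothetical toy input: one individual with two short loci.
toy = {"ind1": {"cox1": "ATGC", "nad5": "GGCC"}}
toy_means = mean_gc_per_individual(toy)
```

Averaging per-locus percentages weights every locus equally regardless of its length, which corresponds to the averaging over loci described above.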

Results

Mt genome assemblies

Herein we assembled complete or near complete mt genomes of 75 hexacorals (73X average coverage) from the following orders: Actiniaria, Antipatharia, Ceriantharia, Corallimorpharia, and Scleractinia. Ceriantharian mt genomes were difficult to assemble. Out of five ceriantharians, none had complete mt genomes and only two genes were found for one species (Ceriantheomorphe brasiliensis). Only one species, Botruanthus mexicanus, had a near complete genome assembly. We confirmed the presence of group I introns (Suppl. Table 3) in many taxa. In Actiniaria, Antipatharia, Zoantharia, and Relicanthus daphneae, two protein-coding genes, nad1 and nad3, were found within the nad5 intron. Ten protein-coding genes were found in the nad5 intron of most scleractinians, with the exception of Caryophyllia arnoldi in which we found only seven protein-coding genes and rns within the nad5 intron. In Corallimorpharia, 10 protein-coding genes were in the nad5 intron of Corallimorphus profundus, and for the rest of the corallimorpharians (Rhodactis osculifera, Discosoma carlgreni, and Ricordea florida), all genes but trnW were in the nad5 intron. Another group I intron that encodes a homing endonuclease from the LAGLI-DADG family was present in cox1 of some hexacorals (uploaded to figshare). We confirmed the presence of this endonuclease in 24% of actiniarians, 28% of scleractinians, 17% of antipatharians, and 100% of corallimorpharians (Suppl. Table 3). We also documented this intron in two species of Ceriantharia, Botruanthus mexicanus and Ceriantheomorphe brasiliensis.

Of the complete (or near complete) mt genomes of hexacorals assembled in this study, only three species displayed gene order rearrangements relative to other taxa in their respective orders (Suppl. Table 3). Within Actiniaria, only one species sequenced, Alicia sansibarensis, exhibited a mt genome rearrangement with cox2-nad4-nad6-cob inserted prior to atp8 instead of between nad6 and rns. Of the scleractinians, Caryophyllia arnoldi had a genome rearrangement with the cob-nad2-nad6 gene block inserted after the 3’ end of nad5 instead of within the nad5 intron. The mt genome of Madrepora oculata also had a gene rearrangement, with a switch in the order of cox2 and cox3 compared to all other scleractinians. Corallimorphus profundus also had a different genome rearrangement compared to R. osculifera, D. carlgreni, and R. florida (Suppl. Fig. 3). Corallimorphus profundus had 10 protein-coding genes and rns within the nad5 intron. In contrast, R. osculifera, D. carlgreni and R. florida have all other genes but trnW within the nad5 intron.

Alignment summary

For hexacorals, concatenated nuclear locus alignments across 50–75% taxon-occupancy datasets ranged from 38,534 to 246,027 bp with 95 to 756 loci in each dataset (Table 1). For each hexacoral species, locus recovery (average read coverage = 15X) ranged from 303 to 1,156 loci, with relatively few loci (342 to 589) recovered in ceriantharians (Suppl. Table 1). For octocorals, concatenated nuclear locus alignments across 50–75% taxon-occupancy datasets ranged from 213,477 to 555,701 bp with 408 to 1,252 loci (Table 1). For each octocoral species, 604 to 1,275 loci were recovered (average read coverage = 26X) (Suppl. Table 1).

All 13 protein-coding genes were included in the alignment for 79% of all hexacoral species (Suppl. Tables 1 and 3). The hexacoral mt genome alignment containing the 13 protein-coding genes was 12,465 bp, and each gene was represented in 94–98% of the species.

For octocorals, all 14 protein-coding genes were included in the alignment for 80% of species. The octocoral mt genome alignment was 16,176 bp, and for each gene 96–100% of the species were represented, except for mtMutS. mtMutS was included for only 77% of the species as for some species mtMutS was highly incomplete or, in some cases (< 5%), it could not be reliably aligned to other species.

GC content differed between mt and nuclear loci for both hexacorals and octocorals (Fig. 1). GC content was significantly higher (F = 246, p < 0.001) across nuclear loci (39%) than mt loci (34%) in hexacorals. Of note, the GC content in the mt genes of Zoantharia was more similar to the GC content of the nuclear loci, and slightly higher by 0.5%. This pattern contrasted with the other hexacoral orders in which mt GC content was 4–8% lower than nuclear GC content. Ceriantharia had low GC content in mt data as compared to other orders. Antipatharia had slightly higher GC content (42%) in the nuclear loci compared to other groups of hexacorals (38–40%). In octocorals, GC content was also significantly higher (F = 779, p < 0.001) in nuclear loci (38%) as compared to mt loci (32%). However, two octocoral species had mt GC content similar to the nuclear data, at 38–40%: Leptophyton benayahui and Tenerodus fallax, which are sister to one another in both phylogenies and on long branches in the mt tree (Fig. 2).
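As a reminder of what the reported F statistics measure, a one-way ANOVA compares the variance between group means with the variance within groups. The Python sketch below computes that ratio from first principles; it is not the R code used in the study, the function name is hypothetical, and the input values are invented toy numbers rather than data from this work.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: mean square between groups divided
    by mean square within groups."""
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum(
        (value - mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical per-individual GC percentages (toy values only).
toy_nuclear = [38.0, 39.0, 40.0]
toy_mt = [33.0, 34.0, 35.0]
toy_f = one_way_anova_f(toy_nuclear, toy_mt)
```

A large F value, as reported for both hexacorals and octocorals, means that the difference between the mt and nuclear group means is large relative to the spread among individuals within each group.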

Figure 1

GC content (%) in nuclear and mitochondrial loci of hexacorals and octocorals. ***p < 0.001.

Figure 2

Maximum likelihood tree of Hexacorallia inferred from (left) mitochondrial and (right) nuclear loci (50% taxon occupancy, no loci with substitution saturation as denoted in saturation tests with no missing data included). BS Ultrafast bootstraps, SHaLRT Shimodaira-Hasegawa approximate likelihood ratio test, PP posterior probabilities from the most similar ASTRAL tree. Numbered squares on branches identify clades discussed in “Results”.

Selection tests on the mt genome alignments indicated that the mt genomes are under strong purifying selection. The omega value (dN/dS) for hexacorals was 0.10 while the value for octocorals was 0.14. The kappa value (ts/tv) for hexacorals was 2.7, whereas in octocorals it was higher at 3.9; both values indicate higher numbers of transitions than transversions. Entropy-based saturation tests conducted using Phylomad indicated that neither the hexacoral nor the octocoral mt alignment showed substitution saturation (Suppl. Fig. 1, Suppl. Table 2). For nuclear-locus datasets, 8–50% of the loci in each dataset had a high risk of substitution saturation. Hexacorals tended to have more saturated loci (30–50% per dataset) than octocorals (8–35% per dataset).

Mt-nuclear discordance

Hexacorallia

Overall, all phylogenies constructed for Hexacorallia were well supported (Table 1). Among all nuclear trees constructed with ASTRAL and IQTree, 83 to 97% of nodes (106 total nodes) on each tree had higher than 95% ultrafast bootstrap (bs) values, posterior probabilities (pp), and SH-aLRT values. Similarly, the mt genome tree was well supported with 78–89% of nodes having higher than 95% ultrafast bs values and SH-aLRT values.

There were some differences among all hexacoral phylogenies, but nuclear phylogenies constructed with ASTRAL and IQTree were mostly congruent with one another (pairwise R-F = 4–28, gR-F = 0.01–0.08). The R-F distances between the hexacoral mt genome tree and the nuclear trees, however, were much larger, ranging from 56 to 68 (gR-F = 0.14–0.16). The mt genome tree was more similar to the ASTRAL species trees (R-F = 56–60, gR-F = 0.14–0.15) than to the maximum likelihood phylogenies (R-F = 62–66, gR-F = 0.14–0.16), although the differences were negligible. Three maximum likelihood phylogenies were equally congruent with the mt genome phylogeny (R-F = 62, gR-F = 0.15–0.16). Of these three ML trees, the tree built from the 50% taxon-occupancy matrix with highly saturated loci removed, based on saturation tests without missing data (50LRM), had the lowest gR-F value and highest BS support values.

There were a few species on long branches in the mt genome tree, but not the nuclear tree, including the zoantharians Nanozoanthus harenaceus and Microzoanthus occultus and the scleractinian Paraconotrochus antarcticus (see Suppl. Files, Suppl. Fig. 2). In addition, the branch lengths at the tips were much shorter in the mt genome tree compared with the nuclear tree, indicating slow substitution rates between conspecifics and in many cases, between genera. Overall, however, the rate variation appeared higher in the mt genome tree as compared to the nuclear tree (Suppl. Fig. 2).

Although there were several differences among shallow nodes in all topologies, two major differences were apparent at deep nodes (Fig. 2). First, the relationship of Zoantharia and Actiniaria to other orders differed among topologies. In the mt genome tree, Actiniaria was sister to all other hexacoral orders except Ceriantharia (bs = 100, SHaLRT = 100, sCF = 82). This same relationship was also recovered in the ASTRAL species trees (bs = 100, SHaLRT = 100, sCF = 54–56) and the 75LRM tree (bs = 100, SHaLRT = 100, sCF = 56, Fig. 2, see Suppl. Files). In contrast, in most of the maximum likelihood trees for the nuclear dataset, Zoantharia diverged earlier than Actiniaria (bs = 100, SHaLRT = 100, pp = 100, sCF = 52–84). Second, the relationship of Relicanthus daphneae to other orders differed among phylogenies. In the mt genome phylogeny, R. daphneae was sister to the zoantharians, although with low to moderate support (bs = 88, SHaLRT = 77, sCF = 42). In contrast, in the majority of nuclear phylogenies, R. daphneae was recovered as sister to Antipatharia-Corallimorpharia-Scleractinia, with variable support depending on dataset (bs > 84, SHaLRT > 75, pp > 41, sCF > 32).

There were also some differences between mt genome and nuclear phylogenies within each hexacoral order. Within Scleractinia, there were differences among trees at the shallow nodes, including branch lengths and the relationships among species of Porites (S4 clade), but the major difference was the placement of the family Micrabaciidae (S1). Nuclear phylogenies all strongly supported that this family is sister to the Robust/Vacatina clade (S2) of Scleractinia (bs = 100, SHaLRT = 100, pp = 97–100, sCF = 37–40). In contrast, Micrabaciidae (S1) was recovered as sister to all other Scleractinia (S2 + S3) with strong support (bs = 100, SHaLRT = 99, sCF = 33) in the mt genome phylogeny. Within Actiniaria, the position of the superfamily Actinostoloidea (A2) differed between mt genome and nuclear phylogenies. This superfamily was sister to the superfamilies Metridioidea + Actinioidea (A4 + A3) in the mt genome phylogeny (bs = 100, SHaLRT = 100, sCF = 48.6) whereas it was sister to the superfamily Actinioidea (A3) in all nuclear phylogenies (bs = 100, SHaLRT = 99–100, pp = 100, sCF = 37–38). Within Antipatharia, the position of Acanthopathes thyoides (An3) differed between mt genome and nuclear phylogenies. This species was sister to all other antipatharians (An1 + An2) in the mt genome phylogeny (bs = 100, SHaLRT = 100, sCF = 72) whereas it was sister to the family Schizopathidae (An2) in the majority of nuclear phylogenies (bs = 76–100, SHaLRT = 23–100, pp = 91–98, sCF = 34–37), except for the 50LRM and ASTRAL LRM topologies (bs = 100, SHaLRT = 100, pp = 100, sCF = 65–67), which matched the mt genome tree. Within Zoantharia, the placement of Epizoanthus illoricatus (Z2) and Neozoanthus aff. uchina (Z4 in part) differed among mt genome and nuclear phylogenies. In the mt genome tree, E. illoricatus (Z2) was sister to the rest of the zoantharians (bs = 100, SHaLRT = 100, sCF = 66), whereas in all nuclear phylogenies, Nanozoanthus harenaceus and Microzoanthus occultus (Z1) were sister to the rest of the zoantharians (bs = 100, SHaLRT = 100, pp = 100, sCF = 37–38). Neozoanthus aff. uchina (Z4 in part) was sister to the family Zoanthidae (Z5) in the mt genome phylogeny (bs = 100, SHaLRT = 100, sCF = 49) whereas it was sister to Hydrozoanthus gracilis (Z4) in all nuclear phylogenies (bs = 100, SHaLRT = 100, pp = 100, sCF = 48–51). Within Corallimorpharia, there were differences within the Discosomidae family (C1) with Rhodactis osculifera sister to Discosoma carlgreni in the nuclear phylogeny yet sister to the remaining discosomids in the mt genome phylogeny.

Octocorallia

Nuclear gene phylogenies for Octocorallia were in general well supported. Among all nuclear trees constructed with ASTRAL and IQTree, 83 to 96% of nodes (91 total nodes) on each tree had higher than 95% ultrafast bootstrap (bs) values, posterior probabilities (pp), and SH-aLRT values. In contrast, mt genome trees for Octocorallia were not as well supported with only 76% of nodes having higher than 95% ultrafast bs and SHaLRT values.

Nuclear phylogenies constructed with ASTRAL and IQTree were somewhat congruent with one another (R-F = 6–36, gR-F = 0.03–0.16; Table 2, see Suppl. Files). The R-F distances between the octocoral mt genome tree and the nuclear trees, however, were much larger, ranging from 60 to 72 (gR-F = 0.23–0.36). Octocoral mt genome trees were somewhat more similar to the maximum likelihood phylogenies (R-F = 60–68, gR-F = 0.23–0.28) than to the ASTRAL trees (R-F = 68–72, gR-F = 0.25–0.36). The tree most similar to the mt genome phylogeny (R-F = 60, gR-F = 0.23) was constructed with a 75% taxon occupancy data matrix with highly saturated loci removed and no missing data in the saturation test (75LRM). In general, branch lengths differed between mt genome and nuclear trees (Suppl. Fig. 2) and rate variation was higher across the mt genome tree. The branch lengths at the tips were much shorter in the mt genome tree compared with the nuclear tree, but there were also several long branches recovered in the mt genome tree. In the mt genome tree, seven species were on very long branches (Muricella sp., Leptophyton benayahui, Tenerodus fallax, Cornularia pabloi, Pseudoanthomastus sp., Erythropodium caribaeorum, and Melithaea erythraea), a pattern not recovered in nuclear phylogenies (see Suppl. Files, Suppl. Fig. 2).

Table 2 Pairwise Robinson-Foulds and Generalized Robinson-Foulds (in parentheses) distances between hexacoral topologies and octocoral topologies.

Numerous differences were apparent among the octocoral mt genome and nuclear phylogenies (Fig. 3). Within the order Scleralcyonacea, the placement of Pennatuloidea + Ellisellidae (clade S1) differed. In the mt genome tree this clade was sister to the Keratoisididae + Primnoidae + Chrysogorgiidae (S2) and Helioporidae (S3) clades (bs = 90, SHaLRT = 100, sCF = 29). In the nuclear datasets, it was sister either to clades S3 + S4 with various levels of support (bs = 53–100, SHaLRT = 25–99, sCF = 33–36) or sister to clade S2 + S3 + S4 with strong support (bs = 100, SHaLRT = 100, pp = 93–100, sCF = 36). Cornularia pabloi also changed positions, diverging later (sister to clade S3) in the mt genome phylogeny (bs = 90, SHaLRT = 95, sCF = 28) as compared to all nuclear phylogenies where it was placed sister to all other scleralcyonaceans (bs = 100, SHaLRT = 100, pp = 100, sCF = 37). Parasphaerasclera valdiviae was an early-diverging lineage and sister to all other scleralcyonaceans in the mt genome phylogeny (bs = 100, SHaLRT = 100, sCF = 63) whereas it was sister to family Coralliidae in the nuclear phylogeny (bs = 100, SHaLRT = 100, pp = 100, sCF = 37). Helioporidae (S3) was recovered as sister to clade S4 in the maximum likelihood nuclear phylogenies (bs = 99–100, SHaLRT = 99, sCF = 36) but sister to clade S2 in the mt genome phylogeny (bs = 90, SHaLRT = 97, sCF = 34) and the ASTRAL phylogenies, although the relationships in the species trees were poorly to moderately supported (pp = 5–86, sCF = 34). Family Keratoisididae was recovered as sister to Primnoidae in the mt genome phylogeny (bs = 94, SHaLRT = 96, sCF = 32) and in one nuclear phylogeny (50LRM) but with poor support (bs = 79, SHaLRT = 47, sCF = 32). In all other nuclear phylogenies, Keratoisididae was recovered sister to Chrysogorgiidae (bs = 100, SHaLRT = 100, pp = 90–100, sCF = 36–37).

Figure 3

Maximum likelihood tree of Octocorallia inferred from (left) mitochondrial and (right) nuclear loci (75% taxon occupancy, no loci with substitution saturation as denoted in saturation tests without missing data). BS Ultrafast bootstraps, SHaLRT Shimodaira-Hasegawa approximate likelihood ratio test, PP posterior probabilities from the most similar ASTRAL tree. Numbered squares on branches identify clades discussed in “Results”.

Within Malacalcyonacea, several differences among phylogenetic relationships were noted, including some relationships among congeneric species. The Incrustatidae + Malacacanthidae clade was an early-diverging lineage and sister to most malacalcyonacean families (except for Clavularia inflata) in the mt genome tree (bs = 100, SHaLRT = 100, sCF = 50), but these families diverged later as part of the M2 clade in the nuclear phylogenies. The Tubiporidae + Arulidae clade (M1) was sister to all malacalcyonaceans (except for C. inflata) in the nuclear phylogeny (75LRM). In the mt genome phylogeny, it included Nidalia and was sister to the Sarcophytidae + Carijoidae clade (M3a) (bs = 72, SHaLRT = 86, sCF = 32). An Anthogorgiidae + Eunicellidae + Plexaurellidae clade (M8a) was sister to Paramuriceidae (M8c) in the mt genome phylogeny (bs = 99, SHaLRT = 95, sCF = 35). In contrast, the Keroeididae + Taiaroidae + Astrogorgiidae clade (M8b) was sister to Paramuriceidae (M8c) in the nuclear phylogenies (bs = 99–100, SHaLRT = 99–100, pp = 100, sCF = 34–42). Within Sarcophytidae, relationships differed among species between mt genome and nuclear phylogenies.

Discussion

Mt genome properties

Utilizing a total of 202 complete or near-complete mitochondrial (mt) genomes, we were able to examine mt-nuclear discordance within the Anthozoa and explore the unique mt genome properties of all orders belonging to this sub-phylum of Cnidaria. In addition to the mt genomes newly assembled here, most of the previously published mt genomes16,38,65 that we included in our analyses had been assembled from the raw sequence data from Quattrini et al.27,42. This large dataset of mt genomes further demonstrates the utility of off-target reads generated from target-capture data for the assembly of mt genomes and adds to the growing knowledge of mt genome evolution within the sub-phylum Anthozoa.

Although group I introns have been previously recorded in hexacorals10,11,12,13,14,65,66,67,68, we note their pervasiveness across the group. A nad5 intron that contains at least two protein-coding genes, and up to all 13, is present in the majority of hexacoral families. From our data, it also appears that this intron is present in Ceriantharia; however, this needs further confirmation, as we had difficulties assembling mt genomes in that order. The other group I intron, which encodes a homing endonuclease from the LAGLI-DADG family, is present in cox1 in many hexacorals. Gains and/or losses of this gene have been previously noted in the hexacoral orders Scleractinia13, Corallimorpharia67, Actiniaria68, and Zoantharia65. This endonuclease appears to be more common in some orders (Zoantharia, Corallimorpharia) than others (Scleractinia). Based on annotation from Mitos2, we also documented this intron in two ceriantharians. To our knowledge, this intron has not previously been documented in the order Ceriantharia. Based on its distribution across the phylogeny, the homing endonuclease, likely a result of horizontal transmission13, has been gained and lost within Hexacorallia for several hundred million years, with origins dating to 300–400 MYA27. To date, no introns have been recorded in Octocorallia.

Mitochondrial genome rearrangements within Anthozoa have been a topic of interest for over two decades, as species in this sub-phylum exhibit several gene order changes. Of the 108 complete (or near-complete) mt genomes of hexacorals examined in this study, only 6% displayed gene order rearrangements relative to the canonical gene order within their respective taxonomic order; many of these rearrangements have been described in prior studies (e.g.12,68,69). In contrast to hexacorals, octocorals have undergone gene rearrangements more frequently across their phylogenetic history18,19,20,21,70. Of the 92 complete or near-complete octocoral mt genomes used in this study, 21% had gene rearrangements. Brockman and McFadden20 suggested that octocoral gene rearrangements evolve via inversions of conserved gene blocks (or intramolecular recombination), whereas hexacoral gene rearrangements are likely caused by gene shuffling. Additionally, they hypothesized that the presence of the mt mismatch repair protein, mtMutS (unique to Octocorallia), might play a role in mediating these gene inversions. A recent review by Johansen and Emblem71 suggested that the large nad5 intron that is ubiquitous in hexacorals (but absent from octocorals) perhaps stabilizes mt genome organization in that class. With the increasing availability and decreasing costs of high-throughput sequencing combined with new analytical methods for assembling and annotating mt genomes (e.g., MitoFinder3), many new discoveries likely await regarding the mt genome evolution of anthozoan cnidarians.

Mt-nuclear discordance

Advances in genomic approaches have also facilitated comparisons of the phylogenetic histories of nuclear and mt genomes. This has allowed for exploration of the patterns and underlying causes of mt-nuclear discordance. In both Hexacorallia and Octocorallia, we found a high degree of mt-nuclear discordance at every level (i.e., order to species) even when comparing the mt phylogeny to the most similar nuclear phylogeny. At deep nodes in the phylogenies, the most apparent differences in the hexacoral phylogenies included the positions of the anemone groups Actiniaria, Zoantharia, and R. daphneae. Within octocorals, the most apparent differences at deep nodes were relationships among clades within the order Scleralcyonacea and among the early-diverging lineages within Malacalcyonacea. Discordance at deep nodes complicates interpretations of ancestral state reconstructions through deep time. In addition, this level of discordance raises concern about using just one source of sequence data (i.e., nuclear or whole mt genomes) for phylogenetic reconstruction, but it also highlights how different datasets used in complement present a unique opportunity to better understand the cause of the discordance from an evolutionary perspective.

Substitution saturation of mt genomes has been suggested to be the cause of mt-nuclear discordance in anthozoans21,41. Using entropy tests on our extensive dataset of ~ 100 genomes in each class, we did not find evidence for substitution saturation. The entropy-based t statistic tests saturation on variable sites only, is suitable for assessing misleading tree topologies, and has several advantages, including: (1) it is robust across a range of confounding factors, including rate variation across sites; and (2) the negative influence of slowly-evolving sites is removed in the measurement of overall base composition50. Thus, our results might differ from prior studies that used other methods, particularly if slowly-evolving sites were not taken into account. Alternatively, the different results could be driven by the number (2–3× fewer) and choice of taxa used in prior phylogenetic studies. In contrast to mt genomes, we found that ~ 10 to 50% of UCE and exon nuclear loci were saturated, depending on dataset. A recent study examining substitution saturation of UCE and exon loci across a range of taxa (e.g., hymenopterans, fishes, and crustaceans) also found similar numbers of saturated loci and noted that this could be driven by the highly variable flanking regions of UCEs50. We removed UCE and exon loci with substitution saturation from the dataset prior to phylogenetic analysis, yet even so, the nuclear and mt topologies were quite incongruent. Therefore, substitution saturation is not the primary cause of the observed discordance among nuclear and mt phylogenies.
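As a rough illustration of the entropy idea underlying such saturation tests, the following Python sketch computes the Shannon entropy of each alignment column and averages it over variable sites only. It is a simplification and not the test used here: a formal test compares an entropy-based index against a critical value, whereas this sketch only scales the mean entropy by the 2-bit maximum for four nucleotides. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (in bits) of the unambiguous nucleotides in one column."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mean_entropy_variable_sites(alignment):
    """Mean column entropy, restricted to columns with more than one nucleotide."""
    columns = ["".join(col) for col in zip(*alignment)]
    entropies = [site_entropy(col) for col in columns]
    variable = [h for h in entropies if h > 0.0]  # invariant columns are dropped
    return sum(variable) / len(variable) if variable else 0.0

# Toy alignment (hypothetical sequences, four taxa, six sites).
toy_alignment = [
    "ACGTAC",
    "ACGTAT",
    "ACGAAG",
    "ACGCAA",
]

mean_h = mean_entropy_variable_sites(toy_alignment)
# Crude scaling by the maximum entropy for four equally frequent bases (2 bits);
# values near 1 would indicate columns that look close to random.
scaled = mean_h / 2.0
print(mean_h, scaled)
```

In this toy case only two of the six columns vary, so the mean is taken over those two columns alone, mirroring the restriction to variable sites described above.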

Introgression is another biological process that can result in discordance among nuclear and mt phylogenies. Within Anthozoa, introgressive hybridization has been suggested to be an important mechanism in generating species diversity72,73,74,75,76,77,78. Because mt genomes are maternally inherited and non-recombining, species or groups of species that have undergone past hybridization might be expected to have mt genomes that are more similar than their nuclear genomes (e.g.79,80). Using D-statistics and ABBA-BABA tests, Quattrini et al.76 determined that hybridization is an important mechanism in shaping diversity within the octocoral genus Sclerophytum (= Sinularia). Similarly, hybridization has been noted within multiple species in the scleractinian genus Porites77,78. Indeed, we found strong incongruence between mt and nuclear phylogenies within both genera. Although incomplete lineage sorting is likely driving some incongruence at shallow nodes, our results and past data also suggest that introgression explains some of the incongruence, at least at the tips of the trees. Mitochondrial introgression is more likely and happens at a faster rate than nuclear introgression, cautioning against the use of mt gene trees as accurate depictions of species trees80. Future studies should consider explicitly testing for mt introgression in pairs or groups of taxa using, for example, ABBA-BABA tests and isolation with migration models (e.g.81). Whether or not introgressive hybridization is the cause of incongruent relationships at nodes deeper in a phylogeny is more difficult to discern. However, ancient introgression of ghost lineages (e.g., extinct, unknown or unsampled lineages that remain in extant species likely due to ancient hybridization82) could play a role in generating incongruence and could be explored in future studies.
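The logic of an ABBA-BABA test can be sketched with a short Python example. For four taxa with the topology (((P1, P2), P3), outgroup), sites where P2 and P3 share a derived allele (ABBA) and sites where P1 and P3 share it (BABA) are counted, and Patterson's D = (ABBA - BABA) / (ABBA + BABA) is expected to be near zero when incomplete lineage sorting alone is acting. The sequences below are hypothetical toy data, the function name is an assumption, and the sketch omits the significance testing (typically a block jackknife over many loci) that a real analysis requires.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (abba, baba, D).

    The outgroup allele is treated as ancestral ("A"); P3 must carry a
    different, derived allele ("B") for a site to be informative.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        if any(x not in "ACGT" for x in (a, b, c, o)):
            continue  # skip gaps and ambiguity codes
        if c == o:
            continue  # P3 carries the ancestral allele: uninformative
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return abba, baba, d

# Toy aligned sequences (hypothetical).
p1_seq = "AAGACA"
p2_seq = "AGATCG"
p3_seq = "AGGTTG"
out_seq = "AAAACA"

abba, baba, d = d_statistic(p1_seq, p2_seq, p3_seq, out_seq)
print(abba, baba, d)
```

A positive D in this arrangement points to excess allele sharing between P2 and P3, and a negative D to excess sharing between P1 and P3; with so few sites, the toy value carries no statistical weight.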

Unique properties of anthozoan mt genomes could also be partly responsible for the mt-nuclear discordance seen here. Anthozoan mt genomes evolve slowly24,80. This slow substitution rate can be seen clearly across both hexacoral and octocoral mt phylogenies as short branch lengths, particularly at the tips. Shearer et al.80 hypothesized that background selection is influencing the slow substitution rates within mt genomes of anthozoans. Because mt loci do not recombine, selection reduces variation not only at the sites under selection but also at linked sites83. Indeed, we found that mt genomes are under strong purifying selection in both Hexacorallia and Octocorallia, with omega values close to zero and high kappa values suggesting transition bias in both classes. Another recent study found that some genes are under relaxed purifying selection in deep-sea taxa, with some sites in particular genes under positive selection84. We were not able to test for selection on nuclear loci because none have been annotated to date and they are therefore not in correct reading frames. However, because of the large number of loci used, we would not anticipate that all or even most nuclear loci would evolve under the same type of selection.
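To make the notion of transition bias concrete, the following Python sketch counts transitions (purine to purine or pyrimidine to pyrimidine changes) and transversions between two aligned sequences. This raw count ratio is only a crude proxy and is not the kappa parameter estimated under a codon substitution model, which is a rate ratio fitted by maximum likelihood; the function name and the toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical sites, gaps and ambiguity codes are skipped
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # change within the same chemical class
        else:
            tv += 1  # change between purine and pyrimidine
    return ts, tv

# Toy pair of aligned sequences (hypothetical).
seq_one = "ACGTACGT"
seq_two = "GCATATGA"

ts, tv = count_ts_tv(seq_one, seq_two)
print(ts, tv)
```

Because there are twice as many possible transversions as transitions, an excess of observed transitions over transversions is what a high kappa reflects in a model-based analysis.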

We also found variation in substitution rates and GC content across mt genome phylogenies and between mt and nuclear phylogenies. Although some relationships among species that occurred on long branches in the mt genome tree (e.g., Tenerodus fallax and Leptophyton benayahui) were also recovered in the nuclear tree, others were not (e.g., Cornularia pabloi). In a family-level revision of Octocorallia, McFadden et al.40 also noted that some species relationships in the gene tree for mtMutS were artifacts of long branch attraction, and that this rate variation among lineages influenced phylogenetic signal in mt data. Furthermore, we also showed pronounced differences in GC content between nuclear and mt data and even between different taxonomic orders. However, there are no obvious indications that GC content is driving discordance between nuclear and mt phylogenies.
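GC content comparisons of the kind described above rest on a simple per-sequence calculation, sketched here in Python. The function name and the toy sequences are hypothetical, and the sketch assumes that gaps and ambiguity codes are excluded from the denominator, which may differ from the convention used by any particular software.

```python
def gc_content(sequence):
    """Proportion of G and C among unambiguous nucleotides (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else float("nan")

# Toy sequences standing in for an mt locus and a nuclear locus (hypothetical).
toy_loci = {
    "mt_locus": "ATGCGC--NN",
    "nuclear_locus": "GGCCGCATGC",
}

gc_by_locus = {name: gc_content(seq) for name, seq in toy_loci.items()}
print(gc_by_locus)
```

Applying the same function to every taxon and locus gives the values that can then be compared among orders or between the mt and nuclear datasets.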

Summary

Our results have demonstrated pervasive mt-nuclear discordance in Anthozoa. Overall, non-recombining mt genomes that do not evolve neutrally, exhibit substantial rate variation, and are likely to rapidly introgress are most likely influencing our ability to reconstruct accurate species relationships using mt genome data alone. Other studies have cautioned against the use of mtDNA for resolving phylogenetic relationships in anthozoans21,41 and even more broadly in metazoans1, but unequal taxon sampling and non-matching tips have always been potential confounding issues in mt-nuclear comparisons. We included the same tips in the mt and nuclear phylogenies and sampled widely across all orders. Nonetheless, it is still possible that inadequate taxon sampling could influence the patterns of mt-nuclear discordance we observed, and including more taxa in particular regions of the trees would stabilize some relationships. Even so, mt-nuclear discordance in hexacorals and octocorals is not an artifact of biased and incomparable taxon sampling, but is instead a signal of evolutionary processes that have shaped the genetic diversity of Anthozoa.