Phylogeography, colouration, and cryptic speciation across the Indo-Pacific in the sea urchin genus Echinothrix

The sea urchins Echinothrix calamaris and Echinothrix diadema have sympatric distributions throughout the Indo-Pacific. Diverse colour variation is reported in both species. To reconstruct the phylogeny of the genus and assess gene flow across the Indo-Pacific we sequenced mitochondrial 16S rDNA, ATPase-6, and ATPase-8, and nuclear 28S rDNA and the Calpain-7 intron. Our analyses revealed that E. diadema formed a single trans-Indo-Pacific clade, but E. calamaris contained three discrete clades. One clade was endemic to the Red Sea and the Gulf of Oman. A second clade occurred from Malaysia in the West to Moorea in the East. A third clade of E. calamaris was distributed across the entire Indo-Pacific biogeographic region. A fossil calibrated phylogeny revealed that the ancestor of E. diadema diverged from the ancestor of E. calamaris ~ 16.8 million years ago (Ma), and that the ancestor of the trans-Indo-Pacific clade and Red Sea and Gulf of Oman clade split from the western and central Pacific clade ~ 9.8 Ma. Time since divergence and genetic distances suggested species level differentiation among clades of E. calamaris. Colour variation was extensive in E. calamaris, but not clade or locality specific. There was little colour polymorphism in E. diadema.

). Small but significant mtDNA Φ ST values were found between populations in Maui and Clipperton Island (Φ ST value 0.069, p < 0.05), between Maui and Kingman Reef (Φ ST value 0.105, p < 0.05) and between the Island of Hawaii and Clipperton (Φ ST value 0.100, p < 0.01), suggesting genetic exchange.
In the central Pacific, gene flow of mtDNA in clade 1 of E. calamaris was restricted between geographically close populations (~ 760 km) at Kingman Reef and Kiribati (Φ ST value 0.613, p < 0.05). This was also true for populations at Kingman Reef and Moorea (Φ ST value 0.709, p < 0.05), but not between Kiribati and Moorea (Table S1). Gene flow between Kiribati and the majority of other populations was restricted.
Among Indo-West Pacific members of E. calamaris clade 1, non-significant mtDNA Φ ST values indicated the presence of gene flow between Malaysia and Fiji, Japan, Seychelles, Maldives, and South Africa, and between Japan and Moorea, Kiribati, Maldives, Madagascar, and South Africa. Identical haplotypes occurred in populations at Fiji, Moorea, Cook Islands, and Malaysia (Fig. 4).
In the Indian Ocean large and significant Φ ST values (ranging from 0.34 to 0.56) in the mtDNA of E. calamaris clade 1 suggested high levels of genetic isolation of populations at Réunion, with genetic exchange only with Madagascar (Table S1) www.nature.com/scientificreports/ shared identical mtDNA haplotypes. Connectivity was also found among E. calamaris mtDNA haplotypes from India, Zanzibar, and South Africa (Fig. 4).
Populations of E. calamaris from the Red Sea and Gulf of Oman in clade 2 were clearly differentiated from each other in mtDNA (Φ ST value = 0.54, p < 0.05) ( Table S2).
Populations of E. calamaris in clade 3, showed less genetic isolation in mtDNA among populations than those in clade 1. Identical haplotypes were found in individuals at Fiji, Japan, Moorea, Kiribati and Malaysia (Fig. 4), but gene flow between populations in Malaysia or Japan and populations situated farther East in the Pacific was more restricted (Table S3). The population of E. calamaris in Papua New Guinea had large and significant Φ ST values in comparison to all other populations in this clade.
Haplogroups of E. diadema included some populations that were distributed far apart in both the Pacific Ocean (Fiji, Moorea, Clipperton, Isla del Coco, Hawaii, Guam) and the Indian Ocean (Seychelles and Zanzibar) (Fig. 4). The presence of shared haplotypes among such disparate populations suggests wide dispersal across the entire Indo-Pacific, including migration across the EPB. Despite many shared and closely connected haplotypes, Φ ST values indicated genetic heterogeneity among many populations (Table S4) (Table S7) and the nDNA haplotype network (Fig. 5) indicated that genetic exchange of populations of E. diadema in the Indo-Pacific was far more pervasive than suggested by mtDNA (Table S4). Only a few populations showed indications of restricted gene flow (e.g. Clipperton vs Namatakula, Fiji).
Genetic differentiation between geographic regions. Analysis of Molecular Variance (AMOVA) of geographic regions within clades of E. calamaris and E. diadema indicated that most of the genetic variation in both mtDNA and nDNA occurred within populations, rather than between populations within each geographic region or between geographic regions (Tables 2, 3). An exception to this was clade 1 of E. calamaris, which showed a greater level of mtDNA variation between geographic regions (32.55%) than between populations within regions (17.39%).  Table 4).
The populations in the Hawaiian Archipelago showed significant inter-regional mtDNA differences with those in the Indian Ocean (Φ CT value = 0.462, p = 0.007). They also had a high percentage of significant regional variation with populations in the rest of the central Pacific (33.60%, Φ CT value = 0.336, p = 0.032), but were  www.nature.com/scientificreports/ Analysis of molecular variance of nDNA between regions of E. calamaris in clade 1 indicated that most of the genetic variation occurred within populations (Table 5). Only populations in the tropical eastern Pacific and in the Hawaiian Archipelago showed inter-regional variance that slightly exceeded intraregional variance when compared with those from the western Pacific Ocean. However, fixation indices were not significant.
Pairwise AMOVAs of E. diadema mtDNA (Table 6) and nDNA (Table 7) revealed a ubiquitous pattern of genetic variation concentrated within populations. Greatest within-population variance in mtDNA occurred when these populations were compared between the tropical eastern Pacific and the Hawaiian Archipelago (94.68%, Φ SC value = 0.103 p = < 0.001) and between the western Pacific Ocean with the Hawaiian Archipelago (94.84%, Φ SC value = 0.073, p = 0.110). Comparisons between the Indian Ocean, the Hawaiian Archipelago and the western Pacific Ocean revealed greater interregional variance of nDNA than between populations, but fixation indices for nDNA were not significant (Table 7).   Colour polymorphism. Colour variation was extensive among E. calamaris specimens, but not among those of E. diadema (Fig. 1). Juvenile E. diadema had green and purple-banded interambulacral and ambulacral spines, the test epithelial tissues were burgundy or black (lighter at night) with a black periproctal cone. In adults the colour of the test epithelial tissues was black, the interambulacral spines were black or banded dark purple and green, but always with a blue, green, or turquoise iridescent sheen. The ambulacral spines were banded in all E. diadema, regardless of age. In clades 1 and 3 of E. calamaris colour variation within each clade included members that were white, green, red, pink, purple, brown or black, with a variety of combinations, but was not clade or locality specific. Colour variation in clade 2 of E. calamaris was reduced. Individuals from Dahab in the Red Sea had predominantly black interambulacral spines and test epithelial tissues that were either red or black. Those from the Gulf of Oman were black with a red tinge typical of the black colour morphs reported from the Gulf of Aqaba by Dafni 32 . In contrast to E. diadema, the ambulacral spines of E. calamaris were typically one colour with a dark venom gland at the tip, but not banded in either adults or juveniles. Bright green median regions of the interambulacra (Fig. 6a, b), present only in E. calamaris, may advertise the venomous nature of the ambulacral spines 27,28 . However, these median regions were absent in some red and black colour morphs (Fig. 6c), as were white/yellow spots on the skin of the periproctal cone.
At Sosoikula Reef, Suva, Fiji, five E. calamaris were members of clade 1, nineteen of clade 3, while a single specimen was a hybrid of E. diadema (nDNA) and E. calamaris mtDNA clade 3 (Fig. 1). The hybrid's periproctal cone was black and small, resembling that of E. diadema. The spines around the edge of the interambulacra and down the entire ambulacra were banded black and grey, with a distinct blue/green sheen similar to E. diadema.  www.nature.com/scientificreports/ All other interambulacral spines were white with no banding. Spine structure, with verticillations on the interambulacral spines were typical of E. calamaris sensu Coppard and Campbell 28 , and two forms of tridentate pedicellariae that were structurally closest to E. calamaris (see Coppard and Campbell 33 ).

Discussion
Similar to other diadematid echinoids, Echinothrix has a poor fossil record primarily due to the fragile nature of the test that breaks-up and disperses after death 29 . A paucity of fossil material has previously made it hard to determine when species diverged. Our analysis, using combined dating from fossil spines and substitution rates from the closely related genus Diadema Gray, 1825 34  Both clade 1 of E. calamaris and E. diadema have extensive distributions throughout the Indo-Pacific, while clades 2 and 3 of E. calamaris are restricted to specific geographic regions. A trend of isolation by distance was observed only in clade 1 of E. calamaris, but not in E. diadema, which remains genetically connected across ~ 26,000 km of the Indo-Pacific.
Larval duration of Echinothrix in the plankton is unknown. Time to metamorphosis in other diadematids appears to vary depending on water temperature. Larvae of tropical Diadema antillarum Philippi, 1845 settle in the laboratory in ~ 36 days 43 , but the larvae of temperate Centrostephanus rodgersii (A. Agassiz, 1864) spend between 105 and 126 days in the water column before settlement 44 . If water temperature is a factor in development rate, then we should expect members of Echinothrix to be closer to Diadema than Centrostephanus Peters, 1855. However, time to metamorphosis may vary among clades of Echinothrix and among geographic regions.
In the Indian Ocean, a high degree of genetic isolation (based on mtDNA) was exhibited by the population of E. calamaris at Réunion Island, with genetic connections only with that at Madagascar, corresponding to the Madagascar ecoregion of Spalding et al. 45 . High population connectivity was found between Indian Ocean and western Pacific populations of E. diadema, but gene flow was somewhat restricted in clade 1 of E. calamaris. Today, the Strait of Malacca seaway through the Malay-Indonesian Archipelago has sufficient current activity to allow gene flow among some Indo-Pacific marine species 46 . The Indonesian throughflow between Malaysia and Australia also has the potential to distribute plankton westwards 47  Differences in the distributions of clade 1 and clade 3 could not be explained by their age, as they are sister clades. Members of the two clades may have different ecological requirements. Alternatively, different distributions may be the result of chance dispersal events. Crossing the uninterrupted deep water that forms the Eastern Pacific Barrier (EPB) (Fig. 2) is very difficult for shallow water marine invertebrates [18][19][20][21] . In this study, we found that both E. diadema and clade 1 of E. calamaris have crossed the EPB. Shortened transport time across the EPB resulting from a strong El Niño event is a likely mechanism for dispersal 17,21,[50][51][52][53][54][55][56] . We propose that the very strong 1997-1998 El Niño event may well have facilitated dispersal of Echinothrix larvae across the EPB and corresponds to the last colonization by Echinothrix larvae in the eastern Pacific.
Gene flow between Clipperton Island and Isla del Coco was not quantified in E. calamaris due to the limited sample size from Isla del Coco. However, in E. diadema gene flow appeared to be restricted (Table S3). Crossing the eastward flowing North Equatorial Counter Current (NECC) would severely limit exchange between Isla del Coco and Clipperton but potentially facilitate some exchange between Isla del Coco and central and South  59 . In E. diadema no restriction to gene flow was found between Clipperton Island and the Hawaiian Archipelago (Table S3), perhaps because larvae of E. diadema have greater dispersal ability than those of E. calamaris.
The Line Islands, including eight islands that form part of Kiribati, lie 1800 km south of Hawaii and have been proposed as a source for larval dispersal to the Hawaiian Archipelago 60 . In non-El Niño years equatorial upwelling north of Kiribati 61,62 in conjunction with the westward-flowing South Equatorial Current and the eastward-flowing North Equatorial Counter Current limit larval movement north and northeast. Lack of recent gene flow in E. calamaris between populations at Kiribati with populations in the tropical eastern Pacific and at Kingman Reef was apparent, both through large and significant Φ ST values and limited connectivity in the mtDNA haplotype network. This pattern was also found in E. diadema (Table S4). However, populations of both E. calamaris and E. diadema at Kingman Reef showed evidence of gene flow with both the Island of Hawaii and Clipperton Island, and therefore Kingman Reef may act as a larval source for dispersal north and east.
Evidence of gene flow between Kiribati and Moorea Island to the southeast was found in both clade 1 and clade 3 of E. calamaris. Kiribati is influenced by the eastward-flowing Equatorial Counter Current (ECC) and the westward-flowing South Equatorial Current (SEC) 63,64 . These currents are reported to have transported to Kiribati floating volcanic pumice and presumably larvae derived from Krakatau (western Pacific), the Tonga Trench (southwestern Pacific) and Benedicto Island, Mexico (eastern Pacific) 65 .
Gene flow in mtDNA between Japan and Kiribati and between Japan and Moorea in both clades 1 and 3 of E. calamaris is hard to explain by the flow of oceanic currents. This is also true for gene flow in E. diadema between Taiwan and both the tropical eastern Pacific and the Hawaiian Archipelago, and between the Seychelles and Isla del Coco.
Clade 2 of E. calamaris was endemic to the Red Sea and Gulf of Oman. Echinothrix diadema was not encountered at either location in the limited collections of this study. Mastaller's 66 report of E. diadema in the Gulf of Aqaba is thought to be an error resulting from misidentification of the dark morph of E. calamaris as E. diadema 32 . No mtDNA haplotypes of clade 2 were found among E. calamaris in either the Indian or Pacific Ocean populations. Large and significant Φ ST values indicated that Red Sea and Gulf of Oman populations were isolated from one another. Today larval dispersal of the Red Sea population of E. calamaris is probably restricted by upwelling of cold, nutrient-rich water between the Gulf of Aden and the Arabian Sea, which acts as a significant barrier to gene flow in other species 42 . Upwelling off Oman in conjunction with seasonal current patterns result in large fluctuations in temperature and primary productivity 67-69 and may limit gene flow between clade 2 of E. calamaris in the Gulf of Oman with other populations.
Discordance in population structure between mtDNA and nDNA cannot completely be explained by different evolutionary rates between markers. Although nuclear 28S rDNA appeared to be evolving at a slow rate (Table 1), the nDNA Calpain-7 intron was found to be evolving at a similar rate to mitochondrial 16S, with only slightly smaller genetic distances between clades (Table 1). This raises the question of whether differences in genetic structure between mtDNA and nDNA have resulted from sex-biased migration of their pelagic larvae. In such a scenario, male larvae would remain longer in the plankton resulting in increased connectivity and gene flow in biparentally inherited nDNA relative to matrilineally inherited mtDNA. To-date sexual dimorphism in echinoid larvae has not been demonstrated and the question of sex-biased migration remains open.
In disagreement with the literature 13,22-24 , variation in colouration of E. diadema was slight throughout its geographic range. This was in stark contrast to E. calamaris, which exhibited diverse colour variation within clades 1 and 3, with no indication that colour polymorphisms were correlated with either clade or geographic location. Why two such ecologically similar species displayed such different levels of colour variation is puzzling. Echinothrix diadema and E. calamaris occur in mixed species aggregations under coral heads or in large crevices 14 and have very similar dietary preferences 15 . It is therefore unlikely that pigments from food affect their colouration. Both species reportedly grow to a large size, with recorded horizontal test diameters of 143 mm in E. calamaris and 110 mm in E. diadema 70 . Both species have venomous ambulacral spines to ward off predators 28 . There is no apparent explanation for the diversity of colour seen in contemporary E. calamaris; it is impossible to know whether it may be a remnant of past evolutionary processes that are no longer present.
Our molecular phylogeny revealed the presence of three distinct and genetically distant clades in Echinothrix calamaris, that suggested species level differentiation. Although not a direct comparison, genetic distances in ATPase-6 and ATPase-8 between clades of E. calamaris (Table 1) were greater than those reported in the same genes between some species of Diadema (e.g. 4.34% between D. savignyi (Audouin, 1809) and D. mexicanum A. Agassiz, 1863; 2.61% between D. savignyi and D. paucispinum-a A. Agassiz, 1863) 34 . Time since divergence between clades of E. calamaris (~ 9.8 million years (My) between clade 1 + 2 and clade 3; ~ 7.2 My between clade 1 and clade 2) was also greater than between these species of Diadema (less than 4 My). However, we have no information on reproductive isolation between clades. Clade 1 and clade 3 have overlapping distributions in parts of the western and central Pacific (Fig. 2) and are living side-by-side on coral reefs with no spatial isolation, so their genetic divergence may well be the result of reproductive incompatibility. Slight differences in tridentate pedicellarial morphology have been reported among members of E. calamaris 33 , but we could find no major morphological differences that would be diagnostic between clades. In Fiji, where both clades 1 and 3 occur, E. calamaris spawned around the time of the new moon 71  www.nature.com/scientificreports/ or spatial isolation between clades, possible barriers to genetic intermingling could be molecules involved in species-specific recognition of gametes or lower hybrid fitness.
We have found hybrids between E. diadema and E. calamaris to be rare, quite possibly because the two species spawn fifteen days out of phase with each other 71 . A single E. diadema/E. calamaris clade 3 hybrid was found at Suva, Fiji. Very occasional asynchrony among E. calamaris spawning out of phase with conspecifics on the full moon simultaneously with E. diadema may partially explain such introgression. It is likely that close proximity of spawning individuals would be necessary, and that other reproductive barriers such as those generated by proteins involved in sperm and egg attraction 72 and sperm and egg binding and fusion including sperm bindin 73 and its egg receptor EBR1 74 would have to be overcome for this to occur.

Methods
Two hundred and ninety-nine Echinothrix calamaris and 171 E. diadema were collected from localities throughout the Indo-Pacific (Fig. 7). Samples represented the diversity of colour morphs present at each location. Gonads and muscle from the Aristotle's lantern were dissected and preserved in either 95% ethyl alcohol or high salt dimethyl sulfoxide buffer. Astropyga pulvinata was also sampled from off Uva Island, Gulf of Chiriquí Panama (7.8016° N, 81.7579° W) for use as an outgroup. Collection information including tissue voucher numbers at the Smithsonian Tropical Research Institute, Panama are included in Supplementary Table S8. DNA extraction, sequencing, and alignment. Genomic DNA was extracted from the lantern muscle or gonad using a DNeasy tissue kit (Qiagen) or proteinase K microextractions. Mitochondrial 16S rDNA, ATPase-6, and ATPase-8, as well as nuclear 28S rDNA and the intron in Calpain-7 were amplified for all samples. 16S rDNA was amplified using the 16Sar and 16Sbr primers of Kessing et al. 75 and the ATPase-6 and ATPase-8 regions using the LYSa (Lysine-tRNA) and ATP6b primers of Lessios et al. 34  Nuclear 28S rDNA was amplified using the primers and protocol of Littlewood and Smith 76 . Calpain-7 intron primers were designed for Echinothrix based on the i21 EPIC (Exon Priming Intron Crossing) primers of Gerard et al. 77 . The intron was amplified using the forward primer 5′-CCG GTA TAC AAT CCT TGT GGC and reverse primer 5′-AGC GAC ACC CAG AAT TCG TT. The PCR reaction mixture and conditions were the same as used for mtDNA, but with the annealing temperature raised to 55 °C. PCR products of 28S and the Calpain-7 intron were cloned using Promega pGEM-T Easy kits to avoid polymorphisms. One clone was amplified from each specimen using Promega M13 and M13R primers. After purification in Sephadex columns, amplification with the same primers, and labelling with Applied Biosystems (ABI) Prism BigDye terminators, nucleotides were sequenced in both directions in an ABI 3130 XL automatic sequencer. Alignments were performed in MacClade 78 .
After trimming sequence ends of ambiguous bases and overlapping segments, there were 614 bp of 16S rDNA, 174 bp of ATPase-8, 360 bp of ATPase-6, 1180 bp of 28S rDNA and 277 bp of the Calpain-7 intron, including protein binding site and 84 bp of the flanking 3' exon. The first ~ 200 bases from the 5' end of the Calpain-7 intron were deleted from the original 477 bp due to ambiguous low-quality sequence reads. This resulted in 2605 bp of DNA sequence for each specimen.   80 was employed to determine the best model of molecular evolution for each DNA region based on the AIC criterion 81 . Models selected were 16S: HKY + I + G (I = 0.4530, α = 0.8020), ATPase-8: TIM3 + I (I = 0.4060), ATPase-6: GTR + I + G (I = 0.2120, α = 0.5020), 28S: TIM2 + I (I = 0.9540) and Calpain-7 intron: GTR + G (α = 0.3560). The data were concatenated, and a partitioned Bayesian phylogenetic analysis was carried out with MrBayes v. 3.2 82 using for each gene region the model suggested by jModeltest and parameters unlinked across partitions. A single haplotype of Astropyga pulvinata was chosen as an outgroup. The analysis was started with Dirichlet priors for rates and nucleotide frequencies and run for 6 × 10 8 steps, sampling every 3000th tree from two runs. Convergence was assessed according to the average standard deviation of split frequencies reaching < 0.01 and potential scale reduction factor 83 1.00 for all parameters. The runs were also visually checked for a lack of trends in Tracer v1.6 78 . The first 25% of trees were discarded from each run as burn-in, and a 50% majority rule tree was constructed. Maximum likelihood (ML) analysis was conducted on the concatenated data in RAxML v.8.2.x 84 using the GTR + G model and rapid bootstrapping for 10,000 iterations. Nodes with less than 85% Bayesian support in MrBayes and less than 75% ML support in RAxML were collapsed. Haplotype networks are a means of visualizing relationships when the ancestors of extant groups are assumed to be still present. Median joining haplotype networks for each major clade, combining mtDNA genes in one analysis (Fig. 4) and combining nDNA sequences in a separate analysis (Fig. 5), were constructed in PopART (Population Analysis with Reticulate Trees) 85 , with reticulation tolerance set to zero. The mean genetic distance between clades according to the models suggested by jModeltest was calculated from all pairwise comparisons estimated by Paup* 86 .

Timing of divergence.
A partitioned analysis of unique haplotypes of ATPase-6 (360 bp) and ATPase-8 (174 bp) from 237 taxa was analysed in BEAST (v1.10.4.) 87 . Substitution models for the reduced datum set selected by jMODELTEST were GTR + I for ATPase-6 and TN93 + I for ATPase-8. BEAST was run using an uncorrelated relaxed clock with a lognormal distribution and a Yule speciation process. A single haplotype of Astropyga pulvinata was chosen as an outgroup. Substitution rates (per site, per million years) for ATPase-6 and ATPase-8 from the closely related genus Diadema were used to calibrate the phylogeny, based on the assumption that the western Atlantic D. antillarum was separated from the eastern Pacific D. mexicanum 2.5 Ma 34 . Initial ucldmean clock rates were set to 0.0176 substitutions per site per million years for ATPase-6 and 0.0108 substitutions per site, per million years for ATPase-8. An exponential prior with an offset of 15.97 and a mean of 2.0 was used to constrain the minimum age of Echinothrix. This was based on the 15.7 to ~ 25 Ma Burdigalian, early Miocene age of the Echinothrix spine fragment from Java, Indonesia 29 . The results from five individual runs, each of 10-50 × 10 6 steps, logged every thousand steps were combined in LogCombiner v. 1.10.4 with 10% burnin from each run, resulting in 120,000 trees. The runs were visually checked for lack of trends in Tracer v.1.6 79 . The effective sample size (ESS) of the combined set was > 212 for all parameters except ATP6ucld.stdv, which had a value of 129.
Gene flow between populations and biogeographic regions. Φ ST statistics represent divergence between populations relative to divergence within populations and are assumed to measure degree of on-going gene flow. We estimated gene flow separately from concatenated mitochondrial and concatenated nuclear genes, because of the four-fold difference in effective population size. Pairwise Φ ST values using Tamura and Nei 31 distances were calculated in Arlequin v. 3.5.2.2 88 between geographic populations within clades. Analysis of molecular variance (AMOVA) using Tamura and Nei 31 distances was estimated in Arlequin from 1023 random reshufflings of haplotypes in each clade. Φ-statistics were then employed to estimate the proportion of genetic variability found between geographic regions (Φ CT ), between populations within geographic regions (Φ ST ) and within populations (Φ SC ). Geographic regions were defined as the Indian Ocean, western Pacific Ocean (Malay-Indonesian Archipelago to Guam) central Pacific Ocean (New Caledonia, East to Moorea), Hawaiian Archipelago and the tropical eastern Pacific Ocean (Isla del Coco and Clipperton Island). Geographic sea distances between collection points were measured using Google Maps. Standard Mantel 89 tests were conducted to evaluate possible relationships between geographic distance and genetic divergence with significance tested using 10,000 permutations. Figs. 1 and 6 were taken by S.E.C., M. Barton, O. Bronstein, and F. Ducarme. Brightness, colour, and contrast were unchanged.