Introduction

The bioluminescent earthworm Pontodrilus litoralis (Grube, 1855) has been reported as a cosmopolitan species, inhabiting marine littoral ecosystems in the sub-temperate and tropical coastal areas of the Atlantic, Pacific, and Indian oceans1,2,3,4, and is reported to be both arenicolous and limicolous. The first description of this littoral earthworm, named Lumbricus litoralis by Grube (1855)5, was based on the morphological characteristics of a Mediterranean sample from Villefranche-sur-Mer (formerly Villafranca) on the Côte d’Azur, France. The genus Pontodrilus was established by Perrier (1874)6, who also described P. marionis Perrier, 1874; however, Beddard (1895)7 subsequently synonymized P. marionis with L. litoralis. Easton (1984)1 then provided an extensive list of P. litoralis synonyms and references to the taxonomic literature and concluded that P. litoralis is a single, highly variable species. Although a few other morphologically distinct species of Pontodrilus have been discovered, only two species of Pontodrilus, including P. litoralis, have been reported from Thailand and peninsular Malaysia2. Chen et al. (2021)8 hypothesized that the widespread populations of P. litoralis throughout the world resulted from their transport by currents, which is congruent with Blakemore’s (2007)9 suggestion that the wide distribution of P. litoralis is due to transport in ships’ sand ballast and the natural rafting of euryhaline cocoons. The wide range of salinity tolerance of P. litoralis, shown experimentally by Seesamut et al. (2022)10, may have facilitated this species’ wide distribution pattern.

Molecular (DNA) taxonomy in earthworms has mostly used a single marker gene, in particular the mitochondrial cytochrome c oxidase subunit 1 (COI) gene. When such a marker is used to identify species, the method is referred to as DNA barcoding11,12,13,14,15. However, many earthworm studies have used both nuclear and mitochondrial genes in phylogenetic species delimitation16,17,18,19. Widely used methods based on single-locus sequences include Automatic Barcode Gap Discovery (ABGD)20, Assemble Species by Automatic Partitioning (ASAP)21, the Bayesian implementation of the Poisson Tree Processes model (bPTP)22, and the Generalized Mixed Yule Coalescent model (GMYC)23; for more details, see the reviews by Martinsson & Erséus (2021)24 and Goulpeau et al. (2022)25. For sexually reproducing species, however, multiple-locus delimitation, which takes the evolution of more than one gene into account, may be more reliable for testing hypotheses of speciation events; for instance, congruent nodes in the comparison between one nuclear and one mitochondrial gene tree are more supportive of a speciation event (ceased gene flow) than are incongruent nodes, which are evidence of gene flow between individuals belonging to different “mitochondrial” (= maternal) lineages19,26,27.

Despite the worldwide distribution records of P. litoralis, it is still widely regarded as a single species, largely on the basis of morphological characteristics. Variation in body size between populations in Asia has been studied, but the marked differences in the morphometrics of P. litoralis across geographic populations did not correlate with their genetic differences (COI). Rather, it was suggested that P. litoralis is a single species3. In this study, we aimed to test the hypothesis that the globally distributed earthworm P. litoralis is a single species, as proposed by Easton (1984) and Seesamut (2019)1,3. Earthworms were collected from North America, Australia and Oceania, Europe, Africa, and Asia (East and Southeast Asia), and we conducted morphological examination, phylogenetic analysis, and species delimitation using the single-locus methods mentioned above plus multi-locus delineation using Bayesian phylogenetics and phylogeography28,29.

Results

We obtained a total of 114 COI sequences of P. litoralis, comprising 22 specimens from North America, three from Africa, 12 from Australia and Oceania, three from Europe, and 74 from Asia (24 from East Asia and 50 from Southeast Asia) (Fig. 1, Table 1). The final aligned dataset, consisting of 658 bp sequence fragments, contained a total of 392 invariable (monomorphic) sites, 210 variable (polymorphic) sites (total number of mutations is 283), and 119 parsimony informative sites. The dataset yielded a total of 52 haplotypes, with a haplotype (gene) diversity of 0.978 and a nucleotide diversity (Pi) of 0.09838. All sequences are deposited in GenBank (Table 1). Based on P. litoralis samples from different geographic regions, the COI phylogenetic tree revealed a high genetic diversity, and the COI-based species delimitations divided the 114 specimens into 19 molecular operational taxonomic units (MOTUs) by ABGD and ASAP, whereas the bPTP and GMYC methods yielded 30 and 31 MOTUs, respectively (Fig. 2).
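The two diversity statistics reported above can be illustrated with a minimal sketch. It assumes the standard definitions (haplotype diversity as n/(n − 1) × (1 − Σp_i²), and nucleotide diversity as the mean proportion of differing sites over all sequence pairs) and gap-free aligned sequences of equal length; the function names and the toy alignment are hypothetical, and dedicated programs treat gaps and missing data in ways this sketch does not.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype (gene) diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean proportion of differing sites over all pairs."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical, not data from this study).
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TCGTACGA"]
hd = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"Hd = {hd:.3f}, pi = {pi:.4f}")
```

A haplotype diversity close to 1, as found here, means that two randomly drawn individuals almost always carry different haplotypes.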

Figure 1

(A and B) Map showing the sampling sites of P. litoralis. The map is based on a map from D-maps (available at https://d-maps.com/carte.php?num_car=3228&lang=en), map was edited in Adobe Photoshop. (C) Photograph of P. litoralis from Thailand (photograph by Teerapong Seesamut).

Table 1 List of P. litoralis specimens examined in this study, and accession numbers of the COI and ITS2 sequences. * Juvenile stage; ** Only tail was collected.
Figure 2

A ML phylogenetic tree of P. litoralis based on the COI fragment sequence (658 bp) and the species delimitation clustering results. Nodes with ML bootstrap support > 70% are considered well-supported. The scale bar indicates the branch length. ABGD, Automatic Barcode Gap Discovery; ASAP, Assemble Species by Automatic Partitioning; bPTP, Bayesian implementation of the Poisson Tree Processes model; GMYC, Generalized Mixed Yule Coalescent model. The numbers are the input MOTUs of the BPP analyses; the letters a–d are the four most conservative MOTUs suggested by BPP.

The COI marker showed a higher variability than ITS2. The COI haplotype network shows that 52 haplotypes were detected in 114 individuals, with the haplotypes of each locality (population) being unique to that locality. Only one haplotype was shared across two locations from different countries: Quangbinh (Vietnam) and Taiwan (Fig. 3A). The ITS2 haplotype network showed a total of 36 haplotypes from 98 individuals (Fig. 3B). The highest numbers of mutational steps are 77 and 16 in COI and ITS2, respectively.
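Checking whether a haplotype is shared between localities amounts to grouping identical sequences and listing where each one occurs. The sketch below illustrates this under the assumption of gap-free aligned sequences; the function name, locality labels, and sequences are hypothetical placeholders rather than data from this study.

```python
from collections import defaultdict


def shared_haplotypes(records):
    """Return {sequence: set of localities} for haplotypes found in more than one locality.

    `records` is an iterable of (locality, aligned_sequence) pairs.
    """
    localities_by_haplotype = defaultdict(set)
    for locality, seq in records:
        localities_by_haplotype[seq].add(locality)
    return {seq: locs for seq, locs in localities_by_haplotype.items() if len(locs) > 1}


# Hypothetical toy records: one haplotype occurs at two sites.
records = [
    ("site_A", "ACGTAC"),
    ("site_A", "ACGTAT"),
    ("site_B", "TCGTAC"),
    ("site_C", "ACGTAC"),
]
print(shared_haplotypes(records))
```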

Figure 3

Haplotype networks for (A) COI sequences (658 bp) and (B) ITS2 sequences (437 bp) of P. litoralis. Lines with dashes and numbers between circles represent the number of mutational steps between two haplotypes. The number of samples in each haplotype corresponds to the size of the circles in the legend.

The phylogenetic relationships observed in the analysis of the concatenated data (COI + ITS2) were congruent with the COI and ITS2 phylogenetic trees (Figs. 2, 4, and Supplementary Fig. 1). The results of the BPP analyses are summarised in Table 2. In analyses A, B, and C, 17, 3, and 11 of the 30 MOTUs, respectively, were supported with a PP of > 0.95. The only two MOTUs supported in all three analyses are MOTUs 29 and 30. In one of the three separate runs of each of analyses B and C, maximum support was found for combining a majority of the MOTUs into one. The most conservative estimate would be four MOTUs, i.e., (a) combining MOTUs 1–25, (b) combining MOTUs 26–28, (c) MOTU 29, and (d) MOTU 30 (Figs. 2, 4). There is some support for combining (i) MOTUs 26 and 27 and (ii) MOTUs 26 and 28. Based on the four MOTUs delineated by BPP, interspecific COI uncorrected p-distances were calculated, revealing that the genetic divergence among this conservative set of MOTUs ranged from 13.9 to 16.9%.

Figure 4

A ML concatenated tree of COI and ITS2. Nodes with ML bootstraps > 70% are considered well-supported. The scale bar indicates the branch length. The four most conservative MOTUs suggested by the BPP analysis are marked with black circles labelled with a–d respectively.

Table 2 List of species delimitation and their posterior probability (PP) given as a mean of three separate runs. The results with > 0.05 PP in at least one analysis are included. Posterior probabilities in bold are considered significant and MOTUs in bold are accepted.

Most P. litoralis specimens in this study were in the adult stage, and specimens from different collecting localities showed no difference in any distinctive morphological characteristics.

Discussion

Morphological investigation showed that the external and internal morphology of P. litoralis samples in this study correspond to the original description and those recently reported1,2,5,30. The analyses of the single-locus phylogeny and mitochondrial species delimitation suggested that P. litoralis is a complex of species, which all seem to be cryptic because of the homogeneity in their morphological characteristics. Moreover, a high degree of genetic structuring among different geographical populations of P. litoralis is evident. The occurrence of cryptic species in clitellates has frequently been uncovered, which is not surprising as there are few diagnostic morphological features that can be used to distinguish different species31. On the other hand, Martinsson et al. (2020)32 tested the species hypotheses of the enchytraeid worm Fridericia magna in Norway and Sweden and concluded that the data for this morphospecies is consistent with it being a single species. This and other examples (below) have shown that high intraspecific mitochondrial genetic distances are also common in clitellates.

In the semi-aquatic freshwater earthworm genus Glyphidrilus, ten single and multi-locus species delimitation methods revealed a high degree of incongruence between the genetic structures and morphology-based species identifications19. Several publications have reported deeply divergent mitochondrial lineages and a high genetic diversity within well-established earthworm morphospecies14,17,33,34,35. Although the COI species delimitation analyses in this study suggested the presence of either 19, 30, or 31 MOTUs within P. litoralis, Lohse (2009)36 noted that geographic population structure is likely to lead to the overestimation of species numbers in species delimitation analyses. This has also been a critique of multispecies coalescent methods, such as BPP37, and it may explain why our BPP analysis supported about 20 MOTUs in the majority of runs but shifted to supporting far fewer MOTUs in some runs. This variation makes the results harder to interpret, and we have, therefore, chosen the more conservative estimate of MOTUs. Thus, we suggest that several MOTUs of P. litoralis may reflect this bias, because the species delimitation methods were applied to a dataset containing different geographic populations of P. litoralis. With respect to the widespread distribution of the littoral earthworm P. litoralis, it may have been dispersed around the world by humans or transported naturally by currents8,9. Here, we suggest that the cosmopolitan distribution of P. litoralis is more likely to have been caused by currents, because human-mediated dispersal would be expected to result in identical haplotypes being shared across populations from distant locations38, whereas in our case there is a lack of identical haplotypes shared across distant locations (Fig. 3).

For earthworms, we agree that 13% or thereabouts of COI interspecific genetic distance between two earthworm MOTUs could be used as a rule-of-thumb threshold to delimit different species14,19. Therefore, the most conservative recognition of only four MOTUs retrieved from the BPP analysis would suggest that P. litoralis is represented by four different species in our study (lineages a–d in Figs. 2 and 4). However, a much higher number of MOTUs of P. litoralis were detected by the different species delimitation methods. There are more than 20 synonyms of P. litoralis that have been reported from around the world1,2. Thus, in order to assign which synonym belongs to which different clade within the P. litoralis species complex, further investigations of type specimens representing all synonyms (or topotypes, in case of old type specimens or those not preserved in ethanol) are needed by implementing DNA taxonomy together with morphological investigation.

In summary, the global-scale phylogeny and species delimitation of the cosmopolitan littoral earthworm P. litoralis were investigated here using an integrative taxonomic approach, with both single-locus and multi-locus multispecies coalescent-based species delimitation methods. The study revealed several MOTUs within P. litoralis based on COI species delimitation alone, and this was well supported by the ITS2 data. The phylogenetic tree shows deeply divergent mitochondrial lineages and a high number of haplotypes, especially for COI. In the absence of supporting morphological characteristics, we suggest that the morphospecies P. litoralis be regarded as a complex of cryptic species. Further in-depth studies of the morphology and anatomy of these littoral earthworms, e.g., by using scanning electron microscopy, are required to investigate the potential presence of cryptic morphology, which would provide further evidence for a more precise taxonomic revision of the species complex. Moreover, studies on population genetics and a search for more evidence for (or against) gene flow and/or reproductive barriers are needed.

Materials and methods

Specimen collection and morphological examination

Specimens of P. litoralis have been collected since 2007 from several types of habitats, such as sandy beaches, mangrove swamps of the intertidal zone, sanitary sewer links, estuaries, under trash or leaf litter, and freshwater channels between the mainland and the sea, in Thailand and surrounding countries in Southeast Asia (Fig. 1A,B). All specimens were deposited in the Chulalongkorn University Museum of Zoology (CUMZ), Thailand. Additional Japanese, Taiwanese, and Fijian specimens deposited in the collection at Chubu University, Japan, were included in the analyses. These littoral earthworms were found in sand mixed with seaweed debris on sandy beaches facing the ocean in Taiwan and Japan, ranging from the northernmost record at Matsushima Bay, Miyagi Prefecture, to Aichi Prefecture, Mideast Honshu, Fukuoka Prefecture, Kyushu, and the Ryukyu archipelago. In addition, specimens of P. litoralis were collected by Christer Erséus and his team from different beaches at Lizard Island (Great Barrier Reef, Australia), Carrie Bow Cay (the barrier reef off the coast of Belize), and from three localities in the southeastern USA: Cedar Point (Alabama), Craig Key (Florida Keys), and Indian River Lagoon at Fort Pierce (Florida), the two latter sites being about 350 km apart. The Australian sites were all in depressions immediately behind the beach sand, while the US and Belizean sites were in the upper intertidal zone on the seaward slope of the beach. Finally, worms were also obtained from Turkey (Biga Peninsula in the Marmara Sea; courtesy of Sermin Acik Cinar) and South Africa (Grahamstown; courtesy of Sam James). All specimens were preserved in 80–99% (v/v) ethanol for molecular analyses. For other details of the worms used in the analysis, see Table 1. Morphological identification (Fig. 1C) was based on the taxonomic literature, following Easton (1984), Gates (1972), and Seesamut et al. (2018)1,2,30.
All work with animals was conducted in accordance with the Institutional Animal Care and Use Committee of Khon Kaen University (IACUC-KKU) under approval number IACUC-KKU-32/65.

DNA extraction, PCR amplification, and DNA sequencing

For voucher specimens of P. litoralis from Southeast Asia, Japan, and Taiwan, total genomic DNA was extracted from the posterior part of each earthworm using the Lysis Buffer for PCR (Takara), following the manufacturer's protocol. Two molecular markers were amplified: a fragment of mitochondrial COI and the internal transcribed spacer 2 (ITS2) region of the nuclear ribosomal DNA. The COI fragment was amplified with the Tks Gflex™ DNA Polymerase (Takara) using the universal primers39 LCO1490 and HCO2198, while primers 606F (forward) and 1082R (reverse)40 were used for ITS2. The PCR mixture was as follows: 1 μL of Tks Gflex DNA Polymerase (1.25 units/μL), 25 μL of 2 × Gflex PCR buffer (Mg2+, dNTP plus), 1 μL of each primer (10 μM), 19.5 μL of sterilized distilled water, and 2.5 μL of crude lysate with Lysis buffer. The PCR thermal cycling consisted of 94 °C for 2 min, followed by 35 amplification cycles of 94 °C for 60 s, 48 °C for 60 s, and 72 °C for 2 min, and a final extension at 72 °C for 5 min. The concentration and quality of the amplicons were examined by 1% (w/v) agarose gel electrophoresis against a DNA standard marker in 1 × TAE buffer and detected under UV transillumination after staining with SYBR® Safe DNA Gel Stain. Samples for which direct sequencing of the nuclear marker failed were subcloned using the Promega pGEM-T Easy Vector System (Promega, Cat: A1360) to separate allelic variants before sequencing. Purification and sequencing of the PCR products were done commercially by Macrogen Inc. (Japan).
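As a quick arithmetic check on the reaction described above, the following sketch sums the listed volumes and derives the final concentrations by simple dilution (C_final = C_stock × V_stock / V_total). The variable names are illustrative, and the programme time counts only the stated hold times, ignoring temperature ramping.

```python
# Volumes (microlitres) as listed for the Tks Gflex reaction in the text.
components_ul = {
    "Tks Gflex DNA Polymerase (1.25 units/uL)": 1.0,
    "2x Gflex PCR buffer": 25.0,
    "primer 1 (10 uM)": 1.0,
    "primer 2 (10 uM)": 1.0,
    "sterilized distilled water": 19.5,
    "crude lysate": 2.5,
}

total_ul = sum(components_ul.values())

# Simple dilution: final concentration = stock concentration * stock volume / total volume.
primer_final_um = 10.0 * components_ul["primer 1 (10 uM)"] / total_ul
buffer_final_x = 2.0 * components_ul["2x Gflex PCR buffer"] / total_ul
polymerase_units = 1.25 * components_ul["Tks Gflex DNA Polymerase (1.25 units/uL)"]

# Hold times only (minutes): initial step + 35 cycles + final extension.
cycle_minutes = 1 + 1 + 2
programme_minutes = 2 + 35 * cycle_minutes + 5

print(f"total volume: {total_ul} uL")
print(f"each primer: {primer_final_um} uM; buffer: {buffer_final_x}x")
print(f"polymerase: {polymerase_units} units per reaction")
print(f"programme hold time: {programme_minutes} min")
```

Under these figures the components add up to a 50 μL reaction, in which the 2 × buffer is diluted to 1 × and each primer to 0.2 μM.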

For the specimens from the remaining localities, DNA was extracted from small pieces of worm tissue with the E.Z.N.A.® Tissue DNA Kit II (Omega Bio-tek), following the instructions for kits requiring OB protease, or in some cases with the DNeasy® Blood & Tissue Kit (250) (QIAGEN). For samples extracted with E.Z.N.A., the tubes were incubated at room temperature for five minutes before eluting the DNA. The remaining parts of the specimens were deposited, as vouchers, in the Swedish Museum of Natural History, Stockholm. The extracted DNA was then used to PCR amplify fragments of the COI gene and the nuclear ITS2 region using puReTaq Ready-To-Go PCR Beads (GE Healthcare). Amplification was done according to the kit instructions. The COI sequences were amplified in an Eppendorf thermal cycler programmed for 35 cycles of 40 s at 95 °C, 45 s at 45 °C, and 1 min at 72 °C, with an initial denaturation period of 5 min at 95 °C and a final extension period of 8 min at 72 °C. For the ITS2 region, there were 25 cycles of 30 s at 95 °C, 30 s at 50 °C, and 1 min at 72 °C, with the same initial denaturation and final extension periods as for COI. The PCR products were checked by electrophoresis on agarose gel (1%) stained with ethidium bromide (3%), and the successfully amplified PCR products were purified using an E.Z.N.A® Cycle-Pure Kit (GE Healthcare) according to the manufacturer’s instructions, except that 100 μL of CP buffer was used and the final elution was done with 40 μL of sterile deionized water. The products were then sent to Macrogen Inc., South Korea, where all samples were sequenced.

Sequence editing, alignment, phylogenetic reconstruction, and haplotype analysis

To identify and verify the amplified sequences, the obtained sequences were submitted to the BLASTn algorithm and compared with other sequences available in the GenBank database of the National Center for Biotechnology Information (NCBI) (https://blast.ncbi.nlm.nih.gov/Blast.cgi). All sequences were assembled, edited, and aligned in MEGA X41 using the MUSCLE algorithm42 with default parameters, and then checked manually by eye.

Phylogenetic analyses were conducted on the COI gene and on the concatenated dataset (COI + ITS2). The best-fit nucleotide substitution model of each gene fragment was determined using jModelTest v2.1.1043. Phylogenetic trees were reconstructed under maximum likelihood (ML) through the online portal CIPRES Science Gateway44, as implemented in RAxML-HPC2 on XSEDE45, with 1,000 bootstrap replicates and default parameter settings. The ML tree was constructed in RAxML with GTR + CAT as the nucleotide substitution model. The resulting tree was plotted using FigTree v.1.4.4 (http://tree.bio.ed.ac.uk/software/figtree) and the tree diagram was created in Adobe Photoshop 2020. The ML analysis of the concatenated data (COI and ITS2) was done after partitioning the concatenated data with Kakusan446. For the haplotype analysis, the NEXUS file was created with DnaSP v.647 and the haplotype networks were constructed in PopArt48 using the TCS method49. Genetic divergences were examined using uncorrected p-distances as implemented in MEGA X, with a bootstrap re-analysis of 1,000 pseudoreplicates.
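For readers unfamiliar with the measure, an uncorrected p-distance is simply the proportion of compared sites at which two aligned sequences differ. The sketch below is a minimal illustration, not the MEGA X implementation: it assumes pairwise deletion of sites with gaps or ambiguous bases, and the function names and toy sequences are hypothetical.

```python
from itertools import product

VALID_BASES = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites, skipping sites with a gap or ambiguity code."""
    compared = [
        (x, y) for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES
    ]
    if not compared:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in compared) / len(compared)


def mean_between_group_distance(group1, group2):
    """Mean p-distance over all pairs with one sequence from each group."""
    distances = [p_distance(a, b) for a, b in product(group1, group2)]
    return sum(distances) / len(distances)


# Hypothetical toy groups standing in for two MOTUs.
motu_1 = ["ACGTACGTAC", "ACGTACGTAT"]
motu_2 = ["ACGAACGTTC"]
print(f"{100 * mean_between_group_distance(motu_1, motu_2):.1f}%")
```

Between-group means of this kind are what is compared against a rule-of-thumb threshold when distances between MOTUs are interpreted.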

Mitochondrial and multi-locus species delimitation analyses

Molecular species delimitation using the COI sequences was performed using the ABGD20, ASAP21, bPTP22 and GMYC23 methods. ABGD is a simple method for splitting a sequence alignment dataset into candidate species. We used the ABGD online server with default settings to divide the specimens into clusters (http://wwwabi.snv.jussieu.fr/public/abgd). The ASAP analysis21 was run on an online web server (https://bioinfo.mnhn.fr/abi/public/asap/) under the Kimura (K80) model. The partition with the lowest score was considered50. The bPTP analysis was carried out using an online web server (https://species.h-its.org/) with 100,000 MCMC generations. The GMYC method is a likelihood method for delimiting species by fitting within- and between-species branching models to reconstructed gene trees. The initial Bayesian tree was constructed in the BEAST v1.10.4 package51,52. All parameter settings were configured in BEAUTi v1.8.4, while Tracer v1.6 was used to inspect the trace file and check the effective sample size (ESS) values. Using the ultrametric tree produced by BEAST, the GMYC analysis was performed in the R package splits.
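The general idea behind distance-based delimitation can be sketched as follows. This is not the ABGD or ASAP algorithm, both of which infer the partition from the distance distribution itself; it is a simplified illustration that computes Kimura two-parameter (K80) distances and joins sequences by single linkage under a fixed, user-supplied threshold. The function names, the threshold value, and the toy sequences are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(a, b):
    """Kimura two-parameter distance: -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)).

    P and Q are the proportions of transitions and transversions.
    Raises ValueError for saturated pairs where the logarithm is undefined.
    """
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(sites)
    transitions = sum(
        x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)
        for x, y in sites
    )
    transversions = sum(x != y for x, y in sites) - transitions
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def cluster_by_threshold(seqs, threshold):
    """Single-linkage clusters of sequence indices joined below `threshold`."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if k80_distance(seqs[i], seqs[j]) < threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy alignment: two pairs of closely related sequences.
toy_seqs = [
    "ACGTACGTACGTACGTACGT",
    "GCGTACGTACGTACGTACGT",
    "TCGTTCGTTCGTTCGTTCGT",
    "TCGTTCGTTCGTTCGTTCGC",
]
print(cluster_by_threshold(toy_seqs, threshold=0.1))
```

A fixed threshold is the crudest possible choice; the appeal of ABGD and ASAP is precisely that they avoid having to set one in advance.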

Multi-locus species delimitation was performed using BPP v.3.328,29 on the COI and ITS2 datasets used in the ML analysis. The molecular operational taxonomic units (MOTUs) obtained from the GMYC analysis were used as the input because this analysis yielded the highest number of MOTUs; one MOTU for which no ITS2 sequence was available was omitted from the analysis. The joint Bayesian species delimitation and species tree estimation28,53,54 was used, and three analyses (A–C) with different population size (estimated by θ) and divergence time (τ0) priors were performed, using the same settings and priors as in Martinsson and Erséus (2018)55 and Martinsson et al. (2020)32 (A: θ = 2, 400, τ0 = 2, 200; B: θ = 2, 1000, τ0 = 2, 200; C: θ = 2, 2000, τ0 = 2, 200). Each analysis was run for 200,000 generations, discarding the first 4,000 as burn-in, and all analyses were performed three times to confirm consistency between runs. We considered the species delimited with a PP (posterior probability) > 0.95 in all analyses to be well supported.
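The acceptance rule described above (a mean PP over three runs, required to exceed 0.95 in every analysis) can be expressed as a short sketch. It assumes that the PP values have already been extracted from the BPP output by hand; the function name, MOTU labels, and PP values are invented placeholders and do not reproduce Table 2.

```python
from statistics import mean

CUTOFF = 0.95


def supported_in_all(pp_by_analysis, cutoff=CUTOFF):
    """True if the mean PP over replicate runs exceeds `cutoff` in every analysis."""
    return all(mean(runs) > cutoff for runs in pp_by_analysis.values())


# Invented placeholder values: three runs for each of analyses A, B, and C.
example = {
    "motu_x": {"A": [0.99, 0.98, 1.00], "B": [0.97, 0.99, 0.96], "C": [1.00, 0.99, 0.98]},
    "motu_y": {"A": [0.99, 0.97, 0.98], "B": [0.40, 0.95, 0.10], "C": [0.96, 0.97, 0.99]},
}

accepted = [name for name, pps in example.items() if supported_in_all(pps)]
print(accepted)
```

Requiring support under all three prior settings is deliberately strict: a MOTU that is supported only under some priors, like the second placeholder here, is not accepted.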