Rapid dissemination of taxonomic discoveries based on DNA barcoding and morphology

The taxonomic impediment is characterized by dwindling classical taxonomic expertise, and slow pace of revisionary work, thus more rapid taxonomic assessments are needed. Here we pair rapid DNA barcoding methods with swift assessment of morphology in an effort to gauge diversity, establish species limits, and rapidly disseminate taxonomic information prior to completion of formal taxonomic revisions. We focus on a poorly studied, but diverse spider genus, Pseudopoda, from East Asia. We augmented the standard barcoding locus (COI) with nuclear DNA sequence data (ITS2) and analyzed congruence among datasets and species delimitation methods for a total of 572 individuals representing 23 described species and many potentially new species. Our results suggest that a combination of CO1 + ITS2 fragments identify and diagnose species better than the mitochondrial barcodes alone, and that certain tree based methods yield considerably higher diversity estimates than the distance-based approaches and morphology. Combined, through an extensive field survey, we detect a twofold increase in species diversity in the surveyed area, at 42–45, with most species representing short range endemics. Our study demonstrates the power of biodiversity assessments and swift dissemination of taxonomic data through rapid inventory, and through a combination of morphological and multi-locus DNA barcoding diagnoses of diverse arthropod lineages.

The turn of the millennium saw a reinforced emphasis on the "taxonomic impediment" 1,2 , characterized by dwindling classical taxonomic expertise. The typical slow pace of formal taxonomic revisions, combined with limited funding and lack of taxonomic jobs, translates to a significant lag between species discovery and taxonomic dissemination 3 . While notable effort and funds have been devoted to overcome this problem 2 , the taxonomic impediment persists 4 in spite of recent deluge of modern approaches to taxonomy that relate to DNA barcoding [5][6][7][8] and cyber-dissemination 9,10 . The problem of formal taxonomic speed of progress is furthermore exacerbated by increasing global climate changes and habitat destruction, both amplifying extinction speed [11][12][13] . Species-level taxa are the basic currency for most biological research and underlie conservation decisions, but for these purposes taxonomic information, no matter how great, is useless until it is published. The field urgently needs rapid taxonomic inventories that allow dissemination of taxonomic information-species delimitation, DNA barcodes, and morphological diagnosis-and avoid the typical taxonomic time-lag.
For hyperdiverse organisms such as many arthropods, the number of species described to date represents only a fraction of their estimated actual diversity 10,14 . The rate of discovery and taxonomic description of species is typically constrained by limited expertise and relatively slow pace of classical monographic research. For example, collection, examination, and description of specimens is usually challenged by deciphering taxonomic legacy including type searching, examination and interpretation, and reviewing scattered and often obscure literature. Thus, available information is withheld for years and sometimes decades before publication, a time during which this information is of little or no use to taxonomy's end users. Truly integrative taxonomic revisions that combine genetic/genomic data with classical full blown taxonomic treatments continue to be published, but usually at slow pace, and thus more rapid taxonomic approaches may also be useful 15 . Some studies thus rely heavily on 1 Scientific RepoRts | 6:37066 | DOI: 10.1038/srep37066 DNA barcodes for species discovery omitting formal taxonomy (paragraph below), and others emphasize rapid dissemination of informal or formal taxonomic information 16,17 . We explore a combination of such approaches by surveying a diverse lineage of huntsman spiders from East Asia (Fig. 1), the genus Pseudopoda Jäger, 2000 (Fig. 2). Our approach does not attempt in any way to replace formal taxonomic revisions, merely to speed up transmission of information.
Hebert and colleagues 6,7 proposed that DNA barcoding based on a 650 bp fragment of the mitochondrial gene cytochrome oxidase subunit I (COI) can be used for species identification and delimitation across diverse animal phyla, and since then the field has seen dramatic development and explosive use 18,19 . Despite some early opposition 20,21 , DNA barcoding is now routinely used in spider taxonomy [22][23][24][25][26][27] , although literature continues to disagree whether COI data may suffice 19 or not 28 to accurately delimit species. In addition to the classical mtDNA barcoding fragment, authors are increasingly analyzing additional nuclear markers 29,30 and genomic data. Genomic approaches are certainly powerful, but may be unnecessarily cumbersome (and costly) if the goal is rapid taxonomic dissemination; the data may be prohibitively burdensome to analyze while not adding critical elements to taxonomic decisions and may furthermore increase, rather than decrease, the time lag between discovery and description. However, a combination of barcodes from the biparentally inherited nuclear genome and the uniparentally inherited mitochondrial genome offers a simple and rapid way of improving the efficacy of DNA barcoding. Among nuclear DNA loci in both plants and animals, the internal transcribed spacers (ITS, ITS2) are n/a n/a n/a n/a  among the most popular markers for reliable species discovery. The reasons include ease of amplification, and a combination of conserved and variable regions 31,32 . ITS2 has been used in spider phylogenetic 14,33,34 and phylogeographic 29,30 analyses. However, it has not to our knowledge been utilized as the second, nuclear barcoding locus in taxonomic species delimitation, as successfully done e.g. in Collembola 35 .
In our study, we aimed to: (i) assemble a DNA reference library from described and, judging from our morphological identifications, putatively new Pseudopoda species; (ii) test the efficacy of DNA barcodes for species discovery, delimitation, and identification; (iii) evaluate the congruence among COI and ITS2 barcodes and morphology in regards to Pseudopoda taxonomy; (iv) provide morphological characters to directly support each inferred Pseudopoda species identification; and (v) rapidly disseminate taxonomic information alleviating the 'taxonomic lag' of standard approaches, with the explicit aim to provide a formal revision of these in due time. Using a combination of DNA barcoding with rapid morphological assessment, we find 19-22 new species in addition to 23 previously described ones. We provide evidence and taxonomically diagnostic information for each species and through rapid dissemination make these taxa -some as yet not formally named-immediately available for end users.

Results
Morphology. Based on traditional diagnostic features of genitalia a total of 572 individuals belong to 42-45 species including 19-22 new species, here referred to as Pseudopoda sp. Females of two species previously only known from males (P. roganda and P. interposita) are reported for the first time. We provide basic morphological information including diagnostic photographs of habitus ( Figure S1), palp ( Figure S2) and epigyne ( Figure S3), with full taxonomic treatments to follow (Cao et al. prep.). Most of the species are comparably small and possess similar carapace patterns, making species identification based on somatic features challenging ( Figure S1). In contrast, most species are readily diagnosable using genital characters although with the following exceptions: populations of P. bibulba (P. bibulba xz collected from XS, ZXS and P. bibulba we from EWS, WCT, see Table S1) and three populations of P. yunnanensis (P. yunnanensis wfs from WFS, P. yunnanensis qss from QSS, and P. yunnanensis ews from EWS, see Table S1) are genetically distinct but morphologically similar. In contrast, species within the P. sinapophysis group including P. sinapophysis, P. interposita, P. sp15, and those within P. sp2 group including P. sp2 and P. sp12 (Figures S2 and S3) are genetically distinct, but not readily morphologically diagnosable species. For P. bibulba, widespread in Yunnan Province, we collected specimens from four sites (XS, ZXS, EWS, WCT) (Fig. 1), and identified two morphs based on minor details of male palps and epigyna ( Figures S2H2, S2H3, S3L2, S3L3, S3M2, S3M3): P. bibulba we from WCT and EWS, P. bibulba xz from XS and ZXS. These can be distinguished by the sharp, curved embolic end, slightly different retrolateral tibial aphophysis in males ( Figures S2H2, S2H3), and the slightly broad anterior margins of the lateral lobes in females (Figures S3L2,  S3L3, S3M2, S3M3). Similarly, details of genitalia can diagnose species of the P. sinapophysis group and P. sp2 group, including the shape of retrolateral tibial aphophysis in males for both groups ( Figures S2C3, S2C4, S2D1 for sinapophysis group; S2B3, S2B4 for P. sp2 group), the anterior margins of the lateral lobes in females for sinapophysis group (Figures S3E1, S3E2, S3E3), and the posterior margins of the lateral lobes in females for P. sp2 group (Figures S3C3, S3D1). For P. yunnanensis, also widespread in Yunnan Province, we collected many specimens from three sites (QSS, EWS, WFS) ( Figure 1) belonging to three morphs respectively: P. yunnanensis qss, P. yunnanensis ews, P. yunnanensis wfs based on the tegular apophysis of the male palp ( Figures S2J1, S2J2, S2J3), while the epigyna are not diagnostic (Figures S3O1-P3).
Phylogenetic inference. From the total of 42-45 species analyzed here, 36-39 were represented by multiple individuals and 6 by a single individual (Table S1). All phylogenetic analyses (BI and ML for COI, ITS2, COI + ITS2) provided high resolution broadly agreeing on species limits, but with some notable differences in topology (Figures 3 and S4-8, Table 1). The BI and ML trees based on COI + ITS2 unambiguously place all individuals in cohesive, morphologically diagnostic groups, most of them strongly supported (100% bootstrap and posterior possibilities) (Figures 3 and S8, Table 1). The BI and ML trees based on COI showed that described species represented by multiple individuals were monophyletic with high support except P. yunnanensis that is paraphyletic ( Figures S4-5, Table 1). The BI tree based on ITS2 alone also supported the monophyly of 34 of 35 putative species with the only exception of P. serrata nesting within P. recta (ML tree is nearly identical, Figures S6-7, Table 1). The other five species represented by a single-specimen each formed an isolated branch outside other species clades (Figures 3 and S4-S8).
Species delimitation. Barcoding gap. COI barcodes for all 572 individuals were generated belonging to 42-45 putative species (Table S1). A subset of 140 individuals representing all but one of the putative species were amplified for ITS2. We found non-normally distributed K2P data (Kolmogorov-Smirnov test for all groups P < 0.001) for all groups, and statistically significant differences in K2P values between all intraspecific and interspecific comparisons (Mann-Whitney test P < 0.001) (Table S2). Single marker barcoding gap did not delimit the entire set of 42-45 putative species (Figure 4). However, most of putative species represented by multiple individuals had pairwise barcoding gaps. The exceptions include a couple of species with high intraspecific variation, that may represent additional cryptic diversity. Thus P. bibulba, P. namkhan, sp18, P. yunnanensis do not have pairwise barcoding gaps for COI ( Figure S4, Table S2), and P. daliensis and P. sp15 do not have pairwise barcoding gaps for ITS2 ( Figure S6, Table S2). Only P. yunnanensis can not be separated from close relatives with a pairwise barcoding gap in the combined COI + ITS2 dataset ( Figure 3, Table S2). For the "global" barcoding gaps, the concatenated COI + ITS2 dataset supported 44 species fully congruent with morphology, while single gene datasets defined 41-44 species (Figure 4, Table S2).

Species delimitation metrics from Geneious.
In general, Rosenberg's P AB statistic and P ID (liberal) yielded higher number of putative species than did P ID (strict) and P RD (randomly distinct). For the COI dataset, all putative species were monophyletic except P. yunnanensis, but both P ID (Strict) = 0.88 and P ID (Liberal) = 0.97 supported P. yunnanensis (Table S3). For P ID (Strict), putative species were supported by values of over 0.7, except P. confusa (0.47), P. sp14 (0.58), P. sp4 (0.58), P. sp7 (0.58). However, at least two of the four metrics supported all species (Table S3), and for example, all clades had P ID (Liberal) values above 0.9 except P. confusa (0.84) ( Table S3). The P RD (Randomly Distinct) and Rosenberg's P AB statistics also supported most species (for detail see Table S3). For the ITS2 dataset alone, and for the concatenated COI + ITS2 dataset, at least two metrics support each species hypothesis (Table S3).
ABGD. For the COI dataset, ABGD analyses using different parameter combinations produced non-identical results (Table S4). The analyses based on JC or K2P distances yielded 48 species under each parameter setting broadly agreeing with the above results, but further splitting up P. bibulba, P. namkhan, P. sp18, P. yunnanensis into P. bibulba xz, P. bibulba WCT, P. bibulba EWS, P. namkhan FLS, P. namkhan XSBN, sp18 FLS, sp18 HLT, P. yunnanensis qss, P. yunnanensis ews, P. yunnanensis wfs ( Figure S4). Analyses based on p-distance yielded even more putative species (Table S4). For the ITS2 dataset the ABGD results differed under different parameter settings (Table S4), however, they broadly agreed with other methods. The analyses based on JC or K2P distance yielded 46 species (not supporting P. prompta) supporting the core set of 41 putative species plus additional splitting: P. daliensis and P. recta were split to two and P. sp15 into three putative species, respectively. P. yunnanensis was split into P. yunnanensis qss, P. yunnanensis ews, P. yunnanensis wfs and P. sp7 and P. spiculata were merged ( Figure S6). Again, analyses based on p-distances yielded more species. The analysis of the concatenated matrix yielded broadly congruent results (Figure 3, Table S4).
GMYC. In general, the single-threshold GMYC model produced more species hypotheses (46-62 species), likely overly splitting genetically structured populations (Figures 3, S4 and S6). The GMYC results of COI, COI + ITS2 appear biologically unrealistic, fragmenting several morphological species into several geographic clusters (Figures 3 and S4). However, analyses of the ITS2 nuclear fragment yielded 46 putative species, which is broadly congruent with other analyses and with morphology ( Figure S6). bPTP. Similar to GMYC, the bPTP results imply up to 102 species, but with the analysis of the concatenated data yielding somewhat less extreme values (59 species , Fig. 3).

BPP. The Bayesian phylogenetics and phylogeography method was congruent with most above hypotheses
showing the highest support for the 44 species hypothesis (P = 1) and only slightly lower support for the 41 species hypothesis (P = 0.983).

Biogeographical patterns. Our results suggest that small range endemism characterizes Pseudopoda spe-
cies, yet with several species being sympatric (Table S1 , Figs 1 and 5). The phylogenetic analyses revealed strong geographic structure, as exemplified by two clades: (Eastern China; Tropical mainland China, Taiwan and Hainan Islands, Laos) (Fig. 5).

Discussion
Accurate identification of species is a crucial step in many areas of biological research. This step can be challenging due to lack of taxonomic expertise and the long lag between species discovery and formal taxonomic revisions. Thus information, even if it is available, is inaccessible to end users before taxonomic revisions are published. On the other hand, a serious limitation to the utility of DNA barcoding as a practical taxonomic resource is the lack of accompanying morphological data, and potential misidentification of voucher specimens 36 . The importance of accurate identification is obvious and providing a detailed voucher information and morphological character evidence used for identification vastly adds to the utility of DNA barcoding 37 . Our approach here aims to mitigate these problems by rapidly disseminating taxonomic information thus allowing immediate access to reliably identified specimens that are linked to DNA barcodes and morphological voucher specimens. In addition, we provide diagnostic illustrations of key characters of each putative species.
DNA barcoding routinely uses a single mitochondrial marker to establish species limits 6,7 , and this approach has recently been shown to work well in spider species identification 19 and delimitation 27 . However, some authors question the usefulness of COI data alone 28 and rather suggest the use of multiple markers, especially for the delimitation problem. Indeed, our results show that a single mitochondrial marker in isolation provides species delimitations that differ from those when the two are combined (Tables S2-S4), and the combination of two markers more closely matches morphological and geographical data. Of the tested approaches to species delimitation, GMYC and bPTP suggested considerably higher diversity estimates compared with the more congruent, and consistent estimates employing the barcoding gap analyses, PID, BPP, and morphology. Richness seemed overestimated using GMYC when analyzing COI in isolation and the combined dataset, but not when analyzing ITS2 alone. Thus the structuring of the mtDNA data at population level seems to mislead the method which may be more appropriately used with nuclear markers.
Despite some disparity among methods, the majority of the analyses agree about a twofold increase in known Pseudopoda species diversity in our sample, estimated now to be between 42 and 45. The correspondence between standard morphological diagnostic features (male palps) and molecular species delimitations is very high, other than that some barcoding methods suggest additional division of morphological species.
We included only a fraction of Pseudopoda diversity (23/121 known species) and hence biogeographical analyses are premature, however, some clear patterns emerge from our results. Although certain species are somewhat widespread, the striking pattern we observe is that the majority of species are short range endemics, restricted to one mountain, some restricted to one natural forest park, or even a single locality (Fig. 5). In one case, for example, two closely related species (P. interposita and P. sp. 15) both are short range endemics in the same small area, yet show no overlap being separated only by a small river (Fig. 5). According to Jäger 38 and Jäger et al. 39 Pseudopoda species sometimes show high local species diversity because of a strong altitudinal differentiation and an inability to balloon. This may explain many small range endemics within the genus, and other spider groups, e.g. Coelotinae, in the mountainous parts of Southeast Asia 39,40 . Furthermore, our phylogenetic analyses indicate correspondence between geography and phylogeny, possibly hinting at independent colonization events of eastern China versus tropical mainland China, Hainan, Taiwan and Laos. We emphasize that our sample is far too limited for detailed biogeographical analyses and that more geographical sampling and further markers are necessary to fully resolve relationships, and to test biogeographical scenarios in the genus Pseudopoda. However, the emerging patterns of short-range endemics and geographically structured phylogeny should stimulate further research on the biogeography of the group.
DNA barcoding continues to be an important tool to aid taxonomic discovery and identification both at the species and at higher taxonomic levels 7,36 . The single short COI region of classical barcodes, however, can sometimes be insufficient for accurate identification, and alternative markers have been used, e.g. ITS2 in plants. In agreement with our results other recent studies have also found that mitochondrial and nuclear markers combined can outperform standard barcodes, without placing too much burden on data cost and data analysis. For example, the combination of COI and ITS2 proved useful in barcoding of Collembola 35 . DNA barcoding can become more powerful when combined with classical morphological diagnoses and dissemination of standard taxonomical information, using a variety of analytical approaches [41][42][43][44][45] .

Conclusions
Our study demonstrates the power of rapid taxonomically driven inventories combined with swift dissemination of morphological and DNA data to rapidly disseminate taxonomic data of diverse arthropod lineages, making information available long before exhaustive revisionary work can be completed. Such approach may be especially suitable for taxonomically neglected lineages that contain considerable undiscovered diversity. With the vast majority of species remaining to be discovered and described, we advocate minimizing the lag between discovery and dissemination to facilitate biological research and conservation planning.

Materials and Methods
Taxonomic focus. With 121 described species, the genus Pseudopoda Jäger, 2000 (Fig. 2) is the third largest genus of the family Sparassidae 39,40 , although its actual species diversity is insufficiently known. Its species inhabit mountainous forests of southern, eastern and southeastern Asia 46,47 , and are mostly confined to distinct altitudinal zones between 300 and 3800 m 38,39 . About half of the known diversity, 51 species, are known from China. Most species have extremely confined distributions (e.g. P. mediana Quan et al. 46 and P. bicruris Quan et al. 46 ) but their ranges also often overlap (Figs 1 and 5). In Pseudopoda, morphological species delimitation is challenging due to species sympatry, habitus resemblance, and potential abundance of cryptic species. Specialists thus agree that in order to discover and identify species and to accurately resolve the taxonomy in this genus, a combination of morphological and molecular approaches is required 39 .
Specimen collection. We sampled 572 Pseudopoda and 2 Sinopoda individuals from 35 localities in eastern China (Hubei, Jiangxi, Anhui and Fujian Provinces), southwest China (Yunnan Province), and two large islands in the south and east China sea (Hainan and Taiwan). We further obtained nucleotide data from GenBank (P. prompta, from India, as well as P. confusa and P. namkhan, both from Laos) (Fig. 1, Table S1). The individual name includes species name followed by the locality code and specimen code. The locality codes are the locality abbreviations using capital letters (Table S1). We used 2 Sinopoda species (S. anguina, S. pengi) as outgroups. Field collected adult specimens were fixed in absolute ethanol, and their right legs were removed to be stored at − 80 °C for subsequent DNA extraction. All vouchers that we transferred to 75% ethanol for identification and morphological examination are deposited at the Centre for Behavioural Ecology and Evolution (CBEE), Hubei University, China.
Morphology. Genitalic structures (epigyna, male palps) and habitus for each putative morphological species were digitally imaged. Male palps and epigyna were examined and photographed with Leica M205C stereomicroscope and Olympus BX51 compound microscope after being dissected from the spider bodies. The digital images depicting the habitus and genital morphology were a composite of multiple images taken at different focal planes along the Z axis and assembled using the software package Helicon Focus 3.10. Left palps were photographed unless otherwise stated. Genitalia were cleared in boiling 10% KOH for a few minutes to dissolve soft tissues.
DNA extraction, PCR amplification and sequencing. DNA was extracted from two to four legs of each specimen using Universal Gen DNA Kit (CWBIO, Beijing, China). We used the universal primers LCOI1490 (GGTCAACAAATCATAAAGATATTGG) and HCOI2198 (TAAACTTCAGGGTGACCAAAAAATCA) for COI, as well as ITS5.8 (GGGACGATGAAGAACGCAGC) and ITS4 (TCCTCCGCTTATTGATATGC) for ITS2 to obtain PCR products following standard protocols 33,48 . Amplified fragments were sequenced in both directions by the Tsingke Biological Technology (Wuhan, China) and then assembled and proofread using the Chromaseq module in Mesquite employing Phred and Phrap. We generated COI barcodes for all specimens and ITS2 sequences for 140 specimens; these were chosen to represent unique COI haplotypes of all putative species, and all localities. Sequences were submitted to Genbank (accession numbers KY095934-KY096645, see Table S1).

Phylogenetic analysis.
We aligned all sequences using MAFFT 49 through the EMBL-EBI online portal with 100 tree rebuilding replications and 100 max iterations for a thorough search otherwise using default settings. Protein coding gene sequences were translated to amino acids and confirmed to contain no stop codons. Because the 5′ and 3′ ends of some COI sequences were of poor quality, all COI sequences were trimmed to 621 bp. ITS2 sequences were also manually trimmed to 398 bp. All individual sequences were verified to belong to Sparassidae using BLASTn in GenBank. The two genes were concatenated in Mesquite. We conducted Bayesian and maximum-likelihood analyses to estimate phylogenetic relationships among species needed for species delimitation. In all analyses, gaps and ambiguous bases were treated as missing data. We created four data partitions for sensitivity analyses to explore potential data conflict: ITS2, COI1st, COI2nd, COI3rd. The appropriate models for the Bayesian analysis, selected with jModelTest2 on XSEDE (2.1.6) 50 using the Akaike information criterion (AIC), were: GTR + I + G for COI1st and COI2nd, GTR + G for COI3rd and ITS2. Data matrices were analyzed using Bayesian inference with MrBayes 3.2.3 on XSEDE 51 analyzing both individual genes as well as concatenated matrices. The Markov chain Monte Carlo search for each matrix ran with four chains for 50,000,000 generations sampling the Markov chain every 1,000 generations, and the sample points of the first 12,500,000 generations were discarded as 'burnin' , after which the chains had reached approximate stationarity as determined by analysis in Tracer. Maximum likelihood analysis was done with RAxML Black Box on XSEDE 52 on the focal matrix with same partitions as implemented in the Bayesian analysis, but using a GTR + I + G model for all partitions, keeping other parameters default. All large analyses including jModelTest2, MrBayes 3.2.3 and RAxML 8.0 were run in parallel on the CIPRES cluster at the San Diego Supercomputing Center.

Species delimitation.
We analysed individual and concatenated matrices using six species delimitation methods. As DNA barcoding gap 7 and species delineation metrics (SDM) from Geneious require a priori species designation, we assigned the 572 individuals to 42-45 putative species according to combined analysis of phylogenetic topology from gene trees, morphological and geographic information.
We performed two DNA barcoding gap analyses, one looking at all data simultaneously (global) and the other making pairwise comparisons among species (focal) 36 . In the DNA barcoding gaps analysis, we computed genetic distances using Kimura 2-parameter (K2P) for each candidate species in Mega6 53 . We looked for statistically significant differences between intra-and interspecific K2P distances using means or medians, depending on data distribution 19 . We employed Kolmogorov-Smirnov test of normality, then performed parametric (one-way ANOVA) or nonparametric (Mann-Whitney test) statistics in SPSS.
Scientific RepoRts | 6:37066 | DOI: 10.1038/srep37066 The species delimitation plugin in Geneious v8.1.6 was utilized to obtain species delineation metrics including Rosenberg's PAB statistic 54 , PID statistics containing PID (Strict) and PID (Liberal) and PRD. Rosenberg's PAB statistic is the probability that a putative taxon will be monophyletic with respect to a sister clade containing "b" taxa under the null model of random coalescence 54 . The PID statistics provide the frequency with which a member of a putative species can be correctly identified given a specific alignment of sequences. The PID (Strict) requires the query sequence to fall within a monophyletic clade for an identification to be made. The PID (Liberal) requires the query sequence to fall within or to be sister to a monophyletic clade. PRD (probability randomly distinct) is the probability that a clade has the observed distinctiveness due to random coalescence. A probability value less than 0.05 rejects the null hypothesis of random coalescence and suggests that the members of a clade can be classified as species.
The generalized mixed Yule-coalescent (GMYC) methodology 56 uses likelihood to test for species boundaries by detecting the transition point of interspecific versus intraspecific rates of lineage coalescence. We performed GMYC analyses in the "splits" package for R. Following Xu et al. 27 , we used the single threshold model. We employed BEAST v. 1.8.0 57 to obtain an ultrametric gene tree as a GMYC guide tree, under a strict molecular clock model. We used standard arthropod substitution rates for COI employing a normal prior with a mean value of 0.0115, under Yule speciation model prior and ran 50 million generations, sampling every 5000 generations. We used TRACER v 1.6 58 to determine burn-in and to check for stationarity, then discarded as 'burnin' 10% of the trees in each chain to settle on an ultrametric tree using TreeAnnotator.
We carried out a Bayesian Poisson Tree Processes (bPTP) analysis employing the BI tree as input tree, as implemented online (http://species.h-its.org/ptp/) 59 . PTP is a single-locus species delimitation method using only nucleotide substitution information, implementing a model assuming gene tree branch lengths generated by two independent Poisson process classes (within-and among-species substitution events). The bPTP analysis was run using 100 000 MCMC generations, with a thinning of 100 and burn-in of 0.1, with removing the outgroups.
Finally, species delimitation on a guide tree (A10) was conducted using the program BPP 3.1 60 . This allows the use of the combined COI and ITS2 datasets to calculate the posterior probability support for each species delimitation hypothesis. In this analysis (with speciesdelimitation = 1, speciestree = 0), a reversible-jump MCMC (rjM-CMC) algorithm is used to move between different species-delimitation models that are compatible with a fixed guide tree. We used the default prior for the different species tree models (speciesmodelprior = 1), which assigns equal probabilities for the rooted trees. The population size parameters (theta, θ ) were assigned the gamma prior G (2, 1000), with mean 2/2000 = 0.001. The divergence time at the root of the species tree (tau, τ ) was assigned the gamma prior G (2, 1000), while the other divergence time parameters are assigned the Dirichlet prior (Yang & Rannala, 2010: equation 2) 60 . These parameters were obtained and optimized by the Analysis A00 (Parameter estimation under the multispecies coalescent, with speciesdelimitation = 0, speciestree = 0) which generates the posterior distribution of species divergence times (tau, τ ) and population sizes (theta, θ ) under the MSC model when the species phylogeny is fixed. Each analysis was run at least twice to confirm consistency between runs. We used the BI tree as guide tree using the 41 or 44 putative species.