Introduction

Barcoding the Tree of Life

Barcoding initiatives across the tree of life have helped document and describe thousands of species of bony fishes, birds, sharks, and sponges, among many other groups1,2,3,4,5. Cold Code6, the barcoding initiative for amphibians and non-avian reptiles, has similarly produced an immense quantity of sequence data for the mitochondrial locus encoding cytochrome c oxidase subunit I (COI). Cold Code and other barcoding initiatives provide a cost-free sequencing service for up to ten individuals of any species. In conjunction with databases such as the Barcode of Life Data Systems (BOLD), GenBank, and Dryad, researchers without access to sequencing facilities can produce and visualize novel sequences before adding preexisting data and running analyses. Implementation of Cold Code has contributed considerably to taxonomic resolution in Third World nations, and has been applied to conservation efforts in the regions that most need them7. Although Cold Code instigated barcoding on the grounds of species identification and discovery8, recent studies have increasingly used barcoding data for phylogenetic inference and to answer phylogeographic questions9, 10. This practice is often undertaken without sufficient assessment of the utility of barcoding for the taxonomic group of interest. Inference at deep timescales may be severely compromised by the rapid mutational rate and limited size of the COI fragment used for barcoding. At shallower timescales, and in narrower phylogenetic contexts, DNA barcoding remains valuable11.

Limitations to Barcoding

Despite ease of amplification, subsidized sequencing, and fast mutational rates making for high informativeness, mtDNA species-level inference via barcoding has its drawbacks. Mitochondrial phylogenetic reconstruction may be hampered by introgression and hybridization, male-biased gene flow, and selection on the linked mitochondrial genome, among other limitations12. Specifically, in several taxonomic groups—blowflies13; birds14; orthopterans15; dipterans16—mtDNA divergence and barcoding have been shown to be insufficient in delineating rapidly evolving species lineages, or those likely to introgress mitogenomes. However, these cases are interesting exceptions and when barcoding is used in concert with alternative methodologies such as ecology, morphology, and nuclear genomic data, barcoding is a powerful tool17,18,19. These integrative approaches facilitate pluralistic assessments of species delimitation and enhance accuracy. Requisite morphological diagnosis as part of species descriptions can quickly and easily pair with molecular data produced by DNA barcoding20, 21.

Systematics of Cyrtodactylus Gray 1827

Since the last extensive molecular phylogenetic assessment of Cyrtodactylus 22, more than 40 new species have been described using morphological, molecular, or integrative methods21, 23,24,25. Indeed, as of 2016, several species26,27,28,29,30,31 and many lineages await description23, 32, 33. These add to the more than 200 formally described species34, and contribute to the growing number of publications (100+ per year) discussing Cyrtodactylus (Supplemental Fig. 1). In lieu of costly molecular methods, many of these species descriptions rely solely on a morphological framework. These analyses distinguish species from their closest congener(s), diagnose species within their local region, and leave them unassigned or ambiguously assigned to a more inclusive species-group. This is compounded by rapid species discovery which outpaces a phylogenetic understanding of this immensely successful genus.

Cyrtodactylus ranges from Pakistan and western India eastward to the Solomon Islands and in doing so covers an enormous expanse of ecoregions and global biodiversity hotspots35. Given the distributional spread across geopolitical borders, the number of researchers involved, and methods of specimen collection, it remains a challenge to keep current with the systematics of this group. Biodiversity estimates are consistently underreported for a number of countries within the range of Cyrtodactylus. With increased attention and sampling throughout Southeast Asia, specifically in the Indochinese, Sundaic, Philippine, Wallacean, and Papuan regions, it remains vital to maintain consistency in methods for accurate records of species diversity. Where barcoding datasets do exist for Cyrtodactylus, they have been created almost exclusively for species descriptions21, 24, 25. Often these barcoding phylogenies are carried out within the confines of a single country, such as for Laos36 and Vietnam20, 37. The complex geological histories of the regions across which Cyrtodactylus occurs, and the convoluted biogeographic history of the genus itself, make these ‘barcode-by-country’ reviews potentially misleading in their phylogenetic conclusions. Indeed, more inclusive molecular phylogenies are already beginning to resolve the synonymy of a number of bent-toed gecko species38. And while we are aware of no researchers who would agree with a geopolitically monophyletic hypothesis (clades are restricted to country borders) for Cyrtodactylus, ‘barcode-by-country’ reviews continue to unintentionally make just such phylogenetic assumptions.

Herein, we highlight the utility of the barcoding marker COI for intraspecific and shallow interspecific phylogenetic use, and encourage its use as an alternative to morphology-only systematic comparison. Additionally, we hope to draw attention to the potentially damaging practice of “barcoding-by-country,” by elucidating the fractured biogeographic history of Cyrtodactylus throughout the Indochinese region. We use Vietnam as an explicit example of a geopolitical boundary thought to be inhabited by three independent lineages22, to encourage a broader comparison of Cyrtodactylus in taxonomic and systematic works. Ultimately, for researchers without access to funding or sequencing facilities, DNA barcoding with the Cold Code continues to allow us all to work towards more complete sampling of Cyrtodactylus, providing a more accurate picture of the taxonomic and morphological diversity of this genus.

Results

Phylogenetic Inference using COI and ND2

New sequences and those acquired from GenBank included a total of 63 individuals sampled for both mitochondrial markers. In the fully sampled COI (Fig. 1) and the COI/ND2-standardized genealogies (Fig. 2), deeper relationships within Cyrtodactylus obtained very little support. However, nearly all (37/39) intraspecific relationships were strongly supported (BSS ≥ 90%). Sister-taxa relationships were also well supported (BSS ≥ 70%) in both full and standardized genealogies. As expected, no support existed for reciprocal monophyly of current geopolitical regions.

Figure 1

‘Fully-sampled’ maximum likelihood phylogeny of Cyrtodactylus as inferred from mitochondrial locus COI, including novel sequences contributed by this study (51) indicated by asterisks. Circles at nodes indicate BSS values of ≥70: grey indicate intraspecific sampling and black interspecific sampling. Bolded names indicate samples also included in the ‘Standardized ND2’ phylogeny (Fig. 2). Sample numbers are included to aid in determining relationships in cases where more than 2 samples were used for a given species, or species are reconstructed as paraphyletic. Cyrtodactylus pubisulcus image drawn by IGB from photograph courtesy of Ben Karin.

Figure 2

‘Standardized ND2’ maximum likelihood genealogy of ND2 including only taxa for which COI sequence data also exist. Circles at nodes indicate clade congruence between ND2 and COI loci, with BSS values of ≥70: blue indicate species groups, black interspecific sampling. Asterisks indicate new ND2 sequences contributed by this study. Upper map shows the geopolitical distribution of samples included in this phylogeny, and colored circles associated with tree tips correspond to this map. Lower map highlights the Indochinese region, and boxes represent generalized sampling localities of species groups (IM, IA, IB, IC, TM, WM, EW, VA, VB; denoted by blue circles at nodes). Sampled country localities indicated by colored circles at the tree tips highlight the interdigitated nature of geographic relationships within phylogenetic species groups. Maps drawn and adapted by IGB in Adobe Illustrator CS6 from public domain image provided by Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Location_Map_Asia.svg).

The genealogy based on ND2 and standardized to our COI sampling strongly supported the majority of intraspecific relationships (Fig. 2). Analyses of sampling-standardized ND2 obtained greater and more frequent support for sister-taxa relationships, as well as strong support (≥90%) at a number of deeper nodes that denoted species-groups of Cyrtodactylus (Fig. 2; colored boxes denote geographic region). Biogeographic matrilines returned by analysis of ND2 were largely consistent with those presented by Wood et al.22, albeit with reduced support.

Congruence in Mitochondrial Markers

Prior phylogenetic reconstructions (combined mitonuclear) of Cyrtodactylus found mtDNA matrilineal genealogies and nDNA phylogenies were largely congruent22, 23, 32. Matrilineal phylogeny as inferred by ND2 has been valuable in predicting accurate phylogenetic relationships within Cyrtodactylus 22. Both ND2 and COI genealogies strongly supported the monophyly of several species groups that were obtained consistently in other investigations of Cyrtodactylus 23, 32, 39,40,41. Exclusive of C. battalensis—the sole representative of the West Himalayan group—there was strong support (91-ND2/72-COI) for the monophyly of an India-Myanmar (IM) sister-group to the remaining species of Cyrtodactylus. Both genealogies supported three independent Indochinese groups: (A; IA) C. chanhomae, C. lomyenensis, and C. phongnhakebangensis (96/83); (B; IB) C. hontreensis, C. intermedius, and C. phuquocensis (98/72); and (C; IC) C. tigroides, C. bichnganae, and C. cf. chauquangensis (99/70). These matrilines included residents of Thailand, Laos, and Vietnam, without geopolitical monophyly. Members of the ‘C. sworderi complex’ (WM)39, 40 varied in support (100/65), as did an East/West Malaysian (EW) group composed of C. pubisulcus, C. yoshii, and C. aurensis (88/72). Moderate support existed for a Thai/Malay Peninsula (TM) matriline comprised of C. interdigitalis, C. elok, and C. jarakensis. Additionally, there was strong support for distinct Vietnamese groups A (VA) (100/73) and B (VB) (85/75), although no consistent support united them into a monophyletic group (55/40). Indochinese species from Vietnam, Thailand, and Laos were assigned to multiple clades (5, 3, and 3, respectively), which were strongly supported across both molecular datasets.
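The paired support values reported above can be tabulated to see which species groups clear a chosen threshold in both genealogies. The following is a minimal sketch rather than part of the original analysis. It assumes that every pair follows the ND2/COI order given explicitly for the IM group, it uses 70 as the cutoff, and it omits the TM matriline because no numeric values are reported for it. The names `SUPPORT` and `congruent_groups` are illustrative.

```python
# Bootstrap support per species group as (ND2, COI), transcribed from the
# paragraph above. The ND2-first order is an assumption based on the
# "91-ND2/72-COI" notation used for the IM group.
SUPPORT = {
    "IM": (91, 72),
    "IA": (96, 83),
    "IB": (98, 72),
    "IC": (99, 70),
    "WM": (100, 65),
    "EW": (88, 72),
    "VA": (100, 73),
    "VB": (85, 75),
    "VA+VB": (55, 40),  # proposed union of the two Vietnamese groups
}


def congruent_groups(support, threshold=70):
    """Return the groups whose support meets the threshold at both loci."""
    return [
        name
        for name, (nd2, coi) in support.items()
        if nd2 >= threshold and coi >= threshold
    ]


if __name__ == "__main__":
    print(congruent_groups(SUPPORT))
```

Under these assumptions, WM drops out because its COI support (65) falls below the cutoff, and the VA+VB union fails at both loci.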

Discussion

As in any field, assessing the appropriateness of the data to resolve the question of interest is paramount. In molecular systematics studies, this means addressing the ability of the data to provide phylogenetic information at the evolutionary depth or depths of interest. DNA barcoding has been lauded as a way to cheaply and rapidly include molecular data in species descriptions and phylogenetic studies. However, the evolutionary scale of the group of interest often resides outside the limits of barcoding’s phylogenetic reconstruction abilities. We find that COI alone cannot replace phylogenetic assessment by multilocus mitonuclear study, nor does it resolve relationships as accurately as another single mitochondrial locus (ND2). What it does provide, however, is valuable information for shallow-scale interspecific and intraspecific systematics, which is invaluable to species discovery.

When viewed in its entirety, instead of by geopolitical boundaries, Cyrtodactylus shows a general west-to-east biogeographic trend22. A number of eastward dispersals of Indochinese origin into the Sundaic, Wallacean, Papuan, and Philippine regions punctuate this overall pattern22. These dispersal events account for the distribution of geographically proximate species interspersed across the tree of Cyrtodactylus. This is particularly relevant to the appropriate differential diagnosis of novel taxa. Some groups of Cyrtodactylus are easy to identify morphologically from geographic congeners, such as ground-dwelling members of the subgenus Geckoella from India and Sri Lanka23, Papuan giants42, and Sundaic dwarves43. In contrast, however, Vietnamese bent-toed geckos represent a prime example of a morphologically conservative body plan involving multiple species groups. Our trees depict five well-supported matrilines of Vietnamese Cyrtodactylus (Fig. 2; orange circles) interspersed with inhabitants of other Indochinese and Sundaic nations. This convoluted biogeographic history highlights the necessity of molecular and morphological comparison against closest phylogenetic and not solely political congeners.

Barcoding initiatives across the tree of life largely coincide with an interest in species discovery and delimitation. At least 12 species of Cyrtodactylus have been described since 2012 using a combination of morphological means and barcoding data. However, during that same period, several other species have been described based solely on morphological assessments26, 44,45,46,47,48. Prior to the initiation of DNA barcoding and Cold Code, the inclusion of molecular data in species descriptions was time-intensive, costly, and limited significantly by access to sequencing resources. The advent of Cold Code and the introduction of subsidized genetic barcoding make it possible to include molecular results in species descriptions. Notwithstanding, barcoding is not the ultimate phylogenetic tool because it offers only a matrilineal perspective on the history of species, and the rapid evolution of barcoding genes often precludes the resolution of deep relationships.

DNA barcoding in other taxa has, unfortunately, failed to resolve interspecific relationships or to identify independently evolving lineages and, worse, has misidentified interspecific relationships as a result of mitogenome introgression13,14,15,16. Our analyses address the use of genetic barcoding as a method for inferring historical associations among species of Cyrtodactylus via direct comparison with another popular mitochondrial marker, ND2. Prior to the implementation of Cold Code, alternative mitochondrial markers such as ND2, 16S, and cytb were used more frequently as markers for identifying independently evolving units for taxonomic description. However, as DNA barcoding has become more popular, COI has supplanted alternatives due to its near-universal applicability. COI also is the dominant marker for describing and inferring relationships between novel taxa within this genus. As a result, many species of Cyrtodactylus have been described using morphology in combination with either COI or ND2, but rarely both molecular markers. Here, our assessment adds 46 samples to allow for direct comparison of both loci and to assess the value of COI as a phylogenetic tool in Cyrtodactylus.

Neither COI nor ND2 resolves deeper relationships within Cyrtodactylus with much support. This result likely owes to the phylogenetic depth, i.e. age of the genus, and the limitations of employing a single locus. Notwithstanding, the matrilineal phylogeny as inferred using ND2 is largely concordant with the nuclear DNA phylogeny of Wood et al.22. Moderate to strong levels of support for a series of species-groups in Fig. 2 highlight the value of COI in resolving shallow interspecific relationships that are consistent with those of ND2. The smaller fragment of COI (658 bp) and slower mutational rate when compared to ND2 (1047 bp + 400 bp of tRNAs) hamper phylogenetic inference beyond close relationships (Fig. 1). As an identifier of species groups, COI performs moderately well by providing support for 9 of 12 matrilines obtained with strong support by analysis of ND2.

DNA barcoding has been used most frequently in Cyrtodactylus as a method for describing and inferring relationships between novel taxa. Most of these investigations have used COI exclusively, and because of this, COI and ND2 datasets are largely non-overlapping. The standardizing of datasets across mitochondrial loci serves to evaluate the phylogenetic utility of COI as a tool for genealogical inference relative to ND2. Ultimately, many sister-taxa and some higher level relationships as suggested by our fully sampled COI tree cannot be tested against ND2 due to sampling. While COI plays a valuable role in species discovery and as a tool for informing other comparative methods (morphology, ecology, biogeography), we also recognize its shortcomings. When possible, we encourage the use of additional molecular markers (ND2, RAG1, PDC, MXRA5) for inferring relationships within this ultra-diverse genus. Ultimately, confident resolution may require massive amounts of data that next generation genomic sequencing yields, either complete mitogenomes, or SNPs from nuclear DNA. In addition to Cold Code-funded barcode sequencing, we encourage potential descriptors of new species of Cyrtodactylus to contact IGB and AMB regarding the possibility of additional molecular sequencing.

When used as the sole molecular marker for phylogenetic inference of a group of any considerable depth, or as an intraspecific marker for tracking matrilineal history, COI is unlikely to provide the resolution desired to confidently support or refute hypotheses. When appropriately used as part of a pluralistic methodology, however, DNA barcoding may prove extremely useful. Prior molecular assessment or “genetic screening” can help accurately place a novel species into a species group for the most useful morphological comparison. While it is important to diagnose new taxa in reference to geographic congeners, it is also necessary to distinguish them from their closest evolutionary congeners, to help develop a more complete image of their history. The high expense of DNA sequencers and satellite equipment and time-intensive methods continue to impede the inclusion of genetic data in species descriptions. In response, Cold Code provides cost-free sequencing of the DNA barcoding locus COI for up to 10 individuals of any species.

Materials and Methods

Ethics

Field and laboratory experimental protocols for NSF subaward 13–0632 and DEB 0844532 were approved by the Villanova University IACUC (approvals 16-14 and 11-04, respectively). Cyrtodactylus samples were collected in compliance with permits to NVT at the Institute of Tropical Biology, under the Vietnam Academy of Science and Technology, following guidelines of the Institutional Animal Care and Use Committee (IACUC).

Taxon Sampling and Molecular Methods

New sampling for this project was built upon molecular datasets assembled for investigations into inter- and intraspecific relationships within Cyrtodactylus 21,22,23,24,25, 36, 37, 39,40,41, 49. A large number of sequences were acquired from GenBank, and to this growing dataset we added 51 newly sequenced samples for COI and a further 25 for the mitochondrial locus ND2. Due to its comparatively fast mutation rate, length, history in the literature, and ease of amplification, ND2 has been used consistently in studies of squamate phylogenetics (>20,900 GenBank records), and as the primary locus for the systematics of Cyrtodactylus (>900 GenBank records). For these reasons we have chosen to compare COI directly to ND2 for use in bent-toed gecko phylogenetics. All samples are accompanied by locality data, voucher information, and GenBank accession numbers, recorded in Table 1.

Table 1 List of samples used in this study with appropriate voucher (museum or field) numbers, locality data, and GenBank accession numbers.

After extracting genomic DNA from liver, heart, or tail tissue preserved in 95–100% ethanol via Qiagen DNeasy Blood and Tissue kits (Qiagen), isolated DNA was quantified using a NanoDrop spectrophotometer (Thermo Scientific). Samples for COI amplification and sequencing were sent to South China DNA Barcoding Center at the Kunming Institute of Zoology. ND2 samples were amplified via polymerase chain reaction using standard primers and protocols22. All sequences were assembled, edited, and aligned in Geneious v.7, and protein-coding regions were translated to amino acid sequences to maintain proper reading frames and avoid premature stop codons. tRNA secondary structure was addressed and adjusted by eye for consistency. Final COI and ND2 alignments stretched 677 and 1,512 bp, respectively.
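The reading-frame check described above was performed in Geneious; the sketch below only illustrates the underlying idea in plain Python and is not the authors' procedure. It scans a nucleotide sequence for in-frame stop codons under the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA, and AGG are stops and TGA encodes tryptophan. The function name and the toy sequences are hypothetical.

```python
# Stop codons in the vertebrate mitochondrial genetic code
# (NCBI translation table 2). TGA is tryptophan here, not a stop.
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide offsets of in-frame stop codons.

    `frame` is the offset (0, 1, or 2) of the first complete codon.
    Trailing bases that do not fill a codon are ignored.
    """
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in VERT_MITO_STOPS
    ]


if __name__ == "__main__":
    # Toy sequences, not real COI or ND2 data.
    print(in_frame_stops("ATGTGAACC"))  # TGA is not a stop in this code
    print(in_frame_stops("ATGAGATAA"))  # AGA and TAA are both stops
```

For an internal fragment such as the COI barcode, any hit suggests a frame error, a sequencing error, or a nuclear pseudogene copy; for a complete coding region, a stop at the final codon is expected.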

Phylogenetic Analyses

Datasets of mitochondrial loci COI and ND2 were analyzed independently via the maximum likelihood (ML) framework for phylogenetic inference. The alignments of both genes were standardized to include the same species and, wherever possible, the same specimens, to allow for direct comparison of results. An additional COI alignment of two samples per species for all available species (GenBank accession numbers of some recently described species remain unavailable) was assembled to create a matrilineal genealogy representing all currently barcoded Cyrtodactylus.
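The standardization step can be sketched as a set intersection over species, with a preference for shared specimens. This is an illustrative reading of the sentence above, not the authors' script. The data layout (species mapped to specimen identifiers mapped to sequences), the specimen identifiers `v1` to `v5`, and the toy sequences are all assumptions.

```python
def standardize(coi, nd2):
    """Restrict two loci to the species sampled for both.

    Each argument maps species -> {specimen_id: sequence}. Within a
    shared species, specimens sequenced for both loci are kept when
    they exist; otherwise all specimens of that species are retained.
    """
    coi_out, nd2_out = {}, {}
    for species in sorted(set(coi) & set(nd2)):
        shared = set(coi[species]) & set(nd2[species])
        keep_coi = shared or set(coi[species])
        keep_nd2 = shared or set(nd2[species])
        coi_out[species] = {s: coi[species][s] for s in sorted(keep_coi)}
        nd2_out[species] = {s: nd2[species][s] for s in sorted(keep_nd2)}
    return coi_out, nd2_out


# Hypothetical specimen identifiers and toy sequences for illustration.
coi = {
    "C. elok": {"v1": "ACGT", "v2": "ACGA"},
    "C. yoshii": {"v3": "ACGG"},
    "C. aurensis": {"v5": "ACTT"},
}
nd2 = {
    "C. elok": {"v1": "TTGA"},
    "C. yoshii": {"v4": "TTGC"},
}
coi_std, nd2_std = standardize(coi, nd2)
```

In this toy input, C. aurensis is dropped because it lacks ND2 data, C. elok is reduced to the one specimen shared by both loci, and C. yoshii is kept with different specimens per locus.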

We used the Akaike Information Criterion (AIC) in PartitionFinder50 to establish the best-fitting models of evolution based on locus and codon position, specific to our analytical program (RAxML). ML analyses were carried out in RAxML 8.0 51 via the CIPRES supercomputing portal52. COI was analyzed as a single locus, and ND2 was partitioned into the protein coding region and tRNAs. We employed the GTR+I+Γ model of evolution, and ran 100 independent tree searches to find the best topology and 5000 bootstrap replicates to retrieve topological support values.
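For readers reproducing a comparable analysis locally rather than through CIPRES, the settings above map onto RAxML 8 command-line options roughly as sketched below. The exact invocation used in this study is not reported, so this is an assumption throughout: the executable name `raxmlHPC`, the random seed, the file names, and the choice of standard (`-b`) rather than rapid bootstrapping are all illustrative. The code only assembles argument lists and does not run RAxML.

```python
def raxml_commands(alignment, run_name, partition_file=None,
                   searches=100, bootstraps=5000, seed=12345,
                   exe="raxmlHPC"):
    """Build three RAxML 8 command lines as lists of arguments.

    1. ML searches from `searches` starting trees (best topology).
    2. `bootstraps` standard bootstrap replicates.
    3. Drawing bootstrap support onto the best ML tree (-f b).
    """
    # GTRGAMMAI is RAxML's name for GTR + Gamma + invariant sites.
    common = [exe, "-m", "GTRGAMMAI", "-s", alignment]
    if partition_file:
        # e.g. ND2 split into protein-coding region and tRNAs
        common += ["-q", partition_file]
    best = common + ["-p", str(seed), "-N", str(searches),
                     "-n", f"{run_name}_best"]
    boot = common + ["-p", str(seed), "-b", str(seed),
                     "-N", str(bootstraps), "-n", f"{run_name}_boot"]
    annotate = [exe, "-f", "b", "-m", "GTRGAMMAI", "-p", str(seed),
                "-t", f"RAxML_bestTree.{run_name}_best",
                "-z", f"RAxML_bootstrap.{run_name}_boot",
                "-n", f"{run_name}_support"]
    return best, boot, annotate


if __name__ == "__main__":
    # Hypothetical file names.
    for cmd in raxml_commands("nd2.phy", "nd2", partition_file="nd2_parts.txt"):
        print(" ".join(cmd))
```

The partition file contents are deliberately left out because the coordinates of the ND2 coding region and tRNAs within the alignment are not given in the text.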

Accession Codes (Data Availability)

All accession numbers are included in Table 1, except where pending acceptance to GenBank (noted as ‘Awaiting accession’).