Exploring hidden diversity in Southeast Asia’s Dermogenys spp. (Beloniformes: Zenarchopteridae) through DNA barcoding

Members of the freshwater halfbeak genus Dermogenys are hard to identify to the species level, despite several previous attempts to isolate fixed meristic, morphometric and colour pattern differences. This has led to ongoing confusion in the scientific literature, records of species occurrence, and entries in museum collections. Here, a DNA barcoding study was conducted on the genus to gain further understanding of its taxonomic status across the Southeast Asian region. Fish were collected from 33 localities, spanning freshwater and brackish habitats in Malaysia, Western Indonesia, Thailand and Vietnam. In total, 290 samples of Dermogenys spp. were amplified for a 651 base pair fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene. The analysis successfully differentiated three species (D. collettei, D. siamensis and D. sumatrana); revealed a new putative species, Dermogenys sp., sampled in sympatry with D. collettei at three locations; and uncovered two genetic lineages of a fifth species, D. bispina, with non-overlapping geographical distributions in two drainages of northern Borneo, Kudat and Sandakan. This study expands the barcode library for Zenarchopteridae and demonstrates both the efficacy of DNA barcoding for differentiating Dermogenys species and its potential for species discovery.

classified four of the then ten nominal species, considering three to be synonyms of D. pusilla. In a subsequent revision, Brembach 19 recognized ten species and three subspecies, with diagnosis based on modified anal fin characters in males. He classified several populations that had consistent differences in the andropodium (male modified anal fin) as D. pusilla. This has led to erroneous classifications throughout museum collections and the literature 11. Meisner & Collette 1 named a new Dermogenys species from Sabah as D. bispina. This was followed by the discovery of a further four species, differentiated based on phylogenetic analysis of the gonadal histology and embryonic modifications associated with viviparity, and on osteological characters in modified anal fin rays 11. Meisner concluded that the osteological characters of anal fin rays in males were diagnostic for species identification. Currently, there are 12 morphologically recognised species of Dermogenys in the Southeast Asian inland waters, as reviewed by Meisner 11.
Among the halfbeaks, the genus level diagnostics described by Rainboth 20 are widely used as a guide for field identification. To aid identification at lower taxonomic rank, Meisner & Burns 9,10 and Meisner 11 developed keys for species level Dermogenys identification based on extensive morphological and morphometric analyses of a large number of anatomical structures. However, taxonomic differentiation among Dermogenys species remains challenging, particularly in non-specialist facilities: individuals are small and morphological characters overlap within the genus, so identification requires a considerable degree of skill and taxonomic expertise. For instance, Meisner 11 concluded that the osteological characters are only informative at the generic level, while gonad histology yields characters useful for differentiating species groups within genera, as well as differentiating at genus level between Dermogenys and Nomorhamphus. According to Meisner 11, the single diagnostic character for species level identification is the shape of the modified anal fin; however, this character is only present in male specimens, and is only discernible through specialised radiography and optimal staining techniques. Although meristic and morphometric characters have proved efficient in resolving taxonomic questions at the species level in many teleost studies 21-30, in the case of Dermogenys they fail to adequately identify individuals to species level 11,19. Likewise, characters such as black pigmentation of pelvic and dorsal fins in males and melanophore arrangement from anterior to anal fin in females may aid in sorting Dermogenys specimens into different groups at the genus level, but not to a precise taxon.
This study therefore attempts to alleviate these taxonomic challenges through a DNA barcoding approach. This technique, which relies on fixed differences in mitochondrial DNA sequences between taxa, has been shown to be approximately 90% successful in species identification of freshwater and marine fishes 31-35. Once a DNA barcoding database is established, the method offers a rapid way to allocate unknown specimens to correct species names, and can be used to identify individuals that lack defining morphological characters, for example females, larval fishes, or processed fish products 36-38, among a range of other applications 39-42. The ability of DNA barcoding to identify unknown specimens has proven useful across a wide range of biological disciplines, such as biosecurity 43,44, wildlife forensics 45-47, phylogenetics 31,48-50, and more generally the conservation and management of wildlife 51-54.
To date, the only Dermogenys barcode sequences available in online DNA databases and identified to species level are for D. pusilla, while all other Dermogenys barcodes come from fish that were only identified to genus level. Therefore, this study aims to generate new DNA barcodes for the species obtained during the course of this study, and to assign these to the species level using morphological and molecular tools, providing a robust taxonomic framework for future research on Dermogenys. The findings provide further insights into systematics as well as the phylogenetic relationships of the genus and highlight the complementarity of morphological and molecular characteristics in elucidating the taxonomic status and systematics of this group.

Results
Collection and morphological identification. Sample collection across 33 freshwater locations in Thailand, Vietnam, Malaysia and Western Indonesia yielded a total of 290 Dermogenys specimens (Fig. 1, Table 1). Island-wide sampling was conducted in Western Indonesia (Sumatra); however, no Dermogenys specimen was obtained from the west coast of this island. Across all locations, most of the individuals collected (70%) were adults and therefore could be putatively assigned to their morphological species using the techniques outlined in the Methods section. This resulted in the identification of five morpho-species: D. collettei Meisner 11
Morphological identification among adults was not straightforward. Of the adults sampled, a combined total of 14 specimens from Sikao (Isthmus of Kra, Thailand), Alur Itam and Sungai Iyu (Western Indonesia), and Sungai Lalang and Kuala Sepetang (Peninsular Malaysia) could not be classified based on existing keys. At three of these sites (Alur Itam, Sungai Iyu and Kuala Sepetang), additional individuals were present that could be positively identified as D. collettei, so in these cases the co-occurring unidentified fishes were tentatively classified as being of the same species. In the other two instances (Sikao, n = 6, and Sungai Lalang, n = 12), no positive identifications could be made of any specimen, and therefore all were recorded as Dermogenys sp. No adult male was available from these locations, and although females were morphologically similar to D. collettei, they lacked the thin line of melanophores arranged from the anterior to anal fin that is characteristic of females of this species.
The remaining 86 (29.7%) samples were sub-adult specimens lacking morphological diagnostic characters. As sub-adults are expected to co-exist with adults, where adults could be identified the sub-adults were tentatively assumed to belong to the same species. For all specimens, DNA barcodes were then employed to confirm (or challenge) morphological identifications. All sequences were identified as Dermogenys spp. by the GenBank BLAST tool. However, because no species-level reference COI data existed for Dermogenys other than for D. pusilla prior to this study, and no D. pusilla was collected here, no individual could be identified to species level with the GenBank search. The current data set did, however, clearly exhibit variation consistent with five discrete monophyletic groups that all differed from D. pusilla, as visualised in Fig. 2. These groups were largely consistent with species designations based on morphology, identifying groupings for D. collettei
Not all was as expected, however: relatively high genetic distances were observed within groups, tentatively identified samples showed identity to unexpected genetic clusters, and individuals collected at the same location showed identity to different groupings. This is reflected in within-site measures of Kimura 2-parameter (K2P) genetic distance, which ranged from 0.0%, where all individuals had identical barcodes, to 5.1%, where individuals collected together were assigned to multiple genetic clusters, indicating that more than one putative species was present at these sites (Table 1). Samples from three locations showed high "intrapopulation" K2P distance due to the presence of multiple putative species: Kuala Sepetang (5.1%), Sg. Iyu (3.6%) and Alur Itam (3.8%). Individuals from all three sites had previously been flagged as hard to identify based on morphology and, as a result of the barcode evidence, all specimens from each site were sorted into two groups, the D. collettei group and the Dermogenys sp. group (Table 1), which are considered discrete putative species in all further analysis and discussion.
Average K2P genetic distances among barcodes generated here, GenBank D. pusilla barcodes, and representatives of other zenarchopterids are presented in Table 2. Within putative taxa, as defined using a combination of morphological similarity and genetic monophyly, the lowest mean distance (0.02%) was observed for the unclassified Dermogenys sp. (n = 28), which was collected across five locations (Table 1, Fig. 1). The maximum average distance at the intraspecific level was 1.5% (for D. bispina and D. pusilla), while the minimum average distance between species was 3.7%, between D. sumatrana and D. pusilla.
A barcode gap analysis (Fig. 3(a)) incorporating current data and existing D. pusilla barcodes revealed that "barcode gaps" were present among all pairwise comparisons of the six putative Dermogenys species, indicating that the six groups represent different putative species. Following this, Automatic Barcode Gap Discovery (ABGD) analysis generated 3 to 48 OTUs (Fig. 4). However, a prior intraspecific divergence (P) of 0.0077-0.0129, chosen based on Fig. 4, generated 7 OTUs instead of the expected six. The additional OTU (with respect to the six presumed species) arose because ABGD divided D. bispina into two groups, one from Kudat and the other from Sandakan (north and north-east Sabah, respectively).
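As a rough illustration of how a distance cut-off turns pairwise distances into OTUs, the sketch below groups specimens by single-linkage clustering at a fixed threshold. It is a deliberately simplified stand-in and not the ABGD algorithm, which infers the gap recursively from the data. The function name, specimen labels and within-group distance (0.002) are hypothetical; only the 2.7% between-group distance and the 0.0129 prior come from the text.

```python
from itertools import combinations


def cluster_by_threshold(ids, dist, threshold):
    """Single-linkage grouping: specimens whose distance is <= threshold share an OTU.

    ids       -- list of specimen identifiers
    dist      -- dict mapping frozenset({id_a, id_b}) to a pairwise distance
    threshold -- maximum distance treated as intraspecific
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy data: two specimens per D. bispina locality.
ids = ["kudat_1", "kudat_2", "sandakan_1", "sandakan_2"]
dist = {}
for a, b in combinations(ids, 2):
    same_site = a.split("_")[0] == b.split("_")[0]
    dist[frozenset((a, b))] = 0.002 if same_site else 0.027

otus_strict = cluster_by_threshold(ids, dist, 0.0129)  # two OTUs (Kudat, Sandakan)
otus_loose = cluster_by_threshold(ids, dist, 0.03)     # one OTU (lineages merged)
```

With the tighter cut-off the Kudat and Sandakan specimens fall into separate groups, while a cut-off above 2.7% merges them, which is why the choice of prior matters for the number of OTUs recovered.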
The K2P genetic distances of the newly assigned groupings (based on OTUs) are summarised in Table 3. The minimum genetic distance values of the newly assigned groupings were slightly lower than for the six presumed Dermogenys species, with intraspecific distance ranging from 0.1% to 1.5% and interspecific distance from 2.7% to 13.2%. The average K2P genetic distance between the two groups of D. bispina (Kudat and Sandakan) was 2.7%. Overall, the within-site, intraspecific and interspecific p-distance values were slightly lower than the corresponding K2P values (Tables 2 and 3). A re-analysis of the barcode gap was conducted for the newly assigned groups, showing the presence of barcode gaps as presented in Fig. 3(b).
The COI gene tree estimated using ML and BI was congruent with that estimated using the NJ method (Fig. 2), with all tree estimation methods recovering two strongly supported geographic clusters of D. bispina, Kudat and Sandakan, reflecting the groups identified in the ABGD analysis. At three locations (Kuala Sepetang (SPP), Sungai Iyu (SI) and Alur Itam (AI)), individuals were present from two taxonomic groups, namely D. collettei and Dermogenys sp.

Table 1 (caption, opening words missing): ... Fig. 1. Species names as verified using morphological and genetic information. Multiple species were present at three locations, as reflected by elevated genetic distances among individuals collected at these sites (indicated in bold). * Sample obtained from Florida Museum of Natural History. n/c: no calculation due to single sample.

Discussion
The present study successfully identified and characterized six putative species of Dermogenys, including two reciprocally monophyletic groups in D. bispina; generated a reference barcode database for Dermogenys; and assessed levels of morphological/molecular disparity. Our preliminary Dermogenys taxonomic identification using morphological characters was limited, as the morphological keys 11 are only applicable at particular life stages and are sex-specific. Therefore, due to a strong reliance on assessment of anal fin structure, which is only accurate through advanced radiographic analyses, conflict was common between morphologically identified specimens and results from our molecular analysis. These issues compound the mislabelling of Dermogenys species throughout museum collections and the taxonomic literature 11,20,57-59. Failure to adequately discriminate to species level prior to biological and evolutionary investigations can also be problematic; for example, de Bruyn et al. 12 found major taxonomic ambiguities in their molecular analysis of Dermogenys, as morphologically recognised species were shown to comprise multiple, reciprocally monophyletic lineages. These taxonomic uncertainties are understandable, given the lack of easily distinguishable diagnostic characters for the group, yet ongoing mislabelling of specimens adds to taxonomic confusion and can lead to counter-productive conservation and management efforts.

Table 2. Interspecific and intraspecific mean genetic distances, given as K2P distance with p-distance in parentheses, for the six putative species of Dermogenys (identified by a combination of morphological and COI data) and GenBank D. pusilla. n/c: no calculation due to single sample.
Thus, methods such as DNA barcoding may offer a means to ensure identifications are standardised, at least until such time as current taxonomic keys receive necessary revisions, alleviating the inconsistencies highlighted above and the strong reliance on sex-specific characters. The DNA barcoding approach relies, as to some extent do taxonomic keys, on expert knowledge to ascertain the identity of reference specimens, pulling together taxonomic literature and geographical information to compile a reference library. This paper represents the first attempt to do so for Dermogenys, verifying that the standard fish barcoding primers of Ward et al. 31 work for barcoding the genus and adding data for five more Dermogenys species to the existing reference library of one. The DNA barcodes generated here enable the identification not only of adult specimens without complicated morphological investigations, but also of under-developed sub-adults, both in the current example and, more importantly, for future researchers seeking to identify Dermogenys.
The use of barcoding has already revealed some new information on geographical distributions of the genus. In clarifying the identity of fishes from Kuala Sepetang, Sg. Iyu and Alur Itam that were all tentatively identified as D. collettei using morphological and geographical information, our survey revealed that two genetically divergent species exist in sympatry at these sites, one of which remains undescribed. Such findings show that DNA barcoding can be used to assess species diversity by pairing genetic distance methods with the genotypic cluster concept 60. This study also lends further support to the view that DNA barcoding may aid in larval identification, as has been observed in previous studies 36-38, as well as in the identification of cryptic sympatric species, one of the known strengths of the barcoding approach 32,61.
Of all taxa surveyed here, the Dermogenys species from northern Borneo exhibited some of the highest intraspecific distance values, and some analyses support treating the two geographically and genetically distinct populations, D. bispina 'Kudat' (north Borneo) and D. bispina 'Sandakan' (northeast Borneo), as subspecies. Borneo boasts a large number of endemic freshwater fishes 62,63, and the island's physical and environmental characteristics, together with the palaeo-history of its montane regions, are believed to be core reasons for the north, in particular, being a centre for speciation and endemism 64,65. Between Sandakan and Kudat in northern Borneo lies the Crocker Range (average height 1,800 m), which separates eastern and western drainages 64,66 and may have isolated the two D. bispina lineages, leading to the formation of subspecies in allopatry. This range has previously been hypothesised as a barrier to gene flow for freshwater fishes in the region 67,68.
Minimum distance between the two D. bispina groups was 2.7%, which would be high enough to qualify as diagnostic of different species under a 2% divergence threshold (e.g., Ward et al. 30,69; Hubert et al. 70). However, sole reliance on a 2% cut-off value for delimiting species across all taxa can mask real diversity (e.g., Australian fishes 31; Neotropical freshwater fishes 34; Canadian freshwater fishes 70; tuna species 71; North America's freshwater fishes 72). In this group, the initial intraspecific distance of D. bispina (when Kudat and Sandakan were combined) was only 1.5%, and the initial barcode gap displayed no taxonomic ambiguity. This highlights the need for comprehensive analyses, as illustrated by the ABGD analysis. Combined with the lack of clear morphological differences, the two OTUs supported by barcode gap re-analysis and the discovery of two reciprocally monophyletic and geographically isolated D. bispina groups probably warrant the two populations being considered at the very least as discrete evolutionarily significant units or subspecies, and potentially as two different species, given that other reciprocally monophyletic groups with similar levels of divergence are classified as different species (e.g., D. pusilla and D. sumatrana, 3.7%). No overlap of intraspecific and interspecific genetic distance was observed. In fact, lower levels of nearest neighbour distance (NND) were observed in Canadian freshwater fishes 70: out of 190 barcoded species, 14 showed <0.1% NND, 20 showed 0.1-1.0% and 17 showed 1.0-2.7%. A similar pattern was observed in Nigerian freshwater fishes 73.
O'Brien & Mayr 74 outlined several criteria for subspecies classification: 1) subspecies members share a unique geographic range or habitat; 2) the OTUs are reciprocally monophyletic, indicating that the genetic divergence of subspecies accumulated in the absence of gene flow and is time-dependent; and 3) subspecies have a unique natural history relative to other subdivisions of the species. Even though DNA barcoding is a very effective tool for the systematics and validation of numerous freshwater fish taxa 32,34,70,75, this approach leans heavily on the work of classical taxonomists, including the primary documentation of species and distributions, and ongoing work in validating and describing new OTUs. Morphological identification is likely to remain a fundamental approach for taxonomic identifications in most instances, and where DNA barcodes find no match in the barcode libraries, morphology remains the first port of call to validate a specimen's identity 76. Nevertheless, ambiguity in molecular findings is a good indication that knowledge of a taxonomic group is incomplete, and thus DNA barcoding functions as a complementary and supporting tool for the robust identification of fish taxa and other organisms.

Conclusion
This study reinforces the complementarity of morphological and molecular characters, as well as other lines of evidence (geographical distribution), in elucidating the taxonomic status and systematics of the Dermogenys group. The data presented contribute DNA reference barcodes for five additional species in this taxonomically challenging group, and we shed light on the level of genetic divergence expected within and between species in this genus, highlighting an area of northern Borneo where geographically distinct lineages have arisen below the recognised species boundary. The taxonomy and systematics of the group remain incomplete, and detailed taxonomic work will be required to formally describe the new OTU (designated here as Dermogenys sp.) and to update taxonomic keys accordingly. Nevertheless, DNA barcoding as employed here demonstrates the power of molecular techniques in helping tackle difficult issues in taxonomy.

Methods
Collection of tissue samples. This study was carried out in accordance with the recommendations and approval of the Universiti Sains Malaysia Animal Ethics Committee. A total of 288 individuals from 31 locations were sampled from the inland waters of Malaysia, Western Indonesia, Thailand and Vietnam (Fig. 1, Table 1). Samples were obtained from slow moving brackish and freshwater systems, and collected using a scoop net with a mesh size of 4 mm. The specimens were identified to genus level based on Rainboth 20. Specimens were anesthetized with Transmore (NIKA Trading Co.), a fish stabilizer commonly used in the aquatic trade, prior to taking tissue samples from the pectoral fin (stored in 95% ethanol for DNA extraction). The specimens were then fixed with formalin and preserved in 70% ethanol. Back at the laboratory, species level identification was conducted based on osteological characters of the modified anal fin (andropodium) in males 11. The radiological study was conducted at the Chinese Academy of Sciences, Beijing. Preliminary identification was conducted based on morphological keys. As many characters are common to groups of species (e.g. melanophores from anterior to anal fin arranged into a thin line, observed in D. collettei and D. siamensis females; black pigment on distal tips of the posterior dorsal fin, present in D. collettei, D. siamensis, Dermogenys sp. and D. bispina males; black pigment at the base and distal tip of the pelvic fin, observed in D. collettei, D. sumatrana, D. siamensis and Dermogenys sp. males), precise identifications also took into account current knowledge of species distributions 1,11,16,77. Thus, samples from Sabah, from Vietnam and Thailand, and from Western Indonesia were putatively assigned to D. bispina, D. siamensis and D. sumatrana, respectively, as a working hypothesis. Although de Bruyn et al. 12 did not identify their specimens to species level, they hypothesised that most species were restricted in distribution range, which was further supported by their phylogeny, and therefore this classification scheme was justified in our study. In addition, two samples, morphologically identified as D. collettei, were contributed by the Florida Museum of Natural History (DC652 and DC683), making a total of 290 samples from 33 sampling sites (Fig. 1).
Extraction, COI amplification and DNA sequencing. Total genomic DNA was extracted by using a modified conventional salt extraction procedure 78
Sequence analysis. Sequences were aligned using MEGA v6.06 79; this package was also used to investigate base composition, and to calculate the number of variable sites and genetic distance measures. Both uncorrected pairwise distance (p-distance) and distance estimates based on the Kimura 2-parameter (K2P) model 80 were calculated to assess mean "intrapopulation", intraspecific and interspecific genetic distance within sample sites, and within and between species. Using the same software, phylogenetic relationships among haplotypes were constructed applying Neighbour-Joining (NJ) and Maximum Likelihood (ML) methods with 1000 bootstrap replicates. Tree construction was conducted using the K2P model 80 for NJ analysis and the Hasegawa-Kishino-Yano model 81 with gamma rates (HKY + G; the optimal substitution model estimated by the model test run in MEGA v6.06) for ML analysis.
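The two distance measures named above can be sketched as follows for a single pair of aligned sequences. This is a minimal illustration and not the MEGA implementation; the function name is hypothetical, and skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption, since the source does not state how such sites were handled. The K2P distance uses the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_and_k2p(seq1, seq2):
    """Return (p-distance, K2P distance) for two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes at this site
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    P = transitions / compared    # proportion of transitional differences
    Q = transversions / compared  # proportion of transversional differences
    p_distance = P + Q
    k2p = -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
    return p_distance, k2p


# One transition (C -> T) in ten sites.
p_dist, k2p_dist = p_and_k2p("ACGTACGTAC", "ACGTACGTAT")
```

Because K2P corrects for multiple substitutions at the same site, it is never smaller than the uncorrected p-distance, which is consistent with the p-distance values reported here being slightly lower than the K2P values.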
The relationships among haplotypes were also assessed using a Bayesian Inference (BI) method together with a Markov Chain Monte Carlo (MCMC) algorithm. In order to construct gene trees, PartitionFinder v1.1.0 82 was used to determine the best-fit partitioning schemes and models of molecular evolution for phylogenetic analysis. The BI analyses were performed using MrBayes 83 with HKY + I 81, F81 84 and GTR + G 85 applied to the first, second and third codon positions, respectively, with 1 million MCMC generations and a 50% burn-in. The trace files generated from the MrBayes run were examined in Tracer v1.6 86 to evaluate the MCMC chain. The COI trees generated were visualized and edited using FigTree v1.4.2 87. Nomorhamphus sp. (GenBank Acc. No JQ430374) and Hemirhamphodon (Acc. No KM405787) were included as outgroups 88. Additional D. pusilla (Acc. KU692464 to KU692472) COI sequences from GenBank were included to improve phylogenetic resolution.
The maximum intraspecific distance against the minimum nearest-neighbour distance graph was plotted to check the presence of a "barcode gap" in the dataset 89 . Presence of barcode gaps among sets of sequences within a presumed species/taxon indicates that there is likely to be more than a single taxon within the group. The number of operational taxonomic units (OTUs) based on pairwise sequence distances between individuals within the dataset was generated using the Automatic Barcode Gap Discovery (ABGD) 90 species delineation tool on a web interface (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) with default settings and the K2P model employed. The interpretation of the ABGD results is very straightforward. An OTU is considered as successfully delimited when the predicted groups are formed and no other unrelated sequences were included in that group 91 .
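The barcode gap check described above can be sketched as a comparison, for each putative species, of the maximum intraspecific distance with the minimum distance to any specimen of another species. This is a minimal illustration under stated assumptions: the function name, specimen identifiers and distances are hypothetical, and a gap is taken to be present when the nearest-neighbour distance exceeds the maximum intraspecific distance.

```python
from itertools import combinations


def barcode_gap(dist, labels):
    """Summarise the barcode gap per species.

    dist   -- dict mapping frozenset({id_a, id_b}) to a pairwise distance
    labels -- dict mapping specimen id to its putative species
    Returns {species: (max_intraspecific, min_nearest_neighbour, gap_present)}.
    A species with a single specimen keeps a maximum intraspecific distance of 0.0.
    """
    max_intra = {sp: 0.0 for sp in set(labels.values())}
    min_inter = {sp: float("inf") for sp in max_intra}

    for a, b in combinations(sorted(labels), 2):
        d = dist[frozenset((a, b))]
        sp_a, sp_b = labels[a], labels[b]
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], d)
        else:
            min_inter[sp_a] = min(min_inter[sp_a], d)
            min_inter[sp_b] = min(min_inter[sp_b], d)

    return {
        sp: (max_intra[sp], min_inter[sp], min_inter[sp] > max_intra[sp])
        for sp in max_intra
    }


# Hypothetical toy data: two specimens in each of two putative species.
labels = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B", "b2": "sp_B"}
dist = {
    frozenset(("a1", "a2")): 0.010,
    frozenset(("b1", "b2")): 0.015,
    frozenset(("a1", "b1")): 0.040,
    frozenset(("a1", "b2")): 0.045,
    frozenset(("a2", "b1")): 0.037,
    frozenset(("a2", "b2")): 0.042,
}
gaps = barcode_gap(dist, labels)
# gaps["sp_A"] == (0.010, 0.037, True); gaps["sp_B"] == (0.015, 0.037, True)
```

Plotting the first value against the second for every species gives the kind of graph used in the barcode gap analysis; points above the 1:1 line correspond to species with a gap.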
Data availability. The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.