DNA barcoding and phylogeography of the Hoplias malabaricus species complex

Hoplias malabaricus (Bloch, 1794) is a carnivorous fish species widely distributed from northern to southern South America. This taxon is believed to be a good model for the investigation of biogeographic events that shape the ichthyofauna evolution in the Neotropical freshwater systems. However, many studies have revealed that H. malabaricus hides a species complex that hampers its taxonomic identity and limit its practical value for evolutionary and biogeographic studies. In this paper, we used the mitochondrial gene cytochrome c oxidase subunit I (COI) to delimit cryptic species and explore the phylogeography of H. malabaricus sensu stricto. We found genetic evidence for putative new species in the genus Hoplias and showed that H. malabaricus (Bloch, 1794) is a major clade assigned to barcode index number (BIN) BOLD:ABZ3047. This species is structured in six subpopulations differentiated by high Fst values and restricts gene flow. The subpopulations of the São Francisco/East Atlantic/Eastern Northeast Atlantic/Parnaíba/Itapecuru River basins and Tapajós River Basin were the most differentiated and showed demographic fluctuations. The present distributional pattern is most likely explained through a scenario from the Pleistocene.

of the Neotropical ichthyofauna 10,11,14 . DNA barcoding revealed markedly deep divergence of COI sequences between H. malabaricus populations, which supports the hypothesis that some mitochondrial lineages may constitute putative independent species 1,9 . Cardoso and colleagues 1 delimited 16 mitochondrial lineages of the H. malabaricus species complex to further investigate their taxonomic status. Recently, three of these lineages (BINs: ACO5223, AAZ3734, AAB1732) were recognized as valid species: H. mbigua, H. misionera and H. argentinensis [15][16][17] . On the other hand, the remaining cryptic diversity enclosed in the H. malabaricus complex wait for investigation. At least seven of the H. malabaricus mitochondrial lineages were recorded in the Amazon basin (BINs: AAB1732, ABZ3046, ABZ3047, AAB1731, ACF3787, ACK2158 and ADG3393).
Understanding population divergences on a regional scale is invaluable to disentangle intricate questions about the Neotropical ichthyofauna evolution. Widely distributed fish species, such as H. malabaricus, can be considered suitable models for biogeographic studies because they are exposed to extensive evolutionary and ecological drives that promote dispersal events [18][19][20] . On the other hand, the presence of cryptic species in the H. malabaricus complex is a confounding factor that limits its value for phylogeographic and population genetic studies. Additionally, a precise delimitation of cryptic species may strengthen the inferences in both basic and applied research fields, for example in conservation measures, ecological risk assessment and climate change effects on biodiversity 21 .
In this study, we assembled an extensive database of DNA barcodes including new sequences from the Amazon basin and adjacent drainages, aiming to delimit mitochondrial lineages representative of formally described species and discriminate putative cryptic species of the H. malabaricus complex. We delimited H. malabaricus (Bloch, 1794) "sensu stricto" following criteria in Cardoso and colleagues 1 and evaluated it for genetic structure, phylogeography and demographic history.

Methods
Ethics statements. Fish specimens were collected under a Brazilian Government of the Sistema de Autorização e Informação em Biodiversidade (SISBIO), permission N. 24384-1. For tissue and voucher preservation, the specimens were anesthetized and euthanized by exposure in eugenol solution for a few minutes until the complete stop of the opercula beats. All procedures were approved by Comissão de Ética no Uso de Animais (CEUA) of Universidade Federal do Oeste do Pará (N 09003/2016) and followed all relevant guidelines. Additionally, the study was carried out in compliance with ARRIVE guidelines.
Sampling and study area. We sampled 153 H. malabaricus specimens from 38 localities in the Amazon Basin (Brazil, Peru), the Araguaia-Tocantins basin (Brazil), the Western Northeastern Atlantic Basin (Brazil), the Guiana Shield drainages (Guiana) and the Orinoco Basin (Venezuela, Colombia). The fish were collected using seine nets, casting nets and fish hooks. Samples of epaxial muscle were preserved in absolute ethanol and stored at -20 °C. The specimens were fixed with 10% formalin for 48 h, washed and preserved in 70% ethanol. Specimens were later deposited in the Fish Collection of the Instituto de Ciências e Tecnologia das Águas and the Laboratório de Genética e Biodiversidade, Universidade Federal do Oeste do Pará (Brazil). DNA extraction, PCR and sequencing. DNA extraction followed a salting-out protocol and the amounts were evaluated in a 1% agarose gel stained with Gelred (Biotium) 22,23 . DNA fragments of cytochrome c oxidase subunit I (COI) mitochondrial gene were amplified using the standard DNA barcoding primers Fish F1 and Fish R1 24 . The reactions were assembled in 25 μL, containing 15 μL sterile H 2 O, 2.8 μL dNTP mix (1.25 mM), 2.5 μL buffer 10 × (200 mM Tris-HCl (pH 8.4) + 500 mM KCl), 2.5 μL MgCl 2 (50 mM), 0.5 μL of each primer (5 μM), 0.2 μL Taq DNA polymerase (5 U/μL) and 1 μL of genomic DNA (50-100 ng). The cycling profile was as follows: 95 °C/2 min, 35 cycles of 94 °C/30 s, 54 °C/30 s and 72 °C/1 min, and a final step of 72 °C/10 min. The amplifications were performed with a Pxe 0.2 thermocycler (Thermo Scientific) and the amplified products were evaluated in a 1% agarose gel stained with Gelred. PCR products were purified with PEG8000 protocol 25 29,30 and (3) automatic barcode gap discovery (ABGD) 31 . The BIN analysis is an automated process implemented in the platform www. bolds ystems. org, which is based on genetic distances to identify clusters of query sequences against a DNA barcode library; such clusters are coded as BIN numbers and interpreted as operational taxonomic unit (OTUs) that represent a species. To perform the GMYC analysis, we removed haplotype duplicates with ElimDupes (https:// www. hiv. lanl. gov/ conte nt/ seque nce/ elimd upesv2/ elimd upes. html) and made an ultrametric tree using BEAST v1.8.0 32 , following these parameters: evolutionary model HKY + I + G chosen with jModelTest 33 , molecular clock lognormal relaxed, Yule speciation process. Leporinus amblyrhynchus Garavello & Britski 1987 was adopted as the outgroup. We ran a Bayesian reconstruction with 80 million MCMC iterations, sampled every 1000 iterations with a burn-in of 10%. The tree convergence and stability were checked with the software Tracer v.1.7.1 32 , retaining an effective sample size (ESS) > 200. The trees were combined with TreeAnotator v1.8.0 32 , and the output file was saved in Newick tree format to be used for GMYC delimitation. The analysis of coalescence/speciation (GMYC) was processed following the model single threshold, in the environment R 3.4.3 34 supplemented with libraries Splits (Species Limits by Threshold Statistics) 35 and Ape (Analyses of Phylogenetics and Evolution in R language) 36 . ABGD was processed on the platform www. bioin fo. mnhn. fr/ abi/ public/ abgd/ abgdw eb. html using the alignment data set (fasta file) as the input file. We set the parameters: model K80, Pmin. 0.001, Pmax 0.01 and barcoding gap width X = 0.2.
To integrate the phylogenetic information, species delimitation and divergence time we processed a second Bayesian reconstruction following the procedures mentioned above with minor modifications: 200 million MCMC iterations sorted at each 1000 and 10% of burn-in and a strict clock model. Divergence times were calibrated with a mutation rate of 1% per million years (Myr), which is conservative for fish mtDNA 37,38 . The resulting trees were assembled with TreeAnotator and the topology was visualized/edited with FigTree v1.2.2 (http:// tree. bio. ed. ac. uk/ softw are/ figtr ee/). Pairwise genetic distances between delimited species were measured following the K80 model 39 using the software MEGA X software 40 . Population genetics and phylogeography. The individuals assumed to be Hoplias malabaricus sensu stricto (BIN ABZ3047) following designation proposed in Cardoso and colleagues 1 were investigated for intraspecific genetic diversity and population structure. We used the software GENELAND R package v. ≥ 4.0.0 (http:// www2. imm. dtu. dk/ gigu/ Genel and/) 41 implemented with R v3.4.0 34 to investigate population subdivisions and to find the geographic population units, based on Bayesian statistics. The population genetic structure was evaluated through F ST statistics and molecular variance analysis (AMOVA) implemented with Arlequin v.3.1 42 . We assumed populations as the clusters of individuals such as revealed by Geneland analysis. For F ST divergence, we follow Wright and colleagues 43 categories: low (0.00-0.05), moderate (0.05-0.15), high (0.15-0.25) and elevated (> 0.25). Parameters of the population genetics (e.g. haplotypes, nucleotide diversity, polymorphic sites) were analysed with DNAsp v.6 44 . A haplotype network was constructed based on median joining algorithm 45 with the assistance of PopART software 46 .
To explore the demographic history, we applied neutrality tests Tajima's D 47 and Fu's Fs 48 , implemented with Arlequin v.3.1 42 . Additionally, to detect population size variations we investigated the mismatch distributions and Bayesian skyline plot (BSP). These analyses were implemented with DNAsp v.6 44 and BEAST v.1.8.0 32 . BSP analysis adopted the HKY + I + G model and 100 million MCMC sorted each 1000 iterations.
We constructed an ecological niche model with the maximum entropy algorithm MAXENT version 3.3.3 k 49-51 based on 82 georeferenced occurrence points (Fig. 6a) and 19 bioclimatic variables from WorldClim (https:// www. world clim. org/ data/ biocl im. html). Such variables were correlated with a 2.5 arc-minute spatial scale 51 , and the distributional limits were assessed from the median occurrence with 50 bootstrap pseudoreplicates. The theoretical distributional patterns were visualized with QGIS v.3.16.8-Hannover (Quantum GIS Development Team, www. qgis. org). We used jackknife permutations to evaluate the model performance gain and to identify and retain the most relevant explanatory variables.

DNA barcoding and species delimitation.
We analysed 439 COI sequences from 10 species of the genus Hoplias. The sequences were 621 bp long without stop codons or indels. The dataset showed a base composition of 29.4% (T), 29.5% (C), 23.3% (A) and 17.7% (G).
Phylogenetic Bayesian inference showed three major clades in Hoplias. The largest clade encompasses the species assembled to the H. malabaricus group that emerged more recently; the second clade is configured with the species from the H. lacerdae group, which surprisingly nested five individuals deposited as H. malabaricus (Panamá clade); the most basal clade is represented by Hoplias curupira lineages (Fig. 1). All the lineages received strong support from the Bayesian genealogic inference (100% posterior probability-PP), with the exception of the H. malabaricus lineages that were assembled to BIN ABZ3047 supported by 65% PP (Fig. 1).
Based on the present DNA barcode library of Hoplias we delimited a varied number of species with different methods: ABGD (28sp), BIN (33sp) and GMYC (42 sp). The pairwise genetic distances between groups (BINs) ranged from 1.1 to 21.5%, while the intra-BIN distances ranged from 0 to 1.6% (Table 1  www.nature.com/scientificreports/ ADR3966 and AAB1730) but with two species (ABGD, GMYC). H. mbigua was delimited with three species (BIN, GMYC) (ACO5223, AAI8239 and ACK8866) but with two species (ABGD). The specimens assigned to H. malabaricus were assembled in 11 putative species (BINs: ADG3393, AEA4944, AEF4663, ADL3159, ABZ3046,  AAI8240, AAB1731, AEA5279, ACF3787, ACR9466 and ABZ3047); however, ABGD supported eight species while GMYC delimited 21 species hidden in this taxon. Therefore, the H. malabaricus species complex was clearly evidenced from distinct DNA barcode analysis. We recognized H. malabaricus (Bloch, 1794) as the clade that nested the individuals from Suriname, since it is the type locality of this species. Such clade was delimited as BIN ABZ3047. To explore the intraspecific genetic relationships, we assumed BIN ABZ3047 as a single species, which is hereinafter referred to as H. malabaricus sensu stricto following Cardoso et al. (2018). This species comprises eleven entities widely distributed throughout the Amazon Basin and adjacent drainages, including a subclade that is restricted to the Guyana shield drainages. An updated distribution map of the Hoplias malabaricus species group based on records of barcoded individuals is provided and the geographic occurrence of news records of putative species is highlighted (Fig. 2).  (Fig. 3).

Population genetics and phylogeography of
The H. malabaricus sensu stricto subpopulations presented high genetic diversity counting from 6 to 19 haplotypes and haplotypic diversity (h) from 0.230 (SPOP6) to 0.956 (SPOP2). The nucleotide diversity (π) was higher in SPOP1 (0.0187) and lowest in SPOP6 (0.0005). Two subpopulations (SPOP3 and SPOP6) resulted in negative and statistically significant Fu's Fs and Tajima's D values that were interpreted as evidence of neutrality deviation by purifying selection or population expansion. For the other subpopulations, both the neutrality tests suggested the long-term population stability ( Table 2).
The haplotype network showed that only two haplotypes were shared between H. malabaricus sensu stricto subpopulations (SPOP1-SPOP2 and SPOP1-SPOP5). Three haplogroups were depicted and coincide with SPOP3, SPOP4 and SPOP6, which showed only private haplotypes demonstrating higher genetic differentiation (Fig. 4).
A pairwise Fst comparison demonstrates that subpopulation SPOP6 is the most differentiated ( Table 3). The lowest Fst value (0.407) was recorded for the SPOP6xSPOP1 comparison and the highest differentiation was detected to SPOP6xSPOP2 (0.545). AMOVA results demonstrated 27.20% of variation was detected among populations and 72.80% within populations (Table 4).
In the mismatch distribution analysis among the population groups, the SPOP3 and SPOP6 presented a unimodal curve and were suggestive of demographic fluctuations, whereas the SPOP1, SPOP2, SPOP4 and SPOP5 populations displayed multimodal curves (Fig. 5a-f). Bayesian skyline plots (BSP) provided a signal of long-term demographic stability (Fig. 5a-f). For the SPOP1, the BSP showed continuous and discrete historical population size growth approximately 950 thousand years before present (kybp), with a decrease with the onset of the 150 kybp, after which it remained stable for a short time before increasing to the current population size (Fig. 5a). For SPOP2, the BSP plot suggests expansion time estimated onset 600 kybp, followed by a period of stability starting at 200 kybp and shortly after very slight tendency of decrease approximately 50 kybp (Fig. 5b). The SPOP3 and SPOP4 started a period of expansion at 350 kybp, and evidence of a substantial decline started at 50 kybp (Fig. 5c,d). SPOP5 demonstrated a fluctuation, with rapid population expansion starting at 150 kybp (Fig. 5e). For SPOP6, a remarkably progressive increase in population size started at 35 kybp, and the lightest tendency of increase occurred at the start of the Holocene (Fig. 5f).
For the niche model, the mean diurnal range [mean of monthly (max temp − min temp)] was the most important variable driving the Hoplias malabaricus sensu stricto distribution. The other most important variables were the annual temperature range, mean temperature of the wettest quarter, mean temperature of the driest quarter and precipitation seasonality (coefficient of variation). The prediction is H. malabaricus was potentially distributed congruent with the present-day geographical distribution. The areas with the highest levels of suitability were eastern Amazon Basin, Guiana Shield and northeast Brazil. Spatial displacements did not seem to occur, but the range size population increased over time (see Fig. 6).

Discussion
The species of Hoplias tend to be morphologically conserved and its taxonomy is still under debate 15,17 . Despite its conspicuous karyotypic variation, cytogenetic markers have shown poor resolution for taxonomy and phylogenetic reconstruction 2,8,9 . Recent progress has been achieved with DNA barcoding coupled to morphological traits (integrative taxonomy), which starts to delineate a clearer picture of the evolutionary history of this group 15 2,5,7,8 . Some of these lineages were presented in previous studies 1,9,52 ; however, we could recognize six new putative species from the H. malabaricus complex (BINs: AAY4779, AAD3630, AEA5279, ADL3159, AEF4663 and AEA4944). COI deep divergences, higher than 2%, are a marked signature of lineage/species differentiation in Hoplias 1,8,9 . Such a phenomenon has been observed between populations from the same and distinct hydrographical basins 8 and between populations that share identical karyomorphs 9 . H. misionera and H. argentinensis diverged from their nearest neighbors by 5.6 and 9.0% 15,17 . Jacobina and colleagues 8 similarly recorded deep divergence (7 to 7.3%) delimiting putative species from H. malabaricus lineages in distinct Brazilian hydrographic basins. On the other hand, Cardoso and colleagues 1 demonstrated that speciation in Hoplias may be accompanied by a large range of COI genetic distances, between approximately 1% and 20%. Indeed, H. mbigua diverged from its nearest neighbor by only 1.13% 1 . The BINs AAD3630 and AEA5279 are from Colombia and are composed of individuals from Orinoco basin drainages. The BINs AEA4944 and AEF4663 from the Ventuari-Orinoco were delimited as a single species with ABGD and GMYC. Herein, we consider the taxonomic status of these lineages most imprecise, and their delimitations as putative species must be regarded cautiously. Putative species delimited from a small number of sequences (< 5 individuals) are highly susceptible to bias 31,54 . The ADL3159 lineage was restricted to the Crepori River, a tributary of the middle Tapajós drainage, distantly more than 500 km from the confluence zone between the Tapajós and Amazonas Rivers. This group was supported by distinct methods, and we believe it is a new undescribed species within the H. malabaricus complex. Recently, an integrative taxonomic analysis by our team formally describes this taxon (article accepted).
Because of the taxonomic uncertainty of the genus Hoplias, particularly in the H. malabaricus complex, few studies have focused on population genetics and micro evolutionary processes, which may be caused by imprecise species discrimination under field conditions due to the high genetic diversity (karyomorphs, DNA), morphological similarity and poor taxonomic knowledge. Herein, we successfully identified H. malabaricus from cryptic congeners based on DNA barcoding sequences. Although the COI gene has been routinely used for species delimitation 13,55 , it can be useful to explore population genetic structuring in freshwater fishes 12,56 .     www.nature.com/scientificreports/ From the H. malabaricus complex, we detached a clade supported with 65% PP and delimited to BIN ABZ3047, which was interpreted as a single evolutionary unit and tested for genetic structure and taxonomic status validation. This putative species was assumed to be H. malabaricus stricto sensu and believed to be representative of H. malabaricus (Bloch, 1794). We found a clear genetic structure pattern, initially evidenced by the Bayesian topology that showed multiple subclades, and spatially divergent subpopulations shown with Geneland analyses. High Fst differentiation values and almost 50% of the genetic variation among populations support the genetic structure hypothesis. For comparison, the Fst values of the Tapajos River Basin population (SPOP6), the most differentiated, ranged from 40 to 54%. Aguirre and colleagues 57 measured a maximum F ST divergence of 20% between H. microlepis populations from rivers and artificial impoundments in Ecuador. Due to its wide distribution range and sedentary habits, Hoplias species tend to exhibit high population genetic differentiation 9,58 .
Genetic structuring in nonmigratory fish from the Amazon basin dispersed to adjacent drainages has been previously observed in Arapaimidae species, Osteoglossum bicirrhosum 59 and Arapaima gigas 60 . We detected a genetic link between populations from Orinoco drainage and the Amazon basin, in the Trombetas River (northern Amazonas drainage). Although, Amazonas and Orinoco systems remain currently isolated, past contacts between them are well documented and such gene flow exchange has been observed in other fishes (e.g., needle fish genus Potamorrhaphis) 61 . Therefore, such connections between H. malabaricus populations may be explained by lineage sorting taking to the maintenance of ancestral haplotypes.
The SPOP 3 and 6 presented demographic instability and neutrality deviation that was interpreted as a signature of recent expansion. The biological drives in such populations may be adjusted with Pleistocenic phenomena, which span the most recent period of repeated glaciations. The SPOP 3 flows through portions of morphoclimatic domains of the Cerrado, Caatinga and part of the Atlantic Forest, where there is an extensive field of sand dunes in the São Francisco River basin which represents a testimony drier climate in this region in the past. The São Francisco River changed to its present course during the Mindel glaciation (approximately 450 kybp). Presently, it flows towards the north, curving towards the southeast and to the Atlantic Ocean, but this river previously flowed in a different direction, connecting with the Parnaíba River to the Atlantic Ocean 62 , which may explain why this subpopulation was also be formed by samples from the Parnaíba River and the Western Northeast Atlantic basin. Similarly, the highlands in Guiana shields and Andes Mountains accumulated thick ice caps during the Pleistocene glaciations and the Tapajós region experienced long periods of erosion 63,64 . The SPOP6, formed by samples from the lower Tapajós River basin, is consistent within the limits of the Pleistocene (2.5 mybp-11.7 kybp), as shown by with mismatch distribution and population expansion analyses.
Modern Amazonian biodiversity dimensions were achieved during the Neogene (23-2.6 mybp), but the most ancient lineages are probably present since the Paleogene (66-23 mybp) and Late Cretaceous (146-66 mybp) [65][66][67] . During the middle to late Cenozoic, the Western Amazon basin was a lacustrine habitat, while the eastern and central portions were repeatedly invaded by marine incursions, resulting in the isolation of Guiana and Brazilian Shield tributaries 68 . The river capture dynamics during the Neogene have been proposed as the main force driving the last diversification of aquatic and terrestrial Amazonian taxa that are ecologically restricted to water bodies and riparian forests 65,69 .
The proportion in which animals and plants retain ancestral ecological traits during time scales, that is, niche conservation, is still a controversial theme 70 . The finding of niche conservation is based on many variable aspects of the fundamental niche, taking into account that the rate of adaptation to conditions outside the niche is slower than the rate of extinction. The observation of conservation or niche change is an area of great relevance in the study of niche biogeography and in historical biogeography and in the exploration of patterns and mechanisms www.nature.com/scientificreports/ of species richness in wide spatial areas 71 . Similar to the study by McNyset 72 , which shows ecological niche conservation for six freshwater fish from North America under a moderate time scale, the present study points out that from the last glacial maximum (LGM) to the Anthropocene the niche remains very close in this taxonomic group. The H. malabaricus populations showed demographic evidence of expansion-decline cycles during the Pleistocene, immediately before the LGM. We hypothesize that such demographic fluctuations in H. malabaricus may be synchronized with the recent glaciation and the geologic events linked to the formation of the modern Amazonas River system 65,73 . Currently, H. malabaricus sensu stricto is dispersed through the Guiana drainages, Amazon basin, Western Atlantic Northeast basin and São Francisco River basin 1 . These systems experienced connection-isolation cycles during the Pleistocene period driven by climatic fluctuations and geomorphologic forces 1,18,74 . Our paleogeographic model revealed that the hypothetical distribution in the Pleistocene was mostly congruent with the present-day distribution and was influenced by climate change. The geographical distribution of the complex was restricted in the North and Northeast at South America during the LGM, and in the Anthropocene there was a reduction in potential areas. The drainages from the eastern portion of the Amazon basin and coastal drainages in the western Northeast Atlantic basin seem to be important routes for population dispersal.
Geomorphologic forces, mainly plate tectonics, marine incursions/regressions and climate fluctuations, recurrently shaped the Amazon landscapes, driving important processes in aquatic systems and leading to intensive changes in the river courses and river captures. These combined phenomena may be involved in the taxonomic radiation and geographic dispersion of the fish populations throughout the Amazon basin [65][66][67][75][76][77][78] .
To conclude, the data herein indicate that Hoplias malabaricus sensu stricto presents a remarkable population structure, which seems to have been caused by past events. The cryptic diversity observed among the Hoplias malabaricus species group has conservation and taxonomic implications. Although morphological data have not been evaluated here, our molecular data are necessary to determine whether the mitochondrial lineages can represent distinct species. If to ABZ3047 subpopulations are not shown in the future to be different species, at least the SPOP 3, SPOP 4 and SPOP 6 can be considered as different evolutionarily significant units for conservation purposes. Additionally, even whether Hoplias malabaricus sensu stricto constitutes a well-distributed metapopulation in northern South America, its distribution would be much smaller than what is currently attributed to the species; thus, the taxonomy and geographic distribution of H. malabaricus should be reevaluated.

Data availability
All data generated or analysed during this study are included in this published article and its Supplementary Information files.