Human overexploitation of natural resources has made conservation and management one of the most pressing challenges in modern societies (Mora and Zapata 2013). Over the past four decades, heavy targeting by fisheries has contributed to a decline in average population size of more than 36% and 81% in marine and freshwater ecosystems, respectively (WWF 2016). Understanding the evolutionary history of impacted species is fundamental for establishing strategies to protect and sustain biological diversity (Crandall et al. 2000; Moritz 2002; Cook and Sgrò 2019). Nevertheless, this task is particularly challenging in morphologically conserved species because they are difficult to distinguish based on external morphology alone, requiring an approach that integrates morphological tools with molecular tools that, until recently, were unavailable for non-model organisms (da Fonseca et al. 2016).

Understanding the number and distribution of morphologically cryptic species has strongly benefited from studying fast evolving mitochondrial genes, which are revealing an exponentially growing number of species (Sáez and Lozano 2005), and their demographic history (Carnaval et al. 2009). Such discoveries of hidden genetic diversity have been reported even in previously known biodiversity hotspots, such as in the Amazon (Benzaquem et al. 2015), and in marine ecosystems (Asgharian et al. 2011; Brandão et al. 2016). Although many cryptic species have allopatric distributions (Wake 2009), many others are partially or fully sympatric (McBride et al. 2009; Moritz et al. 2018), suggesting that cryptic speciation can result in stable genetic boundaries between species. Thus, assessing the phylogenetic relationships between sympatric cryptic species, their levels of genetic connectivity, and their demographic history offers important insights into the process of species formation.

Although mitochondrial studies have been fundamental in identifying morphologically conserved species and mapping their distribution (Ward et al. 2009), mitochondrial markers are limiting when assessing demographic history (Galtier et al. 2009). To better inform sustainable conservation measures, it is thus necessary to use hundreds or thousands of independent nuclear markers to accurately reconstruct the demographic history of these species (Grewe et al. 2015; Allendorf 2017; Grundler et al. 2019). Recent advances in sequencing technology and statistical methods now offer unprecedented opportunities for the field of conservation biology across taxa, providing new insights into the adaptive capacity in tortoises (Scott et al. 2020), genetic connectivity in terrestrial mammals (Pedersen et al. 2018), and historical changes in effective population size in marine mammals (Peart et al. 2020; Bilgmann et al. 2021).

To infer the evolutionary processes underlying the genetic diversity of sympatric cryptic marine fishes, we focus on the genus Mugil, commonly known as mullets. Historically, these species have been heavily targeted by traditional and industrial fisheries (Whitfield et al. 2012; Pacheco-Almanzar et al. 2017), with total harvest reaching about 140 k tons globally in 2013 (Crosetti 2016) and significant decreases in census sizes within the last 25 years (Mendonça and Bonfante 2011; Sant’Ana et al. 2017; Vieira et al. 2019). Nevertheless, their diversity and evolutionary history are just starting to be revealed by molecular studies (Durand and Borsa 2015; Xia et al. 2016; Delrieu-Trottin et al. 2020; Neves et al. 2020).

Mugil species live in fresh and brackish waters during most of their life cycle, migrating to the sea to reproduce (Nordlie 2016). Thus, they play a fundamental role in transferring energy between estuaries and coastal systems (Lebreton et al. 2011), helping in the maintenance of biological productivity and, consequently, the yield of other fisheries. Several morphologically similar species of Mugil occur sympatrically in tropical and subtropical waters, making it necessary to use genetic information to better understand the number of species and their distribution (Durand et al. 2012b; Durand and Borsa 2015; Xia et al. 2016; Neves et al. 2021). Such challenges in assessing species richness and diversity also occur within important Marine Protected Areas (MPAs) designed specifically to protect species of high ecologic and economic importance, such as the Coral Coast MPA, the largest coastal MPA in the Tropical Southwestern Atlantic marine province (de Souza et al. 2012). In this MPA, sympatric species of Mugil are extremely challenging to identify based on current taxonomic keys (Menezes et al. 2015), resulting in up to 14% of individuals receiving conflicting morphological and mitochondrial classifications (Neves et al. 2021). Most of the conflicts occur in individuals that carry the mitochondrial haplotype associated with M. curema but are morphologically assigned to M. rubrioculus, M. incilis, or M. curvidens (Neves et al. 2021). Although M. rubrioculus and M. curema are remarkably similar in external morphology (Fig. 1), they diverged ~29 million years ago (mya) (Neves et al. 2020), reflecting a strong morphological conservatism in the evolution of these species. Such morphological similarity leads to strong disagreements among biologists regarding the number and distribution of these species (Durand et al. 2012a; Menezes et al. 2015; Pacheco-Almanzar et al. 2016) and reinforces the difficulties in establishing species-specific regulations for fisheries.
Currently, Mugil species are targeted by traditional fisheries throughout the MPA without species-specific quotas, although we lack information regarding their relative abundance.

Fig. 1: Distribution of the six species of Mugil fishes that are potentially sympatric at the Coral Coast MPA (highlighted in dotted black line).

Each depicted species is surrounded by a rectangle with color matching the species distribution (Menezes et al. 2015; Barletta and Dantas 2016; Durand and Whitfield 2016; Pacheco-Almanzar et al. 2016). Dashed yellow line indicates the area where the occurrence of M. incilis has been debated (Menezes et al. 2015). Dashed black line indicates that individuals of all species except M. brevirostris may occur. Mugil incilis photo by A. Carvalho. The plot on the right corner represents a multiple factor analysis based on morphological traits (Neves et al. 2021); ellipses represent 95% confidence level.

Here, we use double-digest restriction of genomic DNA associated with high-throughput sequencing to genotype thousands of markers across nominal species of Mugil that are sympatric in the Coral Coast MPA to: (i) determine the number of Mugil species in the area, (ii) quantify patterns of genetic connectivity within species across two estuaries and between sympatric species in the same estuary, and (iii) estimate the demographic history of every species. Our results provide new insights into the evolution of these species and provide guidelines for sustainable management.

Materials and methods


Specimen collection

We used 94 muscle tissue samples of Mugil species that occur sympatrically in the Coral Coast MPA (Fig. 1) and that were classified as 6 morphological species (Mugil liza, M. brevirostris, M. rubrioculus, M. curema, M. incilis, and Mugil curvidens), according to the current taxonomic key (Menezes et al. 2015). A previous study (Neves et al. 2021) established that these individuals contained only five mitochondrial lineages, based on a diagnostic COI barcoding gene (Table S1), suggesting either an overestimation of species number or mitochondrial introgression. Here, we test the morphological and mitochondrial hypotheses using nuclear data.

While most specimens were collected in a partial reserve of the MPA where traditional fisheries exploit Mugil without restrictions (Santo Antonio estuary), 15 of the 30 individuals of Mugil curvidens were collected in an estuary 38 km away, where fisheries have some restrictions to protect manatee populations (Manguaba estuary; Fig. S1). We included this population to test whether populations of M. curvidens from different estuaries function as a single panmictic population, or whether measurable barriers to gene flow exist between these locations.

Genomic sequencing

Tissue samples were sent in a sequencing plate to DArT™ (Diversity Array Technology), who performed DNA extraction, tested two combinations of enzymes (PstI/HpaII and PstI/SphI), performed high-throughput sequencing for the best enzyme combination (PstI/SphI), assembled the loci, and called genotypes for the 94 individuals. This process was replicated for 32 samples (126 libraries generated in total), to estimate reproducibility and error rate of the genotypes (Grewe et al. 2015).

This method is analogous to the ddRAD protocol (Peterson et al. 2012), in the sense that two restriction enzymes (a rare and a common cutter) digest the genomes of individuals from closely related species in presumably homologous sites. Complementary Illumina adapters including individual barcodes are ligated to each restriction site, the libraries are size-selected, then amplified in 30 rounds of PCR. The resulting sequences are then processed using proprietary DArT analytical pipelines, which remove poor quality sequencing reads and demultiplex ~1,500,000 reads per individual. A secondary genotype-calling pipeline is applied to identify homologous clusters across individuals and retain clusters or loci with balanced counts per allele, a Mendelian distribution of alleles, and greater than 25X coverage.

Each individual was characterized by an array of SNPs, where 0 is homozygous for the major allele, 1 is heterozygous, 2 is homozygous for the minor allele, and - represents missing data. Each SNP is characterized by: reproducibility (fraction of allele calls which are consistent among the technical replicates generated from the same DNA samples), call rate (the proportion of individuals scored for that locus), and polymorphism information content (PIC: index for evaluating the informative extent of a SNP marker, varying between zero for no allelic variation and 1.0 for maximum allele variation).
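
As an illustration of the call rate and reproducibility metrics defined above, the following minimal sketch computes both from genotypes coded 0, 1, 2, and "-". The function names and the toy data are hypothetical, and skipping sites that are missing in either replicate is an assumption; the proprietary DArT pipeline may treat such cases differently.

```python
# Genotype coding follows the text: 0, 1, 2, or "-" for missing data.
MISSING = "-"

def call_rate_per_snp(genotypes):
    """Proportion of individuals scored at each SNP.

    genotypes: list of rows (one per individual), each a list of calls.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    return [
        sum(1 for row in genotypes if row[j] != MISSING) / n_ind
        for j in range(n_snp)
    ]

def reproducibility_per_snp(replicate_pairs):
    """Fraction of concordant calls between technical replicates, per SNP.

    replicate_pairs: list of (row_a, row_b) tuples, one per replicated sample.
    Assumption: comparisons with a missing call in either replicate are skipped.
    """
    n_snp = len(replicate_pairs[0][0])
    result = []
    for j in range(n_snp):
        compared = 0
        agree = 0
        for a, b in replicate_pairs:
            if a[j] == MISSING or b[j] == MISSING:
                continue
            compared += 1
            agree += int(a[j] == b[j])
        result.append(agree / compared if compared else None)
    return result

# Hypothetical toy data: 4 individuals x 4 SNPs.
toy_genotypes = [
    [0, 1, 2, "-"],
    [0, "-", 2, "-"],
    [1, 1, "-", "-"],
    [0, 0, 2, 1],
]
toy_replicates = [
    ([0, 1, 2, "-"], [0, 1, 1, "-"]),
    ([0, 0, 2, 1], [0, 0, 2, 1]),
]
```

A SNP scored in every individual has a call rate of 1.0, which corresponds to the 0% missing data threshold used later in the filtering step.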

Because the focal species diverged between ~29 and ~6 mya (Neves et al. 2020) and some have diverged in chromosome number and structure (Galetti et al. 2000; Nirchio et al. 2005, 2017; Rossi et al. 2005), it is likely that the restriction enzymes would not cut the same genomic regions across species. We tested for biases in the distribution of missing data by plotting the missing data per individual and the call rate per species, using the package dartR (Gruber et al. 2018) in R software (R Core Team 2020).

Data filtering

We used the package dartR (Gruber et al. 2018) to filter the data and to produce the input files for all downstream analyses (Table S3). We retained SNPs according to the following criteria: (1) SNPs with reproducibility above 97%, to reduce genotyping error; (2) only polymorphic SNPs; (3) loci (trimmed sequence tags) that are distinct enough to avoid paralogous loci (threshold: 0.2 of Hamming distance); (4) a varying amount of missing data (0, 20, and 40%, corresponding to call rates of 100, 80, and 60%, respectively); and (5) one SNP per locus, favoring SNPs with higher informativity (PIC values), to ensure the statistical independence among SNPs required by most analyses.
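
The filtering criteria above can be sketched as a simple sequence of steps. This is not dartR code: the record fields (`repro`, `call_rate`, `polymorphic`, `pic`, `locus`) and the function name are hypothetical, and the Hamming distance step (criterion 3) is omitted for brevity.

```python
def filter_snps(snps, min_repro=0.97, min_call_rate=0.8):
    """Apply criteria (1), (2), (4), and (5) to a list of SNP records.

    Each record is a dict with hypothetical keys:
    id, locus, repro, call_rate, polymorphic, pic.
    """
    # Criteria (1), (2), and (4): reproducibility, polymorphism, call rate.
    kept = [
        s for s in snps
        if s["repro"] > min_repro
        and s["polymorphic"]
        and s["call_rate"] >= min_call_rate
    ]
    # Criterion (5): keep one SNP per locus, favoring the highest PIC.
    best_per_locus = {}
    for s in kept:
        current = best_per_locus.get(s["locus"])
        if current is None or s["pic"] > current["pic"]:
            best_per_locus[s["locus"]] = s
    return sorted(best_per_locus.values(), key=lambda s: s["id"])

# Hypothetical records used only to illustrate the behavior of the filter.
toy_snps = [
    {"id": "a1", "locus": "A", "repro": 1.00, "call_rate": 1.0, "polymorphic": True, "pic": 0.2},
    {"id": "a2", "locus": "A", "repro": 0.99, "call_rate": 0.9, "polymorphic": True, "pic": 0.4},
    {"id": "b1", "locus": "B", "repro": 0.95, "call_rate": 1.0, "polymorphic": True, "pic": 0.3},
    {"id": "c1", "locus": "C", "repro": 1.00, "call_rate": 0.7, "polymorphic": True, "pic": 0.3},
    {"id": "d1", "locus": "D", "repro": 1.00, "call_rate": 1.0, "polymorphic": False, "pic": 0.0},
]

# Call rate thresholds of 1.0, 0.8, and 0.6 mirror the 0, 20, and 40% missing data levels.
ids_by_threshold = {
    t: [s["id"] for s in filter_snps(toy_snps, min_call_rate=t)]
    for t in (1.0, 0.8, 0.6)
}
```

Relaxing the call rate threshold retains more loci, which is the trade-off explored by the three levels of missing data.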

Because missing data was not equally distributed among species (Fig. S2), for performing comparative analyses across the six Mugil species (hereafter “6sp”), we built datasets with the three levels of missing data (hereafter “0MD”, “20MD”, and “40MD”). Because M. liza contained most of the missing data (Fig. S2), we repeated this process excluding M. liza and allowing no missing data (“5sp_0MD”). To estimate demographic history per species, we produced five species-specific datasets with 40% of missing data, following the classifications based on mitochondrial and nuclear data, since these were strictly concordant (see Results; Fig. 2). We also produced a dataset with no missing data considering only M. curvidens, to test for population structure. See Table S4 for details on the various datasets and their use in corresponding analyses.

Fig. 2: Population structure analyses performed with 94 Mugil individuals and the dataset with 0% missing data (984 SNPs).

A PCoA analysis; B STRUCTURE analysis. In both analyses, individuals with disagreement between morphological and mitochondrial identification belong to the same mitochondrial lineages as recovered by the COI gene (Neves et al. 2021). There is no evidence of genetic population structure between the samples of M. curvidens from two estuaries (SA—Santo Antonio; MB—Manguaba).


Population structure

In order to assess how many evolutionary lineages are present in our sampling, we performed two population structure analyses that differ in their model assumptions. First, we performed a non-model based Principal Coordinates Analysis (PCoA) to quantify how the genetic variance is distributed among samples based on the presence or absence of alleles, using the package dartR (Gruber et al. 2018) in R software (R Core Team 2020) and the three datasets for the six morphological species (6sp_0MD, 6sp_20MD, 6sp_40MD). Second, we estimated the number of genetic clusters and tested whether there is ongoing hybridization between them, using the algorithm implemented in STRUCTURE v.2.3.4 (Pritchard et al. 2000), which maximizes Hardy–Weinberg and Linkage equilibria within K ancestral clusters. We considered one to ten K clusters, with five replicates for each, 10 k iterations as burn-in, 10 k MCMC steps, independent allelic frequencies, and no prior on the assignment of individuals. We chose the most likely K based on log-likelihood values (Pritchard et al. 2000). The graphic output was built using Clumpak (Kopelman et al. 2015). We performed this analysis, using the most restrictive datasets containing all six species (6sp_0MD), the five species excluding M. liza (5sp_0MD), and M. curvidens alone (curvidens_0MD), to test whether there is population structure when maximizing the number of SNPs.

For both analyses, we expect to find evidence for six clusters if the morphological hypothesis is correct or for five clusters if the mitochondrial hypothesis is correct (Table S1; Neves et al. 2021). If species hybridize in sympatry, we expect that individuals sampled in Santo Antonio estuary will be assigned to more than one cluster.

Phylogenetic relationships

We evaluated phylogenetic relationships among species based on our genomic data, using both phylogenomic and population genomic methods that differ in their assumptions. First, we built a Maximum Likelihood (ML) phylogenetic tree describing the relationships among all individuals, classified according to morphology. We used dartR to produce a FASTA alignment for each individual containing a concatenation of all loci, with a random allele in heterozygous sites, using the six species datasets (6sp_0MD, 6sp_20MD, and 6sp_40MD). We used RAxML v. 8 (Stamatakis 2014) through CIPRES gateway (Miller et al. 2010) to perform 1 k bootstrap replicates (bs), with the GTR + GAMMA model. We visualized the ML tree using FigTree v.1.4.4, rooting the tree with M. liza (Neves et al. 2020). We expect individuals of the same species to form monophyletic clades, either reflecting their mitochondrial or morphological classification.

Second, we built a species tree describing the relationships between the species included in our sampling while allowing for interspecific gene flow using TreeMix (Pickrell and Pritchard 2012). We used the most stringent dataset (6sp_0MD) to generate a NEXUS alignment where individuals were grouped a priori according to the five genetic clusters supported by mitochondrial and nuclear data (Fig. 2). We ran this analysis considering zero to three migration axes between species, with five iterations for each, recording the likelihood of each model and the p-value of each migration event.

Genetic differentiation and variability

We estimated genetic variability between and within species. To ensure an unbiased comparison among species, we used the comparative dataset without missing data (6sp_0MD). First, we estimated genetic differentiation between all pairwise comparisons, using the fixation index FST (Wright 1943), as implemented in StAMPP package (Pembleton et al. 2013). We grouped the individuals into the five nuclear genetic groups, but we kept the individuals morphologically identified as M. incilis separate and considered the two sampling locations of M. curvidens as distinct populations, to test for population differentiation. Second, we estimated several indices of genetic diversity within each of the five species supported by mitochondrial and nuclear data (Fig. 2) as a proxy for relative differences in effective population size (Ne). Using the HIERFSTAT package (Goudet 2005) and the dataset with 0% missing data (6sp_0MD), we estimated expected heterozygosity (He), observed heterozygosity (Ho), and inbreeding coefficient (FIS) (Wright 1943). We estimated 95% confidence intervals for FIS by performing 100 replicates of non-parametric bootstrapping. Allele richness was estimated using the package PopGenReport (Adamack and Gruber 2014). Using the same dataset but concatenating the whole fragment instead, and using two haplotypes per individual without ambiguity codes, we estimated nucleotide diversity (θ and π), number of singletons, and departures from demographic stability with Tajima’s D, using DNAsp software v.6.12.03 (Rozas et al. 2017).
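
To make the within-species diversity indices concrete, the sketch below computes observed heterozygosity, expected heterozygosity, and FIS from genotypes coded 0, 1, 2, and "-". It is a simplified illustration with a hypothetical function name: He is taken as 2pq per SNP without the sample-size corrections that packages such as HIERFSTAT apply, so values will differ slightly from those estimators.

```python
def diversity_stats(genotypes):
    """Return (Ho, He, FIS) averaged over SNPs for one population.

    genotypes: list of rows (one per individual), calls coded 0/1/2 or "-".
    Simplification: He = 2pq per SNP, with no small-sample correction.
    """
    ho_values = []
    he_values = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] != "-"]
        if not calls:
            continue  # SNP not scored in this population
        n = len(calls)
        # Observed heterozygosity: proportion of heterozygous calls.
        ho_values.append(sum(1 for g in calls if g == 1) / n)
        # Frequency of the allele counted by the 0/1/2 coding.
        q = sum(calls) / (2 * n)
        he_values.append(2 * q * (1 - q))
    ho = sum(ho_values) / len(ho_values)
    he = sum(he_values) / len(he_values)
    fis = 1 - ho / he if he > 0 else None
    return ho, he, fis

# Hypothetical toy population: 4 individuals x 2 SNPs.
toy_population = [
    [0, 0],
    [1, 0],
    [1, 2],
    [2, 2],
]
toy_ho, toy_he, toy_fis = diversity_stats(toy_population)
```

In this toy example the second SNP has no heterozygotes despite intermediate allele frequencies, which produces a positive FIS (a deficit of heterozygotes relative to random mating).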

Demographic history

From a set of candidate models, we inferred the demographic history that best explains the observed patterns of genetic diversity within each species using the diffusion approximation methods implemented in δaδi (Gutenkunst et al. 2009). We produced a one-dimensional site-frequency spectrum (SFS) for each species using the 40MD species-specific datasets after conversion to VCF using the radiator package (Gosselin 2017). To maximize the number of segregating sites in each SFS, we projected down to 80% of the actual sample size, which uses subsampling to incorporate positions with missing data. This maximized the number of segregating sites in the SFS for most species when compared to projections to 100% or 60% of the actual sample size. We folded and fit these SFSs to four models of increasing demographic complexity: (1) a neutral model, assuming a constant population size; (2) a two-epoch model, describing an instantaneous change in population size (Ne1) at a certain time (T1); (3) a bottlegrowth model, describing an instantaneous size change similar to the previous one (Ne1) at a certain time (T1), but followed by a period of exponential size change to the present size (Ne2); and (4) a three-epoch model, describing two instantaneous size changes (Ne1, Ne2) at times T1 and T2. We performed optimization using the δaδi pipeline developed by Portik et al. (2017), which performs multiple rounds of optimization, each round using the best fitting parameters from the last as new starting parameters. We performed these optimizations under default settings, with the following exceptions: we performed four rounds of optimization with [10,10,10,5] replicates in each round with maximum iterations of [5,30,60,700] per replicate in each round; we set a maximum limit of 50 on population size parameters; we optimized using the L-BFGS-B method. We replicated this approach three times per species and model combination to ensure convergence on model selection and parameter estimates.
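
The projection and folding steps described above can be illustrated with a short sketch. It is not the δaδi implementation: the function names and the input format (one tuple of allele count and number of called chromosomes per site) are hypothetical, and the code only shows the standard hypergeometric down-projection followed by folding.

```python
from math import comb

def project_site(count, n, m):
    """Hypergeometric probability of observing j copies (j = 0..m) of an allele
    in a subsample of m chromosomes, given `count` copies among n called ones."""
    return [
        comb(count, j) * comb(n - count, m - j) / comb(n, m)
        for j in range(m + 1)
    ]

def projected_folded_sfs(sites, m):
    """Project every site down to m chromosomes, then fold the spectrum.

    sites: list of (allele_count, n_called_chromosomes) tuples.
    Sites with fewer than m called chromosomes are dropped, which is how
    projection accommodates positions with missing data.
    """
    sfs = [0.0] * (m + 1)
    for count, n in sites:
        if n < m:
            continue
        for j, p in enumerate(project_site(count, n, m)):
            sfs[j] += p
    # Folding: allele counts j and m - j are indistinguishable without
    # knowing the ancestral state, so they share one minor-allele bin.
    folded = [0.0] * (m // 2 + 1)
    for j, x in enumerate(sfs):
        folded[min(j, m - j)] += x
    return folded

# Hypothetical sites; the last one has too few called chromosomes for m = 2.
toy_sites = [(1, 4), (2, 4), (1, 2), (1, 1)]
toy_folded = projected_folded_sfs(toy_sites, 2)
```

Bin 0 of the folded spectrum collects sites that become monomorphic in the subsample; such entries are typically masked before model fitting. A smaller projection retains more sites but fewer frequency classes, which is why intermediate projections can maximize the number of segregating sites.
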
We plotted the optimized model SFS against the data SFS along with the residuals to infer deviations of the empirical data relative to the optimized model. To select the most likely demographic model for each species while accounting for the different number of parameters of the four models, we used the Akaike information criterion (AIC) (Akaike 1974), calculating AIC weights for each model and species combination (Burnham and Anderson 2002). Standard deviation for each parameter was calculated through the Fisher information matrix uncertainty estimation implemented in δaδi (Coffman et al. 2016).
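
The model selection step can be sketched as follows, assuming the usual definitions AIC = 2k − 2 ln L and Akaike weights proportional to exp(−ΔAIC/2). The log-likelihood values below are hypothetical placeholders, not results from this study; the parameter counts (0, 2, 3, and 4) follow the descriptions of the four candidate models.

```python
from math import exp

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def akaike_weights(aic_by_model):
    """Relative support for each model among the candidate set."""
    best = min(aic_by_model.values())
    relative = {m: exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

# Hypothetical (log-likelihood, number of parameters) for one species.
hypothetical_fits = {
    "neutral": (-120.0, 0),
    "two_epoch": (-100.0, 2),
    "bottlegrowth": (-99.5, 3),
    "three_epoch": (-99.4, 4),
}
aic_values = {m: aic(ll, k) for m, (ll, k) in hypothetical_fits.items()}
weights = akaike_weights(aic_values)
best_model = max(weights, key=weights.get)
```

In this made-up example, the more complex models reach slightly higher likelihoods, but the penalty for additional parameters leaves the two-epoch model with the largest weight.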


Results

Data filtering

The two pairs of enzymes showed equivalent genetic distances between species, but PstI/SphI showed higher reproducibility between technical replicates (Table S2) and therefore was used to genotype all 94 individuals.

Our raw data (Fig. S2) was composed of 55,507 loci of ~69 bp with 71,585 SNPs. We observed a large amount of missing data (53%) that is not homogeneously distributed across species. By plotting the call rate of all loci by species (Fig. S3) we consistently observed bi-modal distributions, showing that loci are either always or never called across individuals of the same species. The most divergent species, Mugil liza, shows the largest amount of missing data (85%), followed by M. curema (72%). This is consistent with the absence of homologous restriction sites between highly divergent genomes, rather than a methodological error.

Considering all individuals together (i.e., in the 6sp datasets), we found 7,495 SNPs with 40% MD, 3,445 SNPs with 20% MD and 984 SNPs with 0% MD (Fig. S4). By removing individuals of Mugil liza (i.e., in the 5sp_0MD), the number of SNPs increases nearly two-fold (1,879 SNPs). When considering the individuals assigned to each of the five mitochondrial clades separately, the number of SNPs is relatively low, considering 40% of missing data: 706 SNPs for M. liza, 946 for M. brevirostris, 2,001 for M. rubrioculus, 1,211 for M. curema, and 5,899 for M. curvidens. The M. curvidens dataset with no missing data consists of 3,312 SNPs (Tables S3 and S4).


Population structure

The first four dimensions of the PCoA using the comparative dataset without missing data (6sp_0MD) explained 95.4% of the data variability (Fig. 2A). In general, individuals from the same morphological species clustered together, with individuals of M. curvidens sampled at the two localities clustering together. The individuals that are morphologically assigned to M. incilis (2) and to M. rubrioculus (4) but that are mitochondrially assigned to M. curema clustered with the remaining individuals of M. curema. The individual that is morphologically assigned to M. curema but that has the mitochondrial lineage of M. curvidens also clustered with the remaining individuals of M. curvidens. Allowing for missing data did not change these results (Fig. S5).

In agreement, our STRUCTURE analyses show that the sampled individuals are assigned to five well-differentiated clusters, with the highest likelihood values at K = 5 (Fig. 2B; Fig. S6). The nuclear clusters correspond perfectly to the five mitochondrial lineages, confirming the misidentifications based on morphology. The two populations of M. curvidens are assigned to the same cluster. Our results show no sign of ongoing hybridization, as every individual is almost entirely assigned to a single cluster (maximum fraction detected from another cluster is 0.045). These results remained constant when performing the same analysis without M. liza (1,879 SNPs, Fig. S7A; maximum fraction = 0.044) or when only considering M. curvidens (3,312 SNPs, Fig. S7B; no individual is assigned to the second cluster with a fraction above 0.566).

Phylogenetic relationships

Our ML tree (Fig. 3A) recovered five well supported clades (bs = 100), with a topology consistent with that previously described using a fragment of the mitochondrial COI gene (Neves et al. 2021). The M. curvidens individuals sampled in the Manguaba and Santo Antonio estuaries form a single clade, suggesting no population-level divergence. Also in agreement with the mitochondrial DNA, the two individuals morphologically identified as M. incilis and the four individuals morphologically identified as M. rubrioculus nested within the clade of M. curema, which is sister to M. curvidens, where the individual morphologically identified as M. curema nested. Mugil brevirostris is sister to M. rubrioculus, having the shortest branch lengths between species. The analyses including 20 and 40% of missing data showed the same topology (Fig. S8).

Fig. 3: Phylogenetic history of Mugil species based on 984 SNPs.

A Maximum Likelihood phylogeny; numbers on nodes represent bootstrap support. Colors of the terminal branches refer to the morphological identification. B Species tree estimated by TreeMix showing interspecific gene flow between M. rubrioculus and M. curvidens.

Our estimated species tree showed the same topology and relative branch lengths as the ML tree (Fig. 3B). The model assuming one migration was the simplest model describing the data, as more complex models did not retrieve significant migration events (Table S5). Thus, we only find significant admixture between M. rubrioculus and M. curvidens (p-value = 0.014).

Genetic differentiation and variability

Our measures of genetic differentiation between populations showed extremely high FST values between the five species identified by our population structure analyses (Table S6), showing that most SNPs are fixed between species. The minimum differentiation was observed between M. brevirostris and M. rubrioculus (FST = 0.851), and the maximum was between M. liza and M. brevirostris (FST = 0.948). Again, the individuals morphologically assigned to M. incilis but genetically identified as M. curema showed very low differentiation relative to M. curema (FST = 0.008). The two sampling localities of M. curvidens are also genetically similar (FST = 0.01). For most species (M. brevirostris, M. rubrioculus, and M. curema), the confidence intervals of FIS included zero (Table 1), consistent with random mating. For the remaining two species (M. liza and M. curvidens), the confidence intervals of FIS exclude zero but overlap with those of the other species, suggesting similar inbreeding coefficients.

Table 1 Summary statistics of Mugil lineages from Tropical Southwestern Atlantic marine province.

Our estimated levels of genetic diversity within species based on the same SNPs across species (Table 1) showed that Mugil liza has the highest values across most diversity indexes (Ho, He, AR, θ, and π), with the exception of the number of polymorphic sites (S), which was highest in M. curvidens. The lowest diversity was found in M. curvidens when considering heterozygosity (Ho and He), or in M. brevirostris when considering other diversity indexes (S, AR, θ, and π). All species showed a negative Tajima’s D, suggestive of demographic expansion. However, Tajima’s D was only significant in M. curvidens (p < 0.05), which shows the highest number of singletons.
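
The link between an excess of singletons and a negative Tajima’s D can be illustrated with the standard formula (Tajima 1989), computed here from allele counts at segregating sites. The function name and the toy spectra are hypothetical; this sketch does not reproduce the DnaSP workflow, which operates on sequence alignments.

```python
from math import sqrt

def tajimas_d(site_counts, n):
    """Tajima's D from allele counts at segregating sites.

    site_counts: dict mapping allele count i (1..n-1) to the number of
    segregating sites with that count; n is the number of chromosomes.
    """
    s = sum(site_counts.values())
    if s == 0:
        return None
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(x * 2 * i * (n - i) / (n * (n - 1)) for i, x in site_counts.items())
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its
    # expected standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical spectra for 10 chromosomes and 10 segregating sites each.
d_singletons = tajimas_d({1: 10}, 10)    # all variants are singletons
d_intermediate = tajimas_d({5: 10}, 10)  # all variants at intermediate frequency
```

When rare variants dominate, π falls below S/a1 and D becomes negative, the pattern expected after population expansion; intermediate-frequency variants push D above zero.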

Demographic history

Our demographic modeling rejected the neutral model of constant population size for all species, in favor of one of two similar models showing a recent range expansion: two-epoch and bottlegrowth (Fig. 4). When assuming a constant effective population size, our observed Site Frequency Spectra (SFS) showed an excess of singletons and a deficit of low-frequency SNPs across all species (Fig. S9). When including a change in effective population size, either instantaneous (two-epoch) or continuous (bottlegrowth), the SFS fit the expectations with few residuals (Fig. S10). The AIC weights (Fig. 4, Table S7) support two-epoch as the most probable of the candidate scenarios for M. liza, M. rubrioculus, and M. curema (AIC weights = 0.579, 0.539, and 0.596, respectively) and support bottlegrowth for M. brevirostris and M. curvidens (AIC weights = 0.593 and 0.532, respectively). In all cases, the AIC weight of the best fitting model is substantially higher than that of the second-best fitting model, with the exception of M. curvidens, where bottlegrowth is only slightly more probable (AIC weight = 0.532) than three-epoch (AIC weight = 0.463), both estimating a similar scenario. The parameter estimates of the best fitting model describe a population expansion in all species, either directly from the ancestral population or after a bottleneck (Table S8). Parameter estimates from second-best models always show a similar result of population expansion (Table S9).

Fig. 4: Demographic history of the five sympatric Mugil species.

Weighted support for alternative demographic models according to Akaike’s information criterion (AIC).

Discussion


Designing sustainable and science-based measures for managing wild populations requires fundamental knowledge of the number of species, their genetic connectivity, diversity, and demographic history (Scott et al. 2020; Bilgmann et al. 2021; Hoffmann et al. 2021). This is particularly challenging in morphologically similar species, such as the Mugil fishes, despite their high ecologic and economic importance. Here, we use a genomic approach to resolve these evolutionary questions and understand how speciation of these morphologically similar species can lead to the maintenance of strong genetic isolation in the absence of geographic isolation.

Five species of Mugil occur in sympatry and show high vagility

The number of species of Mugil that are sympatric in the Coral Coast MPA in Brazil has been debated (Fig. 1) (Menezes et al. 2015; Barletta and Dantas 2016). A previous study integrating mitochondrial and morphological data reported a conflict between the two classifications (Table S1; Neves et al. 2021), suggesting either an overestimation of the number of species, or evolutionary processes that are specific to the mitochondrial genome.

Using hundreds to thousands of loci spread throughout the genome, our population structure analyses conclusively show that all sampled specimens belong to five well-defined genetic clusters (Fig. 2). These clusters align perfectly with the five mitochondrial lineages previously described based on the barcoding gene COI (Fig. 3A; Table S1; Neves et al. 2020). All individuals showing a conflict between the morphological and the mitochondrial classifications (two M. incilis, four M. rubrioculus, and one M. curema) were clustered with evolutionary lineages coincident with the mitochondrial gene, showing that this barcoding gene can reliably distinguish between species that cannot always be differentiated using external morphological data (Fig. 1). Although the number of genotyped individuals (94) and filtered SNPs used here (from 1,879 to 984 SNPs) are relatively modest, this result is consistent across datasets (Fig. S7A) and clustering methods (Fig. 2), showing that this sampling is sufficient to answer this research question, as shown in other RAD-seq studies with a smaller number of loci (Mesak et al. 2014). These results rule out the hypothesis of mitochondrial introgression through hybridization, and reduce the number of species in the study area to five: M. liza, M. brevirostris, M. rubrioculus, M. curema, and M. curvidens. Given that we find an unexpectedly low level of homology between restriction sites across these five species (Figs. S2, S3), consistent with high genomic divergence in the nuclear genome (Table S6), future genomic studies would benefit from other next-generation sequencing methods that are more suitable for deeper time scales (da Fonseca et al. 2016).

Our results also show no significant genetic differentiation between the M. curvidens sampled in the two estuaries 38 km apart, even when this species was analyzed separately with double the number of loci (Fig. S7B). This confirms that the heterogeneous habitat composed of coral reefs and recurrent plumes of sediments from the rivers (Passos et al. 2016) does not restrict gene flow and that these two areas with different conservation regimes harbor a single panmictic population of M. curvidens. Although this result is perhaps not surprising because adults of Mugil species migrate long distances along the shoreline (Livi et al. 2011) and because dispersal also occurs passively through pelagic eggs and larvae (Livi et al. 2011; Whitfield et al. 2012), this hypothesis had not been tested at the genomic level in Mugil. Such a level of genetic connectivity across heterogeneous habitats contrasts with genomic studies in other marine species (Riginos and Nachman 2001; Hauser and Carvalho 2008; Selkoe et al. 2016) and has relevant implications for conservation, as discussed below.

Mugil species show strong genetic isolation despite full sympatry

Because many morphologically cryptic species are allopatric, showing a fractal genetic structure (Wake 2009; McMahan et al. 2013; Boehm et al. 2013), it is challenging to understand whether such speciation processes lead to strong reproductive isolation. In this context, sympatric cryptic species are important because we have an opportunity to measure genetic barriers between them and thus infer whether they represent evolutionarily stable units. According to a mitochondrial study (Neves et al. 2020), these focal species of Mugil have diverged between ~29 mya, when M. liza split from the remaining species, to ~6 mya, when M. brevirostris and M. rubrioculus diverged. Despite such a long period since initial divergence, it has not been tested whether these species were formed in the face of gene flow, and whether they currently present strong genetic isolation where they occur sympatrically, such as in this MPA.

Our maximum likelihood phylogenetic analysis using concatenated loci (Fig. 3A) estimates a topology and branch lengths that are largely in agreement with those previously estimated from mitochondrial loci (Neves et al. 2020), irrespective of the missing data allowed (Fig. S8). This tree shows that M. brevirostris and M. rubrioculus diverged most recently, preceded by the split between M. curema and M. curvidens. Interestingly, most SNPs are fixed in all pairwise species comparisons, as reflected in the extremely high values of fixation indices among the five species (Table S6) and consistent with old divergence.

By using allelic frequencies of independent SNPs for estimating phylogenetic relationships in the presence of incomplete lineage sorting and interspecific gene flow (Pickrell and Pritchard 2012), we find strong support for a topology that is congruent with the tree based on concatenated data. Notably, the most likely species tree shows gene flow between M. rubrioculus and M. curvidens (p-value = 0.014), without support for gene flow between any other species pair (p-values > 0.278; Table S5). This suggests that, while some morphologically similar species might have formed in parapatry or sympatry, most species were likely formed in allopatry. This hypothesis is consistent with the extensive changes in the coast of South America that occurred from the Miocene (Bush and de Oliveira 2006) to the Pleistocene (Ludt and Rocha 2015), including a reduction of habitable area, changes in ocean currents, and shifts in water column thermal dynamics. Such changes have been associated with population bottlenecks in glacial refugia and with species formation in coastal marine taxa from tropical waters (Ludt and Rocha 2015), consistent with our findings for Mugil.

Our STRUCTURE analyses show that every individual has ancestry in a single cluster (membership probabilities > 0.95), irrespective of the stringency of the filtering (Fig. 2B, Fig. S7A), suggesting strong genetic isolation between species. This absence of hybrids suggests that genetic, behavioral, ecological, or other barriers can maintain strong genetic isolation between species, despite morphological and spatial overlap. Although currently these species lack any geographic barriers, use the same macrohabitat, and are morphologically similar, several studies of Mugil species have shed some light on multiple reproductive barriers that might contribute to the strong reproductive isolation reported here. Ethological studies have shown that the spawning seasons of M. curema and M. liza overlap in only one month in southeastern Brazil (Albieri et al. 2010), while the spawning seasons of M. cephalus and M. curema do not overlap in Mexico (Ibáñez-Aguirre 1993), suggesting that temporal isolation might have been established between several Mugil species. Ecological studies have found that in southeastern Brazil, M. curema is associated with high salinity waters while M. liza is associated with lower salinity (Mai et al. 2018), suggesting some ecological isolation between these species. Diet studies have shown that sympatric species of Mugil in West Africa can present differential particle size selection to avoid competition (LeLoc’h et al. 2015), possibly constituting a further ecological isolating barrier. Furthermore, cytogenetic studies have shown that some of these species differ in the number and arrangement of chromosomes (Rossi et al. 2005; Harrison et al. 2007; Nirchio et al. 2017), possibly constituting a genetic isolating barrier.
Although the relative contribution of these barriers has not been tested by experimental crosses, our observation of a lack of hybrids in an area of sympatry between five different species of Mugil suggests that these species show strong reproductive isolation, and thus have passed the “grey zone of speciation” (sensu Roux et al. 2016). Most species past this level of divergence have accumulated strong ecomorphological differentiation. Our results add to evidence from other studies of morphologically similar species (e.g. Roux et al. 2013) suggesting that cryptic speciation can also result in strong genetic barriers between sympatric taxa.
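
The membership criterion used above (every individual assigned to a single cluster with probability > 0.95) can be illustrated with a minimal sketch. The function name, the individual labels, and the membership values below are hypothetical and are not taken from our STRUCTURE output; the sketch only shows how a membership matrix could be screened for putative hybrids under that threshold.

```python
def classify_individuals(q_matrix, threshold=0.95):
    """Assign each individual to a cluster, or flag it as a putative hybrid.

    q_matrix maps an individual label to its list of cluster membership
    probabilities (one value per cluster, summing to about 1).
    Returns a dict mapping each label to a cluster index, or to None when
    no single cluster exceeds the threshold (putative admixed individual).
    """
    calls = {}
    for label, memberships in q_matrix.items():
        # Index of the cluster with the highest membership probability.
        top = max(range(len(memberships)), key=lambda k: memberships[k])
        calls[label] = top if memberships[top] > threshold else None
    return calls


# Hypothetical membership probabilities for K = 5 clusters (illustration only).
example_q = {
    "ind1": [0.99, 0.01, 0.00, 0.00, 0.00],
    "ind2": [0.02, 0.97, 0.01, 0.00, 0.00],
    "ind3": [0.55, 0.45, 0.00, 0.00, 0.00],  # would be flagged as admixed
}

calls = classify_individuals(example_q)
putative_hybrids = [label for label, cluster in calls.items() if cluster is None]
print(calls)
print(putative_hybrids)
```

Under this screening rule, a dataset such as ours, in which all membership probabilities exceed 0.95, would return an empty list of putative hybrids.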

Species differ in their relative abundance and demographic expansion

Information regarding population size change is fundamental for understanding the evolutionary history of a species and to delineate conservation strategies (Ramakrishnan et al. 2005; Dussex and Robertson 2018). Yet, inferring the evolutionary processes underlying the patterns of genetic variation within species requires hundreds or thousands of genetic markers sampled randomly across the genome.

Using genomic data, we show that the diversity indices vary strongly between species (Table 1). Because in an idealized Wright–Fisher population the nucleotide diversity π = 4 Ne µ (where µ is the mutation rate per nucleotide site per generation and Ne is the effective population size), our estimates of π should be proportional to differences in Ne between species when using the same loci across species (dataset 6sp_0MD). Although we find relatively higher diversity estimates in M. liza (Ho, He, θ, and π) and lower estimates in M. brevirostris (Ho, He, and π), caution is needed when interpreting Ne, since these are estimates of long-term effective population size and may not scale linearly with contemporary census sizes (Braude and Templeton 2009; Leffler et al. 2012). For example, M. liza is characterized by a larger body size, has higher commercial value, and its census size has decreased in recent years (Mendonça and Bonfante 2011; Sant’Ana et al. 2017; Vieira et al. 2019). Therefore, current patterns of genetic diversity within Mugil species are likely to be affected by the demographic history of each species, rather than reflecting their current relative abundance.
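
The proportionality between π and Ne can be made concrete with a short sketch. It assumes the idealized relationship π = 4 Ne µ and a mutation rate shared across species; the species labels, π values, and mutation rate below are hypothetical placeholders, not the estimates reported in Table 1.

```python
def ne_from_pi(pi, mu):
    """Long-term effective size implied by pi = 4 * Ne * mu."""
    return pi / (4.0 * mu)


def relative_ne(pi_by_species, reference):
    """Ne of each species relative to a reference species.

    If mu is assumed equal across species, it cancels out, so the
    ratio of Ne values equals the ratio of pi values.
    """
    ref_pi = pi_by_species[reference]
    return {species: pi / ref_pi for species, pi in pi_by_species.items()}


# Hypothetical nucleotide diversities (illustration only).
pi_values = {"species_A": 0.004, "species_B": 0.001}
assumed_mu = 1e-8  # hypothetical per-site, per-generation mutation rate

ratios = relative_ne(pi_values, reference="species_B")
ne_a = ne_from_pi(pi_values["species_A"], assumed_mu)
print(ratios)
print(ne_a)
```

In this toy case a fourfold difference in π translates into a fourfold difference in long-term Ne, regardless of the value chosen for µ, which is why comparisons are restricted to loci shared across species.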

Our demographic modeling (Fig. 4) clarifies how the demographic history of each species shapes the current estimates of Ne. The neutral demographic scenario was rejected for every species (AIC weight < 0.001 for all species) in favor of scenarios with an increase of Ne over time: a sudden increase of Ne for M. liza, M. curema, and M. rubrioculus, and an exponential increase of Ne after a sudden bottleneck for M. brevirostris and M. curvidens. The simulated patterns of variability are very similar under the three candidate models of range expansion (Fig. S9) and thus caution is needed in trying to distinguish between them. AIC weights convey a conditional probability for each model in the pool, and while the best-fitting model had nearly double the weight of the second-best-fitting model in all but one taxon (M. curvidens), this weight never exceeded 0.6. However, even if we examine the second-best-fitting model for each taxon, the selected models and their parameter estimates agree with the results of the best-fitting models (Table S9). These results are also consistent with our observation of negative Tajima’s D across species, even though it was statistically significant in only one (Table 1). Based on the Ne estimates for the most likely model (Table S8), we estimate around a threefold increase in Ne with no bottleneck in M. liza, M. curema, and M. rubrioculus, which show the three highest π estimates. Both species with the lowest values of π, M. brevirostris and M. curvidens, show expansions after a bottleneck, with the bottleneck and expansion both being stronger in M. brevirostris than in M. curvidens. This suggests that the current Ne is strongly shaped by species-specific demographic history.
However, it should be noted that these parameter estimates do not allow a perfect comparison between species, as they are contingent upon the model used, which may not reflect the true demographic history but only the most probable among our candidates given our data. Because AIC weights for the models differed strongly between species (Table S7), no single model could be fixed across all species to estimate comparable magnitudes of size change.
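
For readers less familiar with AIC weights, the following sketch shows how they are conventionally computed from a set of AIC scores: each model's score is expressed as a difference from the best score, converted to a relative likelihood, and normalized over the candidate pool. The model names and AIC values are hypothetical and do not correspond to Table S7.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into Akaike weights that sum to 1.

    aic_scores maps a model name to its AIC value. The weight of a model
    is exp(-0.5 * delta) normalized over all candidates, where delta is
    the difference between its AIC and the lowest AIC in the pool.
    """
    best = min(aic_scores.values())
    relative = {m: math.exp(-0.5 * (a - best)) for m, a in aic_scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical AIC values for three candidate models (illustration only).
example_aic = {
    "constant_size": 120.0,
    "sudden_expansion": 100.0,
    "bottleneck_then_growth": 101.0,
}

weights = akaike_weights(example_aic)
best_model = max(weights, key=weights.get)
print(weights)
print(best_model)
```

Because the weights are conditional on the candidate pool, a model with a weight near 0.6 is the best supported of the set considered, but it is not necessarily a close description of the true demographic history.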

Genomic studies in other exploited fish species have consistently found signatures of demographic expansion, including in the Atlantic herring (Barrio et al. 2016), the North American lake whitefish (Rougeux et al. 2017), turbot (Momigliano et al. 2020), and Pacific salmon (Rougemont et al. 2020). Our results are in line with those findings and suggest that current patterns of genetic diversity in wild populations are largely determined by historical changes in Ne associated with the Quaternary ice age. Thus, future studies in protected areas must consider the effect of these important historical events in shaping current patterns of genetic diversity. Higher Ne values can also be associated with lengthened maturation time (Nunney 1993), which can happen when juveniles mature earlier due to overexploitation (Kuparinen and Merilä 2007), or with differences in geographical range among species (Leffler et al. 2012). This suggests that, although these species are morphologically similar and coexist in the same habitats, their evolutionary histories can be quite different, affecting their levels of standing genetic variation, and hence their adaptive potential (Wang et al. 2016).

Implications for conservation

Although governmental institutions and legislators are willing to implement new science-based regulations, fundamental knowledge of the number of species, their genetic connectivity, and their demographic history is still lacking for most species of ecological and economic interest. Considering that Mugilidae fishes are particularly challenging to identify morphologically, yet are the main fish target in tropical artisanal fisheries (Batista et al. 2014), our results provide important guidelines for conservation.

The data generated by the DArT-seq method can provide valuable information for conservation and management, having been applied to species of great economic importance that have been historically exploited. Some examples are the lobster Panulirus homarus (Palinuridae) exploited in the Middle East (Al-Breiki et al. 2018), the sharks Carcharhinus brachyurus and C. obscurus (Carcharhinidae) (Junge et al. 2019), shrimps of the genus Macrobrachium (Palaemonidae) (Makombu et al. 2019) exploited globally, and fish species of the genus Osteoglossum (Osteoglossidae) commercialized in the Amazon region (de Souza et al. 2019).

We show that M. incilis is very likely absent from the Coral Coast MPA, in agreement with a previous study based on morphological and mitochondrial data (Menezes et al. 2015). This implies that conservation efforts inside this MPA should focus only on five Mugil species and that M. incilis has a more restricted distribution than previously thought.

Currently, fisheries in northeastern Brazil report capture of all Mugil species under the same category (9,219.5 tons in 2007; IBAMA 2007). Irrespective of their unknown current census population sizes, our results suggest that these species have different effective population sizes (Table 1), and that the magnitude and mode of demographic expansion differ across species (Table S8). Although our results suggest that each species of Mugil should have a specific protection status, this is extremely challenging to implement in such a morphologically conserved group of species. Yet, our finding that species classification using thousands of genomic markers is in strict agreement with species classification based on a single fragment of the COI gene (Neves et al. 2021) represents an important validation for future monitoring studies using hundreds or thousands of samples. This mitochondrial barcoding gene provides a cost-effective tool to monitor fisheries bycatch, and thus to assess species abundance over time and to establish sustainable protective measures for such ecologically and economically important species.

Data archiving

All individuals sampled for this study were deposited in the ichthyology collection of the Alagoas university; voucher number, sampling locations, morphological data and COI Genbank accession numbers are listed in Table S1. All the raw genomic data, filtered datasets, scripts and infiles for all the analyses were deposited in GitHub (