Introduction

During the glaciations and interglaciations of the Late Pliocene and Pleistocene, falling and rising sea levels led to the exposure and reflooding of the shallow seabeds that fringe the coastlines of continents and islands (Imbrie and Imbrie 1986; Toms et al. 2014; Ludt and Rocha 2015). During the low sea level stands of the Ice Ages, newly exposed seabeds split previously contiguous seas into smaller isolated oceanic bodies, which allowed for the vicariance of their now allopatric marine groups (Avise 2000). Subsequently, when the exposed seabeds were reflooded during the high sea level stands of the interglaciations, these previously allopatric groups became reconnected by straits, which allowed for their dispersal and secondary contact. This model of “vicariance, then secondary contact” provides a general explanation for the phylogeography of many marine groups (Bowen et al. 2014; Ni et al. 2014), including that of the four-eyed sleeper (Bostrychus sinensis, Eleotridae, Perciformes) (Qiu et al. 2016).

The four-eyed sleeper is a goby that is distributed along the muddy, sandy, and gravelly coastlines of the Indo-Pacific from India to Australia, China, and Japan (Masuda et al. 1984). Recently, Qiu et al. (2016) studied the phylogeography of the four-eyed sleeper in the East China Sea (ECS) and South China Sea (SCS) with protein-coding sequences of the mitochondrial cytochrome b (Cytb) and nuclear ryanodine receptor 3 (RyR3) genes and two morphological traits (adult standard length (SL) and presence/absence of dark costal stripes). Gene trees for the Cytb and RyR3 sequences were congruent for two reciprocally monophyletic groups of haplotypes and alleles, which were diagnostic for two lineages with a historical geographic break across the Taiwan Strait. A divergence time of ~440,000 years ago was inferred for the two lineages (i.e., in Marine Isotope Stage 12, the most extreme Ice Age of the last one million years; Railsback et al. 2015). During Marine Isotope Stage 12, the ECS was limited to the Okinawa Trough, the SCS was reduced to an enclosed inland body, and an emergent land bridge across the Taiwan Strait connected Taiwan to mainland China (Fig. 1). Thus, at this time, ECS populations of the four-eyed sleeper were cut off from those in the SCS, which allowed for their vicariance into separate “ECS” and “SCS” phylogeographic lineages.

Fig. 1

Map of the six geographic samples for the four-eyed sleeper (with the two-letter abbreviations of their localities given in parentheses). This map was rendered with ODV v4.7.3 (Schlitzer 2002). Shallow seabeds with depths of <100 m, which were exposed during the major glaciations of the last one million years (e.g., Marine Isotope Stage 12), are denoted by the offshore shading (Voris 2000; Railsback et al. 2015), whereas the spring and summer directions of the South China Sea Surface Current and Kuroshio Branch Current through the Taiwan Strait are represented by the arrows (Jan et al. 2002). (Inset) Two females from WZ and one female from XM with final individual identifications of SCS, ECS, and SCS, respectively (see text). The body sizes of these three females approximate the mean SL of the sympatric SCS and ECS groups at WZ and of the allopatric SCS group at XM, respectively. Their size differences thus illustrate that the SL of SCS individuals varies geographically: in sympatry, SCS individuals are larger than ECS specimens, whereas in allopatry the two are of similar size. Furthermore, the ECS female from WZ illustrates the presence of dark costal stripes, which is the second morphological trait of this study.

Unlike the haplotypes and alleles of the ECS lineage, those of the SCS lineage were found both north and south of the Taiwan Strait (Qiu et al. 2016). The Chinese four-eyed sleeper spawns during spring and summer, and its planktonic larvae remain in the water column for at least 1 month (Zhong and Li 2002; Hong et al. 2016). Thus, as the Taiwan Strait was reflooded during the more recent interglaciations, the ECS and SCS were reconnected, which allowed for offshore dispersal via the prolonged planktonic larval stage and for secondary contact between the two phylogeographic lineages. However, during the spring and summer, the South China Sea Surface Current and Kuroshio Branch Current both flow northeastward across the Taiwan Strait (the former flows southward at other times of the year; Jan et al. 2002). Planktonic larval dispersal is therefore biased toward the northeast, which effectively limits the secondary contact region of the two lineages to the ECS.

About 10% of the sampled individuals from the ECS were identified as hybrids (Qiu et al. 2016). Although this estimate is based on only two loci, it remains noteworthy, because it raises a number of important questions about hybridization in the secondary contact region of the four-eyed sleeper. (1) What are the relative frequencies of F1 and later generation hybrids and backcrosses? (2) How widespread is mitochondrial DNA (mtDNA) introgression? (3) Does the ECS hybrid region qualify as a “tension zone”? (4) At what stage of the speciation process are the ECS and SCS lineages? (5) How do the two morphological characters vary among the two lineages and their hybrids (e.g., is either trait diagnostic for the ECS and SCS parents and/or their crossbreeds)?

The northwest Pacific marginal seas are a “hotspot” for research on the phylogeography and evolution of marine groups (Ni et al. 2014; Wang et al. 2015). However, detailed studies of hybridization are still missing for the region, particularly for the ECS, where secondary contact has been reported in many marine groups (e.g., Liu et al. 2007; Wang et al. 2008; Yin et al. 2009; Tang et al. 2010; Yan et al. 2014). This gap is especially significant, because hybrid regions constitute “natural laboratories” for studying the processes of divergence, adaptation, speciation, and gene exchange (Hewitt 1988, 2011; Mallet 2005; Petit and Excoffier 2009; Taylor et al. 2015). In this study, we complement the two loci and two morphological traits of Qiu et al. (2016) with a new microsatellite data set. This new data set provides the additional nuclear DNA (nDNA) loci and statistical power that were unavailable to the previous authors for resolving the five key questions above about sleeper hybridization. Our investigation provides new insights into the dynamics of hybridization in the ECS and can thereby serve as a case study for future research on admixture, introgression, and divergence in the northwest Pacific marginal seas.

Materials and methods

Geographic samples, morphological traits, and Cytb/RyR3 sequences

Adult four-eyed sleepers (n = 534) were collected from five ECS localities and one SCS locality in China (Fig. 1). As the five ECS samples were from the secondary contact region, they were expected to include individuals of the ECS and SCS lineages as well as their hybrids (Qiu et al. 2016). Conversely, as the one SCS sample was from south of the secondary contact region, it was included primarily as a reference for identifying individuals of the SCS lineage.

Specimen processing, the scoring and handling of the two morphological traits (adult SL in mm and presence/absence of dark costal stripes), and the generation and analyses of the Cytb and RyR3 sequences closely followed the methods of Qiu et al. (2016). Thus, in our coverage below of these methods, we largely limit ourselves to those specifics that are of greatest importance to the interpretation of our current results.

SL was recorded for individuals only after confirming their adult status by examination of their gonads. Although we were unable to control for the effects of age, this source of heterogeneity on SL was not regarded as critical, because Zhang (2002) showed that ~82 and 14% of the adults in Chinese sleeper populations are first- and second-year breeders, respectively. Thus, our geographic samples were in effect randomly drawn from a single distribution with a heavy skew toward the youngest reproducers. SL did not vary between females and males in any geographic sample (Student’s t test, P > 0.122 in all cases), so the sexes were pooled for subsequent analyses. SL comparisons among ECS, SCS, and hybrid individuals were made on a locality-by-locality basis, rather than with the combined measurements for all six geographic samples.
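The sex comparison above relies on a two-sample Student's t test. As a minimal sketch (not the authors' code), the pooled-variance t statistic and its degrees of freedom can be computed with Python's standard library; the P value would then come from a t distribution with n1 + n2 − 2 degrees of freedom, for example via `scipy.stats.ttest_ind`, which assumes equal variances by default. The SL values in the usage lines are hypothetical.

```python
import math
import statistics


def students_t(sample_a, sample_b):
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Hypothetical SL values (mm) for females and males of one geographic sample
females_sl = [118.0, 121.5, 125.0, 119.5, 122.0]
males_sl = [120.0, 124.5, 117.5, 123.0]
t_stat, dof = students_t(females_sl, males_sl)
print(f"t = {t_stat:.3f}, df = {dof}")
```

If no sample shows a significant sex difference, pooling the sexes for later SL comparisons is reasonable, which is the logic followed above.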

Representative sequences of all new alleles for the diploid RyR3 locus were separately cloned and sequenced to confirm their accuracy (e.g., correct phasing). These confirmed sequences were combined with those of Qiu et al. (2016) to generate a final comprehensive panel of verified alleles for the accurate RyR3 genotyping of the new specimens.

Summary statistics of genetic diversity for the Cytb and RyR3 sequences of the 534 new individuals were first estimated on a per-sample basis with the methods of Qiu et al. (2016). These new sequences were then separately combined with the Cytb and RyR3 orthologs of the previous 205 specimens, which were sequenced by these authors. The combined Cytb and RyR3 sequences were separately aligned with CLUSTALX v2.1 (Larkin et al. 2007) and phylogenetically analyzed with MRBAYES v3.2.5 (Ronquist et al. 2012) under the clock:coalescence prior that is designed for population-level processes and intraspecific inference (Drummond and Bouckaert 2015). Under the clock:coalescence prior, the Cytb and RyR3 gene trees were rooted according to a molecular clock, which obviated the need for outgroups. By combining the new Cytb and RyR3 sequences of this study with those of Qiu et al., we were able to classify each of the former as “ECS” or “SCS” according to its placement with the previously documented haplotype and allele groups that were reciprocally monophyletic, congruent, and diagnostic for the ECS and SCS phylogeographic lineages.

Microsatellite DNA data and analyses

Starting with the same total genomic DNA isolates as used in the Cytb and RyR3 sequencing, the 534 new specimens were genotyped for 15 of the 20 microsatellite loci reported for the four-eyed sleeper by Wang et al. (2011). These 15 loci were PCR amplified with the primers and methods of these authors and microsatellite genotypes were then scored on an ABI 3130xl automated sequencer (Applied Biosystems, Waltham, MA) with GENEMARKER (Softgenetics, State College, PA). The initial data set was screened for missing data and potential genotyping errors with MICROCHECKER v2.2.2 (van Oosterhout et al. 2004). Thirty-three specimens and three loci (BSB006, BSSW83, and BSW068) were excluded from the final data set, leaving 501 individuals and 12 markers (BSC001, BSC002, BSD026, BSD106, BSD121, BSD125, BSD137, BSE008, BSE020, BSSD14, BSSW89, and BSW045).

Microsatellite genotypes were combined with those for RyR3 to generate what is hereafter referred to as the nDNA data set. Summary statistics of nDNA genetic diversity were estimated for the six geographic samples with POPGENE v1.3.2 (Yeh and Boyle 1997). Probabilities of identity between both unrelated individuals and siblings (PID and PID-sib, respectively) were estimated with the observed allele frequencies of each sample with GenAlEx v6.41 (Peakall and Smouse 2012). Each locus of a sample was evaluated for its departure from Hardy–Weinberg expectations with the exact HW test in GENEPOP v4.4 (Rousset 2008). Wright’s FIS was estimated for each locus and then for all loci of a sample with FSTAT v2.9.3 (Goudet 2002) and GDA v1.1 (Lewis and Zaykin 2002), respectively. 95% confidence intervals (CI) were determined for the per-locus and total FIS with 10,000 randomizations of the genotypes in FSTAT and with 10,000 bootstrap replications across loci with GDA. Departures from linkage equilibrium were tested for all pairs of loci within each geographic sample with 10,000 permutations of their genotypes with GDA. By preserving the individual genotypes at each locus, these permutations corrected for the effects of within-locus disequilibrium on the between-locus equilibrium (Lewis and Zaykin 2002). To compensate for multiple comparisons, standard Bonferroni corrections were applied in all sample-wide series of tests (Rice 1989).
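The probabilities of identity can be approximated directly from allele frequencies. The sketch below assumes the standard per-locus expressions PID = Σp_i⁴ + Σ_{i<j}(2p_ip_j)² (algebraically equal to 2(Σp_i²)² − Σp_i⁴) and PID-sib = 0.25 + 0.5Σp_i² + 0.5(Σp_i²)² − 0.25Σp_i⁴, and multiplies across loci, which assumes independent loci (only an approximation, given the linkage disequilibrium reported in the Results). The allele frequencies are hypothetical, and GenAlEx's internal computation may differ in detail.

```python
from math import prod


def pid_locus(freqs):
    """Probability that two unrelated individuals share a genotype at one locus.

    Equivalent to sum(p_i^4) + sum over i<j of (2 p_i p_j)^2.
    """
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 2 * s2 ** 2 - s4


def pid_sib_locus(freqs):
    """Probability that two full siblings share a genotype at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus(per_locus_values):
    """Multi-locus value as the product over loci (assumes independent loci)."""
    return prod(per_locus_values)


# Hypothetical allele frequencies for three loci of one sample
loci_freqs = [[0.5, 0.3, 0.2], [0.25, 0.25, 0.25, 0.25], [0.6, 0.4]]
print(multilocus(pid_locus(f) for f in loci_freqs))
print(multilocus(pid_sib_locus(f) for f in loci_freqs))
```

Because PID-sib is bounded below by 0.25 per locus, it declines much more slowly with added loci than PID does, which matches the much larger PID-sib values reported in the Results.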

Provisional and final individual identifications

Provisional individual identifications as “ECS”, “SCS”, or “hybrid” were first made for the 534 new specimens according to the gene tree approach of Qiu et al. (2016). Specifically, the origins of the one Cytb haplotype and two RyR3 alleles for each new individual were inferred according to the placements of its sequences with either the ECS or SCS groups of their respective gene trees. “ECS” individuals were then provisionally identified as those with matching ECS origins for their one Cytb haplotype and two RyR3 alleles. Similarly, “SCS” individuals were provisionally identified as those with matching SCS origins for their one Cytb haplotype and two RyR3 alleles. Conversely, “hybrid” individuals were provisionally identified as those with mismatched origins for their one Cytb haplotype and/or two RyR3 alleles. Thus, hybrid individuals included those: (1) with an ECS-related Cytb haplotype and two SCS-related RyR3 alleles; (2) with a SCS-related Cytb haplotype and two ECS-related RyR3 alleles; and (3) with both ECS- and SCS-related RyR3 alleles.
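The provisional decision rule above can be written as a small function. This is an illustrative sketch, assuming each gene tree placement has already been reduced to the label "ECS" or "SCS"; the function name and input format are hypothetical.

```python
VALID = {"ECS", "SCS"}


def provisional_id(cytb_origin, ryr3_origins):
    """Provisional 'ECS', 'SCS' or 'hybrid' call from gene tree placements.

    cytb_origin: placement of the single mtDNA haplotype ('ECS' or 'SCS').
    ryr3_origins: placements of the two RyR3 alleles, e.g. ('ECS', 'SCS').
    """
    if cytb_origin not in VALID or len(ryr3_origins) != 2 or not set(ryr3_origins) <= VALID:
        raise ValueError("expected one Cytb and two RyR3 placements labelled 'ECS' or 'SCS'")
    origins = {cytb_origin, *ryr3_origins}
    if origins == {"ECS"}:
        return "ECS"
    if origins == {"SCS"}:
        return "SCS"
    # Any mismatch among the three placements marks a hybrid
    return "hybrid"


print(provisional_id("SCS", ("ECS", "ECS")))  # mismatched Cytb and RyR3 origins
```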

For the 501 new specimens with both nDNA and Cytb data, final individual identifications as “ECS”, “SCS”, or “hybrid” were next made by combining the results of their nDNA STRUCTURE analyses with the gene tree inferences of their Cytb origins. The nDNA data were analyzed with STRUCTURE v2.2.3 using the admixture model with correlated allele frequencies among populations (Pritchard et al. 2000). The number of groups for this analysis was initially set to K = 2, given the prior findings of Qiu et al. (2016) for a secondary contact region with both ECS and SCS lineages. Importantly, this premise was also confirmed by the results for K = 1–6 with the approach of Evanno et al. (2005), as implemented in STRUCTURE HARVESTER v0.6.93 (Earl and vonHoldt 2012), and by a comparison of the posterior probabilities for K = 1 and 2, which showed that the log posterior probability for K = 2 was 5113.73 units greater than that for K = 1. The final analysis with K = 2 was run for 100 million generations, with the first 10% of the simulation discarded as burn-in.
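The Evanno et al. (2005) statistic compares the curvature of ln P(D) across K with its variability among replicate runs. The sketch below assumes replicate ln P(D) values per K and computes the second difference from the run-averaged values, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]; this aggregation choice is an assumption, and the ln P(D) values are hypothetical. Note that ΔK is undefined at K = 1, so a direct comparison of K = 1 with K = 2, as also reported above, is a useful complement.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Delta K from replicate ln P(D) values.

    lnp_by_k maps K to a list of ln P(D) values from replicate STRUCTURE runs
    (at least two per K). K values lacking a neighbour at K-1 or K+1 are skipped.
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # Absolute second difference of the run-averaged ln P(D)
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / statistics.stdev(lnp_by_k[k])
    return delta


# Hypothetical replicate ln P(D) values for K = 1-4
runs = {
    1: [-10010.0, -10000.0, -9990.0],
    2: [-5010.0, -5000.0, -4990.0],
    3: [-4920.0, -4900.0, -4880.0],
    4: [-4880.0, -4850.0, -4820.0],
}
delta_k = evanno_delta_k(runs)
print(delta_k, "best K:", max(delta_k, key=delta_k.get))
```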

The proportion of nDNA with an ECS origin was estimated for each individual with the qij parameter of STRUCTURE (Pritchard et al. 2000), which we hereafter refer to as qECS. To maximize the efficiency and accuracy of hybrid detection, we followed the approach of others (e.g., Vähä and Primmer 2006; Muto et al. 2013) and used the ranges qECS ≥ 0.90, 0.90 > qECS > 0.10, and qECS ≤ 0.10 to classify the nDNA of each individual as of ECS, hybrid, or SCS origin, respectively. Final individual identifications were then made by comparing the nDNA origin of a specimen to the ECS/SCS gene tree placement of its Cytb. Thus, the final identifications consisted of: (1) ECS individuals with nDNA (qECS ≥ 0.90) and Cytb of matching ECS origin; (2) SCS specimens with nDNA (qECS ≤ 0.10) and Cytb of matching SCS origin; and (3) hybrids with either admixed nDNA (0.90 > qECS > 0.10) or mismatched origins for their nDNA and Cytb (i.e., ECS-related nDNA (qECS ≥ 0.90) with SCS-related Cytb, or SCS-related nDNA (qECS ≤ 0.10) with ECS-related Cytb).
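The final identification rule combines a threshold on qECS with the Cytb placement. A minimal sketch of that logic, assuming the 0.90/0.10 cut-offs described above and Cytb origins already labelled "ECS" or "SCS" (function names are hypothetical):

```python
def ndna_origin(q_ecs, upper=0.90, lower=0.10):
    """Classify a STRUCTURE q_ECS value as 'ECS', 'SCS' or 'admixed'."""
    if not 0.0 <= q_ecs <= 1.0:
        raise ValueError("q_ECS must lie in [0, 1]")
    if q_ecs >= upper:
        return "ECS"
    if q_ecs <= lower:
        return "SCS"
    return "admixed"


def final_id(q_ecs, cytb_origin):
    """Final identification from the nDNA class and the Cytb gene tree origin."""
    nuclear = ndna_origin(q_ecs)
    if nuclear == "admixed":
        return "hybrid (admixed nDNA)"
    if nuclear == cytb_origin:
        return nuclear
    # Pure-looking nDNA paired with the other lineage's mtDNA
    return "hybrid (nDNA/Cytb mismatch)"


for q, mt in [(0.97, "ECS"), (0.03, "SCS"), (0.88, "ECS"), (0.95, "SCS")]:
    print(q, mt, "->", final_id(q, mt))
```

The two hybrid labels are kept separate because the Results distinguish hybrids defined by admixed nDNA from those defined by an nDNA/Cytb mismatch.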

Unlike the provisional classifications, the final identifications relied on all three sources of genetic data (Cytb, RyR3, and the 12 microsatellite loci). Thus, final identifications were emphasized throughout the remainder of this study and were hereafter implied, if left unspecified.

Principal coordinate analyses of the nDNA data

As a check on the ECS, SCS, and hybrid identifications, the nDNA data for each geographic sample were analyzed by principal coordinate analysis (PCoA) as implemented in GenAlEx v6.41 (Peakall and Smouse 2012). Codominant genotypic distances were first calculated for all pairs of individuals for a geographic sample with their nDNA genotypes. These distances were then used in the PCoA to generate the first and second coordinates or axes that maximally explained the individual nDNA variation within each geographic sample. The specimens of each geographic sample were plotted in their PCoA graphs with a different symbol that represented their ECS, SCS, or hybrid identifications. In this way, the ECS, SCS, and hybrid identifications for each geographic sample were checked against the pairwise nDNA distances among its individuals.
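A principal coordinate analysis of pairwise genotypic distances can be sketched with NumPy. As a stand-in for GenAlEx's codominant genotypic distance, this sketch assumes a squared distance equal to half the squared Euclidean distance between per-locus allele-count vectors (0, 1, or 2 copies of each allele); GenAlEx's exact scheme may differ. The matrix is then ordinated by classical (Gower) PCoA: double-centering, eigendecomposition, and scaling of the eigenvectors by the square roots of their eigenvalues. The four toy genotypes are hypothetical.

```python
import numpy as np


def allele_count_matrix(genotypes):
    """Rows of per-locus allele counts for each individual.

    genotypes: list of individuals, each a list of (allele1, allele2) tuples, one per locus.
    """
    n_loci = len(genotypes[0])
    alleles = [sorted({a for ind in genotypes for a in ind[locus]}) for locus in range(n_loci)]
    rows = []
    for ind in genotypes:
        row = []
        for locus, locus_alleles in enumerate(alleles):
            row.extend(ind[locus].count(a) for a in locus_alleles)
        rows.append(row)
    return np.array(rows, dtype=float)


def genotypic_distance(x):
    """Assumed squared distance: half the squared Euclidean distance between count rows."""
    diff = x[:, None, :] - x[None, :, :]
    return 0.5 * (diff ** 2).sum(axis=2)


def pcoa(d2, n_axes=2):
    """Classical (Gower) PCoA of a matrix treated as squared distances."""
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained


# Two hypothetical individuals per lineage, genotyped at two loci
genos = [[(1, 1), (1, 1)], [(1, 1), (1, 2)], [(3, 3), (4, 4)], [(3, 3), (4, 5)]]
coords, explained = pcoa(genotypic_distance(allele_count_matrix(genos)))
print(np.round(coords, 3), np.round(explained, 3))
```

The sign of each axis is arbitrary; only the relative positions of individuals, such as the separation of the two toy lineages along the first axis, are meaningful.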

Simulations of ECS, SCS, and hybrid individuals

To further assess the depth of hybridization, a series of ECS and SCS individuals, F1 and F2 hybrids, and their first- and second-generation backcrosses was simulated with HYBRIDLAB v1.0 (Nielsen et al. 2006). Starting with the observed nDNA allele frequencies, 250 ECS and 250 SCS individuals were separately simulated as the ECS and SCS classes, respectively. These two classes were then input as the ECS and SCS sources for the simulation of 250 F1 hybrids. This F1 hybrid class was then crossed to itself and to the ECS and SCS classes to generate the F2 hybrid, first-generation ECS/F1 backcross, and first-generation SCS/F1 backcross categories (each with 250 simulated individuals). In a similar manner, six additional categories of second-generation backcross hybrids were generated (each with 250 simulated individuals): (1) ECS × ECS/F1; (2) SCS × ECS/F1; (3) F1 × ECS/F1; (4) ECS × SCS/F1; (5) SCS × SCS/F1; and (6) F1 × SCS/F1. The 3000 simulated individuals for all 12 classes were combined and then analyzed as before with STRUCTURE. The qECS for the 250 simulated individuals of each class were summarized by their means and CI. They were then compared to the 95% credible intervals for the sleeper hybrids with admixed nDNA of 0.90 > qECS > 0.10 in an effort to estimate the relative frequencies of early vs. later generation hybrids.
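Before any simulation, the expected ECS ancestry of each class follows from averaging the expected ancestries of its two parents. This worked example (an illustration, not a re-implementation of HYBRIDLAB) tabulates those expectations for the 12 classes with exact fractions; the class labels mirror the text, and the dictionary layout is hypothetical.

```python
from fractions import Fraction

# Expected ECS ancestry of the two pure parental classes
expected_q = {"ECS": Fraction(1), "SCS": Fraction(0)}


def cross(parent_a, parent_b):
    """Expected ancestry of offspring: the mean of the two parents' expected ancestries."""
    return (expected_q[parent_a] + expected_q[parent_b]) / 2


expected_q["F1"] = cross("ECS", "SCS")
expected_q["F2"] = cross("F1", "F1")
expected_q["ECS/F1"] = cross("ECS", "F1")  # first-generation backcrosses
expected_q["SCS/F1"] = cross("SCS", "F1")
for backcross in ("ECS/F1", "SCS/F1"):      # six second-generation backcrosses
    for other in ("ECS", "SCS", "F1"):
        expected_q[f"{other} x {backcross}"] = cross(other, backcross)

for name, q in expected_q.items():
    print(f"{name:>12}: {q}")
```

F1 and F2 hybrids share the same expected value of 1/2, and several second-generation backcrosses share 3/8 or 5/8, so these classes can be separated only by the spread of their qECS values. This is consistent with the overlapping CI of the simulated hybrid classes reported in the Results.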

Results

Summary statistics and gene trees for Cytb and RyR3

There were 113 different haplotypes among the 739 Cytb sequences of this study and of Qiu et al. (2016) (Supplementary Fig. 1A and Supplementary Table 1). Of these 113 distinct haplotypes, 59 and 54 belonged to the ECS and SCS haplotype groups of the Cytb gene tree, respectively; these groups were reciprocally monophyletic and diagnostic for the ECS and SCS lineages. Of the 113 haplotypes, 73 were limited to our new samples (i.e., they were not present in the collections of Qiu et al.; Table 1). Of these 73 novel haplotypes, 44 and 29 were related to the ECS and SCS groups, respectively. In our combined Cytb gene tree, the ECS and SCS groups were both defined by decisive posterior probabilities of 100%. Furthermore, as expected for our XM sample (from south of the secondary contact region in the ECS; Fig. 1), all 48 of its Cytb sequences were related to the SCS lineage.

Table 1 Summary statistics of Cytb, RyR3, and nDNA genetic diversity for the six geographic samples

There were 68 different alleles among the 1478 RyR3 sequences of this study and of Qiu et al. (2016) (Supplementary Fig. 1B and Supplementary Table 2). Of these 68 distinct alleles, 27 and 41 belonged to the ECS and SCS allele groups of the RyR3 gene tree, respectively. Of the 68 alleles, 49 were limited to the 534 new specimens of this study (Table 1). Of these 49 novel alleles, 23 and 26 were related to the ECS and SCS groups of the RyR3 gene tree, respectively. In our combined RyR3 gene tree, the ECS group was defined once again by a decisive posterior probability of 100%. However, the SCS group was now supported by a moderate posterior probability of 73%. Furthermore, 3 of the 96 RyR3 sequences for the 48 diploid XM specimens were related to the ECS (not SCS) group. Although imperfect, the RyR3 support was still congruent with the decisive reciprocally monophyletic and diagnostic Cytb evidence for the ECS and SCS lineages.

Summary statistics for nDNA

The nDNA loci provided extensive genetic diversity for the identification of individuals (Table 1). This resolving power of the nDNA loci was most clearly supported for the six geographic samples by their PID of 1.4E−21 to 7.8E−18 and PID-sib of 5.0E−07 to 3.7E−06, which documented that two individuals (whether unrelated or siblings) were unlikely to share the same nDNA genotypes by chance.

Each geographic sample was characterized by several nDNA loci that deviated significantly (even after standard Bonferroni correction) from Hardy–Weinberg expectations, from the null hypothesis of FIS = 0, and from genotypic linkage equilibrium (Table 1). Importantly, the numbers of these significant departures increased northeastward in parallel for all three tests (Fig. 1). Specifically, 5, 7, 9, and 11 (out of 13) loci deviated significantly from Hardy–Weinberg expectations for XM, ND, WZ/YH, and SM/ZS, respectively. In turn, three and ten loci had significantly positive FIS for XM/ND and WZ/YH/SM/ZS, respectively. Together with the total FIS, which were also significantly positive (i.e., with CI > 0), these per-locus FIS indicated that the Hardy–Weinberg deviations of the six geographic samples were due to heterozygote deficits. Furthermore, 26, 37, and 62–70 pairs of loci (out of 78) deviated significantly from genotypic linkage equilibrium for XM, ND, and WZ/YH/SM/ZS, respectively. Thus, levels of within- and between-locus disequilibria and heterozygote deficiency were greatest within the secondary contact region.
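Two quantities behind these counts can be illustrated briefly: the sign of FIS and the size of each Bonferroni-corrected test series. The sketch below uses the simple estimator FIS = 1 − Ho/He with He = 1 − Σp_i², not the Weir and Cockerham estimator that FSTAT computes, and counts 13 per-locus Hardy–Weinberg tests and C(13, 2) = 78 pairwise linkage tests per sample. The locus names and genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def fis_simple(genotypes):
    """F_IS = 1 - Ho/He for one locus; genotypes is a list of (allele1, allele2).

    He is the plain expected heterozygosity 1 - sum(p_i^2), without the
    sample-size correction used by FSTAT (undefined for a monomorphic locus).
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    h_exp = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return 1 - h_obs / h_exp


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold for a series of n_tests comparisons."""
    return alpha / n_tests


# 13 nDNA loci per sample (12 microsatellites plus RyR3); names are placeholders
loci = [f"locus{i}" for i in range(1, 14)]
n_hw_tests = len(loci)                         # one Hardy-Weinberg test per locus
n_ld_pairs = len(list(combinations(loci, 2)))  # pairwise linkage tests
print(n_hw_tests, bonferroni_alpha(n_hw_tests))
print(n_ld_pairs, bonferroni_alpha(n_ld_pairs))

# Hypothetical locus with a heterozygote deficit, hence positive F_IS
deficit = [(1, 1)] * 4 + [(2, 2)] * 4 + [(1, 2)] * 2
print(round(fis_simple(deficit), 3))
```

Positive FIS means fewer heterozygotes than expected, the pattern produced when two differentiated lineages are pooled in one sample (a Wahlund effect). This is consistent with the larger deficits observed within the secondary contact region.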

ECS, SCS, and hybrid identifications

Although they were based on different treatments of the Cytb, RyR3, and microsatellite data, the provisional and final ECS, SCS, and hybrid identifications were similar, with only 39 of the 501 specimens with both assignments showing mismatches (Table 2 and Supplementary Table 3). For the final individual identifications, 46.3, 42.9, and 10.8% of the 501 specimens were identified as SCS, ECS, and hybrid, respectively. The percentages of SCS identifications decreased northeastward from 97.9% at XM to 93.5, 51.8, 38.3, and 18.3–18.5% at ND, WZ, YH, and SM/ZS, respectively. Correspondingly, the percentages of ECS identifications increased northeastward from 0.0% at XM to 5.2, 33.7, 46.9, and 62.5–74.1% at ND, WZ, YH, and SM/ZS, respectively. Furthermore, the hybrid percentages also increased northeastward, from 1.3–2.1% for XM and ND to 7.4–19.2% for WZ, SM, YH, and ZS. Collectively, these percentages documented that ECS, SCS, and hybrid individuals were syntopic throughout the ECS, which reconfirmed this northwest Pacific marginal sea as a secondary contact region with hybridization (Qiu et al. 2016).

Table 2 Provisional and final individual identifications for the six geographic samples

Of the 54 total hybrids, 33 were characterized by an nDNA/Cytb mismatch (i.e., qECS ≥ 0.90 with SCS-related Cytb, or qECS ≤ 0.10 with ECS-related Cytb), whereas 21 were defined by admixed nDNA with 0.90 > qECS > 0.10 (Table 2 and Fig. 2). Of the latter 21 hybrids, 13 had qECS values outside the intermediate range of 0.30–0.70 (i.e., their nDNA was closer to that of either ECS individuals with qECS ≥ 0.90 or SCS individuals with qECS ≤ 0.10).

Fig. 2

Histograms summarizing the qECS and Cytb origins for 501 individuals of the six geographic samples. Specimens are binned according to their qECS, and each bin is then shaded according to its relative frequencies of ECS- and SCS-related Cytb. Sample sizes are provided for each geographic sample in brackets.

The final ECS, SCS, and hybrid identifications for the six geographic samples agreed with the distributions of their individuals in the PCoA graphs (Supplementary Fig. 2). For example, of the eight hybrids from SM, only one (with qECS = 0.58) fell between the two nonoverlapping ECS and SCS clusters of the SM sample. The seven other hybrids from SM fell more or less within the spread of either the ECS or the SCS cluster. Of these seven, six qualified as hybrids because of their nDNA/Cytb mismatch, whereas the seventh was defined by its admixed nDNA (qECS = 0.88), which was more closely related to the nDNA of the ECS cluster. By contrast, two separate clusters were not evident for the 48 individuals from XM, because this locality lies outside the secondary contact region in the ECS (Fig. 1).

Figure 3 compares the qECS and 95% credible intervals for the 21 sleeper hybrids with admixed nDNA (i.e., 0.90 > qECS > 0.10) to the mean qECS and CI for the 12 simulated ECS, SCS, and hybrid classes. The CI for the simulated hybrid classes often overlapped (e.g., those for the F1 and F2 hybrid categories), whereas the 95% credible intervals for the 21 sleeper hybrids were relatively broad. Thus, none of these 21 hybrids was assignable to a single simulated class. Still, of particular interest was that the F1 and F2 hybrid classes were rejected for all but nine hybrids, whose 95% credible intervals overlapped with the CI of these two classes. This finding of relatively few F1 hybrids (with an upper bound of nine) was close to the count of eight hybrids with intermediate admixed nDNA of 0.70 > qECS > 0.30 (Fig. 2).

Fig. 3

Estimates of qECS for the 12 simulated ECS, SCS, and hybrid classes (a) and for the 21 sleeper hybrids with admixed nDNA of 0.90 > qECS > 0.10 (b). Estimates for the former are summarized as the means and CI (closed dots and solid bars, respectively) for the 250 simulated individuals of each category. Estimates for the latter refer to the qECS and 95% credible intervals (closed dots/solid bars and open dots/broken bars) for the 9 and 12 sleeper hybrids with admixed nDNA (Fig. 2) whose credible intervals overlap and do not overlap, respectively, with the CI for the simulated F1 and F2 hybrid classes.

Morphological trait variation

Qiu et al. (2016) reported that dark costal stripes were consistently present in ECS individuals and absent in SCS and hybrid individuals. However, we found this trait to be less consistent (i.e., the presence/absence of dark costal stripes varied within as well as among the ECS, SCS, and hybrid classes). Specifically, dark costal stripes were absent for nine of our ECS specimens from YH and one from ND (Table 2 and Supplementary Table 3). Furthermore, dark costal stripes were present for one, four, and one of our SCS specimens from YH, WZ, and ND, respectively, and for 29 of our 54 total hybrids. Similar discrepancies were also found for the provisional individual identifications that relied on the assignment method of Qiu et al. Although imperfect, the presence/absence of dark costal stripes was sufficiently consistent to distinguish between ECS and SCS individuals (present in 95.3% and absent in 97.4%, respectively). Conversely, the utility of this trait for hybrid identification was no greater than that of flipping a fair coin (i.e., absent in 46.3% and present in 53.7% of hybrids).

Also unlike Qiu et al. (2016), we compared the SL of the final ECS, SCS, and hybrid identifications on a sample-by-sample (rather than combined-sample) basis, which allowed us to assess the geographic variation of this trait (Figs. 1 and 4). Mean SL for the ECS samples ranged from 114.9 to 125.8 mm, whereas those for the SCS collections increased clinally from 117.3–129.2 to 135.6–168.8 mm for XM, ND, and WZ/YH/SM/ZS, respectively. According to their overlapping CI, SL for the SCS sample from XM (where ECS individuals were absent) was not significantly different from those for the five ECS collections. Conversely, according to their nonoverlapping CI, SL for the SCS samples from WZ, YH, SM, and ZS (where SCS and ECS individuals each co-occurred at frequencies of ≥18.3%) were all significantly greater than those for their four sympatric ECS collections. Furthermore, SL for the SCS sample from ND (where only four ECS specimens were found) was intermediate: it was not significantly different from that for the ECS collection from WZ (i.e., its northern neighbor) but was significantly less than those for the SCS samples from WZ, SM, and ZS. Thus, SL for the SCS samples varied geographically, such that they were smallest in allopatry and largest in broad sympatry with the ECS lineage.
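The comparisons above rest on whether the CI of two groups overlap. The text does not state how the CI were computed, so this sketch assumes a normal-approximation 95% CI (mean ± 1.96 × standard error) and a simple overlap check; the SL values are hypothetical. Nonoverlapping 95% CI imply a significant difference, but overlapping CI do not by themselves rule one out, so the criterion is conservative.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI (mean +/- z * standard error)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, (m - z * se, m + z * se)


def intervals_overlap(ci_a, ci_b):
    """True if two (low, high) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hypothetical SL values (mm) for sympatric SCS and ECS individuals at one locality
scs_sl = [150.0, 162.5, 141.0, 158.0, 147.5, 155.0]
ecs_sl = [118.0, 124.5, 121.0, 115.5, 127.0, 120.0]
scs_mean, scs_ci = mean_ci(scs_sl)
ecs_mean, ecs_ci = mean_ci(ecs_sl)
print(scs_mean, scs_ci, ecs_mean, ecs_ci, intervals_overlap(scs_ci, ecs_ci))
```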

Fig. 4

Mean SL and CI (dots and bars, respectively) for the SCS, ECS, and hybrid individuals of each geographic sample. Sample sizes are presented in parentheses. Open dots and broken bars refer to means and CI, respectively, that are based on sample sizes of ≤4 specimens.

A detailed assessment of hybrid SL was precluded by the small hybrid sample sizes of 1 or 8 individuals for SM, ND, and XM (Fig. 4). Still, as implied by Qiu et al. (2016), the mean SL for the hybrids in the larger samples (i.e., ZS, YH, and WZ) was intermediate to those of their syntopic ECS and SCS counterparts.

Discussion

Hybridization within the secondary contact region of the ECS

The five geographic samples from north of the Taiwan Strait (ND to ZS) each consist of ECS, SCS, and hybrid individuals (Fig. 1 and Table 2), which reinforces the conclusion that the ECS is a secondary contact region with hybridization (Qiu et al. 2016). However, hybridization is uncommon within this secondary contact region, as evidenced by the extensive within- and between-locus disequilibria and heterozygote deficits in the five geographic samples of the ECS (Table 1) and by the low counts of hybrids (particularly F1 hybrids) in the final individual identifications (Figs. 2 and 3). As F1 hybrids are the starting point for later generation hybridization, their scarcity indicates pre- and/or post-mating isolating mechanisms that reduce the frequency of successful crossbreeding between the ECS and SCS lineages. These isolating mechanisms may involve reduced hybrid viability due to genetic incompatibilities between the two lineages and/or positive assortative mating by the ECS and SCS parents based on their differences in SL and dark costal stripes (see below).

A relatively high level of mtDNA introgression between the ECS and SCS lineages is indicated by the fact that the majority of hybrids are defined by nDNA/Cytb mismatches (Fig. 2). This finding for the four-eyed sleeper corroborates the results of a large number of previous studies, documenting the relative ease with which organellar genomes (e.g., mtDNA) introgress between species and other diverged lineages relative to nDNA across a range of taxa (Coyne and Orr 2004; Bachtrog et al. 2006; Toews and Brelsford 2012; Choleva et al. 2014). Because most mtDNA genes are “house-keeping” genes, the mtDNA of one group is less likely to experience genetic incompatibilities with the nDNA of another, thereby reducing the purifying selection against its introgression into the gene pool of the latter. This commonly invoked argument provides a ready explanation for why sleeper hybrids with nDNA/Cytb mismatches are most common.

Classic tension zone model of hybridization

Hybridization within the secondary contact region of the ECS conforms to the classic tension zone model, whereby a hybrid cline is maintained by the opposing forces of dispersal into the hybrid area and selection against hybrids (Barton and Hewitt 1985; Gay et al. 2008; Rheindt and Edwards 2011; Taylor et al. 2015). For the four-eyed sleeper, alleles may be lost in the secondary contact region because of reduced hybrid viability and/or constraints imposed by positive assortative mating on the less common SCS parents. This loss is opposed by the northeastward spring and summer dispersal of SCS planktonic larvae from allopatric populations south of the Taiwan Strait (e.g., from XM) into the hybrid region (Fig. 1). The result is a hybrid cline that is maintained by a balance between allele gain via SCS dispersal from the south and allele loss via selection against hybrids and/or positive assortative mating in the ECS.

The final individual identifications for the XM sample include one later-generation hybrid whose admixed nDNA is more closely related to the nDNA of the SCS lineage (Fig. 2). Although this hybrid may be the product of in situ hybridization, we regard this possibility as unlikely because no ECS individuals have been sampled at XM (Table 2 and Qiu et al. 2016). Instead, we favor the hypothesis that this hybrid is a migrant from the ECS. It may have been carried to XM by one of the coastal eddies that flow southwestward across the Taiwan Strait during spring and summer, against the prevailing northeastward South China Sea Surface Current and Kuroshio Branch Current (Jan et al. 2002).

Reproductive character displacement as an explanation for the SL variation

In contrast to the ECS collections, SL for the SCS samples varies geographically in a manner consistent with reproductive character displacement (i.e., reinforcement; Coyne and Orr 2004; Pfennig and Pfennig 2009; Fig. 4). In allopatry (XM), or where ECS specimens make up only ~5.0% of the sample (ND), SL for the SCS samples is comparable to that for the ECS collections (Fig. 1). Conversely, where the SCS and ECS groups each make up at least 18.3% of the sample (WZ, YH, SM, and ZS), SL for the SCS samples exceeds that for their XM and ND counterparts, as well as that for the ECS collections. Under a tension zone model of hybridization, this increase in SL within the secondary contact region is attributable to selection for an exaggerated size-based phenotype. This phenotype minimizes the cross breeding of SCS individuals with ECS specimens and thus the loss of SCS alleles through hybrid inviability. According to this interpretation, the northeastward increase in the SL of the SCS samples represents reproductive character displacement. The larger size of SCS individuals within the ECS hybrid region reflects a pre-mating isolating mechanism between the SCS and ECS lineages that is still incomplete.

Alternatively, the northeastward increase in SL among the SCS samples may be related to Bergmann’s rule, which offers an ecogeographical explanation for why body size is often larger in more northern groups (Fisher et al. 2010; Vinarski 2014). Specifically, because larger bodies have a smaller surface area to volume ratio, larger individuals lose heat more slowly than smaller ones, which can be advantageous in colder, more northern environments. However, we consider this explanation less likely than reproductive character displacement for the SL increase of the SCS samples in the hybrid region (Figs. 1 and 4). Most importantly, although SL at the more northern WZ is greater than at ND and XM, its mean monthly sea temperatures are similar to theirs (World Sea Temperature 2017). From May through November, the mean monthly sea temperatures for WZ and XM (separated by a latitudinal distance of 3.0369°) differ by <1.7 °C (indeed, WZ is warmer than XM by 0.7 °C in August). From December through April, WZ is cooler than XM by 2.1–3.4 °C. In turn, the mean monthly sea temperatures for WZ and ND (separated by a latitudinal distance of 0.8500°) differ by <1.2 °C throughout the year (with WZ warmer than ND by 0.1–0.3 °C in February, July, and August). Furthermore, although the body lengths of marine fishes often vary geographically in a manner consistent with Bergmann’s rule, this trend is not universal (e.g., a positive size/latitude association is missing in the Northeast Pacific; Fisher et al. 2010). Given these similar mean monthly sea temperatures and the exceptions to Bergmann’s rule, we consider reproductive character displacement the more likely explanation for the SL increase between the allopatric and sympatric SCS samples. Still, the northeastward increase in SL may instead reflect local adaptation to one or more environmental factors other than temperature (Vinarski 2014).
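The geometric argument behind Bergmann’s rule can be made explicit with an idealized body shape. Treating the body as a sphere of radius r is purely an illustrative assumption. The same scaling holds for any shape enlarged isometrically by a factor k, because surface area A grows as k² and volume V as k³:

$$
\frac{A}{V} = \frac{4\pi r^{2}}{\tfrac{4}{3}\pi r^{3}} = \frac{3}{r},
\qquad
\frac{A(kr)}{V(kr)} = \frac{k^{2}A(r)}{k^{3}V(r)} = \frac{1}{k}\,\frac{A(r)}{V(r)}
$$

Under this idealization, doubling every linear dimension halves the surface area available for heat exchange per unit of body volume.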

Incipient speciation between the ECS and SCS lineages

Rheindt and Edwards (2011) presented a model of speciation in which the last dozen cycles of pan-continental glaciation and interglaciation are tied to alternating rounds of vicariance and of secondary contact/hybridization, respectively. According to the Rheindt/Edwards model, two lineages continue to diverge across successive Ice Ages. They accumulate genetic and phenotypic incompatibilities that increasingly disrupt their hybridization during secondary contact in the interglaciations. Eventually, the two lineages may become reproductively isolated and thereby qualify as different species under the biological species concept (Coyne and Orr 2004). For the four-eyed sleeper, Qiu et al. (2016) estimated that the ECS and SCS lineages split ~440,000 years ago during the Ice Age of Marine Isotope Stage 12. Since then, there have been three more Ice Ages in which sea levels dropped by >100 m, the Taiwan land bridge was again exposed, and the ECS and SCS lineages were again isolated. These repeated isolations led to the current accumulation of genetic and phenotypic (e.g., SL-based) incompatibilities between the lineages (Siddall et al. 2003; Railsback et al. 2015). However, their continuing low level of hybridization shows that the pre- and post-mating isolating mechanisms between the ECS and SCS lineages remain incomplete. This indicates that the two groups are still at an incipient stage of speciation.

ECS as a “natural laboratory” of hybridization

The northwest Pacific marginal seas remain a “hotspot” for the phylogeographic and evolutionary study of marine groups because of their dynamic geographic history of sea level falls and rises during the glaciations and interglaciations of the last one million years (Ni et al. 2014; Wang et al. 2015). Within this region, falling and rising sea levels have produced a recurring pattern of “vicariance, then secondary contact” among marine groups. This congruence among different groups allows replicated tests of the factors and mechanisms that drive hybridization and introgression, divergence and adaptation, gene flow, and speciation in coastal and oceanic systems (Hewitt 1988, 2011; Currat et al. 2008; Rheindt and Edwards 2011; Taylor et al. 2015).

In the case of the four-eyed sleeper, we now call for more critical experiments and tests of our current hypotheses and conclusions about its secondary contact and hybridization in the ECS. In particular, controlled laboratory crosses are now needed to directly test the viability and fertility of its hybrids and the mating preferences of its ECS and SCS parents, especially with regard to their SL and dark costal stripe differences (Zhong and Li 2002; Hong et al. 2016). Future studies should also extend to population genomics work, especially for the identification of “genomic islands of divergence,” which are most likely to include the genes that are responsible for the genetic and phenotypic incompatibilities between the ECS and SCS lineages (Nosil et al. 2009; Hemmer-Hansen et al. 2014; Pujolar et al. 2014). In these ways, our current findings and conclusions about the four-eyed sleeper can achieve their greatest utility as a case study for future research on the vicariance, secondary contact, and hybridization of marine groups in the northwest Pacific marginal seas.

Data archiving

The Cytb and RyR3 sequences for the 534 new specimens are available in GenBank under accession numbers MF002602-MF003135 and MF003136-MF004203, respectively. The Cytb and RyR3 genotypes, morphological traits, qECS, and provisional/final individual identifications for the new specimens are provided in Supplementary Table 3.