Environmental conditions favoring alternative phenotypes and function constitute divergent selection (Darwin 1859; Fisher 1930; Levins 1968; Endler 1986; Schluter 2000). For example, a given phenotype may be superior in one environment for enabling a vital function but inferior in other environments that demand alternative functions (Levins 1968; Futuyma and Moreno 1988). Natural environments typically contain a mosaic of continuous gradations or state changes over time and space (Levins 1968). Therefore we expect traits under divergent selection to occur as a mosaic of differentiation coincident with adaptive phenotype-environment functional matching (DeWitt and Scheiner 2004). When presented among multiple populations, divergent selection can create replicated patterns of phenotypic differentiation (Langerhans and DeWitt 2004; Schluter et al. 2004; Gompel and Prud’homme 2009; Ord and Summers 2015). Replicated diversification that is predictable based on environment is strong evidence for the adaptive value of replicated trait differentiation (Losos et al. 1998; Arendt and Reznick 2008; Gompel and Prud’homme 2009).

Replicated diversification across environments has been broadly observed in taxa including microbes (Wang et al. 2018), plants (Leger and Rice 2007; Pandey et al. 2015; James et al. 2021), invertebrates (Cunha et al. 2008; Eroukhmanoff et al. 2009), vertebrates (Rivera 2008; Moen et al. 2016), and particularly in fish (Day et al. 1994; Reznick et al. 1997; Langerhans and DeWitt 2004; Ruehl et al. 2011; Tobler et al. 2011; Oke et al. 2017; Greenway et al. 2020). Such stereotyped diversification within or among taxa suggests common cause and brings into question the mechanisms that produced the respective phenotype-environment matching: Does the replicated pattern involve only selection or evolution? Did canalized phenotypic differentiation or developmental plasticity evolve? Was there a single or multiple independent instances of evolution?

Replicated differentiation among populations may arise from at least four causes: (1) selection only (for nonheritable traits) (Fisher 1930; Endler 1986), (2) evolution of adaptive phenotypic plasticity (Via and Lande 1985; DeWitt and Scheiner 2004), (3) replicated canalized genetic differentiation (Waddington 1957), or (4) a single genetic differentiation with subsequent occupation (colonization or persistence) bias for later founded populations (Avise 1989; Nepokroeff and Sytsma 1996). Distinguishing mechanisms that underlie replicated differentiation may require analysis of the functional ecologies of phenotypes by environment, heritability or plasticity of phenotypes, and phyletic affinities and genetic differentiation of populations. These factors as relevant here are illustrated in Fig. 1. Ecophenotypy can be identified by replicated phenotypic syndromes in repeated association with specific environments (Fig. 1a). The functional ecology of ecophenotypes should be understood in the context of each environment to establish whether ecophenotypy represents adaptive phenotype-environment matching (DeWitt and Scheiner 2004). Common garden experiments may be used to document plasticity across environments and quantitative genetic variation within or among populations having similar or differing environments (Futuyma 2021) (Fig. 1b). Replicated canalized differentiation would present as ecophenotypies mapping haphazardly in a polyphyletic phylogram (Fig. 1c, left), whereas ecophenotypy that evolved once followed by occupation bias would present as monophyly (Fig. 1c, right). Corresponding patterns of multivariate genetic ordinations for the single and replicated phenotypic differentiation models are given in Fig. 1d. Such ordinations are appropriate for loci such as microsatellites that fit a stepwise mutation model (Slatkin 1995).

Fig. 1: Patterns of phenotypic and genotypic differentiation.

a Dendrogram showing ecophenotypy for populations a–f, each in one of two environmental states given by font color. b Phenotypes produced by sibling cohorts from families identified by subscript number and populations per letter designation. Siblings in families a1 and c2 constitutively express the ecotype associated with their native environment regardless of rearing conditions. Siblings in family p1 exhibit ecophenotypic plasticity based on rearing environment. Nonoverlap of population centroids (colored ellipses) in laboratory rearing under neutral environmental conditions indicates a likelihood of heritable ecophenotypy. c Genetic dendrograms showing example patterns for populations with replicated phenotypic differentiation through independent evolutionary events (left) or a single differentiation with colonization bias by lineage (right). d MAMOVA biplot showing expected conformations under the replicated (left) and single (right) diversification models. Vectors indicate how allelic compositions at loci G1–G8 structure the canonical genetic space.

In the present study, we examined a highly replicated pattern of predator-associated body shape diversification in mosquitofish, Gambusia affinis. The study builds on previous results evincing replicated diversification and canalized genetic differentiation of the ecophenotypes (Langerhans et al. 2004) and functional studies documenting the adaptive value of the predator-associated burst speed (PABS) ecophenotypy (Langerhans et al. 2004, 2005). We present molecular population genetic analyses of the previously studied populations and compare genetic ordination patterns to those expected under various models for mechanisms that can underlie replicated divergence.


Predator-associated Burst Speed (PABS) Ecophenotypy

Prior research documented a strong pattern of replicated ecophenotypy in several fishes, especially livebearing fishes. Prey fish from sites with pursuit predators (larger predatory fishes) consistently had small heads and trunks but enlarged caudal regions (Fig. 2). The replicated pattern was first published for 18 total populations of fish from 3 non-congeneric species (G. affinis, Brachyrhaphis rhabdophora, and Poecilia reticulata) (Langerhans and DeWitt 2004; Langerhans et al. 2004). Among those 18 populations, despite varied body plans, geographies, and site characters other than predators, the shared PABS phenotypy (that expressed across the predation gradient similarly for all species) was 60% greater than the sum of unique (species- or population-specific) predator regime effects (Langerhans and DeWitt 2004) (Fig. 2A). PABS ecophenotypy was later found in 6 more Gambusia (Langerhans et al. 2007; Langerhans and Makowicz 2009; Moody and Lozano-Vilano 2018), 2 more Brachyrhaphis (Ingley et al. 2014), another Poecilia (P. vivipara; Gomes and Montiero 2008), Phalloptychus januarius (Santi et al. 2020), and offshore collections of an African minnow (Rastrineobola argentea; Sharpe et al. 2015). The essential elements of PABS phenotypy (small head and trunk and enlarged caudal region) are also evident in other previously published cases of replicated diversification such as that in threespine stickleback (Gasterosteus aculeatus; Fig. 2C), blacktail shiner (Cyprinella lutrensis; Franssen 2011), and Galaxias platei (Milano et al. 2006). In two cases PABS phenotypy was not found where sought, both involving species found in other circumstances to exhibit the syndrome: (1) R. argentea from inshore habitats (Sharpe et al. 2015) and (2) P. vivipara from a complex habitat mosaic (Araújo et al. 2014). Araújo et al. (2014) concluded that “other selective agents (e.g., salinity, feeding) also act on the studied populations and the observed variation in body shape is a compromise between these multifarious selective agents”. Thus, with occasional exceptions, the PABS ecophenotypic paradigm has proven to be represented across a range of fish species.

Fig. 2: Predator-associated body shape in prey fishes.

A Illustration of predator-associated body shapes in composite for G. affinis, B. rhabdophora, and P. reticulata (magnified 3×). B Procrustean morph of a single G. affinis photo to canonical extremes of its species' predator regime effect (unmagnified). C Line drawing of body shapes for Gasterosteus aculeatus (unmagnified). Illustrations after Langerhans and DeWitt (2004), Langerhans et al. (2004), and Walker (1997).

Functional Ecology

The functional basis for PABS phenotypic diversification is well understood. Mosquitofish from populations sympatric with pursuit predators accelerate 16–28% faster for a given body size compared to those from populations lacking the predators (Langerhans et al. 2004, 2005; Langerhans 2009). Even within populations, fish having higher PABS canonical shape scores were commensurately faster accelerators (Langerhans et al. 2004). The positive relationship between PABS phenotypy and burst speed was also demonstrated for 3 Brachyrhaphis species (Ingley et al. 2014). The functional principles of PABS (burst acceleration conferred by a small head and trunk and enlarged caudal region) also predict burst speed in 8 species of frog larvae (Dayton et al. 2005; Johnson et al. 2008; Arendt 2010; Calsbeek and Kuchta 2011). Higher burst speeds associated with PABS ecophenotypy were confirmed to increase survivorship with predators in G. hubbsi (Langerhans 2009) and the 3 Brachyrhaphis species (Ingley et al. 2014). Burst speed is also known to predict survival with pursuit predators where brood burden is the major determinant of burst speed in G. affinis (Belk and Tuckfield 2010) and guppies (Ghalambor et al. 2004).

Developmental basis for PABS phenotypy

A genetic basis for PABS was evinced by common garden rearing in the absence of predators (Langerhans et al. 2004, 2005; Ruehl and DeWitt 2005). Offspring of G. affinis from the 6 original study populations raised in the laboratory expressed the phenotypes exhibited by field-collected fish from their ancestral populations. Common garden experiments such as this do not allow complete exclusion of maternal effects as a potential basis for phenotypic differentiation. Although maternal effects were possible, dams were kept in a predator-free environment for 4 weeks (longer than a brood cycle) prior to obtaining offspring for the rearing experiment. Phenotypic plasticity was not tested in the common garden experiments that involved only G. affinis. Yet there may be little room for plasticity to play a significant role in the ecophenotypy. The effect size (ηp²) of ecophenotypy among field-collected juvenile fish was 48.4% (Langerhans et al. 2004). The natal environment effect for lab-reared offspring was nearly as strong, explaining 42.6% of partial shape variance (Langerhans et al. 2004). Arnett and Kinnison (2017) and Arnett (2016) explicitly tested for plasticity of G. affinis and G. holbrooki in response to largemouth bass, Micropterus salmoides. Plasticity was statistically supported neither in the published paper nor in the thesis studies of multigenerational plasticity (Arnett 2016). Lack of detectable maternal-effect plasticity, coupled with the common garden results of Langerhans et al. (2004), supports the conclusion that canalized genetic differentiation predominantly underlies replicated PABS phenotypic differentiation among the 6 populations that are the subject of the present study. The common garden experiment in Langerhans et al. (2004) also demonstrated phenotypic differentiation attributable to populations nested within predator state. Thus, it appears likely that both general and PABS morphology have a heritable basis.

Materials and methods

Collection and sites

G. affinis were sampled from the six populations previously documented to exhibit replicated phenotypic, functional, and likely quantitative genetic diversification (Langerhans and DeWitt 2004; Langerhans et al. 2004, 2005). These populations will be referred to by acronyms as per the published work (AC – Autumn Circle, HE – Hensel, KT – Krenek Tap, UO – University Oaks, and RA and RB – Riverside A and B). One site among the original populations, RA, had been destroyed by anthropogenic habitat alteration. However, we discovered fish from this population in two ponds ~200 m downhill from the original site. Genetic identity of these fish was confirmed by comparison with preserved laboratory stock whose founders originated from the Riverside sites (see Stock identity below).

Fish from field sites were collected using dip nets and seines and were immediately euthanized using 2-phenoxyethanol and stored in 75% ethanol prior to processing. Laboratory stock had been treated similarly. The predator states, locations, and number of fish in the samples are given in Supplementary Table S1.

Molecular methods

DNA extraction was performed on caudal fin tissue with the PUREGENE® DNA Purification Kit (Gentra Systems, Minneapolis, MN). PCR was used to amplify twelve previously described microsatellite loci (Spencer et al. 1999; Purcell et al. 2011; Zane et al. 1999) that were found to be variable in our samples. Detail regarding these loci (Gafu2, Gafu3, Gafu5, Gafu7, Mf-6, Gaaf11, Gaaf7, Gaaf13, Gaaf15, Gaaf16, Gaaf22) is summarized in Supplementary Table S2. PCR was initiated with a 5-minute denaturation period at 95 °C followed by 40 cycles of 30 s at 95 °C, 30 s at the annealing temperature for each primer, and 1 min at 72 °C. The final extension period was 5 min at 72 °C. For five primers, a touchdown PCR protocol was used. This protocol was identical to the standard PCR protocol except that the first 20 cycles started at an annealing temperature of 65 °C, which decreased by 0.5 °C every cycle, and were followed by 20 cycles at the final annealing temperature of 55 °C. Annealing temperatures for each primer, microsatellite repeat motifs, and authorities are given in Supplementary Table S2.
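
The touchdown arithmetic described above can be checked with a short sketch. The function name and parameter names are hypothetical and only restate the values given in the text (65 °C start, 0.5 °C decrement over 20 cycles, then 20 cycles at 55 °C); this is not the thermocycler program itself.

```python
def touchdown_schedule(start_c=65.0, step_c=0.5, touchdown_cycles=20,
                       final_c=55.0, final_cycles=20):
    """Return the annealing temperature (deg C) for each PCR cycle.

    Hypothetical helper: the defaults restate the protocol in the text.
    """
    # Touchdown phase: temperature drops by step_c after every cycle.
    ramp = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Plateau phase: remaining cycles run at the final annealing temperature.
    plateau = [final_c] * final_cycles
    return ramp + plateau


schedule = touchdown_schedule()
print(len(schedule), schedule[0], schedule[19], schedule[20])
```

Under these values the last touchdown cycle anneals at 55.5 °C, so the plateau at 55 °C continues the same 0.5 °C step.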

PCR products were run on agarose gels to confirm amplification. If amplification was confirmed, PCR products were visualized in an ABI 377 automated sequencer with the Genescan®-400HD ROX Size Standard (Applied Biosystems) for sizing. Allele sizing and scoring were performed using Applied Biosystems’ Genescan® 3.1.2 and Genotyper® version 2.5 software, and 95% coverage was attained. Locus scoring was examined using Microchecker and FreeNA (Van Oosterhout et al. 2006; Chapuis and Estoup 2007) to test for null alleles and potential scoring errors due to stuttering.

Missing data and multilocus genotype methods

A fraction of alleles (6% of total) were unresolved by our methods. Most standard analytics in population genetics treat loci separately, so missing values did not require full-case deletion of data. For multivariate analyses such as discriminant analysis, multivariate clustering, and multivariate analysis of molecular variance as described below, but not for STRUCTURE analysis, we imputed missing allele scores using maximum likelihood in JMP (v. 13). This inserted leverage-minimizing values in place of missing values to obviate the need for full-case deletion of multilocus data. Thus all informative loci contributed to analytical results. Since imputed data did not represent actual locus assessments, allele counts did not include them. For example, an imputed value was not considered to be a ‘private allele’ for population genetic inference.

Alleles as assayed did not have identity to individual chromosomes, so each case (row of data on repeats for all loci) was effectively a multilocus ‘pseudohaplotype’. This designation was only meaningful for multivariate analytics. For MAMOVA we included a main effect of individual specimens nested in population to treat each pair of pseudohaplotypes per individual fish as a diplotype for statistical accounting. For discriminant analyses we combined posterior classification probabilities for each pair of pseudohaplotypes to yield diplotype predictions.
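
The text does not state how the two posterior probability vectors were combined, so the following is only one plausible sketch: it assumes a product rule (treating the two pseudohaplotype rows as independent evidence) followed by renormalization. The function name and the example probabilities are hypothetical.

```python
def combine_posteriors(p_first, p_second):
    """Combine two pseudohaplotype posterior vectors into one diplotype vector.

    Assumption (not from the source): multiply the two vectors elementwise
    and renormalize so the result sums to 1.
    """
    joint = [a * b for a, b in zip(p_first, p_second)]
    total = sum(joint)
    if total == 0.0:
        # No population is supported by both rows; fall back to the average.
        joint = [(a + b) / 2 for a, b in zip(p_first, p_second)]
        total = sum(joint)
    return [x / total for x in joint]


populations = ["AC", "HE", "KT", "RA", "RB", "UO"]
# Hypothetical posterior probabilities for the two rows of one fish.
row_1 = [0.70, 0.10, 0.05, 0.05, 0.05, 0.05]
row_2 = [0.40, 0.40, 0.05, 0.05, 0.05, 0.05]

diplotype = combine_posteriors(row_1, row_2)
best = max(range(len(populations)), key=diplotype.__getitem__)
predicted = populations[best]
print(predicted, round(diplotype[best], 3))
```

Averaging the two vectors would be an equally defensible choice; the product rule simply gives more weight to populations supported by both rows.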

Stock identity

To assess the genetic affinity of fish from the two ponds downhill from the original RA site, a linear discriminant ordination was constructed. The two putative RA populations and laboratory stocks were initially treated as separate samples to define proximity of multivariate least squares means for the putative RA samples and the RA lab sample. Based on the results, a second ordination was constructed pooling laboratory and known or inferred field stock to examine ordination strength for improvement.

The initial discriminant ordination of multilocus data treating the laboratory stocks, putative RA pond samples, and other field samples separately demonstrated genetic affinity of RA and putative RA samples (Fig. 3a). Ninety-five percent confidence ellipses of the multivariate least squares means for the RA laboratory stock and the putative RA field samples overlapped considerably in an otherwise unoccupied region of canonical space. No diplotypes from RA or the pond samples overlapped any of the 95 other diplotypes in the ordination. Thus, the pond sites and laboratory RA stock were collectively deemed as a unified RA sample. Likewise, the laboratory and field stock from RB were pooled based on tight proximity of their multivariate least squares means. The discriminant ordination conducted after pooling demonstrated a clarified population ordination involving small confidence ellipses with no multidimensional overlap (Fig. 3b). The increase in ordination strength further supported the stock identities as assigned.

Fig. 3: Linear discriminant ordination of multilocus data by sample origin.
figure 3

a Initial ordination treating laboratory stock and putative RA samples as separate entities. b Ordination after pooling laboratory stock and known or inferred samples. Diplotype scores are given for linear discriminant axis (LDA) 1 below the graphs.

Population genetic analyses

Population genetic structure was assessed by considering descriptive metrics of genetic variation, genotypic disequilibrium within and among individuals and populations, and genetic distances among populations. We calculated numbers of private and shared alleles, allelic diversity and richness, expected and observed heterozygosities, and disequilibria within individuals (FIS) and among populations (FST). Loci found to be out of Hardy–Weinberg equilibrium were examined with the program Bottleneck (Piry et al. 1999) stipulating the stepwise-mutation model per Slatkin (1995), which is the preferred model for microsatellite polymorphism (Putman and Carbone 2014; Jiang et al. 2021). Bottleneck output and patterns of allele diversity by frequency were used to inform on whether founder events or recent reduction in effective population size were likely causes of the disequilibrium.
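
For readers unfamiliar with these metrics, the sketch below computes observed heterozygosity, expected heterozygosity, and FIS for a single locus in a single population using the textbook definition FIS = 1 − Ho/He. The genotypes are invented, the function name is hypothetical, and the software used in the study may apply a different (e.g., sample-size corrected) estimator.

```python
from collections import Counter


def locus_fis(genotypes):
    """Return (Ho, He, FIS) for one locus from a list of (allele, allele) pairs.

    Uses the uncorrected expected heterozygosity, He = 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies pooled over both gene copies of every individual.
    counts = Counter(allele for pair in genotypes for allele in pair)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    # Positive FIS indicates a heterozygote deficit (excess homozygosity).
    return h_obs, h_exp, 1.0 - h_obs / h_exp


# Hypothetical repeat-number genotypes for four fish at one locus.
demo = [(10, 10), (10, 12), (12, 12), (10, 10)]
h_obs, h_exp, fis = locus_fis(demo)
print(h_obs, h_exp, round(fis, 3))
```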

Among-population variation was characterized with the metrics RST, FST, and unbiased Nei’s genetic distance (Nei 1978). Genetic distances were calculated using the software GenAlEx (v. 6.0) stipulating a stepwise-mutation model (Peakall and Smouse 2006). Genetic distance matrices were compared using matrix correlations (Mantel tests) with phenotypic and geographic distance matrices. Phenotypic distance matrices were calculated using least squares means from geometric morphometric MANCOVAs following Langerhans and DeWitt (2004). A first model treated shape as the dependent data suite and populations of origin as predictors with centroid size as a covariate. A second version was calculated including predator state of the populations, in which case population identity was nested in the predator state effect. This second model provided separate least squares means morphometry specifically for the PABS effect, so distance matrix correlations could be run for morphology in general and for PABS morphology specifically. Two geographic distance matrices were calculated, the first using GIS coordinates for actual sample locations (‘as the crow flies’ distances) and the second measuring distances through the nearest likely hydrological connections between populations (‘flow-wise’ distances). Distance matrices are given in Supplementary Table S3.
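
A matrix correlation of the kind described above can be sketched as a basic Mantel test: correlate the upper triangles of two distance matrices, then permute the rows and columns of one matrix together to build a null distribution. The function names and the two small matrices are hypothetical, and the permutation count is arbitrary; this is an illustration, not the analysis archived with the study.

```python
import random
from itertools import combinations


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle values after relabeling rows/columns by order."""
    pairs = combinations(range(len(order)), 2)
    return [matrix[order[i]][order[j]] for i, j in pairs]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(dist_a, dist_b, n_perm=999, seed=0):
    """Return (matrix correlation, two-tailed permutation P value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    a = upper_triangle(dist_a, identity)
    r_obs = pearson(a, upper_triangle(dist_b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute populations, keeping rows and columns linked
        if abs(pearson(a, upper_triangle(dist_b, order))) >= abs(r_obs):
            hits += 1
    return r_obs, hits / (n_perm + 1)


# Hypothetical distances among four populations.
geo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
gen = [[0, 0.1, 0.3, 0.2], [0.1, 0, 0.1, 0.3], [0.3, 0.1, 0, 0.1], [0.2, 0.3, 0.1, 0]]
r_mat, p_value = mantel(gen, geo)
print(round(r_mat, 2), p_value)
```

With only six populations there are 720 possible relabelings, so permutation P values from such a test are necessarily coarse.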

Multivariate (multilocus) population genetic differentiation was tested with STRUCTURE software (v. 2.3.4), which examines the likelihood of various numbers of divisions among populations (Pritchard et al. 2000). The program was run using 5 × 10⁶ repetitions, a burn-in of 10⁶ repetitions, and an initial α of 1. Six iterations were conducted with K ranging from 1 to 8. STRUCTURE Harvester was then run to scrutinize outcomes at the various K. STRUCTURE inferences were confirmed and characterized using multivariate analysis of molecular variance (MAMOVA). MAMOVA treated loci collectively and examined covariance of alleles across loci as part of the analytical structure, as opposed to AMOVA, which pools results from univariate analyses of each locus (Nievergelt et al. 2007). The MAMOVA assumes a stepwise data relationship for differing repeat numbers within loci. For example, a difference of 20 repeats is twice as great a linear genetic distance as one of 10 repeats. Thus, population differences were statistically tested and visualized in a canonical space (among group variation scaled to within group variation). MAMOVA tested for population differentiation as a main effect with individual fish identities (pairs of haplotypes in terms of the data structure) nested in populations. The nested individual effect tested for genetic disequilibrium (multivariate FIS). Population differentiation and its genetic structure were assessed by effect sizes (as ηp²), 95% confidence ellipses for canonical centroids (multivariate least squares means), and locus-specific vectors describing the canonical structure of the multivariate genetic space. The extreme position of one of the populations in the canonical space motivated a secondary MAMOVA with that population removed to ensure that results of the primary analysis were robust. 
A tertiary MAMOVA was run for the sole purpose of providing an effect strength for a predation regime main effect to contrast with that for populations nested in predation regime. MAMOVAs were calculated in JMP Pro (v. 15) and precisely replicated in Excel (v. 2109) for transparency of detail (Supplemental File S2).

The ordination of population canonical centroids was tested for predator state bimodality using a data dispersion analysis (DeWitt et al. 2021). The dispersion analysis tested for multivariate gradient structure in the five canonical dimensions (single adaptation model; illustrated in two dimensions in Fig. 1d). Though not a planned focus, this analysis also tested for overdispersion and hotspot clumping of population types in the genetic space (Supplementary Fig. S1). To conduct these tests we calculated a cross product, v′Pv, in which the matrix of centroid proximities (the hollow inverse distance matrix P) was pre- and post-multiplied by the centered vector of effects-coded population states, v. The position of the cross product in a null distribution of those from 9999 replicated casewise randomizations was used to define an effect size (correlation equivalent) and two-tailed probability of random dispersion. For a 5% type I error rate, if the actual cross product was in the bottom or top 2.5% of values, we rejected the null hypothesis of random dispersion. Cross products in the lower tail indicated overdispersion and those in the upper tail indicated gradient or hotspot clumping. It was not necessary to apply methods to determine which of the latter two effects was occurring (e.g., by correlogram per Dale and Fortin 2014, or a further specialized metric per DeWitt et al. 2021). This analysis is included in Supplemental File S2.
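
The cross-product statistic and its randomization test can be sketched as follows. This is a minimal reading of the description above, assuming Euclidean distances between centroids, +1/−1 effects coding of predator state, and shuffling of state labels among populations as the randomization; the function names and the example centroids are hypothetical, and the effect-size calculation is omitted because the text does not specify it.

```python
import math
import random


def cross_product(prox, v):
    """Quadratic form v'Pv for a square proximity matrix and a state vector."""
    n = len(v)
    return sum(v[i] * prox[i][j] * v[j] for i in range(n) for j in range(n))


def dispersion_test(centroids, states, n_perm=9999, seed=0):
    """Return (observed v'Pv, two-tailed permutation P value)."""
    n = len(centroids)
    # Hollow inverse-distance matrix: zeros on the diagonal, 1/d elsewhere.
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                prox[i][j] = 1.0 / math.dist(centroids[i], centroids[j])
    # Center the effects-coded states so the vector sums to zero.
    mean = sum(states) / n
    v = [s - mean for s in states]
    observed = cross_product(prox, v)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = v[:]
        rng.shuffle(shuffled)  # reassign states to populations at random
        null.append(cross_product(prox, shuffled))
    lower = (1 + sum(x <= observed for x in null)) / (n_perm + 1)
    upper = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, min(1.0, 2 * min(lower, upper))


# Hypothetical two-dimensional centroids: like states sit close together.
demo_centroids = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_states = [1, 1, 1, -1, -1, -1]
observed, p_two_tailed = dispersion_test(demo_centroids, demo_states, n_perm=999)
print(round(observed, 3), p_two_tailed)
```

Because like-state populations contribute positive products and unlike-state populations negative ones, a large positive value arises when populations sharing a state are close together, matching the upper-tail interpretation given in the text.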

Two covariance cluster analyses were run using MAMOVA output to construct phylograms for the six populations. The first used regular least squares means for the six populations and the second used canonical centroids. Both were constructed as neighbor-joining dendrograms using multivariate distance between population summary metrics. Ecophenotypy was mapped for each branch to facilitate visual assessment of tendencies toward monophyly or polyphyly. Cluster analyses were conducted with JMP Pro (v. 15).

Results

Population genetic descriptions

A high degree of polymorphism was noted in the focal populations with 209 (7.2%) distinct alleles identified out of 2886 scored for 127 fish. There was no difference in the percentage of unique alleles among populations or population predation regimes. Unique allele percentages ranged 17–22% by population: 16.8, 17.5, and 21.6% for populations lacking predators (HE, RA, AC) and 20.3, 19.1, and 18.8% for populations with predators (KT, RB, UO). A marked genetic difference among the populations was a high percentage of private alleles, 38.5%, in the RA population compared to 3.1–8.5% for all others. The high allelic endemism of RA relative to other populations was associated with a greater allelic richness metric (9.1 versus 5–7 for other populations; Table 1). These patterns taken together coincide with the strong separation of RA on the major discriminant axis (Fig. 3b).

All populations were similarly high in FIS, indicating excess homozygosity which was evident for most loci in all populations (Table 1). Only one locus (Gaaf22) in one population (RA) demonstrated heterozygote excess (81% in excess of expected; P < 0.01). This locus did not exhibit a trend for excess heterozygosity in any of the other populations. Despite the high genetic diversity noted in these populations, FST values were 0.07 ± 0.02 SD which is barely ‘moderate’ based on the qualitative characterizations put forth by Hartl and Clark (1997). Beyond the slightly elevated allelic richness and greatly elevated private allele number in RA, the populations were similar.

Table 1 Summary population genetic statistics.

Bottleneck analysis under the stepwise-mutation model did not suggest recent bottlenecking in the collective of populations or in the population for which we expected bottlenecks to be most likely—the RA population. There was strong genetic diversity in all populations and the structure of the diversity did not suggest greater loss of uncommon alleles as expected when bottlenecks occur (Maruyama and Fuerst 1985). Rather, these data trended the opposite: low-frequency alleles were exponentially more common than those with higher frequency (Fig. 4a). This was also true of the population with the greatest habitat variability, RA, when assessed separately (Fig. 4b).

Fig. 4: Proportional allele counts as a function of their frequency.

Populations had a high percentage of low-frequency alleles. a For all populations and loci pooled. b For RA only.

Population ordinations

STRUCTURE analysis revealed the strongest model of differentiation to be for K = 2, evincing at least two major genetic identities: RA versus all others (Fig. 5a). The strength of this model was coincident with the starkly high number of private alleles (49 of 127 alleles) found in the RA population. Yet the populations were all significantly and strongly distinct from one another as evidenced by the preliminary discriminant analysis above (Fig. 3), the MAMOVA below, and the clean separation of populations by STRUCTURE at K = 6 (P < 10⁻¹⁰⁰; Fig. 5b).

Fig. 5: STRUCTURE plot showing genetic differentiation of the populations studied.

a The model for 2 populations. This model had the greatest leverage. b Results presuming 6 populations. Sample sizes are as given in Table S1.

MAMOVA demonstrated that the populations were highly distinct (ηp² = 0.80; Supplementary Table S3a) and that unique aspects of RA were the major elements of genetic differentiation among populations. Subsequent canonical axes separated other populations (Supplemental File S2). The two major axes accounted for 84% of canonical variance and the ordination in that space was as illustrated in Fig. 6. This ordination did not visually suggest bimodal structure or other patterns of centroid dispersion by predator state of the population as illustrated in Fig. 1d. The dispersion analysis of centroids taking into account all five dimensions of inter-population differentiation suggested no deviation from random molecular population differentiation (r = 0.11, P = 0.8). The full annotated analysis was archived in Supplemental File S2. A large amount of variation (ηp² = 0.83) was partitioned among individuals in concurrence with the observations of high polymorphism within the populations coupled with excess homozygosity (see ‘Population genetic descriptions’).

Fig. 6: Major axes of the MAMOVA ordination of populations.

Circles give centroid (multivariate least squares means) location in the canonical space with their size indicating 95% confidence regions. Color indicates population predator state (orange, with predators; blue, without predators). Biplot rays show allele contributions to genetic discrimination of the six populations studied. For example, alleles at locus Gaaf11 are strongly differentiated across RA and RB populations.

Major allelic distinctions between RA and the other populations involved the loci Gafu2, Mf-6, Gafu5, Gafu7, and Gaaf11. Remaining populations were most separated by allelic variation in Gafu7, Gafu2, Gaaf15, and Gaaf16. Population centroids and locus loadings (as biplot rays) are illustrated in Fig. 6. Ordination without including RA produced similar results in terms of strong population differentiation with no suggestion of bimodality between populations of unlike predator environment/ecotype (Supplementary Table S3b; Supplementary Fig. S2).

Cluster analysis of the MAMOVA regular least squares means and canonical centroids both yielded phylograms with no suggestion of monophyly of PABS ecophenotypes (Fig. 7). Ordinary locus-specific least squares means yielded a phylogram requiring two phenotype transitions and the canonical data required three transitions which is the maximum possible to parse differentiation of 6 independent populations split half and half for two ecophenotypes (contrast patterns of Figs. 1c and 7).
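
The transition counts discussed above can be illustrated with Fitch parsimony, which returns the minimum number of state changes needed to explain tip states on a given tree. The two tree shapes below are hypothetical (they are not the phylograms of Fig. 7), the function name is an assumption, and predator states follow the population groupings given earlier in the text.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for a binary tree.

    tree is a population name (tip) or a pair of subtrees (internal node).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        # Descendants agree on at least one state: no change needed here.
        return shared, changes
    # Descendants disagree: one transition is required along this split.
    return left_set | right_set, changes + 1


# Predator state of each population (+ with predators, - without).
predator_state = {"KT": "+", "RB": "+", "UO": "+", "HE": "-", "RA": "-", "AC": "-"}

# Hypothetical topologies for illustration only.
monophyletic = ((("KT", "RB"), "UO"), (("HE", "RA"), "AC"))
interleaved = ((("KT", "HE"), ("RB", "RA")), ("UO", "AC"))

single_origin = fitch(monophyletic, predator_state)[1]
replicated = fitch(interleaved, predator_state)[1]
print(single_origin, replicated)
```

The monophyletic arrangement needs a single transition, whereas the fully interleaved arrangement needs three, which is the ceiling for six tips split evenly between two states.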

Fig. 7: Cluster ordinations of populations.

a Using regular multi-locus least squares means (LSM) from MAMOVA or b multivariate centroids. Ecophenotypes/predator state of populations are given with color (orange – predators present; blue – predators absent). Circles denote inferred evolutionary origins of predator-associated phenotypy.

Distance matrix comparisons revealed high correlations (≥0.9) among the three genetic distance measures (Supplementary Tables S4 and S5). Unbiased Nei distances were well correlated with both of the others (FST, RST), more so than those two were with each other. However, RST had a clear theoretical basis for being a preferred measure for microsatellite variation among populations (Putman and Carbone 2014), so we focused on its correlations with the geographic and phenotypic distance matrices. All matrix correlations are given in Supplementary Table S5. The ‘flow-wise’ matrix of distances yielded higher correlations with the other matrices and was correlated by 0.95 with the ‘crow flies’ distance matrix. The flow-wise metric also has a stronger theoretical basis for causal connection to genetic distances, so it was used as the focal geographic distance matrix. The RST distance matrix was correlated moderately (rmat = 0.3) with the flow-wise geographic distance matrix, indicating recognizable but not strong isolation by distance. RST distances and general phenotypic distances were strongly related (rmat = 0.7) but this relationship apparently did not involve PABS morphology. The matrix correlation between RST and PABS morphological distance among populations was rmat = −0.09. Thus, there was no indication from matrix correlations that genetic distances were greater for populations having unlike PABS ecophenotypy.

Discussion

The only supported conceptual model for the replicated divergence of body shape in G. affinis was that of multiple instances of adaptive evolution. Results from the present study demonstrated strong molecular genetic differentiation of populations. Degree of differentiation weakly suggested genetic isolation by distance, and one population (RA) was genetically exceptional (Figs. 3, 5, and 6) but not exceptional in PABS morphology (Langerhans et al. 2004). Genetic and phenotypic distance matrices were strongly correlated but this effect was not attributable to PABS phenotypic variation. The genetic cluster analysis of populations produced dendrograms widely divergent from the monophyly expected for the single-evolution model (Fig. 7) and MAMOVA did not reveal bimodal molecular genetic structure of populations by predation regime (Fig. 6). The evidence is thus coincident in support of a hypothesis of multiple evolutionary transitions from + to – PABS phenotypy or vice versa. Thus, we rule out the alternative hypothesis of a single divergence followed by differential colonization of or persistence in the habitats.

Were the single adaptation model correct, one would expect to see monophyly and a bimodal canonical genetic structure (Fig. 1c, d). No support for this model was found among any of the diverse methods used to test the opposing conceptual paradigms. The likelihood of multiple evolutionary events is also supported for ecophenotypies at expanded taxonomic scale. Examples of replicated differentiation across environmental states or gradients are well known from many taxa (Schluter 2000; Arendt and Reznick 2008; Blount et al. 2018; Waters and McCulloch 2021). The independence of replicated adaptations is in some cases well documented (Schluter 2000; Losos 2009; Grant and Grant 2014). For example, two other livebearers (Poecilia sulfuraria and P. mexicana) independently differentiated into sulfide tolerant and nontolerant (ancestral) lineages with convergent morphological and physiological phenotypes (Tobler et al. 2008, 2011). Similarly, independently evolved replicated ecophenotypes within species arose for 22 populations of a ragwort plant species occupying either dune or headland habitats (James et al. 2021). In sticklebacks, spatially replicated differentiation into benthic and limnetic ecomorphs has progressed to formation of ecophenotypic species pairs (Schluter et al. 2004). In other systems, both colonization bias and replicated local adaptation are known. In the Hawaiian silversword radiation, independent evolution of woodiness in island-colonizing herbaceous forms is known for at least four species (Baldwin 1997), but in other cases woodiness in island species is attributable to differential colonization (Givnish et al. 1996; Nepokroeff and Sytsma 1996).

Because PABS ecophenotypy is known across so many species within livebearers and for greater breadth still among orders (Langerhans and DeWitt 2004; Milano et al. 2006; Gomes and Montiero 2008; Langerhans and Makowicz 2009; Ingley et al. 2014; Moody and Lozano-Vilano 2018; Santi et al. 2020) a single-evolution scenario underlying the phylogenetic breadth of PABS ecophenotypy would go back to common ancestry over 150 million years ago. It would require lineage sorting undeterred by chance or mixis in each lineage displaying the ecophenotypy for millennia, which seems implausible. Given the spatial and temporal mosaicism of predation risk for small fishes, the strength and clarity of PABS functional ecology (Langerhans et al. 2004; Johnson et al. 2008; Langerhans 2009; Ingley et al. 2014), the likelihood of a heritable basis for this morphology (Langerhans et al. 2004; Ruehl and DeWitt 2005), and the documented presence of reduced mate preferences for unlike ecophenotypes (Langerhans et al. 2005, 2007), it would be surprising if PABS phenotypy lacked the evolutionary lability to respond to fine-grained environmental mosaicism in predation regime and therefore regularly create replicated ecophenotypic differentiation.

Mechanistic insight into evolutionary processes often can be gleaned by contrasting FST (and related measures) with QST (Reed and Frankham 2001; Chenoweth and Blows 2008). QST is an analog of FST denoting differences among populations in quantitative traits influenced by additive allelic effects summed over potentially many loci (Spitze 1993). Approximations of QST in this system can be derived from the rearing study effect sizes. Specifically, partial eta squared (ηp2) for population differentiation among predator states was 0.43 compared to 0.33 for populations with like predator states (values constructed from Table 1 in Langerhans et al. 2004). This implies high QST when comparing populations of unlike predator type but modest QST otherwise: QST (predation regime) > QST (populations within predation regime). Effect sizes for RST (and the other genetic distances) estimated in the present study were similar across populations of unlike or like predator types (Supplementary Table S4). The genetic effect size among predator regimes was 0.79 and that among populations within predator states was 0.81. Thus: FST (predation regime) ≤ FST (populations within predation regime). Such contrasting patterns for trait and molecular genetics are not uncommon (Reed and Frankham 2001; Chenoweth and Blows 2008; Holsinger and Weir 2009). They reflect that there are two types of population genetic differentiation that generally operate independently and respond differently to mechanisms of genetic change in natural populations (Reed and Frankham 2001). Divergent selection drives differentiation of heritable genetic effects contributing to relevant trait functions (Arnold 1983; Spitze 1993; Chenoweth and Blows 2008), increasing QST among unlike populations. The process of selection affects those genes contributing to trait expression relevant to the ecological gradient at hand, leaving the vast majority of DNA under the influence of nondirected differentiation (Waters and McCulloch 2021).
Drift and low migration drive differentiation in neutral genetic markers, increasing FST among all populations regardless of selective state. Population oscillations, such as those common to specialists of stochastic environments like mosquitofish (Matthews and Marsh-Matthews 2011), may have little impact on QST even when they create cyclical bottlenecks (Bryant and Meffert 1993), but they may profoundly reduce molecular genetic variation within populations and thus increase FST (Hartl and Clark 1997). The RA population had the most erratic hydrological environment (TJD, personal observation), and this may explain its distant position in the genetic ordinations (Figs. 3 and 6) despite its lack of exceptional position in the phenotypic ordination (Langerhans et al. 2004). Although mechanisms such as selective sweeps can affect both types of genetic variation similarly, empirical evidence suggests predominantly independent patterns of differentiation for quantitative trait and molecular genetic differentiation (Reed and Frankham 2001).
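The contrast described above can be made concrete with a short sketch. The first function implements the standard variance-component definition of QST for a diploid, outbreeding species (Spitze 1993): additive genetic variance among populations divided by the sum of that variance and twice the additive variance within populations. The present study approximated QST from effect sizes and did not estimate variance components, so the function is included only to show what QST measures. The ratio computed afterward is an illustrative summary of the effect sizes reported in the text and is not a statistic used in the study.

```python
def qst(var_among, var_within):
    """Q_ST for a diploid, outbreeding species:
    var_among / (var_among + 2 * var_within),
    where both arguments are additive genetic variance components."""
    if var_among < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_among + 2.0 * var_within
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return var_among / total


# Effect sizes reported in the text (trait: partial eta squared from the
# rearing study; marker: genetic effect sizes from the present study).
trait_effect = {"among regimes": 0.43, "within regime": 0.33}
marker_effect = {"among regimes": 0.79, "within regime": 0.81}

# Illustrative ratio (an assumption of this sketch): values above 1 mean
# differentiation is concentrated between unlike predation regimes.
trait_ratio = trait_effect["among regimes"] / trait_effect["within regime"]
marker_ratio = marker_effect["among regimes"] / marker_effect["within regime"]

print(f"trait ratio  = {trait_ratio:.2f}")
print(f"marker ratio = {marker_ratio:.2f}")
```

A trait ratio above one alongside a marker ratio at or below one is the pattern expected when divergent selection acts on the trait while drift and limited migration act on neutral markers irrespective of predation regime.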

Although heightened QST may originate with divergent natural selection, both it and FST will be impacted if reinforcement emerges (Dobzhansky 1955; Schluter 2000). Reinforcement is the secondary evolution of barriers to mixis between populations adapted to different environments (Dobzhansky 1955; Schluter 2000). Pre-mating isolation by reinforcement is already known in Gambusia due to female mate choice based on visual characteristics of males (Langerhans et al. 2005, 2007). Females spend the most time near, and most often overtly approach, videographic imagery of males from their own population, followed by that of males from a like predator state, and show the least preference for imagery of males from populations of unlike predator state. Consider the impacts secondary reinforcement would have for the two focal mechanisms of phylogenesis illustrated in Fig. 1c: (1) If divergent selection produced a singular differentiation followed by colonization or persistence bias by habitat, drift would act upon an initial two-branch phylogram with umbelliform polytomies at the branch tips, lengthening terminal branches but having little or negative relative impact on the long branches separating the polytomies. However, the divide between polytomies may increase due to differential mortality of migrants from unlike habitats (e.g., as in B. roseni and B. terrabensis; Ingley and Johnson 2016) and reduced fitness of hybrid offspring (Day et al. 1994; Richards et al. 2016). Once reinforcement emerged, however, asymmetric branch lengthening by environmental state (deepening between unlike populations) would pervade for FST and make monophyly more apparent. QST at this stage would only increase if adaptation had not yet reached its apex.
(2) For the multiple evolutionary origins scenario, whether reinforcement played a role in differentially extending molecular genetic divides between lineages of unlike type would depend on whether the mechanism of pre- or post-mating isolation that evolved was general (biased against any immigrants) or specific (biased against migrants from unlike populations). As described earlier, Gambusia exhibits both general and specific premating isolation (Langerhans et al. 2005, 2007). Phylogenesis in each model would thus reflect a progressively shifting equilibrium of among-population and among-population-within-habitat branch extensions. With replicated evolutionary diversifications, phylogenesis would generally retain polyphyly, with a trend toward deeper branches among clades of unlike type. The polyphyly observed in the present study therefore supports a multiple origins scenario of ecophenotypy.
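The distinction between the two scenarios rests on whether populations sharing an ecophenotype form a single clade. The following minimal sketch shows that check on a rooted tree written as nested tuples. The population labels (P1 to P3 for one predator state, N1 to N3 for the other) and both tree topologies are hypothetical and do not represent the dendrograms of the present study. Applying the idea to an unrooted dendrogram would require additional care.

```python
def leaves(tree):
    """Return the set of tip labels below a node (a tip is a string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the tip set of every subtree, including the whole tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some subtree contains exactly the tips in `group`."""
    return frozenset(group) in set(clades(tree))


# Hypothetical single-origin topology: each ecophenotype forms one clade.
single_origin = ((("P1", "P2"), "P3"), (("N1", "N2"), "N3"))

# Hypothetical replicated-origin topology: ecophenotypes are interspersed.
replicated = (("P1", "N1"), (("P2", "N2"), ("P3", "N3")))

predator_group = {"P1", "P2", "P3"}
print(is_monophyletic(single_origin, predator_group))
print(is_monophyletic(replicated, predator_group))
```

Under the single-origin topology the group is recovered as a clade, whereas under the replicated-origin topology it is not, which corresponds to the polyphyly discussed above.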

At a smaller scale, the descriptive population genetic information demonstrated high levels of genetic variation within populations. Average allele diversity per population was 7% (7 types per 100 scored) with allelic richness of 6.1. These values are typical for freshwater fish but are much higher than is typical for marine fish (Ward et al. 1994; DeWoody and Avise 2000). High genetic variation within populations suggests considerable migration or persistently large population size. It also suggests a likelihood of equilibrium or excess heterozygosity. Yet there was pronounced heterozygote deficiency in all populations for most loci. The proportion of heterozygous diplotypes was 48%—significantly below the expectation of 71%. Such deficiency is also common for freshwater fishes (Ward et al. 1994; DeWoody and Avise 2000) and likely relates to environmental or other drivers of assortative mating. In an extreme case, Vázquez-Domínguez et al. (2009) found that G. yucatana and P. orri in Quintana Roo, Mexico demonstrate heterozygosity of 0.11 (P. orri) and 0.25 (G. yucatana) despite expected values in parity with those of the present study (~71% for both species). The habitats surveyed were small cenotes (karstic solution lakes) and shallow wetlands. The wetlands and shallow cenotes embedded in them undergo seasonal drying which creates pockets of isolated refugia. In these refugia fish mate and breed until the rainy season reconnects the habitat mosaic (TJD personal observation; Kobza et al. 2004; Zambrano et al. 2006; Vázquez-Domínguez et al. 2009; Loera-Pérez et al. 2020). Thus, reproduction largely occurs in small, isolated groups, which would increase inbreeding (FIS) at the metapopulation level prior to a flush of homozygotes to the larger population when contiguous. Such a system through export of locally inbred genotypes to the broader system would account for both high allelic diversity and excess homozygosity. 
Similar dynamics may occur to lesser degree in the six populations reported upon here. However, other mechanisms producing assortative mating are known for G. affinis and G. holbrooki. Mosquitofish are widely observed to interact and move in cohesive groups (Martin 1975; Pyke 2005; Pazmino et al. 2020), especially during the early part of the breeding season (Maglio and Rosen 1969). Due to the prolonged sperm storage and superfetation common in livebearers including Gambusia (Thibault and Schultz 1978; Constantz 1989; Haynes 1993), effects of early inbreeding could persist for multiple broods. Segregation into groups thus may be functionally analogous to physically-imposed metapopulation structure.
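The magnitude of the heterozygote deficiency discussed above can be expressed as the inbreeding coefficient FIS = 1 − Ho/He, where Ho is observed and He is expected heterozygosity. The sketch below applies this standard formula to the values quoted in the text. Treating each reported percentage as a single pooled value across loci and populations is a simplifying assumption of this sketch, and the function name is illustrative.

```python
def f_is(h_obs, h_exp):
    """Inbreeding coefficient F_IS = 1 - Ho / He.
    Positive values indicate a deficit of heterozygotes."""
    if not 0.0 <= h_obs <= 1.0:
        raise ValueError("observed heterozygosity must lie in [0, 1]")
    if not 0.0 < h_exp <= 1.0:
        raise ValueError("expected heterozygosity must lie in (0, 1]")
    return 1.0 - h_obs / h_exp


# Values quoted in the text: (observed, expected) heterozygosity.
reported = {
    "present study (G. affinis)": (0.48, 0.71),
    "G. yucatana (Quintana Roo)": (0.25, 0.71),
    "P. orri (Quintana Roo)": (0.11, 0.71),
}

for label, (h_obs, h_exp) in reported.items():
    print(f"{label}: F_IS = {f_is(h_obs, h_exp):.2f}")
```

On these pooled figures the deficiency in the present study is roughly one third of expected heterozygosity, considerably milder than in the cenote and wetland populations used for comparison.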

The present study supported the hypothesis of independent replicated adaptive divergence of canalized ecophenotypy in genetically diverse but distinct populations. As clear as is the adaptive significance of PABS ecophenotypy and its multiply-evolved nature, several questions remain for this syndrome. It is unknown whether the replication of PABS phenotypy represents parallel or convergent evolution (Oke et al. 2017). It is unknown whether phenotypic plasticity in this system may be playing an additional (likely lesser) role in conjunction with canalized genetic differentiation. Existing cases in fishes suggest plasticity may be the major contributor to ecophenotypic differentiation (Day et al. 1994; Robinson and Wilson 1996; Franssen 2011; Mays et al. 2019). The taxonomic breadth of PABS ecophenotypy and the pace and degree of evolutionary lability also remain to be resolved. Understanding the role of PABS phenotypes and other adaptive syndromes in phylogenesis will inform the larger conceptual framework of how environmental mosaicism shapes biodiversity. Our conceptual focus herein was on repeatability of diversification. Yet the molecular divergence among populations speaks as clearly to the importance of chance and multifarious selection as the predictable ecophenotypy speaks to the importance of determinism. Thus, extant diversity is the product of both shared and unique evolutionary forces (Langerhans and DeWitt 2004; Blount et al. 2018).