Determining the evolutionary processes shaping phenotypes has been a central task of genetics since its inception (Wright 1931). While traits such as body size or structure of acoustic calls have obvious implications for fitness (Brommer et al. 2014; Campbell et al. 2010; Kingston and Rossiter 2004; Puechmaille et al. 2014), genetic drift must first be tested as a potential primary evolutionary mechanism operating within and between populations. Echolocating mammals use acoustic calls for spatial orientation, foraging, and communication (Knörnschild et al. 2012), hence, acoustic traits should be subject to strong selection. Thus, the relative contributions of genetic isolation and resulting drift vs. adaptive, social, or sexual selection to acoustic traits becomes a central question in the evolutionary ecology of echolocating organisms.

In the absence of selection, traits—including acoustic traits—will evolve through neutral processes, reflecting population genetic structure and effective population size (Armstrong and Coles 2007; Chen et al. 2009; Odendaal et al. 2014). Very few studies of acoustic traits, however, directly evaluate predictions from drift models of trait evolution (e.g., Campbell et al. (2010)). Instead of modeling the amount of trait divergence attributable to neutral processes, given the evolutionary history of the lineage in question, qualitative assessments of genetic divergence compared to call frequency tend to be interpreted as reflecting some combination of neutral and selective processes (Clare et al. 2013). Additionally, constraints on acoustic traits arising from body size must be considered when analyzing the evolution of calls. Other things being equal, smaller bats will emit calls at higher frequencies than larger bats, in a pattern resulting from the acoustic properties of any physical object (Jones 1996). It is therefore important to include other correlated traits, such as body size, in analyses of mammalian echolocation traits.

Among bats, new phylogenies coupled with comparative analyses reveal ecological convergence in echolocation traits, including the evolution of highly sophisticated constant-frequency echolocation (Davies et al. 2012). Instead of emitting calls sweeping across multiple frequencies—or frequency modulated (FM) echolocation—, bats that use constant-frequency (CF) echolocation emit a pure tone at a constant frequency, ending with a FM sweep. Constant frequency calls are both simple and highly constrained (Kingston et al. 2001), and therefore easy to characterize by their dominant frequency.

In the New World, only the noctilionoid Pteronotus parnellii species complex has evolved constant frequency echolocation with Doppler-shift compensation (Clare et al. 2013). Recent research on the call frequencies of different populations in this species complex has focused on their correspondence with systematics (Clare et al. 2013; López-Baucells et al. 2017; Pavan et al. 2018; Pavan and Marroig 2016), but not on the evolutionary processes responsible for shaping these traits. Here we integrate population and quantitative genetic models to test drift as the primary driver of acoustic divergence in sister populations in the P. parnellii species complex. As island populations originating through over-water dispersal (Dávalos 2006; Pavan and Marroig 2017), the two focal populations have likely been subject to random genetic drift for at least part of their evolutionary history. We couple the isolation-with-migration model (Hey and Nielsen 2007) with quantitative analyses of phenotypic divergence codified by Spitze (1993) based on Lande’s (1992) interpretation of Wright’s (1951) island model, and extended to account for uncertainty in trait heritability and scaling by Leinonen et al. (2006). The methods presented here can be easily extended to test drift as an evolutionary mechanism across populations with known isolation-with-migration parameters (Muscarella et al. 2011; Russell et al. 2008).


Study sites, capture, and sampling methods

A total of 61 Pteronotus pusillus and P. portoricensis individuals were captured using a harp trap at the entrance of 6 caves distributed throughout the Dominican Republic on the island of Hispaniola (P. pusillus) and Puerto Rico (P. portoricensis, Supplementary Tables 1 and 2) (Núñez-Novas et al. 2016; Pavan and Marroig 2016). For ease in communication, we refer to individuals from both populations as Pteronotus parnellii species complex or sensu lato (s.l.). Within minutes of capture, skin tissue was sampled using an Acu-Punch sterile, disposable 2-mm skin biopsy punch (Acuderm, Inc.). Punctures were transferred to ~0.7 g indicator silica for desiccation and transport (Corthals et al. 2015). Bats were released at the site of capture within 60 min of being caught.

Morphological and echolocation data collection

After sexing, standard mammalian measurements were collected comprising length of body, tail, ear, foot, and forearm, and body mass. We recorded biosonar vocalizations from bats in a wire mesh cage (approximately 15 × 15 × 35 cm). Recordings were made with a Larson Davis 6.25 mm instrumentation microphone (model no. 2520) and Larson Davis preamplifier (model no. PRM 422), and amplified with a PCB Piezotronics Signal Conditioner (model no. 480E09). Calls were recorded onto a laptop computer using a National Instruments A–D card (DAQCard-6062E), recording at 356 kHz using BatSound Pro v3.31 (Pettersson Elektronik). We placed bats into the cage and allowed them to calm briefly. With the microphone held ~10 cm from the bat’s head, we recorded three to five files of 5 s in duration.

We analysed the vocalizations using the callViewer-18 script compiled with Matlab by Mark Skowronski. The spectrogram parameters of the program were set at a window length of 3 ms and an FFT size of 16,384 points resulting in a resolution of 22 Hz. Automated detection parameters set a frame rate of 10,000 frames per second (resulting in a frame duration of 0.1 ms), a bandpass filter between 20 and 130 kHz, and a minimum energy of 45 dB. We focused frequency analysis on the constant frequency (CF) portion of the second harmonic of the calls, as this is known to be the highest amplitude and most consistent component of the biosonar calls of this species, and relates directly to the best frequency of the cochlea, auditory nerve, and central nervous system (Huffman and Henson 1990).

Calls were selected for analysis based on the best available signal-to-noise ratio. Approximately 65 individual calls were identified for each bat and targeted for frequency analysis in callViewer-18. The resulting tables assigned a number to each call, the time to the centre of each analysis frame, and the frequency of that frame and its slope (kHz/ms), among other parameters. Segments of calls were included in analysis when their slopes were zero for a minimum of 1 ms (10 consecutive frames). Within a single call, variations in the frequency of the CF component were accepted if they shifted within one increment of resolution, provided that the shift persisted for a minimum of 1 ms. These variations correspond to Doppler-shifts induced by motion of the bat in the cage. All frames of a call deemed to be of constant frequency were averaged and this value was taken as the frequency of the CF portion of the call.

Molecular data collection

DNA was extracted from the desiccated skin samples using QIAmp or DNeasy extraction kits (Qiagen, Inc.), and following the manufacturer’s protocol for animal tissues (Corthals et al. 2015). Extracted DNA was used as a template in PCR amplifications using Taq polymerase and primers for the entire mitochondrial cytochrome b (cytb) gene, and partial sequences from four nuclear loci: stat5a, plcb4, rag2, and atp7a. Primers and amplification conditions for each locus have been described in detail elsewhere (Dávalos et al. 2014).

Population genetic analyses

Aligned datasets for each locus were edited by eye to remove sites violating the infinite sites model (i.e., sites with >2 character states). The remaining data were then filtered using the Perl script IMgc (Woerner et al. 2007) to yield the longest non-recombining fragment for each locus. The filtered data included 1,121 bp for cytb (nDR = 26, nPR = 6), 472 bp for rag2 (nDR = 34, nPR = 28), 307 bp for plcb4 (nDR = 26, nPR = 24), 429 bp for stat5a (nDR = 10, nPR = 12), and 637 bp for atp7a (nDR = 2, nPR = 8). Sample sizes for the nuclear loci are given as numbers of sequenced chromosomes. McDonald and Kreitman (1991) tests were conducted on all coding regions (cytb, atp7a, and rag2); in no case did we find evidence of selection (Supplementary Table 3).

Previous analyses indicated the two populations are completely discontinuous (Pavan and Marroig 2016), and our own analyses failed to show consistent differentiation among sampling sites within islands. The historical demography of these populations was therefore estimated using the two-population model of IMa2 v.8.27.12 (Hey and Nielsen 2004). As implemented here, this model estimated six parameters: θ (=4Neμ) for each of the populations from Hispaniola, Puerto Rico, and the most recent common ancestor, m (=Mi/μ, where Mi is the rate of migration into population i) in each direction, and τ (=tμ, where t is the time since population splitting). Priors were set to uniform distributions (U) (0,50) for θ and migration parameters, and U(0,5) for τ. Five independent runs were allowed to burn-in for ~5 million steps each, after which each Markov chain Monte Carlo (MCMC) search was allowed to continue for ~10 million steps. Each run consisted of 30 heated chains with heating parameters ha = 0.975 and hb = 0.75. Relative substitution rates for each locus were estimated in IMa2 and converted to substitutions/site/year based on a cytb rate of 2.99 × 10−8 substitutions/site/year from a fossil-calibrated phylogenetic analysis of Noctilionoidea including P. parnellii s.l. (Rojas et al. 2016). Coalescent-scaled parameters (θ, m, and τ) were converted to natural parameters (Ne, Mi, and t, in order) as described above using the geometric mean of locus-specific substitution rates (=5.67 × 10−7 substitutions/locus/year) and a generation time of 2 years. After verifying that all runs converged on similar posterior distributions, we used IMa2 to calculate the joint posterior densities for each parameter based on the 47,983 coalescent genealogies resulting from all five separate runs.

We estimated fixation index, or FST values (Wright 1965), based on the posterior densities of directional effective number of migrants (Nem or Nm) from IMa2 (Supplementary Figure 1). The transformation to obtain FST values was based on Wright’s (1965) island model with Takahata’s (1983) correction for a finite number (d) of populations, in which:

$$F_{{\rm{ST}}} = \frac{1}{{1 + 4N_{\rm{e}}m\left( {\frac{{d^2}}{{\left( {d - 1} \right)^2}}} \right)}}.$$

While attempts to estimate Nm from FST values have long been criticized as overly simplistic (Whitlock and McCauley 1999), here we instead use this relationship to transform a posterior distribution of Nm into a posterior distribution of FST values. This transformation accounts for the variance in FST arising from stochastic errors as well as differences between loci, which are overlooked when transforming FST into migration rates. In addition, by estimating directional values of Nm using a non-equilibrium coalescent-based method, we avoid the assumptions of equal Ne and m for all populations, as well as the condition of mutation-drift-migration equilibrium for the entire system.

In addition to obtaining FST distributions from coalescent posterior distributions of Nm, we estimated global G′ST (equivalent to FST) from the multi-locus sequence data, including a correction for finite number of subpopulations (Hedrick 2005), and then bootstrapped to obtain a distribution around that value using the R package mmod v. 1.3.3 (Winter 2012). This provided a direct comparison between our coalescent-based method and previous methods of characterizing neutral variation within the genome.

Quantitative trait comparisons

All statistical analyses were conducted in the R statistical language v.3.4.4. We estimated the summary statistics of Hispaniolan and Puerto Rican populations for call frequency, body mass, and forearm length phenotypes. Bayesian estimates of differences between populations were obtained using the BEST R package (Kruschke 2013).

Models of echolocation frequency as a function of morphological data

The frequency of the CF portion of the second harmonic of a set of biosonar calls was analyzed for each bat, as described above. The median of these values was used to represent its CF in frequentist statistical analyses. Principal component analysis of morphological measurements was used to obtain orthogonal variables summarizing the variation in body size among individuals. We used linear models of CF call frequency as a function of principal components or body measurements to test whether variation in call frequency was explained by body size (Jones 1996).

Using Bayesian models to estimate PST and compare to FST

The divergence in a quantitative, heritable trait is expressed as an analogue of the FST called the QST, and given by:

$$Q_{{\rm{ST}}} = \frac{{\sigma _{A_{\rm{B}}}^2}}{{\sigma _{A_{\rm{B}}}^2 + 2\sigma _{A_{\rm{W}}}^2}};$$

in which \(\sigma _{A_{\rm{B}}}^2\) is the additive variance for the trait between populations, and \(\sigma _{A_{\rm{W}}}^2\) is the additive variance within populations. Genetic drift as a mechanism behind the quantitative differentiation is rejected when QST exceeds FST from neutral loci. For most traits in wild populations, however, the additive genetic variance is unknown (Brommer 2011). In this case, the variance terms need to be scaled by a constant c and the heritability h2, resulting in the estimate of phenotypic differentiation, or PST:

$$P_{ST} = \frac{{c\sigma _B^2}}{{c\sigma _B^2 + 2h^2\sigma _W^2}} = \frac{{\frac{c}{{h^2}}\sigma _B^2}}{{\frac{c}{{h^2}}\sigma _B^2 + 2\sigma _W^2}};$$

in which c/h2 is the additive genetic contribution to the proportion of the between-population variance. In most empirical cases the c/h2 ratio is unknown, but it determines how robust the PST approximation to the QST is. If the PST exceeds the neutral expectation —the FST—at c = h2, then it will also exceed this expectation when c > h2. However, when c < h2, there is a limit to the extent to which the PST reflects the QST exceeding the neutral expectation. This critical value is estimated by calculating c / h2critical (Eq. 13) for the lower 5% tail of PST and the upper 5% tail of FST distributions (Brommer 2011):

$$\frac{c}{{h^2}}_{{\rm{critical}}} = \frac{{2\sigma _{W0.05}^2F_{{\rm{ST}}\,0.95}}}{{\sigma _{B\,0.05}^2(1 - F_{{\rm{ST}}\,0.95})}} = \frac{{(1 - P_{{\rm{ST}}\,0.05})F_{{\rm{ST}}\,0.95}}}{{P_{{\rm{ST}}\,0.05}(1 - F_{{\rm{ST}}\,0.95})}}.$$

When c / h2critical is low, there is a large range of c / h2 over which the phenotypic divergence (QST) will exceed the neutral genetic divergence (FST), indicating the comparison is robust, and thus greater confidence in the inference of quantitative divergence exceeding neutral genetic divergence.

One advantage of our method emerges from obtaining FST values from posteriors for directional migration rates (Supplementary Figure 1). Unlike FST distributions based on assumptions of normality, the distribution of multi-locus FST values can be directly compared to the distribution of trait-specific PST values without assuming a canonical frequency distribution for either. Thus, we can also summarize the entire parameter-space over which any differences between PST and FST can be observed in the best-case scenario of c=h2. We use the frequency distributions of FST and trait-specific PST to visualize differences and test genetic drift as the evolutionary mechanism underlying differentiation in the two populations. The overlap between distributions was calculated by computing kernel densities on the same scale and their overlap using the R package sfsmisc v. 1.1-0 (Maechler 2016).

To calculate trait-specific PST values, we estimated the phenotypic variance between and within populations using linear hierarchical (or multilevel) models (Brommer et al. 2014). Multilevel models have the advantage of fitting parameters specific to clusters of observations (Gelman and Hill 2007), such as islands, without discarding the variation in observations of individuals. These models had the quantitative trait as a response variable, with population as an island-specific (or random) effect, and sex as either a sample-wide (or fixed) or an island-specific effect (including a potential interaction between sex and island). Including sex as an effect accounts for phenotypic variance between sexes that might otherwise obscure the pattern of variation between populations. The PST was estimated as a derived quantity by coding eq. 13 for c=h2, including the population-specific phenotypic variance \(\sigma _B^2\), and the residual variance \(\sigma _W^2\), after factoring out the effects of sex on the traits. In general, Bayesian analyses of the QST, approximated here by the PST, increase precision in estimates of the entire posterior distribution (O’Hara and Merilä 2005).

Hierarchical models were coded in Jags v.3.3.0 (Plummer 2003), and ran in the R package R2jags v.0.04-01 (Su and Yajima 2012), with a burn-in of 25,000 iterations followed by 25,000 additional iterations. Posteriors were sampled every 25 generations to produce effective sampling sizes for the posterior of at least 1600, and assessed using the potential scale reduction factor (PSRF), which approaches 1 at convergence (Gelman and Rubin 1992). All posterior parameter estimates had PSRF ≤ 1.003. The prior for the population-specific effect was drawn from a normal distribution, with identical independent priors for between- and within-population variances set as half-Cauchy distributions with variance of at least 10,000. These priors are robust and do not make any assumptions about the relative contribution of variation from different levels in the hierarchy (Gelman and Hill 2007). In addition, we used estimates from these models to build posterior predictive checks, and compared the simulated data to parameters from the observations using the bayesplot R package (Gabry 2017). Estimates of PST with their posterior distribution are shown for the best-case scenario of c = h2.


Population genetic analyses of isolation with migration

We estimated the historical demography of Hispaniolan and Puerto Rican populations of Pteronotus parnellii s.l. using the two-population model of IMa2. Five independent runs of ~10 million steps each converged on similar point estimates and posterior densities for all parameters except θMRCA (Supplementary Table 4). We focus here on the parameters of interest for the phenotypic evolution models: θ for each island population, directional migration rates (mi), and the splitting time (τ) for the two daughter populations. The scaled population size parameter for Hispaniola (θDR = 0.625, 95% high probability density [HPD] = 0.225, 1.975) was consistently estimated to be greater than that for Puerto Rico (θPR = 0.125, 95% HPD = 0.025, 0.575). Assuming a mean multilocus substitution rate of 1.13 × 10−6 substitutions/locus/generation, these θ estimates correspond to modal effective population sizes of ~ 138,000 individuals in the Hispaniolan population (95% HPD = 50k, 435k) and ~28,000 individuals in the Puerto Rican population (95% HPD = 6k, 127k; Fig. 1a). These results are consistent with our best knowledge of the biology of these species, namely, that members of the island P. parnellii species complex are of least conservation concern in their respective ranges (Schipper et al. 2008), numerous caves throughout their range are known to harbour thousands to tens of thousands of individuals (Gannon et al. 2005; Núñez-Novas et al. 2016), and the range of P. pusillus (Hispaniola) is much larger in area than the range of P. portoricensis (Puerto Rico). These analyses converged on an estimate of the scaled splitting time parameter τ = 0.703 (95% HPD = 0.258, 4.793); this corresponds to a splitting time of ~1.24 Ma (95% HPD = 0.45 Ma, 8.45 Ma; Fig. 1b). This estimated splitting time corresponds with recent, independently derived phylogenetic estimates of divergence of ~1.2 Ma between P. portoricensis and P. pusillus by Pavan and Marroig (2017). Migration rates were estimated as mDR (the coalescent-scaled rate of migration from Puerto Rico into Hispaniola) = 1.875 (95% HPD = 0.475, 15.07) and mPR = 0.025 (95% HPD = 0.025, 30.52). These estimates correspond to estimates of the effective number of migrants per generation (Nm) of NmDR = 0.325 (95% HPD = 0.077, 2.336) and NmPR = 0.049 (0.007, 0.907; Supplementary Figure 1).

Fig. 1
figure 1

Results from IMa2 analyses of Pteronotus parnellii s.l. populations. a Joint posterior density of Ne estimates for island populations. b Divergence time estimates between Puerto Rican and Hispaniolan populations in thousands of years (Ka)

Relationship between echolocation frequency and body size

Empirical frequency distributions of echolocation frequencies showed complete separation between Puerto Rican and Hispaniolan recordings (Fig. 2a). Analyses of covariance (ancova) revealed that call frequencies of Hispaniolan and Puerto Rican bats were significantly different (F(1,49) ≥ 405.6, P value < 2e-16, Fig. 2a), but this variation was not a linear product of body size as measured by body mass (F(1,49) = 0.436, P value = 0.512, Fig. 2b), forearm length (F(1,52) = 2.01, P value = 0.162, Fig. 2c), or the first principal component of all external body measurements (F(1,44) = 0.258, P value = 0.614, Supplementary Figure 2). The only variable to show a correlation with call frequencies was principal component 2, but it still did not explain the divergence between the two populations (PC2, F(1,44) = 7.39, P value = 0.010, Supplementary Figure 2). The distributions of external body measurements for Hispaniolan and Puerto Rican populations overlapped almost completely (Fig. 2b). Using a Bayesian implementation of the t test (Kruschke 2013), we found Hispaniolan bats had a mean forearm length 0.71 mm shorter (HPD = −1.82, 0.422), confirming those bats are generally smaller but not significantly so (Fig. 2b). The first three principal components of morphological variation were able to discriminate between the Hispaniolan and Puerto Rican populations (manova of principal components of morphological measurements F(1,46) = 17.46, P = 1.042e-07; Wilk’s Λ = 0.467, partial η2 = 0.53).

Fig. 2
figure 2

Echolocation call frequency by island by sex and its relation to body dimensions. a Boxplots of echolocation frequency (summarizing 10 calls/individual). Bayesian 95% high-probability density (HPD) of the difference in call frequency means between Puerto Rico and Hispaniola was 5.2–6.0 kHz. b Call frequency as a function of body mass. Analyses of covariance support very different call frequency for island groups (F(1, 49) = 704.260, P value = 0.000), but no influence of body mass on the call frequency (F(1, 49) = 0.435, P value = 0.512). The 95% HPD of population differences in means for body mass was 0.89–2.29 g. c Call frequency as a function of forearm length. Analyses of covariance support little influence of forearm length on the call frequency (F(1, 52) = 2.851, P value = 0.097). The 95% HPD of population differences in means for forearm lengths was −1.82, 0.422 mm

Estimates of genotypic and phenotypic differentiation

The posterior distributions of population-specific migration rates were used to calculate FST values (Table 1, Fig. 3). While all Bayesian models for quantitative traits performed well in posterior predictive checks, modelling the island-specific effects of sex improved estimates of the variance for sexes on different islands (Supplementary Figure 3). Body size variables showed no sex-specific effects: the posterior distributions of the coefficients of the effect of being male had slightly negative means for each trait, but included zero (Table 1). Compared to females, Hispaniolan males called at lower frequency (Table 1, Supplementary Table 5, Fig. 2a). This effect persisted even after including body mass or forearm length as covariates of call frequency (Table 2).

Table 1 Posterior estimates of FST into each island, G’ST (normalized multi-locus FST), trait-specific PST with sample-wide effects of sex for each trait, and island-specific effects of sex in the case of call frequency
Fig. 3
figure 3

Densities of Bayesian posteriors for FST based on between-population migration rates, and PST for relevant phenotypic variables (Brommer et al. 2014). The lines show the 95th percentile for the corresponding FST, and the 5% percentile for the PST. The overlap between PST body mass and FST Hispaniola was 0.023, for FST Puerto Rico it was 0.084; between PST call frequency and FST Hispaniola was < 0.001, for FST Puerto Rico it was 0.003; and between PST forearm length and FST Hispaniola was 0.049, for FST Puerto Rico it was 0.125

Table 2 Summary of posterior estimates of echolocation call frequency PST for analyses correcting for body size through island-specific effects of sex, and sample-wide effect of body mass (BM), or forearm length (FA)

Estimates of trait-specific PST at c = h2 showed the greatest differentiation in echolocation call frequency, with lower estimates for body mass and forearm length (Table 1). The call frequency PST for Hispaniola also had low c/h2critical (0.014 with island-specific sex effect, 0.022 with sample-wide sex effect), while the c / h2critical values of almost all other comparisons were several times larger, and even >1 for the body size data from Puerto Rico. Comparisons of the distributions of PST and FST show overlap in distributions ≥4.9% for body size variables and Puerto Rico, while the lowest overlap corresponded to PST call frequency and FST Hispaniola (0.02% with island-specific sex effect, Fig. 3, and 0.06% with sample-wide sex effect, Supplementary Figure 4). This indicates phenotypic differentiation was significantly greater than neutral genetic differentiation for call frequency trait in the Hispaniolan population.


We tested genetic drift as the evolutionary process underlying acoustic differentiation by integrating phenotypic and genotypic models, rejecting this evolutionary mechanism as a probable explanation for the divergent call in the Hispaniolan population. To this end, we integrated coalescent-based models and well-established methods for estimating FST (Wright 1965) to directly compare measures of differentiation. This approach accounts for variation in FST estimates in comparisons with PST, allows the estimation of different FST distributions corresponding to directional rates of migration, and enables measuring overlap to quantify the correspondence between genotypic and phenotypic divergence. In addition, compared to a bootstrapped distribution around a point estimate of global FST, the distributions of FST derived from Bayesian posterior estimates of Nm more realistically reflect the information content in the data.

Comparisons of F ST and P ST

Our comparisons differ from traditional comparisons of genotypic and phenotypic differentiation in two ways. First, calculating FST values from estimates of the number of migrants (Nm) overcomes the limitations of traditional approximations by integrating both stochastic variation from individual sampling and variance across loci and by allowing for asymmetric rates of gene flow (Muir et al. 2012; Sundqvist et al. 2016). Importantly, because Nm is a compound parameter (\(4Nm = 4N_{\rm{e}}\mu \ast \frac{M}{\mu }\)), these estimates are independent from the highly variable mutation rate. This approach, then, can test the genetic drift hypothesis for each population, with potential applications for testing the evolutionary processes behind differentiation in continuous traits across many populations in the West Indies and beyond (Muscarella et al. 2011; Russell et al. 2008). Second, FST values were derived from a Bayesian posterior of Nm, and the resulting distribution of FST was then compared against the posterior of phenotypic differentiation (Fig. 3). Therefore, the frequency distribution of FST does not need to be generated by bootstrapping, or some other means of introducing variation around a single point estimate. The difference between the frequency distributions of FST and PST can then be calculated as a derived quantity in Bayesian analyses or, in the case of overlap, as an estimate of the overlapping portion of the frequency distributions (Fig. 3). When estimates of neutral genetic and trait differentiation clearly overlap, as for body size variables for Puerto Rico (Fig. 3), the c / h2critical value becomes irrelevant because the overlap between trait-specific PST values and estimates of genetic differentiation occurs under the best-case scenario of ch2. In cases in which the frequency distributions differ, as for call frequency and perhaps body mass (Fig. 3), the c / h2critical value can be estimated with greater confidence on the upper tail of the FST than has been feasible before.

An analysis using the traditional method of estimating a distribution of genetic differentiation illustrates some benefits of the proposed approach. We estimated global GST (equivalent to FST) from the sequence data (Hedrick 2005), and bootstrapped to obtain a distribution around that value. With a mean G’ST = 0.215 (normalized 95% CI: 0.116-0.314), and like the transformed Bayesian posterior distribution of Nm for Hispaniola, the global GST shows no overlap with PST for the call frequency, but does overlap with PST for both body mass and forearm length. Unlike our approach using transformed posterior distributions, the traditional approach assumes an equilibrium model with gene flow being equal in both directions, and so would lead to rejection of genetic drift for the Puerto Rican population with a c / h2critical value of 0.03. In contrast, the method presented here accounts for directional differences in FST that lead to a sixfold higher c / h2critical for the Puerto Rican (0.080) compared to the Hispaniolan population (0.014), resulting in much greater robustness for the rejection of drift for the latter.

Our method of deriving FST distributions from Bayesian posterior distributions of Nm may be practically implemented in future studies. Here, we chose the IMa2 software package to estimate distributions of Nm because the underlying demographic model is directly applicable to the sampled populations of P. portoricensis and P. pusillus, two sister species with no genetic structure within islands and DNA sequence data well described by the infinite sites model. In other applications, more complex demographic scenarios and/or different data types may apply different methods to estimate the Nm posterior (e.g., approximate Bayesian methods). Regardless of the specific method used to estimate these posterior distributions, we urge that the data be tested for neutrality, that appropriate demographic models be used, and that appropriate substitution models be specified in the analysis. In addition, we expect that, regardless of the method used, Bayesian estimates of the Nm composite parameter will yield more precise posteriors with larger datasets, particularly with an increase in the number of variable markers rather than an increase in the number of sampled individuals (Knowles and Carstens 2007; Leaché et al. 2013).

Acoustic divergence despite similar body size

Previous analyses had shown genetic differentiation between the Hispaniolan (Pteronotus pusillus) and Puerto Rican (P. portoricensis) populations in the P. parnellii species complex (Dávalos 2006; Pavan and Marroig 2016). We confirmed two independent, allopatric populations characterized by divergent acoustic signals. Acoustic differentiation cannot be explained by the subtle differences in body size that characterize the two populations. Linear combinations of morphological measurements of the skull or body can discriminate between species (Pavan and Marroig 2016), but if the two populations were sympatric they could not be easily distinguished using external measurements (Fig. 2b), and would be considered cryptic (Kingston et al. 2001). In contrast, differences in call frequency were large and consistently explained by population membership, but not by size (Fig. 2, Supplementary Figure 2). These observations raise the question of how the two sister populations evolved divergent call frequencies.

Pteronotus pusillus and P. portoricensis are allopatric, insular populations with a long history of isolation for approximately 1.2 million years (Fig. 1b). This long isolation coupled with little subsequent migration makes genetic drift an obvious mechanism for acoustic divergence (Supplementary Figure 1). Multiple adaptive, social, and sex-driven evolutionary causes have been invoked to explain variation in call frequency between populations and between species, including habitat physical features (Odendaal et al. 2014), ambient humidity (Guillén et al. 2000), acoustic environment (Gillam and McCracken 2007), ecological segregation (Kingston et al. 2001; Kingston and Rossiter 2004), female choice (Puechmaille et al. 2014), and cultural drift (Yoshino et al. 2008). Crucially, the genetic drift model is seldom tested in a way that considers more than simple pairwise genetic distances (e.g., Odendaal et al. 2014; Puechmaille et al. 2011; Yoshino et al. 2008). Here, the genetic drift hypothesis is rejected for Hispaniola over ~98% of the range of c / h2 < 1or additive proportion of heritability (Table 1, Fig. 3). This finding, along with sexual dimorphism in the Hispaniolan but not Puerto Rican population, detected even after accounting for body size, suggests sexual or social mechanisms could explain trait differentiation in these island populations.

As high-frequency calls are an honest signal of body condition, females seem to select for higher-frequency males in at least one constant-frequency echolocating bat species (Puechmaille et al. 2014). If that were the case here, female choice would lead to the higher frequency of Pteronotus pusillus. Male calls, however, were significantly lower in this population even after accounting for their somewhat smaller body size (Table 2). If this dimorphism is the result of female choice, then it runs counter to the direction of divergence that needs to be explained. The alternative is for females to drive the change in call frequency, not through sexual selection, but through cultural drift (Yoshino et al. 2008). In this case, the maternal transmission of culturally distinct higher-frequency calls coupled with female philopatry leads to long-term divergence in acoustic calls even after males disperse. Over the time of estimated isolation even subtle cultural differences together with overwater barriers could explain the divergence found. Although ecological factors cannot be entirely ruled out without additional data, the cultural drift hypothesis has the advantage of explaining sexual dimorphism as well. Future studies can explore the implications of these initial findings, including the extent of sex-biased dispersal between the two isolated populations (Pavan and Marroig 2016).

In conclusion, we have introduced a Bayesian coalescent-based approach to estimate FST and thereby test genetic drift as an evolutionary mechanism to explain phenotypic divergence across multiple traits. This approach directly calculates the extent of overlap between posterior distributions of FST and PST. Coalescent-based analyses revealed isolated populations with minimal subsequent migration, leading to high FST values, while trait analyses showed acoustic divergence and sexual dimorphism in call frequency. Comparisons between FST and PST rejected genetic drift as a probable evolutionary mechanism behind acoustic divergence in Pteronotus pusillus, and somewhat less robustly in P. portoricensis. The significantly higher calls of the Hispaniolan population, together with lower calls of males make female choice an unlikely evolutionary mechanism, and instead leave open the possibility of female-mediated cultural drift. By integrating Bayesian coalescent and trait analyses, this study demonstrates a powerful approach to testing genetic drift as the key evolutionary process in trait differentiation.

Data archiving

We have deposited the primary data underlying these analyses as follows:

  • Sampling locations, morphological data, R scripts and Bayesian models available from Dryad:

  • DNA sequences: Genbank accessions cytb: KX787941-KX787953, KX787967-KX787994, stat5a: KY077747- KY077757, plcb4: KY077790-KY077814, rag2: KY077758-KY077789, and atp7a: KY077742-KY077746.