Climate change is likely to have an important influence on the genetic connectivity of many organisms this century (reviewed in Littlefield et al. 2019). Changes in climate have already caused phenological and range shifts as well as genetic modulation in many organisms (reviewed in Parmesan 2006), and specifically in amphibians (Alexander and Eischeid 2001; Pounds 2001; Pounds et al. 2006). Yosemite toads are a federally threatened species that is reputed to be declining in both distribution (Brown and Olsen 2013; Drost and Fellers 1994; Jennings and Hayes 1994; but see Lee et al. in prep.) and abundance (Sherman and Morton 1993). Climate change is predicted to have a disproportionately large impact on meadow hydrology, and is projected to dramatically reduce the geographic range of Yosemite toads by 2100 (Smith and Tirpak 1988; Viers et al. 2013; US Fish & Wildlife Service 2014; Reich et al. 2018).

Yosemite toad tadpoles are obligate shallow-pond specialists (mean depth 4.35 cm; Liang et al. 2017), and almost exclusively breed in mesic mountain meadows (Grinnell and Storer 1924; Karlstrom 1962; Ratliff 1985), making survival dependent upon snowpack runoff and spring recharge. Significant larval mortality is often observed even during years with above-average snowpack (Sherman 1980; Sherman and Morton 1984, 1993; Brown et al. 2015). Freezing and early desiccation in ultra-shallow ponds often cause mass die-offs, exacerbated by water mold infections (Saprolegnia diclina) and flatworm (Turbellaria) predation (Sadinski et al. 2020). Adults and post-metamorphic juveniles are sensitive to seasonal temperatures, which directly determine their overwintering body condition, fecundity, and ability to disperse (Morton 1981). All these climatic features and interactions may play a role in determining Yosemite toad movement and breeding patterns across the landscape, and hence the magnitude and net vector of migration. Therefore, a model of fine-resolution climate associations with Yosemite toad network connectivity may help address how and where climate change will fragment meadows in the future, and lead to a more specific understanding of how their distribution may shift.

In this study, we sought to analyze patterns of environmentally mediated genetic connectivity among Yosemite toad populations in Yosemite National Park (YOSE). Genetic connectivity fosters long-term persistence of a species with discrete populations, by mitigating inbreeding depression, replenishing adaptive genetic variation, and promoting heterosis (Gotelli 1991; Hansson 1991; Whitlock et al. 2000; Gaggiotti 2003; Lowe and Allendorf 2010). From previous research, we broadly know that topography and climate influence migration (Wang 2012), and that meadow quality depends upon the network of nearby meadows (Berlow et al. 2013; Maier et al. 2022), but we lack an understanding of how specific climatic features such as snowmelt timing influence connectivity, or how previously under-sampled populations at lower elevation are affected. In addition, no study of Yosemite toad connectivity has considered environmental factors influencing toad occupancy of meadows, which may relate to larval and natal requirements, separately from migration corridor environment affecting only adult toads. While tadpoles develop and metamorphose in meadow pools and flooded areas, adult toads forage, hibernate, and can disperse into intervening habitat over 1 km per season (Martin 2008; Liang 2010). For biphasic species, estimates of connectivity can benefit from modeling between-site and at-site effects separately (Pflüger and Balkenhol 2014).

Our goals in this study were: (1) to identify specific climatic, topographic, soil, and vegetation features influencing connectivity, and (2) to forecast a spatial representation of future connectivity and net movement based on a “business-as-usual” climate change model. We also present several new approaches to address some general challenges of landscape genetics, including the challenge of defining migration environment. Our approach models the most likely migration path separately from modeling the influence of environment along that path. This allows a much richer consideration of hypothesized features and their combinations to be explored in a modeling framework. A similar approach has been taken with straight-line transects (Murphy et al. 2010a; van Strien et al. 2012; van Strien 2013), but we instead perform model testing to choose more realistic migration paths. For the most likely migration path, we then optimize the most likely corridor bandwidth of environmental influence since toads likely take multiple routes between any two meadows. This is analogous to circuit theory (McRae 2006; McRae and Beier 2007), except that it allows the separate modeling of paths and environmental influence.

We also introduce several novel approaches to address the challenge of building multivariate models of future genetic change. We apply a unique machine learning approach in our environmental models of connectivity that avoids overfitting, and effectively extrapolates future values. Machine learning approaches have been successfully applied to several studies (Murphy et al. 2010b; Hether and Hoffman 2012; Sylvester et al. 2018; Pless et al. 2021; Kittlein et al. 2022), helping to ameliorate problems identified with the assumptions of linear models (Peterman and Pope 2021). We include a “lineage effect” to account for known phylogeographic structure (Maier et al. 2019; Fig. 1A). Finally, we consider the effect of asymmetrical migration, which has been found at a smaller scale in Yosemite toads (Maier et al. 2022). We separately model the magnitude of genetic connectivity from its asymmetry, to allow an estimate of net asymmetrical shift (i.e., range shifts) under climate change.

Fig. 1: Projected migrational shift after climate change.

A Primary study area in Yosemite NP (YOSE), CA includes ~33% of known Yosemite toad populations. Top right inset shows the range of Yosemite toads in gray, and the boundary of YOSE in black. Green polygons are all meadows within the park. Solid black circles indicate all known Yosemite toad meadows identified between 1915 and the present. Large circles indicate the meadows sampled and sequenced in the present study (n = 90). Colors correspond to phylogenetic lineages shown by the inset (Maier et al. 2019). Random noise is added to locations to protect the locations of this threatened species. B Mean per pixel FST predicted from present-day environmental features, based on all underlying migration corridors. C Projected increase in FST for the time interval 2070–2099 based on the future climate scenario RCP 8.5 (“business-as-usual”). Shift in the graph theoretical “centrality” depicts the increase (circles) or decrease (triangles) in how central a meadow will become to the network. D Projected asymmetrical shift in migration (δM) in the future time interval. Arrow length represents the underlying magnitude of asymmetrical shift, and arrow direction represents the net vector of change. Neutral colors are the mean value.

Our results provide information that can be used to develop effective mitigation for habitat and climate related disturbance. In addition, our novel workflow provides a useful framework for characterizing genetic network structure in patch-limited species, and this approach may be extended to other systems.

Materials and methods

Spatial extent of sample collection

Population boundaries were highly correlated with meadow boundaries in a previous study (Maier et al. 2022). Thus, we sampled meadows within Yosemite National Park (YOSE) to maximize representation across all known breeding locations from a recent 6-year survey effort (Lee et al. in prep.), and overlap with previous studies (Shaffer et al. 2000; Wang 2012; Berlow et al. 2013; Maier 2018; Maier et al. 2019, 2022; Fig. 1A). YOSE includes ~33% of known Yosemite toad meadows. Tadpoles were sampled across all available egg clutches, ponds, meadows, and two separate years (2012–2013) to maximize inclusion of available genetic diversity and reduce potential bias of oversampling close relatives relative to the total local population. A minimum of five samples was used per meadow if additional meadows were included within 1 km; otherwise 10 samples per meadow were used, unless insufficient samples were available. This scheme maximized intra- and inter-meadow sampling representation across the study area.

Molecular methods, ddRAD sequencing, and bioinformatics

We used a previously generated double-digest Restriction Site Associated DNA Sequencing (ddRADseq) haplotype dataset (described in Maier 2018; Maier et al. 2019) for all analyses of genetic differentiation and structure. The dataset was compiled and analyzed with Stacks v1.19 (Catchen et al. 2011, 2013) and filtered using a minimum 10× locus coverage, 75% complete loci, and minor allele frequency of 0.005. SNPs were called using the multinomial likelihood algorithm of Stacks, which calculates the likelihood of the observed genotypes given the per-base sequencing error rate. The final dataset included 2318 polymorphic loci for 535 individual Yosemite toad tadpoles from 90 populations in YOSE (Table S1). We included all individuals, as previously published analyses did not find any differences in genetic differentiation after removal of close relatives (Maier et al. 2022).

Genetic differentiation and asymmetry estimates

We used AMOVA-based FST described by Weir (1996) and implemented in Arlequin (Excoffier et al. 1992) and STACKS (Catchen et al. 2013) to represent bidirectional genetic differentiation:

$$F_{ST} = \frac{\sum_i n_i \left( \tilde p_i - \bar p \right)^2 / \left[ \left( r - 1 \right) \bar n \right]}{\bar p \left( 1 - \bar p \right)}$$

where \(\tilde p_i\) is the estimated allele frequency for each biallelic SNP in the \(i\)th population of a pair, \(n_i\) is the count of alleles observed in that population, and \(r\) is the count of populations (2).
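As a worked illustration of the expression above, the following minimal Python sketch evaluates it for a single biallelic SNP. The function name `pairwise_fst` is hypothetical, and the definitions of \(\bar p\) (count-weighted mean allele frequency) and \(\bar n\) (mean allele count) are assumptions, since they are not spelled out in the text; this is not the Arlequin or Stacks implementation.

```python
def pairwise_fst(p, n):
    """Single-locus F_ST following the expression in the text.

    p: estimated allele frequencies, one per population
    n: allele counts, one per population
    """
    r = len(p)                      # number of populations (2 for a pair)
    n_bar = sum(n) / r              # ASSUMPTION: mean allele count
    # ASSUMPTION: p_bar is the count-weighted mean allele frequency
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / sum(n)
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("locus is monomorphic across the populations")
    numerator = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p))
    numerator /= (r - 1) * n_bar
    return numerator / (p_bar * (1.0 - p_bar))


# Two meadows, 20 sampled alleles each, frequencies 0.2 and 0.8
print(pairwise_fst([0.2, 0.8], [20, 20]))
```

With equal sample sizes, identical allele frequencies give a value of zero, and the value grows as the two frequencies diverge.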

For asymmetrical migration estimates, we used the method of Sundqvist et al. (2016), which estimates directional GST (Nei and Chesser 1983) for a pair of populations A and B, with allele frequency vectors a and b, as follows:

$$G_{ST}^{A \to B} = \frac{\left| \boldsymbol{a} - \boldsymbol{c}^{\ast} \right|^2}{4 - \left| \boldsymbol{a} + \boldsymbol{c}^{\ast} \right|^2}$$

$$G_{ST}^{B \to A} = \frac{\left| \boldsymbol{b} - \boldsymbol{c}^{\ast} \right|^2}{4 - \left| \boldsymbol{b} + \boldsymbol{c}^{\ast} \right|^2}$$

where c* represents the allele frequency vector for a hypothetical shared gene pool. The shared gene pool has allele frequencies that are the normalized geometric means of a and b for each allele k:

$$f\left( \boldsymbol{c}^{\ast} \right) = \frac{\sqrt{a_k b_k}}{\sum_k \sqrt{a_k b_k}}$$

Migration rates are then calculated using \(N_e m \approx ((1/G_{ST}) - 1)/4\) (Wright 1931). We followed Maier et al. (2022) and only used the differential between emigration and immigration, yielding a relative term δM. Expressing migration as net immigration (δM < 0) or net emigration (δM > 0) has the advantage of not modeling any explicit parameter, and of being less sensitive to island model assumptions, such as drift-migration equilibrium.
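The directional calculation can be sketched for a single locus as follows. This is a minimal illustration, not the implementation of Sundqvist et al. (2016): the function names are hypothetical, a single locus is used rather than the full SNP dataset, and the sign convention (the A-to-B term minus the B-to-A term) is an assumption made only for the example.

```python
import math


def shared_pool(a, b):
    """Normalized geometric mean of two allele frequency vectors (c*)."""
    g = [math.sqrt(ak * bk) for ak, bk in zip(a, b)]
    total = sum(g)
    if total == 0:
        raise ValueError("the two populations share no alleles")
    return [gk / total for gk in g]


def directional_gst(x, c):
    """G_ST between one population (x) and the shared pool (c)."""
    diff_sq = sum((xk - ck) ** 2 for xk, ck in zip(x, c))
    sum_sq = sum((xk + ck) ** 2 for xk, ck in zip(x, c))
    return diff_sq / (4.0 - sum_sq)


def nem(gst):
    """N_e m ~ ((1 / G_ST) - 1) / 4; unbounded when G_ST is zero."""
    if gst == 0:
        return math.inf
    return ((1.0 / gst) - 1.0) / 4.0


def delta_m(a, b):
    """ASSUMED sign convention: A-to-B term minus B-to-A term."""
    c = shared_pool(a, b)
    return nem(directional_gst(a, c)) - nem(directional_gst(b, c))


a = [0.9, 0.1]   # allele frequencies in population A
b = [0.5, 0.5]   # allele frequencies in population B
print(shared_pool(a, b), delta_m(a, b))
```

Because both directional terms are measured against the same shared pool, swapping the two populations only flips the sign of the result, which is what makes δM usable as a relative measure.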

Optimizing migration paths

Landscape genetic studies often integrate the hypothesis-testing of environmental features with the hypothesis-testing of migration paths (Manel et al. 2003; Manel and Holderegger 2013). However, this may vastly simplify the number, combinations, and weights of feature space explored, since it requires each hypothesized resistance raster to be encoded as one combination of all covariates. We took a different approach, first optimizing the most likely migration paths with ResistanceGA using two simple features already known to influence Yosemite toad migration, then extracting mean environmental values to be used in a modeling framework.

Topographic complexity and moisture availability are already known to broadly influence genetic connectivity and site occupancy in the Yosemite toad (Martin 2008; Wang 2012; Berlow et al. 2013; Brown et al. 2015; Maier 2018; Maier et al. 2022). We hypothesized that a combination of these two environmental features may best explain migration paths between meadows. High-resolution (10 m) rasters of SRTM-derived slope (Rabus et al. 2003) and vegetation class from ground-truthed aerial imagery (Keeler-Wolf et al. 2012) were selected to inform these two resistance surfaces, and were resampled to 30 m for computational feasibility. Raw pixel values of slope from 0 to 90° were used to inform the topographic resistance surface, whereas vegetation pixel values were first scored as ordinal resistance values from 1 to 10 based on moisture availability (Table S2). Both rasters were then transformed individually into resistance surfaces using the SS_optim function of ResistanceGA v4.2.2 (Peterman 2018). This procedure builds linear mixed models between FST and resistance distance through an environmental surface, then uses a genetic algorithm to explore which parameters optimize the model likelihood. After preliminary testing, a monomolecular curve was chosen to represent each transformation function, because both slope and decreasing moisture should monotonically increase resistance. The costDistance method (gdistance v1.3.6; van Etten 2017) was used to generate least cost paths (LCPs). Steep ridgetops were identified using a “gradient metrics” toolbox (Evans et al. 2014) and were scored as impenetrable to prevent unlikely, short routes from being preferred. We used a maximum value of \(1 \times 10^{6}\), and a maximum iteration count of 50, stopping after ten steps of no improvement in the objective function.

We combined slope and vegetation rasters by rescaling to a maximum value of 10, transforming with the Resistance.tran function (ResistanceGA), and finally combining in proportions from 0.0 to 1.0 in increments of 0.1 (a total of 11 hypotheses). To choose the LCP model that best represents migration paths, we used the lme4 package v1.1.27.1 (Bates et al. 2015) to build linear mixed models between LCP distance and FST. Source and destination populations were used as random effects to control for non-independence of measurements. Geometric distance was used as the null model of isolation by straight-line distance, and a 30 km distance cutoff was used following previous research showing insignificant isolation-by-distance beyond that scale (Maier 2018; Maier et al. 2022). We also excluded improbable paths that crossed Yosemite and Hetch Hetchy Valleys. The 12 models (the 11 combinations plus the null model) were ranked, and the one with highest log likelihood and lowest AICc was selected.
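The set of candidate surfaces and the ranking criterion can be sketched as follows. This is an illustration under stated assumptions: “combining in proportions” is taken to mean a per-pixel weighted sum, the function names are hypothetical, and the parameter count `k` passed to the AICc formula depends on how the mixed model is specified.

```python
def weight_grid(steps=10):
    """(slope weight, vegetation weight) pairs from 0.0 to 1.0 in steps of 0.1."""
    return [(round(i / steps, 1), round(1 - i / steps, 1)) for i in range(steps + 1)]


def combine(slope_res, veg_res, w_slope):
    """ASSUMPTION: surfaces are combined as a per-pixel weighted sum."""
    return [w_slope * s + (1.0 - w_slope) * v for s, v in zip(slope_res, veg_res)]


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)


hypotheses = weight_grid()
print(len(hypotheses))                      # number of weighted combinations
print(combine([10.0, 0.0], [0.0, 10.0], 0.9))
```

Each weighted surface would then yield its own set of least cost path distances, and the resulting mixed models would be compared with the straight-line null using the criterion above.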

Alternative corridor bandwidths

We chose six alternative corridor bandwidths of the optimal LCP model with which to extract environmental features (Fig. 2). This allowed us to directly test which bandwidth best captures environmental influence on genetic migration (see “Optimizing corridor bandwidths”). We considered two simple buffer distances of 100 and 500 m, values chosen relative to the mean yearly dispersal distance of 275 m (Liang 2010), and we calculated them using the gBuffer function (rgeos v0.5.8). Additionally, we considered four least cost corridor (LCC) bandwidths, based on varying thresholds for the accumulated cost surface that was used to produce the LCP model. In contrast to a 1-dimensional LCP, an LCC is a 2-dimensional raster surface showing the accumulated cost of moving between source and destination, for all pixels. Such corridors may potentially better represent the bandwidth of travel, because a multitude of reasonable paths may be considered instead of just one. The accCost function (gdistance) was applied to each meadow, and then the sum of cost values was taken between each pair of meadows. Thresholds for clipping each raster into a corridor were then defined as a lower quantile of all resistance values in the raster, from conservative to inclusive: 0.001, 0.005, 0.01, and 0.05. The remaining pixels were rescaled from 0 to 1, to be used as weights for environmental extraction. This allows the probability of migration at any given pixel to influence the extent to which that local environment is considered.
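The corridor construction described above can be sketched on a flattened list of pixels. The sketch makes several assumptions that are not stated in the text: a nearest-rank empirical quantile is used for the threshold, the rescaling is inverted so that the lowest accumulated cost receives a weight of 1, and the function names are hypothetical. It does not reproduce the gdistance or raster calls used in the study.

```python
import math


def corridor_weights(cost_from_source, cost_from_dest, q):
    """Clip summed accumulated cost at lower quantile q, rescale the rest to 0-1."""
    total = [s + d for s, d in zip(cost_from_source, cost_from_dest)]
    ranked = sorted(total)
    # ASSUMPTION: nearest-rank empirical quantile
    threshold = ranked[max(0, math.ceil(q * len(ranked)) - 1)]
    lowest = ranked[0]
    span = threshold - lowest
    weights = []
    for t in total:
        if t > threshold:
            weights.append(None)          # pixel lies outside the corridor
        elif span == 0:
            weights.append(1.0)
        else:
            # ASSUMPTION: lowest cost maps to weight 1, the threshold to 0
            weights.append(1.0 - (t - lowest) / span)
    return weights


def weighted_mean(values, weights):
    """Mean of an environmental feature, weighted by corridor membership."""
    pairs = [(v, w) for v, w in zip(values, weights) if w is not None]
    weight_sum = sum(w for _, w in pairs)
    if weight_sum == 0:
        raise ValueError("no pixel carries a positive weight")
    return sum(v * w for v, w in pairs) / weight_sum


w = corridor_weights([0, 1, 2, 5, 9], [2, 1, 2, 5, 9], q=0.5)
print(w)
print(weighted_mean([10, 20, 30, 40, 50], w))
```

A larger quantile admits more pixels into the corridor, so the same feature raster can yield a different summary value at each bandwidth.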

Fig. 2: Alternative corridor bandwidths.

An example of the six corridor bandwidth types, for one pair of meadows (IDs 359 and 1543). Bandwidths for least cost paths (LCPs) are simple buffer distances of 100 and 500 m, chosen relative to the average seasonal dispersal distance of 275 m. Bandwidths for least cost corridors (LCCs) represent different thresholds for the accumulated cost between source and destination: quantiles of 0.001, 0.005, 0.01, and 0.05 among all pixel values were chosen. Remaining values were then rescaled from 0 to 1.

Environmental feature extraction

Remotely sensed and survey data were used to parameterize landscape genetic models. Features were taken from nine categories, detailed in Table 1: climate (California Basin Characterization Model v65; Flint et al. 2013), soil moisture (NDMI from LANDSAT 5; Masek et al. 2006), snowmelt (day of year from MODIS; Hall et al. 2002), geology (age of soil; Huber et al. 2003), topography (SRTM; Rabus et al. 2003), vegetation (Keeler-Wolf et al. 2012), fire (frequency; Cal Fire 2022), recreation (trails and roads weighted by usage; Yosemite NP), and meadow network (Berlow et al. 2013; this study). Additionally, we extracted the same climate features under the future Community Climate System Model (CCSM v4), using representative concentration pathway (RCP) 8.5 (Riahi et al. 2011). The “business-as-usual” RCP 8.5 scenario models a future where emissions continue to rise throughout the twenty-first century, leading to an increase in global mean surface temperature of 2.6–4.8 °C by 2100 (IPCC 2014). For LCC bandwidths, gdalwarp (gdalUtils v2.0.3.2) was first used to resample each raster from its native resolution to the LCC resolution of 30 m. Then pixels within each bandwidth were summarized as a mean, sum, or proportion, depending upon the feature (Table 1). For simple buffers, this was achieved using the exact_extract function (exactextractr v0.7.2), and for LCCs, summary values were weighted by the migration probability of each pixel, using cellStats (raster v3.5.15). In addition to “between-site” environment, we extracted “at-site” values for these same features by using meadow polygon boundaries, then taking the difference between source and destination meadows. The meadow network group of features was only measured “at-site,” and the ones from this study include: degree, centrality, and clustering coefficient of each meadow (igraph v1.2.9; Csárdi and Nepusz 2006), as well as meadow area. Finally, two more features were added to account for isolation-by-distance (minimum path distance; “LCPdist”), and phylogeographic structure, encoded as time to most recent common ancestor (Maier et al. 2019) if source and destination are from two lineages, otherwise zero (“LineageCross”).

Table 1 Environmental features.

Optimizing corridor bandwidths

We trained a random forests model on each of the nine feature groups, to choose the optimal corridor bandwidth. This allows the freedom for one feature group (e.g., climate) to potentially have broader spatial influence on migration than another feature group (e.g., geology). Random forests analysis is a classification and regression tree method that uses an ensemble of bootstrap-aggregated decision trees to avoid overfitting (Breiman 2001). It can handle datasets containing many features, their higher-order interactions, and multicollinearity while modeling a response variable. We used 5000 trees for each model of FST versus environment, and tuned the hyperparameters using the caret package (v6.0.90; Kuhn 2008) with five-fold cross validation. Values of 1, 5, and 10 were explored for min.node.size (minimum number of observations in a terminal node), and values between 1 and the number of features were explored for mtry (number of features sampled at each split). For each feature group, the best corridor bandwidth was chosen based on the model with the lowest root-mean-square error (RMSE) value.
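The selection logic (cross-validated RMSE per bandwidth, lowest value wins) can be sketched independently of the learner. In the sketch below a one-nearest-neighbour predictor stands in for the random forests model purely to keep the example self-contained; it is not the method used in the study, and all function names and the toy data are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def kfold_indices(n, k=5, seed=0):
    """Shuffle row indices once, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbour(x_train, y_train, x_test):
    """STAND-IN learner (not random forests): copy the closest training response."""
    preds = []
    for x in x_test:
        j = min(range(len(x_train)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x_train[i], x)))
        preds.append(y_train[j])
    return preds


def cv_rmse(x, y, fit_predict, k=5, seed=0):
    """Mean RMSE over k held-out folds."""
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in held_out])
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / len(scores)


def best_bandwidth(features_by_bandwidth, y, fit_predict):
    """Return the bandwidth label with the lowest cross-validated RMSE."""
    scores = {bw: cv_rmse(x, y, fit_predict) for bw, x in features_by_bandwidth.items()}
    return min(scores, key=scores.get), scores


fst = [i / 20 for i in range(20)]                       # toy response values
features = {"LCC 0.05": [[v] for v in fst],             # informative toy feature
            "LCP 100 m": [[0.0] for _ in fst]}          # uninformative toy feature
print(best_bandwidth(features, fst, nearest_neighbour))
```

Running the comparison separately for each feature group is what allows one group to select a broad corridor while another selects a narrow buffer.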

Removing environmental redundancy

We performed principal component analysis (PCA) within each feature group to remove environmental redundancy and collinearity from the predictors. Although the predictive accuracy in tree-based machine learning approaches is robust to collinearity, correlated features are known to bias variable importance metrics (Strobl et al. 2007; Toloşi and Lengauer 2011; Gregorutti et al. 2017). We focused on in-group redundancy in this step because related features from the same data source and resolution are most likely to contain redundant information. Between-group redundancy was reduced in a later step (see “Cubist models of migration and asymmetry”). The prcomp function in base R was used on groups with >2 features, after centering on zero and rescaling by standard deviation.

Cubist models of migration and asymmetry

Cubist regression trees were chosen to predict and forecast FST and δM. Although tree-based learning methods such as random forests and xgboost have many appealing properties, they may extrapolate poorly, particularly if relationships are linear (Breiman 2001; Heenkenda et al. 2015; Houborg and McCabe 2018; Meyer and Pebesma 2021). Preliminary testing showed this to be the case in models of δM. Cubist models (Quinlan 1992, 1993) mitigate this by replacing simple split rules with linear models at each branch in the decision tree. Effectively, a different regression equation is locally fit to each partition of the data. The two hyperparameters are the number of committees (number of trees with adjusted weights) and neighbors (nearby data points to average), used to adjust model predictions.

We trained a cubist model on both FST and δM using all non-redundant features from the previous step. The caret package was used to tune the hyperparameters with ten-fold cross validation. Values of 1, 10, 50, 75, and 100 were explored for committees, and values of 0, 1, 5, 7, and 9 were explored for neighbors. Permuted variable importance was calculated using the varImp function (caret), and then collinear features from different groups (VIF > 10) were removed, with a preference to keep those with higher importance. A final cubist training procedure was then rerun with the reduced set of features.

Forecasting future migration after climate change

Future values of both FST and δM were predicted 89 years into the future (climate features averaged between 1981–2010 in the present, and 2070–2099 in the future). We first obtained PC scores for the future climate features by projecting them onto the loadings of the fitted PCA model using predict.prcomp. Then, we projected FST and δM into the future using predict.train (caret). To summarize the underlying processes at each pixel on the map, we calculated the average change in magnitude and asymmetry of migration across all overlapping corridors. To do this, we scaled each pairwise estimate by its corridor weight (0–1), with less probable routes receiving less weight. We took the average of overlapping pixel values to estimate mean change in connectivity (FST). For changes in migration asymmetry (δM), the net direction of change was calculated per pixel by summing all the underlying vectors (dx, dy) into a net vector, then calculating the net direction as \(\mathrm{atan2}\left( \overline{dy},\, \overline{dx} \right)\) radians.
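The per-pixel vector summary can be illustrated with a short sketch. The function name and the optional corridor weights argument are assumptions made for the example; the direction is returned in radians, measured counter-clockwise from the positive x axis as `atan2` defines it.

```python
import math


def net_direction(vectors, weights=None):
    """Weighted mean of (dx, dy) vectors, returned as (angle in radians, magnitude)."""
    if weights is None:
        weights = [1.0] * len(vectors)
    weight_sum = sum(weights)
    mean_dx = sum(w * dx for (dx, _), w in zip(vectors, weights)) / weight_sum
    mean_dy = sum(w * dy for (_, dy), w in zip(vectors, weights)) / weight_sum
    return math.atan2(mean_dy, mean_dx), math.hypot(mean_dx, mean_dy)


# Two overlapping corridors at one pixel: one pointing east, one pointing north
angle, magnitude = net_direction([(1.0, 0.0), (0.0, 1.0)])
print(math.degrees(angle), magnitude)
```

Opposing vectors of equal weight cancel, so a pixel crossed by balanced flows in both directions shows little or no net shift even when the individual estimates are large.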

To assess how the total network of meadows will shift in the future, we assigned edge attributes for each pair of meadows of max(FST) − FST, and calculated Kleinberg’s hub centrality score (Kleinberg 1999), which is the principal eigenvector of \(A^{T}A\), where \(A\) is the weighted adjacency matrix. Hub centrality represents park-wide connectedness, with the highest values receiving the most overall (direct and indirect) gene flow.
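A minimal power-iteration sketch of this centrality calculation is given below. It is not the igraph routine: the function name is hypothetical, the scores are scaled so that the largest equals 1 (an assumption), and the FST values are invented for illustration. Because the pairwise weights are symmetric, multiplying by the adjacency matrix and its transpose in either order gives the same principal eigenvector.

```python
def hub_scores(fst, iterations=200):
    """Hub centrality on edge weights max(FST) - FST, by power iteration."""
    n = len(fst)
    f_max = max(fst[i][j] for i in range(n) for j in range(n) if i != j)
    # Weighted adjacency matrix: low differentiation gives a strong edge
    adj = [[(f_max - fst[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    x = [1.0] * n
    for _ in range(iterations):
        # One step of Kleinberg's update: authorities from hubs, then hubs again
        auth = [sum(adj[i][j] * x[i] for i in range(n)) for j in range(n)]
        x = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        top = max(x)
        if top == 0:
            return [0.0] * n
        x = [v / top for v in x]        # ASSUMPTION: scale so the maximum is 1
    return x


# Invented pairwise FST values for four meadows (symmetric, zero diagonal)
fst = [
    [0.00, 0.05, 0.05, 0.10],
    [0.05, 0.00, 0.20, 0.30],
    [0.05, 0.20, 0.00, 0.30],
    [0.10, 0.30, 0.30, 0.00],
]
print(hub_scores(fst))
```

In this toy network the first meadow has the lowest differentiation from every other meadow and therefore receives the highest score, while the most differentiated meadow receives the lowest.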


Results

Optimizing migration paths

ResistanceGA found Yosemite toad migration to have a gradual monomolecular relationship with slope, with a shape of 1.119 and maximum value of 736,398 (Table 2). This means that a 30° slope elicits a resistance value of <2000, and a 60° slope elicits a resistance value of <40,000, but an inflection point occurs near 70°, which elicits a value of >100,000 (Table S3). This is consistent with observations of toads climbing moderately steep drainages but seldom breeding at meadows isolated by extremely steep terrain (pers. obs.). Vegetation showed a much less gradual monomolecular relationship than slope, with a shape of 0.515 and maximum value of 685,016. Although this weighting would favor wet meadow habitat, all vegetation types except dry rock outcrops would be plausible migration paths (Table S2 and Fig. S1).

Table 2 ResistanceGA results.

The LCP model with weights of 0.9 for slope and 0.1 for vegetation (including impenetrable ridgeline barriers) had the highest model ranking based on log-likelihood, AIC, and \(R^{2}_{m}\) metrics (Table 3). The second highest ranked model gave slope a weight of 0.8, suggesting that topography has a stronger influence on Yosemite toad migration paths than vegetation moisture. Hence, we used the top-ranked LCP model as the most plausible set of toad migration paths for optimizing corridor bandwidths. Due to the topographic complexity of mountain ridgelines bisecting YOSE, this model found only one path connecting each of the two low-elevation lineages (Y-South, Y-West) to their adjacent high-elevation lineage (Y-East; Fig. S2).

Table 3 Migration path results.

Environmental feature extraction

Graph theoretical metrics of degree, centrality, and clustering coefficient highlighted the pattern of meadows in the Y-East lineage as the best-connected ones in the park (Table S1 and Fig. S3). This is partly because values of FST tend to be lower, and genetic diversity tends to be higher in this high-elevation lineage (Table S1). However, the other reason is spatial configuration: toads in Y-South, Y-West, and Y-North lineages can only exit those regions by passing through the Y-East lineage, due to topographic constraints of river valleys. After ordinating each group of features onto its respective eigenvectors, the remaining dataset was mostly uncorrelated (Fig. S4).

Optimizing corridor bandwidths

For every feature group except geology (i.e., climate, soil moisture, snowmelt, topography, vegetation, fire, and recreation), the random forests model chose the broadest corridor bandwidth of LCC 0.05 (Table 4). The best bandwidth for geology was LCP 100 m, although with only one variable this relationship was weak (\(R^{2}\) = 0.137), whereas most other feature groups had \(R^{2}\) > 0.5. The LCC 0.05 bandwidth includes the primary LCP, and typically incorporates a multitude of paths with similar probability of migration, as well as peripheral habitat that may occasionally or indirectly influence toads (Fig. 2). This general pattern of LCC 0.05 bandwidths is similar to the idea of circuit theory (McRae 2006; McRae and Beier 2007), except that our method allows each feature to be modeled independently in its raw form.

Table 4 Corridor bandwidth results.

Cubist models of migration and asymmetry

After hyperparameter tuning, the two cubist models of FST and δM each attained strong model fit, with ten-fold cross-validated \(R^{2}\) = 0.88 and \(R^{2}\) = 0.78, respectively (Table 5 and Fig. 3). Both models benefited from the maximum number of committees (100) and an intermediate number of neighbors (5; Tables S5 and S6). Variable importance scores showed a pattern of relatively few between-site features with high importance in the FST model, but many at-site features of similar importance in the δM model (Fig. 4 and Tables S7 and S8). This makes sense given that FST mostly explains the overall magnitude of connectivity between sites, whereas δM is expected to explain environmental contrasts between meadows which might favor connectivity in one direction.

Table 5 Cubist models of FST and δM.

Fig. 3: Cubist model fit for FST and δM.

Model fit of environmentally predicted FST and δM, using the cubist machine learning method. In the top row: predicted versus observed values, with linear trend lines in purple. In the bottom row: residuals versus predicted values, showing approximate homoscedasticity of residuals. All models were assessed using ten-fold cross validation and had coefficients of determination \(R^{2}\) = 0.88 (FST), and \(R^{2}\) = 0.78 (δM).

Fig. 4: Variable importance plots.

Final non-redundant features selected for cubist models of FST (top panel) and δM (bottom). Permuted importance based on RMSE is normalized to a maximum value of 100%. Actual features used, based on PCA scores, are shown on the left. Interpretations of each feature based on PCA loadings (see Figs. S5 and S6) are shown on the right. “At-site” features (taken as the difference between source and destination values) are bold, italicized, and underlined.

Based on the loadings of the PCA models (Figs. S5 and S6), we interpreted the most important features influencing Yosemite toad connectivity (FST) within their corridors. Many climatic features were among the top ranked ones: snow meltoff day-of-year variability, snow runoff and temperature variability, recharge, climatic water deficit mean and variability, maximum temperature, and potential evapotranspiration (Fig. 4). We also found that vegetation type along corridors has a large influence on level of connectivity: rocky and shrubby habitats, proportion of water, proportion of wet meadow, and dry meadow with shrubby cover. Similarly, LANDSAT-derived soil moisture had an influence. The most important topographic features included radiation indices (particularly related to SE, NW, and SW slopes), and general topographic complexity. Interestingly, trail crossings (weighted by traffic level) were found to be a highly important feature. Finally, as expected, isolation-by-distance along the shortest migration path, and lineage effect, both were important features.

Although at-site (contrast between meadows) features were less important in predicting FST, several are noteworthy: meadow size, meadow vegetation type, and geological age of the soil. Meadow vegetation type is a classification of meadow polygons into four categories: proportion of semi- to permanently flooded, transiently flooded, willow, and short-hair sedge (more xeric). The PCA-projected feature appeared to be a gradient between semi- to permanently flooded meadow and short-hair sedge. Many of the same climatic features, indices of radiation, and topographic complexity were also important at-site.

For migration asymmetry (δM), some of the most important at-site (contrast between meadows) features were in the meadow network group: meadow size (area), degree (count of neighbors), and probability of occupancy, which is also informed by the network of available nearby meadows (Berlow et al. 2013). Many climatic and moisture-related features had similar importance to the model. LANDSAT-derived soil moisture was found to be a large influence, with fall conditions more important than spring conditions. The same gradient in meadow vegetation between semi- to permanently flooded meadow and short-hair sedge was ranked high, as was willow cover to a lesser extent. The most important topographic features included southern-facing aspect, radiation indices (including SE and NE slopes), elevation, and topographic complexity measures such as slope, roughness, and surface relief ratio. Many climatic features were important, including those relating to snowpack (mean and variability, runoff, recharge), temperature (variability of minimum and maximum, range of extremes), climatic water deficit (mean and variability), and potential evapotranspiration. The geological age of soil was also an important feature.

Forecasting future migration after climate change

Our model predicted a continuous surface of Yosemite toad connectivity that highlights regional corridors of high flow, and “pinch-points” of low connectivity (Fig. 1B). In general, regions of high connectivity tend to occur at or below the lineage level, and landscape features inhospitable to toad occupancy or migration tend to bisect these regions (e.g., ridgelines, low fire-prone areas, areas of sparse meadow habitat). The Y-North lineage, which is balkanized by a myriad of canyons and ridgelines, apparently has low connectivity compared to the others.

Our future model of FST change suggests the greatest reduction in connectivity to Y-West, Y-South, and certain admixed areas in the south, such as Merced Pass (Y-South) and the East-South-A admixed lineage (Fig. 1C). We compared present and future values of centrality, a graph-theoretical metric that uses the normalized first eigenvector of the network to score how central a meadow is to the entire network of toad migration, and found a shift from the south and west to the north and east. This indicates that a higher proportion of future toad migration will occur at higher elevations and latitudes (Y-East and Y-North) compared with the present.
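As a rough illustration of the centrality metric described above, the following sketch computes a normalized leading eigenvector by power iteration on a small weighted network. The four meadow names and all edge weights are hypothetical, and the code is an assumption about how such a score could be computed, not the implementation used in this study.

```python
def eigenvector_centrality(weights, iterations=500, tol=1e-12):
    """Leading eigenvector of a symmetric, non-negative weight matrix,
    normalized to sum to 1. Iterating on (W + I) instead of W keeps the
    same eigenvectors but avoids oscillation on bipartite networks."""
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [scores[i] + sum(weights[i][j] * scores[j] for j in range(n))
               for i in range(n)]
        total = sum(new)
        new = [v / total for v in new]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical connectivity weights (higher = more gene flow) among four
# illustrative meadows; none of these values come from the study.
meadows = ["south", "west", "north", "east"]
present = [[0, 3, 1, 1],
           [3, 0, 1, 1],
           [1, 1, 0, 1],
           [1, 1, 1, 0]]
future = [[0, 1, 1, 1],
          [1, 0, 1, 1],
          [1, 1, 0, 3],
          [1, 1, 3, 0]]

for label, matrix in (("present", present), ("future", future)):
    ranked = sorted(zip(eigenvector_centrality(matrix), meadows), reverse=True)
    print(label, [name for _, name in ranked])
```

Comparing the two rankings shows how strengthening a few links can re-center a network even when every meadow remains connected.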

Our future model of δM change strongly suggested a pattern of net asymmetrical movement from west to east, toward higher elevation areas (Fig. 1D). Given the pinch points in southern migration corridors, this would encourage net inter-lineage movement to occur from Y-South to Y-East up the M. Fork Merced River (north of the Clark Range), or up the S. Fork Merced River directly into the higher elevation Clark Range. Similarly, Y-West is only loosely connected to Y-East, with a single likely path up Tenaya Creek into the Tuolumne River watershed. These two vectors were the most strongly supported patterns of predicted range shift, although minor vectors within the high-elevation Y-East and Y-North lineages suggest some net movement northward. Broadly, the FST and δM results are consistent, suggesting a range shift toward higher elevation and latitude by 2100.


Impact of climate change on connectivity

Our models revealed that climate has strongly contributed to Yosemite toad genetic connectivity and forecasted a range shift toward higher elevations and latitudes (Figs. 1 and 4). Climatic features relating to snowpack variability were the most important for both FST and δM models, and by far the strongest FST feature overall. Snow runoff and groundwater recharge variability also ranked very high, for both migration corridors and meadows, consistent with occupancy modeling work that showed these features to be good predictors of site presence range-wide (Viers et al. 2013). Broadly, changes to snowpack and associated runoff are expected to have the greatest impact on amphibian phenology and persistence, since snow can account for 80% of total runoff during dry summer months (Corn 2003; Stewart 2009). Recently, network methods have shown that the number of Sierra Nevada meadows that offer refugia from climate change (i.e., less climate variability) is projected to decline, and this will reduce connectivity across multiple meadow-dwelling species (Maher et al. 2017). In another meadow-dwelling species, Belding’s ground squirrel (Urocitellus beldingi), network genetic connectivity among meadows was indeed related to climate refugia (Morelli et al. 2017). Our results suggest which changing climate attributes could further fragment Yosemite toad meadows, and may also offer an opportunity to pinpoint refugial meadows for the species.

In the future, climate change is expected to dramatically change the hydrology of Sierra Nevada meadows. If carbon emissions continue unabated, by the end of the twenty-first century there will be an estimated 7 °C rise in average springtime temperature, a 64% drop in springtime snowpack volume, and a 50-day phenological shift to earlier runoff of snowmelt (Reich et al. 2018). This effect will not be uniform, however, and some meadows may become climate refugia for toads (Smith and Tirpak 1988; Viers et al. 2013; Reich et al. 2018). Our work suggests that lower elevation Yosemite toads in the Y-South and Y-West lineages will track wetter and more stable conditions as they recede upward and northeastward. Prehistorically, there is evidence that these toads have contracted and expanded their distribution cyclically throughout the Pleistocene (Maier et al. 2019), although possibly at a slower rate. This is also consistent with geologically younger alluvial soils playing a role in connectivity magnitude and direction, given that toad meadow habitat only formed ~10,000 years ago as glaciers receded and deposited these soils (Wood 1975). Our mapped projections of range shifts (Fig. 1C, D) could be used as a guide to protect or even assist migration as it becomes necessary to help toads keep pace with local desiccation.

Impact of topography

We found southern and eastern aspects, along with associated indices of heat load and radiation, to be among the most important topographic features for Yosemite toad connectivity. South-facing slopes have been found to influence Yosemite toad patch suitability in previous work (Liang and Stohlgren 2011), and the influence of solar radiation may play a role in efficient and successful larval development prior to ponds desiccating (Mullally 1953; Mullally and Cunningham 1956; Brattstrom 1962). Although topographic complexity and elevation were important to δM, topographic complexity only ranked 21% as important as the top FST feature (Table S7). This may be partly explained by the fact that corridors were chosen based on lower slope, and so this “pre-modeling” step partialled out some of the inherent importance of topography. We further discuss this limitation of our approach below (see “Implications for forecasting climate impacts with landscape genetics”). Our LCP model testing did identify the 90% slope 10% vegetation model as best fit to the data, consistent with a similar pattern found using variance partitioning on microsatellite markers (Wang 2012). Interestingly, the effect of elevation on δM suggests that toads prefer to change elevations, and previous work (Maier et al. 2022) suggests a preference toward meadows with a higher slope position index (closer to ridgetops). Possibly, Yosemite toads are behaviorally biased toward upland dispersal that maximizes the likelihood of encountering suitably wet, sub-alpine breeding meadows. Negative geotaxis, or the tendency toward uphill movements, has previously been reported in other alpine animals (Baur 1986; Peterson 1997).
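To make the idea of a weighted least-cost path (LCP) concrete, the sketch below combines a slope layer and a vegetation layer with 0.9 and 0.1 weights and finds the cheapest route across a small grid with Dijkstra's algorithm. The linear weighting of layers rescaled to 0-1, the 4-neighbour movement rule, and every grid value are assumptions made for illustration; they are not taken from the study's LCP procedure.

```python
import heapq


def combined_cost(slope, veg, w_slope=0.9, w_veg=0.1):
    """Weighted sum of two cost layers (assumed to be rescaled to 0-1)."""
    return [[w_slope * s + w_veg * v for s, v in zip(s_row, v_row)]
            for s_row, v_row in zip(slope, veg)]


def least_cost_path(cost, start, goal):
    """Dijkstra on a grid with 4-neighbour moves. The cost of a step is the
    mean cost of the two cells it connects."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + 0.5 * (cost[r][c] + cost[nr][nc])
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    previous[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Hypothetical 3 x 3 landscape: a steep ridge occupies the top two cells
# of the middle column, and vegetation cost is uniform.
slope = [[0.1, 0.9, 0.1],
         [0.1, 0.9, 0.1],
         [0.1, 0.1, 0.1]]
veg = [[0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5]]

path, total = least_cost_path(combined_cost(slope, veg), (0, 0), (0, 2))
print(path, round(total, 3))
```

In this toy landscape the cheapest route detours around the ridge rather than crossing it, which is the behaviour a slope-dominated cost model is meant to capture.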

Impact of vegetation and soil

Rocky and shrubby habitats were the most important vegetation feature found in the FST model, possibly because they represent more xeric elements of toad corridors (e.g., western whitethorn, talus slopes) that inhibit migration. However, we note that the opposite is possible, as shrubby post-fire habitat counter-intuitively provides necessary shade and facilitates connectivity in two amphibian species in Yellowstone NP (Spear et al. 2005; Murphy et al. 2010b). The importance of a gradient in vegetation type within meadows, from semi-permanently flooded meadow to xeric short-hair sedge, presents an interesting opportunity to leverage this feature for future ecological research. Previous research has shown the Yosemite toad to utilize a gradient of meadow habitat, from immature stages and adult males in lower wet areas, to the more readily dispersing adult females occupying upper and rockier areas (Morton and Pereyra 2010). Future work could assess whether such a gradient influences emigration and immigration rates. Finally, LANDSAT-derived soil moisture was important in both models, with late season moisture playing a stronger role than spring moisture. This hints at an influence of predictably moist soil substrates for migrating adult toads, as they are foraging or searching for hibernacula. Moist surfaces during the dry season are an essential source for toads to absorb water via their pelvic drink patch, and the speed at which they replenish body fluid osmolality is known to influence where they aggregate (Reynolds and Christian 2009).

Impact of foot traffic

Trails as linear barriers were important predictors of Yosemite toad connectivity in the model of FST. The impact of trails is unsurprising given the magnitude of recreation YOSE experiences each year. In 2019, 4.4 million people drove into YOSE, 155,578 of whom hiked on trails and camped in the backcountry (National Park Service 2022). This user load has gradually accumulated to 125-fold the number of visitors one century ago. Automobiles began driving over Tioga Road in 1900, and the route became popular for businessmen beginning in the 1920s (Trexler 1975). The trail system was slowly built up from existing Native American trails during the 1860s by the U.S. Geological Survey of California, and later in the 1920s and 1930s by the Sierra Club and National Park Service (Bingaman 1968). Future mark-recapture studies might reveal that these linear barriers also impact demographic connectivity. In the future, our results may be effectively combined with corridor design methods (Chetkiewicz et al. 2006; Beier et al. 2008; Sawyer et al. 2011) to help maintain species persistence.

Meadow network effects

This landscape genetics study is among the first to isolate source-specific environmental effects on genetic connectivity (Murphy et al. 2010a; Dileo et al. 2014), and the first to accomplish this goal for Yosemite toads. Site contrasts in network features such as meadow area, probability of occupancy (Berlow et al. 2013), and degree (number of adjacently connected meadows) had a strong influence on asymmetry of migration δM. This echoes the finding that Yosemite toad meadows are locally organized into neighborhoods of “hub” and “satellite” meadows, with larger, flatter hub meadows receiving net genetic input from satellites (Maier et al. 2022). In the present study, we found a similar pattern at a larger spatial scale. At-site climatic contrasts were decisively forecasted to invoke net genetic movement toward the more climatically protected high-elevation meadows in the Y-East lineage (Fig. 1D).

Environmentally, our model (Fig. 4) predicted this asymmetry of genetic movement between source and destination meadows to be driven by contrasting meadow vegetation (e.g., vegetation type, willow cover), differing patterns of insolation (e.g., southern aspect, surface relief ratio, heat load), snow-related differences (e.g., snowpack mean, variability, runoff, recharge), and varying soil properties (e.g., fall soil moisture, and geological age of soil). Soil age could influence water retention ability of natal pools, given the recent deposition of alluvium into some (but not all) meadows by glaciation (Wood 1975). At-site feature quality may encourage emigration or immigration by influencing resource abundance and population density, causing individuals to disperse toward meadows that reduce competition and fitness costs (Travis et al. 1999; Matthysen 2005; Mathieu et al. 2010; Pflüger and Balkenhol 2014). Toads at more suitable meadows may benefit from higher carrying capacity through more natal pond habitat, more willow hibernacula, higher solar input to expedite tadpole metamorphosis, and less inter-annual variability in pond water supply by snowmelt. As climate change progresses, toads may consolidate at more stable “hub” sites; there is some evidence for a decadal shift in occupancy away from isolated sites, and toward meadows with consistent occupancy (Lee et al. in prep.).

An added benefit of this network perspective on meadows is that graph theory can be applied to entire networks of species connectivity, to elucidate emergent properties of indirect gene flow such as “centrality” or “modularity” (Dyer and Nason 2004; Garroway et al. 2008; Dyer et al. 2010). Although FST is anticipated to increase for nearly all meadows (likely due to population declines and genetic drift isolating sites), the relative differences in that change are expected to re-center the meadow network eastward and northward (Fig. 1C). Effectively, these two network approaches to Yosemite toad connectivity (asymmetrical δM shift from source-destination contrasts, and network-wide shift in FST centrality) are complementary and support the same conclusion.

Implications for forecasting climate impacts with landscape genetics

Predictive spatial models in landscape genetics and genomics are an important, burgeoning class of tools for summarizing evolutionary processes and incorporating them into species or landscape conservation planning (Vandergast et al. 2008; Sork et al. 2010; Fitzpatrick and Keller 2015). In this study, we provided a novel network method for individually modeling variables of interest (Fig. 2), which fills an important void in the landscape genetic toolbox. Many studies generate hypothesized “resistance rasters” of environmental variables (McRae 2006; Zeller et al. 2012), but these are not well-suited to multivariate hypotheses, and can be biased by the subjectivity of translating raw values into resistance values (Shirk et al. 2010; Spear et al. 2010). Moreover, this approach often does a poor job of exploring total parameter space, and may converge on the wrong environmental model (Graves et al. 2013). The alternate approach, transect analysis, does extract raw values of variables from rectilinear corridors, but at the cost of losing biologically meaningful dispersal corridor locations (van Strien et al. 2012; van Strien 2013). Our method combines the biological plausibility of LCP-optimized dispersal corridors (Douglas 1994) with a multivariate modeling process of raw environmental variables, making inferences potentially more realistic. Our corridor hypothesis-testing framework was able to accommodate the multitude of equally probable dispersal paths that Circuit Theory allows (McRae 2006; McRae and Beier 2007), without giving up the flexibility of a multivariate modeling process. Our approach also does not assume that habitat suitability models explain both at-site and between-site environmental preferences as is typically done (Epps et al. 2007; Storfer et al. 2007), including for the Yosemite toad (Wang 2012). Finally, our approach accounts for phylogeographic signal in genetic differentiation estimates, which is often an unaccounted source of variance (Dyer et al. 2010).

One potential weakness of our approach is that it must first develop a simple environmental model of LCPs, before extracting features and building a nuanced model. Thus, if the first model is overly simplistic, biased, or uninformed by existing knowledge of broad environmental features, then the corridors may contain some irrelevant landscape values. However, we carefully surveyed the existing literature and used the two most broadly important features (slope and vegetation moisture) to form LCP hypotheses for subsequent testing. The model ranking consistently showed slope to be the more important (but not only) feature relevant to migration paths, and all models outperformed straight-line transects, which are used by default in other transect methods (Murphy et al. 2010a; van Strien et al. 2012; van Strien 2013). Another weakness is that machine learning approaches are not very transparent about the sign (positive or negative) of each feature’s relationship in the model, and only describe importance in terms of effect on RMSE.
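The point about sign can be illustrated with permutation importance, one generic way of scoring a feature by its effect on RMSE. This technique is swapped in purely for illustration and is not necessarily how importance was computed here; the predictor function, coefficients, and data are all hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(y, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical fitted model: feature 0 has a positive effect, feature 1
    # an equally strong negative effect, and feature 2 is ignored.
    return 3.0 * row[0] - 3.0 * row[1]


data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(30)]
y = [toy_model(r) for r in rows]

print(permutation_importance(toy_model, rows, y))
```

Features 0 and 1 both receive positive importance even though their effects have opposite signs, while the unused feature scores zero; the RMSE-based score carries magnitude but not direction.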

Although the cubist algorithm (Quinlan 1992, 1993) has not previously been applied in landscape genetics, it can have some advantages over random forests, which have been used several times (Murphy et al. 2010b; Hether and Hoffman 2012; Sylvester et al. 2018; Pless et al. 2021; Kittlein et al. 2022). The method recursively applies linear models to each branch in a collection of rule-based decision trees, which has inherent advantages. Unlike random forests, which we found to vastly underpredict extreme values and extrapolate poorly for our model of δM, the cubist algorithm can accurately extrapolate linear conditions outside the observed range. Like random forests and xgboost, cubist is an ensemble method that sidesteps the problems of overfitting, multicollinearity, and non-linearity inherent in the Ordinary Least Squares (OLS) linear regression class of models. Cross-validated performance of the cubist algorithm was higher than linear mixed models, random forests, xgboost, and neural networks in our preliminary experiments. Cubist models should be explored further in other landscape genetic questions.
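The extrapolation contrast described above can be seen in a toy example with a single split rule. Leaves that store only a mean (as in standard regression trees and random forests) cannot predict beyond the training range, whereas leaves that store a linear model can. This is a deliberately simplified sketch with hypothetical data, not the Cubist algorithm itself, which adds rule ensembles and other refinements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def fit_rule_models(xs, ys, split):
    """Fit one rule (x <= split vs. x > split) and store, for each leaf,
    both the leaf mean and a leaf-level linear model."""
    leaves = {}
    for name, keep in (("low", lambda x: x <= split), ("high", lambda x: x > split)):
        leaf_x = [x for x in xs if keep(x)]
        leaf_y = [y for x, y in zip(xs, ys) if keep(x)]
        slope, intercept = fit_line(leaf_x, leaf_y)
        leaves[name] = (sum(leaf_y) / len(leaf_y), slope, intercept)
    return leaves


def predict(leaves, split, x, linear_leaves):
    """Predict with linear leaf models (model tree) or leaf means (plain tree)."""
    mean, slope, intercept = leaves["low" if x <= split else "high"]
    return slope * x + intercept if linear_leaves else mean


# Hypothetical training data on 0..10 following y = 2x + 1.
train_x = list(range(11))
train_y = [2 * x + 1 for x in train_x]
leaves = fit_rule_models(train_x, train_y, split=5)

# Query well outside the training range.
print(predict(leaves, 5, 15, linear_leaves=False))
print(predict(leaves, 5, 15, linear_leaves=True))
```

The mean-valued leaf returns a value bounded by the training responses, while the linear leaf continues the fitted trend, which mirrors the underprediction of extreme values noted for random forests.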

Conclusions and implications for management

Despite numerous efforts to find causative agents for the decline of Yosemite toads, such as UV radiation (Sadinski 2004), exotic predators (Grasso et al. 2010), meadow grazing (Roche et al. 2012a, b; Matchett et al. 2015), chemical deposition (Bradford and Gordon 1994; Davidson 2004; Sadinski 2004), and chytridiomycosis (Dodge et al. 2012; Lindauer and Voyles 2019; Lindauer et al. 2020), no clear patterns have emerged (Brown et al. 2015). This is in stark contrast to the “smoking-gun” inferences about other amphibian declines and recoveries in California (Knapp and Matthews 2000; Vredenburg et al. 2010; Knapp et al. 2016). However, our results based on modeling over one-third of the species’ range indicate that climate and recent climate change strongly influence connectivity in the Yosemite toad. This is occurring both at the level of breeding meadows and adult migration corridors and may be a central threat to the future persistence of the species. Specifically, we found some evidence that lower-elevation meadows will become increasingly disconnected from each other in response to changing snowpack and runoff conditions, which could force lower-elevation toads to either adapt or move upward. Unfortunately, both adaptation and migration become much less likely in smaller and more fragmented populations, where genetic drift and inbreeding can contribute to extinction risks (Nunney and Campbell 1993; Sacchei et al. 1998; Spielman et al. 2004; Schlaepfer et al. 2018; Bozzuto et al. 2019). This is consistent with the observation that low elevation meadows have experienced higher rates of extirpations (Drost and Fellers 1994, 1996). We suggest that land managers could account for these movement patterns by prioritizing the protection of likely climate change corridors and refugia.