Introduction

Range expansions are a common aspect of the natural history of most species (Excoffier et al. 2009). In recent decades, however, substantial range expansions have become increasingly frequent as a result of anthropogenic changes to the climate and landscape (Prugh et al. 2009; Chen et al. 2011). For example, barred owls (Livezey 2009), northern flying squirrels (Garroway et al. 2011), and several species of sea turtle (Pike 2013; Maffucci et al. 2016) have all undergone contemporary range expansions that are at least partially attributed to climate warming or other human-mediated environmental factors. However, despite the ubiquity of range expansions across taxa, empirical studies of the genetic consequences of recent expansions remain relatively rare. As these expansions present species with novel ecological and evolutionary pressures, understanding patterns of genetic diversity and, consequently, adaptive potential in recently expanded populations is of great interest.

Population genetic theory posits that expansion fronts are populated by a small number of individuals, resulting in reduced genetic diversity and the founder effect (Nei et al. 1975; Excoffier et al. 2009). In some cases, dispersers can form “pocket populations” at the expansion front that are subject to high rates of inbreeding, further reducing genetic diversity (Ibrahim et al. 1996). Additionally, a phenomenon known as “allele surfing” may occur during range expansions, in which rare or even deleterious alleles can reach high frequencies at the front of the expansion axis (Edmonds et al. 2004; Klopfstein et al. 2006). Though there are few empirical examples, allele surfing has been suggested in a range of taxa, including microbes (Gralka et al. 2016), coral snakes (Streicher et al. 2016), and humans (Hofer et al. 2008). More generally, demographic factors such as long-distance dispersal events from the source population (Bialozyt et al. 2006) and gene flow (Pfennig et al. 2016) may play an important role in shaping both genetic diversity and structure in recently expanded populations. Long-distance dispersal events, in which the expansion front experiences moderate gene flow with the source population, are well documented in plants (e.g., Davies et al. 2004; Kremer et al. 2012) and have been described in a population of recently introduced European starlings in South Africa (Berthouly-Salazar et al. 2013). Additionally, gene flow with either a closely related species or additional fronts of expansion can increase genetic diversity through the introduction of novel alleles as well as those that have been lost to drift.

Here, we focus on the coyote (Canis latrans), which has recently undergone a substantial range expansion from west to east and therefore provides a tractable system to evaluate the demographic and genetic effects of a contemporary range expansion. Coyotes were historically absent from eastern North America but have expanded their range over the last century and now occupy every state in the continental United States. This expansion from the west into eastern North America followed large-scale wolf control efforts on the east coast in the 1890s (e.g., Laliberte and Ripple 2004), and it has been suggested that the empty niche space was subsequently filled by northeasterly dispersing coyotes (e.g., Thornton and Murray 2014). However, this time period also corresponds to the transformation of dense forests into open agricultural land across the eastern landscape, and a combination of these factors likely influenced range expansion (Parker 1995; Nowacki and Abrams 2008). This eastward expansion occurred along two spatiotemporally isolated expansion fronts (Fig. 1). The first major wave of coyote expansion from the northern Great Plains began in the early 20th century and consisted of two routes: (1) across the northern Great Lakes region and southern Canada into New England, and (2) along the southern Great Lakes eastward to Pennsylvania (Fig. 1; Parker 1995; Kays et al. 2010). These two fronts likely converged in New York and Pennsylvania during the late 1940s, and now operate as a single front of expansion (Kays et al. 2010; Bozarth et al. 2011). The second major coyote expansion began in the mid-20th century and followed a southeastern route from Texas to the Carolinas by the 1980s (Fig. 1; DeBow et al. 1998). These two distinct expansion fronts have experienced different rates of gene flow with other Canis species. The northeastern colonization experienced gene flow with remnant wolf populations in the Great Lakes region and Ontario (C. lupus and/or C. lycaon), confirmed in a genome-wide scan of ancestry (vonHoldt et al. 2011, 2016a). Along the southern expansion front, similar gene flow was documented with remnant red wolves (C. rufus; McCarley 1962; Nowak 2002; Miller et al. 2003; Hinton et al. 2013; Bohling et al. 2016). Red wolves were later extirpated from the Southeastern U.S. in the 1970s, reducing the opportunity for subsequent gene flow.

Fig. 1 Map of eastern coyote range expansion and sampling locations. Historic range and expansion routes are approximate and modified from Parker (1995), Nowak (2002), and Kays et al. (2010). (sample size, n)

The two expansion fronts are suspected to meet along the mid-Atlantic coast, resulting in evolutionary and ecological consequences. First, overall genetic diversity among populations may increase in a geographically restricted region as a result of increased population connectivity (Hagen et al. 2015). Further, wolf genes have likely entered southern coyote populations through this recent connectivity (vonHoldt et al. 2011, 2016b), which may alter phenotypic characters and influence adaptive traits. Northeastern coyotes exhibit a phenotype that is distinct from their western counterparts, most notably in overall larger body size and craniodental morphology (Silver and Silver 1969; Kays et al. 2010). This unique phenotype has been attributed to the selective introgression of wolf genes (vonHoldt et al. 2016b) and likely contributed to adaptive differences of northeastern coyotes from other populations (Kays et al. 2010; Thornton and Murray 2014). For instance, this phenotype is presumed to have enabled northeastern coyotes to hunt larger prey (Benson et al. 2017). Although reports of predation on adult white-tailed deer are fairly common (e.g., Patterson and Messier 2003), studies have found that coyote diets are highly variable across habitat types (Tremblay et al. 1998) and seasons (Dumond et al. 2001); therefore, the full ecological and behavioral implications of the northeastern coyote phenotype are unclear. In contrast, the population-level phenotype of southeastern coyotes has not been extensively quantified, although regional studies suggest southeastern coyotes are smaller in size (Hinton and Chamberlain 2014), with only a few documented instances of adult deer predation (Chitwood et al. 2014). While much remains unknown about the ecology of eastern coyotes, these findings suggest divergence between southeastern and northeastern coyotes.

Overall, eastern coyote populations represent an opportunity to examine the molecular consequences of how range expansion and secondary contact shape genetic diversity. We conducted a survey of 10 microsatellite loci of 482 eastern coyotes to evaluate the correspondence between population structure, genetic diversity, and the expansion routes. We predict that genetic structure will be consistent with the two known expansion routes in the northern and southern U.S., respectively. Although theory predicts low genetic diversity and strong structuring in recent expansion fronts (Excoffier et al. 2009), we hypothesize that the northern expansion front will harbor higher genetic diversity as a result of interspecific breeding with other Canis species. Finally, we assess the degree of secondary contact between northern and southern expansion fronts, a likely source of genetic diversity that breaks down population structure. While numerous microsatellite studies have been conducted on northeastern (e.g., Kays et al. 2008; Rutledge et al. 2010) and southeastern coyote populations (e.g., Damm et al. 2015), few studies have evaluated genetic structure and diversity among the two groups (e.g., Way et al. 2010; Bozarth et al. 2011).

Methods

Study area and sample collection

Coyote whole blood and tissue (e.g., liver, tongue, kidney) samples (n = 482) were obtained from 129 counties in 11 states in the eastern U.S. (Fig. 1; Table S1) between 2001 and 2015. In a minority of cases, sampling year was unknown, but believed to be approximately within this timeframe. The majority of samples were collected within a three-year period (2012–2015), which is consistent with the 2–3 year generation time for coyotes (Bekoff and Wells 1986). Removal of samples collected two or more years outside of this period, as well as samples with unknown collection years, produced qualitatively identical results in downstream analyses. Most samples were archived by government organizations including Florida Fish and Wildlife, Ohio Department of Natural Resources, and US Department of Agriculture. Additional samples were obtained from state management programs (IACUC # 1961A-13) as well as from the New York State Museum. In most cases, body size, age (e.g., adult, juvenile), and sex, as well as the date and location of capture were recorded. All but two samples from New York had a known county of origin (Table S1); these two samples were therefore excluded from spatially explicit analyses at the county level.

Microsatellite genotyping

DNA was extracted from all samples using the DNeasy Blood and Tissue kit (Qiagen, Louisville, KY) following the instructions provided by the manufacturer and quantified by Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Carlsbad, CA). Water controls were occasionally included to control for contamination. Each sample was then genotyped at 10 highly polymorphic microsatellite loci: FH2001, FH2004, FH2010, FH2137 (Francisco et al. 1996), FH2611, FH2658, FH3399 (Guyon et al. 2003), Pez11, Pez16, and Pez17 (Neff et al. 1999). Polymerase chain reactions were performed using a forward primer with a 5′ 16 bp-M13F sequence tag, a fluorescently dye-labeled (6-FAM; Applied Biosystems, Foster City, CA) complement to the M13F tag (Boutin-Ganache et al. 2001) and an unlabeled reverse primer. Reactions were a total volume of 10 μl, and contained 1.5 μl (6 ng) DNA, 1.0 μl primer mix, 0.4 μl 10 mg/ml BSA (New England Biolabs, Ipswich, MA), 5.0 μl Type-It master mix (Qiagen, Louisville, KY), and 2.1 μl ddH2O. Cycling conditions consisted of an initial denaturation at 95 °C for 15 min, followed by 25 cycles at 94 °C for 30 s, 59 °C for 90 s, and 72 °C for 60 s, then 15 cycles at 94 °C for 30 s, 53 °C for 90 s, and 72 °C for 60 s, with a final extension at 60 °C for 30 min. To ensure consistent genotyping across reactions, 22 randomly selected samples were amplified ≥ 3 times. To confirm the absence of contamination, negative controls were also included with each reaction. PCR products were denatured with Hi-Di formamide (Applied Biosystems, Foster City, CA) and LIZ GeneScan 500 size standard (Applied Biosystems, Foster City, CA). PCR fragments were then analyzed on an ABI 3730XL capillary sequencer and genotypes called using GENEIOUS v6.1.6 (Kearse et al. 2012). Samples with more than 30% missing data were excluded from the analysis.

Genetic diversity

Observed and expected heterozygosity, pairwise linkage disequilibrium (LD), as well as deviations from Hardy-Weinberg equilibrium (HWE) at each sampling location were evaluated with ARLECORE v3.5.2, the console version of ARLEQUIN (Excoffier and Lischer 2010). The exact tests to evaluate LD and HWE were conducted with 1,000,000 steps following 100,000 dememorisation steps. We calculated additional metrics of genetic diversity, including allelic richness (AR; rarified for eight), with the R package HIERFSTAT (Goudet 2005). To assess genetic distance among all sampling locations, we calculated pairwise FST in ARLECORE and evaluated significance using 10,000 permutations and applying a Bonferroni correction for multiple comparisons. Inbreeding coefficients (FIS) were also estimated in ARLECORE, with significance evaluated using 1000 permutations. We then evaluated the genetic distance among northern and southern sampling locations with a hierarchical locus-by-locus analysis of molecular variance (AMOVA) in ARLECORE. Samples were grouped according to collection location, with Florida, Alabama, Georgia, Louisiana, and South Carolina designated “southern” and New York, Pennsylvania, Ohio, Maryland, Virginia, and North Carolina considered “northern.” Within each group, populations were defined as the state of origin for each sample. Additionally, we identified private alleles within the northern and southern groups with GenAlEx v.6.5 (Peakall and Smouse 2012).
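As an illustration of the heterozygosity statistics described above, the following minimal Python sketch computes observed and expected heterozygosity at a single locus. It assumes Nei's unbiased estimator for expected heterozygosity, which may differ in detail from the ARLEQUIN implementation; the function name and the genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Expected heterozygosity uses Nei's unbiased estimator (an assumption here):
    He = (2n / (2n - 1)) * (1 - sum(p_i ** 2)), with n individuals.
    """
    n = len(genotypes)
    # Observed heterozygosity: proportion of individuals carrying two different alleles
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over all 2n gene copies
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    expected = (total / (total - 1)) * (1 - sum_sq)
    return observed, expected


# Hypothetical allele sizes (bp) at one microsatellite locus for five coyotes
example = [(120, 124), (120, 120), (124, 128), (128, 132), (120, 132)]
ho, he = heterozygosity(example)
```

In practice these values would be averaged over loci within each sampling location, as in Table 1.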

Population structure

We conducted both spatially independent and spatially dependent analyses of eastern coyote population structure. Our spatially independent analysis was implemented in the Bayesian-clustering program STRUCTURE v2.3.4 (Pritchard et al. 2000). With no prior populations assumed, we conducted 10 independent runs for each K value with the admixture model for K = 1–10, using 500,000 repetitions after a burn-in of 250,000. Output from each independent run was then combined using CLUMPP v64.1.1.2 (Jakobsson and Rosenberg 2007). The most likely number of genetic clusters represented by the data was estimated by considering both the log-likelihood (LnProbability) values inferred directly from STRUCTURE (Pritchard et al. 2000), as well as ΔK (Evanno et al. 2005), which was calculated with STRUCTURE HARVESTER v0.6.94 (Earl and vonHoldt 2012). While it has been suggested that ΔK is a superior indicator of the “true” number of clusters represented by the data (Evanno et al. 2005), this statistic is limited in that it cannot provide support for K = 1 or the highest K value (i.e., K = 10), as it is based on the rate of change in log-likelihood between successive K values. To account for biases induced by uneven sampling, we first subsampled all sampling locations to the smallest n (i.e., five) and reran STRUCTURE. However, this consistently resulted in an optimal K of 1, and given that allelic diversity was extremely high, it is likely that five individuals per sampling location do not provide enough power to detect subtle differences in allele frequencies. We therefore addressed this potential bias by removing locations with small sample sizes (n < 15) and conducting an additional STRUCTURE run using the parameters described above.
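The limitation of ΔK noted above follows directly from its definition as a second-order difference. The sketch below is a minimal Python illustration, assuming the common formulation in which ΔK is the absolute second difference of the mean log-likelihoods divided by the standard deviation of the replicate log-likelihoods at K. It is not the STRUCTURE HARVESTER code, and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno-style ΔK from replicate log-likelihoods.

    lnp_by_k: dict mapping consecutive integer K values to a list of
    LnP(D) values, one per independent run.
    Returns a dict of ΔK for interior K values only: the smallest and
    largest K lack a neighbor on one side, so ΔK is undefined there.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


# Hypothetical LnP(D) values for three replicate runs at K = 1..4
lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -892.0, -878.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
```

Because only interior K values receive a ΔK, the mean LnProbability must be inspected separately to evaluate K = 1.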

The spatially explicit analysis of population structure was conducted in TESS v.2.3 (Chen et al. 2007) using the BYM admixture model (Durand et al. 2009) for K = 2–10, with 1,000,000 total sweeps, a burn-in of 250,000, and 10 independent runs per K. Geographic information was included as the latitude and longitude of the county centroid from which each coyote was sampled. Though similar to STRUCTURE, TESS additionally incorporates spatial information into clustering assignments and has been suggested to outperform STRUCTURE when populations are weakly differentiated (Chen et al. 2007). To evaluate the optimal K value for the TESS analysis, the deviance information criterion (DIC) value was averaged over each independent run and plotted against K. Generally, the optimal K value corresponds to the plateau of the DIC curve (Durand et al. 2009); however, clustering patterns at successive K values were also taken into account when selecting the optimal K (Yamashiro et al. 2016). Output over each independent run was combined with CLUMPP prior to graphic representation.

For both the STRUCTURE and TESS analyses, we considered individuals to have high assignments to a given inferred cluster if the ancestry proportion (i.e., Q-value) was greater than or equal to 0.8. Further, individuals were considered “admixed” if no single inferred cluster reached a Q-value of 0.8 (e.g., Rutledge et al. 2010). To evaluate substructure within the northern and southern sampling locations, we reran STRUCTURE including only individuals sampled from northern or southern locations, with identical parameters as described above. Additionally, we evaluated the association of pairwise genetic and geographic distances, that is, the extent of isolation-by-distance (IBD), within northern and southern sampling locations, with a series of Mantel tests implemented in the R package ade4 (Dray and Dufour 2007). Pairwise genetic distances between sampling locations were calculated as FST/(1 − FST) following Rousset (1997), and geographic distances were calculated as the shortest straight-line distance between state centroids using the Advanced Google Maps Distance Calculator (https://www.daftlogic.com/projects-advanced-google-maps-distance-calculator.htm). Finally, to further visualize clustering in our data, we conducted a centered, unscaled principal component analysis (PCA) with the R package ADEGENET (Jombart 2008).
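The isolation-by-distance test described above can be sketched as a permutation-based Mantel test on linearized FST. The Python below is a minimal illustration and not the ade4 implementation; it assumes a one-sided test, and the function names and the four-location matrices are hypothetical.

```python
import random
from itertools import combinations


def linearize_fst(fst):
    """Rousset (1997) linearization: FST / (1 - FST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel(gen, geo, permutations=999, seed=1):
    """One-sided Mantel test: permute location labels of the genetic matrix."""
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        r_perm = pearson([gen[order[i]][order[j]] for i, j in pairs], geo_vec)
        if r_perm >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical pairwise FST and distances (km) among four sampling locations
fst = [[0, 0.010, 0.020, 0.040],
       [0.010, 0, 0.015, 0.030],
       [0.020, 0.015, 0, 0.012],
       [0.040, 0.030, 0.012, 0]]
geo = [[0, 300, 700, 1200],
       [300, 0, 400, 900],
       [700, 400, 0, 500],
       [1200, 900, 500, 0]]
gen = [[linearize_fst(v) for v in row] for row in fst]
r, p = mantel(gen, geo)
```

With few sampling locations the number of distinct permutations is small, which limits the smallest attainable p-value.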

Geographic cline analysis

We conducted a geographic cline analysis to describe the transition between divergent groups across the landscape by considering the frequency of genotypes along a one-dimensional geographic transect. Clines are modeled as sigmoidal curves with exponential decay curves on either end (i.e., tails) that can be described mathematically (Szymura and Barton 1986, 1991). Two of the key cline parameters are cline center, the inflection point of the curve, which indicates the location along the geographic transect where change in trait frequency is most rapid, and cline width, the inverse of the maximum slope, which describes the geographic distance over which this rapid change occurs. Two additional parameters, pMin and pMax, describe the frequency of the focal trait at each end of the cline and thus indicate the level of trait fixation at each end of the geographic transect.
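To make these parameters concrete, the following minimal Python sketch evaluates a sigmoid cline without tails, assuming the standard logistic form in which the maximum slope at the center equals (pMax − pMin)/width. The function names and the parameter values are illustrative only.

```python
import math


def cline_frequency(x, center, width, p_min=0.0, p_max=1.0):
    """Expected frequency at transect position x for a sigmoid cline (no tails).

    With p_min = 0 and p_max = 1, the slope at the center is 1 / width,
    so width is the inverse of the maximum slope.
    """
    u = 1.0 / (1.0 + math.exp(-4.0 * (x - center) / width))
    return p_min + (p_max - p_min) * u


def slope_at(x, center, width, p_min=0.0, p_max=1.0, h=1e-3):
    """Numerical slope of the cline at x (central difference)."""
    upper = cline_frequency(x + h, center, width, p_min, p_max)
    lower = cline_frequency(x - h, center, width, p_min, p_max)
    return (upper - lower) / (2 * h)


# Illustrative parameters: center at 1000 km, width of 500 km
center_km, width_km = 1000.0, 500.0
mid = cline_frequency(center_km, center_km, width_km)
steepest = slope_at(center_km, center_km, width_km)
```

Evaluating the function at the center returns the midpoint between pMin and pMax, and the slope there is the steepest anywhere along the transect.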

Locations along a north-south transect were calculated as the shortest straight line distance between each sampling location (i.e., county) and the southernmost site (Collier County, Florida), again using the Advanced Google Maps Distance Calculator. In cases where the shortest straight-line distance encompassed habitat that is obviously unsuitable for coyotes (e.g., the Atlantic Ocean), a pivot point was created near the edge of the suitable habitat range, such that the total number of pivot points needed to avoid the unsuitable habitat was minimized. The distance between sampling locations was then calculated as the sum of the straight line distances passing through the pivot points (Baldassarre et al. 2014). These distances were not intended to simulate animal movement, but to standardize transect distances across land for geographic analyses.
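The pivot-point procedure amounts to summing the lengths of consecutive straight-line segments. The Python sketch below illustrates the idea with great-circle (haversine) distances, which is a substitution made here for illustration rather than the Advanced Google Maps Distance Calculator used in the study; all coordinates are approximate and hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def transect_distance_km(origin, site, pivots=()):
    """Sum of segment lengths from origin to site, passing through any pivot points in order."""
    path = [origin, *pivots, site]
    return sum(haversine_km(p, q) for p, q in zip(path, path[1:]))


# Hypothetical coordinates: a southern origin, a coastal sampling site,
# and one inland pivot point that keeps the path over land
origin = (26.1, -81.4)
site = (35.5, -76.8)
pivot = (30.8, -81.7)

direct = transect_distance_km(origin, site)
over_land = transect_distance_km(origin, site, pivots=[pivot])
```

A path through pivot points is never shorter than the direct line, so pivoted transect positions are shifted slightly away from the origin.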

As cline theory assumes all samples were collected along a one dimensional transect (Barton and Hewitt 1985), we excluded samples from Louisiana, which is approximately 900 km west of the major north-south axis formed by the other sampling locations, and could induce biases as a result of excessive perpendicular sampling. To ensure that the inclusion of Ohio, which is approximately 788 km west of New York, did not induce similar biases, we conducted a second cline analysis excluding all samples originating from Ohio.

Lastly, to evaluate additional potential biases in the cline induced by the pivot points, we conducted a third cline analysis in which locations along a north-south transect were calculated as the shortest straight line distance between sampling location and Thomas County, Georgia, which did not require pivot points to avoid major bodies of water. That is, the shortest straight-line distance between Thomas County, Georgia and all other sampled counties extended over land only. This analysis excluded samples collected from Florida, as well as samples from Louisiana.

Geographic clines in average ancestry proportion were evaluated using the R package HZAR v0.2-5, which implements a Metropolis-Hastings Markov chain Monte Carlo algorithm (Derryberry et al. 2014). We chose to examine clinal variation in average ancestry proportions, rather than allele frequency directly, as the loci surveyed in this study were highly polymorphic, and reducing the analysis to the frequency of a single allele per locus did not capture the complexity represented by the data. We fit a total of 15 possible models to the data in addition to a null model of no clinal variation, in which ancestry proportions do not vary across the landscape. All models estimated cline center and width, but incorporated all possible combinations of scaling (pMin and pMax fixed at 0 and 1, fixed to observed values, free parameters) and tail parameters (no tails, right tail only, left tail only, both tails mirrored about cline center, both tails estimated independently). We further estimated the two log-likelihood support limits (analogous to 95% confidence intervals) around both the cline center and width.
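The 15 candidate models arise from crossing the three scaling options with the five tail options. The short Python sketch below enumerates them; the labels are descriptive names chosen here, and the parameter counts rest on the assumption that free scaling adds two parameters (pMin and pMax) and that each independently fitted tail adds two.

```python
from itertools import product

# Extra free parameters contributed by each option (assumed counts, see lead-in)
SCALING = {"fixed_0_1": 0, "fixed_observed": 0, "free": 2}
TAILS = {"none": 0, "right": 2, "left": 2, "mirrored": 2, "both": 4}

BASE_PARAMETERS = 2  # cline center and width are estimated in every model

models = [
    {"scaling": s, "tails": t, "n_free": BASE_PARAMETERS + SCALING[s] + TAILS[t]}
    for s, t in product(SCALING, TAILS)
]
```

Under these assumptions the simplest model estimates two parameters and the most complex estimates eight, which is why a criterion that penalizes parameter count is needed for model selection.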

Model selection was based on Akaike’s Information Criterion (Akaike 1973), corrected for sample size (AICc), with the lowest AICc score indicative of the best model (Burnham and Anderson 2002; Derryberry et al. 2014). The estimations for cline center and width given by strongly competing models (ΔAICc < 2) were also considered in the identification of the contact zone between the northern and southern populations, with clines considered coincident and concordant if the two log-likelihood support limits around the center and width, respectively, were overlapping.
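The selection rule can be sketched in a few lines of Python. The example assumes the standard small-sample correction AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1); the model names, log-likelihoods, parameter counts, and sample size are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Akaike's Information Criterion corrected for sample size.

    log_likelihood: maximized log-likelihood of the model
    k: number of free parameters
    n: number of observations (must exceed k + 1)
    """
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def rank_models(fits, n, threshold=2.0):
    """Return the best model name and the set of models with ΔAICc below threshold."""
    scores = {name: aicc(ll, k, n) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    competing = {name for name, s in scores.items() if s - scores[best] < threshold}
    return best, competing


# Hypothetical fits: model name -> (log-likelihood, number of free parameters)
fits = {
    "free_none": (-40.0, 4),
    "fixed_none": (-41.5, 2),
    "free_both": (-39.5, 8),
}
best, competing = rank_models(fits, n=100)
```

Here the model with the highest likelihood is not selected, because its additional parameters are penalized more heavily than the gain in fit.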

Results

Genotyping and genetic diversity

We genotyped 482 coyotes from 11 states and 129 counties at 10 microsatellite loci. All loci were highly polymorphic, with the number of alleles per locus ranging from 7 to 35 with an average of 16.7 (Table S2). Allelic richness (AR), a metric of allelic diversity that accounts for differences in sample size, ranged from 3.78 to 6.33 (average = 4.95; Table 1, Table S2). Observed heterozygosity values were similarly high across all sampling locations, ranging from 0.780 to 0.864 (average HO = 0.838; Table 1). We did not observe any significant deviations from HWE after applying a Bonferroni correction for multiple tests (α = 0.05, adjusted p > 4.5 × 10⁻⁴). However, following Bonferroni correction, two loci significantly deviated from linkage equilibrium (FH2001 and FH2137) in Ohio samples (p = 2.0 × 10⁻⁵). Removal of these loci from Ohio samples did not impact population level trends of genetic structure (results not shown). FIS values were low across all sampling locations (FIS average = −0.0495, range = −0.0962 to 0.04) and none were significantly different from zero (adjusted p > 0.0045). Overall, pairwise FST values varied between all locations (FST average = 0.020, range = −0.004 to 0.057). Following Bonferroni correction, 15 of these 55 pairwise FST values were significantly different from zero, 12 of which were north-south comparisons (e.g., PA and SC: FST = 0.041, p < 1.0 × 10⁻⁵), two were south-south comparisons (e.g., SC and FL: FST = 0.019, p = 6.9 × 10⁻⁴), and one was a north-north comparison (PA and OH: FST = 0.006, p < 1.0 × 10⁻⁶; Table 2). The AMOVA indicated significant genetic distance between northern and southern sampling locations (FCT = 0.017, p < 1.0 × 10⁻⁵). Further, variation among populations (i.e., states) within groups was also significant (FSC = 0.010, p < 1.0 × 10⁻⁵), as was variation within populations (FST = 0.027, p < 1.0 × 10⁻⁵). We identified 23 and 20 private alleles in the northern and southern groups, respectively, all of which were at relatively low frequency (average: 0.016, range: 0.002–0.088; Table S3).
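The adjusted significance thresholds reported above are consistent with simple Bonferroni arithmetic. The Python sketch below reproduces them under the assumption, inferred from the reported values rather than stated in the text, that HWE was tested once per locus per sampling location, FIS once per location, and FST once per pair of locations.

```python
from math import comb

ALPHA = 0.05
N_LOCATIONS = 11  # sampled states
N_LOCI = 10       # microsatellite loci


def bonferroni(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


hwe_threshold = bonferroni(ALPHA, N_LOCATIONS * N_LOCI)  # assumed 110 locus-by-location tests
fis_threshold = bonferroni(ALPHA, N_LOCATIONS)           # assumed 11 per-location tests
n_pairs = comb(N_LOCATIONS, 2)                           # pairwise comparisons among locations
fst_threshold = bonferroni(ALPHA, n_pairs)
```

Under these assumptions, the 55 pairwise comparisons give a per-test threshold of roughly 9.1 × 10⁻⁴, so the South Carolina and Florida comparison (p = 6.9 × 10⁻⁴) falls below it.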

Table 1 Diversity statistics across sampling locations and average over all locations
Table 2 Pairwise FST values among all sampling locations

Population structure

Our spatially independent analysis of population structure provided support for two distinct populations, as both ΔK and the mean LnProbability converged on K = 2 (Fig. S1). These two inferred clusters corresponded to an approximate north-south divide, with the majority of samples originating from New York, Pennsylvania, Ohio, Maryland, Virginia, and North Carolina composing one genetic cluster, and the majority of samples originating from South Carolina, Louisiana, Georgia, Alabama, and Florida composing the second genetic cluster (Fig. 2a). However, we identified 48 and 22 admixed individuals in the northern and southern populations, respectively. We further identified 13 individuals that were sampled from the north but exhibited higher membership to the southern population, and three individuals sampled from the south that clustered with the north (Fig. 2). Finally, removing locations with small sample size (n < 15) produced qualitatively identical results with regard to a north–south divide (Fig. S3).
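The assignment rule from the Methods (Q ≥ 0.8 for a high assignment, otherwise admixed) can be expressed as a short Python sketch. The function name and the Q-values are hypothetical and do not correspond to individuals in the dataset.

```python
def classify(q_values, threshold=0.8):
    """Assign an individual to a cluster index, or label it 'admixed'.

    q_values: ancestry proportions across the K inferred clusters (summing to 1).
    An individual is assigned to a cluster only if that cluster's Q-value
    meets the threshold; otherwise no single cluster qualifies.
    """
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best if q_values[best] >= threshold else "admixed"


# Hypothetical Q-values at K = 2, ordered as (northern cluster, southern cluster)
individuals = {
    "coyote_A": [0.93, 0.07],
    "coyote_B": [0.55, 0.45],
    "coyote_C": [0.10, 0.90],
}
assignments = {name: classify(q) for name, q in individuals.items()}
```

Comparing each assignment with the sampling region then identifies individuals that cluster with the population opposite to where they were sampled.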

Fig. 2 Genetic structure inferred by Bayesian clustering in (a) STRUCTURE and (b) TESS at K = 2, with sampling locations indicated on the x-axis. (c) Average Q-values per state inferred via STRUCTURE

Overall, the spatially explicit analysis in TESS yielded similar results to the STRUCTURE analysis (Fig. 2b). The plateau of the DIC curve occurred at K = 5; however, clustering patterns above K = 2 did not reveal a new distinct population, but rather suggested admixture with an unsampled population in the mid-Atlantic states of Pennsylvania, Ohio, Maryland, and Virginia (Fig. S2). These results suggest that two clusters are optimal, but that these two populations likely experience different rates of gene flow with a neighboring unsampled population. Average individual level ancestry proportions within these two clusters were similar to STRUCTURE and showed a clear geographic north-south divide (Fig. 2b). While in the STRUCTURE analysis a plurality of samples originating from North Carolina clustered with the northern population (8 out of 19) and the remainder were either admixed (6 out of 19) or clustered with the southern population (5 out of 19), TESS revealed eight North Carolina coyotes clustered with the north, one with the south, and the remaining ten were admixed. Further, 23 and 42 additional admixed coyotes were identified in the northern and southern populations, respectively, and one coyote from Ohio clustered with the south (Fig. 2b).

In our evaluation of substructure within sampling locations, we found support for an optimal K of 1 for individuals collected from southern sampling locations (Fig. S4A) and no evidence of IBD between southern sampling locations (Mantel: r = 0.564, p = 0.124; Figure S5A). However, for the individuals sampled from the north, support for an optimal K of 1 was comparable to support for an optimal K of 2 (Fig. S4B). At K = 2, clustering patterns suggested admixture with a neighboring unsampled population, particularly in Ohio (Fig. S4C), a pattern similar to what was observed at K = 3 in the TESS analysis (Fig. S2). Despite this potential weak substructuring, we found no evidence for a correlation between genetic and geographic distance between northern sampling locations (Mantel test: r = 0.358, p = 0.146; Fig. S5B). Lastly, the PCA clearly separated coyotes by the expansion fronts and by geography (PC1, 3.57% variation explained; Fig. 3). However, the pattern of separation along PC2 (3.43% variation) did not follow a geographic pattern, suggesting that within each population, coyotes in neighboring states are not necessarily the most genetically similar, further indicating a lack of substantial substructure.

Fig. 3 Principal component analysis (PCA) of all 482 coyote samples using 10 microsatellite loci, with colors corresponding to northern and southern sampling locations. Labels for Ohio and Maryland are overlapping. The variation explained by PC1 and PC2 was 3.57 and 3.43%, respectively. A full color version of this figure is available at the Heredity journal online

Geographic cline analysis

Our data clearly showed clinal variation in average ancestry proportions per sampling location along a 2072 km north–south transect extending from Collier County, Florida to Hamilton County, New York (Fig. 4a). The best-fit cline model estimated pMin and pMax as free parameters (0.075 and 0.924, respectively) and did not fit decay tails about the cline center. The cline center was estimated at 1218 km and cline width was 579 km. To determine the approximate geographic location of the cline center, we identified eleven counties with transect distances within the two log-likelihood support limits of the maximum likelihood estimate (MLE) of the cline center (2LLc = 1106–1322 km), seven of which were located in southwest Virginia and four in northern North Carolina (Fig. 4b). Of these eleven counties, Beaufort County, North Carolina had a transect distance most similar to the MLE cline center, at 1219 km from Collier County, Florida. In addition to this model, four models were within two ΔAICc units of the lowest AICc score (Table S4). These models provided similar estimates of cline center (range: 1218–1228 km, average: 1218 km) and larger estimates of cline width (range: 778–844 km, average: 822 km). However, all estimates of cline center and width for these alternative models were within the two log-likelihood support limits calculated for the model with the lowest AICc score (2LLc: 1106–1322 km; 2LLw: 296–907 km; Table S4), indicating that these clines are both coincident and concordant and identify approximately the same geographic area as the selected model. Additionally, the removal of sampling locations in Ohio did not appreciably alter estimates of cline center (1224 km; 2LLc: 1122–1302 km) or width (775 km; 2LLw: 330–993 km).

Fig. 4 (a) Geographic cline in average Q North frequency along a 2072 km north-south sampling transect connecting Collier County, Florida and Hamilton County, New York. Crosses represent sampled counties. (b) Approximate location of cline center along the sampling transect, highlighting sampling locations within the two log-likelihood support limits of the cline center

Our third cline analysis, which extended from Thomas County, Georgia to Hamilton County, New York (and excluded Florida and Louisiana), also showed similar results (Fig. S6). The best-fit model for the cline in average Q-value fixed pMin and pMax at the observed values (0.022 and 0.986, respectively), and did not fit exponential decay tails. The cline center was estimated at 708 km from Thomas County, Georgia with a cline width of 882 km. One alternative model was within 2 ΔAICc units of the lowest AICc score; however, this model provided similar estimates of both cline center (700 km) and width (956 km), with overlapping two log-likelihood support limits (Table S5). For the selected model, the same seven counties in southwest Virginia were within two log-likelihood units of the MLE cline center (599–796 km), with Carroll County closest to the cline center (711 km). However, the four counties in North Carolina identified by the first cline had transect distances slightly outside of this range (825, 872, 875, 908 km). These results suggest that the first cline was not substantially biased by the use of pivot points to avoid unsuitable coyote habitat, as approximately the same geographic area was identified in both analyses.

Discussion

Over the past century, coyotes colonized eastern North America along two discrete expansion fronts that occurred during distinct periods of the 20th century and differed in the frequency of hybridization events with other Canis species. Our analyses provide evidence for two genetically distinct regional populations of coyotes that correspond to the two known historic colonization routes. These findings are consistent with our expectations that divergent demographic histories result in observable genetic differences among groups of conspecifics at fine temporal scales.

Despite this clear geographic separation of coyote populations, it is likely that coyotes originating from distinct expansion fronts have begun to overlap in range, forming a contact zone between the two previously isolated populations. While other studies have suggested intraspecific gene flow between northern and southern coyote populations (Bozarth et al. 2011; vonHoldt et al. 2011, 2016b), our results provide the first estimate of the precise geographic space over which this contact zone occurs. We observed a latitudinal gradient of ancestry proportions in eastern coyotes, where the most rapid change in ancestry proportion occurred in the mid-Atlantic region. There are two primary explanations for this clinal distribution: selection against admixture between the two populations, or neutral demographic processes. Selection typically results in a steep change in frequency over a short geographic distance and narrow estimates of cline width, while neutral processes would form an initially steep clinal transition that gradually widens over time as populations homogenize (Barton and Hewitt 1985). In this study, our estimated cline width was sizable (MLE: 579 km), suggesting that this cline is driven by demography and recent contact between the two groups rather than by selective processes. However, we surveyed neutral loci, which may introgress readily even in the presence of selection against admixture (e.g., Gompert et al. 2012). It is therefore unclear whether this cline will persist over generations due to selection or whether we have simply captured the initial stages of homogenization between the two groups. Future studies should address how the cline changes over time and investigate clines in the frequency of functionally relevant alleles to address the role of selection in shaping our observed cline. Finally, it is also theoretically possible to observe clines in allele frequencies under a pure isolation-by-distance scenario within a single lineage (Wright 1943). In this case, however, the known spatiotemporal isolation of the coyote expansion fronts suggests that eastern coyotes are unlikely to represent a single lineage, and that our observed cline is attributable to secondary contact.

While cline analyses are useful for addressing the change in allele frequencies across a landscape, they are limited in that multidimensional sampling locations are collapsed into a one-dimensional transect. In this study, the 129 counties sampled across the eastern U.S. did not follow a perfectly one-dimensional transect, and we acknowledge that multidimensional sampling may bias cline analyses (e.g., Dufková et al. 2011). However, we found no evidence for substructure in the southern cluster and only weak evidence for substructure in the northern population, perhaps as a result of unequal gene flow with a neighboring midwestern population. Though we did detect significant genetic distance between Ohio and Pennsylvania, which may be attributed to a high rate of gene flow in Ohio, removal of Ohio from the cline analysis did not appreciably change the MLE of cline center or width. These results suggest that variation in allele frequencies perpendicular to the major north-south transect in the geographic regions surveyed was not substantial enough to have markedly biased our results.

This general lack of substructure within each inferred cluster was notable, as fine-scale population structure has been documented in California coyotes (Sacks et al. 2004). However, population structure in these western coyotes corresponds to habitat breaks across the landscape (Sacks et al. 2004, 2008), whereas the eastern United States represents a more homogeneous ecoregion (U.S. Environmental Protection Agency 2017). That is, major landscape changes occur over much larger geographic distances on the east coast than in central California, increasing the scale at which habitat breaks could influence population structure.

Interestingly, we observed extremely high heterozygosity and allelic diversity in both expansion fronts, which is atypical of recently expanded populations (Excoffier et al. 2009) and contrary to our expectation that the northern population would be more diverse as a result of more extensive gene flow with wolf populations. Though we acknowledge that the microsatellite markers used in this study have a high mutation rate (Irion et al. 2003) and are ascertained based on diversity, heterozygosity levels in eastern coyotes are approximately 10% higher than those reported for a subset of the same markers in California coyote populations (H_E = 0.76; Sacks et al. 2004), suggesting that this pattern is not entirely a methodological artifact. Although it is not immediately clear from these results how coyote populations along both expansion fronts were able to maintain such high genetic diversity, several demographic processes could mitigate the expected reduction in genetic diversity and increase the adaptive potential of an expanding population. For example, if the expansion front is subject to even a modest rate of gene flow from the source population as a result of long-distance dispersal, genetic diversity at the periphery could be maintained (e.g., Alleaume-Benharira et al. 2006; Berthouly-Salazar et al. 2013). Coyotes are known to be highly mobile, with both males and females observed to disperse up to 102.5 km from their natal range, and this mobility likely plays an important role in the maintenance of genetic diversity (Harrison 1992; Mastro 2011; Hinton et al. 2012, 2015).

It is also possible that in the early stages of expansion, coyotes along the range periphery did experience decreased genetic diversity as predicted by population genetic theory. Over the past 50 years, however, rapid increases in population size and connectivity, in combination with the high mutation rate of microsatellites, may have reintroduced genetic diversity and counteracted the impact of a historic bottleneck on contemporary populations. This phenomenon has been documented in an introduced population of rabbits in Australia (Zenger et al. 2003) and has been observed in as few as 1.5 generations in European brown bears (Hagen et al. 2015). As coyotes have a generation time of 2–3 years (Bekoff and Wells 1986), 50 years corresponds to roughly 17–25 generations, so it is conceivable that a marked increase in genetic diversity could be observed over this period.

Finally, coyotes along both expansion fronts are known to have interbred with other Canis species (Kays et al. 2010; Bohling et al. 2016). Though interspecific hybridization is often associated with outbreeding depression (e.g., Muhlfeld et al. 2009), this process may play a more beneficial role for closely related species through the introduction of novel or advantageous variation (Hedrick 2013). In the case of range expansion, hybridization could increase overall genetic diversity as well as introduce locally adapted genes to populations at the expansion front (Pfennig et al. 2016). The introgression of locally adapted genes through interspecific gene flow has previously been suggested to have facilitated range expansion not only in northeastern coyotes (Kays et al. 2010), but also in Anopheles mosquitoes, in which it transferred genes critical to adaptation to arid environments (Besansky et al. 2003; Pfennig et al. 2016). While the markers utilized in this study are not sensitive enough to detect signatures of interspecific hybridization, the impact of interspecific hybridization events on population-level genetic diversity remains an important area for future research.

Overall, it is important to note that these three demographic processes (long-range dispersal, recent increases in population size and connectivity, and interspecific hybridization) are not mutually exclusive, and a combination of factors likely contributed to the observed genetic diversity in eastern coyote populations. For instance, heterozygosity was similarly high in both the northern and southern populations, despite interspecific hybridization occurring at higher frequency in the north; therefore, other demographic processes likely contributed to the observed pattern across all sampling locations. Future studies should address genome-wide trends in heterozygosity to elucidate the contribution of interspecific hybridization, in addition to other demographic processes, to genetic diversity across eastern coyote populations.

Data accessibility

Sampling locations and microsatellite genotypes for all individuals are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.2t965