Using high-throughput sequencing to investigate the factors structuring genomic variation of a Mediterranean grasshopper of great conservation concern

González-Serna, María José; Cordero, Pedro J.; Ortego, Joaquín

doi:10.1038/s41598-018-31775-x

Download PDF

Article
Open access
Published: 07 September 2018

Using high-throughput sequencing to investigate the factors structuring genomic variation of a Mediterranean grasshopper of great conservation concern

María José González-Serna ORCID: orcid.org/0000-0001-8688-7623¹,
Pedro J. Cordero¹ &
Joaquín Ortego²

Scientific Reports volume 8, Article number: 13436 (2018) Cite this article

2562 Accesses
13 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Inferring the demographic history of species is fundamental for understanding their responses to past climate/landscape alterations and improving our predictions about the future impacts of the different components of ongoing global change. Estimating the time-frame at which population fragmentation took place is also critical to determine whether such process was shaped by ancient events (e.g. past climate/geological changes) or if, conversely, it was driven by recent human activities (e.g. habitat loss). We employed genomic data (ddRAD-Seq) to determine the factors shaping contemporary patterns of genetic variation in the Iberian cross-backed grasshopper Dociostaurus crassiusculus, an endangered species with limited dispersal capacity and narrow habitat requirements. Our analyses indicate the presence of two ancient lineages and three genetic clusters resulted from historical processes of population fragmentation (~18–126 ka) that predate the Anthropocene. Landscape genetic analyses indicate that the limits of major river basins are the main geographical feature explaining large-scale patterns of genomic differentiation, with no apparent effect of human-driven habitat fragmentation. Overall, our study highlights the importance of detailed phylogeographic, demographic and spatially-explicit landscape analyses to identify evolutionary significant units and determine the relative impact of historical vs. anthropogenic factors on processes of genetic fragmentation in taxa of great conservation concern.

Disentangling the contemporary and historical effects of landscape on the population genomic variation of two bird species restricted to the highland forest enclaves of northeastern Brazil

Article 20 November 2023

Population genomics reveals how 5 ka of human occupancy led the Lima leaf-toed gecko (Phyllodactylus sentosus) to the brink of extinction

Article Open access 27 October 2023

Population genomic diversity and structure in the golden bandicoot: a history of isolation, extirpation, and conservation

Article Open access 08 October 2023

Introduction

Inferring the evolutionary and demographic history of species and populations is fundamental for understanding how they were impacted by past environmental and landscape alterations and anticipating their responses to different components of global change such as climatic variations^1,2,3, habitat loss⁴ or the emergence of infectious diseases⁵. Many organisms show nowadays highly fragmented distributions due to a natural patchy distribution of their particular habitats^6,7 or as consequence of their originally continuous populations became isolated due to habitat fragmentation driven by human activities^8,9 or past climatic/geological events^10,11. The genetic, ecological and evolutionary consequences of severe population fragmentation are numerous, including alteration of selective pressures, genetic erosion, inbreeding, accumulation of deleterious mutations, reduced evolutionary potential and, ultimately, increased risk of extinction^12,13. For these reasons, the study of population fragmentation is a major area of research for conservation biologists and geneticists, and particular attention has been paid to taxa forming small populations and presenting narrow distributions, low dispersal capabilities, and specific habitat requirements^14,15.

Estimating the time-frame at which population fragmentation took place is critical to determine whether such process was driven by historical processes that predate the Anthropocene or if, conversely, it is a direct consequence of human activities^16,17. This has important implications to inform on ground conservation practices¹⁸. If recent human-induced habitat fragmentation is identified as the main driver of population genetic structure and disruption of gene flow, then conservation practices should focus on restoring population connectivity either creating corridors to dispersal or assisting gene flow to avoid the long-term negative consequences of inbreeding and loss of genetic diversity^19,20. If, instead, population genetic structure was driven by ancient processes, then the different clades, lineages or genetic clusters might represent evolutionary significant units (ESUs) with particular local adaptations that deserve to be managed independently to maximize the protection of both vicariant and adaptive diversity^18,21,22. Beyond the temporal scale of population divergence, identifying the proximate factors shaping contemporary patterns of genetic structure is also fundamental to understand how organisms interact with the different components of the landscape²³. Genetic and spatial information has been successfully integrated to infer dispersal routes across different habitat types²⁰, identify natural barriers to dispersal (e.g. rivers²⁴, topography²⁵, geology²⁶) and determine the consequences of human activities on disrupting gene flow of natural populations (e.g. agriculture^27,28, infrastructures²⁹). For this reason, testing alternative spatially-explicit scenarios of population connectivity under a landscape genetic framework can help to determine the relative role of human and natural barriers to gene flow on structuring present-day patterns of genetic variation³⁰. This takes particular relevance in the case of specialist taxa with patchy distributions, as identifying contemporary barriers to gene flow and cryptic corridors to dispersal is crucial to establish management practices aimed to restore or enhance connectivity among remnant populations²⁰.

The Iberian Peninsula constitutes an important biodiversity hotspot, with high species richness, rates of endemism and levels of intra-specific genetic diversity^31,32,33. Explanations for the high diversity of the Iberian Peninsula include its historically high climatic stability³⁴, the low impact of Pleistocene glaciations in comparison with northern temperate areas³⁵, its current proximity and Miocene connection with North Africa and other Mediterranean regions^10,36, and the presence of deep environmental gradients and a complex topography^34,37. Despite its high biodiversity and conservation value, the Mediterranean region has been exposed to millennia of strong human intervention^27,34 that have reduced the original extent of its primary vegetation by ~95%³³. This region is also predicted to be impacted by intense effects of climate change and experience distributional shifts and remarkable range contractions in many taxa^38,39. Both severe habitat loss and climate warming represent serious threats for many taxa with small and highly fragmented distributions that deal with important difficulties for maintaining viable populations and face risk of extinction^40,41. Thus, understanding the evolutionary history, demographic trends, and interactions with landscape heterogeneity of these taxa is critical for establishing effective conservation policies and informed management practices that ensure their long-term persistence^42,43.

In this study, we use genomic data to infer the processes structuring genetic variation in the Iberian cross-backed grasshopper Dociostaurus crassiusculus (Pantel, 1886), a species of great conservation concern that has been recently catalogued as “endangered” in the Red List of European Orthoptera⁴⁴. The taxonomic position of this species was controversial and according to morphological criteria it has been considered for a long time a subspecies of the Asian Dociostaurus kraussi (Ingenitskii, 1897)^45,46,47. A recent re-evaluation of the taxonomic status of the genus using genetic data has supported the presence of two well recognized species in concordance with their disjunct distributions: D. crassiusculus in the Iberian Peninsula and D. kraussi in Asia⁴⁸. The full species status of D. crassiusculus makes it of higher conservation concern provided that the very few known populations of the species persist in highly isolated pockets of habitat embedded in an expansive matrix of unsuitable areas³. The species is currently distributed in central-southeast Iberia, mostly occupying pseudo-steppe habitats with halophytic plant communities linked to gypsum or hypersaline soils³. These narrow habitat requirements, together with the reduced flying capacity of the species and the progressive loss of its natural habitat by human activities, has led that all populations of D. crassiusculus are nowadays extremely fragmented and at high risk of extinction by stochastic phenomena^3,49.

Here, we employ restriction-site-associated DNA sequencing (ddRAD-seq), coalescent-based simulations and a landscape genetics framework to evaluate alternative demographic scenarios, estimate the timing of population fragmentation, and infer the processes shaping contemporary patterns of genetic structure across all known populations of D. crassiusculus. Specifically, we first used genomic data to analyse the spatial genetic structure of extant populations of the species, identify main lineages and establish their phylogenomic relationships, and define hierarchical units for conservation and management^18,21. Second, we tested alternative coalescent-based demographic and migration models to infer spatial patterns and rates of inter-population gene flow, estimate the timing of population fragmentation at different spatial scales and, ultimately, determine whether the genetic structure of the species is a consequence of ancient events (e.g. past climate or geological changes) or if, conversely, it is compatible with human-driven population fragmentation¹⁶. Finally, we generated alternative isolation-by-resistance (IBR) scenarios of population connectivity within a spatially-explicit framework to identify the specific landscape features shaping genetic differentiation in the species and unravel the relative importance of natural (topography, lithology, limits of main river basins) vs. anthropogenic (habitat loss) processes of genetic fragmentation.

Results

Genomic data and genetic statistics

A total of 91,666,732 reads were obtained for the 35 genotyped individuals of D. crassiusculus. The number of reads per individual (mean ± SD = 2,619,049 ± 841,054 reads) before and after different quality filtering steps is shown in Supplementary Fig. S1. The datasets obtained with Stacks for p = 2 and p = 4 contained 80,534 and 65,459 unlinked SNPs, respectively. The datasets obtained with PyRAD for W_clust = 90% and MinCov = 11 and 23 contained 32,424 and 18,442 unlinked SNPs, respectively; and for W_clust = 95% and MinCov = 11 and 23 contained 42,053 and 23,628 unlinked SNPs, respectively. Population genetic statistics (P, π, H_O, H_E and F_IS) calculated with Stacks for all positions (polymorphic and non-polymorphic) and considering loci that were represented in at least two (p = 2) and four populations (p = 4) and the 50% of individuals within populations (r = 0.5) are presented in Supplementary Table S1. Pair-wise F_ST values ranged from 0.063 to 0.237 and all were significantly different from zero based on 100 permutations (P < 0.05; Supplementary Table S2).

Population genetic structure

Structure analyses based on a random subset of unlinked 10,000 SNPs from six different datasets obtained with Stacks and PyRAD considering different filtering/clustering parameters (see Supplementary Methods for further details), always identified K = 2 as the most likely clustering solution according with the ΔK criterion (Supplementary Figs 2 and 3). The two clusters presented no signature of genetic admixture and split the southernmost population (ORCE) from the remainder of the populations (Fig. 1d). Structure analyses for K = 3 divided Northern (TAJU-BELI) and Central populations (PHUE-SALI-BONI) in two different genetic clusters, but in this case the geographically closer populations (TAJU-BELI and PHUE-SALI) showed a considerable degree of genetic admixture (~25%) (Fig. 1d and Supplementary Fig. S2). The results obtained with Structure were in agreement with those obtained from Principal Component Analyses (PCA), in which PC1 split the Southern population (ORCE) from the remainder of the populations, and PC2 separated Northern (TAJU-BELI) from Central populations (PHUE-SALI-BONI) (Fig. 2 and Supplementary Fig. S4).

Phylogenomic inference

Phylogenomic relationships among populations inferred by Snapp were well-resolved and nodes presented high posterior probabilities (Fig. 3a). In agreement with analyses of genetic structure (Structure and PCAs), Snapp analyses supported an earlier split of ORCE from the remainder of the populations, which in turn divided into Northern (TAJU) and Central populations (PHUE and BONI) (Fig. 3a). Analyses considering different current and ancestral population sizes (α = 2; β = 200 or α = 2; β = 20,000) and different population combinations for Central and Northern populations (i.e. BELI-SALI, BELI-PHUE, and TAJU-SALI) yielded analogous results (data not shown)⁵⁰. The best tree from SVDquartets yielded the same topology than Snapp, but the relationships among populations were not well resolved (bootstrap support values <70%) probably as a result of inter-population gene flow or incomplete lineage sorting (Fig. 3b).

Coalescent-based demographic models

Fastsimcoal2 analyses supported models A, B and D (Fig. 4) as the best-fitting and statistically equivalent models (ΔAIC < 2.00; Table 1). These three migration models have in common that all of them consider gene flow between ancestral populations (m_ANC) (Fig. 4). Although analogous models without ancestral migration were tested (models C, E and F; Fig. 4), they were poorly supported (Table 1). Demographic parameters estimated under the three best supported models (A, B, and D) and their weighted averages are presented in Table 2. Considering a 1-year generation time for D. crassiusculus³, Fastsimcoal2 analyses showed that the division between the Southern and Northern-Central populations (T_DIV2) occurred ~126 ka (95% CIs: 90–197 ka), probably during the Eemian Interglacial period (115–130 ka) (Table 2). The weighted average estimate yielded by Fastsimcoal2 for the more recent split between Northern and Central populations (T_DIV1) indicate that this event took place ~17 ka (95% CIs: 11–24 ka), around the last glacial maximum (LGM; 20 ka) (Table 2). Gene-flow estimates were low and the migration rate (m) inferred between Central and Southern populations (m_C-S) was nearly an order of magnitude lower than the migration rate between Northern and Central populations (m_N-C) and between ancestral populations (m_ANC) after the first population split (T_DIV2) (Table 2).

Table 1 Comparison of alternative migration models (detailed in Fig. 4) tested using Fastsimcoal2.

Full size table

Table 2 Parameters inferred from coalescent simulations with Fastsimcoal2 under the three best supported migration models (see Fig. 4 for details).

Full size table

Landscape genetic analyses

Genetic differentiation was significantly and positively correlated with resistance distances obtained under all tested scenarios (Supplementary Table S3). Hypothetical scenarios based on habitat and lithology reached the highest model fit at the lowest resistance value for the non-suitable category, indicating that they do not explain the data better than a flat landscape in which all cells have equal resistance (=1) (Supplementary Table S3; Fig. 5a). In contrast, model fit for the scenario incorporating the resistance offered by the boundaries of main river basins peaked when the resistance value offered by this landscape feature was set to 100 (r² = 0.830; P = 0.001) (Supplementary Table S3; Fig. 5a). A multiple matrix regression with randomization (MMRR) analysis considering simultaneously the best fit resistance value under each scenario showed that the scenario incorporating the resistance offered by the boundaries of main river basins was the only one retained into the final model (Table 3; Fig. 5b). This indicates that isolation in different river basins is the main factor explaining genetic differentiation in the species, with no apparent effect of topographic roughness, lithology or habitat (Table 3).

Table 3 Multiple matrix regression with randomization (MMRR) for genetic differentiation (F_ST) in relation with resistance distances defined by (i) a flat landscape, (ii) topographic roughness (slope), (iii) limits of main river basins, (iv) habitat, and (v) lithology.

Full size table

Discussion

Genomic data revealed that populations of the endangered Iberian grasshopper D. crassiusculus show a marked hierarchical genetic structure, with the presence of two highly divergent cryptic lineages (Fig. 3) that comprise three genetic clusters (Figs 1 and 2). One of the lineages is only represented by the highly isolated population (ORCE) located in the southernmost limit of species distribution (Southern cluster), whereas the other includes the remainder of the populations and is sub-structured into two genetic clusters (Northern and Central clusters). Our phylogenomic and coalescent-based analyses supported an early split of the two lineages and estimated that their divergence took place during the Upper Pleistocene (~126 ka), probably around the Eemian interglacial stage. The Northern (TAJU-BELI) and Central (SALI-PHUE-BONI) genetic clusters were estimated to diverge much more recently (~17 ka), probably after the LGM. Note, however, that these estimates of divergence time must be interpreted with caution. In particular, it is remarkable the different estimates of divergence time obtained for Model D vs. Models A–B (Table 2). The fact that Model D does not consider gene flow among contemporary populations is expected to have resulted in younger estimates of population split than in Models A–B. Thus, Model D and Models A–B are statistically indistinguishable but find two different solutions that fit equally well our genomic data (Models A–B: presence of contemporary gene flow and older estimates of divergence times; Model D: lack of contemporary gene flow and younger estimates of divergence times) (Tables 1 and 2). Statistical evaluation of alternative migration models showed that the most likely scenarios were always those considering ancient gene flow between ancestral populations and contemporary gene flow between populations from Northern and Central genetic clusters, although with very low absolute values for migration rates per generation (m_ANC = 1.30 × 10⁻⁰⁵; m_N-C = 3.47 × 10⁻⁰⁵; Table 2 and Fig. 4). The consistent support for models including gene flow between ancestral populations (m_ANC), indicate that vicariance with multiple contacts (probably during glacial-interglacial cycles) is likely to have led to the current genetic structure of the species (i.e. isolation with gene flow). The best supported scenario (Model B) is the one considering gene flow between recently split populations across time, with higher contemporary migration rates among closer populations from Northern and Central genetic clusters (Fig. 4). Migration models involving gene flow with the Southern lineage were either not supported (Model C) or yielded point estimates of migration rates an order of magnitude lower (m_C-S = 1.83 × 10⁻⁰⁶) than those inferred between Northern and Central populations (Model A) (Table 1; Fig. 4). These results are in agreement with Bayesian clustering analyses, which showed considerable genetic admixture (~25%) among nearby populations from Northern and Central genetic clusters but no signature of admixed ancestry for the Southern lineage (Fig. 1d). Thus, despite the small distribution range and the relatively short geographical distances separating the extant populations of D. crassiusculus, our results indicate that this species shows a remarkable genetic structure that is comparable to that reported for other Orthoptera taxa with patchy distributions and forming highly fragmented populations^27,28,51,52.

Our landscape genetic analyses indicate that geographical distance, the spatial distribution of suitable habitats, lithology or topography do not explain per se the degree of genetic differentiation among populations and revealed that the limits of major river basins are the main factor explaining large-scale patterns of genetic structure in D. crassiusculus (Fig. 1a). These results are in agreement with inferences from Structure and PCA analyses, which showed that the populations of the species are clustered according to the limits of main river basins: Northern genetic cluster in Tagus river basin, Central genetic cluster in Guadiana river basin, and Southern genetic cluster in Guadalquivir river basin (Fig. 1a). Apart from numerous freshwater fishes^53,54, the importance of palaeo- and modern drainages in structuring genetic variation has been also reported in another steppe specialist grasshopper (Mioscirtus wagneri) presenting highly fragmented populations and inhabiting a similar geographic area⁵¹ and in geckos (genus Rhynchoedura) from arid regions of Australia⁵⁵. These results indicate the importance of this landscape feature on the evolutionary histories of terrestrial organisms from steppe and arid landscapes⁵⁵. Rivers themselves do not seem to be an important barrier to dispersal in our study system, as populations located within the same basin but at different sides of main river stems or their tributaries (e.g. TAJU and BELI) show low levels of genetic differentiation in comparison with populations located in different basins⁵⁶. Estimates of divergence time among contemporary populations of D. crassiusculus (~17–126 ka; Table 2) and the timing of species split from its sister taxon D. kraussi (1.01 Ma⁴⁸) indicate that the origin of the species and its different lineages is probably posterior to the formation of the main river basins from the central-south Iberia, which are thought to have acquired their current configuration during the Oligocene-Pliocene^54,55,57. Thus, the different genetic clusters and lineages are not likely to have resulted from population isolation in different palaeodrainages or ancient geological surfaces²⁶, but probably reflect the role of river drainages and lowlands as corridors of suitable habitat facilitating connectivity among populations located within the same basin⁵¹. Given that a predominantly flat landscape characterizes the distribution area of D. crassiusculus and the main drainages are not separated by an abrupt topography (i.e. mountain systems), our results suggest that populations of the species have probably remained linked to lowlands (e.g. pseudo-steppe saline low grounds) from different river basins^51,58 rather than physically separated by ridges representing the divides between drainages. Accordingly, our analyses indicate that other landscape features such as topographic roughness (slope) or the distribution of the typical habitats and lithological formations occupied by the species are not important factors explaining spatial patterns of genetic structure in D. crassiusculus (Table 3). Previous studies have identified topographic roughness as a relevant factor shaping genetic differentiation in two montane grasshoppers inhabiting areas with abrupt landscapes^25,59, a situation contrasting with the predominantly flat areas characterizing the distribution range of D. crassiusculus³. The widespread presence of sedimentary lithologies (evaporites, limestones, and conglomerates) across the distribution range of the species could have reduced our ability to identify barriers to dispersal linked to unsuitable geological formations or, alternatively, might reflect the capacity of the species to cross them. In any case, we must point out that our landscape genetic analyses should be interpreted with extreme caution, given that the very few extant populations of the species (n = 6) strongly limit the power of our analyses and the scope of the obtained inferences.

Coalescent-based analyses support the fact that range-wide patterns of genetic structure in D. crassiusculus are a consequence of ancient processes of population fragmentation (~17–126 ka; Table 2) that predate the Anthropocene. Accordingly, landscape genetic analyses suggest that land clearing for agriculture is not likely to explain large-scale patterns of genetic fragmentation (Fig. 5; Table 3). Based on the degree of divergence between the different lineages and genetic clusters, we recommend that the Northern, Central and Southern groups are recognized as Evolutionarily Significant Units (ESU)⁶⁰, Designatable Units (DU)⁶¹ or Conservation Significant Units (CSU)⁶². These entities are likely to be substantially reproductively isolated from each other, represent an important component in the evolutionary legacy of the species, and include all discrete genetic and geographic subunits below the species level for status assessment, establishing conservation priorities and setting on-ground management strategies^60,63. Of particular concern is the highly divergent Southern lineage because, as far as we know, it is currently represented by a single small population (ORCE) within the Guadalquivir river basin and Andalucía region (Fig. 1a,d). The correspondence between the identified units (lineages and genetic clusters) and the circumscription of different government administrations (Madrid, Castilla-La Mancha and Andalucía regions) could facilitate the establishment of regional conservation plans aimed at implementing the most efficient management strategies within each territory. Although nearby populations (TAJU-BELI and PHUE-SALI) showed no apparent signatures of genetic fragmentation (Fig. 1d), we must point that several lines of evidence suggest that this finding is not incompatible with a dramatic impact of human activities on the decline of the species at local and regional scales³. For instance, historical museum records indicate that many populations from the Northern cluster (specifically, in Madrid province) have been extirpated in the last decades^3,49. All remaining populations are extremely small and submitted to severe impacts of human intervention (e.g. land ploughing, urbanization) and stochastic phenomena (e.g. flash flooding) that have been linked to sharp population declines^3,49. The expected time-lag between population fragmentation/declines and the genetic consequences of such processes (disruption of gene flow, genetic differentiation, loss of genetic diversity, etc.) might explain why recent human impacts have not been yet reflected in spatial patterns of genetic variation⁶⁴. Unfortunately, the small number of extant populations at local/regional scales (1–2 populations/genetic cluster), makes difficult to perform detailed analyses to evaluate the role of current landscape structure (e.g. land clearing for agriculture, urbanization, etc.) on the genetic connectivity of contemporary populations^27,65. Future genomic analyses of specimens available in museum collections³ could help to determine temporal changes in genetic diversity and study past patterns of gene flow in relation with historical landscape composition⁵².

Overall, our genomic data support that the different lineages and genetic clusters of D. crassiusculus can be regarded as independent units that require adequate conservation and management strategies to preserve their idiosyncratic evolutionary histories. Conservation actions for D. crassiusculus should be focused on the preservation of areas with sensitive habitat occupied by the main lineages and units delineated by our genomic analyses. These should include the control of negative human interventions and the monitoring of local populations, actions that could also benefit other co-distributed and poorly-known species with similar ecological requirements and fragmented populations linked to gypsum and salt steppes of the Iberian Peninsula^{7,10,27,66,67}. Given the extremely low number and size of extant populations of the species, ex-situ conservation plans and reintroduction/translocations programmes in restored habitats could help to reduce the chances of species/lineage extinction^68,69. These conservation actions should always consider the genomic singularity of the different units identified in this study and be accompanied with long-term habitat management and population monitoring^68,70. Future studies including detailed ecological information (e.g. diet analyses⁷¹) and genome scans to detect potential loci under selection implicated in ecological adaptation^72,73 would be of great help to get a better understanding of the processes underlying the evolutionary history of the different lineages and refine the conservation actions for this endangered species.

Methods

Study area and sampling

During 2008–2015, we sampled six populations of Dociostaurus crassiusculus (Pantel, 1886) (Fig. 1; Table 4). All the populations were found in areas with a particular lithological composition (evaporites, limestones, and conglomerates) and with plant communities linked to gypsum or hypersaline soils. We are confident that these populations cover the entire distribution range of the species, as other areas with potentially adequate habitats (i.e. pseudo-steppe saline grounds, wastelands with halophytic vegetation and surroundings of hypersaline/saline lagoons with marl-gypsum outcrops) have been extensively prospected without any records of the species^3,48. Dociostaurus crassiusculus has been recently assigned to the category “endangered” in the IUCN red list of threatened species due to the high fragmentation of its very small size populations^3,44 and, for this reason, we only collected 5–6 adult individuals per population. We aimed at collecting an equal number of males and females in each locality, but samples sizes are often male-biased due to very low female numbers in some populations. Monitoring of some of the studied populations indicates that the abundance of D. crassiusculus in years before and after sampling was qualitatively similar, which suggests that the removal of only 5–6 individuals per locality had little impact on the population dynamics of the species. Fresh whole specimens were stored in 2,000 µL ethanol 96% at −20 °C until used for genomic analyses.

Table 4 Geographical location and number of samples (n) for the studied populations of the Iberian cross-backed grasshopper Dociostaurus crassiusculus.

Full size table

DNA extraction and genomic library preparation

We used NucleoSpin Tissue kits (Macherey-Nagel, Durën, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was individually barcoded and processed into one genomic library using the double-digestion restriction-fragment-based procedure (ddRADSeq) described in Peterson, et al.⁷⁴. In brief, DNA was doubly digested with the restriction enzymes MseI and EcoRI (New England Biolabs, Ipswich, MA, USA) and Illumina adaptors including unique 7-bp barcodes were ligated to the digested fragments. Ligation products were pooled, size-selected between 475–580 bp with a Pippin Prep (Sage Science, Beverly, MA, USA) machine and amplified by PCR with 12 cycles using the iProof^TM High-Fidelity DNA Polymerase (BIO-RAD, Hercules, CA, USA). The library was sequenced in a single-read 150-bp lane on an Illumina HiSeq. 2500 platform at The Centre for Applied Genomics (SickKids, Toronto, ON, Canada).

Genomic data processing and bioinformatics

We used both Stacks v. 1.35^75,76,77 and PyRAD v. 3.0.66⁷⁸ to assemble our sequences into de novo loci and call genotypes. This allowed us to examine the robustness of our analyses based on SNP datasets obtained using two of the most popular programs currently available to assemble RAD-seq data^76,78. The choice of different filtering thresholds using either Stacks or PyRAD had little impact on the obtained inferences⁵⁰. For this reason, unless otherwise indicated, all downstream analyses were performed using a SNP dataset obtained with Stacks including only those loci that were represented in at least four populations (p = 4). See Supplementary Methods for additional details on sequence assembling and data filtering.

Population genetic statistics

Population genetics statistics, including major allele frequency (P), nucleotide diversity (π), observed (H_O) and expected (H_E) heterozygosity, and the Wright’s inbreeding coefficient (F_IS), were calculated using the program populations from Stacks⁷⁵. For biallelic RADSeq loci, π is an estimate of expected heterozygosity and is therefore a useful measure of the genetic diversity of populations. Furthermore, F_IS measures the reduction in observed heterozygosity as compared to expected heterozygosity for an allele in a population, with positive values indicating non-random mating or cryptic population structure^79,80,81,82. Pair-wise F_ST values of genetic differentiation were calculated between all pairs of populations in Arlequin v.3.5⁸³. We used PGDSpider v. 2.1.0.3⁸⁴ to convert Variant Call Format (VCF) files provided by Stacks into the correct format needed for Arlequin.