Genomic signatures of drift and selection driven by predation and human pressure in an insular lizard

Genomic divergence was studied in 10 small insular populations of the endangered Balearic Islands lizard (Podarcis lilfordi) using double digest restriction-site associated DNA sequencing. The objectives were to establish levels of divergence among populations, investigate the impact of population size on genetic variability and to evaluate the role of different environmental factors on local adaptation. Analyses of 72,846 SNPs supported a highly differentiated genetic structure, being the populations with the lowest population size (Porros, Foradada and Esclatasang islets) the most divergent, indicative of greater genetic drift. Outlier tests identified ~ 2% of loci as candidates for selection. Genomic divergence-Enviroment Association analyses were performed using redundancy analyses based on SNPs putatively under selection, detecting predation and human pressure as the environmental variables with the greatest explanatory power. Geographical distributions of populations and environmental factors appear to be fundamental drivers of divergence. These results support the combined role of genetic drift and divergent selection in shaping the genetic structure of these endemic island lizard populations.


Results
A total of 6.8 billion paired-end reads of 101 bp length were generated from the 91 individuals. Following application of denovo_map.pl and described filtering steps, 288,286 SNPs were called from 80,091 ddRAD contigs, with a mean coverage of 28.6 per site. The first SNP for each locus was retained leaving 72,846 SNPs for analysis (this number is fewer than the number of loci due to removal of SNPs present in only 20% of individuals).
Population structure. Nucleotide diversity ranged between 0.120 (Porros islets) and 0.182 (Cabrera harbour). Foradada, Esclatasang and Porros presented the highest number of private alleles (746, 475, and 945, respectively) indicating considerable genetic divergence, with little or no gene flow between them and the other populations, probably due to their strong geographical isolation. In general, inbreeding coefficients (F IS ) were low (less than 10%) (Supplementary Table 1). Patterns of divergence based on F ST distance analysis were highly congruent with previous results, with the populations of the islands of Menorca showing a clear differentiation with respect to the populations of the islands of Mallorca together with Cabrera populations (Supplementary Figure 1). Using all 72,846 SNPs, the greatest divergence was between Porros islet (Menorca) and all other populations from Mallorca and Cabrera and between the two Cabrera islets (Foradada and Esclatasang) and Menorcan populations. Lowest divergence was found between the two locations within Cabrera island (harbour and lighthouse), between the populations of the islands of Mallorca (Dragonera and Colomer) and Cabrera main island, and among all Menorcan islands (with the exception of Porros). The divergent position of Porros, Foradada and Esclatasang was less pronounced when only outlier SNPs (1,355 SNPs) were considered, while Mallorca populations were more divergent with respect to Cabrera populations (Supplementary Figure 1). The best-supported values of K in the Admixture analysis were K = 5 (CV = 0.372) or K = 6 (CV = 0.388) for the first single SNPs dataset. The divergent positions of Porros, Foradada and Esclatasang islets was corroborated by these results; Dragonera and Colomer grouped with Cabrera main island with K = 5 or formed an independent group with K = 6 ( Fig. 2). When only outlier SNPs were used, Admixture analyses supported separation into three geographic groups (Menorca, Mallorca and Cabrera), with the exception of Porros islet, when K was set to four (CV = 0.288). When K = 6 (CV = 0.294), Porros, Aire and Foradada were revealed as independent groups (Fig. 2).
Patterns of differentiation observed in the previous analysis match with the population structure obtained with DAPC analyses. The k-means clustering algorithm, used prior to DAPC analyses revealed lowest BIC values (637.3) for 10 clusters. Cross-validation showed that use of the first 15 PCs (55.3% of variance) provided higher assignment rates (99.5%) and the lowest root mean squared error (RMSE) (0.016), justifying the use of this subset of PCs in the analysis. The first PC (51.2% of variance) separated all populations into two major groups: Menorcan populations and all the remaining populations from Mallorca and Cabrera. All lizard populations were grouped by island (Cabrera main island, Dragonera, Porros, Aire, Foradada, Esclatasang and Colomer), except for Rei and Colom islets in Menorca that grouped together. Ten clusters were also favored when analyses were carried out using only SNPs that were candidates for selection, and variance was best explained by 25 PCs (90.2% of variance). In this case, the first PC (91.4%) reinforced the clear separation between Menorca islands and Mallorca islands and Cabrera populations. The populations grouped geographically (Menorca, Mallorca and Cabrera), except for Porros islet which continued showing a divergent position (Supplementary Figure 2). NJ tree based on F ST distances (Fig. 3) confirmed the results found using the admixture analysis.
As expected, positive association had been obtained between N and N e , and between N and nucleotide diversity (pi) and N e and pi. Negative correlations had been achieved between mean F ST and N, but not with N e (data not shown). Migration rates (estimated by divMigrate) did not show gene flow between Menorca islands and Mallorca islands and Cabrera populations (Fig. 4a). The highest migration rates were observed between Aire, Colom and Rei islands in Menorca (0. 68-0.89) and between the two localities situated in Cabrera main island (harbour and lighthouse) (0.88-1.00). These migration rates are almost symmetrical. The population from the smallest islet (Porros) did not showed gene flow even with other proximate populations. Directional migration from the populations of the islands of Mallorca (Dragonera and Colomer) to Cabrera archipelago was also observed (0.25-0.44). The Fig. 4b, showed an asymmetric and high migration rate from Mallorca islands to Cabrera archipelago, and low values between Mallorca islands/Cabrera and populations of the islands of Menorca.
Candidate regions under selection. A total of 1,355 candidate sites for selection from 72,846 RAD tags were determined by BayeScan under a prior of 1:100 for selected:neutral sites. This increased to 2,884 sites when a ratio of 1:10 was used, and decreased to 732 sites when the prior ratio was 1:1000. Comparison of prior and posterior proportions suggests a true ratio between 1:10 and 1:100 and so our use of a 1:100 prior provides quite conservative results. After filtering, a total of 141 of the 184 RAD sites that contained outlying SNPs produced hits on BLASTn and hits with < 30% query coverage were discarded (Table 1).
Environmental association analysis. The RDA analysis that used all SNPs indicated that the variation explained by the environmental variables (20.1%) was lower than the unexplained variance (79.9%) (Fig. 5). However, when the analysis was based on only outlier SNPs (1,355), environmental variables explained most of the variation (60.4%). The low explanatory power obtained with all SNPs is not surprising given that we expect that most of the SNPs in our global dataset to be neutral and not associated with environmental predictors. A total of 58 loci with associations with environmental variables were detected, most of which were related to human pressure (53.5%) and predation (36.2%). Some of these associated SNPs have been found to be related to locomotory and feeding behavior (NEGR1, GRM1), perception of pain (GRM1), lipid metabolism (GDPD2) or ion transport (FHL1, FTH1, SLC9A6), microtubule formation (CLIP1), myoblast differentiation (MBNL3), embryonic development (INTS6L), pH regulation (SLC9A6), toxin transport (DNAJC17), cell adhesion (ESAM, NEGR1), hormone regulation (TG, NCOA1), brain development and cognition (SHROOM4).

Discussion
The RADseq methodology has been applied in other studies of squamate (lizards and snakes), increasing understanding of the processes related to genetic divergence and the identification of genomic regions of interest. The total number of SNPs obtained in this study (288,286) agree with the SNP density found in other RADseq studies of reptiles, with relatively high levels of diversity detected 30,31 . Population structure analysis revealed a clear genetic structure among all the populations of P. lilfordi, independent of whether we used SNPs from all RAD tags or just candidates for selection 32 . Major genetic structuring mirrors that found using mtDNA, with high levels of divergence between Menorca islands and Mallorca islands/Cabrera populations 19,20 . However, analyses of outlier SNPs revealed greater similarity between the northern Mallorca Colomer population and other Mallorca islands (Dragonera), which differs from the pattern found in mtDNA 20 . These results together with the high migration rate detected between Mallorca islands and the Cabrera archipelago populations supports a previous proposal 20 that the Colomer Island could be home of a relict population representative of the early population that once colonized Mallorca Island. The populations with the smallest population sizes (Porros, Foradada and Esclatasang islets) were most divergent with highest F ST values and the greatest number of private alleles relative to other populations, which supports previous findings 30,33 and is suggestive of genetic drift. Long-term isolation and small population size www.nature.com/scientificreports/ should lead to decreased genetic diversity and increased inbreeding coffecients [34][35][36][37] . While nucleotide diversity was low 38 , inbreeding values were under 10% 39,40 which is not indicative of an inbreeding effect. It is worth highlighting evidence of adaptive divergence among lizard populations based on F ST outlier tests. Almost 2% of total SNPs were candidates for selection. These loci were related to several functions with direct survival value such as tail regeneration, reproduction, lipid metabolism and circadian rhythm. Nonetheless,    We show that environmental variables appear to be an important driver of divergence between lizard populations after taking into account the effect of historical divergence. The RDA analysis revealed most SNPs that were influenced by the environment were associated with levels of predation and human pressure. These SNPs were involved in diverse functions most notably with feeding and locomotory behavior. The explanatory power of the remaining environmental predictors, such as the biotic capacity of islands, the presence of rats, or the existence of breeding colonies of gulls, is negligible. Some behavioral and physiological differences between populations can   www.nature.com/scientificreports/ be related to differences in predation and human pressures, as in the case of escape behavior in lizard populations with or without terrestrial predators. For example, predation pressure has previously been shown to influence flight initiation distance, distance fled, or hiding time in Balearic lizard populations [44][45][46][47] . Predation has traditionally been identified as a major selective factor shaping the morphological and demographic evolution of animal species 48 . Unlike many terrestrial vertebrates that have evolved in the presence of these selection pressures over millions of years, P. lilfordi has evolved for ~ 5.3 Ma in a pristine environment, free from terrestrial predators 16 . The subsequent arrival of humans ~ 5000 years ago caused a major change as allochthonous predators were introduced. Hence there is a strong association between indices of human pressure and predation pressure as a result of this Holocenic arrival 16,17,49 .
It is interesting that this selection has had a strong and detectable effect on the genomic structure of these populations in a relatively short time. This has been described in a few other studies [50][51][52] . However, to our knowledge, this is the first case where predator and human pressures have been functionally linked with possible selection on loci involved in physiological functions that are directly involved with locomotor and escape behaviors. Same human-driven factors are often responsible of rapid adaptation and current extinction crisis 53 . This fact implies that the study of rapid adaptation to novel environment changes, especially those related with humans, has an inmediate relevance to conservation biology. For this reason, the study of adaptive evolution need to be incorporate into conservation strategies of insular terrestrial vertebrates populations and specifically in the Balearic lizard. In this way, Ashley et al. 25 proposed the promotion of an evolutionary enlightened management in which conservation decisions need to take into account the evolutionary effects of anthropogenic changes.
Overall, our results reveal that both evolutionary processes, associated with isolation and small population size, and selective factors, related to environmental patterns (specifically human pressure and level of predation) have played a role in shaping divergence between Balearic lizard populations.

Methods
Sample collection, DNA extraction, library preparation, and sequencing. Tissue samples were collected from 94 lizards (P. lilfordi) from 10 different sampling locations across the Balearic archipelago ( Fig. 1 and Table 2). Populations were selected to cover a diverse range of substrates, orographies, plant cover, presence of terrestrial predator and human pressure, as well as different population sizes and different mtDNA clades ( Table 2). Total genomic DNA was extracted from each tissue sample using DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer's standard protocol with a specific RNase copurification step. DNA was quantified using the Thermo Fisher Scientific Qubit 3.0 Fluorometer (ThermoFisher Scientific) and quality evaluated using agarose gel and Nanovue Plus Spectrophotometer (GE Healthcare, UK Limited). Paired-end ddRADseq libraries were prepared and sequenced by Floragenex (Eugene, Oregon, USA), following Peterson et al. 28 and Truong et al. 54 protocols. Full details are provided in Supplementary Methods. Data processing and variant calling. Stacks v2.4 55 pipelines were used to process the sequence reads and call SNPs for each individual. First, a demultiplexing and quality filtering step was carried out using process_radtags with the default parameters. Clean reads were used to perform a de novo RAD assembly using the denovo_ map.pl pipeline. The percentage of missing genotypes for each individual was calculated using the -missing-indv Table 2. Characteristics and environmental variables of the studied populations. n number of samples used for every population, S island surface area in hectares, predation indexes absence of terrestrial predators = 0; one occasional predators in the island = 1; one widespread predator was or is present in the island = 2; two frequent predators present in the island = 3, human pressure uninhabited island and very difficult access = 0; sporadic human presence and easy access = 1; regular human presence and easy access = 2; previous permanent human presence with constructions but with an actual protection = 3; present and past human presence = 4.  56 and three individuals with more than 79% of missing data were removed. SNPs present in RAD tags found in at least 80% (R) of individuals (Supplementary Figure 3) and with a minimum allele frequency (MAF) of 0.05 were selected and exported into a VCF file using populations. One single SNP per RAD tag was called using populations to reduce the effects of linkage disequilibrium. See Supplementary Methods.
Population structure. Several analyses were used to characterize population structure of island lizard populations based on all RAD-tag information (single SNP selected from each tag, referred to as the all-SNP dataset: VCF file in Appendix S1) and using only outlier SNPs (see later for identification of outliers: VCF file in Appendix S2). First, two different programs, Stacks v4.2 55 and hierfstat R package 57 , were used to estimate levels of genetic variability among different lizard populations. Second, population structure was examined with Admixture v1.3.0 program 58 based on both datasets, for K = 2 to K = 10 co-ancestry clusters. Third, patterns of genetic divergence on both datasets were analyzed using two approaches. Discriminant Analysis of Principal Components (DAPC) was performed using the R package adegenet 59 to obtain an overall representation of the divergence between populations and Neighbor-Joining (NJ) trees were inferred using Mega 7 60 based on pairwise F ST distances. Effective population size (N e ) for each population has been estimated with the software NeEstimator v2.0.1 61 using the molecular coancestry method. Linear regression analyses between N and N e , pi and N, pi and N e , and F ST with N and with N e , was performed with Pearson correlation. To investigate migration rates between each locality and between each island (Mallorca, Menorca and Cabrera), migration networks were generated usind divMigrate function 62 in the R package diveRsity 63 based on G ST genetic distance 64 with 1000 bootstrap repetitions and a filter threshold of 0.25. More information is provided in the Supplementary Methods.

Test of selection and environmental association analysis.
Tests of selection was carried out to explore the role of divergent selection using BayeScan 65 . This program identifies candidate loci under selection using an F ST outlier approach across all sampled populations. The BayeScan algorithm is based on an island model in which subpopulations differ from a common migrant pool. Thus, a departure from neutrality is identified at a SNP when the overall genome divergence between different subpopulations is insufficient to explain its diversity across these subpopulations.
Genome-environment association (GEA) is an important tool for the examination of local adaptation to heterogeneous landscapes 66,67 . Climatic variables were not used as environmental predictors because the Balearic lizard inhabits a reduced geographical range with minimal climatic variation 23 . Six environmental traits were considered: biotic capacity, number of vascular plants species, predation pressure, human pressure, and presence/absence of rats and gulls. All of these traits are related to natural resources on the islands and factors that potentially affect the lizards' survival and were known to show clear differences among the populations studied. Partial redundancy analysis (RDA) was used as a GEA method to identify adaptive loci based on associations between genetic data and environmental predictors 68 . See Supplementary Methods. Ethical statement. All tail tips samples used in this study were obtained in accordance with Ethical Guidelines of the Universities of Balearic Islands and Salamanca, particularly, following the Bioethics Committee Guidelines of the University of Salamanca. The Ethical Committee from the University of Salamanca publishes general Guidelines concerning the experimental protocols with laboratory animals. These general Guidelines for laboratory animals can be read in http://www.usal.es. According to these Guidelines, only the requirements applicable to our study were implemented simply because we did not perform any experiment with lizards in captivity. Field protocols for the capture, handling and release of lizards (which was done at the site of capture a few minutes after the sampling of tail tips) were approved by the competent authority: the Nature Conservation Agency (Conselleria de Medi Ambient) of the Government of Balearic Island (permits: CEP 02/2018 and CEP 10/2016 to V.P.-M. and A. P.-C.).

Data availability
Individual raw sequences are available at the Sequence Read Archive (SRA) (BioProject ID: PRJNA645796). The VCF files with first single SNPs and only with outlier SNPs putatively under selection are found on Appendices S1 and S2.