Purging of highly deleterious mutations through severe bottlenecks in Alpine ibex

Human activity caused dramatic population declines in many wild species. The resulting bottlenecks have a profound impact on the genetic makeup of a species with unknown consequences for health. A key genetic factor for species survival is the evolution of deleterious mutation load, but how bottleneck strength and mutation load interact lacks empirical evidence. Here, we take advantage of the exceptionally well-characterized population bottlenecks of the once nearly extinct Alpine ibex. The species survived one of the most dramatic bottlenecks known for successfully restored species. We analyze 60 complete genomes of six ibex species and the domestic goat. We show that historic bottlenecks rather than the current conservation status predict levels of genome-wide variation. By retracing the recolonization of the Alps by Alpine ibex, we find genomic evidence of concurrent purging of highly deleterious mutations but accumulation of mildly deleterious mutations. This demonstrates how human-induced severe bottlenecks caused both relaxed selection and purging, thus reshaping the landscape of deleterious mutation load. Our findings also highlight that even populations of ~1000 individuals can accumulate mildly deleterious mutations. Hence, conservation efforts should focus on preventing population declines below such levels to ensure long-term survival of species.


Abstract 25
Human activity caused dramatic population declines in many wild species. The resulting bottlenecks 26 have a profound impact on the genetic makeup of a species with unknown consequences for health. A 27 key genetic factor for species survival is the evolution of deleterious mutation load, but how 28 bottleneck strength and mutation load interact lacks empirical evidence. Here, we take advantage of 29 the exceptionally well-characterized population bottlenecks of the once nearly extinct Alpine ibex. 30 The species survived one of the most dramatic bottlenecks known for successfully restored species. 31 We analyze 60 complete genomes of six ibex species and the domestic goat. We show that historic 32 bottlenecks rather than the current conservation status predict levels of genome-wide variation. By 33 retracing the recolonization of the Alps by Alpine ibex, we find genomic evidence of concurrent 34 purging of highly deleterious mutations but accumulation of mildly deleterious mutations. This 35 demonstrates how human-induced severe bottlenecks caused both relaxed selection and purging, thus 36 reshaping the landscape of deleterious mutation load. Our findings also highlight that even 37 populations of ~1000 individuals can accumulate mildly deleterious mutations. Hence, conservation 38 efforts should focus on preventing population declines below such levels to ensure long-term survival 39 of species. 40 41 4 deleterious mutations. Alpine ibex were reduced to ~100 individuals in the 19th century in a single 70 population in the Gran Paradiso region of Northern Italy 37 . In less than a century, a census size of ca. 71 50'000 individuals has been re-established across the Alps. Thus, the population bottleneck of Alpine 72 ibex is among the most dramatic recorded for any successfully restored species. Most extant 73 populations experienced two to four, well-recorded bottlenecks leaving strong footprints of low 74 genetic diversity 38,39 . 75

85
Both Alpine and Iberian ibex experienced severe bottlenecks due to overhunting and habitat 114 fragmentation. We first analyzed evidence for purifying selection using allele frequency spectra. We 115 focused only on derived sites that were polymorphic in at least one of the two sister species ( Figure  116 2A, S8). We found that frequency distributions of high and moderate impact mutations in Alpine ibex 117 were downwards shifted compared to modifier (i.e. neutral) mutations, which strongly suggests 118 purifying selection against highly deleterious mutations ( Figure 2B, D). Short indels (≤ 10 bp) in 119 coding sequences revealed a similar downward shift ( Figure S9). We found no comparable frequency 120 shifts in Iberian ibex ( Figure 2B). This is consistent with purifying selection acting more efficiently 121 against highly deleterious mutations in Alpine ibex compared to Iberian ibex. We repeated the 122 analyses of the site frequency spectra also for three alternative scoring systems (GERP, phyloP and 123 phastCons) reporting phylogenetic conservation. We found for all three scores an excess in Alpine 124 ibex of the most deleterious mutation category ( Figure S10).

142
To test whether Alpine ibex showed evidence for purging of deleterious mutations, we calculated the 143 relative number of derived alleles Rxy 29 for the different categories of mutations ( Figure 2D). We 144 used a random set of intergenic SNPs for standardization, which makes Rxy robust against sampling 145 effects and population substructure 29 . Low and moderate impact mutations (i.e. mildly deleterious 146 mutations) showed a minor excess in Alpine ibex compared to Iberian ibex, indicating a higher load 147 in Alpine ibex. In contrast, we found that highly deleterious mutations were strongly reduced in 148 Alpine ibex compared to Iberian ibex (Tukey test, p < 0.0001, Figure 2E). Strikingly, the proportion 149 of SNPs across the genome segregating a highly deleterious mutation is higher in Alpine ibex ( Figure  150 1E), but Rxy shows that highly deleterious mutations have a pronounced downwards allele frequency 151 shift in Alpine ibex compared to Iberian ibex. Furthermore, the number of homozygous sites with 152 highly deleterious mutations per individual was significantly lower in Alpine ibex than Iberian ibex (t 153 test, p = 0.015, Figure 2E). Individual allele counts at highly deleterious sites were also significantly 154 lower in Alpine ibex compared to Iberian ibex (t test, p = 0.003). We assessed the robustness of the 155 Rxy analyses using four additional mutation scoring methods (i.e. SIFT, REVEL, CADD and 156 VEST3). We found that the highest impact category had a consistent deficit in Alpine ibex compared 157 to Iberian ibex ( Figure S12).. Together, this shows that highly deleterious mutations were 158 substantially purged in Alpine ibex. We also found evidence for the accumulation of mildly 159 deleterious mutations through genetic drift in Alpine ibex. term effective population sizes. For this, we used detailed demographic records spanning the near 174 century since the populations were established 47,48 . We found that nucleotide diversity decreased with 175 smaller long-term population size (Spearman's rank correlation, rho = 0.93, p = 0.007, Figure 3C). We 176 found the same trend for the individual number of heterozygous sites per kilobase ( Figure S14).

198
Bottlenecks affect the landscape of deleterious mutations by randomly increasing or decreasing allele 199 frequencies at individual loci. We find that individuals from populations that underwent stronger 200 bottlenecks carry significantly more homozygotes for nearly neutral and mildly deleterious mutations 201 (i.e. modifier, low and moderate impact mutations; Figure 4A). In contrast, individuals showed no 202 meaningful difference in the number of homozygotes for highly deleterious (i.e. high impact) 203 mutations across populations. The stability in the number of homozygotes for highly deleterious 204 mutations through successive bottlenecks despite a step-wise increase in the number of homozygotes 205 for weaker impact mutations, supports that purging occurred over the course of the Alpine ibex 206 reintroductions. We repeated the analyses using an alternative scoring of mutations based on the 207 phylogenetic conservation of the region in which the mutations were found (i.e. Genomic 208 Evolutionary Rate Profiling; GERP). The number of homozygotes for mutations in highly conserved 209 regions showed a slight upwards trend still indicating purifying but not necessarily purging for this 210 category of mutations ( Figure S15). This suggests that the mutational impact (e.g. premature stop 211 codons) rather than degree of conservation predicts whether purging is likely to occur. This suggests 212 also that mean fitness should be more directly assessed using e.g. simulation datasets (see below). 213 Because the above findings are contingent on a model where deleterious mutations are recessive, we 214 also analyzed the total number of derived alleles per individual. We find a consistent but less 215 pronounced increase in total number of derived alleles per individual for nearly neutral and mildly 216 deleterious mutations ( Figure 4B). In contrast, the total number of derived alleles for highly 217 deleterious mutations did not correlate with the strength of bottleneck and was lowest in the most 218 severely bottlenecked Alpi Marittime population ( Figure 4B), suggesting that the most deleterious 219 mutations were purged in this population. The Rxy statistics showed a corresponding strong deficit in 220 the Alpi Marittime population ( Figure 3E). We repeated the Rxy analyses using four additional 221 mutation scoring methods (i.e. SIFT, REVEL, CADD and VEST3) and found a consistent deficit of 222 the highest impact category in the Alpi Marittime ( Figure S12). Overall, we find evidence for more 223 purging in the most bottlenecked Alpine ibex population. founder size ( Figure 6A, S16, Table S1). We used Rxy to analyze the evolution of deleterious 248 mutation frequencies through the reintroduction bottlenecks. The simulations showed a deficit of 249 highly deleterious mutations, but an increase of mildly deleterious mutations after the reintroduction 250 bottlenecks ( Figure 6B). This is consistent with our empirical evidence for purging during the species 251 bottleneck ( Figure 2E). We computed genetic load defined as the mean individual fitness in females 252 and found an increase in Alpine ibex following the species bottleneck ( Figure S17) Ibex species with recently reduced population sizes accumulated deleterious mutations compared to 282 closely related species. This accumulation was particularly pronounced in the Iberian ibex that 283 experienced a severe bottleneck and Alpine ibex that went nearly extinct. We show that even though 284 Alpine ibex carry an overall higher mutation burden than related species, the strong bottlenecks 285 imposed by the reintroduction events purged highly deleterious mutations. Importantly, purging was 286 only effective against highly deleterious mutations. Mildly deleterious mutations actually 287 accumulated through the reintroductions. Hence, the overall number of deleterious mutations 288 increased with bottleneck strength. This is consistent with the finding that population-level 289 inbreeding, which is a strong indicator of past bottlenecks, is correlated with lower population growth 290 rates in Alpine ibex 50 . Alpine ibex individuals). Indels up to 10 bp were also retained and filtered using the same filters and 324 filter parameters, except for not including the filter MQRankSum, because this measure is more likely 325 to be biased for indels of several base pairs. Filtering parameters were chosen based on genome-wide 326 quality statistics distributions (see Figures S23 -S40). Variant positions were independently validated 327 by using the SNP caller Freebayes (v1.0.2-33-gdbb6160 56 ) with the following settings: --no-complex 328 --use-best-n-alleles 6 --min-base-quality 3 --min-mapping-quality 20 --no-population-priors --hwe-329 priors-off. 330 To ensure high-quality SNPs, we only retained SNPs that were called and passed filtering using 331 GATK, and that were confirmed by Freebayes. Overall, 97.5 % of all high-quality GATK SNP calls 332 were confirmed by Freebayes. This percentage was slightly lower for chromosome X (96.7%) and 333 unplaced scaffolds (95.2%). We tested whether the independent SNP calls of GATK and Freebayes 334 were concordant and we could validate 99.6% of the biallelic SNPs. We retained genotypes called by 335 GATK and kept SNPs with a minimum genotyping rate of 90% for all further analysis. The total 336 number of SNPs detected was 59.5 million among all species. Per species, the number of SNPs 337 ranged from 21.9 million in the domestic goat (N=16) to 2.0 million in Markhor (N=1 , Table S2).  Figures S4 and S5). Option --viterbi-training was used to estimate transition probabilities before 360 running the HMM. Running the analysis without the option --viterbi-training led to less but longer 361 ROH ( Figures S3-S5). ROH were also estimated using PLINK (v1.90b5, https://www.cog-362 genomics.org/plink/) with the following settings: --homozyg-window-het 2, 363 --homozyg-window-missing 5, --homozyg-snp 100, --homozyg-kb 500, --homozyg-density 10, --364 homozyg-gap 100, --homozyg-window-threshold .0. ROH estimates based on PLINK were overall 365 slightly lower but the qualitative trends hold among species and population (Figures S41 and S42). Deleterious mutations are assumed to be overwhelmingly derived mutations. We used all ibex species 419 except Alpine and Iberian ibex as an outgroup to define the derived state. For each biallelic site, 420 which was observed in alternative state in Alpine ibex or Iberian ibex, the alternative state was 421 defined as derived if its frequency was zero in all other species (a total of 44'730 autosomal SNPs). 422 For loci with more than two alleles, the derived state was defined as unknown. For comparisons 423 among all species, we only used the following criteria to select SNPs (370'853 biallelic SNPs 424 retained): transcriptional activity (FPKM > 0.3 in at least one organ) and GERP > -2. The minimum 425 GERP score cut-off was set to retain only high-quality chromosomal regions following previously 426 established practice e.g. 24 . We further followed the following categorizations adopted for human 427 populations to identify moderate to highly deleterious mutations e.g. 24 : -2 to +2 is considered neutral 428 or near neutral, 2 to 4 considered as moderate, 4 to 6 as large and >6 as extreme effects. We also 429 required a minimal distance to the next SNP of 3bp to avoid confounding effects of potential multi-430 nucleotide polymorphisms (MNPs). 431 432

Population genetic analyses 433
Site frequency spectra (SFS) were calculated using the R packages plyr and dplyr. SFS analyses were 434 performed for SnpEff and GERP categories and two additional conservation scores: phyloP 66 and 435 phastCons 67 . We chose a cutoff of 1 to distinguish conserved from less conserved sites. In the case of 436 phyloP, sites with a score above 1 were defined as conserved. For phastCons, sites with a score equal 437 to 1 (the maximum observed value) were considered as conserved. 438 439 For individual counts of derived alleles or homozygotes, we used all biallelic sites polymorphic either 440 in Alpine or Iberian ibex (or both) for which the derived state was known with a maximal missing rate 441 per locus of 10%. We retained all sites matching these criteria for any downstream analyses even if a 442 particular site was not polymorphic in any given population. The effective rate of missing data per 443 locus was between 0.03-0.07% ( Figure S43) and no correlation was found between missing rate per 444 population and counts. We found no qualitative differences if we included only loci with a 100% 445 genotyping rate ( Figure S44). 446 447 We calculated the relative number of derived alleles Rxy 29 for the different categories of mutations. 448 Rxy compares the number of derived alleles found at sites within a specific category. Following 9,29 , 449 we used a random set of 65592 intergenic SNPs for standardization, which makes Rxy robust against 450 sampling effects and population substructure. We performed the Rxy analysis for the four SnpEff Based on empirical evidence, we assumed a negative relationship between h and s 75 . We used the 474 exponential equation h = exp(-51*s)/2 with a mean h set to 0.37 following 76 . We assumed hard 475 selection acting at the offspring level. In addition to the 5000 loci under selection, we simulated 500 476 neutral loci. Recombination rates among each neutral or deleterious locus was set to 0.5. This 477 corresponds to an unlinked state. Initial allele frequencies for the burn-in were set to µ / h * s = 478 0.0014 (corresponding to the expected mean frequency at mutation-selection balance 77 ). Mutation 479 rate µ was set to 5e-05 and deleterious mutations were allowed to back-mutate at a rate of 5e-07. The high extinction rate of the zoo Interlaken Harder did not affect the outcome of the simulations. 490 The extinctions were a result of the strong reduction in population size during the founding and 491 occurred always after the founding (see also Figure S16). The extinctions of the reintroduced 492 populations did not affect the estimates of derived allele counts but reduced sample sizes and, hence, 493 affected the variance of estimators. Genetic load information was retrieved using the delet statistics 494 option and is defined in Nemo as the mean realized fitness (L =