Selection, recombination and population history effects on runs of homozygosity (ROH) in wild red deer (Cervus elaphus)

The distribution of runs of homozygosity (ROH) may be shaped by a number of interacting processes such as selection, recombination and population history, but little is known about the importance of these mechanisms in shaping ROH in wild populations. We combined an empirical dataset of >3000 red deer genotyped at >35,000 genome-wide autosomal SNPs and evolutionary simulations to investigate the influence of each of these factors on ROH. We assessed ROH in a focal and comparison population to investigate the effect of population history. We investigated the role of recombination using both a physical map and a genetic linkage map to search for ROH. We found differences in ROH distribution between both populations and map types indicating that population history and local recombination rate have an effect on ROH. Finally, we ran forward genetic simulations with varying population histories, recombination rates and levels of selection, allowing us to further interpret our empirical data. These simulations showed that population history has a greater effect on ROH distribution than either recombination or selection. We further show that selection can cause genomic regions where ROH is common, only when the effective population size (Ne) is large or selection is particularly strong. In populations having undergone a population bottleneck, genetic drift can outweigh the effect of selection. Overall, we conclude that in this population, genetic drift resulting from a historical population bottleneck is most likely to have resulted in the observed ROH distribution, with selection possibly playing a minor role.


INTRODUCTION
The inheritance of two genomic segments that are identical by descent due to mating of related individuals generates runs of homozygosity (ROH) in the offspring (Gibson et al. 2006). In recent years, with increased availability of high-density genomic data due to SNP arrays and whole genome sequencing, the study of ROH has increased (Ceballos et al. 2018a;Curik et al. 2014;Pryce et al. 2014). Previous studies have shown that the length, abundance and genomic location of ROH vary considerably between populations and can be affected by a number of related processes, including-but not limited to-selection, recombination, and population history (Ceballos et al. 2018b;Curik et al. 2014;Kardos et al. 2017). By investigating ROH, it is possible to gain insights into the relative importance of the mechanisms that cause them.
An allele under strong positive selection is expected to increase in frequency in a population, which will result in increased homozygosity at the site and neighbouring sites and eventually lead to a region where a ROH is common in the population (Sabeti et al. 2002). This concept has been widely applied to identify signatures of selection in humans (Nothnagel et al. 2010;Pemberton et al. 2012), livestock (Ghoreishifar et al. 2020;Kim et al. 2013;Mastrangelo et al. 2017;Peripolli et al. 2018;Purfield et al. 2017;Zhang et al. 2022), and a number of other organisms (Gorssen et al. 2021;Grilz-Seger et al. 2019;Kumar et al. 2020). By searching for regions in the genome where ROH are common, often called ROH hotspots (Pemberton et al. 2012) or ROH islands (Nothnagel et al. 2010), it is possible to identify genomic regions of interest and even identify genes under selection. For example, in dairy cattle, genes associated with lactation and milk yield are present in regions where more than 50% of the population harbour a ROH (Peripolli et al. 2018).
ROH length and position also correlate inversely with local recombination rate (Pemberton et al. 2012). Recombination rate varies within and between chromosomes (Johnston et al. 2017;Kawakami et al. 2014;McVean et al. 2004), and studies have shown that ROH hotspots tend to be sites of lower recombination rate (Kardos et al. 2017;Stoffel et al. 2021). In low recombination regions, long haplotypes are rarely broken up by meiotic crossovers, meaning they are more likely to come together in longer ROH. Conversely, in high recombination regions, long haplotypes are more likely to be broken down, resulting in shorter ROH (Bosse et al. 2012). Additionally, the power to detect ROH is higher in regions of low recombination rate, leading to an increased likelihood of identifying ROH, and in turn ROH hotspots (Kardos et al. 2017). Recombination rate is not, however, mutually exclusive to selection, and selection may interact with recombination rate to influence ROH distribution (Kardos et al. 2017). Selection can reduce genetic variation and effective population size (N e ) at closely linked loci (via a selective sweep), with the extent of reduced genetic variation dependent on the selection coefficient and local recombination rate (Charlesworth 2009). It is also important to note the potential for interactions between recombination rate and population history. Species or populations which are closely related are likely to have similar broad-scale recombination rate patterns across the genome, which can again result in more similar distributions of ROHs (Nothnagel et al. 2010).
Intuitively, population history events such as a population bottleneck or effective migration, can directly influence ROH abundance (Ceballos et al. 2018b;Foote et al. 2021;Mooney et al. 2021). The more related members of a population are, the more inbreeding will occur and the more ROH will be present. Individuals from the same populations tend to have a similar inbreeding coefficients, number of ROHs and distribution of ROH lengths, reflecting their shared population history Bosse et al. 2012;Cardoso et al. 2018). A more specific consequence of population history can occur following a population bottleneck, where drift and selection play an important role in shaping the ROH landscape. Genetic drift may result in certain alleles becoming fixed by chance, increasing the likelihood of generating ROHs at a site under inbreeding. Moreover, the interaction between effective population size (N e ) and selection may be key in the formation of ROH hotspots. The product of N e and the selection coefficient (s) of a new mutation can determine its likelihood of fixation in the population (Falconer and Mackay 1983). When s is high, an allele is likely to become fixed; however, when N e is small, genetic drift can outweigh the force of selection (Petit and Barbadilla 2009).
Together, selection, recombination rate and population history are likely to play interacting roles in shaping the incidence and genomic locations of ROH in a population. However, few studies have investigated all these factors in a wild population. Here, we aimed to investigate the influence of each factor on ROH number and location in a wild population of red deer inhabiting the island of Rum, Scotland. This individually-monitored population has experienced a population bottleneck, shows evidence for inbreeding depression (Huisman et al. 2016) and has a detailed genetic linkage map (Johnston et al. 2017) making it an ideal population to address our aims.

Study populations
The red deer population inhabiting the north block of the Isle of Rum, Scotland (57°0′N, 6°20′W) has been studied at an individual level since 1971 (Clutton-Brock et al. 1982), and is the main focus of this study. DNA was extracted from ear punches from calves captured soon after birth or darted adults, post-mortem tissue or cast antlers (See Huisman et al. (2016) for full details). DNA samples were genotyped at 50,541 attempted SNP loci on the Cervine Illumina 50 K BeadChip (Brauning et al. 2015). SNP genotypes were clustered and scored using Illumina GenomeStudio v2.0, and were subject to quality control with the following parameters: SNP minor allele frequency (MAF) > 0.001, ID genotyping success >0.9, and SNP genotyping success >0.99. In addition, SNPs mapped to the sex chromosomes were removed. This resulted in a final data set of 39,587 autosomal SNPs genotyped in 3046 individuals (Johnston et al. 2017).
This study also used equivalent genotype data for 157 individuals from a mainland population of red deer from Argyll, Scotland. Details of sample collection and DNA extraction can be found in McFarlane et al. (2020). Samples were genotyped as above and quality control was carried out by McFarlane et al. (2020) as above with the exception of SNP genotyping success >0.9. These individuals were originally genotyped as part of a study of red × sika deer (Cervus nippon) hybridisation and were pure red deer according to that study (McFarlane et al. 2020).

Calling runs of homozygosity
We searched for runs of homozygosity in each genotyped individual using the -homozyg function in PLINK v1.90 (Purcell et al. 2007). We used two different estimates of marker positions to call ROH: physical distances in base pairs (bp) and genetic distance in centimorgans (cM) The genetic map accounts for variable recombination rate through the genome, whereas the physical map is independent of recombination (Kardos et al. 2017).
Recombination distance was based on a genetic linkage map previously constructed for the Rum deer population using 38,038 SNP markers and pedigree information (Johnston et al. 2017). Physical marker positions were obtained from the red deer (Cervus elaphus) genome assembly version mCerEla1.1 which was constructed using DNA from a female red deer originating from Rum . For consistency between searches, we assumed 1 Mb ≈ 1 cM (Actual value: 1 Mb = 1.04 cM; (Johnston et al. 2017)). The following parameters were used to identify ROH in both cases: -autosome-num 33 (a flag to specify the 33 autosomes in red deer), -homozyg-snp 40 (minimum number of SNPs in a ROH), -homozyg-kb 2500 (minimum length of a ROH in kb), -homozyg-density 70 (minimum density, 1 SNP per 70 kb), -homozyg-window-snp 35 (sliding window size), -homozygwindow-missing 4 (number of missing SNPs allowed in a window), homozyghet 0 (number of heterozygote SNPs allowed in a ROH), and -maf 0.01 (minor allele frequency threshold). Following the use of these parameters in PLINK 35,132 autosomal SNPs and 3046 individuals were analysed in the Rum dataset and 34,673 autosomal SNPs and 157 individuals in the Argyll dataset.
We used simulations (details given below) to show that the chosen search parameters for our average SNP density can capture the 'true ROH' present (Supplementary Figure 1).

Estimate of inbreeding coefficient
We used the called ROH to estimate individual genome-wide inbreeding coefficient, F ROH (McQuillan et al. 2008), using the sum of Mb or cM in ROH across all autosomes divided by the total Mb of autosomes estimated as 2591.86 Mb from the genome assembly mCerEla1.1. As above we assumed 1 Mb ≈ 1 cM (Johnston et al. 2017), therefore the same value was used as the denominator in both instances.

ROH hotspots
Following our ROH search, we determined ROH hotspots across the genome by estimating the proportion of individuals with ROH covering a SNP locus i.e. the ROH density. ROH density was calculated as a SNP-by-SNP measure as follows:

ROH density ¼ Number of individuals with a ROH at focal SNP Total number of genotyped individuals 100
We used the 99th percentile value for ROH density as the hotspot threshold, such that any SNP with a value equal to or above this threshold was classed as a hotspot SNP. This outlier/percentile approach has been used by a number of recent publications Cesarani et al. 2022;Mastrangelo et al. 2018;Purfield et al. 2017;Zhang et al. 2022).
We found that ROH density was positively correlated with the density of SNPs such that fewer ROH were identified in regions where fewer SNPs were genotyped. This issue with the PLINK algorithm for calling ROH has previously been identified for this type of analysis (Nandolo et al. 2018). Windows of size 1500 Kb (or equivalent in cM) and sliding 100 Kb (or equivalent) allowed for the assessment of the relationship between number of SNPs in a window and number of ROH found. Following assessment of both populations using the bp and cM datasets, we chose a minimum density of 23 SNPs per 1500 kb to minimise the correlation, but maximise consistency between methods and the number of SNPs remaining in the analysis ( Supplementary Fig. 2, Supplementary Table 1). All SNPs within windows that fell below this density were discarded from the analysis of hotspots. Additional post-ROH-search quality control removed the first and last 40 SNPs from each chromosome to account for the fact that fewer ROH will have been called in these regions as a ROH cannot span past the ends of a chromosome. (See Supplementary Table 1 for the number of remaining SNPs post-ROH search).

Haplotype diversity
As an additional analysis to detect possible signatures of selection through a drop in genetic diversity, we estimated haplotype diversity for the Rum dataset and in our simulated data. Phased autosomal haplotypes for each individual in the Rum dataset were obtained using AlphaPeel v0.0.1 (Whalen et al. 2018). Phasing was conducted using the Rum red deer pedigree and genomic dataset, using multi-locus peeling, which takes into account that nearby loci are more likely to be inherited jointly from a parental haplotype. Five peeling cycles were run, and only phased genotypes with a probability of >0.95 were retained. Phasing success was very high overall, with an individual mean phasing rate (i.e., the proportion of an individual's genotypes that were phased) of~99% in individuals born after 1980.
Phased haplotypes were then split into 20 SNP windows with a 10 SNP sliding increment and any window with missing calls within it was removed. The number of unique 20-SNP haplotypes in each window and the number of occurrences of each, was counted. A haplotype diversity measure was based on Simpson's diversity index for measuring biodiversity: where n = the number of occurrences of each haplotype and N = total number of occurrences of all haplotypes and the range of D is from 0 (low diversity) to 1 (high diversity).

Simulations
Simulations were carried out in SLiM v3 (Haller and Messer 2019) to test the effect of selection, recombination rate and population history on the distribution of ROH across a simulated 100 Mb chromosome. Each simulation model was run for 23 iterations. We used three different population history scenarios: one reflecting the Rum population, one with no population bottleneck and one with a more severe population bottleneck than the Rum population ( Fig. 1). All scenarios started with an effective population size of 7500 set to run for 75,000 generations as a burn-in (10*N e ) (Haller and Messer 2019). This starting effective population size was based on N e =~1/10 census population size (Frankham 2007). An estimation of the red deer population in Scotland at the beginning of the 20th century was~150,000 individuals, which has since doubled tõ 300,000 at the start of the 21st century (Pepper et al. 2020). Assuming a similar rate of population increase, the red deer population of Scotland (and the rest of the UK) was~75,000 at the beginning of the 19th century.
The population history of Rum is somewhat unknown, as an unrecorded number of individuals were introduced from various red deer herds in Scotland and England beginning in 1845 (Marshall 1998). Therefore, to simulate this bottleneck event in the Rum population history, we dropped N e to 100 at generation 75,000. The simulation ended at generation 75,030 (simulated present day) with a current N e of 100 individuals reflecting the current population of around 1000 individuals. In the more severe bottleneck scenario we dropped N e to 10 at generation 75,000, and then increased it to 100 at generation 75,005. As above the simulation ended at generation 75,030 with 100 individuals. The no bottleneck scenario maintained an N e of 7500 until generation 70,030 (Fig. 1). The output for all scenarios was the genomic data for the last simulated generation.
Four different models were tested for each population history scenario: 1) neutral; 2) varied recombination rate; 3) varied recombination rate plus selection; and 4) varied recombination rate plus stronger selection. All models had a mutation rate of 1 × 10 −8 per base pair per generation (Kyriazis et al. 2021).
Model 1 was a neutral model set to have a constant recombination rate across the chromosome at 1.038 cM/Mb based on estimates from the linkage map (Johnston et al. 2017), with every mutation set as neutral (i.e. selection coefficient = 0).
Model 2 had a varied recombination rate across the chromosome. All red deer chromosomes, with the exception of chromosome 5, are acrocentric (i.e. the centromere occurs very close to the end of the chromosome) and Johnston et al. (2017) demonstrated that recombination rate shows consistent broad-scale variation across chromosomes in red deer. For example, sexaveraged recombination rates are higher in peri-centromeric regions (0-25% of the chromosome length) in comparison to the rest of the chromosome where recombination rates are more constant, with a slight elevation towards the sub-telomere. We split the simulated 100 Mb chromosome into 10 Mb regions and applied a recombination rate to each region according to this observed data. For regions 1-10 the recombination rates were as follows: Model 3 incorporated neutral mutations (fixed selection coefficient = 0), beneficial mutations (mean selection coefficient = 0.001 drawn from gamma distribution with shape parameter = 0.2 and a dominance coefficient = 0.5 i.e. additive), and deleterious mutations (mean selection coefficient = −0.01 drawn from gamma distribution shape parameter 0.2, and dominance coefficient = 0.1 under the assumption deleterious mutations are partially recessive). Neutral, beneficial and deleterious mutations occurred in the ratio 3:1:10, respectively Kim et al. 2017;Kyriazis et al. 2021). This model also included varied recombination as specified for Model 2.
Model 4 included variable recombination as in model 2 but higher selection coefficients than in model 3. We increased the selection coefficients by a factor of five, i.e. beneficial mutations had a mean selection coefficient = 0.005 and deleterious mutations had a mean selection coefficient = −0.05).
Following simulation runs 100 (and 7500 for the no bottleneck scenario) individual genomes were output from SLiM and ROH were called with parameters as described above without the minor allele frequency threshold. In addition, we carried out haplotype diversity calculations on these outputs as described above, further details are given in Supplementary Figs. 7 and 8. All SLiM scripts are available at: https://github.com/ annamayh/ROH_distribution_red_deer.

ROH abundance and length between populations
Across all 3046 Rum deer, we found 68,643 ROH using the physical map (bp positions) and 45,814 ROH using genetic map (cM positions). In the 157 Argyll deer, the equivalent figures were 1825 and 1227 ROH, respectively. The Rum population had significantly more ROH per individual and higher inbreeding coefficients (F ROH ) than the Argyll population (

ROH hotspots
As summarised above, we found that the marker positions used (bp or cM positions) significantly affected the number of ROH found in the populations (Table 1). We next explored this map effect on the genomic distribution of ROH in the Rum population. We found eight regions on six different chromosomes that were classed as ROH hotspots using the physical map positions, whereas only five regions were ROH hotspots using the genetic map positions, all of which were previously identified using the physical map positions (Fig. 2 We also looked for ROH hotspots using only short ROH, between 2.5 and 5 Mb. As above, we found a differences between the maps used. We also find the same three ROH hotspots on chromosomes 15, 18 and 28 when using the genetic map positions, plus four other loci reaching just over the threshold (Supplementary Fig. 4).

Haplotype diversity
A low haplotype diversity indicates that certain haplotypes are more common than others, whereas high haplotype diversity indicates haplotypes occur in equal number. We found a sharp decrease in haplotype diversity in two of the five ROH hotspots, on chromosomes 15 and 28, Fig. 3. This sharp decrease is consistent for chromosome 15 when different window sizes are used (supplementary Figs. 5,6). However the diversity decrease on chromosome 28 appears more dependent on window size, with a Difference between map positions used within each population were significant. One-way ANOVA of map on number of ROH per ID, p-values: Rum dataset <2.2e-16 and Argyll dataset <9e-12. Differences between populations within maps were significant. One-way ANOVA of population on number of ROH per ID, p-value <2.2e-16 for both maps.
c Difference between map positions used within each population were significant. One-way ANOVA of map on F ROH , p values: Rum dataset <2.2e-16 and Argyll dataset <3e-7. Differences between populations within maps were significant. One-way ANOVA of population on F ROH , p-value < 2.2e-16 for both maps.   Fig. 7). On the other hand, simulations of the Rum scenario including strong selection have multiple drops in haplotype diversity ( Supplementary Fig. 8), presumably due to the imposed selection coefficients. Moreover, some, but not all, of these drops coincide with ROH hotspots.

Simulations
In our simulations, we ran 23 iterations of four model designs within three different population histories. In Fig. 4, we show the ROH hotspot threshold (Fig. 4A) and maximum ROH density (Fig.  4B)   The table shows the mean ROH density, with standard deviation (SD) and the ROH hotspot threshold. Any SNP with a value of ROH density above this threshold value is classed as a ROH hotspot SNP. The table also shows the number of SNPs classed as a ROH hotspot SNP and the chromosomes containing ROH hotspots. The notations (a), (b) indicate independent hotspots on the same chromosomes. Fig. 3 Haplotype diversity for 20-SNP windows in 10 SNP sliding increments across 33 autosomes in the Rum population of red deer. Low haplotype diversity indicates that certain haplotypes are more common; high haplotype diversity indicates haplotypes occur at more even frequencies. ROH hotspot regions are shaded in purple.
we compared ROH density between population histories (compare across panels in Fig. 4) there was noticeable variation. As expected, these differences in ROH patterns between different scenarios reflected the expected level of inbreeding and N e in the simulated populations. Additionally, the severe bottleneck model has the most variation between iterations, further highlighting the impact population history, particularly a bottleneck, can have on ROH. We found little impact of varied recombination rate or weak selection on ROH. In all population histories, Models 2 and 3 did not differ greatly from the neutral Model 1 (Fig. 4). On the other hand, strong selection did have a substantial effect, particularly in the no bottleneck scenario. Model 4 showed higher maximum ROH density and hotspot threshold values (Fig. 4). This pattern is most visually obvious in the no bottleneck scenario and least in the severe bottleneck scenario (Fig. 4 left and right panels, respectively).
In order to assess the likelihood that ROH hotspots identified in the Rum population may be caused by selection, we compared the observed data to our simulated data. We show that results from Model 3, including weak selection, are not appreciably different from the neutral model and both overlap with the empirical values ( Fig. 4A middle panel, Supplementary Fig. 9). Model 4, including strong selection, does not overlap with the empirical data for the ROH threshold, with the empirical value for this falling outside the 95% confidence intervals (Fig. 4A middle panel, Supplementary Figure 11). This indicates that most ROH hotspots in the Rum dataset falling over this threshold are not caused by strong selection. However, the maximum ROH density in the Rum dataset is within the 95% confidence intervals from Model 4 and coincides with the mean across iterations (Fig. 4B middle panel, Supplementary Fig. 10), suggesting that the maximum ROH hotspot peak(s) seen in Fig. 2 could be areas of strong selection. However, this maximum value also overlaps with the confidence intervals of other models ( Supplementary Fig. 10).

DISCUSSION Population history and inbreeding level
In this study, we investigated the roles of population history, recombination and selection on the number and distribution of ROH in a wild red deer population, and combined this with forward genetic simulations. We first found that the Rum population had a higher level of inbreeding than the comparison Argyll population, and this difference was consistent between methods (Table 1). This difference probably reflects the different recent histories of the two populations. The Rum population became isolated from the mainland~150 years ago, has had few introductions since and currently numbers around 1000 individuals, therefore is more likely to inbreed, while the Argyll population is continuous with the rest of the Scottish mainland population which numbers in the hundreds of thousands. The direct influence of population history on ROH number has been previously documented in a number of wild species (Foote et al. 2021;Nguyen et al. 2022), including in a study of red deer which included Rum deer samples (de Jong et al. 2020).
Of all the variables tested within our simulations, population history had the greatest influence on ROH distribution, reflecting the level of inbreeding in the simulated populations (Fig. 4). For example, the simulations with a severe bottleneck had the highest number of ROH as a result of a smaller N e , reflecting observed results from bottlenecked populations (Bosse et al. 2012;Kardos et al. 2018). All models in the severe bottleneck scenario also had considerable variability among simulation runs. This variability is presumably due to the unpredictable effect that genetic drift can have on a population following a bottleneck. In the case where little diversity is preserved through a bottleneck, followed by inbreeding, ROH number will increase genome-wide. In another scenario, if a wider range of genetic diversity is preserved, the ROH number remains lower.
We also show population history interacting with selection, with little evidence for the effect of strong selection in the severe bottleneck scenario compared to a strong effect on ROH in the nobottleneck scenario. This is a nice demonstration of the relationship between effective population size and efficiency of selection, in which populations experiencing a bottleneck show weaker responses to selection (Falconer and Mackay 1983;Kardos et al. 2021). In a large non-bottlenecked population there is increased genetic variation and selection is more efficient (Kardos et al. 2021;Petit and Barbadilla 2009), allowing the beneficial alleles to become fixed and in doing so generate ROH hotspots. In contrast, in a smaller bottlenecked population, such beneficial alleles may be lost during the bottleneck and instead ROH hotspots can be the result of drift and inbreeding.

Recombination rate
To investigate the effect of recombination rate on ROH in our dataset we used two maps, physical and genetic, which affected the number of ROHs detected and their location (Tables 1 and 2; Fig. 2). In the Rum population three ROH hotspots found using the physical map were no longer classed as hotspots when using the genetic map. In addition, we found that some hotspots reached further over the hotspot threshold in the genetic map than in the physical map (Fig. 2). The differing results are due to the negative correlation between recombination rate and ROH density, which is known from previous studies (Bosse et al. 2012;Pemberton et al. 2012).
In contrast to expectation and past literature (Bosse et al. 2012;Kardos et al. 2017), the addition of varied recombination rate in our simulation did not yield results that differed significantly from a neutral model (Fig. 4). However, other models simulating the effect of recombination rate used higher values for recombination, to reflect the organism studied (Kardos et al. 2017). In red deer, the maximum recombination rate recorded anywhere in the genome is 4 cM/Mb and the average across 32 acrocentric autosomes is 1.038 cM/Mb (Johnston et al. 2017). In our simulations, the maximum recombination rate was 1.75 cM/Mb based on the average acrocentric values. By using representative values for our study organism we may not have seen such a dramatic effect of recombination rate as other studies. However, as discussed above, accounting for variable recombination rate did affect the ROH hotspots. Perhaps for this population, recombination may be an aid to the formation of ROH hotspots, but not the primary source.

Selection
A number of studies have identified ROH hotspots to be sites of positive selection (Grilz-Seger et al. 2019;Peripolli et al. 2018;Purfield et al. 2017;Sabeti et al. 2002;Shihabi et al. 2022). When accounting for the influence of recombination rate, the red deer population on Rum showed five genomic regions with a ROH in more than 15% of the population (Fig. 2). It is possible these regions may be sites of positive selection. However, we are cautious to draw this conclusion. A relatively low number of individuals in the total population contained a ROH at a hotspot, in comparison to other ROH hotspot studies, which are as high as >80% in livestock and~20% in humans (Pemberton et al. 2012;Purfield et al. 2012;Purfield et al. 2017)-although direct comparison between studies is difficult due to the minimum ROH length employed. Instead, the regions we identify here may be sites generated by genetic drift. Therefore, we further assessed the likelihood that these ROH hotspots are caused by selection or drift.
From our simulations, we conclude that the majority of the ROH hotspots observed are unlikely to be caused by strong selection. Inclusion of strong selection in the Rum population scenario increased the ROH hotspot threshold over that in the empirical dataset (Fig. 4A). We cannot, however, conclude the ROH hotspots are purely a result of genetic drift, as simulations suggest that ROH hotspots are equally likely to occur as a result of neutral processes as under weak selection (Fig. 4A) We tentatively suggest that the maximum peaks in ROH density in the Rum data (i.e. the two highest ROH hotspots on chromosomes 15 and 28, Fig. 2) could be the result of selection. First, there is a sharp decrease in haplotype diversity at the onset of these two ROH hotspots (Fig. 3). It can be assumed that positive selection leads to a sudden decrease in haplotype diversity (Shihabi et al. 2022), as certain haplotypes are being favoured. This assumption is also supported by our simulations. We saw multiple drops in haplotype diversity in the strong selection Rum simulation, some of which coincide with ROH hotspots, but we saw very few haplotype diversity drops in the neutral simulation. Second, these hotspots are also apparent when using only short ROH, between 2.5 and 5 Mb ( Supplementary Fig. 4). Short ROH reflect more ancient inbreeding where IBD segments are broken down by recombination over time (Kirin et al. 2010;McQuillan et al. 2008). Hence, short ROH have been subject to selection pressures for a longer period of time and are more likely to show evidence for long-term selection (Zhang et al. 2015). Finally, the empirical value for the maximum ROH density (Peak ROH density, Fig. 2) overlap values generated from the strong selection simulation (Fig. 4B. Middle panel). However this empirical value was also within the range generated from the neutral or weak selection simulations.
The strength of selection certainly plays an important role in the formation of a ROH hotspot. Increasing the selection coefficient (s) in our simulations resulted in an increase in ROH density. This was particularly evident in the non-bottlenecked population (as high N e allows for a stronger response to selection). By having high s, these new alleles are likely to go to fixation and generate a ROH hotspot. However, in artificially selected organisms s is much higher than the value we chose for our simulations (mean s = 0.005). Therefore, in these populations with very high selection coefficients, selection is likely to be the driver of the majority of ROH hotspots (Kim et al. 2013). In our bottlenecked simulation, it is likely that genetic drift outweighed the strength of selection, as we see little response to the increase in s. Thus in populations with low N e , ROH hotspots may instead be sites of genetic drift rather than selection. A very recent estimate suggests the Rum population has a N e below 200 (Gauzere et al. 2022), and likely has lower values for s than artificially selected organisms. Therefore, the low N e together with the findings discussed here suggest that most ROH hotspots in this population likely emerged due to drift, but we cannot exclude the possibility that some hotspots are the result of strong selection.

CONCLUSION
Our simulations have offered a valuable way to assess how different factors may be affecting the distribution of ROH in our study population, and highlight the danger of assuming all ROH hotspots are sites of selection. Overall, simulations show that population history has the most substantial effect on ROH number and density. Empirical data supports the effect of population history, with particular influence on the number of ROH. We also show in our simulations that when N e is high, and/or selection is particularly strong, selection is the main driver of ROH hotspots. However, in populations with weak selection, genetic drift has the main impact on the formation of ROH hotspots. Simulations show no effect of recombination rate; however, empirical data suggests otherwise. This suggests that recombination rate may aid the formation of ROH hotspots in our study population but is not the sole cause. In our study population we conclude that the most likely driver of ROH formation is genetic drift, aided by genome-wide recombination. However, two ROH hotspots are also consistent with a strong selection scenario, but we can neither fully exclude drift as a driver.

DATA AVAILABILITY
The raw data files used in this paper are deposited in Dryad https://doi.org/10.5061/ dryad.mpg4f4r49.