Introduction

The “genetic architecture” of a trait is a broad term used to describe the characteristics of genetic variants that contribute to the trait’s phenotypic variation. Such characteristics include the number of loci contributing towards variation, the amount of variation attributable to each causal locus, the location of these loci in the genome, and how rare or common causal alleles are in the population (Mackay 2001). It is important to understand the genetic architecture underpinning traits of interest; in disease research this can be used to inform clinical diagnosis and prognosis as well as identify potential treatments; in livestock and crop breeding, it allows for more informed selective breeding strategies to improve trait yield; in evolutionary genetics we can use knowledge of the genetic architecture of a trait to understand how evolutionary processes may act on that trait and what micro-evolutionary dynamics are occurring in the population.

Historically, heritability estimation has been a key metric when investigating the genetic architecture of a trait. Narrow sense heritability (h2) is the proportion of phenotypic variation which is explained by additive genetic variation in the population, and traditionally was estimated using correlations between family members in twin studies or through parent-offspring regression (Falconer and Mackay 1996). More recently the animal model (Meyer 1989) has become widely used, and has the advantage of using relationships between all individuals and across generations to increase power and better estimate the additive genetic component. The proportion of the variance from the additive genetic component can be estimated by an animal model using relatedness inferred from either the pedigree (h2ped – obtained using the numerator relationship matrix A (Henderson 1975)) or genetic markers such as SNPs (h2GRM – obtained using a genomic relationship matrix (GRM) (Yang et al. 2010)). The pedigree captures variance due to identity-by-descent and provides estimates of expected relatedness rather than realised relatedness. The pedigree can also be prone to error, especially when it is derived from observational data, but also in cases when it is derived from genotype data and paternities still cannot be accurately or uniquely assigned. A conventional GRM captures identity-by-state, and the resulting estimates are dependent on which SNPs are used to calculate the GRM; if neither the causal SNP nor any SNPs in linkage disequilibrium (LD) with the causal SNP have been genotyped, the h2GRM estimate may be underestimated (Yang et al. 2015). This can be avoided by using high density genotyping arrays or whole genome sequencing, however in some wild populations it has been shown that increasing the number of SNPs above a certain threshold does not affect GRM-based heritability estimates (Bérénos et al. 2014; Perrier et al. 2018; James et al. 2022).

Inclusion of related individuals when estimating h2GRM can result in overestimation by inadvertently capturing effects of common environment, epistasis and dominance (Kang et al. 2010; Zaitlen et al. 2013). However, removal of related individuals is not optimal for smaller study populations such as wild populations, as this reduces sample sizes. In 2013, Zaitlen et al. proposed a partitioning method that simultaneously estimates both h2 and h2GRM (referred to as h2g by the authors) of a trait in a population that contains related individuals. The method involves running an animal model simultaneously fitting both a GRM and an additional GRM which is thresholded in such a way that relatedness estimates below a specified threshold are set to 0. Both GRMs jointly model variance due to identity-by-descent (IBD), whilst the non-thresholded GRM additionally models variance due to identity-by-state (IBS). This method has so far been applied to a range of traits in humans (Xia et al. 2016; Hill et al. 2018) as well as in wild passerine birds (Silva et al. 2017) and farmed salmon (Kokkinias 2022). The extent to which thresholded GRMs explain trait variance differs across these studies, presumably due to differences in relatedness structure, trait architecture, trait heritability and other variables, so it is of interest to explore the outcome of the approach in a wide range of populations and traits.

Here, we use the method of Zaitlen et al. (2013) to simultaneously estimate h2 and h2GRM for a selection of polygenic and monogenic traits in a wild population of related Soay sheep. The effect of additive genetic variation segregating at the population level is fitted using the non-thresholded IBS GRM, whilst the family-associated variation (such as epistasis, dominance, common environmental factors not captured by our non-genetic random effects fitted in the model, and any additional additive genetic variation segregating within families) is captured using the thresholded IBD GRM.

In addition, we explored a modified version of the approach, in which the thresholded IBD GRM was replaced with a GRM calculated from SNPs that had a minor allele frequency (MAF) below a certain threshold (but not thresholding relatedness), allowing us to examine how much of the genetic variation is captured by rare alleles and whether a rare-allele GRM can be used as a substitute for a relatedness thresholded GRM. We analysed a mixture of monogenic and polygenic traits to better understand how extremes of the genetic architecture of a trait affect the outcome of our heritability partition.

Methods

Phenotypic data

The Soay sheep (Ovis aries) of the St. Kilda archipelago is a primitive breed of sheep that has been the focus of a longitudinal, individual-based study since 1985 (Clutton-Brock and Pemberton 2003). Individuals are ear-tagged at first capture (usually two to ten days after birth) to allow re-identification, regularly recaptured in order to measure, amongst others, various morphometric and life history traits across an individual’s lifespan, and, after death, skeletal remains are collected and measured.

We focused on five polygenic morphometric traits across three different age classes (neonate, lamb and adult). Birth weight was measured in neonates. Three traits (weight, foreleg length and hindleg length) were measured on live individuals during an August catch up for lambs and adults. As adults are often recaptured across different years, the live traits have repeated measurements for many individuals. The remaining two traits (metacarpal length and jaw length) are post mortem measures taken from skeletal remains. Both birth and August weight are measured to the nearest 0.1 kg, whilst the remaining traits are all measured to the nearest mm. A detailed description of trait measurements can be found in Beraldi et al. (2007). Several studies have demonstrated that these traits are under polygenic control (Bérénos et al. 2015; Ashraf et al. 2021; Hunter et al. 2022; James et al. 2022).

Neonates were defined as individuals who were caught and weighed between two and ten days after birth. For live traits, lambs were defined as individuals whose morphometric data was recorded in the August of their birth year, and for post mortem traits in lambs were individuals who died before 14 months of age. Similarly, adults were defined as individuals with phenotypic data recorded at least two years after birth for the live traits, or if they died after 26 months of age for the post mortem traits. We did not include yearling data in our analyses as these are individuals who have not yet reached their adult size and thus have small sample sizes available due to first winter mortality.

In addition to the polygenic traits, we analysed four monogenic traits (male horn type, female horn type, coat colour and wild/self coat pattern) to examine whether the different genetic architectures underpinning polygenic and monogenic traits yield different results when partitioning the heritability. These traits are well characterised in this population and the causal gene for each trait has been identified (see Johnston et al. (2013), Gratten et al. (2007), Gratten et al. (2010) and Supplementary Table 1 for additional information on the underlying genetics of horn type, coat colour and coat pattern respectively). Male and female horn types were analysed separately as while both are controlled by the same locus, in males the “normal horned” allele is dominant, so only two phenotype classes (normal horned and scurred) are observed, whilst in females there is codominance and three phenotypes are observed (normal horned, scurred and polled). As the monogenic traits investigated here do not change over ontogeny, we did not analyse these traits by age class.

Table 1 lists the number of individuals and records per trait.

Table 1 Number of individuals and records, fixed and random effects fitted in each trait and age class model in addition to the genomic relationship matrices or A matrix.

Genetic data

Data were available for 8557 sheep genotyped on the Ovine SNP50 Illumina BeadChip, which gave genotypes for 38,130 autosomal variants remaining after quality control (MAF > 0.001, SNP locus genotyping success >0.99, individual genotyping success >0.95, IBS with another individual <0.9) that are polymorphic in the population. A subset of 438 SNPs are used to recover the pedigree using Sequoia (Huisman 2017), alongside observational data. 188 individuals have also been genotyped on the Ovine Infinium HD SNP BeadChip, which has a much higher density of variants than the SNP50 BeadChip. After quality control (SNP locus genotyping success >0.99 and individual genotyping success >0.95), 430,702 polymorphic SNPs were retained. This has allowed for the remaining genotypes to be imputed to this higher density using AlphaImpute, which combines shared haplotype and pedigree information for phasing and imputation (Hickey et al. 2012) (see Stoffel et al. (2021) for information on our imputation procedure). We used imputed genotype “hard” calls rather than genotype probabilities in downstream analyses. Genotypes with a probability of <0.99 were excluded, resulting in, 419,281 autosomal SNPs remaining for 8557 individuals (4035 females, 4452 males). We have previously shown that imputation does not affect heritability estimates for these traits in this population (James et al. 2022).

Overview of population structure

The Soay sheep population consists of individuals with a range of relationships. Currently, 10,979 individuals are included in the pedigree with an additional 379 genotyped individuals having no known connection to the pedigree, and the average pedigree pairwise relatedness is 0.0079. Full siblings are much rarer than half-siblings. In total, there are 17,948 parent-offspring pairs in the pedigree, 686 full sibling pairs, 124,739 half-sibling pairs, and 29,141 grandparent-grandchild pairs.

Using the imputed genotype data, Fhat3 inbreeding values (Yang et al. 2011) ranged between −0.23 and 0.65, with a median of 0 and a mean of 0.02. 90% of individuals have an inbreeding estimate lower than 0.04.

4.39% of non-diagonal relatedness estimates in the GRM of the total genotyped population were above 0.05. Of the non-diagonal estimates above 0.05, 0.98% were higher than 0.4, suggesting they were likely parent-offspring pairs, full siblings, or half-siblings where the non-shared parents were closely related. 7.64% of non-zero estimates were between 0.2 and 0.4, which would include half-siblings where the non-shared parents were not closely related, grandparent-offspring pairs, and avuncular relationships. 24.11% of non-zero estimates were between 0.1 and 0.2, which would include first cousins and great-grandparent-great-offspring pairs, as well as more complex relationships such as grand-avuncular pairs or half-avuncular pairs. The remaining estimates above 0.05 include more distant relationships, such as half-cousins and double second cousins.

1.44% of non-diagonal relatedness estimates in the thresholded GRM of the total genotyped population were above 0.1. Of the non-diagonal estimates above 0.1, 2.99% were higher than 0.4, whilst 23.35% of estimates were between 0.2 and 0.4.

The proportion of non-zero relatedness estimates within each relatedness interval remained similar across the samples of individuals for each trait.

Variance component analysis

We used animal models to partition the phenotypic variance for each trait into genetic and non-genetic variance components. For each trait, we ran multiple models:

$${\rm{Model}}\,1:y=X\beta +\sum _{r}{Z}_{r}{u}_{r}+{Wg}+\varepsilon$$

Where y is the vector of phenotypic values; X is a design matrix linking individual records with the vector of fixed effects β, Zr is an incidence matrix that relates a non-genetic random effect to the individual records; ur is the associated vector of non-genetic random effects; g is the vector of additive genetic random effects of the whole study population with W the incidence matrix; and ε is the vector of residuals. It is assumed that g ~ MVN(0, Mgσg2), where σg2 is the additive genetic variance and Mg is the GRM for the whole study population. This model was run once per trait for the polygenic traits. For the monogenic traits, this model was run three times: once with all SNPs being used to construct the GRM, once with all SNPs on the same chromosome as the causal gene being used to construct the GRM, and once with all SNPs within 1 Mb upstream or downstream of the causal gene being used to construct the GRM. For monogenic traits, these models were designed to eliminate noise from genetic variation from non-causal loci, given that the only variants affecting these traits should be those within the previously identified causal regions.

$${\rm{Model}}\,2:y=X\beta +\sum _{r}{Z}_{r}{u}_{r}+{Wg}+W{k}_{t}+\varepsilon$$

The terms in this model are the same as in Model 1, with the inclusion of kt, the vector of extra genetic random effects associated with relatives with a genomic relatedness higher than threshold t. It is assumed that kt ~ MVN(0, Mktσkt2), where σkt2 is the kinship genetic variance and Mkt is the kinship GRM with relationships equal to or less than t being set to 0. This model was run twice per trait, once at t = 0.05 and once at t = 0.1. Both thresholds capture parent-offspring, full sibling and half-sibling relationships but differ as to the proportion of more distantly related pairs of individuals that are retained.

$${\rm{Model}}\,3:y=X\beta +\sum _{r}{Z}_{r}{u}_{r}+{Wg}+{{Wg}}_{{MAF}}+\varepsilon$$

The terms in this model are the same as in Model 1, with the inclusion of gMAF, the vector of additive genetic random effects of SNPs with a MAF under a specified threshold in the whole study population. It is assumed that gMAF ~ MVN(0, MMAFσMAF2), where σMAF2 is the additive genetic variance of SNPs with a MAF below a set threshold, and MMAF is the GRM calculated from the SNPs with a MAF below the threshold for the whole study population and range of relationships. For polygenic traits, this model was run five times, with MAF thresholds varying between 0.1 and 0.001 (see Table 2 for the full list of MAF thresholds and the number of SNPs that remained for each threshold).

Table 2 Number of SNPs used in the MAF-thresholded GRMs at each threshold, and percentage of the total number of genotyped SNPs they represent.

For the monogenic traits, in addition to running the same models as for the polygenic traits, we also ran the model for each threshold where both GRMs were only constructed from SNPs on the chromosome containing the causal gene, and again for each threshold where both GRMs were only constructed from SNPs within the region spanning 1 Mb either side of the causal gene (See Table 3 for the number of SNPs that remained for each threshold for each trait for both the single chromosome and region based GRMs). We did not run the model for any thresholds that resulted in a GRM being computed for less than 10 SNPs. These models were designed to eliminate noise from genetic variation from non-causal loci, given that the only variants affecting these traits should be those within the previously identified causal regions.

$${\rm{Model}}\,4:y=X\beta +\sum _{r}{Z}_{r}{u}_{r}+{Wped}+\varepsilon$$
Table 3 Number of SNPs used in the MAF-thresholded GRMs at each threshold for each trait for both the GRMs constructed from SNPs on the focal chromosome and SNPs within 1 Mb either side of the focal gene.

The terms in this model are the same as in Model 1, with the exception of ped being the vector of pedigree-based effects of the whole study population. It is assumed that ped ~ MVN(0, Agσg2), where σg2 is the additive genetic variance and Ag is the relationship matrix for the whole study population. Again, this model was run once per trait.

Fixed and non-genetic random effects were only fitted for the polygenic traits, with the effects fitted differing between traits and age classes (see Table 1). For each trait, the same individuals were analysed across all models.

The GRMs were computed using GCTA (Yang et al. 2011), which computes the genetic relationship between individuals i and j as

$${A}_{{ij}}=\frac{1}{N}\sum _{z}{A}_{{ijz}}=\left\{\begin{array}{c}\frac{1}{N}\sum _{z}\frac{\left({s}_{{iz}}-2{p}_{k}\right)\left({s}_{{jz}}-2{p}_{z}\right)}{2{p}_{z}\left(1-{p}_{z}\right)},i\,\ne\, j\\ 1+\frac{1}{N}\sum _{z}\frac{{s}_{{iz}}^{2}-\left(1+2{p}_{z}\right){s}_{{iz}}+2{p}_{z}^{2}}{2{p}_{z}\left(1-{p}_{z}\right)},i=j\end{array}\right.$$

where siz is the number of copies of the reference allele for SNP z of the individual i, pz is the frequency of the reference allele for the SNP z, and N is the number of SNPs (Yang et al. 2010). The models described above were run using DISSECT (Canela-Xandri et al. 2015). This method of calculating the GRM weights SNP effects by their MAF; if two individuals share a rare allele, they are more likely to be related than two individuals that share the common allele of the same SNP. Giving more weight to rare alleles means that individuals that are genotypically similar due to IBD are likely to have a higher relatedness estimate than those who are similar due to IBS and are unrelated, so this estimator works to our advantage.

To deal with multiple measurements per individual for the adult live traits, we used a repeatability model by fitting ID as a random effect to ensure that uncertainty was correctly propagated through all estimations (Mrode 2014). Although DISSECT does not currently have the option to automatically analyse repeated measures, it is possible to modify input files to allow for a repeated measures model (see James et al. (2022) for method details).

For models 1 and 4, we estimated the narrow sense heritability (h2) by dividing the additive genetic variance (the variance associated with the GRM or pedigree respectively) by the total estimated phenotypic variance (yielding h2GRM and h2ped respectively). For models 2 and 3, we estimated three heritabilities: to avoid confusion when comparing with models 1 and 4, we refer to them as h2pop, the additive genetic variance explained by the full GRM (equivalent to h2GRM from model 1); h2kin, the additive genetic variance explained by the thresholded GRM; and h2pk, which is estimated as the sum of h2pop and h2kin (equivalent to h2 and h2ped). For the monogenic traits, we refer to the heritability estimates from models 1 and 3 when only including SNPs on the same chromosome as the causal gene as h2GRM_Chrom, h2pop_Chrom, h2kin_Chrom, and h2pk_Chrom. Likewise, for the heritability estimates when only including SNPs within 1 Mb of the causal gene, we shall refer to them as h2GRM_Region, h2pop_Region, h2kin_Region, and h2pk_Region.

We performed log-likelihood ratio tests (LRT) between models 2 and 3 and model 1 at each threshold for each trait. For models where the inclusion of the thresholded GRM was determined to improve the fit, h2pk estimates were compared to h2ped estimates to determine whether the model including the thresholded GRM was able to estimate h2ped accurately, whilst the h2pop estimates were compared to h2GRM estimates to determine if the h2GRM estimates were biased due to the presence of related individuals.

Results

Polygenic traits

For the neonate and lamb traits, the inclusion of a thresholded GRM in the model only significantly improved the fit of the model for lamb jaw length for both relationship-thresholded models (model 2, t = 0.05 and t = 0.1) (Fig. 1A, Supplementary Tables 24).

Fig. 1: Heritability estimates for traits for which inclusion of a thresholded GRM was significant.
figure 1

Estimates displayed are A lamb jaw length, B adult August weight, C adult foreleg length and D adult jaw length for models 1, 2 and 4. h is the h estimate from model 1 (no thresholded GRM fitted), h is the additive genetic variance explained by the full GRM (equivalent to h from model 1), h is the additive genetic variance explained by the thresholded GRM, h is the sum of h and h, and h is the h estimate when using the pedigree. For each trait, light purple represents the estimate of h for model 1, blue represents the estimate of h for model 2, pink represents the estimate of h for model 2, and purple represents the estimate of h for model 4. Stars represent the p value when performing loglikelihood ratio tests between models 1 and 2: ns means p > 0.05, *0.01 < p < 0.05, **p < 0.01.

For lamb jaw length, the h2kin estimates were 0.084 and 0.083 for t = 0.05 and t = 0.1 respectively (S.E. 0.048 and 0.047 respectively) whilst h2pop estimates were 0.215 and 0.229 (S.E. 0.043 and 0.039), suggesting that family-associated variance explains 8% of phenotypic variance and 26.5–28.1% of the genetic variance underpinning lamb jaw length, depending on the threshold used. For this trait, the inclusion of the MAF-thresholded GRMs (model 3) led to non-convergence of the model. Estimates of h2pk (h2pop + h2kin) were higher than that of h2GRM for the two relatedness-thresholded models (h2GRM: 0.254 (S.E. 0.036), h2pk t = 0.05: 0.300, t = 0.1: 0.312), with the h2pk estimate for the higher threshold being the highest. The two h2pk estimates were similar to the estimate of h2ped (0.317 (S.E. 0.062)), with the h2pk estimate at t = 0.1 falling within the standard error around the estimate of h2ped. The standard errors around the estimates of h2pop overlapped with that of h2GRM, suggesting that the h2GRM estimate for this trait does not suffer from major biases from family-associated affects such as dominance and epistasis (Table 4).

Table 4 The h2 estimates and their standard errors for the traits in which the inclusion of the thresholded GRM significantly improved model fit for the comparison of model 1 (with no thresholded GRM) and model 2.

Of the adult traits, the inclusion of a relatedness-thresholded GRM was significant for August weight, foreleg length and jaw length at t = 0.1 (Fig. 1B–D, Table 4, Supplementary Tables 2 and 4). For adult August weight, the estimate of h2kin was 0.098 (S.E. 0.047) and the estimate of h2pop was 0.167 (S.E. 0.048), suggesting that family-level genetic variance explains 9.8% of the phenotypic variation of this trait and made up 37.1% of the underlying genetic variance. For adult foreleg length, the estimate of h2kin was 0.112 (S.E. 0.041) and the estimate of h2pop was 0.194 (S.E. 0.045), suggesting that family-level genetic variance explains 11.2% of phenotypic variation and 36.6% of underlying genetic variance for adult foreleg length. For adult jaw length, the estimate of h2kin was 0.169 (S.E. 0.074) and the estimate of h2pop was 0.427 (S.E. 0.079), suggesting that family-level variance explained 16.9% of the phenotypic variation and 28.4% of the underlying genetic variance for this trait. Estimates of h2pk of all three traits fell within the standard errors around their respective h2ped estimates, and for adult August weight, the standard error around the estimate of h2pop overlapped with that of h2GRM. However, for adult foreleg length and adult jaw length the standard errors around the estimates of h2pop and h2GRM did not overlap, suggesting that there may be some effect of family-associated variance such as dominance or epistasis biasing the h2GRM estimates (Table 4).

The MAF-thresholded models (model 3) did not yield a significant change in the additive genetic variance explained for any trait (Supplementary Tables 3 and 4).

Monogenic traits

Inclusion of either the relatedness-thresholded GRM (Model 2, t = 0.05 and t = 0.1) or the MAF-thresholded GRM (Model 3) was not significant for any of the monogenic traits we investigated (Supplementary Tables 57). These results are somewhat surprising, as dominance is known to play a role in all four of these traits (Dolling 1961; Ryder et al. 1974; Kinsmann 2001; Coltman and Pemberton 2003; Gratten et al. 2007; Gratten et al. 2010; Johnston et al. 2013).

Re-running model 3 focusing on smaller SNP windows (focal chromosome and 1 Mb either side of the causal gene) did not improve the model fit for any of the MAF thresholds across any of the traits. For male and female horn type, the regional model was only run for MAF = 0.1 and MAF = 0.05 due to the fact that less than 10 SNPs remained when applying the more stringent thresholds. For coat colour and coat pattern, the regional model was not run for MAF = 0.005 and MAF = 0.001 for the same reasons, however the model at the remaining thresholds failed to converge.

Interestingly, for male horn type, coat colour, and coat pattern, estimates of h2GRM_Chrom and h2GRM_Region from the chromosome and regional models that did run were lower than estimates of h2GRM when the whole genome SNP data was used to construct the GRM. The standard error around the estimate of h2GRM did not overlap with those of h2GRM_Chrom and h2GRM_Region, suggesting that these estimates are significantly different from each other. For female horn type, h2GRM_Chrom and h2GRM_Region estimates for the chromosomal and regional models that did run were slightly higher than h2GRM, though the standard errors around the estimates did overlap.

Discussion

As far as we are aware, Zaitlen et al. (2013)’s method has been used to estimate heritability in populations containing related individuals in humans (Zaitlen et al. 2013), one livestock population (Kokkinias 2022) and two wild populations of birds (Silva et al. 2017). Population structure and sample size both differ between the different studies, as well as the proportion of h2pk attributed to h2kin. Zaitlen et al. (2013) focused on a sample of ~38,000 Icelandic individuals including sibling, half-sibling, parent-offspring, grandparent-grandchild and avuncular relationships, and found that h2kin made up 25–67% of h2pk estimates across their focal traits. Kokkinias (2022) focused on ~5000 individuals in a commercial salmon breeding program which contained four breeding lines, with most individuals in a breeding line being closely related (usually either full siblings or half siblings) – they found that h2kin made up 0–10% of h2pk estimates. Silva et al. (2017) focused on both a sample of 700–1400 Norwegian house sparrows and a sample of ~800 Swedish collared flycatchers, with the sample of house sparrows being more related to each other than the sample of collared flycatchers were. h2kin made up 19–100% of h2pk in the house sparrow data and 2–74% of h2pk estimates in the collared flycatcher data. The Soay sheep population is comprised of related individuals with very complex family structures – both males and females are promiscuous meaning full-siblings are much rarer than half-siblings, and the population is uniformly inbred. For the traits in which inclusion of a thresholded GRM improved model fit, we found that h2kin estimates made up 26.5–37.1% of h2pk estimates.

It is possible that the difference between these results is because of population relatedness structure; the salmon population contains no individuals without any relatives and the degree of relatedness between relatives was high, thus the GRM and the thresholded GRM were probably similar. In comparison, the human and bird data comprised a more complex set of relationships resulting in a bigger difference between the GRM and thresholded GRM. Xia et al. (2016) performed a similar analysis to Zaitlen et al. (2013) in a population of Scottish humans, but including environmental relationship matrices in the models alongside the thresholded GRM and performed stepwise model selection to determine which combination of matrices resulted in the best model fit for each of their traits. They first did this on a sample of 10 K individuals, primarily nuclear families and unrelated individuals, and then on a sample of 20 K individuals, made up of the original 10 K plus an additional 10 K individuals that were mainly related to the original 10 K individuals. Of the 16 traits studied, the relatedness-thresholded GRM was retained after model selection for 10 traits when using the 10 K data and for 14 when using the 20 K data. The authors suggest that the increase in sample size and inclusion of more distant relationships allowed the model to separate the effect of family-associated genetic variance (h2kin) from family-associated environmental variance for the four traits for which the relatedness-thresholded GRM was retained only when using the 20 K data.

We identified three traits that are potentially influenced by family-associated variance. For weight and foreleg length, this effect was only observed in adulthood, whilst for jaw length the effect was found in both lambs and adults.

As jaw length is a skeletal trait, it is measured post-mortem and thus there is no overlap between individuals who have a recorded lamb jaw length and those who have a recorded adult jaw length. This suggests that the effect of family-level genetic variance on jaw length is not dependent on any age-related mortality factors such as juvenile survival and is potentially consistent throughout life, affecting both juvenile jaw length and total potential (adult) jaw length.

Weight and foreleg length are live measures, meaning there is an overlap between individuals measured as adults and individuals measured as lambs (and in the case of weight, individuals measured as neonates). In addition, individuals can be caught across multiple years as adults, meaning that adult August weight and adult foreleg length might have repeated measures. There are therefore two main potential reasons as to why our models showed an effect of family-level genetic variance in adulthood but not in juveniles or neonates. Firstly, h2 estimates for birth weight, lamb August weight and lamb foreleg length are low (h2 < 0.15), so these traits are likely primarily controlled by environmental factors (including maternal effects). Thus, any family-level genetic variation affecting these traits may also be small. Secondly, it is possible that any family-level genetic variation affecting weight and foreleg length only affect the total potential (adult) weight and foreleg length, rather than juvenile weight and foreleg length. This means the effect would only be picked up among individuals who have reached their potential for these traits (i.e. individuals who have stopped growing – adults).

The standard errors around the h2pop estimates for adult foreleg length and adult jaw length at t = 0.01 did not overlap with those for their respective h2GRM estimates, suggesting that any family-associated variance that is influencing these traits is significantly impacting estimates of h2GRM when related individuals are included in the analysis. It is interesting that, despite there being a genetic correlation of 0.944 (S.E. 0.019) between adult foreleg length and adult hindleg length (Bérénos et al. 2014), inclusion of a thresholded GRM at t = 0.1 is significant for adult foreleg length and not adult hindleg length. However, in a recent genome-wide association study of this population, we found two peaks on chromosomes 7 and 9 that were significantly associated with adult foreleg length but not with adult hindleg length (James et al. 2022). It is possible that family-associated variation such as epistasis and dominance are acting upon these two regions, resulting in the inclusion of the thresholded GRM being significant for adult foreleg length but not adult hindleg length. Further investigation, with larger sample sizes that may become available in the future, would shed more light on the commonalities and differences in the determinants of these traits.

Inclusion of either the relatedness-thresholded GRM models (model 2) or the MAF-thresholded models (model 3) did not improve model fit for any of the monogenic traits we investigated. From our results, this model is not well suited for detecting any dominance in monogenic traits. It is possible that this is because the minor allele frequencies of the causal variants are too high to influence the thresholded GRMs – especially the MAF-thresholded GRMs, given that the minor allele frequencies are all higher than our most lenient threshold (Supplementary Table 1). Interestingly, for three of the traits (male horn type, coat colour and coat pattern), estimates of h2GRM_Chrom and h2GRM_Region were lower than that of h2GRM, suggesting that there may be some additive genetic variance influencing these traits located elsewhere in the genome, and that these traits may not be truly monogenic. Further investigation using methods such as regional genomic relationship mapping (Nagamine et al. 2012) or haplotype heritability mapping (Shirali et al. 2018) may therefore provide new insights to the genetic architecture underpinning these traits.

The MAF-thresholded model was designed to identify whether our focal traits were influenced by rare alleles in the population. As the genetic architecture of some of our focal traits has not yet been fully characterised, it is unknown what proportion of the genetic variance is being contributed by SNPs with high or low MAFs. Using the MAF-thresholded model, we would expect to see some of the genetic variance being attributed to the MAF-thresholded GRM if variance was being attributed to rare alleles. Given that the MAF-thresholded model was not significant for any of our focal traits, it is possible that none of these traits are influenced by variants with a low enough MAF to be detected by this model.

Overall, we have demonstrated that the method proposed by Zaitlen et al. (2013) can be useful for estimating heritability in wild population samples containing relatives, as well as showing how the method is affected by different genetic architectures of traits (monogenic versus polygenic). However, we found that the model including the thresholded GRMs was not a significantly better fit for the Soay sheep data when estimating heritability for most of our traits. On the other hand, there are indications that some traits previously thought to be monogenic in the Soay sheep may be influenced by genetic variation outwith the causal gene.