An evaluation of inbreeding measures using a whole-genome sequenced cattle pedigree

The estimation of the inbreeding coefficient (F) is essential for the study of inbreeding depression (ID) or for the management of populations under conservation. Several methods have been proposed to estimate the realized F using genetic markers, but it remains unclear which one should be used. Here we used whole-genome sequence data for 245 individuals from a Holstein cattle pedigree to empirically evaluate which estimators best capture homozygosity at variants causing ID, such as rare deleterious alleles or loci presenting heterozygote advantage and segregating at intermediate frequency. Estimators relying on the correlation between uniting gametes (FUNI) or on the genomic relationships (FGRM) presented the highest correlations with these variants. However, homozygosity at rare alleles remained poorly captured. A second group of estimators relying on excess homozygosity (FHOM), homozygous-by-descent segments (FHBD), runs-of-homozygosity (FROH) or on the known genealogy (FPED) was better at capturing whole-genome homozygosity, reflecting the consequences of inbreeding on all variants, and for young alleles with low to moderate frequencies (0.10 < . < 0.25). The results indicate that FUNI and FGRM might present a stronger association with ID. However, the situation might be different when recessive deleterious alleles reach higher frequencies, such as in populations with a small effective population size. For locus-specific inbreeding measures or at low marker density, the ranking of the methods can also change as FHBD makes better use of the information from neighboring markers. Finally, we confirmed that genomic measures are in general superior to pedigree-based estimates. In particular, FPED was uncorrelated with locus-specific homozygosity.


Correlations with homozygosity for different allele frequency (AF) classes and with whole genome homozygosity.
Comparisons with homozygosity levels at alternate or private alleles grouped according to allele frequencies (Supplementary Figure S2AB), with marker homozygosity at different MAF categories (Supplementary Figure S2C) and with whole genome homozygosity (Supplementary Figure  S2D) indicated that global measures FPED and FROH including all pedigree generations or all ROH, performed the best. Nevertheless, measures based on the most recent generations of the pedigree (FPED- 5G) or recent ROH (> 5 Mb) had very similar correlations with the different homozygosity measures as FPED and FROH, respectively, indicating that they captured most of the variation of these inbreeding coefficients. Estimators relying on the longest ROH only, FROH-L, had high correlations with homozygosity measures. These were lower than those achieved with FROH, but still higher than correlations observed with FPED. Correlations continued to decrease with FROH-M and FROH-S. The values were still positive but below the levels obtained with FPED. Finally, FPED-OLD presented almost null correlations with the different homozygosity measures.
With HBD-based measures, FHBD-25 and FHBD-125 presented almost identical results (Supplementary Figure S4). When FHBD-525 was computed using all identified HBD segments, correlations with homozygosity at alternate alleles for different AF classes (Supplementary Figure S4A) were higher for AF < 0.5, compared to HBD measures relying only on the longest HBD segments . With private alleles, FHBD-525 had higher correlations with homozygosity at low frequency alleles (AF < 0.15; Supplementary Figure S4B). Interestingly, the correlation curves obtained with FHBD-25 were almost identical to those obtained with FROH and FHOM (Supplementary Figure S4). As a result, FHBD-25 had one of the highest correlations with whole genome homozygosity (Supplementary Figure S4D).

Correlations with homozygous mutation load (HML).
Similar trends were observed when estimators were compared to HML (Supplementary Figure S3). In general, FROH and FPED had higher correlations with HML than measures using subsets of ROH or of pedigree-generations. Almost identical correlations, or slightly higher values, were achieved with FROH-5 and FPED-5G. FROH-L had consistently lower correlations with HML, although close to values observed with FROH-5. With FROH-M, correlations decreased further whereas with FROH-S, the values dropped more significantly. FPED-OLD had almost null correlations with different HML measures.
When using HBD-based measures, estimators based on HBD segments always had larger correlations with HML when all identified segments, including the shortest ones, were considered (Supplementary Figure S5). As observed previously, FHBD-25 presented very similar results to those observed with FROH.

Summary.
Overall, the best results were obtained with FHBD-525, FROH and FPED, measures using all information. Nevertheless, FROH-5 and FPED-5G presented comparable correlations with homozygosity or HML, indicating that these measures of recent inbreeding captured most of the variation in FROH and FPED, respectively. Differences between FHBD-25 and FHBD-525 were more pronounced.

Comparison of inbreeding coefficients relying on maximum likelihood approaches
The FML measure from our study is the maximum likelihood estimator from Wang (2007), using genotypes of triad of individuals to estimate the nine condensed IBD states (Jacquard 1974). This method is implemented in COANCESTRY (Wang 2011) and the related R package (Pew et al. 2015). In the manuscript, we also described the FHBD measure based on identification of homozygous-bydescent (HBD) segments proposed in Druet and Gautier (2017).
Both methods rely on the probabilities to observe genotypes conditionally on F and Hardy-Weinberg proportions. In both cases, genotypes come from a mixture of two distributions (autozygous vs allozygous) and F is estimated as the value maximizing the likelihood.
Here, we want to illustrate that when SNPs are considered independent in FHBD (i.e. when the probability of coancestry change between markers is 1.0 for very distant ancestors separated by many generations of recombination), the two methods provide very similar estimators of the inbreeding coefficient. Therefore, we estimated FHBD with RZooROH (Bertrand et al. 2019) using a hidden Markov model with one HBD and one non-HBD class and with a rate equal to 250,000. With such a high rate, the probability of ancestry change between successive markers is close to 1 (i.e., the markers are independent).
The correlation between FML and FHBD estimated as described above was equal to 0.9999 and the estimated linear regression was FHBD = -0.0001 + 1.0013 × FML. Both results illustrate that these estimators are extremely similar.

Estimation of inbreeding coefficients with low marker density
Methods. All genomic inbreeding coefficients were re-estimated with fewer markers. The new set of markers contained 5,977 SNPs from the Illumina BovineLD BeadChip instead of the 37,675 SNPs used previously.
FUNI, FGRM, FML, FHOM, FROH and FHBD were estimated using the same approach as before. For FROH, PLINK (Purcell et al. 2007) was run with the default options and the following changes: a minimum of 25 SNPs per ROH, at least 1 SNP per 500 Kb, a scanning window of 25 SNPs, a total length > 2 Mb and no heterozygous SNPs. Compared with options selected at higher density, the number of SNPs per ROH was reduced by 50%. This increased the risk of identifying false positive tracks (non-IBD). However, almost no ROH were detected when the number of SNPs was maintained at 50 per ROH, indicating that ROH are not recommended at this marker density.

Results.
Results are reported in Supplementary Figures S19 to S21. Overall, although correlations were lower compared to estimators obtained with 37,675 SNPs, correlations remained high, in particular for the whole genome homozygosity (Supplementary Figure S19). The ranking of the methods and their behaviour was also in line with results obtained with more markers. The most striking difference was that FROH presented lower correlations, closer to values obtained with FPED (see for instance Supplementary Figure S19). For estimation of locus-specific inbreeding coefficients (Supplementary Figure S21), FHBD appeared more robust (i.e. smaller differences with results obtained with the 37,675 SNPs) as it uses information from succession of SNPs in a probabilistic framework. Consequently, it was the most efficient to capture regional scores, for both regional homozygosity and regional HML.

Comparison of inbreeding coefficients predicted in offspring with genotypes from parents
Methods. When mating two individuals, it is of interest to predict the inbreeding coefficient of their offspring. Here, we compared the predicted inbreeding coefficient based on 50K density genotypes from the 145 sequenced parents with the observed inbreeding levels in the 100 sequenced offspring. The inbreeding coefficient of an individual is also equal to the coancestry coefficient between its parents (Malécot 1948). Therefore, predicting the inbreeding coefficients in the offspring amounts to estimating the relatedness between their parents. For each inbreeding estimator previously used, we tried to use a related relationship measure. The pedigree additive relationship, the genomic relationships estimated as in Yang et al. (2011) and with the first method from VanRaden (2008) and the triadic likelihood approach (Wang 2007) were used to predict FPED, FUNI, FGRM and FML, respectively. The genomic relationship presented by Yang et al. (2011) is a correlation between parental allelic dosages (Speed and Balding 2015) averaged over all SNPs, and therefore linked to FUNI. Predicting FUNI at one marker as the average FUNI of the four possible combinations of gametes (i.e., four possible genotypes in the offspring) is indeed equivalent to estimate FGRM using the genomic additive relationship from Yang et al. (2011) between the parents. To predict FHOM, we relied on relatedness measures estimated from proportions of alleles that are identical-by-state (IBS) between two individuals. For instance, Eding and Meuwissen (2001) proposed to estimate relatedness based on a similarity index, whereas Weir and Goudet (2017) presented more recently a relatedness estimator that also relies on IBS proportions. The correlations between inbreeding coefficients estimated by these methods and FHOM are equal to one (all these estimators rely on the observed homozygosity). Finally, we also used a method relying on identification of IBD segments shared between pairs of parents to predict FHBD. We used a simplified approach relying on identification of ROH between the homologs from distinct parents (e.g., Pryce et al. 2012;de Cara et al. 2013;Luan et al. 2014). To that end, the four haplotypes from the parents were first obtained with Beagle 4.0 with the ped option (Browning and Browning 2007). To estimate the IBD probabilities at a locus between homologs from distinct parents (four possible pairs of paternal/maternal homologs), we applied the four HBD classes model to four virtual individuals created by combining two parental haplotypes (one from each parent). The additive relationship between the parents was finally computed as the average IBD probabilities (over the four possible combinations). As described in Supplementary Text S1, we considered HBD-measures relying on all identified HBD segments (FHBD-525) or on the longest HBD segments associated with recent ancestors (FHBD-25).
These inbreeding coefficients predicted using genotypes from the parents (with the 37,675 SNPs) were then compared to scores computed with the whole genome sequence data from the corresponding offspring.

Results.
As expected, correlations with the different scores were lower when inbreeding coefficients were predicted from the genetic relationship between the parents, rather than estimated using the genotypes from the individual. Estimators giving more weight to rare alleles better predicted the homozygosity at rare alleles and HML, whereas the other measures were better predictors of the whole genome homozygosity (Supplementary Figures S20-22). The latter group of methods performed poorly to predict homozygosity at rare alleles in general except for HML at private missense variants (tolerated and deleterious) for which correlations were somehow larger (Supplementary Figure S21), close to the levels achieved by the first group of methods. Homozygosity at deleterious alleles was better predicted using all HBD segments (FHBD-525) rather than long HBD segments only (FHBD-25).  (2008) and based on the diagonal elements of the genomic relationship matrix (dividing all SNP contributions by the same denominator) FML = maximum likelihood estimator of the inbreeding coefficient FHOM = excess homozygosity estimator FHBD-T = inbreeding coefficient estimated as the probability of belonging to any of the HBD classes with a rate higher or equal to T averaged over the whole genome FPED = inbreeding coefficient estimated from pedigree data FPED-5G = inbreeding coefficient estimated from five generations of pedigree FPED-OLD = inbreeding coefficient estimated from ancient ancestors from the pedigree, and equal to FPED -FPED-5G FROH = inbreeding coefficient estimated from ROH longer than 2 Mb FROH-S = inbreeding coefficient estimated from short ROH ( The inbreeding measures related to recent or ancient inbreeding are: FPED-5G, the inbreeding coefficient estimated from five generations of pedigree; FPED-OLD, the inbreeding coefficient estimated from ancient ancestors from the pedigree, and equal to FPED -FPED-5G; FROH, the inbreeding coefficient estimated from ROH longer than 2 Mb; FROH-S, the inbreeding coefficient estimated from short ROH (2-5 Mb); FROH-M, the inbreeding coefficient estimated from intermediate ROH (5-10 Mb); FROH-L, the inbreeding coefficient estimated from long ROH (> 10 Mb); FROH-5, the inbreeding coefficient estimated from ROH longer than 5 Mb.  Figure S21. Correlation coefficients between individual regional inbreeding measures and regional scores in 1 Mb windows computed from the whole-genome sequence data in 145 individuals. The regional inbreeding coefficients were estimated only with markers present among the 5,977 SNPs from the bovine low-density genotyping array. The correlations for approximately 2500 windows are presented as a violin plot combined with an inner boxplot. A. Correlation with regional homozygosity. B. Correlation with regional homozygous mutation load (HML). Note that when estimated inbreeding coefficients were constant for a window, a null correlation was obtained.

Supplementary
Supplementary Figure S22. Correlation coefficients between individual inbreeding measures predicted using parental genotypes at 37,675 SNPs and scores obtained from the whole-genome sequence data in their 100 offspring. A. Correlation with homozygosity at alternate alleles grouped according to their allele frequency. B. Correlation with homozygosity at private alleles (young alleles) grouped according to their allele frequency. C. Correlation with marker homozygosity (counted at both reference and alternate alleles) as a function of minor allele frequency. D. Correlation with whole-genome homozygosity. The error bars represent the 95% confidence intervals.