Extreme inbreeding in a European ancestry sample from the contemporary UK population

Yengo, Loic; Wray, Naomi R.; Visscher, Peter M.

doi:10.1038/s41467-019-11724-6

Article
Open access
Published: 03 September 2019

Nature Communications volume 10, Article number: 3719 (2019) Cite this article

Abstract

In most human societies, there are taboos and laws banning mating between first- and second-degree relatives, but actual prevalence and effects on health and fitness are poorly quantified. Here, we leverage a large observational study of ~450,000 participants of European ancestry from the UK Biobank (UKB) to quantify extreme inbreeding (EI) and its consequences. We use genotyped SNPs to detect large runs of homozygosity (ROH) and call EI when >10% of an individual’s genome comprise ROHs. We estimate a prevalence of EI of ~0.03%, i.e., ~1/3652. EI cases have phenotypic means between 0.3 and 0.7 standard deviation below the population mean for 7 traits, including stature and cognitive ability, consistent with inbreeding depression estimated from individuals with low levels of inbreeding. Our study provides DNA-based quantification of the prevalence of EI in a European ancestry sample from the UK and measures its effects on health and fitness traits.

Introduction

Mating between close relatives, that is inbreeding, is reported in many species to yield deleterious outcomes, such as reduced fertility^1,2,3,4, stature^{2,4,5,6,7,8,9,10} and lifespan². In humans, consanguineous mating leads to higher childhood mortality^3,11,12 and to adverse effects on traits such as lung function^4,10,13 and cognitive ability^4,8,10,13. Because of its detrimental consequences, also referred to as inbreeding depression, a number of species have developed inbreeding avoidance mechanisms to limit its effects¹⁴. In humans, inbreeding avoidance mechanisms include cultural and religious taboos on incest, and laws explicitly forbidding certain types of mating. For instance, the Sexual Offences Act (2003) in the UK specifically forbids mating between first-degree (parent–offspring or fullsibs (FS), i.e., coefficient of relationship of 0.5) and second-degree (halfsibs (HS), grandparent–grandchild, avuncular or double-first cousins, i.e., coefficient of relationship of 0.25) relatives; and also forbids mating between step-relatives when one of the family members is below 18 years old. Cultural, legal, religious and health-related constraints strongly weigh on the ability to observe, and therefore study the causes and consequences of inbreeding between first- and second-degree relatives, hereafter referred to as extreme inbreeding (EI). A number of previous studies have attempted to quantify the prevalence and incidence of EI^{15,16,17,18,19,20}. However, as underlined by van den Berghe²¹, the estimates which they produced are questionable given the “disinclination of family members to report incest when it occurs, and the countervailing bias of many scholars and crusaders to magnify the problem on which they build their career”. 
Add to these limitations, the relatively small size of these studies (often <1000 participants) and the discrepancies between them with respect to the definition of EI, as some of these studies included mating between step-relatives²¹. Here, we leverage a large observational study of ~450,000 participants, the UK Biobank (UKB), to quantify EI and its consequences in contemporary European descents from the UK population. We compare our estimates with the prevalence of police-recorded cases of incest offences reported in the Crime Survey for England and Wales (CSEW) between April 2002 and March 2017. We also characterise the distribution of runs of homozygosity (ROHs) in EI cases and assess its consistency with theoretical predictions. Finally, we characterise the phenotypic consequences of EI on a number of health-related traits measured in UKB participants.

Results

Prevalence of EI in European descents from the UKB

We previously identified²² 456,426 individuals of European ancestry among the 487,409 UKB participants who have been genotyped. Ancestry was called in our previous study using projected principal components analysis based on known ancestry and whole-genome sequence data from 2504 participants of the 1000 Genomes Project²³ (Methods). Given that 12 participants had retracted consent, we only analysed 456,414 UKB participants in the present study. We used 301,412 quality-controlled genotyped single-nucleotide polymorphisms (SNPs) to call ROHs using the PLINK software (Methods). As in previous studies^4,8,10, ROHs were defined as homozygous >1.5 Mb long genomic segments (Methods). We then estimated for each study participant the percentage of their autosome comprising ROHs as a measure of inbreeding. Such inbreeding measure, hereafter denoted F_ROH, is a well-established predictor of pedigree inbreeding^24,25. Following guidelines from the American College of Medical Genetics and Genomics (ACMG)^26,27, EI was called for individuals with F_ROH > 0.1. The use of both F_ROH as a measure of inbreeding and of this threshold are recommended by the ACMG for detecting suspected consanguinity between parents.
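The definition of F_ROH and the EI call described above can be sketched in a few lines of Python. This is only an illustration and not the authors' pipeline (ROHs were called with PLINK): the function names are hypothetical, and the autosomal length used as denominator is an assumed round figure, since the exact denominator is not given in this section.

```python
MIN_ROH_BP = 1_500_000        # ROH length cut-off from the text (>1.5 Mb)
EI_THRESHOLD = 0.1            # F_ROH cut-off from the text (ACMG guideline)
AUTOSOME_BP = 2_880_000_000   # ASSUMPTION: approximate autosomal length in bp


def f_roh(roh_lengths_bp, autosome_bp=AUTOSOME_BP, min_bp=MIN_ROH_BP):
    """Fraction of the autosome covered by ROHs longer than min_bp."""
    kept = [length for length in roh_lengths_bp if length > min_bp]
    return sum(kept) / autosome_bp


def is_ei(roh_lengths_bp, threshold=EI_THRESHOLD):
    """Call extreme inbreeding when F_ROH exceeds the threshold."""
    return f_roh(roh_lengths_bp) > threshold


# Hypothetical individual: twenty 15 Mb ROHs plus five short segments
# that fall below the 1.5 Mb cut-off and are therefore ignored.
example = [15_000_000] * 20 + [1_000_000] * 5
print(round(f_roh(example), 4), is_ei(example))
```

With the assumed denominator, 300 Mb of long ROHs is just enough to cross the 0.1 threshold, whereas a handful of 2 Mb segments is not.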

We thus identified 125 unrelated participants (65 males and 60 females) whose genomes are consistent with their parents being first- or second-degree relatives. That represents a prevalence of EI of ~0.03%, i.e., ~1/3652 (95% confidence interval, CI_95%: [1/4428–1/3106]). As a sensitivity analysis, and consistent with theory predicting much longer ROHs under EI, we re-estimated the prevalence of EI considering only ROHs > 2 Mb or >5 Mb long. Using these alternative definitions of ROH, also recommended in the ACMG guidelines, we detected 115 (prevalence of ~1/3969; CI_95%: [1/4857–1/3355]) and 98 (prevalence of ~1/4658; CI_95%: [1/5807–1/3887]) cases of EI, respectively. We also estimated the prevalence of EI using allele-frequency based inbreeding measures or using ROHs detected on both autosomes and X-chromosomes of female participants (Supplementary Table 1). Given that the latter estimates of the prevalence of EI are not statistically distinct (paired t test: p > 0.05) from our first estimate based on ROHs > 1.5 Mb, we will hereafter only consider ROHs > 1.5 Mb.
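The prevalence and its interval follow directly from the counts given above (125 cases among 456,414 participants). The sketch below assumes a normal-approximation (Wald) interval for a binomial proportion; the interval method is not stated in this section, so this choice is an assumption, although it gives bounds close to those reported. The function name is hypothetical.

```python
import math


def prevalence_ci(cases, n, z=1.96):
    """Point estimate and Wald interval for a binomial proportion."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


p, lower, upper = prevalence_ci(125, 456_414)
# Express each value as "1 in N" to match the notation of the text.
print(f"prevalence ~1/{1 / p:.0f}; CI ~[1/{1 / lower:.0f}, 1/{1 / upper:.0f}]")
```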

We then compared our estimate of the prevalence of EI with the prevalence of incest offences reported in the CSEW between April 2002 and March 2017. That survey reports a total of 11,196 cases of police-recorded incest offences over this time period (URLs). Relative to the population of England and Wales, which varied from 52,602,200 to 58,744,600 between those years (URLs), this represents a prevalence ranging from ~1/5247 (CI_95%: [1/5346–1/5151]) to 1/4699 (CI_95%: [1/4787–1/4612]). The latter estimate is of the same order of magnitude as our estimated prevalence of EI in the UKB although these two estimates are based on different time periods (births 1938–1967 in the UKB vs. reports 2002–2017 in the CSEW). We then compared the mean years of birth among EI cases with the rest of UKB participants and found no statistical difference (p = 0.11). That suggests that the prevalence of EI is relatively unchanged over time although mean inbreeding coefficients have significantly decreased over the years (correlation between year of birth and F_ROH: r = −0.01%; Pearson’s correlation test p = 5.5 × 10⁻¹⁴). However, it is important to note that the prevalence of EI and that of police-recorded incest offences cannot naïvely nor strictly be compared because (i) only an unknown but likely small fraction of incest cases are reported to the police, (ii) not all cases of incest would result in viable offspring as observed in this study, and (iii) viable offspring with severe cognitive impairment due to inbreeding are unlikely to enrol themselves as participants in the UKB.

Fry et al.²⁸ previously reported that the UKB is not representative of the entire UK population, as it notably includes healthier and more educated participants than the average population. Such an ascertainment on traits which are negatively correlated with inbreeding (e.g., educational attainment (EA) or height²⁸) may lead the prevalence of EI in the UKB to be an underestimate of the actual prevalence of EI in the UK population. As a consequence, our estimate of prevalence of EI is likely conservative, although the magnitude of the underestimation is difficult to predict as it depends on many other unknown factors which might differ between UKB participants and the general population.

Deconvolution of underlying mating types

We next estimated the proportion of EI cases born from mating between first-degree relatives (mating type 1; MT1) vs. second-degree relatives (MT2) using a threshold-based approach based on F_ROH. To determine an optimal threshold, we simulated inbreeding under MT1 and MT2 using phased genotypes from 972 unrelated UKB participants (Methods, Supplementary Table 2). These 972 UKB participants are the offspring from 972 independent parent–offspring (PO) trios identified in the UKB²⁹. Over ~20,000 simulation replicates (one replicate is one simulated EI case) we found that F_ROH as a predictor of underlying mating type (MT1 vs. MT2) yields an area under the receiver operating characteristic curve (AUC) of ~0.97 and that using F_ROH > 0.17 as a threshold yields optimal sensitivity and specificity both >0.92 (Fig. 1). Using this threshold, we therefore identified 54/125 (i.e., ~43.2%) EI UKB cases whose parents are most likely first-degree relatives. It is worth noting that complex inbreeding loops between second-degree relatives may also lead to extreme values of F_ROH. However, mating between first-degree relatives remains a more parsimonious explanation of the empirical observations, in particular in a population of European ancestry where such complex inbreeding loops are uncommon.
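The threshold-based classification described above can be illustrated with a toy simulation. This sketch does not reproduce the genotype-based simulations of the study: it draws F_ROH from normal distributions centred on the theoretical expectations (0.25 for MT1, 0.125 for MT2) with hypothetical standard deviations, computes a rank-based AUC, and picks the threshold maximising sensitivity + specificity − 1 (Youden's J). The optimality criterion used by the authors is not stated here, so Youden's J is an assumption, and the printed values are not expected to match the reported ~0.97 and 0.17.

```python
import random


def auc(pos, neg):
    """Rank-based AUC: P(random positive > random negative); ties ignored."""
    ranked = sorted([(v, 1) for v in pos] + [(v, 0) for v in neg])
    rank_sum = sum(i + 1 for i, (_, is_pos) in enumerate(ranked) if is_pos)
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def best_threshold(pos, neg, grid):
    """Threshold on the grid maximising Youden's J = sens + spec - 1."""
    best_j, best_t = -1.0, None
    for t in grid:
        sens = sum(v > t for v in pos) / len(pos)
        spec = sum(v <= t for v in neg) / len(neg)
        if sens + spec - 1 > best_j:
            best_j, best_t = sens + spec - 1, t
    return best_t


rng = random.Random(42)
# ASSUMPTION: normal F_ROH with hypothetical SDs of 0.04 (MT1) and 0.03 (MT2).
mt1 = [rng.gauss(0.25, 0.04) for _ in range(1000)]
mt2 = [rng.gauss(0.125, 0.03) for _ in range(1000)]

grid = [i / 1000 for i in range(50, 351)]   # candidate thresholds 0.05..0.35
print(f"AUC = {auc(mt1, mt2):.3f}, threshold = {best_threshold(mt1, mt2, grid):.3f}")
```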

We further attempted to quantify the proportion of MT1 born from PO vs. FS mating (π_PO/FS). Given that the theoretical expectation of F_ROH is 0.25 both under PO and FS mating, we found, as expected, that F_ROH alone cannot discriminate PO from FS in our simulations (AUC of ~0.5).

However, F_ROH being proportional to the cumulative length of ROHs across the genome also implies that the same value of F_ROH could reflect either fewer larger ROHs, or more smaller ROHs. Therefore, we investigated if the numbers of ROHs detected (N_ROH) under PO or FS mating are different and if so can discriminate those two types of mating. We found on average over ~20,000 simulation replicates that N_ROH ~45 ROHs are detected in offspring of FS mating as compared to N_ROH ~38 ROHs detected in offspring of PO mating (Table 1). Moreover, ROHs detected in offspring of PO mating were on average ~2.7 Mb longer than ROHs detected under FS mating (Table 1). Consistent with these observations, we found that N_ROH as a predictor of mating type yields a discriminative AUC of ~0.81, with the optimal threshold of >41 yielding a sensitivity of ~0.77 and specificity of ~0.69. Using that threshold we predict that 24/54 (i.e., π_PO/FS ~44.4%; CI_95%: [31.2–57.7%]) EI cases with F_ROH > 0.17 are likely offspring of parent–offspring mating. We also considered an alternative approach that aims at directly estimating the proportion of EI cases born from PO vs. FS mating from modelling the length distribution of ROHs (Methods). We applied this method to 2244 ROHs segments detected in 54 EI cases with F_ROH > 0.17 and estimated that π_PO/FS ~67.6% (CI_95%: [45.2–90.1%]). To confirm this finding, we analysed the distribution of F_ROH from X-chromosome ROHs (hereafter denoted F_ROH-X) in 26 female EI cases with F_ROH > 0.17. This analysis is justified by the fact that the theoretical expectation of F_ROH-X equals 0.5 under PO mating vs. 0.25 under FS mating. We first stratified these 26 female EI cases into two groups (Group 1 and Group 2) depending on whether the likelihood of their autosomal segments lengths is larger under PO mating or under FS mating. 
More specifically, Group 1 (N = 10) and Group 2 (N = 16) contain female EI cases predicted to be offspring of FS and PO, respectively (Supplementary Fig. 1) from the length distribution of autosomal ROHs. The mean F_ROH-X in Group 2 is 0.53 (CI_95%: [0.41–0.65]), consistent with PO mating, while the mean F_ROH-X in Group 1 is 0.34 (CI_95%: [0.19–0.49]), which is consistent with FS mating, although standard errors are large. Altogether, we found that between 44.4 and 67.6% of EI cases with F_ROH > 0.17 are likely offspring of PO mating.

Table 1 Mean number and length of runs of homozygosity (ROHs) detected in participants from the UK Biobank (UKB), including extreme inbreeding (EI) cases (defined as F_ROH > 0.1) and unrelated EI controls (defined as F_ROH < 0.01). We also report the mean number and length of ROHs in simulated data under various mating types

We simulated inbreeding between first-cousins (hereafter denoted MT3) in order to quantify the ability of the F_ROH > 0.1 threshold recommended by the ACMG guidelines to discriminate MT1 or MT2 from MT3. We recall here that the coefficient of relationship between first-cousins is 0.125, and therefore the expected inbreeding coefficient of their offspring is E[F_ROH] = 0.5 × 0.125 = 0.0625. Also, MT3 is legal in most countries and thus more common in the population. We found over ~20,000 simulation replicates that F_ROH yields an AUC of ~0.95, and that using F_ROH > 0.1 as a threshold yields a sensitivity of ~0.94 and a specificity of ~0.79 to discriminate MT1 or MT2 from MT3 (Fig. 1). This, therefore, suggests that ~8/125 EI cases identified (i.e., ~6.4%) in this study could in fact be offspring of first-cousins mating. Hill and Weir³⁰ derived that the theoretical standard deviation of inbreeding coefficients of offspring of first-cousins is ~0.024. Therefore, assuming under MT3 that F_ROH is normally distributed with mean 0.0625 and standard deviation 0.024, it follows that the probability of F_ROH > 0.1 equals ~5.9%, which is consistent with our simulations.
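The two calculations in the paragraph above (the expected inbreeding coefficient of offspring of first cousins, and the normal tail probability of exceeding the 0.1 threshold) can be checked with the standard library alone. Function names are hypothetical; the numerical inputs are those quoted in the text.

```python
import math


def expected_f(coefficient_of_relationship):
    """Expected inbreeding coefficient of offspring: half the parents' relationship."""
    return 0.5 * coefficient_of_relationship


def normal_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


mean_f = expected_f(0.125)                  # first cousins
p_exceed = normal_tail(0.1, mean_f, 0.024)  # SD from Hill and Weir, as quoted
print(f"E[F_ROH] = {mean_f}, P(F_ROH > 0.1) = {p_exceed:.3f}")
```

The same `expected_f` gives 0.25 for first-degree relatives (relationship 0.5) and 0.125 for second-degree relatives (relationship 0.25).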

Distribution of ROH in EI cases

As expected, we found that EI cases harboured significantly more and significantly longer ROHs than EI controls (F_ROH < 0.01) in the population (Table 1). On average, we detected N_ROH ~33.6 ROHs in EI cases vs. ~4.9 ROHs in EI controls. The mean length of ROHs was L_ROH ~14.8 Mb in EI cases vs. ~2.1 Mb in EI controls. Both mean numbers and mean lengths of ROHs detected are consistent with our simulations of EI (mean N_ROH ~33.6 and L_ROH ~14.0 Mb; Table 1). We represent in Fig. 2 the histogram of ROH lengths in EI cases, and report in Fig. 3 a few examples of very large ROHs (>100 Mb) covering ~50% of an entire chromosome. We also report X-chromosome ROHs detected in 54/125 female EI cases in Supplementary Fig. 2.

Previous theoretical studies have often considered the length of genomic segments homozygous by descent (HBD) to follow an exponential distribution^31,32. These studies generally relied on specific assumptions regarding recombination map functions, like Haldane or Kosambi map functions, which yield tractable algebraical simplifications. However, empirical evidence supporting these assumptions remains limited. Moreover, some of these simplifying assumptions like that of independence between the lengths and the numbers of HBD segments have also been criticised³³. Here, we used an empirical approach to estimate the length distribution of ROHs segments detected in EI cases using a mixture of exponential distributions. Given that only ROHs larger than 1.5 Mb were detected, we modelled the distribution of lengths minus 1.5 Mb and not directly the length distribution, which would better fit a mixture of truncated exponential distributions (Methods). Mixtures of exponential distributions represent a flexible family of probability distributions, from which the exponential distribution is a special case. We selected the number of mixture components best fitting the data using the Bayesian Information Criterion (BIC). To calibrate our inference, we first estimated the length distribution of >282,635 simulated true HBD segments under various mating types. Our simulations are based on observed recombination maps from the 1000 Genomes Project²³, and therefore do not make additional assumptions regarding recombination rates (Methods). We found for all simulated mating types that BIC selects two mixture components, which suggests that the single exponential distribution is likely too simple to characterise the length distribution of HBD segments. Of note, mixtures of two exponential distributions also yield a better fit than gamma distributions that have previously also been proposed¹. Similarly, we estimated the length distribution of >99,794 ROHs detected in our simulated data. 
We found consistently that the length distribution of simulated ROHs is also well characterised by a mixture of two exponential distributions. We report in Table 2, the parameters of the mixture distributions estimated from true HBD segments and from ROHs. We then estimated the length distribution of the 4196 ROHs detected over all EI cases. We found this distribution to fit a 84:16 mixture of exponential distributions with means ~15.7 Mb (larger component) and ~0.7 Mb (smaller component), respectively (Table 2; Fig. 2). Overall, our findings suggest that the length distribution of HBD segments and ROHs can be well approximated with a mixture of two exponential distributions.
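A mixture of two exponential distributions can be fitted with a short expectation-maximisation (EM) loop, and compared with a single exponential using BIC. The sketch below is an illustration under stated assumptions, not the authors' procedure (which is described in their Methods): EM is an assumed fitting method, the function name is hypothetical, and the data are simulated from the 84:16 mixture with means 15.7 Mb and 0.7 Mb quoted above, standing in for "length minus 1.5 Mb".

```python
import math
import random


def fit_two_exponentials(x, n_iter=300):
    """EM for w*Exp(mean=m_small) + (1-w)*Exp(mean=m_large); not robust to collapse."""
    xs, n = sorted(x), len(x)
    # Initialise the two means from the lower and upper halves of the data.
    m_small = sum(xs[: n // 2]) / (n // 2)
    m_large = sum(xs[n // 2:]) / (n - n // 2)
    w = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each value is from the small component.
        resp = []
        for v in xs:
            a = w * math.exp(-v / m_small) / m_small
            b = (1 - w) * math.exp(-v / m_large) / m_large
            resp.append(a / (a + b))
        # M-step: update the weight and the responsibility-weighted means.
        s = sum(resp)
        w = s / n
        m_small = sum(r * v for r, v in zip(resp, xs)) / s
        m_large = sum((1 - r) * v for r, v in zip(resp, xs)) / (n - s)
    loglik = sum(
        math.log(w * math.exp(-v / m_small) / m_small
                 + (1 - w) * math.exp(-v / m_large) / m_large)
        for v in xs
    )
    return w, m_small, m_large, loglik


rng = random.Random(1)
data = [rng.expovariate(1 / 15.7) if rng.random() < 0.84 else rng.expovariate(1 / 0.7)
        for _ in range(4000)]

w, m_small, m_large, ll2 = fit_two_exponentials(data)
mean = sum(data) / len(data)
ll1 = -len(data) * (math.log(mean) + 1)    # log-likelihood of a single exponential
bic1 = 1 * math.log(len(data)) - 2 * ll1   # one free parameter
bic2 = 3 * math.log(len(data)) - 2 * ll2   # weight plus two means
print(f"weights {1 - w:.2f}:{w:.2f}, means {m_large:.1f} and {m_small:.1f} Mb")
print("BIC prefers two components:", bic2 < bic1)
```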

Table 2 Parameters of mixtures of exponential distributions estimated from observed length distributions of homozygous-by-descent (HBD) genomic segments and runs of homozygosity (ROH)

Another observation in our simulations was that the mean number of ROHs detected in an individual was larger than the number of true HBD segments simulated. This somewhat counterintuitive observation is explained by the fact that HBD were defined as segments identical-by-descent (from parents to offspring), while ROHs were re-estimated from the genotypes of simulated offspring. As a consequence, although simulated offspring of matings between unrelated parents have exactly zero HBD segments, they still harbour ROHs > 1.5 Mb given that their chromosomes were sampled from 972 existing UKB participants. Despite not being closely related (genomic relationship (GRM) < 0.05), these 972 UKB participants are still likely to have a distant common ancestor (>25 generations ago), which would lead to detection of ROHs > 1.5 Mb in their (simulated) offspring. We found that simulated offspring of matings between unrelated parents had on average 4.8 ROHs > 1.5 Mb (Table 1). If we subtract that number (i.e., 4.8 ROHs) from the mean number of ROHs detected under simulated inbred matings (Tables 1 and 2), we now find very consistent mean numbers of ROHs and HBD segments per individual. More specifically, for each simulated inbred mating we find, after this correction, 32.5 HBD vs. 33.3 ROH for PO mating, 41.6 HBD vs. 40.4 ROH for fullsibs mating, 20.8 HBD vs. 20.2 ROH for HSs mating, 25.2 HBD vs. 23.5 ROH for avuncular mating, 20.8 HBD vs. 20.1 ROH for grandparent–grandchild mating, 29.8 HBD vs. 26.8 ROH for double-first cousin mating and 14.9 HBD vs. 13.3 ROH for first-cousin mating.

Phenotypic consequences of EI

We quantified the consequences of EI on multiple traits measured in the UKB. We first analysed ten control traits with prior evidence of inbreeding depression^4,8,10,13. Those ten traits are height, hip-to-waist ratio (HWR), handgrip strength (HGS; average of left and right hand), lung function measured as the peak expiratory flow (PEF), visual acuity (VA), auditory acuity (AA), number of years of education (EA), fluid intelligence score (FIS), cognitive function measured as the mean time to correctly identify matches (MTCIM) and fertility measured as the number of children (NCh). We performed linear regressions of these traits on the EI status adjusted for age at recruitment, recruitment centre (treated as a categorical factor), sex, year of birth (treated as a continuous variable), genotyping batch (treated as a factor), socioeconomic status measured by the Townsend deprivation index and population structure measured by ten genetic principal components estimated from HM3 SNPs. As expected, we found that EI cases had a reduced mean in these ten traits as compared to EI controls. More specifically, we found phenotypic means in EI cases to be between 0.3 and 0.7 standard deviation below the population mean (Table 3). Note that under normality assumptions, between ~25 and ~40% of the population has a phenotype more than 0.7 and 0.3 standard deviations below the mean, respectively. Despite the small sample size of 125 EI cases, the reduction was statistically significant (Wald-test p < 0.05/10 = 0.005) for 7 out of the 10 traits (Table 3). We also specifically estimated the inbreeding load (often denoted B), which represents the number of loci with deleterious alleles that would cause one death on average if made homozygous³. As previously recommended³⁴, we estimated B using Poisson regression of the number of children engendered onto F_ROH. 
Poisson regression was performed using a logarithmic link function as also previously recommended³⁴ and adjusted for the same covariates listed above. For this analysis, we used the entire distribution of F_ROH, (i.e., includes both EI cases and EI controls) and found an estimate of B ~1.46 (CI_95%: [0.87–2.05]; Wald-test p = 1.3 × 10⁻⁶; Table 3). The effect of inbreeding on fertility of the resulting inbred offspring, that we have quantified here, has been previously detected in humans³⁵. However, the latter study did not provide an estimate of inbreeding load that can be directly compared with ours. Nonetheless, we found that our estimate is consistent with estimates of inbreeding load on survival of offspring from inbred mating in humans^3,36 and other species^34,37, although these are different traits.
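The estimation of B by Poisson regression with a logarithmic link can be sketched as follows. This is a toy example under stated assumptions, not the analysis reported above: it models log E[NCh] = a − B × F with no covariates, fits the two parameters by Newton-Raphson, and uses simulated data in which F is drawn uniformly between 0 and 0.25 (a hypothetical distribution, unlike that of the UKB) with a hypothetical baseline of two children. The simulated B is set to the reported estimate of 1.46; function names are hypothetical.

```python
import math
import random


def poisson_draw(mu, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson(f, y, n_iter=25):
    """Newton-Raphson for log E[y] = a + b*f; returns (a, b)."""
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for fi, yi in zip(f, y):
            mu = math.exp(a + b * fi)
            g0 += yi - mu              # score for the intercept
            g1 += (yi - mu) * fi       # score for the slope
            h00 += mu                  # information matrix entries
            h01 += mu * fi
            h11 += mu * fi * fi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(0)
true_b = 1.46                                        # value reported in the text
f_values = [rng.uniform(0.0, 0.25) for _ in range(20000)]   # ASSUMPTION
children = [poisson_draw(math.exp(math.log(2.0) - true_b * fi), rng) for fi in f_values]

intercept, slope = fit_poisson(f_values, children)
b_hat = -slope                                       # inbreeding load = minus the slope
print(f"estimated B = {b_hat:.2f}")
```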

Table 3 Association between extreme inbreeding (EI) and multiple traits measured in UK Biobank participants (125 EI cases vs. 345,276 EI controls)

We then assessed whether the observed reduction in these ten traits was consistent with inbreeding depression quantified within EI controls. Under the assumption that inbreeding depression results only from directional dominance effects of deleterious alleles or heterozygote advantage (overdominance), phenotypes are expected to decline linearly with increased inbreeding. However, if epistasis contributes to inbreeding depression^38,39 or if causal variants for inbreeding depression are rarer¹, a nonlinear relationship could be observed in particular for large inbreeding coefficients. To test this hypothesis we first estimated inbreeding depression in 345,276 EI controls unrelated with each other and unrelated with the 125 EI cases. For each of the 10 control traits, we then compared the phenotypic mean in the 125 EI cases, with a linear prediction based on the estimate of inbreeding depression in EI controls. For this analysis inbreeding depression was also estimated using an alternative inbreeding measure (F_UNI), which we previously showed to be more powerful for detecting inbreeding depression⁴. The latter analysis did not reveal a significant deviation from the linear prediction (Wald-test p > 0.005) regardless of the inbreeding measure used, which therefore underlines that the observed phenotypic reduction in EI cases is consistent with inbreeding depression observed within EI controls (Fig. 4). This also suggests that causal variants contributing to inbreeding depression in those traits are likely well-tagged (i.e., correlated) by common variants in the population. However, we acknowledge that the estimate of inbreeding depression from the EI cases present in the UKB might be too low if, as seems plausible, they are a relatively healthy sample from the population of all EI cases in the UK²⁸.
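The comparison between the observed mean in EI cases and a linear prediction from EI controls can be written as a simple Wald test. The sketch below is one possible formulation, assuming the slope estimated in controls is independent of the case mean (cases and controls are unrelated) and treating the mean F in cases as fixed; the exact test statistic used in the study is not given in this section. All numerical inputs are hypothetical placeholders, not values from Table 3.

```python
import math


def linear_prediction_test(slope, slope_se, mean_f_cases, observed_mean, observed_se):
    """Wald test of observed case mean against slope * mean F (two-sided p-value)."""
    predicted = slope * mean_f_cases
    se = math.sqrt(observed_se ** 2 + (slope_se * mean_f_cases) ** 2)
    z = (observed_mean - predicted) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return predicted, z, p_value


# HYPOTHETICAL inputs: a decline of 3 SD per unit of F estimated in controls,
# a mean F of 0.17 in cases, and an observed case mean of -0.5 SD.
predicted, z, p_value = linear_prediction_test(
    slope=-3.0, slope_se=0.3, mean_f_cases=0.17, observed_mean=-0.5, observed_se=0.09
)
print(f"predicted = {predicted:.2f} SD, z = {z:.2f}, p = {p_value:.2f}")
```

With these placeholder values the observed mean is close to the linear prediction, so the test would not reject linearity at the 0.005 threshold used in the text.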

We next analysed the number of diseases diagnosed in an individual as an overall measure of health (Methods). We used overdispersed Poisson regression to estimate the relative risk (RR) of being diagnosed with at least one disease in EI cases as compared to EI controls. We found a RR of ~1.44 (Wald-test p = 3.6 × 10⁻⁵; Table 3). To minimise potential biases due to partial or differential disease reporting between UKB participants, we re-estimated RR in individuals with at least one disease diagnosed. This analysis included only 110 of the 125 EI cases identified and similarly showed a reduced but still significant RR ~1.34 (Wald-test p = 4.4 × 10⁻⁴; Table 3). In summary, we confirm that EI produces offspring with reduced stature (height), cognitive function (EA, FIS, and MTCIM), AA, muscular fitness (HGS), and lung function (PEF), consistent with a linear decline in these traits as inbreeding increases. We also provide additional evidence that offspring resulting from EI have increased risk for developing any type of disease.

Social context of EI cases

We tested the association between EI and the Townsend deprivation index, which quantifies the level of socioeconomic deprivation in areas where UKB participants live. We found significant evidence that EI is enriched in more socioeconomically deprived areas (odds ratio: 1.22; CI_95%: [1.16–1.29]; Wald-test p = 2.6 × 10⁻¹³), consistent with a previous study¹³, which reported association between F_ROH and the same index in the UKB.

We further investigated the social contexts in which EI arose. For that we compared different characteristics of the parents of EI cases with those of the parents of EI controls. We found that 14.5% (i.e., 18/124, 1 missing value) of EI cases vs. 1.5% of controls reported having been adopted as a child (Fisher exact test p = 7.3 × 10⁻¹³). Given the significance of this difference, we restricted all subsequent comparisons to nonadopted participants (106 EI cases vs. 339,241 EI controls) in order to minimise biases due to differential reporting of parental traits.

Previous studies⁴⁰ have suggested that low EA of parents could be a cause of inbreeding in the population. Given that EA of parents of UKB participants has not been measured, we tested this hypothesis by comparing mean genetic predictors of EA in UKB participants between EI cases and EI controls. Note that the mean genetic predictor of EA is an estimate of the parental average for this trait. We found no statistical evidence that the mean genetic predictor of EA in EI cases deviates from that of EI controls (t test p = 0.538; Table 3). In fact, the mean genetic predictor of EA in EI cases approximately equals the median of the EA genetic predictor distribution in EI controls, which highlights that EI cases are not outliers on this scale. Besides EA, we then used overdispersed Poisson regression to compare the number of diseases (Methods) reported in parents of EI cases vs. parents of controls, which we used as another proxy for socioeconomic status of parents. We found no significant evidence that parents of EI cases are enriched for comorbidities as compared to parents of EI controls (RR ~0.96; Wald-test p = 0.507; Table 3). However, this observation must be interpreted with caution as it may simply reflect that EI cases observed in the UKB may be from healthier backgrounds as compared to EI cases in the general population. Although additional information on parents of UKB participants was available (i.e., age of parents or age when parent died), missing values rates were often too large (>50%) among EI cases to draw reliable inference. Finally, we investigated if EI cases were geographically clustered, but found no significant association between EI and birth location (North coordinate: Wald-test p = 0.15; East-coordinate: Wald-test p = 0.08). 
Note that the absence of geographical clustering that we report only applies to these extreme events and could also reflect lack of statistical power as we still observed variance in mean F_ROH between different geographical areas of the UK. Altogether, although we observed that EI is more prevalent in more socially deprived areas of the UK, our results point to an absence of evidence that social and geographical stratification of parents contribute to the prevalence of EI in the population.

Discussion

In this study, we estimated a prevalence of EI of ~1/3652 in individuals of European ancestry born in the UK between 1938 and 1967. Importantly, our estimate of the UK prevalence of EI is likely downwardly biased partly because of the ascertainment of UKB participants, who are on average healthier and more educated than the rest of the UK population²⁸. It is also worth mentioning that our estimate only accounts for mating between close relatives that has led to viable offspring. Altogether, our findings suggest that the prevalence of EI in the population is small and that very large observational studies are required to quantify it accurately.

We aimed in this study to quantify EI as it can routinely be detected in clinical screenings if genotypes are available. Therefore, we followed guidelines from the American College of Medical Genetics and Genomics, which recommend the use of both F_ROH and a threshold at 0.1. Nevertheless, we acknowledge that ACMG guidelines may be suboptimal with respect to detection of EI and that other approaches could have been implemented^41,42. We found in our simulations that a threshold of 0.1 may in fact be too conservative, while using a threshold of ~0.08 is optimal with respect to specificity and sensitivity to detect EI (Fig. 1).

In addition, our study has addressed theoretical questions regarding the distribution of genomic segments homozygous-by-descent, which are classically approximated using long ROHs. Indeed, we explored how the distribution of long ROHs can be utilised to infer mating types underlying EI. Although we only applied threshold-based methods, we found that such simple approaches perform quite well in our simulations (AUC > 0.95). However, it is worth mentioning that previous studies have addressed a similar question using more elaborate models. For example, Druet and Gautier⁴² introduced a model-based approach which assumes individual genomes to be a mosaic of HBD and non-HBD segments, and allows HBD segments to originate from different ancestors at different time points. The aim of their method is therefore to estimate simultaneously the age and the HBD status of genomic segments. Note that knowing the age of an HBD segment directly informs the likelihood of certain mating types.

One similarity between Druet and Gautier’s approach and ours is that we both assumed the distribution of HBD segments to follow a mixture of exponential distributions. However, our approach relies on observed ROHs, which we have assumed to be HBD, whereas Druet and Gautier model HBD segments as unobserved states of a hidden Markov chain. Consequently, their inference is likely more robust to biases from ROH calling, which often requires arbitrary choices to be made (e.g., minimum length of ROHs, minimum distance between ROHs and number of occasional heterozygotes allowed). On the other hand, the Druet and Gautier method relies on the assumption that the length of HBD segments follows an exponential distribution as a consequence of assuming a constant recombination rate. Our study provides a simulation-based (using observed genetic maps) and an empirical quantification of the length distribution of long genomic segments identical-by-descent, which we found to best fit a mixture of two exponential distributions. Therefore, our results confirm that the assumption of constant recombination rate is inappropriate for describing the segment length distribution³³, and we show that mixtures of exponential distributions provide a mathematically tractable framework to accommodate arbitrary recombination maps. We note that Druet and Gautier acknowledged that violation of the assumption of a constant recombination rate across the genome could limit the interpretation of their model parameters.

We showed in this study that the reduction in measured values of multiple complex fitness-related traits resulting from EI is consistent with inbreeding depression estimated within EI controls, who still harbour ROHs in their genome⁴³. If inbreeding depression in EI controls is well estimated, then the latter finding would suggest that gene × gene or gene × environment interactions contribute little to inbreeding depression in the traits analysed, and also that variants causing inbreeding depression in these traits are well tagged (i.e., correlated) by common SNPs. However, because of ascertainment of UKB participants, who are on average healthier and more educated than the general population²⁸, inbreeding depression in UKB participants may also be underestimated. Moreover, Curik et al.⁴⁴ showed using computer simulations that the absence or presence of a nonlinear relationship between inbreeding and traits should be interpreted with caution, in particular when inbreeding depression is estimated using an inbreeding measure which only partially reflects realised autozygosity, as is the case for F_ROH.

Lastly, we attempted to quantify the contribution of social contexts to the prevalence of EI. Despite the sparsity of parental information for EI cases, we found no evidence that EI is more prevalent in health-deprived families or that low education increases the likelihood of EI in the population. In conclusion, our study provides an objective quantification of EI in the UK population and sheds light on its causes and phenotypic consequences.

Methods

SNP genotyping

We used genotyped and imputed allele counts at 16,652,994 SNPs imputed to the Haplotype Reference Consortium⁴⁵ imputation reference panel, in 487,409 participants of the UKB^29,46. An extensive description of the data can be found here²⁶. We restricted our analysis to 456,414 participants of European ancestry identified using projected principal components based on sequenced participants of the 1000 Genomes Project with known ancestry²⁶. This subset of the UKB contains 348,502 conventionally unrelated participants, i.e., participants whose estimated pairwise relatedness in the SNP-based genetic relationship matrix (GRM) is < 0.05, estimated from 1,124,803 common (minor allele frequency (MAF) ≥ 1%) HapMap3⁴⁷ SNPs using GCTA (v1.9)⁴⁸. The North West Multi-Centre Research Ethics Committee (MREC) approved the study and all participants in the UKB study analysed here provided written informed consent.

Polygenic predictor of EA

We used estimated SNP effects from the Lee et al.⁴⁹ GWAS of EA to calculate a polygenic score predicting EA. HM3 SNP effects were re-estimated after excluding data from the UKB. Marginal SNP effects were then transformed into conditional SNP effects using the LDpred method⁵⁰, assuming all SNPs to be causal. The latter analysis used genotypes at HM3 imputed SNPs of ~300,000 unrelated UKB participants as the linkage disequilibrium reference panel.

ROH detection

ROHs were called using only 301,412 SNPs genotyped in 456,414 UKB participants of European descent. These SNPs were filtered on missingness rate (missingness < 1%), MAF > 5% and Hardy–Weinberg equilibrium test p value > 0.0001. As in previous studies^4,8,10, we used the following PLINK (versions 1.07 and 1.9)^51,52 command to call ROHs: --maf 0.05 --homozyg --homozyg-density 50 --homozyg-gap 1000 --homozyg-kb 1500 --homozyg-snp 50 --homozyg-window-het 1 --homozyg-window-missing 5 --homozyg-window-snp 50. That command detects ROHs at least 1.5 Mb long, at least 1 Mb apart from one another, containing at least 50 SNPs, and such that SNPs overlapping an ROH can have at most 5 missing values and 1 occasional heterozygote. Once ROHs were detected, we calculated an inbreeding measure F_ROH for each individual by dividing the cumulative length of ROHs in Mb by an estimate of the length of the human autosome, i.e., ~2881 Mb under genome build hg19. Note that this estimate of autosome length may vary between genome builds, and therefore may affect the number of individuals detected above a given threshold.
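The F_ROH calculation described above can be illustrated with a minimal Python sketch. It assumes that ROH lengths for one individual are available in kilobases (as in the KB column of a PLINK .hom file); the function names and the default EI threshold of 10% (taken from the abstract) are illustrative choices, not the authors' code.

```python
# Minimal sketch of the F_ROH calculation; names are hypothetical.
AUTOSOME_LENGTH_MB = 2881.0  # approximate hg19 autosome length quoted in the text


def f_roh(roh_lengths_kb, autosome_length_mb=AUTOSOME_LENGTH_MB):
    """Return F_ROH: cumulative ROH length (Mb) divided by autosome length (Mb)."""
    total_mb = sum(roh_lengths_kb) / 1000.0  # convert kb to Mb
    return total_mb / autosome_length_mb


def is_extreme_inbreeding(f_value, threshold=0.10):
    """Flag an individual as an EI case when more than 10% of the genome is in ROHs."""
    return f_value > threshold
```

Because the denominator depends on the genome build, passing a different autosome_length_mb changes which individuals cross the threshold, which is the caveat raised in the paragraph above.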

Simulation of EI

To simulate EI we used 972 independent (GRM < 0.05) trios (both parents and one offspring) out of 1066 identified in the UKB²⁹. We used the same set of 301,412 genotyped and quality-controlled SNPs as for ROH calling to phase haplotypes using SHAPEIT 2 with the following options: --duohmm -W 5 -T 10 and using genetic maps from the 1000 Genomes (1KG) Project phase 3 (hg19, see URLs)²³. We considered eight different mating types (pedigrees): mating between unrelated individuals (i.e., any pair among the unrelated 972 samples), between first cousins, between double-first cousins, between grandchildren and grandparents, between uncles/aunts and nieces/nephews, between HSs, between fullsibs and between parents and offspring. For illustration, we describe here the case of PO mating. First, we sample a random pair of individuals (denoted P₁ and P₂) out of 972 × 971/2 = 471,906 possible pairs. We then create recombined chromosomes from haplotypes of P₁ and P₂. For all genetic intervals defined in the 1KG genetic maps, we sample the Bernoulli distributed indicator of the presence of a recombination breakpoint with probability equal to 0.01 × genetic distance of the interval in Morgan(s). Once the recombined chromosomes of the offspring O of P₁ and P₂ are simulated, we repeat this procedure to simulate an offspring resulting from mating of O with one of the parents, i.e., P₁ or P₂. To mimic real data, which contain genotyping errors, we also add a random number of errors to the simulated genotypes. The number of errors is sampled from a Poisson distribution with a mean corresponding to the mean number of genotyping errors estimated, for each chromosome, from comparing genotypes of 168 twin pairs (Supplementary Table 2). Overall, we found a genotyping error rate at quality-controlled SNPs of ~4.5 × 10⁻⁴, which is orders of magnitude larger than the rate of new somatic mutation, previously estimated at ~2.8 × 10⁻⁷ in human fibroblasts⁵³. Therefore, somatic mutation would have a negligible effect on ROH calling given the set of parameters that we used.
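The recombination step of the PO simulation described above can be sketched in Python as follows. This is a simplified illustration, not the authors' R script: haplotypes are plain lists of alleles for a single chromosome, breakpoint_prob is a hypothetical list holding the per-interval breakpoint probability (derived from the genetic map as described in the text), and the genotyping-error step is omitted.

```python
import random


def simulate_gamete(hap_a, hap_b, breakpoint_prob, rng):
    """Build one recombined chromosome from a parent's two haplotypes.

    breakpoint_prob[i] is the probability of a breakpoint between SNP i and SNP i+1.
    """
    if len(hap_a) != len(hap_b) or len(breakpoint_prob) != len(hap_a) - 1:
        raise ValueError("haplotypes and breakpoint probabilities are inconsistent")
    haplotypes = (hap_a, hap_b)
    current = rng.randrange(2)  # parental haplotype copied first
    gamete = [haplotypes[current][0]]
    for i, prob in enumerate(breakpoint_prob):
        if rng.random() < prob:  # Bernoulli draw for a breakpoint in this interval
            current = 1 - current  # switch to the other parental haplotype
        gamete.append(haplotypes[current][i + 1])
    return gamete


def simulate_offspring(parent1, parent2, breakpoint_prob, rng):
    """Each parent is a pair of haplotypes; the offspring receives one gamete from each."""
    return (simulate_gamete(*parent1, breakpoint_prob, rng),
            simulate_gamete(*parent2, breakpoint_prob, rng))


def simulate_po_offspring(p1, p2, breakpoint_prob, rng):
    """Simulate O from P1 x P2, then an offspring of O mated with one of the parents."""
    offspring = simulate_offspring(p1, p2, breakpoint_prob, rng)
    parent = rng.choice([p1, p2])
    return simulate_offspring(offspring, parent, breakpoint_prob, rng)
```

Passing a seeded random.Random instance as rng keeps a simulation reproducible; other pedigrees (e.g., FS mating) would reuse simulate_offspring with a different sequence of matings.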

Association with phenotypes measured in the UKB

We used GCTA with the --ibc command to estimate, for each UKB participant, the correlation between uniting gametes⁴⁸. That statistic, denoted F_UNI (also known as “Fhat3”), is an estimate of inbreeding using allele frequencies in the current population and was previously shown to be more powerful to detect ID⁴. We nonetheless considered F_ROH as a reference inbreeding measure in this study, in accordance with the ACMG guidelines. We tested the association between inbreeding measures (F_ROH and F_UNI) and traits using linear regression adjusted for age at recruitment (UKB field 21022-0.0), sex, assessment centre (UKB field 54-0.0), genotyping chip and batch, year of birth (UKB field 34-0.0), socioeconomic status measured by the Townsend deprivation index (UKB field 189-0.0) and 10 genetic principal components calculated using PLINK 2.0. Analyses were performed in 345,276 unrelated EI controls (F_ROH < 0.01). Traits were pre-adjusted and inverse normal transformed, and phenotypic values larger than 4 standard deviations were excluded. UKB identifiers for tested traits are: height (UKB field 50-0.0), hip-to-waist ratio (HWR: ratio of UKB field 49-0.0 over UKB field 48-0.0), HGS (average of UKB fields 46-0.0 and 47-0.0), lung function measured as the PEF (UKB field 3064-0.0), VA measured on the logMAR scale (VA: average of UKB field 5201-0.0 and UKB field 5208-0.0), auditory acuity measured as the speech reception threshold (AA: average of UKB field 20019-0.0 and UKB field 20021-0.0), number of years of education (EA), fluid intelligence score (FIS: UKB field 20016-0.0), cognitive function measured as the mean time to correctly identify matches (MTCIM: UKB field 20023-0.0) and fertility measured as the number of children (NCh: for males UKB field 2405-0.0 and for females UKB field 2734-0.0).

To test the association between the number of diseases diagnosed and inbreeding, we used overdispersed Poisson regression implemented in R 3.2.0 (glm function with option family = “quasipoisson”). The number of diseases diagnosed was estimated as the number of International Classification of Diseases, Tenth Revision (ICD10) codes reported for UKB participants. We also analysed reported illnesses in fathers and mothers of UKB participants (UKB fields 20107 and 20110, respectively) as a measure of health deprivation in the family. Illnesses of parents were reported among 12 groups of diseases (URLs). We created for each participant a count of diseases in both parents. Analyses were adjusted for adoption status (UKB field 1767) and missing values on the parental diseases were excluded.
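The trait pre-processing mentioned above (exclusion of extreme values and inverse normal transformation) can be sketched in Python as below. The text does not specify the exact variant used, so this sketch makes two assumptions: values are excluded when they lie more than 4 standard deviations from the mean, and the transformation is rank-based with the Blom offset (0.375). Function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def exclude_outliers(values, n_sd=4.0):
    """Keep values within n_sd standard deviations of the mean (assumed rule)."""
    centre, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


def inverse_normal_transform(values, offset=0.375):
    """Rank-based inverse normal transformation; ties receive their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    # Blom-type formula: (rank - offset) / (n - 2 * offset + 1)
    return [normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]
```

The order in which exclusion and transformation are applied is left to the caller, since the text does not state it explicitly.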

Length distribution of ROHs

We estimated the length distribution of ROHs using a mixture of exponential distributions with a number of components from 1 to 10. Given that only ROHs larger than 1.5 Mb are detected, we analysed lengths of ROHs in Mb minus the minimum threshold (as in Fig. 2). This choice is justified by the following property of exponential distributions. If X follows an exponential distribution of rate λ, then Y = X|X > s, i.e., the truncated distribution of X with values larger than a given threshold s, is such that (Y − s) also follows an exponential distribution with the same rate (λ) as X. Estimation of the mixture distribution was performed using the R package Renext. Model selection was performed using the Bayesian information criterion (BIC).
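For completeness, the property invoked above follows directly from the exponential survival function P(X > x) = exp(−λx). This is a standard result, restated here with the same symbols X, s and λ; for any y ≥ 0,

$$P\left( {X - s > y\,|\,X > s} \right) = \frac{{P\left( {X > s + y} \right)}}{{P\left( {X > s} \right)}} = \frac{{e^{ - \lambda (s + y)}}}{{e^{ - \lambda s}}} = e^{ - \lambda y}.$$

Applying the same argument component by component shows that, if X follows a mixture of exponential distributions with weights w_k and rates λ_k, then X − s given X > s again follows a mixture of exponential distributions with the same rates λ_k, but with weights proportional to w_k exp(−λ_k s).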

Discriminate PO vs. FS mating from ROH length distribution

Given a collection of autosomal ROH segment lengths, we developed a method for estimating the proportion π_PO/FS of these segments resulting from PO vs. FS mating. We denote f_PO and f_FS as the probability density functions of ROH segment length under PO and FS, respectively. We assume that the length distribution of the set of ROHs used for inference is a mixture of f_PO and f_FS and we denote π_PO/FS as the mixture proportion. We also assume f_PO and f_FS to be known, so that the only parameter to estimate is π_PO/FS.

The log-likelihood l(x;π_PO/FS) of one segment of length x can be written as

$${l\left( {x;\pi _{PO/FS}} \right) = {\mathrm{log}}\left[ {\pi _{PO/FS}f_{PO}(x) + \left( {1 - \pi _{PO/FS}} \right)f_{FS}(x)} \right] = {\mathrm{log}}\left[ {\pi _{PO/FS}\left( {f_{PO}(x) - f_{FS}(x)} \right) + f_{FS}(x)} \right]}.$$

(1)

From that we can write the Fisher information as

$${\Bbb E}\left[ { - \frac{{\partial ^2l\left( {X;\pi _{PO/FS}} \right)}}{{\partial \pi _{PO/FS}^2}}} \right] = {\Bbb E}\left[ {\left( {\frac{{f_{PO}(X) - f_{FS}(X)}}{{\pi _{PO/FS}\left( {f_{PO}(X) - f_{FS}(X)} \right) + f_{FS}(X)}}} \right)^2} \right],$$

(2)

where the log probability density function of X is l(x;π_PO/FS). Therefore, the asymptotic variance of the maximum likelihood estimator $\hat \pi _{PO/FS}$ of π_PO/FS is

$${\mathrm{var}}[\hat \pi _{PO/FS}] \approx \frac{1}{N} \times \left\{ {{\Bbb E}\left[ {\left( {\frac{{f_{PO}(X) - f_{FS}(X)}}{{\pi _{PO/FS}\left( {f_{PO}(X) - f_{FS}(X)} \right) + f_{FS}(X)}}} \right)^2} \right]} \right\}^{ - 1},$$

(3)

where N is the number of segments used to estimate π_PO/FS.

We use parameters from Table 2 to characterise f_PO and f_FS. Each of these two distributions was approximated using a mixture of two exponential distributions whose parameters were estimated from >648,125 simulated ROHs under PO and FS. Conditional on f_PO and f_FS, estimating π_PO/FS is therefore a straightforward univariate optimisation problem. We used Eq. (3) to quantify the standard error of $\hat \pi _{PO/FS}$. The expectation in Eq. (3) was approximated using one million Monte Carlo simulations conditional on $\hat \pi _{PO/FS}$, f_PO and f_FS.

URLs

For Crime Survey for England and Wales, see https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/crimeandjustice/datasets/sexualoffencesappendixtables/yearendingmarch2017/sexualoffencesappendixtablesmarch2017.xls

(Table 8; Offence code 23, total number of cases is 11,196).

For Population sizes in England and Wales, see

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates.

For Educational attainment in the UK from 2011 Census, see

http://www.nomisweb.co.uk/census/2011/DC5102EW/view/2092957703?rows=c_age&cols=c_hlqpuk11.

For Genetic maps from the 1,000 Genomes Project, see ftp://ngs.sanger.ac.uk/production/samtools/genetic-map.tgz.

For UK Biobank groups of diseases affecting parents, see

https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=1010.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

This study makes use of genotype and phenotype data from the UK Biobank under project 12505. UKB data can be accessed upon request once a research project has been submitted and approved by the UKB committee. We also provide source data for generating all figures.

Code availability

We provide the R script that we used to simulate genotypes of EI cases from a collection of phased haplotypes in SHAPEIT2 format at https://github.com/loic-yengo/InbreedingSimulationR. We also provide R scripts for generating all figures.

References

Saura, M. et al. Detecting inbreeding depression for reproductive traits in Iberian pigs using genome-wide data. Genet. Sel. Evol. 47, 1 (2015).
Huisman, J., Kruuk, L. E. B., Ellis, P. A., Clutton-Brock, T. & Pemberton, J. M. Inbreeding depression across the lifespan in a wild mammal population. Proc. Natl Acad. Sci. 113, 3585–3590 (2016).
Morton, N. E., Crow, J. F. & Muller, H. J. An estimate of the mutational damage in man from data on consanguineous marriages. Proc. Natl Acad. Sci. 42, 855–863 (1956).
Yengo, L. et al. Detection and quantification of inbreeding depression for complex traits from SNP data. Proc. Natl Acad. Sci. USA 114, 8602–8607 (2017).
Charlesworth, B. & Charlesworth, D. The genetic basis of inbreeding depression. Genet. Res. 74, 329–340 (1999).
Charlesworth, D. & Willis, J. H. The genetics of inbreeding depression. Nat. Rev. Genet. 10, 783–796 (2009).
Pemberton, J. M., Ellis, P. E., Pilkington, J. G. & Bérénos, C. Inbreeding depression by environment interactions in a free-living mammal population. Heredity 118, 64–77 (2017).
McQuillan, R. et al. Evidence of inbreeding depression on human height. PLoS Genet. 8, e1002655 (2012).
Fareed, M. & Afzal, M. Evidence of inbreeding depression on height, weight, and body mass index: a population-based child cohort study. Am. J. Hum. Biol. 26, 784–795 (2014).
Joshi, P. K. et al. Directional dominance on stature and cognition in diverse human populations. Nature 523, 459–462 (2015).
Fareed, M., Kaisar Ahmad, M., Azeem Anwar, M. & Afzal, M. Impact of consanguineous marriages and degrees of inbreeding on fertility, child mortality, secondary sex ratio, selection intensity, and genetic load: a cross-sectional study from Northern India. Pediatr. Res. 81, 18–26 (2017).
Dorsten, L. E., Hotchkiss, L. & King, T. M. The effect of inbreeding on early childhood mortality: twelve generations of an amish settlement. Demography 36, 263–271 (1999).
Johnson, E. C., Evans, L. M. & Keller, M. C. Relationships between estimated autozygosity and complex traits in the UK Biobank. PLoS Genet. 14, e1007556 (2018).
Pusey, A. & Wolf, M. Inbreeding avoidance in animals. Trends Ecol. Evol. 11, 201–206 (1996).
Finkelhor, D. Sexually Victimized Children. Sociology Scholarship (1979).
Meiselman, K. C. Incest: A Psychological Study of Causes and Effects with Treatment Recommendations 1st edn (Jossey-Bass Publishers, San Francisco 1978).
De Francis, V. Protecting the Child Victim of Sex Crimes Committed by Adults. Final Report (1969).
Weinberg, S. K. Incest Behavior. (Citadel Press, Secaucus, NJ, US 1955).
Maisch, H. Incest. (Stein and Day, New York 1972).
Sariola, H. & Uutela, A. The prevalence and context of incest abuse in Finland. Child Abus. Negl. 20, 843–850 (1996).
Berghe, P. L. van den. Human inbreeding avoidance: culture in nature. Behav. Brain Sci. 6, 91–102 (1983).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. https://doi.org/10.1093/hmg/ddy271 (2018).
1000 Genomes Project Consortium. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Gazal, S. et al. Inbreeding coefficient estimation with dense SNP data: comparison of strategies and application to HapMap III. Hum. Hered. 77, 49–62 (2014).
Keller, M. C., Visscher, P. M. & Goddard, M. E. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics 189, 237–249 (2011).
Rehder, C. W. et al. American College of Medical Genetics and Genomics: standards and guidelines for documenting suspected consanguinity as an incidental finding of genomic testing. Genet. Med. 15, 150–152 (2013).
Sund, K. L. & Rehder, C. W. Detection and reporting of homozygosity associated with consanguinity in the clinical laboratory. Hum. Hered. 77, 217–224 (2014).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203 (2018).
Hill, W. G. & Weir, B. S. Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. 93, 47–64 (2011).
Stam, P. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35, 131–155 (1980).
Clark, A. G. The size distribution of homozygous segments in the human genome. Am. J. Hum. Genet. 65, 1489–1492 (1999).
Franklin, I. R. The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Popul. Biol. 11, 60–80 (1977).
Nietlisbach, P., Muff, S., Reid, J. M., Whitlock, M. C. & Keller, L. F. Nonequivalent lethal equivalents: Models and inbreeding metrics for unbiased estimation of inbreeding load. Evol. Appl. 12, 266–279 (2018).
Postma, E., Martini, L. & Martini, P. Inbred women in a small and isolated Swiss village have fewer children. J. Evol. Biol. 23, 1468–1474 (2010).
Lee, J. K., Lascoux, M. & Nordheim, E. V. Number of lethal equivalents in human populations: how good are the previous estimates? Heredity 77, 209 (1996).
Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits (Sinauer, Sunderland, Massachusetts, 01375 USA 1998).
Lynch, M. The genetic interpretation of inbreeding depression and outbreeding depression. Evolution 45, 622–629 (1991).
Crow, J. F. & Kimura, M. An Introduction to Population Genetics Theory (Blackburn Press, Caldwell, New Jersey 07006 USA 2009).
Abdellaoui, A. et al. Educational attainment influences levels of homozygosity through migration and assortative mating. PLoS ONE 10, e0118935 (2015).
Solé, M. et al. Age-based partitioning of individual genomic inbreeding levels in Belgian Blue cattle. Genet. Sel. Evol. 49, 92 (2017).
Druet, T. & Gautier, M. A model-based approach to characterize individual inbreeding at both global and local genomic scales. Mol. Ecol. 26, 5820–5841 (2017).
Ceballos, F. C., Joshi, P. K., Clark, D. W., Ramsay, M. & Wilson, J. F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 19, 220 (2018).
Curik, I., Sölkner, J. & Stipic, N. The influence of selection and epistasis on inbreeding depression estimates. J. Anim. Breed. Genet. 118, 247–262 (2001).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Allen, N. et al. UK Biobank: current status and what it means for epidemiology. Health Policy Technol. 1, 123–126 (2012).
International HapMap 3 Consortium. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Milholland, B. et al. Differences between germline and somatic mutation rates in humans and mice. Nat. Commun. 8, 15183 (2017).

Acknowledgements

This research was supported by the Australian Research Council (DP160103860 and DP160102400), and the Australian National Health and Medical Research Council (1078037, 1078901 and 1113400). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding bodies. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This research has been conducted using the UK Biobank Resource under project 12505. We thank Bill Hill, Bruce Weir and Sadia Bouzegane for helpful comments and suggestions on the manuscript. We also thank Sean Lee and Patrick Turley for their help in calculating polygenic predictors of educational attainment in UKB participants.

Author information

Authors and Affiliations

Institute for Molecular Bioscience, The University of Queensland, QLD 4072, Brisbane, Australia
Loic Yengo, Naomi R. Wray & Peter M. Visscher
Queensland Brain Institute, The University of Queensland, Brisbane, 4072, Australia
Naomi R. Wray & Peter M. Visscher

Contributions

P.M.V., L.Y. and N.R.W. conceived and designed the study. L.Y. and P.M.V. derived the theory. L.Y. performed statistical analyses and simulations. L.Y., N.R.W. and P.M.V. wrote the paper.

Corresponding authors

Correspondence to Loic Yengo or Peter M. Visscher.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks Andrew Clark, Ino Curik and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yengo, L., Wray, N.R. & Visscher, P.M. Extreme inbreeding in a European ancestry sample from the contemporary UK population. Nat Commun 10, 3719 (2019). https://doi.org/10.1038/s41467-019-11724-6

Received: 23 February 2019
Accepted: 18 July 2019
Published: 03 September 2019
DOI: https://doi.org/10.1038/s41467-019-11724-6
