Introduction

Offspring of genetically related individuals are inbred, meaning that they harbor genomic segments that are homozygous because of common ancestry (identical by descent (IBD); Wright, 1922; Malécot, 1948). Studies on inbreeding often aim at quantifying the amount of variance in fitness explained by inbreeding or the inbreeding load within a population (Szulkin et al., 2010). To do this, the proportion of an individual’s genome that is IBD (genome-wide IBD (GWIBD)) needs to be quantified as precisely as possible, because it is the best predictor of the homozygous mutation load of an individual (all recessive deleterious mutations that become IBD contribute to inbreeding depression; Figure 1; Keller et al., 2011).

Figure 1

Conceptualization of the causality in inbreeding and heterozygosity–fitness correlations/regressions. Arrowheads represent the causal direction (see also Szulkin et al., 2010). We are generally interested in estimating GWIBD because it is most directly related to the homozygous mutational load and inbreeding depression. The heterogeneous distribution of recessive deleterious mutations, the environment and other genetic factors like epistatic interactions are introducing noise into that relation. Mating patterns as represented by pedigrees are one way to quantify GWIBD (Pedigree F), but Mendelian segregation is adding noise to this estimate of GWIBD. Background inbreeding resulting from relatedness between pedigree founders introduces both random noise and bias into the relationship between GWIBD and Pedigree F. If Background F is absent and the relation between fitness and GWIBD is linear, both Pedigree F and GWIBD are error-free predictors in fitness–inbreeding regressions and consequently the regression slopes are unbiased. Berkson (1950) explains this in terms of a ‘controlled’ experiment, in which case both an underlying variable measured accurately (GWIBD) and its expected value (Pedigree F) give unbiased linear regression slopes (see also Muff et al., 2015). Frequently, molecular markers are used as predictors of GWIBD. Here, the direction of causality is reversed, because GWIBD causes Marker IBD that in turn affects Marker IBS. Both dependencies are affected by noise components that will introduce random error in the predictors (Marker IBD or Marker IBS) used in heterozygosity–fitness regressions. Consequently, the regression slopes will be biased downward. Colors and dashed lines represent the different estimates depicted in Figure 4 and Supplementary Figures S6 and S7.

GWIBD is generally not known precisely for an individual, but is estimated using either pedigree information (Pedigree F) or molecular markers (Marker F). On the one hand, Pedigree F is the expected proportion of GWIBD of a diploid individual and is traditionally quantified using Wright’s path method as $F = 2^{-(2k+2)}$ for an individual whose parents are kth generation linear descendants from a common ancestor (Wright, 1922). It can also be extended to incorporate complex inbreeding loops (Malécot, 1948). On the other hand, Marker F reflects the actual proportion of GWIBD of an individual. A multitude of methods exist for calculating Marker F but until now no consensus has been reached on which of these methods yields the most precise estimate of GWIBD (a first evaluation of the different metrics was performed in Kardos et al., 2015). Given a large number of genotyped molecular markers (that is, 10 000 single-nucleotide polymorphisms (SNPs)), both averages of single marker estimates and estimates based on several closely linked markers perform equally well in predicting GWIBD (Kardos et al., 2015).
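As a quick check of Wright’s formula, the sketch below evaluates F = 2^-(2k+2) for small k. The function name is illustrative, and the convention that k = 0 denotes full-sib parents is an assumption; it is the reading that reproduces the expected values of 1/4, 1/16 and 1/64 for full-sib, first-cousin and second-cousin matings.

```python
from fractions import Fraction


def pedigree_f(k: int) -> Fraction:
    """Expected inbreeding coefficient F = 2^-(2k+2).

    Assumption: k = 0 means the parents are full sibs, k = 1 first
    cousins, k = 2 second cousins (hypothetical indexing convention).
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    return Fraction(1, 2 ** (2 * k + 2))


for k, label in enumerate(["full-sib", "first-cousin", "second-cousin"]):
    print(label, pedigree_f(k))
```

Exact fractions are used so that the values can be compared directly with the expected coefficients without rounding.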

According to Griffiths et al. (2005), meiotic recombination is defined as ‘any meiotic process that generates a haploid product with new combinations of the alleles carried by the haploid genotypes that united to form the dihybrid meiocyte’. Thus, three different cellular processes contribute to meiotic recombination: the independent assortment of chromosomes, the number of crossovers and the distribution of crossovers. All three processes influence the variation in GWIBD, as we explain below.

Genomes are not transmitted as independent base pairs but rather in segments of DNA. This leads to linkage between adjacent segments and increases the amount of variation around the expected GWIBD (Mendelian noise; Fisher, 1949). For any given inbreeding constellation, the more segments in a genome segregate independently, the lower the variation in GWIBD between individuals of the same inbreeding constellation will be (law of large numbers; Rasmuson, 1993; Visscher, 2009). Consequently, because chromosomes are inherited as independent units in meiosis, the Mendelian noise for a given inbreeding constellation will be smaller in species with more chromosomes (Hill and Weir, 2011). Moreover, genomic segments on a given chromosome will be broken up by crossovers during meiosis, which increases the number of independently segregating units in a genome. Thus, the longer the genetic map of a genome, which directly corresponds with the expected number of crossovers in each meiosis, the smaller the Mendelian sampling noise (Hill and Weir, 2011; Wang, 2016). To our knowledge, all analytical treatments and most of the simulations so far have assumed a uniform distribution of crossovers along chromosomes (see, for example, Franklin, 1977; Stam, 1980; Hill and Weir, 2011; Kardos et al., 2015; Wang, 2016; but see Suarez et al., 1979 and Libiger and Schork, 2007 for Monte Carlo simulations on relatedness). Although this assumption holds more or less for the human genome (Matise et al., 2007), linkage maps from other species have shown that the distribution of recombination along chromosomes can be highly biased toward the telomeres (Gore et al., 2009; Backström et al., 2010). We and others predicted that this results in a more block-like inheritance pattern of genomic segments on a given chromosome, which in turn increases the Mendelian noise (Risch and Lange, 1979; Guo, 1995; Forstmeier et al., 2012). Here we use a Monte Carlo gene-dropping simulation that incorporates all three cellular processes of meiotic recombination to show their combined effects on the amount of Mendelian noise.
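The law-of-large-numbers argument can be illustrated with a deliberately crude binomial sketch. It assumes a genome made of n segments that each become IBD independently with probability F, so that the s.d. of the IBD proportion is sqrt(F(1 − F)/n). This is not the simulation used in this study; the function name and the notion of fully independent segments are simplifying assumptions.

```python
import math


def sd_ibd_proportion(f: float, n_segments: int) -> float:
    """S.d. of the IBD proportion under a toy binomial model.

    Assumption: n_segments segregate fully independently, each being
    IBD with probability f. Real genomes are linked, so the effective
    number of independent segments is far smaller than the genome size.
    """
    if not 0.0 <= f <= 1.0 or n_segments < 1:
        raise ValueError("need 0 <= f <= 1 and n_segments >= 1")
    return math.sqrt(f * (1.0 - f) / n_segments)


# More independently segregating units -> less Mendelian noise.
for n in (10, 40, 160):
    print(n, round(sd_ibd_proportion(0.25, n), 4))
```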

The precision of Pedigree F in predicting GWIBD not only depends on the amount of Mendelian noise but also on the relatedness between pedigree founders. Throughout this paper we will mostly focus on an idealized scenario, where all inbreeding is fully defined and captured by the pedigree information and we will only briefly highlight the effects of related or inbred pedigree founders on Pedigree F. A more in-depth treatment of this issue can be found elsewhere and is outside the scope of the current study (see, for example, Powell et al., 2010; Thompson, 2013; Knief et al., 2015; Speed and Balding, 2015).

Usually, only parts of a genome are covered by molecular markers and these parts are used as a proxy for GWIBD (Marker F; Powell et al., 2010). This will introduce variation into our estimates of GWIBD which we call the marker sampling noise. Unlike the Mendelian sampling noise that is an inherent result of the meiotic process, the marker sampling noise is only a consequence of the limited number of markers used; hence, the precision of GWIBD estimates increases when more molecular markers are sampled (Miller et al., 2014; Kardos et al., 2015; Wang, 2016). Meiotic recombination also influences Marker F (Wang, 2016): given a fixed number of markers, the precision of Marker F decreases with the amount of meiotic recombination (whereas precision of Pedigree F increases), because each marker contains less information about GWIBD in genomes with more independently segregating segments (that is, longer genetic maps; Wang, 2016).

Microsatellite markers are and will continue to be the first choice for parentage analyses in wild and captive animal populations because they are cheap and easy to genotype even in small labs for large numbers of individuals. The vast majority of heterozygosity–fitness studies are thus conducted on these data and we need a framework to interpret their results (Taylor, 2015). In these studies homozygosity at multiple unlinked microsatellites is usually combined into a measure of GWIBD, for example, as the percentage of markers being homozygous. Yet, microsatellite homozygosity does not translate directly into IBD for two reasons. (1) Two homologous DNA segments may have been inherited from a shared ancestor (IBD), but the microsatellite marker located in this segment may have changed because of mutation since the common ancestor. Then, the marker will not be identical by state (IBS) even though the segment is IBD. However, if we define IBD with regard to common ancestors that lived rather recently, such cases will be exceedingly rare, because microsatellite mutation rates are too low to produce substantial error (Goldstein and Schlötterer, 1999). Hence, for simplicity, in the following we ignore this first possibility and assume no mutations. (2) Because a microsatellite marker can adopt only a finite number of states (alleles), two long-separated haplotypes (not IBD) may carry the same allele (IBS) by chance alone. Genotyping errors, which often result in the dropout of one allele at a heterozygous marker (Pompanon et al., 2005), lead to the same effect. Depending on how well marker homozygosity reflects IBD, this may introduce error into the GWIBD estimate (Thompson, 1976; Miller et al., 2014) that we call the IBD–IBS discrepancy. We here specifically refer to a limited number of microsatellites for GWIBD estimation because both the marker sampling noise and the IBD–IBS discrepancy cease when large numbers (10 000) of SNPs are employed for GWIBD estimation (Miller et al., 2014; Kardos et al., 2015; Wang, 2016).

In the present study we use a Monte Carlo gene-dropping simulation to address the following questions. (1) How does the distribution of recombination along chromosomes affect the variance in GWIBD? (2) How does the amount of Mendelian sampling noise (when using Pedigree F) compare with the amount of marker sampling noise and IBD–IBS discrepancy when using a limited number of molecular markers? (3) How many molecular markers are needed to approximate GWIBD better than using pedigree information (Forstmeier et al., 2012)? We incorporate the linkage maps of two genomes that we expect to show high (zebra finch, Taeniopygia guttata) and low (human) amounts of Mendelian noise. Our expectation is based on the fact that although the zebra finch genome consists of 39 autosomes compared with only 22 in humans, about half of its autosomal genome is made up of only 6 chromosomes (Tgu1, Tgu1A, Tgu2, Tgu3, Tgu4 and Tgu5) and shows extremely low recombination rates in the centers of these chromosomes (Backström et al., 2010), whereas in humans recombination is distributed quite uniformly on the megabase scale along chromosomes (Matise et al., 2007). We follow the genomes of 159 diploid pedigree founders, who we assume to be unrelated, through an empirical 7-generation pedigree of 3404 individuals. We define IBD in reference to the pedigree founders and we indicate the number of generations of the pedigree as a subscript (for example, F7, IBD7, GWIBD7 refer to values from a pedigree that is seven generations deep). First, we estimate the Mendelian sampling noise that corresponds to Pedigree F7, ignoring inbreeding stemming from relatedness between pedigree founders. Then, we show the effects of related founders on precision of Pedigree F by shortening our pedigree.

To quantify the amount of marker sampling noise we estimated GWIBD7 from subsets of the simulated genomes (Marker IBD7). We used empirical data from 11 microsatellites genotyped in the seventh generation of our pedigree to estimate the IBD–IBS discrepancy and introduced this noise component into our simulations (Marker IBS7). This allows us to compare the precision of (1) F from pedigrees of varying lengths, (2) multilocus homozygosity of ‘ideal markers’ (Knief et al., 2015), for example, derived from SNP panels (Marker IBD7) and (3) multilocus homozygosity of typical microsatellite markers (Marker IBS7) in predicting GWIBD7.

Materials and methods

Empirical linkage and physical maps

For the zebra finch, we used the sex-averaged linkage map described in Backström et al. (2010) based on 1395 SNPs and covering 33 chromosomes (data accessible from Schielzeth et al., 2011, 2012). We excluded the sex chromosome and 10 microchromosomes covered by <10 markers (see Supplementary Information for details). The zebra finch genome consists of 39 autosomes (Pigozzi and Solari, 1998), and we thus added 17 ‘artificial’ chromosomes to our linkage map assuming a uniform distribution of recombination (see Supplementary Information for details; Supplementary Figure S1).

For humans, we used the sex-averaged Rutgers Map v.2 (Matise et al., 2007) covering 24 168 markers and all 22 autosomes and the X chromosome (Supplementary Figure S2). We removed the sex chromosome from all further analyses. The physical lengths of chromosomes were taken from the hg19/GRCh37 assembly (Collins et al., 2004).

Gene-dropping simulations

Our gene-dropping simulations extend those described in Libiger and Schork (2007). We first specified a genome by the number and size (both genetically and physically) of chromosomes. Then we split each chromosome into predefined physical segments (we used 100 kb segments) and calculated the recombination probability between segments using a smoothed linkage map (see Supplementary Information ‘Linkage map smoothing’) and the Kosambi map function (Kosambi, 1943). Finally, we followed each segment through a specified genealogy by using these recombination probabilities and assuming that founders of the pedigree were unrelated.
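The conversion from map distance to recombination probability can be sketched with the Kosambi map function, r = 0.5 tanh(2d), with d in Morgans. The helper names and the input format (cumulative genetic positions in cM at segment boundaries) are assumptions for illustration; the smoothing of the linkage map is not reproduced here.

```python
import math


def kosambi_recombination_fraction(d_morgan: float) -> float:
    """Kosambi map function: recombination fraction for a map distance in Morgans."""
    if d_morgan < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2.0 * d_morgan)


def segment_recombination_probs(cumulative_cm):
    """Recombination probability between adjacent segment boundaries.

    cumulative_cm: hypothetical list of cumulative genetic positions (cM),
    one per segment boundary, assumed to be non-decreasing.
    """
    return [
        kosambi_recombination_fraction((b - a) / 100.0)
        for a, b in zip(cumulative_cm, cumulative_cm[1:])
    ]


# Toy map: recombination concentrated toward the chromosome ends.
print(segment_recombination_probs([0.0, 5.0, 5.1, 5.2, 10.0]))
```

For short intervals the function is close to the identity (r ≈ d), and it approaches 0.5 for long intervals.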

  1. For each founder we simulated a diploid chromosome set (two unique haplotypes without inbreeding for each chromosome). In the zebra finch, each diploid chromosome set consisted of two times 11 509 100 kb segments (each segment defined by a single number at a given position) partitioned across 39 autosomes. In humans, each chromosome set consisted of two times 28 801 of these 100 kb segments partitioned across 22 autosomes.

  2. The simulation proceeded by creating offspring of the pedigree founders. We tried to simulate meiosis as realistically as possible and implemented the following steps. (a) Before meiosis I the homologous chromosomes (2N2C=2 homologous chromosomes, 2 chromatids) in both the mother and the father duplicate to form two sister chromatids (2N4C) that are identical. (b) In meiosis I (which leads to 1N2C), crossovers between chromatids of the homologous chromosomes occur with probabilities as defined by the linkage map. Crossovers may occur between both chromatids of the homologous chromosomes but not between the two sister chromatids of a single chromosome (remember also that the two sister chromatids are identical). Thus, unrecombined chromatids may also be inherited. (c) One of the four chromatids (that is, 1N1C) in both the mother and the father was chosen randomly to create the offspring that is then 2N2C again.

  3. Within each offspring the total length of all autozygous stretches was determined as homozygosity for a founder haplotype (GWIBD). The founder haplotypes were the uniquely defined 100 kb segments (represented by a single number, see above) that we tracked through the pedigree (see below). For each offspring we counted how many of these segments were IBD, meaning that the same founder haplotype was at the same position on the offspring’s two chromatids. The end of an autozygous stretch was placed at a randomly chosen base pair between the flanking autozygous and non-autozygous segment in order to lessen the block-like inheritance.
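The three steps above can be sketched for a single chromosome and a single full-sib mating. This is a minimal illustration, not the simulation script of this study: all function names are hypothetical, the four-chromatid bundle is simplified to one gamete that switches homolog between adjacent segments with the given recombination probability (no crossover interference), and the random placement of stretch ends within a segment is omitted.

```python
import random


def make_founder(founder_id, n_segments):
    """Two unique haplotypes; every segment carries its haplotype's label."""
    return ([2 * founder_id] * n_segments, [2 * founder_id + 1] * n_segments)


def gamete(parent, rec_probs, rng):
    """One haploid product: start on a random homolog and switch homolog
    between adjacent segments with probability rec_probs[i]."""
    current = rng.randrange(2)
    out = [parent[current][0]]
    for i, r in enumerate(rec_probs, start=1):
        if rng.random() < r:
            current = 1 - current
        out.append(parent[current][i])
    return out


def offspring(mother, father, rec_probs, rng):
    """Diploid offspring from one gamete of each parent."""
    return (gamete(mother, rec_probs, rng), gamete(father, rec_probs, rng))


def ibd_proportion(individual):
    """Share of segments carrying the same founder haplotype on both homologs."""
    a, b = individual
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
n = 200                   # segments on the toy chromosome
rec = [0.01] * (n - 1)    # uniform recombination, purely illustrative
p1, p2 = make_founder(0, n), make_founder(1, n)
sib1 = offspring(p1, p2, rec, rng)
sib2 = offspring(p1, p2, rec, rng)
child = offspring(sib1, sib2, rec, rng)
print(ibd_proportion(child))  # realized IBD; expectation is 1/4
```

Repeating the last three calls many times gives the distribution of realized IBD around its expectation, which is the Mendelian noise discussed in the text.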

Pedigrees used

We ran our gene-dropping simulations on three pedigrees. (1) 10 000 times on a very simple, designed pedigree comprising full-sibs and their offspring (full-sib mating), first-cousins and their offspring (first-cousin mating) and second-cousins and their offspring (second-cousin mating; Supplementary Figure S3). We use this artificial pedigree only to quantify the amount of Mendelian noise in GWIBD in full-sib, first-cousin and second-cousin mating. (2) 1000 times on an empirical pedigree from our captive population of zebra finches held at the Max Planck Institute for Ornithology in Seewiesen, Germany, comprising n=159 founders and n=3404 individuals in total (Supplementary Figure S4). The pedigree spans seven generations: in the first three generations the aim was to produce outbred individuals, the fourth generation contains offspring of full-sib matings and in the last three generations selection lines were produced (six lines in total) that increased the overall level of inbreeding. (3) 1000 times on a pedigree that contains the same birds and generations as the empirical Seewiesen pedigree but in which parents were resampled from the empirical parents within each generation (random mating). In this pedigree mean inbreeding levels and the variation in inbreeding were markedly reduced (Table 1). We calculated the expected inbreeding coefficient under random mating as

$$F(t) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)$$

where F(t) is the inbreeding coefficient in generation t and N is the number of parents giving rise to generation t with their average inbreeding coefficient F(t−1) (Falconer and Mackay, 1996). Given that the fifth generation of our pedigree had been bred from the previous three generations (otherwise generations were non-overlapping), we used the weighted mean inbreeding coefficient of those three generations in the formula as F(t−1). Our analysis of precision and bias of Pedigree F, Marker IBD and Marker IBS in both Seewiesen pedigrees is focused on the last (seventh) generation (n=681 individuals), because it contains the most precise information about coancestry. We use the random mating pedigree to show that our results are qualitatively transferable between pedigrees, but will change quantitatively when considering pedigrees with more or less variance in inbreeding (that is, different levels of identity disequilibrium; Miller and Coltman, 2014). If not stated otherwise, we present the results on the empirical pedigree because the higher levels of inbreeding render the effects of interest more clearly visible. We also provide the downloadable simulation script that can be applied to any pedigree. Running the simulation once on the 3404-individual pedigree with a zebra finch linkage map took 45 min to complete on a single computer core (Intel Core i7-2600, 3.4 GHz and 16 GB RAM), which adds up to a runtime of 31.3 days for 1000 simulation runs.

Table 1 Precision of Pedigree F in predicting GWIBD within each generation of the empirical and the random-mating zebra finch pedigree
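The expected inbreeding coefficient under random mating is commonly written as the recursion F(t) = 1/(2N) + (1 − 1/(2N)) F(t−1). The sketch below iterates this form over generations; the function name and the example parent numbers are hypothetical, and the special handling of the overlapping fifth generation (weighted mean of three preceding generations) is not modeled.

```python
def expected_f_random_mating(n_parents_per_generation, f0=0.0):
    """Iterate F(t) = 1/(2N) + (1 - 1/(2N)) * F(t-1).

    n_parents_per_generation: hypothetical list with the number of parents N
    giving rise to each successive generation.
    f0: average inbreeding coefficient of the first set of parents.
    """
    f = f0
    trajectory = []
    for n in n_parents_per_generation:
        if n < 1:
            raise ValueError("each generation needs at least one parent")
        f = 1.0 / (2 * n) + (1.0 - 1.0 / (2 * n)) * f
        trajectory.append(f)
    return trajectory


# Illustrative numbers only, not the parent counts of the Seewiesen pedigree.
print(expected_f_random_mating([50, 50, 50]))
```

With low inbreeding levels the increase per generation is close to 1/(2N), which is the near-linear accumulation referred to in the text.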

Estimating the IBD–IBS discrepancy of microsatellites

We used a simplified approach to quantify how often two long-separated haplotypes that are not IBD carry the same allele by chance alone (IBD–IBS discrepancy) using empirical data from 11 microsatellite markers genotyped in the seventh generation of our zebra finch pedigree. A detailed description of how we calculated the IBD–IBS discrepancy is presented in the Supplementary Information. This estimate is specific to our captive population of zebra finches and should be adopted with caution for other populations. The aim of our example is to show how the precision and bias in predicting GWIBD (or fitness) declines when using typical microsatellite markers instead of ‘ideal markers’ (as can be derived from SNP panels; Knief et al., 2015). Essentially, we estimated how often our microsatellite markers would be IBS by chance alone (because of a limited number of allelic states) in the absence of any inbreeding, which renders marker IBS a less reliable predictor of marker IBD.

Comparing molecular estimates of inbreeding with pedigree-based estimates

To compare the precision of pedigree-based with molecular marker-based estimates of inbreeding, we ran our gene-dropping simulations 1000 times on the two 7-generation Seewiesen pedigrees (empirical and random mating) for both the zebra finch and the human genome and recorded GWIBD7 and an estimate of inbreeding based on n=5, 10, 20, 40, 80 and 160 randomly chosen 100 kb segments in the genome (Marker IBD7). Specifically, we randomly sampled n 100 kb segments (founder haplotypes represented by a single number, see above) in each individual and counted how many of these segments were IBD (denoted as m), meaning that the same founder haplotype was at the same position on the individual’s two chromatids. Marker IBD7 was then calculated as

$$\text{Marker IBD}_7 = \frac{m}{n}$$

Ignoring de novo mutations of marker alleles, all markers that were IBD were designated as IBS. We further incorporated the IBD–IBS discrepancy by calling a marker IBS with the empirically estimated IBD–IBS discrepancy probability (denoted as α) from our microsatellite data, irrespective of whether it was IBD or not. Thus,
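Under the rule just described (every IBD marker is IBS, and each of the remaining n − m markers is called IBS with probability α), the expected value of Marker IBS7 can be sketched as follows. The expression is an assumed form that follows from that rule, with m the number of IBD markers among the n sampled; in a simulation the number of chance-IBS markers would be drawn at random rather than set to its expectation.

```latex
\mathbb{E}\left[\text{Marker IBS}_7\right]
  = \frac{m + \alpha\,(n - m)}{n}
  = \alpha + (1 - \alpha)\,\frac{m}{n}
```

The second form shows that chance IBS shifts the estimate upward by α and compresses its range by the factor (1 − α), in addition to the random noise it introduces.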

Pedigree F was calculated using the pedigreemm package (v0.3-1; Vazquez et al., 2010) in R (v3.0.2; R Core Team, 2013). To simulate related founders we shortened our two Seewiesen pedigrees by removing the first three (Supplementary Figure S5D) or four (Supplementary Figure S5C) generations and then calculating Pedigree F (Pedigree F4 and Pedigree F3, respectively). Note that Pedigree F4 refers to the longer pedigree, extending over four generations (n=526 founders of which 2.85% (empirical) and 10.27% (random mating) are inbred), and Pedigree F3 to the three-generation pedigree (n=153 founders of which 60.13% (empirical) and 42.48% (random-mating) are inbred; Table 1). As a measure of precision in predicting the dependent variable (Zar, 2010), we took the coefficient of determination (r2) between both the pedigree-based estimates of inbreeding (Pedigree F3, Pedigree F4 and Pedigree F7) and GWIBD7, and between the marker-based estimates (Marker IBD7 and Marker IBS7) and GWIBD7. We also recorded the slopes of ordinary least square regressions with GWIBD7 as the dependent variable and Pedigree F3, Pedigree F4, Pedigree F7, Marker IBD7 and Marker IBS7 as predictors to estimate the bias of the prediction.

Results

Mendelian noise in GWIBD and a comparison with analytical results

It has been shown analytically before (Hill and Weir, 2011) that for any class of individuals with the same inbreeding history and hence the same Pedigree F, there is considerable variation among individuals in the proportion of the genome that is inherited IBD (see also Figure 2). Using 10 000 simulation runs on the designed pedigree, this variation—caused by Mendelian sampling noise—was markedly larger in the zebra finch (full-sib mating s.d.=0.0838, first-cousin mating s.d.=0.0461, second-cousin mating s.d.=0.0231) than in humans (full-sib mating s.d.=0.0454, first-cousin mating s.d.=0.0251, second-cousin mating s.d.=0.0115; Figure 2). Although the s.d. of GWIBD decreased with more distant inbreeding levels, the coefficient of variation (CV), which can be interpreted as a measure of the relative s.d., increased in both the zebra finch (full-sib mating CV=0.335, first-cousin mating CV=0.732, second-cousin mating CV=1.466) and the human genome (full-sib mating CV=0.181, first-cousin mating CV=0.402, second-cousin mating CV=0.737), which has also been shown analytically (Hill and Weir, 2011).

Figure 2

Variation in inbreeding (realized GWIBD) for offspring of a full-sib mating (a, b), first-cousin mating (c, d) and second-cousin mating (e, f). Left panels show the results for zebra finches and right panels show the results for humans, both derived from 10 000 simulation runs on the simple designed pedigree depicted in Supplementary Figure S3. The black lines indicate the expected inbreeding coefficients calculated with Wright’s path method (1/4, 1/16 and 1/64, respectively).

By constructing an artificial human linkage map with a strictly uniform distribution of crossovers, we were able to compare our simulation-based estimates of the s.d. in GWIBD in humans with their analytical expectations (calculated with the formulas provided in Franklin, 1977; Hill and Weir, 2011). The simulations yielded slightly larger s.d. values of GWIBD than expected analytically, at maximum a deviation of 3.3% (analytical expectation vs simulation: full-sib mating s.d.=0.0412 vs 0.0420, first-cousin mating s.d.=0.0226 vs 0.0234, second-cousin mating s.d.=0.0104 vs 0.0107). The deviation is probably caused by the use of an infinitesimal model in the analytical approach (Franklin, 1977; Hill and Weir, 2011), whereas we simulated 100 kb segments (for computational feasibility), which slightly increased the s.d. in IBD sharing. In line with this interpretation, analytical models yield larger s.d. values in IBD sharing between relatives when they use a localized distribution of crossovers instead of an infinitesimal model (Risch and Lange, 1979; Suarez et al., 1979; Visscher, 2009).

Effects of meiotic recombination on precision and bias of Pedigree F

As shown above, the s.d. of GWIBD resulting from full-sib, first-cousin and second-cousin mating is almost twice as large in zebra finches as in humans because of the differences in their genomic architectures. Consequently, Pedigree F in the seventh generation of the empirical and the random-mating pedigree was more precise in predicting GWIBD when simulating a human linkage map (empirical pedigree: r2=0.82, 95% quantile range (QR)=0.79–0.85; Figure 3b) than when simulating a zebra finch linkage map (empirical pedigree: r2=0.56, 95% QR=0.49–0.63; Figure 3a). When we shortened the pedigrees by three or four generations, precision of Pedigree F dropped in both zebra finches and humans by the same percentage (empirical Pedigree F4: zebra finches r2=0.48, 95% QR=0.40–0.57 and humans r2=0.70, 95% QR=0.65–0.75, for both a 14% reduction in comparison with empirical Pedigree F7 (see also the equivalent change in r2 in Supplementary Figure S5D); empirical Pedigree F3: zebra finches r2=0.39, 95% QR=0.30–0.48 and humans r2=0.57, 95% QR=0.51–0.62, a 31% reduction in comparison with empirical Pedigree F7 (see also the equivalent change in r2 in Supplementary Figure S5C); Figures 4a and b). In preceding generations with lower inbreeding levels (where many individuals have Pedigree F=0, s.d.=0), the pedigree-based inbreeding estimates appeared more precise on average (Table 1), because Mendelian noise can only contribute to variation in GWIBD whenever Pedigree F>0. This highlights the fact that the coefficients of determination depend on the inbreeding level and its variation within the pedigree or population under study. In line with this, precision of Pedigree F was lower in the random-mating pedigree, in which the mean and variance in inbreeding were reduced (Table 1 and Supplementary Figures S6A and B).

Figure 3

An exemplary simulation run showing the realized GWIBD in 681 individuals from the seventh generation of our empirical pedigree over their expected values (Pedigree F). Simulations are based on the linkage maps of (a) the zebra finch and (b) the human genome. Shown are the most representative simulation runs (out of 1000 each) where regression slopes (β=1.00, 95% QR=0.88–1.12 and β=1.00, 95% QR=0.93–1.07) and coefficients of determination (r2=0.56, 95% QR=0.49–0.63 and r2=0.82, 95% QR=0.79–0.85) were closest to the mean values from 1000 runs.

Figure 4

Comparison of precision (a, b) and bias (c, d) when predicting GWIBD by Pedigree F7 (black), ‘ideal markers’ (Marker IBD7, red) and microsatellites (Marker IBS7 with an IBD–IBS discrepancy of 13.3%, blue) in the empirical pedigree. The left and right panels are estimates from 1000 simulation runs in zebra finches and humans, respectively. The black solid line indicates the average precision and bias of pedigree-based estimates (Pedigree F7) of inbreeding (±1 s.e.) which is not influenced by the number of markers (slight changes in precision and bias across the different numbers of markers are caused by random sampling noise in GWIBD). The dashed black line indicates Pedigree F4 (±1 s.e.) and the dotted black line Pedigree F3 (±1 s.e.). The red and blue lines indicate average precision and bias (±1 s.e.) of Marker IBD7 and Marker IBS7, respectively, for varying numbers of markers (100 kb genomic segments) used for predicting the inbreeding level of individuals (GWIBD7) in the seventh generation of the empirical pedigree.

Regressing GWIBD7 over Pedigree F7 from 1000 simulations yielded an unbiased mean slope of β=1.00 in both zebra finches (empirical pedigree 95% QR=0.88–1.12; Figure 3a) and humans (empirical pedigree 95% QR=0.93–1.07; Figure 3b), as is expected for a direct cause–effect relationship and a Berkson error model (Berkson, 1950). Shortening the empirical pedigree led to a downward bias in slopes in zebra finches and humans alike (empirical Pedigree F4: zebra finches β=0.89, 95% QR=0.76–1.02 and humans β=0.89, 95% QR=0.82–0.96, an 11% reduction in comparison with empirical Pedigree F7; empirical Pedigree F3: zebra finches β=0.77, 95% QR=0.65–0.91 and humans β=0.78, 95% QR=0.70–0.85, a 22% reduction in comparison with empirical Pedigree F7; Figures 4c and d). However, shortening the random-mating pedigree had almost no effect on the regression slopes (random-mating Pedigree F7: zebra finches β=1.00, 95% QR=0.78–1.24 and humans β=1.00, 95% QR=0.87–1.13; random-mating Pedigree F4: zebra finches β=0.99, 95% QR=0.75–1.24 and humans β=0.99, 95% QR=0.84–1.14; random-mating Pedigree F3: zebra finches β=1.03, 95% QR=0.75–1.31 and humans β=1.03, 95% QR=0.86–1.19; Supplementary Figures S6C and D). This is because under random mating each individual has on average the same pedigree structure and thus the pedigree-based inbreeding coefficients of all individuals increase on average by the same amount and almost linearly across multiple generations when mean inbreeding levels in the population are low (Falconer and Mackay, 1996). Thus, the Berkson error model holds (Berkson, 1950) and Pedigree F4 and Pedigree F3 are unbiased estimators of Pedigree F7.
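The contrast between a Berkson-type predictor and a predictor with classical measurement error can be illustrated with a toy simulation. All values here are arbitrary assumptions (three pedigree classes, Gaussian noise with s.d. 0.03 and 0.05) and are not taken from the simulations in this study; the toy GWIBD values are not bounded to the interval from 0 to 1.

```python
import random


def ols_slope(x, y):
    """Slope of an ordinary least squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000

# Berkson case: the realized value scatters around its expectation (Pedigree F).
pedigree_f = [rng.choice([0.0, 1 / 16, 1 / 4]) for _ in range(n)]
gwibd = [f + rng.gauss(0.0, 0.03) for f in pedigree_f]

# Classical case: the predictor (a marker estimate) scatters around the realized value.
marker = [g + rng.gauss(0.0, 0.05) for g in gwibd]

slope_berkson = ols_slope(pedigree_f, gwibd)    # close to 1
slope_classical = ols_slope(marker, gwibd)      # attenuated below 1
print(round(slope_berkson, 3), round(slope_classical, 3))
```

In the classical case the slope is attenuated by roughly var(GWIBD)/(var(GWIBD) + var(error)), which is why noisy marker-based predictors underestimate the slope whereas an expected-value predictor does not.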

Effects of meiotic recombination on precision and bias of Marker F

In the following, we first consider ‘ideal markers’ (Knief et al., 2015), that is, markers that are never homozygous by chance alone (no IBS without IBD). ‘Ideal markers’ were more precise and more accurate in predicting GWIBD when simulating a zebra finch linkage map than when simulating a human linkage map (compare red lines in Figures 4a and c with Figures 4b and d). They were also slightly more precise and accurate when simulating the empirical pedigree than the random-mating pedigree (compare red lines in Figures 4a–d with Supplementary Figures S6A–D), as was also found by Wang (2016). For instance, 20 ‘ideal markers’ yielded a precision of r2=0.63 (95% QR=0.47–0.76) in the empirical zebra finch pedigree, but only r2=0.53 (95% QR=0.44–0.62) in the empirical human pedigree. In the random-mating pedigree, 20 ‘ideal markers’ yielded a precision of r2=0.56 (95% QR=0.37–0.72) in zebra finches, but only r2=0.42 (95% QR=0.30–0.54) in humans. Thus, markers are more reliable in the species with the less reliable pedigree-based prediction and in the pedigree with more variation in inbreeding. We next ask how many markers are needed to obtain higher precision than given by Pedigree F7.

In the seventh generation of our empirical pedigree, 15 randomly distributed ‘ideal marker’ segments in the zebra finch genome (out of 11 509 autosomal 100 kb segments=0.13% of the autosomal genome) gave the same precision as the pedigree-based estimate (Figure 4a). In the human genome, 80 randomly distributed ‘ideal marker’ segments (out of 28 801 autosomal 100 kb segments=0.28% of the autosomal genome) were needed (Figure 4b). In the seventh generation of the random-mating pedigree, these estimates were almost unchanged (Supplementary Figures S6A and B).

The ordinary least squares regression slopes of GWIBD7 over Marker IBD7 estimated from 15 and 80 randomly distributed segments in the zebra finch and human genome, respectively, were biased downward. Only when 160 segments were used to estimate Marker IBD7 did the slopes become almost unbiased (both in the empirical (Figures 4c and d) and random-mating pedigree (Supplementary Figures S6C and D)). This means that when using Marker IBD7 (or even more so when using Marker IBS7, see below) in a heterozygosity–fitness study, the inbreeding load (Szulkin et al., 2010) in a population will be underestimated.
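Why the downward bias shrinks as markers are added can be sketched with a deliberately simplified sampling model. It assumes that each marker segment is IBD independently with probability equal to the individual's GWIBD, which ignores linkage and the recombination landscape that our simulations account for. The beta distribution of GWIBD and the function name slope_vs_marker_number are likewise hypothetical.

```python
import numpy as np

def slope_vs_marker_number(marker_counts, n_individuals=100_000, seed=2):
    """OLS slope of GWIBD on Marker IBD for different numbers of markers."""
    rng = np.random.default_rng(seed)
    # Hypothetical distribution of realized inbreeding (GWIBD).
    gwibd = rng.beta(1.0, 12.0, size=n_individuals)
    slopes = {}
    for n_markers in marker_counts:
        # Assumption: markers are independent draws, each IBD with prob. GWIBD.
        marker_ibd = rng.binomial(n_markers, gwibd) / n_markers
        cov = np.cov(marker_ibd, gwibd, ddof=1)
        slopes[n_markers] = cov[0, 1] / cov[0, 0]
    return slopes

slopes = slope_vs_marker_number([15, 80, 160])
for n_markers, slope in slopes.items():
    print(f"{n_markers:>3} markers: slope = {slope:.2f}")
```

In this sketch the slope rises toward 1 with increasing marker number because the sampling variance of the predictor shrinks. The absolute values are not comparable with those in Figure 4, since linked segments are not independent.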

We now consider nonideal markers like microsatellites that can be IBS without being IBD. We empirically estimated the IBD–IBS discrepancy (mean from 11 microsatellites) as 13.3% in our zebra finches (see Supplementary Information). After incorporating this into our simulations of the empirical pedigree, 40 randomly distributed segments in the zebra finch genome (out of 11 509 autosomal 100 kb segments=0.35% of the autosomal genome) were needed to obtain an estimate of GWIBD7 that is as precise as the pedigree-based estimate (empirical Pedigree F7) (Figure 4a). Using the same IBD–IBS discrepancy for humans, we found that 160 randomly distributed segments (out of 28 801 autosomal 100 kb segments=0.56% of the autosomal genome) were needed (Figure 4b). However, these estimates change drastically when considering the random-mating pedigree, where as many as 160 randomly distributed segments in the zebra finch genome were needed to estimate GWIBD7 as precisely as the pedigree (Supplementary Figures S6A and B). This is probably because variation in inbreeding is much lower and thus the same IBD–IBS discrepancy (13.3%) adds relatively more noise.
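The following sketch illustrates why a fixed IBD–IBS discrepancy could add relatively more noise when variation in inbreeding is low. It rests on two assumptions that are not taken from our simulations: markers are treated as independent, and the discrepancy is modeled as the probability that a non-IBD marker is nevertheless scored as homozygous. The two beta distributions standing in for populations with higher and lower variation in GWIBD are hypothetical.

```python
import numpy as np

DISCREPANCY = 0.133  # value from the text; how it enters below is an assumption

def precision(gwibd, n_markers, discrepancy, rng):
    """r^2 between GWIBD and the fraction of markers scored homozygous."""
    ibd = rng.binomial(n_markers, gwibd)
    # Assumption: each non-IBD marker is IBS by chance with prob. `discrepancy`.
    ibs_by_chance = rng.binomial(n_markers - ibd, discrepancy)
    marker_ibs = (ibd + ibs_by_chance) / n_markers
    return np.corrcoef(marker_ibs, gwibd)[0, 1] ** 2

rng = np.random.default_rng(3)
n = 100_000
populations = {
    "higher variation": rng.beta(1.0, 12.0, size=n),  # hypothetical
    "lower variation": rng.beta(1.0, 60.0, size=n),   # hypothetical
}
results = {}
for name, gwibd in populations.items():
    ideal = precision(gwibd, 40, 0.0, rng)
    nonideal = precision(gwibd, 40, DISCREPANCY, rng)
    results[name] = (ideal, nonideal)
    print(f"{name}: ideal r2 = {ideal:.2f}, nonideal r2 = {nonideal:.2f}")
```

In this sketch the nonideal markers retain a smaller fraction of the ideal-marker precision in the population with lower variation in GWIBD, in line with the interpretation given above.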

Regressing GWIBD7 over Marker IBS7 yielded slopes that were biased downward. Using 80 and 160 segments for the marker-based inbreeding estimates in zebra finches and humans yielded comparable slopes between Marker IBD7 and Marker IBS7 in the empirical pedigree (Figures 4c and d). The coefficient of determination (and thus the correlation coefficient rxy) increased more rapidly for Marker IBS7 than for Marker IBD7 with increasing numbers of loci (Figures 4a and b) and the s.d. (s.d.x) decreased faster in Marker IBS7 than in Marker IBD7 (probably because the distribution of Marker IBS7 was less skewed). The slope of an ordinary least squares regression line with independent variable x and dependent variable y (which is GWIBD) is calculated as rxy × s.d.y/s.d.x and thus at some point Marker IBS7 is a less biased estimate of GWIBD7 than Marker IBD7. However, one should keep in mind that at that point the precision of Marker IBS7 is lower than the precision of Marker IBD7.
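The identity between the least squares slope and rxy × s.d.y/s.d.x can be checked numerically. The sketch below uses arbitrary simulated data (a hypothetical predictor x and response y, not values from our simulations) and compares the slope obtained from a direct fit with the one computed from the correlation and the two standard deviations.

```python
import numpy as np

def slope_from_moments(x, y):
    """OLS slope of y on x written as r_xy * s.d._y / s.d._x."""
    r_xy = np.corrcoef(x, y)[0, 1]
    return r_xy * np.std(y, ddof=1) / np.std(x, ddof=1)

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 0.3, size=500)            # hypothetical marker-based predictor
y = 0.8 * x + rng.normal(0.0, 0.02, size=500)  # hypothetical GWIBD values

slope_direct = np.polyfit(x, y, 1)[0]          # slope from a direct least squares fit
slope_moments = slope_from_moments(x, y)
print(slope_direct, slope_moments)
```

Both routes give the same slope up to floating-point error. For a given s.d.y, the slope therefore increases both when rxy increases and when s.d.x decreases, which are the two changes described for Marker IBS7.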

Discussion

Mendelian noise in GWIBD

Both the genetic map length and the number of chromosomes under consideration are known to influence the variation in IBD (Franklin, 1977; Stam, 1980; Hill and Weir, 2011; Kardos et al., 2015). Similarly, it has been predicted that the distribution of crossovers along chromosomes will influence the amount of variance in IBD (see, for example, Risch and Lange, 1979; Rasmuson, 1993; Guo, 1995; Forstmeier et al., 2012), but to our knowledge it has never received attention in a modeling framework. Here we show that the variance in IBD is much larger in zebra finches than in humans, because in the former almost half of the genome is inherited in only six segments (that is, the interiors of chromosomes Tgu1, Tgu1A, Tgu2, Tgu3, Tgu4 and Tgu5) that only rarely break up by crossovers (Backström et al., 2010). Hill and Weir (2011) provide a formula to calculate the expected s.d. in GWIBD for full-sib mating, first-cousin mating and second-cousin mating that assumes a uniform distribution of crossovers along chromosomes. We used the genetic length of all simulated zebra finch chromosomes in this formula and compared the results with those from our simulation (that incorporates the skew in the distribution of crossovers). The s.d. increased by a factor of 1.79, 1.81 and 1.93, respectively. Within birds, such a highly nonuniform distribution of recombination events has been observed only in zebra finches and long-tailed finches so far (Singhal et al., 2015), yet it should be noted that even more extreme examples of a skewed distribution of recombination can be found in other organisms (for example, corn (Zea mays); Gore et al., 2009). In the following, we will discuss the effect of this nonuniform distribution of crossovers on precision and bias of Pedigree and Marker F.

Effects of meiotic recombination on pedigree-based estimates of inbreeding

The maximum precision of pedigree-based estimates of GWIBD depends on the amount of Mendelian sampling noise and is thus higher in humans than in zebra finches. When shortening the pedigree from seven to four or three generations or when reducing the mean inbreeding level and its variation, precision of the pedigree-based estimate of inbreeding drops (Table 1, Figures 4a and b; Supplementary Figures S6A and B) and the relative reduction is independent of the underlying meiotic recombination landscape (that is, precision is reduced by the same percentage in zebra finches and humans). Besides losing precision, Pedigree F is also systematically biased downward when pedigree founders are related, but less so when the mean and variation in inbreeding are low, as in the random-mating pedigree (Figures 4c and d; Supplementary Figures S6C and D; Balloux et al., 2004; Kardos et al., 2015; Wang, 2016). Thus, there is no single best pedigree depth in terms of bias and precision for every pedigree structure. Instead, the optimum is reached at the minimum number of generations that captures most of the variation in Pedigree F (Wang, 2014, 2016), and this number is a pedigree-specific parameter.

Effects of meiotic recombination on marker-based estimates of inbreeding

In zebra finches, fewer markers are needed to reach the same precision as Pedigree F than in humans. This is mostly because Pedigree F is a less precise estimate of GWIBD in zebra finches than in humans. Moreover, each single marker in the zebra finch genome contributes more to an increase in precision of Marker F than in the human genome, and this is reflected in the steeper increase in precision with an increasing number of markers (see Figure 4a versus b). The skewed distribution of recombination events in the zebra finch leads to the inheritance of large blocks in the center of macrochromosomes (Forstmeier et al., 2012). We speculate that whenever a marker in the zebra finch genome is located in one of these blocks, it captures information on a relatively large proportion of the genome and thus on the inbreeding level of an individual. However, whenever it is located more toward the telomeres it will be less informative. In humans, the pronounced block-like inheritance of genomic regions is absent and consequently each marker adds approximately the same but on average less information (because recombination rates are generally higher), and this is also evident in the smaller standard errors in humans compared with zebra finches in Figure 4.

A comparison between Pedigree and Marker F

Assessing 15 and 80 ‘ideal markers’ (like long runs of homozygosity in dense SNP panels; McQuillan et al., 2008; Knief et al., 2015) for their IBD status (Marker IBD7) in zebra finches and humans, respectively, yields as precise estimates of GWIBD7 as a complete seven-generation pedigree (both in the empirical and random-mating pedigree). When shortening the pedigree, precision of the pedigree-based estimate of inbreeding drops (Figure 4 and Supplementary Figure S6), and then the same precision is reached with even fewer ‘ideal markers’.

The surprisingly small number of ‘ideal markers’ needed in the zebra finch genome to reach the same precision as Pedigree F is in good agreement with an earlier empirical estimate for our captive population: a comparison of the strength of marker- and pedigree-based estimates of inbreeding depression suggested that 11 microsatellites reflect an individual’s realized inbreeding coefficient as well as the pedigree does (Forstmeier et al., 2012). Microsatellites used in that study were spread across nine chromosomes, including the macrochromosomes Tgu1, Tgu1A, Tgu2, Tgu3, Tgu5, Tgu6 and Tgu9, that together sum up to half the physical zebra finch genome and rarely break up in meiosis (Backström et al., 2010). Thus, they are potentially more informative than a random set of ‘ideal markers’ (as considered here). Because of a limited number of segregating haplotypes in our captive population, being IBS for a single microsatellite reflects IBD well (Forstmeier et al., 2012), but this may not be the case in large and panmictic populations in the wild (Knief et al., 2015). To our knowledge, empirical field studies have rarely assessed the extent to which IBS of microsatellite markers reflects IBD of the surrounding genomic region, a question that now can be addressed by either using dense SNP panels or several microsatellites located within small genomic regions (for example, 100 kb, see Knief et al., 2015). Considering the results from the random-mating pedigree where the mean and variation in inbreeding are lower, the need to assess the IBD–IBS discrepancy becomes obvious. In both the empirical and the random-mating pedigree, incorporating the IBD–IBS discrepancy into our simulations decreased the precision of the molecular markers and consequently more markers were needed to get as precise estimates of GWIBD7 as with Pedigree F7. However, the loss in precision was much stronger in the random-mating pedigree because the same IBD–IBS discrepancy (13.3%) translates into a much larger error when mean inbreeding levels are lower (Miller and Coltman, 2014).

Conclusions

We here show that meiotic recombination affects precision and bias of pedigree- and marker-based estimates of inbreeding. In addition to the number of chromosomes and the number of crossovers (that together constitute the genetic map length), the distribution of crossovers along chromosomes also has an effect on variation in GWIBD, such that a more nonuniform distribution of crossovers increases the Mendelian noise.

The amount and distribution of meiotic recombination are species specific (Rasmuson, 1993) and consequently cannot be changed for a study organism. All else being equal, Pedigree F is a more precise estimate of GWIBD in species with more meiotic recombination (and a more uniform distribution of crossovers), whereas Marker F loses precision. However, adding a small number of genetic markers compensates for the loss and eventually results in more precise estimates of GWIBD than Pedigree F. The exact number of markers that are needed to obtain a precise estimate of GWIBD is strongly dependent on the demographic history of the study population (Miller and Coltman, 2014). Whether it is more effective to reduce the marker sampling noise (by increasing the number of randomly distributed markers) or the IBD–IBS discrepancy (by clustering markers to get more reliable information about IBD; Knief et al., 2015) remains to be tested empirically.

If a pedigree is fully informative about all inbreeding or at least about most of the variation in inbreeding in a population (Wang, 2014, 2016; which may be the case when founding a captive population from a wild one) then Pedigree F yields an unbiased estimate of GWIBD. However, as soon as a pedigree does not cover all inbreeding in a population (which is rather the rule in wild populations) Pedigree F is a biased predictor of GWIBD. Although Marker F is also biased downward when predicting GWIBD from a small number of molecular markers, the bias can be easily reduced by increasing the number of molecular markers in the study.

We suggest that it is generally advantageous to use molecular markers in heterozygosity–fitness studies, especially if (1) large numbers (10 000) of SNPs can be genotyped (see, for example, Hoffman et al., 2014; Kardos et al., 2015; Huisman et al., 2016; Wang, 2016), (2) moderate numbers (30–100) of microsatellites can be genotyped or (3) the pedigree covers only a few generations or pedigree founders are related.

Data accessibility

All data sets (seven-generation pedigrees and microsatellite genotypes) and the annotated source code for linkage map construction and smoothing, finding the optimal interference distance and the gene-dropping simulations are accessible from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.67f9c.