Introduction

Inbreeding depression is a major threat to the survival of small endangered populations. It is mainly due to the increase in the frequency of homozygous genotypes for deleterious recessive alleles, which leads to fitness decay and increased extinction risk (Lande 1994; Hedrick and Kalinowski 2000; O’Grady et al. 2006; Charlesworth and Willis 2009). However, deleterious recessive alleles that escape selection in non-inbred populations because they are usually in heterozygosis can be purged under inbreeding as they are exposed in homozygosis. This is expected to result in a reduction of fitness depression and in some fitness recovery, unless the effective population size and the effects of deleterious alleles are so small that drift overwhelms natural selection (García-Dorado 2012, 2015).

While inbreeding depression is ubiquitously documented (Crnokrak and Roff 1999; O’Grady et al. 2006), there is far less empirical evidence for the effect of genetic purging. Evidence of purging has often been obtained in situations where inbreeding increases slowly, but many studies have failed to detect purging in both wild and captive populations or have detected only purging effects of small magnitude, particularly under fast inbreeding or during short periods of slow inbreeding (Ballou 1997; Bryant et al. 1999; Byers and Waller 1999; Crnokrak and Barrett 2002; Boakes et al. 2007; Kennedy et al. 2014). This is not surprising, since purging is expected to be less efficient under faster inbreeding, but more delayed under slower inbreeding. Furthermore, purging can be difficult to detect because of a lack of experimental power or because of confounding effects, such as concurrent adaptive processes (Hedrick and García-Dorado 2016; López-Cortegano et al. 2016). Thus, failure to detect purging does not mean that purging is irrelevant in actual populations. Developing methods and tools to detect and evaluate purging is of critical importance in conservation, as it may help to improve management policies.

The first models aimed at detecting purging from pedigreed fitness data were based on different regression approaches that use an ancestral inbreeding coefficient (Fa) to define the independent variable(s) accounting for purging (Ballou 1997; Boakes et al. 2007). This Fa coefficient, first described by Ballou (1997), represents the average proportion of an individual’s genome that has been in homozygosis by descent in at least one ancestor. It is relevant to purging because recessive deleterious alleles can be purged in inbred ancestors, so that individuals with higher Fa are expected to carry fewer such alleles than those with the same level of inbreeding but lower Fa values, and should therefore have higher fitness. Gulisija and Crow (2007) developed a different index to measure the opportunity of purging (\(O_i\)) by assuming that, in the same pedigree path, there are no two ancestors that are homozygous for the same deleterious allele. However, the authors noted that, due to this assumption, their approach is appropriate to evaluate the opportunities of purging only for completely recessive and severely deleterious alleles with low initial frequency in shallow pedigrees. Furthermore, they did not develop an explicit model for the dependence of fitness on the opportunity of purging. Therefore, here we do not investigate the properties of this index.

More recently, an Inbreeding–Purging (IP) model has been proposed, based on a “purged inbreeding coefficient” (g), that predicts how mean fitness and inbreeding load are expected to evolve in a population undergoing inbreeding. This coefficient g is defined as Wright’s inbreeding coefficient (F) adjusted for the reduction in frequency of the deleterious alleles caused by purging, so that it is the coefficient appropriate to predict the actual increase in homozygosis for these alleles. It depends on a purging coefficient (d) that represents the enhancement of selection under inbreeding (García-Dorado 2012). For each single deleterious allele, d equals the recessive component of the selection coefficient, i.e., the deleterious effect that is concealed in the heterozygous and expressed just in the homozygous condition. Note that d equals the heterozygous value for relative fitness in the classical quantitative genetics scale proposed by Falconer (Falconer and Mackay 1996). For overall fitness, which is affected by many alleles with different deleterious effects, reliable IP predictions can be obtained by using a single empirically defined d value. The dependence of g on d is illustrated in Fig. 1, and shows that purging is more efficient when inbreeding is slower (i.e., when the effective population size is larger), but also takes longer to become relevant. Therefore, this model predicts that the rate of inbreeding (or the effective population size N) and the number of inbreeding generations (t) critically determine the extent of purging.

Fig. 1 Evolution of the expected purged inbreeding coefficient (g) against generation number for different d values, together with the evolution of Wright’s inbreeding coefficient (F) for populations of effective size 25 (left) or 100 (right)

The purging coefficient d has been estimated from the evolution of mean fitness in Drosophila experiments, the IP model providing a much better fit than a model without purging (Bersabé and García-Dorado 2013; López-Cortegano et al. 2016). Furthermore, equations have been derived to obtain IP predictions for pedigreed individuals and have been implemented in the free software package PURGd. This software analyzes pedigreed fitness data to obtain estimates of the IP parameters, namely the rate of inbreeding depression δ and the purging coefficient d (García-Dorado 2012; García-Dorado et al. 2016). Preliminary analysis of simulated data showed that this software accurately discriminates between situations with and without purging, and that the genealogical IP approach consistently provided a good fit to the data. However, the estimates of δ and d showed some downward bias (García-Dorado et al. 2016). Thus, before this method is applied to real data, it is necessary to characterize the bias of (δ, d) estimates obtained under different scenarios and to check how far it affects the reliability of IP predictions of fitness evolution computed using them.

Here, we analyze fitness data of simulated pedigreed individuals undergoing inbreeding and purging in order to investigate: (i) how often the IP and Fa-based approaches allow purging to be detected; (ii) the extent to which the estimates of the model’s parameters depend on the rate of inbreeding (here determined by the population size N) and on the number of inbreeding generations (t); (iii) how reliable the IP and Fa-based predictions are for inbreeding scenarios with N and/or t values different from those used to estimate the model’s parameters.

Material and methods

The simulated populations

A monoecious panmictic population of size \(N = 10^{3}\) is simulated under a mutation-selection-drift (MSD) scenario over \(10^{4}\) generations to obtain a base population that can be assumed to be at the MSD balance. Mutations occur at a rate λ per genome and generation, and have selection coefficient s and degree of dominance h, so that fitness is reduced by h·s or s when the mutant allele is in heterozygosis or homozygosis, respectively. According to the standard assumption of non-epistatic models, fitness is multiplicative across loci. In practice, fitness effects can be epistatic to some extent. In particular, the homozygous effect of a deleterious allele may be larger in individuals that are also homozygous for other deleterious alleles, giving reinforcing epistasis that involves recessive components. However, although this could be expected to produce an increase in inbreeding depression, previous simulation results suggest that this increase is canceled out by a parallel excess in purging, so that simple IP predictions not accounting for epistasis still fit the evolution of mean fitness under inbreeding (Pérez-Figueroa et al. 2009). The simulation methods are described in detail by Bersabé et al. (2016).

Two different sets of mutational parameters (CAPTIVE and WILD, summarized in Table 1) are considered. In both cases, a variable selection coefficient is sampled from a gamma distribution with shape parameter \(\alpha = 3^{ - 1}\) and rate parameter β = α/E(s), where E(s) stands for the expected s value. Sampled s values larger than 1 are assigned as s = 1. The mutation rate and average deleterious effect in the WILD case are twice those of the CAPTIVE one, in order to account for the inbreeding load that has been empirically detected in the wild, which is about four-fold that of captive populations (Ralls et al. 1988; O’Grady et al. 2006; Hedrick and García-Dorado 2016). For each given s value, the degree of dominance h is sampled from a uniform distribution ranging between 0 and \(e^{ - 7.5s}\) (García-Dorado 2003). Note that this gives an average degree of dominance (E(h)) that is larger in the CAPTIVE than in the WILD case, as the average selection coefficient is lower. The corresponding distributions of homozygous effects are shown in Fig. 2.
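
The sampling scheme just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the simulation code used for the study: the function name `sample_mutation` and the value `MEAN_S = 0.1` are hypothetical placeholders (the actual E(s) values are those of Table 1), and Python's standard `random.gammavariate` is parameterized by shape and scale, so the rate β = α/E(s) is passed as the scale E(s)/α.

```python
import math
import random

def sample_mutation(mean_s, rng, shape=1.0 / 3.0):
    """Draw one (s, h) pair under the mutational model described in the text."""
    # gammavariate takes (shape, scale); scale = 1/rate = E(s)/shape.
    s = rng.gammavariate(shape, mean_s / shape)
    s = min(s, 1.0)  # sampled s values larger than 1 are assigned s = 1
    # h is uniform between 0 and exp(-7.5 s)
    h = rng.uniform(0.0, math.exp(-7.5 * s))
    return s, h

rng = random.Random(1)
MEAN_S = 0.1  # hypothetical E(s), for illustration only
draws = [sample_mutation(MEAN_S, rng) for _ in range(20000)]
mean_s = sum(s for s, _ in draws) / len(draws)
mean_h = sum(h for _, h in draws) / len(draws)
print(f"mean s = {mean_s:.3f}, mean h = {mean_h:.3f}")
```

Because the upper bound of h decreases with s, a smaller E(s) yields a larger average h, which is the pattern noted above for the CAPTIVE case.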

Table 1 Genetic parameters used in simulations for the two different cases (CAPTIVE and WILD): expected (E) values of the selection coefficient (s, gamma distributed with shape parameter 1/3) and of the degree of dominance (h, uniformly distributed between 0 and \(e^{-7.5s}\)), and mutation rate (λ)

Fig. 2 The area below the lines gives the expected number of deleterious mutations with homozygous effects within any interval in the abscissa axis. Dotted line: CAPTIVE mutational model. Dashed line: WILD mutational model. Note that the figure does not show probability density functions, as they do not integrate to 1 but to the mutation rate λ

For each case considered, ten base populations are simulated. Populations of reduced size N = 10, N = 25, and N = 50 (lines) are obtained from these base populations at the MSD equilibrium (250, 100 and 50 replicates, respectively, each of the 10 base populations contributing equal numbers of replicates for each size). Effective population sizes are assumed to equal actual population sizes. All lines are continued for 2 N generations following the same protocol as for the base populations (i.e., under mutation, selection and drift), and pedigrees and individual fitness are recorded.

Estimation of inbreeding depression and purging

IP Model

This model predicts fitness as a function of a purged inbreeding coefficient g that is defined as Wright’s F inbreeding coefficient corrected for the reduction in frequency of deleterious alleles expected from purging. This g coefficient can be computed as a function of the purging coefficient d (García-Dorado 2012). For a model with constant effects across loci, d equals the per-copy deleterious effect that is expressed in homozygosis but is concealed in heterozygosis (d = s(1 − 2h)/2). For more realistic models where deleterious effects vary across loci, as in our simulated populations, IP predictions should be averaged over the distribution of deleterious effects. Since this approach is not possible in practical situations, an effective purging coefficient (here referred to just as purging coefficient and denoted by d) has been defined empirically as the d value giving the best predictions when used in the IP model, which has been shown to produce good approximations (García-Dorado 2012). A simple recurrence equation calculates g each generation as a function of d, the effective population size N, and the F and g values in the previous generation; alternatively, g can be computed from pedigree data. García-Dorado et al. (2016) generalized the pedigree recurrence equations to allow for overlapping generations. These equations parallel those classically used to predict the evolution of F using Malecot’s coancestry coefficients, introducing an additional term that depends on d. Thus, the model can predict either the average fitness expected at generation t (\(W_t\)), or the expected fitness for an individual i with pedigree records (\(W_i\)). In the case of individual fitness,

$$W_i = W_{\rm 0}\cdot e^{ - \delta \cdot g_i},$$
(1)

where δ is the rate of inbreeding depression, \(g_i\) is the purged inbreeding coefficient of individual i computed using d (Fig. 1), and \(W_0\) is the expected fitness in the non-inbred population.
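
To make the role of d in Eq. 1 concrete, the sketch below computes population-level trajectories of F and g and the resulting mean fitness. The recurrence for g is not reproduced in this text; the code assumes the form \(g_t = [(1 - 1/2N)\,g_{t-1} + 1/2N]\,(1 - 2d\,F_{t-1})\), taken here as the expression given in García-Dorado (2012), and it should be checked against that paper before use. Function names and the numerical values of \(W_0\), δ and d are hypothetical.

```python
import math

def ip_trajectory(n_eff, d, generations):
    """Return lists F[t] and g[t] for t = 0..generations (non-inbred at t = 0)."""
    rate = 1.0 / (2.0 * n_eff)
    F, g = [0.0], [0.0]
    for _ in range(generations):
        # Assumed recurrence: g_t = [(1 - 1/2N) g_{t-1} + 1/2N] (1 - 2 d F_{t-1}).
        # g is appended first, so F[-1] is still F_{t-1} here.
        g.append(((1.0 - rate) * g[-1] + rate) * (1.0 - 2.0 * d * F[-1]))
        # Standard recurrence for Wright's F: F_t = 1 - (1 - 1/2N)(1 - F_{t-1}).
        F.append(1.0 - (1.0 - rate) * (1.0 - F[-1]))
    return F, g

def mean_fitness(w0, delta, g_t):
    """Eq. 1 applied to expected values: W_t = W_0 exp(-delta g_t)."""
    return w0 * math.exp(-delta * g_t)

# Hypothetical parameter values, for illustration only.
F, g = ip_trajectory(n_eff=25, d=0.2, generations=50)
F0, g0 = ip_trajectory(n_eff=25, d=0.0, generations=50)
w_purged = [mean_fitness(1.0, 1.0, x) for x in g]
w_neutral = [mean_fitness(1.0, 1.0, x) for x in F]
```

With d = 0 the recurrence reduces to that of F, so g and F coincide and the prediction is the neutral one; with d > 0, g stays at or below F and the predicted fitness is correspondingly higher.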

Note that, if natural selection can be ignored during the inbreeding period, g can be replaced with F, and δ equals the inbreeding load B in the base population defined as the sum over loci of 2s(1/2 − h)q(1 − q), as shown by Morton et al. (1956), where q is the frequency of the deleterious allele. Thus, the inbreeding load B can be interpreted as the expected rate of inbreeding depression if natural selection is neglected during the inbreeding process. This can be appropriate when very few generations are considered, so that purging has no opportunity to occur, when natural selection is overwhelmed by drift due to a very small effective population size, or when natural selection is relaxed by maintaining a population in benign conditions, as could occur to some extent in ex situ conservation programs. Otherwise, purging selection must be taken into account by replacing F with g. Furthermore, non-purging selection (i.e., selection as it would operate in an equilibrium population with stable homozygosis) should also be considered, at least in populations that are not too small, as it can compensate for a significant fraction of the inbreeding depression. To understand this concept, discussed in the section devoted to the Full Model (FM) in García-Dorado (2012), think of a population at the MSD equilibrium. This population has finite size N (i.e., inbreeding increases at a rate 1/(2N)) and a given inbreeding load, but it does not experience inbreeding depression because it is compensated by natural selection. This kind of selection is not due to a net increase in homozygosis and, therefore, it can be considered part of the standard selection occurring in populations at the MSD balance and we do not use the term purging to describe it. According to this Full Model, due to non-purging selection, the actual expected rate of inbreeding depression as a function of g is δFM = B − B*, where B and B* are, respectively, the inbreeding loads expected at the MSD balance for the original non-inbred population and for the new reduced size N. To obtain this δFM value, we compute B and B* using Equations 10 and 13 in García-Dorado (2007), both averaged over \(10^{6}\) (s, h) values sampled from the corresponding joint distribution (s values larger than 1 were assigned s = 1 as in the simulation process). Note that δFM approaches B for very small populations, but can be substantially smaller when N is large.
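
As a small worked illustration of the two definitions used above, the following sketch evaluates the per-locus purging coefficient d = s(1 − 2h)/2 and the inbreeding load B as the sum over loci of 2s(1/2 − h)q(1 − q). The three loci and their (s, h, q) values are invented for the example, and the function names are hypothetical; the sketch does not reproduce Equations 10 and 13 of García-Dorado (2007), which are needed for B* and δFM.

```python
def purging_coefficient(s, h):
    """Per-locus d: the recessive component of the selection coefficient."""
    return s * (1.0 - 2.0 * h) / 2.0

def inbreeding_load(loci):
    """B = sum over loci of 2 s (1/2 - h) q (1 - q); loci is a list of (s, h, q)."""
    return sum(2.0 * s * (0.5 - h) * q * (1.0 - q) for s, h, q in loci)

# Hypothetical loci (s, h, q), for illustration only.
loci = [(0.5, 0.02, 0.01), (0.1, 0.2, 0.05), (0.02, 0.4, 0.1)]
B = inbreeding_load(loci)
# Since 2 s (1/2 - h) = 2 d, the same load can be written with d.
B_from_d = sum(2.0 * purging_coefficient(s, h) * q * (1.0 - q) for s, h, q in loci)
print(f"B = {B:.5f}")
```

Writing B in terms of d makes explicit that each locus contributes to the inbreeding load in proportion to the same recessive component on which purging acts.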

For each pedigree, we estimate the purging coefficient d and the rate of inbreeding depression δ using the PURGd 2.0 software package (García-Dorado et al. 2016; freely available at https://www.ucm.es/genetica1/mecanismos). These estimates are obtained using the two methods implemented in PURGd. Results obtained using linear regression for log-transformed fitness (LR method) are not qualitatively different from those obtained using the numerical non-linear regression method (NNLR), but give more downwardly biased estimates of δ and larger standard errors. These LR results are not reported in the main text, although a summarizing figure is given in the Supplementary Material (Figure S1). Thus, we only report results from the NNLR method, which fits predictions from Eq. 1 by numerically searching for estimates that minimize the residual sum of squares (García-Dorado et al. 2016). The expected fitness value in the non-inbred population, E(\(W_0\)), is obtained in a previous step as the mean fitness of non-inbred individuals with non-inbred ancestors (F = Fa = 0), as explained in García-Dorado et al. (2016). Therefore, the program produces estimates of δ and d that are conditional on this estimate of the non-inbred expected fitness. To check for the convergence of the numerical algorithm, we estimate the genetic parameters for each pedigree as the result of a single run, and as the average of five and ten independent runs.
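
The idea of minimizing the residual sum of squares for Eq. 1 can be illustrated with a toy example. The sketch below is not PURGd's algorithm: it uses a plain grid search as a stand-in for the numerical search, works on generation means rather than on pedigreed individuals, and assumes the population-level recurrence for g attributed above to García-Dorado (2012). All names and numerical values are hypothetical.

```python
import math

def purged_inbreeding(n_eff, d, generations):
    """g_t for t = 0..generations under the assumed IP recurrence."""
    rate = 1.0 / (2.0 * n_eff)
    f_prev, g_prev, g = 0.0, 0.0, [0.0]
    for _ in range(generations):
        g_now = ((1.0 - rate) * g_prev + rate) * (1.0 - 2.0 * d * f_prev)
        f_prev = 1.0 - (1.0 - rate) * (1.0 - f_prev)
        g_prev = g_now
        g.append(g_now)
    return g

def rss(observed, w0, delta, g):
    """Residual sum of squares of Eq. 1 predictions against observed means."""
    return sum((w - w0 * math.exp(-delta * gt)) ** 2 for w, gt in zip(observed, g))

def grid_fit(observed, w0, n_eff, deltas, ds):
    """Return the (delta, d) grid point with the smallest RSS, w0 being fixed."""
    best = None
    for d in ds:
        g = purged_inbreeding(n_eff, d, len(observed) - 1)
        for delta in deltas:
            score = rss(observed, w0, delta, g)
            if best is None or score < best[0]:
                best = (score, delta, d)
    return best[1], best[2]

# Noise-free data generated from known (hypothetical) parameter values.
g_true = purged_inbreeding(25, 0.2, 50)
observed = [0.9 * math.exp(-1.0 * gt) for gt in g_true]
deltas = [i / 10 for i in range(0, 31)]   # candidate delta values 0.0 .. 3.0
ds = [i / 100 for i in range(0, 51)]      # candidate d values 0.00 .. 0.50
fit_delta, fit_d = grid_fit(observed, 0.9, 25, deltas, ds)
print(fit_delta, fit_d)
```

As in the procedure described above, the intercept is treated as known from a previous step, so only δ and d are searched.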

A bootstrap method was devised to test the statistical significance of the estimate of d obtained from each replicate line against the null hypothesis d = 0 and is described in the Supplementary Material.

Ancestral Inbreeding models

Ballou (1997) defined the ancestral inbreeding coefficient (Fa) as the fraction of an individual’s genome that has been in homozygosis by descent in at least one ancestor, calculated in terms of the inbreeding coefficient (F) and the ancestral inbreeding coefficient of the individual’s parents (sire S and dam D) as

$$\begin{array}{*{20}{l}} F_{\mathrm a} = & \frac{1}{2}\left\{ F_{{\rm a}\;(D)} + \left[ {1 - F_{{\rm a}\;(D)}} \right]\cdot F_{(D)}\right.\hfill{}\\ & \left. + F_{{\rm a}\;(S)} + \left[ {1 - F_{{\rm a}\;(S)}} \right]\cdot F_{(S)} \right\}\hfill{}\end{array}$$
(2)

Thus, Fa is related to the purging opportunities in the ancestors of an individual. This equation assumes independence between F and Fa in the same individual, which can lead to some overestimation of ancestral inbreeding. In order to avoid this bias, it has been proposed to estimate ancestral inbreeding by using the so-called gene dropping simulation approach. Therefore, we have also implemented in PURGd this simulation method, which estimates ancestral inbreeding as described by Suwanlee et al. (2007) using \(10^{6}\) replicates. Results for all the ancestral inbreeding models considered were obtained using Fa calculated both from Eq. 2 and from gene dropping. For consistency with our IP method and with previously published Fa-based analysis, in the main text we report results obtained using Eq. 2, and those obtained using gene dropping are shown in the Supplementary Material.
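
Eq. 2 translates directly into a one-line function. The sketch below is only an illustration of that equation, with hypothetical function and argument names; it does not implement the gene dropping alternative.

```python
def ancestral_inbreeding(fa_sire, f_sire, fa_dam, f_dam):
    """Eq. 2: Fa of an individual from the F and Fa values of its sire and dam."""
    dam_term = fa_dam + (1.0 - fa_dam) * f_dam
    sire_term = fa_sire + (1.0 - fa_sire) * f_sire
    return 0.5 * (dam_term + sire_term)

# Offspring of two non-inbred parents without inbred ancestors: Fa = 0.
fa_founders = ancestral_inbreeding(0.0, 0.0, 0.0, 0.0)
# Sire with F = 0.25 and Fa = 0, dam non-inbred: Fa = 0.125.
fa_one_inbred_parent = ancestral_inbreeding(0.0, 0.25, 0.0, 0.0)
print(fa_founders, fa_one_inbred_parent)
```

Applying the function to individuals in pedigree order, parents before offspring, gives Fa for a whole pedigree once F is known for every individual.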

To fit the joint effect of inbreeding and purging on fitness, Ballou proposed the following linear model

$$W = W_0 + b_{\rm F}F + b_{{\rm FFa}}F\cdot F_{\rm a},$$

where bF is the partial regression coefficient that gives the decline of fitness with increasing inbreeding (F) for any constant value of the product F·Fa. According to Ballou, −bF represents the rate of inbreeding depression, while the coefficient bFFa measures the increase of fitness in inbred individuals due to reduced inbreeding depression caused by purging in their ancestors.

Since we use a multiplicative fitness model, we rewrite Ballou’s model for individual fitness as

$$W_i = W_{\rm 0}\cdot e^{b_{\rm F}\cdot F_i + b_{FF_{\rm{a}}}F_i\cdot F_{{\rm a}i}}$$
(3)

Two additional linear models have been proposed by Boakes and Wang (2005) to analyze purging using ancestral inbreeding. One of these two models (BW) considers that the effect of purging does not depend on the level of inbreeding, but just on previous purging opportunities. For multiplicative fitness, this model is written as

$$W_i = W_{\rm 0}\cdot e^{b_{\rm F}\cdot F_i + b_{{\rm F_a}}F_{{\rm a}i}},$$
(4)

where the coefficient of the purging term \(b_{F_{\rm a}}\) is the average rate of increase of individual fitness due to the opportunities of purging in the ancestors.

The other model proposed by Boakes and Wang (2005) is the mixed “Ballou–Boakes & Wang” model (here B-BW), where the purging term is the sum of those in Ballou and BW models, giving

$$W_i = W_0\cdot e^{b_{\rm{F}}\cdot F_i + b_{F_{\rm{a}}}F_{{\rm{a}}i} + b_{FF_{\rm{a}}}F_i\cdot F_{{\rm{a}}i}}$$
(5)
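
Eqs. 3, 4 and 5 differ only in which purging terms are included, so they can be written as a single function in which the unused coefficient is set to zero. This is a minimal sketch with hypothetical names and coefficient values, not the PURGd implementation.

```python
import math

def fitness_fa_model(w0, f, fa, b_f, b_fa=0.0, b_ffa=0.0):
    """Expected fitness under the multiplicative Fa-based models.

    Ballou (Eq. 3): b_fa = 0; BW (Eq. 4): b_ffa = 0; B-BW (Eq. 5): both terms.
    """
    return w0 * math.exp(b_f * f + b_fa * fa + b_ffa * f * fa)

# Hypothetical coefficients and individual values, for illustration only.
w0, f_i, fa_i = 0.9, 0.25, 0.4
w_ballou = fitness_fa_model(w0, f_i, fa_i, b_f=-1.0, b_ffa=0.8)
w_bw = fitness_fa_model(w0, f_i, fa_i, b_f=-1.0, b_fa=0.1)
w_bbw = fitness_fa_model(w0, f_i, fa_i, b_f=-1.0, b_fa=0.1, b_ffa=0.8)
w_no_purging = fitness_fa_model(w0, f_i, fa_i, b_f=-1.0)
```

Under Ballou's model a non-inbred individual (F = 0) has expected fitness \(W_0\) whatever its Fa, whereas under the BW model Fa changes fitness even when F = 0; this is the difference between the two purging terms described above.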

Fitness evaluation is often dichotomous by nature (e.g., dead/alive individuals), and both Ballou (1997) and Boakes and Wang (2005) tested their models by fitting dichotomous (0, 1) fitness data using logistic regression. To check which is the better approach to handle such data, we generate dichotomous fitness values and analyze them using Ballou’s model, with both the NNLR and the Logistic methods (Figure S2; Tables S1 and S2). However, to compare ancestral inbreeding and IP approaches under similarly optimal conditions, in the main text we always report results of NNLR analysis of fitness data simulated as a continuous variable defined in the interval (0, 1). A bootstrap contrast analogous to that performed for the IP analysis is used in each replicate to test the significance of purging in Ballou’s analysis (see Supplementary Material).

Non-Linear Regression coefficients for Fa-based models, as well as bootstrap errors, are computed using PURGd 2.0. As in the case of the IP model, the intercept is obtained in a previous step as the mean fitness for non-inbred individuals with non-inbred ancestors (F = Fa = 0).

Analysis of the predictive value of the estimates

To evaluate the predictive value of the parameters estimated in the previous section, we use the estimates obtained from different numbers of generations (t = N/2, t = N, t = 2N) in lines of different sizes (N = 10, N = 25, N = 50) to predict the evolution of average fitness for lines of each of the three sizes considered (crossed predictions). We check how these predictions fit the corresponding simulated data by graphically comparing the observed and predicted evolution of mean fitness.

In the case of the IP model, predictions of the expected fitness at generation t (\(W_t\)) are computed using the equation for the evolution of mean fitness, obtained by replacing \(W_i\) and \(g_i\) in Eq. 1 with their expected values at generation t (\(W_t\) and \(g_t\)). For this purpose, \(g_t\) is computed as a function of N using the expression provided in García-Dorado (2012). The neutral prediction of the model by Morton et al. (1956) is also obtained by replacing \(g_t\) with the standard inbreeding coefficient (\(F_t\)) in Eq. 1 and using the inbreeding load computed in the simulated population (δ = BSIM).

In the case of models based on ancestral inbreeding, predictions for mean fitness are obtained by replacing \(F_i\) and \(F_{{\rm a}i}\) in Eqs. 3–5 with their expected values through generations, \(F_t\) and \(F_{{\rm a}\;t}\). Below we derive an expression for the evolution of \(F_{{\rm a}\;t}\) through generations in a panmictic population maintained with effective size N.

From Eq. 2, assuming a monoecious population, or the same expected Fa value (or F values) for sires and dams, the average ancestral inbreeding at generation t can be computed by iterating the expression

$$F_{{\rm a}\;t} = F_{{\rm a}\;(t - 1)} + \left[ {1 - F_{{\rm a}\;(t - 1)}} \right]\cdot F_{(t - 1)},$$

which, noting that \(F_{t} = 1 - \left( {1 - \frac{1}{{2N}}} \right)^ t\) and rearranging, can be written as

$$F_{{{\rm a}\;t}} = 1 - \left( {1 - \frac{1}{{2N}}} \right)^{t - 1}\cdot \left[ {1 - F_{{ {\rm a}\;(t - 1)}}} \right]$$
(6)

In addition, an expression directly giving the expected ancestral inbreeding after t generations can be derived, so that it is not necessary to iterate expression 6 through generations. For simplicity, we define \(x_t = 1 - F_{{\rm a\;}t}\) and \(k = \left( {1 - \frac{1}{{2N}}} \right)\), so that Eq. 6 can be written as \(x_t = x_{t - 1}\cdot k^{t - 1}\). Therefore, since \(x_0 = 1\), the expected value of \(x_t\) can be computed as

$$x_t = x_0\mathop {\prod}\limits_{i = 0}^{t - 1} {k^i} = k^{\mathop {\sum}\nolimits_{i = 0}^{t - 1} i } = k^{t\left( {t - 1} \right)/2}$$

and, replacing \(x_t\) and k into this expression and rearranging, we obtain

$$F_{{\rm{a}}\;t} = 1 - \left( {1 - \frac{1}{{2N}}} \right)^{\frac{1}{2}t(t - 1)}$$
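
The closed-form expression can be checked numerically against the iteration of Eq. 6. The sketch below does this for the three line sizes used here; function names are hypothetical, and generation 0 is taken as the non-inbred base population with Fa = 0.

```python
def fa_iterated(n_eff, generations):
    """Expected Fa for t = 0..generations by iterating Eq. 6 from Fa_0 = 0."""
    k = 1.0 - 1.0 / (2.0 * n_eff)
    fa = 0.0
    out = [fa]
    for t in range(1, generations + 1):
        fa = 1.0 - k ** (t - 1) * (1.0 - fa)  # Eq. 6
        out.append(fa)
    return out

def fa_closed_form(n_eff, t):
    """Expected Fa after t generations: 1 - (1 - 1/2N)^(t (t - 1) / 2)."""
    k = 1.0 - 1.0 / (2.0 * n_eff)
    return 1.0 - k ** (t * (t - 1) / 2.0)

max_diff = 0.0
for n_eff in (10, 25, 50):
    iterated = fa_iterated(n_eff, 2 * n_eff)
    for t, value in enumerate(iterated):
        max_diff = max(max_diff, abs(value - fa_closed_form(n_eff, t)))
print(f"largest discrepancy: {max_diff:.2e}")
```

Both routes give Fa = 0 at t = 1 and Fa = 1/(2N) at t = 2, and the exponent t(t − 1)/2 shows why the expected Fa approaches 1 much faster than F does.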

Results

IP estimates of the rate of inbreeding depression and the purging coefficient

The inbreeding loads in the simulated base populations (BSIM = 0.5828 ± 0.0144 for CAPTIVE; BSIM = 2.5370 ± 0.0460 for WILD) are close to their corresponding expectations for the MSD balance (B = 0.6266 for CAPTIVE, B = 2.5511 for WILD). The estimated rates of inbreeding depression (δ) are close to B for N = 10, as usually assumed, but decline for larger sizes, being in good agreement with their expected values (δFM) when computed from short term data (t = N/2) (Table 2). The estimates of δ based on longer inbreeding periods become downwardly biased.

Table 2 Estimates of rates of inbreeding depression and purging coefficients from lines of different sizes (N) and different numbers of generations (t)

Estimates of d are large, indicating substantial purging (Table 2 and S3). There is a trend for a reduction of d when estimated from longer inbreeding periods, which is associated with a parallel reduction in the estimate of δ. As expected, the estimates of this purging parameter are always larger in the WILD case than in the CAPTIVE one. In both cases, the estimates are very similar regardless of the number of runs averaged per replicate (results not shown). Thus, the estimates presented here were obtained from just one run, though more runs might be needed if additional environmental factors were included.

We have also estimated the purging coefficient by using the expected value of the rate of inbreeding depression (δFM) as a known δ value in PURGd (results shown in Table 2 and S3). It is interesting to note that this alleviates the underestimation of d with increased number of analyzed generations, compared to the situations where both d and δ are jointly estimated from the data.

Estimates of the coefficients in ancestral inbreeding models

Tables 3 and S4–S5 show the estimates of non-linear regression coefficients for Fa-based models. Similar results obtained using gene dropping are shown in the Supplementary Material (Tables S6–S7). In both Ballou’s and B–BW models, −bF estimates obtained from short term data for different population sizes (N) are reasonably close to the expected rate of inbreeding depression (δFM), although standard errors are larger than in the IP model. However, Ballou’s −bF estimates tend to increase when based on more generations of inbreeding, leading to values well above δFM in the WILD case.

Table 3 Non-linear regression coefficients estimated for Ballou’s model, BW model and B–BW model in pedigrees of different populations sizes (N = 10 and N = 50) and numbers of generations (t = N/2 and t = 2 N)

The estimates of the coefficients for terms including Fa are usually positive, indicating purging, but vary depending on N and t in an unpredictable way, particularly for BW and B-BW models where −bFa can even be negative in some instances.

Figure 3 illustrates how different Fa-based models fit the data for lines of different sizes, showing the observed evolution of fitness during 2N generations together with the corresponding predictions computed using coefficients estimated from the same data (Figure S3, obtained using gene dropping, gives similar results). The BW model fits the data poorly, showing a systematic overestimation of fitness during the first N generations and an increasing underestimation later on, while Ballou’s model fits remarkably well. The B–BW model does not improve the fit over Ballou’s model, which is not surprising as \(b_{{\rm Fa}}\) estimates are usually small. Therefore, hereafter we use Ballou’s model to evaluate the predictive value of Fa-based methods.

Fig. 3 Evolution of mean fitness in simulated lines (red) and the corresponding predictions obtained using different Fa-based models. Predictions are computed for two different cases, CAPTIVE and WILD, and three different population sizes (N = 10, N = 25 and N = 50) over t = 2N generations using the coefficients estimated from the same lines and number of generations. Three models based on ancestral inbreeding are used: Ballou’s (green), BW (yellow) and B-BW model (black dotted), as well as a prediction without selection (gray)

The efficiency of IP and Ballou’s models to detect purging

Figure 4 gives the percent of replicates in which a model including purging fitted the data significantly better than a non-purging model, both for IP and Ballou approaches (Figure S4 with Ballou’s results obtained using gene dropping gives similar results). For both models, purging detection is more likely in larger lines and for larger inbreeding periods, as expected from more efficient purging and larger sample sizes. Detection is also more likely for the WILD than for the CAPTIVE case, as expected.

Fig. 4 Percent of replicates where a model including purging fitted the data significantly better than a non-purging model under the IP or Ballou approaches, both for CAPTIVE and WILD mutational models (bootstrap contrasts with α = 0.05)

Under both IP and Ballou’s models, the proportion of detected cases in the most difficult situation (N = 10, t = N/2, CAPTIVE) is very small, indicating that although both approaches detect purging when estimates are averaged over replicates, they may not be able to do so when small replicates are separately considered during short inbreeding periods. The fact that, in that situation, the proportion of detected cases is smaller than 0.05 indicates that the test is conservative. In more favorable situations, both IP and Ballou models give substantial detection rates, usually somewhat larger for the former model.

The reliability of predictions based on estimates using IP and Ballou’s models

One of the main aims of this work is to check whether each pair of IP parameters (δ, d) estimated by PURGd from pedigree data for each (N, t) situation (Table 2 and S3) is reliable for predicting the evolution of fitness in lines of different sizes during periods of considerable length (t up to 2 N). Thus, Fig. 5 gives, for each population size, the crossed IP predictions computed using different (δ, d) estimates obtained from data corresponding to different population sizes and inbreeding periods, together with the prediction computed assuming no selection and using the inbreeding load of the base population (d = 0; δ = BSIM), and with the evolution of mean fitness observed in the simulated lines. IP predictions remain quite accurate during the first N generations. In general, there is a slight trend for long-term fitness being better predicted using (δ, d) estimates from long term data. Furthermore, predictions computed using (δ, d) estimates obtained from small lines, where purging is more likely to be overwhelmed by genetic drift, tend to underrate fitness for larger lines. Conversely, IP predictions tend to overestimate fitness in the long term. However, all these biases are usually small, with the exception of those for N = 10 lines in the WILD case.

Fig. 5 Observed fitness for the CAPTIVE (up) and WILD (down) cases, and the corresponding prediction computed using the estimates obtained in the IP model. In each panel, observed and predicted values over t = 2N generations correspond to the population size indicated in the column (N = 10, N = 25 and N = 50). Different predictions are plotted using estimates obtained from different data sets, denoted by different colors and strokes as shown in the lateral panel. Neutral predictions, computed assuming no selection and using the inbreeding load observed in the simulated base population (BSIM), are also shown

In any case, despite the variability observed between the average (δ, d) estimates obtained from different data sets (Table 2 and S3), IP predictions remain quite accurate and always fit the data much better than a model assuming no selection. The reason is that the reductions in the estimate of δ obtained from longer inbreeding periods are compensated by reductions in the corresponding estimate of d.

Figure 6 shows a similar evaluation for the reliability of Ballou’s predictions computed using estimates of the corresponding coefficients obtained from different data sets (Tables 3 and S4–S5). Figure S5, obtained using gene dropping estimates, gives similar results. Predictions obtained using parameters estimated in smaller lines underestimate long-term fitness, while those obtained from larger lines tend to overestimate fitness in the medium term but can still underestimate fitness in the long term. The fit also improves when estimates are based on longer inbreeding periods and, of course, when the coefficients used to obtain predictions had been estimated in the same data set for which the fit is tested. In general, predictions are reliable during the first few generations, where purging is irrelevant, but become unreliable later on. Thus, Ballou’s predictions of mean fitness are highly dependent on the conditions used to estimate the coefficients of the model, and become very erratic after a few generations. The same analysis was performed for the BW model, giving even less reliable predictions (data not shown).

Fig. 6

Observed fitness for the CAPTIVE (top) and WILD (bottom) cases, and the corresponding predictions computed using the estimates obtained with Ballou’s model. In each panel, observed and predicted values over t = 2 N generations correspond to the population size indicated in the column (N = 10, N = 25 and N = 50), and different predictions are plotted using estimates obtained from different data sets, denoted by different colors and strokes as shown in the lateral panel. Neutral predictions, computed assuming no selection and using the inbreeding load observed in the simulated base population (BSIM), are also shown.

Comparing Figs. 5 and 6 shows that IP predictions are more accurate than those of Ballou’s Fa-based model, the IP model providing reasonable predictions of the evolution of fitness for any of the population sizes considered using parameters estimated under different conditions.

Discussion

Using simulated pedigreed fitness data, we analyze the performance of the IP model (IP) and of models based on ancestral inbreeding (Fa) in order to: (i) detect purging; (ii) estimate genetic parameters that can be used to obtain reliable predictions of the evolution of fitness under inbreeding and purging. The IP model is based on the expected effect of selection against the recessive component of deleterious effects (d) that is exposed in homozygotes due to inbreeding, while the Fa approach is based on the statistical fitting of models including inbreeding (F) and ancestral inbreeding (Fa) terms. To estimate the parameters of these models we have used an updated version of the PURGd software (García-Dorado et al. 2016).

The statistical estimation approaches

We have discussed in a previous paper (García-Dorado et al. 2016) the advantages of the NNLR approach compared to linear regression for log-fitness data (LR), and the analysis of the data presented here confirms those advantages (results not shown). Furthermore, here we compare the performance of our NNLR method with that of the logistic regression approach previously used in the literature to analyze purging for dichotomous data, such as those from dead/alive records (Ballou 1997; Boakes et al. 2007; Ceballos and Álvarez 2013; Kennedy et al. 2014). To do so, we have estimated the parameters of Ballou’s model using both approaches for simulated binary fitness data, and we find that the NNLR estimates fit these data as well as or slightly better than the logistic ones (Figure S2). Therefore, since the NNLR analysis relies on a model that is consistent with our exponential IP model and has other advantages regarding the estimation of δ, as discussed in García-Dorado (2016), we encourage its use to analyze binary fitness data. Hereafter, we discuss the properties of both IP and Fa models using NNLR estimates obtained from untransformed continuous fitness data.

The mutational models

In order to explore the consequences of purging against the inbreeding load expressed in wild or captive populations, we analyze fitness under two mutational models. The CAPTIVE mutational model corresponds to model II in Pérez-Figueroa et al. (2009). This model accounts for the properties of deleterious effects detected in Drosophila mutation accumulation experiments, but uses a larger deleterious mutation rate and higher kurtosis to roughly account for the additional rate of mutations that behave as deleterious in molecular evolutionary studies but whose effect is too small to be detected in mutation accumulation experiments (García-Dorado and Caballero 2000; Ávila and García-Dorado 2002; García-Dorado et al. 2004; Halligan and Keightley 2009). WILD mutational parameters were obtained by doubling the average deleterious effect and the deleterious mutation rate of the CAPTIVE case to approximately account for the roughly fourfold larger inbreeding load expressed in competitive or wild conditions (Ralls et al. 1988; O’Grady et al. 2006; Yun and Agrawal 2014; Hedrick and García-Dorado 2016). Our estimates of the purging coefficient d in the CAPTIVE case are larger than those obtained in non-competitive conditions for Drosophila (Bersabé and García-Dorado 2013), but the estimates obtained in the WILD case are similar to those experimentally obtained in competitive conditions (López-Cortegano et al. 2016). We find that our CAPTIVE and WILD cases parallel the non-competitive and competitive conditions of those experiments, as the WILD case gives a larger inbreeding load but also a larger purging coefficient than the CAPTIVE one so that, under slow inbreeding, long-term inbreeding depression is small in both instances.

Performance of IP and Fa models

The IP estimates of δ obtained using early data of the inbreeding process are in good agreement with their expected value (up to t = N generations in the CAPTIVE case or t = N/2 in the WILD case; see Table 2). However, they become downwardly biased when based on full data from a long inbreeding process, which is associated with a reduction of the estimates of d. The reason is that, for t = 2 N, most purging occurs during a small proportion of the period considered and, since the model’s predictions are not exact, estimates smaller than the true δ and d values can lead to some overfitting of long-term data. More stable estimates of d were obtained by introducing into the model the expected rate of inbreeding depression (δFM) as a known δ value. In practice, δFM is unknown, but δ can be estimated in a previous step by analyzing data of early generations, or by assuming d = 0 and using fitness data from individuals with no ancestral inbreeding (Fa = 0; an option incorporated in PURGd 2.0). This δ estimate can then be introduced into PURGd as a known δ value to obtain more stable estimates of d.
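The second step of this procedure (estimating d with δ held at a known value) can be illustrated with a minimal sketch. It is not the PURGd implementation: PURGd applies NNLR to individual pedigreed data, whereas the sketch below works on population means, uses a plain least-squares grid search on log fitness, and assumes the recursion for the purged inbreeding coefficient g given in García-Dorado (2012), which is not restated in this section. All function names and numerical values are hypothetical and chosen only for illustration.

```python
def purged_inbreeding(N, d, generations):
    """Return [g_0, ..., g_T] for a line of constant size N (assumed recursion)."""
    k = 1.0 / (2.0 * N)
    g, F = 0.0, 0.0                 # purged and standard inbreeding at t = 0
    out = [g]
    for _ in range(generations):
        g = ((1.0 - k) * g + k) * (1.0 - 2.0 * d * F)   # uses F of generation t-1
        F = F + k * (1.0 - F)                           # standard inbreeding F_t
        out.append(g)
    return out


def log_fitness(N, delta, d, generations):
    """Expected log mean fitness relative to the base population: -delta * g_t."""
    return [-delta * g for g in purged_inbreeding(N, d, generations)]


def fit_d_given_delta(observed, N, delta, grid):
    """Grid search for the d that minimises squared error, with delta fixed."""
    T = len(observed) - 1

    def sse(d):
        predicted = log_fitness(N, delta, d, T)
        return sum((o - p) ** 2 for o, p in zip(observed, predicted))

    return min(grid, key=sse)


# Hypothetical, noiseless illustration (values are not taken from the paper).
N, delta_known, d_true = 25, 1.0, 0.2
observed = log_fitness(N, delta_known, d_true, 2 * N)
grid = [i / 100 for i in range(51)]   # d searched between 0 and 0.5 (assumed range)
d_hat = fit_d_given_delta(observed, N, delta_known, grid)
print(d_hat)
```

Because δ is fixed, the search is one-dimensional, so the estimate of d cannot trade off against a biased estimate of δ, which is the motivation given above for the two-step procedure.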

A main finding is that, despite the bias for δ and d described above, each joint (δ, d) IP estimate, whether obtained from small or large lines or based on short-term data or on the full long inbreeding process, produces good predictions for the evolution of mean fitness over the whole range of population sizes and during the whole period of inbreeding considered (Fig. 5). An exception is that of the smaller lines (N = 10) for the WILD case, where the observed inbreeding depression is larger than the IP prediction, unless (δ, d) were also estimated from the same data (N = 10 lines). Furthermore, (δ, d) estimates obtained from N = 10 lines predict too low a fitness in the medium term for larger lines. The reason is that IP is a deterministic model that predicts the consequences of natural selection on homozygous genotypes induced by inbreeding, but does not account for the reduction in the efficiency of natural selection caused by random drift. In fact, it has been found that drift roughly overwhelms purging for Nd < 1 (García-Dorado 2012), so that alleles with d < 0.1 should hardly be purged in lines with N = 10. In the WILD case, the number of mutations per gamete with an effect small enough to escape purging for N = 10 is larger than in the CAPTIVE one (see Fig. 2 and note that Ns < 2 implies Nd < 1 for h < 1). In fact, the class with d < 0.1 contributes twice as much inbreeding load in the WILD case as in the CAPTIVE one (0.36 vs. 0.18). Thus, in the WILD case, the IP model is less reliable for the smaller lines. Remarkably, even in this N = 10 case, IP predictions are much more accurate than those computed by ignoring purging.
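The Nd < 1 rule of thumb quoted above translates into a threshold of d = 1/N for each line size. The short sketch below only restates that arithmetic for the three population sizes considered; the function name is hypothetical.

```python
def escapes_purging(N, d):
    """Rule of thumb from the text: drift roughly overwhelms purging when N * d < 1."""
    return N * d < 1.0


for N in (10, 25, 50):
    threshold = 1.0 / N
    print(f"N = {N}: alleles with d below about {threshold:.2f} are hardly purged")
```

For N = 10, 25 and 50 the thresholds are 0.10, 0.04 and 0.02, so the fraction of the inbreeding load that escapes purging shrinks quickly as lines get larger.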

It should be noted that IP predictions (as well as Fa-based ones) do not account for the fitness decline caused by the continuous accumulation of newly arisen mutations. Therefore, they tend to overestimate long-term fitness in small lines where natural selection against the accumulation of new deleterious mutations is relatively inefficient. Although this bias can be corrected in theoretical situations (see the Full Model approach in García-Dorado 2012), its magnitude is unknown in practice. In our data, this mutational fitness decline is small for the periods considered, although it could be threatening for very small lines in the long term (García-Dorado et al. 1999; Ávila and García-Dorado 2002; Caballero et al. 2002; García-Dorado 2003; Halligan and Keightley 2009).

In addition to the IP model, we used three different models to estimate the dependence of individual fitness on F and Fa, where the latter parameter (the ancestral inbreeding) is used as an indirect measure of the purging opportunities in the individual’s ancestors. For the three models, we have obtained results using Fa estimates computed using either Ballou’s original equation (Eq. 2) or the gene dropping simulation approach suggested by Suwanlee et al. (2007). We found that Ballou’s original formula produces some upward bias in the estimates of Fa, but the ability of Fa models to detect purging and predict its consequences is very similar regardless of how Fa was computed.
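For reference, the recursive computation of Fa from parental values can be sketched as follows. Eq. 2 is not reproduced in this section, so the expression below is the form usually attributed to Ballou (1997) and should be checked against Eq. 2 before use; the function and argument names are hypothetical.

```python
def ancestral_inbreeding(fa_sire, f_sire, fa_dam, f_dam):
    """Ancestral inbreeding of an offspring from its parents' F and Fa values.

    Assumed form (Ballou 1997): for each parent, the contribution is its own
    ancestral inbreeding plus the part of its genome not yet exposed (1 - Fa)
    that is exposed by its own inbreeding F; the two parents are averaged.
    """
    sire_part = fa_sire + (1.0 - fa_sire) * f_sire
    dam_part = fa_dam + (1.0 - fa_dam) * f_dam
    return 0.5 * (sire_part + dam_part)


# Offspring of two non-inbred founders: no ancestral inbreeding.
print(ancestral_inbreeding(0.0, 0.0, 0.0, 0.0))    # 0.0
# Both parents inbred (F = 0.25) but without inbred ancestors themselves.
print(ancestral_inbreeding(0.0, 0.25, 0.0, 0.25))  # 0.25
```

The gene dropping alternative replaces this closed-form recursion with simulated transmission of founder alleles through the pedigree and is not sketched here.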

According to Ballou (1997), when Fa is included in the model, the regression coefficient of fitness on F gives the rate of inbreeding depression (δ = −bF). This is obviously true for the particular case of Fa = 0, where bF estimates the rate of inbreeding depression for fitness in non-purged individuals. However, the meaning of bF is less clear for Fa > 0 since, as shown in the IP approach, the dependence of fitness on F among purged individuals varies according to how fast inbreeding has been produced and, therefore, it also depends on Fa. This explains why −bF is a poor estimator of the expected rate of inbreeding depression unless it is based on early inbreeding periods; otherwise, it shows substantial bias whose sign depends on the model used.

In Ballou’s model, purging is measured by the coefficient corresponding to the interaction effect (bFFa). Thus, this model considers that the role of purging is to reduce inbreeding depression, so that it only affects inbred individuals. Therefore, bFFa measures the rate of reduction of inbreeding depression with increasing Fa. Due to this interaction term, this model has a common feature with the IP approach: the effect of purging increases when inbreeding accumulates, both models predicting an initial fitness decline that is later reversed to some extent, in agreement with the pattern observed in simulated lines.

On the contrary, in the BW model purging is measured by the coefficient bFa, which represents the rate of increase in fitness with increasing Fa, averaged over all F values (including individuals with F = 0), and does not account for the reversal of the initial depression. Boakes and Wang (2005) found that this BW model was more efficient at detecting purging in mutational models with mildly deleterious alleles, probably because those models involved high mutation rates implying a larger expressed load in non-inbred individuals, and because those authors detected purging by measuring its consequences on the overall load of deleterious alleles per individual. In contrast, we evaluate the ability of the models to detect the reduction in inbreeding depression, so that Ballou’s model is more appropriate than BW. Regarding the B-BW model, it did not outperform either Ballou’s or the BW model in the study by Boakes and Wang (2005), nor in the present analysis.

Therefore, we consider the performance of Ballou’s Fa-based model to detect and predict the consequences of purging on inbreeding depression, and we compare it to that of the IP model. The estimates of the interaction term in Ballou’s model (bFFa) are very dependent on both the size of the lines and the number of generations of inbreeding considered. Furthermore, for each population size N, different pairs of joint estimates (bF, bFFa) produce different predictions for the evolution of fitness, which compromises the reliability of Ballou’s method. It is interesting to note that, as Fa approaches 1, (bF·F + bFFa·F·Fa) approaches (bF + bFFa)·F. Thus, after the early fitness recovery ascribed to purging, this method predicts a continuous rate of decline of fitness with increasing F. Since such decline is not a general consequence expected from inbreeding and purging, this prediction can be considered a flaw of the model. However, due to this predicted decline, Ballou’s model can spuriously fit the medium-term fitness decline ascribed to the fraction of the inbreeding load caused by deleterious alleles that are not being successfully purged (those with Nd < 1), or to the continuous fixation of new deleterious mutations. Overall, due to the erratic nature of Ballou’s model predictions, ascribed to the inconsistency of the corresponding estimates, the IP model should be preferred to estimate parameters that can be useful to predict the evolution of fitness under inbreeding.
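The limiting behaviour of Ballou’s predictor as Fa approaches 1 can be checked numerically with a rough sketch. It tracks population means only, applies the parental Fa recursion to those means (an approximation that ignores variation among individuals), and uses hypothetical coefficient values that are not estimates from this study.

```python
def ballou_predictor(F, Fa, bF, bFFa):
    """Linear predictor of Ballou's model: bF*F + bFFa*F*Fa."""
    return bF * F + bFFa * F * Fa


N, bF, bFFa = 10, -1.0, 0.6        # hypothetical values, for illustration only
k = 1.0 / (2.0 * N)
F, Fa = 0.0, 0.0
rows = []
for t in range(1, 2 * N + 1):
    Fa = Fa + (1.0 - Fa) * F       # mean-field Fa, built from parental (t-1) values
    F = F + k * (1.0 - F)          # standard inbreeding at generation t
    full = ballou_predictor(F, Fa, bF, bFFa)
    limit = (bF + bFFa) * F        # value the predictor approaches as Fa -> 1
    rows.append((t, F, Fa, full, limit))

for t, F_t, Fa_t, full, limit in rows[::5] + [rows[-1]]:
    print(f"t={t:2d}  F={F_t:.3f}  Fa={Fa_t:.3f}  predictor={full:.4f}  limit={limit:.4f}")
```

Once Fa is close to 1, the two last columns coincide, so the predictor keeps changing linearly with F at rate (bF + bFFa), which is the sustained decline discussed above whenever bF + bFFa is negative.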

Finally, according to our conservative bootstrap results, the probability of detecting purging in each replicate is higher for IP than for Ballou’s analysis and increases for larger lines and longer inbreeding processes. Thus, in the WILD case, at least 20 generations are required to have a good probability (p > 0.8) of detecting purging with effective population size 10 or above, while about 10 generations give a modest detection rate (about 30% for N = 10 and 50% for N = 25). In the CAPTIVE case, detection chances using data of about 20 generations of inbreeding are modest, unless the effective size is about 50 or larger. Thus, purging can go undetected because inbreeding is too fast for enough purging to occur, or because, being slow, it is tracked for too short a period. In practice, detection rates are likely to be smaller due to the noisy nature of fitness measurements (particularly for binary data), to population management partially relaxing fitness, or to concurrent adaptive processes or undetected environmental trends (García-Dorado 2015; Hedrick and García-Dorado 2016; López-Cortegano et al. 2016).

Thus, our results encourage the use of the IP approach to analyze and predict purging, showing that: (i) δ estimates are more reliable when based on short periods of inbreeding, so that only small purging has occurred (or on individuals with no ancestral inbreeding); (ii) purging is better detected from long inbreeding processes and under slow inbreeding; (iii) the estimate of the purging coefficient d is less biased when based on short-term inbreeding, but more reliable estimates can be obtained from longer processes by using a good estimate of δ as a known parameter; (iv) joint (δ, d) estimates, even if they are downwardly biased in some cases, usually produce reliable IP predictions for the evolution of mean fitness under inbreeding, unless inbreeding is too fast. We also find that purging detection and measurement are very demanding, which can explain why many analyses have failed to detect purging in individual data sets (Ballou 1997; Bryant et al. 1999; Byers and Waller 1999; Crnokrak and Barrett 2002; Boakes et al. 2007; Kennedy et al. 2014). Genomic information can help to obtain large samples of data useful to detect and measure inbreeding depression (Kardos et al. 2016; Wang 2016). Unfortunately, inferring purging using genomic-based estimates of inbreeding is not straightforward because the historical information about how present inbreeding has been produced is less explicit in genomic data than in a pedigree. Although it should be possible to infer this historical information from analyses based on the length of the segments that are identical by descent (Keller et al. 2011; Speed and Balding 2015), no method has so far been developed to obtain estimates of a predictive purging parameter from such data. Another possibility is pedigree reconstruction based on massive molecular markers (Fernández and Toro 2006; Wang 2011; Wang et al. 2012; Jiménez-Mena et al. 2016). However, in both instances, detection possibilities may be poor if fitness records are available just for individuals of the present generation. In any case, our understanding of purging can be expected to improve in the future through the accumulation of IP analyses of different sets of available pedigreed data.

Data archiving

An updated version of the PURGd software (PURGd 2.0) published by García-Dorado et al. (2016) is available from https://www.ucm.es/genetica1/mecanismos.