Introduction

Estimates of heritability (the proportion of phenotypic variance attributable to genetic factors) and breeding values are of great interest because they are necessary for planning an efficient breeding program for the trait of interest. Therefore, considerable effort has been devoted to developing new statistical methods that estimate breeding values and heritability in the linear mixed model context (Lynch and Walsh, 1998). However, when the random effects or variance components fitted to the model have multiple ‘solutions’ in their parameter spaces given the observed data, such parameters are said to be unidentifiable. Even though it is often well justified to include multiple random effects in the model (for example, Wall et al., 2005), in practice, identifiability problems due to small data size and the familial structure of the data complicate the estimation of random genetic effects and their variances (Misztal, 1997; Waldmann et al., 2008; Norris et al., 2010). Evolutionary studies of natural populations can be especially prone to this problem because the data sets used for mixed model analyses of natural populations are typically much smaller than those used in plant or animal breeding (Kruuk, 2004; O’Hara et al., 2008). In addition, the pedigree information for natural populations may be inaccurate and/or incomplete.

Bayesian methods can be applied to estimate genetic parameters (for example, Wang et al., 1993; Blasco, 2001; Sorensen and Gianola, 2002; Hadfield, 2010), and such methods can, unlike REML (residual maximum likelihood), be helpful in diagnosing identifiability problems. In the Bayesian approach, one combines what is known about the parameter (represented as a prior distribution) with the information that comes from the data to obtain the posterior distribution. This probability distribution represents the uncertainty about the parameter after the data have been taken into account (for example, Blasco, 2001). In the Bayesian approach, one is not limited to calculating only point estimates of variance components (as is traditionally done in a REML analysis) or only their confidence intervals; one may explore any aspect of the remaining uncertainty, such as the uncertainty in the estimation of heritability. The standard computational approach is to use Markov chain Monte Carlo (MCMC) methods to draw samples from posterior distributions. The Gibbs sampler and the Metropolis–Hastings (M–H) algorithm are the two most commonly used MCMC methods. The Gibbs sampler (Casella and George, 1992) draws from the fully conditional posteriors and is a special case of M–H sampling (Chib and Greenberg, 1995).

Recently, Bauer et al. (2009) and Waldmann et al. (2008) applied Gibbs sampling to quantitative genetics studies in plants, and the latter article developed a fast hybrid Gibbs sampler that accounted for additive and dominance variances in the mixed model. De Boer and Hoeschele (1993) showed that the presence of inbreeding induces nonzero covariances between additive and dominance effects. However, Bauer et al. (2006), Oakey et al. (2006) and Bauer and Léon (2008) predicted breeding values (assuming no dominance) for self-pollinating crops by accounting for inbreeding among the lines. When nonzero covariance exists due to inbreeding, computational procedures for the estimation of the variance components become further complicated. In our current study, we do not consider inbreeding together with dominance.

Since the 1980s, the use of MCMC methods has revolutionized the Bayesian analysis of complex statistical models (Robert and Casella, 2004). Even today, much effort is devoted to improving the efficiency and convergence of MCMC samplers. The efficiency of an MCMC algorithm depends critically on the transition kernel of the Markov chain (Hastings, 1970; Roberts and Rosenthal, 2001), but choosing an efficient kernel, one that produces a rapidly mixing chain, is often difficult. Therefore, adaptive MCMC algorithms have been proposed that use the previous history of the chain to learn the parameters of the proposal distribution. These adaptive algorithms are efficient for exploring the posterior distribution of the model using the data at hand. Recent developments (Haario et al., 2001; Rosenthal, 2011) have increased the interest in applying adaptive MCMC methods in research studies. Although these methods would allow one to keep adapting the proposal distribution during the whole run of the algorithm, we selected the proposal distribution on the basis of a learning phase, whose output we otherwise omitted from the posterior analysis. After the learning phase, we fixed the proposal distribution so that we could justify the use of analysis tools (such as the effective sample size (ESS)) that have been developed for analyzing the output of non-adaptive MCMC algorithms. As we stopped adapting after the learning phase, one could argue that our algorithm is not truly adaptive. However, because we reduce the number of unknowns analytically to three, the benefits of adaptation can be obtained in this simplified manner. Moreover, our method of selecting the proposal distribution is strongly motivated by the work on adaptive MCMC methods.

The hybrid Gibbs sampler is a combination of a single-site Gibbs sampling algorithm (for example, Sorensen and Gianola, 2002) and a blocked Gibbs sampling algorithm (Garcia-Cortes and Sorensen, 1996). The convergence of the single-site Gibbs sampler can be slow due to posterior dependencies. In our new approach, the adaptive MCMC runs in two phases. First, in the learning phase, we ran the MCMC algorithm to obtain an estimate of the posterior covariance structure of the log-transformed variance components. In the learning phase, we used a hybrid Gibbs sampler to sample the random (additive genetic and dominance) effects. Because the dependencies among breeding values and dominance effects slow the convergence of the MCMC chain, the breeding values and dominance effects were marginalized away (integrated out analytically) before computing the posterior for the second phase, which we call the adapted phase. In the adapted phase, we utilized the covariance structure estimated in the learning phase to generate multivariate correlated proposals for the (log-transformed) variance components in a random walk M–H algorithm. The acceptance of these proposals was checked jointly as a single block. Block updates of the variance components after marginalizing the random effects helped the Markov chain to converge to its equilibrium distribution reasonably fast.

In this study, we developed a fast adaptive MCMC algorithm for the estimation of additive and dominance variance in the traditional infinitesimal model without inbreeding. We compared the efficiency of the two estimation algorithms (conventional and adaptive MCMC samplers) on simulated and real data sets. In this assessment, it was important to ensure that differences in the analysis results were not due to reasons other than real differences in the sampling efficiencies of the two algorithms. Weak identifiability of variance components would make a fair comparison difficult. To alleviate this problem, we decided to simulate a moderately high heritability. At the same time, we wanted to keep the number of individuals in our sample relatively small to maintain reasonable computation times in the examples. However, results from these example analyses of data with high heritability and small sample size arguably correspond to the results obtained from more realistic data with smaller heritability and a larger sample size.

Models and methods

Model 1

We consider the mixed linear model (Henderson, 1985a, 1985b):

y = Xβ + Z1a + Z2d + e,  (1)
where y is an n × 1 vector of phenotypic observations, β is a k × 1 vector of fixed (environmental) effects, a is a q × 1 vector of random additive genetic effects, d is a q × 1 vector of random dominance genetic effects and e is an n × 1 vector of error terms, which are independently normally distributed with mean zero and variance σe2. Moreover, X, Z1 and Z2 are known incidence matrices, where X associates β with the phenotypic observations y. For the simulated data sets, Z1 and Z2 associate the genetic effects, a and d, respectively, with the observation vector y. For the field data, Z1 and Z2 associate the random additive genetic effects a and the G × E (genotype-by-environment interaction) effects with y. The numerator relationship matrix A, which describes additive genetic relationships between lines, can be calculated from pedigree information using well-known methods (for example, see p 763 in Lynch and Walsh, 1998) or, alternatively, its inverse A−1 can be calculated directly (Henderson, 1976; Quaas, 1976). Similarly, the dominance matrix D, which describes dominance genetic relationships among lines, can be calculated from pedigree information (for example, see p 768 in Lynch and Walsh, 1998; Waldmann et al., 2008) or, alternatively, its inverse can be calculated directly (Hoeschele and VanRaden, 1991). We want to emphasize that for studies of genotype-by-environment interactions, the methodology presented here works simply by considering the G × E covariance structure in place of D. For details of the G × E covariance structure, see Bauer et al. (2009).
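As an illustration of one such well-known method, the following minimal MATLAB sketch builds A by the tabular method (cf. Lynch and Walsh, 1998). The function name and the conventions that parents precede their offspring and that unknown parents are coded as 0 are our own choices, not part of the authors' implementation.

  % Tabular method for the numerator relationship matrix A.
  % sire(i), dam(i): row indices of the parents of individual i (0 if unknown);
  % individuals must be ordered so that parents precede their offspring.
  function A = relationshipA(sire, dam)
    q = numel(sire);
    A = zeros(q);
    for i = 1:q
      s = sire(i); d = dam(i);
      for j = 1:i-1
        as = 0; if s > 0, as = A(j, s); end
        ad = 0; if d > 0, ad = A(j, d); end
        A(i, j) = 0.5 * (as + ad);   % relationship with older individual j
        A(j, i) = A(i, j);
      end
      if s > 0 && d > 0
        A(i, i) = 1 + 0.5 * A(s, d); % diagonal: 1 + inbreeding coefficient
      else
        A(i, i) = 1;
      end
    end
  end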

In the following sections, we present two different hierarchical models; the former to be used in the learning phase and the latter in the adapted phase of the estimation algorithm. If all the priors are chosen to be the same, then these two hierarchical models are identical, except that most parameters have been integrated out analytically from the latter.

Hierarchical model 1

Let the precision parameters ψa, ψd and ψe be the inverses of the variances σa2, σd2 and σe2, respectively. Then, using model (1), the phenotypic observation for a given trait is modeled as a linear combination of explanatory variables. For given β, a, d and ψe, the vector y follows a multivariate normal distribution

y | β, a, d, ψe ~ MVN(Xβ + Z1a + Z2d, I/ψe),  (2)
where 1/ψe is the residual variance of the model. Let θ=(β, a, d) be the unknown location parameters and ψ=(ψa, ψd, ψe) be the precision parameters. By Bayes theorem, the joint posterior density of the unknown parameters is proportional to

p(θ, ψ | y) ∝ p(y | θ, ψ) p(θ | ψ) p(ψ),  (3)
where p(ψ)=p(ψa)p(ψd)p(ψe) and p(θ|ψ)=p(β)p(a|ψa)p(d|ψd) are the prior distributions and p(y|θ, ψ) is the likelihood from Equation (2). For the Bayesian analysis, one must assign a prior distribution to the unknown model parameters. The fixed effects β were assigned an improper uniform prior distribution,

p(β) ∝ constant.  (4)
Conditionally on the precision parameters, the genetic effects were assigned multivariate normal prior distributions with a zero mean vector 0 (of size q),

a | ψa ~ MVN(0, A/ψa)  and  d | ψd ~ MVN(0, D/ψd).  (5)
Before assigning a prior distribution for the precision parameters, we standardized the phenotypic observation vector y so that the same prior could be used for different data sets (which may originally have had very different phenotypic scales). After the standardization, the precision parameters ψa, ψd and ψe were assumed to follow a gamma prior distribution with parameters ki and λi and mean ki/λi,

p(ψi | ki, λi) ∝ ψi^(ki−1) exp(−λiψi),  i = a, d, e.  (6)
We chose ki=1 and λi=0.001 (that is, the exponential distribution with mean 1/λi) to obtain flat priors. This choice allows the variance components to be shrunk to very nearly zero, if this is warranted by the data. This follows because the prior Equation (6) implies an inverse gamma prior with parameters (ki, λi) for the variance component σi2. The inverse gamma density increases from a value of zero to its maximum at the mode λi/(ki+1) and then decays slowly. Shrinkage-type priors have been used before, for example, in variable selection (O’Hara and Sillanpää, 2009) and in haplotype estimation (Gasbarra et al., 2011), as well as in the penalized likelihood estimation of genetic covariance matrices (Meyer and Kirkpatrick, 2010).

Hierarchical model 2

In the adapted phase of the algorithm, we use a model in which all the unknown location parameters θ are integrated out from model (1). The joint posterior density of the parameters ψ is

p(ψ | y) ∝ p(y | ψ) p(ψa) p(ψd) p(ψe).
To mimic the improper uniform prior Equation (4), the fixed effects β were assigned a normal prior distribution with a zero mean vector 0 and a large covariance matrix B σβ2, where σβ2=10^6,

β ~ MVN(0, B σβ2).
Here, B is the unscaled prior covariance matrix between the fixed effects. The genetic effects a and d were assigned the multivariate normal priors Equation (5), and the variance components, the gamma priors Equation (6). After these choices, it is a simple matter to integrate out the location parameters from the model (cf. pp 313–314 in Sorensen and Gianola, 2002), namely

p(y | ψ) = (2π)^(−n/2) |Σ|^(−1/2) exp(−yᵀΣ−1y/2),  (8)
where Σ = XBXᵀ σβ2 + Z1AZ1ᵀ/ψa + Z2DZ2ᵀ/ψd + I/ψe.
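For concreteness, the marginal log-likelihood of Equation (8) can be evaluated through a Cholesky factor of Σ, as in the following minimal MATLAB sketch; the function and variable names are illustrative and are not taken from the implementation in the Supplementary Materials.

  % log p(y | psi) of Equation (8); psi = [psi_a, psi_d, psi_e].
  function ll = logMargLik(y, X, Z1, Z2, A, D, B, sb2, psi)
    n = numel(y);
    Sig = X*B*X' * sb2 + Z1*A*Z1' / psi(1) + Z2*D*Z2' / psi(2) + eye(n) / psi(3);
    L = chol(Sig, 'lower');               % Sig = L*L'
    v = L \ y;                            % so y'*inv(Sig)*y = v'*v
    ll = -0.5*n*log(2*pi) - sum(log(diag(L))) - 0.5*(v'*v);
  end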

Estimation in the learning phase

To implement the Gibbs sampler, one needs the fully conditional posterior distributions of all the unknown parameters (θ and ψ) of hierarchical model 1. These can be found, for example, in Waldmann et al. (2008). To update θ, samples can be drawn either element-wise or block-wise from the fully conditional posterior distribution

θ | ψ, y ~ MVN(θ̂, C−1/ψe),
where θ̂ is the solution to the linear system Cθ̂ = Wy. Here,

C = [ XᵀX    XᵀZ1              XᵀZ2
      Z1ᵀX   Z1ᵀZ1 + αaA−1     Z1ᵀZ2
      Z2ᵀX   Z2ᵀZ1             Z2ᵀZ2 + αdD−1 ],   W = [X Z1 Z2]ᵀ,
with αa=ψa/ψe, αd=ψd/ψe. The precision parameters are sampled from their fully conditional posterior distributions,

ψa | · ~ Gamma(k*a, λ*a),  ψd | · ~ Gamma(k*d, λ*d),  ψe | · ~ Gamma(k*e, λ*e),
where k*a=ka+q/2, λ*a=λa+(aᵀA−1a)/2, k*d=kd+q/2, λ*d=λd+(dᵀD−1d)/2, k*e=ke+n/2, and λ*e=λe+||y − Xβ − Z1a − Z2d||²/2. During the learning phase of the algorithm, we use a hybrid Gibbs sampler with a block update every 50th iteration to sample the random additive and dominance effects. See the appendix for details of the sampling algorithm.
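In MATLAB, the gamma updates above can be sketched as follows. We assume the Statistics Toolbox function gamrnd, which takes a shape and a scale parameter, so the rate λ* enters as 1/λ*; the priors ki=1, λi=0.001 follow the text, while the surrounding state (y, X, Z1, Z2, beta, a, d, q, n) is assumed to exist in the workspace under these illustrative names.

  ka = 1; la = 1e-3; kd = 1; ld = 1e-3; ke = 1; le = 1e-3;  % prior parameters
  res   = y - X*beta - Z1*a - Z2*d;                         % current residuals
  psi_a = gamrnd(ka + q/2, 1/(la + 0.5 * (a' * (A \ a))));  % additive precision
  psi_d = gamrnd(kd + q/2, 1/(ld + 0.5 * (d' * (D \ d))));  % dominance precision
  psi_e = gamrnd(ke + n/2, 1/(le + 0.5 * (res' * res)));    % error precision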

Estimation in the adapted phase

We use the history of the chain during the learning phase to form the proposal distribution for the parameters of hierarchical model 2. In the second, adapted phase of the algorithm, we use an M–H algorithm to update the log-variance components block-wise using putative samples generated from the learned proposal distribution.

Our M–H algorithm uses random-walk proposals; the proposed parameter vector is generated by adding to the current parameter vector an increment from a multivariate normal distribution with a zero mean and covariance matrix Sp. We base our selection of the proposal covariance matrix on the theoretical results of Roberts et al. (1997) and Roberts and Rosenthal (2001). These authors showed that if the posterior distribution is approximately multivariate normal with covariance matrix S, then the optimal choice for the proposal covariance matrix Sp is approximately (2.38)2/d S, where d is the number of unknown parameters in the posterior distribution. To improve our ability to use this result, our algorithm works on the logarithmic scale, that is, we use the vector τ=(τa, τd, τe) as the new parameter vector, where the τ’s are the logarithms of the variance components, τi=log(σi2)=−log(ψi), i=a, d, e. This reparameterization eliminates the positivity constraints that are present for the variance components or their inverses. At the same time, it makes the posterior distribution resemble a multivariate normal distribution more closely. As the posterior covariance matrix S of the vector τ is unknown, we estimate it with the sample covariance matrix Ŝ, which is calculated from the log-transformed variance components that are simulated during the learning phase.
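The construction of the proposal covariance matrix from the learning-phase output takes only a few lines of MATLAB; here tauSamples, an m × 3 matrix holding one row of log-variance draws per retained learning-phase iteration, is an assumed name:

  d    = 3;                           % number of unknown parameters
  Shat = cov(tauSamples);             % estimate of the posterior covariance S
  Sp   = (2.38^2 / d) * Shat;         % scaling of Roberts et al. (1997)
  Lp   = chol(Sp, 'lower');           % Cholesky factor for proposal generation
  tauStar = tau + Lp * randn(d, 1);   % one correlated random-walk proposal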

After the proposed parameter vector τ* has been generated by adding a noise vector to the current parameter vector τ, the proposed τ* is either accepted or rejected as the new state of the MC based on the value of the M–H acceptance ratio r. Because the random-walk proposal is symmetric, r is now given by

r = [p(y | τ*) p(τ*)] / [p(y | τ) p(τ)].  (11)
Here, the likelihood ratio can be evaluated based on Equation (8), after the log-transformed variance components τ=(τa, τd, τe) and τ*=(τ*a, τ*d, τ*e) have been transformed to precision parameters, using the formulas

ψi = exp(−τi)  and  ψ*i = exp(−τ*i),  i = a, d, e.  (12)
For τ, the likelihood is

p(y | τ) = (2π)^(−n/2) |Σ|^(−1/2) exp(−yᵀΣ−1y/2),
where Σ is the covariance matrix of y, conditionally on the current values of the parameters,

Σ = XBXᵀ σβ2 + Z1AZ1ᵀ exp(τa) + Z2DZ2ᵀ exp(τd) + I exp(τe).
For τ*, the likelihood p(y|τ*) is obtained from a similar formula in which Σ is replaced by the covariance matrix of y conditional on the proposed values of the parameters,

Σ* = XBXᵀ σβ2 + Z1AZ1ᵀ exp(τ*a) + Z2DZ2ᵀ exp(τ*d) + I exp(τ*e).
See the appendix for the details of how the likelihood ratio is calculated. To evaluate the prior ratio p(τ*)/p(τ) in Equation (11), we must take into account that we have formulated the prior for the vector of precision parameters ψ. Using the change-of-variables formula for probability densities, the prior ratio can be calculated as

p(τ*)/p(τ) = [p(ψ*) |J*|] / [p(ψ) |J|].  (13)
Here, p(ψ)=p(ψa|ka, λa)p(ψd|kd, λd)p(ψe|ke, λe) is the product of the three gamma densities, Equation (6); and similarly, p(ψ*) is the product of the same gamma densities evaluated at the proposed precision parameters. Furthermore, J=−exp(−τa−τd−τe) is the Jacobian (determinant) arising from expressing ψ in terms of τ, and J*=−exp(−τ*a−τ*d−τ*e) is the Jacobian from expressing ψ* in terms of τ*. In the actual M–H algorithm, we first calculate the logarithm of the M–H ratio r, for which we need the logarithm of the ratio of the absolute Jacobians,

log(|J*|/|J|) = (τa + τd + τe) − (τ*a + τ*d + τ*e).  (14)
The sampling algorithm during the adapted phase is as follows. First, we estimate the posterior covariance matrix S of the log-transformed variance components from the output of the learning phase and calculate the proposal covariance matrix as Sp=(2.38)2/d Ŝ, where d=3. Then, we iterate the following steps:

  1. Let τ be the current values on the logarithmic scale. Generate new values τ* = τ + w, where w is simulated from MVN(0, Sp). Transform τ and τ* to the precision parameter vectors ψ and ψ*.

  2. Calculate the logarithm of the M–H acceptance ratio, log(r), using Equations (11), (12), (13) and (14).

  3. Accept the proposed value τ* if a random number drawn from the uniform distribution over [0,1] is less than r. If the proposal is accepted, the proposed parameter vector is taken as the current vector, τ = τ*; otherwise, the current value is retained.

As the breeding values and dominance effects have been integrated out from the likelihood, this sampling algorithm reduces the problems of the Gibbs sampler that arise due to posterior dependencies between the random effects and the variance components. A compact sketch of the three steps is given below.
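The following MATLAB sketch puts steps 1–3 together on the log scale (to avoid numerical overflow). It assumes the helper logMargLik and the proposal factor Lp sketched earlier, gamma prior parameter vectors k and lam (both 3 × 1), and an initial state tau taken from the end of the learning phase; all names are illustrative.

  logGamPrior = @(psi) sum((k - 1) .* log(psi) - lam .* psi);  % log gamma prior, up to a constant
  tauOut = zeros(nIter, 3);                    % storage for the adapted-phase chain
  for it = 1:nIter
    tauStar = tau + Lp * randn(3, 1);          % step 1: correlated random-walk proposal
    psi     = exp(-tau);  psiStar = exp(-tauStar);
    logr = logMargLik(y, X, Z1, Z2, A, D, B, sb2, psiStar) ...
         - logMargLik(y, X, Z1, Z2, A, D, B, sb2, psi) ...
         + logGamPrior(psiStar) - logGamPrior(psi) ...
         + sum(tau) - sum(tauStar);            % step 2: includes log(|J*|/|J|), Equation (14)
    if log(rand) < logr                        % step 3: accept or reject
      tau = tauStar;
    end
    tauOut(it, :) = tau';
  end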

The whole adaptive algorithm, consisting of the learning phase and adapted phase, is described more fully in the appendix. It has been implemented in the Matlab (2007) environment, where most of our analyses have also been performed.

Effective sample size

ESS (Geyer, 1992; Waagepetersen et al., 2008) is a popular diagnostic tool for MCMC methods. ESS determines the approximate number of independent samples that would provide the same estimation accuracy as the dependent MCMC samples. The ESS values were calculated with the R package coda (Plummer et al., 2006).
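A rough ESS estimate can also be computed directly from the empirical autocorrelations, as in the following MATLAB sketch; the simple truncation at the first negative autocorrelation is a crude stand-in for the more careful spectral estimator used by coda, and the function name is ours.

  % ESS of one parameter's trace x, using N / (1 + 2*sum(rho)).
  function ess = essEstimate(x)
    N = numel(x); x = x(:) - mean(x);
    maxLag = min(N - 2, 1000);
    rho = zeros(maxLag, 1);
    for lag = 1:maxLag
      rho(lag) = (x(1:N-lag)' * x(1+lag:N)) / (x' * x);  % lag-k autocorrelation
      if rho(lag) < 0, rho = rho(1:lag-1); break; end    % truncate the sum
    end
    ess = N / (1 + 2 * sum(rho));
  end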

Example analyses

Simulated data

We developed a C program that simulates ‘virtual’ populations for the variance component estimation. Owing to the identifiability problems faced during the analysis, we decided to consider two different data sets, one of which resulted in a unimodal posterior distribution of the dominance variance and the other in a bimodal posterior. To generate the bimodal data set, we considered a base population of 50 unrelated lines, in which each of the 25 females was mated with each of the 25 males, and each cross resulted in five offspring (in total, 3175 individuals, including the base population). For the unimodal data set, we considered a base population of 40 lines, 20 females and 20 males, and each cross resulted in nine offspring (in total, 3640 individuals, including the base population).

The additive genetic relationship matrix A and the dominance relationship matrix D were calculated from the pedigree information as described in the model section. To simulate a quantitative trait y, we generated three components, the additive effects a, the dominance effects d and the noise e, and the vector of phenotypic observations was calculated as their sum,

y = a + d + e.
Here, the vectors a, d and e were drawn from MVN(0, A σa2), MVN(0, D σd2) and MVN(0, I σe2), respectively. We used the Cholesky decompositions of the covariance matrices A σa2 and D σd2 to draw samples from these distributions. Hence, the random genetic effects a and d were calculated as a=Pza and d=Tzd, where zi ~ MVN(0, I) and P and T are the Cholesky factors satisfying PPᵀ=A σa2 and TTᵀ=D σd2. To validate our estimation methods, we generated the two data sets using a heritability of 0.31 (σa2=800, σd2=600, σe2=3025). Using the same set of parameters, we generated 10 simulation replicates of the unimodal data set (the existing unimodal data set and 9 new replicates) by sampling new residuals e from MVN(0, I σe2) each time, keeping the original pedigree. However, one realization was removed and simulated again because it resulted in a bimodal posterior.
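In MATLAB, one replicate can be simulated along these lines (a minimal sketch assuming every individual is phenotyped, so n = q; the variable names are illustrative):

  sa2 = 800; sd2 = 600; se2 = 3025;   % simulated variance components
  P = chol(A * sa2, 'lower');         % P*P' = A*sa2
  T = chol(D * sd2, 'lower');         % T*T' = D*sd2
  a = P * randn(q, 1);                % additive effects
  d = T * randn(q, 1);                % dominance effects
  e = sqrt(se2) * randn(q, 1);        % residuals
  y = a + d + e;                      % phenotypes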

QTLMAS XII workshop data

This is the simulated data set obtained from the QTLMAS XII workshop web page, http://www.computationalgenetics.se/QTLMAS08/QTLMAS/DATA.html.

The data set was generated following an animal breeding protocol, consisting of 5865 individuals from seven generations. For the first four generations (a total of 4665 individuals), both pedigree and phenotype information are available, and we considered this subset of the data for our analysis. The additive genetic relationship matrix A and dominance relationship matrix D were calculated from the pedigree information.

Field data

Real data from 82 spring barley (Hordeum vulgare L.) lines originating from the German North Rhine-Westphalia core collection (Bauer et al., 2006, 2009; Bauer and Léon, 2008) were analyzed. These lines were cultivated in a randomized complete-block design with three replications over 3 different years (2001, 2002 and 2003) at the Research Station ‘Dikopshof’ of the University of Bonn, Germany. For the real data, a few replications were missing, and the missing values were imputed by the average value of the non-missing replications for the corresponding year. There are a number of alternative ways of dealing with missing data; however, as the number of missing values was very low, we expected that the method used would not make a significant difference. Pedigree information was available for all the lines, and the phenotypic observations of the trait ‘thousand kernel mass’ were measured for all the lines. For the field data, we considered the genotype-by-environment interaction instead of the dominance relationship in the linear mixed model (1) and accounted for the inbreeding among lines. Following Bauer et al. (2009), two different covariance structures were applied to model the genotype-by-environment interaction. In the first approach, called Bayes_ID, the genotype-by-environment interactions were assumed to be independently and identically normally distributed. In the second approach (Bayes_Aext), an extended relationship matrix Aext=A ⊗ I (here ‘⊗’ is the Kronecker product of two matrices) was used to model the genotype-by-environment interaction. Moreover, the fixed year effect was included in the X matrix, along with the overall mean, for the analysis. Note that in model (1), the simplifying assumption of independent errors with constant variance was again made (cf. Burgueno et al., 2012; Piepho et al., 2012).
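In MATLAB, the extended relationship matrix is a one-liner; the ordering assumed here (the environment index varying fastest within each line) is our choice for illustration:

  nYears = 3;
  Aext = kron(A, eye(nYears));   % A (Kronecker product) I, for the Bayes_Aext covariance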

Analyses and results, simulated data

To validate our new algorithm, we analyzed the two simulated data sets: the unimodal data set with 3640 individuals and the bimodal data set with 3175 individuals. The estimates of the variance components based on all the individuals of the two simulated data sets and the 10 simulation replicates were calculated using our adaptive MCMC method and the REML method (Tables 1 and 2). The REML estimates of the variance components were calculated using the ASReml software (Gilmour et al., 2006). The true values, given in Table 1, are the values used in the simulations. The implemented MCMC had a total chain length of 50 000, consisting of a burn-in period of 2000 iterations, a learning phase from iteration 2000 to 5000, and finishing with the adapted phase from iteration 5000 to 50 000. The acceptance ratios for the bimodal and unimodal data sets were 28% and 26%, respectively. The point estimates, the mean and the median of the posterior distribution of the variance components, were calculated from the MCMC samples. To calculate the mode of the posterior distribution, a kernel smoothing approach following Hoti et al. (2002) was used.
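The kernel-smoothed mode can be obtained, for instance, with the Statistics Toolbox function ksdensity, as in this sketch (sigmaA2samples is an assumed name for the posterior draws of one variance component):

  [f, xi] = ksdensity(sigmaA2samples);   % smoothed marginal posterior density
  [~, im] = max(f);
  modeEst = xi(im);                      % kernel-smoothed posterior mode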

Table 1 The estimates of variance components and broad-sense heritabilities for the learning and adapted phases from the MCMC analyses of the two simulated data sets
Table 2 Estimated variance components of two Markov chain Monte Carlo algorithms and REML based on 10 simulation replicates

In Table 2, we demonstrate that the Bayesian point estimates of the variance components, averaged over the 10 replicates, were always closer to the true simulated values than the averaged REML estimates. The same was often true for the extreme values (minimum and maximum) of the estimates over the 10 replicates. This suggests that the Bayesian adaptive MCMC method can give variance component estimates that are competitive with the REML estimates.

A properly implemented MCMC sampler should be able to cover all the areas supported by the target distribution, but the existence of multiple modes makes this difficult (Geyer and Thompson, 1995). A conventional MCMC algorithm may fail to jump between the different modes and therefore may visit only a single mode. Although running the chain for a very long time may remedy this problem, it is computationally highly demanding. In our approach, the posterior covariance structure estimated from the learning phase helps the sampler to move freely between the different modes of the target distribution. Our MCMC algorithm was able to detect two different modes in the posterior of the dominance and residual variances in the bimodal data set, whereas REML always returns a single mode (and the identified mode may depend on its starting values). Table 3 summarizes the rough estimates of the two different modes (estimated using the kernel smoothing approach; see Hoti et al., 2002). The posterior mode 1 values of the dominance and residual variance are close to the true simulated values and are somewhat better than the corresponding REML estimates. The posterior mode 2 values of the dominance and residual variance are poor, and their existence indicates that there is an identifiability problem. From Figures 1 and 2, it is clear that the adaptive MCMC algorithm is able to move between the different modes of the posterior of the bimodal data set. It can be seen from Figures 1 and 3 that our new adaptive MCMC algorithm was able to detect the different modes of the distribution within a relatively low number of iterations, whereas the conventional MCMC method had problems visiting the different modes. To visualize the different modes in the posterior, a histogram with hexagonal bins was drawn for the log-transformed dominance and error variance components (Figure 3) with the aid of the hexbin package of R.

Table 3 The two different modes of the variance components for the simulated bimodal data set
Figure 1

The logarithm of the variance components for the bimodal data set plotted against the MCMC iteration number. The trace plots show 45 000 iterations from the adapted phase.

Figure 2

The logarithm of the variance components for the unimodal data set plotted against the MCMC iteration number. The trace plots show 45 000 iterations from the adapted phase.

Figure 3

Histogram of the log-transformed dominance and error variance components using hexagonal bins.

Analyses and results, QTLMAS XII workshop data

We considered a subset of 4665 individuals (the first four generations) from the QTLMAS XII workshop data for the analysis. The pedigree information for the first four generations was available, and hence the A and D matrices were calculated from the pedigree. The heritability of the QTLMAS XII workshop data was around 0.30, with zero dominance. For further details of the data, see Lund et al. (2009). Our main motivation in analyzing the QTLMAS data set was to test how our method behaves in the absence of a dominance effect. The variance components were estimated using the adaptive MCMC and REML methods (Table 4). The implemented MCMC had a total chain length of 50 000, with a burn-in period of 2000 iterations, a learning phase from iteration 2000 to 5000, and the adapted phase from iteration 5000 to 50 000. The acceptance ratio for the data set was 35%. The point estimates were calculated as before. In our analysis, we obtained a heritability of approximately 0.30. Hallander et al. (2010) used a different prior and obtained a heritability point estimate of 0.34 from a smaller subset of the data, using a Bayesian model containing additive polygenic effects only. They used uniform distributions as non-informative priors for the standard deviations.

Table 4 The estimates of the variance components and broad-sense heritabilities for the learning and adapted phases from the Markov chain Monte Carlo analysis of the QTLMAS XII data set

Analyses and results, field data

The trait ’thousand kernel mass’ for the 82 spring barley lines from 3 different years with three replications was analyzed with our adaptive MCMC method and the REML method using the ASReml software (assuming the same covariance structure for the genotype-by-environment interaction as in Bayes_ID). The implemented MCMC method had a total chain length of 50 000, consisting of a burn-in period of 2000 iterations, a learning phase from iteration 2000 to 5000, and finishing with the adapted phase from iteration 5000 to 50 000. For the analysis, each year was considered as a different location. Therefore, to account for the number of locations, the heritability was calculated using the formula (Hanson, 1963)

h2 = σa2 / (σa2 + σg×e2/j + σe2/(jk)),
where σa2 is the additive genetic variance, σg×e2 is the variance due to the genotype-by-environment interactions, σe2 is the error variance, j is the number of years and k=3 is the number of replications. We also calculated the point estimates and 95% highest posterior density intervals of the posterior distribution from the adapted phase of the algorithm using the Bayes_ID and Bayes_Aext methods (Table 5). Bauer et al. (2009) considered data from two different years (2002 and 2003) for their analysis, whereas in our current study, we considered data from 3 different years (2001, 2002 and 2003). Hence, our analysis provided higher heritabilities than Bauer et al. (2009). Both studies showed that the Bayes_Aext estimates were closer to the REML estimates. Moreover, the results from both studies indicated that it is important to consider the relationship information between the lines in a Bayesian model when estimating the genotype-by-environment interactions.
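Because the adapted phase yields joint draws of the three variance components, the heritability and its 95% HPD interval can be computed draw-by-draw; a minimal MATLAB sketch, with assumed names for the sample vectors:

  j = 3; k = 3;                        % years and replications
  h2 = sa2draws ./ (sa2draws + sge2draws ./ j + se2draws ./ (j * k));
  % h2 is now a vector of posterior draws of the heritability, from which
  % the point estimates and the 95% HPD interval can be computed.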

Table 5 The estimates of variance components, heritabilities and the 95% HPD intervals for the field data from the adapted phases of the algorithm using Bayes_ID and Bayes_Aext covariances

Effective sample size

ESS values were calculated for different data sets from both the learning phase and the adapted phase. ESS is a measure of the mixing properties of the MCMC chain. High values of ESS imply that the autocorrelation is low and are an indication that the mixing of the MCMC chain is good.

Adequate mixing of the MCMC sampler over the different parts of the parameter space is essential for the convergence of MCMC algorithms, but conventional MCMC algorithms may suffer from slow mixing. From the trace plots (Figure 4) for the learning phase and the adapted phase, it seems clear that the adapted MCMC was mixing well compared with the general hybrid Gibbs sampler (used in the learning phase). Thus, the adaptation significantly improved the mixing properties of the algorithm by learning an appropriate covariance structure for the proposal distribution. This visual impression is confirmed by Table 6, which summarizes the ESS for the unimodal, bimodal and QTLMAS data sets. In addition, the ESS values and the ESS ratios were calculated for the 10 simulation replicates (see Table 7). The ESS ratios were calculated by dividing the ESS from the adapted phase by the ESS from the learning phase. The ESS ratio indicates how many more learning-phase iterations one would need to run to obtain the same estimation accuracy as with the adapted phase. Based on the ESS values, one needs to run the standard MCMC chain (the learning phase) at least six times longer to obtain estimates comparable to those from the adapted-phase MCMC sampler. To calculate the ESS, an MCMC chain with a length of 3000 from the learning phase and a chain of the same length from the beginning of the adapted phase were considered, after a burn-in period of 2000 iterations. For the field data, the ESS was calculated using the Bayes_ID covariances. The ESS values in Tables 6 and 7 clearly support the better mixing properties of the variance components in the adapted phase for all the data sets. Our prior allows the chain to mix well, and at the same time, it allows a realistic estimate of the dominance variance in the case of no dominance, because in such a case the prior shrinks the posterior towards zero.

Figure 4

Trace plot of the log-transformed additive variance component for the unimodal simulated data set. The first 3000 samples are taken from the learning phase and the remaining samples are from the adapted phase. A full color version of this figure is available at the Heredity journal online.

Table 6 ESS for 3000 iterations of the two Markov chain Monte Carlo algorithms with the unimodal, bimodal, QTLMAS and field data sets
Table 7 Efficiencies of two MCMC algorithms based on 10 simulation replicates

When the target distribution is multimodal, the conventional MCMC algorithm may have difficulties moving between modes. Additionally, the REML method fails to identify different modes of the distribution. Our new adaptive MCMC algorithm was able to visit the different modes even after a low number of iterations and exhibited good mixing properties.

Discussion

One of the main problems associated with Bayesian analysis of mixed models with several random effects is that the analysis is computationally demanding. Waldmann et al. (2008) have shown that the hybrid Gibbs sampler is much faster than the normal blocked Gibbs sampler for estimating additive and dominance genetic variances in the traditional infinitesimal model. In our current study, we compared the performance of the hybrid Gibbs sampler with an adaptive MCMC method using simulated pedigree data sets with non-zero additive and dominance genetic variances but no inbreeding, showing that the new adaptive MCMC algorithm was almost two times faster than the hybrid Gibbs sampler. To compare the running times, we compared an adaptive MCMC chain of total length 50 000 (a burn-in period of 2000 iterations, 3000 iterations in the learning phase and 45 000 iterations in the adapted phase) with a hybrid Gibbs sampling chain of the same total length (a burn-in period of 2000 iterations and 48 000 iterations from the normal hybrid Gibbs sampling). What is more, the adaptive algorithm has superior mixing properties, as is shown by the ESS in Tables 6 and 7. The increase in speed is partly due to the fact that, unlike the algorithm of Waldmann et al. (2008), our adaptive MCMC method does not sample additive and dominance genetic values for individuals. In the adaptive MCMC algorithm (see appendix for details), the determinants and quadratic forms associated with the covariance matrices at the proposed and current points are needed to calculate the likelihood ratio. Once the proposed value is accepted, the determinant and quadratic form at the current point can be replaced by the determinant and quadratic form corresponding to the accepted variance components. This makes the calculation of the likelihood ratio computationally lighter than the block update of the Gibbs sampler. A MATLAB implementation of the adaptive method (described in the appendix) is available in the Supplementary Materials.

During the adapted phase of the algorithm, our sampler generates values only from the marginal posterior of the variance components. Even though our method is primarily intended for the estimation of the genetic variances, it is possible to generate MCMC samples of the additive and dominance genetic values afterwards, by sampling them block-wise from their fully conditional posterior distribution, conditionally on each of the values of the variance components in the MCMC sample generated by the adaptive MCMC sampler. In contrast, the normal hybrid Gibbs sampler samples the genetic values within the chain, conditionally on each of the values of the variance components as they are generated. We tested this procedure by calculating the genetic values for the QTLMAS workshop data with the blocked Gibbs sampler conditionally on every 10th realization (of the three variance components) out of the 45 000 samples from the adapted phase. The linear correlation between the true genetic values (that is, the sum of the additive and dominance values) and the estimated genetic values was around 0.71 for the QTLMAS workshop data. In addition, the genetic values given by ASReml showed a correlation of approximately 0.71 with the true genetic values for the same data set. Our adaptive MCMC genetic values showed a strong correlation of approximately 0.99 with the genetic values from ASReml, demonstrating that our posterior mean estimates were close to the classical point estimates.

De Boer and Hoeschele (1993) showed that the presence of inbreeding changes the mean and complicates the genetic covariance structure of a population. Although inbreeding together with dominance was not considered here, the mixed model can, in principle, account for inbreeding by including a complex covariance matrix among the additive and dominance effects. To accomplish this analysis, our method would require adjustments. However, another type of model that would suit our estimation framework well is a Gaussian process model (Crossa et al., 2010) or an extension of a ridge regression model (Piepho, 2009; Schulz-Streeck and Piepho, 2010). Then, the dominance relationship matrix would be replaced by a marker similarity matrix or by a covariance function proportional to a reproducing kernel evaluated at the marker genotypes.

Identifiability problems can arise especially when the dominance relationship matrix D is close to a multiple of the identity matrix. This occurs when the pedigree is incomplete and/or lacks full sibs or double cousins. Then, certain features of the phenotypic observations can be attributed almost as well to dominance effects as to noise. In such a case, the joint marginal posterior of the dominance variance and the error variance is expected to be bimodal, and a conventional MCMC sampler may have difficulties moving between the modes. Gibbs samplers are especially vulnerable, but M–H sampling schemes may behave better. Adding more full sibs to the pedigree file can alleviate the multimodality problem to some extent. In our simulation experiments, our adaptive MCMC algorithm was able to explore the entire parameter space with good mixing properties, and was therefore able to detect the different modes of the posterior distribution.

The proposal covariance matrix (2.38)2/d S from Roberts et al. (1997) and Roberts and Rosenthal (2001) is optimal in a large-dimensional context when the posterior is approximately Gaussian (Roberts and Rosenthal, 2007; Rosenthal, 2011). We also experimented with other scalings of the posterior covariance matrix, but the theoretical formula turned out to work well enough in our applications. This scaling factor (2.38)2/d was also employed in the MCMC sampler of Fang et al. (2011), who introduced a new method for QTL mapping. In their sampling scheme, they utilized REML estimates in the construction of the proposal covariance matrix. If the target distribution is multimodal, this approach may fail to move between the different modes. In contrast, our new adaptive MCMC method uses the history of the chain to learn the proposal covariance matrix, which enables the algorithm to move between the different modes. The success of adaptive MCMC methods generally depends on how well the proposal covariance structure is learned from the previous history of the chain. Therefore, it is important to use a sufficient number of samples in the learning period. The required sample size depends first on the dimensionality and other characteristics of the posterior distribution, and second on the mixing properties of the MCMC sampler. Therefore, it is impossible to give general prescriptions for it.

In our study, we also tested adaptation in a version of the model where the random effects were not marginalized away. However, this formulation suffered from poor mixing and slow convergence because of posterior dependencies among the random effects and the variance components (results not shown). The marginalized model (that is, hierarchical model 2) that we used in our study was able to explore the entire parameter space with good mixing properties. The ESS of a parameter is the number of independent samples from the posterior distribution that our correlated MCMC sample is worth. If the ESSs are low, then the autocorrelations will be high, and that may be an indication of poor mixing of the chain. The adaptive scheme was able to decrease the autocorrelation of the chain to yield much larger ESS.

The choice of the prior is one of the important steps in any Bayesian analysis. Generally, the influence of the prior distribution on the posterior is related to the sample size of the data. We carried out a sensitivity analysis using different priors, and most priors seemed to lead to non-zero estimates of dominance variance for the QTLMAS data (results not shown). However, the gamma prior for the precision parameters (ki=1 and λi=0.001) was able to provide good mixing, while still resulting in a realistic estimate of dominance variance in the case of no dominance. This follows because the prior can then shrink the posterior towards zero.

Data archiving

Data have been deposited at Dryad: doi:10.5061/dryad.0p88f.