A new genomic prediction method with additive-dominance effects in the least-squares framework

Liu, Hailan; Chen, Guo-Bo

doi:10.1038/s41437-018-0099-5

Article
Published: 20 June 2018

Heredity volume 121, pages 196–204 (2018)

Abstract

In our previous work, we proposed a genomic prediction method combining identical-by-state-based Haseman-Elston regression and best linear prediction with the additive variance component only (HEBLP|A herein), the most essential component of genetic variation. Since dominance effects contribute significantly to heterosis, it is desirable to extend HEBLP with a dominance variance component, which is expected to enhance predictive accuracy. Here we present this further development, HEBLP|AD, a parallel implementation of genomic prediction comparable to genomic best linear unbiased prediction (GBLUP). The simulation results indicated that when dominance effects contributed a large proportion of the genetic variation, HEBLP|AD and GBLUP|AD had similar accuracy and both outperformed HEBLP|A; but when dominance variation was absent or small, HEBLP|A, HEBLP|AD, and GBLUP|AD had similar predictability. The analysis of real data from an Arabidopsis thaliana F2 population also demonstrated the latter situation. In summary, HEBLP|AD performed stably whether or not a trait was controlled by dominance effects.

Introduction

With the rapid development of high-throughput molecular marker techniques, such as single nucleotide polymorphisms (SNPs) and statistical approaches, genomic prediction first proposed by Meuwissen et al. (2001) has been successfully applied to genetic improvement of complex traits that are controlled by polygenic effects—numerous small-effect quantitative trait loci (QTL) (Schaeffer, 2006; Hayes et al. 2009; Jannink et al. 2010; Zhang et al. 2011; Riedelsheimer et al. 2012). Compared to the conventional marker-assisted selection (MAS), genomic prediction is far more accurate by utilizing all molecular marker information to estimate the breeding values of each individual in a candidate population (Heffner et al. 2009; Arruda et al. 2016).

In the early stage of genomic prediction, many models accounted only for additive effects (Meuwissen et al. 2001; Bernardo and Yu, 2007; Calus et al. 2008; VanRaden, 2008). However, dominance effects contribute to heterosis (Hua et al. 2003; Li et al. 2008), and therefore should be included in models oriented toward hybrid breeding. Recent studies also show that genomic prediction models including dominance effects can improve the prediction accuracy (Denis and Bouvet, 2011; Su et al. 2012; Technow et al. 2012; Denis and Bouvet, 2013; Nishio and Satoh, 2014; de Almeida Filho et al. 2016; Wang et al. 2017; Liu et al. 2017; Resende et al. 2017).

In our previous study, we developed a fast genomic prediction approach (namely HEBLP, or HEBLP|A herein) combining identical-by-state (IBS)-based Haseman-Elston (HE) regression and best linear prediction (BLP). It can obtain the total additive genetic variance via a simple HE linear regression with reduced computation complexity, but only additive effects are included (Liu and Chen, 2017). The present study aims to develop the HEBLP with both the additive and dominance effects (HEBLP|AD) and to evaluate its predictive performance in the simulated and a real Arabidopsis thaliana F2 population.

Materials and methods

The Arabidopsis thaliana F2 population

We used the phenotype and genotype data of an Arabidopsis thaliana F2 population (namely P19) derived from a cross between Bay-0 and Lov-5 (Salomé et al. 2011). It consists of 384 individuals and 245 SNP markers. There are seven traits including days until visible flower buds in the center of the rosette (DTF1), days until inflorescence stem reached 1 cm in height (DTF2), days until first open flower (DTF3), rosette leaf number (RLN), cauline leaf number (CLN), total leaf number: sum of RLN and CLN (TLN), and leaf initiation rate (RLN/DTF1) (LIR1). For more details about the P19 population please refer to Salomé et al. 2011.

Statistical models

The linear model of a quantitative trait can be written as:

$$y = Z_aa + Z_dd + e,$$

(1)

in which y is the n × 1 vector of standardized phenotypic values of a quantitative trait measured on n individuals ($y_i = \frac{y_i^\prime - \bar y}{\sigma_y}$, in which $y_i^\prime$ represents the raw phenotypic value, $\bar y$ the mean of the phenotypic values, and σ_y the standard deviation of the phenotypic values); Z_a is the standardized genotype matrix of n rows and m columns for additive effects (m represents the number of markers); and Z_d is the standardized n × m genotype matrix for dominance effects. In order to keep the additive and dominance variances orthogonal to each other, the coding schemes for additive and dominance effects should be tuned accordingly (Vitezica et al. 2017). For the ith individual at the kth locus, $Z_{a,ik} = \frac{x_{ik} - 2p_k}{\sqrt{2p_k(1 - p_k)}}$, in which x_ik counts the number of reference alleles (2, 1, and 0 for AA, Aa, and aa, respectively) and p_k is the frequency of the reference allele A at the locus; and $Z_{d,ik} = \frac{\delta_{ik} - 2p_k}{2p_k(1 - p_k)}$, in which δ_ik is coded 0, 2p_k, and (4p_k − 2) for the AA, Aa, and aa genotypes, respectively. For an F2 population, the expected p_k is 0.5, and the frequencies of AA, Aa, and aa are 0.25, 0.5, and 0.25, respectively, under Hardy-Weinberg equilibrium. The additive and dominance effects of the causal loci are represented by a and d, respectively; the additive effects follow $N\left(0,\sigma_a^2\right)$; the dominance effects follow $N\left(0,\sigma_d^2\right)$; and e is the residual error, following $N\left(0,\sigma_e^2\right)$. Therefore, $\mathrm{var}(y) = \Omega_a\sigma_a^2 + \Omega_d\sigma_d^2 + I\sigma_e^2$, in which $\Omega_a = \frac{Z_aZ_a^\prime}{m}$ is the additive genetic relationship matrix and $\Omega_d = \frac{Z_dZ_d^\prime}{m}$ is the dominance genetic relationship matrix.

For HEBLP|A and HEBLP|AD methods, we estimated total additive $\left( {\sigma _a^2} \right)$ and dominance $\left( {\sigma _d^2} \right)$ genetic variance in the training population via Haseman-Elston regression (HE) as below

$$Y = b_0 + b_a\omega _a + b_d\omega _d + \varepsilon,$$

(2)

in which Y is a vector of $\frac{n\left(n - 1\right)}{2}$ elements holding the squared phenotypic difference for each pair of individuals, Y_ij = (y_i − y_j)²; ω_a is the additive genetic relatedness between individuals i and j, found in the ith row and jth column of Ω_a; and ω_d is the dominance genetic relatedness between individuals i and j, found in the ith row and jth column of Ω_d. As an alternative to HE, a linear mixed model can be employed to estimate the additive and dominance variance components via the restricted maximum likelihood (REML) algorithm. The two approaches differ as follows. HE is based on least squares, and it allows analytical results for b_a and b_d. In contrast, REML is a model-based approach, and the exact structure of the estimated variance, whether additive or dominance, remains elusive. Furthermore, as discussed in our previous study (Liu and Chen, 2017), the computational complexity of HE is ${\cal O}(n^2)$, proportional to the square of the sample size, whereas that of REML is ${\cal O}(n^3)$. The computational advantage of HE is especially important when the sample size is large.
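The regression in Eq. 2 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' program: the function names `relationship_matrix`, `he_pairs`, and `he_fit` are hypothetical, ordinary least squares is solved with `numpy.linalg.lstsq`, and the conversion of slopes to variance components uses the relation $\sigma^2 = -b/2$ stated later in this section.

```python
import numpy as np


def relationship_matrix(Z):
    """Omega = Z Z' / m for an n x m standardized genotype matrix Z."""
    return Z @ Z.T / Z.shape[1]


def he_pairs(y, omega_a, omega_d):
    """Collect the n(n-1)/2 off-diagonal pairs (i < j) used in Eq. 2."""
    i, j = np.triu_indices(len(y), k=1)
    Y = (y[i] - y[j]) ** 2          # squared phenotypic differences
    return Y, omega_a[i, j], omega_d[i, j]


def he_fit(Y, w_a, w_d):
    """Least-squares fit of Y = b0 + b_a * w_a + b_d * w_d."""
    X = np.column_stack([np.ones_like(w_a), w_a, w_d])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    b0, b_a, b_d = coef
    # Variance components follow from the slopes (sigma^2 = -b / 2).
    return {"b0": b0, "b_a": b_a, "b_d": b_d,
            "sigma_a2": -b_a / 2, "sigma_d2": -b_d / 2}
```

A typical call would be `he_fit(*he_pairs(y, omega_a, omega_d))`, with `y` standardized and the two relationship matrices built from the training genotypes. Only the pairwise vectors are needed, which is where the ${\cal O}(n^2)$ cost comes from.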

Analytical results for the Haseman-Elston regression

The least-squares framework admits analytical results for the regression coefficients. Although Eq 2 is a linear model with two regression coefficients, $E\left( {b_a} \right) = \frac{{{\rm cov}(Y,\omega _a)}}{{{\rm var}(\omega _a)}}$ and $E\left( {b_d} \right) = \frac{{{\rm cov}(Y,\omega _d)}}{{{\rm var}(\omega _d)}}$ because ω_a and ω_d are orthogonal for each locus. The general principle for deriving the analytical solution for E(b_a) can be found in Chen’s study (Chen, 2014). For E(b_a), ${\rm cov}\left( {Y,\omega _a} \right) = E\left( {Y\omega _a} \right) - E\left( Y \right)E\left( {\omega _a} \right) = E(Y\omega _a)$ because E(Y) = 0.

$$E\left( {Y\omega _a} \right) = \frac{1}{m}\mathop {\sum }\limits_{x_{ik}} \mathop {\sum }\limits_{x_{jk}} \omega _{a,ik}\omega _{a,jk}\left[ {E\left( {y_i{\mathrm{|}}x_{ik}} \right) - E\left( {y_j{\mathrm{|}}x_{jk}} \right)} \right]^2p\left( {x_{ik}} \right)p(x_{jk}),$$

in which E(y_i|x_ik) is the conditional expectation of the phenotype given the genotype, and ω_a,ik is as defined above. p(x_ik) takes the values 0.25, 0.5, and 0.25 for x_ik = AA, Aa, and aa, respectively. In quadratic form

$$E\left( {Y\omega _a} \right) = \frac{1}{m}{\boldsymbol{\beta }}^T{\boldsymbol{I}}_{\boldsymbol{A}}\left\{ {\mathop {\sum }\limits_{k = 1}^m {\cal M}_k} \right\}{\boldsymbol{I}}_{\boldsymbol{A}}{\boldsymbol{\beta }},$$

in which ${\boldsymbol{\beta }}^T = [\beta _1 + \left( {p_1 - q_1} \right)d_1,\beta _2 + \left( {p_2 - q_2} \right)d_2, \ldots ,\beta _m + \left( {p_m - q_m} \right)d_m]$ is the general form of the vector of additive effects and I_A is a diagonal matrix with ${\boldsymbol{I}}_{A,kk} = \sqrt {2p_kq_k}$. For F2 populations, as p_i = 0.5, the dominance effect d_i is eliminated from β. ${\cal M}_k = \begin{pmatrix} \rho_{1,k}^2 & \rho_{2,k}\rho_{1,k} & \cdots & \rho_{m,k}\rho_{1,k} \\ \rho_{1,k}\rho_{2,k} & \rho_{2,k}^2 & \cdots & \rho_{m,k}\rho_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,k}\rho_{m,k} & \rho_{2,k}\rho_{m,k} & \cdots & \rho_{m,k}^2 \end{pmatrix}$ is a symmetric matrix indicating how the kth marker tags QTLs; for instance, the entry at the ith row and the jth column, $\rho_{i,k}\rho_{j,k}$, represents the joint LD of the ith and the jth QTLs tagged by the kth marker.

The denominator var(ω_a) can be written as $\frac{1}{{m^2}}\mathop {\sum}\nolimits_{k_1 = 1}^m {\mathop {\sum}\nolimits_{k_2 = 1}^m {\rho _{k_1k_2}^2} }$, which can be understood as the average squared linkage disequilibrium over all pairs of markers, including each marker paired with itself (see Appendix for the definition of the effective number of markers m_e). Alternatively, var(ω_a) can be expressed in quadratic form

$${\rm var}\left( {\omega _a} \right) = \frac{1}{{m^2}}1^T\left( {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} 1 & {\rho _{2,1}^2} \cr {\rho _{1,2}^2} & 1 \end{array}} & {\begin{array}{*{20}{c}} \cdots & {\rho _{1,m}^2} \cr \cdots & {\rho _{2,m}^2} \end{array}} \cr {\begin{array}{*{20}{c}} \vdots & \vdots \cr {\rho _{1,m}^2} & {\rho _{2,m}^2} \end{array}} & {\begin{array}{*{20}{c}} \ddots & \vdots \cr \cdots & 1 \end{array}} \end{array}} \right)1,$$

in which $1^T = [1,1, \ldots ,1]$ is a vector of ones.
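As a numerical illustration of the double sum for var(ω_a), the sketch below evaluates it for m evenly spaced markers in a conventional F2. It rests on an assumption not stated in the text: with no crossover interference, the correlation between markers k1 and k2 is taken as $(1 - 2c)^{|k_1 - k_2|}$, where c is the recombination rate between adjacent markers. The function name `var_omega` is hypothetical; setting `power=4` gives the corresponding dominance quantity var(ω_d).

```python
import numpy as np


def var_omega(m, c, power=2):
    """(1/m^2) * sum over marker pairs of rho^power for evenly spaced F2 markers.

    Assumes rho between markers k1 and k2 equals (1 - 2c)^|k1 - k2|
    (no interference); power=2 gives var(omega_a), power=4 var(omega_d).
    """
    k = np.arange(m)
    dist = np.abs(k[:, None] - k[None, :])   # marker separation in intervals
    rho = np.power(1.0 - 2.0 * c, dist)      # pairwise correlation matrix
    return float((rho ** power).sum() / m ** 2)
```

Two limiting cases are easy to check by hand: with unlinked markers (c = 0.5) only the diagonal contributes, so the value is 1/m, and with complete linkage (c = 0) every entry is 1, so the value is 1.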

So, in quadratic form

$$E\left( {b_a} \right) = - 2m\frac{{{\boldsymbol{\beta }}^T{\boldsymbol{I}}_{\boldsymbol{A}}\left\{ {\mathop {\sum }\nolimits_{k = 1}^m \left( {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} {\rho _{1,k}^2} & {\rho _{2,k}\rho _{1,k}} \cr {\rho _{1,k}\rho _{2,k}} & {\rho _{2,k}^2} \end{array}} & {\begin{array}{*{20}{c}} \cdots & {\rho _{m,k}\rho _{1,k}} \cr \cdots & {\rho _{m,k}\rho _{2,k}} \end{array}} \cr {\begin{array}{*{20}{c}} \vdots & \vdots \cr {\rho _{1,k}\rho _{m,k}} & {\rho _{2,k}\rho _{m,k}} \end{array}} & {\begin{array}{*{20}{c}} \ddots & \vdots \cr \cdots & {\rho _{m,k}^2} \end{array}} \end{array}} \right)} \right\}{\boldsymbol{I}}_{\boldsymbol{A}}{\boldsymbol{\beta }}}}{{1^T\left( {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}} 1 & {\rho _{2,1}^2} \cr {\rho _{1,2}^2} & 1 \end{array}} & {\begin{array}{*{20}{c}} \cdots & {\rho _{1,m}^2} \cr \cdots & {\rho _{2,m}^2} \end{array}} \cr {\begin{array}{*{20}{c}} \vdots & \vdots \cr {\rho _{1,m}^2} & {\rho _{2,m}^2} \end{array}} & {\begin{array}{*{20}{c}} \ddots & \vdots \cr \cdots & 1 \end{array}} \end{array}} \right)1}}.$$

Similarly, for E(b_d), we had

$E\left( {Y\omega _d} \right) = \frac{1}{m}\mathop {\sum }\limits_{x_{ik}} \mathop {\sum }\limits_{x_{jk}} \omega _{d,ik}\omega _{d,jk}\left[ {E\left( {y_i{\mathrm{|}}x_{ik}} \right) - E\left( {y_j{\mathrm{|}}x_{jk}} \right)} \right]^2p\left( {x_{ik}} \right)p(x_{jk})$ and its quadratic form is

$$\frac{1}{m}{\boldsymbol{D}}^T{\boldsymbol{I}}_D\left\{ \sum\limits_{k = 1}^m \begin{pmatrix} \rho_{1,k}^4 & \rho_{2,k}^2\rho_{1,k}^2 & \cdots & \rho_{m,k}^2\rho_{1,k}^2 \\ \rho_{1,k}^2\rho_{2,k}^2 & \rho_{2,k}^4 & \cdots & \rho_{m,k}^2\rho_{2,k}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,k}^2\rho_{m,k}^2 & \rho_{2,k}^2\rho_{m,k}^2 & \cdots & \rho_{m,k}^4 \end{pmatrix} \right\}{\boldsymbol{I}}_D{\boldsymbol{D}},$$

in which ${\boldsymbol{D}}^T = [d_1,d_2, \ldots ,d_m]$ is the vector of dominance effects and I_D is a diagonal matrix with I_D,kk = 2p_kq_k.

The denominator is ${\rm var}\left( {\omega _d} \right) = \frac{1}{{m^2}}\mathop {\sum }\limits_{k_1 = 1}^m \mathop {\sum }\limits_{k_2 = 1}^m \rho _{k_1k_2}^4$, and in quadratic form

$${\rm var}\left( {\omega _d} \right) = \frac{1}{{m^2}}1^T\begin{pmatrix} 1 & \rho_{2,1}^4 & \cdots & \rho_{1,m}^4 \\ \rho_{1,2}^4 & 1 & \cdots & \rho_{2,m}^4 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,m}^4 & \rho_{2,m}^4 & \cdots & 1 \end{pmatrix}1.$$

So,

$$E\left( {b_d} \right) = - 2m\frac{{\boldsymbol{D}}^T{\boldsymbol{I}}_{\boldsymbol{D}}\left\{ \sum\nolimits_{k = 1}^m \begin{pmatrix} \rho_{1,k}^4 & \rho_{2,k}^2\rho_{1,k}^2 & \cdots & \rho_{m,k}^2\rho_{1,k}^2 \\ \rho_{1,k}^2\rho_{2,k}^2 & \rho_{2,k}^4 & \cdots & \rho_{m,k}^2\rho_{2,k}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,k}^2\rho_{m,k}^2 & \rho_{2,k}^2\rho_{m,k}^2 & \cdots & \rho_{m,k}^4 \end{pmatrix} \right\}{\boldsymbol{I}}_{\boldsymbol{D}}{\boldsymbol{D}}}{1^T\begin{pmatrix} 1 & \rho_{2,1}^4 & \cdots & \rho_{1,m}^4 \\ \rho_{1,2}^4 & 1 & \cdots & \rho_{2,m}^4 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1,m}^4 & \rho_{2,m}^4 & \cdots & 1 \end{pmatrix}1}.$$

Although E(b_a) and E(b_d) resemble each other, the kernel of E(b_a) is related to the squared correlation ρ², a term associated with the additive variance (Hill and Robertson, 1968), while that of E(b_d) is related to ρ⁴. In particular, the numerator involves the LD between markers and the QTLs they tag, and the denominator the LD between pairs of markers.

Of note, there are two kinds of F2 populations: the conventional F2, which is derived from F1 but is not completely reproducible in terms of genotypes, and the “immortalized F2” (IF2), which can be reproduced. The IF2 can be realized in two ways: via a doubled haploid (DH) population (Liu et al. 2017) or from recombinant inbred lines (RIL) (Hua et al. 2003). The LD differs depending on whether an F2 or an IF2 is used in practice. Between the $k_1^{{\rm th}}$ and $k_2^{{\rm th}}$ markers, the squared correlation is $\rho _{k_1,k_2}^2 = \left( {1 - 2c_{k_1,k_2}} \right)^2$ for a conventional F2 and a DH-derived IF2, but $\rho _{k_1,k_2}^2 = \left( {\frac{{1 - 2c_{k_1,k_2}}}{{1 + 2c_{k_1,k_2}}}} \right)^2$ for a RIL-derived IF2. For example, given a recombination rate of 0.1 between a pair of markers, ρ² = 0.64 for F2 and DH-derived IF2 but 0.44 for RIL-derived IF2. For dominance-associated terms, ρ⁴ = 0.41 for F2 and DH-derived IF2, and 0.20 for RIL-derived IF2.
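The worked example above can be reproduced directly from the two formulas. The short sketch below is only an illustration; the function names `rho2_f2` and `rho2_ril` are hypothetical.

```python
def rho2_f2(c):
    """Squared marker correlation for a conventional F2 or DH-derived IF2."""
    return (1.0 - 2.0 * c) ** 2


def rho2_ril(c):
    """Squared marker correlation for a RIL-derived IF2."""
    return ((1.0 - 2.0 * c) / (1.0 + 2.0 * c)) ** 2


c = 0.1
print(round(rho2_f2(c), 2), round(rho2_ril(c), 2))            # additive terms (rho^2)
print(round(rho2_f2(c) ** 2, 2), round(rho2_ril(c) ** 2, 2))  # dominance terms (rho^4)
```

For c = 0.1 this gives ρ² of 0.64 and 0.44 and ρ⁴ of 0.41 and 0.20, matching the values quoted in the text. The dominance-related term decays faster with recombination than the additive term in both designs.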

For simplicity, we consider only the typical polygenic trait, for which the QTLs are randomly distributed along the genome; under this assumption, $\sigma _a^2 = - \frac{{b_a}}{2}$ and $\sigma _d^2 = - \frac{{b_d}}{2}$, respectively. A computer program that estimates additive and dominance heritability using Haseman-Elston regression is available from the authors.
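The factor of −1/2 linking the regression slopes to the variance components can be seen from a short expansion. This is a sketch under two assumptions: the phenotypes are standardized so that each has variance approximately 1, and the covariance between individuals i and j is the off-diagonal entry of var(y) given with Eq 1:

$$E\left( Y_{ij} \right) = {\rm var}(y_i) + {\rm var}(y_j) - 2\,{\rm cov}(y_i,y_j) \approx 2 - 2\left( \omega_{a,ij}\sigma_a^2 + \omega_{d,ij}\sigma_d^2 \right).$$

Matching terms with Eq 2 gives $b_a = -2\sigma_a^2$ and $b_d = -2\sigma_d^2$, which rearrange to the expressions above.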

Best linear prediction (BLP)

The BLP method was used to predict the genotypic value of each line in the candidate population:

$$\hat g_2 = \left( {\hat \sigma _a^2\Omega _{a21} + \hat \sigma _d^2\Omega _{d21}} \right)V^{ - 1}y_1,$$

(3)

in which $\hat g_2$ is the predicted genotypic values in the candidate population; $y_1$ is the phenotypic values in the training population; Ω_a21 and Ω_d21 represent the additive and the dominance genetic relationship matrix between the candidate and the training population respectively; $\hat \sigma _a^2$ and $\hat \sigma _d^2$ represent the estimated additive and dominance variances respectively; the inverse of the V matrix is computed using $V^{ - 1} = \left( {\hat \sigma _a^2\Omega _{a11} + \hat \sigma _d^2\Omega _{d11} + \hat \sigma _e^2I} \right)^{ - 1}$, in which Ω_a11 and Ω_d11 represent the additive and the dominance genetic relationship matrix for the training population respectively.
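Eq 3 translates almost line by line into NumPy. The sketch below is illustrative only: the function name `blp_predict` and its argument names are hypothetical, and a linear solve is used in place of an explicit inverse of V, which gives the same product $V^{-1}y_1$.

```python
import numpy as np


def blp_predict(y1, omega_a11, omega_d11, omega_a21, omega_d21,
                sigma_a2, sigma_d2, sigma_e2):
    """Predicted genotypic values of the candidates (Eq 3).

    y1: phenotypes of the n1 training individuals.
    omega_*11: n1 x n1 relationship matrices within the training set.
    omega_*21: n2 x n1 relationship matrices, candidates versus training.
    """
    n1 = len(y1)
    V = sigma_a2 * omega_a11 + sigma_d2 * omega_d11 + sigma_e2 * np.eye(n1)
    C = sigma_a2 * omega_a21 + sigma_d2 * omega_d21
    # Solve V x = y1 instead of forming V^{-1} explicitly.
    return C @ np.linalg.solve(V, y1)
```

Setting `sigma_d2` to zero reduces the predictor to the additive-only case (HEBLP|A), so the same function covers both models.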

Results

Estimates of the heritability and predictability in the simulated F2 population

We simulated a quantitative trait in an F2 experimental population. In the simulated F2 population, we assumed that 1001 biallelic markers of equal allele frequency were evenly distributed on one chromosome [the recombination rate was c between the ith and the (i + 1)th markers]. All markers were defined as QTLs whose additive and dominance effects followed a normal distribution. Each simulation scenario included 20 replications.
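One way such genotypes could be generated is sketched below. It is not the authors' simulation code and makes assumptions beyond the text: every F1 parent is heterozygous at all loci in coupling phase, crossovers between adjacent markers occur independently with probability c, and the function name `simulate_f2_genotypes` is hypothetical.

```python
import numpy as np


def simulate_f2_genotypes(n, m, c, rng):
    """Return an n x m matrix of reference-allele counts (0, 1, or 2).

    Each F2 individual receives two independent F1 gametes; along a gamete
    the parental origin switches between adjacent markers with probability c.
    """
    def gametes():
        start = rng.integers(0, 2, size=(n, 1))      # origin at the first marker
        switches = rng.random((n, m - 1)) < c        # crossover indicators
        flips = np.cumsum(switches, axis=1) % 2      # parity of crossovers so far
        return np.hstack([start, (start + flips) % 2])

    return gametes() + gametes()
```

For example, `simulate_f2_genotypes(500, 1001, 0.01, np.random.default_rng(1))` would produce a matrix with the dimensions used in the first simulation.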

In order to assess the unbiasedness of heritability estimation via the three methods (HE|A, HE|AD, and REML|AD), we performed a Monte Carlo simulation experiment for an F2 population. When the simulated parameters were set as population size (n = 500), additive heritability ($h_a^2 = 0.3$), dominance heritability ($h_d^2 = 0.2$), and recombination rate (c = 0.01), the results showed that $\hat h_a^2 = 0.271 \pm 0.075$ (via HE|A), $\hat h_a^2 = 0.271 \pm 0.075$ and $\hat h_d^2 = 0.193 \pm 0.039$ (via HE|AD), and $\hat h_a^2 = 0.296 \pm 0.048$ and $\hat h_d^2 = 0.226 \pm 0.052$ (via REML|AD) (Fig. 1). These results indicated that all three methods could obtain unbiased estimates of the parameters under the typical polygenic model.

Moreover, we evaluated the prediction accuracy of HEBLP|AD, HEBLP|A, and GBLUP|AD under five scenarios in the simulated F2 population (Fig. 2). The sizes of the training (n_T) and candidate (n_C) populations were 500 and 100, respectively, in all simulations. The squared correlation coefficient (r²) between the phenotypes and the predicted genotypic values was defined as the prediction accuracy.

In scenario 1 ($h_a^2 = 0.4$, $h_d^2 = 0$, and c = 0.01), the prediction accuracies were HEBLP|AD = 0.333 ± 0.066, GBLUP|AD = 0.314 ± 0.096, and HEBLP|A = 0.335 ± 0.067. In scenario 2 ($h_a^2 = 0.40$, $h_d^2 = 0.05$, and c = 0.01), the prediction accuracies were HEBLP|AD = 0.351 ± 0.065, GBLUP|AD = 0.355 ± 0.066, and HEBLP|A = 0.334 ± 0.065. The results of these two simulations indicated that the three methods had a similar predictive ability when dominance effects made no or a very small contribution to genetic variation. In scenario 3 ($h_a^2 = 0.40$, $h_d^2 = 0.1$, and c = 0.01), the prediction accuracies were HEBLP|AD = 0.388 ± 0.065, GBLUP|AD = 0.391 ± 0.066, and HEBLP|A = 0.334 ± 0.067. In scenario 4 ($h_a^2 = 0.4$, $h_d^2 = 0.2$, and c = 0.01), the prediction accuracies were HEBLP|AD = 0.471 ± 0.063, GBLUP|AD = 0.475 ± 0.064, and HEBLP|A = 0.335 ± 0.070. In scenario 5 ($h_a^2 = 0.1$, $h_d^2 = 0.6$, and c = 0.01), the prediction accuracies were HEBLP|AD = 0.553 ± 0.063, GBLUP|AD = 0.569 ± 0.067, and HEBLP|A = 0.079 ± 0.048. These results indicated similar predictability for HEBLP|AD and GBLUP|AD, and a significantly better performance of both than HEBLP|A, when dominance effects made a large contribution to genetic variation.

Comparison of computational time of HE|AD and REML|AD

We simulated an F2 population with 20 replications to evaluate the computational time of HE|AD and REML|AD. In this case, the parameters were set as population size (n = 500), additive heritability ($h_a^2 = 0.2$), dominance heritability ($h_d^2 = 0.6$), marker number (m = 3001), and recombination rate (c = 0.01). The results showed that $\hat h_a^2 = 0.183 \pm 0.064$ and $\hat h_d^2 = 0.568 \pm 0.068$ (via HE|AD), and $\hat h_a^2 = 0.20 \pm 0.032$ and $\hat h_d^2 = 0.636 \pm 0.084$ (via REML|AD), and that HE|AD and REML|AD took an average of 304 s and 3487 s per simulation, respectively, demonstrating a significant computational advantage of HE|AD over REML|AD.

Comparison of heritability and predictability between F2 and IF2 derived from RIL using HEBLP|AD

We simulated an F2 population and an IF2 population derived from RILs to evaluate HEBLP|AD. In this case, we simulated 1001 markers, among which 100 markers were sampled as QTLs. When we estimated the heritability and prediction accuracy, the markers representing QTLs were excluded. Each simulation scenario included 20 replications.

When the simulated parameters were set as training population size (n_T = 500), candidate population size (n_C = 100), additive heritability ($h_a^2 = 0.5$), dominance heritability ($h_d^2 = 0.25$), and recombination rate (c = 0.01), the simulated F2 population showed $\hat h_a^2 = 0.458 \pm 0.170$, $\hat h_d^2 = 0.244 \pm 0.111$, and predictability r² = 0.614 ± 0.061; the simulated IF2 population derived from RILs showed $\hat h_a^2 = 0.480 \pm 0.129$, $\hat h_d^2 = 0.230 \pm 0.075$, and predictability r² = 0.544 ± 0.071. Because the RIL-derived IF2 had undergone multi-generation selfing, its decayed LD resulted in a lower r² than that of the F2.

Approximation of prediction accuracy

To further understand the study, in the Appendix, we derived a formula of prediction accuracy including additive and dominance variance components. This derived result could be considered as an extension to those previously established by Daetwyler et al. (2008) and Goddard (2009).

$$r^2 = H^2\frac{{H^2}}{{H^2 + \frac{{m_{e.a} + m_{e.d}}}{{n_T}}}}.$$

(4)

The result showed that $H^2$ was the upper bound of the prediction accuracy, and that the accuracy further depended on (1) the broad-sense heritability ($H^2 = h_a^2 + h_d^2$), (2) the effective number of markers for additive heritability (m_e.a), (3) the effective number of markers for dominance heritability (m_e.d), and (4) the sample size of the training data. As m_e.a and m_e.d were determined by recombination, when the markers were dense, the prediction accuracy could be further approximated as

$$r^2 \approx H^2\frac{{H^2}}{{H^2 + \frac{{6l}}{{n_T}}}},$$

(5)

in which l is the length of the chromosome (in Morgans). Both Eq 4 and Eq 5 indicated that the prediction accuracy approached its upper bound H² as the sample size n_T became infinite. Having evaluated the utility of the approximation, we found that the expected and observed prediction accuracies were consistent via HEBLP|AD under different recombination rates based on 10 simulations for the F2 population (Table 1). Eq 4 and Eq 5 gave similar predictions for r² when the markers were dense, and the accuracy of Eq 5 was reduced when the markers were sparse. The sample size of the candidate population could only influence the statistical power of the prediction accuracy because r² followed $\chi _1^2$ under the null hypothesis.
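Eq 4 and Eq 5 are simple enough to evaluate directly. The sketch below is an illustration with hypothetical function names (`accuracy_eq4`, `accuracy_eq5`); the example inputs are chosen for demonstration and are not taken from Table 1.

```python
def accuracy_eq4(H2, me_a, me_d, n_T):
    """Expected r^2 from Eq 4, given the effective numbers of markers."""
    return H2 * H2 / (H2 + (me_a + me_d) / n_T)


def accuracy_eq5(H2, length_morgan, n_T):
    """Expected r^2 from Eq 5, the dense-marker approximation."""
    return H2 * H2 / (H2 + 6.0 * length_morgan / n_T)


# Illustrative values only: H^2 = 0.8 on a 10-Morgan chromosome.
for n_T in (500, 5000, 50000):
    print(n_T, round(accuracy_eq5(0.8, 10.0, n_T), 3))
```

With H² = 0.8, l = 10 Morgans, and n_T = 500, Eq 5 gives an expected r² of about 0.70. Increasing n_T moves the value toward, but never beyond, H² = 0.8.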

Table 1 Prediction accuracy (r²) under different recombination rates (c) based on 10 simulations in F2 population when $h_a^2 = 0.3$, $h_d^2 = 0.5$, marker number = 1001, and the candidate sample size was 100

Genomic prediction of 7 traits in the Arabidopsis thaliana F2 population

The 7 traits, including DTF1, DTF2, DTF3, RLN, CLN, TLN, and LIR1, from an Arabidopsis thaliana F2 population were used to assess the prediction performance of HEBLP|A, HEBLP|AD, and GBLUP|AD.

We first analyzed the 7 traits via HE|A, HE|AD, and REML|AD, obtaining estimated additive heritability varying from 0.080 to 0.582 (HE|A), 0.080 to 0.582 (HE|AD), and 0.158 to 0.731 (REML|AD), and estimated dominance heritability varying from 0.009 to 0.052 (HE|AD) and 0.018 to 0.106 (REML|AD). The results demonstrated that dominance effects accounted for only a small proportion of the genetic variation for these traits (Table 2).

Table 2 The estimated variance proportion ($\hat h_a^2$ and $\hat h_d^2$) for the 7 traits in the Arabidopsis thaliana F2 (P19) population

Based on 100 replications, we found that the predictability of HEBLP|A, HEBLP|AD, and GBLUP|AD was similar for all traits (Table 3). For example, the prediction accuracies for DTF1 were 0.466 ± 0.028, 0.459 ± 0.032, and 0.440 ± 0.088 via HEBLP|A, HEBLP|AD, and GBLUP|AD, respectively. These results indicated that, as in the simulations, HEBLP|A, HEBLP|AD, and GBLUP|AD showed similar predictability when dominance effects made a very small contribution to the genetic variation.

Table 3 Prediction accuracy of the 7 traits in the Arabidopsis thaliana F2 (P19) population based on 100 simulations

Discussion

The impact of the dominance heritability on predictive accuracy

The wide utilization of heterosis in animals and plants, such as maize, rice, and cattle, has significantly increased their productivity. In this study, we extended our previous method, HEBLP|A, to HEBLP|AD. The simulation results demonstrated that (1) HEBLP|AD and GBLUP|AD are superior to HEBLP|A when dominance effects explain a significant proportion of the genetic variation; and (2) HEBLP|AD, GBLUP|AD, and HEBLP|A have a similar predictive ability when dominance effects explain only a small proportion of the genetic variation. Furthermore, real data from an Arabidopsis thaliana F2 population were used to evaluate the three methods, and since the estimated heritability showed a small contribution of dominance effects to genetic variation, the result supported the second case in the simulation. de Almeida Filho et al. (2016) indicated that when dominance effects made up only a small proportion of the total genetic variation, incorporating them into BayesA, BayesB, BL, and BRR would decrease the prediction accuracy. However, it is safe and stable to include dominance effects in the HEBLP model under this circumstance. In addition, HEBLP|AD is not limited to the F2 population demonstrated here; it is applicable as long as the population permits the estimation of additive and dominance variance components (such as a natural population under random mating).

In addition, we provided an approximation of prediction accuracy for the F2 population (Appendix). The genetic length of the chromosome, the density of markers, H², and the sample size of the training population were key factors that would influence the prediction accuracy. The method presented in the Appendix was general and could be applied to other populations. In this study, we simulated a single, extremely long chromosome, which was unrealistic, and we will consider incorporating real marker density in further studies. We considered only the typical polygenic model at present, but the role of genetic architecture will be addressed in our further studies.

Application of the genomic prediction in hybrid breeding of crops

The traditional strategy for developing hybrid crosses is to perform a large number of crosses between inbred lines and then select desirable hybrids. This process can be accelerated by combining genomic prediction approaches with an immortalized F2 (IF2) population constructed from a doubled haploid (DH) population. The IF2 population, first constructed by Hua et al. (2003), has the same genetic architecture as the conventional F2 population and can at present be generated via randomly permuted intermating of recombinant inbred lines (RILs) or DH lines. In a hybrid breeding program, when the sample size (n) of the RIL or DH population is large and all crosses $\left[ {\frac{{n(n - 1)}}{2}} \right]$ between inbred lines from the RIL or DH population need to be evaluated in field trials, this will consume substantial resources. To reduce the cost of genetic improvement, genomic prediction can be applied to the IF2 population to select crosses with high hybrid performance. Guo et al. (2013) applied genomic prediction to an IF2 population derived from a RIL population in maize, and Xu et al. (2014) did so in rice. Liu et al. (2017) applied genomic prediction to an IF2 population based on a rapeseed DH population. However, construction of a RIL population is time-consuming, and therefore the GP+IF2 (DH) procedure will be a more efficient choice for identifying superior hybrids and potential lines with high specific or general combining ability.

References

Arruda MP, Lipka AE, Brown PJ, Krill AM, Thurber C, Brown-Guedira G et al. (2016) Comparing genomic selection and marker-assisted selection for Fusarium head blight resistance in wheat (Triticum aestivum L.). Mol Breed 36:84
Bernardo R, Yu J (2007) Prospects for genomewide selection for quantitative traits in maize. Crop Sci 47:1082–1090
Calus MPL, Meuwissen THE, De Roos APW, Veerkamp RF (2008) Accuracy of genomic selection using different methods to define haplotypes. Genetics 178:553–561
Chen G-B (2014) Estimating heritability of complex traits from genome-wide association studies using IBS-based Haseman-Elston regression. Front Genet 5:107
Daetwyler HD, Villanueva B, Woolliams JA (2008) Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE 3:e3395
de Almeida Filho JE, Guimarães JFR, e Silva FF, de Resende MDV, Muñoz P, Kirst M et al. (2016) The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity 117:33–41
Denis M, Bouvet J-M (2011) Genomic selection in tree breeding: testing accuracy of prediction models including dominance effect. BMC Proc 5:O13
Denis M, Bouvet JM (2013) Efficiency of genomic selection with models including dominance effect in the context of Eucalyptus breeding. Tree Genet Genomes 9:37–51
Goddard M (2009) Genomic selection: prediction of accuracy and maximisation of long term reponse. Genetica 136:245–257
Guo T, Li H, Yan J, Tang J, Li J, Zhang Z et al. (2013) Performance prediction of F1 hybrids between recombinant inbred lines derived from two elite maize inbred lines. Theor Appl Genet 126:189–201
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92:433–443
Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49:1–12
Hill WG, Robertson A (1968) Linkage disequilibrium in finite populations. Theor Appl Genet 38:226–231
Hua J, Xing Y, Wu W, Xu C, Sun X, Yu S et al. (2003) Single-locus heterotic effects and dominance by dominance interactions can adequately explain the genetic basis of heterosis in an elite rice hybrid. Proc Natl Acad Sci USA 100:2574–2579
Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genom 9:166–177
Li L, Lu K, Chen Z, Mu T, Hu Z, Li X (2008) Dominance, overdominance and epistasis condition the heterosis in two heterotic rice hybrids. Genetics 180:1725–1742
Liu H, Chen G-B (2017) A fast genomic selection approach for large genomic data. Theor Appl Genet 130:1277–1284
Liu P, Zhao Y, Liu G, Wang M, Hu D, Hu J et al. (2017) Hybrid performance of an immortalized F2 rapeseed population is driven by additive, dominance, and epistatic effects. Front Plant Sci 8:815
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Nishio M, Satoh M (2014) Including dominance effects in the genomic BLUP method for genomic evaluation. PLoS ONE 9:e85792
Resende RT, Resende MDV, Silva FF, Azevedo CF, Takahashi EK, Silva-Junior OB et al. (2017) Assessing the expected response to genomic selection of individuals and families in Eucalyptus breeding with an additive-dominant model. Heredity 119:245–255
Riedelsheimer C, Czedik-Eysenberg A, Grieder C, Lisec J, Technow F, Sulpice R et al. (2012) Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat Genet 44:217–220
Salomé PA, Bomblies K, Laitinen RAE, Yant L, Mott R, Weigel D (2011) Genetic architecture of flowering-time variation in Arabidopsis thaliana. Genetics 188:421–433
Schaeffer LR (2006) Strategy for applying genome wide selection in dairy cattle. J Anim Breed Genet 123:218–223
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS (2012) Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS ONE 7:e45293
Technow F, Riedelsheimer C, Schrag TA, Melchinger AE (2012) Genomic prediction of hybrid performance in maize with models incorporating dominance and population specific marker effects. Theor Appl Genet 125:1181–1194
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Vitezica ZG, Legarra A, Toro MA, Varona L (2017) Orthogonal estimates of variances for additive, dominance, and epistatic effects in populations. Genetics 206:1297–1307
Wang X, Li L, Yang Z, Zheng X, Yu S, Xu C et al. (2017) Predicting rice hybrid performance using univariate and multivariate GBLUP models based on North Carolina mating design II. Heredity 118:302–310
Xu S, Zhu D, Zhang Q (2014) Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci USA 111:12456–12461
Zhang Z, Zhang Q, Ding XD (2011) Advances in genomic selection in domestic animals. Chin Sci Bull 56:2655–2663


Acknowledgements

This study was supported by the National Natural Science Foundation of China (31771392 to G.-B.C.).

Author contributions

H.L. and G.-B.C. designed and performed the study as well as wrote the manuscript.

Author information

Authors and Affiliations

Maize Research Institute, Sichuan Agricultural University, Chengdu, Sichuan Province, 611130, China
Hailan Liu
Clinical Research Institute, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang Province, China
Guo-Bo Chen


Corresponding authors

Correspondence to Hailan Liu or Guo-Bo Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix

Factors influencing prediction accuracy for an F2 population

In this note, we try to outline the factors that influence the prediction accuracy for an F2 population.

For the training population, its phenotype can be expressed as

$$y = \mu + \mathop {\sum }\limits_{j = 1}^m b_{a_j}x_{a_j} + \mathop {\sum }\limits_{j = 1}^m b_{d_j}x_{d_j} + \varepsilon.$$

Here, we assume that every marker is causal with a small effect, as for a typical polygenic trait. $x_{a_j}$ and $x_{d_j}$ are the orthogonal codings of the jth marker for the additive and dominance effects. ${\rm var}(y) = V_p$ is the phenotypic variance; taking the phenotype to be standardized so that $V_p = 1$, ${\rm var}\left( {\mathop {\sum }\limits_{j = 1}^m b_{a_j}x_{a_j} + \mathop {\sum }\limits_{j = 1}^m b_{d_j}x_{d_j}} \right) = h_A^2 + h_D^2 = H^2$.

According to linear regression theory, the additive effect of the jth marker can be estimated as $\hat b_{a_j} = \frac{{{\rm cov}(y,x_{a_j})}}{{{\rm var}(x_{a_j})}}$, which can be written as the true effect $b_{a_j}$ plus a sampling error with variance $\sigma _{\hat b_{a_j}}^2 = \frac{{\sigma _e^2}}{{n_T\sigma _{x_{a_j}}^2}}$; for the dominance effect, $\hat b_{d_j} = \frac{{{\rm cov}(y,x_{d_j})}}{{{\rm var}(x_{d_j})}}$, with sampling variance $\sigma _{\hat b_{d_j}}^2 = \frac{{\sigma _e^2}}{{n_T\sigma _{x_{d_j}}^2}}$. $n_T$ is the sample size of the training population, and m is the number of markers.
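The sampling-variance expression above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a single marker with an additive coding of -1/0/1 at F2 genotype frequencies 1/4, 1/2, 1/4, and the effect size, residual standard deviation, sample size, and function names are hypothetical values chosen for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope, cov(y, x) / var(x)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_slopes(b=0.3, sigma_e=1.0, n_T=200, reps=300, seed=1):
    """Repeatedly simulate one F2 marker and re-estimate its effect."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        # Assumed additive coding: -1, 0, 1 with F2 frequencies 1/4, 1/2, 1/4.
        x = rng.choices([-1.0, 0.0, 1.0], weights=[1, 2, 1], k=n_T)
        y = [b * xi + rng.gauss(0.0, sigma_e) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes()
empirical_var = statistics.variance(slopes)
# var(x) = 0.5 for the assumed coding, so sigma_e^2 / (n_T * var(x)) = 0.01.
theoretical_var = 1.0 ** 2 / (200 * 0.5)
print(f"mean estimate      : {statistics.fmean(slopes):.3f}")
print(f"empirical variance : {empirical_var:.4f}")
print(f"theoretical value  : {theoretical_var:.4f}")
```

Across replicates the empirical variance of the estimates should be close to the theoretical value, up to simulation noise.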

For the candidate population, the phenotype can be expressed as $y_C = \mu + \mathop {\sum}\nolimits_{j = 1}^k {b_{a_j}\tilde x_{a_j}} + \mathop {\sum}\nolimits_{j = 1}^k {b_{d_j}\tilde x_{d_j}} + \varepsilon _C$, while the predicted genotypic value is $\hat y_C = \mathop {\sum}\nolimits_{j = 1}^k {\hat b_{a_j}\tilde x_{a_j}} + \mathop {\sum}\nolimits_{j = 1}^k {\hat b_{d_j}\tilde x_{d_j}}$. It is easy to derive the variance and covariance terms below.

$${\rm Var}\left( {y_C} \right) = \mathop {\sum }\limits_{j = 1}^k b_{a_j}^2\sigma _{x_{a_j}}^2 + \mathop {\sum }\limits_{j = 1}^k b_{d_j}^2\sigma _{x_{d_j}}^2 + \sigma _{e_C}^2 = V_G + V_{e_C},$$

$${\rm Var}\left( {\hat y_C} \right) = \mathop {\sum }\limits_{j = 1}^k b_{a_j}^2\sigma _{x_{a_j}}^2 + \mathop {\sum }\limits_{j = 1}^k b_{d_j}^2\sigma _{x_{d_j}}^2 + \left( {\frac{{m_{e.a}}}{{n_T}} + \frac{{m_{e.d}}}{{n_T}}} \right)\sigma _e^2 = V_G + V_e\left( {\frac{{m_{e.a}}}{{n_T}} + \frac{{m_{e.d}}}{{n_T}}} \right),$$

$${\rm Cov}\left( {\hat y_C,y_C} \right) = \mathop {\sum }\limits_{j = 1}^k b_{a_j}^2\sigma _{x_{a_j}}^2 + \mathop {\sum }\limits_{j = 1}^k b_{d_j}^2\sigma _{x_{d_j}}^2 = V_G.$$

The prediction accuracy is

$$\begin{array}{l}r^2 = \frac{{{\rm Cov}\left( {\hat y_C,y_C} \right)^2}}{{{\rm Var}\left( {y_C} \right){\rm Var}\left( {\hat y_C} \right)}} = \frac{{V_G^2}}{{\left( {V_G + V_{e_C}} \right)\left[ {V_G + V_e\left( {\frac{{m_{e.a}}}{{n_T}} + \frac{{m_{e.d}}}{{n_T}}} \right)} \right]}}\cr = H^2\frac{{H^2}}{{\left[ {H^2 + \left( {1 - H^2} \right)\left( {\frac{{m_{e.a}}}{{n_T}} + \frac{{m_{e.d}}}{{n_T}}} \right)} \right]}} \approx H^2\frac{{H^2}}{{H^2 + \frac{{m_{e.a} + m_{e.d}}}{{n_T}}}}.\end{array}$$

For the genetic value,

$$y_G = \mu + \mathop {\sum }\limits_{j = 1}^k b_{a_j}x_{a_j} + \mathop {\sum }\limits_{j = 1}^k b_{d_j}x_{d_j},$$

$${\rm Var}\left( {y_G} \right) = V_G.$$

The prediction accuracy between the true genotypic values and the predicted genotypic values can be written as the squared Pearson’s correlation

$$r_G^2 = \frac{{V_G^2}}{{V_G\left[ {V_G + V_e\frac{{m_{e.a} + m_{e.d}}}{{n_T}}} \right]}} \approx \frac{{H^2}}{{H^2 + \frac{{m_{e.a} + m_{e.d}}}{{n_T}}}}.$$

This equation extends the one derived by Daetwyler et al. (2008) by including the dominance component. In practice, the prediction accuracy depends more on the effective number of loci, which can be understood as the number of quasi-independent segments of the whole genome. So, the prediction accuracy is approximated as

$$r^2 = H^2\frac{{H^2}}{{H^2 + \frac{{m_{e.a} + m_{e.d}}}{{n_T}}}} = H^2r_G^2,$$

(A1)

in which $m_{e.a}$ and $m_{e.d}$ are the effective numbers of markers coded for additive and dominance effects, respectively.
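To make Eq. (A1) concrete, here is a small sketch that evaluates it for given inputs. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def prediction_accuracy(H2, m_ea, m_ed, n_T):
    """Return (r2, r2_G) following Eq. (A1)."""
    if not 0.0 < H2 <= 1.0:
        raise ValueError("H2 must lie in (0, 1]")
    if n_T <= 0:
        raise ValueError("n_T must be positive")
    r2_G = H2 / (H2 + (m_ea + m_ed) / n_T)  # accuracy for genetic values
    r2 = H2 * r2_G                          # accuracy for phenotypes
    return r2, r2_G


# Hypothetical inputs: H^2 = 0.5, 10 + 20 effective markers, 300 training lines.
r2, r2_G = prediction_accuracy(H2=0.5, m_ea=10.0, m_ed=20.0, n_T=300)
print(f"r2_G = {r2_G:.3f}, r2 = {r2:.3f}")
```

Increasing `n_T` shrinks the penalty term, so under this approximation r² rises towards H² as the training population grows.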

$$m_{e.a} = \frac{{m^2}}{{m + \mathop {\sum }\nolimits_{i = 1}^k \mathop {\sum }\nolimits_{i \ne j}^k r_{ij}^2}}.$$

(A2)

For markers not on the same chromosome the LD is nearly zero, so only marker pairs on the same chromosome contribute appreciably to the double sum in Eq. (A2). For the dominance coding,

$$m_{e.d} = \frac{{m^2}}{{m + \mathop {\sum }\nolimits_{i = 1}^k \mathop {\sum }\nolimits_{i \ne j}^k r_{ij}^4}}.$$

(A3)

If recombination follows the Haldane map function, then for an F2 population $r_{ij}^2 = \exp \left( { - 4\left| {d_i - d_j} \right|} \right) = e^{ - 4d_{ij}}$, in which $d_{ij} = \left| {d_i - d_j} \right|$ is the genetic distance (in Morgans) between a pair of loci, and $r_{ij}^4 = e^{ - 8d_{ij}}$. When there is no LD between markers, $r_{ij}^2 = 0$, and $m_{e.a} = m_{e.d} = m$. As $r_{ij}^4 \le r_{ij}^2$, we have $m_{e.a} \le m_{e.d} \le m$.
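The definitions in Eqs. (A2) and (A3) can be evaluated directly for a given marker map. The sketch below is illustrative only: it assumes a single chromosome, marker positions in Morgans, and the Haldane relation for an F2 so that $r_{ij}^2 = e^{-4d_{ij}}$ and, by squaring, $r_{ij}^4 = e^{-8d_{ij}}$. The function name and the example map are hypothetical.

```python
import math


def effective_markers(positions, power):
    """Effective number of markers on one chromosome.

    positions: marker positions in Morgans.
    power: 2 for the additive coding (r^2), 4 for the dominance coding (r^4).
    """
    m = len(positions)
    ld_sum = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                d_ij = abs(positions[i] - positions[j])
                # Haldane, F2: r = exp(-2 d), so r^2 = exp(-4 d), r^4 = exp(-8 d).
                ld_sum += math.exp(-2.0 * power * d_ij)
    return m * m / (m + ld_sum)


# Hypothetical map: 200 evenly spaced markers on a 1-Morgan chromosome.
m, length = 200, 1.0
positions = [length * k / (m - 1) for k in range(m)]
m_ea = effective_markers(positions, power=2)
m_ed = effective_markers(positions, power=4)
print(f"m = {m}, m_e.a = {m_ea:.2f}, m_e.d = {m_ed:.2f}")
```

With tightly linked markers both values fall far below m, and the dominance value is the larger of the two because $r_{ij}^4$ decays faster with distance.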

Further approximation for the prediction accuracy

For the additive component, with markers spread evenly along a chromosome of genetic length l (Morgan),

$$\frac{{\mathop {\sum }\nolimits_{i = 1}^k \mathop {\sum }\nolimits_{i \ne j}^k r_{ij}^2}}{{m^2}} \approx \frac{1}{{l^2}}\mathop {\smallint }\limits_0^l \mathop {\smallint }\limits_0^l e^{ - 4|x_1 - x_2|}dx_1dx_2 = \frac{1}{{2l^2}}\left( {l - \frac{{c_{2l}}}{2}} \right)$$

and for the dominance component,

$$\frac{{\mathop {\sum }\nolimits_{i = 1}^k \mathop {\sum }\nolimits_{i \ne j}^k r_{ij}^4}}{{m^2}} \approx \frac{1}{{l^2}}\mathop {\smallint }\limits_0^l \mathop {\smallint }\limits_0^l e^{ - 8|x_1 - x_2|}dx_1dx_2 = \frac{1}{{4l^2}}\left( {l - \frac{{c_{4l}}}{4}} \right)$$

in which $c_{2l}$ and $c_{4l}$ are the recombination fractions given genetic distances of 2l and 4l, respectively.

So, $m_{e.a} = \left[ {\frac{1}{m} + \frac{{\left( {l - \frac{{c_{2l}}}{2}} \right)}}{{2l^2}}} \right]^{ - 1}$. If the markers are dense and $m \gg l$ (m is often greater than 10,000 along a single chromosome), $m_{e.a} \approx 2l$; similarly, $m_{e.d} \approx 4l$. So, the prediction accuracy can be further approximated as

$$r^2 \approx H^2\frac{{H^2}}{{H^2 + \frac{{6l}}{{n_T}}}}.$$

(A4)

when the density of markers is high.
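As a rough check of how close the dense-marker limits 2l and 4l are, the sketch below evaluates the mean pairwise LD from the closed form of the double integral of $e^{-a|x_1 - x_2|}$ over $[0, l]^2$, divided by $l^2$, with a = 4 for the additive coding and a = 8 for the dominance coding under the Haldane relations above. The function names and the example values of l, H² and n_T are assumptions made for illustration.

```python
import math


def mean_ld(length, rate):
    """Mean of exp(-rate * |x1 - x2|) for x1, x2 uniform on [0, length]."""
    integral = 2.0 * (length / rate
                      - (1.0 - math.exp(-rate * length)) / rate ** 2)
    return integral / length ** 2


def dense_effective_markers(length, rate):
    """Limit of m_e as the number of markers grows (the 1/m term vanishes)."""
    return 1.0 / mean_ld(length, rate)


def accuracy_A4(H2, length, n_T):
    """Eq. (A4): r^2 ~ H^2 * H^2 / (H^2 + 6 l / n_T)."""
    return H2 * H2 / (H2 + 6.0 * length / n_T)


for length in (1.0, 5.0, 20.0):
    m_ea = dense_effective_markers(length, rate=4.0)
    m_ed = dense_effective_markers(length, rate=8.0)
    print(f"l = {length:5.1f}  m_e.a = {m_ea:7.2f} (2l = {2 * length:5.1f})  "
          f"m_e.d = {m_ed:7.2f} (4l = {4 * length:5.1f})")

print(f"A4 with H2 = 0.5, l = 20, n_T = 500: {accuracy_A4(0.5, 20.0, 500):.3f}")
```

In this sketch the limits 2l and 4l are approached from above, and the relative gap shrinks as the map length grows.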

So the expected prediction accuracy depends on the training sample size, whereas the statistical significance of r² depends on the size of the candidate sample. Under the null hypothesis the test statistic for r² follows a $\chi _1^2$ distribution, and the non-centrality parameter for the statistical test of r² is $\lambda = \frac{{n_Cr^2}}{{1 - r^2}}$, in which n_C is the sample size of the candidate population.
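The non-centrality parameter translates directly into statistical power for the test of r². The sketch below is an illustration only: it uses the fact that a 1-df non-central chi-square variable is the square of a normal variable with mean $\sqrt{\lambda}$ and unit variance, and the significance level, input values, and function names are hypothetical.

```python
import math
from statistics import NormalDist


def noncentrality(r2, n_C):
    """lambda = n_C * r^2 / (1 - r^2)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return n_C * r2 / (1.0 - r2)


def power_one_df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with non-centrality parameter ncp."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # square root of the chi-square cutoff
    shift = math.sqrt(ncp)
    # P(Z^2 > cutoff) for Z ~ N(shift, 1), summing both tails.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical example: r^2 = 0.2 evaluated in 100 candidate individuals.
ncp = noncentrality(r2=0.2, n_C=100)
print(f"lambda = {ncp:.1f}, power = {power_one_df(ncp):.4f}")
```

Setting `ncp` to zero returns the significance level itself, which is a quick sanity check on the power function.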

In genomic prediction, the additive genomic relationship matrix can be used to estimate $m_{e.a}$. Given the additive genomic relationship matrix A, an n_T × n_T matrix, we estimate the variance, $\sigma _{A_o}^2$, of its $\frac{{n_T\left( {n_T - 1} \right)}}{2}$ off-diagonal elements and obtain $\hat m_{e.a} = \frac{1}{{\sigma _{A_o}^2}}$; similarly, from the dominance genomic relationship matrix we have $\hat m_{e.d} = \frac{1}{{\sigma _{D_o}^2}}$ for the dominance effective number of markers.
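The estimator above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the authors' implementation: each marker coding is standardized to zero mean and unit variance before the relationship matrix is formed, the dominance coding is taken to be a heterozygote indicator, and the markers are simulated as unlinked F2 loci. All names and sizes are hypothetical.

```python
import random
import statistics


def standardize(column):
    """Centre and scale one marker coding across individuals."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0.0:
        raise ValueError("monomorphic coding cannot be standardized")
    return [(value - mean) / sd for value in column]


def relationship_matrix(codes):
    """codes[j][i] is the coding of marker j for individual i."""
    z = [standardize(column) for column in codes]
    m, n = len(z), len(z[0])
    return [[sum(z[j][i] * z[j][k] for j in range(m)) / m for k in range(n)]
            for i in range(n)]


def effective_markers_from_grm(grm):
    """1 / variance of the n(n-1)/2 off-diagonal elements."""
    n = len(grm)
    off_diag = [grm[i][k] for i in range(n) for k in range(i + 1, n)]
    return 1.0 / statistics.pvariance(off_diag)


# Hypothetical data: 60 F2 individuals, 50 unlinked markers (genotypes 0/1/2).
rng = random.Random(7)
n_T, m = 60, 50
genotypes = [rng.choices([0, 1, 2], weights=[1, 2, 1], k=n_T) for _ in range(m)]
additive = [[float(g) for g in column] for column in genotypes]
# Assumed dominance coding: 1 for heterozygotes, 0 otherwise.
dominance = [[1.0 if g == 1 else 0.0 for g in column] for column in genotypes]

m_ea_hat = effective_markers_from_grm(relationship_matrix(additive))
m_ed_hat = effective_markers_from_grm(relationship_matrix(dominance))
print(f"m = {m}, estimated m_e.a = {m_ea_hat:.1f}, estimated m_e.d = {m_ed_hat:.1f}")
```

Because the simulated markers are unlinked, both estimates should land near m; with linked markers they would be expected to fall below m.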

About this article

Cite this article

Liu, H., Chen, GB. A new genomic prediction method with additive-dominance effects in the least-squares framework. Heredity 121, 196–204 (2018). https://doi.org/10.1038/s41437-018-0099-5

Received: 31 January 2018
Accepted: 23 May 2018
Published: 20 June 2018
Issue Date: August 2018
DOI: https://doi.org/10.1038/s41437-018-0099-5
