Variance components model with disequilibria

Yuan, Ao; Chen, Guanjie; Yang, Qi; Rotimi, Charles; Bonney, George

doi:10.1038/sj.ejhg.5201645

Published: 24 May 2006

European Journal of Human Genetics volume 14, pages 941–952 (2006)

Abstract

The variance components (VC) model has been popular for genetic analysis. It has been widely applied in genetic studies and extended to various forms for different settings. However, most existing VC models assume, explicitly or implicitly, Hardy–Weinberg and/or linkage equilibrium, which is unrealistic in some settings since deviations from these assumptions are common. We propose a new VC model that incorporates both disequilibria and includes the existing models as special cases. The corresponding variance components are computed for some commonly used relative pairs, conditional on the observed marker identity-by-descent data. Parameters can be estimated by traditional methods such as maximum likelihood. Simulation studies suggest that this extended model improves inference significantly over the existing models when such disequilibria are present.

Introduction

The variance components (VC) models¹⁻⁸ have received much attention and wide application in studies of quantitative genetic traits, as the method requires few model assumptions. It has been extended to various forms for different data structures, algorithms and model assumptions: Lange and Boehnke⁹ extended it to multivariate traits, Duggirala et al¹⁰ applied it to dichotomous traits, Amos et al¹¹ studied its least squares estimation, and Andrade et al¹² extended it to longitudinal pedigree data. The model and its variants have been used extensively in genetic linkage analysis. However, most existing VC models assume, explicitly or implicitly, Hardy–Weinberg and/or linkage equilibrium. These fundamental assumptions are sometimes hard to justify, and in practice deviations from them are common. In linkage analysis the latter assumption may be inappropriate, since putative disease loci are usually in linkage disequilibrium (LD) with the flanking marker loci.¹³ Almasy et al¹⁴ proposed a combined linkage/disequilibrium analysis in which LD is incorporated into the VC model, and there are VC models for combined linkage and association studies.¹⁵ A VC model that incorporates both disequilibria is of practical value and has not appeared in the literature. Here we consider such a model in the settings of Hardy–Weinberg disequilibrium and/or LD, as an extension of the existing VC models. In our model the LD is parameterized via the trait-marker composite genotype, whereas in Almasy et al¹⁴ it is parameterized via the trait-marker alleles. The corresponding variance components are computed for some commonly used relative pairs, conditional on the observed marker identity-by-descent (IBD) data. Parameters can be estimated by traditional methods such as maximum likelihood estimation (MLE) under the normal model assumption.
This extended VC model is expected to give more accurate parameter estimates and more power, and can be used with pedigree data for linkage analysis and for combined linkage and LD mapping (association study).

The common VC model

We first describe the likelihood of the commonly used variance components model, for example as in Amos.⁵ Since the total likelihood is the product of the likelihoods of all the families under study, we present the model for a single family for simplicity.

Let Y_i be the trait value of the ith individual in the family. The VC model describing the trait value is

$$Y_i=\mu+g_i+G_i+\sum_j \eta_j x_{ij}+e_i, \tag{1}$$

where μ is the overall mean, g_i is the unobserved random major gene effect at the trait locus with alleles denoted by A and B, G_i is the unobserved polygenic effects,

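where the η_j's are effects associated with the covariates x_ij's, and e_i is the residual random error. The usual assumption is that g_i, G_i and e_i are uncorrelated and E(g_i)=E(G_i)=E(e_i)=0. Let p be the population proportion of allele A, and let a, d and −a denote the values of g_i for genotypes AA, AB and BB. Under the Hardy–Weinberg assumption one has E(g_i)=a(2p−1)+2p(1−p)d=0. The covariance between individuals i and j is

$$\mathrm{Cov}(Y_i, Y_j)=\begin{cases}\sigma_a^2+\sigma_d^2+\sigma_G^2+\sigma_e^2, & i=j,\\ 2\Phi_{ij}\sigma_a^2+\Delta_{7ij}\sigma_d^2+2\Phi_{ij}\sigma_G^2, & i\neq j,\end{cases} \tag{2}$$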

where σ_a²=2p(1−p)[a−d(2p−1)]² is the additive genetic variance due to the locus, σ²_d=4p²(1−p)²d² is the dominance genetic variance, Φ_ij=Δ_7ij/2+Δ_8ij/4 is the kinship coefficient¹⁶ between individuals i and j, and Δ_7ij, Δ_8ij, Δ_9ij are condensed kinship coefficients¹⁷ between individuals i and j. The Δ_kij's (k=1, …, 9) are the probabilities of the nine possible condensed IBD states defined by Jacquard,¹⁷ of which Δ_7ij, Δ_8ij and Δ_9ij are commonly used in practice. They are the population probabilities that individuals (i, j) share 2, 1 and 0 genes IBD; they depend only on the kinship relationship of (i, j) under Mendelian inheritance, not on their particular genotypes. Also, 2Φ_ij is the expected proportion of genes shared IBD by individuals (i, j) at this locus.
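
As a small numerical illustration of the quantities just defined, the following Python sketch evaluates σ²_a, σ²_d and Φ_ij for a biallelic locus. The function names and the example values (a=1, d=0.3, p=0.55, and the full-sib coefficients Δ_7=1/4, Δ_8=1/2) are illustrative assumptions rather than values prescribed by the model.

```python
def additive_variance(a, d, p):
    """sigma_a^2 = 2p(1-p)[a - d(2p-1)]^2 under Hardy-Weinberg equilibrium."""
    return 2.0 * p * (1.0 - p) * (a - d * (2.0 * p - 1.0)) ** 2


def dominance_variance(d, p):
    """sigma_d^2 = 4 p^2 (1-p)^2 d^2."""
    return 4.0 * p ** 2 * (1.0 - p) ** 2 * d ** 2


def kinship(delta7, delta8):
    """Phi_ij = Delta_7ij / 2 + Delta_8ij / 4."""
    return delta7 / 2.0 + delta8 / 4.0


# Assumed example: full sibs share 2, 1, 0 alleles IBD with prob. 1/4, 1/2, 1/4.
phi_sibs = kinship(0.25, 0.5)
print("kinship:", phi_sibs, "expected IBD proportion:", 2 * phi_sibs)
print("sigma_a^2:", additive_variance(1.0, 0.3, 0.55))
print("sigma_d^2:", dominance_variance(0.3, 0.55))
```

For full sibs the sketch gives Φ_ij=1/4, so the expected proportion of genes shared IBD is 1/2.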

For linkage analysis, IBD sharing data {π_ij} (π_ij=0, 1, 2) at a marker locus are usually available for a relative pair of individuals i and j. Amos⁵ proposed the following model for the conditional covariance

where θ is the recombination fraction between the trait and the marker loci. The values of f(θ, π_ij) and g(θ, π_ij) can be found in Amos.⁵ Note that g(θ, π_ij)=0 for most human relative pairs except full sibs, since it is related to the possibility of sharing two alleles IBD.

VC model with disequilibria

In this section we derive VC models with disequilibria in different settings, by incorporating these parameters into the covariances (2).

Hardy–Weinberg disequilibrium at trait locus

We first consider incorporating the Hardy–Weinberg disequilibrium at the trait locus into the VC model, without marker information. Let A_k denote allele k at the trait locus (k=1, …, K), p_k its proportion in the population, and p_kl the corresponding proportion of the genotype A_kA_l. One way to deal with deviation from the Hardy–Weinberg assumption is to use the within-population inbreeding coefficient^{18, 19} f at the trait locus, which is the probability that, at any gene, both alleles of the gene pair were inherited from the same ancestor. Let I(·) be the indicator function. Given f we have

$$p_{kl}=(1-f)\,p_k p_l+f\,p_k\,I(k=l). \tag{4}$$

Here 0≤f≤1, and f=0 corresponds to Hardy–Weinberg equilibrium. Let p_(kl)(km) be the conditional probability that two individuals have genotypes (A_kA_l, A_kA_m) or (A_lA_k, A_mA_k) at the trait locus given that they share A_k IBD (assuming random mating and known phase, these are the only cases in which they share A_k IBD; the possibilities for the cases (A_kA_l, A_mA_k) or (A_lA_k, A_kA_m) are negligible). Let Y be the trait value of a general individual and g be his/her genotype, and μ_kl=E(Y∣g=A_kA_l). Following Fisher¹ and Lange,¹⁶ let the α_k's be the optimal additive genetic effects in the sense that they minimize the sum of squared residuals Σ_kΣ_lδ²_klp_kl, where δ_kl=μ_kl−α_k−α_l. We show in Appendix A that

and

where γ₇(f)=(1+(f/2))σ²_a+(1−f)σ²_d+fσ²₀, γ₈(f)=((1+f)²/2)σ²_a, σ²_a=2Σ_kα²_kp_k, σ²_d=Σ_kΣ_lδ²_klp_kp_l, σ²₀=Σ_kδ²_kkp_k is the part of variance explained by the optimal additive genetic effects, and α_k=Σ_lμ_klp_kl/[(1+f)p_k] for all k.

Note that if f=0, (6) reduces to (2). The α_k's and δ_kl's are the optimal additive major gene effects and the residual effects.¹⁶
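
The decomposition in this section can be sketched numerically. The Python code below builds the genotype frequencies p_kl=(1−f)p_kp_l+fp_kI(k=l) and then evaluates α_k, δ_kl, σ²_a, σ²_d, σ²₀, γ₇(f) and γ₈(f) exactly as they are written above. The function names, the centering of the genotype means (so that the major gene effect has mean zero) and the example values a=1, d=0.3 are assumptions of this sketch.

```python
def genotype_freqs(p, f):
    """p_kl = (1-f) p_k p_l + f p_k I(k=l) for allele frequencies p."""
    K = len(p)
    return [[(1 - f) * p[k] * p[l] + (f * p[k] if k == l else 0.0)
             for l in range(K)] for k in range(K)]


def decompose(mu, p, f):
    """Return (sigma_a2, sigma_d2, sigma_02, gamma7, gamma8) from means mu[k][l]."""
    K = len(p)
    P = genotype_freqs(p, f)
    # Assumption: centre the genotype means so the major gene effect has mean 0.
    m = sum(mu[k][l] * P[k][l] for k in range(K) for l in range(K))
    c = [[mu[k][l] - m for l in range(K)] for k in range(K)]
    # alpha_k = sum_l mu_kl p_kl / [(1+f) p_k]
    alpha = [sum(c[k][l] * P[k][l] for l in range(K)) / ((1 + f) * p[k])
             for k in range(K)]
    # delta_kl = mu_kl - alpha_k - alpha_l
    delta = [[c[k][l] - alpha[k] - alpha[l] for l in range(K)] for k in range(K)]
    sa2 = 2 * sum(alpha[k] ** 2 * p[k] for k in range(K))
    sd2 = sum(delta[k][l] ** 2 * p[k] * p[l] for k in range(K) for l in range(K))
    s02 = sum(delta[k][k] ** 2 * p[k] for k in range(K))
    gamma7 = (1 + f / 2) * sa2 + (1 - f) * sd2 + f * s02
    gamma8 = (1 + f) ** 2 / 2 * sa2
    return sa2, sd2, s02, gamma7, gamma8


a, d = 1.0, 0.3                      # hypothetical genotypic values
mu = [[a, d], [d, -a]]               # means for A1A1, A1A2, A2A2
print(decompose(mu, [0.55, 0.45], 0.0))
print(decompose(mu, [0.55, 0.45], 0.12))
```

With f=0 the sketch returns γ₇=σ²_a+σ²_d and γ₈=σ²_a/2, and σ²_a and σ²_d agree with the biallelic closed forms 2p(1−p)[a−d(2p−1)]² and 4p²(1−p)²d².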

Linkage to marker

Now we consider the case with marker information available in addition to the trait locus data. Let π_ij (=0, 1, 2) be the number of alleles shared IBD by individuals i and j at the marker locus, π′_ij be the corresponding unobserved number at the trait locus, and θ be the recombination fraction between the two loci. Expressions for Cov(Y_i, Y_j∣π_ij=k) can be found by the formula

Usually, the IBD data π_ij are not directly available for each pair. However, the probabilities P(π_ij=k) (k=0, 1, 2) can be computed from the corresponding observed marker genotypes, so the covariance between the individual pair (i, j) in a given family is

Covariance with Hardy–Weinberg disequilibrium at trait given marker IBD

In the previous section, we derived the variance components under Hardy–Weinberg disequilibrium at the trait locus. Here we give these components with the linked marker information, that is, conditional on the marker IBD data. In this case the variance components are

where Δ_7ij(π_ij)=P(π′_ij=2∣π_ij), Δ_8ij(π_ij)=P(π′_ij=1∣π_ij) and Δ_9ij(π_ij)=P(π′_ij=0∣π_ij) are the conditional IBD sharing at the trait locus given the IBD sharing at the marker locus, for individuals (i, j). The derivation is the same as that for Cov(Y_i, Y_j∣f) with Δ_7ij and Δ_8ij replaced by Δ_7ij(π_ij) and Δ_8ij(π_ij), whose values are obtained from the relationships

and the known values of P(π′_ij=0, π_ij) as listed in the literature cited before. Note that given π′_ij=0, Y_i and Y_j are independent and Cov(Y_i, Y_j∣f, π′_ij=0)=0, so there is no term for Δ_9ij(π_ij).
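
For full sibs the conditional trait IBD probabilities given the marker IBD have a classical closed form in Ψ=θ²+(1−θ)² (the Haseman–Elston result). The Python sketch below tabulates them and combines Δ_7ij(π_ij) and Δ_8ij(π_ij) with γ₇(f) and γ₈(f) as described in the text. The table is supplied here as a standard result for sib pairs, not copied from the article, and the function names and the numerical values of γ₇ and γ₈ are hypothetical.

```python
def sib_trait_ibd_given_marker(theta):
    """P(pi' = c | pi = r) for full sibs; rows r = 0, 1, 2 and columns c = 0, 1, 2."""
    psi = theta ** 2 + (1 - theta) ** 2
    return [
        [psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2],
        [psi * (1 - psi), 1 - 2 * psi * (1 - psi), psi * (1 - psi)],
        [(1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2],
    ]


def major_gene_cov(pi, theta, gamma7, gamma8):
    """Delta_7(pi) * gamma7 + Delta_8(pi) * gamma8 (no Delta_9 term)."""
    row = sib_trait_ibd_given_marker(theta)[pi]
    delta7, delta8 = row[2], row[1]
    return delta7 * gamma7 + delta8 * gamma8


# Hypothetical gamma values, with theta = 0.25 as in the simulation study.
for pi in (0, 1, 2):
    print(pi, major_gene_cov(pi, 0.25, 1.3, 0.5))
```

At θ=0 the marker IBD determines the trait IBD exactly, while at θ=1/2 every row reduces to the unconditional sib-pair probabilities (1/4, 1/2, 1/4).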

Since in real data the set {π_ij} is unobservable and only the computed set of probabilities {P(π_ij=k)} is available, the covariance is

Covariance with LD between trait and marker

In linkage analysis, LD between the trait locus and the marker locus should be taken into consideration. In this section we compute the covariances between relative pairs when, in addition to Hardy–Weinberg disequilibrium, LD is also present between the trait and marker loci. Let a_k and a_ka_l denote the alleles and genotypes at the marker locus, and q_k and q_kl the corresponding population frequencies. Since the within-population inbreeding coefficient f is common to all loci in the genome of the given population, f describes the relationship between the marker genotype frequencies q_kl's and the allele frequencies q_k's in the same way as it does between the p_kl's and p_k's at the trait locus. That is, we have

$$q_{kl}=(1-f)\,q_k q_l+f\,q_k\,I(k=l). \tag{10}$$

Let G=(a_ra_s)/(A_kA_l) be a general notation for the trait-marker composite genotype. We assume

It is easy to check that under (11), Σ_kΣ_lp_(kl, rs)=q_rs and Σ_rΣ_sp_(kl, rs)=p_kl; that is, the probabilities of the composite genotypes are consistent with their marginal probabilities. Here 0≤ζ≤1 is the LD parameter, which should not be confused with the definitions of LD used in some texts, such as Weir²⁰ or Almasy et al.¹⁴ Note that ζ=0 corresponds to linkage equilibrium. Also, ζ describes the vertical connection between the trait and marker loci, while the recombination fraction describes the horizontal link between the alleles.

For a relative pair, let p_(kl)(mn)∣π′_ij=P(A_kA_l, A_mA_n∣π′_ij) be the conditional probability that individual i has trait genotype A_kA_l and individual j has trait genotype A_mA_n given their IBD value π′_ij at this locus, p_klm∣π′_ij=½P(A_kA_l, A_kA_m∣π′_ij)+½P(A_kA_l, A_mA_k∣π′_ij) be the probability when they also share one allele identical by state (IBS) at the trait; p_kl∣π′_ij=P(A_kA_l, A_kA_l∣π′_ij) be the probability when they share both alleles IBS at the trait locus. We have (Appendix B)

where γ₇(f, ζ, π_ij), γ₈(f, ζ, π_ij) and γ₉(f, ζ, π_ij) denote respectively Cov(g_i, g_j∣f, ζ, π_ij, π′_ij=2), Cov(g_i, g_j∣f, ζ, π_ij, π′_ij=1) and Cov(g_i, g_j∣f, ζ, π_ij, π′_ij=0). Note that when conditioning on the IBD values at both the trait and marker loci, we cannot assert Cov(g_i, g_j∣f, ζ, π_ij, π′_ij=0)=0 as we did in the previous section. We have γ₇(f, ζ, π_ij)≡γ₇(f),

and

where

which is also written as

Since the genetic covariance between the relative pair can be written as

by (12), when π′_ij=2 or π_ij=0, the expression for the genetic covariance between a relative pair is the same regardless of whether LD is present. In fact, from the derivation in Appendix B, this conclusion holds for any consistent composite genotype specification: under random mating and any consistent specification P(G) of the composite genotype, the IBD status (π′_ij, π_ij) of a relative pair (i, j) contributes information about LD to their genetic covariance at the trait locus only if π′_ij≤1 and π_ij≥1.

Again in practice, given the estimated IBD probabilities, the covariance is computed as

Parameter estimation

Let β=(μ, η₁, …, η_j)^T be the parameters in the mean, α=(θ, f, ζ, σ²_a, σ²_d, σ²_G, σ²_e, σ²₀, σ²₁, σ²₂, σ²₃, σ₁₁, σ₁₂, σ₁₃)^T be the parameters in the covariance matrices, y_k be the observations of all the members in the kth family, and μ_k=μ_k(β)=E(Y_k)=X_kβ, where X_k is the covariate matrix for the kth family and n_k is the total number of individuals in this family. The most commonly used model-based estimation method is maximum likelihood, and the common model for a quantitative trait is the normal distribution. Under these assumptions, the likelihood of the kth family is L_k(α, β∣Y_k)=φ(Y_k−μ_k∣Ω_k), where φ(Y−μ∣Ω) is the density of the n_k-dimensional normal N(μ, Ω) distribution and Ω_k is the covariance matrix of the kth family, with

as specified in (12) in the most general case. The P(π_ij=r∣g_ij)'s can be obtained by common IBD computation methods. The covariances can also take any of the more specific forms (8), (6), (3) and (2) in the equilibrium case. Here we write (Y_i, Y_j) for (Y_ki, Y_kj), the (i, j)th relative pair in the kth family. The total likelihood is thus L(α, β∣Y)=Π_{k=1}^{n}L_k(α, β∣Y_k), where n is the number of families, and the log-likelihood, omitting the normalizing constant, is

$$\ell(\alpha,\beta\mid Y)=-\frac{1}{2}\sum_{k=1}^{n}\left[\log|\Omega_k|+(Y_k-\mu_k)^T\Omega_k^{-1}(Y_k-\mu_k)\right]. \tag{14}$$

The MLE is the parameter value (α̂, β̂) that maximizes (14), and it has many desirable optimality properties.
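
The normal log-likelihood described above can be sketched for the sib-pair case (n_k=2), where the determinant and inverse of the 2×2 covariance matrix can be written out by hand. This is a minimal illustration in Python with hypothetical function names; it evaluates the log-likelihood at given parameter values and does not perform the maximization.

```python
import math


def pair_loglik(y, mu, omega):
    """-0.5 * [log|Omega| + (y-mu)' Omega^{-1} (y-mu)] for a 2x2 covariance."""
    (a, b), (c, d) = omega
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    r0, r1 = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / det
    return -0.5 * (math.log(det) + quad)


def total_loglik(pairs):
    """Sum over families; `pairs` is an iterable of (y, mu, omega) triples."""
    return sum(pair_loglik(y, mu, om) for y, mu, om in pairs)


# Hypothetical data for two sib pairs sharing one covariance matrix.
omega = ((2.0, 0.5), (0.5, 2.0))
data = [((24.1, 22.8), (23.0, 23.0), omega), ((21.9, 23.5), (23.0, 23.0), omega)]
print(total_loglik(data))
```

In practice the entries of each Ω_k would be filled in from the covariance formulas of the preceding sections and the sum maximized numerically over (α, β).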

Power

The power of the method can be easily estimated and, as will be shown, depends only on the parameters α in the covariance matrix. Let H₀:α=α₀ and H₁:α=α₁ (H₀⊂H₁, or α₀ is part of α₁) be the null and alternative hypotheses considered in the previous sections, dim(H₁)−dim(H₀)=k, and f(·∣α, β) be the density of the model considered. Let α̂₀ and α̂₁ be the MLEs of α under H₀ and H₁, respectively. Note that our hypotheses involve only α, not the parameters β in the mean specification. Let

$$D(\alpha_1\,\|\,\alpha_0)=\int f(y\mid\alpha_1,\beta_1)\,\log\frac{f(y\mid\alpha_1,\beta_1)}{f(y\mid\alpha_0,\beta_0)}\,dy$$

be the relative entropy (Kullback–Leibler divergence) between the two densities f(·∣α₁, β₁) and f(·∣α₀, β₀). It is known that D(α₁∣∣α₀)≥0, with equality if and only if α₁=α₀. Assuming homogeneous familial structures for all the families, for a given level γ>0, the asymptotic power q_n of the likelihood ratio test of H₀ vs H₁, with a dataset of n families, is (Appendix C)

where V_k is the χ² random variable with k degrees of freedom and χ²_k(1−γ) is its 1−γ upper quantile.

Given f(·∣·, ·), α₁ and α₀, D(α₁∣∣α₀) can be easily computed. In fact, since our model f(·∣·, ·) is multivariate normal, it is easy to see that

where d=dim(μ), Ω(·) is the Ω_k's with the elements given in (12), in which the τ_ij's take the theoretical mean values. To plot the power surface, we fix the parameter values at their MLE, except those for f and ζ. Then for a given γ>0 and a set of selected (f, ζ) values, we can compute q_n=q_n(γ, f, ζ) for different γ, f, ζ and n.
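
For two normal densities with the same mean, the relative entropy has a standard closed form, D=½[tr(Ω₀⁻¹Ω₁)−d+log(|Ω₀|/|Ω₁|)], where Ω₁ and Ω₀ are the covariance matrices under α₁ and α₀. The Python sketch below evaluates it for the sib-pair case d=2. It is offered as an assumption-laden illustration: the arrangement of the article's own display may differ, and the function name and example matrices are hypothetical.

```python
import math


def kl_gauss_2x2(omega1, omega0):
    """KL divergence D(N(mu, omega1) || N(mu, omega0)) for 2x2 covariances."""
    (a1, b1), (c1, d1) = omega1
    (a0, b0), (c0, d0) = omega0
    det1 = a1 * d1 - b1 * c1
    det0 = a0 * d0 - b0 * c0
    # trace(omega0^{-1} omega1) via the closed-form inverse of a 2x2 matrix.
    trace = (d0 * a1 - b0 * c1 - c0 * b1 + a0 * d1) / det0
    return 0.5 * (trace - 2.0 + math.log(det0 / det1))


# Hypothetical covariance matrices under H1 and H0.
omega_h1 = ((2.2, 0.7), (0.7, 2.2))
omega_h0 = ((2.0, 0.5), (0.5, 2.0))
print(kl_gauss_2x2(omega_h1, omega_h0))
```

The divergence is zero when the two matrices coincide and positive otherwise, in line with the property D(α₁∣∣α₀)≥0 stated above.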

Application

Simulation study

Data of 10 000 sibpairs are simulated in our study. We give some detailed description of how the two levels of disequilibria are incorporated in the simulation process. It can be described in the following three steps.

Step 1

For each sibpair we simulate their trait genotypes g_i's and the marker IBD values π_ij's. Let G_i=(a_ra_s)/(A_kA_l) be the composite genotype of the trait and marker for the ith individual, with lower case letters a_ra_s for the marker genotype. We simulate (G_i, G_j) for each sibpair, and π_ij is generated along with them. We first generate the composite genotypes G_f of the father and G_m of the mother by the probability given in (11) with ζ=0.1, where p_kl and q_rs are given by (4) and (10) with f=0.12, p₁=0.55, p₂=0.45, q₁=0.65 and q₂=0.35. Although (G_f, G_m) are not part of the data used in the computation, they are needed to generate the sibs' composite genotypes. Given (G_f, G_m), we generate G_i, G_j and π_ij as follows. Let G_f=(a_f1 a_f2)/(A_f1 A_f2) and G_m=(a_m1 a_m2)/(A_m1 A_m2). During meiosis, if there is no recombination (with probability 1−θ, θ=0.25), G_f splits into two gametes (a_f1/A_f1) and (a_f2/A_f2); if there is recombination, it splits into (a_f1/A_f2) and (a_f2/A_f1). Then one of the gametes is selected with probability 0.5 to pass to the next generation. Here we only consider the recombination at the marker, since we want the IBD π_ij at the marker. The recombination at the trait is similar, and we omit it for simplicity, since this will not affect the probabilities of the G_i's. Similarly, G_m will split into (a_m1/A_m1) and (a_m2/A_m2), or (a_m2/A_m1) and (a_m1/A_m2), and one of the gametes is selected with probability 0.5 to pass to the next generation. For example, if for the father there is recombination during meiosis and (a_f1/A_f2) is selected, and for the mother there is no recombination during meiosis and (a_m1/A_m1) is selected, then G_i=(a_f1 a_m1)/(A_f2 A_m1) and g_i=(A_f2A_m1). Repeat the above process to get, say, G_j=(a_f2 a_m1)/(A_f1 A_m1) and g_j=(A_f1A_m1). Since at the marker locus the sibpair (i, j) has the composite genotype (a_f1a_m1, a_f2a_m1), we have π_ij=1, which comes from the common maternal allele a_m1.
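
The transmission part of Step 1 can be sketched as follows. The Python code takes the parental composite genotypes as given (drawing them from (11) is not shown, since that requires the composite genotype probabilities), passes one gamete from each parent to each sib with recombination fraction θ between marker and trait, and counts the marker alleles shared IBD. The function names and the way haplotypes are stored as (marker, trait) tuples are assumptions of this sketch.

```python
import random


def transmit(parent, theta, rng):
    """Pass one gamete from a parent given as two (marker, trait) haplotypes.

    Returns (marker_allele, trait_allele, marker_origin), where marker_origin
    (0 or 1) records which parental haplotype supplied the marker allele.
    """
    h = rng.randrange(2)                 # haplotype supplying the marker allele
    recombined = rng.random() < theta    # recombination between marker and trait
    t = 1 - h if recombined else h       # haplotype supplying the trait allele
    return parent[h][0], parent[t][1], h


def simulate_sibpair(father, mother, theta, rng):
    """Return each sib's (marker genotype, trait genotype) and the marker IBD count."""
    sibs, origins = [], []
    for _ in range(2):
        fm, ft, fo = transmit(father, theta, rng)
        mm, mt, mo = transmit(mother, theta, rng)
        sibs.append(((fm, mm), (ft, mt)))
        origins.append((fo, mo))
    # A marker allele is shared IBD when both sibs received the same parental copy.
    pi = sum(a == b for a, b in zip(origins[0], origins[1]))
    return sibs[0], sibs[1], pi


rng = random.Random(1)
father = (("a_f1", "A_f1"), ("a_f2", "A_f2"))
mother = (("a_m1", "A_m1"), ("a_m2", "A_m2"))
print(simulate_sibpair(father, mother, 0.25, rng))
```

Tracking the parental origin of each transmitted marker allele, rather than comparing allele labels, gives IBD rather than identity by state when the parents carry repeated alleles.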

Step 2

Simulate each pair's covariates. The mean μ_i of the ith individual is given by (1). Specifically, we take μ=23, and g_i=1 if individual i has genotype A₁A₁, 0 if A₁A₂, and −1 if A₂A₂. We take G_i∼N(0, σ²_G) with σ²_G=0.2. Two covariates are generated, x_i1 and x_i2, standing for age (years) and sex of the ith individual, with x_i2=1 for female and 0 for male. The coefficient for age is η₁=0.2 and that for sex is η₂=1.5. e_i is the random error from the N(0, 1) distribution. We always assume that the first sib is younger, with x_i1∼U[10, 60]; for the second sib, x_j1=x_i1+z with z∼U[1, 10]. For x_i2, using the gender ratio from the real data, we sample z∼U(0, 1) and let x_i2=1 (female) if z≤0.54, otherwise 0 (male).
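
A minimal Python sketch of the covariate draw in Step 2, using the distributions and coefficients quoted above. The function names are hypothetical, and the major gene effect g_i is left out of the fixed-effects mean so that it can be added separately.

```python
import random

MU, ETA_AGE, ETA_SEX = 23.0, 0.2, 1.5   # values quoted in Step 2


def simulate_covariates(rng):
    """Return ((age_i, sex_i), (age_j, sex_j)) for one sib pair."""
    age_i = rng.uniform(10, 60)             # first (younger) sib
    age_j = age_i + rng.uniform(1, 10)      # second sib is 1 to 10 years older
    sex_i = 1 if rng.random() <= 0.54 else 0   # 1 = female, 0 = male
    sex_j = 1 if rng.random() <= 0.54 else 0
    return (age_i, sex_i), (age_j, sex_j)


def fixed_effects_mean(age, sex):
    """mu + eta_1 * age + eta_2 * sex (major gene effect not included)."""
    return MU + ETA_AGE * age + ETA_SEX * sex


rng = random.Random(0)
sib_i, sib_j = simulate_covariates(rng)
print(sib_i, fixed_effects_mean(*sib_i))
print(sib_j, fixed_effects_mean(*sib_j))
```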

Step 3

Simulate the sibpair covariance matrices Ω_ij=Cov(Y_i, Y_j)=(ω_ij) and the final observed trait values. By (3.9), ω₁₁=ω₂₂=(1+f/2)σ²_a+(1−f)σ²_d+fσ²₀+σ²_G+σ²_e, where σ²_a=2Σ_kα²_kp_k, σ²_d=Σ_k,lδ²_klp_kp_l, σ²₀=Σ_kδ²_kkp_k, p_k is the population proportion of allele A_k, δ_kl=μ_kl−α_k−α_l, α_k=Σ_lμ_klp_kl/[(1+f)p_k], p_kl=(1−f)p_kp_l+fp_kI(l=k) is the population proportion of genotype A_kA_l, and μ_kl=E(Y∣g=A_kA_l)=μ+g_kl+η₁E(x_i1)+η₂E(x_i2)=μ+g_kl+40η₁+0.54η₂, as given in (3.9); for sibpairs Φ_ij=1/4. Δ_kij(π_ij) is defined after (8) and can be found in Wright,¹⁸ where they are expressed in terms of the recombination fraction θ. The marker IBD data π_ij's are generated above; the trait IBD values π′_ij are unknown, but only the conditional probabilities P(π′_ij∣π_ij) are used, which are easily derived.²⁰ The γ_k(f, ζ, π_ij)'s are defined after (12). The definition of σ_1,2 involves p_(kl)(km), which is given in the definition of the γ_k(f, ζ, π_ij)'s. Now we have Ω_ij and are ready to simulate the y_i's. We simulate the data pairwise. For a sibpair (y_i, y_j), denote Y=(y_i, y_j) and μ=(μ_i, μ_j). We sample Z∼N(0, I₂), the two-dimensional standard normal distribution, let Y=μ+Ω_ij^{1/2}Z, and simulate such Y 10 000 times.
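
The final sampling step, drawing a pair with mean μ and covariance Ω_ij from a standard normal vector Z, can be sketched as below. The sketch uses the Cholesky factor L (with LLᵀ=Ω_ij) as the matrix square root, which is one valid choice and an assumption here, since the text does not say which square root was used. The function names and the example covariance matrix are hypothetical.

```python
import math
import random


def cholesky_2x2(omega):
    """Lower-triangular factor (l11, l21, l22) with L L' = omega."""
    (a, b), (_, d) = omega
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def sample_pair(mu, omega, rng):
    """Draw (y_i, y_j) = mu + L Z with Z ~ N(0, I_2)."""
    l11, l21, l22 = cholesky_2x2(omega)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2


rng = random.Random(0)
omega = ((2.0, 0.5), (0.5, 2.0))     # hypothetical sib-pair covariance matrix
print(sample_pair((32.5, 33.0), omega, rng))
```

Any matrix A with AAᵀ=Ω_ij yields the same distribution for Y, so the choice of square root does not affect the simulated data in distribution.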

For γ₈(f, ζ, π_ij) in the case π_ij=2, σ_1,1, σ_1,2 and σ_1,3 are not independently estimable, so in this case we write γ₈(f, ζ, 2)=(1−ζ)γ₈(f)+σ₄², where σ₄²=−ζ(1+f)σ_1,1+2ζσ_1,2+ζ²σ_1,3 is viewed as a single parameter to be estimated.

Table 1 displays the true values of the parameters of interest in the simulation and their MLEs (estimated standard deviations in brackets) under H₀: f=ζ=0 and H₁: all parameters free, respectively.

Table 1 Parameter estimates for the simulated data under H₀ and H₁

The difference 2(log likelihood(H₁)−log likelihood(H₀))=20.9934, with a P-value of 0.000106 under a χ² distribution with two degrees of freedom, that is, the evidence of rejecting H₀ is very strong. This example shows that incorporating the disequilibria mechanism into the variance components model can improve the inference significantly when such disequilibria are present.
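
The tail probability of a likelihood ratio statistic can be checked with a few lines of Python. The sketch below implements the χ² upper tail for small degrees of freedom from closed forms in the standard library; the function name is hypothetical and the degrees of freedom are left as an argument. Evaluated at 20.9934, the two-degree-of-freedom tail is about 2.8×10⁻⁵ and the three-degree-of-freedom tail is about 1.06×10⁻⁴, so the appropriate value depends on how many parameters are freed under H₁.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(V > x) for a chi-square variable with df in {1, 2, 3}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2, 3 are implemented in this sketch")


stat = 20.9934   # likelihood ratio statistic quoted in the text
for df in (2, 3):
    print(df, chi2_sf(stat, df))
```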

Real data application

We used the AADM data (African-American Diabetes mellitus) to illustrate the method. The data are from an international collaboration between West Africa and US investigators in mapping type II diabetes susceptibility genes in West African ancestral populations of African-Americans. Affected sib-pairs along with unaffected spouse controls were enrolled. Eligible participants were invited to study clinics to obtain detailed epidemiological, familial and medical history information. For a detailed description of the data, see Rotimi et al.²¹ For these data we computed the model parameter estimates under the VC model (2), that is, under the hypothesis of equilibria, H₀: f=ζ=0, and under the VC model with Hardy–Weinberg disequilibrium and LD (12), H₁: f and ζ are free parameters. The response variable is BMI and the covariate is age. The results are shown in Table 2, where the estimated standard deviations are listed inside the brackets.

Table 2 Parameter estimates for the AADM data under H₀ and H₁

The −2 log-likelihood difference is 12.5076 with a P-value of 0.0058, which is highly significant, so the inference should be based on H₁. We see a large Hardy–Weinberg disequilibrium at the trait locus, suggesting that the genetic background of the sample under study is not as simple as assumed by the existing VC model. The low recombination rate (0.0016) indicates that the trait and marker loci are tightly linked, and the LD between the trait and marker is non-negligible. The overall BMI of this sample is 23.58, and the age effect is 0.053, which are quite common for normal populations.

The power depends on all the parameters in the model; we highlight its dependence on (f, ζ). Using (15) and the parameters above, Figure 1 shows the power of the likelihood ratio test of H₀ vs H₁ for various combinations of f, ζ and n.

Since the LD depends on the unobservable trait genotype, it needs a larger sample size to detect. For the real data, with the observations and the estimated parameter setting, it is easy to detect the Hardy–Weinberg disequilibrium with a reasonable sample size, while detecting the LD is very difficult or requires a very large sample size to achieve high power. For the simulated data and parameter setting, the powers are high for the Hardy–Weinberg disequilibrium, for the LD, and for the two disequilibria jointly.

The software for this extended VC model is written in SAS; the current version is for sibpair familial structure only, and is available upon request from the second author at gchen@genomecenter.howard.edu. The CPU time to compute the parameter estimates depends on the machine, data size, number of regressors, pedigree structure and starting values for the parameters etc. For the two examples above, with suitably chosen starting values, the CPU times for computing the MLEs are 27.24 and 27.33 s on our machine.

Discussion

We have generalized the VC model to the cases of Hardy–Weinberg disequilibrium, LD, or both, which widens the practical application of this popular model. In some applications, the equilibrium assumptions are not justified. In these cases the existing VC model is clearly inadequate, and our generalized VC model may give more accurate estimates and enhance the power of inference on the parameters of interest. This generalized model can also be used to test for these disequilibria by forming the corresponding likelihood ratio statistic, along with the parameter estimates. Other inferences on one or both of the two disequilibria are sometimes also of direct interest, and are now available under this generalized VC model.

We computed the variance components for some common relative pairs. The cases of other relative pairs are similar and straightforward. We considered the parameter estimation in several ways and computed the IBD under some common cases.

Further extensions/modifications to implement more features will be similar, such as the multivariate traits,⁹ the multipoint VC, dichotomous trait, robust LOD score correction,⁷ the conditioning adjustment.²¹

References

1. Fisher RA: The correlation between relatives on the supposition of Mendelian inheritance. Trans Roy Soc Edinburgh 1918; 52: 399–433.
2. Harris DL: Genotypic covariances between inbred relatives. Genetics 1964; 50: 1319–1348.
3. Amos CI, Elston RC: Robust methods for the detection of genetic linkage for quantitative data from pedigrees. Genet Epidemiol 1989; 6: 306–349.
4. Goldgar DE: Multipoint analysis of human quantitative genetic variation. Am J Human Genet 1990; 47: 957–967.
5. Amos CI: Robust variance-components approach for assessing gene linkage in pedigrees. Am J Human Genet 1994; 54: 535–543.
6. Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Human Genet 1998; 62: 1198–1211.
7. Blangero J, Williams JT, Almasy L: Robust LOD score for variance component-based linkage analysis. Genet Epidemiol 2000; 19 (Suppl 1): S8–S14.
8. Sham PC, Purcell S: Equivalence between Haseman–Elston and variance-components linkage analysis for sib pairs. Am J Human Genet 2001; 68: 1527–1532.
9. Lange K, Boehnke M: Extensions to pedigree analysis. IV covariance components models for multivariate traits. Am J Med Genet 1983; 14: 513–524.
10. Duggirala R, Williams JT, Williams-Blangero S, Blangero J: A variance component approach to dichotomous trait linkage analysis using a threshold model. Genet Epidemiol 1997; 14: 987–992.
11. Amos CI, Gu X, Chen J, Davis BR: Least squares estimation of variance components for linkage. Genet Epidemiol 2000; 19 (Suppl 1): S1–S7.
12. Andrade M, Gueguen R, Visvikis S, Sass C, Siest G, Amos C: Extension of variance components approach to incorporate temporal trends and longitudinal pedigree data analysis. Genet Epidemiol 2002; 22: 221–232.
13. Xiong M, Jin L: Combined linkage and linkage disequilibrium mapping for genome screens. Genet Epidemiol 2000; 19: 211–234.
14. Almasy L, Williams J, Dyer T, Blangero J: Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet Epidemiol 1999; 17 (Suppl 1): S31–S36.
15. Fulker DW, Cherny SS, Sham PC, Hewitt JK: Combined linkage and association sib-pair analysis for quantitative traits. Am J Human Genet 1999; 64: 259–267.
16. Lange K: Mathematical and statistical methods for genetic analysis. Berlin: Springer-Verlag, 1997.
17. Jacquard A: The genetic structure of populations. New York: Springer-Verlag, 1974.
18. Wright S: The genetic structure of populations. Ann Eugenics 1951; 15: 323–354.
19. Cockerham CC: Variance of gene frequencies. Evolution 1969; 23: 72–84.
20. Weir B: Genetic Data Analysis II. Sunderland, Massachusetts: Sinauer Associates, Inc. Publishers, 1996.
21. Rotimi CN, Dunston GM, Berg K: In search of susceptibility genes for type 2 diabetes in West Africa: The design and results of the first phase of the AADM study. Ann Epidemiol 2001; 11: 51–58.
22. Lange K: Central limit theorems for pedigrees. J Math Biol 1978; 6: 59–66.

Acknowledgements

We appreciate the suggestions/comments from the Editor and the two anonymous reviewers, which greatly improved the quality of this manuscript. The work was supported by the United States Public Service Grant No AG 16996 and the National Center for Research Resources Grant No 2G12RR003048 from the National Institutes of Health. The AADM study was supported by NIH Grants no. 3T37TW00041 from NCMHD and NHGRI. G Chen and Rotimi were also partly supported by the NIGMS/MBRS program.

Author information

Authors and Affiliations

National Human Genome Center, Howard University, Washington, DC, USA
Ao Yuan, Guanjie Chen, Charles Rotimi & George Bonney
Department of Computer Science and Software Engineering, University of Wisconsin-Platteville, WI, USA
Qi Yang

Corresponding author

Correspondence to Ao Yuan.

Appendices

Appendix A

We first derive (5). For fixed A_k, the events A_kA_l and A_kA_m are independent, so by (4) we have

For (6), we use the method of Lange¹⁶ (pp. 87–89). Since μ_kl=α_k+α_l+δ_kl and the α_k's minimize the squared error Σ_kΣ_lδ²_klp_kl, taking the derivative with respect to α_k we get

Summing over k in (A.1), we have

Now (A.1) and (A.2) give

that is,

Then we have

When i=j, Cov(G_i, G_j)=σ_G², Cov(e_i, e_j)=σ_e² and

By (A.1), the above is

Since ∑_lp_kl=∑_lp_lk=p_k, we have

and

so by (A.3) and the above three equations we have

If i≠j, Cov(e_i, e_j)=0; by the central limit theorem of Lange²² and assuming no dominance, we have approximately Cov(G_i, G_j)=2Φ_ijσ_G² and

By (A.1) and (A.2), the coefficient Δ_9ij is zero. By the calculation for E(g_i²), the first term above is

the second term is

From (5) it is easy to check that

so by (A.2) the coefficient of Δ_8ij is

Since ∑_k∑_lα_kα_lp_kl=fσ_a²/2, the above is

By (5) and (A.1), the middle term in the above is

In the same way, the last term in (A.6) is

so the coefficient of Δ_8ij is

Now collecting terms we have

Appendix B

When i=j, π_ii′=2 which is noninformative about trait-marker relationship, so , which has the same expression as in (8). When i≠j,

We first derive the conditional probabilities in (B.1). Conditional on the IBD status, these quantities are independent of the relatedness of the pair and depend only on the relationships among the trait and marker alleles through f and ζ; in other words, given the IBD status, the alleles in one configuration are independent of those in the other. We have

Now the two configurations share A_kA_l in common; if we fix it, the two configurations are independent of each other, so we rewrite the above as

Similarly,

and

where p_(kl, r)=∑_sp_(kl, rs). The same reasoning gives

so we have

Also

where

So

and

so

Also

and

where p_(k, rs)=∑_lp_(kl, rs)=p_k(q_rs−ζp_rs+ζ(1−f)p_sI(r=k)+ζI(r=s=k)), so

Lastly

Now we compute the covariance (B.1) for different values of the π′_ij's. If π′_ij=0, by (B.2)–(B.4) and Appendix A, we have the same expression of (B.1) as in (8).

If π′_ij=1, by (B.5)–(B.7), the coefficients of Δ_7ij(π_ij) and Δ_8ij(π_ij) in (B.1) are the same as those in (A.4) and (A.7). The coefficient of Δ_9ij(π_ij) in (B.1) has four terms corresponding to those in (B.7); the first two terms are zero by the computation in Appendix A and, by its symmetry in (k, l, m, n), the last two terms are

since

the first term above is zero. By expanding and checking each term using (A.1) and (A.2), the second term above, and hence the coefficient of Δ_9ij(π_ij), is

If π′_ij=2, by (B.8), the coefficient of Δ_7ij(π_ij) is the same as before. Now we compute the coefficient of Δ_8ij(π_ij). We expand it in five terms as in (B.9). The first term is (1+f)²(1−ζ)σ_a²/2 by the computation in Appendix A. Expanding the same way as in Appendix A, the second term is

the last two terms above are zero by (A.1). Since

substituting this into the second and third terms in (B.12) gives −(1+f)²∑_kα_k²p_k(q_k−ζp_k). Expanding in the same way, the third term is

The last two terms in (B.14) are zero. Substituting (B.13) into the second and third terms in (B.14), it becomes

and combining the second and third terms gives

the fourth term is

the fifth term is

For the coefficient of Δ_9ij(π_ij), we expand it in four terms according to (B.10), the first two terms are zero by the computation in Appendix A, so it reduces to

the first term above is zero since , the second term, and hence the coefficient of Δ_9ij(π_ij) is , which is

The first term in the bracket above is

the second term is

the third term is

the fourth term is

Collecting terms, the coefficient of Δ_9ij(π_ij) is

Appendix C

Let ξ=(α, β), ξ₀=(α₀, β), and define ξ₁ and the hat notations for the corresponding estimates. Let Î(ξ) be the empirical Fisher information matrix evaluated at (ξ), by Taylor expansion,

and it is well known that, under H₁, as n → ∞,

Also, since the familial structures are homogeneous,

Thus under H₁,

About this article

Cite this article

Yuan, A., Chen, G., Yang, Q. et al. Variance components model with disequilibria. Eur J Hum Genet 14, 941–952 (2006). https://doi.org/10.1038/sj.ejhg.5201645

Received: 22 September 2005
Revised: 05 April 2006
Accepted: 06 April 2006
Published: 24 May 2006
Issue Date: 01 August 2006
DOI: https://doi.org/10.1038/sj.ejhg.5201645
