Pedigree linkage disequilibrium mapping of quantitative trait loci

Fan, Ruzong; Spinka, Christie; Jin, Lei; Sun Jung, Jee

doi:10.1038/sj.ejhg.5201301

Download PDF

Article
Published: 13 October 2004

Pedigree linkage disequilibrium mapping of quantitative trait loci

Ruzong Fan^1,2,
Christie Spinka¹,
Lei Jin¹ &
…
Jee Sun Jung¹

European Journal of Human Genetics volume 13, pages 216–231 (2005)Cite this article

670 Accesses
21 Citations
Metrics details

Abstract

In this paper, we propose to use pedigrees of any size and any types of relatives in joint high-resolution linkage disequilibrium (LD) and linkage mapping of quantitative trait loci (QTL) by variance component models. Two or multiple markers can be simultaneously used in modeling association with the trait locus, instead of using one marker a time in the analysis. The proposed method can provide a unified result by using two or multiple markers in the modeling. This may avoid the complications of different results obtained from the separate analysis of marker by marker. The models simultaneously incorporate both linkage and LD information. The measures of LD are modeled by mean coefficients, and linkage information is modeled by variance–covariance matrix. Using analytical formulas to calculate the regression coefficients, the genetic effects are shown to be decomposed into additive and dominance components. The noncentrality parameter approximations of test statistics of LD are provided to make power calculations. Power and type I error rates are explored to investigate the merit of the proposed method by both the analytical formulas and simulations. Comparing with the association between-family and association within-family (‘AbAw’) approach of Fulker and Abecasis et al, it is evident that the method proposed in this article is more powerful. The method is applied to investigate the relation between polymorphisms in the angiotensin 1-converting enzyme (ACE) genes and circulating ACE levels, with a better result than that of the ‘AbAw’ approach. Moreover, two markers I/D and 4656(CT)3/2 can fully interpret association with the trait locus at a 0.01 significance level, which provides a unique result for the ACE data.

Robust association tests for quantitative traits on the X chromosome

Article 10 September 2022

Zi-Ying Yang, Wei Liu, … Ji-Yuan Zhou

Liability threshold modeling of case–control status and family history of disease increases association power

Article 20 April 2020

Margaux L. A. Hujoel, Steven Gazal, … Alkes L. Price

Fine mapping and accurate prediction of complex traits using Bayesian Variable Selection models applied to biobank-size data

Article Open access 19 July 2022

Gustavo de los Campos, Alexander Grueneberg, … Anirban Samaddar

Introduction

To map quantitative trait loci (QTL) of complex diseases, linkage disequilibrium (LD or association) regression analysis using population data can be performed.¹ However, population substructures may affect the results and produce false positives. To remedy this, variance component models are proposed to perform joint LD and linkage mapping of QTL using both population and pedigree data.^{2,3,4,5,6,7,8,9,10,11,12} However, most research limits on models that utilize small nuclear families in research. In principle, the method of George et al¹³ can be used to analyze general pedigrees in joint linkage and LD mapping, but the examples of George et al¹³ are limited to nuclear families. To our knowledge, Abecasis et al,³ Göring and Terwillinger,¹⁰ and Martin et al¹¹ are the only literature that present joint linkage and LD mapping methods using extended multi-generation pedigrees for QTL mapping.

In our previous research of joint linkage and LD mapping (or association study), either nuclear families or sibships were combined with population data to construct variance component models to map QTL for complex diseases.^1,7,8 This article generalizes our previous work to extended multigeneration pedigrees of any sizes and any types of relatives. Intuitively, large pedigrees contain more linkage and LD information. Therefore, it is important to develop models that may accommodate any types of data, including population data, sibships, nuclear families and multigeneration large pedigrees for a combined analysis.

The arrangement of the paper is as follows. First, variance component models are constructed based on multigeneration pedigree data following our previous research. The models incorporate both linkage and LD information in a unified analysis. The linkage information is modeled in variance–covariance matrix, while the association parameters are modeled in mean coefficients. Moreover, genetic effects are decomposed into orthogonal summation of additive and dominance effects. Type I error rates are calculated for five test cases to show the robustness of the proposed approach. To explore the effectiveness of the proposed methods, comparison with the association between-family and association within-family (‘AbAw’) approach of Abecasis and Faulker et al is investigated.^2,3,4,9,12 Two pedigrees of Figure 1 in Abecasis et al³ are taken as example. The comparison is based on both analytical formulae of noncentrality parameter approximations of test statistics and simulation results, using the same values of parameters of Abecasis et al.³ The simulation program PEDSIMUL kindly provided by Dr Abecasis is used to generate simulated datasets. As a practical example, the method is applied to investigate the relation between polymorphisms in the angiotensin-1-converting enzyme (ACE) genes and circulating ACE levels.^14,15

Methods

Consider a quantitative trait locus Q which has two alleles Q₁ and Q₂ with allele frequencies q₁ and q₂, respectively. Assume that two markers A and B are typed in a chromosome region of the trait locus Q. Marker A has two alleles A and a with frequencies P_A and P_a, respectively. Marker B has two alleles B and b with frequencies P_B and P_b, respectively. Suppose that the data are composed of I families. Let us list the log-likelihood of the I families by L₁, …, L_I. The overall log-likelihood is L=∑^I_i=1L_i. In the ith family, let n_i be the total number of individuals who are listed as j=1, 2, …, n_i, each individual j is preceded by all his/her ancestors. Let us denote the quantitative traits of ith family by a vector y_i= ${(y_{i 1}, y_{i 2}, \dots, y_{i n_{i}})}^{τ}$ , genotypes at marker A by a vector ${(A_{i 1}, A i_{2}, \dots, A_{i n_{i}})}^{τ}$ , and genotypes at marker B by a vector ${(B_{i 1}, B i_{2}, \dots, B_{i n_{i}})}^{τ}$ . The log-likelihood is defined by

under the assumption of multivariate normality. The notations of the log-likelihood are defined as follows. The mean component X_iμ is from the following regression equation in a similar way as that of our previous work:^1,7,8

where β is the overall mean, G_ij is the polygenic effect, e_ij is the error term. Assume that G_ij is normal N(0,σ_G²), and e_ij is normal N(0,σ_e²), Moreover, G_ij and e_ij are independent. x_Aij, x_Bij, z_Aij and z_Bij are dummy variables defined by

α_A, α_B, δ_A and δ_B are regression coefficients of the dummy variables x_Aij, x_Bij, z_Aij and z_Bij.

The model matrix X_i is defined by

and μ=(β, α_A, α_B, δ_A, δ_B)^τ is a vector of regression coefficients. An intuitive rationale of regression (1) is provided on pages 608–609 in Fan and Xiong.¹ In regression (1), it is assumed that there are no covariates, which may be included as that of Fan and Xiong.^1,8

Σ_i is a n_i × n_i variance–covariance matrix defined as

where σ²=σ_g²+σ_G²+σ_e², σ_g² is variance explained by the putative QTL Q, σ_G² is polygenic variance, and σ_e² is error variance. The genetic variances σ_g²=σ_ga²+σ_gd² and σ_G²=σ_Ga²+σ_Gd² are decomposed into additive and dominance components. ρ_jk=(π_jkQσ_ga²+Δ_jkQσ_gd²+2Φ_jkσ_Ga²+Δ_7jkμ_Gd²)/σ² is correlation between the jth individual and the kth individual of the family, where π_jkQ is the proportion of alleles shared identical by descent (IBD) at QTL Q by the jth and the kth individuals, Δ_ijQ is the probability that both alleles at QTL Q shared by the jth and the kth individuals are IBD, Φ_jk is kinship coefficient of individuals j and k, and Δ_7jk is the probability that both alleles shared by the jth and the kth individuals are IBD at any locus. π_jkQ and Δ_jkQ are usually estimated by marker information. ^16,17,18 The recombination fractions between the genotyped markers and the unobserved QTL are contained in the estimations of π_jkQ and Δ_jkQ.^{7,19,20,21,22} Hence, linkage information is modeled in variance–covariance matrix.

Let μ_ij be the effect of genotype Q_iQ_j,i,j=1,2, μ₁₂=μ₂₁, that is μ_ij is the mean displacement in phenotype trait for a given genotype Q_iQ_j. Denote the population effect mean by μ=μ₁₁q₁²+2μ₁₂q₁q₂+μ₂₂q₂², that is, μ is the aggregate effect of the QTL on the trait mean in the population. Define α_Q=q₁μ₁₁+(q₂−q₁)μ₁₂−q₂μ₂₂, δ_Q=2μ₁₂−μ₁₁−μ₂₂. If μ₁₁=a, μ₁₂=d, and μ₂₂=−a as in the traditional quantitative genetics,²³ α_Q=a+(q₂−q₁)d is the average allele substitution effect, and δ_Q=2d characterizes the dominance effect. In general, one may define a=μ₁₁−(μ₁₁+μ₂₂)/2 and d=μ₁₂−(μ₁₁+μ₂₂)/2. It is well known that the additive variance σ_gd²=2q₁q₂α_Q² and the dominance variance σ_gd²=(q₁q₂)²δ_Q².

Denote a measure of LD between trait locus Q and marker A by D_AQ=P(AQ₁)−q₁P_A, a measure of LD between trait locus Q and marker B by D_QB=P(BQ₁)−q₁P_B, and a measure of LD between marker A and marker B by D_AB=P(AB)−P_AP_B. Let the additive and dominance variance-covariance matrices be

Such as in Appendix B, Fan and Xiong,¹ we can show that the coefficients of regression equation (1) are given by

If only one marker A is used in the analysis, Eq. (3) can be replaced by α_A=D_AQα_Q/(P_aP_A), δ_A=D_AQ²δ_Q/(P_a²P_A²). When the marker and QTL are the same and allele A is Q₁ and allele a is Q₂, D_AQ=q₁q₂=P_aP_A, this implies that α_A=α_Q, and δ_A=δ_Q, that is, all genetic substitution effect and dominance effect are encompassed in the allele association parameters. Hence, the estimates of the residual additive genetic variance σ_ga² and σ_gd² should be close to 0 in data analysis.

For a relative pair (1,2) of individuals 1 and 2. Table 1 gives the conditional probability P(G₁,G₂∣C) given their allele IBD sharing status. Here G_i is genotype of relative i, and C is one event of (IBD=k), k=0,1,2. For example, P(AA, AA∣IBD=0)=P_A⁴, P(AA,AA∣IBD=1)=P_A³ and P(AA, AA∣IBD=2)=P_A². Denote indicator functions

Table 1 Conditional probability P(G₁, G₂∣C) of a relative pair (1, 2) given their allele IBD sharing status

Full size table

Utilizing the conditional probabilities of Table 1, the conditional expectation of product of the indicator functions can be calculated and the results are listed in Table 2. For instance, E(x_A1x_A2∣IBD=2)=4P_a²P_A²+(P_a−P_A)² 2P_aP_A+4P_A²P_a²=2P_aP_A. Based on Table 2, the expectation of product of indicator functions of (4) can be calculated for various types of relative pair. If individuals 1 and 2 are the same person, P(IBD=2)=1 and then

where O are 0 matrices. If individuals 1 and 2 are common type of relatives, Appexdix A provides expectations of product of indicator functions (4).

Table 2 Conditional expectation of a relative pair (1, 2) given their allele IBD sharing status

Full size table

From above discussions, it is clear that the linkage is modeled in the variance–covariance matrix, while the association parameters are modeled in the mean coefficients. Hence, the model simultaneously takes linkage and association into account. The individual test of either linkage or association can be performed as follows: compare the full model in which all parameters are estimated and sub-models in which some parameters are fixed as constant.

In practice, one may first test linkage to detect a broad region of a trait locus. That is, the coefficients of α_A, α_B, δ_A and δ_B are fixed to be 0; and test the existence of linkage. In the presence of linkage, the association test can be performed for fine mapping of the QTL. If both additive and dominance variances σ_ga² and σ_gd² are significantly larger than 0, the genetic effects are not equal to 0 (ie, α_Q≠0 and δ_Q≠0). This implies that a null hypothesis H_AB,ad:α_A=α_B=δ_A=δ_B=0 is equivalent to D_AQ=D_QB, that is, no association between the QTL Q and the markers. Hence, H_AB,ad can be used to test association. Notice that there are four parameters under the null hypothesis H_AB,ad, but only two independent parameters D_AQ and D_QB are under test. Hence, at least two of the regression coefficients of α_A, α_B, δ_A, and δ_B must be 0 in data analysis. Suppose only additive variance σ_ga² is significantly larger than 0, but dominance variance σ_gd² is not significantly larger than 0 (ie, α_Q≠0 and δ_Q=0). Regression (1) can be simplified by excluding the dominance effects from the analysis, that is, setting δ_A=δ_B=0. A null hypothesis H_AB,a:α_A=α_B=0 can be used to test association between the QTL Q and the markers.

In regression (1), two markers A and B are used in the analysis. If multiple markers are available, one may extend it to fit the data. For example, assume one more marker C is typed. Then regression (1) can be extended to

where x_Cij and z_Cij are defined accordingly in the same way of (2), and α_C and δ_C are their regression coefficients.

Evidence of association can be tested by likelihood ratio test (LRT) procedure. For instance, let L_ad be the log-likelihood under the alternative hypothesis of H_AB,ad, and L₀ be the log-likelihood under the null hypothesis H_AB,ad. Then, the quantity 2[L_ad–L₀] is asymptotically distributed as χ². Notice again that there are only two measures of LD, D_AQ and D_QB, under the null hypothesis H_AB,ad. In data analysis, the number of coefficients α_A, α_B, δ_A, δ_B, which are significantly different from 0, should be less than or equal to 2. This number is the degrees of freedom of the likelihood ratio test 2[L_ad–L₀]. For large sample data, the likelihood ratio test is accurate based on the statistical theory. In the following, we will develop a F-test procedure based on linear model theory.²⁴

F-tests and noncentrality parameter approximations

For the convenience of explanation, let us consider I families given in graph A or graph B of Figure 1 (Figure 1 in Abecasis et al³). Let N be the total number of individuals, that is, N=nI, where n=11 for graph A and n=18 for graph B of Figure 1, respectively. For each individual in Figure 1, an ID is assigned. For the grand parents of graph B, the genotypes are unavailable and so no IDs are assigned. Denote the total trait values y=(y₁^τ, …, y_I^τ)^τ, the total variance–covariance matrix by Σ=diag(Σ₁, …, Σ_I), and the model matrix by X=X₁^τ, …, X_I^τ)^τ. Assume that Σ₁=⋯=Σ_I. Let α̂_A, α̂_B, δ̂_A, δ̂_B, Σ̂_i, Σ̂ be the maximum likelihood estimators of α_A, α_B, δ_A, δ_B, Σ_i, Σ. The estimate of μ is μ̂=[X^τΣ̂⁻¹ X]⁻¹ X^τ Σ̂⁻¹ y=[∑_i=1^IX_i^τΣ̂_i⁻¹X_i]⁻¹∑_i=1^IX_i^τΣ̂_i⁻¹y_i. Let H be a q × 5 test matrix of rank q. By Graybill²⁴ (Chapter 6), the test statistic of a hypothesis Hμ=0 is noncentral F(q, N−5) defined by

The noncentrality parameter of the test statistic F can be calculated by λ=(Hμ)^τ[H[X^τΣ⁻¹X]⁻¹H^τ]⁻¹ Hμ=(Hμ)^τ[H[∑_i=1^IX_i^τΣ_i⁻¹X_i]⁻¹H^τ]⁻¹ (Hμ). Denote In Appendix B, an approximation of the noncentrality parameter is provided by

where b₁ and b₂ are constants given by equations (B.1) and (B.2) in Appendix B for graph A and graph B of Figure 1, respectively. Notations O are 0 matrices.

To test the null hypothesis H_AB,a : α_A=α_B=0, the test matrix

Let us denote the corresponding F-test statistic by F_AB,a. The noncentrality parameter is

To test the null hypothesis H_AB,d: δ_A=δ_B=0, the test matrix

Denote the corresponding F-test statistic by F_AB,d. The noncentrality parameter is

To test the null hypothesis H_AB,ad : α_A=α_B=δ_A=δ_B=0, the test matrix

Denote the corresponding F-test statistic by F_AB,ad. The noncentrality parameter is λ_AB,ad=λ_AB,a+λ_AB,d. Assume that only one marker A is used in the analysis. The test statistic of a hypothesis Hμ=0 is noncentral F(q,N−3) defined by

where μ=(β, α_A, δ_A)^τ. The noncentrality parameter is for the null hypothesis H_A,ad : α_A=δ_A=0. Correspondingly, we denote the F-test statistic by F_A,ad. Similarly, λ_A,a≈(b₁I/σ²)σ_ga²D_AQ²/(P_AP_aq₁q₂) is the noncentrality parameter of the test statistic F_A,a for the null hypothesis H_A,a :α_A=0. The noncentrality parameter of the test statistic F_A,d for the null hypothesis H_A,d : δ_A=0 is λ_A,d≈(b₂I/σ²)σ_gd² D_AQ⁴/(P_A²P_a²q₁²q₂²).

Type I error rates and power comparison

Before making comparison, let us briefly describe the ‘AbAw’ method and comments on the difference of the ‘AbAw’ approach with ours. Such as our proposed method, the random effect parameters (σ_g², σ_G² and σ_e²) are modeled in the variance–covariance matrices in the ‘AbAw’ approach. However, the ‘AbAw’ approach decomposes the genetic association into effects of between-family members and within-family members.^2,3,4,9,12 The effects of between family members and within-family members are modeled in two coefficients β_b and β_w. This is primarily the difference between the ‘AbAw’ approach and our method. Suppose only one marker A is used in the analysis. Ignoring the dominance effect, our model only has one additive coefficient α_A but the ‘AbAw’ still has two regression coefficients β_b and β_w. Under the assumption of multivariate normality, the ‘AbAw’ χ_qtl² is defined as the likelihood ratio test of hypothesis β_w=0 against no constraints on the coefficient. In addition, the ‘AbAw’ approach uses only one marker and it is not clear how to extend the ‘AbAw’ approach to use more than one markers in analysis. We use two biallelic markers, and it is easy to extend our approach to use multiple biallelic markers in analysis. For sibship data, Fan and Jung⁷ compared our method with the ‘AbAw’ approach and found that our method is advantageous. In the following, we are going to calculate the type I error rates to show the robustness of the proposed approach. Then, we will make power comparison for general extended pedigrees.

Type I error rates

To investigate the robustness of the proposed approach, we calculate type I error rates at a 0.05 significance level for five test cases of Table 3, which is the same as those of Table 1 of Abecasis et al.³ Trait values are constructed from a normal distribution with mean 100, and total variance σ²=1 at the trait locus Q except for the Admixture case. Here σ²=σ_ga²+σ_Ga²+σ_e² is the summation of the additive variance of major gene effect σ_ga², the variance of additive polygenic effect σ_Ga², and the error variance σ_e². A biallelic marker A is simulated at a recombination fraction 0 between QTL Q and locus A. In each case, linkage equilibrium is assumed between QTL Q and marker A, that is, D_AQ=0. In the test case of Admixture, population admixture is generated by mixing families equally draw from one of the two subpopulations C and D. In both subpopulations C and D, no major gene effect or familial effect is assumed, that is, σ_g²=σ_G²=0. However, the trait mean of subpopulation C is fixed as 1 and the variance is fixed as 1, and the marker allele frequency P_A is taken as 0.7 in subpopulation C. The trait mean of subpopulation D is fixed as 0 and the variance is fixed as 1, and the marker allele frequency P_A is taken as 0.3 in subpopulation D. Therefore, the total variance in the mixing population is σ²=1.25. The admixture contributed to (1–0)²/[4 × 1.25]=0.20 of the total variance. The related parameters of the other four test cases are given in the legend of Table 3.

Table 3 Type I error rates (%) at a 0.05 significance level

Full size table

To calculate the type I error rates, 1000 data sets are simulated for each test case. Each data set contains 50 pedigrees of either graph A or graph B of Figure 1, respectively. Using the data sets, we fit the following model

where G_ij is normal N (0, σ_Ga²), y_ij is normal N (β+x_Aijα_A,σ²), and σ²=σ_ga²+σ_Ga²+σ_e². The null hypothesis is H_A,a : α_A=0. Since the QTL Q is in linkage equilibrium with marker A, an empirical test statistic, which is larger than the cutting point at a 0.05 significance level, is treated as a false positive. Based on either likelihood ratio test or F-test, type I error rates are calculated as the proportions of the 1000 simulation data sets, which give significant result at the 0.05 significant level based on F_A,a and likelihood ratio test statistic, respectively.

The results of Table 3 shows that the type I error rates are around the nominal level 0.05. Hence, the model is reasonably robust. For the large three-generation pedigree of Graph B of Figure 1, the type I error rates are very close to the nominal 0.05 significance level. Hence , the model works well under large sample case, as expected (notice that the total number of individuals N=550 for the 50 small three-generation pedigrees of Graph A, and N=900 for the 50 large three-generation pedigrees of Graph B). Moreover, the results of F-test F̂_A,a are very close to those of likelihood ratio test statistic. This is, theoretically, guaranteed under large sample by the property of the two tests (Chapter 6, Graybill²⁴).

Comparison with the ‘AbAw’ approach

Table 4 shows the power of 50 duplicates of pedigree A (and pedigree B) of Figure 1, respectively, for varying levels of LD between trait locus and marker A at 0.01 significance level. The parameters are exactly the same as those of Abecasis et al.³ Besides, the power of AbAw's χ_qtl² is taken from Table 2 of Abecasis et al.³ The power of F_A,a is calculated based on the noncentrality parameter approximation λ_A,a; the power of F̂_A,a and LRT are calculated as the proportions of 1000 simulation data sets, which give significant result at the 0.01 significance level based on F_A,a and likelihood ratio test statistic, respectively. It is obvious from Table 4 that F_A,a and likelihood ratio test based on our model have higher power than that of AbAw's χ_qtl². Since other test statistics in Table 2 of Abecasis et al³ is less powerful than the AbAw's χ_qtl², the method proposed in the current article is advantageous over the AbAw approach. As the type I error rates in Table 3, the results of F-test F̂_A,a are very close to those of likelihood ratio test statistics.

Table 4 Power (%) of 50 pedigrees (Figure 1) for varying levels of LD between trait locus and marker A at 0.01 significance level

Full size table

Power comparison

Let us denote the heritability by h², which is h²=σ_ga²/σ².²³. To make power comparison, we conside 30 duplicates of small and large 3-generation pedigrees of graphs A and B of Figure 1, respectively. Two modes of inheritance are taken as example: a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritace a=1.0,d=−0.5, respectively. Figures 2 and 3 show the power curves of test statistics F_AB,a, F_AB,ad, F_A,a, F_A,ad, F_A,d, and F_AB,d against LD coefficient D_AQ at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. The other parameters are given in the legend of the Figures. It can be seen that F_AB,ad, generally has higher power than that of F_A,ad, and F_AB,a generally has higher power than that of F_A,a. Hence, analysis using two markers are advantageous. Owing to the increase of degrees of freedom of test statistics, F_AB,ad generally has lower power than that of F_AB,a, and F_A,ad generally has lower power than that of F_A,a. Besides, the power of F_AB,d and F_A,d is low unless the LD between markerr A and QTL Q is vey strong. Since pedigree of Graph B of Figure 1 is larger than that of Graph A, the power of test statistics for Figure 3 is higher than that of corresponding statistics for Figure 2. Thus, large pedigrees contain more LD information and the models proposed rightly use the information.

Figures 4 and 5 show power curves of test statistics F_AB,a, F_AB,ad, F_A,a, F_A,ad, F_A,d, and F_AB,d against trait frequency q₁ or marker allele frequency P_A at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. In addition to the merits shown in Figures 2 and 3, the power of test statistics F_AB,ad, and F_AB,,a heavily depends on the frequency q₁ of QTL (graphs I and II of each figure), but not very much on the marker allele fequency P_A (graphs III and IV of each figure). Figures 6 and 7 show power curves of test statistics F_AB,a, F_AB,ad, F_A,a, F_A,ad, F_A,d, and F_AB,d against heritability h² at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. It can be seen that the power of F_AB,ad, F_AB,a, F_A,ad and F_A,a can be high as the heritability h² is larger than 0.1. The power of F_AB,d and F_A,d is generally low.

An example

The proposed method is applied to analyze ACE data.^14,15 The data consist of 83 extended families with between four and 18 members, and the pedigrees range in size from two to three generations. Circulating ACE levels were measured for 405 individuals. In total, 10 biallelic polymorphisms in the ACE gene were genotyped. Multipoint IBD at each marker are calculated by Merlin.²⁵ Software QTDT-2.4.3 is used to analyze the data. There are missing genotype information at markers, and so the total number of individuals is different from marker to marker. For instance, there are four individuals whose genotype information is missing at marker I/D and the total number N=401; at marker G2350A, on the other hand, there are more missing genotype data, and N=365.

Variance component linkage analysis shows that additive variances are significantly larger than 0, but dominance variances are not significantly larger than 0.³ Hence, dominance effects can be excluded from regression equation. The total variance is modeled as σ²=σ_ga²+σ_Ga²+σ_e², and related variance-covariance structure can be constructed. Table 5 shows LD analysis of the ACE gene by individual marker. To make comparison with the ‘AbAw’ approach, results of AbAw's lod are taken from Table 4 of Abecasis et al.³ After fitting the proposed models in this article, lod is calculated by LRT/(2 ln 10), where LRT=2(L₁−L₀), L₁ is the log-likelihood under y_ij=β+x_Aijα_A+G_ij+e_ij, and L₀ is the log-likelihood under y_ij=β+G_ij+e_ij. The lod scores calculated by the proposed method in this article is generally higher than those of the ‘AbAw’ approach, which is consistent with the results of Table 4. Hence, the proposed method is advantageous. Moreover, the results of Table 5 confirm the finding that the association is strongest around the G2215A, I/D, G2350A and 4656(CT)3/2 polymorphisms (lod=27.01), 27.37, 28.01 and 27.93; and the ‘AbAw’ lod=14.91, 15.76, 14.40 and 14.22, respectively;³). The estimations of σ_ga² are 0.05 and 0.03 at markers I/D and G2350A, respectively. In Table 5, the additional linkage effects are presented. The additional linkage lod is calculated by 2(L₁–L₂)/(2 ln 10), where L₂ is the log-likelihood y_ij=β+x_Aijα_A+G_ij+e_ij assuming the total variance σ²=σ_Ga²+σ_e² and related variance–covariance structure. For markers I/D and G2350A, the additional linkage lods are 0.09 and 0.03 (the related P-values are 0.76 and 0.85, respectively). The residual QTL-specific variances of 0.05 and 0.03 for the two ACE markers of largest effect are not significantly different from zero. Little additional linkage effects are present after modeling the association effects. Therefore, these markers are likely in complete LD with the trait alleles, and the genetic variance attributable to the QTL is encompassed in the association coefficient α_A.^2,3

Table 5 LD analysis of the ACE gene by individual marker

Full size table

Notice that there are 10 individual results in Table 5, and it is unclear if there is any relation among the models/markers from results in Table 5. Using the proposed models, we explore LD analysis based on two or more markers. Table 6 shows LD analysis of the ACE gene by two markers. Regressions are given by: (1) y_ij=β+G_ij+e_ij; (2) y_ij=β+x_Aijα_A+G_ij+e_ij; (3) y_ij=β+x_Aijα_A+x_Bijα_B+G_ij+e_ij. We first investigate markers I/D and G2350A that show strongest association in individual marker analysis of Table 5. It turns out that marker I/D is correlated with marker G2350A in the following sense: if A=I/D and B=G2350A, x_Aij are x_Bij are correlated to each other. This phenomenon motivates us to further investigate the relation of these two markers. Utilizing data from http://www.well.ox.ac.uk/~mfarrall/oxhap-freq.html, we calculate the frequency of the four haplotypes of these two markers as follows: P(IA)=0.478875, P(IG)=0.002817, P(DG)=0.515494, P(DA)=0.002817 (note here A is an allele at marker G2350A). This shows that allele I at marker I/D is almost always present with allele A at marker G2350A, and allele D at marker I/D is almost always present with allele G at marker G2350A. Besides, the measure of LD is 0.246878 between the markers I/D and G2350A. The two markers are almost in complete LD with each other.

Table 6 LD analysis of the ACE gene by two markers

Full size table

Since there are more missing data at marker G2350A than that at marker I/D, we choose I/D as baseline marker A. Six markers, 4656(CT)3/2, T-5491C, A-5466C, T-3892C, A-240T and T-93C, show significant association at a 0.05 significance level, in addition to the association of marker I/D (see the P-value of test H₀ : α_B=0 in Table 6). In these six markers, 4656(CT)3/2 shows strongest additional association with LRT=7.94, lod=1.72, P-value=0.005 (Table 6). Hence, we choose I/D and 4656(CT)3/2 as markers A and B for the basis of further investigation. Table 7 shows results by one more marker in addition to markers I/D and 4656(CT)3/2. Four markers, T-5491C, A-5466C, T-3892C and A-240 T, show significant association at a 0.05 significance level, in a addition to the association of markers I/D and 4656(CT)3/2 (see the P-value of test H₀ : α_C=0 in Table 7). Further investigation shows that there is no significant evidence to analyze the data using more than three markers in the regression equations.

Table 7 LD analysis of the ACE gene by multiple markers

Full size table

Notice that if the significance level is set at 0.01 instead of 0.05, then only markers I/D and 4656(CT)3/2 are associated with the trait locus (Table 6). The regression, y_ij=β+x_Aijα_A+x_Bijα_B+G_ij+e_ij, A=I/D, B=4656(CT)3/2, can fully interpret association at the 0.01 significance level, which provides a unique result.

Discussion

This article focuses on building models that may utilize extended multi generation large pedigrees for LD mapping of QTL. Based on the same logic developed in our previous work,^7,8 variance component models are constructed for high resolution joint linkage and LD mapping of QTL. By type I error rate evaluation, it is found that the proposed models have correct type I errors and the methods are robust. The methods proposed are compared with the ‘AbAw’ approach. It is found that our models have higher power. This is consistent with our previous findings based on sibpair data.⁷ Based on power comparison, it is shown that large pedigrees may contain more LD information than small pedigrees. Moreover, the proposed method is applied to ACE data with a better result than that of the ‘AbAw’ approach. In addition, it is found that markers I/D and 4656(CT)3/2 can fully describe the association with the trait locus at a 99% significance level for the ACE data. This result is very meaningful. For instance, all 10 biallelic markers of ACE data show association with the trait locus. However, it is markers I/D and 4656(CT)3/2 which are really necessary in describing the association at the 99% significance level. Marker G2350A is correlated to marker I/D, as a matter of fact!

One possible explanation that the proposed method is more powerful than the ‘AbAw’ approach is as follows. The proposed method decomposes the association into the summation of additive and dominance effects. If dominance effect is not significantly larger than 0, then only additive effect is modeled such as the ACE data. The’AbAw’ approach, on the other hand, decomposes the association into the summation of between-family (b) and within-family (w) components. In Table 4 of Abecasis et al,³ the ‘AbAw’ lod is reported to be the within-family effect, which is less powerful than the proposed method. By decomposing the association effect into association between-family and association within-family in the ‘AbAw’ approach, the information is diluted and so the approach is less powerful. This is, in the opinion of the authors of the current article, is not a criticism for the ‘AbAw’ approach, since it may be a good method in certain circumstances. It is our hope that the current article may stimulate more interest in exploring the most appropriate method in LD mapping of complex diseases.

Population may contain LD information, and pedigree data may contain both linkage and LD information. In mapping complex diseases, it is difficult to get replicate linkage findings using pedigree data. It is interesting to combine the same pedigree data with population data for fine association study. In general, any type of pedigree data can be combined with population data for a unified analysis of linkage and LD mapping of QTL, based on the models developed in the current paper and our previous research.^7,8 Linkage analysis can localize genetic traits in a broad region, and is less sensitive to population structures. LD mapping, on the other hand, is appropriate for fine high-resolution mapping and sensitive to population admixtures. As the first step, one may perform linkage analysis using pedigree data to get prior linkage evidences based on a sparse genetic map.^26,27 Then, a combined analysis using both population data and pedigree data can take advantage of both linkage and LD mapping for high resolution of genetic traits. That is, it may provide high resolution of LD mapping, and be more likely to avoid the false positives by using the prior linkage evidences based on a dense genetic map.²⁸

Up to now, most research focus on using biallelic markers in LD mapping of QTL. It would be interesting in building models using multiallelic markers such as macrosatellites or haplotype blocks. One problem in using multiallelic markers in analysis is that the number of parameters can be big, which may lead to high degrees of freedom of test statistics. Hence, selection of significant markers and relevant alleles is an important issue. In addition, genotyping errors can heavily impact the result of a study. These issues deserve more in depth investigations.

References

Fan R, Xiong M : High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur J Hum Genet 2002; 10: 607–615.
Article CAS PubMed Google Scholar
Abecasis GR, Cardon LR, Cookson WOC : A general test of association for quantitative traits in nuclear families. Am J Hum Genet 2000; 66: 279–292.
Article CAS PubMed Google Scholar
Abecasis GR, Cookson WOC, Cardon LR : Pedigree tests of linkage disequilibrium. Eur J Hum Genet 2000; 8: 545–551.
Article CAS PubMed Google Scholar
Abecasis GR, Cookson WOC, Cardon LR : The power to detect linkage disequilibrium with quantitative traits in selected samples. Am J Hum Genet 2001; 68: 1463–1474.
Article CAS PubMed Google Scholar
Allison DB, Bonnie T, St Jean P, Elston RC, Infante MC, Schork NJ : Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am J Hum Genet 1998; 63: 1190–1201.
Article CAS PubMed Google Scholar
Almasy L, Williams JT, Dyer TD, Blangero J : Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet Epidemiol 1999; 17 (Suppl 1): S31–S36.
Article PubMed Google Scholar
Fan R, Jung J : High resolution joint linkage disequilibrium and linkage mapping of quantitative trait loci based on sibship data. Hum Hered 2003; 56: 166–187.
Article CAS PubMed Google Scholar
Fan R, Xiong M : Combined high resolution linkage and association mapping of quantitative trait loci. Eur J Hum Genet 2003; 11: 125–137.
Article CAS PubMed Google Scholar
Fulker DW, Cherny SS, Sham PC, Hewitt JK : Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet 1999; 64: 259–267.
Article CAS PubMed Google Scholar
Göring HHH, Terwillinger JD : Linkage analysis in the presence of error IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 2000; 66: 1310–1327.
Article PubMed Google Scholar
Martin ER, Monks SA, Warren LL, Kaplan NL : A test for linkage and association in general pedigrees: the pedigree diseqilibrium test. Am J Hum Genet 2000; 67: 146–154.
Article CAS PubMed Google Scholar
Sham PC, Cherny SS, Purcell S, Hewitt JK : Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66: 1616–1630.
Article CAS PubMed Google Scholar
George V, Tiwari HK, Zhu XF, Elston RC : A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am J Hum Genet 1999; 65: 236–245.
Article CAS PubMed Google Scholar
Farrall M, Keavney B, MckKenzie CA et al.: Fine mapping of an ancestral recombination break-point in DCP1. Nat Genet 1999; 23: 270–271.
Article CAS PubMed Google Scholar
Keavney B, MckKenzie CA, Connell JM et al: Measured haplotype analysis of the angiotension-1 converting enzyme gene. Hum Mol Genet 1998; 7: 1745–1751.
Article CAS PubMed Google Scholar
Amos CI : Robust variance-components approach for assessing linkage in pedigrees. Am J Hum Genet 1994; 54: 534–543.
Google Scholar
Amos CI, Elston RC : Robust methods for the detection of genetic linkage for quantitative data from pedigrees. Genet Epidemiol 1989; 6: 349–360.
Article CAS PubMed Google Scholar
Amos CI, Elston RC, Wilson AF, Bailey-Wilson JE : A more powerful robust sib-pair test of linkage for quantitative traits. Genet Epidemiol 1989; 6: 435–449.
Article CAS PubMed Google Scholar
Fulker DW, Cherny SS, Cardon LR : Multiple interval mapping of quantitative trait loci, using sib-pairs. Am J Hum Genet 1995; 56: 1224–1233.
CAS PubMed Central PubMed Google Scholar
Almasy L, Blangero J : Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet 1998; 62: 1198–1211.
Article CAS PubMed Google Scholar
Goldgar DE, Oniki RS : Comparison of a multipoint identity-by-descent method with parametric multipoint linkage analysis for mapping quantitative traits. Amer J Hum Genet 1992; 50: 598–606.
CAS PubMed Google Scholar
Pratt SC, Daly M, Kruglyak L : Exact multipoint quantitative-trait linkage analysis in pedigrees by variance components. Am J Hum Genet 2000; 66: 1153–1157.
Article CAS PubMed Google Scholar
Falconer DS, Mackay TFC : Introduction to quantitative genetics, 4th edn. London: Longman, 1996.
Google Scholar
Graybill FA : Theory and application of the linear model. Belmont, CA: Wadsworth Publishing Company, Inc., 1976.
Google Scholar
Abecasis GR, Cherny SS, Cookson WOC, Cardon LR : Merlin—rapid analysis of dense genetic maps usig sparse gene flow trees. Nat Genet 2002; 30: 97–101.
Article CAS PubMed Google Scholar
Broman KW, Murray JC, Sheffied VC, White RL, Weber JL : Comprehensive human genetic map: individual and sex-specific variation in recombination. Am J Hum Genet 1998; 63: 861–869.
Article CAS PubMed Google Scholar
Kong A, Gudbjartsson DF, Sainz J et al.: A high resolution recombination map of the human genome. Nat Genet 2002; 31: 241–247.
Article CAS PubMed Google Scholar
The International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001; 409: 928–933.

Download references

Acknowledgements

We thank two reviewers, and Dr Gert-Jan B von Ommen for their helpful comments to improve the paper. R Fan was supported partially by a research fellowship from the Alexander von Humboldt Foundation, Germany, and an International Research Travel Assistance Grant of the Texas A&M University. We are grateful to Dr Abecasis for kindly providing the simulation program PEDSIMUL to generate simulated datasets.

Author information

Authors and Affiliations

Department of Statistics, Texas A&M University, 447 Blocker Building, 3143 TAMUS, College Station, TX, 77843, USA
Ruzong Fan, Christie Spinka, Lei Jin & Jee Sun Jung
Institute of Medical Biometry, Informatics and Epidemiology, University of Bonn, Sigmund Freud Strasse 25, D-53105, Bonn, Germany
Ruzong Fan

Authors

Ruzong Fan
View author publications
You can also search for this author in PubMed Google Scholar
Christie Spinka
View author publications
You can also search for this author in PubMed Google Scholar
Lei Jin
View author publications
You can also search for this author in PubMed Google Scholar
Jee Sun Jung
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ruzong Fan.

Appendices

Appendix A

In this appendix, we provides expectations of product of indicator functions (4) for a few common type of relatives. If individuals 1 and 2 are heterozygous sibs P(IBD=0)=P(IBD=2)=1/4, P(IBD=1)=1/2 and then

If individuals 1 and 2 are a parent–offspring pair, P(IBD=0)=P(IBD=2)=0, P(IBD=1)=1 and then

If individuals 1 and 2 are a grandparent/grandchild pair (or uncle/niece or aunt/nephew pair), P(IBD=0)=1/2, P(IBD=2)=0, P(IBD=1)=1/2 and then

If individuals 1 and 2 are first cousins, P(IBD=0)=3/4, P(IBD=2)=0, P(IBD=1)= 1/4 and then

In general, consider two individuals 1 and 2 who are noninbred relatives. Let Φ₁₂ be their kinship coefficient, and Δ₇₁₂ be the probability that both alleles shared by the two individuals are IBD at any locus. Then

Appendix B

Based on Equations (5) and (A.1), (A.2), (A.3) and (A.4), an approximation of noncentrality parameter of test staticstics F can be obtained as follows: Let us denote Σ_i⁻¹=(1/σ²)(γ_kl)_{n × n} n=11 for graph A and n=18 for graph B of Figure 1, respectively. When I is sufficiently large, the laws of large numbers implies that

Here, b₁ and b₂ are constants as follows for pedigrees in graph A of Figure 1

For pedigrees in graph B of Figure 1, constants b₁ and b₂ are given by

Therefore, the noncentrality parameter can be approximated by

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fan, R., Spinka, C., Jin, L. et al. Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur J Hum Genet 13, 216–231 (2005). https://doi.org/10.1038/sj.ejhg.5201301

Download citation

Received: 02 April 2004
Revised: 17 August 2004
Accepted: 27 August 2004
Published: 13 October 2004
Issue Date: 01 February 2005
DOI: https://doi.org/10.1038/sj.ejhg.5201301

Keywords

This article is cited by

Favorable alleles mining for gelatinization temperature, gel consistency and amylose content in Oryza sativa by association mapping
- Hui Wang
- Shangshang Zhu
- Delin Hong
BMC Genetics (2019)
Association studies of dormancy and cooking quality traits in direct-seeded indica rice
- SUNAYANA RATHI
- K. PATHAK
- R. N. SARMA
Journal of Genetics (2014)
Polymorphisms of the IGF1R gene and their genetic effects on chicken early growth and carcass traits
- Mingming Lei
- Xia Peng
- Xiquan Zhang
BMC Genetics (2008)
Combined Linkage and Association Mapping of Quantitative Trait Loci with Missing Completely at Random Genotype Data
- Ruzong Fan
- Lian Liu
- Ming Zhong
Behavior Genetics (2008)
PedGenie: an analysis approach for genetic association testing in extended pedigrees and genealogies of arbitrary size
- Kristina Allen-Brady
- Jathine Wong
- Nicola J Camp
BMC Bioinformatics (2006)

Pedigree linkage disequilibrium mapping of quantitative trait loci

Abstract

Similar content being viewed by others

Robust association tests for quantitative traits on the X chromosome

Liability threshold modeling of case–control status and family history of disease increases association power

Fine mapping and accurate prediction of complex traits using Bayesian Variable Selection models applied to biobank-size data

Introduction

Methods

F-tests and noncentrality parameter approximations