Abstract
In this paper, we propose to use pedigrees of any size and any types of relatives in joint high-resolution linkage disequilibrium (LD) and linkage mapping of quantitative trait loci (QTL) by variance component models. Two or multiple markers can be simultaneously used in modeling association with the trait locus, instead of using one marker a time in the analysis. The proposed method can provide a unified result by using two or multiple markers in the modeling. This may avoid the complications of different results obtained from the separate analysis of marker by marker. The models simultaneously incorporate both linkage and LD information. The measures of LD are modeled by mean coefficients, and linkage information is modeled by variance–covariance matrix. Using analytical formulas to calculate the regression coefficients, the genetic effects are shown to be decomposed into additive and dominance components. The noncentrality parameter approximations of test statistics of LD are provided to make power calculations. Power and type I error rates are explored to investigate the merit of the proposed method by both the analytical formulas and simulations. Comparing with the association between-family and association within-family (‘AbAw’) approach of Fulker and Abecasis et al, it is evident that the method proposed in this article is more powerful. The method is applied to investigate the relation between polymorphisms in the angiotensin 1-converting enzyme (ACE) genes and circulating ACE levels, with a better result than that of the ‘AbAw’ approach. Moreover, two markers I/D and 4656(CT)3/2 can fully interpret association with the trait locus at a 0.01 significance level, which provides a unique result for the ACE data.
Similar content being viewed by others
Introduction
To map quantitative trait loci (QTL) of complex diseases, linkage disequilibrium (LD or association) regression analysis using population data can be performed.1 However, population substructures may affect the results and produce false positives. To remedy this, variance component models are proposed to perform joint LD and linkage mapping of QTL using both population and pedigree data.2,3,4,5,6,7,8,9,10,11,12 However, most research limits on models that utilize small nuclear families in research. In principle, the method of George et al13 can be used to analyze general pedigrees in joint linkage and LD mapping, but the examples of George et al13 are limited to nuclear families. To our knowledge, Abecasis et al,3 Göring and Terwillinger,10 and Martin et al11 are the only literature that present joint linkage and LD mapping methods using extended multi-generation pedigrees for QTL mapping.
In our previous research of joint linkage and LD mapping (or association study), either nuclear families or sibships were combined with population data to construct variance component models to map QTL for complex diseases.1,7,8 This article generalizes our previous work to extended multigeneration pedigrees of any sizes and any types of relatives. Intuitively, large pedigrees contain more linkage and LD information. Therefore, it is important to develop models that may accommodate any types of data, including population data, sibships, nuclear families and multigeneration large pedigrees for a combined analysis.
The arrangement of the paper is as follows. First, variance component models are constructed based on multigeneration pedigree data following our previous research. The models incorporate both linkage and LD information in a unified analysis. The linkage information is modeled in variance–covariance matrix, while the association parameters are modeled in mean coefficients. Moreover, genetic effects are decomposed into orthogonal summation of additive and dominance effects. Type I error rates are calculated for five test cases to show the robustness of the proposed approach. To explore the effectiveness of the proposed methods, comparison with the association between-family and association within-family (‘AbAw’) approach of Abecasis and Faulker et al is investigated.2,3,4,9,12 Two pedigrees of Figure 1 in Abecasis et al3 are taken as example. The comparison is based on both analytical formulae of noncentrality parameter approximations of test statistics and simulation results, using the same values of parameters of Abecasis et al.3 The simulation program PEDSIMUL kindly provided by Dr Abecasis is used to generate simulated datasets. As a practical example, the method is applied to investigate the relation between polymorphisms in the angiotensin-1-converting enzyme (ACE) genes and circulating ACE levels.14,15
Methods
Consider a quantitative trait locus Q which has two alleles Q1 and Q2 with allele frequencies q1 and q2, respectively. Assume that two markers A and B are typed in a chromosome region of the trait locus Q. Marker A has two alleles A and a with frequencies PA and Pa, respectively. Marker B has two alleles B and b with frequencies PB and Pb, respectively. Suppose that the data are composed of I families. Let us list the log-likelihood of the I families by L1, …, LI. The overall log-likelihood is L=∑Ii=1Li. In the ith family, let ni be the total number of individuals who are listed as j=1, 2, …, ni, each individual j is preceded by all his/her ancestors. Let us denote the quantitative traits of ith family by a vector yi=, genotypes at marker A by a vector , and genotypes at marker B by a vector . The log-likelihood is defined by
under the assumption of multivariate normality. The notations of the log-likelihood are defined as follows. The mean component Xiμ is from the following regression equation in a similar way as that of our previous work:1,7,8
where β is the overall mean, Gij is the polygenic effect, eij is the error term. Assume that Gij is normal N(0,σG2), and eij is normal N(0,σe2), Moreover, Gij and eij are independent. xAij, xBij, zAij and zBij are dummy variables defined by
αA, αB, δA and δB are regression coefficients of the dummy variables xAij, xBij, zAij and zBij.
The model matrix Xi is defined by
and μ=(β, αA, αB, δA, δB)τ is a vector of regression coefficients. An intuitive rationale of regression (1) is provided on pages 608–609 in Fan and Xiong.1 In regression (1), it is assumed that there are no covariates, which may be included as that of Fan and Xiong.1,8
Σi is a ni × ni variance–covariance matrix defined as
where σ2=σg2+σG2+σe2, σg2 is variance explained by the putative QTL Q, σG2 is polygenic variance, and σe2 is error variance. The genetic variances σg2=σga2+σgd2 and σG2=σGa2+σGd2 are decomposed into additive and dominance components. ρjk=(πjkQσga2+ΔjkQσgd2+2ΦjkσGa2+Δ7jkμGd2)/σ2 is correlation between the jth individual and the kth individual of the family, where πjkQ is the proportion of alleles shared identical by descent (IBD) at QTL Q by the jth and the kth individuals, ΔijQ is the probability that both alleles at QTL Q shared by the jth and the kth individuals are IBD, Φjk is kinship coefficient of individuals j and k, and Δ7jk is the probability that both alleles shared by the jth and the kth individuals are IBD at any locus. πjkQ and ΔjkQ are usually estimated by marker information. 16,17,18 The recombination fractions between the genotyped markers and the unobserved QTL are contained in the estimations of πjkQ and ΔjkQ.7,19,20,21,22 Hence, linkage information is modeled in variance–covariance matrix.
Let μij be the effect of genotype QiQj,i,j=1,2, μ12=μ21, that is μij is the mean displacement in phenotype trait for a given genotype QiQj. Denote the population effect mean by μ=μ11q12+2μ12q1q2+μ22q22, that is, μ is the aggregate effect of the QTL on the trait mean in the population. Define αQ=q1μ11+(q2−q1)μ12−q2μ22, δQ=2μ12−μ11−μ22. If μ11=a, μ12=d, and μ22=−a as in the traditional quantitative genetics,23 αQ=a+(q2−q1)d is the average allele substitution effect, and δQ=2d characterizes the dominance effect. In general, one may define a=μ11−(μ11+μ22)/2 and d=μ12−(μ11+μ22)/2. It is well known that the additive variance σgd2=2q1q2αQ2 and the dominance variance σgd2=(q1q2)2δQ2.
Denote a measure of LD between trait locus Q and marker A by DAQ=P(AQ1)−q1PA, a measure of LD between trait locus Q and marker B by DQB=P(BQ1)−q1PB, and a measure of LD between marker A and marker B by DAB=P(AB)−PAPB. Let the additive and dominance variance-covariance matrices be
Such as in Appendix B, Fan and Xiong,1 we can show that the coefficients of regression equation (1) are given by
If only one marker A is used in the analysis, Eq. (3) can be replaced by αA=DAQαQ/(PaPA), δA=DAQ2δQ/(Pa2PA2). When the marker and QTL are the same and allele A is Q1 and allele a is Q2, DAQ=q1q2=PaPA, this implies that αA=αQ, and δA=δQ, that is, all genetic substitution effect and dominance effect are encompassed in the allele association parameters. Hence, the estimates of the residual additive genetic variance σga2 and σgd2 should be close to 0 in data analysis.
For a relative pair (1,2) of individuals 1 and 2. Table 1 gives the conditional probability P(G1,G2∣C) given their allele IBD sharing status. Here Gi is genotype of relative i, and C is one event of (IBD=k), k=0,1,2. For example, P(AA, AA∣IBD=0)=PA4, P(AA,AA∣IBD=1)=PA3 and P(AA, AA∣IBD=2)=PA2. Denote indicator functions
Utilizing the conditional probabilities of Table 1, the conditional expectation of product of the indicator functions can be calculated and the results are listed in Table 2. For instance, E(xA1xA2∣IBD=2)=4Pa2PA2+(Pa−PA)2 2PaPA+4PA2Pa2=2PaPA. Based on Table 2, the expectation of product of indicator functions of (4) can be calculated for various types of relative pair. If individuals 1 and 2 are the same person, P(IBD=2)=1 and then
where O are 0 matrices. If individuals 1 and 2 are common type of relatives, Appexdix A provides expectations of product of indicator functions (4).
From above discussions, it is clear that the linkage is modeled in the variance–covariance matrix, while the association parameters are modeled in the mean coefficients. Hence, the model simultaneously takes linkage and association into account. The individual test of either linkage or association can be performed as follows: compare the full model in which all parameters are estimated and sub-models in which some parameters are fixed as constant.
In practice, one may first test linkage to detect a broad region of a trait locus. That is, the coefficients of αA, αB, δA and δB are fixed to be 0; and test the existence of linkage. In the presence of linkage, the association test can be performed for fine mapping of the QTL. If both additive and dominance variances σga2 and σgd2 are significantly larger than 0, the genetic effects are not equal to 0 (ie, αQ≠0 and δQ≠0). This implies that a null hypothesis HAB,ad:αA=αB=δA=δB=0 is equivalent to DAQ=DQB, that is, no association between the QTL Q and the markers. Hence, HAB,ad can be used to test association. Notice that there are four parameters under the null hypothesis HAB,ad, but only two independent parameters DAQ and DQB are under test. Hence, at least two of the regression coefficients of αA, αB, δA, and δB must be 0 in data analysis. Suppose only additive variance σga2 is significantly larger than 0, but dominance variance σgd2 is not significantly larger than 0 (ie, αQ≠0 and δQ=0). Regression (1) can be simplified by excluding the dominance effects from the analysis, that is, setting δA=δB=0. A null hypothesis HAB,a:αA=αB=0 can be used to test association between the QTL Q and the markers.
In regression (1), two markers A and B are used in the analysis. If multiple markers are available, one may extend it to fit the data. For example, assume one more marker C is typed. Then regression (1) can be extended to
where xCij and zCij are defined accordingly in the same way of (2), and αC and δC are their regression coefficients.
Evidence of association can be tested by likelihood ratio test (LRT) procedure. For instance, let Lad be the log-likelihood under the alternative hypothesis of HAB,ad, and L0 be the log-likelihood under the null hypothesis HAB,ad. Then, the quantity 2[Lad–L0] is asymptotically distributed as χ2. Notice again that there are only two measures of LD, DAQ and DQB, under the null hypothesis HAB,ad. In data analysis, the number of coefficients αA, αB, δA, δB, which are significantly different from 0, should be less than or equal to 2. This number is the degrees of freedom of the likelihood ratio test 2[Lad–L0]. For large sample data, the likelihood ratio test is accurate based on the statistical theory. In the following, we will develop a F-test procedure based on linear model theory.24
F-tests and noncentrality parameter approximations
For the convenience of explanation, let us consider I families given in graph A or graph B of Figure 1 (Figure 1 in Abecasis et al3). Let N be the total number of individuals, that is, N=nI, where n=11 for graph A and n=18 for graph B of Figure 1, respectively. For each individual in Figure 1, an ID is assigned. For the grand parents of graph B, the genotypes are unavailable and so no IDs are assigned. Denote the total trait values y=(y1τ, …, yIτ)τ, the total variance–covariance matrix by Σ=diag(Σ1, …, ΣI), and the model matrix by X=X1τ, …, XIτ)τ. Assume that Σ1=⋯=ΣI. Let α̂A, α̂B, δ̂A, δ̂B, Σ̂i, Σ̂ be the maximum likelihood estimators of αA, αB, δA, δB, Σi, Σ. The estimate of μ is μ̂=[XτΣ̂−1 X]−1 Xτ Σ̂−1 y=[∑i=1IXiτΣ̂i−1Xi]−1∑i=1IXiτΣ̂i−1yi. Let H be a q × 5 test matrix of rank q. By Graybill24 (Chapter 6), the test statistic of a hypothesis Hμ=0 is noncentral F(q, N−5) defined by
The noncentrality parameter of the test statistic F can be calculated by λ=(Hμ)τ[H[XτΣ−1X]−1Hτ]−1 Hμ=(Hμ)τ[H[∑i=1IXiτΣi−1Xi]−1Hτ]−1 (Hμ). Denote In Appendix B, an approximation of the noncentrality parameter is provided by
where b1 and b2 are constants given by equations (B.1) and (B.2) in Appendix B for graph A and graph B of Figure 1, respectively. Notations O are 0 matrices.
To test the null hypothesis HAB,a : αA=αB=0, the test matrix
Let us denote the corresponding F-test statistic by FAB,a. The noncentrality parameter is
To test the null hypothesis HAB,d: δA=δB=0, the test matrix
Denote the corresponding F-test statistic by FAB,d. The noncentrality parameter is
To test the null hypothesis HAB,ad : αA=αB=δA=δB=0, the test matrix
Denote the corresponding F-test statistic by FAB,ad. The noncentrality parameter is λAB,ad=λAB,a+λAB,d. Assume that only one marker A is used in the analysis. The test statistic of a hypothesis Hμ=0 is noncentral F(q,N−3) defined by
where μ=(β, αA, δA)τ. The noncentrality parameter is for the null hypothesis HA,ad : αA=δA=0. Correspondingly, we denote the F-test statistic by FA,ad. Similarly, λA,a≈(b1I/σ2)σga2DAQ2/(PAPaq1q2) is the noncentrality parameter of the test statistic FA,a for the null hypothesis HA,a :αA=0. The noncentrality parameter of the test statistic FA,d for the null hypothesis HA,d : δA=0 is λA,d≈(b2I/σ2)σgd2 DAQ4/(PA2Pa2q12q22).
Type I error rates and power comparison
Before making comparison, let us briefly describe the ‘AbAw’ method and comments on the difference of the ‘AbAw’ approach with ours. Such as our proposed method, the random effect parameters (σg2, σG2 and σe2) are modeled in the variance–covariance matrices in the ‘AbAw’ approach. However, the ‘AbAw’ approach decomposes the genetic association into effects of between-family members and within-family members.2,3,4,9,12 The effects of between family members and within-family members are modeled in two coefficients βb and βw. This is primarily the difference between the ‘AbAw’ approach and our method. Suppose only one marker A is used in the analysis. Ignoring the dominance effect, our model only has one additive coefficient αA but the ‘AbAw’ still has two regression coefficients βb and βw. Under the assumption of multivariate normality, the ‘AbAw’ χqtl2 is defined as the likelihood ratio test of hypothesis βw=0 against no constraints on the coefficient. In addition, the ‘AbAw’ approach uses only one marker and it is not clear how to extend the ‘AbAw’ approach to use more than one markers in analysis. We use two biallelic markers, and it is easy to extend our approach to use multiple biallelic markers in analysis. For sibship data, Fan and Jung7 compared our method with the ‘AbAw’ approach and found that our method is advantageous. In the following, we are going to calculate the type I error rates to show the robustness of the proposed approach. Then, we will make power comparison for general extended pedigrees.
Type I error rates
To investigate the robustness of the proposed approach, we calculate type I error rates at a 0.05 significance level for five test cases of Table 3, which is the same as those of Table 1 of Abecasis et al.3 Trait values are constructed from a normal distribution with mean 100, and total variance σ2=1 at the trait locus Q except for the Admixture case. Here σ2=σga2+σGa2+σe2 is the summation of the additive variance of major gene effect σga2, the variance of additive polygenic effect σGa2, and the error variance σe2. A biallelic marker A is simulated at a recombination fraction 0 between QTL Q and locus A. In each case, linkage equilibrium is assumed between QTL Q and marker A, that is, DAQ=0. In the test case of Admixture, population admixture is generated by mixing families equally draw from one of the two subpopulations C and D. In both subpopulations C and D, no major gene effect or familial effect is assumed, that is, σg2=σG2=0. However, the trait mean of subpopulation C is fixed as 1 and the variance is fixed as 1, and the marker allele frequency PA is taken as 0.7 in subpopulation C. The trait mean of subpopulation D is fixed as 0 and the variance is fixed as 1, and the marker allele frequency PA is taken as 0.3 in subpopulation D. Therefore, the total variance in the mixing population is σ2=1.25. The admixture contributed to (1–0)2/[4 × 1.25]=0.20 of the total variance. The related parameters of the other four test cases are given in the legend of Table 3.
To calculate the type I error rates, 1000 data sets are simulated for each test case. Each data set contains 50 pedigrees of either graph A or graph B of Figure 1, respectively. Using the data sets, we fit the following model
where Gij is normal N (0, σGa2), yij is normal N (β+xAijαA,σ2), and σ2=σga2+σGa2+σe2. The null hypothesis is HA,a : αA=0. Since the QTL Q is in linkage equilibrium with marker A, an empirical test statistic, which is larger than the cutting point at a 0.05 significance level, is treated as a false positive. Based on either likelihood ratio test or F-test, type I error rates are calculated as the proportions of the 1000 simulation data sets, which give significant result at the 0.05 significant level based on FA,a and likelihood ratio test statistic, respectively.
The results of Table 3 shows that the type I error rates are around the nominal level 0.05. Hence, the model is reasonably robust. For the large three-generation pedigree of Graph B of Figure 1, the type I error rates are very close to the nominal 0.05 significance level. Hence , the model works well under large sample case, as expected (notice that the total number of individuals N=550 for the 50 small three-generation pedigrees of Graph A, and N=900 for the 50 large three-generation pedigrees of Graph B). Moreover, the results of F-test F̂A,a are very close to those of likelihood ratio test statistic. This is, theoretically, guaranteed under large sample by the property of the two tests (Chapter 6, Graybill24).
Comparison with the ‘AbAw’ approach
Table 4 shows the power of 50 duplicates of pedigree A (and pedigree B) of Figure 1, respectively, for varying levels of LD between trait locus and marker A at 0.01 significance level. The parameters are exactly the same as those of Abecasis et al.3 Besides, the power of AbAw's χqtl2 is taken from Table 2 of Abecasis et al.3 The power of FA,a is calculated based on the noncentrality parameter approximation λA,a; the power of F̂A,a and LRT are calculated as the proportions of 1000 simulation data sets, which give significant result at the 0.01 significance level based on FA,a and likelihood ratio test statistic, respectively. It is obvious from Table 4 that FA,a and likelihood ratio test based on our model have higher power than that of AbAw's χqtl2. Since other test statistics in Table 2 of Abecasis et al3 is less powerful than the AbAw's χqtl2, the method proposed in the current article is advantageous over the AbAw approach. As the type I error rates in Table 3, the results of F-test F̂A,a are very close to those of likelihood ratio test statistics.
Power comparison
Let us denote the heritability by h2, which is h2=σga2/σ2.23. To make power comparison, we conside 30 duplicates of small and large 3-generation pedigrees of graphs A and B of Figure 1, respectively. Two modes of inheritance are taken as example: a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritace a=1.0,d=−0.5, respectively. Figures 2 and 3 show the power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against LD coefficient DAQ at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. The other parameters are given in the legend of the Figures. It can be seen that FAB,ad, generally has higher power than that of FA,ad, and FAB,a generally has higher power than that of FA,a. Hence, analysis using two markers are advantageous. Owing to the increase of degrees of freedom of test statistics, FAB,ad generally has lower power than that of FAB,a, and FA,ad generally has lower power than that of FA,a. Besides, the power of FAB,d and FA,d is low unless the LD between markerr A and QTL Q is vey strong. Since pedigree of Graph B of Figure 1 is larger than that of Graph A, the power of test statistics for Figure 3 is higher than that of corresponding statistics for Figure 2. Thus, large pedigrees contain more LD information and the models proposed rightly use the information.
Figures 4 and 5 show power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against trait frequency q1 or marker allele frequency PA at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. In addition to the merits shown in Figures 2 and 3, the power of test statistics FAB,ad, and FAB,,a heavily depends on the frequency q1 of QTL (graphs I and II of each figure), but not very much on the marker allele fequency PA (graphs III and IV of each figure). Figures 6 and 7 show power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against heritability h2 at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. It can be seen that the power of FAB,ad, FAB,a, FA,ad and FA,a can be high as the heritability h2 is larger than 0.1. The power of FAB,d and FA,d is generally low.
An example
The proposed method is applied to analyze ACE data.14,15 The data consist of 83 extended families with between four and 18 members, and the pedigrees range in size from two to three generations. Circulating ACE levels were measured for 405 individuals. In total, 10 biallelic polymorphisms in the ACE gene were genotyped. Multipoint IBD at each marker are calculated by Merlin.25 Software QTDT-2.4.3 is used to analyze the data. There are missing genotype information at markers, and so the total number of individuals is different from marker to marker. For instance, there are four individuals whose genotype information is missing at marker I/D and the total number N=401; at marker G2350A, on the other hand, there are more missing genotype data, and N=365.
Variance component linkage analysis shows that additive variances are significantly larger than 0, but dominance variances are not significantly larger than 0.3 Hence, dominance effects can be excluded from regression equation. The total variance is modeled as σ2=σga2+σGa2+σe2, and related variance-covariance structure can be constructed. Table 5 shows LD analysis of the ACE gene by individual marker. To make comparison with the ‘AbAw’ approach, results of AbAw's lod are taken from Table 4 of Abecasis et al.3 After fitting the proposed models in this article, lod is calculated by LRT/(2 ln 10), where LRT=2(L1−L0), L1 is the log-likelihood under yij=β+xAijαA+Gij+eij, and L0 is the log-likelihood under yij=β+Gij+eij. The lod scores calculated by the proposed method in this article is generally higher than those of the ‘AbAw’ approach, which is consistent with the results of Table 4. Hence, the proposed method is advantageous. Moreover, the results of Table 5 confirm the finding that the association is strongest around the G2215A, I/D, G2350A and 4656(CT)3/2 polymorphisms (lod=27.01), 27.37, 28.01 and 27.93; and the ‘AbAw’ lod=14.91, 15.76, 14.40 and 14.22, respectively;3). The estimations of σga2 are 0.05 and 0.03 at markers I/D and G2350A, respectively. In Table 5, the additional linkage effects are presented. The additional linkage lod is calculated by 2(L1–L2)/(2 ln 10), where L2 is the log-likelihood yij=β+xAijαA+Gij+eij assuming the total variance σ2=σGa2+σe2 and related variance–covariance structure. For markers I/D and G2350A, the additional linkage lods are 0.09 and 0.03 (the related P-values are 0.76 and 0.85, respectively). The residual QTL-specific variances of 0.05 and 0.03 for the two ACE markers of largest effect are not significantly different from zero. Little additional linkage effects are present after modeling the association effects. Therefore, these markers are likely in complete LD with the trait alleles, and the genetic variance attributable to the QTL is encompassed in the association coefficient αA.2,3
Notice that there are 10 individual results in Table 5, and it is unclear if there is any relation among the models/markers from results in Table 5. Using the proposed models, we explore LD analysis based on two or more markers. Table 6 shows LD analysis of the ACE gene by two markers. Regressions are given by: (1) yij=β+Gij+eij; (2) yij=β+xAijαA+Gij+eij; (3) yij=β+xAijαA+xBijαB+Gij+eij. We first investigate markers I/D and G2350A that show strongest association in individual marker analysis of Table 5. It turns out that marker I/D is correlated with marker G2350A in the following sense: if A=I/D and B=G2350A, xAij are xBij are correlated to each other. This phenomenon motivates us to further investigate the relation of these two markers. Utilizing data from http://www.well.ox.ac.uk/~mfarrall/oxhap-freq.html, we calculate the frequency of the four haplotypes of these two markers as follows: P(IA)=0.478875, P(IG)=0.002817, P(DG)=0.515494, P(DA)=0.002817 (note here A is an allele at marker G2350A). This shows that allele I at marker I/D is almost always present with allele A at marker G2350A, and allele D at marker I/D is almost always present with allele G at marker G2350A. Besides, the measure of LD is 0.246878 between the markers I/D and G2350A. The two markers are almost in complete LD with each other.
Since there are more missing data at marker G2350A than that at marker I/D, we choose I/D as baseline marker A. Six markers, 4656(CT)3/2, T-5491C, A-5466C, T-3892C, A-240T and T-93C, show significant association at a 0.05 significance level, in addition to the association of marker I/D (see the P-value of test H0 : αB=0 in Table 6). In these six markers, 4656(CT)3/2 shows strongest additional association with LRT=7.94, lod=1.72, P-value=0.005 (Table 6). Hence, we choose I/D and 4656(CT)3/2 as markers A and B for the basis of further investigation. Table 7 shows results by one more marker in addition to markers I/D and 4656(CT)3/2. Four markers, T-5491C, A-5466C, T-3892C and A-240 T, show significant association at a 0.05 significance level, in a addition to the association of markers I/D and 4656(CT)3/2 (see the P-value of test H0 : αC=0 in Table 7). Further investigation shows that there is no significant evidence to analyze the data using more than three markers in the regression equations.
Notice that if the significance level is set at 0.01 instead of 0.05, then only markers I/D and 4656(CT)3/2 are associated with the trait locus (Table 6). The regression, yij=β+xAijαA+xBijαB+Gij+eij, A=I/D, B=4656(CT)3/2, can fully interpret association at the 0.01 significance level, which provides a unique result.
Discussion
This article focuses on building models that may utilize extended multi generation large pedigrees for LD mapping of QTL. Based on the same logic developed in our previous work,7,8 variance component models are constructed for high resolution joint linkage and LD mapping of QTL. By type I error rate evaluation, it is found that the proposed models have correct type I errors and the methods are robust. The methods proposed are compared with the ‘AbAw’ approach. It is found that our models have higher power. This is consistent with our previous findings based on sibpair data.7 Based on power comparison, it is shown that large pedigrees may contain more LD information than small pedigrees. Moreover, the proposed method is applied to ACE data with a better result than that of the ‘AbAw’ approach. In addition, it is found that markers I/D and 4656(CT)3/2 can fully describe the association with the trait locus at a 99% significance level for the ACE data. This result is very meaningful. For instance, all 10 biallelic markers of ACE data show association with the trait locus. However, it is markers I/D and 4656(CT)3/2 which are really necessary in describing the association at the 99% significance level. Marker G2350A is correlated to marker I/D, as a matter of fact!
One possible explanation that the proposed method is more powerful than the ‘AbAw’ approach is as follows. The proposed method decomposes the association into the summation of additive and dominance effects. If dominance effect is not significantly larger than 0, then only additive effect is modeled such as the ACE data. The’AbAw’ approach, on the other hand, decomposes the association into the summation of between-family (b) and within-family (w) components. In Table 4 of Abecasis et al,3 the ‘AbAw’ lod is reported to be the within-family effect, which is less powerful than the proposed method. By decomposing the association effect into association between-family and association within-family in the ‘AbAw’ approach, the information is diluted and so the approach is less powerful. This is, in the opinion of the authors of the current article, is not a criticism for the ‘AbAw’ approach, since it may be a good method in certain circumstances. It is our hope that the current article may stimulate more interest in exploring the most appropriate method in LD mapping of complex diseases.
Population may contain LD information, and pedigree data may contain both linkage and LD information. In mapping complex diseases, it is difficult to get replicate linkage findings using pedigree data. It is interesting to combine the same pedigree data with population data for fine association study. In general, any type of pedigree data can be combined with population data for a unified analysis of linkage and LD mapping of QTL, based on the models developed in the current paper and our previous research.7,8 Linkage analysis can localize genetic traits in a broad region, and is less sensitive to population structures. LD mapping, on the other hand, is appropriate for fine high-resolution mapping and sensitive to population admixtures. As the first step, one may perform linkage analysis using pedigree data to get prior linkage evidences based on a sparse genetic map.26,27 Then, a combined analysis using both population data and pedigree data can take advantage of both linkage and LD mapping for high resolution of genetic traits. That is, it may provide high resolution of LD mapping, and be more likely to avoid the false positives by using the prior linkage evidences based on a dense genetic map.28
Up to now, most research focus on using biallelic markers in LD mapping of QTL. It would be interesting in building models using multiallelic markers such as macrosatellites or haplotype blocks. One problem in using multiallelic markers in analysis is that the number of parameters can be big, which may lead to high degrees of freedom of test statistics. Hence, selection of significant markers and relevant alleles is an important issue. In addition, genotyping errors can heavily impact the result of a study. These issues deserve more in depth investigations.
References
Fan R, Xiong M : High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur J Hum Genet 2002; 10: 607–615.
Abecasis GR, Cardon LR, Cookson WOC : A general test of association for quantitative traits in nuclear families. Am J Hum Genet 2000; 66: 279–292.
Abecasis GR, Cookson WOC, Cardon LR : Pedigree tests of linkage disequilibrium. Eur J Hum Genet 2000; 8: 545–551.
Abecasis GR, Cookson WOC, Cardon LR : The power to detect linkage disequilibrium with quantitative traits in selected samples. Am J Hum Genet 2001; 68: 1463–1474.
Allison DB, Bonnie T, St Jean P, Elston RC, Infante MC, Schork NJ : Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am J Hum Genet 1998; 63: 1190–1201.
Almasy L, Williams JT, Dyer TD, Blangero J : Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genet Epidemiol 1999; 17 (Suppl 1): S31–S36.
Fan R, Jung J : High resolution joint linkage disequilibrium and linkage mapping of quantitative trait loci based on sibship data. Hum Hered 2003; 56: 166–187.
Fan R, Xiong M : Combined high resolution linkage and association mapping of quantitative trait loci. Eur J Hum Genet 2003; 11: 125–137.
Fulker DW, Cherny SS, Sham PC, Hewitt JK : Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet 1999; 64: 259–267.
Göring HHH, Terwillinger JD : Linkage analysis in the presence of error IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 2000; 66: 1310–1327.
Martin ER, Monks SA, Warren LL, Kaplan NL : A test for linkage and association in general pedigrees: the pedigree diseqilibrium test. Am J Hum Genet 2000; 67: 146–154.
Sham PC, Cherny SS, Purcell S, Hewitt JK : Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66: 1616–1630.
George V, Tiwari HK, Zhu XF, Elston RC : A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am J Hum Genet 1999; 65: 236–245.
Farrall M, Keavney B, MckKenzie CA et al.: Fine mapping of an ancestral recombination break-point in DCP1. Nat Genet 1999; 23: 270–271.
Keavney B, MckKenzie CA, Connell JM et al: Measured haplotype analysis of the angiotension-1 converting enzyme gene. Hum Mol Genet 1998; 7: 1745–1751.
Amos CI : Robust variance-components approach for assessing linkage in pedigrees. Am J Hum Genet 1994; 54: 534–543.
Amos CI, Elston RC : Robust methods for the detection of genetic linkage for quantitative data from pedigrees. Genet Epidemiol 1989; 6: 349–360.
Amos CI, Elston RC, Wilson AF, Bailey-Wilson JE : A more powerful robust sib-pair test of linkage for quantitative traits. Genet Epidemiol 1989; 6: 435–449.
Fulker DW, Cherny SS, Cardon LR : Multiple interval mapping of quantitative trait loci, using sib-pairs. Am J Hum Genet 1995; 56: 1224–1233.
Almasy L, Blangero J : Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet 1998; 62: 1198–1211.
Goldgar DE, Oniki RS : Comparison of a multipoint identity-by-descent method with parametric multipoint linkage analysis for mapping quantitative traits. Amer J Hum Genet 1992; 50: 598–606.
Pratt SC, Daly M, Kruglyak L : Exact multipoint quantitative-trait linkage analysis in pedigrees by variance components. Am J Hum Genet 2000; 66: 1153–1157.
Falconer DS, Mackay TFC : Introduction to quantitative genetics, 4th edn. London: Longman, 1996.
Graybill FA : Theory and application of the linear model. Belmont, CA: Wadsworth Publishing Company, Inc., 1976.
Abecasis GR, Cherny SS, Cookson WOC, Cardon LR : Merlin—rapid analysis of dense genetic maps usig sparse gene flow trees. Nat Genet 2002; 30: 97–101.
Broman KW, Murray JC, Sheffied VC, White RL, Weber JL : Comprehensive human genetic map: individual and sex-specific variation in recombination. Am J Hum Genet 1998; 63: 861–869.
Kong A, Gudbjartsson DF, Sainz J et al.: A high resolution recombination map of the human genome. Nat Genet 2002; 31: 241–247.
The International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001; 409: 928–933.
Acknowledgements
We thank two reviewers, and Dr Gert-Jan B von Ommen for their helpful comments to improve the paper. R Fan was supported partially by a research fellowship from the Alexander von Humboldt Foundation, Germany, and an International Research Travel Assistance Grant of the Texas A&M University. We are grateful to Dr Abecasis for kindly providing the simulation program PEDSIMUL to generate simulated datasets.
Author information
Authors and Affiliations
Corresponding author
Appendices
Appendix A
In this appendix, we provides expectations of product of indicator functions (4) for a few common type of relatives. If individuals 1 and 2 are heterozygous sibs P(IBD=0)=P(IBD=2)=1/4, P(IBD=1)=1/2 and then
If individuals 1 and 2 are a parent–offspring pair, P(IBD=0)=P(IBD=2)=0, P(IBD=1)=1 and then
If individuals 1 and 2 are a grandparent/grandchild pair (or uncle/niece or aunt/nephew pair), P(IBD=0)=1/2, P(IBD=2)=0, P(IBD=1)=1/2 and then
If individuals 1 and 2 are first cousins, P(IBD=0)=3/4, P(IBD=2)=0, P(IBD=1)= 1/4 and then
In general, consider two individuals 1 and 2 who are noninbred relatives. Let Φ12 be their kinship coefficient, and Δ712 be the probability that both alleles shared by the two individuals are IBD at any locus. Then
Appendix B
Based on Equations (5) and (A.1), (A.2), (A.3) and (A.4), an approximation of noncentrality parameter of test staticstics F can be obtained as follows: Let us denote Σi−1=(1/σ2)(γkl)n × n n=11 for graph A and n=18 for graph B of Figure 1, respectively. When I is sufficiently large, the laws of large numbers implies that
Here, b1 and b2 are constants as follows for pedigrees in graph A of Figure 1
For pedigrees in graph B of Figure 1, constants b1 and b2 are given by
Therefore, the noncentrality parameter can be approximated by
Rights and permissions
About this article
Cite this article
Fan, R., Spinka, C., Jin, L. et al. Pedigree linkage disequilibrium mapping of quantitative trait loci. Eur J Hum Genet 13, 216–231 (2005). https://doi.org/10.1038/sj.ejhg.5201301
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/sj.ejhg.5201301
Keywords
This article is cited by
-
Favorable alleles mining for gelatinization temperature, gel consistency and amylose content in Oryza sativa by association mapping
BMC Genetics (2019)
-
Association studies of dormancy and cooking quality traits in direct-seeded indica rice
Journal of Genetics (2014)
-
Polymorphisms of the IGF1R gene and their genetic effects on chicken early growth and carcass traits
BMC Genetics (2008)
-
Combined Linkage and Association Mapping of Quantitative Trait Loci with Missing Completely at Random Genotype Data
Behavior Genetics (2008)
-
PedGenie: an analysis approach for genetic association testing in extended pedigrees and genealogies of arbitrary size
BMC Bioinformatics (2006)