Introduction

To map quantitative trait loci (QTL) of complex diseases, linkage disequilibrium (LD or association) regression analysis using population data can be performed.1 However, population substructures may affect the results and produce false positives. To remedy this, variance component models are proposed to perform joint LD and linkage mapping of QTL using both population and pedigree data.2,3,4,5,6,7,8,9,10,11,12 However, most research limits on models that utilize small nuclear families in research. In principle, the method of George et al13 can be used to analyze general pedigrees in joint linkage and LD mapping, but the examples of George et al13 are limited to nuclear families. To our knowledge, Abecasis et al,3 Göring and Terwillinger,10 and Martin et al11 are the only literature that present joint linkage and LD mapping methods using extended multi-generation pedigrees for QTL mapping.

In our previous research of joint linkage and LD mapping (or association study), either nuclear families or sibships were combined with population data to construct variance component models to map QTL for complex diseases.1,7,8 This article generalizes our previous work to extended multigeneration pedigrees of any sizes and any types of relatives. Intuitively, large pedigrees contain more linkage and LD information. Therefore, it is important to develop models that may accommodate any types of data, including population data, sibships, nuclear families and multigeneration large pedigrees for a combined analysis.

The arrangement of the paper is as follows. First, variance component models are constructed based on multigeneration pedigree data following our previous research. The models incorporate both linkage and LD information in a unified analysis. The linkage information is modeled in variance–covariance matrix, while the association parameters are modeled in mean coefficients. Moreover, genetic effects are decomposed into orthogonal summation of additive and dominance effects. Type I error rates are calculated for five test cases to show the robustness of the proposed approach. To explore the effectiveness of the proposed methods, comparison with the association between-family and association within-family (‘AbAw’) approach of Abecasis and Faulker et al is investigated.2,3,4,9,12 Two pedigrees of Figure 1 in Abecasis et al3 are taken as example. The comparison is based on both analytical formulae of noncentrality parameter approximations of test statistics and simulation results, using the same values of parameters of Abecasis et al.3 The simulation program PEDSIMUL kindly provided by Dr Abecasis is used to generate simulated datasets. As a practical example, the method is applied to investigate the relation between polymorphisms in the angiotensin-1-converting enzyme (ACE) genes and circulating ACE levels.14,15

Figure 1
figure 1

Pedigrees used in power calculations and comparison, which are taken from Figure 1 of Abecasis et al.3 The number in the box or circle is individual ID.

Methods

Consider a quantitative trait locus Q which has two alleles Q1 and Q2 with allele frequencies q1 and q2, respectively. Assume that two markers A and B are typed in a chromosome region of the trait locus Q. Marker A has two alleles A and a with frequencies PA and Pa, respectively. Marker B has two alleles B and b with frequencies PB and Pb, respectively. Suppose that the data are composed of I families. Let us list the log-likelihood of the I families by L1, …, LI. The overall log-likelihood is L=∑Ii=1Li. In the ith family, let ni be the total number of individuals who are listed as j=1, 2, …, ni, each individual j is preceded by all his/her ancestors. Let us denote the quantitative traits of ith family by a vector yi= ( y i 1 , y i 2 ,  ,  y i n i ) τ , genotypes at marker A by a vector ( A i 1 A i 2 , A i n i ) τ , and genotypes at marker B by a vector ( B i 1 B i 2 , B i n i ) τ . The log-likelihood is defined by

under the assumption of multivariate normality. The notations of the log-likelihood are defined as follows. The mean component Xiμ is from the following regression equation in a similar way as that of our previous work:1,7,8

where β is the overall mean, Gij is the polygenic effect, eij is the error term. Assume that Gij is normal N(0,σG2), and eij is normal N(0,σe2), Moreover, Gij and eij are independent. xAij, xBij, zAij and zBij are dummy variables defined by

αA, αB, δA and δB are regression coefficients of the dummy variables xAij, xBij, zAij and zBij.

The model matrix Xi is defined by

and μ=(β, αA, αB, δA, δB)τ is a vector of regression coefficients. An intuitive rationale of regression (1) is provided on pages 608–609 in Fan and Xiong.1 In regression (1), it is assumed that there are no covariates, which may be included as that of Fan and Xiong.1,8

Σi is a ni × ni variance–covariance matrix defined as

where σ2=σg2+σG2+σe2, σg2 is variance explained by the putative QTL Q, σG2 is polygenic variance, and σe2 is error variance. The genetic variances σg2=σga2+σgd2 and σG2=σGa2+σGd2 are decomposed into additive and dominance components. ρjk=(πjkQσga2jkQσgd2+2ΦjkσGa27jkμGd2)/σ2 is correlation between the jth individual and the kth individual of the family, where πjkQ is the proportion of alleles shared identical by descent (IBD) at QTL Q by the jth and the kth individuals, ΔijQ is the probability that both alleles at QTL Q shared by the jth and the kth individuals are IBD, Φjk is kinship coefficient of individuals j and k, and Δ7jk is the probability that both alleles shared by the jth and the kth individuals are IBD at any locus. πjkQ and ΔjkQ are usually estimated by marker information. 16,17,18 The recombination fractions between the genotyped markers and the unobserved QTL are contained in the estimations of πjkQ and ΔjkQ.7,19,20,21,22 Hence, linkage information is modeled in variance–covariance matrix.

Let μij be the effect of genotype QiQj,i,j=1,2, μ12=μ21, that is μij is the mean displacement in phenotype trait for a given genotype QiQj. Denote the population effect mean by μ=μ11q12+2μ12q1q2+μ22q22, that is, μ is the aggregate effect of the QTL on the trait mean in the population. Define αQ=q1μ11+(q2q1)μ12q2μ22, δQ=2μ12μ11μ22. If μ11=a, μ12=d, and μ22=−a as in the traditional quantitative genetics,23 αQ=a+(q2q1)d is the average allele substitution effect, and δQ=2d characterizes the dominance effect. In general, one may define a=μ11−(μ11+μ22)/2 and d=μ12−(μ11+μ22)/2. It is well known that the additive variance σgd2=2q1q2αQ2 and the dominance variance σgd2=(q1q2)2δQ2.

Denote a measure of LD between trait locus Q and marker A by DAQ=P(AQ1)−q1PA, a measure of LD between trait locus Q and marker B by DQB=P(BQ1)−q1PB, and a measure of LD between marker A and marker B by DAB=P(AB)−PAPB. Let the additive and dominance variance-covariance matrices be

Such as in Appendix B, Fan and Xiong,1 we can show that the coefficients of regression equation (1) are given by

If only one marker A is used in the analysis, Eq. (3) can be replaced by αA=DAQαQ/(PaPA), δA=DAQ2δQ/(Pa2PA2). When the marker and QTL are the same and allele A is Q1 and allele a is Q2, DAQ=q1q2=PaPA, this implies that αA=αQ, and δA=δQ, that is, all genetic substitution effect and dominance effect are encompassed in the allele association parameters. Hence, the estimates of the residual additive genetic variance σga2 and σgd2 should be close to 0 in data analysis.

For a relative pair (1,2) of individuals 1 and 2. Table 1 gives the conditional probability P(G1,G2C) given their allele IBD sharing status. Here Gi is genotype of relative i, and C is one event of (IBD=k), k=0,1,2. For example, P(AA, AAIBD=0)=PA4, P(AA,AAIBD=1)=PA3 and P(AA, AAIBD=2)=PA2. Denote indicator functions

Table 1 Conditional probability P(G1, G2C) of a relative pair (1, 2) given their allele IBD sharing status

Utilizing the conditional probabilities of Table 1, the conditional expectation of product of the indicator functions can be calculated and the results are listed in Table 2. For instance, E(xA1xA2IBD=2)=4Pa2PA2+(PaPA)2 2PaPA+4PA2Pa2=2PaPA. Based on Table 2, the expectation of product of indicator functions of (4) can be calculated for various types of relative pair. If individuals 1 and 2 are the same person, P(IBD=2)=1 and then

where O are 0 matrices. If individuals 1 and 2 are common type of relatives, Appexdix A provides expectations of product of indicator functions (4).

Table 2 Conditional expectation of a relative pair (1, 2) given their allele IBD sharing status

From above discussions, it is clear that the linkage is modeled in the variance–covariance matrix, while the association parameters are modeled in the mean coefficients. Hence, the model simultaneously takes linkage and association into account. The individual test of either linkage or association can be performed as follows: compare the full model in which all parameters are estimated and sub-models in which some parameters are fixed as constant.

In practice, one may first test linkage to detect a broad region of a trait locus. That is, the coefficients of αA, αB, δA and δB are fixed to be 0; and test the existence of linkage. In the presence of linkage, the association test can be performed for fine mapping of the QTL. If both additive and dominance variances σga2 and σgd2 are significantly larger than 0, the genetic effects are not equal to 0 (ie, αQ≠0 and δQ≠0). This implies that a null hypothesis HAB,ad:αA=αB=δA=δB=0 is equivalent to DAQ=DQB, that is, no association between the QTL Q and the markers. Hence, HAB,ad can be used to test association. Notice that there are four parameters under the null hypothesis HAB,ad, but only two independent parameters DAQ and DQB are under test. Hence, at least two of the regression coefficients of αA, αB, δA, and δB must be 0 in data analysis. Suppose only additive variance σga2 is significantly larger than 0, but dominance variance σgd2 is not significantly larger than 0 (ie, αQ≠0 and δQ=0). Regression (1) can be simplified by excluding the dominance effects from the analysis, that is, setting δA=δB=0. A null hypothesis HAB,a:αA=αB=0 can be used to test association between the QTL Q and the markers.

In regression (1), two markers A and B are used in the analysis. If multiple markers are available, one may extend it to fit the data. For example, assume one more marker C is typed. Then regression (1) can be extended to

where xCij and zCij are defined accordingly in the same way of (2), and αC and δC are their regression coefficients.

Evidence of association can be tested by likelihood ratio test (LRT) procedure. For instance, let Lad be the log-likelihood under the alternative hypothesis of HAB,ad, and L0 be the log-likelihood under the null hypothesis HAB,ad. Then, the quantity 2[Lad–L0] is asymptotically distributed as χ2. Notice again that there are only two measures of LD, DAQ and DQB, under the null hypothesis HAB,ad. In data analysis, the number of coefficients αA, αB, δA, δB, which are significantly different from 0, should be less than or equal to 2. This number is the degrees of freedom of the likelihood ratio test 2[Lad–L0]. For large sample data, the likelihood ratio test is accurate based on the statistical theory. In the following, we will develop a F-test procedure based on linear model theory.24

F-tests and noncentrality parameter approximations

For the convenience of explanation, let us consider I families given in graph A or graph B of Figure 1 (Figure 1 in Abecasis et al3). Let N be the total number of individuals, that is, N=nI, where n=11 for graph A and n=18 for graph B of Figure 1, respectively. For each individual in Figure 1, an ID is assigned. For the grand parents of graph B, the genotypes are unavailable and so no IDs are assigned. Denote the total trait values y=(y1τ, …, yIτ)τ, the total variance–covariance matrix by Σ=diag1, …, ΣI), and the model matrix by X=X1τ, …, XIτ)τ. Assume that Σ1=I. Let α̂A, α̂B, δ̂A, δ̂B, Σ̂i, Σ̂ be the maximum likelihood estimators of αA, αB, δA, δB, Σi, Σ. The estimate of μ is μ̂=[XτΣ̂−1 X]−1 Xτ Σ̂−1 y=[∑i=1IXiτΣ̂i−1Xi]−1i=1IXiτΣ̂i−1yi. Let H be a q × 5 test matrix of rank q. By Graybill24 (Chapter 6), the test statistic of a hypothesis =0 is noncentral F(q, N−5) defined by

The noncentrality parameter of the test statistic F can be calculated by λ=()τ[H[XτΣ−1X]−1Hτ]−1 =()τ[H[∑i=1IXiτΣi−1Xi]−1Hτ]−1 (). Denote In Appendix B, an approximation of the noncentrality parameter is provided by

where b1 and b2 are constants given by equations (B.1) and (B.2) in Appendix B for graph A and graph B of Figure 1, respectively. Notations O are 0 matrices.

To test the null hypothesis HAB,a : αA=αB=0, the test matrix

Let us denote the corresponding F-test statistic by FAB,a. The noncentrality parameter is

To test the null hypothesis HAB,d: δA=δB=0, the test matrix

Denote the corresponding F-test statistic by FAB,d. The noncentrality parameter is

To test the null hypothesis HAB,ad : αA=αB=δA=δB=0, the test matrix

Denote the corresponding F-test statistic by FAB,ad. The noncentrality parameter is λAB,ad=λAB,a+λAB,d. Assume that only one marker A is used in the analysis. The test statistic of a hypothesis =0 is noncentral F(q,N−3) defined by

where μ=(β, αA, δA)τ. The noncentrality parameter is for the null hypothesis HA,ad : αA=δA=0. Correspondingly, we denote the F-test statistic by FA,ad. Similarly, λA,a≈(b1I/σ2)σga2DAQ2/(PAPaq1q2) is the noncentrality parameter of the test statistic FA,a for the null hypothesis HA,a :αA=0. The noncentrality parameter of the test statistic FA,d for the null hypothesis HA,d : δA=0 is λA,d≈(b2I/σ2)σgd2 DAQ4/(PA2Pa2q12q22).

Type I error rates and power comparison

Before making comparison, let us briefly describe the ‘AbAw’ method and comments on the difference of the ‘AbAw’ approach with ours. Such as our proposed method, the random effect parameters (σg2, σG2 and σe2) are modeled in the variance–covariance matrices in the ‘AbAw’ approach. However, the ‘AbAw’ approach decomposes the genetic association into effects of between-family members and within-family members.2,3,4,9,12 The effects of between family members and within-family members are modeled in two coefficients βb and βw. This is primarily the difference between the ‘AbAw’ approach and our method. Suppose only one marker A is used in the analysis. Ignoring the dominance effect, our model only has one additive coefficient αA but the ‘AbAw’ still has two regression coefficients βb and βw. Under the assumption of multivariate normality, the ‘AbAw’ χqtl2 is defined as the likelihood ratio test of hypothesis βw=0 against no constraints on the coefficient. In addition, the ‘AbAw’ approach uses only one marker and it is not clear how to extend the ‘AbAw’ approach to use more than one markers in analysis. We use two biallelic markers, and it is easy to extend our approach to use multiple biallelic markers in analysis. For sibship data, Fan and Jung7 compared our method with the ‘AbAw’ approach and found that our method is advantageous. In the following, we are going to calculate the type I error rates to show the robustness of the proposed approach. Then, we will make power comparison for general extended pedigrees.

Type I error rates

To investigate the robustness of the proposed approach, we calculate type I error rates at a 0.05 significance level for five test cases of Table 3, which is the same as those of Table 1 of Abecasis et al.3 Trait values are constructed from a normal distribution with mean 100, and total variance σ2=1 at the trait locus Q except for the Admixture case. Here σ2=σga2+σGa2+σe2 is the summation of the additive variance of major gene effect σga2, the variance of additive polygenic effect σGa2, and the error variance σe2. A biallelic marker A is simulated at a recombination fraction 0 between QTL Q and locus A. In each case, linkage equilibrium is assumed between QTL Q and marker A, that is, DAQ=0. In the test case of Admixture, population admixture is generated by mixing families equally draw from one of the two subpopulations C and D. In both subpopulations C and D, no major gene effect or familial effect is assumed, that is, σg2=σG2=0. However, the trait mean of subpopulation C is fixed as 1 and the variance is fixed as 1, and the marker allele frequency PA is taken as 0.7 in subpopulation C. The trait mean of subpopulation D is fixed as 0 and the variance is fixed as 1, and the marker allele frequency PA is taken as 0.3 in subpopulation D. Therefore, the total variance in the mixing population is σ2=1.25. The admixture contributed to (1–0)2/[4 × 1.25]=0.20 of the total variance. The related parameters of the other four test cases are given in the legend of Table 3.

Table 3 Type I error rates (%) at a 0.05 significance level

To calculate the type I error rates, 1000 data sets are simulated for each test case. Each data set contains 50 pedigrees of either graph A or graph B of Figure 1, respectively. Using the data sets, we fit the following model

where Gij is normal N (0, σGa2), yij is normal N (β+xAijαA,σ2), and σ2=σga2+σGa2+σe2. The null hypothesis is HA,a : αA=0. Since the QTL Q is in linkage equilibrium with marker A, an empirical test statistic, which is larger than the cutting point at a 0.05 significance level, is treated as a false positive. Based on either likelihood ratio test or F-test, type I error rates are calculated as the proportions of the 1000 simulation data sets, which give significant result at the 0.05 significant level based on FA,a and likelihood ratio test statistic, respectively.

The results of Table 3 shows that the type I error rates are around the nominal level 0.05. Hence, the model is reasonably robust. For the large three-generation pedigree of Graph B of Figure 1, the type I error rates are very close to the nominal 0.05 significance level. Hence , the model works well under large sample case, as expected (notice that the total number of individuals N=550 for the 50 small three-generation pedigrees of Graph A, and N=900 for the 50 large three-generation pedigrees of Graph B). Moreover, the results of F-test A,a are very close to those of likelihood ratio test statistic. This is, theoretically, guaranteed under large sample by the property of the two tests (Chapter 6, Graybill24).

Comparison with the ‘AbAw’ approach

Table 4 shows the power of 50 duplicates of pedigree A (and pedigree B) of Figure 1, respectively, for varying levels of LD between trait locus and marker A at 0.01 significance level. The parameters are exactly the same as those of Abecasis et al.3 Besides, the power of AbAw's χqtl2 is taken from Table 2 of Abecasis et al.3 The power of FA,a is calculated based on the noncentrality parameter approximation λA,a; the power of A,a and LRT are calculated as the proportions of 1000 simulation data sets, which give significant result at the 0.01 significance level based on FA,a and likelihood ratio test statistic, respectively. It is obvious from Table 4 that FA,a and likelihood ratio test based on our model have higher power than that of AbAw's χqtl2. Since other test statistics in Table 2 of Abecasis et al3 is less powerful than the AbAw's χqtl2, the method proposed in the current article is advantageous over the AbAw approach. As the type I error rates in Table 3, the results of F-test A,a are very close to those of likelihood ratio test statistics.

Table 4 Power (%) of 50 pedigrees (Figure 1) for varying levels of LD between trait locus and marker A at 0.01 significance level

Power comparison

Let us denote the heritability by h2, which is h2=σga2/σ2.23. To make power comparison, we conside 30 duplicates of small and large 3-generation pedigrees of graphs A and B of Figure 1, respectively. Two modes of inheritance are taken as example: a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritace a=1.0,d=−0.5, respectively. Figures 2 and 3 show the power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against LD coefficient DAQ at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. The other parameters are given in the legend of the Figures. It can be seen that FAB,ad, generally has higher power than that of FA,ad, and FAB,a generally has higher power than that of FA,a. Hence, analysis using two markers are advantageous. Owing to the increase of degrees of freedom of test statistics, FAB,ad generally has lower power than that of FAB,a, and FA,ad generally has lower power than that of FA,a. Besides, the power of FAB,d and FA,d is low unless the LD between markerr A and QTL Q is vey strong. Since pedigree of Graph B of Figure 1 is larger than that of Graph A, the power of test statistics for Figure 3 is higher than that of corresponding statistics for Figure 2. Thus, large pedigrees contain more LD information and the models proposed rightly use the information.

Figure 2
figure 2

Power of 30 small three-generation pedigrees (Figure 1, graph a) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against LD coefficient DAQ at 0.01 significance level, when q1=PA=PB=0.50, DQB=0.08, σGa2=0.10, σGd2=0.05 and h2=0.15, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively.

Figure 3
figure 3

Power of 30 large three-generation pedigrees (Figure 1, graph b) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against LD coefficient DAQ at 0.01 significance level, when q1=PA=PB=0.50, DQB=0.08, σGa2=0.10, σGd2=0.05 and h2=0.15, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively.

Figures 4 and 5 show power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against trait frequency q1 or marker allele frequency PA at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. In addition to the merits shown in Figures 2 and 3, the power of test statistics FAB,ad, and FAB,,a heavily depends on the frequency q1 of QTL (graphs I and II of each figure), but not very much on the marker allele fequency PA (graphs III and IV of each figure). Figures 6 and 7 show power curves of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against heritability h2 at 0.01 significance level, for the 30 duplicates of graph A and graph B of Figure 1, respectively. It can be seen that the power of FAB,ad, FAB,a, FA,ad and FA,a can be high as the heritability h2 is larger than 0.1. The power of FAB,d and FA,d is generally low.

Figure 4
figure 4

Power of 30 small three-generation pedigrees (Figure 1, graph a) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against trait frequency q1 or marker allele frequency PA at 0.01 significance level, when PA=0.5 (graphs I and II), q1=0.5 (graphs III and IV), PB=0.50, σGa2=0.10, σGd2=0.05 and h2=0.15, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively. The linkage disequilibrium coefficients are DAB=(min(PA, PB)−PAPB)/2, DAQ=(min(PA, q1)–PAq1)/2 and DQB=(min(PB,q1)–PBq1)/2.

Figure 5
figure 5

Power of 30 large three-generation pedigrees (Figure 1, graph b) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against trait frequency q1 or marker allele frequency PA at 0.01 significance level, when PA=0.5 (graphs I and II), q1=0.5 (graphs III and IV), PB=0.50, σGa2=0.10, σGd2=0.05 and h2=0.15, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively. The linkage disequilibrium coefficients are DAB=(min(PA, PB)–PAPB)/2, DAQ=(min(PA, q1)−PAq1)/2 and DQB=(min(PB, q1)−PBq1)/2.

Figure 6
figure 6

Power of 30 small three-generation pedigrees (Figure 1, graph a) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against heritability h2 at 0.01 significance level, when q1=PA=PB=0.50, σGa2=0.10, σGd2=0.05, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively. Abbreviations: Dom is dominant, Inhrt is inheritance, Rec is recessive.

Figure 7
figure 7

Power of 30 large three-generation pedigrees (Figure 1, graph b) of test statistics FAB,a, FAB,ad, FA,a, FA,ad, FA,d, and FAB,d against heritability h2 at 0.01 significance level, when q1=PA=PB=0.50, σGa2=0.10, σGd2=0.05, for a dominant mode of inheritance a=d=1.0, and a recessive mode of inheritance a=1.0, d=−0.5, respectively. Abbreviations: Dom is dominant, Inhrt is inheritance, Rec is recessive.

An example

The proposed method is applied to analyze ACE data.14,15 The data consist of 83 extended families with between four and 18 members, and the pedigrees range in size from two to three generations. Circulating ACE levels were measured for 405 individuals. In total, 10 biallelic polymorphisms in the ACE gene were genotyped. Multipoint IBD at each marker are calculated by Merlin.25 Software QTDT-2.4.3 is used to analyze the data. There are missing genotype information at markers, and so the total number of individuals is different from marker to marker. For instance, there are four individuals whose genotype information is missing at marker I/D and the total number N=401; at marker G2350A, on the other hand, there are more missing genotype data, and N=365.

Variance component linkage analysis shows that additive variances are significantly larger than 0, but dominance variances are not significantly larger than 0.3 Hence, dominance effects can be excluded from regression equation. The total variance is modeled as σ2=σga2+σGa2+σe2, and related variance-covariance structure can be constructed. Table 5 shows LD analysis of the ACE gene by individual marker. To make comparison with the ‘AbAw’ approach, results of AbAw's lod are taken from Table 4 of Abecasis et al.3 After fitting the proposed models in this article, lod is calculated by LRT/(2 ln 10), where LRT=2(L1−L0), L1 is the log-likelihood under yij=β+xAijαA+Gij+eij, and L0 is the log-likelihood under yij=β+Gij+eij. The lod scores calculated by the proposed method in this article is generally higher than those of the ‘AbAw’ approach, which is consistent with the results of Table 4. Hence, the proposed method is advantageous. Moreover, the results of Table 5 confirm the finding that the association is strongest around the G2215A, I/D, G2350A and 4656(CT)3/2 polymorphisms (lod=27.01), 27.37, 28.01 and 27.93; and the ‘AbAw’ lod=14.91, 15.76, 14.40 and 14.22, respectively;3). The estimations of σga2 are 0.05 and 0.03 at markers I/D and G2350A, respectively. In Table 5, the additional linkage effects are presented. The additional linkage lod is calculated by 2(L1L2)/(2 ln 10), where L2 is the log-likelihood yij=β+xAijαA+Gij+eij assuming the total variance σ2=σGa2+σe2 and related variance–covariance structure. For markers I/D and G2350A, the additional linkage lods are 0.09 and 0.03 (the related P-values are 0.76 and 0.85, respectively). The residual QTL-specific variances of 0.05 and 0.03 for the two ACE markers of largest effect are not significantly different from zero. Little additional linkage effects are present after modeling the association effects. Therefore, these markers are likely in complete LD with the trait alleles, and the genetic variance attributable to the QTL is encompassed in the association coefficient αA.2,3

Table 5 LD analysis of the ACE gene by individual marker

Notice that there are 10 individual results in Table 5, and it is unclear if there is any relation among the models/markers from results in Table 5. Using the proposed models, we explore LD analysis based on two or more markers. Table 6 shows LD analysis of the ACE gene by two markers. Regressions are given by: (1) yij=β+Gij+eij; (2) yij=β+xAijαA+Gij+eij; (3) yij=β+xAijαA+xBijαB+Gij+eij. We first investigate markers I/D and G2350A that show strongest association in individual marker analysis of Table 5. It turns out that marker I/D is correlated with marker G2350A in the following sense: if A=I/D and B=G2350A, xAij are xBij are correlated to each other. This phenomenon motivates us to further investigate the relation of these two markers. Utilizing data from http://www.well.ox.ac.uk/~mfarrall/oxhap-freq.html, we calculate the frequency of the four haplotypes of these two markers as follows: P(IA)=0.478875, P(IG)=0.002817, P(DG)=0.515494, P(DA)=0.002817 (note here A is an allele at marker G2350A). This shows that allele I at marker I/D is almost always present with allele A at marker G2350A, and allele D at marker I/D is almost always present with allele G at marker G2350A. Besides, the measure of LD is 0.246878 between the markers I/D and G2350A. The two markers are almost in complete LD with each other.

Table 6 LD analysis of the ACE gene by two markers

Since there are more missing data at marker G2350A than that at marker I/D, we choose I/D as baseline marker A. Six markers, 4656(CT)3/2, T-5491C, A-5466C, T-3892C, A-240T and T-93C, show significant association at a 0.05 significance level, in addition to the association of marker I/D (see the P-value of test H0 : αB=0 in Table 6). In these six markers, 4656(CT)3/2 shows strongest additional association with LRT=7.94, lod=1.72, P-value=0.005 (Table 6). Hence, we choose I/D and 4656(CT)3/2 as markers A and B for the basis of further investigation. Table 7 shows results by one more marker in addition to markers I/D and 4656(CT)3/2. Four markers, T-5491C, A-5466C, T-3892C and A-240 T, show significant association at a 0.05 significance level, in a addition to the association of markers I/D and 4656(CT)3/2 (see the P-value of test H0 : αC=0 in Table 7). Further investigation shows that there is no significant evidence to analyze the data using more than three markers in the regression equations.

Table 7 LD analysis of the ACE gene by multiple markers

Notice that if the significance level is set at 0.01 instead of 0.05, then only markers I/D and 4656(CT)3/2 are associated with the trait locus (Table 6). The regression, yij=β+xAijαA+xBijαB+Gij+eij, A=I/D, B=4656(CT)3/2, can fully interpret association at the 0.01 significance level, which provides a unique result.

Discussion

This article focuses on building models that may utilize extended multi generation large pedigrees for LD mapping of QTL. Based on the same logic developed in our previous work,7,8 variance component models are constructed for high resolution joint linkage and LD mapping of QTL. By type I error rate evaluation, it is found that the proposed models have correct type I errors and the methods are robust. The methods proposed are compared with the ‘AbAw’ approach. It is found that our models have higher power. This is consistent with our previous findings based on sibpair data.7 Based on power comparison, it is shown that large pedigrees may contain more LD information than small pedigrees. Moreover, the proposed method is applied to ACE data with a better result than that of the ‘AbAw’ approach. In addition, it is found that markers I/D and 4656(CT)3/2 can fully describe the association with the trait locus at a 99% significance level for the ACE data. This result is very meaningful. For instance, all 10 biallelic markers of ACE data show association with the trait locus. However, it is markers I/D and 4656(CT)3/2 which are really necessary in describing the association at the 99% significance level. Marker G2350A is correlated to marker I/D, as a matter of fact!

One possible explanation that the proposed method is more powerful than the ‘AbAw’ approach is as follows. The proposed method decomposes the association into the summation of additive and dominance effects. If dominance effect is not significantly larger than 0, then only additive effect is modeled such as the ACE data. The’AbAw’ approach, on the other hand, decomposes the association into the summation of between-family (b) and within-family (w) components. In Table 4 of Abecasis et al,3 the ‘AbAw’ lod is reported to be the within-family effect, which is less powerful than the proposed method. By decomposing the association effect into association between-family and association within-family in the ‘AbAw’ approach, the information is diluted and so the approach is less powerful. This is, in the opinion of the authors of the current article, is not a criticism for the ‘AbAw’ approach, since it may be a good method in certain circumstances. It is our hope that the current article may stimulate more interest in exploring the most appropriate method in LD mapping of complex diseases.

Population may contain LD information, and pedigree data may contain both linkage and LD information. In mapping complex diseases, it is difficult to get replicate linkage findings using pedigree data. It is interesting to combine the same pedigree data with population data for fine association study. In general, any type of pedigree data can be combined with population data for a unified analysis of linkage and LD mapping of QTL, based on the models developed in the current paper and our previous research.7,8 Linkage analysis can localize genetic traits in a broad region, and is less sensitive to population structures. LD mapping, on the other hand, is appropriate for fine high-resolution mapping and sensitive to population admixtures. As the first step, one may perform linkage analysis using pedigree data to get prior linkage evidences based on a sparse genetic map.26,27 Then, a combined analysis using both population data and pedigree data can take advantage of both linkage and LD mapping for high resolution of genetic traits. That is, it may provide high resolution of LD mapping, and be more likely to avoid the false positives by using the prior linkage evidences based on a dense genetic map.28

Up to now, most research focus on using biallelic markers in LD mapping of QTL. It would be interesting in building models using multiallelic markers such as macrosatellites or haplotype blocks. One problem in using multiallelic markers in analysis is that the number of parameters can be big, which may lead to high degrees of freedom of test statistics. Hence, selection of significant markers and relevant alleles is an important issue. In addition, genotyping errors can heavily impact the result of a study. These issues deserve more in depth investigations.