Combined high resolution linkage and association mapping of quantitative trait loci

Fan, Ruzong; Xiong, Momiao

doi:10.1038/sj.ejhg.5200941

Download PDF

Article
Published: 14 February 2003

Combined high resolution linkage and association mapping of quantitative trait loci

Ruzong Fan^1,2 &
Momiao Xiong³

European Journal of Human Genetics volume 11, pages 125–137 (2003)Cite this article

663 Accesses
16 Citations
Metrics details

Abstract

In this paper, we investigate variance component models of both linkage analysis and high resolution linkage disequilibrium (LD) mapping for quantitative trait loci (QTL). The models are based on both family pedigree and population data. We consider likelihoods which utilize flanking marker information, and carry out an analysis of model building and parameter estimations. The likelihoods jointly include recombination fractions, LD coefficients, the average allele substitution effect and allele dominant effect as parameters. Hence, the model simultaneously takes care of the linkage, LD or association and the effects of the putative trait locus. The models clearly demonstrate that linkage analysis and LD mapping are complementary, not exclusive, methods for QTL mapping. By power calculations and comparisons, we show the advantages of the proposed method: (1) population data can provide information for LD mapping, and family pedigree data can provide information for both linkage analysis and LD mapping; (2) using family pedigree data and a sparse marker map, one may investigate the prior suggestive linkage between trait locus and markers to obtain low resolution of the trait loci, because linkage analysis can locate a broad candidate region; (3) with the prior knowledge of suggestive linkage from linkage analysis, both population and family pedigree data can be used simultaneously in high resolution LD mapping based on a dense marker map, since LD mapping can increase the resolution for candidate regions; (4) models of high resolution LD mappings using two flanking markers have higher power than that of models of using only one marker in the analysis; (5) excluding the dominant variance from the analysis when it does exist would lose power; (6) by performing linkage interval mappings, one may get higher power than by using only one marker in the analysis.

Leveraging information between multiple population groups and traits improves fine-mapping resolution

Article Open access 10 November 2023

Linkage disequilibrium maps for European and African populations constructed from whole genome sequence data

Article Open access 17 October 2019

Population-specific long-range linkage disequilibrium in the human genome and its influence on identifying common disease variants

Article Open access 06 August 2019

Introduction

Twenty years ago, variations in human DNA were recognized as genetic markers in linkage study.¹ After that, the advances in molecular biology and computational technology have led to mapping several human inherited disease genes. Using restriction fragment length polymorphism (RFLP) markers and polymorphic micro-satellite loci, linkage analysis and positional cloning have been used successfully in mapping the chromosome locations of Mendelian disease genes. The success mainly depends on one premise that the disease genes of Mendelian traits have a large effect on the phenotypes.² In fact, there is usually a one-to-one correspondence between disease gene genotypes and the disease phenotype for Mendelian traits. Moreover, the correlations between genotypes and phenotype of Mendelian traits are strong. Given sufficient family data, Mendelian traits can be mapped with high probability by linkage analysis.

With the encouragement of successful mapping Mendelian trait genes, there has been growing interests and endeavors in the study of complex traits such as asthma and diabetes. For complex diseases, the inheritance patterns and phenotype definitions as with genetic etiology are much more complex. The trait/affection status is usually a continuous variable.³ The mapping of complex disease genes is much harder. Novel statistical methods such as both linkage analysis and linkage disequilibrium (LD) mapping or association study are needed in dissecting complex traits. As very dense marker maps such as single nucleotide polymorphism (SNP) are available,⁴ both linkage analysis and association study are utilized simultaneously for mapping complex disease loci.^5,6 Almasy et al⁷ and Fulker et al⁸ proposed to use combined linkage and association analysis for quantitative trait loci (QTL). Sham et al⁹ studied the power of linkage versus association analysis of quantitative traits by analytically calculating non-centrality parameters of test statistics. Abecasis et al^10,11,12 proposed test statistics of association studies for quantitative traits in nuclear families, general pedigrees, and selected samples. Cardon¹³ studied a sib-pair regression model of LD for quantitative traits. All these researches concentrated on family data which include sib-pairs, and used only one marker in analysis. In Fan and Xiong,¹⁴ we proposed a linear regression method of high resolution mapping of quantitative trait loci by LD mapping analysis. The method is based on population data. Using two flanking markers, the regression models have higher power than that of models using only one marker.¹⁴

It is well-known that family pedigree data can be used in both linkage analysis and association study, and population data can be used in association study. Hence, it is necessary to consider a method to combine both population data and family pedigree data in the analysis. In this paper, we propose to perform both linkage analysis and high resolution LD mapping for QTL based on combined family and population data. Linkage interval mapping is based on family data, and LD mapping is based on both family pedigree and population data. Based on variance component models, we construct likelihood to analyse family and population data in Section of Models. Then, we discuss the parameter estimations and regression coefficients. The linkage information, i.e., recombination fractions, is contained in the variance-covariance matrix, and the association information, i.e., the LD coefficients, is contained in the mean parameters or the regression coefficients. We calculate the non-centrality parameters for association study and linkage analysis, respectively. Using the non-centrality parameters, we perform power calculations and comparisons. The technical details to calculate the regression coefficients, parameters, non-centrality parameters are left in the Appendixes.

Models

Consider a quantitative trait locus Q which has two alleles Q₁ and Q₂. Suppose that the allele frequencies of Q₁ and Q₂ are q₁ and q₂, respectively. Assume that two markers A and B flank the trait locus Q in an order of AQB. Marker A has two alleles A and a with frequencies P_A and P_a, respectively. Marker B has two alleles B and b with frequencies P_B and P_b, respectively. For a nuclear family of k children and two parents, let us denote their quantitative traits by a vector y=(y_f, y_m, y₁, · · · , y_k)^τ, genotypes at marker A by a vector (A_f,A_m,A₁, · · ·, A_k)^τ, and genotypes at marker B by a vector (B_f,B_m,B₁,· · ·, B_k)^τ. Here y_f is the trait value of the father, A_f is the genotype of the father at marker A, and B_f is the genotype of the father at marker B. Other notations are defined, similarly, for the mother with subscript m and for the i-th child with subscript i. The log-likelihood is defined by . The notations of the log-likelihood are defined as follows. For the mean component Xμ, we consider the following regression equation such as model (1) in Fan and Xiong¹⁴

where β is overall mean, w_i is a row vector of covariates such as gender and age, γ is a column vector of regression coefficients for the covariates w_i, G_i is polygenic effect, e_i is error term. Assume that G_i is normal , and e_i is normal . Moreover, G_i and e_i are independent. x_Ai, x_Bi, z_Ai and z_Bi are dummy variables defined by

α_A, α_B, δ_A, and δ_B are regression coefficients of the dummy variables x_Ai, x_Bi, z_Ai and z_Bi.

and μ=(β, γ^τ, α_A, α_B, δ_A, δ_B)^τ is a vector of regression coefficients. Σ is a (k+2)×(k+2)

variance-covariance matrix defined as

is variance explained by the putative QTL Q, is polygenic variance, and is error variance. The genetic variances are decomposed into additive and dominant components. is correlation between parents and children, is correlation between the i-th child and the j-th child, π_ijQ is the proportion of alleles shared identical by descent (IBD) at QTL Q by the i-th child and the j-th child, and Δ_ijQ is the probability that both alleles at QTL Q shared by the i-th child and the j-th child are IBD.

For population data, an intuitive rationale of regression model (1) is given in Fan and Xiong¹⁴. In general, one can construct a variance-covariance matrix for any type of pedigree in a similar way as above. Assume that there are two independent sub-samples of data: (1) population data: n independent individuals; (2) family data: I–n (I>n) independent families. Let us list the log-likelihood of the n independent individuals by L₁, · · · , L_n, and the likelihood of the I–n families by L_n+1, · · · , L_I. Then the overall log-likelihood is . The unknown parameters are μ= . Using the likelihood ratio tests, one may test statistical significance of the parameters of interest.

Parameter estimations and regression coefficients

Regression coefficients

Let μ_ij be the effect of genotype Q_iQ_j, i, j=1, 2, μ₁₂ =μ₂₁. Denote the population effect mean by and define α_Q=q₁μ₁₁ +(q₂−q₁)μ₁₂−q₂μ₂₂, δ_Q=2μ₁₂−μ₁₁−μ₂₂. If μ₁₁=a, μ₁₂=d, and μ₂₂=–a as in the traditional quantitative genetics,¹⁵ α_Q= a+(q₂–q₁)d is the average allele substitution effect, and δ_Q =2d characterizes the dominant effect. In general, one may define a=μ₁₁−(μ₁₁+μ₂₂)/2 and d=μ₁₂−(μ₁₁+μ₂₂)/2. It is well known that the additive variance and the dominant variance . A true random effect model describing the trait value is y_i=β+w_iγ+g_i+G_i+e_i, where

Denote LD coefficient between trait locus Q and marker A by D_AQ=P(AQ₁)–q₁P_A, LD coefficient between trait locus Q and marker B by D_QB=P(BQ₁)–q₁P_B, and LD coefficient between marker A and marker B by D_AB=P(AB)–P_AP_B. Let the additive and dominant variance–covariance matrices be

Moreover, let us denote three ratios . As in Appendix B,¹⁴ we can show that the coefficients of regression equation (1) are given by

Parameters of variance–covariances

Denote the recombination fraction between trait locus Q and marker A by θ_AQ, the recombination fraction between trait locus Q and marker B by θ_QB, and the recombination fraction between marker A and marker B by θ_AB. Fulker and Cardon¹⁶ proposed to estimate the proportion π_ijQ of allele IBD at putative QTL Q for a sib-pair i and j by where π_ijA and π_ijB are the IBD proportions of alleles shared at the marker A and marker B, respectively. The coefficients α_π, β_πA and β_πB are given by

Let Δ_ijA, Δ_ijB be the probability of sharing two alleles IBD at markers A and B for a pair of sibs, respectively. In Fan,¹⁷ we proposed to estimate Δ_ijQ by equation . Under the assumption of no interference, the coefficients are as follows (Fan¹⁷):

where Assuming that the positions of marker A and marker B are known, θ_AB can be calculated through Haldane's map function. Then only one of θ_AQ and θ_QB is unknown since the other can be calculated through Trow's formula.¹⁸ For general relatives i and j, Almasy and Blangero¹⁹ proposed an algorithm to calculate the proportion π_ijQ of allele IBD at putative QTL Q, and the expected probability Δ_ijQ that both alleles at QTL Q are IBD. In Fan,¹⁷ we derived formulas to calculate the covariances of trait values for a few types of relatives directly without performing matrix operations.

Association and linkage studies

From equations (3) and (4), we can see that the coefficients of LD (i.e., D_AQ and D_QB) and gene effects (i.e., α_Q and δ_Q) are contained in the regression coefficients. Moreover, we show in the above paragraph that the linkage parameters (i.e., recombination fractions θ_AQ, θ_QB and θ_AB) are contained in the variance-covariance matrix. Assume that markers A and B are in LD with the trait locus Q, i.e., D_AQ≠0,D_QB≠0. We may simultaneously test LD of marker A and marker B with trait locus Q, the gene substitution and dominant effects by testing α_A=α_B=δ_A=δ_B=0. From equation (3), we may test LD of markers A and B with the trait locus Q and the gene substitution effect α_Q by testing α_A=α_B=0. From equation (4), we may test LD of markers A and B with the trait locus Q and the dominant effect by testing δ_A=δ_B=0.

To test linkage, one may use the likelihood ratio test of the log-likelihood L. Under the null hypothesis of no linkage between the major trait locus Q and the markers, θ_AQ= θ_QB=1/2. Under the alternative hypothesis of linkage, θ_AQ≠1/2 or θ_QB≠1/2. By comparing the difference of maximum log-likelihoods under the alternative and null hypotheses, we may use χ² statistic to test the linkage. We will derive analytical formulas to explore the linkage interval mapping by the nuclear families in a similar way to Sham et al⁹ according to statistical theory of likelihood ratio tests.²⁰

Non-centrality parameters of association study

Assume that there are no covariates. Then μ=(β, α_A, α_B, δ_A, δ_B)^τ. Consider the overall log-likelihood , where L_i is the log-likelihood of trait values y_i of the i-th family or individual. Let Σ_i be the variance-covariance matrix of y_i, and X_i be its model matrix. Denote the total trait values , the total variance–covariance matrix by Σ=diag(Σ₁, · · · ,Σ_I), and the model matrix . Let be the maximum likelihood estimators of β, α_A, α_B, δ_A, δ_B,Σ_i, Σ. The estimate of μ is . Let H be a q×5 test matrix of rank q. Suppose that the total number of individuals is N. By Graybill,²¹ Chapter 6, the test statistic of a hypothesis Hμ=0 is non-central F(q, N–5) defined by

The non-centrality parameter of the test statistic F can be calculated by . If the data are composed of n individuals of a population, Fan and Xiong¹⁴ worked out the non-centrality parameters to test if there are allele substitution and/or dominant effects and LDs between the markers and the major gene locus. In the following, we discuss a situation that the data are composed of both individual population data and family data.

Suppose that there are n individuals of a population, and n is sufficiently large. For each y_i of the n individuals, Σ_i=σ² and X_i=(1 x_Ai x_Bi z_Ai z_Bi), i=1, 2,· · ·, n. From formulas in Fan and Xiong,¹⁴ Appendix A and Appendix B, we may show that

where V_A and V_D are additive and dominant variance-covariance matrices given in (2).

Secondly, suppose that there are m trio families, and m is sufficiently large. A trio family is composed of both parents and a single child. Notice that the means of x_Ai, x_Bi, z_Ai and z_Bi are 0. Let K_f =(x_Af x_Bf z_Af z_Bf) and K_m=(x_Am x_Bm z_Am z_Bm). We show in Appendix A that the covariance matrix between parents and their offspring is

where K₁=(x_A₁ x_B₁ z_A₁ z_B₁) and O₂ is zero 2×2 matrix. For each of the trio families, the variance–covariance Σ_i is a 3×3 matrix whose inverse is

Using equations (5), (6), and (7), we show in Appendix B

Thirdly, suppose that there are k nuclear families each of them has both parents and two offspring, and the correlation of the two offspring is ρ₁₂. Assume that k is sufficiently large. For each family, the variance–covariance Σ_i is a 4×4 matrix whose inverse is

where . In Appendix C, we show that the covariance matrix between two offspring is

Using equations (5), (6), (9) and (10), we show in Appendix D that

where the constants are given by Combine the n individuals, m trio families, and k families with two offspring. Define . Then equations (5), (8) and (11) lead to

To test if there are additive and dominant effects, we may test the hypothesis H_AB,ad: α_A=α_B=δ_A=δ_B=0. Then the test matrix H is defined by

Let us denote the corresponding F-test statistic by F_AB,ad, and the non-centrality parameter by λ_AB,ad. Then we have from (3), (4), and (12) that

Assume that the two markers A and B are in linkage equilibrium, then D_AB=0. Moreover, assume that the trait locus Q is in LD with marker A but not with marker B, then D_QB=0 and D_AQ≠0. Then , which only involves marker A and can be written as λ_A,ad. Correspondingly, we denote the F-test statistic by F_A,ad. Similarly, is the non-centrality parameter of a test statistic F_A,a. To test the other hypotheses, we may get the non-centrality parameters in a similar way by taking appropriate test matrices H. To test if there is dominant effect, we may test the hypothesis H_AB,d : δ_A=δ_B=0. The non-centrality parameter is . To test if there is an additive or substitution effect, we may test the hypothesis . The non-centrality parameter is . The corresponding F-test statistic is denoted by F_AB,a.

Non-centrality parameters of linkage studies

Consider a nuclear family with k children and both parents. Under the null hypothesis of no linkage between the trait locus and markers, the correlation of each sib-pair is

The expected log-likelihood is Under the alternative hypothesis of linkage between the trait locus and marker A, the correlation between a sib-pair is . From Haseman and Elston,²² Table IV, we have

The expected log-likelihood under the alternative hypothesis of linkage is

where P(π_ijA=0)=P(π_ijA=1)=1/4 and P(π_ijA=1/2)=1/2. From Stuart and Ord,²⁰ the non-centrality parameter for linkage of the nuclear family is equal to λ_linkage,A=E(2L_random,A)–E(2L_Null). If k=2, it can be shown that .

Under the alternative hypothesis of linkage between the trait locus and markers A and B, the correlation between a sib-pair is given by for i, j=0, 1, 2

To calculate C_ij, we need to calculate the joint distribution of π_12A, π_12Q and π_12B of a sib-pair under the alternative hypothesis of linkage. Assume that there is no interference for disjoint regions of the chromosome. Then we have

From Haseman and Elston,²² Table IV, we may construct the joint distribution of π_12Q, π_12A and π_12B by relation (15), and the results are presented in Table 3 of Fan.¹⁷ Based on the results, we can calculate C_ij, i, j=0, 1, 2, which are given in Appendix D of Fan.¹⁷ The expected log-likelihood under the alternative hypothesis of linkage is $E(2 L_{r a n d o m, A B}) = - (k + 2) [\log (2 π σ^{2}) + 1] - Σ_{π_{12 A}} Σ_{π_{12 B}} \dots Σ_{π_{k -1, k A}} Σ_{π_{k -1, k B}} P (π_{12 A}) P (π_{12 B}) \dots P (π_{k - 1, k A}) P (π_{k -1, k B})$

where P(π_ijB=0)=P(π_ijB=1)=1/4 and P(π_ijB=1/2)=1/2 such as those for marker A. From Stuart and Ord²⁰_, the non-centrality parameter for linkage of the nuclear family is equal to λ_linkage,AB=E(2L_random,AB)–E(2L_Null). If k=2, it can be shown that .

Power calculation and comparison

Let us denote heritability by h² which is defined by . In the power calculations, we take the additive polygenic variance , polygenic dominant variance , the equal allele frequencies P_A=q₁=P_B=0.5 at the two markers A and B, and the QTL Q. Moreover, suppose that μ₁₁=a, μ₁₂=μ₂₁=d and μ₂₂=–a.

Suppose that the map distance λ_AB between marker A and marker B is known. Under the assumption of no interference, we may calculate the recombination fraction by Haldane's map function θ_AB=[1–exp(–2λ_AB)]/2. Similarly, we may calculate the recombination fractions θ_AQ and θ_QB by the map distances λ_AQ and λ_QB. Assume that marker A and marker B are in linkage equilibrium, i.e., D_AB=0, the genetic distances λ_AB=5 cM, λ_AQ=λ_QB=2.5 cM, and the heritability h²=0.25. Suppose we have a sample with n=100 individuals, m=30 trio families, and k=20 nuclear families with two offspring. Assume that the IBD proportions shared by the two offspring in the k=20 families at both markers A and B are π_A= π_B=0.5, and the probability of sharing two alleles IBD at markers A and B are Δ_A=Δ_B=0.5. Figure 1 shows the power of the test statistics F_AB,ad, F_AB,a, F_A,ad, and F_A,a against the disequilibrium coefficient D_AQ when D_QB=0.15 for a mode of dominant inheritance with a=d=1.0, and a mode of recessive inheritance with a=1.0, d=–0.5, respectively. Several features are interesting in the two graphs of Figure 1. First, the power of F_AB,ad and F_AB,a are higher than that of F_A,ad and F_A,a. Hence, the regression mapping which uses two markers A and B has its advantage over the one marker mapping which only uses one marker A or B. Second, the statistic F_AB,ad has higher power than that of F_AB,a, and the statistic F_A,ad has higher power than that of F_A,a. Thus, excluding the dominant variance from the analysis when it does exist would lose power. Third, as expected, when D_AQ=0 the power to detect LD using only marker A is minimal. More interestingly, when D_AQ=0.15 the power is still higher using the flanking two markers than using only marker A.

Figure 2 shows the power of the test statistics F_AB,ad, F_AB,a, F_A,ad, and F_A,a against the heritability h² when D_AB=0.10 and D_AQ=D_QB=0.15 for a mode of dominant inheritance with a=d=1.0, and a mode of recessive inheritance with a=1.0, d=–0.5, respectively. The other parameters are the same as those of Figure 1. Among the features observed in Figure 1, the power is reasonably high when the heritability h² is bigger than 0.15. To compare the power of population based and family based methods, Figure 3 shows the power of the test statistics F_AB,ad and F_AB,a for a mode of dominant inheritance with a=d=1.0, and a mode of recessive inheritance with a=1.0, d=–0.5, respectively. For Figure 3, population data contain n=252 individuals, but no family data (m=k=0). For dominant inheritance of Figure 3, the data contain m=84 trio families (n=k=0). For recessive inheritance of Figure 3, the data contain k=63 nuclear families each has two offspring (n=m=0). Notice that m=84 or k=63 family data contain 252 individuals, and thus the number of individuals is the same as that of the population data. We can see that population based method is more powerful than the family based method for the same number of individuals.

In a population, the LD can exist due to mutations at the trait locus. In the absence of tight linkage between the trait locus and a marker, the recombination between the marker locus and the trait locus can rapidly dissipate the disequilibrium from generation to generation. Denote the frequency of haplotype AQ at the generation when the mutations occur by P(AQ)(0). Then LD coefficient is D_AQ(0)= P(AQ)(0)–q₁P_A for the generation when the mutations occur. For the following generations, the disequilibrium coefficient is reduced by a factor 1−θ_AQ in each generation.²³ Suppose that the mutation is already T generation old. Then the LD coefficient is D_AQ(T)=D_AQ(0)(1–θ_AQ)^T. Similarly, the other LD coefficients are D_AB(T)=D_AB(0)(1– θ_AB)^T and D_QB(T)=D_QB(0)(1–θ_QB)^T.

Assume that the map distance between marker A and marker B is λ_AB=5 cM, and the other parameters are given by D_AB(0)=0.20,D_AQ(0)=D_QB(0)=0.25, h²=0.25, λ_AB=5 cM, n=100, m=30, k=20, T=30, π_A= π_B=0.5, Δ_A=Δ_B=0.25. Figure 4 shows the power of the test statistics F_AB,ad, F_AB.a, F_A.ad, and F_A,a against the recombination fraction θ_AQ for a mode of dominant inheritance with a=d=1.0, and a mode of recessive inheritance with a=1.0, d=–0.5, respectively. We can see that the power curves of F_AB,ad and F_AB,a are very high, although the power curves of F_A.ad and F_A,a decrease very rapidly as the recombination fraction θ_AQ increases. Hence, high resolution LD mappings have advantage to do fine gene mappings, and appropriate for the dense marker maps such as single nucleotide polymorphisms on human genome. To investigate the effect of the age of the mutation on the power, Figure 5 shows the power curves against the position of markers. In the Figure, the QTL locates at 15 cM which is flanked by two markers A and B. One marker is one the right-hand side of the QTL, and the other is on the left-hand side with equal distance to the QTL. The power decreases quickly when the age of the mutation increases. For a mutation which is 30 generations old, one should expect very low power if the markers locate 5 cM away from the QTL.

To explore the linkage interval mapping, we take a sample of k=250 nuclear families each has two offspring. Multiplying λ_linkage,A and λ_linkage,AB by k, we may calculate the non-centrality parameters for the linkage mapping using marker A and the linkage interval mapping using markers A and B. Moreover, assume that the genetic distances are λ_AB=30 cM, and λ_AQ=λ_QB=15 cM, i.e., the QTL Q is right in the middle between markers A and B. Figure 6 gives power curves of linkage interval mapping by markers A and B, and linkage mapping by marker A against heritability h² for a mode of dominant inheritance with a=d=1.0, and a mode of recessive inheritance with a=1.0, d=–0.5, respectively. It is clear that the power of interval linkage mapping using both markers A and B is higher than that of linkage mapping using only one marker A.

Discussion

In this paper, we investigate variance component models of both high resolution LD mapping and linkage analysis for QTL. The models are based on family pedigree and population data. We consider likelihoods which utilizes flanking marker information. The likelihoods jointly include recombination fractions, LD coefficients, the average allele substitution effect and allele dominant effect as parameters. The linkage parameters are contained in the variance-covariance matrix. The parameters of LD and gene effects are contained in the regression coefficients.^8,9,11,12 The model simultaneously takes care of the linkage, LD and the effects of the putative trait locus Q, and hence clearly demonstrates that linkage analysis and LD mapping are complimentary, not exclusive, methods for QTL mapping. The family data which have at least two offspring contain information for both linkage and association, and population data and trio family data which have two parents and only one offspring contain information for association. By combining the family and population data in the analysis, one may expect to get better results than that by analysing them separately.

Linkage analysis can localize genetic trait loci in broad chromosome regions of a few cM (<10 cM), and is less sensitive to population admixture than LD mapping. In practice, one may carry out linkage analysis as a first step to obtain prior suggestive linkage based on a sparse marker map. By performing linkage interval mappings, one may get higher power than that of using only one marker. With prior linkage in hand, LD mapping can be used to get high resolution of the genetic trait loci based on a dense marker map. We have shown that models of high resolution LD mappings using two flanking markers have higher power than that of models of using only one marker. Hence, high resolution LD mappings have the advantages to do fine gene mappings, and appropriate for the dense marker maps such as SNPs on human genome. Performing both LD mapping and linkage analysis has potential to avoid false positives due to population history or environmental effects. In the meantime, it takes the advantage of high resolution of LD mapping.

The power of association study depends on the existence of LD between trait locus and markers. In the absence of LD, the power of LD mappings is very low. To increase the probability of detecting LD, one may need to carry out suitable design for a genetic study.²⁴ It is well known that the level of LD is heavily affected by population stratification. On the one hand, the family based methods are less likely influenced by population stratification than those of population data based methods. On the other hand, a family based association study is less powerful than that of population based study for the same number of individuals. Combining the family and population data, one may expect more information, and take the advantage of population data and family data. More investigation is needed to explore the population stratification effect on high resolution LD mapping of QTL, and to develop robust methods to identify association between multiple markers and QTL in the presence of population stratification.

To our knowledge, there is not much research on statistical analysis about high resolution LD mapping of QTL. Using only one bi-allelic marker, the statistical analysis of LD mapping has been studied by a few colleagues.^{8,9,10,11,12,13} Relatively, multipoint linkage mapping has been studied more intensively.^16,19,25 It is our hope that the current research may shed more light on the high resolution association study, and stimulate more interests to utilize the advantage of LD mapping in fine resolution of genetic studies. In the Section of power calculation and comparison, we mainly explore a set of scenarios of LD mapping. For several sets of parameters, we compare the power of four test statistics for LD mapping. Moreover, we compare the power of LD mapping of using population data and family data. We also investigate the effect of mutation age on the power. For linkage mapping, we only include one figure to make power comparison of linkage interval mapping using two markers with linkage mapping using only one marker.⁹ This reflects the need for more research on high resolution LD mapping of QTL, since the research on linkage interval/multipoint mapping is more mature.

In this paper, we treat LD as a fixed effect since only two markers are considered. In general, inference about the LD structure in the population are desirable, and LD should be modeled as a random effect when multiple markers/haplotypes are used in analysis, which would need more investigation. We assume that the data of all family members are available. For some late-onset diseases, the data for the parents or former family members may no longer be available. In principle, one can use similar methods as the ones proposed in this paper to perform high resolution LD mapping for sib-pair data of late-onset diseases. This is an area which is of importance and needs more research. Due to the length of this paper, we do not pursue these issues in depth, and they will be explored in other projects.

References

Botstein, D, White, RL, Skolnick, MH & Davis, RW : Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet, (1980). 32, 314–331.
CAS PubMed PubMed Central Google Scholar
Morton, NE : Sequential tests for the detection of linkage. Am J Hum Genet, (1955). 7, 277–318.
CAS PubMed PubMed Central Google Scholar
Morton, NE : Significance levels in complex inheritance. Am J Hum Genet, (1998). 62, 690–697.
Article CAS Google Scholar
The International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature, (2001). 409, 928–933.
Risch, N & Merikangas, K : The future of genetic studies of complex human diseases. Science, (1996). 273, 1516–1517.
Article CAS Google Scholar
Abecasis, GR, Cherny, SS, Cookson, WOC & Cardon, LR : Merlin – rapid analysis of dense genetic maps using sparse gene flow tress. Nature Genetics, (2002). 30, 97–101.
Article CAS Google Scholar
Almasy, L, Williams, JT, Dyer, TD & Blangero, J : Quantitative trait locus detection using combined linkage/disequilibrium analysis. Genetic Epidemiology, (1999). 17, Suppl 1 S31–S36.
Article Google Scholar
Fulker, DW, Cherny, SS, Sham, PC & Hewitt, JK : Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet, (1999). 64, 259–267.
Article CAS Google Scholar
Sham, PC, Cherny, SS, Purcell, S & Hewitt, JK : Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet, (2000). 66, 1616–1630.
Article CAS Google Scholar
Abecasis, GR, Cardon, LR & Cookson, WOC : A general test of association for quantitative traits in nuclear families. Am J Hum Genet, (2000). 66, 279–292.
Article CAS Google Scholar
Abecasis, GR, Cookson, WOC & Cardon, LR : Pedigree tests of linkage disequilibrium. Eur J Hum Genet, (2000). 8, 545–551.
Article CAS Google Scholar
Abecasis, GR, Cookson, WOC & Cardon, LR : The power to detect linkage disequilibrium with quantitative traits in selected samples. Am J Hum Genet, (2001). 68, 1463–1474.
Article CAS Google Scholar
Cardon, LR : A sib-pair regression model of linkage disequilibrium for quantitative traits. Hum Hered, (2000). 50, 350–358.
Article CAS Google Scholar
Fan, R & Xiong, M : High resolution mapping of quantitative trait loci by linkage disequilibrium analysis. Eur J Hum Gen, (2002). 10, 607–615.
Article CAS Google Scholar
Falconer, DS & Mackay, TFC Introduction to Quantitative Genetics, London: Longman (1996). 4th edn
Google Scholar
Fulker, DW & Cardon, LR : A sib-pair approach to interval mapping of quantitative trait loci. Am J Hum Genet, (1994). 54, 1092–1103.
CAS PubMed PubMed Central Google Scholar
Fan, R Interval mapping of quantitative trait loci. (2002). http://stat.tamu.edu/∼rfan/paper.html/interval_mapping.pdf
Lange, K Mathematical and Statistical Methods for Genetic Analysis, New York: Springer-Verlag (1997).
Book Google Scholar
Almasy, L & Blangero, J : Multipoint quantitative trait linkage analysis in general pedigrees. Am J Hum Genet, (1998). 62, 1198–1211.
Article CAS Google Scholar
Stuart, A & Ord, JK Kendall's Advanced Theory of Statistics: Classical Inference and Relationships, Vol. 2, Oxford (1991). 5th edn
Google Scholar
Graybill, FA : Theory and Application of the Linear Model. California: Pacific Grove (1976).
Haseman, JK & Elston, RC : The investigation of linkage between a quantitative trait and a marker locus. Behavior Genetics, (1972). 2, 3–19.
Article CAS Google Scholar
Hartl, DL & Clark, AG Principles of Population Genetics, 2nd edn Sinauer (1989).
Google Scholar
Boehnke, M & Langefeld, CD : Genetic association mapping based on discordant sib pairs: the discordant-alleles test. Am J Hum Genet, (1998). 62, 950–961.
Article CAS Google Scholar
Pratt, SC, Daly, M & Kruglyak, : Exact multipoint quantitative-trait linkage analysis in pedigrees by variance components. Am J Hum Genet, (2000). 66, 1153–1157.
Article CAS Google Scholar

Download references

Acknowledgements

We thank two reviewers, and Dr Gert-Jan B van Ommen for their helpful comments to improve the paper. R Fan was supported partially by a research fellowship from the Alexander von Humboldt Foundation, Germany, and an International Research Travel Assistance Grant of the Texas A&M University. M Xiong was supported by NIH grant R01-GM56515, and MH59518. Ms JS Jung kindly writes the SAS micro, which is available from R Fan upon request.

Author information

Authors and Affiliations

Department of Statistics, The Texas A&M University, 447 Blocker Building, College Station, 77843-3143, Texas, USA
Ruzong Fan
Institute of Medical Biometry, Informatics and Epidemiology, University of Bonn, Sigmund Freud Strasse 25, Bonn, D-53105, Germany
Ruzong Fan
Human Genetics Center, University of Texas – Houston, P.O. Box 20334, Houston, 77225, Texas, USA
Momiao Xiong

Authors

Ruzong Fan
View author publications
You can also search for this author in PubMed Google Scholar
Momiao Xiong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ruzong Fan.

Appendices

Appendix A

In this Appendix, we show equation (6). Actually, we have

Similarly, we may show the other terms in equation (6).

Appendix B

By equations (6), (7), and large number theory, we can show the approximation (8).

For instance, the approximation for element on the second row and the second column is

For the element on the forth row and the forth column, we have

Appendix C

To prove equation (10), we first have

Similarly, we may show that To show the other terms of (10), we first calculate the joint probabilities P(A₁,B₂), in which the first offspring's genotype is A₁ at marker A and the second offspring's genotype is B₂ at marker B, A₁∈{AA,As, aa}, B₂∈{BB, Bb, bb}.We need to consider nine possible phase {AA, Aa, aa}×{BB, Bb, bb} for each parent. At the first glance, one needs to consider 9×9 possible matings to calculate P(A₁,B₂). However, many matings do not lead to specific genotypes (A₁,B₂) of a sib pair. This eliminates many terms and reduces the amount of calculations. For instance, a mating of (A_f =AA,B_f =BB)×(A_m=AA,B_m=BB) only results o.spring with genotype (AA,BB). Then, we have

Symmetrically, we may get the following three terms

Note that P(A₁=AA, B₂=Bb)=P(A₁−AA)−P(A₁=AA, B₂=BB or bb). Hence,

Similarly, we may calculate the following three terms

Finally, we can calculate the following term using equation Σ_A1 Σ_B2 P(A₁,B₂)=1

Using equations (16), (17), (18), (19) and (20), we may calculate

Similarly, we may get . By symmetric property, we may calculate the remaining terms in (10).

Appendix D

Let be the matrix given by (9). To show the approximation of (11), we notice that d₁₁ can be calculated by The element on the second row and the second column of approximation (11) can be calculated by

Similarly, the element on the forth row and the forth column of approximation (11) is

The other terms of approximation (11) can be calculated in a similar manner.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Fan, R., Xiong, M. Combined high resolution linkage and association mapping of quantitative trait loci. Eur J Hum Genet 11, 125–137 (2003). https://doi.org/10.1038/sj.ejhg.5200941

Download citation

Received: 04 June 2002
Revised: 14 October 2002
Accepted: 21 November 2002
Published: 14 February 2003
Issue Date: 01 February 2003
DOI: https://doi.org/10.1038/sj.ejhg.5200941

Keywords

This article is cited by

Combined linkage and association mapping of putative QTLs controlling black tea quality and drought tolerance traits
- Robert. K. Koech
- Richard Mose
- Zeno Apostolides
Euphytica (2019)
A gene frequency model for QTL mapping using Bayesian inference
- Wei He
- Rohan L Fernando
- Helene Gilbert
Genetics Selection Evolution (2010)
Combined Linkage and Association Mapping of Quantitative Trait Loci with Missing Completely at Random Genotype Data
- Ruzong Fan
- Lian Liu
- Ming Zhong
Behavior Genetics (2008)
Pedigree linkage disequilibrium mapping of quantitative trait loci
- Ruzong Fan
- Christie Spinka
- Jee Sun Jung
European Journal of Human Genetics (2005)

Combined high resolution linkage and association mapping of quantitative trait loci

Abstract

Similar content being viewed by others

Leveraging information between multiple population groups and traits improves fine-mapping resolution

Linkage disequilibrium maps for European and African populations constructed from whole genome sequence data

Population-specific long-range linkage disequilibrium in the human genome and its influence on identifying common disease variants

Introduction

Models