Abstract
For complex disease genetics research in human populations, remarkable progress has been made in recent times with the publication of a number of genome-wide association scans (GWAS) and subsequent statistical replications. These studies have identified new genes and pathways implicated in disease, many of which were not known before. Given these early successes, more GWAS are being conducted and planned, both for disease and quantitative phenotypes. Many researchers and clinicians have DNA samples available on collections of families, including both cases and controls. Twin registries around the world have facilitated the collection of large numbers of families, with DNA and multiple quantitative phenotypes collected on twin pairs and their relatives. In the design of a new GWAS with a fixed budget for the number of chips, the question arises whether to include or exclude related individuals. It is commonly believed to be preferable to use unrelated individuals in the first stage of a GWAS because relatives are ‘over-matched’ for genotypes. In this study, we show that for a GWAS of a quantitative phenotype, surprisingly little power is lost when using relatives rather than a sample of unrelated individuals. The advantages of using relatives are manifold, including the ability to perform more quality control, the choice to perform within-family tests of association that are robust to population stratification, and the ability to perform joint linkage and association analysis. Therefore, the advantages of using relatives in GWAS for quantitative traits may well outweigh the small disadvantage in terms of statistical power.
Introduction
Recent publications of genome-wide association scans (GWAS) for a range of diseases1, 2, 3, 4 and quantitative phenotypes4, 5, 6 have demonstrated the feasibility of this ‘unbiased’ approach to gene discovery. It is now clear from published GWAS that effect sizes are small, with relative genotype risks typically <1.5. For quantitative traits, the individual effect sizes are consistent with <1% of the phenotypic variance being explained by a single polymorphism.4, 5, 7 For such traits, it might appear inefficient to include related individuals in the first stage of a GWAS because relatives are ‘over-matched’ for genotypes. It is known that for simple tests of association with disease, where the assumption of common causal variants is true, the use of relatives at the expense of unrelated individuals can cause a reduction in power. For example, sib-controls are over-matched to index cases, leading to a loss of power compared with unrelated case–control studies.8, 9 Although most GWAS to date have used unrelated cases and controls, a number of studies have used related individuals.6, 10
Numerous association study designs for binary phenotypes were considered by Risch and colleagues8, 9, 11 and recently by others.12 The conclusion from these studies was that selecting multiple cases from multiplex families increased power, in particular for a rare allele with a large effect on disease susceptibility, but that selecting controls that are related decreased power. The reason for the former is that the susceptibility allele is enriched in the multiplex families when there is a strong phenotype–genotype relationship. The reason for the latter is that the controls are over-matched relative to the cases.
However, because a GWAS targets common variants with small effect sizes, both the advantages and the disadvantages of relatives for statistical power diminish. Here we show, for a GWAS of a quantitative trait, that surprisingly little power is lost when genotyping related individuals. Since genotyping of relatives has many advantages (quality control, linkage analysis, parent-of-origin effects), these results argue for including relatives in a GWAS where possible.
Methods
For a quantitative trait, we assume an additive model and that the QTL heritability (q²) is small, so that 1−q²≈1 and ln(1+q²)≈q². Let ρ be the phenotypic correlation of the relatives and r the coefficient of relationship (twice the kinship coefficient). We consider the non-centrality parameter (NCP, e.g.13) of a test for association, using either two unrelated individuals or a pair of related individuals with coefficient of relationship r. For n unrelated individuals, the NCP is NCP_U = nq²/(1−q²) ≈ nq², that is, 2q² per pair of unrelated individuals. The NCP for any pair of relatives can be derived using regression theory. Following Visscher and Duffy,14 the NCP per family of size two is NCP_relatives ≈ 2q²(1−ρr)/(1−ρ²). For sibships of size two (r=1/2), this result is identical to the approximate NCP for total association (λ_B+λ_W) from Sham et al.13 The ratio of this NCP to that from two unrelated individuals is

NCP_relatives/NCP_U ≈ (1−ρr)/(1−ρ²)   [1]
This simple expression shows that the relative power of related versus unrelated pairs of individuals depends only on the phenotypic correlation of the relatives and the coefficient of relatedness. In practice, the phenotypic correlation will usually be smaller than the coefficient of relatedness. For example, for sibling pairs or dizygotic twin pairs (r=1/2), estimates of phenotypic correlations for systolic blood pressure and body mass index were 0.23 (ref. 15) and 0.26 (ref. 16), respectively. For some traits, in particular those where the resemblance between relatives has a strong environmental component, sibling phenotypic correlations can be >1/2. For example, estimates of sibling phenotypic correlations for leukocyte telomere length and forced expiratory volume were 0.67 (ref. 17) and 0.64 (ref. 18), respectively. When the resemblance between relatives is solely due to additive genetic effects, the phenotypic correlation is always smaller than the coefficient of relatedness, in which case the ratio in Equation [1] is less than one and power is lost by using relatives. For sibling pairs, the ratio of NCP is (1−ρ/2)/(1−ρ²). For the special case that all family resemblance is due to additive genetic effects (ρ=rh², where h² is the narrow-sense heritability), the ratio of NCP for pairs of relatives is (1−r²h²)/(1−r²h⁴) ≈ 1−r²h²(1−h²). For sibships of arbitrary size, the results from Sham et al13 can be used to quantify the ratio of NCP. The ratio of the approximate NCP for s siblings versus s unrelated individuals was derived from,13
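The pairwise ratio in Equation [1] and its additive special case are simple to evaluate, and the sibship case can be explored numerically. The Python sketch below is illustrative only: the function names are hypothetical, and the sibship function is not the expression of Sham et al. It assumes a generalised least squares analysis in which every pair of relatives in the family shares the same phenotypic correlation ρ and the same genotypic correlation r (1/2 for siblings), and it computes the expected information per family relative to the same number of unrelated individuals. Under these assumptions it reduces to Equation [1] for a family of size two.

```python
def pair_ncp_ratio(rho, r):
    """Equation [1]: NCP of a relative pair over NCP of two unrelated people."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (1.0 - rho * r) / (1.0 - rho ** 2)


def additive_pair_ncp_ratio(h2, r):
    """Special case rho = r * h2 (all resemblance is additive genetic)."""
    return pair_ncp_ratio(r * h2, r)


def sibship_ncp_ratio(s, rho, r=0.5):
    """Expected GLS information for s relatives relative to s unrelated people.

    Assumption (illustrative, not taken from the source): exchangeable
    phenotypic correlation rho and genotypic correlation r for every pair.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    if not -1.0 / max(s - 1, 1) < rho < 1.0:
        raise ValueError("rho outside the valid range for this family size")
    # Elements of the inverse of V = (1 - rho) * I + rho * J (s x s).
    shrink = rho / (1.0 + (s - 1) * rho)
    diag = (1.0 - shrink) / (1.0 - rho)
    off = -shrink / (1.0 - rho)
    info = 0.0
    for i in range(s):
        for j in range(s):
            v_inv = diag if i == j else off
            geno_corr = 1.0 if i == j else r
            # Expected x_i * x_j in units of the genotypic variance.
            info += v_inv * geno_corr
    # s unrelated individuals contribute information s in the same units.
    return info / s


if __name__ == "__main__":
    # Sib-pair correlations quoted in the text (r = 1/2).
    for trait, rho in [("systolic blood pressure", 0.23), ("body mass index", 0.26)]:
        print(trait, round(pair_ncp_ratio(rho, 0.5), 3))
    # Sibships of increasing size at a hypothetical rho of 0.3.
    for s in range(2, 6):
        print(s, round(sibship_ncp_ratio(s, 0.3), 3))
```

Summing the information over an explicit inverse covariance matrix, rather than coding a closed form, keeps the sketch easy to adapt to other exchangeable family structures by changing r.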
Results
We used the above results to quantify the loss in power for sibships (Figure 1) and, assuming an additive genetic model of family resemblance, for pairs of related individuals with a range of relationships (Figure 2). Clearly, the loss in efficiency is very small for small sibships and for pairs of relatives with a coefficient of relationship less than 1/2. For sibling pairs, the largest loss (ratio of NCP of 0.93, or loss of 7%) is for a phenotypic correlation of approximately 0.3 (Figure 1). For larger sibships, genotyping all siblings leads to a theoretical loss of power of 5–20% for a realistic range of parameters (Figure 1). The maximum loss in power approaches 50% when genotyping both members of monozygotic twin pairs (r=1) and when the heritability is large (Figure 2), but clearly both members of a monozygotic pair would not be genotyped in practice for a GWAS. Differentiating Equation [1] with respect to ρ shows that, for a given coefficient of relationship (r), the loss in power is maximal when the phenotypic correlation is ρ = [1−√(1−r²)]/r ≈ r/2, and the relative power at this value of ρ is √(1−r²)/{1−([1−√(1−r²)]/r)²} ≈ 1−r²/4. For sib pairs (r=1/2), the minimum relative power of 0.933 is obtained when ρ is 0.268. The quantified loss in power (say, 5–10%) is small relative to the loss in power in most GWAS due to incomplete SNP coverage.1, 2, 3, 4
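The stationary point of Equation [1] can be checked numerically. The following Python sketch, with hypothetical function names, evaluates the closed-form value of ρ at which the ratio is smallest and compares it with a brute-force grid search over ρ. It is a verification aid under the pairwise model of Equation [1], not part of the original analysis.

```python
import math


def ncp_ratio(rho, r):
    """Ratio from Equation [1]: (1 - rho*r) / (1 - rho**2)."""
    return (1.0 - rho * r) / (1.0 - rho ** 2)


def worst_case_rho(r):
    """Phenotypic correlation that minimises the ratio, for 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return (1.0 - math.sqrt(1.0 - r * r)) / r


def grid_minimum(r, steps=100000):
    """Brute-force check: rho in [0, 1) on a regular grid with the lowest ratio."""
    candidates = (k / steps for k in range(steps))
    return min(candidates, key=lambda rho: ncp_ratio(rho, r))


if __name__ == "__main__":
    r = 0.5  # sib pairs
    rho_star = worst_case_rho(r)
    print(round(rho_star, 3), round(ncp_ratio(rho_star, r), 3))  # 0.268 0.933
    print(round(grid_minimum(r), 3))  # 0.268
```

For r=1 the ratio simplifies to 1/(1+ρ), which has no interior minimum and approaches 0.5 as ρ approaches 1, so the closed form is restricted here to r strictly below one.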
In extreme cases, when the phenotypic correlation is larger than the coefficient of relationship, using relatives can actually increase power. For example, for telomere length and forced expiratory volume (assuming phenotypic correlations of 0.67 and 0.64 for sibling pairs), the efficiency of having pairs of siblings relative to the same number of unrelated individuals is 1.207 and 1.152, respectively (from Equation [1]).
Discussion and conclusions
The benefits of having relatives in a GWAS are manifold. They include the ability to perform more quality control, for example Mendelian error checking and IBD sharing, the choice to perform within-family tests of association that are robust to population stratification, the ability to perform parent-of-origin analysis and the ability to perform joint linkage and association analysis. Furthermore, where cohorts of related individuals already exist, including large consortia such as GenomEUtwin,19 substantial gains in power may be obtained by utilising relatives due to the resulting increase in sample size. Robust and powerful statistical methods exist, which can incorporate familial relationships in association analysis of both quantitative20, 21 and binary21, 22, 23 phenotypes.
For large sibships and in general larger pedigrees, it would be inefficient to genotype all individuals for all markers (Figure 1). In those cases, only genotyping a subset of the individuals and imputing genotypes for ungenotyped individuals would be more cost effective.14, 24 The NCP for association in an arbitrary complex pedigree can be investigated numerically by calculating the variance of the best linear predictor from a linear mixed model, assuming known variance components.25
In conclusion, contrary to common belief, there is hardly any loss in power when using relatives to conduct an association study and the advantages of using relatives in GWAS may well outweigh the small disadvantage in terms of statistical power, by providing a more robust and flexible strategy for analysis.
References
1. WTCCC: Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls. Nature 2007; 447: 661–678.
2. Sladek R, Rocheleau G, Rung J et al: A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 2007; 445: 881–885.
3. Scott LJ, Mohlke KL, Bonnycastle LL et al: A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 2007; 316: 1341–1345.
4. Saxena R, Voight BF, Lyssenko V et al: Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 2007; 316: 1331–1336.
5. Weedon MN, Lettre G, Freathy RM et al: A common variant of HMGA2 is associated with adult and childhood height in the general population. Nat Genet 2007; 39: 1245–1250.
6. Scuteri A, Sanna S, Chen WM et al: Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet 2007; 3: e115.
7. Frayling TM, Timpson NJ, Weedon MN et al: A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007; 316: 889–894.
8. Teng J, Risch N: The relative power of family-based and case–control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res 1999; 9: 234–241.
9. Risch N, Teng J: The relative power of family-based and case–control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. Genome Res 1998; 8: 1273–1288.
10. Moffatt MF, Kabesch M, Liang L et al: Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 2007; 448: 470–473.
11. Risch NJ: Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
12. Li M, Boehnke M, Abecasis GR: Efficient study designs for test of genetic association using sibship data and unrelated cases and controls. Am J Hum Genet 2006; 78: 778–792.
13. Sham PC, Cherny SS, Purcell S, Hewitt JK: Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66: 1616–1630.
14. Visscher PM, Duffy DL: The value of relatives with phenotypes but missing genotypes in association studies for quantitative traits. Genet Epidemiol 2006; 30: 30–36.
15. Hottenga JJ, Whitfield JB, de Geus EJ, Boomsma DI, Martin NG: Heritability and stability of resting blood pressure in Australian twins. Twin Res Hum Genet 2006; 9: 205–209.
16. Schousboe K, Willemsen G, Kyvik KO et al: Sex differences in heritability of BMI: a comparative study of results from twin studies in eight countries. Twin Res 2003; 6: 409–421.
17. Andrew T, Aviv A, Falchi M et al: Mapping genetic loci that determine leukocyte telomere length in a large sample of unselected female sibling pairs. Am J Hum Genet 2006; 78: 480–486.
18. Ferreira MA, O'Gorman L, Le Souef P et al: Variance components analyses of multiple asthma traits in a large sample of Australian families ascertained through a twin proband. Allergy 2006; 61: 245–253.
19. Peltonen L: GenomEUtwin: a strategy to identify genetic influences on health and disease. Twin Res 2003; 6: 354–360.
20. Abecasis GR, Cardon LR, Cookson WO: A general test of association for quantitative traits in nuclear families. Am J Hum Genet 2000; 66: 279–292.
21. Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM: PBAT: tools for family-based association studies. Am J Hum Genet 2004; 74: 367–369.
22. Thornton T, McPeek MS: Case–control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet 2007; 81: 321–337.
23. Goring HH, Terwilliger JD: Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 2000; 66: 1310–1327.
24. Chen WM, Abecasis GR: Family-based association tests for genome-wide association scans. Am J Hum Genet 2007; 81: 913–926.
25. Lynch M, Walsh B: Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates, 1998.
Acknowledgements
We thank David Goldgar and Bill Hill for helpful discussions and the referees for useful suggestions. This work was supported by Australian NHMRC Grants 389892, 339462 and 442915 and Australian Research Council Grant DP0770096.
Additional information
Electronic-Database Information
GenomEUtwin http://www.genomeutwin.org/
Cite this article
Visscher, P., Andrew, T. & Nyholt, D. Genome-wide association studies of quantitative traits with related individuals: little (power) lost but much to be gained. Eur J Hum Genet 16, 387–390 (2008). https://doi.org/10.1038/sj.ejhg.5201990
DOI: https://doi.org/10.1038/sj.ejhg.5201990