Introduction

Recent publications of genome-wide association scans (GWAS) for a range of diseases¹,²,³,⁴ and quantitative phenotypes⁴,⁵,⁶ have demonstrated the feasibility of this ‘unbiased’ approach to gene discovery. It is now clear from published GWAS that effect sizes are small, with relative genotype risks typically <1.5. For quantitative traits, the individual effect sizes are consistent with <1% of the phenotypic variance being explained by a single polymorphism.⁴,⁵,⁷ For such traits, it might appear inefficient to include related individuals in the first stage of a GWAS because relatives are ‘over-matched’ for genotypes. It is known that for simple tests of association with disease, where the assumption of common causal variants is true, the use of relatives at the expense of unrelated individuals can cause a reduction in power. For example, sib-controls are over-matched to index cases, leading to a loss of power compared with unrelated case–control studies.⁸,⁹ Although most GWAS to date have used unrelated cases and controls, a number of studies have used related individuals.⁶,¹⁰

Numerous association study designs for binary phenotypes were considered by Risch and colleagues⁸,⁹,¹¹ and recently by others.¹² The conclusion from these studies was that selecting multiple cases from multiplex families increased power, in particular for a rare allele with a large effect on disease susceptibility, but that selecting controls that are related decreased power. The reason for the former is that the susceptibility allele is enriched in the multiplex families when there is a strong phenotype–genotype relationship. The reason for the latter is that the controls are over-matched relative to the cases.

However, because a GWAS targets common variants with small effect sizes, both the advantages and the disadvantages of using relatives diminish with respect to statistical power. Here we show, for a GWAS of a quantitative trait, that surprisingly little power is lost when genotyping related individuals. Since genotyping relatives has many advantages (quality control, linkage analysis, parent-of-origin effects), these results argue for including relatives in a GWAS where possible.

Methods

For a quantitative trait, we assume that the QTL heritability (q²) is small (so that 1−q²≈1 and ln(1+q²)≈q²) and assume an additive model. Let ρ be the phenotypic correlation of the relatives and r the coefficient of relationship (twice the kinship coefficient). We consider the non-centrality parameter (NCP; see, e.g., ref. 13) of a test for association, using either two unrelated individuals or a pair of related individuals with coefficient of relationship r. For n unrelated individuals, the NCP is NCP_U = nq²/(1−q²) ≈ nq², so 2q² per pair of unrelated individuals. The NCP for any pair of relatives can be derived using regression theory. Following Visscher and Duffy,¹⁴ the NCP per family (of size two) is NCP_relatives ≈ 2q²(1−ρr)/(1−ρ²). For sibships (r=1/2), this result is identical to the approximate NCP for total association (λ_B+λ_W) from Sham et al.¹³ The ratio of this NCP to that from two unrelated individuals is

$$\frac{\mathrm{NCP}_{\mathrm{relatives}}}{\mathrm{NCP}_U} \approx \frac{1-\rho r}{1-\rho^2} \qquad [1]$$
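As a minimal sketch, the ratio of the pair NCP, 2q²(1−ρr)/(1−ρ²), to the NCP of two unrelated individuals, 2q², can be evaluated directly. The function name `ncp_ratio` and the example inputs are illustrative assumptions and are not part of the original analysis.

```python
def ncp_ratio(rho: float, r: float) -> float:
    """Approximate NCP of a pair of relatives relative to two unrelated individuals.

    rho: phenotypic correlation between the two relatives (must satisfy |rho| < 1)
    r:   coefficient of relationship (twice the kinship coefficient)
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # 2*q2*(1 - rho*r)/(1 - rho**2) divided by 2*q2: the q2 term cancels.
    return (1.0 - rho * r) / (1.0 - rho ** 2)


# Illustrative input only: a sibling pair (r = 1/2) with rho = 0.3.
sib_pair_ratio = ncp_ratio(rho=0.3, r=0.5)
print(f"{sib_pair_ratio:.3f}")  # prints 0.934
```

A value below one means that the pair of relatives carries less information than two unrelated individuals; with ρ = 0 the ratio is exactly one.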

This simple expression shows that the power of related pairs relative to unrelated pairs of individuals depends only on the phenotypic correlation of the relatives and the coefficient of relationship. In practice, the phenotypic correlation will usually be smaller than the coefficient of relationship. For example, for sibling pairs or dizygotic twin pairs (r=1/2), estimates of phenotypic correlations for systolic blood pressure and body mass index were 0.23¹⁵ and 0.26,¹⁶ respectively. For some traits, in particular those where the resemblance between relatives has a strong environmental component, sibling phenotypic correlations can be >1/2. For example, estimates of sibling phenotypic correlations for leukocyte telomere length and forced expiratory volume were 0.67¹⁷ and 0.64,¹⁸ respectively. When the resemblance between relatives is solely due to additive genetic effects, the phenotypic correlation is always smaller than the coefficient of relationship; in that case, the ratio in Equation [1] is less than one and power is lost by using relatives. For sibling pairs, the ratio of NCP is (1−(1/2)ρ)/(1−ρ²). For the special case that all family resemblance is due to additive genetic effects (ρ=rh²), the ratio of NCP for pairs of relatives is (1−r²h²)/(1−r²h⁴)≈1−r²h²(1−h²). For sibships of arbitrary size, the results from Sham et al.¹³ can be used to quantify the ratio of NCP. The ratio of the approximate NCP for s siblings to that for s unrelated individuals was derived from those results.¹³
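For sibships of size s, one way to obtain a comparable ratio is sketched below. It is not necessarily the expression given by Sham et al. Instead, it assumes a generalised least squares test with known variance components, the same phenotypic correlation ρ between every pair of siblings, a genotype correlation r between siblings, and a small QTL effect. Under those assumptions the expected information per individual is trace(V⁻¹G)/s, where V and G are the phenotypic and genotypic correlation matrices. The function name `sibship_ncp_ratio` and the example inputs are hypothetical.

```python
def sibship_ncp_ratio(s: int, rho: float, r: float = 0.5) -> float:
    """NCP of s siblings relative to s unrelated individuals (sketch).

    Assumes an exchangeable phenotypic correlation matrix
    V = (1 - rho) * I + rho * J and genotype correlation matrix
    G = (1 - r) * I + r * J, with a small QTL effect.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    if not rho < 1.0 or (s > 1 and not rho > -1.0 / (s - 1)):
        raise ValueError("rho must give a positive-definite correlation matrix")
    # Closed-form inverse of V: V^-1 = a * I + b * J.
    a = 1.0 / (1.0 - rho)
    b = -rho / ((1.0 - rho) * (1.0 + (s - 1) * rho))
    # trace(V^-1 G) / s = a + b * (1 + (s - 1) * r)
    return a + b * (1.0 + (s - 1) * r)


# Illustrative inputs only: sibships of size 2 to 6 with rho = 0.3.
for size in range(2, 7):
    print(size, round(sibship_ncp_ratio(size, rho=0.3), 3))
```

For s = 2 the function reduces to (1−ρr)/(1−ρ²), the pair result given above, which is a useful consistency check on the assumptions.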

Results

We used the above results to quantify the loss in power for sibships (Figure 1) and, assuming an additive genetic model of family resemblance, for pairs of related individuals with a range of relationships (Figure 2). Clearly, the loss in efficiency is very small for small sibships and for pairs of relatives with a coefficient of relationship less than 1/2. For sibling pairs, the largest loss (ratio of NCP of 0.93, or a loss of 7%) occurs at a phenotypic correlation of about 0.3 (Figure 1). For larger sibships, genotyping all siblings leads to a theoretical loss of power of 5–20% for a realistic range of parameters (Figure 1). The maximum loss in power approaches 50% when genotyping both members of monozygotic twin pairs (r=1) and when the heritability is large (Figure 2), but clearly both members of a monozygotic pair would not be genotyped in practice for a GWAS. Differentiating Equation [1] with respect to ρ shows that, for a given coefficient of relationship (r), the loss in power is greatest when the phenotypic correlation is ρ = [1−√(1−r²)]/r ≈ r/2, and the relative power at this value of ρ is √(1−r²)/(1−{[1−√(1−r²)]/r}²) ≈ 1−r²/4. For sib pairs (r=1/2), the minimum relative power of 0.933 is obtained when ρ is 0.268. The quantified loss in power (say, 5–10%) is small relative to the loss in power in most GWAS due to incomplete SNP coverage.¹,²,³,⁴
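The worst-case phenotypic correlation can be checked numerically. Setting the derivative of (1−ρr)/(1−ρ²) with respect to ρ to zero gives rρ² − 2ρ + r = 0, whose root in (0, 1) is ρ = [1−√(1−r²)]/r. The sketch below evaluates that root and the corresponding ratio; the function names are illustrative assumptions.

```python
import math


def ncp_ratio(rho: float, r: float) -> float:
    """Ratio of NCP for a pair of relatives versus two unrelated individuals."""
    return (1.0 - rho * r) / (1.0 - rho ** 2)


def worst_case_rho(r: float) -> float:
    """Phenotypic correlation at which the NCP ratio is smallest, for 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    # Smaller root of r*rho**2 - 2*rho + r = 0.
    return (1.0 - math.sqrt(1.0 - r * r)) / r


def min_ncp_ratio(r: float) -> float:
    """Smallest NCP ratio attainable for a given coefficient of relationship."""
    return ncp_ratio(worst_case_rho(r), r)


# Full sibs (r = 1/2), as in the text.
print(f"{worst_case_rho(0.5):.3f}")  # prints 0.268
print(f"{min_ncp_ratio(0.5):.3f}")   # prints 0.933
```

The approximations r/2 = 0.25 and 1−r²/4 = 0.9375 are close to these exact values for full sibs.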

Figure 1

Relative power of GWAS for sibships versus unrelated individuals, for the same cost of genotyping. ρ is the phenotypic correlation between siblings.

Figure 2

Relative power of GWAS for pairs of related individuals versus unrelated individuals, for the same cost of genotyping, assuming that all family resemblance is due to additive genetic effects. The coefficient of relationship (x axis) is twice the kinship coefficient. Values of 1/8, 1/4, 1/2 and 1 correspond to, for example, first cousins, half sibs, full sibs and monozygotic twin pairs, respectively.
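The kind of quantity summarised in Figure 2 can be tabulated with a short sketch. It assumes ρ = rh², as in the additive model described in the Methods, and compares the exact ratio with the approximation 1−r²h²(1−h²). The heritability values used here are hypothetical and are not taken from the figure.

```python
def additive_ncp_ratio(r: float, h2: float) -> float:
    """NCP ratio for a pair of relatives when rho = r * h2 (additive model)."""
    rho = r * h2
    if not rho < 1.0:
        raise ValueError("r * h2 must be smaller than 1")
    return (1.0 - rho * r) / (1.0 - rho ** 2)


# Coefficients of relationship named in the figure legend.
relationships = {
    "first cousins": 1 / 8,
    "half sibs": 1 / 4,
    "full sibs": 1 / 2,
    "monozygotic twins": 1.0,
}

# Hypothetical heritabilities, for illustration only.
for name, r in relationships.items():
    for h2 in (0.25, 0.5, 0.75):
        exact = additive_ncp_ratio(r, h2)
        approx = 1.0 - r ** 2 * h2 * (1.0 - h2)
        print(f"{name:18s} h2={h2:.2f} exact={exact:.3f} approx={approx:.3f}")
```

For monozygotic twins the exact ratio simplifies to 1/(1+h²), which tends to 1/2 as the heritability approaches one.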

In extreme cases, when the phenotypic correlation is larger than the coefficient of relationship, using relatives can actually increase power. For example, for telomere length and forced expiratory volume (assuming phenotypic correlations of 0.67 and 0.64 for sibling pairs), the efficiency of having pairs of siblings relative to the same number of unrelated individuals is 1.207 and 1.152, respectively (from Equation [1]).
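These efficiencies follow from evaluating (1−ρr)/(1−ρ²) at r = 1/2 with the sibling correlations quoted in the Methods. The sketch below does this for the four traits mentioned in the text; the dictionary and function names are illustrative assumptions.

```python
def ncp_ratio(rho: float, r: float) -> float:
    """Ratio of NCP for a pair of relatives versus two unrelated individuals."""
    return (1.0 - rho * r) / (1.0 - rho ** 2)


# Sibling phenotypic correlations quoted in the text (r = 1/2 throughout).
sibling_correlations = {
    "systolic blood pressure": 0.23,
    "body mass index": 0.26,
    "leukocyte telomere length": 0.67,
    "forced expiratory volume": 0.64,
}

efficiency = {
    trait: ncp_ratio(rho, r=0.5) for trait, rho in sibling_correlations.items()
}

for trait, value in efficiency.items():
    # Values above 1 mean sibling pairs are more informative than unrelated pairs.
    print(f"{trait:26s} {value:.3f}")
```

With these inputs, the two traits with ρ > 1/2 give ratios above one (1.207 for telomere length and 1.152 for forced expiratory volume), whereas blood pressure and body mass index give ratios of about 0.93.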

Discussion and conclusions

The benefits of having relatives in a GWAS are manifold. They include the ability to perform more extensive quality control (for example, Mendelian error checking and IBD sharing), the option of performing within-family tests of association that are robust to population stratification, the ability to perform parent-of-origin analysis and the ability to perform joint linkage and association analysis. Furthermore, where cohorts of related individuals already exist, including large consortia such as GenomEUtwin,¹⁹ substantial gains in power may be obtained by utilising relatives because of the resulting increase in sample size. Robust and powerful statistical methods exist that can incorporate familial relationships in association analysis of both quantitative²⁰,²¹ and binary²¹,²²,²³ phenotypes.

For large sibships, and for larger pedigrees in general, it would be inefficient to genotype all individuals for all markers (Figure 1). In those cases, genotyping only a subset of the individuals and imputing genotypes for ungenotyped individuals would be more cost effective.¹⁴,²⁴ The NCP for association in an arbitrary complex pedigree can be investigated numerically by calculating the variance of the best linear predictor from a linear mixed model, assuming known variance components.²⁵

In conclusion, contrary to common belief, there is hardly any loss in power when using relatives to conduct an association study and the advantages of using relatives in GWAS may well outweigh the small disadvantage in terms of statistical power, by providing a more robust and flexible strategy for analysis.