The Wellcome Trust Case Control Consortium (WTCCC)1 recently published a landmark study of the genetics of common diseases. It reported genome-wide association (GWA) results for ∼2000 Caucasian patients with each of seven common diseases and 3000 shared controls, all genotyped on the Affymetrix 500K platform of evenly distributed markers. Exceedingly rigorous quality control and statistical analyses were performed. Multiple earlier association hypotheses were supported in several of the diseases, with significance defined as a nominal P-value <10⁻⁷. In bipolar disorder (BD), only one association met this criterion, and it did not correspond to any previous association report.

The hypothesis tested in a GWA study is that somewhere in the genome, in detectable linkage disequilibrium with the tested markers, there is at least one allele or genotype that is significantly associated with the disease under study after correction for the number of tests performed. A key assumption of GWA is that the a priori power to detect association is large enough that positive results are credible and that an absence of positive results is informative. Altshuler and Daly,2 commenting on the a priori power to detect association in the WTCCC study, emphasize that single-nucleotide polymorphisms (SNPs) of modest effect, that is, with a modest odds ratio (OR), are unlikely to be detected in data sets the size of the WTCCC's. Some such SNPs were nonetheless detected.
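A Bonferroni correction is one simple way to turn "correction for the number of tests" into a per-test threshold. For illustration, assume a family-wise error rate of 0.05 spread over roughly 500 000 markers, which is about the size of the Affymetrix 500K array. This reproduces the 10⁻⁷ figure quoted for the WTCCC. It is not necessarily how the WTCCC chose its threshold.

```latex
\alpha_{\text{per test}} = \frac{\alpha_{\text{family}}}{m}
  = \frac{0.05}{5 \times 10^{5}} = 10^{-7}
```

Neighboring markers are correlated through linkage disequilibrium, so the effective number of independent tests is smaller than m. A Bonferroni threshold of this kind is therefore conservative.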

Assume a significance threshold of P < 10⁻⁷, linkage disequilibrium between marker and disease allele of r² = 0.8, and unscreened controls. Under these assumptions the WTCCC study had 80% power to detect ORs ⩾1.6, 1.4 and 1.5 for allele frequencies of 0.1, 0.5 and 0.75, respectively. Detecting ORs ⩾1.2 at the same allele frequencies would take ∼12 000, 5000 and 7000 cases, respectively, with 1.5 times as many controls. With a significance threshold of P < 10⁻⁶, 80% power to detect ORs ⩾1.2 would require ∼11 000, 4000 and 6000 cases for these allele frequencies. Given the paucity of findings so far in BD, these calculations imply that samples even larger than those assembled by the WTCCC will be needed to detect genes of modest effect.
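Power figures of this kind can be approximated with a standard 1-df allelic (allele-count) test. The sketch below is a minimal reconstruction under stated assumptions and is not necessarily the calculation behind the quoted numbers. Its assumptions are:

- a multiplicative per-allele OR;
- controls that represent the population allele frequency, which stands in for unscreened controls when prevalence is low (prevalence itself is ignored);
- indirect association modeled by scaling the non-centrality parameter by r², a common approximation.

The function names `allelic_test_power` and `cases_needed` are illustrative.

```python
"""Approximate power of a single-marker allelic association test (illustrative sketch)."""
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele (multiplicative) odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: float, n_controls: float, p_control: float,
                       odds_ratio: float, alpha: float, r2: float = 1.0) -> float:
    """Approximate power of a two-sided 1-df allelic test at a marker with LD r2 to the risk allele."""
    p_case = case_allele_freq(p_control, odds_ratio)
    a_cases, a_controls = 2 * n_cases, 2 * n_controls  # two alleles per person
    p_bar = (a_cases * p_case + a_controls * p_control) / (a_cases + a_controls)
    variance = p_bar * (1 - p_bar) * (1 / a_cases + 1 / a_controls)
    # Assumed approximation: indirect association shrinks the non-centrality parameter by r2.
    ncp = r2 * (p_case - p_control) ** 2 / variance
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)
    shift = ncp ** 0.5
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def cases_needed(p_control: float, odds_ratio: float, alpha: float, r2: float = 1.0,
                 power: float = 0.8, control_ratio: float = 1.5) -> int:
    """Smallest number of cases (with control_ratio times as many controls) reaching the target power."""
    if odds_ratio == 1.0:
        raise ValueError("power never exceeds alpha when OR = 1")

    def power_at(n: int) -> float:
        return allelic_test_power(n, control_ratio * n, p_control, odds_ratio, alpha, r2)

    hi = 1
    while power_at(hi) < power:  # double until the target power is reached
        hi *= 2
    lo = max(1, hi // 2)
    while lo < hi:  # binary search for the smallest sufficient number of cases
        mid = (lo + hi) // 2
        if power_at(mid) >= power:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Settings mirroring the text: P < 1e-7, r2 = 0.8, 1.5 controls per case.
    for p in (0.1, 0.5, 0.75):
        n = cases_needed(p, 1.2, alpha=1e-7, r2=0.8)
        print(f"allele frequency {p}: about {n} cases for OR 1.2")
```

Run as a script, the sketch prints approximate case numbers for OR = 1.2 at the three allele frequencies. Under these assumptions they fall in the same range as the figures quoted above. Exact values depend on the genetic model and test, so they will not match any particular published calculator digit for digit.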

Modest risk ratios for individual alleles can result when risk is spread across multiple disease genes and when there is allelic heterogeneity (more than one disease-associated allele at the same locus or haplotype), since each allele of each gene must generally be detected separately. Epistasis and unanalyzed environmental factors may also reduce the observed OR of a true disease allele.

Some types of hypothesis have not, so far, been tested in the WTCCC data. The first type includes:

- multi-marker hypotheses, including association with haplotypes and gene–gene interaction;
- copy number variation;
- subphenotype hypotheses;
- association with uncommon or rare variants.

A debate over whether uncommon alleles should be expected to contribute to common disease has gone on for some years.3, 4, 5 Nonetheless, the field would be unwise not to consider methods for detecting them. Multi-marker haplotypes6 or analysis of patterns of segmental sharing7 may succeed in detecting association with some uncommon disease alleles. Resequencing, however, appears likely to be required to detect at least some, and possibly most, uncommon disease alleles, and it is certainly required to detect rare variants associated with disease.8

Second, hypotheses based on linkage, candidate genes or other non-genome-wide considerations may be tested using data from a GWA experiment, but they are logically and statistically distinct from GWA. Legitimate assumptions based on pre-existing linkage, association and aneuploidy results can define association experiments with considerably smaller probability spaces than a GWA experiment; that is, no probability of outcome is defined for parts of the genome outside the hypothesis. The nominal P-values required for statistical significance are therefore less stringent than for a genome-wide experiment. Nonetheless, caution is always warranted with hypotheses that could conceivably have been tailored to known results after an experiment. Repeated replication, independent biological corroboration and meta-analyses are even more necessary for a regional or gene-based hypothesis than for a straightforward GWA result.
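The difference in stringency can be made concrete with a Bonferroni correction. It is used here only as a simple stand-in for whatever multiple-testing procedure a given study adopts. The marker counts are hypothetical: roughly 500 000 markers for a genome-wide test, an assumed 2000 markers for a hypothesis confined to a prior linkage region, and an assumed 20 markers for a single candidate gene.

```python
"""Per-test significance thresholds for hypotheses of different sizes (Bonferroni sketch)."""


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Nominal per-test P-value threshold keeping the family-wise error rate at family_alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return family_alpha / n_tests


# Hypothetical hypothesis sizes, for illustration only.
HYPOTHESES = {
    "genome-wide (about 500K markers)": 500_000,
    "prior linkage region (assumed 2000 markers)": 2_000,
    "single candidate gene (assumed 20 markers)": 20,
}

if __name__ == "__main__":
    for name, n_markers in HYPOTHESES.items():
        print(f"{name}: P < {bonferroni_threshold(n_markers):.1e}")
```

The narrower the hypothesis, the fewer tests it covers and the less stringent its nominal threshold. For that reason such a hypothesis only earns its threshold if it was specified before the data were examined.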

In the WTCCC study, BD has one significant GWA result, on 16p12 at SNP rs420259. This result lies relatively close to two linkage peaks. It is 11 Mb from the peak of a non-parametric linkage report from the NIMH collaborative study9 and 18 Mb from the peak of a parametric linkage report.10 However, multiple other linkage scans do not show this linkage, and the 16p12 region does not appear positive in the major meta-analyses of BD linkage.11, 12, 13 Lack of persuasive linkage evidence does not, of course, invalidate an association finding.

Recently, Baum et al.,14 in the McMahon lab at NIMH Intramural (USA), reported a two-stage GWA study of BD. The first stage used pooled DNA from Caucasian cases and controls from an NIMH study. SNPs with nominal P < 0.05, OR > 1.4 and a location near a known gene were then tested in pooled German cases and controls in a second stage. Genotyping was on an Illumina platform, whereas the WTCCC used an Affymetrix platform. The total numbers of cases and controls, and of Caucasian samples, were smaller than, but comparable to, those in the WTCCC BD and control samples. SNPs with association P < 0.05 in both sets of pooled case and control samples were then individually genotyped. Individual genotyping gave a significant association (P < 10⁻⁷) for rs1012053, which lies in the DGKH gene on chromosome 13q14.
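The selection logic of this two-stage design can be sketched as a simple filter. This is a hypothetical illustration that applies the criteria exactly as summarized above; for example, only OR > 1.4 is accepted. The record type, field names and toy values are assumptions, and the actual pipeline of Baum et al. may have differed in detail.

```python
"""Two-stage pooled screen, as summarized in the text (illustrative sketch; toy data only)."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PooledSnp:
    """Hypothetical per-SNP summary; field names are illustrative, not taken from the study."""
    snp: str
    p_stage1: float                   # NIMH pooled cases vs controls
    or_stage1: float                  # odds ratio estimated in stage 1
    near_known_gene: bool             # gene-proximity annotation, assumed precomputed
    p_stage2: Optional[float] = None  # German pooled sample; None if not carried forward


def passes_stage1(s: PooledSnp, p_max: float = 0.05, or_min: float = 1.4) -> bool:
    """Stage-1 criteria as stated: nominal P < 0.05, OR > 1.4, near a known gene."""
    return s.p_stage1 < p_max and s.or_stage1 > or_min and s.near_known_gene


def select_for_individual_genotyping(snps: List[PooledSnp], p_max: float = 0.05,
                                     or_min: float = 1.4) -> List[str]:
    """SNPs passing stage 1 that also show P < p_max in the stage-2 pooled sample."""
    return [
        s.snp for s in snps
        if passes_stage1(s, p_max, or_min)
        and s.p_stage2 is not None
        and s.p_stage2 < p_max
    ]


if __name__ == "__main__":
    toy = [  # made-up values, not results from any study
        PooledSnp("snpA", 0.010, 1.55, True, 0.020),
        PooledSnp("snpB", 0.030, 1.20, True, 0.010),   # fails the OR cut-off
        PooledSnp("snpC", 0.004, 1.80, False, 0.001),  # not near a known gene
        PooledSnp("snpD", 0.020, 1.45, True, 0.300),   # not supported in stage 2
    ]
    print(select_for_individual_genotyping(toy))
```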

It is disappointing to note the lack of correspondence between the results of the two published GWA papers. None of the alleles that Baum et al. selected for individual genotyping from their pooling experiment shows a suggestive P-value at, or very close to, the significant or suggestive SNPs of the WTCCC study. To look for overlap, we treated the P-values of the two studies as three samples: NIMH (pooled), German (pooled) and WTCCC. We retained SNPs with P < 0.05 in all three samples and calculated a combined probability using Fisher's χ² method. Wherever possible, imputed genotypes from the WTCCC study (with its Affymetrix platform) were used to give genotypic and allelic frequencies for the corresponding SNPs in the Baum study (with its Illumina platform). No combined value reached P < 10⁻⁶. The two ‘best’ values were for rs10791345 and rs4806874, with P = 5 × 10⁻⁶ and 9 × 10⁻⁶, respectively. In the blogosphere (http://www.genetics.med.ed.ac.uk/blog/, http://www.polygenicpathways.co.uk/Bipolargenes.html), the two studies have been interpreted as showing considerable overlap, but this interpretation is not statistically supported. In the Schizophrenia Research Forum (http://www.schizophreniaforum.org/res/sczgene/default.asp), the ‘best’ agreement was reported to be for the DFNB31 gene on chromosome 9. There, the G allele of rs10982246 has a P-value of 2.6 × 10⁻⁶ in the WTCCC study (using a trend test), and the G allele of rs942518 (only 22 kb away) has a P-value of 0.0001 in the combined samples of the Baum et al. study. In the imputed WTCCC data, however, the G allele of rs942518 gives an association P-value of 0.43, so these data cannot be counted as a consistent association between the two studies.
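Fisher's method combines k independent P-values through X = −2 Σ ln pᵢ. Under the joint null hypothesis, X follows a χ² distribution with 2k degrees of freedom. The sketch below applies the filter described above (P < 0.05 in each of the three samples) and then combines the surviving P-values. Because the degrees of freedom are even, the χ² tail probability has a closed form, so only the standard library is needed. This is not the authors' code, and the SNP names and values in the usage section are hypothetical.

```python
"""Fisher's combined probability test with a per-sample pre-filter (illustrative sketch)."""
import math
from typing import Dict, Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent P-values: X = -2 * sum(ln p) ~ chi-square with 2k df under the null."""
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one P-value in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    term = total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def combine_if_nominal_everywhere(snp_p_values: Dict[str, Sequence[float]],
                                  threshold: float = 0.05) -> Dict[str, float]:
    """Keep SNPs with P < threshold in every sample, then apply Fisher's method to each."""
    return {
        snp: fisher_combined_p(ps)
        for snp, ps in snp_p_values.items()
        if all(p < threshold for p in ps)
    }


if __name__ == "__main__":
    # Hypothetical P-values in the order (NIMH pooled, German pooled, WTCCC).
    toy = {"snp1": (0.01, 0.02, 0.001), "snp2": (0.03, 0.40, 0.002)}
    for snp, p in combine_if_nominal_everywhere(toy).items():
        print(f"{snp}: combined P = {p:.2e}")
```

With a single P-value the method returns that value unchanged. The pre-filter only decides which SNPs are combined; it does not alter the combined P-value of any SNP that passes.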

Design differences between the WTCCC and Baum et al. studies could have contributed to the discrepant outcomes. If so, the ongoing GWA studies of thousands of individuals from the NIMH samples in the US should yield results more similar to those of the WTCCC. These ongoing studies draw on the same sample source (NIMH) as Baum et al., but they are larger and will use individual genotyping on an Affymetrix platform comparable to that of the WTCCC.

We suspect, however, that the lack of consistent BD associations reflects the nature of the underlying genes. As noted above, a number of genetic analyses have not yet been performed, including set-based analyses.7 As discussed above and by Altshuler and Daly,2 the only systematic route to discovering individual low-OR loci is much larger samples. With smaller samples, true positives may be detected in one study but not replicated in another of similar size. This may account for the discrepancies between the two BD GWA publications.
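The replication point can be made with simple arithmetic. Suppose a true locus is detected independently in two studies with powers π_A and π_B. The chance that it reaches significance in both is then π_A·π_B. The sketch below sums this over a set of loci. The 20 loci and the per-study power of 0.2 in the usage example are purely hypothetical. They are chosen only to show how low per-study power produces largely non-overlapping lists of true positives.

```python
"""Expected overlap of true-positive findings between two independent studies (sketch)."""
from typing import Dict, Sequence


def expected_overlap(powers_a: Sequence[float], powers_b: Sequence[float]) -> Dict[str, float]:
    """Expected counts of true loci significant in A, in B, in both, and in exactly one study."""
    if len(powers_a) != len(powers_b):
        raise ValueError("need one power per locus for each study")
    in_a = sum(powers_a)
    in_b = sum(powers_b)
    # Assumes detection in A and in B are independent events for each locus.
    both = sum(a * b for a, b in zip(powers_a, powers_b))
    return {"in_a": in_a, "in_b": in_b, "both": both, "exactly_one": in_a + in_b - 2 * both}


if __name__ == "__main__":
    # Hypothetical: 20 true loci, each detected with power 0.2 in each study.
    print(expected_overlap([0.2] * 20, [0.2] * 20))
```

With these made-up inputs, each study is expected to report about four true loci, but fewer than one of them is expected to appear in both.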

For association with uncommon and rare variants, extensive resequencing of selected regions may be required. Phenotypic refinements may also be needed. These could be generated by multivariate analysis of clinical data already present in the NIMH databases, or by biological studies of new individuals who volunteer for these large-scale samples.