Genome-wide association studies have yielded a bounty of common genetic variants linked to complex diseases and traits. But the plentiful harvest has borne meager fruit: for many traits, only a fraction of the predicted heritability can be explained by the combined effects of these variants.

With the ability to sequence genomic regions deeply, biologists are looking beyond common sequence differences and are finding that rare variants can contribute substantially to common diseases. The problem is that the low frequency of these variants makes them hard to pick out in a sea of neutral differences. A bevy of statistical methods aims to maximize the power to detect causal variants by grouping them to increase the collective power of association, but it is difficult to gauge their effectiveness.

Brent Richards and colleagues at McGill University now provide an empirical estimate of the relative power of a number of such tools. A key strength of their work is its use of high-quality Sanger sequencing data, rather than more error-prone genotype resequencing data, for seven genes from 1,998 individuals as the basis for modeling the effects of variants on phenotype.

The group modeled phenotype scenarios that address different hypotheses used in three classes of rare-variant detection methods. One class groups markers from the same region, a second weights the importance of alleles based on criteria such as predicted functionality in a protein, and a third calls variants as functional based on changes in distribution between cases and controls. In one scenario, for example, the authors simulated deleterious and protective alleles in the same gene, a condition that is considered in the design of distributive methods.
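To make the first two classes concrete, here is a minimal Python sketch of a collapsing (grouping) test and a weighted burden score. It is an illustration under stated assumptions, not any of the methods the authors evaluated: the function names, the toy genotype matrices and the weights are all hypothetical, and a simple two-proportion z statistic stands in for whatever test a real tool would use.

```python
from math import sqrt

def carrier_indicator(genotypes):
    """Grouping: 1 if an individual carries any rare allele in the region, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def weighted_burden(genotypes, weights):
    """Weighting: per-individual sum of rare-allele counts scaled by per-variant weights."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def two_proportion_z(carriers, is_case):
    """z statistic comparing carrier frequency in cases versus controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    case_carriers = sum(c for c, y in zip(carriers, is_case) if y)
    ctrl_carriers = sum(c for c, y in zip(carriers, is_case) if not y)
    pooled = (case_carriers + ctrl_carriers) / (n_case + n_ctrl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_case + 1 / n_ctrl))
    if se == 0:
        return 0.0
    return (case_carriers / n_case - ctrl_carriers / n_ctrl) / se

# Hypothetical toy data: rows are individuals, columns are rare variants in one gene.
is_case = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],  # cases
    [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0],  # controls
]
carriers = carrier_indicator(genotypes)
z_risk_only = two_proportion_z(carriers, is_case)

# Weights are made up here; a real method might derive them from predicted function.
scores = weighted_burden(genotypes, [1.0, 0.5, 2.0])

# Mixed scenario: variant 0 is seen only in cases, variant 1 only in controls.
mixed = [
    [1, 0], [1, 0], [0, 0], [0, 0],  # cases
    [0, 1], [0, 1], [0, 0], [0, 0],  # controls
]
z_mixed = two_proportion_z(carrier_indicator(mixed), is_case)
```

In the mixed toy data, carriers are equally common in cases and controls, so this naive collapsing statistic is zero even though each variant tracks perfectly with one group. That cancellation is the situation the third, distribution-based class is designed to address.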

Despite the large sample size, low power plagued all the methods, and no single method worked well across all scenarios. The authors showed that power increases with the size of a variant's effect and with the proportion of variants that are causal. The results serve as a warning that causal variants with small effects will be missed by current approaches.
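The dependence of power on effect size and on the proportion of causal variants can be illustrated with a small simulation. The Python sketch below is not the authors' simulation framework: the group sizes, the number of variants, the allele frequency, the relative-risk parameterization and the simple carrier-count test are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def carrier_prob(n_variants, n_causal, freq, rel_risk):
    """Chance of carrying at least one rare allele when causal alleles are
    enriched by a factor of rel_risk (an assumed, simplified model)."""
    return 1 - (1 - freq * rel_risk) ** n_causal * (1 - freq) ** (n_variants - n_causal)

def estimate_power(rel_risk, prop_causal, n_per_group=1000, n_variants=20,
                   freq=0.002, n_reps=200, z_crit=1.96, seed=1):
    """Fraction of simulated studies in which a two-proportion z test on
    carrier counts rejects the null at the two-sided 5% level."""
    rng = random.Random(seed)
    n_causal = round(prop_causal * n_variants)
    p_case = carrier_prob(n_variants, n_causal, freq, rel_risk)
    p_ctrl = carrier_prob(n_variants, 0, freq, 1.0)
    hits = 0
    for _ in range(n_reps):
        # Draw the number of carriers in each group as a sum of Bernoulli trials.
        cases = sum(rng.random() < p_case for _ in range(n_per_group))
        ctrls = sum(rng.random() < p_ctrl for _ in range(n_per_group))
        pooled = (cases + ctrls) / (2 * n_per_group)
        se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
        if se > 0 and abs(cases - ctrls) / n_per_group / se > z_crit:
            hits += 1
    return hits / n_reps

# Vary the effect size with half of the variants causal.
power_by_effect = {rr: estimate_power(rr, 0.5) for rr in (1.2, 2.0, 3.0)}

# Vary the proportion of causal variants at a fixed effect size.
power_by_proportion = {p: estimate_power(3.0, p) for p in (0.2, 0.5, 0.8)}
```

Under these assumed settings, the estimated power rises as either the relative risk or the causal proportion grows, and a small relative risk leaves the test with little chance of detecting the gene, in line with the warning about small effects.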

Real data likely represent a complex mixture of scenarios. Until new tools are available to tease these differences from the data, biologists will need to keep the limitations of the individual hypotheses used by current methods in mind.