Two recent reviews, one in Nature Reviews Genetics by Bansal et al. (Statistical analysis strategies for association studies involving rare variants. Nature Rev. Genet. 11, 773–785 (2010))1 and one elsewhere2, examined the emerging area of rare variant association studies. These reviews nicely describe the progression from association studies of common SNPs towards those of rare variants. We would like to add to these discussions a strategy that several groups have used for rare variant case–control association studies. This strategy, which we refer to here as case–control mutation screening (CCMS), was developed independently of genome-wide association (GWA) studies and has so far been applied largely in cancer genetics.

Ideas contributing to CCMS are as follows. First, linkage analysis shows that evidence from many, individually very rare sequence variants at the same locus can be combined3. Second, clinical testing of susceptibility genes such as breast cancer 1, early onset (BRCA1) and BRCA2 has shown that testing can be based on sequencing rather than genotyping. Third, the integrated evaluation of unclassified variants in BRCA1 and BRCA2 has shown that in silico assessment of rare variants — currently, rare missense substitutions (rMSs) — can be used to grade variants on the basis of predicted severity without attempting to dichotomize them as deleterious versus neutral4. Finally, lessons from GWA studies tell us that well-powered CCMS studies will be large, usually multi-centre and often multi-ethnic, and therefore must be analysed by statistical methods that allow for covariates.

The development of CCMS can be traced through the efforts of the genetics community to understand how heterozygous sequence variation in ataxia telangiectasia mutated (ATM) contributes to breast cancer risk (Table 1). Analysis of ATM CCMS data started with a case–control study that used a cohort allelic sums test restricted to protein-truncating variants plus variants that clearly damage splice junctions (T+SJVs)5. Analyses then progressed to a two-pronged strategy: the pool of ATM T+SJVs was analysed in one logistic regression and the pool of rMSs in a second logistic regression6. The subtlety of this latter approach lies in combining all of the rMSs into a single categorical variable that incorporates prior information, such as sequence conservation, and grades the severity of rMSs from probably harmful to probably benign4,6. This variable is easily assessed with a logistic regression test for trend, which minimizes the multiple testing problem (one test for the whole graded pool rather than one per variant) while accommodating epidemiologic covariates. We believe that this form of CCMS, augmented by steadily improving statistical methods7,8, will be useful for identifying genes that harbour variants conferring intermediate risk, especially genes in which most pathogenic variants are rare and either reduce or ablate function.
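To make the trend test concrete, here is a minimal pure-Python sketch. It assumes a hypothetical coding in which non-carriers score 0 and rMS carriers score 1 to K, ordered from probably benign to probably harmful; how variants are assigned to grades (for example from sequence conservation) is taken as given. Covariates such as study centre enter as extra columns, and the trend is assessed with a Wald test on the grade coefficient of a Newton-Raphson logistic fit. The function names and toy counts are illustrative only, not the software or data of the cited studies.

```python
import math


def _sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_info(X, y, beta):
    # gradient of the log-likelihood and the Fisher information matrix
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = _score_and_info(X, y, beta)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_info(X, y, beta)  # information at the final estimate
    return beta, info


def trend_test(grades, covariates, case_status):
    """Covariate-adjusted Wald test for a log-linear trend in risk across grades.

    grades: hypothetical integer severity score per subject
            (0 = no rMS, 1..K = probably benign .. probably harmful).
    covariates: per-subject list of numeric covariates (e.g. centre dummies).
    case_status: 1 for cases, 0 for controls.
    Returns (log-OR per grade step, standard error, two-sided p-value).
    """
    X = [[1.0, float(g)] + [float(c) for c in cov]
         for g, cov in zip(grades, covariates)]
    beta, info = fit_logistic(X, case_status)
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    se = math.sqrt(_solve(info, unit)[1])  # sqrt of [info^-1]_{11}
    z = beta[1] / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[1], se, p


if __name__ == "__main__":
    # Made-up toy counts (centre, grade, cases, controls), for illustration only.
    toy = [(0, 0, 100, 110), (0, 1, 10, 11), (0, 2, 8, 5), (0, 3, 6, 2),
           (1, 0, 80, 100), (1, 1, 7, 9), (1, 2, 6, 4), (1, 3, 5, 1)]
    grades, covs, status = [], [], []
    for centre, grade, n_case, n_ctrl in toy:
        for y, n in ((1, n_case), (0, n_ctrl)):
            grades += [grade] * n
            covs += [[centre]] * n
            status += [y] * n
    slope, se, p = trend_test(grades, covs, status)
    print(f"log-OR per grade = {slope:.3f} (SE {se:.3f}), p = {p:.3g}")
```

Restricting the score to 0/1 (non-carrier versus carrier of any T+SJV) turns the same function into a covariate-adjusted pooled carrier analysis, whose slope is the log odds ratio for carrying a T+SJV. The Wald test is used here only for brevity; a likelihood-ratio test would be an equally reasonable choice.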

Table 1 Key analyses of association of rare ATM variants with breast cancer

Going forward, a key goal is to improve the accuracy and scope of methods for predicting sequence variant severity. To this end, the Critical Assessment of Genomic Interpretation (CAGI) community exercise will illuminate the capabilities of current approaches and inform their further development. An important additional issue is that methods for predicting gene dysfunction must be sufficiently transparent for other researchers to replicate the predictions readily and to judge how hidden multiple testing (which may be introduced when sequence variant severity is predicted) affects CCMS data analysis.