Abstract
Several methods to correct for multiple testing within a gene region have been proposed. These methods are useful for candidate gene studies and for fine mapping gene regions from GWAS. The Bonferroni correction and permutation are common adjustments, but they are overly conservative and computationally intensive, respectively. Other options include calculating the effective number of independent single-nucleotide polymorphisms (SNPs) or using theoretical approximations. Here, we compare a theoretical approximation based on extreme tail theory with four methods for calculating the effective number of independent SNPs. We evaluate the type I error rates of these methods using single-SNP association tests over 10 gene regions simulated using 1000 Genomes data. Overall, we find that the effective number of independent SNPs method of Gao et al and extreme tail theory produce type I error rates at or close to the chosen significance level. The type I error rates for the other effective number of independent SNPs methods vary by gene region characteristics. We find Gao et al’s method and extreme tail theory to be efficient alternatives to more computationally intensive approaches to control for multiple testing in gene regions.
Introduction
Methods to control for multiple comparisons within gene regions are used for various kinds of analyses, including candidate gene studies and higher order analyses such as single-nucleotide polymorphism (SNP)–SNP interaction analysis among pairs or groups of genes. The Bonferroni correction is simple and widely used, but is usually overly conservative because of the high correlation among SNPs in a chromosomal region. Permutation provides a type I error rate that asymptotically approaches the chosen significance level when the units being permuted are exchangeable under the null hypothesis. However, permutation is computationally intensive, especially for high-throughput analyses or simulations. A computationally efficient option, which is less conservative than the basic Bonferroni correction, is to calculate the effective number of independent SNPs (M_{eff}) in a gene region and to use this value in the Bonferroni correction.^{1, 2, 3, 4, 5} A further option is to use extreme tail theory to explicitly calculate the probability of detecting a test statistic as large as or larger than the observed maximum test statistic in the gene region.^{6}
Although evaluations of methods to control for multiple testing exist,^{6, 7, 8} the evaluations often have drawbacks. Some were done using 20 or fewer markers, well below the number typically seen in gene regions of a few hundred kb. Others did not simulate gene region variation between replicates, which may be particularly problematic for methods such as M_{eff} methods and extreme tail theory, which are dependent on the correlation within the gene region.
In 2008, Moskvina and Schmidt compared extreme tail theory with the first M_{eff} method developed by Cheverud^{1} and modified by Nyholt^{5} across scenarios with both a small (40) and large (∼6000) number of SNPs.^{8} They found that extreme tail theory produced a more accurate estimate for M_{eff} while still being computationally efficient. Cheverud and Nyholt’s method is known to be overly conservative,^{3, 4} and Moskvina and Schmidt did not compare extreme tail theory with some of the more recent M_{eff} methods as we do here. Thus, the relative merits of extreme tail theory versus more recently proposed M_{eff} methods remain untested.
First, to gain a more complete understanding of the M_{eff} methods, we compare them using simple examples in which the number of independent SNPs is known. Second, using logistic regression to assess the association between case status and SNPs, we compare the type I error rates of the M_{eff} methods and extreme tail theory in gene region simulations, in which we strive to overcome some of the drawbacks of previous studies by varying the linkage disequilibrium (LD) structure between replicates and by including many SNPs in each region. To our knowledge, no study has compared extreme tail theory with the most promising M_{eff} methods.
Methods
Methods to compute the effective number of independent SNPs
Unless otherwise indicated, we calculated the eigenvalues, λ, by eigenvalue decomposition in R^{9} of the genotypic correlation matrix, computed as Pearson’s correlation coefficient between additively coded genotypes. We outline the M_{eff} methods below, where M represents the total number of markers in the region and M_{x} is the M_{eff} calculated by method x.
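The correlation step described above can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study; the function name `pearson_correlation_matrix` and the input layout (one row per subject, minor-allele counts 0/1/2 per SNP) are assumptions made for the example. The eigenvalue decomposition itself is left to a linear algebra library.

```python
import math


def pearson_correlation_matrix(genotypes):
    """Pearson correlation between SNPs from additively coded genotypes.

    genotypes: list of subjects, each a list of minor-allele counts (0, 1, 2),
    one entry per SNP. Assumes no SNP is monomorphic, since a constant
    column would give a zero norm and a division by zero.
    """
    n_subjects = len(genotypes)
    n_snps = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_subjects for j in range(n_snps)]
    # One centered column per SNP.
    centered = [[row[j] - means[j] for row in genotypes] for j in range(n_snps)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    corr = [[0.0] * n_snps for _ in range(n_snps)]
    for a in range(n_snps):
        for b in range(n_snps):
            dot = sum(x * y for x, y in zip(centered[a], centered[b]))
            corr[a][b] = dot / (norms[a] * norms[b])
    return corr
```

Two SNPs with identical genotype columns give a correlation of 1 (up to rounding), which is the perfectly correlated case used in the simple examples.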
Cheverud^{1} and Nyholt^{5}
M_{Chev-Nyh} = 1 + (M − 1)(1 − var(λ)/M) (1)
where var(λ) is the variance of the eigenvalues.
Cheverud first developed equation (1) to calculate the M_{eff} dependent on the variation of the eigenvalues calculated using genotypic correlation. Nyholt then modified Cheverud’s method by using the allelic correlation to calculate the eigenvalues rather than the genotypic correlation.
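As a rough illustration of this calculation, the sketch below assumes the commonly cited form M_eff = 1 + (M − 1)(1 − var(λ)/M), with var(λ) taken as the sample variance (divisor M − 1). The function name `meff_cheverud_nyholt` is hypothetical, and the eigenvalues are assumed to have been computed already.

```python
from statistics import variance


def meff_cheverud_nyholt(eigenvalues):
    """Effective number of tests from the variance of the eigenvalues.

    Assumes the form 1 + (M - 1) * (1 - var(lambda) / M), where var is the
    sample variance (divisor M - 1) of the M eigenvalues.
    """
    m = len(eigenvalues)
    if m == 1:
        return 1.0  # a single marker is a single test
    return 1.0 + (m - 1) * (1.0 - variance(eigenvalues) / m)
```

With M independent SNPs (all eigenvalues equal to 1) the function returns M, and with M perfectly correlated SNPs (eigenvalues M, 0, ..., 0) it returns 1. For 10 independent blocks of 5 perfectly correlated SNPs (eigenvalues ten 5s and forty 0s) it returns 46 rather than 10, which illustrates the overestimation under strong LD.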
Li and Ji^{4}
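M_{Li-Ji} = Σ_{i=1}^{M} f(|λ_{i}|), with f(x) = I(x ≥ 1) + (x − ⌊x⌋) (2)
where I(·) is the indicator function and ⌊x⌋ is the largest integer not exceeding x.
In 2005, Li and Ji developed a method that separates each eigenvalue into two components representing: (1) the correlation between SNP genotypes (the integer part of the eigenvalue, which is counted as a single test) and (2) the independent contribution of each SNP (the fractional remainder of the eigenvalue).^{4} Li and Ji sum these components over all of the eigenvalues to estimate the M_{eff}.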
In practice, decomposing a correlation matrix into its eigenvalues can sometimes yield very small negative numbers. Taking the floor of these negative eigenvalues for Li and Ji’s method can provide inaccurate results. Thus, when implementing Li and Ji’s method we used the absolute value of the eigenvalues.
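A minimal sketch of this calculation, assuming the usual form in which each eigenvalue contributes I(|λ| ≥ 1) + (|λ| − ⌊|λ|⌋) and using the absolute value as described above. The function name `meff_li_ji` is hypothetical.

```python
import math


def meff_li_ji(eigenvalues):
    """Sum, over eigenvalues, of an indicator part and a fractional part.

    The absolute value guards against the tiny negative eigenvalues that
    numerical decomposition of a correlation matrix can produce.
    Note: an eigenvalue just below an integer (for example 4.9999999 from
    rounding) contributes almost 2 instead of 1, so results can be
    sensitive to numerical noise.
    """
    total = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        indicator = 1.0 if x >= 1.0 else 0.0
        total += indicator + (x - math.floor(x))
    return total
```

For 10 independent blocks of 5 perfectly correlated SNPs (eigenvalues ten 5s and forty 0s) the function returns 10, the true number of independent blocks.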
Gao et al^{3}
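M_{Gao} = min(x) such that (Σ_{i=1}^{x} λ_{i}) / (Σ_{i=1}^{M} λ_{i}) ≥ c (3)
where the eigenvalues are sorted in decreasing order and c is a user-defined threshold.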
Gao et al’s method estimates the M_{eff} as the number of eigenvalues needed to explain a prespecified proportion of the sum of all of the eigenvalues.^{3} They suggest that a threshold of 0.995 works well in most situations, although a higher or lower threshold would likely perform better depending on the LD structure of the gene region. We use Gao’s recommendation of c=0.995 in our implementation.
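The thresholding rule can be sketched as below. The function name `meff_gao` is hypothetical, and treating the threshold as inclusive (cumulative proportion ≥ c) is an assumption of this sketch; eigenvalues are assumed to be non-negative.

```python
def meff_gao(eigenvalues, c=0.995):
    """Number of largest eigenvalues needed to reach a proportion c of their sum."""
    lams = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    total = sum(lams)
    running = 0.0
    for count, lam in enumerate(lams, start=1):
        running += lam
        if running / total >= c:
            return count
    return len(lams)
```

For 10 independent blocks of 5 perfectly correlated SNPs the function returns 10. For 250 fully independent SNPs it returns 249, because the last eigenvalue accounts for less than 0.5% of the total; this is one way a slight underestimate can arise when the number of independent units is large.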
Galwey^{2}
M_{Gal} = (Σ_{i=1}^{M} √λ_{i})^{2} / Σ_{i=1}^{M} λ_{i} (4)
In 2009, Galwey proposed yet another equation to calculate the effective number of independent SNPs. As previously mentioned, eigenvalue decomposition sometimes yields very small negative numbers. As the square root in Galwey’s original equation cannot be applied to negative numbers, Galwey suggests setting all negative eigenvalues to zero. Here, we use the absolute value of the eigenvalues, which we found produces results identical to those obtained by setting all negative eigenvalues to zero as Galwey suggested.
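A minimal sketch of Galwey's measure, assuming the form (Σ√λ)² / Σλ and applying the absolute value to the eigenvalues as described above. The function name `meff_galwey` is hypothetical.

```python
import math


def meff_galwey(eigenvalues):
    """Squared sum of root eigenvalues divided by the sum of eigenvalues."""
    lams = [abs(lam) for lam in eigenvalues]  # guard against tiny negatives
    root_sum = sum(math.sqrt(lam) for lam in lams)
    return root_sum ** 2 / sum(lams)
```

With equally sized blocks, such as ten blocks of 5 perfectly correlated SNPs, the function returns 10 (up to rounding). With two unequal blocks of 9 and 1 SNPs (eigenvalues 9, 1 and eight 0s) it returns 1.6 rather than 2, which shows the tendency to underestimate when block sizes differ.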
Extreme tail theory
To our knowledge, using extreme tail theory to control for multiple comparisons in a gene region was first described by Conneely and Boehnke in 2007^{6} and was further evaluated by Moskvina and Schmidt in 2008.^{8}
Assuming a multivariate normal distribution for test statistics under the null hypothesis of no association, we calculated the probability of observing a maximum test statistic as large as or larger than a certain threshold.
P_{ET} = P(max_{m} |Z_{m}| ≥ Z*) = 1 − ∫_{−Z*}^{Z*} ... ∫_{−Z*}^{Z*} Φ(z_{1},..., z_{M}; 0, Σ) dz_{1} ... dz_{M} (5)
where P_{ET} is the probability of observing at least one test statistic whose absolute value is as large as or larger than a critical value, Z*; M is the total number of markers or test statistics; m is the marker or test statistic indicator; N is the total number of subjects; i is the subject indicator; Y is the phenotype; x_{m} is SNP m; Z_{m} is test statistic m; and Φ is the multivariate normal probability density function.
As shown in equation (5), P_{ET} depends on the joint distribution of M statistics (Z_{1},..., Z_{M}) where Z∼N(0, Σ). In 2007, Conneely and Boehnke showed that, under the null hypothesis of no association, the covariance of the test statistics, Σ, can be calculated directly from the correlation between SNPs (X_{1},..., X_{M}). We use this to find the critical value of the test statistic that corresponds to the multivariate probability, P_{ET}, in equation (5).
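To make the quantity P_{ET} concrete, the sketch below estimates P(max|Z_m| ≥ Z*) for Z∼N(0, Σ) by Monte Carlo sampling. This is a swapped-in technique chosen for illustration: the extreme tail theory approach evaluates the multivariate normal probability numerically rather than by simulation. The function names, the default number of draws, and the seed are all assumptions, and Σ must be positive definite (no perfectly correlated SNP pairs) for the Cholesky step to work.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor of a positive definite matrix (list of lists)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def p_extreme_tail_mc(sigma, z_star, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(max_m |Z_m| >= z_star) for Z ~ N(0, sigma)."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    m = len(sigma)
    hits = 0
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Correlated draw: z = low * e has covariance sigma.
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        if max(abs(v) for v in z) >= z_star:
            hits += 1
    return hits / n_draws
```

For two independent statistics and Z* = 1.96 the estimate should be near 1 − 0.95² ≈ 0.0975, and it should be smaller when the two statistics are strongly positively correlated, since correlated tests behave like fewer independent tests.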
Simulation design
Simple examples: known number of independent SNPs
We created SNP correlation matrices to have an independent block structure where the SNPs within each block are perfectly correlated. First, we created independent blocks each with an equal number of SNPs (m=2, 5, and 20) and varied the number of blocks from 1 to 10. We then simulated two or three independent blocks each containing a different number of SNPs between 0 and 100. To highlight the implications of the latter scenario, we simulated many pairs of blocks (N_{blockpairs}=1, 20, and 50) so that each pair contained a total of 10 SNPs distributed unequally between the two blocks.
As we created situations consisting of independent blocks where the number of independent SNPs is known, we can rewrite the M_{eff} equations in terms of the block structures. Rewriting the equations gives us insight into the tendencies of each method. These simplifications are shown in the Supplementary material.
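A sketch of what such simplifications look like is given below. It is derived directly from the standard forms of the four estimators under the stated block assumptions and is not taken from the Supplementary material, whose notation may differ. The symbols k (number of blocks) and m_j (number of SNPs in block j) are introduced here for illustration; the Gao et al line assumes an inclusive threshold c.

```latex
% Assumed setting: k independent blocks, block j holding m_j perfectly
% correlated SNPs, so M = \sum_j m_j and the eigenvalues of the correlation
% matrix are m_1, \dots, m_k together with M - k zeros.
\begin{align*}
M_{\text{Chev-Nyh}} &= 1 + (M-1)\Bigl(1 - \frac{\sum_{j} m_j^2 - M}{M(M-1)}\Bigr)
                     = M + 1 - \frac{1}{M}\sum_{j=1}^{k} m_j^2, \\
M_{\text{Li-Ji}}    &= \sum_{j=1}^{k} \bigl[\, I(m_j \ge 1) + (m_j - \lfloor m_j \rfloor) \,\bigr] = k, \\
M_{\text{Gal}}      &= \frac{\bigl(\sum_{j=1}^{k} \sqrt{m_j}\bigr)^2}{M} \le k
                       \quad \text{(equality if and only if all } m_j \text{ are equal)}, \\
M_{\text{Gao}}      &= k \quad \text{whenever } \min_j m_j / M > 1 - c .
\end{align*}
```

For k equal blocks of size m, the first line reduces to km − m + 1, which exceeds k by (k − 1)(m − 1). The last line implies that, with equal blocks and c = 0.995, Gao et al's estimate equals k only while k < 200.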
Type I error: association model and replicates
Rather than using the total number of gene region SNPs in the Bonferroni correction to control for multiple testing, the M_{eff} can be used. Although using the M_{eff} in the Bonferroni correction is less conservative than using the total number of SNPs, the resulting type I error rate may not equal the chosen significance level. Therefore, we performed a simulation study to investigate the ability of the M_{eff} and extreme tail theory approaches to retain the specified type I error rate. For each gene region SNP, we tested the null hypothesis that the SNP, under an additive genetic model, is not associated with case status using a likelihood ratio test (LRT) statistic from a logistic regression. In addition, for each simulation scenario, we calculated the type I error rate of the traditional Bonferroni correction using the total number of gene region SNPs.
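The substitution of M_{eff} for the total number of SNPs in the Bonferroni correction can be sketched as follows. The function names are hypothetical, and `n_tests` may be either the total number of SNPs or a (possibly non-integer) M_{eff} estimate.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise level alpha."""
    return alpha / n_tests


def adjusted_min_p(p_values, n_tests):
    """Bonferroni-adjusted P-value for the most significant SNP, capped at 1."""
    return min(1.0, min(p_values) * n_tests)
```

For example, a region with 50 SNPs and an M_{eff} of 10 has a per-test threshold of 0.05/10 = 0.005 instead of 0.05/50 = 0.001, so the adjustment based on M_{eff} is less conservative.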
To simulate case status for each replicate and simulation type, we randomly paired 4000 haplotypes from a population (discussed below) to create 2000 subjects, randomly labeling one half as cases and the other half as controls. We used 10 000 replicates to evaluate the type I error rate.
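One null replicate of this design can be sketched as below. The function name `simulate_null_replicate` and the data layout (each haplotype a list of 0/1 alleles) are assumptions for illustration; how the haplotypes themselves are drawn from the population is outside this sketch.

```python
import random


def simulate_null_replicate(haplotypes, rng):
    """Pair haplotypes at random into subjects and assign null case status.

    haplotypes: list of haplotypes, each a list of 0/1 alleles per SNP;
    an even number of haplotypes is assumed.
    Returns (genotypes, status), where genotypes are additively coded
    (0, 1, 2) and status has half the subjects labeled 1 (case).
    """
    order = list(range(len(haplotypes)))
    rng.shuffle(order)
    genotypes = []
    for a, b in zip(order[0::2], order[1::2]):
        genotypes.append([x + y for x, y in zip(haplotypes[a], haplotypes[b])])
    n_subjects = len(genotypes)
    status = [1] * (n_subjects // 2) + [0] * (n_subjects - n_subjects // 2)
    rng.shuffle(status)  # labels are independent of genotype, so the null holds
    return genotypes, status
```

Because case labels are assigned independently of the genotypes, any association detected in such a replicate is a false positive, and the proportion of replicates with a significant adjusted result estimates the type I error rate.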
Gene region simulations
To gain further understanding of each method’s performance over a wide variety of gene regions including many rare variants, we applied all methods to simulated data from the 10 gene regions of the HapMap ENCODE resequencing and genotyping project. For each region, we simulated 1000 cases and 1000 controls for 10 000 replicates using Hapgen^{10, 11} from an initial sample of the 112 CEU haplotypes from the 1000 Genomes Pilot data for which phased haplotype data were available at the time of this publication. Hapgen introduces variation in the gene region between haplotypes while still retaining the general LD and minor allele frequency (MAF) characteristics of the gene region. Basic information about the 10 gene regions is provided in Table 1.
As Conneely and Boehnke’s approach becomes computationally intensive as the number of markers increases, we split each of the ENCODE gene regions into two subregions. We visually chose the split locations to minimize the correlation between the two subregions. We then calculated the extreme tail theory adjusted P-value for each subregion and applied a Bonferroni correction, taking the minimum of the two P-values and multiplying it by two to obtain the extreme tail theory P-value for the entire region. We calculated the effective number of independent SNPs for each entire gene region, as well as for each subregion, so as to better compare the methods with extreme tail theory. We used a logistic regression model to detect the marginal association of each SNP in the region with case status. We chose the SNP with the lowest P-value and adjusted for multiple comparisons within each gene region using the M_{eff} methods or extreme tail theory. We compared the type I error rates of the M_{eff} methods and the extreme tail theory method over all 10 gene regions, and repeated the analysis after removing all SNPs with MAF<5%.
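The combination of subregion P-values described above amounts to a Bonferroni correction over the subregions. A minimal sketch, with the hypothetical function name `combine_subregion_p` and a cap at 1 added as an assumption:

```python
def combine_subregion_p(subregion_p_values):
    """Region-level P-value from adjusted subregion P-values.

    Takes the smallest subregion P-value and multiplies it by the number
    of subregions (two in the design described above), capped at 1.
    """
    k = len(subregion_p_values)
    return min(1.0, k * min(subregion_p_values))
```

For two subregions with adjusted P-values of 0.01 and 0.20, the region-level P-value is 2 × 0.01 = 0.02.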
Results
Simple examples: known number of independent SNPs
Cheverud’s method overestimated the M_{eff}, often to a large extent, both when the number of independent blocks was varied while the number of SNPs within each block stayed the same and when there were unequal groupings of SNPs within the independent blocks (Figure 1). This inflation was predicted by the simplifications of the equations (Supplementary material), as well as by Nyholt, who has stated that the method is overly conservative when there is strong LD. Nyholt thus recommends removing all redundant SNPs (ie, SNPs with r^{2}=1) before calculating the M_{eff}. Nonetheless, it is useful to see the extent to which the method overestimates the M_{eff} in these examples, especially as the other methods produced M_{eff} estimates that were much closer to the number of independent SNPs.
As can be observed in Figure 1, Galwey’s method accurately estimated the M_{eff} as the number of equally sized blocks varied, but underestimated the M_{eff} when the blocks were of unequal size. Li and Ji’s method accurately estimated the M_{eff} in both scenarios. Finally, Gao et al’s method accurately estimated the number of independent SNPs in most scenarios, but slightly underestimated it when the total number of SNPs was large (data not shown). These results were further supported by the mathematical simplifications of the M_{eff} formulas (Supplementary material).
Simulations
As seen in Figure 2 and in Supplementary Tables S1 and S2, the results were consistent across all 10 gene regions. Galwey’s method and Li and Ji’s method both had inflated type I error rates, the Bonferroni adjustment and Cheverud’s method had deflated type I error rates, and Gao et al’s method and extreme tail theory produced type I error rates closest to the true level of 0.05. As suggested previously by Gao et al in another paper,^{13} we saw type I error rates closer to 0.05 for Gao et al’s method, as well as for the other M_{eff} methods, when each gene region was split into two to calculate the M_{eff}. Finally, most methods produced type I error rates closer to 0.05 when only common SNPs (MAF>0.05) were included in the analysis. Overall, Gao et al’s method and extreme tail theory produced type I error rates close to or slightly above the true level for the simulation scenarios containing all variants. The type I error rates of both methods improved when only common SNPs were included, with Gao et al’s method falling slightly below the true level of 0.05 for some gene regions.
Run time
The time needed to calculate the M_{eff} or the extreme tail theory adjustment increased as the number of SNPs within each region increased. Extreme tail theory was slightly slower than the M_{eff} methods (Supplementary Table S3).
Discussion
Although we compared these methods using a logistic regression with a dichotomous trait, we expect similar performance for continuous traits as long as the model assumptions are met.
Here, we use c=0.995 as Gao et al suggest. Changing c will most often change the M_{eff} calculated for a particular gene region. Smaller values of c will require fewer eigenvalues to reach the threshold and thus M_{eff} will be smaller while larger values of c will require more eigenvalues to reach the cutoff resulting in a larger M_{eff}.
Most studies now implement imputation based on the HapMap samples or the 1000 Genomes Project. We expect the methods to perform similarly in studies that use imputed data. Further, Gao et al showed that their method performed well in imputed data.^{13}
As the number of markers for which we adjust increases to the number of GWAS markers on a chromosome or in the genome, we expect the performance of the methods to decline. This was seen in the 10 gene region simulations using the 1000 Genomes data, in which each gene region had between 1000 and 2000 SNPs (Table 1). Although the M_{eff} methods do not inherently have a computational limit on the number of SNPs, the methods appear to perform better when large regions are broken into smaller units (Figure 2). Further, as the region size increases, the genotypic correlation matrix used to estimate the correlation of the statistics in extreme tail theory and the eigenvalues for the M_{eff} methods will start to pick up some macro-level correlation, such as that due to population stratification, in addition to the gene-level correlation. Thus, the methods will be less able to accurately adjust for the correlation in the gene regions. One solution is to break up the chromosome or the genome into manageable units similar to those used here, such as gene regions or 1 Mb sections. Others have explored using M_{eff} methods on a genome-wide scale and found using smaller units to be both computationally practical and effective at retaining the appropriate type I error rate.^{8, 13} We expect that adjusting for multiple comparisons using such sections would perform very similarly to the results presented in Figure 2.
Recently, researchers have begun conducting association studies using rare variants from sequence data. Our analysis of 10 gene regions simulated using the 1000 Genomes data is a good example of what may occur when rare variants are included in the analysis. Although we do see slightly improved performance when only common variants are included, it appears that rare variants can be included without much change in the overall results of the adjustment for multiple comparisons. However, it is worth noting that the sample size needed to detect an association between a SNP and case status at a genome-wide level is prohibitively high (in the 100 000s) for rare variants with a moderate effect.^{14} Thus, many researchers are using methods to analyze rare variants that collapse or consider multiple variants together, so that each gene or region is treated as a single variable.^{12, 15, 16} These genes or regions could thus be used as the unit of measure for the M_{eff} methods or the extreme tail theory method. More research is needed to compare the performance in this particular scenario.
We found that extreme tail theory and Gao et al’s method produce a type I error rate close to or at the chosen significance level. Thus, we recommend using either extreme tail theory or Gao et al’s method to control for multiple testing in gene regions when the gold standard of permutation is not feasible.
References
1. Cheverud JM: A simple correction for multiple comparisons in interval mapping genome scans. Heredity 2001; 87: 52–58.
2. Galwey NW: A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet Epidemiol 2009; 33: 559–568.
3. Gao X, Starmer J, Martin ER: A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol 2008; 32: 361–369.
4. Li J, Ji L: Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 2005; 95: 221–227.
5. Nyholt DR: A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet 2004; 74: 765–769.
6. Conneely KN, Boehnke M: So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. Am J Hum Genet 2007; 81: 1158–1168.
7. Salyakina D, Seaman SR, Browning BL, Dudbridge F, Müller-Myhsok B: Evaluation of Nyholt’s procedure for multiple testing correction. Hum Hered 2005; 60: 19–25 (discussion 61–62).
8. Moskvina V, Schmidt KM: On multiple-testing correction in genome-wide association studies. Genet Epidemiol 2008; 32: 567–573.
9. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2009.
10. Spencer CC, Su Z, Donnelly P, Marchini J: Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet 2009; 5: e1000477.
11. Li N, Stephens M: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 2003; 165: 2213–2233.
12. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 2011; 89: 82–93.
13. Gao X, Becker LC, Becker DM, Starmer JD, Province MA: Avoiding the high Bonferroni penalty in genome-wide association studies. Genet Epidemiol 2010; 34: 100–105.
14. Bansal V, Libiger O, Torkamani A, Schork NJ: Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 2010; 11: 773–785.
15. Li B, Leal SM: Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 2008; 83: 311–321.
16. Price AL, Kryukov GV, de Bakker PI et al: Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet 2010; 86: 832–838.
Acknowledgements
A portion of this research was conducted using the Linux Clusters for Genetic Analysis (LinGA) computing resource funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center and by contributions from individual investigators. We would like to thank Mayetri Gupta for her thoughtful comments on this research.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies this paper on the European Journal of Human Genetics website.
Keywords
 correlated
 permutation
 Bonferroni
 type I error
 effective
 independent