The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants

Fadista, João; Manning, Alisa K; Florez, Jose C; Groop, Leif

doi:10.1038/ejhg.2015.269


Short Report
Published: 06 January 2016


European Journal of Human Genetics volume 24, pages 1202–1205 (2016)


Abstract

Genome-wide association studies (GWAS) have long relied on proposed statistical significance thresholds to differentiate true positives from false positives. Although the genome-wide significance P-value threshold of 5 × 10⁻⁸ has become a standard for common-variant GWAS, it has not been updated to cope with the lower allele frequency spectrum used in many recent array-based GWAS and sequencing studies. Using a whole-genome- and whole-exome-sequencing data set of 2875 individuals of European ancestry from the Genetics of Type 2 Diabetes (GoT2D) project and a whole-exome-sequencing data set of 13 000 individuals from five ancestries from the GoT2D and T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) projects, we describe guidelines for genome- and exome-wide association P-value thresholds needed to correct for multiple testing, explaining the impact of linkage disequilibrium thresholds for distinguishing independent variants, minor allele frequency and ancestry characteristics. We emphasize the advantage of studying recent genetic isolate populations when performing rare and low-frequency genetic association analyses, as the multiple testing burden is diminished due to higher genetic homogeneity.


Introduction

In genetic association analyses of complex traits, determining the correct P-value threshold for statistical significance is critical to control the number of false-positive associations. Although the genome-wide significance (WGS) P-value threshold of 5 × 10⁻⁸ has become a standard for genome-wide association studies (GWAS),¹,² it has not been updated to account for the lower allele frequency spectrum used in many recent array-based GWAS³ and sequencing studies. Different statistical procedures accounting for multiple testing have been used in the genome-wide setting, including the naive Bonferroni correction,⁴ which can be overly conservative because it assumes that every genetic variant tested is independent of the rest; false discovery rate procedures;⁵ permutation-based approaches;² and Bayesian approaches.⁶
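
To make the contrast between family-wise error control (Bonferroni) and false discovery rate control concrete, here is a minimal Python sketch. The FDR procedure shown is the Benjamini–Hochberg step-up rule, one common FDR method. It is not the q-value approach of reference 5. The function names and the six example P-values are hypothetical.

```python
from typing import List


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value cut-off that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Flag which P-values are rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest P
    # Largest rank k (1-based) whose ordered P-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max  # reject every hypothesis ranked at or below k_max
    return rejected


if __name__ == "__main__":
    pvals = [1e-9, 2e-4, 0.004, 0.03, 0.2, 0.6]
    cutoff = bonferroni_threshold(len(pvals))
    print("Bonferroni:", [p <= cutoff for p in pvals])
    print("BH (FDR 5%):", benjamini_hochberg(pvals))
```

With these six P-values, Bonferroni (cut-off 0.05/6) rejects the first three. Benjamini–Hochberg also rejects P=0.03, which shows that FDR control is less stringent than family-wise control.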

Here, we set out to perform an updated evaluation of the significance threshold for genome-wide genetic association studies designed to discover loci associated with complex traits, using a multiple testing approach to control the number of false-positive associations. The guidelines developed in this paper can be useful for researchers using human sequence data (either for direct association testing or as an imputation panel) to evaluate variants in the lower frequency spectrum of their samples. In 2005, the International HapMap Consortium¹ used permutation testing of genotypes in 10 densely genotyped Encyclopedia of DNA Elements genomic regions to estimate the number of common independent variants (minor allele frequency (MAF)≥5%) at 150 per 500 kilobase pairs (kb) in the European population. Extrapolating to the whole genome (~3.3 Gb) gives roughly 1 million independent common variants, and hence a Bonferroni-corrected significance threshold of 0.05/10⁶ = 5 × 10⁻⁸. Since then, this WGS threshold has become a standard for reporting genome-wide significant associations at MAF≥5% in populations of European ancestry.²,³ Moreover, the HapMap variation catalog¹,⁷ catalogued most of the variation that one could test for association and set a WGS P-value threshold that was invariant to a study's sample size at MAF≥5%. More recently, whole-exome- and whole-genome-sequencing projects have greatly expanded the number of genetic variants that can be used in association studies. In the 1000 Genomes Project,⁸ ~50% of the observed genetic variants were novel, even in the well-characterized Encyclopedia of DNA Elements regions. Because sequencing studies yield many more low-frequency (0.5%<MAF<5%) and rare (MAF<0.5%) variants, they call for a more stringent statistical threshold for association testing in studies that use sequence data.
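
The HapMap extrapolation takes two steps. First, scale the per-window count of independent variants to the whole genome. Then divide the family-wise α of 0.05 by the result. The minimal Python sketch below uses only the figures quoted above; the function name is hypothetical.

```python
from typing import Tuple


def implied_threshold(indep_per_window: float, window_bp: float, genome_bp: float,
                      alpha: float = 0.05) -> Tuple[float, float]:
    """Scale a per-window count of independent variants to the genome, then apply Bonferroni."""
    n_independent = indep_per_window * genome_bp / window_bp
    return n_independent, alpha / n_independent


if __name__ == "__main__":
    # 150 independent common variants per 500 kb, extrapolated to ~3.3 Gb.
    n, p = implied_threshold(150, 500_000, 3.3e9)
    print(f"{n:,.0f} independent common variants -> P < {p:.2e}")
```

The result is 990 000 independent variants and a threshold of about 5.05 × 10⁻⁸. Rounded, these give the familiar 10⁶ tests and 5 × 10⁻⁸.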

Materials and methods

For the genome-wide (WGS) and exome-wide (WES) significance threshold calculations, we used the Genetics of Type 2 Diabetes (GoT2D) genome-wide integrated SNP panel data freeze v.20120804 and GoT2D.exomes.2760.qc_plus.86_swap_fixed.vcf, respectively. The integrated SNP panel contains quality-controlled genotypes from low-coverage (4×) whole-genome sequencing, deep (70×) exome sequencing and 2.5M SNP genotyping of 2875 samples from four European cohorts: FUSION (Finland), DGI (Sweden and Finland), WTCCC (UK) and KORA (Germany). For the exome ancestry analysis, we sampled exome sequences of Europeans (Finland and Ashkenazi cohorts), African-Americans (JHS cohort), South Asians (LOLIPOP cohort), East Asians (KARE cohort) and Hispanics (FHS cohort) from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) consortium. Exome-sequencing target capture was performed with the Agilent SureSelect Human All Exon platform. The function snpgdsLDpruning from the SNPRelate R package⁹ was used to calculate the number of biallelic tag SNPs using the correlation coefficient (r²) linkage disequilibrium (LD) metric at different LD thresholds (in the autosomes plus chromosome X). Tag SNPs are the SNPs that the LD pruning algorithm retains in the pruned subset; SNPs whose pairwise LD falls below a defined threshold are considered independent. The estimated number of independent variants is stable when snpgdsLDpruning is re-run. We repeated the snpgdsLDpruning command 10 times on the whole-genome-sequencing variants of chromosome 21. At each LD threshold, the SD of the number of independent variants was always <0.1% of the mean. The vcftools package¹⁰ was used to subset the data at different MAFs. The P-value needed to reach genome- or exome-wide significance at each MAF and LD threshold was calculated as 0.05/number of tag SNPs.
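
The prune-then-count logic can be sketched in pure Python. This is a simplified greedy analogue written for illustration only, not SNPRelate's snpgdsLDpruning implementation. It makes three assumptions:

- LD is measured as the squared Pearson correlation of genotype dosages (0/1/2).
- SNPs are scanned in positional order.
- A hypothetical fixed window of preceding SNPs is checked.

All function names are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treated as unlinked here
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(genotypes: List[Sequence[int]], r2_max: float, window: int = 50) -> List[int]:
    """Return indices of tag SNPs.

    genotypes[j] is the dosage vector of SNP j across samples, in positional order.
    SNP j is kept if its r^2 with every already-kept SNP among the preceding
    `window` SNPs is below r2_max.
    """
    kept: List[int] = []
    for j, g in enumerate(genotypes):
        if all(r_squared(g, genotypes[k]) < r2_max for k in kept if j - k <= window):
            kept.append(j)
    return kept


def significance_threshold(n_tag: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold over the tag SNPs: alpha / number of tag SNPs."""
    return alpha / n_tag


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 0, 1, 2],  # SNP 0
        [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0 (r^2 = 1)
        [2, 1, 0, 2, 1, 0],  # SNP 2: SNP 0 with alleles swapped (r^2 = 1)
        [0, 0, 1, 1, 2, 2],  # SNP 3: r^2 = 0.25 with SNP 0
    ]
    for cut in (0.8, 0.2):
        tags = greedy_ld_prune(snps, r2_max=cut)
        print(cut, tags, significance_threshold(len(tags)))
```

At r²<0.8, SNPs 0 and 3 are retained, giving a threshold of 0.05/2. At r²<0.2, only SNP 0 remains, giving 0.05/1. Lowering the cut-off yields fewer tag SNPs and a more relaxed threshold.
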
For simplicity, the comparison between WGS ancestry groups (UK, Sweden and Finland) and the assessment of LD reliability by sample size were performed only on chromosome 21, at MAF≥5%, MAF≥1% and MAF≥0.5%. The results were then extrapolated to the whole genome using the genome and chromosome 21 sizes from the Ensembl browser.¹¹ For the WGS ancestry comparison, 512 samples were drawn at random from each population, as this was the sample size of the smallest ancestry group (Sweden). The same approach was applied to the WES ancestry comparison (minimum number of samples=861). We also used the D′ LD metric, implemented in the snpgdsLDpruning algorithm, to calculate the number of biallelic tag SNPs at different LD thresholds. D′ reflects the evolutionary genealogy of a pair of variants. It is influenced by the amount of recombination between the two loci since the appearance of the more recent variant. When used with the binning approach described in the paper, D′ would create bins of SNPs that do not tag the same association signal. This is because the power of LD mapping (observing an association at a non-causal SNP that is linked to the causal SNP) is a function of r², not D′. Furthermore, the D′ estimate can be biased upward if allele frequencies at the two loci are very different, if the sample size is small, or if the variants are of low frequency.
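
The gap between D′ and r² for low-frequency variants can be shown directly from haplotype frequencies. The minimal sketch below uses the standard two-locus definitions: D = p_AB − p_A·p_B, D′ = |D|/D_max and r² = D²/(p_A(1−p_A)p_B(1−p_B)). The function name and example frequencies are hypothetical. Both loci are assumed to be polymorphic.

```python
from typing import Tuple


def ld_from_haplotypes(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two polymorphic biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of haplotypes carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D_max is the largest |D| compatible with the allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Illustrative frequencies: a common allele (30%) and a rare allele (1%) that
    # only ever occurs on haplotypes carrying the common allele.
    d, d_prime, r2 = ld_from_haplotypes(p_ab=0.01, p_a=0.30, p_b=0.01)
    print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

In this example, D′ equals 1 (no evidence of historical recombination), while r² is only about 0.02. The common SNP would therefore be a poor proxy for an association at the rare variant, which is why r² rather than D′ governs tagging.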

Results

For this analysis, we used a whole-genome- and whole-exome-sequencing data set of 2875 individuals of European ancestry from the GoT2D project and a whole-exome-sequencing data set of 13 000 individuals from five ancestries from the GoT2D and T2D-GENES projects (Flannick et al, Teslovich et al, submitted). For each scenario of data type, MAF, LD threshold and ancestry, we estimated the number of independent genetic variants. From it, we calculated a statistical significance threshold that maintains a family-wise type I error rate of 5%: 0.05/number of independent variants (Materials and methods).
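
The correction above is a Bonferroni bound. The notation here is introduced for this note only: M is the number of tests and α* is the per-test threshold. Assuming every tested null hypothesis is true and each P_i is a valid P-value, the union bound gives:

```latex
\mathrm{FWER}
  = \Pr\Big(\bigcup_{i=1}^{M} \{P_i \le \alpha^{*}\}\Big)
  \le \sum_{i=1}^{M} \Pr(P_i \le \alpha^{*})
  \le M\,\alpha^{*}
  = 0.05
  \qquad \text{when} \qquad \alpha^{*} = \frac{0.05}{M}.
```

The bound holds under any dependence between tests, so it becomes conservative when tests are correlated. Setting M to the number of LD-independent tag SNPs, rather than to all tested variants, is the approximation used here to reduce that conservativeness.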

We found that the genome- and exome-wide association significance P-value thresholds depend on the LD cut-off chosen to define independence between variants, on MAF and on ancestry (Figure 1). As expected, this held for both genome- and exome-wide significance thresholds in the GoT2D data set. Lowering the r² cut-off causes more variants to be treated as dependent and pruned, which reduces the number of tag SNPs and relaxes the required P-value threshold (Materials and methods). In addition, as the minimum MAF of the variants included in a study decreases, more stringent significance thresholds are needed. This is due to the increasing number of variants and the lower LD between less frequent variants (Figure 1; Supplementary Tables S1 and S2).

Interestingly, the widely used genome-wide P-value threshold of 5 × 10⁻⁸ holds for common variants (MAF≥5%) only if an LD threshold of r²<0.8 is applied (see Supplementary Table S1 for other thresholds). Using this LD threshold for tagging SNPs, the genome-wide P-value thresholds would be 3 × 10⁻⁸, 2 × 10⁻⁸ and 1 × 10⁻⁸ for analyses of variants with MAF≥1%, MAF≥0.5% and MAF≥0.1%, respectively (Figure 1a; Supplementary Table S1). For exome-wide significance (also at LD r²<0.8), the P-value thresholds would be 1 × 10⁻⁶ at MAF≥5%, 7 × 10⁻⁷ at MAF≥1%, 5 × 10⁻⁷ at MAF≥0.5% and 3 × 10⁻⁷ at MAF≥0.1% (Figure 1b; Supplementary Table S2). These values are roughly consistent with the commonly used WES threshold, which is based on a Bonferroni correction for 100 000 variants with MAF≥0.5% (P-value=0.05/100 000=5 × 10⁻⁷).
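
As a how-to, the thresholds reported above (European ancestry, LD r²<0.8) can be encoded as a small lookup. The table values come from the text. The matching rule is an assumption of this sketch, not a recommendation from the study. A study whose minimum MAF falls between tabulated cut-offs (e.g. 2%) is mapped to the next lower row (MAF≥1%), which is conservative. The names are hypothetical.

```python
# Thresholds reported in this study for European-ancestry samples at LD r^2 < 0.8,
# keyed by the minimum MAF included in the analysis.
THRESHOLDS = {
    "genome": {0.05: 5e-8, 0.01: 3e-8, 0.005: 2e-8, 0.001: 1e-8},
    "exome": {0.05: 1e-6, 0.01: 7e-7, 0.005: 5e-7, 0.001: 3e-7},
}


def recommended_threshold(scope: str, min_maf: float) -> float:
    """Return the tabulated threshold whose MAF cut-off is the largest one not exceeding min_maf.

    The chosen row therefore includes every variant a study with MAF >= min_maf would test.
    """
    table = THRESHOLDS[scope]
    eligible = [cut for cut in table if cut <= min_maf]
    if not eligible:
        raise ValueError(f"min_maf={min_maf} is below the lowest tabulated cut-off (0.001)")
    return table[max(eligible)]


if __name__ == "__main__":
    for scope in ("genome", "exome"):
        for maf in (0.05, 0.02, 0.005, 0.001):
            p = recommended_threshold(scope, maf)
            # Equivalent number of independent tests under a Bonferroni correction at 0.05.
            print(scope, maf, p, round(0.05 / p))
```

The last column recovers the implied number of independent tests: for example, 10⁶ genome-wide at MAF≥5% and 10⁵ exome-wide at MAF≥0.5%.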

Importantly, if no LD threshold is applied (that is, all variants are included even when they are in perfect LD, LD r²=1), this naive Bonferroni correction over-corrects by counting redundant tests. For instance, at MAF≥0.1%, one ends up testing 833 000 variants that are in perfect LD (Figure 1a; Supplementary Table S1). Of note, 92% of the biallelic SNPs in our GoT2D exome-sequencing data set of European ancestry are also present in the general population (ExAC database, http://exac.broadinstitute.org/).
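
Variants in perfect LD contribute no extra independent tests. The minimal sketch below collapses variants whose genotype-dosage vectors are identical, or identical after swapping which allele is counted, and compares the naive and collapsed Bonferroni denominators. It only detects r²=1 arising from identical or allele-swapped dosage patterns, not every exact linear relation. The function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def canonical_pattern(dosages: Sequence[int]) -> Tuple[int, ...]:
    """Represent a variant so that identical and allele-swapped dosage vectors coincide."""
    forward = tuple(dosages)
    swapped = tuple(2 - d for d in dosages)  # count the other allele instead
    return min(forward, swapped)


def count_distinct_tests(genotypes: List[Sequence[int]]) -> int:
    """Number of variants left after collapsing identical (r^2 = 1) dosage patterns."""
    return len({canonical_pattern(g) for g in genotypes})


if __name__ == "__main__":
    variants = [
        [0, 0, 1, 0, 0],  # rare variant carried by the third sample
        [0, 0, 1, 0, 0],  # second rare variant on the same haplotype
        [2, 2, 1, 2, 2],  # same carrier, other allele counted
        [0, 1, 0, 0, 0],  # rare variant carried by the second sample
    ]
    naive = len(variants)
    collapsed = count_distinct_tests(variants)
    print(naive, collapsed, 0.05 / naive, 0.05 / collapsed)
```

Here four variants reduce to two distinct tests, so the Bonferroni threshold could be doubled without changing the family-wise error rate.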

We also examined the significance P-value threshold across sample sizes to determine the effect of sample size at different allele frequencies. We observed clear evidence that, for studies that include variants with MAF≥0.5%, the statistical significance P-value threshold calculated in a European sample of N=2875 is reliable for smaller studies of the same ancestry when N>500 (Supplementary Figure S1; Supplementary Table S4). With D′ as the LD measure, the number of tag SNPs was much lower, and less dependent on allele frequency, than with r² (Supplementary Figure S2).

The GoT2D whole-genome data set included >500 individuals from each of three ancestries: UK (n=660), Sweden (n=512) and Finland (n=1442). The T2D-GENES exome-sequencing data set included at least 861 individuals from each of five diverse ancestries (South Asian, East Asian, European, Hispanic and African-American). We therefore asked whether the statistical significance threshold controlling the false-positive rate for low-MAF variants changes with ancestry. The Finnish population history was shaped by relatively few founders and recent rapid expansion.¹²,¹³ We therefore hypothesized that rare/low-frequency variant analysis would be more advantageous in this population than in the UK and Swedish populations. Indeed, the Finnish population has fewer independent variants among low-frequency variants. This requires a less stringent correction for statistical association testing and thereby increases power (Figure 2a; Supplementary Table S3). For instance, at an LD threshold of r²<0.8 and MAF≥0.5%, the WGS significance P-value threshold would be 2.6 × 10⁻⁸ in Finns and 2.3 × 10⁻⁸ in the Swedish and British ancestries. The latter two populations would thus require testing >200 000 extra variants (Supplementary Table S3). Likewise, among the ancestries represented in the T2D-GENES whole-exome-sequence data set, we observe an increased testing burden for ancestry groups with greater genetic diversity, in particular African-Americans (Figure 2b; Supplementary Figure S3). This emphasizes the advantage of performing rare/low-frequency variant association studies in isolated populations with a relatively lower effective population size, as the length of shared haplotypes is greater at lower allele frequency, in line with previous reports.⁷
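
Because each threshold is 0.05 divided by a number of independent variants, the quoted population-specific thresholds can be converted back into test counts. The published thresholds are rounded to two significant figures, so these back-calculated counts are approximate; Supplementary Table S3 is the authoritative source. The function name is hypothetical.

```python
def implied_independent_variants(threshold: float, alpha: float = 0.05) -> float:
    """Number of independent tests implied by a Bonferroni threshold: alpha / threshold."""
    return alpha / threshold


if __name__ == "__main__":
    finland = implied_independent_variants(2.6e-8)
    sweden_uk = implied_independent_variants(2.3e-8)
    print(f"Finland: {finland:,.0f}")
    print(f"Sweden/UK: {sweden_uk:,.0f}")
    print(f"Extra tests outside Finland: {sweden_uk - finland:,.0f}")
```

This gives roughly 1.92 million versus 2.17 million independent variants, a difference of about 250 000, consistent with the >200 000 extra variants stated above.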

Discussion

Taken together, this study provides guidelines for genome- and exome-wide association P-value thresholds needed to correct for multiple testing, explaining the impact of LD thresholds for distinguishing independent variants, MAF and ancestry characteristics.

We confirm that the 5 × 10⁻⁸ P-value threshold for WGS is valid for common (MAF≥5%) genetic variation in the European population. However, for lower-frequency variants, the genome-wide P-value threshold needs to be more stringent for studies of European ancestry (3 × 10⁻⁸ for MAF≥1%, 2 × 10⁻⁸ for MAF≥0.5% and 1 × 10⁻⁸ for MAF≥0.1% at LD r²<0.8). For exome-sequencing studies, the scientific community should also agree upon and adopt exome-wide significance thresholds. For studies of European ancestry, P-value thresholds of 1 × 10⁻⁶, 7 × 10⁻⁷, 5 × 10⁻⁷ and 3 × 10⁻⁷ for MAF≥5%, MAF≥1%, MAF≥0.5% and MAF≥0.1%, respectively, are reasonable. Studies of other ancestry groups should take the degree of genetic variation into account when choosing the appropriate statistical significance threshold.

We also demonstrate the advantage of studying young, isolated populations with a relatively lower effective population size for the analysis of rare variants. Their lower genetic diversity translates into fewer independent rare variants, and therefore into a lower multiple testing burden and increased power in rare variant analysis.

We acknowledge that the frequentist approach of using P-value thresholds as a measure of statistical evidence has important limitations. It does not take into account the power of the tests, and the same threshold is suggested for all sample sizes and allele frequencies.¹⁴,¹⁵ A Bayesian framework can incorporate these parameters as prior odds. However, it requires prior distributions to be defined for model parameters, and integrating likelihoods over the defined parameter space can involve intensive computation. Moreover, if different studies adopt different priors, comparing findings between studies remains problematic. Nevertheless, we believe that the Bayesian approach is most valuable in regional fine mapping to identify the true causal variant(s).¹⁶
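
One way to see why ignoring power matters is the false-positive report probability of Wacholder et al. (reference 14). FPRP is the probability that a significant result is a false positive, given the significance level α, the power and the prior probability π that the association is real: FPRP = α(1−π)/[α(1−π) + power·π]. The minimal sketch below implements that formula. The prior of 10⁻⁵ and the two power values are illustrative assumptions, not estimates from this study.

```python
def false_positive_report_probability(alpha: float, power: float, prior: float) -> float:
    """Wacholder et al. (2004) FPRP: P(null is true | test is significant at level alpha).

    prior: prior probability that the tested association is real.
    power: probability of reaching significance at level alpha if it is real.
    """
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    alpha, prior = 5e-8, 1e-5  # illustrative values, not estimates from this study
    for power in (0.8, 0.05):
        print(power, round(false_positive_report_probability(alpha, power, prior), 4))
```

At the same threshold, a well-powered test gives an FPRP of about 0.006. A poorly powered test, as is common for rare variants, gives about 0.09. A fixed P-value cut-off alone does not capture this difference.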

References

1. International HapMap Consortium: A haplotype map of the human genome. Nature 2005; 437: 1299–1320.
2. Pe'er I, Yelensky R, Altshuler D, Daly MJ: Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 2008; 32: 381–385.
3. Welter D, MacArthur J, Morales J et al: The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 2014; 42: D1001–D1006.
4. Bonferroni CE: Il calcolo delle assicurazioni su gruppi di teste. In Studi in Onore del Professore Salvatore Ortu Carboni. Bardi: Rome, Italy, 1935, pp 13–60.
5. Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003; 100: 9440–9445.
6. Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447: 661–678.
7. The International HapMap Consortium, Frazer KA, Ballinger DG et al: A second generation human haplotype map of over 3.1 million SNPs. Nature 2007; 449: 851–861.
8. 1000 Genomes Project Consortium, Abecasis GR, Auton A et al: An integrated map of genetic variation from 1092 human genomes. Nature 2012; 491: 56–65.
9. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS: A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 2012; 28: 3326–3328.
10. Danecek P, Auton A, Abecasis G et al: The variant call format and VCFtools. Bioinformatics 2011; 27: 2156–2158.
11. Flicek P, Amode MR, Barrell D et al: Ensembl 2014. Nucleic Acids Res 2014; 42: D749–D755.
12. Peltonen L, Jalanko A, Varilo T: Molecular genetics of the Finnish disease heritage. Hum Mol Genet 1999; 8: 1913–1923.
13. Service S, DeYoung J, Karayiorgou M et al: Magnitude and distribution of linkage disequilibrium in population isolates and implications for genome-wide association studies. Nat Genet 2006; 38: 556–560.
14. Wacholder S, Chanock S, Garcia-Closas M et al: Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 2004; 96: 434–442.
15. Sham PC, Purcell SM: Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 2014; 15: 335–346.
16. Stephens M, Balding DJ: Bayesian statistical methods for genetic association studies. Nat Rev Genet 2009; 10: 681–690.


Acknowledgements

We gratefully acknowledge the members of T2D-GENES and GoT2D consortia for sharing prepublication data.

Author information

Authors and Affiliations

Department of Epidemiology Research, Statens Serum Institut, Copenhagen S, Denmark
João Fadista
Department of Clinical Sciences, Lund University Diabetes Centre, Lund University, Malmo, Sweden
João Fadista & Leif Groop
Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
Alisa K Manning & Jose C Florez
Department of Medicine, Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
Alisa K Manning & Jose C Florez
Department of Medicine, Diabetes Research Center, Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
Jose C Florez
Department of Medicine, Harvard Medical School, Boston, MA, USA
Jose C Florez
Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
Leif Groop


Corresponding author

Correspondence to João Fadista.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on the European Journal of Human Genetics website.

Supplementary information

Supplementary Figure 1 (JPG 15525 kb)

Supplementary Figure 2 (JPG 20704 kb)

Supplementary Figure 3 (JPG 24672 kb)

Supplementary Table 1 (XLSX 28 kb)

Supplementary Information (DOCX 12 kb)


About this article

Cite this article

Fadista, J., Manning, A., Florez, J. et al. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur J Hum Genet 24, 1202–1205 (2016). https://doi.org/10.1038/ejhg.2015.269


Received: 27 July 2015
Revised: 04 November 2015
Accepted: 26 November 2015
Published: 06 January 2016
Issue Date: August 2016
DOI: https://doi.org/10.1038/ejhg.2015.269
