Meta-analysis of SNPs involved in variance heterogeneity using Levene’s test for equal variances

Deng, Wei Q; Asma, Senay; Paré, Guillaume

doi:10.1038/ejhg.2013.166


Short Report
Published: 07 August 2013


Wei Q Deng¹, Senay Asma² & Guillaume Paré¹,²,³,⁴

European Journal of Human Genetics volume 22, pages 427–430 (2014)


Abstract

Meta-analysis is a commonly used approach to increase the sample size for genome-wide association searches when individual studies are otherwise underpowered. Here, we present a meta-analysis procedure to estimate the heterogeneity of the quantitative trait variance attributable to genetic variants using Levene’s test without needing to exchange individual-level data. The meta-analysis of Levene’s test offers the opportunity to combine the considerable sample size of a genome-wide meta-analysis to identify the genetic basis of phenotypic variability and to prioritize single-nucleotide polymorphisms (SNPs) for gene–gene and gene–environment interactions. The use of Levene’s test has several advantages, including robustness to departure from the normality assumption, freedom from the influence of the main effects of SNPs, and no assumption of an additive genetic model. We conducted a meta-analysis of the log-transformed body mass index of 5892 individuals and identified a variant with a highly suggestive Levene’s test P-value of 4.28E-06 near the NEGR1 locus known to be associated with extreme obesity.


Introduction

Effects of single-nucleotide polymorphisms (SNPs) on the phenotypic variance of quantitative traits (that is, differences in the variance of a trait according to genotype) are likely to be modest, so sample sizes of the order of tens of thousands are required to detect them at genome-wide significance levels (P<5E-08).¹ Among the statistical tests designed to detect variance heterogeneity, Levene’s test² has been shown to be robust to violations of the normality assumption and adequately powered under other irregularities.³ Furthermore, Levene’s test is by design unaffected by the main effects of SNPs, and because it compares variances across all genotype groups without assuming a trend, it captures both linear and non-linear patterns. For instance, an analysis of 21 799 individuals from the Women’s Genome Health Study was the first to identify SNPs with genome-wide significant Levene’s test P-values for C-reactive protein (rs12753193, P=8.0E-11) and soluble ICAM-1 (rs738409, P=1.9E-10; rs1799969, P=2.1E-09).⁴ Although variance heterogeneity can be analyzed within individually large studies, sample sizes sufficient to detect variants with small effects can in practice only be reached through meta-analysis. Indeed, a recent meta-analysis using the squared residual as the response variable associated an FTO variant (rs7202116) with the phenotypic variability of body mass index (BMI) (P=2.4E-10; N=131 233).⁵

In addition to finding genetic variants that influence phenotypic variance, a meta-analysis of variance heterogeneity can be used to prioritize potentially interacting variants for gene–environment and gene–gene interaction testing. The high dimensionality of genome-wide data inevitably poses computational and statistical challenges, such as the multiple-testing burden. Consequently, individual genome-wide association studies have largely been underpowered to detect interactions.⁶ Despite these challenges, there is a pressing need to understand how genetic interactions contribute to the ‘missing heritability’.^{7, 8} Discovering novel genetic interactions through meta-analysis is a promising strategy, because large international consortia provide adequate sample sizes and methods for meta-analyzing interactions are well developed.^{9, 10} We have previously proposed a prioritization scheme for quantitative traits, variance prioritization, based on the observation that the trait variance conditional on genotype will vary when an interaction is present;⁴ this is an active area of methodological research.^{11, 12, 13} Prioritization is achieved by comparing the variances of a quantitative trait across genotypes using Levene’s test. Because only SNPs with Levene’s test P-values below a pre-determined threshold (typically a nominal significance level of ∼0.05) are tested for interaction effects, the multiple-testing burden is greatly reduced and overall statistical power increases accordingly, compared with an exhaustive search for gene–gene or gene–environment interactions.
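To make the multiple-testing argument concrete, the following Python sketch compares per-test Bonferroni thresholds for an exhaustive interaction search and for a prioritized search. It assumes a hypothetical scan of 1,000,000 SNPs, a Bonferroni correction at α=0.05, and that under the null a fraction of SNPs equal to the Levene’s test cut-off passes the filter. The function names are illustrative, and the arithmetic ignores any dependence between the two stages, so it is a back-of-the-envelope sketch rather than the authors’ procedure.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def expected_prioritized(n_snps: int, levene_threshold: float) -> int:
    """Expected number of SNPs passing a Levene's test filter when no SNP affects
    the variance (null P-values are uniform, so a fraction levene_threshold passes)."""
    return round(n_snps * levene_threshold)


if __name__ == "__main__":
    n_snps = 1_000_000       # hypothetical number of SNPs in the scan
    levene_threshold = 0.05  # nominal Levene's test cut-off used for prioritization

    n_selected = expected_prioritized(n_snps, levene_threshold)
    print(f"exhaustive:  {n_snps} tests, threshold {bonferroni_threshold(n_snps):.1e}")
    print(f"prioritized: {n_selected} tests, threshold {bonferroni_threshold(n_selected):.1e}")
```

With these assumed values, about 50 000 SNPs are carried forward, and the per-test threshold for the interaction tests relaxes from 5.0e-08 to 1.0e-06.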

In this paper, we provide a framework for combining summary statistics from multiple genome-wide studies to calculate the meta-analyzed Levene’s test P-values for individual SNPs without needing to exchange individual-level data. We then perform a genome-wide search for SNPs involved in the heterogeneity of variance using log-transformed BMI and height.

Materials and methods

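Consider a quantitative trait Y measured in N individuals, and let Y_i denote the trait values stratified according to the possible genotypes (i=0, 1 or 2) of a biallelic SNP. To obtain an exact equivalent of Levene’s test statistic without exchanging individual-level data, each study s (s=1, 2, …, S) reports the following statistics for each SNP:

$(n_{0s}, n_{1s}, n_{2s})$: genotype counts, summing to $N_s$

$(\bar{Z}_{0s}, \bar{Z}_{1s}, \bar{Z}_{2s})$: within-genotype means of $Z_0, Z_1, Z_2$

$(\hat{\sigma}^2_{0s}, \hat{\sigma}^2_{1s}, \hat{\sigma}^2_{2s})$: within-genotype sample variances (denominator $n_{is}-1$) of $Z_0, Z_1, Z_2$

Here $Z_{ij} = |Y_{ij} - \bar{Y}_i|$ for individual $j$ in genotype group $i$, where $\bar{Y}_i$ is the group mean of $Y_i$ computed within the study. Calculating Levene’s test statistic by simply combining the samples implies the natural weights $w_{is} = n_{is}/n_i$ and $v_i = n_i/N$, where $n_i = \sum_s n_{is}$ and $N = \sum_s N_s$; the pooled genotype means are then $\bar{Z}_i = \sum_s w_{is}\bar{Z}_{is}$ and the overall mean is $\bar{Z} = \sum_i v_i\bar{Z}_i$. The meta-analyzed Levene’s test statistic $L^{+}$, computed from the summary statistics and weights alone, is (detailed derivation in S1):

$$L^{+} = \frac{N-3}{2}\cdot\frac{\sum_{i=0}^{2} n_i\,(\bar{Z}_i - \bar{Z})^2}{\sum_{i=0}^{2}\sum_{s=1}^{S}\left[(n_{is}-1)\,\hat{\sigma}^2_{is} + n_{is}\,(\bar{Z}_{is} - \bar{Z}_i)^2\right]}$$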
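The pooled computation can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: each study centres Y on its own genotype-group means, shares the count, mean and sample variance (denominator n−1) of Z per genotype, and all three genotype groups contain at least two individuals. The function names are hypothetical, and this is not the authors’ PLINK R plug-in. The final lines check the summary-based statistic against a one-way ANOVA F statistic computed on the pooled individual-level Z values, which is Levene’s mean-centred statistic.

```python
from statistics import mean, variance

GENOTYPES = (0, 1, 2)


def z_values(y, genotypes):
    """Within one study: Z = |Y - mean of Y in the same genotype group|."""
    groups = {}
    for g in GENOTYPES:
        ys = [yi for yi, gi in zip(y, genotypes) if gi == g]
        if len(ys) < 2:
            raise ValueError(f"genotype {g} needs at least two individuals")
        y_bar = mean(ys)
        groups[g] = [abs(yi - y_bar) for yi in ys]
    return groups


def study_summary(y, genotypes):
    """What a study would share: (count, mean of Z, sample variance of Z) per genotype."""
    return {g: (len(z), mean(z), variance(z)) for g, z in z_values(y, genotypes).items()}


def meta_levene(summaries):
    """Levene statistic for the pooled samples from per-study summaries only.
    Returns (statistic, total sample size N)."""
    n_i = {g: sum(s[g][0] for s in summaries) for g in GENOTYPES}
    # pooled genotype means: natural weight of study s within genotype g is n_gs / n_g
    zbar_i = {g: sum(s[g][0] * s[g][1] for s in summaries) / n_i[g] for g in GENOTYPES}
    n_total = sum(n_i.values())
    zbar = sum(n_i[g] * zbar_i[g] for g in GENOTYPES) / n_total
    between = sum(n_i[g] * (zbar_i[g] - zbar) ** 2 for g in GENOTYPES)
    # within-genotype sum of squares around the pooled genotype means
    within = sum((s[g][0] - 1) * s[g][2] + s[g][0] * (s[g][1] - zbar_i[g]) ** 2
                 for s in summaries for g in GENOTYPES)
    return (n_total - 3) / 2 * between / within, n_total


def pooled_levene(z_by_group):
    """Individual-level reference: one-way ANOVA F statistic on pooled Z values."""
    all_z = [z for g in GENOTYPES for z in z_by_group[g]]
    zbar = mean(all_z)
    between = sum(len(z_by_group[g]) * (mean(z_by_group[g]) - zbar) ** 2 for g in GENOTYPES)
    within = sum((z - mean(z_by_group[g])) ** 2 for g in GENOTYPES for z in z_by_group[g])
    return (len(all_z) - 3) / 2 * between / within


# Two small hypothetical studies: (trait values, genotypes coded 0/1/2)
studies = [
    ([1.0, 2.0, 3.0, 2.5, 4.0, 1.5, 6.0, 0.5], [0, 0, 1, 1, 1, 2, 2, 2]),
    ([2.0, 2.2, 1.8, 5.0, 1.0, 3.0, 0.0, 4.5], [0, 0, 0, 1, 1, 1, 2, 2]),
]
l_meta, n_total = meta_levene([study_summary(y, g) for y, g in studies])

pooled = {g: [] for g in GENOTYPES}
for y, g in studies:
    for k, z in z_values(y, g).items():
        pooled[k].extend(z)
print(l_meta, pooled_levene(pooled))  # the two statistics should agree
```

For a single study, the statistic should also match `scipy.stats.levene(..., center='mean')` on that study’s genotype groups, if SciPy is available.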

Under the null hypothesis of variance homogeneity, L⁺ approximately follows an F-distribution with df₁=2 and df₂=N−3. Caution is warranted for rare variants (minor allele frequency (MAF)<1%), because at least two individuals are needed to estimate the variance in each observed genotype group.
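Because the numerator has two degrees of freedom when all three genotype groups are observed, the F(2, m) survival function has a closed form, P(F > x) = (1 + 2x/m)^(−m/2), so the P-value can be computed without special functions. A minimal Python sketch, assuming df₂ = N − 3 (the function name is hypothetical):

```python
import math


def levene_p_value(stat: float, n_total: int) -> float:
    """Upper-tail P-value of a statistic following F(2, n_total - 3).

    With df1 = 2 the F survival function is P(F > x) = (1 + 2x / df2) ** (-df2 / 2).
    """
    df2 = n_total - 3
    if df2 <= 0:
        raise ValueError("need more than three individuals")
    if stat <= 0:
        return 1.0
    # log1p keeps the computation accurate when 2x / df2 is small
    return math.exp(-0.5 * df2 * math.log1p(2.0 * stat / df2))
```

If SciPy is available, `scipy.stats.f.sf(stat, 2, n_total - 3)` should return the same value. As N grows, the P-value approaches exp(−L⁺), because F(2, ∞) is a χ²₂ variable divided by 2.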

It is common practice in meta-analysis to apply study-specific weights so that the contribution of each study to the combined estimate can be adjusted. An adjusted weight can be obtained by multiplying the natural weight of genotype $i$ in study $s$, $w_{is}$, by the desired adjustment $\eta_{is}$:

$$\tilde{w}_{is} = \eta_{is}\, w_{is}$$

The corresponding P-value is then calculated from the test statistic, with the adjusted weights replacing the natural weights. In other words, the natural weights are rescaled by $\eta_{is} \in [0, 1]$, where 1 corresponds to complete representation of the ith genotype of the sth study and 0 to its exclusion from the meta-analysis.

We conducted a genome-wide meta-analysis of variance heterogeneity for log(BMI) and log(height) using three publicly available genome-wide data sets from dbGaP:¹⁴ MESA (study accession: phs000209.v10.p2, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000209.v10.p2) and GENEVA, which includes data from the NHS and HPFS (study accession: phs000091.v2.p1, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000091.v2.p1). For each data set, we performed SNP quality control based on MAF (>1%), Hardy–Weinberg equilibrium (P>1E-08) and genotype call rate (>95%), and filtered individuals based on ethnicity (European Caucasians only) and relatedness (removing related individuals with a kinship coefficient >0.025). In addition, individuals with diabetes were excluded from the analysis of BMI to avoid reverse causation.
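The SNP-level filters can be expressed as a small function on genotype counts. This Python sketch is illustrative only: the function name and argument layout are hypothetical, and Hardy–Weinberg equilibrium is assessed here with a 1-df chi-square goodness-of-fit test, P(χ²₁ > x) = erfc(√(x/2)), which may differ from the test actually used (for example, an exact test). The ethnicity and kinship filters on individuals are not shown.

```python
import math


def passes_snp_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  min_maf: float = 0.01, min_hwe_p: float = 1e-8,
                  min_call_rate: float = 0.95) -> bool:
    """Apply MAF (>1%), HWE (P>1E-08) and call-rate (>95%) filters to one SNP,
    given genotype counts for AA, AB, BB and the number of missing calls."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return False
    call_rate = n / (n + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    maf = min(p, 1 - p)
    if maf == 0:
        return False                          # monomorphic SNP
    # Expected genotype counts under Hardy-Weinberg equilibrium
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return maf > min_maf and hwe_p > min_hwe_p and call_rate > min_call_rate
```

For example, counts of 360/480/160 with 10 missing calls are in exact Hardy–Weinberg proportions, with MAF 0.4 and a call rate of about 99%, so the SNP is retained.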

Results

The quantile–quantile plots of the Levene’s test P-values suggested no noticeable inflation of the type I error rate in either the individual studies or the meta-analysis (Supplementary Figure S2). No SNP reached a meta-analyzed Levene’s test P-value below the genome-wide threshold of 5E-08 (ref. 15; Figures 1a and e), which we attribute to the small sample size, even with all studies combined (Supplementary Table S1). For SNPs with P-values below 1E-05 (Table 1), we systematically searched the catalog of published GWAS (http://www.genome.gov/gwastudies/) for neighboring SNPs associated at genome-wide significance with any trait or disease, retaining associations within 500 kb and with r²>0.8 or D'>0.8. Among the 16 top hits for log(BMI), rs12132044, in an intron of the NEGR1 gene, had a highly suggestive meta-analyzed Levene’s test P-value of 4.28E-06 (Supplementary Figure S3). Notably, rs2815752 near NEGR1, which is known to be associated with BMI¹⁶ and severe obesity in a pediatric cohort,⁵ was in weak LD with rs12132044 as measured by r² (r²=0.325; D'=0.888; distance=231.44 kb) and was also nominally significant for variance heterogeneity (P=0.0076). None of the other top hits for log(BMI) or log(height) were correlated with variants associated with other traits or diseases in their neighboring regions (Supplementary Table S2). For illustrative purposes, we also repeated the meta-analysis with arbitrary study-specific adjustments to the weights, which led to similar conclusions (Figures 1b–d, f and g). Additional simulation results on variance prioritization are provided in Supplementary Table S3 and Supplementary Figures S1 and S4.
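The proximity filter used in the catalog search can be sketched from two-locus haplotype frequencies with the standard definitions D = p_AB − p_A·p_B, D' = |D|/D_max and r² = D²/[p_A(1−p_A)p_B(1−p_B)]. The Python helpers below are hypothetical and assume the haplotype and allele frequencies are already known (for example, estimated from a reference panel); they are not part of the GWAS catalog or of the authors’ pipeline.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (r2, D') for alleles A and B at two biallelic loci, given the
    frequency of the A-B haplotype and the two allele frequencies (all in (0, 1))."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, d_prime


def is_nearby_proxy(r2: float, d_prime: float, distance_kb: float,
                    max_kb: float = 500.0, min_ld: float = 0.8) -> bool:
    """Filter as described in the text: within 500 kb and r2 > 0.8 or D' > 0.8."""
    return distance_kb <= max_kb and (r2 > min_ld or d_prime > min_ld)


# Values reported for rs2815752 and rs12132044: retained through D' despite low r2
print(is_nearby_proxy(0.325, 0.888, 231.44))
```

Because the criterion is an "or", a pair in weak LD by r² can still be retained when D' is high, as with rs2815752 and rs12132044.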

Table 1 SNPs with Levene’s test P-value lower than 1.0 × 10⁻⁵ from a meta-analysis of height and body mass index


Discussion

Analysis of the genetic basis of quantitative trait variance has recently attracted increasing interest. Differences in the variance of a quantitative trait between the genotypes of a SNP can be due to environmental sensitivity, underlying gene–gene or gene–environment interactions, or linkage disequilibrium with causal variants. Levene’s test can be applied to the meta-analysis of environmental sensitivity, which largely rests on the analysis of phenotypic variation. Notably, the meta-analysis of Levene’s test identified a NEGR1 variant (rs12132044) for which the variance of log(BMI) was related to the number of minor alleles in a non-linear fashion, a pattern that a linear model would have had limited power to detect.

Even with the large sample sizes available from modern consortia, statistical power to detect interactions remains modest, and thus there is a need for prioritization methods. Computing Levene’s test P-values from summary-level data alone facilitates the use of variance prioritization in meta-analysis when individual-level data cannot be obtained. In variance prioritization, SNPs with significant Levene’s test P-values are prioritized and then directly tested for interaction effects using the preferred interaction meta-analysis method. Our simulations (Supplementary Table S3) showed improvements in power when optimal Levene’s test P-value thresholds were used, consistent with the reduction of the genetic interaction search space achieved by prioritizing SNPs. Because the absolute power to detect an interaction is low a priori in most circumstances, the relative increase in power is substantial. This can be highly advantageous if hundreds or thousands of interactions of small effect size underlie the genetics of complex traits. Conversely, the need for prioritization diminishes when interaction effect sizes are large and an exhaustive search alone provides satisfactory power; even in this scenario, however, prioritization performs at least as well as the conventional exhaustive search. Finally, the strength of association between the environmental covariate and the quantitative trait is the main determinant of the power gained from prioritization, so situations in which variance prioritization is particularly favorable can be readily identified. Our simulations (Supplementary Figure S4) also agreed with the theoretical framework of variance prioritization,^{4, 11} according to which sample size, number of SNPs, MAF, and the proportions of variance explained by the interactions and by the covariate influence the statistical power of variance prioritization.

Beyond allowing the implementation of variance prioritization to select SNPs for a meta-analysis of genetic interactions or environmental sensitivity, there are many other potential applications of the meta-analysis of Levene’s test. For example, the homogeneity of variance assumption underlying many statistical models is usually examined using Levene’s test. The meta-analysis of the main effects often relies on the assumption of a common variance among the different levels of a factor, and meta-analysis of Levene’s test can be conveniently adopted as a quality control step prior to main effects analysis. Meta-analysis of Levene’s test is not limited to stratification by genotype; it can also be used to investigate the heterogeneity of phenotypic variance across a wide range of environmental factors.

A few limitations are worth considering. First, the required summary statistics are not typically reported in existing GWAS meta-analyses, and generating them entails additional analytic effort at individual research centers. However, these summary statistics can easily be computed at each center using our R plug-in scripts for PLINK¹⁷ (PLINK v.1.07; http://pngu.mgh.harvard.edu/purcell/plink/). Second, meta-analyses frequently rely on imputation to produce a common set of SNPs across studies genotyped on different platforms. Imputed genotypes are usually represented by genotype probabilities or by the expected number of minor alleles (dosage), in which case individuals cannot be stratified into discrete genotype groups. To address this concern, we suggest a best-guess approach in which each participant is assigned the most likely genotype; however, further statistical methods are required to incorporate probabilistic genotypes into the current framework. Finally, population stratification presents a major challenge to the meta-analysis of population-based GWAS. We observed that meta-analysis of Levene’s test in a multi-ethnic population can lead to false-positive results when no precautions are taken (data not shown). Although our method does not explicitly address this problem, one solution would be to compute the required summary statistics from principal-component-adjusted traits.
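The best-guess approach to imputed genotypes can be sketched as follows. The Python helpers are hypothetical; the optional `min_prob` cut-off, which sets a genotype to missing when the most likely call is not probable enough, is an assumption added for illustration and is not part of the approach described here. The `dosage` helper shows why probabilistic genotypes cannot be stratified directly: the expected minor-allele count is generally not an integer.

```python
from typing import Optional, Sequence


def dosage(probs: Sequence[float]) -> float:
    """Expected number of minor alleles from (P(0), P(1), P(2))."""
    return probs[1] + 2 * probs[2]


def best_guess(probs: Sequence[float], min_prob: Optional[float] = None) -> Optional[int]:
    """Most likely genotype (0, 1 or 2) from imputed genotype probabilities.
    Returns None (treated as missing) if min_prob is given and not reached."""
    if len(probs) != 3 or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected three probabilities summing to 1")
    g = max(range(3), key=lambda k: probs[k])  # ties resolve to the lower genotype
    if min_prob is not None and probs[g] < min_prob:
        return None
    return g


calls = [best_guess(p) for p in [(0.1, 0.7, 0.2), (0.05, 0.15, 0.8), (0.9, 0.08, 0.02)]]
print(calls, dosage((0.1, 0.7, 0.2)))
```

Hard calls discard the uncertainty carried by the probabilities, which is the limitation the text notes for this approach.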

In conclusion, we have presented a mathematical framework for meta-analysis of Levene’s test that can be used to study environmental sensitivity or to perform variance prioritization in meta-analysis. The use of Levene’s test is advantageous as it is robust to departures from the normality assumption, is not influenced by the main effects of SNPs, and does not assume an additive genetic model. Finally, meta-analysis of Levene’s test can be adapted to more general contexts of variance analysis and has utility beyond the field of genetics.

References

1. Visscher PM, Posthuma D: Statistical power to detect genetic loci affecting environmental sensitivity. Behav Genet 2010; 40: 728–733.
2. Levene H: Robust tests for equality of variances; in: Olkin I (ed): Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press: Stanford, CA, USA, 1960, pp 278–292.
3. Lim T-S, Loh W-Y: A comparison of tests of equality of variances. Comput Stat Data Anal 1996; 22: 287–301.
4. Paré G, Cook NR, Ridker PM, Chasman DI: On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the women’s genome health study. PLoS Genet 2010; 6: e1000981.
5. Yang J, Loos RJ, Powell JE et al: FTO genotype is associated with phenotypic variability of body mass index. Nature 2012; 490: 267–272.
6. Thomas D: Gene-environment-wide association studies: emerging approaches. Nat Rev Genet 2010; 11: 259–272.
7. Zuk O, Hechter E, Sunyaev SR, Lander ES: The mystery of missing heritability: genetic interactions create phantom heritability. Proc Natl Acad Sci USA 2012; 109: 1193–1198.
8. Manolio TA, Collins FS, Cox NJ et al: Finding the missing heritability of complex diseases. Nature 2009; 461: 747–753.
9. Aschard H, Hancock DB, London SJ, Kraft P: Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Hum Hered 2010; 70: 292–300.
10. Manning AK, LaValley M, Liu CT et al: Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP x environment regression coefficients. Genet Epidemiol 2011; 35: 11–18.
11. Deng W, Paré G: A fast algorithm to optimize SNP prioritization for gene-gene and gene-environment interactions. Genet Epidemiol 2011; 35: 729–738.
12. Struchalin MV, Amin N, Eilers PH, van Duijn CM, Aulchenko YS: An R package "VariABEL" for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity. BMC Genet 2012; 13: 4.
13. Struchalin MV, Dehghan A, Witteman JC, van Duijn C, Aulchenko YS: Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet 2010; 11: 92.
14. Mailman MD, Feolo M, Jin Y et al: The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007; 39: 1181–1186.
15. Pe'er I, Yelensky R, Altshuler D, Daly MJ: Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet Epidemiol 2008; 32: 381–385.
16. Willer CJ, Speliotes EK, Loos RJ et al: Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 2009; 41: 25–34.
17. Purcell S, Neale B, Todd-Brown K et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.

Author information

Authors and Affiliations

Department of Clinical Epidemiology and Biostatistics, Population Genomics Program, McMaster University, Hamilton, ON, Canada
Wei Q Deng & Guillaume Paré
Department of Pathology and Molecular Medicine, McMaster University, Hamilton, ON, Canada
Senay Asma & Guillaume Paré
Population Health Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
Guillaume Paré
Thrombosis and Atherosclerosis Research Institute, Hamilton, ON, Canada
Guillaume Paré


Corresponding author

Correspondence to Guillaume Paré.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on European Journal of Human Genetics website



About this article

Cite this article

Deng, W., Asma, S. & Paré, G. Meta-analysis of SNPs involved in variance heterogeneity using Levene’s test for equal variances. Eur J Hum Genet 22, 427–430 (2014). https://doi.org/10.1038/ejhg.2013.166


Received: 29 March 2013
Revised: 01 June 2013
Accepted: 05 July 2013
Published: 07 August 2013
Issue Date: March 2014
DOI: https://doi.org/10.1038/ejhg.2013.166