SHAVE: shrinkage estimator measured for multiple visits increases power in GWAS of quantitative traits

Meirelles, Osorio D; Ding, Jun; Tanaka, Toshiko; Sanna, Serena; Yang, Hsih-Te; Dudekula, Dawood B; Cucca, Francesco; Ferrucci, Luigi; Abecasis, Goncalo; Schlessinger, David

doi:10.1038/ejhg.2012.215

Published: 24 October 2012

European Journal of Human Genetics volume 21, pages 673–679 (2013)

A Corrigendum to this article was published on 12 December 2013

Abstract

Measurement error and biological variability generate distortions in quantitative phenotypic data. In longitudinal studies with repeated measurements, the multiple measurements provide a route to reduce noise and correspondingly increase the strength of signals in genome-wide association studies (GWAS). To optimize noise correction, we have developed Shrunken Average (SHAVE), an approach using a Bayesian shrinkage estimator. This estimator uses regression toward the mean for every individual as a function of (1) their average across visits; (2) their number of visits; and (3) the correlation between visits. Computer simulations support an increase in power, with results very similar to those expected by the assumptions of the model. The method was applied to a real data set for 14 anthropomorphic traits in ∼6000 individuals enrolled in the SardiNIA project, with up to three visits (measurements) for each participant. Results show that additional measurements have a large impact on the strength of GWAS signals, especially when participants have different numbers of visits, with SHAVE showing a clear increase in power relative to single visits. In addition, we have derived a relation to assess the improvement in power as a function of number of visits and correlation between visits. It can also be applied in the optimization of experimental designs or usage of measuring devices. SHAVE is fast and easy to run, written in R and freely available online.

Introduction

In contrast to Mendelian traits, for which the association with related penetrant mutations is patent, quantitative traits show a continuous range of values and smaller effect sizes of genetic variants. Thus, to identify genetic factors involved in quantitative traits, larger sample sizes and more refined statistical analyses are required.

Population studies with multiple visits and quantitative trait measurements a priori offer the possibility to increase power and determine the trajectory of trait values in relation to disease or other outcomes. Methods that take all the measurements into account can increase statistical power of the genome-wide association studies (GWAS) analyses that dominate current discovery efforts. Similar benefits of using multiple measurements have been shown in analyses of expression profiling on microarrays^{1, 2, 3, 4} and more recently in studies of blood pressure.⁵ However, for most existing population cohorts, additional variability is introduced by different numbers of visits for individuals and by possible secular drift. To optimally model this type of data, we propose a shrinkage method that efficiently combines observations from different measurements, even when some visits are missing for some individuals.

The strength of shrinkage estimators compared with frequentist approaches has been clearly described in classical literature^{6, 7} and more recently in GWAS and related areas such as imputation, fine mapping and meta-analysis.⁸ Our method implements an empirical Bayes algorithm, ‘Shrunken Average (SHAVE)’, using regression toward the mean for every individual as a function of their number of visits and the correlation between visits. We evaluated the performance of the method by simulations and confirmed the expectations in real data.

We used the SardiNIA cohort (http://sardinia.nia.nih.gov)⁹ consisting of >6000 individuals and a set of 14 traits that were measured up to three times in all individuals, at ∼3-year intervals. To evaluate the impact of the method, we selected top SNPs from single visits and meta-analysis studies, and compared the significance of the same SNPs for single visits, for the average across visits and for SHAVE. Variable but appreciable improvement in performance was found.

SHAVE is fast and easy to run, and can thus be added to approaches such as principal component and variance analysis. Finally, we suggest a way to estimate the cost-benefit of adding additional visits for GWAS signals and discuss the potential utility of SHAVE for other applications. The R code for SHAVE is available (http://sardinia.nia.nih.gov/Download/).

Methods

We outline briefly how we test for the association of a single measurement of a trait with a given SNP, and then generalize for multiple replicates. Consider a given quantitative trait and a given SNP. Let y_ij (i=1,…, n; j=1,…, k_i) denote individual i’s jth repeated measure of the trait, or his/her residual for that trait after adjusting for one or more variables (eg, sex and age). Let G_i denote the number of minor alleles of the given SNP for individual i. Let G be the vector (G₁, G₂,…, G_n) containing the number of minor alleles (0, 1 or 2) for all individuals. Similarly, let Y₁ be the vector (y₁₁, y₂₁,…, y_n1), containing the first measurement of the trait for all individuals. To test the association between Y₁ and G using an additive model, Y₁ is regressed on G such that y_i1=β₁G_i+α+e_i1, and an estimate for β₁ is obtained (β*₁). We then divide β*₁ by its standard error to obtain a z-statistic. Next, for each z-statistic a corresponding P-value is obtained. This is done separately for every SNP and every trait.
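As an illustration of the single-visit test just described, the following is a minimal sketch in Python (standard library only). It is not the authors' R code or the Merlin implementation: the function name `additive_association_test` is hypothetical, individuals are assumed to be unrelated, and the P-value uses the standard normal approximation to the z-statistic.

```python
import math


def additive_association_test(trait, genotype):
    """Regress a trait on minor-allele counts (additive model).

    trait:    list of trait values (or covariate-adjusted residuals)
    genotype: list of minor-allele counts (0, 1 or 2), same length
    Returns (beta_hat, z, p_value).
    """
    n = len(trait)
    g_mean = sum(genotype) / n
    y_mean = sum(trait) / n
    sxx = sum((g - g_mean) ** 2 for g in genotype)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotype, trait))
    beta_hat = sxy / sxx                      # least-squares slope
    alpha_hat = y_mean - beta_hat * g_mean    # least-squares intercept
    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((y - alpha_hat - beta_hat * g) ** 2
              for g, y in zip(genotype, trait))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    z = beta_hat / se_beta
    # Two-sided P-value under the standard normal approximation (assumption).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta_hat, z, p_value
```

The same function can be applied unchanged to any summary of the visits (a single visit, the Average or SHAVE), since only the dependent variable differs.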

SHAVE: the posterior expectation μ*_i

Now consider the following random-intercept model:

\[ y_{ij} = \mu_i + e_{ij}, \qquad \mu_i \sim N(0, \sigma^2), \qquad e_{ij} \sim N(0, \sigma^2/w) \tag{1} \]

where w≥0 and σ²≥0 are unknown parameters, which can be easily estimated from our dataset. Straightforward algebra, such as that presented in an elementary textbook on Bayesian statistics (eg, Lee¹⁰), gives the posterior expectation of μ_i when σ² and w are known: E(μ_i | y_i1,…, y_ik_i)=[k_i w/(1+k_i w)]ȳ_i, where k_i is the number of visits and ȳ_i is the average across visits for individual i. For the proof, please see Supplementary Materials Section 1. Let μ*_i denote the posterior expectation E(μ_i | y_i1,…, y_ik_i); then

\[ \mu^*_i = \frac{k_i w}{1 + k_i w}\,\bar{y}_i \tag{2} \]

We call μ*_i the SHAVE estimator for multiple visits. Not all individuals have the same number of visits: if individual i did not have visit j, the measurement y_ij is set to missing, and ȳ_i is the average of the non-missing values. The number of visits k_i varies between individuals mainly because of missing data, which we assume to be missing completely at random, that is, unrelated to age, sex, …, and the missing value itself.

The term k_i w/(1+k_i w) will be referred to as the adjustment factor of the average, which is equal to one minus the shrinkage factor. We note that μ*_i does not depend on σ² and is a function only of k_i, w and ȳ_i. Next, μ*_i is regressed on G such that μ*_i=βG_i+α+e_i, and the statistical significance of β is calculated.
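The SHAVE estimator can be sketched in a few lines of Python. This is an illustration only, not the authors' R code: the function name `shave` and the use of `None` to mark a missing visit are assumptions made for the example. It multiplies the average of the non-missing visits by the adjustment factor k_i w/(1+k_i w).

```python
def shave(visits, w):
    """Return (average, shave) for one individual.

    visits: list of measurements; None marks a missing visit (assumption).
    w:      weight, the ratio of between-individual variance (sigma^2)
            to within-individual variance (sigma^2 / w).
    """
    observed = [y for y in visits if y is not None]
    k = len(observed)
    if k == 0:
        raise ValueError("individual has no observed visits")
    average = sum(observed) / k
    adjustment = k * w / (1 + k * w)   # one minus the shrinkage factor
    return average, adjustment * average
```

With w=1, an individual with a single observed value of 1.0 is shrunk halfway toward zero (adjustment factor 1/2), whereas an individual with three visits that all equal 1.0 is shrunk less (adjustment factor 3/4). Averages based on more visits are trusted more.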

Estimating w and σ²

Let n be the total number of individuals and k_i be the number of visits for individual i. Given that equation 1 implies Var(y_ij|μ_i)=σ²/w and Var(y_ij)=σ²/w+σ², both quantities can be respectively estimated by the within-individual sample variance s²_within and the total sample variance s²_total, and setting σ²/w=s²_within and σ²/w+σ²=s²_total yields the estimate of w. Although the estimation of σ² is not needed for μ*_i, σ² can be estimated by s²_total−s²_within. Thus, the weight estimate is given by:

\[ \hat{w} = \frac{s^2_{total} - s^2_{within}}{s^2_{within}} \tag{3} \]

Therefore, in equation 2 the term w, which is unknown, should be replaced by its estimate ŵ. When all individuals have exactly two visits, ŵ is equal to ρ/(1−ρ), where ρ is the sample correlation between the two visits, and ŵ also minimizes the least squares loss function L₂. Another possibility is to use a more robust loss function, L₁, and estimate the w that minimizes it. Although estimating w by L₁ and L₂ gives different results, both weights were similar for most traits when using the SardiNIA data, and furthermore, their corresponding z-statistics for SHAVE were extremely similar for all traits. For more details see Supplementary Materials Section 2 and Table S2.
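The moment-matching estimate of w can be sketched as follows. Setting σ²/w equal to a within-individual variance and σ²/w+σ² equal to a total variance gives ŵ=(s²_total−s²_within)/s²_within. The exact variance estimators used by the authors are not reproduced here. The pooled within-individual variance with Σ(k_i−1) degrees of freedom and the total variance with (Σk_i)−1 degrees of freedom are assumptions of this sketch, as is the function name `estimate_weight`.

```python
def estimate_weight(data):
    """Estimate w from repeated measures.

    data: list of per-individual lists; None marks a missing visit.
    Returns (s2_total - s2_within) / s2_within.
    """
    observed = [[y for y in row if y is not None] for row in data]
    observed = [row for row in observed if row]      # drop empty individuals
    all_values = [y for row in observed for y in row]
    grand_mean = sum(all_values) / len(all_values)
    # Total variance across all measurements (assumed denominator: N - 1).
    s2_total = (sum((y - grand_mean) ** 2 for y in all_values)
                / (len(all_values) - 1))
    # Pooled within-individual variance (assumed denominator: sum of k_i - 1).
    ss_within = 0.0
    df_within = 0
    for row in observed:
        row_mean = sum(row) / len(row)
        ss_within += sum((y - row_mean) ** 2 for y in row)
        df_within += len(row) - 1
    s2_within = ss_within / df_within
    return (s2_total - s2_within) / s2_within
```

At least one individual needs two or more visits; otherwise the within-individual variance is undefined and the function divides by zero.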

Comparing different metrics – LOD ratio

Comparisons were done in GWAS using three summary trait values from multiple visits: single visit, Average and SHAVE.

Note: To distinguish between the statistical term ‘average’ and the actual ‘Average’ among visits, we capitalize the latter throughout this paper. To assess performance among different metrics, for a given trait and a given SNP, we run an association test for each visit, the Average and SHAVE, obtaining a corresponding slope and z-statistic and calculating the corresponding z². The LOD score, one of the outputs from the Merlin¹¹ software, is defined as z²/ln(100), where ln is the natural logarithm, and was chosen as a performance measure because the LOD score (or equivalently z²) is conveniently proportional to the sample size. For example, assume a true association between a trait and a specific SNP. If the sample size were equal to 2000 individuals with a corresponding z², then doubling the sample size to 4000 individuals would be expected to double z² as well. Next, we describe three common scenarios and provide an expected LOD ratio between different metrics, with z₁, z_AVG and z_SHAVE as the corresponding z-statistics for single visit, Average and SHAVE.
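The conversion from a z-statistic to a LOD score, and the LOD ratio between two metrics, can be written as a short Python sketch. The function names are hypothetical, and the logarithm in the definition is taken to be the natural logarithm, so the denominator is 2 ln(10).

```python
import math


def lod_from_z(z):
    """LOD score for a 1-degree-of-freedom test: z^2 / ln(100)."""
    return z * z / math.log(100)   # math.log is the natural logarithm


def lod_ratio(z_numerator, z_denominator):
    """Ratio of two LOD scores; the constant ln(100) cancels."""
    return (z_numerator / z_denominator) ** 2
```

Because the constant cancels, a LOD ratio is simply a ratio of z² values, which is why it can be read as an equivalent change in sample size.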

Average vs single visit

We start by considering that all individuals have the same number of visits (equation 4) – that is, from a balanced dataset – and we then account for a situation in which there are different numbers of visits for individuals (unbalanced dataset) (equation 5).

Next we assume an unbalanced dataset for the LOD ratio between SHAVE vs Average (equation 6).

SHAVE vs Average

Proofs for equations 4, 5 and 6 can be found in Supplementary Materials Section 3. We point out that if every individual has the same number of visits k, then by equation 2 SHAVE is the Average multiplied by a constant factor, kw/(1+kw). Because a z-statistic is unchanged when the dependent variable is multiplied by a positive constant, z²_SHAVE is identical to z²_AVG, and SHAVE has the same power as the Average. This equality in power in balanced datasets between SHAVE and Average is also consistent with equation 6, where replacing k_i by k results in a ratio equal to one.
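For orientation, expected LOD ratios of this kind can be derived heuristically from the random-intercept model. The sketch below is not the authors' proof (which is in Supplementary Materials Section 3), and the expressions may differ in form from equations 4, 5 and 6. It assumes a small genetic effect, so that each z² is approximately n·Var(G) times the squared regression slope divided by the variance of the metric, and it assumes that k_i is independent of genotype. The symbols a_i (the adjustment factor) and ρ=w/(1+w) (the correlation between visits) are introduced here for convenience.

```latex
\begin{align*}
\operatorname{Var}(y_{i1}) &= \sigma^2\Bigl(1+\frac{1}{w}\Bigr), \qquad
\operatorname{Var}(\bar{y}_i) = \sigma^2\Bigl(1+\frac{1}{k_i w}\Bigr)
  = \frac{\sigma^2}{a_i}, \qquad a_i = \frac{k_i w}{1+k_i w},\\[4pt]
\text{balanced } (k_i = k):\quad
\frac{z^2_{AVG}}{z^2_{1}} &\approx
  \frac{\operatorname{Var}(y_{i1})}{\operatorname{Var}(\bar{y}_i)}
  = \frac{k(1+w)}{1+kw}
  = \frac{k}{1+(k-1)\rho},\\[4pt]
\text{unbalanced:}\quad
\frac{z^2_{AVG}}{z^2_{1}} &\approx
  \frac{1+1/w}{\frac{1}{n}\sum_{i=1}^{n}\bigl(1+\frac{1}{k_i w}\bigr)},\\[4pt]
\operatorname{Var}(\mu^*_i) &= a_i^2\,\frac{\sigma^2}{a_i} = a_i\,\sigma^2,
\qquad \text{slope of } \mu^*_i \text{ on } G_i \approx \beta\,\bar{a},
\qquad \bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i,\\[4pt]
\frac{z^2_{SHAVE}}{z^2_{AVG}} &\approx
  \bar{a}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{1}{a_i} \;\ge\; 1 .
\end{align*}
```

The last inequality is the arithmetic-harmonic mean inequality, with equality exactly when all a_i are equal, that is, when every individual has the same number of visits. This agrees with the statement above that SHAVE and the Average have the same power in balanced datasets.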

Simulation study

Simulated unbalanced datasets were generated with 5000 individuals, with 2500 individuals with three visits and the remaining 2500 with a single visit. We conducted two types of simulations, one to estimate Type I error and the other to estimate power. In each type of simulation we compared SHAVE, Average and single visits.

Simulation models

Because the z-statistics of all the metrics described do not depend on σ², we set σ² equal to one. Next, we describe two simulation models, where model 1 is used to measure Type I error and model 2 is used to measure power.

Model 1, for β=0: y_ij=μ_i+e_ij, where μ_i∼N(0,1), e_ij∼N(0,1/w), and e_ij is independent from μ_i. In this model we can see that Var(y_ij)=1+1/w.

Model 2, for β≠0: y_ij=μ_i+e_ij, where μ_i=δ_i(1−β²Var(G))^{1/2}+β(G_i−Ḡ), δ_i∼N(0,1), e_ij∼N(0,1/w), G_i is randomly generated based on the pre-defined allele frequency, Ḡ is the average number of alleles across all individuals, and e_ij is independent from μ_i. Since our original random-intercept model assumes that Var(y_ij) does not depend on the genotype, the term (1−β²Var(G))^{1/2} is introduced in order to have Var(δ_i(1−β²Var(G))^{1/2}+β(G_i−Ḡ))=Var(μ_i)=1, which implies that Var(y_ij)=1+1/w in both models 1 and 2.
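The data-generating step of model 2 (and of model 1, by setting β=0) can be sketched in Python as follows. This is an illustration rather than the authors' simulation code. The function name `simulate_model2` is hypothetical, genotypes are drawn as the sum of two independent alleles (a Hardy-Weinberg assumption that the text does not state), and Var(G) is taken to be the sample variance of the simulated genotypes.

```python
import math
import random


def simulate_model2(n, k, beta, maf, rho, rng):
    """Simulate genotypes and k repeated measures for n individuals.

    Returns (genotypes, data), where data[i] holds the k visits of
    individual i. Use beta=0 for model 1 (Type I error simulations).
    """
    w = rho / (1 - rho)                      # 'true' weight from the correlation
    # Two independent allele draws per individual (assumed Hardy-Weinberg).
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    g_mean = sum(genotypes) / n
    g_var = sum((g - g_mean) ** 2 for g in genotypes) / (n - 1)
    # Rescale the random intercept so that Var(mu_i) stays equal to one.
    scale = math.sqrt(1 - beta ** 2 * g_var)
    noise_sd = math.sqrt(1 / w)              # e_ij ~ N(0, 1/w)
    data = []
    for g in genotypes:
        mu = rng.gauss(0, 1) * scale + beta * (g - g_mean)
        data.append([mu + rng.gauss(0, noise_sd) for _ in range(k)])
    return genotypes, data
```

Passing an explicit `random.Random` instance keeps each simulation reproducible. Visits can afterwards be set to missing to create an unbalanced dataset.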

Type I error simulations

We set α levels to 1 × 10⁻⁵, 1 × 10⁻⁶, 1 × 10⁻⁷ and 5 × 10⁻⁸. Ten billion simulations were performed to achieve an accurate Type I error estimation. Correlations between visits ρ were set equal to 0.2 or 0.5, and minor allele frequency P was set equal to 0.5. We then simulated y_ij values for all three visits and all individuals according to model 1 using the ‘true’ weight w=ρ/(1−ρ). Next, we randomly set as missing 50% of the values for visits 2 and 3, and Average was then calculated for every individual based on non-missing values. Next, we estimated the sample weight ŵ using equation 3 and generated SHAVE. Finally, we simulated the vector G based on the minor allele frequency P (0.5). After performing the simple linear regression between each metric and G, P-values were obtained. Next, for each metric we measured Type I error as the proportion (over 10¹⁰ simulations) of P-values smaller than each α level.

Power simulations

Simulations were conducted using α level of 5 × 10⁻⁸, β values of 0.20, 0.25 and 0.30, minor allele frequencies P equal to 0.1 and 0.5, and correlations between visits ρ equal to 0.2 and 0.5. Simulated values were generated similarly to Type I error simulations with the main difference being that model 2 was used instead of model 1. One million simulations were performed for each combination of parameters (β, P and ρ), and as a result of each combination, we measured power as the proportion of P-values less than the 5 × 10⁻⁸ cutoff, now considered the standard threshold to declare genome-wide significance findings.
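A much smaller version of such a power simulation can be sketched in Python. It is only an illustration under simplifying assumptions that differ from the study: the true weight w=ρ/(1−ρ) is used instead of its estimate, the first half of the individuals have three visits and the rest have one, genotypes are drawn as two independent alleles, and individuals are unrelated. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_stat(y, g):
    """z-statistic of the slope from a simple linear regression of y on g."""
    n = len(y)
    gm, ym = sum(g) / n, sum(y) / n
    sxx = sum((a - gm) ** 2 for a in g)
    sxy = sum((a - gm) * (b - ym) for a, b in zip(g, y))
    beta = sxy / sxx
    rss = sum((b - ym - beta * (a - gm)) ** 2 for a, b in zip(g, y))
    return beta / math.sqrt(rss / (n - 2) / sxx)


def one_replicate(n, beta, maf, rho, rng):
    """Return z-statistics for (single visit, Average, SHAVE)."""
    w = rho / (1 - rho)
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    gm = sum(g) / n
    gvar = sum((a - gm) ** 2 for a in g) / (n - 1)
    scale = math.sqrt(1 - beta ** 2 * gvar)
    sd_e = math.sqrt(1 / w)
    first, avg, ks = [], [], []
    for i, a in enumerate(g):
        mu = rng.gauss(0, 1) * scale + beta * (a - gm)
        k = 3 if i < n // 2 else 1           # unbalanced design
        ys = [mu + rng.gauss(0, sd_e) for _ in range(k)]
        first.append(ys[0])
        avg.append(sum(ys) / k)
        ks.append(k)
    shave = [k * w / (1 + k * w) * m for k, m in zip(ks, avg)]
    return z_stat(first, g), z_stat(avg, g), z_stat(shave, g)


def critical_z(alpha):
    """Two-sided critical value of the standard normal for level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def estimate_power(n_rep, n, beta, maf, rho, z_crit, seed=0):
    """Proportion of replicates with |z| > z_crit for each metric."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(n_rep):
        for j, z in enumerate(one_replicate(n, beta, maf, rho, rng)):
            hits[j] += abs(z) > z_crit
    return [h / n_rep for h in hits]
```

For example, `estimate_power(1000, 5000, 0.2, 0.5, 0.2, critical_z(5e-8))` returns three proportions in the order single visit, Average, SHAVE. With far fewer replicates than the one million used in the study, the estimates carry noticeable Monte Carlo error.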

Applying the method – SardiNIA dataset

The SardiNIA project was designed to investigate the genetics of quantitative traits in the Sardinian founder population.⁹ Over a 10-year period, from November 2001 to the present, residents of four towns in Sardinia, Italy, starting at age 14–95 years, were invited to participate in the study, and a total of 6320 individuals had up to three visits at ∼3-year intervals. The total numbers of individuals in visits one, two and three were 6177, 5670 and 1971, respectively, where each individual could be present or not in any of the visits. Individuals were characterized for >100 quantitative traits,⁹ and 14 traits were selected for this analysis (bilirubin, total cholesterol, γ-GT, glycemia, HDL, height, LDL, PR-interval, QT-interval, red blood cell counts (RBC), serum iron, transferrin, triglycerides and uric acid). Traits were selected based on previously reported meta-analysis studies (as of October 2011), which also showed top SNPs for visits 1 and 2 with P<5 × 10⁻⁸ from the SardiNIA dataset, where the same top SNPs had minor allele frequency >5% and had previously been identified in the HapMap project (SNPs with ‘rs’ as the first two characters). Genotype information was obtained from the Metabochip, a custom Illumina iSELECT genotyping array (http://www.sph.umich.edu/csg/kang/MetaboChip).

To minimize the effect of outliers, we applied an inverse normal transformation for every trait in each visit.⁹ Transformed traits were used as the dependent variable and modeled using linear regression, with age at the time of visit and sex as covariates for each separate trait and for each visit. As a result, each trait measurement version (a given trait for a given visit) generated standardized residuals (mean equal to zero and SD equal to one) as the output. This standardization step is needed in order to assume that noise levels are the same for each visit. However, GWAS results without standardization were very similar (not shown).
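A rank-based inverse normal transformation of the kind mentioned above can be sketched in Python. The exact variant used for the SardiNIA traits is not specified in the text. The offset (rank−0.5)/n and the averaging of tied ranks are assumptions of this sketch, and the subsequent regression on age and sex is not shown.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to standard normal quantiles via their ranks.

    Ties receive the average of their 1-based ranks; the quantile
    argument is (rank - 0.5) / n (one common convention, assumed here).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next sorted value is tied.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mid_rank = (pos + end) / 2 + 1       # average 1-based rank of the block
        for j in range(pos, end + 1):
            ranks[order[j]] = mid_rank
        pos = end + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]
```

Because only ranks are used, an extreme outlier is mapped to the same quantile as any other largest value, which is what limits its influence on the regression.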

Comparing performance of metrics

To measure the performance of metrics, the most significant SNP for each trait was selected based on three criteria: significance of the signal in visit 1, significance of the signal in visit 2, and significance in published meta-analyses.^{12, 13, 14, 15, 16, 17, 18, 19, 20, 21} Next, for each SNP we ranked the P-values among metrics and then we obtained the average rank for each metric across all traits. As SNPs were selected based on reported meta-analysis (Table 4), but not all of those were present in the Metabochip, we used the SNAP algorithm²² to select a proxy SNP in the Metabochip that had the highest R² (≥0.80). As the SardiNIA project is a family-based study, to test for association while accounting for relatedness, we used a variance component method implemented in Merlin.¹¹
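The ranking step described above can be sketched in Python. The function name `average_ranks` and the nested-dictionary input format are assumptions for illustration, and ties between P-values are not treated specially.

```python
def average_ranks(p_values_by_trait):
    """Average rank of each metric across traits (rank 1 = smallest P-value).

    p_values_by_trait: dict mapping trait -> dict mapping metric -> P-value.
    Returns a dict mapping metric -> mean rank.
    """
    totals = {}
    for p_by_metric in p_values_by_trait.values():
        # Order metrics from most to least significant for this trait's SNP.
        ordered = sorted(p_by_metric, key=p_by_metric.get)
        for rank, metric in enumerate(ordered, start=1):
            totals[metric] = totals.get(metric, 0) + rank
    n_traits = len(p_values_by_trait)
    return {metric: total / n_traits for metric, total in totals.items()}
```

A lower average rank means that the metric gave the more significant result more often across traits.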

Results

Simulation results – power and Type I error

Simulated Type I errors were very similar to expected (α), showing that Type I error is well controlled for all three metrics: single visit, Average of up to three visits and SHAVE of up to three visits (Supplementary Materials Table S1). We also noticed a clear increase in power for SHAVE relative to the Average and to a single visit (Table 1). With α level, minor allele frequency P and effect size β, respectively, set to 5 × 10⁻⁸, 0.50 and 0.20, simulated power is shown as an increasing function of the correlation between visits (Figure 1 top). Similarly, with α, P and ρ set to 5 × 10⁻⁸, 0.50 and 0.20, simulated power is shown as an increasing function of the effect size β (Figure 1 bottom). In addition, simulated and expected power were very similar for all three methods. A detailed description of the calculation of expected power can be found in Supplementary Materials Section 4.

Table 1 Simulated and expected power for alpha equal to 5 × 10⁻⁸ and different levels of frequency P, slope β and correlation ρ

SardiNIA dataset – performance by ranking

To compare metrics, we use the average rank across 14 traits (Tables 2, 3, 4), where a lower average rank indicates higher overall significance. Using data for all three visits in SardiNIA and selecting the top SNP based on visit 1 (Table 2), the average ranks for visit 1, visit 2, Average and SHAVE were 3.36, 3.50, 2.07 and 1.07, respectively. Similarly, when selecting the top SNP based on visit 2 (Table 3), the corresponding average ranks were 3.64, 3.00, 2.21 and 1.14. When selecting the top meta-analysis SNPs (Table 4), the corresponding average ranks were 3.29, 3.57, 1.93 and 1.21. On the basis of these findings, Average was superior to any single visit in all three tables, and SHAVE had the best performance, being less significant than the Average only twice out of 42 cases (height and QT-interval in Table 4). An alternative way to compare performance by ranking is shown in Supplementary Materials Table S3.

Table 2 Association results between 14 traits and their corresponding top SNPs, where top SNPs were selected based on visit 1 results of SardiNIA GWAS, and where z-statistics for Average and SHAVE are based on three visits

Table 3 Association results between 14 traits and their corresponding top SNPs, where top SNPs were selected based on visit 2 results of SardiNIA GWAS, where z-statistics for Average and SHAVE are based on three visits

Table 4 Association results between 14 traits and their corresponding top SNPs, where top SNPs were selected based on multi-study meta-analyses, and z-statistics for Average and SHAVE are based on three visits and results of SardiNIA GWAS

SardiNIA dataset – performance by LOD ratios using top Meta-analysis SNPs

We performed two types of LOD ratios for every trait, the first between Average and single visit, the second between SHAVE and Average. To compare signals between Average and single visits, we selected a subset of individuals who had both visits 1 and 2 and compared their signals. We first obtained the z-statistics of the Average (z_AVG) and the z-statistics corresponding to visits 1 and 2 (z₁ and z₂). Next, we obtained the LOD ratio between Average (represented by the square of z_AVG) and a single visit (represented by the square of (z₁+z₂)/2). Observed LOD ratios were all above one, indicating an increase in power using the Average vs a single visit (Figure 2). We note that traits with lowest correlation between visits had the highest LOD ratios, and in the three traits with lowest correlation, transferrin, serum iron and QT-interval, LOD ratios were above 1.5. Similarly, traits with high correlation between visits, such as RBC and height, had LOD score ratios close to one. In general, expected and observed LOD ratios were quite similar, suggesting that our observations match the expectations of the model.

To compare LOD ratios between SHAVE and Average, we generated a subset of the SardiNIA dataset in which all individuals had visit 1, and then, for the same individuals, we randomly selected 50% of them and included their second visits (setting the remaining visit 2 cases as ‘missing values’). The main reason to look at this subset was that differences between SHAVE and Average are only appreciable in unbalanced datasets. Although the observed LOD ratios between SHAVE vs Average were modest when compared with Average vs single visit, the ratios were all greater than one, indicating a consistent increase in power of SHAVE relative to Average (Figure 3). Also, with the exception of transferrin, observed and expected LOD ratios were similar.

Expected LOD ratios in a hypothetical dataset

To get a better estimate of the expected LOD ratio between Average and single visits, and between SHAVE and Average, we generated charts based on a hypothetical dataset with multiple visits (from 2 to 10 visits). When comparing Average vs single visit, we show the expected LOD ratio as a function of the number of visits and the correlation between visits (Figure 4). The expected LOD ratios decrease as the correlation between visits increases. Similarly, expected LOD ratios increase as the number of visits increases, and saturate as the number of visits k becomes large, based on equation 4. When comparing SHAVE vs Average, we assumed a hypothetical dataset in which 50% of the individuals had a single visit and 50% of the individuals had k visits (from 2 to 10) (Supplementary Materials Figure S1). Here LOD ratios are more modest when compared with Figure 4, but still show the same relation to number of visits and correlation.
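Curves of this kind can be approximated with a short Python sketch. The two expressions used below are not copied from the paper's equations 4 and 6. They are what the random-intercept model implies under a small-effect approximation: k/(1+(k−1)ρ) for a balanced design, and the mean adjustment factor times the mean of its reciprocal for SHAVE vs Average. They should be read as assumptions of this example, and the function names are hypothetical.

```python
def expected_ratio_average_vs_single(k, rho):
    """Approximate expected LOD ratio, Average of k visits vs one visit."""
    return k / (1 + (k - 1) * rho)


def expected_ratio_shave_vs_average(visit_counts, rho):
    """Approximate expected LOD ratio, SHAVE vs Average.

    visit_counts: number of visits k_i for each individual (or each group
    of equal size); rho: correlation between visits.
    """
    w = rho / (1 - rho)
    a = [k * w / (1 + k * w) for k in visit_counts]   # adjustment factors
    n = len(a)
    return (sum(a) / n) * (sum(1 / x for x in a) / n)


def ratio_table(rhos, max_visits=10):
    """Rows of (rho, k, Average-vs-single, SHAVE-vs-Average for a 50/50 split)."""
    rows = []
    for rho in rhos:
        for k in range(2, max_visits + 1):
            rows.append((rho, k,
                         expected_ratio_average_vs_single(k, rho),
                         expected_ratio_shave_vs_average([1, k], rho)))
    return rows
```

Under this approximation the Average-vs-single ratio approaches 1/ρ as k grows, which matches the saturation described above, and the SHAVE-vs-Average ratio equals one whenever all visit counts are equal.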

Discussion

Increasing the strength of a true genetic signal for a quantitative trait can provide overall benefits for GWAS studies, and we show here the extent to which measurements from multiple visits can contribute to that goal. In particular, when we compared the performance of SHAVE vs single visit and SHAVE vs Average using the SardiNIA dataset, some traits showed a large LOD ratio for their top SNPs, indicating that the same genome-wide significance can be achieved using a smaller sample with SHAVE. SHAVE increases power relative to the Average when the dataset is unbalanced (ie, individuals have different numbers of trait measurements). However, when a dataset is balanced, SHAVE and Average generate identical results. The increase in power for SHAVE was also supported by simulations, which showed both Type I error and power very close to those expected under the assumptions of the linear model.

Power increases with effect size (absolute value of the slope), number of visits and correlation between visits. Given the goal of maximizing the increase in power, when is SHAVE most useful? If power from a single visit is low — such that the top SNP is far from being genome-wide significant — then even the increase in power by SHAVE will not be sufficient for any SNP to achieve genome-wide significance. On the other hand, when a SNP shows marginal genome-wide significance in a single visit, the power boost from SHAVE may make a SNP genome-wide significant. Moreover, when a SNP is already genome-wide significant in a single visit, an increase in power by SHAVE will further improve genome-wide significance, providing additional confidence in the SNP effect.

A major assumption of the random-intercept model is that Var(y_ij | μ_i)=σ²/w (a combination of biological variability and measurement error) is identical for each visit. This might not always be the case if better technology were used to measure a trait in a more recent visit (reducing measurement error), or if better protocols were used (reducing biological variability). However, SHAVE can easily be adapted to such datasets, and one potential improvement could be to estimate a different weight w_j for each visit j. In such instances SHAVE and the Average will not be equivalent even in balanced datasets, with SHAVE expected to outperform the Average. Another key assumption is that the true (unknown) variance within each individual is constant. If this assumption is violated, shrinkage distortions result. In our model we assume that this true variance within individual i, denoted by η²_i, is equal to σ²/w. However, if η²_i is equal to σ²/w_i, where w_i is the unknown weight for individual i, then if η²_i>σ²/w, w will be greater than his/her true weight w_i, leading to ‘under-shrinkage’, and similarly if η²_i<σ²/w, then w will be smaller than w_i, with ‘over-shrinkage’. Thus, to minimize the effects of over-shrinkage and under-shrinkage, a potential improvement would be to estimate w_i for each individual, in which case SHAVE will likely outperform the Average even in balanced datasets. Although preliminary results showed a very small increase in power (Supplementary Materials Table S4), there is still potential for improvement in datasets with more visits, in which case the estimate of w_i will be more precise.

Our method uses a two-step model where in the first step we estimate w and use it to calculate SHAVE, and in the second step, SHAVE is regressed on G to obtain the estimate of β and the corresponding z-statistic. One potential improvement would be to use a one-step model, in which w and β are estimated jointly. However, if the genetic variance of the top SNP of a trait, which is equal to β²Var(G), is small relative to σ², the expected increase in power will be negligible. Moreover, preliminary results comparing both one-step and two-step models were nearly identical (Supplementary Materials Table S5).

The derived relations of LOD score ratios, of which the simplest and most practical is the balanced-design relation between the Average and a single visit (equation 4), can be applied in cost-benefit analysis for signal improvement in the usage of measuring devices and in experimental design. For example, suppose we are considering adding an additional visit for a trait, and that we have some preliminary GWAS results for a given SNP. Under the assumption that the signal is true, by estimating the sample correlation between visits, one could estimate the potential increase in significance for that SNP if an additional visit were obtained. This can provide guidance in planning research. Moreover, one can estimate the potential increase in significance for epidemiological studies and GWAS.
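The cost-benefit reasoning above can be made concrete with a small Python sketch. It assumes a balanced design and uses the approximation k/(1+(k−1)ρ) for the expected LOD ratio between the Average of k visits and a single visit. That approximation follows from the random-intercept model under a small-effect assumption and is not quoted from the paper. The function names are hypothetical.

```python
def gain_from_extra_visit(k, rho):
    """Expected factor by which z^2 (or LOD) grows going from k to k+1 visits."""
    before = k / (1 + (k - 1) * rho)
    after = (k + 1) / (1 + k * rho)
    return after / before


def equivalent_extra_participants(n, k, rho):
    """Extra participants (each with k visits) giving the same expected gain,
    using the proportionality of z^2 to sample size."""
    return n * (gain_from_extra_visit(k, rho) - 1)
```

For example, with ρ=0.5 a second visit is expected to multiply z² by 4/3, roughly what recruiting one-third more participants with a single visit would achieve, whereas a third visit adds only a further factor of 9/8. The benefit of each additional visit shrinks as visits accumulate and as the correlation between visits increases.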

In summary, SHAVE takes advantage of multiple trait measurements to boost statistical power for GWAS of quantitative traits. Although the specific weighting scheme used in this paper is a simple version that is easy to implement even in large-scale GWAS, there are many additional ways to improve the method. The method can also be adapted for more complicated scenarios with unique trait characteristics. For example, traits such as pulse wave velocity²³ show a trait variance that increases with age, in which case weights can be estimated as a function of age. Other traits such as systolic and diastolic blood pressure²⁴ show a trait variance that increases with the magnitude of the measurement, in which case weights can be estimated as a function of the trait. Such new weighting schemes could potentially further increase the statistical power of genetic studies of quantitative traits.

References

Astrand M, Mostad P, Rudemo M : Empirical Bayes models for multiple probe type microarrays at the probe level. BMC Bioinform 2008; 9: 156.
Article Google Scholar
Meirelles O : Statistical Methods in Microarrays and High-Throughput Flow Cytometry, (PhD thesis). Albuquerque, NM: University of New Mexico, 2009.
Google Scholar
Ritchie ME, Diyagama D, Neilson J et al: Empirical array quality weights in the analysis of microarray data. BMC Bioinform 2006; 7: 261.
Article Google Scholar
Sjogren A, Kristiansson E, Rudemo M, Nerman O : Weighted analysis of general microarray experiments. BMC Bioinform 2007; 8: 387.
Article Google Scholar
Powers BJ, Olsen MK, Smith VA, Woolson RF, Bosworth HB, Oddone EZ : Measuring blood pressure for decision making and quality reporting: where and how many measures? Ann Intern Med 2011; 154: 781–788, W-289-790.
Article Google Scholar
Efron B, Morris C : Steins estimation rule and its competitors – empirical Bayes approach. J Am Stat Assoc 1973; 68: 117–130.
Google Scholar
Morris CN : Parametric empirical Bayes inference – theory and applications. J Am Stat Assoc 1983; 78: 47–55.
Article Google Scholar
Stephens M, Balding DJ : Bayesian statistical methods for genetic association studies. Nat Rev Genet 2009; 10: 681–690.
Article CAS Google Scholar
Pilia G, Chen WM, Scuteri A et al: Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet 2006; 2: e132.
Article Google Scholar
Lee PM : Bayesian Statistics – an Introduction. 3rd ed, 2004 London, UK: Hodder Arnold, pp 238–241.
Google Scholar
Abecasis GR, Cherny SS, Cookson WO, Cardon LR : Merlin–rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 2002; 30: 97–101.
Article CAS Google Scholar
Chambers JC, Zhang W, Sehmi J et al: Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat Genet 2011; 43: 1131–1138.
Article CAS Google Scholar
Kolz M, Johnson T, Sanna S et al: Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations. PLoS Genet 2009; 5: e1000504.
Article Google Scholar
Lango Allen H, Estrada K, Lettre G et al: Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 2010; 467: 832–838.
Article CAS Google Scholar
Marroni F, Pfeufer A, Aulchenko YS et al: A genome-wide association scan of RR and QT interval duration in 3 European genetically isolated populations: the EUROSPAN project. Circ Cardiovasc Genet 2009; 2: 322–328.
Article CAS Google Scholar
Pfeufer A, van Noord C, Marciante KD et al: Genome-wide association study of PR interval. Nat Genet 2010; 42: 153–159.
Article CAS Google Scholar
Pichler I, Minelli C, Sanna S et al: Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum Mol Genet 2011; 20: 1232–1240.
Article CAS Google Scholar
Prokopenko I, Langenberg C, Florez JC et al: Variants in MTNR1B influence fasting glucose levels. Nat Genet 2009; 41: 77–81.
Article CAS Google Scholar
Sanna S, Busonero F, Maschio A et al: Common variants in the SLCO1B3 locus are associated with bilirubin levels and unconjugated hyperbilirubinemia. Hum Mol Genet 2009; 18: 2711–2718.
Article CAS Google Scholar
Teslovich TM, Musunuru K, Smith AV et al: Biological, clinical and population relevance of 95 loci for blood lipids. Nature 2010; 466: 707–713.
Article CAS Google Scholar
Uda M, Galanello R, Sanna S et al: Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of beta-thalassemia. Proc Natl Acad Sci USA 2008; 105: 1620–1625.
Article CAS Google Scholar
Johnson AD, Handsaker RE, Pulit SL, Nizzari MM, O'Donnell CJ, de Bakker PI : SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinform 2008; 24: 2938–2939.
Article CAS Google Scholar
Rogers WJ, Hu YL, Coast D et al: Age-associated changes in regional aortic pulse wave velocity. J Am Coll Cardiol 2001; 38: 1123–1129.
Article CAS Google Scholar
de Lange M, Spector TD, Andrew T : Genome-wide scan for blood pressure suggests linkage to chromosome 11, and replication of loci on 16, 17, and 22. Hypertension 2004; 44: 872–877.

Acknowledgements

This research was supported in part by the Intramural Research Program of the National Institute on Aging, NIH. This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov). The SardiNIA (Progenia) team was supported by Contract NO1-AG-1-2109 from the NIA; the efforts of GRA were supported in part by contract 263-MA-410953 from the NIA to the University of Michigan and by research grants HG002651 and HL084729 from NIH (to GRA).

Author information

Authors and Affiliations

Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
Osorio D Meirelles, Jun Ding, Hsih-Te Yang, Dawood B Dudekula & David Schlessinger
Department of Health and Human Services, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
Toshiko Tanaka & Luigi Ferrucci
Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Monserrato, Cagliari, Italy
Serena Sanna & Francesco Cucca
Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
Goncalo Abecasis

Corresponding author

Correspondence to Osorio D Meirelles.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies the paper on European Journal of Human Genetics website

Rights and permissions

This work is licensed under the Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

About this article

Cite this article

Meirelles, O., Ding, J., Tanaka, T. et al. SHAVE: shrinkage estimator measured for multiple visits increases power in GWAS of quantitative traits. Eur J Hum Genet 21, 673–679 (2013). https://doi.org/10.1038/ejhg.2012.215

Received: 30 December 2011
Revised: 20 July 2012
Accepted: 26 July 2012
Published: 24 October 2012
Issue Date: June 2013
DOI: https://doi.org/10.1038/ejhg.2012.215
