Introduction

Large-scale association analyses using whole genome sequence (WGS) data on thousands of participants are now underway, through programs such as the NHLBI’s Trans-Omics for Precision Medicine (TOPMed) program and NHGRI’s Genome Sequencing Program. Unlike earlier Genome-Wide Association Studies (GWASs), where data were combined using meta-analyses of summary statistics, in WGS analyses participant-level data from multiple studies are often pooled, and results are obtained from a single analysis. Pooled analysis of WGS is useful due to its computational tractability and its ability to control for genetic relatedness across the pooled datasets. However, it is sensitive to a form of population stratification that is not well known. Population stratification in genetic association analysis1,2 typically refers to situations where the mean phenotype value and the allele frequency both differ across population subgroups. Unless appropriately accounted for in the analysis, e.g., by using regression-based adjustments for ancestry such as principal components or genetic relatedness matrices in linear mixed models, or their combination, it can lead to false-positive associations3,4,5. Population stratification more generally refers to differences in phenotype distribution and allele frequency across population subgroups6, and hence can also manifest as differences in phenotype variances by subgroup, combined with differences in allele frequencies. In practice, this phenomenon is common in pooled analysis of multi-study data: small differences in allele frequencies are prevalent, and the studies being pooled often have different measurement protocols, environmental exposures, and inclusion criteria, all of which can lead to different phenotype variances among studies.

Previous studies have considered the effect of combining groups with different phenotypic variances. Haldar and Ghosh6 studied the effect of population stratification due to mean differences, variance differences, and, more generally, phenotypic heterogeneity across subpopulations on false positive detections when testing variant associations with a quantitative trait. Conomos et al.7 showed that when testing variant associations in a pooled sample of Hispanics/Latinos from different Hispanic background groups, the statistical properties of the tests are improved when the model allows for different residual variances in the different background groups. Musharoff et al.8, in a preprint, studied population variance structure using statistical models of both population means and variances, and developed statistical tests for the association of genetic variants with phenotypic variability.

In this manuscript, we develop variant-specific inflation factors λvs, which quantify the degree of inflation/deflation in association testing of a single genetic variant due to population stratification at the variance level. We develop an algorithm to compute approximate variant-specific inflation factors based on allele frequencies and variances in the groups pooled together for analysis, demonstrate their usage for assessing model fit, and demonstrate the implications of population stratification at the variance level in simulations and in analyses of WGS data from TOPMed. To account for population stratification at the variance level, we use the computationally efficient and scalable approach proposed in Conomos et al.7 and implemented in GENESIS9, and show in simulations that it indeed addresses the variance stratification problem in scenarios based on Musharoff et al.8.

Results

Simulation studies

Our simulations comprised 576 settings defined by various combinations of parameters. We compared a few ways to estimate the variance parameters used in computing λvs: empirical variances based on the homogeneous and stratified variance models, and model-based variances from the heterogeneous variance model. The estimated λvs were essentially the same regardless of the method. Figure 1 compares the estimated λvs to the observed λgc in each of the simulation settings and in each of the two modeling approaches (homogeneous versus stratified variance). Settings are divided according to patterns determining whether variance stratification is expected, including same or different MAFs between the two studies, same or different error variances, and whether or not the PC affects the genetic variance. The top three rows in Fig. 1 show settings in which both the MAF and the total variances differ between the two combined studies, including settings in which both the error and genetic variance components are the same but the PC affects the genetic variance, resulting in different total variances between the studies because the mean of the PC differs between them. In these settings, variance stratification is observed when using the homogeneous variance model, in that the observed inflation can be substantially higher or lower than 1, with exact values depending on the specific parameters used in each simulation. Indeed, the computed λvs and the observed λgc are highly correlated. In contrast, the stratified variance model was robust to variance stratification across all settings, with observed inflation around 1 in all simulations. The bottom two rows of Fig. 1 show settings in which either the MAF or the variances are the same in the two combined studies. In these settings, the expected inflation computed by λvs is always 1 (no inflation). As expected, the observed inflation is the same in the homogeneous and stratified variance models. The spread seen in the values of the observed inflation, with some values higher and some lower than the desired 1, is consistent with what is expected given the number of simulation replications in each setting (10,000); see Supplementary Information for more details.

Fig. 1: Estimated variant-specific inflation factors versus observed inflation in simulations.

The figure compares variant-specific inflation factors λvs estimated in each of many simulation settings with the corresponding observed inflation λgc averaged across 10,000 repetitions of each simulation setting. Observed inflation values are provided based on a homogeneous variance model, in which a single variance parameter is estimated using the aggregated data, and based on a stratified variance model, which fits a different variance parameter to each of the two simulated studies. Each simulation setting corresponds to a single point on this figure, and the simulations are grouped (denoted by different colors and symbols) by the characteristics stated in the legend. Within each group of simulation settings, the simulations differ by specific parameter values, including MAFs, variance components, and sample sizes, while still satisfying the broad conditions of the grouped simulation settings. The dashed horizontal lines correspond to the 2.5% and 97.5% quantiles of the distribution of λgc based on 10,000 variants under the null of no inflation/deflation, obtained from simulations.

Genetic association analysis of BMI and hemoglobin concentration in TOPMed

We demonstrate the variance stratification problem in analyses of hemoglobin concentrations (HGB, N = 7596; three analysis groups) and body mass index (BMI, N = 9807; eight analysis groups) in the TOPMed freeze 4 dataset. In both analyses we computed approximate variant-specific inflation factors λvs. We investigated the inflation/deflation problems resulting from variance stratification, and verified that the patterns of inflation and deflation in the homogeneous variance analysis agree, across the different variants, with those obtained from the formula and the provided code. Figures 2 and 3 provide quantile-quantile (QQ)-plots for the HGB and BMI analyses, respectively, for three categories of variants, where theory predicts inflation \((\lambda_{vs}\ge 1.01)\), deflation \((\lambda_{vs}\le 0.99)\), or approximately no inflation (\(0.99 < {\lambda }_{vs} < 1.01\)), as well as across all variants. The plots overlay the results from the four analysis methods. While the homogeneous variance model clearly produces inflated and deflated QQ-plots in line with the theoretical expectation, when looking at all tested variants together, the inflation and deflation mask each other. These problems do not “cancel out”: one creates more Type I errors and the other more Type II errors, yet the plot of all results may lead investigators to conclude that the analysis is well calibrated. In contrast, the stratified residual variance model provides good control of Type I errors, as seen in the QQ-plots, with the exception of the bottom left panel in Fig. 3, which provides QQ-plots for the set of variants that are expected to have deflated test statistics under the pooled variance model when studying BMI: here the stratified residual variance model was also somewhat deflated. Figure 4 provides the genomic control inflation factors λgc computed over each of the variant sets provided in the QQ-plots and for each of the traits. The completely stratified and MetaCor models performed better in terms of overall QQ-plots and computed λgc values in the two analyses, in that λgc values were always closer to 1. MetaCor performed slightly better than the completely stratified model under independence, likely because it accounts for a low degree of relatedness between the strata.

Fig. 2: QQ-plots comparing observed and expected p values (−log10 transformed) from the analysis of hemoglobin concentrations.

The analyses used four approaches: a “homogeneous variance” model, which assumes that all groups in the analysis have the same variances; a “stratified variance” model, which allows for different residual variances across analysis groups; a “completely stratified indep” model, in which analysis groups were analyzed separately, allowing for both heterogeneous residual and genetic variances across groups, and then combined in a meta-analysis under an independence assumption; and “MetaCor”, a procedure that accounts for relatedness across strata in the meta-analysis. The QQ-plots are provided across sets of variants classified by their inflation/deflation patterns according to the algorithm for variant-specific approximate inflation factors. We categorized variants as “Approx. no inflation” when their estimated λvs was between 0.99 and 1.01, “Deflated” when their estimated λvs was lower than 0.99, and “Inflated” when their estimated λvs was higher than 1.01.

Fig. 3: QQ-plots comparing observed and expected p values (−log10 transformed) from the analysis of BMI.

The analyses used four approaches: a “homogeneous variance” model, which assumes that all groups in the analysis have the same variances; a “stratified variance” model, which allows for different residual variances across analysis groups; a “completely stratified indep” model, in which analysis groups were analyzed separately, allowing for both heterogeneous residual and genetic variances across groups, and then combined in a meta-analysis under an independence assumption; and “MetaCor”, a procedure that accounts for relatedness across strata in the meta-analysis. The QQ-plots are provided across sets of variants classified by their inflation/deflation patterns according to the algorithm for variant-specific approximate inflation factors. We categorized variants as “Approx. no inflation” when their estimated λvs was between 0.99 and 1.01, “Deflated” when their estimated λvs was lower than 0.99, and “Inflated” when their estimated λvs was higher than 1.01.

Fig. 4: Estimated genomic control inflation factors (λgc) across compared analyses.

The figure provides estimated λgc from the various analyses of BMI and hemoglobin concentrations, computed across sets of variants classified by their inflation/deflation patterns according to the algorithm for approximate variant-specific inflation factors (λvs). We categorized variants as “Approx. no inflation” when their estimated λvs was between 0.99 and 1.01, “Deflated” when their estimated λvs was lower than 0.99, and “Inflated” when their estimated λvs was higher than 1.01. Genomic control inflation factors λgc were computed as the ratio of the median \({\chi }_{1}^{2}\) test statistic across variants in the set to the theoretical median of the test statistic under the null hypothesis of no association.

Table 1 describes the inflation/deflation patterns of variants according to their MAF. One can see that the inflation/deflation problem is ubiquitous for rare variants, but less so for common variants. In fact, among variants with frequency <0.05, only ~4% have λvs falling in the “approximately no inflation” category. This is because the ratio between allele frequencies has a strong effect on inflation/deflation, and such ratios can become quite large when variants are rare. In the Supplementary Information, Figure S2 shows the distribution of the inflation, deflation, and “approximately no inflation” categories across variants in the two analyses, and demonstrates how similar the deflation/inflation categories are between them. Most variants stay in the same category between analyses, but some rare variants (in the figure, defined as MAF < 0.05) can be inflated in one analysis and deflated in the other. These differences arise because λvs coefficients are affected by sample sizes, variances, and allele frequencies, which all differ to some extent between the analyses due to different samples and trait characteristics.

Table 1 Variant inflation/deflation characteristics by categories of MAF.

Discussion

A standard tool for the analysis of quantitative traits is linear or linear mixed model regression. In its widely used default version, linear regression is fitted under the assumption that the phenotype’s residual variance is the same for all individuals in the analyzed sample. The extent of the consequences when the variances are not equal can be computed exactly under simplifying assumptions. Broadly, using default approaches, if a specific subgroup has a larger phenotypic variance than the other subgroups in the pooled analysis, the estimated variance of the association estimate will understate the contribution from that subgroup. The result is inflation (too many false positives) for variants whose allele frequency is greater in this subgroup than in the others, or deflation (loss of power) for variants whose allele frequency is lower in this subgroup than in the others.

While default linear regression methods assume the same variance for all subgroups, which leads to mis-calibrated tests when the assumption does not hold, standard computational tools can be adapted to allow for a stratified variance model, yielding better calibrated tests. Specifically, the problem can be alleviated by fitting a different residual variance for each study or, more generally, for each appropriately defined “analysis group” (e.g., all African Americans of a specific study). This can be viewed as fitting a different variance component for noise within each study, or as a weighted least squares approach in which the group-dependent weights are estimated. This approach is implemented in some standard genetic analysis software packages (e.g., GENESIS9). Our mathematical derivation and code can be used to assess the degree of miscalibration of association tests. The code uses an additive model, with a Binomial distribution for allele counts, as is common in GWAS. Inflation/deflation trends should be similar between additive and dominant models, though the specific values estimated under the two models would not be identical.
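
To make the weighted least squares view concrete, the following is a minimal base-R sketch on simulated data. Object names are illustrative, and this two-step version is only meant to convey the idea; it is not the GENESIS implementation, which estimates the group variances and regression parameters jointly.

```r
# Minimal base-R sketch of the stratified (group-weighted) variance idea on
# simulated data; illustrative only, not the GENESIS implementation.
set.seed(1)
n1 <- 2000; n2 <- 2000
study <- factor(rep(c("A", "B"), c(n1, n2)))
g <- c(rbinom(n1, 2, 0.05), rbinom(n2, 2, 0.25))   # different allele frequencies
y <- c(rnorm(n1, sd = 1), rnorm(n2, sd = 2))       # different residual variances, null variant

fit0 <- lm(y ~ g + study)                          # homogeneous variance fit
v_group <- tapply(resid(fit0)^2, study, mean)      # per-group residual variances
fit1 <- lm(y ~ g + study,                          # weighted refit: stratified variance model
           weights = 1 / v_group[as.character(study)])

summary(fit0)$coefficients["g", ]                  # Wald test under homogeneous variance
summary(fit1)$coefficients["g", ]                  # Wald test under stratified variance
```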

In linear regression, the stratified residual variance model allows every analysis group to have its own residual variance parameter. In the mixed model setting, where the variance is decomposed into genetic and residual variances, this model keeps the genetic variance component the same across groups but allows the residual variance to differ. Analysis groups can be defined by study, race/ethnicity, combinations of these, or any other sample characteristics that affect trait variance and may also correlate with allele frequencies. Our mathematical derivation and code for computing λvs are developed under simplifying assumptions of no covariate effects and independent observations, and therefore make no distinction between genetic and residual variance components. While in the linear regression setting (independent observations) the stratified variance model clearly suffices to account for variance heterogeneity, in the mixed model setting a residual variance stratification model may not be optimal, because it may not fully account for stratification in the genetic variance, which could be the result of study design. For example, in Fig. 5, the estimated genetic variance component of the Cleveland Family Study is much higher than those of the other studies, and than the residual variance component of the same study, perhaps because study participants were selected from families with obstructive sleep apnea, which is highly associated with obesity.

Heterogeneity in genetic variance is addressed by the “completely stratified model”, but such a model requires that individuals are independent between different groups (strata). We also used MetaCor, a method that allows for complete stratification of analysis groups while keeping genetically related individuals across these groups15. MetaCor was shown to have good statistical properties and performed well in the BMI and HGB analyses. However, it is currently computationally costlier than a pooled analysis because individual-level data are used both in the per-group computations and when computing covariances between effect size estimates across analysis groups. Computational efficiency is critical when testing the large number of variants observed in WGS studies. In addition, the MetaCor approach has not yet been extended to tests of sets of rare variants (rather than the single-variant tests studied in the current manuscript). While more difficult to assess, variance stratification likely affects tests of rare variant sets as well, and methods that use a score test based on a null model fit once, such as the stratified variance approach implemented in GENESIS, extend straightforwardly to such settings.

As the sample sizes of TOPMed grow, pooling together more diverse studies and populations, variance stratification problems may become more severe. Models allowing for pooled analysis with both group-specific residual and genetic variances, or robust variance estimates, may be needed for better control of Type I errors and increased efficiency. Until such methods are developed, we recommend first using the stratified variance approach, because it is computationally efficient, it can account for relatedness across the entire sample, and the same null model can be used to test variant sets. As a second step, we recommend computing approximate λvs and assessing whether observed inflation/deflation remains for test statistics within groups of variants predicted to be inflated/deflated based on their λvs values. If inflation/deflation is observed despite residual variance stratification, the analyst would ideally move forward with a meta-analytic approach such as MetaCor (which does not discard data but is computationally more demanding), or with standard meta-analysis after removing individuals to generate genetically independent strata.

Fig. 5: Estimated variance components across compared analyses.

The figure provides the estimated variance components corresponding to residual and genetic relatedness in the analyses of BMI and hemoglobin concentration (HGB). For each analysis group, the estimated variance components were computed based on the analysis of the group alone, and were extracted from the second null model in the fully-adjusted two-stage rank-normalization procedure, to match the procedure used for association analysis.

Methods

The linear model

For a total sample size of n, we assume that the data follow a linear model denoted as

$${y}_{i}={\beta }_{0}+{g}_{i}\beta +{{\epsilon }}_{i},\,1 \le i \le n,$$
(1)

where yi is the trait or phenotype value of person i, gi is their count of coded alleles (i.e., genotype), β0 denotes the mean outcome in those with no copies of the coded allele, β denotes the effect on the mean trait of each additional copy of the coded allele, and the \({{\epsilon }}_{i}\) are residual errors, which, as a simplifying assumption, we take for now to be independent.

To provide intuition for the variance stratification problem, we first present a mathematical derivation in simplified settings. We assume that the genotype effect is null (β = 0) and that the errors follow a normal distribution \({{\epsilon }}_{i}\sim N(0,\,{\sigma }_{i}^{2})\). We further assume that the phenotypes are centered and that the genotypes are centered and follow a dominant mode of inheritance, i.e., we use \({g}_{i}=({\tilde{g}}_{i}-p)\), where \({\tilde{g}}_{i}\) is the genotype under a dominant mode (taking values 1 or 0), p is the frequency of carrying any copy of the variant allele, and gi is used in the analysis.

Implication of variance stratification on the Wald test

The Wald test quantifies the strength of the genetic association by dividing a regression-based estimate of β by its corresponding estimated standard error. The linear regression estimate of the effect (written in the general regression form) is

$${\hat{\beta}}=\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}\right)^{-1}\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}{y}_{i}\right).$$
(2)

Denoting the residual variance of individual i by \({\sigma }_{i}^{2}\) (which may differ across individuals), the variance of \(\hat{\beta }\) is

$${\rm{var}}({\hat{\beta }})=\,\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}\right)^{-2}\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}{\sigma }_{i}^{2}\right).$$
(3)

When the variance of the residuals is homogeneous across all individuals, this is

$${\rm{var}}({\hat{\beta}})=\,\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}\right)^{-2}\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}\right){\sigma }^{2}=\,\left(\mathop{\sum }\limits_{i=1}^{n}{g}_{i}^{2}\right)^{-1}{\sigma }^{2},$$
(4)

where σ2 is the common variance parameter. To illustrate how this approach can mislead under variance stratification, we consider the situation where two studies are present, of sizes n1 and n2, respectively, such that \({n}_{1}+{n}_{2}=n\). Each study is internally homogeneous, with error variances \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\), and we write p1 and p2 for the study-specific frequencies of the variant of interest (under the dominant mode). Because the genotypes are centered, for study \(j=1,2\) we have \({\rm{E}}\big[\mathop{\sum}\limits_{i\in {S}_{j}}{g}_{i}^{2}\big]={n}_{j}{\rm{E}}[{g}_{i}^{2}]={n}_{j}\left[{p}_{j}{(1-{p}_{j})}^{2}+(1-{p}_{j}){(0-{p}_{j})}^{2}\right]={n}_{j}{p}_{j}(1-{p}_{j}).\) We can re-write Eq. (3) as:

$${\rm{var}}({\hat{\beta}})=\frac{{n_{1} p_{1}(1-p_{1})\sigma_{1}^{2}+ n_{2} p_{2}(1- p_{2})\sigma_{2}^{2}}}{{[n_{1}{p}_{1}(1- p_{1})+ n_{2} p_{2}(1- p_{2})]^{2}}}$$
(5)

We see that the actual variance is a linear combination of the variance parameters \({\sigma }_{1}^{2}\) and \({\sigma }_{2}^{2}\), with the weight assigned to each depending on the minor allele frequency and sample size in the corresponding group. When the minor allele frequencies (MAFs) are equal, p1 = p2, the two forms (4) and (5) are equal: the genotype distribution is then the same in the two studies, and no confounding with the variance structure occurs. But when \({p}_{1}\;\ne\; {p}_{2}\), the variance of the estimator upweights the residual variance in the group where the variant is more common, which the homogeneous-variance form (4) does not capture. This result generalizes straightforwardly to M studies.
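
As a numerical illustration of Eqs. (4) and (5), the short R snippet below evaluates both forms for assumed example values; for Eq. (4), the common σ2 is taken, purely for illustration, to be the sample-size-weighted average of the two study variances.

```r
# Numerical illustration of Eqs. (4) and (5); example parameter values are assumed.
var_beta <- function(n1, n2, p1, p2, s1sq, s2sq) {
  w1 <- n1 * p1 * (1 - p1)      # expected sum of squared centered genotypes, study 1
  w2 <- n2 * p2 * (1 - p2)      # expected sum of squared centered genotypes, study 2
  het <- (w1 * s1sq + w2 * s2sq) / (w1 + w2)^2                # Eq. (5)
  hom <- ((n1 * s1sq + n2 * s2sq) / (n1 + n2)) / (w1 + w2)    # Eq. (4), pooled variance
  c(heterogeneous = het, homogeneous = hom)
}

var_beta(5000, 5000, p1 = 0.01, p2 = 0.05, s1sq = 2, s2sq = 1)  # p1 != p2: forms differ
var_beta(5000, 5000, p1 = 0.05, p2 = 0.05, s1sq = 2, s2sq = 1)  # p1 == p2: forms coincide
```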

In some studies, researchers use mixed models in GWAS to account for genetic relationships between individuals. The variance is then usually assumed to decompose into error and genetic variance components, so that \({\rm{var}}({\epsilon })=\,{\sigma }_{e}^{2}+{\sigma }_{g}^{2}\). When using unrelated individuals and not accounting for genetic relatedness via a genetic relationship matrix, the two variance components are not identifiable, and accounting for differences in error variances is the same as accounting for differences in total variance (the sum of the two variance components). Musharoff et al.8 introduced a model in which the variance depends on individual-specific genetic components. For example, it could depend on a principal component (PC) of the genetic data, with \({\rm{var}}({\epsilon })={\sigma }_{e}^{2}+\,{\theta }_{g}^{2}{\sigma }_{g}^{2}\), where θg is the PC value, varying across individuals. We address this setting in simulations.

Computing approximate variant-specific inflation factors

We can use mathematical derivations under homogeneity and heterogeneity, relaxing the restrictive assumptions made earlier, to compute variant-specific inflation factors. These make use of the standard “sandwich” formula for large-sample approximations of the variance of estimators; for a minimally technical summary see Result 2.1 in Wakefield10, or for more detail Sections 5.2–5.3 of Van der Vaart11. We now allow for an additive genetic model and do not assume that the genotypes are standardized. The variance estimator used by the Wald test is as follows:

$${\rm{var}}({({\hat{\beta }}_{0},{\hat{\beta }})}^{T}\,)=\,\left[\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)^{T}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\right]^{-1}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)^{T}\\ \times cov({\boldsymbol{y}})\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\left[{\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)}^{T}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\right]^{-1}$$

which simplifies, if \({\rm{var}}({y}_{i})=\,{\sigma }^{2}\) for all \(i=1,\,\ldots ,\,n\), to:

$${\rm{var}}({({\hat{\beta }}_{0},\hat{\beta })}^{T}\,)=\,{\left[{\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)}^{T}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\right]}^{-1}{\sigma }^{2},$$

but allowing for different variances per study, it becomes:

$${\rm{var}}(({\hat{\beta}}_{0},{\hat{\beta}})^{T})= \, \left[\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)^{T}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\right]^{-1}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)^{T}\\ \, \times\left(\begin{array}{ccc}{\sigma }_{1}^{2} & 0 & 0\\ 0 & \ddots & 0\\ 0 & 0 & {\sigma }_{M}^{2}\end{array}\right)\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\left[\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)^{T}\left(\begin{array}{cc}1 & {g}_{1}\\ \vdots & \vdots \\ 1 & {g}_{n}\end{array}\right)\right]^{-1}$$

Based on these two expressions, we propose an algorithm to compute an approximate variant-specific inflation factor. For computational purposes, we further simplify these expressions by taking advantage of the fact that there are repeated rows (e.g., people who have gi = 1 and are from the same study, and therefore have the same residual variance). The algorithm below uses the additional assumption that the phenotypic variance within each study does not vary with genotype, which must hold under the strong null hypothesis of no association in any subpopulation. It also uses the simplifying assumption that variants are in Hardy-Weinberg Equilibrium (HWE) within each study population; testing HWE is a standard preprocessing step for genotype data.

Algorithm for computing variant-specific inflation factors

Suppose that an analyst wishes to estimate a vector of regression parameters \({({\beta }_{g},{\beta }_{1},\ldots ,{\beta }_{M})}^{T}\), where βg is a variant association measure and \({\beta }_{1},\ldots ,\,{\beta }_{M}\) are intercepts for M analysis groups. Denote the genotype of the ith individual in the mth analysis group by gmi. The design matrix for estimating these parameters in linear regression is of the form

$$\left(\begin{array}{ccccc}{{\bf{g}}}_{1} & {{\bf{1}}}_{{n}_{1}} & {{\bf{0}}}_{{n}_{1}} & \ldots & {{\bf{0}}}_{{n}_{1}}\\ {{\bf{g}}}_{2} & {{\bf{0}}}_{{n}_{2}} & {{\bf{1}}}_{{n}_{2}} & \ldots & {{\bf{0}}}_{{n}_{2}}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {{\bf{g}}}_{M} & {{\bf{0}}}_{{n}_{M}} & {{\bf{0}}}_{{n}_{M}} & \ldots & {{\bf{1}}}_{{n}_{M}}\end{array}\right)$$

where \({{\bf{g}}}_{m}={({g}_{m1},\ldots ,{g}_{m{n}_{m}})}^{T}\), \({{\bf{1}}}_{{n}_{m}}\) is a vector of length nm with all entries equal to 1, and similarly \({{\bf{0}}}_{{n}_{m}}\) is a vector of length nm with all entries equal to 0. Let \({\rm{V}}={\rm{var}}({\boldsymbol{y}})\) be the diagonal matrix of error variances of the outcomes. The estimator of the variances and covariances of the vector of regression parameters is \({\rm{var}}(\hat{{\boldsymbol{\beta }}})={({{\rm{W}}}^{T}{\rm{W}})}^{-1}{{\rm{W}}}^{T}{\rm{VW}}{({{\rm{W}}}^{{\rm{T}}}{\rm{W}})}^{-1}\). From the matrix \({\rm{var}}(\hat{{\boldsymbol{\beta }}})\) we are interested in the leading diagonal entry, which is the variance of \({\hat{\beta }}_{g}\). Suppose first that one constructs the matrix W using the actual data. Then:

$$({{\rm{W}}}^{T}{\rm{W}})=\left(\begin{array}{ccccc}\mathop{\sum }\limits_{m=1}^{M}{{\bf{g}}}_{m}^{T}{{\bf{g}}}_{m} & {{\bf{g}}}_{1}^{T}{{\bf{1}}}_{{n}_{1}} & {{\bf{g}}}_{2}^{T}{{\bf{1}}}_{{n}_{2}} & \ldots & {{\bf{g}}}_{M}^{T}{{\bf{1}}}_{{n}_{M}}\\ {{\bf{g}}}_{1}^{T}{{\bf{1}}}_{{n}_{1}} & {n}_{1} & 0 & \ldots & 0\\ {{\bf{g}}}_{2}^{T}{{\bf{1}}}_{{n}_{2}} & 0 & {n}_{2} & \ldots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {{\bf{g}}}_{M}^{T}{{\bf{1}}}_{{n}_{M}} & 0 & 0 & \ldots & {n}_{M}\end{array}\right)$$

Now, instead of using the genotypes themselves, we use their large sample limits under HWE: we replace \({{\bf{g}}}_{m}^{T}{{\bf{1}}}_{{n}_{m}}\) by \({n}_{m}\times \left(0\times {p}_{m}^{2}+1\times 2{p}_{m}(1-{p}_{m})+2\times {(1-{p}_{m})}^{2}\right)\), where \({n}_{m}{p}_{m}^{2}\), \(2{n}_{m}{p}_{m}(1-{p}_{m})\), and \({n}_{m}{(1-{p}_{m})}^{2}\) are the numbers of individuals from analysis group m expected to have 0, 1, and 2 effect alleles under HWE. Similarly, we replace \({{\bf{g}}}_{m}^{T}{{\bf{g}}}_{m}\) by its large sample limit under HWE.

Notice that the quantity \(0\times {p}_{m}^{2}+1\times 2{p}_{m}(1-{p}_{m})+2\times {(1-{p}_{m})}^{2}\) is a product of two vectors: \((0,1,2)\times {({p}_{m}^{2},\,2{p}_{m}(1-{p}_{m}),\,{(1-{p}_{m})}^{2})}^{T}\). Thus, we define a matrix X and a matrix P such that \(({\rm{W}}^{{\rm{T}}}{\rm{W}})={\rm{X}}^{{\rm{T}}}{\rm{PX}}\). In matrix X, the left column has values \({(0,1,2,\ldots ,0,1,2)}^{T}\) (\({(0,1,2)}^{T}\) repeating for each study) in place of the observed genotypes \({({g}_{11},\ldots ,{g}_{M{n}_{M}})}^{T}\), and the other columns represent study-specific intercepts; the matrix P is a diagonal matrix providing the HWE genotype probabilities for each study, scaled by the proportion of individuals that the analysis group contributes to the pooled sample. We use the matrices X, P, and V = var(y) to similarly replace WTVW by its large sample limit under HWE. Specifically, define:

  • \({\rm{X}}=({\rm{G}}\,{\rm{D}})\), where G is a vector of length 3M of the form \({(0,1,2,\ldots ,0,1,2)}^{T}\), and D is a 3M × M design matrix modeling study-specific intercepts, whose i, j element Dij is

    $$\left\{\begin{array}{ll}1 & {\rm{if}}\;3j-2\le i\le 3j\\ 0 & {\rm{otherwise}}.\end{array}\right.$$
  • P, a 3M × 3M diagonal matrix, in which each entry gives the population proportion in each combination of genotype and study, i.e.,:

$${\rm{P}}=diag\left(\frac{{n_{1}}}{n}{p}_{1}^{2},\,\frac{{n_{1}}}{n}2{p}_{1}(1-{p}_{1}),\,\frac{{n_{1}}}{n}{(1-{p}_{1})}^{2},\,\ldots\right.$$
$$\left.\frac{{n_{M}}}{n}{p}_{M}^{2},\,\frac{{n_{M}}}{n}2{p}_{M}(1-{p}_{M}),\,\frac{{n_{M}}}{n}{(1-{p}_{M})}^{2}\right)$$

  • V, a 3M × 3M diagonal matrix, in which each entry gives the outcome variance in each combination of genotype and study:

$${\rm{V}}=diag({\sigma }_{1}^{2},{\sigma }_{1}^{2},{\sigma }_{1}^{2},\,\ldots ,\,{\sigma }_{M}^{2},\,{\sigma }_{M}^{2},\,{\sigma }_{M}^{2})$$

Define now \({\rm{B}}={{\rm{X}}}^{T}{\rm{PX}}\) and \({\rm{A}}={{\rm{X}}}^{T}{\rm{PVX}}\), which give the large sample limits of \(({{\rm{W}}}^{T}{\rm{W}})\) and \({{\rm{W}}}^{T}{\rm{VW}}\). Under heterogeneity, the variance of the slope estimate \({\hat{\beta }}_{g}\) is proportional to the leading diagonal entry of \({{\rm{B}}}^{-1}{{\rm{AB}}}^{-1}\). Under homogeneity, the variance of \({\hat{\beta }}_{g}\) is proportional to the leading entry of \({{\rm{B}}}^{-1}\times {\rm{sum}}(diag({\rm{PV}}))\), with the same constant of proportionality. The ratio of the former leading entry to the latter gives the large-sample value of λgc, the genomic control inflation factor12 that would be obtained by comparing the median Wald test statistic to the median of the \({\chi }_{1}^{2}\) reference distribution, if all tested variants shared this variant’s study-specific MAF values. Because the formula gives different results for each variant, depending on its allele frequencies, we denote this ratio (the heterogeneous-model value relative to the homogeneous-model value) by λvs, for “variant specific”. Note that this computation requires estimates of the group-specific variances (for constructing matrix V) and allele frequencies (for constructing matrix P, under the HWE assumption), which are readily obtained.
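
The following is a minimal R sketch of the matrix computation described above. It is not the released function in the GitHub repository, and the example input values are assumed purely for illustration.

```r
# Minimal sketch of the lambda_vs matrix computation described above (not the
# released function at https://github.com/tamartsi/Variant_specific_inflation).
# Inputs: per-group sample sizes n, frequencies p (ordered so that the HWE genotype
# probabilities are p^2, 2p(1-p), (1-p)^2, as in matrix P), and group variances sigma_sq.
lambda_vs <- function(n, p, sigma_sq) {
  M <- length(n)
  G <- rep(c(0, 1, 2), times = M)                          # genotype column of X
  D <- kronecker(diag(M), matrix(1, nrow = 3, ncol = 1))   # group-specific intercepts (3M x M)
  X <- cbind(G, D)
  P <- diag(rep(n / sum(n), each = 3) *
            as.vector(sapply(p, function(q) c(q^2, 2 * q * (1 - q), (1 - q)^2))))
  V <- diag(rep(sigma_sq, each = 3))
  B <- t(X) %*% P %*% X
  A <- t(X) %*% P %*% V %*% X
  Binv <- solve(B)
  var_het <- (Binv %*% A %*% Binv)[1, 1]                   # heterogeneous variance model
  var_hom <- Binv[1, 1] * sum(diag(P %*% V))               # homogeneous variance model
  var_het / var_hom                                         # values > 1 indicate expected inflation
}

# Example with assumed values: two groups, where the larger-variance group also has
# a larger frequency of the coded allele (coded allele frequency is 1 - p here).
lambda_vs(n = c(5000, 5000), p = c(0.99, 0.95), sigma_sq = c(1, 2))
```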

An R function implementing these matrix calculations is provided, together with a tutorial that includes a coding example. These are also provided on GitHub at https://github.com/tamartsi/Variant_specific_inflation, and the function will be integrated into the GENESIS R package.

Simulation studies

We performed simulations to study the appropriateness of the proposed λvs, in terms of how well it approximates the standard genomic control coefficient λgc obtained from a “homogeneous variance” model that estimates a single variance parameter across data from all studies. We also studied whether a “stratified variance” model, allowing for different variance parameters across two studies, improves upon the homogeneous variance model. To this end, we simulated unassociated genetic variants and outcomes in a range of settings combining two studies. We simulated \({n}_{1}\) and \({n}_{2}\) individuals in study 1 and study 2, with \({n}_{1}+{n}_{2}=n\). Let yi be the outcome value of person \(i,\,i=1,\,\ldots ,\,n\), and \({\theta }_{i}\) the PC value for this person; let \({\alpha }_{1}=1\) and \({\alpha }_{2}=2\) be study-specific intercepts for studies 1 and 2, \({\sigma }_{1}^{2},\,{\sigma }_{2}^{2}\) study-specific error variances, \({\sigma }_{g}^{2}\) a common genetic variance parameter, and \({\alpha }_{\theta }=1\) the linear association of the PC with the outcome. The PC was simulated from a normal distribution with variance 1 and mean \({\mu }_{1}=2\) in study 1, and mean μ2 in study 2 computed such that the overall PC mean in the two studies together is equal to zero (i.e., \({n}_{1}{\mu }_{1}+{n}_{2}{\mu }_{2}=0\)). The outcome model was specified as:

$${y}_{i}={\alpha }_{1}{1}_{stud{y}_{1}}+{\alpha }_{2}{1}_{stud{y}_{2}}+{\theta }_{i}{\alpha }_{\theta }+{{\epsilon }}_{i},$$
(6)

with

$${{\epsilon }}_{i}\sim N(0,\,{\sigma }_{1}^{2}{1}_{stud{y}_{1}}+{\sigma }_{2}^{2}{1}_{stud{y}_{2}}+{\sigma }_{g}^{2}),$$
(7)

or

$${{\epsilon }}_{i}\sim N(0,\,{\sigma }_{1}^{2}{1}_{stud{y}_{1}}+{\sigma }_{2}^{2}{1}_{stud{y}_{2}}+{\theta }_{i}^{2}{\sigma }_{g}^{2}).$$
(8)

In model (7) the PC does not affect the genetic outcome variance, while in model (8) it does. Some of the parameters were the same in all simulations (as reported above). We varied the following parameters: \({n}_{1},{n}_{2}\in \{1000,\,5000\}\), \({\sigma }_{1}^{2},{\sigma }_{2}^{2},\,{\sigma }_{g}^{2}\in \{1,\,2\}\), and we simulated bi-allelic independent genetic variants with MAFs \({p}_{1},{p}_{2}\in \{0.01,\,0.05,\,0.5\}\) in the two studies.
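
For concreteness, below is a sketch of how a single replicate can be generated under models (6)-(8); parameter values are drawn from the ranges above, and object names are illustrative.

```r
# Sketch of one simulated replicate under models (6)-(8); illustrative names and
# one choice of parameter values from the ranges described above.
set.seed(42)
n1 <- 1000; n2 <- 5000; n <- n1 + n2
alpha1 <- 1; alpha2 <- 2; alpha_theta <- 1
sigma1_sq <- 1; sigma2_sq <- 2; sigmag_sq <- 1
p1 <- 0.05; p2 <- 0.5

study <- rep(1:2, c(n1, n2))
mu2 <- -n1 * 2 / n2                                            # so the overall PC mean is zero
theta <- rnorm(n, mean = ifelse(study == 1, 2, mu2), sd = 1)   # simulated PC
g <- ifelse(study == 1, rbinom(n, 2, p1), rbinom(n, 2, p2))    # unassociated variant

# Error variance under model (7) (PC does not affect the genetic variance):
v7 <- ifelse(study == 1, sigma1_sq, sigma2_sq) + sigmag_sq
# Error variance under model (8) (PC scales the genetic variance):
v8 <- ifelse(study == 1, sigma1_sq, sigma2_sq) + theta^2 * sigmag_sq

# Outcome following model (6); replace v7 with v8 to simulate under model (8):
y <- ifelse(study == 1, alpha1, alpha2) + theta * alpha_theta + rnorm(n, sd = sqrt(v7))
```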

We performed 10,000 simulations for each combination of parameters and, for each such setting, computed λgc as the ratio between the median observed value of the \({\chi }_{(1)}^{2}\) test statistic and its expected value under the null. We computed λvs in each of the 10,000 simulations based on the estimated variances and observed allele frequencies in each of the two simulated studies, and averaged these estimates across the simulations. We compared three approaches to estimating the variances: (1) fit a homogeneous variance model, obtain residuals \(\hat{{\epsilon }}\), and for each study estimate the variance as the average \(\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{\hat{{\epsilon }}}_{i}^{2}\), where n is the number of individuals in the study (empirical variance); (2) fit a “stratified variance” linear regression model allowing for different residual variances by study (as implemented in the R/Bioconductor package GENESIS9); (3) use the same model with stratified variances, but take the variance estimates obtained by the AI-REML algorithm (model variance). In the Supplementary Information, we provide the distribution of λgc values that would be seen under random variability alone, obtained by simulating 10,000 independent null test statistics (matching the number used in the simulations) and computing the corresponding inflation factors λgc.
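
A small self-contained sketch of the λgc computation and of the empirical (approach 1) variance estimation is shown below; the data are toy stand-ins, not the simulation settings above.

```r
# Sketch of lambda_gc from 10,000 null Wald statistics, and of the empirical
# per-study variance estimates (approach 1); toy stand-in data.
set.seed(7)
z <- rnorm(10000)                                  # stand-in for 10,000 null Wald statistics
lambda_gc <- median(z^2) / qchisq(0.5, df = 1)     # observed vs. expected median chi-square

study <- rep(1:2, each = 1000)
y <- rnorm(2000, sd = ifelse(study == 1, 1, sqrt(2)))
g <- rbinom(2000, 2, 0.2)
fit_hom <- lm(y ~ g + factor(study))               # homogeneous variance model
v_emp <- tapply(resid(fit_hom)^2, study, mean)     # per-study empirical variances
```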

Whole genome sequencing in TOPMed

For the present analysis, we used Whole Genome Sequencing (WGS) data from freeze 4 of TOPMed. WGS was performed on DNA samples extracted from blood. Sequencing was performed by the Broad Institute of MIT and Harvard (FHS and Amish) and by the Northwest Genome Center (JHS). PCR-free libraries were constructed using commercially available kits from KAPA Biosystems (Broad) or Illumina TruSeq (NWGC). Libraries were pooled for clustering and sequencing, and later de-multiplexed using barcodes. Cluster amplification and sequencing were performed according to the manufacturer’s protocols using the Illumina cBot and HiSeq X sequencer, to a read depth of >30X. Base calling was performed using Illumina’s Real Time Analysis 2 (RTA2) software. Read alignment, variant detection, genotype calling, and variant filtering were performed by the TOPMed Informatics Research Center (University of Michigan). Reads were aligned to the 1000 Genomes hs37d5 decoy reference sequence. Variant detection and genotype calling were performed jointly for several TOPMed studies (including the three analyzed here) using the GotCloud pipeline. Mendelian consistency was used to train a Support Vector Machine variant quality classifier, which was used for variant filtering. Additional quality control (pedigree checks, gender checks, and concordance with prior array data), performed by the TOPMed Data Coordinating Center, was used to detect and resolve sample identity issues. Further details (including software versions) are provided online (see: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/document.cgi?study_id=phs000964.v2.p1&phv=251960&phd=6969&pha=&pht=4838&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1).

TOPMed analyses were performed in agreement with study participants’ consent, as verified via an approval process by the parent studies’ PIs in TOPMed and the TOPMed publication committee.

Variant-specific inflation and genetic association analysis of BMI and hemoglobin concentration in TOPMed

To demonstrate the variance stratification problem, we used datasets of hemoglobin concentrations (HGB) and body mass index (BMI) from TOPMed freeze 4. For each of the traits, we computed a Genetic Relationship Matrix (GRM)13 using all available variants for the corresponding trait with minor allele frequency of at least 0.001, which was used to control for genetic relatedness in mixed models. Because some studies had individuals with different genetic backgrounds (leading to differences in allele frequencies), we defined “analysis groups” to use for assessment of variance stratification. An analysis group was defined either as all individuals from a single study (e.g., Amish), or further by both study and race/ethnic group (e.g., European and African Americans from the Cleveland Family Study were separate analysis groups). Thus, analysis groups capture multiple potential sources of trait variance, including differences in allele frequencies due to genetic ancestry, differences in environment and social/cultural factors, and differences in trait measurement by study. For both BMI and HGB, we performed single-variant association analysis for all variants with a minor allele count of at least 20. A detailed breakdown of the studies and populations used in these analyses is provided in Tables 2 and 3. The analysis strategy for both traits was to use the fully-adjusted two-stage procedure for rank-normalization of residuals, because it was shown to have better statistical properties (Type I error control and power), especially when testing possibly rare genetic variants14. Thus, we first fit a mixed linear regression model, with fixed effects for sex, age (also age² for BMI), and group defined by study and race/ethnicity, and allowing for genetic relatedness by including a variance component proportional to the GRM. We then took the residuals generated by this model, rank-normalized them, and re-fit the same model with the rank-normalized residuals as the trait. For both traits, we compared four analyses: first, a “homogeneous variance” analysis that estimates a single residual variance parameter across all individuals; second, a “stratified residual variance” model that allows a different residual variance parameter for each analysis group; third, a “completely stratified” approach that fits models and performs tests in each analysis group separately, and then combines the results via inverse-variance fixed-effects meta-analysis; and fourth, a “MetaCor” analysis15 that performs stratified analyses followed by fixed-effects meta-analysis while accounting for potential correlations due to genetic relationships between individuals in different analysis groups. The “completely stratified” and “MetaCor” analyses are slightly more flexible than the stratified variance model because they allow for different genetic variance components across analysis groups, in addition to different residual variance components. For BMI, we removed eight individuals from the “completely stratified” analysis to ensure that individuals in different groups were unrelated (no third-degree or closer relatedness across groups). All analyses other than MetaCor used the GENESIS R package.
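
For reference, the following is a schematic of how the homogeneous and stratified residual variance null models and single-variant tests are typically specified with GENESIS; the object names (pheno, grm, seq_iterator) are placeholders, the fully-adjusted two-stage rank-normalization step is omitted, and argument details may vary by package version.

```r
# Schematic of GENESIS null models and single-variant tests; pheno, grm, and
# seq_iterator are placeholder objects, not data distributed with this paper.
library(GENESIS)

# Stratified residual variance model: group-specific residual variances
nullmod_strat <- fitNullModel(pheno,
                              outcome = "bmi",
                              covars = c("sex", "age", "analysis_group"),
                              cov.mat = grm,                 # genetic relationship matrix
                              group.var = "analysis_group")  # residual variance per group

# Homogeneous variance model: same call without group.var
nullmod_hom <- fitNullModel(pheno,
                            outcome = "bmi",
                            covars = c("sex", "age", "analysis_group"),
                            cov.mat = grm)

# Single-variant score tests over a genotype iterator (e.g., a SeqVarBlockIterator)
assoc <- assocTestSingle(seq_iterator, null.model = nullmod_strat)
```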

Table 2 Analysis groups/strata participating in the BMI analysis.
Table 3 Analysis groups/strata participating in the analysis of hemoglobin concentration.

Computing variant-specific inflation factors in mixed models with residual rank-normalization

We studied the calibration of the various analyses of HGB and BMI by computing approximate variant-specific inflation factors λvs and, for diagnostics, generated QQ-plots as described below. Notably, λvs were developed assuming independent data, and applying them in the mixed model setting provides only an approximation, for two reasons: the effective sample size is smaller than the number of individuals (e.g., two full siblings have similar genetic data, so their effective sample size is <2), and there is more than a single variance parameter, so it is not straightforward to decide which variance estimates to use in computing λvs. To see this, consider the mixed-model analysis. We modeled both an error and a genetic variance component, so that, in matrix form, the model assumes that:

$${\boldsymbol{y}}={\boldsymbol{X}}\beta +{{\boldsymbol{g}}}_{{\boldsymbol{j}}}{\alpha }_{j}+\epsilon ,\,{\rm{with}}\,cov(\epsilon )={\sigma }_{e}^{2}{\boldsymbol{I}}+{\sigma }_{g}^{2}{\boldsymbol{G}}$$

where G is the GRM, and \({\sigma }_{e}^{2},\,{\sigma }_{g}^{2}\) are error and genetic variance components, respectively. Thus, the variance depends on \({\sigma }_{e}^{2}\), \({\sigma }_{g}^{2}\), and G.

In addition, we applied the fully-adjusted two-stage procedure for rank-normalization of residuals, another procedure not accounted for by the algorithm. Therefore, different possible models will yield quite different variance estimates to be used in the λvs computations, owing to the changes in the residual distributions caused by rank-normalization. Because we are alerting readers to the problems arising from assuming that variances are the same across all studies, we used variances computed based on the “homogeneous variance” null model (the same residual and genetic variance components for all analysis groups). We extracted the marginal residuals (as distinguished from the conditional residuals that can also be computed in mixed models) for each group, and computed the empirical variance for group j as \({v}_{j}=\frac{1}{{n}_{j}}\mathop{\sum}\limits_{i\in {S}_{j}}{\hat{{\epsilon }}}_{i}^{2}\). Note that this estimator does not account for relatedness. We used the residuals from the second null model of the two-stage procedure.
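
As an illustration of this computation, the sketch below evaluates the per-group empirical variances on toy residuals; resid_marginal and group stand in for the marginal residuals and analysis group labels extracted from the homogeneous-variance null model.

```r
# Toy illustration of v_j = (1/n_j) * sum of squared marginal residuals in group j;
# resid_marginal and group stand in for quantities extracted from the null model.
set.seed(3)
group <- rep(c("study1", "study2_AA", "study2_EA"), times = c(300, 200, 500))
resid_marginal <- rnorm(1000, sd = c(study1 = 1, study2_AA = 1.5, study2_EA = 0.8)[group])
v_group <- tapply(resid_marginal^2, group, mean)
```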

Assessing population stratification at the variance level through QQ-plots

Once λvs are computed for each of the variants of interest, we propose generating QQ-plots across sets of variants to visualize whether population stratification at the variance level is appropriately addressed. A function is available in the GitHub repository to generate QQ-plots stratifying variants into categories: “Inflated”, “Deflated”, and “Approx. no inflation”. The categories can be manually defined, so that a variant is assigned to the “Inflated” category if its λvs is larger than a user-specified value, e.g., 1.01. Similarly, a variant is assigned to the “Deflated” category if its λvs is lower than a user-specified value, e.g., 0.99. Variants are assigned to the “Approx. no inflation” category if they are in neither the “Inflated” nor the “Deflated” category, i.e., their λvs is close to the desired value of 1.
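
The sketch below illustrates such stratified QQ-plots with base-R graphics; it is not the repository function, and lambda_vs_values and pvalues are toy placeholders.

```r
# Toy illustration of QQ-plots stratified by lambda_vs category (base-R graphics).
set.seed(5)
lambda_vs_values <- exp(rnorm(5000, sd = 0.02))    # placeholder lambda_vs values
pvalues <- runif(5000)                             # placeholder association p values
category <- cut(lambda_vs_values, breaks = c(0, 0.99, 1.01, Inf),
                labels = c("Deflated", "Approx. no inflation", "Inflated"))

par(mfrow = c(1, 3))
for (cl in levels(category)) {
  p <- sort(pvalues[category == cl])
  plot(-log10(ppoints(length(p))), -log10(p), main = cl,
       xlab = expression(-log[10](expected~italic(p))),
       ylab = expression(-log[10](observed~italic(p))))
  abline(0, 1)
}
```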

Characterizing variants by inflation patterns

To study how common the variant inflation/deflation problem is, and how it relates to variant frequencies, we computed the proportion of variants at each level of λvs within each allele frequency category: <0.01, 0.01–0.05, 0.05–0.2, and 0.2–0.5. We also studied the similarity of inflation/deflation patterns between the BMI and HGB analyses.
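
A toy cross-tabulation in the spirit of Table 1 (placeholder values, not the TOPMed results) could be computed as follows.

```r
# Toy illustration of a Table 1-style cross-tabulation: proportion of variants in
# each lambda_vs category within MAF bins (placeholder values, not TOPMed results).
set.seed(8)
maf <- runif(5000, 0.001, 0.5)
lambda_vs_values <- exp(rnorm(5000, sd = 0.03 / sqrt(maf)))  # rarer variants drift further from 1
category <- cut(lambda_vs_values, breaks = c(0, 0.99, 1.01, Inf),
                labels = c("Deflated", "Approx. no inflation", "Inflated"))
maf_bin <- cut(maf, breaks = c(0, 0.01, 0.05, 0.2, 0.5), include.lowest = TRUE)
round(prop.table(table(maf_bin, category), margin = 1), 2)
```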

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.