Introduction

Variance components analysis1 has emerged as a versatile tool in human complex trait genetics, enabling studies of the genetic contribution to variation in a trait2 as well as its distribution across genomic loci3,4, allele frequencies3, and functional annotations3,5,6. There is increasing interest in applying methods for variance components analysis to large-scale genetic datasets with the goal of uncovering novel insights into the genetic architecture of complex traits4,7. A prominent example of the utility of these methods is in the estimation of SNP heritability (\({h}_{{\mathrm{{SNP}}}}^{2}\))2, the variance in a trait explained by a given set of genotyped SNPs. Variance components methods for estimating SNP heritability typically assume a genetic variance component that represents the fraction of phenotypic variation explained by the SNPs included in the study and a residual variance component. Recent studies have shown that these “single-component” methods yield biased estimates of SNP heritability due to the linkage disequilibrium (LD) and minor allele frequency (MAF)-dependent architecture of complex traits8,9. On the other hand, flexible models with multiple variance components3,4 that allows for SNP effects to vary with MAF and LD, have been shown to yield more accurate SNP heritability estimates8,9. Recent work has shown that SNP heritability can be estimated with minimal assumptions about the genetic architecture10; however, this method cannot partition heritability across categories of SNPs of interest such as functional or population genomic annotations. Partitioning heritability requires fitting multiple variance components, thus creating the need for accurate and scalable methods that can fit tens or even hundreds of variance components to large-scale genomic data to obtain accurate and novel insights into genetic architecture.

While the ability to fit flexible variance component models to large-scale datasets is essential to obtain accurate and novel insights into genetic architecture, fitting such models requires scalable algorithms. Approaches for estimating variance components typically search for parameter values that maximize the likelihood or the restricted maximum likelihood (REML)11. Despite a number of algorithmic improvements2,4,12,13,14,15,16, computing REML estimates of the variance components on data sets such as the UK Biobank17 (≈500,000 individuals genotyped at nearly one million SNPs) remains challenging. The reason is that methods for computing these estimators typically perform repeated computations on the input genotypes.

We propose a method that can jointly estimate multiple variance components efficiently. Our proposed method, RHE-mc, is a randomized multi-component version of the classical Haseman–Elston regression for heritability estimation18,19. RHE-mc builds on our previously proposed method, RHE-reg20, which uses a randomized algorithm to estimate a single variance component. RHE-mc can simultaneously estimate multiple variance components, as well as variance components associated with continuous and overlapping annotations. Further, unlike REML estimation algorithms, RHE-mc requires only a single pass over the input genotypes that results in a highly memory efficient implementation. The resulting computational efficiency permits RHE-mc to jointly fit 300 variance components in less than an hour on a dataset of about 300,000 individuals and 500,000 SNPs, about two orders of magnitude faster than state-of-the-art methods. On a dataset of one million individuals and one million SNPs, RHE-mc can fit 100 variance components in about 12 h.

To demonstrate its utility, we first show that RHE-mc can accurately estimate genome-wide and partitioned SNP heritability under realistic genetic architectures (the functional dependence of SNP effect sizes on MAF and LD). We applied RHE-mc to 22 traits measured across 291,273 individuals genotyped at 459,792 common SNPs (MAF > 1%) in the UK Biobank to obtain estimates of genome-wide SNP heritability. We then used RHE-mc to partition heritability for the 22 traits across seven million imputed SNPs (MAF > 0.1%) into 144 bins defined based on MAF and LD. We observe that the per-allele squared effect size tends to increase with lower MAF and LD across the traits considered. Finally, we partitioned heritability for SNPs with MAF > 0.1% across 28 functional annotations. We recover previously reported enrichment of heritability in annotations corresponding to conserved regions7 and also document enrichment of heritability in FANTOM5 enhancers in eczema, asthma, autoimmune disorders, and thyroid disorders.

Results

Methods overview

RHE-mc aims to fit a variance component model that relates phenotypes y measured across N individuals to their genotypes over M SNPs X:

$$ {\boldsymbol{y}}| {\boldsymbol{\epsilon }},{{\boldsymbol{\beta }}}_{1},\ldots ,{{\boldsymbol{\beta }}}_{K} =\mathop{\sum }\limits_{k=1}^{K}{{\boldsymbol{X}}}_{k}{{\boldsymbol{\beta }}}_{k}+{\boldsymbol{\epsilon }}\\ \hskip 62pt {\boldsymbol{\epsilon }} \sim {\mathcal{D}}({\bf{0}},{\sigma }_{e}^{2}{{\bf{I}}}_{N})\\ \hskip 20pt {{\boldsymbol{\beta }}}_{k} \sim {\mathcal{D}}\left({\bf{0}},\frac{{\sigma }_{k}^{2}}{{M}_{k}}{{\bf{I}}}_{{M}_{k}}\right),k\in \{1,\ldots ,K\}$$

where \({\mathcal{D}}(\boldsymbol{\mu} ,\boldsymbol{\Sigma})\) is an arbitrary distribution with mean μ and covariance Σ. Each of the M SNPs is assigned to one of K non-overlapping categories so that Xk is the N × Mk matrix consisting of standardized genotypes of SNPs belonging to category k (note that the expected heritability is constant within categories when we use standardized genotypes). βk denotes the effect sizes of SNPs assigned to category k which are drawn from a zero-mean distribution with covariance parameter \(\frac{{\sigma }_{k}^{2}}{M_k}{{\bf{I}}}_{{M}_{k}}\) (the variance component of category k) while \({\sigma }_{e}^{2}\) is the residual variance.

In this model, the genome-wide SNP heritability is defined as: \({h}_{{\mathrm{{SNP}}}}^{2}=\frac{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2} \, + \, {\sigma }_{e}^{2}}\) while the SNP heritability of category k is defined as: \({h}_{k}^{2}=\frac{{\sigma }_{k}^{2}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2} \, + \, {\sigma }_{e}^{2}}\). By choosing categories to represent genomic annotations of interest, e.g., chromosomes, allele frequencies, or functional annotations, these models can be used to estimate the phenotypic variation that can be attributed to the relevant annotation.

The key inference problem in this model is the estimation of the variance components: \(({\sigma }_{1}^{2},\ldots ,{\sigma }_{K}^{2},{\sigma }_{e}^{2})\). These parameters are typically estimated by maximizing the likelihood or the restricted likelihood. Instead, RHE-mc uses a scalable method-of-moments estimator, i.e., finding values of the variance components such that the population moments match the sample moments18,19,21,22,23. RHE-mc uses a randomized algorithm that avoids explicitly computing N × N genetic relatedness matrices that are required by method-of-moments estimators. Instead, it operates on a smaller matrix formed by multiplying the input genotype matrix with a small number of random vectors (see “Methods” section). The application of a randomized algorithm for SNP heritability estimation using a single variance component was proposed in our previous work, RHE-reg20. RHE-mc extends our previous work in several directions. RHE-mc can efficiently fit multiple variance components (both non-overlapping and overlapping) and can also handle continuous annotations. The resulting algorithm has scalable runtime as it only requires operating on the genotype matrix one time. Further, RHE-mc uses a streaming implementation that does not require all the genotypes to be stored in memory leading to scalable memory requirements (Supplementary Notes). Finally, RHE-mc uses an efficient implementation of a block Jackknife to estimate standard errors with little computational overhead (Supplementary Notes).

Accuracy of genome-wide SNP heritability estimates in simulations

We assessed the accuracy of RHE-mc in estimating genome-wide SNP heritability as previous attempts at estimating SNP heritability have been shown to be sensitive to assumptions about how SNP effect size varies with MAF and LD8. Starting with genotypes of M = 593,300 array SNPs over N = 337,205 unrelated white British individuals in the UK Biobank, we simulated phenotypes according to 64 MAF and LD-dependent architectures by varying the SNP heritability, the proportion of variants that have non-zero effects (causal variants or CVs), the distribution of CVs across minor allele frequencies (CVs distributed across all minor allele frequency bins or CVs restricted to either common or low-frequency bins), and the form of coupling between the SNP effect size and MAF as well as LD. For RHE-mc, we partitioned the SNPs into 24 variance components based on six MAF bins as well as four LD bins (see “Methods” section). The key parameter in applying RHE-mc is the number of random vectors B which we set to 10. RHE-mc estimates were relatively insensitive when we increased the number of random vectors B to 100 (Supplementary Figs. 1 and 2, Supplementary Table 1). Across these 64 architectures, RHE-mc is relatively unbiased (a two-sided t-test of the hypothesis of no bias is not rejected across any of the architectures at a p-value < 0.05) with the largest relative bias observed to be 0.5% of the true SNP heritability (Supplementary Fig. 3). We used a block Jackknife (number of blocks = 100) to estimate the standard errors of RHE-mc and confirmed that the estimated standard errors are close to the true SE (Supplementary Table 2).

We compared the accuracy of RHE-mc to state-of-the-art methods for heritability estimation that can be applied to large datasets (across architectures where the true SNP heritability was fixed at 0.5). These methods, LDSC24, SumHer25, S-LDSC26, and GRE10, all leverage summary statistics while RHE-mc requires individual genotype data. We found that estimates from the summary-statistic methods tend to be sensitive to the underlying genetic architecture: across 16 architecture relative biases range from −31% to 27% for LDSC,   −27% to 5% for S-LDSC, and −5% to 9% for SumHer (Fig. 1). We also compared to a recently proposed method (GRE10) that only estimates genome-wide SNP heritability (without partitioning by MAF/LD) and observed that relative biases ranged from 1% to 1.4% for GRE and from  −1.5% to 0.5% for RHE-mc. We also considered architectures in which only rare variants are causal and found RHE-mc is accurate relative to other methods (Supplementary Fig. 4). These results further emphasize that RHE-mc can accurately estimate SNP-heritability through fitting multiple variance components.

Fig. 1: Comparison of estimates of genome-wide SNP heritability from RHE-mc with LDSC, GRE, S-LDSC, and SumHer in large-scale simulations (N = 337,205 unrelated individuals, M = 593,300 array SNPs).
figure 1

a We compared methods for heritability estimation under 16 different genetic architectures. We set true heritability to 0.5 and varied the MAF range of causal variants (MAF of CV), the coupling of MAF with effect size (a = 0 indicates no coupling with MAF and a = 0.75 indicates coupling with MAF), and the effect of local LD on effect size (b = 0 indicates no dependence on LDAK weights and b = 1 indicates dependence on LDAK weights) (see “Methods” section). Each boxplot represents estimates from 100 simulations. b Relative bias of each method (as a percentage of the true h2) across 16 distinct MAF-dependent and LD-dependent architectures. Each boxplot contains 16 points; each point is the relative bias estimated from 100 simulations under a single genetic architecture. Points and error bars represent the mean and  ±2 SE. In a and b, boxplot whiskers extend to the minimum and maximum estimates located within 1.5×  interquartile range (IQR) from the first and third quartiles, respectively. Here, we run RHE-mc using 24 bins formed by the combination of six bins based on MAF as well as four bins based on quartiles of the LDAK score of a SNP (see “Methods“ section). We run S-LDSC with only 10 MAF bins (see Supplementary Table 5). To do a fair comparison, for every method, we computed LD scores and LDAK weights by using in-sample LD, and in all simulations we aim to estimate the SNP-heritability explained by the same set of M SNPs.

We compared RHE-mc to the state-of-the-art REML-based variance component estimation method, GCTA-mc (multi-component GREML8,27,28) and to exact multi-component Haseman–Elston Regression (HE-mc) as implemented in GCTA27. We ran each of these methods by partitioning SNPs into 24 variance components (6 MAF bins by 4 LD bins, see “Methods” section). To make these experiments computationally feasible, we simulated phenotypes starting from a smaller set of genotypes (M = 593,300 array SNPs and N = 10,000 white British individuals). Across 16 architectures where the true SNP heritability was fixed at 0.25, the relative biases for RHE-mc range from  −3.2% to 3.6%, and from  −3.2% to 5% for GCTA-mc (Fig. 2). On average, RHE-mc has standard errors that are 1.1 times larger than GCTA-mc (which range from 0.97 to 1.24) and 1.08 times larger than HE-mc (which range from 1.00 to 1.21).

Fig. 2: Comparison of SNP heritability estimates from RHE-mc with GCTA-mc (GCTA with multiple variance components) and HE-mc (HE with multiple variance components) (N = 10,000 unrelated individuals, M = 593,300 array SNPs).
figure 2

In ad we compared heritability estimates from these methods under 16 different genetic architectures. We varied the MAF range of causal variants (MAF of CV), the coupling of MAF with effect size (a), and the effect of local LD on effect size (b = 0 indicates no dependence on LDAK weights and b = 1 indicates dependence on LDAK weights (see “Methods” section). We ran 100 replicates where the true heritability of the phenotype is 0.25. We run RHE-mc, HE-mc, and GCTA-mc using 24 bins formed by the combination of six bins based on MAF as well as four bins based on quartiles of the LDAK score of a SNP (see “Methods” section). Across all different genetic architectures, the relative biases range from  −3.2% to 3.6% for RHE-mc, and from  −3.2% to 5% for GCTA-mc, and from  −2.6% to 1.45% for HE-mc. On average, RHE-mc has SEs that are 1.1 and 1.08 times larger than GCTA-mc and HE-mc, respectively. Black points and error bars represent the mean and  ±2 SE. Each boxplot represents estimates from 100 simulations. Boxplot whiskers extend to the minimum and maximum estimates located within 1.5×  interquartile range (IQR) from the first and third quartiles, respectively. The SEs are computed from 100 simulations (note that GCTA-mc did not run successfully on all 100 simulations).

Accuracy of heritability partitioning in simulations

We also evaluated the accuracy of RHE-mc in partitioning SNP heritability in both small-scale (M = 593,300 SNPs, N = 10,000 individuals) (Supplementary Fig. 5) and large-scale settings (M = 593,300 SNPs, N = 337,205 individuals) (see Supplementary Fig. 6). For these experiments, we restrict our attention to architectures for which the CVs are chosen to lie within a narrow range of MAF. Since the variance components correspond to bins of MAF and LD, a subset of the variance components would have no causal SNPs and hence have a heritability of zero. We assess the accuracy of estimates of heritability aggregated over these components (termed the non-causal bin) as well as the heritability aggregated over the remaining genetic components (termed the causal bin). For example, variance components that correspond to MAF  [0.01, 0.05] would be included in the causal bin for an architecture that restricts the MAF of CVs to lie in the range [0.01, 0.05]. For the small-scale simulations, we compared RHE-mc to GCTA-mc. We ran both methods by partitioning the SNPs into 24 variance components based on six MAF bins as well as four LD bins defined by quartiles of the measure of LDAK weight at a SNP (see “Methods” section). Across the genetic architectures tested, estimates of heritability within each of the causal and non-causal bins are highly concordant between RHE-mc and GCTA-mc (Supplementary Fig. 5, Supplementary Table 3): for the causal bin, the relative bias ranges from  −4% to 0.4% for RHE-mc and  −3.6% to 2% for GCTA-mc while, for the non-causal bin, the bias ranges from 0 to 0.7% for RHE-mc and 0 to 1.4% for GCTA-mc (Supplementary Table 3). For the large-scale settings, RHE-mc remains accurate: the relative bias ranges from  −2.6% to 3.2% (causal bin) and −0.5% to 0.2% (non-causal bin) over the genetic architectures considered (Supplementary Fig. 6, Supplementary Table 4).

Heritability partitioning has been used to estimate heritability attributed to functional genomic annotations7. However, some of these annotations (such as FANTOM5 enhancers) are quite small covering  <1% of the genome. We explored the ability of RHE-mc to accurately estimate heritability as a function of the size of the annotation. To this end, we performed simulations using N = 291,273 unrelated white British individuals and M = 459,792 common SNPs. We defined eight annotations (four MAF bins and two LD bins) in which we fixed the enrichment of a selected bin and varied the proportion of SNPs in the selected category. RHE-mc obtained accurate estimates of enrichment even when the selected bin only contained 0.4% of the genome-wide SNPs (comparable to the size of FANTOM5 enhancers). RHE-mc estimates are well-calibrated: when the bin has zero enrichment, RHE-mc rejected the null hypothesis of no enrichment in 5% of the simulations, while attaining high power to reject the null hypothesis even when the bin contained  <1% of the SNPs (Supplementary Notes).

Computational efficiency

We benchmarked the runtime and memory usage of RHE-mc as a function of number of individuals, SNPs and variance components (Fig. 3, Table 1). We ran RHE-mc with B = 10 random vectors and 22 variance components where each chromosome forms a distinct component. On a dataset of  ≈300,000 individuals and  ≈500,000 SNPs, RHE-mc can fit 22 variance components in less than an hour and  ≈300 variance components (corresponding to bins of size 10 Mb) with little increase in its runtime. On a dataset of one million individuals and one million SNPs, RHE-mc can fit 100 variance components in a few hours. Further, due to its use of a streaming implementation that only requires the genotypes to be operated on once, the memory requirement of RHE-mc is modest: all experiments required <60 GB. We compared the run time and memory usage of RHE-mc with REML-based methods (GCTA27 and BOLT-REML4) on the UK Biobank genotypes consisting of around 500,000 SNPs over varying sample sizes and observed that RHE-mc achieves several orders-of-magnitude reduction in runtime. Summary-statistic methods such as S-LDSC requires pre-computed inputs which depend on the runtimes of other softwares making a direct comparison of speed difficult. Thus, we have restricted our comparison to individual-level methods where the benchmarking can be done in a comparable manner.

Fig. 3: Comparison of running time of RHE-mc, GCTA-mc, and BOLT-REML.
figure 3

We compared runtime of RHE-mc, GCTA-mc, and BOLT-REML with increasing sample size N (for a fixed number of SNPs M = 459,792 and components K = 22).

Table 1 Comparison of running time of RHE-mc, GCTA-mc, and BOLT-REML.

Estimating total SNP heritability in the UK Biobank

We applied RHE-mc to estimate genome-wide SNP heritability for 22 complex traits (6 quantitative and 16 binary traits) measured in the UK Biobank. We analyzed N = 291,273 unrelated white British individuals and M = 459,792 SNPs genotyped on the UK Biobank Axiom array (see “Methods” section). We ran RHE-mc with B = 10 and with SNPs divided into eight bins based on two MAF bins (0.01 ≤ MAF < 0.05, MAF ≥ 0.05) and quartiles of the LD-scores. We compared the estimates from RHE-mc to those from LDSC, S-LDSC, SumHer, and GRE. Restricting our analysis to 18 traits for which the point estimate of genome-wide SNP heritability from RHE-mc is  >0.05, the estimates from S-LDSC, GRE, SumHer, and LDSC were on average 2.5%, 10%, 25%, and 67% higher than RHE-mc (Fig. 4). Relative to the simulation results, the estimates from S-LDSC are generally consistent with those from RHE-mc. This is likely due to the fact that, in simulations, our application of S-LDSC used only MAF bins. On the other hand, in real data, we used S-LDSC with the recommended baseline-LD annotations (including functional annotations).

Fig. 4: Estimates of genome-wide SNP heritability from RHE-mc, LDSC, S-LDSC, GRE, and SumHer for 22 complex traits and diseases in the UK Biobank.
figure 4

We restricted our analysis to N = 291,273 unrelated white British individuals. We applied all methods to M = 459,792 array SNPs (MAF > 1%). We ran S-LDSC with baseline-LD model. For every method, LD scores or LDAK weights are computed using in-sample LD among the SNPs, and we aim to estimate the SNP-heritability explained by the same set of SNPs. RHE-mc was applied to array SNPs with eight MAF/LD bins. Black error bars mark  ±2 standard errors centered on the estimated heritability. We used a block Jackknife (number of blocks = 100) to estimate the standard errors. In Supplementary Fig. 7, we also report RHE-mc estimates of genome-wide SNP heritability on M = 4,824,392 imputed SNPs (MAF > 1%) with 8 MAF/LD bins and M = 7,774,235 imputed SNPs (MAF > 0.1%) with 144 MAF/LD bins (see “Methods” section).

We then applied RHE-mc to estimate genome-wide heritability attributable to imputed variants. The genome-wide estimates of SNP heritability from RHE-mc on imputed SNPs (MAF > 1%) are concordant with the estimates from array SNPs (2.8% higher on average). We then analyzed M = 7,774,235 imputed genotypes with MAF > 0.1% using 144 bins formed by 4 LD bins and 36 MAF bins (see “Methods” section). Genome-wide SNP heritability estimates from RHE-mc on imputed SNPs (MAF > 0.1%) are 11.4% higher than RHE-mc on imputed SNPs (MAF > 1%) (Fig. 4, Supplementary Fig. 7). Following previous work10, we have removed the MHC region to enable a systematic comparison since the estimation of LD in the MHC region can be challenging; it would be of interest to compare methods when the MHC is included.

Partitioning SNP heritability across allele frequency and LD bins

We used RHE-mc to partition SNP heritability of 22 complex traits across MAF and LD bins. We analyzed M = 7,774,235 imputed SNPs with MAF > 0.1%. We used 144 bins formed by 4 LD bins and 36 MAF bins (see “Methods” section). We compute the per-allele squared effect size of SNPs in bin k as \(\frac{{h}_{k}^{2}}{2{f}_{k}(1\,-\,{f}_{k}){M}_{k}}\), where \({h}_{k}^{2}\) is the heritability estimated in bin k, fk is the mean MAF in bin k, and Mk is the number of SNPs in bin k. We observe that allelic effect size increases with lower MAF and LD. For height, in the lowest quartile of LD scores, SNPs with MAF ≈ 0.1% have allelic effect sizes ≈27x ± 8 larger than SNPs with MAF ≈ 50%. Similarly, among SNPs with MAF ≈50%, SNPs in the lowest quartile of LD scores have allelic effect sizes ≈5x ± 1 larger than SNPs in the highest quartile (Fig. 5 for height; other traits in Supplementary Fig. 9). While these trends have been observed in previous studies9,29,30, the ability of RHE-mc to jointly fit multiple variance components allows us to estimate effect sizes at SNPs with MAF as low as 0.1%. We caution that negative heritability estimates in bins of lowest MAF and high LD score could arise due to one or more of the following factors: low number of SNPs in this bin (we did not constrain our variance components estimates to be non-negative), the inadequacy of the assumed heritability model, and errors in the imputed genotypes used for the analysis.

Fig. 5: Per-allele squared effect size of height as a function of MAF.
figure 5

We applied RHE-mc to N = 291,273 unrelated white British individuals and M = 7,774,235 imputed SNPs. SNPs were partitioned into 144 bins based on LD score (4 bins based on quartiles of the LD score with i denoting the ith quartile) and MAF (36 MAF bins) (see “Methods” section). Per-allele effect size squared for bin k is defined as \(\frac{{h}_{k}^{2}}{{M}_{k}* 2{f}_{k}* (1\,-\,{f}_{k})}\), where \({h}_{k}^{2}\) is the heritability attributed to bin k, Mk is the number of SNPs in bin k, and fk is the average MAF in bin k. Each point represents the estimated allelic effect size. Bars mark  ±2 standard errors centered on the estimated allelic effect size. See Supplementary Fig. 9 for results on all 22 traits.

Partitioning heritability by functional annotations

The ability of RHE-mc to estimate variance components associated with a large number of overlapping annotations enables us to explore the contribution of a variety of functional genomic annotations to trait heritability using individual-level data in the UK Biobank. We applied RHE-mc to jointly partition heritability of 22 complex traits across 28 functional annotations as defined in ref. 7 restricting our analysis to N = 291,273 unrelated white British individuals and M = 5,670,959 imputed SNPs (we restrict to SNPs with MAF > 0.1% which are also present in 1000 Genomes Project). We grouped the traits into five categories (autoimmune, diabetes, respiratory, anthropometric, cardiovascular); for a representative trait from each category, we report enrichment of each of the 28 functional annotations in Fig. 6 (see “Methods” section; for all traits see Supplementary Fig. 8). Our results are largely concordant with previous studies7,9: we observe enrichment of heritability across traits in conserved regions (Z-score > 3 in 15 traits). We also observe enrichment of heritability at FANTOM5 enhancers (labeled Enhancer_Andersson in Fig. 6) in asthma, eczema, autoimmune disorders (broad), hypothyroidism, and thyroid disorders (Z-score > 3) even though these annotations cover only 0.4% of the analyzed SNPs.

Fig. 6: Enrichment of heritability across 28 functional annotations.
figure 6

We applied RHE-mc to N = 291,273 unrelated white British individuals and M = 5,670,959 imputed SNPs (MAF > 0.1% and present in 1000 Genomes Project). SNPs were partitioned based on 28 functional annotations that were defined in a previous study7. We grouped 22 traits in the UK Biobank into five categories (autoimmune, diabetes, respiratory, anthropometric, cardiovascular). Here we plot enrichment of five traits (one representative trait per category). Each bar represents the estimated enrichment. Black error bars mark  ±2 standard errors centered on the estimated enrichment. Annotations are ordered by the proportions of SNPs in that annotation (given in parentheses). See Supplementary Fig. 8 for results on all 22 traits.

Discussion

We have presented RHE-mc, an algorithm that can efficiently estimate multiple variance components on large-scale genotype data. In light of increasing evidence for SNP effect sizes that vary as a function of covariates, such as MAF and LD and the bias associated with methods that fit only a single variance component8, the ability to define flexible models endowed with multiple variance components is important to obtain unbiased estimates of fundamental quantities such as SNP heritability. We confirm that RHE-mc yields accurate genome-wide SNP heritability estimates under diverse genetic architectures. In applications to 22 complex traits in the UK Biobank, RHE-mc yields heritability estimates on array SNPs that are lower on average relative to S-LDSC and SumHer. We have explored the utility of RHE-mc in heritability partitioning analyses. These analyses show that per-allele squared effect sizes tend to increase with a decrease in MAF and LD consistent with previous studies9. We also partitioned heritability across functional annotations to reveal enrichment of heritability at FANTOM5 enhancers in specific traits such as asthma and eczema.

We discuss several limitations of RHE-mc as well as directions for future work. First, the method-of-moments estimator underlying RHE-mc tends to yield slightly larger standard errors, on average, relative to REML estimators. The relative performance of the two methods likely depends on a number of aspects of the study design such as sample size, number of SNPs, the LD structure, relatedness patterns, and the underlying genetic architecture. Nevertheless, our method is designed to be applicable to massive datasets for which the heritability estimates are relatively precise. Developing scalable variance components estimators that are as efficient as REML-based methods is an important direction for future work. Second, this work has primarily explored the partitioning of heritability across discrete annotations. While we have shown how the methodology can be extended to continuous-valued annotations (see “Methods” section and Supplementary Notes), it would be of interest to explore variation in trait heritability as a function of the value of an annotation. On the other hand, the ability of RHE-mc to fit many annotations allows the annotation to be divided into a sufficiently large number of bins. Third, we have applied RHE-mc to binary traits available in the UK Biobank treating these traits as continuous. Methods that explicitly model binary traits as well as the underlying ascertainment involved in case-control studies are likely to lead to more accurate heritability estimates23,31. For example, the PCGC method23 is an extension of HE regression and it would be of interest to develop a scalable randomized PCGC estimator. Fourth, RHE-mc requires access to individual-level genotype and phenotype data. Methods that only require summary statistic data (GRE10, LDSC24, and SumHer25) have the advantage of being applicable to datasets where acquiring access to individual-level data can be challenging10. Finally, our method could potentially lead to improvements in association testing, trait prediction, and understanding of polygenic selection.

Methods

Multi-component linear mixed model

RHE-mc attempts to fit the following variance component model:

$$ {\boldsymbol{y}}| {\boldsymbol{\epsilon }},{{\boldsymbol{\beta }}}_{1},\ldots ,{{\boldsymbol{\beta }}}_{K} =\mathop{\sum }\limits_{k=1}^{K}{{\boldsymbol{X}}}_{k}{{\boldsymbol{\beta }}}_{k}+{\boldsymbol{\epsilon }}\\ \hskip 62pt {\boldsymbol{\epsilon }} \sim {\mathcal{D}}({\bf{0}},{\sigma }_{e}^{2}{{\bf{I}}}_{N})\\ \hskip 20pt {{\boldsymbol{\beta }}}_{k} \sim {\mathcal{D}}\left({\bf{0}},\frac{{\sigma }_{k}^{2}}{{M}_{k}}{{\bf{I}}}_{{M}_{k}}\right),k\in \{1,\ldots ,K\}$$
(1)

Here y is a N-vector of centered phenotypes and each of the M SNPs is assigned to one of K non-overlapping categories. Each category k contains Mk SNPs, k {1, …, K}, ∑kMk = M. Xk is a N × Mk matrix, where xk,n,m denotes the standardized genotype for individual n at SNP m in category k. We have ∑nxk,n,m = 0 and \({\sum }_{n}{x}_{k,n,m}^{2}=N\) for m {1, 2, …, Mk}. βk denote the Mk-vector of SNP effect sizes for the kth category where \({\mathcal{D}}(\boldsymbol{\mu} ,\boldsymbol{\Sigma})\) is an arbitrary distribution with mean \(\boldsymbol{\mu}\) and covariance \(\boldsymbol{\Sigma}\). In the above model, \({\sigma }_{e}^{2}\) is the residual variance, and \({\sigma }_{k}^{2}\) is the variance component of the kth category. The total SNP heritability is defined as

$${h}_{{\mathrm{{SNP}}}}^{2}=\frac{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2}+{\sigma }_{e}^{2}}$$
(2)

The SNP heritability of category k is defined as

$${h}_{k}^{2}=\frac{{\sigma }_{k}^{2}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2}+{\sigma }_{e}^{2}},k\in \{1,\ldots ,K\}$$
(3)

Enrichment in bin k is defined as

$${e}_{k}=\frac{{h}_{k}^{2}/{h}_{{\mathrm{{SNP}}}}^{2}}{{M}_{k}/M},k\in \{1,\ldots ,K\}$$
(4)

Method-of-moments for estimating multiple variance components

To estimate the variance components, RHE-mc uses a Method-of-Moments (MoM) estimator that searches for parameter values so that the population moments are close to the sample moments32. Since \({\mathbb{E}}[{\bf{y}}]=0\), we derived the MoM estimates by equating the population covariance to the empirical covariance. The population covariance is given by

$${\mathrm{{cov}}}({\boldsymbol{y}})=E[{\boldsymbol{y}}{{\boldsymbol{y}}}^{{\mathrm{{T}}}}]-E[{\boldsymbol{y}}]E[{{\boldsymbol{y}}}^{{\mathrm{{T}}}}]=\sum _{k}{\sigma }_{k}^{2}{{\boldsymbol{K}}}_{k}+{\sigma }_{e}^{2}{{\boldsymbol{I}}}_{N}$$
(5)

Here \({{\boldsymbol{K}}}_{k}=\frac{{{\boldsymbol{X}}}_{k}{{\boldsymbol{X}}}_{k}^{{\mathrm{{T}}}}}{{M}_{k}}\) is the genetic relatedness matrix (GRM) computed from all SNPs of kth category. Using yyT as our estimate of the empirical covariance, we need to solve the following least-squares problem to find the variance components.

$$(\tilde{{\sigma }_{1}^{2}},\ldots ,\tilde{{\sigma }_{K}^{2}},\tilde{{\sigma }_{e}^{2}})={\mathop {\mathrm{{argmi{n}}}}\limits_{({\sigma }_{1}^{2},\ldots ,{\sigma }_{K}^{2},{\sigma }_{e}^{2})}}| | {\boldsymbol{y}}{{\boldsymbol{y}}}^{{\mathrm{{T}}}}-\left({\mathop{\sum }\limits_{k}}{\sigma }_{k}^{2}{{\boldsymbol{K}}}_{k}+{\sigma }_{e}^{2}{\boldsymbol{I}}\right)| {| }_{F}^{2}$$
(6)

The MoM estimator satisfies the following normal equations:

$$\left[\begin{array}{ll}{\boldsymbol{T}}&{\boldsymbol{b}}\\ {{\boldsymbol{b}}}^{{\mathrm{{T}}}}&N\end{array}\right]\left[\begin{array}{l}\tilde{{\sigma }_{g}^{2}}\\ {\tilde{\sigma }}_{e}^{2}\end{array}\right]=\left[\begin{array}{l}{\boldsymbol{c}}\\ {{\boldsymbol{y}}}^{{\mathrm{{T}}}}{\boldsymbol{y}}\end{array}\right]$$
(7)

Here \(\tilde{{\sigma }_{g}^{2}}=\left[\begin{array}{l}\tilde{{\sigma }_{1}^{2}}\\ \vdots \\ \tilde{{\sigma }_{K}^{2}}\end{array}\right]\), T is a K × K matrix with entries Tk,l = tr(KkKl), kl {1, …, K}, b is a K-vector with entries bk = tr(Kk) = N (because Xks is standardized), and c is a K-vector with entries ck = yTKky. Each GRM Kk can be computed in time \({\mathcal{O}}({N}^{2}{M}_{k})\) and \({\mathcal{O}}({N}^{2})\) memory. Given K GRMs, the quantities Tk,l, ck, kl {1, …, K}, can be computed in \({\mathcal{O}}({K}^{2}{N}^{2})\). Given the quantities Tk,l, ck, the normal Eq. (7) can be solved in \({\mathcal{O}}({K}^{3})\). Therefore, the total time complexity for estimating the variance components is \({\mathcal{O}}({N}^{2}M+{K}^{2}{N}^{2}+{K}^{3})\).

RHE-mc: Randomized estimator of multiple variance components

The key bottleneck in solving the normal Eq. (7) is the computation of Tk,l, kl {1, …, K} which takes \({\mathcal{O}}({N}^{2}M)\). Instead of computing the exact value of Tk,l, we use an unbiased estimator of the trace33 based on the following identity: for a given N × N matrix C, zTCz is an unbiased estimator of tr(C) (E[zTCz] = tr[C]), where z be a random vector with mean zero and covariance IN. Hence, we can estimate the values Tk,l, kl {1, …, K} as follows:

$${T}_{k,l}={\mathrm{{tr}}}({{\boldsymbol{K}}}_{k}{{\boldsymbol{K}}}_{l})\approx \widehat{{T}_{k,l}}=\frac{1}{B}\frac{1}{{M}_{k}{M}_{l}}{\mathop {\sum }\limits_{b}}{{\boldsymbol{z}}}_{b}^{{\mathrm{{T}}}}{{\boldsymbol{X}}}_{k}{{\boldsymbol{X}}}_{k}^{{\mathrm{{T}}}}{{\boldsymbol{X}}}_{l}{{\boldsymbol{X}}}_{l}^{{\mathrm{{T}}}}{{\boldsymbol{z}}}_{b}$$
(8)

Here z1, …, zB are B independent random vectors with zero mean and covariance IN. We draw these random vectors independently from a standard normal distribution. Computing Tk,l using the unbiased estimator involves four multiplications of sub-matrices of the genotype matrix with a vector, repeated B times. Therefore, the total running time for estimating the matrix T is \({\mathcal{O}}(NMB+{K}^{2}NB)\).

Moreover, we can leverage the structure of the genotype matrix which only contains entries in {0, 1, 2}. For a fixed genotype matrix Xk, we can improve the per iteration time complexity of matrix–vector multiplication from \({\mathcal{O}}(NM)\) to \({\mathcal{O}}(\frac{NM}{{\mathrm{{max}}}({\mathrm{log}\,}_{3}N,{\mathrm{log}\,}_{3}M)})\) by using the Mailman algorithm34. Solving the normal equations takes \({\mathcal{O}}({K}^{3})\) time so that the overall time complexity of our algorithm is \({\mathcal{O}}(\frac{NMB}{\max ({\mathrm{log}\,}_{3}(N),{\mathrm{log}\,}_{3}(M))}+{K}^{2}(K+NB))\).

RHE-mc uses a block Jackknife to estimate standard errors. In Supplementary Notes, we show how the block Jackknife estimates can be computed with little additional computational overhead. Further, we also show how covariates can be efficiently included in the model (Supplementary Notes).

Multi-component LMM with overlapping annotations

RHE-mc can also be applied in the setting where annotations overlap. Following ref. 7, the heritability of SNPs belong to annotation k is defined as

$${h}_{k}^{2}=\frac{{\sum }_{i\in {S}_{k}}{\sum }_{j:i\in {S}_{j}}\frac{{\sigma }_{j}^{2}}{{M}_{j}}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sigma }_{k}^{2}+{\sigma }_{e}^{2}},k\in \{1,\ldots ,K\}$$
(9)

where Sk is the set of SNPs in kth annotation and Mk = Sk. Enrichment in bin k is defined as \({e}_{k}=\frac{{h}_{k}^{2}/{h}_{{\mathrm{{SNP}}}}^{2}}{{M}_{k}/M}\).

Multi-component LMM with continuous annotations

We have described the derivation of RHE-mc using binary annotations. Following ref. 29, we can extend RHE-mc to support continuous-value annotations as follows:

$$ {\boldsymbol{y}}| {\boldsymbol{\epsilon }},{{\boldsymbol{\beta }}}_{1},\ldots ,{{\boldsymbol{\beta }}}_{K} =\mathop{\sum }\limits_{k=1}^{K}{{\boldsymbol{X}}}_{k}{{\boldsymbol{\beta }}}_{k}+{\boldsymbol{\epsilon }}\\ \hskip 55pt {\boldsymbol{\epsilon }} \sim {\mathcal{D}}({\bf{0}},{\sigma }_{e}^{2}{{\bf{I}}}_{N})\\ \hskip -5pt {{\boldsymbol{\beta }}}_{k} \sim {\mathcal{D}}\left({\bf{0}},\frac{{\sigma }_{k}^{2}}{{M}_{k}}{\mathrm{{diag}}}({\bf{a}}_{k})\right),k\in \{1,\ldots ,K\}$$
(10)

This model is similar to the model in Eq.  (1) except that here we assume that the variance of effect sizes depend on continuous-valued annotation. Let \({\mathbf{a}}\)k be a Mk-vector where ak,m is the value of kth annotation at SNP m (the elements of \({\mathbf{a}}_{k}\) must be non-negative). Let Sk be the set of SNPs belong to annotation k. In this model, the SNP heritability of annotation k is defined as:

$${h}_{k}^{2}=\frac{{\sum }_{i\in {S}_{k}}\frac{{\sigma }_{k}^{2}}{{M}_{k}}{a}_{k,i}}{\mathop{\sum }\nolimits_{k = 1}^{K}{\sum }_{i\in {S}_{k}}\frac{{\sigma }_{k}^{2}}{{M}_{k}}{a}_{k,i}+{\sigma }_{e}^{2}},k\in \{1,\ldots ,K\}$$
(11)

To estimate the variance components of this new model, we only need to replace Xk with \({{\boldsymbol{X}}}_{k}{\mathrm{{diag}}}(\sqrt{{{\bf{a}}}_{k}})\) in Eq. (5) for every annotation k. We assessed the accuracy of RHE-mc in estimating variance components with continuous annotation in Supplementary Notes.

Simulations

We performed simulations to compare the performance of RHE-mc with several state-of-the-art methods for heritability estimation that cover the spectrum of methods that have been proposed.

We considered two simulation settings. In the large-scale simulation setting, we simulated phenotypes for the full set of UK Biobank genotypes consisting of M = 593,300 array SNPs and N = 337,205 individuals. We obtained the individuals by keeping unrelated white British individuals which are  >3rd degree relatives (defined as pairs of individuals with kinship coefficient  <1/2(9/2))17, and removing individuals with putative sex chromosome aneuploidy. The small-scale setting was designed so that we could compare the accuracies of RHE-mc to REML methods. In this setting, we simulated phenotypes from a subsampled set of genotypes from the UK Biobank data genotypes used in large-scale simulation35. Specifically, we randomly chose a subset of N = 10,000 individuals from the large-scale data so that we have M = 593,300 array SNPs and N = 10,000 individuals. We simulated phenotypes from genotypes using the following model which is used in refs. 8,10:

$$ {\sigma }_{m}^{2} =S{c}_{m}{w}_{m}^{b}{[{f}_{m}(1-{f}_{m})]}^{a}\\ \hskip -48pt {({{\boldsymbol{\beta }}}_{1},{{\boldsymbol{\beta }}}_{2},..,{{\boldsymbol{\beta }}}_{m})}^{T} \sim {\mathcal{N}}({\bf{0}},{\mathrm{{diag}}}({\sigma }_{1}^{2},{\sigma }_{2}^{2},...,{\sigma }_{m}^{2}))\\ y| {\boldsymbol{\beta }} \sim {\mathcal{N}}({\bf{X}}\beta ,(1-{h}^{2}){{\bf{I}}}_{N})$$
(12)

where S is a normalizing constant chosen so that \(\mathop{\sum }\nolimits_{m = 1}^{M}{\sigma }_{m}^{2}={h}^{2}\). Here h2 [0, 1], a {0, 0.75}, b {0, 1}. βm, fm, and wm are the effect size, the minor allele frequency, and LDAK score of mth SNP, respectively. Let cm {0, 1} be an indicator variable for the causal status of SNP m. The LD score of a SNP is defined to be the sum of the squared correlation of the SNP with all other SNPs that lie within a specific distance, and the LDAK score of a SNP is computed based on local levels of LD such that the LDAK score tends to be higher for SNPs in regions of low LD36. The above models relating genotype to phenotype are commonly used in methods for estimating SNP heritability: the GCTA Model (when a = b = 0 in Eq. (12)), which is used by the software GCTA27 and LD Score regression (LDSC)24, and the LDAK Model (where a = 0.75, b = 1 in Eq. (12)) used by software LDAK36. Moreover, under each model, we varied the proportion and minor allele frequency (MAF) of CVs. Proportion of CVs were set to be either 100% or 1%, and MAF of CVs drawn uniformly from [0, 0.5] or [0.01, 0.05] or [0.05, 0.5] to consider genetic architectures that are either infinitesimal or sparse, as well genetic architectures that include a mixture of common and rare SNPs as well as ones that consist of only rare or common SNPs. The true heritability were chosen from {0.1, 0.25, 0.5, 0.8}.

We generated 100 sets of simulated phenotypes for each setting of parameters and report accuracies averaged over these 100 sets.

Comparisons

For the large-scale simulations, we compared RHE-mc to methods that rely on summary statistics for estimating heritability. Among the summary statistic methods, LD score regression (LDSC)24 uses the slope from the GWAS χ2 statistics regressed on the LD scores to estimate heritability. Stratified LD score regression (S-LDSC)7 is an extension of LDSC for partitioning heritability from summary statistics. SumHer is the summary statistic analog of LDAK25. We ran S-LDSC with 10 binary MAF bin annotations defined such that each bin contains exactly 10% of the typed SNPs; this is intended to mirror the 10 MAF bin annotations in the S-LDSC “baseline-LD model”29 (see Supplementary Table 5). To run SumHer, we used the LDAK software to compute the default “LDAK weights” using in-sample LD 25,36,37. We then computed “LD tagging” using 1-Mb windows centered on each SNP as recommended25. To do a fair comparison we computed LD scores for LDSC, S-LDSC, GRE, and SumHer by using in-sample LD among the M SNPs, and in all simulations we aim to estimate the SNP-heritability explained by the same set of M SNP. We described the parameter settings of summary statistic methods in Supplementary Notes.

For the small-scale simulations, we compared RHE-mc to GCTA-mc and HE-mc27. GCTA-mc and HE-mc are the extensions of GCTA and HE to a multi-component LMM, respectively, where the variance components are typically defined by binning SNPs according to their MAF as well as local LD8. We ran GCTA-mc, HE-mc and RHE-mc using 24 bins formed by the combination of six bins based on MAF (MAF ≤ 0.01, 0.01 < MAF ≤ 0.02, 0.02 < MAF ≤ 0.03, 0.03 < MAF ≤ 0.4, 0.04 < MAF ≤ 0.05, MAF > 0.05) as well as four bins based on quartiles of the LDAK score of a SNP. We ran both GCTA-mc and RHE-mc allowing for estimates of a variance component to be negative.

For comparisons of runtime, we compared RHE-mc to GCTA27 and BOLT-REML4 which is a computationally efficient approximate method to compute the REML estimator. We ran all methods with 22 components (one for each chromosome). We also ran RHE-mc with  ≈300 components (corresponding to 10 Mb bins) on the UK Biobank genotype (Supplementary Fig. 10). To create our largest dataset, we replicate individuals from the UK Biobank and a subset of the imputed SNPs to obtain a dataset with one million individuals and SNPs. We use the latest versions of BOLT-REML (Version 2.3.2) and GCTA (Version 1.92.1) in our comparison. All comparisons are performed on an Intel(R) Xeon(R) CPU 2.10 GHz server with 128 GB RAM.

Heritability estimates in the UK Biobank

We estimated SNP-heritability for 22 complex traits (6 quantitative, 16 binary) in the UK Biobank17. In this study, we restricted our analysis to SNPs that were present in the UK Biobank Axiom array used to genotype the UK Biobank. SNPs with >1% missingness and minor allele frequency <1% were removed. Moreover, SNPs that fail the Hardy–Weinberg test at significance threshold 10−7 were removed. We restricted our study to self-reported British white ancestry individuals who are  >3rd degree relatives defined as pairs of individuals with kinship coefficient  <1/2(9/2)17. Furthermore, we removed individuals who are outliers for genotype heterozygosity and/or missingness. Finally, we obtained a set of N = 291,273 individuals and M = 459,792 SNPs to use in the real data analyses. We included age, sex, and the top 20 genetic principal components (PCs) as covariates in our analysis for all traits. We used PCs precomputed by the UK Biobank from a superset of 488,295 individuals. Additional covariates were used for waist-to-hip ratio (adjusted for BMI) and diastolic/systolic blood pressure (adjusted for cholesterol-lowering medication, blood pressure medication, insulin, hormone replacement therapy, and oral contraceptives).

Heritability partitioning

In our initial analysis, we removed SNPs with >1% missingness and minor allele frequency <1%. Moreover, we removed SNPs that fail the Hardy–Weinberg test at significance threshold 10−7 as well as SNPs that lie within the MHC region (Chr6: 25–35 Mb) to obtain 4,824,392 SNPs. We restricted our study to self-reported British white ancestry individuals who are  >3rd degree relatives defined as pairs of individuals with kinship coefficient  <1/2(9/2)17. Furthermore, we removed individuals who are outliers for genotype heterozygosity and/or missingness. Finally, we obtained 291,273 individuals . We partitioned SNPs into eight bins based on two MAF bins (MAF ≤ 0.05, MAF > 0.05) and quartiles of the LD-scores. For each bin k, we computed the heritability enrichment as the ratio of the percentage of heritability explained by SNPs in bin k to the the percentage of SNPs in bin k.

We considered an additional analysis in which we included SNPs with MAF > 0.1% resulting in N = 291,273 unrelated white British individuals and M = 7,774,235 imputed SNPs (MAF > 0.1%). We defined 144 bins based on 4 LD bins and 36 MAF bins. The 4 LD bins are defined based on quartile of LD-scores, and 36 MAF bins are defined based on 9-quantile of the following four intervals: 0.001 ≤ MAF ≤ 0.01, 0.01 < MAF ≤ 0.05, 0.05 ≤ MAF ≤ 0.10, 0.10 < MAF ≤ 0.50.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.