To the Editor — Recent work has highlighted the importance of accounting for linkage disequilibrium (LD)-dependent genetic architectures in analyses of heritability1,2,3,4,5. Two models incorporating LD-dependent architectures have been proposed for analyses of functional enrichment: the baseline-LD model4 used by stratified LD-score regression4,6 (S-LDSC) and the LDAK model1,3. Although both models include LD-dependent effects, they produce very different estimates of functional enrichment (for example, 9.35x ± 0.80 in ref. 4 and 1.34x ± 0.26 in ref. 3 for conserved regions), thus leading to different interpretations of the functional architecture of complex traits. To reconcile these findings, we performed a comprehensive set of formal model comparisons and empirical analyses. Each of these analyses supports the higher functional enrichment estimates of S-LDSC with the baseline-LD model; each paragraph below is detailed in a corresponding section of the Supplementary Note (and detailed analyses can be found in Supplementary Tables 1–10 and Supplementary Figs. 1–23).
We defined six heritability models, including the infinitesimal model that ref. 3 called the ‘GCTA model’7; the baseline-LD model4, combining functional annotations6 with LD-dependent and minor allele frequency (MAF)-dependent architectures; and the LDAK model3, combining LD-dependent and MAF-dependent architectures. Notably, the baseline-LD and LDAK models use very different LD-dependent architectures. For comparison purposes, we also defined the ‘α-model’1, comprising only the MAF-dependent part of the LDAK model; the ‘Gazal-LD model’, comprising only the LD-dependent and MAF-dependent parts of the baseline-LD model (analogous to the LDAK model); and the ‘baseline-LD + LDAK’ model, which flexibly combines baseline-LD and LDAK model annotations.
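As an illustration of how these models differ, the following minimal sketch computes toy per-SNP expected heritabilities. It assumes a parameterization that is not stated in this letter: expected per-SNP heritability proportional to w_j [f_j (1 − f_j)]^(1 + α), where f_j is the MAF and w_j the LD-dependent weight, with α = −1 for the GCTA model and α = −0.25 for the α-model and the LDAK model. Any genotype-certainty term is omitted, and all function names and input values are hypothetical.

```python
# Toy per-SNP expected heritability under three of the models named above.
# Assumption (not stated in this letter): the parameterization
#   E[h2_j] proportional to w_j * [f_j * (1 - f_j)] ** (1 + alpha)
# with alpha = -1 (GCTA model) and alpha = -0.25 (alpha-model, LDAK model).


def unnormalized_h2(maf, ld_weight, alpha):
    """Expected heritability of one SNP, up to a constant factor."""
    return ld_weight * (maf * (1.0 - maf)) ** (1.0 + alpha)


def per_snp_h2(mafs, ld_weights, alpha, total_h2):
    """Rescale the unnormalized values so that they sum to total_h2."""
    raw = [unnormalized_h2(f, w, alpha) for f, w in zip(mafs, ld_weights)]
    scale = total_h2 / sum(raw)
    return [scale * r for r in raw]


mafs = [0.05, 0.20, 0.50]        # hypothetical minor allele frequencies
ldak_weights = [1.0, 0.0, 0.5]   # hypothetical LD weights (0 = zero-weight SNP)
ones = [1.0] * len(mafs)         # no LD dependence
total_h2 = 0.5                   # hypothetical SNP heritability

gcta = per_snp_h2(mafs, ones, alpha=-1.0, total_h2=total_h2)
alpha_model = per_snp_h2(mafs, ones, alpha=-0.25, total_h2=total_h2)
ldak = per_snp_h2(mafs, ldak_weights, alpha=-0.25, total_h2=total_h2)

for name, values in [("GCTA", gcta), ("alpha", alpha_model), ("LDAK", ldak)]:
    print(name, [round(v, 4) for v in values])
```

Under these assumptions, the GCTA model spreads heritability evenly across SNPs, the α-model shifts heritability toward more common SNPs, and the LDAK model additionally removes all heritability from SNPs with zero weight.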
We performed formal model comparisons by using the likelihood approach of ref. 3, analyzing 16 UK Biobank traits (n = 20,000 unrelated British-ancestry samples). First, we analyzed genotypes imputed with 1000 Genomes (1000G), as in ref. 3 (Fig. 1a). We confirmed that the LDAK model attained higher likelihoods than the GCTA model and α-model, in agreement with ref. 3, but we observed that this model attained lower likelihoods than the Gazal-LD, baseline-LD and baseline-LD + LDAK models. Second, we analyzed genotypes imputed with the Haplotype Reference Consortium (HRC), a more comprehensive SNP set that produced higher likelihoods for each model analyzed (Fig. 1b). The LDAK model attained slightly lower likelihoods than the GCTA model and α-model, and much lower likelihoods than the Gazal-LD, baseline-LD and baseline-LD + LDAK models; the baseline-LD and baseline-LD + LDAK models attained similar likelihoods. We obtained similar conclusions by using out-of-sample polygenic prediction, as proposed in ref. 5.
We defined three methods for estimating functional enrichment: the S-LDSC method4, which uses the baseline-LD model4; the LDAK method3, which uses the LDAK model3; and the S-LDSC + LDAK method, which uses the baseline-LD + LDAK model. We emphasize the distinction between heritability models and functional enrichment methods.
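For reference, functional enrichment is conventionally defined as the proportion of SNP heritability attributed to an annotation divided by the proportion of SNPs in that annotation. The symbols below are notation introduced for this sketch and are not taken from the text above:

```latex
\mathrm{Enrichment}(C) \;=\; \frac{h^2_g(C) \,/\, h^2_g}{|C| \,/\, M}
```

Here C is the set of SNPs in the annotation, h^2_g(C) is the heritability attributed to those SNPs, h^2_g is the total SNP heritability and M is the total number of SNPs. Under this definition, an enrichment of 1 means that the annotation explains exactly its share of heritability.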
We compared the three methods by using extensive simulations. The S-LDSC method was unbiased in simulations under the baseline-LD model, and it produced unstable estimates under the LDAK model (unlike the results for real data). The LDAK method was downward biased under the LDAK model (because it restricts analyses to well-imputed SNPs) and was even more downward biased under the baseline-LD model. The S-LDSC + LDAK method produced robust enrichment estimates under both models (and the baseline-LD + LDAK model), thus validating it as a gold-standard method.
We compared the gold-standard S-LDSC + LDAK method to the S-LDSC and LDAK methods across 16 UK Biobank traits, analyzing 28 functional annotations. S-LDSC and S-LDSC + LDAK produced nearly identical estimates of enrichment (Fig. 2a; 8.11x ± 0.54 and 7.48x ± 0.49 for conserved regions); thus, adding LDAK model annotations did not significantly change S-LDSC enrichment estimates. In contrast, LDAK enrichment estimates were systematically lower than S-LDSC + LDAK enrichment estimates (Fig. 2b) but were higher than the LDAK enrichment estimates in ref. 3.
The LDAK model assigns zero weights to most SNPs (≥85%, to ‘thin’ SNPs in high LD), hard-coding zero heritability for these SNPs; this aspect may substantially affect functional enrichment, in which an out-of-annotation SNP in perfect LD with a zero-weight in-annotation SNP cannot act as a proxy. We investigated the effect of the proportion of SNPs with zero weights on LDAK enrichment estimates. We ran the LDAK software by using a non-default flag that models SNPs in perfect LD differently by assigning non-zero weights to all SNPs (LDAK-nonzeroweights). With this flag, LDAK enrichment estimates increased considerably (Fig. 2b), thus suggesting that assigning zero heritability to most SNPs may lead to downward bias in LDAK functional enrichment estimates.
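The mechanism can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions and is not the LDAK weighting algorithm: it assumes that the heritability tagged by a pair of SNPs in perfect LD is split between them in proportion to their weights, and all function names and numerical values are hypothetical.

```python
# Toy illustration only: this is NOT the LDAK weighting algorithm.
# Assumption: heritability tagged by SNPs in perfect LD is split between
# them in proportion to their weights.


def split_block_h2(block_h2, weights):
    """Split the heritability tagged by a perfect-LD block by weight."""
    total = sum(weights)
    return [block_h2 * w / total for w in weights]


def enrichment(per_snp_h2, in_annotation):
    """Proportion of heritability divided by proportion of SNPs."""
    annot_h2 = sum(h for h, a in zip(per_snp_h2, in_annotation) if a)
    prop_h2 = annot_h2 / sum(per_snp_h2)
    prop_snps = sum(in_annotation) / len(in_annotation)
    return prop_h2 / prop_snps


# SNPs 0 and 1 are in perfect LD and jointly tag h2 = 0.3 (hypothetical);
# SNP 0 is in the annotation, SNP 1 is not. SNPs 2 and 3 are independent.
in_annotation = [True, False, True, False]
independent_h2 = [0.1, 0.1]


def toy_enrichment(pair_weights):
    h2 = split_block_h2(0.3, pair_weights) + independent_h2
    return enrichment(h2, in_annotation)


nonzero = toy_enrichment([0.5, 0.5])   # both SNPs keep non-zero weight
thinned = toy_enrichment([0.0, 1.0])   # in-annotation SNP gets zero weight
print(nonzero, thinned)
```

In this toy setting, giving the in-annotation SNP zero weight moves all of the pair's heritability outside the annotation and lowers the enrichment. The direction of the shift depends on which SNP of the pair receives zero weight, so the example shows the mechanism only, not the size or sign of any bias in real data.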
Recently, the authors of LDAK introduced the SumHer method, which extends the LDAK model to estimate functional enrichment from summary statistics5; this method also produces low functional enrichment estimates5 (for example, 1.95x ± 0.07 for conserved regions). We investigated the effect of the proportion of SNPs with zero weights on SumHer enrichment estimates. We ran the SumHer method (LDAK software) by using the same non-default flag as above (SumHer-nonzeroweights). Again, SumHer enrichment estimates increased considerably (Fig. 2c), thus suggesting that assigning zero heritability to most SNPs may lead to downward bias in SumHer functional enrichment estimates. We also determined that a model similar to the SumHer model (but more amenable to our likelihood analyses) attained lower likelihoods than the baseline-LD model in formal model comparisons.
In summary, the baseline-LD model attained higher likelihoods than the LDAK model; the S-LDSC method produced functional enrichment estimates nearly identical to those produced by the gold-standard S-LDSC + LDAK method (which was unbiased in simulations under both baseline-LD and LDAK models) in empirical analyses of 16 UK Biobank traits; and the lower enrichment estimates for LDAK (and SumHer) could potentially be explained by the assignment of zero weights to most SNPs. S-LDSC enrichment estimates are further corroborated by published results on the functional enrichment of low-frequency variants8 (which are less affected by LD) and functional enrichment of fine-mapped SNPs (Fig. 2 of ref. 9 and Fig. 3 of ref. 10). We recommend using the S-LDSC method rather than the S-LDSC + LDAK method in most settings, owing to the complexities of computing LDAK model weights and running S-LDSC + LDAK. We note that our original LD-score regression (LDSC) method used the GCTA model to assess confounding in GWAS data, estimate genetic correlations between traits and estimate the heritability causally explained by all common SNPs. We anticipate that S-LDSC with the baseline-LD model4 will improve on the results of LDSC with the GCTA model for these applications, in agreement with recent observations5.
Accurate estimation of components of heritability relies on accurate modeling of genetic architectures, and we anticipate that new models and corresponding methods will continue to improve current knowledge. However, our results strongly support S-LDSC with the baseline-LD model as the current state of the art for functional enrichment analyses. Additional points are discussed in the Supplementary Note.
Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.
Baseline-LD model version 1.1 can be found at https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores.tgz. UK Biobank association statistics, computed with BOLT-LMM v2.3, are available at http://data.broadinstitute.org/alkesgroup/UKBB/.
1. Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Am. J. Hum. Genet. 91, 1011–1021 (2012).
2. Yang, J. et al. Nat. Genet. 47, 1114–1120 (2015).
3. Speed, D., Cai, N., Johnson, M. R., Nejentsev, S. & Balding, D. J. Nat. Genet. 49, 986–992 (2017).
4. Gazal, S. et al. Nat. Genet. 49, 1421–1427 (2017).
5. Speed, D. & Balding, D. J. Nat. Genet. 51, 277–284 (2019).
6. Finucane, H. K. et al. Nat. Genet. 47, 1228–1235 (2015).
7. Yang, J. et al. Nat. Genet. 42, 565–569 (2010).
8. Gazal, S. et al. Nat. Genet. 50, 1600–1607 (2018).
9. Farh, K. K.-H. et al. Nature 518, 337–343 (2015).
10. Huang, H. et al. Nature 547, 173–178 (2017).
We are grateful to P.-R. Loh for assistance with UK Biobank data and to L. O’Connor, D. Speed and D. Balding for helpful discussions. This research was conducted by using the UK Biobank Resource under Application 16549. A.L.P. is funded by NIH grants U01 HG009379, R01 MH101244 and R01 MH107649. S.G. is funded by NIH K99 HG010160-01. H.K.F. is funded by Eric and Wendy Schmidt and NIH DP5-OD024582. Computational analyses were performed on the Orchestra High-Performance Compute Cluster at Harvard Medical School.
The authors declare no competing interests.
Cite this article
Gazal, S., Marquez-Luna, C., Finucane, H.K. et al. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat Genet 51, 1202–1204 (2019). https://doi.org/10.1038/s41588-019-0464-1