Reconciling S-LDSC and LDAK functional enrichment estimates

To the Editor: Recent work has highlighted the importance of accounting for linkage disequilibrium (LD)-dependent genetic architectures in analyses of heritability1,2,3,4,5. Two models incorporating LD-dependent architectures have been proposed for analyses of functional enrichment: the baseline-LD model4 used by stratified LD-score regression4,6 (S-LDSC) and the LDAK model1,3. Although both models include LD-dependent effects, they produce very different estimates of functional enrichment (for example, 9.35x ± 0.80 in ref. 4 and 1.34x ± 0.26 in ref. 3 for conserved regions), thus leading to different interpretations of the functional architecture of complex traits. To reconcile these findings, we performed a comprehensive set of formal model comparisons and empirical analyses. Each of these analyses supports the higher functional enrichment estimates of S-LDSC with the baseline-LD model; each paragraph below is detailed in a corresponding section of the Supplementary Note (and detailed analyses can be found in Supplementary Tables 1–10 and Supplementary Figs. 1–23).

We defined six heritability models, including the infinitesimal model that ref. 3 called the ‘GCTA model’7; the baseline-LD model4, combining functional annotations6 with LD-dependent and minor allele frequency (MAF)-dependent architectures; and the LDAK model3, combining LD-dependent and MAF-dependent architectures. Notably, the baseline-LD and LDAK models use very different LD-dependent architectures. For comparison purposes, we also defined the ‘α-model’1, comprising only the MAF-dependent part of the LDAK model; the ‘Gazal-LD model’, comprising only the LD-dependent and MAF-dependent parts of the baseline-LD model (analogous to the LDAK model); and the ‘baseline-LD + LDAK’ model, which flexibly combines baseline-LD and LDAK model annotations.
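As an illustration only (not code from the analyses reported here), the MAF- and LD-weight-dependent models can be sketched as below. The sketch assumes the per-SNP form described in refs. 1 and 3 as we read them, in which expected heritability of SNP j is proportional to w_j × [f_j(1 − f_j)]^(1 + α), with α = −1 and unit weights for the GCTA model and α = −0.25 for the α-model and LDAK model. The information-score term of the LDAK model is omitted, the baseline-LD and Gazal-LD models are not represented, and the function name, MAFs and weights are hypothetical.

```python
def per_snp_heritability(mafs, weights=None, alpha=-1.0, h2=1.0):
    """Expected per-SNP heritability, proportional to
    w_j * [f_j * (1 - f_j)] ** (1 + alpha) and rescaled to sum to h2.

    Assumed form (refs. 1 and 3); the LDAK information-score term is omitted.
    """
    if weights is None:
        weights = [1.0] * len(mafs)  # no LD dependence
    raw = [w * (f * (1.0 - f)) ** (1.0 + alpha) for f, w in zip(mafs, weights)]
    total = sum(raw)
    return [h2 * r / total for r in raw]


# Hypothetical toy inputs: four SNPs, the last two in perfect LD.
mafs = [0.05, 0.20, 0.50, 0.50]
ldak_like_weights = [1.0, 0.5, 0.0, 1.0]  # zero weight = SNP "thinned" out

gcta = per_snp_heritability(mafs, alpha=-1.0)          # equal per-SNP heritability
alpha_model = per_snp_heritability(mafs, alpha=-0.25)  # MAF-dependent only
ldak = per_snp_heritability(mafs, weights=ldak_like_weights, alpha=-0.25)

for name, values in [("GCTA", gcta), ("alpha", alpha_model), ("LDAK-like", ldak)]:
    print(name, [round(v, 4) for v in values])
```

In this toy setting the zero-weight SNP receives exactly zero expected heritability under the LDAK-like model, whereas the GCTA model spreads heritability evenly across all four SNPs.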

We performed formal model comparisons by using the likelihood approach of ref. 3, analyzing 16 UK Biobank traits (n = 20,000 unrelated British-ancestry samples). First, we analyzed genotypes imputed with 1000 Genomes (1000G), as in ref. 3 (Fig. 1a). We confirmed that the LDAK model attained higher likelihoods than the GCTA model and α-model, in agreement with ref. 3, but we observed that this model attained lower likelihoods than the Gazal-LD, baseline-LD and baseline-LD + LDAK models. Second, we analyzed genotypes imputed with the Haplotype Reference Consortium (HRC), a more comprehensive SNP set that produced higher likelihoods for each model analyzed (Fig. 1b). The LDAK model attained slightly lower likelihoods than the GCTA model and α-model, and much lower likelihoods than the Gazal-LD, baseline-LD and baseline-LD + LDAK models; the baseline-LD and baseline-LD + LDAK models attained similar likelihoods. We obtained similar conclusions by using out-of-sample polygenic prediction, as proposed in ref. 5.

Fig. 1: Likelihood comparison of different models of per-SNP heritability.

We report the change in log likelihood compared with the LDAK model (ΔLL) of five other per-SNP heritability models, summed across 16 independent UK Biobank traits (n = 20,000). All six models include one heritability parameter that is maximized in sample when estimating the likelihood; any other parameters are maximized out of sample. a, Analyses using M = 2,835,699 well-imputed 1000G SNPs (as in ref. 3). b, Analyses using M = 4,631,901 well-imputed HRC SNPs. Numbers in parentheses in figure legends indicate the numbers of traits with ΔLL > 0. The M = 4.6 million well-imputed HRC SNPs consistently attained higher likelihoods than the M = 2.8 million well-imputed 1000G SNPs in comparisons using the same model. Further details and numerical results are provided in the Supplementary Note.
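The ΔLL summary described in this legend is simple bookkeeping, which the following minimal sketch makes explicit. The trait names and log-likelihood values are hypothetical placeholders, not results from our analyses, and the function name is ours for illustration.

```python
def summarize_delta_ll(ll_model, ll_reference):
    """Return (summed delta-LL, number of traits with delta-LL > 0).

    Both arguments map trait name -> log likelihood; the reference plays
    the role of the LDAK model in Fig. 1.
    """
    deltas = {trait: ll_model[trait] - ll_reference[trait] for trait in ll_model}
    total = sum(deltas.values())
    n_positive = sum(1 for d in deltas.values() if d > 0)
    return total, n_positive


# Hypothetical log likelihoods for three traits (illustrative values only).
ll_reference = {"trait_A": -1000.0, "trait_B": -1200.0, "trait_C": -900.0}
ll_candidate = {"trait_A": -990.0, "trait_B": -1195.5, "trait_C": -901.0}

total, n_positive = summarize_delta_ll(ll_candidate, ll_reference)
print(total, n_positive)
```

A positive summed ΔLL favors the candidate model over the reference, and the count of traits with ΔLL > 0 corresponds to the numbers shown in parentheses in the figure.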

We defined three methods for estimating functional enrichment: the S-LDSC method4, which uses the baseline-LD model4; the LDAK method3, which uses the LDAK model3; and the S-LDSC + LDAK method, which uses the baseline-LD + LDAK model. We emphasize the distinction between heritability models and functional enrichment methods.
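For readers less familiar with the quantity being estimated, the sketch below computes enrichment for a binary annotation under one common definition: the proportion of heritability explained by SNPs in the annotation divided by the proportion of SNPs in the annotation. It assumes per-SNP heritabilities are known, which is never the case in practice; it illustrates the estimand only and is not an implementation of S-LDSC, LDAK or S-LDSC + LDAK. All values and names are hypothetical.

```python
def enrichment(per_snp_h2, in_annotation):
    """Enrichment of a binary annotation:
    (share of heritability in annotation) / (share of SNPs in annotation)."""
    h2_total = sum(per_snp_h2)
    h2_annot = sum(h for h, a in zip(per_snp_h2, in_annotation) if a)
    prop_h2 = h2_annot / h2_total
    prop_snps = sum(1 for a in in_annotation if a) / len(in_annotation)
    return prop_h2 / prop_snps


# Hypothetical example: 8 SNPs, the first 2 lie in the annotation.
per_snp_h2 = [0.25, 0.25, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625]
in_annotation = [True, True, False, False, False, False, False, False]

# 50% of heritability in 25% of SNPs gives an enrichment of 2.
print(enrichment(per_snp_h2, in_annotation))
```

An enrichment of 1 means that the annotation carries heritability in proportion to its size; values above 1 indicate that its SNPs explain more heritability than expected from their number alone.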

We compared the three methods by using extensive simulations. The S-LDSC method was unbiased in simulations under the baseline-LD model, and it produced unstable estimates under the LDAK model (unlike the results for real data). The LDAK method was downward biased under the LDAK model (because it restricts analyses to well-imputed SNPs) and was even more downward biased under the baseline-LD model. The S-LDSC + LDAK method produced robust enrichment estimates under both models (and the baseline-LD + LDAK model), thus validating it as a gold-standard method.

We compared the gold-standard S-LDSC + LDAK method to the S-LDSC and LDAK methods across 16 UK Biobank traits, analyzing 28 functional annotations. S-LDSC and S-LDSC + LDAK produced nearly identical estimates of enrichment (Fig. 2a; 8.11x ± 0.54 and 7.48x ± 0.49 for conserved regions); thus, adding LDAK model annotations did not significantly change S-LDSC enrichment estimates. In contrast, LDAK enrichment estimates were systematically lower than S-LDSC + LDAK enrichment estimates (Fig. 2b) but were higher than the LDAK enrichment estimates in ref. 3.

Fig. 2: Comparison of functional enrichment estimates in analyses of UK Biobank traits.

a–c, For 28 functional annotations, we report functional enrichment estimates of S-LDSC with the baseline-LD model (n = 434,000; a), the LDAK method (n = 20,000; b) and the SumHer method (n = 434,000) (restricted to the 24 annotations included in the SumHer model; c) versus functional enrichment estimates of S-LDSC + LDAK with the corresponding estimand (n = 434,000); S-LDSC + LDAK (S-LDSC estimand) and S-LDSC + LDAK (LDAK estimand) produced highly similar results for these 28 annotations (Supplementary Note). Meta-analysis of the results was performed across 16 independent UK Biobank traits. For the LDAK method (b) and the SumHer method (c), we also report results for corresponding methods using a non-default flag that models SNPs in perfect LD differently by assigning nonzero heritability to all SNPs (LDAK-nonzeroweights and SumHer-nonzeroweights, respectively). In each case, we report the concordance correlation coefficient (ρc) with S-LDSC + LDAK. Dashed gray lines represent y = x. Error bars represent 95% confidence intervals for annotations for which the estimated enrichment was significantly different (P < 0.05; two-sided z test) between the two methods. Further details and numerical results are provided in the Supplementary Note.
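The two summary statistics named in this legend can be sketched as follows, assuming Lin's standard definition of the concordance correlation coefficient and a z test that treats the two estimates as independent with known standard errors. Both are simplifying assumptions made for illustration; the exact procedures used are described in the Supplementary Note. The input values are hypothetical.

```python
import math


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient:
    2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def two_sided_z_test(est_a, se_a, est_b, se_b):
    """z statistic and two-sided P value for the difference of two estimates,
    assuming independence (an assumption of this sketch)."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical enrichment estimates from two methods for three annotations.
method_1 = [1.0, 2.0, 3.0]
method_2 = [2.0, 3.0, 4.0]  # perfectly correlated but shifted upward by 1
print(concordance_correlation(method_1, method_2))

z, p = two_sided_z_test(3.0, 0.3, 2.0, 0.4)
print(z, p)
```

Unlike the Pearson correlation, ρc penalizes systematic shifts away from the y = x line, which is why it suits a comparison of whether two methods produce the same enrichment values rather than merely correlated ones.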

The LDAK model assigns zero weights to most SNPs (≥85%, to ‘thin’ SNPs in high LD), hard-coding zero heritability for these SNPs; this aspect may substantially affect functional enrichment, in which an out-of-annotation SNP in perfect LD with a zero-weight in-annotation SNP cannot act as a proxy. We investigated the effect of the proportion of SNPs with zero weights on LDAK enrichment estimates. We ran the LDAK software by using a non-default flag that models SNPs in perfect LD differently by assigning non-zero weights to all SNPs (LDAK-nonzeroweights). With this flag, LDAK enrichment estimates increased considerably (Fig. 2b), thus suggesting that assigning zero heritability to most SNPs may lead to downward bias in LDAK functional enrichment estimates.

Recently, the authors of LDAK introduced the SumHer method, which extends the LDAK model to estimate functional enrichment from summary statistics5; this method also produces low functional enrichment estimates5 (for example, 1.95x ± 0.07 for conserved regions). We investigated the effect of the proportion of SNPs with zero weights on SumHer enrichment estimates. We ran the SumHer method (LDAK software) by using the same non-default flag as above (SumHer-nonzeroweights). Again, SumHer enrichment estimates increased considerably (Fig. 2c), thus suggesting that assigning zero heritability to most SNPs may lead to downward bias in SumHer functional enrichment estimates. We also determined that a model similar to the SumHer model (but more amenable to our likelihood analyses) attained lower likelihoods than the baseline-LD model in formal model comparisons.

In summary, the baseline-LD model attained higher likelihoods than the LDAK model; the S-LDSC method produced functional enrichment estimates nearly identical to those produced by the gold-standard S-LDSC + LDAK method (which was unbiased in simulations under both baseline-LD and LDAK models) in empirical analyses of 16 UK Biobank traits; and the lower enrichment estimates for LDAK (and SumHer) could potentially be explained by the assignment of zero weights to most SNPs. S-LDSC enrichment estimates are further corroborated by published results on the functional enrichment of low-frequency variants8 (which are less affected by LD) and functional enrichment of fine-mapped SNPs (Fig. 2 of ref. 9 and Fig. 3 of ref. 10). We recommend using the S-LDSC method rather than the S-LDSC + LDAK method in most settings, owing to the complexities of computing LDAK model weights and running S-LDSC + LDAK. We note that our original LD-score regression (LDSC) method used the GCTA model to assess confounding in GWAS data, estimate genetic correlations between traits and estimate the heritability causally explained by all common SNPs. We anticipate that S-LDSC with the baseline-LD model4 will improve on the results of LDSC with the GCTA model for these applications, in agreement with recent observations5.

Accurate estimation of components of heritability relies on accurate modeling of genetic architectures, and we anticipate that new models and corresponding methods will continue to improve current knowledge. However, our results strongly support S-LDSC with the baseline-LD model as the current state of the art for functional enrichment analyses. Additional points are discussed in the Supplementary Note.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Baseline-LD model version 1.1 can be found at https://data.broadinstitute.org/alkesgroup/LDSCORE/1000G_Phase3_baselineLD_v1.1_ldscores.tgz. UK Biobank association statistics, computed with BOLT-LMM v2.3, are available at http://data.broadinstitute.org/alkesgroup/UKBB/.

Code availability

LDSC software is available at https://github.com/bulik/ldsc. LDAK version 5 is available at http://dougspeed.com/downloads/.

References

1. Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Am. J. Hum. Genet. 91, 1011–1021 (2012).
2. Yang, J. et al. Nat. Genet. 47, 1114–1120 (2015).
3. Speed, D., Cai, N., Johnson, M. R., Nejentsev, S. & Balding, D. J. Nat. Genet. 49, 986–992 (2017).
4. Gazal, S. et al. Nat. Genet. 49, 1421–1427 (2017).
5. Speed, D. & Balding, D. J. Nat. Genet. 51, 277–284 (2019).
6. Finucane, H. K. et al. Nat. Genet. 47, 1228–1235 (2015).
7. Yang, J. et al. Nat. Genet. 42, 565–569 (2010).
8. Gazal, S. et al. Nat. Genet. 50, 1600–1607 (2018).
9. Farh, K. K.-H. et al. Nature 518, 337–343 (2015).
10. Huang, H. et al. Nature 547, 173–178 (2017).

Acknowledgements

We are grateful to P.-R. Loh for assistance with UK Biobank data and to L. O’Connor, D. Speed and D. Balding for helpful discussions. This research was conducted by using the UK Biobank Resource under Application 16549. A.L.P. is funded by NIH grants U01 HG009379, R01 MH101244 and R01 MH107649. S.G. is funded by NIH K99 HG010160-01. H.K.F. is funded by Eric and Wendy Schmidt and NIH DP5-OD024582. Computational analyses were performed on the Orchestra High-Performance Compute Cluster at Harvard Medical School.

Author information

Contributions

S.G., H.K.F. and A.L.P. designed experiments. S.G. performed experiments. S.G., C.M.L. and H.K.F. analyzed data. S.G., H.K.F. and A.L.P., with assistance from C.M.L., wrote the manuscript.

Corresponding authors

Correspondence to Steven Gazal or Hilary K. Finucane or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Information

Supplementary Notes, Supplementary Figs. 1–23 and Supplementary Tables 1 and 5

Reporting Summary

Supplementary Tables

Supplementary Tables 2–4 and 6–10


Cite this article

Gazal, S., Marquez-Luna, C., Finucane, H.K. et al. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat Genet 51, 1202–1204 (2019). https://doi.org/10.1038/s41588-019-0464-1
