Evaluating and improving heritability models using summary statistics


There is currently much debate regarding the best model for how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I model, the authors of LD Score Regression recommend the Baseline LD model, and we have recommended the LDAK model. Here we provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Based on 31 studies of complex human traits (average sample size 136,000), we show that the Baseline LD model is more realistic than other existing heritability models, but that it can be improved by incorporating features from the LDAK model. Our framework also provides a method for estimating the selection-related parameter α from summary statistics. We find strong evidence (P < 1 × 10^−6) of negative genome-wide selection for traits, including height, systolic blood pressure and college education, and that the impact of selection is stronger inside functional categories, such as coding SNPs and promoter regions.

Fig. 1: Genetic architecture estimates from different heritability models.
Fig. 2: Evidence of selection.

Data availability

We performed the UKBb GWAS using data applied for and downloaded via the UK Biobank website (www.ukbiobank.ac.uk). We obtained summary statistics for the 17 Public GWAS from the websites of the corresponding studies. We downloaded the 1000 Genomes Project data from the LDSC website (www.github.com/bulik/ldsc).


References

1. Speed, D., Cai, N., Johnson, M. R., Nejentsev, S. & Balding, D. J. Reevaluation of SNP heritability in complex human traits. Nat. Genet. 49, 986–992 (2017).
2. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
3. Bulik-Sullivan, B. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
4. Speed, D. & Balding, D. Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits. Nat. Genet. 51, 277–284 (2019).
5. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
6. Speed, D., Hemani, G., Johnson, M. & Balding, D. Improved heritability estimation from genome-wide SNP data. Am. J. Hum. Genet. 91, 1011–1021 (2012).
7. Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
8. Evans, L. M. et al. Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits. Nat. Genet. 50, 737–745 (2018).
9. Finucane, H. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
10. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
11. Corbeil, R. R. & Searle, S. R. Restricted maximum likelihood (REML) estimation of variance components in the mixed model. Technometrics 18, 31–38 (1976).
12. Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
13. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
14. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
15. Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017).
16. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974).
18. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
19. Yang, J., Zeng, J., Goddard, M. E., Wray, N. R. & Visscher, P. M. Concepts, estimation and interpretation of SNP-based heritability. Nat. Genet. 49, 1304–1310 (2017).
20. Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK models and functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
21. Hou, K. et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat. Genet. 51, 1244–1251 (2019).
22. Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2016).
23. Ypma, T. Historical development of the Newton–Raphson method. SIAM Rev. 37, 531–551 (1995).
24. Efron, B. & Stein, C. The jackknife estimate of variance. Ann. Stat. 9, 586–596 (1981).
25. Speed, D. et al. Describing the genetic architecture of epilepsy through heritability analysis. Brain 137, 2680–2689 (2014).
26. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
27. Liu, J. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
28. The Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).
29. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
30. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
31. Scott, R. et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes 66, 2888–2902 (2017).
32. Zheng, H. et al. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature 526, 112–117 (2015).
33. Locke, A. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
34. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 626–633 (2016).
35. Wood, A. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
36. Perry, J. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014).
37. Day, F. et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat. Genet. 47, 1294–1303 (2015).
38. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).
39. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
40. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
41. The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).


Acknowledgements

We thank B. Shaban for help with the LDAK website, and A. Price, S. Gazal and H. Finucane for helpful discussions. D.S. is funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 754513, by Aarhus University Research Foundation and by the Independent Research Fund Denmark under project no. 7025-00094B. D.S. and D.J.B. are funded by the Australian Research Council under project no. DP190103188.

Author information




Contributions

D.S. and J.H. performed the analyses. D.S. and D.J.B. wrote the manuscript.

Corresponding author

Correspondence to Doug Speed.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of likelihoods.

Plots compare likelihood ratio test (LRT) statistics (twice the improvement in log likelihood relative to the null model) computed using the likelihood from restricted maximum likelihood (REML) with those from loglSS, our new approximate likelihood, and loglOld, the approximate likelihood we reported in the original version of SumHer (see Supplementary Note for details). We analyze only the 14 UKBb GWAS, because REML requires individual-level data, and we consider only the GCTA, LDAK and LDAK-Thin Models, because REML is feasible only for simple heritability models. To ensure a fair comparison, when running SumHer we restrict the reference panel to the 4.7 M GWAS SNPs. The bottom plots are zoomed versions of the top plots (obtained by excluding height, the most heritable trait). We see that the LRT statistics from loglSS are highly concordant with those from REML, indicating that the weights used when calculating loglSS perform well. We observe lower concordance between the LRT statistics from loglOld and those from REML, reflecting that loglOld was based on the assumption that test statistics were Gaussian distributed, rather than Gamma distributed.
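The LRT statistic and the Gamma assumption mentioned in this caption can be illustrated with a minimal Python sketch. It is not the authors' loglSS: it assumes that each test statistic S_j equals its model expectation E[S_j] multiplied by a chi-squared variable with one degree of freedom, which gives a Gamma distribution with shape 1/2 and scale 2E[S_j], and it uses unit weights by default, whereas the weights in loglSS are defined in the Supplementary Note. The function names and all numbers are hypothetical.

```python
import math


def gamma_loglik(stats, expected, weights=None):
    """Weighted sum of log Gamma(shape=1/2, scale=2*E[S_j]) densities.

    Assumption (not taken from the paper): S_j = E[S_j] * chi-squared(1).
    """
    if weights is None:
        weights = [1.0] * len(stats)  # unit weights, purely illustrative
    total = 0.0
    for s, m, w in zip(stats, expected, weights):
        scale = 2.0 * m
        # log density of Gamma(k=1/2, theta=scale) evaluated at s > 0
        total += w * (-0.5 * math.log(s) - s / scale
                      - 0.5 * math.log(scale) - math.lgamma(0.5))
    return total


def lrt_statistic(loglik_model, loglik_null):
    """Twice the improvement in log likelihood relative to the null model."""
    return 2.0 * (loglik_model - loglik_null)


# Hypothetical test statistics and expectations under two models.
observed = [0.5, 2.0, 6.0]
null_expected = [1.0, 1.0, 1.0]   # no heritability: E[S_j] = 1
model_expected = [1.0, 1.5, 3.0]  # expectations from some heritability model
lrt = lrt_statistic(gamma_loglik(observed, model_expected),
                    gamma_loglik(observed, null_expected))
```

A larger `lrt` indicates that the heritability model explains the observed test statistics better than the null model does.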

Extended Data Fig. 2 Estimated proportions of SNP heritability.

This is an expanded version of Fig. 1d, and shows that estimates of functional enrichments tend to converge as the heritability model becomes more complex. Plots report the estimated proportion of SNP heritability contributed by each category of SNPs, averaged across either the 14 UKBb or 17 Public GWAS (vertical segments indicate 95% confidence intervals). Bars indicate the heritability model used and are ordered by number of parameters (see Supplementary Table 13 for definitions): GCTA + 1Fun Model (two parameters; used by Gusev et al., ref. 12), LDAK + 1Fun Model (two parameters; Speed et al., ref. 1), LDAK + 24Fun Model (25 parameters; Speed et al., ref. 4), Baseline Model (53 parameters; Finucane et al., ref. 9), BLD-LDAK and BLD-LDAK + Alpha Models (66 and 67 parameters; this paper) and Baseline LD Model (75 parameters; Gazal et al., ref. 10). The estimated enrichment of a category is obtained by dividing its estimated proportion of SNP heritability by the proportion of SNPs it contains (horizontal dashed lines). Numerical values are provided in Supplementary Tables 5 & 6.
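The enrichment calculation described in this caption is a ratio of two proportions. The short Python sketch below shows it with hypothetical numbers; the function name and inputs are illustrative and do not come from SumHer.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Proportion of SNP heritability divided by proportion of SNPs."""
    share_of_h2 = h2_category / h2_total
    share_of_snps = snps_category / snps_total
    return share_of_h2 / share_of_snps


# Hypothetical category: 2% of SNPs contributing 10% of SNP heritability,
# which corresponds to an enrichment of about 5.
example = enrichment(h2_category=0.05, h2_total=0.5,
                     snps_category=20_000, snps_total=1_000_000)
```

An enrichment of 1 means the category contributes heritability in proportion to its size; values above 1 indicate that its SNPs contribute more than average.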

Extended Data Fig. 3 Reduced-complexity heritability models.

The seven-parameter BLD-LDAK-Lite Model is a reduced version of the BLD-LDAK Model, obtained by removing two of the nine continuous annotations and all 57 binary annotations (Supplementary Table 8 explains how we used forward stepwise selection to decide which of the continuous annotations to retain). The nine-parameter BLD-LDAK-Lite+1Fun Model adds to the BLD-LDAK-Lite Model one function indicator and the corresponding 500 base pair buffer, while the eight-parameter BLD-LDAK-Lite+Alpha Model is the same as the BLD-LDAK-Lite Model, except annotations are scaled by [f_j(1 − f_j)]^(1+α). These plots show that estimates of SNP heritability and confounding bias from the BLD-LDAK-Lite Model, and average estimates of functional enrichments from the BLD-LDAK-Lite+1Fun Model, are close to those from the BLD-LDAK Model, while estimates of α from the BLD-LDAK-Lite+Alpha Model are close to those from the BLD-LDAK + Alpha Model. Numbers indicate how many of the pairs of estimates are inconsistent either nominally or after Bonferroni correction. Numerical values are provided in Supplementary Tables 37.
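To make the α scaling in this caption concrete, the Python sketch below evaluates the factor f_j(1 − f_j) raised to the power 1 + α for a few values. It assumes that f_j is the minor allele frequency of SNP j, which the caption does not state, and it shows only the scaling factor, not how the scaled annotations enter the heritability model. The function name is hypothetical.

```python
def maf_scaling(freq, alpha):
    """Scaling factor [f(1 - f)]^(1 + alpha) for allele frequency f."""
    return (freq * (1.0 - freq)) ** (1.0 + alpha)


# Ratio of the factor for a rare SNP (f = 0.01) to a common SNP (f = 0.5).
for alpha in (-1.0, -0.25, 0.0):
    ratio = maf_scaling(0.01, alpha) / maf_scaling(0.5, alpha)
    print(f"alpha = {alpha:+.2f}: rare/common scaling ratio = {ratio:.4f}")
```

With α = −1 the factor equals 1 at every frequency, so the scaling does not depend on allele frequency. As α increases towards 0, the factor for rare SNPs shrinks relative to that for common SNPs.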

Extended Data Fig. 4 Comparison with GRE.

Hou et al. (ref. 21) proposed GRE, a method for estimating SNP heritability without specifying a heritability model. GRE requires individual-level data and that there are more individuals than the number of SNPs on the largest chromosome. Here we compare estimates from GRE to those from SumHer for the 14 UKBb GWAS. To run GRE, we follow the instructions at www.github.com/bogdanlab/h2-GRE; to satisfy the sample size requirement, we use only the 623k directly genotyped SNPs (Hou et al. did likewise). For SumHer, we consider ten heritability models; to enable a fair comparison with GRE, we always restrict the reference panel to genotyped SNPs. The first three plots compare estimates of SNP heritability from GRE and SumHer. It is noticeable that when using only genotyped SNPs, changing the heritability model has a much smaller impact on estimates of SNP heritability than when using imputed SNPs (Supplementary Table 3); this reflects that with fewer SNPs, the impact of the prior assumptions is reduced. Nonetheless, if we consider GRE estimates to be the ‘gold standard’, then this analysis indicates that the LDAK-Thin, GCTA-LDMS-R, GCTA-LDMS-I, BLD-LDAK, BLD-LDAK + Alpha and Baseline LD Models produce more accurate estimates of SNP heritability than the GCTA, LDAK, LDAK + 24Fun and Baseline Models. In the fourth plot, the solid and dashed lines mark the point estimate and 95% confidence intervals for the gradient when regressing onto the Akaike Information Criterion (AIC) the absolute difference between estimates from SumHer and GRE (when performing this regression, we include an indicator for trait, to reflect that AIC will tend to be lower for more heritable traits). If we again consider GRE estimates to be the gold standard, then the fact that the gradient is significantly positive (P < 10^−6) indicates that lower AIC implies more accurate estimates of SNP heritability.
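For readers unfamiliar with the AIC used in this comparison, the Python sketch below applies the standard definition from Akaike (ref. 17), twice the number of parameters minus twice the log likelihood, and ranks models by it. The parameter counts are those quoted in the Extended Data Fig. 2 caption; the log likelihoods are hypothetical values chosen for illustration, not results from the paper.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion; lower values indicate a better fit
    after penalizing the number of parameters."""
    return 2.0 * n_params - 2.0 * loglik


def rank_by_aic(models):
    """models maps name -> (loglik, n_params); returns names, lowest AIC first."""
    return sorted(models, key=lambda name: aic(*models[name]))


# Hypothetical log likelihoods paired with parameter counts from the caption
# of Extended Data Fig. 2.
example = {
    "Baseline LD": (-1000.0, 75),
    "BLD-LDAK": (-995.0, 66),
    "GCTA + 1Fun": (-1100.0, 2),
}
ranking = rank_by_aic(example)
```

Because the penalty grows with the parameter count, a model with more parameters is preferred only if its log likelihood improves by more than the number of extra parameters.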

Extended Data Fig. 5 Comparison of weighted least-squares and maximum likelihood solvers.

The plots compare likelihood ratio test (LRT) statistics (twice the improvement in log likelihood relative to the null model), computed using loglSS, our approximate model likelihood. We consider six heritability models (see Supplementary Table 13 for definitions), estimating parameters using either maximum likelihood (our recommended approach) or weighted least-squares regression (the approach used by LDSC and previously by SumHer). Note that when we estimate parameters for the Baseline and Baseline LD Models using weighted least-squares regression, we frequently obtain negative E[S_j]; so that we can compute loglSS, we replace these with 10^−6. These plots show that for the GCTA, GCTA-LDMS-I, LDAK and LDAK + 24Fun Models (the simpler models), the two solvers result in near-identical model fit. However, for the Baseline and Baseline LD Models (the more complex models), weighted least-squares regression often results in a worse fit, because it does not respect that test statistics are approximately Gamma distributed. Note that the reason we observe discordance between the weighted least-squares estimates from LDSC and SumHer (mainly evident for the Baseline Model) is that the SumHer weighted least-squares solver is always iterative, whereas the LDSC solver is iterative when provided with a single-parameter heritability model, but one-step when provided with a multi-parameter model.
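The replacement of negative expected test statistics described in this caption can be sketched as follows. This Python fragment is an illustration only: the function names and the fitted values are hypothetical, and it floors zero as well as negative values (an added assumption), since the logarithm needed by a Gamma log likelihood is undefined for both.

```python
import math


def floor_expected(expected, floor=1e-6):
    """Replace non-positive expected test statistics with a small positive floor."""
    return [m if m > 0.0 else floor for m in expected]


def all_logs_defined(expected):
    """True when every expected value is positive with a finite logarithm."""
    return all(m > 0.0 and math.isfinite(math.log(m)) for m in expected)


# Hypothetical expectations of the kind a least-squares solver might return.
fitted = [1.8, -0.3, 1.1, -0.002]
floored = floor_expected(fitted)
```

After flooring, every expectation has a defined logarithm, so a likelihood can be evaluated, although SNPs with floored expectations will typically contribute a very poor fit.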

Extended Data Fig. 6 Reduced quality control for UKBb GWAS.

For our main analysis of the UKBb GWAS, we first identified individuals with values for all 14 phenotypes, then filtered so that no pair remained with allelic correlation >0.02 (Supplementary Note 6). As a secondary analysis, we instead identified individuals with values for any of the 14 phenotypes, then filtered so that no pair remained with allelic correlation >0.03125. This increased the number of individuals from 130,080 to 246,655, with on average 236k phenotypic values per GWAS (range 201k to 247k). The first plot shows that increasing the sample size does not change the ranking of models based on the Akaike Information Criterion. The remaining three plots show that it does not significantly change estimates of SNP heritability or average functional enrichments from the BLD-LDAK Model, nor estimates of the selection-related parameter α from the BLD-LDAK + Alpha Model (horizontal and vertical segments indicate 95% confidence intervals; numbers indicate how many of the pairs of estimates are inconsistent either nominally or after Bonferroni correction).
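The relatedness filter described in this caption requires that no remaining pair of individuals has allelic correlation above a threshold. The Python sketch below shows one simple way to meet that requirement, a greedy removal of the individual involved in the most over-threshold pairs. It is an assumed strategy for illustration and not necessarily the procedure the authors used (see Supplementary Note 6); the identifiers and correlations are hypothetical.

```python
def filter_related(ids, pair_corr, threshold):
    """Greedily drop individuals until no kept pair exceeds the threshold.

    pair_corr maps (id_a, id_b) tuples to allelic correlations.
    """
    kept = set(ids)
    related = {frozenset(pair) for pair, r in pair_corr.items() if r > threshold}
    while True:
        # Count over-threshold pairs among the individuals still kept.
        counts = {}
        for pair in related:
            if pair <= kept:
                for person in pair:
                    counts[person] = counts.get(person, 0) + 1
        if not counts:
            break
        # Remove the individual in the most related pairs (ties broken by id).
        worst = max(sorted(counts), key=lambda person: counts[person])
        kept.discard(worst)
    return [person for person in ids if person in kept]


# Hypothetical individuals and pairwise allelic correlations.
people = ["A", "B", "C", "D"]
correlations = {("A", "B"): 0.05, ("B", "C"): 0.04, ("C", "D"): 0.01}
strict = filter_related(people, correlations, threshold=0.02)
relaxed = filter_related(people, correlations, threshold=0.03125)
```

In this toy example, removing individual B resolves both over-threshold pairs, so the other three individuals are retained under either threshold.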

Supplementary information

Supplementary Information

Supplementary Note and Tables 1–16

Reporting Summary

Cite this article

Speed, D., Holmes, J. & Balding, D.J. Evaluating and improving heritability models using summary statistics. Nat Genet 52, 458–462 (2020). https://doi.org/10.1038/s41588-020-0600-y
