High-definition likelihood inference of genetic correlations across human complex traits

Ning, Zheng; Pawitan, Yudi; Shen, Xia

doi:10.1038/s41588-020-0653-y

Technical Report
Published: 29 June 2020

Nature Genetics volume 52, pages 859–864 (2020)

Abstract

Genetic correlation is a central parameter for understanding shared genetic architecture between complex traits. By using summary statistics from genome-wide association studies (GWAS), linkage disequilibrium score regression (LDSC) was developed for unbiased estimation of genetic correlations. Although easy to use, LDSC only partially utilizes LD information. By fully accounting for LD across the genome, we develop a high-definition likelihood (HDL) method to improve precision in genetic correlation estimation. Compared to LDSC, HDL reduces the variance of genetic correlation estimates by about 60%, equivalent to a 2.5-fold increase in sample size. We apply HDL and LDSC to estimate 435 genetic correlations among 30 behavioral and disease-related phenotypes measured in the UK Biobank (UKBB). In addition to 154 significant genetic correlations observed for both methods, HDL identified another 57 significant genetic correlations, compared to only another 2 significant genetic correlations identified by LDSC. HDL brings more power to genomic analyses and better reveals the underlying connections across human complex traits.
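The arithmetic behind two of the figures in the abstract can be checked with a short sketch. It assumes that the variance of an estimator scales inversely with sample size, which is the usual large-sample approximation and not something this page states explicitly; the function name `equivalent_sample_size_factor` is hypothetical.

```python
from math import comb, isclose

def equivalent_sample_size_factor(variance_reduction: float) -> float:
    """Sample-size multiplier that matches a fractional variance reduction.

    Assumption: estimator variance is proportional to 1/n, so cutting the
    variance to (1 - r) of its value is equivalent to multiplying n by 1/(1 - r).
    """
    if not 0.0 <= variance_reduction < 1.0:
        raise ValueError("variance_reduction must be in [0, 1)")
    return 1.0 / (1.0 - variance_reduction)

# A 60% variance reduction, as reported for HDL relative to LDSC.
factor = equivalent_sample_size_factor(0.60)

# Number of unordered trait pairs among 30 phenotypes.
n_pairs = comb(30, 2)

print(f"sample-size factor: {factor:.2f}, trait pairs: {n_pairs}")
```

Under that assumption the factor comes out at about 2.5, and 30 phenotypes give 435 pairs, matching the counts quoted above.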

**Fig. 1: Relative efficiency of HDL against LDSC when 10% of SNPs are causal.**

**Fig. 2: Genetic correlation estimates from HDL and LDSC among 30 phenotypes in the UKBB.**

**Fig. 3: Comparing genetic correlation estimates from HDL and LDSC with those from LMMs across 11 phenotypes in the UKBB.**

Data availability

The individual-level genotype and phenotype data are available by application from the UKBB (http://www.ukbiobank.ac.uk/). The UKBB GWAS summary statistics by the Neale laboratory can be obtained from http://www.nealelab.is/uk-biobank/. Source data are provided with this paper.

Code availability

HDL software is available at https://github.com/zhenin/HDL/. LDSC software is available at https://github.com/bulik/ldsc/. PLINK 2.0 (https://www.cog-genomics.org/plink/2.0/) was used to extract individual-level data of imputed SNPs from the UKBB. PLINK 1.9 (https://www.cog-genomics.org/plink/) and LDAK (http://dougspeed.com/ldak/) were used in LD correlation calculation and simulations.

Acknowledgements

We thank the UKBB resource, approved under application no. 14302 and 19655, for the individual-level genotype data used in LD correlation calculation and simulations. X.S. was in receipt of a Swedish Research Council starting grant (no. 2017-02543). Y.P. received a Swedish Research Council grant (no. 2016-04194). We thank the Edinburgh Compute and Data Facility (ECDF) for providing high-performance computing resources.

Author information

Authors and Affiliations

Biostatistics Group, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Zheng Ning & Xia Shen
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Zheng Ning, Yudi Pawitan & Xia Shen
Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
Xia Shen

Contributions

X.S. and Y.P. initiated and coordinated the study. Z.N. performed data analysis. All authors contributed to method development and manuscript writing.

Corresponding author

Correspondence to Xia Shen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Relative efficiency of HDL against LDSC when 100% SNPs are causal.

In each heritability group, we generated 100 pairs of traits, where the true genetic correlation and phenotypic correlation are 0.5. In the high heritability group, the heritabilities of the two traits are 0.6 and 0.8, respectively; in the low heritability group, they are 0.2 and 0.4, respectively. The 307,519 array SNPs of ~336,000 UKBB genomic British individuals were used to simulate true phenotypes and to compute the LD matrix for both HDL and LDSC. The P-values are from Levene’s test for variance heterogeneity. Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.

Extended Data Fig. 2 Relative efficiency of HDL against LDSC under different model setups when 10% SNPs with MAF > 1% are causal.

52,914 out of 529,139 array SNPs with MAF > 1% were randomly selected as causal variants. 100 pairs of traits were generated, where the true genetic correlation and phenotypic correlation are 0.5. The true phenotypes of trait i are generated from the model \(\mathbf{y}_i = \sum_{k=1}^{M} \mathbf{X}_{ik}\beta_{ik} + \epsilon_i\), where \(\mathbf{X}_{ik} = (\mathbf{Z}_{ik} - 2p_k\mathbf{1})[2p_k(1 - p_k)]^{\alpha/2}\); Z_ik are the original genotypes of SNP k for trait i; p_k is the MAF of SNP k; M is the number of causal variants. Four scenarios were simulated: (1) α = −1, and the marginal distribution of β_ik is \(N(0, h_i^2/M)\); (2) α = −1, and the marginal distribution of β_ik is \(N(0, w_k h_i^2/M)\), where w_k is the LDAK weight of SNP k, which is inversely proportional to its LD score; (3) α = −0.25, and the marginal distribution of β_ik is \(N(0, h_i^2/M)\); and (4) α = −0.25, and the marginal distribution of β_ik is \(N(0, w_k h_i^2/M)\). After the β_i were generated, they were rescaled by a common constant so that the true heritabilities were 0.5 for both traits. The 307,519 array SNPs of ~336,000 UKBB genomic British individuals were used to simulate true phenotypes and to compute the LD matrix for both HDL and LDSC. The P-values are from Levene’s test for variance heterogeneity. Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.
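The phenotype model in this caption can be illustrated with a minimal NumPy sketch of scenario (1). It is not the authors' simulation code: it draws toy genotypes for independent SNPs (no LD), simulates a single trait instead of a correlated pair, and omits the LDAK weights w_k. The sample sizes, MAF range, and the helper names `standardize_genotypes` and `simulate_trait` are all assumptions made for illustration.

```python
import numpy as np

def standardize_genotypes(Z, p, alpha):
    """Column-wise X_k = (Z_k - 2 p_k) * [2 p_k (1 - p_k)]^(alpha / 2)."""
    return (Z - 2.0 * p) * (2.0 * p * (1.0 - p)) ** (alpha / 2.0)

def simulate_trait(X, h2, rng):
    """Draw beta_k ~ N(0, h2 / M), then rescale so the genetic variance is h2.

    Rescaling the genetic values by one constant is equivalent to rescaling
    every beta_k by that same constant.
    """
    n, M = X.shape
    beta = rng.normal(0.0, np.sqrt(h2 / M), size=M)
    g = X @ beta                                      # genetic values
    g = g * np.sqrt(h2 / g.var())                     # enforce var(g) = h2
    eps = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)  # residual noise
    return g + eps, g

rng = np.random.default_rng(0)
n, M = 2000, 300                                      # toy sizes, far below UKBB scale
p = rng.uniform(0.01, 0.5, size=M)                    # hypothetical allele frequencies
Z = rng.binomial(2, p, size=(n, M)).astype(float)     # independent SNPs: no LD
X = standardize_genotypes(Z, p, alpha=-1.0)           # alpha = -1 gives unit-variance columns
y, g = simulate_trait(X, h2=0.5, rng=rng)

print("realized heritability:", g.var() / y.var())
```

With α = −1 each genotype column is scaled to roughly unit variance, so every causal SNP contributes equally in expectation; α = −0.25 would leave common variants with larger contributions.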

Extended Data Fig. 3 Relative efficiency of HDL against LDSC under different model setups when 10% SNPs with 5% > MAF > 1% are causal.

52,914 out of 221,620 array SNPs with 5% > MAF > 1% were randomly selected as causal variants. 100 pairs of traits were generated, where the true genetic correlation and phenotypic correlation are 0.5. The true phenotypes of trait i are generated from the model \(\mathbf{y}_i = \sum_{k=1}^{M} \mathbf{X}_{ik}\beta_{ik} + \epsilon_i\), where \(\mathbf{X}_{ik} = (\mathbf{Z}_{ik} - 2p_k\mathbf{1})[2p_k(1 - p_k)]^{\alpha/2}\); Z_ik are the original genotypes of SNP k for trait i; p_k is the MAF of SNP k; M is the number of causal variants. Four scenarios were simulated: (1) α = −1, and the marginal distribution of β_ik is \(N(0, h_i^2/M)\); (2) α = −1, and the marginal distribution of β_ik is \(N(0, w_k h_i^2/M)\), where w_k is the LDAK weight of SNP k, which is inversely proportional to its LD score; (3) α = −0.25, and the marginal distribution of β_ik is \(N(0, h_i^2/M)\); and (4) α = −0.25, and the marginal distribution of β_ik is \(N(0, w_k h_i^2/M)\). After the β_i were generated, they were rescaled by a common constant so that the true heritabilities were 0.5 for both traits. The 307,519 array SNPs of ~336,000 UKBB genomic British individuals were used to simulate true phenotypes and to compute the LD matrix for both HDL and LDSC. The P-values are from Levene’s test for variance heterogeneity. Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.

Extended Data Fig. 4 Relative efficiency of HDL using imputed reference panel against LDSC.

100 pairs of traits were generated, where the true heritabilities, genetic correlation and phenotypic correlation are all 0.5. The 1,029,876 imputed SNPs of ~336,000 UKBB genomic British individuals were used to simulate true phenotypes. LDSC and LDSC.1kG stand for the LDSC software using the UKBB imputed reference panel and the default 1000 Genomes reference panel, respectively. 102,988 (10% of 1,029,876) randomly sampled SNPs are set to be causal variants. The P-values are from Levene’s test for variance heterogeneity. Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.

Extended Data Fig. 5 Relative efficiency and standard error of LDSC estimate among 30 phenotypes in UK Biobank.

Each dot represents genetic correlation results for one pair of traits among 435 pairs. The x-axis represents the standard error of the LDSC estimate. The y-axis represents the relative efficiency of HDL against LDSC. HDL reference panel: UKBB imputed SNPs; LDSC reference panel: 1000 Genomes (default). Colors indicate the number of binary traits in the pair.

Extended Data Fig. 6 Genetic correlation estimates from HDL and LDSC among 30 phenotypes in UK Biobank based on directly genotyped variants on the array.

Lower triangle: HDL estimates; upper triangle: LDSC estimates. The areas of the squares represent the absolute values of the corresponding genetic correlations. After Bonferroni correction for 435 tests at the 5% significance level, genetic correlation estimates that are significantly different from zero in both methods are marked with a dot; estimates that are significantly different from zero in only one method are marked with an asterisk and a black square. HDL reference panel: UKBB array SNPs; LDSC reference panel: UKBB array SNPs.
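The Bonferroni criterion in this caption can be made concrete with a small sketch using only the Python standard library. It assumes a two-sided Wald (normal) test of each estimate against zero, which this page does not specify; the estimate and standard-error values are invented for illustration.

```python
from statistics import NormalDist

N_TESTS = 435   # trait pairs among 30 phenotypes
ALPHA = 0.05    # family-wise significance level

def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value for H0: estimate = 0 under a normal approximation."""
    z = abs(estimate / se)
    return 2.0 * (1.0 - NormalDist().cdf(z))

def is_significant(estimate: float, se: float) -> bool:
    """Compare the p-value with the Bonferroni-adjusted per-test threshold."""
    return two_sided_p(estimate, se) < ALPHA / N_TESTS

threshold = ALPHA / N_TESTS
z_crit = NormalDist().inv_cdf(1.0 - threshold / 2.0)
print(f"per-test threshold: {threshold:.2e}, critical |z|: {z_crit:.2f}")

# Hypothetical genetic correlation estimates, not taken from the paper.
print(is_significant(0.30, 0.05))   # |z| = 6
print(is_significant(0.10, 0.05))   # |z| = 2
```

The per-test threshold is 0.05/435, so an estimate needs an absolute z-score a little under 3.9 to be flagged; a method with smaller standard errors clears that bar for more trait pairs.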

Extended Data Fig. 7 Relative efficiency of HDL using imputed reference panel against LDSC for the estimation of heritability.

a, 100 traits were generated using 14,867 imputed SNPs on chromosome 22 of ~336,000 UKBB genomic British individuals, where true heritability was set to 0.05. LDSC and LDSC.1kG stand for the LDSC software using UKBB imputed reference panel and default 1kG reference panel, respectively. 1,487 (10% of 14,867) randomly sampled SNPs are set to be causal variants. b, The relative efficiency, calculated as the ratio of the estimated variances of the LDSC estimates to those of the HDL estimates, was evaluated for 30 GWAS of real phenotypes in UKBB. HDL reference panel: UKBB imputed SNPs; LDSC reference panel: 1000 Genomes (default). Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.
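Relative efficiency, as defined in this caption, is a ratio of variances with LDSC in the numerator. The sketch below computes it two ways: from reported standard errors for a single real-data estimate, and from the empirical variance across simulation replicates. All numbers are made up for illustration, and the function names are hypothetical.

```python
from statistics import variance

def relative_efficiency_from_se(se_ldsc: float, se_hdl: float) -> float:
    """Ratio of estimated variances (squared standard errors), LDSC over HDL."""
    return (se_ldsc / se_hdl) ** 2

def relative_efficiency_from_replicates(ldsc_estimates, hdl_estimates) -> float:
    """Ratio of sample variances across simulation replicates, LDSC over HDL."""
    return variance(ldsc_estimates) / variance(hdl_estimates)

# Hypothetical standard errors for one phenotype.
re_se = relative_efficiency_from_se(se_ldsc=0.05, se_hdl=0.03)

# Hypothetical replicate estimates scattered around a true value of 0.5.
ldsc_reps = [0.42, 0.61, 0.50, 0.38, 0.59]
hdl_reps = [0.47, 0.55, 0.50, 0.46, 0.52]
re_reps = relative_efficiency_from_replicates(ldsc_reps, hdl_reps)

print(f"from SEs: {re_se:.2f}, from replicates: {re_reps:.2f}")
```

A value above 1 means the HDL estimate has the smaller variance.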

Extended Data Fig. 8 Comparison of the heritability estimates from HDL and default LDSC across 30 UKBB phenotypes.

The default LDSC uses the 1000 Genomes reference panel. HDL uses UKBB imputed markers as reference. R represents the correlation between the two sets of estimates. The red dashed line represents identity.

Extended Data Fig. 9 Example of the eigenvalues of an LD matrix.

5,420 genotyped variants on chromosome 22 for UKBB genomic British individuals were used to generate the LD matrix. The red dashed line represents the cutoff where the leading eigenvalues and corresponding eigenvectors capture 90% of the information of the LD matrix.

Extended Data Fig. 10 HDL results where the LD matrix is approximated by different numbers of leading eigenvalues and eigenvectors.

After eigen-decomposition of the LD matrix, the leading eigenvalues explaining different proportions of the variance of the LD matrix, together with their corresponding eigenvectors, were used to approximate the LD matrix. In each heritability group, we generated 100 pairs of traits, where the true genetic correlation and phenotypic correlation are 0.5. In the high heritability group, the heritabilities of the two traits are 0.6 and 0.8, respectively; in the low heritability group, they are 0.2 and 0.4, respectively. The 307,519 array SNPs of ~336,000 UKBB genomic British individuals were used to simulate true phenotypes and to compute the LD matrix for HDL. 30,752 SNPs are causal (10% of 307,519). Inside each box, the line indicates the median value, the central box indicates the interquartile range (IQR), and whiskers extend up to 1.5 times the IQR.
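The eigenvalue truncation described here can be sketched with NumPy on a toy matrix. This is an illustration under stated assumptions, not the HDL implementation: the "proportion of variance" is taken to mean the share of the eigenvalue sum, the LD matrix is a synthetic one with correlation decaying geometrically with distance, and `truncate_ld` is a hypothetical helper.

```python
import numpy as np

def truncate_ld(R, proportion):
    """Approximate a symmetric LD matrix by its leading eigenpairs.

    Keeps the fewest leading eigenvalues whose sum reaches `proportion` of the
    total, and returns (k, rank-k approximation).
    """
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                 # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, proportion)) + 1
    k = min(k, len(eigvals))                          # guard against rounding at 1.0
    R_k = (eigvecs[:, :k] * eigvals[:k]) @ eigvecs[:, :k].T
    return k, R_k

# Toy LD matrix for 50 SNPs: correlation 0.9^|i - j| (an assumption, not UKBB data).
M, rho = 50, 0.9
idx = np.arange(M)
R = rho ** np.abs(idx[:, None] - idx[None, :])

kept = {}
for prop in (0.5, 0.9, 0.99):
    k, R_k = truncate_ld(R, prop)
    kept[prop] = k
    rel_err = np.linalg.norm(R - R_k) / np.linalg.norm(R)
    print(f"proportion {prop}: k = {k}, relative Frobenius error = {rel_err:.3f}")
```

Raising the retained proportion keeps more eigenpairs and shrinks the approximation error, which is the trade-off the figure explores for the resulting HDL estimates.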

About this article

Cite this article

Ning, Z., Pawitan, Y. & Shen, X. High-definition likelihood inference of genetic correlations across human complex traits. Nat Genet 52, 859–864 (2020). https://doi.org/10.1038/s41588-020-0653-y

Received: 03 October 2019
Accepted: 26 May 2020
Published: 29 June 2020
Issue Date: August 2020
DOI: https://doi.org/10.1038/s41588-020-0653-y
