Population-specific causal disease effect sizes in functionally important regions impacted by selection

Shi, Huwenbo; Gazal, Steven; Kanai, Masahiro; Koch, Evan M.; Schoech, Armin P.; Siewert, Katherine M.; Kim, Samuel S.; Luo, Yang; Amariuta, Tiffany; Huang, Hailiang; Okada, Yukinori; Raychaudhuri, Soumya; Sunyaev, Shamil R.; Price, Alkes L.

doi:10.1038/s41467-021-21286-1

Open access
Published: 17 February 2021

Nature Communications volume 12, Article number: 1098 (2021)

Abstract

Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.

Introduction

Trans-ethnic genetic correlations are significantly less than 1 for many diseases and complex traits^1,2,3,4,5,6, implying that population-specific causal disease effect sizes contribute to the incomplete portability of genome-wide association study (GWAS) findings and polygenic risk scores (PRSs) to non-European populations^{6,7,8,9,10,11,12}. However, current methods for estimating genome-wide trans-ethnic genetic correlations assume the same trans-ethnic genetic correlation for all categories of SNPs^2,5,13, providing little insight into why causal disease effect sizes are population-specific. Understanding the biological processes contributing to population-specific causal disease effect sizes can help inform polygenic risk prediction in non-European populations and alleviate health disparities^6,14,15.

Here, we introduce a new method, S-LDXR, for estimating enrichment of stratified squared trans-ethnic genetic correlation across functional categories of SNPs using GWAS summary statistics and population-matched linkage disequilibrium (LD) reference panels (e.g. the 1000 Genomes Project (1000G)¹⁶); we stratify the squared trans-ethnic genetic correlation across functional categories to robustly handle noisy heritability estimates. S-LDXR analyzes GWAS summary statistics of HapMap3¹⁷ SNPs with minor allele frequency (MAF) greater than 5% in both East Asian (EAS) and European (EUR) populations (regression SNPs) to draw inferences about causal effects of all SNPs with MAF greater than 5% in both populations (heritability SNPs). We confirm that S-LDXR yields robust estimates in extensive simulations. We apply S-LDXR to 31 diseases and complex traits with GWAS summary statistics available in both East Asian (EAS) and European (EUR) populations, leveraging recent large studies in East Asian populations from the CONVERGE consortium and Biobank Japan^18,19,20; we analyze a broad set of genomic annotations from the baseline-LD model^21,22,23, as well as tissue-specific annotations based on specifically expressed gene (SEG) sets²⁴. Most results are meta-analyzed across the 31 traits to maximize power (analogous to refs. ^21,22,23), as we expect to see similar patterns of enrichment/depletion across traits (even though the underlying biological processes differ across traits). We also investigate trait-specific enrichments/depletions for the tissue-specific annotations (analogous to ref. ²⁴).

Results

Overview of methods

Our method (S-LDXR) for estimating stratified trans-ethnic genetic correlation is conceptually related to stratified LD score regression^21,22 (S-LDSC), a method for partitioning heritability from GWAS summary statistics while accounting for LD. S-LDXR determines that a category of SNPs is enriched for trans-ethnic genetic covariance if SNPs with high LD to that category have a higher product of Z-scores than SNPs with low LD to that category. Unlike S-LDSC, S-LDXR models per-allele effect sizes (accounting for differences in MAF between populations).

In detail, the product of Z-scores of SNP j in two populations, ${Z}_{1j}{Z}_{2j}$, has the expectation

$${\rm{E}}[{Z}_{1j}{Z}_{2j}]=\sqrt{{N}_{1}{N}_{2}}\sum \limits_{C}{\ell }_{\times }(j,C){\theta }_{C}\ ,$$

(1)

where ${N}_{p}$ is the sample size for population p; ${\ell }_{\times }(j,C)={\sum }_{k}{r}_{1jk}{r}_{2jk}{\sigma }_{1j}{\sigma }_{2j}{a}_{C}(k)$ is the trans-ethnic LD score of SNP j with respect to annotation C, whose value for SNP k, ${a}_{C}(k)$, can be either binary or continuous; ${r}_{pjk}$ is the LD (Pearson correlation) between SNP j and k in population p; ${\sigma }_{pj}$ is the standard deviation of SNP j genotypes in population p; and ${\theta }_{C}$ represents the per-SNP contribution to trans-ethnic genetic covariance of the per-allele causal disease effect size of annotation C. Here, ${r}_{pjk}$ and ${\sigma }_{pj}$ can be estimated from population-matched reference panels (e.g. 1000 Genomes Project¹⁶). We estimate ${\theta }_{C}$ for each annotation C using weighted least squares regression. Subsequently, we estimate the trans-ethnic genetic covariance of each binary annotation C (${\rho }_{g}(C)$) by summing trans-ethnic genetic covariance of each SNP in annotation C as ${\sum }_{j\in C}\left({\sum }_{C^{\prime} }{a}_{C^{\prime} }(j){\theta }_{C^{\prime} }\right)$, using coefficients (${\theta }_{C^{\prime} }$) for all binary and continuous-valued annotations $C^{\prime}$ included in the analysis; the heritabilities in each population (${h}_{g1}^{2}(C)$ and ${h}_{g2}^{2}(C)$) are estimated analogously. We then estimate the stratified squared trans-ethnic genetic correlation, defined as

$${r}_{g}^{2}(C)=\frac{{\rho }_{g}^{2}(C)}{{h}_{g1}^{2}(C){h}_{g2}^{2}(C)}\ .$$

(2)

We define the enrichment/depletion of squared trans-ethnic genetic correlation as ${\lambda }^{2}(C)=\frac{{r}_{g}^{2}(C)}{{r}_{g}^{2}}$, where ${r}_{g}^{2}$ is the genome-wide squared trans-ethnic genetic correlation; λ²(C) can be meta-analyzed across traits with different ${r}_{g}^{2}$. S-LDXR analyzes GWAS summary statistics of common HapMap3¹⁷ SNPs (regression SNPs) to estimate λ²(C) for (causal effects of) all common SNPs (heritability SNPs). Further details (quantities estimated, analytical bias correction, shrinkage estimator to reduce standard errors, estimation of standard errors, significance testing, and factors impacting power) of the S-LDXR method are provided in the Methods section and the Supplementary Notes; we have publicly released open-source software implementing the method (see “Code availability”).
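As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes trans-ethnic LD scores from two small LD matrices, fits the coefficients θ_C by weighted least squares without an intercept, and forms the stratified squared trans-ethnic genetic correlation and λ²(C). It is not the released S-LDXR software: the function names, the dense LD matrices, and the user-supplied regression weights are assumptions made for illustration, and the analytical bias correction and shrinkage estimator described in the Methods are omitted.

```python
import numpy as np

def trans_ethnic_ld_scores(R1, R2, sd1, sd2, annot):
    """l_x(j, C) = sum_k r_1jk * r_2jk * sd_1j * sd_2j * a_C(k).

    R1, R2   : (M, M) LD (Pearson correlation) matrices in populations 1 and 2
    sd1, sd2 : (M,) genotype standard deviations in each population
    annot    : (M, K) annotation values a_C(k), binary or continuous
    """
    return (sd1 * sd2)[:, None] * ((R1 * R2) @ annot)

def fit_theta(z1, z2, n1, n2, ld_x, weights=None):
    """Regress Z_1j * Z_2j on sqrt(N1 * N2) * l_x(j, C), following Eq. (1).

    `weights` are per-SNP regression weights (hypothetical input; the
    weighting scheme actually used is described in the Methods).
    """
    X = np.sqrt(n1 * n2) * ld_x
    y = z1 * z2
    if weights is None:
        weights = np.ones_like(y, dtype=float)
    w = np.sqrt(weights)
    # Weighted least squares = ordinary least squares on sqrt(w)-scaled data.
    theta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return theta

def stratified_rg2(theta_cov, theta_h1, theta_h2, annot, in_C):
    """Eq. (2): rho_g(C)^2 / (h_g1^2(C) * h_g2^2(C)) for binary annotation C.

    Each quantity sums, over SNPs j in C, the per-SNP value
    sum_{C'} a_{C'}(j) * theta_{C'}.
    """
    rho = (annot[in_C] @ theta_cov).sum()
    h1 = (annot[in_C] @ theta_h1).sum()
    h2 = (annot[in_C] @ theta_h2).sum()
    return rho ** 2 / (h1 * h2)

def enrichment(rg2_C, rg2_genome):
    """lambda^2(C): stratified over genome-wide squared genetic correlation."""
    return rg2_C / rg2_genome
```

In this sketch the heritability coefficients `theta_h1` and `theta_h2` would be obtained by the analogous single-population regressions, and the genome-wide value passed to `enrichment` is `stratified_rg2` evaluated with every SNP included in `in_C`.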

We apply S-LDXR to 62 annotations (defined in both EAS and EUR populations) from our baseline-LD-X model (“Methods”, Supplementary Data 1, Supplementary Figs. 1, 2), primarily derived from the baseline-LD model^21,22,23 (v1.1), and 53 SEG annotations (Supplementary Data 2). We have publicly released all baseline-LD-X model annotations and LD scores for EAS and EUR populations (see “Data availability”).

Simulations

We evaluated the accuracy of S-LDXR in simulations using genotypes that we simulated using HAPGEN2²⁵ from phased haplotypes of 481 EAS and 489 EUR individuals from the 1000 Genomes Project¹⁶, preserving population-specific MAF and LD patterns (18,418 simulated EAS-like and 36,836 simulated EUR-like samples after removing genetically related samples, a ratio of sample sizes similar to that of the empirical data; ~2.5 million SNPs on chromosomes 1–3) (“Methods”); we did not have access to individual-level EAS data at sufficient sample size to perform simulations with real genotypes. For each population, we randomly selected a subset of 500 simulated samples to serve as the reference panel for estimating LD scores. We performed both null simulations (heritable trait with functional enrichment but no enrichment/depletion of squared trans-ethnic genetic correlation; λ²(C) = 1) and causal simulations (λ²(C) ≠ 1). In our main simulations, we randomly selected 10% of the SNPs as causal SNPs in both populations, set genome-wide heritability to 0.5 in each population, and adjusted genome-wide genetic covariance to attain a genome-wide r_g of 0.60 (unless otherwise indicated). In the null simulations, we used heritability enrichments from analyses of real traits in EAS samples to specify per-SNP causal effect size variances and covariances. In the causal simulations, we directly specified per-SNP causal effect size variances and covariances to attain λ²(C) ≠ 1 values from analyses of real traits, as these were difficult to attain using the heritability and trans-ethnic genetic covariance enrichments from analyses of real traits.
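The genome-wide settings of the main simulations (10% causal SNPs, heritability of 0.5 in each population, r_g of 0.60) can be illustrated with a simplified Python sketch that draws correlated causal effect sizes for two populations. This is a sketch under strong assumptions rather than the simulation framework used here: it gives every causal SNP the same variance, ignores annotations, MAF and LD, and works on a standardized scale instead of the per-allele scale; the function name and arguments are hypothetical.

```python
import numpy as np

def simulate_effects(n_snps, p_causal=0.10, h2_1=0.5, h2_2=0.5, rg=0.60, seed=0):
    """Draw causal effects for two populations with a target genetic correlation.

    Returns an (n_snps, 2) array of effects (zero for non-causal SNPs) and
    a boolean mask of causal SNPs shared by both populations.
    """
    rng = np.random.default_rng(seed)
    causal = rng.random(n_snps) < p_causal
    m = int(causal.sum())
    if m == 0:
        raise ValueError("no causal SNPs were drawn; increase n_snps or p_causal")
    # Equal per-SNP variance so that effects sum to the target heritability.
    var1, var2 = h2_1 / m, h2_2 / m
    # Build correlated effects from two independent standard normal draws.
    e1 = rng.standard_normal(m)
    e2 = rng.standard_normal(m)
    beta = np.zeros((n_snps, 2))
    beta[causal, 0] = np.sqrt(var1) * e1
    beta[causal, 1] = np.sqrt(var2) * (rg * e1 + np.sqrt(1.0 - rg ** 2) * e2)
    return beta, causal
```

Annotation-dependent architectures, as in the null and causal simulations described above, would replace the constant `var1`, `var2` and the constant correlation with per-SNP variances and covariances.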

First, we assessed the accuracy of S-LDXR in estimating genome-wide trans-ethnic genetic correlation (r_g); we note that S-LDXR does not use the shrinkage estimator for genome-wide estimates. Across a wide range of simulated r_g values (0.0 to 1.0), S-LDXR yielded approximately unbiased estimates and well-calibrated jackknife standard errors (Supplementary Table 1, Supplementary Fig. 3).
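For readers unfamiliar with jackknife standard errors, the following Python sketch shows a generic delete-one-block jackknife. It is illustrative only: the number of blocks, the way SNPs are grouped into blocks, and the function name are assumptions, and the procedure actually used by S-LDXR is specified in the Methods.

```python
import numpy as np

def block_jackknife(stat_fn, data, n_blocks=200):
    """Delete-one-block jackknife estimate and standard error.

    stat_fn  : function mapping an array of rows to a scalar statistic
    data     : array whose first axis indexes SNPs (ordered along the genome)
    n_blocks : number of contiguous blocks (placeholder default)
    """
    data = np.asarray(data)
    blocks = np.array_split(np.arange(len(data)), n_blocks)
    full = stat_fn(data)
    # Recompute the statistic with each contiguous block left out in turn.
    loo = np.array([stat_fn(np.delete(data, b, axis=0)) for b in blocks])
    pseudo = n_blocks * full - (n_blocks - 1) * loo
    estimate = pseudo.mean(axis=0)
    se = np.sqrt(pseudo.var(axis=0, ddof=1) / n_blocks)
    return estimate, se
```

Contiguous blocks are used so that SNPs in LD with one another tend to be dropped together, which is what makes the resulting standard error meaningful for correlated data.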

Second, we assessed the accuracy of S-LDXR in estimating λ²(C) in quintiles of the 8 continuous-valued annotations of the baseline-LD-X model. We performed both null simulations (λ²(C) = 1) and causal simulations (λ²(C) ≠ 1). Results are reported in Fig. 1a and Supplementary Data 3–8. In both null and causal simulations, S-LDXR yielded approximately unbiased estimates of λ²(C) for most annotations, validating our analytical bias correction. As a secondary analysis, we tried varying the S-LDXR shrinkage parameter, α, which has a default value of 0.5. In null simulations, results remained approximately unbiased; in causal simulations, reducing α led to less precise (but less biased) estimates of λ²(C), whereas increasing α biased results towards the null (λ²(C) = 1), demonstrating a bias-variance tradeoff in the choice of α (Supplementary Fig. 4, Supplementary Data 4, 7). Results were similar at other values of the proportion of causal SNPs (1% and 100%; Supplementary Data 3, 5, 6, 8). We also confirmed that S-LDXR produced well-calibrated jackknife standard errors (Supplementary Data 3–8).

**Fig. 1: Accuracy of S-LDXR in null and causal simulations.**

Third, we assessed the accuracy of S-LDXR in estimating λ²(C) for the 28 main binary annotations of the baseline-LD-X model (inherited from the baseline model of ref. ²¹). We discarded λ²(C) estimates with the highest standard errors (top 5%), as estimates with large standard errors (which are particularly common for annotations of small size) are uninformative for evaluating unbiasedness of the estimator (in analyses of real traits, trait-specific estimates with large standard errors are retained, but contribute very little to meta-analysis results, and would be interpreted as inconclusive when assessing trait-specific results). Results are reported in Fig. 1b and Supplementary Data 4, 7. In null simulations, S-LDXR yielded unbiased estimates of λ²(C), further validating our analytical bias correction. In causal simulations, estimates were biased towards the null (λ²(C) = 1)—particularly for annotations of small size (proportion of SNPs < 1%)—due to our shrinkage estimator; increasing the shrinkage parameter above its default value of 0.5 further biased the estimates towards the null (λ²(C) = 1) in causal simulations (Supplementary Data 6–8). To ensure robust estimates, we focus on the 20 main binary annotations of large size (>1% of SNPs) in analyses of real traits (see below); although results for these annotations may still be biased towards the null, we emphasize that S-LDXR is unbiased in null data. Results were similar at other values of the proportion of causal SNPs (1% and 100%; Supplementary Data 3, 5, 6, 8). We also confirmed that S-LDXR produced well-calibrated jackknife standard errors (Supplementary Data 3–8) and conservative p-values (Supplementary Fig. 7, Supplementary Data 3–8).

Fourth, we performed additional null simulations in which causal variants differed across the two populations (Methods). S-LDXR yielded robust estimates of λ²(C), well-calibrated standard errors and conservative p-values in these simulations (Supplementary Fig. 11, Supplementary Data 11).

Fifth, we performed additional null simulations with annotation-dependent MAF-dependent genetic architectures^26,27,28, defined as architectures in which the level of MAF-dependence is annotation-dependent, to ensure that estimates of λ²(C) remain unbiased. We disproportionately sampled low-frequency causal variants from the top quintile of background selection statistic, and set the variance of per-allele effect sizes of a causal SNP to be inversely proportional to its maximum MAF across both populations (Methods). Results are reported in Supplementary Figs. 8–10, and Supplementary Data 9–10. S-LDXR yielded nearly unbiased estimates of λ²(C) for the 28 binary functional annotations (Supplementary Fig. 8) and nearly unbiased estimates of λ²(C) for most quintiles of continuously valued annotations (Supplementary Fig. 9); estimates were slightly biased in the top and bottom quintiles of the average level of LD annotation and the recombination-rate annotation, likely due to less accurate reference LD scores at SNPs with extreme levels of LD. We repeated these simulations with five MAF bin annotations added to the baseline-LD-X model and obtained similar results (Supplementary Figs. 8a, 9b), supporting our decision not to include MAF bin annotations in the baseline-LD-X model.

Sixth, we performed additional null simulations, in which we increased or decreased the reference panel size from 500 to 250 or 1000, to assess the impact of reference panel size on the accuracy of S-LDXR (Methods). We simulated GWAS summary statistics based on the baseline-LD-X model as well as the model with annotation-dependent MAF-dependent genetic architectures. We determined that the small systematic biases in null simulations of continuous-valued annotations were on the same order of magnitude as for 500 reference samples (Supplementary Figs. 12, 13 and Supplementary Data 12 for 250 reference samples; Supplementary Figs. 14, 15 and Supplementary Data 13 for 1000 reference samples). We also performed simulations in which we reduced the simulated GWAS sample size by half, from N_EAS = 18K, N_EUR = 37K to N_EAS = 9K, N_EUR = 18K (while fixing the reference panel size at 500). We again determined that the small systematic biases were generally on the same order of magnitude as for N_EAS = 18K, N_EUR = 37K (although estimates were less stable and sometimes subject to larger biases, likely because our analytical bias correction starts to break down when the GWAS has low power) (Supplementary Figs. 16, 17 and Supplementary Data 14). Although it was not computationally feasible to perform simulations at larger GWAS sample sizes, these analyses do not provide a reason to believe that the small systematic biases that we observed in some of our null simulations of continuously valued annotations would substantially increase at larger GWAS sample sizes.

In summary, S-LDXR produced approximately unbiased estimates of enrichment/depletion of squared trans-ethnic genetic correlation in null simulations, and conservative estimates in causal simulations of both quintiles of continuous-valued annotations and binary annotations.

Analysis of baseline-LD-X model annotations across 31 diseases and complex traits

We applied S-LDXR to 31 diseases and complex traits with summary statistics in East Asians (average N = 90K) and Europeans (average N = 267K) available from Biobank Japan, UK Biobank, and other sources (Supplementary Table 2 and Methods). First, we estimated the trans-ethnic genetic correlation (r_g) (as well as population-specific heritabilities) for each trait. Results are reported in Supplementary Fig. 18 and Supplementary Table 2. The average r_g across 31 traits was 0.85 (s.e. 0.01) (average ${r}_{g}^{2}$ = 0.72 (s.e. 0.02)). 28 traits had r_g < 1, and 11 traits had r_g significantly less than 1 after correcting for 31 traits tested (P < 0.05/31); the lowest r_g was 0.34 (s.e. 0.07) for Major Depressive Disorder (MDD), although this may be confounded by different diagnostic criteria in the two populations²⁹. Several other complex traits, including Age at Menopause (r_g = 0.57 (s.e. 0.09)) and LDL (r_g = 0.66 (s.e. 0.11)) also had low trans-ethnic r_g, likely due to pervasive gene-environment interaction across the genome. These estimates were consistent with estimates obtained using Popcorn² (Supplementary Fig. 19) and those reported in previous studies^2,5,6. We note that our estimates of trans-ethnic genetic correlation for 31 complex traits are higher than those reported for gene expression traits² (average estimate of 0.32, increasing to 0.77 when restricting the analysis to gene expression traits with (cis) heritability greater than 0.2 in both populations), which are expected to have different genetic architectures.

Second, we estimated the enrichment/depletion of squared trans-ethnic genetic correlation (λ²(C)) in quintiles of the 8 continuous-valued annotations of the baseline-LD-X model, meta-analyzing results across traits; these annotations are moderately correlated (Fig. 2a and Supplementary Data 1). We used the default shrinkage parameter (α = 0.5) in all analyses. Results are reported in Fig. 2b and Supplementary Data 15. We consistently observed a depletion of ${r}_{g}^{2}(C)$ (λ²(C) < 1, implying more population-specific causal effect sizes) in functionally important regions. For example, we estimated λ²(C) = 0.82 (s.e. 0.01) for SNPs in the top quintile of background selection statistic (defined as 1 − McVicker B statistic/1000³⁰; see ref. ²²); λ²(C) estimates were less than 1 for 29/31 traits, including 2 traits (Height and EGFR) with two-tailed p < 0.05/31. The background selection statistic quantifies the genetic distance of a site to its nearest exon; regions with high background selection statistic have higher per-SNP heritability, consistent with the action of selection, and are enriched for functionally important regions²². We observed the same pattern for CpG content and SNP-specific F_st (which are positively correlated with background selection statistic; Fig. 2a) and the opposite pattern for nucleotide diversity (which is negatively correlated with background selection statistic). We also estimated λ²(C) = 0.87 (s.e. 0.03) for SNPs in the top quintile of average LLD (which is positively correlated with background selection statistic), although these SNPs have lower per-SNP heritability due to a competing positive correlation with predicted allele age²². We caution that average LLD was the annotation most susceptible to bias in our simulations; see “Simulations”. Likewise, we estimated λ²(C) = 0.84 (s.e. 0.02) for SNPs in the bottom quintile of recombination rate (which is negatively correlated with background selection statistic), although these SNPs have average per-SNP heritability due to a competing negative correlation with average LLD²². However, λ²(C) < 1 estimates for the bottom quintile of GERP (NS) (which is positively correlated with both background selection statistic and recombination rate) and the middle quintile of predicted allele age are more difficult to interpret. For all annotations analyzed, heritability enrichments did not differ significantly between EAS and EUR, consistent with previous studies^20,31. Results were similar at a more stringent shrinkage parameter value (α = 1.0; Supplementary Fig. 20), and for a meta-analysis across a subset of 20 approximately independent traits (Methods; Supplementary Fig. 21).
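To make the idea of meta-analyzing λ²(C) across traits concrete, the Python sketch below combines per-trait estimates by fixed-effect inverse-variance weighting. This is an assumption chosen for simplicity: the meta-analysis scheme actually used is specified in the Methods and may differ (for example, by allowing heterogeneity across traits), and inverse-variance weighting treats the traits as independent, which motivates the secondary analysis of approximately independent traits mentioned above. The function name is hypothetical.

```python
import numpy as np

def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis of per-trait estimates.

    Returns the pooled estimate and its standard error, assuming the
    per-trait estimates are independent.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))
    return pooled, pooled_se
```

Under this scheme, trait-specific estimates with large standard errors receive small weights and contribute very little to the pooled value.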

**Fig. 2: S-LDXR results for quintiles of 8 continuous-valued annotations across 31 diseases and complex traits.**

Finally, we estimated λ²(C) for the 28 main binary annotations of the baseline-LD-X model (Supplementary Data 1), meta-analyzing results across traits (as we did not observe significant trait-specific enrichment/depletion of squared trans-ethnic genetic correlation for these annotations due to limited power). Results are reported in Fig. 3a and Supplementary Data 16. Our primary focus is on the 20 annotations of large size (>1% of SNPs), for which our simulations yielded robust estimates; results for remaining annotations are reported in Supplementary Data 16. We consistently observed a depletion of λ²(C) (implying more population-specific causal effect sizes) within these annotations: 17 annotations had λ²(C) < 1, and 5 annotations had λ²(C) significantly less than 1 after correcting for 20 annotations tested (P < 0.05/20). These annotations included Conserved (λ²(C) = 0.93 (s.e. 0.02)), Promoter (λ²(C) = 0.85 (s.e. 0.04)) and Super Enhancer (λ²(C) = 0.93 (s.e. 0.02)), each of which was significantly enriched for per-SNP heritability, consistent with ref. ²¹. For all annotations analyzed, heritability enrichments did not differ significantly between EAS and EUR (Fig. 3a), consistent with previous studies^20,31. Results were similar at a more stringent shrinkage parameter value (α = 1.0; Supplementary Fig. 20), and for a meta-analysis across a subset of 20 approximately independent traits (Methods; Supplementary Fig. 22). As a secondary analysis, we also estimated λ²(C) across 10 MAF bin annotations; we did not observe variation in λ²(C) estimates across MAF bins (Supplementary Table 3), further supporting our decision to not include MAF bin annotations in the baseline-LD-X model.

**Fig. 3: S-LDXR results for 20 binary functional annotations across 31 diseases and complex traits.**

Since the functional annotations are moderately correlated with the 8 continuous-valued annotations (Supplementary Data 1c, Supplementary Fig. 1), we investigated whether the depletions of squared trans-ethnic genetic correlation (λ²(C) < 1) within the 20 binary annotations could be explained by the 8 continuous-valued annotations. For each binary annotation, we estimated its expected λ²(C) based on values of the 8 continuous-valued annotations for SNPs in the binary annotation (Methods), meta-analyzed this quantity across traits, and compared observed vs. expected λ²(C) (Fig. 3b and Supplementary Data 17). We observed strong concordance, with a slope of 0.57 (correlation of 0.61) across the 20 binary annotations. This implies that the depletions of ${r}_{g}^{2}(C)$ (λ²(C) < 1) within binary annotations are largely explained by corresponding values of continuous-valued annotations.

In summary, our results show that causal disease effect sizes are more population-specific in functionally important regions impacted by selection. Further interpretation of these findings, including the role of positive and/or negative selection, is provided in the “Discussion” section.

Analysis of SEG annotations

We analyzed 53 SEG annotations, defined in ref. ²⁴ as ±100kb regions surrounding the top 10% of genes specifically expressed in each of 53 GTEx³² tissues (Supplementary Data 2), by applying S-LDXR with the baseline-LD-X model to the 31 diseases and complex traits (Supplementary Table 2). We note that although SEG annotations were previously used to prioritize disease-relevant tissues based on disease-specific heritability enrichments^20,24, enrichment/depletion of squared trans-ethnic genetic correlation (λ²(C)) is standardized with respect to heritability (i.e. an increase in heritability in the denominator would lead to an increase in trans-ethnic genetic covariance in the numerator (Eq. (2))), and hence is not expected to produce exceedingly disease-specific signals. Thus, we first assess meta-analyzed λ²(C) estimates across the 31 diseases and complex traits (trait-specific estimates are assessed below).

Results are reported in Fig. 4a and Supplementary Data 18. λ²(C) estimates were less than 1 for all 53 tissues and significantly less than 1 (p < 0.05/53) for 37 tissues, with statistically significant heterogeneity across tissues (p < 10⁻²⁰; Methods). The strongest depletions of squared trans-ethnic genetic correlation were observed in skin tissues (e.g. λ²(C) = 0.83 (s.e. 0.02) for Skin Sun Exposed (Lower Leg)), Prostate and Ovary (λ²(C) = 0.84 (s.e. 0.02) for Prostate, λ²(C) = 0.86 (s.e. 0.02) for Ovary) and immune-related tissues (e.g. λ²(C) = 0.85 (s.e. 0.02) for Spleen), and the weakest depletions were observed in Testis (λ²(C) = 0.98 (s.e. 0.02); no significant depletion) and brain tissues (e.g. λ²(C) = 0.98 (s.e. 0.02) for Brain Nucleus Accumbens (Basal Ganglia); no significant depletion). Results were similar at less stringent and more stringent shrinkage parameter values (α = 0.0 and α = 1.0; Supplementary Figs. 23, 24 and Supplementary Data 18). A comparison of 14 blood-related traits and 16 other traits yielded highly consistent λ²(C) estimates (R = 0.82; Supplementary Fig. 25, Supplementary Data 19), confirming that these findings were not exceedingly disease-specific.
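The Bonferroni comparison above can be illustrated with a short Python sketch that converts a λ²(C) estimate and its standard error into a z-score and p-value and compares the p-value with 0.05/53. The normal approximation and the two-tailed p-value are assumptions made for illustration (the significance testing procedure of S-LDXR is described in the Methods), and the function name is hypothetical; the estimates are those quoted in the text.

```python
import math

def depletion_test(lam2, se, n_tests):
    """Test lambda^2(C) against 1 using a normal approximation.

    Returns the z-score, the two-tailed p-value, and whether the estimate
    is significantly below 1 at a Bonferroni threshold of 0.05 / n_tests.
    """
    z = (lam2 - 1.0) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    depleted = z < 0 and p_two_tailed < 0.05 / n_tests
    return z, p_two_tailed, depleted

# Meta-analyzed estimates (lambda^2(C), s.e.) quoted in the text.
tissues = {
    "Skin Sun Exposed (Lower Leg)": (0.83, 0.02),
    "Prostate": (0.84, 0.02),
    "Spleen": (0.85, 0.02),
    "Testis": (0.98, 0.02),
}
for name, (lam2, se) in tissues.items():
    z, p, depleted = depletion_test(lam2, se, n_tests=53)
    print(f"{name}: z = {z:.2f}, p = {p:.2g}, significant depletion = {depleted}")
```

With these inputs the skin, prostate, and spleen estimates lie more than seven standard errors below 1, whereas the Testis estimate lies one standard error below 1, matching the qualitative pattern reported above.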

**Fig. 4: S-LDXR results for 53 specifically expressed gene (SEG) annotations across 31 diseases and complex traits.**

These λ²(C) results were consistent with the higher background selection statistic³⁰ in Skin Sun Exposed (Lower Leg) (R = 0.17), Prostate (R = 0.16), and Spleen (R = 0.14) as compared to Testis (R = 0.02) and Brain Nucleus Accumbens (Basal Ganglia) (R = 0.08) (Supplementary Fig. 26, Supplementary Data 2), and similarly for CpG content (Supplementary Fig. 27, Supplementary Data 2). Although these results could in principle be confounded by gene size³³, the low correlation between gene size and background selection statistic (R = 0.06) or CpG content (R = −0.20) (in ±100kb regions) implies limited confounding. We note the well-documented action of recent positive selection on genes impacting skin pigmentation^{34,35,36,37,38}, the immune system^{34,35,36,37,39}, and Ovary⁴⁰; we are not currently aware of any evidence of positive selection impacting Prostate. We further note the well-documented action of negative selection on fecundity- and brain-related traits^26,28,41, but it is possible that recent positive selection may more closely track differences in causal disease effect sizes across human populations, which have split relatively recently⁴² (see “Discussion”).

More generally, since SEG annotations are moderately correlated with the 8 continuous-valued annotations (Supplementary Fig. 28, Supplementary Data 2), we investigated whether these λ²(C) results could be explained by the 8 continuous-valued annotations (analogous to Fig. 3b). Results are reported in Fig. 4b and Supplementary Data 20. We observed strong concordance, with a slope of 0.96 (correlation of 0.76) across the 53 SEG annotations. This implies that the depletions of λ²(C) within SEG annotations are explained by corresponding values of continuous-valued annotations.

The strong depletion of squared trans-ethnic genetic correlation in tissues impacted by positive selection (as opposed to negative selection) suggests a possible connection between positive selection and population-specific causal effect sizes. To further assess this, we estimated the enrichment/depletion of squared trans-ethnic genetic correlation in SNPs with high integrated haplotype score (iHS)^43,44, which quantifies the action of positive selection (“Methods”). We observed a significant depletion (λ²(C) = 0.88 (s.e. 0.03)), further implicating positive selection (however, it is difficult to assess whether the iHS annotation contains unique information about λ²(C) conditional on other annotations; see “Discussion”). In addition, we observed a high genome-wide trans-ethnic genetic correlation for schizophrenia (r_g = 0.95 (s.e. 0.04) vs. average of 0.85 (s.e. 0.01) across traits), a psychiatric disorder hypothesized to be strongly impacted by negative selection^45,46, suggesting that negative selection may play a limited role in population-specific causal effect sizes. As noted above, these estimates pertain to parameters that were defined based on common variants (see “Overview of methods”); we note that although negative selection has the strongest impact on low-frequency variants²⁶, common variants are also impacted by negative selection and can inform inferences about negative selection²². The role of positive selection (as opposed to negative selection) in population-specific causal effect sizes is discussed further in the Discussion section.

We investigated the enrichment/depletion of λ²(C) in the 53 SEG annotations for each individual trait (Supplementary Data 21). We identified six significantly depleted (vs. 0 significantly enriched) trait-tissue pairs at per-trait p < 0.05/53. The limited number of statistically significant results was expected, due to the reduced power of trait-specific analyses; however, λ²(C) estimates were generally consistent across traits. Results for BMI and height, two widely studied anthropometric traits, are reported in Fig. 5. For BMI, we observed significant depletion of squared trans-ethnic genetic correlation (λ²(C) = 0.84 (s.e. 0.05)) in Pituitary. Previous studies have highlighted the role of Pituitary in obesity^47,48,49; our results suggest that this tissue-specific mechanism is population-specific. For height, we observed significant depletion of squared trans-ethnic genetic correlation for Transformed fibroblasts (λ²(C) = 0.87 (s.e. 0.03)), a connective tissue linked to human developmental disorders⁵⁰; again, our results suggest that this tissue-specific mechanism is population-specific. Although Pituitary was significantly depleted for BMI but not height, and Transformed fibroblasts were significantly depleted for height but not BMI, we caution that for both tissues our λ²(C) estimates did not differ significantly between BMI and height.

**Fig. 5: S-LDXR results for 53 specifically expressed gene (SEG) annotations for BMI and height.**

In summary, our results show that causal disease effect sizes are more population-specific in regions surrounding SEGs.

Discussion

We developed a new method (S-LDXR) for stratifying squared trans-ethnic genetic correlation across functional categories of SNPs that yields approximately unbiased estimates in extensive simulations. By applying S-LDXR to East Asian and European summary statistics across 31 diseases and complex traits, we determined that SNPs with high background selection statistic³⁰ have substantially depleted squared trans-ethnic genetic correlation (vs. the genome-wide average), implying that causal effect sizes are more population-specific. Accordingly, squared trans-ethnic genetic correlations were substantially depleted for SNPs in many functional categories and enriched in less functionally important regions (although the power of S-LDXR to detect enrichment of squared trans-ethnic genetic correlation is limited due to depletion of heritability in less functionally important regions). In analyses of SEG annotations, we observed substantial depletion of squared trans-ethnic genetic correlation for SNPs near skin and immune-related genes, which are strongly impacted by recent positive selection, but not for SNPs near brain genes. We also observed trait-specific depletions of squared trans-ethnic genetic correlation for SEG annotations, which indicate population-specific disease mechanisms.

Reductions in trans-ethnic genetic correlation have several possible underlying explanations, including gene-environment (G × E) interaction, gene–gene (G × G) interaction, and dominance variation (but not differences in heritability across populations, which would not affect trans-ethnic genetic correlation and were not observed in our study). Given the increasing evidence of the role of G × E interaction in complex trait architectures⁵¹, and evidence that G × G interaction and dominance variation explain limited heritability^52,53,54, we hypothesize that depletion of squared trans-ethnic genetic correlation in the top quintile of background selection statistic and in functionally important regions may be primarily attributable to stronger G × E interaction in these regions. Interestingly, a recent study on plasticity in Arabidopsis observed a similar phenomenon: lines with more extreme phenotypes exhibited stronger G × E interaction⁵⁵. Although depletion of squared trans-ethnic genetic correlation is often observed in regions with higher per-SNP heritability, which may often be subject to stronger G × E, depletion may also occur in regions with lower per-SNP heritability that are subject to stronger G × E; we hypothesize that this is the case for SNPs in the top quintile of average LLD and the bottom quintile of GERP (NS) (Fig. 2).

Distinguishing between stronger G × E interaction in regions impacted by selection and stronger G × E interaction in functionally important regions as possible explanations for our findings is a challenge, because functionally important regions are more strongly impacted by selection. To this end, we constructed an annotation that is similar to the background selection statistic but does not make use of recombination rate, instead relying solely on a SNP’s physical distance to the nearest exon (“Methods”). When we applied S-LDXR to the 31 diseases and complex traits using a joint model incorporating baseline-LD-X model annotations and the nearest exon annotation, the background selection statistic remained highly conditionally informative for trans-ethnic genetic correlation, whereas the nearest exon annotation was not conditionally informative (Supplementary Table 4). This result implicates stronger G × E interaction in regions with a reduced effective population size that are impacted by selection, and not just proximity to functional regions, in explaining depletions of squared trans-ethnic genetic correlation; however, we emphasize that selection acts on allele frequencies rather than causal effect sizes, and could help explain our findings only in conjunction with other explanations such as G × E interaction. Our results on SEGs implicate stronger G × E interaction near skin, immune, and ovary genes and weaker G × E interaction near brain genes, potentially implicating positive selection (as opposed to negative selection).
This conclusion is further supported by the significant depletion of squared trans-ethnic genetic correlation in the integrated haplotype score (iHS) annotation that specifically reflects positive selection, high genome-wide trans-ethnic genetic correlation for schizophrenia (Supplementary Table 2), and lack of variation in squared trans-ethnic genetic correlation across genes in different deciles of probability of loss-of-function intolerance⁵⁶ (Methods, Supplementary Figs. 29, 30, Supplementary Table 5). We conclude that depletions of squared trans-ethnic genetic correlation could potentially be explained by stronger G × E interaction at loci impacted by positive selection. We caution that other explanations are also possible; in particular, evolutionary modeling using an extension of the Eyre–Walker model⁵⁷ to two populations suggests that our results for the background selection statistic could also be consistent with negative selection (Supplementary Notes, Supplementary Figs. 31, 32, Supplementary Table 6). Additional information, such as genomic annotations that better distinguish different types of selection or data from additional diverse populations, may help elucidate the relationship between selection and population-specific causal effect sizes.

Our study has several implications. First, PRSs in non-European populations that make use of European training data^6,9,11 may be improved by reweighting SNPs based on the expected enrichment/depletion of squared trans-ethnic genetic correlation, helping to alleviate health disparities^6,14,15. For example, when applying LD-pruning + p-value thresholding methods^58,59, both the strength of association and trans-ethnic genetic correlation should be accounted for when prioritizing SNPs for trans-ethnic PRS, as our results suggest that trans-ethnic genetic correlation is likely depleted near functional SNPs with significant p-values (due to stronger G × E). In particular, when multiple SNPs have a similar level of significance, the SNPs enriched for trans-ethnic genetic correlation should be prioritized. Analogously, when applying more recent methods that estimate posterior mean causal effect sizes^{60,61,62,63,64,65,66} (including functionally informed methods^62,66), these estimates should subsequently be weighted according to the expected enrichment/depletion for squared trans-ethnic genetic correlation based on their functional annotations. Second, modeling population-specific genetic architectures may improve trans-ethnic fine-mapping. Our results suggest that causal effect sizes and/or causal variants are likely to differ across different populations, contrary to standard assumptions^31,67. Thus, incorporating information about trans-ethnic genetic correlations in trans-ethnic fine-mapping may lead to more accurate identification of both population-specific and shared causal variants⁶⁸. Third, modeling population-specific genetic architectures may also increase power in trans-ethnic meta-analysis⁶⁹, e.g. by adapting MTAG⁷⁰ to two populations (instead of two traits), leveraging trans-ethnic (instead of cross-trait) genetic correlation between pairs of populations to improve the estimation of SNP effect sizes in both populations. 
Fourth, it may be of interest to stratify G × E interaction effects⁵¹ across genomic annotations. Fifth, modeling and incorporating environmental variables, where available, may provide additional insights into population-specific causal effect sizes. In our simulations, we did not explicitly simulate G × E. However, G × E would induce population-specific causal effect sizes, which we did explicitly simulate. Sixth, the S-LDXR method could potentially be extended to stratify squared cross-trait genetic correlations⁷¹ across genomic annotations⁷².

We note several limitations of this study that pertain to the S-LDXR method. First, S-LDXR is designed for populations of homogeneous continental ancestry (e.g. East Asians and Europeans) and is not currently suitable for analysis of admixed populations⁷³ (e.g. African Americans or admixed Africans from UK Biobank⁷⁴), analogous to LDSC and its published extensions^21,71,75. However, a recently proposed extension of LDSC to admixed populations⁷⁶ could be incorporated into S-LDXR, enabling its application to the growing set of large studies in admixed populations¹⁰. Second, S-LDXR estimates of enrichment of stratified squared trans-ethnic genetic correlation (λ²(C)) are slightly downward biased in null simulations of the top quintile of the background selection statistic and average LLD annotations, especially in simulations involving annotation-dependent MAF-dependent genetic architectures. However, these biases are small compared to the depletions of λ²(C) observed in the analysis of real traits. We further note that our estimates are unbiased in null simulations of binary annotations, implying that our results on real traits for binary annotations are robust. Third, since S-LDXR applies shrinkage to reduce standard error in estimating stratified squared trans-ethnic genetic correlation and its enrichment, estimates are conservative—true depletions of squared trans-ethnic genetic correlation in functionally important regions may be stronger than the estimated depletions. However, we emphasize that S-LDXR is approximately unbiased in null data. Fourth, the optimal value of the shrinkage parameter α may be specific to the pair of populations analyzed. In our simulations, we determined that α = 0.5 provides a satisfactory bias-variance tradeoff across a wide range of values of polygenicity and power. Thus, α = 0.5 may also be satisfactory for other pairs of populations. 
However, we recommend that one should ideally perform simulations on the pair of populations being analyzed to select the optimal value of α. Fifth, it is difficult to assess whether a focal annotation contains unique information about λ²(C) conditional on other annotations, as squared trans-ethnic genetic correlation is a non-linear quantity defined by the quotient of squared trans-ethnic genetic covariance and the product of heritabilities in each population.

We also note several limitations of this study that pertain to our analysis of real traits. First, we focused on comparisons of East Asians and Europeans, due to the limited availability of very large GWAS in other populations. For other pairs of continental populations, if differences in the environment are similar, then we would expect similar genome-wide trans-ethnic genetic correlation and similar enrichment/depletion of squared trans-ethnic genetic correlation, based on our hypothesis that imperfect trans-ethnic genetic correlation is primarily attributable to G × E. We also note that different sets of SNPs, with different MAF and LD patterns, would be analyzed for different pairs of populations. However, we expect that these differences would not contribute to differences in trans-ethnic genetic correlation, if G × E is the fundamental factor impacting trans-ethnic genetic correlation. Second, the SEG annotations analyzed in this study are defined predominantly (but not exclusively) based on gene expression measurements of Europeans²⁴. We hypothesize that results based on SEG annotations defined in East Asian populations would likely be similar, as heritability enrichments of functional annotations (predominantly defined in Europeans) are consistent across continental populations^20,31, despite the fact that gene expression patterns and genetic architectures of gene expression differ across diverse populations^12,77,78. Thus, SEG annotations derived from gene expression data from diverse populations may provide additional insights into population-specific causal effect sizes. Third, we restricted our analyses to SNPs that were relatively common (MAF > 5%) in both populations (estimating parameters that were defined based on common SNPs), due to the lack of a large LD reference panel for East Asians. 
Extending our analyses to lower-frequency SNPs may provide further insights into the role of negative selection in shaping population-specific genetic architectures, as negative selection has the strongest impact on variants with low frequency^26,27. Fourth, we did not consider population-specific variants in our analyses, due to the difficulty in defining trans-ethnic genetic correlation for population-specific variants^2,5, a more fundamental challenge than analyzing low-frequency SNPs; a recent study⁷⁹ has reported that population-specific variants substantially limit trans-ethnic genetic risk prediction accuracy. Fifth, estimates of genome-wide trans-ethnic genetic correlation may be confounded by different trait definitions or diagnostic criteria in the two populations, particularly for major depressive disorder. However, this would not impact estimates of enrichment/depletion of squared trans-ethnic genetic correlation (λ²(C)), which is defined relative to genome-wide values. Sixth, we have not pinpointed the exact underlying phenomena (e.g. environmental heterogeneity coupled with gene-environment interaction) that lead to population-specific causal disease effect sizes at functionally important regions. Despite these limitations, our study provides an improved understanding of the underlying biology that contributes to population-specific causal effect sizes, and highlights the need for increasing diversity in genetic studies.

Methods

Definition of stratified squared trans-ethnic genetic correlation

We model a complex phenotype in two populations using linear models, Y₁ = X₁β₁ + ϵ₁ and Y₂ = X₂β₂ + ϵ₂, where Y₁ and Y₂ are vectors of phenotype measurements of population 1 and population 2 with sample size N₁ and N₂, respectively; X₁ and X₂ are mean-centered but not normalized genotype matrices at M SNPs in the two populations; β₁ and β₂ are per-allele causal effect sizes of the M SNPs; and ϵ₁ and ϵ₂ are environmental effects in the two populations. We assume that in each population, genotypes, causal effect sizes, and environmental effects are independent of each other. We assume that the per-allele effect size of SNP j in the two populations has variance and covariance,

$$\begin{array}{l}{\rm{Var}}[{\beta }_{1j}]=\sum \limits_{C}{a}_{C}(j){\tau }_{1C},\ {\rm{Var}}[{\beta }_{2j}]=\sum \limits_{C}{a}_{C}(j){\tau }_{2C},\\ {\rm{Cov}}[{\beta }_{1j},{\beta }_{2j}]=\sum \limits_{C}{a}_{C}(j){\theta }_{C},\end{array}$$

(3)

where a_C(j) is the value of SNP j for annotation C, which can be binary or continuous-valued; τ_1C and τ_2C are the net contribution of annotation C to the variance of β_1j and β_2j, respectively; and θ_C is the net contribution of annotation C to the covariance of β_1j and β_2j.

We define stratified trans-ethnic genetic correlation of a binary annotation C (e.g. functional annotations²¹ or quintiles of continuous-valued annotations²²) as

$${r}_{g}(C)=\frac{{\rho }_{g}(C)}{\sqrt{{h}_{g1}^{2}(C)}\sqrt{{h}_{g2}^{2}(C)}},$$

(4)

where ${\rho }_{g}(C)={\sum }_{j\in C}{\rm{Cov}}[{\beta }_{1j},{\beta }_{2j}]={\sum }_{j\in C}{\sum }_{C^{\prime} }{a}_{C^{\prime} }(j){\theta }_{C^{\prime} }$ is the trans-ethnic genetic covariance of annotation C; and ${h}_{gp}^{2}(C)={\sum }_{j\in C}{\rm{Var}}[{\beta }_{pj}]={\sum }_{j\in C}{\sum }_{C^{\prime} }{a}_{C^{\prime} }(j){\tau }_{pC^{\prime} }$ is the "allelic-scale heritability" (sum of per-SNP variance of per-allele causal effect sizes; different from heritability on the standardized scale) of annotation C in population p. Here, $C^{\prime}$ includes all binary and continuous-valued annotations included in the analysis. Since estimates of ${h}_{gp}^{2}(C)$ can be noisy (possibly negative), we estimate squared stratified trans-ethnic genetic correlation,

$${r}_{g}^{2}(C)=\frac{{\rho }_{g}^{2}(C)}{{h}_{g1}^{2}(C){h}_{g2}^{2}(C)},$$

(5)

to avoid bias or undefined values in the square root. In this work, we only estimate ${r}_{g}^{2}(C)$ for SNPs with minor allele frequency (MAF) greater than 5% in both populations. To assess whether causal effect sizes are more or less correlated for SNPs in annotation C compared with the genome-wide average, ${r}_{g}^{2}$, we define the enrichment/depletion of stratified squared trans-ethnic genetic correlation as

$${\lambda }^{2}(C)=\frac{{r}_{g}^{2}(C)}{{r}_{g}^{2}}.$$

(6)

We meta-analyze λ²(C) instead of ${r}_{g}^{2}(C)$ across diseases and complex traits. For continuous-valued annotations, defining ${r}_{g}^{2}(C)$ and λ²(C) is challenging, as squared correlation is a non-linear term involving a quotient of squared covariance and a product of variances; we elected to instead estimate λ²(C) for quintiles of continuous-valued annotations (analogous to ref. ²²). We note that the average value of λ²(C) across quintiles of continuous-valued annotations is not necessarily equal to 1, as squared trans-ethnic genetic correlation is a non-linear quantity.

We provide a more detailed definition of the estimands in the Supplementary Notes.
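The definitions in Eqs. (3)–(6) can be illustrated with a minimal Python sketch. It is not the S-LDXR implementation: the annotation matrix, the coefficient values, and the helper names `stratified_sum` and `squared_rg` are hypothetical, chosen only to show how ρ_g(C), the allelic-scale heritabilities, r_g²(C), and λ²(C) follow from τ_1C, τ_2C, and θ_C.

```python
# Toy illustration of Eqs. (3)-(6); all numbers are made up for illustration.

def stratified_sum(annot, coef, members):
    """Sum over SNPs j in `members` of sum_C' a_C'(j) * coef[C']."""
    return sum(
        sum(a * c for a, c in zip(annot[j], coef))
        for j in members
    )

def squared_rg(annot, tau1, tau2, theta, members):
    """Squared stratified trans-ethnic genetic correlation, Eq. (5)."""
    h1 = stratified_sum(annot, tau1, members)    # allelic-scale heritability, population 1
    h2 = stratified_sum(annot, tau2, members)    # allelic-scale heritability, population 2
    rho = stratified_sum(annot, theta, members)  # trans-ethnic genetic covariance
    return rho ** 2 / (h1 * h2)

# Rows are SNPs; columns are annotations: [all SNPs, hypothetical binary annotation C].
annot = [[1, 1], [1, 1], [1, 0], [1, 0], [1, 0], [1, 0]]
tau1 = [1.0e-4, 1.0e-4]    # hypothetical tau_{1C'}
tau2 = [1.0e-4, 1.0e-4]    # hypothetical tau_{2C'}
theta = [0.9e-4, 0.5e-4]   # hypothetical theta_{C'}

all_snps = range(len(annot))
in_C = [j for j in all_snps if annot[j][1] == 1]

rg2_C = squared_rg(annot, tau1, tau2, theta, in_C)
rg2_genome = squared_rg(annot, tau1, tau2, theta, all_snps)
lambda2_C = rg2_C / rg2_genome   # Eq. (6); values below 1 indicate depletion
```

In this toy example r_g²(C) = 0.49 and the genome-wide r_g² = 0.64, so λ²(C) ≈ 0.77, which would be read as a depletion of squared trans-ethnic genetic correlation in annotation C.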

S-LDXR method

S-LDXR conceptually extends stratified LD score regression^21,22 (S-LDSC), a method for stratifying heritability from GWAS summary statistics, to two populations. The S-LDSC method determines that a category of SNPs is enriched for heritability if SNPs with high LD to that category have higher expected χ² statistic than SNPs with low LD to that category. Analogously, the S-LDXR method determines that a category of SNPs is enriched for trans-ethnic genetic covariance if SNPs with high LD to that category have higher expected product of Z-scores than SNPs with low LD to that category.

S-LDXR relies on the regression equation

$${\rm{E}}[{Z}_{1j}{Z}_{2j}]=\sqrt{{N}_{1}{N}_{2}}\sum \limits_{C}{\ell }_{\times }(j,C){\theta }_{C}$$

(7)

to estimate θ_C, where Z_pj is the Z-score of SNP j in population p; ${\ell }_{\times }(j,C)={\sum }_{k}{r}_{1jk}{r}_{2jk}{\sigma }_{1j}{\sigma }_{2j}{a}_{C}(k)$ is the trans-ethnic LD score of SNP j with respect to annotation C, whose value for SNP k, a_C(k), can be either binary or continuous; r_pjk is the LD between SNP j and k in population p; and σ_pj is the standard deviation of SNP j in population p. We obtain unbiased estimates of ℓ_×(j, C) using genotype data of 481 East Asian and 489 European samples in the 1000 Genomes Project¹⁶. To account for heteroscedasticity and increase statistical efficiency, we use weighted least squares regression to estimate θ_C. We use regression equations analogous to those described in ref. ²¹ to estimate τ_1C and τ_2C. We include only well-imputed (imputation INFO > 0.9) and common (MAF > 5% in both populations) SNPs that are present in HapMap 3¹⁷ (irrespective of GWAS significance level) in the regressions (regression SNPs), analogous to our previous work^21,71,75. We use all SNPs present in either population in 1000 Genomes¹⁶ to estimate trans-ethnic LD scores ℓ_×(j, C) (reference SNPs; analogous to S-LDSC²¹), so that the resulting coefficients θ_C also pertain to these SNPs. However, we estimate ${r}_{g}^{2}(C)$ and λ²(C) (see below; defined as a function of causal effect sizes) for all SNPs with MAF > 5% in both populations (heritability SNPs), accounting for tagging effects (analogous to S-LDSC²¹).
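As a rough illustration of the trans-ethnic LD score defined above, the following Python sketch evaluates the sum over SNPs k directly from small, made-up LD matrices. It is a naive plug-in calculation under stated assumptions: the function name `trans_ethnic_ld_scores` and all input values are hypothetical, the whole region is treated as a single LD window, and the finite-sample bias correction used to obtain unbiased LD score estimates (Supplementary Notes) is omitted.

```python
def trans_ethnic_ld_scores(r1, r2, sd1, sd2, a_C):
    """Naive l_x(j, C) = sum_k r1[j][k] * r2[j][k] * sd1[j] * sd2[j] * a_C[k].

    r1, r2   : LD (correlation) matrices in populations 1 and 2
    sd1, sd2 : per-SNP genotype standard deviations in each population
    a_C      : annotation values a_C(k), binary or continuous
    """
    n = len(a_C)
    return [
        sd1[j] * sd2[j] * sum(r1[j][k] * r2[j][k] * a_C[k] for k in range(n))
        for j in range(n)
    ]

# Hypothetical three-SNP example.
r1 = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
r2 = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.1], [0.0, 0.1, 1.0]]
sd1 = [0.5, 0.6, 0.7]
sd2 = [0.6, 0.6, 0.5]
a_C = [1, 0, 1]   # SNPs 0 and 2 belong to annotation C

scores = trans_ethnic_ld_scores(r1, r2, sd1, sd2, a_C)
```

SNP 1 is not in the annotation but still receives a nonzero score (about 0.079) because it is in LD with annotated SNPs in both populations; this tagging is what the regression in Eq. (7) exploits.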

Let ${\hat{\tau }}_{1C}$, ${\hat{\tau }}_{2C}$, and ${\hat{\theta }}_{C}$ be the estimates of τ_1C, τ_2C, and θ_C, respectively. For each binary annotation C, we estimate the stratified heritability of annotation C in each population, ${h}_{g1}^{2}(C)$ and ${h}_{g2}^{2}(C)$, and trans-ethnic genetic covariance, ρ_g(C), as

$${\hat{h}}_{g1}^{2}(C)=\sum \limits_{j\in C}\sum \limits_{C^{\prime} }{a}_{C^{\prime} }(j){\hat{\tau }}_{1C^{\prime} },\ {\hat{h}}_{g2}^{2}(C)=\sum \limits_{j\in C}\sum \limits_{C^{\prime} }{a}_{C^{\prime} }(j){\hat{\tau }}_{2C^{\prime} },\ {\hat{\rho }}_{g}(C)=\sum \limits_{j\in C}\sum \limits_{C^{\prime} }{a}_{C^{\prime} }(j){\hat{\theta }}_{C^{\prime} },$$

(8)

respectively, restricting to causal effects of SNPs with MAF > 5% in both populations (heritability SNPs), using coefficients (${\tau }_{1C^{\prime} }$, ${\tau }_{2C^{\prime} }$, and ${\theta }_{C^{\prime} }$) of both binary and continuous-valued annotations. We estimate genome-wide trans-ethnic genetic correlation as ${\hat{r}}_{g}=\frac{{\hat{\rho }}_{g}({\mathcal{C}})}{\sqrt{{\hat{h}}_{g1}^{2}({\mathcal{C}}){\hat{h}}_{g2}^{2}({\mathcal{C}})}}$, where ${\mathcal{C}}$ represents the set of all SNPs with MAF > 5% in both populations. We then estimate ${r}_{g}^{2}(C)$ as

$${\hat{r}}_{g}^{2}(C)=\left\{{\tilde{r}}_{g}^{2}(C)+\frac{{\rm{Cov}}\left[{\hat{\rho }}_{g}^{2}(C),{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)\right]}{{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)}\right\}/\left\{1+\frac{{\rm{Var}}\left[{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)\right]}{{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)}\right\},$$

(9)

where ${\tilde{r}}_{g}^{2}(C)=\frac{{\hat{\rho }}_{g}^{2}(C)-{\rm{Var}}[{\hat{\rho }}_{g}(C)]}{{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)-{\rm{Cov}}[{\hat{h}}_{g1}^{2}(C),{\hat{h}}_{g2}^{2}(C)]}$. The correction to ${\tilde{r}}_{g}^{2}(C)$ in Eq. (9) is necessary for obtaining an unbiased estimate of ${r}_{g}^{2}(C)$, as computing quotients of two random variables introduces bias (Supplementary Notes). (We do not constrain the estimate of ${r}_{g}^{2}(C)$ to its plausible range of [−1, 1], as this would introduce bias.) Subsequently, we estimate enrichment of stratified squared trans-ethnic genetic correlation as

$${\hat{\lambda }}^{2}(C)=\left\{{\tilde{\lambda }}^{2}(C)+\frac{{\rm{Cov}}\left[{\hat{r}}_{g}^{2}(C),{\hat{r}}_{g}^{2}\right]}{{\hat{r}}_{g}^{2}(C)}\right\}/\left\{1+\frac{{\rm{Var}}\left[{\hat{r}}_{g}^{2}(C)\right]}{{\hat{r}}_{g}^{2}(C)}\right\}$$

(10)

where ${\tilde{\lambda }}^{2}(C)=\frac{{\hat{r}}_{g}^{2}(C)}{{\hat{r}}_{g}^{2}}$, the ratio between estimated stratified (${\hat{r}}_{g}^{2}(C)$) and genome-wide (${\hat{r}}_{g}^{2}$) squared trans-ethnic genetic correlation. We use block jackknife over 200 non-overlapping and equally sized blocks to obtain the standard error of all estimates. The standard error of λ²(C) primarily depends on the total allelic-scale heritability of SNPs in the annotation (sum of per-SNP variances of causal per-allele effect sizes), which appears as the denominator (${h}_{g1}^{2}(C){h}_{g2}^{2}(C)$) in the estimation of a stratified squared trans-ethnic genetic correlation (${r}_{g}^{2}(C)$); if this denominator is small, estimation of ${r}_{g}^{2}(C)$ becomes noisy. The standard error of λ²(C) indirectly depends on the size of the annotation, because larger annotations tend to have larger total heritability. However, estimates of λ²(C) for a large annotation may have a large standard error if the annotation is depleted for heritability.
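The block jackknife step can be sketched generically in Python as follows. This is an illustrative delete-one-block jackknife, not the released code: the function names are hypothetical, and the ratio example sums made-up per-block contributions as a stand-in for re-running the regressions with each block of SNPs removed.

```python
import math

def block_jackknife_se(block_values, estimator):
    """Delete-one-block jackknife standard error of estimator(block_values)."""
    B = len(block_values)
    # Leave-one-block-out estimates.
    loo = [estimator(block_values[:b] + block_values[b + 1:]) for b in range(B)]
    mean_loo = sum(loo) / B
    return math.sqrt((B - 1) / B * sum((x - mean_loo) ** 2 for x in loo))

def mean(values):
    return sum(values) / len(values)

def rg2_from_blocks(blocks):
    """Squared correlation from per-block (rho, h1, h2) contributions (toy stand-in)."""
    rho = sum(b[0] for b in blocks)
    h1 = sum(b[1] for b in blocks)
    h2 = sum(b[2] for b in blocks)
    return rho ** 2 / (h1 * h2)

# Sanity check: for the sample mean, the jackknife SE equals s / sqrt(n).
se_mean = block_jackknife_se([1.0, 2.0, 3.0, 4.0], mean)

# Hypothetical per-block contributions to (rho_g(C), h_g1^2(C), h_g2^2(C)).
blocks = [(0.8, 1.0, 1.0), (0.9, 1.1, 1.0), (0.7, 0.9, 1.0), (0.85, 1.0, 1.1)]
se_rg2 = block_jackknife_se(blocks, rg2_from_blocks)
```

Because the jackknife recomputes the full estimator with each block left out, it applies unchanged to non-linear quantities such as quotients, which is why it is convenient for r_g²(C) and λ²(C).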

To assess the informativeness of each annotation in explaining disease heritability and trans-ethnic genetic covariance, we define standardized annotation effect size on heritability and trans-ethnic genetic covariance for each annotation C analogous to ref. ²²,

$$\begin{array}{l}{\tau }_{1C}^{* }=\frac{M{h}_{g1}^{2}}{{h}_{g1}^{2}(C)}\times {\sigma }_{C}\times {\tau }_{1C},\ {\tau }_{2C}^{* }=\frac{M{h}_{g2}^{2}}{{h}_{g2}^{2}(C)}\times {\sigma }_{C}\times {\tau }_{2C},\\ {\theta }_{C}^{* }=\frac{M{\rho }_{g}}{{\rho }_{g}(C)}\times {\sigma }_{C}\times {\theta }_{C},\end{array}$$

(11)

where ${\tau }_{1C}^{* }$, ${\tau }_{2C}^{* }$, and ${\theta }_{C}^{* }$ represent proportionate change in per-SNP heritability in population 1 and 2 and trans-ethnic genetic covariance, respectively, per standard deviation increase in annotation C; τ_1C, τ_2C, and θ_C are the corresponding unstandardized effect sizes, defined in Eq. (3); and σ_C is the standard deviation of annotation C.

We provide a more detailed description of the method, including derivations of the regression equation and unbiased estimators of the LD scores, in the Supplementary Notes.

S-LDXR shrinkage estimator

Estimates of ${r}_{g}^{2}(C)$ can be imprecise with large standard errors if the denominator, ${h}_{g1}^{2}(C){h}_{g2}^{2}(C)$, is close to zero and noisily estimated. This is especially the case for annotations of small size (<1% SNPs). We introduce a shrinkage estimator to reduce the standard error in estimating ${r}_{g}^{2}(C)$.

Briefly, we shrink the estimated per-SNP heritability and trans-ethnic genetic covariance of annotation C towards the genome-wide averages, which are usually estimated with smaller standard errors, prior to estimating ${r}_{g}^{2}(C)$. In detail, letting M_C be the number of SNPs in annotation C, we shrink $\frac{{\hat{h}}_{g1}^{2}(C)}{{M}_{C}}$, $\frac{{\hat{h}}_{g2}^{2}(C)}{{M}_{C}}$, and $\frac{{\hat{\rho }}_{g}(C)}{{M}_{C}}$ towards $\frac{{\hat{h}}_{g1}^{2}}{M}$, $\frac{{\hat{h}}_{g2}^{2}}{M}$, and $\frac{{\hat{\rho }}_{g}}{M}$, respectively, where ${\hat{h}}_{g1}^{2}$, ${\hat{h}}_{g2}^{2}$, ${\hat{\rho }}_{g}$ are the genome-wide estimates, and M the total number of SNPs. We obtain the shrinkage as follows. Let ${\gamma }_{1}=1/\left(1+\alpha \frac{{\rm{Var}}\left[{\hat{h}}_{g1}^{2}(C)\right]}{{\rm{Var}}\left[{\hat{h}}_{g1}^{2}\right]}\frac{M}{{M}_{C}}\right)$, ${\gamma }_{2}=1/\left(1+\alpha \frac{{\rm{Var}}\left[{\hat{h}}_{g2}^{2}(C)\right]}{{\rm{Var}}\left[{\hat{h}}_{g2}^{2}\right]}\frac{M}{{M}_{C}}\right)$, and ${\gamma }_{3}=1/\left(1+\alpha \frac{{\rm{Var}}\left[{\hat{\rho }}_{g}(C)\right]}{{\rm{Var}}\left[{\hat{\rho }}_{g}\right]}\frac{M}{{M}_{C}}\right)$ be the shrinkage obtained separately for ${\hat{h}}_{g1}^{2}(C)$, ${\hat{h}}_{g2}^{2}(C)$ and ${\hat{\rho }}_{g}(C)$, respectively, where α ∈ [0, 1] is the shrinkage parameter adjusting the magnitude of shrinkage. We then choose the most stringent shrinkage, $\gamma =\min \{{\gamma }_{1},{\gamma }_{2},{\gamma }_{3}\}$, as the final shared shrinkage for both heritability and trans-ethnic genetic covariance.

We shrink heritability and trans-ethnic genetic covariance of annotation C using γ as, ${\bar{h}}_{g1}^{2}(C)={M}_{C}\left(\gamma \frac{{\hat{h}}_{g1}^{2}(C)}{{M}_{C}}+(1-\gamma )\frac{{\hat{h}}_{g1}^{2}}{M}\right)$, ${\bar{h}}_{g2}^{2}(C)={M}_{C}\left(\gamma \frac{{\hat{h}}_{g2}^{2}(C)}{{M}_{C}}+(1-\gamma )\frac{{\hat{h}}_{g2}^{2}}{M}\right)$, and ${\bar{\rho }}_{g}(C)={M}_{C}\left(\gamma \frac{{\hat{\rho }}_{g}(C)}{{M}_{C}}+(1-\gamma )\frac{{\hat{\rho }}_{g}}{M}\right)$, where ${\bar{h}}_{g1}^{2}(C)$, ${\bar{h}}_{g2}^{2}(C)$, and ${\bar{\rho }}_{g}(C)$ are the shrunk counterparts of ${\hat{h}}_{g1}^{2}(C)$, ${\hat{h}}_{g2}^{2}(C)$, and ${\hat{\rho }}_{g}(C)$, respectively. We shrink ${\hat{r}}_{g}^{2}(C)$ by substituting ${\hat{h}}_{g1}^{2}(C)$, ${\hat{h}}_{g2}^{2}(C)$, and ${\hat{\rho }}_{g}(C)$ with ${\bar{h}}_{g1}^{2}(C)$, ${\bar{h}}_{g2}^{2}(C)$, ${\bar{\rho }}_{g}(C)$, respectively, in Eq. (9), to obtain its shrunk counterpart, ${\bar{r}}_{g}^{2}(C)$. Finally, we shrink ${\hat{\lambda }}^{2}(C)$, by plugging in ${\bar{r}}_{g}^{2}(C)$ in Eq. (10) to obtain its shrunk counterpart, ${\bar{\lambda }}^{2}(C)$. We recommend α = 0.5 as the default shrinkage parameter value, as this value provides robust estimates of λ²(C) in simulations. We note that S-LDXR does not use the shrinkage estimator when estimating genome-wide r_g and ${r}_{g}^{2}$.
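The shrinkage step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `shrink_annotation`, the dictionary layout, and all numerical inputs are hypothetical, and the final squared correlation is formed as a plain quotient rather than through the bias-corrected estimator of Eq. (9).

```python
def shrink_annotation(est_C, est_genome, var_C, var_genome, M_C, M, alpha=0.5):
    """Shrink per-SNP estimates for annotation C towards genome-wide averages.

    est_C, est_genome, var_C, var_genome are dicts keyed by "h1", "h2", "rho"
    (heritability in each population and trans-ethnic genetic covariance).
    """
    keys = ("h1", "h2", "rho")
    gammas = [
        1.0 / (1.0 + alpha * (var_C[k] / var_genome[k]) * (M / M_C))
        for k in keys
    ]
    gamma = min(gammas)  # most stringent shrinkage, shared by all three quantities
    shrunk = {
        k: M_C * (gamma * est_C[k] / M_C + (1.0 - gamma) * est_genome[k] / M)
        for k in keys
    }
    return gamma, shrunk

# Hypothetical inputs: annotation C covers 100 of 1000 SNPs.
est_C = {"h1": 0.02, "h2": 0.02, "rho": 0.012}
est_genome = {"h1": 0.1, "h2": 0.1, "rho": 0.08}
var_C = {"h1": 2e-5, "h2": 1e-5, "rho": 1e-5}
var_genome = {"h1": 1e-4, "h2": 1e-4, "rho": 1e-4}

gamma, shrunk = shrink_annotation(est_C, est_genome, var_C, var_genome, M_C=100, M=1000)

# Plain-quotient squared correlations before and after shrinkage (no bias correction).
rg2_raw = est_C["rho"] ** 2 / (est_C["h1"] * est_C["h2"])
rg2_shrunk = shrunk["rho"] ** 2 / (shrunk["h1"] * shrunk["h2"])
```

With these inputs γ = 0.5, and the plain-quotient squared correlation moves from 0.36 towards the genome-wide value of 0.64, which illustrates why shrunk estimates of depletion are conservative.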

Significance testing

To assess whether an annotation C is enriched or depleted of squared trans-ethnic genetic correlation for a trait, we test the null hypothesis ${\hat{\lambda }}^{2}(C)=1$. Since ${\hat{\lambda }}^{2}(C)$ is not normally distributed⁸⁰, we instead test the equivalent null hypothesis ${\hat{D}}^{2}(C)={\hat{\rho }}_{g}^{2}(C)-{\hat{r}}_{g}^{2}{\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)=0$, where ${\hat{r}}_{g}^{2}$ is the genome-wide squared trans-ethnic genetic correlation. We obtain the test statistic as $\frac{{\hat{D}}^{2}(C)}{s.e.[{\hat{D}}^{2}(C)]}$, and obtain the p-value under a t-distribution with B − 1 degrees of freedom, where B is the number of jackknife blocks. Since the ${\hat{D}}^{2}(C)$ statistic does not involve division by ${\hat{h}}_{g1}^{2}(C){\hat{h}}_{g2}^{2}(C)$, we do not apply any shrinkage to ${\hat{D}}^{2}(C)$.
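The test can be sketched in Python using only the standard library. This is an illustration rather than the released implementation: the function names and input values are hypothetical, the standard error is supplied directly instead of being obtained by block jackknife, and the t-distribution tail is computed by Simpson's rule on the density as a stand-in for a statistical library routine.

```python
import math

def d_statistic(rho_C, h1_C, h2_C, rg2_genome):
    """D^2(C) = rho_g(C)^2 - r_g^2 * h_g1^2(C) * h_g2^2(C); zero when lambda^2(C) = 1."""
    return rho_C ** 2 - rg2_genome * h1_C * h2_C

def t_two_sided_p(t_stat, df, n_steps=2000):
    """Two-sided p-value under Student's t (Simpson's rule; n_steps must be even)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / n_steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2.0 * total * h / 3.0   # P(|T| <= |t_stat|)
    return max(0.0, 1.0 - central_mass)

# Hypothetical estimates for one annotation and trait.
D = d_statistic(rho_C=2.8e-4, h1_C=4e-4, h2_C=4e-4, rg2_genome=0.64)
se_D = 1.0e-8   # assumed jackknife standard error of D
B = 200         # number of jackknife blocks
p_value = t_two_sided_p(D / se_D, df=B - 1)
```

A negative D with a small p-value corresponds to depletion of squared trans-ethnic genetic correlation (λ²(C) < 1); because D involves no division, it remains well behaved when the estimated heritability of the annotation is close to zero.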

Baseline-LD-X model

We include a total of 54 binary functional annotations in the baseline-LD-X model. These include 53 annotations introduced in ref. ²¹, which consist of 28 main annotations including conserved annotations (e.g. Coding, Conserved) and epigenomic annotations (e.g. H3K27ac, DHS, Enhancer) derived from ENCODE⁸¹ and Roadmap⁸², 24 500-base-pair-extended main annotations, and 1 annotation containing all SNPs. We note that although chromatin accessibility can be population-specific, the fraction of such regions is small⁸³. Following ref. ²², we created an additional annotation for all genomic positions with number of rejected substitutions⁸⁴ greater than 4. Further information for all functional annotations included in the baseline-LD-X model is provided in Supplementary Data 1a.

We also include a total of 8 continuous-valued annotations in the baseline-LD-X model. First, we include 5 continuous-valued annotations introduced in ref. ²² (see “Data availability”), without modification: background selection statistic³⁰, CpG content (within a ± 50 kb window), GERP (number of substitution) score⁸⁴, nucleotide diversity (within a ± 10 kb window), and Oxford map recombination rate (within a ± 10 kb window)⁸⁵. Second, we include 2 minor allele frequency (MAF) adjusted annotations introduced in ref. ²², with modification: level of LD (LLD) and predicted allele age. We created analogous annotations applicable to both East Asian and European populations. To create an analogous LLD annotation, we estimated LD scores for each population using LDSC⁷⁵, took the average across populations, and then quantile-normalized the average LD scores using 10 average MAF bins. We call this annotation “average level of LD”. To create an analogous predicted allele age annotation, we quantile-normalized allele age estimated by ARGweaver⁸⁶ across 54 multi-ethnic genomes using 10 average MAF bins. Finally, we include 1 continuous-valued annotation based on F_ST estimated by PLINK2⁸⁷, which implements the Weir & Cockerham estimator of F_ST⁸⁸. Further information for all continuous-valued annotations included in the baseline-LD-X model is provided in Supplementary Data 1b.

Simulations

We used simulated East Asian (EAS) and European (EUR) genotype data to assess the performance of our method, as we did not have access to real EAS genotype data at a sufficient sample size to perform simulations with real genotypes. We simulated genotype data for 100,000 East-Asian-like and 100,000 European-like individuals using HAPGEN2²⁵ (see “Code availability”), which preserves population-specific MAF and LD patterns, starting from phased haplotypes of 481 East Asian and 489 European individuals available in the 1000 Genomes Project¹⁶ (see “Data availability”), restricting to ~2.5 million SNPs on chromosomes 1–3 with minor allele count greater than 5 in either population. Since the direct output of HAPGEN2 includes substantial relatedness², we used PLINK2⁸⁷ (see “Code availability”) to remove simulated individuals with genetic relatedness greater than 0.05, resulting in 35,378 EAS-like and 36,836 EUR-like individuals. From the filtered set of individuals, we randomly selected 500 individuals in each simulated population to serve as reference panels. We used 18,418 EAS-like and 36,836 EUR-like individuals to simulate GWAS summary statistics, capturing the imbalance in sample size between EAS and EUR GWAS in the analysis of real traits. In our secondary simulations, we also decreased or increased the reference panel size or decreased the GWAS sample size, to evaluate the robustness of our method with respect to reference panel size and GWAS sample size.

We performed both null simulations, where enrichment of squared trans-ethnic genetic correlation, λ²(C), is 1 across all functional annotations, and causal simulations, where λ²(C) varies across annotations, under various degrees of polygenicity (1%, 10%, and 100% causal SNPs). In the null simulations, we set τ_1C, τ_2C, θ_C to be the meta-analyzed τ_C in real-data analyses of EAS GWASs, and followed Eq. (3) to obtain the variances, Var[β_1j] and Var[β_2j], and covariance, Cov[β_1j, β_2j], of per-SNP causal effect sizes β_1j, β_2j, setting all negative per-SNP variances and covariances to 0. In the causal simulations, we directly specified per-SNP causal effect size variances and covariances using self-devised τ_1C, τ_2C, and θ_C coefficients, to attain λ²(C) ≠ 1, as these were difficult to attain using the coefficients from analyses of real traits.

We randomly selected a subset of SNPs to be causal for both populations, and set Var[β_1j], Var[β_2j], and Cov[β_1j, β_2j] to be 0 for all remaining non-causal SNPs. We scaled the trans-ethnic genetic covariance to attain a desired genome-wide r_g. Next, we drew causal effect sizes of each causal SNP j in the two populations from the bi-variate Gaussian distribution,

$$\left[\begin{array}{l}{\beta }_{1j}\\ {\beta }_{2j}\end{array}\right] \sim N\left(\left[\begin{array}{l}0\\ 0\end{array}\right],\left[\begin{array}{ll}{\rm{Var}}[{\beta }_{1j}]&{\rm{Cov}}[{\beta }_{1j},{\beta }_{2j}]\\ {\rm{Cov}}[{\beta }_{1j},{\beta }_{2j}]&{\rm{Var}}[{\beta }_{2j}]\end{array}\right]\right),$$

(12)

and scaled the drawn effect sizes to match the desired total heritability and trans-ethnic genetic covariance. We also performed null simulations in which imperfect genome-wide trans-ethnic genetic correlation is due to population-specific causal variants. In these simulations, we randomly selected 10% of the SNPs to be causal in each population, with 80% of causal variants in each population shared with the other population, and sampled perfectly correlated causal effect sizes for shared causal variants using Eq. (12). We simulated the genetic component of the phenotype in population p as X_pβ_p, where X_p is the column-centered genotype matrix, and drew environmental effects, ϵ_p, from the Gaussian distribution, $N\left(0,1-{\rm{Var}}[{{\bf{X}}}_{p}{{\boldsymbol{\beta }}}_{p}]\right)$, such that the total phenotypic variance in each population is 1. Finally, we simulated GWAS summary association statistics for population p, Z_p, as ${Z}_{pj}=\frac{{{\bf{X}}}_{pj}^{{\mathtt{T}}}{{\bf{Y}}}_{p}}{\sqrt{{N}_{p}}{\sigma }_{pj}}$, where Y_p = X_pβ_p + ϵ_p is the simulated phenotype, N_p is the GWAS sample size, and σ_pj is the standard deviation of SNP j in population p. We have publicly released Python code for simulating GWAS summary statistics for two populations (see “Code availability”).
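The sampling and summary-statistic steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released simulation code: the function names `draw_effects` and `simulate_sumstats` are hypothetical, genotypes are passed in as plain lists rather than HAPGEN2 output, and the genetic component is rescaled to a target heritability `h2` before environmental noise is added.

```python
import math
import random


def draw_effects(var1, var2, cov, rng):
    """Draw (beta_1j, beta_2j) from the bivariate Gaussian of Eq. (12).

    Uses a hand-written Cholesky factor of the 2x2 covariance matrix.
    """
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    if var1 == 0.0:
        # No variance in population 1, so the covariance must also be 0.
        return 0.0, math.sqrt(var2) * z2
    l21 = cov / math.sqrt(var1)
    l22 = math.sqrt(max(var2 - l21 * l21, 0.0))
    return math.sqrt(var1) * z1, l21 * z1 + l22 * z2


def simulate_sumstats(genotypes, betas, h2, rng):
    """Return Z-scores Z_j = X_j^T Y / (sqrt(N) * sigma_j) for one population.

    genotypes: N rows of M allele counts; betas: M causal effect sizes.
    Assumes the genetic component has nonzero variance.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Genetic component X * beta, rescaled so that its variance equals h2.
    g = [sum(xi[j] * betas[j] for j in range(m)) for xi in x]
    var_g = sum(v * v for v in g) / n  # mean of g is 0 because X is centered
    scale = math.sqrt(h2 / var_g)
    # Environmental noise from N(0, 1 - h2), so total variance is about 1.
    y = [scale * gi + rng.gauss(0.0, math.sqrt(1.0 - h2)) for gi in g]

    z = []
    for j in range(m):
        sd = math.sqrt(sum(xi[j] ** 2 for xi in x) / n)
        if sd == 0.0:
            z.append(0.0)  # monomorphic SNP: no association signal
            continue
        z.append(sum(x[i][j] * y[i] for i in range(n)) / (math.sqrt(n) * sd))
    return z
```

Setting `cov` to `sqrt(var1 * var2)` in `draw_effects` gives the perfectly correlated effect sizes used for shared causal variants in the population-specific-variant simulations.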

We also performed null simulations with annotation-dependent MAF-dependent genetic architectures^26,27,28, defined as architectures in which the level of MAF-dependence is annotation-dependent, to assess the impact on estimates of enrichment of stratified squared trans-ethnic genetic correlation, λ²(C). In these simulations, we set the variance of the causal effect size of each SNP j in both populations to be proportional to ${[{p}_{j,\text{max}}(1-{p}_{j,\text{max}})]}^{\alpha }$, where p_j,max is the maximum MAF of SNP j in the two populations. (We elected to use maximum MAF because a SNP that is rare in one population but common in the other is less likely to be impacted by negative selection.) We set α to −0.38, as previously estimated for 25 UK Biobank diseases and complex traits in ref. ²⁸. We sampled causal effect sizes using Eq. (12), with Var[β_1j], Var[β_2j], and Cov[β_1j, β_2j] scaled to attain a desired genome-wide heritability and trans-ethnic genetic correlation. We randomly selected 10% of SNPs to be causal in both populations. Additionally, in the top quintile of background selection statistic, we selected 1.8× more low-frequency causal variants (p_j,max < 0.05) than common variants (p_j,max ≥ 0.05), capturing the action of negative selection across low-frequency and common variants²⁷.
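As an illustration of the MAF-dependent architecture above, the following sketch assigns per-SNP effect-size variances proportional to [p_max(1 − p_max)]^α. The function name and the normalization (variances summing to a target `h2`) are illustrative assumptions; the scaling in the released code may differ.

```python
def maf_dependent_variances(freq_eas, freq_eur, h2, alpha=-0.38):
    """Per-SNP causal effect-size variances proportional to
    [p_max * (1 - p_max)] ** alpha, normalized to sum to h2.

    freq_eas, freq_eur: allele frequencies of each SNP in the two
    populations; every SNP is assumed polymorphic in at least one.
    """
    weights = []
    for p1, p2 in zip(freq_eas, freq_eur):
        # Fold allele frequencies to MAF, then take the larger of the two.
        p_max = max(min(p1, 1.0 - p1), min(p2, 1.0 - p2))
        weights.append((p_max * (1.0 - p_max)) ** alpha)
    total = sum(weights)
    return [h2 * w / total for w in weights]
```

With a negative α, a SNP that is rare in both populations receives a larger per-SNP variance than a SNP that is common in at least one of them.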

Summary statistics for 31 diseases and complex traits

We analyzed GWAS summary statistics of 31 diseases and complex traits, primarily from UK Biobank⁷⁴, Biobank Japan²⁰, and CONVERGE¹⁸. All summary statistics were based on genotyping arrays with imputation to an appropriate LD reference panel (e.g. Haplotype Reference Consortium⁸⁹ and UK10K⁹⁰ for UK Biobank⁷⁴, the 1000 Genomes Project¹⁶ for Biobank Japan²⁰), except those of the MDD GWAS in the East Asian population, which was based on low-coverage whole genome sequencing data¹⁸. These include: atrial fibrillation (AF)^91,92, age at menarche (AMN)^93,94, age at menopause (AMP)^93,94, basophil count (BASO)^20,95, body mass index (BMI)^20,96, blood sugar (BS)^20,96, diastolic blood pressure (DBP)^20,96, eosinophil count (EO)^20,96, estimated glomerular filtration rate (EGFR)^20,97, hemoglobin A1c (HBA1C)^20,96, height (HEIGHT)^96,98, high density lipoprotein (HDL)^20,96, hemoglobin (HGB)^20,95, hematocrit (HTC)^20,95, low density lipoprotein (LDL)^20,96, lymphocyte count (LYMPH)^20,96, mean corpuscular hemoglobin (MCH)^20,96, mean corpuscular hemoglobin concentration (MCHC)^20,95, mean corpuscular volume (MCV)^20,95, major depressive disorder (MDD)^18,99, monocyte count (MONO)^20,96, neutrophil count (NEUT)^20,95, platelet count (PLT)^20,96, rheumatoid arthritis (RA)¹⁰⁰, red blood cell count (RBC)^20,96, systolic blood pressure (SBP)^20,96, schizophrenia (SCZ)¹⁰¹, type 2 diabetes (T2D)^102,103, total cholesterol (TC)^20,96, triglyceride (TG)^20,96, and white blood cell count (WBC)^20,96. Further information for the GWAS summary statistics analyzed is provided in Supplementary Table 2. In our main analyses, we performed random-effect meta-analysis to aggregate results across all 31 diseases and complex traits.
To test if the meta-analyzed ${\hat{\lambda }}^{2}(C)$ is significantly different from 1, we computed a test statistic as $\frac{{\hat{\lambda }}^{2}(C)-1}{s.e.({\hat{\lambda }}^{2}(C))}$, where $s.e.({\hat{\lambda }}^{2}(C))$ is the standard error of meta-analyzed ${\hat{\lambda }}^{2}(C)$, and obtained a p-value under the normal distribution. We also defined a set of 20 approximately independent diseases and complex traits with cross-trait ${r}_{g}^{2}$ (estimated using cross-trait LDSC⁷¹) less than 0.25 in both populations: AF, AMN, AMP, BASO, BMI, EGFR, EO, HBA1C, HEIGHT, HTC, LYMPH, MCHC, MCV, MDD, NEUT, PLT, RA, SBP, TC, TG.
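The test of whether a meta-analyzed enrichment differs from 1 can be sketched as follows. The function name is hypothetical, and the use of a two-sided p-value is an assumption, since the text only states that the p-value is obtained under the normal distribution.

```python
import math


def enrichment_test(lambda2_hat, se):
    """Z-statistic and two-sided normal p-value for H0: lambda^2(C) = 1."""
    z = (lambda2_hat - 1.0) / se
    # Two-sided tail probability of a standard normal, via erfc.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Example with the estimate reported in the abstract for the top quintile
# of the background selection statistic: 0.82 (s.e. 0.01).
z_stat, p_value = enrichment_test(0.82, 0.01)
```

Using `math.erfc` rather than `1 - cdf` avoids loss of precision for very small p-values.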

Expected enrichment of stratified squared trans-ethnic genetic correlation from 8 continuous-valued annotations

To obtain expected enrichment of squared trans-ethnic genetic correlation of a binary annotation C, λ²(C), from 8 continuous-valued annotations, we first fit the S-LDXR model using these 8 annotations together with the base annotation for all SNPs, yielding coefficients, ${\tau }_{1C^{\prime} }$, ${\tau }_{2C^{\prime} }$, and ${\theta }_{C^{\prime} }$, for a total of 9 annotations. We then use Eq. (3) to obtain per-SNP variance and covariance of causal effect sizes, β_1j and β_2j, substituting τ_1C, τ_2C, θ_C with ${\tau }_{1C^{\prime} }$, ${\tau }_{2C^{\prime} }$, and ${\theta }_{C^{\prime} }$, respectively. We apply shrinkage with default parameter setting (α = 0.5), and use Eqs. (9) and (10) to obtain expected stratified squared trans-ethnic genetic correlation, ${r}_{g}^{2}(C)$, and subsequently λ²(C).
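The computation above can be sketched as follows. Eqs. (3), (9), and (10) are stated outside this section, so the sketch assumes their usual form: per-SNP variances and covariance are annotation-weighted sums of the coefficients, r_g²(C) is the squared covariance summed over SNPs in C divided by the product of the two summed variances, and λ²(C) is r_g²(C) divided by its genome-wide counterpart. The shrinkage step is omitted, and all function names are hypothetical.

```python
def per_snp_moments(annot_row, tau1, tau2, theta):
    """Var[beta_1j], Var[beta_2j], Cov[beta_1j, beta_2j] for one SNP,
    assuming each is a sum of annotation value times coefficient."""
    var1 = sum(a * t for a, t in zip(annot_row, tau1))
    var2 = sum(a * t for a, t in zip(annot_row, tau2))
    cov = sum(a * t for a, t in zip(annot_row, theta))
    return var1, var2, cov


def squared_rg(moments):
    """Squared genetic correlation from a list of per-SNP moments."""
    h1 = sum(m[0] for m in moments)
    h2 = sum(m[1] for m in moments)
    rho = sum(m[2] for m in moments)
    return rho * rho / (h1 * h2)


def expected_enrichment(annot, in_c, tau1, tau2, theta):
    """Expected lambda^2(C): r_g^2 within C over genome-wide r_g^2.

    annot: one row of annotation values per SNP (base annotation first);
    in_c: one boolean per SNP marking membership in binary annotation C.
    """
    moments = [per_snp_moments(row, tau1, tau2, theta) for row in annot]
    in_annotation = [m for m, flag in zip(moments, in_c) if flag]
    return squared_rg(in_annotation) / squared_rg(moments)
```

In this form, a negative covariance coefficient on a continuous annotation lowers the expected λ²(C) of any binary annotation whose SNPs carry high values of that continuous annotation.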

Analysis of SEG annotations

We obtained 53 SEG annotations, defined in ref. ²⁴ as ±100k-base-pair regions surrounding genes specifically expressed in each of 53 GTEx³² tissues. A list of the SEG annotations is provided in Supplementary Data 2. Correlations between SEG annotations and the 8 continuous-valued annotations are reported in Supplementary Fig. 28 and Supplementary Data 2. Most SEG annotations are moderately correlated with the background selection statistic and CpG content annotations.

To test whether there is heterogeneity in enrichment of squared trans-ethnic genetic correlation, λ²(C), across the 53 SEG annotations, we first computed the average λ²(C) across the 53 annotations, ${\bar{\lambda }}^{2}(C)$, using fixed-effect meta-analysis. We then computed the test statistic $\mathop{\sum }\nolimits_{i = 1}^{53}\frac{{\left({\hat{\lambda }}^{2}({C}_{i})-{\bar{\lambda }}^{2}(C)\right)}^{2}}{{\rm{Var}}[{\hat{\lambda }}^{2}({C}_{i})]}$, where C_i is the ith SEG annotation and ${\hat{\lambda }}^{2}({C}_{i})$ is the estimated λ²(C_i). We computed a p-value for this test statistic based on a χ² distribution with 53 degrees of freedom.
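A minimal sketch of this heterogeneity test is given below. It assumes that the fixed-effect average is the inverse-variance-weighted mean (the standard fixed-effect estimator, not spelled out in the text) and evaluates the χ² tail probability with the closed-form series for integer degrees of freedom, so that only the Python standard library is needed. Function names are hypothetical.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability of a chi-square variable with integer k d.f."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #        * sum_{r=1}^{(k-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total
    return math.erfc(math.sqrt(x / 2.0)) + tail


def heterogeneity_test(estimates, variances, df=None):
    """Fixed-effect mean, heterogeneity statistic, and chi-square p-value.

    df defaults to the number of annotations, matching the 53 degrees of
    freedom used in the text for 53 SEG annotations.
    """
    weights = [1.0 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum((e - mean) ** 2 / v for e, v in zip(estimates, variances))
    if df is None:
        df = len(estimates)
    return mean, q, chi2_sf(q, df)
```

The `df` argument is exposed so that a reader can compare against other conventions for the degrees of freedom; the default follows the text.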

Analysis of distance to nearest exon annotation

We created a continuous-valued annotation, named “distance to nearest exon annotation”, based on a SNP’s physical distance (number of base pairs) to its nearest exon, using 233,254 exons defined on the UCSC genome browser¹⁰⁴ (see “Data availability”). This annotation is moderately correlated with the background selection statistic annotation²² (R = −0.21), defined as (1 - McVicker B statistic/1000), where the McVicker B statistic quantifies a site’s genetic distance to its nearest exon³⁰. We have publicly released this annotation (see “Data availability”).
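One way to compute such an annotation for SNPs on a single chromosome is sketched below: overlapping exons are merged, and a binary search locates the exons flanking each SNP. This is an illustrative sketch rather than the released code; the function names and the convention that exon coordinates are inclusive are assumptions.

```python
import bisect


def merge_exons(exons):
    """Merge overlapping (start, end) exon intervals into a sorted list."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def distance_to_nearest_exon(pos, merged, starts):
    """Distance in base pairs from pos to the nearest merged exon.

    Returns 0 if pos lies inside an exon. `starts` must be the list of
    start coordinates of `merged`, in the same order.
    """
    i = bisect.bisect_right(starts, pos)  # exons starting at or before pos
    best = None
    if i > 0:
        end = merged[i - 1][1]
        best = 0 if pos <= end else pos - end
    if i < len(merged):
        ahead = merged[i][0] - pos
        best = ahead if best is None else min(best, ahead)
    return best


# Toy example on one chromosome (coordinates are made up).
exons = merge_exons([(100, 200), (150, 300), (1000, 1100)])
exon_starts = [start for start, _ in exons]
distances = [distance_to_nearest_exon(p, exons, exon_starts)
             for p in (50, 250, 600, 1200)]
```

Merging first guarantees that only the exon starting just before the SNP and the one starting just after it need to be checked.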

To assess the informativeness of functionally important regions versus regions impacted by selection in explaining the depletions of squared trans-ethnic genetic correlation, we applied S-LDXR on the distance to nearest exon annotation together with the baseline-LD-X model annotations. We used both enrichment of squared trans-ethnic genetic correlation (λ²(C)) and standardized annotation effect size (${\tau }_{1C}^{* }$, ${\tau }_{2C}^{* }$, and ${\theta }_{C}^{* }$) to assess informativeness.

Analysis of probability of loss-of-function intolerance decile gene annotations

We created 10 annotations based on genes in deciles of the probability of being loss-of-function intolerant (pLI) (see “Data availability”), defined as the probability that a gene belongs to the haploinsufficient class, in which protein-truncating variants are depleted⁵⁶. Genes with high pLI (e.g. >0.9) have highly constrained functionality, and therefore mutations in these genes are subject to negative selection. We included SNPs within a 100-kb window around each gene, following ref. ²⁴. A correlation heat map between pLI decile gene annotations and the 8 continuous-valued annotations is provided in Supplementary Fig. 29. All pLI decile gene annotations are moderately correlated with the background selection statistic and CpG content annotations.
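The construction of decile annotations can be illustrated with the sketch below. It assumes equal-sized deciles obtained by ranking genes on pLI, a window extending 100 kb on each side of the gene body, and genes on a single chromosome; these choices and the function names are illustrative assumptions rather than details confirmed by the text.

```python
def assign_deciles(pli_by_gene):
    """Map each gene to a pLI decile from 1 (lowest) to 10 (highest).

    Ties are broken by the stable sort order of the input mapping.
    """
    ranked = sorted(pli_by_gene, key=pli_by_gene.get)
    n = len(ranked)
    return {gene: 10 * rank // n + 1 for rank, gene in enumerate(ranked)}


def decile_annotations(snp_positions, gene_coords, deciles, window=100_000):
    """Binary annotation matrix: one row per SNP, one column per decile.

    gene_coords: {gene: (start, end)}. A SNP is annotated for a decile if
    it lies within `window` base pairs of any gene in that decile.
    """
    annot = [[0] * 10 for _ in snp_positions]
    for i, pos in enumerate(snp_positions):
        for gene, (start, end) in gene_coords.items():
            if start - window <= pos <= end + window:
                annot[i][deciles[gene] - 1] = 1
    return annot
```

Because windows around neighbouring genes can overlap, a SNP may belong to more than one decile annotation in this construction.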

Analysis of the integrated haplotype score annotation

We created a binary annotation (proportion of SNPs: 6.3%) that includes all SNPs whose maximum absolute value of the integrated haplotype score (iHS)^43,44 (see “Data availability”) across all 1000 Genomes Project EAS and EUR sub-populations is greater than 2.0, the recommended threshold to detect positive selection in ref. ⁴³. This annotation is positively correlated with the top quintile of the background selection statistic annotation (R = 0.077). We note that although the iHS is a recombination-rate-adjusted quantity to detect the action of recent positive selection, it may also capture actions of negative selection^43,44.
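The thresholding rule above amounts to the following sketch. The function name, the input layout, and the treatment of missing scores (ignored, with a SNP lacking any score left out of the annotation) are assumptions made for illustration.

```python
def ihs_annotation(ihs_by_pop, threshold=2.0):
    """Binary annotation: 1 if max |iHS| across sub-populations > threshold.

    ihs_by_pop: {sub-population label: list of iHS values per SNP, with
    None where no score is available}. All lists must have equal length.
    """
    n_snps = len(next(iter(ihs_by_pop.values())))
    annot = []
    for j in range(n_snps):
        scores = [abs(vals[j]) for vals in ihs_by_pop.values()
                  if vals[j] is not None]
        annot.append(1 if scores and max(scores) > threshold else 0)
    return annot


# Toy scores for three SNPs in three 1000 Genomes sub-populations
# (values are made up).
toy = {
    "CHB": [0.3, -2.4, None],
    "JPT": [1.9, 0.1, None],
    "CEU": [-1.2, 0.8, 1.0],
}
toy_annot = ihs_annotation(toy)
```

Taking the absolute value treats strongly negative and strongly positive scores alike, as in the description above.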

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All baseline-LD-X model annotations and other annotations used in this work are available at https://data.broadinstitute.org/alkesgroup/S-LDXR/. We used exon definitions from the UCSC Genome Browser (https://genome.ucsc.edu/). We used gene pLI scores from the Exome Aggregation Consortium (ExAC) (https://exac.broadinstitute.org/). The integrated haplotype scores (iHS) are available at http://coruscant.itmat.upenn.edu/data/JohnsonEA_iHSscores.tar.gz. The 1000 Genomes Project Phase 3 data are available at https://www.internationalgenome.org/. The baseline-LD model annotations are available at https://alkesgroup.broadinstitute.org/LDSCORE/.

Code availability

Python code implementing S-LDXR is available at https://github.com/huwenboshi/s-ldxr. Python code for simulating GWAS summary statistics under the baseline-LD-X model is available at https://github.com/huwenboshi/s-ldxr-sim. Python code implementing the two-population Eyre-Walker model is available at https://github.com/huwenboshi/two-population-Eyre-Walker-model. Python code for creating the distance to nearest exon annotation is available at https://github.com/huwenboshi/distance-to-nearest-exon. We used HAPGEN2 (https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html) to simulate genotype data. We used PLINK2 (https://www.cog-genomics.org/plink/2.0/) to remove related individuals in the simulated genotype data.

References

de Candia, T. R. et al. Additive genetic variation in schizophrenia risk is shared by populations of african and european descent. Am. J. Hum. Genet. 93, 463–470 (2013).
Brown, B. C. et al. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Mancuso, N. et al. The contribution of rare variation to prostate cancer heritability. Nat. Genet. 48, 30 (2016).
Ikeda, M. et al. Genome-wide association study detected novel susceptibility genes for schizophrenia and shared trans-populations/diseases genetic effect. Schizophr. Bull. 45, 824–834 (2018).
Galinsky, K. J. et al. Estimating cross-population genetic correlations of causal effect sizes. Genet. Epidemiol. 43, 180–188 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584 (2019).
Carlson, C. S. et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the page study. PLoS Biol. 11, e1001661 (2013).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Márquez-Luna, C., Loh, P.-R., Consortium, S. A. T. D. S., Consortium, S. T. D. & Price, A. L. Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 41, 811–823 (2017).
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Keys, K. L. et al. On the cross-population portability of gene expression prediction models. bioRxiv https://doi.org/10.1101/552042 (2019).
Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 26–31 (2019).
Gurdasani, D., Barroso, I., Zeggini, E. & Sandhu, M. S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 20, 520–535 (2019).
Consortium, G. P. et al. A global reference for human genetic variation. Nature 526, 68 (2015).
Consortium, I. H. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52 (2010).
Cai, N. et al. Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 523, 588 (2015).
Nagai, A. et al. Overview of the biobank japan project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390 (2018).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228 (2015).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421 (2017).
Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling s-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621 (2018).
Su, Z., Marchini, J. & Donnelly, P. Hapgen2: simulation of multiple disease SNPs. Bioinformatics 27, 2304–2305 (2011).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746 (2018).
Gazal, S. et al. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat. Genet. 50, 1600 (2018).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Cai, N., Kendler, K. & Flint, J. Minimal phenotyping yields GWAS hits of low specificity for major depression. BioRxiv https://doi.org/10.1101/440735 (2018).
McVicker, G., Gordon, D., Davis, C. & Green, P. Widespread genomic signatures of natural selection in hominid evolution. PLoS Genet. 5, e1000471 (2009).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Consortium, G. et al. Genetic effects on gene expression across human tissues. Nature 550, 204 (2017).
Raychaudhuri, S. et al. Accurately assessing the risk of schizophrenia conferred by rare copy-number variation affecting genes with brain function. PLoS Genet. 6, e1001097 (2010).
Sabeti, P. C. et al. Positive natural selection in the human lineage. Science 312, 1614–1620 (2006).
Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. & Clark, A. G. Recent and ongoing selection in the human genome. Nat. Rev. Genet. 8, 857 (2007).
Novembre, J. & Di Rienzo, A. Spatial patterns of variation due to natural selection in humans. Nat. Rev. Genet. 10, 745 (2009).
Laland, K. N., Odling-Smee, J. & Myles, S. How culture shaped the human genome: bringing genetics and the human sciences together. Nat. Rev. Genet. 11, 137 (2010).
Wilde, S. et al. Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 y. Proc. Natl Acad. Sci. 111, 4832–4837 (2014).
von Boehmer, H. Positive selection of lymphocytes. Cell 76, 219–228 (1994).
Li, J. et al. Natural selection has differentiated the progesterone receptor among human populations. Am. J. Hum. Genet. 103, 45–57 (2018).
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
Veeramah, K. R. & Hammer, M. F. The impact of whole-genome sequencing on the reconstruction of human population history. Nat. Rev. Genet. 15, 149 (2014).
Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
Johnson, K. E. & Voight, B. F. Patterns of shared signatures of recent positive selection across human populations. Nat. Ecol. Evol. 2, 713–720 (2018).
van Dongen, J. & Boomsma, D. I. The evolutionary paradox and the missing heritability of schizophrenia. Am. J. Med. Genet. B: Neuropsychiatric Genet. 162, 122–136 (2013).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Vicennati, V. & Pasquali, R. Abnormalities of the hypothalamic-pituitary-adrenal axis in nondepressed women with abdominal obesity and relations with insulin resistance: evidence for a central and a peripheral alteration. J. Clin. Endocrinol. Metab. 85, 4093–4098 (2000).
Vgontzas, A. et al. Hypothalamic-pituitary-adrenal axis activity in obese men with and without sleep apnea: effects of continuous positive airway pressure therapy. J. Clin. Endocrinol. Metab. 92, 4199–4207 (2007).
Bose, M., Oliván, B. & Laferrère, B. Stress and obesity: the role of the hypothalamic–pituitary–adrenal axis in metabolic disease. Curr. Opin. Endocrinol. Diabetes Obes. 16, 340 (2009).
Itoh, N. & Ornitz, D. M. Fibroblast growth factors: from molecular evolution to roles in development, metabolism and disease. J. Biochem. 149, 121–130 (2011).
Robinson, M. R. et al. Genotype–covariate interaction effects and the heritability of adult body mass index. Nat. Genet. 49, 1174 (2017).
Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008).
Mäki-Tanila, A. & Hill, W. G. Influence of gene interaction on complex trait variation with multilocus models. Genetics 198, 355–367 (2014).
Zhu, Z. et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 96, 377–385 (2015).
de Jong, M. et al. Natural variation in arabidopsis shoot branching plasticity in response to nitrate supply affects fitness. PLoS Genet. 15, e1008366 (2019).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285 (2016).
Eyre-Walker, A. Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl Acad. Sci. USA 107, 1752–1756 (2010).
Consortium, I. S. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748 (2009).
Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483 (2012).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284 (2015).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
Hu, Y. et al. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 13, e1005589 (2017).
Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1–10 (2019).
Chung, W. et al. Efficient cross-trait penalized regression increases prediction accuracy in large cohorts using secondary phenotypes. Nat. Commun. 10, 1–11 (2019).
Lloyd-Jones, L. R. et al. Improved polygenic prediction by bayesian multiple regression on summary statistics. Nat. Commun. 10, 1–11 (2019).
Márquez-Luna, C. et al. LDpred-funct: incorporating functional priors improves polygenic prediction accuracy in UK biobank and 23andme data sets. bioRxiv https://doi.org/10.1101/375337 (2020).
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from gwas summary data. Am. J. Hum. Genet. 106, 805–817 (2020).
Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).
Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using mtag. Nat. Genet. 50, 229 (2018).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236 (2015).
Lu, Q. et al. A powerful approach to estimating annotation-stratified genetic covariance via gwas summary statistics. Am. J. Hum. Genet. 101, 939–964 (2017).
Seldin, M. F., Pasaniuc, B. & Price, A. L. New approaches to disease mapping in admixed populations. Nat. Rev. Genet. 12, 523 (2011).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203 (2018).
Bulik-Sullivan, B. K. et al. Ld score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291 (2015).
Luo, Y. et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. bioRxiv https://doi.org/10.1101/503144 (2019).
Martin, A. R. et al. Transcriptome sequencing from diverse human populations reveals differentiated regulatory architecture. PLoS Genet. 10, e1004549 (2014).
Mogil, L. S. et al. Genetic architecture of gene expression traits across diverse populations. PLoS Genet. 14, e1007586 (2018).
Durvasula, A. & Lohmueller, K. E. Negative selection on complex traits limits genetic risk prediction accuracy between populations. bioRxiv https://doi.org/10.1101/721936 (2019).
Curtiss, J. On the distribution of the quotient of two chance variables. Ann. Math. Stat. 12, 409–421 (1941).
Consortium, E. P. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57 (2012).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317 (2015).
Kasowski, M. et al. Extensive variation in chromatin states across humans. Science 342, 750–752 (2013).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using gerp++. PLoS Comput. Biol. 6, e1001025 (2010).
Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324 (2005).
Rasmussen, M. D., Hubisz, M. J., Gronau, I. & Siepel, A. Genome-wide inference of ancestral recombination graphs. PLoS Genet. 10, e1004342 (2014).
Chang, C. C. et al. Second-generation plink: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Weir, B. S. & Cockerham, C. C. Estimating f-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
consortium, U. et al. The uk10k project identifies rare variants in health and disease. Nature 526, 82–90 (2015).
Low, S.-K. et al. Identification of six new genetic loci associated with atrial fibrillation in the Japanese population. Nat. Genet. 49, 953 (2017).
Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234 (2018).
Horikoshi, M. et al. Elucidating the genetic architecture of reproductive ageing in the japanese population. Nat. Commun. 9, 1977 (2018).
Day, F. R. et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and brca1-mediated DNA repair. Nat. Genet. 47, 1294 (2015).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906 (2018).
Pattaro, C. et al. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat. Commun. 7, 10023 (2016).
Akiyama, M. et al. Characterizing rare and low-frequency height-associated variants in the Japanese population. Nat. Commun. 10, 1–11 (2019).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668 (2018).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376 (2014).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Suzuki, K. et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat. Genet. 51, 379 (2019).
Scott, R. A. et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes 66, 2888–2902 (2017).
Karolchik, D., Hinrichs, A. S. & Kent, W. J. The UCSC genome browser. Curr. Protocols Bioinformatics 40, 1–4 (2012).

Acknowledgements

The authors are grateful to L. O’Connor, H. Finucane, D. Kassler, S. Mallick, N. Patterson, B. Neale, R. Walters, A. Martin, B. Brown, F. Hormozdiari, M. Hujoel, K. Burch, and B. Pasaniuc for helpful discussions. This research was conducted using the UK Biobank Resource under Application 16549 and was funded by NIH grants R01 HG006399, U01 HG009379, R37 MH107649, R01 MH101244, and R01 CA222147.

Author information

Authors and Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Huwenbo Shi, Steven Gazal, Armin P. Schoech, Katherine M. Siewert, Samuel S. Kim & Alkes L. Price
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Huwenbo Shi, Steven Gazal, Masahiro Kanai, Armin P. Schoech, Katherine M. Siewert, Samuel S. Kim, Yang Luo, Tiffany Amariuta, Soumya Raychaudhuri & Alkes L. Price
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
Masahiro Kanai & Hailiang Huang
Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Masahiro Kanai & Hailiang Huang
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Masahiro Kanai, Yang Luo, Tiffany Amariuta & Soumya Raychaudhuri
Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Masahiro Kanai & Yukinori Okada
Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Evan M. Koch, Yang Luo, Soumya Raychaudhuri & Shamil R. Sunyaev
Department of Medicine, Harvard Medical School, Boston, MA, USA
Evan M. Koch, Hailiang Huang & Shamil R. Sunyaev
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Armin P. Schoech & Alkes L. Price
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
Samuel S. Kim
Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Yang Luo, Tiffany Amariuta & Soumya Raychaudhuri
Center for Data Sciences, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Yang Luo, Tiffany Amariuta & Soumya Raychaudhuri
Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
Tiffany Amariuta
Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
Yukinori Okada
Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
Soumya Raychaudhuri

Authors

Huwenbo Shi
Steven Gazal
Masahiro Kanai
Evan M. Koch
Armin P. Schoech
Katherine M. Siewert
Samuel S. Kim
Yang Luo
Tiffany Amariuta
Hailiang Huang
Yukinori Okada
Soumya Raychaudhuri
Shamil R. Sunyaev
Alkes L. Price

Contributions

H.S. and A.L.P. designed the experiments. H.S. performed the experiments. H.S., S.G., M.K., E.M.K., A.P.S., and A.L.P. analyzed the data. H.S. and A.L.P. wrote the paper with assistance from all authors.

Corresponding authors

Correspondence to Huwenbo Shi or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks Jian Zeng and the other, anonymous, reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1-2

Supplementary Data 3-5

Supplementary Data 6-8

Supplementary Data 9-10

Supplementary Data 11

Supplementary Data 12

Supplementary Data 13

Supplementary Data 14

Supplementary Data 15-16

Supplementary Data 17

Supplementary Data 18-20

Supplementary Data 21

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Shi, H., Gazal, S., Kanai, M. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 12, 1098 (2021). https://doi.org/10.1038/s41467-021-21286-1

Received: 15 November 2019
Accepted: 15 January 2021
Published: 17 February 2021
DOI: https://doi.org/10.1038/s41467-021-21286-1
