Abstract
Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects (\(r_{\mathrm{admix}}\)) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis \(r_{\mathrm{admix}}\) = 0.95, 95% credible interval 0.93–0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.
Data availability
PAGE individual-level genotype and phenotype data are available through dbGaP https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000356.v2.p1. UKBB individual-level genotype and phenotype data are available through application at https://www.ukbiobank.ac.uk/. AoU individual-level genotype and phenotype are available through application at https://www.researchallofus.org/. The set of preprocessed HapMap3 variants used in this manuscript is retrieved from https://ndownloader.figshare.com/files/25503788.
Code availability
Software implementing genome-wide genetic correlation estimation method: https://github.com/kangchenghou/admix-kit (ref. https://doi.org/10.5281/ZENODO.7482679) Code for replicating analyses: https://github.com/kangchenghou/admix-genet-cor (ref. https://doi.org/10.5281/ZENODO.7482683).
References
Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Ramirez, A. H. et al. The All of Us Research Program: data quality, utility, and diversity. Patterns 3, 100570 (2022).
Zhou, W. et al. Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell Genomics 2, 100192 (2022).
Brown, B. C., Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Galinsky, K. J. et al. Estimating cross-population genetic correlations of causal effect sizes. Genet. Epidemiol. 43, 180–188 (2019).
Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from GWAS summary data. Am. J. Hum. Genet. 106, 805–817 (2020).
Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1098 (2021).
Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).
Wang, Y. et al. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat. Commun. 11, 3865 (2020).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Gurdasani, D., Barroso, I., Zeggini, E. & Sandhu, M. S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 20, 520–535 (2019).
Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177, 1080 (2019).
Marigorta, U. M. & Navarro, A. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet. 9, e1003566 (2013).
Patel, R. A. et al. Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Am. J. Hum. Genet. 109, 1286–1297 (2022).
Cai, N. et al. Minimal phenotyping yields genome-wide association signals of low specificity for major depression. Nat. Genet. 52, 437–447 (2020).
Seldin, M. F., Pasaniuc, B. & Price, A. L. New approaches to disease mapping in admixed populations. Nat. Rev. Genet. 12, 523–528 (2011).
Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).
Atkinson, E. G. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 53, 195–204 (2021).
Hou, K., Bhattacharya, A., Mester, R., Burch, K. S. & Pasaniuc, B. On powerful GWAS in admixed populations. Nat. Genet. 53, 1631–1633 (2021).
Bitarello, B. D. & Mathieson, I. Polygenic scores for height in admixed populations. G3 10, 4027–4036 (2020).
Marnetto, D. et al. Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nat. Commun. 11, 1628 (2020).
Bentley, A. R. et al. Gene-based sequencing identifies lipid-influencing variants with ethnicity-specific effects in African Americans. PLoS Genet. 10, e1004190 (2014).
Rajabli, F. et al. Ancestral origin of ApoE ε4 Alzheimer disease risk in Puerto Rican and African American populations. PLoS Genet. 14, e1007791 (2018).
Blue, E. E., Horimoto, A. R. V. R., Mukherjee, S., Wijsman, E. M. & Thornton, T. A. Local ancestry at APOE modifies Alzheimer’s disease risk in Caribbean Hispanics. Alzheimers Dement. 15, 1524–1532 (2019).
Naslavsky, M. S. et al. Global and local ancestry modulate APOE association with Alzheimer’s neuropathology and cognitive outcomes in an admixed sample. Mol. Psychiatry 27, 4800–4808 (2022).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Zhang, Y., Qi, G., Park, J.-H. & Chatterjee, N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 50, 1318–1326 (2018).
The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Speed, D. & Balding, D. J. SumHer better estimates the SNP heritability of complex traits from summary statistics. Nat. Genet. 51, 277–284 (2019).
Deming, W. E. Statistical Adjustment of Data (Wiley, 1943).
Pasaniuc, B. et al. Enhanced statistical tests for GWAS in admixed populations: assessment using African Americans from CARe and a Breast Cancer Consortium. PLoS Genet. 7, e1001371 (2011).
Hodonsky, C. J. et al. Ancestry-specific associations identified in genome-wide combined-phenotype study of red blood cell traits emphasize benefits of diversity in genomics. BMC Genomics 21, 228 (2020).
Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).
Johnson, R. et al. Estimation of regional polygenicity from GWAS provides insights into the genetic architecture of complex traits. PLoS Comput. Biol. 17, e1009483 (2021).
Zhang, J. & Stram, D. O. The role of local ancestry adjustment in association studies using admixed populations. Genet. Epidemiol. 38, 502–515 (2014).
Liu, J., Lewinger, J. P., Gilliland, F. D., Gauderman, W. J. & Conti, D. V. Confounding and heterogeneity in genetic association studies with admixed populations. Am. J. Epidemiol. 177, 351–360 (2013).
Saitou, M., Dahl, A., Wang, Q. & Liu, X. Allele frequency differences of causal variants have a major impact on low cross-ancestry portability of PRS. Preprint at medRxiv https://doi.org/10.1101/2022.10.21.22281371 (2022).
Pasaniuc, B. et al. Analysis of Latino populations from GALA and MEC studies reveals genomic loci with biased local ancestry estimation. Bioinformatics 29, 1407–1415 (2013).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Speed, D., Hemani, G., Johnson, M. R. & Balding, D. J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
Hou, K. et al. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat. Genet. 51, 1244–1251 (2019).
Linnet, K. Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin. Chem. 44, 1024–1031 (1998).
Chiu, A. M., Molloy, E. K., Tan, Z., Talwalkar, A. & Sankararaman, S. Inferring population structure in biobank-scale genomic data. Am. J. Hum. Genet. 109, 727–737 (2022).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
The 1000 Genomes Project Consortium. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Schubert, R., Andaleon, A. & Wheeler, H. E. Comparing local ancestry inference models in populations of two- and three-way admixture. PeerJ 8, e10090 (2020).
Gay, N. R. et al. Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol. 21, 233 (2020).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Schoech, A. P. et al. Negative short-range genomic autocorrelation of causal effects on human complex traits. Preprint at medRxiv https://doi.org/10.1101/2020.09.23.310748 (2020).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Reich, D. et al. Reduced neutrophil count in people of African descent is due to a regulatory variant in the Duffy antigen receptor for chemokines gene. PLoS Genet. 5, e1000360 (2009).
Reiner, A. P. et al. Genome-wide association and population genetic analysis of C-reactive protein in African American and Hispanic American women. Am. J. Hum. Genet. 91, 502–512 (2012).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B 82, 1273–1300 (2020).
Cook, J. P. & Morris, A. P. Multi-ethnic genome-wide association study identifies novel locus for type 2 diabetes susceptibility. Eur. J. Hum. Genet. 24, 1175–1180 (2016).
Acknowledgements
We thank A. Price, M. J. Zhang, R. Patel, J. Pritchard, A. Durvasula, J. Cai and E. Petter for helpful suggestions. This research was funded in part by the National Institutes of Health under awards U01-HG011715 (B.P.), R01-HG009120 (B.P.), R01-MH115676 (B.P.), R01-HL151152 (C.K.), P01-CA196569 (D.V.C.) and U01-CA261339 (D.V.C.). Y.W. and S.S. were supported in part by NIH R35-GM125055 and NSF CAREER-1943497. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. PAGE is supported by the National Institutes of Health under awards R01-HG010297. This research was conducted using the UKBB Resource under application 33297. We thank the participants of UKBB for making this work possible. The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.
Author information
Contributions
K.H. and B.P. conceived and designed the experiments. K.H. performed the experiments and statistical analyses with assistance from Y.D., Z.X., Y.W., A.B., R.M., S.S. and B.P. G.M.B., S.B., D.V.C., B.F.D., M.F., C.G., X.G., C.H., E.E.K., M.K., C.K., L.L., A.M., K.E.N., U.P., L.J.R.-T., S.S.R., J.I.R., H.E.W., G.L.W. and Y.Z. provided data and feedback on analysis. K.H. and B.P. wrote the manuscript with feedback from all authors.
Ethics declarations
Competing interests
E.E.K. has received personal fees from Regeneron Pharmaceuticals, 23&Me and Illumina, and serves on the advisory boards for Encompass Biosciences, Foresite Labs and Galateo Bio. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Loïc Yengo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Consistency of \(r_{\mathrm{admix}}\) for shared traits across studies.
We compared estimated \(r_{\mathrm{admix}}\) for shared traits across studies, in terms of both \(\hat r_{\mathrm{admix}}\) (a-c) and \(-\log_{10}(p)\) (for a one-sided test of \(H_0: r_{\mathrm{admix}} = 1\); Methods) (d-f). The three traits (Height, Triglycerides, Total cholesterol) with the most significant p-values for \(H_0: r_{\mathrm{admix}} = 1\) are annotated. The number of common traits shared across studies (\(n_{\mathrm{common}}\)) and the Spearman correlation p-value are shown in the title of each panel. Overall, the estimated \(\hat r_{\mathrm{admix}}\) for shared traits was only weakly consistent across studies (although p-values for \(H_0: r_{\mathrm{admix}} = 1\) were significantly consistent). Numerical results are reported in Supplementary Table 7.
Extended Data Fig. 2 \(r_{\mathrm{admix}}\) estimation is robust to the assumption of \(r_{\mathrm{admix}} > 0\).
We performed \(r_{\mathrm{admix}}\) estimation under the alternative assumption of \(-1 \le r_{\mathrm{admix}} \le 1\) in the real trait analysis in PAGE, in light of potential scenarios of effect sizes in opposite directions36,63. We compared estimated \(r_{\mathrm{admix}}\) when assuming \(0 \le r_{\mathrm{admix}} \le 1\) (default method) and when assuming \(-1 \le r_{\mathrm{admix}} \le 1\). Left: comparison of point estimates of \(r_{\mathrm{admix}}\) across 24 traits in PAGE. Right: comparison of the meta-analyzed log-likelihood. Results obtained from the two methods are highly consistent.
Extended Data Fig. 3 \(r_{\mathrm{admix}}\) estimation is robust to genetic architecture and SNP set.
We performed \(r_{\mathrm{admix}}\) estimation under alternative assumptions about genetic architecture and SNP set in the real trait analysis across PAGE and UKBB. We compared p-values (for a one-sided test of \(H_0: r_{\mathrm{admix}} = 1\)) from our default setting (using frequency-dependent genetic architecture and imputed SNPs; Table 1) to those obtained using GCTA genetic architecture and imputed SNPs (a), and to those obtained using frequency-dependent genetic architecture and HM3 SNPs (b). Numerical results are reported in Supplementary Table 8.
Extended Data Fig. 4 \(r_{\mathrm{admix}}\) estimation is robust to subsetting PAGE African American individuals based on genotype PCs.
We subsetted PAGE individuals with the self-identified race/ethnicity label ‘African American’ (total N = 17,327) based on genotype PCs and retained N = 17,167 individuals (a). The estimated \(r_{\mathrm{admix}}\) values were highly consistent between using all PAGE African American individuals (default) and using the subset of PAGE African American individuals selected on genotype PCs. (b) Comparison of point estimates of \(r_{\mathrm{admix}}\) across 24 traits in PAGE (the dot at the bottom left of the figure corresponds to the MCHC trait, which has a small sample size of 3,650). (c) Comparison of the meta-analyzed log-likelihood. Results obtained from the two sets of individuals are highly consistent.
Extended Data Fig. 5 Comparing estimated \(r_{\mathrm{admix}}\) between alternative method formulations and default method.
Each dot corresponds to a trait. (a) Comparing results of default method and of directly optimizing and estimating \(\sigma _g^2,\rho _g\). (b) Comparing results of default method and of directly optimizing and estimating \(\sigma _{g,1}^2,\sigma _{g,2}^2\) (different variance components per ancestry) and \(\rho _g\). See Supplementary Table 9 and Supplementary Note for details.
Extended Data Fig. 6 Multiple conditionally independent association signals for loci with heterogeneity by ancestry.
The upper panel shows the two-sided association p-values and the lower panel shows the fine-mapping PIP. Different colors in the PIP plot correspond to different credible sets. (a) MCH at 16p13.3 for UK Biobank European-African admixed individuals. (b) RBC at 16p13.3 for UK Biobank European-African admixed individuals. (c) CRP at 1q23.2 for PAGE European-African admixed individuals.
Extended Data Fig. 7 Simulations with single causal variant.
Simulations were based on 100 regions, each spanning 20 Mb on chromosome 1, and 17,299 PAGE individuals. In each simulation, we randomly selected a single causal variant and simulated quantitative phenotypes in which the causal variant had the same causal effect across ancestries and was expected to explain a fixed amount of heritability (0.2%, 0.6%, 1.0%). Each panel corresponds to one metric for both causal and clumped variants. (a) False positive rate (FPR) of HET test. (b) Deming regression slope with \(\beta_{\mathrm{afr}} \sim \beta_{\mathrm{eur}}\). (c) Deming regression slope with \(\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}\). (d) Pearson correlation. (e) OLS regression slope with \(\beta_{\mathrm{afr}} \sim \beta_{\mathrm{eur}}\). (f) OLS regression slope with \(\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}\). 95% confidence intervals were based on 100 random sub-samplings, with each sample consisting of 500 SNPs (Methods). Numerical results are reported in Supplementary Table 13.
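Panels b, c, e and f contrast Deming and OLS regression slopes. As a rough illustration of why the two can differ when both sets of effect estimates carry noise, the following is a minimal Python sketch. It assumes an error-variance ratio of 1 (orthogonal regression); the function names, sample size and noise levels are illustrative assumptions and are not taken from the paper's Methods.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

def deming_slope(x, y, delta=1.0):
    """Deming slope of y on x; delta is the assumed ratio of the
    error variance in y to the error variance in x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    diff = syy - delta * sxx
    return (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)

# Toy data (assumed values): identical true effects in both ancestries,
# each observed with independent estimation noise of the same size.
rng = np.random.default_rng(0)
true_beta = rng.normal(0.0, 1.0, size=5000)
beta_eur_hat = true_beta + rng.normal(0.0, 0.5, size=5000)
beta_afr_hat = true_beta + rng.normal(0.0, 0.5, size=5000)

ols_afr_on_eur = ols_slope(beta_eur_hat, beta_afr_hat)
dem_afr_on_eur = deming_slope(beta_eur_hat, beta_afr_hat)
dem_eur_on_afr = deming_slope(beta_afr_hat, beta_eur_hat)

print(f"OLS slope (afr ~ eur):    {ols_afr_on_eur:.3f}")
print(f"Deming slope (afr ~ eur): {dem_afr_on_eur:.3f}")
print(f"Deming slope (eur ~ afr): {dem_eur_on_afr:.3f}")
```

In this toy setting the OLS slope is attenuated toward zero by the noise in the regressor, whereas the Deming slope stays near 1. With delta = 1 the two Deming slopes are exact reciprocals of each other, which is one reason to report both regression directions.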
Extended Data Fig. 8 Simulation with multiple causal variants at other sample sizes (Fig. 6d–f).
Simulations were based on chromosome 1 (515,087 SNPs) and 17,299 PAGE individuals. We drew 62, 125, 250, 500 or 1,000 causal variants to simulate different levels of polygenicity, such that on average there were approximately 0.25, 0.5, 1.0, 2.0 or 4.0 causal variants per Mb. The heritability explained by all causal variants was fixed at \(h_g^2 = 10\%\). (a-c) False positive rate of HET test for the causal variants and clumped variants. (d-f) Deming regression slope of estimated ancestry-specific effects (\(\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}\)) for the causal variants and clumped variants. 95% confidence intervals were based on 100 random sub-samplings, with each sub-sample consisting of n = 50, 100, 500 SNPs (instead of n = 1,000 SNPs in Fig. 6c, d) (Methods).
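The causal-variant densities quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes a chromosome 1 length of roughly 248 Mb (a figure not stated in the caption) and also shows the average share of \(h_g^2\) per causal variant implied by fixing the total at 10%; all variable names are illustrative.

```python
CHR1_LENGTH_MB = 248.0   # assumption: approximate length of chromosome 1
TOTAL_H2G = 0.10         # total heritability fixed in the simulations

n_causal_grid = [62, 125, 250, 500, 1000]

# Average number of causal variants per Mb for each polygenicity level.
density_per_mb = [n / CHR1_LENGTH_MB for n in n_causal_grid]

# Average heritability attributable to one causal variant.
mean_h2_per_variant = [TOTAL_H2G / n for n in n_causal_grid]

for n, d, h in zip(n_causal_grid, density_per_mb, mean_h2_per_variant):
    print(f"{n:>5d} causal variants: {d:.2f} per Mb, "
          f"mean h2 per variant = {h:.5f}")
```

Because the total heritability is held fixed, the average per-variant share shrinks in proportion as the number of causal variants grows.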
Extended Data Fig. 9 Additional results for simulations with single causal variant with varying \(\beta_{\mathrm{eur}}:\beta_{\mathrm{afr}}\) and \(h_g^2\).
Simulations were based on 100 regions, each spanning 20 Mb on chromosome 1, from 17,299 PAGE individuals. In each simulation, we randomly selected a single causal variant and simulated quantitative phenotypes in which the causal variant had varying causal effects across ancestries and was expected to explain a fixed amount of heritability (0.2%, 0.6%, 1.0%, 2.0%, 5.0%). We provide results for both causal variants and LD-clumped variants. We separate results into two rows for better visualization: upper row (a-c): \(\beta_{\mathrm{eur}}:\beta_{\mathrm{afr}}\) = 0.9, 1.0, 1.1; lower row (d-f): \(\beta_{\mathrm{eur}}:\beta_{\mathrm{afr}}\) = 0.0, 0.5, 1.0. We show results for the false positive rate (FPR) of HET test, the Deming regression slope with \(\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}\), and the OLS regression slope with \(\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}\). 95% confidence intervals were based on 100 random sub-samplings, with each sample consisting of 500 SNPs (Methods). Numerical results and further discussions are provided in Supplementary Table 15.
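The sub-sampling intervals mentioned in these captions can be pictured with the Python sketch below. The exact procedure is described in the Methods, which are not reproduced here, so this is only one plausible reading: it assumes sub-samples are drawn without replacement and that the interval is taken from the 2.5th and 97.5th percentiles of the sub-sample statistics. The function name and the toy p-values are hypothetical.

```python
import numpy as np

def subsample_interval(values, statistic, n_sub=500, n_rep=100, seed=0):
    """Mean and percentile interval of `statistic` over random sub-samples.

    Assumption: each sub-sample draws n_sub items without replacement.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    stats = np.array([
        statistic(values[rng.choice(len(values), size=n_sub, replace=False)])
        for _ in range(n_rep)
    ])
    lower, upper = np.percentile(stats, [2.5, 97.5])
    return stats.mean(), lower, upper

# Toy input (assumed): 5,000 null p-values standing in for HET-test results.
rng = np.random.default_rng(1)
toy_pvalues = rng.uniform(size=5000)

# Statistic: fraction of tests rejected at the 0.05 level.
fpr, fpr_lower, fpr_upper = subsample_interval(
    toy_pvalues, lambda p: np.mean(p < 0.05)
)
print(f"FPR = {fpr:.3f} (interval {fpr_lower:.3f} to {fpr_upper:.3f})")
```

Any per-SNP summary, such as a regression slope computed on paired effect estimates, could be passed in place of the rejection fraction, provided the sub-sampling indexes the SNPs jointly.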
Supplementary information
Supplementary Information
Supplementary Note and Tables 1–18.
Supplemental Table 1
Supplementary Tables 8 and 11.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hou, K., Ding, Y., Xu, Z. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat Genet 55, 549–558 (2023). https://doi.org/10.1038/s41588-023-01338-6
DOI: https://doi.org/10.1038/s41588-023-01338-6