  • Technical Report
MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies

Abstract

Fine-mapping in genome-wide association studies attempts to identify causal SNPs from a set of candidate SNPs in a local genomic region of interest and is commonly performed in one genetic ancestry at a time. Here, we present multi-ancestry sum of the single effects model (MESuSiE), a probabilistic multi-ancestry fine-mapping method, to improve the accuracy and resolution of fine-mapping by leveraging association information across ancestries. MESuSiE uses summary statistics as input, accounts for the diverse linkage disequilibrium pattern observed in different ancestries, explicitly models both shared and ancestry-specific causal SNPs, and relies on a variational inference algorithm for scalable computation. We evaluated the performance of MESuSiE through comprehensive simulations and multi-ancestry fine-mapping of four lipid traits with both European and African samples. In the real data, MESuSiE improves fine-mapping resolution by 19.0% to 72.0% compared to existing approaches, is an order of magnitude faster, and captures and categorizes shared and ancestry-specific causal signals with enhanced functional enrichment.

Fig. 1: Schematic overview of MESuSiE.
Fig. 2: An illustrative simulation example highlights the benefits of explicit modeling of both shared and ancestry-specific signals in multi-ancestry fine-mapping.
Fig. 3: Comparison of the 95% credible set size and power of different methods in detecting the causal signals that have non-zero effects in at least one ancestry in the baseline simulation settings.
Fig. 4: Comparison of the power of different methods in detecting the shared causal signals in the baseline simulation settings.
Fig. 5: Comparison of the 95% credible sets from different methods for four lipid traits in the real data.
Fig. 6: LocusZoom plot displays the fine-mapping results on TG from different methods in a genomic region on chromosome 11.

Data availability

The individual-level genotype data from UKBB are available at http://www.ukbiobank.ac.uk. GWAS summary statistics for the four lipid traits and MCHC in UKBB are available at http://www.nealelab.is/uk-biobank. GWAS summary statistics for the four lipid traits from GLGC are available at http://csg.sph.umich.edu/willer/public/glgc-lipids2021/. GWAS summary statistics of MCHC in Biobank Japan are available at https://pheweb.jp/downloads. GWAS summary statistics of SCZ are available at https://figshare.com/articles/dataset/scz2019asi/19193084. The GTEx independent eQTL v8 data are publicly available at https://gtexportal.org/home/datasets. Baseline annotation data are available at https://alkesgroup.broadinstitute.org/LDSCORE/baselineLD_v2.1_annots/. 1000 Genomes data (phase 3) are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The summary statistics data and the fine-mapping results presented in the study are also available at Zenodo (ref. 55; https://doi.org/10.5281/zenodo.8411004). Source data are provided with this paper.

Code availability

MESuSiE (1.0) is available at https://github.com/borangao/MESuSiE and is deposited at Zenodo (ref. 55; https://doi.org/10.5281/zenodo.8411004). SuSiE (0.11.84) software is available at https://github.com/stephenslab/susieR. Paintor (3.0) software is available at https://github.com/gkichaev/PAINTOR_V3.0. METAL (metal-2011-03-25) is available at https://csg.sph.umich.edu/abecasis/metal/download/. Genotype data processing and quality control filtering of plink bed/fam/bim files were performed using PLINK (2.0), available at https://www.cog-genomics.org/plink/2.0/. Variants were annotated using Ensembl Variant Effect Predictor v85 with assembly GRCh37, available at https://useast.ensembl.org/info/docs/tools/vep/index.html. PhyloP scores of variants were retrieved using the R package GenomicScores (2.8.0). Base pair positions of GTEx eQTLs were converted from GRCh38 to GRCh37 using the R package liftOver (1.20.0). The analysis code to reproduce the results presented in the study is available at Zenodo (ref. 55; https://doi.org/10.5281/zenodo.8411004).

References

  1. Willer, C. J. et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet 40, 161–169 (2008).

  2. Nielsen, J. B. et al. Genome-wide study of atrial fibrillation identifies seven risk loci and highlights biological pathways and regulatory elements involved in cardiac development. Am. J. Hum. Genet 102, 103–115 (2018).

  3. Schaid, D. J. et al. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

  4. Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).

  5. Yang, J., Fritsche, L. G., Zhou, X. & Abecasis, G. A scalable Bayesian method for integrating functional information in genome-wide association studies. Am. J. Hum. Genet 101, 404–416 (2017).

  6. Wen, X., Lee, Y., Luca, F. & Pique-Regi, R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am. J. Hum. Genet 98, 1114–1129 (2016).

  7. Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016).

  8. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

  9. Zhang, W., Najafabadi, H. S. & Li, Y. SparsePro: an efficient genome-wide fine-mapping method integrating summary statistics and functional annotations. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463133 (2021).

  10. Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 18, e1010299 (2022).

  11. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Ser. B Stat. Methodol. 82, 1273–1300 (2020).

  12. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet 10, e1004722 (2014).

  13. Yang, Z. et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat. Genet 55, 1057–1065 (2023).

  14. Shi, H. et al. Localizing components of shared transethnic genetic architecture of complex traits from GWAS summary data. Am. J. Hum. Genet 106, 805–817 (2020).

  15. Shi, H. et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 12, 1–15 (2021).

  16. Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).

  17. Mahajan, A. et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nat. Genet 46, 234–244 (2014).

  18. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

  19. Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).

  20. LaPierre, N. et al. Identifying causal variants by fine mapping across multiple studies. PLoS Genet 17, e1009733 (2021).

  21. Mägi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet 26, 3639–3650 (2017).

  22. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet 97, 260–271 (2015).

  23. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  24. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).

  25. Benjamin, M. UK Biobank — Neale lab. http://www.nealelab.is/uk-biobank (2018).

  26. Yuan, K. et al. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases. Preprint at medRxiv https://doi.org/10.1101/2023.01.07.23284293 (2023).

  27. Zou, Y., Carbonetto, P., Xie, D., Wang, G. & Stephens, M. Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model. Preprint at bioRxiv https://doi.org/10.1101/2023.04.14.536893 (2023).

  28. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet 52, 1355–1363 (2020).

  29. Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021).

  30. Liang, Y. et al. A scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction. Nat. Commun. 12, 1424 (2021).

  31. Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).

  32. Qiao, J., Shao, Z., Wu, Y., Zeng, P. & Wang, T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J. Transl. Med 20, 424 (2022).

  33. Shang, L. et al. Genetic architecture of gene expression in European and African Americans: an eQTL mapping study in GENOA. Am. J. Hum. Genet 106, 496–512 (2020).

  34. Nakamura, M. T. & Nara, T. Y. Structure, function, and dietary regulation of delta6, delta5, and delta9 desaturases. Annu Rev. Nutr. 24, 345–376 (2004).

  35. Stoffel, W. et al. Obesity resistance and deregulation of lipogenesis in Δ6-fatty acid desaturase (FADS2) deficiency. EMBO Rep. 15, 110–120 (2014).

  36. Nakaya, Y., Schaefer, E. J. & Brewer, H. B. Activation of human post heparin lipoprotein lipase by apolipoprotein H (β2-glycoprotein I). Biochem. Biophys. Res. Commun. 95, 1168–1172 (1980).

  37. Choudhury, A. et al. Meta-analysis of sub-Saharan African studies provides insights into genetic architecture of lipid traits. Nat. Commun. 13, 2578 (2022).

  38. Cavalcante, L. N. et al. African genetic ancestry is associated with lower frequency of PNPLA3 G allele in non-alcoholic fatty liver in an admixed population. Ann. Hepatol. 27, 100728 (2022).

  39. Goffredo, M. et al. Role of TM6SF2 rs58542926 in the pathogenesis of nonalcoholic pediatric fatty liver disease: a multiethnic study. Hepatology 63, 117–125 (2016).

  40. Fan, Y. et al. Hepatic transmembrane 6 superfamily member 2 regulates cholesterol metabolism in mice. Gastroenterology 150, 1208–1218 (2016).

  41. O’Hare, E. A. et al. TM6SF2 rs58542926 impacts lipid processing in liver and small intestine. Hepatology 65, 1526–1542 (2017).

  42. Zeng, P., Hao, X. & Zhou, X. Pleiotropic mapping and annotation selection in genome-wide association studies with penalized Gaussian mixture models. Bioinformatics 34, 2797–2807 (2018).

  43. Zhou, G., Chen, T. & Zhao, H. SDPRX: a statistical method for cross-population prediction of complex traits. Am. J. Hum. Genet 110, 13–22 (2023).

  44. Thompson, W. D. et al. Association of maternal circulating 25(OH)D and calcium with birth weight: a mendelian randomisation analysis. PLoS Med 16, e1002828 (2019).

  45. Burns, A. C. et al. Genome-wide gene by environment study of time spent in daylight and chronotype identifies emerging genetic architecture underlying light sensitivity. Sleep 46, zsac287 (2023).

  46. Gharahkhani, P. et al. Effect of increased body mass index on risk of diagnosis or death from cancer. Br. J. Cancer 120, 565–570 (2019).

  47. Bovijn, J. et al. GWAS identifies risk locus for erectile dysfunction and implicates hypothalamic neurobiology and diabetes in etiology. Am. J. Hum. Genet 104, 157–163 (2019).

  48. Yang, S. & Zhou, X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief. Bioinform. 23, bbac039 (2022).

  49. Horton, R. et al. Gene map of the extended human MHC. Nat. Rev. Genet. 5, 889–899 (2004).

  50. Beck, S. et al. Complete sequence and gene map of a human major histocompatibility complex. Nature 401, 921–923 (1999).

  51. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).

  52. Aguet, F. et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  53. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 1–14 (2016).

  54. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet 47, 1228–1235 (2015).

  55. Gao, B. & Zhou, X. MESuSiE: enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Zenodo https://doi.org/10.5281/zenodo.8411004 (2023).

Acknowledgements

This study was supported by the National Institutes of Health (NIH) grants R01HG009124 and R01GM144960 (to X.Z.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This study has been conducted using the UK Biobank resource under Application Number 30686. UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK.

Author information

Contributions

X.Z. designed the methods. B.G. performed the experiments and analyzed and interpreted data. X.Z. and B.G. drafted and revised the manuscript. All authors critically reviewed the manuscript, suggested revisions as needed and approved the final version.

Corresponding author

Correspondence to Xiang Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Stephen Rich, Alicia Martin and Qiongshi Lu for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of the power of different methods in detecting the ancestry-specific causal signals in the baseline simulation settings.

Results are shown for the baseline setting with different numbers of causal SNPs (1, 3, or 5; columns) and different causal effect sizes (per causal SNP heritability = 10^−4 or 2 × 10^−4; rows). (a) Precision-recall curve with the PIP threshold spanning from 0.01 to 1. For each PIP threshold t, the White British-specific signal is declared as PIP_WB > t for MESuSiE; PIP_WB > t and PIP_BB < t for SuSiE; and PIP > t for Paintor. The Black British-specific signal is declared as PIP_BB > t for MESuSiE; PIP_BB > t and PIP_WB < t for SuSiE; and PIP > t for Paintor. These quantities are calculated as Precision = 1 − FDR, \(\mathrm{FDR} = \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\) and \(\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\), where FP, TP, FN, and TN denote the counts of false positives, true positives, false negatives, and true negatives, respectively, at a specified PIP threshold. The PIP threshold of 0.5 is marked with a circle. (b) Bar plot displays the power at a controlled FDR of 0.01, 0.05, and 0.1.
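The precision and recall definitions above can be illustrated with a minimal sketch. The function name, the toy PIP values and the causal labels are hypothetical and are not taken from the study; the convention of reporting FDR = 0 when no SNP is declared is also an assumption.

```python
from typing import Sequence, Tuple


def precision_recall_at_threshold(
    pips: Sequence[float], is_causal: Sequence[bool], t: float
) -> Tuple[float, float]:
    """Return (precision, recall) when SNPs with PIP > t are declared signals."""
    tp = sum(1 for p, c in zip(pips, is_causal) if p > t and c)
    fp = sum(1 for p, c in zip(pips, is_causal) if p > t and not c)
    fn = sum(1 for p, c in zip(pips, is_causal) if p <= t and c)
    # Assumption: with no declared signals, FDR is taken as 0 (precision 1).
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return 1.0 - fdr, recall


# Hypothetical toy data: five SNPs, three of them truly causal.
toy_pips = [0.95, 0.80, 0.60, 0.30, 0.05]
toy_causal = [True, False, True, True, False]

precision, recall = precision_recall_at_threshold(toy_pips, toy_causal, t=0.5)
```

Sweeping t from 0.01 to 1 and plotting the resulting pairs would trace a curve of the kind described in panel (a).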

Extended Data Fig. 2 Calibration of PIP detecting the ancestry-specific causal signals in the baseline simulation settings.

Results are shown for the baseline setting in n = 200 independent regions of interest (combining per causal SNP heritability = 10^−4 and 2 × 10^−4) with different numbers of causal SNPs (1, 3, or 5; columns). SNPs are ranked by PIP and divided into 10 equally spaced bins based on PIP values. The observed proportion of the signal (y-axis) is compared to the mean PIP (x-axis) within each bin. The gray error bars in each panel represent 2 × standard errors. The diagonal red line in each panel shows the expected line when the PIP statistics are calibrated. Points below the diagonal line imply that the corresponding PIPs are larger than expected, suggesting more false discoveries than expected. Points above the diagonal line imply that the PIPs are conservative.
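The binning procedure described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name and toy data are hypothetical, and the standard error is assumed to be the binomial standard error of the observed proportion, which the caption does not specify.

```python
import math
from typing import Dict, List, Sequence


def calibration_bins(
    pips: Sequence[float], is_signal: Sequence[bool], n_bins: int = 10
) -> List[Dict[str, float]]:
    """Summarize PIP calibration in equally spaced PIP bins on [0, 1]."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, s in zip(pips, is_signal):
        idx = min(int(p * n_bins), n_bins - 1)  # a PIP of exactly 1 goes in the last bin
        bins[idx].append((p, s))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        n = len(members)
        mean_pip = sum(p for p, _ in members) / n
        observed = sum(1 for _, s in members if s) / n
        # Assumption: binomial standard error of the observed proportion.
        se = math.sqrt(observed * (1.0 - observed) / n)
        rows.append(
            {"n": n, "mean_pip": mean_pip, "observed": observed, "half_width": 2.0 * se}
        )
    return rows


# Hypothetical toy data.
toy_pips = [0.05, 0.08, 0.92, 0.95, 0.97, 0.99]
toy_signal = [False, False, True, True, False, True]
rows = calibration_bins(toy_pips, toy_signal)
```

A bin whose observed proportion falls below its mean PIP corresponds to a point below the diagonal, that is, PIPs that are larger than expected.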

Extended Data Fig. 3 LocusZoom plot displays the fine-mapping results on TG from different methods in a genomic region on chromosome 19.

1st row: LocusZoom plots display the two-sided −log10(P value) from the marginal GWAS analysis (y-axis) in the UKBB (left) and the African ancestry GWAS (right) across base pair positions (x-axis). A genome-wide significance threshold of 5 × 10^−8 was applied to determine marginal significance. The color of a SNP represents the r^2 value between the SNP and the lead variant rs1801689 in the region. 2nd row: LocusZoom plots display the SuSiE PIP (y-axis) fitted in the UKBB (left) and the African ancestry GWAS (right) across base pair positions. 3rd row: LocusZoom plots for the PIP_either from MESuSiE (left) or Paintor (right) across base pair positions (x-axis). Signals, whether shared or ancestry-specific, are determined based on the corresponding PIP value (PIP_shared, PIP_EUR, or PIP_AFR) exceeding 0.5. 4th row: annotated genes in the genomic region. For the detected signals, we use an upper triangle to represent a European-specific signal, a lower triangle to represent an African ancestry-specific signal, a diamond to represent a shared signal, and a square to represent a Paintor signal. In contrast, a circle is used to indicate the absence of a signal (None). TG: triglycerides.
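The 0.5 threshold rule for labelling signals can be sketched as below. The function name and label strings are hypothetical. The sketch assumes that the three PIPs describe mutually exclusive categories, so at most one of them can exceed 0.5; it therefore checks only the largest.

```python
def categorize_signal(
    pip_shared: float, pip_eur: float, pip_afr: float, threshold: float = 0.5
) -> str:
    """Label a SNP by the category whose PIP exceeds the threshold, else 'None'."""
    candidates = {
        "shared": pip_shared,
        "EUR-specific": pip_eur,
        "AFR-specific": pip_afr,
    }
    # Assumption: categories are mutually exclusive, so check the largest PIP only.
    label, best = max(candidates.items(), key=lambda kv: kv[1])
    return label if best > threshold else "None"


# Hypothetical PIP values for three SNPs.
labels = [
    categorize_signal(0.70, 0.10, 0.05),
    categorize_signal(0.20, 0.20, 0.10),
    categorize_signal(0.10, 0.05, 0.80),
]
```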

Extended Data Fig. 4 Comparison of the 95% credible sets from different methods for MCHC and SCZ.

(a) Boxplot displays the set size of the 95% credible sets for 174 analyzed genomic regions. (b) eQTL enrichment, which is calculated as the ratio of independent eSNP proportion within the 95% credible set to the eSNP proportion outside of the set. Blood and brain cortex eQTLs are used. MCHC: mean corpuscular hemoglobin concentration, SCZ: schizophrenia.
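The enrichment ratio in panel (b) can be sketched as follows, under the assumption that each SNP carries two flags: membership in the 95% credible set and eSNP status. The function name and toy data are hypothetical; returning NaN when the ratio is undefined is also an assumption.

```python
from typing import Sequence


def eqtl_enrichment(in_credible_set: Sequence[bool], is_esnp: Sequence[bool]) -> float:
    """Ratio of the eSNP proportion inside the credible set to that outside it."""
    n_in = sum(1 for c in in_credible_set if c)
    n_out = len(in_credible_set) - n_in
    esnp_in = sum(1 for c, e in zip(in_credible_set, is_esnp) if c and e)
    esnp_out = sum(1 for c, e in zip(in_credible_set, is_esnp) if not c and e)
    # Assumption: the ratio is undefined (NaN) if either group is empty
    # or no eSNP lies outside the set.
    if n_in == 0 or n_out == 0 or esnp_out == 0:
        return float("nan")
    return (esnp_in / n_in) / (esnp_out / n_out)


# Hypothetical toy region: six SNPs, two in the credible set.
toy_in_set = [True, True, False, False, False, False]
toy_esnp = [True, False, True, False, False, False]
enrichment = eqtl_enrichment(toy_in_set, toy_esnp)
```

In this toy region, half of the credible-set SNPs are eSNPs compared with a quarter of the remaining SNPs, giving a ratio of 2.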

Extended Data Fig. 5 LocusZoom plot displays the fine-mapping results on SCZ from different methods in a genomic region on chromosome 15.

1st row: LocusZoom plots display the two-sided −log10(P value) from the marginal GWAS analysis (y-axis) in the PGC European (left) and the East Asian ancestry GWAS (right) across base pair positions (x-axis). A genome-wide significance threshold of 5 × 10^−8 was applied to determine marginal significance. The color of a SNP represents the r^2 value between the SNP and the lead variant rs4702 in the region. 2nd row: LocusZoom plots display the SuSiE PIP (y-axis) fitted in the European (left) and the East Asian ancestry GWAS (right) across base pair positions. 3rd row: LocusZoom plots for the PIP_either from MESuSiE (left) or Paintor (right) across base pair positions (x-axis). Signals, whether shared or ancestry-specific, are determined based on the corresponding PIP value (PIP_shared, PIP_EUR, or PIP_EAS) exceeding 0.5. 4th row: annotated genes in the genomic region. For the detected signals, we use an upper triangle to represent a European-specific signal, a lower triangle to represent an East Asian ancestry-specific signal, a diamond to represent a shared signal, and a square to represent a Paintor signal. In contrast, a circle is used to indicate the absence of a signal (None). SCZ: schizophrenia.

Supplementary information

Supplementary Information

Supplementary Figures 1–47, Supplementary Tables 1–4, Supplementary Note and References.

Reporting Summary

Peer Review File

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig./Table 1

Statistical source data.

Source Data Extended Data Fig./Table 2

Statistical source data.

Source Data Extended Data Fig./Table 3

Statistical source data.

Source Data Extended Data Fig./Table 4

Statistical source data.

Source Data Extended Data Fig./Table 5

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, B., Zhou, X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat Genet 56, 170–179 (2024). https://doi.org/10.1038/s41588-023-01604-7

  • DOI: https://doi.org/10.1038/s41588-023-01604-7
