
  • Technical Report
  • Published:

CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses

Abstract

Fine-mapping is commonly used to identify putative causal variants at genome-wide significant loci. Here we propose a Bayesian model for fine-mapping that has several advantages over existing methods, including flexible specification of the prior distribution of effect sizes, joint modeling of summary statistics and functional annotations and accounting for discrepancies between summary statistics and external linkage disequilibrium in meta-analyses. Using simulations, we compare performance with commonly used fine-mapping methods and show that the proposed model has higher power and lower false discovery rate (FDR) when including functional annotations, and higher power, lower FDR and higher coverage for credible sets in meta-analyses. We further illustrate our approach by applying it to a meta-analysis of Alzheimer’s disease genome-wide association studies where we prioritize putatively causal variants and genes.


Fig. 1: Motivating example.
Fig. 2: Credible sets, credible models and FDR versus power (in-sample LD, with and without functional annotations, T = 2).
Fig. 3: Credible sets, credible models and FDR versus power (UKBB LD, consistent and inconsistent meta-analyses, T = 2).
Fig. 4: Results for inconsistent meta-analyses.
Fig. 5: Functional enrichments of prioritized SNPs at AD GWAS loci.


Data availability

1000G data (phase 3) are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. LD matrices for UKBB British-ancestry samples are available at https://data.broadinstitute.org/alkesgroup/UKBB_LD. Summary statistics of the AD GWAS are available at https://ctg.cncr.nl/software/summary_statistics. DeepSEA functional annotations are available at http://deepsea.princeton.edu/job/analysis/create/. PolyFun annotations are available at https://alkesgroup.broadinstitute.org/polyfun_results/. CADD scores are available at https://cadd.gs.washington.edu/score. Eigen scores are available at http://www.columbia.edu/~ii2135/eigen.html. V2G data are available at https://genetics-docs.opentargets.org/our-approach/data-pipeline. GTEx eQTLs are available at https://gtexportal.org/home/datasets. ROSMAP eQTLs/meQTLs are available at https://mostafavilab.stat.ubc.ca/xqtl/.

Code availability

An R package implementing CARMA is available at https://github.com/Iuliana-Ionita-Laza/CARMA. The analysis code used to produce the main results presented in the paper is available on Zenodo (ref. 38) at https://doi.org/10.5281/zenodo.7772462.

References

  1. Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Stat. 5, 1780–1815 (2011).

  2. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).

  3. Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).

  4. Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016).

  5. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

  6. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).

  7. Wang, G., Sarkar, A. K., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).

  8. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  9. Kichaev, G. et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).

  10. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  11. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  12. Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).

  13. Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘sum of single effects’ model. PLoS Genet. 18, e1010299 (2022).

  14. Chen, W. et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat. Commun. 12, 7117 (2021).

  15. Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).

  16. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).

  17. Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).

  18. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).

  19. Hans, C., Dobra, A. & West, M. Shotgun stochastic search for ‘large p’ regression. J. Am. Stat. Assoc. 102, 507–516 (2007).

  20. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005).

  21. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  22. Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214 (2016).

  23. GTEx Consortium et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  24. Ng, B. et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 20, 1418–1426 (2017).

  25. Sims, R. et al. Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer’s disease. Nat. Genet. 49, 1373–1384 (2017).

  26. Benner, C., Havulinna, A. S., Salomaa, V., Ripatti, S. & Pirinen, M. Refining fine-mapping: effect sizes and regional heritability. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/318618v1 (2018).

  27. Zellner, A. & Siow, A. Posterior odds ratios for selected regression hypotheses. Trab. Estad. Investig. Oper. 31, 585–603 (1980).

  28. Castillo, I. & van der Vaart, A. Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012).

  29. Womack, A. J., Fuentes, C. & Taylor-Rodriguez, D. Model space priors for objective sparse Bayesian regression. Preprint at https://arxiv.org/abs/1511.04745 (2015).

  30. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  31. Warton, D. I. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J. Am. Stat. Assoc. 103, 340–349 (2008).

  32. Vilaplana, J. P. & Puri, M. N. (eds) Proc. 4th International Meeting of Statistics in the Basque Country, San Sebastian, Spain (De Gruyter, 1994).

  33. Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).

  34. Dimitromanolakis, A., Xu, J., Krol, A. & Briollais, L. sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics 20, 26 (2019).

  35. Fachal, L. et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 52, 56–73 (2020).

  36. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).

  37. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genome-wide association scans. Bioinformatics 26, 2190–2191 (2010).

  38. Yang, Z. ZikunY/CARMA: release of CARMA package and code used in the publication (1.0.5). Zenodo https://doi.org/10.5281/zenodo.7772462 (2023).

Acknowledgements

This study was supported by the National Institutes of Health under grants MH095797 (to I.I.-L.), AG072272 (to Z.Y., C.W., B.V., R.M. and I.I.-L.) and RC2DK116690 (to K.K., C.W. and I.I.-L.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information


Contributions

Z.Y. and I.I.-L. designed the methods and performed the experiments. Z.Y., C.W., L.L., A.K., A.L., B.V., R.M., K.K. and I.I.-L. analyzed and interpreted data. Z.Y. and I.I.-L. drafted the paper and revised it according to suggestions by the co-authors. All authors critically reviewed the paper, suggested revisions as needed and approved the final version.

Corresponding author

Correspondence to Iuliana Ionita-Laza.

Ethics declarations

Competing interests

All authors declare no competing interests.

Statistics and reproducibility

No statistical method was used to predetermine the sample size. We did not use any study design that required randomization or blinding.

Peer review

Peer review information

Nature Genetics thanks Laura Fachal, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of credible sets and credible models (in-sample LD, with and without functional annotations).

T ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or in all the credible sets (CARMA, SuSiE, fastPAINTOR) at a locus. The credible model is computed using a threshold of 10 on the posterior odds.
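The four credible-set quantities defined above can be computed for one simulated locus as in the following Python sketch. Several details are illustrative assumptions and are not taken from the CARMA package: the input format (credible sets as lists of variant indices, a set of causal indices and an LD correlation matrix), the function name, and the convention that a single-variant set has purity 1. Averaging over loci and replicates is left to the caller.

```python
import numpy as np

def credible_set_metrics(credible_sets, causal, ld):
    """Power, coverage, size and purity for the credible sets at one locus.

    credible_sets: list of lists of variant indices (hypothetical input format)
    causal: non-empty set of indices of the simulated causal variants
    ld: square matrix of pairwise correlations (r) between variants
    """
    if not credible_sets:
        nan = float("nan")
        return {"power": 0.0, "coverage": nan, "size": nan, "purity": nan}
    covered = set().union(*map(set, credible_sets))
    # Power: proportion of simulated causal variants included in any credible set
    power = len(causal & covered) / len(causal)
    # Coverage: proportion of credible sets containing at least one causal variant
    coverage = float(np.mean([len(causal & set(cs)) > 0 for cs in credible_sets]))
    # Size: average number of variants per credible set
    size = float(np.mean([len(cs) for cs in credible_sets]))
    # Purity: mean squared correlation over distinct pairs within each set,
    # then averaged over sets; a singleton is given purity 1 (assumption)
    purities = []
    for cs in credible_sets:
        if len(cs) == 1:
            purities.append(1.0)
            continue
        r2 = np.asarray(ld)[np.ix_(cs, cs)] ** 2
        upper = np.triu_indices(len(cs), k=1)
        purities.append(float(r2[upper].mean()))
    return {"power": power, "coverage": coverage, "size": size,
            "purity": float(np.mean(purities))}

# Toy locus with two LD blocks (made-up numbers for illustration)
ld = np.array([[1.0, 0.8, 0.0, 0.0],
               [0.8, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.5],
               [0.0, 0.0, 0.5, 1.0]])
metrics = credible_set_metrics([[0, 1], [3]], causal={0, 2}, ld=ld)
```

In this toy locus, the first set captures causal variant 0, while the singleton {3} misses causal variant 2. Power and coverage are therefore both 0.5.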

Extended Data Fig. 2 Power and FDR using positive predictions for in-sample LD.

Comparison between no annotations and functional annotations for (a) T = 1, (b) T = 2 and (c) T = 3. FDR versus power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\) and \(\mathrm{power} := \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.
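The following Python function is a minimal sketch of how such an FDR-versus-power curve can be traced. It applies each PIP threshold to a vector of PIPs with known causal labels and then evaluates the two formulas in the caption. Two choices are assumptions: a "positive prediction" is taken to mean PIP ≥ threshold, and FDR is set to 0 when no variant is called. The pooling of variants across loci used in the figures is not reproduced.

```python
import numpy as np

def fdr_power_curve(pip, is_causal, thresholds):
    """Return a list of (threshold, FDR, power) tuples, one per PIP threshold."""
    pip = np.asarray(pip, dtype=float)
    is_causal = np.asarray(is_causal, dtype=bool)
    curve = []
    for t in thresholds:
        called = pip >= t  # positive prediction: PIP at or above t (assumption)
        tp = int(np.sum(called & is_causal))
        fp = int(np.sum(called & ~is_causal))
        fn = int(np.sum(~called & is_causal))
        # FDR := FP / (TP + FP); defined as 0 when nothing is called (assumption)
        fdr = fp / (tp + fp) if tp + fp else 0.0
        # power := TP / (TP + FN)
        power = tp / (tp + fn) if tp + fn else float("nan")
        curve.append((float(t), fdr, power))
    return curve

# Made-up PIPs and causal labels for five variants
pip = [0.97, 0.60, 0.30, 0.92, 0.05]
is_causal = [True, True, False, False, False]
curve = fdr_power_curve(pip, is_causal, thresholds=[0.5, 0.95])
```

To sweep the 0.1 to 1 range described in the caption, pass np.linspace(0.1, 1.0, 10) as the thresholds. In the toy example, the 0.5 threshold calls one false positive alongside both causal variants (FDR 1/3, power 1). The 0.95 threshold calls a single true positive (FDR 0, power 0.5).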

Extended Data Fig. 3 Performance of credible sets and credible models (UKBB LD; three simulation settings: consistent, inconsistent and inconsistent+imputation).

T ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or in all the credible sets at a locus. The credible model is computed using a threshold of 10 on the posterior odds.

Extended Data Fig. 4 Power and FDR using positive predictions in consistent/inconsistent/inconsistent+imputation settings.

(a) T = 1, (b) T = 2, (c) T = 3. FDR versus power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\) and \(\mathrm{power} := \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.

Extended Data Fig. 5 Results for the inconsistent meta-analysis.

T ∈ {1, 2, 3}. (a) Increase in single-SNP credible sets in the presence of Z-score/LD inconsistencies. Results for credible sets (ρ = 0.99) containing only one SNP, that is, sets in which that SNP receives a PIP larger than 0.99. For each model, we report the total number of single-SNP credible sets across all 94 loci and three different scenarios, and the coverage of these sets, that is, the proportion of these credible sets that contain a causal SNP. (b) SuSiE with CARMA versus SLALOM in the inconsistent setting. Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
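As the caption describes, the single-SNP summary in panel (a) reduces to two steps. First, count the variants whose PIP exceeds 0.99. Then check how many of those variants are causal. The Python sketch below performs this count across loci. The dictionary-of-loci input format and the locus names are hypothetical.

```python
def single_snp_credible_sets(pip_by_locus, causal_by_locus, level=0.99):
    """Count variants with PIP > level across loci and the fraction that are causal.

    pip_by_locus: dict mapping a locus name to a list of PIPs (hypothetical format)
    causal_by_locus: dict mapping a locus name to a set of causal variant indices
    """
    n_sets = 0
    n_covered = 0
    for locus, pips in pip_by_locus.items():
        causal = causal_by_locus.get(locus, set())
        for idx, p in enumerate(pips):
            if p > level:  # the caption's criterion: PIP larger than 0.99
                n_sets += 1
                n_covered += int(idx in causal)
    coverage = n_covered / n_sets if n_sets else float("nan")
    return n_sets, coverage

# Two made-up loci: one single-SNP set misses the causal variant, one hits it
pip_by_locus = {"locus_1": [0.995, 0.002, 0.90], "locus_2": [0.999, 0.30]}
causal_by_locus = {"locus_1": {2}, "locus_2": {0}}
n_sets, coverage = single_snp_credible_sets(pip_by_locus, causal_by_locus)
```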

Extended Data Fig. 6 Performance of credible sets (in-sample LD).

T ∈ {1, 2, 3}; ρ = 0.95. Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
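The captions do not describe how credible sets at level ρ are constructed. The following is a generic illustration only, not necessarily CARMA's procedure. It uses the greedy construction commonly applied to a single-effect posterior: sort the variants by probability, then take the highest-probability variants until their cumulative probability reaches ρ. The function name and inputs are hypothetical.

```python
import numpy as np

def greedy_credible_set(alpha, rho=0.95):
    """Smallest set of top-probability variants whose total probability is >= rho.

    alpha: posterior probabilities over variants for one effect, summing to 1
    (hypothetical input; a generic construction, not CARMA's own code)
    """
    alpha = np.asarray(alpha, dtype=float)
    if not np.isclose(alpha.sum(), 1.0):
        raise ValueError("alpha should be a probability vector over variants")
    order = np.argsort(-alpha, kind="stable")  # variants by decreasing probability
    cum = np.cumsum(alpha[order])
    k = int(np.searchsorted(cum, rho)) + 1  # first prefix whose mass reaches rho
    return sorted(order[:k].tolist())

cs = greedy_credible_set([0.02, 0.70, 0.20, 0.08], rho=0.95)
```

Here the cumulative probabilities in sorted order are 0.70, 0.90 and 0.98, so the 0.95-level set contains variants 1, 2 and 3. Lowering ρ shrinks the set, which is why the ρ = 0.95 sets are expected to be no larger than the ρ = 0.99 sets for the same posterior.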

Extended Data Fig. 7 Performance of credible sets for large explained variance per locus.

Performance of credible sets for the spike-and-slab prior with τ = 0.038 (corresponding to φ = 0.01) and Cauchy prior (τ ~ Gamma(0.5, 0.5)) in the scenario with large explained variance at a locus (15%). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
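The Cauchy prior with τ ∼ Gamma(0.5, 0.5) can be read as a normal scale mixture. If an effect size satisfies β | τ ∼ N(0, 1/τ) and τ ∼ Gamma(shape 0.5, rate 0.5), then marginally β follows a standard Cauchy distribution (a Student t with one degree of freedom). The caption does not state the parameterization. Treating τ as a precision with a rate-parameterized Gamma is therefore an assumption here, and the τ = 0.038 spike-and-slab setting is not reproduced. The Python sketch below checks the mixture representation by simulation.

```python
import numpy as np

def sample_cauchy_via_gamma_mixture(n, seed=0):
    """Draw beta from N(0, 1/tau) with tau ~ Gamma(shape=0.5, rate=0.5)."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the Gamma by scale = 1 / rate, so rate 0.5 -> scale 2
    tau = rng.gamma(shape=0.5, scale=2.0, size=n)
    # beta | tau ~ N(0, 1/tau): tau is treated as a precision (assumption)
    return rng.standard_normal(n) / np.sqrt(tau)

beta = sample_cauchy_via_gamma_mixture(200_000)
print(np.quantile(beta, [0.25, 0.5, 0.75]))
```

The printed quartiles should be close to -1, 0 and 1, which are the quartiles of the standard Cauchy distribution. Unlike a normal slab, this prior has heavy tails and leaves room for occasional very large effects.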

Supplementary information

Supplementary Information

Supplementary Note, Tables 1–6 and Figs. 1–76.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, Z., Wang, C., Liu, L. et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet 55, 1057–1065 (2023). https://doi.org/10.1038/s41588-023-01392-0


  • DOI: https://doi.org/10.1038/s41588-023-01392-0
