CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses

Yang, Zikun; Wang, Chen; Liu, Linxi; Khan, Atlas; Lee, Annie; Vardarajan, Badri; Mayeux, Richard; Kiryluk, Krzysztof; Ionita-Laza, Iuliana

doi:10.1038/s41588-023-01392-0

Technical Report
Published: 11 May 2023


Nature Genetics volume 55, pages 1057–1065 (2023)


Abstract

Fine-mapping is commonly used to identify putative causal variants at genome-wide significant loci. Here we propose a Bayesian model for fine-mapping that has several advantages over existing methods, including flexible specification of the prior distribution of effect sizes, joint modeling of summary statistics and functional annotations and accounting for discrepancies between summary statistics and external linkage disequilibrium in meta-analyses. Using simulations, we compare performance with commonly used fine-mapping methods and show that the proposed model has higher power and lower false discovery rate (FDR) when including functional annotations, and higher power, lower FDR and higher coverage for credible sets in meta-analyses. We further illustrate our approach by applying it to a meta-analysis of Alzheimer’s disease genome-wide association studies where we prioritize putatively causal variants and genes.
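For orientation on what fine-mapping computes from summary statistics, the sketch below derives posterior inclusion probabilities (PIPs) under the simplest possible model: exactly one causal variant per locus and a uniform prior over variants. It uses Wakefield's approximate Bayes factor, a choice made here for simplicity. This is not CARMA's model, which allows multiple causal variants, flexible effect-size priors and functional annotations. The prior effect variance `w` and the function names are illustrative assumptions.

```python
import math

def log_abf(z: float, se: float, w: float = 0.04) -> float:
    """Wakefield's approximate log Bayes factor (association vs. null) for one SNP.

    z  : marginal Z-score
    se : standard error of the effect estimate (V = se**2)
    w  : prior variance of the effect size (an assumed value)
    """
    v = se ** 2
    r = w / (v + w)  # shrinkage factor W / (V + W)
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2

def single_causal_pips(zs, ses, w=0.04):
    """PIPs assuming exactly one causal variant and a uniform prior over variants."""
    logs = [log_abf(z, s, w) for z, s in zip(zs, ses)]
    m = max(logs)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return [x / total for x in weights]
```

With Z-scores of 5.0, 4.8 and 1.0 and equal standard errors, the two strongly associated, similar-looking variants share most of the posterior mass, which is exactly the ambiguity that credible sets are designed to summarize.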

**Fig. 2: Credible sets, credible models and FDR versus power (in-sample LD, with and without functional annotations, ∣T∣ = 2).**

**Fig. 3: Credible sets, credible models and FDR versus power (UKBB LD, consistent and inconsistent meta-analyses, ∣T∣ = 2).**

**Fig. 4: Results for inconsistent meta-analyses.**

**Fig. 5: Functional enrichments of prioritized SNPs at AD GWAS loci.**

Data availability

Code availability

An R package implementing CARMA is available at https://github.com/Iuliana-Ionita-Laza/CARMA. The analysis code to produce the major results presented in the paper is available at Zenodo³⁸, https://doi.org/10.5281/zenodo.7772462.

References

Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Stat. 5, 1780–1815 (2011).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).
Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
Wang, G., Sarkar, A. K., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Kichaev, G. et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘sum of single effects’ model. PLoS Genet. 18, e1010299 (2022).
Chen, W. et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat. Commun. 12, 7117 (2021).
Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Hans, C., Dobra, A. & West, M. Shotgun stochastic search for ‘large p’ regression. J. Am. Stat. Assoc. 102, 507–516 (2007).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214 (2016).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Ng, B. et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 20, 1418–1426 (2017).
Sims, R. et al. Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer’s disease. Nat. Genet. 49, 1373–1384 (2017).
Benner, C., Havulinna, A. S., Salomaa, V., Ripatti, S. & Pirinen, M. Refining fine-mapping: effect sizes and regional heritability. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/318618v1 (2018).
Zellner, A. & Siow, A. Posterior odds ratios for selected regression hypotheses. Trab. Estad. Investig. Oper. 31, 585–603 (1980).
Castillo, I. & van der Vaart, A. Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012).
Womack, A. J., Fuentes, C. & Taylor-Rodriguez, D. Model space priors for objective sparse Bayesian regression. Preprint at https://arxiv.org/abs/1511.04745 (2015).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Warton, D. I. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J. Am. Stat. Assoc. 103, 340–349 (2008).
Vilaplana, J. P. & Puri, M. N. (eds) Proc. 4th International Meeting of Statistics in the Basque Country, San Sebastian, Spain (De Gruyter, 1994).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Dimitromanolakis, A., Xu, J., Krol, A. & Briollais, L. sim1000g: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics 20, 26 (2019).
Fachal, L. et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 52, 56–73 (2020).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genome-wide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, Z. ZikunY/CARMA: release of CARMA package and code used in the publication (1.0.5). Zenodo https://doi.org/10.5281/zenodo.7772462 (2023).

Acknowledgements

This study was supported by the National Institutes of Health under grants MH095797 (to I.I.-L.), AG072272 (to Z.Y., C.W., B.V., R.M. and I.I.-L.) and RC2DK116690 (to K.K., C.W. and I.I.-L.). The funders had no role in study design, data collection and analysis and decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Biostatistics, Columbia University, New York City, NY, USA
Zikun Yang, Chen Wang & Iuliana Ionita-Laza
Division of Nephrology, Department of Medicine, College of Physicians and Surgeons, Columbia University, New York City, NY, USA
Chen Wang, Atlas Khan & Krzysztof Kiryluk
Department of Statistics, University of Pittsburgh, Pittsburgh, PA, USA
Linxi Liu
Department of Neurology, College of Physicians and Surgeons, Columbia University, New York City, NY, USA
Annie Lee, Badri Vardarajan & Richard Mayeux

Contributions

Z.Y. and I.I.-L. designed the methods and performed the experiments. Z.Y., C.W., L.L., A.K., A.L., B.V., R.M., K.K. and I.I.-L. analyzed and interpreted data. Z.Y. and I.I.-L. drafted the paper and revised it according to suggestions by the co-authors. All authors critically reviewed the paper, suggested revisions as needed and approved the final version.

Corresponding author

Correspondence to Iuliana Ionita-Laza.

Ethics declarations

Competing interests

All authors declare no competing interests.

Statistics and reproducibility

No statistical method was used to predetermine the sample size. We did not use any study design that required randomization or blinding.

Peer review

Peer review information

Nature Genetics thanks Laura Fachal, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance of credible sets and credible models (in-sample LD, with and without functional annotations).

∣T∣ ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or all the credible sets (CARMA, SuSiE, fastPAINTOR) at a locus. The credible model is computed based on a threshold of 10 for the posterior odds.
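The credible-set metrics defined above can be made concrete with a short sketch. It assumes a single signal whose PIPs are already available and builds a level-ρ set greedily by descending PIP; purity follows the caption's definition (mean squared correlation over pairs of variants in the set), and a singleton set is assigned purity 1 by assumption. The function names and the toy LD matrix are hypothetical; this is not the CARMA, SuSiE or fastPAINTOR code.

```python
from itertools import combinations

def credible_set(pips, rho=0.99):
    """Smallest set of variants, taken by descending PIP, whose PIPs sum to at least rho."""
    order = sorted(range(len(pips)), key=lambda j: pips[j], reverse=True)
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += pips[j]
        if total >= rho:
            break
    return chosen

def purity(cs, ld):
    """Mean squared correlation over distinct pairs in the set (1.0 for a singleton, by assumption)."""
    pairs = list(combinations(cs, 2))
    if not pairs:
        return 1.0
    return sum(ld[i][j] ** 2 for i, j in pairs) / len(pairs)

def coverage(sets, causal):
    """Proportion of credible sets that contain at least one causal variant."""
    return sum(any(j in causal for j in cs) for cs in sets) / len(sets)

def cs_power(sets, causal):
    """Proportion of causal variants included in any credible set."""
    covered = set().union(*sets) if sets else set()
    return len(covered & causal) / len(causal)
```

For PIPs of 0.6, 0.3, 0.095 and 0.005, the 0.99 credible set contains the first three variants, since 0.6 + 0.3 alone falls short of the target level.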

Extended Data Fig. 2 Power and FDR using positive predictions for in-sample LD.

Comparison between no annotation and including functional annotations for (a) ∣T∣ = 1, (b) ∣T∣ = 2, and (c) ∣T∣ = 3. FDR vs. power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{FP}{TP+FP}\) and \(\mathrm{power} := \frac{TP}{TP+FN}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.
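A minimal sketch of the FDR and power computation defined above. It assumes that a variant is called positive when its PIP is at least the threshold and that FDR is reported as 0 when nothing is called; both conventions are assumptions, as are the function names. Sweeping the threshold over 0.1, 0.2, …, 1.0 traces one FDR-versus-power curve.

```python
def fdr_power(pips, causal, threshold):
    """FDR = FP/(TP+FP) and power = TP/(TP+FN) when calling variants with PIP >= threshold."""
    called = {j for j, p in enumerate(pips) if p >= threshold}
    tp = len(called & causal)
    fp = len(called - causal)
    fn = len(causal - called)
    fdr = fp / (tp + fp) if called else 0.0  # assumed convention: FDR = 0 when nothing is called
    power = tp / (tp + fn) if causal else float("nan")
    return fdr, power

def fdr_power_curve(pips, causal):
    """(threshold, FDR, power) triples for thresholds 0.1, 0.2, ..., 1.0."""
    thresholds = [round(0.1 * k, 1) for k in range(1, 11)]
    return [(t, *fdr_power(pips, causal, t)) for t in thresholds]
```

Raising the threshold from 0.5 to 0.95 typically trades power for a lower FDR, which is what the open and solid circles in the figure mark.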

Extended Data Fig. 3 Performance of credible sets and credible models (UKBB LD; three simulation settings: consistent, inconsistent, inconsistent+imputation).

∣T∣ ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or all the credible sets at a locus. The credible model is computed based on a threshold of 10 for the posterior odds.
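The captions state only that the credible model uses a threshold of 10 on the posterior odds. One plausible reading, assumed here and not taken from the CARMA code, is an Occam's-window rule: keep every model whose posterior odds against the top model are at most 10, and report the union of their variants. The input format (a mapping from variant sets to unnormalized posterior probabilities) and the function name are hypothetical.

```python
def credible_model(models, threshold=10.0):
    """Union of variants across models whose posterior odds versus the top model are <= threshold.

    models: dict mapping a frozenset of variant indices to its (unnormalized) posterior probability.
    Assumed Occam's-window reading of the caption, not the published implementation.
    """
    best = max(models.values())
    # p * threshold >= best  <=>  best / p <= threshold, written this way to avoid dividing by zero
    kept = [m for m, p in models.items() if p * threshold >= best]
    return set().union(*kept)
```

With model probabilities 0.5, 0.08, 0.04 and 0.01, only the first two models fall within a factor of 10 of the best one, so the credible model is the union of their variants.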

Extended Data Fig. 4 Power and FDR using positive predictions in consistent/inconsistent/inconsistent+imputation settings.

(a) ∣T∣ = 1, (b) ∣T∣ = 2, (c) ∣T∣ = 3. FDR vs. power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{FP}{TP+FP}\) and \(\mathrm{power} := \frac{TP}{TP+FN}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.

Extended Data Fig. 5 Results for the inconsistent meta-analysis.

∣T∣ ∈ {1, 2, 3}. (a) Increased number of single-SNP credible sets in the presence of Z-score/LD inconsistencies. Results on credible sets (ρ = 0.99) with only one SNP, that is, sets whose single SNP receives a PIP larger than 0.99. For each model we report the total number of credible sets that contain only one SNP across all 94 loci and three different scenarios, and the coverage of these sets, that is, the proportion of these credible sets that contain a causal SNP. (b) SuSiE with CARMA vs. SLALOM in the inconsistent setting. Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
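The tally in panel (a) can be sketched as follows, treating every SNP with PIP above 0.99 as a single-SNP credible set, as the caption describes. The per-locus input format, a list of `(pips, causal_set)` pairs, and the function name are hypothetical.

```python
def single_snp_sets(loci, cutoff=0.99):
    """Count single-SNP credible sets (a SNP with PIP > cutoff) across loci and report their coverage.

    loci: list of (pips, causal_set) pairs, one per simulated locus (hypothetical input format).
    Returns (number of single-SNP sets, proportion of them that are causal).
    """
    n_sets = n_hits = 0
    for pips, causal in loci:
        for j, p in enumerate(pips):
            if p > cutoff:
                n_sets += 1
                n_hits += int(j in causal)
    cov = n_hits / n_sets if n_sets else float("nan")
    return n_sets, cov
```

A rise in the count of such sets accompanied by falling coverage is the pattern that signals overconfident, miscalibrated PIPs when Z-scores and LD disagree.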

Extended Data Fig. 6 Performance of credible sets (in-sample LD).

∣T∣ ∈ {1, 2, 3}; ρ = 0.95. Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.

Extended Data Fig. 7 Performance of credible sets for large explained variance per locus.

Performance of credible sets for the spike-and-slab prior with τ = 0.038 (corresponding to φ = 0.01) and Cauchy prior (τ ~ Gamma(0.5, 0.5)) in the scenario with large explained variance at a locus (15%). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
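The Cauchy prior above is written as a scale mixture of normals. Assuming τ acts as a precision and Gamma(0.5, 0.5) uses a shape-rate parameterization (both assumptions about notation), drawing β | τ ~ N(0, 1/τ) with τ ~ Gamma(0.5, 0.5) gives a standard Cauchy marginal for β, the Zellner–Siow construction. The simulation below checks this numerically against the Cauchy quartiles (median 0, upper quartile 1). Function names are illustrative only.

```python
import math
import random

def sample_cauchy_mixture(n, seed=1):
    """Draw beta | tau ~ N(0, 1/tau) with tau ~ Gamma(shape=0.5, rate=0.5); marginally Cauchy(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        tau = rng.gammavariate(0.5, 2.0)  # gammavariate takes (shape, scale); scale = 1 / rate
        draws.append(rng.gauss(0.0, 1.0 / math.sqrt(tau)))  # sd = 1/sqrt(precision)
    return draws

def upper_quartile(xs):
    """Empirical 75th percentile (nearest-rank on the sorted sample)."""
    xs = sorted(xs)
    return xs[int(0.75 * (len(xs) - 1))]
```

Compared with a fixed-variance spike-and-slab slab, the heavy Cauchy tails shrink small effects while leaving large ones relatively unpenalized, which is relevant in the large-explained-variance scenario shown here.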

Supplementary information

Supplementary Information

Supplementary Note, Tables 1–6 and Figs. 1–76.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, Z., Wang, C., Liu, L. et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet 55, 1057–1065 (2023). https://doi.org/10.1038/s41588-023-01392-0

Received: 11 April 2022
Accepted: 11 April 2023
Published: 11 May 2023
Issue Date: June 2023
DOI: https://doi.org/10.1038/s41588-023-01392-0
