Abstract
Fine-mapping is commonly used to identify putative causal variants at genome-wide significant loci. Here we propose a Bayesian model for fine-mapping that has several advantages over existing methods, including flexible specification of the prior distribution of effect sizes, joint modeling of summary statistics and functional annotations and accounting for discrepancies between summary statistics and external linkage disequilibrium in meta-analyses. Using simulations, we compare performance with commonly used fine-mapping methods and show that the proposed model has higher power and lower false discovery rate (FDR) when including functional annotations, and higher power, lower FDR and higher coverage for credible sets in meta-analyses. We further illustrate our approach by applying it to a meta-analysis of Alzheimer’s disease genome-wide association studies where we prioritize putatively causal variants and genes.
Data availability
1000G data (phase 3) are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. LD for UKBB British-ancestry samples are available at https://data.broadinstitute.org/alkesgroup/UKBB_LD. Summary statistics of AD GWAS are available at https://ctg.cncr.nl/software/summary_statistics. DeepSEA functional annotations are available at http://deepsea.princeton.edu/job/analysis/create/. PolyFun annotations are available at https://alkesgroup.broadinstitute.org/polyfun_results/. CADD scores are available at https://cadd.gs.washington.edu/score. Eigen scores are available at http://www.columbia.edu/ii2135/eigen.html. V2G data are available at https://genetics-docs.opentargets.org/our-approach/data-pipeline. GTEx eQTLs are available at https://gtexportal.org/home/datasets. ROSMAP eQTLs/meQTLs are available at https://mostafavilab.stat.ubc.ca/xqtl/.
Code availability
An R package implementing CARMA is available at https://github.com/Iuliana-Ionita-Laza/CARMA. The analysis code to produce the major results presented in the paper is available at Zenodo (ref. 38): https://doi.org/10.5281/zenodo.7772462.
References
Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Stat. 5, 1780–1815 (2011).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).
Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
Wang, G., Sarkar, A. K., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Kichaev, G. et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bycroft, C. et al. The UK biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the ‘sum of single effects’ model. PLoS Genet. 18, e1010299 (2022).
Chen, W. et al. Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors. Nat. Commun. 12, 7117 (2021).
Kanai, M. et al. Meta-analysis fine-mapping is often miscalibrated at single-variant resolution. Cell Genom. 2, 100210 (2022).
Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Hans, C., Dobra, A. & West, M. Shotgun stochastic search for ‘large p’ regression. J. Am. Stat. Assoc. 102, 507–516 (2007).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 67, 301–320 (2005).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214–220 (2016).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Ng, B. et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 20, 1418–1426 (2017).
Sims, R. et al. Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer’s disease. Nat. Genet. 49, 1373–1384 (2017).
Benner, C., Havulinna, A. S., Salomaa, V., Ripatti, S. & Pirinen, M. Refining fine-mapping: effect sizes and regional heritability. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/318618v1 (2018).
Zellner, A. & Siow, A. Posterior odds ratios for selected regression hypotheses. Trab. Estad. Investig. Oper. 31, 585–603 (1980).
Castillo, I. & van der Vaart, A. Needles and straw in a haystack: posterior concentration for possibly sparse sequences. Ann. Stat. 40, 2069–2101 (2012).
Womack, A. J., Fuentes, C. & Taylor-Rodriguez, D. Model space priors for objective sparse Bayesian regression. Preprint at https://arxiv.org/abs/1511.04745 (2015).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Warton, D. I. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J. Am. Stat. Assoc. 103, 340–349 (2008).
Vilaplana, J. P. & Puri, M. N. (eds) Proc. 4th International Meeting of Statistics in the Basque Country, San Sebastian, Spain (De Gruyter, 1994).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Dimitromanolakis, A., Xu, J., Krol, A. & Briollais, L. sim1000g: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics 20, 26 (2019).
Fachal, L. et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 52, 56–73 (2020).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genome-wide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, Z. ZikunY/CARMA: release of CARMA package and code used in the publication (1.0.5). Zenodo https://doi.org/10.5281/zenodo.7772462 (2023).
Acknowledgements
This study was supported by the National Institutes of Health under grants MH095797 (to I.I.-L.), AG072272 (to Z.Y., C.W., B.V., R.M. and I.I.-L.) and RC2DK116690 (to K.K., C.W. and I.I.-L.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
Z.Y. and I.I.-L. designed the methods and performed the experiments. Z.Y., C.W., L.L., A.K., A.L., B.V., R.M., K.K. and I.I.-L. analyzed and interpreted data. Z.Y. and I.I.-L. drafted the paper and revised it according to suggestions by the co-authors. All authors critically reviewed the paper, suggested revisions as needed and approved the final version.
Ethics declarations
Competing interests
All authors declare no competing interests.
Statistics and reproducibility
No statistical method was used to predetermine the sample size. We did not use any study design that required randomization or blinding.
Peer review
Peer review information
Nature Genetics thanks Laura Fachal, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Performance of credible sets and credible models (in-sample LD, with and without functional annotations).
∣T∣ ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or all the credible sets (CARMA, SuSiE, fastPAINTOR) at a locus. The credible model is computed based on a threshold of 10 for the posterior odds.
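The four credible-set metrics defined in this caption can be illustrated with a minimal Python sketch. It is not taken from the CARMA package: the function name `credible_set_metrics`, the input layout (credible sets as sets of variant indices, LD as a nested list of correlations) and the convention that a single-variant set has purity 1 are all assumptions made for illustration. Purity is computed here as the mean squared correlation over distinct variant pairs, which is one reading of the caption's definition.

```python
from itertools import combinations


def credible_set_metrics(credible_sets, causal, ld):
    """Summarise credible sets at one locus (illustrative sketch only).

    credible_sets: list of sets of variant indices (hypothetical layout)
    causal:        set of indices of the simulated causal variants
    ld:            square nested list of pairwise correlations r
    """
    causal = set(causal)
    covered = set().union(*credible_sets)  # variants in any credible set

    # Power: proportion of causal variants included in any credible set.
    power = len(causal & covered) / len(causal) if causal else float("nan")

    # Coverage: proportion of credible sets containing a causal variant.
    if credible_sets:
        hits = sum(1 for cs in credible_sets if causal & set(cs))
        coverage = hits / len(credible_sets)
    else:
        coverage = float("nan")

    # Size: number of variants in each credible set.
    sizes = [len(cs) for cs in credible_sets]

    # Purity: mean squared correlation over distinct pairs in each set.
    purities = []
    for cs in credible_sets:
        pairs = list(combinations(sorted(cs), 2))
        if pairs:
            purities.append(sum(ld[i][j] ** 2 for i, j in pairs) / len(pairs))
        else:
            purities.append(1.0)  # assumed convention for one-variant sets

    return {"power": power, "coverage": coverage,
            "sizes": sizes, "purities": purities}


# Toy locus with four variants; variants 0 and 2 are causal.
toy_ld = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
toy_metrics = credible_set_metrics([{0, 1}, {3}], {0, 2}, toy_ld)
```

In this toy locus only the first credible set captures a causal variant, so power and coverage are both one half; averaging such per-locus values across loci is left to the caller.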
Extended Data Fig. 2 Power and FDR using positive predictions for in-sample LD.
Comparison between no annotation and including functional annotations for (a) ∣T∣ = 1, (b) ∣T∣ = 2, and (c) ∣T∣ = 3. FDR vs. power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\) and \(\mathrm{power} := \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.
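As a worked illustration of the FDR and power definitions in this caption, the following Python sketch counts TP, FP and FN at a single PIP threshold. The function name `fdr_and_power`, the use of `>=` for calling a variant positive, and the choice to report an FDR of 0 when no variant is called are assumptions for illustration, not details taken from the paper or the CARMA package.

```python
def fdr_and_power(pips, is_causal, threshold):
    """Return (FDR, power) for positive predictions at one PIP threshold.

    pips:      posterior inclusion probabilities, one per variant
    is_causal: booleans marking the simulated causal variants
    threshold: a variant is called positive when PIP >= threshold (assumed)
    """
    tp = fp = fn = 0
    for pip, causal in zip(pips, is_causal):
        called = pip >= threshold
        if called and causal:
            tp += 1      # true positive: called and causal
        elif called:
            fp += 1      # false positive: called but not causal
        elif causal:
            fn += 1      # false negative: causal but not called

    # FDR := FP / (TP + FP); reported as 0 when nothing is called (assumed).
    fdr = fp / (tp + fp) if (tp + fp) > 0 else 0.0
    # power := TP / (TP + FN); undefined when there are no causal variants.
    power = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return fdr, power


# Toy example: five variants, two of them causal.
toy_pips = [0.99, 0.60, 0.40, 0.05, 0.97]
toy_causal = [True, False, True, False, False]
fdr_50, power_50 = fdr_and_power(toy_pips, toy_causal, 0.5)
fdr_95, power_95 = fdr_and_power(toy_pips, toy_causal, 0.95)
```

Evaluating the function over a grid of thresholds from 0.1 to 1 would trace out an FDR-versus-power curve of the kind described in the caption, with the 0.5 and 0.95 thresholds corresponding to the open and solid circles.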
Extended Data Fig. 3 Performance of credible sets and credible models (UKBB LD. Three simulation settings: consistent, inconsistent, inconsistent+imputation).
∣T∣ ∈ {1, 2, 3}. (a) Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set. (b) Power is the percentage of the simulated causal variants identified by the credible model or the credible sets. Size is the average number of variants included in the credible model (CARMA) or all the credible sets at a locus. The credible model is computed based on a threshold of 10 for the posterior odds.
Extended Data Fig. 4 Power and FDR using positive predictions in consistent/inconsistent/inconsistent+imputation settings.
(a) ∣T∣ = 1, (b) ∣T∣ = 2, (c) ∣T∣ = 3. FDR vs. power for different methods using positive predictions as the PIP threshold varies from 0.1 to 1. These quantities are calculated as \(\mathrm{FDR} := \frac{\mathrm{FP}}{\mathrm{TP}+\mathrm{FP}}\) and \(\mathrm{power} := \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\), where FP, TP, FN and TN denote the numbers of false positives, true positives, false negatives and true negatives, respectively, at a given PIP threshold. Open circles denote the results at PIP threshold 0.5, and solid circles denote the results at PIP threshold 0.95.
Extended Data Fig. 5 Results for the inconsistent meta-analysis.
∣T∣ ∈ {1, 2, 3}. (a) Increased single SNP credible sets in the presence of Z-scores/LD inconsistencies. Results on credible sets (ρ = 0.99) with only one SNP, that is, the corresponding SNP receives a PIP larger than 0.99. For each model we report the total number of credible sets that contain only one SNP across all 94 loci and three different scenarios, and the coverage of these sets, that is the proportion of these credible sets that contain a causal SNP. (b) SuSiE with CARMA vs. SLALOM in the inconsistent setting. Performance of credible sets (ρ = 0.99). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets containing a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
Extended Data Fig. 6 Performance of credible sets (in-sample LD).
∣T∣ ∈ {1, 2, 3}; ρ = 0.95. Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
Extended Data Fig. 7 Performance of credible sets for large explained variance per locus.
Performance of credible sets for the spike-and-slab prior with τ = 0.038 (corresponding to φ = 0.01) and Cauchy prior (τ ~ Gamma(0.5, 0.5)) in the scenario with large explained variance at a locus (15%). Power: the proportion of simulated causal variants included in any credible set; Coverage: the proportion of credible sets that contain a causal variant; Size: the number of variants included in a credible set; Purity: mean squared correlation between variants in a credible set.
Supplementary information
Supplementary Note, Tables 1–6 and Figs. 1–76.
About this article
Cite this article
Yang, Z., Wang, C., Liu, L. et al. CARMA is a new Bayesian model for fine-mapping in genome-wide association meta-analyses. Nat Genet 55, 1057–1065 (2023). https://doi.org/10.1038/s41588-023-01392-0
DOI: https://doi.org/10.1038/s41588-023-01392-0