Improving fine-mapping by modeling infinitesimal effects

Abstract

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods’ posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.

Fig. 1: RFRs and functional enrichments.
Fig. 2: Simulations with nonsparse effects.
Fig. 3: Simulations with uncorrected stratification.
Fig. 4: Runtime comparison.
Fig. 5: Real data performance improvements.

Data availability

The main fine-mapping results at N = 100,000 sample size produced by this study are publicly available at https://doi.org/10.5281/zenodo.7055906. The fine-mapping results at N = 366,000 previously produced by our group are available at https://www.finucanelab.org/data. The UKBB individual-level data are accessible on request through the UKBB Access Management System (https://www.ukbiobank.ac.uk/). The UKBB analysis in this study was conducted via application number 31063.

Code availability

Software implementing SuSiE-inf and FINEMAP-inf is publicly available at https://github.com/FinucaneLab/fine-mapping-inf (https://doi.org/10.5281/zenodo.8427832). All scripts for figure generation as well as simulation scripts are available at https://github.com/cuiran/improve-fine-mapping (https://doi.org/10.5281/zenodo.10037442).

Acknowledgements

This research has been conducted using the UKBB Resource under application number 31063. We acknowledge all the participants of UKBB. R.C., R.A.E. and H.K.F. are funded by DP5—5DP5OD024582-04, SFARI 704413 and 1U01HG011719-01. This work is also supported by the Novo Nordisk Foundation (NNF21SA0072102). M.K. is supported by a Nakajima Foundation Fellowship and the Masason Foundation. Z.F. is funded by NSF DMS-2142476. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank all the members of the Finucane lab and the Analytic and Translational Genetics Unit (ATGU) at Massachusetts General Hospital for their helpful feedback.

Author information

Contributions

R.C. and H.K.F. conceived the idea of the RFR metric. Z.F. and H.K.F. conceived the idea of SuSiE-inf and FINEMAP-inf. Z.F. derived the methods and wrote the software. R.A.E. conducted the large-scale simulations and related analyses. M.K. provided GWAS and fine-mapping pipelines. R.C. conducted the analyses on UKBB and wrote the paper. J.C.U., O.W., B.M.N. and M.J.D. contributed to the analyses and helped interpret the results. Z.F. and H.K.F. jointly supervised this research and critically revised the paper.

Corresponding authors

Correspondence to Ran Cui, Zhou Fan or Hilary K. Finucane.

Ethics declarations

Competing interests

J.C.U. is an employee of Illumina. O.W. is an employee of and holds equity in Eleven Therapeutics. B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora, consultant of the scientific advisory board for Camp4 Therapeutics and consultant for Merck. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Replication failure rates at different PIP thresholds.

Replication failure rates at four different PIP thresholds: 0.9 (default), 0.93, 0.95, 0.99, for SuSiE, FINEMAP, SuSiE-inf, and FINEMAP-inf aggregated across 10 UKBB quantitative phenotypes, contrasted with RFRs in ideal simulations and with EPN. Error bars represent one SD of the corresponding binomial distribution Binom(n, p), where n is the total number of high-PIP variants at sample size N = 100 K, and p is the RFR. Bar plot data are presented as RFR +/- SD. Numerical results are available in Supplementary Table 14.
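As a minimal sketch of how an RFR and its error bar might be computed from two sets of PIPs: the caption takes n to be the number of high-PIP variants at N = 100 K and p to be the RFR, and on the proportion scale one SD of Binom(n, p) is sqrt(p(1 - p)/n). The replication criterion below (PIP under 0.1 at the larger sample size), the treatment of missing variants, and all function and variable names are assumptions made for illustration; they are not stated on this page.

```python
import math


def replication_failure_rate(pip_small, pip_large, high=0.9, low=0.1):
    """Return (rfr, sd, n) for two {variant_id: PIP} dictionaries.

    pip_small: PIPs from the downsampled analysis (e.g. N = 100 K).
    pip_large: PIPs from the full analysis (e.g. N = 366 K).
    ASSUMPTION: a high-PIP variant "fails to replicate" when its PIP in
    the larger sample is below `low`; variants absent from pip_large are
    treated as PIP = 0. Neither choice is taken from the caption.
    """
    high_conf = [v for v, p in pip_small.items() if p > high]
    n = len(high_conf)
    if n == 0:
        return float("nan"), float("nan"), 0
    failures = sum(1 for v in high_conf if pip_large.get(v, 0.0) < low)
    rfr = failures / n
    # One SD of Binom(n, p), divided by n to put it on the proportion scale.
    sd = math.sqrt(rfr * (1.0 - rfr) / n)
    return rfr, sd, n


# Made-up PIPs for five hypothetical variants.
pip_100k = {"v1": 0.95, "v2": 0.99, "v3": 0.92, "v4": 0.97, "v5": 0.40}
pip_366k = {"v1": 0.98, "v2": 0.05, "v3": 0.91, "v4": 0.60, "v5": 0.02}

rfr, sd, n = replication_failure_rate(pip_100k, pip_366k)
print(f"n = {n}, RFR = {rfr:.3f} +/- {sd:.3f}")
```

Passing a different `high` value (0.93, 0.95 or 0.99) would correspond to the other thresholds shown in the figure.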

Extended Data Fig. 2 Calibration in different non-sparsity coverage settings.

Calibration for SuSiE, SuSiE-inf, FINEMAP and FINEMAP-inf in simulations with different non-sparsity coverage settings: 0.5%, 1%, and 5% (see Table 1 for more parameter settings in these simulations). The heritability ratio between small and large effects is fixed at 3:1 for three simulation scenarios, while in the fourth scenario we set the coverage at 5% and the heritability ratio at 15:1 to match the per-SNP heritability in simulations where 1% of SNPs are causal and the heritability ratio is 3:1. Error bars correspond to 95% Wilson confidence intervals. Numerical results are available in Supplementary Table 15.
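For readers unfamiliar with the Wilson interval mentioned in this caption, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion, such as the fraction of truly causal variants within a PIP bin. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    Returns (lower, upper).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical PIP bin: 45 of 50 high-PIP variants are truly causal.
lower, upper = wilson_interval(45, 50)
print(f"observed 0.90, 95% Wilson CI = ({lower:.3f}, {upper:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] even when the observed proportion is close to 0 or 1, which is the usual situation in the highest PIP bins.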

Extended Data Fig. 3 Additional evidence of performance improvements in real data.

a-b. Functional enrichment of top N (N = 500, 1000, 1500, and 3000) highest PIP variants from SuSiE, SuSiE-inf, FINEMAP, and FINEMAP-inf. GWAS summary statistics computed using BOLT-LMM and OLS. c. Functional enrichment of the set differences between SuSiE and SuSiE-inf high-PIP (PIP > 0.9) variants and FINEMAP and FINEMAP-inf high-PIP variants. Error bars represent one SD of the corresponding binomial distribution Binom(n, p), where n is the total number of variants in each set and p is the corresponding proportion of annotated variants. Bar plot data are presented as proportion +/- SD. d. The proportional reduction in the number of variants in three categories when using SuSiE-inf and FINEMAP-inf compared to using SuSiE and FINEMAP. The three categories are: High-PIP (PIP > 0.9 for either method, reduced from 1876 to 1578), Replicated (PIP > 0.9 at both sample sizes N = 100 K and N = 366 K, reduced from 665 to 595), and Shared high-PIP (PIP > 0.9 for both methods, reduced from 723 to 646). e. Credible set sizes in all regions fine-mapped by SuSiE and SuSiE-inf. Box plot lower and upper hinges correspond to the first and third quartiles, whiskers extend no further than 1.5*IQR from the hinges, outliers are plotted as individual points, and solid lines in the boxes show medians. Numerical results are available in Supplementary Tables 16–20.
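The proportional reductions in panel d can be worked out directly from the counts quoted in the caption. The short sketch below does that arithmetic; the helper name is hypothetical, and the formula (before minus after, divided by before) is the usual reading of "proportion of reduction" rather than something spelled out on this page.

```python
# Counts quoted in the caption: (original methods, -inf methods).
counts = {
    "High-PIP": (1876, 1578),
    "Replicated": (665, 595),
    "Shared high-PIP": (723, 646),
}


def proportion_reduction(before, after):
    """Fraction by which a count shrinks: (before - after) / before."""
    return (before - after) / before


reductions = {
    name: proportion_reduction(before, after)
    for name, (before, after) in counts.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%}")
```

Under this reading, the High-PIP category shrinks by roughly 16%, while the Replicated and Shared high-PIP categories each shrink by a little over 10%.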

Extended Data Fig. 4 Estimated infinitesimal variance (tau squared) in simulations.

a. The mean estimated tau squared aggregated across all regions fine-mapped in each non-sparse simulation setting +/- SD, where SD is the in-sample standard deviation of estimated tau squared in the corresponding simulation setting. See Table 1 for simulation parameters. b. Estimated tau squared when OLS or BOLT-LMM is used to perform GWAS, plotted together with the true tau squared, in three sets of simulation settings. “Large-scale, inf model” represents the set of large-scale simulations (described in Methods) with the 100% causal coverage setting and no missing causal variants introduced. “One region, h2g = 0.05” represents the set of simulations using imputed genotypes in one region on Chromosome 1, with 100% causal coverage, no missing causal variants, and no exclusion of variants in the fine-mapping pipeline. The total SNP heritability is set to 0.05. “One region, h2g = 0.1” is similar except with total SNP heritability set to 0.1. c. Estimated tau squared in four stratification simulation settings with no non-sparse effects; see Table 1 for simulation parameters. Box plot lower and upper hinges correspond to the first and third quartiles, whiskers extend no further than 1.5*IQR from the hinges, outliers are plotted as individual points, and solid lines in the boxes show medians. Numerical results are available in Supplementary Tables 21–23.

Extended Data Fig. 5 Estimated infinitesimal variance (tau squared) in UK Biobank.

a. Estimated tau squared in all fine-mapped regions for 10 UK Biobank phenotypes at sample size N = 366 K. Box plot lower and upper hinges correspond to the first and third quartiles, whiskers extend no further than 1.5*IQR from the hinges, outliers are plotted as individual points, solid lines in the boxes show medians and the red dot denotes the mean. b. Comparing mean tau-squared estimates between traits. The heatmap shows the results of pairwise Welch two-sample t-tests with the alternative hypothesis that the mean of estimated tau squared in all regions for trait 1 (x-axis) is greater than that of trait 2 (y-axis). The test is one-sided. The multiple-testing adjusted p-value significance cutoff is set to 0.05/90 = 5.5e-4, correcting for the total number of trait pairs tested. Stars indicate that the p-value passes the significance threshold. c. Correlation between the number of credible sets and the estimated infinitesimal variance (tau squared). Regions with the same number of credible sets are aggregated, and the median estimated tau squared is obtained from these regions. The scatter plot shows these medians. The best-fit line is plotted using ggscatter. R is the Pearson correlation, and p is the two-sided correlation p-value. The 95% confidence interval is shown on the plot as the gray shaded area. Numerical results are available in Supplementary Tables 24–26.
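The significance cutoff in panel b follows from counting ordered trait pairs: with 10 traits and a one-sided test, each pair is tested in both directions, giving 10 × 9 = 90 tests. The sketch below reproduces that threshold and computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom using only the standard library. The function name and the tau-squared values are made up for illustration, and the p-value itself is not computed here.

```python
import math
from statistics import mean, variance

# Bonferroni threshold: one-sided tests over ordered pairs of 10 traits.
n_traits = 10
n_tests = n_traits * (n_traits - 1)  # 90 ordered (trait 1, trait 2) pairs
threshold = 0.05 / n_tests


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite df.

    A large positive t supports the one-sided alternative mean(x) > mean(y).
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical per-region tau-squared estimates for two traits.
tau2_trait1 = [2.1e-3, 1.8e-3, 2.6e-3, 2.2e-3, 1.9e-3]
tau2_trait2 = [1.1e-3, 0.9e-3, 1.4e-3, 1.0e-3, 1.2e-3]

t_stat, df = welch_t(tau2_trait1, tau2_trait2)
print(f"threshold = {threshold:.2e}, t = {t_stat:.2f}, df = {df:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False, alternative="greater")` returns the corresponding one-sided p-value, which would then be compared against the threshold.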

Extended Data Fig. 6 Agreement between SuSiE and FINEMAP PIPs; SuSiE-inf and FINEMAP-inf PIPs.

a-b. Density plots of PIPs from fine-mapping 10 UK Biobank traits at sample size N = 366 K. The x-axis shows PIPs from running SuSiE (or SuSiE-inf); the y-axis shows PIPs from running FINEMAP (or FINEMAP-inf). Only variants with PIP >= 0.1 for either method are shown on the plots. c. Number of high-PIP variants identified by SuSiE, SuSiE-inf, FINEMAP, FINEMAP-inf, minPIP, minPIP-inf, meanPIP and meanPIP-inf, where meanPIP(-inf) is defined as the average PIP between SuSiE and FINEMAP (respectively, SuSiE-inf and FINEMAP-inf). Data are aggregated across 10 UKBB traits fine-mapped at N = 366 K. Numerical results are available in Supplementary Table 27.
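The aggregated scores in panel c can be illustrated with a small sketch. The caption defines meanPIP as the average of the two methods' PIPs; minPIP is not defined on this page, so the code assumes it is the per-variant minimum of the two PIPs. Function names, variant identifiers and PIP values are all hypothetical.

```python
def aggregate_pips(pip_a, pip_b):
    """Combine two {variant_id: PIP} dictionaries on their shared variants.

    Returns (min_pip, mean_pip).
    ASSUMPTION: minPIP is the per-variant minimum of the two PIPs; only
    meanPIP (the average) is defined in the caption.
    """
    shared = pip_a.keys() & pip_b.keys()
    min_pip = {v: min(pip_a[v], pip_b[v]) for v in shared}
    mean_pip = {v: (pip_a[v] + pip_b[v]) / 2.0 for v in shared}
    return min_pip, mean_pip


def count_high_pip(pips, threshold=0.9):
    """Number of variants whose PIP exceeds the threshold."""
    return sum(1 for p in pips.values() if p > threshold)


# Made-up PIPs from two methods for three hypothetical variants.
pip_method_a = {"var1": 0.95, "var2": 0.98, "var3": 0.30}
pip_method_b = {"var1": 0.99, "var2": 0.85, "var3": 0.10}

min_pip, mean_pip = aggregate_pips(pip_method_a, pip_method_b)
print("high-PIP by minPIP:", count_high_pip(min_pip))
print("high-PIP by meanPIP:", count_high_pip(mean_pip))
```

Under this assumption, minPIP can never call more high-PIP variants than either method alone, whereas meanPIP can retain a variant on which the two methods disagree moderately.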

Extended Data Fig. 7 minPIP-inf performance.

a. Replication failure rates of minPIP and minPIP-inf in real data and in ideal simulations. Numerical results are available in Supplementary Table 4. Error bars represent one SD of the corresponding binomial distribution Binom(n, p), where n is the total number of high-PIP variants at sample size N = 100 K, and p is the RFR. Bar plot data are presented as RFR +/- SD. b. Functional enrichment of top N (N = 500, 1000, 1500, and 3000) highest PIP variants from SuSiE-inf, FINEMAP-inf and minPIP-inf. Error bars represent one SD of the corresponding binomial distribution Binom(n, p), where n is the total number of variants in each set and p is the corresponding proportion of annotated variants. Numerical results are available in Supplementary Table 16. c-d. PRS accuracy, in terms of delta R2, when applying the sparse component of the SuSiE-inf posterior effect sizes versus the sparse component of the minPIP-inf posterior effect sizes as weights; similarly for FINEMAP-inf and minPIP-inf. PRS were computed for 2 out-of-sample cohorts and 7 traits. For descriptions of PRS weights, see Methods. Numerical results are available in Supplementary Tables 11–12.

Extended Data Fig. 8 AK3 locus for Plt.

A 4-kbp window near the AK3 gene is shown on the plot. GWAS -log10 p-values from BOLT-LMM for the trait platelet count (Plt) are plotted on the top panel; PIPs from 4 fine-mapping methods and 2 aggregated methods are plotted on the subsequent panels. Variants rs12005199 and rs409950 are highlighted with dashed lines.

Extended Data Fig. 9 PCSK9 locus for LDLC.

A 23-kbp window at the PCSK9 gene location is shown on the plot. GWAS -log10 p-values from BOLT-LMM for the trait low-density lipoprotein cholesterol (LDLC) are plotted on the top panel; PIPs from 4 fine-mapping methods and 2 aggregated methods are plotted on the subsequent panels. The well-known putative causal variant rs11591147 is highlighted with a dashed line, as are two intronic variants: rs499883 and rs7552841.

Extended Data Fig. 10 PRS comparison with standard errors.

a-b. Points represent the same values as in Fig. 5; error bars represent 0.95-level confidence intervals of delta R2. First, standard errors for the R2 of Model 0 and Model 1 (defined in Methods) are computed separately using the R function CI.rsq, then combined into the SE (standard error) of delta R2 by taking the square root of the sum of their squares. Data are presented as delta R2 +/- SE for both axes.
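The combination step described in this caption, taking the square root of the sum of the squared standard errors, can be sketched as follows. The function name, the sign convention (Model 1 minus Model 0) and the numerical inputs are assumptions for illustration; the per-model standard errors would come from a routine such as the R function CI.rsq named in the caption.

```python
import math


def delta_r2_with_se(r2_model0, se_model0, r2_model1, se_model1):
    """Return (delta_r2, se_delta) from two models' R2 and standard errors.

    ASSUMPTION: delta R2 is taken as Model 1 minus Model 0.
    The combined SE is sqrt(se0^2 + se1^2), as described in the caption.
    """
    delta = r2_model1 - r2_model0
    se_delta = math.sqrt(se_model0 ** 2 + se_model1 ** 2)
    return delta, se_delta


# Hypothetical R2 values and standard errors for the two models.
delta, se_delta = delta_r2_with_se(0.10, 0.003, 0.12, 0.004)
print(f"delta R2 = {delta:.3f} +/- {se_delta:.3f}")
```

Summing squared standard errors in this way treats the two R2 estimates as independent, so the covariance between them does not enter the combined SE.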

Supplementary information

Supplementary Information

Supplementary Notes 1–6 and Figs. 1–7.

Reporting Summary

Peer Review File

Supplementary Tables

Numerical results for all figures in the manuscript as well as a few data tables referenced in the manuscript.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cui, R., Elzur, R.A., Kanai, M. et al. Improving fine-mapping by modeling infinitesimal effects. Nat Genet 56, 162–169 (2024). https://doi.org/10.1038/s41588-023-01597-3

  • DOI: https://doi.org/10.1038/s41588-023-01597-3
