From genome-wide associations to candidate causal variants by statistical fine-mapping

Abstract

Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.

Fig. 1: Flow of a typical process from initial GWAS to annotation of SNPs selected from fine-mapping analyses.
Fig. 2: Hypothetical examples of fine-mapping strategies.
Fig. 3: Power of conditional analysis.
Fig. 4: Posterior probability for a single causal SNP when 5–40 SNPs are in a region of interest.
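Fig. 4 concerns the posterior probability for a single causal SNP in a region of interest. The sketch below shows one way to compute such probabilities. It assumes Wakefield's approximate Bayes factor (reference 121) with a normal prior of variance W on the true effect, exactly one causal SNP in the region, and equal prior probability of causality for every SNP. Credible sets are built by ranking SNPs, broadly in the spirit of Maller et al. (reference 50). The function names, the default W = 0.04 and the example effect estimates are hypothetical and chosen only for illustration. This is not the authors' implementation or data.

```python
import math
from typing import List, Sequence


def log_abf(beta_hat: float, se: float, prior_var: float = 0.04) -> float:
    """Log of Wakefield's approximate Bayes factor (association vs. null).

    beta_hat, se: estimated effect (e.g. log-odds ratio) and its standard error.
    prior_var: variance W of the normal prior on the true effect; the default
    0.04 (prior SD 0.2 on the log-odds scale) is an illustrative assumption.
    """
    v = se ** 2                      # sampling variance V of beta_hat
    z2 = (beta_hat / se) ** 2        # squared Wald z-statistic
    r = prior_var / (v + prior_var)  # shrinkage factor W / (V + W)
    # log[ sqrt(V / (V + W)) * exp(z^2 * W / (2 (V + W))) ]
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def posterior_probs(log_bfs: Sequence[float]) -> List[float]:
    """Posterior probability that each SNP is the causal one, assuming exactly
    one causal SNP in the region and equal prior probability for every SNP."""
    m = max(log_bfs)                            # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pps: Sequence[float], level: float = 0.95) -> List[int]:
    """Indices of the top-ranked SNPs whose posterior probabilities sum to >= level."""
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen: List[int] = []
    cum = 0.0
    for i in order:
        chosen.append(i)
        cum += pps[i]
        if cum >= level:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical (beta_hat, se) pairs for five SNPs in one region.
    snps = [(0.20, 0.03), (0.19, 0.03), (0.12, 0.03), (0.05, 0.03), (0.01, 0.03)]
    pps = posterior_probs([log_abf(b, s) for b, s in snps])
    print([round(p, 3) for p in pps])
    print("95% credible set:", credible_set(pps))

    # k SNPs carrying identical evidence (e.g. perfect LD) share it equally.
    for k in (5, 10, 20, 40):
        print(k, posterior_probs([log_abf(0.20, 0.03)] * k)[0])
```

Because the posterior probabilities are normalized within the region, SNPs with identical evidence (for example, SNPs in perfect LD) split it equally. With k such SNPs, each receives 1/k, so under these assumptions the posterior probability for any one SNP falls as more highly correlated SNPs are included.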

References

  1. Hardy, J. & Singleton, A. Genomewide association studies and human disease. N. Engl. J. Med. 360, 1759–1768 (2009).

  2. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

  3. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

  4. Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).

  5. Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).

  6. Al Olama, A. A. et al. A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer. Nat. Genet. 46, 1103–1109 (2014).

  7. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).

  8. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  9. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).

  10. MacArthur, D. G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).

  11. Ding, K. & Kullo, I. J. Methods for the selection of tagging SNPs: a comparison of tagging efficiency and performance. Eur. J. Hum. Genet. 15, 228–236 (2007).

  12. Stram, D. Tag SNP selection for association studies. Genet. Epidemiol. 27, 365–374 (2004).

  13. Spain, S. L. & Barrett, J. C. Strategies for fine-mapping complex traits. Hum. Mol. Genet. 24, R111–R119 (2015).

  14. Pasaniuc, B. & Price, A. L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 18, 117–127 (2017). This paper reviews the developments and progress of using summary statistics from genetic association studies to perform joint analyses of genetic variants for use in fine-mapping and to perform transcriptome-wide association studies (TWAS).

  15. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).

  16. Manolio, T. A. Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363, 166–176 (2010).

  17. Pe’er, I., Yelensky, R., Altshuler, D. & Daly, M. J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 32, 381–385 (2008).

  18. van de Bunt, M., Cortes, A., Brown, M. A., Morris, A. P. & McCarthy, M. I. Evaluating the performance of fine-mapping strategies at common variant GWAS loci. PLoS Genet. 11, e1005535 (2015). Based on extensive simulations, this paper evaluates various factors that influence statistical fine-mapping and provides guidance on the design of fine-mapping studies.

  19. Zaykin, D. V. & Zhivotovsky, L. A. Ranks of genuine associations in whole-genome scans. Genetics 171, 813–823 (2005).

  20. Hedrick, P. W. Gametic disequilibrium measures: proceed with caution. Genetics 117, 331–341 (1987).

  21. Devlin, B. & Risch, N. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29, 311–322 (1995).

  22. Martin, E. R. et al. SNPing away at complex diseases: analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am. J. Hum. Genet. 67, 383–394 (2000).

  23. Guerreiro, R. J. & Hardy, J. TOMM40 association with Alzheimer disease: tales of APOE and linkage disequilibrium. Arch. Neurol. 69, 1243–1244 (2012).

  24. Slatkin, M. Linkage disequilibrium — understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9, 477–485 (2008).

  25. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).

  26. Li, Y., Willer, C., Sanna, S. & Abecasis, G. Genotype imputation. Annu. Rev. Genomics Hum. Genet. 10, 387–406 (2009).

  27. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  28. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).

  29. Southam, L. et al. The effect of genome-wide association scan quality control on imputation outcome for common variants. Eur. J. Hum. Genet. 19, 610–614 (2011).

  30. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017). This paper applies three complementary Bayesian fine-mapping methods to a large data set and nicely illustrates novel methods and their interpretations, along with strategies for using annotation to interpret fine-mapping results. The supplemental material is particularly informative for computational strategies for Bayesian fine-mapping.

  31. Amos, C. I. et al. The OncoArray Consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol. Biomarkers Prev. 26, 126–135 (2017).

  32. Voight, B. F. et al. The Metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 8, e1002793 (2012).

  33. Parkes, M., Cortes, A., van Heel, D. A. & Brown, M. A. Genetic insights into common pathways and complex relationships among immune-mediated diseases. Nat. Rev. Genet. 14, 661–673 (2013).

  34. Hocking, R. A biometrics invited paper. The analysis and selection of variables in linear regression. Biometrics 32, 1–49 (1976).

  35. Freedman, D. A note on screening regression equations. Am. Statistician 37, 152–155 (1983).

  36. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).

  37. Daly, M., Rioux, J., Schaffner, S., Hudson, T. & Lander, E. High-resolution haplotype structure in the human genome. Nat. Genet. 29, 229–232 (2001).

  38. Wall, J. D. & Pritchard, J. K. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4, 587–597 (2003).

  39. Schwartz, R., Halldorsson, B. V., Bafna, V., Clark, A. G. & Istrail, S. Robustness of inference of haplotype block structure. J. Comput. Biol. 10, 13–19 (2003).

  40. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–288 (1996).

  41. Cho, S., Kim, H., Oh, S., Kim, K. & Park, T. Elastic-net regularization approaches for genome-wide association studies of rheumatoid arthritis. BMC Proc. 3 (Suppl. 7), S25 (2009).

  42. Breheny, P. & Huang, J. Penalized methods for bi-level variable selection. Statist. Interface 2, 369–380 (2009).

  43. Hoggart, C. J., Whittaker, J. C., De Iorio, M. & Balding, D. J. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet. 4, e1000130 (2008).

  44. Ayers, K. L. & Cordell, H. J. SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet. Epidemiol. 34, 879–891 (2010).

  45. Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies, and other large-scale problems. Ann. Appl. Statist. 5, 1780–1815 (2011). This paper provides a Bayesian computational framework to consider a large number of causal variants.

  46. Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).

  47. Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015). This paper links Bayesian fine-mapping using summary statistics and full data and describes an efficient computational approach using only relevant variables for each candidate model.

  48. Wilson, M. A., Iversen, E. S., Clyde, M. A., Schmidler, S. C. & Schildkraut, J. M. Bayesian model search and multilevel inference for SNP association studies. Ann. Appl. Statist. 4, 1342–1364 (2010).

  49. Carlin, B. & Louis, T. Bayesian Methods for Data Analysis 3rd edn (Chapman and Hall/CRC, Boca Raton, FL, USA, 2008).

  50. Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).

  51. Wallace, C. et al. Dissection of a complex disease susceptibility region using a Bayesian stochastic search approach to fine mapping. PLoS Genet. 11, e1005272 (2015).

  52. Wen, X., Lee, Y., Luca, F. & Pique-Regi, R. Efficient integrative multi-SNP association analysis via deterministic approximation of posteriors. Am. J. Hum. Genet. 98, 1114–1129 (2016).

  53. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

  54. Kichaev, G. et al. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics 33, 248–255 (2017).

  55. Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable Bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016). This paper builds on prior developments of Bayesian methods for fine-mapping and develops a computationally efficient method to explore a wide range of models that can include multiple causal variants in regions of interest.

  56. Dadaev, T. et al. Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants. Nat. Commun. https://doi.org/10.1038/s41467-018-04109-8 (2018). This paper illustrates practical approaches to fine-mapping many genomic regions using Bayesian methods and demonstrates the use of quantile regression to evaluate how genomic annotation is associated with SNPs that have a high Bayesian posterior probability of being causally related to prostate cancer.

  57. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014). This is the first of a series of papers regarding PAINTOR software for fine-mapping, allowing multiple causal variants and summary statistics and integrating functional annotations.

  58. Lin, D. Y. & Zeng, D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet. Epidemiol. 34, 60–66 (2010).

  59. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).

  60. Benner, C. et al. Prospects of fine-mapping trait-associated genomic regions by using summary statistics from genome-wide association studies. Am. J. Hum. Genet. 101, 539–551 (2017).

  61. Ntzani, E. E., Liberopoulos, G., Manolio, T. A. & Ioannidis, J. P. Consistency of genome-wide associations across major ancestral groups. Hum. Genet. 131, 1057–1071 (2012).

  62. Marigorta, U. M. & Navarro, A. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet. 9, e1003566 (2013). This paper illustrates that common genetic associations of complex traits are highly conserved across diverse ethnic populations and motivates the application of trans-ethnic analysis.

  63. Li, Y. R. & Keating, B. J. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Med. 6, 91 (2014).

  64. Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).

  65. Asimit, J. L., Hatzikotoulas, K., McCarthy, M., Morris, A. P. & Zeggini, E. Trans-ethnic study design approaches for fine-mapping. Eur. J. Hum. Genet. 24, 1330–1336 (2016). This paper demonstrates that reductions in fine-mapping credible sets are heavily dependent on ancestral composition of contributing studies and emphasizes the importance of trans-ethnic study design.

  66. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).

  67. Wang, X. et al. Comparing methods for performing trans-ethnic meta-analysis of genome-wide association studies. Hum. Mol. Genet. 22, 2303–2311 (2013).

  68. van Rooij, F. J. et al. Genome-wide trans-ethnic meta-analysis identifies seven genetic loci influencing erythrocyte traits and a role for RBPMS in erythropoiesis. Am. J. Hum. Genet. 100, 51–63 (2017).

  69. Franceschini, N. et al. Variant discovery and fine mapping of genetic loci associated with blood pressure traits in Hispanics and African Americans. PLoS ONE 11, e0164132 (2016).

  70. Larson, N. B. et al. Trans-ethnic meta-analysis identifies common and rare variants associated with hepatocyte growth factor levels in the Multi-Ethnic Study of Atherosclerosis (MESA). Ann. Hum. Genet. 79, 264–274 (2015).

  71. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015).

  72. Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011). This paper introduces a Bayesian partition model framework for trans-ethnic fine-mapping by clustering study populations based on genetic similarity in order to account for heterogeneity of allelic effects on a trait.

  73. Cannon, M. E. et al. Trans-ancestry fine mapping and molecular assays identify regulatory variants at the ANGPTL8 HDL-C GWAS locus. G3 7, 3217–3227 (2017).

  74. Magi, R. et al. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum. Mol. Genet. 26, 3639–3650 (2017).

  75. Yon Rhee, S., Wood, V., Dolinski, K. & Draghici, S. Use and misuse of the gene ontology annotations. Nat. Rev. Genet. 9, 509 (2008).

  76. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  77. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004).

  78. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  79. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  80. Pennisi, E. ENCODE project writes eulogy for junk DNA. Science 337, 1159–1161 (2012).

  81. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012). This paper leverages cell-line regulatory annotation to identify disease-relevant cell types and reveals that common genetic trait associations are enriched in functional DNA.

  82. Ma, M. et al. Disease-associated variants in different categories of disease located in distinct regulatory elements. BMC Genomics 16 (Suppl. 8), S3 (2015).

  83. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).

  84. Mudge, J. M. & Harrow, J. The state of play in higher eukaryote gene annotation. Nat. Rev. Genet. 17, 758–772 (2016).

  85. Eilbeck, K., Quinlan, A. & Yandell, M. Settling the score: variant prioritization and Mendelian disease. Nat. Rev. Genet. 18, 599–612 (2017).

  86. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).

  87. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).

  88. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).

  89. Wingender, E., Dietze, P., Karas, H. & Knuppel, R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 24, 238–241 (1996).

  90. Mathelier, A. et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 42, D142–D147 (2014).

  91. Ioannidis, N. et al. FIRE: functional inference of genetic variants that regulate gene expression. Bioinformatics 33, 3895–3901 (2017).

  92. Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).

  93. Sveinbjornsson, G. et al. Weighting sequence variants based on their annotation increases power of whole-genome association studies. Nat. Genet. 48, 314–317 (2016).

  94. Chen, W., McDonnell, S., Thibodeau, S., Tillmans, L. & Schaid, D. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics 204, 933–958 (2016).

  95. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).

  96. Wen, X., Luca, F. & Pique-Regi, R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 11, e1005176 (2015).

  97. Quintana, M. A. et al. Incorporating prior biologic information for high-dimensional rare variant association studies. Hum. Hered. 74, 184–195 (2012).

  98. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).

  99. Millstein, J., Zhang, B., Zhu, J. & Schadt, E. E. Disentangling molecular relationships with a causal inference test. BMC Genet. 10, 23 (2009).

  100. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  101. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).

  102. Zhu, Z. H. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

  103. Battle, A., Brown, C. D., Engelhardt, B. E. & Montgomery, S. B. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  104. Magenis, R. E., Brown, M. G., Lacy, D. A., Budden, S. & LaFranchi, S. Is Angelman syndrome an alternate result of del(15)(q11q13)? Am. J. Med. Genet. 28, 829–838 (1987).

  105. Antonacci, F. et al. Characterization of six human disease-associated inversion polymorphisms. Hum. Mol. Genet. 18, 2555–2566 (2009).

  106. Wu, Y., Zheng, Z., Visscher, P. M. & Yang, J. Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data. Genome Biol. 18, 86 (2017).

  107. Auer, P. L. et al. Guidelines for large-scale sequence-based complex trait association studies: lessons learned from the NHLBI Exome Sequencing Project. Am. J. Hum. Genet. 99, 791–801 (2016).

  108. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

  109. Morrison, A. C. et al. Practical approaches for whole-genome sequence analysis of heart- and blood-related traits. Am. J. Hum. Genet. 100, 205–215 (2017).

  110. Guidugli, L. et al. Assessment of the clinical relevance of BRCA2 missense variants by functional and computational approaches. Am. J. Hum. Genet. 102, 233–248 (2018).

  111. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  112. Haralambieva, I. H. et al. Genome-wide associations of CD46 and IFI44L genetic variants with neutralizing antibody response to measles vaccine. Hum. Genet. 136, 421–435 (2017).

  113. Servin, B. & Stephens, M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 3, e114 (2007).

  114. Guan, Y. & Stephens, M. Practical issues in imputation-based association mapping. PLoS Genet. 4, e1000279 (2008).

  115. Stephens, M. A unified framework for association analysis with multiple related phenotypes. PLoS ONE 8, e65245 (2013).

  116. Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).

  117. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).

  118. Quintana, M. A. & Conti, D. V. Integrative variable selection via Bayesian model uncertainty. Stat. Med. 32, 4938–4953 (2013).

  119. Quintana, M. A., Berstein, J. L., Thomas, D. C. & Conti, D. V. Incorporating model uncertainty in detecting rare variants: the Bayesian risk index. Genet. Epidemiol. 35, 638–649 (2011).

  120. Jostins, L. & McVean, G. Trinculo: Bayesian and frequentist multinomial logistic regression for genome-wide association studies of multi-category phenotypes. Bioinformatics 32, 1898–1900 (2016).

  121. Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2008).

Acknowledgements

This research was supported by the US Public Health Service and National Institutes of Health (contract grant number GM065450).

Reviewer information

Nature Reviews thanks D. Conti and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

All authors contributed to researching content for the article, discussing content and writing. D.J.S. was responsible for reviewing and editing the manuscript before submission.

Corresponding author

Correspondence to Daniel J. Schaid.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

BayesFM: https://sourceforge.net/projects/bayesfm-mcmc-v1-0/

BIMBAM v1.0: http://www.haplotype.org/bimbam.html

BVS v4.12.1: https://cran.r-project.org/web/packages/BVS

CAVIAR/eCAVIAR: http://genetics.cs.ucla.edu/caviar/

CAVIARBF v0.2.1: https://bitbucket.org/Wenan/caviarbf

DAP v1.0.0: https://github.com/xqwen/dap

fgwas v0.3.6: https://github.com/joepickrell/fgwas

FINEMAP v1.1: http://www.christianbenner.com/

Fine-mapping: https://github.com/hailianghuang/Fine-mapping

FM-QTL: https://github.com/xqwen/fmeqtl

JAM in R2BGLiMS v0.1: https://github.com/pjnewcombe/R2BGLiMS

mvBIMBAM v1.0.0: http://stephenslab.uchicago.edu/software.html#mvbimbam

PAINTOR v3.0: https://github.com/gkichaev/PAINTOR_V3.0

piMASS v0.9: http://www.haplotype.org/pimass.html

SNPTEST v2.5.4-beta3: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html

Trinculo: https://sourceforge.net/projects/trinculo/

Supplementary information

Glossary

Genome-wide association studies

(GWAS). Scans of genetic markers, typically single-nucleotide polymorphisms (SNPs), across DNA of many subjects to find variants statistically associated with a complex trait.

Complex traits

Either quantitative traits (for example, blood pressure and height) or common diseases (for example, major cancers) that are caused by many genetic and environmental factors working together, each having a relatively small effect and few, if any, being absolutely required for disease to occur.

Tag SNPs

Single-nucleotide polymorphisms (SNPs) that are sufficiently correlated with neighbouring SNPs such that the tag SNP serves as a surrogate for unmeasured SNPs.

Linkage disequilibrium

(LD). Nonrandom association of alleles at different loci on a haplotype in a given population. LD is key to fine-mapping because coinheritance without recombination of alleles from different variants implies that the variants are proximal on the same chromosome.
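
To make the definition concrete, the following minimal sketch (an illustration, not part of the original glossary) estimates the two standard pairwise LD measures, D and r², from phased haplotypes. It assumes two biallelic loci with alleles coded 0/1; the helper name `ld_r2` is hypothetical.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (D, r^2) for two biallelic loci from phased haplotypes.

    Each haplotype is a pair (allele at locus 1, allele at locus 2), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at locus 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                          # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d, d * d / denom


# The two alleles always travel together: complete LD.
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # (0.25, 1.0)
# All four haplotypes equally common: no LD.
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # (0.0, 0.0)
```

An r² near 1 means one SNP is an almost perfect surrogate for the other, which is why a tag SNP can carry an association signal for an unmeasured causal variant.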

Causal variants

Genetic variants that mechanistically contribute to diseases or quantitative traits but are not fully penetrant in the sense that the variant may not be a sufficient cause in isolation.

Fine-mapping

The refinement of the genomic localization of causal variants using statistical, bioinformatic or functional methods.

Penalized regression

A way to estimate regression coefficients by maximizing the log-likelihood of the data while placing a penalty that constrains the size of the regression coefficients, shrinking small coefficients towards zero and sometimes setting them exactly to zero. Although this biases the coefficient estimates, it can improve the overall predictive performance of the model by reducing the variance of those estimates.
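
As one concrete instance (our choice for illustration; the glossary does not prescribe a specific penalty), the sketch below fits a lasso, or L1-penalized, linear regression by cyclic coordinate descent, minimizing (1/2n)‖y − Xβ‖² + λ‖β‖₁, which is a Gaussian negative log-likelihood plus an L1 penalty. The function names are hypothetical, and the code assumes centred data so that no intercept is needed. The soft-thresholding step is what sets small coefficients exactly to zero.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; return exactly 0 if |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float],
                             lam: float, n_iter: int = 100) -> List[float]:
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes y and the columns of X are centred, so no intercept is fitted.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta initialised to zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of predictor j with the partial residual that excludes predictor j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta


# Two orthogonal, centred predictors; y = 3*x1 + 0.2*x2 with no noise.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.2, -2.8, 2.8, -3.2]
print(lasso_coordinate_descent(X, y, lam=0.0))  # close to [3.0, 0.2] (no penalty)
print(lasso_coordinate_descent(X, y, lam=0.5))  # close to [2.5, 0.0] (small effect zeroed)
```

With λ = 0.5 the large coefficient is shrunk (biased) from 3.0 to 2.5 while the small one is removed entirely, illustrating the bias-for-variance trade described above.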

Summary statistics

Measures of statistical association between a trait and one or more single-nucleotide polymorphisms (SNPs) that summarize the size of effects of the SNPs on the trait, the variances of the effect sizes and how the effect sizes are correlated among themselves. For case–control studies, summary statistics include the estimated log-odds ratios from logistic regression, the variances of the log-odds ratios and the correlations among the log-odds ratios.
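
A minimal sketch of how per-SNP summary statistics from a case–control study are typically turned into a Wald z statistic and a two-sided P value, assuming a normal approximation for the estimated log-odds ratio. The function name `wald_summary` and the input values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_summary(log_or: float, variance: float) -> Tuple[float, float, float]:
    """Return (odds ratio, z statistic, two-sided P value) for one SNP.

    Assumes the estimated log-odds ratio is approximately normally distributed.
    """
    se = math.sqrt(variance)                # standard error of the log-odds ratio
    z = log_or / se                         # Wald statistic
    p = 2.0 * NormalDist().cdf(-abs(z))     # two-sided P value
    return math.exp(log_or), z, p


# Hypothetical SNP: log-odds ratio 0.10 with variance 0.0004 (standard error 0.02).
odds_ratio, z, p = wald_summary(0.10, 0.0004)
print(f"OR={odds_ratio:.3f}, z={z:.2f}, P={p:.2e}")
```

Many summary-statistic fine-mapping methods work with such z statistics for every SNP in a region together with the correlations among them.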

Trans-ethnic

A type of genetic association study that includes subjects from more than one ethnic background.

Multiple testing correction

When testing more than one statistical association, the probability of declaring at least one significant result by chance alone (when no true associations exist) increases as the number of statistical tests increases. If each of m independent statistical tests uses P value < α to declare significance, then the chance that at least one of the m tests is found to be significant is 1 − (1 − α)^m, which is approximately mα when mα is small. Multiple testing correction keeps this overall chance of declaring at least one significant result at or below α by using more stringent P value thresholds for each association tested. The Bonferroni correction uses P value < α/m to test each association.
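
The arithmetic can be checked with a short sketch (function names are hypothetical). It assumes independent tests, as in the definition, and shows that the approximation mα is only useful when mα is small.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one of m independent null tests has P < alpha)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test P value threshold that keeps the family-wise rate at or below alpha."""
    return alpha / m


# Exact family-wise rate versus the m*alpha approximation at alpha = 0.05.
for m in (1, 10, 100):
    print(m, round(familywise_error_rate(0.05, m), 3), round(0.05 * m, 2))

# Bonferroni for one million tests gives the familiar 5e-8 genome-wide threshold.
print(bonferroni_threshold(0.05, 1_000_000))
```

For m = 100 the approximation mα exceeds 1, whereas the exact rate stays just below 1.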

Statistical power

The probability of correctly rejecting a null hypothesis of no statistical association between a single-nucleotide polymorphism (SNP) and a trait when in truth a statistical association exists. Power depends on the magnitude of the SNP effect, the sample size and the P value threshold for deciding statistical significance.
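
A rough power calculation, sketched under stated assumptions: a 1-df association test for a quantitative trait, a normal approximation, and a noncentrality parameter of N·q²/(1 − q²) for a SNP explaining a fraction q² of trait variance. These modelling choices and the function names are ours, not the article's; the point is that all three determinants named above (effect size, sample size and P value threshold) enter the calculation.

```python
import math
from statistics import NormalDist


def power_from_ncp(ncp: float, alpha: float) -> float:
    """Power of a two-sided 1-df z (chi-square) test with noncentrality ncp."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)
    phi = NormalDist().cdf
    # Probability that the test statistic lands in either rejection tail.
    return phi(shift - z_crit) + phi(-shift - z_crit)


def snp_power(n: int, var_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power for a SNP explaining `var_explained` of trait variance.

    Assumes noncentrality n * q^2 / (1 - q^2) with q^2 = var_explained.
    """
    ncp = n * var_explained / (1.0 - var_explained)
    return power_from_ncp(ncp, alpha)


# A SNP explaining 0.1% of trait variance, tested at a genome-wide threshold.
for n in (5_000, 20_000, 50_000):
    print(n, round(snp_power(n, 0.001), 3))
```

When there is no effect (noncentrality zero) the "power" equals α, the false-positive rate, as expected.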

Haplotype

A combination of alleles found on the same chromosome.

Haplotype block

A set of highly associated alleles on a chromosome that tend to be inherited together.

Genotype imputation

A method for estimating (imputing) the unobserved genotypes of study subjects, both for individuals with missing or unreliable genotypes at a genotyped single-nucleotide polymorphism (SNP) and for all individuals at an ungenotyped SNP.

Recombination hot spots

Genomic regions where the rate of recombination is much higher than the neutral expectation.

Cross-validation

A technique to build a prediction model by randomly partitioning the sample into a training set to train the model (for example, determining which single-nucleotide polymorphisms (SNPs) to include in a model) and a test set to measure its predictive performance (for example, average squared prediction error). It is common to split the original sample into ten equally sized subsamples, use nine to train and one to test, repeat this process ten times such that each of the ten subsamples is used once as the test sample, and then average the predictive performance over the ten test subsamples.
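
The ten-fold procedure can be sketched generically as follows. The `fit` and `predict` callables stand in for whatever model is being tuned, and the toy mean-only model is purely illustrative. All names here are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Randomly split 0..n-1 into k folds whose sizes differ by at most one."""
    if n < k:
        raise ValueError("need at least k observations")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_mse(xs: Sequence, ys: Sequence[float],
                        fit: Callable, predict: Callable,
                        k: int = 10, seed: int = 0) -> float:
    """Average squared prediction error over the k held-out test folds."""
    fold_mse = []
    for test in k_fold_indices(len(ys), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])  # train on k-1 folds
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in test]  # score the held-out fold
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


def fit_mean(xs: Sequence, ys: Sequence[float]) -> float:
    """Toy model that ignores xs and returns the training-set mean of ys."""
    return sum(ys) / len(ys)


def predict_mean(model: float, x: object) -> float:
    return model


xs = list(range(20))
ys = [float(x % 3) for x in xs]
print(cross_validated_mse(xs, ys, fit_mean, predict_mean, k=10))
```

Comparing this cross-validated error across candidate models (for example, different SNP sets or penalty strengths) is how the training/test split guides model choice.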

Prior probability

In Bayesian probability theory, the probability distribution assigned to parameters of interest, specified to represent prior knowledge of their values before observing the data.

Posterior probability

In Bayesian probability theory, the updated probability distribution of parameters of interest, conditional on the observed data.

Posterior inclusion probability

(PIP). The marginal probability that a single-nucleotide polymorphism (SNP) is included in any causal model, conditional on the observed data, thereby providing weight of evidence that a SNP should be included as potentially causative.
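
The definition amounts to summing posterior model probabilities over every causal model that contains a given SNP. The minimal sketch below uses made-up posterior probabilities for two hypothetical SNPs, `snpA` and `snpB`; the function name is also hypothetical.

```python
from typing import Dict, FrozenSet


def posterior_inclusion_probabilities(
        model_posteriors: Dict[FrozenSet[str], float]) -> Dict[str, float]:
    """Sum posterior model probabilities over every model that contains each SNP."""
    total = sum(model_posteriors.values())  # renormalise in case inputs do not sum to 1
    pip: Dict[str, float] = {}
    for model, prob in model_posteriors.items():
        for snp in model:
            pip[snp] = pip.get(snp, 0.0) + prob / total
    return pip


# Hypothetical posterior probabilities of all causal models for two SNPs.
posteriors = {
    frozenset(): 0.1,
    frozenset({"snpA"}): 0.5,
    frozenset({"snpB"}): 0.2,
    frozenset({"snpA", "snpB"}): 0.2,
}
print(posterior_inclusion_probabilities(posteriors))  # snpA ≈ 0.7, snpB ≈ 0.4
```

PIPs need not sum to 1 across SNPs. Their sum (1.1 here) equals the posterior expected number of causal SNPs in the region.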

Expression quantitative trait loci

(eQTLs). Genomic regions that harbour one or more nucleotide variants that influence the amount of expression of a gene.

About this article

Cite this article

Schaid, D.J., Chen, W. & Larson, N.B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 19, 491–504 (2018). https://doi.org/10.1038/s41576-018-0016-z
