Review Article | Published:

From genome-wide associations to candidate causal variants by statistical fine-mapping

Nature Reviews Genetics volume 19, pages 491–504 (2018)

Abstract

Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.

Acknowledgements

This research was supported by the US Public Health Service and National Institutes of Health (contract grant number GM065450).

Reviewer information

Nature Reviews thanks D. Conti and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Affiliations

  1. Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA

    • Daniel J. Schaid
    • Nicholas B. Larson
  2. Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN, USA

    • Wenan Chen

Contributions

All authors contributed to researching content for the article, discussing content and writing. D.J.S. was responsible for reviewing and editing the manuscript before submission.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Daniel J. Schaid.

Glossary

Genome-wide association studies

(GWAS). Scans of genetic markers, typically single-nucleotide polymorphisms (SNPs), across DNA of many subjects to find variants statistically associated with a complex trait.

Complex traits

Either quantitative traits (for example, blood pressure and height) or common diseases (for example, major cancers) that are caused by many genetic and environmental factors working together, each having a relatively small effect and few, if any, being absolutely required for disease to occur.

Tag SNPs

Single-nucleotide polymorphisms (SNPs) that are sufficiently correlated with neighbouring SNPs such that the tag SNP serves as a surrogate for unmeasured SNPs.

Linkage disequilibrium

(LD). Nonrandom association of alleles at different loci on a haplotype in a given population. LD is key to fine-mapping because coinheritance without recombination of alleles from different variants implies that the variants are proximal on the same chromosome.
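As a rough illustration of how LD is commonly quantified, the sketch below computes the coefficient D and the two usual normalized measures, |D′| and r², for a pair of biallelic loci. The function name `ld_measures` and its arguments are hypothetical, and the haplotype and allele frequencies are assumed to be known rather than estimated from genotypes.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, |D'| and r^2 for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    # Alleles with unequal frequencies can have |D'| = 1 while r^2 stays well below 1.
    print(ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5))
```

The two normalized measures answer different questions: |D′| reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when each allele at one locus perfectly predicts an allele at the other.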

Causal variants

Genetic variants that mechanistically contribute to diseases or quantitative traits but are not fully penetrant in the sense that the variant may not be a sufficient cause in isolation.

Fine-mapping

To refine the genomic localization of causal variants by the use of statistical, bioinformatic or functional methods.

Penalized regression

A way to estimate regression coefficients by maximizing the log-likelihood of the data while placing a penalty that constrains the size of the regression coefficients, shrinking small coefficients towards zero, sometimes exactly to zero. Although this causes coefficient estimates to be biased, it improves the overall prediction of the model by decreasing the variance of the coefficient estimates.
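The shrinkage behaviour can be seen in the simplest special case. The sketch below assumes a linear model with orthonormal predictors, where the penalized estimate of each coefficient has a closed form in terms of its unpenalized least-squares estimate `b`. The function names and the penalty weight `lam` are illustrative; realistic fine-mapping data with correlated SNPs would need an iterative solver instead.

```python
def soft_threshold(b: float, lam: float) -> float:
    """Lasso (L1) estimate for one coefficient under an orthonormal design.

    Minimizes 0.5 * (b - beta)**2 + lam * abs(beta) over beta.
    Estimates with |b| <= lam are set exactly to zero.
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b: float, lam: float) -> float:
    """Ridge (L2) estimate for one coefficient under an orthonormal design.

    Minimizes 0.5 * (b - beta)**2 + 0.5 * lam * beta**2 over beta.
    Every estimate is scaled towards zero, but none becomes exactly zero.
    """
    return b / (1.0 + lam)


if __name__ == "__main__":
    for b in (-2.0, -0.3, 0.1, 0.8, 2.5):
        print(b, soft_threshold(b, 0.5), ridge_shrink(b, 0.5))
```

The contrast between the two functions is the reason an L1 penalty performs variable selection while an L2 penalty only shrinks.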

Summary statistics

Measures of statistical association between a trait and one or more single-nucleotide polymorphisms (SNPs) that summarize the size of effects of the SNPs on the trait, the variances of the effect sizes and how the effect sizes are correlated among themselves. For case–control studies, summary statistics include the estimated log-odds ratios from logistic regression, the variances of the log-odds ratios and the correlations among the log-odds ratios.
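For a single SNP in a case–control study, the first two summary statistics can be illustrated with a 2 × 2 table of allele counts. This is a simplified sketch rather than a fitted logistic regression with covariates; the function name and the counts are made up for illustration, and the correlations among the log-odds ratios of different SNPs are not covered here.

```python
from math import log, sqrt


def log_odds_ratio(case_alt: int, case_ref: int, control_alt: int, control_ref: int):
    """Return (log-odds ratio, variance) from a 2 x 2 table of allele counts.

    The variance is the usual large-sample estimate: the sum of the
    reciprocals of the four cell counts.
    """
    counts = (case_alt, case_ref, control_alt, control_ref)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = log((case_alt * control_ref) / (case_ref * control_alt))
    variance = sum(1.0 / c for c in counts)
    return log_or, variance


if __name__ == "__main__":
    # Hypothetical counts: alternative and reference alleles in cases and controls.
    log_or, variance = log_odds_ratio(30, 70, 20, 80)
    z_score = log_or / sqrt(variance)  # Wald statistic for the association
    print(log_or, variance, z_score)
```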

Trans-ethnic

A type of genetic association study that includes subjects from more than one ethnic background.

Multiple testing correction

When testing more than one statistical association, the probability of declaring at least one falsely significant result increases as the number of statistical tests increases. If each of m independent statistical tests uses P value < α to declare significance, then the chance that at least one of the m tests is found to be significant when no true association exists is approximately $1 - (1 - \alpha)^m$. Multiple testing correction maintains the overall chance of declaring at least one significant result at α by using more stringent P value thresholds for each association tested. The Bonferroni correction uses P value < α/m to test each association.
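A small numerical sketch of this definition, assuming m independent tests with no true associations. The function names are illustrative only.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one significant result among m independent null tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test P value threshold that keeps the overall error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 100):
        uncorrected = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
        print(m, round(uncorrected, 4), round(corrected, 4))
    # With one million tests, the Bonferroni per-test threshold is alpha / 1e6.
    print(bonferroni_threshold(alpha, 1_000_000))
```

Without correction the overall error rate climbs quickly with m, whereas with the Bonferroni threshold it stays just below α.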

Statistical power

The probability of correctly rejecting a null hypothesis of no statistical association between a single-nucleotide polymorphism (SNP) and a trait when in truth a statistical association exists. Power depends on the magnitude of the SNP effect, the sample size and the P value threshold for deciding statistical significance.
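How these three factors combine can be sketched with a normal approximation. The example assumes a quantitative trait, an additive SNP effect `beta` per copy of the minor allele, Hardy–Weinberg genotype frequencies, and a residual standard deviation `sigma`; the function name `gwas_power` and the default threshold are assumptions made for illustration, not a prescribed method.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(beta: float, maf: float, n: int,
               alpha: float = 5e-8, sigma: float = 1.0) -> float:
    """Approximate two-sided power of a single-SNP test.

    Under the stated assumptions the test statistic is roughly normal with
    unit variance and mean beta * sqrt(2 * n * maf * (1 - maf)) / sigma.
    """
    std_normal = NormalDist()
    noncentrality = beta * sqrt(2 * n * maf * (1 - maf)) / sigma
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(noncentrality - z_crit) + std_normal.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    for n in (5_000, 20_000, 80_000):
        print(n, gwas_power(beta=0.05, maf=0.3, n=n))
```

Power rises with the effect size, the sample size and the minor allele frequency, and falls as the P value threshold is made more stringent.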

Haplotype

A combination of alleles found on the same chromosome.

Haplotype block

A set of highly associated alleles on a chromosome that tend to be inherited together.

Genotype imputation

A method for estimating (imputing) the unobserved genotypes of study subjects, both for individuals with missing or unreliable genotypes at a genotyped single-nucleotide polymorphism (SNP) and for all individuals at an ungenotyped SNP.

Recombination hot spots

Genomic regions where the rate of recombination is much higher than the neutral expectation.

Cross-validation

A technique to build a prediction model by randomly partitioning the sample into a training set to train the model (for example, determining which single-nucleotide polymorphisms (SNPs) to include in a model) and a test set to measure its predictive performance (for example, average squared prediction error). It is common to split the original sample into ten equally sized subsamples, use nine to train and one to test, repeat this process ten times such that each of the ten subsamples is used once as a test sample, and then average the predictive performance over the ten test subsamples.
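The procedure can be sketched in a few lines. Here `fit` and `predict` stand in for any modelling routine, and the mean-only model at the end is a deliberately trivial placeholder; all names are hypothetical.

```python
import random


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list:
    """Randomly partition the indices 0..n-1 into k folds of near-equal size."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k: int = 10, seed: int = 0) -> float:
    """Average squared prediction error over the k held-out folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Train on the other k - 1 folds only.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Measure performance on the fold that was held out.
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


def fit_mean(train_x, train_y):
    """Placeholder model: predict the training mean for every subject."""
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


if __name__ == "__main__":
    xs = list(range(50))
    ys = [0.1 * x for x in xs]
    print(cross_validate(xs, ys, fit_mean, predict_mean, k=10))
```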

Prior probability

In Bayesian probability theory, the probability distribution assigned to parameters of interest, specified to represent prior knowledge of their values before observing the data.

Posterior probability

In Bayesian probability theory, the updated probability distribution of parameters of interest, conditional on the observed data.
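As a minimal sketch of how a prior is updated to a posterior in a fine-mapping setting, the example below assumes that exactly one of the candidate SNPs in a region is causal and that a Bayes factor comparing association against no association is already available for each SNP. The function name and the Bayes factor values are made up for illustration.

```python
def posterior_from_bayes_factors(bayes_factors, priors=None):
    """Posterior probability that each candidate is the single causal variant.

    By Bayes' rule the posterior is proportional to prior * Bayes factor,
    normalized so that the probabilities sum to one over the candidates.
    """
    n = len(bayes_factors)
    if priors is None:
        priors = [1.0 / n] * n  # equal prior weight on every candidate
    weights = [p * bf for p, bf in zip(priors, bayes_factors)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical Bayes factors for four SNPs in one region.
    print(posterior_from_bayes_factors([120.0, 45.0, 3.0, 0.8]))
    # An informative prior (for example, from annotation) shifts the result.
    print(posterior_from_bayes_factors([120.0, 45.0, 3.0, 0.8],
                                       priors=[0.1, 0.7, 0.1, 0.1]))
```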

Posterior inclusion probability

(PIP). The marginal probability that a single-nucleotide polymorphism (SNP) is included in any causal model, conditional on the observed data, thereby providing weight of evidence that a SNP should be included as potentially causative.
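The marginalization in this definition can be sketched directly: given posterior probabilities for a set of candidate causal models, the PIP of a SNP is the sum of the probabilities of the models that contain it. The SNP labels and probabilities below are invented, and the function name is hypothetical.

```python
def posterior_inclusion_probabilities(model_posteriors: dict) -> dict:
    """Sum posterior model probabilities over all models containing each SNP.

    model_posteriors maps a tuple of SNP identifiers (one causal model)
    to that model's posterior probability.
    """
    pip = {}
    for snps, probability in model_posteriors.items():
        for snp in snps:
            pip[snp] = pip.get(snp, 0.0) + probability
    return pip


if __name__ == "__main__":
    # Hypothetical posterior over three candidate models.
    models = {
        ("snpA",): 0.5,
        ("snpB",): 0.2,
        ("snpA", "snpC"): 0.3,
    }
    print(posterior_inclusion_probabilities(models))
```

Because models may contain more than one SNP, the PIPs in a region need not sum to one.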

Expression quantitative trait loci

(eQTLs). Genomic regions that harbour one or more nucleotide variants that influence the amount of expression of a gene.

About this article

DOI

https://doi.org/10.1038/s41576-018-0016-z