Article

High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability

Nature Genetics volume 50, pages 1311–1317 (2018)

Abstract

Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequencing data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified linkage disequilibrium score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

We thank P.-R. Loh for suggesting several coding improvements for the ASMC software, and for support with the phasing and processing of the UK Biobank data; S. Gazal for support with the S-LDSC analysis and the baselineLD model; I. Shlyakhter for support with the COSI2 simulator; Y. Field for support with the simulation setup in the analysis of recent positive selection; D. Reich for providing computational resources; H. Finucane, Y. Reshef and S. Gusev for helpful discussions. This research was conducted using publicly available datasets (see URLs): the UK Biobank Resource under Application #16549, and the Genome of the Netherlands resource under Application #2017149. We thank the participants of the UK Biobank and the Genome of the Netherlands projects. P.F.P. and A.L.P. were supported by NIH grants R01 MH101244, R01 HG006399 and R01 GM105857; J.T. and Y.S.S. were supported in part by NIH grant R01 GM094402 and a Packard Fellowship for Science and Engineering; Y.S.S. is a Chan Zuckerberg Biohub investigator.

Author information

Affiliations

  1. Department of Statistics, University of Oxford, Oxford, UK

    • Pier Francesco Palamara
  2. Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA

    • Pier Francesco Palamara & Alkes L. Price
  3. Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA

    • Pier Francesco Palamara & Alkes L. Price
  4. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Pier Francesco Palamara & Alkes L. Price
  5. Department of Statistics, University of Michigan, Ann Arbor, MI, USA

    • Jonathan Terhorst
  6. Department of Statistics, University of California, Berkeley, Berkeley, CA, USA

    • Yun S. Song
  7. Computer Science Division, University of California, Berkeley, Berkeley, CA, USA

    • Yun S. Song
  8. Chan Zuckerberg Biohub, San Francisco, CA, USA

    • Yun S. Song

Contributions

P.F.P. and A.L.P. conceived the study and analyzed results. P.F.P. developed the ASMC algorithm, performed simulations and data analysis. J.T. and Y.S.S. developed the CSFS model used in the SMC++ and ASMC algorithms. P.F.P. and A.L.P. wrote the manuscript with comments from J.T. and Y.S.S.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Pier Francesco Palamara or Alkes L. Price.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–14, Supplementary Tables 1–13 and Supplementary Note

  2. Reporting Summary

About this article

DOI

https://doi.org/10.1038/s41588-018-0177-x