High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability

Abstract

Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequencing data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified linkage disequilibrium score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.

Fig. 1: ASMC accuracy in coalescent simulations.
Fig. 2: Running time of ASMC.
Fig. 3: Genome-wide scan for recent positive selection in the UK Biobank dataset.
Fig. 4: S-LDSC analysis of ASMCavg background selection annotation and disease heritability.

Acknowledgements

We thank P.-R. Loh for suggesting several coding improvements for the ASMC software, and for support with the phasing and processing of the UK Biobank data; S. Gazal for support with the S-LDSC analysis and the baselineLD model; I. Shlyakhter for support with the COSI2 simulator; Y. Field for support with the simulation setup in the analysis of recent positive selection; D. Reich for providing computational resources; H. Finucane, Y. Reshef and S. Gusev for helpful discussions. This research was conducted using publicly available datasets (see URLs): the UK Biobank Resource under Application #16549, and the Genome of the Netherlands resource under Application #2017149. We thank the participants of the UK Biobank and the Genome of the Netherlands projects. P.F.P. and A.L.P. were supported by NIH grants R01 MH101244, R01 HG006399 and R01 GM105857; J.T. and Y.S.S. were supported in part by NIH grant R01 GM094402 and a Packard Fellowship for Science and Engineering; Y.S.S. is a Chan Zuckerberg Biohub investigator.

Author information

Contributions

P.F.P. and A.L.P. conceived the study and analyzed results. P.F.P. developed the ASMC algorithm, performed simulations and data analysis. J.T. and Y.S.S. developed the CSFS model used in the SMC++ and ASMC algorithms. P.F.P. and A.L.P. wrote the manuscript with comments from J.T. and Y.S.S.

Corresponding authors

Correspondence to Pier Francesco Palamara or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Tables 1–13 and Supplementary Note

Reporting Summary

About this article

Cite this article

Palamara, P.F., Terhorst, J., Song, Y.S. et al. High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50, 1311–1317 (2018). https://doi.org/10.1038/s41588-018-0177-x
