Analysis

Signatures of negative selection in the genetic architecture of human complex traits


We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits.
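The abstract does not spell out the model, so the following is only a rough sketch under stated assumptions. It assumes a point-normal mixture in which a proportion `pi` of SNPs have nonzero effects and the effect variance scales as `[2p(1-p)]^S`, where `p` is the minor allele frequency. The names `S`, `pi` and `sigma2`, the uniform MAF spectrum, and all numeric values are illustrative choices, not estimates from the study. With a negative `S`, rarer variants tend to carry larger effects, which is the kind of pattern the abstract associates with negative selection. The last step recovers `S` with a simple log-log regression, which is a stand-in for, not a reproduction of, the Bayesian estimation described above.

```python
import math
import random


def simulate_effects(mafs, s, pi, sigma2, rng):
    """Draw one effect per SNP from an assumed point-normal mixture.

    With probability pi the effect is N(0, sigma2 * [2p(1-p)]^s),
    otherwise it is exactly zero. All parameters are hypothetical.
    """
    effects = []
    for p in mafs:
        het = 2.0 * p * (1.0 - p)  # expected heterozygosity at this SNP
        if rng.random() < pi:
            effects.append(rng.gauss(0.0, math.sqrt(sigma2 * het ** s)))
        else:
            effects.append(0.0)
    return effects


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
# Illustrative MAF spectrum; real data are far more skewed toward rare alleles.
mafs = [rng.uniform(0.01, 0.5) for _ in range(100_000)]
effects = simulate_effects(mafs, s=-0.5, pi=0.06, sigma2=1e-4, rng=rng)

# Among SNPs with nonzero effects, log(effect^2) is linear in log(2p(1-p))
# with slope S plus noise, so the regression slope is a crude estimate of S.
nonzero = [(p, b) for p, b in zip(mafs, effects) if b != 0.0]
x = [math.log(2.0 * p * (1.0 - p)) for p, _ in nonzero]
y = [math.log(b * b) for _, b in nonzero]
s_hat = ols_slope(x, y)
polygenicity = len(nonzero) / len(mafs)
print(f"estimated S: {s_hat:.2f}, proportion nonzero: {polygenicity:.3f}")
```

In real data the true effects are not observed, which is why a joint model of heritability, polygenicity and the effect size to MAF relationship is needed instead of a direct regression like this one.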






Acknowledgements

We thank The University of Queensland’s Research Computing Centre (RCC) for its support in this research. We thank F. Zhang for building the website for the software tool GCTB. This research was supported by the Australian Research Council (DP160101343, DP160101056, DP160103860, and DP160102400), the Australian National Health and Medical Research Council (1107258, 1078901, 1078037, 1083656, 1078399, 1046880, and 1113400), the US National Institutes of Health (MH100141, GM099568, ES025052, and AG042568), and the Sylvia & Charles Viertel Charitable Foundation (Senior Medical Research Fellowship). R.d.V. acknowledges funding from an ERC consolidator grant (647648 EdGe, awarded to Philipp Koellinger). This study makes use of data from dbGaP (accessions: phs000090 and phs000091), the UK10K project (EGA accessions: EGAS00001000108 and EGAS00001000090), and the UK Biobank Resource (application number: 12514). A full list of acknowledgements for these datasets can be found in part 19 of the Supplementary Note.

Author information

J.Y., P.M.V., and R.d.V. conceived the study. J.Y., J.Z., and P.M.V. designed the experiment. J.Z. derived the analytical methods, conducted all analyses, and developed the software with assistance and guidance from J.Y., Y.W., M.R.R., L.R.L.-J., L.Y., C.X.Y., A.X., and J.S. L.R.L.-J., A.F.M., J.E.P., G.W.M., A.M., T.E., G.G., N.R.W., and P.M.V. provided the CAGE data. J.Z. and J.Y. wrote the manuscript with the participation of all authors. All authors reviewed and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Jian Yang.

Supplementary information

Supplementary Text, Figures and Tables

Supplementary Figures 1–26, Supplementary Tables 1–9, and Supplementary Note

Reporting Summary


Fig. 1: Estimation of the genetic architecture parameters for a simulated trait using the ARIC + GENEVA data.
Fig. 2: Posterior distributions of the genetic architecture parameters for height versus BMI using data from UKB.
Fig. 3: BayesS estimates of the genetic architecture parameters for the UKB traits.
Fig. 4: cGVE by SNPs with MAF smaller than a threshold on the x axis.
Fig. 5: Forward simulations with different types of selection.