Genome-wide sequencing of human populations has revealed substantial variation among genes in the intensity of purifying selection acting on damaging genetic variants¹. Although genes under the strongest selective constraint are highly enriched for associations with Mendelian disorders, most of these genes are not associated with disease and therefore the nature of the selection acting on them is not known². Here we show that genetic variants that damage these genes are associated with markedly reduced reproductive success, primarily owing to increased childlessness, with a stronger effect in males than in females. We present evidence that increased childlessness is probably mediated by genetically associated cognitive and behavioural traits, which may mean that male carriers are less likely to find reproductive partners. This reduction in reproductive success may account for 20% of purifying selection against heterozygous variants that ablate protein-coding genes. Although this genetic association may only account for a very minor fraction of the overall likelihood of being childless (less than 1%), especially when compared with more influential sociodemographic factors, it may influence how genes evolve over time.
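Several of the extended data figures below report associations with an individual's s_het burden, which aggregates per-gene selection coefficients against heterozygous damaging variants (s_het) across the genes an individual carries such a variant in. The sketch below shows one way to combine them, assuming a multiplicative fitness model across genes so that burden is 1 − ∏(1 − s_het). This is an illustrative assumption, not a restatement of the study's definition (see its Methods); the function name and example values are hypothetical.

```python
from math import prod

def shet_burden(shet_values):
    """Combine per-gene s_het values into one individual burden.

    ASSUMPTION (for this sketch): fitness effects multiply across genes, so a
    carrier's relative fitness is the product of (1 - s_het) over affected genes
    and the burden is the resulting total reduction in fitness.
    """
    for s in shet_values:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"s_het must lie in [0, 1], got {s}")
    return 1.0 - prod(1.0 - s for s in shet_values)

# Hypothetical individual with damaging variants in two genes whose
# illustrative s_het values are 0.10 and 0.05 (not estimates from this study).
print(round(shet_burden([0.10, 0.05]), 4))  # 0.145
```

Under this model an individual with no qualifying variants has a burden of 0, and a single-gene carrier's burden equals that gene's s_het.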
Raw data produced by this study are available as part of the UK Biobank data returns catalogue with application ID 44165: https://biobank.ndph.ox.ac.uk/ukb/docs.cgi?id=1.
Code used as part of this project to perform phenotype testing, CNV calling and variant quality control, and to generate all main text figures, supplementary figures and supplementary tables, is available on GitHub: https://github.com/HurlesGroupSanger/UKBBFertility. All statistical analyses in this manuscript were performed using R v3.6.0.
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Weghorn, D. et al. Applicability of the mutation-selection balance model to population genetics of heterozygous protein-truncating variants in humans. Mol. Biol. Evol. 36, 1701–1710 (2019).
Darwin, C. The Descent of Man, and Selection in Relation to Sex (A. L. Burt, 1874); https://doi.org/10.5962/bhl.title.16749
Ganna, A. et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat. Neurosci. 19, 1563–1565 (2016).
Männik, K. et al. Copy number variations and cognitive phenotypes in unselected populations. JAMA 313, 2044–2054 (2015).
Huguet, G. et al. Measuring and estimating the effect sizes of copy number variants on general intelligence in community-based samples. JAMA Psychiatry 75, 447–457 (2018).
Ganna, A. et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. Am. J. Hum. Genet. 102, 1204–1211 (2018).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Szustakowski, J. D. et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat. Genet. 53, 942–948 (2021).
Barthold, J. A., Myrskylä, M. & Jones, O. R. Childlessness drives the sex difference in the association between income and reproductive success of modern Europeans. Evol. Hum. Behav. 33, 628–638 (2012).
Dudel, C. & Klüsener, S. Estimating men’s fertility from vital registration data with missing values. Popul. Stud. 73, 439–449 (2019).
Birth Summary Tables, England and Wales 2019 (Office for National Statistics, 2020).
Oud, M. S. et al. A systematic review and standardized clinical validity assessment of male infertility genes. Hum. Reprod. 34, 932–941 (2019).
Bult, C. J. et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 47, D801–D806 (2019).
Lopes, A. M. et al. Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1. PLoS Genet. 9, e1003349 (2013).
Skjaerven, R., Wilcox, A. J. & Lie, R. T. A population-based study of survival and childbearing among female subjects with birth defects and the risk of recurrence in their children. N. Engl. J. Med. 340, 1057–1062 (1999).
Lie, R. T., Wilcox, A. J. & Skjaerven, R. Survival and reproduction among males with birth defects and risk of recurrence in their children. JAMA 285, 755–760 (2001).
Power, R. A. et al. Fecundity of patients with schizophrenia, autism, bipolar disorder, depression, anorexia nervosa, or substance abuse vs their unaffected siblings. JAMA Psychiatry 70, 22–30 (2013).
Allen, M. S. The role of personality in sexual and reproductive health. Curr. Dir. Psychol. Sci. 28, 581–586 (2019).
Buss, D. M. et al. International preferences in selecting mates: a study of 37 cultures. J. Cross. Cult. Psychol. 21, 5–47 (1990).
Pawłowski, B. & Dunbar, R. I. Impact of market value on human mate choice decisions. Proc. Biol. Sci. 266, 281–285 (1999).
Buss, D. M. & Schmitt, D. P. Mate preferences and their behavioral manifestations. Annu. Rev. Psychol. 70, 77–110 (2019).
Fieder, M., Huber, S. & Bookstein, F. L. Socioeconomic status, marital status and childlessness in men and women: an analysis of census data from six countries. J. Biosoc. Sci. 43, 619–635 (2011).
Nettle, D. & Pollet, T. V. Natural selection on male wealth in humans. Am. Nat. 172, 658–666 (2008).
Miettinen, A., Rotkirch, A., Szalma, I., Donno, A. & Tanturri, M.-L. Increasing Childlessness in Europe: Time Trends and Country Differences Working Paper 33 (Family and Societies, 2015).
Jalovaara, M. et al. Education, Gender, and Cohort Fertility in the Nordic Countries. Eur. J. Popul. 35, 563–586 (2019).
Fieder, M. & Huber, S. The effects of sex and childlessness on the association between status and reproductive output in modern society. Evol. Hum. Behav. 28, 392–398 (2007).
GTEx Consortium. The Genotype–Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Trivers, R. in Sexual Selection and the Descent of Man (ed. Campbell, B.) (Aldine, 1972).
Bateman, A. J. Intra-sexual selection in Drosophila. Heredity 2, 349–368 (1948).
Parker, G. A. & Pizzari, T. in Current Perspectives on Sexual Selection 119–163 (Springer, 2015).
Kolk, M. & Barclay, K. Cognitive ability and fertility among Swedish men born 1951–1967: evidence from military conscription registers. Proc. Biol. Sci. 286, 20190359 (2019).
Kendall, K. M. et al. Cognitive performance among carriers of pathogenic copy number variants: analysis of 152,000 UK Biobank subjects. Biol. Psychiatry 82, 103–110 (2017).
Davis, K. A. S. et al. Mental health in UK Biobank—development, implementation and results from an online questionnaire completed by 157,366 participants: a reanalysis. BJPsych Open 6, e18 (2020).
Tyrrell, J. et al. Genetic predictors of participation in optional components of UK Biobank. Nat. Commun. 12, 886 (2021).
Stefansson, H. et al. CNVs conferring risk of autism or schizophrenia affect cognition in controls. Nature 505, 361–366 (2014).
Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 48, 1462–1472 (2016).
Verweij, R. M. et al. Sexual dimorphism in the genetic influence on human childlessness. Eur. J. Hum. Genet. 25, 1067–1074 (2017).
Clark, D. W. et al. Associations of autozygosity with a broad range of human phenotypes. Nat. Commun. 10, 4957 (2019).
Stanley, K. E. et al. Causal genetic variants in stillbirth. N. Engl. J. Med. 383, 1107–1116 (2020).
Kaplanis, J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
Girirajan, S. et al. Phenotypic heterogeneity of genomic disorders and rare copy-number variants. N. Engl. J. Med. 367, 1321–1331 (2012).
Costain, G., Chow, E. W. C., Silversides, C. K. & Bassett, A. S. Sex differences in reproductive fitness contribute to preferential maternal transmission of 22q11.2 deletions. J. Med. Genet. 48, 819–824 (2011).
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Berrington, A. in Demographic Research Monographs 57–76 (Springer, 2017).
Betzig, L. Means, variances, and ranges in reproductive success: comparative evidence. Evol. Hum. Behav. 33, 309–317 (2012).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
Macé, A. et al. New quality measure for SNP array based CNV detection. Bioinformatics 32, 3298–3305 (2016).
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 285 (2002).
Di Angelantonio, E. et al. Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45,000 donors. Lancet 390, 2360–2371 (2017).
Fromer, M. et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am. J. Hum. Genet. 91, 597–607 (2012).
Backenroth, D. et al. CANOES: detecting rare copy number variants from whole exome sequencing data. Nucleic Acids Res. 42, e97 (2014).
Packer, J. S. et al. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data. Bioinformatics 32, 133–135 (2016).
Crawford, K. et al. Medical consequences of pathogenic CNVs in adults: analysis of the UK Biobank. J. Med. Genet. 56, 131–138 (2019).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at https://doi.org/10.1101/148353 (2017).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014).
Kersey, P. J. et al. Ensembl Genomes 2016: more genomes, more complexity. Nucleic Acids Res. 44, D574–D580 (2016).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
Nait Saada, J. et al. Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations. Nat. Commun. 11, 6130 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Balduzzi, S., Rücker, G. & Schwarzer, G. How to perform a meta-analysis with R: a practical tutorial. Evid. Based Ment. Health 22, 153–160 (2019).
Population and Welfare Department. Multi-Generation Register 2016: A Description of Contents and Quality (Statistics Sweden, 2017).
Carlstedt, B. Cognitive Abilities—Aspects of Structure, Process and Measurement. Doctoral thesis, Univ. of Gothenburg (2000).
Hällsten, M. Inequality across three and four generations in egalitarian Sweden: 1st and 2nd cousin correlations in socio-economic outcomes. Res. Soc. Stratif. Mobil. 35, 19–33 (2014).
Mårdberg, B. & Carlstedt, B. Swedish Enlistment Battery (SEB): construct validity and latent variable estimation of cognitive abilities by the CAT‐SEB. Int. J. Sel. 6, 107–114 (1998).
Rönnlund, M., Carlstedt, B., Blomstedt, Y., Nilsson, L.-G. & Weinehall, L. Secular trends in cognitive test performance: Swedish conscript data 1970–1993. Intelligence 41, 19–24 (2013).
We thank L. Parts, J. Kaplanis, M. Przeworski and G. Davey-Smith for useful discussions and advice on data analysis; M. Oud and J. Veltman for helpful discussions regarding infertility; G. Kalantzis and P. F. Palamara for assistance with correcting for recent ancestry; and the INTERVAL study for sharing genotyping and exome data that enabled us to refine our CNV filtering methodology. This work has been funded by core Wellcome funding to the Wellcome Sanger Institute (grant WT098051) and as part of a Medical Research Council (MRC) Centre Grant to the MRC Centre for Neuropsychiatric Genetics and Genomics (grant MR/L010305/1). K.B. is supported by a fellowship from the Bank of Sweden Tercentenary Foundation (Riksbankens Jubileumsfond). This work has been conducted using the UK Biobank Resource under application numbers 14421 (to G.K.) and 44165 (to H.C.M.).
M.E.H. is a founder of, director of, consultant to, and holds shares in, Congenica Ltd and is a consultant to the AstraZeneca Centre for Genomics Research.
Peer review information
Nature thanks Loic Yengo and the other, anonymous reviewers for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 2 Effect size estimate for the association of s_het burden with number of children for individuals with children.
Shown are the effect size estimates for the association of s_het burden with number of children, separated into females (purple) and males (jade), with all childless individuals in the UK Biobank removed. As in main text Fig. 1a, the regression used the raw number of children (live births for females and children fathered for males) rather than a binary indicator of having children. Asterisks indicate significance after Bonferroni correction for 20 tests (P < 2.5 × 10⁻³; Methods).
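The significance threshold quoted in these captions follows from dividing a family-wise α by the number of tests: 0.05/20 = 2.5 × 10⁻³. The α of 0.05 is inferred from that arithmetic rather than stated in the caption. A minimal sketch, with hypothetical function names:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that keeps the family-wise error rate
    at or below `alpha` across `n_tests` tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def is_significant(p_value: float, alpha: float = 0.05, n_tests: int = 20) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

print(f"{bonferroni_threshold(0.05, 20):.1e}")  # 2.5e-03
print(is_significant(1e-3), is_significant(1e-2))  # True False
```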
Shown are odds ratio estimates for the association of s_het burden with having children, stratified by participant age (y-axis) and separated into females (violet) and males (jade). Age range intervals are left-open. Line style indicates whether the estimate comes from s_het burden calculated from deletions (long dash), from PTVs (short dash) or from a fixed effects meta-analysis (solid line). Also shown for reference are the results for all individuals regardless of age (All Ages), which are identical to those shown in main text Fig. 1b. Asterisks indicate significance after Bonferroni correction for 20 tests (P < 2.5 × 10⁻³; Methods).
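The caption above combines the deletion-based and PTV-based estimates with a fixed effects meta-analysis. The sketch below shows the standard inverse-variance-weighted version. It assumes the two estimates are independent and are reported as log odds ratios with standard errors. The study's analyses were run in R, so this Python function is an illustration of the technique, not the authors' code, and the inputs are placeholders rather than study results.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    estimates: per-source effects on the log-odds scale
    std_errors: their standard errors
    Returns (pooled estimate, pooled SE, two-sided P value).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided P from the standard normal; cdf(-|z|) avoids 1 - cdf cancellation.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled, pooled_se, p_value

# Illustrative inputs only: a deletion-based and a PTV-based log odds ratio.
beta, se, p = fixed_effects_meta([log(0.5), log(0.7)], [0.20, 0.10])
print(f"pooled OR = {exp(beta):.3f}, SE(log OR) = {se:.3f}, P = {p:.2g}")
```

The more precise estimate (smaller standard error) dominates the pooled value, which is why the meta-analysed line in such plots usually sits closer to the PTV or deletion estimate with the tighter interval.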
Extended Data Fig. 4 Mediation of the relationship between s_het burden and childlessness by various disorders.
Plotted is the −log10 P value from the deletion and PTV meta-analysis for the association between s_het burden and having children, corrected for each of 1,294 ICD-10 codes drawn from a combination of general practitioner records, hospital episode records and self-reported conditions (left) or from hospital episode records alone (right), shown separately for males (top) and females (bottom). Remaining ICD-10 codes at other levels of the ICD-10 hierarchy are plotted in Supplementary Figs. 8 and 9. Results are ordered first by ICD-10 chapter (x-axis) and then by increasing −log10 P value (y-axis). The arrow for code Z37 indicates that the point lies below the scale of the y-axis, with its −log10 P value given in parentheses. Visual outliers are labelled and do not imply a significant change in the effect size of s_het burden on childlessness.
Extended Data Fig. 5 Mediation of the association between s_het burden and childlessness by various disorders.
Depicted are the results of our primary association between childlessness and individual s_het burden, corrected for the presence or absence of approximately 2,000 different disorders, diseases and health factors queried from (A, B) hospital episode statistics and (C, D) complete health outcomes data, as represented by the ICD-10 medical coding system, separately for (A, C) males and (B, D) females (Methods). The x-axis shows the −log10 P value for the association of s_het burden with having children, corrected for a given diagnostic code. The y-axis shows the −log10 P value for the association of a given medical code with the likelihood of having children; values are placed above or below y = 0 according to the direction of effect, with codes associated with having children above and those associated with not having children below. Codes were chosen for labelling to highlight outliers and not on the basis of any statistical criterion. Points at the top or bottom of plots have −log10 P values ≥ 100. The colour of points and text indicates the ICD-10 chapter. Text labels do not necessarily represent the full official name of a given ICD-10 code.
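The y-axis described above is a signed −log10 P value, clipped so that extreme codes sit at the plot edges. A minimal sketch of that transform follows. It assumes the direction of effect is taken from the odds ratio for having children (above zero if OR > 1, below otherwise) and that the axis is clipped at 100. Both details are read from the caption rather than confirmed, and the function name and `cap` parameter are hypothetical.

```python
from math import log10

def signed_neglog10_p(p_value: float, odds_ratio: float, cap: float = 100.0) -> float:
    """Y-coordinate for one diagnostic code.

    -log10(P) is positive if the code is associated with having children
    (OR > 1) and negative otherwise, clipped to +/-cap so that extreme values
    sit at the top or bottom edge of the plot.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be in [0, 1]")
    # P values that underflow to 0 are treated as maximally significant.
    magnitude = cap if p_value == 0.0 else min(-log10(p_value), cap)
    return magnitude if odds_ratio > 1.0 else -magnitude
```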
Extended Data Fig. 6 Association of eight relevant phenotypes/demographic measures with the likelihood of having children among UK Biobank participants.
Shown are the results of a logistic regression estimating the odds ratio for the relationship of (A) having a partner at home, (B) ever having had sex, (C) completing university, (D) having a severe mental health disorder, (E) household income, (F) fluid intelligence, (G) Townsend deprivation index and (H) engaging in same-sex sexual behaviour with the likelihood of having children, separated into females (violet) and males (jade). 95% confidence intervals are included for all plots but may be invisible at the resolution of the figure. The x-axis scales of plots (A) and (B) differ from those of plots (C–H) owing to the much stronger association of these traits with having children.
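The odds ratios and 95% confidence intervals in these panels come from logistic regression coefficients on the log-odds scale. The sketch below shows that mapping. It assumes Wald intervals (the caption does not state the interval method), and the coefficient and standard error are illustrative values only.

```python
from math import exp
from statistics import NormalDist

def odds_ratio_ci(beta: float, se: float, level: float = 0.95):
    """Convert a logistic-regression coefficient (log odds) and its standard
    error into an odds ratio with a Wald confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

ratio, lower, upper = odds_ratio_ci(beta=0.4, se=0.05)  # illustrative values
print(f"OR {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself. For continuous traits such as income or fluid intelligence, the odds ratio is per unit of the trait, so its magnitude depends on how that trait is scaled.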
Extended Data Fig. 7 Effect size estimates for the association between s_het burden and Townsend Deprivation Index.
Shown are the effect size estimates for the association between s_het burden and Townsend Deprivation Index, separated into females (purple) and males (jade). Units are unnormalized Townsend Deprivation Index values for each individual in the UK Biobank. Asterisks indicate significance after Bonferroni correction for 20 tests (P < 2.5 × 10⁻³; Methods).
Extended Data Fig. 8 The role of individual phenotypes in the relationship between s_het burden, childlessness, and fitness.
(A) Odds ratio estimates (middle) for the association of cumulative deleterious variation (combined deletion and PTV meta-analysis) with childlessness, corrected for combinations of whether a study participant has a mental health (MH) disorder, a partner at home, a university degree or infertility (as ascertained from Complete Health Outcomes Data; Methods), or has ever had sex; the traits included in each model are indicated by coloured boxes on the y-axis (males, jade; females, violet). Stars within boxes indicate nominal (*) or Bonferroni-corrected (**) significance of each covariate's association with childlessness when tested independently, correcting for PTV s_het burden. For all possible combinations of these traits, see Supplementary Fig. 17. As indicated by the coloured boxes, all models include s_het burden and were run separately for males and females. The marginal bar plot on the right gives the proportion of the variance in childlessness explained by s_het burden, calculated for PTVs only and scaled to the model that includes only s_het burden (the bottom model in the plot). (B) Predicted reduction in overall fitness as a function of individual s_het burden, shown separately for each sex. The lighter shaded area around the trend line shows the error, based on the confidence intervals of the odds ratio from our logistic regression model (Fig. 1b; Methods). The dashed line represents the theoretical reduction in fitness predicted by s_het (ref. 4).
Identical to Extended Data Fig. 8B, except that the y-axis represents predicted childlessness as a function of individual s_het burden rather than predicted reduction in fitness. Values at x = 0 represent actual mean childlessness among UK Biobank males (jade) and females (violet) with an s_het burden of 0.
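The two preceding captions turn a logistic odds ratio into predicted childlessness and a predicted reduction in fitness as a function of s_het burden, with the curve anchored at observed childlessness for a burden of 0. The sketch below makes three assumptions that are not confirmed by the captions:

- the odds ratio is for having children per unit of s_het burden;
- burden affects fitness only through childlessness, so the number of children among parents is unchanged;
- the dashed "theoretical" line is a fitness reduction equal to the s_het burden itself.

Function names and inputs are hypothetical placeholders, not the study's estimates; the Methods describe the actual calculation.

```python
from math import exp, log

def logit(p: float) -> float:
    return log(p / (1.0 - p))

def expit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def predicted_childlessness(burden, childless_at_zero, or_having_children):
    """Childlessness at a given s_het burden under a logistic model for having
    children, anchored so that burden 0 reproduces the observed rate."""
    log_or = log(or_having_children)  # log odds ratio per unit of burden
    p_children = expit(logit(1.0 - childless_at_zero) + log_or * burden)
    return 1.0 - p_children

def fitness_reduction(burden, childless_at_zero, or_having_children):
    """Reduction in relative fitness, ASSUMING burden acts only through
    childlessness (number of children among parents unchanged)."""
    c = predicted_childlessness(burden, childless_at_zero, or_having_children)
    return 1.0 - (1.0 - c) / (1.0 - childless_at_zero)

# Illustrative placeholders, not the study's estimates.
c0, odds_ratio = 0.20, 0.3
for x in (0.05, 0.1, 0.2):
    c = predicted_childlessness(x, c0, odds_ratio)
    red = fitness_reduction(x, c0, odds_ratio)
    # Under the third assumption, the theoretical reduction is x itself.
    print(f"burden={x:.2f} childless={c:.3f} "
          f"fitness_reduction={red:.3f} share_of_s_het={red / x:.2f}")
```

The printed share_of_s_het is the kind of ratio behind the abstract's statement that reduced reproductive success may account for 20% of purifying selection. The placeholder inputs here are not expected to reproduce that figure.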
This file contains Supplementary Figs. 1–21, Supplementary Notes 1–3 and references.
Phenotypes assessed in this study.
Results from logistic models corrected for ICD-10 codings.
Covariates for training the CNV random forest QC model.
Catalogue of odds ratios and betas for associations reported in this manuscript.
Raw fertility statistics for all Swedish males born 1965–1967.
Values used for estimating contribution of mental health phenotypes to overall fitness.
About this article
Cite this article
Gardner, E.J., Neville, M.D.C., Samocha, K.E. et al. Reduced reproductive success is associated with selective constraint on human genes. Nature 603, 858–863 (2022). https://doi.org/10.1038/s41586-022-04549-9