Main

Menopause is caused by the depletion of the primordial follicle pool. There is broad variation in the age of menopause (AOM), and early menopause (EM) impacts health, quality of life (https://www.menopausemandate.com/) and fertility potential2,3,4. It is estimated that natural fertility ends on average 10 years before menopause3,5. At the extreme end of the AOM distribution is primary ovarian insufficiency (POI) with cessation of menses before the age of 40 years, which occurs in 1–4% of women4. EM and POI are well-known causes of infertility, which is increasingly relevant as women in many populations are choosing to have children later in life3.

Through genome-wide association studies (GWAS), we and others have reported associations of rare and low-frequency variants with variation in AOM, mostly under an additive model6,7,8. Rare variants in several genes have also been reported to cause Mendelian forms of POI9 although many are only reported in a small number of cases or in single families4,10. Despite advances in understanding the genetic causes of EM and POI, genetic screening has mainly been focused on Turner syndrome, which has a prevalence of 1 in 2,000, and the FMR1 premutation, found in 1 in 8,000 women4,10.

We performed a GWAS meta-analysis for AOM (not affected by surgical procedures, such as hysterectomy and/or oophorectomy) under the recessive model as well as the additive one, on 174,329 postmenopausal women from Iceland, the United Kingdom (UK), Denmark and Norway (nIceland = 27,281, nUK = 137,906, nDenmark = 5,978 and nNorway = 3,161; Supplementary Tables 1 and 2). We tested 39.3 million sequence variants for associations with AOM (Fig. 1 and Supplementary Figs. 1 and 2).

Fig. 1: Regional association plot for the CCDC201 locus (7q12.3), flanking 550 kb on either side of the p.(Arg162Ter) variant, indicated as a diamond on the plot (chr7: 45863165G>A).

Under a recessive model, variant associations with AOM are plotted against their NCBI Build 38 positions at the CCDC201 locus, colored by the strength of correlation, r2, to p.(Arg162Ter). LD data are based on the UK Biobank, and white circles represent variants absent from the UK Biobank. Known genes are shown in the plot. P values are two-sided without Bonferroni correction calculated based on an inverse variance-weighted fixed-effects meta-analysis. NA, not available.

Homozygosity (n = 27 women) for the low-frequency stop-gain variant p.(Arg162Ter) (rs117316434(A), chr7: 45863165; minor allele frequency (MAF) ~1%) in CCDC201 is associated with AOM 9 years earlier than in heterozygotes and noncarriers (recessive effect = −1.59 s.d.; 95% confidence interval (CI): −1.98, −1.20; recessive P = 1.3 × 10−15; Figs. 1 and 2 and Table 1). The effect of the variant did not differ between the four groups (Phet = 0.28; Table 1). The association was genome-wide significant in the UK, the largest of the four groups (P = 3.6 × 10−13), and was also significant in the remaining three sample sets combined (P = 2.4 × 10−4; Table 1). The effect of p.(Arg162Ter) in CCDC201 on AOM deviates from the additive model and is limited to homozygotes (Supplementary Figs. 3 and 4). We did not detect an association with AOM under the additive model (additive effect = 0.029 s.d. (95% CI: −0.0094, 0.066), P = 0.16). We did not find a significant association of the p.(Arg162Ter) variant with any case–control or quantitative traits under the additive model.
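To make the distinction between the two models concrete, the following minimal sketch shows how a genotype could be coded as a predictor under each one. It is an illustration only, not the analysis pipeline used in the study: the function names are hypothetical, and hard genotype calls (0, 1 or 2 copies of the minor allele) are assumed rather than imputed genotype probabilities.

```python
def additive_code(minor_allele_count: int) -> int:
    """Additive model: the predictor is the number of minor alleles (0, 1 or 2)."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    return minor_allele_count


def recessive_code(minor_allele_count: int) -> int:
    """Recessive model: only homozygotes for the minor allele are coded 1."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor allele count must be 0, 1 or 2")
    return 1 if minor_allele_count == 2 else 0


# Noncarrier, heterozygote, homozygote (hypothetical hard calls).
genotypes = [0, 1, 2]
additive = [additive_code(g) for g in genotypes]
recessive = [recessive_code(g) for g in genotypes]
print(additive, recessive)
```

Under the recessive coding, heterozygotes and noncarriers are grouped together, which is the comparison reported for p.(Arg162Ter) above.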

Table 1 Association of the p.(Arg162Ter) stop-gain variant in CCDC201 (rs117316434(A); NP_001382164.1:p.Arg162Ter) with AOM under the recessive model

AOM data can be used to define EM (AOM before age 45 years) and POI (AOM before age 40 years) as case–control traits11. As expected, homozygotes are at high risk of EM (odds ratio (OR) = 35.5 (95% CI: 17.5, 71.6), P = 5.4 × 10−23), with 93% of homozygotes entering menopause before the age of 45 compared to 11% of heterozygotes and noncarriers. Homozygotes are also at high risk of POI (OR = 27.3 (95% CI: 9.38, 82.6), P = 2.2 × 10−9; Table 2), with 33% of homozygotes (9 of 27 women) entering menopause before the age of 40 compared with 3% of heterozygotes and noncarriers. It has been observed that there is a tendency to report values ending in 0 or 5 when women are asked to recall their AOM12, and we observed a similar trend (Supplementary Fig. 5). Assuming that half of women who report their AOM at exactly 40 are truly below that age, we estimated the probability of developing POI among homozygotes to be 46% ((9 + 3.5)/27) compared to 4% among heterozygotes and noncarriers ((4,687 + 1,728.5)/174,302). Based on this estimate of POI derived from AOM, 0.19% (or 1 of 513) of all POI cases are caused by p.(Arg162Ter) homozygosity. This is in line with POI based on the International Classification of Diseases, Tenth Revision (ICD-10) diagnostic code E283 in the UK Biobank 500k whole-genome sequenced (WGS) set, where one homozygote is observed among the 571 cases (0.17%).

Table 2 Association of the p.(Arg162Ter) stop-gain variant in CCDC201 (rs117316434(A); NP_001382164.1:p.Arg162Ter) with POI (AOM < 40) and EM (AOM < 45) under the recessive model
Fig. 2: Distribution of age of menopause by population and genotype.

a–d, Age of menopause for p.(Arg162Ter) homozygotes (red), heterozygotes and noncarriers (blue) by population in the UK (a), ISL (b), DNK (c) and NOR (d). The dashed line indicates the mean age of menopause. ISL, Iceland; DNK, Denmark; NOR, Norway; UK, UK Biobank; Het, Heterozygotes.

We tested the effect of p.(Arg162Ter) homozygosity on 34 reproductive, anthropometric and hormonal traits in women (requiring P < 0.05/34 = 0.0015 to account for multiple testing; Table 3, Supplementary Tables 2–7 and Supplementary Data 1). p.(Arg162Ter) homozygous women had one fewer child on average than other women (P = 0.00011). Also, the 17 childbearing p.(Arg162Ter) homozygous women had their last childbirth 5 years earlier than other women (recessive effect = −1.17 s.d. (95% CI: −1.72, −0.61), P = 3.8 × 10−5).

Table 3 Association of the p.(Arg162Ter) stop-gain variant in CCDC201 with fertility-related traits in the meta-analysis of Icelandic, Danish, Norwegian and UK Biobank datasets

A substantial fraction (51%) of noncarrier and heterozygous mothers give birth after the age of 30 years, while very few homozygous mothers have children after the age of 30 years (16%; Fig. 3a,b). Additionally, homozygous women are more likely to have no children or only one child than noncarrier and heterozygous women (Fig. 3d–g). Consistently, homozygous women are at greater risk of being diagnosed with infertility in electronic health records (ICD-10 code N97; OR = 7.3, P = 0.00019). While homozygotes exhibit a trend toward earlier childbearing, the proportion of women with children before 25 is not significantly different between homozygotes and noncarriers (OR = 1.23, P = 0.066; Fig. 3c and Supplementary Table 8). This suggests that earlier childbearing in homozygotes is not due to a higher frequency of very early births. Instead, it may be linked to infertility, potentially requiring conception attempts at a younger age for homozygotes.

Fig. 3: Distribution of maternal age at childbirth and number of children by population and genotype.

a,b, Maternal age at childbirth for p.(Arg162Ter) homozygotes (red), heterozygotes and noncarriers (blue) by population in ISL (a) and the UK (b). The dashed line indicates the mean age at childbirth. c, Distribution of the percentage of mothers having children by age bins. Maternal age at childbirth for p.(Arg162Ter) homozygotes (red), heterozygotes and noncarriers (blue) by population in the UK and ISL. d–g, Number of children born to p.(Arg162Ter) homozygous (red) and heterozygous and noncarrier (blue) women by population in ISL (d), the UK (e), DNK (f) and NOR (g). The dashed line indicates the mean number of children.

Notably, p.(Arg162Ter) homozygosity did not associate significantly with age at menarche, anthropometric traits, sex hormones, twinning or recombination phenotypes13 (Supplementary Tables 5–7), nor did it associate with the reproductive profile or infertility of males, indicating a female-specific effect (Supplementary Table 3). Under the additive model, the effect of p.(Arg162Ter) on twinning is nominally significant (OR = 1.46, P = 0.017), but does not meet our threshold for statistical significance after accounting for multiple testing (Supplementary Table 5).

In total, 290 variants have previously been reported in ref. 14 to associate with the age of menopause under the additive model and 44 under the recessive model, for which we provide robust replication (reported under the additive model excluding UK Biobank data: 281/290 = 97%; reported under the recessive model: 43/44 = 98%; Supplementary Data 2 and 3). At the CCDC201 locus, we note that a study discussed in ref. 14 reported a common variant (rs1826838; MAF = 38%) with a small effect on AOM under the additive model (effect in current meta-analysis = 0.025 s.d., P = 7.8 × 10−11). The variant rs1826838 is a 3′-UTR variant in the CCDC201 gene located 2,814 bp downstream of p.(Arg162Ter), and the variants represent two independent signals at the locus (r2 = 0.0065; Supplementary Table 9 and Supplementary Fig. 6). We note that the effect of the rare p.(Arg162Ter) homozygous genotype is 60-fold greater than that of heterozygotes for the common variant rs1826838.

For the current study, we provide summary statistics for the GWAS meta-analysis of the age of menopause for all tested variants under the recessive and additive models (Data availability).

The p.(Arg162Ter) variant is well imputed (imputation info >0.96) in the three population sets (Iceland, Denmark and Norway) based on the imputation of sequence variants detected through whole-genome sequencing. The UK Biobank analysis is based on a set of 500k WGS individuals. There was no discordance between homozygous genotypes of the p.(Arg162Ter) based on the set of 500k individuals with whole-genome sequence and the 500k set based on imputation from 155k sequenced individuals15.

The allele frequency (AF) of the p.(Arg162Ter) stop-gain variant in CCDC201 ranges from 0.74% to 1.15% in the four European sample sets, and 1 in 10,000 north Europeans are homozygous (Table 1). Within the UK Biobank set of 500k WGS individuals, 1000 Genomes and gnomAD data, p.(Arg162Ter) is very rare among those of Asian and African ancestry (Supplementary Fig. 7 and Supplementary Table 10). There was a north-to-south gradient among individuals of European descent in the UK Biobank, with Scandinavians showing the highest AF (AF = 1.25%, which means 1 in 6,400 individuals is expected to be homozygous) and the lowest frequency among southeast Europeans (AF = 0.11%, which means that 1 in 826,000 individuals is expected to be homozygous; Supplementary Fig. 7). To our knowledge, there are no reports of selection at this locus in European populations16,17,18,19, and in our data, no significant association with pigmentation traits is observed that might be indicative of selection (Supplementary Table 11).
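The expected homozygote frequencies quoted above appear to follow from squaring the allele frequency, that is, from Hardy–Weinberg proportions. The text does not name this assumption, so the sketch below is our reading of the arithmetic rather than the authors' stated method; the function names are hypothetical.

```python
def expected_homozygote_rate(allele_frequency: float) -> float:
    """Expected homozygote frequency q**2, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return allele_frequency ** 2


def one_in(rate: float) -> float:
    """Express a rate as '1 in N' by returning N."""
    return 1.0 / rate


# Allele frequencies quoted in the text for the two ends of the gradient.
scandinavian = one_in(expected_homozygote_rate(0.0125))
southeast_european = one_in(expected_homozygote_rate(0.0011))
print(f"Scandinavian: 1 in {scandinavian:,.0f}")
print(f"Southeast European: 1 in {southeast_european:,.0f}")
```

The same calculation with an allele frequency of about 1% gives roughly 1 in 10,000, matching the figure given for northern Europeans.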

CCDC201 encodes the coiled-coil domain-containing protein 201, a 187-amino acid protein. The stop-gain variant p.(Arg162Ter) is located in the third and final exon of CCDC201 and is predicted to result in a protein shortened by 25 amino acids (Supplementary Fig. 8). We note that p.(Arg162Ter) is located in the last exon of the CCDC201 gene and is therefore flagged as a low-confidence pLOF by the LOFTEE algorithm20. Possible loss-of-function mechanisms are truncation or instability of the truncated protein, but this awaits functional validation. The CCDC201 protein sequence is conserved across various mammals, suggesting that its function can be studied in other species. Interestingly, the 25 amino acids that could be lost by the p.(Arg162Ter) variant appear to be more conserved than the rest of the protein (Supplementary Figs. 9 and 10).

Interestingly, CCDC201 was one of 33 new protein-coding genes added to the National Center for Biotechnology Information (NCBI) Homo sapiens Annotation Release in November 2021, and it was previously not even annotated as a lncRNA or pseudogene (update 109.20211119, RefSeq release 210). The coding sequence for CCDC201 was previously missed due to a lack of spliced cDNA or expressed sequence tag evidence because its expression is restricted to female tissues21. Based on RNA sequencing (RNA-seq) data from Genotype-Tissue Expression (GTEx)22 and the Human Protein Atlas23, CCDC201 shows the strongest expression in female tissues (ovary, breast and placenta) but is also present in other tissues such as testis. Based on single-cell RNA-seq data from the Human Protein Atlas23, CCDC201 is one of 98 genes that show greater expression in oocytes than in other tissues.

Based on data from the GTEx project, among women 20–79 years of age, CCDC201 expression in ovarian tissue was highest in women aged 20–49 (premenopausal age) and almost nonexistent in women over 50 years old (P = 0.00055, Wilcoxon rank-sum test; Supplementary Fig. 11). We did not observe a difference in the expression of CCDC201 in testes between age groups, which showed much lower expression than the ovaries (Supplementary Fig. 12).

Consistent with human tissue expression data, in mouse ENCODE RNA-seq data, expression of Ccdc201, the mouse homolog of CCDC201, is specific to the ovary and placenta21. Furthermore, in mice, Ccdc201 has been identified as a target of the oocyte-specific transcription factor FIGLA, which is known to control early folliculogenesis without affecting male germ cell differentiation24,25,26,27. Also, there is conflicting evidence that variants in FIGLA may cause autosomal dominant POI and female infertility9,28 (OMIM: 608697). Given that CCDC201 is a downstream target of FIGLA and shows oocyte-specific expression, we speculate that it may have a role in primordial follicle development and/or oocyte survival.

Performing a GWAS of AOM under the recessive model in 174,329 postmenopausal women from four European countries, we discovered that homozygosity for the low-frequency p.(Arg162Ter) in CCDC201 causes menopause to occur 9 years earlier than in noncarriers or heterozygotes and leads to POI in close to half of homozygotes. Despite the large effect of homozygosity for p.(Arg162Ter) on AOM, the association was not detected in large GWASs of that trait using an additive model29,30,31. In addition, annotating the variant as a coding loss-of-function variant has only been possible since 2022 when CCDC201 was annotated as a protein-coding gene in humans. In contrast with WGS data used in the current study, the CCDC201 region was not covered by capture libraries used for exome sequencing of UKB participants because this gene was uncharacterized when libraries were designed32. Around 1 in 10,000 northern European women are p.(Arg162Ter) homozygotes. This genotypic frequency is comparable to the frequency of the premutation of the FMR1 gene, which is the most common known genetic cause of POI (1 in 8,000 women)4. However, the penetrance of POI among carriers of the FMR1 gene premutation is approximately 20%, which is less than that of p.(Arg162Ter) homozygotes.

We observe a nominal association of p.(Arg162Ter) with earlier childbirth (before age 25) and an increased likelihood of twinning. Further work is needed to determine if these associations are real or chance observations due to multiple testing. If real, these observations suggest that p.(Arg162Ter) may increase oocyte activation, leading to earlier childbirth and twinning, but also hastening oocyte depletion and leading to EM.

Identification of p.(Arg162Ter) homozygosity in women presents an opportunity to take action in line with a shortened reproductive lifespan. This would involve referring homozygotes to a fertility specialist to plan their reproductive life and treat symptoms of EM, as is done for other genetic causes of POI4.

Methods

Study population

AOM was derived for individuals who were considered to have undergone natural menopause not affected by surgical procedures, such as hysterectomy and/or oophorectomy.

In Iceland, we used data on AOM obtained from the Icelandic Cancer Society’s Cancer Registry (n = 9,794) and from questionnaires from various genetic programs at deCODE genetics (n = 21,390), of which the majority was gathered through deCODE’s osteoporosis project and the deCODE Health study, which had also been genotyped. The Cancer Society’s data were collected from a questionnaire in the years 1964–1994, and deCODE genetics data from 1999 to 2022. All Icelandic data were collected through studies approved by the National Bioethics Committee (approvals VSN-15-198 and VSN-15-214) following review by the Icelandic Data Protection Authority. Participants donated blood or buccal samples after signing a broad informed consent allowing the use of their samples and data in all projects at deCODE genetics approved by the NBC. All personal identifiers of the participants’ data were encrypted by a third-party system, approved and monitored by the Icelandic Data Protection Authority.

The UK Biobank study33 is a large prospective cohort study of ~500,000 individuals in the age range of 40–69 years from across the UK. AOM (Data Field 3581) was collected from a touchscreen questionnaire at the UK Biobank assessment centers from 140,688 genotyped females who indicated that their periods had stopped (Data Field 2724). Only British individuals of European ancestry were included in the study. The UK Biobank data were obtained under application 56270. All phenotype and genotype data were collected following informed consent obtained from all participants. The North West Research Ethics Committee reviewed and approved the UK Biobank’s scientific protocol and operational procedures (REC reference: 06/MRE08/65).

Data on menopause status from Denmark were provided by the Danish Blood Donor Study (DBDS)34. Around 51% of participants were females with an age span at inclusion 18–70 years. The data were obtained from a paper questionnaire (v1) on self-reported health status and lifestyle sent to all participants in the DBDS (n = 110,000) from 2010 to mid-year 2015. Around 85,000 participants responded to it. In the end, AOM from 8,037 chip-typed females was used in the analysis. All participants signed an informed consent statement, and the DBDS genetic study was approved by the Danish National Committee on Health Research Ethics (NVK-1700407) and by the Danish Capital Region Data Protection Office (P-2019-99).

Data on female infertility from Denmark were provided by the Copenhagen Hospital Biobank (CHB) Reproduction Study, which involves a targeted selection of patients with reproductive phenotypes from the CHB, a biobank based on patient blood samples drawn in Danish hospitals35.

The AOM data from Norway were provided by the Hordaland Health Studies (HUSK). The HUSK surveys are a collaborative project between the University of Bergen, the Norwegian Health Screening Service (SHUS) and the Municipal Health Service in Hordaland aimed at gathering information so that disease ultimately can be prevented36. In the first phase of the studies (HUSK1), in 1992–1993, around 18,000 residents of Hordaland County born in 1925–1952 participated in the study. In 1997–1999 (HUSK2), previous participants born in 1950–1951 and 1925–1927 were re-invited, in addition to all residents in Hordaland County born in 1953–1957. In total, approximately 36,000 individuals participated in the study (18,000 in 1992–1993 and 26,000 in 1997–1999), with some participating at both times. Age at last menstruation (proxy for menopause) was collected from questionnaires sent to participants both in HUSK1 and HUSK2. All participants signed an informed consent statement, and the HUSK study was approved by the Regional Committee for Medical Research Ethics Western Norway (REK Vest 10279 (2018/915)). In the end, AOM from 3,161 genotyped females was used in the analysis.

For all strata, in the case of repeat measurements, the mean age of menopause or the mean age at the last period was used to represent each individual’s AOM.

Rounding tendency in reported age of menopause

It has been observed that when women are asked to recall their AOM, they tend to report values ending in 0 or 5 (ref. 12). Thus, we need to take into account the possibility that some women who reported menopause at the age of 40 years may not have been included as POI cases due to this tendency, which could lead to an underestimation of the risk of POI in our study. Of the 27 homozygotes for p.(Arg162Ter) with AOM information, nine reported AOM before the age of 40, while seven reported experiencing menopause exactly at the age of 40. Assuming an equal probability of rounding reported AOM up or down to 40, we estimated the penetrance of POI among homozygotes as 46% ((9 + 3.5)/27). Likewise, for noncarriers and heterozygotes, the estimated penetrance of POI is 3.7% ((4,678 + 1,728.5)/174,302).
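The rounding adjustment described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and the count of 3,457 noncarriers and heterozygotes reporting an AOM of exactly 40 is inferred by doubling the 1,728.5 quoted in the text rather than taken from a reported table.

```python
def rounding_adjusted_penetrance(
    n_below_40: int,
    n_exactly_40: int,
    n_total: int,
    share_truly_below: float = 0.5,
) -> float:
    """Share of women with AOM < 40, counting a fraction of those who
    reported exactly 40 as truly below 40 (0.5 = equal rounding up and down)."""
    adjusted_cases = n_below_40 + share_truly_below * n_exactly_40
    return adjusted_cases / n_total


# Homozygotes: 9 below 40, 7 at exactly 40, 27 in total (from the text).
homozygote_penetrance = rounding_adjusted_penetrance(9, 7, 27)

# Noncarriers and heterozygotes: 4,678 below 40 and 174,302 in total (from
# the text); 3,457 at exactly 40 is inferred as 2 x 1,728.5 (assumption).
other_penetrance = rounding_adjusted_penetrance(4_678, 3_457, 174_302)

print(f"{homozygote_penetrance:.1%}, {other_penetrance:.1%}")
```

Setting `share_truly_below` to 0 recovers the unadjusted proportions (9 of 27 homozygotes, or 33%).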

Estimating the proportion of POI explained by p.(Arg162Ter) homozygosity

Using AOM data to define POI as AOM before the age of 40 years, we observe nine homozygotes among the 4,687 females with AOM before the age of 40 years. Thus, we estimate that the proportion of all POI cases caused by p.(Arg162Ter) homozygosity is around 0.19% (that is, 1 of 521). Similarly, taking into account rounding bias, the proportion of all POI cases estimated to be caused by homozygosity is also 0.19% (that is, 1 of 513, or (9 + 3.5)/(4,687 + 1,728.5)).
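As a check on these proportions, the two ratios can be computed directly from the counts given in this section. This is a plain arithmetic sketch; the helper name is hypothetical.

```python
def fraction_of_cases(homozygous_cases: float, all_cases: float) -> float:
    """Fraction of POI cases attributable to homozygotes."""
    return homozygous_cases / all_cases


# Unadjusted: 9 homozygotes among 4,687 women with AOM before 40.
unadjusted = fraction_of_cases(9, 4_687)

# Rounding-adjusted: half of those reporting exactly 40 are added to both
# the numerator (3.5) and the denominator (1,728.5), as in the text.
adjusted = fraction_of_cases(9 + 3.5, 4_687 + 1_728.5)

print(f"unadjusted: {unadjusted:.2%} (1 of {1 / unadjusted:.0f})")
print(f"adjusted:   {adjusted:.2%} (1 of {1 / adjusted:.0f})")
```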

In the UK Biobank 500k WGS set, one homozygote was observed among the 571 females with the ICD-10 diagnostic code E283, indicative of POI. Thus, the incidence of p.(Arg162Ter) homozygosity is 1 of 571 among POI cases.

Genotyping

In Iceland, 34,453,001 sequence variants identified in WGS data from 63,460 Icelanders participating in various disease projects at deCODE genetics were tested. The samples were sequenced using standard TruSeq (Illumina) methodology to an average genome-wide coverage of 40×. SNPs and insertions and deletions (InDels) were identified, and their genotypes were called using joint calling with Graphtyper37. Variant Effect Predictor from RefSeq was used to annotate the effects of sequence variants on protein-coding genes. We chip-typed 173,025 Icelanders (around 50% of the population) using Illumina SNP arrays, and the chip-typed individuals were long-range phased38. The variants identified in the whole-genome sequencing of Icelanders were imputed into the chip-typed individuals. In addition, based on Icelandic genealogy, the genotype probabilities for 292,636 untyped close relatives of chip-typed individuals were calculated39,40.

From the UK Biobank, we used data from around 428k WGS individuals who were of British/Irish ancestry. The WGS was performed using Illumina standard TruSeq methodology (mean depth of 32×) in a collaborative work between deCODE genetics in Iceland and The Wellcome Sanger Institute in the UK. Sequence variants from the WGS were identified and called jointly using Graphtyper37. Phasing from previous chip-typing of the same sample was used as the basis to assign haplotypes15.

From Denmark and Norway, we chip-typed 464,016 and 254,304 individuals, respectively. The samples were chip-typed by deCODE genetics using both Omni microarrays (Illumina) and Global Screening Array (Illumina). Graphtyper was used to identify SNPs and InDels and jointly call their genotypes37. Using the identified variants, the samples were then phased (using SHAPEIT4 (ref. 41)) along with an international set of 1,041,174 genotyped individuals from 49 countries (including Denmark and Norway), chip-typed at deCODE genetics. For variant imputation, we compiled an international reference panel from 50,839 WGS individuals from 14 countries, including 10,985 from Denmark and 3,467 from Norway. The identified variants from WGS were subsequently imputed into the chip-typed individuals.

Association analysis

We performed a meta-analysis of GWAS on 180,564 females from Iceland, the UK, Denmark and Norway with self-reported AOM or age at last menstruation. We tested a total of 39,281,741 sequence variants (imputation info >0.80 and MAFIce > 0.02%, MAFUK > 0.01%, MAFDen > 0.1%, MAFNor > 0.2%), identified in the WGS, for association with AOM. The quantitative traits were transformed to a standard normal distribution. For the quantitative traits, the year of birth was included as a covariate in the analysis, with additional adjustment for the first 20 principal components in the UK to account for population stratification. For each population, the quantitative traits were tested using a linear mixed model implemented in BOLT-LMM42. For the meta-analysis, we used a fixed-effects inverse variance method based on effect estimates and s.e. from each population43. For each study, we used linkage disequilibrium (LD) score regression to account for distribution inflation in the dataset due to cryptic relatedness and population stratification44. Using a set of about 1.1 million sequence variants with available LD scores, we regressed the χ2 statistics from our GWAS scan against the LD score and used the intercept as a correction factor. The estimated correction factor for AOM, based on LD score regression, was 0.97 for the recessive model in the Icelandic sample, 1.01 in the UK, 1.01 in Denmark and 1.02 in Norway.
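For readers unfamiliar with the fixed-effects inverse variance method mentioned above, the following is a minimal sketch of the standard calculation, in which each population's effect estimate is weighted by the inverse of its squared s.e. It is not the software used in the study, the function name is hypothetical, and the input values are invented for illustration rather than taken from Table 1.

```python
import math


def fixed_effects_meta(effects, standard_errors):
    """Inverse variance-weighted fixed-effects meta-analysis.

    Returns the pooled effect, its standard error and a two-sided P value
    from the normal distribution.
    """
    if len(effects) != len(standard_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_two_sided


# Hypothetical per-population estimates in s.d. units (not study data).
pooled, pooled_se, p_value = fixed_effects_meta([-1.6, -1.5], [0.2, 0.4])
print(f"pooled effect = {pooled:.2f} s.d., s.e. = {pooled_se:.3f}, P = {p_value:.1e}")
```

Because the weights are inversely proportional to the variance, the largest sample set dominates the pooled estimate, which is why a signal driven by one large cohort is usefully checked in the remaining sets combined.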

We report the effect estimates for POI and EM phenotypes against population controls and as a categorical trait among women who reported AOM (AOM < 40 versus AOM ≥ 40; AOM < 45 versus AOM ≥ 45; Table 2). The effect estimates from the two methods do not differ significantly, and we have reached the same conclusion (Phet > 0.25).

Significance thresholds

We applied genome-wide significance thresholds corrected for multiple testing using an adjusted Bonferroni procedure weighted for variant classes and predicted functional impact. With 39,281,741 sequence variants being tested in the meta-analysis, the weights given in ref. 45 were rescaled to control the family-wise error rate. The adjusted significance thresholds are 2.0 × 10−7 for variants with high impact (n = 9,910), 4.0 × 10−8 for variants with moderate impact (n = 202,465), 3.7 × 10−9 for low-impact variants (n = 3,244,032), 1.8 × 10−9 for other variants in DNase I hypersensitivity sites (n = 5,001,568) and 6.1 × 10−10 for all other variants (n = 30,823,766).
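One way to see that these class-specific thresholds control the family-wise error rate is to multiply each threshold by the number of variants in its class and sum the products, which should come to roughly 0.05. The sketch below performs that check using only the counts and thresholds quoted above; the 0.05 target is our assumption of the intended error rate, and the variable names are hypothetical.

```python
# (number of variants, significance threshold) for each annotation class,
# as quoted in the text.
variant_classes = {
    "high impact": (9_910, 2.0e-7),
    "moderate impact": (202_465, 4.0e-8),
    "low impact": (3_244_032, 3.7e-9),
    "other, DNase I hypersensitivity site": (5_001_568, 1.8e-9),
    "all other": (30_823_766, 6.1e-10),
}

total_variants = sum(n for n, _ in variant_classes.values())

# Bonferroni-style bound: the sum over classes of (count x threshold).
family_wise_bound = sum(n * t for n, t in variant_classes.values())

print(f"variants tested: {total_variants:,}")
print(f"family-wise error bound: {family_wise_bound:.4f}")
```

The class counts add up to the 39,281,741 variants tested, and the bound is close to 0.05, with the small shortfall attributable to rounding of the published thresholds.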

Variant frequency map

UK Biobank participants were first grouped by birth country. We then defined regional ancestry groupings with the aim that the groups be representative of the region’s current population, be homogeneous by genetic ancestry and have at least 200 individuals (for accurate estimation of variant frequencies).

We assessed the current genetic ancestry profiles of regions by comparing our ancestry analyses15 to in-house and published results of human genome diversity datasets like Human Origins46 and HGDP47, comparing genetic ancestry results across neighboring countries, surveying country demographics through resources like The World Factbook48 and examining participants’ self-reported ethnicity information and UK census data49 to determine the extent to which individuals who migrated to the UK were representative of the source countries’ current demographics.

In some cases, we split off ancestry-based groupings representing distinct populations or unrepresentative migrant communities (for example, South Asian ancestry born in Africa and West Asia) to achieve homogeneous birthplace-based groupings. Groups depicted on the maps in Supplementary Fig. 7 are those best representing the current demographic majority. If countries had fewer than 200 participant birthplaces, we merged them with neighboring countries with similar assessed ancestry profiles. Map geometries were obtained via R package maps and manipulated with sf50. The maps in Supplementary Fig. 7 are sourced from Natural Earth (https://www.naturalearthdata.com/about/terms-of-use/).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.