Introduction

Large biobank genome data collections combined with electronic health records have made phenome-wide association studies (PheWAS) feasible [1], leading to increased power and novel discoveries in disease genetics [2]. Blood donors have been suggested to be an excellent option for large cohorts of healthy individuals [3], as blood donors donate blood voluntarily and frequently. Based on in-depth interviews, blood donors are known to have a positive attitude towards scientific research and towards the use of their donated samples for research if not needed for patient care [4, 5]. The blood donor population, however, is highly selected due to prior health questionnaires and the healthy donor effect [6], which refers to the accumulation of individuals with a very good health status in the blood donor pool. The healthy donor effect, an example of membership bias, may lead to severe confounding effects in research settings if not considered.

As a result of genetic drift, isolated populations, such as the Finnish population, encounter bottlenecks that may enrich both deleterious [7], disease-predisposing alleles and disease-protective alleles. A small founder population, geographical and linguistic isolation [8, 9] and a strong bottleneck approximately 120 years ago are manifested in the Finnish disease heritage, a set of recessive diseases that are more common in Finland than elsewhere [10], and also in a remarkably lower prevalence of other diseases, such as cystic fibrosis. In addition, the significantly lower genetic diversity of the Finns compared to other European populations [11] enables more efficient genetic research than in populations with more heterogeneous inheritance. It is worth noting that, even though the Finnish population is homogeneous compared to other populations, strong genetic variation occurs within Finland, leading to geographical genetic substructure of the population [12].

To evaluate the usefulness of a blood donor-based biobank for genetic studies, we investigated the occurrence and geographical distribution of three sets of genetic variants: (1) rare Finnish-enriched variants associated with susceptibility to many multifactorial diseases, identified previously by the FinnGen project [13]; (2) blood donation-related variants: mutations causing hemochromatosis, an iron accumulation disease treated with phlebotomy, and Bernard-Soulier syndrome, a rare macrothrombocytopenia caused by defective proteins important to thrombocyte development and function; and (3) HLA-DQ2/DQ8 heterozygotes, established to have a high susceptibility to type 1 diabetes, an autoimmune, multifactorial disease enriched in Finland. A few founder mutations for hemochromatosis are shared by all Northern European populations, whereas a set of local mutations for Bernard-Soulier syndrome explains the majority of the Finnish cases [14, 15]. Individuals with hemochromatosis mutations can be assumed to be enriched in the blood donor pool, and those with Bernard-Soulier syndrome to be missing from it. The frequency of the type 1 diabetes risk genotype, HLA-DQ2/DQ8, is assumed to be lower among blood donors than in the general population. Furthermore, we investigated the geographical distribution of the variants in Finland and the degree of consanguinity in the Blood Service Biobank. In the present study we evaluate how well a blood donor-based biobank encompasses the rare and common genetic variants and certain HLA-haplotypes in Finland, to understand the actual value of blood donor cohorts in studies of disease-associated variants and personalised genetics.

Material and methods

Samples, genome data and blood donation data

A total of 31,880 blood samples were collected alongside standard blood donation from blood donors who had given written biobank consent to the Blood Service Biobank of the Finnish Red Cross Blood Service, Finland. Use of the samples and data is in accordance with the biobank consent and meets the requirements of the Finnish Biobank Act 688/2012.

The biobank samples were genotyped as part of the FinnGen project with the FinnGen ThermoFisher Axiom custom array v1 or v2. The genotyping, quality control and genome imputation protocols, R11, are described in detail in the FinnGen Gitbook [16]. In brief, genotype calling was performed with the AxiomGT1 algorithm. Prior to imputation, genotyped samples were pre-phased with Eagle 2.3.5 using the default parameters, except that the number of conditioning haplotypes was set to 20,000. Genotype imputation was performed with Beagle 4.1 (version 08Jun17.d8b) using the population-specific imputation reference panel SISu v3, which includes 3775 high-coverage (25–30x) whole genome sequences. Genotypes were then returned to the Blood Service Biobank.

Principal component analysis, PCA, was performed as part of the FinnGen project for the analysis of population structure and is described in detail in the FinnGen Gitbook [16]. After variant filtering and LD pruning, the FinnGen data were merged with 1000 Genomes Project (1kgp) data to filter out outliers. The final values of 20 principal components were then returned to the Blood Service Biobank.

Blood donation data contain the following donation-related information: birth year, ABO blood group, haemoglobin value, home region and donation history. This information was used in the analyses when applicable.

Comparisons between the Blood Service Biobank (N = 35,709) and FinnGen (N = 292,432) cohorts for allele frequencies of the rare variants, level of kinship and mean values of principal components 1 and 2 were performed with the R7 dataset [16].

Variant identification

Genotypes were called from VCF files under the following conditions: when the imputed dosage score was ≥0 and ≤0.1, the dosage value was considered to be 0; when the imputed dosage score was ≥0.9 and ≤1.1, the dosage value was considered to be 1; and when the imputed dosage score was ≥1.9 and ≤2.0, the dosage value was considered to be 2. Samples that met these conditions for all 58 variants were included in the study.
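The dosage thresholds above can be illustrated with a minimal Python sketch. The function names `call_genotype` and `sample_passes` are hypothetical and are not part of the study's pipeline; the sketch only assumes that each sample is represented by a list of imputed dosages, one per variant.

```python
from typing import Optional, Sequence


def call_genotype(dosage: float) -> Optional[int]:
    """Map an imputed dosage to a hard call (0, 1 or 2), or None if ambiguous."""
    # Acceptance windows taken from the text: [0, 0.1], [0.9, 1.1], [1.9, 2.0].
    windows = ((0.0, 0.1, 0), (0.9, 1.1, 1), (1.9, 2.0, 2))
    for low, high, call in windows:
        if low <= dosage <= high:
            return call
    return None


def sample_passes(dosages: Sequence[float]) -> bool:
    """Keep a sample only if every variant receives an unambiguous call."""
    return all(call_genotype(d) is not None for d in dosages)
```

A sample with any dosage outside the three windows (for example 0.5) would be dropped under this rule, which is how the text describes sample inclusion.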

The occurrence and geographical distribution of the following variants were analysed: (i) 53 rare disease-associated variants [13]; (ii) two mutations in the HFE gene, H63D and C282Y, as well as two Finnish-enriched mutations in the GP1BA gene (Leu129Pro, Tyr518Leufs83) and two in the GP9 gene (Asn45Ser and Leu40Pro) leading to Bernard-Soulier thrombocytopenia [14, 15, 17], all of which can be assumed to influence blood donation activities; and (iii) HLA-DQ2/DQ8 heterozygotes, HLA-DRB1*03:01-DQA1*05:01-HLA-DQB1*02:01 and HLA-DRB1*04:01-DQA1*03:01-DQB1*03:02, who have a high risk for type 1 diabetes [18, 19]. These two HLA-haplotypes include the known predisposing risk factors for type 1 diabetes: amino acid changes in the HLA-DRB1 molecule at positions 13 and 71 and in the HLA-DQB1 molecule at position 57 [18]. All the investigated variants included in the study are listed in Supplementary Table 1.

Consanguinity

Relatedness between the donors had been previously determined in the FinnGen project [16]. The level of consanguinity in each cohort, Blood Service Biobank and FinnGen, was calculated as follows: 10,000 random samples were drawn from each cohort, and the unique study donors with kinship relationships within the drawn set were identified. This was repeated 1000 times to obtain a more reliable estimate of the real-life situation. The percentages of individuals with first-, second- and third-degree consanguinity in each cohort were then calculated.
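The resampling procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's code: `related_fraction` is a hypothetical helper, and kinship is assumed to be available as a list of ID pairs for one relationship degree at a time.

```python
import random
from statistics import mean, stdev


def related_fraction(sample_ids, kin_pairs, draw_size, n_repeats, seed=0):
    """Mean and SD of the fraction of drawn donors with a relative in the same draw."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_repeats):
        drawn = set(rng.sample(sample_ids, draw_size))
        # Collect unique donors whose relative was drawn in the same subset.
        related = set()
        for a, b in kin_pairs:
            if a in drawn and b in drawn:
                related.update((a, b))
        fractions.append(len(related) / draw_size)
    sd = stdev(fractions) if n_repeats > 1 else 0.0
    return mean(fractions), sd
```

With the settings given in the text, the call would use `draw_size=10_000` and `n_repeats=1000`, once per relationship degree and per cohort.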

HLA imputation and haplotyping

HLA-A, -B, -C, -DRB1, -DQA1, -DQB1 and -DPB1 alleles were imputed at high-resolution level using the HIBAG algorithm [20] with population-specific models in genome build 38, as described by Ritari et al. [21]. HLA-haplotyping was performed at high-resolution level on the imputed HLA-alleles with the gap R package, version 1.2.1. The mean posterior probability, PP, of the imputed HLA-alleles was calculated prior to haplotyping, and all samples in which the mean PP of all imputed HLA-alleles was 0.80 or higher were included in haplotyping. The cut-off PP value for HLA-haplotypes was set to 0.5. Four donors had more than one possible haplotype combination that was not available in the reference haplotype lists [22] (FRCBS unpublished). These donors were excluded from the HLA-haplotype analyses. Altogether, HLA-haplotypes from 29,659 donors (59,318 haplotypes) were available for the analysis.

Statistical testing

The allele frequencies of the rare disease-associated variants in the Blood Service Biobank and FinnGen cohorts were compared by linear regression. Student’s t-test was used to compare the mean values of principal component 1, PC1, and principal component 2, PC2, within the blood donor cohort and between the two cohorts. Linear regression was used to model the donor’s mean haemoglobin level (measured at each blood donation event) as a function of sex, HFE C282Y genotype, age and lifetime donation count (donors with ≤ 3 donations were filtered out). Hardy-Weinberg equilibrium, HWE, was tested for all variants in the blood donor cohort with the HardyWeinberg R package (HWExact or HWChisq), version 1.7.5. The chi-squared test and Fisher’s exact test were used to analyse the significance of the differences in consanguinity level between the Blood Service Biobank and FinnGen. P-value adjustment was performed with the stats R package using method “BY”.
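For readers unfamiliar with these two steps, the following Python sketch shows a chi-squared HWE test from genotype counts and the Benjamini-Yekutieli ("BY") adjustment. It is an illustrative stand-in, not the R code used in the study: the function names are hypothetical, no continuity correction is applied, and the exact test variant is not covered.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-squared HWE test (1 df, no continuity correction) from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def adjust_by(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(pvalues)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Because the BY factor grows with the number of tests, a nominally significant HWE deviation among several dozen variants can easily lose significance after adjustment.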

All analyses were performed in R [23] version 3.6.1 or later, with RStudio [24].

Results

Demographics of the Blood Service Biobank

We compared the demographics of the Blood Service Biobank cohort to 203,861 blood donors who had donated blood in Finland during the same period as the biobank cohort was collected (Table 1). Overall, we could see several differences between the two cohorts.

Table 1 Demographics of the Blood Service Biobank cohort and non-Biobank Blood Donor pool (starting 1.11.2017).

In both cohorts, the overall number of female donors was higher, but male donors had higher donation activity. The age and donation activity distributions differed between the two cohorts; a particularly large difference was seen in the donation activity of young men. The blood group distributions of the two cohorts differed modestly in all the ABO Rh blood groups, apart from AB– in both sexes and B+ in women. The largest difference was seen in the O– distribution. All the provinces in Finland were represented in the study cohort (Table 1).

The mean values of PC1 and PC2 were compared by t-test between the Blood Service Biobank and FinnGen cohorts. There was a statistically significant, albeit small, difference between the mean PC1 values of the two cohorts (p = 6.89e–15). The difference corresponded to only 2% of the SD of the whole study cohort. PC1 is assumed to reflect the east-west axis of the Finnish population [12]. No difference was seen in the PC2 comparison (p = 1.00). Principal components 1 and 2 of the two cohorts are shown in Fig. 1.

Fig. 1: Principal components 1 and 2 of the Blood Service Biobank (N = 34,619) and FinnGen (N = 274,535) cohorts.

Each point was created by binning the donors' bivariate data points, PC1 and PC2, with the R package hexbin. The minimum count in each bin is 5.

Genetic composition of the Blood Service Biobank

Based on linear regression, we could see a strong relationship between the allele frequencies of the 53 disease-associated variants in the Blood Service Biobank and FinnGen cohorts (adjusted R² = 0.997, p = 6.39e–15, Fig. 2A). Hence, the alleles were found in the Blood Service Biobank cohort at frequencies similar to those in the FinnGen cohort, which consists mostly of samples from hospital-based biobanks.
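As a rough illustration of this kind of comparison, the sketch below fits a simple least-squares line to paired allele frequencies and reports the adjusted R². The helper `simple_ols` is hypothetical, and its inputs would be the per-variant minor allele frequencies of the two cohorts; no values from the study are reproduced here.

```python
def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x; returns slope, intercept, adjusted R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjustment for one predictor: (n - 1) / (n - 2).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, adj_r2
```

A slope close to 1 together with an adjusted R² close to 1 would indicate that the two cohorts carry the variants at nearly identical frequencies.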

Fig. 2: Fifty-three rare Finnish enriched disease-associated variants in Blood Service Biobank.

A Linear regression of the minor allele frequencies in the Blood Service Biobank and FinnGen cohorts. B Number of variants carried by donors in the Blood Service Biobank. C Number of donors homozygous for the minor allele of each of the 53 variants in the Blood Service Biobank.

All the investigated rare variants were found among the blood donors. Of the blood donors in the Biobank, 80.3% (N = 25,615) had at least one of the 53 rare variants; the range was 0–9. One blood donor carried 9 different rare variants (Fig. 2B). As many as 23.6% (N = 7533) of the donors carried three or more variants. Of the 53 rare variants, 38 were found at least once in homozygous form (Fig. 2C). For 16 variants, more than 10 homozygotes were found in the Blood Service Biobank. More than one hundred homozygotes were found for two variants, rs144651842 and rs77482050. rs144651842 has been shown to be protective against chronic lower respiratory diseases and rs77482050 against prostate cancer [13].

Two of the rare variants deviated nominally (p < 0.05) from Hardy-Weinberg equilibrium: 17 minor allele homozygotes were found, compared to the 9 expected, for rs35937944, a variant protective against arthrosis, and 5 minor allele homozygotes were found, compared to the 2 expected, for rs201483470, a variant associated with coronary revascularization [13]. However, after adjustment the differences did not reach statistical significance.

As expected, the two hemochromatosis-related mutations HFE H63D and HFE C282Y had higher frequencies in the Blood Service Biobank than in the general population. For H63D, the MAF was 0.1164 in the Blood Service Biobank and 0.09594 in gnomAD (FIN); for C282Y, the MAF was 0.0390 in the Blood Service Biobank and 0.0356 in gnomAD (FIN) [25, 26]. There were 434 homozygotes (433 expected) for HFE H63D and 69 homozygotes (48 expected) for HFE C282Y. Thus, there were 43.8% more HFE C282Y homozygotes than expected. The observed number of donors who were compound heterozygotes for the two HFE mutations (N = 284) corresponded to the expected number (N = 290).
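The expected counts quoted above follow from Hardy-Weinberg proportions, and the arithmetic can be sketched as below. Two assumptions are made for illustration only: the cohort size is taken as the 31,880 collected samples (the analysed number may differ slightly), and H63D and C282Y are treated as mutually exclusive alleles of the same gene, so that compound heterozygotes are expected at 2·p₁·p₂·N. The function names are hypothetical.

```python
def expected_hwe_counts(maf, n):
    """Expected minor-allele homozygotes and heterozygotes under HWE."""
    return n * maf ** 2, 2 * n * maf * (1 - maf)


def expected_compound_het(maf_1, maf_2, n):
    """Expected carriers of one copy each of two alleles at the same locus."""
    return 2 * n * maf_1 * maf_2


N_DONORS = 31_880  # assumption: number of collected samples, not the exact analysed N
MAF_H63D, MAF_C282Y = 0.1164, 0.0390  # Blood Service Biobank MAFs from the text

hom_c282y, _ = expected_hwe_counts(MAF_C282Y, N_DONORS)
hom_h63d, _ = expected_hwe_counts(MAF_H63D, N_DONORS)
compound = expected_compound_het(MAF_H63D, MAF_C282Y, N_DONORS)

# Relative excess of observed (69) over expected (48) C282Y homozygotes.
excess_pct = 100 * (69 - 48) / 48
```

Under these assumptions the values land close to the reported expectations (about 48 C282Y homozygotes and about 290 compound heterozygotes), and the excess of 69 observed over 48 expected homozygotes reproduces the 43.8% stated in the text.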

As expected, Bernard-Soulier syndrome-causing mutations were found only in heterozygous form in the Blood Service Biobank. The minor allele frequencies of GP1BA Leu129Pro, MAF 0.0009, and GP9 Leu40Pro, MAF 0.0002, were lower in the Blood Service Biobank than in the general population, 0.0015 and 0.001, respectively [25, 26]. The numbers of donors heterozygous for the GP1BA Leu129Pro mutation, N = 55, and the GP9 Leu40Pro mutation, N = 14, were close to the expected numbers, N = 57 and N = 13, respectively. The Bernard-Soulier mutation GP9 Asn45Ser was found in the Blood Service Biobank at a higher minor allele frequency, MAF 0.0008, than in the general population, MAF 0.00038 [25, 26]. The number of donors heterozygous for this mutation, N = 49, was close to the number expected under Hardy-Weinberg equilibrium, N = 51. As the Bernard-Soulier mutations are rare and some of them are single-family mutations, it was not surprising that one of the four mutations, Tyr518Leufs83, was not included in the genotyping array.

Finally, we studied the occurrence of the type 1 diabetes risk genotype HLA-DQ2/DQ8 in the Blood Service Biobank. There were 14.7% fewer HLA-DQ2/DQ8 heterozygous blood donors than expected (expected N = 375, observed N = 320); however, the difference was not significant (p = 4.1e–01).

Level of haemoglobin and HLA-haplotypes of the HFE C282Y homozygotes

The linear regression model showed male sex and HFE C282Y homozygosity to be associated with a higher haemoglobin level (p < 2e–16 and p = 3.33e–09, respectively), whereas age and lifetime blood donation count were negatively associated with haemoglobin level (p < 2e–16 and p = 3.98e–12, respectively) when the mean haemoglobin levels of HFE C282Y homozygous and wild type donors were compared (Supplementary Fig. 2). HFE C282Y homozygosity increased the haemoglobin level by 5.6 g/L (95% CI 3.86–7.25). The model fitted the dataset well (F(4, 28336) = 8035, p < 2.2e–16), and the variables sex, HFE C282Y genotype, age and lifetime blood donation count explained the haemoglobin level moderately (adjusted R² = 0.53).

As HFE C282Y maps close to the HLA segment on chromosome 6, and as hemochromatosis is known to be associated with HLA-A3, B7 in European populations [27, 28], we also investigated the HLA-haplotypes of the HFE C282Y homozygotes to see whether the HLA associations are similar in Finland. Altogether, 79 different HLA-A to HLA-DPB1 haplotypes were found among the donors homozygous for HFE C282Y. The HLA-A*03:01 allele was seen in 53.8% of the HFE C282Y homozygotes and in homozygous form in 21.4%. The HLA-A*03:01-B*07:02 haplotype was seen in 33.1% of these homozygotes and in homozygous form in 9.3%. The most common HLA-haplotype from A to DQB1 was A*03:01-B*07:02-C*07:02-DRB1*15:01-DQA1*01:02-DQB1*06:02, found in heterozygous form in 19.2% of the HFE C282Y homozygotes, whereas 7.5% of the Blood Service Biobank cohort carried this haplotype.

Geographical distribution of the markers in the Blood Service Biobank

When the mean values of PC1 were compared between donors hetero- or homozygous for the minor allele and donors homozygous for the major allele, 46 variants showed statistically significant differences (p < 0.05). The corresponding number for PC2 was 31. The individual PC1-PC2 plots for each variant are shown in Supplementary Fig. 3. Supplementary 4 shows the mean PC1 and PC2 values in relation to geographical location in Finland, based on the blood donor’s region of residence.

We could see geographical enrichment in the overall frequency of carriers of the 53 rare variants, of the HFE C282Y, GP1BA Leu129Pro, GP9 Asn45Ser and GP9 Leu40Pro mutations, and of HLA-DQ2/DQ8 heterozygotes in Finland (Fig. 3). Donors carrying multiple rare variants were enriched in the Kainuu region of Eastern Finland. The frequency of the HFE C282Y mutation was relatively high in the whole country except in the Kainuu region. In addition, we could see enrichment of this mutation in Ahvenanmaa and Western Finland. GP1BA Leu129Pro was enriched in the Northern Savo region of Eastern Finland and GP9 Asn45Ser in South Ostrobothnia. The GP9 Leu40Pro mutation was found in heterozygous form in only 14 donors, in the Uusimaa, North Karelia, Pirkanmaa, Päijät-Häme, Northern Savo, South Ostrobothnia, Northern Ostrobothnia and Ahvenanmaa regions. Bernard-Soulier syndrome-causing mutations were not found at all in several regions. HLA-DQ2/DQ8 heterozygotes were enriched in Ahvenanmaa and Lapland. Frequencies of the mutations and HLA-DQ2/DQ8 heterozygotes in each region are listed in Supplementary Table 1.

Fig. 3: Distribution and geographical enrichment of blood donors in Blood Service Biobank.

Distribution of blood donors carrying A multiple rare disease-associated genetic variants, B HFE C282Y mutation, C T1D risk haplotype HLA-DQ2/DQ8, D GP1BA Leu129Pro mutation, E GP9 Asn45Ser mutation, and F GP9 Leu40Pro mutation.

Consanguinity

The degree of consanguinity in the Blood Service Biobank and FinnGen cohorts was compared for first-, second- and third-degree relatives. The level of first-degree consanguinity was higher in the Blood Service Biobank: 1.7% (SD 0.0017) were full siblings and 2.14% (SD 0.0019) had a parent-offspring relationship, whereas the corresponding figures in the FinnGen cohort were 0.96% (SD 0.0013) and 0.94% (SD 0.0014), respectively (Fig. 4). Monozygotic twins/duplicates were found slightly more often in the FinnGen cohort, 0.042% (SD 0.0003), than in the Blood Service Biobank cohort, 0.039% (SD 0.0003). Differences in first-degree relatives were statistically significant at the full sibling (p = 1.07e–04) and parent-offspring level (p = 1.91e–10) but not for monozygotic twins/duplicates (p = 1.0). The slightly higher proportion of monozygotic twins/duplicates in FinnGen could be the result of twin cohorts included in the FinnGen study [29]. Second-degree relatives were seen slightly less often in the Blood Service Biobank cohort, 2.3% (SD 0.0020), than in the FinnGen cohort, 2.4% (SD 0.0022), while third-degree relatives were found more often in the Blood Service Biobank, 4.3% (SD 0.0026), than in the FinnGen cohort, 4.2% (SD 0.0029). There was no statistically significant difference in second- or third-degree relatedness between the two cohorts (p = 1.0 and p = 1.0, respectively).

Fig. 4: Consanguinity in Blood Service Biobank.

The degree of consanguinity is higher in the blood donor cohort than in the FinnGen cohort for full siblings and parent-offspring pairs. BSBB = Blood Service Biobank. The standard deviation of the mean value in each consanguinity group is shown for both cohorts.

Discussion

A blood donor biobank is considered a valuable research resource because of blood donors' positive attitude towards research and their voluntary and repeated blood donations. Samples from patients are an obvious target for many disease-focused studies, and healthy blood donors carrying a desired gene variant can provide excellent starting material for cellular and molecular biology studies without the confounding effects of medication or disease progression. However, if not taken into consideration, the healthy donor effect may lead to membership bias and thus to unpredictable consequences [6, 30].

We could see statistically significant differences between the Biobank and blood donor cohorts. The age and donation activity distributions differed; this may reflect the strong commitment of regular and usually older donors to extra requests by the Blood Service, such as biobanking, and the fact that first-time donors are rarely recruited to the biobank. Hence, in the biobank cohort, the age distribution is shifted towards older ages, which in turn is reflected in the overall lifetime donation activity. Biological reasons may be reflected in the lifetime donation activity of women. The lack of young men in the Biobank cohort may be explained by recruiting practices; military service is mandatory for men in Finland, but donors are not recruited to the Biobank at blood donations taking place in the defence forces. The large difference in the O– blood group distribution could result from the higher donation activity of biobank donors in general, or from donors with the O– blood group being relatively active and committed donors who are therefore presumably recruited to the biobank more often. The number of samples from the more remote and sparsely populated areas of Finland, such as Kainuu and Lapland, was low due to blood collection and biobank recruitment practices. In turn, the South-West area (Varsinais-Suomi) was over-represented in the Biobank as a result of active local recruitment.

The Finnish population is one of the most studied populations due to its unique genetic inheritance [7, 8, 29, 31]. Isolated populations that have undergone strong genetic bottlenecks, like Finland, provide an opportunity to find and study genetic variants that have a strong effect on disease susceptibility and that are rare in most other populations [7, 29]. The impact of the Finnish settlement history can still be seen in the distribution of genetic variation in Finland today [12, 32, 33]. The Kainuu region has been utilised previously in several genetic studies due to the region’s unique genetic inheritance [34]. In the present study we show that the overall frequency of rare disease-associated variants was highest in the Kainuu region, whereas the frequency of the HFE C282Y mutation was lower there. Furthermore, the thrombocytopenia-causing mutations GP1BA Leu129Pro, GP9 Asn45Ser and GP9 Leu40Pro were not found at all in Kainuu (Fig. 3). The predisposing risk factor for type 1 diabetes, HLA-DQ2/DQ8 heterozygosity, was enriched in Ahvenanmaa and Lapland, whereas the GP1BA Leu129Pro mutation was mainly seen in Northern Savo and GP9 Asn45Ser in South Ostrobothnia. These differences are largely due to the settlement history of Finland; early settlement, before A.D. 1550, was focused on the southern part of the country and consisted mainly of people from Scandinavia. Late settlement took place in the 16th and 17th centuries, when small family groups from the South migrated to the Eastern and Northern parts of the country [8, 12, 35]. The genetic influence of these founder populations in the East and West of Finland [12, 35] is reflected in the findings of the present study; genome-level clustering in either the East or the West of Finland was seen in the carriers of several different genetic variants. In addition, the HFE C282Y mutation has been shown to originate in Europe [36] or possibly in Ireland or Scandinavia [37]. Taking this and the early settlement history of Finland into consideration, we could still see the route of this mutation in the present study; although present in all regions of Finland, the HFE C282Y mutation was enriched in the Ahvenanmaa and South Ostrobothnia regions and had its lowest frequency in the Kainuu region. Furthermore, we show that donors homozygous for this mutation carried a well-known hemochromatosis haplotype marker, HLA-A*03:01 [27, 37], and that the HLA-haplotype A*03:01-B*07:02-C*07:02-DRB1*15:01-DQA1*01:02-DQB1*06:02 was enriched in these donors. The latter is interesting in terms of the special immunological features described in patients with clinical hemochromatosis [38].

Homozygosity for the HFE C282Y mutation, the most common cause of hereditary hemochromatosis, leads to hemochromatosis with organ damage in only a minority of its carriers, leaving room for other hepcidin regulator genes, such as TfR2 and HJV [39], and contributing factors, such as modifier genes and dietary factors [39, 40]. Identification of HFE C282Y homozygotes among blood donors may have some practical consequences for blood banks: homozygotes should perhaps avoid iron supplementation after blood donation [41], and determination of their ferritin and transferrin levels may be recommended [42]. Informing donors homozygous for the HFE C282Y mutation is supported by the recommendation of the American College of Medical Genetics and Genomics [43]. We are currently setting up a policy on how to communicate these findings to blood donors.

Regarding the mutations causing Bernard-Soulier syndrome, GP1BA Leu129Pro, GP9 Asn45Ser and GP9 Leu40Pro, no homozygous donors were found, as expected. There is no obvious reason for the higher minor allele frequency of GP9 Asn45Ser in the Blood Service Biobank. As this mutation is enriched in certain areas of Finland, understanding the geographical distribution of the reference dataset would be necessary. Individuals heterozygous for Bernard-Soulier syndrome-causing mutations usually have about half of the normal GP Ib-IX-V expression level but display only mild symptoms, such as mild bleeding [44]. It may be relevant to investigate further whether blood donation is harmless to these individuals and whether the blood products from individuals heterozygous for Bernard-Soulier syndrome mutations differ from the blood products of donors with no copy of these mutations.

In addition to altruism [45], having blood donors in the family has been shown to affect the motivation to donate blood [46]. We show, to our knowledge for the first time, that the degree of first-degree consanguinity is higher among blood donors than in a population not selected for blood donation. This finding is supported by earlier findings of “family tradition” as a motivation for blood donation [45, 46]. The greater proportion of first-degree relatives enables the use of the material in linkage and trio-based analyses. In this study we wanted to understand the structure of the biobank cohort; hence, the first-degree relatives were included in the study.

In the present study we focused on a limited number of variants associated with diseases, some of them rare and enriched in Finland. More comprehensive and systematic studies considering all genetic factors potentially related to traits affecting blood donation eligibility, such as body weight, haemoglobin level or blood group systems, would be needed to fully understand the actual value, or possible bias, of blood donors in disease-associated genetic studies or as a control group in general. It is also worth keeping in mind that biobank donors are active and regular blood donors; hence, we must be careful about assuming that they represent the general non-blood donor population.

The results of this study clearly indicate that for most genetic variants and polymorphisms, even those with a relatively strong effect size for diseases, the blood donor biobank provides a good and useful cohort for genotype-based collection of samples. Despite the selection of healthy individuals as blood donors, the blood donor biobank was found to include most disease-associated gene variants and polymorphisms at frequencies equal or close to those found in the patient-enriched FinnGen cohort.