Introduction

Genetic association studies of complex traits in isolated populations are advantageous in identifying risk loci that are rare in the general population but enriched in the isolate [1,2,3]. The Ashkenazi Jewish (AJ) population has been attractive for genetic studies, because of its unique demographic history of a recent severe bottleneck followed by a rapid expansion and endogamy [4]. AJ were found to carry unique mutations for several Mendelian disorders, as well as risk factors for complex diseases [5,6,7,8,9,10,11,12,13]. Importantly, while these mutations may be unique or nearly-unique to AJ, they often highlight pathways of broad significance.

Cardiovascular diseases (CVD) are a common cause of death worldwide [14]. Genome-wide association studies (GWAS) in unrelated individuals have identified thousands of genetic variants associated with CVD and their risk factors [15, 16], but the genetic risk is not fully explained. The Kibbutzim Family Study (KFS) was established in 1992 to investigate the environmental and genetic basis of cardiometabolic risk factors [17,18,19,20]. The participants belonged, at the time of recruitment, to large families living in close-knit communities, called "Kibbutzim", in Northern Israel. Kibbutzim have been communal settlements that have created a relatively homogeneous environment for their members. For example, earnings were uniformly distributed, and Kibbutzim members typically dined jointly. Kibbutzim members are mostly of Ashkenazi Jewish ancestry, with the remaining members belonging to other Jewish subgroups. The KFS is thus expected to be a useful resource for the study of cardiometabolic genetic risk factors.

While most association studies so far were conducted on unrelated individuals, the extended family design of the KFS has the advantages of a reduced sensitivity to population stratification bias and the ability to detect Mendelian inconsistencies [21]. Family-based studies also have the ability to enrich for genetic loci containing rare variants. Familial aggregation, segregation analyses, and linkage and candidate gene association studies were previously conducted in the KFS [17,18,19, 22,23,24,25], focusing on outcomes such as low-density lipoprotein peak particle diameter [24], fibrinogen variability [23], and red blood cell membrane fatty acid composition [19]. Here, we present results for genome-wide genotyping of 901 KFS participants. We aimed to (1) characterize the population genetics of the KFS population and (2) assess the contribution of genetic variants enriched in the KFS to anthropometric and cardiometabolic traits and other health-related phenotypes.

Methods

Recruitment

The KFS participants were recruited in two phases in 1992–1993 and 1999–2000 [17, 24] from six Kibbutzim in Northern Israel [23]. The first recruitment phase of the study (1992–1993) included 80 extended families, ranging in size from 2 to 43 individuals [26]. During the second phase (1999–2000), participants from the first phase were all invited for repeat examinations (80% response rate) and new participants were recruited, giving a total of 150 extended families ranging in size from 2 to 55 individuals [17]. Families were invited to participate if they consisted of at least four individuals who (i) lived in the Kibbutz, (ii) spanned at least two generations, and (iii) were at least 15 years old. Families were retained if at least two family members consented to participate. Overall, 1033 participants were recruited; 111 were examined only in the first phase, 533 only in the second phase, and 389 in both. Participants completed a self-administered socio-demographic and health questionnaire, including questions on medical and family history and lifestyle [17, 26]. Psychosocial and dietary information was collected as well. Anthropometric and blood pressure traits (described below) were measured in both phases [19, 20], and peripheral blood samples were collected following a 12-hour fast. All subjects signed an informed consent and the study was approved by the Institutional Review Board of the Hadassah-Hebrew University Medical Center.

Genotyping and quality control

Of the 1033 participants recruited, 938 had high-quality DNA samples (A260/280 > 1.8, concentration > 50 ng/µl). Genotyping was performed using the Illumina HumanCoreExome BeadChip, consisting of ≈240,000 tag single-nucleotide polymorphisms (SNPs) and ≈240,000 exome variants. Standard quality control (QC) procedures were applied to filter variants and individuals using Plink 1.90 [27]; for details, please see Supplementary Note 1. A total of 901 individuals and 323,708 variants (281,586 variants with minor allele frequency (MAF) > 1%) passed QC and were used in downstream analyses. The data reported in this paper were deposited at the European Genome-phenome Archive (EGA) under accession number EGAS00001002782.

Principal component analysis

Principal component analysis (PCA) was performed using PC-AiR [28], which is robust to known or cryptic relatedness. Our reference panel included West-Eurasian populations (covering Europe, West-Asia (the Middle East) and the Caucasus, n = 922) [29] and the Jewish groups listed in Supplementary Table 1 (n = 174) [29]. These samples served as the "unrelated subset" for PC-AiR. We additionally ran PCA using only the Jewish groups (n = 174) as the reference population. In that analysis, we also included a panel of AJ recruited in the United States by The Ashkenazi Genome Consortium (TAGC) (n = 128) [12], which allowed us to examine differences in ancestry between AJ from Israel (KFS) and the US. Another PCA was run using only the AJ samples from Behar et al. (n = 29) as the reference population, to focus on differences between Western and Eastern AJ. Variants used in the PC-AiR analysis were restricted to MAF > 1% and were pruned to eliminate linkage disequilibrium (LD), using the --indep-pairwise command in Plink (window size 50 kb, a shift of ten variants at each step, and LD between variants (r^2) < 0.1).

IBD sharing and demographic reconstruction

We phased the KFS genotypes using SHAPEIT v2 [30], and detected IBD segments using GERMLINE [31], with additional filtering by Haploscore [32] and SNP density. See Supplementary Note 2 for details. An evaluation of the improvement in the accuracy of detected IBD segments due to the pedigree-based phasing is described in Supplementary Note 3. For the demographic inference analysis, we retained only IBD segments shared between Ashkenazi founders (n = 303), as identified by the first PC in the PCA of the Jewish populations (Results). The method we used to estimate the population size history of AJ is described in Supplementary Note 4. We calculated runs of homozygosity (ROH) using Plink (--homozyg-kb 5000).

Imputation

We imputed the phased genotypes using IMPUTE2 [33]. For the reference panel, we initially used either an Ashkenazi-only reference panel (TAGC; n = 128) [12], the 1000 Genomes reference panel phase 1 version 3 (n = 1092), or a combined Ashkenazi + 1000 Genomes panel (n = 1220). The estimates provided by IMPUTE2 for the concordance between the true array genotypes and their imputed values were highest when using the combined reference panel, and we thus used that panel for downstream analyses. Imputed genotypes were initially available for 82,328,870 variants. For most analyses, we only considered the 6,858,900 variants with MAF ≥ 1% and imputation quality score ≥ 0.9.

Identification of rare variants that are relatively common in the KFS

In populations that have undergone recent strong genetic drift (such as Ashkenazi Jews [12, 34]), it is expected that some risk variants of large effects have risen in frequency compared to the general population [35]. We thus focused on variants with a substantially higher frequency in the KFS compared to the general population, which we take as non-Finnish Europeans (NFE) from The Genome Aggregation Database (gnomAD). We observed that a naive search for variants with a large frequency difference led to numerous artifacts. We thus implemented a stringent QC pipeline. First, we filtered out variants with >10% MAF difference between the KFS (founders only, n = 393) and AJ in gnomAD (n ≈ 150) [36]. Second, we filtered out variants with >10% MAF difference between NFE in the 1000 Genomes Project (phase 3; CEU + GBR + TSI + IBS; n = 404) [37] and NFE in gnomAD (n ≈ 7500) [36]. After applying the two above-mentioned filters, we extracted variants that were very rare (MAF < 0.1%) in gnomAD NFE but relatively common (MAF ≥ 1%) in the KFS. This resulted in a total of n = 212,505 enriched variants.
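The two consistency filters and the final frequency cutoffs can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `VariantFreqs` container and the `is_enriched` function are hypothetical names, and the ">10% MAF difference" is interpreted here as an absolute difference in allele frequency, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VariantFreqs:
    """Minor allele frequencies of one variant (hypothetical container)."""
    kfs: float         # KFS founders
    gnomad_aj: float   # Ashkenazi Jews in gnomAD
    kg_nfe: float      # 1000 Genomes CEU + GBR + TSI + IBS
    gnomad_nfe: float  # non-Finnish Europeans in gnomAD


def is_enriched(v: VariantFreqs,
                max_diff: float = 0.10,
                rare_cutoff: float = 0.001,
                common_cutoff: float = 0.01) -> bool:
    """Return True if the variant passes both QC filters and is enriched.

    Assumption: "MAF difference" means the absolute difference in frequency.
    """
    # Filter 1: the two Ashkenazi sources must agree to within max_diff.
    if abs(v.kfs - v.gnomad_aj) > max_diff:
        return False
    # Filter 2: the two European sources must agree to within max_diff.
    if abs(v.kg_nfe - v.gnomad_nfe) > max_diff:
        return False
    # Enrichment: very rare in gnomAD NFE, relatively common in the KFS.
    return v.gnomad_nfe < rare_cutoff and v.kfs >= common_cutoff


# Made-up frequencies, for illustration only.
print(is_enriched(VariantFreqs(0.02, 0.025, 0.0, 0.0005)))
```

Checking the consistency filters before the enrichment cutoffs mirrors the order described in the text, so that a variant whose frequency is unreliable in either population is discarded regardless of its apparent enrichment.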

To determine if the MAF ratio (KFS/NFE) correlated with the functional consequence of the enriched variants, we annotated these variants using SnpEff version 4.3q [38]. We performed gene-set enrichment analysis (GSEA) using the Molecular Signatures Database (MSigDB) on variants present in gnomAD NFE with KFS/NFE MAF ratio > 10 and high/moderate predicted functional impact, totaling 190,598 variants [39].

Association analysis

We performed association tests with BOLT-LMM v2.2 [40]. BOLT-LMM accounts for relationships between individuals and population structure using a linear mixed model, and also handles imputed "dosage" data. For building the mixed model, we used 299,509 genotyped variants (MAF > 0.1%). We used the 1000 Genomes LD-Score table provided with BOLT-LMM. We only tested the 212,505 enriched variants with >10x higher KFS/NFE MAF ratio (see above). The P-value threshold for significance was set at 1.61 × 10^-6 (see Supplementary Note 5 for a detailed description).

Supplementary Table 3 lists the 16 anthropometric and cardiometabolic traits we analyzed and their corresponding heritability estimates (based on all imputed genetic variants), as calculated by BOLT-REML [41]. All models were adjusted for age, gender, and phase. Lipid-lowering medication was accounted for by introducing a dichotomous covariate for medication use, and blood pressure lowering medication was adjusted for by adding 10 and 5 mm Hg to systolic (SBP) and diastolic (DBP) blood pressures, respectively [42]. Lipoprotein (a), C-reactive protein, and triglycerides variables were inverse normal transformed. We observed no improvement in association results when using the non-infinitesimal mixed-model test in BOLT-LMM, and thus all reported results are for the standard infinitesimal model.
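The phenotype preprocessing described above can be sketched in Python as follows. The function names are hypothetical, and the transform shown is one common rank-based variant of the inverse normal transformation (average ranks for ties, offset 0.5); the exact variant used in the study is not stated, so this choice is an assumption.

```python
from statistics import NormalDist


def adjust_blood_pressure(sbp, dbp, on_bp_medication):
    """Add 10 mm Hg to SBP and 5 mm Hg to DBP for medicated participants."""
    if on_bp_medication:
        return sbp + 10.0, dbp + 5.0
    return sbp, dbp


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform (assumed variant).

    Tied values receive their average rank; each rank r is mapped to the
    standard normal quantile of (r - offset) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current run of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Made-up triglyceride values, for illustration only.
transformed = inverse_normal_transform([180.0, 95.0, 120.0])
print(transformed)
print(adjust_blood_pressure(130, 80, on_bp_medication=True))
```

The transform depends only on the ordering of the values, which is why it is often applied to strongly skewed traits before fitting a linear mixed model.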

To determine the number of independently associated loci, we first excluded associated variants in high LD (r^2 > 0.95) with the index SNP (lowest P-value) in each chromosomal region, followed by a conditional analysis using the index SNP as a covariate.

Results

Samples

Our study included 1033 participants (47% male, 53% female) who were recruited during two phases (1992–1993 and 1999–2000) from 150 families (445 founders). The majority of families spanned three (57.3%) and two (28.0%) generations. The mean family size was 6.89 individuals (range 2–55). Participants' characteristics by gender are given in Table 1. The 16 anthropometric and cardiometabolic traits used in the association analysis are summarized in Supplementary Table 4 by gender and age group.

Table 1 Socio-demographic characteristics of the KFS (n = 901)

Population genetics

Principal component analysis (PCA)

To study the genetic ancestry of the KFS participants, we ran PCA (Methods) on the genotyped KFS samples (n = 901), along with worldwide (n = 922) and Jewish (n = 174) reference populations [29] (Supplementary Table 1). The first two principal components (Fig. 1) distinguish three main non-Jewish population groups: European, Caucasian, and West-Asian (Middle-Eastern), and six Jewish populations: Ashkenazi, Sephardi, North-African, Yemenite, West-Asian, and Caucasian. A partial overlap is observed between AJ and European non-Jews, as well as between West-Asian and Caucasian Jewish and non-Jewish populations.

Fig. 1 A principal components analysis (PCA) of the KFS samples (n = 901, blue cross marks), along with reference samples from Jewish (n = 174) and non-Jewish (n = 922) populations

The KFS samples largely overlapped with the AJ reference samples [29]. To study the non-Ashkenazi ancestries in the KFS, we ran PCA with the KFS samples and the Jewish reference populations only (Supplementary Fig. 1). The number of individuals with exclusive AJ ancestry, as distinguished by the first PC (PC1 ≤ 0), was n = 733 (81.4%). The majority of the remaining individuals overlapped with the Sephardi and North-African Jewish clusters, but the Middle-Eastern, Caucasian, and Yemenite Jewish populations were also represented. Some individuals seemed to have a mixed Ashkenazi and other Jewish ancestry, although quantifying their exact number is difficult with PCA.

Self-reported country of birth allowed us to compare the PCA-based and self-reported Jewish ancestry for 247 individuals born outside Israel (Supplementary Fig. 2). Among 140 individuals self-reported as AJ (born in Northern and Central Europe), 136 (97%) met the defined genetic criterion (PC1 ≤ 0). Among the 11 individuals self-reported as North-African Jewish, 9 (82%) met a pre-defined genetic criterion (PC1 > 0.03 and PC2 > 0.05).
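The genetic criteria quoted above amount to simple thresholds in PC space, which can be written as a short Python sketch. The function names and the "unassigned" label are hypothetical; only the thresholds and the counts are taken from the text.

```python
def classify_by_pca(pc1, pc2):
    """Assign a PCA-based label using the thresholds quoted in the text."""
    if pc1 <= 0:
        return "Ashkenazi"
    if pc1 > 0.03 and pc2 > 0.05:
        return "North-African"
    return "unassigned"  # hypothetical label for all other individuals


def concordance_percent(n_matching, n_self_reported):
    """Percentage of self-reported individuals meeting the genetic criterion."""
    return round(100 * n_matching / n_self_reported)


# Counts reported in the text.
print(concordance_percent(136, 140))  # Ashkenazi
print(concordance_percent(9, 11))     # North-African
```

Individuals falling outside both regions are left unassigned in this sketch, consistent with the observation that mixed ancestry is difficult to quantify with PCA alone.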

Next, we asked whether AJ with recent origins in Eastern vs. Western Europe are genetically distinct. We designated KFS individuals born in Germany as Western AJ, and individuals born in Poland, Russia, Hungary, and Romania as Eastern AJ. A PCA plot revealed that Eastern and Western AJ can be distinguished in PC space, albeit imperfectly (Supplementary Fig. 3). We observed the same pattern in the samples of Behar et al. [29].

We observed no differences in PCA between the KFS AJ samples and 128 US-based AJ [12] (Supplementary Figs. 1 and 3), indicating no difference in genetic ancestry between Israel- and US-based AJ. This result, which agrees with the IBD-based analysis of Gusev et al. [43], is expected based on the short time since the migrations of AJ out of Europe and suggests that the source population for these migrations was relatively homogeneous.

IBD sharing and demographic reconstruction

We detected IBD segments longer than 3 cM shared between AJ founders in the KFS (Methods). Using the number and lengths of observed segments, we confirmed a severe bottleneck in recent AJ history (point estimates: effective size ≈450 individuals, 23 generations ago) [12]. See Supplementary Note 6 for complete details. IBD sharing also revealed differences in ancestry between Eastern and Western AJ. The mean number of segments shared within Western AJ was 1.4x larger than within Eastern AJ (8.4 vs. 5.9, P < 10^-7; Supplementary Table 5), but the mean segment length was similar (≈5.5 cM, P = 0.28). Sharing levels were particularly high in the group of Western AJ that was distinct by PCA (Supplementary Table 5). The number and lengths of long runs of homozygosity (Methods) did not significantly differ between Eastern and Western AJ (Supplementary Table 6).

Functional annotation of variants enriched in the KFS

We annotated the function of 212,505 variants of MAF > 1% in the KFS and < 0.1% in Europeans (Methods). We identified 62 (0.03%) high impact and 291 (0.13%) moderate impact variants, with the remaining predicted to have low or no functional significance ("modifiers") according to SnpEff (Supplementary Table 7). We observed no correlation between the MAF ratio (KFS/Europeans) and the putative functional significance (Supplementary Fig. 7).

Gene-set enrichment analysis (GSEA) on the 201 genes that contained at least one variant with MAF ratio > 10 and a high/moderate functional impact (Methods) identified highly significant enrichment (false discovery rate q-value < 10^-5) in 25 gene sets (pathways) in the Molecular Signatures Database (MSigDB). The top pathways were mostly related to cancers (breast cancer, prostate cancer, skin cancer, and sarcomas, among others) (Supplementary Table 8). There was no enrichment for cancer-related pathways when random sets of genes with variants of no functional significance were analyzed.

Association of variants enriched in the KFS with anthropometric and cardiometabolic traits

We considered the 212,505 enriched variants and used BOLT-LMM to test for an association of these variants with 16 anthropometric and cardiometabolic traits (Methods; qq-plots and Manhattan plots are shown in Supplementary Figs. 8 and 9, respectively). We set the P-value threshold for significance to 1.61 × 10^-6 (Methods). At this significance level, 24 variants were significantly associated (Table 2), comprising seven independent loci. We report gender-specific results for these variants in Supplementary Table 9, and locus zoom plots in Supplementary Fig. 10.

Table 2 Significant associations for enriched variants in the KFS (MAF > 1% and 10x higher compared to Europeans) and replication in the JPS cohort

Our main finding is a region spanning seven variants (453 kb, KFS/European MAF ratio between 56 and 228), located in chr8p23.1, and associated with body weight, waist circumference, and body mass index (BMI). The most significant association was with body weight for (hg19) chr8:g.9887880T>G (P = 3.6 × 10^-8), an imputed variant located upstream of MSRA (methionine sulfoxide reductase A). This is the only variant with a study-wide significant association (P < 1.61 × 10^-6/16).
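The relationship between the two thresholds is a simple division by the number of traits, as the following Python sketch shows. The constant and function names are hypothetical; the per-trait threshold, the number of traits, and the P-values are taken from the text.

```python
PER_TRAIT_THRESHOLD = 1.61e-6  # significance threshold for a single trait
N_TRAITS = 16                  # traits tested in the association analysis

# Study-wide threshold: the per-trait threshold divided by the number of traits.
STUDY_WIDE_THRESHOLD = PER_TRAIT_THRESHOLD / N_TRAITS


def significance_level(p_value):
    """Classify a P-value against the two thresholds."""
    if p_value < STUDY_WIDE_THRESHOLD:
        return "study-wide"
    if p_value < PER_TRAIT_THRESHOLD:
        return "per-trait"
    return "not significant"


print(f"{STUDY_WIDE_THRESHOLD:.3g}")
print(significance_level(3.6e-8))  # top body-weight association on chr8p23.1
```

The study-wide threshold is therefore close to 10^-7, which the top chr8p23.1 association (P = 3.6 × 10^-8) passes.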

In other chromosomes, a large region (1.9 Mb) in chr13q14.3 showed a significant association with lipoprotein(a) (LPA). The region contained ten variants, all with >189-fold MAF ratio, with the most significant result at rs780360029 (P = 3.8 × 10^-7). These variants span eight genes (Table 2), three of which belong to a region that is frequently deleted in B-cell chronic lymphocytic leukemia (DLEU) [44]. Two intronic variants in chr6q25.3-26, a known locus for LPA, were significantly associated with LPA: rs754054303 in the ACAT2 gene and rs185882981 in the LPA gene (Table 2). Two additional intronic variants in chr17q25.1 (rs566833653 in CDR2L and rs759145164 in KCTD2) were both associated with height (P = 4.7 × 10^-7 and P = 4.2 × 10^-7, respectively; Table 2), and are absent in Europeans. Some of the top hits show suggestive differences in P-values between the sexes (Supplementary Table 9).

We pursued replication of our findings in another Israeli cohort, the Jerusalem Perinatal Study (JPS) [45], which consists of parents and their children who were born in the 1970s in Jerusalem. We ran linear regression analyses separately for children (n = 857, mean age ≈32) and mothers (n = 763, mean age ≈60) for 12 of the 24 associated variants (the remaining were associated with LPA, which was not available in the JPS) and meta-analyzed the results in all three groups (Table 2). The top hit for weight, BMI, and waist circumference at chr8:g.9887880T>G had the same direction and magnitude of effect in the KFS and the JPS, with P-values around 10^-4 in the JPS mothers and 10^-10 overall (Table 2). The nearby variant rs759188048 similarly replicated, but the results for more distant variants in that locus were mixed. Among the other loci, the association of chr8:g.17880544G>C with waist-to-hip ratio and that of rs776420285 with hip circumference replicated in the JPS mothers.

Discussion

We analyzed the genotypes of 901 individuals from extended families living in Kibbutzim in Israel, who had detailed records on anthropometric traits and cardiometabolic risk factors. The data enabled us to refine population-genetic patterns of Israeli Jews, as well as study genetic associations with 16 traits.

Ashkenazi Jewish population genetics

PCA confirmed self-reported ancestries and allowed precise assignment of ethnic origins for most KFS individuals. Participants were mostly of AJ origin (81.4%), with the remaining having various other Jewish ancestries. It was previously estimated that AJ have experienced a founder event ≈25–35 generations ago with an effective population size of ≈300–400 individuals [12, 34]. We established that these estimates hold for our independent AJ sample.

A popular theory of Ashkenazi origins is an initial settlement in Western Europe (Northern France and Germany), followed by migration to Poland and an expansion there and in the rest of Eastern Europe [46]. An open question is whether AJ with recent origins in Eastern Europe are genetically distinct from Western European AJ. Early mtDNA and disease mutation studies have identified differences between AJ from different origins [10, 47], and a recent study of mtDNA diversity in AJ has found large differences in haplotype frequencies between Western and Eastern AJ [48, 49]. With genome-wide data, a previous study of ≈1300 AJ [4] did not find a correlation (on a PCA plot) between genetic ancestry and country of origin. A study of IBD sharing across the US did find three AJ sub-clusters, but could not assign the clusters to specific locations [50]. A later study of 29 AJ [29], which is part of the Jewish reference panel used here, did not identify genetic differences between Eastern and Western AJ, except for a minute East-Asian component in the ADMIXTURE analysis that was present in Eastern but not Western AJ. Our analysis of the KFS individuals who reported their country of origin showed that many Western AJ cluster separately from Eastern AJ, and the same pattern was observed in our re-analysis of the data of Behar et al. [29]. IBD sharing analysis showed 1.4x more shared segments within Western AJ compared to Eastern AJ (and even higher levels of sharing, 2.1x, for those Western AJ who were distinct on PCA; Supplementary Table 5; Supplementary Fig. 3), but with no difference in mean segment length. An explanation consistent with these observations is that Western AJ consist of two slightly distinct groups: one that descends from a subset of the original founders (represented by those who are distinct on the PCA plot), and another that has migrated back there from Eastern Europe, possibly after absorbing a limited degree of gene flow. We note, however, that we cannot exclude the possibility that the results reflect, at least partly, biased sampling of Western AJ in the KFS.

Analysis of rare European variants that are relatively common in AJ

Studying isolated or founder populations such as AJ is expected to increase power to discover disease-associated genes, due to the rise in frequency of rare or unique risk alleles [35, 51, 52]. Here, we did not observe a correlation between variants enriched in AJ (the KFS) and putative functional significance. Nevertheless, for enriched variants with a functional impact, we identified a significant overlap with several cancer-related gene-sets, including breast cancer. AJ women have a high risk of familial breast cancer, mostly due to founder mutations in the BRCA1 and BRCA2 genes [53]. While no functional enriched variants were observed in BRCA1 or BRCA2 in the KFS, a number of genes with functional enriched variants were found to interact with BRCA1 (Supplementary Table 8). We note that whether cancer is more prevalent in Ashkenazi Jews compared to the general Western population is debated, and possibly limited to colorectal and prostate cancers, if at all [54,55,56].

We detected seven loci with AJ-enriched variants that were associated with anthropometric and cardiometabolic traits. The most strongly associated locus included seven variants surrounding the MSRA gene in chr8p23.1 that were associated with body weight, waist circumference, and BMI. The association of the index SNP in this locus (chr8:g.9887880T>G) was replicated in another Israeli cohort (Table 2). Variants near this region (100 kb upstream), located between the genes TNKS and MSRA, were found to be associated with extreme obesity in children and adolescents [57] and with adult waist circumference [58] in individuals of European ancestry. Our findings may implicate MSRA as a candidate gene for these observed associations. This gene encodes a ubiquitous and highly conserved protein that carries out the enzymatic reduction of methionine sulfoxide to methionine.

Another region of interest is chr13q14.3, showing significant associations of ten variants with LPA. This region includes the DLEU genes, which are frequently deleted in B-cell chronic lymphocytic leukemia, suggesting a role of one or more tumor suppressors [44]. The variant rs749307626 is located in an intronic region of the DLEU2 gene, which was previously associated with waist-to-hip ratio in a meta-analysis of African and European populations [49]. Three variants are located in the DLEU1 gene, previously associated with anthropometric traits in another isolate (Korčula Island, Croatia [59]). One variant is located in the DLEU7 gene, previously associated with height in Europeans and Africans [60]. The variant rs756877701 is located in an intronic region of the PHF11 gene, which was previously associated with cardiomegaly in the Amish population [61]. Finally, two AJ-enriched variants in the known LPA locus on chr6 [62, 63] were associated with LPA in the KFS, providing evidence for the generalizability of our results.

Outlook

We report here the first genetic association study of enriched AJ variants with cardiometabolic traits in the Israeli Jewish population. In this study, we have identified a number of suggestive associations and also refined the understanding of the population genetics of Ashkenazi and other Jewish groups. Current limitations of our study include its relatively small size and its focus on Ashkenazi Jews. Thus, additional analyses will be required in larger Jewish samples, as well as in other populations, to replicate the findings and elucidate the mechanisms underlying the observed associations. We conclude that the KFS is a valuable source for studying genetics of complex traits as well as Jewish genetics in the setting of a longitudinal family study.