Unraveling a fine-scale high genetic heterogeneity and recent continental connections of an Arabian Peninsula population

Eaaswarkhanth, Muthukrishnan; Pathak, Ajai K.; Ongaro, Linda; Montinaro, Francesco; Hebbar, Prashantha; Alsmadi, Osama; Metspalu, Mait; Al-Mulla, Fahd; Thanaraj, Thangavel Alphonse

doi:10.1038/s41431-021-00861-6


Article
Open access
Published: 22 March 2021


European Journal of Human Genetics volume 30, pages 307–319 (2022)



Abstract

Recent studies have shown the diverse genetic architecture of the highly consanguineous populations inhabiting the Arabian Peninsula. Consanguinity coupled with heterogeneity is complex and makes it difficult to understand the bases of population-specific genetic diseases in the region. Therefore, comprehensive genetic characterization of the populations at the finest scale is warranted. Here, we revisit the genetic structure of the Kuwait population by analyzing genome-wide single nucleotide polymorphism data from 583 Kuwaiti individuals sorted into three subgroups. We envisage a diverse demographic genetic history among the three subgroups based on drift and allelic sharing with modern and ancient individuals. Furthermore, our comprehensive haplotype-based analyses disclose a high genetic heterogeneity among the Kuwaiti populations. We infer the major sources of ancestry within the newly defined groups; one with an obvious predominance of sub-Saharan/Western Africa mostly comprising Kuwait-B individuals, and the other with West Eurasia including Kuwait-P and Kuwait-S individuals. Overall, our results recapitulate the historical population movements and reaffirm the genetic imprints of the legacy of continental trading in the region. Such deciphering of fine-scale population structure and their regional genetic heterogeneity would provide clues to the uncharted areas of disease-gene discovery and related associations in populations inhabiting the Arabian Peninsula.


Introduction

The Arabian Peninsula (AP) is a melting pot of human diversity and culture. Its geographical location, as a land bridge connecting Africa and Eurasia, served as a gateway for human migration from the African continent to the rest of the world [1]. Recently emerging archeological findings and ancient artifacts indicate that humans occupied AP much earlier than previously thought, thereby providing a new paradigm of the migration of modern humans from Africa [2, 3]. In contrast to the present arid desert landscape, the AP region was once humid with a green environment that bolstered modern human dispersals [4]. It has been estimated that the present hot and dry desert climate would have started around 6 kya [4], and there was discontinuity in population movements in the AP region between the late Pleistocene and Early Holocene [5]. However, the AP was repopulated later [5] and gradually occupants adapted to the arid ecological environment [6] establishing distinct settlements [6,7,8]. Moreover, the Arabian Sea served as a maritime corridor for spice trade between Africa, Arabia, South Asia, and Southeast Asia [9]. The early nomadic lifestyle in the arid environment, in addition to the frequent influx of other populations, subsequent admixture, and consanguineous practices drove the emergence of indigenous ethnic populations throughout the AP region [6]. The multiple waves of migration of populations from Europe, Asia, and Africa drove the genetic diversity of ethnic Arab populations. For example, human populations that migrated to Qatar have been genetically categorized into three groups based on putative ancestors from Arabia/Bedouin, Persia/South Asia, and Africa [10]. As evident from a few genetic studies, genetically heterogeneous groups currently populate the AP [11, 12]. Our recent genome-wide selection analysis also highlights this inter-regional genetic heterogeneity [13]. 
The incidence of genetic heterogeneity in the highly consanguineous Arab populations increases the complexity of understanding the genetic bases and etiology of various diseases prevalent in the Arab populations. Notably, more than 1100 different genetic diseases have been reported in Arab populations, of which 60% are autosomal recessive and 44% are restricted to a specific population or geographical region [14]. Hence, there is an urgent need for the genetic characterization of these regional populations at the finest level. Therefore, it is imperative to comprehend the fine-scale genetic structures and regional heterogeneities of populations inhabiting the AP. However, the details surrounding this context remain understudied. In consideration of this background, we conducted high-resolution population genetic analyses to better understand the fine-scale patterns of the genetic diversity and the extent of genetic heterogeneity of the Arab people of Kuwait.

Kuwait is located in the northeastern corner of the AP and is bordered by Saudi Arabia to the south and Iraq to the north. Several archeological explorations reflect Kuwait as a vital location which has been used by ancient civilizations as a military or trade station [15, 16]. The State of Kuwait was established during the early 18th century by pastoral-nomadic tribes that migrated from Saudi Arabia [17] and underwent a dramatic transformation after the discovery of oil in the late 19th century [18]. Apart from cattle ranching, the early settlers were involved in various occupations such as fishing and merchant seafaring for survival [19]. The large-scale maritime trade activities with India and Africa set Kuwait as a nexus connecting India, Arabia, and Persia to Europe [18]. These occupations stratified groups socially into aristocrats, merchants, and nomads (Bedouins) [20]. The oil boom endowed Kuwait with an affluent economy, high employment opportunities, migration of skilled workers from Asia and Africa, high-tech infrastructures, and westernized lifestyles. The resettlement of peoples from neighboring regions, mainly Saudi Arabia and Persia, and the admixture of various populations, facilitated gene flow [21], thereby increasing the genetic diversity. Arab populations are well-known for their high rates of consanguinity, and Kuwait is no exception; it consequently faces the burden of severe recessive diseases [22]. Studies based on uniparental markers have delineated the maternal [23,24,25] and paternal [26, 27] diversity in Kuwait. Our earlier study based on genome-wide single nucleotide polymorphisms (SNPs) delineated the genetic substructure of the Kuwait population based on their surnames and genetic ancestry [28]. 
Accordingly, putative ancestors of the current Kuwaiti population can be traced to Saudi Arabian (Kuwait-S), Persian (Kuwait-P), and Bedouin (Kuwait-B) populations, as further corroborated by whole exome- and genome-based studies [29,30,31,32]. However, the extent of genetic heterogeneity and fine-scale population structure have not yet been elucidated. Therefore, we conducted the present investigation to expand the findings of our previous studies by considerably increasing the sample size and comparing them with the growing body of genomes from modern and ancient specimens across West Eurasia, in addition to applying multiple haplotype-based analyses on a genome-wide scale.

Materials and methods

The detailed materials and methods are presented in the Supplementary Text.

Samples

We included DNA samples of 620 Kuwaiti individuals from the State of Kuwait who were part of a larger cohort collected for studies of metabolic disorders [28, 33, 34]. All participants were healthy and enrolled after obtaining written informed consent. The study protocol was approved by the Ethical Review Committee of the Dasman Diabetes Institute (Kuwait). Participant recruitment, sample collection, and related procedures were conducted in accordance with the tenets of the Declaration of Helsinki. Each of the 620 individuals was assigned to a subgroup (Kuwait-P, Kuwait-S, or Kuwait-B) based on surname, as described elsewhere [28].

Genotyping and quality control

The 620 individuals were genotyped using HumanOmniExpress arrays for 730,525 SNPs (Illumina, Inc., San Diego, CA, USA) in accordance with the manufacturer's protocol. Quality control (QC) checks and data filtering were conducted using PLINK v1.9 [35]. After QC and relatedness filtering, 583 individuals and 587,819 SNPs met the inclusion criteria for analysis (Supplementary Text and Supplementary Fig. 1). The genotypes of the individuals analyzed in this study have been deposited at the EGA repository (European Genome-Phenome Archive; accession number: EGAS00001005034).

Datasets, merging, and phasing

We merged our data with the published genome-wide SNP genotype data of global populations (Supplementary Fig. 1 and Supplementary Table 1) available from the Estonian Biocentre database (https://evolbio.ut.ee/). The combined dataset was filtered using PLINK v1.9 [35] to include 244,688 SNPs and 2139 individuals (Supplementary Text). The filtered combined dataset was phased using the SHAPEIT algorithm [36] and used for further analyses. To avoid the effects of markers in strong linkage disequilibrium (LD), we thinned the marker set by removing SNPs with an r² value of >0.4 using a sliding window of 200 SNPs, shifted at intervals of 25 SNPs. The pruned dataset yielded 155,744 SNPs that were used for the relevant population genetics analyses, including Wright’s F-statistic (F_ST), principal component analysis (PCA), ancestry estimation with ADMIXTURE, and runs of homozygosity (RoH).
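The windowed LD-thinning step can be illustrated with a minimal sketch. This is not PLINK's implementation: it is a simplified greedy pass that assumes genotypes coded as 0/1/2 allele counts with no missing calls, and it always drops the later SNP of a correlated pair. The function names `r_squared` and `ld_prune` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, window=200, step=25, r2_max=0.4):
    """Return indices of SNPs kept after greedy window-based pruning.

    genotypes: list of per-SNP genotype lists, in genomic order.
    """
    n_snps = len(genotypes)
    keep = [True] * n_snps
    start = 0
    while start < n_snps:
        end = min(start + window, n_snps)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # simplification: drop the later SNP
        if end == n_snps:
            break
        start += step  # slide the window forward
    return [i for i, kept in enumerate(keep) if kept]
```

With three toy SNPs where the second duplicates the first and the third is uncorrelated with it, `ld_prune` keeps the first and third.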

Population structure analyses: F_ST, PCA, ADMIXTURE, and RoH

To explore the population genetic structure, we initially computed the mean pairwise F_ST differences between all population groups using the Weir and Cockerham method [37] implemented in PLINK v1.9 [35]. Next, we conducted PCA of the LD-pruned combined dataset using the smartpca program in the EIGENSOFT software package version 6.1.4 [38]. Further, we ran an unsupervised structure-like analysis using ADMIXTURE v1.3.0 [39] on the same dataset 25 times at different time intervals, with K values ranging from 2 to 12. Notably, K = 9 was the best supported K value as determined from the lowest cross-validation indexes. RoH was estimated using PLINK v1.9 [35] with a sliding window of 100 SNPs (1000 kb), allowing for one heterozygous and five missing calls per window.
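To give a feel for what a pairwise F_ST measures, the sketch below computes a ratio-of-averages estimate from per-SNP allele frequencies. Note the swap: it uses the simpler Hudson estimator, not the Weir and Cockerham estimator applied in the study, and it assumes a constant number of sampled chromosomes per population. The function name `hudson_fst` is hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson F_ST across SNPs, as a ratio of summed numerators and denominators.

    p1, p2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes per population (assumed constant).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator
```

Populations fixed for different alleles give a value of 1, whereas identical frequencies give a value at or slightly below 0 because of the sampling correction.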

Analyses to test admixture events and relative allele sharing: f₃ and f₄ statistics

We computed the f₃ and f₄ statistics using the qp3Pop and qpDstat programs (with f₄ mode: YES) implemented in the ADMIXTOOLS package [40]. A dataset containing 244,688 SNPs and 2139 individuals was used for the f-statistics analyses of modern individuals. We merged the dataset from the modern and ancient human genomes, which contained a combined total of 231,418 SNPs and 3697 individuals. We computed the derived allele sharing of the Kuwaiti population using outgroup f₃ to measure the genetic similarity of different Kuwaiti groups. Admixture f₃ was calculated to infer the plausible admixing sources in the history of the Kuwaiti population. We calculated the f₄ statistic to evaluate the level of gene flow between contemporary populations of Kuwaitis and their regional neighbors and allele sharing between modern Kuwaitis and available published data of ancient individuals from surrounding regions. Further, we computed f₄-ratio [41] to estimate the amount of Neanderthal ancestry present in the Kuwaiti subgroups. More details on the estimations are presented in the Supplementary Text.
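The quantities behind these tests can be sketched from population allele frequencies. The sketch below is only an illustration of the definitions: it omits the finite-sample bias corrections and the block jackknife that ADMIXTOOLS uses to obtain standard errors and Z-scores. The function names `f3` and `f4` are hypothetical.

```python
def f3(target, source1, source2):
    """Mean over SNPs of (c - a)(c - b), with c the target frequency.

    A clearly negative value suggests the target is admixed between
    populations related to source1 and source2. With an outgroup placed
    in the target slot, the value measures drift shared by the other two.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


def f4(pop_a, pop_b, pop_c, pop_d):
    """Mean over SNPs of (a - b)(c - d).

    Values near zero are expected when (A, B) and (C, D) form
    unadmixed clades with respect to each other.
    """
    products = [(a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]
    return sum(products) / len(products)
```

For example, a target whose frequencies are the midpoint of two differentiated sources yields a negative `f3`, which is the signature interpreted as admixture in Table 1.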

Haplotype-based fine-scale analyses: ChromoPainter and fineSTRUCTURE

We used the phased dataset with 244,688 SNPs and 2139 individuals for haplotype-based analyses. ChromoPainter was used to “paint” each individual as a combination of all other sequences [42]. We executed the “All vs. All” mode (using the -a flag), where all individuals are considered as both donors and recipients. Next, we analyzed the resulting painted dataset using fineSTRUCTURE [42] to identify genetically homogeneous clusters. The step-wise run parameters are described in the Supplementary Text. The analyzed individuals were initially classified into 233 clusters, which we reduced to increase the interpretability of subsequent analyses. More specifically, we iteratively “climbed the tree” and combined branches consisting of fewer than five clusters if at least one of them was composed of fewer than five individuals. The obtained tree was further refined by pooling together pairs or triplets of clusters if the pairwise total variation distance (TVD) based on the number of chunks shared among members of a branch was >0.035. After refinement, 40 clusters remained.

Nonnegative least square (NNLS)

Starting from the copying vectors obtained with ChromoPainter, we reconstructed the ancestry profile of each cluster or individual by applying a slight modification of the NNLS function of R v3.5.1 (https://www.r-project.org/), as described elsewhere [43]. Specifically, for each Kuwaiti cluster and for each individual belonging to one of these clusters, we decomposed the ancestry as a mixture (with proportions summing to 1) of either five (North/East Europe, Bedouins1, Yoruba, Druze, and North Africa clusters) or three (only North/East Europe, Bedouins1, and Yoruba) putative ancestral sources.
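The idea of expressing a copying vector as a nonnegative mixture of source copying vectors, with proportions summing to 1, can be sketched as follows. This is not the modified R NNLS routine used in the study: as a stand-in, it solves the same constrained least-squares problem by projected gradient descent onto the probability simplex. The function names and the iteration count are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            theta = candidate  # last k that passes gives the final shift
    return [max(x - theta, 0.0) for x in v]


def mixture_proportions(target, sources, n_iter=5000):
    """Least-squares mixture weights of `sources` that best reproduce `target`.

    target: copying vector of the cluster or individual being decomposed.
    sources: list of copying vectors, one per putative ancestral source.
    """
    k = len(sources)
    weights = [1.0 / k] * k
    # Conservative step size from a bound on the gradient's Lipschitz constant
    frobenius_sq = sum(x * x for s in sources for x in s)
    step = 1.0 / (2.0 * frobenius_sq)
    for _ in range(n_iter):
        predicted = [sum(weights[j] * sources[j][i] for j in range(k))
                     for i in range(len(target))]
        residual = [p - t for p, t in zip(predicted, target)]
        gradient = [2.0 * sum(r * x for r, x in zip(residual, sources[j]))
                    for j in range(k)]
        weights = project_to_simplex(
            [weights[j] - step * gradient[j] for j in range(k)])
    return weights
```

When the target is an exact mixture of linearly independent sources, the returned weights recover the mixing proportions; with real copying vectors they should be read as a best fit under the stated constraints.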

Exploring the variability of Kuwaiti individuals via pairwise TVD analysis

To obtain a detailed picture of the variation underlying the modern-day Kuwaiti population, we determined the pairwise TVD [44] among different individuals in specific clusters. TVD quantifies the difference in ancestry profiles between individuals, where a high TVD value indicates high heterogeneity. First, we determined the TVD among individuals of the same cluster. Second, focusing only on Kuwaiti individuals, we determined the TVD among Kuwaitis and all the members of the respective clusters. Third, for each cluster, we determined the TVD only among Kuwaiti individuals. The analysis was performed in consideration of the number and length of genomic fragments inherited among individuals.
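As a concrete illustration, TVD between two copying vectors is half the sum of their absolute differences, and within-cluster heterogeneity can be summarized over all pairs of members. The sketch below assumes each copying vector is normalized to sum to 1; the function names are hypothetical.

```python
from itertools import combinations


def tvd(profile_a, profile_b):
    """Total variation distance between two normalized copying vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def pairwise_tvd(profiles):
    """TVD for every unordered pair of individuals in a cluster."""
    return [tvd(a, b) for a, b in combinations(profiles, 2)]
```

A value of 0 means two individuals copy from the donors in identical proportions, and a value of 1 means their copying profiles do not overlap at all.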

Estimation of admixture dates

The times of admixture events were investigated using GLOBETROTTER [45]. We applied an “individual” approach by analyzing each Kuwaiti individual alone. The estimation parameters are described in the Supplementary Text. For each of the inferred admixture events, we considered only those that were characterized by bootstrap values for the time of an admixture event between 1 and 400. We also estimated admixture dates with ALDER v1.03, which computes a weighted LD statistic and models the decay of admixture LD [46], to check that the dates obtained from ALDER and GLOBETROTTER were consistent.
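The principle behind LD-decay dating can be sketched with a simple curve fit. Under a single-pulse admixture model, weighted LD decays approximately as A·exp(−n·d), where d is genetic distance in Morgans and n is the number of generations since admixture. The sketch below is not ALDER's procedure: it ignores the affine term and the jackknife standard errors, and simply fits a straight line to log-transformed values. The function name is hypothetical.

```python
from math import exp, log


def fit_decay_generations(distances_morgans, weighted_ld):
    """Estimate generations since admixture from an exponential LD decay curve.

    Fits log(weighted_ld) = log(A) - n * d by ordinary least squares and
    returns n. All weighted_ld values are assumed to be positive.
    """
    xs = list(distances_morgans)
    ys = [log(v) for v in weighted_ld]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope is -n
```

On a noise-free synthetic curve generated with n = 13, the fit returns 13 generations; real curves are noisy and require the full statistical treatment implemented in the dedicated tools.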

Results

Population structure

As populations inhabiting the AP are well-known for their consanguinity, we compared the RoH patterns of the Kuwaiti population with those of regional and continental populations. Among the three Kuwaiti subgroups, the Kuwait-S subgroup was proximal to the AP and the Middle East in terms of both the length and number of RoH segments (Fig. 1A), whereas the Kuwait-B and Kuwait-P subgroups were distant from the Kuwait-S subgroup, with the Kuwait-B subgroup displaying the lowest average length and number of RoH segments (Fig. 1A).

Furthermore, we applied the structure-like clustering algorithm ADMIXTURE to determine the detailed discrete genetic structure of the Kuwait population [39]. A given number of distinct ancestral populations is input as clusters (K), and ADMIXTURE computes, for each individual, the proportion of genetic ancestry derived from each cluster or ancestral population. We chose the best K value of 9 with the lowest cross-validation index. At K = 9, the Kuwaiti populations were characterized by six substantial ancestral components, like their neighboring populations across the AP and the Middle East (Fig. 1B). However, each of the three specified Kuwaiti subgroups harbors different proportions of these ancestral components. The brown (Arabian) component, which was shared by all populations of the AP and the Middle East and maximized in Saudis and Bedouins, was the most prominent in the Kuwait-S subgroup, followed by the Kuwait-B subgroup, and was least prominent in the Kuwait-P subgroup. The Caucasus (orange), North African Mozabite-like (magenta), South Asian (red), and Kalash-like (light green) components were higher in the Kuwait-P subgroup (as in Iranians) than in the two other Kuwaiti subgroups. The Kuwait-B subgroup was distinct among the Kuwaitis in having the highest overall African ancestry, which was dominated by the West African Mandenka/Yoruba-like (blue) component (Fig. 1B).

Assessment of mean pairwise F_ST differences reflected the regional affinity through the low degree of differentiation between the Kuwaiti subgroups and the AP populations (Fig. 1C). Short genetic distances (F_ST < 0.01) were observed between the Kuwait-P subgroup and Arabians, Middle Easterners, Caucasians, South Asians, and Europeans; between the Kuwait-B subgroup and Arabians; and between the Kuwait-S subgroup and Arabians and Middle Easterners. The highest degree of differentiation (F_ST > 0.05) was observed between the Kuwait-P/Kuwait-S subgroups and Africans/East Asians. The Kuwait-B subgroup was genetically distant from East Asians, but not Africans (Fig. 1C). The individual population-wise F_ST differences are presented in Supplementary Fig. 2.

PCA was performed to evaluate the relationships among global and regional populations, which inferred the geographical affinity of all three Kuwaiti subgroups from their distributions in the close vicinity of their Arabian and Middle Eastern neighbors (Supplementary Fig. 3). As the allele frequency-based PCA did not reveal the heterogeneous population structure at the finest level, we further performed PCA of the haplotype copying vectors (Fig. 3) together with other haplotype-based investigations.

Drift and allelic sharing of the Kuwaiti population with modern and ancient individuals

The f₃ admixture results provide distinct patterns of the histories of all three Kuwaiti subgroups, wherein the Kuwait-B subgroup had a significant admixture signal (i.e., Z-score < −3) when one of the ancestral sources was either Bedouin or Saudi, while the second source was a different African population (Table 1). Both the Kuwait-S and Kuwait-P subgroups had a common ancestral source in Saudis, but not Bedouins, while the second source varied. The signal for the Kuwait-S subgroup was significant only when the second ancestral source was African, whereas the Kuwait-P subgroup had a significant admixture signal when the second ancestral source was a population from South Asia, Parsi, Caucasus, or Central Asia. Hence, among the three subgroups, the Kuwait-S is probably related to an ancestral basal AP population group.

Table 1 Admixture f₃ (Source1, Source2; Target) to detect putative admixing sources in context of three assigned Kuwaiti population subgroups.


By f₃ outgroup analysis with modern populations using Mbuti as an outgroup, we observed that all three Kuwaiti subgroups had an identical declining pattern of shared drift from Europe to East Asia (Fig. 2A). The results also showed that the Kuwait-B subgroup had a distinctly lower shared drift than both the Kuwait-P and Kuwait-S subgroups, in regard to all modern populations across the globe. Compared with the Kuwait-S subgroup, the Kuwait-P subgroup had a higher shared drift with most Eurasian populations, with the exception of those from the AP and Africa. However, the Kuwait-S subgroup had a higher shared drift with populations with deeper Arabian and African genetic backgrounds. Interestingly, among the three subgroups, the Kuwait-B subgroup had the least drift sharing with Africans, which can probably be explained by the higher masking of African-related alleles from the outgroup. To verify this hypothesis, we performed another f₃ analysis (Supplementary Fig. 4) using Papuans (instead of Mbuti) as an outgroup underlying the idea of an early split of Papuans from other non-Africans. Consequently, we observed that the drift sharing was greater for the Kuwait-B subgroup with Africans than the other two Kuwaiti groups. This drift sharing of the Kuwait-B subgroup with Africans was relatively higher with Yoruba/Mandenka/Bantu/African Pygmy populations than that with Ethiopian/Moroccan/Mozabite populations.

**Fig. 2: Outgroup f₃ (Mbuti; Pop1, X) results of the three subgroups of Kuwaiti populations with Mbuti as an outgroup.**

In the context of the f₃ outgroup analysis with ancient West Eurasian individuals, the Kuwait-B subgroup again stood apart from the other subgroups due to the low derived allele sharing with all analyzed ancient genomes (Fig. 2B). The Kuwait-P subgroup had the highest genetic affinity with Iberian_BA or Anatolian_ChL individuals but showed higher genetic relatedness with most of the ancient individuals compared with the Kuwait-B and Kuwait-S subgroups. With respect to the Europe_EN and Europe_MnChl, the Kuwait-P and Kuwait-S subgroups shared equal amounts of derived alleles. Interestingly, the Kuwait-S subgroup had higher genetic sharing with the Levant_N, Levant_BA, and Natufians than the Kuwait-P and Kuwait-B subgroups. In general, all three subgroups showed a pattern of higher genetic affinity with ancient individuals from the Steppe and Caucasus regions.

The f₄ statistical results of the modern genomes (Supplementary Table 2) showed higher allelic sharing (gene flow) of the Kuwait-P and Kuwait-S subgroups with all Eurasian populations, compared with the Kuwait-B subgroup. Kuwait-P was found to have the highest sharing of alleles with all Eurasian populations, with the exception of those from the AP and Africa. Populations of the AP and Africa mostly shared higher alleles with the Kuwait-S subgroup than with the Kuwait-P and Kuwait-B subgroups. Given the pronounced African genetic component of the Kuwait-B subgroup in the ADMIXTURE results, the less allelic sharing in f-stats could be the consequence of higher masking of African-like alleles. The f₄ statistical analysis with aDNA also disclosed a similar pattern, as the Kuwait-P and Kuwait-S subgroups shared higher alleles with all ancient individuals than with the Kuwait-B subgroup (Supplementary Table 3). Kuwait-P shared more alleles with almost all ancient individuals than the Kuwait-S subgroup, except in the context of the Levant_N, Levant_BA, and Natufian, which shared more alleles with the Kuwait-S subgroup than with the Kuwait-P subgroup. However, with regard to the Europe_EN and Anatolia_N, both the Kuwait-P and Kuwait-S subgroups shared almost equivalent numbers of alleles. From the f₄-ratio estimates, we observed very low proportions of Neanderthal ancestry (<0.5%) in the Kuwaiti subgroups (Supplementary Table 4). As the associated standard errors are quite high, no definite conclusion could be reached in this context.

Distinct genetic heterogeneity in the Kuwaiti population

To understand the fine-scale population structure and recent admixture history, we scrutinized the haplotype-sharing patterns among Kuwaiti individuals. Through fineSTRUCTURE analysis, we inferred the existence of eight different groups that included at least five Kuwaiti individuals (Fig. 3B). Of these, three (Kuwait1–3) were part of a group of clusters consisting of all individuals from the Kuwait-B subgroup, including three that formed a minor group. Four additional groups (Kuwait4–7) were part of a group of clusters that included groups from the AP and comprised 38% of the individuals of the Kuwait-P subgroup and 75% of those of the Kuwait-S subgroup. Moreover, 33% of the Kuwait-P subgroup were part of a larger group that included Jews and populations from Anatolia, the Caucasus, and the Levant, which formed a sister group with the Druze population. This cluster scheme confirmed that the previous clustering postulating the existence of three different groups had only partially captured the high genetic variation of Kuwaitis.

**Fig. 3: Genetic structure based on haplotype-sharing pattern.**

The relationships among populations were further explored by PCA based on the chunkcount coancestry matrix. As shown in the PCA plot (Fig. 3A), PC1 and PC3 accounted for 0.24% and 0.08% of the variation, respectively. As such, the fineSTRUCTURE clustering of Kuwaiti individuals mirrors the assemblies of individuals of different Kuwaiti groups with corresponding populations of the same fineSTRUCTURE subcluster, rather than with each other.

The allele frequency and haplotype-based results further divided the genetic structure of the Kuwaiti population, highlighting the high genetic variability among the three Kuwaiti subgroups. Therefore, to ensure that the three subgroups sufficiently accounted for the existent genetic variability of the Kuwaiti population, we explored the pairwise TVD distribution at different levels. Basically, we calculated the TVD between individuals of a cluster considering haplotypic copying vectors of the individuals. When the intra-cluster TVD was taken into consideration, most of the Kuwaiti samples had the highest TVD among all clusters in the dataset, revealing strong heterogeneity (Supplementary Fig. 5). This heterogeneity was still evident when only the TVD (a) among Kuwaitis and all the members of the same clusters (Fig. 4C, D) and (b) among Kuwaitis of the same cluster (Supplementary Fig. 6) were explored, implying that the high diversity of the clusters with large numbers of Kuwaitis is, at least in part, driven by the high heterogeneity among native Kuwaitis, rather than by differences among individuals from the AP.

**Fig. 4: Ancestral genetic components, admixture history and genetic variability.**

Ancestral genetic components of the Kuwaiti population clusters

The extreme variation of the analyzed samples was also confirmed when the ancestry of the eight clusters was explored with the NNLS function (Fig. 4A). Namely, the proportions of inferred European, African, and Middle East/Arabian ancestry were quite variable. More specifically, the Kuwait1, Kuwait2, and Kuwait3 clusters were characterized by a similar proportion of the three ancestries, with a slightly higher contribution from Africa (39%, 36%, and 46% from the Yoruba cluster, respectively). In contrast, clusters Kuwait6, Kuwait7, and Kuwait8 were characterized by a large proportion of Middle East/Arabian ancestry (47%, 62%, and 66% from the Bedouins1 cluster, respectively). The Kuwait5 cluster had similar contributions from all sources, with the greatest contribution from Europe (26%, 33%, and 39% from the Yoruba, Bedouins1, and North/East Europe clusters, respectively). Finally, the Kuwait9 cluster had the highest European ancestry (54% from the North/East Europe cluster). The genetic contributions of the Druze and North Africa clusters were also evaluated, but the proportions did not vary much among the eight Kuwaiti clusters (Supplementary Fig. 7).

Recent admixture events

To identify the admixture events with different sources for the three Kuwaiti subgroups, we analyzed their haplotype data using GLOBETROTTER [45], which disclosed different genetic profiles of the major and minor ancestral sources of the three subgroups (Supplementary Fig. 8). The major ancestral sources of the Kuwait-P subgroup were Kuwait, Europe, and South Asia, those of the Kuwait-B subgroup were Africa and Kuwait, and that of the Kuwait-S subgroup was Kuwait. We found different minor ancestral sources of Africa, such as Mandenka and some hunter-gatherers, for the Kuwait-S subgroup, which also had high Central Asian ancestral sources (Supplementary Fig. 8A). On the other hand, the Kuwait-B subgroup underwent several African admixture events (Supplementary Fig. 8B). Moreover, when inferring the time of admixture events of the three groups, we observed that the GLOBETROTTER results showed significantly more recent admixture events for the Kuwait-B subgroup than for the Kuwait-P and Kuwait-S subgroups (Fig. 4B, Wilcoxon test with Bonferroni-adjusted p values: KWB vs. KWP = 0.000012; KWB vs. KWS = 0.0012). More specifically, the mean inferred time for the Kuwait-B subgroup was 13 generations ago (5–95%: 2–27 generations ago), and similar values were observed for the Kuwait-P and Kuwait-S subgroups (~19 generations; 5–95%: 4–37 and 3–43 generations ago, respectively) (Fig. 4B). To gain additional confidence in the admixture events predicted by GLOBETROTTER, we applied the ALDER tool [46] for admixture dating and observed that the admixture time estimates obtained from ALDER (Supplementary Table 5) were consistent with the GLOBETROTTER estimates.

Discussion

The results of the classical allele frequency-based analyses conducted in the present study are in agreement with our previous genome-wide study of the genetic substructure of the Kuwaiti population, which showed genetic heterogeneity [28]. Additionally, the RoH were relatively long and frequent in the Kuwaiti population, but also divergent among the subgroups, which is rather intriguing considering the high prevalence of consanguinity in the studied population. Thus, we applied refined statistical approaches to analyze the genetic structure of the Kuwaiti population in the context of available ancient samples and haplotypes of modern genomes. As the “within” population structure is embedded within haplotypes, we resolved the extent of genetic heterogeneity by fine-scale, high-resolution, haplotype-based analyses. We used ChromoPainter and fineSTRUCTURE to uncover the hidden genetic structure “within” the Kuwaiti population. These methods exploit the rich information available within haplotypes to identify clusters of genetically distinct individuals with a resolution that cannot be attained with the use of allele frequency-based methods. Through this approach, we could identify distinct genetic clusters of individuals that strongly segregate within the three Kuwaiti subgroups.

The results of f₃ and f₄ confirmed the existence of a heterogeneous genetic pattern among the Kuwaitis, singling out a probable impact of distinct population dynamics that shaped the current genetic diversity of the different Kuwaiti subgroups. The f₃ admixture results (Table 1) showed a set of ancestral sources for the Kuwait-B subgroup that was distinct from those of the Kuwait-P and Kuwait-S subgroups, suggesting a different genetic background of the Kuwait-B subgroup related to ancestors of contemporary Bedouins and Africans. Notably, a similarly higher African-related genetic background of the Kuwait-B subgroup was also observed in the results of ADMIXTURE and CP-NNLS analyses. However, this pattern was not replicated in the outgroup f₃ and f₄ statistics with African Mbuti as an outgroup, probably due to the masking of alleles shared with the outgroup. This higher masking of alleles related to the outgroup in the Kuwait-B subgroup plausibly reflects a greater number of African-related alleles among individuals in the Kuwait-B subgroup, possibly due to a recent admixture event. There was greater genetic affinity between the Kuwait-P and Kuwait-S subgroups. With regard to other modern and ancient populations (Fig. 3A, B), the significant admixture signals with the use of the Saudi population as an admixture source (Table 1) indicate a relatively higher common genetic background of these two groups compared with the Kuwait-B subgroup. However, considering the relatively recent origin of the Kuwaiti population from the Saudi people, the discrepancies in the admixture signal, shared drift, and allele sharing pattern of the Kuwait-B subgroup, particularly in the context of both modern and ancient individuals, can be interpreted as a consequence of later gene flow from African-related populations to the Kuwait-B subgroup. This is evident from the lowest number and length of RoH segments in the Kuwait-B subgroup (Fig. 1A), suggesting higher interactions with outsiders.
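To make the logic of the admixture f₃ test concrete, the following is a minimal sketch of the naive estimator, f₃(target; A, B), computed as the mean over SNPs of (c − a)(c − b), where c, a, and b are allele frequencies in the target and the two candidate sources. It is an illustration only: the allele frequencies are hypothetical, and the sketch omits the finite-sample correction and the block-jackknife standard errors that published implementations apply. A clearly negative value is the signal interpreted as admixture.

```python
from statistics import mean


def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean over SNPs of (c - a) * (c - b).

    Inputs are per-SNP allele frequencies. No sample-size correction is
    applied, so this is a teaching sketch rather than a usable estimator.
    """
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have equal length")
    return mean((c - a) * (c - b) for c, a, b in zip(target, source_a, source_b))


# Hypothetical allele frequencies at five SNPs (illustrative values only).
source_a = [0.10, 0.80, 0.30, 0.90, 0.20]
source_b = [0.70, 0.20, 0.90, 0.30, 0.60]

# A target constructed as an equal mixture of the two sources.
mixed_target = [(a + b) / 2 for a, b in zip(source_a, source_b)]

f3_admixture = f3(mixed_target, source_a, source_b)
print(f"f3(mixed; A, B) = {f3_admixture:.3f}")  # negative for a mixed target
```

The same function evaluates the outgroup form, f₃(outgroup; X, Y), when the outgroup frequencies are passed as the first argument; in that usage, larger positive values indicate more drift shared between X and Y.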

Meanwhile, the visibly higher genetic affinity of the Kuwait-S subgroup than of the Kuwait-P subgroup to the AP and African populations in the admixture f₃ (Table 1), shared drift (Fig. 2A), and allele sharing (Supplementary Table 2) patterns suggests a different population history for each subgroup and a lesser extent of intermixing. In fact, the highest RoH burden of the Kuwait-S subgroup (both in the average number and length of segments) is also in agreement with inbreeding and negligible interactions with other groups. Such consanguinity plausibly explains why individuals in the Kuwait-S subgroup remain much closer to their ancestral source groups from the AP, which is also supported by the admixture f₃ results showing that, among the three subgroups, the Kuwait-S subgroup was more of a basal group. The greater affinity of the Kuwait-S subgroup with populations from the AP is also corroborated by the f₃ outgroup results with modern data and even with ancient individuals, where the Kuwait-S subgroup had greater affinity to ancient Levant farmers and hunter-gatherers than the Kuwait-P subgroup (Fig. 2B). This genetic affinity further corroborates that Neolithic farmers from the Fertile Crescent plausibly repopulated the AP [5]. Meanwhile, the genetic affinity of the Kuwait-P subgroup to European and Caucasus populations is in agreement with the Persian-related genetic background. Considering this fact, it is anticipated that the Kuwait-P subgroup would be more closely related to ancient populations from the Steppe, Caucasus, and Iran, with lesser genetic affinity to ancient Levantines and Natufian populations than the Kuwait-S subgroup, as reflected in outgroup f₃ with ancient genomes. It was intriguing to find less allele sharing of the Kuwait-B subgroup with Africans in f₃ with Mbuti as outgroup, especially considering the higher African genetic components as determined by ADMIXTURE analysis. However, in the additional f₃ using Papuans as the outgroup (Supplementary Fig. 4), strikingly, the Kuwait-B subgroup had the highest degree of drift sharing with West Africans, suggesting a recent gene flow between these groups.

Taking into account that both the West and East African populations were transported to the Middle East, Arabia, and the Indian Ocean during the slave trade in the 15–19th centuries [47, 48], the f₃ admixture results of the Kuwait-B and Kuwait-S subgroups verified the impact of the slave trade on AP populations. Moreover, considering the NNLS results of higher ancestry contribution from Yoruba in the Kuwait1, 2, and 3 clusters (with most Kuwait-B individuals), and the low level of variation in the North African genetic ancestry profile for all eight Kuwaiti clusters in addition to the admixture signal with Mandenka, we infer a possible recent admixture event in the Kuwait-B subgroup with a West African (Mandenka or Yoruba) or sub-Saharan African group.
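The idea behind the NNLS ancestry estimates can be sketched as follows: a target profile is modelled as a non-negative mixture of donor profiles, and the fitted weights, rescaled to sum to one, are read as ancestry proportions. The sketch below is an assumption-laden illustration, not the pipeline used in the study: it substitutes a simple projected gradient descent for a dedicated NNLS solver, and the donor profiles and the function name `mixture_weights` are hypothetical.

```python
def mixture_weights(target, donors, iterations=2000):
    """Non-negative least squares by projected gradient descent.

    Finds w >= 0 minimising ||sum_k w[k] * donors[k] - target||^2, then
    rescales w to sum to 1 so the entries read as mixture proportions.
    """
    n_donors = len(donors)
    # Step size 1/L; the sum of squared entries bounds the top eigenvalue.
    step = 1.0 / (2.0 * sum(x * x for d in donors for x in d))
    w = [1.0 / n_donors] * n_donors
    for _ in range(iterations):
        fitted = [sum(w[k] * donors[k][i] for k in range(n_donors))
                  for i in range(len(target))]
        residual = [f - t for f, t in zip(fitted, target)]
        grad = [2.0 * sum(d_i * r for d_i, r in zip(donors[k], residual))
                for k in range(n_donors)]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_donors)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("all weights collapsed to zero")
    return [x / total for x in w]


# Hypothetical donor profiles (each a vector of copying proportions).
donor_1 = [0.6, 0.3, 0.1]
donor_2 = [0.1, 0.2, 0.7]
# A target built as a 70/30 mixture of the two donors.
target = [0.7 * a + 0.3 * b for a, b in zip(donor_1, donor_2)]

weights = mixture_weights(target, [donor_1, donor_2])
print([round(x, 3) for x in weights])
```

Because the target here is an exact mixture, the fitted weights recover the proportions used to build it; with real data the fit is approximate and the residual indicates how well the chosen donors explain the target.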

Furthermore, the GLOBETROTTER results reinforce different major and minor ancestral sources of all three Kuwaiti subgroups (Supplementary Fig. 8A) and varied admixture events (Supplementary Fig. 8B). The higher African source profile of the Kuwait-B subgroup suggests a more recent admixture event, within the last ~10 generations (~300 years), for the Kuwait-B subgroup than for the other two subgroups (Fig. 4B). The Kuwait-P and Kuwait-S subgroups share a similar single date of admixture approximately 18 generations (~500 years) ago (Fig. 4B; Supplementary Fig. 8B). As expected, these admixture events reflect the genetic footprint that Arabian maritime dominance [15, 16] and the slave trade [48, 49] left in present-day Kuwaitis. The Kuwait-B subgroup, however, is probably one of the most exogamous groups inhabiting the AP, with a different and more recent history of intermixing that plausibly involved slaves from Western or sub-Saharan Africa [50]. Moreover, the fineSTRUCTURE and PCA identified distinct genetic clusters of individuals that strongly segregated within the three Kuwaiti subgroups, indicating a much deeper and distinct level of genetic heterogeneity among present-day Kuwaitis.
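The conversion from admixture dates in generations to calendar years is simple arithmetic, sketched below. The generation times of 28 and 30 years are assumptions chosen because they bracket the approximate figures quoted above; the value actually used for the conversion is not stated in this section.

```python
def generations_to_years(generations, years_per_generation):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation


# Assumed generation times in years (not taken from the text).
ASSUMED_GENERATION_TIMES = (28, 30)

# Admixture dates in generations, as reported in the paragraph above.
reported_dates = (("Kuwait-B", 10), ("Kuwait-P / Kuwait-S", 18))

for label, generations in reported_dates:
    low, high = (generations_to_years(generations, g)
                 for g in ASSUMED_GENERATION_TIMES)
    print(f"{label}: {generations} generations is about {low}-{high} years")
```

Under these assumptions, 10 generations corresponds to 280–300 years and 18 generations to 504–540 years, in line with the rounded values of ~300 and ~500 years.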

The TVD results of the differences in genetic ancestry profiles of individuals of a cluster provided a better picture of the genetic heterogeneity of the Kuwaiti population. Notably, the intra-cluster TVD value was the highest (both in length and number) for Kuwaiti populations inhabiting the AP and remained highest when the TVD was calculated only among Kuwaiti individuals in a single cluster. Consequently, the results indicate that the heterogeneity of clusters was due to the Kuwaiti population, suggesting that Kuwaitis are one of the most genetically heterogeneous populations inhabiting the AP region. NNLS analysis also confirmed the genetic heterogeneity of the Kuwaiti population by showing that three major donor populations (Yoruba, Bedouins, and North-East Europeans) contributed different amounts of genetic ancestry to each of these clusters.
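As an illustration of how an intra-cluster TVD summarizes heterogeneity, the sketch below assumes the standard definition of total variation distance between two ancestry profiles, half the sum of absolute differences, and averages it over all pairs of individuals in a cluster. The profiles over three donors (Yoruba, Bedouin, North-East European) are hypothetical values, and `mean_pairwise_tvd` is a name introduced here for illustration only.

```python
from itertools import combinations


def tvd(profile_a, profile_b):
    """Total variation distance: half the sum of absolute differences."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have equal length")
    return 0.5 * sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def mean_pairwise_tvd(profiles):
    """Average TVD over all unordered pairs of individuals in a cluster."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(tvd(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ancestry profiles: (Yoruba, Bedouin, North-East European).
cluster = [
    [0.30, 0.60, 0.10],
    [0.05, 0.75, 0.20],
    [0.15, 0.55, 0.30],
]

print(f"TVD, individuals 1 vs 2: {tvd(cluster[0], cluster[1]):.2f}")
print(f"Mean intra-cluster TVD: {mean_pairwise_tvd(cluster):.3f}")
```

A TVD of 0 means two individuals have identical ancestry profiles, and 1 means the profiles do not overlap at all, so a higher mean within a cluster indicates more heterogeneous membership.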

In general, Kuwaiti populations are demographically characterized by large families and a remarkable rate of consanguinity, which is a potential threat to human health, especially in the context of rare autosomal recessive genetic diseases. As there have been historical population migrations in this region, a complex genetic diversity is expected, which was clearly reflected in this study. Our thorough investigations of the Kuwaiti population genetic structure at the finest scale highlight the precise genetic history and distinct heterogeneity of the Kuwaiti people, which could greatly aid the systematic discovery of population- and/or family-specific diseases, especially in deciphering deleterious founder genetic variations. Overall, our study presents the fine-scale genetic structure of the distinctively heterogeneous Kuwaiti population and further highlights the recent historical population influx and gene flow from Western/sub-Saharan Africa to the AP region.

References

1. Armitage SJ, Jasim SA, Marks AE, Parker AG, Usik VI, Uerpmann H-P. The Southern Route “Out of Africa”: evidence for an early expansion of modern humans into Arabia. Science. 2011;331:453–6.
2. Groucutt HS, Grün R, Zalmout IAS, Drake NA, Armitage SJ, Candy I, et al. Homo sapiens in Arabia by 85,000 years ago. Nat Ecol Evol. 2018;2:800–9.
3. Stewart M, Clark-Wilson R, Breeze PS, Janulis K, Candy I, Armitage SJ, et al. Human footprints provide snapshot of last interglacial ecology in the Arabian interior. Sci Adv. 2020;6:eaba8940.
4. Petraglia MD, Groucutt HS, Guagnin M, Breeze PS, Boivin N. Human responses to climate and ecosystem change in ancient Arabia. Proc Natl Acad Sci USA. 2020;117:8263–70.
5. Uerpmann H-P, Potts DT, Uerpmann M. Holocene (Re-)occupation of Eastern Arabia. In: Petraglia MD, Rose JI, editors. The evolution of human populations in Arabia: paleoenvironments, prehistory and genetics. Dordrecht: Springer Netherlands; 2010. pp. 205–14.
6. Groucutt HS, Petraglia MD. The prehistory of the Arabian peninsula: deserts, dispersals, and demography. Evol Anthropol. 2012;21:113–25.
7. Petraglia MD, Alsharekh A. The Middle Palaeolithic of Arabia: Implications for modern human origins, behaviour and dispersals. Antiquity. 2003;77:671–84.
8. Petraglia MD, Breeze PS, Groucutt HS. Blue Arabia, Green Arabia: examining human colonisation and dispersal models. In: Rasul NMA, Stewart ICF, editors. Geological setting, palaeoenvironment and archaeology of the Red Sea. Cham: Springer International Publishing; 2019. pp. 675–83.
9. Boivin N, Fuller DQ. Shell Middens, ships and seeds: exploring coastal subsistence, maritime trade and the dispersal of domesticates in and around the ancient Arabian Peninsula. J World Prehistory. 2009;22:113–80.
10. Omberg L, Salit J, Hackett N, Fuller J, Matthew R, Chouchane L, et al. Inferring genome-wide patterns of admixture in Qataris using fifty-five ancestral populations. BMC Genet. 2012;13:49.
11. Scott EM, Halees A, Itan Y, Spencer EG, He Y, Azab MA, et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat Genet. 2016;48:1071–6.
12. Hajjej A, Almawi WY, Arnaiz-Villena A, Hattab L, Hmida S. The genetic heterogeneity of Arab populations as inferred from HLA genes. PLoS ONE. 2018;13:e0192269.
13. Eaaswarkhanth M, Dos Santos ALC, Gokcumen O, Al-Mulla F, Thanaraj TA. Genome-wide selection scan in an Arabian Peninsula population identifies a TNKS haplotype linked to metabolic traits and hypertension. Genome Biol Evol. 2020;12:77–87.
14. Tadmouri GO, Sastry KS, Chouchane L. Arab gene geography: From population diversities to personalized medical genomics. Glob Cardiol Sci Pract. 2014;2014:394–408.
15. Potts DT. The archaeology and early history of the Persian Gulf. In: Potter LG, editor. The Persian Gulf in history. New York: Palgrave Macmillan US; 2009. pp. 27–56.
16. Potts DT. Trends and patterns in the archaeology and pre-modern history of the Gulf Region. In: Peterson JE, editor. The Emergence of the Gulf States: Studies in Modern History. London: Bloomsbury; 2016. pp. 19–42.
17. Khoury PS, Kostiner J. Tribes and State formation in the Middle East. Berkeley: University of California Press; 1991.
18. Slot B. Kuwait: the growth of a historic identity. London: Arabian Publishing; 2003.
19. Lienhardt P. Shaikhdoms of Eastern Arabia. In: Al-Shahi A, editor. London: Palgrave Macmillan; 2001.
20. Casey MS. The history of Kuwait. Westport, Connecticut, United States: Greenwood Publishing Group; 2007.
21. Alenizi M, Goodwin W, Ismael S, Hadi S. STR data for the AmpFlSTR Identifiler loci in Kuwaiti population. Leg Med. 2008;10:321–5.
22. Teebi AS. Autosomal recessive disorders among Arabs: an overview from Kuwait. J Med Genet. 1994;31:224–33.
23. Scheible M, Alenizi M, Sturk-Andreaggi K, Coble MD, Ismael S, Irwin JA, et al. Mitochondrial DNA control region variation in a Kuwaiti population sample. Forensic Sci Int Genet. 2011;5:e112–3.
24. Theyab JB, Al-Bustan S, Crawford MH. The genetic structure of the Kuwaiti population: mtDNA Inter- and intra-population variation. Hum Biol. 2012;84:379–403.
25. Eaaswarkhanth M, Melhem M, Sharma P, Nizam R, Al Madhoun A, Chaubey G, et al. Mitochondrial DNA D-loop sequencing reveals obesity variants in an Arab population. Appl Clin Genet. 2019;12:63–70.
26. Mohammad T, Xue Y, Evison M, Tyler-Smith C. Genetic structure of nomadic Bedouin from Kuwait. Heredity. 2009;103:425–33.
27. Triki-Fendri S, Sánchez-Diz P, Rey-González D, Alfadhli S, Ayadi I, Ben Marzoug R, et al. Genetic structure of the Kuwaiti population revealed by paternal lineages. Am J Hum Biol. 2016;28:203–12.
28. Alsmadi O, Thareja G, Alkayal F, Rajagopalan R, John SE, Hebbar P, et al. Genetic substructure of Kuwaiti population reveals migration history. PLoS ONE. 2013;8:e74913.
29. Alsmadi O, John SE, Thareja G, Hebbar P, Antony D, Behbehani K, et al. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS ONE. 2014;9:e99069.
30. John SE, Thareja G, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis. Genom Data. 2015;3:116–27.
31. Thareja G, John SE, Hebbar P, Behbehani K, Thanaraj TA, Alsmadi O. Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry. BMC Genom. 2015;16:92.
32. John SE, Antony D, Eaaswarkhanth M, Hebbar P, Channanath AM, Thomas D, et al. Assessment of coding region variants in Kuwaiti population: implications for medical genetics and population genomics. Sci Rep. 2018;8:16583.
33. Hebbar P, Elkum N, Alkayal F, John SE, Thanaraj TA, Alsmadi O. Genetic risk variants for metabolic traits in Arab populations. Sci Rep. 2017;7:40988.
34. Hebbar P, Nizam R, Melhem M, Alkayal F, Elkum N, John SE, et al. Genome-wide association study identifies novel recessive genetic variants for high TGs in an Arab population. J Lipid Res. 2018;59:1951–66.
35. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
36. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2011;9:179–81.
37. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–70.
38. Patterson N, Price AL, Reich D. Population structure and eigen analysis. PLoS Genet. 2006;2:e190.
39. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–64.
40. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192:1065–93.
41. Petr M, Pääbo S, Kelso J, Vernot B. Limits of long-term selection against Neandertal introgression. Proc Natl Acad Sci USA. 2019;116:1639–44.
42. Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012;8:e1002453.
43. Montinaro F, Busby GBJ, Pascali VL, Myers S, Hellenthal G, Capelli C. Unravelling the hidden ancestry of American admixed populations. Nat Commun. 2015;6:6596.
44. Busby GBJ, Hellenthal G, Montinaro F, Tofanelli S, Bulayeva K, Rudan I, et al. The role of recent admixture in forming the contemporary West Eurasian genomic landscape. Curr Biol. 2015;25:2878.
45. Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, et al. A genetic atlas of human admixture history. Science. 2014;343:747–51.
46. Loh P-R, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193:1233–54.
47. Gordon M. Slavery in the Arab World. London: Rowman & Littlefield; 1989.
48. Ugo Nwokeji G. Slavery in non-Islamic West Africa, 1420–1820. In: Eltis D, Engerman S, editors. The Cambridge World History of Slavery. Cambridge: Cambridge University Press; 2011. pp. 81–110.
49. Segal R. Islam’s Black Slaves: the Other Black Diaspora. New York: Farrar, Straus and Giroux; 2001.
50. Fernandes V, Triska P, Pereira JB, Alshamali F, Rito T, Machado A, et al. Genetic stratigraphy of key demographic events in Arabia. PLoS ONE. 2015;10:e0118625.

Acknowledgements

This work was supported by a research grant to the Dasman Diabetes Institute from the Kuwait Foundation for the Advancement of Sciences (grant no. RA 2015-022). We thank the members of the National Dasman Diabetes BioBank Core Facility for sample processing and DNA extraction. We acknowledge Fadi Alkayal and Motasem Melhem for genotyping assays. We thank Prof. Gyaneshwer Chaubey for his support during the initial stages of this work. We thank Dr. Luca Pagani for his useful comments and suggestions.

Funding

This work was supported by a research grant to the Dasman Diabetes Institute from the Kuwait Foundation for the Advancement of Sciences (grant no. RA 2015-022).

Author information

These authors contributed equally: Muthukrishnan Eaaswarkhanth, Ajai K. Pathak

Authors and Affiliations

Department of Genetics and Bioinformatics, Dasman Diabetes Institute, Dasman, Kuwait
Muthukrishnan Eaaswarkhanth, Prashantha Hebbar, Osama Alsmadi, Fahd Al-Mulla & Thangavel Alphonse Thanaraj
Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia
Ajai K. Pathak, Linda Ongaro, Francesco Montinaro & Mait Metspalu
Department of Evolutionary Biology, Institute of Molecular and Cell Biology, Tartu, Estonia
Ajai K. Pathak & Linda Ongaro
Department of Biology-Genetics, University of Bari, Bari, Italy
Francesco Montinaro
Department of Cell Therapy and Applied Genomics, King Hussein Cancer Center, Amman, Jordan
Osama Alsmadi

Contributions

ME and TAT designed the study. OA was responsible for participant recruitment, sample collection, and genotyping. PH processed the raw genotype data. OA and PH were involved in subgroup classification of the samples. ME, AKP, LO, and FM analyzed the data. ME, AKP, and MM contributed to the interpretation of results. ME and AKP wrote the main article. TAT, FAM, LO, and FM contributed to the writing of the article. FAM provided the required resources and critically reviewed and approved the final version of the article. All authors reviewed and approved the final version of the article prior to submission.

Corresponding authors

Correspondence to Fahd Al-Mulla or Thangavel Alphonse Thanaraj.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Eaaswarkhanth, M., Pathak, A.K., Ongaro, L. et al. Unraveling a fine-scale high genetic heterogeneity and recent continental connections of an Arabian Peninsula population. Eur J Hum Genet 30, 307–319 (2022). https://doi.org/10.1038/s41431-021-00861-6

Received: 31 October 2020
Revised: 09 February 2021
Accepted: 03 March 2021
Published: 22 March 2021
Issue Date: March 2022
DOI: https://doi.org/10.1038/s41431-021-00861-6