Introduction

The Arabian Peninsula (AP) is a melting pot of human diversity and culture. Its geographical location, as a land bridge connecting Africa and Eurasia, made it a gateway for human migration from the African continent to the rest of the world [1]. Recent archeological findings and ancient artifacts indicate that humans occupied the AP much earlier than previously thought, providing a new paradigm for the migration of modern humans out of Africa [2, 3]. In contrast to its present arid desert landscape, the AP was once humid and green, an environment that favored the dispersal of modern humans [4]. The present hot and dry desert climate is estimated to have begun around 6 kya [4], and population movements in the AP were discontinuous between the late Pleistocene and the Early Holocene [5]. However, the AP was later repopulated [5], and its occupants gradually adapted to the arid environment [6], establishing distinct settlements [6,7,8]. Moreover, the Arabian Sea served as a maritime corridor for the spice trade between Africa, Arabia, South Asia, and Southeast Asia [9]. The early nomadic lifestyle in the arid environment, together with the frequent influx of other populations, subsequent admixture, and consanguineous practices, drove the emergence of indigenous ethnic populations throughout the AP [6]. Multiple waves of migration from Europe, Asia, and Africa further shaped the genetic diversity of ethnic Arab populations. For example, human populations that migrated to Qatar have been genetically categorized into three groups based on putative ancestry from Arabia/Bedouins, Persia/South Asia, and Africa [10]. As a few genetic studies show, the AP is currently populated by genetically heterogeneous groups [11, 12], and our recent genome-wide selection analysis also highlights this inter-regional heterogeneity [13]. 
The genetic heterogeneity of the highly consanguineous Arab populations complicates efforts to understand the genetic basis and etiology of the many diseases prevalent among them. Notably, more than 1100 different genetic diseases have been reported in Arab populations, of which 60% are autosomal recessive and 44% are restricted to a specific population or geographical region [14]. Hence, there is an urgent need to characterize these regional populations genetically at the finest level, which requires understanding the fine-scale genetic structure and regional heterogeneity of the populations inhabiting the AP. However, these aspects remain understudied. Against this background, we conducted high-resolution population genetic analyses to characterize the fine-scale patterns of genetic diversity and the extent of genetic heterogeneity of the Arab people of Kuwait.

Kuwait is located in the northeastern corner of the AP and is bordered by Saudi Arabia to the south and Iraq to the north. Several archeological explorations indicate that Kuwait was a vital location used by ancient civilizations as a military or trade station [15, 16]. The State of Kuwait was established during the early 18th century by pastoral-nomadic tribes that migrated from Saudi Arabia [17] and underwent a dramatic transformation after the discovery of oil in the 1930s [18]. Apart from cattle ranching, the early settlers engaged in various occupations, such as fishing and merchant seafaring, for survival [19]. Large-scale maritime trade with India and Africa established Kuwait as a nexus connecting India, Arabia, and Persia to Europe [18]. These occupations socially stratified the population into aristocrats, merchants, and nomads (Bedouins) [20]. The oil boom endowed Kuwait with an affluent economy, high employment opportunities, an influx of skilled workers from Asia and Africa, high-tech infrastructure, and westernized lifestyles. The resettlement of people from neighboring regions, mainly Saudi Arabia and Persia, and the admixture of various populations facilitated gene flow [21], thereby increasing genetic diversity. Arab populations are well known for their high rates of consanguinity, and Kuwait is no exception; it faces the consequences in the form of severe recessive diseases [22]. Studies based on uniparental markers have delineated the maternal [23,24,25] and paternal [26, 27] diversity in Kuwait, and our earlier study based on genome-wide single nucleotide polymorphisms (SNPs) delineated the genetic substructure of the Kuwaiti population based on surnames and genetic ancestry [28]. 
Accordingly, the putative ancestors of the current Kuwaiti population can be traced to Saudi Arabian (Kuwait-S), Persian (Kuwait-P), and Bedouin (Kuwait-B) populations, as further corroborated by whole-exome and whole-genome studies [29,30,31,32]. However, the extent of genetic heterogeneity and the fine-scale population structure have not yet been elucidated. We therefore extended our previous studies by considerably increasing the sample size, comparing the data with the growing body of modern and ancient genomes from across West Eurasia, and applying multiple haplotype-based analyses on a genome-wide scale.

Materials and methods

The detailed materials and methods are presented in the Supplementary Text.

Samples

We included DNA samples from 620 Kuwaiti individuals from the State of Kuwait who were part of a larger cohort collected for studies of metabolic disorders [28, 33, 34]. All participants were healthy and were enrolled after providing written informed consent. The study protocol was approved by the Ethical Review Committee of the Dasman Diabetes Institute (Kuwait). Participant recruitment, sample collection, and related procedures were conducted in accordance with the tenets of the Declaration of Helsinki. Each of the 620 individuals was assigned to a subgroup (Kuwait-P, Kuwait-S, or Kuwait-B) based on their surname, as described elsewhere [28].

Genotyping and quality control

The 620 individuals were genotyped for 730,525 SNPs using HumanOmniExpress arrays (Illumina, Inc., San Diego, CA, USA) in accordance with the manufacturer's protocol. Quality control (QC) checks and data filtering were conducted using PLINK v1.9 [35]. After QC and relatedness filtering, 583 individuals and 587,819 SNPs met the inclusion criteria for analysis (Supplementary Text and Supplementary Fig. 1). The genotypes of the individuals analyzed in this study have been deposited in the European Genome-phenome Archive (EGA; accession number: EGAS00001005034).

Datasets, merging, and phasing

We merged our data with published genome-wide SNP genotype data of global populations (Supplementary Fig. 1 and Supplementary Table 1) available from the Estonian Biocentre database (https://evolbio.ut.ee/). The combined dataset was filtered using PLINK v1.9 [35] to retain 244,688 SNPs and 2139 individuals (Supplementary Text), phased using the SHAPEIT algorithm [36], and used for further analyses. To avoid the effects of markers in strong linkage disequilibrium (LD), we thinned the marker set by removing SNPs with r² > 0.4 within a sliding window of 200 SNPs, shifted in steps of 25 SNPs. The pruned dataset of 155,744 SNPs was used for the relevant population genetic analyses, including Wright's fixation index (FST), principal component analysis (PCA), ADMIXTURE ancestry estimation, and runs of homozygosity (RoH).
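The pruning rule can be illustrated with a short Python sketch. It assumes an in-memory genotype matrix (individuals by SNPs, coded 0/1/2, no missing calls) and, when a pair of SNPs exceeds the r² threshold, simply drops the later SNP. In PLINK this kind of pruning is exposed through the --indep-pairwise option, for which "200 25 0.4" would be the analogous setting, but the text does not state which option was used and PLINK applies its own rule for choosing which SNP to remove. The function name ld_prune is hypothetical.

```python
import numpy as np

def ld_prune(genotypes, window=200, step=25, r2_threshold=0.4):
    """Greedy sliding-window LD pruning (illustrative sketch).

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns a boolean mask of retained SNPs.
    """
    g = np.asarray(genotypes, float)
    n_snps = g.shape[1]
    sd = g.std(axis=0)
    keep = sd > 0  # monomorphic SNPs carry no information and are dropped
    # Standardize each SNP once so that Pearson r = mean(z_i * z_j).
    z = np.zeros_like(g)
    z[:, keep] = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    for start in range(0, n_snps, step):
        idx = [i for i in range(start, min(start + window, n_snps)) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            # Drop any later SNP in the window whose r^2 with SNP i exceeds the threshold.
            for j in idx[pos + 1:]:
                if keep[j] and np.mean(z[:, i] * z[:, j]) ** 2 > r2_threshold:
                    keep[j] = False
    return keep
```

Applied to the LD-pruned dataset size reported above, the mask would be used to subset the genotype matrix before FST, PCA, ADMIXTURE, and RoH analyses.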

Population structure analyses: FST, PCA, ADMIXTURE, and RoH

To explore the population genetic structure, we first computed the mean pairwise FST between all population groups using the Weir and Cockerham method [37] implemented in PLINK v1.9 [35]. Next, we conducted PCA of the LD-pruned combined dataset using the smartpca program in EIGENSOFT version 6.1.4 [38]. We then ran an unsupervised, STRUCTURE-like analysis using ADMIXTURE v1.3.0 [39] on the same dataset 25 times at different time intervals, with K values ranging from 2 to 12. K = 9 was the best-supported value, having the lowest cross-validation error. RoH were estimated using PLINK v1.9 [35] with a sliding window of 100 SNPs (1000 kb), allowing one heterozygous and five missing calls per window.

Analyses to test admixture events and relative allele sharing: f3 and f4 statistics

We computed the f3 and f4 statistics using the qp3Pop and qpDstat programs (with f4mode: YES) implemented in the ADMIXTOOLS package [40]. A dataset containing 244,688 SNPs and 2139 individuals was used for the f-statistics of modern individuals. For analyses involving ancient genomes, we merged the modern and ancient datasets, yielding 231,418 SNPs and 3697 individuals. We used outgroup f3 statistics to measure the derived allele sharing of the different Kuwaiti groups with other populations as a measure of genetic similarity, and admixture f3 statistics to infer plausible admixing sources in the history of the Kuwaiti population. We calculated f4 statistics to evaluate the level of gene flow between contemporary Kuwaitis and their regional neighbors, and the allele sharing between modern Kuwaitis and published ancient individuals from surrounding regions. Further, we computed the f4-ratio [41] to estimate the amount of Neanderthal ancestry in the Kuwaiti subgroups. Further details are presented in the Supplementary Text.
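The f3 statistics used here can be sketched directly from allele frequencies. The Python example below computes the per-SNP f3(C; A, B) with the standard finite-sample correction for the population in the first position, and a delete-one-block jackknife for the standard error and Z-score. Blocks of equal SNP count are a simplification (ADMIXTOOLS defines jackknife blocks by genetic distance), and the function names are hypothetical. An admixture f3 with Z < −3 is read as evidence that C is admixed between sources related to A and B; an outgroup f3 places the outgroup first, f3(O; A, X), so that larger values indicate more drift shared between A and X.

```python
import numpy as np

def f3_per_snp(c, a, b, n_c):
    """Per-SNP f3(C; A, B) from allele frequencies c, a, b.

    n_c is the number of sampled chromosomes in C; subtracting c(1 - c)/(n_c - 1)
    removes the upward bias that sampling noise in c adds to (c - a)(c - b).
    """
    c, a, b = (np.asarray(v, float) for v in (c, a, b))
    return (c - a) * (c - b) - c * (1 - c) / (n_c - 1)


def block_jackknife(per_snp, n_blocks=100):
    """Mean of a per-SNP statistic with a delete-one-block jackknife SE and Z-score.

    Blocks are contiguous runs of roughly equal SNP count (a simplification).
    Returns (estimate, standard_error, z_score).
    """
    x = np.asarray(per_snp, float)
    blocks = np.array_split(np.arange(len(x)), n_blocks)
    total, n = x.sum(), len(x)
    # Leave-one-block-out means.
    loo = np.array([(total - x[idx].sum()) / (n - len(idx)) for idx in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * ((loo - loo.mean()) ** 2).sum())
    est = x.mean()
    return est, se, est / se
```

For a target that is an even mixture of two diverged sources, the per-SNP values average to roughly minus one quarter of the squared source difference, which is why a strongly negative Z-score signals admixture.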

Haplotype-based fine-scale analyses: ChromoPainter and fineSTRUCTURE

We used the phased dataset of 244,688 SNPs and 2139 individuals for haplotype-based analyses. ChromoPainter [42] was used to “paint” each individual's genome as a mosaic of all other sequences. We ran the “All vs. All” mode (using the -a flag), in which every individual is treated as both a donor and a recipient. Next, we analyzed the resulting painted dataset using fineSTRUCTURE [42] to identify genetically homogeneous clusters. The step-wise run parameters are described in the Supplementary Text. The analyzed individuals were initially classified into 233 clusters, which we reduced to improve the interpretability of subsequent analyses. Specifically, we iteratively “climbed the tree”, merging branches containing fewer than five clusters if at least one of those clusters comprised fewer than five individuals. The resulting tree was further refined by pooling pairs or triplets of clusters if the pairwise total variation distance (TVD), based on the number of chunks shared among members of a branch, was >0.035. After refinement, 40 clusters remained.

Nonnegative least squares (NNLS)

Starting from the copying vectors obtained with ChromoPainter, we reconstructed the ancestry profile of each cluster or individual by applying a slight modification of the NNLS function in R v3.5.1 (https://www.r-project.org/), as described elsewhere [43]. Specifically, for each individual belonging to a Kuwaiti cluster and for each of these clusters, we decomposed their ancestry as a mixture (with proportions summing to 1) of either five putative ancestral sources (the North/East Europe, Bedouins1, Yoruba, Druze, and North Africa clusters) or three (the North/East Europe, Bedouins1, and Yoruba clusters only).
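The decomposition can be sketched in Python with SciPy's nnls solver. This is a stand-in for the modified R function cited above, not a reproduction of it: the sketch imposes the sum-to-one constraint softly by appending a heavily weighted row of ones to the design matrix and then rescales the weights. The weight value and the function name are assumptions, and copying vectors are assumed to be already summarized over donor clusters.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_ancestry(target, sources, sum_weight=1000.0):
    """Express a target copying vector as a non-negative mixture of source copying vectors.

    target:  (n_donors,) copying vector of the individual or cluster being decomposed.
    sources: (n_sources, n_donors) copying vectors of the putative ancestral clusters.
    Returns mixture proportions (one per source) that sum to 1.
    """
    S = np.asarray(sources, float)
    # Columns of the design matrix are the source vectors; the extra row of large,
    # equal values pushes the fitted weights towards summing to 1.
    A = np.vstack([S.T, np.full(S.shape[0], sum_weight)])
    y = np.append(np.asarray(target, float), sum_weight)
    w, _residual_norm = nnls(A, y)
    return w / w.sum()  # renormalize so proportions sum exactly to 1
```

With five sources, `sources` would hold the North/East Europe, Bedouins1, Yoruba, Druze, and North Africa vectors; with three, only the first three.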

Exploring the variability of Kuwaiti individuals via pairwise TVD analysis

To obtain a detailed picture of the variation underlying the modern-day Kuwaiti population, we computed the pairwise TVD [44] among individuals in specific clusters. TVD measures the difference between the ancestry profiles of two individuals, and high TVD values within a group indicate high heterogeneity. First, we computed the TVD among individuals of the same cluster. Second, focusing only on Kuwaiti individuals, we computed the TVD between Kuwaitis and all members of their respective clusters. Third, for each cluster, we computed the TVD among Kuwaiti individuals only. Each analysis was performed using both the number and the length of the genomic fragments copied between individuals.
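Pairwise TVD between two individuals can be computed from their copying vectors, each rescaled to sum to 1, as half the sum of absolute differences over donor groups. The minimal Python sketch below assumes the copying vectors (chunk counts or chunk lengths) have already been aggregated per donor cluster; the function names are hypothetical. Running it once on count-based and once on length-based vectors mirrors the two versions of the analysis described above.

```python
import numpy as np
from itertools import combinations

def tvd(p, q):
    """Total variation distance between two copying vectors, each rescaled to sum to 1."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    return 0.5 * np.abs(p / p.sum() - q / q.sum()).sum()


def pairwise_tvd(vectors):
    """TVD for every pair of rows in `vectors` (e.g. all members of one cluster)."""
    vectors = np.asarray(vectors, float)
    return np.array([tvd(vectors[i], vectors[j])
                     for i, j in combinations(range(len(vectors)), 2)])
```

TVD ranges from 0 (identical copying profiles) to 1 (no donor group in common), so a cluster whose pairwise values are shifted towards 1 is internally heterogeneous.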

Estimation of admixture dates

The timing of admixture events was investigated using GLOBETROTTER [45]. We applied an “individual” approach, analyzing each Kuwaiti individual separately. The estimation parameters are described in the Supplementary Text. Of the inferred admixture events, we retained only those whose bootstrap estimates of admixture time lay between 1 and 400 generations. To verify that the inferred dates were consistent, we also estimated admixture dates with ALDER v1.03 [46], which models the decay of a weighted admixture LD statistic with genetic distance, and compared them with the GLOBETROTTER estimates.
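Both GLOBETROTTER and ALDER date admixture from the decay of admixture-induced LD with genetic distance: under a single pulse of admixture, a weighted LD curve falls off roughly as A·exp(−n·d) + c, with d in Morgans and n the number of generations since admixture. The Python sketch below fits that curve by a grid search over integer n, solving for A and c by linear least squares at each candidate. It only illustrates the fitting idea and is not ALDER's implementation (which also computes the weighted LD statistic itself, handles short-range LD, and estimates errors by jackknife); the 1–400 grid mirrors the filtering range above, and the function name is hypothetical.

```python
import numpy as np

def fit_admixture_date(d_morgans, weighted_ld, n_grid=np.arange(1, 401)):
    """Fit a(d) = A * exp(-n * d) + c and return (best n in generations, [A, c]).

    For each candidate n, A and c enter linearly and are solved exactly by least
    squares; the n with the smallest residual sum of squares is kept.
    """
    d = np.asarray(d_morgans, float)
    y = np.asarray(weighted_ld, float)
    best = None
    for n in n_grid:
        X = np.column_stack([np.exp(-n * d), np.ones_like(d)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(((X @ coef - y) ** 2).sum())
        if best is None or rss < best[0]:
            best = (rss, int(n), coef)
    return best[1], best[2]
```

A finer grid or a continuous optimizer would give non-integer dates; converting generations to years additionally requires an assumed generation time.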

Results

Population structure

As populations inhabiting the AP are well known for their consanguinity, we compared the RoH patterns of the Kuwaiti population with those of regional and continental populations. Among the three Kuwaiti subgroups, Kuwait-S was closest to the AP and Middle Eastern populations in terms of both the length and number of RoH segments (Fig. 1A), whereas Kuwait-B and Kuwait-P were more distant from Kuwait-S, with Kuwait-B displaying the lowest average length and number of RoH segments (Fig. 1A).
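The comparison in Fig. 1A rests on per-individual RoH calls summarized as a segment count and a total length. The simplified Python sketch below uses the window settings stated in the Methods (100 SNPs, at most one heterozygous and five missing calls per window); treating the 1000 kb figure as a minimum segment length is an assumption, and PLINK's own RoH routine applies additional rules, so results will not match it exactly. The function names are hypothetical.

```python
import numpy as np

def call_roh(genotypes, positions_kb, window=100, max_het=1, max_missing=5, min_kb=1000.0):
    """Simplified sliding-window RoH caller for a single individual.

    genotypes: 0/1/2 allele counts per SNP, with -1 marking a missing call.
    positions_kb: physical SNP positions in kb, in the same order.
    A window of `window` consecutive SNPs is homozygous if it has at most `max_het`
    heterozygous and at most `max_missing` missing calls. SNPs covered by at least one
    homozygous window are flagged; each run of consecutive flagged SNPs spanning at
    least `min_kb` is reported as a (start_kb, end_kb) segment.
    """
    g = np.asarray(genotypes)
    pos = np.asarray(positions_kb, float)
    n = len(g)
    het = g == 1
    miss = g == -1
    flagged = np.zeros(n, dtype=bool)
    for start in range(0, n - window + 1):
        stop = start + window
        if het[start:stop].sum() <= max_het and miss[start:stop].sum() <= max_missing:
            flagged[start:stop] = True
    segments = []
    i = 0
    while i < n:
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and flagged[j + 1]:
            j += 1
        if pos[j] - pos[i] >= min_kb:
            segments.append((float(pos[i]), float(pos[j])))
        i = j + 1
    return segments


def roh_summary(segments):
    """Number of RoH segments and their total length in kb for one individual."""
    return len(segments), sum(end - start for start, end in segments)
```

Population-level points such as those in Fig. 1A would then be averages of these per-individual values.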

Fig. 1: Population structure and ancestry components.

A Scatter plot showing the average number of RoH segments and the total RoH length in kb. B ADMIXTURE plot of individual ancestry proportions at K = 9. C Mean pairwise FST values showing the genetic distances between regional and continental populations.

Furthermore, we applied the STRUCTURE-like clustering algorithm ADMIXTURE [39] to determine the discrete genetic structure of the Kuwaiti population in detail. The number of ancestral populations (K) is specified in advance, and ADMIXTURE estimates the proportion of each individual's genome derived from each of the K ancestral populations. We chose K = 9, which had the lowest cross-validation error. At K = 9, the Kuwaiti populations were characterized by six substantial ancestral components, as were their neighboring populations across the AP and the Middle East (Fig. 1B). However, each of the three Kuwaiti subgroups harbored different proportions of these components. The brown (Arabian) component, which was shared by all populations of the AP and the Middle East and was maximized in Saudis and Bedouins, was most prominent in Kuwait-S, followed by Kuwait-B, and least prominent in Kuwait-P. The Caucasus (orange), North African Mozabite-like (magenta), South Asian (red), and Kalash-like (light green) components were higher in the Kuwait-P subgroup (as in Iranians) than in the other two Kuwaiti subgroups. The Kuwait-B subgroup was distinct among Kuwaitis in having the highest overall African ancestry, dominated by the West African Mandenka/Yoruba-like blue component (Fig. 1B).
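Selecting K by cross-validation amounts to comparing the CV error reported for each K. The small Python sketch below assumes each ADMIXTURE run was launched with the --cv option and its console output saved as text, in which ADMIXTURE reports lines of the form "CV error (K=9): <value>". It averages the error over repeated runs and returns the K with the minimum; averaging across the 25 runs is an assumption, since the text states only that the lowest cross-validation error was used. The function name best_k is hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as "CV error (K=9): 0.5"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")

def best_k(log_texts):
    """Return (K with the lowest mean CV error, {K: mean CV error}).

    log_texts: iterable of strings, each the captured output of one ADMIXTURE --cv run.
    """
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    mean_err = {k: mean(v) for k, v in errors.items()}
    return min(mean_err, key=mean_err.get), mean_err
```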

The mean pairwise FST values reflected regional affinity, with a low degree of differentiation between the Kuwaiti subgroups and the AP populations (Fig. 1C). Low genetic distances (FST < 0.01) were observed between the Kuwait-P subgroup and Arabians, Middle Easterners, Caucasians, South Asians, and Europeans; between the Kuwait-B subgroup and Arabians; and between the Kuwait-S subgroup and Arabians and Middle Easterners. The highest degree of differentiation (FST > 0.05) was observed between the Kuwait-P/Kuwait-S subgroups and Africans/East Asians. The Kuwait-B subgroup was genetically distant from East Asians but not from Africans (Fig. 1C). The FST values for individual populations are presented in Supplementary Fig. 2.
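For reference, the Weir and Cockerham estimator behind these values can be written for biallelic SNPs and two populations as below. This Python sketch assumes complete genotype matrices coded 0/1/2 and combines loci as a ratio of summed variance components, one common way of averaging across SNPs; PLINK v1.9 additionally handles missing data and reports its own mean and weighted summaries, so this is an illustration rather than a replacement. The function name wc_fst is hypothetical.

```python
import numpy as np

def wc_fst(geno_a, geno_b):
    """Weir & Cockerham (1984) FST between two populations (ratio of summed components).

    geno_a, geno_b: (n_individuals, n_snps) arrays of 0/1/2 allele counts, no missing data.
    """
    r = 2.0  # number of populations
    pops = [np.asarray(geno_a, float), np.asarray(geno_b, float)]
    n = np.array([g.shape[0] for g in pops], float)       # diploid sample sizes
    p = np.array([g.mean(axis=0) / 2.0 for g in pops])    # allele frequencies, shape (2, L)
    h = np.array([(g == 1).mean(axis=0) for g in pops])   # observed heterozygosity
    n_bar = n.sum() / r
    n_c = (r * n_bar - (n ** 2).sum() / (r * n_bar)) / (r - 1)
    w = n[:, None]
    p_bar = (w * p).sum(axis=0) / (r * n_bar)
    s2 = (w * (p - p_bar) ** 2).sum(axis=0) / ((r - 1) * n_bar)
    h_bar = (w * h).sum(axis=0) / (r * n_bar)
    pq = p_bar * (1 - p_bar)
    # Between-population (a), between-individual (b), and within-individual (c) components.
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a.sum() / (a + b + c).sum()
```

Fixed allele differences give a value of 1, while two samples from the same population give values near (and possibly slightly below) 0, because the estimator is unbiased rather than bounded.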

PCA of global and regional populations placed all three Kuwaiti subgroups in close proximity to their Arabian and Middle Eastern neighbors, reflecting their geographical affinity (Supplementary Fig. 3). As this allele frequency-based PCA did not reveal heterogeneous population structure at the finest level, we further performed PCA of the haplotype copying vectors (Fig. 3A), together with other haplotype-based analyses.

Drift and allelic sharing of the Kuwaiti population with modern and ancient individuals

The admixture f3 results revealed distinct histories for the three Kuwaiti subgroups. The Kuwait-B subgroup had a significant admixture signal (i.e., Z-score < −3) when one ancestral source was either Bedouin or Saudi and the other was one of several African populations (Table 1). The Kuwait-S and Kuwait-P subgroups shared a common ancestral source in Saudis, but not Bedouins, with a varying second source. The signal for Kuwait-S was significant only when the second source was African, whereas Kuwait-P had significant admixture signals when the second source was a South Asian, Parsi, Caucasian, or Central Asian population. Hence, among the three subgroups, Kuwait-S is probably the most closely related to a basal ancestral AP population.

Table 1 Admixture f3 (Source1, Source2; Target) used to detect putative admixing sources for the three assigned Kuwaiti subgroups.

In the outgroup f3 analysis with modern populations, using Mbuti as the outgroup, all three Kuwaiti subgroups showed the same declining pattern of shared drift from Europe to East Asia (Fig. 2A). The Kuwait-B subgroup had distinctly lower shared drift than the Kuwait-P and Kuwait-S subgroups with all modern populations worldwide. Compared with Kuwait-S, Kuwait-P had higher shared drift with most Eurasian populations, except those from the AP and Africa, whereas Kuwait-S had higher shared drift with populations of deeper Arabian and African genetic backgrounds. Interestingly, among the three subgroups, Kuwait-B shared the least drift with Africans, which can probably be explained by stronger masking of African-related alleles shared with the outgroup. To test this hypothesis, we performed another f3 analysis (Supplementary Fig. 4) using Papuans instead of Mbuti as the outgroup, given the early split of Papuans from other non-Africans. With this outgroup, Kuwait-B shared more drift with Africans than the other two Kuwaiti subgroups did, and this drift sharing was higher with Yoruba, Mandenka, Bantu, and African Pygmy populations than with Ethiopian, Moroccan, and Mozabite populations.

Fig. 2: Outgroup f3 (Mbuti; Pop1, X) results of the three subgroups of Kuwaiti populations with Mbuti as an outgroup.

A Comparison of the shared drift of the Kuwaiti populations to that of other modern individuals. The continental group color code for the populations on the y-axis: brown4-Africa; blue-Arabian Peninsula; blueviolet-Middle East; deepskyblue-Europe; coral-Caucasus; chartreuse4-Central Asia; darkorange1-South Asia; cyan3-East Asia. B Relative affinities to ancient individuals from West Eurasia. The regional color code for the populations on the y-axis: chartreuse4-America; blue-Anatolia; brown1-Caucasus; deepskyblue-Europe; cornflowerblue-Iran; blueviolet-Iran_Turan; brown4-Levant; deeppink-Siberia; darkorange1-South Asia; cyan3-Steppe.

In the outgroup f3 analysis with ancient West Eurasian individuals, the Kuwait-B subgroup again stood apart from the other subgroups, showing low derived allele sharing with all analyzed ancient genomes (Fig. 2B). The Kuwait-P subgroup had its highest genetic affinity with Iberian_BA or Anatolian_ChL individuals and showed higher affinity with most ancient individuals than the Kuwait-B and Kuwait-S subgroups did. The Kuwait-P and Kuwait-S subgroups shared equal amounts of derived alleles with Europe_EN and Europe_MnChl. Interestingly, the Kuwait-S subgroup shared more derived alleles with Levant_N, Levant_BA, and Natufians than the Kuwait-P and Kuwait-B subgroups did. In general, all three subgroups showed higher genetic affinity with ancient individuals from the Steppe and Caucasus regions.

The f4 statistics for modern genomes (Supplementary Table 2) showed higher allele sharing (gene flow) of the Kuwait-P and Kuwait-S subgroups with all Eurasian populations compared with the Kuwait-B subgroup. Kuwait-P had the highest allele sharing with all Eurasian populations except those from the AP and Africa, which mostly shared more alleles with Kuwait-S than with Kuwait-P and Kuwait-B. Given the pronounced African component of Kuwait-B in the ADMIXTURE results, its lower allele sharing in the f-statistics could be a consequence of stronger masking of African-like alleles. The f4 statistics with ancient DNA revealed a similar pattern: the Kuwait-P and Kuwait-S subgroups shared more alleles with all ancient individuals than the Kuwait-B subgroup did (Supplementary Table 3). Kuwait-P shared more alleles with almost all ancient individuals than Kuwait-S, except Levant_N, Levant_BA, and Natufians, which shared more alleles with Kuwait-S. With Europe_EN and Anatolia_N, however, the Kuwait-P and Kuwait-S subgroups shared almost equivalent numbers of alleles. The f4-ratio estimates indicated very low proportions of Neanderthal ancestry (<0.5%) in the Kuwaiti subgroups (Supplementary Table 4); however, as the associated standard errors were high, no definite conclusion could be drawn.
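The f4 and f4-ratio quantities referred to above reduce to averages over SNPs of allele-frequency products. The Python sketch below computes f4(A, B; C, D) and the ratio α = f4(A, O; X, C) / f4(A, O; B, C), which estimates the fraction of X's ancestry from a B-related source under a simple two-source model. The text does not list the populations used for the Neanderthal estimate; a configuration often used in the literature places one Neanderthal genome as A, chimpanzee as O, a second Neanderthal genome as B, and an African population as C, but that choice is an assumption here. Standard errors would again come from a block jackknife, and the function names are hypothetical.

```python
import numpy as np

def f4_stat(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b)(c - d), from allele frequencies.

    No finite-sample correction is needed when the four populations are distinct.
    """
    a, b, c, d = (np.asarray(v, float) for v in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))


def f4_ratio(a, o, x, b, c):
    """alpha = f4(A, O; X, C) / f4(A, O; B, C).

    Under a model in which X = alpha * (B-related source) + (1 - alpha) * (C-related
    source), this ratio estimates alpha.
    """
    return f4_stat(a, o, x, c) / f4_stat(a, o, b, c)
```

Because f4 is linear in each population's allele frequencies, an exact mixture of B and C returns the mixing proportion, which is the property the ratio exploits.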

Distinct genetic heterogeneity in the Kuwaiti population

To understand the fine-scale population structure and recent admixture history, we examined the haplotype-sharing patterns among Kuwaiti individuals. The fineSTRUCTURE analysis identified eight groups that each included at least five Kuwaiti individuals (Fig. 3B). Of these, three (Kuwait1–3) belonged to a group of clusters consisting of all individuals from the Kuwait-B subgroup, including three that formed a minor group. Four additional groups (Kuwait4–7) belonged to a group of clusters that included populations from the AP and contained 38% of the Kuwait-P subgroup and 75% of the Kuwait-S subgroup. Moreover, 33% of the Kuwait-P subgroup belonged to a larger group that included Jews and populations from Anatolia, the Caucasus, and the Levant, which formed a sister group with the Druze. This clustering indicates that the previous three-group classification only partially captured the high genetic variation of Kuwaitis.

Fig. 3: Genetic structure based on haplotype-sharing pattern.

A PCA based on haplotype sharing. The chunkcount coancestry matrix obtained with ChromoPainter was used to perform PCA with prcomp in R; the first and third components are shown. B The refined fineSTRUCTURE tree, consisting of 40 homogeneous clusters. Clusters containing more than two Kuwaiti individuals are shown in different colors, as indicated in the legend.

The relationships among populations were further explored by PCA of the chunkcount coancestry matrix (Fig. 3A), in which PC1 and PC3 accounted for 0.24% and 0.08% of the variation, respectively. The PCA mirrors the fineSTRUCTURE clustering: Kuwaiti individuals from different groups are positioned with the populations of their respective fineSTRUCTURE subclusters rather than with each other.
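The haplotype-based PCA can be reproduced in outline with a column-centred singular value decomposition, which is what R's prcomp computes by default (centring without scaling). The Python sketch below assumes the coancestry matrix is already loaded as a recipients-by-donors array, with self-copying entries handled beforehand; it returns PC scores and the fraction of variance explained by each component, the quantity reported above for PC1 and PC3. The function name is hypothetical.

```python
import numpy as np

def coancestry_pca(chunkcounts, n_components=3):
    """PCA of a coancestry matrix (rows = recipients, columns = donors).

    Columns are mean-centred, as prcomp does by default, and decomposed by SVD.
    Returns (scores for the first n_components PCs, variance fraction per PC).
    """
    X = np.asarray(chunkcounts, float)
    Xc = X - X.mean(axis=0)
    U, s, _Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # equivalent to Xc @ Vt.T
    var_explained = s ** 2 / (s ** 2).sum()
    return scores, var_explained[:n_components]
```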

The allele frequency- and haplotype-based results further resolved the genetic structure of the Kuwaiti population, highlighting high genetic variability among the three Kuwaiti subgroups. To determine whether the three subgroups sufficiently accounted for the genetic variability of the Kuwaiti population, we explored the pairwise TVD distribution at different levels, calculating the TVD between individuals of a cluster from their haplotype copying vectors. For intra-cluster TVD, the clusters containing most of the Kuwaiti samples had the highest TVD values in the dataset, revealing strong heterogeneity (Supplementary Fig. 5). This heterogeneity remained evident when we considered only the TVD (a) between Kuwaitis and all members of the same clusters (Fig. 4C, D) and (b) among Kuwaitis of the same cluster (Supplementary Fig. 6), implying that the high diversity of clusters with large numbers of Kuwaitis is driven, at least in part, by high heterogeneity among native Kuwaitis rather than by differences among individuals from the AP.

Fig. 4: Ancestral genetic components, admixture history and genetic variability.

A Ancestry proportions of the main eight Kuwaiti clusters as inferred by NNLS analysis using the North/East Europe, Bedouins, and Yoruba clusters as putative sources. B Inferred admixture dates by GLOBETROTTER population-based analysis. Populations on the x-axis denote Kuwait-B (KWB), Kuwait-P (KWP), and Kuwait-S (KWS). C, D Intra-cluster TVD of the number and length of genomic fragments of Kuwaitis and individuals from the same cluster.

Ancestral genetic components of the Kuwaiti population clusters

The high variation of the analyzed samples was also confirmed when the ancestry of the eight clusters was explored with NNLS (Fig. 4A): the inferred proportions of European, African, and Middle Eastern/Arabian ancestry varied considerably. The Kuwait1, Kuwait2, and Kuwait3 clusters showed similar proportions of the three ancestries, with a slightly higher contribution from Africa (39%, 36%, and 46% from the Yoruba cluster, respectively). In contrast, clusters Kuwait6, Kuwait7, and Kuwait8 had a large proportion of Middle Eastern/Arabian ancestry (47%, 62%, and 66% from the Bedouins1 cluster, respectively). The Kuwait5 cluster had similar contributions from all sources, with the largest from Europe (26%, 33%, and 39% from the Yoruba, Bedouins1, and North/East Europe clusters, respectively). Finally, the Kuwait9 cluster had the highest European ancestry (54% from the North/East Europe cluster). The contributions of the Druze and North Africa clusters were also evaluated, but they varied little among the eight Kuwaiti clusters (Supplementary Fig. 7).

Recent admixture events

To identify admixture events with different sources for the three Kuwaiti subgroups, we analyzed their haplotype data using GLOBETROTTER [45], which revealed different profiles of major and minor ancestral sources for the three subgroups (Supplementary Fig. 8). The major ancestral sources were Kuwait, Europe, and South Asia for the Kuwait-P subgroup; Africa and Kuwait for the Kuwait-B subgroup; and Kuwait for the Kuwait-S subgroup. For the Kuwait-S subgroup, we found several minor African sources, such as Mandenka and some hunter-gatherers, as well as high Central Asian ancestral sources (Supplementary Fig. 8A). The Kuwait-B subgroup, in turn, underwent several African admixture events (Supplementary Fig. 8B). When inferring the timing of admixture events, GLOBETROTTER indicated significantly more recent admixture for the Kuwait-B subgroup than for the Kuwait-P and Kuwait-S subgroups (Fig. 4B; Wilcoxon test with Bonferroni-adjusted p values: KWB vs. KWP = 0.000012; KWB vs. KWS = 0.0012). Specifically, the mean inferred time for Kuwait-B was 13 generations ago (5–95%: 2–27 generations ago), whereas Kuwait-P and Kuwait-S had similar values (~19 generations; 5–95%: 4–37 and 3–43 generations ago, respectively) (Fig. 4B). To gain additional confidence in these dates, we applied ALDER [46] and found that its admixture time estimates (Supplementary Table 5) were consistent with the GLOBETROTTER estimates.
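The group comparison of admixture dates can be sketched with SciPy, assuming a two-sided Wilcoxon rank-sum test (equivalent to the Mann–Whitney U test) on per-individual date estimates, followed by Bonferroni correction, which multiplies each p value by the number of comparisons and caps the result at 1. The text does not state which Wilcoxon variant was used or how many comparisons were corrected for, so both are assumptions, and the function name is hypothetical.

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

def pairwise_dates_test(dates_by_group):
    """Two-sided rank-sum tests between every pair of groups, Bonferroni-adjusted.

    dates_by_group: dict mapping a group label (e.g. "KWB") to a list of admixture
    dates in generations. Returns {(group1, group2): adjusted p value}.
    """
    pairs = list(combinations(sorted(dates_by_group), 2))
    adjusted = {}
    for g1, g2 in pairs:
        p = mannwhitneyu(dates_by_group[g1], dates_by_group[g2],
                         alternative="two-sided").pvalue
        adjusted[(g1, g2)] = min(1.0, p * len(pairs))
    return adjusted
```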

Discussion

The results of the classical allele frequency-based analyses conducted in the present study are in agreement with our previous genome-wide study of the genetic substructure of the Kuwaiti population, which showed genetic heterogeneity [28]. Additionally, RoH were relatively long and frequent in the Kuwaiti population but divergent among the subgroups, which is intriguing given the high prevalence of consanguinity in the studied population. We therefore applied refined statistical approaches to analyze the genetic structure of the Kuwaiti population in the context of available ancient samples and the haplotypes of modern genomes. As the “within” population structure is embedded in haplotypes, we resolved the extent of genetic heterogeneity using fine-scale, high-resolution, haplotype-based analyses. We used ChromoPainter and fineSTRUCTURE to uncover the hidden genetic structure “within” the Kuwaiti population. These methods exploit the rich information in haplotypes to identify clusters of genetically distinct individuals at a resolution that cannot be attained with allele frequency-based methods. With this approach, we identified distinct genetic clusters of individuals that strongly segregate within the three Kuwaiti subgroups.

The f3 and f4 results confirmed a heterogeneous genetic pattern among Kuwaitis, singling out the probable impact of distinct population dynamics in shaping the current genetic diversity of the different Kuwaiti subgroups. The admixture f3 results (Table 1) showed a set of ancestral sources for the Kuwait-B subgroup distinct from those of the Kuwait-P and Kuwait-S subgroups, suggesting that Kuwait-B has a different genetic background related to the ancestors of contemporary Bedouins and Africans. Notably, a similarly higher African-related genetic background of Kuwait-B was observed in the ADMIXTURE and ChromoPainter-based NNLS analyses. However, this pattern was not replicated in the outgroup f3 and f4 statistics with the African Mbuti as the outgroup, probably because alleles shared with the outgroup were masked. This stronger masking in Kuwait-B plausibly reflects a greater number of African-related alleles among its individuals, possibly due to a recent admixture event. The Kuwait-P and Kuwait-S subgroups showed greater genetic affinity with each other in relation to other modern and ancient populations (Fig. 2A, B), and their significant admixture signals with the Saudi population as a source (Table 1) indicate a relatively greater shared genetic background of these two groups compared with Kuwait-B. However, considering the relatively recent origin of the Kuwaiti population from the Saudi people, the discrepancies in the admixture signal, shared drift, and allele sharing pattern of Kuwait-B, particularly with respect to both modern and ancient individuals, can be interpreted as a consequence of later gene flow from African-related populations into the Kuwait-B subgroup. This is consistent with Kuwait-B having the lowest number and length of RoH segments (Fig. 1A), suggesting more interactions with outsiders.

Meanwhile, the visibly higher genetic affinity of the Kuwait-S subgroup, compared with the Kuwait-P subgroup, to AP and African populations in the admixture f3 (Table 1), shared drift (Fig. 2A), and allele sharing (Supplementary Table 2) patterns suggests different population histories for the two subgroups and limited intermixing between them. Indeed, the Kuwait-S subgroup had the most RoH segments (in both average number and average length), which is consistent with inbreeding and negligible interaction with other groups. Such consanguinity plausibly explains why individuals in the Kuwait-S subgroup remain much closer to their ancestral source groups from the AP, a pattern also supported by the admixture f3 results, in which the Kuwait-S subgroup was the most basal of the three subgroups. The greater affinity of the Kuwait-S subgroup with AP populations is further corroborated by the outgroup f3 results with modern data and even with ancient individuals, where the Kuwait-S subgroup showed greater affinity to ancient Levantine farmers and hunter-gatherers than the Kuwait-P subgroup did (Fig. 2B). This affinity also supports the view that Neolithic farmers from the Fertile Crescent plausibly repopulated the AP [5]. In contrast, the genetic affinity of the Kuwait-P subgroup to European and Caucasus populations is in agreement with its Persian-related genetic background. Accordingly, the Kuwait-P subgroup would be expected to be more closely related to ancient populations from the Steppe, Caucasus, and Iran, and less closely related to ancient Levantines and Natufians than the Kuwait-S subgroup, as reflected in the outgroup f3 results with ancient genomes. It was intriguing to find less allele sharing between the Kuwait-B subgroup and Africans in the f3 analysis with Mbuti as the outgroup, especially given the higher African genetic component detected by the ADMIXTURE analysis. However, in an additional f3 analysis using Papuans as the outgroup (Supplementary Fig. 4), the Kuwait-B subgroup strikingly showed the highest degree of shared drift with West Africans, suggesting recent gene flow between these groups.
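The RoH comparisons above rest on counting homozygous segments and summing their lengths per individual. The following minimal sketch shows that idea for one chromosome under strong simplifying assumptions: genotypes are coded 0/1/2 (so 1 is heterozygous), every heterozygous call ends a run, and the thresholds `min_snps` and `min_length_bp` are hypothetical defaults. Production callers such as PLINK's `--homozyg` instead scan windows that tolerate occasional heterozygous or missing calls, so this is an illustration of the quantity, not of the pipeline used here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_bp: int  # position of the first SNP in the run
    end_bp: int    # position of the last SNP in the run
    n_snps: int


def find_roh(positions, genotypes, min_snps=50, min_length_bp=1_000_000):
    """Return runs of consecutive homozygous calls that pass both thresholds.

    Genotypes are assumed to be alternate-allele counts (0/1/2), so 1 marks a
    heterozygous call. Missing data and genotyping errors are not tolerated.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")
    segments = []
    start = 0  # index of the first SNP in the current homozygous run
    for i in range(len(genotypes) + 1):
        # A heterozygous call, or the end of the chromosome, closes the current run.
        if i == len(genotypes) or genotypes[i] == 1:
            n_snps = i - start
            if n_snps > 0 and n_snps >= min_snps:
                first, last = positions[start], positions[i - 1]
                if last - first >= min_length_bp:
                    segments.append(Segment(first, last, n_snps))
            start = i + 1
    return segments


def summarize_roh(segments):
    """Number of RoH segments and their total length in bp."""
    return len(segments), sum(s.end_bp - s.start_bp for s in segments)


# Hypothetical toy chromosome: SNPs every 100 kb, one heterozygous call at index 20.
positions = [i * 100_000 for i in range(40)]
genotypes = [0, 2] * 10 + [1] + [2] * 19
segments = find_roh(positions, genotypes, min_snps=10, min_length_bp=1_000_000)
print(summarize_roh(segments))
```

Under this kind of summary, a consanguineous group such as Kuwait-S would be expected to show more and longer segments per individual, and a more exogamous group such as Kuwait-B fewer and shorter ones.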

Given that both West and East African populations were transported to the Middle East, Arabia, and the Indian Ocean region during the slave trade of the 15th–19th centuries [47, 48], the admixture f3 results for the Kuwait-B and Kuwait-S subgroups support an impact of the slave trade on AP populations. Moreover, considering the NNLS results showing a higher ancestry contribution from Yoruba in the Kuwait1, Kuwait2, and Kuwait3 clusters (which contain most Kuwait-B individuals), the low variation in the North African ancestry profile across all eight Kuwaiti clusters, and the admixture signal with Mandenka, we infer a possible recent admixture event in the Kuwait-B subgroup involving a West African (Mandenka or Yoruba) or other sub-Saharan African group.
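In CP-NNLS, a target's ChromoPainter copying profile is modelled as a mixture of donor-population copying profiles with non-negative coefficients, typically normalized to sum to one; this sketch assumes that simplex constraint. To stay dependency-free it substitutes a plain projected-gradient solver for the Lawson–Hanson algorithm usually used for NNLS, and the donor profiles and weights below are invented toy values (loosely labelled Yoruba-like, Bedouin-like, and NE-European-like), not results from this study.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # keeps the threshold for the largest j that satisfies the condition
    return [max(x - theta, 0.0) for x in v]


def mixture_weights(target, sources, n_iter=2000):
    """Fit target ~ sum_j w_j * sources[j] with w on the probability simplex.

    Projected gradient descent on the squared error. The step size
    1 / (2 * ||S||_F^2) never exceeds 1 / L for this objective, so the
    iterations converge.
    """
    n = len(sources)
    w = [1.0 / n] * n  # start from equal contributions
    frob_sq = sum(x * x for s in sources for x in s)
    step = 1.0 / (2.0 * frob_sq)
    for _ in range(n_iter):
        pred = [sum(w[j] * sources[j][k] for j in range(n)) for k in range(len(target))]
        resid = [p - t for p, t in zip(pred, target)]
        grad = [2.0 * sum(r * x for r, x in zip(resid, s)) for s in sources]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w


# Hypothetical donor copying profiles and a target built from known weights.
sources = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
true_w = [0.2, 0.7, 0.1]
target = [sum(wt * s[k] for wt, s in zip(true_w, sources)) for k in range(3)]
print([round(x, 3) for x in mixture_weights(target, sources)])
```

Differences between clusters in the recovered weights, for example a larger Yoruba-like coefficient, are the kind of signal the NNLS results above describe for the Kuwait1–3 clusters.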

Furthermore, the GLOBETROTTER results reinforce that the three Kuwaiti subgroups have different major and minor ancestral sources (Supplementary Fig. 8A) and distinct admixture events (Supplementary Fig. 8B). The Kuwait-B subgroup, with its higher African source profile, shows a more recent admixture event, within the last ~10 generations (~300 years), than the other two subgroups (Fig. 4B). The Kuwait-P and Kuwait-S subgroups share a similar single admixture date of approximately 18 generations (~500 years) ago (Fig. 4B; Supplementary Fig. 8B). As expected, these admixture events have left a genetic footprint in present-day Kuwaitis that reflects the role of Arabian maritime dominance [15, 16] and the slave trade [48, 49]. In addition, the Kuwait-B subgroup is probably one of the most exogamous groups inhabiting the AP, with a different and more recent history of intermixing that plausibly involved slaves from West or sub-Saharan Africa [50]. Moreover, fineSTRUCTURE and PCA identified distinct genetic clusters of individuals that strongly segregated within the three Kuwaiti subgroups, indicating a deeper and more distinct level of genetic heterogeneity among present-day Kuwaitis.
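GLOBETROTTER dates admixture from coancestry curves that, under a single-pulse model, decay roughly as exp(-λx) with genetic distance x in Morgans, where the rate λ approximates the number of generations since admixture. The sketch below recovers λ from a noise-free toy curve by a log-linear least-squares fit and converts it to years. Both simplifications are assumptions: real curves include a baseline and noise and are fitted differently, and the 29-year generation time is a hypothetical default chosen only because the dates quoted above (~10 generations ≈ ~300 years, ~18 generations ≈ ~500 years) imply roughly 28–30 years per generation.

```python
import math


def decay_rate(distances_morgans, curve):
    """Fit log(curve) = c - rate * d by ordinary least squares and return rate.

    Assumes a single-pulse, baseline-free curve with strictly positive values;
    the fitted rate then approximates generations since admixture.
    """
    xs = list(distances_morgans)
    ys = [math.log(y) for y in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


def generations_to_years(generations, years_per_generation=29):
    """Convert generations to years; the default generation time is an assumption."""
    return generations * years_per_generation


# Hypothetical toy curve for a pulse 10 generations ago, sampled from 1 to 20 cM.
d = [i / 100 for i in range(1, 21)]
curve = [0.05 * math.exp(-10 * x) for x in d]
g = decay_rate(d, curve)
print(round(g, 2), round(generations_to_years(g)))
```

Because the year estimate scales linearly with the assumed generation time, dates in years should be read as approximate even when the fitted number of generations is well constrained.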

The TVD values, which quantify differences in genetic ancestry profiles among individuals within a cluster, provided a clearer picture of the genetic heterogeneity of the Kuwaiti population. Notably, intra-cluster TVD (in terms of both length and number) was highest for the Kuwaiti clusters among the populations inhabiting the AP, and it remained highest when TVD was calculated only among the Kuwaiti individuals within a single cluster. These results indicate that the heterogeneity of these clusters was driven by their Kuwaiti members, suggesting that Kuwaitis are among the most genetically heterogeneous populations inhabiting the AP region. The NNLS analysis further supported this heterogeneity by showing that three major donor populations, Yoruba, Bedouins, and North-East Europeans, contributed differing amounts of ancestry to each of these clusters.
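Total variation distance between two ancestry (copying) profiles is half the L1 distance between the normalized vectors, so it ranges from 0 for identical profiles to 1 for profiles with no overlap. The sketch below computes it and summarizes a cluster by the mean pairwise TVD among its members; that aggregation, the helper names, and the toy profiles are assumptions for illustration. The same function applies whether the vectors hold chunk lengths or chunk counts, which correspond to the "length" and "number" versions mentioned above.

```python
from itertools import combinations


def normalize(v):
    """Rescale a copying vector (chunk lengths or chunk counts) to sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def tvd(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(normalize(p), normalize(q)))


def mean_intra_cluster_tvd(profiles):
    """Average TVD over all pairs of individuals in one cluster."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(tvd(p, q) for p, q in pairs) / len(pairs)


# Hypothetical clusters: similar ancestry profiles versus divergent ones.
homogeneous = [[0.60, 0.30, 0.10], [0.58, 0.32, 0.10], [0.61, 0.29, 0.10]]
heterogeneous = [[0.70, 0.20, 0.10], [0.20, 0.70, 0.10], [0.30, 0.30, 0.40]]
print(mean_intra_cluster_tvd(homogeneous), mean_intra_cluster_tvd(heterogeneous))
```

A cluster whose members draw on donors in very different proportions, as described for the Kuwaiti clusters, yields a higher mean TVD than one whose members share nearly the same profile.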

In general, Kuwaiti populations are demographically characterized by large families and a high rate of consanguinity, which poses a potential health risk, particularly with respect to rare autosomal recessive genetic diseases. Given the history of population migrations in this region, complex genetic diversity is expected, and this was clearly reflected in our results. Our fine-scale investigation of the genetic structure of the Kuwaiti population highlights its genetic history and distinct heterogeneity, which could greatly aid the systematic discovery of population- and/or family-specific diseases, especially the identification of deleterious founder variants. Overall, our study presents the fine-scale genetic structure of the distinctively heterogeneous Kuwaiti population and further highlights the recent historical population influx and gene flow from West/sub-Saharan Africa to the AP region.