Introduction

Human consanguinity has been shown to lead to increased rates of recessive genetic disorders and has likely had a significant negative effect on polygenic traits influencing health.1, 2, 3, 4 An effect of consanguineous mating recognizable in most organisms is inbreeding depression, defined as the reduced survival and fertility of offspring of related individuals.5 A plausible explanation for inbreeding depression is increased homozygosity for partially recessive deleterious mutations (partial dominance). This explanation is currently favored over the alternative of increased homozygosity for alleles at loci with heterozygote advantage (overdominance).6 Within humans increased homozygosity is most likely to be observed in isolated populations. In fact the identification of rare recessive disease genes is greatly facilitated by conducting mapping studies in isolated populations founded by a limited number of individuals in whom causative rare alleles are often observed in homozygous form. In addition, population isolates are hypothesized to possess lower levels of allelic heterogeneity underlying disease traits and higher levels of linkage disequilibrium (LD), which suggests, when coupled with a relatively homogeneous background, that they may also be useful for the identification of susceptibility genes for complex diseases and quantitative traits.7, 8, 9, 10

However, it is important to characterize the level of inbreeding in a population isolate to determine its usefulness for examining particular diseases. Whereas population isolates that were recently founded from a small number of individuals and that have undergone population expansions are useful for identifying loci associated with rare recessive diseases (for example, Finns), older population isolates with a constant small effective population size (Ne) may be more helpful for finding loci contributing to complex disease (for example, the Saami).11 Ethnographic pedigree estimates of inbreeding are usually limited in resolution to three to six generations,3 whereas unknown relationships are more likely to exist in populations with higher rates of consanguinity. Fortunately, the availability of dense genome-wide genetic data now makes it possible to characterize the level of population inbreeding over much larger timescales.

The traditional coefficient of inbreeding, F, is estimated on the basis of the observed versus expected number of homozygous genotypes in a population, with the latter derived from population allele frequencies. However, F statistics will undervalue the true level of inbreeding if the level of homozygosity is high.12 An alternative method for inferring levels of inbreeding that is less sensitive to biases resulting from estimating population allele frequencies is to examine the length distribution of Runs of Homozygosity (ROH) within individuals.13 Importantly, this method can also distinguish between recent and more ancient consanguinity based on homozygous tract lengths.10, 12, 14, 15, 16, 17, 18, 19, 20 Recently, a model-based approach for detecting ROH was described that better accounts for missing data because of potential sequencing error, is potentially more sensitive to detecting true autozygous segments and does not require a priori size boundaries for different size classes that may introduce bias when comparing populations with different histories.19, 21, 22
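
The observed-versus-expected comparison behind F can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts and Hardy-Weinberg expectations computed from supplied allele frequencies; the function name is hypothetical, and small-sample corrections that tools such as PLINK may apply are omitted.

```python
import numpy as np

def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments F for one individual (illustrative only).

    genotypes: 0/1/2 allele counts per SNP, np.nan for missing calls.
    allele_freqs: population frequency of the counted allele per SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(allele_freqs, dtype=float)
    called = ~np.isnan(g)            # ignore missing genotypes
    g, p = g[called], p[called]
    observed_hom = np.sum(g != 1)    # homozygous calls (0 or 2)
    # Expected homozygosity per SNP under Hardy-Weinberg: 1 - 2p(1 - p)
    expected_hom = np.sum(1.0 - 2.0 * p * (1.0 - p))
    n_sites = g.size
    return (observed_hom - expected_hom) / (n_sites - expected_hom)
```

Because the expectation depends entirely on the allele frequencies supplied, any error in those frequencies carries directly into F, which is the sensitivity noted above.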

In this study we examine the level of inbreeding and population structure using genome-wide autosomal single-nucleotide polymorphisms (SNPs) genotyped in 20 Daghestanian populations belonging to 19 ethnic groups from the Caucasus Mountains. Of the 50 indigenous Caucasian groups inhabiting the region today, 26 live within Daghestan, a territory comprising 50 300 km2 of the North Caucasus. Archaeological sites first appear in the Mesolithic (~10 000 BP) after the retreat of glaciers, with evidence of stable continuous human occupation thereafter.23, 24 No archaeological or linguistic evidence points to further major movements into the highlands.24, 25, 26, 27, 28 Thus, populations may have lived in the same highland region for hundreds of generations in relative isolation.24, 29 Consistent with this hypothesis, modern highland Daghestanian groups speak Nakh-Daghestanian (ND) or Northeast Caucasian languages that fall into deeply divergent branches of a single language family that appears to be endemic to the eastern Caucasus.25, 26, 27 Moreover, some Daghestanian isolates are characterized by a high prevalence of a number of genetic diseases, including schizophrenia, major recurrent depression, neuromuscular dystrophy, cardiovascular diseases and autosomal recessive deafness.30, 31, 32, 33, 34, 35, 36 Limited pedigree data suggest very high mean inbreeding coefficients (0.005–0.0134),32, 37 whereas population genetic studies have been restricted to autosomal Alu insertion and STR markers, the Y chromosome and mtDNA.32, 37, 38, 39, 40, 41, 42, 43, 44, 45

Materials and methods

Sample collection

Cheek swabs were obtained from 613 individuals from 19 ethnic groups living in 20 villages. Figure 1 shows a map of the sampled region, and Supplementary Table 1 lists samples defined by language and sample sites. Note that 14 of the Daghestanian populations speak unique languages that are part of the ND language family, whereas 6 speak non-ND languages. Sampled individuals were thought to be unrelated for at least three generations. Informed consent was obtained from all individuals according to protocols approved by the University of Arizona Human Subjects Committee. Armenian samples were collected by LY in Ararat Region, Armenia with a written consent form approved by the Institute of Molecular Biology, Yerevan, Armenia.

Figure 1

Approximate geographic location of sampling sites. Of the 20 Daghestanian populations sampled here, 14 speak unique languages that are part of the ND language family. Of the remaining six populations (non-ND), three speak languages that are closely related to the Turkic branch of Altaic language family (Kumyks, Nogais and Azerbaijans), and three speak languages belonging to the Iranian language branch of the Indo-European language family (ethnic Tats, Mountain Jews and a group of Azerbaijans originated in Iran). See Supplementary Table 1 for population codes. ND and non-ND populations are shown in circle and triangle, respectively.

Genotyping and curation of autosomal SNPs

We genotyped 567 096 SNPs in 314 samples from Daghestan using the Affymetrix Axiom platform (Santa Clara, CA, USA), as well as in an additional 261 samples from the Caucasus (that is, non-Daghestanian), Near East, Europe, Central Asia and South Asia (Supplementary Table 1). Data were submitted to the NCBI/dbSNP http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewBatch.cgi?sbid=1061909. Submitter Batch ID: Daghestan_SNP_Affymetrix. Our Axiom-genotyping data can also be accessed through our website at http://hammerlab.biosci.arizona.edu/SupplementaryData/HAMMER_LAB_AFFYMETRIX_CHIP.tar. Other than the Armenian samples, which are described here for the first time, all of the additional non-Daghestanian samples were included in previous studies.46, 47 Only SNPs located on the 22 autosomes (with centromeric regions removed) spanning a total length of 2738 Mb were included in downstream analyses. We used the software PLINK (version 1.07)13 to filter the data set by removing SNPs with 10% or more missing genotypes and SNPs with a minor allele frequency (MAF) less than 1%. We also removed SNPs in high LD (window size: 50 SNPs; sliding window: 5 SNPs; $r^2$ threshold=0.8), leaving a panel of 549 008 SNPs. Only a single individual from any pair of samples demonstrating high relatedness within the same population (inferred as third-degree relatives or higher using a PLINK PI_HAT value of ≥0.125) was included in the analysis. After removing close relatives, the total number of samples in our Axiom data set was 480. For several analyses (principal component analysis (PCA), ADMIXTURE, AMOVA, estimation of traditional coefficient of inbreeding, F), we intersected our data with publicly available samples (Supplementary Table 1). After applying the same filtering described above and removing obvious PCA outliers, the final merged data set comprised 104 519 SNPs for 1100 individuals across 56 populations.
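
The two per-SNP filters described above (missingness and MAF) can be sketched as follows. This is an illustration on an in-memory genotype matrix, not the PLINK procedure itself; the function name and the 0/1/2 coding are assumptions, and LD pruning and relatedness filtering are not covered.

```python
import numpy as np

def qc_filter(genotypes, max_missing=0.10, min_maf=0.01):
    """Return a boolean mask of SNPs (columns) passing both filters.

    genotypes: 2-D array, rows = individuals, columns = SNPs,
               values 0/1/2 allele counts with np.nan for missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    missing_rate = 1.0 - n_called / g.shape[0]
    # Allele frequency from called genotypes only (two alleles per call).
    allele_sum = np.nansum(g, axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(g.shape[1]), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)
    # Drop SNPs with >= max_missing missing calls or MAF < min_maf.
    return (missing_rate < max_missing) & (maf >= min_maf)
```

Applying the mask as `genotypes[:, qc_filter(genotypes)]` keeps only the SNPs that pass.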

Data analysis

To evaluate the putative genetic ancestry of Daghestanian ethnic groups, we conducted a PCA using SMARTPCA48 and ADMIXTURE analysis49 on the merged autosomal data set of our samples from Daghestan, the Caucasus, Near East, Europe, Central Asia and South Asia. To minimize bias from high within-group covariance for larger sample sizes, we used a ‘drop one in’ procedure described by Veeramah et al.10 Genetic differentiation was estimated by the ARLEQUIN 3.5 software50 and SMARTPCA program in the EIGENSOFT software package.48

To assess the extent of inbreeding in Daghestan, we used three different measures of genome-wide homozygosity: (1) the coefficient of inbreeding F as obtained from PLINK,13 (2) a population-based estimate of the distribution of ROH as measured by a logarithm of the odds (LOD) score19, 22 with the same parameters as those used by Pemberton et al19 and (3) an individual-based estimate of the distribution of ROH using PLINK13 with the parameters identical to McQuillan et al.20 Estimates of F as obtained from PLINK (which assumes independence among SNPs) were performed on the merged data set of 104 519 SNPs in 56 populations, whereas the LOD and PLINK-ROH analyses were performed on our Axiom data set in 480 individuals from 33 populations (549 008 SNPs). To explore whether Daghestanian populations differ in patterns of ROH from surrounding populations, for PLINK-ROH analysis we supplemented our data set with seven Caucasus groups (N=132) from published data genotyped on different Illumina platforms: Adygei, Georgians, Lezgins, Abkhasian, Balkar, Chechen and North Ossetian (Supplementary Table 1). The number and density of SNPs included in ROH analyses varied only marginally across data sets: 549 008 and 4.9 kb/SNP (this study), 509 119 and 5.3 kb/SNP,45 510 154 and 5.3 kb/SNP51 and 569 388 and 4.7 kb/SNP.52, 53 To investigate the effect of potential ascertainment bias on the genome-wide homozygosity, we repeated the principal PLINK-ROH analyses after removing SNPs with a MAF less than 10%, leaving a panel of 380 015 SNPs. Applying the MAF threshold of 10% changed the absolute values of ROH statistics, but had little effect on the relative order of populations and/or individuals (Supplementary Figure 1). The ROH frequencies and ROH similarities between populations were calculated as suggested in Pemberton et al.19 The Pearson correlation coefficient among total lengths in different pairs of classes was estimated using the R package.54
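
To make the individual-based ROH idea concrete, the sketch below scans one individual's genotypes for maximal runs of consecutive homozygous SNPs. It is a deliberately simplified stand-in and does not reproduce PLINK's sliding-window algorithm or the McQuillan et al. parameters; the function name and the default thresholds are hypothetical.

```python
def find_roh(positions, genotypes, min_snps=3, min_length=1000):
    """Return (start_pos, end_pos, n_snps) for each qualifying homozygous run.

    positions: SNP coordinates in base pairs, sorted ascending.
    genotypes: 0/1/2 allele counts; anything else (for example None) is
               treated as a break, so no heterozygous or missing calls
               are tolerated inside a run.
    """
    runs = []
    start_idx = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start_idx is None:
                start_idx = i          # open a new run
        elif start_idx is not None:
            runs.append((start_idx, i - 1))
            start_idx = None           # close the current run
    if start_idx is not None:
        runs.append((start_idx, len(genotypes) - 1))

    result = []
    for first, last in runs:
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length:
            result.append((positions[first], positions[last], n_snps))
    return result
```

Summing `end_pos - start_pos` over the returned runs gives a per-individual total ROH length, and the count of runs gives the ROH number.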

We evaluated the decay of LD with recombination distance for each chromosome using the genotypic-based $r^2$ statistic estimated in PLINK. ND-speaking populations were compared with our European and Yoruba (YRI) reference populations from the HapMap collection. The data set was filtered to remove SNPs with 10% or more missing genotypes and SNPs with MAF less than 5%, leaving a panel of 538 788 SNPs. To control for the effect of sample size differences on $r^2$, we randomly re-sampled 11 individuals when computing $r^2$ for each SNP pair.55 All SNPs were assigned to genetic map positions using the CEU or YRI HapMap (hg18) recombination maps. $r^2$ values were binned into 50 genetic distance groups between 0.005 and 0.25 cM in increments of 0.005 cM, and their mean $r^2$ was used for further calculation of effective population size (Ne). For each population with sample size N≥10 we estimated $N_e = \frac{1}{4c}\left(\frac{1}{r^2} - 2\right)$, where $r^2$ was adjusted for sample size ($r^2 - 1/n$) and c is the recombination distance between loci in Morgans56, 57 (1 cM≈1.1–1.2 Mb). The standard error and confidence intervals for Ne were derived from separate analysis of each autosome.56 As recombination distance between markers is approximately inversely proportional to the number of generations, t≈1/(2c),58 we estimated Ne in different time intervals assuming 25 years per generation.
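
The Ne estimator and the distance-to-time conversion can be written out as a short sketch. The function names are hypothetical, and n is passed through as the sample-size term exactly as it appears in the adjustment above, without assuming whether it counts individuals or chromosomes.

```python
def effective_size(r2_observed, c_morgans, n):
    """Ne = 1/(4c) * (1/r2_adj - 2), with r2_adj = r2_observed - 1/n."""
    r2_adjusted = r2_observed - 1.0 / n
    if r2_adjusted <= 0:
        raise ValueError("adjusted r2 must be positive")
    return (1.0 / (4.0 * c_morgans)) * (1.0 / r2_adjusted - 2.0)

def years_ago(c_centimorgans, years_per_generation=25):
    """Approximate age probed by distance c, using t = 1/(2c) generations."""
    c_morgans = c_centimorgans / 100.0   # convert cM to Morgans
    return years_per_generation / (2.0 * c_morgans)
```

Under these definitions, a distance of 0.25 cM corresponds to 200 generations, or about 5000 years at 25 years per generation, and shorter distances probe proportionally older periods.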

Results

Genetic structure of Daghestanian populations

PCA revealed relatively distinct clusters of Europeans, South Asians and Central Asians, whereas Daghestanian samples (except the Nogais and Mountain Jews) are intermingled with other individuals from the Caucasus and show an affinity with our Near Eastern samples (Figure 2). Among our Near Eastern groups, Turks and Iranians demonstrate genetic similarity to populations from Daghestan, in particular, to Azerbaijans from Daghestan (who consist of Turkic and Iranian speakers) and Tats (who are recent migrants from Persia). In general, the clustering within and general relatedness between populations corresponds approximately to their relative geographical location. We applied a STRUCTURE-like approach49 to estimate individual ancestry in K hypothetical ancestral populations. The best projecting accuracy was observed for a model with K=7 (Supplementary Figure 2). Overall, this approach revealed population structure consistent with the PCA. For K1–K5 the ancestry proportions for all Daghestanian populations (except the Nogais and Mountain Jews) were similar to other populations from the Caucasus, Iranians and the Turks (Supplementary Figure 3). With higher K values, a single ancestry component emerged in the Hinukh, Hunzib and Tsez populations, whereas a separate ancestry component prevailed in the Akhvakh, Ratlub and Tindal populations.

Figure 2

PCA analysis using the ‘drop one in’ technique for 56 populations. Circled positions indicate the median coordinate values for populations. See Supplementary Table 1 for population codes.

The mean pairwise FST value for our 19 Daghestanian ethnic groups was 0.0174 (Table 1). When only the 13 ND-speaking populations were included in the analyses, the FST estimate increased to 0.0186. Despite the small geographic area involved, this value is significantly higher than that observed among our Caucasian, Near Eastern, European, Central Asian or South Asian samples, although a strict comparison is difficult as the majority of the latter samples are cosmopolitan (that is, may be from multiple subpopulations).

Table 1 Mean pairwise FST values

Inbreeding coefficient and ROH in Daghestanian populations

Population samples from the Caucasus, Europe and Central Asia demonstrate the lowest F values (mean 0.0017, 0.0029 and 0.0113, respectively; Supplementary Table 2 and Supplementary Figure 4). Our Near Eastern and South Asian samples are characterized by F values that are two to three times higher. Non-ND ethnic groups from Daghestan show low F values similar to other populations from the Caucasus. On the other hand, ND-speaking populations reveal significantly higher coefficients of inbreeding compared with populations from other geographical regions (mean 0.0447, P<0.0001).

To examine ROH at the population level, we applied the Gaussian mixture likelihood approach of Pemberton et al,19 which incorporates population allele frequencies to estimate a LOD score for measuring how likely it is for a segment to be autozygous.19, 21, 22 All ROH were subdivided into three classes: short (A), intermediate (B) and long (C) with a model-based clustering algorithm (with longer runs generally attributed to more recent inbreeding). LOD score thresholds and the size boundaries between classes A and B and classes B and C for each population are presented in Supplementary Table 3. The bimodal distribution of LOD scores (Supplementary Figure 5) demonstrates an increased presence of autozygous genomic regions in our samples from Daghestan and the Near East compared with those from the Caucasus, Europe, Central Asia and South Asia. The population-specific thresholds show negative values for five ND populations and Saudi Arabians (Supplementary Table 3), suggesting that autozygosity is more common in these populations. The ND populations also demonstrated significantly higher boundary sizes for classes A and B, and B and C than those from any other geographical regions, suggesting longer ROH sizes on average in all three classes. The difference is most notable in the total length of ROH in class C and in all three classes combined (Figure 3). The variance appears to be higher in Daghestan, which may be suggestive of a lower Ne.12

Figure 3

Distribution of total length of different ROH classes per individual in 40 populations. A box plot and a kernel density plot are shown as ‘violin plots’ for classes A (a), B (b), C (c) and A, B, C classes pooled together (d). The white horizontal line is the median, whereas the black bar represents the interquartile range. Populations are grouped according to geographic regions.

In an effort to distinguish whether the observed distribution of different size classes of ROH in various geographic regions may have arisen via distinctive demographic and/or evolutionary processes, we calculated the correlations of distance matrices based on the frequencies of different ROH classes and distance matrices based on frequencies of autosomal SNPs (N=549 008). Autosomal SNP FST values show a strong and highly significant correlation with distances based on the frequencies of ROH in classes A (r=0.601, P=0.007) and B (r=0.575, P=0.004), but no correlation with class C ROH (r=−0.066, P=0.705). We then repeated pairwise FST analysis excluding SNPs from different ROH classes. After removing SNPs from ROH class C, from classes B and C, and from all three classes (A, B and C), the average differentiation for ND-speaking populations decreases from 0.0254 to 0.0223, 0.0203 and 0.0182, respectively. This is by far the largest absolute reduction for any regional group for either category. This suggests that much of the excessive population differentiation observed above is related to more prominent long-term inbreeding in Daghestanian populations (Supplementary Table 4).

Finally, we inferred levels of inbreeding at the level of individuals by examining the length distribution of ROH in PLINK (that is, unlike for F and the model-based ROH approach, this analysis is not reliant on population allele frequency estimation). In the expanded sample of 612 individuals, the total ROH length and ROH number per individual ranges from 5.5 to 465.7 Mb and from 7 to 186, respectively. The population mean of the cumulative ROH length per individual varies from 16.29 Mb (SE=2.96 Mb) in our sample of Sri Lankans to 219.3 Mb (SE=37.0 Mb) in the Hinukhs from Daghestan. Similarly, the population mean of the ROH number per individual shows the minimum value of 19.1 (SE=2.82) in the Sri Lankans, with the maximum reaching 108.8 (SE=25.53) among Hinukhs (Supplementary Table 5, Supplementary Figure 6). The proportion of different length ROH fluctuates considerably among geographic regions. Samples from Daghestan, particularly ND speakers, differ strikingly from those in the other geographical regions. The mean of the total ROH length per individual in our ND-speaking sample (128.1 Mb, SE=6.54) is significantly higher than in those from any other geographical region (P<0.0005). Approximately 55% of ND-speaking individuals possess a single contiguous ROH tract with length >100 Mb, and the average fraction of the genome in ROH is 4.7%. Out of the 13 ND-speaking populations analyzed, 6 populations (Akhvakh, Hinukh, Hunzib, Ratlub, Tindal and Tsez) exhibit a cumulative ROH length >100 Mb in more than 70% of individuals. Similarly, the number of ROH per individual is significantly higher among our ND-speaking samples compared with other geographical regions (73.41, SE=2.09; P<0.0001).
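
As a quick arithmetic check, the genomic fraction in ROH can be recomputed from the mean total ROH length and the 2738 Mb autosomal length given in the Methods. The sketch below assumes the reported fraction is a simple ratio of these two quantities; the function name is hypothetical.

```python
AUTOSOMAL_LENGTH_MB = 2738.0  # total autosomal length analyzed (from the Methods)

def roh_fraction_percent(total_roh_mb, genome_mb=AUTOSOMAL_LENGTH_MB):
    """Percentage of the analyzed genome covered by ROH."""
    return 100.0 * total_roh_mb / genome_mb

nd_mean = roh_fraction_percent(128.1)      # mean for ND-speaking individuals
hinukh_mean = roh_fraction_percent(219.3)  # population mean for the Hinukhs
print(round(nd_mean, 1), round(hinukh_mean, 1))
```

The ND-speaking mean of 128.1 Mb gives roughly 4.7% under this assumption, in line with the fraction reported above.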

Interestingly, substantially elevated ROH numbers and lengths are not seen in our non-ND-speaking population samples, with both parameters being similar to other populations from the Caucasus. As shown previously,17, 59, 60 geographical regions differ in their frequency of distinctive pre-defined ROH length categories (Figure 4, Supplementary Figure 6). Whereas shorter ROHs (<2 Mb) are common in all regions, the longer ROHs are increasingly prevalent only in our ND-speaking samples and those from the Near East.

Figure 4

Regional distribution of ROH. The mean total ROH length (Mb)±SE is plotted for each geographic region.

We compared several ROH parameters in ND-speaking populations with those from other studies of genetic isolates.14, 15, 20, 61 The distribution of ROHs according to their size (in Mb; Supplementary Figure 7, Supplementary Table 6) clearly shows a higher proportion of individuals with extended regions of autozygosity in ND-speaking populations compared with individuals in other genetic isolates. For example, with 1.5 Mb used as the minimum length, endogamous Dalmatians and Orcadians had the mean ROH length of 35 and 28 Mb, respectively,20 and six geographically isolated villages in northeastern Italy were characterized by an average ROH of ~47 Mb.15 All of our ND-speaking populations are distinguished by a much higher mean ROH length of 92.6 Mb (that is, ranging from 50.4 Mb in Andians to 162.3 Mb in Hunzibs; Supplementary Table 6). When ROHs greater than 1 Mb are considered, the median ROH number and total ROH length for Daghestanian ND populations are 36.8 and 97.7 Mb, respectively, compared with ~16–19 and ~40–60 Mb for three isolates from Finland (Supplementary Table 6).61 One likely cause of this higher level of autozygosity is increased recent inbreeding among ND-speaking Daghestanian populations.

We evaluated hotspots in ND populations and contrasted them with those previously identified in European populations. Considering genomic regions of ≥50 adjacent SNPs, a total of 150 regions were identified as ROH hotspots (with minimum ROH frequencies higher than 10%) in ND populations (Supplementary Table 7). Among these 150 hotspots, 18 have been recognized in previous surveys of ROH in European populations18 and in worldwide human populations.19 Interestingly, 11 hotspots overlapped with 108 recently published schizophrenia-associated genetic loci.62

Estimation of effective population sizes

To further assess their level of genetic isolation, we evaluated the decay of LD over recombination distance for ND-speaking populations and compared this with our reference populations from Europe and YRI. We explored spatial and temporal variation in Ne estimates, although an exact comparison is not straightforward because the majority of the European samples may come from multiple subpopulations. All ND-speaking populations except the Avars show a significantly elevated level of LD at all recombination distances (Supplementary Figure 8). The Avars, the most numerous ND-speaking population, appear to have a level of LD comparable to that of European populations. Ne estimates vary significantly across populations in our study (Figure 5, Supplementary Table 8). The Ne value for YRI (12 113, SD=867) is slightly lower than that estimated in McEvoy et al.56 (13 900). The highest average estimate of Ne by region is found in our South Asian samples (Ne=10 365; SD=453). The highest Ne values for Daghestanian populations are observed in the Kumyks and Avars, which represent the two largest populations in Daghestan by census size. The smallest Ne values are found in ND-speaking populations (ranging from ~2730 to 4150) with the highest level of inbreeding (Akhvakh, Hunzib, Ratlub, Tindal and Tsez). (Note that the Hinukh sample was omitted from this analysis because of small sample size.) The average Ne for ND-speaking populations (4675, SD=1463) is approximately half of the estimated Ne of our Near Eastern and European samples. The Ne estimates are significantly correlated with census population sizes in Daghestan (r=0.55, P=0.011), particularly for ND-speaking populations (r=0.72, P=0.009). LD observations in the bins between 0.005 and 0.25 cM in all populations allowed us to examine temporal variation in Ne from ~125 000 to 5000 years ago. The higher levels of LD over recombination distances in ND-speaking populations made it possible to extend our calculations between 0.005 and 0.36 cM, providing estimates of Ne to 3000 years ago. We chose European, ND-speaking and YRI populations to trace temporal variation of Ne (Supplementary Figure 9). Consistent with the earlier study,56 we found a relatively large Ne for YRI, with a slight increase around 9000 years ago, and a population size expansion for Europeans for the last 10 000–12 000 years. All ND-speaking populations except the Avars show a steady decline in their effective population sizes.

Figure 5

The harmonic mean of Ne for each population with sample size N≥10 over 50 recombination distance classes between 0.005 and 0.25 cM with increment 0.005 cM. The SEs for Ne indicated by the bars were derived from separate analysis of each autosomal chromosome. Yorubans were included for comparisons with published data.

Discussion

Our study represents the first attempt to characterize genetic diversity in Daghestanian populations using high-density SNP analyses in a large number of ethnic groups (N=19). Marriage customs favor endogamy on both sides in all ND-speaking groups, which vary in total population size from ~600 individuals in Hinukhs to 785 300 in Avars. Traditional consanguineous marriages are commonly between cousins (that is, mostly between paternal cousins) in order to keep resources within the clan (Tukhum). As a result, ethnic identity is preserved, and numerous long-isolated, inbred small populations are created.39, 63, 64, 65 Limited pedigree data suggest very high inbreeding coefficients (0.005–0.0134). Three different measures of genome-wide homozygosity consistently reveal the highest level of inbreeding in Daghestanian ND-speaking groups compared with our Eurasian population samples. Our ND-speaking sample has the highest traditional coefficient of inbreeding F, the longest genomic stretches of homozygosity for all ROH length categories and a higher proportion of individuals with extended regions of autozygosity.

The extent of recent inbreeding in Daghestanian populations is evidenced by the presence of very long ROH segments, with a large proportion of individuals in ND-speaking populations possessing segments >100 Mb in length (Figure 4, Supplementary Figure 7, Supplementary Table 5). To put this number in a temporal context, under a very simplistic model, a mean ROH length of 10 and 25 cM, respectively, would be compatible with the parents sharing a common ancestor five and two generations ago.66 However, we note that the scenario for Daghestan is likely much more complicated, with multiple layers of possible inbreeding. Although a high coefficient of inbreeding, F, and the presence of long autozygous segments are consistent with previous inferences of inbreeding coefficients estimated from pedigree data of up to 11–14 generations (275–350 years),32, 37 our combined analysis demonstrates that inbreeding has likely been a common feature in Daghestan over a sustained period, with shorter runs indicative of more ancient inbreeding also more prominent in Daghestanian versus other populations.
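
The two figures quoted above (10 cM for five generations, 25 cM for two) match the common approximation that an autozygous segment inherited through a common ancestor g generations back has an expected length of 1/(2g) Morgans. The sketch below assumes that approximation is the simplistic model referred to; the function names are hypothetical.

```python
def expected_roh_length_cm(generations):
    """Expected autozygous segment length in cM, using 100 / (2g)."""
    return 100.0 / (2.0 * generations)

def generations_from_length(mean_length_cm):
    """Invert the approximation: g = 100 / (2 * mean length in cM)."""
    return 100.0 / (2.0 * mean_length_cm)
```

The relation is inverse, so halving the mean segment length doubles the implied number of generations; with several overlapping layers of inbreeding, as suggested for Daghestan, no single value of g would describe the observed length distribution.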

In fact, our results suggest that Daghestan demonstrates the highest levels of autozygosity observed on mainland Eurasia to date.14, 15, 16, 18, 20, 60, 61, 67, 68 This effect appears to predominate particularly among the ND-speaking populations, with non-ND-speaking Daghestanian and non-Daghestanian Caucasus populations showing levels of autozygosity that are comparable to other Eurasian populations. In addition, a high level of genetic structure is observed among ND-speaking populations. The long-term Ne of these populations as estimated by the decay in LD appears to have been lower than that of European populations for the last 10 000 years. These results are compatible with the hypothesis of long-term isolation of Daghestanian ethnic groups living in a highly mountainous region with extensive inbreeding and negligible gene flow among villages speaking different languages within the ND language family.

Our results are also compatible with the high incidence of various complex diseases in a number of Daghestanian populations.30, 32, 33, 34, 68 Several genetic studies of quantitative, disease-related phenotypes have already been successfully carried out in small isolated Daghestanian communities.30, 32, 33, 34, 69, 70, 71, 72, 73, 74 These studies observed notable within- and between-isolate diversity in clinical and genetic heterogeneity, likely because of differences in founders, subsequent endogamy and inbreeding within the isolates. The results from these studies suggest that mapping genes of complex diseases, including major depressive disorder, schizophrenia and hearing loss, across genetically homogeneous isolates can facilitate detecting linkage signals and expedite the search for susceptibility genes when combined with the methods that identify structural genomic and nucleotide variation in linkage regions. This suggests that the populations of Daghestan are excellent candidates for studies of the effects of homozygosity on the health of subjects with a shared genetic and environmental background, as well as for disease gene mapping. In particular, Daghestan appears to possess a relatively large number of what could be defined as ancient or primary population isolates.75 The utility of population isolates with low long-term Ne in the mapping and identification of genes is not limited to the study of rare diseases; isolated populations also provide a useful resource for the identification of susceptibility genes for complex diseases, initial insights into disease pathogenesis and for understanding the biology underlying common diseases and their component traits.8, 76, 77, 78, 79, 80 The study of isolated populations facilitates the discovery of disease-associated genes with alleles that have increased in frequency as a result of strong genetic drift.78 As such, the genetic isolates of Daghestan may provide the opportunity to identify both common and rare disease-causing variants through association studies.