Introduction

Present-day Iran is land-bound by multiple nations, including Iraq to the west, Turkey, Armenia and Azerbaijan to the northwest, Turkmenistan to the northeast, and Afghanistan and Pakistan to the east. Its southern perimeter consists of the Persian Gulf, a maritime gate between Iran and the eastern coastal region of the Arabian Peninsula, which encompasses Kuwait, Saudi Arabia, Bahrain, Qatar, UAE and Oman. The Strait of Hormuz, the narrowest section of this waterway, separates southwestern Iran from the northeast tip of Oman by a mere 46.7 km of shallow water littered with numerous tiny islands.1, 2 Climatologists and historians have concluded that episodic sea level oscillations have exposed the ocean floor between these two regions at various points during the ancient past.1 This land bridge may have been a critical segment of the southern coastal route facilitating the out-of-Africa migration of modern humans 60,000 years before present.3

Iran's pivotal geographical location and its proximity to trans-continental migratory routes have led researchers to believe that the Iranian territory may have also played a key role in subsequent migrations, both prehistoric and historic, between Africa, Asia and Europe. This may be especially true for the Neolithic agricultural diffusion4, 5 given Iran's proximity to the Fertile Crescent where pastoral agriculturalists are postulated to have commenced their trek into both western and eastern Eurasia. Furthermore, Iran's location near the Central Asian steppes may have facilitated the westward-bound Mongol invasion in the thirteenth century AD,6 the ancient Silk Road passages,7 as well as the Central Asian settlement of pre-Islamic Iranian people, namely the Sogdians, Chorasmians, Scythians and Alans.8 Studies utilizing autosomal markers have thus far pointed to little or no differentiation between Iranian ethnic groups,9, 10, 11 and altogether they have been found to resemble most populations from the southern Balkan region and Anatolia.9, 10, 11, 12, 13

Iran's presumed role as a cultural and ethnic nexus, however, may have been undermined by physical features capable of posing substantial barriers to demic movements. The Iranian Plateau is framed by two major mountain chains, the Kuhha ye Zagros mountains, bordering southern and western Iran, and the Kuhha ye Alborz range, which extends 998 km along Iran's northern boundary with the Caspian Sea. Both cordilleras border the elevated central Iranian Plateau, which is covered, primarily, by two major deserts, the Dasht-e Kavir in north central Iran and the Dasht-e Lut in the southeast. Together, these topogeographical obstacles may have restricted gene flow across as well as within the Iranian perimeter, forcing migrants to settle in either the northern or the southern areas of Iran and to steer clear of its harsh interior desert terrain.

The contention that these geographical barriers may have restricted gene flow within Iran and between Iran and neighboring regions is supported by Y-chromosome data reported by Wells et al.14 as well as the Central Asian mitochondrial DNA (mtDNA) analysis performed by Quintana-Murci et al.15 In the former study, examination of the paternal gene pools of north and south Iranians reveals a clear demarcation in the distribution of R1a1 lineages (defined by the M17/M198 mutations) between the two sections of the Plateau.14 Similarly, Quintana-Murci et al.15 found greater proportions of mtDNA haplogroups N1b, R2, HV2, U7, J2 and T* in northern Iran, whereas M*, N*, R5, B, pre-HV1, U2*, U2e and U3 lineages were higher in the south. A recent study, based on both Y-chromosome and mtDNA analyses, found little to no difference between ethnic groups (Indo–European speakers versus Semitic speakers) residing in close geographical proximity within Iran.16 Furthermore, another mtDNA investigation led to the conclusion that two Indo–Iranian-speaking Talysh groups from Iran and Azerbaijan that claim a common ancestry were genetically similar.17 In the same study, however, Y-chromosomal marker composition was shown to differ considerably between the Iranian and Azerbaijani Talysh, with the Azerbaijani Talysh more closely resembling their Azerbaijani neighbors than their Iranian counterparts.17 Results reported by Regueiro et al.1 also indicate differential gene flow between northern and southern Iranian groups (divided by the Dasht-e Kavir and Dasht-e Lut deserts) not only with respect to the R-M198 mutation, as illustrated by Wells et al.,14 but also with R-M269. The same study also reveals significant divergence in the overall Y-haplogroup distributions between northern and southern Iranians as well as between both groups and other spatially separated populations (the Esfahan of Central Iran reported by Nasidze et al.18 and Uzbekistan discussed in the study by Wells et al.14). In spite of these efforts, a consensus has not yet been reached as to the source populations, overall genetic relationships and degree of stratification between different Iranian regions.

In the current inquiry, we explore Iran's relative importance, in contrasting roles, as a genetic nexus and as a genetic barrier from the perspective of high-resolution mtDNA analyses. We examine the maternal lineages of the 148 Iranian males featured in Regueiro et al.1 using the same north/south divide outlined in that Y-chromosome analysis. The present investigation compares and contrasts the overall mtDNA haplogroup frequency patterns and the phylogenetic relationships observed among geographically relevant Eurasian and North African groups representing areas to the north, south, east and west of Iran (see Table 1). This study also expands upon the mtDNA analysis of Iran presented in the study by Quintana-Murci et al.15 by utilizing high-resolution markers and through inclusion of samples from the southern coast and northeast Iranian territories as well as several other pertinent regions of the Plateau. We have also reanalyzed our previously published Y-chromosomal data to include the counterpart reference populations herein used as a point of comparison with our mtDNA data in order to directly assess the contrasting patterns of maternal and paternal inheritance throughout the region.

Table 1 Populations analyzed

Materials and methods

Population information

Whole blood was collected from a total of 148 healthy Iranian males whose ancestry can be traced back at least two generations. Samples were obtained in strict compliance with National Institutes of Health guidelines as well as with those indicated by the Institutional Review Board of Florida International University. Donors from throughout modern-day Iran were divided geographically into northern (n=31) and southern (n=117) groups segregated by a virtual line at the southern fringe of the Dasht-e Kavir desert extending horizontally to Iran's eastern and western borders (see Supplementary Table 1 for the location of the collection of each individual included in the study). The Iranian groups (Iran North (IN) and Iran South (IS)) are analyzed phylogenetically and statistically in a tri-continental context against 44 previously published, geographically targeted reference populations from various neighboring regions: northwest Africa, the Caucasus, central Asia and southwest Asia5, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 (Table 1). Owing to overlaps and differences in the resolution of the markers employed, populations marked with an asterisk (*) are only included in the NETWORK analysis and the time to most recent common ancestor estimates.

DNA extraction and sequencing

DNA from the 148 Iranian samples was extracted by standard phenol–chloroform methods as previously described,1, 29, 30 ethanol precipitated and diluted in 10 mM Tris-EDTA, pH 8.0. All samples were stored at −80°C when not in use. The mtDNA control region (nucleotide positions 15997–409) encompassing hypervariable regions I and II was PCR amplified with the single primer pair described in the study by Stoneking et al.31 Following amplification, the 1250 base pair amplicon was quantified with an Agilent Technologies (Santa Clara, CA, USA) 2100 Bioanalyzer. In all, 7–10 ng of the PCR product was then utilized for sequencing reactions (in both directions), employing the same primer pair used for PCR amplification and the dRhodamine Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, CA, USA). Sequenced products were separated in an ABI 3100 Genetic Analyzer with POP-6™ polymer (Applied Biosystems).

mtDNA haplotyping

The resulting sequences were aligned to the revised Cambridge Reference Sequence32, 33 using the BioEdit sequence alignment software (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Haplogroup assignment was based on the classification scheme outlined by Macaulay et al.34 Following alignment, restriction fragment length polymorphism analysis35 of coding sequences was performed to confirm designations of haplogroup assignments as previously described.36 The tentative assignments of haplogroups H, HV, M, N, R, U, W, I, K, JT and T1 were substantiated by restriction endonuclease digestions at the following diagnostic sites: both –14 766 MseI and –7025 AluI for haplogroup H, –14 766 MseI for HV, +10 397 AluI for M, +10 871 MnlI for N, +12 703 MboII for R, and both +12 703 MboII and +12 308 HinfI for haplogroup U, –8994 HaeIII for W, +10 032 for I, –9052 HaeII for haplogroup K, +4216 NlaIII for JT, and –12 629 AvaII for T1. The restriction endonuclease site at position 12 308 was confirmed via a mismatched primer approach as previously reported.36 Haplogroup H samples were typed for sub-haplogroups H1–H13 using restriction enzyme analyses and/or allele-specific PCR according to Achilli et al.,37 Loogväli et al.,38 Roostalu et al.39 and Gayden et al.40
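
The confirmation step described above can be read as a lookup from each tentative haplogroup to the restriction-site states it requires. The following is a minimal sketch of that logic, not the procedure actually used in the laboratory; the names DIAGNOSTIC_SITES and confirms are hypothetical, a site gain (+) is encoded as True and a site loss (–) as False, and the enzyme for position 10 032 is left unnamed because the text does not give it.

```python
# Required restriction-site states per haplogroup, keyed by nucleotide position.
# True = site present ("+"), False = site absent ("-"), as listed in the text.
DIAGNOSTIC_SITES = {
    "H":  {14766: False, 7025: False},  # -14766 MseI and -7025 AluI
    "HV": {14766: False},               # -14766 MseI
    "M":  {10397: True},                # +10397 AluI
    "N":  {10871: True},                # +10871 MnlI
    "R":  {12703: True},                # +12703 MboII
    "U":  {12703: True, 12308: True},   # +12703 MboII and +12308 HinfI
    "W":  {8994: False},                # -8994 HaeIII
    "I":  {10032: True},                # +10032 (enzyme not named in the text)
    "K":  {9052: False},                # -9052 HaeII
    "JT": {4216: True},                 # +4216 NlaIII
    "T1": {12629: False},               # -12629 AvaII
}


def confirms(haplogroup, observed):
    """Return True if every diagnostic site for `haplogroup` was typed
    and shows the required state; untyped sites count as unconfirmed."""
    required = DIAGNOSTIC_SITES[haplogroup]
    return all(observed.get(pos) == state for pos, state in required.items())


# Hypothetical digestion results for one sample (position -> site present?).
sample = {14766: False, 7025: False}
print(confirms("H", sample))   # both H sites match
print(confirms("U", sample))   # U sites were not typed
```

Treating an untyped site as a failure to confirm is a design choice of this sketch; it keeps a tentative control-region assignment from being accepted on partial evidence.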

Phylogenetic/statistical analyses

To assess the statistical significance of haplogroup frequencies, the Bayesian 0.95 credible region (0.95 CR) was calculated using the SAMPLING program provided by Vincent Macaulay. Bonferroni corrections and χ2 analyses were also conducted to test for significant differences in the observed haplogroup frequencies between the IN and IS groups.
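
To illustrate what a 0.95 credible region for a haplogroup frequency represents, the sketch below computes an equal-tailed region numerically. It assumes a binomial likelihood with a uniform prior, which gives a Beta(k+1, n−k+1) posterior; whether the SAMPLING program uses exactly this prior and this type of interval is an assumption here, and the function name credible_region is hypothetical. The example count of 11 carriers among 31 individuals is back-calculated from a reported frequency of 0.355 and is used for illustration only.

```python
import math


def credible_region(k, n, mass=0.95, grid=20000):
    """Equal-tailed credible region for a frequency, given k carriers in n
    samples, assuming a uniform prior (posterior is Beta(k+1, n-k+1)).
    The posterior is integrated numerically on a grid of midpoints."""
    ps = [(i + 0.5) / grid for i in range(grid)]
    # Unnormalized log posterior density at each grid point.
    logw = [k * math.log(p) + (n - k) * math.log(1.0 - p) for p in ps]
    top = max(logw)
    w = [math.exp(x - top) for x in logw]
    total = sum(w)
    tail = (1.0 - mass) / 2.0
    lower = upper = None
    cum = 0.0
    for p, wi in zip(ps, w):
        cum += wi / total
        if lower is None and cum >= tail:
            lower = p
        if cum >= 1.0 - tail:
            upper = p
            break
    return lower, upper


# Illustrative count: 11 of 31 (back-calculated from a frequency of 0.355).
low, high = credible_region(11, 31)
print(round(low, 3), round(high, 3))
```

The grid integration avoids any dependency on a statistics library; a quantile function for the Beta distribution would give the same bounds more directly.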

In order to ascertain the proportion of gene flow from the surrounding regions into the Iranian domain, admixture analyses based on frequency distributions were conducted using regional population groups as parentals and collections within the Iranian Plateau as hybrids with the aid of the SPSS version 16.0 software.41, 42 Parental groups were assembled as specified in Table 1 for mtDNA and in Supplementary Table 2 for the Y-chromosome.1, 14, 18, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60

To gauge significant deviations from neutral theory,61 the Ewens-Watterson homozygosity test62, 63 was performed on Iranian mtDNA control region sequences (nucleotide positions 16 019–16 569). In addition, mtDNA control region sequence diversity was evaluated by nucleotide diversity index.64 Time to most recent common ancestor of several mtDNA haplogroups was calculated according to Forster et al.65 using an average mutation rate of one nucleotide change in 20 180 years for nucleotide positions 16 090–16 365.5, 22, 66, 67
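
As a worked illustration of the dating step, the sketch below computes the ρ statistic (the average number of mutational steps separating the sampled sequences from the inferred founder haplotype) and converts it to years with the rate stated above, one change per 20 180 years. This is a simplified reading of the approach of Forster et al.65 that omits the standard error; the function names and the example network are hypothetical.

```python
YEARS_PER_MUTATION = 20180  # rate quoted in the text for positions 16090-16365


def rho(steps_and_counts):
    """Average mutational distance from the founder haplotype.

    `steps_and_counts` is a list of (steps_from_founder, n_individuals)
    pairs, one per haplotype in the network."""
    n = sum(count for _, count in steps_and_counts)
    return sum(steps * count for steps, count in steps_and_counts) / n


def tmrca_years(rho_value, years_per_mutation=YEARS_PER_MUTATION):
    """Convert rho to an age estimate in years."""
    return rho_value * years_per_mutation


# Hypothetical network: 5 individuals carry the founder haplotype,
# 3 are one step away and 2 are two steps away.
network = [(0, 5), (1, 3), (2, 2)]
r = rho(network)
print(r, tmrca_years(r))
```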

Multidimensional scaling (MDS) plots were generated with the SPSS program version 16.0 for both mtDNA and Y-chromosomal data sets in order to infer genetic relationships among IN, IS and all reference populations. The MDS analyses were performed utilizing the observed frequency of haplogroups. The software package Surfer (http://www.goldensoftware.com) was employed to obtain contour plots to visualize regional frequency gradients of specific mtDNA and Y-chromosomal haplogroups.

A series of global (multipopulation) and Iranian (segregated into north and south collections) network analyses based on the most frequent Iranian haplogroups were generated with the reduced median method68 and the NETWORK 4.1 software program (http://www.fluxus-engineering.com).

Reanalyzed, previously published Y-chromosome haplogroup frequency data

The Y-chromosome haplogroup frequency data published in the study by Regueiro et al.1 were utilized to perform admixture, MDS and contour plot analyses as described in the Phylogenetic/statistical analyses section of Materials and methods (see above). In addition, we employed the same data to generate a map of haplogroup distribution in the two Iranian collections and the 36 reference populations.

Results

Iranian and global haplogroup distribution

Haplogroup frequencies and the associated 0.95 credibility region data for IN, IS and Iran total are presented in Table 2. A phylogeographic map, constructed on the basis of haplogroup distributions in the above-mentioned Iranian collections and the 44 reference populations, is provided in Figure 1a. Also, for a complete list of the high-resolution haplotypes observed in the Iranian populations surveyed in the current report, see Supplementary Table 1. Phylogeographic frequency distributions based on the major Y-chromosomal haplogroups are presented in Figure 1b. References for all populations included in this projection are provided in Supplementary Table 2.

Table 2 Haplogroup frequencies and credible region of Iran
Figure 1

(a) mtDNA haplogroup distributions. Population designations are followed as presented in Table 1. (b) Y-chromosomal haplogroup distributions. Population designations are followed as presented in Supplementary Table 2.

IN and IS possess the same three most prominent mtDNA haplogroups, H, J and U; however, the frequency distribution of these lineages varies between the northern and southern regions of the Plateau. For the northern collection, J is by far the most frequent haplogroup (0.355, 0.95 CR: 0.211–0.532) and is followed in decreasing order by U (0.161, 0.95 CR: 0.072–0.328) and H (0.131, 0.95 CR: 0.053–0.290) lineages. In contrast, haplogroup U (0.222, 0.95 CR: 0.157–0.306) predominates in the southern population, and is followed closely by haplogroups H (0.205, 0.95 CR: 0.142–0.287) and J (0.145, 0.95 CR: 0.093–0.221). Yet, according to the results of the χ2 tests (Supplementary Table 3), the only haplogroups that differ significantly between these two collections are J total (0.355 versus 0.145 for IN and IS, respectively) and J1b (0.226 versus 0.060 for IN and IS, respectively). Despite other notable differences between the IN and IS collections, including the complete absence of L haplogroups as well as H and U sub-haplogroups in IN, the significance of these dissimilarities was not supported by the Pearson's χ2 statistical analysis. The lack of L haplogroups, and H and U diversification in the IN population may be merely the result of smaller sample size (N=31).
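
The kind of comparison summarized above, a Pearson χ2 test on the presence or absence of one haplogroup in two collections followed by a Bonferroni-adjusted threshold, can be sketched as follows. The counts and the number of comparisons are hypothetical and are not taken from Supplementary Table 3; the function names are likewise illustrative, and no continuity correction is applied.

```python
import math


def chi2_2x2(a_with, a_total, b_with, b_total):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    comparing the frequency of one haplogroup between two collections.
    Assumes the haplogroup is neither absent from nor fixed in both groups."""
    table = [[a_with, a_total - a_with], [b_with, b_total - b_with]]
    n = a_total + b_total
    row = [a_total, b_total]
    col = [a_with + b_with, n - a_with - b_with]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variate with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def significant_after_bonferroni(p_value, n_tests, alpha=0.05):
    """Compare the P-value with alpha divided by the number of tests."""
    return p_value < alpha / n_tests


# Hypothetical counts: 12 of 40 carriers in one group, 10 of 100 in the other.
chi2, p = chi2_2x2(12, 40, 10, 100)
print(round(chi2, 2), round(p, 4))
print(significant_after_bonferroni(p, n_tests=5))
```

With small collections such as IN (n=31), expected cell counts can fall below the level at which the χ2 approximation is reliable, so an exact test may be preferable in practice.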

MDS

In the MDS plot based on mtDNA (Figure 2a), the southwest Asian populations are restricted to the left portion of the chart, the majority of which sequester in the lower left quadrant. The Afghanis group with the central Asians in the lower center of the graph, an expected association given that Afghanistan is frequently considered a part of central Asia.40 The Balkan Peninsula populations form a tight cluster at the right-most extreme of the lower right quadrant, whereas the populations from the Caucasus, Levant/Anatolia and North Africa conform to a ladder-like pattern that extends from the extreme right center of the chart into the upper right quadrant. The central and southern Iranians are close to each other and to the North Africa and Levant/Anatolia assemblages. The Peninsular Arabs partition to the left of the above-mentioned groups of populations; interestingly, IN (present study) is located within this cluster, specifically close to the Qatar collection. Two other North Iranian populations from the South Caspian region, the Gilaki and Mazandarian, are positioned between the Arabian and Levant groups, closest to Saudi Arabia, Oman and Egypt.

Figure 2

(a) MDS plot based on observed frequency of mtDNA haplogroup distributions (stress=0.28852). Population designations are followed as presented in Table 1. (b) MDS plot based on observed frequency of Y-chromosome haplogroup distributions (stress=0.12492). Population designations are followed as presented in Supplementary Table 2.

Whereas distinct geographical-based clustering is apparent within the mtDNA plot, the Y-chromosomal counterpart contains overlapping regional groups as opposed to clear subdivisions (Figure 2b). The left portion of the chart, toward the center and in the upper quadrant, consists of the southwest and central Asian populations. The IN collection and the Georgian group lie toward the center of the lower left portion of the chart. The IS population plots proximal to the Azerbaijan collection, suggestive of a connection between Iran and Caucasia. The IS, Turkey and Armenia groups partition toward the vertical axis of the lower portion of the plot. The Central (Esfahān) Iranian population lies on the dividing line between the right and left upper quadrants, whereas the Balkan populations form a loose assemblage in the upper right quadrant. The Somalis and Ethiopians are sequestered to the right extreme of the plot, whereas the other North African group from Egypt is adjacent to a closely intertwined Levant/Peninsular Arab grouping. The Yemenis are the only population from the Arabian Peninsula that deviates from this spatial pattern, likely due to their previously described geographical isolation from the rest of the Peninsular Arabs.58

Contour plots

Contour plots based on the geographical clines of mtDNA haplogroups J, I, H, U, U2, U4, U5 and U7 are presented in Figure 3a, while those for Y-chromosomal haplogroups J and R are presented in Figure 3b. Haplogroups J and I (Figure 3a) are widespread throughout the Arabian Peninsula and in the region of the Fertile Crescent from the Levant to Iran. Interestingly, both haplogroups exhibit frequency pinnacles in IN as well as within the Arabian Peninsula. Haplogroup H displays a clear demic gradient from Europe into the Near/Middle East, with frequencies dwindling east of the Indus Valley.

Figure 3

(a) Contour plots based on regional frequency distributions of specific mtDNA haplogroups. Populations included, latitudes and longitudes employed, as well as frequencies are presented in Supplementary Table 3. (b) Contour plots based on regional frequency distributions of specific Y-chromosomal haplogroups. Populations included, latitudes and longitudes employed, as well as frequencies are presented in Supplementary Table 4.

When all branches of haplogroup U are considered together, there are no well-defined frequency clines observed except for the obvious lack of the haplogroup within the African continent (Figure 3a). Upon sub-dividing the branches of the aforementioned haplogroup (only the most highly represented branches within the Iranian domain were further explored), clear region-specific gradients are detected. For example, sub-haplogroups U2 and U7 are widely distributed throughout Asia and the Arabian Peninsula, exhibiting their highest frequencies in the southwest Asian collections and displaying east-to-west frequency clines. It is noteworthy that both haplogroups are found in the Arabian Peninsula. U7 specifically exists at considerable levels (10.8% in the Gilaki and 3.2% in IN) in north Iranian territories beyond the Indus Valley where U2 dwindles. The opposite scenario is observed for U4 and U5, which instead exhibit west-to-east demic dispersal, with high frequencies in the Balkan Peninsula and increasingly lower frequencies toward the Near and Middle East. Neither of these sub-haplogroups is present to any appreciable level in the southwest Asian groups, with the exception of Afghanistan for U4. U5 presents an additional high-frequency focus in northern Iran, exhibiting a decreasing gradient frequency distribution toward the Arabian Peninsula. A second high-density point for both haplogroups is apparent in central Asia (Supplementary Table 4).

Y-chromosomal haplogroup J is present in high frequencies throughout the Arabian Peninsula and the Levant, dissipating considerably in all directions (Figure 3b). Haplogroup R, on the other hand, presents very high frequencies in the central Asian/southwest Asian regions, with levels decreasing immediately beyond the Indus Valley area. A slight increase in frequencies is observed in the Balkan Peninsula (Supplementary Table 5).

Admixture analyses

Clear differences are observed in the maternal versus paternal gene pools of each specific Iranian region, as well as when these are compared with each other (Table 3). The IN collection exhibits a 92.1% influence from the Peninsular Arabs when mtDNA is examined, while this impact diminishes to 11.2% when Y-chromosomal data are examined. Similarly, the north Iranian Caspian populations of Gilaki and Mazandarian as well as central Iran and IS exhibit considerable proportions of mtDNA from the Arabian Peninsula (43.5 and 64.3%, 53.3 and 52.1%, respectively), whereas no apparent effect is seen in the Y-chromosomal component for central Iran and only 7.3% is observed for IS. Unfortunately, the Y-chromosome haplogroup counterparts were not reported at the resolution required for these analyses in the north Iran/Caspian populations. Balkan inputs are observed in the mtDNA pool of both IN (7.9%) and IS (23.1%), but are absent in Central Iran and in the other two north Iran collections. Whereas the Balkan region impacts the central Iran group at 28.7% via Y-chromosomal inputs, no Y-influence is detected in either the IN or IS populations. Imprints from the Levant and southwest Asia are mostly of Y-chromosomal origin, but are seen in the mtDNA of the central Iranian population and in the Gilaki. Central Asian impacts are only detected at the Y-chromosomal level and are absent from IS, whereas influences from Caucasia are observed in all instances except via mtDNA in IN despite its close geographical proximity to the region. No north African effects were detected for any of the Iranian populations using either mtDNA or Y-markers.

Table 3 Admixture analysis for populations within the Iranian Plateau

Neutrality tests and nucleotide diversity index

Control-sequence variation of both the IN and IS collections conforms to neutral expectations according to the outcome of the Ewens-Watterson test. The nucleotide diversity indices for 336 nucleotides of the mtDNA control region (nucleotide positions) are 0.021935±0.011783, 0.021769±0.011434 and 0.021388±0.011224 for IN, IS and Iran total, respectively (Table 4).
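
For readers unfamiliar with the index, nucleotide diversity can be understood as the mean number of differences between all pairs of sampled sequences, divided by the number of sites compared. The sketch below shows that calculation on a toy alignment; it assumes gap-free aligned sequences of equal length, omits the standard deviation reported in Table 4, and uses a hypothetical function name and hypothetical sequences.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site across all pairs of aligned,
    equal-length sequences (one entry per sampled individual)."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return total_diffs / len(pairs) / length


# Toy alignment of three individuals over four sites (hypothetical data).
toy = ["ACGT", "ACGA", "TCGA"]
print(nucleotide_diversity(toy))
```

Averaging over pairs of individuals rather than over distinct haplotypes builds in the usual n/(n−1) sample-size correction.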

Table 4 Selective neutrality test and standard diversity indices of Iran populations

Network analyses

Time to most recent common ancestor estimates for H, I, J, J1b, U and U7 Iranian-specific and global networks are provided in Table 5 along with the corresponding topological shapes of each. In addition, network diagrams for each of the aforementioned lineages are available online in Supplementary Figures 1 to 11. Of these phylogenies, all global networks with the exception of haplogroup U exhibit star-shaped morphologies, whereas the H network for IN (Supplementary Figure 10) is the only individual population projection to do so.

Table 5 Time-to-most recent common ancestor

Discussion

Differences between northern and southern Persians: relationship with the Arabian Peninsula

A comparison of the mtDNA pools of the IN and IS populations (see Supplementary Figure 12 for exact location of collection of each sample) reveals contrasting frequencies of haplogroups H, J and U. Although haplogroup J constitutes the largest share (35.5%) of the maternal component in the north, it is considerably lower (14.5%) in IS. Haplogroup U accounts for the largest share (22.2%) of the mtDNA lineages in the south, a pattern consistent with that presented by Quintana-Murci et al.15 The large percentage of J-derived samples in Iran, especially in the north, contrasts with the more modest frequencies observed by Houshmand et al.,69 who also reported a greater proportion of J haplotypes for the northern (9.8%) versus the southern (5.9%) regions of Iran. Differences in collection sites may account for the higher frequencies of J in the collections reported in the present study as compared to that by Houshmand et al.69 The IN and IS also differ with respect to haplogroups T*, T1 and T3 (Middle Eastern- and lower Arabian Peninsula-specific), and L0 and L1 (characteristic of sub-Saharan Africans). In IS, haplogroups T and L are detected at frequencies of 3.4 and 2.56%, respectively, whereas both lineages are completely absent from the northern sample set. These findings, however, contradict the data published by Quintana-Murci et al.,15 where L lineages are reported for the northern but not southern groups, and haplogroups T* and T1 are observed in both regions of the Plateau but are higher in the north than in the south. These differences could be due to the small sample size of the North Iranian collection. The presence of both haplogroups in the Iranian populations may be indicative of gene flow from the Middle East and Africa.

The admixture analysis results indicate that the majority of Iran's mitochondrial pool is derived from Arabia (Table 3). The Persian groups obtained from previous studies also display high degrees of similarity with the Peninsular Arabs; however, they all exhibit greater contribution from adjacent populations, especially groups from Caucasia. These genetic affinities are also evident in the MDS projection (Figure 2a) in which all the Iranian populations plot between the Arab collections, and the Levant–Anatolia and the northeast Africa assemblages. The three north Iranian populations partition nearest to the Arab cluster, whereas the central and south Iranian populations segregate closest to the Levant–Anatolia and the north African groups. The genetic affinities between the Arabian Peninsula and Iranian groups may stem from gene flow at various points during the time continuum since the initial out-of-Africa dispersal, including: (1) ancient migrations during the initial out-of-Africa exodus, in which the Strait of Hormuz is believed to have played a major role (see refs 1, 2), (2) during Neolithic times as a conduit for pastoral nomads and/or (3) during the Arab expansions of the third to the seventh centuries AD.70, 71

Another plausible explanation for the closeness between Persia and Arabia is dispersal emanating out of central Asia into the Arabian Peninsula via Persia. It has been documented that military incursions during Sassanian times left Persian communities behind deep in the heart of the Arabian Peninsula (as far south as Yemen).72 Similarly, the Balochis under the leadership of Iranian Gedrassians73, 74 are believed to have traversed the Strait of Hormuz into the Arabian range and are considered responsible for introducing Asian/Indian-specific β-thalassaemia mutations into the Peninsula.73, 75 These military campaigns are Y-chromosome-driven migrations of minimal mtDNA impact. However, the effect of these migrations is not well understood, and the degree of similarity between the Peninsular Arabs and the Iranians suggests widespread (involving the movement of large numbers of individuals) rather than discrete (a few scattered communities) migratory waves (Figure 2a; Table 3).

In the current study, the MDS plot (Figure 2b) based on Y-chromosomal profiles portrays a close genetic relationship between IN and IS that is reflected in their similar admixture profiles (Table 3). As previously reported by Regueiro et al.,1 the two collections have been thoroughly affected by the migratory waves that have impacted the region with seemingly higher proportions of admixture from the western perimeter than from the east, though over 37% (average of IN and IS) of their Y-chromosomal component is of central or southwest Asian origin (Table 3). It should be noted that the degree of genetic flow from Arabia, as seen in the admixture analysis results, is much lower for the Y-chromosome than it is for the mtDNA (Table 3). It is possible that this is the result of a larger male dissemination from other territories into Persia. This is apparent in the high frequencies of Y-haplogroup R1a1 (M198) of central Asian descent, which is believed to be a tell-tale marker for the expansion of the Kurgan horse culture and Indo–European languages (Figure 3b).1 It is widely accepted that Iranians are Aryans who migrated from the central Asian steppes around 4000 years before present.10

Haplogroups J and I: autochthonous to north Persia or the result of genetic drift?

Although the χ2 analyses performed (Supplementary Table 3) reveal genetic homogeneity between IN and IS for most mtDNA haplogroups, notable differences exist in the distribution of haplogroups J and J1b, both of which are more abundant in IN (χ2 P-value <0.005 for both). J and J1b lineages have been associated with the Neolithic diffusion of agriculture and domestication from its proposed geographical origin, the Tigris–Euphrates river valley, westward into Europe.35 The river valley, also known as the Fertile Crescent, was the physical location of ancient Mesopotamia and encompasses present-day eastern Iraq, northeastern Syria, southeastern Turkey and western Iran. The J1b sub-haplogroup is abundant in the Mediterranean and southern Atlantic regions.35, 76 Interestingly, the frequency of this marker in IN is significantly (with the Bonferroni adjustment for 11 comparisons) higher (Supplementary Table 3) than that of any of the surrounding regions surveyed (panel J in Figure 3a portrays mtDNA haplogroup J as a whole), including those from the Levant (Palestine, Syria, Egypt and Jordan), west central Asia (Armenia), the Near East (Iraq and Iran Kurdish) and the Arabian Peninsula (Oman, UAE, Qatar and Yemen). Although it is tempting to conclude that this distribution pattern suggests a North Iranian origin for this lineage, genetic drift may be responsible. Although IN and IS individuals form part of the ancestral core in the global J1b network (Supplementary Figure 2), most of the remaining Iranian J1b haplotypes are located individually along the branch harboring the 16 222 transition. If the J1b source lies within northern Iran, it seems logical to expect more haplotype sharing or, at least, more integration of the IN and reference collections J1b sequences. The significance of the Iranian J1b frequency distribution and lineage pattern is not clear at this point. Denser sampling within and around Iran may provide added insight with respect to the phylogeographic history of J1b within this region.

The asymmetrical partitioning of mtDNA haplogroups J (IN 35.5% and IS 14.5%) and J1b (IN 22.6% and IS 6%) between the two study populations parallels that of the Y-lineage R1b1a-M269, also found at a substantially higher frequency in the northern portion of the Plateau (15 versus 6% for IN and IS, respectively). Furthermore, as was observed with the J and J1b mtDNA haplogroups, this Y-specific marker is substantially more abundant in IN than in most of the surrounding Middle East, Near East and Levantine groups examined, with the exception of Turkey (14.5%).1, 59 The M269 mutation is observed at elevated levels throughout Europe77 and declines in frequency along a southeast trajectory from Europe toward Pakistan (14.5–2.8%). The significance of the enrichment of this European Y-chromosome marker in IN remains unclear. It is not known whether the presence of M269 in north Persia is associated with the northwest Neolithic agricultural movement from the Near East to Europe or if it signals a subsequent back migration eastward from Europe.

The distribution of haplogroup I also differs between the northern (9.7%) and southern (1.7%) regions of Iran. This difference is significant at α=0.05 (P<0.03) but not following the application of the Bonferroni adjustment (Supplementary Table 3). It is noteworthy that, with the exception of its northern neighbor Azerbaijan, IN is the only population in which haplogroup I exhibits polymorphic levels. Also, a contour plot based on the regional phylogeographic distribution of the I haplogroup exhibits frequency clines consistent with an Iranian cradle (panel I in Figure 3a). Moreover, when compared with other populations in the region, those from the Levant (Iraq, Syria and Palestine) and the Arabian Peninsula (Oman and UAE) exhibit significantly lower proportions of I individuals (1–2%; Supplementary Table 3). It should be noted that this haplogroup has been detected in European groups (Krk, a tiny island off the coast of Croatia (11.3%),78 and Lemko, an isolate from the Carpathian Highlands (11.3%)79) at frequencies comparable to those observed in the North Iranian population. However, the higher frequencies of the haplogroup within Europe are found in geographical isolates and are likely the result of founder effects and/or drift.79 In addition, several studies5, 34, 36, 80 report the Middle East as the origin of this haplogroup, although, for unknown reasons, the lineage is no longer prevalent in the region. Thus, it is plausible that the high levels of haplogroup I present in IN may be the result of a localized enrichment through the action of genetic drift or may signal geographical proximity to the location of origin.

Gene flow and topogeographical barriers

Although haplogroup H and its subclades are found in highest frequencies in Europe and Caucasia, the presence of these haplogroups in Iran may reflect gene flow from neighboring southwest Asia where they are present at moderate frequencies. Furthermore, considering the substantial frequency of H2a1 (12.5%) in central and inner Asia, its low frequency in eastern Europe and its absence in western Europe,39 it is likely that its presence in Iran may be due to gene flow from Asia. The fact that sub-haplogroups H2, H2a1, H4 and H7 are seen only in IS (absent in IN), and at relatively low frequencies, may stem from the low number of individuals collected in IN (n=31).

mtDNA haplogroup T is common in eastern and northern Europe, and is found as far as the Indus Valley and the Arabian Peninsula.5 Thus, the presence of sub-haplogroups T*, T1 and T3 in IS, and their absence in IN, may be associated with gene flow from the Arabian Peninsula to southern Iran.

The best examples of barriers to gene flow are observed in the contour plots of haplogroups J and I and of sub-haplogroups U2 and U7. Both haplogroups J and I are found in high frequencies in northern Iran and exhibit a dwindling cline toward the Levant/Anatolia region, Europe and Asia. A sharp decline is observed beyond the Dasht-e Kavir desert, with some resurgence of haplogroup I in central Asia but no similar presence in southwest Asia, suggesting that this desert could have deterred migrants from traversing from one region to the other. On the other hand, haplogroups U2 and U7 exhibit the opposite demic pattern with high frequencies in southwestern Asia and in the Indus Valley, experiencing a sharp decline/disappearance (it is especially notable for U7) upon arrival at the Zagros mountains/Dasht-e Kavir region. These mtDNA results are mirrored in the distribution of Y-chromosomal haplogroup R, which exhibits a dramatic drop in the Dasht-e Kavir zone (Figure 3b).

The presence of haplogroups/sub-haplogroups J, I, U2 and U7 in the Arabian Peninsula25 again attests to the close genetic affiliation between Persia and Arabia and may suggest gene flow between the two regions. In connection with the putative migratory link between Persia and Arabia, it is noteworthy that high-frequency foci for haplogroups J and I are observed in the contour plots for the Arabian Peninsula, again a possible signal of gene flow between the two regions (Figure 3a).