High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.
Reproductive success, widely used as a measure of fitness, is described as the genetic contribution of an individual to future generations.1 Human behavioural ecologists use various fertility-specific measures to directly estimate reproductive success in extant populations (eg, Strassmann and Gillespie2); however, indirect estimates of past reproductive success can also be obtained through evolutionary population genetic approaches, which demonstrate that its cultural transmission can be extremely effective.3, 4
High variance of male reproductive success is detectable from genetic data because it leads to many Y-chromosomal lineages becoming extinct through drift and others expanding markedly. For any increase to lead to a high population frequency, two factors are needed: biological, with a need for men to be fertile, and cultural, with a continued transmission of reproductive success over generations. Indeed, previously, strong signals of successful transmission of Y-lineages have been associated with recent social selection, and were explained by inherited social status, with two cases in Asia and one in the British Isles. The best-known instance is the finding that ~0.5% of the world’s Y-chromosomes belongs to a single Asian patrilineage,5 descending from a common ancestor in historical times, and suggested to be due to the imperial dynasty founded by Genghis Khan (died 1227). The lineage is characterised by a high-frequency Y-microsatellite haplotype and a set of close mutational neighbours—a so-called ‘star-cluster’. In its structure and haplotype diversity, it resembles the recent descent clusters (DCs) observed within rare British surnames,6 despite its presence in populations spread over an enormous geographical range and speaking many different languages. Subsequently, two further examples of high-frequency DCs were described and were associated with the Qing dynasty descendants of Giocangga (died 1582) in Asia,7 and the Irish early medieval ‘Uí Néill’ dynasty in Europe.8 These three examples might suggest an association between high reproductive success (detectable at the population level) and both wealth and socio-political power, as is observed in some present-day populations (eg, the Tsimané of Bolivia9).
In this study, we ask whether the two signals of continued transmission of success over generations detected in mainland Asia are the only examples of likely recent social selection, or if similar patterns can be detected among the Y-chromosomes of this continent, known for the numerous expansive polities that emerged there during the Bronze Age and later (beginning 5000–4000 YBP). Pinpointing the historical figures associated with any such signals of Y-lineage transmission would require their identification via ancient DNA testing, and/or a comparison with certified living descendants and will not be attempted here. Instead, we aim to understand whether efficient transmission is linked to rules that apply in populations with specific histories of subsistence or culture, or can be detected in other groups with diverse cultural features. We also ask whether any expansions detected converge to the same time period or reflect distinct temporal episodes.
To address these questions, we generated novel Y-microsatellite and Y-SNP data and combined these with published data in order to carry out a systematic analysis of the most frequent haplotypes and their associated DCs. We chose a lineage-based, or interpopulation approach, rather than an intrapopulation approach,10 as we reasoned that dominant lineages are likely to migrate.11 We considered the geographical pattern of identified expansions, their ages and the cultural factors (language and mode of subsistence) characteristic of each population in which they were detected. In total, the 5321 chromosomes analysed belong to 127 populations distributed from the Middle East to Korea. Our analysis focuses upon the 15 most frequent haplotypes that include two haplotypes reflecting the previously recognised ‘Khan’ and ‘Giocangga’ expansions.5, 7 Our analysis shows that the most successful Y-chromosomes in Asia form part of expansions that began between 2100 BCE and 1100 CE, found both among sedentary agriculturalists and pastoral nomads. The ages of these expansions are considered in their cultural contexts.
Materials and methods
DNA samples (461) from Central Asia were collected by the authors with appropriate informed consent.12 An additional 4860 samples form part of sets published previously13, 14, 15, 16, 17, 18, 19, 20, 21 with the exclusion of population samples containing fewer than 15 individuals. A total of 5321 males from 127 populations were analysed (Supplementary Table 1 and Figure 1).
Y-chromosome haplotyping and nomenclature
In the 461 Central Asian samples up to 31 binary markers were typed hierarchically by using the SNaPshot protocol on an ABI3100 capillary electrophoresis apparatus (Applied Biosystems Inc., Foster City, CA, USA). Primers were based on ones published previously22, 23, 24 with additional primers based on published sequences.25 The binary markers define haplogroups that are represented in a maximum parsimony tree25 (Supplementary Figure 1). For comparison among data sets, some simplification of resolution was required, but the original haplogroup information was also retained and correspondence between previous nomenclatures was established.
Eight Y-specific microsatellites (DYS19, DYS389I, DYS389b (equivalent to DYS389II-DYS389I), DYS390, DYS391, DYS392, DYS393 and DYS439) were typed in two previously described multiplexes26, 27 by using an ABI3100 apparatus and GeneMapper v 4.0 (Applied Biosystems Inc.). When DYS19 was observed as two alleles, we retained for analysis the allele found at higher frequency in the population. Allele nomenclature was as described27 with appropriate adjustments in some published data sets.
DC definition and display
Having identified frequent haplotypes, we applied a rule to define a DC centred upon each, using a script (‘Cluster Generator’; Supplementary Information) to extract these DCs from the database. This selects individuals from the data set, starting from a given haplotype and including all haplotypes that are continuously traceable to a core haplotype via a pathway of increasing frequency, a topology consistent with a Poisson-distributed single-step mutational model.8 An additional ‘single-step+10% two-steps’ model was tested, but had the effect of incorporating unrelated chromosomes belonging to distinct haplogroups. We therefore applied only the most conservative single-step model. For each population, we reported the proportion of each DC detected, and, as a measure of DC diversity, its average within-population microsatellite variance. The latter measure was used only for cases in which the number of DC chromosomes per population was >7 in order to avoid strong variance bias.28, 29 Use of seven or six microsatellites instead of eight identifies the same DCs (data not shown).
Frequencies of DCs and local microsatellite variances were calculated, and both frequency and highest variance value (source) were displayed using Surfer 8.02 (Golden Software, LLC, Golden, CO, USA) by the gridding method. Latitudes and longitudes were based on sampling centres.
TMRCA estimation and multivariate statistical analysis
TMRCA estimates were generated using BATWING30 using an exponential growth model, an average mutation rate of 0.0028 per locus per generation for microsatellites31, 32 and a generation time of 30 years, compatible with population-based estimates for males.33 BATWING was run on samples comprising DCs to reduce computation time. Tests showed that running BATWING on populations and extracting DC TMRCAs led to similar results (data not shown).
Statistical tests were done using R.34 PCA was carried out on the frequency of DCs per population (arcsine square root transformed) to test relative contributions of DCs and potential associations between expansions. To test for shared cultural features (language and subsistence methods), a co-inertia analysis35 was carried out on the 95 populations for which cultural information was available. This examines two synthetic variables, one in each data set (here, frequency of DC and cultural features), with maximum covariance. The Rv-coefficient (a multivariate extension of the Pearson correlation coefficient) was calculated and tested by means of a Monte–Carlo method (1000 random permutations of the rows of each data set). PCA and co-inertia analyses were done using the ADE package.36
Identification of unusually frequent haplotypes and associated DCs
Through haplotyping of Central Asian samples (Supplementary Table 2) and a literature survey, we collected 5321 eight-locus Y-chromosomal microsatellite haplotypes belonging to 127 Asian populations (Figure 1 and Supplementary Table 1). We ranked haplotypes by frequency (Figure 2), reasoning that particularly frequent haplotypes should represent potential cores of clusters indicating past expansions of paternal lineages. The sample contained 2552 distinct haplotypes, 67% of which were unique, and 15 haplotypes (0.6%) were each present >20 times (and up to 71 times), indicating probable examples of successful transmissions (Figure 2). We focused on these 15 haplotypes and studied their spatial distributions and ages to illuminate the history of high reproductive success of their respective common ancestors.
Frequent haplotypes are expected to be accompanied by neighbouring haplotypes within the same Y-SNP haplogroup that arise from them via mutation, thereby forming DCs.6 These DCs were uniformly and objectively extracted from the database using ‘Cluster Generator’. When applied to the 15 most frequent haplotypes, this led to some common haplotypes merging into clusters centred on other common haplotypes, and thus to a total of 11 DCs, whose characteristics are summarised in Table 1.
Characteristics of DCs
A total of 2000 (37.8%) males carry Y-chromosomes belonging to these DCs, and they are remarkably widespread, with only three of the 127 populations analysed lacking any DC chromosomes; DC2 is also detected in one of the two ancient DNA population samples in our database (Krasnoyarsk_Kurgan17). Overall DC proportions in each population vary greatly from 5 to 96.7% (Supplementary Table 3).
As DC4 is restricted to Korea,18 it was not considered for interpopulation analysis. For the remaining 10 DCs, we explored three characteristics for each: its geographical frequency distribution, the geographical distribution of its mean microsatellite variance to indicate its likely expansion source and its TMRCA to suggest the age of expansion. Frequency and variance are represented on maps (Figure 3 and Supplementary Figure 2), and are summarised in Table 2 and Figure 4, which also gives TMRCA estimates. Note that the dating method used, BATWING, provides estimates with large confidence intervals.
Our results show heterogeneity in DC features for all variables analysed: (i) Geographical pattern: two kinds of pattern are observed: six DCs display high frequencies spread over large geographic areas (DCs 1, 2, 6, 8, 10 and 14), and four DCs are more restricted to specific areas (DCs 3, 5, 8 and 12). (ii) Origin: three main source locations emerge: Silk Road/East Mongolia (DCs 1, 8 and 10), Middle East/Central Asia (DCs 2, 3 and 5) and East India/South East Asia (DCs 2, 6, 11, 12 and 14). The potential ‘source’ populations are given in Table 2. (iii) Age: mean TMRCA estimates range from 900 to 4100 YA. In considering these results, we focus on two periods: the ‘historical’ period (700–1060 CE) and the protohistorical period (2100–300 BCE), corresponding, respectively, to pre-Imperial ‘Ancient’ China,37 and the Bronze Age/Iron Age in Levantine/Inner Asia.
Seeking common features among expansions
We performed a PCA on DC frequencies per population (Figure 5) to reveal the relative contributions of the 10 DCs and to test any potential associations between expansions. The two first axes of the PCA contribute, respectively, 27% and 21% of the total variance, with the main contributions of DCs 2, 5, 12 and 14 along axis 1, and those of DC1, 8 and 10 along axis 2. DCs 12 and 14 are systematically associated, as are DCs 2 and 5, and DCs 1, 8 and 10, suggesting shared characteristics between these three sets of expansions (Figure 5).
To test whether individuals included in these 10 expansions share any particular cultural features, such as language or subsistence methods, as is the case for previously published examples,5, 7 we performed a co-inertia test (Figure 6). This revealed the preferential association of DCs 12 and 14 with Sino-Tibetan and Austro-Asiatic languages and a ‘multiple’ mode of subsistence, of DC2s 2 and 5 with Indo-European languages and ‘agricultural’ subsistence, and of DCs 1, 8 and 10 with Altaic languages and pastoral nomadism (Figure 6). Genetic and cultural patterns (mode of subsistence and language) are significantly correlated (Rv=0.47, P-value <0.0001, N=95), suggesting that several expansions share the same cultural characteristics.
The TMRCA estimates are consistent with six expansions beginning in protohistorical times (DCs 2, 5, 6, 11, 12 and 14), and four beginning in historical times (DCs 1, 3, 8 and 10; Table 2). DCs detected in the different periods differ (Figure 7) in regard to mode of subsistence (χ2=345.59; df=2; P<0.0001) and language (χ2=211.67; df=4; P<0.0001). This suggests a shift in population dynamics between the two periods, with an increase of successful Y-lineage expansions in Altai-speaking pastoral nomadic populations in more recent times. The historical-period expansions show the highest growth rates (Table 2).
Associating protohistorical and historical expansions with subsistence mode and language
Six of the 10 DCs analysed originated during the Bronze Age or earlier (2100–300 BCE), and correspond to 30% of the individuals in our data set. The ‘origins’ of these expansions lie within three main regions: South East Asia, including Laos (DCs 12 and 14), Tibet/East India (DCs 6 and 11) and Central Asia/Fertile Crescent (DC2 and DC5). The co-inertia analysis shows that these expansions are found mainly among agricultural populations, speaking Indo-European and Austro-Asiatic languages. DCs 5, 12 and 14 have their putative sources centred both on the Fertile Crescent and South East Asia—places known as centres of agricultural innovation;38 these DCs could be, then, by-products of the civilisations that underwent a shift to agriculture during the Bronze Age (Figure 4). The origin of DC2 is located in Central Asia and its TMRCA estimate is 1300 BCE; this is coherent with the ancient DNA Y-chromosomes from the Middle Bronze Age (Andronovo) detected in DC2.17
Four expansions date to the historical period (700–1100 CE). The most recent (DC3; 1100 CE), originating in the Near East and extending to the South East Indian coast, is mainly detected in agricultural populations. It could be linked to the rapid expansion of Muslim power from the Middle East, across Central Asia and to the borders of China and India, after the establishment of a unified polity in the Arabian Peninsula by Muhammad in the 7th century and under the subsequent Caliphates.
The three other expansions (DCs 1, 8 and 10; 700–1060 CE) encompass 79% of individuals involved in historical expansions and are almost exclusively detected in Altaic-speaking pastoral nomadic populations. The territory is large and follows the Silk Road corridor. The signal of expansion spreads from East to West (from Mongolia to the Caspian Sea), as DC1 has its source in Inner Mongolia (hgC3[xC3c]), DC8 in the Oroqen (hgC3c) and DC10 in the Hezhe (hgN1). Given their different sources and associated haplogroups, these DCs are likely to represent three distinct expansions. These expansions show high variance values not only at their sources but also in other locations (Supplementary Table 3); this could be explained by a rapid migration of founders and parallel transmissions to generate diversity in these different places, all descending from a unique ancestor.
The two previously recognised Asian star-clusters (‘Khan’ and ‘Giocangga’5, 7) are identified here, and correspond to DCs 1 and 8, respectively. Their TMRCAs are estimated at 1090 CE for DC1, almost identical to the published estimate of ~1000 YA,5 and at 700 CE for DC8, older than the published estimate of 590±340 YA.7 The ‘Khan’ DC remains the most striking signal of an Asian expansion lineage, representing 2.7% of the entire data set, and the highest number of identical Y-chromosomes (N=71). Interestingly, the westward directions of expansions DC8 and DC10, their potential sources in northeast China, their geographic extents from China to Karakalpakia, and also the Altai-speaking populations associated with them, could also indicate involvement of the Imperial or elite lineages associated with the Khitan Empire. Abaoji, Emperor Taizu of Liao and the Great Khan of the Khitans (died 926 CE), who officially designated his eldest son as his successor, maintained a pattern of seasonal movements typical of pastoral societies, which could also explain the geographic distribution of these lineages.
Highly represented Y-lineages as a proxy for high reproductive success
Reproductive success, widely used as a measure of fitness, is the genetic contribution of one individual to future generations; it implies that offspring are plentiful and that they survive. In humans, offspring number and survival can be strongly influenced by culture.4 Examples of long-term high reproductive success have been revealed by unusually frequent Asian Y-chromosome haplotypes suggested to descend from two individuals, Genghis Khan5 and Giocangga,7 and to have spread in past generations via social selection (because of associated power and prestige) during historical times. In this study, we used a similar approach, analysing 5321 Y-microsatellite haplotypes belonging to 127 Asian populations to ask whether other examples of continued Y-lineage transmission are observed.
A favourable post-Bronze Age socio-political context for successful Y-lineage transmissions
We have detected 11 highly represented Y-chromosome lineages among the 5321 males analysed; altogether the DCs derived from 11 founding chromosomes represent a large proportion (38%) of the Y-chromosomes analysed, attesting to the importance of efficient Y-lineage transmission in the history of the continent. The presence of DCs with TMRCAs from 2100 BCE to 1100 CE suggests that the socio-cultural context, from the Bronze Age up to more recent historical periods, has been sufficiently favourable to ensure a continuous transmission of these different Y-lineages. Indeed, several admixture events have deeply affected the Asian gene pool during this period, involving migrant groups coming from North East and South East Asia.39 It has been argued40 that the development and impact of complex polities beginning with the Bronze Age ~2500–100 BCE in Central Asia were responsible for drastic changes in population structure, movements and organisation. These polities and early states emerged from local traditions of mobility and multiresource pastoralism, including agriculture and distributed hierarchy and administration,41, 42 with a royal or imperial form of leadership combining elements of kinship and political office.42 Archaeological evidence from the late Bronze and early Iron Ages (1400–400 BCE) confirms the existence of control hierarchies associated with wealth differences and socio-political power, even before the emergence of political entities.42 The development of these polities was accompanied by the emergence of stratified societies, recognised elite lineages and differences in power and status between individuals.
High reproductive success is often associated with high social status, ‘prestigious’ men having higher intramarital fertility, lower offspring mortality9 and access to a greater than average number of wives.43 This suggests that the link between status and fitness detected in modern populations (eg, Nettle and Pollet,44 Snyder et al45 and Heyer et al46) also existed in ancient populations. For the TMRCA period from 2100 BCE to 1100 CE, the DCs are equally distributed between sedentary agriculturalist (36.8%) and pastoral nomadic populations (36.7%), indicating that uninterrupted transmission of Y-lineages over many generations could occur in both groups. Cultural transmission of fertility, reproductive success and prestige, seen in contemporaneous populations,47 could also explain the persistence of these frequent Y-lineages over time. The high variance of DC proportions among populations indicates that the ‘memes’ responsible for cultural transmission can vary greatly among populations, even when populations share similar cultural characteristics.
A shift from sedentary agriculturalist to pastoral nomadic populations for successful Y-lineages
Our study highlights a difference in population composition between protohistorical and historical periods. In protohistorical times, DCs are detected mainly in agriculturalist but also in multiresource and pastoralist populations, whereas in historical times they are predominantly seen in pastoralist populations. This shift towards pastoralists only suggests a modification of social organisation in Asia and a higher probability to generate reproductively successful lineages in pastoral-derived societies and states. The development of complex polities began with the Bronze Age in Central Asia ~2500–100 BCE.40 Agriculture arrived ~6000 BCE (Djeitun, Turkmenistan48, 49) and dispersed during the Bronze Age (3000–2000 BCE) in coexistence with hunter–gatherer communities. The pastoral nomadic lifestyle, linked to horse domestication,50 emerged in Central Asia ~5000 BCE and gained importance during the late Bronze Age. The pastoral nomadic tribes eventually came to dominate the Ponto–Caspian steppes during the first millennium BCE and entered historical records as Scythians (Herodotus) or Sakas (Persian accounts).51
The over-representation of historical-period DCs in pastoral nomadic Altaic-speaking populations is compatible with the development of new forms of political and social organisation in pastoralist economies. Among present-day pastoral nomads, patrilineal descent rules lead to a cultural transmission of reproductive success (Heyer et al, submitted) in male lineages.
New social systems and economic adaptations emerged after horse domestication. Horse-riding greatly enhanced both east–west connections and north–south trade between Siberia and southerly regions, and allowed new techniques of warfare, a key element explaining the successes of mobile pastoralists in their conflicts with more sedentary societies.41 A series of expansive polities emerged in Inner Asia by 200 BCE: 15 steppe polities beginning with Xiongnu (Khunnu) ~200 BCE and concluding with the Zunghars in the mid-18th century have been described, including the Mongol and the Qing empires/dynasties.42, 52 The DCs detected along the Silk Road corridor originated from 700 to 1060 CE and may be associated with dynasties that coexisted with five major empires that dominated the Ponto–Caspian steppes from the 7th to the 13th centuries: the Khitan (Great Liao), Tangut Xia, Jurchin, Kara-Khitan and the Mongol Empires.42 In order to identify the prestigious DC founders among potential candidates, the expansion strategies of these different dynasties, and also the mechanisms of co-inheritance of Y-lineages and prestige via marriage rules should be carefully investigated. As an example, sororal polygyny (in which a man can marry two or more sisters) followed by a shift to the Han Chinese system of taking one wife and one or more concubines, as occurred among the Liao elite throughout the length of the Liao dynasty, are likely to promote the efficient transmission of prestigious lineages among males.
Further investigations including ancient DNA approaches, and next-generation sequencing of modern expansion lineage Y-chromosomes could be undertaken to refine the history of the expansion lineages we have observed, and perhaps to identify the prestigious and powerful pastoralist founders associated with them.
We thank all DNA donors; Matt Hurles, Emma Parkin and Denise Carvalho-Silva for primers; and Philippe Mennencier for help with language classification. This work was supported by the Centre National de la Recherche Scientifique (CNRS) ATIP programme (to EH), by the CNRS interdisciplinary programme ‘Origines de l’Homme du Language et des Langues’, and by the European Regional Development Fund, the Conseil Régional Midi-Pyrénées and the PRES (Pôle de Recherche et d'Enseignement Supérieur) Toulouse (to PB). MAJ was supported by a Wellcome Trust Senior Research Fellowship in Basic Biomedical Science (grant nos. 057559 and 087576), and PB by the Wellcome Trust and the CNRS.
About this article
Supplementary Information accompanies this paper on European Journal of Human Genetics website (http://www.nature.com/ejhg)
Scientific Reports (2017)