Introduction

The Y-chromosome lineages in East Asian populations have been examined extensively. It has been shown that several dominant Y-chromosome haplogroups, such as O-M175, D-M174 and C-M130, and several relatively rare Y-chromosome haplogroups, such as F-M89, K-M9, P-M45 and N-M231, constitute the East Asian Y-chromosome gene pool.1, 2, 3, 4, 5 The ethnically diverse populations of East Asia are thought to descend from early modern humans of African origin and to have had a significant role in subsequent migrations into Siberia and the Americas.1, 6 However, the routes by which early modern humans entered East Asia have long been debated, with two major routes proposed: the southern route and the northern route.1, 6

On the basis of the Y-chromosome lineage analysis, several research groups have attempted to elucidate the timing and the routes of the prehistoric migration of modern humans into East Asia. It is widely accepted that there is a genetic divergence between northern (NEAS) and southern (SEAS) East Asian populations.1, 2, 3, 4 However, the relationship between NEAS and SEAS populations, and the cause of genetic divergence remain controversial.2, 3, 4 We have previously suggested a southern origin for all East Asian populations based on the screening of 19 Y-chromosome single-nucleotide polymorphisms and a set of autosomal microsatellites in East Asian populations.3, 7 Subsequently, an extended examination of Y-chromosome variation performed by Karafet et al.4 showed that NEAS populations have higher Y-chromosome diversity than do SEAS populations. Recently, Xue et al.2 reported that the pooled Y-chromosome short tandem repeats (STRs) have a higher diversity in NEAS populations than in SEAS populations. Therefore, these two investigations of Y-chromosome diversity in East Asia suggested the potential existence of the northern route.

Through a detailed analysis of the expansion time and distribution pattern of one dominant Y-chromosome haplogroup in certain geographical regions, the timing and routes of the prehistoric migrations can be determined more objectively and the influence of recent population admixture can be avoided. This approach has proven effective in inferring the prehistoric migrations of modern humans into Europe.8, 9 Our previous study on Hg O3-M122 indicated a clear pattern of southern origin of this lineage and provided solid evidence for the proposed southern route.10 Through detailed analysis of Hg N-M231, Rootsi et al.5 also detected the same migration route via Southeast Asia. The remaining question is whether the migrations of other haplogroups into East Asia followed the same route.

Hg C-M130 is widely distributed across Asia1, 2, 4, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 and Oceania,12, 15, 23 is less frequent in Europe11, 13, 16, 28, 29, 30, 31 and the Americas,26, 32, 33, 34 and is absent in Africa.11, 18, 35 As a non-African lineage, Hg C is highly informative in tracing the migration route of the African exodus in prehistory.6 However, when and where Hg C arose, migrated and expanded remains to be determined. At present, most of the archaeological and genetic evidence supports the view that the earliest African exodus left Africa via the Red Sea, then rapidly migrated to mainland Southeast Asia along the Indian coastline, and eventually reached Oceania.36, 37, 38, 39 Recent Y-chromosome and mitochondrial DNA analysis in Australia and New Guinea has shown that Hg C is likely one of the earliest Out-of-Africa founder types,12 which was also proposed in another study,6 and that mitochondrial DNA lineages consisting of the founder types (M and N) are dated to approximately 50–70 KYA.12

Given the early settlement in Oceania, it remains unknown whether modern humans migrated to mainland East Asia at the same time and when and by which route they expanded across East Asia. A previous study has suggested that Hg C migrated into East Asia via both the northern route and the southern route approximately 45–50 KYA.6 However, it was also suggested that Hg C in Central Asia had a Mongol origin.40 On the other hand, the fossil records in East Asia indicate that the earliest record of modern humans was 40 KYA.1, 41 In addition, the inference based on the dental traits suggested that the earliest East Asians were the direct descendants of Southeast Asians and migrated into East Asia via the Sunda shelf.42 As an ancient haplogroup, Hg C could provide important clues to recover traces of the early colonization of Asia by anatomically modern humans.

Materials and methods

Samples

In this study, a total of 4284 unrelated males, including 4196 males from 134 East Asian populations and 88 males from 6 Southeast Asian populations (Figure 1 and Supplementary Figure 1), were recruited with informed consent. The protocol of this study was approved by the Institutional Review Board of the Kunming Institute of Zoology, Chinese Academy of Sciences. A total of 194 M130-derived Y chromosomes were extracted from the literature, and 10 M130-derived Australians typed in our previous project were included (Supplementary Table 1).

Figure 1

The hierarchical phylogenetic relationships and distribution frequencies of Hg C and its subhaplogroups. In the Y-chromosomal haplogroup tree, Hg C2 is the combination of Hg C2* individuals and M208-derived individuals. Hg C4 includes Hg C4*, Hg C4a and Hg C4b. Footnote markers: a, this study; b, Austro-Asiatic-speaking populations; c, Austronesian-speaking populations; d, Daic-speaking populations; e, Hmong-Mien-speaking populations; f, Tibeto-Burman-speaking populations; g, Altaic-speaking populations; #, Southern and Northern East Asia are geographically separated by the Yangtze River; ‘—’ indicates no available data.

Y-chromosome genotyping

Using a hierarchical genotyping strategy,11, 43 we first genotyped three Y-chromosome markers: M175, YAP and M130. The M130-derived individuals were then subjected to further typing of 12 biallelic markers, which define 13 subhaplogroups: C*-M130, C1-M8, C2-M38, C3*-M217, C3a-M93, C3b-P39, C3c-M48, C3d-M407, C3e-P53.1, C3f-P62, C4-M347, C5-M356 and C6-P55, the phylogenetic relationships of which are illustrated in Figure 1, according to the Y Chromosome Consortium (YCC 2002)44 and Y Chromosomal Haplogroup Tree.45 The genotyping primers were from the literature: M175, YAP, M130, M8, M38, M217, M93 and M48 from Underhill et al.;6 P39 from Zegura et al.;32 P53.1, P55 and P62 from Karafet et al.;45 M407 and M356 from Sengupta et al.;14 and M347 from Hudjashov et al.12 The biallelic markers were determined by sequencing PCR products, with the exceptions that the M130T allele was detected by PCR-restriction fragment length polymorphism (Bsl I digestion), M175 by running denatured PCR products on ABI 3730 and YAP by direct agarose electrophoresis of PCR products. To evaluate the phylogeographic structure of Hg C, we also typed eight commonly used Y-STR markers: DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393 using fluorescence-labeled primers (obtained from Applied Biosystems, Foster City, CA, USA) and then running denatured PCR products on ABI 3730. The Y-STR nomenclature follows the system proposed by Butler et al.46
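The hierarchical assignment described above can be sketched as a small lookup. This is an illustrative sketch only, not the authors' pipeline: the function name `assign_haplogroup` and the input format (a set of marker names at which an individual carries the derived allele) are assumptions, and the marker-to-subhaplogroup mapping is taken directly from the list in the text.

```python
# Illustrative sketch (hypothetical helper, not the authors' software).
# Input: the set of markers at which an individual carries the derived allele.

# Sub-branches of Hg C3 (all downstream of M217), as listed in the text.
C3_SUBBRANCHES = {
    "M93": "C3a", "P39": "C3b", "M48": "C3c",
    "M407": "C3d", "P53.1": "C3e", "P62": "C3f",
}

# Other primary branches of Hg C (downstream of M130), as listed in the text.
C_BRANCHES = {"M8": "C1", "M38": "C2", "M347": "C4", "M356": "C5", "P55": "C6"}


def assign_haplogroup(derived_markers):
    """Return the Hg C subhaplogroup label, or None if M130 is ancestral."""
    derived = set(derived_markers)
    if "M130" not in derived:
        return None  # not an Hg C chromosome; no further typing needed
    if "M217" in derived:
        for marker, name in C3_SUBBRANCHES.items():
            if marker in derived:
                return name
        return "C3*"  # M217-derived, ancestral at all typed C3 sub-branch markers
    for marker, name in C_BRANCHES.items():
        if marker in derived:
            return name
    return "C*"  # M130-derived, ancestral at all typed downstream markers


print(assign_haplogroup({"M130", "M217", "M48"}))
```

Note that the starred labels (C* and C3*) are defined by exclusion, so they depend on which downstream markers were actually typed.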

Data analyses

Together with the published data, the frequencies of M130 in worldwide populations are summarized in Figure 1 and applied to generate a contour map of frequency distribution (Figure 2) using the Surfer 7.0 software (Golden Software). The Y-STR data (Supplementary Table 1), including those from the literature,14, 23, 24, 25, 32, 33, 34, 47, 48, 49 were used to construct the median-joining networks using the program NETWORK 4.5.0.7 (Fluxus Engineering),50 and to calculate the average gene diversity and the RST genetic distances based on eight STR loci by Arlequin 3.01.51 Multidimensional scaling (MDS) analysis was performed based on the RST genetic distances using SPSS 15.0 (SPSS). The ages of STR variation and the divergence times of the Hg C subhaplogroups were estimated following Zhivotovsky et al., assuming an average Y-STR mutation rate of 0.00069 per locus per 25 years.8, 14, 52, 53 The age of STR variation within a haplogroup reveals the time when variation occurred compared with a median haplotype in the given population; for the divergence time of a haplogroup, it represents the time when a subhaplogroup diverged from an ancestral haplogroup.
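As a rough illustration of the STR-variation age, the sketch below computes the average squared difference (ASD) in repeat number between each sampled haplotype and the per-locus median haplotype, averages it over loci, and divides by the mutation rate given above. It is a minimal sketch under stated assumptions: the function name and input layout are hypothetical, the median haplotype is taken locus by locus, and standard errors are not computed.

```python
from statistics import median

# Values stated in the text: 0.00069 mutations per locus per 25 years.
MUTATION_RATE_PER_GENERATION = 0.00069
YEARS_PER_GENERATION = 25


def str_variation_age(haplotypes):
    """Estimate the age (in years) of STR variation within a haplogroup.

    haplotypes: list of equal-length lists of repeat counts, one per individual.
    Assumption: the founder haplotype is approximated by the per-locus median.
    """
    n_individuals = len(haplotypes)
    n_loci = len(haplotypes[0])
    medians = [median(h[locus] for h in haplotypes) for locus in range(n_loci)]
    # Average squared difference from the median haplotype, over individuals and loci.
    asd = sum(
        (h[locus] - medians[locus]) ** 2
        for h in haplotypes
        for locus in range(n_loci)
    ) / (n_individuals * n_loci)
    generations = asd / MUTATION_RATE_PER_GENERATION
    return generations * YEARS_PER_GENERATION


# Toy data (invented for illustration): four individuals, two loci.
toy = [[14, 12], [14, 12], [15, 12], [14, 12]]
print(round(str_variation_age(toy)))
```

The toy haplotypes are invented and do not correspond to any population in Supplementary Table 1.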

Figure 2

Frequency distribution of Hg C in worldwide populations and the inferred migration routes of the African exodus carrying the M130 mutation in prehistory.

Results and Discussion

Hg C is prevalent in various geographical areas (Figures 1 and 2), including Australia (65.74%), Polynesia (40.52%), Heilongjiang of northeastern China (Manchu, 44.00%), Inner Mongolia (Mongolian, 52.17%; Oroqen, 61.29%), Xinjiang of northwestern China (Kazak, 75.47%), Outer Mongolia (52.80%) and northeastern Siberia (37.41%). Hg C is also present in other regions, extending longitudinally from Sardinia13 in Southern Europe all the way to Northern Colombia,32 and latitudinally from Yakutia24 of Northern Siberia and Alaska32 of Northern America to India, Indonesia and Polynesia, but it is absent in Africa.

As shown in Figure 1, most of the subhaplogroups of Hg C have a geographically distinct distribution. Hg C6, which is defined by a recently identified marker,45 was not detected in our samples. Hg C1 and C4 are completely restricted to Japan and Australia, respectively, and were not detected in the other samples from East Asia and Southeast Asia. Hg C5 occurs in India and its neighboring regions Pakistan and Nepal.14, 54 In mainland East Asia, four Hg C5 individuals were detected, including two in Xibe, one in Uygur and one in Shanxi Han. Although the dispersal of Hg C2 is relatively wide, its distribution remains limited to Oceania and its neighboring regions, except Australia. In our samples, only three Hg C2 individuals were observed in Eastern Indonesia, which is consistent with previous reports.15, 23 Hg C3 is the most widespread subhaplogroup: it was detected in Central Asia, South Asia, Southeast Asia, East Asia, Siberia and the Americas, but is absent in Oceania. The largely non-overlapping regional distributions of the Hg C subhaplogroups suggest that these lineages have undergone long-term isolation. As these subhaplogroups have a common origin, sharing the M130-derived allele, their geographical distributions enable us to infer the prehistoric migration routes of this lineage.

Hg C3 (defined by M217) can be further divided into six sub-branches: C3a-M93, C3b-P39, C3c-M48, C3d-M407, C3e-P53.1 and C3f-P62. As shown in the recent Y Chromosomal Haplogroup Tree (Figure 1), Hg C3* carries the ancestral state at the M93, P39, M48, M407, P53.1 and P62 loci, and is therefore presumably ancestral to the other Hg C3 sub-branches (Hg C3* may contain unidentified sub-branches, and therefore may not be a monophyletic group). Previous data have shown that Hg C3a and C3b were only detected in Japan11 and North America,32 respectively. Hg C3c was detected in NEAS populations, Siberia and Central Asia.2, 16, 23, 24, 25, 55 Hg C3d was detected only in a Yakut population.14 Hg C3* was detected in multiple regions, including Southeast Asia, East Asia, Central Asia and Siberia. Unfortunately, in the published data, those Hg C3* individuals were not subtyped,2, 4, 6, 14, 15, 16, 23, 24, 25, 54, 55 and therefore they cannot be correctly assigned to the Hg C3 sub-branches.

In our samples, a total of 465 M130-derived Y chromosomes were identified (Figure 1 and Supplementary Table 1), and 430 of them were M217-derived (Hg C3) individuals, including 374 Hg C3* (all non-M93-, P39-, M48-, M407-, P53.1- and P62-derived individuals are assigned to the Hg C3* group in this study), 18 Hg C3c, 23 Hg C3d and 15 Hg C3e individuals. Hg C3a, C3b and C3f were not detected. As shown in Figure 1, high frequencies of Hg C3* (>30%) are observed in NEAS populations, including Inner/Outer Mongolians, Manchu from Heilongjiang and Kazak. A total of 23 populations among the 31 NEAS populations tested have Hg C3* at frequencies >10%. Relatively low frequencies of Hg C3* are observed in SEAS populations: only 9 of the 47 SEAS populations have frequencies >10%, and Hg C3* is totally absent in 14 populations. Hg C3c and Hg C3e have similar distribution patterns and occur in Tibetan and Altaic populations, with the exception of one Hg C3c individual and one Hg C3e individual detected in Heilongjiang Han and Gansu Han, respectively. Hg C3d is sparsely distributed in East Asian populations (Figure 1). In addition, there are 28 Hg C* individuals (Hg C* represents non-M8-, M38-, M217-, M347-, M356- and P55-derived individuals and is considered a potential ancestral haplogroup of the Hg C lineage in this study, although it may contain unidentified subclades): 7 in NEAS, 19 in SEAS and 2 in Southeast Asia (Figure 1). Combined with the recently reported data,2 Hg C* occurs from the southernmost to the northernmost parts of East Asia, but is more frequent in SEAS than in NEAS populations. Previous studies have shown that Hg C* might also exist in Central Asia.16, 17 However, we believe that these Hg C* individuals are likely to be Hg C3 because many sub-branch markers were not typed in the reported studies. This speculation is further supported by two lines of evidence. First, in Central Asia, all M130-derived individuals detected by Karafet et al.4 are M217-derived. Second, the assumed Hg C* individuals in Central Asia are shown to be the descendants of Mongols by subsequent Y-STR analysis.40
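The counts reported in this section can be cross-checked with simple arithmetic. The tally below is a sketch, not part of the original analysis: it assumes that the four Hg C5 and three Hg C2 individuals mentioned earlier, together with Hg C* and Hg C3, account for every M130-derived chromosome in the samples.

```python
# Counts quoted in the text for the samples typed in this study.
c3_counts = {"C3*": 374, "C3c": 18, "C3d": 23, "C3e": 15}
# Assumption: these are the only other M130-derived chromosomes in the samples.
other_counts = {"C*": 28, "C5": 4, "C2": 3}

c3_total = sum(c3_counts.values())
m130_total = c3_total + sum(other_counts.values())

print(c3_total)    # M217-derived (Hg C3) individuals
print(m130_total)  # all M130-derived (Hg C) individuals
```

Under that assumption the sub-counts add up to the 430 Hg C3 and 465 Hg C totals stated in the text.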

The phylogeographic pattern of Hg C is consistent with the mitochondrial DNA evidence indicating rapid initial settlement, followed by prolonged isolation.36 As shown in Figure 3a, most of the East Asian populations cluster together in the MDS plot, whereas other populations show separations from each other and have relatively large genetic distances, especially the Japanese-specific Hg C1 being clearly an outlier in the MDS plot. Interestingly, besides Hg C1, Japanese also have M217-derived individuals who have a close relationship with the Han Chinese (Figures 3a and b), rather than with the Altaic-speaking populations. Therefore, the two distinctive sets of Hg C lineages in Japan support the hypothesized two independent migration waves to Japan,23 that is, the Paleolithic migration and the Neolithic migration likely due to the demic diffusion of the Han culture.56 The Hg C5 sublineage in India is also distinctive in the MDS plot, but with relatively short genetic distances with the East Asian populations. As expected, Australians and Austronesians are clustered together and are relatively close to SEAS, including Hmong-Mien-, Daic- and Austro-Asiatic-speaking populations. Native Americans and Siberians are close in the MDS plot with short genetic distance with the Altaic-speaking populations, which can also be reflected when only analyzing the Hg C3 sublineage (Supplementary Figure 2).

Figure 3

The MDS plots. Populations in (a) are grouped according to geographic distributions and language families and include all M130-derived individuals. Their detailed information can be obtained from Supplementary Table 1. Populations in (b) include only Hg C3* individuals.

To estimate STR gene diversity, we grouped the populations based on geographical regions and language families (Supplementary Table 2). A general east-to-west and south-to-north cline was observed. The Austronesian group has the highest diversity (0.582), followed by Australian (0.545), Hainan aborigines (0.522) and Southern Han (0.508). In contrast, Siberian, Native American, Tibeto-Burman and Altaic groups show relatively low diversities (0.251, 0.359, 0.317 and 0.371, respectively). Hence, in combination with the above analysis of the MDS plot (Figure 3a), the STR diversity pattern (Supplementary Table 2) suggests that Southeast Asia might be the cradle land of the M130 lineage, and that the M130 lineage, derived from the M168 ancestral type (the shared marker in non-Africans),6 first migrated into mainland Southeast Asia by way of the Indian subcontinent, and then into Australia and mainland East Asia separately. After its settlement in Southeast Asia in prehistory, the M130 lineage probably experienced a population expansion as reflected by the high STR diversity. It then began to migrate northward via the coastline, and gradually settled in southern and northern East Asia, then northeast Siberia, and finally into the Americas via Beringia.
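For readers unfamiliar with the statistic, a per-locus gene diversity averaged over loci can be sketched as follows. This uses one common definition (Nei's unbiased estimator, n/(n-1) × (1 − Σp²), averaged across loci); whether it reproduces the exact values Arlequin reports is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def locus_diversity(alleles):
    """Nei's unbiased gene diversity for one locus: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    sum_sq = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1 - sum_sq)


def average_gene_diversity(haplotypes):
    """Average the per-locus diversity over all STR loci.

    haplotypes: list of equal-length lists of repeat counts, one per individual.
    """
    n_loci = len(haplotypes[0])
    per_locus = [
        locus_diversity([h[locus] for h in haplotypes]) for locus in range(n_loci)
    ]
    return sum(per_locus) / n_loci


# Toy data (invented): the first locus is variable, the second is fixed.
toy_group = [[14, 12], [15, 12], [14, 12], [15, 12]]
print(average_gene_diversity(toy_group))
```

A group in which most individuals share the same repeat counts scores close to 0, whereas a group with many alleles at similar frequencies scores closer to 1, which is the sense in which the regional values above are compared.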

The M217-derived (Hg C3) lineages are informative in revealing the eastward migration of modern humans into East Asia in prehistory because of their extensive distribution in East Asia, Central Asia and Siberia. It was suggested that the M217-derived individuals first reached South Asia and then started migrating eastward through two routes: Central Asia and Southeast Asia.6 However, the Central Asian M217-derived individuals were shown to have a recent Mongol origin (1000 years ago).40 The Han Chinese display a high STR diversity (Supplementary Table 2 and Figure 4), especially those in the eastern coastal region (0.467), as do other eastern populations (Korean, 0.463; Japanese, 0.453), whereas populations in the north and west show low diversities (Altaic, 0.281; Tibetan, 0.366). Therefore, the distribution and gene diversities of the M217-derived lineages support a single eastward migration through the southern route and the subsequent northward migration of Hg C along the coastline of mainland East Asia in prehistory. The evidence from dental morphological traits points in the same direction.42

Figure 4

The median-joining networks of Y-STR haplotypes within subhaplogroups of Hg C. The network of Hg C3* was constructed by the median-joining method after weighting STRs according to their repeat number variances and processing the data using the reduced median method. The sizes of the nodes are proportional to their frequencies. The lengths of the lines are proportional to the mutational steps.

The sub-branches of Hg C3 in East Asia (namely Hg C3c, C3d and C3e) also reveal the pattern of prehistoric migrations of regional populations. Hg C3c is restricted to Altaic-speaking populations, with only sporadic appearance in Northern Han Chinese (one individual), Tibetan (four individuals) and Japanese (three individuals) (Figure 1). Among the 82 Hg C3c individuals identified, 76 (92.7%) share a 9-repeat motif at the DYS391 locus. The median-joining network of Hg C3c (Figure 4) is star-like with short distances, implying that Hg C3c has a relatively recent origin. Hg C3d was detected in NEAS and SEAS populations (Figure 1), but it is more prevalent in NEAS. Moreover, the Y-STR diversity of Hg C3d is higher in NEAS (0.313) than in SEAS (0.198). As shown in the median-joining network (Figure 4), the Hg C3d individuals in NEAS have more STR haplotypes than those in SEAS, suggesting that Hg C3d likely arose in NEAS and then expanded to SEAS recently due to the demic diffusion of the Han culture.56 Hg C3e was detected only in NEAS (Figure 1) with a low STR diversity (Figure 4), suggesting its recent origin in NEAS.

On the basis of STR data, we estimated the ages of STR variation and the divergence times of Hg C subhaplogroups (Table 1). In general, the times estimated are highly consistent with the inferred migration events. The divergence times of C3*-M217 and C1-M8 were estimated as 32.6±14.1 KYA and 41.9±16.6 KYA, respectively, indicating that the proposed eastward migration of Hg C into East Asia started about 32–42 KYA. This is consistent with the mitochondrial DNA findings, in which the age of the Japanese- and Korean-specific Haplogroup M7a was estimated as 37.0±20.0 KYA.57 In addition, the archaeological findings also provided strong evidence that an Upper Paleolithic wave of migration brought people into Japan more than 30.0 KYA.58, 59 At that time, Pleistocene land bridges likely connected Japan to the mainland and there was a much shorter coastline between East Asia and Southeast Asia.38 However, the STR-variation ages for Hg C3* and C1 were estimated as 18.9±4.0 KYA and 10.0±3.5 KYA, respectively, reflecting relatively recent population expansions, which is plausible because this is when the Last Ice Age began to retreat and the climate became warmer.60 Another ancient lineage is Hg C5 (33.3±19.1 KYA), and its divergence time agrees well with the suggested midway station of the Indian subcontinent during the eastward migration of Hg C from Africa to East Asia. Similar to Hg C1 and Hg C3*, the STR-variation age of Hg C5 also reflects a recent population expansion time (14.2±3.3 KYA), which is slightly younger than the age reported by Sengupta et al.14 As expected, the sub-branches of Hg C3 are young, which is consistent with the proposed later migration events associated with these sublineages. For example, M48-derived individuals have the highest Y-STR diversity (0.384) in NEAS but a young age of 10 KYA (Table 1). We believe that M48 originated in NEAS populations, which agrees well with the suggested recent migration (for example, the Mongol expansion) of M48-derived individuals into Central Asia and Siberia.24, 40

Table 1 The estimated ages of STR variation and the divergence times of the Hg C subhaplogroups

It should be noted that the estimated age is not necessarily always a reliable indicator of the founding date of a lineage/population. The STR-variation age of Hg C* is surprisingly young (5.5±1.6 KYA), which seems to contradict the assumed ancestral status of Hg C*. As shown in Figure 4, the STR haplotypes of Hg C* form a star-like network and the mutational steps are short. There are two possible explanations. One is that there might be other unidentified young sublineages under Hg C*. The other would invoke an ancient bottleneck-related genetic drift or natural selection. In addition, the relatively small sample size of Hg C* may also cause the underestimation. However, we tend to believe that the Hg C* individuals detected in this study are the genetic footprints of the ancient lineage because they not only have a very wide distribution (although low frequency) but also have similar STR haplotypes (Figure 4 and Supplementary Table 1). Finally, Hg C3*, C1 and C5 discussed in this study possibly contain unidentified subclades; therefore, further studies are required for a well-resolved phylogeny and detailed phylogeographic inferences.

Conclusions

We demonstrated the phylogeographic distribution of one of the most ancient non-African Y-chromosome lineages, from which we inferred the prehistoric migration and expansion of the Hg C lineage. We propose that Hg C was derived from the African exodus and gradually colonized South Asia, Southeast Asia, Oceania and East Asia by a single Paleolithic migration from Africa to Asia and Oceania, which occurred more than 40 KYA. The prehistoric northward migration of Hg C in mainland East Asia likely followed the coastline and is consistent with the northward migration of other East Asian Y-chromosome haplogroups.