Main

According to a modified version of the Y-chromosome Consortium tree,1 haplogroup C3c is defined by three parallel mutations: M48, M77 and M86. However, Pakendorf et al.2, 3 have demonstrated that a few Siberian individuals from Evenk and Yukaghir populations have the derived state at M48 but the ancestral state at M86. They suggested that the derived state at M48 together with the ancestral state at M86 defines subhaplogroup C3c*, whereas the derived state at M86 further defines subhaplogroup C3c1. Unfortunately, this suggestion has not been taken into account and, in general, only one of the three markers is analyzed in studies of Eurasian populations (for example, M48 in refs 4–8, M86 in refs 9–12 and M77 in refs 13–16). Moreover, according to the structure of the Y-chromosome haplogroup tree proposed by the International Society of Genetic Genealogy (http://www.isogg.org; Version: 7.28; Date: 18 June 2012), haplogroup C3c is determined by parallel mutations at M48 and M77, whereas the derived state at M86 defines its C3c1 branch.

We have recently analyzed the structure of haplogroup C in populations of Northern Eurasia in a total sample of 1449 males from 18 ethnic groups of Siberia, Eastern Asia and Eastern Europe.16 We found that, besides haplogroup C3c (defined in our study by marker M77), some Siberian populations are characterized by a high frequency of C3* Y chromosomes (for instance, Koryaks have 38.5% of C3* haplotypes). Because of the aforementioned ambiguities in C3c classification, and because haplogroup C3c may be incompletely identified when the M77 marker is used alone, in the present study we have analyzed all three single-nucleotide polymorphisms (SNPs), namely M48, M77 and M86, in two sets of samples previously typed as belonging to haplogroups C3c-M77 (141 samples) and C3* (xC3a, C3b, C3c, C3d, C3e and C3f) (90 samples).

M48 and M86 typing of C3c-M77 and C3* Y chromosomes from populations of Northern Asia and Eastern Europe has shown that haplogroup C3c is defined by marker M48 and consists of two subhaplogroups: C3c1, characterized by parallel mutations at loci M77 and M86, and C3c*, with ancestral states at M77 and M86 (Table 1, Supplementary Table S1). Therefore, all previously published C3c-M77 and C3c-M86 haplotypes would in fact belong to subhaplogroup C3c1, and some of the individuals previously classified as belonging to haplogroup C3c on the basis of M48 alone may belong to subhaplogroup C3c*.
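The assignment rule described above can be written down as a small decision function. The following Python sketch is only an illustration of that rule: the function name `classify_c3c`, the state labels and the "unresolved" and "inconsistent" outcomes are assumptions introduced here, not part of the original typing protocol.

```python
from typing import Mapping

DERIVED = "derived"
ANCESTRAL = "ancestral"


def classify_c3c(states: Mapping[str, str]) -> str:
    """Assign a C3 Y chromosome to a subhaplogroup from its M48, M77 and M86 states.

    Hypothetical helper: encodes the rule that M48 defines C3c, that derived
    M77 and M86 together define C3c1, and that ancestral M77 and M86 on a
    derived M48 background define C3c*.
    """
    m48, m77, m86 = states["M48"], states["M77"], states["M86"]

    if m48 != DERIVED:
        # M77 or M86 derived without M48 would contradict the tree topology.
        if DERIVED in (m77, m86):
            return "inconsistent"
        return "not C3c"
    if m77 == DERIVED and m86 == DERIVED:
        return "C3c1"
    if m77 == ANCESTRAL and m86 == ANCESTRAL:
        return "C3c*"
    # Derived M48 with only one of M77/M86 derived: not covered by the rule.
    return "unresolved"


if __name__ == "__main__":
    sample = {"M48": DERIVED, "M77": ANCESTRAL, "M86": ANCESTRAL}
    print(classify_c3c(sample))
```

Under this rule, a sample typed only at M48 cannot be resolved below the level of C3c, which is why single-marker studies based on M48 may have pooled C3c* with C3c1.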

Table 1 Haplogroups C3c and C3* distribution (number of individuals and % values in parentheses) in populations studied

In our study, the highest frequencies of subhaplogroup C3c1-(M77, M86) were observed in Tungusic-speaking people of North-Eastern Asia, such as Evens and Evenks, as well as in Turkic-speaking Altaian Kazakhs and Mongolic-speaking Kalmyks. These results are in agreement with previous observations based on separate or joint genotyping of M77 and M86 markers.3, 9, 12, 16

C3c* haplotypes were detected in aboriginal populations of North-Eastern Asia: Koryaks (28.2%) and Evens (1.6%) from the Sea of Okhotsk coast (Magadan region) and West Evenks (2.4%) from Central Siberia (Evenki Autonomous District) (Table 1). Earlier, two Evenk individuals from the southern part of Yakutia, one Yakut-speaking Evenk and one Yukaghir were found to belong to C3c*.2, 3 Therefore, the geographic distribution of subhaplogroup C3c* is limited to the eastern part of Siberia.

Comparison of short tandem repeat (STR) haplotypes belonging to subhaplogroup C3c* demonstrates that the haplotypes found in Koryaks, Evenks, Yukaghirs and Evens are very similar to each other and probably belong to the same branch (Supplementary Table S2). A search for similar nine-marker haplotypes (loci DYS19, DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) in the YHRD 3.0 database17 (http://www.yhrd.org; release 37 built on 17 February 2012; 99 258 haplotypes within 732 world populations) has shown (Supplementary Table S2) that analogous haplotypes are present only in Yakuts from the western part of Yakutia (Viluisk region).18 Unfortunately, these haplotypes were not typed using SNP markers, so their exact phylogenetic affiliation remains unclear.
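A search for similar nine-marker haplotypes of the kind described above can be sketched as a comparison of repeat counts locus by locus. The Python example below is a minimal illustration only: the function names, the summed-step distance and the `max_steps` threshold are assumptions made here, not the YHRD search procedure, and all repeat counts in the usage example are made-up placeholders rather than observed data.

```python
from typing import Dict, List, Tuple

# The nine loci named in the text.
LOCI = (
    "DYS19", "DYS385a", "DYS385b", "DYS389I", "DYS389II",
    "DYS390", "DYS391", "DYS392", "DYS393",
)

Haplotype = Dict[str, int]


def step_distance(a: Haplotype, b: Haplotype) -> int:
    """Sum of absolute repeat-number differences over the nine loci."""
    return sum(abs(a[locus] - b[locus]) for locus in LOCI)


def find_similar(
    query: Haplotype, database: Dict[str, Haplotype], max_steps: int = 1
) -> List[Tuple[str, int]]:
    """Return (name, distance) for database entries within max_steps of the query."""
    hits = [(name, step_distance(query, hap)) for name, hap in database.items()]
    close = [(name, dist) for name, dist in hits if dist <= max_steps]
    return sorted(close, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Placeholder repeat counts for illustration only (not real haplotypes).
    query = dict(zip(LOCI, (15, 11, 12, 10, 26, 24, 9, 11, 13)))
    database = {
        "entry_A": dict(query),                  # identical to the query
        "entry_B": {**query, "DYS391": 10},      # one step away
        "entry_C": {**query, "DYS389I": 13},     # three steps away
    }
    print(find_similar(query, database, max_steps=1))
```

A match found this way shows only STR similarity; as noted above, haplogroup affiliation still has to be confirmed by SNP typing.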

The ages of STR variation of haplogroup C3c and its branches are <10 ky (Table 2), reflecting relatively recent population expansions, probably connected with the Holocene warming. Meanwhile, the divergence time (TD)19 between subhaplogroups C3c1 and C3c* is very large, estimated as 30.2±25.9 ky, which points to an Upper Pleistocene origin and divergence of the C3c lineages. We should note, however, that the calculated divergence time is this large because of the very high repeat variance at locus DYS389I (5.7 versus 0.00025–0.1 at the remaining seven loci), where the mean repeat number is 10 in subhaplogroup C3c* and 13.4 in subhaplogroup C3c1. If we instead assume that the deletion of three repeats at locus DYS389I in the founder C3c* haplotype was the result of a single mutation event, then the divergence of C3c1 and C3c* haplotypes is estimated at 8.6±4.6 ky ago, that is, within the Holocene epoch. The results of molecular dating are therefore inconclusive in this case, and further studies are required to clarify the diversification patterns of haplogroup C3c.
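The effect of locus DYS389I on a distance based on squared differences can be illustrated with the two mean repeat numbers quoted above (10 and 13.4). The Python sketch below is not the TD estimator of ref 19 and does not reproduce the reported ages; it assumes a plain squared difference between mean repeat numbers as a stand-in, and it models the single-event scenario simply by removing the three-repeat offset before comparing the means. The function name and constants are introduced here for illustration.

```python
def squared_mean_difference(mean_a: float, mean_b: float) -> float:
    """Squared difference between the mean repeat numbers of two groups."""
    return (mean_a - mean_b) ** 2


# Mean repeat numbers at DYS389I as given in the text.
MEAN_C3C_STAR = 10.0
MEAN_C3C1 = 13.4

# Assumption: the three-repeat deletion is treated as one event and its
# offset is removed before the means are compared.
DELETION_SIZE = 3

# Every repeat of difference counted as stepwise change: about 11.56.
stepwise = squared_mean_difference(MEAN_C3C_STAR, MEAN_C3C1)

# Three-repeat offset removed first: about 0.16.
single_event = squared_mean_difference(MEAN_C3C_STAR + DELETION_SIZE, MEAN_C3C1)

if __name__ == "__main__":
    print(f"stepwise: {stepwise:.2f}, single event: {single_event:.2f}")
```

Because the difference enters as a square, a single locus with a three-repeat offset contributes far more than loci that differ by a fraction of a repeat, which is consistent with the sensitivity of the estimate to the treatment of DYS389I.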

Table 2 The age of STR variation within haplogroup C3c and its subgroups based on STR diversity

To sum up, by analyzing three SNPs (M48, M77 and M86), we found that haplogroup C3c is defined by marker M48, and that both the M77 and M86 mutations occurred on the background of derived M48 chromosomes, yielding subhaplogroup C3c1. Subhaplogroup C3c* individuals have been detected, albeit at low frequencies, in populations of Central and Eastern Siberia (Koryaks, Evenks, Evens and Yukaghirs) up to the Okhotsk Sea coast and, based on the age of STR variation, could be considered remnants of the Neolithic population of Siberia.