Introduction

The non-recombining region of the Y chromosome (NRY) is now widely used to trace the origins and migrations of modern humans.1, 2, 3, 4, 5 Detailed characterization of the Y-chromosome phylogeny has allowed researchers to reveal previously inaccessible details of human demographic history.6, 7, 8 Y-chromosome haplogroup O-M175 is the predominant lineage in East Asian populations, comprising ~75% of the male population in China2, 9, 10, 11 and ~87% in Southeast Asia.12, 13, 14 O-M134 is one of the most frequent sub-lineages of O-M175 and represents ~30 million people in East Asia (Supplementary Figure S1). The frequency of O*-M134(xM117) varies across regions: China (~11.4%),12 Japan (~3.6%),15 Korea (~9.6%),16 Mongolia (~4.1%),17 Thailand (~11.6%)18 and the Khasi of India (~16.6%).17 This paragroup has also been found at high frequency in some Kazakh populations (26.1%).19 Despite its abundance and wide distribution, the phylogeny of O-M134 has not been adequately resolved with respect to O-002611 (ref. 20) and O-M95.21 To date, M117 has been the only marker internal to O-M134 investigated in the literature, and it is not sufficient to resolve the phylogeny of the populations belonging to this haplogroup.22 New data from high-throughput sequencing have enabled the identification of a more explicit branching structure across the entire Y-chromosome tree.23 More than 200 potential single-nucleotide polymorphisms (SNPs) within haplogroup O-M134 were discovered by hybridization capture and Illumina sequencing of the Y chromosomes of 22 Han Chinese males.23 Using 10 SNPs (F444, F629, F3451, F46, F48, F209, F2887, F3386, F1739 and F152), we genotyped >1300 Chinese males and updated the phylogenetic structure of haplogroup O-M134.

Materials and methods

We collected blood samples from a total of 1301 unrelated males with the approval of the Ethics Committee of Biological Research at Fudan University and with the signed informed consent of the donors. DNA was extracted and genotyped at M134 and M117. Samples belonging to O*-M134(xM117) were then genotyped at F444, F629, F3451, F46, F48, F209, F2887, F3386, F1739 and F152 using the SNaPshot multiplex kit (ABI, Carlsbad, CA, USA). These data were submitted to dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) under accession numbers ss1712295747–ss1712295756. The amplification and extension primers are listed in Supplementary Table S1. We also typed 17 commonly used Y-STR (short tandem repeat) markers (DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, H4, DYS385a and DYS385b) using fluorescence-labeled primers (AmpFLSTR Yfiler PCR Amplification Kit, Life Technologies, Carlsbad, CA, USA). The products were read on a 3730 sequencer (Supplementary Table S2), and the age of each clade was estimated from median-joining networks built in Network 4.6.1.1 (Fluxus) using the rho statistic. Both the evolutionary mutation rate (6.9 × 10⁻⁴ per STR per generation) and the observed genealogical mutation rate (2.1 × 10⁻³ per STR per generation) were applied,24, 25 assuming a generation time of 25 years. A contour map of the frequency distribution of haplogroup O*-M134(xM117), based on data generated in this study and previously published data (Supplementary Table S3), was drawn using the kriging procedure in Surfer 8.0 (Golden Software, Golden, CO, USA).
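The kriging settings used in Surfer 8.0 are not reported, so the following is only a minimal ordinary-kriging sketch of how a frequency surface can be interpolated from scattered sampling sites. It assumes planar (x, y) coordinates and an exponential semivariogram with no nugget; the function names `exponential_variogram` and `ordinary_kriging`, the `sill`/`range_` values and the toy sites are hypothetical choices for illustration, not the study's data or Surfer's defaults.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=10.0):
    """Exponential semivariogram with no nugget (hypothetical parameters).

    gamma(0) = 0, and gamma(h) approaches `sill` at roughly `range_` distance units.
    """
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / range_))


def ordinary_kriging(xy, z, targets, variogram=exponential_variogram):
    """Predict values at `targets` from observations `z` at sites `xy`.

    Solves the ordinary-kriging system (weights constrained to sum to 1 via a
    Lagrange multiplier) for all target points at once. Distances are planar
    Euclidean distances, which is an assumption for geographic coordinates.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(z)

    # Left-hand side: site-to-site semivariances, bordered by the
    # unbiasedness constraint (a row and column of ones, zero in the corner).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(d)
    lhs[n, n] = 0.0

    # Right-hand side: one column per target (site-to-target semivariances, then 1).
    d0 = np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=-1)
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n, :] = variogram(d0)

    weights = np.linalg.solve(lhs, rhs)[:n, :]  # drop the Lagrange multipliers
    return weights.T @ z


# Toy demonstration (hypothetical sites and frequencies, NOT study data).
sites = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
freqs = np.array([12.0, 14.0, 10.0, 8.0])  # percent
gx, gy = np.meshgrid(np.linspace(0, 10, 11), np.linspace(0, 10, 11))
surface = ordinary_kriging(sites, freqs, np.column_stack([gx.ravel(), gy.ravel()]))
surface = surface.reshape(gx.shape)  # 11 x 11 grid ready for contouring
```

Because the semivariogram is zero at distance zero, this form of kriging reproduces the observed frequency exactly at each sampling site; the gridded `surface` can then be passed to any contouring routine, for example matplotlib's `contourf`.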

Results

Of the 1301 male individuals, 154 (~12%) were identified as belonging to O*-M134(xM117), in agreement with previous studies of East Asian populations.9, 12, 22 We genotyped 10 SNPs (F444, F629, F3451, F46, F48, F209, F2887, F3386, F1739 and F152) in these O*-M134(xM117) individuals and used the results to refine the O-M134 phylogenetic tree (Figure 1). We investigated the recently described O-F444 subclade within O-M134. This subgroup is parallel to O-M117; all the novel SNPs were found within O-F444, and no individual remained in the paragroup O*-M134(xM117, F444). This indicates two distinct founders within the O-M134 membership of mainland China.22 Furthermore, all the O-F444 samples tested were derived at either F629 or F3451, and these two subclades had different geographic distributions (Supplementary Figure S1). Clade O-F3451 was confined to Han Chinese in the North, South and East, whereas clade O-F629 was found in all the populations of this study. The majority of O-F629 belongs to a lineage marked by the F46 mutation (Figure 1), and the remainder is O*-F629(xF46). In total, five parallel SNPs (F48, F209, F2887, F3386 and F1739) were assigned downstream of the O-F46 branch; the star-like structure of the O-F46 lineage indicates a male population expansion in East Asia.23 Of the six O-F46 lineages (O*-F46, O-F48, O-F209, O-F2887, O-F3386 and O-F1739), O-F209 and O-F2887 were by far the most abundant in the Han Chinese in this study (~3% and ~4.2%, respectively).
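The hierarchical classification implied by the branching described above can be sketched as a walk from O-M134 down through the derived markers. This is a minimal illustration under stated assumptions: it encodes only the branching given in this paragraph (F152 is left out because its position is not stated here), treats any marker absent from the input as ancestral, and uses hypothetical names (`O_M134_TREE`, `assign_haplogroup`). Paragroup labels follow the O*-X(xY) convention used in the text.

```python
from typing import Optional

# Parent -> tested child markers, as described in the Results text.
# F152 is omitted because its placement is not stated there (assumption).
O_M134_TREE = {
    "M134": ["M117", "F444"],
    "F444": ["F629", "F3451"],
    "F629": ["F46"],
    "F46": ["F48", "F209", "F2887", "F3386", "F1739"],
}


def assign_haplogroup(derived: set, tree: dict = O_M134_TREE,
                      root: str = "M134") -> Optional[str]:
    """Return the most-derived subclade supported by the set of derived markers.

    Markers absent from `derived` are treated as ancestral (an assumption;
    real data would also need a 'missing' state). Returns None when the
    sample is not derived at the root marker.
    """
    if root not in derived:
        return None
    node = root
    while True:
        children = tree.get(node, [])
        hits = [c for c in children if c in derived]
        if len(hits) > 1:
            # Parallel branches should be mutually exclusive within one sample.
            raise ValueError(f"parallel markers {hits} are all derived under {node}")
        if not hits:
            if children:
                # Paragroup: derived at `node` but ancestral at every tested child.
                return f"O*-{node}(x{','.join(children)})"
            return f"O-{node}"  # terminal branch in this tree
        node = hits[0]


print(assign_haplogroup({"M134", "F444", "F629", "F46", "F209"}))  # O-F209
print(assign_haplogroup({"M134", "F444", "F629"}))  # O*-F629(xF46)
```

Walking the tree from the root mirrors the stepwise genotyping strategy in the Methods, where M134 and M117 were typed first and the F-series SNPs only in O*-M134(xM117) samples.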

Figure 1

Refined phylogenetic tree of haplogroup O-M134 and the frequencies of its sub-haplogroups genotyped in this study. ‘East’ refers to samples originating from the provinces of Jiangsu, Zhejiang, Anhui and Shanghai. ‘South’ refers to the provinces of Guangdong, Hunan, Hubei and Fujian. ‘North’ refers to the provinces of Hebei, Henan, Shandong, Shanxi, Tianjin and Beijing. ‘Northwest’ refers to the provinces of Shaanxi and Gansu. ‘Southwest’ refers to Guizhou, Sichuan and Chongqing. ‘Northeast’ refers to the provinces of Jilin, Heilongjiang and Inner Mongolia.

The STR data (Supplementary Table S2) allowed us to estimate the coalescence times of the O-F444 subclades in East Asians. Despite long-standing debate about the accuracy of STR-based dating,25, 26 the times proposed here provide at least a frame of reference (Table 1); more accurate dates await confirmation from dense sequencing data. Two commonly used mutation rates were applied: an evolutionary rate (6.9 × 10⁻⁴ per STR per generation) and a genealogical rate (2.1 × 10⁻³ per STR per generation). Because estimated ages scale inversely with the mutation rate, ages estimated with the evolutionary rate are about three times those estimated with the genealogical rate (2.1 × 10⁻³ / 6.9 × 10⁻⁴ ≈ 3.0). This discrepancy has been observed before and has been attributed to selection against deleterious mutations.27 However, because the genealogical rate fits sequence-based time estimates better than the evolutionary rate does,25, 28 we base our estimates here on the genealogical rate. The coalescence time of O-F444 in the Han Chinese was 7.99 ± 1.36 kya (thousand years ago). The TMRCA (time to the most recent common ancestor) of the O-F46 lineage was estimated at 7.76 ± 1.32 kya, within the range estimated from high-throughput sequencing of Y-chromosome SNPs.23
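The rho-based dating step can be sketched as follows. This is a minimal illustration, not Network 4.6.1.1's implementation: it assumes the modal haplotype stands in for the root of the median-joining network, counts the absolute repeat difference at each locus as the number of single-step mutations, omits the standard-error calculation behind the ± values, and uses hypothetical function names and toy haplotypes. Because T = ρ/μ, switching from the genealogical to the evolutionary rate rescales every age by the same factor, 2.1 × 10⁻³ / 6.9 × 10⁻⁴ ≈ 3.04.

```python
from collections import Counter

GENERATION_YEARS = 25        # generation time used in the Methods
EVOLUTIONARY_RATE = 6.9e-4   # mutations per STR per generation
GENEALOGICAL_RATE = 2.1e-3   # mutations per STR per generation


def modal_haplotype(haplotypes):
    """Most frequent haplotype; a stand-in for the network root (assumption)."""
    return Counter(haplotypes).most_common(1)[0][0]


def rho_per_locus(haplotypes, root):
    """Mean number of repeat steps from `root`, averaged over samples and loci."""
    n_loci = len(root)
    steps = sum(abs(a - r) for h in haplotypes for a, r in zip(h, root))
    return steps / (len(haplotypes) * n_loci)


def age_years(rho, rate, generation_years=GENERATION_YEARS):
    """T = rho / mu generations, converted to years."""
    return rho / rate * generation_years


# Toy 4-locus haplotypes (hypothetical repeat counts, NOT study data).
haps = [(14, 13, 23, 10)] * 3 + [(15, 13, 23, 10), (14, 12, 23, 10), (14, 13, 24, 11)]
root = modal_haplotype(haps)
rho = rho_per_locus(haps, root)
t_gen = age_years(rho, GENEALOGICAL_RATE)
t_evo = age_years(rho, EVOLUTIONARY_RATE)
ratio = t_evo / t_gen  # equals GENEALOGICAL_RATE / EVOLUTIONARY_RATE, whatever rho is
```

The ratio does not depend on the data, which is why the two sets of ages in Table 1 differ by a near-constant factor rather than by amounts that vary between clades.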

Table 1 Calculated coalescence times (thousand years) of Y-chromosome haplogroup O-F444 and its sub-haplogroups in Han Chinese

Discussion

Haplogroup O-M134 occurs most frequently in East Asians.9, 29 Genotyping 10 potential SNPs within haplogroup O-M134 enabled us to refine and update the phylogeny of this lineage and revealed that its internal structure may be used to resolve the placement of a large proportion of individuals in mainland China. This allows finer-scale inference of patrilineal origin and will also be useful for forensic studies.

O-M134 has a largely bifurcating structure, with the exception of the star-like O-F46 and O-F8 clades, which indicate demographic expansions.23 In this study, TMRCAs were estimated under the stepwise mutation model, which fits Y-STRs well. However, owing to uncertainty in STR mutation rates and possible deviations from the stepwise model, times estimated from STRs should be regarded only as a reference, despite their prevalence in dating studies. The estimated TMRCA of the O-F46 lineage was 7.76 ± 1.32 kya, which differs slightly from, but falls within, the range estimated from high-throughput sequencing of Y-chromosome SNPs.23 This date falls a little after the shift from hunter-gatherer subsistence to intensive agriculture, exemplified by the Yangshao Culture (6.9–4.9 kya) in the Central Yellow River Basin and the Majiayao Culture (6.0–4.9 kya) in the Upper Yellow River Basin.30 Yan et al23 suggested that crop harvests may have provided a more stable food source than hunting and gathering, enabling populations to reach greater densities.

Beyond its prevalence across East Asian and Southeast Asian populations,12, 15, 16, 17, 18, 19 this haplogroup has also been found at moderate frequency (~5%) in Qiangic populations.28 Recent genetic evidence suggests that the Qiangic populations may have participated in the formation of the Sino-Tibetan populations.17, 18, 28 Detailed characterization of O-F444 may therefore provide a broader framework for understanding Sino-Tibetan populations.

In sum, our study has substantially improved the phylogenetic resolution of haplogroup O-M134, a major East Asian Y-chromosome lineage. The increasingly widespread use of high-throughput sequencing has markedly decreased the time and expense of DNA sequencing, and the discovery of novel biallelic markers will permit further refinement of the Y-chromosome phylogenetic tree and help us better understand the evolutionary history of populations in East Asia.