Introduction

China is an ancient country with a 5,000-year-long civilization and the largest population in the world, about 1.371 billion according to the sixth national population census in 2010. The Han are the largest of the 56 ethnic groups, with a population of approximately 1.226 billion, and are widespread across China. Their spoken and written language is Chinese, a branch of the Sino-Tibetan language family. Chu et al. constructed phylogenies using the neighbor-joining method based on data from different populations for short tandem repeat (STR) loci and concluded that southern and northern populations in China were genetically distinct1. Previous population genetic studies based on STRs or single nucleotide polymorphisms (SNPs) have shown that the Chinese Han population is intricately sub-structured and clusters roughly into two (northern Han and southern Han)2,3 or three (northern Han, central Han and southern Han) subgroups4. It is therefore important to further clarify the genetic structure of Chinese Han populations from different regions.

The Guanzhong region, whose name literally means “within the passes” in Chinese, is located in the middle of the Chinese mainland and includes the cities of Xi'an, Tongchuan, Baoji, Xianyang and Weinan in Shaanxi province, China. Several ethnic groups, mainly the Han, Hui and Manchu, live together in the region. Using genetic distance measurements, neighbor-joining dendrograms and principal component analysis (PCA) based on different HLA loci, Shen et al. reported that the Guanzhong Han population had a close genetic relationship with both the northern and southern Han populations5.

STRs have been the most widely used markers in forensic science and population genetics. To provide more genetic information and increase the power of discrimination (PD) and probability of exclusion (PE), additional novel STR loci with high genetic polymorphism have been integrated into a single fluorescence-labeled multiplex amplification system. The allelic distribution of STR loci must be analyzed before the loci are used in forensic applications. We have so far reported population data6,7,8,9,10,11,12,13,14 for a panel of 21 STR loci, and these loci demonstrated considerable potential for forensic applications. In the present study, we first aimed to present the population genetic data and forensic parameters of the Chinese Guanzhong Han (geographically a Northern Han population) for a panel of 21 non-CODIS autosomal STRs. We then investigated the genetic relationships and population differentiation between the Guanzhong Han and other Chinese groups.

Methods

Populations and DNA extraction

Blood samples were randomly collected from 275 unrelated Han Chinese individuals living in the Guanzhong region, Shaanxi province, China. Before taking part in the study, all participants gave written informed consent for the sample collection and subsequent analyses. This study was conducted according to humane and ethical research principles and was approved by the ethical committee of Xi'an Jiaotong University Health Science Center, China. Genomic DNA was extracted from blood-stained samples using the Chelex-100 method as described by Walsh et al.15.

Previously published genotyping results of the 21 STR loci from 10 Chinese groups were chosen for population comparison: Mongolian (n = 86) from Inner Mongolia autonomous region6, Bai (n = 106) from Yunnan province7, Kazak (n = 114) from Xinjiang autonomous region8, Ningxia Han (Northern Han) (n = 202) from Ningxia autonomous region9, Russian (n = 114) from Inner Mongolia autonomous region10, Tibetan (n = 104) from Tibet autonomous region11, Tujia (n = 107) from Hubei province12, Uigur (n = 218) from Xinjiang autonomous region13, Yi (n = 110) from Yunnan province14 and Salar (n = 120) from Qinghai province16. The geographical locations of the reference populations are shown in Figure 1.

Figure 1

The geographical locations of the Guanzhong Han and 10 reference groups in China.

The map was created in MATLAB R2013b (MathWorks Inc., USA).

PCR amplification and STR typing

The panel of STRs was amplified in a single reaction using the AGCU 21+1 STR system (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China), according to the manufacturer's instructions. The PCR products were separated and detected by capillary electrophoresis on the ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The STR typing results were obtained by comparison with the 21+1 Allelic Ladder using GeneMapper® ID-X v1.3 (Applied Biosystems, Foster City, CA, USA). Control DNA from the 9947A cell line (Promega Corporation, Madison, WI, USA) was typed for quality control. All laboratory procedures followed the laboratory's internal control standards.

Statistical analyses

Allelic frequencies and forensic parameters were calculated using the modified Powerstats v1.217. Genepop v4.0.10 (http://genepop.curtin.edu.au/) was used to test for linkage disequilibrium (LD) between all pairs of STR loci. To estimate the inter-population differentiation between the Guanzhong Han and the 10 reference populations in China, the locus-by-locus Fst values, the associated p values and the overall Fst values were calculated by analysis of molecular variance (AMOVA) in ARLEQUIN v3.1 (http://cmpg.unibe.ch/software/arlequin3), and the DA distances were calculated using the DISPAN program18. To visualize the genetic relationships between the Guanzhong Han and the reference populations, we constructed two kinds of phylogenetic trees: one in MEGA v5 using the unweighted pair-group method with arithmetic means (UPGMA) based on DA distances, and one in PHYLIP v3.6 using a bootstrap-over-loci method with 1,000 replicates based on allelic frequencies. PCA was performed in MATLAB 2007a (MathWorks Inc., USA) based on the allelic frequencies of the 21 STRs. Because significant LD among STRs affects some of these analyses, including the DA calculation and the MEGA tree, STR loci observed to be in significant LD with one or more other loci were removed from those analyses.
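For orientation, the DA distance of Nei et al. is usually written as DA = 1 − (1/r) Σ_j Σ_i √(x_ij · y_ij), where x_ij and y_ij are the frequencies of allele i at locus j in the two populations and r is the number of loci. The following is a minimal Python sketch of that formula, not the DISPAN implementation; the function name `nei_da` and the toy allele frequencies are hypothetical and are not data from this study.

```python
from math import sqrt


def nei_da(pop_x, pop_y):
    """Nei's DA distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must share a non-empty set of loci")
    total = 0.0
    for locus_x, locus_y in zip(pop_x, pop_y):
        # Alleles absent from either population contribute sqrt(0) = 0.
        shared = set(locus_x) & set(locus_y)
        total += sum(sqrt(locus_x[a] * locus_y[a]) for a in shared)
    return 1.0 - total / len(pop_x)


# Hypothetical two-locus example (illustrative values only).
pop_a = [{"10": 0.5, "11": 0.5}, {"14": 0.25, "15": 0.75}]
pop_b = [{"10": 0.5, "11": 0.5}, {"14": 0.25, "15": 0.75}]
pop_c = [{"12": 1.0}, {"16": 1.0}]

identical = nei_da(pop_a, pop_b)   # same frequencies at every locus
disjoint = nei_da(pop_a, pop_c)    # no allele shared at any locus
```

Under this definition the distance is 0 for populations with identical allele frequencies and 1 for populations that share no alleles, which gives a sense of scale for DA values.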

Results and Discussion

The typing results of the 21 STR loci from the Guanzhong Han population are listed in supplemental Table 1, and the allelic frequencies and forensic parameters are shown in Table 1. A total of 166 alleles were observed, with allelic frequencies ranging from 0.0018 to 0.5564. No STR locus deviated from Hardy-Weinberg equilibrium after Bonferroni's correction (all p > 0.00238). All 21 loci showed high PD values, ranging from 0.7700 for the D1S1627 locus to 0.9437 for the D19S433 locus. Polymorphism information content ranged from 0.5325 (D1S1627 locus) to 0.7916 (D19S433 locus), and PE ranged from 0.2738 (D1GATA113 locus) to 0.5856 (D19S433 locus). Observed heterozygosity ranged from 0.5855 (D1GATA113 locus) to 0.7927 (D19S433 locus), while expected heterozygosity ranged from 0.5940 (D1S1627 locus) to 0.8147 (D19S433 locus). The cumulative PD and PE of all 21 STR loci were 0.99999999999999999993814 and 0.999998184, respectively. These results indicated that the panel of 21 STRs is highly polymorphic and suitable for personal identification and parentage testing in forensic science.
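The relationship between observed heterozygosity, per-locus PE and the cumulative values can be sketched in a few lines of Python. The sketch assumes the formula PE = h²(1 − 2hH²), with h the observed heterozygosity and H = 1 − h, and a cumulative power of 1 − Π(1 − x_i) over independent loci. That the modified Powerstats uses exactly these formulas is an assumption, although the PE formula agrees with the values reported above for the D19S433 and D1GATA113 loci. The function names are hypothetical.

```python
from math import prod


def probability_of_exclusion(h_obs):
    """PE from observed heterozygosity: h^2 * (1 - 2 * h * H^2), H = 1 - h."""
    h = h_obs
    H = 1.0 - h
    return h * h * (1.0 - 2.0 * h * H * H)


def cumulative_complement(values):
    """Product of (1 - x_i) over loci, i.e. 1 minus the cumulative power.

    The complement is returned because a cumulative PD as close to 1 as
    the one reported here cannot be represented in double precision.
    """
    return prod(1.0 - v for v in values)


# Observed heterozygosities reported for two loci in this study.
pe_d19s433 = probability_of_exclusion(0.7927)    # about 0.5856
pe_d1gata113 = probability_of_exclusion(0.5855)  # about 0.2738
```

Applying `cumulative_complement` to the 21 per-locus PD values in Table 1 would give the probability that two unrelated individuals cannot be distinguished across the whole panel, and `1 - cumulative_complement(...)` applied to the PE values would give the cumulative PE.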

Table 1 The allelic frequencies and statistical parameters for the 21 STR loci in Han population from Guanzhong region, Shaanxi, China (n = 275)

LD is the correlation among neighboring alleles descended from single ancestral chromosomes19. The level of LD is affected by multiple factors, such as genetic linkage, population structure and natural selection. In the present study, 11 of the 210 pairs of the 21 STR loci showed evidence of linkage disequilibrium in the Guanzhong Han population before correction (supplementary Table 2). However, no significant linkage disequilibrium remained after Bonferroni correction (significance threshold p < 0.05/210 = 0.00024). In addition, loci previously reported to be in significant LD with other loci in the reference groups were removed from some subsequent analyses, leaving 10 loci (D10S1248, D11S4463, D14S1434, D18S853, D1GATA113, D22S1045, D2S441, D4S2408, D6S1017 and D9S1122) for the DA calculation and the MEGA tree.
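The two Bonferroni thresholds used in this study follow directly from the number of tests: 21 loci for the Hardy-Weinberg tests and all 21 × 20 / 2 = 210 locus pairs for the LD tests. A short Python sketch of the arithmetic, with a hypothetical helper name:

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_loci = 21
n_pairs = comb(n_loci, 2)  # number of distinct locus pairs tested for LD

hwe_threshold = bonferroni_threshold(0.05, n_loci)   # 0.05 / 21
ld_threshold = bonferroni_threshold(0.05, n_pairs)   # 0.05 / 210
```

Rounded to five decimal places these give 0.00238 and 0.00024, the thresholds quoted in the text.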

Population differentiation between the Guanzhong Han and the 10 previously published groups was assessed by AMOVA based on the allelic frequencies of the 21 STR loci. As shown in Table 2, after Bonferroni's correction (p < 0.00238) the Guanzhong Han population differed significantly from the Uigur group at 9 loci, and from the Yi, Tibetan, Kazak, Russian and Salar groups at 3, 3, 2, 1 and 1 STR loci, respectively. No significant difference was observed between the Guanzhong Han population and the Ningxia Han, Bai, Tujia or Mongolian groups. Nine loci (D11S4463, D14S1434, D18S853, D19S433, D20S482, D2S1776, D4S2408, D6S1017 and D9S1122) showed no significant difference between the Guanzhong Han and any reference group. By contrast, 4 reference groups differed significantly from the Guanzhong Han population at the D22S1045 locus, and 2 groups differed at each of the D12ATA63, D1GATA113, D3S4529 and D5S2500 loci. These results indicated that these loci show higher population differentiation and are appropriate for inter-population comparison studies.

Table 2 Pairwise Fst and associated p values of 21 STR loci between Chinese Guanzhong Han population and 10 reference populations

The DA distances between the Guanzhong Han and the 10 reference groups, based on the 10 loci, are shown in Table 3. The largest DA distance (0.0337) was observed between the Guanzhong Han and the Yi group, followed by the Russian (0.0281) and Salar (0.0264) groups, whereas the smallest distance was found with the Ningxia Han population (0.0073), followed by the Tujia (0.0077) and Bai (0.0091) groups. The DA distances between the Guanzhong Han and the Kazak, Tibetan, Mongolian and Uigur groups were 0.0126, 0.0133, 0.0141 and 0.0153, respectively. The DA distances therefore indicated a closer relationship between the Guanzhong Han and the Ningxia Han, Tujia and Bai populations. In addition, the population differentiation between the Guanzhong Han and the reference groups inferred from the overall Fst values (AMOVA, all 21 loci) was broadly in line with that inferred from the DA distances.

Table 3 The DA distances between Guanzhong Han population and other groups based on 10 STR loci

The phylogenetic tree constructed in MEGA v5 based on DA distances is shown in Figure 2A. Three clusters were observed: the Guanzhong Han, Ningxia Han, Tibetan, Tujia and Bai groups shared one clade; the Yi, Russian, Salar and Mongolian groups formed a second branch; and the remaining Uigur and Kazak groups clustered together. To further confirm the phylogenetic relationships, a tree was also constructed using PHYLIP v3.6 based on the allelic frequencies of the 21 STR loci (Figure 2B). The two phylogenetic trees were highly similar, the only exception being the Tibetan group. This exception may be due to the different numbers of STR loci used (10 loci for the DA-based tree and 21 for the frequency-based tree).
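The UPGMA procedure behind the DA-based tree repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The following pure-Python sketch illustrates that procedure only; it is not the MEGA implementation, the function name `upgma` is hypothetical, and the three-population distance matrix is a toy example rather than data from Table 3, which lists only distances to the Guanzhong Han.

```python
from itertools import combinations


def upgma(names, dist):
    """Minimal UPGMA clustering.

    names: list of population labels.
    dist:  dict mapping frozenset({a, b}) -> distance for every pair.
    Returns a list of (merged_cluster, height) tuples in merge order,
    where clusters are nested tuples and height is half the merge distance.
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        new = (a, b)
        # Size-weighted average distance from the new cluster to the rest.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((new, height))
    return merges


# Toy distances between three hypothetical populations.
toy_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
toy_merges = upgma(["A", "B", "C"], toy_dist)
```

In the toy example A and B merge first at height 1.0, and C joins them at height 3.0. Because UPGMA depends only on the distance matrix, a tree built from 10 loci and one built from 21 loci can place a population such as the Tibetan group differently.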

Figure 2

Phylogenetic trees for the Guanzhong Han and 10 reference populations constructed in MEGA v5 based on DA distances (A) and in PHYLIP v3.6 based on allelic frequencies (B).

As shown in Figure 3, the first two principal components of the PCA of the 11 groups accounted for 29.92% and 16.37% of the variance, respectively, together explaining 46.29%. The Guanzhong Han population clustered closest to the Ningxia Han population, then to the Tujia and Bai groups, which is consistent with the phylogenetic trees above. The genetic evidence in our study showed that the Guanzhong Han population had a closer relationship with the Ningxia Han, Tujia and Bai populations than with the other 7 groups. This result was broadly consistent with the previous HLA-based result described by Shen et al.5. To further understand their genetic relationships and ancestry information, more genetic markers, such as SNPs and insertion/deletion polymorphisms, should be analyzed in the future.

Figure 3

Principal component analysis plot based on the allelic frequencies of 21 STR loci in 11 populations.

Conclusions

In conclusion, we presented the genetic data of the Guanzhong Han population for 21 STR loci. These loci showed a high level of genetic polymorphism and are suitable for forensic application in the Guanzhong Han population. The population comparison showed that, among the populations tested, the Guanzhong Han had a close genetic relationship with the Ningxia Han, Tujia and Bai populations.