Genetic Variability and Phylogenetic Analysis of Han Population from Guanzhong Region of China based on 21 non-CODIS STR Loci

This study presents population genetic data and forensic parameters for 21 non-CODIS autosomal STR loci in the Chinese Guanzhong Han population. A total of 166 alleles were observed, with allelic frequencies ranging from 0.0018 to 0.5564. After Bonferroni correction, no locus deviated from Hardy-Weinberg equilibrium and no pair of loci showed significant linkage disequilibrium. The cumulative power of discrimination and probability of exclusion of the 21 STR loci were 0.99999999999999999993814 and 0.999998184, respectively. Genetic distances, phylogenetic trees and principal component analysis showed that the Guanzhong Han population was more closely related to the Ningxia Han, Tujia and Bai groups than to the other populations tested. In summary, these 21 STR loci were highly polymorphic in the Guanzhong Han population and can be used for forensic applications and population genetic studies.

China is an ancient country with a 5,000-year-long civilization and the largest population in the world, about 1.371 billion according to the sixth national population census in 2010. The Han, the largest of the 56 ethnic groups with a population of approximately 1.226 billion, are widespread across China. Their spoken and written language is Chinese, a branch of the Sino-Tibetan language family. Chu et al. constructed phylogenies using the neighbor-joining method based on data from different populations for short tandem repeat (STR) loci and concluded that southern and northern populations in China are genetically distinct 1 . Previous population genetic studies based on STRs or single nucleotide polymorphisms (SNPs) have shown that the Chinese Han population is intricately sub-structured and clusters roughly into two (northern Han and southern Han) 2-3 or three (northern Han, central Han and southern Han) subgroups 4 . It is therefore important to further clarify the genetic structure of Chinese Han populations from different regions.
The Guanzhong region, whose name literally means "within the passes" in Chinese, is located in the middle of the Chinese mainland and includes the cities of Xi'an, Tongchuan, Baoji, Xianyang and Weinan in Shaanxi province, China. Several ethnic groups, mainly the Han, Hui and Manchu, live together in the region. Using genetic distance measurements, neighbor-joining dendrograms and principal component analysis (PCA) based on different HLA loci, Shen et al. reported that the Guanzhong Han population had a close genetic relationship with both the northern and southern Han populations 5 .
STRs are the most widely used markers in forensic science and population genetics. To provide more genetic information and increase the power of discrimination (PD) and probability of exclusion (PE), additional novel STR loci with high genetic polymorphism have been integrated into a single fluorescence-labeled multiplex amplification system. The allelic distribution of STR loci must be analyzed before the loci are used in forensic applications. We have so far reported population data 6-14 for a panel of 21 STR loci, and these loci demonstrated considerable potential for forensic applications. In the present study, we first aimed to present the population genetic data and forensic parameters of the Chinese Guanzhong Han (geographically a northern Han population) for a panel of 21 non-CODIS autosomal STRs. We also investigated the genetic relationships and population differentiation between the Guanzhong Han and other Chinese groups.

Methods
Populations and DNA extraction. Blood samples were randomly collected from 275 unrelated Han Chinese individuals living in the Guanzhong region, Shaanxi province, China. Before taking part in the study, all participants gave written informed consent for sample collection and subsequent analyses. This study was conducted according to humane and ethical research principles and was approved by the ethical committee of Xi'an Jiaotong University Health Science Center, China. Genomic DNA was extracted from blood-stained samples using the Chelex-100 method as described by Walsh et al. 15 .
PCR amplification and STR typing. The panel of STRs was amplified in a single reaction using the AGCU 2111 STR system (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China), according to the manufacturer's instructions. The PCR products were separated and detected by capillary electrophoresis on the ABI 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). STR genotypes were assigned by comparison with the 2111 Allelic Ladder using GeneMapper® ID-X v1.3 (Applied Biosystems, Foster City, CA, USA). Control DNA from the 9947A cell line (Promega Corporation, Madison, WI, USA) was typed for quality control. All laboratory procedures were in accordance with the laboratory's internal control standards.
Statistical analyses. Allelic frequencies and forensic parameters were calculated using the modified Powerstats v1.2 17 . Genepop v4.0.10 (http://genepop.curtin.edu.au/) was used to test for linkage disequilibrium (LD) between all pairs of STR loci. To estimate the inter-population differentiation between the Guanzhong Han and 10 reference populations in China, the locus-by-locus Fst, associated p and overall Fst values were calculated by analysis of molecular variance (AMOVA) in ARLEQUIN v3.1 (http://cmpg.unibe.ch/software/arlequin3), and the D_A distances were calculated using the DISPAN program 18 . To visualize the genetic relationships between the Guanzhong Han and the reference populations, we constructed two kinds of phylogenetic trees: one in MEGA v5 using the unweighted pair-group method with arithmetic means (UPGMA) based on D_A distances, and one in PHYLIP v3.6 using a bootstrap-over-loci method with 1,000 replicates based on allelic frequencies. A PCA plot was generated with MATLAB 2007a (MathWorks Inc., USA) based on the allelic frequencies of the 21 STRs. Significant LD among STRs affects some of these analyses, including the D_A calculation and the MEGA tree, so STR loci observed to be in significant LD with one or more other loci were removed from those analyses.
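As an illustration of the D_A distance used above, the following Python sketch assumes the standard definition of Nei's D_A, D_A = 1 - (1/r) Σ_j Σ_i sqrt(x_ij · y_ij), where x_ij and y_ij are the frequencies of allele i at locus j in the two populations and r is the number of loci. It is not the DISPAN implementation; the function name, the data layout and the frequencies are hypothetical.

```python
import math

def da_distance(pop_x, pop_y):
    """Nei's D_A distance between two populations (assumed standard formula).

    Each population is a dict: locus name -> {allele: frequency}.
    Only loci present in both populations are used.
    """
    loci = sorted(set(pop_x) & set(pop_y))
    if not loci:
        raise ValueError("the two populations share no loci")
    total = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        # Alleles missing from either population contribute zero to the sum.
        total += sum(math.sqrt(fx[a] * fy[a]) for a in set(fx) & set(fy))
    return 1.0 - total / len(loci)

# Hypothetical allele frequencies for two populations at two loci.
pop_a = {"locus1": {"10": 0.5, "11": 0.5}, "locus2": {"7": 0.2, "8": 0.8}}
pop_b = {"locus1": {"10": 0.5, "11": 0.5}, "locus2": {"7": 0.8, "8": 0.2}}

print(round(da_distance(pop_a, pop_b), 4))
```

Identical frequency profiles give a distance of 0, and populations that share no alleles at any locus give the maximum of 1.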

Results and Discussion
The typing results of the 21 STR loci from the Guanzhong Han population are listed in Supplementary Table 1, and the allelic frequencies and forensic parameters are shown in Table 1. A total of 166 alleles were observed, with allelic frequencies ranging from 0.0018 to 0.5564. No STR locus deviated from Hardy-Weinberg equilibrium after Bonferroni correction (p > 0.00238). All 21 loci showed high PD values, ranging from 0.7700 for the D1S1627 locus to 0.9437 for the D19S433 locus. The polymorphism information content ranged from 0.5325 (D1S1627 locus) to 0.7916 (D19S433 locus), and the PE from 0.2738 (D1GATA113 locus) to 0.5856 (D19S433 locus). Observed heterozygosity ranged from 0.5855 (D1GATA113 locus) to 0.7927 (D19S433 locus), while expected heterozygosity ranged from 0.5940 (D1S1627 locus) to 0.8147 (D19S433 locus). The cumulative PD and PE of the 21 STR loci were 0.99999999999999999993814 and 0.999998184, respectively. These results indicate that the panel of 21 STRs is highly polymorphic and suited to personal identification and parentage testing in forensic science.
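The forensic parameters above can be illustrated with a short Python sketch. It assumes the commonly used formulas: expected heterozygosity 1 - Σp², polymorphism information content 1 - Σp² - Σ_{i<j} 2p_i²p_j², PD = 1 - Σg² over genotype frequencies g, and PE = h²(1 - 2hH²) with observed heterozygosity h and homozygosity H = 1 - h. The PE formula is an assumption about how Powerstats computes the value, although it does reproduce the PE values reported here for D19S433 and D1GATA113 from their observed heterozygosities. All function names are hypothetical.

```python
import math

def expected_heterozygosity(freqs):
    """1 - sum of squared allele frequencies (no small-sample correction)."""
    return 1.0 - sum(p * p for p in freqs)

def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(2.0 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return 1.0 - sum(p * p for p in freqs) - cross

def power_of_discrimination(genotype_freqs):
    """PD = 1 - sum of squared genotype frequencies."""
    return 1.0 - sum(g * g for g in genotype_freqs)

def probability_of_exclusion(h):
    """PE = h^2 * (1 - 2 * h * H^2), with H = 1 - h (assumed formula)."""
    big_h = 1.0 - h
    return h * h * (1.0 - 2.0 * h * big_h * big_h)

def combined_complement(values):
    """Product of (1 - v) over loci; the cumulative value is 1 minus this."""
    return math.prod(1.0 - v for v in values)

# Observed heterozygosities reported above for D19S433 and D1GATA113.
print(round(probability_of_exclusion(0.7927), 4))  # 0.5856
print(round(probability_of_exclusion(0.5855), 4))  # 0.2738
```

The sketch returns the complement of the cumulative value rather than the cumulative value itself because a cumulative PD as close to 1 as the one reported here cannot be distinguished from 1.0 in double-precision arithmetic.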
LD refers to correlations among neighboring alleles descended from single ancestral chromosomes 19 . The level of LD is affected by multiple factors, for example genetic linkage, population structure and natural selection. In the present study, 11 of the 210 pairs among the 21 STR loci were observed to be in linkage disequilibrium in the Guanzhong Han population before correction (Supplementary Table 2). However, no significant linkage disequilibrium remained after Bonferroni correction (p < 0.05/210 = 0.00024). In addition, loci previously reported to be in significant LD with other loci in the reference groups were removed from some subsequent analyses, leaving 10 loci (D10S1248, D11S4463, D14S1434, D18S853, D1GATA113, D22S1045, D2S441, D4S2408, D6S1017 and D9S1122) for the D_A calculation and the MEGA tree.
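The two Bonferroni thresholds quoted in this section follow from the number of tests: 21 single-locus tests for Hardy-Weinberg equilibrium and C(21, 2) = 210 locus pairs for LD. A minimal Python check of that arithmetic, assuming a nominal significance level of 0.05 (the helper name is hypothetical):

```python
from math import comb

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

n_loci = 21
n_pairs = comb(n_loci, 2)  # number of unordered locus pairs

hwe_threshold = bonferroni_threshold(0.05, n_loci)
ld_threshold = bonferroni_threshold(0.05, n_pairs)

print(n_pairs, round(hwe_threshold, 5), round(ld_threshold, 5))
# 210 0.00238 0.00024
```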
Population differentiation between the Guanzhong Han and 10 previously published groups was assessed by AMOVA based on the allelic frequencies of the 21 STR loci. As shown in Table 2, the Guanzhong Han population differed significantly from the Uigur group at 9 loci, and from the Yi, Tibetan, Kazak, Russian and Salar groups at 3, 3, 2, 1 and 1 STR loci, respectively (after Bonferroni correction; p < 0.00238). No significant difference was observed between the Guanzhong Han population and the Ningxia Han, Bai, Tujia and Mongolian groups. Nine loci (D11S4463, D14S1434, D18S853, D19S433, D20S482, D2S1776, D4S2408, D6S1017 and D9S1122) showed no significant difference between the Guanzhong Han and any reference group. Up to 4 reference groups differed significantly from the Guanzhong Han population at the D22S1045 locus, and 2 groups each at the D12ATA63, D1GATA113, D3S4529 and D5S2500 loci; these loci therefore showed higher population differentiation and are appropriate for inter-population comparisons.
The D_A distances based on the 10 loci between the Guanzhong Han and the 10 reference groups are shown in Table 3. The largest D_A distance (0.0337) was observed between the Guanzhong Han and the Yi group, followed by the Russian (0.0281) and Salar (0.0264) groups, whereas the smallest distance was found with the Ningxia Han population (0.0073), followed by Tujia [...] Tibetan, Tujia and Bai groups shared the same clade; the Yi, Russian, Salar and Mongolian groups were delineated in one branch; and the remaining groups, Uigur and Kazak, clustered together. To further confirm the phylogenetic relationships, a phylogenetic tree was also constructed using PHYLIP v3.6 based on the allelic frequencies of the 21 STR loci, and the result is shown in Figure 2B. The two phylogenetic trees were very similar; the only exception was the placement of the Tibetan group, which may be due to the different numbers of STR loci used (10 versus 21).
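The UPGMA clustering applied to the D_A distance matrix can be sketched in a few lines of Python. This is a toy illustration of the general algorithm, not the MEGA v5 implementation; the population labels and distances below are hypothetical, and a complete pairwise distance matrix is assumed.

```python
def upgma(names, dist):
    """Cluster by UPGMA; returns a list of (merged cluster label, node height).

    dist maps a pair (a, b) of names to their distance; every pair must be given.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of members
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)          # closest pair of current clusters
        a, b = sorted(pair)
        height = d.pop(pair) / 2.0        # node sits at half the distance
        new = f"({a},{b})"
        for c in sizes:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Average over all members, i.e. weight by cluster size.
            d[frozenset((new, c))] = (
                (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
            )
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((new, height))
    return merges

# Hypothetical distances among four populations A, B, C and D.
toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
for label, height in upgma(["A", "B", "C", "D"], toy):
    print(label, height)
```

Because UPGMA assumes a constant rate of divergence, the tree topology can change when the underlying distances are estimated from a different set of loci.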
As shown in Figure 3, the first two principal components of the PCA of the 11 groups accounted for 29.92% and 16.37% of the variance, respectively, and together explained 46.29%. The Guanzhong Han population clustered closest to the Ningxia Han population, then to the Tujia and Bai groups, which is consistent with the phylogenetic trees above. The genetic evidence in our study showed that the Guanzhong Han population was more closely related to the Ningxia Han, Tujia and Bai populations than to the other 7 groups. This result is broadly consistent with the previous result based on HLA loci described by Shen et al. 5 . To further understand their genetic relationships and ancestry, more genetic markers, such as SNPs and insertion/deletion polymorphisms, should be analyzed in the future.
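The share of variance attributed to each principal component can be illustrated with a small Python sketch. The analysis in this study was run in MATLAB; the version below is a substitute that uses only the standard library, extracts the leading eigenvalues of the covariance matrix by power iteration with deflation, and runs on hypothetical allele frequencies. The function name and starting vector are arbitrary choices.

```python
import math

def pca_explained_variance(rows, n_components=2, n_iter=1000):
    """Fraction of total variance explained by each leading component.

    rows: one list of allele frequencies per population (equal lengths).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    total = sum(cov[i][i] for i in range(m))  # trace = total variance
    ratios = []
    for _component in range(n_components):
        # Arbitrary starting vector; it must not be orthogonal to the
        # leading eigenvector for power iteration to find it.
        v = [float(i + 1) for i in range(m)]
        scale = math.sqrt(sum(c * c for c in v))
        v = [c / scale for c in v]
        eigenvalue = 0.0
        for _step in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:  # no variance left in any direction
                eigenvalue = 0.0
                break
            v = [c / norm for c in w]
            eigenvalue = norm
        ratios.append(eigenvalue / total)
        # Deflation: remove the component just found from the covariance.
        for i in range(m):
            for j in range(m):
                cov[i][j] -= eigenvalue * v[i] * v[j]
    return ratios

# Hypothetical frequencies of three alleles in four populations.
freqs = [[0.50, 0.30, 0.20],
         [0.45, 0.35, 0.20],
         [0.20, 0.30, 0.50],
         [0.25, 0.25, 0.50]]
print([round(r, 4) for r in pca_explained_variance(freqs)])
```

Summing the first two ratios gives the proportion of variance captured by a two-dimensional plot, which is how a figure such as 46.29% is obtained.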

Conclusions
In conclusion, we presented genetic data for the Guanzhong Han population at 21 STR loci. These loci showed a high level of genetic polymorphism and are suited to forensic applications in the Guanzhong Han population. The population comparison showed that, among the populations tested, the Guanzhong Han had the closest genetic relationships with the Ningxia Han, Tujia and Bai populations.