Genetic analysis of 19 X chromosome STR loci for forensic purposes in four Chinese ethnic groups

A new 19-locus X-chromosome short tandem repeat (X-STR) multiplex PCR system has recently been developed, though its applicability in forensic studies has not been thoroughly assessed. In this study, 932 unrelated individuals from four Chinese ethnic groups (Han, Tibetan, Uighur and Hui) were successfully genotyped using this new multiplex PCR system. Our results showed significant linkage disequilibrium between markers DXS10103 and DXS10101 in all four ethnic groups; between DXS10159 and DXS10162, DXS6809 and DXS6789, and HPRTB and DXS10101 in the Tibetan population; and between DXS10074 and DXS10075 in the Uighur population. The combined powers of discrimination in males and females were calculated according to haplotype frequencies from allele distributions rather than haplotype counts in the relevant population and were high in all four ethnic groups. The cumulative powers of discrimination of the tested X-STR loci were 1.000000000000000 and 0.999999999997940 in females and males, respectively. All 19 X-STR loci are highly polymorphic. The highest Reynolds genetic distances were observed for the Tibetan-Uighur pairwise comparison. This study represents an extensive report on X-STR marker variation in Chinese minority populations and a comprehensive analysis of the diversity of these 19 X-STR markers in four Chinese ethnic groups.


Results and Discussion
Polymorphism. All 932 unrelated individuals from the four ethnic groups were successfully typed with the newly developed 19 X-STR loci multiplex system. Based on a Wilcoxon signed-ranks test at a significance level of 0.05, allele frequencies did not differ significantly between female and male samples at any of the examined loci in any ethnic group. Hardy-Weinberg equilibrium (HWE) tests were performed on female samples. At a significance level of 0.05, the DXS10079 and DXS7424 markers in the Southern Han population; DXS10135 and DXS10134 in the Tibetan population; DXS10148, DXS10159 and DXS101 in the Uighur population; and DXS6809 in the Hui population showed departures from HWE. However, no significant deviations from HWE remained after Bonferroni correction (P = 0.05/171 = 0.00029).
For these 932 samples, the number of observed alleles varied from 8 to 32 across the different loci. The allele frequencies are shown in Supplementary Tables S1-S10. The power of discrimination in females (PDf) and males (PDm), the polymorphism information content (PIC), the observed heterozygosity (Ho), the expected heterozygosity (He), the mean exclusion chance (MEC), the combined power of discrimination for females (CDPf) and males (CDPm), and the combined mean exclusion chance in duo cases (CMECd) for the 19 loci in the Southern Han, Tibetan, Uighur and Hui ethnic groups are shown in Tables 1-10. The typing results for the 9947A control DNA were consistent with those reported in the X chromosome database shown in Supplementary Tables S1-S10. Ho and He are both greater than 0.7 for all markers and, specifically, greater than 0.75 for the DXS8378, DXS10162, DXS10164, DXS7424, DXS7423, DXS10148, DXS10135, DXS10159, DXS10101 and DXS10134 markers. The PIC values of all the selected loci were greater than 0.6 except for those of the DXS8378 marker in the Southern Han and Hui populations, the DXS10164 marker in all groups, and the DXS7423 marker in the Southern Han, Tibetan and Hui populations. The low PIC value found for DXS7423 was consistent with the result reported for the Guanzhong Han of Shaanxi province, Western China 8. The PIC values for the DXS10134, DXS10135, DXS10148 and DXS10101 markers were all greater than 0.8 across all ethnic groups. Meanwhile, the PIC values for the DXS10164 and DXS7423 markers were less than 0.5, which is consistent with the results of Liu et al. 9.
Table 3. Forensic parameters of 19 X-STR loci among the four ethnic populations.
We found that DXS10134, DXS10079, DXS10135, and DXS10101 were the most polymorphic loci. All markers possessed high forensic efficiency values within the studied population samples, supporting the benefits of using multiplexes in forensic practice.
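The per-locus parameters discussed above can all be derived from the allele frequencies of a locus. The following is a minimal sketch, assuming the formulas commonly used for X-chromosomal markers (PDm = 1 − Σp², PDf = 1 − 2(Σp²)² + Σp⁴, the Botstein PIC, and the Desmarais MEC for trios and duos); the source does not state which formulas or software the authors used, He is computed without a sample-size correction, and the function name and the example allele labels are hypothetical.

```python
from typing import Dict


def xstr_forensic_parameters(freqs: Dict[str, float]) -> Dict[str, float]:
    """Per-locus forensic parameters for one X-STR locus.

    `freqs` maps allele labels to population allele frequencies.
    The formulas are the ones commonly quoted for X-chromosomal
    markers; they are an assumption here, not taken from the source.
    """
    p = list(freqs.values())
    if abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")

    s2 = sum(x ** 2 for x in p)  # sum of squared frequencies
    s3 = sum(x ** 3 for x in p)  # sum of cubed frequencies
    s4 = sum(x ** 4 for x in p)  # sum of fourth powers

    return {
        # Expected heterozygosity (no n/(n-1) sample correction)
        "He": 1.0 - s2,
        # Botstein PIC: 1 - sum(p^2) - sum_{i != j} p_i^2 p_j^2
        "PIC": 1.0 - s2 - (s2 ** 2 - s4),
        # Males are hemizygous, so a match needs the same single allele
        "PD_m": 1.0 - s2,
        # Females carry two alleles; match probability of genotypes
        "PD_f": 1.0 - 2.0 * s2 ** 2 + s4,
        # Mean exclusion chance, mother/daughter/putative-father trios
        "MEC_trio": 1.0 - s2 + s4 - s2 ** 2,
        # Mean exclusion chance, father/daughter duos
        "MEC_duo": 1.0 - 2.0 * s2 + s3,
    }


# Hypothetical locus with four equally frequent alleles
example = xstr_forensic_parameters(
    {"10": 0.25, "11": 0.25, "12": 0.25, "13": 0.25}
)
```

With four equally frequent alleles, PDm and He are 0.75 while PDf is about 0.89, which illustrates why the female power of discrimination is consistently higher than the male one at the same locus.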

Linkage disequilibrium. A previous study showed that linkage disequilibrium (LDE) between markers more than 5 Mb apart is unlikely 10. To test this, LDE was estimated for all pairs of markers in the four population groups. In addition, gametic associations were tested for all pairs of loci in the male samples 11. The P values for the LDE exact tests are listed in Table 11. Significant associations were found for several pairs: between DXS10103 and DXS10101 in all four ethnic groups; between DXS10159 and DXS10162, DXS6809 and DXS6789, and HPRTB and DXS10101 in the Tibetan population; and between DXS10074 and DXS10075 in the Uighur population. These pairs showed significant LDE even after Bonferroni correction (P = 0.05/171 = 0.00029). These results suggest that these loci pairs can be treated as haplotype clusters or blocks. For markers showing strong LDE, haplotype frequencies can be estimated directly from population data. The haplotype frequencies and the forensic parameters for DXS10103-DXS10101 in all four ethnic groups; for DXS10159-DXS10162, DXS6809-DXS6789, and DXS10103-HPRTB-DXS10101 in the Tibetan population; and for DXS10074-DXS10075 in the Uighur population are shown in Supplementary Tables S11-S15. Seventy-five haplotypes were observed for the DXS10103-DXS10101 pair in all 631 male samples, and the PIC and PDm values for this haplotype were both greater than 0.9. The DXS10103-DXS10101 pair had also been treated as a haplotype in Shanghai Han and Taiwanese Han populations in previous studies 12,13. Eleven of the X-STR loci are also used for genetic testing in the Investigator Argus X-12 human identification kit (Qiagen, Hilden, Germany) 12. These 11 shared loci are marked with an asterisk in Fig. 1. According to previous studies, recombination and crossing-over might still happen even when the physical distance between loci is very small 14.
While DXS101-DXS7424 and DXS6789-DXS7424 were previously reported to be in linkage disequilibrium in a northwestern Italian population and other populations 15,16 , no evidence for LDE in DXS101-DXS7424 was observed in this study. Further studies should be performed to more thoroughly assess the linkage between markers and better define the proposed linkage groups.
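The Bonferroni threshold quoted above follows from the number of locus pairs: 19 loci give 19 × 18 / 2 = 171 pairwise tests, so 0.05/171 ≈ 0.00029. The sketch below reproduces that arithmetic and filters a set of P values against the corrected threshold. The function names, the placeholder locus labels and the P values are hypothetical and serve only as an illustration; they are not results from this study.

```python
from math import comb
from typing import Dict, Tuple

Pair = Tuple[str, str]


def n_pairs(n_loci: int) -> int:
    """Number of unordered locus pairs, i.e. the number of LD tests."""
    return comb(n_loci, 2)


def bonferroni_threshold(alpha: float, n_loci: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_pairs(n_loci)


def significant_pairs(p_values: Dict[Pair, float],
                      alpha: float, n_loci: int) -> Dict[Pair, float]:
    """Keep only the pairs whose P value is below the corrected level."""
    threshold = bonferroni_threshold(alpha, n_loci)
    return {pair: p for pair, p in p_values.items() if p < threshold}


# Hypothetical exact-test P values for three placeholder locus pairs
made_up_p_values = {
    ("locusA", "locusB"): 0.00001,  # survives the correction
    ("locusA", "locusC"): 0.01200,  # nominally significant only
    ("locusB", "locusC"): 0.43000,  # not significant
}
kept = significant_pairs(made_up_p_values, alpha=0.05, n_loci=19)
```

Only the first placeholder pair is retained, which mirrors the situation described in the text where nominally significant results at the 0.05 level may disappear after correction.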
The forensic statistical parameters found for the five haplogroups are shown in Table 12. [...] greater than 0.9 except for DXS10159-DXS10162 in the Tibetan population. All haplotypes showed high forensic efficiency values that reflect their utility for forensic uses.
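Because males carry a single X chromosome, the haplotype of a linked cluster can be read directly from each male profile, and haplotype frequencies can then be obtained by simple counting. The following is a minimal sketch of that idea under stated assumptions: the function names are hypothetical, the allele values are made up, haplotype diversity uses Nei's n/(n − 1) correction, and the source does not say that the authors computed their values this way.

```python
from collections import Counter
from typing import Dict, List, Tuple

Haplotype = Tuple[str, ...]


def haplotype_frequencies(males: List[Haplotype]) -> Dict[Haplotype, float]:
    """Relative frequency of each haplotype observed in male samples."""
    counts = Counter(males)
    n = len(males)
    return {hap: count / n for hap, count in counts.items()}


def haplotype_pd_male(males: List[Haplotype]) -> float:
    """Male power of discrimination of the cluster: 1 - sum(f^2)."""
    freqs = haplotype_frequencies(males)
    return 1.0 - sum(f ** 2 for f in freqs.values())


def haplotype_diversity(males: List[Haplotype]) -> float:
    """Nei's haplotype diversity with the n/(n-1) sample correction."""
    n = len(males)
    if n < 2:
        raise ValueError("at least two samples are required")
    return n / (n - 1) * haplotype_pd_male(males)


# Made-up two-locus haplotypes (allele at locus 1, allele at locus 2)
sample = [("16", "28"), ("16", "28"), ("17", "30"), ("18", "31")]
freqs = haplotype_frequencies(sample)
```

Treating the cluster as one multi-allelic marker in this way avoids multiplying the parameters of loci that are not independent.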
Comparisons among the four ethnic groups. Allele frequency distributions were compared among the four ethnic populations. The distributions differed significantly for most of the loci among these four Chinese ethnic groups; based on these results, population analyses were performed separately for each individual population (Supplementary Table S16). Significant differences were found for 11 loci between the Han and Tibetan populations, for 1 locus between the Han and Hui populations, and for 16 loci between the Han and Uighur populations. Based on these results, the Hui population is genetically closer to the Southern Han population than to the Tibetan and Uighur populations. The allele frequencies of these four Chinese populations were also compared with those from other populations, including the Chinese Northern Han population 17, a Korean population 18, a population from Japan 19, a population from northern Germany 20, the Polish Tatars 21, a northern Italian population 22, a population from Spain 23, and an Ecuadorian Kichwa population 24 (Tables S17-S20). We found no significant differences between the Southern Han and Northern Han populations. This result was not consistent with Shin's findings 25, probably because different loci were assayed. Likewise, the comparison of allele frequency distributions between the Southern Han and the Guanzhong Han, which were typed with the same panel as ours 8, showed no significant differences (Table S22). However, the PIC, He, CDPf, CDPm, CMECt and CMECd values were much greater in the Guanzhong Han 8, Tibetan, Uighur and Hui groups than in the Southern Han group (Table S23). We did find significant differences for most of the loci among the Southern Han, Tibetan, Uighur, Japanese, Northern German, Polish Tatar, Northern Italian, Spanish and Ecuadorian Kichwa populations (Supplementary Tables S17-S20).
However, we found no significant differences among the Southern Han, Hui and Korean populations, except for the DXS8378 and DXS6789 loci.
The F-statistic (Fst) is often used in forensic sciences to measure population substructure 23. The maximum observed Fst value was 0.01142 (p = 0.00000 ± 0.0000) for the Tibetan and Uighur populations, whereas the minimum Fst value was 0.00128 (p = 0.46847 ± 0.0572) for the Southern Han and Hui populations (Table 13). These results were consistent with the existence of population substructure within the above-mentioned populations. However, these results differ from previous STR studies, which showed the smallest genetic distance between the Southern Han and Uighur populations and the largest between the Tibetan and Hui populations 26. A possible explanation for this discrepancy is that the Hui populations assayed in the two studies are from different geographical regions in China (Kansu and Sinkiang in the previous study and the Ningxia Hui Autonomous Region in our study).
Forensic efficiency parameter data. The forensic efficiency parameter data were calculated based on the observed haplotype frequencies when loci were in LDE and allele frequencies in the four ethnic groups, [...] frequencies 8. These results showed that the 19 X-STR loci were highly polymorphic and could provide valuable information for forensic analysis 13. This set of markers may indeed be very useful for kinship testing, as well as for human identification.
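Combined parameters such as CDP and CMEC are conventionally obtained with the product rule, 1 − Π(1 − x_i), applied to independent units. The sketch below assumes that rule and assumes that each cluster of loci in LDE enters the product once, through its haplotype-based value, while the remaining loci enter through their allele-based values. The function name and the input values are hypothetical.

```python
from math import prod
from typing import Iterable, Tuple


def combined_power(values: Iterable[float]) -> Tuple[float, float]:
    """Combine per-unit PD (or MEC) values with the product rule.

    Each value must come from an independent unit: a single locus,
    or a whole linked cluster evaluated from haplotype frequencies.
    Returns (combined value, complement). The complement, i.e. the
    combined match probability, is returned as well because the
    combined value itself rounds to 1.0 once many loci are included.
    """
    complement = prod(1.0 - v for v in values)
    return 1.0 - complement, complement


# Hypothetical inputs: two single loci and one haplotype cluster
combined, match_probability = combined_power([0.90, 0.80, 0.75])
```

Reporting the complement alongside the combined value is a design choice that keeps very small match probabilities visible, which matters when the combined power is reported to 15 decimal places.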
A recombination study of two-generation families with two or more children. Pairwise linkage studies and recombination fraction (θ) calculations were performed for the 19 X-STR loci. The maximum logarithm of odds (LOD) scores for all pairwise linkage analyses in females are shown in Supplementary Table S21. Several marker pairs showed significant linkage (maximum LOD scores > 3). The number of informative meioses ranged from 48 to 87. LOD scores and recombination fractions for adjacent X-STR markers are listed in Table 15.
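For phase-known informative meioses, the LOD score at a recombination fraction θ compares the likelihood θ^r (1 − θ)^(n − r) with the likelihood under free recombination, 0.5^n, where r is the number of recombinants among n meioses. The following is a minimal sketch under that simple counting model; the source does not state which software or likelihood model was used, the function names are hypothetical, and the recombinant counts are illustrative (only the range of 48 to 87 informative meioses comes from the text).

```python
import math
from typing import Tuple


def _count_log10(count: int, prob: float) -> float:
    """count * log10(prob), with the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")
    return count * math.log10(prob)


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta (phase-known model)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    log_likelihood = (_count_log10(recombinants, theta)
                      + _count_log10(non_recombinants, 1.0 - theta))
    log_null = meioses * math.log10(0.5)  # free recombination
    return log_likelihood - log_null


def max_lod(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Maximum-likelihood theta (capped at 0.5) and its LOD score."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# No recombinants in 48 informative meioses: theta = 0, LOD = 48 * log10(2)
theta_hat, lod = max_lod(recombinants=0, meioses=48)
```

Under this model, a pair with no recombinants in 48 meioses already gives a maximum LOD score of about 14.4, far above the threshold of 3 used for significant linkage.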
The recombination fraction estimation is necessary for the calculation of likelihood ratios when linked markers are used. It has been previously shown that X-STR recombination rates among populations may differ 27,28. In our study, recombination among the STR clusters was inferred from Southern Han families with two or more children. We did not observe many recombination events between tightly linked markers, though they had been previously found by other researchers between the DXS10079-DXS10074 and the DXS6809-DXS6789 markers, which are separated by physical distances < 1.0 Mb 29. As suggested by previous reports, recombination estimates should be treated with caution when closely linked X-STRs are considered as stable haplotypes in kinship analysis 30. However, no recombination events were observed within the seven linked clusters in our study. The recombination fractions observed for all pairs fell within the 95% confidence intervals (CIs). More family samples and/or pedigrees with more generations are needed to obtain a better estimation of recombination events.
[...] Tibetan populations (0.00631) and the Tibetan and Hui populations (0.00722). The largest genetic distance was between the Tibetan and Uighur populations (0.01149), followed by the Han and Uighur populations (0.01075) and the Hui and Uighur populations (0.00900). Based on the Reynolds genetic distances, multidimensional scaling (MDS) analysis was performed to evaluate the phylogenetic relationships among the four Chinese ethnic groups (Fig. 2) (the significance of the MDS plot data was confirmed using a chi-square test). The Tibetan and Uighur populations, in the upper portion of the MDS plot, segregated as distant outliers, whereas the Hui and Han populations were genetically more similar, which may be due to their geographical proximity and historical distributions.
A possible explanation is that intra-population marriages are more frequent in Han and Hui populations, while inter-population marriages are more common in Tibetan and Uighur populations.
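The Reynolds genetic distances reported above can be related to the pairwise Fst values. The sketch below assumes the coancestry form of the Reynolds distance, D = −ln(1 − Fst); the source does not state the exact formula or software used, and the function name is hypothetical. Under this assumption, the Tibetan-Uighur Fst of 0.01142 gives a distance of about 0.01149, which agrees with the value reported for that pair.

```python
import math


def reynolds_distance(fst: float) -> float:
    """Reynolds coancestry distance from a pairwise Fst value.

    Assumes D = -ln(1 - Fst); valid for 0 <= Fst < 1.
    """
    if not 0.0 <= fst < 1.0:
        raise ValueError("Fst must lie in [0, 1)")
    return -math.log(1.0 - fst)


# Pairwise Fst values quoted in the text
d_tibetan_uighur = reynolds_distance(0.01142)  # largest Fst
d_han_hui = reynolds_distance(0.00128)         # smallest Fst
```

For Fst values this small the transformation changes the number only in the fifth decimal place, so the ranking of population pairs by distance is the same as their ranking by Fst.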

Conclusions
In this study, we investigated genetic polymorphisms in four Chinese ethnic groups. We tested linkage disequilibrium among 19 X-STR loci and found that these loci were not all independent of one another. Haplotypes of loci in LDE are crucial for calculating exact CDP and CMEC values in relationship identification cases and kinship testing. Hence, both allele and haplotype frequencies were considered when we calculated forensic parameters in this study. In addition, the results indicated that most X-STR allele frequencies were population-specific. Furthermore, the choice of STR loci used to calculate genetic distances affects how close or distant the ethnic groups appear to be. To achieve a better understanding of genetic structure and inter-population relationships, larger sample sizes from a wider geographic area are needed for further evaluation.

Materials and methods
Sample collection and DNA extraction. In [...] and 132 males) from the Ningxia Hui Autonomous Region. Additionally, 40 two-generation Southern Han families with two or more children (94) were tested for the recombination study. The AmpFlSTR Identifiler PCR kit (Applied Biosystems) was used. Each potential blood donor was asked about their aboriginal ancestry before and after sample collection. Only unrelated individuals were sampled. Human blood samples were collected upon approval by the Ethics Committee at the Institute of Forensic Sciences, Ministry of Justice, PR China. All the methods were carried out in accordance with the approved guidelines of the Institute of Forensic Sciences, Ministry of Justice, PR China. We extracted DNA from the samples with magnetic beads (DNA IQ System) on the Maxwell 16 Research System (Promega, Madison WI, USA) and quantified it on the 7500 Real-time PCR System following the Human DNA Quantification Kit instruction manual (Thermo Fisher Scientific). Co-amplification of the 19 X-STR loci (DXS7423, DXS10148, DXS10159, DXS6809, DXS7424, DXS8378, DXS10164, DXS10162, DXS7132, DXS10079, DXS6789, DXS101, DXS10103, DXS10101, HPRTB, DXS10075, DXS10074, DXS10135 and DXS10134) was performed by following the protocol described in the validation research 31. For each PCR, 1 μL of template DNA, 4 μL of reaction mix, 2 μL of primers, 0.2 μL of A-Taq DNA polymerase, and sdH2O were combined to a final reaction volume of 10 μL. The same cycling parameters were used for the direct amplification of our samples 31, with a 1.2 mm punch from FTA blood cards.
Markers and genotyping. The amplified products were resolved and detected by capillary electrophoresis (CE) with PO denaturing polymers (Thermo Fisher Scientific) in the AB 3130xl Genetic Analyzer (Applied