Genetic polymorphism and phylogenetic differentiation of the Huaxia Platinum System in three Chinese minority ethnicities

Short tandem repeats (STRs), which are highly polymorphic and rich in evolutionary information, play a significant role in genetic applications such as human forensics, anthropology and population genetics. The Huaxia Platinum System was specifically developed to allow coamplification of all markers in the expanded Combined DNA Index System and the Chinese National Database. Herein, in continuation of our previous studies, 493 unrelated individuals were genotyped for the first time to investigate the efficacy of this novel system in three minority ethnicities of China (Hui, Tibetan and Uygur). Additionally, genetic relationships among our three investigated populations and other previously published populations were analyzed using pairwise genetic distances, multidimensional scaling (MDS), principal component analysis (PCA), cladogram and STRUCTURE. The combined match probabilities (CMP) for the Hui, Tibetan and Uygur groups were 1.6894 × 10^-27, 6.1666 × 10^-27 and 5.0655 × 10^-27, respectively, and the combined powers of exclusion (CPE) were 0.999999999646627, 0.999999999304935 and 0.999999999433994. Population comparison analysis showed that the Hui and Tibetan populations had genetic affinities with the Han, Yi and Korean populations, while the Uygur group had a close relationship with the Kazakh population. These results suggest that the Huaxia Platinum System is a polymorphic and effective tool that is appropriate for personal identification and population genetics.


Results
Forensic parameters of the Huaxia Platinum System. Genotype data for 23 STRs (Supplementary Table S3) were obtained from the three populations (Ningxia Wuzhong Hui, Sichuan Chengdu Tibetan and Xinjiang Kumul Uygur). No significant deviation from Hardy-Weinberg equilibrium (HWE) was detected after Bonferroni correction (p > 0.05/23 ≈ 0.0022), and no significant linkage disequilibrium (LD) was observed in the locus-by-locus pairwise comparison test after Bonferroni correction (p > 0.05/253 ≈ 0.0002) for any pair of STR loci or in any ethnic group (Table 1 and Supplementary Tables S4-S6).
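The two Bonferroni thresholds quoted above follow from the number of tests: 23 single-locus HWE tests and 253 pairwise LD tests, since 23 loci form 23 × 22 / 2 = 253 pairs. A minimal sketch of that arithmetic is given below; the function name `bonferroni_threshold` is illustrative and not taken from any software used in the study.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


n_loci = 23
# One HWE test per locus
hwe_threshold = bonferroni_threshold(0.05, n_loci)
# One LD test per unordered pair of loci
n_pairs = comb(n_loci, 2)
ld_threshold = bonferroni_threshold(0.05, n_pairs)

print(f"HWE: 0.05/{n_loci} = {hwe_threshold:.4f}")
print(f"LD:  0.05/{n_pairs} = {ld_threshold:.4f}")
```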
The allele frequencies of 23 autosomal STR loci for the three populations are given in Supplementary Tables S7-S9. The forensic informative metrics, namely the match probability (MP), power of exclusion (PE), polymorphism information content (PIC), typical paternity index (TPI), observed heterozygosity (Ho) and expected heterozygosity (He), are listed in Table 1. The Ho ranged from 0.5550 (TPOX) to 0.9500 (Penta E) in the Sichuan Chengdu Tibetan (SCT) population, with an average value of 0.7780, which was lower than the average values of the Xinjiang Kumul Uygur (XKU, 0.7937) and Ningxia Wuzhong Hui (NWH, 0.7952) ethnic groups. In the three populations, Penta E had the highest discrimination power and the MP values were 0.0160 (NWH),

Interpopulation genetic distances. We performed locus-by-locus pairwise comparisons (Fst) and calculated Nei's standard genetic distances (Rst) between our three studied populations and 47 previously published worldwide populations (Asian 9,15-29, North American 30-32, European 33-36, Oceanian 37, South American 38-41 and South African 42 populations) based on allele frequencies of 20 expanded CODIS loci to infer interpopulation similarity and differentiation (Supplementary Tables S10-S13 and Fig. S1). The locus-by-locus Fst and corresponding p values showed no significant genetic difference between the NWH group and Han, Tibetan, Yi and Korean populations at all loci; no significant genetic difference between the SCT population and Yi and Tibetan ethnic groups at all loci; and no significant genetic difference between the XKU ethnic group and the Xinjiang Uygur-3 and Kazakh populations at all loci. However, for the NWH group, significant genetic differences were observed with 37 reference populations, at 1 to 17 loci.
For the SCT group, significant genetic differences were observed with 46 reference populations, at 1 to 19 loci, and for the XKU group, significant genetic differences were observed with 47
Fig. 1A, our three studied populations and most Asian populations were distributed in the middle-upper part of the x-axis. Among them, the NWH and SCT populations grouped together with the Han, Yi, Tibetan, Korean and Japanese populations; however, the XKU group was relatively farther away from these populations. Most European, South American and some Oceanian populations were located in the middle-lower part of the x-axis, and other populations were scattered across the MDS plot. For comparisons among Chinese populations, our three studied groups separated from each other (Fig. 2A). The NWH population clustered with the Han group in the first quadrant, the SCT group was situated in the fourth quadrant with the Central Chinese Han, Yi and Tibetan ethnic groups, and the XKU and the remaining three Uygur and Kazakh populations were distributed to the left of the y-axis.
As shown in Fig. 1B
Furthermore, as shown in Fig. 2C, the cladogram of 17 Chinese populations also grouped into three branches. The NWH population clustered with the Han group, the SCT population clustered with the Yi and other Tibetan ethnic groups, and the XKU group clustered with other Uygur and Kazakh populations, which made the genetic relationships between our three studied populations and 14 Chinese populations clearer.

Population structure analysis. To illustrate the genetic structure of our three studied populations, a STRUCTURE analysis was performed based on the genotype data of 23 STRs by combining eight previously published populations 7,13-15. The probable admixture levels and cluster membership patterns of each population are presented in Fig. 3 and Supplementary Table S15, and the optimum K value was three. The NWH and SCT populations shared similar membership proportions with other Sino-Tibetan populations, while the XKU group had similar admixture levels to other Turkic populations and differed from the Sino-Tibetan populations. The proportion of membership of the Sino-Tibetan populations in cluster 2 ranged from 37.17% to 49.03% and was approximately 10 to 20 percentage points higher than that of the Turkic populations (24.18% to 26.93%). In contrast, the proportion of membership of the Sino-Tibetan populations in cluster 3 varied from 20.91% to 26.65% and was approximately 20 percentage points lower than that of the Turkic populations (41.94% to 45.13%).

Discussion
Three minority ethnicities, the Hui, Tibetan and Uygur, have long been a focus of research by linguists, anthropologists, archaeologists and population geneticists 10,17,45,46. However, genetic research on these three populations in China remains limited. In this study, we focused on the genetic makeup and phylogeny of the three ethnic groups using the 23 autosomal STRs included in the Huaxia Platinum System. First, we confirmed that these STR loci conformed to HWE and that there was no LD among them. Thus, the 23 STRs could be used as independent genetic markers for forensic investigation and population genetics in subsequent research. The assay performed well in the three populations and could be a robust and effective tool for forensic applications, such as individual identification and paternity tests.
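The independence of the loci is what justifies combining single-locus statistics by the product rule. The source does not state its formulas explicitly, so the expressions below are the standard definitions, assumed here to be those behind the reported CMP and CPE values, with MP_i and PE_i denoting the match probability and power of exclusion at locus i:

```latex
\mathrm{CMP} = \prod_{i=1}^{23} \mathrm{MP}_i ,
\qquad
\mathrm{CPE} = 1 - \prod_{i=1}^{23} \left( 1 - \mathrm{PE}_i \right)
```

Under these definitions, each additional independent locus can only lower the CMP and raise the CPE, which is consistent with the very small CMP values (on the order of 10^-27) reported for the 23-locus panel.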
Population substructure dissection is pivotal in population genetic studies, genotype-phenotype association research and forensic genetics, especially in East Asia with its high cultural, ethnic and geographical diversity. We performed population comparisons based on 20 autosomal STR loci (more loci than in other similar studies, which can give finer resolution when constructing a phylogeny) and found that the results of the interpopulation genetic distances, MDS, PCA and phylogenetic tree were consistent. For population comparisons worldwide, only Asian populations could be distinguished from other continental populations, which indicates that the Huaxia Platinum System is useful for the identification of Asian populations. For population comparisons within China, the NWH group showed significant genetic homogeneity with the Han, Yi and Tibetan populations, which supports the theory that the Hui originated via simple cultural diffusion 45. The SCT population also had genetic affinities with the Han and Yi populations, and our results indicated that the high-altitude Tibetan (Tibet Tibetan) and the low-altitude Tibetan (SCT) populations had strong genetic similarity; this feature is in accordance with previous findings and is indicated by high-throughput genotyping and sequencing data 47-49. Although the XKU had close genetic relationships with other Uygur and Kazakh ethnic groups, these relationships were not as close as those among the other Uygur groups. This may be caused by the special geographical position and history of Kumul. Kumul is a multiethnic region (according to the 2010 population census, the Han Chinese accounted for 68.4%, while the Uygur, Kazakh, Hui, Mongolian and other ethnic minorities accounted for 31.6%) bordering Gansu Province to the east and Mongolia to the north, which makes it a key passage on the Silk Road and the gateway to inland China.
It was founded by a people known in Han Chinese sources during the 1st millennium BCE, and the Uygur people and other minority ethnicities subsequently settled in this area 50,51. Furthermore, structure analysis based on the 23 autosomal STRs showed that populations of the same language family had a similar genetic makeup; this was consistent with the results of the population comparison analysis, namely that genetic structure is strongly correlated with linguistic affiliation 13.
To further understand the genetic background and differentiation of Sino-Tibetan language (Han, Tibetan, Hui and Yi) populations, we are searching for and validating a large number of new genetic markers with the power to distinguish ethnic groups. Large-scale population genetic studies using different high-density genetic markers and even whole-genome sequencing (WGS) will be conducted in the future.

were coamplified using the Huaxia Platinum PCR amplification system. Multiplex amplification was conducted on a ProFlex™ PCR system (Thermo Fisher Scientific, USA) following the manufacturer's protocol. The 25 μL PCR volume contained 10 μL of master mix, 10 μL of primer set, 4 μL of deionized water and 1 μL of template DNA. Thermal cycler conditions were as follows: predenaturation at 95 °C for 1 min, followed by 26 cycles of 94 °C for 3 s, 59 °C for 16 s, and 65 °C for 29 s, and a final extension at 60 °C for 5 min. Separation and analysis of PCR-amplified products were performed on the Applied Biosystems 3500 Genetic Analyzer (Thermo Fisher Scientific, USA) using POP-4 polymer (Life Technologies, USA), and injections were conducted at 1.2 kV for 16 s. Allele identification was conducted using the Huaxia Platinum panels, bin sets, stutter files and a threshold of 175 relative fluorescence units (RFU), unless otherwise stated, and alleles were compared with the allelic ladder provided by the corresponding kit via Applied Biosystems GeneMapper ID-X version 1.2 software.

Statistical analysis. Allele frequencies, forensic statistical parameters (including the match probability (MP), power of exclusion (PE), polymorphism information content (PIC) and typical paternity index (TPI)) and p values of the Hardy-Weinberg equilibrium test (HWE-p) of 23 autosomal STR loci were calculated in the modified PowerStat spreadsheet (Promega, Madison, WI, USA).
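As a rough illustration of how such per-locus parameters can be derived from genotype data, the sketch below uses commonly cited formulas: MP as the sum of squared genotype frequencies, PIC after Botstein, and PE and TPI from the observed heterozygote and homozygote proportions. It is an assumption that these correspond to what the modified PowerStat spreadsheet computes; the function names `locus_parameters` and `combine_loci` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod


def locus_parameters(genotypes):
    """Forensic parameters for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Formulas are commonly cited ones, assumed (not confirmed) to match PowerStat.
    """
    n = len(genotypes)
    # Sort alleles so that (a, b) and (b, a) count as the same genotype
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]

    ho = sum(c for g, c in geno_counts.items() if g[0] != g[1]) / n
    hom = 1.0 - ho  # observed homozygote proportion
    sum_p2 = sum(p * p for p in freqs)

    he = 1.0 - sum_p2  # expected heterozygosity (no sample-size correction)
    mp = sum((c / n) ** 2 for c in geno_counts.values())
    pic = 1.0 - sum_p2 - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    pe = ho ** 2 * (1 - 2 * ho * hom ** 2)
    tpi = 1 / (2 * hom) if hom > 0 else float("inf")
    return {"Ho": ho, "He": he, "MP": mp, "PIC": pic, "PE": pe, "TPI": tpi}


def combine_loci(mps, pes):
    """Combined match probability and combined power of exclusion,
    assuming the loci are independent (product rule)."""
    cmp_value = prod(mps)
    cpe_value = 1 - prod(1 - pe for pe in pes)
    return cmp_value, cpe_value


# Toy data: four individuals typed at a single locus (illustrative only)
toy = [(10, 11), (10, 10), (11, 12), (10, 12)]
toy_result = locus_parameters(toy)
print(toy_result)
```

The expected heterozygosity here is the uncorrected 1 − Σp²; population genetics packages may apply a sample-size correction, so values can differ slightly from published tables.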
Subsequently, the observed heterozygosity (Ho), expected heterozygosity (He) and p values of the linkage disequilibrium test (LD-p) were assessed in Arlequin v3.5 software 52. Interpopulation differentiation was assessed in Arlequin v3.5 software by computing pairwise Fst using locus-by-locus pairwise population comparisons. Nei's standard genetic distance (Rst) was calculated using the PHYLIP v3.695 package. Multidimensional scaling analysis (MDS) was conducted in SPSS software (IBM SPSS, version 19.0, Chicago), and principal component analysis (PCA) was carried out based on allele frequencies using MVSP v3.22 software 43. The phylogenetic tree based on the neighbor-joining method was constructed in the Molecular Evolutionary Genetics Analysis v7.0 (MEGA v7.0) software 44. Genetic structure analysis of the 11 populations was conducted using STRUCTURE v2.3.4 software 46, with a burn-in period of 100,000 steps followed by 100,000 Markov Chain Monte Carlo (MCMC) steps under the standard admixture model, five independent runs for each K value, and K values from 2 to 7.
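For readers unfamiliar with the distance measure, the following is a minimal sketch of Nei's (1972) standard genetic distance, D = −ln(Jxy / √(Jx·Jy)), computed from per-locus allele frequencies. It is written from the textbook definition and is not the PHYLIP implementation; the function name `nei_standard_distance` and the input layout (one allele-to-frequency dictionary per locus) are assumptions made for illustration.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus mapping allele -> frequency;
    both lists must cover the same loci in the same order.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        # Sums over loci; dividing each by the number of loci would cancel out
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        # No shared alleles at any locus: the distance is unbounded
        return math.inf
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Toy example with a single locus (illustrative frequencies only)
pop_a = [{"A": 1.0}]
pop_b = [{"A": 0.5, "B": 0.5}]
print(nei_standard_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is the kind of input that MDS and neighbor-joining tree construction operate on.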

Quality controls. Our laboratory is accredited under ISO 17025 and by the China National Accreditation Service for Conformity Assessment (CNAS). The recommendations published by the DNA Commission of the International Society for Forensic Genetics (ISFG) were followed throughout the experimental procedure. A positive control (Control DNA 007) and a negative control (ddH2O) were included in each batch of genotyping.