Investigation of 12 X-STR loci in Mongolian and Eastern Han populations of China with comparison to other populations

Due to their unique inheritance pattern, X-chromosomal short tandem repeats (X-STRs) have several advantages in complex kinship cases, such as deficiency cases or grandparent-grandchild and half-sister testing. In our study, 541 unrelated individuals from Mongolian and Eastern Chinese Han populations were successfully genotyped using the Investigator Argus X-12 kit. We calculated allele/haplotype frequencies and other forensic parameters of the two populations and further explored their genetic distances from previously published Chinese populations and six global populations. Our results showed that the 12 X-STR markers were highly informative in the two populations. When compared with nine other Chinese populations, significant differences were found at several loci. Geographically neighboring populations or different ethnic groups within the same area appeared to have closer evolutionary relationships. We also analyzed population genetic structure by clustering with the STRUCTURE program and by Principal Coordinate Analysis (PCoA), and found that the Chinese populations could be distinguished from the other populations enrolled in this study. Furthermore, Mongolian males were distinguishable from the other studied males by a moderate genetic distance. Our study also expanded the X-STR database, which could facilitate the appropriate application of the 12 X-STR markers in the forensic field in China.

genetic differences between populations in defined areas. Therefore, for a better forensic application of the 12 X-STRs, it is important to understand the similarities or differences in genetic distance between different Chinese populations.
The purpose of this study was to investigate the frequency distribution data and forensic parameters of 12 X-STR loci in Mongolian and Eastern Chinese Han unrelated individuals as well as to illustrate the X-chromosomal evolutionary relationships between different Chinese populations and in comparison to other global populations. Additionally, this study enriched the X-STR database in order to highlight its significance in forensic identification and kinship analysis in different ethnic populations of China.

Materials and Methods
Population samples and DNA extraction. More than 70% of China's Mongolian population lives in Inner Mongolia, which is located on the Eurasian continent 20. Eastern China also has a large population, accounting for 29.3% of the total population of China. In our study, samples from 541 healthy unrelated individuals (116 males and 116 females for the Mongolian samples, 200 males and 109 females for the Eastern Chinese Han samples) were collected. Blood samples were obtained from volunteer donors with informed consent, following the protocols approved by the ethics committee at the Academy of Forensic Science, Ministry of Justice, P.R. China. DNA isolation was carried out with a Chelex-100 extraction protocol 21. All the methods were carried out in accordance with the approved guidelines of the Academy of Forensic Science, Ministry of Justice, P.R. China.
PCR amplification and capillary electrophoresis. All DNA samples and positive control samples (9947A and XX28) were amplified with the Investigator Argus X-12 kit (Qiagen, Hilden, Germany) on a GeneAmp PCR System 9700 Thermal Cycler (Thermo Fisher Scientific, MA, USA), according to the manufacturer's protocol. PCR products were separated by capillary electrophoresis on an ABI PRISM 3130XL Genetic Analyzer (Thermo Fisher Scientific, MA, USA) following the protocol guide for genotyping, and alleles were designated using the GeneMapper ID v3.2.1 software (Thermo Fisher Scientific, MA, USA).
Forensic parameters for the 12 X-STRs, including polymorphism information content (PIC), homozygosity (HOM), heterozygosity (HET), power of exclusion (PE), paternity index (PI), mean paternity exclusion chance in duos and trios (MEC_D and MEC_T) 24, as well as power of discrimination for males (PD_M) and females (PD_F), were calculated with pooled allelic frequencies of the two populations using the online tool of the Forensic ChrX Research database (http://www.chrx-str.org).
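As an illustration of how several of these parameters follow from the allele frequencies of a single locus, the sketch below uses expressions that are commonly applied to X-STRs (PD_M = 1 − Σp², PD_F = 1 − 2(Σp²)² + Σp⁴, and Desmarais-type mean exclusion chances). Whether the online tool evaluates exactly these expressions is an assumption, and the function name and the example frequencies are hypothetical.

```python
def xstr_forensic_parameters(freqs):
    """Per-locus forensic parameters from X-STR allele frequencies.

    `freqs` is a list of allele frequencies for one locus (must sum to 1).
    The formulas are the commonly cited X-chromosome expressions; this is
    an illustrative sketch, not the implementation of any specific tool.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")

    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers

    return {
        # Males are hemizygous, so their "genotype" is a single allele.
        "PD_M": 1 - s2,
        # Females carry two alleles (homozygous or heterozygous genotypes).
        "PD_F": 1 - 2 * s2 ** 2 + s4,
        # Expected heterozygosity in females.
        "HET": 1 - s2,
        "PIC": 1 - s2 - s2 ** 2 + s4,
        # Mean exclusion chance for trios with a daughter (algebraically
        # identical to PIC) and for father-daughter duos.
        "MEC_T": 1 - s2 + s4 - s2 ** 2,
        "MEC_D": 1 - 2 * s2 + s3,
    }


# Hypothetical locus with two equally frequent alleles.
example = xstr_forensic_parameters([0.5, 0.5])
```

For the hypothetical two-allele locus, a father-daughter duo excludes a random man only when the daughter lacks his allele, which is why MEC_D is lower than MEC_T.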

Statistical analysis and population comparisons.
The map of China was generated with the ggplot2 and maptools packages of R: A Language and Environment for Statistical Computing (https://www.R-project.org). The evolutionary history of eleven Chinese populations was inferred using the Unweighted Pair Group Method with Arithmetic mean (UPGMA) 25, and the optimal tree, with a sum of branch lengths of 0.00821881, was shown with branch lengths next to the branches. The evolutionary distances were used for phylogenetic analyses conducted in MEGA7 software 26. Detailed population genetic structure analysis was performed with STRUCTURE v2.3.4 27,28 using the admixture model with correlated allelic frequencies between populations. We analyzed the structure of the Mongolian and Eastern Chinese Han populations together with nine other populations from previous studies based on the same 12 X-STRs. The analysis results were uploaded to Structure Harvester 29 to estimate the most likely K value for the data. For further comparisons among different male populations, Nei's unbiased genetic distances 30,31 were calculated and Principal Coordinate Analysis (PCoA, known previously as PCA) 32 was performed using GenAlEx 6.5 software.
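The distance underlying the PCoA can be sketched as follows. This is a minimal illustration of Nei's unbiased genetic distance for haploid (male X-chromosomal) data, in which the sample-size correction uses the number of chromosomes sampled. The function name, the data layout and the example frequencies are hypothetical, and it is an assumption that GenAlEx applies the correction in exactly this form.

```python
import math


def nei_unbiased_distance(pop_x, pop_y, n_x, n_y):
    """Nei's unbiased genetic distance between two populations.

    pop_x, pop_y: lists with one dict per locus, mapping allele -> frequency.
    n_x, n_y: number of chromosomes sampled in each population
              (for males, one X chromosome per individual).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-empty loci")

    j_x = j_y = j_xy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        sum_x2 = sum(p * p for p in fx.values())
        sum_y2 = sum(p * p for p in fy.values())
        # Sample-size corrected within-population identities.
        j_x += (n_x * sum_x2 - 1) / (n_x - 1)
        j_y += (n_y * sum_y2 - 1) / (n_y - 1)
        # Between-population identity needs no correction.
        j_xy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())

    loci = len(pop_x)
    j_x, j_y, j_xy = j_x / loci, j_y / loci, j_xy / loci
    if j_xy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical single-locus example with 100 males per population.
pop_a = [{"10": 0.5, "11": 0.5}]
pop_b = [{"10": 0.9, "11": 0.1}]
distance = nei_unbiased_distance(pop_a, pop_b, 100, 100)
```

Because of the sample-size correction, two samples with identical frequencies can yield a slightly negative value, which is usually interpreted as zero distance.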
Data Availability. All data generated during this study are included in this published article (and its Supplementary Information files) and are available from the corresponding author on reasonable request.

Results and Discussion
HWE and LD test within populations. The results of HWE and LD tests in Eastern Chinese Han were presented in our previous report 33. In the Mongolian samples, a total of 164 alleles were typed using the Argus X-12 kit. No statistically significant deviation from HWE was observed except at the DXS10134 locus in female samples (p = 0.0027), which remained significant after applying the Bonferroni correction for multiple testing (p = 0.05/12). This deviation from HWE may be caused by substructure in the chosen population, which could result in a sampling bias. In addition, no statistically significant LD was found in Mongolian females after the Bonferroni correction (p = 0.05/66), which differed from the Eastern Chinese Han analysis. This might be due to the stringency of the statistical correction. On the other hand, statistically significant LD was found between the DXS10101 and DXS10103 loci in Mongolian males (p < 0.0001), which has also been reported in other populations 5,34,35.
Haplotype analysis in males. The  LG4, respectively (Supplementary Table S2). Haplotype diversity (HD) of the four LGs varied from 0.9901 (in LG2) to 0.9972 (in LG1) in Mongolian males and from 0.9876 (in LG3) to 0.9973 (in LG1) in Eastern Chinese Han males. This illustrated that LG1 was the most polymorphic group, with a frequency of 0.0259 for each of the most common haplotypes in Mongolian males, 10-21-24.1 and 10-21-25.1 (for DXS8378-DXS10135-DXS10148), and a frequency of 0.0200 for the most common haplotype in Eastern Chinese Han males, 10-25-27.1. Haplotype 12-16-32 (for HPRTB-DXS10103-DXS10101) was the most common overall, observed 12 times in LG3 of the Eastern Chinese Han samples with a population frequency of 0.0600. The results showed that the four closely linked X-STR groups are highly informative genetic markers in the two studied populations.
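The thresholds and frequencies quoted above can be checked with a few lines of arithmetic. The sketch below reproduces the Bonferroni thresholds (12 single-locus tests and 66 pairwise tests among 12 loci) and shows one common estimator of haplotype diversity, HD = n/(n − 1) · (1 − Σp²). That this is the estimator used for the reported HD values is an assumption, and the helper names are hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def haplotype_diversity(counts):
    """Unbiased haplotype diversity from observed haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_p2)


n_loci = 12
hwe_threshold = bonferroni_threshold(0.05, n_loci)          # 12 HWE tests
ld_threshold = bonferroni_threshold(0.05, comb(n_loci, 2))  # 66 locus pairs

# The DXS10134 p-value reported for Mongolian females stays below the
# corrected HWE threshold.
dxs10134_significant = 0.0027 < hwe_threshold

# Haplotype 12-16-32 was observed 12 times among 200 Eastern Han males.
most_common_lg3_freq = 12 / 200
```

With 116 Mongolian males, a haplotype frequency of 0.0259 likewise corresponds to three observations (3/116).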
Genetic variability. The allelic frequencies and forensic parameters of Eastern Chinese Han were also presented in our previous report 33. Based on the calculated allelic frequencies of the 12 X-STRs in Mongolian females and males, no significant difference was found between them (p > 0.05) by the Exact Test; therefore, males and females were pooled for calculating forensic parameters. The allelic frequencies of the 12 X-STRs are shown in Supplementary Table S3 (see also Table 1). In total, the combined PD_F exceeded 0.9999999999, and the combined PD_M, MEC_D and MEC_T were 0.9999999983, 0.9999987070 and 0.9999999931, respectively. All these forensic parameters demonstrated that the 12 X-STR markers were highly polymorphic and will be useful in forensic applications or anthropological research among the Mongolian population of Inner Mongolia and the Eastern Chinese Han population.
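Combined values of this kind are conventionally obtained with the product rule, combined = 1 − Π(1 − v_i), which treats the units being combined as independent. The sketch below illustrates the rule with hypothetical per-locus values. Whether the reported figures were combined per locus or per linkage group is not stated here, so the example should not be read as a reproduction of them.

```python
import math


def combined_power(per_unit_values):
    """Combine per-locus (or per-linkage-group) values with the product rule.

    Assumes the units are independent. For closely linked loci, haplotype
    frequencies per linkage group would be the appropriate input.
    """
    return 1.0 - math.prod(1.0 - v for v in per_unit_values)


def combined_complement(per_unit_values):
    """Return Π(1 - v_i) directly.

    Reporting the complement avoids losing digits when the combined value
    is extremely close to 1.
    """
    return math.prod(1.0 - v for v in per_unit_values)


# Hypothetical per-locus PD values, for illustration only.
hypothetical_pd = [0.90, 0.85, 0.95, 0.80]
combined = combined_power(hypothetical_pd)
residual = combined_complement(hypothetical_pd)
```

For these hypothetical inputs the residual is 0.1 × 0.15 × 0.05 × 0.2 = 0.00015, which shows how quickly a handful of informative loci drives the combined value towards 1.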
Inter-population comparison. For a better understanding of the genetic structures of these 12 X-STRs in Chinese populations, we assembled data from five Chinese Han subpopulations and four ethnic minority groups from different areas of China (Fig. 1a, Supplementary Table S4). Detailed genotyping data and allelic frequencies of these populations, assayed with the Investigator Argus X-12 kit, were available in previously published reports. By calculation and comparison, we found that the combined PD_F of all 11 populations was greater than 0.999999999, except for the Guangdong Yao ethnic group (cPD_F = 0.999999998); the combined PD_M ranged from 0.999999997 to 0.999999999; and the combined MEC_D and MEC_T ranged from 0.999998065 to 0.999999993 and from 0.999998732 to 0.999999998, respectively. These forensic parameters illustrated that the 12 X-STRs have great application value in forensic identification and kinship analysis among Chinese populations.
Next, the allelic frequencies of the Mongolian and Eastern Chinese Han populations were compared with those of nine other Chinese populations. Since the F_st value is usually used for analyzing genetic distances between populations 36,37, we calculated the pairwise F_st genetic distances (shown in Supplementary Table S5).
Phylogenetic tree by UPGMA phylograms. Despite the limited number of loci that can provide strong information about the general level of genetic diversity, the phylogenetic analyses indicated that populations that are geographically close tend to have shorter genetic distances than those that are geographically far apart, which is in accordance with the conventional knowledge of population genetics 38.
Clustering by STRUCTURE analysis. To further investigate the population genetic structure at an individual level on a worldwide scale, the STRUCTURE program was used to process the genetic data of five representative Chinese populations (Mongolian, Han from Eastern China, Han from Liaoning province, Han from Hebei province, and Korean from Jilin province) 10,11,16,33, as well as six additional published populations from Europe and Asia 39-44. The range of possible Ks was tested from 2 to 11. According to the Evanno method 45, when the real K cannot be determined because the distribution of L(K) plateaus, it is often located at the modal value of the distribution of ΔK; therefore, the most appropriate K was 2 (Supplementary Figure S1). The clustering of the populations presented an easily distinguishable geographic pattern (Fig. 2). Mongolian, Eastern Chinese Han and the other Chinese populations belonged to cluster 1, while the other six global populations belonged to cluster 2. When K = 3-11, there were no recognizable boundaries within the Chinese populations by STRUCTURE analysis.
The same situation was also observed in cluster 2 (the Emirati, Belarusian, Hungarian, German, Swedish and Italian populations): the five geographically close European populations and the Emirati from the United Arab Emirates (UAE), who live at the tri-continental crossroads connecting Africa, Europe and Asia, shared a relatively consistent genetic structure pattern regardless of the assumed K value (Supplementary Figure S2). These results conformed to the findings of Brissenden et al. based on an autosomal SNP study 46 and indicated that the Mongolian and Eastern Chinese Han populations could not be separated from other Chinese populations at the genetic level by STRUCTURE analysis.
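The ΔK criterion used to choose K can be sketched as follows, assuming the form computed from replicate means, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)). The log-likelihood values below are hypothetical and are not taken from this study, and the function name is illustrative. Note that ΔK can only be evaluated for K values that have tested neighbours on both sides.

```python
from statistics import mean, stdev


def evanno_delta_k(lnl_by_k):
    """Compute delta K from replicate STRUCTURE log-likelihoods.

    lnl_by_k: dict mapping K -> list of ln P(D|K) values from replicate runs
              (at least two replicates per K).
    Returns a dict mapping K -> delta K for every K with neighbours K-1, K+1.
    """
    means = {k: mean(values) for k, values in lnl_by_k.items()}
    delta = {}
    for k in sorted(lnl_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnl_by_k[k])
            # Identical replicates give sd = 0; report infinity in that case.
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Hypothetical replicate log-likelihoods (two runs per K).
hypothetical_lnl = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -889.0],
}
delta_k = evanno_delta_k(hypothetical_lnl)
best_k = max(delta_k, key=delta_k.get)
```

In this hypothetical case the likelihood keeps rising slowly after K = 2, yet ΔK peaks at K = 2 because that is where the rate of improvement changes most sharply.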
Principal Coordinate Analysis (PCoA). Because males inherit a single X chromosome from their mother, their X-STR genotypes can be read directly as haplotypes, which makes the genetic distribution of X-STRs in males worth exploring. PCoA provides a way of visualizing the essential patterns of genetic relationships contained in a matrix (e.g., a distance matrix) and allows us to find and plot the major patterns within a multivariate data set. Compared with tree-building methods, whose algorithms always assume a hierarchical genetic structure, PCoA gives a reasonable representation at a higher taxonomic level. In this respect, PCoA is an effective complement to the evolutionary tree. Therefore, PCoA was carried out to test for significant variation in the genetic distribution of the 12 X-STR markers among the males of the eleven populations used in the STRUCTURE analyses. As shown in Fig. 3, the first two principal components explained 94.62% of the total variance observed within these populations (the first and second components accounted for 88.02% and 6.60%, respectively). In the PCoA diagram, Mongolian, Eastern Chinese Han and other Chinese populations were clustered together on the right, while the other global populations were clustered on the left. Among the European male populations, Hungarian, Swedish and Italian were located in the upper left quadrant, while German and Belarusian were clustered together in the lower left quadrant, which was in accordance with the national stratification within Europe described in other reports 41,43. Meanwhile, the Emirati males from the Arabian Peninsula of Western Asia were also located in the lower left quadrant and could not be distinguished from the Europeans by the 12 X-STRs, which was consistent with the result of the STRUCTURE analysis in our study. This may be due to the complex biogeography and historical background of the UAE.
Among the cluster of Chinese populations, Hebei Han, Liaoning Han and Korean from Jilin shared a closer genetic distance, which supported the results observed from the phylogenetic tree. Additionally, the Mongolian population was located in the lower right corner, slightly distant from the other Chinese groups in terms of genetic relationship. Similar results were found in a study of Mongolian Y-STRs, and this probably reflects customs and territorial restriction to some extent 47. Our analysis indicated that the 12 X-STR loci had some advantages in distinguishing Mongolian males from others by the PCoA method.
The Mongolian is a traditional nomadic nationality mainly distributed in Eastern Asia. It is one of the largest minority nationalities in China and also the main nationality of Mongolia. In addition, Mongolians are distributed in other Asian and European countries such as Russia. The Mongolian nationality first originated from the eastern bank of the ancient Wangjian River. In the early thirteenth century, the Mongol tribe headed by Genghis Khan unified the Mongolian region and gradually formed a new national community. In the history of China and even of the world, the Mongolian people have had a profound influence and a significant role in the economy, politics and culture. Eastern China, in turn, is a densely populated area whose population accounts for 29.3% of the total population of China and shows the most common characteristics of the Chinese population. Both are therefore very important and distinctive population groups in China, which is why our study focused on them.
It is its particular mode of inheritance that makes the X chromosome valuable for human evolutionary and historical research 48,49. On the one hand, because there is a single copy of the X chromosome in males, it is possible to detect their haplotypes, which are needed to infer the phylogeny of a region.
The X and Y chromosomes and the mitochondrial genome are candidates for research that uses phylogenies, an advantage that explains their dominant role in human historical studies 50-52. On the other hand, recombination between the X chromosomes in females also creates vast quantities of information that is vital for providing a complete view of the history of human populations. Although the bulk of human historical research has been done with the Y chromosome or the mitochondrial genome, the X chromosome has the advantages of existing in both sexes and of much lower rates of genetic drift 50. The X-STR markers included in the Argus X-12 kit are the most developed and widely used, and a large amount of 12 X-STR data from different populations is available, with which population characteristics or geographical patterns can be analyzed. Therefore, we conducted a genetic analysis of the Mongolian and Eastern Chinese Han populations based on the 12 X-STR markers and obtained results that might be used for more precise substructure detection in Eastern Asian populations and for the estimation of individual biogeographical ancestry in further studies.

Conclusions
Our current study indicated that the Investigator Argus X-12 kit used in Mongolian and Eastern Chinese Han sample sets provided highly polymorphic data for discriminating individuals and testing kinship, which enriched the Chinese ethnic genetic information. The significant deviation from HWE observed at the DXS10134 locus in the Mongolian population could possibly be eliminated by increasing the sample size from geographically closed regions in Inner Mongolia. The genetic comparison among eleven Chinese populations demonstrated that the statistically significant genetic distances were consistent with the known situation of populations in China, which live in ethnically mixed, small compact and interspersed settlements. In addition, neighboring populations and different ethnic groups within the same area appeared to have closer evolutionary relationships in the phylogenetic tree. The STRUCTURE analyses revealed that the Mongolian and Eastern Chinese Han populations belonged to the same cluster as the other Chinese populations, but were apparently distinct from other global populations. The PCoA analysis indicated that the 12 X-STRs could not only be applied to distinguish Chinese males from the males of the abovementioned countries, but could also show a moderate genetic distance between Mongolian and the other Chinese male populations. To the best of our knowledge, this is the first report to demonstrate that the 12 X-STR markers could be useful in dealing with the genetic relationship between Mongolian males and others.
However, for some populations, such as the Manchu and Miao with their large populations, or the Kazak with significant differences from other Chinese populations in other genetic markers, there are still no published data concerning the 12 X-STR panel. Therefore, further studies should focus on increasing the sample size of the current ethnic groups and on including more nationalities. Hopefully, our endeavor will help to establish an appropriate and integrated X-STR database that is suitable for the large population of China.